## Are you sure?

This action might not be possible to undo. Are you sure you want to continue?

R

in

Networking

Vol. 2, No. 1 (2007) 1–133

c 2007 C. Fragouli and E. Soljanin

DOI: 10.1561/1300000003

Network Coding Fundamentals

Christina Fragouli

1

and Emina Soljanin

2

1

´

Ecole Polytechnique F´ed´erale de Lausanne (EPFL), Switzerland,

christina.fragouli@epﬂ.ch

2

Bell Laboratories, Alcatel-Lucent, USA, emina@research.bell-labs.com

Abstract

Network coding is an elegant and novel technique introduced at the turn

of the millennium to improve network throughput and performance. It

is expected to be a critical technology for networks of the future. This

tutorial addresses the ﬁrst most natural questions one would ask about

this new technique: how network coding works and what are its bene-

ﬁts, how network codes are designed and how much it costs to deploy

networks implementing such codes, and ﬁnally, whether there are meth-

ods to deal with cycles and delay that are present in all real networks.

A companion issue deals primarily with applications of network coding.

1

Introduction

Networked systems arise in various communication contexts such as

phone networks, the public Internet, peer-to-peer networks, ad-hoc

wireless networks, and sensor networks. Such systems are becoming

central to our way of life. During the past half a century, there has

been a signiﬁcant body of research eﬀort devoted to the operation and

management of networks. A pivotal, inherent premise behind the oper-

ation of all communication networks today lies in the way informa-

tion is treated. Whether it is packets in the Internet, or signals in a

phone network, if they originate from diﬀerent sources, they are trans-

ported much in the same manner as cars on a transportation network

of highways, or ﬂuids through a network of pipes. Namely, indepen-

dent information streams are kept separate. Today, routing, data stor-

age, error control, and generally all network functions operate on this

principle.

Only recently, with the advent of network coding, the simple but

important observation was made that in communication networks, we

can allow nodes to not only forward but also process the incoming inde-

pendent information ﬂows. At the network layer, for example, interme-

diate nodes can perform binary addition of independent bitstreams,

2

3

whereas, at the physical layer of optical networks, intermediate nodes

can superimpose incoming optical signals. In other words, data streams

that are independently produced and consumed do not necessarily need

to be kept separate when they are transported throughout the net-

work: there are ways to combine and later extract independent infor-

mation. Combining independent data streams allows to better tailor

the information ﬂow to the network environment and accommodate the

demands of speciﬁc traﬃc patterns. This shift in paradigm is expected

to revolutionize the way we manage, operate, and understand organi-

zation in networks, as well as to have a deep impact on a wide range of

areas such as reliable delivery, resource sharing, eﬃcient ﬂow control,

network monitoring, and security.

This new paradigm emerged at the turn of the millennium, and

immediately attracted a very signiﬁcant interest in both Electrical

Engineering and Computer Science research communities. This is an

idea whose time has come; the computational processing is becoming

cheaper according to Moore’s law, and therefore the bottleneck has

shifted to network bandwidth for support of ever-growing demand in

applications. Network coding utilizes cheap computational power to

dramatically increase network throughput. The interest in this area

continues to increase as we become aware of new applications of these

ideas in both the theory and practice of networks, and discover new

connections with many diverse areas (see Figure 1.1).

Throughout this tutorial we will discuss both theoretical results

as well as practical aspects of network coding. We do not claim to

exhaustively represent and reference all current work in network coding;

the presented subjects are the problems and areas that are closer to

our interests and oﬀer our perspective on the subject. However, we did

attempt the following goals:

(1) to oﬀer an introduction to basic concepts and results in net-

work coding, and

(2) to review the state of the art in a number of topics and point

out open research directions.

We start from the main theorem in network coding, and proceed to

discuss network code design techniques, beneﬁts, complexity require-

4 Introduction

Fig. 1.1 Connections with other disciplines.

ments, and methods to deal with cycles and delay. A companion volume

is concerned with application areas of network coding, which include

wireless and peer-to-peer networks.

In order to provide a meaningful selection of literature for the

novice reader, we reference a limited number of papers representing

the topics we cover. We refer a more interested reader to the webpage

www.networkcoding.info for a detailed literature listing. An excellent

tutorial focused on the information theoretic aspects of network coding

is provided in [49].

1.1 Introductory Examples

The following simple examples illustrate the basic concepts in net-

work coding and give a preliminary idea of expected beneﬁts and

challenges.

1.1.1 Beneﬁts

Network coding promises to oﬀer beneﬁts along very diverse dimensions

of communication networks, such as throughput, wireless resources,

security, complexity, and resilience to link failures.

1.1 Introductory Examples 5

Throughput

The ﬁrst demonstrated beneﬁts of network coding were in terms of

throughput when multicasting. We discuss throughput beneﬁts in

Chapter 4.

Example 1.1. Figure 1.2 depicts a communication network repre-

sented as a directed graph where vertices correspond to terminals and

edges correspond to channels. This example is commonly known in the

network coding literature as the butterﬂy network. Assume that we

have slotted time, and that through each channel we can send one bit

per time slot. We have two sources S

1

and S

2

, and two receivers R

1

and R

2

. Each source produces one bit per time slot which we denote

by x

1

and x

2

, respectively (unit rate sources).

If receiver R

1

uses all the network resources by itself, it could

receive both sources. Indeed, we could route the bit x

1

from source

S

1

along the path ¦AD¦ and the bit x

2

from source S

2

along the path

¦BC, CE, ED¦, as depicted in Figure 1.2(a). Similarly, if the second

receiver R

2

uses all the network resources by itself, it could also receive

both sources. We can route the bit x

1

from source S

1

along the path

¦AC, CE, EF¦, and the bit x

2

from source S

2

along the path ¦BF¦

as depicted in Figure 1.2(b).

Fig. 1.2 The Butterﬂy Network. Sources S

1

and S

2

multicast their information to receivers

R

1

and R

2

.

6 Introduction

Now assume that both receivers want to simultaneously receive the

information from both sources. That is, we are interested in multicast-

ing. We then have a “contention” for the use of edge CE, arising from

the fact that through this edge we can only send one bit per time slot.

However, we would like to simultaneously send bit x

1

to reach receiver

R

2

and bit x

2

to reach receiver R

1

.

Traditionally, information ﬂow was treated like ﬂuid through pipes,

and independent information ﬂows were kept separate. Applying this

approach we would have to make a decision at edge CE: either use it

to send bit x

1

, or use it to send bit x

2

. If for example we decide to

send bit x

1

, then receiver R

1

will only receive x

1

, while receiver R

2

will

receive both x

1

and x

2

.

The simple but important observation made in the seminal work by

Ahlswede et al. is that we can allow intermediate nodes in the network

to process their incoming information streams, and not just forward

them. In particular, node C can take bits x

1

and x

2

and xor them to

create a third bit x

3

= x

1

+ x

2

which it can then send through edge

CE (the xor operation corresponds to addition over the binary ﬁeld).

R

1

receives ¦x

1

, x

1

+ x

2

¦, and can solve this system of equations to

retrieve x

1

and x

2

. Similarly, R

2

receives ¦x

2

, x

1

+ x

2

¦, and can solve

this system of equations to retrieve x

1

and x

2

.

The previous example shows that if we allow intermediate node in the

network to combine information streams and extract the information

at the receivers, we can increase the throughput when multicasting.

This observation is generalized to the main theorem for multicasting in

Chapter 2.

Wireless Resources

In a wireless environment, network coding can be used to oﬀer beneﬁts

in terms of battery life, wireless bandwidth, and delay.

Example 1.2. Consider a wireless ad-hoc network, where devices A

and C would like to exchange the binary ﬁles x

1

and x

2

using device

B as a relay. We assume that time is slotted, and that a device can

1.1 Introductory Examples 7

Fig. 1.3 Nodes A and C exchange information via relay B. The network coding approach

uses one broadcast transmission less.

either transmit or receive a ﬁle during a timeslot (half-duplex commu-

nication). Figure 1.3 depicts on the left the standard approach: nodes

A and C send their ﬁles to the relay B, who in turn forwards each ﬁle

to the corresponding destination.

The network coding approach takes advantage of the natural capa-

bility of wireless channels for broadcasting to give beneﬁts in terms of

resource utilization, as illustrated in Figure 1.3. In particular, node C

receives both ﬁles x

1

and x

2

, and bitwise xors them to create the ﬁle

x

1

+ x

2

, which it then broadcasts to both receivers using a common

transmission. Node A has x

1

and can thus decode x

2

. Node C has x

2

and can thus decode x

1

.

This approach oﬀers beneﬁts in terms of energy eﬃciency (node B

transmits once instead of twice), delay (the transmission is concluded

after three instead of four timeslots), wireless bandwidth (the wireless

channel is occupied for a smaller amount of time), and interference

(if there are other wireless nodes attempting to communicate in the

neighborhood).

The beneﬁts in the previous example arise from that broadcast trans-

missions are made maximally useful to all their receivers. Network cod-

ing for wireless is examined in the second part of this review. As we will

8 Introduction

discuss there, x

1

+ x

2

is nothing but some type of binning or hashing

for the pair (x

1

, x

2

) that the relay needs to transmit. Binning is not

a new idea in wireless communications. The new element is that we

can eﬃciently implement such ideas in practice, using simple algebraic

operations.

Security

Sending linear combinations of packets instead of uncoded data oﬀers

a natural way to take advantage of multipath diversity for security

against wiretapping attacks. Thus systems that only require protection

against such simple attacks, can get it “for free” without additional

security mechanisms.

Example 1.3. Consider node A that sends information to node D

through two paths ABD and ACD in Figure 1.4. Assume that an

adversary (Calvin) can wiretap a single path, and does not have access

to the complementary path. If the independent symbols x

1

and x

2

are sent uncoded, Calvin can intercept one of them. If instead linear

combinations (over some ﬁnite ﬁeld) of the symbols are sent through

the diﬀerent routes, Calvin cannot decode any part of the data. If for

example he retrieves x

1

+ x

2

, the probability of his guessing correctly

x

1

equals 50%, the same as random guessing.

Similar ideas can also help to identify malicious traﬃc and to protect

against Byzantine attacks, as we will discuss in the second part of this

review.

Fig. 1.4 Mixing information streams oﬀers a natural protection against wiretapping.

1.1 Introductory Examples 9

1.1.2 Challenges

The deployment of network coding is challenged by a number of issues

that will also be discussed in more detail throughout the review. Here

we brieﬂy outline some major concerns.

Complexity

Employing network coding requires nodes in the network to have addi-

tional functionalities.

Example 1.4. In Example 1.2, Figure 1.3, node B has additional

memory requirements (needs to store ﬁle x

1

instead of immediately

broadcasting it), and has to perform operations over ﬁnite ﬁelds

(bitwise xor x

1

and x

2

). Moreover, nodes A and C need to also

keep their own information stored, and then solve a system of linear

equations.

An important question in network coding research today is assessing the

complexity requirements of network coding, and investigating trade-

oﬀs between complexity and performance. We discuss such questions

in Chapter 7.

Security

Networks where security is an important requirement, such as net-

works for banking transactions, need to guarantee protection against

sophisticated attacks. The current mechanisms in place are designed

around the assumption that the only eligible entities to tamper with

the data are the source and the destination. Network coding on the

other hand requires intermediate routers to perform operations on

the data packets. Thus deployment of network coding in such net-

works would require to put in place mechanisms that allow network

coding operations without aﬀecting the authenticity of the data. Ini-

tial eﬀorts toward this goal are discussed in the second part of the

review.

10 Introduction

Integration with Existing Infrastructure

As communication networks evolve toward an ubiquitous infrastruc-

ture, a challenging task is to incorporate the emerging technologies

such as network coding, into the existing network architecture. Ideally,

we would like to be able to proﬁt from the leveraged functionalities

network coding can oﬀer, without incurring dramatic changes in the

existing equipment and software. A related open question is, how could

network coding be integrated in current networking protocols. Making

this possible is also an area of current research.

2

The Main Theorem of Network Multicast

Network multicast refers to simultaneously transmitting the same infor-

mation to multiple receivers in the network. We are concerned with suf-

ﬁcient and necessary conditions that the network has to satisfy to be

able to support the multicast at a certain rate. For the case of unicast

(when only one receiver at the time uses the network), such conditions

have been known for the past 50 years, and, clearly, we must require

that they hold for each receiver participating in the multicast. The

fascinating fact that the main network coding theorem brings is that

the conditions necessary and suﬃcient for unicast at a certain rate to

each receiver are also necessary and suﬃcient for multicast at the same

rate, provided the intermediate network nodes are allowed to combine

and process diﬀerent information streams. We start this chapter with

a suitable single unicast scenario study, and then move to multicast.

We use an algebraic approach to present and prove the network mul-

ticast theorem, and in the process, introduce basic notions in network

coding.

11

12 The Main Theorem of Network Multicast

2.1 The Min-Cut Max-Flow Theorem

Let G = (V, E) be a graph (network) with the set of vertices V and the

set of edges E ⊂ V V . We assume that each edge has unit capacity,

and allow parallel edges. Consider a node S ∈ V that wants to transmit

information to a node R ∈ V .

Deﬁnition 2.1. A cut between S and R is a set of graph edges whose

removal disconnects S from R. A min-cut is a cut with the smallest

(minimal) value. The value of the cut is the sum of the capacities of

the edges in the cut.

For unit capacity edges, the value of a cut equals the number of edges

in the cut, and it is sometimes referred to as the size of the cut. We will

use the term min-cut to refer to both the set of edges and to their total

number. Note that there exists a unique min-cut value, and possibly

several min-cuts, see for example Figure 2.1.

One can think about a min-cut as being a bottleneck for information

transmission between source S and receiver R. Indeed, the celebrated

max-ﬂow, min-cut theorem, which we discuss below, claims that the

maximum information rate we can send from S to R is equal to the

min-cut value.

Fig. 2.1 A unicast connection over a network with unit capacity edges. The min-cut between

S and R equals three. There exist three edge disjoint paths between S and R that bring

symbols x

1

, x

2

, and x

3

to the receiver.

2.1 The Min-Cut Max-Flow Theorem 13

Theorem 2.1. Consider a graph G = (V, E) with unit capacity edges,

a source vertex S, and a receiver vertex R. If the min-cut between S

and R equals h, then the information can be send from S to R at a

maximum rate of h. Equivalently, there exist exactly h edge-disjoint

paths between S and R.

We next give a constructive proof that, under the conditions of the

theorem, there exist exactly h edge-disjoint paths between S and R.

Since these paths are made up of unit capacity edges, information can

be sent through each path at unit rate, giving the cumulative rate h

over all paths. The procedure we present for constructing edge-disjoint

paths between S and R is a part of the ﬁrst step for all network code

design algorithms that we discuss in Chapter 5.

Proof. Assume that the min-cut value between S and R equals h.

Clearly, we cannot ﬁnd more than h edge disjoint paths, as otherwise

removing h edges would not disconnect the source from the receiver.

We prove the opposite direction using a “path-augmenting” algorithm.

The algorithm takes h steps; in each step we ﬁnd an additional unit

rate path from the source to the receiver.

Let p

e

uv

be an indicator variable associated with an edge e that

connects vertex u ∈ V to vertex v ∈ V . (There may exist multiple edges

that connect u and v.)

Step 0: Initially, set p

e

uv

= 0 for all edges e ∈ E.

Step 1: Find a path P

1

from S to R, P

1

= ¦v

1

0

= S, v

1

1

, v

1

2

, . . . , v

1

1

=

R¦, where

1

is the length of the path, and set p

e

v

1

i

v

1

i+1

= 1,

0 ≤ i <

1

, using one edge between every consecutive pair of

nodes. The notation p

e

v

1

i

v

1

i+1

= 1 indicates that the edge e has

been used in the direction from v

1

i

to v

1

i+1

.

Step k: (2 ≤ k ≤ h) Find a path P

k

= ¦v

k

0

= S, v

k

1

, v

k

2

, . . . , v

k

k

= R¦ of

length

k

so that the following condition holds:

There exists edge e between v

k

i

, v

k

i+1

, 0 ≤ i <

k

such that

p

e

v

k

i

v

k

i+1

= 0 or p

e

v

k

i+1

v

k

i

= 1. (2.1)

14 The Main Theorem of Network Multicast

Accordingly, set p

e

v

k

i

v

k

i+1

= 1 or p

e

v

k

i+1

v

k

i

= 0. This means that

each newly found path uses only edges which either have not

been used or have been used in the opposite direction by the

previous paths.

Note that each step of the algorithm increases the number of edge

disjoint paths that connect the source to the receiver by one, and thus

at the end of step k we have identiﬁed k edge disjoint paths.

To prove that the algorithm works, we need to prove that, at each

step k, with 1 ≤ k ≤ h, there will exist a path whose edges satisfy the

condition in (2.1). We will prove this claim by contradiction.

(1) Assume that the min-cut to the receiver is h, but at step

k ≤ h we cannot ﬁnd a path satisfying (2.1).

(2) We will recursively create a subset 1 of the vertices of V .

Initially 1 = ¦S¦. If for a vertex v ∈ V there exists an edge

connecting v and S that satisﬁes (2.1), include v to 1. Con-

tinue adding to 1 vertices v ∈ V such that, for some u ∈ 1,

there exists an edge between u and v that satisﬁes the con-

dition in (2.1) (namely, p

e

uv

= 0 or p

e

vu

= 1), until no more

vertices can be added.

(3) By the assumption in (1), we know that 1 does not con-

tain the receiver R, otherwise we would have found the

desired path. That is, the receiver belongs to 1 = V ¸ 1. Let

ϑ1 = ¦e[e = (u, v) ∈ E with u ∈ 1, v ∈ 1¦ denote the set of

all edges e that connect 1 with 1. These edges form a cut.

By the construction of 1, p

e

uv

= 1 and p

e

vu

= 0 for all edges

e ∈ ϑ1. But from (1),

e∈ϑV

p

e

uv

≤ k − 1, and thus there

exists a cut of value at most k − 1 < h, which contradicts

the premise of the theorem.

2.2 The Main Network Coding Theorem

We now consider a multicast scenario over a network G = (V, E)

where h unit rate sources S

1

, . . . , S

h

located on the same network

node S (source) simultaneously transmit information to N receivers

2.2 The Main Network Coding Theorem 15

R

1

, . . . , R

N

. We assume that G is an acyclic directed graph with unit

capacity edges, and that the value of the min-cut between the source

node and each of the receivers is h. For the moment, we also assume

zero delay, meaning that during each time slot all nodes simultaneously

receive all their inputs and send their outputs.

What we mean by unit capacity edges: The unit capacity edges

assumption models a transmission scenario in which time is slotted, and

during each time-slot we can reliably (with no errors) transmit through

each edge a symbol from some ﬁnite ﬁeld F

q

of size q. Accordingly,

each unit rate source S

i

emits σ

i

, 1 ≤ i ≤ h, which is an element of the

same ﬁeld F

q

. In practice, this model corresponds to the scenario in

which every edge can reliably carry one bit and each source produces

one bit per time-unit. Using an alphabet of size, say, q = 2

m

, simply

means that we send the information from the sources in packets of m

bits, with m time-units being deﬁned as one time-slot. The m bits are

treated as one symbol of F

q

and processed using operations over F

q

by the network nodes. We refer to such transmission mechanisms as

transmission schemes over F

q

. When we say that a scheme exists “over

a large enough ﬁnite ﬁeld F

q

,” we eﬀectively mean that there exists

“a large enough packet length.”

2.2.1 Statement of the Theorem

We state the main theorem in network coding as follows:

Theorem 2.2. Consider a directed acyclic graph G = (V, E) with unit

capacity edges, h unit rate sources located on the same vertex of the

graph and N receivers. Assume that the value of the min-cut to each

receiver is h. Then there exists a multicast transmission scheme over a

large enough ﬁnite ﬁeld F

q

, in which intermediate network nodes lin-

early combine their incoming information symbols over F

q

, that delivers

the information from the sources simultaneously to each receiver at a

rate equal to h.

From the min-cut max-ﬂow theorem, we know that there exist

exactly h edge-disjoint paths between the sources and each of the

16 The Main Theorem of Network Multicast

receivers. Thus, if any of the receivers, say, R

j

, is using the network

by itself, the information from the h sources can be routed to R

j

through a set of h edge disjoint paths. When multiple receivers are

using the network simultaneously, their sets of paths may overlap.

The conventional wisdom says that the receivers will then have to

share the network resources, (e.g., share the overlapping edge capac-

ity or share the access to the edge in time), which leads to reduced

rates. However, Theorem 2.2 tells us that, if we allow intermediate

network nodes to not only forward but also combine their incom-

ing information ﬂows, then each of the receivers will be getting the

information at the same rate as if it had sole access to network

resources.

The theorem additionally claims that it is suﬃcient for intermediate

nodes to perform linear operations, namely, additions and multiplica-

tions over a ﬁnite ﬁeld F

q

. We will refer to such transmission schemes

as linear network coding. Thus the theorem establishes the existence of

linear network codes over some large enough ﬁnite ﬁeld F

q

. To reduce

computational complexity, the ﬁeld F

q

should be chosen as small as

possible. In order to provide a very simple proof of Theorem 2.2, we

will not worry, for a moment, about being eﬃcient in terms of the ﬁeld

size. This question will be addressed in Chapter 7, where we discuss

resources required for network coding.

2.2.2 An Equivalent Algebraic Statement of the Theorem

In order to route the h information sources to a particular receiver, we

ﬁrst have to ﬁnd h edge-disjoint paths that connect the source node

to this receiver. We can do that by using, for example, the algorithm

given within the proof of the min-cut max-ﬂow theorem, which is in

fact a special case of the Ford–Fulkerson algorithm. In the multicast

case, we have to ﬁnd one set of such paths for each of the N receivers.

Note that the paths from diﬀerent sets may overlap. The example in

Figure 2.2 shows a network with two sources and three receivers. Each

subﬁgure shows a set of two edge disjoint paths for diﬀerent receivers.

Observe that the paths toward diﬀerent receivers overlap over edges

BD and GH.

2.2 The Main Network Coding Theorem 17

Fig. 2.2 The paths to the three receivers overlap for example over edges BD and GH.

If we were restricted to routing, when two paths bringing symbols σ

i

and σ

j

from sources S

i

and S

j

, i ,= j, overlap over a unit capacity edge,

we could forward only one of these two symbols (or timeshare between

them). In linear network coding, instead, we can transmit through the

shared edge a linear combination of σ

i

and σ

j

over F

q

. Such opera-

tions may be performed several times throughout the network, that is,

if paths bringing diﬀerent information symbols use the same edge e,

a linear combination of these symbols is transmitted through e. The

coeﬃcients used to form this linear combination constitute what we

call a local coding vector c

(e) for edge e.

Deﬁnition 2.2. The local coding vector c

**(e) associated with an edge e
**

is the vector of coeﬃcients over F

q

with which we multiply the incoming

symbols to edge e. The dimension of c

(e) is 1 [In(e)[, where In(e) is

the set of incoming edges to the parent node of e.

Since we do not know what values should the coeﬃcients in the local

coding vectors take, we assume that each used coeﬃcient is an unknown

variable, whose value will be determined later. For the example in

Figure 2.2, the linear combination of information is shown in Figure 2.3.

The local coding vectors associated with edges BD and GH are

c

(BD) = [α

1

α

2

] and c

(GH) = [α

3

α

4

].

18 The Main Theorem of Network Multicast

Fig. 2.3 The linear network coding solution sends over edges BD and GH linear combina-

tions of their incoming ﬂows.

We next discuss how this linear combining can enable the receivers to

get the information at rate h.

Note that, since we start from the source symbols and then at inter-

mediate nodes only perform linear combining of the incoming symbols,

through each edge of G ﬂows a linear combination of the source sym-

bols. Namely, the symbol ﬂowing through some edge e of G is given by

c

1

(e)σ

1

+ c

2

(e)σ

2

+ + c

h

(e)σ

h

=

_

c

1

(e) c

2

(e) c

h

(e)

¸

. ¸¸ .

c(e)

_

¸

¸

¸

_

σ

1

σ

2

.

.

.

σ

h

_

¸

¸

¸

_

,

where the vector c(e) = [c

1

(e) c

2

(e) . . . c

h

(e)] belongs to an h-

dimensional vector space over F

q

. We shall refer to the vector c(e)

as the global coding vector of edge e, or for simplicity as the coding

vector.

Deﬁnition 2.3. The global coding vector c(e) associated with an edge

e is the vector of coeﬃcients of the source symbols that ﬂow (linearly

combined) through edge e. The dimension of c(e) is 1 h.

2.2 The Main Network Coding Theorem 19

The coding vectors associated with the input edges of a receiver

node deﬁne a system of linear equations that the receiver can solve

to determine the source symbols. More precisely, consider receiver R

j

.

Let ρ

j

i

be the symbol on the last edge of the path (S

i

, R

j

), and A

j

the

matrix whose ith row is the coding vector of the last edge on the path

(S

i

, R

j

). Then the receiver R

j

has to solve the following system of linear

equations:

_

¸

¸

¸

¸

_

ρ

j

1

ρ

j

2

.

.

.

ρ

j

h

_

¸

¸

¸

¸

_

= A

j

_

¸

¸

¸

_

σ

1

σ

2

.

.

.

σ

h

_

¸

¸

¸

_

(2.2)

to retrieve the information symbols σ

i

, 1 ≤ i ≤ h, transmitted from the

h sources. Therefore, choosing global coding vectors so that all A

j

,

1 ≤ j ≤ N, are full rank will enable all receivers to recover the source

symbols from the information they receive. There is one more condition

that these vectors have to satisfy: the global coding vector of an output

edge of a node has to lie in the linear span of the coding vectors of the

node’s input edges. For example, in Figure 2.3, the coding vector c(GH)

is in the linear span of c(DG) and c(FG).

Alternatively, we can deal with local coding vectors. Clearly, given

all the local coding vectors for a network, we can compute global coding

vectors, and vice versa. The global coding vectors associated with edges

BD and GH in Figure 2.3 are

c(BD) = [α

1

α

2

] and c(GH) = [α

3

+ α

1

α

4

α

2

α

4

].

Consequently, matrices A

j

can be expressed in terms of the compo-

nents of the local coding vectors ¦α

k

¦. For our example in Figure 2.3,

the three receivers observe the linear combinations of source symbols

deﬁned by the matrices

A

1

=

_

1 0

α

3

+ α

1

α

4

α

2

α

4

_

, A

2

=

_

0 1

α

1

α

2

_

, and

A

3

=

_

α

1

α

2

α

3

+ α

1

α

4

α

2

α

4

_

.

20 The Main Theorem of Network Multicast

The network code design problem is to select values for the coeﬃcients

¦α

k

¦ = ¦α

1

, . . . , α

4

¦, so that all matrices A

j

, 1 ≤ j ≤ 3, are full rank.

The main multicast theorem can be expressed in algebraic language as

follows:

Algebraic Statement of the Main Theorem: In linear network

coding, there exist values in some large enough ﬁnite ﬁeld F

q

for the

components ¦α

k

¦ of the local coding vectors, such that all matrices A

j

,

1 ≤ j ≤ N, deﬁning the information that the receivers observe, are full

rank.

In the following section, we prove this equivalent claim of the main

coding theorem for multicast.

2.2.3 Proof of the Theorem

The requirement that all A

j

, 1 ≤ j ≤ N, be full rank is equivalent to

the requirement that the product of the determinants of A

j

be diﬀerent

from zero:

f(¦α

k

¦) det(A

1

)det(A

2

) det(A

N

) ,= 0, (2.3)

where f(¦α

k

¦) is used to denote this product as a function of ¦α

k

¦.

Note that, because we assumed that the graph G is acyclic, i.e.,

there are no feedback loops, and because we only perform linear oper-

ations at intermediate nodes of the network, each entry of matrix A

j

is a polynomial in ¦α

k

¦. We will also show this rigorously in the next

chapter. Therefore, the function f(¦α

k

¦) is also a polynomial in ¦α

k

¦

of ﬁnite total degree.

Next, we show that f(¦α

k

¦) is not identically equal to zero on some

ﬁeld. Since f(¦α

k

¦) is a product, it is suﬃcient to show that none of

its factors, namely, none of the detA

j

, is identically equal to zero. This

follows directly from the min-cut max-ﬂow theorem: simply ﬁnd h-edge

disjoint paths from the sources to receiver R

j

, and then use value one

for the coeﬃcients corresponding to the edges of these paths, and value

zero for the remaining coeﬃcients. This coeﬃcient assignment makes

A

j

to be the h h identity matrix.

2.2 The Main Network Coding Theorem 21

When dealing with polynomials over a ﬁnite ﬁeld, one has to keep

in mind that even a polynomial not identically equal to zero over that

ﬁeld can evaluate to zero on all of its elements. Consider, for example,

the polynomial x(x + 1) over F

2

, or the following set of matrices:

A

1

=

_

x

2

x(1 + x)

x(1 + x) x

2

_

and A

2

=

_

1 x

1 1

_

. (2.4)

Over F

2

, at least one of the determinants of these matrices is equal to

zero (namely, for x = 0, det(A

1

) = 0 and x = 1, det(A

2

) = 0).

The following lemma tells us that we can ﬁnd values for the coef-

ﬁcients ¦α

k

¦ over a large enough ﬁnite ﬁeld such that the condition

f(¦α

k

¦) ,= 0 in (2.3) holds, and thus concludes our proof. For exam-

ple, for the matrices in (2.4) we can use x = 2 over F

3

so that both

det(A

1

) ,= 0 and det(A

2

) ,= 0.

Lemma 2.3 (Sparse Zeros Lemma). Let f(α

1

, . . . , α

η

) be a mul-

tivariate polynomial in variables α

1

, . . . , α

η

, with maximum degree in

each variable of at most d. Then, in every ﬁnite ﬁeld F

q

of size q > d on

which f(α

1

, . . . , α

η

) is not identically equal to zero, there exist values

p

1

, . . . , p

η

, such that f(α

1

= p

1

, . . . , α

η

= p

η

) ,= 0.

Proof. We prove the lemma by induction in the number of variables η:

(1) For η = 1, we have a polynomial in a single variable of degree

at most d. The lemma holds immediately since such polyno-

mials can have at most d roots.

(2) For η > 1, the inductive hypothesis is that the claim holds

for all polynomials with fewer than η variables. We can

express f(α

1

, . . . , α

η

) as a polynomial in variable α

η

with coef-

ﬁcients ¦f

i

¦ that are polynomials in the remaining variables

α

1

, . . . , α

η−1

:

f(α

1

, . . . , α

η

) =

d

i=0

f

i

(α

1

, . . . , α

η−1

)α

i

η

. (2.5)

Consider a ﬁeld F

q

with q > d on which f(α

1

, . . . , α

η

) is not

identically equal to zero. Then at least one of the polyno-

mials ¦f

i

¦ is not identically equal to zero on F

q

, say f

j

.

22 The Main Theorem of Network Multicast

Since the number of variables in f

j

is smaller than η, by the

inductive hypothesis, there exist values α

1

= p

1

, . . . , α

η−1

=

p

η−1

such that f

j

(p

1

, . . . , p

η−1

) ,= 0. Substituting these val-

ues, f(p

1

, . . . , p

η−1

, α

η

) becomes a polynomial in the single

variable α

η

with degree of at most d and at least one coeﬃ-

cient diﬀerent than zero in F

q

. The lemma then holds from

the argument in (1).

Note that we have given a simple proof that network codes exist

under certain assumptions, rather than a constructive proof. A number

of questions naturally arise at this point, and we will discuss them in the

next session. In Chapter 3, we will learn how to calculate the parame-

terized matrices A

j

for a given graph and multicast conﬁguration, and

give an alternative information-theoretical proof of the main theorem.

In Chapter 5, we will look into the existing methods for network code

design, i.e., methods to eﬃciently choose the linear coeﬃcients.

2.2.4 Discussion

Theorem 2.2 claims the existence of network codes for multicast, and

immediately triggers a number of important questions. Can we con-

struct network codes eﬃciently? How large does the operating ﬁeld F

q

need to be? What are the complexity requirements of network cod-

ing? Do we expect to get signiﬁcant beneﬁts, i.e., is coding/decoding

worth the eﬀort? Do network codes exist for other types of network

scenarios, such as networks with noisy links, or wireless networks, or

traﬃc patterns other than multicast? What is the possible impact on

applications? We will address some of these issues in later chapters.

In the rest of this chapter, we check how restrictive are the modeling

assumptions we made, that is, whether the main theorem would hold

or could be easily extended if we removed or relaxed these assumptions.

Unit capacity edges – not restrictive, provided that the capacity of

the edges are rational numbers and we allow parallel edges.

Collocated sources – not restrictive. We can add to the graph an

artiﬁcial common source vertex, and connect it to the h source vertices

through unit capacity edges.

2.2 The Main Network Coding Theorem 23

Fig. 2.4 An undirected graph where the main theorem does not hold.

Directed graph – restrictive. The gain in network coding comes from

combining information carried by diﬀerent paths whenever they overlap

on an edge. However, in undirected graphs, diﬀerent paths may traverse

the same edge in opposite direction. This is the case in the example in

Figure 2.4 for edge CD. For this example, network coding does not oﬀer

any beneﬁts: we can achieve the maximal rate of 1.5 to each receiver by

time-sharing across edge CD. On the other hand, there exist undirected

graphs where use of network coding oﬀers beneﬁts. Such an example

is provided by the butterﬂy network in Figure 1.2, if we assume that

the edges of the network are undirected. Without network coding we

can achieve rate 1.5 to each receiver, while using network coding we can

achieve rate 2.

Zero delay – not restrictive. All practical networks introduce delay.

From a theoretical point of view, delay introduces a relationship

between network coding and convolutional codes. We discuss networks

with delay in Chapter 6.

Acyclic graph – not restrictive. Note that the underlying graph for

the example in Figure 2.2 contains the cycle ¦FG, GH, HF¦, and this

fact did not come up in our discussion. However, there exist network

conﬁgurations where we need to deal with the underlying cyclic graphs

by taking special action, for example by introducing memory, as we

discuss in Chapter 6.

Same min-cut values – restrictive. If the receivers have diﬀerent

min-cuts, we can multicast at the rate equal to the minimum of the

24 The Main Theorem of Network Multicast

min-cuts, but cannot always transmit to each receiver at the rate equal

to its min-cut. For example, consider again the butterﬂy network in

Figure 1.2 and assume that each vertex of the graph is a receiver.

We will then have receivers with min-cut two, and receivers with min-

cut one. Receivers with min-cut two require that linear combining is

performed somewhere in the network, while receivers with min-cut one

require that only routing is employed. These two requirements are not

always compatible.

Notes

The min-cut max-ﬂow theorem was proven in 1927 by Menger [40]. It

was also proven in 1956 by Ford and Fulkerson [22] and independently

the same year by Elias et al. [20]. There exist many diﬀerent variations

of this theorem, see for example [18]. The proof we gave follows the

approach in [8], and it is speciﬁc to unit capacity undirected graphs.

The proof for arbitrary capacity edges and/or directed graphs is a direct

extension of the same ideas.

The main theorem in network coding was proved by Ahlswede et al.

[2, 36] in a series of two papers in 2000. The ﬁrst origin of these ideas

can be traced back to Yeung’s work in 1995 [48]. Associating a random

variable with the unknown coeﬃcients when performing linear network

coding was proposed by Koetter and M´edard [32]. The sparse zeros

lemma is published by Yeung et al. in [49] and by Harvey in [25],

and is a variation of a result proved in 1980 by Schwartz [45]. The

proof approach of the main theorem also follows the proof approach

in [25, 49].

3

Theoretical Frameworks for Network Coding

Network coding can and has been studied within a number of diﬀerent

theoretical frameworks, in several research communities. The choice of

framework a researcher makes most frequently depends on his back-

ground and preferences. However, one may also argue that each net-

work coding issue (e.g., code design, throughput beneﬁts, complexity)

should be put in the framework in which it can be studied the most

naturally and eﬃciently.

We here present tools from the algebraic, combinatorial, information

theoretic, and linear programming frameworks, and point out some of

the important results obtained within each framework. At the end, we

discuss diﬀerent types of routing and coding. Multicasting is the only

traﬃc scenario examined in this chapter. However, the presented tools

and techniques can be extended to and are useful for studying other

more general types of traﬃc as well.

3.1 A Network Multicast Model

3.1.1 The Basic Model

A multicast scenario is described by a directed graph G = (V, E), a

source vertex S ∈ V (on which h unit rate sources S

i

are collocated),

25

26 Theoretical Frameworks for Network Coding

and a set ¹= ¦R

1

, R

2

, . . . , R

N

¦ of N receivers. We will refer to these

three ingredients together as a (multicast) instance ¦G, S, ¹¦. From

the main theorem on network coding (Theorem 2.2), we know that a

necessary and suﬃcient condition for feasibility of rate h multicast is

that the min-cut to each receiver be greater or equal to h. We call this

condition the multicast property for rate h.

Deﬁnition 3.1. An instance ¦G, S, ¹¦ satisﬁes the multicast property

for rate h if the min-cut value between S and each receiver is greater

than or equal to h.

Let ¦(S

i

, R

j

), 1 ≤ i ≤ h¦ be a set of h edge-disjoint paths from the

sources to the receiver R

j

. Under the assumption that the min-cut to

each receiver equals at least h, the existence of such paths is guaranteed

by the min-cut max-ﬂow theorem. The choice of the paths is not unique,

and, as we discuss later, aﬀects the complexity of the network code.

Our object of interest is the subgraph G

of G consisting of the hN

paths (S

i

, R

j

), 1 ≤ i ≤ h, 1 ≤ j ≤ N. Clearly, the instance ¦G

, S, ¹¦

also satisﬁes the multicast property.

As discussed in Chapter 2, in linear network coding, each node of G

receives an element of F

q

from each input edge, and then forwards (pos-

sibly diﬀerent) linear combinations of these symbols to its output edges.

The coeﬃcients deﬁning the linear combination on edge e are collected

in a local coding vector c

(e) of dimension 1 [In(e)[, where In(e) is

the set of edges incoming to the parent node of e (see Deﬁnition 2.2).

As a consequence of this local linear combining, each edge e carries a

linear combination of the source symbols, and the coeﬃcients describ-

ing this linear combination are collected in the h-dimensional global

coding vector c(e) = [c

1

(e) c

h

(e)] (see Deﬁnition 2.3). The symbol

through edge e is given by

c

1

(e)σ

1

+ + c

h

(e)σ

h

.

Receiver R

j

takes h coding vectors from its h input edges to form the

rows of the matrix A

j

, and solves the system of linear equations (2.2).

Network code design is concerned with assigning local coding vectors,

or equivalently, coding vectors, to each edge of the graph.

3.1 A Network Multicast Model 27

Deﬁnition 3.2. An assignment of coding vectors is feasible if the cod-

ing vector of an edge e lies in the linear span of the coding vectors

of the parent edges In(e). A valid linear network code is any feasible

assignment of coding vectors such that the matrix A

j

is full rank for

each receiver R

j

, 1 ≤ j ≤ N.

We have seen that for multicast networks satisfying the conditions of

the main network coding theorem, a valid linear network code exists

over some large enough ﬁeld. There are, however, some other network

traﬃc scenarios for which there are no valid linear network codes, and

then some for which there are neither linear nor nonlinear valid network

codes.

Deﬁnition 3.3. A network is solvable if there exist operations the

nodes of the network can perform so that each receiver experiences the

same maximum rate as if it were using all the network resources by

itself. A network is linearly solvable if these operations are linear.

All multicast instances are, therefore, linearly solvable.

Observe now that in linear network coding, we need to perform lin-

ear combining at edge e, only if there exist two or more paths that share

e but use distinct edges of In(e). We then say that edge e is a coding

point. In practice, these are the places in the network where we require

additional processing capabilities, as opposed to simple forwarding.

Deﬁnition 3.4. Coding points are the edges of the graph G

where we

need to perform network coding operations.

Finally, we will be interested in minimal graphs, namely, those that do

not have redundant edges.

Deﬁnition 3.5. A graph G is called minimal with the multicast prop-

erty if removing any edge would violate this property.

Identifying a minimal graph before multicasting may allow us to use

less network resources and reduce the number of coding points as the

following example illustrates.

28 Theoretical Frameworks for Network Coding

Example 3.1. Consider the network with two sources and two

receivers shown in Figure 3.1(a) that is created by putting one butter-

ﬂy network on the top of another. A choice of two sets of edge-disjoint

paths (corresponding to the two receivers) is shown in Figures 3.1(b)

and 3.1(c). This choice results in using two coding points, i.e., linearly

combining ﬂows at edges AB and CD. Notice, however, that nodes

A and B in Figure 3.1(a) and their incident edges can be removed

without aﬀecting the multicast condition. The resulting graph is then

minimal, has a single coding point, and is identical to the butterﬂy

network shown in Figure 1.2.

We can extend the same construction by stacking k butterﬂy networks

to create a non-minimal conﬁguration with k coding points. On the

other hand, as we prove in Section 3.3, minimal conﬁgurations with

two sources and two receivers have at most one coding point. Thus,

identifying minimal conﬁgurations can help to signiﬁcantly reduce the

number of coding points.

Identifying a minimal conﬁguration for an instance ¦G, S, ¹¦ can be

always done in polynomial time, as we discuss in Chapter 5. Contrary

Fig. 3.1 A network with two sources and two receivers: (a) the original graph, (b) two edge-

disjoint paths from the sources to the receiver R

1

, and (c) two edge-disjoint paths from the

sources to the receiver R

2

.

3.1 A Network Multicast Model 29

to that, identifying a minimum conﬁguration (that has the minimum

number of coding points) is in most cases an NP-hard problem, as we

discuss in Chapter 7.

3.1.2 The Line Graph Model

To deﬁne a network code, we eventually have to specify which linear

combination of source symbols each edge carries. Thus, it is often more

transparent to work with the graph

γ =

_

1≤i≤h

1≤j≤N

L(S

i

, R

j

), (3.1)

where L(S

i

, R

j

) denotes the line graph of the path (S

i

, R

j

), that is, each

vertex of L(S

i

, R

j

) represents an edge of (S

i

, R

j

), and any two vertices

of L(S

i

, R

j

) are adjacent if and only if their corresponding edges share

a common vertex in (S

i

, R

j

). Figure 3.2 shows the network with two

sources and three receivers which we studied in Chapter 2 together

with its line graph.

Fig. 3.2 A network with 2 sources and 3 receivers, and its line graph.

30 Theoretical Frameworks for Network Coding

Without loss of generality (by possibly introducing an auxiliary

node and h auxiliary edges), we can assume that the source vertex

S in graph G

**has exactly h outgoing edges, one corresponding to each
**

of the h co-located sources. As a result, the line graph contains a node

corresponding to each of the h sources. We refer to these nodes as

source nodes. In Figure 3.2, S

1

A and S

2

C are source nodes.

Deﬁnition 3.6. Source nodes ¦S

e

1

, S

e

2

, . . . , S

e

h

¦ are the nodes of γ cor-

responding to the h sources. Equivalently, they are the h edges in G

**emanating from the source vertex S.
**

Each node in γ with a single input edge merely forwards its input

symbol to its output edges. Each node with two or more input edges

performs a coding operation (linear combining) on its input symbols,

and forwards the result to all of its output edges. These nodes are the

coding points in Deﬁnition 3.4. Using the line graph notation makes

the deﬁnition of coding points transparent. In Figure 3.2(b), BD and

GH are coding points.

Finally, we refer to the node corresponding to the last edge of the

path (S

i

, R

j

) as the receiver node for receiver R

j

and source S

i

. For a

conﬁguration with h sources and N receivers, there exist hN receiver

nodes. In Figure 3.2(b), AF, HF, HK, DK, DE, and CE are receiver

nodes. Our deﬁnitions for feasible and valid network codes directly

translate for line graphs, as well as our deﬁnition for minimality.

Note that vertices corresponding to the edges of In(e) are parent

nodes of the vertex corresponding to e. The edges coming into the

node e are labeled by the coeﬃcients of local coding vector c

(e). Thus

designing a network code amounts to choosing the values ¦α

k

¦ for the

labels of edges in the line graph.

3.2 Algebraic Framework

One of the early approaches to network coding (and deﬁnitely the

one that galvanized the area) was algebraic. Because the approach

is very straightforward and requires little background, we adopted it

for explaining the fundamentals in Chapter 2. Randomized coding,

3.2 Algebraic Framework 31

believed to be of paramount importance for practical network coding,

has been developed within the algebraic framework.

This approach is particularly easy to explain using the notion of

the line graph in which each edge carries either label 1 or the unique

label corresponding to a variable in ¦α

k

¦. The main idea is to think of

each vertex of the line graph (edge in the original graph) as a memory

element that stores an intermediate information symbol. Then, if we

consider a speciﬁc receiver R

j

, our line graph acts as a linear system

with h inputs (the h sources) and h outputs (that the receiver observes),

and up to m := [E[ memory elements. This system is described by the

following set of ﬁnite-dimensional state-space equations:

s

k+1

= As

k

+ Bu

k

,

y

k

= C

j

s

k

+ D

j

u

k

,

(3.2)

where s

k

is the m 1 state vector, y

k

is the h 1 output vector, u

k

is

the h 1 input vector, and A, B, C

j

, and D

j

are matrices with appro-

priate dimensions which we discuss below. (The state space equations

(3.2) can also be interpreted as describing a convolutional code, as we

explain in Chapter 6.) A standard result in linear system theory gives

us the transfer matrix G

j

(T):

G

j

(T) = D

j

+ C

j

(T

−1

I − A)

−1

B, (3.3)

where T is the indeterminate delay operator. Using unit delay, we

obtain the transfer matrix for receiver R

j

:

A

j

= D

j

+ C

j

(I − A)

−1

B. (3.4)

In (3.2) matrix A is common for all receivers and reﬂects the way

the memory elements (states) are connected (network topology). Its

elements are indexed by the states (nodes of the line graph), and an

element of A is nonzero if and only if there is an edge in the line graph

between the indexing states. The nonzero elements equal either to 1

or to an unknown variable in ¦α

k

¦. Network code design amounts to

selecting values for the variable entries in A.

Matrix B is also common for all receivers and reﬂects the way the

inputs (sources) are connected to our graph. Matrices C

j

and D

j

,

32 Theoretical Frameworks for Network Coding

respectively, express how the outputs receiver R

j

observes depend upon

the state variables and the inputs. Matrices B, C

j

, and D

j

can be cho-

sen to be binary matrices by possibly introducing auxiliary vertices and

edges. Recall that the dimension of matrices A, B, C

j

, and D

j

depends

upon the number of edges in our graph, which, in general, can be very

large. In the next section, we discuss a method to signiﬁcantly reduce

this number.

By accordingly ordering the elements of the state space vector,

matrix A becomes strictly upper triangular for acyclic graphs, and

therefore, nilpotent (A

n

= 0 for some positive integer n). Let L denote

the length of the longest path between the source and a receiver. Then

A

L+1

= 0. In other words,

(I − A)

−1

= I + A + A

2

+ + A

L

. (3.5)

This equation immediately implies that the elements of the trans-

fer matrices A

j

are polynomials in the unknown variables ¦α

k

¦, a

result that we used in the proof of the main theorem. Moreover, (3.5)

oﬀers an intuitive explanation of (3.4). It is easy to see that A is

eﬀectively an incidence matrix. The series in (3.5) accounts for all

paths connecting the network edges. The transfer matrix expresses

the information brought along these paths from the sources to the

receivers.

We will from now on, without loss of generality, assume that D

j

= 0

(this we can always do by possibly adding auxiliary edges and increasing

the size m of the state space). Observe that substituting D

j

= 0 in (3.4)

gives A

j

= C

j

(I − A)

−1

B.

We next present a very useful lemma, helpful, for example, in some

code design algorithms. Here, we use it to prove an upper bound on the

ﬁeldsize (code alphabet size) suﬃcient for the existence of anetworkcode.

Lemma 3.1. Let A

j

= C

j

(I − A)

−1

B. Then for A strictly upper tri-

angular

[ detA

j

[ = [ detN

j

[, where N

j

=

_

C

j

0

I − A B

_

. (3.6)

3.3 Combinatorial Framework 33

Proof. Note that

detN

j

= det

_

C

j

0

I − A B

_

= ±det

_

0 C

j

B I − A

_

,

since permuting columns of a matrix only aﬀects the sign of its determi-

nant. Now, using a result known as Schur’s formula for the determinant

of block matrices, we further obtain

±detN

j

= det

_

0 C

j

B I − A

_

= det(I − A)det(C

j

(I − A)

−1

B).

The claim (3.6) follows directly from the above equation by noticing

that, since A is strictly upper triangular, then det(I − A) = 1.

Theorem 3.2. For a multicast network (under the model assumptions

in Section 3.1) an alphabet of size q > N is always suﬃcient.

Proof. Each variable α

k

appears at most once in each matrix N

j

. As a

result, the polynomial

f(α

1

, . . . , α

η

) = detN

1

detN

2

detN

N

(3.7)

has degree at most N in each of the η variables ¦α

k

¦. From

Lemma 5.6, for q > N there exist values p

1

, p

2

, . . . , p

η

∈ F

q

such that

f(α

1

= p

1

, . . . , α

η

= p

η

) ,= 0.

We discuss how tight this bound is and present additional alphabet

size bounds in Chapter 7.

3.3 Combinatorial Framework

We present an approach within the combinatorial framework that is

mainly based on graph theory and to a certain extent on projective

geometry. This approach has been useful for understanding compu-

tational complexity of network coding in terms of the required code

34 Theoretical Frameworks for Network Coding

alphabet size and the number of coding points, which we discuss in

Chapter 7. In this section, we show how we can use it to identify struc-

tural properties of the multicast conﬁgurations. We give this discussion

in two levels. First we discuss the results and structural properties infor-

mally at a high level using the original graph G. At a second level, we

use the line graph approach for a formal description and proofs.

3.3.1 Basic Notions of Subtree Decomposition

The basic idea is to partition the network graph into subgraphs through

which the same information ﬂows, as these are subgraphs within which

we do not have to code. Each such subgraph is a tree, rooted at a coding

point or a source, and terminating either at receivers or other coding

points.

For the example in Figure 3.2, coding points are the edges BD

and GH, and we can partition the graph into four trees. Through

T

1

= ¦S

1

A, AB, AF, FG¦ ﬂows the source symbol σ

1

, and through

T

2

= ¦S

2

C, CB, CE¦ ﬂows the source symbol σ

2

. Since source sym-

bols ﬂow through these trees, we call them source subtrees. Through

T

3

= ¦BD, DE, DG, DK¦ ﬂows the linear combination of source sym-

bols that ﬂows through the coding point BD, and similarly through

T

4

= ¦GH, HF, HK¦ the linear combination that ﬂows through the

coding point GH. We call T

3

and T

4

coding subtrees.

Deﬁnition 3.7. A subtree T

i

is called a source subtree if it starts with

a source and a coding subtree if it starts with a coding point.

Each receiver R

j

observes the linear combination of the sources that

ﬂow through the last edges on the paths (S

i

, R

j

), 1 ≤ i ≤ h. We call

these edges receiver nodes. Each receiver has h receiver nodes.

Deﬁnition 3.8. The receiver nodes for a receiver R

j

are deﬁned to be

the last edges on the paths (S

i

, R

j

), 1 ≤ i ≤ h.

In Figure 3.2 receiver R

1

has the receiver nodes AF and HF.

3.3 Combinatorial Framework 35

For the network code design problem, we only need to know how

the subtrees are connected and which receiver nodes are in each T

i

.

Thus we can contract each subtree to a node and retain only the edges

that connect the subtrees. The resulting combinatorial object, which

we will refer to as the subtree graph Γ, is deﬁned by its underlying

topology (V

Γ

, E

Γ

) and the association of the receivers with nodes V

Γ

.

Figure 3.3(b) shows the subtree graph for the network in Figure 3.2.

The network code design problem is now reduced to assigning an

h-dimensional coding vector c(T

i

) = [c

1

(T

i

) c

h

(T

i

)] to each sub-

tree T

i

. Receiver j observes h coding vectors from h distinct subtrees

to form the rows of the matrix A

j

.

Example 3.2. A valid code for the network in Figure 3.2(a) can be

obtained by assigning the following coding vectors to the subtrees in

Figure 3.3(b):

c(T

1

) = [1 0], c(T

2

) = [0 1], c(T

3

) = [1 1], and c(T

4

) = [0 1].

For this code, the ﬁeld with two elements is suﬃcient. Nodes B and G

in the network (corresponding to coding points BD and GH) perform

binary addition of their inputs and forward the result of the operation.

Fig. 3.3 (a) Line graph with coding points BD and GH for the network in Figure 3.2, and

(b) the associated subtree graph.

36 Theoretical Frameworks for Network Coding

The rest of the nodes in the network merely forward the information

they receive. The matrices for receivers R

1

, R

2

, and R

3

are

A

1

=

_

c(T

1

)

c(T

4

)

_

=

_

1 0

0 1

_

, A

2

=

_

c(T

2

)

c(T

3

)

_

=

_

0 1

1 1

_

, and

A

3

=

_

c(T

3

)

c(T

4

)

_

=

_

1 1

0 1

_

.

3.3.2 Formal Description of Subtree Decomposition

Using the line graph approach (see Figure 3.3), the subtree decompo-

sition can be formally performed as follows. We partition the vertices

of the line graph γ in (3.1) into subsets V

i

so that the following holds:

(1) Each V

i

contains exactly one source node (see Deﬁnition 3.6)

or a coding point (see Deﬁnition 3.4), and

(2) Each node that is neither a source node nor a coding point

belongs to the V

i

which contains its closest ancestral coding

point or source node.

T

i

is the subgraph of γ induced by V

i

(namely, V

i

together with the

edges whose both endpoints are in V

i

). Source subtrees start with one of

the source nodes ¦S

e

1

, S

e

2

, . . . , S

e

h

¦ of the line graph and coding subtrees

with one of the coding points.

We shall use the notation T

i

to describe both the subgraph of

the line graph γ, and the corresponding node in the subtree graph

Γ, depending on the context.

3.3.3 Properties of Subtree Graphs

Here we prove a number of properties for subtree graphs and mini-

mal subtree graphs, and use these properties to show that, for a given

number of sources and receivers, there exists a ﬁnite number of cor-

responding subtree graphs. For example, for a conﬁguration with two

sources and two receivers there exists exactly one multicast conﬁgu-

ration requiring network coding, which corresponds to the butterﬂy

network.

3.3 Combinatorial Framework 37

Theorem 3.3. The subgraphs T

i

satisfy the following properties:

(1) Each subgraph T

i

is a tree starting at either a coding point

or a source node.

(2) The same linear combination of source symbols ﬂows through

all edges in the original graph that belong to the same T

i

.

(3) For each receiver R

j

, there exist h vertex-disjoint paths in

the subtree graph from the source nodes to the receiver nodes

of R

j

.

(4) For each receiver R

j

, the h receiver nodes corresponding to

the last edges on the paths (S

i

, R

j

), 1 ≤ i ≤ h, belong to dis-

tinct subtrees.

(5) Each subtree contains at most N receiver nodes, where N is

the number of receivers.

Proof. We prove the claims one by one.

(1) By construction, T

i

contains no cycles, the source node (or

the coding point) has no incoming edges, and there exists a

path from the source node (or the coding point) to each other

vertex of T

i

. Thus T

i

is a tree rooted at the source node (or

the coding point).

(2) Since T

i

is a tree, the linear combination that ﬂows through

the root (source node or coding point) also has to ﬂow

through the rest of the nodes.

(3) Because the min-cut condition is satisﬁed in the original

network, there exist h edge-disjoint paths to each receiver.

Edge-disjoint paths in the original graph correspond to

vertex-disjoint paths in the line graph γ.

(4) Assume that the receiver nodes corresponding to the last

edges of paths (S

k

, R

j

) and (S

, R

j

) belong to the same sub-

tree T

i

. Then both paths go through the root vertex (source

node or coding point) of subtree T

i

. But the paths for receiver

R

j

are vertex-disjoint.

38 Theoretical Frameworks for Network Coding

(5) This claim holds because there are N receivers and, by the

previous claim, all receiver nodes contained in the same sub-

tree are distinct.

From deﬁnition, the multicast property is satisﬁed in Γ if and only

if the min-cut condition is satisﬁed for every receiver in G. Thus the

following holds:

Lemma 3.4. There is no valid codeword assignment (in the sense of

Deﬁnition 3.2) for a subtree graph which does not satisfy the multicast

property.

We use this lemma to show properties of minimal subtree graphs.

Directly extending Deﬁnition 3.5, a subtree graph is called minimal

with the multicast property if removing any edge would violate the

multicast property.

Theorem 3.5. For a minimal subtree graph, the following holds:

(1) There does not exist a valid network code where a subtree is

assigned the same coding vector as one of its parents.

(2) There does not exist a valid network code where the vec-

tors assigned to the parents of any given subtree are linearly

dependent.

(3) There does not exist a valid network code where the coding

vector assigned to a child belongs to a subspace spanned by

a proper subset of the vectors assigned to its parents.

(4) Each coding subtree has at most h parents.

(5) If a coding subtree has 2 ≤ P ≤ h parents, then there exist P

vertex-disjoint paths from the source nodes to the subtree.

Proof.

(1) Suppose a subtree is assigned the same coding vector as one

of its parents. Then removing the edge(s) between the sub-

tree and the other parent(s) results in a subtree graph with

3.3 Combinatorial Framework 39

a valid coding vector assignment. By Lemma 3.4, the multi-

cast property is also satisﬁed. But we started with a minimal

subtree graph and removed edges without violating the mul-

ticast property, which contradicts minimality.

(2) Suppose there is a subtree T whose parents T

1

, . . . , T

P

are

assigned linearly dependent vectors c(T

1

), . . . , c(T

P

). With-

out loss of generality, assume that c(T

1

) can be expressed

as a linear combination of c(T

2

), . . . , c(T

P

). Then remov-

ing the edge between T and T

1

results in a subtree graph

with a valid coding vector assignment. From Lemma 3.4

the multicast property is also satisﬁed. But this contradicts

minimality.

(3) Suppose there is a subtree T whose parents T

1

, . . . , T

P

are

assigned vectors c(T

1

), . . . , c(T

P

). Without loss of generality,

assume that T is assigned a vector c that is a linear combi-

nation of c(T

2

), . . . , c(T

P

). Then removing the edge between

T and T

1

results in a subtree graph with a valid coding vec-

tor assignment. From Lemma 3.4 the multicast property is

also satisﬁed. But this contradicts minimality.

(4) Since coding vectors are h-dimensional, this claim is a direct

consequence of claim (2).

(5) By claim (2), the coding vectors assigned to the P parent

subtrees must be linearly independent, which requires the

existence of P vertex disjoint paths.

The ﬁrst three claims of Theorem 3.5 describe properties of valid

codes for minimal subtree graphs, while the last two claims describe

structural properties of minimal subtree graphs. The additional struc-

tural properties listed below for networks with two sources directly

follow from Theorems 3.3 and 3.5.

Theorem 3.6. In a minimal subtree decomposition of a network with

h = 2 sources and N receivers:

(1) A parent and a child subtree have a child or a receiver (or

both) in common.

40 Theoretical Frameworks for Network Coding

(2) Each coding subtree contains at least two receiver nodes.

(3) Each source subtree contains at least one receiver node.

Proof.

(1) Suppose that the minimal subtree graph has two source and

m coding subtrees. Consider a network code which assigns

coding vectors [01] and [10] to the source subtrees, and a

diﬀerent vector [1α

k

], 0 < k ≤ m, to each of the coding sub-

trees, where α is a primitive element of F

q

and q > m. As

discussed in Appendix, any two diﬀerent coding vectors form

a basis for F

2

q

, and thus this code is feasible and valid. Let

T

i

and T

j

denote a parent and a child subtree, respectively,

with no child or receiver in common. Now, alter the code

by assigning the coding vector of T

i

to T

j

, and keep the

rest of coding vectors unchanged. This assignment is fea-

sible because T

i

and T

j

do not share a child, which would

have to be assigned a scalar multiple of the coding vector

of T

i

and T

j

. Since T

i

and T

j

do not share a receiver, the

code is still valid. Therefore, by claim (1) of Theorem 3.5,

the conﬁguration is not minimal, which contradicts the

assumption.

(2) Consider a coding subtree T. Let P

1

and P

2

be its parents.

By claim (1), a parent and a child have either a receiver or

a child in common. If T has a receiver in common with each

of its two parents, then T has two receiver nodes. If T and

one of the parents, say P

1

, do not have a receiver in common,

then they have a child in common, say C

1

. Similarly, if T and

C

1

do not have receiver in common, then they have a child

in common. And so forth, following the descendants of P

1

,

one eventually reaches a child of T that is a terminal node of

the subtree graph, and thus has no children. Consequently, T

has to have a receiver in common with this terminal subtree.

Similarly, if T and P

2

do not have a child in common, there

exists a descendant of P

2

and child of T which must have a

receiver in common with T.

3.3 Combinatorial Framework 41

(3) If the two source subtrees have no children, then each must

contain N receiver nodes. If the source subtree has a child,

say T

1

, then by claim (1) it will have a receiver or a child

in common with T

1

. Following the same reasoning as in the

previous claim, we conclude that each source subtree contains

at least one receiver node.

Lemma 3.7. For a network with two sources and two receivers, there

exist exactly two minimal subtree graphs shown in Figure 3.4.

Proof. Figure 3.4(a) shows the network scenario in which each receiver

has access to both source subtrees, and thus no network coding

is required, i.e., there are no coding subtrees. If network coding is

required, then by Theorem 3.6, there can only exist one coding sub-

tree containing two receiver nodes. Figure 3.4(b) shows the network

scenario in which each receiver has access to a diﬀerent source subtree

and a common coding subtree. This case corresponds to the familiar

butterﬂy example of a network with two sources and two receivers.

Therefore, all instances ¦G, S, ¹¦ with two sources and two receivers

where network coding is required, “contain” the butterﬂy network in

Figure 1.2, and can be reduced to it in polynomial time by removing

edges from the graph G.

Continuing along these lines, it is easy to show that for example

there exist exactly three minimal conﬁgurations with two sources and

Fig. 3.4 Two possible subtree graphs for networks with two sources and two receivers: (a)

no coding required, and (b) network coding necessary.

42 Theoretical Frameworks for Network Coding

three receivers, and seven minimal conﬁgurations with two sources and

four receivers. In fact, one can enumerate all the minimal conﬁgurations

with a given number of sources and receivers.

3.4 Information-Theoretic Framework

The original approach to network coding as well as the original proof of

the main theorem was information theoretic. Subsequently, both infor-

mation theorists and computer scientists used information theoretic

tools to establish bounds on achievable rates for the cases where the

main theorem does not hold (e.g., to derive bounds for multiple unicast

sessions traﬃc, as we discuss in part II of the review). We here brieﬂy

summarize the very beautiful proof idea used for the main theorem, and

refer the interested reader to the literature for in depth and rigorous

expositions on the information theoretic approach (see Section 3.6.3).

The proof outline we give is high level and overlooks a number of tech-

nical details.

Consider the instance ¦G = (V, E), S, ¹¦. Let In(v) and Out(v)

denote the set of incoming and outgoing edges for vertex v. For sim-

plicity, we assume binary communication, that is, we can send only bits

through the unit capacity edges of the network. Similar analysis goes

through in the case of larger ﬁnite alphabets.

3.4.1 Network Operation with Coding

The network operates as follows:

(1) A source S of rate ω

S

produces B packets (messages), each

containing nω

S

information bits. It also selects [Out(S)[ func-

tions ¦f

S

i

¦,

f

S

i

: 2

nω

S

→2

n

,

which map the nω

S

information bits into n coded bits to be

transmitted over its outgoing edges.

(2) Similarly, each vertex v ∈ V selects [Out(v)[ functions ¦f

v

i

¦,

one for each of its outgoing edges. Each function f

v

i

has as

its inputs [In(v)[ packets of length n, and as its output one

3.4 Information-Theoretic Framework 43

packet of length n

f

v

i

: 2

n|In(v)|

→2

n

.

The functions ¦f

S

i

¦ and ¦f

v

i

¦ are chosen uniformly at random

among all possible such mappings. For example, if we were

to restrict the network nodes operation to linear processing

over F

2

, each function f

v

i

would be deﬁned by a randomly

selected binary matrix of dimension [In(v)[n n.

(3) The network is clocked: at time k of the clock, for 1 ≤ k ≤ B,

the source S maps one of its B information packets denoted by

m

k

to [Out(S)[ packets, and sends these packets through its

outgoing edges. After time B, the source stops transmission.

(4) For each information packet m

k

, 1 ≤ k ≤ B, each other (non-

source) vertex v ∈ V waits until it receives all [In(v)[ incom-

ing packets that depend only on packet m

k

. It then (by

the means of functions ¦f

v

i

¦) computes and sends [Out(v)[

packets through its outgoing edges, also depending only on

packet m

k

.

(5) For each information packet m

k

, 1 ≤ k ≤ B, each receiver

R

j

gets [In(R

j

)[ packets, which it uses to decode the source

packet m

k

, based on its knowledge of the network topology

and all the mapping functions that the source and the inter-

mediate nodes employ.

(6) The receivers decode all B source packets during at most

B + [V [ time slots. Consequently, the received information

rate is

nω

S

B

n(B + [V [)

→ω

S

for B ¸[V [.

3.4.2 The Information Rate Achievable with Coding

We nowargue that the previously described network operation allows the

receivers to successfully decode the source messages, provided that the

rate ω

S

is smaller than the min-cut between the source and each receiver.

(1) Consider two distinct source messages m and m

∈ ¦1, 0¦

nω

S

.

We ﬁrst calculate the probability Pr(m, m

) that a speciﬁc

44 Theoretical Frameworks for Network Coding

receiver R

j

is not going to be able to distinguish between

m and m

**. That is, we calculate the pairwise probability of
**

error.

(2) Deﬁne 1 = 1(m, m

**) to be the set of vertices in V that cannot
**

distinguish between source messages m and m

. This means

that, for all v ∈ 1, all the In(v) incoming packets correspond-

ing to messages m and m

**are the same. Clearly, S belongs to
**

1 = V ¸ 1 (the complement of 1 in V ). Moreover, we have

an error at receiver R

j

if and only if R

j

∈ 1. Let ϑ1 denote

the edges that connect 1 with 1, that is, ϑ1 = ¦e[e = (v, u) ∈

E with v ∈ 1, u ∈ 1¦. When an error occurs, ϑ1 deﬁnes a cut

between the source and the receiver R

j

.

(3) We now calculate the probability that such a cut occurs. For

every edge e = (v, u) ∈ ϑ1, v can distinguish between m and

m

**but u cannot. Thus, v receives from its input edges two
**

distinct sets of packets for m and m

**, and maps them using
**

the randomly chosen f

v

e

to the same packet for edge e. Since

the mappings are chosen uniformly at random, this can hap-

pen with probability 2

−n

. Furthermore, since the mappings

for every edge are chosen independently, the probability that,

at all the edges in ϑ1, messages m and m

are mapped to

the same point equals 2

−n|ϑS|

.

(4) We have 2

nω

S

distinct messages in total, and thus, by the

union bound, the total probability or error does not exceed

2

−n|ϑV|

2

nω

S

. Therefore, provided that 2

−n(|ϑV|−ω

S

)

decays to

zero, the probability of error also decays to zero. For this

to happen, the condition [ϑ1[ > ω

S

is suﬃcient. In other

words, if m = min

V

[ϑS[ is the min-cut value between the

source and the destination, the network operation we pre-

viously described allows to achieve all rates strictly smaller

than this min-cut.

(5) Using the union bound we can show that the same result

holds simultaneously for all receivers.

We now make several important observations. First, note that

the network operation does not use the knowledge of the min-cut

value, at any other part but the source that produces packets at

3.5 Linear-Programming Framework 45

the information rate. That is, each intermediate node performs the

same operation, no matter what the min-cut value is, or where the

node is located in the network. This is not the case with routing,

where an intermediate node receiving linear independent information

streams needs, for example, to know which information stream to for-

ward where and at which rate. This property of network coding proves

to be very useful in dynamically changing networks, as we will dis-

cuss in the second part of the review. Second, the packet length n

can be thought of as the “alphabet size” required to achieve a cer-

tain probability of error when randomly selecting the coding oper-

ations. Third, although the proof assumed equal alphabet sizes, the

same arguments go through for arbitrary alphabet sizes. Finally, note

that the same proof would apply if, instead of unit capacity edges, we

had arbitrary capacities, and more generally, if each edge were a Dis-

crete Memoryless Channel (DMC), using the information theoretical

min-cut.

1

3.5 Linear-Programming Framework

There is a very rich history in using Linear Programming (LP) for

studying ﬂows through networks. Network coding has also been stud-

ied within this framework, which turned out to be the most natural

for addressing networks with costs. Another notable achievement of

this approach is the characterization of throughput beneﬁts of network

coding.

We start with some standard formulations for the max-ﬂow problem

and for multicast without network coding, and then present the LP

formulations for multicast with network coding. In this section, we

always assume a capacitated directed graph G = (V, E), where edge

e = (v, u) that connects vertex v ∈ V to vertex u ∈ V has an associated

capacity c

vu

≥ 0.

3.5.1 Unicast and Multicast Flows

Consider the problem of maximizing the ﬂow from a source S ∈ V to

a single destination R ∈ V . Let f

vu

∈ ¹

+

denote the ﬂow through the

1

As deﬁned for example in [17], Chapter 14.

46 Theoretical Frameworks for Network Coding

edge e = (v, u) ∈ E. We want to maximize the ﬂow that the source

sends to the receiver, subject to two conditions: capacity constraints

and ﬂow conservation. The capacity constraints require that the value

of the ﬂow through each edge does not exceed the capacity of the edge.

Flow conservation ensures that at every node u of the graph, apart from

the source and the destination, the total incoming ﬂow equals the total

outgoing ﬂow. In fact, by adding a directed edge of inﬁnite capacity

that connects the destination to the source, ﬂow conservation can also

be applied to the source and destination nodes. The associated LP can

be stated as follows:

Max-Flow LP:

maximize f

RS

subject to

(v,u)∈E

f

vu

=

(u,w)∈E

f

uw

, ∀ u ∈ V (ﬂow conservation)

f

vu

≤ c

vu

, ∀ (v, u) ∈ E (capacity constraints)

f

vu

≥ 0, ∀ (v, u) ∈ E

Consider now the problem of maximizing the multicast rate from

the (source) vertex S ∈ V to a set ¹= ¦R

1

, R

2

, . . . , R

N

¦ of N terminals.

If we do not use network coding, then this problem is equivalent to the

problem of Packing Steiner Trees. A Steiner tree is a tree rooted at the

source vertex that spans (reaches) all receivers. A partial Steiner tree

may not reach all receivers. Figure 3.5 depict a Steiner tree rooted at

S

2

and a partial Steiner tree rooted at S

1

.

With each tree t, we associate a random variable y

t

that

expresses the information rate conveyed through the tree t to the

receivers. Let τ be the set of all Steiner trees in ¦G, S, ¹¦, and n

t

the number of terminals in t. The maximum fractional

2

packing

of Steiner trees can be calculated by solving the following linear

program.

2

Fractional means that we do not restrict yt to integer values.

3.5 Linear-Programming Framework 47

Fig. 3.5 The butterﬂy network with a Steiner tree rooted at S

2

and a partial Steiner tree

rooted at S

1

.

Packing Steiner Trees LP:

maximize

t∈τ

y

t

subject to

t∈τ:e∈t

y

t

≤ c

e

, ∀ e ∈ E (capacity constraints)

y

t

≥ 0, ∀ t ∈ τ

This is a hard problem to solve, as the set of all possible Steiner trees

can be exponentially large in the number of vertices of the graph.

3.5.2 Multicast with Network Coding

with and without Cost

If we use network coding, the problem of maximizing the throughput

when multicasting has a polynomial time solution. The new idea we

can use is the following. With each receiver R

i

, we associate its own

ﬂow f

i

. Each individual ﬂow f

i

obeys ﬂow conservation equations.

However, the individual ﬂows are allowed to “overlap,” for example

using linear combination, resulting in a conceptual ﬂow f, which simply

summarizes what information rate actually ﬂows through each edge.

48 Theoretical Frameworks for Network Coding

It is now this ﬂow that needs to satisfy the capacity constraints. Note

that the conceptual ﬂow f does not need to satisfy ﬂow conservation.

Again we add virtual edges (R

i

, S) of inﬁnite capacity that connect

each receiver R

i

to the source vertex.

Network Coding LP:

maximize χ

subject to

f

i

R

i

S

≥ χ, ∀ i

(v,u)∈E

f

i

vu

=

(u,w)∈E

f

i

uw

, ∀ u ∈ V, ∀ i (ﬂow conservation)

f

i

vu

≤ f

vu

, ∀ (v, u) ∈ E (conceptual ﬂow constraints)

f

vu

≤ c

vu

, ∀ (v, u) ∈ E (capacity constraints)

f

i

vu

≥ 0 and f

vu

≥ 0, ∀ (v, u) ∈ E, ∀ i

The ﬁrst constraint ensures that each receiver has min-cut at least χ.

Once we solve this program and ﬁnd the optimal χ, we can apply the

main theorem in network coding to prove the existence of a network

code that can actually deliver this rate to the receivers, no matter how

the diﬀerent ﬂows f

i

overlap. The last missing piece is to show that we

can design these network codes in polynomial time, and this will be the

subject of Chapter 5. Note that, over directed graphs, instead of solving

the LP, we could simply identify the min-cut from the source to each

receiver, and use the minimum such value. The merit of this program

is that it can easily be extended to apply over undirected graphs as

well, and that it introduces an elegant way of thinking about network

coded ﬂows through conceptual ﬂows that can be applied to problems

with similar ﬂavor, such as the one we describe next.

In many information ﬂow instances, instead of maximizing the

rate, we may be interested in minimizing a cost that is a function of

the information rate and may be diﬀerent for each network edge. For

example, the cost of power consumption in a wireless environment

can be modeled in this manner. Consider an instance ¦G, S, ¹¦ where

the min-cut to each receiver is larger or equal to h, and assume

3.6 Types of Routing and Coding 49

that the cost of using edge (v, u) is proportional to the ﬂow f

vu

with a weight w

vu

≥ 0 (i.e., the cost equals w

vu

f

vu

). The problem of

minimizing the overall cost in the network is also computationally hard

if network coding is not allowed. On the other hand, by using network

coding, we can solve it in polynomial time by slightly modifying the

previous LP, where we associate zero cost with the virtual edges (R

i

, S):

Network Coding with Cost LP:

minimize

(x,y)∈E

w

xy

f

xy

subject to

f

i

R

i

S

≥ h, ∀ i

(v,u)∈E

f

i

vu

=

(u,w)∈E

f

i

uw

, ∀u ∈ V, ∀i (ﬂow conservation)

f

i

vu

≤ f

vu

, ∀ (v, u) ∈ E (conceptual ﬂow constraints)

f

vu

≤ c

vu

, ∀ (v, u) ∈ E (capacity constraints)

f

i

vu

≥ 0 and f

vu

≥ 0, ∀ (v, u) ∈ E, ∀ i

In general, although network coding allows us to send information

to each receiver at a rate equal to its min-cut, i.e., the same rate the

receiver would achieve by using all the network resources exclusively to

support its information transmission, if there is a cost associated with

using the edges of the graph, network coding does not necessarily allow

us to achieve the optimal cost that the exclusive use of the network

would oﬀer. We will see a speciﬁc such case in Example 6.2.

3.6 Types of Routing and Coding

We here describe several types of routing and coding we have or will

encounter in this review.

3.6.1 Routing

Routing refers to the case where ﬂows are sent uncoded from the source

to the receivers, and intermediate nodes simply forward their incoming

50 Theoretical Frameworks for Network Coding

information ﬂows. We distinguish the following types of routing:

(1) Integral routing, which requires that through each unit capac-

ity edge we route at most one unit rate source.

(2) Fractional routing, which allows multiple fractional rates

from diﬀerent sources to share the same edge as long as they

add up to at most the capacity of the edge.

(3) Vector routing, which can be either integral or fractional but

is allowed to change at the beginning of each time slot.

Fractional (integral) routing is equivalent to the problem of packing

fractional (integral) Steiner trees. Vector routing amounts to packing

Steiner trees not only over space, but also over time.

3.6.2 Network Coding

Network coding refers to the case where the intermediate nodes in the

network are allowed to perform coding operations. We can think of

each source as producing a length L binary packet, and assume we can

send through each edge one bit per time slot. There are several possible

approaches to network coding.

(1) In linear network coding, sets of m consecutive bits are

treated as a symbol of F

q

with q = 2

m

. Coding amounts

to linear operations over the ﬁeld F

q

. The same operations

are applied symbol-wise to each of the L/m symbols of the

packet. Thus, each incoming packet in a coding point is mul-

tiplied with a symbol in F

q

and added to other incoming

packets.

(2) In vector linear network coding, we treat each packet as a vec-

tor of length L/m with symbols in F

2

m for some m. Encoding

amounts to linear transformations in the vector space F

L/m

2

m :

each incoming packet (that is, the corresponding vector) is

multiplied by an

L

m

L

m

matrix over F

2

m, and then added to

other packets.

(3) In nonlinear network coding, intermediate nodes are allowed

to combine packets using any operation (e.g., bitwise AND).

3.6 Types of Routing and Coding 51

For traﬃc conﬁgurations where the main theorem in network coding

does not apply, there exist networks that are solvable using vector linear

network coding, but not solvable using linear network coding. Moreover,

there exist networks which are solvable only using nonlinear operations

and are not solvable using linear operations. Finally, the alphabet size

of the solution can be signiﬁcantly reduced as we move from linear, to

vector linear, and to nonlinear solutions.

3.6.3 Coding at the Source

This is a hybrid scheme between routing and network coding: source

nodes are allowed to perform coding operations and send out encoded

information streams. Intermediate nodes in the network however, are

only allowed to forward their incoming information streams.

In networks where there are no losses, to maximize the multicast

rate, we can combine routing (or vector routing) with erasure correcting

coding. For example:

• Pack through the network and over n-time slots partial

Steiner trees, i.e., trees that are rooted at the source nodes

that span a subset of the receivers such that each receiver

appears in at least f trees.

• Encode the information using an erasure correcting code at

the source and send a diﬀerent coded symbol through each

partial tree.

The idea behind this scheme is to assume that a receiver experiences

an “erasure” if it is not spanned by a partial Steiner tree. Thus each

receiver essentially observes the source information through an erasure

channel.

We now formulate the problem of creating an appropriate routing

schedule that supports such a coding scheme as a linear program.

Consider an instance ¦G, S, ¹¦. Let τ denote the set of partial Steiner

trees in G rooted at S with terminal set ¹. We will be using the

number of time slots, n, as a parameter. In each time slot we seek

a feasible fractional packing of partial Steiner trees. The goal is to

maximize the total number of trees that each receiver occurs in, across

52 Theoretical Frameworks for Network Coding

the time slots. We express this as a linear program as follows. For a

tree t ∈ τ and a time slot k, we have a non-negative variable y(t, k)

that indicates the amount of t that is packed in time slot k.

Coding at the Source LP:

maximize f

subject to

k

t∈τ:R

i

∈t

y(t, k) ≥ f, ∀ R

i

t∈τ:e∈t

y(t, k) ≤ c

e

, ∀ e ∈ E, 1 ≤ k ≤ n

y(t, k) ≥ 0, ∀ t ∈ τ, 1 ≤ k ≤ n

Given a solution y

∗

to this linear program, let m =

k

t∈τ

y

∗

(t, k).

We can use an erasure code that employs m coded symbols to convey

the same f information symbols to all receivers, i.e., to achieve rate

T

i

= f/m. (To be precise, we need f/m to be a rational and not a real

number.) For integer edge capacities c

e

, there is an optimum solution

with rational coordinates.

Example 3.3. The network in Figure 3.6 has N = 10 receivers, and

each receiver has min-cut three from the source. If we are restricted

to integral routing over a single time-slot, we can pack one Steiner

tree, which would lead to T

i

= 1. Alternatively, we can pack three par-

tial Steiner trees, where each receiver is taking part in two distinct

trees. Using an erasure code of rate 2/3 at the source, we can then

achieve T

i

= 2.

This scheme is very related to the rate-less codes approach over lossy

networks, such as LT codes and Raptor codes. In lossy networks, packet

erasures can be thought of as randomly pruning trees. At each time slot,

the erasures that occur correspond to a particular packing of partial

Steiner trees. Rate-less codes allow the receivers to decode as soon as

they collect a suﬃcient number of coded symbols.

3.6 Types of Routing and Coding 53

Fig. 3.6 A network with unit capacity edges where, if we are restricted to integral routing

over one time-slot, we can pack (Case 1) one unit rate Steiner tree, or (Case 2) three unit

rate partial Steiner trees.

Notes

The algebraic framework for network coding was developed by Koetter

and M´edard in [32], who translated the network code design to an alge-

braic problem which depends on the structure of the underlying graph.

Lemma 3.1 is proved by Ho et al. in [26] and by Harvey in [25]. The pre-

sented combinatorial framework follows the approach of Fragouli and

Soljanin in [24]. The information theoretic framework comes from the

original paper by Ahlswede et al. [2]. Introduction to LP can be found

in numerous books, see for example [10, 44]. The network coding LP

54 Theoretical Frameworks for Network Coding

that achieved ﬂow maximization was proposed by Li et al. in [37], while

the cost minimization LPs and decentralized distributed algorithms for

solving them were proposed by Lun et al. in [38]. Vector routing was

examined by Cannons et al. in [11]. Linear vs. nonlinear solvability was

discussed by Riis in [42] and by Dougherty et al. in [19]. Coding at the

source using partial tree packing was proposed by Chekuri et al. in [15].

4

Throughput Beneﬁts of Network Coding

The multicast examples considered in the previous chapters demon-

strated that network coding can oﬀer throughput beneﬁts when com-

pared to routing; we will here look into how large such beneﬁts can

be. We consider both directed and undirected networks, under two

types of routing: integral routing, which requires that through each

unit capacity edge we route at most one unit rate source, and frac-

tional routing, which allows multiple fractional rates from diﬀerent

sources that add up to at most one on any given edge. We consider two

throughput measures: symmetric throughput, which refers to the com-

mon information rate guaranteed to all receivers, and average through-

put, which refers to the average of the rates that the individual receivers

experience.

We will see how throughput beneﬁts of network coding can be

related to the integrality gap of a standard LP formulation for the

Steiner tree problem. This approach applies to both directed and undi-

rected graphs. For the special case of undirected networks, we will

outline a simple proof that coding can at most double the through-

put. For directed networks, we will give examples where coding oﬀers

55

56 Throughput Beneﬁts of Network Coding

large throughput beneﬁts as well as examples where these beneﬁts are

limited.

4.1 Throughput Measures

We denote by T

c

the maximum rate that the receivers experience when

network coding is used, and refer to it as coding throughput. For the

routing throughput, we use the following notation:

• T

j

i

and T

j

f

denote the rate that receiver R

j

experiences with

fractional and integral routing, respectively, under some rout-

ing strategy.

• T

i

and T

f

denote the maximum integral and fractional rate

we can route to all receivers, where the maximization is over

all possible routing strategies. We refer to this throughput as

symmetric or common.

• T

av

i

=

1

N

max

N

j=1

T

j

i

and T

av

f

=

1

N

max

N

j=1

T

j

f

denote the

maximum integral and fractional average throughput.

The beneﬁts of network coding in the case of the common through-

put measure are described by

T

i

T

c

and

T

f

T

c

and the beneﬁts of network coding in the case of the average throughput

measure are described by

T

av

i

T

c

and

T

av

f

T

c

.

For a multicast conﬁguration with h sources and N receivers, it

holds that

T

c

= h,

from the main network multicast theorem. Also, because there exists a

tree spanning the source and the receiver nodes, the uncoded through-

put is at least 1. We, therefore, have

1 ≤ T

av

i

≤ T

av

f

≤ h,

4.2 Linear Programming Approach 57

and thus

1

h

≤

T

av

i

T

c

≤

T

av

f

T

c

≤ 1. (4.1)

The upper bound in (4.1) is achievable by the conﬁgurations in which

network coding is not necessary for multicast at rate h.

4.2 Linear Programming Approach

We here use the Linear Programming (LP) framework to show

that the throughput beneﬁts network coding oﬀers equal the inte-

grality gap of a standard formulation for the Steiner tree prob-

lem. We prove this result for the symmetric throughput in Theo-

rem 4.1 and give without proof the respective result for the aver-

age throughput in Theorem 4.2. We focus our discussion on directed

graphs, as we will treat undirected graphs separately later on. How-

ever, the described approach applies to both directed and undirected

graphs.

4.2.1 Symmetric Throughput Coding Beneﬁts

Consider an instance ¦G, S, ¹¦ where G = (V, E) is a directed graph,

S ∈ V is a root (source) vertex, and ¹= ¦R

1

, R

2

, . . . , R

N

¦ a set of N

terminals (receivers). For the instance ¦G, S, ¹¦, a Steiner tree is a

subset of G that connects S with the set of receivers ¹.

With every edge e of the graph, we can in general associate two

parameters: a capacity c

e

≥ 0 and a cost (weight) w

e

≥ 0. Let c = [c

e

]

and w = [w

e

], e ∈ E, denote vectors that collect the set of edge capaci-

ties and edge weights, respectively. Either the edge weights or the edge

capacities or both may be relevant in a particular problem. We con-

sider essentially two problems: the Steiner tree packing problem and

the minimum Steiner tree problem.

In the Steiner tree packing problem, we are given ¦G, S, ¹¦ and a set

of non-negative edge capacities c. The maximum throughput achievable

with routing for this instance can be calculated using the following LP

58 Throughput Beneﬁts of Network Coding

(see Chapter 3):

Packing Steiner Trees – Primal:

maximize

t∈τ

y

t

subject to

t∈τ:e∈t

y

t

≤ c

e

, ∀ e ∈ E

y

t

≥ 0, ∀ t ∈ τ

(4.2)

Recall that the variable y

t

expresses how much fractional rate

we route from the source to the receivers through the Steiner tree t,

for all possible Steiner trees t ∈ τ for our instance. Let T

f

(G, S, ¹, c)

denote the optimal solution of this problem, and T

c

(G, S, ¹, c) the rate

network coding achieves. For further analysis, it is useful to look at

the dual problem of the above LP:

Packing Steiner Trees – Dual:

minimize

e∈E

c

e

z

e

subject to

e∈t

z

e

≥ 1, ∀ t ∈ τ

z

e

≥ 0, ∀ e ∈ E

(4.3)

In the dual program, we can think of the variable z

e

as expressing

a cost of edge e. We are asking what cost z

e

should be associated with

edge e (for all e ∈ E) so that each Steiner tree has cost at least one and

the total cost

e∈E

c

e

z

e

is minimized.

In the minimum Steiner tree problem, we are given ¦G, S, ¹¦

and a set of non-negative edge weights w. We are asked to ﬁnd the

minimum weight tree that connects the source to all the terminals.

Here edge capacities are not relevant: the Steiner tree either uses or

4.2 Linear Programming Approach 59

does not use an edge. To state this problem, we need the notion of a

separating set of vertices. We call a set of vertices T ⊂ V separating

if T contains the source vertex S and V ¸ T contains at least one of

the terminals in ¹. Let δ(T) denote the set of edges from T to V ¸ T,

that is, δ(T) = ¦(u, v) ∈ E : u ∈ T, v / ∈ T¦. We consider the following

integer programming (IP) formulation for the minimum Steiner tree

problem:

Steiner Tree Problem:

minimize

e∈E

w

e

x

e

subject to

e∈δ(D)

x

e

≥ 1, ∀ T: T is separating

x

e

∈ ¦0, 1¦, ∀ e ∈ E

(4.4)

where there is a binary variable x

e

for each edge e ∈ E to indicate

whether the edge is contained in the tree. We denote by opt(G, w, S, ¹)

the value of the optimum solution for the given instance.

A linear relaxation of the above IP is obtained by replacing the

constraint x

e

∈ ¦0, 1¦ by 0 ≤ x

e

≤ 1 for all e ∈ E. We can further

simplify this constraint to x

e

≥ 0, e ∈ E (noticing that if a solution is

feasible with x

e

≥ 1, then it remains feasible by setting x

e

= 1), and

obtain the following formulation:

Linear Relaxation for Steiner Tree Problem:

minimize

e∈E

w

e

x

e

subject to

e∈δ(D)

x

e

≥ 1, ∀ T: T is separating

x

e

≥ 0, ∀ e ∈ E

(4.5)

60 Throughput Beneﬁts of Network Coding

Let lp(G, w, S, ¹) denote the optimum value of this linear program for

the given instance. The value lp(G, w, S, ¹) lower bounds the cost of

the integer program solution opt(G, w, S, ¹). The integrality gap of the

relaxation on G is deﬁned as

α(G, S, ¹) = max

w≥0

opt(G, w, S, ¹)

lp(G, w, S, ¹)

,

where the maximization is over all possible edge weights. Note that

α(G, S, ¹) is invariant to scaling of the optimum achieving weights.

The theorem tells us that, for a given conﬁguration ¦G, S, ¹¦, the

maximum throughput beneﬁts we may hope to get with network coding

equals the largest integrality gap of the Steiner tree problem possible

on the same graph. This result refers to fractional routing; if we restrict

our problem to integral routing on the graph, we may get even larger

throughput beneﬁts from coding.

Theorem 4.1. Given an instance ¦G, S, ¹¦

max

w

opt(G, w, S, ¹)

lp(G, w, S, ¹)

= max

c

T

c

(G, S, ¹, c)

T

f

(G, S, ¹, c)

.

Proof. We ﬁrst show that there exist weights w so that

max

c

T

c

(G, S, ¹, c)

T

f

(G, S, ¹, c)

≤

opt(G, w, S, ¹)

lp(G, w, S, ¹)

. (4.6)

(1) For the instance ¦G, S, ¹¦, ﬁnd c = c

∗

such that the left-hand

side ratio is maximized, i.e.,

T

c

(G, S, ¹, c

∗

)

T

f

(G, S, ¹, c

∗

)

= max

c

T

c

(G, S, ¹, c)

T

f

(G, S, ¹, c)

.

Assume c

∗

is normalized so that the minimum min-cut from

the source to the receivers (and as a result the network coding

rate) equals one, that is,

T

c

(G, S, ¹, c

∗

)

T

f

(G, S, ¹, c

∗

)

=

1

T

f

(G, S, ¹, c

∗

)

. (4.7)

Note that normalization does not aﬀect fractional routing.

4.2 Linear Programming Approach 61

(2) Now solve the dual program in (4.3). From strong duality,

1

we get that

T

f

(G, S, ¹, c

∗

) =

y

∗

t

=

c

e

z

∗

e

. (4.8)

Then the dual program equivalently says that there exist

costs z

∗

e

such that each Steiner tree has cost at least one.

From the slackness conditions,

2

each tree t used for fractional

routing (with y

t

> 0), gets cost of exactly one

e∈t

z

∗

e

= 1.

(3) We now use our second set of optimization problems. Con-

sider the same instance ¦G, S, ¹¦, and associate weights

w

e

= z

∗

e

with every edge of the graph, and solve on this graph

the Steiner tree problem in (4.4). Since for this set of weights,

each Steiner tree in our graph has weight at least one, and

(as previously discussed) the slackness conditions imply that

there exist trees of weight one, the minimum value the Steiner

tree IP can achieve equals one:

opt(G, w = z

∗

, S, ¹) = 1. (4.9)

(4) Consider the LP relaxation of this problem stated in (4.5).

Note that we can get a feasible solution by choosing x = c

∗

(the capacities in the original graph). Indeed, for these values

the min-cut to each receiver equals at least one. Moreover,

z

∗

e

c

e

= T

f

(G, S, ¹, c

∗

). Therefore, the LP minimization will

lead to a solution

lp(G, w = z

∗

, S, ¹) ≤ T

f

(G, S, ¹, c

∗

). (4.10)

Putting together (4.7), (4.9), and (4.10), we get (4.6).

We next show that there exist capacities c so that

T

c

(G, S, ¹, c)

T

f

(G, S, ¹, c)

≥ max

w

opt(G, w, S, ¹)

lp(G, w, S, ¹)

. (4.11)

1

Strong duality holds since there exists a feasible solution in the primal.

2

See [10, 44] for an overview of LP.

62 Throughput Beneﬁts of Network Coding

(1) We start from the Steiner tree problem in (4.4) and (4.5).

Let w

∗

be a weight vector such that

α(G, S, ¹) =

opt(G, w

∗

, S, ¹)

lp(G, w

∗

, S, ¹)

= max

w

opt(G, w, S, ¹)

lp(G, w, S, ¹)

.

Let x

∗

be an optimum solution for the LP on the instance

(G, w

∗

, S, ¹). Hence

lp(G, w

∗

, S, ¹) =

e

w

∗

e

x

∗

e

.

Note that vectors with positive components, and thus the

vectors x = ¦x

e

, e ∈ E¦ satisfying the constraints of (4.5), can

be interpreted as a set of capacities for the edges of G. In

particular, we can then think of the optimum solution x

∗

as

associating a capacity c

e

= x

∗

e

with each edge e so that the

min-cut to each receiver is greater or equal to one, and the

cost

e

w

∗

e

x

∗

e

is minimized.

(2) Consider now the instance with these capacities, namely,

¦G, c = x

∗

, S, ¹¦ and examine the coding throughput ben-

eﬁts we can get. Since the min-cut to each receiver is at least

one, with network coding we can achieve throughput

T

c

(G, c = x

∗

, S, ¹) ≥ 1. (4.12)

With fractional routing we can achieve

T

f

(G, c = x

∗

, S, ¹) =

y

∗

t

(4.13)

for some y

∗

t

.

(3) We now need to bound the cost opt(G, w

∗

, S, ¹) that the

solution of the IP Steiner tree problem will give us. We will

use the following averaging argument. We will show that the

average cost per Steiner tree will be less or equal to

w

∗

e

x

∗

e

T

f

(G, c = x

∗

, S, ¹)

.

Therefore, there exists a Steiner tree with the weight smaller

than the average, which upper bounds the solution of (4.5):

opt(G, w

∗

, S, ¹) ≤

w

∗

e

x

∗

e

T

f

(G, x

∗

, S, ¹)

=

lp(G, w

∗

, S, ¹)

T

f

(G, x

∗

, S, ¹)

4.2 Linear Programming Approach 63

But from (4.12), we have

lp(G, w

∗

, S, ¹)

T

f

(G, c = x

∗

, S, ¹)

≤ lp(G, w

∗

, S, ¹)

T

c

(G, c = x

∗

, S, ¹)

T

f

(G, c = x

∗

, S, ¹)

and we are done.

(4) Let w

t

=

e∈t

w

∗

e

denote the weight of a tree t, and consider

t∈τ

w

t

y

∗

t

(the total weight of the packing y

∗

). We have

t∈τ

w

t

y

∗

t

≥ min

t∈τ

¦w

t

¦

t∈τ

y

∗

t

.

Thus there exists a tree t

1

∈ argmin

t∈τ

¦w

t

¦ of weight w

t

1

such that

w

t

1

≤

t∈τ

w

t

y

∗

t

t∈τ

y

∗

t

. (4.14)

(5) Finally, we need to show that

t∈τ

w

t

y

∗

t

≤

e∈E

w

∗

e

x

∗

e

.

Indeed, by changing the order of summation, we get

t∈τ

w

t

y

∗

t

=

t∈τ

y

∗

t

e∈t

w

∗

e

=

e∈E

w

∗

e

t:e∈t

y

∗

t

.

By the feasibility of y

∗

for the capacity vector c = x

∗

, the

quantity

t:e∈t

y

∗

t

is at most x

∗

e

. Hence we have that

t∈τ

w

t

y

∗

t

≤

e∈E

w

∗

e

x

∗

e

. (4.15)

Putting together (4.6) and (4.11) proves the theorem.

4.2.2 Average Throughput Coding Beneﬁts

We now consider the coding advantage for average throughput over a

multicast conﬁguration ¦G, S, ¹¦ and a set of non-negative capacities c

on the edges of G. We will assume for technical reasons that the min-cut

from S to each of the terminals is the same. This can be easily arranged

by adding dummy terminals. That is, if the min-cut to a receiver R

i

is larger than required, we connect the receiver node to a new dummy

terminal through an edge of capacity equal to the min-cut. Then the

network coding throughput is equal to the common min-cut value.

64 Throughput Beneﬁts of Network Coding

The maximum achievable average throughput with routing is given

by the maximum fractional packing of partial Steiner trees. A partial

Steiner tree t stems from the source S and spans all or only a subset

of the terminals. With each tree t, we associate a variable y

t

denot-

ing a fractional ﬂow through the tree. Let τ be the set of all partial

Steiner trees in ¦G, S, ¹¦, and n

t

the number of terminals in t. Then

the maximum fractional packing of partial Steiner trees is given by the

following linear program:

Packing Partial Steiner Trees:

maximize

t∈τ

n

t

N

y

t

subject to

t∈τ:e∈t

y

t

≤ c

e

, ∀ e ∈ E

y

t

≥ 0, ∀ t ∈ τ

Let T

av

f

(G, S, ¹, c) denote the value of the above linear program on a

given instance. The coding advantage for average throughput on G is

given by the ratio

β(G, S, ¹) = max

c

T

c

(G, c, S, ¹)

T

av

f

(G, c, S, ¹)

.

Note that β(G) is invariant to scaling capacities. It is easy to see that

β(G, S, ¹) ≥ 1, since we assumed that the min-cut to each receiver is

the same, and thus network coding achieves the maximum possible sum

rate. It is also straightforward that β(G, S, ¹) ≤ α(G, S, ¹), since for

any given conﬁguration ¦G, c, S, ¹¦, the average throughput is at least

as large as the common throughput we can guarantee to all receivers,

namely, T

av

f

≥ T

f

.

Let β(G, S, ¹

∗

) denote the maximum average throughput beneﬁts

we can get on graph G when multicasting from source S to any possible

subset of the receivers ¹

⊆ ¹:

β(G, S, ¹

∗

) = max

R

⊆R

β(G, S, ¹

).

4.3 Conﬁgurations with Large Network Coding Beneﬁts 65

Theorem 4.2. For a conﬁguration ¦G, S, ¹¦ where the number of

receivers is [¹[ = N and the min-cut to each receiver is the same, we

have

β(G, S, ¹

∗

) ≥ max

_

1,

1

H

N

α(G, S, ¹)

_

,

where H

N

is the Nth harmonic number H

N

=

N

j=1

1/j.

Note that the maximum value of T

f

and T

av

f

is not necessarily achieved

for the same capacity vector c, or for the same number of receivers N.

What this theorem tells us is that, for a given ¦G, S, ¹¦, with [¹[ = N,

the maximum common rate we can guarantee to all receivers will be at

most H

N

times smaller than the maximum average rate we can send

from S to any subset of the receivers ¹. The theorem quantitatively

bounds the advantage in going from the stricter measure α(G, S, ¹)

to the weaker measure β(G, S, ¹

∗

). Furthermore, it is often the case

that for particular instances (G, S, ¹), either α(G, S, ¹) or β(G, S, ¹

∗

)

is easier to analyze and the theorem can be useful to get an estimate

of the other quantity.

We now comment on the tightness of the bounds in the theo-

rem. There are instances in which β(G, S, ¹

∗

) = 1, take for example

the case when G is a tree rooted at S. On the other hand there

are instances in which β(G, S, ¹

∗

) = O(1/lnN)α(G, S, ¹). Examples

include network graphs discussed in the next section. In general, the

ratio α(G, S, ¹)/β(G, S, ¹

∗

) can take a value in the range [1, H

N

].

4.3 Conﬁgurations with Large Network Coding Beneﬁts

The ﬁrst example that demonstrated the beneﬁts of network coding

over routing for symmetric throughput was the combination network

B(h, k) depicted in Figure 4.1.

Example 4.1. The Combination Network

The combination network B(h, k), where k ≥ h, has h sources, N =

_

k

h

_

receivers, and two layers of k intermediate nodes A and B. All sources

66 Throughput Beneﬁts of Network Coding

Fig. 4.1 The combination B(h, k) network has h sources and N =

k

h

**receivers, each receiver
**

observing a distinct subset of h B-nodes. Edges have unit capacity and sources have unit

rate.

are connected to k A nodes. Each A node is connected to a correspond-

ing B node. Each of the

_

k

h

_

receivers is connected to a diﬀerent subset

of size h of the B nodes, as Figure 4.1 depicts. B(h, k) has k coding

points, that are the edges (A

i

, B

i

), 1 ≤ i ≤ k, between the A and B

nodes. A speciﬁc instance B(h = 3, k = 5) is depicted in Figure 3.6.

The beneﬁts of network coding compared to integral routing for the

network B(h, k) can be bounded as

T

i

T

c

≤

1

h

, for k ≥ h

2

. (4.16)

To see that, note that the min-cut condition is satisﬁed for every

receiver, and thus T

c

= h. Choose any integral routing scheme, that

sends one source through every edges that is a coding point. From the

pigeonhole principle, because there exist h sources and k > h

2

points,

there will exist at least one source, say S

1

, that will be allocated to

at least h coding points. Consider the receiver that observes these h

coding points where S

1

ﬂows. These receiver will experience rate equal

to one, and thus the bound (4.16) follows.

4.3 Conﬁgurations with Large Network Coding Beneﬁts 67

It is also straightforward to upper-bound the fractional throughput

of B(k, h). Note that each Steiner tree needs to use k − (h − 1) out of

the k A

i

B

i

edges, to reach all the receivers. Therefore, the fractional

packing number is at most k/(k − h + 1), and consequently

T

f

T

c

≤

k

h(k − h + 1)

. (4.17)

On the other hand, as we will see the following section, the aver-

age throughput beneﬁts for B(h, k) are upper bounded by a factor of

two. In addition, we will see that by using a coding scheme at the

source and vector routing, the symmetric throughput can approach

the average, and thus the coding beneﬁts are upper bounded by the

same factor. It is, therefore, natural to ask if there exist conﬁgura-

tions where network coding oﬀers signiﬁcant beneﬁts with respect to

the average throughput. We next describe a class of networks with N

receivers for which coding can oﬀer up to O(

√

N)-fold increase of the

average throughput achievable by routing. We refer to this class as

zk networks, because they were introduced by Zosin and Khuller, who

constructed them to demonstrate a possible size of the integrality gap

for the directed Steiner tree problem.

Example 4.2. The zk(p, N) Network

Let N and p, p ≤ N, be two integers and 1 = ¦1, 2, . . . , N¦ be an index

set. We deﬁne two more index sets: / as the set of all (p − 1)-element

subsets of 1 and B as the set of all p-element subsets of 1. A zk(N, p)

network, illustrated in Figure 4.2, is deﬁned by two parameters N and

p as follows: Source S transmits information rate ω = h =

_

N−1

p−1

_

to N

receiver nodes R

1

R

N

through a network of three sets of nodes A, B,

and C. A-nodes are indexed by the elements of /, and B- and C-nodes,

by the elements of B. An A node is connected to a B node if the index

of A is a subset of the index of B. A B node is connected to a C node

if and only if they correspond to the same element of B. A receiver

node is connected to the C nodes whose indices contain the index of

the receiver. All edges in the graph have unit capacity. The out-degree

of the source node is

_

N

p−1

_

, the out-degree of A nodes is N − (p − 1),

the in-degree of B nodes is p, the out-degree of C nodes is p, and the

68 Throughput Beneﬁts of Network Coding

Fig. 4.2 zk(N, p) network.

in-degree of the receiver nodes is

_

N−1

p−1

_

. In fact, it is easy to show that

there exist exactly

_

N−1

p−1

_

edge disjoint paths between the source and

each receiver, and thus the min-cut to each receiver equals

_

N−1

p−1

_

.

The following theorem provides an upper bound to the average

throughput beneﬁts achievable by network coding in zk(N, p) networks.

Theorem 4.3. In a zk(N, p) network (Figure 4.2) with h =

_

N−1

p−1

_

sources and N receivers,

T

av

f

T

c

≤

p − 1

N − p + 1

+

1

p

. (4.18)

Proof. If only routing is permitted, the information is transmitted from

the source node to the receiver through a number of trees, each carrying

a diﬀerent information source. Let a

t

be the number of A-nodes in tree

t, and c

t

, the number of B- and C-nodes. Note that c

t

≥ a

t

, and that

the c

t

C-nodes are all descendants of the a

t

A-nodes. Therefore, we can

4.3 Conﬁgurations with Large Network Coding Beneﬁts 69

count the number of the receivers spanned by the tree as follows: Let

n

t

(A

j

) be the number of C-nodes connected to A

j

, the jth A-node in

the tree. Note that

at

j=1

n

t

(A

j

) = c

t

.

The maximum number of receivers the tree can reach through A

j

is

n

t

(A

j

) + p − 1. Consequently, the maximum number of receivers the

tree can reach is

at

j=1

[n

t

(A

j

) + p − 1] = a

t

(p − 1) + c

t

.

To ﬁnd an upper bound to the routing throughput, we need to ﬁnd the

number of receivers that can be reached by a set of disjoint trees. Note

that for any set of disjoint trees, we have

t

a

t

≤

_

N

p − 1

_

and

t

c

t

≤

_

N

p

_

.

Thus, T

i

can be upper-bounded as

T

i

≤

1

N

t

(a

t

(p − 1) + c

t

) =

1

N

(p − 1)

t

a

t

+

t

c

t

≤ (p − 1)

_

N

p − 1

_

+

_

N

p

_

. (4.19)

The network coding rate T

c

is equal to

_

N−1

p−1

_

. Therefore,

T

av

i

T

c

≤

p − 1

N − p + 1

+

1

p

. (4.20)

We can apply the exact same arguments to upper bound T

av

f

, by allow-

ing a

t

and c

t

to take fractional values, and interpreting these values as

the fractional rate of the corresponding trees.

For a ﬁxed N, the right-hand side of (4.20) is minimized for

p =

N + 1

√

N + 1

√

N,

70 Throughput Beneﬁts of Network Coding

and for this value of p,

T

av

f

T

c

≤ 2

√

N

1 + N

2

√

N

. (4.21)

4.4 Conﬁgurations with Small Network Coding Beneﬁts

For a number of large classes of networks, coding can at most double

the average rate achievable by using very simple routing schemes.

3

We

next describe several such classes of networks, which include several

examples considered in the network coding literature up to date. The

simplest examples of such networks are conﬁgurations with two sources.

Example 4.3. For a network with two sources the bounds in (4.1)

give

1

2

≤

T

av

i

T

c

≤ 1

by setting h = 2. We can achieve the lower bound by simply ﬁnding a

spanning tree. In general, a tighter bound also holds:

T

av

i

T

c

≥

1

2

+

1

2N

,

and there are networks for which this bound holds with equality.

There are networks with two sources with even smaller coding through-

put advantage.

Example 4.4. In any combination network with two unit rate sources

(network coding throughput T

c

= 2) a simple routing scheme can

achieve an average throughput of at least 3T

c

/4 as follows: We route S

1

through one half of the k coding points A

i

B

i

and S

2

through the other

half. Therefore, the average routing throughput, for even k, is given by

T

av

i

=

1

_

k

2

_

_

2

_

k

2

2

_

+ 2

__

k

2

_

− 2

_

k

2

2

___

= 2

_

3

4

+

1

4(k − 1)

_

>

3

4

T

c

.

3

Recall that computing an optimum routing is in general NP-hard.

4.4 Conﬁgurations with Small Network Coding Beneﬁts 71

The term 2

_k

2

2

_

is the sum rate for the receivers that only observe

one of the two sources, while the term 2[

_

k

2

_

− 2

k

2

2] is the sum rate

for the receivers that observe both sources. A similar bound holds for

odd k.

The subtree decomposition can prove useful here in enabling us to

estimate coding advantages for large classes of networks. For exam-

ple, network coding at most doubles the throughput in all networks

with h sources and N receivers whose information ﬂow graphs are

bipartite and each coding point has two parents. The same is true

for conﬁgurations where all coding points have min-cut h from the

sources.

Example 4.5. The combination network provides an interesting

example where coding oﬀers signiﬁcant beneﬁts for the symmetric

throughput (as discussed in the previous section), while it cannot even

double the average throughput achievable by routing. To see that,

we consider a simple randomized routing scheme and compute the

expected throughput observed by a single receiver. In this scheme,

for each edge going out of the source node, one of the h information

sources is chosen uniformly at random and routed through the edge.

The probability that a receiver does not observe a particular source is

thus equal to

=

_

h − 1

h

_

h

. (4.22)

Therefore, the number of sources that a receiver observes on the average

is given by

T

av

i

= h(1 − ) = h

_

1 −

_

1 −

1

h

_

h

_

h(1 − e

−1

) >

T

c

2

. (4.23)

Some thought is required to realize that this routing scenario can

be cast into a classic urns and balls experiment in which h balls (cor-

responding to the receiver’s h incoming edges) are thrown indepen-

dently and uniformly into h urns (corresponding to the h sources).

The established theory of such occupancy models can tell us not only

72 Throughput Beneﬁts of Network Coding

the expected value of the random variable representing the number of

observed sources but also its entire probability distribution. Moreover,

we can say that as the number of sources increases, the probability that

the normalized diﬀerence between the observed throughput is greater

than some value x decreases exponentially fast with x

2

. Similar con-

centration results hold when the number of receivers is large (the rates

they experience tend to concentrate around a much larger value than

the minimum) and give us yet another reason for looking at the average

throughput.

The next example illustrates what can be achieved by coding at the

source that was described in Section 3.6.3.

Example 4.6. If in Example 4.5, in addition to the described random-

ized routing scheme, the source is allowed to apply channel coding (over

time), then the integral throughput can achieve the average asymptot-

ically over time. To see that, we ﬁrst note that under the randomized

routing, which is repeated at each time instant, each receiver observes

the sequence of each source outputs as if it had passed through an

erasure channel with the probability of erasure given by (4.22). The

capacity of such a channel is equal to 1 − . Information theory tells

us that there exists a sequence of codes with rates k/n such that the

probability of incorrect decoding goes to 0 and k/n →1 − as n →∞.

Such codes can be applied at each source, enabling each receiver to

get the same throughput. Therefore, since there are h sources, we have

T

i

(n) →h k/n as n →∞. Since k/n can be taken arbitrary close to

the capacity, we have T

i

(n) →h(1 − ) = T

i

.

These results, using uniform at random allocation of the sources,

hold in the case of B(h, k) combination networks, and additionally,

over networks with h sources, k coding point A

i

B

i

and an arbi-

trary number of receivers observing the coding points. When the

conﬁguration is symmetric, as in the case of B(h, k) networks, the

random routing can be replaced by deterministic, and the integral

throughput T

i

can achieve the average in a ﬁnite number of time

units.

4.5 Undirected Graphs 73

4.5 Undirected Graphs

We now consider undirected networks, where through each edge e =

(x, y) we can simultaneously route ﬂows f

xy

and f

yx

in opposite direc-

tions, provided their sum does not exceed the capacity of the edge, i.e.,

f

xy

+ f

yx

≤ c

e

. In undirected graphs, the coding advantage of network

coding is bounded by a constant factor of two.

Theorem 4.4. In multicasting over undirected graphs, the through-

put beneﬁts network coding oﬀers as compared to routing are upper

bounded by a factor of two

T

c

≤ 2T

f

.

To prove the theorem, we use the following result for directed graphs.

Theorem 4.5 (Bang-Jensen, 1995). Let G = (V, E) be a directed

graph with the set of receivers ¹⊂ V , such that at all vertices in V

other than the source and the receivers, the in-degree is greater or

equal than the out-degree. Assume that receiver R

i

∈ ¹ has min-cut

m

i

from the source, and let k = max

R

i

∈R

m

i

. Then there are k edge-

disjoint partial Steiner trees in G such that each node R

i

∈ R occurs

in m

i

of the k trees.

Proof. Consider an instance ¦G, S, ¹¦ where now G is an undirected

graph with unit capacity edges, and where the min-cut to each receiver

equals h. Therefore, T

c

= h. Replace each undirected edge with two

opposite directed edges, each of capacity 1/2. This graph has the fol-

lowing properties:

(1) The min-cut to each receiver is at least h/2.

(2) The in-degree at each intermediate vertex equals the out-

degree.

Clearly, the conditions of Theorem 4.5 are satisﬁed. Therefore, there

are h/2 edge-disjoint Steiner trees such that each receiver occurs in

74 Throughput Beneﬁts of Network Coding

h/2 of them. In other words, these trees span all receivers, and thus

the routing throughput can be made to be at least h/2. We conclude

that T

c

≤ 2T

f

.

Theorem 4.5 actually provides a more general claim than what we

need to prove Theorem 4.4. It additionally tells us that, even if the min-

cut to each receiver is diﬀerent, that is, we have a non-uniform demand

network, over undirected graphs we can still send to each receiver rate

larger or equal to half its min-cut value using routing.

One should be aware that although network coding does not oﬀer

signiﬁcant throughput beneﬁts for multicasting over undirected graphs,

it does oﬀer beneﬁts in terms of the transmission scheme design

complexity. Network coding schemes achieving throughput T

c

can be

designed in polynomial time (solving the LP in Section 3.5 and then

using any network code design algorithm), while ﬁnding the optimal

routing (packing Steiner trees) is a computationally hard problem.

We also note however that empirical studies that use polynomial time

approximation algorithms to solve the routing problem showed a com-

parable performance to network coding.

Notes

The ﬁrst multicast throughput beneﬁts in network coding referred

to the symmetric integral throughput in directed networks, and were

reported by Sanders et al. in [43]. Theorem 4.1 was subsequently proved

by Agarwal and Charikar in [1], and it was extended to Theorem 4.2

by Chekuri et al. in [15]. The interested reader is referred to [44] for

details on integer and linear programs, and to [4, Ch. 9] for details

on the Steiner tree problem formulations. The zk networks were intro-

duced by Zosin and Khuller in [50], and were used to demonstrate

network coding throughput beneﬁts in [1, 15]. Theorem 4.5 was proved

Bang-Jensen et al. in [5]. Throughput beneﬁts over undirected graphs

were examined by Li et al. in [37], and by Chekuri et al. in [14]. Infor-

mation theoretic rate bounds over undirected networks with two-way

channels where provided by Kramer and Savari in [33]. Experimental

results by Wu et al. reported in [47] showed small throughput bene-

4.5 Undirected Graphs 75

ﬁts over undirected network graphs of six Internet service providers.

Throughput beneﬁts that network coding can oﬀer for other types of

traﬃc scenarios were examined by Rasala-Lehman and Lehman in [41],

and by Dougherty et al. in [19]. Non-uniform demand networks were

examined by Cassuto and Bruck in [13] and later in [14].

5

Network Code Design Methods for Multicasting

We now look at network code design algorithms for multicasting under

the assumptions of the main network coding theorem (Theorem 2.2).

We assume the network multicast model as established in Section 3.1:

namely, a directed acyclic graph with unit capacity edges where the

min-cut to each of the N receivers equals the number of unit-rate

sources h.

Perhaps the most important point of this chapter is that, provided

we can use an alphabet of size at least as large as the number of

receivers, we can design network codes in polynomial time.

1

This is

what makes network coding appealing from a practical point of view.

Maximizing the multicast rate using routing is, as we discussed in

Section 3.5, NP-hard. On the other hand, coding solutions that achieve

higher rates than routing (namely, the multicast capacity) can be found

in polynomial time. Moreover, a very simple and versatile randomized

approach to coding allows to achieve very good performance in terms

of rate, and has been considered for use in practice.

1

Polynomial in the number of edges of the network graph.

76

5.1 Common Initial Procedure 77

We classify network code design algorithms based on the type

of information they require as their input: we say that an algo-

rithm is centralized if it relies on global information about the

entire network structure, and decentralized if it requires only local

and no topological information. We start this chapter by describ-

ing a common initial procedure that all algorithms execute. We then

present examples of both centralized and decentralized algorithms,

and discuss their various properties, such as scalability to network

changes.

5.1 Common Initial Procedure

Given a multicast instance ¦G = (V, E), S, ¹¦, the ﬁrst common steps

in all the algorithms described in this chapter are to

(1) ﬁnd h edge-disjoint paths ¦(S

i

, R

j

), 1 ≤ i ≤ h¦ from the

sources to the receivers, the associated graph G

=

1≤i≤h

1≤j≤N

(S

i

, R

j

), the set of coding points (, and

(2) ﬁnd the associated minimal conﬁguration.

Recall that a conﬁguration is minimal if removing any edge would vio-

late the min-cut condition for at least one receiver (Deﬁnition 3.5),

and coding points are edges in the network where two or more incom-

ing information symbols have to be combined (Deﬁnition 3.4).

Step (1) is necessary in all the algorithms described in this chapter.

To implement it we can use any max ﬂow algorithm, such as the one

given within the proof of the min-cut max-ﬂow theorem (Theorem 2.1).

A ﬂow-augmenting path can be found in time O([E[), thus the total

complexity is O([E[hN).

Step (2) is not necessary, but it can signiﬁcantly reduce the

required network resources, such as the number of coding points

and the number of employed edges by the information ﬂow. Algo-

rithm 5.1 describes a brute force implementation, which sequentially

attempts to remove each edge and check if the min-cut condition

to each receiver is still satisﬁed. This approach requires O([E[

2

hN)

operations.

78 Network Code Design Methods for Multicasting

Algorithm 5.1: Initial Processing(G, S, ¹)

Find (S

i

, R

j

) for all i, j

G

= (V

, E

) ←

1≤i≤h

1≤j≤N

(S

i

, R

j

)

∀e ∈ E

_

_

_

if (V

, E

**¸ ¦e¦) satisﬁes the multicast property
**

then

_

E

←E

¸ ¦e¦

G

←(V

, E

)

return (G

)

5.2 Centralized Algorithms

5.2.1 Linear Information Flow Algorithm

Consider an instance ¦G, S, ¹¦. A given network code (choice of coding

vectors) conveys to receiver R

j

rate at most equal to the number of

independent linear combinations that the receiver observes. In that

sense, we can (with some abuse of terminology) talk about the min-

cut value a receiver experiences under a given network code, and of

whether a speciﬁc network code preserves the multicast property. In

other words, we can think of the network code as being part of the

channel.

The Linear Information Flow (LIF) is a greedy algorithm based

exactly on the observation that the choice of coding vectors should

preserve the multicast property of the network. The algorithm sequen-

tially visits the coding points in a topological order

2

(thus ensuring

that no coding point is visited before any of its ancestors in the graph),

and assigns coding vectors to them. Each assigned coding vector pre-

serves the multicast property for all downstream receivers up to the

coding point to which it is assigned. Intuitively, the algorithm pre-

serves h “degrees of freedom” on the paths from the sources to each

receiver.

2

A topological order is simply a partial order. Such an order exists for the edges of any

graph G that is acyclic.

5.2 Centralized Algorithms 79

To precisely describe the algorithm, we need some additional nota-

tion. Let ( denote the set of all coding points and R(δ) the set of all

receivers that employ a coding point δ in one of their paths. Each cod-

ing point δ appears in at most one path (S

i

, R

j

) for each receiver R

j

.

Let f

j

←

(δ) denote the predecessor coding point to δ along this path

(S

i

, R

j

).

For each receiver R

j

, the algorithm maintains a set (

j

of h coding

points, and a set B

j

= ¦c

j

1

, . . . , c

j

h

¦ of h coding vectors. The set (

j

keeps

track of the most recently visited coding point in each of the h edge

disjoint paths from the source to R

j

. The set B

j

keeps the associated

coding vectors.

Initially, for all R

j

, the set C

j

contains the source nodes

¦S

e

1

, S

e

2

, . . . , S

e

h

¦, and the set B

j

contains the orthonormal basis

¦e

1

, e

2

, . . . , e

h

¦, where the vector e

i

has one in position i and zero else-

where. Preserving the multicast property amounts to ensuring that the

set B

j

forms a basis of the h-dimensional space F

h

q

at all steps of the

algorithm and for all receivers R

j

. At step k, the algorithm assigns a

coding vector c(δ

k

) to the coding point δ

k

, and replaces, for all receivers

R

j

∈ R(δ

k

):

• the point f

j

←

(δ

k

) in (

j

with the point δ

k

,

• the associated vector c(f

j

←

(δ

k

)) in B

j

with c(δ

k

).

The algorithm selects the vector c(δ

k

) so that for every receiver

R

j

∈ R(δ

k

), the set (B

j

¸ ¦c(f

j

←

(δ

k

))¦) ∪ c(δ

k

) forms a basis of the

h-dimensional space. Such a vector c(δ

k

) always exists, provided

that the ﬁeld F

q

has size larger than N, as Lemma 5.2 proves.

When the algorithm terminates, the set B

j

contains the set of lin-

ear equations the receiver R

j

needs to solve to retrieve the source

symbols.

Lemma 5.1. Consider a coding point δ with m≤ h parents and a

receiver R

j

∈ R(δ). Let 1(δ) be the m-dimensional space spanned by

the coding vectors of the parents of δ, and 1(R

j

, δ) be the (h − 1)-

dimensional space spanned by the elements of B

j

after removing

80 Network Code Design Methods for Multicasting

c(f

j

←

(δ)). Then

dim¦1(δ) ∩ 1(R

j

, δ)¦ = m − 1.

Proof. Since the dimension of the intersection of two subspaces A and

B is given by

dim¦A ∩ B¦ = dimA + dimB − dim¦A ∪ B¦,

we only need to show that dim¦1(δ) ∪ 1(R

j

, δ)¦ = h. This is true

because 1(δ) contains c(f

j

←

(δ)) and 1(R

j

, δ) contains the rest of the

basis B

j

.

Lemma 5.2. The LIF algorithm successfully identiﬁes a valid network

code using any alphabet F

q

of size q > N.

Proof. Consider a coding point δ with m≤ h parents and a receiver

R

j

∈ R(δ). The coding vector c(δ) has to be a non-zero vector in the

m-dimensional space 1(δ) spanned by the coding vectors of the par-

ents of δ. There are q

m

− 1 such vectors, feasible for c(δ). To make the

network code valid for R

j

, c(δ) should not belong to the intersection

of 1(δ) and the (h − 1)-dimensional space 1(R

j

, δ) spanned by the ele-

ments of B

j

after removing c(f

j

←

(δ)). The dimension of this intersection

is m − 1, and thus the number of the vectors it excludes from 1(δ) is

q

m−1

− 1. Therefore, the number of vectors excluded by [R(δ)[ receivers

in δ is at most [R(δ)[q

m−1

− 1 ≤ Nq

m−1

− 1. (Here, the all-zero vector

is counted only once.) Therefore, provided that

q

m

> Nq

m−1

⇔q > N,

we can ﬁnd a valid value for c(δ). Applying the same argument succes-

sively to all coding points concludes the proof.

Note that that this bound may not be tight, since it was obtained by

assuming that the only intersection of the [R(δ)[ (h − 1)-dimensional

spaces 1(R

j

, δ), R

j

∈ R(δ) is the all-zero vector, which was counted

5.2 Centralized Algorithms 81

only once. However, each pair of these spaces intersects on an

(h − 2)-dimensional subspace. Tight alphabet size bounds are known

for networks with two sources and for some other special class of net-

works, which we discuss in Chapter 7.

The remaining point to discuss is how the algorithm identiﬁes a

valid coding vector c(δ

k

) at step k. It is possible to do so either deter-

ministically (e.g., by exhaustive search), or by randomly selecting a

value for c(δ

k

).

Lemma 5.3. A randomly selected coding vector c(δ

k

) at step k of

the LIF preserves the multicast property with probability at least

1 − N/q.

Proof. Receiver R

i

∈ R(δ

k

) requires that c(δ

k

) does not belong to

the (m − 1)-dimensional space spanned by the elements of B

j

¸

¦c(f

j

←

(δ

k

))¦. Randomly selecting a vector in F

m

q

does not satisfy

this property with probability q

m−1

/q

m

= 1/q. Using the fact that

[R(δ

k

)[ ≤ N and applying the union bound the result follows.

From Lemma 5.3, if for example we choose a ﬁeld F

q

of size q > 2N

the probability of success for a single coding vector is at least

1

2

. Thus

repeating the experiment on the average two times per coding point we

can construct a valid network code.

A ﬁnal observation is that we can eﬃciently check whether the

selected vector c(δ

k

) belongs to the space spanned by B

j

¸ ¦c(f

j

←

(δ

k

))¦

by keeping track of, and performing an inner product with, a vector α

j

,

that is orthogonal to this subspace.

3

The LIF algorithm is summarized

in the following Algorithm 5.2. The algorithm can be implemented to

run in expected time O([E[Nh

2

).

3

Note that B

j

\ {c(f

j

←

(δ

k

))} forms an hyperplane, i.e., an h − 1 dimensional subspace,

of the h-dimensional space that can be uniquely described by the associated orthogonal

vector.

82 Network Code Design Methods for Multicasting

Algorithm 5.2: LIF(¦G, S, ¹¦, F

q

with q > N)

( ←coding points of ¦G, S, ¹¦ in topological order

∀δ ∈ (, ¹(δ) ←¦R

j

such that δ is in a path (S

i

, R

j

)¦

∀R

j

∈ ¹,

_

_

_

B

j

←¦e

1

, e

2

, . . . , e

h

¦,

C

j

←¦S

e

1

, S

e

2

, . . . , S

e

h

¦

∀δ ∈ C

j

, a

j

(δ) = e

j

• Repeat for k steps, 1 ≤ k ≤ [([

At step k access δ

k

∈ (

and

_

¸

¸

¸

¸

¸

¸

¸

_

¸

¸

¸

¸

¸

¸

¸

_

Find c(δ

k

) such that c(δ

k

) a

j

(δ

k

) ,= 0 ∀R

j

∈ R(δ

k

)

∀R

j

∈ R(δ

k

)

_

¸

¸

¸

¸

¸

_

¸

¸

¸

¸

¸

_

C

j

←(C

j

¸ ¦f

j

←

(δ

k

)) ∪ δ

k

B

j

←(B

j

¸ ¦c(f

j

←

(δ

k

))¦) ∪ c(δ

k

)

α

j

(δ

k

) ←

1

c(δ

k

)·a

j

(f

j

←

(δ

k

))

a

j

(f

j

←

(δ

k

))

∀δ ∈ C

j

¸ δ

k

:

α

j

(δ) ←α

j

(δ) − (c(δ

k

) α

j

(δ))α

j

(δ

k

)

return (the coding vectors c(δ), ∀δ ∈ ()

5.2.2 Matrix Completion Algorithms

The problem of network code design can be reduced to a problem of

completing mixed matrices. Mixed matrices have entries that consist

both of unknown variables and numerical values. The problem of matrix

completion is to select values for the variables so that the matrix satis-

ﬁes desired properties. For the network code design, the desired prop-

erty is that N matrices sharing common variables are simultaneously

full rank. These are the N transfer matrices from the source to each

receiver, as discussed in Section 3.2.

We now discuss a very nice connection between bipartite graph

matching and matrix completion. Such ideas have been extended to

deterministic algorithms for network code design.

Consider a matrix A whose every entry is either zero or an unknown

variable. The problem of selecting the variable values so that the matrix

is full rank over the binary ﬁeld (if possible) can be reduced to a bipar-

tite matching problem as follows. Construct a bipartite graph with the

5.3 Decentralized Algorithms 83

left nodes corresponding to the rows and the right nodes the columns

of matrix A. For a non-zero entry in the position (i, j) in A, place an

edge between left node i and right node j. A perfect matching is a sub-

set of the edges such that each node of the constructed bipartite graph

is adjacent to exactly one of the edges. Identifying a perfect matching

can be accomplished in polynomial time. It is also easy to see that the

identiﬁed matching corresponds to an assignment of binary values to

the matrix entries (one for an edge used, zero otherwise) such that the

matrix is full rank.

5.3 Decentralized Algorithms

5.3.1 Random Assignment

The basic idea in this algorithm is to operate over a ﬁeld F

q

with q

large enough for even random choices of the coding vector coeﬃcients

to oﬀer with high probability a valid solution. This algorithm requires

no centralized or local information, as nodes in the network simply

choose the non-zero components of local coding vectors independently

and uniformly from F

q

. The associated probability of error can be made

arbitrarily small by selecting a suitably large alphabet size, as the fol-

lowing theorem states.

Theorem 5.4. Consider an instance ¦G = (V, E), S, ¹¦ with N = [¹[

receivers, where the components of local coding vectors are chosen uni-

formly at random from a ﬁeld F

q

with q > N. The probability that

all N receivers can decode all h sources is at least (1 − N/q)

η

, where

η

**≤ [E[ is the maximum number of coding points employed by any
**

receiver.

That is, η

**is the number of coding points encountered in the h paths
**

from the source node to any receiver, maximized over all receivers.

Recall that a network code is valid if it assigns values to variables

α

1

, . . . , α

η

, (η

< η) such that

f(α

1

, . . . , α

η

) = detA

1

detA

2

detA

N

,= 0, (5.1)

84 Network Code Design Methods for Multicasting

where A

j

is the transfer matrix for receiver j, 1 ≤ j ≤ N. We have

seen in Chapters 2 and 3 that for acyclic directed networks, f is a

polynomial with maximum degree in each variable at most N. With

this observation, the following lemma is suﬃcient to prove Theorem 5.4.

Lemma 5.5. Let f(α

1

, . . . , α

η

) be a multivariate polynomial in vari-

ables α

1

, . . . , α

η

, with maximum degree in each variable of at most N

and total degree at most Nη

**. If the values for α
**

1

, . . . , α

η

are chosen

uniformly at random from a ﬁnite ﬁeld F

q

of size q > N on which f

is not identically equal to zero, then the probability that f(α

1

, . . . , α

η

)

equals zero is at most 1 − (1 − N/q)

η

.

We omit the proof of this lemma, and instead present a simple proof

of a slightly weaker but more often used result which, in Theorem 5.4,

replaces η

**by the total number of local coding coeﬃcients η. For this
**

result, the following lemma is suﬃcient.

Lemma 5.6. Let f(α

1

, . . . , α

η

) be a multivariate polynomial in η vari-

ables with the maximum degree in each variable of at most N. If the

values for α

1

, . . . , α

η

are chosen uniformly at random from a ﬁnite ﬁeld

F

q

of size q > N on which f is not identically equal to zero, then the

probability that f(α

1

, . . . , α

η

) equals zero is at most 1 − (1 − N/q)

η

.

Proof. We prove the lemma by induction in the number of variables η:

(1) For η = 1, we have a polynomial in a single variable of degree

at most N. Since such polynomials can have at most N roots,

a randomly chosen element from a ﬁeld F

q

satisfying the

conditions of the theorem will be a root of f with probability

of at most N/q.

(2) For η > 1, the inductive hypothesis is that the claim holds

for all polynomials with fewer than η variables. We express

f in the form

f(α

1

, . . . , α

η

) = α

d

η

f

1

(α

1

, . . . , α

η−1

) + f

2

(α

1

, . . . , α

η

),

5.3 Decentralized Algorithms 85

where d ≤ N, f

1

is a not-identically zero polynomial in η − 1

variables, and f

2

is a polynomial in η variables with degree in

each variable of at most N, that may contain all other powers

α

η

except for α

d

η

. Assume that we uniformly at random chose

values for the variables α

1

, . . . , α

η

from a ﬁnite ﬁeld F

q

of size

q > N. Then

Pr[f = 0] = Pr[f

1

= 0] Pr[f = 0[f

1

= 0]

+(1 − Pr[f

1

= 0]) Pr[f = 0[f

1

,= 0].

We now bound the above three probabilities as follows:

(a) Pr[f

1

= 0] ≤ 1 − (1 − N/q)

(η−1)

by the inductive

hypothesis.

(b) Pr[f = 0[f

1

= 0] ≤ 1.

(c) Pr[f = 0[f

1

,= 0] ≤ N/q, since when we evaluate f at

the values of α

1

, . . . , α

η−1

it becomes a polynomial in

α

η

of degree at least d and at most N.

Therefore

Pr[f = 0] ≤ Pr[f

1

= 0] + (1 − Pr[f

1

= 0])

N

q

by (b) and (c)

= Pr[f

1

= 0]

_

1 −

N

q

_

+

N

q

≤

_

1 −

_

1 −

N

q

_

(η−1)

__

1 −

N

q

_

+

N

q

by (a)

= 1 −

_

1 −

N

q

_

η

.

The randomized coding approach may use a larger alphabet than

necessary. However, it is decentralized, scalable and yields to a very

simple implementation. In fact, as we discuss in the second part of this

review, it is very well suited to a number of practical applications, such

as dynamically changing networks. We next brieﬂy discuss how this

approach, and more generally, network coding, can be implemented in

practice.

86 Network Code Design Methods for Multicasting

5.3.2 Network Coding in Practice: Use of Generations

In practical networks, information is sent in packets. Each packet con-

sists of L information bits, and sets of m consecutive bits are treated

as a symbol of F

q

with q = 2

m

. Coding amounts to linear operations

over the ﬁeld F

q

. The same operations are applied symbol-wise to each

of the L/m symbols of the packet.

Over the Internet packets between a source–destination pair are

subject to random delays, may get dropped, and often follow diﬀerent

routes. Moreover, sources may produce packets asynchronously. It is

thus diﬃcult to implement a centralized network coding algorithm. To

deal with the lack of synchronization, the packets are grouped into

generations. Packets are combined only with other packets in the same

generation. A generation number is also appended to the packet headers

to make this possible (one byte is suﬃcient for this purpose).

Assume that a generation consists of h information packets, which

we will call source packets. A feasible approach to perform network

coding in a distributed and asynchronous manner, is to append to each

packet header an h-dimensional coding vector that describes how the

packet is encoded, that is, which linear combination of the source pack-

ets it carries.

Intermediate nodes in the network buﬀer the packets they receive.

To create a packet to send, they uniformly at random select coeﬃcients,

and symbol-wise combine buﬀered packets that belong in the same

generation. They use the packet’s header information to calculate the

new coding vector to append. The receivers use the appended coding

vectors to decode.

Note that use of the coding vector in the header incurs a small

additional overhead. For example, for a packet that contains 1400 bytes,

where every byte is treated as a symbol over F

2

8, if we have h = 50

sources, then the overhead is approximately 50/1400 ≈ 3.6%.

The size of a generation can be thought of as the number of sources

h in synchronized networks: it determines the size of matrices the

receivers need to invert to decode the information. Since inverting an

h h matrix requires O(h

3

) operations, and also aﬀects the delay, it

is desirable to keep the generation size small. On the other hand, the

5.3 Decentralized Algorithms 87

size of the generation aﬀects how well packets are “mixed,” and thus

it is desirable to have a fairly large generation size. Indeed, if we use a

large number of small-size generations, intermediate nodes may receive

packets destined to the same receivers but belonging to diﬀerent gener-

ations. Experimental studies have started investigating this trade-oﬀ.

5.3.3 Permute-and-Add Codes

Randomized network code requires each intermediate node to perform

O(h

2

) operations for every packet it encodes. The codes we describe in

this section reduce this complexity to O(h). These codes can be thought

of as performing, instead of the randomized network coding over F

q

described previously, randomized vector network coding over F

L

2

, with

q = 2

L

. However, to achieve low encoding complexity, these codes do

not randomly select L L matrices over F

L

2

, but restrict the random

selection over permutation matrices. Each source produces a length

L binary packet. As their name indicates, the permute-and-add codes

have intermediate nodes permute each of their incoming binary packets

and then xor them, to generate the new packet to send. Intermediate

nodes choose uniformly at random which of the L! possible permutation

to apply to each of their incoming packets. Each receiver solves a set

of hL hL binary equations to retrieve the information data.

Theorem 5.7. Let h be the binary min-cut capacity. For any > 0,

the permute-and-add code achieves rate R = h − ([E[ + 1) with prob-

ability greater than 1 − 2

L+log(N

|E|+h

L+1

)

.

We here only give a high-level outline of the proof. The basic idea is an

extension of the polynomial time algorithms presented in Section 5.2.1.

To show that the transfer matrix from the source to each receiver is

invertible, it is suﬃcient to make sure that the information crossing

speciﬁc cut-sets allows to infer the source information. These cut-sets

can be partially ordered, so that no cut-set is visited before its “ances-

tor” cut-sets. The problem is then reduced to ensuring that the transfer

matrix between successive cut-sets is, with high probability, full rank.

88 Network Code Design Methods for Multicasting

5.3.4 Algorithms for Certain Classes of Networks

These algorithms use some high level a priori information about the

class of networks the multicast conﬁguration belongs to. The following

example gives simple decentralized algorithms for networks with two

receivers.

Example 5.1. Codes for networks with two receivers

If we have a minimal conﬁguration with h sources and two receivers,

each coding point needs to simply xor its incoming information symbols

to get a valid network code. This is because the binary alphabet is suf-

ﬁcient for networks with two receivers, and for a minimal conﬁguration,

each input to a coding point have to be multiplied by non-zero coeﬃ-

cients.

Codes for networks with two sources are also simple to construct.

Example 5.2. Codes for networks with two sources

As discussed in Section 3.3, to label the nodes of a subtree graph of a

network with two sources, we can use the points on the projective line

PG(1, q)

[01], [10], and [1α

i

] for 0 ≤ i ≤ q − 2. (5.2)

Recall that for a valid network code, it is suﬃcient and necessary that

the coding vector associated with a subtree lies in the linear span of

the coding vectors associated with its parent subtrees, and the coding

vectors of any two subtrees having a receiver in common are linearly

independent. Since any two diﬀerent points of (5.2) are linearly inde-

pendent in F

2

q

, and thus each point is in the span of any two diﬀerent

points of (5.2), both coding conditions for a valid network code are

satisﬁed if each node in a subtree graph of a network is assigned a

unique point of the projective line PG(1, q). We here present two algo-

rithms which assign distinct coding vectors to the nodes in the subtree

graph.

In Chapter 7, we will discuss a connection between the problem

of designing codes for networks with two sources and the problem of

5.3 Decentralized Algorithms 89

graph coloring. Using this connection we can additionally employ all

graph coloring algorithms for network code design in this special case.

Such algorithms may result in codes that are more eﬃcient in terms of

the alphabet size. However, those presented here are decentralized and

mostly scalable to network changes.

The ﬁrst method is inspired by the Carrier Sense Multiple Access

(CSMA) systems, where access to the network is governed by the

possession of a token. The token is circulating around the network,

and when a device needs to transmit data, it seizes the token when

it arrives at the device. The token remains at the possession of the

transmission device until the data transfer is ﬁnished. In our algo-

rithm, a token will be generated for each coding vector. Each cod-

ing point (associated terminal) seizes a token and uses the associ-

ated coding vector. If a change in the network occurs, for example,

receivers leave the multicast session, tokens that are no longer in use

may be released back in the network to be possibly used by other coding

points.

An alternative simple way to organize a mapping from coding vec-

tors to coding points is described below. Recall that at each sub-

tree, we locally know which receivers it contains and which sources

are associated with each receiver (at the terminal before the cod-

ing point). In networks with two sources, each coding subtree con-

tains at least one receiver node associated with S

1

and at least one

receiver node associated with S

2

. Let R(T; S

1

) = ¦R

i

1

, R

i

2

, . . . , R

iu

¦,

where i

1

< i

2

< < i

u

be the set of receivers associated with S

1

in

a given subtree T. We choose [1 α

i

1

−1

] to be the label of that subtree.

This way no other subtree can be assigned the same label since the

receiver R

i

1

can be associated with the source S

1

in at most one sub-

tree. Note that this is not the most eﬃcient mapping as it may require

alphabet size of q = N + 1, as opposed to q = N. This is because for N

receivers, we will use coding vectors from the set [1α

i

] for 0 ≤ i ≤ q − 2

with q − 2 = N − 1.

We now discuss how the code design problem can be simpliﬁed at the

cost of using some additional network resources. These algorithms are

well suited to applications such as overlay and ad-hoc networks, where

90 Network Code Design Methods for Multicasting

we have at our disposal large graphs rather then predetermined sets of

paths from sources to receivers.

Example 5.3. One way to take advantage of the increased connec-

tivity is to (if possible) partition the network G with h sources (h

even) into h/2 independent two-source conﬁgurations, and then code

each subnetwork separately in a possibly decentralized manner. Algo-

rithm 5.3 outlines this procedure.

Algorithm 5.3: Paired-Sources Code(G)

Group sources into pairs, and

for each pair of sources (S

i

1

, S

i

2

)

_

_

_

Find paths (S

i

1

, R

j

), (S

i

2

, R

j

), 1 ≤ j ≤ N in G

Design a network code based on these paths.

Update G by removing the used paths.

The following example provides the natural generalization of Exam-

ple 5.2 to networks with h sources.

Example 5.4. Codes that employ vectors in general position

In a network with [([ coding points, a simple decentralized code design

is possible if we are willing to use an alphabet of size [([ + h − 1 as

well as additional network resources (edges and terminals) to ensure

that the min-cut to each coding point be h. The basic idea is that the

h coding vectors of the parents of each coding point T, form a basis of

the h-dimensional space. Thus, any coding vector in the h-dimensional

space is eligible as c(T). Additionally, selecting the coding vectors to

be points in general position (see Appendix) ensures that each receiver

observes h linearly independent coding vectors.

More speciﬁcally, for a minimal conﬁguration with h sources and [([

coding points where the min-cut to each coding point is h, an alphabet

of size [([ + h − 1 is suﬃciently large for decentralized coding. Indeed,

we can use the [([ + h points in general position in a normal rational

5.4 Scalability to Network Changes 91

curve in PG(h − 1, [([ + h − 1) (see Appendix) to assign h vectors to

the source subtrees and [([ vectors to the coding subtrees.

Here, we can think of coding points as edges incident to special

network terminals, that have not only enhanced computational power,

but also enhanced connectivity. Note that, even in this case, multicast

with network coding requires lower connectivity then multicast without

coding, because multicast without coding is possible iﬀ the min-cut

toward every node of the graph is h, not just the coding points.

5.4 Scalability to Network Changes

A desirable characteristic of network codes is that codes do not change

when receivers join or leave the network. For the following discussion,

we will use the subtree decomposition described in Section 3.3. An

advantage of decentralized codes that employ vectors in general posi-

tion is that they do not have to be changed with the growth of the

network as long as the network’s subtree decomposition retains the

same topology regardless of the distribution of the receivers, or the

new subtree graph contains the original subtree graph.

In a network using decentralized coding, the coding vectors associ-

ated with any h subtrees provide a basis of the h-dimensional space.

We can think of subtrees as “secondary sources” and allow the new

receivers to connect to any h diﬀerent subtrees. Thus we can extend

the multicast network, without any coding/decoding changes for exist-

ing users. Subtree decomposition enables scalability in the described

manner even when the code is not decentralized, since we can have new

receivers contact subtrees, much like today terminals contact servers,

until they connect to h subtrees that allow them to retrieve the source

information.

In the above scenario, the receivers are allowed to join the network

only in a way that will not change the topology of the subtree graph.

Alternatively, if for example in a network with two sources, addition of

new receivers results in new subtrees without disturbing the existing

ones, then the existing code can be simply extended without any cod-

ing/decoding changes for existing users. Note that the projective line

PG(1, q) (see Appendix) can be thought of as a subset of the projective

92 Network Code Design Methods for Multicasting

line PG(1, q

), where F

q

is an extension ﬁeld of F

q

. Thus, if we need

to create additional coding vectors to allocate to new subtrees, we can

employ unused points from the projective line PG(1, q

).

Notes

LIF, proposed by Sanders et al. in [43], and independently by Jaggi

et al. in [29] was the ﬁrst polynomial algorithms for network code

design (see also [30]). These algorithms were later extended to include

procedures that attempt to minimize the required ﬁeld size by Barbero

and Ytrehus in [6]. Randomized algorithms were proposed by Ho et al.

in [26], and also by Sanders et al. in [43], and their asynchronous imple-

mentation over practical networks using generations by Chou et al.

in [16]. Lemma 5.5 instrumental for Theorem 5.4 was proved by Ho et al.

in [26]. Codes that use the algebraic structure were designed by Koetter

and M´edard [32], while the matrix completion codes were investigated

by Harvey in [25]. Permute-and-add codes were recently proposed by

Jaggi et al. in [28]. Decentralized deterministic code design was intro-

duced by Fragouli and Soljanin in [24]. Minimal conﬁgurations and the

brute force algorithm to identify them were also introduced in [24].

Improved algorithms for identifying minimal conﬁgurations were con-

structed in [34].

6

Networks with Delay and Cycles

For most of this review we have assumed that all nodes in the network

simultaneously receive all their inputs and produce their outputs, and

that networks have no cycles. We will now relax these assumptions.

We ﬁrst look at how to deal with delay over acyclic graphs. We then

formally deﬁne networks with cycles, and discuss networks that have

both cycles and delay. In fact, we will see that one method to deal with

cycles is by artiﬁcially introducing delay.

6.1 Dealing with Delay

Network links typically introduce variable delay which may cause prob-

lems of synchronization in the network code operations. There are

two known ways to deal with networks with delay: the asynchronous

approach, brieﬂy discussed in Chapter 5, does not allocate to network

nodes predetermined operations, but instead, divides information pack-

ets into generations, and appends a “generation identiﬁcation tag” to

the packet. Intermediate network nodes store the received packets of

the same generation, opportunistically combine them, and append the

associated coding vector to the outgoing packets. Here network nodes

93

94 Networks with Delay and Cycles

only need to know at which rate to inject coded packets into their out-

going links. This approach is ideally suited to dynamically changing

networks, where large variations in delay may occur.

In contrast, the approach discussed in this chapter deals with the

lack of synchronization by using time-windows at intermediate network

nodes to absorb the variability of the network links delay. This second

approach is better suited for networks with small variability in delay.

Intermediate nodes wait for a predetermined time interval (larger than

the expected delay of the network links) to receive incoming packets

before combining them. Under this scenario, we can model the network

operation by associating one unit delay T (or more) either with each

edge or only with edges that are coding points. We next discuss how to

design network codes while taking into account the introduced (ﬁxed)

delay.

Example 6.1. Consider the butterﬂy network in Figure 1.2 and

assume that there is a unit delay associated with every edge of the

graph. Let σ

1

(t) and σ

2

(t) denote the bits transmitted by the source at

time t, t ≥ 1. Assume that node C still simply xors its incoming infor-

mation bits. Then through edge AD, the receiver R

1

gets the sequence

of bits

¦∗, σ

1

(1), σ

1

(2), σ

1

(3), σ

1

(4), . . .¦ =

t=1

T

t+1

σ

1

(t),

and through edge ED, the sequence of bits

¦∗, ∗, ∗, σ

1

(1) + σ

2

(1), σ

1

(2) + σ

2

(2) + ¦ =

t=1

T

t+3

(σ

1

(t) + σ

2

(t)),

where “∗” means that nothing was received through the edge during

that time slot. Equivalently, receiver R

1

observes sequences y

1

(T) and

y

2

(T) given as

_

y

1

(T)

y

2

(T)

_

=

_

T 0

T

3

T

3

_

. ¸¸ .

A

1

_

σ

1

(T)

σ

2

(T)

_

,

where σ

1

(T) =

t

σ

1

(t)T

t

and σ

2

(T) =

t

σ

2

(t)T

t

, and we use A

1

to

denote the transfer matrix. For acyclic graphs, the elements of the

6.1 Dealing with Delay 95

transfer matrix are polynomials in T. Receiver R

1

can decode at time

t + 3 the symbols σ

1

(t) and σ

2

(t). Namely, it can process its received

vectors by multiplying with the inverse transfer matrix B

1

.

_

T

2

0

T

2

1

_

. ¸¸ .

B

1

_

y

1

(T)

y

2

(T)

_

= T

3

_

1 0

0 1

__

σ

1

(T)

σ

2

(T)

_

.

In practice, ﬁxed delay only means that instead of dealing with

matrices and vectors whose elements belong to F

q

, we now have to

deal with those whose elements are polynomials in the delay opera-

tor T with coeﬃcients over F

q

, i.e., belong in the ring of polynomials

F

q

[T] = ¦

i

α

i

T

i

, α

i

∈ F

q

¦. More generally, as we will discuss in the

section about cycles, we will be dealing with vector spaces over the

ﬁeld of rational functions of polynomials in F

q

[T], that is, elements of

F

q

(T) =

_

a(D)

b(D)

, a(T), b(T) ∈ F

q

[T], b(T) ,= 0

_

. In this framework, we

can still talk about full rank and invertible matrices, where a matrix

B is the inverse of a matrix A if AB = BA = I. The goal of network

code design is to select coding vectors so that each receiver R

j

observes

an invertible transfer matrix A

j

. It is easy to see that the main theo-

rem in network coding (Theorem 2.2) still holds. Indeed, we can follow

a very similar proof, merely replacing elements of F

q

with elements

of F

q

(T).

6.1.1 Algorithms

There are basically two ways to approach network code design for net-

works for delay:

(1) Coding vectors have elements over F

q

. That is, intermedi-

ate nodes are only allowed to linearly combine over F

q

the

incoming information streams, but do not incur additional

delay. The delay is introduced only by the network links.

(2) Coding vectors have elements over F

q

[T] or F

q

(T). That

is, intermediate nodes may store their incoming informa-

tion streams, and transmit linear combinations of their

past and current inputs, i.e., introduce additional delay.

96 Networks with Delay and Cycles

The beneﬁts we may get in this case as compared to the

previous approach, is operation over a small ﬁeld F

q

, such as

the binary ﬁeld.

In both cases, all algorithms described in Chapter 5 for network code

design can eﬀectively still be applied. For example, consider the LIF

algorithm in Section 5.2.1. This algorithm sequentially visits the coding

points in a topological order and assigns coding vectors to them. Each

assigned coding vector is such that all downstream receivers can still

decode h source symbols produced during the same time-slot. In the

case of networks with delay, it is suﬃcient to select coding vectors so

that all downstream receivers get h linear independent combinations

that allow them to decode h information symbols, one from each source,

but where the h symbols are not necessarily produced during the same

time slot.

6.1.2 Connection with Convolutional Codes

If we associate delay with the network links, we can think of the net-

work as a convolutional code where memory elements are connected as

dictated by the network topology, inputs are connected at the sources

and outputs are observed at the receivers. This code is described

by the ﬁnite-dimensional state-space equations (3.2) and the transfer

matrix (3.3).

If we associate delay only the coding points in the graph, we obtain a

convolutional code corresponding to the subtree graph of the network.

An example subtree conﬁguration is shown in Figure 6.1(a) and its

corresponding convolutional code in Figure 6.1(b).

Convolutional codes are decoded using a trellis and trellis decod-

ing algorithms, such as the Viterbi algorithm. Given that the receivers

observe the convolutional code outputs with no noise, in trellis decod-

ing we can proceed step by step, at each step decoding h information

symbols and keeping track of one state only. In trellis decoding ter-

minology, the decoding trace-back depth is one. Thus the complexity

of decoding is determined by the complexity of the trellis diagram,

namely, the number of states and the way they are connected.

6.1 Dealing with Delay 97

Fig. 6.1 Conﬁguration with 2 sources and 5 receivers: (a) the subtree graph; (b) the corre-

sponding convolutional encoder.

One way to reduce the decoder complexity is to identify, among

all encoders that satisfy the multicast condition and the constraints

of a given topology, the one that has the smallest number of mem-

ory elements, and thus the smallest number of states. Note that the

minimization does not need to preserve the same set of outputs, as we

are not interested in error-correcting properties, but only the min-cut

condition for each receiver.

Another way to reduce the decoder complexity is to use for decoding

the trellis associated with the minimal strictly equivalent encoder to

G

i

(T). Two codes are strictly equivalent if they have the same mapping

of input sequences to output sequences. Among all strictly equivalent

encoders, that produce the same mapping of input to output sequences,

the encoder that uses the smallest number of memory elements is called

minimal.

Typically for convolutional encoders, we are interested in equivalent

encoders, that produce the same set of output sequences. Only recently,

with the emergence of turbo codes where the mapping from input to

output sequences aﬀects the code’s performance, the notion of strictly

equivalent encoders has become important. Here we provide another

example where this notion is useful. Note that in the conventional use

of convolutional codes, there is no need to use a diﬀerent encoder to

encode and decode. In our case, we are restricted by the network con-

ﬁguration for the choice of the encoder G

i

(T). However, given these

constraints, we still have some freedom at the receiver to optimize for

decoding complexity.

98 Networks with Delay and Cycles

6.2 Optimizing for Delay

Consider the following scenario. A source needs to multicast h infor-

mation bits (or packets), ¦b

1

, b

2

, . . . , b

h

¦, to a set of N receivers with

as low delay as possible over a network with a delay associated with

traversing every link of the network. For multimedia applications, a

delay measure of interest is the inter-arrival delay. Deﬁne delay δ

ij

to

be the number of time slots between the moment when receiver R

j

suc-

cessfully decodes bit b

i−1

to the moment when it successfully decodes

bit b

i

. We are interested in identifying the optimal routing and network

coding strategy, such that the overall average inter-arrival delay

j

i

δ

ij

Nh

is minimized. Note that use of network coding may allow us to achieve

higher information rates, thus reducing the delay. On the other hand,

use of network coding may increase the delay, because a receiver may

need to wait for the arrival of several coded bits before decoding.

As the following example illustrates, either routing or network cod-

ing or a combination of the two may enable us to achieve the optimal

delay when multicasting. But neither routing nor network coding may

allow us to achieve the delay that a receiver would experience were

it to use all the network resources by itself. When in Part II we dis-

cuss network coding applications to wireless, we will see other examples

showing how use of network coding may enable delay reduction.

Example 6.2. Consider the butterﬂy network in Figure 1.2 and

assume that the source would like to deliver m bits to both receivers.

Assume that edges AD and BF have unit delay, while edge CE has an

associated delay of α delay units.

Then with network coding, the average interarrival delay equals

1

2(α − 1) +

m−2(α−1)

2

m

=

1

2

+

α − 1

m

1

For simplicity, we ignore the fact that the network coding solution does not preserve the

ordering of the bits.

6.3 Dealing with Cycles 99

timeslots. Indeed, for the ﬁrst α − 1 timeslots each receiver gets one

uncoded bit through AD and BF, respectively, then for (m − 2(α −

1))/2 timeslots each receiver can successfully decode two bits per time-

slot, and ﬁnally for α − 1 timeslots the receivers only receive and decode

one bit per timeslot through CE.

On the other hand, by using routing along AD and BF and time-

sharing along CE, the source can deliver the information in

α − 1 +

2(m−α+1)

3

m

=

2

3

+

1

3

α − 1

m

timeslots. Depending on the value of α and m, the network coding solu-

tion might lead to either less or more delay than the routing solution.

Finally, if a receiver exclusively uses the network resources, the

resulting delay would be

1

2

+

1

2

α − 1

m

that outperforms both previous approaches.

6.3 Dealing with Cycles

In this section, we consider network topologies that have cycles. We

distinguish two cases, depending on whether a partial order con-

straint, which we describe below, is satisﬁed. The problem of cycles

has to be addressed only for networks which do not satisfy this

constraint.

We observe that each path (S

i

, R

j

) from source S

i

to receiver R

j

induces a partial order on the set of edges: if edge a is a child of edge

b then we say that a < b. The source node edge is the maximal ele-

ment. Each diﬀerent path imposes a diﬀerent partial order on the

same set of edges. We distinguish the graphs depending on whether

the partial orders imposed by the diﬀerent paths are consistent. Con-

sistency implies that for all pairs of edges a and b, if a < b in some

path, there does not exist a path where b < a. A suﬃcient, but not

necessary, condition for consistency is that the underlying graph G is

acyclic.

100 Networks with Delay and Cycles

Fig. 6.2 Two networks with a cycle ABCD: (a) paths (S

1

, R

1

) = S

1

A →AB →BC →

CD →DR

1

and (S

2

, R

2

) = S

2

B →BC →CD →DA →AR

2

impose consistent partial

orders to the edges of the cycle; (b) paths (S

1

, R

1

) = S

1

A →AB →BC →CD →DR

1

and (S

2

, R

2

) = S

2

C →CD →DA →AB →BR

2

impose inconsistent partial orders to the

edges of the cycle.

Example 6.3. Consider the two networks in Figure 6.2. Sources S

1

and S

2

use the cycle ABCD to transmit information to receivers R

1

and R

2

, respectively. In the network shown in Figure 6.2(a), the paths

from sources S

1

and S

2

impose consistent partial orders on the edges

of the cycle. In the network shown in Figure 6.2(a), for the path from

source S

1

, we have AB > CD, whereas for the path from source S

2

, we

have CD > AB.

When the partial orders imposed by the diﬀerent paths are consis-

tent, although the underlying network topology contains cycles, the line

graph associated with the information ﬂows does not. Thus, from the

network code design point of view, we can treat the network as acyclic.

When the partial orders imposed by the diﬀerent paths are not consis-

tent, then the network code design needs to be changed accordingly. It

is this set of networks we call networks with cycles.

6.3.1 Delay and Causality

Our ﬁrst observation is that, in networks with cycles, we need to intro-

duce delay in each cycle to guarantee consistent and causal information

propagation around the cycle, as the following example illustrates.

6.3 Dealing with Cycles 101

Fig. 6.3 A networks with a cycle ABA. Unless we introduce delay in the cycle, the infor-

mation ﬂow is not well deﬁned.

Example 6.4. Consider nodes A and B that insert the independent

information ﬂows x

A

and x

B

in the two-edge cycle (A, B, A), as depicted

in Figure 6.3. Consider the network coding operation at node A. If we

assume that all nodes instantaneously receive messages from incoming

links and send them to their outgoing links, then node A simultaneously

receives x

A

and x

BA

and sends x

AB

. Similarly for node B. This gives

rise to the equation

x

AB

= x

A

+ x

BA

= x

A

+ x

B

+ x

AB

⇒x

A

+ x

B

= 0,

which is not consistent with the assumption that x

A

and x

B

are inde-

pendent. It is also easy to see that local and global coding vectors are

no longer well deﬁned.

6.3.2 Code Design

There are two ways to approach designing network codes for networks

with cycles. The ﬁrst approach observes that networks with cycles cor-

respond to systems with feedback, or, in the convolutional code frame-

work, recursive convolutional codes. That is, the transfer matrices are

rational functions of polynomials in the delay operator. The convolu-

tional code framework enables us to naturally take into account cycles

without changing the network code design: we select coding opera-

tions so that each receiver observes a full rank transfer matrix. Note

102 Networks with Delay and Cycles

that decoding can still be performed using the trellis diagram of the

recursive convolutional code, a trellis decoding algorithm like Viterbi,

and trace-back depth of one. The decoding complexity mainly depends

on the size of the trellis, exactly as in the case of feed-forward convo-

lutional codes. As a result, cycles per se do not increase the number of

states and the complexity of trellis decoding.

The second network code design approach attempts to remove the

cycles. This approach applies easily to networks that have simple cycles,

i.e., cycles that do not share edges with other cycles. Observe that an

information source needs to be transmitted through the edges of a cycle

at most once, and afterwards can be removed from the circulation by

the node that introduced it. We illustrate this approach through the

following example.

Example 6.5. Consider the cycle in Figure 6.2(b), and for simplicity

assume that each edge corresponds to one memory element. Then the

ﬂows through the edges of the cycle are

AB : σ

1

(t − 1) + σ

2

(t − 3)

BC : σ

1

(t − 2) + σ

2

(t − 4)

CD : σ

1

(t − 3) + σ

2

(t − 1)

DA : σ

1

(t − 4) + σ

2

(t − 2),

(6.1)

where σ

i

(t) is the symbol transmitted from source S

i

at time t. Oper-

ations in (6.1) can be easily implemented by employing a block of

memory elements as shown in Figure 6.4. Thus, we can still have a

feed-forward encoder by representing the cycle with a block of memory

elements.

The approach in Example 6.5 generalizes to arbitrary graphs with

simple cycles.

A straightforward algorithm is to start with the graph γ =

(S

i

, R

j

)

(or the subtree graph) where vertices correspond to edges in the original

graph G, and treat each cycle separately. Every vertex k in γ (edge

in the original graph) corresponds to a state variable s

k

. We need to

express how the information that goes through each s

k

evolves with

6.3 Dealing with Cycles 103

Fig. 6.4 Block representation of the cycle in Figure 6.2(b).

time. To do that, we can look at the I inputs (incoming edges) of

the cycle (for example in Figure 6.4, we have I = 2 inputs). Each input

follows a path of length L

i

(in Figure 6.4, L

1

= L

2

= 3). For each input,

we create L

i

memory elements. Through each edge of the cycle ﬂows

a linear combination of a subset of these

L

i

memory elements. This

subset is deﬁned by the structure of the paths and the structure of the

cycle, i.e., it is not selected by the code designer. The code designer

can only select the coeﬃcients for the linear combinations. Using this

approach, we create an expanded matrix A, that contains, for every

cycle, an additional set of

L

i

memory elements. For example, in

Figure 6.4, we use six additional memory elements, thus obtaining a

convolutional code with a total of 12 memory elements. We can think

of this approach as “expanding in time” when necessary, and along

speciﬁc paths.

This approach does not extend to networks with overlapping cycles,

as the following example illustrates.

Example 6.6. Consider the network depicted in Figure 6.5 that has

a knot, that is, an edge (in this case r

2

r

3

) shared by multiple cycles.

There are three receivers t

1

, t

2

, and t

3

and four sources A, B, C, and D.

All four information streams σ

A

, σ

B

, σ

C

, and σ

D

have to ﬂow through

the edge r

2

r

3

. Consider ﬂow x

A

: to reach receiver t

1

it has to circulate

104 Networks with Delay and Cycles

Fig. 6.5 A network conﬁguration with multiple cycles sharing the common edge r

2

r

3

(knot).

Fig. 6.6 Convolutional code corresponding to the network conﬁguration in Figure 6.5 if we

associate unit delay with the edges r

1

r

2

, r

4

r

2

, r

3

r

1

, and r

3

r

4

.

inside the cycle (r

2

, r

3

, r

4

). Since none of the nodes r

2

, r

3

, and r

4

has

σ

A

, σ

A

cannot be removed from the cycle. Thus, the second approach

would not work. To apply the ﬁrst approach, assume we associate a

unit delay element with each of the edges r

1

r

2

, r

4

r

2

, r

3

r

1

, and r

3

r

4

.

The resulting convolutional code is depicted in Figure 6.6.

Notes

Coding in networks with cycles and delay was ﬁrst considered in the

seminal papers on network coding by Ahlswede et al. [2, 36]. Using poly-

nomials and rational functions over the delay operator T to take delay

6.3 Dealing with Cycles 105

into account was proposed, together with the algebraic approach, by

Koetter and M´edard in [32]. The connection with convolutional codes

was observed by Fragouli and Soljanin in [23] and independently by

Erez and Feder in [21]. Using a partial order to distinguish between

cyclic and acyclic graphs was proposed by Fragouli and Soljanin in [24]

and later extended by Barbero and Ytrehus in [7]. The “knot example”

comes from [7]. Methods to code over cyclic networks were also exam-

ined by Ho et al. in [27]. A very nice tutorial on convolutional codes

is [39].

7

Resources for Network Coding

Now that we have learned how to design network codes and how much

of throughput increase to expect in networks using network coding,

it is natural to ask how much it costs to operate such networks. We

focus our discussion on resources required to linear network coding

for multicasting, as this is the better understood case today. We can

distinguish the complexity of deploying network coding to the following

components:

(1) Set-up complexity: This is the complexity of designing the

network coding scheme, which includes selecting the paths

through which the information ﬂows, and determining the

operations that the nodes of the network perform. In a time-

invariant network, this phase occurs only once, while in a

time-varying conﬁguration, the complexity of adapting to

possible changes becomes important. Coding schemes for net-

work coding and their associated design complexity are dis-

cussed in Chapter 5.

(2) Operational complexity: This is the running cost of using

network coding, that is, the amount of computational and

106

107

network resources required per information unit successfully

delivered. Again this complexity is strongly correlated to the

employed network coding scheme.

In linear network coding, we look at information streams

as sequences of elements of some ﬁnite ﬁeld F

q

, and coding

nodes linearly combine these sequences over F

q

. To recover

the source symbols, each receiver needs to solve a system of

h h linear equations, which requires O(h

3

) operations over

F

q

if Gaussian elimination is used. Linearly combination of

h information streams requires O(h

2

) ﬁnite ﬁeld operations.

The complexity is further aﬀected by

(a) The size of the ﬁnite ﬁeld over which we operate.

We call this the alphabet size of the code. The

cost of ﬁnite ﬁeld arithmetics grows with the ﬁeld

size. For example, typical algorithms for multiplica-

tions/inversions over a ﬁeld of size q = 2

n

require

O(n

2

) binary operations, while the faster imple-

mented algorithms (using the Karatsuba method for

multiplication) require O(n

log3

) = O(n

1.59

) binary

operations. Moreover, the alphabet size also aﬀects

the required storage capabilities at intermediate

nodes of the network. We discuss alphabet size

bounds in Section 7.1.

(b) The number of coding points in the network where we

need to perform the encoding operations. Note that

network nodes capable of combining incoming infor-

mation streams are naturally more expensive than

those merely capable of duplicating and forward-

ing incoming packets, and can possibly cause delay.

Therefore, from this point of view as well, it is in our

interest to minimize the number of coding points. We

look into this issue in Section 7.2.

There are also costs pertinent speciﬁcally to wireless networks, which

we discuss in the chapter on wireless networks, in part II of the review.

108 Resources for Network Coding

7.1 Bounds on Code Alphabet Size

We are here interested in the maximum alphabet size required for a

multicast instance ¦G, S, ¹¦ with h sources and N receivers. That is,

the alphabet size that is suﬃcient but not necessary for all networks

with h sources and N receivers. Recall that the binary alphabet is

suﬃcient for networks which require only routing.

7.1.1 Networks with Two Sources and N Receivers

To show that an alphabet of size q is suﬃcient, we can equivalently

prove that we can construct a valid code by using as coding vectors the

k = q + 1 diﬀerent vectors over F

2

q

in the set

¦[1 0], [0 1], [1 α], . . . , [1 α

q−1

]¦, (7.1)

where α is a primitive element of F

q

. Any two such coding vectors form

a basis of the two-dimensional space (see Example A.2 in Appendix).

For the rest of this section, we use the combinatorial framework of

Section 3.3. We restrict our attention to minimal subtree graphs, since

a valid network code for a minimal subtree graph Γ directly translates

to a valid network code for any subtree graph Γ

**that can be reduced
**

to Γ (the code for Γ simply does not use some edges of Γ

). Thus,

an alphabet size suﬃcient for all possible minimal subtree graphs is

suﬃcient for all possible non-minimal subtree graphs as well.

Let Γ be a minimal subtree graph with n > 2 vertices (subtrees);

n − 2 is the number of coding subtrees (note that when n = 2, Γ has

only source subtrees and no network coding is required). We relate

the problem of assigning vectors to the vertices of Γ to the problem

of vertex coloring a suitably deﬁned graph Ω. Vertex coloring is an

assignment of colors to the vertices of a graph so that adjacent vertices

have diﬀerent colors. In our case, the “colors” are the coding vectors,

which we can select from the set (7.1), and the graph to color Ω is a

graph with n vertices, each vertex corresponding to a diﬀerent subtree

in Γ. We connect two vertices in Ω with an edge when the corresponding

subtrees cannot be allocated the same coding vector.

If two subtrees have a common receiver node, they cannot be

assigned the same coding vector. Thus, we connect the corresponding

7.1 Bounds on Code Alphabet Size 109

vertices in Ω with an edge which we call receiver edge. Similarly, if

two subtrees have a common child, by Theorem 3.5, they cannot be

assigned the same coding vector. We connect the corresponding ver-

tices in Ω with an edge which we call a ﬂow edge. These are the only

independence conditions we need to impose: all other conditions follow

from these and minimality. For example, by Theorem 3.5, parent and

child subtrees cannot be assigned the same coding vector. However,

we need not worry about this case separately since by Theorem 3.6, a

parent and a child subtree have either a child or a receiver in common.

Figure 7.1 plots Ω for an example subtree graph (which we considered

in Chapter 3).

Lemma 7.1. For a minimal conﬁguration with n > 2, every vertex i

in Ω has degree at least 2.

Proof. We prove the claim separately for source and coding subtrees.

(1) Source subtrees: If n = 3, the two source subtrees have exactly

one child which shares a receiver with each parent. If n > 3,

the two source subtrees have at least one child which shares

a receiver or a child with each parent.

(2) Coding subtrees: Each coding subtree has two parents. Since

the conﬁguration is minimal, it cannot be allocated the same

Fig. 7.1 A subtree graph Γ and its associated graph Ω. The receiver edges in Ω are labeled

by the corresponding receivers.

110 Resources for Network Coding

coding vector as either of its parents. This implies that in Ω

there should exist edges between a subtree and its parents,

that may be either ﬂow edges, or receiver edges, and the

corresponding vertex has degree at least two.

Deﬁnition 7.1. The chromatic number of a graph is the minimum

number of colors required to color the vertices of the graph, so that

no two adjacent vertices are assigned the same color. A graph is

k-chromatic if its chromatic number is exactly k.

Lemma 7.2. ( [9, Ch. 9]) Every k-chromatic graph has at least k

vertices of degree at least k − 1.

Theorem 7.3. For any minimal conﬁguration with N receivers, the

code alphabet F

q

of size

¸

_

2N − 7/4 + 1/2|

is suﬃcient. There exist conﬁgurations for which it is necessary.

Proof. Assume that our graph Ω has n vertices and chromatic number

χ(Ω) = k ≤ n. Let m − k, where m is a nonnegative integer. We are

going to count the number of edges in Ω in two diﬀerent ways:

(1) From Lemmas 7.1 and 7.2, we know that each vertex has

degree at least 2, and at least k vertices have degree at least

k − 1. Consequently, we can lower bound the number of edges

of Ω as

E(Ω) ≥

k(k − 1) + 2m

2

. (7.2)

(2) Since there are N receivers and n − 2 coding subtrees, we

have at most N receiver edges and at most n − 2 distinct

ﬂow edges (we count parallel edges only once). Thus,

E(Ω) ≤ N + n − 2 = N + k + m − 2. (7.3)

7.1 Bounds on Code Alphabet Size 111

From (7.2) and (7.3), we obtain

N ≥

k(k − 1)

2

− k + 2. (7.4)

Equation (7.4) provides a lower bound on the number of receivers we

need in order to have chromatic number k. Solving for q = k − 1 we

get the bound

q ≤ ¸

_

2N − 7/4 + 1/2|.

This proves the ﬁrst claim of the theorem that, for any minimal con-

ﬁguration with N receivers, an alphabet of size ¸

_

2N − 7/4 + 1/2| is

suﬃcient.

To show that there exist conﬁgurations for which an alphabet of

this size is necessary, we are going to construct a subtree graph where

(7.4) becomes equality, i.e., we will construct a subtree graph that

has N =

k(k−1)

2

− k + 2 receivers and the corresponding graph Ω has

chromatic number k. We start with a minimal subtree graph Γ that

has k vertices and k − 1 receivers. Such a graph can be constructed as

depicted in Figure 7.2. The corresponding graph Ω has k − 2 ﬂow edges

and k − 1 receiver edges. Add

k(k−1)

2

− [(k − 2) + (k − 1)] receivers, so

that Ω becomes a complete graph with E(Ω) =

k(k−1)

2

edges. Thus Ω

cannot be colored with less than k colors. The corresponding subtree

graph has N =

k(k−1)

2

− k + 2 receivers, and requires an alphabet of

size q = k − 1.

7.1.2 Networks with h Sources and N Receivers

Based on the results of the previous section, a lower bound on the

alphabet size a network with h sources and N receivers may require is

¸

_

2N − 7/4 + 1/2|, since such a network may contain a two-source

subnetwork that requires this alphabet size. The interesting question

is whether there are networks that require larger alphabet.

Theorem 7.4. For all conﬁgurations with h sources and N receivers,

an alphabet of size greater than N is always suﬃcient.

112 Resources for Network Coding

Fig. 7.2 A minimal subtree graph for a network with two sources, N receivers, and N − 1

coding subtrees.

We have already given two diﬀerent proofs of this result, in Theorem 3.2

and Lemma 5.2. Here we discuss whether this bound is tight or not.

The proof of Theorem 3.2 does not use the fact that we only care

about minimal conﬁgurations, where paths cannot overlap more than

a certain number of times, and thus the entries of the matrices A

j

are

not arbitrary. For example, there does not exist a minimal conﬁguration

with two sources and two receivers corresponding to the matrices

A

1

=

_

1 0

0 x

_

, A

2

=

_

1 1

1 x

_

. (7.5)

In fact, as we saw in Theorem 3.5, there exist two minimal conﬁguration

with two sources and two receivers. Therefore, the proof of Lemma 3.1

is more general than it needs to be, and does not necessarily give a tight

bound. Similarly, recall that in the proof of Lemma 5.2, we overesti-

7.2 Bounds on the Number of Coding Points 113

mated the number of vectors that cannot be used at a certain coding

point. Therefore, this lemma may not give a tight alphabet size either.

In fact, up to today, we do not have examples of networks where the

alphabet size needs to be larger than O(

√

N). It has also been shown

that this alphabet size is suﬃcient, under some regularity conditions

or special network structure. The conjecture that this is indeed the

suﬃcient alphabet size for all networks is as far as we know today

open.

7.1.3 Complexity

It is well known that the problem of ﬁnding the chromatic number of a

graph is NP-hard. We saw in Section 7.1.1 that the problem of ﬁnding

the minimum alphabet size can be reduced to the chromatic number

problem. Here we show that the reverse reduction also holds: we can

reduce the problem of coloring an arbitrary graph G with the minimum

number of colors, to the problem of ﬁnding a valid network code with

the smallest alphabet size, for an appropriately chosen network (more

precisely, an appropriately chosen subtree graph Γ, that corresponds

to a family of networks). This instance will have two sources, and the

coding vectors (corresponding to colors) will be the vectors in (7.1).

Let G = (V, E) be the graph whose chromatic number we are seek-

ing. Create a subtree graph Γ = (V

γ

= V, E

γ

), that has as subtrees the

vertices V of G. We will ﬁrst select the subtrees (vertices) in Γ that act

as source subtrees. Select an edge e ∈ E that connects vertices v

1

and

v

2

, with v

1

, v

2

∈ V . Let v

1

and v

2

in Γ act as source subtrees, and the

remaining vertices as coding subtrees, that is, E

γ

= ¦(v

1

, v), (v

2

, v)[v ∈

V ¸ ¦v

1

, v

2

¦¦. For each e = (v

1

, v

2

) ∈ E, create a receiver that observes

v

1

and v

2

in Γ. It is clear that ﬁnding a coding scheme for Γ is equivalent

to coloring G. We thus conclude that ﬁnding the minimum alphabet

size is also an NP-hard problem.

7.2 Bounds on the Number of Coding Points

As we discussed in Example 3.1, non-minimal conﬁgurations can have a

number of coding points that grows with the number of edges in the net-

work. In minimal conﬁgurations, this is no longer the case. Intuitively,

114 Resources for Network Coding

if we have h sources and N receivers, the paths from the sources to the

receivers can only intersect in a ﬁxed number of ways. The bounds we

give below for minimal conﬁgurations depend only on h and N.

The problem of ﬁnding the minimum number of coding points is NP-

hard for the majority of cases. However, it is polynomial time provided

that h = O(∞), N = O(∞), and the underlying graph is acyclic.

7.2.1 Networks with Two Sources and N Receivers

For networks with two sources, we can calculate a tight upper bound

on the number of coding points using combinatorial tools.

Theorem 7.5. In a minimal subtree decomposition of a network with

two sources and N receivers, the number of coding subtrees is upper-

bounded by N − 1. There exist networks which have a minimal subtree

decomposition that achieves this upper bound.

Proof. Recall that there are exactly 2N receiver nodes. The ﬁrst part

of the claim then follows directly from Theorem 3.6. Figure 7.2 demon-

strates a minimal subtree graph for a network with two sources and N

receivers that achieves the upper bound on the maximum number of

subtrees.

7.2.2 Networks with h Sources and N Receivers

Here we restrict our attention to acyclic networks, but similar bounds

exist for cyclic networks.

Theorem 7.6. Consider a multicast instance ¦G, S, ¹¦. Then we can

eﬃciently ﬁnd a feasible network code with [([ ≤ h

3

N

2

, where h is the

number of sources, N the number of receivers, and [([ denotes the

number of coding points for the network code.

The proof idea of this theorem is to ﬁrst examine conﬁgurations with

h sources and two receivers (see Figure 7.3). For such minimal con-

ﬁgurations, one can show that the number of coding points is O(h

3

).

7.2 Bounds on the Number of Coding Points 115

Fig. 7.3 Minimal conﬁguration with h = 4 sources and N = 2 receivers.

Conﬁgurations with N receivers can be reduced to O(N

2

) conﬁgura-

tions with two receivers as follows. Given a pair of receivers, simply

remove the edges/coding points corresponding to the remaining N − 2

receivers. The resulting conﬁguration will have at most O(h

3

) coding

points. Thus the total number of coding points cannot exceed O(h

3

N

2

).

Lemma 7.7. There exist instances ¦G, S, ¹¦ with h sources and N

receivers such that [([ ≥ Ω(h

2

N).

We prove this lemma constructively. Consider the minimal subtree

graph in Figure 7.3 with h = 4 sources and two receivers. It is easy

to see that this is a minimal conﬁguration. The construction directly

extends to the case of h sources with h(h − 1)/2 coding points. We can

extend this construction to conﬁgurations with N receivers by using

N/2 non-overlapping receiver pairs.

7.2.3 Complexity

The problem of minimizing the number of coding points is NP-hard

for the majority of cases. It is polynomial time in the case where the

number of sources and receivers is constant, and the underlying graph

is acyclic. The following theorem deals with the case of integral network

coding over cyclic graphs (graphs that are allowed to have cycles).

116 Resources for Network Coding

Fig. 7.4 Reduction between the minimum-number-of-coding-points and link-disjoint-paths

problems.

Theorem 7.8. Consider integral network coding over cyclic graphs.

Let > 0 be a constant. Finding the minimum number of coding points

within any multiplicative factor or within an additive factor of [V [

1−

is NP-hard, where V is the number of vertices in the graph.

This can be proved by reduction to the link-disjoint path (LDP) prob-

lem in directed graphs. Here we give the main proof idea. Given a

directed graph and nodes S

1

, S

2

, R

1

, and R

2

, the LDP problem asks

to ﬁnd two edge disjoint paths (S

1

, R

1

) and (S

2

, R

2

). Connect S

1

and

S

2

to a virtual common source S, and also directly connect S

2

to R

1

and S

1

to R

2

through virtual edges. Consider now the problem of h = 2

sources multicasting from S to R

1

and R

2

using the minimum number

of coding points. If this minimum number equals zero, we have identi-

ﬁed a solution for the LDP problem. If it is greater than zero, the LDP

problem has no solution. Since the LDP is known to be NP-complete,

our original problem is at least equally hard.

1

7.3 Coding with Limited Resources

In the previous sections we looked into network coding costs in terms

of (i) the number of nodes required to perform coding (which are more

1

We underline that this reduction applies for the case of integral routing.

7.3 Coding with Limited Resources 117

expensive and may cause delay) and (ii) the required code alphabet size

which determines the complexity of the electronic modules performing

ﬁnite ﬁeld arithmetics. We now examine whether we can quantify the

trade-oﬀs between alphabet size, number of coding points, throughput,

and min-cut. Up to now we only have preliminary results toward these

directions that apply for special conﬁgurations such as the following

examples.

7.3.1 Min-Cut Alphabet-Size Trade-Oﬀ

We here consider networks with two sources where the processing com-

plexity is a stronger constraint than the bandwidth. In particular,

the system cannot support an alphabet size large enough to accom-

modate all users, but on the other hand, the min-cut toward each

receiver is larger than the information rate that we would like to

multicast.

When the min-cut toward each user is exactly equal to the number

of sources, the bound in Theorem 7.3 gives the maximum alphabet size

a network with N users may require. One would expect that, if the

min-cut toward some or all of the receivers is greater than the number

of sources, a smaller alphabet size would be suﬃcient. The intuition

is that, if the min-cut to each receiver is m > h, and we only want

to route h sources, we can “select” the h paths toward each receiver

that lead to the smallest alphabet size requirements. Equivalently, if

we have N receivers, h sources, and each receiver has mincut m≥ h,

we have a choice of

_

m

h

_

N

possible conﬁgurations ¦G, S, ¹¦ to select

from.

2

For the special case when the subtree graph is bipartite (which

means that the parent of coding subtrees can only be source subtrees)

and we have h = 2 sources, we can show that this is indeed true by

applying the following result. Consider a set of points X and a family

T of subsets of X. A coloring of the points in X is legal if no element of

T is monochromatic. If a family admits a legal coloring with k colors,

then it is called k-colorable.

2

Note however that several of these choices may turn out to correspond to the same minimal

conﬁguration.

118 Resources for Network Coding

Theorem 7.9. (Erd¨ os 1963) Let T be a family of sets each of size at

least m. If [T[ < k

m−1

, then T is k-colorable.

In our case, X is the set of subtrees, m is the min-cut from the sources

to each receiver, each element of T corresponds to the set of m sub-

trees observed by a receiver, and the set of colors are the points on

the projective line. Therefore, T is a family of sets each of size m, and

[T[ = N. Suppose that we can use an alphabet size k − 1 (which gives

k colors). Note that each receiver can observe both sources if T is col-

orable, since that means that no set of subtrees observed by a receiver

is monochromatic. From Theorem 7.9, this holds as long as

N < k

m−1

.

The above inequality shows a trade-oﬀ between the min-cut m to each

user and the alphabet size k − 1 required to accommodate N receivers

for the special class of conﬁgurations we examine. We expect a similar

tradeoﬀ in the case where the graph is not bipartite as well. However,

Theorem 7.9 cannot be directly applied, because in this case there are

additional constraints on coloring of the elements of X coming from the

requirement that each child subtree has to be assigned a vector lying

in the linear span of its parents’ coding vectors.

7.3.2 Throughput Alphabet-Size Trade-Oﬀ

Consider again the case where the subtree graph is bipartite, we have

h = 2 sources, and the min-cut to each receiver is m. We are interested

in the number of receivers which will not be able to decode both sources,

when we are restricted to use an alphabet of size k − 1. We obtain a

bound to this number by making use of the following result.

Theorem 7.10. For every family T whose all members have size

exactly m, there exists a k-coloring of its points that colors at most

[T[k

1−m

of the sets of T monochromatically.

Thus if we have [T[ = N receivers, the min-cut to each receiver is m,

and we only employ routing, then at most Nk

1−m

receivers will not

7.3 Coding with Limited Resources 119

be able to decode both sources. In other words, if the alphabet size is

not large enough to accommodate all users, but on the other hand, the

min-cut toward each receiver is larger than the information rate that

we would like to multicast, at least [F[(1 − k

1−m

) receivers will still be

able to successfully decode both sources.

7.3.3 Design Complexity and Alphabet-Size Trade-oﬀ

Here we show that, if we use very low complexity decentralized network

code design algorithms, we may also use an alphabet size much larger

than the optimal code design might require.

We make this case for the speciﬁc class of zk(p, N) networks that

are described in Example 4.2. We will show that for the zk(p, N) con-

ﬁgurations there exist network codes over the binary alphabet. On the

contrary, if we perform coding by random assignment of coding vectors

over an alphabet F

q

, the probability P

d

N

that all N receivers will be

able to decode is bounded (Theorem 5.4) as

P

d

N

≥

_

1 −

N

q

_

(

N

p

)

e

−N(

N

p

)/q

.

Thus, if we want this bound to be smaller than, say, e

−1

, we need to

choose q ≥ N

_

N

p

_

.

For arbitrary values of p and N, network coding using a binary

alphabet can be achieved as follows: We ﬁrst remove the edges going

out of S into those A-nodes whose labels contain N. There are

_

N−1

p−2

_

such edges. Since the number of edges going out of S into A-nodes is

_

N

p−1

_

, the number of remaining edges is

_

N

p−1

_

−

_

N−1

p−2

_

=

_

N−1

p−1

_

. We

label these edges by the h =

_

N−1

p−1

_

diﬀerent basis elements of F

h

2

. We

further remove all A-nodes which have lost their connection with the

source S, as well as their outgoing edges. The B-nodes merely sum

their inputs over F

h

2

, and forward the result to the C-nodes.

Consider a C-node that the Nth receiver is connected to. Its label,

say ω, is a p-element subset of 1 containing N. Because of our edge

removal, the only A-node that this C-node is connected to is the one

with the label ω ¸ ¦N¦. Therefore, all C-nodes that the Nth receiver

is connected to have a single input, and all those inputs are diﬀerent.

Consequently, the Nth receiver observes all the sources directly.

120 Resources for Network Coding

Each of the receivers 1, 2, . . . , N − 1 will have to solve a system

of equations. Consider one of these receivers, say R

j

. Some of the

C-nodes that R

j

is connected to have a single input: those are the

nodes whose label contains N. There are

_

N−2

p−2

_

such nodes, and they

all have diﬀerent labels. For the rest of the proof, it is important to

note that each of these labels contains j, and the

_

N−2

p−2

_

labels are all

(p − 1)-element subsets of 1 which contain j and do not contain N. Let

us now consider the remaining

_

N−1

p−1

_

−

_

N−2

p−2

_

=

_

N−2

p−1

_

C-nodes that

R

j

is connected to. Each of these nodes is connected to p A-nodes. The

labels of p − 1 of these A-nodes contain j, and only one does not. That

label is diﬀerent for all C-nodes that the receiver R

j

is connected to.

Consequently, R

j

gets

_

N−2

p−2

_

sources directly, and each source of the

remaining

_

N−2

p−1

_

as a sum of that source and some p − 1 of the sources

received directly.

7.3.4 Throughput Number-of-Coding-Points Trade-oﬀ

We show this trade-oﬀ again for the speciﬁc class of zk(p, N) net-

works also used in the previous section. We examine a hybrid cod-

ing/routing scheme in which only a fraction of the nodes that are

supposed to perform coding according to the scheme described in

Section 7.3.3 are actually allowed only to forward one of their inputs.

We derive an exact expression for the average throughput in this sce-

nario, and show that it increases linearly with the number of coding

points.

Lemma 7.11. Let A

k

zk

be a hybrid coding/routing scheme in which

the number of coding points in Γ

zk

(p, N) that are allowed to perform

linear combination of their inputs (as opposed to simply forwarding one

of them) is restricted to k. The average throughput under this scheme

T(A

k

zk

) is given by

T(A

k

zk

) =

h

N

_

p +

N − p

p

+ k

p − 1

h

_

. (7.6)

7.3 Coding with Limited Resources 121

Proof. If only k out of the

_

N−1

p

_

coding points are allowed to code, we

get that

T

av

k

=

1

N

__

N − 1

p − 1

_

+ (N − 1)

_

N − 2

p − 2

_

+

__

N − 1

p

_

− k

_

+ kp

_

.

(7.7)

In the above equation, we have

• the ﬁrst term because receiver N observes all sources,

• the second term because each of the remaining N − 1

receivers observes

_

N−2

p−2

_

sources directly at the source nodes,

• the third term because, at the

_

N−1

p

_

− k forwarding points,

exactly one receiver gets rate 1, and

• the fourth term because, the k coding points where coding

is allowed, all of its p receivers get rate 1 by binary addition

of the inputs at each coding point (see the description of the

coding scheme in Section 7.3.3).

Equation (7.6) follows from (7.7) by simple arithmetics.

Substituting k = h

N−p

p

in (7.7), i.e., using network coding at all

coding points, we get that T

av

k

= h as expected. At the other extreme,

substituting k = 0, i.e., using routing only, we get an exact characteri-

zation of T

av

i

as

T

av

k

= T

av

i

=

h

N

_

p +

N − p

p

_

.

Theorem 7.6 shows that the throughput beneﬁts increase linearly with

the number of coding points k, at a rate of (p − 1)/(hN). Thus, a

signiﬁcant number of coding points is required to achieve a constant

fraction of the network coding throughput.

Notes

The very ﬁrst paper on network coding by Ahlswede et al. required

codes over arbitrarily large sequences. Alphabets of size N h were

shown to be suﬃcient for designing codes based on the algebraic

122 Resources for Network Coding

approach of Koetter and M´edard [32]. The polynomial time code design

algorithms of Jaggi et al. in [30] operate of alphabets of size N. That

codes for all networks with two sources require no more than O(

√

N)

size alphabets was proved in by Fragouli and Soljanin in [24] by observ-

ing a connection with network code design and vertex coloring. The con-

nection with coloring was independently observed by Rasala-Lehman

and Lehman in [35]. It is not known whether alphabet of size O(N)

is necessary for general networks; only examples of networks for which

alphabets of size O(

√

N) are necessary were demonstrated in [24, 35].

Algorithms for fast ﬁnite ﬁeld operations can be found in [46].

The encoding complexity in terms of the number of coding points

was examined in [24] who showed that the maximum number of coding

points a network with two sources and N receivers can have equals to

N − 1. Bounds on the number of coding points for general networks

were derived by Langberg et al. in [34], who also showed that not only

determining the number of coding points but even determining whether

coding is required or not for a network is for most cases NP-hard.

Coding with limited resources was studied by Chekuri et al. in [15],

Cannons and Zeger in [12], and Kim et al. in [31].

Appendix

Points in General Position

For networks with h sources and linear network coding over a ﬁeld

F

q

, the coding vectors lie in F

h

q

, the h-dimensional vector space over

the ﬁeld F

q

. Since in network coding we only need to ensure linear

independence conditions, we are interested in many cases in sets of

vectors in general position:

Deﬁnition A.1. The vectors in a set / are said to be in general

position in F

h

q

if any h vectors in / are linearly independent.

The moment curve provides an explicit construction of q vectors in gen-

eral position. It can be deﬁned as the range of the f : F

q

→F

h

q

function

f(α) = [1 α α

2

. . . α

h

].

Deﬁnition A.2. The moment curve in F

h

q

is the set

¦[1 α α

2

. . . α

h

] [ α primitive element of F

q

¦.

123

124 Points in General Position

The vectors in the moment curve are in general position. Indeed, take

any h of these q vectors, the corresponding determinant

det

_

¸

¸

¸

_

1 x

1

x

2

1

. . . x

h−1

1

1 x

2

x

2

2

. . . x

h−1

2

.

.

.

.

.

.

.

.

. . . .

.

.

.

1 x

h

x

2

h

. . . x

h−1

h

_

¸

¸

¸

_

,

is a Vandermonde determinant, that equals

1≤j<i≤h

(x

i

− x

j

).

Thus, provided x

i

,= x

j

(for all i ,= j), it is nonzero.

Example A.1. For q + 1 ≥ h, the following set of q + 1 points are in

general position in F

h

q

:

[1 x

1

x

2

1

. . . x

h−1

1

]

[1 x

2

x

2

2

. . . x

h−1

2

]

.

.

.

.

.

.

.

.

. . . .

.

.

.

[1 x

q

x

2

q

. . . x

h−1

q

]

[0 0 0 . . . 1]

where x

i

∈ F

q

and x

i

,= x

j

for i ,= j.

Example A.2. For networks with two sources, we often use the q + 1

vectors in the set

[0 1], [1 0], and [1 α

i

], for 1 ≤ i ≤ q − 1, (A.1)

where α is a primitive element of F

q

. Any two diﬀerent vectors in this

set form a basis for F

2

q

, and thus these vectors are in general position.

In fact, we cannot expect to ﬁnd more than q + 1 vectors in general

position in F

2

q

. Indeed, if g denotes the maximum possible number of

such vectors, g needs to satisfy

g(q − 1) + 1 ≤ q

2

⇒g ≤ q + 1.

This follows from a simple counting argument. For each of the g vector,

we cannot reuse any of its q − 1 nonzero multiples. We also cannot use

125

the all-zero vector. Hence the left-hand side. The right-hand side counts

all possible vectors in F

2

q

.

Thus, for networks with two sources, we can, without loss of gener-

ality, restrict our selection of coding vectors to the set (A.1).

Example A.3. The following set of h + 1 points are in general position

in F

h

q

:

[1 0 . . . 0]

[0 1 . . . 0]

.

.

.

.

.

.

.

.

.

.

.

.

[0 0 . . . 1]

[1 1 . . . 1]

The maximum size g(h, q) that a set of vectors in F

h

q

in general

position can have, is not known in general. Some bounds include the

following:

• g(h, q) ≥ q + 1,

• g(h, q) ≥ n + 1,

• g(h + 1, q) ≤ 1 + g(h, q),

• g(h, q) ≤ q + n − 1.

Some known results include the following:

• g(h, q) = q + 1 if h = 2 or if h = 3 and q is odd,

• g(h, q) = q + 1 if q = h + 1 and q is odd,

• g(h, q) = q + 2 if h = 3 and q is even,

• g(3, q) = q + 2 for even q,

• g(4, q) = g(5, q) = q + 1,

• g(6, q) = g(7, q) = q + 1 if q is even,

• g(6, q) = q + 1 if q = 11 or 13.

In general, for h ≥ q, we know that g(h, q) = h + 1, whereas, for h ≤ q,

it holds that g(h, q) ≥ h + 1, and it is widely believed that

g(h, q) =

_

q + 2 if q is even and either h = 3 or h = q − 1,

q + 1 otherwise.

126 Points in General Position

A.1 Projective Spaces and Arcs

Vectors in general position correspond to points in arcs in projective

spaces. The projective space PG(h − 1, q) is deﬁned as follows:

Deﬁnition A.3. The projective (h − 1)-space over F

q

is the set of h-

tuples of elements of F

q

, not all zero, under the equivalence relation

given by

[a

1

a

h

] ∼ [λa

1

λa

h

], λ ,= 0, λ ∈ F

q

.

Deﬁnition A.4. In PG(h − 1, q), a k-arc is a set of k points any h of

which form a basis for F

h

q

.

In a projective plane, i.e., PG(2, q), a k-arc is a set of k points no

three of which are collinear (hence the name arc).

As a ﬁnal note, arcs have been used by coding theorists in the

context of MDS codes, i.e., codes that achieve the Singleton bound.

Consider a matrix G with columns a set of vectors in general position

in F

h

q

. Matrix G is a generator matrix for an MDS code of dimension

h over F

q

.

Notes

Although the problem of identifying g(h, q) looks combinatorial in

nature, most of the harder results on the maximum size of arcs have

been obtained by using algebraic geometry which is also a natural tool

to use for understanding the structure (i.e., geometry) of arcs. The

moment curve was discovered by Carath´eodory in 1907 and then inde-

pendently by Gale in 1956. A good survey on the size of arcs in pro-

jective spaces can be found in [3].

Acknowledgments

This work was in part supported by the Swiss National Science

Foundation under award No. PP002–110483, NSF under award

No. CCR-0325673, and DIMACS under the auspices of Special focus

on Computational Information Theory and Coding.

127

Notations and Acronyms

F

q

: ﬁnite ﬁeld with q elements

h: number of sources

N: number of receivers

¦G, S, ¹¦: a multicast instance comprising of a directed graph

G = (V, E), a source vertex S ∈ V , and a set ¹= ¦R

1

, R

2

, . . . , R

N

¦

of N receivers.

σ

i

: source S

i

emits symbols σ

i

over a ﬁnite ﬁeld F

q

(S

i

, R

j

): path from source S

i

to receiver R

j

S

e

i

: source node in the line graph, edge through which source S

i

ﬂows

c

**(e): local coding vector associated with edge e of size 1 [In(e)[
**

and elements over F

q

c(e): global coding vector of size 1 h and elements over F

q

c

xy

: capacity associated with the edge (x, y) that connects vertex

x to vertex y

γ: a line graph

Γ: an information ﬂow decomposition graph

T: delay operator

In(e): set of incoming edges for edge e

128

Notations and Acronyms 129

Out(e): set of outgoing edges for edge e

In(v): set of incoming edges for vertex v

Out(v): set of outgoing edges for vertex v

LP: Linear Program

IP: Integer Program

LIF: Linear Information Flow algorithm

T

c

: throughput achieved with network coding

T

f

: symmetric throughput achieved with rational routing

T

i

: symmetric throughput achieved with integral routing

T

av

f

: symmetric throughput achieved with rational routing

T

av

i

: symmetric throughput achieved with integral routing

References

[1] A. Agarwal and M. Charikar, “On the advantage of network coding for improv-

ing network throughput,” IEEE Information Theory Workshop, San Antonio,

Texas, 2004.

[2] R. Ahlswede, N. Cai, S.-Y. R. Li, and R. W. Yeung, “Network information

ﬂow,” IEEE Transactions on Information Theory, vol. 46, pp. 1204–1216, July

2000.

[3] A. H. Ali, J. W. P. Hirschfeld, and H. Kaneta, “On the size of arcs in projective

spaces,” IEEE Transaction on Information Theory, vol. 41, pp. 1649–1656,

September 1995.

[4] M. Ball, T. L. Magnanti, C. L. Monma, and G. L. Nemhauser, “Network Mod-

els,” in Handbooks in Operations Research and Management Science, North

Holland, 1994.

[5] J. Bang-Jensen, A. Frank, and B. Jackson, “Preserving and increasing local

edge-connectivity in mixed graphs,” SIAM Journal on Discrete Mathematics,

vol. 8, pp. 155–178, 1995.

[6]

´

A. M. Barbero and Ø. Ytrehus, “Heuristic algorithms for small ﬁeld mul-

ticast encoding,” 2006 IEEE International Symposium Information Theory

(ISIT’06), Chengdu, China, October 2006.

[7] A. M. Barbero and Ø. Ytrehus, “Cycle-logical treatment for cyclopathic net-

works,” Joint special issue of the IEEE Transactions on Information Theory

and the IEEE/ACM Transaction on Networking, vol. 52, pp. 2795–2804, June

2006.

[8] B. Bollob´as, Modern Graph Theory. Springer-Verlag, 2002.

[9] J. A. Bondy and U. S. R. Murty, Graph Theory with Applications. Amsterdam:

North-Holland, 1979.

130

References 131

[10] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University

Press, March 2004.

[11] J. Cannons, R. Dougherty, C. Freiling, and K. Zeger, “Network routing capac-

ity,” IEEE Transactions on Information Theory, vol. 52, no. 3, pp. 777–788,

March 2006.

[12] J. Cannons and K. Zeger, “Network coding capacity with a constrained num-

ber of coding nodes,” Allerton Conference on Communication, Control, and

Computing Allerton Park, Illinois, September 2006.

[13] Y. Cassuto and J. Bruck, “Network coding for non-uniform demands,” 2005

IEEE International Symposium Information Theory (ISIT 2005), Adelaide,

Australia, September 2005.

[14] C. Chekuri, C. Fragouli, and E. Soljanin, “On achievable information rates in

single-source non-uniform demand networks,” IEEE International Symposium

on Information Theory, July 2006.

[15] C. Chekuri, C. Fragouli, and E. Soljanin, “On average throughput beneﬁts and

alphabet size for network coding,” Joint Special Issue of the IEEE Transac-

tions on Information Theory and the IEEE/ACM Transactions on Networking,

vol. 52, pp. 2410–2424, June 2006.

[16] P. A. Chou, Y. Wu, and K. Jain, “Practical network coding,” Allerton Con-

ference on Communication, Control, and Computing, Monticello, IL, October

2003.

[17] T. M. Cover and J. A. Thomas, Elements of Information Theory. John Wiley

& Sons, New York, 1991.

[18] R. Diestel, Graph Theory. Springer-Verlag, 2000.

[19] R. Dougherty, C. Freiling, and K. Zeger, “Insuﬃciency of linear coding in net-

work information ﬂow,” IEEE Transactions on Information Theory, vol. 51,

no. 8, pp. 2745–2759, August 2005.

[20] P. Elias, A. Feinstein, and C. E. Shannon, “Note on maximum ﬂow through a

network,” IRE Transaction on Information Theory, vol. 2, pp. 117–119, 1956.

[21] E. Erez and M. Feder, “Convolutional network codes,” IEEE International

Symposium on Information Theory (ISIT), Chicago 2004.

[22] L. R. Ford Jr. and D. R. Fulkerson, “Maximal ﬂow through a network,” Cana-

dian Journal of Mathematics, vol. 8, pp. 399–404, 1956.

[23] C. Fragouli and E. Soljanin, “A connection between network coding and con-

volutional codes,” IEEE International Conference on Communications (ICC),

vol. 2, pp. 661–666, June 2004.

[24] C. Fragouli and E. Soljanin, “Information ﬂow decomposition for network cod-

ing,” IEEE Transactions on Information Theory, vol. 52, pp. 829–848, March

2006.

[25] N. Harvey, “Deterministic network coding by matrix completion,” MS Thesis,

2005.

[26] T. Ho, R. K¨otter, M. M´edard, M. Eﬀros, J. Shi, and D. Karger, “A random lin-

ear network coding approach to multicast,” IEEE Transactions on Information

Theory, vol. 52, pp. 4413–4430, October 2006.

132 References

[27] T. Ho, B. Leong, R. K¨otter, and M. M´edard, “Distributed Asynchronous Algo-

rithms for Multicast Network Coding,” 1st Workshop on Network Coding,

WiOpt 2005.

[28] S. Jaggi, Y. Cassuto, and M. Eﬀros, “Low complexity encoding for network

codes,” in Proceedings of 2006 IEEE International Symposium Information

Theory (ISIT’06), Seattle, USA, July 2006.

[29] S. Jaggi, P. A. Chou, and K. Jain, “Low complexity algebraic network multicast

codes,” presented at ISIT 2003, Yokohama, Japan.

[30] S. Jaggi, P. Sanders, P. Chou, M. Eﬀros, S. Egner, K. Jain, and L. Tolhuizen,

“Polynomial time algorithms for multicast network code construction,” IEEE

Transaction on Information Theory, vol. 51, no. 6, no. 6, pp. 1973–1982, 2005.

[31] M. Kim, C. W. Ahn, M. M´edard, and M. Eﬀros, “On minimizing network coding

resources: An evolutionary approach,” Network Coding Workshop, 2006.

[32] R. K¨otter and M. M´edard, “Beyond routing: An algebraic approach to network

coding,” IEEE/ACM Transaction on Networking, vol. 11, pp. 782–796, October

2003.

[33] G. Kramer and S. Savari, “Cut sets and information ﬂow in networks of two-way

channels,” in Proceedings of 2004 IEEE International Symposium Information

Theory (ISIT 2004), Chicago, USA, June 27–July 2 2004.

[34] M. Langberg, A. Sprintson, and J. Bruck, “The encoding complexity of network

coding,” Joint special issue of the IEEE Transactions on Information Theory

and the IEEE/ACM Transaction on Networking, vol. 52, pp. 2386–2397, 2006.

[35] A. R. Lehman and E. Lehman, “Complexity classiﬁcation of network informa-

tion ﬂow problems,” SODA, 2004.

[36] S.-Y. R. Li, R. W. Yeung, and N. Cai, “Linear network coding,” IEEE Trans-

actions on Information Theory, vol. 49, pp. 371–381, February 2003.

[37] Z. Li, B. Li, and L. C. Lau, “On achieving optimal multicast throughput in undi-

rected networks,” in Joint Special Issue on Networking and Information Theory,

IEEE Transactions on Information Theory (IT) and IEEE/ACM Transactions

on Networking (TON), vol. 52, June 2006.

[38] D. S. Lun, N. Ratnakar, M. M´edard, R. K¨otter, D. R. Karger, T. Ho, E. Ahmed,

and F. Zhao, “Minimum-cost multicast over coded packet networks,” IEEE

Transactions on Information Theory, vol. 52, pp. 2608–2623, June 2006.

[39] R. J. McEliece, “The algebraic theory of convolutional codes,” in Handbook of

Coding Theory, (V. Pless and W. C. Huﬀman, eds.), North Holland, October

1998.

[40] K. Menger, “Zur allgemeinen Kurventheorie,” Fundamenta Mathematicae,

vol. 10, pp. 95–115, 1927.

[41] A. Rasala-Lehman and E. Lehman, “Complexity classiﬁcation of network infor-

mation ﬂow problems,” SODA, pp. 142–150, 2004.

[42] S. Riis, “Linear versus nonlinear boolean functions in network ﬂow,” in Pro-

ceedings of 38th Annual Conference on Information Sciences and Systems

(CISS’04), Princeton, NJ, March 2004.

[43] P. Sanders, S. Egner, and L. Tolhuizen, “Polynomial time algorithms for net-

work information ﬂow,” in Proceedings of 15th ACM Symposium on Parallel

Algorithms and Architectures, 2003.

References 133

[44] A. Schrijver, Theory of Linear and Integer Programming. John Wiley & Sons,

June 1998.

[45] J. T. Schwartz, “Fast probabilistic algorithms for veriﬁcation of polynomial

identities,” Journal of the ACM, vol. 27, pp. 701–717, 1980.

[46] J. von zur Gathen and J. Gerhard, Modern Computer Algebra. Cambridge

Univ. Press, Second Edition, September 2003.

[47] Y. Wu, P. A. Chou, and K. Jain, “A comparison of network coding and tree

packing,” ISIT 2004, 2004.

[48] R. W. Yeung, “Multilevel diversity coding with distortion,” IEEE Transaction

on Information Theory, vol. 41, pp. 412–422, 1995.

[49] R. W. Yeung, S.-Y. R. Li, N. Cai, and Z. Zhang, “Network coding theory:

A tutorial,” Foundation and Trends in Communications and Information The-

ory, vol. 2, pp. 241–381, 2006.

[50] L. Zosin and S. Khuller, “On directed Steiner trees,” in Proceedings of the 13th

Annual ACM/SIAM Symposium on Discrete Algorithms (SODA), pp. 59–63,

2002.

1

Introduction

Networked systems arise in various communication contexts such as phone networks, the public Internet, peer-to-peer networks, ad-hoc wireless networks, and sensor networks. Such systems are becoming central to our way of life. During the past half a century, there has been a signiﬁcant body of research eﬀort devoted to the operation and management of networks. A pivotal, inherent premise behind the operation of all communication networks today lies in the way information is treated. Whether it is packets in the Internet, or signals in a phone network, if they originate from diﬀerent sources, they are transported much in the same manner as cars on a transportation network of highways, or ﬂuids through a network of pipes. Namely, independent information streams are kept separate. Today, routing, data storage, error control, and generally all network functions operate on this principle. Only recently, with the advent of network coding, the simple but important observation was made that in communication networks, we can allow nodes to not only forward but also process the incoming independent information ﬂows. At the network layer, for example, intermediate nodes can perform binary addition of independent bitstreams,

2

3 whereas, at the physical layer of optical networks, intermediate nodes can superimpose incoming optical signals. In other words, data streams that are independently produced and consumed do not necessarily need to be kept separate when they are transported throughout the network: there are ways to combine and later extract independent information. Combining independent data streams allows to better tailor the information ﬂow to the network environment and accommodate the demands of speciﬁc traﬃc patterns. This shift in paradigm is expected to revolutionize the way we manage, operate, and understand organization in networks, as well as to have a deep impact on a wide range of areas such as reliable delivery, resource sharing, eﬃcient ﬂow control, network monitoring, and security. This new paradigm emerged at the turn of the millennium, and immediately attracted a very signiﬁcant interest in both Electrical Engineering and Computer Science research communities. This is an idea whose time has come; the computational processing is becoming cheaper according to Moore’s law, and therefore the bottleneck has shifted to network bandwidth for support of ever-growing demand in applications. Network coding utilizes cheap computational power to dramatically increase network throughput. The interest in this area continues to increase as we become aware of new applications of these ideas in both the theory and practice of networks, and discover new connections with many diverse areas (see Figure 1.1). Throughout this tutorial we will discuss both theoretical results as well as practical aspects of network coding. We do not claim to exhaustively represent and reference all current work in network coding; the presented subjects are the problems and areas that are closer to our interests and oﬀer our perspective on the subject. However, we did attempt the following goals: (1) to oﬀer an introduction to basic concepts and results in network coding, and (2) to review the state of the art in a number of topics and point out open research directions. We start from the main theorem in network coding, and proceed to discuss network code design techniques, beneﬁts, complexity require-

security. wireless resources.1.1 Introductory Examples The following simple examples illustrate the basic concepts in network coding and give a preliminary idea of expected beneﬁts and challenges. we reference a limited number of papers representing the topics we cover.4 Introduction Fig. ments. We refer a more interested reader to the webpage www. 1. such as throughput. In order to provide a meaningful selection of literature for the novice reader. complexity.networkcoding. An excellent tutorial focused on the information theoretic aspects of network coding is provided in [49]. A companion volume is concerned with application areas of network coding. 1. and methods to deal with cycles and delay.1 Connections with other disciplines. 1. which include wireless and peer-to-peer networks.1 Beneﬁts Network coding promises to oﬀer beneﬁts along very diverse dimensions of communication networks. and resilience to link failures. .info for a detailed literature listing.

as depicted in Figure 1. . We discuss throughput beneﬁts in Chapter 4. Example 1. Figure 1.2(b). CE. Indeed. and that through each channel we can send one bit per time slot.1 Introductory Examples 5 Throughput The ﬁrst demonstrated beneﬁts of network coding were in terms of throughput when multicasting. and the bit x2 from source S2 along the path {BF } as depicted in Figure 1. it could also receive both sources. respectively (unit rate sources). 1.2 The Butterﬂy Network. Each source produces one bit per time slot which we denote by x1 and x2 . This example is commonly known in the network coding literature as the butterﬂy network.2(a). if the second receiver R2 uses all the network resources by itself.1. it could receive both sources. we could route the bit x1 from source S1 along the path {AD} and the bit x2 from source S2 along the path {BC. EF }. Assume that we have slotted time. CE.2 depicts a communication network represented as a directed graph where vertices correspond to terminals and edges correspond to channels. Fig. ED}. We have two sources S1 and S2 .1. Sources S1 and S2 multicast their information to receivers R1 and R2 . If receiver R1 uses all the network resources by itself. and two receivers R1 and R2 . We can route the bit x1 from source S1 along the path {AC. Similarly.

x1 + x2 }. This observation is generalized to the main theorem for multicasting in Chapter 2. That is. The previous example shows that if we allow intermediate node in the network to combine information streams and extract the information at the receivers. We then have a “contention” for the use of edge CE. is that we can allow intermediate nodes in the network to process their incoming information streams. wireless bandwidth. and independent information ﬂows were kept separate. In particular. Wireless Resources In a wireless environment. Similarly. network coding can be used to oﬀer beneﬁts in terms of battery life. If for example we decide to send bit x1 .2. while receiver R2 will receive both x1 and x2 . R2 receives {x2 . we would like to simultaneously send bit x1 to reach receiver R2 and bit x2 to reach receiver R1 . x1 + x2 }. node C can take bits x1 and x2 and xor them to create a third bit x3 = x1 + x2 which it can then send through edge CE (the xor operation corresponds to addition over the binary ﬁeld). Traditionally. The simple but important observation made in the seminal work by Ahlswede et al. and can solve this system of equations to retrieve x1 and x2 . Applying this approach we would have to make a decision at edge CE: either use it to send bit x1 . and can solve this system of equations to retrieve x1 and x2 . We assume that time is slotted. where devices A and C would like to exchange the binary ﬁles x1 and x2 using device B as a relay. and not just forward them. then receiver R1 will only receive x1 .6 Introduction Now assume that both receivers want to simultaneously receive the information from both sources. information ﬂow was treated like ﬂuid through pipes. and that a device can . or use it to send bit x2 . we can increase the throughput when multicasting. R1 receives {x1 . However. arising from the fact that through this edge we can only send one bit per time slot. Example 1. Consider a wireless ad-hoc network. we are interested in multicasting. and delay.

1.3. Network coding for wireless is examined in the second part of this review. Figure 1. which it then broadcasts to both receivers using a common transmission. node C receives both ﬁles x1 and x2 . The network coding approach takes advantage of the natural capability of wireless channels for broadcasting to give beneﬁts in terms of resource utilization.1. and bitwise xors them to create the ﬁle x1 + x2 . who in turn forwards each ﬁle to the corresponding destination. either transmit or receive a ﬁle during a timeslot (half-duplex communication). In particular.1 Introductory Examples 7 Fig. As we will . Node C has x2 and can thus decode x1 . as illustrated in Figure 1. and interference (if there are other wireless nodes attempting to communicate in the neighborhood). The network coding approach uses one broadcast transmission less.3 Nodes A and C exchange information via relay B. Node A has x1 and can thus decode x2 . wireless bandwidth (the wireless channel is occupied for a smaller amount of time). The beneﬁts in the previous example arise from that broadcast transmissions are made maximally useful to all their receivers.3 depicts on the left the standard approach: nodes A and C send their ﬁles to the relay B. delay (the transmission is concluded after three instead of four timeslots). This approach oﬀers beneﬁts in terms of energy eﬃciency (node B transmits once instead of twice).

x2 ) that the relay needs to transmit. the probability of his guessing correctly x1 equals 50%. Similar ideas can also help to identify malicious traﬃc and to protect against Byzantine attacks. Consider node A that sends information to node D through two paths ABD and ACD in Figure 1.3.4. Assume that an adversary (Calvin) can wiretap a single path. and does not have access to the complementary path. Thus systems that only require protection against such simple attacks. Fig. If for example he retrieves x1 + x2 . can get it “for free” without additional security mechanisms. Security Sending linear combinations of packets instead of uncoded data oﬀers a natural way to take advantage of multipath diversity for security against wiretapping attacks. using simple algebraic operations. the same as random guessing. x1 + x2 is nothing but some type of binning or hashing for the pair (x1 . Example 1.4 Mixing information streams oﬀers a natural protection against wiretapping. . If instead linear combinations (over some ﬁnite ﬁeld) of the symbols are sent through the diﬀerent routes. Calvin cannot decode any part of the data. as we will discuss in the second part of this review.8 Introduction discuss there. If the independent symbols x1 and x2 are sent uncoded. 1. The new element is that we can eﬃciently implement such ideas in practice. Binning is not a new idea in wireless communications. Calvin can intercept one of them.

2 Challenges The deployment of network coding is challenged by a number of issues that will also be discussed in more detail throughout the review. node B has additional memory requirements (needs to store ﬁle x1 instead of immediately broadcasting it). Example 1. and then solve a system of linear equations.2. The current mechanisms in place are designed around the assumption that the only eligible entities to tamper with the data are the source and the destination. We discuss such questions in Chapter 7. nodes A and C need to also keep their own information stored. Figure 1. and has to perform operations over ﬁnite ﬁelds (bitwise xor x1 and x2 ).1.1 Introductory Examples 9 1. such as networks for banking transactions. Security Networks where security is an important requirement. need to guarantee protection against sophisticated attacks. and investigating tradeoﬀs between complexity and performance. Initial eﬀorts toward this goal are discussed in the second part of the review. Moreover.1. Network coding on the other hand requires intermediate routers to perform operations on the data packets.3. In Example 1. Thus deployment of network coding in such networks would require to put in place mechanisms that allow network coding operations without aﬀecting the authenticity of the data. Here we brieﬂy outline some major concerns. An important question in network coding research today is assessing the complexity requirements of network coding. Complexity Employing network coding requires nodes in the network to have additional functionalities.4. .

into the existing network architecture. Making this possible is also an area of current research.10 Introduction Integration with Existing Infrastructure As communication networks evolve toward an ubiquitous infrastructure. a challenging task is to incorporate the emerging technologies such as network coding. we would like to be able to proﬁt from the leveraged functionalities network coding can oﬀer. without incurring dramatic changes in the existing equipment and software. A related open question is. how could network coding be integrated in current networking protocols. . Ideally.

and in the process. such conditions have been known for the past 50 years. clearly. For the case of unicast (when only one receiver at the time uses the network). and then move to multicast. 11 .2 The Main Theorem of Network Multicast Network multicast refers to simultaneously transmitting the same information to multiple receivers in the network. We use an algebraic approach to present and prove the network multicast theorem. We start this chapter with a suitable single unicast scenario study. and. We are concerned with sufﬁcient and necessary conditions that the network has to satisfy to be able to support the multicast at a certain rate. The fascinating fact that the main network coding theorem brings is that the conditions necessary and suﬃcient for unicast at a certain rate to each receiver are also necessary and suﬃcient for multicast at the same rate. provided the intermediate network nodes are allowed to combine and process diﬀerent information streams. we must require that they hold for each receiver participating in the multicast. introduce basic notions in network coding.

12 The Main Theorem of Network Multicast

2.1

The Min-Cut Max-Flow Theorem

Let G = (V, E) be a graph (network) with the set of vertices V and the set of edges E ⊂ V × V . We assume that each edge has unit capacity, and allow parallel edges. Consider a node S ∈ V that wants to transmit information to a node R ∈ V . Deﬁnition 2.1. A cut between S and R is a set of graph edges whose removal disconnects S from R. A min-cut is a cut with the smallest (minimal) value. The value of the cut is the sum of the capacities of the edges in the cut. For unit capacity edges, the value of a cut equals the number of edges in the cut, and it is sometimes referred to as the size of the cut. We will use the term min-cut to refer to both the set of edges and to their total number. Note that there exists a unique min-cut value, and possibly several min-cuts, see for example Figure 2.1. One can think about a min-cut as being a bottleneck for information transmission between source S and receiver R. Indeed, the celebrated max-ﬂow, min-cut theorem, which we discuss below, claims that the maximum information rate we can send from S to R is equal to the min-cut value.

Fig. 2.1 A unicast connection over a network with unit capacity edges. The min-cut between S and R equals three. There exist three edge disjoint paths between S and R that bring symbols x1 , x2 , and x3 to the receiver.

2.1 The Min-Cut Max-Flow Theorem

13

Theorem 2.1. Consider a graph G = (V, E) with unit capacity edges, a source vertex S, and a receiver vertex R. If the min-cut between S and R equals h, then the information can be send from S to R at a maximum rate of h. Equivalently, there exist exactly h edge-disjoint paths between S and R. We next give a constructive proof that, under the conditions of the theorem, there exist exactly h edge-disjoint paths between S and R. Since these paths are made up of unit capacity edges, information can be sent through each path at unit rate, giving the cumulative rate h over all paths. The procedure we present for constructing edge-disjoint paths between S and R is a part of the ﬁrst step for all network code design algorithms that we discuss in Chapter 5. Proof. Assume that the min-cut value between S and R equals h. Clearly, we cannot ﬁnd more than h edge disjoint paths, as otherwise removing h edges would not disconnect the source from the receiver. We prove the opposite direction using a “path-augmenting” algorithm. The algorithm takes h steps; in each step we ﬁnd an additional unit rate path from the source to the receiver. Let pe be an indicator variable associated with an edge e that uv connects vertex u ∈ V to vertex v ∈ V . (There may exist multiple edges that connect u and v.) Step 0: Initially, set pe = 0 for all edges e ∈ E. uv 1 1 1 Step 1: Find a path P1 from S to R, P1 = {v0 = S, v1 , v2 , . . . , v 11 = R}, where 1 is the length of the path, and set pe 1 v1 = 1, v 0 ≤ i < 1 , using one edge between every consecutive pair of nodes. The notation pe 1 v1 = 1 indicates that the edge e has v

i i+1 i i+1

1 1 been used in the direction from vi to vi+1 . k = S, v k , v k , . . . , v k = R} of Step k: (2 ≤ k ≤ h) Find a path Pk = {v0 1 2 k length k so that the following condition holds: k k There exists edge e between vi , vi+1 , 0 ≤ i < k such that

pe k v k v

i

i+1

=0

or pe k v

k i+1 vi

= 1.

(2.1)

**14 The Main Theorem of Network Multicast Accordingly, set pe k vk v
**

i i+1

= 1 or pe k v

each newly found path uses only edges which either have not been used or have been used in the opposite direction by the previous paths. Note that each step of the algorithm increases the number of edge disjoint paths that connect the source to the receiver by one, and thus at the end of step k we have identiﬁed k edge disjoint paths. To prove that the algorithm works, we need to prove that, at each step k, with 1 ≤ k ≤ h, there will exist a path whose edges satisfy the condition in (2.1). We will prove this claim by contradiction. (1) Assume that the min-cut to the receiver is h, but at step k ≤ h we cannot ﬁnd a path satisfying (2.1). (2) We will recursively create a subset V of the vertices of V . Initially V = {S}. If for a vertex v ∈ V there exists an edge connecting v and S that satisﬁes (2.1), include v to V. Continue adding to V vertices v ∈ V such that, for some u ∈ V, there exists an edge between u and v that satisﬁes the condition in (2.1) (namely, pe = 0 or pe = 1), until no more uv vu vertices can be added. (3) By the assumption in (1), we know that V does not contain the receiver R, otherwise we would have found the desired path. That is, the receiver belongs to V = V \ V. Let ϑV = {e|e = (u, v) ∈ E with u ∈ V, v ∈ V} denote the set of all edges e that connect V with V. These edges form a cut. By the construction of V, pe = 1 and pe = 0 for all edges uv vu e e ∈ ϑV. But from (1), e∈ϑV puv ≤ k − 1, and thus there exists a cut of value at most k − 1 < h, which contradicts the premise of the theorem.

k i+1 vi

= 0. This means that

2.2

The Main Network Coding Theorem

We now consider a multicast scenario over a network G = (V, E) where h unit rate sources S1 , . . . , Sh located on the same network node S (source) simultaneously transmit information to N receivers

and that the value of the min-cut between the source node and each of the receivers is h. Using an alphabet of size.2. h unit rate sources located on the same vertex of the graph and N receivers. meaning that during each time slot all nodes simultaneously receive all their inputs and send their outputs. Accordingly. in which intermediate network nodes linearly combine their incoming information symbols over Fq . 1 ≤ i ≤ h. we also assume zero delay. Assume that the value of the min-cut to each receiver is h. this model corresponds to the scenario in which every edge can reliably carry one bit and each source produces one bit per time-unit. E) with unit capacity edges. What we mean by unit capacity edges: The unit capacity edges assumption models a transmission scenario in which time is slotted. simply means that we send the information from the sources in packets of m bits. The m bits are treated as one symbol of Fq and processed using operations over Fq by the network nodes. each unit rate source Si emits σi . We refer to such transmission mechanisms as transmission schemes over Fq . For the moment. We assume that G is an acyclic directed graph with unit capacity edges. with m time-units being deﬁned as one time-slot. that delivers the information from the sources simultaneously to each receiver at a rate equal to h. Then there exists a multicast transmission scheme over a large enough ﬁnite ﬁeld Fq . . we know that there exist exactly h edge-disjoint paths between the sources and each of the .2. From the min-cut max-ﬂow theorem.” we eﬀectively mean that there exists “a large enough packet length. When we say that a scheme exists “over a large enough ﬁnite ﬁeld Fq . .2. q = 2m .1 Statement of the Theorem We state the main theorem in network coding as follows: Theorem 2. . . and during each time-slot we can reliably (with no errors) transmit through each edge a symbol from some ﬁnite ﬁeld Fq of size q. In practice. say.2 The Main Network Coding Theorem 15 R1 . which is an element of the same ﬁeld Fq . RN . Consider a directed acyclic graph G = (V.” 2.

Thus the theorem establishes the existence of linear network codes over some large enough ﬁnite ﬁeld Fq . say. their sets of paths may overlap.16 The Main Theorem of Network Multicast receivers. which is in fact a special case of the Ford–Fulkerson algorithm. We can do that by using. the ﬁeld Fq should be chosen as small as possible. In the multicast case. Observe that the paths toward diﬀerent receivers overlap over edges BD and GH. This question will be addressed in Chapter 7.2. is using the network by itself. Theorem 2. if any of the receivers. The example in Figure 2. Thus.2 An Equivalent Algebraic Statement of the Theorem In order to route the h information sources to a particular receiver. the algorithm given within the proof of the min-cut max-ﬂow theorem. However. We will refer to such transmission schemes as linear network coding. which leads to reduced rates. share the overlapping edge capacity or share the access to the edge in time). for example. then each of the receivers will be getting the information at the same rate as if it had sole access to network resources. The theorem additionally claims that it is suﬃcient for intermediate nodes to perform linear operations.g. 2. When multiple receivers are using the network simultaneously. we ﬁrst have to ﬁnd h edge-disjoint paths that connect the source node to this receiver. we have to ﬁnd one set of such paths for each of the N receivers. (e.2 tells us that. To reduce computational complexity. if we allow intermediate network nodes to not only forward but also combine their incoming information ﬂows. The conventional wisdom says that the receivers will then have to share the network resources. we will not worry. where we discuss resources required for network coding.2. Note that the paths from diﬀerent sets may overlap. .2 shows a network with two sources and three receivers. about being eﬃcient in terms of the ﬁeld size. additions and multiplications over a ﬁnite ﬁeld Fq . the information from the h sources can be routed to Rj through a set of h edge disjoint paths. Rj . In order to provide a very simple proof of Theorem 2. Each subﬁgure shows a set of two edge disjoint paths for diﬀerent receivers.. namely. for a moment.

2.3. we assume that each used coeﬃcient is an unknown variable.2.2 The paths to the three receivers overlap for example over edges BD and GH. i = j. The coeﬃcients used to form this linear combination constitute what we call a local coding vector c (e) for edge e. The dimension of c (e) is 1 × |In(e)|. if paths bringing diﬀerent information symbols use the same edge e. overlap over a unit capacity edge.2. the linear combination of information is shown in Figure 2. that is. we could forward only one of these two symbols (or timeshare between them). Since we do not know what values should the coeﬃcients in the local coding vectors take. Deﬁnition 2. whose value will be determined later. The local coding vectors associated with edges BD and GH are c (BD) = [α1 α2 ] and c (GH) = [α3 α4 ].2. where In(e) is the set of incoming edges to the parent node of e. when two paths bringing symbols σi and σj from sources Si and Sj .2 The Main Network Coding Theorem 17 Fig. a linear combination of these symbols is transmitted through e. we can transmit through the shared edge a linear combination of σi and σj over Fq . instead. In linear network coding. If we were restricted to routing. For the example in Figure 2. Such operations may be performed several times throughout the network. The local coding vector c (e) associated with an edge e is the vector of coeﬃcients over Fq with which we multiply the incoming symbols to edge e. .

since we start from the source symbols and then at intermediate nodes only perform linear combining of the incoming symbols. 2. . Namely. ch (e)] belongs to an hdimensional vector space over Fq . the symbol ﬂowing through some edge e of G is given by σ1 σ2 c1 (e)σ1 + c2 (e)σ2 + · · · + ch (e)σh = c1 (e) c2 (e) · · · ch (e) . . Deﬁnition 2. through each edge of G ﬂows a linear combination of the source symbols. . . The global coding vector c(e) associated with an edge e is the vector of coeﬃcients of the source symbols that ﬂow (linearly combined) through edge e. .3 The linear network coding solution sends over edges BD and GH linear combinations of their incoming ﬂows. c(e) σh where the vector c(e) = [c1 (e) c2 (e) . or for simplicity as the coding vector. Note that. We shall refer to the vector c(e) as the global coding vector of edge e. The dimension of c(e) is 1 × h.18 The Main Theorem of Network Multicast Fig.3. We next discuss how this linear combining can enable the receivers to get the information at rate h. .

. the three receivers observe the linear combinations of source symbols deﬁned by the matrices A1 = 1 0 α3 + α1 α4 α2 α4 A3 = . Clearly. For example. we can compute global coding vectors. transmitted from the h sources. Then the receiver Rj has to solve the following system of linear equations: ρj σ1 1 j σ2 ρ2 . More precisely. . The global coding vectors associated with edges BD and GH in Figure 2. ρj h σh to retrieve the information symbols σi . Rj ).3. Therefore. For our example in Figure 2. given all the local coding vectors for a network. . Alternatively. A2 = 0 1 α1 α2 . 1 ≤ i ≤ h. in Figure 2. and α2 α1 α3 + α1 α4 α2 α4 . 1 ≤ j ≤ N .3. . the coding vector c(GH) is in the linear span of c(DG) and c(F G). choosing global coding vectors so that all Aj . are full rank will enable all receivers to recover the source symbols from the information they receive. matrices Aj can be expressed in terms of the components of the local coding vectors {αk }. consider receiver Rj . Rj ). Consequently. and Aj the i matrix whose ith row is the coding vector of the last edge on the path (Si . and vice versa. = Aj . (2.2. Let ρj be the symbol on the last edge of the path (Si .2) . There is one more condition that these vectors have to satisfy: the global coding vector of an output edge of a node has to lie in the linear span of the coding vectors of the node’s input edges.2 The Main Network Coding Theorem 19 The coding vectors associated with the input edges of a receiver node deﬁne a system of linear equations that the receiver can solve to determine the source symbols.3 are c(BD) = [α1 α2 ] and c(GH) = [α3 + α1 α4 α2 α4 ]. we can deal with local coding vectors.

1 ≤ j ≤ N . be full rank is equivalent to the requirement that the product of the determinants of Aj be diﬀerent from zero: f ({αk }) det(A1 ) det(A2 ) · · · det(AN ) = 0. 2.2. . Therefore.e. Since f ({αk }) is a product. deﬁning the information that the receivers observe. In the following section. α4 }. 1 ≤ j ≤ 3. there exist values in some large enough ﬁnite ﬁeld Fq for the components {αk } of the local coding vectors. because we assumed that the graph G is acyclic. We will also show this rigorously in the next chapter. namely. it is suﬃcient to show that none of its factors. i. This coeﬃcient assignment makes Aj to be the h × h identity matrix. . we show that f ({αk }) is not identically equal to zero on some ﬁeld. . none of the det Aj .20 The Main Theorem of Network Multicast The network code design problem is to select values for the coeﬃcients {αk } = {α1 . such that all matrices Aj . the function f ({αk }) is also a polynomial in {αk } of ﬁnite total degree. and then use value one for the coeﬃcients corresponding to the edges of these paths. are full rank. Next. are full rank. 1 ≤ j ≤ N .. The main multicast theorem can be expressed in algebraic language as follows: Algebraic Statement of the Main Theorem: In linear network coding. and value zero for the remaining coeﬃcients. we prove this equivalent claim of the main coding theorem for multicast. is identically equal to zero. Note that. each entry of matrix Aj is a polynomial in {αk }. there are no feedback loops. This follows directly from the min-cut max-ﬂow theorem: simply ﬁnd h-edge disjoint paths from the sources to receiver Rj . . .3 Proof of the Theorem The requirement that all Aj .3) where f ({αk }) is used to denote this product as a function of {αk }. (2. so that all matrices Aj . and because we only perform linear operations at intermediate nodes of the network.

det(A2 ) = 0). . . at least one of the determinants of these matrices is equal to zero (namely. We prove the lemma by induction in the number of variables η: (1) For η = 1. αη−1 )αη . . . . . . .4) Over F2 . or the following set of matrices: A1 = x(1 + x) x2 x(1 + x) x2 and A2 = 1 x 1 1 .2. . . . The lemma holds immediately since such polynomials can have at most d roots. . αη ) = i=0 i fi (α1 . αη . in every ﬁnite ﬁeld Fq of size q > d on which f (α1 . Let f (α1 . . Proof.4) we can use x = 2 over F3 so that both det(A1 ) = 0 and det(A2 ) = 0. we have a polynomial in a single variable of degree at most d. .5) Consider a ﬁeld Fq with q > d on which f (α1 . with maximum degree in each variable of at most d. . for example. Then at least one of the polynomials {fi } is not identically equal to zero on Fq . . . (2. . . the inductive hypothesis is that the claim holds for all polynomials with fewer than η variables. We can express f (α1 . αη ) as a polynomial in variable αη with coefﬁcients {fi } that are polynomials in the remaining variables α1 . . say fj . Consider. . Lemma 2. αη = pη ) = 0. . . the polynomial x(x + 1) over F2 . . For example. .3) holds. . pη . Then. . αη ) is not identically equal to zero. and thus concludes our proof. (2) For η > 1. such that f (α1 = p1 .2 The Main Network Coding Theorem 21 When dealing with polynomials over a ﬁnite ﬁeld. det(A1 ) = 0 and x = 1. there exist values p1 . . The following lemma tells us that we can ﬁnd values for the coefﬁcients {αk } over a large enough ﬁnite ﬁeld such that the condition f ({αk }) = 0 in (2. for x = 0. . . . αη ) is not identically equal to zero. . . for the matrices in (2. . αη−1 : d f (α1 . . one has to keep in mind that even a polynomial not identically equal to zero over that ﬁeld can evaluate to zero on all of its elements. αη ) be a multivariate polynomial in variables α1 . . . . .3 (Sparse Zeros Lemma). . . (2.

there exist values α1 = p1 . . pη−1 ) = 0. . . Can we construct network codes eﬃciently? How large does the operating ﬁeld Fq need to be? What are the complexity requirements of network coding? Do we expect to get signiﬁcant beneﬁts. and immediately triggers a number of important questions. Note that we have given a simple proof that network codes exist under certain assumptions. is coding/decoding worth the eﬀort? Do network codes exist for other types of network scenarios.22 The Main Theorem of Network Multicast Since the number of variables in fj is smaller than η. i.2 claims the existence of network codes for multicast. whether the main theorem would hold or could be easily extended if we removed or relaxed these assumptions. .2. we check how restrictive are the modeling assumptions we made. or wireless networks. . Collocated sources – not restrictive.e. 2. and give an alternative information-theoretical proof of the main theorem. αη−1 = pη−1 such that fj (p1 . In the rest of this chapter.4 Discussion Theorem 2... by the inductive hypothesis. methods to eﬃciently choose the linear coeﬃcients. . . In Chapter 5. pη−1 . . rather than a constructive proof. provided that the capacity of the edges are rational numbers and we allow parallel edges. . A number of questions naturally arise at this point. .e. and connect it to the h source vertices through unit capacity edges. Substituting these values. . or traﬃc patterns other than multicast? What is the possible impact on applications? We will address some of these issues in later chapters. f (p1 . we will look into the existing methods for network code design. . In Chapter 3. such as networks with noisy links. i. we will learn how to calculate the parameterized matrices Aj for a given graph and multicast conﬁguration. . αη ) becomes a polynomial in the single variable αη with degree of at most d and at least one coeﬃcient diﬀerent than zero in Fq . and we will discuss them in the next session. Unit capacity edges – not restrictive. The lemma then holds from the argument in (1). We can add to the graph an artiﬁcial common source vertex. that is.

Note that the underlying graph for the example in Figure 2. Acyclic graph – not restrictive.4 An undirected graph where the main theorem does not hold.2. in undirected graphs.5 to each receiver. However. However. delay introduces a relationship between network coding and convolutional codes. Same min-cut values – restrictive. If the receivers have diﬀerent min-cuts. network coding does not oﬀer any beneﬁts: we can achieve the maximal rate of 1.2 The Main Network Coding Theorem 23 Fig. there exist network conﬁgurations where we need to deal with the underlying cyclic graphs by taking special action. and this fact did not come up in our discussion. GH. All practical networks introduce delay. Without network coding we can achieve rate 1. as we discuss in Chapter 6. we can multicast at the rate equal to the minimum of the .4 for edge CD.2 contains the cycle {F G. This is the case in the example in Figure 2. For this example. HF }. Such an example is provided by the butterﬂy network in Figure 1. diﬀerent paths may traverse the same edge in opposite direction. there exist undirected graphs where use of network coding oﬀers beneﬁts. for example by introducing memory. From a theoretical point of view. while using network coding we can achieve rate 2. Directed graph – restrictive.5 to each receiver by time-sharing across edge CD. if we assume that the edges of the network are undirected. The gain in network coding comes from combining information carried by diﬀerent paths whenever they overlap on an edge. We discuss networks with delay in Chapter 6.2. 2. Zero delay – not restrictive. On the other hand.

The proof approach of the main theorem also follows the proof approach in [25. The proof for arbitrary capacity edges and/or directed graphs is a direct extension of the same ideas. For example. We will then have receivers with min-cut two. Notes The min-cut max-ﬂow theorem was proven in 1927 by Menger [40]. [20]. in [49] and by Harvey in [25]. It was also proven in 1956 by Ford and Fulkerson [22] and independently the same year by Elias et al. The ﬁrst origin of these ideas can be traced back to Yeung’s work in 1995 [48]. [2. The main theorem in network coding was proved by Ahlswede et al. There exist many diﬀerent variations of this theorem. and is a variation of a result proved in 1980 by Schwartz [45]. and receivers with mincut one. 49]. and it is speciﬁc to unit capacity undirected graphs. consider again the butterﬂy network in Figure 1. while receivers with min-cut one require that only routing is employed. Associating a random variable with the unknown coeﬃcients when performing linear network coding was proposed by Koetter and M´dard [32]. but cannot always transmit to each receiver at the rate equal to its min-cut. see for example [18]. The proof we gave follows the approach in [8]. Receivers with min-cut two require that linear combining is performed somewhere in the network. The sparse zeros e lemma is published by Yeung et al. .24 The Main Theorem of Network Multicast min-cuts. These two requirements are not always compatible. 36] in a series of two papers in 2000.2 and assume that each vertex of the graph is a receiver.

However. one may also argue that each network coding issue (e. We here present tools from the algebraic.1 3. information theoretic. Multicasting is the only traﬃc scenario examined in this chapter. E).. we discuss diﬀerent types of routing and coding. code design. in several research communities. and linear programming frameworks. However. At the end.g. 3. 25 .1 A Network Multicast Model The Basic Model A multicast scenario is described by a directed graph G = (V.1. complexity) should be put in the framework in which it can be studied the most naturally and eﬃciently. the presented tools and techniques can be extended to and are useful for studying other more general types of traﬃc as well. The choice of framework a researcher makes most frequently depends on his background and preferences. combinatorial. throughput beneﬁts.3 Theoretical Frameworks for Network Coding Network coding can and has been studied within a number of diﬀerent theoretical frameworks. and point out some of the important results obtained within each framework. a source vertex S ∈ V (on which h unit rate sources Si are collocated).

and solves the system of linear equations (2.3). R} satisﬁes the multicast property for rate h if the min-cut value between S and each receiver is greater than or equal to h. Rj ). 1 ≤ i ≤ h} be a set of h edge-disjoint paths from the sources to the receiver Rj . As discussed in Chapter 2. Under the assumption that the min-cut to each receiver equals at least h. S.2). . . Our object of interest is the subgraph G of G consisting of the hN paths (Si . in linear network coding.26 Theoretical Frameworks for Network Coding and a set R = {R1 . 1 ≤ j ≤ N .2). 1 ≤ i ≤ h.1. to each edge of the graph.2). Network code design is concerned with assigning local coding vectors. . R}. where In(e) is the set of edges incoming to the parent node of e (see Deﬁnition 2. each node of G receives an element of Fq from each input edge. The coeﬃcients deﬁning the linear combination on edge e are collected in a local coding vector c (e) of dimension 1 × |In(e)|. as we discuss later. coding vectors. or equivalently. Rj ). and then forwards (possibly diﬀerent) linear combinations of these symbols to its output edges. The symbol through edge e is given by c1 (e)σ1 + · · · + ch (e)σh . R} also satisﬁes the multicast property. S. . We call this condition the multicast property for rate h. R2 . Receiver Rj takes h coding vectors from its h input edges to form the rows of the matrix Aj . we know that a necessary and suﬃcient condition for feasibility of rate h multicast is that the min-cut to each receiver be greater or equal to h. As a consequence of this local linear combining. Deﬁnition 3. each edge e carries a linear combination of the source symbols. aﬀects the complexity of the network code. and the coeﬃcients describing this linear combination are collected in the h-dimensional global coding vector c(e) = [c1 (e) · · · ch (e)] (see Deﬁnition 2. An instance {G. RN } of N receivers. . From the main theorem on network coding (Theorem 2. the instance {G . S. and. the existence of such paths is guaranteed by the min-cut max-ﬂow theorem. The choice of the paths is not unique. Clearly. We will refer to these three ingredients together as a (multicast) instance {G. Let {(Si .

A network is solvable if there exist operations the nodes of the network can perform so that each receiver experiences the same maximum rate as if it were using all the network resources by itself. A graph G is called minimal with the multicast property if removing any edge would violate this property. .2. We then say that edge e is a coding point. There are. Deﬁnition 3. Deﬁnition 3. therefore. In practice. Observe now that in linear network coding. a valid linear network code exists over some large enough ﬁeld. those that do not have redundant edges. and then some for which there are neither linear nor nonlinear valid network codes. Coding points are the edges of the graph G where we need to perform network coding operations. we will be interested in minimal graphs. we need to perform linear combining at edge e. however. We have seen that for multicast networks satisfying the conditions of the main network coding theorem. 1 ≤ j ≤ N .3. these are the places in the network where we require additional processing capabilities. namely. linearly solvable. A valid linear network code is any feasible assignment of coding vectors such that the matrix Aj is full rank for each receiver Rj . Finally. some other network traﬃc scenarios for which there are no valid linear network codes. A network is linearly solvable if these operations are linear. Identifying a minimal graph before multicasting may allow us to use less network resources and reduce the number of coding points as the following example illustrates. only if there exist two or more paths that share e but use distinct edges of In(e). An assignment of coding vectors is feasible if the coding vector of an edge e lies in the linear span of the coding vectors of the parent edges In(e).3. as opposed to simple forwarding.5. Deﬁnition 3. All multicast instances are.4.1 A Network Multicast Model 27 Deﬁnition 3.

1. Identifying a minimal conﬁguration for an instance {G. that nodes A and B in Figure 3. Thus.3. This choice results in using two coding points. 3.. S. . however. linearly combining ﬂows at edges AB and CD. Notice.1(b) and 3.1 A network with two sources and two receivers: (a) the original graph.e. identifying minimal conﬁgurations can help to signiﬁcantly reduce the number of coding points. Contrary Fig.2.1(a) and their incident edges can be removed without aﬀecting the multicast condition. minimal conﬁgurations with two sources and two receivers have at most one coding point. has a single coding point.1(c). On the other hand.28 Theoretical Frameworks for Network Coding Example 3. R} can be always done in polynomial time.1(a) that is created by putting one butterﬂy network on the top of another. A choice of two sets of edge-disjoint paths (corresponding to the two receivers) is shown in Figures 3. i. and is identical to the butterﬂy network shown in Figure 1. We can extend the same construction by stacking k butterﬂy networks to create a non-minimal conﬁguration with k coding points. (b) two edgedisjoint paths from the sources to the receiver R1 . as we discuss in Chapter 5. Consider the network with two sources and two receivers shown in Figure 3. and (c) two edge-disjoint paths from the sources to the receiver R2 . The resulting graph is then minimal. as we prove in Section 3.

as we discuss in Chapter 7.2 shows the network with two sources and three receivers which we studied in Chapter 2 together with its line graph.2 The Line Graph Model To deﬁne a network code. Figure 3. Thus. it is often more transparent to work with the graph γ= 1≤i≤h 1≤j≤N L(Si . Rj ). Rj ). we eventually have to specify which linear combination of source symbols each edge carries.2 A network with 2 sources and 3 receivers. Rj ). identifying a minimum conﬁguration (that has the minimum number of coding points) is in most cases an NP-hard problem. Fig. . that is. and its line graph. each vertex of L(Si . (3. Rj ) represents an edge of (Si . 3.1) where L(Si .3. and any two vertices of L(Si .1 A Network Multicast Model 29 to that. Rj ). Rj ) denotes the line graph of the path (Si . 3. Rj ) are adjacent if and only if their corresponding edges share a common vertex in (Si .1.

4. . Rj ) as the receiver node for receiver Rj and source Si . Our deﬁnitions for feasible and valid network codes directly translate for line graphs. Each node in γ with a single input edge merely forwards its input symbol to its output edges. e e e Deﬁnition 3. and CE are receiver nodes.2. Sh } are the nodes of γ corresponding to the h sources. Each node with two or more input edges performs a coding operation (linear combining) on its input symbols. they are the h edges in G emanating from the source vertex S. In Figure 3.2(b). Finally. HF . Because the approach is very straightforward and requires little background. The edges coming into the node e are labeled by the coeﬃcients of local coding vector c (e). In Figure 3. S1 A and S2 C are source nodes. we refer to the node corresponding to the last edge of the path (Si . Randomized coding. as well as our deﬁnition for minimality. . 3. As a result. For a conﬁguration with h sources and N receivers. one corresponding to each of the h co-located sources. AF . These nodes are the coding points in Deﬁnition 3. S2 . In Figure 3. and forwards the result to all of its output edges. Equivalently. Thus designing a network code amounts to choosing the values {αk } for the labels of edges in the line graph. the line graph contains a node corresponding to each of the h sources. BD and GH are coding points. . . DE. . we adopted it for explaining the fundamentals in Chapter 2.2 Algebraic Framework One of the early approaches to network coding (and deﬁnitely the one that galvanized the area) was algebraic. We refer to these nodes as source nodes. we can assume that the source vertex S in graph G has exactly h outgoing edges. Using the line graph notation makes the deﬁnition of coding points transparent. there exist hN receiver nodes. HK.6.30 Theoretical Frameworks for Network Coding Without loss of generality (by possibly introducing an auxiliary node and h auxiliary edges).2(b). Source nodes {S1 . Note that vertices corresponding to the edges of In(e) are parent nodes of the vertex corresponding to e. DK.

Network code design amounts to selecting values for the variable entries in A.3) where D is the indeterminate delay operator. and Dj are matrices with appropriate dimensions which we discuss below.2) can also be interpreted as describing a convolutional code.4) In (3. if we consider a speciﬁc receiver Rj .2) where sk is the m × 1 state vector. This approach is particularly easy to explain using the notion of the line graph in which each edge carries either label 1 or the unique label corresponding to a variable in {αk }. yk is the h × 1 output vector. and up to m := |E| memory elements. uk is the h × 1 input vector.) A standard result in linear system theory gives us the transfer matrix Gj (D): Gj (D) = Dj + Cj (D−1 I − A)−1 B.2) matrix A is common for all receivers and reﬂects the way the memory elements (states) are connected (network topology).2 Algebraic Framework 31 believed to be of paramount importance for practical network coding. Matrices Cj and Dj . The main idea is to think of each vertex of the line graph (edge in the original graph) as a memory element that stores an intermediate information symbol. Using unit delay. . and an element of A is nonzero if and only if there is an edge in the line graph between the indexing states. Then. has been developed within the algebraic framework. This system is described by the following set of ﬁnite-dimensional state-space equations: sk+1 = Ask + Buk . (3. our line graph acts as a linear system with h inputs (the h sources) and h outputs (that the receiver observes). we obtain the transfer matrix for receiver Rj : Aj = Dj + Cj (I − A)−1 B. (The state space equations (3. Cj . yk = Cj sk + Dj uk . and A. Its elements are indexed by the states (nodes of the line graph). as we explain in Chapter 6.3. The nonzero elements equal either to 1 or to an unknown variable in {αk }. (3. B. Matrix B is also common for all receivers and reﬂects the way the inputs (sources) are connected to our graph. (3.

5) This equation immediately implies that the elements of the transfer matrices Aj are polynomials in the unknown variables {αk }. I−A B (3. can be very large. a result that we used in the proof of the main theorem. Here. in some code design algorithms. and Dj depends upon the number of edges in our graph. Let L denote the length of the longest path between the source and a receiver.5) oﬀers an intuitive explanation of (3.1. (3. (I − A)−1 = I + A + A2 + · · · + AL . in general. The series in (3. for example. assume that Dj = 0 (this we can always do by possibly adding auxiliary edges and increasing the size m of the state space). and Dj can be chosen to be binary matrices by possibly introducing auxiliary vertices and edges. We next present a very useful lemma. Matrices B. Moreover. B. Then AL+1 = 0. By accordingly ordering the elements of the state space vector. Cj . and therefore. The transfer matrix expresses the information brought along these paths from the sources to the receivers. In other words. In the next section.6) . Then for A strictly upper triangular | det Aj | = | det Nj |. Let Aj = Cj (I − A)−1 B. which. express how the outputs receiver Rj observes depend upon the state variables and the inputs.5) accounts for all paths connecting the network edges.4). we discuss a method to signiﬁcantly reduce this number. where Nj = 0 Cj . we use it to prove an upper bound on the ﬁeld size (code alphabet size) suﬃcient for the existence of a network code. We will from now on. Recall that the dimension of matrices A. matrix A becomes strictly upper triangular for acyclic graphs. without loss of generality.32 Theoretical Frameworks for Network Coding respectively. Lemma 3. nilpotent (An = 0 for some positive integer n). It is easy to see that A is eﬀectively an incidence matrix. Cj . Observe that substituting Dj = 0 in (3. helpful.4) gives Aj = Cj (I − A)−1 B. (3.

using a result known as Schur’s formula for the determinant of block matrices. αη = pη ) = 0. for q > N there exist values p1 . . . we further obtain ± det Nj = det 0 Cj = det(I − A) det(Cj (I − A)−1 B).3 Combinatorial Framework 33 Proof. This approach has been useful for understanding computational complexity of network coding in terms of the required code .6.2. . Each variable αk appears at most once in each matrix Nj . From Lemma 5.7) has degree at most N in each of the η variables {αk }. . . . pη ∈ Fq such that f (α1 = p1 . . Note that det Nj = det 0 0 Cj Cj . . = ± det B I−A I−A B since permuting columns of a matrix only aﬀects the sign of its determinant.3 Combinatorial Framework We present an approach within the combinatorial framework that is mainly based on graph theory and to a certain extent on projective geometry. . Now. 3. B I−A The claim (3. αη ) = det N1 det N2 · · · det NN (3. p2 . Proof.1) an alphabet of size q > N is always suﬃcient. since A is strictly upper triangular. . We discuss how tight this bound is and present additional alphabet size bounds in Chapter 7. then det(I − A) = 1.3. . For a multicast network (under the model assumptions in Section 3. As a result. . Theorem 3.6) follows directly from the above equation by noticing that. the polynomial f (α1 .

Each such subgraph is a tree. which we discuss in Chapter 7. HK} the linear combination that ﬂows through the coding point GH. In this section.2. Through T1 = {S1 A. The receiver nodes for a receiver Rj are deﬁned to be the last edges on the paths (Si . At a second level. Deﬁnition 3. We call T3 and T4 coding subtrees. Rj ). we use the line graph approach for a formal description and proofs. as these are subgraphs within which we do not have to code.2 receiver R1 has the receiver nodes AF and HF . DK} ﬂows the linear combination of source symbols that ﬂows through the coding point BD. DE. 1 ≤ i ≤ h. and we can partition the graph into four trees. For the example in Figure 3. we show how we can use it to identify structural properties of the multicast conﬁgurations. CE} ﬂows the source symbol σ2 . In Figure 3. First we discuss the results and structural properties informally at a high level using the original graph G. Each receiver has h receiver nodes. We call these edges receiver nodes. Deﬁnition 3. Through T3 = {BD. AF.8.3. rooted at a coding point or a source. Each receiver Rj observes the linear combination of the sources that ﬂow through the last edges on the paths (Si . . Since source symbols ﬂow through these trees. AB. and terminating either at receivers or other coding points. 1 ≤ i ≤ h. We give this discussion in two levels. 3.1 Basic Notions of Subtree Decomposition The basic idea is to partition the network graph into subgraphs through which the same information ﬂows. Rj ). A subtree Ti is called a source subtree if it starts with a source and a coding subtree if it starts with a coding point. and through T2 = {S2 C.7.34 Theoretical Frameworks for Network Coding alphabet size and the number of coding points. we call them source subtrees. coding points are the edges BD and GH. F G} ﬂows the source symbol σ1 . DG. HF. CB. and similarly through T4 = {GH.

2. 3. Figure 3. c(T3 ) = [1 1]. A valid code for the network in Figure 3.3(b): c(T1 ) = [1 0].3.2. For this code.3(b) shows the subtree graph for the network in Figure 3.2. Receiver j observes h coding vectors from h distinct subtrees to form the rows of the matrix Aj . we only need to know how the subtrees are connected and which receiver nodes are in each Ti .2(a) can be obtained by assigning the following coding vectors to the subtrees in Figure 3. c(T2 ) = [0 1].3 Combinatorial Framework 35 For the network code design problem. and c(T4 ) = [0 1].3 (a) Line graph with coding points BD and GH for the network in Figure 3. Fig. The network code design problem is now reduced to assigning an h-dimensional coding vector c(Ti ) = [c1 (Ti ) · · · ch (Ti )] to each subtree Ti . EΓ ) and the association of the receivers with nodes VΓ . . the ﬁeld with two elements is suﬃcient. Thus we can contract each subtree to a node and retain only the edges that connect the subtrees. which we will refer to as the subtree graph Γ. Example 3. is deﬁned by its underlying topology (VΓ . The resulting combinatorial object. Nodes B and G in the network (corresponding to coding points BD and GH) perform binary addition of their inputs and forward the result of the operation. and (b) the associated subtree graph.

for a conﬁguration with two sources and two receivers there exists exactly one multicast conﬁguration requiring network coding. . which corresponds to the butterﬂy network. .6) or a coding point (see Deﬁnition 3.3. and (2) Each node that is neither a source node nor a coding point belongs to the Vi which contains its closest ancestral coding point or source node. R2 . .3 Properties of Subtree Graphs Here we prove a number of properties for subtree graphs and minimal subtree graphs. We shall use the notation Ti to describe both the subgraph of the line graph γ. Vi together with the edges whose both endpoints are in Vi ). there exists a ﬁnite number of corresponding subtree graphs. and the corresponding node in the subtree graph Γ.36 Theoretical Frameworks for Network Coding The rest of the nodes in the network merely forward the information they receive.3). and R3 are A1 = 1 0 c(T1 ) = . .4). S2 .1) into subsets Vi so that the following holds: (1) Each Vi contains exactly one source node (see Deﬁnition 3. Ti is the subgraph of γ induced by Vi (namely. 0 1 c(T4 ) A3 = A2 = 0 1 c(T2 ) = . We partition the vertices of the line graph γ in (3. . and 1 1 c(T3 ) 1 1 c(T3 ) = . for a given number of sources and receivers.2 Formal Description of Subtree Decomposition Using the line graph approach (see Figure 3. and use these properties to show that. 3. the subtree decomposition can be formally performed as follows. 0 1 c(T4 ) 3. For example. The matrices for receivers R1 . Source subtrees start with one of e e e the source nodes {S1 .3. Sh } of the line graph and coding subtrees with one of the coding points. depending on the context.

3.3 Combinatorial Framework

37

Theorem 3.3. The subgraphs Ti satisfy the following properties: (1) Each subgraph Ti is a tree starting at either a coding point or a source node. (2) The same linear combination of source symbols ﬂows through all edges in the original graph that belong to the same Ti . (3) For each receiver Rj , there exist h vertex-disjoint paths in the subtree graph from the source nodes to the receiver nodes of Rj . (4) For each receiver Rj , the h receiver nodes corresponding to the last edges on the paths (Si , Rj ), 1 ≤ i ≤ h, belong to distinct subtrees. (5) Each subtree contains at most N receiver nodes, where N is the number of receivers.

Proof. We prove the claims one by one. (1) By construction, Ti contains no cycles, the source node (or the coding point) has no incoming edges, and there exists a path from the source node (or the coding point) to each other vertex of Ti . Thus Ti is a tree rooted at the source node (or the coding point). (2) Since Ti is a tree, the linear combination that ﬂows through the root (source node or coding point) also has to ﬂow through the rest of the nodes. (3) Because the min-cut condition is satisﬁed in the original network, there exist h edge-disjoint paths to each receiver. Edge-disjoint paths in the original graph correspond to vertex-disjoint paths in the line graph γ. (4) Assume that the receiver nodes corresponding to the last edges of paths (Sk , Rj ) and (S , Rj ) belong to the same subtree Ti . Then both paths go through the root vertex (source node or coding point) of subtree Ti . But the paths for receiver Rj are vertex-disjoint.

38 Theoretical Frameworks for Network Coding (5) This claim holds because there are N receivers and, by the previous claim, all receiver nodes contained in the same subtree are distinct. From deﬁnition, the multicast property is satisﬁed in Γ if and only if the min-cut condition is satisﬁed for every receiver in G. Thus the following holds: Lemma 3.4. There is no valid codeword assignment (in the sense of Deﬁnition 3.2) for a subtree graph which does not satisfy the multicast property. We use this lemma to show properties of minimal subtree graphs. Directly extending Deﬁnition 3.5, a subtree graph is called minimal with the multicast property if removing any edge would violate the multicast property. Theorem 3.5. For a minimal subtree graph, the following holds: (1) There does not exist a valid network code where a subtree is assigned the same coding vector as one of its parents. (2) There does not exist a valid network code where the vectors assigned to the parents of any given subtree are linearly dependent. (3) There does not exist a valid network code where the coding vector assigned to a child belongs to a subspace spanned by a proper subset of the vectors assigned to its parents. (4) Each coding subtree has at most h parents. (5) If a coding subtree has 2 ≤ P ≤ h parents, then there exist P vertex-disjoint paths from the source nodes to the subtree.

Proof. (1) Suppose a subtree is assigned the same coding vector as one of its parents. Then removing the edge(s) between the subtree and the other parent(s) results in a subtree graph with

3.3 Combinatorial Framework

39

(2)

(3)

(4) (5)

a valid coding vector assignment. By Lemma 3.4, the multicast property is also satisﬁed. But we started with a minimal subtree graph and removed edges without violating the multicast property, which contradicts minimality. Suppose there is a subtree T whose parents T1 , . . . , TP are assigned linearly dependent vectors c(T1 ), . . . , c(TP ). Without loss of generality, assume that c(T1 ) can be expressed as a linear combination of c(T2 ), . . . , c(TP ). Then removing the edge between T and T1 results in a subtree graph with a valid coding vector assignment. From Lemma 3.4 the multicast property is also satisﬁed. But this contradicts minimality. Suppose there is a subtree T whose parents T1 , . . . , TP are assigned vectors c(T1 ), . . . , c(TP ). Without loss of generality, assume that T is assigned a vector c that is a linear combination of c(T2 ), . . . , c(TP ). Then removing the edge between T and T1 results in a subtree graph with a valid coding vector assignment. From Lemma 3.4 the multicast property is also satisﬁed. But this contradicts minimality. Since coding vectors are h-dimensional, this claim is a direct consequence of claim (2). By claim (2), the coding vectors assigned to the P parent subtrees must be linearly independent, which requires the existence of P vertex disjoint paths.

The ﬁrst three claims of Theorem 3.5 describe properties of valid codes for minimal subtree graphs, while the last two claims describe structural properties of minimal subtree graphs. The additional structural properties listed below for networks with two sources directly follow from Theorems 3.3 and 3.5. Theorem 3.6. In a minimal subtree decomposition of a network with h = 2 sources and N receivers: (1) A parent and a child subtree have a child or a receiver (or both) in common.

Consequently. If T has a receiver in common with each of its two parents. Consider a network code which assigns coding vectors [0 1] and [1 0] to the source subtrees. to each of the coding subtrees. if T and P2 do not have a child in common. where α is a primitive element of Fq and q > m. do not have a receiver in common. and thus this code is feasible and valid. By claim (1). Similarly. and thus has no children. the code is still valid. one eventually reaches a child of T that is a terminal node of the subtree graph. Therefore. say C1 . And so forth. Let P1 and P2 be its parents. which contradicts the assumption. If T and one of the parents. a parent and a child have either a receiver or a child in common. there exists a descendant of P2 and child of T which must have a receiver in common with T .40 Theoretical Frameworks for Network Coding (2) Each coding subtree contains at least two receiver nodes. with no child or receiver in common. following the descendants of P1 . Proof. (3) Each source subtree contains at least one receiver node. As discussed in Appendix. the conﬁguration is not minimal. then they have a child in common. (1) Suppose that the minimal subtree graph has two source and m coding subtrees. by claim (1) of Theorem 3.5. if T and C1 do not have receiver in common. any two diﬀerent coding vectors form a basis for F2 . then they have a child in common. and a diﬀerent vector [1 αk ]. Now. (2) Consider a coding subtree T . Similarly. T has to have a receiver in common with this terminal subtree. then T has two receiver nodes. 0 < k ≤ m. which would have to be assigned a scalar multiple of the coding vector of Ti and Tj . Let q Ti and Tj denote a parent and a child subtree. alter the code by assigning the coding vector of Ti to Tj . respectively. say P1 . This assignment is feasible because Ti and Tj do not share a child. . and keep the rest of coding vectors unchanged. Since Ti and Tj do not share a receiver.

there are no coding subtrees. Therefore.2. there exist exactly two minimal subtree graphs shown in Figure 3. Following the same reasoning as in the previous claim.4(b) shows the network scenario in which each receiver has access to a diﬀerent source subtree and a common coding subtree.6. then by claim (1) it will have a receiver or a child in common with T1 . then each must contain N receiver nodes. then by Theorem 3. it is easy to show that for example there exist exactly three minimal conﬁgurations with two sources and Fig. If network coding is required.. For a network with two sources and two receivers.3 Combinatorial Framework 41 (3) If the two source subtrees have no children. Figure 3. all instances {G. and can be reduced to it in polynomial time by removing edges from the graph G. Figure 3.3. say T1 . i. and (b) network coding necessary. R} with two sources and two receivers where network coding is required. This case corresponds to the familiar butterﬂy example of a network with two sources and two receivers. Proof. If the source subtree has a child.4 Two possible subtree graphs for networks with two sources and two receivers: (a) no coding required. 3. S.4(a) shows the network scenario in which each receiver has access to both source subtrees.e. and thus no network coding is required. Continuing along these lines. Lemma 3. “contain” the butterﬂy network in Figure 1.7. .4. we conclude that each source subtree contains at least one receiver node. there can only exist one coding subtree containing two receiver nodes.

42 Theoretical Frameworks for Network Coding three receivers. Similar analysis goes through in the case of larger ﬁnite alphabets. It also selects |Out(S)| functions {fiS }. fiS : 2nωS → 2n . For simplicity. E).. we can send only bits through the unit capacity edges of the network.1 Network Operation with Coding The network operates as follows: (1) A source S of rate ωS produces B packets (messages). Let In(v) and Out(v) denote the set of incoming and outgoing edges for vertex v. Consider the instance {G = (V. The proof outline we give is high level and overlooks a number of technical details. each containing nωS information bits. one can enumerate all the minimal conﬁgurations with a given number of sources and receivers. that is. one for each of its outgoing edges.g. S. 3. (2) Similarly. which map the nωS information bits into n coded bits to be transmitted over its outgoing edges.4. and seven minimal conﬁgurations with two sources and four receivers. R}. as we discuss in part II of the review). we assume binary communication. and as its output one . Subsequently. and refer the interested reader to the literature for in depth and rigorous expositions on the information theoretic approach (see Section 3. In fact. to derive bounds for multiple unicast sessions traﬃc. both information theorists and computer scientists used information theoretic tools to establish bounds on achievable rates for the cases where the main theorem does not hold (e. each vertex v ∈ V selects |Out(v)| functions {fiv }.4 Information-Theoretic Framework The original approach to network coding as well as the original proof of the main theorem was information theoretic. 3.3).6. We here brieﬂy summarize the very beautiful proof idea used for the main theorem. Each function fiv has as its inputs |In(v)| packets of length n.

which it uses to decode the source packet mk . We ﬁrst calculate the probability Pr(m. the received information rate is nωS B → ωS n(B + |V |) 3. It then (by the means of functions {fiv }) computes and sends |Out(v)| packets through its outgoing edges. each receiver Rj gets |In(Rj )| packets. based on its knowledge of the network topology and all the mapping functions that the source and the intermediate nodes employ. if we were to restrict the network nodes operation to linear processing over F2 . The network is clocked: at time k of the clock.3. For each information packet mk . After time B. For example. m ) that a speciﬁc . each other (nonsource) vertex v ∈ V waits until it receives all |In(v)| incoming packets that depend only on packet mk . the source S maps one of its B information packets denoted by mk to |Out(S)| packets.4. The functions {fiS } and {fiv } are chosen uniformly at random among all possible such mappings. the source stops transmission. (3) (4) (5) (6) The Information Rate Achievable with Coding We now argue that the previously described network operation allows the receivers to successfully decode the source messages. each function fiv would be deﬁned by a randomly selected binary matrix of dimension |In(v)|n × n. 1 ≤ k ≤ B. 0}nωS . For each information packet mk . for 1 ≤ k ≤ B. and sends these packets through its outgoing edges. provided that the rate ωS is smaller than the min-cut between the source and each receiver. 1 ≤ k ≤ B. The receivers decode all B source packets during at most B + |V | time slots.4 Information-Theoretic Framework 43 packet of length n fiv : 2n|In(v)| → 2n . (1) Consider two distinct source messages m and m ∈ {1. Consequently. also depending only on packet mk .2 for B |V |.

since the mappings for every edge are chosen independently. That is. Since the mappings are chosen uniformly at random. at any other part but the source that produces packets at . and thus. for all v ∈ V. For every edge e = (v. and maps them using v the randomly chosen fe to the same packet for edge e. the condition |ϑV| > ωS is suﬃcient. S belongs to V = V \ V (the complement of V in V ). v receives from its input edges two distinct sets of packets for m and m . we have an error at receiver Rj if and only if Rj ∈ V. u) ∈ ϑV. at all the edges in ϑV. by the union bound. m ) to be the set of vertices in V that cannot distinguish between source messages m and m . Deﬁne V = V(m. In other words. the total probability or error does not exceed 2−n|ϑV| 2nωS . u) ∈ E with v ∈ V. ϑV deﬁnes a cut between the source and the receiver Rj . if m = minV |ϑS| is the min-cut value between the source and the destination. the probability of error also decays to zero. messages m and m are mapped to the same point equals 2−n|ϑS| . We now calculate the probability that such a cut occurs. u ∈ V}. Using the union bound we can show that the same result holds simultaneously for all receivers.44 Theoretical Frameworks for Network Coding receiver Rj is not going to be able to distinguish between m and m . Thus. the probability that. Furthermore. we calculate the pairwise probability of error. ϑV = {e|e = (v. this can happen with probability 2−n . the network operation we previously described allows to achieve all rates strictly smaller than this min-cut. all the In(v) incoming packets corresponding to messages m and m are the same. For this to happen. Clearly. When an error occurs. note that the network operation does not use the knowledge of the min-cut value. Therefore. Moreover. v can distinguish between m and m but u cannot. that is. This means that. First. provided that 2−n(|ϑV|−ωS ) decays to zero. (2) (3) (4) (5) We now make several important observations. Let ϑV denote the edges that connect V with V. We have 2nωS distinct messages in total.

3. That is. although the proof assumed equal alphabet sizes. Let fvu ∈ R+ denote the ﬂow through the 1 As deﬁned for example in [17].1 3. We start with some standard formulations for the max-ﬂow problem and for multicast without network coding.5. the same arguments go through for arbitrary alphabet sizes. where an intermediate node receiving linear independent information streams needs. and more generally. Third. or where the node is located in the network. and then present the LP formulations for multicast with network coding. as we will discuss in the second part of the review.5 Linear-Programming Framework 45 the information rate. each intermediate node performs the same operation. instead of unit capacity edges. In this section. E). Finally. if each edge were a Discrete Memoryless Channel (DMC). which turned out to be the most natural for addressing networks with costs. where edge e = (v.1 Unicast and Multicast Flows Consider the problem of maximizing the ﬂow from a source S ∈ V to a single destination R ∈ V . we had arbitrary capacities. 3. we always assume a capacitated directed graph G = (V. no matter what the min-cut value is. using the information theoretical min-cut. Network coding has also been studied within this framework. This is not the case with routing. Another notable achievement of this approach is the characterization of throughput beneﬁts of network coding. This property of network coding proves to be very useful in dynamically changing networks. note that the same proof would apply if. to know which information stream to forward where and at which rate. Chapter 14.5 Linear-Programming Framework There is a very rich history in using Linear Programming (LP) for studying ﬂows through networks. the packet length n can be thought of as the “alphabet size” required to achieve a certain probability of error when randomly selecting the coding operations. Second. for example. u) that connects vertex v ∈ V to vertex u ∈ V has an associated capacity cvu ≥ 0. .

u) ∈ E ∀ (v. ﬂow conservation can also be applied to the source and destination nodes. RN } of N terminals.5 depict a Steiner tree rooted at S2 and a partial Steiner tree rooted at S1 . ∀ (v. A partial Steiner tree may not reach all receivers. We want to maximize the ﬂow that the source sends to the receiver. . With each tree t. fvu ≥ 0. . then this problem is equivalent to the problem of Packing Steiner Trees. apart from the source and the destination.46 Theoretical Frameworks for Network Coding edge e = (v. we associate a random variable yt that expresses the information rate conveyed through the tree t to the receivers.w)∈E fuw . and nt the number of terminals in t. 2 Fractional means that we do not restrict yt to integer values. (capacity constraints) Consider now the problem of maximizing the multicast rate from the (source) vertex S ∈ V to a set R = {R1 .u)∈E (u. . The capacity constraints require that the value of the ﬂow through each edge does not exceed the capacity of the edge. Let τ be the set of all Steiner trees in {G. Flow conservation ensures that at every node u of the graph. R}. R2 . The maximum fractional2 packing of Steiner trees can be calculated by solving the following linear program. subject to two conditions: capacity constraints and ﬂow conservation. Figure 3. the total incoming ﬂow equals the total outgoing ﬂow. The associated LP can be stated as follows: Max-Flow LP: maximize fRS subject to fvu = (v. S. u) ∈ E. . by adding a directed edge of inﬁnite capacity that connects the destination to the source. . u) ∈ E ∀u∈V (ﬂow conservation) fvu ≤ cvu . A Steiner tree is a tree rooted at the source vertex that spans (reaches) all receivers. If we do not use network coding. In fact.

3.2 Multicast with Network Coding with and without Cost If we use network coding. . t∈τ :e∈t ∀e∈E (capacity constraints) yt ≥ 0. Packing Steiner Trees LP: maximize t∈τ yt subject to yt ≤ ce .” for example using linear combination. which simply summarizes what information rate actually ﬂows through each edge. we associate its own ﬂow f i . Each individual ﬂow f i obeys ﬂow conservation equations.5 The butterﬂy network with a Steiner tree rooted at S2 and a partial Steiner tree rooted at S1 . With each receiver Ri . the individual ﬂows are allowed to “overlap. ∀t∈τ This is a hard problem to solve. However. 3. as the set of all possible Steiner trees can be exponentially large in the number of vertices of the graph. the problem of maximizing the throughput when multicasting has a polynomial time solution. resulting in a conceptual ﬂow f .5.5 Linear-Programming Framework 47 Fig. The new idea we can use is the following.3.

R} where the min-cut to each receiver is larger or equal to h. ∀ (v. In many information ﬂow instances.w)∈E ∀i i fuw . such as the one we describe next. u) ∈ E. we could simply identify the min-cut from the source to each receiver. and this will be the subject of Chapter 5. ≥ 0 and fvu ≥ 0. and assume . we can apply the main theorem in network coding to prove the existence of a network code that can actually deliver this rate to the receivers. we may be interested in minimizing a cost that is a function of the information rate and may be diﬀerent for each network edge. u) ∈ E ∀ (v. ∀ u ∈ V. ∀i The ﬁrst constraint ensures that each receiver has min-cut at least χ.48 Theoretical Frameworks for Network Coding It is now this ﬂow that needs to satisfy the capacity constraints. no matter how the diﬀerent ﬂows fi overlap. and that it introduces an elegant way of thinking about network coded ﬂows through conceptual ﬂows that can be applied to problems with similar ﬂavor. ∀ (v. For example. The last missing piece is to show that we can design these network codes in polynomial time. S.u)∈E i fvu i fvu (u. Consider an instance {G. the cost of power consumption in a wireless environment can be modeled in this manner. The merit of this program is that it can easily be extended to apply over undirected graphs as well. Note that. i fvu = (v. instead of maximizing the rate. Again we add virtual edges (Ri . S) of inﬁnite capacity that connect each receiver Ri to the source vertex. Note that the conceptual ﬂow f does not need to satisfy ﬂow conservation. Network Coding LP: maximize χ subject to i fRi S ≥ χ. u) ∈ E (conceptual ﬂow constraints) (capacity constraints) fvu ≤ cvu . ∀ i (ﬂow conservation) ≤ fvu . Once we solve this program and ﬁnd the optimal χ. and use the minimum such value. instead of solving the LP. over directed graphs.

if there is a cost associated with using the edges of the graph. 3. u) ∈ E ∀ (v. The problem of minimizing the overall cost in the network is also computationally hard if network coding is not allowed.6 Types of Routing and Coding We here describe several types of routing and coding we have or will encounter in this review.1 Routing Routing refers to the case where ﬂows are sent uncoded from the source to the receivers. 3. On the other hand. the same rate the receiver would achieve by using all the network resources exclusively to support its information transmission.2. although network coding allows us to send information to each receiver at a rate equal to its min-cut. where we associate zero cost with the virtual edges (Ri . and intermediate nodes simply forward their incoming . ∀u ∈ V. S): Network Coding with Cost LP: minimize (x. u) ∈ E (conceptual ﬂow constraints) (capacity constraints) fvu ≤ cvu . u) is proportional to the ﬂow fvu with a weight wvu ≥ 0 (i. by using network coding. i fvu = (v.3. ∀ (v.e. ∀ (v.. u) ∈ E. ∀i In general. i fvu ≥ 0 and fvu ≥ 0. We will see a speciﬁc such case in Example 6. we can solve it in polynomial time by slightly modifying the previous LP..y)∈E wxy fxy subject to i fRi S ≥ h.6 Types of Routing and Coding 49 that the cost of using edge (v. the cost equals wvu fvu ).6. ∀i (ﬂow conservation) ≤ fvu . i.e. network coding does not necessarily allow us to achieve the optimal cost that the exclusive use of the network would oﬀer.u)∈E i fvu (u.w)∈E ∀ i i fuw .

intermediate nodes are allowed to combine packets using any operation (e.2 Network Coding Network coding refers to the case where the intermediate nodes in the network are allowed to perform coding operations. Fractional (integral) routing is equivalent to the problem of packing fractional (integral) Steiner trees. Vector routing amounts to packing Steiner trees not only over space. bitwise AND). (3) Vector routing. but also over time. We can think of each source as producing a length L binary packet.50 Theoretical Frameworks for Network Coding information ﬂows.g. We distinguish the following types of routing: (1) Integral routing. (2) Fractional routing. . Encoding L/m amounts to linear transformations in the vector space F2m : each incoming packet (that is. and then added to other packets. the corresponding vector) is L L multiplied by an m × m matrix over F2m . Coding amounts to linear operations over the ﬁeld Fq . (3) In nonlinear network coding. each incoming packet in a coding point is multiplied with a symbol in Fq and added to other incoming packets. Thus. and assume we can send through each edge one bit per time slot. we treat each packet as a vector of length L/m with symbols in F2m for some m. sets of m consecutive bits are treated as a symbol of Fq with q = 2m . which allows multiple fractional rates from diﬀerent sources to share the same edge as long as they add up to at most the capacity of the edge. The same operations are applied symbol-wise to each of the L/m symbols of the packet. 3. (1) In linear network coding. which can be either integral or fractional but is allowed to change at the beginning of each time slot. There are several possible approaches to network coding.6.. (2) In vector linear network coding. which requires that through each unit capacity edge we route at most one unit rate source.

Let τ denote the set of partial Steiner trees in G rooted at S with terminal set R.e. We will be using the number of time slots. there exist networks that are solvable using vector linear network coding. S.3 Coding at the Source This is a hybrid scheme between routing and network coding: source nodes are allowed to perform coding operations and send out encoded information streams.6. are only allowed to forward their incoming information streams. The goal is to maximize the total number of trees that each receiver occurs in. the alphabet size of the solution can be signiﬁcantly reduced as we move from linear. i. 3. but not solvable using linear network coding. We now formulate the problem of creating an appropriate routing schedule that supports such a coding scheme as a linear program. In each time slot we seek a feasible fractional packing of partial Steiner trees. For example: • Pack through the network and over n-time slots partial Steiner trees. Intermediate nodes in the network however. there exist networks which are solvable only using nonlinear operations and are not solvable using linear operations. trees that are rooted at the source nodes that span a subset of the receivers such that each receiver appears in at least f trees. to vector linear. and to nonlinear solutions. as a parameter. we can combine routing (or vector routing) with erasure correcting coding. Moreover.. Finally. • Encode the information using an erasure correcting code at the source and send a diﬀerent coded symbol through each partial tree. Consider an instance {G. The idea behind this scheme is to assume that a receiver experiences an “erasure” if it is not spanned by a partial Steiner tree. Thus each receiver essentially observes the source information through an erasure channel. n.3. R}.6 Types of Routing and Coding 51 For traﬃc conﬁgurations where the main theorem in network coding does not apply. In networks where there are no losses. to maximize the multicast rate. across .

52 Theoretical Frameworks for Network Coding the time slots. Coding at the Source LP: maximize f subject to y(t. let m = k t∈τ y ∗ (t.) For integer edge capacities ce . At each time slot. we can pack three partial Steiner trees. We express this as a linear program as follows.e. which would lead to Ti = 1.3. y(t. and each receiver has min-cut three from the source. 1≤k≤n Given a solution y ∗ to this linear program.. we can then achieve Ti = 2. For a tree t ∈ τ and a time slot k. k t∈τ :Ri ∈t ∀ Ri 1≤k≤n y(t. The network in Figure 3. (To be precise. Alternatively. In lossy networks. This scheme is very related to the rate-less codes approach over lossy networks. We can use an erasure code that employs m coded symbols to convey the same f information symbols to all receivers. Example 3. If we are restricted to integral routing over a single time-slot. . k). we need f /m to be a rational and not a real number. k) that indicates the amount of t that is packed in time slot k. Rate-less codes allow the receivers to decode as soon as they collect a suﬃcient number of coded symbols. the erasures that occur correspond to a particular packing of partial Steiner trees. ∀ t ∈ τ. packet erasures can be thought of as randomly pruning trees. k) ≥ 0. such as LT codes and Raptor codes. k) ≥ f. t∈τ :e∈t ∀ e ∈ E. where each receiver is taking part in two distinct trees. there is an optimum solution with rational coordinates.6 has N = 10 receivers. we have a non-negative variable y(t. i. to achieve rate Ti = f /m. k) ≤ ce . Using an erasure code of rate 2/3 at the source. we can pack one Steiner tree.

The presented combinatorial framework follows the approach of Fragouli and Soljanin in [24]. or (Case 2) three unit rate partial Steiner trees. we can pack (Case 1) one unit rate Steiner tree. The network coding LP . Introduction to LP can be found in numerous books. 44]. who translated the network code design to an algee braic problem which depends on the structure of the underlying graph.1 is proved by Ho et al. The information theoretic framework comes from the original paper by Ahlswede et al. Notes The algebraic framework for network coding was developed by Koetter and M´dard in [32].6 A network with unit capacity edges where. if we are restricted to integral routing over one time-slot. see for example [10.3. Lemma 3.6 Types of Routing and Coding 53 Fig. 3. [2]. in [26] and by Harvey in [25].

Linear vs.54 Theoretical Frameworks for Network Coding that achieved ﬂow maximization was proposed by Li et al. in [15]. while the cost minimization LPs and decentralized distributed algorithms for solving them were proposed by Lun et al. in [11]. in [38]. in [37]. . nonlinear solvability was discussed by Riis in [42] and by Dougherty et al. in [19]. Vector routing was examined by Cannons et al. Coding at the source using partial tree packing was proposed by Chekuri et al.

which refers to the average of the rates that the individual receivers experience. We consider both directed and undirected networks. This approach applies to both directed and undirected graphs. and average throughput. For the special case of undirected networks.4 Throughput Beneﬁts of Network Coding The multicast examples considered in the previous chapters demonstrated that network coding can oﬀer throughput beneﬁts when compared to routing. under two types of routing: integral routing. which refers to the common information rate guaranteed to all receivers. which requires that through each unit capacity edge we route at most one unit rate source. We will see how throughput beneﬁts of network coding can be related to the integrality gap of a standard LP formulation for the Steiner tree problem. we will outline a simple proof that coding can at most double the throughput. we will here look into how large such beneﬁts can be. we will give examples where coding oﬀers 55 . For directed networks. which allows multiple fractional rates from diﬀerent sources that add up to at most one on any given edge. and fractional routing. We consider two throughput measures: symmetric throughput.

we use the following notation: j • Tij and Tf denote the rate that receiver Rj experiences with fractional and integral routing. where the maximization is over all possible routing strategies. The beneﬁts of network coding in the case of the common throughput measure are described by Ti Tc and Tf Tc and the beneﬁts of network coding in the case of the average throughput measure are described by Tiav Tc and av Tf Tc . under some routing strategy. We refer to this throughput as symmetric or common. j 1 1 av • Tiav = N max N Tij and Tf = N max N Tf denote the j=1 j=1 maximum integral and fractional average throughput. 4. respectively. For a multicast conﬁguration with h sources and N receivers. and refer to it as coding throughput. We. . from the main network multicast theorem. Also. • Ti and Tf denote the maximum integral and fractional rate we can route to all receivers. For the routing throughput.56 Throughput Beneﬁts of Network Coding large throughput beneﬁts as well as examples where these beneﬁts are limited. it holds that Tc = h.1 Throughput Measures We denote by Tc the maximum rate that the receivers experience when network coding is used. have av 1 ≤ Tiav ≤ Tf ≤ h. therefore. because there exists a tree spanning the source and the receiver nodes. the uncoded throughput is at least 1.

and R = {R1 . 4. e ∈ E. h Tc Tc (4. R2 . In the Steiner tree packing problem. S. . .1 and give without proof the respective result for the average throughput in Theorem 4.2. 4. With every edge e of the graph.4. S. We prove this result for the symmetric throughput in Theorem 4.1) is achievable by the conﬁgurations in which network coding is not necessary for multicast at rate h. .1) The upper bound in (4. we can in general associate two parameters: a capacity ce ≥ 0 and a cost (weight) we ≥ 0. we are given {G. However. denote vectors that collect the set of edge capacities and edge weights. R} where G = (V. We focus our discussion on directed graphs. respectively. . RN } a set of N terminals (receivers). R}. R} and a set of non-negative edge capacities c. the described approach applies to both directed and undirected graphs. S ∈ V is a root (source) vertex. Let c = [ce ] and w = [we ].2 Linear Programming Approach We here use the Linear Programming (LP) framework to show that the throughput beneﬁts network coding oﬀers equal the integrality gap of a standard formulation for the Steiner tree problem. S. as we will treat undirected graphs separately later on. Either the edge weights or the edge capacities or both may be relevant in a particular problem. For the instance {G. a Steiner tree is a subset of G that connects S with the set of receivers R.1 Symmetric Throughput Coding Beneﬁts Consider an instance {G.2 Linear Programming Approach 57 and thus av Tf T av 1 ≤ i ≤ ≤ 1.2. The maximum throughput achievable with routing for this instance can be calculated using the following LP . E) is a directed graph. We consider essentially two problems: the Steiner tree packing problem and the minimum Steiner tree problem.

S. In the minimum Steiner tree problem. ∀ t∈τ Recall that the variable yt expresses how much fractional rate we route from the source to the receivers through the Steiner tree t. R. e∈t ze ≥ 0. we can think of the variable ze as expressing a cost of edge e. For further analysis.2) ∀ e∈E subject to yt ≤ ce . c) the rate network coding achieves. R} and a set of non-negative edge weights w. and Tc (G. Let Tf (G. S.58 Throughput Beneﬁts of Network Coding (see Chapter 3): Packing Steiner Trees – Primal: maximize t∈τ yt (4. t∈τ :e∈t yt ≥ 0.3) ∀ t∈τ subject to ze ≥ 1. ∀ e∈E In the dual program. for all possible Steiner trees t ∈ τ for our instance. We are asking what cost ze should be associated with edge e (for all e ∈ E) so that each Steiner tree has cost at least one and the total cost e∈E ce ze is minimized. it is useful to look at the dual problem of the above LP: Packing Steiner Trees – Dual: minimize e∈E ce ze (4. S. c) denote the optimal solution of this problem. R. We are asked to ﬁnd the minimum weight tree that connects the source to all the terminals. we are given {G. Here edge capacities are not relevant: the Steiner tree either uses or .

w. We consider the following / integer programming (IP) formulation for the minimum Steiner tree problem: Steiner Tree Problem: minimize e∈E we xe (4.2 Linear Programming Approach 59 does not use an edge. We can further simplify this constraint to xe ≥ 0. We denote by opt(G. then it remains feasible by setting xe = 1).5) ∀ D: D is separating subject to xe ≥ 1.4. v) ∈ E : u ∈ D.4) ∀ D: D is separating ∀ e∈E subject to xe ≥ 1. 1}. δ(D) = {(u. we need the notion of a separating set of vertices. e∈δ(D) xe ≥ 0. To state this problem. R) the value of the optimum solution for the given instance. v ∈ D}. A linear relaxation of the above IP is obtained by replacing the constraint xe ∈ {0. S. that is. 1} by 0 ≤ xe ≤ 1 for all e ∈ E. e∈δ(D) xe ∈ {0. and obtain the following formulation: Linear Relaxation for Steiner Tree Problem: minimize e∈E we xe (4. ∀ e∈E . e ∈ E (noticing that if a solution is feasible with xe ≥ 1. We call a set of vertices D ⊂ V separating if D contains the source vertex S and V \ D contains at least one of the terminals in R. where there is a binary variable xe for each edge e ∈ E to indicate whether the edge is contained in the tree. Let δ(D) denote the set of edges from D to V \ D.

60 Throughput Beneﬁts of Network Coding Let lp(G, w, S, R) denote the optimum value of this linear program for the given instance. The value lp(G, w, S, R) lower bounds the cost of the integer program solution opt(G, w, S, R). The integrality gap of the relaxation on G is deﬁned as α(G, S, R) = max

w≥0

opt(G, w, S, R) , lp(G, w, S, R)

where the maximization is over all possible edge weights. Note that α(G, S, R) is invariant to scaling of the optimum achieving weights. The theorem tells us that, for a given conﬁguration {G, S, R}, the maximum throughput beneﬁts we may hope to get with network coding equals the largest integrality gap of the Steiner tree problem possible on the same graph. This result refers to fractional routing; if we restrict our problem to integral routing on the graph, we may get even larger throughput beneﬁts from coding. Theorem 4.1. Given an instance {G, S, R} max

w

opt(G, w, S, R) Tc (G, S, R, c) = max . c Tf (G, S, R, c) lp(G, w, S, R)

**Proof. We ﬁrst show that there exist weights w so that max
**

c

opt(G, w, S, R) Tc (G, S, R, c) ≤ . Tf (G, S, R, c) lp(G, w, S, R)

(4.6)

(1) For the instance {G, S, R}, ﬁnd c = c∗ such that the left-hand side ratio is maximized, i.e., Tc (G, S, R, c) Tc (G, S, R, c∗ ) = max . ∗) c Tf (G, S, R, c) Tf (G, S, R, c Assume c∗ is normalized so that the minimum min-cut from the source to the receivers (and as a result the network coding rate) equals one, that is, 1 Tc (G, S, R, c∗ ) = . ∗) Tf (G, S, R, c Tf (G, S, R, c∗ ) (4.7)

Note that normalization does not aﬀect fractional routing.

4.2 Linear Programming Approach

61

**(2) Now solve the dual program in (4.3). From strong duality,1 we get that Tf (G, S, R, c∗ ) =
**

∗ yt = ∗ ce ze .

(4.8)

Then the dual program equivalently says that there exist ∗ costs ze such that each Steiner tree has cost at least one. From the slackness conditions,2 each tree t used for fractional ∗ routing (with yt > 0), gets cost of exactly one e∈t ze = 1. (3) We now use our second set of optimization problems. Consider the same instance {G, S, R}, and associate weights ∗ we = ze with every edge of the graph, and solve on this graph the Steiner tree problem in (4.4). Since for this set of weights, each Steiner tree in our graph has weight at least one, and (as previously discussed) the slackness conditions imply that there exist trees of weight one, the minimum value the Steiner tree IP can achieve equals one: opt(G, w = z ∗ , S, R) = 1. (4.9)

(4) Consider the LP relaxation of this problem stated in (4.5). Note that we can get a feasible solution by choosing x = c∗ (the capacities in the original graph). Indeed, for these values the min-cut to each receiver equals at least one. Moreover, ∗ ze ce = Tf (G, S, R, c∗ ). Therefore, the LP minimization will lead to a solution lp(G, w = z ∗ , S, R) ≤ Tf (G, S, R, c∗ ). Putting together (4.7), (4.9), and (4.10), we get (4.6). We next show that there exist capacities c so that opt(G, w, S, R) Tc (G, S, R, c) ≥ max . w lp(G, w, S, R) Tf (G, S, R, c)

1 Strong 2 See

(4.10)

(4.11)

duality holds since there exists a feasible solution in the primal. [10, 44] for an overview of LP.

62 Throughput Beneﬁts of Network Coding (1) We start from the Steiner tree problem in (4.4) and (4.5). Let w∗ be a weight vector such that α(G, S, R) = opt(G, w∗ , S, R) opt(G, w, S, R) . = max w lp(G, w∗ , S, R) lp(G, w, S, R)

**Let x∗ be an optimum solution for the LP on the instance (G, w∗ , S, R). Hence lp(G, w∗ , S, R) =
**

e ∗ we x∗ . e

Note that vectors with positive components, and thus the vectors x = {xe , e ∈ E} satisfying the constraints of (4.5), can be interpreted as a set of capacities for the edges of G. In particular, we can then think of the optimum solution x∗ as associating a capacity ce = x∗ with each edge e so that the e min-cut to each receiver is greater or equal to one, and the ∗ cost e we x∗ is minimized. e (2) Consider now the instance with these capacities, namely, {G, c = x∗ , S, R} and examine the coding throughput beneﬁts we can get. Since the min-cut to each receiver is at least one, with network coding we can achieve throughput Tc (G, c = x∗ , S, R) ≥ 1. With fractional routing we can achieve Tf (G, c = x∗ , S, R) =

∗ yt

(4.12)

(4.13)

∗ for some yt . (3) We now need to bound the cost opt(G, w∗ , S, R) that the solution of the IP Steiner tree problem will give us. We will use the following averaging argument. We will show that the average cost per Steiner tree will be less or equal to ∗ we x∗ e . Tf (G, c = x∗ , S, R)

Therefore, there exists a Steiner tree with the weight smaller than the average, which upper bounds the solution of (4.5): opt(G, w∗ , S, R) ≤

∗ lp(G, w∗ , S, R) we x∗ e = Tf (G, x∗ , S, R) Tf (G, x∗ , S, R)

2 Linear Programming Approach 63 But from (4. by changing the order of summation. c = x∗ . S.14) ∗ ∗ ∗ (5) Finally. . Then the network coding throughput is equal to the common min-cut value. Hence we have that e ∗ wt yt ≤ t∈τ e∈E ∗ we x∗ . if the min-cut to a receiver Ri is larger than required.2 Average Throughput Coding Beneﬁts We now consider the coding advantage for average throughput over a multicast conﬁguration {G. We have ∗ wt yt ≥ min {wt } t∈τ t∈τ t∈τ ∗ yt . c = x∗ .4. 4. That is. S. S. Indeed. ∗ (4) Let wt = e∈t we denote the weight of a tree t. S.12). the ∗ quantity t:e∈t yt is at most x∗ . By the feasibility of y ∗ for the capacity vector c = x∗ .6) and (4.15) Putting together (4. R) and we are done. R) Tc (G. We will assume for technical reasons that the min-cut from S to each of the terminals is the same. we need to show that t∈τ wt yt ≤ e∈E we xe . S. This can be easily arranged by adding dummy terminals. S. we get ∗ wt yt = t∈τ t∈τ ∗ yt e∈t ∗ we = e∈E ∗ we t:e∈t ∗ yt . w∗ . c = x∗ . Thus there exists a tree t1 ∈ arg mint∈τ {wt } of weight wt1 such that wt1 ≤ t∈τ ∗ wt yt ∗ . w∗ . we connect the receiver node to a new dummy terminal through an edge of capacity equal to the min-cut. R) Tf (G. we have lp(G. t∈τ yt (4. R) ≤ lp(G. R} and a set of non-negative capacities c on the edges of G.2. e (4. and consider ∗ ∗ t∈τ wt yt (the total weight of the packing y ).11) proves the theorem. R) Tf (G.

c) denote the value of the above linear program on a given instance. S. R ). since we assumed that the min-cut to each receiver is the same. c. S. S. R. S. R) = max c Tc (G. It is easy to see that β(G. R) ≤ α(G. R) ≥ 1. av Tf (G. R}. R) Note that β(G) is invariant to scaling capacities. since for any given conﬁguration {G. av namely. S. S. R ⊆R . S. A partial Steiner tree t stems from the source S and spans all or only a subset of the terminals. S. we associate a variable yt denoting a fractional ﬂow through the tree. R}. S. Then the maximum fractional packing of partial Steiner trees is given by the following linear program: Packing Partial Steiner Trees: nt maximize yt N t∈τ subject to yt ≤ ce . R). t∈τ :e∈t ∀ e∈E yt ≥ 0. the average throughput is at least as large as the common throughput we can guarantee to all receivers. Tf ≥ Tf . c.64 Throughput Beneﬁts of Network Coding The maximum achievable average throughput with routing is given by the maximum fractional packing of partial Steiner trees. and thus network coding achieves the maximum possible sum rate. S. ∀ t∈τ av Let Tf (G. R) . Let β(G. R∗ ) = max β(G. It is also straightforward that β(G. R∗ ) denote the maximum average throughput beneﬁts we can get on graph G when multicasting from source S to any possible subset of the receivers R ⊆ R: β(G. The coding advantage for average throughput on G is given by the ratio β(G. Let τ be the set of all partial Steiner trees in {G. With each tree t. S. and nt the number of terminals in t. c. S.

S. where HN is the N th harmonic number HN = av Note that the maximum value of Tf and Tf is not necessarily achieved for the same capacity vector c. The theorem quantitatively bounds the advantage in going from the stricter measure α(G. N = h receivers. R) . S. S. For a conﬁguration {G.4.1. R) or β(G. What this theorem tells us is that.3 Conﬁgurations with Large Network Coding Beneﬁts The ﬁrst example that demonstrated the beneﬁts of network coding over routing for symmetric throughput was the combination network B(h. 4. R∗ ) = O(1/ ln N )α(G. S. R). HN N j=1 1/j. The Combination Network k The combination network B(h. S. it is often the case that for particular instances (G. All sources . with |R| = N . S. S. R∗ ) ≥ max 1. has h sources. where k ≥ h. k). R∗ ) = 1. HN ]. R∗ ) is easier to analyze and the theorem can be useful to get an estimate of the other quantity. for a given {G. S. R)/β(G. R∗ ). Example 4. S. the ratio α(G. We now comment on the tightness of the bounds in the theorem. we have β(G. Examples include network graphs discussed in the next section. R). There are instances in which β(G. Furthermore. R} where the number of receivers is |R| = N and the min-cut to each receiver is the same. and two layers of k intermediate nodes A and B. 1 α(G. S. S. S. R) to the weaker measure β(G. k) depicted in Figure 4. S.3 Conﬁgurations with Large Network Coding Beneﬁts 65 Theorem 4. take for example the case when G is a tree rooted at S. or for the same number of receivers N . R∗ ) can take a value in the range [1.2. the maximum common rate we can guarantee to all receivers will be at most HN times smaller than the maximum average rate we can send from S to any subset of the receivers R. On the other hand there are instances in which β(G. S. R}. either α(G. In general.1.

Tc h for k ≥ h2 . A speciﬁc instance B(h = 3. as Figure 4. because there exist h sources and k > h2 points. Each A node is connected to a correspondk ing B node. These receiver will experience rate equal to one. 4. between the A and B nodes. Each of the h receivers is connected to a diﬀerent subset of size h of the B nodes. The beneﬁts of network coding compared to integral routing for the network B(h. k) can be bounded as 1 Ti ≤ . note that the min-cut condition is satisﬁed for every receiver. 1 ≤ i ≤ k. that are the edges (Ai . Edges have unit capacity and sources have unit rate. From the pigeonhole principle.66 Throughput Beneﬁts of Network Coding k Fig. k) network has h sources and N = h receivers. Consider the receiver that observes these h coding points where S1 ﬂows. . each receiver observing a distinct subset of h B-nodes.16) follows.6.16) To see that. Bi ). that will be allocated to at least h coding points. and thus Tc = h. k = 5) is depicted in Figure 3. say S1 .1 The combination B(h. k) has k coding points. there will exist at least one source. are connected to k A nodes. that sends one source through every edges that is a coding point. Choose any integral routing scheme. and thus the bound (4. (4. B(h.1 depicts.

Example 4. All edges in the graph have unit capacity. N ) Network Let N and p. Therefore. be two integers and I = {1. . A zk(N. We next describe a class of networks with N √ receivers for which coding can oﬀer up to O( N )-fold increase of the average throughput achievable by routing. we will see that by using a coding scheme at the source and vector routing. the out-degree of A nodes is N − (p − 1). It is. because they were introduced by Zosin and Khuller. as we will see the following section.2. Tc h(k − h + 1) (4. is deﬁned by two parameters N and p as follows: Source S transmits information rate ω = h = N −1 to N p−1 receiver nodes R1 · · · RN through a network of three sets of nodes A. A B node is connected to a C node if and only if they correspond to the same element of B. We refer to this class as zk networks. .17) On the other hand. 2. therefore. who constructed them to demonstrate a possible size of the integrality gap for the directed Steiner tree problem. and thus the coding beneﬁts are upper bounded by the same factor. natural to ask if there exist conﬁgurations where network coding oﬀers signiﬁcant beneﬁts with respect to the average throughput. and B. Note that each Steiner tree needs to use k − (h − 1) out of the k Ai Bi edges. illustrated in Figure 4. A receiver node is connected to the C nodes whose indices contain the index of the receiver. the average throughput beneﬁts for B(h. the out-degree of C nodes is p. the in-degree of B nodes is p. and C. and the . and consequently Tf k ≤ . An A node is connected to a B node if the index of A is a subset of the index of B. .4. The zk(p. k) are upper bounded by a factor of two. N } be an index set. the symmetric throughput can approach the average.2. by the elements of B. . We deﬁne two more index sets: A as the set of all (p − 1)-element subsets of I and B as the set of all p-element subsets of I. The out-degree N of the source node is p−1 .3 Conﬁgurations with Large Network Coding Beneﬁts 67 It is also straightforward to upper-bound the fractional throughput of B(k. to reach all the receivers.and C-nodes. the fractional packing number is at most k/(k − h + 1). B. In addition. p) network. h). p ≤ N . A-nodes are indexed by the elements of A.

and that the ct C-nodes are all descendants of the at A-nodes. the number of B.68 Throughput Beneﬁts of Network Coding Fig. p) network (Figure 4.3. N −1 p−1 in-degree of the receiver nodes is there exist exactly N −1 p−1 . we can .2) with h = sources and N receivers. N −p+1 p (4. it is easy to show that N −1 p−1 edge disjoint paths between the source and .and C-nodes. In fact. and ct . 4. Let at be the number of A-nodes in tree t. and thus the min-cut to each receiver equals The following theorem provides an upper bound to the average throughput beneﬁts achievable by network coding in zk(N. each receiver. Theorem 4. each carrying a diﬀerent information source.18) Proof. p) networks. the information is transmitted from the source node to the receiver through a number of trees. Note that ct ≥ at .2 zk(N. If only routing is permitted. Therefore. p) network. In a zk(N. av Tf N −1 p−1 Tc ≤ 1 p−1 + .

we need to ﬁnd the number of receivers that can be reached by a set of disjoint trees. (4.4. Tc N −p+1 p av We can apply the exact same arguments to upper bound Tf . the right-hand side of (4. j=1 The maximum number of receivers the tree can reach through Aj is nt (Aj ) + p − 1. Note that for any set of disjoint trees.3 Conﬁgurations with Large Network Coding Beneﬁts 69 count the number of the receivers spanned by the tree as follows: Let nt (Aj ) be the number of C-nodes connected to Aj .20) is minimized for N +1 p= √ N +1 √ N. Therefore. and interpreting these values as the fractional rate of the corresponding trees. Consequently. the maximum number of receivers the tree can reach is at [nt (Aj ) + p − 1] = at (p − 1) + ct . Ti can be upper-bounded as Ti ≤ 1 N (at (p − 1) + ct ) = t 1 (p − 1) N at + t t ct (4. j=1 To ﬁnd an upper bound to the routing throughput.20) p−1 Tiav 1 ≤ + . For a ﬁxed N .19) ≤ (p − 1) N p−1 + N . p Thus. the jth A-node in the tree. p N −1 p−1 The network coding rate Tc is equal to . we have at ≤ t N p−1 and t ct ≤ N . Note that at nt (Aj ) = ct . . by allowing at and ct to take fractional values.

In any combination network with two unit rate sources (network coding throughput Tc = 2) a simple routing scheme can achieve an average throughput of at least 3Tc /4 as follows: We route S1 through one half of the k coding points Ai Bi and S2 through the other half. 4 that computing an optimum routing is in general NP-hard.3 We next describe several such classes of networks. which include several examples considered in the network coding literature up to date. Example 4. Therefore. The simplest examples of such networks are conﬁgurations with two sources.4 Conﬁgurations with Small Network Coding Beneﬁts For a number of large classes of networks.1) give 1 Tiav ≤ ≤1 2 Tc by setting h = 2. For a network with two sources the bounds in (4. In general. . for even k. Example 4. ≥ + Tc 2 2N and there are networks for which this bound holds with equality. We can achieve the lower bound by simply ﬁnding a spanning tree.70 Throughput Beneﬁts of Network Coding and for this value of p. There are networks with two sources with even smaller coding throughput advantage.4. N (4. is given by Tiav = 3 Recall 1 k 2 2 k 2 2 +2 k 2 −2 k 2 2 =2 3 1 + 4 4(k − 1) 3 > Tc . av Tf Tc ≤2 N 1+N √ 2 √ . the average routing throughput. coding can at most double the average rate achievable by using very simple routing schemes. a tighter bound also holds: 1 1 Tiav .21) 4.3.

The probability that a receiver does not observe a particular source is thus equal to = h−1 h h . The combination network provides an interesting example where coding oﬀers signiﬁcant beneﬁts for the symmetric throughput (as discussed in the previous section). while the term 2[ k − 2 k 2] is the sum rate 2 2 for the receivers that observe both sources. The subtree decomposition can prove useful here in enabling us to estimate coding advantages for large classes of networks. To see that.23) Some thought is required to realize that this routing scenario can be cast into a classic urns and balls experiment in which h balls (corresponding to the receiver’s h incoming edges) are thrown independently and uniformly into h urns (corresponding to the h sources).4 Conﬁgurations with Small Network Coding Beneﬁts k 71 2 The term 2 2 is the sum rate for the receivers that only observe one of the two sources. network coding at most doubles the throughput in all networks with h sources and N receivers whose information ﬂow graphs are bipartite and each coding point has two parents.4. The established theory of such occupancy models can tell us not only . we consider a simple randomized routing scheme and compute the expected throughput observed by a single receiver. A similar bound holds for odd k. the number of sources that a receiver observes on the average is given by Tiav = h(1 − ) = h 1 − 1 − 1 h h h(1 − e−1 ) > Tc . In this scheme. for each edge going out of the source node. 2 (4. The same is true for conﬁgurations where all coding points have min-cut h from the sources. Example 4. (4. For example. while it cannot even double the average throughput achievable by routing.22) Therefore. one of the h information sources is chosen uniformly at random and routed through the edge.5.

Therefore. over networks with h sources. we can say that as the number of sources increases. k) networks. Example 4. the random routing can be replaced by deterministic. we ﬁrst note that under the randomized routing. each receiver observes the sequence of each source outputs as if it had passed through an erasure channel with the probability of erasure given by (4. k) combination networks. and additionally.72 Throughput Beneﬁts of Network Coding the expected value of the random variable representing the number of observed sources but also its entire probability distribution. hold in the case of B(h.6. we have Ti (n) → h(1 − ) = Ti . These results. Moreover. The next example illustrates what can be achieved by coding at the source that was described in Section 3. enabling each receiver to get the same throughput.5. then the integral throughput can achieve the average asymptotically over time. we have Ti (n) → h · k/n as n → ∞. the source is allowed to apply channel coding (over time). If in Example 4. k coding point Ai Bi and an arbitrary number of receivers observing the coding points. Since k/n can be taken arbitrary close to the capacity. and the integral throughput Ti can achieve the average in a ﬁnite number of time units. using uniform at random allocation of the sources. . The capacity of such a channel is equal to 1 − .6. since there are h sources. To see that. Information theory tells us that there exists a sequence of codes with rates k/n such that the probability of incorrect decoding goes to 0 and k/n → 1 − as n → ∞. in addition to the described randomized routing scheme.3. Such codes can be applied at each source. which is repeated at each time instant. the probability that the normalized diﬀerence between the observed throughput is greater than some value x decreases exponentially fast with x2 . When the conﬁguration is symmetric.22). Similar concentration results hold when the number of receivers is large (the rates they experience tend to concentrate around a much larger value than the minimum) and give us yet another reason for looking at the average throughput. as in the case of B(h.

5 Undirected Graphs We now consider undirected networks. This graph has the following properties: (1) The min-cut to each receiver is at least h/2. and let k = maxRi ∈R mi . R} where now G is an undirected graph with unit capacity edges. Clearly. Tc = h.4.e. i.5 Undirected Graphs 73 4. the throughput beneﬁts network coding oﬀers as compared to routing are upper bounded by a factor of two Tc ≤ 2Tf . Theorem 4. where through each edge e = (x. Theorem 4. In multicasting over undirected graphs. provided their sum does not exceed the capacity of the edge. such that at all vertices in V other than the source and the receivers. we use the following result for directed graphs. (2) The in-degree at each intermediate vertex equals the outdegree.4. To prove the theorem.. Therefore. the conditions of Theorem 4. Let G = (V. Replace each undirected edge with two opposite directed edges. E) be a directed graph with the set of receivers R ⊂ V .5 (Bang-Jensen.5 are satisﬁed. fxy + fyx ≤ ce . and where the min-cut to each receiver equals h. Therefore. Consider an instance {G. the coding advantage of network coding is bounded by a constant factor of two. Proof. S. y) we can simultaneously route ﬂows fxy and fyx in opposite directions. 1995). each of capacity 1/2. the in-degree is greater or equal than the out-degree. there are h/2 edge-disjoint Steiner trees such that each receiver occurs in . In undirected graphs. Assume that receiver Ri ∈ R has min-cut mi from the source. Then there are k edgedisjoint partial Steiner trees in G such that each node Ri ∈ R occurs in mi of the k trees.

It additionally tells us that. We also note however that empirical studies that use polynomial time approximation algorithms to solve the routing problem showed a comparable performance to network coding. 15]. Experimental results by Wu et al.5 and then using any network code design algorithm). Information theoretic rate bounds over undirected networks with two-way channels where provided by Kramer and Savari in [33]. Ch. while ﬁnding the optimal routing (packing Steiner trees) is a computationally hard problem. and were used to demonstrate network coding throughput beneﬁts in [1. it does oﬀer beneﬁts in terms of the transmission scheme design complexity. and thus the routing throughput can be made to be at least h/2.5 actually provides a more general claim than what we need to prove Theorem 4. Notes The ﬁrst multicast throughput beneﬁts in network coding referred to the symmetric integral throughput in directed networks. Theorem 4. that is. in [37]. The interested reader is referred to [44] for details on integer and linear programs. Theorem 4. We conclude that Tc ≤ 2Tf . and by Chekuri et al.74 Throughput Beneﬁts of Network Coding h/2 of them. over undirected graphs we can still send to each receiver rate larger or equal to half its min-cut value using routing. and were reported by Sanders et al. in [15]. and to [4. One should be aware that although network coding does not oﬀer signiﬁcant throughput beneﬁts for multicasting over undirected graphs.4.5 was proved Bang-Jensen et al. we have a non-uniform demand network. in [14]. reported in [47] showed small throughput bene- . and it was extended to Theorem 4.1 was subsequently proved by Agarwal and Charikar in [1]. in [43]. Theorem 4.2 by Chekuri et al. in [5]. In other words. Network coding schemes achieving throughput Tc can be designed in polynomial time (solving the LP in Section 3. even if the mincut to each receiver is diﬀerent. The zk networks were introduced by Zosin and Khuller in [50]. these trees span all receivers. Throughput beneﬁts over undirected graphs were examined by Li et al. 9] for details on the Steiner tree problem formulations.

. Throughput beneﬁts that network coding can oﬀer for other types of traﬃc scenarios were examined by Rasala-Lehman and Lehman in [41]. and by Dougherty et al.5 Undirected Graphs 75 ﬁts over undirected network graphs of six Internet service providers.4. in [19]. Non-uniform demand networks were examined by Cassuto and Bruck in [13] and later in [14].

On the other hand. the multicast capacity) can be found in polynomial time. Perhaps the most important point of this chapter is that.5 Network Code Design Methods for Multicasting We now look at network code design algorithms for multicasting under the assumptions of the main network coding theorem (Theorem 2. coding solutions that achieve higher rates than routing (namely. a directed acyclic graph with unit capacity edges where the min-cut to each of the N receivers equals the number of unit-rate sources h. as we discussed in Section 3. Maximizing the multicast rate using routing is. provided we can use an alphabet of size at least as large as the number of receivers. we can design network codes in polynomial time. NP-hard. Moreover.5. a very simple and versatile randomized approach to coding allows to achieve very good performance in terms of rate. and has been considered for use in practice.2). 76 .1 This is what makes network coding appealing from a practical point of view. We assume the network multicast model as established in Section 3.1: namely. 1 Polynomial in the number of edges of the network graph.

. Step (1) is necessary in all the algorithms described in this chapter. This approach requires O(|E|2 hN ) operations.4). Recall that a conﬁguration is minimal if removing any edge would violate the min-cut condition for at least one receiver (Deﬁnition 3. such as the one given within the proof of the min-cut max-ﬂow theorem (Theorem 2.1 Common Initial Procedure Given a multicast instance {G = (V. We then present examples of both centralized and decentralized algorithms. and coding points are edges in the network where two or more incoming information symbols have to be combined (Deﬁnition 3.1 describes a brute force implementation. such as scalability to network changes. E). Step (2) is not necessary.5. the ﬁrst common steps in all the algorithms described in this chapter are to (1) ﬁnd h edge-disjoint paths {(Si .5). the associated graph G = 1≤i≤h (Si . Rj ). the set of coding points C. thus the total complexity is O(|E|hN ). such as the number of coding points and the number of employed edges by the information ﬂow. but it can signiﬁcantly reduce the required network resources. Rj ). which sequentially attempts to remove each edge and check if the min-cut condition to each receiver is still satisﬁed.1 Common Initial Procedure 77 We classify network code design algorithms based on the type of information they require as their input: we say that an algorithm is centralized if it relies on global information about the entire network structure. and discuss their various properties. and decentralized if it requires only local and no topological information. R}. To implement it we can use any max ﬂow algorithm.1). S. A ﬂow-augmenting path can be found in time O(|E|). We start this chapter by describing a common initial procedure that all algorithms execute. Algorithm 5. and 1≤j≤N (2) ﬁnd the associated minimal conﬁguration. 1 ≤ i ≤ h} from the sources to the receivers. 5.

Rj ) 1≤j≤N if (V . In that sense. R}. A given network code (choice of coding vectors) conveys to receiver Rj rate at most equal to the number of independent linear combinations that the receiver observes.78 Network Code Design Methods for Multicasting Algorithm 5. 2A topological order is simply a partial order. Intuitively.2.1: Initial Processing(G. In other words.2 5. and of whether a speciﬁc network code preserves the multicast property. Rj ) for all i. E \ {e}) satisﬁes the multicast property ∀e ∈ E E ← E \ {e} then G ← (V . S. we can (with some abuse of terminology) talk about the mincut value a receiver experiences under a given network code. and assigns coding vectors to them. The Linear Information Flow (LIF) is a greedy algorithm based exactly on the observation that the choice of coding vectors should preserve the multicast property of the network. Such an order exists for the edges of any graph G that is acyclic. the algorithm preserves h “degrees of freedom” on the paths from the sources to each receiver. . S. we can think of the network code as being part of the channel. j G = (V .1 Centralized Algorithms Linear Information Flow Algorithm Consider an instance {G. The algorithm sequentially visits the coding points in a topological order2 (thus ensuring that no coding point is visited before any of its ancestors in the graph). Each assigned coding vector preserves the multicast property for all downstream receivers up to the coding point to which it is assigned. R) Find (Si . E ) ← 1≤i≤h (Si . E ) return (G ) 5.

where the vector ei has one in position i and zero elsewhere. . the set Cj contains the source nodes e . Rj ). we need some additional notation. for all receivers Rj ∈ R(δk ): j • the point f← (δk ) in Cj with the point δk . The set Cj keeps 1 h track of the most recently visited coding point in each of the h edge disjoint paths from the source to Rj . . and replaces. as Lemma 5. j • the associated vector c(f← (δk )) in Bj with c(δk ). the set Bj contains the set of linear equations the receiver Rj needs to solve to retrieve the source symbols. The set Bj keeps the associated coding vectors. Consider a coding point δ with m ≤ h parents and a receiver Rj ∈ R(δ). Such a vector c(δk ) always exists. . δ) be the (h − 1)dimensional space spanned by the elements of Bj after removing . . and V(Rj . . and the set B {S1 2 j contains the orthonormal basis h {e1 . For each receiver Rj . eh }. e2 . Rj ) for each receiver Rj . Let C denote the set of all coding points and R(δ) the set of all receivers that employ a coding point δ in one of their paths. Preserving the multicast property amounts to ensuring that the set Bj forms a basis of the h-dimensional space Fh at all steps of the q algorithm and for all receivers Rj . . j Let f← (δ) denote the predecessor coding point to δ along this path (Si .5.2 Centralized Algorithms 79 To precisely describe the algorithm. the algorithm maintains a set Cj of h coding points. S e . .2 proves. When the algorithm terminates. At step k. S e }. Initially. Let V(δ) be the m-dimensional space spanned by the coding vectors of the parents of δ. .1. The algorithm selects the vector c(δk ) so that for every receiver j Rj ∈ R(δk ). provided that the ﬁeld Fq has size larger than N . . cj } of h coding vectors. . Lemma 5. the algorithm assigns a coding vector c(δk ) to the coding point δk . Each coding point δ appears in at most one path (Si . . the set (Bj \ {c(f← (δk ))}) ∪ c(δk ) forms a basis of the h-dimensional space. for all Rj . and a set Bj = {cj . .

δ)} = h. feasible for c(δ). Therefore. δ) contains the rest of the basis Bj . There are q m − 1 such vectors. Consider a coding point δ with m ≤ h parents and a receiver Rj ∈ R(δ). which was counted . To make the network code valid for Rj . Note that that this bound may not be tight. The LIF algorithm successfully identiﬁes a valid network code using any alphabet Fq of size q > N . and thus the number of the vectors it excludes from V(δ) is q m−1 − 1.80 Network Code Design Methods for Multicasting j c(f← (δ)). since it was obtained by assuming that the only intersection of the |R(δ)| (h − 1)-dimensional spaces V(Rj . (Here. Since the dimension of the intersection of two subspaces A and B is given by dim{A ∩ B} = dim A + dim B − dim{A ∪ B}.) Therefore. Proof. δ)} = m − 1.2. Proof. provided that q m > N q m−1 ⇔ q > N. Applying the same argument successively to all coding points concludes the proof. Then dim{V(δ) ∩ V(Rj . the number of vectors excluded by |R(δ)| receivers in δ is at most |R(δ)|q m−1 − 1 ≤ N q m−1 − 1. The dimension of this intersection is m − 1. we only need to show that dim{V(δ) ∪ V(Rj . The coding vector c(δ) has to be a non-zero vector in the m-dimensional space V(δ) spanned by the coding vectors of the parents of δ. This is true j because V(δ) contains c(f← (δ)) and V(Rj . Rj ∈ R(δ) is the all-zero vector. the all-zero vector is counted only once. δ). we can ﬁnd a valid value for c(δ). Lemma 5. δ) spanned by the elej ments of Bj after removing c(f← (δ)). c(δ) should not belong to the intersection of V(δ) and the (h − 1)-dimensional space V(Rj .

. that is orthogonal to this subspace. which we discuss in Chapter 7. Using the fact that |R(δk )| ≤ N and applying the union bound the result follows.3. Tight alphabet size bounds are known for networks with two sources and for some other special class of networks. It is possible to do so either deterministically (e. Lemma 5.. Receiver Ri ∈ R(δk ) requires that c(δk ) does not belong to the (m − 1)-dimensional space spanned by the elements of Bj \ j {c(f← (δk ))}. i.2 Centralized Algorithms 81 only once. A randomly selected coding vector c(δk ) at step k of the LIF preserves the multicast property with probability at least 1 − N/q.3 The LIF algorithm is summarized in the following Algorithm 5. or by randomly selecting a value for c(δk ).5.e. Proof.2. A ﬁnal observation is that we can eﬃciently check whether the j selected vector c(δk ) belongs to the space spanned by Bj \ {c(f← (δk ))} by keeping track of. The algorithm can be implemented to run in expected time O(|E|N h2 ). . Thus 2 repeating the experiment on the average two times per coding point we can construct a valid network code. 3 Note j that Bj \ {c(f← (δk ))} forms an hyperplane. an h − 1 dimensional subspace. The remaining point to discuss is how the algorithm identiﬁes a valid coding vector c(δk ) at step k.3. From Lemma 5. of the h-dimensional space that can be uniquely described by the associated orthogonal vector.g. Randomly selecting a vector in Fm does not satisfy q this property with probability q m−1 /q m = 1/q. if for example we choose a ﬁeld Fq of size q > 2N the probability of success for a single coding vector is at least 1 . by exhaustive search). each pair of these spaces intersects on an (h − 2)-dimensional subspace. a vector αj . and performing an inner product with. However.

Fq with q > N ) C ← coding points of {G. e e e ∀Rj ∈ R. The problem of matrix completion is to select values for the variables so that the matrix satisﬁes desired properties. 1 ≤ k ≤ |C| At step k access δk ∈ C Find c(δk ) such that c(δk ) · aj (δk ) = 0 ∀Rj ∈ R(δk ) j Cj ← (Cj \ {f← (δk )) ∪ δk Bj ← (Bj \ {c(f j (δk ))}) ∪ c(δk ) ← and 1 j ∀Rj ∈ R(δk ) αj (δk ) ← c(δ )·aj (f← (δ )) aj (f← (δk )) j k k ∀δ ∈ Cj \ δk : αj (δ) ← αj (δ) − (c(δk ) · αj (δ))αj (δk ) return (the coding vectors c(δ). . as discussed in Section 3. R}.82 Network Code Design Methods for Multicasting Algorithm 5. These are the N transfer matrices from the source to each receiver. . Construct a bipartite graph with the . The problem of selecting the variable values so that the matrix is full rank over the binary ﬁeld (if possible) can be reduced to a bipartite matching problem as follows. S2 . the desired property is that N matrices sharing common variables are simultaneously full rank. We now discuss a very nice connection between bipartite graph matching and matrix completion. e2 .2: LIF({G.2 Matrix Completion Algorithms The problem of network code design can be reduced to a problem of completing mixed matrices. aj (δ) = ej • Repeat for k steps. . Mixed matrices have entries that consist both of unknown variables and numerical values. Such ideas have been extended to deterministic algorithms for network code design. eh }. For the network code design. Sh } ∀δ ∈ Cj . . R} in topological order ∀δ ∈ C. Consider a matrix A whose every entry is either zero or an unknown variable. R(δ) ← {Rj such that δ is in a path (Si . S. . .2. Cj ← {S1 . S. Rj )} Bj ← {e1 . . . ∀δ ∈ C) 5.2.

3 5.1) . .5. j) in A. It is also easy to see that the identiﬁed matching corresponds to an assignment of binary values to the matrix entries (one for an edge used. That is. zero otherwise) such that the matrix is full rank. . 5. as the following theorem states. This algorithm requires no centralized or local information. R} with N = |R| receivers. . Identifying a perfect matching can be accomplished in polynomial time. αη . . Consider an instance {G = (V. A perfect matching is a subset of the edges such that each node of the constructed bipartite graph is adjacent to exactly one of the edges. . place an edge between left node i and right node j. . For a non-zero entry in the position (i.3 Decentralized Algorithms 83 left nodes corresponding to the rows and the right nodes the columns of matrix A. . (η < η) such that f (α1 . (5.1 Decentralized Algorithms Random Assignment The basic idea in this algorithm is to operate over a ﬁeld Fq with q large enough for even random choices of the coding vector coeﬃcients to oﬀer with high probability a valid solution.4.3. E). The probability that all N receivers can decode all h sources is at least (1 − N/q)η . The associated probability of error can be made arbitrarily small by selecting a suitably large alphabet size. maximized over all receivers. η is the number of coding points encountered in the h paths from the source node to any receiver. where the components of local coding vectors are chosen uniformly at random from a ﬁeld Fq with q > N . . Recall that a network code is valid if it assigns values to variables α1 . S. αη ) = det A1 det A2 · · · det AN = 0. as nodes in the network simply choose the non-zero components of local coding vectors independently and uniformly from Fq . where η ≤ |E| is the maximum number of coding points employed by any receiver. Theorem 5.

. For this result. . Proof. We express f in the form d f (α1 . . then the probability that f (α1 . the following lemma is suﬃcient to prove Theorem 5. αη ).84 Network Code Design Methods for Multicasting where Aj is the transfer matrix for receiver j. . αη are chosen uniformly at random from a ﬁnite ﬁeld Fq of size q > N on which f is not identically equal to zero. αη ) be a multivariate polynomial in η variables with the maximum degree in each variable of at most N . αη ) equals zero is at most 1 − (1 − N/q)η . . . . . αη ) equals zero is at most 1 − (1 − N/q)η . . We have seen in Chapters 2 and 3 that for acyclic directed networks. .5. and instead present a simple proof of a slightly weaker but more often used result which. . . αη−1 ) + f2 (α1 . . . 1 ≤ j ≤ N . . .4. . αη ) be a multivariate polynomial in variables α1 . With this observation. . . . . . We omit the proof of this lemma. If the values for α1 . Lemma 5. f is a polynomial with maximum degree in each variable at most N . Let f (α1 . . (2) For η > 1. .4. If the values for α1 . Lemma 5. . . . . a randomly chosen element from a ﬁeld Fq satisfying the conditions of the theorem will be a root of f with probability of at most N/q. αη are chosen uniformly at random from a ﬁnite ﬁeld Fq of size q > N on which f is not identically equal to zero. . the inductive hypothesis is that the claim holds for all polynomials with fewer than η variables. then the probability that f (α1 .6. . αη ) = αη f1 (α1 . . . . . . . We prove the lemma by induction in the number of variables η: (1) For η = 1. we have a polynomial in a single variable of degree at most N . . . replaces η by the total number of local coding coeﬃcients η. αη . . Since such polynomials can have at most N roots. . with maximum degree in each variable of at most N and total degree at most N η . in Theorem 5. . Let f (α1 . the following lemma is suﬃcient.

. αη−1 it becomes a polynomial in αη of degree at least d and at most N . . it is very well suited to a number of practical applications. since when we evaluate f at the values of α1 . (c) Pr[f = 0|f1 = 0] ≤ N/q. However. (b) Pr[f = 0|f1 = 0] ≤ 1.5. We next brieﬂy discuss how this approach. . We now bound the above three probabilities as follows: (a) Pr[f1 = 0] ≤ 1 − (1 − N/q)(η−1) by the inductive hypothesis. Then Pr[f = 0] = Pr[f1 = 0] · Pr[f = 0|f1 = 0] +(1 − Pr[f1 = 0]) · Pr[f = 0|f1 = 0]. . such as dynamically changing networks. . that may contain all other powers d αη except for αη . . αη from a ﬁnite ﬁeld Fq of size q > N . as we discuss in the second part of this review. and f2 is a polynomial in η variables with degree in each variable of at most N .3 Decentralized Algorithms 85 where d ≤ N . f1 is a not-identically zero polynomial in η − 1 variables. q The randomized coding approach may use a larger alphabet than necessary. network coding. can be implemented in practice. it is decentralized. . and more generally. . Assume that we uniformly at random chose values for the variables α1 . scalable and yields to a very simple implementation. . In fact. Therefore Pr[f = 0] ≤ Pr[f1 = 0] + (1 − Pr[f1 = 0]) = Pr[f1 = 0] 1 − ≤ 1− 1− N q + N q 1− N q + N q by (a) N q by (b) and (c) N (η−1) q N η =1− 1− .

Packets are combined only with other packets in the same generation. the packets are grouped into generations. To deal with the lack of synchronization. sources may produce packets asynchronously. and sets of m consecutive bits are treated as a symbol of Fq with q = 2m .6%. Moreover. if we have h = 50 sources. Note that use of the coding vector in the header incurs a small additional overhead. The size of a generation can be thought of as the number of sources h in synchronized networks: it determines the size of matrices the receivers need to invert to decode the information. The same operations are applied symbol-wise to each of the L/m symbols of the packet. and also aﬀects the delay. and often follow diﬀerent routes. They use the packet’s header information to calculate the new coding vector to append. and symbol-wise combine buﬀered packets that belong in the same generation. is to append to each packet header an h-dimensional coding vector that describes how the packet is encoded. A feasible approach to perform network coding in a distributed and asynchronous manner. which linear combination of the source packets it carries. Intermediate nodes in the network buﬀer the packets they receive. A generation number is also appended to the packet headers to make this possible (one byte is suﬃcient for this purpose). for a packet that contains 1400 bytes. then the overhead is approximately 50/1400 ≈ 3. they uniformly at random select coeﬃcients. Assume that a generation consists of h information packets. where every byte is treated as a symbol over F28 . The receivers use the appended coding vectors to decode. To create a packet to send. the . which we will call source packets. that is. It is thus diﬃcult to implement a centralized network coding algorithm. it is desirable to keep the generation size small. Since inverting an h × h matrix requires O(h3 ) operations. may get dropped. information is sent in packets.2 Network Coding in Practice: Use of Generations In practical networks.86 Network Code Design Methods for Multicasting 5. Coding amounts to linear operations over the ﬁeld Fq . On the other hand. Each packet consists of L information bits. For example. Over the Internet packets between a source–destination pair are subject to random delays.3.

the permute-and-add codes have intermediate nodes permute each of their incoming binary packets and then xor them. The codes we describe in this section reduce this complexity to O(h). Let h be the binary min-cut capacity. 5. We here only give a high-level outline of the proof. . To show that the transfer matrix from the source to each receiver is invertible.7.5. the permute-and-add code achieves rate R = h − (|E| + 1) with probability greater than 1 − 2L +log(N |E|+h ) L+1 . However.3 Permute-and-Add Codes Randomized network code requires each intermediate node to perform O(h2 ) operations for every packet it encodes. to achieve low encoding complexity. with high probability. The basic idea is an extension of the polynomial time algorithms presented in Section 5.1.3 Decentralized Algorithms 87 size of the generation aﬀects how well packets are “mixed.” and thus it is desirable to have a fairly large generation size. but restrict the random 2 selection over permutation matrices. with 2 q = 2L . Each source produces a length L binary packet. it is suﬃcient to make sure that the information crossing speciﬁc cut-sets allows to infer the source information. to generate the new packet to send. Intermediate nodes choose uniformly at random which of the L! possible permutation to apply to each of their incoming packets. full rank. As their name indicates. These codes can be thought of as performing. Indeed. intermediate nodes may receive packets destined to the same receivers but belonging to diﬀerent generations. instead of the randomized network coding over Fq described previously. For any > 0. randomized vector network coding over FL . these codes do not randomly select L × L matrices over FL . so that no cut-set is visited before its “ancestor” cut-sets. if we use a large number of small-size generations. Theorem 5. These cut-sets can be partially ordered. Experimental studies have started investigating this trade-oﬀ.2.3. The problem is then reduced to ensuring that the transfer matrix between successive cut-sets is. Each receiver solves a set of hL × hL binary equations to retrieve the information data.

and [1 αi ] for 0 ≤ i ≤ q − 2. to label the nodes of a subtree graph of a network with two sources.2) Recall that for a valid network code. [1 0].1. This is because the binary alphabet is sufﬁcient for networks with two receivers.88 Network Code Design Methods for Multicasting 5. and for a minimal conﬁguration. Codes for networks with two sources As discussed in Section 3. We here present two algorithms which assign distinct coding vectors to the nodes in the subtree graph. Codes for networks with two sources are also simple to construct.2) are linearly independent in F2 . each input to a coding point have to be multiplied by non-zero coeﬃcients. and the coding vectors of any two subtrees having a receiver in common are linearly independent. The following example gives simple decentralized algorithms for networks with two receivers. q) [0 1]. we will discuss a connection between the problem of designing codes for networks with two sources and the problem of . Since any two diﬀerent points of (5. Codes for networks with two receivers If we have a minimal conﬁguration with h sources and two receivers. In Chapter 7. (5.4 Algorithms for Certain Classes of Networks These algorithms use some high level a priori information about the class of networks the multicast conﬁguration belongs to.2.3. each coding point needs to simply xor its incoming information symbols to get a valid network code. and thus each point is in the span of any two diﬀerent q points of (5.3. Example 5. we can use the points on the projective line PG(1. both coding conditions for a valid network code are satisﬁed if each node in a subtree graph of a network is assigned a unique point of the projective line PG(1. q).2). it is suﬃcient and necessary that the coding vector associated with a subtree lies in the linear span of the coding vectors associated with its parent subtrees. Example 5.

Riu }. we will use coding vectors from the set [1 αi ] for 0 ≤ i ≤ q − 2 with q − 2 = N − 1. for example.5. we locally know which receivers it contains and which sources are associated with each receiver (at the terminal before the coding point). . Ri2 . each coding subtree contains at least one receiver node associated with S1 and at least one receiver node associated with S2 . a token will be generated for each coding vector. Let R(T . However. . . The token remains at the possession of the transmission device until the data transfer is ﬁnished. where i1 < i2 < · · · < iu be the set of receivers associated with S1 in a given subtree T . it seizes the token when it arrives at the device. In our algorithm. Note that this is not the most eﬃcient mapping as it may require alphabet size of q = N + 1. If a change in the network occurs. Using this connection we can additionally employ all graph coloring algorithms for network code design in this special case. receivers leave the multicast session. where access to the network is governed by the possession of a token. Recall that at each subtree. These algorithms are well suited to applications such as overlay and ad-hoc networks. as opposed to q = N . Each coding point (associated terminal) seizes a token and uses the associated coding vector. . tokens that are no longer in use may be released back in the network to be possibly used by other coding points. The ﬁrst method is inspired by the Carrier Sense Multiple Access (CSMA) systems. S1 ) = {Ri1 . and when a device needs to transmit data. An alternative simple way to organize a mapping from coding vectors to coding points is described below. This is because for N receivers. Such algorithms may result in codes that are more eﬃcient in terms of the alphabet size. where . We now discuss how the code design problem can be simpliﬁed at the cost of using some additional network resources. The token is circulating around the network. We choose [1 αi1 −1 ] to be the label of that subtree. This way no other subtree can be assigned the same label since the receiver Ri1 can be associated with the source S1 in at most one subtree.3 Decentralized Algorithms 89 graph coloring. those presented here are decentralized and mostly scalable to network changes. In networks with two sources.

any coding vector in the h-dimensional space is eligible as c(T ). a simple decentralized code design is possible if we are willing to use an alphabet of size |C| + h − 1 as well as additional network resources (edges and terminals) to ensure that the min-cut to each coding point be h. and then code each subnetwork separately in a possibly decentralized manner. The following example provides the natural generalization of Example 5. Thus. Update G by removing the used paths.4. we can use the |C| + h points in general position in a normal rational . for a minimal conﬁguration with h sources and |C| coding points where the min-cut to each coding point is h. Additionally. an alphabet of size |C| + h − 1 is suﬃciently large for decentralized coding. 1 ≤ j ≤ N in G Design a network code based on these paths.2 to networks with h sources. form a basis of the h-dimensional space.90 Network Code Design Methods for Multicasting we have at our disposal large graphs rather then predetermined sets of paths from sources to receivers.3 outlines this procedure. Si2 ) Find paths (Si1 . Codes that employ vectors in general position In a network with |C| coding points. and for each pair of sources (Si1 . One way to take advantage of the increased connectivity is to (if possible) partition the network G with h sources (h even) into h/2 independent two-source conﬁgurations. Rj ). Example 5. Algorithm 5. (Si2 . selecting the coding vectors to be points in general position (see Appendix) ensures that each receiver observes h linearly independent coding vectors. Rj ). Indeed.3. The basic idea is that the h coding vectors of the parents of each coding point T . More speciﬁcally. Example 5. Algorithm 5.3: Paired-Sources Code(G) Group sources into pairs.

Alternatively. not just the coding points.3. Thus we can extend the multicast network. we will use the subtree decomposition described in Section 3. Here. 5.5. even in this case. Note that the projective line PG(1. |C| + h − 1) (see Appendix) to assign h vectors to the source subtrees and |C| vectors to the coding subtrees. An advantage of decentralized codes that employ vectors in general position is that they do not have to be changed with the growth of the network as long as the network’s subtree decomposition retains the same topology regardless of the distribution of the receivers. much like today terminals contact servers. q) (see Appendix) can be thought of as a subset of the projective . because multicast without coding is possible iﬀ the min-cut toward every node of the graph is h. until they connect to h subtrees that allow them to retrieve the source information. if for example in a network with two sources. we can think of coding points as edges incident to special network terminals.4 Scalability to Network Changes A desirable characteristic of network codes is that codes do not change when receivers join or leave the network. In the above scenario. that have not only enhanced computational power. since we can have new receivers contact subtrees. without any coding/decoding changes for existing users.4 Scalability to Network Changes 91 curve in PG(h − 1. Subtree decomposition enables scalability in the described manner even when the code is not decentralized. the receivers are allowed to join the network only in a way that will not change the topology of the subtree graph. or the new subtree graph contains the original subtree graph. We can think of subtrees as “secondary sources” and allow the new receivers to connect to any h diﬀerent subtrees. multicast with network coding requires lower connectivity then multicast without coding. addition of new receivers results in new subtrees without disturbing the existing ones. the coding vectors associated with any h subtrees provide a basis of the h-dimensional space. but also enhanced connectivity. For the following discussion. In a network using decentralized coding. then the existing code can be simply extended without any coding/decoding changes for existing users. Note that.

. Codes that use the algebraic structure were designed by Koetter e and M´dard [32]. in [43]. Permute-and-add codes were recently proposed by Jaggi et al.92 Network Code Design Methods for Multicasting line PG(1. Decentralized deterministic code design was introduced by Fragouli and Soljanin in [24]. in [26]. in [43]. where Fq is an extension ﬁeld of Fq . in [28]. we can employ unused points from the projective line PG(1. and also by Sanders et al. if we need to create additional coding vectors to allocate to new subtrees. Randomized algorithms were proposed by Ho et al. q ). q ). Minimal conﬁgurations and the brute force algorithm to identify them were also introduced in [24]. while the matrix completion codes were investigated by Harvey in [25]. Lemma 5. Notes LIF. in [26]. Improved algorithms for identifying minimal conﬁgurations were constructed in [34]. and independently by Jaggi et al. in [16]. Thus. in [29] was the ﬁrst polynomial algorithms for network code design (see also [30]).5 instrumental for Theorem 5. proposed by Sanders et al. and their asynchronous implementation over practical networks using generations by Chou et al. These algorithms were later extended to include procedures that attempt to minimize the required ﬁeld size by Barbero and Ytrehus in [6].4 was proved by Ho et al.

but instead. and append the associated coding vector to the outgoing packets. and that networks have no cycles. divides information packets into generations. 6. opportunistically combine them. In fact.1 Dealing with Delay Network links typically introduce variable delay which may cause problems of synchronization in the network code operations. and discuss networks that have both cycles and delay. We ﬁrst look at how to deal with delay over acyclic graphs. and appends a “generation identiﬁcation tag” to the packet. There are two known ways to deal with networks with delay: the asynchronous approach. We will now relax these assumptions. Here network nodes 93 . Intermediate network nodes store the received packets of the same generation. does not allocate to network nodes predetermined operations. We then formally deﬁne networks with cycles. brieﬂy discussed in Chapter 5. we will see that one method to deal with cycles is by artiﬁcially introducing delay.6 Networks with Delay and Cycles For most of this review we have assumed that all nodes in the network simultaneously receive all their inputs and produce their outputs.

1. Equivalently. Intermediate nodes wait for a predetermined time interval (larger than the expected delay of the network links) to receive incoming packets before combining them. where “∗” means that nothing was received through the edge during that time slot. This second approach is better suited for networks with small variability in delay. σ1 (1). . ∗. This approach is ideally suited to dynamically changing networks. receiver R1 observes sequences y1 (D) and y2 (D) given as y1 (D) = y2 (D) D 0 3 D3 D A1 σ1 (D) . σ1 (1) + σ2 (1). For acyclic graphs. we can model the network operation by associating one unit delay D (or more) either with each edge or only with edges that are coding points.94 Networks with Delay and Cycles only need to know at which rate to inject coded packets into their outgoing links. the approach discussed in this chapter deals with the lack of synchronization by using time-windows at intermediate network nodes to absorb the variability of the network links delay.2 and assume that there is a unit delay associated with every edge of the graph. Let σ1 (t) and σ2 (t) denote the bits transmitted by the source at time t. t ≥ 1. Consider the butterﬂy network in Figure 1.} = t=1 Dt+1 σ1 (t). σ1 (3). and we use A1 to denote the transfer matrix. Assume that node C still simply xors its incoming information bits. Then through edge AD. the sequence of bits {∗. Under this scenario. ∗. σ1 (2) + σ2 (2) + · · · } = t=1 Dt+3 (σ1 (t) + σ2 (t)). the elements of the . In contrast. σ1 (4). We next discuss how to design network codes while taking into account the introduced (ﬁxed) delay. Example 6. . and through edge ED. σ2 (D) where σ1 (D) = t σ1 (t)Dt and σ2 (D) = t σ2 (t)Dt . . where large variations in delay may occur. σ1 (2). the receiver R1 gets the sequence of bits {∗.

. (2) Coding vectors have elements over Fq [D] or Fq (D). and transmit linear combinations of their past and current inputs. as we will discuss in the section about cycles. In this framework. we will be dealing with vector spaces over the ﬁeld of rational functions of polynomials in Fq [D].1 Dealing with Delay 95 transfer matrix are polynomials in D.2) still holds. ﬁxed delay only means that instead of dealing with matrices and vectors whose elements belong to Fq . More generally.. merely replacing elements of Fq with elements of Fq (D). The goal of network code design is to select coding vectors so that each receiver Rj observes an invertible transfer matrix Aj .1. belong in the ring of polynomials Fq [D] = { i αi Di .e. Receiver R1 can decode at time t + 3 the symbols σ1 (t) and σ2 (t). we now have to deal with those whose elements are polynomials in the delay operator D with coeﬃcients over Fq . D2 0 D2 1 B1 y1 (D) = D3 y2 (D) 1 0 0 1 σ1 (D) .6. αi ∈ Fq }. intermediate nodes are only allowed to linearly combine over Fq the incoming information streams. i. we can follow a very similar proof. we b(D) can still talk about full rank and invertible matrices. it can process its received vectors by multiplying with the inverse transfer matrix B1 . that is. where a matrix B is the inverse of a matrix A if AB = BA = I. The delay is introduced only by the network links. Indeed. That is.e. 6. intermediate nodes may store their incoming information streams. It is easy to see that the main theorem in network coding (Theorem 2. That is.1 Algorithms There are basically two ways to approach network code design for networks for delay: (1) Coding vectors have elements over Fq . but do not incur additional delay. elements of Fq (D) = a(D) . i. Namely. . introduce additional delay. a(D). σ2 (D) In practice. b(D) ∈ Fq [D]. b(D) = 0 .

If we associate delay only the coding points in the graph. such as the Viterbi algorithm. is operation over a small ﬁeld Fq . but where the h symbols are not necessarily produced during the same time slot. the decoding trace-back depth is one.96 Networks with Delay and Cycles The beneﬁts we may get in this case as compared to the previous approach. In the case of networks with delay. Each assigned coding vector is such that all downstream receivers can still decode h source symbols produced during the same time-slot. all algorithms described in Chapter 5 for network code design can eﬀectively still be applied. the number of states and the way they are connected. An example subtree conﬁguration is shown in Figure 6. one from each source. namely. Convolutional codes are decoded using a trellis and trellis decoding algorithms.3). Thus the complexity of decoding is determined by the complexity of the trellis diagram.2 Connection with Convolutional Codes If we associate delay with the network links. In both cases. 6.1. consider the LIF algorithm in Section 5. at each step decoding h information symbols and keeping track of one state only. . This algorithm sequentially visits the coding points in a topological order and assigns coding vectors to them.2) and the transfer matrix (3. we can think of the network as a convolutional code where memory elements are connected as dictated by the network topology.1(b).1. in trellis decoding we can proceed step by step. inputs are connected at the sources and outputs are observed at the receivers. Given that the receivers observe the convolutional code outputs with no noise. it is suﬃcient to select coding vectors so that all downstream receivers get h linear independent combinations that allow them to decode h information symbols.2. For example. This code is described by the ﬁnite-dimensional state-space equations (3. In trellis decoding terminology. such as the binary ﬁeld. we obtain a convolutional code corresponding to the subtree graph of the network.1(a) and its corresponding convolutional code in Figure 6.

we still have some freedom at the receiver to optimize for decoding complexity. we are interested in equivalent encoders. as we are not interested in error-correcting properties. Only recently. that produce the same mapping of input to output sequences. One way to reduce the decoder complexity is to identify. Typically for convolutional encoders. Two codes are strictly equivalent if they have the same mapping of input sequences to output sequences. Among all strictly equivalent encoders. with the emergence of turbo codes where the mapping from input to output sequences aﬀects the code’s performance. but only the min-cut condition for each receiver. Note that in the conventional use of convolutional codes. we are restricted by the network conﬁguration for the choice of the encoder Gi (D).1 Dealing with Delay 97 Fig. there is no need to use a diﬀerent encoder to encode and decode. the encoder that uses the smallest number of memory elements is called minimal.6. that produce the same set of output sequences. Another way to reduce the decoder complexity is to use for decoding the trellis associated with the minimal strictly equivalent encoder to Gi (D). among all encoders that satisfy the multicast condition and the constraints of a given topology. the notion of strictly equivalent encoders has become important.1 Conﬁguration with 2 sources and 5 receivers: (a) the subtree graph. (b) the corresponding convolutional encoder. Note that the minimization does not need to preserve the same set of outputs. However. . the one that has the smallest number of memory elements. given these constraints. 6. In our case. and thus the smallest number of states. Here we provide another example where this notion is useful.

On the other hand. we will see other examples showing how use of network coding may enable delay reduction. Then with network coding. When in Part II we discuss network coding applications to wireless.2 and assume that the source would like to deliver m bits to both receivers. b2 . Assume that edges AD and BF have unit delay. because a receiver may need to wait for the arrival of several coded bits before decoding.2 Optimizing for Delay Consider the following scenario. . But neither routing nor network coding may allow us to achieve the delay that a receiver would experience were it to use all the network resources by itself.98 Networks with Delay and Cycles 6. bh }. we ignore the fact that the network coding solution does not preserve the ordering of the bits. the average interarrival delay equals1 2(α − 1) + m−2(α−1) 1 α−1 2 = + m 2 m 1 For simplicity. . either routing or network coding or a combination of the two may enable us to achieve the optimal delay when multicasting. As the following example illustrates. use of network coding may increase the delay.2. A source needs to multicast h information bits (or packets). {b1 . thus reducing the delay. Deﬁne delay δij to be the number of time slots between the moment when receiver Rj successfully decodes bit bi−1 to the moment when it successfully decodes bit bi . We are interested in identifying the optimal routing and network coding strategy. while edge CE has an associated delay of α delay units. . to a set of N receivers with as low delay as possible over a network with a delay associated with traversing every link of the network. . Example 6. a delay measure of interest is the inter-arrival delay. such that the overall average inter-arrival delay j i δij Nh is minimized. Consider the butterﬂy network in Figure 1. . Note that use of network coding may allow us to achieve higher information rates. For multimedia applications.

We observe that each path (Si . the source can deliver the information in α − 1 + 2(m−α+1) 2 1α−1 3 = + m 3 3 m timeslots. for the ﬁrst α − 1 timeslots each receiver gets one uncoded bit through AD and BF . by using routing along AD and BF and timesharing along CE.6. respectively. if a < b in some path. Depending on the value of α and m. The problem of cycles has to be addressed only for networks which do not satisfy this constraint. Each diﬀerent path imposes a diﬀerent partial order on the same set of edges. the resulting delay would be 1α−1 1 + 2 2 m that outperforms both previous approaches. Indeed. we consider network topologies that have cycles. Rj ) from source Si to receiver Rj induces a partial order on the set of edges: if edge a is a child of edge b then we say that a < b. if a receiver exclusively uses the network resources.3 Dealing with Cycles In this section. On the other hand.3 Dealing with Cycles 99 timeslots. 6. which we describe below. The source node edge is the maximal element. A suﬃcient. and ﬁnally for α − 1 timeslots the receivers only receive and decode one bit per timeslot through CE. . condition for consistency is that the underlying graph G is acyclic. there does not exist a path where b < a. then for (m − 2(α − 1))/2 timeslots each receiver can successfully decode two bits per timeslot. We distinguish two cases. Finally. but not necessary. depending on whether a partial order constraint. We distinguish the graphs depending on whether the partial orders imposed by the diﬀerent paths are consistent. is satisﬁed. Consistency implies that for all pairs of edges a and b. the network coding solution might lead to either less or more delay than the routing solution.

the paths from sources S1 and S2 impose consistent partial orders on the edges of the cycle.2(a).2 Two networks with a cycle ABCD: (a) paths (S1 . we need to introduce delay in each cycle to guarantee consistent and causal information propagation around the cycle. Consider the two networks in Figure 6. whereas for the path from source S2 . When the partial orders imposed by the diﬀerent paths are not consistent. It is this set of networks we call networks with cycles. Thus. although the underlying network topology contains cycles. Example 6.2. In the network shown in Figure 6. we have AB > CD. 6. in networks with cycles. R2 ) = S2 C → CD → DA → AB → BR2 impose inconsistent partial orders to the edges of the cycle. (b) paths (S1 .100 Networks with Delay and Cycles Fig. . 6. R2 ) = S2 B → BC → CD → DA → AR2 impose consistent partial orders to the edges of the cycle. for the path from source S1 . as the following example illustrates. the line graph associated with the information ﬂows does not. R1 ) = S1 A → AB → BC → CD → DR1 and (S2 . we can treat the network as acyclic. then the network code design needs to be changed accordingly. R1 ) = S1 A → AB → BC → CD → DR1 and (S2 . When the partial orders imposed by the diﬀerent paths are consistent. respectively.2(a).1 Delay and Causality Our ﬁrst observation is that. Sources S1 and S2 use the cycle ABCD to transmit information to receivers R1 and R2 . from the network code design point of view. In the network shown in Figure 6. we have CD > AB.3.3.

A).3. recursive convolutional codes.3 A networks with a cycle ABA. as depicted in Figure 6. This gives rise to the equation xAB = xA + xBA = xA + xB + xAB ⇒ xA + xB = 0. Example 6. Consider nodes A and B that insert the independent information ﬂows xA and xB in the two-edge cycle (A. 6. Note . Unless we introduce delay in the cycle. the information ﬂow is not well deﬁned. in the convolutional code framework. The convolutional code framework enables us to naturally take into account cycles without changing the network code design: we select coding operations so that each receiver observes a full rank transfer matrix. The ﬁrst approach observes that networks with cycles correspond to systems with feedback. Consider the network coding operation at node A.3 Dealing with Cycles 101 Fig. then node A simultaneously receives xA and xBA and sends xAB .6. That is. B. 6. Similarly for node B.4. It is also easy to see that local and global coding vectors are no longer well deﬁned. which is not consistent with the assumption that xA and xB are independent. or.3. the transfer matrices are rational functions of polynomials in the delay operator. If we assume that all nodes instantaneously receive messages from incoming links and send them to their outgoing links.2 Code Design There are two ways to approach designing network codes for networks with cycles.

Every vertex k in γ (edge in the original graph) corresponds to a state variable sk . and treat each cycle separately. A straightforward algorithm is to start with the graph γ = (Si . cycles per se do not increase the number of states and the complexity of trellis decoding. We illustrate this approach through the following example.5 generalizes to arbitrary graphs with simple cycles. cycles that do not share edges with other cycles. where σi (t) is the symbol transmitted from source Si at time t. The second network code design approach attempts to remove the cycles. and trace-back depth of one. The approach in Example 6.1) . The decoding complexity mainly depends on the size of the trellis. and afterwards can be removed from the circulation by the node that introduced it.. Operations in (6.2(b).e. we can still have a feed-forward encoder by representing the cycle with a block of memory elements. Example 6.5.1) can be easily implemented by employing a block of memory elements as shown in Figure 6. Thus. i. exactly as in the case of feed-forward convolutional codes. Observe that an information source needs to be transmitted through the edges of a cycle at most once. Consider the cycle in Figure 6. As a result. This approach applies easily to networks that have simple cycles. Then the ﬂows through the edges of the cycle are AB : σ1 (t − 1) + σ2 (t − 3) BC : σ1 (t − 2) + σ2 (t − 4) CD : σ1 (t − 3) + σ2 (t − 1) DA : σ1 (t − 4) + σ2 (t − 2). Rj ) (or the subtree graph) where vertices correspond to edges in the original graph G. We need to express how the information that goes through each sk evolves with (6.4. and for simplicity assume that each edge corresponds to one memory element. a trellis decoding algorithm like Viterbi.102 Networks with Delay and Cycles that decoding can still be performed using the trellis diagram of the recursive convolutional code.

2(b). we can look at the I inputs (incoming edges) of the cycle (for example in Figure 6. t2 .4. we create Li memory elements. i. We can think of this approach as “expanding in time” when necessary.4. for every cycle. For example. Consider the network depicted in Figure 6. it is not selected by the code designer. and σD have to ﬂow through the edge r2 r3 . C. For each input. that is. Each input follows a path of length Li (in Figure 6.6. and D. All four information streams σA . an additional set of Li memory elements. an edge (in this case r2 r3 ) shared by multiple cycles.. in Figure 6. This approach does not extend to networks with overlapping cycles. σC .e.6. time.3 Dealing with Cycles 103 Fig. Through each edge of the cycle ﬂows a linear combination of a subset of these Li memory elements.4 Block representation of the cycle in Figure 6. we create an expanded matrix A. we use six additional memory elements. and t3 and four sources A. To do that. B. Example 6.5 that has a knot. The code designer can only select the coeﬃcients for the linear combinations. This subset is deﬁned by the structure of the paths and the structure of the cycle. There are three receivers t1 . thus obtaining a convolutional code with a total of 12 memory elements. σB . Using this approach. and along speciﬁc paths. we have I = 2 inputs).4. L1 = L2 = 3). Consider ﬂow xA : to reach receiver t1 it has to circulate . that contains. 6. as the following example illustrates.

assume we associate a unit delay element with each of the edges r1 r2 . inside the cycle (r2 . The resulting convolutional code is depicted in Figure 6. 6. Thus. and r3 r4 . the second approach would not work. Notes Coding in networks with cycles and delay was ﬁrst considered in the seminal papers on network coding by Ahlswede et al. Using polynomials and rational functions over the delay operator D to take delay . 6. r3 r1 . Fig.104 Networks with Delay and Cycles Fig.5 if we associate unit delay with the edges r1 r2 . r3 . and r4 has σA . and r3 r4 . [2. σA cannot be removed from the cycle. r3 r1 . To apply the ﬁrst approach.5 A network conﬁguration with multiple cycles sharing the common edge r2 r3 (knot). Since none of the nodes r2 . 36].6.6 Convolutional code corresponding to the network conﬁguration in Figure 6. r4 r2 . r3 . r4 r2 . r4 ).

The connection with convolutional codes e was observed by Fragouli and Soljanin in [23] and independently by Erez and Feder in [21]. by Koetter and M´dard in [32]. The “knot example” comes from [7].6. . A very nice tutorial on convolutional codes is [39]. in [27]. Using a partial order to distinguish between cyclic and acyclic graphs was proposed by Fragouli and Soljanin in [24] and later extended by Barbero and Ytrehus in [7]. Methods to code over cyclic networks were also examined by Ho et al. together with the algebraic approach.3 Dealing with Cycles 105 into account was proposed.

In a timeinvariant network. this phase occurs only once.7 Resources for Network Coding Now that we have learned how to design network codes and how much of throughput increase to expect in networks using network coding. (2) Operational complexity: This is the running cost of using network coding. that is. the complexity of adapting to possible changes becomes important. while in a time-varying conﬁguration. and determining the operations that the nodes of the network perform. We can distinguish the complexity of deploying network coding to the following components: (1) Set-up complexity: This is the complexity of designing the network coding scheme. We focus our discussion on resources required to linear network coding for multicasting. Coding schemes for network coding and their associated design complexity are discussed in Chapter 5. the amount of computational and 106 . which includes selecting the paths through which the information ﬂows. as this is the better understood case today. it is natural to ask how much it costs to operate such networks.

There are also costs pertinent speciﬁcally to wireless networks. Again this complexity is strongly correlated to the employed network coding scheme. We look into this issue in Section 7. To recover the source symbols. Moreover. we look at information streams as sequences of elements of some ﬁnite ﬁeld Fq . in part II of the review. which we discuss in the chapter on wireless networks. the alphabet size also aﬀects the required storage capabilities at intermediate nodes of the network. typical algorithms for multiplications/inversions over a ﬁeld of size q = 2n require O(n2 ) binary operations. and coding nodes linearly combine these sequences over Fq . The cost of ﬁnite ﬁeld arithmetics grows with the ﬁeld size. The complexity is further aﬀected by (a) The size of the ﬁnite ﬁeld over which we operate. it is in our interest to minimize the number of coding points.107 network resources required per information unit successfully delivered. each receiver needs to solve a system of h × h linear equations. which requires O(h3 ) operations over Fq if Gaussian elimination is used. from this point of view as well. Linearly combination of h information streams requires O(h2 ) ﬁnite ﬁeld operations. while the faster implemented algorithms (using the Karatsuba method for multiplication) require O(nlog 3 ) = O(n1.2. (b) The number of coding points in the network where we need to perform the encoding operations. For example. In linear network coding.1. Note that network nodes capable of combining incoming information streams are naturally more expensive than those merely capable of duplicating and forwarding incoming packets. We call this the alphabet size of the code. . and can possibly cause delay.59 ) binary operations. Therefore. We discuss alphabet size bounds in Section 7.

[0 1]. which we can select from the set (7. they cannot be assigned the same coding vector. [1 α]. Any two such coding vectors form a basis of the two-dimensional space (see Example A.2 in Appendix). we connect the corresponding .1 Bounds on Code Alphabet Size We are here interested in the maximum alphabet size required for a multicast instance {G. Vertex coloring is an assignment of colors to the vertices of a graph so that adjacent vertices have diﬀerent colors.3. . (7. the alphabet size that is suﬃcient but not necessary for all networks with h sources and N receivers. Recall that the binary alphabet is suﬃcient for networks which require only routing. We relate the problem of assigning vectors to the vertices of Γ to the problem of vertex coloring a suitably deﬁned graph Ω.1) where α is a primitive element of Fq . Γ has only source subtrees and no network coding is required). We restrict our attention to minimal subtree graphs. we use the combinatorial framework of Section 3. Thus. S. We connect two vertices in Ω with an edge when the corresponding subtrees cannot be allocated the same coding vector. Thus. we can equivalently prove that we can construct a valid code by using as coding vectors the k = q + 1 diﬀerent vectors over F2 in the set q {[1 0].108 Resources for Network Coding 7. Let Γ be a minimal subtree graph with n > 2 vertices (subtrees). . . That is. In our case. n − 2 is the number of coding subtrees (note that when n = 2. and the graph to color Ω is a graph with n vertices.1 Networks with Two Sources and N Receivers To show that an alphabet of size q is suﬃcient. the “colors” are the coding vectors. since a valid network code for a minimal subtree graph Γ directly translates to a valid network code for any subtree graph Γ that can be reduced to Γ (the code for Γ simply does not use some edges of Γ ). . each vertex corresponding to a diﬀerent subtree in Γ.1. 7. [1 αq−1 ]}. If two subtrees have a common receiver node. an alphabet size suﬃcient for all possible minimal subtree graphs is suﬃcient for all possible non-minimal subtree graphs as well. For the rest of this section. R} with h sources and N receivers.1).

1 plots Ω for an example subtree graph (which we considered in Chapter 3). parent and child subtrees cannot be assigned the same coding vector. we need not worry about this case separately since by Theorem 3. We connect the corresponding vertices in Ω with an edge which we call a ﬂow edge.6. every vertex i in Ω has degree at least 2. (2) Coding subtrees: Each coding subtree has two parents.1. the two source subtrees have exactly one child which shares a receiver with each parent.1 A subtree graph Γ and its associated graph Ω. by Theorem 3. We prove the claim separately for source and coding subtrees. These are the only independence conditions we need to impose: all other conditions follow from these and minimality. 7. Figure 7. If n > 3.5. if two subtrees have a common child. a parent and a child subtree have either a child or a receiver in common. (1) Source subtrees: If n = 3. by Theorem 3. it cannot be allocated the same Fig. the two source subtrees have at least one child which shares a receiver or a child with each parent.7. Since the conﬁguration is minimal. The receiver edges in Ω are labeled by the corresponding receivers. they cannot be assigned the same coding vector.5. Similarly. Proof. For a minimal conﬁguration with n > 2. .1 Bounds on Code Alphabet Size 109 vertices in Ω with an edge which we call receiver edge. Lemma 7. For example. However.

1 and 7. Theorem 7. where m is a nonnegative integer. (7. The chromatic number of a graph is the minimum number of colors required to color the vertices of the graph. E(Ω) ≤ N + n − 2 = N + k + m − 2.2. A graph is k-chromatic if its chromatic number is exactly k. Proof. we know that each vertex has degree at least 2.1. we can lower bound the number of edges of Ω as E(Ω) ≥ k(k − 1) + 2m . Let m − k. ( [9. Lemma 7. We are going to count the number of edges in Ω in two diﬀerent ways: (1) From Lemmas 7. and at least k vertices have degree at least k − 1. we have at most N receiver edges and at most n − 2 distinct ﬂow edges (we count parallel edges only once). Consequently. 2 (7. so that no two adjacent vertices are assigned the same color. and the corresponding vertex has degree at least two. that may be either ﬂow edges. or receiver edges. Ch.3) .2.3. There exist conﬁgurations for which it is necessary. This implies that in Ω there should exist edges between a subtree and its parents.2) (2) Since there are N receivers and n − 2 coding subtrees. 9]) Every k-chromatic graph has at least k vertices of degree at least k − 1. Deﬁnition 7. Assume that our graph Ω has n vertices and chromatic number χ(Ω) = k ≤ n. For any minimal conﬁguration with N receivers. Thus.110 Resources for Network Coding coding vector as either of its parents. the code alphabet Fq of size 2N − 7/4 + 1/2 is suﬃcient.

since such a network may contain a two-source subnetwork that requires this alphabet size.1 Bounds on Code Alphabet Size 111 From (7.2. we will construct a subtree graph that has N = k(k−1) − k + 2 receivers and the corresponding graph Ω has 2 chromatic number k. Solving for q = k − 1 we get the bound q≤ 2N − 7/4 + 1/2 . Theorem 7. Such a graph can be constructed as depicted in Figure 7.4) Equation (7.4) becomes equality. We start with a minimal subtree graph Γ that has k vertices and k − 1 receivers. i. 2 (7.3).4.e. an alphabet of size greater than N is always suﬃcient. and requires an alphabet of 2 size q = k − 1. .7. an alphabet of size suﬃcient..1. for any minimal con2N − 7/4 + 1/2 is ﬁguration with N receivers. The corresponding graph Ω has k − 2 ﬂow edges and k − 1 receiver edges. we obtain N≥ k(k − 1) − k + 2. This proves the ﬁrst claim of the theorem that. To show that there exist conﬁgurations for which an alphabet of this size is necessary.2 Based on the results of the previous section.2) and (7. a lower bound on the alphabet size a network with h sources and N receivers may require is 2N − 7/4 + 1/2 . Thus Ω 2 cannot be colored with less than k colors.4) provides a lower bound on the number of receivers we need in order to have chromatic number k. The interesting question is whether there are networks that require larger alphabet. Add k(k−1) − [(k − 2) + (k − 1)] receivers. we are going to construct a subtree graph where (7. Networks with h Sources and N Receivers 7. The corresponding subtree graph has N = k(k−1) − k + 2 receivers. so 2 that Ω becomes a complete graph with E(Ω) = k(k−1) edges. For all conﬁgurations with h sources and N receivers.

in Theorem 3. and N − 1 coding subtrees.2. where paths cannot overlap more than a certain number of times.112 Resources for Network Coding Fig. recall that in the proof of Lemma 5. We have already given two diﬀerent proofs of this result. as we saw in Theorem 3. The proof of Theorem 3.2. the proof of Lemma 3.2 does not use the fact that we only care about minimal conﬁgurations. and thus the entries of the matrices Aj are not arbitrary. N receivers. 7.5. 1 x (7. we overesti- . Similarly. Here we discuss whether this bound is tight or not. there does not exist a minimal conﬁguration with two sources and two receivers corresponding to the matrices A1 = 1 0 .2 and Lemma 5. 0 x A2 = 1 1 .1 is more general than it needs to be. there exist two minimal conﬁguration with two sources and two receivers.5) In fact. For example.2 A minimal subtree graph for a network with two sources. Therefore. and does not necessarily give a tight bound.

this is no longer the case. Let v1 and v2 in Γ act as source subtrees. v)|v ∈ V \ {v1 .1. that has as subtrees the vertices V of G.2 Bounds on the Number of Coding Points 113 mated the number of vectors that cannot be used at a certain coding point. v2 ) ∈ E. Therefore. non-minimal conﬁgurations can have a number of coding points that grows with the number of edges in the network. we do not have examples of networks where the √ alphabet size needs to be larger than O( N ). create a receiver that observes v1 and v2 in Γ. (v2 . this lemma may not give a tight alphabet size either.7. Select an edge e ∈ E that connects vertices v1 and v2 . For each e = (v1 . Eγ = {(v1 .1 that the problem of ﬁnding the minimum alphabet size can be reduced to the chromatic number problem. v2 }}. Let G = (V. We thus conclude that ﬁnding the minimum alphabet size is also an NP-hard problem. an appropriately chosen subtree graph Γ. and the coding vectors (corresponding to colors) will be the vectors in (7. 7. Eγ ).3 Complexity It is well known that the problem of ﬁnding the chromatic number of a graph is NP-hard.1). This instance will have two sources. In minimal conﬁgurations. It has also been shown that this alphabet size is suﬃcient. Here we show that the reverse reduction also holds: we can reduce the problem of coloring an arbitrary graph G with the minimum number of colors.2 Bounds on the Number of Coding Points As we discussed in Example 3. E) be the graph whose chromatic number we are seeking. In fact. . v2 ∈ V . under some regularity conditions or special network structure. We will ﬁrst select the subtrees (vertices) in Γ that act as source subtrees. It is clear that ﬁnding a coding scheme for Γ is equivalent to coloring G. up to today.1. Intuitively. for an appropriately chosen network (more precisely. that is.1. to the problem of ﬁnding a valid network code with the smallest alphabet size. We saw in Section 7. The conjecture that this is indeed the suﬃcient alphabet size for all networks is as far as we know today open. and the remaining vertices as coding subtrees. that corresponds to a family of networks). v). Create a subtree graph Γ = (Vγ = V. 7. with v1 .

N = O(∞). one can show that the number of coding points is O(h3 ). N the number of receivers. S.6. The problem of ﬁnding the minimum number of coding points is NPhard for the majority of cases.2 Networks with h Sources and N Receivers Here we restrict our attention to acyclic networks. 7. where h is the number of sources.3).114 Resources for Network Coding if we have h sources and N receivers. However.2 demonstrates a minimal subtree graph for a network with two sources and N receivers that achieves the upper bound on the maximum number of subtrees.1 Networks with Two Sources and N Receivers For networks with two sources. Then we can eﬃciently ﬁnd a feasible network code with |C| ≤ h3 N 2 . Proof. we can calculate a tight upper bound on the number of coding points using combinatorial tools. it is polynomial time provided that h = O(∞). and |C| denotes the number of coding points for the network code.6. but similar bounds exist for cyclic networks. There exist networks which have a minimal subtree decomposition that achieves this upper bound. Consider a multicast instance {G. The ﬁrst part of the claim then follows directly from Theorem 3. The bounds we give below for minimal conﬁgurations depend only on h and N . In a minimal subtree decomposition of a network with two sources and N receivers. Theorem 7. the number of coding subtrees is upperbounded by N − 1. For such minimal conﬁgurations. the paths from the sources to the receivers can only intersect in a ﬁxed number of ways. Theorem 7. R}. . and the underlying graph is acyclic.2.2.5. 7. Recall that there are exactly 2N receiver nodes. The proof idea of this theorem is to ﬁrst examine conﬁgurations with h sources and two receivers (see Figure 7. Figure 7.

3 Minimal conﬁguration with h = 4 sources and N = 2 receivers. It is easy to see that this is a minimal conﬁguration.7.2. Conﬁgurations with N receivers can be reduced to O(N 2 ) conﬁgurations with two receivers as follows. There exist instances {G. Thus the total number of coding points cannot exceed O(h3 N 2 ). We can extend this construction to conﬁgurations with N receivers by using N/2 non-overlapping receiver pairs. R} with h sources and N receivers such that |C| ≥ Ω(h2 N ). 7. S. It is polynomial time in the case where the number of sources and receivers is constant. Lemma 7. Consider the minimal subtree graph in Figure 7.2 Bounds on the Number of Coding Points 115 Fig. Given a pair of receivers.7. The construction directly extends to the case of h sources with h(h − 1)/2 coding points. . simply remove the edges/coding points corresponding to the remaining N − 2 receivers. 7. The resulting conﬁguration will have at most O(h3 ) coding points.3 Complexity The problem of minimizing the number of coding points is NP-hard for the majority of cases. and the underlying graph is acyclic. The following theorem deals with the case of integral network coding over cyclic graphs (graphs that are allowed to have cycles). We prove this lemma constructively.3 with h = 4 sources and two receivers.

R2 ). R1 ) and (S2 . If it is greater than zero. S2 . the LDP problem has no solution. Theorem 7. This can be proved by reduction to the link-disjoint path (LDP) problem in directed graphs.4 Reduction between the minimum-number-of-coding-points and link-disjoint-paths problems. If this minimum number equals zero. 7. Consider now the problem of h = 2 sources multicasting from S to R1 and R2 using the minimum number of coding points. Connect S1 and S2 to a virtual common source S.8. we have identiﬁed a solution for the LDP problem. Consider integral network coding over cyclic graphs. Let > 0 be a constant.116 Resources for Network Coding Fig.3 Coding with Limited Resources In the previous sections we looked into network coding costs in terms of (i) the number of nodes required to perform coding (which are more 1 We underline that this reduction applies for the case of integral routing. our original problem is at least equally hard. and R2 . the LDP problem asks to ﬁnd two edge disjoint paths (S1 . R1 . Finding the minimum number of coding points within any multiplicative factor or within an additive factor of |V |1− is NP-hard.1 7. and also directly connect S2 to R1 and S1 to R2 through virtual edges. Since the LDP is known to be NP-complete. Here we give the main proof idea. Given a directed graph and nodes S1 . where V is the number of vertices in the graph. .

and each receiver has mincut m ≥ h. 7. Consider a set of points X and a family F of subsets of X. if the min-cut to each receiver is m > h. the system cannot support an alphabet size large enough to accommodate all users. . if the min-cut toward some or all of the receivers is greater than the number of sources.2 For the special case when the subtree graph is bipartite (which means that the parent of coding subtrees can only be source subtrees) and we have h = 2 sources. N possible conﬁgurations {G. When the min-cut toward each user is exactly equal to the number of sources.1 Min-Cut Alphabet-Size Trade-Oﬀ We here consider networks with two sources where the processing complexity is a stronger constraint than the bandwidth. Up to now we only have preliminary results toward these directions that apply for special conﬁgurations such as the following examples. 2 Note however that several of these choices may turn out to correspond to the same minimal conﬁguration. In particular. then it is called k-colorable. and we only want to route h sources. we can show that this is indeed true by applying the following result. Equivalently. we can “select” the h paths toward each receiver that lead to the smallest alphabet size requirements. the bound in Theorem 7. but on the other hand. and min-cut. number of coding points. a smaller alphabet size would be suﬃcient. the min-cut toward each receiver is larger than the information rate that we would like to multicast. One would expect that. We now examine whether we can quantify the trade-oﬀs between alphabet size.3.7. R} to select we have a choice of m h from.3 Coding with Limited Resources 117 expensive and may cause delay) and (ii) the required code alphabet size which determines the complexity of the electronic modules performing ﬁnite ﬁeld arithmetics. If a family admits a legal coloring with k colors. S. if we have N receivers.3 gives the maximum alphabet size a network with N users may require. h sources. The intuition is that. A coloring of the points in X is legal if no element of F is monochromatic. throughput.

when we are restricted to use an alphabet of size k − 1. The above inequality shows a trade-oﬀ between the min-cut m to each user and the alphabet size k − 1 required to accommodate N receivers for the special class of conﬁgurations we examine. 7.9 cannot be directly applied. If |F| < k In our case. Theorem 7. and |F| = N . X is the set of subtrees. each element of F corresponds to the set of m subtrees observed by a receiver. the min-cut to each receiver is m. From Theorem 7. We are interested in the number of receivers which will not be able to decode both sources. Note that each receiver can observe both sources if F is colorable. However. Suppose that we can use an alphabet size k − 1 (which gives k colors).118 Resources for Network Coding Theorem 7. and the set of colors are the points on the projective line. we have h = 2 sources. F is a family of sets each of size m. Theorem 7. this holds as long as N < k m−1 . then F is k-colorable. and the min-cut to each receiver is m. For every family F whose all members have size exactly m. Therefore. (Erd¨s 1963) Let F be a family of sets each of size at o m−1 . there exists a k-coloring of its points that colors at most |F|k 1−m of the sets of F monochromatically. because in this case there are additional constraints on coloring of the elements of X coming from the requirement that each child subtree has to be assigned a vector lying in the linear span of its parents’ coding vectors.9. We obtain a bound to this number by making use of the following result.10. m is the min-cut from the sources to each receiver.2 Throughput Alphabet-Size Trade-Oﬀ Consider again the case where the subtree graph is bipartite. and we only employ routing.3. least m.9. then at most N k 1−m receivers will not . since that means that no set of subtrees observed by a receiver is monochromatic. Thus if we have |F| = N receivers. We expect a similar tradeoﬀ in the case where the graph is not bipartite as well.

we may also use an alphabet size much larger than the optimal code design might require. 2 Consider a C-node that the N th receiver is connected to. N ) conﬁgurations there exist network codes over the binary alphabet. say ω. Its label. e−1 .4) as d PN ≥ 1 − N N (p) q e −N (N )/q p . as well as their outgoing edges. N ) networks that are described in Example 4. There are N −1 p−2 such edges. p For arbitrary values of p and N . Therefore. if we use very low complexity decentralized network code design algorithms. We make this case for the speciﬁc class of zk(p. Since the number of edges going out of S into A-nodes is N N N −1 N −1 p−1 . We 2 p−1 further remove all A-nodes which have lost their connection with the source S. the only A-node that this C-node is connected to is the one with the label ω \ {N }. say.3 Coding with Limited Resources 119 be able to decode both sources. Thus. and forward the result to the C-nodes.3. The B-nodes merely sum their inputs over Fh . the number of remaining edges is p−1 − p−2 = p−1 . In other words.7. is a p-element subset of I containing N . Because of our edge removal. if we want this bound to be smaller than. the min-cut toward each receiver is larger than the information rate that we would like to multicast. the probability PN that all N receivers will be able to decode is bounded (Theorem 5. at least |F |(1 − k 1−m ) receivers will still be able to successfully decode both sources.3 Design Complexity and Alphabet-Size Trade-oﬀ Here we show that. We .2. if the alphabet size is not large enough to accommodate all users. but on the other hand. On the contrary. We will show that for the zk(p. Consequently. network coding using a binary alphabet can be achieved as follows: We ﬁrst remove the edges going out of S into those A-nodes whose labels contain N . if we perform coding by random assignment of coding vectors d over an alphabet Fq . label these edges by the h = N −1 diﬀerent basis elements of Fh . and all those inputs are diﬀerent. we need to choose q ≥ N N . all C-nodes that the N th receiver is connected to have a single input. the N th receiver observes all the sources directly. 7.

6) . and each source of the p−2 remaining N −2 as a sum of that source and some p − 1 of the sources p−1 received directly. There are N −2 such nodes. Some of the C-nodes that Rj is connected to have a single input: those are the nodes whose label contains N . Lemma 7. Consider one of these receivers. . 2.3. We examine a hybrid coding/routing scheme in which only a fraction of the nodes that are supposed to perform coding according to the scheme described in Section 7. N − 1 will have to solve a system of equations. The average throughput under this scheme T (Ak ) is given by zk T (Ak ) = zk N −p p−1 h p+ +k . . Consequently. We derive an exact expression for the average throughput in this scenario. and the N −2 labels are all p−2 (p − 1)-element subsets of I which contain j and do not contain N .120 Resources for Network Coding Each of the receivers 1.4 Throughput Number-of-Coding-Points Trade-oﬀ We show this trade-oﬀ again for the speciﬁc class of zk(p. The labels of p − 1 of these A-nodes contain j. N p h (7. and show that it increases linearly with the number of coding points.3 are actually allowed only to forward one of their inputs. Rj gets N −2 sources directly. . That label is diﬀerent for all C-nodes that the receiver Rj is connected to. and only one does not.3. N ) that are allowed to perform linear combination of their inputs (as opposed to simply forwarding one of them) is restricted to k. and they p−2 all have diﬀerent labels. Let Ak be a hybrid coding/routing scheme in which zk the number of coding points in Γzk (p. N ) networks also used in the previous section. For the rest of the proof. 7.11. it is important to note that each of these labels contains j. Let us now consider the remaining N −1 − N −2 = N −2 C-nodes that p−1 p−2 p−1 Rj is connected to. . Each of these nodes is connected to p A-nodes. say Rj .

Thus. the k coding points where coding is allowed.e. at the N p − k forwarding points. At the other extreme.6 shows that the throughput beneﬁts increase linearly with the number of coding points k. we N −2 p−2 N −1 p 1 N N −1 p−1 + (N − 1) + −k + kp . N p Theorem 7.7) by simple arithmetics. substituting k = 0.7) In the above equation. required codes over arbitrarily large sequences.3 Coding with Limited Resources 121 Proof.6) follows from (7. using network coding at all av coding points. we have • the ﬁrst term because receiver N observes all sources. Alphabets of size N · h were shown to be suﬃcient for designing codes based on the algebraic . all of its p receivers get rate 1 by binary addition of the inputs at each coding point (see the description of the coding scheme in Section 7. Notes The very ﬁrst paper on network coding by Ahlswede et al. and • the fourth term because. exactly one receiver gets rate 1. −p Substituting k = h N p in (7. p−2 −1 • the third term because. i.7).3.e.. (7. using routing only. i. we get that Tk = h as expected. we get an exact characterization of Tiav as av Tk = Tiav = N −p h p+ . • the second term because each of the remaining N − 1 receivers observes N −2 sources directly at the source nodes. If only k out of the get that av Tk = N −1 p coding points are allowed to code. at a rate of (p − 1)/(hN ). Equation (7.3)..7. a signiﬁcant number of coding points is required to achieve a constant fraction of the network coding throughput.

in [34]. That √ codes for all networks with two sources require no more than O( N ) size alphabets was proved in by Fragouli and Soljanin in [24] by observing a connection with network code design and vertex coloring.122 Resources for Network Coding approach of Koetter and M´dard [32]. The polynomial time code design e algorithms of Jaggi et al. 35]. only examples of networks for which √ alphabets of size O( N ) are necessary were demonstrated in [24. The connection with coloring was independently observed by Rasala-Lehman and Lehman in [35]. in [31]. The encoding complexity in terms of the number of coding points was examined in [24] who showed that the maximum number of coding points a network with two sources and N receivers can have equals to N − 1. in [30] operate of alphabets of size N . in [15]. Coding with limited resources was studied by Chekuri et al. It is not known whether alphabet of size O(N ) is necessary for general networks. . Bounds on the number of coding points for general networks were derived by Langberg et al. and Kim et al. Cannons and Zeger in [12]. who also showed that not only determining the number of coding points but even determining whether coding is required or not for a network is for most cases NP-hard. Algorithms for fast ﬁnite ﬁeld operations can be found in [46].

q The moment curve provides an explicit construction of q vectors in general position. . Since in network coding we only need to ensure linear independence conditions. The vectors in a set A are said to be in general position in Fh if any h vectors in A are linearly independent. the coding vectors lie in Fh . αh ] | α primitive element of Fq }. the h-dimensional vector space over q the ﬁeld Fq . we are interested in many cases in sets of vectors in general position: Deﬁnition A. It can be deﬁned as the range of the f : Fq → Fh function q f (α) = [1 α α2 .Appendix Points in General Position For networks with h sources and linear network coding over a ﬁeld Fq . 123 .1. The moment curve in Fh is the set q {[1 α α2 . Deﬁnition A.2. . . αh ]. .

xh−1 ] 2 .2. . . . Example A. and [1 αi ]. . . Any two diﬀerent vectors in this set form a basis for F2 . 1≤j<i≤h Thus. . we cannot reuse any of its q − 1 nonzero multiples. xh−1 2 2 det . . if g denotes the maximum possible number of q such vectors. we cannot expect to ﬁnd more than q + 1 vectors in general position in F2 . For each of the g vector. q In fact. Indeed. g needs to satisfy g(q − 1) + 1 ≤ q 2 ⇒ g ≤ q + 1. For q + 1 ≥ h. and thus these vectors are in general position. the following set of q + 1 points are in general position in Fh : q [1 x1 x2 1 [1 x2 x2 2 . . xh−1 ] q 1] where xi ∈ Fq and xi = xj for i = j.1. . . take any h of these q vectors. . we often use the q + 1 vectors in the set [0 1].. . Example A. [1 xq x2 q [0 0 0 . it is nonzero.. . . . [1 0]. the corresponding determinant 1 x1 x2 . We also cannot use ..124 Points in General Position The vectors in the moment curve are in general position. .. . xh−1 1 1 1 x2 x2 . . . 1 xh x2 . . that equals (xi − xj ). . . xh−1 ] 1 . (A.. for 1 ≤ i ≤ q − 1. . . . .. provided xi = xj (for all i = j). . . .. This follows from a simple counting argument. . . Indeed. xh−1 h h is a Vandermonde determinant.1) where α is a primitive element of Fq . For networks with two sources. . . . . ..

. q) ≥ h + 1. q Thus. In general. . q) = q + 1 if q = h + 1 and q is odd. g(h. q) = q + 2 if h = 3 and q is even. Hence the left-hand side. for h ≥ q. q). 1] 1] The maximum size g(h. The right-hand side counts all possible vectors in F2 . . .1). q) ≥ q + 1. . q) ≤ q + n − 1. g(6. Some bounds include the following: • • • • g(h. it holds that g(h. . and it is widely believed that g(h. we can. q) ≤ 1 + g(h. [0 1 . q) = g(7. . The following set of h + 1 points are in general position in Fh : q [1 0 . q) = q + 1 if q is even. . [1 1 . Example A. we know that g(h. q) = q + 2 if q is even and either h = 3 or h = q − 1. q) that a set of vectors in Fh in general q position can have. g(h + 1. . . for h ≤ q. Some known results include the following: • • • • • • • g(h. g(h. q) ≥ n + 1. . . q) = q + 1 if q = 11 or 13. . g(6. [0 0 . restrict our selection of coding vectors to the set (A. q) = q + 2 for even q. for networks with two sources. g(h. is not known in general. q) = q + 1 if h = 2 or if h = 3 and q is odd. . without loss of generality. q) = q + 1. g(3. q) = h + 1. g(h.3. . g(4. . . . q) = g(5. . .125 the all-zero vector. q + 1 otherwise. whereas. 0] 0] .

a k-arc is a set of k points no three of which are collinear (hence the name arc). q). most of the harder results on the maximum size of arcs have been obtained by using algebraic geometry which is also a natural tool to use for understanding the structure (i.e. λ = 0.. q) looks combinatorial in nature.e. Matrix G is a generator matrix for an MDS code of dimension q h over Fq . . Consider a matrix G with columns a set of vectors in general position in Fh . λ ∈ Fq . i. a k-arc is a set of k points any h of which form a basis for Fh .1 Projective Spaces and Arcs Vectors in general position correspond to points in arcs in projective spaces. A good survey on the size of arcs in projective spaces can be found in [3]. The projective space PG(h − 1.. q).3. arcs have been used by coding theorists in the context of MDS codes.e. The moment curve was discovered by Carath´odory in 1907 and then indee pendently by Gale in 1956. In PG(h − 1. geometry) of arcs. not all zero. under the equivalence relation given by [a1 · · · ah ] ∼ [λa1 · · · λah ]. codes that achieve the Singleton bound. q) is deﬁned as follows: Deﬁnition A. Deﬁnition A.4.126 Points in General Position A. i. As a ﬁnal note. The projective (h − 1)-space over Fq is the set of htuples of elements of Fq .. Notes Although the problem of identifying g(h. PG(2. q In a projective plane.

Acknowledgments This work was in part supported by the Swiss National Science Foundation under award No. and DIMACS under the auspices of Special focus on Computational Information Theory and Coding. 127 . NSF under award No. CCR-0325673. PP002–110483.

Notations and Acronyms Fq : ﬁnite ﬁeld with q elements h: number of sources N : number of receivers {G. E). a source vertex S ∈ V . edge through which source Si ﬂows c (e): local coding vector associated with edge e of size 1 × |In(e)| and elements over Fq c(e): global coding vector of size 1 × h and elements over Fq cxy : capacity associated with the edge (x. R}: a multicast instance comprising of a directed graph G = (V. Rj ): path from source Si to receiver Rj e Si : source node in the line graph. . σi : source Si emits symbols σi over a ﬁnite ﬁeld Fq (Si . . S. and a set R = {R1 . . y) that connects vertex x to vertex y γ: a line graph Γ: an information ﬂow decomposition graph D: delay operator In(e): set of incoming edges for edge e 128 . R2 . RN } of N receivers. .

Notations and Acronyms 129 Out(e): set of outgoing edges for edge e In(v): set of incoming edges for vertex v Out(v): set of outgoing edges for vertex v LP: Linear Program IP: Integer Program LIF: Linear Information Flow algorithm Tc : throughput achieved with network coding Tf : symmetric throughput achieved with rational routing Ti : symmetric throughput achieved with integral routing av Tf : symmetric throughput achieved with rational routing Tiav : symmetric throughput achieved with integral routing .

Ball. [5] J. P. Bollob´s. [8] B.” SIAM Journal on Discrete Mathematics. 8. [4] M.” IEEE Transactions on Information Theory. L. 2004. “Preserving and increasing local edge-connectivity in mixed graphs. and R. Texas. 130 . L. and B. A. October 2006. “Heuristic algorithms for small ﬁeld multicast encoding. Ytrehus. Amsterdam: North-Holland. “On the advantage of network coding for improving network throughput. C. and G. R. Ahlswede. 155–178. vol. [7] A.” IEEE Transaction on Information Theory. 1995. vol. W. Ali. 2002. pp. S. Nemhauser.” Joint special issue of the IEEE Transactions on Information Theory and the IEEE/ACM Transaction on Networking. San Antonio. July 2000.” IEEE Information Theory Workshop. 2795–2804. North Holland. Charikar. 52. “Network information ﬂow. Graph Theory with Applications.References [1] A. Kaneta. Bang-Jensen. “Network Models.-Y. [2] R. W. Cai. Chengdu. Agarwal and M. ´ [6] A. L. N. H. R. Li. Barbero and Ø. Magnanti. September 1995. T. S. 1649–1656. [3] A. 1994. Frank. 41. pp. Modern Graph Theory. 1204–1216. a [9] J. Murty. China.” in Handbooks in Operations Research and Management Science. A. “On the size of arcs in projective spaces.” 2006 IEEE International Symposium Information Theory (ISIT’06). Bondy and U. and H. Yeung. Springer-Verlag. vol. Ytrehus. June 2006. M. “Cycle-logical treatment for cyclopathic networks. pp. vol. J. Monma. 1979. M. Hirschfeld. Jackson. 46. pp. Barbero and Ø.

Control. Fragouli and E. [17] T. pp. vol. A. “Information ﬂow decomposition for network coding. 1956. New York. Adelaide.” IEEE Transactions on Information Theory. Control. July 2006. C. June 2006. Convex Optimization. Chicago 2004. M. and K. R. no. and C. pp. Thomas. and Computing. 2. M. and K. Soljanin. vol. vol. September 2005. Bruck. June 2004. 2005. Harvey. “A connection between network coding and convolutional codes. [24] C. March 2004. and E. 3. Shannon. 2000. Boyd and L. “On achievable information rates in single-source non-uniform demand networks. [25] N. 117–119. no. pp. A. Feder. 1956. Dougherty. vol. Elias. R. Cover and J.” Allerton Conference on Communication. Jain. “Convolutional network codes. Elements of Information Theory. Soljanin. 52. Cannons and K. March 2006. Feinstein. Cassuto and J. 777–788. Fragouli and E.” Joint Special Issue of the IEEE Transactions on Information Theory and the IEEE/ACM Transactions on Networking. 51. 2410–2424. 4413–4430. 52. Ford Jr. October 2003. [26] T. M. Soljanin. [18] R. and D. Diestel. John Wiley & Sons. C. Zeger. Chou. Illinois. [19] R. Chekuri. “Note on maximum ﬂow through a network. September 2006. Monticello. 2745–2759. Erez and M. Freiling. [20] P. IL. [22] L.” IEEE Transactions on Information Theory.” IEEE International Symposium on Information Theory (ISIT). Fragouli. pp. and D. M´dard. Cannons. [21] E. “Network coding capacity with a constrained number of coding nodes. 8. pp. C. Cambridge University Press. Freiling. “Practical network coding. vol.References 131 [10] S. “A random lino e ear network coding approach to multicast. pp. and E. “Deterministic network coding by matrix completion. 1991. [11] J. Fulkerson. and K. C. pp.” 2005 IEEE International Symposium Information Theory (ISIT 2005). [23] C. 52. Zeger. 661–666. Springer-Verlag. Vandenberghe. “Network routing capacity. pp. Ho. vol. Zeger. Y. 2. Dougherty. Graph Theory. 52. [12] J.” IEEE International Symposium on Information Theory.” Canadian Journal of Mathematics. Chekuri. Australia. October 2006. vol. “Network coding for non-uniform demands. . 829–848. March 2006. “Insuﬃciency of linear coding in network information ﬂow. and Computing Allerton Park.” IEEE Transactions on Information Theory.” Allerton Conference on Communication. E. Eﬀros. Soljanin. vol. “On average throughput beneﬁts and alphabet size for network coding. [13] Y. Karger. [15] C. August 2005.” IRE Transaction on Information Theory.” IEEE Transactions on Information Theory.” MS Thesis. K¨tter. [14] C. Shi. 399–404. Fragouli. “Maximal ﬂow through a network. 8. R. R. J. A.” IEEE International Conference on Communications (ICC). Wu. [16] P.

” IEEE Transactions on Information Theory.” Network Coding Workshop. Chicago.” IEEE Transactions on Information Theory. [37] Z. W. R. M´dard. E. pp. “Linear network coding. Li. C. 2608–2623. “Complexity classiﬁcation of network information ﬂow problems. e o and F. vol. “Low complexity encoding for network codes. [31] M. 52. K¨tter. and J. “Linear versus nonlinear boolean functions in network ﬂow. February 2003. 2004. [33] G. (V.” in Joint Special Issue on Networking and Information Theory. vol. Menger. B. vol. R. 142–150. M´dard. no. Cassuto. pp. and M. USA. Y. S. vol. M. W. [34] M. vol. North Holland.” IEEE/ACM Transaction on Networking. USA. Jain.” in Handbook of Coding Theory. R. “Polynomial time algorithms for multicast network code construction. 10. [29] S. Ahn. Lehman and E. 782–796. “Complexity classiﬁcation of network information ﬂow problems. Eﬀros. “Beyond routing: An algebraic approach to network o e coding. R. pp. Li. Riis. pp. [30] S. no. 49. Cai. and K.” SODA. 95–115. Savari. K. 2386–2397. “Zur allgemeinen Kurventheorie. [28] S. Karger. [35] A. Sprintson.” 1st Workshop on Network Coding. K¨tter and M. and L.” presented at ISIT 2003. [41] A. eds. Yeung. Rasala-Lehman and E. Ahmed. M´dard. Ratnakar. R. June 2006. R. D. Eﬀros. pp. 1927. Chou. Ho. Jaggi. [36] S. Tolhuizen. 2003. [42] S. P.” Fundamenta Mathematicae. [38] D. C. “Low complexity algebraic network multicast codes. 2005. 2004. vol.-Y. pp. “Cut sets and information ﬂow in networks of two-way channels. Langberg.” in Proceedings of 2004 IEEE International Symposium Information Theory (ISIT 2004). 1973–1982. M.” IEEE Transaction on Information Theory. S. [32] R. Huﬀman. 2006. 52. March 2004.” in Proceedings of 38th Annual Conference on Information Sciences and Systems (CISS’04). June 2006. [39] R. Jaggi. 6. P. T. “On minimizing network coding e resources: An evolutionary approach. M´dard. Ho. P. Lehman. N. IEEE Transactions on Information Theory (IT) and IEEE/ACM Transactions on Networking (TON). 11.” in Proceedings of 15th ACM Symposium on Parallel Algorithms and Architectures. K¨tter. “The encoding complexity of network coding. Tolhuizen. M. Li. “Distributed Asynchronous Algoo e rithms for Multicast Network Coding.” Joint special issue of the IEEE Transactions on Information Theory and the IEEE/ACM Transaction on Networking. and M. 371–381. June 27–July 2 2004. Kramer and S. J. McEliece. Egner. and L. Seattle.132 References [27] T. 6. “Polynomial time algorithms for network information ﬂow. Jain. and N. and M. Princeton. October 2003.). 2006. A. Sanders. NJ. vol. Lau. pp. Japan. C. Leong. “The algebraic theory of convolutional codes. “Minimum-cost multicast over coded packet networks. 52. [40] K.” in Proceedings of 2006 IEEE International Symposium Information Theory (ISIT’06). Lehman. July 2006. “On achieving optimal multicast throughput in undirected networks. Sanders. Egner. Zhao. [43] P. . October 1998. WiOpt 2005. and L. A. Eﬀros. Lun. S. 51. Jaggi. B. Chou. Yokohama. Bruck. Pless and W.” SODA. Kim.

pp. N. pp. 2004. and Z. Press. 701–717. Chou. Second Edition. A. 412–422. Schrijver.” Journal of the ACM. W. September 2003. pp. [46] J. Zhang. “A comparison of network coding and tree packing. 2. John Wiley & Sons. T. “Multilevel diversity coding with distortion. 2002. [48] R. . 41. “On directed Steiner trees. and K. 59–63. Theory of Linear and Integer Programming.” IEEE Transaction on Information Theory. P. Gerhard. vol. 241–381. Cambridge Univ. Cai. 1980.” in Proceedings of the 13th Annual ACM/SIAM Symposium on Discrete Algorithms (SODA). Yeung.References 133 [44] A. 2006. [49] R. vol. “Network coding theory: A tutorial.” ISIT 2004. R. [47] Y. Yeung. von zur Gathen and J. vol. Khuller. “Fast probabilistic algorithms for veriﬁcation of polynomial identities. Modern Computer Algebra.” Foundation and Trends in Communications and Information Theory. Schwartz. Zosin and S. June 1998. Wu. Li. [45] J.-Y. 1995. pp. [50] L. S. W. Jain. 27.

- NotesRGCN
- Slide-8a H2 and H-Infinity Control
- GRS-Global Ranger System
- IOSR Journals
- Chapter 1 Functions - Thomas Calculus 12th edition
- project 2.0
- Network Coding
- 1412.5310
- Network Coding Ppt
- Phy Layer Network Coding
- Graphs of Straight Lines
- Multiple Transformations- Order Matters Sometimes
- Knot Theory for Spatial Graph on Surface
- syllabus
- 1631_topic11
- IJETR032758
- 02-UCTI HSSN2 Assignment
- 3.2 Resource Management Approach in Heterogeneous Wireless Access Networksv1
- CS Manual Glossary of Terms.pdf
- A Constructive Approach to Solving 3D Geometric Constraint Systems Using Dependence Analysis
- Vacancy
- Graphs
- Yearly Plan Form 2 Ictl
- ComIntRep
- Kalman's Test
- Computer Information Systems
- lesson plan 2
- Evaluation_GrantT
- 3
- Data Communication

Skip carousel

- Network Engineer Resume Sample
- Report on Practical Experiment of Blockchain Technology in Japanese Domestic Interbank Payment Operation
- SV150 doubletruck 042416
- IS Encryption Guide
- Broadband for Africa
- IBM ADEPT Practictioner Perspective - Pre Publication Draft - 7 Jan 2015
- HORNET anonymity network
- The Optical World
- Telecommunications Manager Resume Sample
- The Future Internet
- Technical Aspects of Lawful Interception
- SkyLock Product Description
- Network Administrator Resume Sample
- Network Technician Resume Sample
- National Broadband Plan Presentation - Commission Meeting Slides (12-16-09)
- Whitepaper on Distributed Ledger Technology
- Senior Network Administrator Resume Sample
- Hitchhiker's Guide to the Internet by Krol, Ed
- November 12, 1990
- As NZS 4229.2-1994 Information Technology - International Standardized Profile TAnnnn - Connection-Mode Trans
- SENATE HEARING, 111TH CONGRESS - KEEPING US SAFE
- UT Dallas Syllabus for ba4323.001.11f taught by Hans-Joachim Adler (hxa026000)
- CPNI Procedure 9-15-2008
- UT Dallas Syllabus for cs4390.002 05s taught by Ravi Prakash (ravip)
- Infinite Data v. LexisNexis Document Solutions
- The Impact of Data Caps and Other Forms of Usage-Based Pricing for Broadband Access
- SENATE HEARING, 110TH CONGRESS - PHANTOM TRAFFIC
- Ubiquitous Sensor Networks
- How Cisco IT Built a Medianet to Handle Bandwidth-Intensive Traffic
- Review on variants of Power Aware AODV

Sign up to vote on this title

UsefulNot usefulClose Dialog## Are you sure?

This action might not be possible to undo. Are you sure you want to continue?

Close Dialog## This title now requires a credit

Use one of your book credits to continue reading from where you left off, or restart the preview.

Loading