Advanced Linear Programming


Dr R.A.Pendavingh September 6, 2004

Contents
Introduction
  1. Optimization
  2. Some preliminaries
  3. Linear equations
  Exercises

Part 1. Linear Optimization

Chapter 1. Linear inequalities
  1. Convex sets
  2. Linear inequalities
  3. Carathéodory’s Theorem
  Exercises
Chapter 2. Linear optimization duality
  1. The duality theorem
  2. Optimal solutions
  3. Finding the dual
  Exercises
Chapter 3. Polyhedra
  1. Polyhedra and polytopes
  2. Faces, vertices and facets
  3. Polyhedral cones
  Exercises
Chapter 4. The simplex algorithm
  1. Tableaux and pivoting
  2. Cycling and Bland’s rule
  3. The revised simplex method
  Exercises

Part 2. Integer Linear Optimization

Chapter 5. Integrality
  1. Linear diophantine equations
  2. Lattices
  3. Lattices and convex bodies
  Exercises
Chapter 6. Integer linear optimization
  1. Integer linear optimization
  2. Matching
  3. Branch & bound
  Exercises

Introduction
1. Optimization

1.1. Optimization. Optimization is choosing, from a set of alternative possibilities, the one that is ‘best’. Mathematically, optimization is the following.
Given: a set $S$ and a function $f : S \to \mathbb{R}$.
Find: an $\hat{x} \in S$ such that $f(\hat{x}) \le f(x)$ for all $x \in S$.

Here $S$ is the feasible set, an $x \in S$ is a feasible point, and $f$ is the objective function. An $\hat{x}$ as required is optimal. In this course, we will consider optimization problems where the feasible set $S$ is the set of those points $x \in \mathbb{R}^n$ that satisfy given constraints, such as inequalities $g(x) \le 0$, where $g : \mathbb{R}^n \to \mathbb{R}$, equations $h(x) = 0$, where $h : \mathbb{R}^n \to \mathbb{R}$, and possibly the requirement that $x \in \mathbb{Z}^n$.

Example: a transportation problem. We have $k$ factories producing goods and $l$ cities needing these goods. The $i$-th factory has a production capacity $c_i \in \mathbb{R}$, and the $j$-th city has a demand $d_j \in \mathbb{R}$. The cost of transporting one unit of these goods from factory $i$ to city $j$ is $p_{ij} \in \mathbb{R}$. We need to choose $x_{ij} \in \mathbb{R}$, the amount to be moved from the $i$-th factory to the $j$-th city, for each $i$ and $j$. We must choose these numbers such that
• $x_{ij} \ge 0$ for $i = 1, \dots, k$ and $j = 1, \dots, l$; the goods are transported from factories to cities, and not the other way around.
• $\sum_{j=1}^{l} x_{ij} \le c_i$ for $i = 1, \dots, k$; the total amount leaving a factory does not exceed its production capacity.
• $\sum_{i=1}^{k} x_{ij} \ge d_j$ for $j = 1, \dots, l$; the demand for each city is met.
• $\sum_{i=1}^{k} \sum_{j=1}^{l} p_{ij} x_{ij}$, the total cost of transportation, is as small as possible.
This is an optimization problem in $\mathbb{R}^{kl}$, with objective function $f : x \mapsto \sum_{i=1}^{k} \sum_{j=1}^{l} p_{ij} x_{ij}$, and with feasible set

(1) $S := \{ x \in \mathbb{R}^{kl} \mid \sum_{j=1}^{l} x_{ij} \le c_i,\ i = 1, \dots, k;\ \ \sum_{i=1}^{k} x_{ij} \ge d_j,\ j = 1, \dots, l;\ \ x_{ij} \ge 0,\ i = 1, \dots, k,\ j = 1, \dots, l \}$.
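To make the feasible set and the objective of the transportation problem concrete, the following Python sketch checks whether a given plan $x$ lies in $S$ and evaluates its cost. The function names and the data (2 factories, 2 cities) are hypothetical and chosen only for illustration; the sketch evaluates given plans and does not search for an optimal one.

```python
def is_feasible(x, cap, dem):
    """Check the three constraint families that define the feasible set S."""
    k, l = len(cap), len(dem)
    nonneg = all(x[i][j] >= 0 for i in range(k) for j in range(l))
    within_capacity = all(sum(x[i][j] for j in range(l)) <= cap[i] for i in range(k))
    meets_demand = all(sum(x[i][j] for i in range(k)) >= dem[j] for j in range(l))
    return nonneg and within_capacity and meets_demand


def total_cost(p, x):
    """Objective f(x): the sum of p_ij * x_ij over all factories i and cities j."""
    return sum(p[i][j] * x[i][j] for i in range(len(p)) for j in range(len(p[0])))


# Hypothetical data, not taken from the notes.
cap = [4, 3]            # capacities c_1, c_2
dem = [2, 4]            # demands d_1, d_2
p = [[1, 3], [2, 1]]    # unit costs p_ij

plan_a = [[2, 1], [0, 3]]   # feasible, cost 8
plan_b = [[2, 2], [0, 2]]   # feasible, cost 10
plan_c = [[2, 0], [0, 3]]   # infeasible: city 2 receives 3 < 4
```

Both `plan_a` and `plan_b` are feasible points; the objective function prefers `plan_a`.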

1.2. Linear, convex, and integer linear optimization. A linear optimization problem is an optimization problem whose objective is a linear function and whose constraints are linear equations and inequalities. The above transportation problem is a linear optimization problem. Integer linear optimization is linear optimization, but with the additional constraint that the solution be integral. This is a hard problem class, but one with many applications. The most common solution method is an application of linear optimization. Convex optimization is a broad class of problems containing linear optimization, but also quadratic and semidefinite optimization.

1.3. Applications. Whatever people do, they will at some point ask whether there is an easier way to attain the same goals, or whether a better result can be reached with the same effort. With mathematical optimization one can answer some of these questions. In this section, we outline some typical applications of the theory taught in this course.

The classical application of linear optimization is to find the most economical use of resources; the simplest example is the diet problem, where one asks for the cheapest combination of foods that make up a diet containing enough calories, minerals, vitamins, etc. The above transportation problem is of intermediate complexity: not only is a combination of sources of stuff sought for each city, but also a way to distribute the stuff produced in each factory over the cities. Slightly more complicated is the transshipment problem, where cargo has to be moved around the world by a combination of sea lines, and it is possible to move cargo from one ship to another.

A somewhat similar problem type is planning the flow of goods in a production system. An example is a chemical production system with a number of processes, each taking a certain combination of energy, manpower and chemical substances in fixed ratios, and producing new chemical substances in fixed ratios, which may be used again as input for other processes; substances are bought in the beginning and sold in the end (where waste disposal is ‘selling’ at a negative price).

Linear optimization problems can be solved on a very large scale, using efficient implementations of the method explained in this course. Problems with ten thousand variables and constraints are solved in about a minute on a 1 GHz machine with sufficient memory.

The integer linear optimization model is more suitable for all the variants of the above problems where the amounts in the solution need to be integral: when the food is canned, the stuff and cargo are transported in batches, and the intermediate products are produced in lots. This is especially so when the amounts are small, and mere rounding of the optimal linear optimization solution is not realistic. In integer linear optimization, we can also define so-called binary variables $x \in \{0, 1\}$ (as this is equivalent to ‘$0 \le x \le 1$ and $x \in \mathbb{Z}$’). Such a variable may represent an all-or-nothing choice to be made: assign this person to this job or not, schedule this task after this other task or vice versa, let this piece of road be a part of the route or not, etc. With linear constraints one can then place logical requirements on the combination of choices to be made: at least someone has to do this job, you cannot plan task A after B, B after C, and C after A, etc. So integer linear optimization has many applications in planning, rostering, scheduling, routing, and the like.

An efficient method for solving integer linear optimization problems in general is not known. The existing solution method described in this course proceeds by solving many linear optimization problems to solve one integer linear optimization problem. Since we can solve linear optimization problems very efficiently, integer linear optimization problems can be solved on a moderate scale with this method. Truly fast solution methods have been developed only for specific integer linear optimization problems. They require more advanced methods and insights than discussed in this course.

Convex optimization traditionally has applications in statistical estimation, like linear regression, where one wants to estimate the parameters of a formula so that it fits the observations best. Some more general curve fitting problems also give rise to convex optimization problems.
The field of convex optimization is going through important changes since the discovery of the interior point method (which is not covered in this course), an efficient method for solving certain conic convex optimization problems. This method cleared the way for new applications, like robust linear optimization. This is an extension of linear optimization, allowing us to take uncertainty in the coefficients defining the constraints and the objective into account, and to find a solution that is feasible and optimal for a worst-case realization of the coefficients.


2. Some preliminaries

2.1. Solving optimization problems. In optimization, we want to find

(2) $\min\{f(x) \mid x \in S\}$

given a set $S$ and a function $f : S \to \mathbb{R}$. As a mathematical expression, (2) is just the minimum of the set of reals $\{f(x) \mid x \in S\}$. As a problem, (2) can have one of several possible outcomes.
• $S = \emptyset$; then we say that (2) is infeasible.
• $\inf\{f(x) \mid x \in S\} = -\infty$; then (2) is unbounded.
• $\inf\{f(x) \mid x \in S\} > -\infty$, but $\min\{f(x) \mid x \in S\}$ does not exist.
• $\min\{f(x) \mid x \in S\}$ exists.
Solving an optimization problem is deciding which of these possibilities occurs, and in case the optimum exists, finding one optimal solution. On several occasions, we shall use the following theorem to show that an optimal solution indeed exists.

Theorem 1 (Weierstrass). If $S \subseteq \mathbb{R}^n$ is a nonempty compact set and $f : S \to \mathbb{R}$ is a continuous function, then $\min\{f(x) \mid x \in S\}$ exists.

Recall that a subset $S \subseteq \mathbb{R}^n$ is compact if and only if it is both closed and bounded.

2.2. Vectors and matrices. An $m \times n$ matrix has $m$ rows and $n$ columns. If $A$ is an $m \times n$ matrix, the entry in the $i$-th row and the $j$-th column is $a_{ij}$:

(3) $A = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix}$.

As a rule, we shall use a capital letter for a matrix and the corresponding small letter for its entries. We may also refer to the entry of $A$ in the $i$-th row and the $j$-th column as $(A)_{ij}$. It is convenient to regard vectors as matrices with either one row or one column; thus we distinguish between row vectors and column vectors. When multiplying matrices we must observe that the dimensions of the matrices are appropriate: an $m \times n$ matrix $A$ and an $n \times k$ matrix $B$ may be multiplied, the product $AB$ being an $m \times k$ matrix. A column vector

(4) $x = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}$,

being an $n \times 1$ matrix, may be multiplied on the left with any $m \times n$ matrix $A$; the result $Ax$ is a column vector in $\mathbb{R}^m$:

(5) $Ax = a_1 x_1 + \cdots + a_n x_n$.

Here the column vectors $a_1, \dots, a_n \in \mathbb{R}^m$ are the columns of $A$: $a_j := \begin{bmatrix} a_{1j} \\ \vdots \\ a_{mj} \end{bmatrix}$ for $j = 1, \dots, n$. A row vector

(6) $y = (y_1, \dots, y_m)$,

being a $1 \times m$ matrix, may be multiplied on the right with any $m \times n$ matrix $B$; the result $yB$ is a row vector in $\mathbb{R}^n$:

(7) $yB = y_1 b_1 + \cdots + y_m b_m$.


Here the row vectors $b_1, \dots, b_m \in \mathbb{R}^n$ are the rows of $B$: $b_i = (b_{i1}, \dots, b_{in})$ for $i = 1, \dots, m$. The product of a row vector $a$ and a column vector $b$ in $\mathbb{R}^n$ is either the $1 \times 1$ matrix

(8) $ab = a_1 b_1 + \cdots + a_n b_n$,

which is the inner product of $a$ and $b$, or the $n \times n$ matrix $ba$ with $(ba)_{ij} = b_i a_j$, depending on the order of multiplication. The transpose $A^t$ of an $m \times n$ matrix $A$ is the $n \times m$ matrix with entries $(A^t)_{ij} = a_{ji}$. Thus transposition turns row vectors into column vectors and vice versa. When $A$ and $B$ are two $n \times m$ matrices, then $A \le B$ means $a_{ij} \le b_{ij}$ for all $i, j$. In particular, if $a, b \in \mathbb{R}^n$ are both column vectors or both row vectors, then $a \le b$ means $a_i \le b_i$ for all $i$. The zero $0$ is a vector or matrix with all entries 0, of appropriate dimensions. So if $a \in \mathbb{R}^n$ then the $0$ in $a \le 0$ is a vector in $\mathbb{R}^n$, etc. Finally, for a vector $x \in \mathbb{R}^n$, the Euclidean norm or length of $x$ is $\|x\| := \sqrt{\sum_{i=1}^{n} x_i^2}$. The $n$-dimensional ball with center $c \in \mathbb{R}^n$ and radius $r$ is $B^n(c, r) := \{x \in \mathbb{R}^n \mid \|x - c\| \le r\}$.

2.3. Standard forms of linear optimization problems. With the above notation, we have a compact way of writing down linear inequalities and linear optimization problems. For example we can write $ax \le b$, where $a$ is a row vector, $x$ is a column vector and $b$ is a scalar, as shorthand for the linear inequality $a_1 x_1 + \cdots + a_n x_n \le b$. If $A$ is an $m \times n$ matrix and $b \in \mathbb{R}^m$ is a column vector, then $Ax \le b$ is a system of $m$ linear inequalities. A linear optimization problem is any problem of the form

(9) $\min$ or $\max\{zx \mid Px \le u,\ Qx \ge v,\ Rx = w,\ x \in \mathbb{R}^n\}$,

where $P, Q, R, u, v, w, z$ are given matrices and vectors of appropriate dimensions. It is not hard to see that if we can solve the seemingly easier problem

(10) $\max\{cx \mid Ax \le b,\ x \in \mathbb{R}^n\}$,

given any $A, b, c$, then we can also solve the general problem. Thus when proving theorems and giving algorithms we will focus on (10) or other simple problems equivalent to (9) rather than (9) itself.

2.4. Theorems of the alternative. Suppose we want to prove that a feasible point $\hat{x} \in \mathbb{R}^n$ of $\max\{cx \mid Ax \le b,\ x \in \mathbb{R}^n\}$ is optimal; then we must show that

(11) $\max\{cx \mid Ax \le b,\ x \in \mathbb{R}^n\} \le d$,

where $d := c\hat{x}$. But (11) is equivalent to:

(12) there is no $x \in \mathbb{R}^n$ such that $Ax \le b$ and $cx > d$.

How to prove a negative statement like this? There is a way to show that a system of linear equations has no solution: if no solution can be found by Gaussian elimination, then there is none. We will show that the nonexistence of a solution to a given system of linear equations is in fact equivalent to the existence of a solution to a related system of linear equations: this is Fredholm’s Alternative (Theorem 3). So we can not only prove that a system of linear equations has no solution, but we can also certify it by exhibiting a solution to the other system. There is a similar theorem for systems of linear inequalities, Farkas’ Lemma (Theorem 10). This theorem is the key to understanding linear optimization problems. In Chapter 6, we prove another theorem of this type, about the existence of an integral solution to a system of linear equations (Theorem 35). Such theorems, stating that either one system has a solution or another system has a solution, but not both, are so-called theorems of the alternative.


3. Linear equations

3.1. Gaussian elimination. A row operation on a matrix $C$ is any one of the following:
(1) exchanging two rows,
(2) multiplying a row by a nonzero scalar, or
(3) adding a scalar multiple of a row to another row.
We write $C \sim C'$ if $C'$ can be obtained from $C$ by a sequence of row operations; it is clear that $\sim$ is an equivalence relation on matrices. Note that

(13) $\begin{bmatrix} I & C \end{bmatrix} \sim \begin{bmatrix} Y & C' \end{bmatrix}$ if and only if $C' = YC$ and $\det(Y) \ne 0$.

It follows that if $C \sim C'$, then there exists a nonsingular square matrix $Y$ such that $C' = YC$. Such a matrix $Y$ can be constructed by applying the row operations that change $C$ into $C'$ to $\begin{bmatrix} I & C \end{bmatrix}$. Conversely, the effect of multiplying $C$ on the left by any nonsingular matrix $Y$ can be obtained by a series of row operations. A matrix $C$ is in row echelon form if either
(1) $C = \begin{bmatrix} 0 & D \end{bmatrix}$, where $D$ is in row echelon form, or
(2) $C = \begin{bmatrix} 1 & * \\ 0 & D \end{bmatrix}$, where $D$ is in row echelon form.
Note that it is easy to recursively find a solution to $Cx = d$ if $\begin{bmatrix} C & d \end{bmatrix}$ is in row echelon form.

Lemma 2. Let $C$ be a matrix. There is a matrix $C'$ with $C \sim C'$ such that $C'$ is in row echelon form.

Proof. Let $C$ be an $n \times m$ matrix. We prove the lemma by induction on the number of rows plus the number of columns $n + m$, the case where $n = 1$ or $m = 1$ being trivial. If the first column of $C$ has only zero entries, then we may write $C = \begin{bmatrix} 0 & D \end{bmatrix}$. By induction, there is a $D'$ such that $D \sim D'$ and $D'$ is in row echelon form, hence $C = \begin{bmatrix} 0 & D \end{bmatrix} \sim \begin{bmatrix} 0 & D' \end{bmatrix} =: C'$, with $C'$ in row echelon form. If, on the other hand, the first column of $C$ has at least one nonzero entry, then by exchanging two rows (if necessary) we obtain a matrix in which the entry in the first row and the first column is nonzero. By multiplying the first row by a suitable number, we get a matrix in which the top left entry is 1, and after subtracting a suitable multiple of the first row from each other row we see that

(14) $C \sim \begin{bmatrix} 1 & * \\ 0 & D \end{bmatrix}$.

Since by induction $D \sim D'$ with $D'$ in row echelon form, we have

(15) $C \sim \begin{bmatrix} 1 & * \\ 0 & D \end{bmatrix} \sim \begin{bmatrix} 1 & * \\ 0 & D' \end{bmatrix} =: C'$,

where $C'$ is in row echelon form.

The Gaussian elimination method to solve the matrix equation $Ax = b$ is to write the coefficients in a matrix $\begin{bmatrix} A & b \end{bmatrix}$ and use that
(1) if $\begin{bmatrix} A & b \end{bmatrix} \sim \begin{bmatrix} A' & b' \end{bmatrix}$, then $Ax = b \Leftrightarrow A'x = b'$, and
(2) if $\begin{bmatrix} A' & b' \end{bmatrix}$ is in row echelon form, either it has a row of the form $\begin{bmatrix} 0 & 1 \end{bmatrix}$, or a solution to $A'x = b'$ is easily determined.
Thus, to either find a solution to $Ax = b$, or to find out that there is no solution, it suffices to find a matrix $\begin{bmatrix} A' & b' \end{bmatrix}$ in row echelon form such that $\begin{bmatrix} A & b \end{bmatrix} \sim \begin{bmatrix} A' & b' \end{bmatrix}$, a problem that can be solved by following the steps in the proof of Lemma 2.
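The steps in the proof of Lemma 2 can be sketched in Python as follows. This is an illustrative, iterative version of the recursive argument, using exact rational arithmetic; the function name `row_echelon` and the sample matrix are assumptions made for this sketch.

```python
from fractions import Fraction


def row_echelon(rows):
    """Bring a matrix (list of rows) into row echelon form using row operations."""
    M = [[Fraction(v) for v in row] for row in rows]
    n_rows = len(M)
    n_cols = len(M[0]) if M else 0
    top = 0  # first row of the block D that still has to be reduced
    for col in range(n_cols):
        # Look for a nonzero entry in this column, at or below row `top`.
        pivot = next((r for r in range(top, n_rows) if M[r][col] != 0), None)
        if pivot is None:
            continue  # case C = [0 D]: skip the zero column
        M[top], M[pivot] = M[pivot], M[top]       # (1) exchange two rows
        lead = M[top][col]
        M[top] = [v / lead for v in M[top]]       # (2) scale so the leading entry is 1
        for r in range(top + 1, n_rows):          # (3) clear the entries below it
            factor = M[r][col]
            M[r] = [a - factor * b for a, b in zip(M[r], M[top])]
        top += 1
        if top == n_rows:
            break
    return M


# Illustrative augmented matrix [A b] with three equations in two unknowns.
reduced = row_echelon([[1, 3, 1], [2, 7, 1], [3, 10, 2]])
# reduced == [[1, 3, 1], [0, 1, -1], [0, 0, 0]]
```

Since the reduced matrix has no row of the form $\begin{bmatrix} 0 & 1 \end{bmatrix}$, a solution can be read off from the bottom up: $x_2 = -1$ and then $x_1 = 1 - 3x_2 = 4$.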


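3.2. Fredholm’s Alternative. We prove the fundamental theorem of linear algebra, also known as Fredholm’s Alternative.

Theorem 3 (Fredholm, 1903; Gauss, 1809). If $A$ is an $m \times n$ real matrix and $b \in \mathbb{R}^m$, then exactly one of the following is true:
(1) there is a column vector $x \in \mathbb{R}^n$ such that $Ax = b$, or
(2) there is a row vector $y \in \mathbb{R}^m$ such that $yA = 0$ and $yb = 1$.

Proof. Suppose (1) and (2) are both true. Choose $x, y$ such that $Ax = b$, $yA = 0$, and $yb = 1$. Then

(16) $0 = 0x = (yA)x = y(Ax) = yb = 1$,

a contradiction. Thus, at most one of (1) and (2) holds. We next show that at least one of (1) and (2) is true. Consider the matrix $\begin{bmatrix} A & b \end{bmatrix}$. There is a matrix $\begin{bmatrix} A' & b' \end{bmatrix} \sim \begin{bmatrix} A & b \end{bmatrix}$ such that $\begin{bmatrix} A' & b' \end{bmatrix}$ is in row echelon form. Since $\begin{bmatrix} A' & b' \end{bmatrix} \sim \begin{bmatrix} A & b \end{bmatrix}$, there is a nonsingular square matrix $Y$ such that $\begin{bmatrix} A' & b' \end{bmatrix} = Y \begin{bmatrix} A & b \end{bmatrix}$, i.e. such that $A' = YA$ and $b' = Yb$. As $\begin{bmatrix} A' & b' \end{bmatrix}$ is in row echelon form, either
(1’) $A'x = b'$ for some $x$, or
(2’) the $i$-th row of $\begin{bmatrix} A' & b' \end{bmatrix}$ is of the form $\begin{bmatrix} 0 & 1 \end{bmatrix}$, for some $i \in \{1, \dots, m\}$.
In the former case (1) holds, since $A'x = b'$ implies $Ax = b$. In the latter case (2) is true: take $y$ equal to the $i$-th row of $Y$, then $yA$ is the $i$-th row of $YA = A'$, which is $0$; $yb$ equals the $i$-th entry of $Yb = b'$, which is 1.

Example. Consider the matrix $A := \begin{bmatrix} 1 & 3 \\ 2 & 7 \\ 3 & 10 \end{bmatrix}$ and the vectors $b := \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix}$, $b' := \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$. To find a solution $x \in \mathbb{R}^2$ of $Ax = b$ we do Gaussian elimination on $\begin{bmatrix} A & b \end{bmatrix}$ and find that

(17) $\begin{bmatrix} A & b \end{bmatrix} = \begin{bmatrix} 1 & 3 & 1 \\ 2 & 7 & 1 \\ 3 & 10 & 2 \end{bmatrix} \sim \begin{bmatrix} 1 & 3 & 1 \\ 0 & 1 & -1 \\ 0 & 0 & 0 \end{bmatrix}$.

It is easy to read off the solution $x = \begin{bmatrix} 4 \\ -1 \end{bmatrix}$ from the coefficient matrix on the right. Seeking a solution $x \in \mathbb{R}^2$ of $Ax = b'$, we find that

(18) $\begin{bmatrix} A & b' \end{bmatrix} = \begin{bmatrix} 1 & 3 & 1 \\ 2 & 7 & 1 \\ 3 & 10 & 1 \end{bmatrix} \sim \begin{bmatrix} 1 & 3 & 1 \\ 0 & 1 & -1 \\ 0 & 0 & 1 \end{bmatrix}$.

The third row of the matrix on the right is of the form $\begin{bmatrix} 0 & 1 \end{bmatrix}$, thus there is no $x$ such that $Ax = b'$. To find a certificate $y \in \mathbb{R}^3$ for this fact as in Fredholm’s Alternative, we compute

(19) $\begin{bmatrix} I & A & b' \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 1 & 3 & 1 \\ 0 & 1 & 0 & 2 & 7 & 1 \\ 0 & 0 & 1 & 3 & 10 & 1 \end{bmatrix} \sim \begin{bmatrix} 1 & 0 & 0 & 1 & 3 & 1 \\ -2 & 1 & 0 & 0 & 1 & -1 \\ 1 & 1 & -1 & 0 & 0 & 1 \end{bmatrix}$.

From the third row of the coefficient matrix on the right, we read off $y = (1, 1, -1)$. One easily verifies that $yA = 0$ and $yb' = 1$.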
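The constructive half of the proof of Theorem 3 can be sketched in Python: row-reduce $\begin{bmatrix} A & b \end{bmatrix}$ while recording the row operations in an identity block, and return either a solution $x$ or a certificate $y$. The function name `fredholm` and the layout $\begin{bmatrix} A & b & I \end{bmatrix}$ (identity block on the right) are choices made for this sketch; free variables are set to 0.

```python
from fractions import Fraction


def fredholm(A, b):
    """Return ("x", x) with Ax = b, or ("y", y) with yA = 0 and yb = 1."""
    m, n = len(A), len(A[0])
    # Rows of [A b | I]; the identity block accumulates the matrix Y.
    M = [[Fraction(v) for v in A[i]] + [Fraction(b[i])]
         + [Fraction(int(i == j)) for j in range(m)] for i in range(m)]
    pivots = []  # (row, column) positions of the leading 1s in the A-part
    top = 0
    for col in range(n + 1):  # column n is the b-column
        pivot = next((r for r in range(top, m) if M[r][col] != 0), None)
        if pivot is None:
            continue
        M[top], M[pivot] = M[pivot], M[top]
        lead = M[top][col]
        M[top] = [v / lead for v in M[top]]
        for r in range(top + 1, m):
            factor = M[r][col]
            M[r] = [u - factor * v for u, v in zip(M[r], M[top])]
        if col == n:
            # This row of [A' b'] reads [0 ... 0 1]; its Y-part is the certificate.
            return "y", M[top][n + 1:]
        pivots.append((top, col))
        top += 1
    # No row [0 1]: back substitution, bottom row first, free variables 0.
    x = [Fraction(0)] * n
    for row, col in reversed(pivots):
        x[col] = M[row][n] - sum(M[row][j] * x[j] for j in range(col + 1, n))
    return "x", x


A = [[1, 3], [2, 7], [3, 10]]
solvable = fredholm(A, [1, 1, 2])      # ("x", [4, -1])
unsolvable = fredholm(A, [1, 1, 1])    # ("y", [1, 1, -1])
```

On the data of the example above, the sketch returns the solution $x = (4, -1)$ for $b$ and the certificate $y = (1, 1, -1)$ for $b'$.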


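3.3. Linear equations. Consider the system

(20) $\begin{array}{ccccccc} a_{11} x_1 & + & \cdots & + & a_{1n} x_n & = & b_1 \\ a_{21} x_1 & + & \cdots & + & a_{2n} x_n & = & b_2 \\ \vdots & & & & \vdots & & \vdots \\ a_{m1} x_1 & + & \cdots & + & a_{mn} x_n & = & b_m \end{array}$

of $m$ linear equations in $n$ variables $x_1, \dots, x_n$. We say that the equation

(21) $c_1 x_1 + \cdots + c_n x_n = d$

is a linear combination of the rows of (20) if there exist $y_1, \dots, y_m \in \mathbb{R}$ such that

(22) $\begin{array}{rl} y_1 \times & (a_{11} x_1 + \cdots + a_{1n} x_n = b_1) \\ y_2 \times & (a_{21} x_1 + \cdots + a_{2n} x_n = b_2) \\ \vdots & \qquad \vdots \\ y_m \times & (a_{m1} x_1 + \cdots + a_{mn} x_n = b_m) \\ \hline + & \ \ c_1 x_1 + \cdots + c_n x_n = d. \end{array}$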

Writing (20) as a matrix equation $Ax = b$, we see that Theorem 3 asserts that either
(1) the system of linear equations (20) has a solution $x_1, \dots, x_n$, or
(2) the equation (23) $0x_1 + \cdots + 0x_n = 1$ is a linear combination of rows of (20),
but not both. Since any solution $x$ of (20) will satisfy any linear combination $cx = d$ of the rows of (20), and $0x = 1$ has no solutions, it is clear that the two statements cannot both be true; the more interesting fact is that at least one of the statements is true.

3.4. Linear spaces. Recall from linear algebra that $z \in \mathbb{R}^m$ is a linear combination of vectors $a_1, \dots, a_n \in \mathbb{R}^m$ if there are scalars $\lambda_1, \dots, \lambda_n \in \mathbb{R}$ such that

(24) $z = \lambda_1 a_1 + \cdots + \lambda_n a_n$,

and that the linear hull of vectors $a_1, \dots, a_n$ is the set of all linear combinations of $a_1, \dots, a_n$:

(25) $\operatorname{lin.hull}\{a_1, \dots, a_n\} := \{\lambda_1 a_1 + \cdots + \lambda_n a_n \mid \lambda_1, \dots, \lambda_n \in \mathbb{R}\}$.

The set of all vectors orthogonal to a vector $y \in \mathbb{R}^m$ is

(26) $H_y := \{x \in \mathbb{R}^m \mid yx = 0\}$.

A set of points $H \subseteq \mathbb{R}^m$ is called a linear hyperplane if $H = H_y$ for some nonzero vector $y$. Let $A$ be the matrix with columns $a_1, \dots, a_n$. Theorem 3 is equivalent to: either
(1) $b \in \operatorname{lin.hull}\{a_1, \dots, a_n\}$, or
(2) $a_1, \dots, a_n \in H$ and $b \notin H$ for some linear hyperplane $H$,
but not both. Again, it is easy to see that the two statements exclude each other, since

(27) $a_1, \dots, a_n \in H \Rightarrow \operatorname{lin.hull}\{a_1, \dots, a_n\} \subseteq H$

for any linear hyperplane $H$.


Exercises

(1) Consider the linear optimization problem
$\max\{-x_1 + 2x_2 + x_3 \mid 2x_1 + x_2 - x_3 \le -2,\ -x_1 + 4x_2 \le 3,\ x_2 - x_3 \le 0,\ x \in \mathbb{R}^3\}$.
  (a) Rewrite this problem to a problem of the form $\max\{cx \mid Ax \le b,\ x \in \mathbb{R}^n\}$. Give $n, A, b, c$.
  (b) Rewrite this problem to a problem of the form $\min\{cx \mid Ax = b,\ x \ge 0,\ x \in \mathbb{R}^n\}$. Give $n, A, b, c$.

(2) Consider the linear optimization problem
$\max\{5x_1 + x_3 \mid -x_1 + x_2 \ge 2,\ x_1 + 4x_2 + x_3 \le 3,\ x_1, x_2, x_3 \ge 0,\ x \in \mathbb{R}^3\}$.
  (a) Rewrite this problem to a problem of the form $\max\{cx \mid Ax \le b,\ x \in \mathbb{R}^n\}$. Give $n, A, b, c$.
  (b) Rewrite this problem to a problem of the form $\min\{cx \mid Ax = b,\ x \ge 0,\ x \in \mathbb{R}^n\}$. Give $n, A, b, c$.

(3) Write down the objective and the constraints of a transportation problem (section 1) with 2 factories and 3 cities, and with transportation costs

  fact.\city    1    2    3
  1             3   12    4
  2             6    1    2

capacities

  fact.   cap.
  1       5
  2       2

and demands

  city      1   2   3
  demand    1   3   2

  (a) Rewrite this problem to a problem of the form $\max\{cx \mid Ax \le b,\ x \in \mathbb{R}^n\}$. Give $n, A, b, c$.
  (b) Rewrite this problem to a problem of the form $\min\{cx \mid Ax = b,\ x \ge 0,\ x \in \mathbb{R}^n\}$. Give $n, A, b, c$.

(4) Rewrite the general linear optimization problem (9) to problems of the form $\max\{cx \mid Ax \le b,\ x \in \mathbb{R}^n\}$ and $\min\{cx \mid Ax = b,\ x \ge 0,\ x \in \mathbb{R}^n\}$.

(5) Find, for each of the matrices $A$ and vectors $b$ below, either a column vector $x$ such that $Ax = b$ or a row vector $y$ such that $yA = 0$ and $yb \ne 0$.
  (a) $A = \begin{bmatrix} 1 & 5 & 0 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{bmatrix}$, and $b = \begin{bmatrix} 2 \\ 2 \\ 3 \end{bmatrix}$.
  (b) $A = \begin{bmatrix} 1 & 5 & 0 & 1 \\ 1 & 2 & 1 & 2 \\ 1 & 1 & 2 & 4 \end{bmatrix}$, and $b = \begin{bmatrix} 2 \\ 2 \\ 3 \end{bmatrix}$.
  (c) $A = \begin{bmatrix} -1 & 3 \\ 4 & 5 \end{bmatrix}$, $b = \begin{bmatrix} 4 \\ 0 \end{bmatrix}$.
  (d) 4 −1 3 (d) A = 4 5 , b = 0 . 1 2 2
  (e) 4 −1 3 0 1 , b = 0 . (e) A = 4 5 1 2 2 −1

(6) Let $a_1, \dots, a_n \in \mathbb{R}^m$, and let $H \subseteq \mathbb{R}^m$ be a linear hyperplane. Show that $a_1, \dots, a_n \in H \Rightarrow \operatorname{lin.hull}\{a_1, \dots, a_n\} \subseteq H$.

(7) Let $K$ be a field and let $A$ be an $m \times n$ matrix with entries in $K$ and let $b \in K^m$. Show that exactly one of the following holds:
  (a) there is a column vector $x \in K^n$ such that $Ax = b$, or
  (b) there is a row vector $y \in K^m$ such that $yA = 0$ and $yb = 1$.

(8) Let $\mathcal{A}$ be a set of subsets of $\{1, \dots, n\}$. Show that exactly one of the following holds:
  (a) there is a set $X \subseteq \{1, \dots, n\}$ such that $|X \cap A|$ is odd for all $A \in \mathcal{A}$, or
  (b) there is a set $\mathcal{B} \subseteq \mathcal{A}$ such that $|\mathcal{B}|$ is odd and $|\{B \in \mathcal{B} \mid i \in B\}|$ is even for all $i \in \{1, \dots, n\}$.

Part 1

Linear Optimization

CHAPTER 1

Linear inequalities
1. Convex sets

1.1. Definitions. Given two points $x, y \in \mathbb{R}^n$ the line segment between $x$ and $y$ is the set of points

(28) $[x, y] := \{x + \lambda(y - x) \mid \lambda \in [0, 1]\}$.

A set of points $C \subseteq \mathbb{R}^n$ is convex if $[x, y] \subseteq C$ for all $x, y \in C$ (see Figure 2). A hyperplane is a set of the form

(29) $H_{d,\delta} := \{x \in \mathbb{R}^n \mid dx = \delta\}$,

where $d \in \mathbb{R}^n$ is a nonzero row vector and $\delta \in \mathbb{R}$. A halfspace is a set of points

(30) $H^{\le}_{d,\delta} := \{x \in \mathbb{R}^n \mid dx \le \delta\}$ or $H^{\ge}_{d,\delta} := \{x \in \mathbb{R}^n \mid dx \ge \delta\}$,

where $d \in \mathbb{R}^n$ is a nonzero row vector and $\delta \in \mathbb{R}$. The common boundary of these sets is $H_{d,\delta}$. If $X, Y \subseteq \mathbb{R}^n$, then we say that $X$ and $Y$ are separable (by a hyperplane) if there is some nonzero $d \in \mathbb{R}^n$ and $\delta \in \mathbb{R}$ such that $X \subseteq H^{\le}_{d,\delta}$ and $Y \subseteq H^{\ge}_{d,\delta}$. An open halfspace is a set of points

(31) $H^{<}_{d,\delta} := \{x \in \mathbb{R}^n \mid dx < \delta\}$ or $H^{>}_{d,\delta} := \{x \in \mathbb{R}^n \mid dx > \delta\}$,

where $d \in \mathbb{R}^n$ is a nonzero row vector and $\delta \in \mathbb{R}$. If $X, Y \subseteq \mathbb{R}^n$, then we say that $X$ and $Y$ are strongly separable (by a hyperplane) if there is some nonzero $d \in \mathbb{R}^n$ and $\delta \in \mathbb{R}$ such that $X \subseteq H^{<}_{d,\delta}$ and $Y \subseteq H^{>}_{d,\delta}$.

1.2. The separation theorem. Let $Y \subseteq \mathbb{R}^n$, $x \in \mathbb{R}^n$. It is clear that if $Y$ and $\{x\}$ are strongly separable, then $x \notin Y$. For closed and convex $Y$, the converse statement holds (see Figure 2).

Theorem 4. Let $C \subseteq \mathbb{R}^n$ be a closed convex set and let $\hat{x} \in \mathbb{R}^n$. If $\hat{x} \notin C$, then $\hat{x}$ and $C$ are strongly separable by a hyperplane.

Figure 1. A line segment, and a hyperplane.

Figure 2. A convex set; a nonconvex set; a separating hyperplane; an inseparable pair.

Proof. We must show that there is a nonzero row vector $d \in \mathbb{R}^n$ and a $\delta \in \mathbb{R}$ such that $dy > \delta$ for all $y \in C$ and $d\hat{x} < \delta$. This is trivial if $C = \emptyset$, so we may assume that $C \ne \emptyset$. First we find a vector $\hat{y} \in C$ that is closest to $\hat{x}$ among all $y \in C$, i.e. such that

(32) $\|\hat{y} - \hat{x}\| = \inf\{\|y - \hat{x}\| \mid y \in C\}$.

To show that such a vector exists in $C$, choose some $\tilde{y} \in C$, and let $p := \|\tilde{y} - \hat{x}\|$. Then $C \cap B^n(\hat{x}, p)$ is a nonempty compact set and the map $y \mapsto \|y - \hat{x}\|$ is continuous, so by Theorem 1

(33) $\min\{\|y - \hat{x}\| \mid y \in C \cap B^n(\hat{x}, p)\}$

is attained by some $\hat{y} \in C \cap B^n(\hat{x}, p)$. Clearly, $\hat{y}$ is closest to $\hat{x}$ among all vectors in $C$; the vectors outside $B^n(\hat{x}, p)$ by definition have distance to $\hat{x}$ greater than $p$, and $\|\hat{y} - \hat{x}\| \le p$. We set $d := (\hat{y} - \hat{x})^t$ and $\delta := \frac{1}{2} d(\hat{x} + \hat{y})$. Since $\hat{x} \notin C$, we have $\hat{y} \ne \hat{x}$, and hence $d$ is not the zero vector. A simple calculation proves $d\hat{x} < \delta$:

(34) $d\hat{x} = d\left(\tfrac{1}{2}(\hat{x} + \hat{y}) + \tfrac{1}{2}(\hat{x} - \hat{y})\right) = \delta - \tfrac{1}{2}\|d\|^2 < \delta$,

and similarly we have $d\hat{y} > \delta$. (Geometrically, the hyperplane $H_{d,\delta}$ is orthogonal to the line segment $[\hat{x}, \hat{y}]$ and intersects it in the middle.) It remains to show that $dy > \delta$ for all $y \in C$. So let us assume that this is not true, and there exists a $y^* \in C$ such that $dy^* \le \delta$. Let $f : \mathbb{R} \to \mathbb{R}$ be defined by

(35) $f : \lambda \mapsto \|\hat{y} + \lambda(y^* - \hat{y}) - \hat{x}\|^2 = \|d\|^2 + 2\lambda\, dw + \lambda^2 \|w\|^2$,

where $w := y^* - \hat{y}$. Then $f'(\lambda) = 2(dw + \lambda \|w\|^2)$, and hence $f'(0) = 2dw = 2d(y^* - \hat{y}) < 0$. So for a small $\lambda > 0$, we have

(36) $\|\hat{y} - \hat{x}\|^2 = f(0) > f(\lambda) = \|\hat{y} + \lambda(y^* - \hat{y}) - \hat{x}\|^2$,

and moreover $\hat{y} + \lambda(y^* - \hat{y}) \in [\hat{y}, y^*] \subseteq C$, contradicting the choice of $\hat{y}$.
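The construction in this proof can be tried out on a closed convex set for which the closest point is easy to compute. The Python sketch below takes $C$ to be an axis-parallel box, an assumption made only for this illustration, so that $\hat{y}$ is obtained by clamping each coordinate of $\hat{x}$; the function names are hypothetical.

```python
def project_onto_box(x, lo, hi):
    """Closest point of the box {y : lo <= y <= hi} to x (coordinatewise clamp)."""
    return [min(max(xi, l), h) for xi, l, h in zip(x, lo, hi)]


def separating_hyperplane(x_hat, lo, hi):
    """Return (d, delta) as in the proof, or None if x_hat lies in the box."""
    y_hat = project_onto_box(x_hat, lo, hi)
    d = [yi - xi for yi, xi in zip(y_hat, x_hat)]          # d = (y_hat - x_hat)^t
    if all(di == 0 for di in d):
        return None
    # delta = (1/2) d (x_hat + y_hat): the hyperplane passes through the midpoint.
    delta = 0.5 * sum(di * (xi + yi) for di, xi, yi in zip(d, x_hat, y_hat))
    return d, delta


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


# Unit square C = [0, 1]^2 and the point x_hat = (3, 2) outside it.
x_hat = [3, 2]
d, delta = separating_hyperplane(x_hat, [0, 0], [1, 1])
corners = [[0, 0], [1, 0], [0, 1], [1, 1]]
# d == [-2, -1], delta == -5.5, dot(d, x_hat) == -8
```

Here $d\hat{x} = -8 < \delta = -5.5$, while $dy \ge -3 > \delta$ at every corner of the square. Since a linear function attains its minimum over a box at a corner, this gives $dy > \delta$ for all $y \in C$.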

Figure 3. The Lorenz cone L3; a cone generated by 5 vectors.

There are many variants of the above Separation Theorem; we mention two. Their proofs are exercises.

Theorem 5. Let $C \subseteq \mathbb{R}^n$ be a closed and convex set, and let $c$ lie on the boundary of $C$. Then there exists a $d \in \mathbb{R}^n$ and $\delta \in \mathbb{R}$ such that $dc = \delta$ and $dx \le \delta$ for all $x \in C$.

Theorem 6. Let $C, D \subseteq \mathbb{R}^n$ be closed and convex sets. If $C \cap D = \emptyset$, then $C$ and $D$ are separable by a hyperplane.

1.3. Cones. A set of points $C \subseteq \mathbb{R}^n$ is a cone if $C$ is convex and $\alpha x \in C$ for all $x \in C$ and nonnegative $\alpha \in \mathbb{R}$. For example, the $k$-dimensional Lorenz cone or ice cream cone (see Figure 3) is

(37) $L^k := \{x \in \mathbb{R}^k \mid \sqrt{x_1^2 + \cdots + x_{k-1}^2} \le x_k\}$.

It is an exercise to prove that a set $C$ is a cone if and only if $\alpha x + \beta y \in C$ for all $x, y \in C$ and nonnegative $\alpha, \beta \in \mathbb{R}$. We have a variant of the separation theorem for cones.

Theorem 7. Let $C \subseteq \mathbb{R}^n$ be a closed cone and let $x \in \mathbb{R}^n$. If $x \notin C$, then there is a $d \in \mathbb{R}^n$ such that $dy \ge 0$ for all $y \in C$ and $dx < 0$.

Proof. Suppose $x \notin C$. Since $C$ is a closed convex set, by Theorem 4 there exist a nonzero $d \in \mathbb{R}^n$ and a $\delta \in \mathbb{R}$ such that $dy > \delta$ for all $y \in C$ and $dx < \delta$. We will show that $dx < 0$, and $dy \ge 0$ for all $y \in C$. As $C$ is a closed cone, we have $0 \in C$ and hence $\delta < d0 = 0$. So $dx < \delta < 0$. Let $y \in C$. If $dy < 0$, then $\alpha := \frac{\delta}{dy} > 0$. Hence $\alpha y \in C$, but on the other hand $d(\alpha y) = \alpha(dy) = \delta$, a contradiction. Hence $dy \ge 0$.

The polar of a cone $C$ is $C^* := \{y \in \mathbb{R}^n \mid x^t y \ge 0 \text{ for all } x \in C\}$. One way to formulate the separation theorem for cones is as follows.

Corollary 7.1. If $C$ is a closed cone, then $C = C^{**}$.

Proof. Let $C$ be a closed cone. If $x \in C$, then $y^t x \ge 0$ for all $y \in C^*$, thus $x \in C^{**}$. Now suppose $\hat{x} \notin C$. Then there is a $d$ such that $dx \ge 0$ for all $x \in C$ and $d\hat{x} < 0$ by the theorem. Then $\hat{y} := d^t \in C^*$ by definition of $C^*$, and since $\hat{x}^t \hat{y} < 0$, it follows that $\hat{x} \notin C^{**}$.


2. Linear inequalities

2.1. Farkas’ Lemma. Let $a_1, \dots, a_n \in \mathbb{R}^m$ be vectors. We say that $z$ is a nonnegative combination of $a_1, \dots, a_n$ if

(38) $z = \lambda_1 a_1 + \cdots + \lambda_n a_n$

for some nonnegative $\lambda_1, \dots, \lambda_n \in \mathbb{R}$. It is an exercise to show that the set of all nonnegative combinations of a given set of vectors is a closed cone (see Figure 3). We define

(39) $\operatorname{cone}\{a_1, \dots, a_n\} := \{\lambda_1 a_1 + \cdots + \lambda_n a_n \mid \lambda_1, \dots, \lambda_n \in \mathbb{R},\ \lambda_1, \dots, \lambda_n \ge 0\}$.

Compare the definition of ‘cone’ to the definition of ‘linear hull’. We can use the separation theorem for cones to obtain a statement about nonnegative combinations of vectors; this is Farkas’ Lemma.

Theorem 8 (Farkas, 1894). Let $a_1, \dots, a_n$ and $b$ be column vectors in $\mathbb{R}^m$. Exactly one of the following statements holds:
(1) $b \in \operatorname{cone}\{a_1, \dots, a_n\}$.
(2) There is a row vector $d \in \mathbb{R}^m$ such that $da_i \ge 0$ for all $i$ and $db < 0$.

Proof. Suppose both (1) and (2) hold, so $\sum_{i=1}^{n} \lambda_i a_i = b$ for some nonnegative $\lambda_1, \dots, \lambda_n$, and there is a $d$ such that $da_1, \dots, da_n \ge 0$ and $db < 0$. Then

(40) $0 > db = d\left(\sum_{i=1}^{n} \lambda_i a_i\right) = \sum_{i=1}^{n} \lambda_i (da_i) \ge 0$,

a contradiction. So (1) and (2) cannot both be true; it remains to be shown that at least one of them holds. Suppose (1) does not hold, so $b \notin \operatorname{cone}\{a_1, \dots, a_n\}$. By the separation theorem for cones, Theorem 7, there is a vector $d \in \mathbb{R}^m$ such that $da \ge 0$ for all $a \in \operatorname{cone}\{a_1, \dots, a_n\}$ and $db < 0$. Since $a_1, \dots, a_n \in \operatorname{cone}\{a_1, \dots, a_n\}$, we have $da_i \ge 0$ for all $i$. Thus (2) holds.

2.2. An extension of Fredholm’s Alternative. We say that a vector $x$ is nonnegative, notation $x \ge 0$, if each of the entries of $x$ is nonnegative. By reformulating Farkas’ Lemma we obtain a theorem similar to Theorem 3, characterizing when the matrix equation $Ax = b$ has a nonnegative solution.

Theorem 9 (Farkas’ Lemma, variant). If $A$ is an $m \times n$ real matrix and $b \in \mathbb{R}^m$, then exactly one of the following is true:
(1) there is a column vector $x \in \mathbb{R}^n$ such that $x \ge 0$ and $Ax = b$, or
(2) there is a row vector $y \in \mathbb{R}^m$ such that $yA \ge 0$ and $yb < 0$.

Proof. Let $a_1, \dots, a_n$ be the columns of $A$, and apply Theorem 8.

In terms of systems of linear equations, this theorem states that either
(1) the system of linear equations (20) has a nonnegative solution $x$, or
(2) there is a linear combination $cx = d$ of rows of (20) with $c \ge 0$ and $d < 0$,
but not both. It is an exercise to show that Theorem 9 generalizes Fredholm’s Alternative.

2.3. Linear inequalities. The following form of Farkas’ lemma will be useful. Note that if $u$ and $v$ are vectors, the inequality sign in $u \le v$ means that $u_i \le v_i$ for each $i$.

Theorem 10 (Farkas' Lemma, variant). If $A$ is an $m \times n$ real matrix and $b \in \mathbb{R}^m$, then exactly one of the following is true:
(1) there is a column vector $x \in \mathbb{R}^n$ such that $Ax \le b$, or
(2) there is a row vector $y \in \mathbb{R}^m$ such that $y \ge 0$, $yA = 0$ and $yb < 0$.

Proof. We first show that at least one of (1) and (2) must be false. For if $x \in \mathbb{R}^n$ is such that $Ax \le b$, and $y \in \mathbb{R}^m$ is such that $y \ge 0$, $yA = 0$ and $yb < 0$, then
\[ 0 = 0x = yAx \le yb < 0, \tag{41} \]
a contradiction. To see that at least one of (1) and (2) is true, we apply Theorem 9 to the matrix $A' := [\,A \ \ {-A} \ \ I\,]$ and $b$. We find that either
(1') there exist column vectors $x', x'', x''' \ge 0$ such that $Ax' - Ax'' + Ix''' = b$, or
(2') there is a row vector $y \in \mathbb{R}^m$ such that $yA \ge 0$, $y(-A) \ge 0$, $yI \ge 0$, and $yb < 0$.
If case (1') is true, then $x = x' - x''$ satisfies $Ax \le b$, and then (1) holds. If case (2') is true, it follows that $yA = 0$, $y \ge 0$, and $yb < 0$, and then (2) holds.

Consider the system of linear inequalities
\[ \begin{array}{ccccccc}
a_{11}x_1 & + & \cdots & + & a_{1n}x_n & \le & b_1 \\
a_{21}x_1 & + & \cdots & + & a_{2n}x_n & \le & b_2 \\
\vdots & & & & \vdots & & \vdots \\
a_{m1}x_1 & + & \cdots & + & a_{mn}x_n & \le & b_m.
\end{array} \tag{42} \]

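We say that $cx \le d$ is a nonnegative combination of the rows of (42) if there are scalars $y_1, \ldots, y_m \ge 0$ such that:
\[ \begin{array}{rl}
y_1 \times & (a_{11}x_1 + \cdots + a_{1n}x_n \le b_1) \\
y_2 \times & (a_{21}x_1 + \cdots + a_{2n}x_n \le b_2) \\
\vdots & \qquad\qquad \vdots \\
y_m \times & (a_{m1}x_1 + \cdots + a_{mn}x_n \le b_m) \quad + \\ \hline
 & \phantom{(}c_1x_1 + \cdots + c_nx_n \le d.
\end{array} \tag{43} \]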
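The definition of a nonnegative combination can be illustrated with a few lines of Python. This is only a sketch: the function name `nonnegative_combination` and the example system are hypothetical and do not come from the text.

```python
from fractions import Fraction

def nonnegative_combination(A, b, y):
    """Return (c, d) such that cx <= d is the combination of the rows
    a_i x <= b_i of Ax <= b with multipliers y_i >= 0."""
    if any(yi < 0 for yi in y):
        raise ValueError("multipliers must be nonnegative")
    m, n = len(A), len(A[0])
    c = [sum(Fraction(y[i]) * A[i][j] for i in range(m)) for j in range(n)]
    d = sum(Fraction(y[i]) * b[i] for i in range(m))
    return c, d

# Hypothetical system: x1 + x2 <= 1, -x1 <= -1, -x2 <= -1.
A = [[1, 1], [-1, 0], [0, -1]]
b = [1, -1, -1]
c, d = nonnegative_combination(A, b, [1, 1, 1])
assert c == [0, 0] and d == -1
```

With the multipliers $y = (1, 1, 1)$ the combined inequality is $0x_1 + 0x_2 \le -1$, which no $x$ can satisfy, so this particular system has no solution.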

Theorem 10 says that either (1) the system of linear inequalities (42) has a solution, or (2) the inequality
\[ 0x_1 + \cdots + 0x_n \le -1 \tag{44} \]
is a nonnegative combination of the rows of (42), but not both.

2.4. Fourier-Motzkin elimination. The proof of Farkas' Lemma is not algorithmic: it does not give us a procedure to compute a vector $x$ satisfying a system of linear inequalities, like Gaussian elimination for systems of linear equations. A straightforward, but elaborate method to solve a system of linear inequalities is Fourier-Motzkin elimination. The method is not efficient as an algorithm; on the other hand it can be used in an alternative proof of Farkas' Lemma that avoids calling the Separation Theorem.

The core of Fourier-Motzkin elimination is a procedure to eliminate one variable, $x_n$ say, and obtain a system of many more linear inequalities on the remaining variables $x_1, \ldots, x_{n-1}$ that has a solution if and only if the original system had a solution. Recursively (using Fourier-Motzkin on this system of inequalities with fewer variables), we find a solution $\hat{x}_1, \ldots, \hat{x}_{n-1}$ to this system, from which we compute the missing $\hat{x}_n$. We show how to solve the system of inequalities (42).
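The recursive elimination of one variable at a time can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: the function name `fourier_motzkin`, the row format `(coefficients, right-hand side)` and the example systems are hypothetical, and exact rational arithmetic is used to avoid rounding issues.

```python
from fractions import Fraction

def fourier_motzkin(rows):
    """Solve a system of inequalities a.x <= beta by eliminating the last variable.

    rows is a list of (a, beta) pairs, a being a list of n coefficients.
    Returns a list of n Fractions satisfying every row, or None if the
    system has no solution.
    """
    rows = [([Fraction(v) for v in a], Fraction(beta)) for a, beta in rows]
    n = len(rows[0][0]) if rows else 0
    if n == 0:
        # No variables left: every row reads 0 <= beta.
        return [] if all(beta >= 0 for _, beta in rows) else None
    neg, pos, zero = [], [], []
    for a, beta in rows:
        s = abs(a[-1])
        if s == 0:
            zero.append((a[:-1], beta))
        else:
            # Scale the row so that the coefficient at x_n becomes +1 or -1.
            scaled = ([v / s for v in a[:-1]], beta / s)
            (pos if a[-1] > 0 else neg).append(scaled)
    # Add every row with coefficient -1 to every row with coefficient +1.
    reduced = zero + [([p + q for p, q in zip(ai, aj)], bi + bj)
                      for ai, bi in neg for aj, bj in pos]
    head = fourier_motzkin(reduced) if reduced else [Fraction(0)] * (n - 1)
    if head is None:
        return None
    dot = lambda a: sum(p * q for p, q in zip(a, head))
    lower = [dot(a) - beta for a, beta in neg]   # lower bounds on x_n
    upper = [beta - dot(a) for a, beta in pos]   # upper bounds on x_n
    # Any value between max(lower) and min(upper) completes the solution.
    if lower:
        last = max(lower)
    elif upper:
        last = min(upper)
    else:
        last = Fraction(0)
    return head + [last]

# Hypothetical system: x1 + x2 <= 4, -x1 <= -1, -x2 <= -1, x1 - x2 <= 0.
feasible = [([1, 1], 4), ([-1, 0], -1), ([0, -1], -1), ([1, -1], 0)]
solution = fourier_motzkin(feasible)
assert all(sum(p * q for p, q in zip(a, solution)) <= beta for a, beta in feasible)
```

The sketch always eliminates the last variable; as the text notes, a careful choice of the variable to eliminate can keep the number of generated inequalities smaller.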


Step 1. For each $i = 1, \ldots, m$, divide the $i$-th row of (42) by $|a_{in}|$ whenever $a_{in} \ne 0$. After this, in each row the coefficient at $x_n$ is $1$, $-1$ or $0$; say we have $k'$ inequalities with coefficient $-1$ at $x_n$, $k''$ with coefficient $1$ at $x_n$, and $k'''$ with coefficient $0$ at $x_n$. After reordering the rows, we get:
\[ \begin{array}{ll}
a'_i x' - x_n \le b'_i & i = 1, \ldots, k', \\
a'_i x' + x_n \le b'_i & i = k'+1, \ldots, k'+k'', \\
a'_i x' \le b'_i & i = k'+k''+1, \ldots, k'+k''+k''',
\end{array} \tag{45} \]
where $x' = (x_1, \ldots, x_{n-1})$. This system has the same solutions as (42).

Step 2. Find a solution to the system of linear inequalities
\[ \begin{array}{ll}
(a'_i + a'_j) x' \le b'_i + b'_j & i = 1, \ldots, k',\ j = k'+1, \ldots, k'+k'', \\
a'_i x' \le b'_i & i = k'+k''+1, \ldots, k'+k''+k'''.
\end{array} \tag{46} \]
Call this solution $\hat{x}' = (\hat{x}_1, \ldots, \hat{x}_{n-1})$.

Step 3. Choose $\hat{x}_n$ such that
\[ \max\{a'_i \hat{x}' - b'_i \mid i = 1, \ldots, k'\} \le \hat{x}_n \le \min\{b'_j - a'_j \hat{x}' \mid j = k'+1, \ldots, k'+k''\}. \tag{47} \]
(That $\max \le \min$ follows from the fact that $\hat{x}'$ satisfies (46).) Then $\hat{x}_1, \ldots, \hat{x}_n$ is a solution to (42).

In step 2, the number of inequalities is $k'k'' + k'''$. A careful choice of the variable to be eliminated is recommended, to keep this number small if possible. It is the rapid increase in the number of inequalities that makes this method inefficient. Note that if $k' \le 1$ or $k'' \le 1$, the number of inequalities actually decreases.

3. Carathéodory's Theorem

3.1. Carathéodory's Theorem.

Theorem 11 (Carathéodory, 1911). Let $a_1, \ldots, a_m, b \in \mathbb{R}^n$. If $b \in \mathrm{cone}\{a_1, \ldots, a_m\}$, then there is a $J \subseteq \{1, \ldots, m\}$ such that $\{a_i \mid i \in J\}$ is linearly independent and $b \in \mathrm{cone}\{a_i \mid i \in J\}$.

Proof. Choose $J \subseteq \{1, \ldots, m\}$ so that $b \in \mathrm{cone}\{a_i \mid i \in J\}$, and such that $|J|$ is as small as possible. So there exist $\lambda_i \ge 0$ for each $i \in J$ such that $\sum_{i \in J} \lambda_i a_i = b$. If $\{a_i \mid i \in J\}$ is not linearly independent, then there exist $\beta_i \in \mathbb{R}$ for $i \in J$, not all zero, such that $\sum_{i \in J} \beta_i a_i = 0$. We may assume that $\beta_i > 0$ for some $i \in J$; if not, replace each $\beta_i$ by $-\beta_i$. Let $\alpha := \min\{\lambda_i / \beta_i \mid \beta_i > 0,\ i \in J\}$, and let $\lambda'_i := \lambda_i - \alpha\beta_i$ for $i \in J$. Then $\lambda'_i \ge 0$ for all $i \in J$, and there must be some $i_0 \in J$ such that $\alpha = \lambda_{i_0} / \beta_{i_0}$, so that $\lambda'_{i_0} = 0$. Moreover,
\[ \sum_{i \in J} \lambda'_i a_i = \sum_{i \in J} (\lambda_i - \alpha\beta_i) a_i = \sum_{i \in J} \lambda_i a_i - \alpha\Big(\sum_{i \in J} \beta_i a_i\Big) = b - \alpha 0 = b. \tag{48} \]
But now $b \in \mathrm{cone}\{a_i \mid i \in J \setminus \{i_0\}\}$, contradicting the minimality of $J$. So $\{a_i \mid i \in J\}$ is linearly independent.

Corollary 11.1. Let $a_1, \ldots, a_m \in \mathbb{R}^n$ be row vectors and let $b_1, \ldots, b_m \in \mathbb{R}$. If the system of linear inequalities
\[ a_1 x \le b_1, \ \ldots, \ a_m x \le b_m \tag{49} \]
has no solution $x \in \mathbb{R}^n$, then there is a set $J \subseteq \{1, \ldots, m\}$ with at most $n + 1$ members such that the subsystem
\[ a_i x \le b_i \quad \text{for all } i \in J \tag{50} \]
has no solution $x \in \mathbb{R}^n$.


Proof. Let $a'_i := \begin{bmatrix} a_i^t \\ b_i \end{bmatrix}$ for $i = 1, \ldots, m$ and let $b' := \begin{bmatrix} 0 \\ -1 \end{bmatrix}$. If (49) has no solution, then by Farkas' Lemma (Theorem 10), we have $b' \in \mathrm{cone}\{a'_1, \ldots, a'_m\}$. Hence there is a set $J \subseteq \{1, \ldots, m\}$ such that $b' \in \mathrm{cone}\{a'_i \mid i \in J\}$, where $\{a'_i \mid i \in J\}$ is linearly independent. Since the vectors $a'_i$ lie in $\mathbb{R}^{n+1}$, it follows that $J$ has at most $n + 1$ members, and that (50) has no solutions, as required.

3.2. The fundamental theorem of linear inequalities. It is evident that Theorem 11 allows us to improve Theorem 8 slightly: we can replace '$b \in \mathrm{cone}\{a_1, \ldots, a_n\}$' by a stronger statement, stating that we need only a linearly independent subset of $\{a_1, \ldots, a_n\}$ to generate $b$. In fact, Carathéodory's Theorem can be used to replace both alternatives in any version of Farkas' Lemma by stronger statements.

Theorem 12. Let $a_1, \ldots, a_m, b \in \mathbb{R}^n$. Then exactly one of the following holds:
(1) there is a linearly independent subset $X \subseteq \{a_1, \ldots, a_m\}$ such that $b \in \mathrm{cone}\, X$; or
(2) there is a nonzero $d \in \mathbb{R}^n$ such that $da_i \ge 0$ for all $i$, $db < 0$, and $\mathrm{rank}\{a_i \mid da_i = 0\} = \mathrm{rank}\{a_1, \ldots, a_m, b\} - 1$.

Proof. By Theorem 8, either $b \in \mathrm{cone}\{a_1, \ldots, a_m\}$ or there is a $d \in \mathbb{R}^n$ such that $da_i \ge 0$ for all $i$, and $db < 0$. In the former case, (1) follows by Carathéodory's Theorem. In the latter case, choose $d$ with $da_i \ge 0$ for all $i$ and $db < 0$, such that $|\{i \mid da_i = 0\}|$ is as large as possible. Suppose $\mathrm{rank}\{a_i \mid da_i = 0\} < \mathrm{rank}\{a_1, \ldots, a_m, b\} - 1$. Then also $\mathrm{rank}(\{b\} \cup \{a_i \mid da_i = 0\}) < \mathrm{rank}\{a_1, \ldots, a_m, b\}$, hence there is some nonzero $f \in \mathrm{lin.hull}\{a_1, \ldots, a_m, b\}$ so that $f \perp \mathrm{lin.hull}(\{b\} \cup \{a_i \mid da_i = 0\})$. Then $(d + \lambda f)a_i = 0$ for all $i$ with $da_i = 0$, and $(d + \lambda f)b < 0$ for all $\lambda$. We may assume that $fa_i < 0$ for some $i$ (otherwise replace $f$ by $-f$). Let
\[ \lambda^* := \max\{\lambda \in \mathbb{R} \mid (d + \lambda f)a_i \ge 0 \text{ for all } i\}. \tag{51} \]
Then there is some $i$ such that $da_i \ne 0$ and $(d + \lambda^* f)a_i = 0$, contradicting our choice of $d$.

Although it can be done, it is a bit cumbersome to apply Theorem 11 in its current form to derive the sharper form of the second alternative, which is why we gave a direct proof. Indeed, it is hard to even recognize that the modifications of the second alternative have anything to do with Carathéodory's Theorem. It is therefore useful to take a more abstract viewpoint. Both Farkas' Lemma and Carathéodory's Theorem can be formulated in terms of linear subspaces of $\mathbb{R}^n$. As such, they are easily combined.

3.3. An abstract view. Recall that the orthogonal complement of a linear space $L \subseteq \mathbb{R}^n$ is the linear space
\[ L^\perp := \{y \in \mathbb{R}^n \mid y \perp x \text{ for all } x \in L\}. \tag{52} \]

Theorem 13 (Farkas' Lemma, variant). Let $L \subseteq \mathbb{R}^n$ be a linear space, and let $e \in \{1, \ldots, n\}$. Then exactly one of the following holds:
(1) there exists an $x \in L$ such that $x \ge 0$ and $x_e > 0$; or
(2) there exists a $y \in L^\perp$ such that $y \ge 0$ and $y_e > 0$.

Proof. We prove the theorem by induction on $n$. If $n = 1$ the theorem clearly holds. Suppose the theorem fails. Then $n > 1$, and we may assume $e \ne n$. Let $L' := \{x' \in \mathbb{R}^{n-1} \mid (x', 0) \in L\}$. Then $L'^\perp = \{y' \in \mathbb{R}^{n-1} \mid (y', t) \in L^\perp \text{ for some } t \in \mathbb{R}\}$. Since the theorem holds for $L'$, there exists either an $x' \in L'$ so that $x' \ge 0$ and $x'_e > 0$ or a $y' \in L'^\perp$ so that $y' \ge 0$ and $y'_e > 0$. In the former case $(x', 0) \in L$ satisfies (1), and in the latter case, let $t$ be such that $(y', t) \in L^\perp$. Then $(y', t)$ satisfies (2) unless $t < 0$. So let $y := (y', t)$.

Let $L'' := \{x'' \in \mathbb{R}^{n-1} \mid (x'', s) \in L \text{ for some } s \in \mathbb{R}\}$. Then $L''^\perp = \{y'' \in \mathbb{R}^{n-1} \mid (y'', 0) \in L^\perp\}$. Since the theorem holds for $L''$, there exists either an $x'' \in L''$ so that $x'' \ge 0$ and $x''_e > 0$ or a $y'' \in L''^\perp$ so that $y'' \ge 0$ and $y''_e > 0$. In the latter case $(y'', 0) \in L^\perp$ satisfies (2), and in the former case, let $s \in \mathbb{R}$ be such that $(x'', s) \in L$. Then $(x'', s)$ satisfies (1) unless $s < 0$. Let $x := (x'', s)$. We now have the contradiction $0 = y^t x = y'^t x'' + st \ge y'_e x''_e + st > 0$.

Note that we could also have derived this version of Farkas' Lemma from the previous versions; this is an exercise. Apply the theorem to $L := \{x \in \mathbb{R}^{n+1} \mid x_1 a_1 + \cdots + x_n a_n = x_{n+1} b\}$ to derive Theorem 8, to $L := \{x \in \mathbb{R}^{n+1} \mid [A \mid -b]x = 0\}$ for a proof of Theorem 9, and to $L := \{[bs - Ax \mid s] \in \mathbb{R}^{m+1} \mid x \in \mathbb{R}^n, s \in \mathbb{R}\}$ for a proof of Theorem 10.

The support of a vector $x \in \mathbb{R}^n$ is
\[ \mathrm{supp}(x) := \{i \in \{1, \ldots, n\} \mid x_i \ne 0\}. \tag{53} \]
If $L \subseteq \mathbb{R}^n$ is a linear space, then $X \subseteq \{1, \ldots, n\}$ is a dependent set of $L$ if there exists an $x \in L$ such that $\mathrm{supp}(x) = X$, and an independent set otherwise. A circuit is an inclusionwise minimal dependent set and a basis is an inclusionwise maximal independent set. A dependent set $D$ of $L$ is positive if there exists some $x \in L$ such that $x \ge 0$ and $\mathrm{supp}(x) = D$.

Theorem 14 (Carathéodory's Theorem, variant). Let $L \subseteq \mathbb{R}^n$ be a linear space, and let $e \in \{1, \ldots, n\}$. If $D$ is a positive dependent set of $L$ such that $e \in D$, then there exists a positive circuit $C$ of $L$ such that $e \in C \subseteq D$.

Proof. Choose $C \subseteq D$ so that $C$ is a positive dependent set, $e \in C$ and $|C|$ is as small as possible. Let $x \in L$ be such that $\mathrm{supp}(x) = C$ and $x_e > 0$. If $C$ is not a circuit, then there is a circuit $C'$ properly contained in $C$. Let $x' \in L$ be such that $\mathrm{supp}(x') = C'$, and $x'_e < 0$ if $e \in C'$ and $x'_i > 0$ for some $i$ in any case. Let $\alpha = \min\{x_i / x'_i \mid x'_i > 0\}$. Then $x - \alpha x' \ge 0$, $(x - \alpha x')_e > 0$ and $\mathrm{supp}(x - \alpha x')$ is properly contained in $C$, a contradiction.

The two theorems are combined in an obvious manner.

Theorem 15. Let $L \subseteq \mathbb{R}^n$ be a linear space, and let $e \in \{1, \ldots, n\}$. Then exactly one of the following holds:
(1) there exists a positive circuit of $L$ containing $e$; or
(2) there exists a positive circuit of $L^\perp$ containing $e$.

To apply this abstract version of the fundamental theorem of linear inequalities, it is useful to translate some linear algebra facts into our new terms. Observe that if $a_1, \ldots, a_m \in \mathbb{R}^n$ and $L := \{x \in \mathbb{R}^m \mid \sum_{i=1}^m x_i a_i = 0\}$, then $X \subseteq \{1, \ldots, m\}$ is independent in $L$ if and only if $\{a_i \mid i \in X\}$ is an independent set of vectors.

Lemma 16. Let $L \subseteq \mathbb{R}^n$ be a linear space.
(1) if $B, B'$ are bases of $L$ and $e \in B \setminus B'$, then there exists an $f \in B' \setminus B$ such that $(B \setminus \{e\}) \cup \{f\}$ is a basis of $L$.
(2) if $B$ is a basis of $L$, then $\{1, \ldots, n\} \setminus B$ is a basis of $L^\perp$.
(3) if $B$ is a basis of $L$ and $e \notin B$, then $B \cup \{e\}$ contains a unique circuit of $L$.
(4) $B$ is a basis of $L$ if and only if $B$ is independent and intersects all circuits of $L^\perp$.

If $L \subseteq \mathbb{R}^n$, the $L$-rank of $X \subseteq \{1, \ldots, n\}$ is
\[ \mathrm{rank}_L(X) := \max\{|I| \mid I \subseteq X,\ I \text{ independent in } L\}. \tag{54} \]
If $L = \{x \in \mathbb{R}^m \mid \sum_{i=1}^m x_i a_i = 0\}$, then $\mathrm{rank}_L(X) = \mathrm{rank}\{a_i \mid i \in X\}$.


Lemma 17. Let $L \subseteq \mathbb{R}^n$ be a linear space.
(1) $\mathrm{rank}_L(\{1, \ldots, n\}) = \dim(L)$;
(2) if $X, Y \subseteq \{1, \ldots, n\}$, then $\mathrm{rank}_L(X) + \mathrm{rank}_L(Y) \ge \mathrm{rank}_L(X \cap Y) + \mathrm{rank}_L(X \cup Y)$;
(3) $\mathrm{rank}_L(X) \le |X|$, and $X$ is independent iff $\mathrm{rank}_L(X) = |X|$.

We sketch a second proof of Theorem 12. Consider the linear space
\[ L := \{x \in \mathbb{R}^{m+1} \mid x_1 a_1 + \cdots + x_m a_m = x_{m+1} b\}. \tag{55} \]
Observe that $L^\perp = \{(da_1, \ldots, da_m, -db) \mid d \in \mathbb{R}^n\}$. Apply Theorem 15 with $e = m + 1$. If there exists a positive circuit $C$ of $L$ containing $e$, then (1) follows taking $X := \{a_i \mid i \in C \setminus \{e\}\}$. If there exists a positive circuit $D$ of $L^\perp$ containing $e$, let $y \in L^\perp$ be such that $y \ge 0$, $y_e > 0$ and $\mathrm{supp}(y) = D$. Let $d \in \mathbb{R}^n$ be such that $y = (da_1, \ldots, da_m, -db)$. Then (2) follows.

Exercises

(1) Let $a_1 = \begin{bmatrix} -1 \\ 2 \\ 2 \end{bmatrix}$, $a_2 = \begin{bmatrix} 3 \\ 1 \\ 0 \end{bmatrix}$, $a_3 = \begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix}$, $a_4 = \begin{bmatrix} 1 \\ 0 \\ -5 \end{bmatrix}$. Show that:
(a) $\begin{bmatrix} 0 \\ 3 \\ 2 \end{bmatrix} \in \mathrm{cone}\{a_1, a_2, a_3, a_4\}$;
(b) $\begin{bmatrix} 7 \\ 1 \\ 1 \end{bmatrix} \in \mathrm{cone}\{a_1, a_2, a_3, a_4\}$;
(c) $\begin{bmatrix} 7 \\ -8 \\ 1 \end{bmatrix} \notin \mathrm{cone}\{a_1, a_2, a_3, a_4\}$.
Sketch $\mathrm{cone}\{a_1, a_2, a_3, a_4\}$ in $\mathbb{R}^3$.

(2) Show that there is no $x \in \mathbb{R}^2$ such that $2x_1 + x_2 \le 1$, $-x_1 + x_2 \le 0$, $x_1 - 3x_2 \le -1$ and $x_1 + x_2 \le 4$.

(3) Prove Theorem 3 using Theorem 9. Hint: apply Theorem 9 to matrix $A' := [\,A \ \ {-A}\,]$ and vector $b$.

(4) Prove: if $A$ is an $m \times n$ matrix, $B$ is an $m \times k$ matrix, and $c \in \mathbb{R}^m$, then exactly one of the following holds:
(a) There are column vectors $x \in \mathbb{R}^n$, $y \in \mathbb{R}^k$ such that $x \ge 0$ and $Ax + By = c$.
(b) There is a row vector $z \in \mathbb{R}^m$ such that $zA \ge 0$, $zB = 0$, and $zc < 0$.
Hint: apply Theorem 9 to matrix $[\,A \ \ B \ \ {-B}\,]$ and vector $c$.

(5) Prove: if $A$ is an $m \times n$ matrix, $B$ is an $m \times k$ matrix, and $c \in \mathbb{R}^m$, then exactly one of the following holds:
(a) There are column vectors $x \in \mathbb{R}^n$, $y \in \mathbb{R}^k$ such that $x \ge 0$ and $Ax + By \le c$.
(b) There is a row vector $z \in \mathbb{R}^m$ such that $zA \ge 0$, $zB = 0$, $z \ge 0$, and $zc < 0$.
Hint: apply Theorem 9 to matrix $[\,A \ \ B \ \ {-B} \ \ I\,]$ and vector $c$.

(6) Prove: if $A$ is an $m \times n$ matrix, $B$ is a $k \times n$ matrix, and $c \in \mathbb{R}^m$, $d \in \mathbb{R}^k$, then exactly one of the following holds:
(a) There is a column vector $x \in \mathbb{R}^n$ such that $x \ge 0$, $Ax = c$, and $Bx \le d$.
(b) There are row vectors $y \in \mathbb{R}^m$, $z \in \mathbb{R}^k$ such that $yA + zB \ge 0$, $z \ge 0$, and $yc + zd < 0$.


(7) Prove Gordan's Theorem: for any vectors $a_1, \ldots, a_m$ in $\mathbb{R}^n$, exactly one of the following holds:
(a) There exist $\lambda_1, \ldots, \lambda_m \ge 0$, not all zero, such that $\sum_{i=1}^m \lambda_i a_i = 0$.
(b) There is a vector $d \in \mathbb{R}^n$ such that $da_i > 0$ for $i = 1, \ldots, m$.
Hint: apply Theorem 8 to vectors $a'_i := \begin{bmatrix} a_i \\ 1 \end{bmatrix}$ and $b' := \begin{bmatrix} 0 \\ 1 \end{bmatrix}$.

(8) Prove Theorem 10 by induction on $n$ using Fourier-Motzkin elimination. Hint: verify that it suffices to show: if $0x' \le -1$ is a nonnegative combination of the system of inequalities (46), then $0x \le -1$ is a nonnegative combination of the system of inequalities (42).

(9) Find a solution $x_1, x_2$ to the system of linear inequalities
\[ \begin{array}{rcrcl}
x_1 & + & x_2 & \le & 0, \\
x_1 & & & \le & 0, \\
x_1 & + & 2x_2 & \le & 3, \\
x_1 & - & x_2 & \le & 3, \\
-x_1 & - & 2x_2 & \le & 4, \\
-x_1 & & & \le & 5
\end{array} \]
(a) by Fourier-Motzkin elimination,
(b) by drawing the set of solutions in the plane.
What is the graphical interpretation of the minimum and maximum in step 3 of Fourier-Motzkin elimination?

(10) Prove that the intersection of convex sets is convex.

(11) Prove: if $C \subseteq \mathbb{R}^n$ is a nonconvex set, then there exists an $x \in \mathbb{R}^n \setminus C$ that cannot be separated from $C$ by a hyperplane.

(12) Prove that for any closed convex set $C$, we have $C = \bigcap \{H^{<}_{d,\delta} \mid dx < \delta \text{ for all } x \in C\}$.

(13) Let $C, D$ be compact convex sets. Prove: if $C \cap D = \emptyset$, then there is a hyperplane separating $C$ and $D$. Hint: consider $C - D := \{c - d \mid c \in C, d \in D\}$.

(14) Prove that a set $C \subseteq \mathbb{R}^n$ is a cone if and only if $\alpha x + \beta y \in C$ for all $x, y \in C$ and all $\alpha, \beta \in \mathbb{R}$ such that $\alpha, \beta > 0$.

(15) Let $C \subseteq \mathbb{R}^n$ be a closed and convex set, and let $c$ lie on the boundary of $C$. Prove that there exists a $d \in \mathbb{R}^n \setminus \{0\}$ and a $\delta \in \mathbb{R}$ such that $dc = \delta$ and $dx \le \delta$ for all $x \in C$.

(16) Let $C, D \subseteq \mathbb{R}^n$ be closed, convex sets. Show that if $C$ and $D$ are disjoint, then there is a $d \in \mathbb{R}^n \setminus \{0\}$ and $\delta \in \mathbb{R}$ such that $dx \le \delta$ for all $x \in C$ and $dy \ge \delta$ for all $y \in D$.

(17) Verify that $\mathrm{cone}\{a_1, \ldots, a_n\}$ is a cone for all $a_1, \ldots, a_n \in \mathbb{R}^m$.

(18) Show that $C = C^*$ if
(a) $C = \{x \in \mathbb{R}^n \mid x \ge 0\}$, the positive orthant.
(b) $C = L^k$, the Lorentz cone.
(c) $C = C_1 \times C_2$, where $C_1^* = C_1$ and $C_2^* = C_2$.

(19) Complete and prove: if $A$ is an $m \times n$ matrix and $b \in \mathbb{R}^m$, then exactly one of the following holds:
(a) There is a column vector $x \in \mathbb{R}^n$ such that $Ax \le b$ and $x \ge 0$.
(b) (???).

(20) Complete and prove: if $A$ is an $m \times n$ matrix, $B$ is a $k \times n$ matrix, and $c \in \mathbb{R}^m$, $d \in \mathbb{R}^k$, then exactly one of the following holds:
(a) There is a column vector $x \in \mathbb{R}^n$ such that $Ax \le c$ and $Bx \ge d$.
(b) (???).


(21) Show that Farkas' Lemma is equivalent to the following statement. Let $L \subseteq \mathbb{R}^n$ be a linear space. For each $i \in \{1, \ldots, n\}$ exactly one of the following holds:
(a) there is an $x \in L$ such that $x \ge 0$ and $x_i > 0$, or
(b) there is a $y \in L^\perp$ such that $y \ge 0$ and $y_i > 0$.
Here $L^\perp := \{y \in \mathbb{R}^n \mid y^t x = 0 \text{ for all } x \in L\}$ is the orthogonal complement of $L$.

(22) Prove: For any closed convex cone $C$, $m \times n$ matrix $A$ and vector $b \in \mathbb{R}^m$, exactly one of the following is true:
(a) there exists an $x \in C$ such that $Ax = b$;
(b) there exists a $y \in \mathbb{R}^m$ such that $yA \in C^*$ and $yb < 0$.

(23) Prove Helly's Theorem: If $C_1, \ldots, C_m$ are compact, convex subsets of $\mathbb{R}^n$ so that the intersection of any $n + 1$ of them is nonempty, then the intersection of all $m$ sets is nonempty.

CHAPTER 2

Linear optimization duality
1. The duality theorem

1.1. Weak duality. Consider the linear optimization problem $\max\{cx \mid Ax \le b, x \in \mathbb{R}^n\}$, or equivalently,
\[ \max\{cx \mid a_1 x \le b_1,\ a_2 x \le b_2,\ \ldots,\ a_m x \le b_m,\ x \in \mathbb{R}^n\} \tag{56} \]
where $a_1, \ldots, a_m$ are the rows of $A$ (see Figure 1).

[Figure 1. A linear optimization problem]

We will describe a method to make upper bounds for (56). Recall that the inequality $cx \le d$ is a nonnegative combination of the constraints of (56) if and only if there exist $y_1, \ldots, y_m \ge 0$ such that
\[ \begin{array}{rl}
y_1 \times & (a_1 x \le b_1) \\
y_2 \times & (a_2 x \le b_2) \\
\vdots & \qquad \vdots \\
y_m \times & (a_m x \le b_m) \quad + \\ \hline
 & \phantom{(}cx \le d,
\end{array} \tag{57} \]
or in other words, if
\[ y_1 a_1 + \cdots + y_m a_m = c \quad \text{and} \quad y_1 b_1 + \cdots + y_m b_m = d. \tag{58} \]
If $cx \le d$ is a nonnegative combination of the constraints, then $cx \le d$ holds for all feasible points $x$, and then $d$ is an upper bound on (56). To find the best possible upper bound of this type, we need to solve the following optimization problem:
\[ \min\{y_1 b_1 + \cdots + y_m b_m \mid y_1 a_1 + \cdots + y_m a_m = c,\ y_1, \ldots, y_m \ge 0,\ y_1, \ldots, y_m \in \mathbb{R}\}. \tag{59} \]
This is itself a linear optimization problem, the dual of (56). Taking $y = (y_1, \ldots, y_m)$, a more concise way to write (59) is $\min\{yb \mid yA = c, y \ge 0, y \in \mathbb{R}^m\}$. When speaking of a linear optimization problem and its dual, the original problem (in this case (56)) is referred to as the primal problem. We summarize the above in a Lemma, and give the short formal proof.

Lemma 18 ('Weak duality'). For any $m \times n$ matrix $A$ and vectors $b \in \mathbb{R}^m$ and $c \in \mathbb{R}^n$, we have
\[ \sup\{cx \mid Ax \le b, x \in \mathbb{R}^n\} \le \inf\{yb \mid yA = c, y \ge 0, y \in \mathbb{R}^m\}. \tag{60} \]

Proof. If $x \in \mathbb{R}^n$ is such that $Ax \le b$ and $y \in \mathbb{R}^m$ is such that $yA = c$ and $y \ge 0$, then $yb - yAx = y(b - Ax) \ge 0$ as both $y \ge 0$ and $b - Ax \ge 0$. Hence
\[ cx = (yA)x = y(Ax) \le yb. \tag{61} \]
This proves the Lemma.
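Weak duality is easy to check numerically on a small instance. The following sketch uses a hypothetical problem (maximize $x_1 + x_2$ subject to $x_1 \le 2$, $x_2 \le 3$, $x_1 + x_2 \le 4$) and hypothetical helper names; none of these come from the text.

```python
def dot(u, v):
    """Inner product of two equally long sequences."""
    return sum(p * q for p, q in zip(u, v))

def primal_feasible(A, b, x):
    """Check Ax <= b row by row."""
    return all(dot(row, x) <= bi for row, bi in zip(A, b))

def dual_feasible(A, c, y):
    """Check yA = c and y >= 0."""
    columns = zip(*A)
    return all(yi >= 0 for yi in y) and all(
        dot(y, col) == cj for col, cj in zip(columns, c))

# Hypothetical data: maximize x1 + x2 s.t. x1 <= 2, x2 <= 3, x1 + x2 <= 4.
A = [[1, 0], [0, 1], [1, 1]]
b = [2, 3, 4]
c = [1, 1]

x = [1, 2]       # primal feasible, cx = 3
y = [1, 1, 0]    # dual feasible, yb = 5
assert primal_feasible(A, b, x) and dual_feasible(A, c, y)
print(dot(c, x), "<=", dot(y, b))   # prints: 3 <= 5
```

Any dual feasible $y$ gives an upper bound; in this instance $y = (0, 0, 1)$ gives the bound $4$, which is attained by $x = (2, 2)$.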

1.2. Strong duality. We prove the Duality Theorem for linear optimization.

Theorem 19 (von Neumann, 1947; Gale, Kuhn and Tucker, 1951). For any $m \times n$ matrix $A$ and vectors $b \in \mathbb{R}^m$ and $c \in \mathbb{R}^n$, we have
\[ \max\{cx \mid Ax \le b, x \in \mathbb{R}^n\} = \min\{yb \mid yA = c, y \ge 0, y \in \mathbb{R}^m\}, \tag{62} \]
provided that the maximization problem, or the minimization problem, is both feasible and bounded.

Proof. Since we already know that $\max \le \min$, it suffices to show that there is a feasible solution $x$ to the maximization problem and a feasible solution $y$ to the minimization problem so that $cx \ge yb$. To be exact, we want an $x \in \mathbb{R}^n$ and a $y \in \mathbb{R}^m$ such that $Ax \le b$, $yA = c$, $y \ge 0$, and $cx \ge yb$, or equivalently,
\[ \begin{bmatrix} A & 0 \\ 0 & -A^t \\ 0 & A^t \\ 0 & -I \\ -c & b^t \end{bmatrix}
\begin{bmatrix} x \\ y^t \end{bmatrix} \le
\begin{bmatrix} b \\ -c^t \\ c^t \\ 0 \\ 0 \end{bmatrix}. \tag{63} \]
By Farkas' Lemma, such a pair $(x, y)$ does not exist if and only if there exists a row vector $[u \ v \ v' \ s \ z] \in \mathbb{R}^{m+n+n+m+1}$ such that
\[ [u \ v \ v' \ s \ z] \begin{bmatrix} A & 0 \\ 0 & -A^t \\ 0 & A^t \\ 0 & -I \\ -c & b^t \end{bmatrix} = 0, \qquad
[u \ v \ v' \ s \ z] \begin{bmatrix} b \\ -c^t \\ c^t \\ 0 \\ 0 \end{bmatrix} < 0, \qquad
[u \ v \ v' \ s \ z] \ge 0. \tag{64} \]
If the latter system has a solution, then there are $u \in \mathbb{R}^m$, $w \in \mathbb{R}^n$ (taking $w := (v - v')^t$) and $z \in \mathbb{R}$ such that $z \ge 0$, $uA = zc$, $u \ge 0$, $Aw \le zb$ and $ub - cw < 0$. We prove the theorem by showing that such a triple $(u, w, z)$ cannot exist.

We distinguish two cases, $z = 0$ and $z > 0$. If $z > 0$, let $x := \frac{1}{z}w$ and $y := \frac{1}{z}u$. Then $Ax \le b$, $yA = c$, $y \ge 0$, and $cx > yb$, a violation of weak duality. So $z = 0$, and hence $uA = 0$, $u \ge 0$, $Aw \le 0$ and either $ub < 0$ or $cw > 0$. If $ub < 0$ (and $uA = 0$) then there is no $x \in \mathbb{R}^n$ such that $Ax \le b$ by Farkas' Lemma (Theorem 10), i.e. the maximization problem is infeasible. If $cw > 0$ and there exists a feasible $x$ for the maximization problem, then $x + \lambda w$ is feasible for any $\lambda \ge 0$ and $c(x + \lambda w) \to \infty$ when $\lambda \to \infty$, so that in this case the maximization problem is infeasible or unbounded. Thus if $z = 0$ the maximization problem is either infeasible or unbounded, and one similarly argues that the minimization problem is either infeasible or unbounded, contradicting our assumption.

In Chapter 1, we suggested that via Farkas' Lemma we would find a certificate for optimality, that is a positive statement equivalent with condition (12). We state such a condition below. Note that (12) is logically equivalent to (1) in the Corollary.

Corollary 19.1. Let $A$ be an $m \times n$ matrix, $b \in \mathbb{R}^m$ a column vector, $c \in \mathbb{R}^n$ a row vector and $d \in \mathbb{R}$. If $\{x \in \mathbb{R}^n \mid Ax \le b\} \ne \emptyset$, then the following are equivalent:
(1) $cx \le d$ for all $x \in \mathbb{R}^n$ such that $Ax \le b$, and
(2) there exists a row vector $y \in \mathbb{R}^m$ such that $y \ge 0$, $yA = c$ and $yb \le d$.

Proof. If (2) holds, then $cx = yAx \le yb \le d$ for all $x$ such that $Ax \le b$. Now consider the problem $\max\{cx \mid Ax \le b\}$. By the assumption that $\{x \in \mathbb{R}^n \mid Ax \le b\} \ne \emptyset$, this problem is feasible. If (1) holds, then the problem is bounded; then Theorem 19 implies the existence of a dual optimal solution $y$ such that $yb = \max\{cx \mid Ax \le b\}$. Then $yb \le d$, and (2) holds.

1.3. Sensitivity analysis. In some applications, the coefficients of the vector $b$ in the optimization problem
\[ \max\{cx \mid Ax \le b, x \in \mathbb{R}^n\} \tag{65} \]
are uncertain or fluctuating, and it is useful to estimate the value of (65) as a function of $b$. Note that if for a given $b$, the optimum solution of the dual problem
\[ \min\{yb \mid yA = c, y \ge 0\} \tag{66} \]
is $\hat{y}$, then the feasibility of $\hat{y}$ does not depend on $b$; hence
\[ \max\{cx \mid Ax \le b', x \in \mathbb{R}^n\} = \min\{yb' \mid yA = c, y \ge 0\} \le \hat{y}b', \tag{67} \]
and
\[ \max\{cx \mid Ax \le b', x \in \mathbb{R}^n\} - \max\{cx \mid Ax \le b, x \in \mathbb{R}^n\} \le \hat{y}b' - \hat{y}b = \hat{y}(b' - b). \tag{68} \]

So the coefficient $\hat{y}_i$ is a bound for the change in the optimal value of (65) per unit of change in $b_i$. In economics, the coefficients of the optimal dual solution are sometimes called shadow prices.

2. Optimal solutions

2.1. Complementary slackness. We consider again the primal and dual problems of Theorem 19.

Lemma 20 ('Complementary slackness'). Let $A$ be an $m \times n$ matrix, $b \in \mathbb{R}^m$ a column vector and $c \in \mathbb{R}^n$ a row vector. Let $a_1, \ldots, a_m$ be the rows of $A$. If $x$ is a feasible solution of $\max\{cx \mid Ax \le b\}$ and $y$ is a feasible solution of $\min\{yb \mid yA = c, y \ge 0\}$, then the following are equivalent.
(1) $x$ and $y$ are both optimal solutions; and
(2) $y_i = 0$ or $a_i x = b_i$ for each $i \in \{1, \ldots, m\}$.

Proof. Feasible solutions $x$ and $y$ are both optimal if and only if $cx = yb$, by Theorem 19. Since $yb - cx = yb - yAx = y(b - Ax) = \sum_{i=1}^m y_i(b_i - a_i x)$, and $y_i \ge 0$ and $b_i - a_i x \ge 0$ for all $i$ if $x, y$ are feasible, we have $cx = yb$ if and only if (2) holds.
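Condition (2) of Lemma 20 can be checked mechanically for a given feasible pair. The sketch below uses a hypothetical instance (maximize $x_1 + x_2$ subject to $x_1 \le 2$, $x_2 \le 3$, $x_1 + x_2 \le 4$) and a hypothetical function name; both pairs tested are feasible for the primal and the dual.

```python
def dot(u, v):
    """Inner product of two equally long sequences."""
    return sum(p * q for p, q in zip(u, v))

def complementary_slackness(A, b, x, y):
    """Condition (2) of Lemma 20: y_i = 0 or a_i x = b_i for every row i."""
    return all(yi == 0 or dot(row, x) == bi
               for row, bi, yi in zip(A, b, y))

# Hypothetical data: maximize x1 + x2 s.t. x1 <= 2, x2 <= 3, x1 + x2 <= 4.
A = [[1, 0], [0, 1], [1, 1]]
b = [2, 3, 4]
c = [1, 1]

optimal_pair = ([2, 2], [0, 0, 1])   # cx = 4 = yb
other_pair = ([1, 2], [1, 1, 0])     # cx = 3 < 5 = yb

for x, y in (optimal_pair, other_pair):
    print(complementary_slackness(A, b, x, y), dot(c, x) == dot(y, b))
# prints "True True" and then "False False"
```

In both cases the slackness test agrees with the comparison of objective values, as the lemma predicts for feasible pairs.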


[Figure 2. Several feasible solutions x and the optimality criterion. Four panels: one in which x is optimal and three in which x is not optimal; the tight rows in the four panels generate cone {a2, a3}, cone ∅, cone {a2} and cone {a3, a4}.]

We give an optimality criterion with a more geometrical flavor.

Lemma 21. Let $A$ be an $m \times n$ matrix, let $b \in \mathbb{R}^m$ be a column vector and $c \in \mathbb{R}^n$ a row vector. Let $a_1, \ldots, a_m$ be the rows of $A$. If $\hat{x} \in \mathbb{R}^n$ is such that $A\hat{x} \le b$, then the following are equivalent:
(1) $\hat{x}$ is an optimal solution of $\max\{cx \mid Ax \le b, x \in \mathbb{R}^n\}$, and
(2) $c \in \mathrm{cone}\{a_i \mid a_i\hat{x} = b_i,\ i \in \{1, \ldots, m\}\}$.

Proof. If (2) is not true, then by Theorem 8 (Farkas' Lemma) there is a column vector $d \in \mathbb{R}^n$ such that $cd < 0$, and $a_i d \ge 0$ for all $i$ with $a_i\hat{x} = b_i$. But then $A(\hat{x} - \lambda d) \le b$ for a small enough $\lambda > 0$ and $c(\hat{x} - \lambda d) > c\hat{x}$ for all $\lambda > 0$, hence $\hat{x}$ is not optimal and (1) is false.

Let $B := \{i \in \{1, \ldots, m\} \mid a_i\hat{x} = b_i\}$. If (2) is true, then there exist $\lambda_i \ge 0$ for $i \in B$ such that $c = \sum_{i \in B} \lambda_i a_i$. But then for any $x \in \mathbb{R}^n$ such that $Ax \le b$ we have
\[ cx = \sum_{i \in B} \lambda_i a_i x \le \sum_{i \in B} \lambda_i b_i = \sum_{i \in B} \lambda_i a_i \hat{x} = c\hat{x}, \tag{69} \]
so $\hat{x}$ is optimal.
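The criterion of Lemma 21 can be tested by brute force on small instances. The sketch below decides membership of $c$ in the cone of the tight rows by trying every linearly independent subset, which suffices by Carathéodory's Theorem (Theorem 11). All function names and the example data are hypothetical, exact rational arithmetic is assumed, and the enumeration is exponential, so this is an illustration rather than a practical method.

```python
from fractions import Fraction
from itertools import combinations

def solve_unique(cols, target):
    """Solve sum_j lam_j * cols[j] = target by Gauss-Jordan elimination.
    Returns lam if cols are linearly independent and the system is
    consistent, otherwise None."""
    n, k = len(target), len(cols)
    M = [[Fraction(cols[j][i]) for j in range(k)] + [Fraction(target[i])]
         for i in range(n)]
    r = 0
    for j in range(k):
        p = next((i for i in range(r, n) if M[i][j] != 0), None)
        if p is None:
            return None                      # cols are linearly dependent
        M[r], M[p] = M[p], M[r]
        M[r] = [v / M[r][j] for v in M[r]]
        for i in range(n):
            if i != r and M[i][j] != 0:
                f = M[i][j]
                M[i] = [vi - f * vr for vi, vr in zip(M[i], M[r])]
        r += 1
    if any(M[i][k] != 0 for i in range(r, n)):
        return None                          # no solution
    return [M[j][k] for j in range(k)]

def in_cone(vectors, target):
    """True if target is a nonnegative combination of vectors."""
    for size in range(len(target) + 1):
        for J in combinations(range(len(vectors)), size):
            lam = solve_unique([vectors[j] for j in J], target)
            if lam is not None and all(v >= 0 for v in lam):
                return True
    return False

def is_optimal(A, b, c, x_hat):
    """Condition (2) of Lemma 21; x_hat is assumed to satisfy A x_hat <= b."""
    tight = [row for row, bi in zip(A, b)
             if sum(p * q for p, q in zip(row, x_hat)) == bi]
    return in_cone(tight, c)

# Hypothetical data: maximize x1 + x2 s.t. x1 <= 2, x2 <= 3, x1 + x2 <= 4.
A = [[1, 0], [0, 1], [1, 1]]
b = [2, 3, 4]
c = [1, 1]
assert is_optimal(A, b, c, [2, 2])        # c = 1 * (1, 1), third row tight
assert not is_optimal(A, b, c, [1, 2])    # no tight rows, c is not in cone of the empty set
```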

Observe that the above lemma states nothing else than: $\hat{x}$ is an optimal solution of $\max\{cx \mid Ax \le b\}$ if and only if there exists a feasible solution $\hat{y}$ of $\min\{yb \mid yA = c, y \ge 0\}$ so that the complementary slackness condition holds for the pair $(\hat{x}, \hat{y})$. So there is nothing here that could not be proved from the strong duality theorem and complementary slackness. The reason for including this lemma is to give some geometrical insight into optimality (see Figure 2; the feasible set and the vectors $a_1, \ldots, a_5$ in this picture are as in Figure 1).

There is a second, more algebraic way to view Lemma 21. We say that an inequality $ax \le b$ is tight in $\hat{x}$ if $a\hat{x} = b$. Lemma 21 states that if $\hat{x}$ is an optimal solution of (56), it is possible to obtain the inequality $cx \le d$ (for some $d \in \mathbb{R}$) as a nonnegative combination of inequalities that are tight in $\hat{x}$. Then it follows that $d = c\hat{x}$, and the upper bound on (56) thus obtained is best possible.

The direct proof of Lemma 21 is sometimes used to prove the strong duality theorem differently: for the lemma implies that if there exists an optimal solution to the primal problem, then there exists an optimal solution to the dual with the same objective value. For an alternative proof of the strong duality theorem, it would suffice to show directly that $\max\{cx \mid Ax \le b\}$ is attained if the problem is feasible and bounded, and then apply the above Lemma to the optimal solution to show the existence of an optimal dual solution $y$.

2.2. Basic solutions. We are still working on the dual pair of optimization problems of Theorem 19. Recall from linear algebra that the rowspace of a matrix $A$ is
\[ \mathrm{rowspace}(A) := \{yA \mid y \in \mathbb{R}^m\}. \]

Theorem 22. Let $A$ be an $m \times n$ matrix, $b \in \mathbb{R}^m$ a column vector and $c \in \mathbb{R}^n$ a row vector. Let $a_1, \ldots, a_m$ be the rows of $A$. Assume that $\max\{cx \mid Ax \le b, x \in \mathbb{R}^n\}$ is feasible and bounded. Then there exists a set $B \subseteq \{1, \ldots, m\}$ such that
(1) $\{a_i \mid i \in B\}$ is a basis of the rowspace of $A$;
(2) any $x \in \mathbb{R}^n$ such that $a_i x = b_i$ for all $i \in B$ is an optimal solution of $\max\{cx \mid Ax \le b\}$; and
(3) any $y \in \mathbb{R}^m$ such that $yA = c$ and $y_i = 0$ for all $i \notin B$ is an optimal solution of $\min\{yb \mid yA = c, y \ge 0\}$.

Proof. By Theorem 19, $\max\{cx \mid Ax \le b\}$ has an optimal solution. Let us choose an optimal solution $\hat{x} \in \mathbb{R}^n$ so that the total number of inequalities from $Ax \le b$ that are tight in $\hat{x}$ is as large as possible: i.e. such that $|\{i \in \{1, \ldots, m\} \mid a_i\hat{x} = b_i\}|$ is maximal. Since $\hat{x}$ is optimal, it follows from Lemma 21 that $c \in \mathrm{cone}\{a_i \mid a_i\hat{x} = b_i\}$. By Theorem 11, there exists a set of indices $B \subseteq \{1, \ldots, m\}$ so that
(1') $\{a_i \mid i \in B\}$ is a basis of $\mathrm{lin.hull}\{a_i \mid a_i\hat{x} = b_i\}$;
(2') $a_i\hat{x} = b_i$ for all $i \in B$; and
(3') $c \in \mathrm{cone}\{a_i \mid i \in B\}$.
(In fact, Carathéodory's Theorem only implies that there is a linearly independent set satisfying (2') and (3'), but it is easy to see that we may add vectors from $\{a_i \mid a_i\hat{x} = b_i\}$ to obtain a basis as in (1').)

Suppose that (1) does not hold. Then
\[ \mathrm{lin.hull}\{a_i \mid a_i\hat{x} = b_i\} \ne \mathrm{rowspace}(A) = \mathrm{lin.hull}\{a_i \mid i \in \{1, \ldots, m\}\}. \tag{70} \]
Then there must be a vector $d$ orthogonal to all $a_i$ such that $a_i\hat{x} = b_i$, but not orthogonal to all rows $a_i$ of $A$, so $Ad \ne 0$. Consider the line $\ell : \hat{x} + \lambda d$. Since $Ad \ne 0$, $\ell$ cannot be completely contained in $\{x \in \mathbb{R}^n \mid Ax \le b\}$, so at least one of
\[ \min\{\lambda \in \mathbb{R} \mid A(\hat{x} + \lambda d) \le b\} \quad \text{or} \quad \max\{\lambda \in \mathbb{R} \mid A(\hat{x} + \lambda d) \le b\} \tag{71} \]
exists. Let $\lambda^*$ be an optimal solution of the maximization or the minimization problem, and let $x^* := \hat{x} + \lambda^* d$. Then there is an $i^* \notin B$ such that the linear inequality $a_{i^*}x \le b_{i^*}$ is satisfied with equality at $x^*$. Now
\[ i^* \notin \{i \mid a_i\hat{x} = b_i\} \subseteq \{i \mid a_i x^* = b_i\} \ni i^*, \tag{72} \]
and $x^*$ is an optimal solution of $\max\{cx \mid Ax \le b\}$ by Lemma 21 and (3'). This is a contradiction to our choice of $\hat{x}$. So (1) is true.

If (2) were not true, then there exists an $x' \in \mathbb{R}^n$ such that $a_i x' = b_i$ for all $i \in B$, which is not an optimal solution of $\max\{cx \mid Ax \le b\}$. Note that any feasible $x \in \mathbb{R}^n$ such that $a_i x = b_i$ for all $i \in B$ is optimal by (3') and Lemma 21. So $x'$ is not feasible. Let $d := x' - \hat{x}$. Then the line $\ell : \hat{x} + \lambda d$ is not completely contained in $\{x \in \mathbb{R}^n \mid Ax \le b\}$, and again either the minimum or the maximum in (71) exists. We can deduce a contradiction as before.

We leave it to the reader to show (3).

[Figure 3. Basic and nonbasic solutions. Three panels: x basic optimal solution; x nonbasic optimal solution; x basic optimal solution.]

When considering the optimization problem $\max\{cx \mid Ax \le b\}$, we say that a set $B \subseteq \{1, \ldots, m\}$ is a basis if it satisfies (1) in the above theorem, which is optimal if both (2) and (3) hold. An $x$ as in (2) is then a basic optimal solution of $\max\{cx \mid Ax \le b\}$ (see Figure 3); such an $x$ necessarily exists, since a system of $|B| \le n$ linearly independent linear equations in $n$ unknowns must have a solution. A $y$ as in (3) is a basic optimal solution of $\min\{yb \mid yA = c, y \ge 0\}$; since $B$ is a basis there exists exactly one such $y$ (exercise). To construct basic optimal solutions, it suffices to know an optimal basis.

Recall that the rank of an $m \times n$ matrix $A$ satisfies $\mathrm{rank}(A) = n - \dim\ker(A)$, and equals the dimension of the rowspace of $A$.

Corollary 22.1. If the problem $\max\{cx \mid Ax \le b, x \in \mathbb{R}^n\}$ is feasible and bounded, it has an optimal solution satisfying at least $\mathrm{rank}(A)$ inequalities from $Ax \le b$ with equality.

An application of this corollary is the transportation problem of Chapter 1: this problem has $k + l + kl$ inequality constraints: $k$ for factory productivity, $l$ for city demands and $kl$ nonnegativity constraints. The problem has $kl$ variables, and it is easy to verify that the constraint matrix has rank $kl$. So there is an optimal solution for which at least $kl$ of the constraints are tight. At least $kl - k - l$ of these tight constraints must be nonnegativity constraints, as there are only $k + l$ other constraints. This implies that there is an optimal solution such that at most $k + l$ (out of $kl$) variables are nonzero. So an optimal transportation plan need not be complicated.

3. Finding the dual

3.1. The dual of the dual. Since $\max\{cx \mid Ax \le b, x \in \mathbb{R}^n\}$ is a problem to which any linear optimization problem can be reduced, we should be able to construct a dual to any linear optimization problem.
A first question that comes to mind is: what is the dual of the dual of $\max\{cx \mid Ax \le b, x \in \mathbb{R}^n\}$; i.e. what is the dual of the linear optimization problem $\min\{yb \mid yA = c, y \ge 0, y \in \mathbb{R}^m\}$? Observe that $\min\{yb \mid yA = c, y \ge 0, y \in \mathbb{R}^m\}$ equals
\[ -\max\Big\{(-b^t)z \ \Big|\ \begin{bmatrix} A^t \\ -A^t \\ -I \end{bmatrix} z \le \begin{bmatrix} c^t \\ -c^t \\ 0 \end{bmatrix},\ z \in \mathbb{R}^m\Big\}. \tag{73} \]
The dual of this problem is
\[ -\min\Big\{[u \ u' \ w] \begin{bmatrix} c^t \\ -c^t \\ 0 \end{bmatrix} \ \Big|\ [u \ u' \ w] \begin{bmatrix} A^t \\ -A^t \\ -I \end{bmatrix} = -b^t,\ [u \ u' \ w] \ge 0,\ [u \ u' \ w] \in \mathbb{R}^{n+n+m}\Big\}, \tag{74} \]
which by transposing and simplifying some expressions equals
\[ -\min\{-c(u' - u)^t \mid A(u' - u)^t \le b,\ u, u' \ge 0,\ u, u' \in \mathbb{R}^n\}. \tag{75} \]
Substituting $x$ for $(u' - u)^t$, we obtain $\max\{cx \mid Ax \le b, x \in \mathbb{R}^n\}$. So taking the dual of the dual brings back the original problem.

3.2. Construction of a standard dual. Again, since any linear optimization problem can be reduced to a problem of the form $\max\{cx \mid Ax \le b\}$, we should be able to construct a dual for any given linear optimization problem. This is possible in principle; it involves rewriting the constraints to a single matrix inequality $Ax \le b$, then applying Theorem 19, followed by more matrix manipulations. The above computation of the dual of the dual is an example. It can be a messy job, and there is no reason to assume that the result of such manipulations is unique.

We state a version of the duality theorem that contains all possible duality theorems almost trivially as a special case. With this theorem, the problem of constructing a dual becomes an exercise in substitution. Moreover, this procedure gives a 'standard' dual for each primal problem.

Theorem 23. Let $A, B, C, D, E, F, G, H, K$ be matrices, let $d, e, f$ be row vectors and let $a, b, c$ be column vectors of appropriate sizes. Then
\[ \max\left\{dx + ey + fz \ \middle|\ \begin{array}{l} Ax + By + Cz \le a, \\ Dx + Ey + Fz = b, \\ Gx + Hy + Kz \ge c, \\ x \ge 0,\ z \le 0 \end{array}\right\}
= \min\left\{ua + vb + wc \ \middle|\ \begin{array}{l} uA + vD + wG \ge d, \\ uB + vE + wH = e, \\ uC + vF + wK \le f, \\ u \ge 0,\ w \le 0 \end{array}\right\} \tag{76} \]

provided the maximization problem, or the minimization problem, is both feasible and bounded.

The proof of this theorem is an exercise. Note that each variable of the primal problem corresponds to a constraint of the dual and vice versa. The exact correspondence laid down in Theorem 23 is summarized in the following table.

    maximization problem                      minimization problem
    ‘≤’ constraint                            ‘≥ 0’ variable
    ‘=’ constraint                            free variable
    ‘≥’ constraint                            ‘≤ 0’ variable
    ‘≥ 0’ variable                            ‘≥’ constraint
    free variable                             ‘=’ constraint
    ‘≤ 0’ variable                            ‘≤’ constraint
    coefficient at i-th variable              coefficient in i-th constraint
      in j-th constraint                        at j-th variable

The slack of an inequality ax ≤ b is b − ax. The complementary slackness condition for a pair of feasible solutions of the primal and dual problem is that if an inequality of the primal (resp. dual) problem has nonzero slack in the given solution, then the dual (resp. primal) variable corresponding to that inequality is zero in the given solution, i.e. has no slack in its nonnegativity constraint. Hence the name ‘complementary slackness condition’. A feasible primal-dual solution is optimal if and only if it satisfies complementary slackness.

Corollary 23.1 (‘Complementary slackness’). For feasible solutions x, y, z and u, v, w of the above maximization and minimization problems, the following are equivalent:
(1) (x, y, z) and (u, v, w) are both optimal.
(2) (a) if x_i > 0 then the i-th column of uA + vD + wG ≥ d holds with equality for all i,
    (b) if z_k < 0 then the k-th column of uC + vF + wK ≤ f holds with equality for all k,
    (c) if u_l > 0 then the l-th row of Ax + By + Cz ≤ a holds with equality for all l, and
    (d) if w_n < 0 then the n-th row of Gx + Hy + Kz ≥ c holds with equality for all n.
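The correspondence table of Theorem 23 can be applied mechanically. The following is a minimal sketch, not a definitive implementation: the `LP` record, its field names and the string encodings of relations and sign restrictions are all hypothetical choices made for this illustration. It dualizes max{cx | Ax ≤ b} with free x, and checks that dualizing twice returns the original problem, as in subsection 3.1.

```python
from dataclasses import dataclass

# Theorem 23 table, read left to right (maximization -> minimization).
MAX_CON_TO_MIN_VAR = {"<=": ">=0", "=": "free", ">=": "<=0"}
MAX_VAR_TO_MIN_CON = {">=0": ">=", "free": "=", "<=0": "<="}
# The same table read right to left (minimization -> maximization).
MIN_CON_TO_MAX_VAR = {con: var for var, con in MAX_VAR_TO_MIN_CON.items()}
MIN_VAR_TO_MAX_CON = {var: con for con, var in MAX_CON_TO_MIN_VAR.items()}


@dataclass(frozen=True)
class LP:
    """Hypothetical container for a linear optimization problem."""
    sense: str        # "max" or "min"
    objective: tuple  # one coefficient per variable
    rows: tuple       # constraint matrix as a tuple of row tuples
    relations: tuple  # "<=", "=" or ">=" for each row
    rhs: tuple        # right-hand side, one entry per row
    var_signs: tuple  # ">=0", "free" or "<=0" for each variable


def dual(lp):
    """Return the 'standard' dual obtained by substitution in the table."""
    if lp.sense == "max":
        con_to_var, var_to_con, sense = MAX_CON_TO_MIN_VAR, MAX_VAR_TO_MIN_CON, "min"
    else:
        con_to_var, var_to_con, sense = MIN_CON_TO_MAX_VAR, MIN_VAR_TO_MAX_CON, "max"
    n = len(lp.objective)
    # Coefficient of variable i in constraint j becomes coefficient of
    # variable j in constraint i: transpose the matrix.
    transposed = tuple(tuple(row[j] for row in lp.rows) for j in range(n))
    return LP(
        sense=sense,
        objective=lp.rhs,
        rows=transposed,
        relations=tuple(var_to_con[s] for s in lp.var_signs),
        rhs=lp.objective,
        var_signs=tuple(con_to_var[r] for r in lp.relations),
    )


# max{3x1 + 2x2 | x1 + x2 <= 4, x1 <= 3, x2 <= 2}, x free.
primal = LP("max", (3, 2), ((1, 1), (1, 0), (0, 1)),
            ("<=", "<=", "<="), (4, 3, 2), ("free", "free"))
d = dual(primal)  # min{4y1 + 3y2 + 2y3 | y1 + y2 = 3, y1 + y3 = 2, y >= 0}
```

Because the two directions of the table are inverse to each other, `dual(dual(primal))` reproduces `primal` exactly in this encoding.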


Example: the transportation problem. We consider again the transportation problem of Chapter 1,

\[
\min\left\{ \sum_{i=1}^{k} \sum_{j=1}^{l} p_{ij} x_{ij} \;\middle|\;
\begin{array}{ll}
\sum_{j=1}^{l} x_{ij} \le c_i & \text{for } i = 1, \ldots, k, \\
\sum_{i=1}^{k} x_{ij} \ge d_j & \text{for } j = 1, \ldots, l, \\
x_{ij} \ge 0 & \text{for } i = 1, \ldots, k,\ j = 1, \ldots, l
\end{array}
\right\},
\tag{77}
\]

where capacities c_i, demands d_j and unit transportation costs p_ij are given. In matrix notation, this problem takes the form

\[ \min\{ px \mid Fx \le c,\ Gx \ge d,\ x \ge 0,\ x \in \mathbb{R}^{kl} \}, \tag{78} \]

where F is the k × kl matrix with f_{i,i′j} = 1 if i = i′ and f_{i,i′j} = 0 otherwise, G is the l × kl matrix with g_{j,ij′} = 1 if j = j′ and g_{j,ij′} = 0 otherwise, and c ∈ R^k, d ∈ R^l and p ∈ R^kl are the capacity, demand and transportation cost vectors. It is an exercise to show that the dual of (78) is

\[ \max\{ zd - yc \mid zG - yF \le p,\ y \ge 0,\ z \ge 0,\ y \in \mathbb{R}^{k},\ z \in \mathbb{R}^{l} \}. \tag{79} \]

Let us translate this matrix version of the dual back to more down-to-earth terms. In the dual, we have a variable for each constraint of the primal problem; in particular, we have a variable y_i for each factory i and a variable z_j for each city j. The constraints of the dual each correspond to a variable of the primal problem: e.g. the ij-th column of the matrix inequality zG − yF ≤ p is z_j − y_i ≤ p_ij. So the dual is

\[
\max\left\{ \sum_{j=1}^{l} z_j d_j - \sum_{i=1}^{k} y_i c_i \;\middle|\;
\begin{array}{ll}
y_i \ge 0 & \text{for } i = 1, \ldots, k, \\
z_j \ge 0 & \text{for } j = 1, \ldots, l, \\
z_j - y_i \le p_{ij} & \text{for } i = 1, \ldots, k,\ j = 1, \ldots, l
\end{array}
\right\}.
\tag{80}
\]

We will take the dualization process one step further and try to interpret the dual in the same economical setting that gave rise to the primal problem. Note that z_j − y_i is compared to p_ij; this suggests that the dual variables z_j and y_i have the same dimension as p_ij, which is ‘money per unit’. Suppose y_i is the price per unit for stuff at the front door of factory i and z_j is the price per unit one is willing to pay for stuff in city j. Then there is a reasonable explanation for the constraint that z_j ≤ y_i + p_ij for each i, j. If citizens have complete knowledge of transportation costs and current factory prices, no citizen is willing to pay more for a unit of stuff than the cost at any given factory plus the amount it takes to transport from that factory to his or her city. So the feasible region of the dual problem may be interpreted as the range of market prices in a market where the buyers have complete information. The objective Σ_{j=1}^{l} z_j d_j − Σ_{i=1}^{k} y_i c_i is a lower bound for the total amount spent by citizens minus an upper bound for the total amount paid to factories, which is a lower bound for the amount of money spent on transportation.

It is not hard to verify formally that the dual maximum is less than or equal to the primal minimum: if x_ij ≥ 0 are such that Σ_i x_ij ≥ d_j for all j and Σ_j x_ij ≤ c_i for all i, and y_i ≥ 0, z_j ≥ 0 are such that z_j − y_i ≤ p_ij for all i, j, then

\[
\sum_{i,j} p_{ij} x_{ij} \;\ge\; \sum_{i,j} (z_j - y_i) x_{ij}
\;=\; \sum_{j} z_j \Big( \sum_{i} x_{ij} \Big) - \sum_{i} y_i \Big( \sum_{j} x_{ij} \Big)
\;\ge\; \sum_{j} z_j d_j - \sum_{i} y_i c_i .
\tag{81}
\]
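The chain of inequalities (81) can be checked numerically. The sketch below uses a small made-up instance (k = 2 factories, l = 3 cities); the numbers and the helper names `is_primal_feasible`, `is_dual_feasible` and `chain` are assumptions for illustration only, not data from the text.

```python
# Hypothetical instance: capacities c_i, demands d_j, unit costs p_ij.
capacity = [5, 7]
demand = [3, 4, 2]
cost = [[2, 4, 5],
        [3, 1, 6]]


def is_primal_feasible(x, capacity, demand):
    """Row sums within capacity, column sums covering demand, x >= 0."""
    k, l = len(capacity), len(demand)
    rows_ok = all(sum(x[i]) <= capacity[i] for i in range(k))
    cols_ok = all(sum(x[i][j] for i in range(k)) >= demand[j] for j in range(l))
    return rows_ok and cols_ok and all(v >= 0 for row in x for v in row)


def is_dual_feasible(y, z, cost):
    """Prices y, z >= 0 with z_j - y_i <= p_ij for all i, j."""
    return (all(v >= 0 for v in y) and all(v >= 0 for v in z)
            and all(z[j] - y[i] <= cost[i][j]
                    for i in range(len(y)) for j in range(len(z))))


def chain(x, y, z, cost, capacity, demand):
    """Return the four quantities of (81), from left to right."""
    k, l = len(capacity), len(demand)
    pairs = [(i, j) for i in range(k) for j in range(l)]
    transport = sum(cost[i][j] * x[i][j] for i, j in pairs)
    priced = sum((z[j] - y[i]) * x[i][j] for i, j in pairs)
    regrouped = (sum(z[j] * sum(x[i][j] for i in range(k)) for j in range(l))
                 - sum(y[i] * sum(x[i]) for i in range(k)))
    dual_value = (sum(z[j] * demand[j] for j in range(l))
                  - sum(y[i] * capacity[i] for i in range(k)))
    return transport, priced, regrouped, dual_value


prices_y, prices_z = [0, 0], [2, 1, 5]
plan_a = [[3, 0, 2], [0, 4, 0]]   # uses only routes with z_j - y_i = p_ij
plan_b = [[3, 2, 0], [0, 2, 2]]   # feasible, but uses more expensive routes
```

For `plan_a` all four quantities coincide, so this plan and these prices are optimal for their respective problems; for `plan_b` the first inequality of (81) is strict.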

The rational buyer at city j will select an i0 such that pij + yi is minimum, and buy from factory i0 at that price zj = pi0 j + yi0 . The factories may increase prices if the market allows doing so without this leading to sales less than production capacity, and conversely prices need to drop if sales are less than capacity — a factory cannot afford not using the production capacity they


pay for. The complementary slackness conditions state that a feasible x (a transportation plan to everyone’s capacity and satisfaction) and a feasible (y, z) (prices consistent with complete information) have the same objective value if and only if p_ij = z_j − y_i whenever x_ij > 0 (buyers are rational), Σ_i x_ij = d_j whenever z_j > 0 (if people buy more than they need it must be free, another form of rationality) and Σ_j x_ij = c_i whenever y_i > 0 (a nonzero price is only stable if the production capacity is met). If buyers are rational and the market is free, prices will be stable if and only if the three conditions above are met simultaneously. By complementary slackness, spending on transportation will then equal the cost of transportation. The strong duality theorem implies that such a stable set of prices exists in a market with complete information.

The above is of course fiction, not mathematics. We have basically invented explanations for the constraints and the objective of the dual, but in the end this does lead to a valid statement about the situation that is sketched this way, in this case that a properly functioning open market can be stable.

Exercises

(1) Prove: min{cx | Ax ≥ b, x ≥ 0, x ∈ R^n} = max{yb | yA ≤ c, y ≥ 0, y ∈ R^m} provided that the minimum, or the maximum, is both feasible and bounded (mimic the proof of Theorem 19). Give a ‘complementary slackness condition’ for a feasible pair x, y that is equivalent to the optimality of both x and y.
1 2 12 1 4 0 . Let P := {x ∈ R^2 | Ax ≤ b}. Draw the P in 1 (2) Let A = 1 −3 and b = −3 −1 −3 6 −2 2 the plane. Prove that cx ≤ d holds for all x ∈ P, for each of the following c ∈ R^2 and d ∈ R. (a) c = (−1, 0), d = 1, (b) c = (0, −1), d = 0, (c) c = (1, 1), d = 9, and (d) c = (−1, −2), d = −3. Hint: read Corollary 19.1.
(3) Finish the proof of Theorem 22.
(4) Let B be an optimal basis, i.e. a B satisfying (1)-(3) of Theorem 22.
    (a) Prove that {x ∈ R^n | a_i x = b_i for all i ∈ B} = x̂ + ker(A) for some x̂ ∈ R^n.
    (b) Prove that there is exactly one y such that yA = c and y_i = 0 for all i ∉ B.
(5) Prove that the transportation problem has an optimal solution with at most k + l − 1 nonzero variables.
(6) Prove Theorem 23.
(7) Construct the dual of the following linear optimization problems.
    (a) max{cx | Ax = b, l ≤ x ≤ u, x ∈ R^n}, where A is an m × n matrix, b ∈ R^m and c, l, u ∈ R^n.
    (b) min{f x | Ax = b, Cx ≤ d, x ∈ R^n}, where A is an m × n matrix, b ∈ R^m, C is a k × n matrix, d ∈ R^k, and f ∈ R^n.
(8) Prove that the dual of min{px | F x ≤ c, Gx ≥ d, x ≥ 0, x ∈ R^n} is equal to max{zd − yc | zG − yF ≤ p, y ≥ 0, z ≥ 0, y ∈ R^k, z ∈ R^l}.
(9) Determine the dual of:

    (a) max{−x_1 + 2x_2 + x_3 | 2x_1 + x_2 − x_3 ≤ −2, −x_1 + 4x_2 ≤ 3, x_2 − x_3 ≤ 0, x ∈ R^3};
    (b) max{5x_1 + x_3 | −x_1 + x_2 ≥ 2, x_1 + 4x_2 + x_3 ≤ 3, x_1, x_2, x_3 ≥ 0, x ∈ R^3}.
(10) Let A be an m × n matrix and let b ∈ R^m, and let P := {x ∈ R^n | Ax ≤ b}. Consider the problem of finding a ball B^n(x_0, r) := {x ∈ R^n | ‖x − x_0‖ ≤ r} with center x_0 and radius r, such that B^n(x_0, r) ⊆ P, and the radius r is as large as possible. Formulate this problem as a linear optimization problem. Determine the dual of this problem.
(11) Let L ⊆ R^n be a linear space and let s, t ∈ {1, . . . , n}, s ≠ t. Show that if
    (a) there exists an x ∈ L such that x_s > 0 and x_i ≥ 0 for all i ≠ s, t, and
    (b) there exists a y ∈ L⊥ such that y_t > 0 and y_i ≥ 0 for all i ≠ s, t,
    then max{x_t | x ∈ L, x_s = 1, x_i ≥ 0 for all i ≠ s, t} = − max{y_s | y ∈ L⊥, y_t = 1, y_i ≥ 0 for all i ≠ s, t}.
(12) The diet problem is the problem of finding a cheapest combination of foods that will satisfy all the daily nutritional needs of a person or animal. There are n types of food and m nutritional characteristics (vitamins, minerals, calories, dietary fibre, etc.). Given are a unit price c_i for each food i, required amount b_j for each j, and amount a_ij of j present in a unit of food i, for each i, j.
    (a) Formulate the diet problem as a linear optimization problem.
    (b) Determine the dual of this problem.
    (c) Interpret the constraints and variables of the dual.
(13) Tschebyshev approximation is the following. Given are row vectors a_1, . . . , a_m ∈ R^n and numbers b_1, . . . , b_m ∈ R. We seek a smallest t ∈ R such that there exists an x ∈ R^n with the property that |a_i x − b_i| ≤ t for i = 1, . . . , m. Formulate the Tschebyshev approximation problem as a linear optimization problem, and determine the dual of this problem.
(14) Structural optimization is the problem of finding a bars-and-joints structure capable of carrying a certain set of external forces on its joints, using as little material as possible. Given are a finite set J of possible joints, each joint j ∈ J having coordinates p_j ∈ R^3, an external force f_j ∈ R^3 acting on each joint j, and a set B of unordered pairs of joints. A subset K ⊆ J of the joints is fixed to the wall or floor. Between each pair of joints {i, j} ∈ B, you may place a bar of strength s_ij, meaning that such a bar can withstand a force ≤ s_ij pushing together or pulling apart its ends i and j. The amount of material needed for a bar is proportional to both its length and its required strength. The bars act like springs: when a framework is subject to an external force, the joints are displaced slightly (the displacement is negligible), compressing or pulling the bars, which respond by exerting an (opposite) force on the joints they connect. Equilibrium is reached when the forces in each joint add up to zero.
    (a) Let x_ij be the force with which bar {i, j} pushes its endpoints apart (when x_ij is negative the bar pulls its endpoints together). Give the equations that describe when a framework is in equilibrium. Can there be more than one solution?
    (b) Give an expression for the total amount of material needed in a framework with bars of strength s_ij.
    (c) Formulate the structural optimization problem as a linear optimization problem with variables x_ij and s_ij for each {i, j} ∈ B. Can there be more than one optimal solution?
    (d) Determine the dual of this problem. Give a mechanical interpretation of the variables and constraints.
    (e) Suppose now that a type of rope is available capable of holding any force pulling its ends apart, of negligible cost. Formulate the problem that arises now.
    (f) Can you incorporate the effect of gravity in your model?

CHAPTER 3

Polyhedra
1. Polyhedra and polytopes

1.1. Polyhedra. A set P ⊆ R^n is a polyhedron if P = {x ∈ R^n | Ax ≤ b} for some m × n matrix A and vector b ∈ R^m, i.e. a polyhedron is the solution set of a system of finitely many linear inequalities. The set of solutions to just one linear inequality ax ≤ b is the closed affine halfspace, or just halfspace, H^≤_{a,b} := {x ∈ R^n | ax ≤ b}. Halfspaces are clearly both closed and convex. Since a polyhedron is the intersection of finitely many halfspaces, it follows that all polyhedra are closed, convex sets.

1.2. Polytopes. We say that y ∈ R^n is a convex combination of x_1, . . . , x_k ∈ R^n, if there exist λ_1, . . . , λ_k ≥ 0 such that y = λ_1 x_1 + · · · + λ_k x_k and λ_1 + · · · + λ_k = 1. The line segment [x, y], for example, is the set of all convex combinations of x and y.

Lemma 24. If C ⊆ R^n is a convex set and x_1, . . . , x_k ∈ C then any convex combination of x_1, . . . , x_k is in C.

The convex hull of a set of vectors X ⊆ R^n is the set of all convex combinations of finite sets of vectors in X:

conv.hull X := {λ_1 x_1 + · · · + λ_k x_k | k ∈ N, x_i ∈ X, λ_1 + · · · + λ_k = 1, λ_i ≥ 0}.

It is clear from Lemma 24 that any convex set containing X will contain the convex hull of X. The convex hull of any set is, as its name suggests, a convex set. A polytope [1] is the convex hull of a finite set of points, i.e. P is a polytope if there are k vectors x_1, . . . , x_k such that

\[ P = \text{conv.hull}\,\{x_1, \ldots, x_k\} = \{ \lambda_1 x_1 + \cdots + \lambda_k x_k \mid \lambda_1 + \cdots + \lambda_k = 1 \text{ and } \lambda_1, \ldots, \lambda_k \ge 0 \}. \tag{82} \]

We shall prove the following theorem in the next section. Meanwhile, see Figure 1.

Theorem 25. Let P ⊆ R^n. Then P is a polytope if and only if P is a bounded polyhedron.

[1] The terminology is from the Greek: poly=many, hedron=side, topos=point.

Figure 1. A bounded polyhedron, or polytope P ⊆ R3
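As a small illustration of Lemma 24 for a polyhedron, the sketch below forms a convex combination of points of the unit square and checks that it still satisfies Ax ≤ b. The helper names `in_polyhedron` and `convex_combination` and the example data are assumptions made for this illustration; exact rational arithmetic is used so that the weights sum to exactly 1.

```python
from fractions import Fraction


def in_polyhedron(A, b, x):
    """Check Ax <= b row by row."""
    return all(sum(a * xi for a, xi in zip(row, x)) <= bi
               for row, bi in zip(A, b))


def convex_combination(points, weights):
    """Return sum of weights[i] * points[i]; weights must be >= 0 and sum to 1."""
    if any(w < 0 for w in weights) or sum(weights) != 1:
        raise ValueError("weights must be nonnegative and sum to 1")
    n = len(points[0])
    return tuple(sum(w * p[i] for w, p in zip(weights, points)) for i in range(n))


# The unit square {x in R^2 | 0 <= x1 <= 1, 0 <= x2 <= 1} written as Ax <= b.
A = [[1, 0], [-1, 0], [0, 1], [0, -1]]
b = [1, 0, 1, 0]
points = [(0, 0), (1, 0), (0, 1)]
weights = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
y = convex_combination(points, weights)   # the point (1/4, 1/4)
```

Each row of Ax ≤ b is preserved under convex combinations, which is why the check succeeds for any choice of valid weights.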

Figure 2. Lineality space and cone of directions (panels: an element d of the lineality space, an element d of the cone of directions).

1.3. Lineality space and cone of directions. Let P ⊆ R^n be a convex set. The lineality space of P is the linear space

\[ \text{lin}(P) := \{ d \in \mathbb{R}^n \mid \{x + \lambda d \mid \lambda \in \mathbb{R}\} \subseteq P \text{ for all } x \in P \}, \tag{83} \]

and the cone of directions of P is

\[ \text{dir}(P) := \{ d \in \mathbb{R}^n \mid \{x + \lambda d \mid \lambda \in \mathbb{R},\ \lambda \ge 0\} \subseteq P \text{ for all } x \in P \}. \tag{84} \]

See Figure 2. It follows directly from these definitions that lin(P) ⊆ dir(P). It is an exercise to show that if P = {x ∈ R^n | Ax ≤ b}, then lin(P) = {d ∈ R^n | Ad = 0} = ker(A) and dir(P) = {d ∈ R^n | Ad ≤ 0}. If P is a polytope, then lin(P) = dir(P) = {0}.

1.4. Affine hull and dimension. Let x, y ∈ R^n. The line through x and y is the set

\[ \langle x, y \rangle := \{ x + \lambda (y - x) \mid \lambda \in \mathbb{R} \}. \tag{85} \]

A set W ⊆ R^n is affine if ⟨x, y⟩ ⊆ W for all x, y ∈ W. We say that y ∈ R^n is an affine combination of x_1, . . . , x_k if there exist λ_1, . . . , λ_k ∈ R so that

\[ y = \lambda_1 x_1 + \cdots + \lambda_k x_k \quad\text{and}\quad \lambda_1 + \cdots + \lambda_k = 1. \tag{86} \]

The affine hull of a set X ⊆ R^n is the set of all affine combinations of points in X:

\[ \text{aff.hull}(X) := \{ \lambda_1 x_1 + \cdots + \lambda_k x_k \mid k \in \mathbb{N},\ x_i \in X,\ \lambda_1 + \cdots + \lambda_k = 1,\ \lambda_i \in \mathbb{R} \}, \tag{87} \]

and one can show that

\[ \text{aff.hull}(X) = \bigcap \{ W \mid W \supseteq X,\ W \text{ affine} \}, \tag{88} \]

i.e. the affine hull of X is the intersection of all affine spaces containing X; it can furthermore be shown that aff.hull(X) is an affine space itself (exercise). A finite set {x_1, . . . , x_k} ⊆ R^n is affinely independent if for all λ_1, . . . , λ_k ∈ R we have

\[ \lambda_1 x_1 + \cdots + \lambda_k x_k = 0 \text{ and } \lambda_1 + \cdots + \lambda_k = 0 \;\Rightarrow\; \lambda_1 = \cdots = \lambda_k = 0. \tag{89} \]

The dimension of a set X ⊆ R^n is defined as the maximum cardinality of an affinely independent subset, minus 1: dim(X) := max{|I| | I ⊆ X, I affinely independent} − 1. It can be shown that dim(X) = dim(aff.hull(X)) for any X ⊆ R^n.
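For a finite point set, the dimension just defined can be computed by linear algebra. The sketch below relies on the fact (stated as an exercise at the end of this chapter) that a finite set is affinely independent exactly when the lifted vectors (x, 1) are linearly independent, so that dim(X) is the rank of the lifted vectors minus 1. The function names are hypothetical, and Gaussian elimination over exact rationals is just one convenient way to obtain the rank.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix (list of rows) by Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    ncols = len(m[0]) if m else 0
    r = 0  # number of pivots found so far
    for col in range(ncols):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [a - factor * p for a, p in zip(m[i], m[r])]
        r += 1
    return r


def affine_dimension(points):
    """dim(X) for a finite X: rank of the lifted vectors (x, 1), minus 1."""
    lifted = [list(p) + [1] for p in points]
    return rank(lifted) - 1


def affinely_independent(points):
    """A finite set is affinely independent iff its dimension is |X| - 1."""
    return affine_dimension(points) == len(points) - 1


square = [(0, 0), (1, 0), (0, 1), (1, 1)]      # corners of a square in R^2
collinear = [(0, 0), (1, 1), (2, 2)]           # three points on one line
simplex = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]    # standard basis vectors of R^3
```

With this convention the empty set gets dimension −1, in line with the definition as a maximum cardinality minus 1.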

Figure 3. Faces of a polytope P ⊆ R^3 (panels: a vertex, an edge, a facet).

1.5. Faces. Let P ⊆ R^n be a convex set, and let c ∈ R^n, d ∈ R. The hyperplane H_{c,d} := {x ∈ R^n | cx = d} is a supporting hyperplane of P if max{cx | x ∈ P} = d. In geometrical terms: H is a supporting hyperplane of P if all of P is on one side of H, but H touches P. We say that F ⊆ R^n is a face of P if F = P, or F = P ∩ H for some supporting hyperplane H of P; in the latter case, H is said to define F. Clearly, the face F = P ∩ H_{c,d} is exactly the set of optimal solutions to max{cx | x ∈ P}.

Let P be a polyhedron. A point v ∈ P is a vertex of P if {v} is a face of P. We denote the set of all vertices of P by V(P). An edge of a polyhedron is a face F such that dim(F) = 1. A face F is a facet of P if dim(F) = dim(P) − 1. See Figure 3.

1.6. Examples of polyhedra and polytopes.
(1) The n-simplex S_n is the convex hull of the n + 1 standard basis vectors in R^{n+1}:
    S_n := conv.hull {e_1, . . . , e_{n+1}}.
    The dimension of S_n is n. We have S_n = {x ∈ R^{n+1} | x ≥ 0, x_1 + · · · + x_{n+1} = 1}.
(2) The n-cube Q_n is the convex hull of all possible vectors with each entry ±1:
    Q_n := conv.hull {−1, 1}^n.
    We have Q_n = {x ∈ R^n | −1 ≤ x_i ≤ 1 for i = 1, . . . , n}. The dimension of Q_n is n.
(3) The n-orthoplex O_n is the convex hull of ± all standard basis vectors in R^n:
    O_n := conv.hull {e_1, −e_1, . . . , e_n, −e_n}.
    We have O_n = {x ∈ R^n | cx ≤ 1 for all c ∈ {−1, 1}^n}. The dimension of O_n is n.
(4) The 5 platonic solids in R^3 are polyhedra. The tetrahedron is isomorphic to S_3, the hexahedron or cube is Q_3, and the octahedron or cross polytope is O_3. The other two are the dodecahedron (20 vertices, 30 edges, 12 pentagonal facets) and the icosahedron (12 vertices, 30 edges, 20 triangular facets).
(5) There are 6 special polyhedra in R^4 (polychora [2]): the pentachoron, which is isomorphic to S_4, the octachoron or tesseract Q_4, the cross polychoron O_4, the 24-cell whose 24 facets are all isomorphic to the octahedron, the 120-cell whose 120 facets are all isomorphic to the dodecahedron, and the 600-cell whose 600 facets are tetrahedra.

[2] A ‘polychoron’ is a 4-dimensional polyhedron. Greek: choros=space, room.
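The two descriptions of the orthoplex O_n in example (3) can be compared on concrete points. The sketch below is an illustration under stated assumptions, not a proof: `in_orthoplex_by_inequalities` tests the 2^n inequalities cx ≤ 1 directly, while `orthoplex_weights` (a hypothetical helper) writes a point with |x_1| + · · · + |x_n| ≤ 1 as an explicit convex combination of e_1, −e_1, . . . , e_n, −e_n by putting any leftover weight equally on e_1 and −e_1, where it cancels.

```python
from fractions import Fraction
from itertools import product


def in_orthoplex_by_inequalities(x):
    """Check cx <= 1 for every sign vector c in {-1, 1}^n."""
    return all(sum(c * xi for c, xi in zip(signs, x)) <= 1
               for signs in product((-1, 1), repeat=len(x)))


def orthoplex_weights(x):
    """Weights on (e1, -e1, ..., en, -en) giving x, or None if sum |x_i| > 1."""
    x = [Fraction(v) for v in x]
    leftover = 1 - sum(abs(v) for v in x)
    if leftover < 0:
        return None
    weights = []
    for v in x:
        weights += [max(v, Fraction(0)), max(-v, Fraction(0))]
    # Leftover weight is split over e1 and -e1; the two contributions cancel.
    weights[0] += leftover / 2
    weights[1] += leftover / 2
    return weights


def combine(weights, n):
    """Evaluate the combination of (e1, -e1, ..., en, -en) with the given weights."""
    point = [Fraction(0)] * n
    for index, w in enumerate(weights):
        i, negative = divmod(index, 2)
        point[i] += -w if negative else w
    return point


inside = (Fraction(1, 2), Fraction(-1, 4), 0)
outside = (1, 1, 0)
w = orthoplex_weights(inside)
```

The maximum of cx over sign vectors c equals |x_1| + · · · + |x_n|, which is why both tests agree on these points.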


2. Faces, vertices and facets

2.1. Faces of polyhedra. The aim of this section is to show one direction of Theorem 25, that a bounded polyhedron is a polytope; precisely, we show that a bounded polyhedron is always the convex hull of its set of vertices.

Lemma 26. Let A be an m × n matrix, let b ∈ R^m and let P := {x ∈ R^n | Ax ≤ b}. Let a_1, . . . , a_m be the rows of A. Then the following are equivalent for a nonempty set F ⊆ P:
(1) F is a face of P; and
(2) F = {x ∈ P | a_i x = b_i for all i ∈ J} for some J ⊆ {1, . . . , m}.

Proof. Let F be a face of P. Then F is the set of optimal solutions of max{cx | Ax ≤ b} for some c ∈ R^n. Let y be an optimal solution of the dual problem min{yb | yA = c, y ≥ 0}. By complementary slackness, x is an optimal solution of max{cx | Ax ≤ b} if and only if x is feasible and y_i > 0 ⇒ a_i x = b_i for all i. In other words, F = {x ∈ P | a_i x = b_i for all i ∈ J} with J := {i ∈ {1, . . . , m} | y_i > 0}.

Now assume that F = {x ∈ P | a_i x = b_i for all i ∈ J} ≠ ∅ for some J ⊆ {1, . . . , m}. Let c := Σ_{i∈J} a_i and d := Σ_{i∈J} b_i. Then

\[ cx = \sum_{i \in J} a_i x \le \sum_{i \in J} b_i = d \tag{90} \]

for all x ∈ P, with equality if and only if a_i x = b_i for all i ∈ J, i.e. if and only if x ∈ F. Since F ≠ ∅, it follows that F is the set of optimal solutions of max{cx | Ax ≤ b}.

It follows directly that a polyhedron has only finitely many faces, as there are finitely many subsets J of {1, . . . , m}. In particular, a polyhedron has only finitely many vertices, and any face of a polyhedron is again a polyhedron.

Lemma 27. Let P ⊆ R^n be a polyhedron and let c ∈ R^n. If max{cx | x ∈ P} is feasible and bounded, and lin(P) = {0}, then max{cx | x ∈ P} is attained by a vertex of P.

Proof. Let A and b be such that P = {x ∈ R^n | Ax ≤ b}, and let a_1, . . . , a_m be the rows of A. Then ker(A) = lin(P) = {0}, hence rank(A) = n. By Theorem 22 there exists a set B ⊆ {1, . . . , m}, so that F := {x ∈ R^n | a_i x = b_i for all i ∈ B} is a set of optimal solutions of max{cx | x ∈ P}. In particular, all x ∈ F are feasible, so that F ⊆ P. By Lemma 26, F is a face of P. Since ker(A) = lin(P) = {0}, we have rank(A) = n and hence F = {x̂} for some x̂. Then x̂ is a vertex of P and an optimal solution of max{cx | x ∈ P}.

We say that a polyhedron is pointed if it has at least one vertex. One can show using the above Lemma that a polyhedron P is pointed if and only if lin(P) = {0}.

Theorem 28. If P ⊆ R^n is a bounded polyhedron, then P = conv.hull V(P).

Proof. We may assume that P is nonempty. Since V(P) ⊆ P, it follows directly that conv.hull V(P) ⊆ P. We will show that P ⊆ conv.hull V(P) as well. Suppose this is not true, and let x̂ ∈ P \ conv.hull V(P). By Theorem 4, there is a vector c ∈ R^n and a d ∈ R such that the hyperplane H_{c,d} separates conv.hull V(P) and x̂, say cx < d for all x ∈ conv.hull V(P) and cx̂ > d. As P is bounded, we know that lin(P) = {0}, and that max{cx | x ∈ P} is attained. Then

\[ \max\{cx \mid x \in V(P)\} \le \max\{cx \mid x \in \text{conv.hull}\, V(P)\} < d < c\hat{x} \le \max\{cx \mid x \in P\}. \tag{91} \]

But by Lemma 27 there is a vertex of P attaining the latter maximum, a contradiction.
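Lemma 27 suggests a brute-force way to optimize over a small bounded polyhedron: list the vertices and compare objective values. The sketch below is an assumption-laden illustration, not an efficient algorithm. It uses the characterization of vertices given as an exercise at the end of this chapter (a feasible point that is the unique solution of a_i x = b_i for n linearly independent rows), and the function names and the example polyhedron are made up.

```python
from fractions import Fraction
from itertools import combinations


def solve_square(M, rhs):
    """Solve M x = rhs exactly by Gauss-Jordan elimination; None if M is singular."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        pivot = next((i for i in range(col, n) if aug[i][col] != 0), None)
        if pivot is None:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for i in range(n):
            if i != col and aug[i][col] != 0:
                factor = aug[i][col]
                aug[i] = [a - factor * p for a, p in zip(aug[i], aug[col])]
    return tuple(row[n] for row in aug)


def vertices(A, b):
    """All feasible points determined by n linearly independent tight rows."""
    n = len(A[0])
    found = set()
    for B in combinations(range(len(A)), n):
        x = solve_square([A[i] for i in B], [b[i] for i in B])
        if x is not None and all(
                sum(a * xi for a, xi in zip(row, x)) <= bi
                for row, bi in zip(A, b)):
            found.add(x)
    return found


def maximize_over_vertices(c, A, b):
    """A vertex maximizing cx; assumes the polyhedron is nonempty and bounded."""
    return max(vertices(A, b), key=lambda v: sum(ci * vi for ci, vi in zip(c, v)))


# 0 <= x1 <= 2, 0 <= x2 <= 1, x1 + x2 <= 2.
A = [[1, 0], [0, 1], [-1, 0], [0, -1], [1, 1]]
b = [2, 1, 0, 0, 2]
best = maximize_over_vertices((1, 2), A, b)
```

The number of candidate row sets grows like m choose n, so this only serves to make the statements of Lemma 27 and Theorem 28 concrete on toy examples.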


2.2. Faces of polytopes. We will now complete the proof of Theorem 25, and show that a polytope is always a bounded polyhedron, i.e. that for any set of points x1 , . . . , xk ∈ Rn there is a system of finitely many linear inequalities Ax ≤ b so that conv.hull {x1 , . . . , xk } = {x ∈ Rn | Ax ≤ b}. Let X ⊆ Rn . We say that an inequality cx ≤ d is valid for X if cx ≤ d holds for all x ∈ X . For the polyhedral description of our polytope P = conv.hull {x1 , . . . , xk }, we need a finite collection of inequalities that are valid for P . Clearly cx ≤ d is valid for P if and only if cx ≤ d is valid for {x1 , . . . , xk } (exercise). Since a polytope P is both convex and compact, the Separation Theorem (Theorem 4) implies that the set of all valid inequalities is an adequate description of P : (92) P = {x ∈ Rn | cx ≤ d for all c, d so that cxi ≤ d for all i}.

But this is an infinite set of linear inequalities; so we must argue that most of these valid inequalities are superfluous for the description of P. We will show that the only valid inequalities cx ≤ d we really need for the description of P are such that P ∩ H_{c,d} is a facet; we call such inequalities facet-defining or essential.

Lemma 29. Let y, x_1, . . . , x_k ∈ R^n, and let P := conv.hull {x_1, . . . , x_k}. Suppose that dim(P) = n. If y ∉ P, then there is a vector c ∈ R^n and a d ∈ R such that cx_i ≤ d for i = 1, . . . , k, cy > d, and P ∩ H_{c,d} is a facet of P.

Proof. Using that y ∈ conv.hull {x_1, . . . , x_k} if and only if

\[ \begin{bmatrix} y \\ 1 \end{bmatrix} \in \text{cone}\left\{ \begin{bmatrix} x_1 \\ 1 \end{bmatrix}, \ldots, \begin{bmatrix} x_k \\ 1 \end{bmatrix} \right\}, \tag{93} \]

and that {x_1, . . . , x_k} is affinely independent if and only if the lifted vectors (x_1, 1), . . . , (x_k, 1) are linearly independent, it follows from Theorem 12 that if y ∉ conv.hull {x_1, . . . , x_k}, then there is an affinely independent set Y ⊆ {x_1, . . . , x_k}, a vector c ∈ R^n and a d ∈ R such that
(1) cx_i ≤ d for i = 1, . . . , k, cy > d, and
(2) cx = d for all x ∈ Y, and |Y| = dim{y, x_1, . . . , x_k}.
Let F := P ∩ H_{c,d}. To show that F is a facet of P, it suffices to show that dim(F) = n − 1, as dim(P) = n by assumption. Observe that

\[ Y \subseteq F \subseteq H_{c,d}, \tag{94} \]

so that dim(Y) ≤ dim(F) ≤ dim(H_{c,d}). Since Y is affinely independent, we have dim(Y) = |Y| − 1 = dim{y, x_1, . . . , x_k} − 1 = n − 1, and clearly dim(H_{c,d}) = n − 1.

Theorem 30. Let x_1, . . . , x_k ∈ R^n, and let P := conv.hull {x_1, . . . , x_k}. Let Ax ≤ b be a system of valid linear inequalities for P so that for each facet F of P there is a row a_i x ≤ b_i of Ax ≤ b such that F = P ∩ H_{a_i,b_i}. Let Cx = d be a system of linear equations such that aff.hull(P) = {x ∈ R^n | Cx = d}. Then P = {x ∈ R^n | Ax ≤ b, Cx = d}.

Proof. Let us first assume that dim(P) = n, so aff.hull(P) = R^n. Clearly P ⊆ {x ∈ R^n | Ax ≤ b}; we need to show that P ⊇ {x ∈ R^n | Ax ≤ b}. Suppose y ∉ P. By Lemma 29, there is a valid inequality cx ≤ d for P with cy > d, such that F := P ∩ H_{c,d} is a facet of P. Let a_i x ≤ b_i be the valid inequality from Ax ≤ b such that F = P ∩ H_{a_i,b_i}. Then H_{a_i,b_i} = aff.hull(F) = H_{c,d}. It follows that a_i y > b_i, and hence y ∉ {x ∈ R^n | Ax ≤ b}. The case where dim(P) < n is left to the reader.


3. Polyhedral cones

3.1. Polyhedral cones and finitely generated cones. We say that a cone C ⊆ R^n is polyhedral if there is an m × n matrix A such that C = {x ∈ R^n | Ax ≥ 0}. A cone C ⊆ R^n is finitely generated if there are vectors a_1, . . . , a_m ∈ R^n such that C = cone {a_1, . . . , a_m}; then a_1, . . . , a_m are generators of C. We have:

Lemma 31. Let a_1, . . . , a_m ∈ R^n. Then

\[ C = \{ x \in \mathbb{R}^n \mid a_1^t x \ge 0, \ldots, a_m^t x \ge 0 \} \;\Rightarrow\; C^* = \text{cone}\,\{a_1, \ldots, a_m\}. \tag{95} \]

Proof. Let C = {x ∈ R^n | a_1^t x ≥ 0, . . . , a_m^t x ≥ 0}. It is easy to see that then C* ⊇ cone {a_1, . . . , a_m}. Suppose that y ∈ C* \ cone {a_1, . . . , a_m}. By Farkas’ Lemma, there exists an x ∈ R^n such that a_i^t x ≥ 0 for all i and y^t x < 0. Then x ∈ C, y ∈ C*, and y^t x < 0, contradicting the definition of polar cone. So C* = cone {a_1, . . . , a_m}.

In other words, if a cone is polyhedral, then its polar is finitely generated. This saves us half the work in proving the next theorem. The ‘if’ part is due to Hermann Weyl (1935) and the ‘only if’ part is due to Hermann Minkowski (1896).

Theorem 32. Let C be a cone. Then C is polyhedral if and only if C is finitely generated.

Proof. We first show that if C is finitely generated, then C is polyhedral. Let C = cone {a_1, . . . , a_m} for certain vectors a_1, . . . , a_m ∈ R^n. We distinguish two cases, the case that rank({a_1, . . . , a_m}) = n and the case that rank({a_1, . . . , a_m}) < n.

We consider first the case where rank({a_1, . . . , a_m}) = n; then lin.hull {a_1, . . . , a_m} = R^n. Call a row vector d ∈ R^n essential if there is a linearly independent set Y ⊆ {a_1, . . . , a_m} such that
(1) da_i ≥ 0 for all i, and
(2) da = 0 for all a ∈ Y, and |Y| = n − 1,
and such that ‖d‖ = 1. There are at most 2 such d’s for each subset Y. Since there are finitely many subsets Y of {a_1, . . . , a_m}, it follows that there are finitely many essential vectors. Thus D := {x ∈ R^n | dx ≥ 0, d essential} is a polyhedral cone. We claim that C = D. Clearly C ⊆ D, since da_i ≥ 0 for all essential d and all i. If x ∉ C, then there is an essential d such that dx < 0 by Theorem 12, hence x ∉ D. It follows that C = D.

Now suppose that rank({a_1, . . . , a_m}) < n. Let L := lin.hull {a_1, . . . , a_m}, and let b_1, . . . , b_k be a set of linearly independent vectors such that L = {x ∈ R^n | b_i^t x = 0, i = 1, . . . , k}. Then rank({a_1, . . . , a_m, b_1, . . . , b_k}) = n, and hence the cone C′ = cone {a_1, . . . , a_m, b_1, . . . , b_k} is polyhedral. Since C = {x ∈ C′ | b_i^t x = 0, i = 1, . . . , k}, the cone C is polyhedral as well.

To see that each polyhedral cone is finitely generated, note that if C is polyhedral, then C* is finitely generated (by the Lemma), and hence C* is polyhedral (by the above argument), hence C** = C is finitely generated (by the Lemma applied to C*).

Theorem 25 is a corollary to this theorem.

Corollary 32.1. Let P ⊆ R^n. Then P is a bounded polyhedron if and only if P is a polytope.

Proof. We first show that any bounded polyhedron is a polytope. Let P be a bounded polyhedron. Then P = {x ∈ R^n | Ax ≤ b} for some m × n matrix A and vector b ∈ R^m. The cone C := {x′ ∈ R^{n+1} | [A −b] x′ ≤ 0} is polyhedral, and we have

\[ P = \left\{ x \in \mathbb{R}^n \;\middle|\; \begin{bmatrix} x \\ 1 \end{bmatrix} \in C \right\}. \tag{96} \]


Note that if (d, 0) ∈ C, then Ad ≤ 0 and hence d ∈ dir(P). Since we assumed that P is bounded, there are no such d other than d = 0. By the theorem, there are a_1, . . . , a_k ∈ R^{n+1} such that C = cone {a_1, . . . , a_k}. As a_i ∈ C for all i, the last coordinate of each a_i is nonzero. Let a′_i := (1/a_{i,n+1}) a_i, where a_{i,n+1} denotes the last coordinate of a_i. Then we may write a′_i = (x_i, 1) for i = 1, . . . , k, and we have

\[ C = \text{cone}\left\{ \begin{bmatrix} x_1 \\ 1 \end{bmatrix}, \ldots, \begin{bmatrix} x_k \\ 1 \end{bmatrix} \right\}. \tag{97} \]

It follows that P = conv.hull {x_1, . . . , x_k}.

We now show that each polytope is a bounded polyhedron. Let P be a polytope. Then P = conv.hull {x_1, . . . , x_k} for some x_1, . . . , x_k ∈ R^n. The cone

\[ C := \text{cone}\left\{ \begin{bmatrix} x_1 \\ 1 \end{bmatrix}, \ldots, \begin{bmatrix} x_k \\ 1 \end{bmatrix} \right\} \tag{98} \]

is finitely generated, and we have

\[ P = \left\{ x \in \mathbb{R}^n \;\middle|\; \begin{bmatrix} x \\ 1 \end{bmatrix} \in C \right\}. \tag{99} \]

By the theorem C is polyhedral, so there is an m × (n + 1) matrix A′ such that C = {x′ ∈ R^{n+1} | A′x′ ≤ 0}. Writing A′ = [A −b], it follows that P = {x ∈ R^n | Ax ≤ b}.

Exercises

(1) Show that if C ⊆ R^n is convex and y ∈ R^n is a convex combination of x_1, . . . , x_k ∈ C, then y ∈ C.
(2) Show that the convex hull of any set is convex.
(3) Show that the lineality space of a convex set is a linear space.
(4) Show that if P = {x ∈ R^n | Ax ≤ b} ≠ ∅, the following sets are equal:
    (a) {d ∈ R^n | {x + λd | λ ∈ R} ⊆ P for all x ∈ P} (=: lin(P)).
    (b) {d ∈ R^n | there is an x ∈ P such that {x + λd | λ ∈ R} ⊆ P}.
    (c) {d ∈ R^n | Ad = 0}.
    Hint: prove (a) ⊆ (b) ⊆ (c) ⊆ (a).
(5) Show that the cone of directions of a convex set is a cone.
(6) Show that if P = {x ∈ R^n | Ax ≤ b} ≠ ∅, the following sets are equal:
    (a) {d ∈ R^n | {x + λd | λ ≥ 0, λ ∈ R} ⊆ P for all x ∈ P} (=: dir(P)).
    (b) {d ∈ R^n | there is an x ∈ P such that {x + λd | λ ≥ 0, λ ∈ R} ⊆ P}.
    (c) {d ∈ R^n | Ad ≤ 0}.
    Hint: prove (a) ⊆ (b) ⊆ (c) ⊆ (a).
(7) Show that if P is a polytope, then P = conv.hull V(P).
(8) Prove Radon’s Theorem: If X is a set of at least n + 2 vectors in R^n, then there exists a partition X_1, X_2 of X (i.e. X_1 ∪ X_2 = X and X_1 ∩ X_2 = ∅) so that conv.hull(X_1) ∩ conv.hull(X_2) ≠ ∅.
(9) Show that the following are equivalent for a set W ⊆ R^n.
    (a) W is affine, i.e. ⟨x, y⟩ ⊆ W for all x, y ∈ W.
    (b) W = p + L for some p ∈ R^n and linear space L ⊆ R^n.
    (c) W = {x ∈ R^n | Ax = b} for some m × n matrix A and vector b ∈ R^m.
    Prove that if W satisfies one, and hence each of the above, then dim(W) = dim(L) = n − rank(A).


(10) Let X ⊆ R^n.
    (a) Show that aff.hull(X) = ∩{W | W ⊇ X, W affine}.
    (b) Show that aff.hull(X) is affine.
    (c) Show that dim(X) = dim(aff.hull(X)).
(11) Show that a finite set X ⊆ R^n is affinely independent if and only if the associated set {(x, 1) | x ∈ X} is linearly independent in R^{n+1}.
(12) Let X ⊆ R^n, and let x ∈ R^n. Prove: if x ∈ conv.hull(X), then there exists an affinely independent set Y ⊆ X so that x ∈ conv.hull(Y).
(13) Prove Barany’s Theorem: Let X_1, . . . , X_{n+1} be finite subsets of R^n so that 0 ∈ conv.hull(X_i) for all i. Then there exist x_1 ∈ X_1, x_2 ∈ X_2, . . . , x_{n+1} ∈ X_{n+1} so that 0 ∈ conv.hull {x_1, . . . , x_{n+1}}.
(14) Prove that a polytope of dimension n has at least n + 1 vertices.
(15) Let P be a polyhedron and F a face of P. Prove that F is a facet of P if and only if the only face of P that properly contains F is P.
(16) Let P be a polyhedron and F a face of P. Prove that F is a minimal face (i.e. F does not properly contain another face of P) if and only if F = p + lin(P) for some vector p.
(17) Prove the following statement: Let A be an m × n matrix, let b ∈ R^m and let P := {x ∈ R^n | Ax ≤ b}. Let a_1, . . . , a_m be the rows of A. Then the following are equivalent for a point v ∈ P:
    (a) v is a vertex of P; and
    (b) {v} = {x ∈ P | a_i x = b_i for all i ∈ B} for some B ⊆ {1, . . . , m} such that {a_i | i ∈ B} is a basis of R^n.
    Deduce that a polyhedron defined by m inequalities has no more than m choose n vertices.
(18) We say that a polyhedron P is pointed if it has at least one vertex. Show that P is pointed if and only if lin(P) = {0}.
(19) Let C be a closed convex set and let x lie on the boundary of C. Show that there is a face F of C with dim(F) < dim(C) containing x.
(20) Let P ⊆ R^n be a polytope with dim(P) = n. Show:
    (a) ∂P = ∪{F | F is a facet of P}, where ∂P denotes the boundary of P.
    (b) if y ∉ P there are c ∈ R^n, d ∈ R such that cx ≤ d for all x ∈ P, cy > d, and P ∩ H_{c,d} is a facet of P.
(21) Let C be a convex set. A point v ∈ C is extreme if there are no x, x′ ∈ C \ {v} so that v ∈ [x, x′]. Show:
    (a) If C is a compact convex set, then C is the convex hull of its extreme points.
    (b) If P is a polyhedron and v ∈ P, then v is extreme if and only if v is a vertex of P.
    Conclude that a bounded polyhedron is the convex hull of its vertices.
(22) Let C be a closed convex set. A subset F ⊆ C is extreme if for all v ∈ F and x, x′ ∈ C \ {v} such that v ∈ [x, x′], we have x, x′ ∈ F. Show that F is an extreme set of C if and only if F is a face of C.
(23) Verify that
    (a) S_n = {x ∈ R^{n+1} | x ≥ 0, x_1 + · · · + x_{n+1} = 1}.
    (b) Q_n = {x ∈ R^n | −1 ≤ x_i ≤ 1 for i = 1, . . . , n}.
    (c) O_n = {x ∈ R^n | cx ≤ 1 for all c ∈ {−1, 1}^n}.


  Hint: that the polytope is included in the polyhedron is easy to verify; to show the converse, either prove that all vertices of the polyhedron occur in the definition of the polytope, or prove that all essential inequalities of the polytope are in the definition of the polyhedron.
(24) Let $x_1, \ldots, x_k \in \mathbb{R}^n$ and $P := \text{conv.hull}\{x_1, \ldots, x_k\}$. Show that $V(P) \subseteq \{x_1, \ldots, x_k\}$ and $P = \text{conv.hull}\, V(P)$.
(25) Prove Euler's formula: for any polytope $P \subseteq \mathbb{R}^n$, we have $\sum_{i=0}^{n} (-1)^i f_i(P) = 1$, where $f_i(P)$ is the number of $i$-dimensional faces of $P$.
(26) A regular polytope may be defined recursively as a polytope whose vertices lie on a sphere, whose edges are all of the same length and whose facets are all isomorphic to the same regular polytope. Each of the examples in subsection 1.6 of this Chapter is a regular polytope.
  (a) Which are the 2-dimensional regular polytopes?
  (b) Prove that there are no other regular polytopes of dimension 3 than the 5 Platonic solids. Hint: use Euler's formula and your classification of the 2-dimensional regular polytopes.
  (c) Prove that there are no other regular polytopes of dimension 4 than the 6 polychora described in subsection 1.6.
  (d) Prove that if $P$ is a regular polytope of dimension $n > 4$, then $P = S_n$, $Q_n$, or $O_n$.
  (e) A polytope $P$ is vertex-transitive if for any two vertices $v, w \in V(P)$ there is an orthogonal transformation $L : \mathbb{R}^n \to \mathbb{R}^n$ such that $L(v) = w$ and $L[P] = P$. Are all regular polytopes vertex-transitive?

Open problem. Let $P$ be a polytope, and let $v, w \in V(P)$ be two vertices of $P$. A path from $v$ to $w$ is a sequence of vertices $v^0, \ldots, v^k \in V(P)$, such that $[v^i, v^{i+1}]$ is an edge of $P$ for each $i = 0, \ldots, k - 1$, and $v = v^0$, $w = v^k$; the length of that path is $k$, i.e. the length of a path equals the number of edges it traverses.

Conjecture 1 (Hirsch, 1957). Let $P$ be an $n$-dimensional polytope with $m$ facets, and let $v, w \in V(P)$ be two vertices of $P$. Then there is a path from $v$ to $w$ of length at most $m - n$.

CHAPTER 4

The simplex algorithm
1. Tableaux and pivoting

1.1. Overview. Several methods for solving linear optimization problems currently exist. There is the Simplex method due to George Dantzig (1951), the Ellipsoid method of Leonid Khachiyan (1979), and Narendra Karmarkar's Interior point method (1984). Theoretically, the latter two are superior methods: it can be shown that they run in polynomial time, which means that the running time of each algorithm on a theoretical computer is bounded by a polynomial function of the size of the input problem in bits. The simplex method does not have this virtue. In practice, both the simplex method and the interior point method perform well. The ellipsoid method requires very high precision calculations, and is slow in practice.

In this Chapter we describe the simplex method, which may be compared to the Gaussian elimination method for solving systems of linear equations. Gaussian elimination is a method to modify a given system of linear equations into an equivalent system of linear equations which is easily solvable. The simplex method is a technique to modify a given linear optimization problem into an equivalent one for which an optimal solution is easily found.

Gaussian elimination on $Ax = b$ is usually performed by applying row operations on the coefficient matrix $[\,A \ \ b\,]$. Likewise, the simplex method is performed by row operations on a 'tableau' containing all the coefficients of the linear optimization problem at hand. These row operations are grouped in so-called pivot steps. After each pivot step, the tableau is 'basic', and it is easy to read off a 'basic' feasible solution of the optimization problem from the tableau. These solutions are vertices of the feasible region of the problem, and the sequence of basic solutions generated by the method is a path from vertex to vertex over the edges of the feasible region (see figure 1). Pivot steps are repeated until an optimal basic solution is found.

Figure 1. The simplex method geometrically (figure not reproduced; it shows the vertices $v^0, v^1, v^2, v^3, v^4$ of the feasible region and the objective direction $c$).

1.2. Tableaux. Let $A$ be an $m \times n$ matrix, let $b \in \mathbb{R}^m$ be a column vector, let $c \in \mathbb{R}^n$ be a row vector and let $d \in \mathbb{R}$. The tableau corresponding to the linear optimization problem $\max\{cx + d \mid Ax = b,\ x \ge 0\}$ is the $(m + 1) \times (n + 1)$ block matrix
\[ T = \begin{bmatrix} -c & d \\ A & b \end{bmatrix}. \tag{100} \]
Note the minus sign before the $c$ in the top row. By convention, the rows of tableau (100) are indexed $0, 1, \ldots, m$ and the columns $1, \ldots, n, n + 1$; that way, the $ij$-th entry of $A$ is the $ij$-th entry of $T$. We say that two tableaux
\[ T = \begin{bmatrix} -c & d \\ A & b \end{bmatrix} \quad\text{and}\quad T' = \begin{bmatrix} -c' & d' \\ A' & b' \end{bmatrix} \tag{101} \]
are equivalent if the corresponding problems
\[ \max\{cx + d \mid Ax = b,\ x \ge 0\} \quad\text{and}\quad \max\{c'x + d' \mid A'x = b',\ x \ge 0\} \tag{102} \]
have the same set of feasible solutions and the same set of optimal solutions. Recall that row operations on a matrix are
(1) multiplying a row by a nonzero scalar;
(2) interchanging two rows; and
(3) adding a multiple of a row to another row.
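As a small illustration of the layout (100), the following sketch assembles a tableau from $A$, $b$, $c$, $d$ using exact rational arithmetic. The function name `make_tableau` and the use of 0-based Python column indices are assumptions of this sketch, not notation from the text; the data is that of tableau (109) further on.

```python
from fractions import Fraction

def make_tableau(A, b, c, d=0):
    """Return the tableau [[-c, d], [A, b]] as a list of rows of Fractions.

    Row 0 is the top row. Python columns are 0-based, so column j of the
    text is index j - 1 here (an indexing choice of this sketch).
    """
    top = [Fraction(-cj) for cj in c] + [Fraction(d)]
    body = [[Fraction(v) for v in row] + [Fraction(bi)]
            for row, bi in zip(A, b)]
    return [top] + body

# Data of tableau (109): max{cx | Ax = b, x >= 0} with two equations.
T = make_tableau(A=[[2, 1, 1, 3, 1, 0],
                    [1, 3, 1, 2, 0, 1]],
                 b=[5, 3],
                 c=[6, 8, 5, 9, 0, 0])
```

Exact fractions are used so that later row operations do not accumulate rounding errors; a floating-point implementation would be the more usual choice in practice.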

Lemma 33. Let $T$ and $T'$ be tableaux. Suppose $T'$ is obtained from $T$ by any row operations other than multiplying the top row by a scalar, interchanging the top row with another, or adding a multiple of the top row to another row. Then $T$ and $T'$ are equivalent.

Proof. Say $T$ and $T'$ are as in (101), with corresponding problems (102). If $T'$ is obtained from $T$ by a row operation not involving the top row at all, then $\{x \in \mathbb{R}^n \mid Ax = b\} = \{x \in \mathbb{R}^n \mid A'x = b'\}$ and then clearly the optimization problems have the same set of feasible solutions. If $T'$ is obtained from $T$ by adding a scalar multiple of the $i$-th row to the top row, say $-c' = -c + \lambda_i a_i$ and $d' = d + \lambda_i b_i$, where $a_i$ is the $i$-th row of $A$, then
\[ c'x + d' = (c - \lambda_i a_i)x + (d + \lambda_i b_i) = cx + d + \lambda_i(-a_i x + b_i) = cx + d \tag{103} \]
for all $x$ such that $Ax = b$. Thus, the objective functions $cx + d$ and $c'x + d'$ are identical on the common feasible set of $T$ and $T'$, and hence the set of optimal solutions is the same for either objective function.

So we have a certain freedom in changing the appearance of a linear optimization problem: we may move to an equivalent problem by row operations. We must now decide how to use this freedom.

1.3. Basic solutions and basic tableaux. Suppose we have a tableau of the form
\[ T = \begin{bmatrix} -c' & 0 & d \\ A' & I & b \end{bmatrix}. \tag{104} \]
Then $x^* := \begin{bmatrix} 0 & b^t \end{bmatrix}^t$ is a solution to the system of linear equations $[\,A' \ \ I\,]x = b$. If $b \ge 0$, then $x^* \ge 0$, and then $x^*$ is a feasible solution of the optimization problem
\[ \max\{\, [\,c' \ \ 0\,]x + d \mid [\,A' \ \ I\,]x = b,\ x \ge 0,\ x \in \mathbb{R}^n \,\} \tag{105} \]
corresponding to $T$. Also, $[\,c' \ \ 0\,]x^* + d = d$, as $[\,c' \ \ 0\,]\begin{bmatrix} 0 & b^t \end{bmatrix}^t = c'0 + 0b = 0$. Thus if $c' \le 0$, then $[\,c' \ \ 0\,]x + d \le d$ for all $x \ge 0$, and then $x^*$ is an optimal solution.


All this is a good reason to favour tableaux of the form (104) where $b \ge 0$: from such tableaux we can easily read off a feasible solution $x^*$ with objective value $d$, a solution which we can recognize as optimal if $c' \le 0$. But (104) is not the only tableau with these useful properties: clearly, it doesn't hurt if the columns of the identity matrix in $T$ are scattered throughout the tableau. Hence the following definitions.

Let $A$ be an $m \times n$ matrix and let $a_j$ be the $j$-th column of $A$, let $b \in \mathbb{R}^m$ be a column vector, let $c \in \mathbb{R}^n$ be a row vector, let $d \in \mathbb{R}$ and let $B \subseteq \{1, \ldots, n\}$ be a set with $|B| = m$. The tableau
\[ \begin{bmatrix} -c & d \\ A & b \end{bmatrix} \tag{106} \]
is a basic tableau belonging to basis $B$ if $\{a_j \mid j \in B\} = \{e_1, \ldots, e_m\}$ and $c_j = 0$ for all $j \in B$. It is easy to determine a vector $x \in \mathbb{R}^n$ such that $Ax = b$ and $x_j = 0$ for all $j \notin B$; this is $x^B$, the basic solution belonging to $B$. This basic solution $x^B$ has entries $0$ and $b_i$ only, like the $x^*$ above, and its objective value $cx^B + d$ equals $d$, since $cx^B = 0$. The tableau belonging to $B$ is called feasible if $b \ge 0$, dual feasible if $c \le 0$ and optimal if both feasible and dual feasible. The corresponding basic solution $x^B$ is feasible/optimal if the tableau is feasible/optimal.

Given any tableau $T = \begin{bmatrix} -c & d \\ A & b \end{bmatrix}$ and set $B$ such that $\{a_j \mid j \in B\}$ is a basis of $\mathbb{R}^m$, there is a unique basic tableau $T'$ corresponding to basis $B$ that is equivalent to $T$ (exercise).

1.4. Pivoting. Given a matrix, pivoting on the $ij$-th entry is adding multiples of the $i$-th row to other rows and dividing the $i$-th row so that the $j$-th column becomes a unit vector, with a $1$ in the $i$-th row and $0$ in the rest of the column. This requires the $ij$-th entry to be nonzero.

Let $T$ be a basic tableau belonging to basis $B$. If $i^* \ne 0$ and $j^* \notin B$, then pivoting on $(i^*, j^*)$ yields an equivalent basic tableau $T^*$ corresponding to basis $B^* = B \cup \{j^*\} \setminus \{j_0\}$, where $j_0$ is the unique $j \in B$ such that the $i^*j$-th entry of $T$ is $1$. We call the $j^*$-th column the entering column; the $j_0$-th column is the leaving column. Suppose
\[ T = \begin{bmatrix} -c & d \\ A & b \end{bmatrix}, \qquad T^* = \begin{bmatrix} -c^* & d^* \\ A^* & b^* \end{bmatrix}. \tag{107} \]
Let $a_j$ be the $j$-th column of $A$ and $a^*_j$ the $j$-th column of $A^*$. Then $a_{j_0} = e_{i^*} = a^*_{j^*}$. Pivoting on $(i^*, j^*)$ amounts to applying the following row operations:
(1) dividing the $i^*$-th row by $a_{i^*j^*}$;
(2) adding the $i^*$-th row $c_{j^*}$ times to the top (or $0$-th) row; and
(3) subtracting the $i^*$-th row $a_{ij^*}$ times from the $i$-th row, for each $i \ne 0, i^*$.
The resulting tableau $T^*$ is equivalent to $T$ by Lemma 33. We deduce that after this pivot on $(i^*, j^*)$, the entries of the resulting tableau $T^*$ satisfy:
(1) $d^* = d + b_{i^*} c_{j^*} / a_{i^*j^*}$;
(2) $b^*_{i^*} = b_{i^*} / a_{i^*j^*}$, and $b^*_i = b_i - b_{i^*} a_{ij^*} / a_{i^*j^*}$ for $i \ne i^*$; and
(3) $c^*_j = c_j - c_{j^*} a_{i^*j} / a_{i^*j^*}$ if $j \ne j^*$, and of course $c^*_{j^*} = 0$.
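The three row operations can be sketched in a few lines of Python. The function name `pivot`, the list-of-rows representation and the use of `fractions.Fraction` are choices of this sketch; columns are numbered from 1 as in the text. The data is tableau (109), pivoted on $(1, 1)$.

```python
from fractions import Fraction

def pivot(T, i, j):
    """Pivot on entry (i, j): row i (0 = top row), column j (1-based)."""
    a = Fraction(T[i][j - 1])
    if a == 0:
        raise ValueError("cannot pivot on a zero entry")
    pivot_row = [Fraction(v) / a for v in T[i]]           # operation (1)
    result = []
    for r, row in enumerate(T):
        if r == i:
            result.append(pivot_row)
        else:
            # In the top row this factor is -c_j, so subtracting it
            # amounts to adding c_j times the pivot row: operation (2).
            factor = Fraction(row[j - 1])
            result.append([Fraction(v) - factor * w       # operations (2), (3)
                           for v, w in zip(row, pivot_row)])
    return result

T = [[-6, -8, -5, -9, 0, 0, 0],
     [ 2,  1,  1,  3, 1, 0, 5],
     [ 1,  3,  1,  2, 0, 1, 3]]
T_star = pivot(T, 1, 1)
# The new objective constant agrees with d* = d + b_i c_j / a_ij = 0 + 5*6/2.
```

The function returns a new tableau rather than modifying `T` in place, which keeps the before and after tableaux available for comparison.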

To improve a basic feasible tableau, we will pivot to obtain a better tableau. The above analysis will help us to decide how to choose the pivot elements $(i^*, j^*)$.

1.5. Pivot selection. Suppose we are given a basic and feasible tableau $T$, and want to obtain, by pivoting on some $(i^*, j^*)$, another feasible basic tableau $T^*$ whose objective value is hopefully better, but certainly not worse. With $T, T^*$ as in (107), we have $b \ge 0$ as $T$ is feasible, and we must choose $i^*, j^*$ such that $b^* \ge 0$ and $d^* \ge d$, i.e.
\[ 0 \le b^*_{i^*} = b_{i^*}/a_{i^*j^*}, \quad 0 \le b^*_i = b_i - b_{i^*} a_{ij^*}/a_{i^*j^*} \text{ for } i \ne i^*, \quad\text{and}\quad d \le d^* = d + b_{i^*} c_{j^*}/a_{i^*j^*}. \tag{108} \]
A straightforward analysis shows that to achieve this, we must choose
(1) $j^*$ such that $c_{j^*} > 0$; and
(2) $i^*$ such that $a_{i^*j^*} > 0$ and $b_{i^*}/a_{i^*j^*} = \min\{b_i/a_{ij^*} \mid i \text{ such that } a_{ij^*} > 0\}$.

There are two exceptional events. If there is no $j$ such that $c_j > 0$, then the tableau $T$ is optimal and the corresponding basic solution is an optimal solution. If there does not exist an $i$ such that $a_{ij^*} > 0$, then there exists an unbounded direction, i.e. a nonnegative vector $f$ such that $Af = 0$ and $cf > 0$, which implies that the problem corresponding to $T$ is unbounded (exercise). In either case, we have solved the optimization problem corresponding to $T$.

Whenever $b_{i^*} > 0$, we will have $d^* > d$, i.e. the basic solution corresponding to $T^*$ has a strictly better objective value than the basic solution corresponding to $T$. Obviously, we can repeat such pivoting steps until we either find an optimal or an unbounded tableau, thereby solving the problem we started with. However, we need an initial feasible basic tableau for the given optimization problem to start this procedure. We show how to find such a feasible tableau after the next example.

Example. Consider the basic and feasible tableau
\[ \left[\begin{array}{cccccc|c} -6 & -8 & -5 & -9 & 0 & 0 & 0 \\ 2 & 1 & 1 & 3 & 1 & 0 & 5 \\ 1 & 3 & 1 & 2 & 0 & 1 & 3 \end{array}\right] \tag{109} \]
corresponding to basis $B^{(1)} = \{5, 6\}$. We apply the improvement algorithm. We choose $j^* = 1$, which fixes $i^* = 1$, hence $j_0 = 5$. We pivot on $(1, 1)$ and obtain
\[ \left[\begin{array}{cccccc|c} 0 & -5 & -2 & 0 & 3 & 0 & 15 \\ 1 & \tfrac{1}{2} & \tfrac{1}{2} & \tfrac{3}{2} & \tfrac{1}{2} & 0 & \tfrac{5}{2} \\ 0 & \tfrac{5}{2} & \tfrac{1}{2} & \tfrac{1}{2} & -\tfrac{1}{2} & 1 & \tfrac{1}{2} \end{array}\right], \tag{110} \]
a basic tableau corresponding to $B^{(2)} = \{1, 6\} = B^{(1)} \setminus \{5\} \cup \{1\}$. In the next iteration of the improvement algorithm, we choose $j^* = 2$, $i^* = 2$ and $j_0 = 6$. Pivoting on $(2, 2)$ results in
\[ \left[\begin{array}{cccccc|c} 0 & 0 & -1 & 1 & 2 & 2 & 16 \\ 1 & 0 & \tfrac{2}{5} & \tfrac{7}{5} & \tfrac{3}{5} & -\tfrac{1}{5} & \tfrac{12}{5} \\ 0 & 1 & \tfrac{1}{5} & \tfrac{1}{5} & -\tfrac{1}{5} & \tfrac{2}{5} & \tfrac{1}{5} \end{array}\right], \tag{111} \]
a basic tableau corresponding to $B^{(3)} = \{1, 2\} = B^{(2)} \setminus \{6\} \cup \{2\}$. The next step: $j^* = 3$, $i^* = 2$, and $j_0 = 2$. Pivoting on $(2, 3)$ we get
\[ \left[\begin{array}{cccccc|c} 0 & 5 & 0 & 2 & 1 & 4 & 17 \\ 1 & -2 & 0 & 1 & 1 & -1 & 2 \\ 0 & 5 & 1 & 1 & -1 & 2 & 1 \end{array}\right], \tag{112} \]
a tableau belonging to basis $\hat{B} = \{1, 3\} = B^{(3)} \setminus \{2\} \cup \{3\}$. This tableau is optimal, as the top row contains only nonnegative elements to the left of the vertical line. The basic solution belonging to $\hat{B}$ is $x^{\hat{B}} = (2, 0, 1, 0, 0, 0)^t$, with objective value $cx^{\hat{B}} = 17$, where $c = (6, 8, 5, 9, 0, 0)$ is the original objective.
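The improvement procedure of this example can be replayed with a short program. This is a minimal sketch: the names `pivot` and `simplex` are hypothetical, the entering column is taken to be the smallest $j$ with $c_j > 0$ (the choice made in the example), and ties in the ratio test are broken by row order, which suffices here but is not claimed to prevent cycling in general.

```python
from fractions import Fraction

def pivot(T, i, j):
    """Pivot on row i (0 = top row) and 1-based column j."""
    a = T[i][j - 1]
    pivot_row = [v / a for v in T[i]]
    return [pivot_row if r == i else
            [v - row[j - 1] * w for v, w in zip(row, pivot_row)]
            for r, row in enumerate(T)]

def simplex(T, basis):
    """Repeat improving pivots on a basic feasible tableau.

    basis[i - 1] is the index of the basic column equal to e_i.
    Returns (status, final tableau, final basis).
    """
    T = [[Fraction(v) for v in row] for row in T]
    basis = list(basis)
    n = len(T[0]) - 1
    while True:
        # The top row stores -c, so c_j > 0 means a negative top-row entry.
        entering = [j for j in range(1, n + 1) if T[0][j - 1] < 0]
        if not entering:
            return "optimal", T, basis
        j = entering[0]
        rows = [i for i in range(1, len(T)) if T[i][j - 1] > 0]
        if not rows:
            return "unbounded", T, basis
        i = min(rows, key=lambda r: T[r][-1] / T[r][j - 1])   # ratio test
        T = pivot(T, i, j)
        basis[i - 1] = j

T109 = [[-6, -8, -5, -9, 0, 0, 0],
        [ 2,  1,  1,  3, 1, 0, 5],
        [ 1,  3,  1,  2, 0, 1, 3]]
status, T_opt, basis_opt = simplex(T109, [5, 6])

# Read off the basic solution: x_j = b_i for the basic column j of row i.
x = [Fraction(0)] * 6
for i, j in enumerate(basis_opt, start=1):
    x[j - 1] = T_opt[i][-1]
```

On this input the loop performs the three pivots $(1,1)$, $(2,2)$, $(2,3)$ described above and stops at tableau (112).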


1.6. Finding an initial basic feasible tableau. Suppose we are given a linear optimization problem of the form
\[ \max\{cx \mid Ax \le b,\ x \ge 0,\ x \in \mathbb{R}^n\}. \tag{113} \]
Then we may rewrite this problem to the standard form
\[ \max\left\{ [\,c \ \ 0\,]\begin{bmatrix} x \\ s \end{bmatrix} \;\middle|\; [\,A \ \ I\,]\begin{bmatrix} x \\ s \end{bmatrix} = b,\ \begin{bmatrix} x \\ s \end{bmatrix} \ge 0,\ \begin{bmatrix} x \\ s \end{bmatrix} \in \mathbb{R}^{n+m} \right\}, \tag{114} \]
suitable for the tableau method. If $b \ge 0$, we immediately have the basic feasible tableau
\[ \begin{bmatrix} -c & 0 & 0 \\ A & I & b \end{bmatrix}, \tag{115} \]
corresponding to the basis $\{n + 1, \ldots, n + m\}$, where $n$ and $m$ are the number of columns and rows of $A$. For example, the problem
\[ \max\left\{ (6, 8, 5, 9)x \;\middle|\; \begin{bmatrix} 2 & 1 & 1 & 3 \\ 1 & 3 & 1 & 2 \end{bmatrix} x \le \begin{bmatrix} 5 \\ 3 \end{bmatrix},\ x \ge 0,\ x \in \mathbb{R}^4 \right\} \tag{116} \]
gives rise to the initial basic tableau (109). It follows that an optimal solution of (116) is $(2, 0, 1, 0)^t$, and the optimum is $17$.

If $b \not\ge 0$, then we consider the auxiliary problem
\[ \max\{-y \mid Ax - y\mathbf{1} \le b,\ x \ge 0,\ y \ge 0,\ x \in \mathbb{R}^n,\ y \in \mathbb{R}\}, \tag{117} \]
where $\mathbf{1}$ is the all-ones column vector. The optimum of this problem is $0$ if and only if the original problem (113) is feasible, and then an optimal basic solution is a feasible basic solution of (113). To solve (117), we need to find a feasible and basic initial tableau for (117). Such a tableau is obtained from the tableau
\[ \begin{bmatrix} 0 & 0 & 1 & 0 \\ A & I & -\mathbf{1} & b \end{bmatrix}, \tag{118} \]
corresponding directly to (117), by pivoting on $(i^*, j^*)$, where $j^*$ is the index of the '$-\mathbf{1}$'-column ($j^* = n + m + 1$ if $A$ is an $m \times n$ matrix) and $i^*$ is such that $b_{i^*} = \min\{b_i \mid i \in \{1, \ldots, m\}\}$ (exercise).

Suppose the optimal tableau for the auxiliary problem is
\[ \begin{bmatrix} -c' & p & d' \\ A' & q & b' \end{bmatrix}, \tag{119} \]
corresponding to basis $\tilde{B}$. If $d' = 0$ we can make sure that $n + m + 1 \notin \tilde{B}$ by one more pivot if necessary. Then the tableau
\[ \left[\begin{array}{c|c} [\,-c \ \ 0\,] & 0 \\ A' & b' \end{array}\right], \tag{120} \]
whose top row is the top row of (115), is equivalent to (115) but not yet basic, as there may be $j \in \tilde{B}$ such that $c_j \ne 0$. A basic feasible tableau corresponding to $\tilde{B}$ and equivalent to (115) can be obtained by adding rows to the top row in (120). This is the feasible initial tableau we need to start looking for the optimal solution of our original problem (113).
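The construction of tableau (118) and the first pivot that makes it feasible can be sketched as follows. The function names are hypothetical, and the data used for illustration is that of problem (121) in the example that follows; the result should coincide with tableau (123) there.

```python
from fractions import Fraction

def pivot(T, i, j):
    """Pivot on row i (0 = top row) and 1-based column j."""
    a = Fraction(T[i][j - 1])
    pivot_row = [Fraction(v) / a for v in T[i]]
    return [pivot_row if r == i else
            [Fraction(v) - Fraction(row[j - 1]) * w
             for v, w in zip(row, pivot_row)]
            for r, row in enumerate(T)]

def initial_auxiliary_tableau(A, b):
    """Build tableau (118) and pivot on the row with the smallest b_i."""
    m, n = len(A), len(A[0])
    top = [0] * (n + m) + [1, 0]                      # objective -y
    rows = [list(A[i]) + [int(i == k) for k in range(m)] + [-1, b[i]]
            for i in range(m)]                        # [A  I  -1 | b]
    i_star = 1 + min(range(m), key=lambda i: b[i])    # most negative b_i
    return pivot([top] + rows, i_star, n + m + 1)

T_aux = initial_auxiliary_tableau(A=[[-1, 1, 2, 5],
                                     [3, -2, -5, -13]],
                                  b=[-1, 5])
```

After this single pivot the right-hand side is nonnegative, because every $b_i$ has had the smallest entry $b_{i^*} < 0$ subtracted from it.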


Example. Consider the problem
\[ \max\left\{ (-3, 2, 6, 13)x \;\middle|\; \begin{bmatrix} -1 & 1 & 2 & 5 \\ 3 & -2 & -5 & -13 \end{bmatrix} x \le \begin{bmatrix} -1 \\ 5 \end{bmatrix},\ x \ge 0,\ x \in \mathbb{R}^4 \right\}. \tag{121} \]
We write down the tableau corresponding to the auxiliary problem:
\[ \left[\begin{array}{ccccccc|c} 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\ -1 & 1 & 2 & 5 & 1 & 0 & -1 & -1 \\ 3 & -2 & -5 & -13 & 0 & 1 & -1 & 5 \end{array}\right]. \tag{122} \]
Column 7 corresponds to the auxiliary variable. The minimum entry in the rightmost column is the $-1$ in row 1. We pivot on $(1, 7)$ and obtain
\[ \left[\begin{array}{ccccccc|c} -1 & 1 & 2 & 5 & 1 & 0 & 0 & -1 \\ 1 & -1 & -2 & -5 & -1 & 0 & 1 & 1 \\ 4 & -3 & -7 & -18 & -1 & 1 & 0 & 6 \end{array}\right], \tag{123} \]
the initial feasible tableau for the auxiliary problem, which corresponds to the basis $\{6, 7\}$. We pivot on $(1, 1)$ and get
\[ \left[\begin{array}{ccccccc|c} 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\ 1 & -1 & -2 & -5 & -1 & 0 & 1 & 1 \\ 0 & 1 & 1 & 2 & 3 & 1 & -4 & 2 \end{array}\right], \tag{124} \]
corresponding to basis $\tilde{B} := \{1, 6\}$. This tableau is optimal and $7 \notin \tilde{B}$, thus $\tilde{B}$ is a feasible basis for the original problem. To get the tableau for the original problem corresponding to $\tilde{B}$ we delete the 7th column and replace the top row by (minus) the objective of (121):
\[ \left[\begin{array}{cccccc|c} 3 & -2 & -6 & -13 & 0 & 0 & 0 \\ 1 & -1 & -2 & -5 & -1 & 0 & 1 \\ 0 & 1 & 1 & 2 & 3 & 1 & 2 \end{array}\right]. \tag{125} \]
This is not yet a basic tableau corresponding to $\tilde{B}$, as there are $j \in \tilde{B}$ such that the $j$-th entry in the top row is nonzero. In this case, there is only a problem for $j = 1 \in \tilde{B}$. Adding $-3$ times row 1 to the top row we obtain
\[ \left[\begin{array}{cccccc|c} 0 & 1 & 0 & 2 & 3 & 0 & -3 \\ 1 & -1 & -2 & -5 & -1 & 0 & 1 \\ 0 & 1 & 1 & 2 & 3 & 1 & 2 \end{array}\right], \tag{126} \]
a feasible basic tableau corresponding to $\tilde{B}$. This is the initial feasible tableau for (121). By sheer luck, this tableau is also optimal. The optimal basic solution is $(1, 0, 0, 0, 0, 2)^t$. The optimal solution of (121) is $(1, 0, 0, 0)^t$, with objective value $-3$.

2. Cycling and Bland's rule

2.1. Cycling and pivot rules. We have not yet shown that the simplex method is a finite method, i.e. that the optimal tableau is reached in finitely many pivot steps starting from an initial feasible tableau. In fact, it is possible that the simplex method as described above does not finish, but there are methods to avoid this unwanted behaviour. In particular, by being more careful when choosing pivot elements, one can guarantee that the simplex method will finish in a finite number of pivot steps.

When applying the simplex algorithm starting from a feasible tableau, we get a sequence of feasible basic tableaux $T^{(1)}, T^{(2)}, T^{(3)}, \ldots$ corresponding to bases $B^{(1)}, B^{(2)}, B^{(3)}, \ldots$ say. There are only finitely many bases as there are only finitely many subsets of columns.


If the simplex method were to run indefinitely without reaching the optimal solution, then some basis must be repeated in the sequence, i.e. $B^{(s)} = B^{(t)}$ for some distinct $s, t$; this is called cycling. Since the objective value of the consecutive basic solutions is nondecreasing in the simplex algorithm, cycling can only occur when there is no improvement in the objective function going from $x^{B^{(s)}}$ to $x^{B^{(t)}}$.

2.2. Bland's rule. When applying the improvement procedure for basic feasible tableaux, there is some freedom in choosing the pivot elements $i^*, j^*$. Using the notation of the subsection on pivot selection, we must choose
(1) $j^* \in \{j \mid c_j > 0\}$; and
(2) $i^* \in \{i \mid a_{ij^*} > 0,\ b_i/a_{ij^*} = \mu_{j^*}\}$, where $\mu_{j^*} := \min\{b_i/a_{ij^*} \mid i \text{ such that } a_{ij^*} > 0\}$.

A pivot rule is a rule for selecting $i^*$ and $j^*$ within these candidate sets. Recall that choosing $i^*$ fixes $j_0$ as the unique $j \in B$ such that $a_j = e_{i^*}$. Bland's rule is: choose $j^*$ as small as possible; choose $i^*$ such that $j_0$ is as small as possible.

Theorem 34 (Bland, 1977). When using Bland's rule for pivot selection, no cycling occurs.

Proof. Suppose to the contrary that $T^{(1)}, \ldots, T^{(s)}, T^{(s+1)}$ is a sequence of basic tableaux belonging to bases $B^{(1)}, \ldots, B^{(s)}, B^{(s+1)}$, so that $T^{(k+1)}$ is the unique basic tableau obtained from $T^{(k)}$ by pivoting according to Bland's rule, and that $B^{(1)} = B^{(s+1)}$ (and hence $T^{(1)} = T^{(s+1)}$). Call an index fickle if it occurs in some, but not all of the bases $B^{(1)}, \ldots, B^{(s)}$. Let $t$ be the largest fickle index, so
\[ t := \max\{j \in X \mid \exists p : j \in B^{(p)};\ \exists q : j \notin B^{(q)}\}, \tag{127} \]
with $X = \bigcup_{k=1}^{s} B^{(k)}$. Note that if $j \in X$ and $j > t$, then $j \in B^{(k)}$ for all $k$. Let $p \in \{1, \ldots, s\}$ be such that $t \in B^{(p)}$, $t \notin B^{(p+1)}$ and let $q \in \{1, \ldots, s\}$ be such that $t \notin B^{(q)}$, $t \in B^{(q+1)}$. For simplicity we set $B := B^{(p)}$, $T := T^{(p)}$, $B' := B^{(q)}$, $T' := T^{(q)}$. So $t$ leaves the basis at $T$ and enters the basis at $T'$. Directly from the definition of $t$, and the fact that $t$ is chosen according to Bland's rule, we have
(1) $c'_j \le 0$ for all $j \in X$ such that $j < t$; and
(2) $c'_t > 0$; and
(3) $c'_j = 0$ for all $j \in X$ such that $j > t$.

Suppose $j^*$ is the entering variable at $T$. Let $f$ be the vector such that $f_{j^*} = 1$, $Af = 0$, and $\{j \mid f_j \ne 0\} \subseteq B \cup \{j^*\}$. We claim that
(1) $f_j \ge 0$ for all $j \in X$ such that $j < t$; and
(2) $f_t < 0$.

As $c - c' \in \text{rowspace}(A)$, we have $cf = c'f$. Thus we arrive at
\[ 0 < c_{j^*} = cf = c'f = \sum_{j \in X} c'_j f_j = \Big(\sum_{j \in X,\, j < t} c'_j f_j\Big) + c'_t f_t + \Big(\sum_{j \in X,\, j > t} c'_j f_j\Big) < 0, \tag{128} \]
a contradiction. To complete the proof, the claim concerning $f$ (that $\min\{j \in X \mid f_j < 0\} = t$) must be verified. This is an exercise.
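Bland's rule itself is easy to state in code. The following sketch selects $(i^*, j^*)$ from a basic feasible tableau; the function name `bland_pivot` and the convention that `basis[i - 1]` holds the index $j_0$ of the basic column equal to $e_i$ are assumptions of this sketch. The second call uses a small made-up degenerate tableau in which both rows tie in the ratio test, so that the tie-break on $j_0$ decides.

```python
from fractions import Fraction

def bland_pivot(T, basis):
    """Return (i*, j*) chosen by Bland's rule, or None if no c_j > 0.

    T is a basic feasible tableau with top row [-c | d]; columns are
    1-based. Raises ValueError if the entering column has no positive
    entry (the unbounded case).
    """
    n = len(T[0]) - 1
    entering = [j for j in range(1, n + 1) if -T[0][j - 1] > 0]
    if not entering:
        return None                                   # tableau is optimal
    j = min(entering)                                 # smallest entering index
    rows = [i for i in range(1, len(T)) if T[i][j - 1] > 0]
    if not rows:
        raise ValueError("problem is unbounded")
    ratio = {i: Fraction(T[i][-1]) / T[i][j - 1] for i in rows}
    mu = min(ratio.values())
    ties = [i for i in rows if ratio[i] == mu]
    i = min(ties, key=lambda r: basis[r - 1])         # smallest leaving index j0
    return i, j

T109 = [[-6, -8, -5, -9, 0, 0, 0],
        [ 2,  1,  1,  3, 1, 0, 5],
        [ 1,  3,  1,  2, 0, 1, 3]]
first = bland_pivot(T109, [5, 6])

# Degenerate illustration: column 3 = e_1, column 2 = e_2, both b_i = 0.
T_deg = [[-1, 0, 0, 0],
         [ 1, 0, 1, 0],
         [ 1, 1, 0, 0]]
tie_break = bland_pivot(T_deg, [3, 2])
```

In the degenerate case both rows have ratio $0$; row 2 is selected because its basic column has the smaller index.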


Earlier, the more complicated lexicographic rule had been shown to prevent cycling. Dantzig himself, when he published the simplex method, showed that cycling could be avoided by 'perturbation' of the given optimization problem.

3. The revised simplex method

3.1. Arithmetic complexity. The arithmetic complexity of an algorithm is the number of additions, subtractions, multiplications, divisions and comparisons performed in a full run of the algorithm. The arithmetic complexity is a reasonable measure of the amount of time a computer will take to run the algorithm, provided that there is not much else going on in the algorithm besides basic arithmetic, and provided that the size of the numbers involved is bounded. We recommend a solid course in complexity theory for those interested in the definition of bit complexity of an algorithm, which is a more precise measure of computational effort. For now, we will do with arithmetic complexity, as we will only want to analyze simple arithmetical procedures.

3.2. The arithmetic complexity of matrix operations and pivoting. Let $A$ be an $m \times n$ matrix. Then a row operation takes at most $2n$ arithmetic operations since
(1) multiplying a row of $A$ by a nonzero scalar takes $n$ multiplications;
(2) adding a multiple of a row to another row of $A$ takes $n$ multiplications and $n$ additions.

A pivoting operation on $A$ takes $m$ row operations, so at most $2mn$ arithmetic operations. Gaussian elimination takes at most $\text{rank}(A)$ pivoting operations on the coefficient matrix $[\,A \ \ b\,]$ to find a matrix in row echelon form. If $A$ has $m$ rows and $n$ columns, then the arithmetic complexity of Gaussian elimination is at most $2(n + 1)m \cdot \text{rank}(A)$ or, since $\text{rank}(A) \le m$, at most $2(n + 1)m^2$ arithmetic operations.

The simplex method is to repeatedly apply pivoting steps on a tableau, which is a block matrix of size $(m + 1) \times (n + 1)$, say. To determine the pivot $(i^*, j^*)$ takes at most $n - 1$ comparisons for finding $j^* = \min\{j \mid c_j > 0\}$; then $m$ comparisons and at most $m$ divisions to calculate $\{b_i/a_{ij^*} \mid a_{ij^*} > 0\}$ and finally at most $m - 1$ comparisons to find the minimizer $i^*$ of this set. The pivot itself will take at most $2(n + 1)(m + 1)$ arithmetic operations, in all at most $n - 1 + 3m - 1 + 2(n + 1)(m + 1)$ arithmetic operations, which is $O(nm)$. In practice, this 'worst-case' bound is indeed proportional to the amount of time needed.

3.3. The number of pivot steps in the simplex algorithm. We have determined the arithmetic complexity of one pivot step. To know the arithmetic complexity of the simplex method we need a bound on the total number of pivot steps needed to reach an optimal tableau from an initial feasible tableau. Sadly, we have no better general upper bound than the total number of bases, which is $\frac{n!}{(n-m)!\,m!}$ for a tableau with $m + 1$ rows and $n + 1$ columns. Even more sadly, for each $n$ there is an example on $n$ variables with $2n$ constraints where the simplex algorithm, using Bland's rule, visits all $2^n$ vertices of the cube-like feasible set in question. Similar bad examples exist for all known pivot rules that provably prevent cycling. It is still theoretically possible that a pivot rule exists that prevents cycling and for which a good upper bound on the number of pivots can be derived.

The simplex method would be useless if its average behaviour were close to the worst-case bound. However, empirical studies show that the number of pivot steps is 'usually' linear in $n$, the number of columns of the tableau. This is not an exact result, but the least one can say is that the upper bound of $\frac{n!}{(n-m)!\,m!}$ is in practice a very bad estimate for the average number of pivot steps.
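To get a feeling for the size of these bounds, the following sketch evaluates them for a few small cases. The helper name `max_bases` is hypothetical; `math.comb` is the standard-library binomial coefficient.

```python
from math import comb

def max_bases(n, m):
    """Upper bound n! / ((n - m)! m!) on the number of bases of a
    tableau with m + 1 rows and n + 1 columns."""
    return comb(n, m)

# Tableau (109) has m = 2 rows below the top row and n = 6 columns
# left of the right-hand side; the example needed only 3 pivots.
bases_109 = max_bases(6, 2)

# Number of vertices of the cube-like worst-case examples on n variables.
cube_vertices = {n: 2 ** n for n in (10, 20, 30)}
```

For tableau (109) the bound is 15 bases, against the 3 pivot steps actually used in the example of section 1.5.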


3.4. The revised simplex method. It is possible to lower the complexity of pivoting steps by operating on a lighter data structure than a tableau. In a general pivot step, we need to access the current $c$ to determine $j^*$, and only $b$ and the $j^*$-th column of the current $A$ to determine $i^*$. It turns out to be more efficient to compute the $j^*$-th column of the current $A$ at the last moment, instead of updating each column of $A$ in each iteration. The information needed for this reconstruction is kept and updated, but not the full tableau $T$.

Let $T$ be the initial feasible basic tableau. Consider the extended initial tableau
\[ \tilde{T} = \begin{bmatrix} -c & d & 0 \\ A & b & I \end{bmatrix}. \tag{129} \]
Let $T^*$ be obtained by applying several pivots to $T$. Then applying the same pivots to the extended initial tableau yields the extended tableau
\[ \tilde{T}^* = \begin{bmatrix} -c^* & d^* & y^* \\ A^* & b^* & Q^* \end{bmatrix}. \tag{130} \]
The matrix $Q^*$ and the vector $y^*$ record the combined effect of all row operations that were applied going from $\tilde{T}$ to $\tilde{T}^*$: it is not hard to see that $A^* = Q^*A$, $b^* = Q^*b$, $c^* = c - y^*A$, $d^* = d + y^*b$, and that $Q^* = (A_{\{1,\ldots,m\} \times B^*})^{-1}$, where $B^*$ is the basis corresponding to $T^*$. It is an exercise to show that if $T^*$ is optimal, then $y^*$ is an optimal solution of the dual problem.

In the revised simplex method, we keep and update $Q^*$ and $y^*$, and compute only the entries of $c^*$, $A^*$ and $b^*$ we need to determine the pivot elements. Specifically, to perform one pivot step we compute entries of $c^*$ one by one until a positive entry $c^*_{j^*}$ is found. Then $b^*$ and the $j^*$-th column $a^*_{j^*}$ of $A^*$ are computed to find the pivot row $i^*$. Updating $Q^*$, $y^*$ is done by pivoting on the $i^*$-th entry of $a^*_{j^*}$ in the matrix
\[ \begin{bmatrix} -c^*_{j^*} & y^* \\ a^*_{j^*} & Q^* \end{bmatrix}. \tag{131} \]
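The identities relating $Q^*$, $y^*$ and the current tableau can be checked numerically on the example of section 1.5, where the initial tableau is (109) and the final basis is $\{1, 3\}$. This is a sketch under one extra observation: since $c^*_j = 0$ for $j \in B^*$, the relation $c^* = c - y^*A$ gives $y^* = c_{B^*} Q^*$, which is how $y^*$ is computed below. The helper `inverse` is a plain Gauss-Jordan routine written for this sketch.

```python
from fractions import Fraction

def inverse(M):
    """Invert a small nonsingular square matrix by Gauss-Jordan elimination."""
    k = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(int(r == s)) for s in range(k)]
           for r, row in enumerate(M)]
    for col in range(k):
        piv = next(r for r in range(col, k) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]

# Initial tableau (109): columns 5 and 6 are the slack columns.
A = [[2, 1, 1, 3, 1, 0],
     [1, 3, 1, 2, 0, 1]]
b = [5, 3]
c = [6, 8, 5, 9, 0, 0]
d = 0
B_star = [1, 3]                       # final basis of the example
m = len(A)

A_B = [[row[j - 1] for j in B_star] for row in A]
Q = inverse(A_B)                      # Q* = (A restricted to columns B*)^-1
c_B = [c[j - 1] for j in B_star]
y = [sum(c_B[k] * Q[k][i] for k in range(m)) for i in range(m)]    # y* = c_B Q*

b_star = [sum(Q[i][k] * b[k] for k in range(m)) for i in range(m)]  # Q* b
c_star = [c[j] - sum(y[i] * A[i][j] for i in range(m))              # c - y* A
          for j in range(len(c))]
d_star = d + sum(yi * bi for yi, bi in zip(y, b))                   # d + y* b
```

The computed $Q^*$ and $y^*$ agree with columns 5 and 6 of tableau (112), which is what (130) predicts when the slack columns of the initial tableau play the role of the appended identity matrix.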

One pivot step now takes $O(n)$ steps for finding $j^*$ and $O(m^2)$ steps for finding $i^*$ and updating $Q^*$ and $y^*$. If $m$ is much smaller than $n$ this is significantly faster than the $O(nm)$ steps needed to perform a pivot on the full tableau. The simplex method is usually implemented in this revised form.

A further advantage of this revised method is numerical stability: we need only worry about the accuracy of the matrix $Q$ and the vector $y$, and not about the larger matrix $A$. To make sure that $Q$ is numerically accurate, one resets $Q$ to $(A_{\{1,\ldots,m\} \times B})^{-1}$ after every $O(m)$ steps.

Exercises

To avoid repetition, we will assume that $T = \begin{bmatrix} -c & d \\ A & b \end{bmatrix}$, $T^* = \begin{bmatrix} -c^* & d^* \\ A^* & b^* \end{bmatrix}$ etc. in all exercises. Several problems were taken from Linear Programming, Foundations and Extensions by Robert Vanderbei.

(1) Let $T$ be a basic feasible tableau corresponding to basis $B$. Show that if the $j^*$-th column of $A$ has only nonpositive elements and $c_{j^*} > 0$, then there exists an $f \ge 0$ such that $Af = 0$, $cf > 0$ and $f_j = 0$ if $j \notin B \cup \{j^*\}$. Show that in that case $x^B + \lambda f$ is feasible for all $\lambda \ge 0$ and $c(x^B + \lambda f) \to \infty$ if $\lambda \to \infty$. Conclude that the problem corresponding to $T$ is unbounded.
(2) Solve the following tableaux. That is, find an optimal or an unbounded tableau equivalent to the given tableau, and write down an optimal basic solution or an unbounded improving direction.

  (a) $\left[\begin{array}{ccccc|c} -6 & -8 & -5 & -9 & 0 & 0 \\ 1 & 1 & 1 & 1 & 1 & 1 \end{array}\right]$, belonging to basis $\{5\}$.
  (b) $\left[\begin{array}{cccccccc|c} -3 & 3 & 15 & -4 & 0 & 4 & 0 & 0 & 0 \\ 1 & 1 & 1 & 3 & 3 & 3 & 1 & 0 & 8 \\ 5 & 5 & 5 & 2 & 2 & 2 & 0 & 1 & 14 \end{array}\right]$, belonging to basis $\{7, 8\}$.
  (c) $\left[\begin{array}{cccccc|c} -6 & 2 & -1 & -3 & 0 & 0 & 0 \\ 7 & -2 & -3 & 5 & 0 & 1 & 1 \\ -3 & 1 & 1 & -2 & 1 & 0 & 1 \end{array}\right]$, belonging to basis $\{5, 6\}$.
  (d) $\left[\begin{array}{cccccc|c} -5 & -4 & -3 & 0 & 0 & 0 & 0 \\ 2 & 3 & 1 & 1 & 0 & 0 & 5 \\ 4 & 1 & 2 & 0 & 1 & 0 & 11 \\ 3 & 4 & 2 & 0 & 0 & 1 & 8 \end{array}\right]$, belonging to basis $\{4, 5, 6\}$.
(3) List all possible pivot elements $(i^*, j^*)$ such that pivoting on $(i^*, j^*)$ in (109) yields a feasible tableau. What is the minimum number of pivot steps needed to find the optimal tableau (112) from (109)?
(4) Solve: $\max\left\{ (2, 3, 4)x \;\middle|\; \begin{bmatrix} 0 & 2 & 3 \\ 1 & 1 & 2 \\ 1 & 2 & 3 \end{bmatrix} x \le \begin{bmatrix} 5 \\ 4 \\ 7 \end{bmatrix},\ x \ge 0,\ x \in \mathbb{R}^3 \right\}$. Sketch the set of feasible solutions in $\mathbb{R}^3$ and indicate the successive basic solutions found by the simplex method.
(5) Let $T$ be a basic feasible tableau corresponding to basis $B$. Show that $x^B$ is a vertex of $\{x \in \mathbb{R}^n \mid Ax = b,\ x \ge 0\}$, the feasible set of the problem corresponding to $T$.
(6) Let $T$ and $T^*$ be feasible tableaux corresponding to bases $B$ and $B^*$, so that $T^*$ arises from $T$ by one improving pivot step. Show that either $x^B = x^{B^*}$, or that $[x^B, x^{B^*}]$ is an edge of $\{x \in \mathbb{R}^n \mid Ax = b,\ x \ge 0\}$.
(7) Suppose $T$ is an optimal tableau. Is it possible to see from $T$ whether there is another optimal tableau, equivalent to $T$? If so, how? Is it possible to see from an optimal tableau whether there is more than one optimal solution?
(8) Solve the problem $\max\{cx \mid Ax \le b,\ x \ge 0\}$ for the following values of $A$, $b$, $c$.
  (a) $A = \begin{bmatrix} 2 & -5 & 1 \\ 2 & -1 & 2 \end{bmatrix}$, $b = \begin{bmatrix} -5 \\ 4 \end{bmatrix}$, $c = (-1, -3, -1)$.
  (b) $A = \begin{bmatrix} 1 & -11 & -5 & 18 \\ 1 & -3 & -1 & 2 \\ 1 & 0 & 0 & 0 \end{bmatrix}$, $b = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$, $c = (-19, 4, 17, 5)$.
  (c) $A = \begin{bmatrix} -1 & -1 \\ -1 & 1 \\ 1 & 2 \end{bmatrix}$, $b = \begin{bmatrix} -3 \\ -1 \\ 2 \end{bmatrix}$, $c = (1, 3)$.
  (d) $A = \begin{bmatrix} -1 & -1 \\ -1 & 1 \\ -1 & 2 \end{bmatrix}$, $b = \begin{bmatrix} -3 \\ -1 \\ 2 \end{bmatrix}$, $c = (1, 3)$.
  (e) $A = \begin{bmatrix} 1 & -2 \\ 1 & -1 \\ 2 & -1 \\ 1 & 0 \\ 2 & 1 \\ 1 & 1 \\ 1 & 2 \\ 0 & 1 \end{bmatrix}$, $b = \begin{bmatrix} 1 \\ 2 \\ 6 \\ 5 \\ 16 \\ 12 \\ 21 \\ 10 \end{bmatrix}$, $c = (3, 2)$.
(9) Suppose that $T$ is a basic dual feasible tableau, i.e. $c \le 0$. A dual pivot step is pivoting on $(i^*, j^*)$ selected as follows:
  (a) choose $i^*$ such that $b_{i^*} < 0$; and
  (b) choose $j^*$ such that $a_{i^*j^*} < 0$ and $c_{j^*}/a_{i^*j^*} = \min\{c_j/a_{i^*j} \mid a_{i^*j} < 0\}$.
  The dual simplex method is the repeated application of dual pivot steps to dual feasible tableaux until an optimal tableau is reached.
  (a) Show that if $T^*$ is obtained from $T$ by a dual pivot step, then $T^*$ is dual feasible and $d^* \le d$.
  (b) Solve the problems of exercise 8 with $c \le 0$ by the dual simplex method.
(10) By linear optimization duality and elementary matrix manipulation, we have
\[ \max\{cx \mid Ax \le b,\ x \ge 0\} = \min\{yb \mid yA \ge c,\ y \ge 0\} = -\max\{(-b^t)z \mid (-A)^t z \le (-c)^t,\ z \ge 0\}, \]

provided that the first maximization problem is feasible and bounded. Solve the duals of the problems of exercise 8 by the simplex method. When is solving the dual easier than solving the primal problem? (11) Complete the proof of Theorem 34. (12) Derive Theorem 22 from Theorem 34.

Part 2

Integer Linear Optimization

CHAPTER 5

Integrality
1. Linear diophantine equations

1.1. Linear diophantine equations. This section is all about proving the following 'theorem with an alternative', which is an analogue of Fredholm's Alternative (Theorem 3) and Farkas' Lemma (Theorem 9).

Theorem 35. Let $A$ be a rational $m \times n$ matrix and let $b \in \mathbb{Q}^m$. Then either
(1) there is an $x \in \mathbb{Z}^n$ such that $Ax = b$, or
(2) there is a $y \in \mathbb{Q}^m$ such that $yA \in \mathbb{Z}^n$ and $yb \notin \mathbb{Z}$,
but not both.

We prove this theorem below, after presenting two essential lemmas. The following operations on a matrix are integral column operations:
(1) exchanging two columns,
(2) multiplying a column by $-1$, and
(3) adding an integral multiple of a column to another column.
When a matrix $A'$ can be obtained from another matrix $A$ by integral column operations, we denote this by $A' \approx A$. The following lemma is easy to verify.

Lemma 36. Let $A, A'$ be rational $m \times n$ matrices such that $A \approx A'$ and let $b \in \mathbb{Q}^m$. Then
(1) $Ax = b$ for some $x \in \mathbb{Z}^n$ if and only if $A'x' = b$ for some $x' \in \mathbb{Z}^n$, and
(2) $yA \in \mathbb{Z}^n$ if and only if $yA' \in \mathbb{Z}^n$, for all $y \in \mathbb{Q}^m$.

We say that a m × n matrix H is in Hermite normal form if we can write H = B 0 where B is a nonnegative m × m lower triangular matrix such that the unique maximum entry in each row is located on the diagonal of B ; so if   b11 0 ··· 0 . ..   . . .  b22  b (132) B =  21 , ..  . . . 0  . bm1 bm2 · · · bmm where bii > bij ≥ 0 for all i, j such that j < i. Lemma 37. Let A be an integral m × n matrix with linearly independent rows. Then there is a matrix H ≈ A such that H is in Hermite normal form. Proof. We prove the Lemma by induction on the number of rows m. For an integral m × n matrix C , let σ (C ) := n j =1 |c1j |. If C has two nonzero elements in it’s top row, say 1k c1k ≥ c1j > 0, then subtracting the j -th column λ := ⌊ c c1j ⌋ times from the k -th column, we obtain a matrix C ′ ≈ C with σ (C ′ ) < σ (C ), since |c′ 1k | = |c1k − λc1j | < |c1j | ≤ |c1k |. Starting from A and applying such column operations while possible we obtain a sequence of integral
65

66

5. INTEGRALITY

matrices $A \approx A' \approx \cdots$ with $\sigma(A^{(p)}) > \sigma(A^{(p+1)})$ for all $p$. The sequence is finite as $\sigma(A^{(p)})$ is a nonnegative integer for all $p$, so in particular there are no more than $\sigma(A)$ matrices in the sequence. The final matrix $A^{(t)}$ cannot have more than one nonzero in the top row. By exchanging two columns in $A^{(t)}$, and/or multiplying the first column by $-1$ if necessary, we obtain
$$A \approx \begin{pmatrix} b_{11} & 0 \\ * & \tilde{A} \end{pmatrix}, \tag{133}$$
where $b_{11} \ge 0$ and $\tilde{A}$ is an $(m-1) \times (n-1)$ integral matrix with linearly independent rows. By induction there exists a matrix $\tilde{H} \approx \tilde{A}$ that is in Hermite normal form. It follows that
$$A \approx \begin{pmatrix} b_{11} & 0 \\ * & \tilde{H} \end{pmatrix}. \tag{134}$$
By subtracting a suitable integer multiple of the $i$-th column from the first column for $i = 2, 3, \ldots, m$ (in that order), we obtain a matrix $H \approx A$ in Hermite normal form.

Proof of Theorem 35. We show first that (1) and (2) cannot both be true. For if $x, y$ are as in (1), (2), then
$$\mathbb{Z} \ni (yA)x = y(Ax) = yb \notin \mathbb{Z}, \tag{135}$$
a contradiction. It remains to show that at least one of (1), (2) holds. Note that without loss of generality, we may assume that $A$ is integral, as multiplying both $A$ and $b$ by a nonzero $\lambda \in \mathbb{Z}$ does not affect the validity of (1) and (2). If the rows of $A$ are linearly dependent, then either one of the rows is redundant in both (1) and (2) (i.e. removing the row does not affect the validity of (1), (2)) or $Ax = b$ has no solutions at all, and then there is a rational $y \in \mathbb{Q}^m$ such that $yA = 0 \in \mathbb{Z}^n$ and $yb = \frac{1}{2}$. So we may assume that the rows of $A$ are linearly independent. By Lemma 37, there is a matrix $H \approx A$ that is in Hermite normal form, say $H = \begin{pmatrix} B & 0 \end{pmatrix}$. Since $A$ is integral with independent rows, it follows that $H$ is integral and has independent rows as well, so $B$ is nonsingular. By Lemma 36, to prove the theorem it suffices to show that either
(1) there is a $u \in \mathbb{Z}^m$ such that $Bu = b$, or
(2) there is a $y \in \mathbb{Q}^m$ such that $yB \in \mathbb{Z}^m$ and $yb \notin \mathbb{Z}$.

Since $B$ is nonsingular, the equation $Bu = b$ has a unique solution, namely $u = B^{-1}b$. Thus if $B^{-1}b \in \mathbb{Z}^m$ we are in case (1); if on the other hand $B^{-1}b \notin \mathbb{Z}^m$, say $(B^{-1}b)_i \notin \mathbb{Z}$, then taking $y$ equal to the $i$-th row of $B^{-1}$, we get $yB = (B^{-1}B)_i = e_i \in \mathbb{Z}^m$ and $yb = (B^{-1}b)_i \notin \mathbb{Z}$, and we are in case (2).

Theorem 35 has two corollaries that may be familiar from first-year Algebra. Their proofs are exercises. Let $a_1, \ldots, a_n \in \mathbb{Z}$. The greatest common divisor of $a_1, \ldots, a_n$ is
$$\gcd(a_1, \ldots, a_n) := \max\{d \in \mathbb{Z} \mid d \text{ divides each of } a_1, \ldots, a_n\}. \tag{136}$$

Corollary 37.1. Let $a_1, \ldots, a_n \in \mathbb{Z}$. Then $\lambda_1 a_1 + \cdots + \lambda_n a_n = \gcd(a_1, \ldots, a_n)$ for some $\lambda_1, \ldots, \lambda_n \in \mathbb{Z}$.

Let $a, b, d \in \mathbb{Z}$. By $a \equiv b \bmod d$ we denote that $a = b + \lambda d$ for some $\lambda \in \mathbb{Z}$.

Corollary 37.2 ('Chinese remainder Theorem'). Let $b_1, \ldots, b_m \in \mathbb{Z}$, and let $d_1, \ldots, d_m \in \mathbb{Z}$, where $\gcd(d_i, d_j) = 1$ for all $i \ne j$. Then there exists an $x \in \mathbb{Z}$ such that $x \equiv b_i \bmod d_i$ for $i = 1, \ldots, m$.
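For a single equation ($m = 1$) with a nonzero integral row $a$, the alternative of Theorem 35 reduces to a divisibility test: either $\gcd(a_1, \ldots, a_n)$ divides $b$, or $y = 1/\gcd(a_1, \ldots, a_n)$ is a certificate as in case (2). The following Python sketch illustrates this special case only; the function name `alternative` is a hypothetical choice and not part of the text.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def alternative(a, b):
    """Decide which case of Theorem 35 holds for the single equation a.x = b.

    `a` is a nonzero list of integers and `b` an integer (illustrative
    assumptions). Returns ("solvable", None) in case (1), or
    ("certificate", y) with a rational y as in case (2).
    """
    g = reduce(gcd, a)          # gcd of all entries, positive since a != 0
    if b % g == 0:
        return ("solvable", None)
    return ("certificate", Fraction(1, g))


# The example used later in this section: 29 x1 + 13 x2 = 1.
print(alternative([29, 13], 1))
# An unsolvable equation: 4 x1 + 6 x2 = 3.
kind, y = alternative([4, 6], 3)
print(kind, y, [y * ai for ai in [4, 6]], y * 3)
```

In the second call, $yA$ is integral while $yb$ is not, so no integral solution can exist.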


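1.2. Solving linear diophantine equations. In this section we explain how to compute the set of all integral solutions to a system of linear equations. An $n \times n$ matrix $U$ is unimodular if $U$ is integral and $|\det(U)| = 1$. It is an exercise to prove the following lemma.

Lemma 38. Let $U$ be an $n \times n$ matrix. The following are equivalent:
(1) $U$ is unimodular,
(2) $U \approx I$,
(3) $U^{-1}$ is unimodular, and
(4) $Ux \in \mathbb{Z}^n$ if and only if $x \in \mathbb{Z}^n$, for all $x \in \mathbb{R}^n$.

Let $A$ and $A'$ be two $m \times n$ matrices. By the lemma, we have
$$A \approx A' \iff \begin{pmatrix} A \\ I \end{pmatrix} \approx \begin{pmatrix} A' \\ U \end{pmatrix} \text{ for some } U \iff AU = A' \text{ for some unimodular matrix } U. \tag{137}$$
So applying integral column operations amounts to multiplying on the right by a unimodular matrix.

Lemma 39. Let $A$ be an integral $m \times n$ matrix with independent rows and let $b \in \mathbb{Z}^m$. Suppose that $H, U, B, V$ and $W$ are matrices such that
$$\begin{pmatrix} A \\ I \end{pmatrix} \approx \begin{pmatrix} H \\ U \end{pmatrix} = \begin{pmatrix} B & 0 \\ V & W \end{pmatrix}, \tag{138}$$
and such that $B$ is an $m \times m$ matrix. If $B^{-1}b$ is not integral, then $Ax = b$ has no integral solutions. Otherwise, $\{x \in \mathbb{Z}^n \mid Ax = b\} = \{v + Wy \mid y \in \mathbb{Z}^{n-m}\}$, where $v := VB^{-1}b$.

Proof. By (138), we have $AU = H$, and $U$ is unimodular since $U \approx I$. Hence $Hx' = b$ if and only if $A(Ux') = b$, and $Ux' \in \mathbb{Z}^n$ if and only if $x' \in \mathbb{Z}^n$. If $B^{-1}b$ is not integral, then $Hx' = b$ has no integral solutions, hence $Ax = b$ has no integral solutions. If $B^{-1}b$ is integral, we have $\{x' \in \mathbb{Z}^n \mid Hx' = b\} = \left\{ \begin{pmatrix} B^{-1}b \\ y \end{pmatrix} \mid y \in \mathbb{Z}^{n-m} \right\}$. It follows that
$$\{x \in \mathbb{Z}^n \mid Ax = b\} = U[\{x' \in \mathbb{Z}^n \mid Hx' = b\}] = \left\{ \begin{pmatrix} V & W \end{pmatrix} \begin{pmatrix} B^{-1}b \\ y \end{pmatrix} \mid y \in \mathbb{Z}^{n-m} \right\} = \{v + Wy \mid y \in \mathbb{Z}^{n-m}\}. \tag{139}$$

Thus to compute the vector $v \in \mathbb{Z}^n$ and the $n \times (n-m)$ matrix $W$ it suffices to apply the integral column operations that bring $A$ into Hermite normal form $H$ to the matrix $\begin{pmatrix} A \\ I \end{pmatrix}$. This yields $B, V, W$ and hence $v$.

As an example, consider the $1 \times 2$ matrix $A = \begin{pmatrix} 29 & 13 \end{pmatrix}$ and the vector $b = (1)$. We apply integral column operations:
$$\begin{pmatrix} 29 & 13 \\ 1 & 0 \\ 0 & 1 \end{pmatrix} \approx \begin{pmatrix} 3 & 13 \\ 1 & 0 \\ -2 & 1 \end{pmatrix} \approx \begin{pmatrix} 3 & 1 \\ 1 & -4 \\ -2 & 9 \end{pmatrix} \approx \begin{pmatrix} 0 & 1 \\ 13 & -4 \\ -29 & 9 \end{pmatrix} \approx \begin{pmatrix} 1 & 0 \\ -4 & 13 \\ 9 & -29 \end{pmatrix}. \tag{140}$$
Now $B = (1)$, $V = \begin{pmatrix} -4 \\ 9 \end{pmatrix}$, $W = \begin{pmatrix} 13 \\ -29 \end{pmatrix}$. So $B^{-1}b$ is integral, $v := VB^{-1}b = \begin{pmatrix} -4 \\ 9 \end{pmatrix}$ and
$$\{x \in \mathbb{Z}^2 \mid 29x_1 + 13x_2 = 1\} = \left\{ \begin{pmatrix} -4 \\ 9 \end{pmatrix} + \begin{pmatrix} 13 \\ -29 \end{pmatrix} y \mid y \in \mathbb{Z} \right\}. \tag{141}$$

The initiate will recognize the Euclidean algorithm in this procedure.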
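The procedure of Lemma 39 can be sketched in Python as follows. This is a minimal illustration, assuming an integral matrix with linearly independent rows given as a list of rows; the names `hermite_with_transform` and `integral_solutions` are hypothetical. The pivot choice (smallest absolute value in the current row) is one of several possibilities that make the quantity $\sigma$ from the proof of Lemma 37 decrease.

```python
from fractions import Fraction


def hermite_with_transform(A):
    """Return (H, U) with H = A U in Hermite normal form and U unimodular.

    Works on the stacked matrix [A; I], stored column by column, using only
    integral column operations (swap, negate, add an integral multiple).
    """
    m, n = len(A), len(A[0])
    cols = [[A[i][j] for i in range(m)] + [int(i == j) for i in range(n)]
            for j in range(n)]
    for i in range(m):
        while True:
            nz = [j for j in range(i, n) if cols[j][i] != 0]
            if not nz:
                raise ValueError("rows of A are linearly dependent")
            if len(nz) == 1:
                break
            p = min(nz, key=lambda j: abs(cols[j][i]))   # pivot column
            for k in nz:
                if k != p:
                    q = cols[k][i] // cols[p][i]
                    cols[k] = [x - q * y for x, y in zip(cols[k], cols[p])]
        p = nz[0]
        cols[i], cols[p] = cols[p], cols[i]              # move pivot to the diagonal
        if cols[i][i] < 0:
            cols[i] = [-x for x in cols[i]]
        for j in range(i):                               # reduce entries left of the diagonal
            q = cols[j][i] // cols[i][i]
            cols[j] = [x - q * y for x, y in zip(cols[j], cols[i])]
    H = [[cols[j][i] for j in range(n)] for i in range(m)]
    U = [[cols[j][m + i] for j in range(n)] for i in range(n)]
    return H, U


def integral_solutions(A, b):
    """Return (v, W) with {x integral | Ax = b} = {v + W y | y integral},
    or None if there is no integral solution."""
    m, n = len(A), len(A[0])
    H, U = hermite_with_transform(A)
    u = []
    for i in range(m):                                   # forward substitution in B u = b
        ui = Fraction(b[i] - sum(H[i][j] * u[j] for j in range(i)), H[i][i])
        if ui.denominator != 1:
            return None
        u.append(int(ui))
    v = [sum(U[r][j] * u[j] for j in range(m)) for r in range(n)]
    W = [[U[r][j] for j in range(m, n)] for r in range(n)]
    return v, W


print(integral_solutions([[29, 13]], [1]))
```

On the example above, this sketch performs the same column operations as in (140).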


Figure 1. Left: a lattice $L$ generated by two vectors. Right: parallel hyperplanes $H$ such that $L \subseteq H \not\ni \circ$.

2. Lattices

2.1. Definitions. Given vectors $a_1, \ldots, a_n \in \mathbb{R}^m$, we define the integer span as
$$\operatorname{int}\{a_1, \ldots, a_n\} := \{\lambda_1 a_1 + \cdots + \lambda_n a_n \mid \lambda_1, \ldots, \lambda_n \in \mathbb{Z}\}. \tag{142}$$
If $L = \operatorname{int}\{a_1, \ldots, a_n\}$, then $L$ is said to be generated by $a_1, \ldots, a_n$. We say that a set $L \subseteq \mathbb{R}^n$ is a lattice if $L = \operatorname{int}\{a_1, \ldots, a_n\}$ for some linearly independent set $\{a_1, \ldots, a_n\}$ (see Figure 1). A lattice $L$ is full-dimensional if there is no hyperplane $H$ such that $L \subseteq H$. Equivalently, $L \subseteq \mathbb{R}^n$ is full-dimensional if $L$ is generated by a basis of $\mathbb{R}^n$. Any set of rational vectors, whether linearly independent or not, generates a lattice.

Lemma 40. Let $a_1, \ldots, a_n \in \mathbb{Q}^m$. Then $L = \operatorname{int}\{a_1, \ldots, a_n\}$ is a lattice.

Proof. We may assume that $\operatorname{lin.hull}\{a_1, \ldots, a_n\} = \mathbb{R}^m$. Let $A$ be the matrix with $i$-th column $a_i$ for $i = 1, \ldots, n$. Then $L = \{Ax \mid x \in \mathbb{Z}^n\}$, and $A$ has linearly independent rows. By Lemma 36(1), we have $\{Ax \mid x \in \mathbb{Z}^n\} = \{A'x' \mid x' \in \mathbb{Z}^n\}$ if $A \approx A'$, and by Lemma 37 there exists an $H \approx A$ in Hermite normal form, i.e. $H = \begin{pmatrix} B & 0 \end{pmatrix}$ where $B$ is a nonsingular matrix. So $L = \{Bu \mid u \in \mathbb{Z}^m\}$, and $L$ is generated by the linearly independent columns of $B$.

Irrational vectors may generate sets that are not lattices. It is an exercise to show that $\operatorname{int}\{1, \sqrt{2}\}$ is not a lattice.

2.2. The dual lattice. Let $a_1, \ldots, a_n, b \in \mathbb{Q}^m$, and let $L$ be the lattice generated by $a_1, \ldots, a_n$. Theorem 35 is equivalent to: either
(1) $b \in L$, or
(2) there is a $y \in \mathbb{Q}^m$ such that $L \subseteq H_y$ and $b \notin H_y$,
but not both, where $H_y := \{x \in \mathbb{R}^n \mid y^t x \in \mathbb{Z}\} = \bigcup_{z \in \mathbb{Z}} H_{y,z}$ is a set of parallel hyperplanes (see Figure 1). The dual of a lattice $L$ is defined as $L^\dagger := \{y \in \mathbb{R}^n \mid y^t x \in \mathbb{Z} \text{ for all } x \in L\}$. Thus $L^\dagger = \{y \in \mathbb{R}^n \mid L \subseteq H_y\}$. We have:

Lemma 41. Let $L$ be a full-dimensional lattice. Then $L^{\dagger\dagger} = L$.

Proof. Let $L$ be a full-dimensional lattice, say $L = \{Bu \mid u \in \mathbb{Z}^n\}$ for some nonsingular matrix $B$. Then $L^\dagger = \{(B^{-1})^t w \mid w \in \mathbb{Z}^n\}$. For suppose that $y, w \in \mathbb{R}^n$ are such that $y = (B^{-1})^t w$. Then
$$y \in L^\dagger \iff y^t x \in \mathbb{Z}\ \forall x \in L \iff ((B^{-1})^t w)^t Bu \in \mathbb{Z}\ \forall u \in \mathbb{Z}^n \iff w^t u \in \mathbb{Z}\ \forall u \in \mathbb{Z}^n. \tag{143}$$
Thus $y \in L^\dagger$ if and only if $w \in \mathbb{Z}^n$, as required. It follows that $L^{\dagger\dagger} = \{Bu \mid u \in \mathbb{Z}^n\}^{\dagger\dagger} = \{(B^{-1})^t w \mid w \in \mathbb{Z}^n\}^\dagger = \{Bu \mid u \in \mathbb{Z}^n\} = L$.
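The description $L^\dagger = \{(B^{-1})^t w \mid w \in \mathbb{Z}^n\}$ from the proof of Lemma 41 is easy to check on small examples. The sketch below uses exact rational arithmetic; the helper names `inverse` and `dual_basis` and the sample matrix are illustrative assumptions. The columns of the matrix passed in are taken as generators of $L$.

```python
from fractions import Fraction


def inverse(B):
    """Exact inverse of a nonsingular square matrix by Gauss-Jordan elimination."""
    n = len(B)
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(B)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)   # fails if B is singular
        M[c], M[p] = M[p], M[c]
        M[c] = [x / M[c][c] for x in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]


def dual_basis(B):
    """Return (B^{-1})^t; its columns generate the dual of {Bu | u integral}."""
    return [list(col) for col in zip(*inverse(B))]


B = [[2, 1],
     [0, 3]]
D = dual_basis(B)
# Inner products of dual generators with generators of L: D^t B should be I.
print([[sum(D[k][i] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)])
# Dualizing twice returns the original generators, in line with Lemma 41.
print(dual_basis(D) == B)
```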


2.3. The determinant of a lattice. The following lemma will allow us to define an invariant of lattices.

Lemma 42. Let $A$ and $A'$ be $m \times n$ matrices with independent columns. Then $A \approx A'$ if and only if $\{Ax \mid x \in \mathbb{Z}^n\} = \{A'x \mid x \in \mathbb{Z}^n\}$.

Proof. It is easy to see that if $A \approx A'$, then $\{Ax \mid x \in \mathbb{Z}^n\} = \{A'x \mid x \in \mathbb{Z}^n\}$. So let us assume that $\{Ax \mid x \in \mathbb{Z}^n\} = \{A'x \mid x \in \mathbb{Z}^n\}$. Then each column of $A$ is an integral combination of columns of $A'$ and vice versa, hence there are integral $n \times n$ matrices $U$ and $U'$ such that $AU = A'$ and $A = A'U'$. Then $AUU' = A'U' = A$, hence $UU' = I$ as $A$ has independent columns. But then $U' = U^{-1}$, and hence $\det(U) \in \mathbb{Z}$ and $\det(U)^{-1} = \det(U^{-1}) \in \mathbb{Z}$, hence $|\det(U)| = 1$. It follows that $U$ is unimodular, and $AU = A'$, so $A \approx A'$.

Let $L$ be a lattice, and let $A$ be a matrix with linearly independent columns such that $L = \{Ax \mid x \in \mathbb{Z}^n\}$. The determinant of $L$ is defined as
$$d(L) := \sqrt{\det(A^t A)}. \tag{144}$$
The determinant does not depend on the choice of $A$: for if $A'$ is some other matrix such that $L = \{A'x \mid x \in \mathbb{Z}^n\}$, then by the lemma $AU = A'$ for some unimodular $U$, and then
$$\det((A')^t A') = \det(U^t A^t A U) = \det(U^t) \det(A^t A) \det(U) = \det(A^t A). \tag{145}$$
If $L$ is full-dimensional, then the columns of $A$ are a basis of $\mathbb{R}^n$, hence $A$ is square and $d(L) = \sqrt{\det(A^t A)} = |\det(A)|$. It is an exercise to show that $d(L)d(L^\dagger) = 1$ for any full-dimensional lattice $L$.

2.4. Examples of lattices. We include the description of a few special lattices.
(1) $\mathbb{Z}^n$, the cubic lattice;
(2) $A_n := \{x \in \mathbb{Z}^{n+1} \mid \sum_i x_i = 0\}$. $A_2$ is the hexagonal lattice, and $A_3$ is the face-centered cubic lattice;
(3) $D_n := \{x \in \mathbb{Z}^n \mid \sum_i x_i \equiv 0 \bmod 2\}$, the checkerboard lattice;
(4) The 8-dimensional lattice $E_8 := D_8 \cup (D_8 + \frac{1}{2}\mathbf{1})$;
(5) The 24-dimensional Leech lattice $\Lambda_{24}$, generated by the rows of the following matrix (we have omitted the 0's for readability)
                                         8 4 4 4 4 4 4 2 4 4 4 2 4 2 2 2 4 2 2 2 4 4 4 4 4 2 2 2 2 2 4 2 2 4 4 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 1 2 1 1 2 2 2 1 2 1 2 1 1 4 2 2 4 2 2 2 2 2 2 4 2 2 2 2 2 2 1 2 2 2 2 1 2 1 1 2 2 2 1 2 1 2 1 1                      .                   

−3

1

1

1

1

1

1

1
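The invariance argument in (145) can be checked numerically. The sketch below computes $\det(A^t A)$, i.e. $d(L)^2$, with exact arithmetic for the lattices $A_2$ and $A_3$, and verifies that an integral column operation does not change it. The generators $e_i - e_{i+1}$ used for $A_n$ are an assumption of this example (a standard choice, not taken from the text), as are the helper names.

```python
from fractions import Fraction


def det(M):
    """Exact determinant by Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in M]
    n, result = len(M), Fraction(1)
    for c in range(n):
        p = next((r for r in range(c, n) if M[r][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:
            M[c], M[p] = M[p], M[c]
            result = -result
        result *= M[c][c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return result


def gram_det(columns):
    """det(A^t A), where the given vectors are the columns of A."""
    return det([[sum(x * y for x, y in zip(u, v)) for v in columns] for u in columns])


def a_n_generators(n):
    """Vectors e_i - e_{i+1} (i = 1..n) in Z^{n+1}; an assumed basis of A_n."""
    return [[int(r == i) - int(r == i + 1) for r in range(n + 1)] for i in range(n)]


cols = a_n_generators(2)
print(gram_det(cols))                      # d(A_2)^2
# Add 5 times the second column to the first: an integral column operation.
changed = [[x + 5 * y for x, y in zip(cols[0], cols[1])], cols[1]]
print(gram_det(changed) == gram_det(cols))
```

Under the stated assumption on the generators, the first value is $d(A_2)^2$, so $d(A_2)$ is its square root.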


3. Lattices and convex bodies

3.1. Minkowski's Theorem. A set of points of sufficiently high volume must contain two points whose difference is in a given lattice.

Lemma 43 (Blichfeldt). Let $C \subseteq \mathbb{R}^n$ be a compact set, and let $L$ be a full-dimensional lattice. If $\operatorname{Volume}(C) \ge d(L)$, then there exist distinct $x, y \in C$ such that $x - y \in L$.

Proof. Since $L$ is a full-dimensional lattice, we may assume that $L = \{Bu \mid u \in \mathbb{Z}^n\}$ for some nonsingular matrix $B$. Let $F := \{Bu \mid u \in [0, 1)^n\}$. Then for each point $x \in \mathbb{R}^n$ there is a unique $l \in L$ such that $l + x \in F$ (exercise). Suppose that there do not exist distinct $x, y \in C$ such that $x - y \in L$. Then the sets $l + C$ for $l \in L$ are pairwise disjoint, and for each $x \in C$, there is a unique $l \in L$ so that $l + x \in F$. Thus
$$\operatorname{Volume}(C) = \sum_{l \in L} \operatorname{Volume}((l + C) \cap F) \le \operatorname{Volume}(F) = d(L). \tag{146}$$

In fact, we cannot have equality in (146), for then the sets $\{(l + C) \cap F \mid l \in L\}$ would partition $F$, which is impossible (exercise). Thus $\operatorname{Volume}(C) < d(L)$, as required.

Consequently, any centrally symmetric convex set of sufficiently high volume contains a nonzero vector in a given lattice.

Theorem 44 (Minkowski). Let $C \subseteq \mathbb{R}^n$ be a compact, convex set such that $-x \in C$ for all $x \in C$, and let $L \subseteq \mathbb{R}^n$ be a full-dimensional lattice. If $\operatorname{Volume}(C) \ge 2^n d(L)$, then $C \cap L$ contains a nonzero vector.

Proof. Apply Lemma 43 to $C$ and the lattice $2L := \{2x \mid x \in L\}$. Since $\operatorname{Volume}(C) \ge 2^n d(L) = d(2L)$, it follows that there are distinct $x, y \in C$ such that $x - y \in 2L$. Hence $\frac{1}{2}(x - y) \in L \setminus \{0\}$. As $y \in C$, we have $-y \in C$, and since $C$ is convex, it follows that $\frac{1}{2}(x - y) \in [x, -y] \subseteq C$. Hence $\frac{1}{2}(x - y) \in C \cap L \setminus \{0\}$, as required.

Let $V_n$ be the volume of the $n$-dimensional unit ball, i.e. $V_n := \operatorname{Volume}(B^n(0, 1))$. It is an exercise to show that $V_n = \frac{\pi^{n/2}}{(n/2)!}$ if $n$ is even and $V_n = \frac{2^n \pi^{(n-1)/2} ((n-1)/2)!}{n!}$ if $n$ is odd. Minkowski's Theorem yields an upper bound on the length of a shortest nonzero vector in a lattice.

Theorem 45. Let $L$ be an $n$-dimensional lattice. Then there exists a nonzero $x \in L$ such that $\|x\| \le 2\sqrt[n]{d(L)/V_n}$.

Proof. Without loss of generality, $L$ is full-dimensional. Let $C = B^n(0, 2\sqrt[n]{d(L)/V_n})$. Then $C$ is a convex body with $-x \in C$ for all $x \in C$ and $\operatorname{Volume}(C) = 2^n d(L)$. Hence by Theorem 44, there is some $x \in L \cap C \setminus \{0\}$, as required.

For any lattice $L$, let $r(L) := \min\{\|x\| \mid x \in L, x \ne 0\}$ be the length of the shortest nonzero vector, and let $\alpha_n := \max\{r(L) \mid \operatorname{rank}(L) = n \text{ and } d(L) = 1\}$, i.e. $\alpha_n$ is the smallest number so that $r(L) \le \alpha_n \sqrt[n]{d(L)}$ for all lattices $L$ of rank $n$. The above theorem states that
$$\alpha_n \le 2V_n^{-1/n}. \tag{147}$$
For small values of $n$, $\alpha_n$ is known exactly:

n           1    2       3     4    5      6       7    8
α_n^n       1    2/√3    √2    2    2√2    8/√3    8    16

For the lattices $L = A_2, A_3, D_4, D_5, E_8$ we have $r(L) = \alpha_n \sqrt[n]{d(L)}$, where $n = \operatorname{rank}(L)$.
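To get a feeling for how far the bound (147) is from the exact values in the table, one can evaluate both numerically. The following sketch uses the closed formulas for $V_n$ quoted above; the function names and the dictionary `ALPHA_POW_N` (the table row $\alpha_n^n$) are illustrative.

```python
from math import factorial, pi, sqrt


def unit_ball_volume(n):
    """Volume of the n-dimensional unit ball, using the even/odd formulas."""
    if n % 2 == 0:
        return pi ** (n // 2) / factorial(n // 2)
    k = (n - 1) // 2
    return 2 ** n * pi ** k * factorial(k) / factorial(n)


def minkowski_bound(n):
    """Right-hand side of (147): 2 * V_n^(-1/n)."""
    return 2 * unit_ball_volume(n) ** (-1 / n)


# Values of alpha_n^n for n = 1..8, as listed in the table.
ALPHA_POW_N = {1: 1, 2: 2 / sqrt(3), 3: sqrt(2), 4: 2,
               5: 2 * sqrt(2), 6: 8 / sqrt(3), 7: 8, 8: 16}

for n in range(1, 9):
    alpha = ALPHA_POW_N[n] ** (1 / n)
    print(n, round(alpha, 4), round(minkowski_bound(n), 4))
```

For $n = 1$ the two values coincide; for larger $n$ the bound from Minkowski's Theorem is strictly larger than $\alpha_n$.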


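3.2. The maximal distance to a lattice. We now come to a related problem: given a full-dimensional lattice $L \subseteq \mathbb{R}^n$, what is the maximal distance of a point $z \in \mathbb{R}^n$ to the closest lattice point?

Lemma 46. Let $L \subseteq \mathbb{R}^n$ be a lattice of rank $n$ with $r(L^\dagger) \ge 1/2$. Then for each $z \in \mathbb{R}^n$ there is an $l \in L$ such that
$$\|z - l\| \le \sqrt{\sum_{k=1}^{n} \alpha_k^4}. \tag{148}$$

Proof. We prove the lemma by induction on $n$, the case $n = 1$ being easy. So assume that $r(L^\dagger) \ge 1/2$. Since $d(L)d(L^\dagger) = 1$, we have
$$r(L)\, r(L^\dagger) \le \alpha_n \sqrt[n]{d(L)} \cdot \alpha_n \sqrt[n]{d(L^\dagger)} = \alpha_n^2.$$
It follows that $r(L) \le 2\alpha_n^2$. Let $b \in L \setminus \{0\}$ be a vector with $\|b\| \le 2\alpha_n^2$. For $v \in \mathbb{R}^n$, let $\tilde{v}$ be the orthogonal projection of $v$ on the hyperplane $H := \{x \in \mathbb{R}^n \mid b^t x = 0\}$. Then $\tilde{L} := \{\tilde{l} \mid l \in L\}$ is a lattice with $\tilde{L}^\dagger = \{y \in L^\dagger \mid y \perp b\}$, so that $r(\tilde{L}^\dagger) \ge r(L^\dagger) \ge 1/2$. Thus by induction on $n$, there is an $\hat{l} \in \tilde{L}$ such that $\|\tilde{z} - \hat{l}\| \le \sqrt{\sum_{k=1}^{n-1} \alpha_k^4}$.

Let $\ell := \{\hat{l} + \lambda b \mid \lambda \in \mathbb{R}\} = \{x \in \mathbb{R}^n \mid \tilde{x} = \hat{l}\}$, and let $x$ be the point of $\ell$ closest to $z$. Then $\tilde{x} = \hat{l}$, and $\|z - x\| = \|\tilde{z} - \hat{l}\| \le \sqrt{\sum_{k=1}^{n-1} \alpha_k^4}$. Let $l$ be the point of $L \cap \ell$ closest to $x$. Then $\|x - l\| \le \frac{1}{2}\|b\| \le \alpha_n^2$. Since $x - l \perp z - x$, we have
$$\|z - l\|^2 = \|z - x\|^2 + \|x - l\|^2 \le \sum_{k=1}^{n-1} \alpha_k^4 + \alpha_n^4 = \sum_{k=1}^{n} \alpha_k^4, \tag{149}$$
as required.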

For any lattice $L \subseteq \mathbb{R}^n$, let $R(L) := \max\{d(z, L) \mid z \in \mathbb{R}^n\}$, and let
$$\beta_n := \max\{R(L) \mid \operatorname{rank}(L) = n,\ r(L^\dagger) \ge \tfrac{1}{2}\}. \tag{150}$$
Then
$$\beta_n \le \sqrt{\sum_{k=1}^{n} \alpha_k^4} \le \sqrt{\sum_{k=1}^{n} 16 V_k^{-4/k}}$$
by Lemma 46 and Theorem 45. By other methods, it can be shown that $\beta_n \le n$, whereas our exposition yields $\beta_n \le cn^{3/2}$ for some constant $c$ independent of $n$ (exercise).

3.3. Khinchine's Theorem. A convex body is a compact, convex set. Given a lattice $L \subseteq \mathbb{R}^n$, the lattice width of a convex body $C \subseteq \mathbb{R}^n$ is
$$w(C, L) := \min\{\max\{wx \mid x \in C\} - \min\{wx \mid x \in C\} \mid w \in L^\dagger \setminus \{0\}\}. \tag{151}$$
It is an exercise to show that if $\tau : \mathbb{R}^n \to \mathbb{R}^n$ is a linear bijection, then $w(\tau[C], \tau[L]) = w(C, L)$. A set $E \subseteq \mathbb{R}^n$ is an ellipsoid if there is some linear bijection $\tau$ such that $\tau[E]$ is an $n$-dimensional ball.

Lemma 47. If $w(E, L) \ge \beta_n$ then $E \cap L \ne \emptyset$, for any ellipsoid $E$ and full-dimensional lattice $L$ in $\mathbb{R}^n$.

Proof. The validity of this statement remains invariant if we apply a linear bijection to both $E$ and $L$. Thus without loss of generality, $E = B^n(z, \beta_n)$ for some $z \in \mathbb{R}^n$. Suppose $w(E, L) \ge \beta_n$. Then for any $w \in L^\dagger \setminus \{0\}$, we have
$$\beta_n \le w(E, L) \le \max\{wx \mid x \in E\} - \min\{wx \mid x \in E\} = 2\beta_n \|w\|. \tag{152}$$
It follows that $\|w\| \ge 1/2$ for all $w \in L^\dagger \setminus \{0\}$, i.e. $r(L^\dagger) \ge 1/2$. By Lemma 46 there is an $l \in L$ with $\|z - l\| \le \beta_n$, i.e. $l \in E \cap L$, as required.
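For a polytope given by its vertices and the lattice $L = \mathbb{Z}^n$ (whose dual is again $\mathbb{Z}^n$), the lattice width (151) can be explored by brute force, since a linear function attains its maximum and minimum over a polytope at vertices. The sketch below searches only integral directions $w$ with $\max_i |w_i| \le K$; this search radius `K` is an assumption of the example, so in general the result is only an upper bound on $w(C, \mathbb{Z}^n)$. The function name is hypothetical.

```python
from itertools import product


def lattice_width_Zn(vertices, K=3):
    """Smallest value of max wx - min wx over the polytope conv(vertices),
    taken over nonzero integral w with all |w_i| <= K (assumed search box)."""
    n = len(vertices[0])
    best = None
    for w in product(range(-K, K + 1), repeat=n):
        if any(w):                                   # skip w = 0
            vals = [sum(wi * xi for wi, xi in zip(w, x)) for x in vertices]
            width = max(vals) - min(vals)
            if best is None or width < best:
                best = width
    return best


print(lattice_width_Zn([(0, 0), (1, 0), (0, 1)]))          # a unit triangle
print(lattice_width_Zn([(0, 0), (3, 0), (0, 3), (3, 3)]))  # a 3 x 3 square
```

For these two examples the restricted search already attains the true width, because every nonzero integral direction gives width at least 1 on the triangle and at least 3 on the square.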


Each convex body can be 'approximated' by an ellipsoid. This will allow us to derive a similar statement for general convex bodies.

Lemma 48. Let $C$ be a convex body. Then there are ellipsoids $E, E'$ with common center $z$ such that $E \subseteq C \subseteq E'$ and $E' = \{n(x - z) + z \mid x \in E\}$.

In the above lemma, $E'$ is a homothetic dilation of $E$, obtained by 'blowing up' $E$ by a factor of $n$. Combining Lemma 47 and Lemma 48, we obtain Khinchine's Flatness Theorem.

Theorem 49 (Khinchine, 1948). If $w(C, L) \ge n\beta_n$ then $C \cap L \ne \emptyset$, for any convex body $C$ and full-dimensional lattice $L$ in $\mathbb{R}^n$.

Proof. Write $\gamma_n := n\beta_n$. Let $L$ be a lattice and let $C$ be a convex body such that $w(C, L) \ge \gamma_n$. There are ellipsoids $E, E'$ as in Lemma 48; thus $w(E, L) = \frac{1}{n} w(E', L) \ge \frac{1}{n} w(C, L) \ge \gamma_n / n = \beta_n$. By Lemma 47, $E$ contains a lattice point. As $E \subseteq C$, $C$ contains a lattice point as well.

Exercises
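(1) Let $a, b \in \mathbb{Z} \setminus \{0\}$. Show that $|a - \lfloor a/b \rfloor b| < |b|$, and that $|a - \lfloor a/b + \frac{1}{2} \rfloor b| \le \frac{1}{2}|b|$.
(2) Let $a_1, \ldots, a_n \in \mathbb{Z}$. Show that $\lambda_1 a_1 + \cdots + \lambda_n a_n = \gcd(a_1, \ldots, a_n)$ for some $\lambda_1, \ldots, \lambda_n \in \mathbb{Z}$.
(3) Show that $\min\{ax + by \mid ax + by \ge 1,\ x, y \in \mathbb{Z}\} = \gcd(a, b)$, for any $a, b \in \mathbb{Z} \setminus \{0\}$.
(4) Let $b_1, \ldots, b_m \in \mathbb{Z}$, and let $d_1, \ldots, d_m \in \mathbb{Z}$, where $\gcd(d_i, d_j) = 1$ for all $i \ne j$. Show that there exists an $x \in \mathbb{Z}$ such that $x \equiv b_i \bmod d_i$ for $i = 1, \ldots, m$. Hint: show that 'if $y_i d_i \in \mathbb{Z}$ for all $i$ and $\sum_i y_i \in \mathbb{Z}$, then $y$ is integral' (assuming that $\gcd(d_i, d_j) = 1$ for all $i \ne j$).
(5) Prove Lemma 38.
(6) Show that the following statements are equivalent for any $x, y \in \mathbb{Z}^n$:
    (a) $\gcd(x_1, \ldots, x_n) = \gcd(y_1, \ldots, y_n)$; and
    (b) there is a unimodular $n \times n$ matrix $U$ such that $y = Ux$.
(7) For any integral $m \times n$ matrix $A$, let $\psi(A)$ be the greatest common divisor of the determinants of $m \times m$ submatrices of $A$. Show that $A \approx B$ if and only if $\psi(A) = \psi(B)$. Prove that $\psi(A) = d(\{Ax \mid x \in \mathbb{Z}^n\})$.
(8) Find out whether $Ax = b$ has an integral solution.
    (a) $A = \begin{pmatrix} 8 & 9 \end{pmatrix}$, $b = 3$.
    (b) $A = \begin{pmatrix} 8 & 10 & 38 \end{pmatrix}$, $b = 3$.
    (c) $A = \begin{pmatrix} 121 & 22 & 14 & 7 \end{pmatrix}$, $b = 12$.
    (d) $A = \begin{pmatrix} 2 & 4 & 3 \\ 8 & -3 & 7 \end{pmatrix}$, $b = \begin{pmatrix} 5 \\ 6 \end{pmatrix}$.
    (e) $A = \ldots$, $b = \ldots$ [entries in reading order, row layout lost: 9 7 −1 18 6 / −3 11 10 0].
    (f) $A = \ldots$, $b = \ldots$ [entries in reading order, row layout lost: 1 / 2 6 11 −4 3 / 2 2 −1 / 3 / 0 −4 5 13 1 / 5 0 −5 −4].
(9) Find out whether $Ax = b$ has an integral solution. If so, compute the set of all integral solutions.
    (a) $A = \begin{pmatrix} 76 & 42 \end{pmatrix}$, $b = 3$.
    (b) $A = \begin{pmatrix} 76 & 42 \end{pmatrix}$, $b = 6$.
    (c) $A = \begin{pmatrix} 76 & 42 \end{pmatrix}$, $b = 12$.
    (d) $A = \begin{pmatrix} 11 & 71 \\ 3 & 13 \end{pmatrix}$, $b = \begin{pmatrix} 16 \\ -1 \end{pmatrix}$.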


    (e) $A = \begin{pmatrix} 11 & 71 \\ 3 & 13 \end{pmatrix}$, $b = \begin{pmatrix} 15 \\ 3 \end{pmatrix}$.
    (f) $A = \begin{pmatrix} 10 & 67 & 12 \\ 2 & 18 & 4 \end{pmatrix}$, $b = \begin{pmatrix} 4 \\ 4 \end{pmatrix}$.
(10) Solve the Frobenius problem: given integers $a, b \ge 2$ such that $\gcd(a, b) = 1$, find the largest integer $n$ such that $n \notin \{\alpha a + \beta b \mid \alpha, \beta \in \mathbb{Z},\ \alpha, \beta \ge 0\}$.
(11) Consider $R := \mathbb{Z}[\omega] = \{a + \omega b \mid a, b \in \mathbb{Z}\}$ and $K := \mathbb{Q}[\omega] = \{p + \omega q \mid p, q \in \mathbb{Q}\}$, where $\omega := e^{\imath 2\pi/3} = -\frac{1}{2} + \imath \frac{\sqrt{3}}{2}$. Verify that $R$ is a ring and that $K$ is a field. Show that for any $m \times n$ matrix $A$ with entries in $R$ and $b \in R^m$, exactly one of the following holds.
    (a) There is an $x \in R^n$ such that $Ax = b$.
    (b) There is a $y \in K^m$ such that $yA \in R^n$ and $yb \notin R$.
    Can you find other pairs $(R, K)$ for which the above statement holds?
(12) A set $L \subseteq \mathbb{R}^n$ is an additive subgroup if $-a \in L$ for all $a \in L$, and $a + b \in L$ for all $a, b \in L$. An additive subgroup $L$ is discrete if $\inf\{\|a - b\| \mid a, b \in L,\ a \ne b\} > 0$. Show that $L$ is a lattice if and only if $L$ is a discrete additive subgroup.
(13) Let $a, b, c, d \in \mathbb{Z}$ be such that for all $u, v \in \mathbb{Z}$ there exist $x, y \in \mathbb{Z}$ such that $ax + by = u$ and $cx + dy = v$. Show that $ad - bc = \pm 1$.
(14) Suppose $a_1, \ldots, a_n \in \mathbb{R}^m$. Let $U := \operatorname{lin.hull}\{a_1, \ldots, a_n\}$, and let $k := \dim U$.
    (a) Show that there is a linear bijection $l : U \to \mathbb{R}^k$.
    (b) Let $a'_i := l(a_i)$ for all $i$. Show that $\operatorname{int}\{a_1, \ldots, a_n\}$ is a lattice if and only if $\operatorname{int}\{a'_1, \ldots, a'_n\}$ is a lattice.
    (c) Let $b_1, \ldots, b_n \in U$, and let $b'_i := l(b_i)$ for all $i$. Show that $\operatorname{int}\{a_1, \ldots, a_n\} = \operatorname{int}\{b_1, \ldots, b_n\}$ if and only if $\operatorname{int}\{a'_1, \ldots, a'_n\} = \operatorname{int}\{b'_1, \ldots, b'_n\}$.
(15) Show that $\operatorname{int}\{1, \sqrt{2}\}$ is not a discrete subset of $\mathbb{R}^1$.
(16) Let $B$ be a nonsingular $n \times n$ matrix, and let $L = \{Bx \mid x \in \mathbb{Z}^n\}$. Show that $L^\dagger = \{(B^{-1})^t x \mid x \in \mathbb{Z}^n\}$, and that $d(L^\dagger) = d(L)^{-1}$.
(17) Let $a \in \mathbb{Z}^n$ be a row vector and let $L := \{x \in \mathbb{Z}^n \mid ax = 0\}$. Show that $d(L) = \|a\|$.
(18) Show that $A_n, D_n, E_8, \Lambda_{24}$ are indeed lattices. Compute $d(L), r(L), r(L^\dagger)$ for $L = A_n, D_n, E_8$ and $\Lambda_{24}$.
(19) Let $L = \{Bu \mid u \in \mathbb{Z}^n\}$ for some nonsingular matrix $B$. Let $F := \{Bu \mid u \in [0, 1)^n\}$. Show that for each point $x \in \mathbb{R}^n$ there is a unique $l \in L$ such that $l + x \in F$.
(20) Show that in the proof of Lemma 43 the sets $\{(l + C) \cap F \mid l \in L\}$ cannot partition $F$.
(21) Let $a \in \mathbb{Z}^n$ be a row vector. Show that there is a nonzero $x \in \mathbb{Z}^n$ such that $ax = 0$ and $\|x\| \le 2(\|a\|/V_{n-1})^{1/(n-1)}$.
(22) Show that $V_n = \frac{\pi^{n/2}}{(n/2)!}$ if $n$ is even and $V_n = \frac{2^n \pi^{(n-1)/2} ((n-1)/2)!}{n!}$ if $n$ is odd. Hint: show that $V_n = \frac{2\pi}{n} V_{n-2}$ by integrating over the 2-dimensional unit ball.
(23) Show that there exists a constant $c_0 \in \mathbb{R}$ such that $V_n^{-2/n} \le c_0 n$ for all $n \in \mathbb{N}$. Hint: use the previous exercise and that
    $$\sqrt{2\pi}\, n^{n+1/2} \exp(-n + 1/(12n + 1)) < n! < \sqrt{2\pi}\, n^{n+1/2} \exp(-n + 1/(12n))$$
    (this is a refinement of Stirling's formula).
(24) Show that there is a constant $c_1 \in \mathbb{R}$ such that $\beta_n \le c_1 n^{3/2}$ for all $n \in \mathbb{N}$.
(25) Derive from Lemma 46 that if $L \subseteq \mathbb{R}^n$ is a full-dimensional lattice and $z \in \mathbb{R}^n$ then there exists an $l \in L$ such that $\|z - l\| \le \beta_n / (2 r(L^\dagger))$.
(26) Show that if $\tau : \mathbb{R}^n \to \mathbb{R}^n$ is a linear bijection, then $w(\tau[C], \tau[L]) = w(C, L)$.
(27) Show that if $C$ is a convex body, then there are simplices $S, S'$ with common center $z$ so that $S \subseteq C \subseteq S'$ and $S' = \{n(x - z) + z \mid x \in S\}$. Hint: take $S$ a simplex of


maximum volume contained in $C$. (This is a variant of Lemma 48 which is easier to prove.)
(28) Let $L = \{Ax \mid x \in \mathbb{Z}^n\}$, where $A$ has linearly independent columns, and let $r \in \mathbb{R}$. Show that if $a \in L$ is such that $\|a\| < r$, then $a \in \{Ax \mid x \in \mathbb{Z}^n,\ \|x\| < r/\sqrt{\lambda_1(A^t A)}\}$.

CHAPTER 6

Integer linear optimization

1. Integer linear optimization

1.1. Overview. Integer linear optimization is optimizing a linear objective over all integral vectors satisfying given linear equations and/or inequalities. All such problems can be reduced to the standard form
$$\max\{cx \mid Ax \le b,\ x \in \mathbb{Z}^n\}. \tag{153}$$
The feasible set of this problem is the intersection of the lattice $\mathbb{Z}^n$ with the polyhedron $P := \{x \in \mathbb{R}^n \mid Ax \le b\}$ (see Figure 1). No algorithm is known that solves integer linear optimization problems efficiently (i.e. in polynomial time), and it is generally believed that no such algorithm can exist¹. Nevertheless, there is an algorithm that solves any integer linear optimization problem in finite time, the branch & bound algorithm. Used with discretion, this algorithm is a very powerful tool to solve integer linear optimization problems in practice. We describe the branch & bound algorithm in the final section of this chapter.

Many combinatorial optimization problems can be formulated as an integer linear optimization problem. Certain well-structured problems can be solved to a much greater extent than the general problem. We describe such a well-solved problem in section 2 of this chapter.

¹ Integer linear optimization is NP-complete.


Figure 1. An integer linear optimization problem

Figure 2. A polyhedron and its integer hull

1.2. Integral polyhedra and integer hulls. Let $P$ be a polyhedron. The integer hull of $P$ is (see Figure 2)
$$P_I := \operatorname{conv.hull}(P \cap \mathbb{Z}^n). \tag{154}$$
Clearly, $P_I \subseteq P$. A polyhedron $P$ is integral if $P = P_I$. We have
$$\max\{cx \mid x \in P \cap \mathbb{Z}^n\} = \max\{cx \mid x \in P_I\}. \tag{155}$$
Thus optimizing a linear function over all integral points in $P$ is no harder than optimizing over $P_I$, but we need a description of $P_I$ in terms of linear equations and inequalities to do this in practice. For this reason, determining the integer hull and finding integral polyhedra are central problems in integer linear optimization.

If $P$ is bounded, then we can derive from Theorem 25 that $P$ is integral if and only if all its vertices are integral vectors, i.e. if $V(P) \subseteq \mathbb{Z}^n$. It also follows from Theorem 25 that the polytope $P_I$ is a polyhedron. It is a difficult problem to determine a polyhedral description of $P_I$ given $P$. We describe how, in theory, $P_I$ can be approximated.

Suppose $P = \{x \in \mathbb{R}^n \mid Ax \le b,\ x \ge 0\}$. The hyperplane $H_{c,d}$ is a cutting plane for $P$ if $cx \le d$ is valid for all $x \in P_I$ and there is some $x \in P$ for which $cx > d$. Then $H_{c,d}$ 'cuts off' a corner of the polyhedron $P$ containing no integer points. One way to obtain a valid inequality for $P_I$ is to take a valid inequality for $P$ and 'round it down'. For any $c \in \mathbb{R}^n$, let $\lfloor c \rfloor$ be defined by $\lfloor c \rfloor_i = \lfloor c_i \rfloor$ for all $i$ (similarly for $\lceil c \rceil$). If $cx \le d$ for all $x \in P$, then
$$\lfloor c \rfloor x \le \lfloor cx \rfloor \le \lfloor d \rfloor \tag{156}$$
for all $x \in P \cap \mathbb{Z}^n$, hence $\lfloor c \rfloor x \le \lfloor d \rfloor$ is valid for all $x \in P_I$. Any valid inequality for $P$ can be obtained by making a nonnegative combination of rows of $Ax \le b$, but not every cutting plane is obtained by rounding down a valid inequality for $P$. However, let
$$P' := \{y \in P \mid \lfloor c \rfloor y \le \lfloor d \rfloor \text{ for all } c, d \text{ such that } cx \le d \text{ for all } x \in P\}, \tag{157}$$
let $P^{(0)} := P$ and let $P^{(t+1)} := (P^{(t)})'$ for all $t \in \mathbb{N}$. Without proof, we mention:

Theorem 50. Let $P := \{x \in \mathbb{R}^n \mid Ax \le b,\ x \ge 0\}$, where $A$ and $b$ are rational. Then $P_I = P^{(t)}$ for some finite $t$.
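The rounding step (156) is easy to try out on a small instance. The sketch below forms a nonnegative combination $c = uA$, $d = ub$ of the rows of $Ax \le b$ and rounds it down; the function name `rounded_cut`, the weights `u` and the sample data are assumptions made for illustration.

```python
from fractions import Fraction
from math import floor


def rounded_cut(A, b, u):
    """Combine the rows of Ax <= b with nonnegative weights u and round down.

    Returns (floor(c), floor(d)) with c = uA and d = ub. The inequality
    floor(c) x <= floor(d) then holds for all integral x >= 0 with Ax <= b.
    """
    assert all(w >= 0 for w in u)
    n = len(A[0])
    c = [sum(Fraction(w) * row[j] for w, row in zip(u, A)) for j in range(n)]
    d = sum(Fraction(w) * bi for w, bi in zip(u, b))
    return [floor(cj) for cj in c], floor(d)


# P = {x >= 0 | 2 x1 + 2 x2 <= 3}; take half of the single row.
c, d = rounded_cut([[2, 2]], [3], [Fraction(1, 2)])
print(c, d)   # the cut x1 + x2 <= 1
```

The point $(3/2, 0)$ lies in $P$ but violates the resulting inequality, so in this instance the rounded inequality is a cutting plane.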


1.3. Totally unimodular matrices. An $n \times m$ matrix $M$ is totally unimodular (TU) if $\det(M') \in \{-1, 0, 1\}$ for all square submatrices $M'$ of $M$. In particular, the entries ($1 \times 1$ submatrices) of a TU matrix are either $-1$, $0$, or $1$. We describe how certain integral polyhedra arise from totally unimodular matrices.

Let us first argue that there exist TU matrices. Let $D = (V, A)$ be a directed graph. The incidence matrix of $D$ is the $V \times A$ matrix $M^D = (m_{va})$ such that $m_{va} = 1$ if $v$ is the head of $a$, $m_{va} = -1$ if $v$ is the tail of $a$ and $m_{va} = 0$ otherwise (thus $M^D$ is a matrix with exactly one $1$ and exactly one $-1$ in each column, and 0's otherwise).

Lemma 51. If $M$ is the incidence matrix of a directed graph, then $M$ is totally unimodular.

Proof. Suppose the matrix $M$ contradicts the lemma. Let $B$ be a square $k \times k$ submatrix of $M$ with $\det(B) \notin \{0, -1, 1\}$. We may assume $B$ is chosen such that $k$ is as small as possible. There cannot be a column with no nonzero entries, as this would imply $\det(B) = 0$. If there is a column with exactly one nonzero entry, say the $i$-th column with a nonzero in the $j$-th row, then striking both the $i$-th column and the $j$-th row from $B$ yields a $(k-1) \times (k-1)$ submatrix $B'$ of $M$ with $|\det(B')| = |\det(B)|$, i.e. a smaller counterexample, contradicting that $k$ was chosen as small as possible. So each column of $B$ has exactly two nonzeros, a $1$ and a $-1$. But then $\mathbf{1}^t B = 0$, implying that $\det(B) = 0$. This contradicts the choice of $B$.

If $M$ is totally unimodular, then it follows easily from the definition that the transpose of $M$ as well as each submatrix of $M$ is totally unimodular. Moreover, duplicating a row or column, adding a row or column with only one nonzero entry, and negating a row or column all preserve total unimodularity. It follows that if $M$ is TU, then $\begin{pmatrix} M & I \end{pmatrix}$, $\begin{pmatrix} M \\ -M \end{pmatrix}$, etc. are all TU (exercise).

Theorem 52. Let $M$ be a totally unimodular $m \times n$ matrix, and let $b \in \mathbb{Z}^m$ be an integral vector. Then $P := \{x \in \mathbb{R}^n \mid Mx \le b\}$ is an integral polyhedron.

Proof. We need to show that if $y \in P$, then $y \in \operatorname{conv.hull}(P \cap \mathbb{Z}^n)$. So let $y \in P$. Let $P' := \{x \in P \mid \lfloor y \rfloor \le x \le \lceil y \rceil\}$. It suffices to show that $P'$ is integral, since then $y \in \operatorname{conv.hull}(P' \cap \mathbb{Z}^n)$ and hence $y \in \operatorname{conv.hull}(P \cap \mathbb{Z}^n)$. Now $P'$ is a bounded polyhedron, and $P' = \{x \in \mathbb{R}^n \mid M'x \le b'\}$, where
$$M' = \begin{pmatrix} M \\ -I \\ I \end{pmatrix}, \qquad b' = \begin{pmatrix} b \\ -\lfloor y \rfloor \\ \lceil y \rceil \end{pmatrix}. \tag{158}$$
So $M'$ is again TU and $b'$ is integral. Since $P'$ is a bounded polyhedron, we know that $\operatorname{rank}(M') = n$. Let $m_i$ denote the $i$-th row of $M'$. Let $v$ be a vertex of $P'$. By Theorem 22, there is a set $B$ of $\operatorname{rank}(M') = n$ indices such that $\{m_i \mid i \in B\}$ is linearly independent and $m_i v = b'_i$ for all $i \in B$. So the submatrix $M'_{B \times n}$ of $M'$ is nonsingular, and $M'_{B \times n} v = b'_B$. Then $M'_{B \times n}$ is unimodular, hence $(M'_{B \times n})^{-1}$ is an integral matrix. Since $b'_B$ is integral as well, it follows that $v = (M'_{B \times n})^{-1} b'_B$ is integral.

Corollary 52.1. Let $M$ be a totally unimodular $m \times n$ matrix, and let $b \in \mathbb{Z}^m$ and $c \in \mathbb{Z}^n$ be integral vectors. Then
$$\{x \in \mathbb{R}^n \mid Mx \le b,\ x \ge 0\}, \quad \{y \in \mathbb{R}^m \mid yM \ge c,\ y \ge 0\}, \quad \{y \in \mathbb{R}^m \mid yM = c,\ y \ge 0\} \tag{159}$$
are all integral polyhedra.
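For small matrices, total unimodularity can be verified directly from the definition by enumerating all square submatrices. The sketch below does this (in exponential time, so only for toy examples) and checks Lemma 51 on one directed graph; the function names and the sample graph are illustrative assumptions.

```python
from itertools import combinations


def det(M):
    """Determinant of a small integer matrix by expansion along the first row."""
    if not M:
        return 1
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)) if M[0][j] != 0)


def is_totally_unimodular(M):
    """Check det(M') in {-1, 0, 1} for every square submatrix M' of M."""
    rows, cols = len(M), len(M[0])
    for k in range(1, min(rows, cols) + 1):
        for R in combinations(range(rows), k):
            for C in combinations(range(cols), k):
                if det([[M[i][j] for j in C] for i in R]) not in (-1, 0, 1):
                    return False
    return True


def incidence_matrix(vertices, arcs):
    """V x A incidence matrix: +1 at the head of an arc, -1 at its tail."""
    return [[int(v == head) - int(v == tail) for (tail, head) in arcs]
            for v in vertices]


arcs = [(0, 1), (1, 2), (2, 0), (2, 3), (0, 3)]
print(is_totally_unimodular(incidence_matrix(range(4), arcs)))
# The vertex-edge incidence matrix of a triangle has determinant 2, so it is not TU.
print(is_totally_unimodular([[1, 1, 0], [0, 1, 1], [1, 0, 1]]))
```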


Figure 3. A bipartite graph, a matching and a vertex cover

2. Matching

2.1. Matchings and vertex covers. Let $G = (V, E)$ be a graph. A matching of $G$ is a set of edges $F \subseteq E$ such that each vertex of $G$ is incident with at most one edge in $F$, and a vertex cover is a set of vertices $X \subseteq V$ such that each edge of $G$ is incident with some vertex in $X$. We set $\nu(G) := \max\{|F| \mid F \text{ a matching of } G\}$ and $\tau(G) := \min\{|X| \mid X \text{ a vertex cover of } G\}$. If $F$ is a matching and $X$ is a vertex cover, then each edge in $F$ is incident with at least one vertex in $X$ and no vertex in $X$ covers more than one edge in $F$, hence $|F| \le |X|$. It follows that $\nu(G) \le \tau(G)$ for any graph. See Figure 3.

The incidence matrix of the undirected graph $G = (V, E)$ is the $V \times E$ matrix $M^G = (m_{ve})$ such that $m_{ve} = 1$ if $v$ is incident with $e$ and $m_{ve} = 0$ otherwise. The characteristic vector of a set $F \subseteq E$ is $\chi^F \in \{0, 1\}^E$ defined by $\chi^F_e = 1$ if $e \in F$ and $\chi^F_e = 0$ otherwise. It is easy to verify that
$$\{\chi^F \mid F \text{ is a matching of } G\} = \{x \in \mathbb{Z}^E \mid M^G x \le \mathbf{1},\ x \ge 0\}, \tag{160}$$
and therefore
$$\nu(G) = \max\{\mathbf{1}^t x \mid M^G x \le \mathbf{1},\ x \ge 0,\ x \in \mathbb{Z}^E\}. \tag{161}$$
Similarly, we have
$$\{\chi^X \mid X \text{ is a vertex cover of } G\} + \{y \in \mathbb{Z}^V \mid y \ge 0\} = \{y \in \mathbb{Z}^V \mid yM^G \ge \mathbf{1},\ y \ge 0\}, \tag{162}$$
and therefore
$$\tau(G) = \min\{y\mathbf{1} \mid yM^G \ge \mathbf{1},\ y \ge 0,\ y \in \mathbb{Z}^V\}. \tag{163}$$

2.2. Matchings in bipartite graphs. A graph $G = (V, E)$ is bipartite if there are two disjoint subsets $U, W \subseteq V$ such that each edge of $G$ is incident with one vertex in $U$ and one in $W$. If $G$ is bipartite, the incidence matrix of $G$ is totally unimodular (exercise). It follows that we can find a maximum-cardinality matching in any bipartite graph $G$ by solving the linear optimization problem
$$\max\{\mathbf{1}^t x \mid M^G x \le \mathbf{1},\ x \ge 0,\ x \in \mathbb{R}^E\}. \tag{164}$$
However, there exist more efficient methods.

Theorem 53 (König, 1931). Let $G$ be a bipartite graph. Then $\nu(G) = \tau(G)$.

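Proof. Let $M^G$ be the $V \times E$ incidence matrix of $G$. As $M^G$ is totally unimodular, both $\{x \in \mathbb{R}^E \mid M^G x \le \mathbf{1},\ x \ge 0\}$ and $\{y \in \mathbb{R}^V \mid yM^G \ge \mathbf{1},\ y \ge 0\}$ are integral polyhedra. Hence
$$\nu(G) = \max\{\mathbf{1}^t x \mid M^G x \le \mathbf{1},\ x \ge 0,\ x \in \mathbb{Z}^E\} = \max\{\mathbf{1}^t x \mid M^G x \le \mathbf{1},\ x \ge 0,\ x \in \mathbb{R}^E\} \tag{165}$$
and
$$\tau(G) = \min\{y\mathbf{1} \mid yM^G \ge \mathbf{1},\ y \ge 0,\ y \in \mathbb{Z}^V\} = \min\{y\mathbf{1} \mid yM^G \ge \mathbf{1},\ y \ge 0,\ y \in \mathbb{R}^V\}. \tag{166}$$
Thus $\tau(G) = \nu(G)$ follows by linear optimization duality (Theorem 19).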
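On very small graphs, $\nu(G)$ and $\tau(G)$ can be computed by plain enumeration, which makes it easy to observe König's Theorem on an example. The sketch below is a brute-force illustration only (exponential time); the function names and the sample graphs are assumptions. A triangle is included for contrast, as it is not bipartite.

```python
from itertools import combinations


def matching_number(edges):
    """Size of a largest set of pairwise vertex-disjoint edges."""
    for k in range(len(edges), 0, -1):
        for F in combinations(edges, k):
            ends = [v for e in F for v in e]
            if len(ends) == len(set(ends)):      # no vertex used twice
                return k
    return 0


def vertex_cover_number(vertices, edges):
    """Size of a smallest vertex set meeting every edge."""
    for k in range(len(vertices) + 1):
        for X in combinations(vertices, k):
            if all(u in X or v in X for (u, v) in edges):
                return k


# A bipartite graph with U = {u1, u2, u3} and W = {w1, w2, w3}.
V = ["u1", "u2", "u3", "w1", "w2", "w3"]
E = [("u1", "w1"), ("u1", "w2"), ("u2", "w1"), ("u3", "w1")]
print(matching_number(E), vertex_cover_number(V, E))

# The triangle K3 is not bipartite.
K3 = [(0, 1), (1, 2), (0, 2)]
print(matching_number(K3), vertex_cover_number([0, 1, 2], K3))
```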

(167)

We say that a matching F of G is perfect if each vertex of G is incident with exactly one edge in F . If G = (V, E ) and Q ⊆ V , then the neigbor set of Q in G is It is an exercise to prove the following Corollary. N (Q) := {w ∈ V \ Q | vw ∈ E, v ∈ Q}.

Corollary 53.1 (Hall, 1935). For a bipartite graph $G$ with bipartition $U, W$ such that $|U| = |W|$, exactly one of the following is true.
(1) $G$ has a perfect matching.
(2) There is a set $Q \subseteq U$ such that $|N(Q)| < |Q|$.

The above corollary is known as the marriage theorem, since it concerns a classic matchmakers' problem.

2.3. Matchings in nonbipartite graphs. If a graph is not bipartite, then we may have $\nu(G) < \tau(G)$. For example, $\nu(K_3) = 1$ and $\tau(K_3) = 2$, and more generally for odd circuits we have $\nu(C_{2k+1}) = k$ and $\tau(C_{2k+1}) = k + 1$. Define the polytope

$$P_M(G) := \text{conv.hull}\,\{\chi^F \mid F \text{ is a matching of } G\}$$

for any graph $G$. If $G$ is bipartite, we have

$$P_M(G) = \{x \in \mathbb{R}^E \mid M_G x \le \mathbf{1},\ x \ge 0\}, \tag{168}$$

as the latter polyhedron is integral. Again this is not necessarily true for a nonbipartite graph: if $G$ is an odd circuit, then

$$\tfrac{1}{2}\mathbf{1} \notin P_M(G) \subseteq \{x \in \mathbb{R}^E \mid M_G x \le \mathbf{1},\ x \ge 0\} \ni \tfrac{1}{2}\mathbf{1}. \tag{169}$$

Theorem 25 implies that $P_M(G)$ is a bounded polyhedron, as it is by definition a polytope. So there must be a system of linear inequalities whose set of solutions is $P_M(G)$. Let $G = (V, E)$ be an undirected graph. For any $U \subseteq V$, let

$$E(U) := \{e \in E \mid e \text{ has both ends in } U\} \tag{170}$$

be the set of edges spanned by $U$. Let

$$Q_M(G) := \left\{x \in \mathbb{R}^E \;\middle|\; \begin{array}{ll} x_e \ge 0 & \text{for all } e \in E,\\ \sum_{e \in \delta(v)} x_e \le 1 & \text{for all } v \in V,\\ \sum_{e \in E(U)} x_e \le \lfloor |U|/2 \rfloor & \text{for all } U \subseteq V \end{array}\right\}. \tag{171}$$

It is an exercise to show that $P_M(G) \subseteq Q_M(G)$ for any graph $G$. Without proof, we mention:

Theorem 54 (Edmonds, 1965). If $G$ is an undirected graph, then $P_M(G) = Q_M(G)$.

This theorem does not give an efficient method for finding maximum-cardinality matchings through linear optimization, as the number of inequalities that describe $Q_M(G)$ is of the order $2^{|V|}$. But an efficient method does exist. In fact, Edmonds' Theorem was a by-product of the development of such a method.
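The odd-circuit example can be verified numerically. The sketch below takes the circuit $C_5$ and the point $x = \tfrac{1}{2}\mathbf{1}$, checks the nonnegativity and degree constraints, and then enumerates all sets $U$ to find the inequalities $\sum_{e \in E(U)} x_e \le \lfloor |U|/2 \rfloor$ that $x$ violates. The helper names are hypothetical, and the enumeration over all subsets is exponential, which is exactly the obstacle mentioned above.

```python
from fractions import Fraction
from itertools import combinations


def in_degree_polytope(vertices, edges, x):
    """Check x >= 0 and x(delta(v)) <= 1 for every vertex v."""
    if any(x[e] < 0 for e in edges):
        return False
    return all(sum(x[e] for e in edges if v in e) <= 1 for v in vertices)


def violated_sets(vertices, edges, x):
    """List the sets U with x(E(U)) > floor(|U| / 2)."""
    bad = []
    for k in range(2, len(vertices) + 1):
        for U in combinations(vertices, k):
            inside = sum(x[e] for e in edges if e[0] in U and e[1] in U)
            if inside > len(U) // 2:
                bad.append(U)
    return bad


# The odd circuit C_5 with the point x = (1/2) * 1.
vertices = list(range(5))
edges = [(i, (i + 1) % 5) for i in range(5)]
x = {e: Fraction(1, 2) for e in edges}

print(in_degree_polytope(vertices, edges, x))  # True
print(violated_sets(vertices, edges, x))       # [(0, 1, 2, 3, 4)]
```

The only violated inequality is the one for $U = V$: the five edges carry total weight $5/2$, whereas $\lfloor 5/2 \rfloor = 2$.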


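BB($c, P, d, x_0$)
    Solve $\max\{cx \mid x \in P\}$ and call the optimal solution $\tilde{x}$;
    if $c\tilde{x} \le d$
        return($d, x_0$);
    else if $\tilde{x} \in \mathbb{Z}^n$
        return($c\tilde{x}, \tilde{x}$);
    else
        let $i$ be such that $\tilde{x}_i \notin \mathbb{Z}$;
        let $P' := \{x \in P \mid x_i \le \lfloor \tilde{x}_i \rfloor\}$ and $P'' := \{x \in P \mid x_i \ge \lceil \tilde{x}_i \rceil\}$;
        $(d, x_0) \leftarrow$ BB($c, P', d, x_0$);
        $(d, x_0) \leftarrow$ BB($c, P'', d, x_0$);
        return($d, x_0$)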

Figure 4. The branch & bound algorithm

3. Branch & bound

3.1. Branch & bound. Suppose we want to solve the problem

$$\max\{cx \mid x \in P \cap \mathbb{Z}^n\}, \tag{172}$$

where $P$ is a polyhedron of which we know a description $P = \{x \in \mathbb{R}^n \mid Ax \le b\}$. Two easy observations lead to the branch and bound algorithm for solving (172). First of all, if we somehow can find polyhedra $P', P''$ so that $P \cap \mathbb{Z}^n = (P' \cap \mathbb{Z}^n) \cup (P'' \cap \mathbb{Z}^n)$, then we may solve $\max\{cx \mid x \in P \cap \mathbb{Z}^n\}$ by solving both $\max\{cx \mid x \in P' \cap \mathbb{Z}^n\}$ and $\max\{cx \mid x \in P'' \cap \mathbb{Z}^n\}$: the best of both solutions is the solution to (172). Second, we obviously have

$$\max\{cx \mid x \in P \cap \mathbb{Z}^n\} \le \max\{cx \mid x \in P\}. \tag{173}$$

The latter problem, the LP relaxation of (172), is an ordinary linear optimization problem, which we can solve by the simplex algorithm or any other suitable method. If we are only interested in solutions $x \in \mathbb{Z}^n$ so that $cx > d$ for some $d$, then we may safely stop if the value of the LP relaxation is $\le d$.

Branch & bound is the procedure described in Figure 4. To solve (172), we must compute BB($c, P, -\infty, x_0$) (when passing a polyhedron as an argument, we assume that a description in terms of linear inequalities is given), where $x_0$ is just arbitrary. If the problem is feasible, the algorithm will eventually find an integral $\tilde{x}$ and set $x_0 \leftarrow \tilde{x}$ and $d \leftarrow c\tilde{x}$. From then on, it is possible that in a subproblem $(c, \hat{P}, d, x_0)$ (so $\hat{P}$ contains only a subset of $P \cap \mathbb{Z}^n$), we find that there cannot be any $x \in \hat{P} \cap \mathbb{Z}^n$ such that $cx > d$ since already $\max\{cx \mid x \in \hat{P}\} \le d$; in that case, there is no reason to further investigate $\hat{P}$. If we did not have a check like this, the algorithm would be hardly more than a complete enumeration of all elements of $P \cap \mathbb{Z}^n$. Now, we may save considerable time by not investigating corners of $P$ that cannot contain anything interesting anymore.

It should now be clear why this is called branch and bound: after splitting $P$ into two polyhedra $P'$ and $P''$ together containing all integer points of $P$, the computation 'branches'; when we abandon $\hat{P}$ after computing an upper bound on the objective value, we 'bound'.

The name 'branch & bound' applies to several variants of the above procedure. There are other ways to split up $P$: in general we may construct some $a \in \mathbb{Z}^n$ so that $a\tilde{x} \notin \mathbb{Z}$ and put $P' := \{x \in P \mid ax \le \lfloor a\tilde{x} \rfloor\}$, $P'' := \{x \in P \mid ax \ge \lceil a\tilde{x} \rceil\}$. Also, one may choose in which order BB($c, P', d, x_0$) and BB($c, P'', d, x_0$) are evaluated, e.g. 'if $a\tilde{x} - \lfloor a\tilde{x} \rfloor \le \tfrac{1}{2}$, do BB($c, P', d, x_0$) first'.
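The procedure of Figure 4 can be sketched in code once a concrete way of solving the LP relaxation is fixed. As a simplifying assumption, the sketch below restricts $P$ to the form $\{x \mid wx \le \text{cap},\ lo \le x \le hi\}$ with positive integer $c$ and $w$ and integer bounds, because the relaxation of such a knapsack-type problem can be solved greedily and because branching on $x_i$ only changes a bound. The names `solve_relaxation` and `bb` and the parametrisation of $P$ are hypothetical; a general implementation would call an LP solver instead. An empty $P$ is treated as 'nothing to improve', a case Figure 4 leaves implicit.

```python
from fractions import Fraction
from math import ceil, floor


def solve_relaxation(c, w, cap, lo, hi):
    """Maximise c.x over {w.x <= cap, lo <= x <= hi} by the greedy rule.

    Assumes positive integers c[i], w[i]. Returns None if P is empty.
    """
    x = [Fraction(l) for l in lo]
    room = Fraction(cap) - sum(wi * xi for wi, xi in zip(w, x))
    if room < 0 or any(l > h for l, h in zip(lo, hi)):
        return None
    # Fill variables in order of decreasing profit per unit of weight.
    order = sorted(range(len(c)), key=lambda i: Fraction(c[i], w[i]), reverse=True)
    for i in order:
        step = min(Fraction(hi[i]) - x[i], room / w[i])
        x[i] += step
        room -= step * w[i]
    return x


def bb(c, w, cap, lo, hi, d, x0):
    """BB(c, P, d, x0) of Figure 4, with P described by (w, cap, lo, hi)."""
    x = solve_relaxation(c, w, cap, lo, hi)
    if x is None:                      # empty P: keep the incumbent
        return d, x0
    value = sum(ci * xi for ci, xi in zip(c, x))
    if value <= d:                     # bound: P cannot beat the incumbent
        return d, x0
    fractional = [i for i, xi in enumerate(x) if xi.denominator != 1]
    if not fractional:                 # the relaxation optimum is integral
        return value, [int(xi) for xi in x]
    i = fractional[0]
    hi_down = hi[:i] + [floor(x[i])] + hi[i + 1:]   # P'  : x_i <= floor
    lo_up = lo[:i] + [ceil(x[i])] + lo[i + 1:]      # P'' : x_i >= ceil
    d, x0 = bb(c, w, cap, lo, hi_down, d, x0)       # branch
    d, x0 = bb(c, w, cap, lo_up, hi, d, x0)
    return d, x0


d, x0 = bb([5, 4, 3], [4, 3, 2], 6, [0, 0, 0], [1, 1, 1], float("-inf"), None)
print(d, x0)  # 8 [1, 0, 1]
```

In this run the root relaxation has the fractional optimum $(\tfrac14, 1, 1)$ with value $\tfrac{33}{4}$; branching on $x_1$ gives value 7 in $P'$ and value 8 in $P''$, both attained at integral points.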


3.2. Alternative formulations. There is usually more than one polyhedron containing the same integral points, i.e. we can easily have $P \cap \mathbb{Z}^n = Q \cap \mathbb{Z}^n$ for different polyhedra $P, Q$. Then $\max\{cx \mid x \in P \cap \mathbb{Z}^n\}$ and $\max\{cx \mid x \in Q \cap \mathbb{Z}^n\}$ are alternative formulations of the same optimization problem. If $P, Q$ are such that $P \supseteq Q$, then the branch & bound algorithm will in general solve $\max\{cx \mid x \in Q \cap \mathbb{Z}^n\}$ faster than $\max\{cx \mid x \in P \cap \mathbb{Z}^n\}$, simply because then

$$\max\{cx \mid x \in Q\} \le \max\{cx \mid x \in P\}, \tag{174}$$

and thus $\max\{cx \mid x \in Q\} \le d$ is more likely than $\max\{cx \mid x \in P\} \le d$. Consequently, the branch & bound algorithm will on the whole find more occasions to skip work when evaluating $\max\{cx \mid x \in Q \cap \mathbb{Z}^n\}$ versus $\max\{cx \mid x \in P \cap \mathbb{Z}^n\}$.

Thus, if an integer linear optimization problem $\max\{cx \mid x \in P \cap \mathbb{Z}^n\}$ is solved too slowly by B&B, switching to a formulation $\max\{cx \mid x \in Q \cap \mathbb{Z}^n\}$ where $P \supseteq Q$ and $P \cap \mathbb{Z}^n = Q \cap \mathbb{Z}^n$ might help. We have no mathematical proof of this claim, and indeed it is not a very precise statement. But in practice one does strengthen the description of an optimization problem in this manner to improve the running time, and it does often work.

The best possible $Q$, the 'tightest' description of $P \cap \mathbb{Z}^n$, is $P_I$, the integer hull of $P$. If we have a description of $P_I$, then there is not even the need to apply the branch & bound algorithm: it suffices to solve the linear optimization problem $\max\{cx \mid x \in P_I\}$. But it may not be possible to determine the inequalities that describe $P_I$; or there may just be too many inequalities in the description to compute even this linear optimization problem.

In general, a polyhedron $Q$ such that $P \supseteq Q$ and $P \cap \mathbb{Z}^n = Q \cap \mathbb{Z}^n$ may be obtained by adding linear inequalities to the description of $P$ that are valid for all $x \in P \cap \mathbb{Z}^n$ but not for all $x \in P$, as was explained in the beginning of this Chapter.
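A two-variable instance makes the effect of a tighter formulation concrete. The polyhedra below are not taken from the notes; they are chosen only so that every quantity can be checked by hand.

```latex
\begin{align*}
P &= \{x \in \mathbb{R}^2 \mid 2x_1 + 2x_2 \le 3,\ 0 \le x \le \mathbf{1}\}, &
Q &= \{x \in \mathbb{R}^2 \mid x_1 + x_2 \le 1,\ 0 \le x \le \mathbf{1}\}, \\
P \cap \mathbb{Z}^2 &= Q \cap \mathbb{Z}^2 = \{(0,0),\ (1,0),\ (0,1)\}, &
Q &\subseteq P, \\
\max\{x_1 + x_2 \mid x \in Q\} &= 1 \;<\; \tfrac{3}{2} = \max\{x_1 + x_2 \mid x \in P\}.
\end{align*}
```

With $c = (1, 1)$, the relaxation over $P$ attains its maximum $\tfrac32$ only at fractional points such as $(1, \tfrac12)$, so branch & bound has to branch at least once. The relaxation over $Q$ already has value 1, which equals the integer optimum, and its vertices are integral; in this instance $Q$ is in fact the integer hull $P_I$.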
Even when the full description of the integer hull is not found, adding valid inequalities may still improve the running time of the branch & bound algorithm. But we need not always use an existing description of a problem to find a better one; sometimes it is clear that two different formulations each accurately describe the problem at hand.

Example: the facility location problem. Consider the following situation. Throughout the country, you have a total number of $m$ clients that require services from a facility, and $n$ facilities that can be opened. There is a cost $c_{ij}$ associated with servicing client $j$ from facility $i$. Moreover, there is a cost $f_i$ for using facility $i$ at all. The facility location problem (FLP) is to decide which facilities to open, and to assign each client to an open facility, such that the combined cost of opening facilities and servicing clients is minimal. The FLP can be formulated as an integer linear optimization problem:

$$\min\left\{ \sum_{i=1}^{n} f_i y_i + \sum_{i=1}^{n} \sum_{j=1}^{m} c_{ij} x_{ij} \;\middle|\; \begin{array}{ll} \sum_{j=1}^{m} x_{ij} \le m y_i & \text{for all } i,\\ \sum_{i=1}^{n} x_{ij} \ge 1 & \text{for all } j,\\ x_{ij}, y_i \in \{0, 1\} & \text{for all } i, j \end{array} \right\}. \tag{175}$$

Here, the constraint $x_{ij}, y_i \in \{0, 1\}$ abbreviates '$x_{ij}, y_i \in \mathbb{Z}$ and $0 \le x_{ij}, y_i \le 1$'. So (175) is an integer linear optimization problem. The variables represent the choices to be made: $y_i = 1$ means opening facility $i$, and $x_{ij} = 1$ means assigning client $j$ to facility $i$. It is clear that any $x, y$ satisfying the constraints of (175) represent a consistent set of choices. So (175) is a correct formulation of the FLP, but so is

$$\min\left\{ \sum_{i=1}^{n} f_i y_i + \sum_{i=1}^{n} \sum_{j=1}^{m} c_{ij} x_{ij} \;\middle|\; \begin{array}{ll} x_{ij} \le y_i & \text{for all } i, j,\\ \sum_{i=1}^{n} x_{ij} \ge 1 & \text{for all } j,\\ x_{ij}, y_i \in \{0, 1\} & \text{for all } i, j \end{array} \right\}. \tag{176}$$
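The relation between the two formulations can be explored on a tiny instance. The sketch below checks the constraints of (175) and (176) with the integrality requirement relaxed to $0 \le x_{ij}, y_i \le 1$, enumerates all 0/1 points to compare the integer optima, and tests one fractional point. The instance data ($n = 2$, $m = 2$, the costs) and all function names are assumptions chosen for illustration.

```python
from fractions import Fraction
from itertools import product


def in_box(x, y):
    return all(0 <= v <= 1 for row in x for v in row) and all(0 <= v <= 1 for v in y)


def covers_clients(x, n, m):
    return all(sum(x[i][j] for i in range(n)) >= 1 for j in range(m))


def feasible_175(x, y, m):
    """Constraints of (175), with {0,1} relaxed to the interval [0,1]."""
    n = len(y)
    linked = all(sum(x[i]) <= m * y[i] for i in range(n))
    return in_box(x, y) and linked and covers_clients(x, n, m)


def feasible_176(x, y, m):
    """Constraints of (176), with {0,1} relaxed to the interval [0,1]."""
    n = len(y)
    linked = all(x[i][j] <= y[i] for i in range(n) for j in range(m))
    return in_box(x, y) and linked and covers_clients(x, n, m)


def cost(f, c, x, y):
    n, m = len(f), len(c[0])
    opening = sum(f[i] * y[i] for i in range(n))
    service = sum(c[i][j] * x[i][j] for i in range(n) for j in range(m))
    return opening + service


def integer_optimum(f, c, feasible):
    """Brute-force minimum cost over all 0/1 points accepted by `feasible`."""
    n, m = len(f), len(c[0])
    best = None
    for bits in product((0, 1), repeat=n + n * m):
        y = list(bits[:n])
        x = [list(bits[n + i * m: n + (i + 1) * m]) for i in range(n)]
        if feasible(x, y, m):
            value = cost(f, c, x, y)
            best = value if best is None else min(best, value)
    return best


# Hypothetical instance: n = 2 facilities, m = 2 clients.
f = [3, 3]
c = [[1, 4], [4, 1]]
half = Fraction(1, 2)
x_frac, y_frac = [[1, 0], [0, 1]], [half, half]

print(integer_optimum(f, c, feasible_175), integer_optimum(f, c, feasible_176))  # 8 8
print(feasible_175(x_frac, y_frac, 2), feasible_176(x_frac, y_frac, 2))  # True False
print(cost(f, c, x_frac, y_frac))  # 5
```

Both formulations have integer optimum 8 on this instance, but the relaxation of (175) admits a fractional point of cost 5 that the relaxation of (176) excludes, so the bound obtained from (175) is weaker here.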


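Let

$$P := \left\{ (x, y) \in \mathbb{R}^{nm+n} \;\middle|\; \begin{array}{ll} \sum_{j=1}^{m} x_{ij} \le m y_i & \text{for all } i,\\ \sum_{i=1}^{n} x_{ij} \ge 1 & \text{for all } j,\\ 0 \le x_{ij}, y_i \le 1 & \text{for all } i, j \end{array} \right\} \tag{177}$$

and

$$Q := \left\{ (x, y) \in \mathbb{R}^{nm+n} \;\middle|\; \begin{array}{ll} x_{ij} \le y_i & \text{for all } i, j,\\ \sum_{i=1}^{n} x_{ij} \ge 1 & \text{for all } j,\\ 0 \le x_{ij}, y_i \le 1 & \text{for all } i, j \end{array} \right\}. \tag{178}$$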

Then (175) is $\min\{fy + cx \mid (x, y) \in P \cap \mathbb{Z}^{nm+n}\}$ and (176) is $\min\{fy + cx \mid (x, y) \in Q \cap \mathbb{Z}^{nm+n}\}$, and $P \supseteq Q$. Even though (176) has more constraints than (175), the B&B algorithm will in general solve (176) faster than (175).

Exercises

(1) Determine whether the matrix

$$\begin{pmatrix} -1 & 1 & 0 & 0 & 1 \\ 1 & -1 & 1 & 0 & 0 \\ 0 & 1 & -1 & 1 & 0 \\ 0 & 0 & 1 & -1 & 1 \\ 1 & 0 & 0 & 1 & -1 \end{pmatrix}$$

is totally unimodular.
(2) Let $M$ be a totally unimodular matrix. Show that each of the following matrices is totally unimodular. (a) $[\,M \;\; I\,]$. (b) $[\,M \;\; {-M}\,]$. (c) $M^t$.
(3) Let $M$ be TU, and let $M'$ be obtained from $M$ by a pivot. Show that $M'$ is TU.
(4) Prove Corollary 52.1.
(5) A vector $q \in \{0, 1\}^n$ is an interval vector if, for some $a, b \in \mathbb{N}$, $q_i = 1$ if $a \le i \le b$ and $q_i = 0$ otherwise. A matrix $M$ is an interval matrix if each column of $M$ is an interval vector. Show that each interval matrix is totally unimodular.
(6) Let $M$ be a totally unimodular $m \times n$ matrix and let $b \in \mathbb{Z}^m$. Show that exactly one of the following holds. (a) There is an $x \in \mathbb{Z}^n$ such that $Mx = b$ and $x \ge 0$. (b) There is a $y \in \mathbb{Z}^m$ such that $yM \in \{0, 1\}^n$ and $yb < 0$.
(7) Show that the incidence matrix of a bipartite graph is totally unimodular.
(8) Show that the matching and the vertex cover shown in Figure 3 are both optimal. Conclude that this bipartite graph has no perfect matching. Find a set of vertices $Q$ such that $|N(Q)| < |Q|$.
(9) Prove Corollary 53.1.
(10) Show that $P_M(G) \subseteq Q_M(G)$ for any undirected graph $G = (V, E)$.
(11) Determine the number of inequalities in the definition of $Q_M(G)$, for any undirected graph $G = (V, E)$, as a function of $|V|$ and $|E|$.
(12) Show that $Q_M(G) = \{x \in \mathbb{R}^E \mid M_G x \le \mathbf{1},\ x \ge 0\}^{(1)}$ for any undirected graph $G$.
