
Linear Optimization
and Extensions
Library of Congress Cataloging-in-Publication Data
Fang, Shu-Cherng.
Linear optimization and extensions : theory and algorithms /
Shu-Cherng Fang, Sarat Puthenpura.
p. cm.
Includes bibliographical references and index.
ISBN 0-13-915265-2
1. Linear programming. I. Puthenpura, Sarat. II. Title.
T57.74.F37 1993
519.7'2-dc20 92-38501
CIP

Acquisitions editor: Marcia Horton


Production editor: Irwin Zucker
Prepress buyer: Linda Behrens
Manufacturing buyer: David Dickey
Supplements editor: Alice Dworkin
Copy editor: Robert Lentz
Cover design: Karen Marsilio
Editorial assistant: Dolores Mars

© 1993 by AT&T. All rights reserved.


Published by Prentice-Hall, Inc.
A Simon & Schuster Company
Englewood Cliffs, New Jersey 07632

The authors and publisher of this book have used their best efforts in preparing this book. These efforts include the
development, research, and testing of the theories and programs to determine their effectiveness. The authors and
publisher make no warranty of any kind, expressed or implied, with regard to these programs or the documentation
contained in this book. The authors and publisher shall not be liable in any event for incidental or consequential
damages in connection with, or arising out of, the furnishing, performance, or use of these programs.

All rights reserved. No part of this book may be reproduced, in any form or by any means, without permission in writing from the publisher.

Printed in the United States of America

10 9 8 7 6 5 4 3 2

ISBN 0-13-915265-2

Prentice-Hall International (UK) Limited, London


Prentice-Hall of Australia Pty. Limited, Sydney
Prentice-Hall Canada Inc., Toronto
Prentice-Hall Hispanoamericana, S.A., Mexico
Prentice-Hall of India Private Limited, New Delhi
Prentice-Hall of Japan, Inc., Tokyo
Simon & Schuster Asia Pte. Ltd., Singapore
Editora Prentice-Hall do Brasil, Ltda., Rio de Janeiro
Dedicated to our families:
Chi-Hsin Chao Fang
Mini and Vidya Puthenpura
Contents

PREFACE xiii

1 INTRODUCTION 1

1.1 History of Linear Programming 1


1.2 The Linear Programming Problem 2
1.2.1 Standard-Form Linear Program, 3
1.2.2 Embedded Assumptions, 3
1.2.3 Converting to Standard Form, 4

1.3 Examples of Linear Programming Problems 5


1.4 Mastering Linear Programming 9
References for Further Reading 10
Exercises 11

2 GEOMETRY OF LINEAR PROGRAMMING 14

2.1 Basic Terminology of Linear Programming 14


2.2 Hyperplanes, Halfspaces, and Polyhedral Sets 15
2.3 Affine Sets, Convex Sets, and Cones 17


2.4 Extreme Points and Basic Feasible Solutions 19


2.5 Nondegeneracy and Adjacency 21
2.6 Resolution Theorem for Convex Polyhedrons 23
2.7 Fundamental Theorem of Linear Programming 24
2.8 Concluding Remarks: Motivations of Different
Approaches 25
References for Further Reading 26
Exercises 26

3 THE REVISED SIMPLEX METHOD 29

3.1 Elements of an Iterative Scheme 29


3.2 Basics of the Simplex Method 30
3.3 Algebra of the Simplex Method 31
3.3.1 Stopping the Simplex Method-Checking
for Optimality, 33
3.3.2 Iterations of the Simplex Method-Moving
for Improvement, 33

3.4 Starting the Simplex Method 39


3.4.1 Two-Phase Method, 39
3.4.2 Big-M Method, 41

3.5 Degeneracy and Cycling 42


3.6 Preventing Cycling 44
3.6.1 Lexicographic Rule, 44
3.6.2 Bland's Rule, 44

3.7 The Revised Simplex Method 45


3.8 Concluding Remarks 50
References for Further Reading 50
Exercises 51

4 DUALITY THEORY AND SENSITIVITY ANALYSIS 55

4.1 Dual Linear Program 56


4.2 Duality Theory 57

4.3 Complementary Slackness and Optimality Conditions 61


4.4 An Economic Interpretation of the Dual Problem 63
4.4.1 Dual Variables and Shadow Prices, 63
4.4.2 Interpretation of the Dual Problem, 64
4.5 The Dual Simplex Method 65
4.5.1 Basic Idea of the Dual Simplex Method, 65
4.5.2 Sherman-Morrison-Woodbury Formula, 66
4.5.3 Computer Implementation of the Dual Simplex
Method, 70
4.5.4 Find an Initial Dual Basic Feasible Solution, 72
4.6 The Primal-Dual Method 73
4.6.1 Step-by-Step Procedure for the Primal-Dual
Simplex Method, 75
4.7 Sensitivity Analysis 78
4.7.1 Change in the Cost Vector, 78
4.7.2 Change in the Right-Hand-Side Vector, 80
4.7.3 Change in the Constraint Matrix, 82

4.8 Concluding Remarks 86


References for Further Reading 87
Exercises 87

5 COMPLEXITY ANALYSIS AND THE ELLIPSOID METHOD 92

5.1 Concepts of Computational Complexity 93


5.2 Complexity of the Simplex Method 94
5.3 Basic Ideas of the Ellipsoid Method 96
5.4 Ellipsoid Method for Linear Programming 100
5.5 Performance of the Ellipsoid Method for LP 103
5.6 Modifications of the Basic Algorithm 104
5.6.1 Deep Cuts, 104
5.6.2 Surrogate Cuts, 106
5.6.3 Parallel Cuts, 106
5.6.4 Replacing Ellipsoid by Simplex, 107
5.7 Concluding Remarks 108
References for Further Reading 108
Exercises 109

6 KARMARKAR'S PROJECTIVE SCALING ALGORITHM 112

6.1 Basic Ideas of Karmarkar's Algorithm 112


6.2 Karmarkar's Standard Form 114
6.2.1 The Simplex Structure, 115
6.2.2 Projective Transformation on the Simplex, 116

6.3 Karmarkar's Projective Scaling Algorithm 117


6.4 Polynomial-Time Solvability 120
6.5 Converting to Karmarkar's Standard Form 126
6.6 Handling Problems with Unknown Optimal Objective
Values 128
6.7 Unconstrained Convex Dual Approach 135
6.7.1 ε-Optimal Solution, 136
6.7.2 Extension, 139

6.8 Concluding Remarks 141


References for Further Reading 141
Exercises 142

7 AFFINE SCALING ALGORITHMS 144

7.1 Primal Affine Scaling Algorithm 145


7.1.1 Basic Ideas of Primal Affine Scaling, 145
7.1.2 Implementing the Primal Affine Scaling
Algorithm, 155
7.1.3 Computational Complexity, 160
7.2 Dual Affine Scaling Algorithm 165
7.2.1 Basic Ideas of Dual Affine Scaling, 165
7.2.2 Dual Affine Scaling Algorithm, 167
7.2.3 Implementing the Dual Affine Scaling
Algorithm, 169
7.2.4 Improving Computational Complexity, 172
7.3 The Primal-Dual Algorithm 177
7.3.1 Basic Ideas of the Primal-Dual Algorithm, 178
7.3.2 Direction and Step-Length of Movement, 180
7.3.3 Primal-Dual Algorithm, 184
7.3.4 Polynomial-Time Termination, 184
7.3.5 Starting the Primal-Dual Algorithm, 188
7.3.6 Practical Implementation, 189

7.3.7 Accelerating via Power-Series Method, 193


7.4 Concluding Remarks 194
References for Further Reading 195
Exercises 197

8 INSIGHTS INTO THE INTERIOR-POINT METHODS 201

8.1 Moving Along Different Algebraic Paths 201


8.1.1 Primal Affine Scaling with Logarithmic Barrier
Function, 203
8.1.2 Dual Affine Scaling with Logarithmic Barrier
Function, 204
8.1.3 The Primal-Dual Algorithm, 205
8.2 Missing Information 207
8.2.1 Dual Information in the Primal Approach, 207
8.2.2 Primal Information in the Dual Approach, 207
8.3 Extensions of Algebraic Paths 208
8.4 Geometric Interpretation of the Moving Directions 209
8.4.1 Primal Affine Scaling with Logarithmic Barrier
Function, 211
8.4.2 Dual Affine Scaling with Logarithmic Barrier
Function, 212
8.4.3 The Primal-Dual Algorithm, 213
8.5 General Theory 217
8.5.1 General Primal Affine Scaling, 217
8.5.2 General Dual Affine Scaling, 219
8.6 Concluding Remarks 220
References for Further Reading 221
Exercises 221

9 AFFINE SCALING FOR CONVEX QUADRATIC PROGRAMMING 224

9.1 Convex Quadratic Program with Linear Constraints 225


9.1.1 Primal Quadratic Program, 225
9.1.2 Dual Quadratic Program, 225
9.2 Affine Scaling for Quadratic Programs 227
9.2.1 Primal Affine Scaling for Quadratic
Programming, 227

9.2.2 Improving Primal Affine Scaling for Quadratic


Programming, 237

9.3 Primal-Dual Algorithm for Quadratic Programming 241


9.3.1 Basic Concepts, 241
9.3.2 A Step-by-Step Implementation Procedure, 243
9.3.3 Convergence Properties of the Primal-Dual
Algorithm, 245

9.4 Convex Programming with Linear Constraints 246


9.4.1 Basic Concepts, 246
9.4.2 A Step-by-Step Implementation Procedure, 248

9.5 Concluding Remarks 249


References for Further Reading 249
Exercises 250

10 IMPLEMENTATION OF INTERIOR-POINT ALGORITHMS 253

10.1 The Computational Bottleneck 253


10.2 The Cholesky Factorization Method 254
10.2.1 Computing Cholesky Factor, 255
10.2.2 Block Cholesky Factorization, 257
10.2.3 Sparse Cholesky Factorization, 259
10.2.4 Symbolic Cholesky Factorization, 263
10.2.5 Solving Triangular Systems, 263

10.3 The Conjugate Gradient Method 265


10.4 The LQ Factorization Method 268
10.5 Concluding Remarks 275
References for Further Reading 276
Exercises 277

BIBLIOGRAPHY 280

INDEX 295
Preface

Since G. B. Dantzig first proposed the celebrated simplex method around 1947, the wide
applicability of linear programming models and the evolving mathematical theory and
computational methodology under these models have attracted an immense amount of
interest from both practitioners and academicians. In particular, in 1979, L. G. Khachian
proved that the ellipsoid method of N. Z. Shor, D. B. Yudin, and A. S. Nemirovskii could
outperform the simplex method in theory by exhibiting polynomial-time performance;
and, in 1984, N. Karmarkar designed a polynomial-time interior-point algorithm that
rivals the simplex method even in practice. These three methods present different and
yet fundamental approaches to solving linear optimization problems.
This book provides a unified view that treats the simplex, ellipsoid, and interior-
point methods in an integrated manner. It is written primarily as a textbook for those
graduate students who are interested in learning state-of-the-art techniques in the area of
linear programming and its natural extensions. In addition, the authors hope it will serve
as a useful handbook for people who pursue research and development activities in the
relatively new field of interior-point methods for optimization.
We have organized the book into ten chapters. In the first chapter, we introduce
the linear programming problems with modeling examples and provide a short review
of the history of linear programming. In the second chapter, basic terminologies are
defined to build the fundamental theory of linear programming and to form a geomet-
ric interpretation of the underlying optimization process. The third chapter covers the
classical simplex method-in particular, the revised simplex method. Duality theory,
the dual simplex method, the primal-dual method, and sensitivity analysis are the topics
of Chapter 4. In the fifth chapter, we look into the concept of computational complex-
ity and show that the simplex method, in the worst-case analysis, exhibits exponential


complexity. Hence the ellipsoid method is introduced as the first polynomial-time al-
gorithm for linear programming. From this point onward, we focus on the nonsimplex
approaches. Naturally, the sixth chapter is centered around the recent advances of Kar-
markar's algorithm and its polynomial-time solvability. Chapter 7 essentially covers
the affine scaling variants, including the primal, dual, and primal-dual algorithms, of
Karmarkar's method. The concepts of central trajectory and path-following are also in-
cluded. The eighth chapter reveals the insights of interior-point methods from both the
algebraic and geometric viewpoints. It provides a platform for the comparison of differ-
ent interior-point algorithms and the creation of new algorithms. In Chapter 9, we extend
the results of interior-point-based linear programming techniques to quadratic and convex
optimization problems with linear constraints. The important implementation issues for
computer programming are addressed in the last chapter. Without understanding these
issues, it is impossible to have serious software development that achieves the expected
computational performance.
The authors see three key elements in mastering linear optimization and its exten-
sions, namely, (1) the intuitions generated by geometric interpretation, (2) the properties
proven by algebraic expressions, and (3) the algorithms validated by computer imple-
mentation; and the book is written with emphasis on both theory and algorithms. Hence
it is implied that a user of this book should have some basic understanding in math-
ematical analysis, linear algebra, and numerical methods. Since an ample number of
good reference books are available in the market, we decided not to include additional
mathematical preliminaries.
This book pays special attention to the practical implementation of algorithms.
Time has proven that the practical value of an algorithm, and hence its importance among
practitioners, is largely determined by its numerical performance including robustness,
convergence rate, and ease of computer implementation. With the advent of digital
computer technology, iterative solution methods for optimization have become extremely
popular. Actually, this book explains various algorithms in the framework of an iterative
scheme with three principal aspects: (a) how to obtain a starting solution, (b) how to
check if a current solution is optimal, and (c) how to move to an improved solution. We
have attempted to cast all the algorithms discussed in the book within the purview of
this philosophy. In this manner, computer implementation follows naturally.
The material in this book has been used by the authors to teach several graduate
courses at North Carolina State University, University of Pennsylvania, and Rutgers
University since 1988. According to our experience, Chapters 1 through 6 together with
a brief touch of Chapter 7 comprise the material for a one-semester first graduate course
in Linear Programming. A review of Chapters 3 and 5 together with Chapters 6 through
10 could serve for another one-semester course in Advanced Linear Programming, or
Special Topics on Interior-Point Methods. This book can also be used as a "cookbook"
for computer implementation of various optimization algorithms, without actually going
deep into the theoretical aspects. For this purpose, after introducing each algorithm, we
have included a step-by-step implementation recipe.
We have tried to incorporate the most salient results on the subject matter into this
book. Despite our efforts, however, owing to the tremendous ongoing research activities

in the field of interior-point methods, we may have unintentionally left out some of the
important and recent work in the area.

ACKNOWLEDGMENTS

Writing this book has been a long and challenging task. We could not have carried
on this endeavor without persistent help and encouragement from our colleagues and
friends, in addition to our families. The first and foremost of such people is Mr. Steve
Chen, Head of the Global Network and Switched Services Planning Department, AT&T
Bell Laboratories. He envisioned the importance of this work and provided us with
tremendous support in time, equipment, periodic suggestions for improving the book,
and every other aspect one can think of. Also, in particular, we wish to thank Professors
Romesh Saigal (University of Michigan), Jong-Shi Pang (Johns Hopkins University),
Jie Sun (Northwestern University), Robert J. Vanderbei (Princeton University), and Yin-
Yu Ye (University of Iowa) for reviewing our book proposal and/or drafts; Professor
Salah E. Elmaghraby (North Carolina State University) for encouraging and scheduling
one of us in the teaching of linear programming courses; Professor Elmor L. Peterson
(North Carolina State University) for his invaluable advisory work; Dr. L. P. Sinha
and Mr. W. Radwill (AT&T Bell Laboratories) for their valuable support and constant
encouragement. Also, successful completion of this work would not have been possible
without the support we received from Dr. Phyllis Weiss (AT&T Bell Laboratories).
Besides, we would like to thank Drs. Jun-Min Liu, Lev Slutsman, David Houck Jr.,
Mohan Gawande, Gwo-Min Jan (AT&T Bell Laboratories), and Dr. Ruey-Lin Sheu
(North Carolina State University) for their constructive suggestions. We express also the
greatest appreciation to those students who have tolerated the unpolished manuscript and
helped us improve the quality of this book. The final thanks go to Dr. Bruce Loftis of the
North Carolina Supercomputing Center, the Cray Research Grants, and our publisher,
Prentice Hall.
Shu-Cherng Fang
Raleigh, North Carolina

Sarat Puthenpura
Murray Hill, New Jersey
1

Introduction

Linear programming is concerned with problems in which a linear objective function


in terms of decision variables is to be optimized (i.e., either minimized or maximized)
while a set of linear equations, inequalities, and sign restrictions are imposed on the
decision variables as requirements. Linear programming is quite a young and yet very
active branch of applied mathematics. The wide applicability of linear programming
models and the evolving mathematical theory and computational methodology under
these models have attracted an immense amount of interest from both practitioners and
academicians in the past five decades. In a recent survey of Fortune 500 companies,
85% of those responding said that they had used linear programming.
In this chapter, we briefly review the history of linear programming in Section
1, introduce linear programming problems in Section 2, and give linear programming
examples in Section 3. The layout of the book is discussed in the final section.

1.1 HISTORY OF LINEAR PROGRAMMING

The linear programming problem was first conceived by G. B. Dantzig around 1947 while
he was working as a Mathematical Advisor to the United States Air Force Comptroller on
developing a mechanized planning tool for a deployment, training, and logistical supply
program. The work led to his 1948 publication, "Programming in a Linear Structure."
The name "linear programming" was coined by T. C. Koopmans and Dantzig in the
summer of 1948, and an effective "simplex method" for solving linear programming
problems was proposed by Dantzig in 1949. In the short period between 1947 and
1949, a major part of the foundation of linear programming was laid. As early as 1947,


Koopmans began pointing out that linear programming provided an excellent framework
for the analysis of classic economic theories.
Linear programming was not, however, born overnight. Prior to 1947, mathemati-
cians had studied systems of linear inequalities, the core of the mathematical theory of
linear programming. The investigation of such systems can be traced to Fourier's work
in 1826. Since then, quite a few mathematicians have considered related subjects. In
particular, the optimality conditions for functions with inequality constraints in the finite-
dimensional case appeared in W. Karush's master's thesis in 1939, and various special
cases of the fundamental duality theorem of linear programming were proved by others.
Also, as early as 1939, L. V. Kantorovich pointed out the practical significance of a
restricted class of linear programming models for production planning and proposed a
rudimentary algorithm for their solution. Unfortunately, Kantorovich's work remained
neglected in the U.S.S.R. and unknown elsewhere until long after linear programming
had been well established by G. B. Dantzig and others.
Linear programming kept evolving in the 1950s and 1960s. The theory has been
enriched and successful applications have been reported. In 1975, the topic came to
public attention when the Royal Swedish Academy of Sciences awarded the Nobel Prize
in economic science to L. V. Kantorovich and T. C. Koopmans "for their contributions
to the theory of optimum allocation of resources." Yet another dramatic development
in linear programming came to public attention in 1979: L. G. Khachian proved that
the so-called "ellipsoid method" of N. Z. Shor, D. B. Yudin, and A. S. Nemirovskii,
which differs radically from the simplex method, could outperform the simplex method
in theory. Unlike the simplex method, which might take an exponential number of
iterations to reach an optimal solution, the ellipsoid method finds an optimal solution of
a linear programming problem in a polynomial-time bound. Newspapers around the world
published reports of this result as if the new algorithm could solve the most complicated
and large-scale resource allocation problems in no time. Unfortunately, the theoretic
superiority of the ellipsoid method could not be realized in practical applications.
In 1984, a real breakthrough came from N. Karmarkar's "projective scaling algo-
rithm" for linear programming. The new algorithm not only outperforms the simplex
method in theory but also shows its enormous potential for solving very large scale
practical problems. Karmarkar's algorithm is again radically different from the simplex
method-it approaches an optimal solution from the interior of the feasible domain. This
interior-point approach has become the focal point of research interests in recent years.
Various theoretic developments and real implementations have been reported, and further
results are expected.

1.2 THE LINEAR PROGRAMMING PROBLEM

In this section, we first introduce a linear programming problem in its standard form, then
discuss the embedded assumptions of linear programming, and finally show a mechanism
to convert any general linear programming problem into the standard form.

1.2.1 Standard-Form Linear Program

A standard-form linear programming problem can be described as follows:


$$
\begin{aligned}
\text{Minimize}\quad & z = c_1x_1 + c_2x_2 + \cdots + c_nx_n\\
\text{subject to}\quad & a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n = b_1\\
& a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n = b_2\\
& \qquad\qquad\vdots\\
& a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n = b_m\\
& x_1, x_2, \ldots, x_n \ge 0
\end{aligned}
\qquad (1.1)
$$

Here $x_1, x_2, \ldots, x_n$ are the nonnegative decision variables to be determined, and $c_1, c_2,
\ldots, c_n$ are the cost coefficients associated with the decision variables, such that the objective
function $z = c_1x_1 + c_2x_2 + \cdots + c_nx_n$ is to be minimized. Moreover, $\sum_{j=1}^{n} a_{ij}x_j = b_i$
denotes the $i$th technological constraint for $i = 1, \ldots, m$, where the $a_{ij}$, for $i = 1, \ldots, m$
and $j = 1, \ldots, n$, are the technological coefficients and the $b_i$, for $i = 1, \ldots, m$, are the
right-hand-side coefficients.
A linear programming problem (in standard form) is to find a specific nonnegative
value for each decision variable such that the objective function achieves its minimum
at this particular solution while all the technological constraints are satisfied.
If we denote $x = (x_1, \ldots, x_n)^T$, $c = (c_1, \ldots, c_n)^T$, $b = (b_1, \ldots, b_m)^T$, and $A =$
the $m \times n$ matrix $(a_{ij})$, then the above linear programming problem can be written in matrix
notation as follows:

$$
\begin{aligned}
\text{Minimize}\quad & c^T x\\
\text{subject to}\quad & Ax = b\\
& x \ge 0
\end{aligned}
\qquad (1.2)
$$
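In this matrix form, the data $(c, A, b)$ is exactly what an off-the-shelf LP solver consumes. As a minimal numerical sketch (assuming Python with NumPy and SciPy available; `scipy.optimize.linprog` accepts equality constraints and uses the bound $x \ge 0$ by default), a small standard-form problem can be solved as follows:

```python
import numpy as np
from scipy.optimize import linprog

# A tiny standard-form problem:
#   minimize 2*x1 + 3*x2  subject to  x1 + x2 - x3 = 10,  x >= 0,
# where x3 plays the role of a surplus variable from a ">=" constraint.
c = np.array([2.0, 3.0, 0.0])           # cost vector c
A = np.array([[1.0, 1.0, -1.0]])        # constraint matrix A
b = np.array([10.0])                    # right-hand-side vector b

res = linprog(c, A_eq=A, b_eq=b, bounds=[(0, None)] * 3)
print(res.x, res.fun)                   # expect x = (10, 0, 0), value 20
```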

1.2.2 Embedded Assumptions

In order to represent an optimization problem as a linear programming problem, implicitly


we make the following assumptions:

1. Proportionality assumption: For each decision variable $x_j$, for $j = 1, \ldots, n$, its
contribution to the objective function $z$ and to each constraint
$$\sum_{j=1}^{n} a_{ij}x_j = b_i, \quad\text{for } i = 1, \ldots, m,$$
is directly proportional to its value. There are no economies of returns to scale or
discounts at all. To be more specific, one unit of variable $x_j$ contributes $c_j$ units to
the objective function and $a_{ij}$ units to the $i$th constraint, and two units of variable
$x_j$ contribute exactly $2c_j$ units to the objective function and $2a_{ij}$ units to the $i$th
constraint. No set-up cost for starting the activity is incurred.
2. Additivity assumption: The contribution to the objective function or any techno-
logical constraint of any decision variable is independent of the values of other
decision variables. There are no interaction or substitution effects among the de-
cision variables. The total contribution is the sum of the individual contributions
of each decision variable.
3. Divisibility assumption: Each decision variable is allowed to assume any fractional
value. In other words, noninteger values for the decision variables are permitted.
4. Certainty assumption: Each parameter (the cost coefficient $c_j$, the technological
coefficient $a_{ij}$, and the right-hand-side coefficient $b_i$) is known with certainty. No
probabilistic or stochastic element is involved in a linear programming problem.

It is clearly seen that a nonlinear function could violate the proportionality assump-
tion and additivity assumption, an integer requirement on the decision variables could
ruin the divisibility assumption, and a probabilistic scenario could rule out the certainty
assumption. Although the embedded assumptions seem to be very restrictive, linear
programming models are nonetheless among the most widely used models today.

1.2.3 Converting to Standard Form

The standard form of a linear program deals with a linear minimization problem with non-
negative decision variables and linear equality constraints. In general, a linear program
is a problem of minimizing or maximizing a linear objective function with restricted or
unrestricted decision variables in the presence of linear equality and/or inequality con-
straints. Here we introduce a mechanism to convert any general linear program into the
standard form.

Linear Inequalities and Equations. A linear inequality can easily be converted into an equation. If the $i$th technological constraint has the form
$$\sum_{j=1}^{n} a_{ij}x_j \le b_i$$
we can add a nonnegative slack variable $s_i \ge 0$ to make a linear equation
$$\sum_{j=1}^{n} a_{ij}x_j + s_i = b_i$$

Similarly, if the $i$th technological constraint has the form
$$\sum_{j=1}^{n} a_{ij}x_j \ge b_i$$
we can subtract a nonnegative surplus variable $e_i \ge 0$ to make a linear equation
$$\sum_{j=1}^{n} a_{ij}x_j - e_i = b_i$$
On the other hand, a linear equation $\sum_{j=1}^{n} a_{ij}x_j = b_i$ can be converted into a pair of
inequalities, namely
$$\sum_{j=1}^{n} a_{ij}x_j \le b_i \quad\text{and}\quad \sum_{j=1}^{n} a_{ij}x_j \ge b_i$$

Restricted and Unrestricted Variables. The decision variables in a standard-form linear program are required to be nonnegative. If a variable is restricted to be
$x_j \ge l_j$, we can replace $x_j$ by $\hat{x}_j + l_j$ and require the new variable $\hat{x}_j \ge 0$. Similarly, if
a variable is restricted to be $x_j \le u_j$, we can replace $x_j$ by $u_j - \hat{x}_j$ and require the new
variable $\hat{x}_j \ge 0$.
As to an unrestricted variable $x_j \in R$, we can replace it by $\hat{x}_j - \hat{x}$ with two new
variables $\hat{x}_j \ge 0$ and $\hat{x} \ge 0$. Also note that, if $x_1, \ldots, x_k$ are a group of unrestricted
variables, we need to introduce only $k + 1$ new variables $\hat{x}_1, \ldots, \hat{x}_k$ and $\hat{x}$, such that $x_j$
is replaced by $\hat{x}_j - \hat{x}$, for $j = 1, \ldots, k$, with $\hat{x}_j \ge 0$ and $\hat{x} \ge 0$.
Maximization and Minimization. In case our objective is to maximize a linear
function, instead of minimizing, note that over any given region,
$$\max \left( \sum_{j=1}^{n} c_jx_j \right) = -\min \left( \sum_{j=1}^{n} (-c_j)x_j \right)$$
Therefore, we simply multiply the cost coefficients by $-1$ to convert a maximization
problem into a minimization problem. But, once the minimum of the new problem is
found, remember to multiply the minimum value by $-1$ for the original maximum.
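Putting the rules of this subsection together, the conversion can be carried out mechanically. The sketch below (an illustration with made-up data, assuming NumPy and SciPy) takes a small maximization problem with one "$\le$" row, one "$\ge$" row, and an unrestricted variable, builds the standard-form data, flips rows so that $b \ge 0$, and recovers the original maximum by negation:

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical example (all names illustrative):
#   maximize   3*y1 + 2*y2      with y1 >= 0, y2 unrestricted
#   subject to  y1 + y2 <=  4
#              -y1 + y2 >= -2
#
# Standard form in x = (y1, u, v, s1, e1), where y2 = u - v with u, v >= 0,
# s1 is a slack for the "<=" row, and e1 a surplus for the ">=" row.
c = np.array([-3.0, -2.0, 2.0, 0.0, 0.0])    # max -> min: negate the costs
A = np.array([[ 1.0,  1.0, -1.0, 1.0,  0.0],
              [-1.0,  1.0, -1.0, 0.0, -1.0]])
b = np.array([4.0, -2.0])

neg = b < 0                  # make b >= 0 by flipping rows with negative rhs
A[neg] *= -1.0
b[neg] *= -1.0

res = linprog(c, A_eq=A, b_eq=b, bounds=[(0, None)] * 5)
y1, y2 = res.x[0], res.x[1] - res.x[2]
print(y1, y2, -res.fun)      # recover the original maximum: 3.0 1.0 11.0
```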

Canonical-Form Linear Program. In addition to the standard form, linear programming problems are also commonly represented in the following canonical form:

$$
\begin{aligned}
\text{Minimize}\quad & \sum_{j=1}^{n} c_jx_j\\
\text{subject to}\quad & \sum_{j=1}^{n} a_{ij}x_j \ge b_i, \quad\text{for } i = 1, \ldots, m\\
& x_j \ge 0, \quad\text{for } j = 1, \ldots, n
\end{aligned}
$$

1.3 EXAMPLES OF LINEAR PROGRAMMING PROBLEMS

Modeling a problem is always an art. Although linear programming has long proved
its merit as an effective model of numerous applications, still there is no fixed rule of

modeling. In this section we present some classic examples of situations that have natural
formulations, from which we see that a general practice is to define decision variables
first. Each decision variable is associated with a certain activity of interest, and the
value of a decision variable may represent the level of the associated activity. Once the
decision variables are defined, the objective function usually represents the gain or loss of
taking these activities at different levels, and each technological constraint depicts certain
interrelationships among those activities. However, many sophisticated applications go
far beyond the general practice.

Example 1.1 The diet problem


Suppose $n$ different food items are available at the market and the selling price for the $j$th
food is $c_j$ per unit. Moreover, there are $m$ basic nutritional ingredients for the human body
and a minimum of $b_i$ units of the $i$th ingredient are required to achieve a balanced diet for
good health. In addition, a study shows that each unit of the $j$th food contains $a_{ij}$ units of
the $i$th nutritional ingredient. A dietitian of a large group may face a problem of determining
the most economical diet that satisfies the basic minimum nutritional requirements for good
health.
Since the activity of interest here is to determine the quantity of each food in the diet,
we define $x_j$ to be the number of units of food $j$ in the diet, for $j = 1, \ldots, n$. Then the
problem is to determine the $x_j$'s which minimize the total cost
$$\sum_{j=1}^{n} c_jx_j$$
subject to the nutritional requirements
$$\sum_{j=1}^{n} a_{ij}x_j \ge b_i, \quad\text{for } i = 1, \ldots, m$$
and the nonnegativity constraints
$$x_1 \ge 0,\; x_2 \ge 0,\; \ldots,\; x_n \ge 0$$
By subtracting a nonnegative surplus variable $e_i$ from each constraint, we have a linear
programming problem in its standard form:
$$
\begin{aligned}
\text{Minimize}\quad & \sum_{j=1}^{n} c_jx_j\\
\text{subject to}\quad & \sum_{j=1}^{n} a_{ij}x_j - e_i = b_i, \quad\text{for } i = 1, \ldots, m\\
& x_j \ge 0,\; e_i \ge 0, \quad\text{for } j = 1, \ldots, n,\; i = 1, \ldots, m
\end{aligned}
$$
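To make the formulation concrete, the standard-form diet problem can be assembled and solved numerically. The following sketch uses made-up prices, requirements, and nutrition data (three foods, two nutrients) and appends the surplus variables explicitly as the columns of $-I$; NumPy and SciPy are assumed:

```python
import numpy as np
from scipy.optimize import linprog

# Made-up diet data: n = 3 foods, m = 2 nutrients.
c = np.array([2.0, 3.0, 1.5])          # unit prices c_j
A = np.array([[4.0, 2.0, 1.0],         # a_ij: units of nutrient i in food j
              [1.0, 3.0, 2.0]])
b = np.array([8.0, 6.0])               # minimum requirements b_i
m, n = A.shape

# Standard form: [A | -I] (x, e)^T = b with x, e >= 0.
A_eq = np.hstack([A, -np.eye(m)])
c_eq = np.concatenate([c, np.zeros(m)])
res = linprog(c_eq, A_eq=A_eq, b_eq=b, bounds=[(0, None)] * (n + m))
print(res.x[:n], res.fun)              # cheapest diet and its total cost
```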



Example 1.2 The transportation problem


A moving company is contracted to ship certain product from $m$ sources to $n$ destinations.
There are $a_i$ units of product stored at the $i$th source, for $i = 1, \ldots, m$, and a minimum
of $b_j$ units of product are required to be received at the $j$th destination, for $j = 1, \ldots, n$.
Suppose the customer is willing to pay a price of $c_{ij}$ for moving one unit of product from
source $i$ to destination $j$ and the moving company is interested in fulfilling the contract with
a maximum earning.
Since the major activity of interest is to ship the product from a source to a destination,
we define $x_{ij}$ to be the number of units of product shipped from the $i$th source to the $j$th
destination, for $i = 1, \ldots, m$ and $j = 1, \ldots, n$. Then the problem is to find the $x_{ij}$'s which
maximize the total earnings
$$\sum_{i=1}^{m} \sum_{j=1}^{n} c_{ij}x_{ij}$$
subject to the source constraints
$$\sum_{j=1}^{n} x_{ij} \le a_i, \quad\text{for } i = 1, 2, \ldots, m$$
the destination constraints
$$\sum_{i=1}^{m} x_{ij} \ge b_j, \quad\text{for } j = 1, 2, \ldots, n$$
and the nonnegativity constraints
$$x_{ij} \ge 0, \quad\text{for } i = 1, \ldots, m,\; j = 1, \ldots, n$$


By adding a nonnegative slack variable to each source constraint, subtracting a nonnegative surplus variable from each destination constraint, and multiplying the total earnings
by $-1$, we have a standard-form linear programming problem:
$$
\begin{aligned}
\text{Minimize}\quad & \sum_{i=1}^{m} \sum_{j=1}^{n} (-c_{ij})x_{ij}\\
\text{subject to}\quad & \sum_{j=1}^{n} x_{ij} + s_i = a_i, \quad\text{for } i = 1, 2, \ldots, m\\
& \sum_{i=1}^{m} x_{ij} - e_j = b_j, \quad\text{for } j = 1, 2, \ldots, n\\
& x_{ij},\; s_i,\; e_j \ge 0, \quad\text{for } i = 1, \ldots, m,\; j = 1, 2, \ldots, n
\end{aligned}
$$
To assure that this problem has a feasible solution, the condition
$$\sum_{i=1}^{m} a_i \ge \sum_{j=1}^{n} b_j$$
is, of course, assumed.
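A small numerical sketch of the transportation model (made-up supplies, demands, and prices; two sources, three destinations; NumPy and SciPy assumed) follows. Rather than carrying the slack and surplus variables explicitly, it passes the source and destination rows as inequalities and negates the objective to turn the maximization into a minimization:

```python
import numpy as np
from scipy.optimize import linprog

# Made-up instance: m = 2 sources, n = 3 destinations.
a = np.array([30.0, 40.0])              # supplies a_i
b = np.array([20.0, 25.0, 15.0])        # demands b_j (sum a_i >= sum b_j)
C = np.array([[8.0, 6.0, 10.0],         # earnings c_ij per unit shipped
              [9.0, 12.0, 13.0]])
m, n = C.shape

# x_ij is flattened row by row; maximizing sum c_ij x_ij = minimizing -c.
c = -C.ravel()
A_src = np.kron(np.eye(m), np.ones((1, n)))   # rows for sum_j x_ij <= a_i
A_dst = np.kron(np.ones((1, m)), np.eye(n))   # rows for sum_i x_ij >= b_j
A_ub = np.vstack([A_src, -A_dst])             # ">=" rows become "<=" rows
b_ub = np.concatenate([a, -b])

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * (m * n))
print(res.x.reshape(m, n), -res.fun)          # shipping plan, max earnings
```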

Example 1.3 The warehousing problem


A warehouse has a fixed capacity C. The manager of the warehouse buys and sells the
stock of a certain commodity over a certain length of time to make profit. We break the
We break the
time window into $n$ periods (say one week per period) and assume that in the $j$th period
the same unit price $p_j$ holds for both purchase and sale. In addition, there is a unit cost $r$
for holding stock for one period. The warehouse is empty at the beginning and is required
to be empty at the end. How should the manager operate?
The major activities involve buying, selling, and holding the stock in each period.
We define $x_j$ to be the level of stock in the warehouse at the beginning of the $j$th period,
$y_j$ the amount bought during the period, and $z_j$ the amount sold during the period. Then
the manager tries to maximize his profit
$$\sum_{j=1}^{n} (p_jz_j - p_jy_j - rx_j)$$
subject to the inventory-balance constraints
$$x_{j+1} = x_j + y_j - z_j, \quad\text{for } j = 1, \ldots, n-1$$
the warehouse-capacity constraints
$$x_j \le C, \quad\text{for } j = 1, \ldots, n$$
the boundary conditions
$$x_1 = 0, \qquad x_n + y_n - z_n = 0$$
and the nonnegativity constraints
$$x_j \ge 0,\; y_j \ge 0,\; z_j \ge 0, \quad\text{for } j = 1, \ldots, n$$
After converting, we have a standard-form linear program:
$$
\begin{aligned}
\text{Minimize}\quad & \sum_{j=1}^{n} (-p_jz_j + p_jy_j + rx_j)\\
\text{subject to}\quad & x_j - x_{j+1} + y_j - z_j = 0, \quad\text{for } j = 1, \ldots, n-1\\
& x_j + s_j = C, \quad\text{for } j = 1, \ldots, n\\
& x_1 = 0\\
& x_n + y_n - z_n = 0\\
& x_j \ge 0,\; s_j \ge 0,\; y_j \ge 0,\; z_j \ge 0, \quad\text{for } j = 1, \ldots, n
\end{aligned}
$$
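Because the warehousing constraints couple consecutive periods, they are easy to mis-assemble by hand; the sketch below builds the balance, boundary, and capacity rows for a made-up four-period instance (prices, capacity, and holding cost are illustrative; NumPy and SciPy assumed):

```python
import numpy as np
from scipy.optimize import linprog

# Made-up instance: n = 4 periods, capacity C, holding cost r.
p = np.array([10.0, 12.0, 8.0, 15.0])    # period prices p_j
r, C, n = 1.0, 100.0, 4

# Variables: (x_1..x_n, y_1..y_n, z_1..z_n), flattened in that order.
c = np.concatenate([r * np.ones(n), p, -p])   # minimize -(profit)

A_eq, b_eq = [], []
for j in range(n - 1):                   # x_j - x_{j+1} + y_j - z_j = 0
    row = np.zeros(3 * n)
    row[j], row[j + 1] = 1.0, -1.0
    row[n + j], row[2 * n + j] = 1.0, -1.0
    A_eq.append(row); b_eq.append(0.0)
row = np.zeros(3 * n); row[0] = 1.0      # x_1 = 0 (empty at the start)
A_eq.append(row); b_eq.append(0.0)
row = np.zeros(3 * n)                    # x_n + y_n - z_n = 0 (empty at end)
row[n - 1], row[2 * n - 1], row[3 * n - 1] = 1.0, 1.0, -1.0
A_eq.append(row); b_eq.append(0.0)

# Capacity x_j <= C: passed directly as inequalities here, instead of
# adding the slack variables s_j of the standard form above.
A_ub = np.hstack([np.eye(n), np.zeros((n, 2 * n))])
res = linprog(c, A_ub=A_ub, b_ub=np.full(n, C),
              A_eq=np.array(A_eq), b_eq=np.array(b_eq),
              bounds=[(0, None)] * (3 * n))
print(res.x[:n], -res.fun)   # stock levels; with these prices, profit 700
```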

Example 1.4 The cutting-stock problem


A metal slitting company cuts master rolls of standard width $w$ and length $l$ into subrolls of
smaller width but the same length $l$. Customers specify their orders in terms of the number
of subrolls of different widths. The objective is to use a minimum number of master rolls

Suppose that there are $m$ different widths specified by customers, say $w_1, w_2, \ldots, w_m$,
and customers require $b_i$ subrolls of width $w_i$, for $i = 1, \ldots, m$. For a master roll with
width $w$ (of course, $w_i \le w$ for each $i$), there are many ways to cut it into subrolls. For
example, suppose subrolls of widths 3, 5, 7 are cut from a master roll of width 10. We can cut a
master roll to produce three subrolls of width 3, zero subrolls of width 5, and zero subrolls
of width 7; or cut to produce one subroll of width 3, zero subrolls of width 5, and one
subroll of width 7; or cut to produce zero subrolls of width 3, two subrolls of width 5,
and zero subrolls of width 7; and so on. Each such way is called a feasible cutting pattern.
Although the total number of all possible cutting patterns may become huge, the number of
feasible cutting patterns is always finite, say $n$. If we let $a_{ij}$ be the number of subrolls of
width $w_i$ obtained by cutting one master roll according to pattern $j$, then
$$\sum_{i=1}^{m} a_{ij}w_i \le w$$
is required for the pattern to be feasible. Now define $x_j$ to be the number of master rolls
cut according to the $j$th feasible pattern, and the cutting-stock problem becomes an integer
linear programming problem:
$$
\begin{aligned}
\text{Minimize}\quad & \sum_{j=1}^{n} x_j\\
\text{subject to}\quad & \sum_{j=1}^{n} a_{ij}x_j \ge b_i, \quad\text{for } i = 1, \ldots, m\\
& x_j \ge 0, \quad\text{for } j = 1, \ldots, n\\
& x_j \text{ integer}, \quad\text{for } j = 1, \ldots, n
\end{aligned}
$$
If the integrality requirement on the $x_j$'s is dropped, the problem becomes a linear programming problem.
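For small instances, the feasible cutting patterns can be enumerated directly, after which the LP relaxation is immediate. The sketch below uses the illustrative widths 3, 5, and 7 with a master width of 10 (the demands are made up; NumPy and SciPy assumed):

```python
import numpy as np
from itertools import product
from scipy.optimize import linprog

# Master width w = 10; customer widths 3, 5, 7 with made-up demands b_i.
w = 10
widths = np.array([3, 5, 7])
b = np.array([25.0, 20.0, 18.0])

# Enumerate all feasible cutting patterns: sum_i a_ij * w_i <= w.
max_counts = [w // wi for wi in widths]
patterns = [np.array(p)
            for p in product(*(range(k + 1) for k in max_counts))
            if np.dot(p, widths) <= w and sum(p) > 0]
A = np.column_stack(patterns)            # one column per feasible pattern

# LP relaxation: minimize total rolls subject to A x >= b, x >= 0.
res = linprog(np.ones(A.shape[1]), A_ub=-A, b_ub=-b,
              bounds=[(0, None)] * A.shape[1])
print(A, res.x, res.fun)                 # patterns, usage, and roll count
```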

1.4 MASTERING LINEAR PROGRAMMING

This is a book on linear programming and its extensions. The authors see three key
elements in the mastering of linear programming, namely,

1. Intuitions generated by observing geometric interpretations.


2. Properties proven by manipulating algebraic expressions.
3. Algorithms validated by computer implementations.

The first step of learning is to "see" problems and have a feeling about them. In
this way, we are led to understand the known properties and conjecture new ones. The
second step is to translate geometric properties into algebraic expressions and to develop
algebraic skills to manipulate them in proving new results. Once the problems are
understood and basic results are obtained, the third step is to develop solution procedures.

Since the most important characteristic of a high-speed computer is its ability to perform
repetitive operations very efficiently, linear programming algorithms are introduced in
an iterative scheme and validated by computer implementations.
The basic philosophy of solving a linear programming problem via an iterative
scheme is to start from a rough solution and successively improve the current solution
until a set of desired optimality conditions are met. In this book, we treat the simplex
method, the ellipsoid method, and Karmarkar's algorithm and its variants from this
integrated iterative approach. The layout of the book is as follows. We provide simple
geometry of linear programming in Chapter 2, introduce the classic simplex method in
Chapter 3, and study the fascinating duality theory and sensitivity analysis in Chapter 4.
From the complexity point of view, we further introduce Khachian's ellipsoid method
in Chapter 5 and Karmarkar's algorithm in Chapter 6. The affine scaling algorithms,
as variants of Karmarkar's algorithm, are the topics of Chapter 7. The insights of the
interior-point methods are discussed in Chapter 8. Then we extend our horizon to touch
on the convex quadratic programming in Chapter 9. Finally we wrap up the book by
studying the computer implementation issues in Chapter 10.

REFERENCES FOR FURTHER READING

1.1. Bazaraa, M.S., Jarvis, J. J., and Sherali, H. D., Linear Programming and Network Flows, 2d
ed., John Wiley, New York (1990).
1.2. Bland, R. G., Goldfarb, D., and Todd, M. J., "The ellipsoid method: a survey," Operations
Research 29, 1039-1091 (1981).
1.3. Borgwardt, K. H., The Simplex Method: A Probabilistic Analysis, Springer-Verlag, Berlin
(1987).
1.4. Chvatal, V., Linear Programming, Freeman, San Francisco (1983).
1.5. Dantzig, G. B., "Maximization of a linear function of variables subject to linear inequalities,"
Activity Analysis of Production and Allocation, edited by T. C. Koopmans, John Wiley, New
York, 339-347 (1951).
1.6. Dantzig, G. B., Linear Programming and Extensions, Princeton University Press, Princeton,
NJ (1963).
1.7. Gass, S. I., Linear Programming: Methods and Applications, 2d ed., McGraw-Hill, New York
(1964).
1.8. Gilmore, P. C., and Gomory, R. E., "A linear programming approach to the cutting-stock
problem," Operations Research 9, 849-859 (1961).
1.9. Gilmore, P. C., and Gomory, R. E., "A linear programming approach to the cutting-stock
problem-Part II," Operations Research 11, 863-888 (1963).
1.10. Goldfarb, D., and Todd, M. J., "Linear Programming," in Optimization, Handbook in Op-
erations Research and Management Science, ed. by Nemhauser, G. L. and Rinnooy Kan,
A. H. G., Vol. 1, 73-170, Elsevier-North Holland, Amsterdam (1989).
1.11. Hooker, J. N., "Karmarkar's linear programming algorithm," Interfaces 16, 75-90 (1986).
Exercises 11

1.12. Kantorovich, L. V., "Mathematical methods of organizing and planning production" (in Rus-
sian), Publication House of the Leningrad State University, Leningrad (1939), (English trans-
lation) Management Science 6, 366-422 (1959-60).
1.13. Karmarkar, N., "A new polynomial time algorithm for linear programming," Combinatorica
4, 373-395 (1984).
1.14. Karush, W., "Minima of functions of several variables with inequalities as side constraints,"
Master's thesis, Department of Mathematics, University of Chicago (1939).
1.15. Khachian, L. G., "A polynomial algorithm in linear programming" (in Russian), Doklady
Akademiia Nauk SSSR 224, 1093-1096, (English translation) Soviet Mathematics Doklady
20, 191-194 (1979).
1.16. Luenberger, D. G., Introduction to Linear and Nonlinear Programming, 2d ed., Addison-
Wesley, Reading, MA (1973).
1.17. Murty, K. G., Linear Programming, John Wiley, New York (1983).
1.18. Shamir, R., "The efficiency of the simplex method: a survey," Management Science 33,
301-334 (1987).

EXERCISES

1.1. Convert the following linear programming problems into standard form:
(a) Minimize $4x_1 + \sqrt{2}\,x_2 - 0.35x_3$

subject to

$x_1, x_3 \ge 0$
(b) Maximize $-3.1x_1 + 2\sqrt{2}\,x_2 - x_3$

subject to $100x_1 - 20x_2 = 7$

$-11x_1 - 7\pi x_2 - 2x_3 \le 400$

$x_1 \ge 20, \quad x_2 \ge 0, \quad x_3 \ge -15$

(c) Maximize $x_1 + 3x_2 - 2x_3$

subject to

$x_2 \ge 0, \quad x_3 \le 10$
1.2. Consider a linear programming problem:

Minimize $2x_1 + 6x_2 + 8x_3$

subject to $x_1 + 2x_2 + x_3 = 5$

(a) Convert this problem into its standard form.
(b) Can you find an equivalent linear programming problem with only two variables? [Hint:
Eliminate $x_1$ from the constraints and replace it by $5 - 2x_2 - x_3$ in the objective function.]
(c) Convert the equivalent linear program into standard form.
(d) Try to solve the problem.
1.3. Consider the following problem:

Minimize $x_1^2 + x_2 + 4x_3$

subject to $x_1^2 - x_2 = 0$

(a) Is this a linear programming problem?
(b) Can you solve this problem by finding an equivalent linear programming problem?
[Hint: Use the first constraint.]
(c) Can you convert the equivalent linear programming problem into its standard form?
(d) Can you solve the linear program? The original problem?
1.4. Consider the following optimization problem:

Minimize $|x_1| + 2|x_2| - |x_3|$

subject to $x_1 + x_2 - x_3 \le 10$

$x_1 - 3x_2 + 2x_3 = 12$

(a) Is this a linear programming problem?
(b) Can you convert it into a linear program in standard form? [Hint: For any real number
$x$, we can find $u, v \ge 0$ such that $|x| = u + v$ and $x = u - v$.]
(c) Convert the following problem into a linear program in standard form:

Minimize $|x_1 - 5| + |x_2 + 4|$

subject to $x_1 + x_2 \le 10$

$x_1 - 3x_2 \ge 2$

1.5. CHIPCO produces two kinds of memory chips (Chip-1 and Chip-2) for computer usage.
The unit selling price is $15 for Chip-1 and $25 for Chip-2. To make one Chip-1, CHIPCO
has to invest 3 hours of skilled labor, 2 hours of unskilled labor, and 1 unit of raw material.
To make one Chip-2, it takes 4 hours of skilled labor, 3 hours of unskilled labor, and 2 units
of raw material. The company has 100 hours of skilled labor, 70 hours of unskilled labor,
and 30 units of raw material available. The sales contract signed by CHIPCO requires that
at least 3 units of Chip-2 have to be produced and any fractional quantity is acceptable.
Can you formulate a linear program to help CHIPCO determine its optimal product
mix?
1.6. Assignment problem. Five persons (A, B, C, D, E) are assigned to work on five different
projects. The following table shows how long it takes for a specific person to finish a
specific project:
                 Project #
Person     1     2     3     4     5
  A        5     5     7     4     8
  B        6     5     8     3     7
  C        6     8     9     5    10
  D        7     6     6     3     6
  E        6     7    10     6    11
The standard wage is $60 per person per day. Suppose that one person is assigned
to do one project and every project has to be covered by one person. Can you formulate
this problem as an integer linear program?
1.7. INTER-TRADE company buys no-brand textile outlets from China, India, and the Philippines, ships to either Hong Kong or Taiwan for packaging and labeling, and then ships to the
United States or France for sale. The transportation costs between sources and destinations
can be read from the following table:

               China      India      Philippines    USA         France

Hong Kong      $50/ton    $90/ton    $70/ton        $150/ton    $180/ton

Taiwan         $60/ton    $95/ton    $50/ton        $130/ton    $200/ton
Suppose that INTER-TRADE purchased 60 tons of no-brands from China, 45 tons
from India, and 30 tons from the Philippines. The U.S. market demands 80 tons of labeled
products and the French market 55 tons. Assume that packaging and labeling do not change
the weight of textile products.
(a) If both Hong Kong and Taiwan have unlimited packaging and labeling capacity, for-
mulate a linear program to help INTER-TRADE minimize the shipping cost.
(b) If Hong Kong can process at most 60 tons of no-brands, what will be changed in your
formulation?
(c) If Hong Kong can process at most 60 tons of no-brands and Taiwan can process at most
50 tons, what will happen to your formulation?
(d) Under condition (c), try to reduce the linear program to two independent transportation
problems.
2

Geometry of Linear
Programming

The intent of this chapter is to provide a geometric interpretation of linear programming.


Once the underlying geometry is understood, we can follow intuitions to manipulate
algebraic expressions in validating known results and developing new insights into linear
programming. We shall stick to linear programs in standard form in this chapter. Some
terminology and basic concepts will be defined before the fundamental theorem of linear
programming is introduced. Motivations of the classic simplex method and the newly
developed interior-point approach will then be discussed.

2.1 BASIC TERMINOLOGY OF LINEAR PROGRAMMING

Consider a linear programming problem in its standard form:

$$
\begin{aligned}
\text{Minimize}\quad & c^T x\\
\text{subject to}\quad & Ax = b\\
& x \ge 0
\end{aligned}
\qquad (2.1)
$$

where $c$ and $x$ are $n$-dimensional column vectors, $A$ an $m \times n$ matrix, and $b$ an $m$-dimensional column vector. Usually, $A$ is called the constraint matrix, $b$ the right-hand-side vector, and $c$ the cost vector. Note that we can always assume that $b \ge 0$, since for
any component $b_i < 0$, multiplying both sides of the $i$th constraint by a factor $-1$ results
in a new positive right-hand-side coefficient. Now we define $P = \{x \in R^n \mid Ax = b,\;
x \ge 0\}$ to be the feasible domain or feasible region of the linear program. When $P$ is not
void, the linear program is said to be consistent. For a consistent linear program with a

feasible solution $x^* \in P$, if $c^Tx^*$ attains the minimum value of the objective function $c^Tx$
over the feasible domain $P$, then we say $x^*$ is an optimal solution to the linear program.
We also denote $P^* = \{x^* \in P \mid x^* \text{ is an optimal solution}\}$ as the optimal solution set.
Moreover, we say a linear program has a bounded feasible domain if there exists
a positive constant $M$ such that for every feasible solution $x$ in $P$, its Euclidean norm,
$\|x\| = (x_1^2 + x_2^2 + \cdots + x_n^2)^{1/2}$, is less than or equal to $M$. On the other hand, if there
exists a constant $C$ such that $c^Tx \ge C$ for each $x \in P$, then we say the linear program is
bounded. In this context, we know a linear program with a bounded feasible domain must
be bounded, but the converse statement need not be true.
Our immediate objective is to examine the geometry of the feasible domain $P$ and
the linear objective function $c^Tx$ of a linear program.

2.2 HYPERPLANES, HALFSPACES, AND POLYHEDRAL SETS

A fundamental geometric entity occurring in linear optimization is the hyperplane
$$H = \{x \in R^n \mid a^Tx = \beta\} \qquad (2.2)$$
whose description involves a nonzero $n$-dimensional column vector $a$ and a scalar $\beta$. A
hyperplane separates the whole space into two closed halfspaces
$$H_L = \{x \in R^n \mid a^Tx \le \beta\} \qquad (2.3)$$
and
$$H_U = \{x \in R^n \mid a^Tx \ge \beta\} \qquad (2.4)$$
that intersect at the hyperplane $H$. Removing $H$ results in two disjoint open halfspaces
$$H_L^\circ = \{x \in R^n \mid a^Tx < \beta\} \qquad (2.5)$$
and
$$H_U^\circ = \{x \in R^n \mid a^Tx > \beta\} \qquad (2.6)$$
We further define $H$ to be the bounding hyperplane of $H_L$ and $H_U$.


The defining vector a of hyperplane H is called the normal of H. Since, for any
two vectors y and z E H,
T
a (y - z) = aT y - T
a z = ,8 - ,8 =0
we know the normal vector a is orthogonal to all vectors that are parallel to the hyperplane
H. Moreover, for each vector z in H and w in HL,
aT (w - z) = aT w - aT z < ,8 - ,8 = 0
This shows that the normal vector a makes an obtuse angle with any vector that points
from the hyperplane toward the interior of HL. In other words, a is directed toward the
exterior of HL. Figure 2.1 illustrates this geometry.

Figure 2.1 The hyperplane $H = \{x \in R^n \mid a^Tx = \beta\}$ and its normal $a$.

For a linear program in its standard form, the hyperplanes
$$\{x \in R^n \mid c^Tx = z\}$$
depict the contours of the linear objective function as $z$ varies, and the cost vector $c$ becomes the
normal of its contour hyperplanes.
We further define a polyhedral set or polyhedron to be a set formed by the intersection of a finite number of closed halfspaces. If the intersection is nonvoid and bounded,
it is called a polytope. For a linear program in its standard form, if we denote $a_i$ to be
the $i$th row of the constraint matrix $A$ and $b_i$ the $i$th element of the right-hand-side vector $b$,
then we have $m$ hyperplanes
$$H_i = \{x \in R^n \mid a_i^Tx = b_i\}, \quad i = 1, \ldots, m$$
and the feasible domain $P$ becomes the intersection of these hyperplanes and the first
orthant of $R^n$. Notice that each hyperplane $H$ is an intersection of two closed halfspaces
$H_L$ and $H_U$, and the first orthant of $R^n$ is the intersection of the $n$ closed halfspaces $\{x \in
R^n \mid x_i \ge 0\}$ $(i = 1, 2, \ldots, n)$. Hence the feasible domain $P$ is a polyhedral set. An
optimal solution of the linear program can be easily identified if we see how the contour
hyperplanes formed by the cost vector $c$ intersect with the polyhedron formed by the
constraints.
Consider the following linear programming problem:

Example 2.1
$$
\begin{aligned}
\text{Minimize}\quad & -x_1 - 2x_2\\
\text{subject to}\quad & x_1 + x_2 + x_3 = 40\\
& 2x_1 + x_2 + x_4 = 60\\
& x_1, x_2, x_3, x_4 \ge 0
\end{aligned}
$$
Although it has four variables, the feasible domain can be represented as a two-dimensional graph defined by
$$x_1 + x_2 \le 40, \quad 2x_1 + x_2 \le 60, \quad x_1 \ge 0, \quad x_2 \ge 0$$
Hence we see a graphical representation in Figure 2.2.



Figure 2.2 The feasible domain of Example 2.1, bounded by the lines $2x_1 + x_2 = 60$ and $x_1 + x_2 = 40$ (with intercepts at $(30, 0)$ and $(40, 0)$).
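The graphical solution can be cross-checked numerically. A minimal sketch (assuming NumPy and SciPy) feeds the standard-form data of Example 2.1 to a solver:

```python
import numpy as np
from scipy.optimize import linprog

# Example 2.1: minimize -x1 - 2x2 subject to
#   x1 + x2 + x3 = 40,  2x1 + x2 + x4 = 60,  x >= 0.
c = np.array([-1.0, -2.0, 0.0, 0.0])
A = np.array([[1.0, 1.0, 1.0, 0.0],
              [2.0, 1.0, 0.0, 1.0]])
b = np.array([40.0, 60.0])

res = linprog(c, A_eq=A, b_eq=b, bounds=[(0, None)] * 4)
print(res.x, res.fun)   # optimum at (x1, x2) = (0, 40) with value -80
```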

2.3 AFFINE SETS, CONVEX SETS, AND CONES

A more detailed study of polyhedral sets and polytopes requires the following definition:

Given $p$ points $x^1, x^2, \ldots, x^p \in R^n$ and $p$ scalars $\lambda_1, \lambda_2, \ldots, \lambda_p \in R$, the
expression $\lambda_1x^1 + \lambda_2x^2 + \cdots + \lambda_px^p$ is called a linear combination. The linear combination becomes an affine combination when $\lambda_1 + \lambda_2 + \cdots + \lambda_p = 1$; a convex combination
when $\lambda_1 + \lambda_2 + \cdots + \lambda_p = 1$ and $0 \le \lambda_1, \lambda_2, \ldots, \lambda_p \le 1$; and a convex conical
combination when $0 \le \lambda_1, \lambda_2, \ldots, \lambda_p$.

To understand the geometric meaning of the affine and convex combinations, we
consider the case of two points $x^1$ and $x^2$ and their linear combination. Since we can
always let $\lambda_1 = 1 - s$ and $\lambda_2 = s$, for a scalar $s$, to replace the equation $\lambda_1 + \lambda_2 = 1$, we
see that
$$\lambda_1x^1 + \lambda_2x^2 = x^1 + s(x^2 - x^1)$$
Consequently, we know the set of all affine combinations of distinct points $x^1, x^2 \in R^n$ is
the whole line determined by these two points, while the set of all convex combinations
is the line segment joining $x^1$ and $x^2$. Obviously each convex combination is an affine
combination, but the converse statement holds only when $x^1 = x^2$.
Following the previous definition, for a nonempty subset $S \subset R^n$, we say $S$ is
affine if $S$ contains every affine combination of any two points $x^1, x^2 \in S$; $S$ is convex if
$S$ contains every convex combination of any two points $x^1, x^2 \in S$.
It is clear that affine sets are convex, but convex sets need not be affine. Moreover,
the intersection of a collection (either finite or infinite) of affine sets is either empty or
affine, and the intersection of a collection (either finite or infinite) of convex sets is either
empty or convex.
We may notice that hyperplanes are affine (and hence convex), but closed halfspaces
are convex only (not affine). Hence the linear manifold (the solution set of a finite system
of linear equations) $\{x \in R^n \mid Ax = b\}$ is affine (and hence convex), but the feasible
domain $P$ of our linear program is convex only.

Given a set $S \subset R^n$ and $x \in S$, we say $x$ is an interior point of $S$ if there exists
a scalar $\epsilon > 0$ such that the open ball $B = \{y \in R^n \mid \|x - y\| < \epsilon\}$ is contained in $S$.
Otherwise $x$ is a boundary point of $S$.
For a convex set $S \subset R^n$, a key geometric property is due to the following separation theorem:

Separation Theorem. Let $S$ be a convex subset of $R^n$ and $x$ be a boundary point
of $S$. Then there is a hyperplane $H$ containing $x$ with $S$ contained in either $H_L$ or $H_U$.

Based on this theorem, we can define a supporting hyperplane $H$ to be a hyperplane
such that (i) the intersection of $H$ and $S$ is not empty, and (ii) $H_L$ contains $S$. A picture
of a supporting hyperplane to a convex set is given by Figure 2.3.

Figure 2.3 A supporting hyperplane $H$ of a convex set.

One very important fact to point out here is that the intersection of the polyhedral
set $P$ and the supporting hyperplane with the negative cost vector $-c$ as its normal
provides optimal solutions to our linear programming problem. This fact will be proved
in the exercises, and it is the key idea of solving linear programming problems by the
"graphic method." Figure 2.4 illustrates this situation for Example 2.1.

Figure 2.4 The supporting hyperplane with normal $-c = (1, 2)^T$ supporting the feasible domain of Example 2.1 at the optimal extreme point.



In general, for a convex polyhedral set $P$ and a supporting hyperplane $H$, the
intersection set $F = P \cap H$ is called a face of $P$. If $F$ is a zero-dimensional set, we
have a vertex; one-dimensional, an edge; and one dimension less than the set $P$, a facet.
To define the dimensionality of a subset of $R^n$, we start with an affine subspace. For a
subspace $S \subset R^n$ and a vector $a \in R^n$, the set
$$S_a = \{y = x + a \mid x \in S\} \qquad (2.7)$$
is called an affine subspace of $R^n$. Basically, translating a subspace by a vector results
in an affine subspace. The dimension of $S_a$ is equal to the maximum number of linearly
independent vectors in $S$. The dimension of a subset $C \subset R^n$ is then defined to be the
smallest dimension of any affine subspace which contains $C$.
One more important structure to define is the conical set. A nonempty set $C \subset R^n$
is a cone if $\lambda x \in C$ for each $x \in C$ and $\lambda \ge 0$. It is obvious that each cone contains the
zero vector. Moreover, a cone that contains at least one nonzero vector $x$ must contain
the "ray" of $x$, namely $\{\lambda x \mid \lambda \ge 0\}$. Such cones can clearly be viewed as the union of
rays. A cone need not be convex, but given an $m \times n$ matrix $M$, a convex cone can
be generated by the columns of $M$, namely
$$M_c = \{y \in R^m \mid y = Mw,\; w \in R^n,\; w \ge 0\} \qquad (2.8)$$
This particular cone will be used in later chapters.
Affine sets, convex sets, and convex cones have certain important properties in
common. Given a nonempty set $S \subset R^n$, the set of all affine (convex, convex conical)
combinations of points in $S$ is an affine (convex, convex conical) set which is identical
to the intersection of all affine (convex, convex conical) sets containing $S$. We call
this set the affine (convex, convex conical, correspondingly) hull.

2.4 EXTREME POINTS AND BASIC FEASIBLE SOLUTIONS

Extreme points of a polyhedral set are geometric entities, while the basic feasible solutions
of a system of linear equations and inequalities are defined algebraically. When these
two basic concepts are linked together, we have algebraic tools, guided by geometric
intuition, to solve linear programming problems.
The definition of extreme points is stated here: A point $x$ in a convex set $C$ is said
to be an extreme point of $C$ if $x$ is not a convex combination of any other two distinct
points in $C$. In other words, an extreme point is a point that does not lie strictly within
the line segment connecting two other points of the convex set. From the pictures of
convex polyhedral sets, especially in lower-dimensional spaces, it is clear to see that the
extreme points are those "vertices" of a convex polyhedron. A formal proof is left as an
exercise.
To characterize the extreme points of the feasible domain $P = \{x \in R^n \mid Ax =
b,\; x \ge 0\}$ of a given linear program in its standard form, we may assume that $A$ is an
$m \times n$ matrix with $m \le n$. We also denote the $j$th column of $A$ by $A_j$, for $j = 1, 2, \ldots, n$.

Then for each point $x = (x_1, x_2, \ldots, x_n)^T \in P$, we have
$$x_1A_1 + x_2A_2 + \cdots + x_nA_n = b \qquad (2.9)$$
Therefore we call column $A_j$ the corresponding column of the $j$th component $x_j$ of $x$,
for $j = 1, \ldots, n$. Moreover, we have the following theorem.

Theorem 2.1. A point $x$ of the polyhedral set $P = \{x \in R^n \mid Ax = b,\; x \ge 0\}$
is an extreme point of $P$ if and only if the columns of $A$ corresponding to the positive
components of $x$ are linearly independent.
Proof. Without loss of generality, we may assume that the components of $x$ are
zero except for the first $p$ components, namely

$$x = \begin{pmatrix} \bar{x} \\ 0 \end{pmatrix} \quad\text{where } \bar{x} = (x_1, \ldots, x_p)^T > 0$$

We also denote the first $p$ columns of matrix $A$ by $\bar{A}$. Hence $Ax = \bar{A}\bar{x} = b$.
($\Rightarrow$ side): Suppose that the columns of $\bar{A}$ are not linearly independent; then there
exists a nonzero vector $w$ such that $\bar{A}w = 0$. We define $\bar{y}^1 = \bar{x} + \delta w$ and $\bar{y}^2 = \bar{x} - \delta w$.
For a small enough $\delta > 0$, we see $\bar{y}^1, \bar{y}^2 \ge 0$, and $\bar{A}\bar{y}^1 = \bar{A}\bar{y}^2 = \bar{A}\bar{x} = b$. We further
define
$$y^1 = \begin{pmatrix} \bar{y}^1 \\ 0 \end{pmatrix} \quad\text{and}\quad y^2 = \begin{pmatrix} \bar{y}^2 \\ 0 \end{pmatrix}$$
Then we know $y^1, y^2 \in P$ and $x = \frac{1}{2}y^1 + \frac{1}{2}y^2$. In other words, $x$ is not an extreme
point of $P$.
($\Leftarrow$ side): Suppose that $x$ is not an extreme point; then $x = \lambda y^1 + (1 - \lambda)y^2$ for
some distinct $y^1, y^2 \in P$ and $0 < \lambda < 1$. Since $y^1, y^2 \ge 0$ and $0 < \lambda < 1$, the last $n - p$
components of $y^1$ must be zero. Consequently, we have a nonzero vector $w = x - y^1$,
whose last $n - p$ components are also zero, such that $Aw = \bar{A}\bar{w} = Ax - Ay^1 = b - b = 0$,
where $\bar{w}$ consists of the first $p$ components of $w$. This shows that the columns of $\bar{A}$ are
linearly dependent.

For an $m \times n$ matrix $A$ (assuming $m \le n$), if there exist $m$ linearly independent
columns of $A$, we say $A$ has full row rank, or full rank in short. In this case, we can
group those $m$ linearly independent columns together to form a basis $B$ and leave the
remaining $n - m$ columns as the nonbasis $N$. In other words, we can rearrange $A = [B \mid N]$.
We also rearrange the components of any solution vector $x$ in the corresponding order,
namely
$$x = \begin{pmatrix} x_B \\ x_N \end{pmatrix}$$
For a component in $x_B$, its corresponding column is in the basis $B$; we call it a basic
variable. Similarly, those components in $x_N$ are called nonbasic variables. Since $B$ is
a nonsingular $m \times m$ matrix, we can always set all nonbasic variables to be zero, i.e.,

$x_N = 0$, and solve the system of equations $Bx_B = b$ for the basic variables. Then the vector
$$x = \begin{pmatrix} x_B \\ x_N \end{pmatrix} = \begin{pmatrix} B^{-1}b \\ 0 \end{pmatrix}$$
becomes a basic solution. Furthermore, if $x_B = B^{-1}b \ge 0$, then we say $x$ is a basic
feasible solution to the linear program.
If matrix $A$ does not have full row rank, then either the system of equations $Ax = b$
has no solution (hence $P = \emptyset$) or some constraints are redundant. After removing
redundant constraints from $A$, the remaining matrix has full row rank. Therefore, we
assume that the constraint matrix $A$ of a given linear programming problem always has
full row rank unless specified otherwise. Under this assumption, since there are at most
$$C(n, m) = \frac{n!}{m!\,(n - m)!}$$
different ways of choosing $m$ linearly independent columns from the $n$ columns of $A$, we
know there are at most $C(n, m)$ basic solutions.
The following corollary is a direct consequence of Theorem 2.1.

Corollary 2.1.1. A point x ∈ P = {x | Ax = b, x ≥ 0} is an extreme point of P
if and only if x is a basic feasible solution corresponding to some basis B.

By noticing that every basic feasible solution is a basic solution, we have the next
corollary.

Corollary 2.1.2. For a given linear program in its standard form, there are at
most C(n, m) extreme points in its feasible domain P.
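As a concrete check of Theorem 2.1, the following sketch (ours, not from the text; it assumes numpy is available and 0-based column indices) tests a feasible point by computing the rank of the columns of A picked out by its positive components. The data is the feasible domain of Figure 2.2, i.e., x_1 + x_2 + x_3 = 40, 2x_1 + x_2 + x_4 = 60.

```python
import numpy as np

def is_extreme_point(A, x, tol=1e-9):
    # Theorem 2.1: a feasible x is an extreme point of P if and only if the
    # columns of A at the positive components of x are linearly independent.
    pos = np.where(x > tol)[0]              # indices of positive components
    if len(pos) == 0:                       # x = 0: nothing to test
        return True
    return np.linalg.matrix_rank(A[:, pos]) == len(pos)

A = np.array([[1., 1., 1., 0.],
              [2., 1., 0., 1.]])
print(is_extreme_point(A, np.array([0., 0., 40., 60.])))    # True  (vertex (0, 0))
print(is_extreme_point(A, np.array([10., 10., 20., 30.])))  # False (interior point)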

2.5 NONDEGENERACY AND ADJACENCY

A very important fact to mention is that the correspondence between basic feasible
solutions and extreme points of P, as described in Corollary 2.1.1, in general is not one-
to-one. Corresponding to each basic feasible solution there is a unique extreme point
in P, but corresponding to each extreme point in P there may be more than one basic
feasible solution.
Consider the polytope P defined by

P = {x ∈ R^4 | x_1 + x_2 + x_3 = 10, x_1 + x_4 = 10, x_1, x_2, x_3, x_4 ≥ 0}    (2.10)

or, equivalently for its graph in Figure 2.5, we have

P = {x ∈ R^2 | x_1 + x_2 ≤ 10, x_1 ≤ 10, x_1, x_2 ≥ 0}    (2.11)

Note that P has three extreme points in Figure 2.5, namely,

A = (0, 0), B = (0, 10), and C = (10, 0)

Figure 2.5

Their four-dimensional coordinates corresponding to (2.10) are

A = (0, 0, 10, 10), B = (0, 10, 0, 10), and C = (10, 0, 0, 0)
We can check that extreme point A corresponds to one basic feasible solution
taking x_3 and x_4 as basic variables, and extreme point B corresponds to the basic feasible
solution taking x_2 and x_4 as basic variables. However, extreme point C corresponds to
three basic feasible solutions: one takes x_1 and x_2 as basic variables, one takes x_1 and
x_3, and the remaining one takes x_1 and x_4. The reason C has more than one
corresponding basic feasible solution is that all three of them have one basic variable
with value zero, which makes them indistinguishable from one another. Based on this
observation, we define a basic feasible solution to be nondegenerate if it has exactly m
positive basic variables. Otherwise, the basic feasible solution has at least n − m + 1 zero
elements in it, and we call it a degenerate case.
A linear programming problem is nondegenerate if all of its basic feasible solutions are
nondegenerate. In this case, there is a one-to-one correspondence between the extreme
points and the basic feasible solutions. This nondegeneracy assumption of a given linear
programming problem will greatly simplify matters in solving linear programming
problems. We shall discuss it further in the next chapter.
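The many-to-one correspondence at C is easy to verify numerically. The sketch below (ours, assuming numpy; indices printed 1-based to match the text) enumerates all two-column bases of the constraint matrix in (2.10) and prints the basic feasible solution of each; three distinct bases yield the same extreme point C.

```python
import numpy as np
from itertools import combinations

# Constraint data of (2.10): x1 + x2 + x3 = 10, x1 + x4 = 10
A = np.array([[1., 1., 1., 0.],
              [1., 0., 0., 1.]])
b = np.array([10., 10.])
m, n = A.shape

# Every nonsingular choice of m columns is a basis.
for basis in combinations(range(n), m):
    B = A[:, basis]
    if abs(np.linalg.det(B)) < 1e-9:   # columns linearly dependent: skip
        continue
    x = np.zeros(n)
    x[list(basis)] = np.linalg.solve(B, b)
    if (x >= -1e-9).all():             # keep only basic FEASIBLE solutions
        print("basis", tuple(j + 1 for j in basis), "->", x)
# Three different bases -- {x1,x2}, {x1,x3}, {x1,x4} -- all print C = (10, 0, 0, 0).
```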
Two basic feasible solutions of P are adjacent if they use m − 1 basic variables
in common to form their bases. For example, in Figure 2.2, it is easy to verify that extreme
point (0, 40) is adjacent to (0, 0) but not adjacent to (30, 0), since (0, 40) takes x_2 and x_4
as basic variables, while (0, 0) takes x_3 and x_4 and (30, 0) takes x_1 and x_3. Under the
"nondegeneracy assumption," since each of the n − m nonbasic variables could replace
one current basic variable in a given basic feasible solution, we know that every basic
feasible solution (hence its corresponding extreme point) has exactly n − m adjacent
neighbors. Actually, each adjacent basic feasible solution can be reached by increasing
the value of one nonbasic variable from zero to positive and decreasing the value of
one basic variable from positive to zero. This is the basic concept of pivoting in the
simplex method to be studied in the next chapter. Geometrically, adjacent extreme points
of P are linked together by an edge of P, and pivoting leads one to move from one
extreme point to its adjacent neighbor along the edge direction. This can be clearly seen
in Figure 2.2.

2.6 RESOLUTION THEOREM FOR CONVEX POLYHEDRONS

Suppose the feasible domain P is bounded; in other words, P is a polytope. From
Figure 2.6 it is easy to observe that each point of P can be represented as a convex
combination of the finite number of extreme points of P.

Figure 2.6

This idea of "convex resolution" can be verified for a general polyhedral set with
the help of the following definition: An extremal direction of a polyhedral set P is a
nonzero vector d ∈ R^n such that for each x^0 ∈ P the ray {x ∈ R^n | x = x^0 + λd, λ ≥ 0}
is contained in P. Note that, in the convex analysis literature, it is usually called the
direction of recession.

From the definition of the feasible domain P, we see that a nonzero vector d ∈ R^n
is an extremal direction of P if and only if Ad = 0 and d ≥ 0. Also, P is unbounded if
and only if P has an extremal direction. Using extreme points and extremal directions,
every point in P can be well represented by the following theorem.

Theorem 2.2 (Resolution Theorem). Let V = {v^i ∈ R^n | i ∈ I} be the set of all
extreme points of P with a finite index set I. Then for each x ∈ P, we have

x = Σ_{i∈I} λ_i v^i + d    (2.12)

where Σ_{i∈I} λ_i = 1, λ_i ≥ 0 for i ∈ I, and d is either the zero vector or an extremal
direction of P.
A proof by mathematical induction on the number of positive components
of the given vector x ∈ P is included at the end of this chapter as an exercise.
A direct consequence of the resolution theorem confirms our observation made at
the beginning of this section, namely,

Corollary 2.2.1. If P is bounded (a polytope), then each point x ∈ P is a convex
combination of the extreme points of P.

Another direct implication is as follows.

Corollary 2.2.2. If P is nonempty, then it has at least one extreme point.

2.7 FUNDAMENTAL THEOREM OF LINEAR PROGRAMMING

The resolution theorem reveals one fundamental property of linear programming for
algorithm design.

Theorem 2.3 (Fundamental Theorem of Linear Programming). For a consistent
linear program in its standard form with a feasible domain P, the minimum
objective value of z = c^T x over P is either unbounded below or is achievable at at least
one extreme point of P.
Proof. Let V = {v^i ∈ P | i ∈ I} be the set of all extreme points of P with a finite
index set I. Since the problem is consistent, I is nonempty and there is at least one
v^1 ∈ V. By the resolution theorem, P either has an extremal direction d with c^T d < 0
or does not have such a direction.

In the first case, P is unbounded, and z goes to minus infinity at v^1 + λd as λ goes
to positive infinity.
For the latter, for each x ∈ P, either

x = Σ_{i∈I} λ_i v^i  with Σ_{i∈I} λ_i = 1, λ_i ≥ 0,  or

x = Σ_{i∈I} λ_i v^i + d  with Σ_{i∈I} λ_i = 1, λ_i ≥ 0, and c^T d ≥ 0

In both situations, assuming c^T v^min is the minimum among {c^T v^i | i ∈ I}, we have

c^T x ≥ Σ_{i∈I} λ_i (c^T v^i) ≥ c^T v^min (Σ_{i∈I} λ_i) = c^T v^min

Hence the minimum value of z is attained at the extreme point v^min.



It is important to point out that Theorem 2.3 does not rule out the possibility of
having an optimal solution at a nonextreme point. It simply says that among all the
optimal solutions to a given linear programming problem, at least one of them is an
extreme point.

2.8 CONCLUDING REMARKS: MOTIVATIONS OF DIFFERENT APPROACHES

The fundamental theorem of linear programming shows that one of the extreme points of
the feasible domain P is an optimal solution to a consistent linear programming problem
unless the problem is unbounded. This fundamental property has guided the design of
algorithms for linear programming.
One of the most intuitive ways of solving a linear programming problem is the
graphical method, as we discussed before. We draw a graph of the feasible domain P first.
Then at each extreme point v of P, using the negative cost vector −c as the normal vector,
we draw a hyperplane H. If P is contained in the halfspace H_L, then H is a desired
supporting hyperplane and v is an optimal solution to the given linear programming
problem. This method provides a clear picture, but it is limited to those problems
whose feasible domains can be drawn in three-dimensional or lower spaces.
Another straightforward method is the enumeration method, sketched in code below.
Since an extreme point corresponds to a basic feasible solution, it must be a basic solution.
We can generate all basic solutions by choosing m linearly independent columns from the
columns of the constraint matrix A and solving the corresponding system of linear equations.
Among all basic solutions, we identify the feasible ones and take the optimal one as our
solution. The deficiency of this method is the laborious computation involved; it becomes
impractical when the number C(n, m) becomes large.
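A minimal sketch of the enumeration method follows (ours, assuming numpy; the C(n, m) dense solves of an m × m system make the cost explicit). The data is the feasible domain of Exercise 2.7 with objective z = −x_1 − x_2.

```python
import numpy as np
from itertools import combinations

def solve_by_enumeration(A, b, c, tol=1e-9):
    # Brute force: min c'x s.t. Ax = b, x >= 0, by visiting every basic solution.
    m, n = A.shape
    best_x, best_z = None, np.inf
    for basis in combinations(range(n), m):
        B = A[:, basis]
        if abs(np.linalg.det(B)) < tol:            # not a basis
            continue
        x = np.zeros(n)
        x[list(basis)] = np.linalg.solve(B, b)
        if (x >= -tol).all() and c @ x < best_z:   # feasible and better
            best_x, best_z = x, c @ x
    return best_x, best_z

A = np.array([[1., 1., 1., 0., 0.],
              [2., 1., 0., 1., 0.],
              [1., 0., 0., 0., 1.]])
b = np.array([40., 60., 20.])
c = np.array([-1., -1., 0., 0., 0.])
print(solve_by_enumeration(A, b, c))   # x = (20, 20, 0, 0, 0), z = -40
```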
The rest of this book is devoted to designing efficient iterative algorithms for linear
programming. There are two basic approaches. One is the well-known simplex method,
the other is the newly developed interior-point approach. Focusing on finding an optimal
extreme point, the simplex approach starts with one extreme point, hops to a better
neighboring extreme point along the boundary, and finally stops at an optimal extreme
point. Because the method is well designed, rarely do we have to visit too many extreme
points before an optimal one is found. But, in the worst case, this method may still visit
all nonoptimal extreme points.
Unlike the simplex method, the interior-point method stays in the interior of P
and tries to position a current solution as the "center of universe" in finding a better
direction for the next move. By properly choosing step lengths, an optimal solution is
finally achieved after a number of iterations. This approach takes more effort, hence
more computational time, in finding a moving direction than the simplex method, but
better moving directions result in fewer iterations. Therefore the interior-point approach
has become a rival of the simplex method and gathered much attention.
Figure 2.7 shows the fundamental difference between these two approaches.

Figure 2.7 (the simplex method's path along extreme points x^1, x^2, ... versus the interior-point method's path through the interior toward x*)

REFERENCES FOR FURTHER READING

2.1. Bazaraa, M. S., Jarvis, J. J., and Sherali, H. D., Linear Programming and Network Flows (2d
ed.), John Wiley, New York (1990).
2.2. Chvatal, V., Linear Programming, Freeman, San Francisco (1983).
2.3. Dantzig, G. B., Linear Programming and Extensions, Princeton University Press, Princeton,
NJ (1963).
2.4. Gass, S. I., Linear Programming: Methods and Applications (2d ed.), McGraw-Hill, New
York (1964).
2.5. Gill, P. E., Murray, W., and Wright, M. H., Numerical Linear Algebra and Optimization,
Vol. 1, Addison-Wesley, Redwood City, CA (1991).
2.6. Goldfarb, D. and Todd, M. J., "Linear Programming," in Optimization, Handbook in Opera-
tions Research and Management Science, ed. Nemhauser, G. L., and Rinnooy Kan, A. H. G.,
Vol. 1, 73-170, Elsevier-North Holland, Amsterdam (1989).
2.7. Luenberger, D. G., Introduction to Linear and Nonlinear Programming, (2d ed.), Addison-
Wesley, Redwood City, CA (1973).
2.8. Peterson, E. L., An Introduction to Linear Optimization, lecture notes, North Carolina State
University, Raleigh, NC (1990).

EXERCISES

2.1. Prove that a linear program with a bounded feasible domain must be bounded, and give a
counterexample to show that the converse statement need not be true.
2.2. Let S be a subset of R^n. For each of the following assertions, either prove it or provide a
counterexample in R^2 to disprove it:
(a) If S is convex, then S is (i) affine; (ii) a cone; (iii) a polyhedron; (iv) a polytope.
(b) If S is affine, then S is (i) convex; (ii) a cone; (iii) a polyhedron; (iv) a polytope.
(c) If S is a cone, then S is (i) convex; (ii) affine; (iii) a polyhedron; (iv) a polytope.
(d) If S is a polyhedron, then S is (i) convex; (ii) affine; (iii) a cone; (iv) a polytope.

(e) If S is a polytope, then S is (i) convex; (ii) affine; (iii) a cone; (iv) a polyhedron.
2.3. Let H = {x ∈ R^n | a^T x = β} be a hyperplane. Show that H is affine and convex.
2.4. Suppose C_1, C_2, ..., C_p are p (> 0) convex subsets of R^n. Prove or disprove the following
assertions:
(a) ∩_{i=1}^p C_i is convex.
(b) ∪_{i=1}^p C_i is convex.

2.5. Use the results of Exercises 2.3 and 2.4 to show that P = {x ∈ R^n | Ax = b, x ≥ 0} is a
convex polyhedron.
2.6. To make the graphic method work, prove that the intersection set of the feasible domain
P and the supporting hyperplane whose normal is given by the negative cost vector -c
provides the optimal solutions to a given linear programming problem.
2.7. Let P = {(x_1, x_2) ∈ R^2 | x_1 + x_2 ≤ 40, 2x_1 + x_2 ≤ 60, x_1 ≤ 20, x_1, x_2 ≥ 0}. Do the
following:
(a) Draw the graph of P.
(b) Convert P to the standard equality form.
(c) Generate all basic solutions.
(d) Find all basic feasible solutions.
(e) For each basic feasible solution, point out its corresponding extreme points in the graph
of P.
(f) Which extreme points correspond to degenerate basic feasible solutions?
2.8. For P as defined in Exercise 2.7, use the graphic method to solve linear programming
problems with the following objective functions:
(a) z = −x_2;
(b) z = −x_1 − x_2;
(c) z = −2x_1 − x_2;
(d) z = −x_1;
(e) z = −x_1 + x_2.
What conclusion can be reached on the optimal solution set P*?
2.9. Show that the set of all optimal solutions to a linear programming problem is a convex
set. Now, can you construct a linear programming problem which has exactly two different
optimal solutions? Why?
2.10. Prove that for a degenerate basic feasible solution with p < m positive elements, its
corresponding extreme point of P may correspond to C(n − p, n − m) different basic feasible
solutions at the same time.
2.11. Let M be the 2 × 2 identity matrix. Show that
(a) M_c, the convex cone generated by M, is the first orthant of R^2.
(b) M_c is the smallest convex cone that contains the column vectors (1, 0)^T and
(0, 1)^T.
2.12. Given a nonempty set S ⊂ R^n, show that the set of all affine (convex, convex conical)
combinations of points in S is an affine (convex, convex conical) set which is identical to
the intersection of all affine (convex, convex conical) sets containing S.

2.13. To prove the resolution theorem by the induction method, we let p be the number of positive
components of x ∈ P. When p = 0, x = 0 is obviously an extreme point of P. Assume
that the theorem holds for p = 0, 1, ..., k, and x has k + 1 positive components.
If x is an extreme point, then there is nothing to prove. If x is not an extreme point, we
let x^T = (x̄^T | 0), where x̄^T = (x_1, ..., x_{k+1}) > 0, and A = [Ā | N]. Then Theorem 2.1
shows that the columns of Ā are linearly dependent; in other words, there exists a vector
w̄ ∈ R^{k+1}, w̄ ≠ 0, such that Āw̄ = 0. We define w^T = (w̄^T, 0) ∈ R^n; then w ≠ 0 and
Aw = Āw̄ = 0. There are three possibilities: w ≥ 0, w ≤ 0, and w has both positive and
negative components.
For w ≥ 0, consider x(ε) = x + εw and pick ε* to be the largest negative value of ε such
that x* = x(ε*) has at least one more zero component than x. Then follow the induction
hypothesis to show the theorem holds. Similarly, show that in the remaining two cases, the
theorem still holds.
2.14. For a linear programming problem with a nonempty feasible domain P = {x ∈ R^n | Ax =
b, x ≥ 0}, prove that every extreme point of P is a vertex of P and the converse statement
is also true.
3

The Revised Simplex Method

In Chapter 2 we have seen that if the optimal solution set of a linear programming
problem is nonempty, then it contains at least one extreme point of the polyhedral set
of the feasible domain. Thus an intuitive way to solve a linear programming problem
is to traverse from one extreme point to a neighboring extreme point in a systematic
fashion until we reach an optimal one. This is the basic idea of the simplex method and
its variants. However, in doing so, as in any other iterative scheme, we have to resolve
three important issues: (1) How do we start with an extreme point? (2) How do we move
from one extreme point to a better neighboring extreme point in an "efficient" way? (3)
When do we stop the process? This chapter addresses these issues for the simplex
method with an emphasis on the so-called revised simplex method, which provides a
computationally efficient implementation for linear programming.

3.1 ELEMENTS OF AN ITERATIVE SCHEME

The philosophy of solving an optimization problem via an iterative scheme is to start


with a "rough" solution and successively improve the current solution until a desired goal
is met. The simplex method, ellipsoid method, Karmarkar's projective scaling method,
and the affine scaling method to be studied are all in this category. Basically, an iterative
scheme consists of three major steps:

Step 1: Start from somewhere.


Step 2: Check if the goal is met.
Step 3: Move to a place closer to the goal.


The first step is to find a valid and yet convenient starting point. The choice of a
starting point may affect the overall efficiency of an iterative scheme. It varies widely
from one method to another. If a method is very sensitive to its starting point, it is cer-
tainly worth spending additional computational effort and time in finding a good starting
point. Otherwise, we should spend minimum effort on it. Sometimes mathematical
transformations are employed to transform a given problem into an equivalent form for
a quick admissible starting point. Once the transformed problem is solved, its solution
could then be used to obtain a solution to the original problem. In general, finding a
starting point is not an easy task; it may take as much as half of the total computational
effort. We shall study different starting mechanisms in later sections and chapters.
The second step of an iterative scheme is to check if we have reached our goal or
not. For an optimization problem, this means testing for optimality of a solution. This
test has to be carried out at each iteration for the current solution in hand. When the
result turns out to be positive, the iterative scheme is terminated. Otherwise, we go to
the third step for further improvement. The testing process usually requires a stopping
rule, or stopping criterion for an iterative scheme. Once again, a computationally simple
stopping rule is preferred for an efficient iterative scheme, since it is performed at each
iteration.
If the stopping rule is met, we have achieved our goal. Otherwise, we proceed
to make further improvement in getting closer to our goal. This is usually done by
moving from a current solution to a better one. To do so we need two elements: (1) a
good direction of movement, and (2) an appropriate step length along the good direction.
A good direction should point to a better result, and the step length describes how far
we should proceed along the direction. Needless to say, the efficiency of an iterative
method depends strongly on the mechanism of finding a good direction and appropriate
step-length. In general, the synthesis of the direction of movement and the associated step
length calculation constitute the bulk of computation for an iterative scheme. Therefore
special attention should be paid to this aspect to achieve speed and efficiency in practical
implementations.
Bearing these ideas in mind, we shall study the guiding principles of the simplex
method for solving linear programming problems. For computational efficiency, we
focus on the revised simplex method, which is a systematic procedure for implementing
the steps of the original simplex method in a smaller array.

3.2 BASICS OF THE SIMPLEX METHOD

Consider the following linear programming problem in its standard form:


Minimize z = c^T x    (3.1a)
subject to Ax = b; x ≥ 0    (3.1b)

where A is an m × n matrix with full row rank, b can always be adjusted to be an
m-dimensional nonnegative vector, and c, x ∈ R^n.

The simplex method was first conceived in the summer of 1947 by G. B. Dantzig.
Over the past four decades, although many variants of the simplex method have been
developed to improve its performance, the basic ideas have remained the same. We study
the basic ideas in this section.
Considering the fundamental theorem of linear programming, we know that if the
feasible domain P = {x ∈ R^n | Ax = b, x ≥ 0} is nonempty, then the minimum objective
value z = c^T x over P either is unbounded below or is attainable at an extreme point of P.
This motivates the simplex method to restrict its iterations to the extreme points of P
only. It starts with an extreme point of P, checks for optimality, and then moves to
another extreme point with improved objective value if the current extreme point is
not optimal. Owing to the correspondence between extreme points and basic feasible
solutions as described in Corollary 2.1.1, the simplex method can be described in terms of
basic feasible solutions in an iterative scheme:

Step 1: Find a basic feasible solution.


Step 2: Check if the current basic feasible solution is optimal. If it is optimal, stop.
Otherwise, go to next step.
Step 3: Move to a basic feasible solution with improved objective value, then
return to Step 2.

For Step 1, two commonly used mechanisms of finding a starting basic feasible
solution are the two-phase method and the big-M method. We shall introduce these two
mechanisms in Section 3.4. Once a starting point is obtained, in Step 2 it is checked
whether the current solution achieves the optimum. A stopping rule called nonnegative
reduced costs will be introduced in Section 3.3 for this purpose. If the objective cost
function can be further reduced, the stopping rule will be violated and the simplex method
proceeds to Step 3 to find an improved basic feasible solution. Under the assumption of
nondegeneracy, from Chapter 2, we know that each basic feasible solution has n - m
adjacent basic feasible solutions, which can be reached by moving along corresponding
edge directions from the current solution with appropriate step lengths. The simplex
method chooses an edge direction that leads to an adjacent basic feasible solution with
improved objective value. This is the so-called pivoting process, which will be discussed
in Section 3.3.

3.3 ALGEBRA OF THE SIMPLEX METHOD

In order to introduce the simplex method in algebraic terms, we standardize some
notation here. For a given basic feasible solution x*, we can always denote it by

x* = (x*_B, x*_N)^T

where the elements of the vector x*_B represent the basic variables and the elements of
the vector x*_N represent the nonbasic variables. Needless to say, x*_B ≥ 0 and x*_N = 0 for the
basic feasible solution. Corresponding to the basic variables x*_B and nonbasic variables
x*_N, we partition A and c as

A = [B | N]  and  c = (c_B, c_N)^T    (3.2)

where B is an m × m nonsingular matrix that is referred to as the basis and N is referred
to as the nonbasis, with dimensionality m × (n − m).
Once a basis B is known, every feasible solution x ∈ P can be rearranged in a
corresponding order as

x = (x_B, x_N)^T

with both x_B and x_N being nonnegative. Hence the linear programming problem defined
by (3.1) becomes

Minimize z = c_B^T x_B + c_N^T x_N    (3.3a)
subject to Bx_B + Nx_N = b; x_B ≥ 0; x_N ≥ 0    (3.3b)

The equation in (3.3b) implies that

x_B = B^{-1}b − B^{-1}Nx_N    (3.4)

Substituting (3.4) back into (3.3a) results in

z = c_B^T (B^{-1}b − B^{-1}Nx_N) + c_N^T x_N
  = c_B^T B^{-1}b + (c_N^T − c_B^T B^{-1}N) x_N
  = c_B^T B^{-1}b + r^T (x_B, x_N)^T    (3.5)

where

r^T = (0^T | c_N^T − c_B^T B^{-1}N)    (3.6)

Note that r is an n-dimensional column vector. Its first m components, corresponding
to the basic variables, are set to be zero, and the remaining n − m components
correspond to the nonbasic variables. Also note that the objective value z* at the current
basic feasible solution x* is c_B^T B^{-1}b, since x*_B = B^{-1}b and x*_N = 0 at this point.
Consequently, Equation (3.5) becomes

z = z* + r^T x  for each x ∈ P    (3.7)



Now it is apparent that if r ≥ 0, i.e., every component of c_N^T − c_B^T B^{-1}N (or,
equivalently, of (c_N − (B^{-1}N)^T c_B)^T) is nonnegative, then z − z* ≥ 0 for each feasible
solution x ∈ P, since

x = (x_B, x_N)^T ≥ 0

In this case the current basic feasible solution x* is an optimal solution. On the other
hand, if any component of r is negative, its corresponding element of x_N may be increased
from zero to some positive value (or, equivalently, a nonbasic variable is brought into the
basis) to gain a reduction in the objective value z. Hence the vector r is named the reduced
cost vector, which consists of reduced costs. Summarizing the previous discussions, we have
derived the following result:

Theorem 3.1. If

x* = (x*_B, x*_N)^T = (B^{-1}b, 0)^T ≥ 0

is a basic feasible solution with a nonnegative reduced cost vector r, given by Equation
(3.6), then x* is an optimal solution to the linear programming problem (3.1).

Moreover, we have developed a stopping rule based on the appearance of nonneg-


ative reduced costs.

3.3.1 Stopping the Simplex Method-Checking for Optimality

Let x* be a current basic feasible solution with B being its corresponding basis, N the
nonbasis, B the index set of basic variables in x*, and N the index set of nonbasic
variables. Moreover, for each nonbasic variable x_q (q ∈ N), let c_q be the cost coefficient
associated with it and N_q the column in N that corresponds to x_q. Then Theorem 3.1
says that if

r_q = c_q − c_B^T B^{-1}N_q ≥ 0  for each q ∈ N    (3.8)

then we can terminate the simplex method with an optimal solution x*. Otherwise, we
have to move to another basic feasible solution for some potential improvement in the
objective value.

Note that N_q = A_q for each q in N, since they represent the same columns.
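The stopping rule translates into a few lines of linear algebra. The sketch below (ours, assuming numpy and 0-based column indices) evaluates r_q = c_q − c_B^T B^{-1}A_q for every nonbasic index by one solve with B^T; the basis is optimal exactly when no value is negative.

```python
import numpy as np

def reduced_costs(A, c, basic):
    # Optimality check of (3.8): r_q = c_q - c_B' B^{-1} A_q for each nonbasic q.
    B = A[:, basic]
    w = np.linalg.solve(B.T, c[basic])       # simplex multipliers, B'w = c_B
    return {q: c[q] - w @ A[:, q]
            for q in range(A.shape[1]) if q not in basic}

# Data of Example 3.4 later in this chapter, with starting basis {x3, x4}
A = np.array([[1., 1., 1., 0.],
              [2., 1., 0., 1.]])
c = np.array([-3., -2., 0., 0.])
print(reduced_costs(A, c, basic=[2, 3]))     # {0: -3.0, 1: -2.0}: not optimal yet
```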

3.3.2 Iterations of the Simplex Method-Moving for Improvement

After taking care of Step 2, we now focus on the process of moving to a basic feasible
solution with improved objective value. The process includes finding a good moving
direction (direction of translation) and an appropriate step length.

Direction of Translation. A direction of translation is a vector d ∈ R^n along
which we propose to translate our current basic feasible solution. Since the idea of the
simplex method is to hop from a current extreme point to an adjacent extreme point of
P, we consider only those directions which point from a current extreme point to its
adjacent neighbors. In other words, such a direction must be along an edge of P. Hence
they are edge directions.
Consider the correspondence relation between the extreme points and the basic
feasible solutions of P. We see that, under the assumption of nondegeneracy, each
basic feasible solution (extreme point) of P has exactly n − m (the number of nonbasic
variables) adjacent neighbors in P. An adjacent basic feasible solution is obtained by
introducing a nonbasic variable (increasing its value from zero to positive) to replace one
basic variable (reducing its value from positive to zero). The interrelationship is described
by Equation (3.4). To be more specific, if a nonbasic variable x_q is being considered,
the remaining nonbasic variables are kept at zero value and Equation (3.4) becomes

x_B = B^{-1}(b − x_q A_q)    (3.9)

where A_q is the column corresponding to x_q in A. Hence we know that the edge direction
corresponding to increasing x_q is given by

d^q = (−B^{-1}A_q, e_q)^T  for q ∈ N    (3.10)

where e_q is an (n − m)-dimensional vector with 1 at the position corresponding to x_q and
0 corresponding to the other nonbasic variables. Note that d^q ∈ R^n, and moving along this
direction will increase x_q, keep the other nonbasic variables at zero, and change the values
of the current basic variables according to Equation (3.9). Also note that, since A_q and N_q
represent the same column of matrix A,

Ad^q = [B | N] (−B^{-1}A_q, e_q)^T = −A_q + N_q = 0    (3.11)

Therefore, under the assumption of nondegeneracy, the edge direction d^q is a feasible
direction, because for the current basic feasible solution x with a sufficiently small scalar
α > 0,

A(x + αd^q) = Ax + αAd^q = Ax = b    (3.12a)

and

x + αd^q ≥ 0    (3.12b)

However, for a degenerate basic feasible solution, since the value of some basic variables
is zero, (3.12b) could be violated for any positive α. In this case, we have an infeasible
edge direction. Any amount of translation along an infeasible direction causes infeasibility,
which mandates the step length α to be zero. This happens because a degenerate
basic feasible solution is overdetermined by more than n hyperplanes passing through
it, and some edge directions lead to the region outside of P. Figure 3.1 illustrates this
situation. In the figure, x is the current basic feasible solution, which is the intersection

Figure 3.1

of three lines in a two-dimensional plane. Hence it is overdetermined. It is clear to see
that d^1 is a feasible edge direction, but d^2 is not. More details of degeneracy will be
discussed in Section 3.5.
Now, for the current basic feasible solution x*, suppose that d^q given by (3.10) is a
feasible edge direction; our task is to determine if it is a good direction of translation,
i.e., a direction which leads to an improvement in the objective value. This means we
expect

c^T (x* + αd^q) < c^T x*  for α > 0    (3.13)

Consequently, we require

c^T d^q = [c_B^T | c_N^T] (−B^{-1}A_q, e_q)^T = c_q − c_B^T B^{-1}A_q < 0    (3.14)

Note again that A_q and N_q are the same column vector, and Equation (3.14) actually
requires the reduced cost r_q < 0 to assure that the corresponding edge direction is a good
direction of translation.

Summarizing our findings, we have the following theorem.
Summarizing our findings, we have the following theorem.

Theorem 3.2. Let

x* = (B^{-1}b, 0)^T

be a basic feasible solution to the linear programming problem defined by (3.1) with basis
B. If the reduced cost r_q < 0 for some nonbasic variable x_q, then the edge direction d^q
given by (3.10) leads to an improvement in the objective value.

Note that when x* is nondegenerate, each edge direction is a feasible direction;
therefore a positive step length α can be chosen to translate the current basic feasible
solution along the direction d^q to a distinct adjacent neighbor with improved objective
value. However, when x* is degenerate, d^q may become an infeasible edge direction that
forces a step length α = 0. In this case, no actual translation happens, and we stay at the
same extreme point with two different representations in terms of basic variables.
Also note that for a feasible edge direction d^q with r_q < 0, if d^q ≥ 0, then x* + αd^q
is always feasible as long as α > 0. Therefore, as α approaches infinity, the given linear
programming problem becomes unbounded below, and we have the following result.

Theorem 3.3. Let x* be a basic feasible solution to the linear programming
problem defined by (3.1). If there is a feasible edge direction d^q ≥ 0 with a reduced
cost r_q < 0 for some nonbasic variable x_q, then the linear programming problem is
unbounded below.
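Numerically, the unboundedness test of Theorem 3.3 falls out of building the direction itself. The sketch below (ours, assuming numpy and 0-based indices) assembles d^q of (3.10) for a chosen entering index q and checks its sign.

```python
import numpy as np

def edge_direction(A, basic, q):
    # Edge direction (3.10) as a full n-vector: the basic positions carry
    # -B^{-1} A_q, position q carries 1, the other nonbasic positions carry 0.
    d = np.zeros(A.shape[1])
    d[basic] = np.linalg.solve(A[:, basic], -A[:, q])
    d[q] = 1.0
    return d

A = np.array([[1., 1., 1., 0.],
              [2., 1., 0., 1.]])
d = edge_direction(A, basic=[2, 3], q=0)   # bring x1 into the basis
print(d)                                   # [ 1.  0. -1. -2.]
if (d >= 0).all():                         # Theorem 3.3
    print("unbounded below along d^q")
```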
For a current basic feasible solution, it is possible to have more than one nonbasic
variable with negative reduced cost. Among the corresponding good edge directions, in
theory we can choose an arbitrary one as the direction of translation. Different rules for
selecting a nonbasic variable to enter the basis result in variants of the simplex method.
The two most commonly used rules are the smallest index rule, which picks the smallest
index q for which r_q < 0, and the largest reduction rule, which picks the index q with
the most negative value of r_q. Although the second rule looks better in cost reduction
as far as the current iteration is concerned, there is no evidence showing that it is an
overall better choice.

Step Length. Once a good edge direction d^q given by (3.10) is selected as the
direction of translation at the current basic feasible solution x*, we have to determine an
appropriate step length α ≥ 0 such that x* + αd^q becomes a new basic feasible solution
by bringing in the nonbasic variable x_q and dropping a basic variable to form a new
basis. By Theorem 3.3, we know that if d^q ≥ 0 is a feasible direction, then the given
linear programming problem is unbounded below. In case d^q has negative components,
since Ad^q = 0 has been verified before, in order to keep x* + αd^q ≥ 0, we need to
choose α according to the following formula:

α = min_{j∈B} { −x*_j / d_j | d_j < 0 }    (3.15)

where x*_j is the jth element of x*, B is the index set of basic variables, and d_j is the
component in d^q corresponding to the basic variable x*_j.
This formula is referred to in the literature as the minimum ratio test. It determines
which basic variable will become nonbasic (with zero value) as the nonbasic variable x_q
is introduced into the new basis. It is not difficult to show that under the assumption of
nondegeneracy, there is a unique basic variable x_p (p ∈ B) which provides a positive
step length α leading to a distinct basic feasible solution. Also note that at a degenerate
point, the step length obtained by (3.15) may become zero. In this case, although in
theory we have changed our basis, we actually stay at the same extreme point of P.
This process of changing basis is sometimes called the pivoting process. By pivot-in we
mean a nonbasic variable entering the new basis, and by pivot-out a basic variable leaving
the current basis.
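The pivot-out choice is just a guarded minimum. A sketch (ours, assuming numpy and the same 0-based indexing as the earlier sketches) that returns both the step length and the leaving variable, with α = 0 signalling a degenerate pivot:

```python
import numpy as np

def min_ratio_test(x, d, basic):
    # Minimum ratio test (3.15): alpha = min over basic j with d_j < 0
    # of -x_j / d_j. Returns (alpha, leaving index);
    # (None, None) means no blocking variable (unbounded case).
    alpha, leave = None, None
    for j in basic:
        if d[j] < 0:
            ratio = -x[j] / d[j]
            if alpha is None or ratio < alpha:
                alpha, leave = ratio, j
    return alpha, leave

x = np.array([0., 0., 40., 60.])           # starting BFS of Example 3.4
d = np.array([1., 0., -1., -2.])           # edge direction for entering x1
print(min_ratio_test(x, d, basic=[2, 3]))  # (30.0, 3): alpha = 30, x4 leaves
```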

Just like the pivot-in process, there may be several basic variables achieving the
minimum ratio at the same time. Among these candidates for pivot-out, different variants
of the simplex method pick different candidates. But no evidence supports a particular
variant of the simplex method for all cases.
The following theorem summarizes the discussions in this subsection:

Theorem 3.4. Let

x* = (B^{-1}b, 0)^T

be a basic feasible solution to the linear programming problem defined by (3.1) with
basis B. If a reduced cost r_q < 0 is found for some nonbasic variable x_q, then the edge
direction d^q given by (3.10) together with the step length α determined by (3.15) lead to
another basic feasible solution whose objective value is no worse than that of the current
one.
Note that if d^q is a feasible direction and α > 0, then the adjacent basic feasible
solution obtained in Theorem 3.4 represents a distinct extreme point with improved
objective value. However, when α = 0, we stay at the same extreme point with the
same objective value. Moreover, based on Theorems 3.1-3.4, we can sketch the key
steps of the simplex algorithm as follows:

Step 1: Find a basic feasible solution x with a basis matrix B and nonbasis
matrix N.
Step 2: Compute the reduced cost r_q for each nonbasic variable x_q according to
Equation (3.8). If r_q ≥ 0 for each nonbasic variable, then stop. The current basic
feasible solution is optimal. Otherwise, go to Step 3.
Step 3: Compute the direction of movement d^q with r_q < 0 according to (3.10). If
d^q ≥ 0, then the linear programming problem is unbounded below. Otherwise, compute
the step length α according to (3.15), update the current basic feasible solution by
x ← x + αd^q, and update the corresponding basis matrix. Go to Step 2.

The following example illustrates this procedure.

Example 3.1
Minimize −x_1 − x_2 − x_3

subject to 2x_1 + x_4 = 1
2x_2 + x_5 = 1
2x_3 + x_6 = 1
x_1, ..., x_6 ≥ 0

Note that

b = (1, 1, 1)^T

and

c = (−1, −1, −1, 0, 0, 0)^T
Step 1: Let us pick x_4, x_5, and x_6 as basic variables; then

(x_B, x_N)^T = (x_4, x_5, x_6 | x_1, x_2, x_3)^T = (1, 1, 1 | 0, 0, 0)^T

is a basic feasible solution. If we consider x_4, x_5, and x_6 as slack variables and
draw a three-dimensional graph based on the coordinates of x_1, x_2, x_3, this solution
corresponds to the vertex (0, 0, 0) of the polyhedron P = {x ∈ R^3 | 0 ≤ x_i ≤
1/2, i = 1, 2, 3}.

In this case,

B = [1 0 0]    B = {4, 5, 6}    N = {1, 2, 3}
    [0 1 0]
    [0 0 1]

c_B = (0, 0, 0)^T, and c_N = (−1, −1, −1)^T


Step 2: Following Equation (3.8), the reduced cost vector r = (0, 0, 0, −1, −1, −1)^T.
Since r_1 = r_2 = r_3 = −1 < 0, the current solution is not optimal,
and we go to Step 3.

Step 3: Let us pick a nonbasic variable with negative reduced cost, say q = 1. This
means x_1 is entering the basis. According to (3.10), d^1 = (−2, 0, 0 | 1, 0, 0)^T,
and we use (3.15) to determine the step length. The result shows that x_4 is leaving
the basis and α = 1/2. Hence the new basic feasible solution is given by

(x_B, x_N)^T + αd^1 = (1, 1, 1 | 0, 0, 0)^T + (1/2)(−2, 0, 0 | 1, 0, 0)^T
                    = (0, 1, 1 | 1/2, 0, 0)^T

Note that the new basic variables are x_5, x_6, x_1 and the nonbasic variables are x_2, x_3, x_4.
Moreover,

x_B = (x_5, x_6, x_1)^T = (1, 1, 1/2)^T
x_N = (x_2, x_3, x_4)^T = (0, 0, 0)^T

the new basis matrix

B = [A_5 A_6 A_1] = [0 0 2]
                    [1 0 0]
                    [0 1 0]

and the corresponding nonbasis

N = [A_2 A_3 A_4] = [0 0 1]
                    [2 0 0]
                    [0 2 0]
By now we have completed one simplex iteration. It is easy to verify that the
new solution corresponds to the vertex (1/2, 0, 0) with a reduced objective value
−1/2. If we go back to Step 2 for two more iterations, we can reach an optimal
solution

(x_1, x_2, x_3, x_4, x_5, x_6)^T = (1/2, 1/2, 1/2, 0, 0, 0)^T

with an objective value −3/2.
As we discussed before, under the assumption of nondegeneracy, at each
iteration the step length α in Step 3 is always positive. This leads the simplex
method to a distinct extreme point with lower objective value after each iteration.
Hence the simplex method is not going to revisit any extreme point that has been
visited before. Since the total number of extreme points of the feasible domain P
is finite, we have the following theorem:

Theorem 3.5. Under the assumption of nondegeneracy, the revised simplex
method terminates in a finite number of iterations.

In the presence of degeneracy, the simplex method may be trapped into an endless
loop without termination. This phenomenon is called cycling, and we shall study it in
Section 3.5.

3.4 STARTING THE SIMPLEX METHOD

For some linear programming problems, it is easy to find a starting basic feasible solution.
But this task can be as difficult as finding an optimal solution from a given basic feasible
solution. In this section we introduce two commonly used starting mechanisms.

3.4.1 Two-Phase Method

Consider the linear programming problem defined by (3.1). Without loss of generality,
we can assume b ≥ 0. Remember that we have n variables and m constraints. We let
x^a = (x^a_1, x^a_2, ..., x^a_m)^T ∈ R^m be an m-dimensional vector of artificial variables, and

consider an associated Phase I problem:

Minimize z = Σ_{i=1}^m x^a_i    (3.16a)
subject to Ax + x^a = b; x ≥ 0; x^a ≥ 0    (3.16b)


Note that the Phase I problem defined by (3.16) has n + m variables and m
constraints. Since b ≥ 0 is assumed,

(x, x^a)^T = (0, b)^T

is a basic feasible solution to the Phase I problem. Also note that the Phase I problem is
always bounded below by 0, since x^a ≥ 0 is required. Therefore, applying the simplex
method (with a cycling prevention mechanism to be discussed later) to a Phase I problem
always results in an optimal solution

(x^0, x^{a*})^T

There are two possible cases:
There are two possible cases:

Case 1 xM =f. 0. If ~* =1= 0, then the original problem is infeasible, since if the
original problem has a feasible solution x, then

[~]
is feasible to the Phase I problem with zero objective value. This violates the optimality
of the solution

[;0*]
Case 2: x^{a*} = 0. In this case, if the current basis does not contain any artificial
variable in it, then x^0 forms a starting basic feasible solution to the original problem. In
particular, if it is nondegenerate, x^0 has exactly m positive elements in it to form the
basis. On the other hand, if it is degenerate with at least one artificial variable remaining
in the basis, say x^a_i = 0 is the kth basic variable in the current basis, then we denote by e_k
the m-dimensional vector with its kth element being equal to 1 and the rest being
equal to 0, and consider the value e_k^T B^{-1}A_q for each nonbasic variable x_q associated with
the current optimal solution.

There are two possibilities:

1. If e_k^T B^{-1}A_q ≠ 0 for a nonbasic variable x_q, then we can bring x_q = 0 into the current
basis as a basic variable to replace x^a_i. In this case, the optimal solution to the
Phase I problem provides a starting basis without any artificial variable in it for
the original linear programming problem.

2. If e_k^T B^{-1}A_q = 0 for every nonbasic variable x_q, then we know the kth row of the
constraint set Ax = b is redundant. In this case we can remove that redundant row
from the original constraints and restart the Phase I problem.

Validation of these two cases is left to the reader as an exercise.
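Constructing the Phase I problem is purely mechanical. The sketch below (ours, assuming numpy) appends the artificial identity columns and the all-ones cost; the resulting standard-form data can be fed to any simplex routine, such as the one sketched in Section 3.7.

```python
import numpy as np

def phase_one_problem(A, b):
    # Phase I data of (3.16): minimize the sum of artificials subject to
    # Ax + x^a = b, x >= 0, x^a >= 0 (b is assumed nonnegative).
    m, n = A.shape
    A1 = np.hstack([A, np.eye(m)])                # artificial columns = identity
    c1 = np.concatenate([np.zeros(n), np.ones(m)])
    basic = list(range(n, n + m))                 # start from (x, x^a) = (0, b)
    return A1, b, c1, basic

A = np.array([[1., 1., 1., 0.],
              [2., 1., 0., 1.]])
b = np.array([40., 60.])
A1, b1, c1, basic = phase_one_problem(A, b)
print(A1.shape, c1, basic)                        # (2, 6) [0. 0. 0. 0. 1. 1.] [4, 5]
```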

3.4.2 Big-M Method

Unlike the two-phase method, the big-M method imposes a large penalty M > 0 on
each artificial variable and solves the following linear programming problem:

Minimize z = Σ_{j=1}^n c_j x_j + Σ_{i=1}^m M x^a_i    (3.17a)
subject to Ax + x^a = b; x ≥ 0, x^a ≥ 0    (3.17b)


Note that

(x, x^a)^T = (0, b)^T

is a starting basic feasible solution, and M can be thought of as the penalty to be paid
for x^a ≠ 0. In theory, when M is chosen to be large enough, the artificial variables will
not appear in the final solution. In reality, we may raise a fundamental issue, namely,
how big should M be? This is a very important issue in implementation. Consider the
following simple example:
Example 3.2

Minimize x_1
subject to εx_1 − x_2 − x_3 = ε  (ε > 0)
x_1, x_2, x_3 ≥ 0

The associated big-M problem can be stated as follows:

Minimize x_1 + Mx_4
subject to εx_1 − x_2 − x_3 + x_4 = ε
x_1, x_2, x_3, x_4 ≥ 0

It is clear that x̄^T = (1, 0, 0, 0) and x̃^T = (0, 0, 0, ε) are two basic feasible solutions
to the big-M problem with objective values of 1 and εM, respectively. Since x̄ corresponds
to a basic feasible solution to the original problem but x̃ does not, we have to make sure
that 1 < εM, or M > 1/ε, for any given ε > 0. Consequently we see the difficulty of
choosing M for a general implementation.
When the simplex method is applied to solve the big-M problem (3.17) with
sufficiently large M > 0, since the problem is feasible, we either arrive at an optimal
solution to the big-M problem or conclude that the big-M problem is unbounded below.
But what can we say about the original problem in each case?

Case 1: The big-M problem has a finite optimum at (x*, x^{a*}). In this case, if
x^{a*} = 0, then x* is an optimal solution to the original linear programming problem. The
reason is that, for each feasible solution x̄ of the original linear program,

(x̄, 0)^T

is a feasible solution to the big-M problem. Hence we know

c^T x̄ = c^T x̄ + M·0 ≥ c^T x* + M Σ_{i=1}^m x^{a*}_i = c^T x*

On the other hand, if x^{a*} ≠ 0, then we can conclude that the original linear programming
problem has no feasible solution. To show this case, we assume that the original problem
has a feasible solution x̄. Then

(x̄, 0)^T

is a feasible solution to the big-M problem and

c^T x̄ = c^T x̄ + M·0 ≥ c^T x* + M Σ_{i=1}^m x^{a*}_i

But this inequality is impossible, since M is sufficiently large and at least one x^{a*}_i > 0.
Therefore x̄ could not be a feasible solution to the original problem.

Case 2: The big-M problem is unbounded below. Similar to Case 1, we can
see that if all artificial variables are equal to zero, then the original problem is also
unbounded below. Otherwise, if at least one artificial variable is positive, then the
original problem is infeasible.
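For completeness, the big-M construction differs from the Phase I setup only in the cost vector; a sketch (ours, assuming numpy, with M supplied by the caller since, as Example 3.2 warns, it must dominate the problem data):

```python
import numpy as np

def big_M_problem(A, b, c, M=1e6):
    # Big-M data of (3.17): minimize c'x + M * sum(x^a) subject to
    # Ax + x^a = b, x >= 0, x^a >= 0 (b assumed nonnegative).
    m, n = A.shape
    A1 = np.hstack([A, np.eye(m)])       # artificial columns = identity
    c1 = np.concatenate([c, M * np.ones(m)])
    basic = list(range(n, n + m))        # (x, x^a) = (0, b) is the starting BFS
    return A1, b, c1, basic
```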

3.5 DEGENERACY AND CYCLING

We have seen that the step length α can assume the value zero if one or more of the basic
variables involved in the minimum ratio test turn out to be zero. In this case, the current
basic feasible solution is degenerate, and the new basic feasible solution stays at the
same extreme point although the new basis is different. In other words, although it
appears that geometrically we are stagnant at an extreme point of the feasible domain P,
algebraically we are not. Thus technically we can proceed with the simplex iterations
even if α = 0, but the real danger is that at some point we might return to the old basis.
This phenomenon is called cycling. Note that for a degenerate basic feasible solution x
with p (< m) positive components, we may have up to

C(n − p, n − m) = (n − p)! / ((n − m)!(m − p)!)
different bases corresponding to the same extreme point x. The following example, given
by E. M. L. Beale in 1955, shows that the simplex method could be trapped in a cycle
if the largest reduction rule is used for entering the basis and the minimum ratio
test with the smallest index rule as tie-breaker is used for leaving the basis.
Example 3.3
Minimize −(3/4)x_4 + 20x_5 − (1/2)x_6 + 6x_7

subject to x_1 + (1/4)x_4 − 8x_5 − x_6 + 9x_7 = 0
x_2 + (1/2)x_4 − 12x_5 − (1/2)x_6 + 3x_7 = 0
x_3 + x_6 = 1
x_1, ..., x_7 ≥ 0

In the exercises, it can be verified that the optimal solution is given by x_1 =
3/4, x_2 = x_3 = 0, x_4 = 1, x_5 = 0, x_6 = 1, x_7 = 0 with an optimal objective value −5/4.
However, if we start with the basis {x_1, x_2, x_3} and follow the previously mentioned rules
for pivoting, then the successive new bases are {x_4, x_2, x_3}, {x_4, x_5, x_3}, {x_6, x_5, x_3},
{x_6, x_7, x_3}, {x_1, x_7, x_3}, and we return to {x_1, x_2, x_3}. If the same sequence of pivots is
repeated again and again, the simplex method will cycle forever among these bases
without reaching the optimal solution.
Another interesting point is that a degenerate basic feasible solution can be optimal
even if some of the reduced costs are negative, since the corresponding edge directions
may be infeasible. This phenomenon is caused by the overdetermined system associated
with a degenerate basic feasible solution.
To be more precise, a nondegenerate basic feasible solution x must lie in the
intersection of n linearly independent hyperplanes defined by the system

Mx = (b, 0)^T    (3.18)

where

x = (x_B, x_N)^T  and  M = [B N]
                           [0 I]    (3.19)

As a matter of fact, since B is nonsingular, M is also nonsingular and x is uniquely
determined by

x = M^{-1}(b, 0)^T    (3.20)

Nevertheless, a degenerate basic feasible solution x ∈ P satisfies not only
the n equations in (3.18), but also at least one more linear equation

x_p = 0

for some basic variable x_p. Hence it is overdetermined by more than n linear equations.
Therefore, some edge directions lead to infeasibility, as shown in Figure 3.1.

Note that the matrix M is called the fundamental matrix, and each edge direction
d^q is a column of M^{-1} which corresponds to a nonbasic variable x_q.

3.6 PREVENTING CYCLING

Having looked at the trap of cycling, we need some means to prevent it from happening.
As we have seen, cycling can occur only when degeneracy is encountered. Since it is
intuitively clear that degenerate basic feasible solutions can be eliminated by slightly
perturbing the constraint parameters (resulting in only a slightly perturbed optimal value
and optimal solutions), it should not be surprising that cycling can be prevented. Among
several methods available for the prevention of cycling, the two most commonly used are
the lexicographic rule proposed by G. B. Dantzig, A. Orden, and P. Wolfe in 1955, and
Bland's rule due to R. G. Bland in 1977.
Observe that in the absence of degeneracy the objective values at each iteration
of the simplex method form a strictly decreasing monotone sequence that guarantees no
basis will be repeated. When degeneracy is involved, the sequence is no longer strictly
decreasing. To prevent revisiting the same basis, we need to incorporate another index
to keep some strictly monotone property for cycling prevention.

3.6.1 Lexicographic Rule

Basically, the lexicographic rule is used to select a leaving variable from the current
basis. It ensures no cycling by the fact that, while the objective value c_B^T B^{-1}b may
remain constant in the presence of degeneracy, the vector [c_B^T B^{-1}b | c_B^T B^{-1}] can be kept
lexicographically monotonically decreasing.

In this rule, we first use the minimum ratio test (3.15) to decide the pivot-out candidates.
If the test generates a unique index, then the corresponding variable leaves the basis. In
case there is a tie among several indices, we restrict ourselves to these indices and conduct
another minimum ratio test with the value of x*_j being replaced by its corresponding
element in the vector B^{-1}A_{p_1}, where A_{p_1} is the column in A corresponding to the
basic variable x_{p_1} with the smallest index. If the tie is still unbroken, we conduct further
minimum ratio tests on those still-tied indices by using B^{-1}A_{p_2}, where A_{p_2} is the column
in A corresponding to the basic variable x_{p_2} with the second smallest index, and so forth.
In the exercises, we show that when or before all m columns of the basic variables are
used, the tie must be broken, and the unique index leads to a lexicographically
monotonically decreasing sequence of [c_B^T B^{-1}b | c_B^T B^{-1}]^T.

3.6.2 Bland's Rule

Bland's rule is very simple. It specifies the choice of both the entering and leaving
variables. In this rule, variables are first ordered in sequence, then

1. Among all nonbasic variables with negative reduced costs, choose the one with the
smallest index to enter the basis.
2. When there is a tie in the minimum ratio test, choose the basic variable with the
smallest index to leave the basis.

Bland's rule actually creates the following monotone property (to be proved in an
exercise): if a variable x_q enters the basis, then it cannot leave the basis until some other
variable with a larger index, which was nonbasic when x_q entered, also enters the basis.
This monotone property prevents cycling, because in a cycle any variable that enters the
basis must also leave the basis, which implies that there is some largest-indexed variable
that enters and leaves the basis. This certainly contradicts the monotone property.
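Both selections in Bland's rule are plain index comparisons, so the rule adds essentially no cost per iteration. A sketch (ours, taking the reduced costs and the ratio-test values as ordinary Python dictionaries keyed by variable index):

```python
def bland_entering(reduced_costs):
    # Rule 1: among nonbasic variables with r_q < 0, enter the smallest index.
    # Returns None when no reduced cost is negative (current basis is optimal).
    candidates = [q for q, r in reduced_costs.items() if r < 0]
    return min(candidates) if candidates else None

def bland_leaving(ratios):
    # Rule 2: among basic variables tied at the minimum ratio, leave the
    # smallest index.
    alpha = min(ratios.values())
    return min(j for j, t in ratios.items() if t == alpha)

print(bland_entering({3: -0.75, 4: 20.0, 5: -0.5, 6: 6.0}))  # -> 3
print(bland_leaving({0: 0.0, 1: 0.0, 2: 10.0}))              # tie at 0 -> 0
```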

3.7 THE REVISED SIMPLEX METHOD

In this section we introduce the revised simplex method, which is a computationally
efficient implementation of the simplex approach. It does not need the simplex tableau
used in the original simplex method, but computes all pertinent information in a
systematic and space-saving manner.
Consider the sketched simplex iterative scheme. From the implementation point of
view, the most laborious computation is spent on calculating B^{-1} at each iteration. Once
this inverse matrix is known, other computational work becomes simple. However, a
straightforward implementation of inverting an m × m matrix may require O(m^3)
elementary operations. Moreover, it is undesirable to compute the inverse matrix explicitly
because of numerical-stability and error-propagation problems (due to round-off and
truncation errors resulting from the finite word length of computers). In the revised simplex
method, instead of inverting the basis matrix B directly, we get around the problem by
solving an equivalent system of simultaneous equations.
A step-by-step computation procedure of the revised simplex algorithm can be
described as follows:
Let

x = (x_B, x_N)^T

be a current basic feasible solution (which may be obtained by the two-phase method), A_{j_i}
be the j_i-th column of A, B = [A_{j_1} A_{j_2} A_{j_3} ... A_{j_m}] be the basis, and B = {j_1, j_2, j_3, ..., j_m}
be the index set of the basic variables.

Step 1: Compute the "simplex multipliers" w by solving the system

B^T w = c_B    (3.21)

where c_B is the cost vector corresponding to the basic variables, i.e., c_B = (c_{j_1}, c_{j_2}, c_{j_3},
..., c_{j_m})^T.

Step 2: Compute the reduced costs

r_j = c_j − w^T A_j,  ∀ j ∉ B    (3.22)

Step 3 (check for optimality): If r_j ≥ 0 ∀ j ∉ B, then STOP. The current
solution is OPTIMAL.

Step 4 (enter the basis): Choose q ∉ B such that r_q < 0.

Step 5 (edge direction): Compute d by solving the system

Bd = −A_q    (3.23)

and set d^q = (d, e_q)^T.

Step 6 (check for unboundedness): If d ≥ 0, then STOP. The problem is unbounded
below.

Step 7 (leave the basis and step length): Find an index j_p and step length α
according to

α = −x_{j_p}/d_{j_p} = min_{1≤i≤m} { −x_{j_i}/d_{j_i} | d_{j_i} < 0 }    (3.24)

Step 8 (update): Set

x_q ← α    (3.25)
x_{j_i} ← x_{j_i} + αd_{j_i}  for i = 1, ..., m    (3.26)
B ← B + [A_q − A_{j_p}]e_p^T    (3.27)
B ← B ∪ {q} \ {j_p}    (3.28)

Go to Step 1.
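Putting Steps 1-8 together gives a compact loop. The sketch below (ours, assuming numpy and a known starting basis; real codes keep a factorization of B instead of re-solving dense systems, and we use Bland's rule from Section 3.6.2 for both pivot choices) follows the eight steps literally:

```python
import numpy as np

def revised_simplex(A, b, c, basic, tol=1e-9):
    # Revised simplex sketch: min c'x s.t. Ax = b, x >= 0, starting from the
    # basic feasible solution described by the index list `basic`.
    m, n = A.shape
    basic = list(basic)
    while True:
        B = A[:, basic]
        x = np.zeros(n)
        x[basic] = np.linalg.solve(B, b)
        w = np.linalg.solve(B.T, c[basic])             # Step 1: B'w = c_B
        r = {j: c[j] - w @ A[:, j]
             for j in range(n) if j not in basic}      # Step 2: reduced costs
        negative = [j for j, rj in r.items() if rj < -tol]
        if not negative:                               # Step 3: optimal
            return x, "optimal"
        q = min(negative)                              # Step 4: Bland's entering rule
        d = np.linalg.solve(B, -A[:, q])               # Step 5: Bd = -A_q
        if (d >= -tol).all():                          # Step 6: unbounded below
            return None, "unbounded"
        ratios = [(-x[basic[i]] / d[i], basic[i], i)   # Step 7: min ratio test;
                  for i in range(m) if d[i] < -tol]    # ties -> smallest index
        alpha, _, p = min(ratios)
        basic[p] = q                                   # Step 8: pivot; x and w
                                                       # are recomputed next pass

# Example 3.4: min -3x1 - 2x2 with x1+x2+x3 = 40, 2x1+x2+x4 = 60, x >= 0
A = np.array([[1., 1., 1., 0.],
              [2., 1., 0., 1.]])
b = np.array([40., 60.])
c = np.array([-3., -2., 0., 0.])
print(revised_simplex(A, b, c, basic=[2, 3]))  # x = (20, 20, 0, 0): z = -100
```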
Note that B^{-1} is implicitly calculated in both (3.21) and (3.23). One can use the
well-known Gauss-Jordan elimination method for solving these systems of equations.
However, a more popular implementation used in most modern computer packages is the LU
factorization method, since it is more efficient, accurate, and numerically stable. This
method is particularly preferred when the problem is sparse and large-scale.

The basic idea of the LU factorization method is to triangularize the matrix B as a
product of a lower triangular matrix L and an upper triangular matrix U. In this way,
solving (3.21) becomes solving

U^T L^T w = c_B    (3.29)

Since U^T is a lower triangular matrix, we can first denote L^T w by y and solve

U^T y = c_B    (3.30)
by the forward-solve process, in which we obtain the first element of y directly from
the first equation in (3.30), and then substitute it into the second equation in (3.30) for
the second element of y, and so on. Once y is obtained, w can be found by
solving

L^T w = y    (3.31)

This time, since L^T is upper triangular, w can easily be solved for by the backward-solve
process, in which we obtain the last element of w directly from the last equation in
(3.31), and then substitute it into the second-to-last equation in (3.31) for the second-to-last
element of w, and so on. Similar techniques work for solving (3.23), too. For more
serious implementation, one is encouraged to learn more about how to obtain the L
and U factors, how to update them, and how to use scaling techniques for numerical
accuracy. These will be covered in Chapter 10.
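With scipy at hand, both systems reuse one factorization. The sketch below (ours; lu_factor and lu_solve are standard scipy.linalg routines, and the data matches Example 3.5 later in this section) solves (3.21) and (3.23) without ever forming B^{-1}:

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

# Basis of Example 3.5 after the first pivot: columns A3, A4, A1.
B = np.array([[1., 0., 1.],
              [0., 1., 2.],
              [0., 0., 1.]])
c_B = np.array([0., 0., -3.])

lu, piv = lu_factor(B)                  # factor B = P L U once per basis
w = lu_solve((lu, piv), c_B, trans=1)   # trans=1 solves B^T w = c_B, eq. (3.21)
A_q = np.array([1., 1., 0.])            # entering column A_2 of Example 3.5
d = lu_solve((lu, piv), -A_q)           # reuse the factors for B d = -A_q, (3.23)
print(w, d)                             # [ 0.  0. -3.] [-1. -1.  0.]
```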
Also note that in (3.27), e_p is an m-vector with one at its pth element and zero
everywhere else. The right-hand side of (3.27) uses matrix operations to replace the
column A_{j_p} by A_q in the basis B. The reader can easily check this out. As for (3.28), it
simply means that we add the index q and drop j_p in the index set B.

Given below are two examples, one of which has degenerate basic feasible solutions;
they illustrate how the revised simplex method works.
Example 3.4 Nondegenerate
Consider the following linear programming problem:

Minimize −3x_1 − 2x_2

subject to x_1 + x_2 + x_3 = 40
2x_1 + x_2 + x_4 = 60
x_1, x_2, x_3, x_4 ≥ 0

Note that this problem and Example 2.1 have the same feasible domain but different
objective functions. The graph of its feasible domain can be found in Figure 2.2. To
solve this problem, we first note that

A = [1 1 1 0]    b = (40, 60)^T,    and c^T = (−3, −2, 0, 0)
    [2 1 0 1]

An easy starting basic feasible solution can be identified at x^0 = (0, 0, 40, 60)^T
with the basis matrix

B = [1 0]
    [0 1]

index set B = {3, 4}, and c_B = (0, 0)^T. This solution corresponds to the extreme point
(0, 0) in Figure 2.2, and the objective value at this point is zero. The revised simplex
method proceeds as follows.

Step 1: Compute the simplex multipliers w by solving

[1 0]^T w = [0]
[0 1]       [0]

which implies that w = (0, 0)^T.

Step 2: Compute the reduced costs

r_1 = c_1 − w^T A_1 = −3 − (0, 0)(1, 2)^T = −3
r_2 = c_2 − w^T A_2 = −2 − (0, 0)(1, 1)^T = −2
Step 3 (check for optimality): Since r_i < 0 for i = 1, 2, the current solution is
not optimal.

Step 4 (enter the basis): We may choose q = 1 here, since r_1 < 0. (Actually, we
may choose q = 2, too.)

Step 5 (edge direction): Since A_1 = (1, 2)^T, we compute d by solving

Bd = −A_1

which results in d = (−1, −2)^T. Note that the full edge direction is given by
d^1 = (1, 0, −1, −2)^T.
Step 6 (check for unboundedness): Since d < 0, no unboundedness is
detected, and we proceed further.

Step 7 (leave the basis and step length): The minimum ratio test shows that

α = −x_4/d_4 = min_{3≤i≤4} { −x_i/d_i } = min { −40/(−1), −60/(−2) } = 30

Hence x_4 will leave the basis.


Step 8 (update): Set

x_1 ← α = 30 (enters the basis)
x_2 stays at zero
x_3 ← 40 + (30)(−1) = 10
x_4 ← 60 + (30)(−2) = 0 (leaves the basis)

B ← B + [A_1 − A_4][0 1] = [1 0] + [(1, 2)^T − (0, 1)^T][0 1] = [1 1]
                           [0 1]                                [0 2]

B ← B ∪ {1} \ {4} = {3, 1}
This finishes one iteration of the revised simplex method. The new solution
(x_1, x_2) = (30, 0) is a vertex of the polytope, which is adjacent to the original
vertex (x_1, x_2) = (0, 0) but has a smaller objective value (−90 as opposed to 0).
On the next iteration, the revised simplex method will step to the vertex whose first
two coordinates are (20, 20), and this one will turn out to be optimal. Its objective
value is −100.
Now we add one more constraint, x_1 ≤ 30, to make the previous example
degenerate at (30, 0) and consider the following problem:

Example 3.5 Degenerate


Minimize −3x_1 − 2x_2

subject to x_1 + x_2 + x_3 = 40
2x_1 + x_2 + x_4 = 60
x_1 + x_5 = 30

Figure 3.2 is a two-dimensional graph of the feasible domain P. Note that the
extreme point (30, 0) is now "overdetermined" as the intersection of three lines, instead
of two.

Figure 3.2

For the revised simplex method, we have

A = [ 1  1  1  0  0 ]      b = [ 40 ]      c^T = [ -3  -2  0  0  0 ]
    [ 2  1  0  1  0 ],         [ 60 ],
    [ 1  0  0  0  1 ]          [ 30 ]

As in Example 3.4, we start from the extreme point (0, 0), which corresponds to the basic feasible solution x^0 = [0 0 40 60 30]^T with the basis matrix

B = [ 1  0  0 ]
    [ 0  1  0 ]
    [ 0  0  1 ]

and the index set B = {3, 4, 5}.
Carrying out one iteration of the revised simplex method as we did in Example 3.4, the new solution becomes x^1 = [30 0 10 0 0]^T, which is a degenerate solution obtained from the negative reduced cost r1 and the minimum ratio test with α = 30 and a tie between x4 and x5. This means x1 is entering the basis and either x4 or x5 is leaving the basis. Let us choose x5 to leave the basis. Therefore our current basis becomes

B = [ 1  0  1 ]
    [ 0  1  2 ]
    [ 0  0  1 ]

with the index set B = {3, 4, 1}.
Continuing for one more iteration, it can easily be checked that w = [0 0 -3]^T, r2 < 0, d = [-1 -1 0]^T, and α = min {-10/(-1), -0/(-1)} = 0. Therefore we know that x2 enters the basis, x4 leaves the basis, and the step length is zero. Updating the related information, we reach a new basic feasible solution x^2 = [30 0 10 0 0]^T with a new basis index set B = {3, 2, 1}. Note that x^1 = x^2. This means we actually stay at the same extreme point (30, 0) in Figure 3.2, owing to the zero step length.
For one more iteration, it can be checked that x5 enters the basis to replace x3, and the current basic feasible solution becomes x^3 = [20 20 0 0 10]^T, which is an optimal solution.
Note that in the previous example the revised simplex method avoids the cycling problem even without any cycling-prevention mechanism. But this is not true in general.

3.8 CONCLUDING REMARKS

In this chapter, we have seen the basic concepts behind the simplex method and have developed a computationally attractive procedure to implement the revised simplex method. In the next chapter we will study the duality theory of linear programming and develop two more implementation schemes for the simplex method, namely the dual simplex method and the primal-dual simplex method. The performance of the simplex method will be discussed in Chapter 5.

REFERENCES FOR FURTHER READING

3.1. Bartels, R. H., and Golub, G. H., "The simplex method for linear programming using LU decomposition," Communications of the ACM 12, 266-268 (1969).

3.2. Bazaraa, M. S., Jarvis, J. J., and Sherali, H. D., Linear Programming and Network Flows (2d
ed.), John Wiley, New York (1990).
3.3. Beale, E. M. L., "Cycling in the dual simplex algorithm," Naval Research Logistics Quarterly
2, 269-276 (1955).
3.4. Bland, R. G., "New finite pivoting rules for the simplex method," Mathematics of Operations Research 2, 103-107 (1977).
3.5. Chvátal, V., Linear Programming, Freeman, San Francisco (1983).
3.6. Dantzig, G. B., Linear Programming and Extensions, Princeton University Press, Princeton,
NJ (1963).
3.7. Gill, P. E., Murray, W., and Wright, M. H., Numerical Linear Algebra and Optimization,
Vol. 1, Addison-Wesley, Redwood City, CA (1991).
3.8. Goldfarb, D., and Todd, M. J., "Linear programming," in Optimization, Handbooks in Operations Research and Management Science, ed. by Nemhauser, G. L., and Rinnooy Kan, A. H. G., Vol. 1, 73-170, Elsevier-North Holland, Amsterdam (1989).
3.9. Golub, G. H., and Van Loan, C. F., Matrix Computations, Johns Hopkins University Press,
Baltimore (1983).
3.10. Kotiah, T., and Slater, N., "On two-server Poisson queues with two types of customers,"
Operations Research 21, 597-603 (1973).
3.11. Luenberger, D. G., Introduction to Linear and Nonlinear Programming (2d ed.), Addison-
Wesley, Reading, MA (1973).

EXERCISES

3.1. Consider a linear programming problem with its feasible domain P = {x ∈ R^n | Ax ≤ b, x ≥ 0}. If b ≥ 0, suggest an easy way to find a starting basic feasible solution.
3.2. Consider a linear programming problem in its standard form with P = {x ∈ R^n | Ax = b, x ≥ 0}.
(a) Let d ∈ R^n. Show that a necessary condition for d to be a feasible direction is that Ad = 0.
(b) Suppose x = (x1, ..., xn)^T ∈ P with x_i > 0 whenever d_i ≠ 0. Show that there exists a scalar α > 0 such that x + αd ≥ 0.
3.3. Consider the following linear programming problem:

Minimize -2x1 - x2 + x3 + x4 + 2x5
subject to -2x1 + x2 + x3 + x4 + x5 = 12
           -x1 + 2x2 + x4 - x5 = 5
           x1 - 3x2 + x3 + 4x5 = 11
           x1, x2, x3, x4, x5 ≥ 0

Let x3, x4, and x5 be basic variables and x1, x2 nonbasic.

(a) Write down B, N, and M; calculate B^{-1}, M^{-1}, and the corresponding basic feasible solution.

(b) Applying the Gaussian elimination method to the three constraints, you can express the basic variables x3, x4, and x5 in terms of the nonbasic variables x1 and x2. Now reformulate the linear programming problem in terms of the two nonbasic variables.
(c) Draw a two-dimensional graph for the reformulated linear programming problem and explain why you can represent the feasible domain P of the original linear program by a two-dimensional graph, although P ⊂ R^5.
(d) Mark the basic feasible solution of (a) on the two-dimensional graph.
(e) Calculate B^{-1}A and B^{-1}b and compare the results with those in (b). What is your conclusion and explanation?
(f) Go back to (a) and write down the direction vectors d^1 and d^2. Compute the corresponding reduced costs.
(g) Between d^1 and d^2, which direction leads to a potential reduction in the objective value? This is your direction of translation. How far can we proceed along that direction without violating the nonnegativity constraints? This is your step length.
(h) Take the direction of translation and step length of (g) to move to a new solution. Show that the new solution is not only a basic feasible solution but also an adjacent extreme point of the previous one.
(i) Update all data in (a). Is the new solution optimal? Why?
(j) Redo (b) and (c) in terms of the new basic and nonbasic variables. Notice that the new graph is different from the previous two-dimensional graph. Why?
(k) Summarize what you have learned from (a) to (j).
(l) In general, if a given linear programming problem in its standard form has n variables and n - 2 nonredundant constraints, you can always have a two-dimensional graphic representation for it. Why?
3.4. Complete the simplex iterations for Example 3.1.
3.5. In Case 2 of the two-phase method, prove that the following statement is true: "If e_k^T B^{-1} A_q = 0 for every nonbasic variable x_q, then the kth row of the constraint set Ax = b must be redundant."
3.6. In Case 1 of the big-M method, verify that the inequality

c̄^T x̄ = c^T x + M x_0 ≤ c^T x* + M Σ_{i=1}^{m} x_i^{a*}

is impossible in that situation.


3.7. Solve the following linear programming problem by both the two-phase method and the big-M method:

Minimize 3x1 - 3x2 + x3
subject to x1 + 2x2 - x3 ≤ 5
3.8. Show that for a degenerate basic feasible solution x with p (< m) positive components, we may have up to

C(n - p, n - m) = (n - p)! / [(n - m)! (m - p)!]

different bases corresponding to the same extreme point x.



3.9. Show that M^{-1} in Equation (3.20) is correct.


3.10. Follow the pivoting rules of Example 3.3 to show the cycling behavior of the simplex method. Then solve the problem again with the help of the lexicographic rule or Bland's rule to eliminate the cycling problem.
3.11. Give a complete proof of the fact that Bland's rule prevents cycling, as sketched in Section 3.6.2.
3.12. Solve the system of equations Bx = y for a given nonsingular matrix B and a given right-hand-side vector y as follows:
(a) Show that B can be factorized as the product B = LU of a lower triangular matrix L and an upper triangular matrix U.
(b) Define w = Ux; then solve Lw = y for w by the forward-solve process.
(c) Solve Ux = w for x by the backward-solve process.
3.13. Complete the simplex iterations of Example 3.4.
3.14. Complete the simplex iterations of Example 3.5.
3.15. Draw a detailed flow chart of the revised simplex method with the two-phase method for
computer implementation.
3.16. Develop computer codes based on the flow chart of the last exercise and test the following problems:

(a) Minimize x1 + x2 + x3 - 3x4 + 6x5 + 4x6
subject to x1 + x2 + 3x4 - x5 + 2x6 = 6
           x2 + x3 - x4 + 4x5 + x6 = 3
           x1, x2, x3, x4, x5, x6 ≥ 0
(b) Minimize -x4 + x5 + x6 + 2x7
subject to x1 + x4 + x5 + x6 + x7 = 1
           x2 + (1/2)x4 - (11/2)x5 - (5/2)x6 + 9x7 = 0
           x3 + (1/2)x4 - (3/2)x5 - (1/2)x6 + x7 = 0
           x1, x2, x3, x4, x5, x6, x7 ≥ 0


(c) Minimize -7x5 - 4x6 + 15x7
subject to x1 + (1/3)x5 - (32/9)x6 + (20/9)x7 = 0
           x2 + (1/6)x5 - (13/9)x6 + (5/18)x7 = 0
           x3 + (2/3)x5 - (16/9)x6 + (1/9)x7 = 0
           x4 = 1
           x1, x2, x3, x4, x5, x6, x7 ≥ 0

(d) Minimize 3x1 - x2 + 4x3 - x4 + x5
subject to x1 + 3x2 + x3 + 4x4 + 4x5 = 12
           x1, x2, x3, x4, x5 ≥ 0

(e) Minimize -x3
subject to x1 ≤ 1
           x2 ≥ 0.00000001x1
           x2 ≤ 1 - 0.00000001x1
           x3 ≥ 0.00000001x2
           x3 ≤ 1 - 0.00000001x2
           x1, x2, x3 ≥ 0
3.17. Analyze the computer outputs of Exercise 3.16 and comment on the special properties of each subproblem.
4

Duality Theory and Sensitivity Analysis

The notion of duality is one of the most important concepts in linear programming.
Basically, associated with each linear programming problem (we may call it the primal
problem), defined by the constraint matrix A, the right-hand-side vector b, and the cost
vector c, there is a corresponding linear programming problem (called the dual problem)
which is constructed by the same set of data A, b, and c. A pair of primal and dual
problems are closely related. The interesting relationship between the primal and dual
reveals important insights into solving linear programming problems.
To begin this chapter, we introduce a dual problem for the standard-form linear
programming problem. Then we study the fundamental relationship between the primal
and dual problems. Both the "strong" and "weak" duality theorems will be presented.
An economic interpretation of the dual variables and dual problem further exploits the
concepts in duality theory. These concepts are then used to derive two important simplex
algorithms, namely the dual simplex algorithm and the primal dual algorithm, for solving
linear programming problems.
We conclude this chapter with sensitivity analysis, which is the study of the effects of changes in the parameters (A, b, and c) of a linear programming problem on its optimal solution. In particular, we study the effects of changing the cost vector, changing the right-hand-side vector, adding and removing a variable, and adding and removing a constraint in linear programming.


4.1 DUAL LINEAR PROGRAM

Consider a linear programming problem in its standard form:

Minimize c^T x (4.1a)
subject to Ax = b (4.1b)
x ≥ 0 (4.1c)

where c and x are n-dimensional column vectors, A an m × n matrix, and b an m-dimensional column vector. Let us assume that x is a nondegenerate basic feasible solution. Corresponding to x, we have a basis matrix B and a nonbasis matrix N. Also let B be the index set of basic variables and N the index set of nonbasic variables.
According to Theorem 3.1, we know x is optimal if and only if r_q ≥ 0 for each q ∈ N, or, equivalently, c_B^T B^{-1} A_q ≤ c_q for each q ∈ N. Also it is easy to see that c_B^T B^{-1} A_p = c_p for each p ∈ B. Therefore, by denoting w^T = c_B^T B^{-1}, we have

w^T A_q ≤ c_q, ∀q ∈ N

and

w^T A_p = c_p, ∀p ∈ B

Therefore, in vector form, we have

w^T A ≤ c^T

or, equivalently,

A^T w ≤ c (4.2)

Notice that, at this optimal solution x, since x_N = 0, we see

b^T w = w^T b = c_B^T B^{-1} b = c_B^T x_B = c^T x

But in general we only have

b^T w = w^T b = w^T Ax ≤ c^T x, for Ax = b and x ≥ 0
Therefore, in view of the new variables w, we can define an associated linear programming problem:

Maximize b^T w (4.3a)
subject to A^T w ≤ c; w unrestricted (4.3b)

Notice that problem (4.3) is a maximization problem with m unrestricted variables and n inequality constraints. The roles of the variables and constraints are somewhat reversed in problems (4.1) and (4.3). Usually, we call problem (4.1) the primal problem and problem (4.3) the dual problem. These two make a primal-dual pair. A vector w of dual variables becomes a dual solution if constraint (4.3b) is satisfied.

Example 4.1
The dual linear program of Example 2.1 becomes

Maximize 40w1 + 60w2
subject to w1 + 2w2 ≤ -1
           w1, w2 ≤ 0

Example 4.2
For a linear programming problem in the "inequality form," i.e.,

Minimize c^T x
subject to Ax ≥ b, x ≥ 0

we can convert the problem into its standard form and then derive its dual problem. As we ask the reader to show in an exercise, it is the following:

Maximize b^T w
subject to A^T w ≤ c, w ≥ 0

These two linear programming problems are sometimes called a symmetric pair of primal and dual programs, owing to the symmetric structure observed.

4.2 DUALITY THEORY

Note that both the primal and dual problems are defined by the same data set (A, b, c).
In this section, we study the fundamental relationship between the pair. First we show
that the concept of dual problem is well defined in the sense that we can choose either
one of the primal-dual pair as the primal problem and the other one becomes its dual
problem.

Lemma 4.1. Given a primal linear program, the dual problem of the dual linear program is the original primal problem.
Proof. Let us start with problem (4.1). Its dual problem (4.3) may be expressed as

-Minimize z̄ = -b^T w (4.4a)
subject to A^T w ≤ c; w unrestricted (4.4b)

Since w is unrestricted, we may represent w = u - v with u ≥ 0 and v ≥ 0. To convert problem (4.4) into its standard form, we further introduce slack variables s ≥ 0

for an equivalent form

-Minimize z̄ = [-b^T | b^T | 0^T] x̄ (4.5a)
subject to [A^T | -A^T | I] x̄ = c; x̄ = [u; v; s] ≥ 0 (4.5b)

Note that problem (4.5) is in its standard form. Its dual problem becomes

-Maximize z̄ = c^T w̄ (4.6a)
subject to [A; -A; I] w̄ ≤ [-b; b; 0]; w̄ unrestricted (4.6b)

Defining x = -w̄, we have an equivalent problem

Minimize z = c^T x (4.7a)
subject to Ax = b, x ≥ 0 (4.7b)

which is nothing but the primal linear programming problem (4.1).

Next we show that the primal (minimization) problem is always bounded below by the dual (maximization) problem, and the dual (maximization) problem is always bounded above by the primal (minimization) problem, provided both are feasible.

Theorem 4.1 (Weak Duality Theorem of LP). If x^0 is a primal feasible solution and w^0 is dual feasible, then c^T x^0 ≥ b^T w^0.
Proof. The dual feasibility of w^0 implies that c ≥ A^T w^0. Since x^0 is primal feasible, we know x^0 ≥ 0 and, hence, (x^0)^T c ≥ (x^0)^T A^T w^0. Noting also that Ax^0 = b, we see c^T x^0 = (x^0)^T c ≥ (x^0)^T A^T w^0 = b^T w^0.

Several corollaries can be obtained immediately from the weak duality theorem:

Corollary 4.1.1. If x^0 is primal feasible, w^0 is dual feasible, and c^T x^0 = b^T w^0, then x^0 and w^0 are optimal solutions to the respective problems.
Proof. Theorem 4.1 indicates that c^T x ≥ b^T w^0 = c^T x^0 for each primal feasible solution x. Thus x^0 is an optimal solution to the primal problem. A similar argument holds for the dual problem.

Corollary 4.1.2. If the primal problem is unbounded below, then the dual problem is infeasible.
Proof. Whenever the dual problem has a feasible solution w^0, the weak duality theorem prevents the primal objective from falling below b^T w^0.

Similarly, we have the following result:

Corollary 4.1.3. If the dual problem is unbounded above, then the primal problem is infeasible.

Note that the converse statement of either of the two foregoing corollaries is not true. For example, when the primal problem is infeasible, the dual could be either unbounded above or infeasible. However, if the primal is infeasible and the dual is feasible, then the dual must be unbounded. Concrete examples are presented in the exercises.
With these results, a stronger statement can be made in the following important theorem.

Theorem 4.2 (Strong Duality Theorem of LP)

1. If either the primal or the dual linear program has a finite optimal solution, then so does the other, and they achieve the same optimal objective value.
2. If either problem has an unbounded objective value, then the other has no feasible solution.

Proof. For the first claim, without loss of generality, let us assume that the primal problem has reached a finite optimum at a basic feasible solution x. If we apply the revised simplex method at x and define w^T = c_B^T B^{-1}, then

c - A^T w = [c_B; c_N] - [B^T; N^T] w = r ≥ 0 (4.8)

Therefore w is dual feasible. Moreover, since x is a basic feasible solution,

c^T x = c_B^T x_B = c_B^T B^{-1} b = w^T b = b^T w (4.9)

Owing to Corollary 4.1.1, we know w is an optimal solution to the dual linear program.
The proof of the second claim is a direct consequence of Corollary 4.1.2 and Corollary 4.1.3.

The strong duality theorem has several implications. First of all, it says there is no duality gap between the primal and dual linear programs, i.e., c^T x* = b^T w*. This is not generally true for nonlinear programming problems. Second, in the proof of Theorem 4.2, the simplex multipliers (see Section 7 of Chapter 3), or Lagrange multipliers, become the vector w of dual variables. Furthermore, at each iteration of the revised simplex method, the dual vector w maintains the property c^T x = b^T w. However, unless all components of the reduced cost vector r are nonnegative, w is not dual feasible. Thus the revised simplex method maintains primal feasibility and a zero duality gap while seeking dual feasibility. Needless to say, the simplex multipliers w* corresponding to a primal optimal solution x* form a dual optimal solution.
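The zero duality gap is easy to verify numerically. As a sketch (our own illustration, assuming scipy is available), one can solve both sides of the primal-dual pair for the data of Example 3.4 with scipy's general-purpose linprog solver:

```python
import numpy as np
from scipy.optimize import linprog

A = np.array([[1., 1., 1., 0.],
              [2., 1., 0., 1.]])
b = np.array([40., 60.])
c = np.array([-3., -2., 0., 0.])

# Primal: min c^T x  s.t.  Ax = b, x >= 0.
primal = linprog(c, A_eq=A, b_eq=b, bounds=[(0, None)] * 4)

# Dual: max b^T w  s.t.  A^T w <= c, w unrestricted
# (linprog minimizes, so we negate the objective).
dual = linprog(-b, A_ub=A.T, b_ub=c, bounds=[(None, None)] * 2)

print(primal.fun, -dual.fun)   # both -100: no duality gap
```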
A celebrated application of the duality theorem is in establishing the existence of solutions to systems of equalities and inequalities. The following result, known as Farkas' lemma, concerns this aspect.

Theorem 4.3 (Farkas' Lemma). The system

Ax = b, x ≥ 0 (4.10)

has no solution if and only if the system

A^T w ≤ 0, b^T w > 0 (4.11)

has a solution.
Proof. Consider the (primal) linear program

Minimize 0^T x
subject to Ax = b, x ≥ 0

and its dual

Maximize b^T w
subject to A^T w ≤ 0, w unrestricted

Since w = 0 is dual feasible, the primal problem is infeasible if and only if the dual is unbounded. However, the primal is infeasible if and only if system (4.10) has no solution, and the dual is unbounded if and only if system (4.11) has a solution. To be more precise, for any solution to system (4.11), say d such that A^T d ≤ 0 and b^T d > 0, αd is a dual feasible solution that leads to an unbounded objective value as α approaches infinity.

An equivalent statement of Farkas' lemma is as follows:

Of the two systems

(I) Ax = b, x ≥ 0 (4.12a)
(II) A^T w ≤ 0, b^T w > 0 (4.12b)

either system (I) or system (II) is solvable, but not both.
The geometric implication of this result is quite straightforward. If we denote A_j as the jth column of A, the existence of a solution to system (4.12a) mandates that b lie in the convex cone generated by A_j, for j = 1, 2, ..., n, since x ≥ 0 and

b = Σ_{j=1}^{n} A_j x_j

However, the existence of a solution w to system (4.12b) requires w to make an angle greater than 90 degrees with each column of A while making an angle of less than 90 degrees with b. Consequently, b is required to lie outside of the cone generated by the columns of A. Therefore one and only one of the two systems has a solution. Figure 4.1 is a graphic representation of this discussion.
Variants of Farkas' lemma, all stating that exactly one of a given pair of systems of equalities and inequalities is solvable, are broadly known as theorems of the alternative. We shall introduce some of them in the exercises.

Figure 4.1 (the cone generated by the columns of A, in which b must lie for (I) to have a solution, versus the cone in which b must lie, and the cone containing w, for (II) to have a solution)

Another important application of the duality theory is in establishing optimality


conditions for linear programming. In the next section, we first introduce the notion of
complementary slackness and then study the Karush-Kuhn-Tucker conditions for linear
programming problems.

4.3 COMPLEMENTARY SLACKNESS AND OPTIMALITY CONDITIONS

Recall the symmetric pair of primal and dual linear programs:

Minimize c^T x (P)
subject to Ax ≥ b, x ≥ 0

Maximize b^T w (D)
subject to A^T w ≤ c, w ≥ 0

For the primal problem, we define

s = Ax - b ≥ 0 (4.13)

as the primal slackness vector. For the dual problem, we define

r = c - A^T w ≥ 0 (4.14)

as the dual slackness vector. Notice that s is an m-dimensional vector and r an n-dimensional vector. Moreover, for any primal feasible solution x and dual feasible

solution w, we know

0 ≤ r^T x + s^T w = (c^T - w^T A)x + w^T (Ax - b) = c^T x - b^T w (4.15)

Therefore, the quantity r^T x + s^T w is equal to the duality gap between the primal feasible solution x and the dual feasible solution w. This duality gap vanishes if, and only if,

r^T x = 0 and s^T w = 0 (4.16)

In this case, x becomes an optimal primal solution and w an optimal dual solution. Since all vectors x, w, r, and s are nonnegative, Equation (4.16) requires that "either r_j = 0 or x_j = 0 for j = 1, ..., n" and "either s_i = 0 or w_i = 0 for i = 1, ..., m." Hence (4.16) is called the complementary slackness conditions. This important result can be summarized as the following theorem:

Theorem 4.4 (Complementary Slackness Theorem). Let x be a primal feasible solution and w a dual feasible solution to a symmetric pair of linear programs. Then x and w become an optimal solution pair if and only if the complementary slackness conditions

either r_j = (c - A^T w)_j = 0 or x_j = 0, ∀j = 1, 2, ..., n
either s_i = (Ax - b)_i = 0 or w_i = 0, ∀i = 1, 2, ..., m

are satisfied.
As to the primal-dual pair of linear programs in the standard form, i.e.,

Minimize c^T x (P)
subject to Ax = b, x ≥ 0

Maximize b^T w (D)
subject to A^T w ≤ c

since the primal constraints are tight equalities with zero slackness, the condition s^T w = 0 is automatically met. Therefore, the complementary slackness conditions are simplified to r^T x = 0.
With this knowledge, we can state the Karush-Kuhn-Tucker (K-K-T) conditions for linear programming problems as follows:

Theorem 4.5 (K-K-T Optimality Conditions for LP). Given a linear programming problem in its standard form, a vector x is an optimal solution to the problem if, and only if, there exist vectors w and r such that

(1) Ax = b, x ≥ 0 (primal feasibility)
(2) A^T w + r = c, r ≥ 0 (dual feasibility)
(3) r^T x = 0 (complementary slackness)

In this case, w is an optimal solution to the dual problem.


Example 4.3
Let us consider Example 3.4. When the revised simplex method terminates, we find that x = [20 20 0 0]^T, w = [-1 -1]^T, and r = [0 0 1 1]^T. Hence we know the K-K-T conditions are satisfied, and we have reached an optimal solution.
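A minimal numpy sketch (our own illustration) that checks the three K-K-T conditions for the data of Example 4.3:

```python
import numpy as np

A = np.array([[1., 1., 1., 0.],
              [2., 1., 0., 1.]])
b = np.array([40., 60.])
c = np.array([-3., -2., 0., 0.])

x = np.array([20., 20., 0., 0.])
w = np.array([-1., -1.])
r = np.array([0., 0., 1., 1.])

print(np.allclose(A @ x, b), np.all(x >= 0))        # primal feasibility
print(np.allclose(A.T @ w + r, c), np.all(r >= 0))  # dual feasibility
print(r @ x == 0)                                    # complementary slackness
```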

The theorem of K-K-T conditions is one of the fundamental results in mathematical programming. For a nonlinear programming problem, which is much more general than a linear programming problem, it specifies the necessary and/or sufficient conditions for optimality, depending upon whether the given problem satisfies certain regularity conditions. A detailed discussion of these regularity conditions is beyond the scope of this book. The result we see in Theorem 4.5 is one special case of the general result.

4.4 AN ECONOMIC INTERPRETATION OF THE DUAL PROBLEM

So far, we have seen that the dual linear program uses the same set of data as the primal
problem, supports the primal solutions as a lower bound, and provides insights into the
sufficient and necessary conditions for optimality. In this section, we intend to explain
the meaning of dual variables and make an economic interpretation of the dual problem.

4.4.1 Dual Variables and Shadow Prices

Given a linear programming problem in its standard form, the primal problem can be
viewed as a process of providing different services (x :;:: 0) to meet a set of customer
demands (Ax= b) in a least expensive manner with a minimum cost (min c7 x).
For a nondegenerate optimal solution x* obtained by the revised simplex method, we have

x* = [x_B*; x_N*] = [B^{-1}b; 0] with an optimal cost z* = c_B^T B^{-1} b

where B is the corresponding optimal basis matrix.
Since x_B* = B^{-1}b > 0, for a small enough increment Δb in demand, we know B^{-1}(b + Δb) > 0 and

x̄ = [B^{-1}(b + Δb); 0]

is an optimal basic feasible solution (why?) to the following problem:

Minimize c^T x
subject to Ax = b + Δb, x ≥ 0

which is the same process of minimizing the total cost but satisfying more demands. Note that the optimal cost associated with this problem is z̄* = c_B^T B^{-1}(b + Δb). Consequently,

z̄* - z* = c_B^T B^{-1}(b + Δb) - c_B^T B^{-1} b = c_B^T B^{-1} Δb = (w*)^T Δb (4.17)

Recall that w* is the vector of simplex multipliers. At the primal optimal solution x*, it becomes the vector of dual variables. Equation (4.17) says the incremental cost (z̄* - z*) of satisfying an incremental demand Δb is equal to (w*)^T Δb. Therefore w_i* can be thought of as the marginal cost of providing one unit of the ith demand at optimum. In other words, it indicates the minimum unit price one has to charge the customer for satisfying additional demands when an optimum is achieved. Therefore, the dual variables are sometimes called the marginal prices, the shadow prices, or the equilibrium prices.
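The shadow-price interpretation of (4.17) is easy to check numerically. The following sketch (our own illustration, assuming scipy is available) re-solves Example 3.4 with one extra unit of the first demand and compares the change in optimal cost with (w*)^T Δb:

```python
import numpy as np
from scipy.optimize import linprog

A = np.array([[1., 1., 1., 0.],
              [2., 1., 0., 1.]])
b = np.array([40., 60.])
c = np.array([-3., -2., 0., 0.])

base = linprog(c, A_eq=A, b_eq=b, bounds=[(0, None)] * 4)
w = np.array([-1., -1.])        # optimal dual vector from Example 4.3

db = np.array([1., 0.])         # one extra unit of the first "demand"
pert = linprog(c, A_eq=A, b_eq=b + db, bounds=[(0, None)] * 4)

print(pert.fun - base.fun, w @ db)   # both -1, as (4.17) predicts
```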

4.4.2 Interpretation of the Dual Problem

This time, let us consider a linear programming problem in inequality form:

Maximize c^T x
subject to Ax ≤ b, x ≥ 0

Its dual linear program becomes

Minimize b^T w
subject to A^T w ≥ c, w ≥ 0

First, let us explain the scenario of the primal linear program. Consider a manufacturer who makes n products out of m resources. To make one unit of product j (j = 1, ..., n), it takes a_ij units of resource i, for i = 1, 2, ..., m. The manufacturer has obtained b_i units of resource i (i = 1, ..., m) in hand, and the unit price of product j (j = 1, ..., n) is c_j at the current market. Therefore, the primal problem leads the manufacturer to find an optimal production plan that maximizes the sales with the available resources.
Next, we consider the dual scenario. Let us assume the manufacturer gets the resources from a supplier. The manufacturer wants to negotiate the unit purchasing price w_i for resource i (i = 1, ..., m) with the supplier. Therefore the manufacturer's objective is to minimize the total purchasing price b^T w of obtaining the resources b_i (i = 1, ..., m). Since the market price c_j and the "product-resource" conversion ratio a_ij are open information in the market, the manufacturer knows that, at least ideally, a "smart" supplier would like to charge him as much as possible, so that

Σ_{i=1}^{m} a_ij w_i ≥ c_j, for j = 1, ..., n

In this way, the dual linear program leads the manufacturer to come up with a least-cost plan in which the purchasing prices are acceptable to the "smart" supplier.
The foregoing scenarios not only provide economic interpretations of the primal and dual linear programming problems but also explain the implications of the complementary slackness conditions. Assume that the manufacturer already has b_i (i = 1, ..., m) units of resources on hand. Then,

1. the ith component w_i* of the optimal dual vector represents the maximum marginal price that the manufacturer is willing to pay in order to get an additional unit of resource i from the supplier;
2. when the ith resource is not fully utilized (i.e., a_i x* < b_i, where a_i is the ith row of A and x* is an optimal primal solution), the complementary slackness condition requires that w_i* = 0, which means the manufacturer is not willing to pay a penny to get an additional amount of that resource;
3. when the supplier asks too much (i.e., when A_j^T w* > c_j, where A_j is the jth column of A), the complementary slackness condition requires that x_j* = 0, which means that the manufacturer is no longer willing to produce any amount of product j.

Many other interpretations of the dual variables, dual problems, and complementary
slackness conditions can be found in the exercises.

4.5 THE DUAL SIMPLEX METHOD

With the concept of duality in mind, we now study a variant of the revised simplex method. Basically, this variant is equivalent to applying the revised simplex method to the dual linear program of a given linear programming problem. Hence we call it the dual simplex method.

4.5.1 Basic Idea of the Dual Simplex Method

Recall that the basic philosophy of the revised simplex method is to keep the primal feasibility and complementary slackness conditions and seek dual feasibility at its optimal solution. Similarly, the dual simplex method keeps the dual feasibility and complementary slackness conditions but seeks primal feasibility at its optimum.
Let us start with a basis matrix B which results in a dual feasible solution w such that

B^T w = c_B and r = c - A^T w ≥ 0 (4.18)

We can further define

x = [x_B; x_N] = [B^{-1}b; 0] (4.19)

In this way, we see that

Ax = [B | N] [B^{-1}b; 0] = b (4.20)

and

c^T x = c_B^T B^{-1} b = w^T b (4.21)

Therefore, the dual feasibility and complementary slackness conditions are satisfied in this setting. However, primal feasibility is not satisfied unless x_B = B^{-1}b ≥ 0. In other words, before reaching an optimal solution, there exists at least one p ∈ B (the index set of basic variables in the primal problem) such that x_p < 0. The dual simplex method will reset x_p = 0 (that is, drop x_p from the basic variables) and choose an "appropriate" nonbasic variable x_q ∉ B to enter the basis. Of course, during this pivoting process, the dual feasibility and complementary slackness conditions should be maintained. This is the key idea behind the dual simplex method.
Note that the complementary slackness conditions are always satisfied because of the way we defined w and x; hence we only have to concentrate on dual feasibility. Remember that, in Chapter 3, we showed that dual feasibility is associated with the reduced cost vector

r = [r_B; r_N] with r_B = 0 and r_N^T = c_N^T - c_B^T B^{-1} N

Also remember that the fundamental matrix is

M = [ B  N ]
    [ 0  I ]

with its inverse

M^{-1} = [ B^{-1}  -B^{-1}N ]
         [ 0        I       ]

Thus the information on the dual variables and dual feasibility is embedded in the following equation:

(w^T | r_N^T) = c^T M^{-1} = (c_B^T B^{-1} | c_N^T - c_B^T B^{-1} N) (4.22)

Needless to say, after each pivot a new basic variable is introduced to replace an old one, which results in a new fundamental matrix that produces new information on the dual according to Equation (4.22). Therefore, in order to maintain dual feasibility, we examine the matrix M^{-1} first.

4.5.2 Sherman-Morrison-Woodbury Formula

Note that the fundamental matrix M is an n x n matrix, and a direct inversion re-
quires O(n 3 ) elementary operations. In order to reduce the computational effort, also to
reveal the new dual information in an explicit form, we introduce the Sherman-Morrison-
Woodbury formula to modify the inverse of the fundamental matrix after each pivoting.
We first investigate the changes of the fundamental matrix (from M to M) after
each pivoting. In this case, we assume that Xp leaves the basis and Xq enters the basis.

Let e_j be an n-dimensional unit vector with 1 as its jth component and 0 for the rest. Then the new fundamental matrix M̄ can be obtained according to

M̄ = M + e_q (e_p - e_q)^T (4.23)

The following example illustrates this mechanism.

Example 4.4
Assume that x^T = [x1 x2 x3 x4 x5], x1, x2 are basic variables, x3, x4, x5 are nonbasic, and, correspondingly,

A = [ 1  2  3  4  5 ]
    [ 5  6  7  8  9 ]

and

M = [ 1  2  3  4  5 ]
    [ 5  6  7  8  9 ]
    [ 0  0  1  0  0 ]
    [ 0  0  0  1  0 ]
    [ 0  0  0  0  1 ]

Suppose that x1 is leaving the basis (p = 1) and x5 is entering the basis (q = 5). The new fundamental matrix is given by

M̄ = M + e5 [(1 0 0 0 0) - (0 0 0 0 1)]

  = [ 1  2  3  4  5 ]   [ 0  0  0  0  0 ]   [ 1  2  3  4  5 ]
    [ 5  6  7  8  9 ]   [ 0  0  0  0  0 ]   [ 5  6  7  8  9 ]
    [ 0  0  1  0  0 ] + [ 0  0  0  0  0 ] = [ 0  0  1  0  0 ]
    [ 0  0  0  1  0 ]   [ 0  0  0  0  0 ]   [ 0  0  0  1  0 ]
    [ 0  0  0  0  1 ]   [ 1  0  0  0 -1 ]   [ 1  0  0  0  0 ]

The inverse of the new fundamental matrix can be obtained with the help of the following Sherman-Morrison-Woodbury formula.

Lemma 4.2. Let M be an n × n nonsingular matrix and u, v two n-dimensional column vectors. If w = 1 + v^T M^{-1} u ≠ 0, then the matrix (M + uv^T) is nonsingular and

(M + uv^T)^{-1} = M^{-1} - (1/w) M^{-1} u v^T M^{-1}

Proof

(M + uv^T)[M^{-1} - (1/w) M^{-1} u v^T M^{-1}]
= I - (1/w) u v^T M^{-1} + u v^T M^{-1} - (1/w) u (v^T M^{-1} u) v^T M^{-1}
= I + (1 - 1/w) u v^T M^{-1} - ((w - 1)/w) u v^T M^{-1} = I

since v^T M^{-1} u = w - 1.

Note that once M^{-1} is known, the inverse matrix of (M + uv^T) can be found in O(n²) elementary operations. This is sometimes called the rank-one updating method.
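A small numpy sketch (our own illustration, using an arbitrary well-conditioned test matrix) that verifies the formula of Lemma 4.2 against direct inversion:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
M = np.eye(n) + rng.random((n, n)) * 0.1   # a well-conditioned test matrix
u = rng.random(n)
v = rng.random(n)

M_inv = np.linalg.inv(M)
w = 1.0 + v @ M_inv @ u                     # must be nonzero (Lemma 4.2)

# Rank-one update of the inverse: O(n^2) work once M_inv is known.
update = M_inv - np.outer(M_inv @ u, v @ M_inv) / w

# Compare with direct O(n^3) inversion of the modified matrix.
direct = np.linalg.inv(M + np.outer(u, v))
print(np.allclose(update, direct))          # True
```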
To derive the inverse of the new fundamental matrix M̄, we let u = e_q, v = (e_p - e_q). Then Lemma 4.2 implies that

M̄^{-1} = M^{-1} - [M^{-1} e_q (e_p - e_q)^T M^{-1}] / [1 + (e_p - e_q)^T M^{-1} e_q] (4.24)

Notice that e_q^T M^{-1} is the qth row of M^{-1}; hence it is e_q^T itself (the qth row of M is e_q^T, since x_q is nonbasic). Consequently,

M̄^{-1} = M^{-1} - [M^{-1} e_q (e_p^T M^{-1} - e_q^T)] / [1 + e_p^T M^{-1} e_q - e_q^T e_q]
       = M^{-1} - [M^{-1} e_q (e_p^T M^{-1} - e_q^T)] / [e_p^T M^{-1} e_q] (4.25)

Remember that, from (4.22), (w^T | r_N^T) = c^T M^{-1}. We define

(w̄^T | r̄^T) = c^T M̄^{-1} (4.26)

Hence we have

(w̄^T | r̄^T) = c^T M^{-1} - [c^T M^{-1} e_q (e_p^T M^{-1} - e_q^T)] / [e_p^T M^{-1} e_q] (4.27)

or

(w̄^T | r̄^T) = (w^T | r_N^T) - r_q [e_p^T M^{-1} - e_q^T] / [e_p^T M^{-1} e_q] (4.28)

We further define

u^T = e_p^T B^{-1} (4.29)

y_j = u^T A_j (A_j being the jth column of A) (4.30)

and

γ = r_q / y_q (4.31)

Then Equation (4.28) shows that

w̄ = w + γu (4.32)

r̄_j = r_j - γ y_j, j ∈ N, j ≠ q (N being the index set of nonbasic variables) (4.33)

r̄_p = -γ (4.34)

Several observations can be made here:

1. Equation (4.29) says that u^T is the pth row of B^{-1}.
2. Equation (3.30) further indicates that y_q = u^T A_q = -d_p, which is the opposite of the pth component of the edge direction that we derived in the revised simplex method.
3. In order to maintain dual feasibility, we require

r̄_p = -γ = -r_q/y_q ≥ 0 (4.35)

and

r̄_j = r_j - γ y_j ≥ 0, for j ∈ N (4.36)

If there exists j ∈ N such that y_j < 0, then -r_j/y_j ≥ -γ is required. Hence we must choose q such that

0 ≤ -γ = -r_q/y_q ≤ -r_j/y_j, ∀ y_j < 0, j ∈ N (4.37)

In other words, we should choose q so that the minimum ratio test

0 ≤ -γ = -r_q/y_q = min { -r_j/y_j | y_j < 0, j ∈ N } (4.38)

is satisfied.
4. In case y_j ≥ 0, ∀j ∈ N, then we know

y_j = u^T A_j = e_p^T B^{-1} A_j ≥ 0, ∀j ∈ N (4.39)

Since this inequality also holds trivially for the basic columns, we have

e_p^T B^{-1} A ≥ 0 (4.40)

Consequently, for any feasible x ≥ 0, we see that e_p^T B^{-1} A x ≥ 0. Notice that e_p^T B^{-1} A x = e_p^T B^{-1} b = e_p^T x_B = x_p. Hence (4.40) implies that x_p ≥ 0, which contradicts our assumption (of the dual simplex approach) that x_p < 0. This in turn implies that there is no feasible solution to the primal problem.

4.5.3 Computer Implementation of the Dual Simplex Method

Incorporating the above observations into the dual simplex approach, we can now present a step-by-step procedure of the dual simplex method for computer implementation. It solves a linear programming problem in its standard form.

Step 1 (starting with a dual feasible basic solution): Given a basis B = [A_{j_1}, A_{j_2}, ..., A_{j_m}] of the constraint matrix A in the primal problem, with an index set B = {j_1, j_2, ..., j_m}, such that a dual basic feasible solution w can be obtained by solving the system of linear equations

B^T w = c_B

Compute the associated reduced cost vector r with

r_j = c_j - w^T A_j, ∀j ∉ B

Step 2 (checking for optimality): Compute the vector x_B of primal basic variables by solving

B x_B = b

If x_B ≥ 0, then STOP. The current solution

x = [x_B; x_N] = [B^{-1}b; 0]

is optimal. Otherwise go to Step 3.
Step 3 (leaving the basis): Choose a basic variable x_{j_p} < 0 with index j_p ∈ B.
Step 4 (checking for infeasibility): Compute u by solving the system of linear equations

B^T u = e_p

Also compute

y_j = u^T A_j, ∀j ∉ B

If y_j ≥ 0, ∀j ∉ B, then STOP. The primal problem is infeasible. Otherwise go to Step 5.
Step 5 (entering the basis): Choose a nonbasic variable x_q by the minimum ratio test

-r_q/y_q = min { -r_j/y_j | y_j < 0, j ∉ B }

Set

-r_q/y_q = -γ
Step 6 (updating the reduced costs):

r_j ← r_j - γ y_j, ∀j ∉ B, j ≠ q
r_{j_p} ← -γ

Step 7 (updating current solution and basis): Compute d by solving

B d = -A_q

Set

x_q ← α = x_{j_p} / y_q (= -x_{j_p} / d_p)
x_{j_i} ← x_{j_i} + α d_i, ∀ j_i ∈ B, i ≠ p
B ← B + [A_q - A_{j_p}] e_p^T
B ← B ∪ {q} \ {j_p}

Go to Step 2.
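The following Python sketch implements this step-by-step procedure in a compact, unoptimized form; it is our own illustration rather than a production code, and for clarity it recomputes w and r from scratch each iteration instead of applying the updates of Step 6:

```python
import numpy as np

def dual_simplex(A, b, c, basis, tol=1e-9):
    """Dual simplex sketch for min c^T x s.t. Ax = b, x >= 0.
    `basis` must index a dual feasible basis (all reduced costs >= 0)."""
    m, n = A.shape
    basis = list(basis)
    while True:
        B = A[:, basis]
        x_B = np.linalg.solve(B, b)                   # Step 2
        if np.all(x_B >= -tol):
            x = np.zeros(n)
            x[basis] = x_B
            return x, 'optimal'
        p = int(np.argmin(x_B))                       # Step 3: most negative leaves
        w = np.linalg.solve(B.T, c[basis])
        r = c - A.T @ w                               # reduced costs
        u = np.linalg.solve(B.T, np.eye(m)[p])        # Step 4: u^T = e_p^T B^{-1}
        nonbasic = [j for j in range(n) if j not in basis]
        y = {j: u @ A[:, j] for j in nonbasic}
        candidates = [j for j in nonbasic if y[j] < -tol]
        if not candidates:
            return None, 'primal infeasible'
        q = min(candidates, key=lambda j: -r[j] / y[j])  # Step 5: ratio test
        basis[p] = q                                  # Step 7: pivot
```

Applied to the data of Example 4.5 below, dual_simplex(A, b, c, basis=[0, 3]) (0-based indices for the basis {1, 4}) returns the optimal solution [1 1 0 0]^T.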

The following example illustrates the dual simplex method.


Example 4.5
Consider the linear program

Minimize -2x1 - x2
subject to x1 + x2 + x3 = 2
           x1 + x4 = 1
           x1, x2, x3, x4 ≥ 0

Step 1 (starting): Choose B = {1, 4}. We see that

B = [ 1  0 ]        B^{-1} = [  1  0 ]
    [ 1  1 ],                [ -1  1 ]

Then the dual solution

w = (c_B^T B^{-1})^T = [ -2 ]
                       [  0 ]

Computing r_j, ∀j ∉ B, we have r2 = 1 and r3 = 2, which implies that w is dual feasible.
Step 2 (checking for optimality): Since

x_B = [x1; x4] = B^{-1} b = [2; -1]

the corresponding primal vector is infeasible.
Step 3 (leaving the basis): Since x4 < 0 (the second element in B), we choose x4 to be leaving the basis and let p = 2.

Step 4 (check infeasibility): Compute

u^T = e_2^T B^{-1} = [-1  1]

and

y2 = u^T A2 = -1, y3 = u^T A3 = -1

Step 5 (entering the basis): Take the minimum ratio test

-r2/y2 = min { -1/(-1), -2/(-1) } = 1 = -γ

Therefore x2 is entering the basis and q = 2.
Step 6 (updating the reduced costs):

r4 = -γ = 1 and r3 = 2 - γ y3 = 1

(also note that r2 has been changed from 1 to 0 as x2 enters the basis.)
Step 7 (updating current solution and basis): Solving for d in the equation Bd = -A2, we obtain

d = [ -1 ]
    [  1 ]

Also

x2 = α = x4/y2 = 1
x1 = 2 - 1 × 1 = 1

Thus the new primal vector has x1 = x2 = 1 (and nonbasic variables x3 = x4 = 0). Since it is nonnegative, we know it is an optimal solution to the original linear program. The corresponding optimal basis becomes

B = [ 1  1 ]
    [ 1  0 ]
4.5.4 Finding an Initial Dual Basic Feasible Solution

To start the dual simplex method, we need a basis matrix B which ensures a dual basic feasible solution. In contrast to the artificial-variable technique introduced in Chapter 3 for obtaining a starting primal basic feasible solution for the revised simplex algorithm, a popular method called the artificial constraint technique is used for the dual simplex method. Basically, we can choose any nonsingular m × m submatrix B of A and add one artificial constraint

Σ_{j ∉ B} x_j ≤ M

with a very large positive number M to the original problem. In this way, an additional slack variable x_{n+1} is added, and B ∪ {n + 1} becomes an index set of basic variables for the new problem. Among those nonbasic variables, choose the one with minimum value of the reduced cost r_j as the entering variable and x_{n+1} as the leaving variable. It can be shown that a dual basic feasible solution can be identified by performing such a single pivot.
Another way to obtain a dual basic feasible solution is by solving the following linear programming problem (possibly by applying the revised simplex method):

Minimize c^T x (4.41a)
subject to Ax = Be, x ≥ 0 (4.41b)

where B is any m × m nonsingular submatrix of A and e is a vector of all ones. Note that problem (4.41) has a starting basic feasible solution

x = [x_B; x_N] = [e; 0]

for the revised simplex method. If this leads to an optimal solution, the corresponding dual solution can be chosen as an initial dual basic feasible solution. On the other hand, if problem (4.41) becomes unbounded, we can show that the original linear program is also unbounded. Hence no dual feasible solution can be found. This is left as an exercise to the reader.
Before concluding this section, we would like to point out three facts:

1. Solving a linear program in its standard form by the dual simplex method is mathematically equivalent to solving its dual linear program by the revised (primal) simplex method.
2. Solving a linear program by the dual simplex method requires about the same amount of effort as the revised (primal) simplex method.
3. The dual simplex method is very handy in sensitivity analysis when an additional constraint is introduced. This topic will be discussed in later sections.

4.6 THE PRIMAL-DUAL METHOD

As we discussed before, the dual simplex method starts with a basic feasible solution of the dual problem and defines a corresponding basic solution for the primal problem such that the complementary slackness conditions are met. Through a series of pivoting operations, the method maintains the dual feasibility and complementary slackness conditions and tries to attain primal feasibility. Once primal feasibility is achieved, the K-K-T optimality conditions guarantee an optimal solution. In this section, we study the so-called primal-dual method, which is very similar to the dual simplex approach but allows us to start with a nonbasic dual feasible solution.
Consider a linear programming problem in its standard form, which we may refer to as the "original problem." Let w be a dual feasible (possibly nonbasic) solution. Then we know that c_j ≥ w^T A_j, ∀j, where A_j represents the jth column of the constraint matrix A. We are particularly interested in those binding (or tight) constraints and denote an index set T = {j | w^T A_j = c_j}. According to the complementary slackness theorem

(Theorem 4.4), T is also the index set of the primal variables which may assume positive values. Now we consider the following linear programming problem:

Minimize z = Σ_{j∈T} 0·x_j + e^T x^a (4.42a)
subject to Σ_{j∈T} A_j x_j + I x^a = b (4.42b)
x_j ≥ 0, ∀j ∈ T, and x^a ≥ 0 (4.42c)

where x^a is an m-dimensional vector of artificial variables.
Note that problem (4.42) includes only a subset of the primal variables of the original problem; hence it is called the restricted primal problem associated with the original one. Also note that the following result is true.

Lemma 4.3. If the restricted primal problem has an optimal solution with zero objective value, then the solution must be an optimal solution to the original problem.
Proof. Assume that (x̄_j, j ∈ T; x̄^a) is an optimal solution to the restricted problem with zero objective value. Since the optimal objective value of the restricted primal problem is zero, we have x̄^a = 0 in this optimal solution. Therefore we can use the x̄_j to construct a primal feasible solution x to the original problem such that x_j = x̄_j ≥ 0, ∀j ∈ T, and x_j = 0, ∀j ∉ T. Note that the restricted problem was defined on the basis of an existing dual feasible solution w with c_j = w^T A_j, ∀j ∈ T, and c_j > w^T A_j, ∀j ∉ T. It is clear that the complementary slackness conditions are satisfied in this case, since (c_j - w^T A_j) x_j = 0, ∀j. Thus the K-K-T conditions are satisfied and the proof is complete.

If the optimal objective value of the restricted primal problem is not zero, say z* > 0, then x̄ is not good enough to define a primal feasible solution to the original problem. In other words, a new dual feasible solution is needed to reconstruct the restricted primal problem with a reduced value of z*. In doing so, we also would like to make sure that only new primal variables whose indices do not belong to T are passed on to the new restricted primal problem. To achieve our goal, let us consider the dual problem of the restricted primal problem (4.42), i.e.,

Maximize z' = y^T b (4.43a)
subject to y^T A_j ≤ 0, ∀j ∈ T (4.43b)
y ≤ e, y unrestricted (4.43c)

Let y* be an optimal solution to this problem. Then we have y*^T A_j ≤ 0 for j ∈ T. Only for those j ∉ T with y*^T A_j > 0 could

the corresponding primal variable x_j be passed on to the restricted primal problem with potential for lowering the value of z*. (Why?) More precisely, we may consider y* as a moving direction for translating the current dual feasible solution w to a new dual solution w', i.e., we define

w' = w + αy*, for α > 0

Hence we have

c_j - w'^T A_j = c_j - (w + αy*)^T A_j = (c_j - w^T A_j) - α(y*^T A_j) (4.44)

Now, for each j ∈ T, since c_j - w^T A_j = 0 and y*^T A_j ≤ 0, we know c_j - w'^T A_j ≥ 0. In order to keep w' dual feasible, we have to consider those j ∉ T with y*^T A_j > 0. Given the fact that c_j - w^T A_j ≥ 0, ∀j ∉ T, we can properly choose α > 0 according to the following formula:

α = (c_k - w^T A_k)/(y*^T A_k) = min { (c_j - w^T A_j)/(y*^T A_j) | j ∉ T, y*^T A_j > 0 } (4.45)

such that c_j - w'^T A_j ≥ 0, ∀j ∉ T. In particular, c_k - w'^T A_k = 0 and c_j - w'^T A_j ≥ 0 for j ∉ T and j ≠ k. Then the primal variable x_k is a candidate to enter the basis of the new restricted primal problem, in addition to those primal variables in the basis of the current restricted problem.
Following this process of adding primal variables to the restricted problem, we may end up in either one of the following two situations. Case 1: the optimal objective value of a new restricted primal problem becomes zero; then Lemma 4.3 assures us that an optimal solution to the original problem has been reached. Case 2: the optimal objective value of a new restricted primal problem is still greater than zero but y*^T A_j ≤ 0, ∀j ∉ T; then we can show that the original primal problem is infeasible and its dual problem is unbounded.

4.6.1 Step-by-Step Procedure for the Primal-Dual Simplex Method

Summarizing the discussions in the previous section, we can write down a step-by-step procedure for the primal-dual simplex method for computer implementation.

Step 1 (starting): Choose an initial dual vector w such that

c_j - w^T A_j ≥ 0, ∀j

Let T = {j | c_j - w^T A_j = 0}.
Step 2 (check for optimality): Solve the restricted primal problem (4.42). If the optimal cost of this problem is zero, then STOP. The current solution is optimal. Otherwise go to Step 3.
Step 3 (compute the direction of translation for the dual vector): Solve the dual problem (4.43) of the restricted primal problem. Let y* be its optimal solution and take it as the direction of translation of the current dual solution.
Step 4 (check infeasibility/unboundedness): If y*^T A_j ≤ 0, ∀j ∉ T, then STOP. The original primal problem is infeasible and its dual is unbounded. Otherwise, continue.
Step 5 (enter the basis of the restricted primal): Choose an index k such that

(c_k - w^T A_k)/(y*^T A_k) = min { (c_j - w^T A_j)/(y*^T A_j) | j ∉ T, y*^T A_j > 0 }

Also define a step length

α = (c_k - w^T A_k)/(y*^T A_k)

Add the primal variable x_k into the basis to form a new restricted primal problem.
Step 6 (update the dual feasible vector): Set

w ← w + αy*

Go to Step 1.
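One building block of this procedure, forming and solving the restricted primal (4.42), can be sketched with scipy's linprog as follows (our own illustration, assuming scipy is available; the function name and tolerance are ours):

```python
import numpy as np
from scipy.optimize import linprog

def restricted_primal(A, b, c, w, tol=1e-9):
    """Given a dual feasible w, form and solve the restricted primal (4.42)."""
    m, n = A.shape
    T = [j for j in range(n) if abs(c[j] - w @ A[:, j]) <= tol]  # tight columns
    A_res = np.hstack([A[:, T], np.eye(m)])     # columns in T plus artificials
    c_res = np.concatenate([np.zeros(len(T)), np.ones(m)])
    sol = linprog(c_res, A_eq=A_res, b_eq=b,
                  bounds=[(0, None)] * (len(T) + m))
    # sol.fun == 0 means sol.x[:len(T)] yields an optimal x (Lemma 4.3);
    # otherwise another pass with an updated w is needed.
    return T, sol

# For the data of Example 4.6 below, with w = (-1, -3), only the x2 column
# is tight (T = {2}), and the restricted primal's optimal value is 1.
```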
Note that the mechanisms for generating a starting dual feasible solution for the dual simplex method can be applied here to initiate the primal-dual method. The following example illustrates the procedure of the primal-dual algorithm.
Example 4.6

Minimize -2x1 - x2
subject to x1 + x2 + x3 = 2
           x1 + x4 = 1
           x1, x2, x3, x4 ≥ 0

Step 1 (starting): The dual of the above problem is

Maximize 2w1 + w2
subject to w1 + w2 ≤ -2
           w1 ≤ -1
           w1 ≤ 0
           w2 ≤ 0
           w1, w2 unrestricted

Let us choose the dual feasible solution

w = [ -1 ]
    [ -3 ]

Notice that only the second constraint is tight; hence T = {2}.

Step 2 (check for optimality): The restricted primal is

Minimize x1^a + x2^a
subject to x2 + x1^a = 2
           x2^a = 1
           x2, x1^a, x2^a ≥ 0

Solving it, we have an optimal solution [x2 x1^a x2^a]^T = [2 0 1]^T with the corresponding dual solution [0 1]^T. Since the optimal cost is 1 (≠ 0), the current solution is not optimal for the original problem.
Step 3 (compute the direction of translation for the dual vector): The dual to the restricted primal is

Maximize 2y1 + y2
subject to y1 ≤ 0
           y1 ≤ 1
           y2 ≤ 1
           y1, y2 unrestricted

Since x2 and x2^a are basic variables of the restricted primal, it follows from the complementary slackness conditions that the first and third constraints of its dual problem are tight. Therefore,

y* = [ 0 ]
     [ 1 ]

is an optimal solution to this problem. We take it as the direction of translation for the dual vector w.
Step 4 (check infeasibility/unboundedness): Now we proceed to compute the values y*^T A_j for j ∈ {1, 3, 4}. It can be easily verified that these values are 1, 0, and 1, respectively. Therefore we continue.
Step 5 (enter the basis of the restricted primal): Compute c_j - w^T A_j for j = 1, 3, 4. The values are 2, 1, and 3, respectively. Therefore,

α = min { 2/1, 3/1 } = 2

and k = 1. This implies that x1 should also enter the basis in addition to x2.
Step 6 (update the dual feasible vector): The new dual vector becomes

w = [ -1 ] + 2 [ 0 ] = [ -1 ]
    [ -3 ]     [ 1 ]   [ -1 ]

So far we have completed just one iteration, and a new restricted primal problem is generated:

Minimize x1^a + x2^a
subject to x1 + x2 + x1^a = 2
           x1 + x2^a = 1
           x1, x2, x1^a, x2^a ≥ 0

Solving it, we have an optimal solution x1 = x2 = 1 and x1^a = x2^a = 0 with a zero objective value. Hence we know [1 1 0 0]^T is an optimal solution to the original problem and [-1 -1]^T is an optimal solution to its dual problem.

4.7 SENSITIVITY ANALYSIS

Given a linear programming problem in its standard form, the problem is completely specified by the constraint matrix A, the right-hand-side vector b, and the cost vector c. We assume that the linear programming problem has an optimal solution x* for a given data set (A, b, c). In many cases, the data set (A, b, c) needs to be changed within a range after x* has been obtained, and we are interested in finding the new optimal solutions accordingly. Conceptually, we could of course solve a set of linear programming problems, each one with a modified data value within the range, but this may become an extremely expensive task in practice. The knowledge of sensitivity analysis, or post-optimality analysis, leads us to understand the implications of changing input data for the optimal solutions.

4.7.1 Change in the Cost Vector

Assume that x* is an optimal solution, with basis B and nonbasis N, of a linear programming problem:

Minimize c^T x
subject to Ax = b, x ≥ 0

Let c' = [c'_B; c'_N] be a perturbation of the cost vector such that the cost vector changes according to the formula

c̄ = c + αc' (4.46)

where α ∈ R.
We are specifically interested in finding an upper bound α_max and a lower bound α_min such that the current optimal solution x* remains optimal for the linear programming problem with a new cost vector in which α_min ≤ α ≤ α_max. The geometric concept behind the

Figure 4.2 (as α increases, the objective direction rotates from -c toward -(c + αc') over the feasible domain Ax = b)

effect of the above perturbation of c on x* is illustrated in Figure 4.2. When the scale of the perturbation is small, x* may remain optimal, but a large-scale perturbation could lead to a different optimal solution.
In order to find the stable range for the current optimal solution x* with basis B, we focus for a moment on the revised simplex method. Notice that since the feasible domain {x ∈ R^n | Ax = b, x ≥ 0} remains the same, x* stays feasible for the linear program with the perturbed cost vector c̄. Moreover, x* stays optimal if the reduced cost vector satisfies the requirement that

r̄_N ≥ 0 (4.47)

In other words, we require

(c_N + αc'_N)^T - (c_B + αc'_B)^T B^{-1} N ≥ 0 (4.48)

We now define

r_N^T = c_N^T - c_B^T B^{-1} N (4.49)

and

r'_N^T = c'_N^T - c'_B^T B^{-1} N (4.50)

Then, as long as α satisfies

r_N^T + α r'_N^T ≥ 0 (4.51)

x* stays optimal for the linear programming problem with the perturbed cost vector c̄. Therefore, denoting N as the index set of nonbasic variables, we can determine that

α_min = max { max { -r_q/r'_q | r'_q > 0, q ∈ N }, -∞ } (4.52)

and

α_max = min { min { -r_q/r'_q | r'_q < 0, q ∈ N }, +∞ } (4.53)
Several observations can be made here:

Observation 1. For α_min ≤ α ≤ α_max, x* remains an optimal solution to the linear program with the perturbed cost vector. Besides, the optimal objective value z̄*(α) becomes a function of α such that

z̄*(α) = (c_B^T + αc'_B^T) B^{-1} b = z* + α(c'_B^T B^{-1} b) (4.54)

which is a linear function of α when α stays in the stable range.

Observation 2. If the perturbation is along one particular cost component, say c_j for 1 ≤ j ≤ n, we can define e_j to be the vector with all zeros except a one at its jth component and set c' = e_j. In this way, Equations (4.52) and (4.53) provide a stable range [c_j + α_min, c_j + α_max] for the jth cost component. This also tells us how sensitive each cost coefficient is.

Observation 3. When α is within the stable range, the current solution x* remains optimal and the optimal objective value is a linear piece in that range. As α goes just beyond either the lower bound or the upper bound, Figure 4.2 indicates that a neighboring vertex will become a new optimal solution with another stable range. This can be repeated again and again, and the optimal objective function z̄*(α) becomes a piecewise linear function, with the pieces delimited by the bounds on α for the various bases. We can further prove that z̄*(α) is actually a concave piecewise linear function, as shown in Figure 4.3.

4.7.2 Change in the Right-hand-side Vector

As in the previous section, let us assume that x* is an optimal solution, with basis B and nonbasis N, to the linear programming problem

Minimize c^T x
subject to Ax = b, x ≥ 0

This time we incur a perturbation b' in the right-hand-side vector and consider the following linear program:

Minimize z(α) = c^T x (4.55a)
subject to Ax = b + αb', x ≥ 0 (4.55b)

for α ∈ R.

Figure 4.3 (the concave piecewise linear function z̄*(α))

Figure 4.4 (perturbing b shifts the feasible domain Ax = b)

Note that because the right-hand-side vector has been changed, x* need not be feasible any more. But we are specifically interested in finding an upper bound α_max and a lower bound α_min such that the current basis B still serves as an optimal basis for the linear programming problem with the new right-hand-side vector in which α_min ≤ α ≤ α_max. The geometric implications of this problem are depicted in Figure 4.4.
In order to declare that B is an optimal basis, we have to check two conditions, namely:

1. The reduced cost vector r^T = c_N^T - c_B^T B^{-1} N is nonnegative.

2. The basic solution provided by B is feasible, i.e.,

x̄_α = [B^{-1}(b + αb'); 0] ≥ 0

The first condition is obviously satisfied, since the cost vector c, the basis B, and the nonbasis N remain the same as before. The second condition is not necessarily true, owing to the change αb', unless B^{-1}(b + αb') ≥ 0.
To find the stable range for α, we let b̄ = B^{-1}b and b̄' = B^{-1}b'. Thus b̄ + αb̄' ≥ 0 is required for the second condition. Consequently, we can define

α_min = max { max { -b̄_p/b̄'_p | b̄'_p > 0, p ∈ B }, -∞ } (4.56)

and

α_max = min { min { -b̄_p/b̄'_p | b̄'_p < 0, p ∈ B }, +∞ } (4.57)

where B is the index set of the basic variables corresponding to B.
It can be clearly seen that within the range α_min ≤ α ≤ α_max, B remains an optimal basis for the perturbed linear program. Moreover, the corresponding optimal solutions

x̄_α = [B^{-1}b + αB^{-1}b'; 0] = [B^{-1}b; 0] + α[B^{-1}b'; 0] = x* + α[B^{-1}b'; 0]

form a linear function of α. In addition, the optimal objective values

z̄*(α) = c^T x̄_α = c^T x* + αc_B^T B^{-1} b' = z* + αc_B^T B^{-1} b'

also become a linear function of α within the range.
If the perturbation is due to a change in the right-hand side of a particular constraint, say b_i for some 1 ≤ i ≤ m, we can define e_i to be the vector with all zeros except a one at its ith component and set b' = e_i. In this way, Equations (4.56) and (4.57) provide a stable range [b_i + α_min, b_i + α_max] for the ith resource constraint, which indicates how sensitive the resource is.
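The following numpy sketch (our own illustration; the function name is ours) computes the stable range of (4.56)-(4.57) and applies it to the optimal basis of Example 3.4 with b' = e_1:

```python
import numpy as np

def rhs_stable_range(B, b, b_prime):
    """Stable range for b + alpha*b': the basis B stays optimal as long
    as B^{-1}(b + alpha*b') >= 0, per (4.56)-(4.57)."""
    b_bar = np.linalg.solve(B, b)         # current basic solution B^{-1} b
    bp_bar = np.linalg.solve(B, b_prime)  # B^{-1} b'
    lo, hi = -np.inf, np.inf
    for bb, bp in zip(b_bar, bp_bar):
        if bp > 0:
            lo = max(lo, -bb / bp)
        elif bp < 0:
            hi = min(hi, -bb / bp)
    return lo, hi

# Example 3.4 at its optimum: basis {1, 2}, perturbing the first resource.
B = np.array([[1., 1.], [2., 1.]])
lo, hi = rhs_stable_range(B, np.array([40., 60.]), np.array([1., 0.]))
print(lo, hi)   # -10.0, 20.0: b1 may move within [30, 60] with the same basis
```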

4.7.3 Change in the Constraint Matrix

So far, we have dealt with changes in the cost vector and the right-hand-side vector. In this section, we proceed to analyze the situation with changes in the constraint matrix. In general, the changes made in the constraint matrix may result in a different optimal basis and different optimal solutions, and it is not a simple task to perform the sensitivity analysis. Here we deal only with four simpler cases, namely adding and removing a variable and adding and removing a constraint. As in the previous sections, we still assume that the original linear programming problem has an optimal solution x* = [B^{-1}b; 0] with an optimal basis B such that the constraint matrix can be partitioned as A = [B | N].

Case 1 (adding a new variable). Suppose that a new decision variable, say Xn+l,
is identified after we obtained the optimal solution x* of the original linear program. Let
us also assume that Cn+ 1 is the cost coefficient associated with Xn+ 1, and An+ 1 is the
associated column in the new constraint matrix. We would like to find an optimal solution
to the new linear programming problem:
Minimize c7 X+ Cn+IXn+l
subject to Ax+ An+IXn+I = b, X 2: 0, Xn+I 2: 0
Note that we can set Xn+I = 0; then

[~]
becomes a basic feasible solution to the new linear program. Hence the simplex algorithm
can be initiated right away. Remembering that $\mathbf{x}^*$ is an optimal solution to the original
problem, the reduced costs $r_j$, for $j = 1, \ldots, n$, must remain nonnegative. Therefore, we
only have to check the additional reduced cost $r_{n+1} = c_{n+1} - \mathbf{c}_B^T\mathbf{B}^{-1}\mathbf{A}_{n+1}$.
If $r_{n+1} \ge 0$, then the current solution $\mathbf{x}^*$ with $x_{n+1} = 0$ is an optimal solution to the
new problem and we do not have to do anything. On the other hand, if $r_{n+1} < 0$, then
$x_{n+1}$ should be included in the basis as a basic variable. Therefore, we can continue the
simplex algorithm to find an optimal solution to the new linear programming problem.
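In computational terms, the Case 1 test is a single backsolve with $\mathbf{B}^T$ followed by one inner product. A hedged sketch (ours; it assumes numpy and dense data):

```python
# Price out a proposed column: r_{n+1} = c_{n+1} - c_B^T B^{-1} A_{n+1}.
import numpy as np

def price_new_column(B, c_B, A_new, c_new):
    w = np.linalg.solve(B.T, c_B)   # simplex multipliers, w^T = c_B^T B^{-1}
    r_new = c_new - w @ A_new       # reduced cost of the candidate variable
    return r_new, r_new >= 0        # (value, "x* stays optimal?")
```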

Case 2 (removing a variable). After solving a linear programming problem, we


find that a decision variable, say $x_k$, is no longer available and hence has to be removed
from consideration. Our objective is to find a new optimal solution with minimum
additional effort.
Note that if $x_k^* = 0$, then the current optimal solution $\mathbf{x}^*$ remains optimal. When
$x_k^* > 0$ ($x_k$ is a basic variable), we have to work out a new solution. In this case, we
first attempt to remove $x_k$ from the basis by solving the following Phase I problem:

Minimize $x_k$
subject to $\mathbf{Ax} = \mathbf{b}, \quad \mathbf{x} \ge \mathbf{0}$
Since the constraints are not altered, we know $\mathbf{x}^*$ can serve as an initial basic
feasible solution to this problem for the revised simplex algorithm. Moreover, if the
simplex method finds that the optimal objective value of the Phase I problem is not zero,
then the new linear programming problem obtained by removing the variable $x_k$ from
the original problem must be infeasible. On the other hand, if the simplex method finds
an optimal solution $\mathbf{x}'$ with zero objective value for the Phase I problem, then we can
take $\mathbf{x}'$ as an initial basic feasible solution to the new linear program without the variable
$x_k$. In a finite number of iterations, either an optimal solution can be found for this new
problem, or unboundedness can be detected.

Case 3 (adding a constraint). This time a new constraint is imposed after solv-
ing a linear programming problem. For simplicity, we assume the additional constraint

has inequality form, namely,

$$\mathbf{a}_{m+1}^T\mathbf{x} \le b_{m+1} \tag{4.58}$$

where $\mathbf{a}_{m+1}^T$ is an n-dimensional row vector to be added to the constraint matrix A.
Hence the new linear program becomes

Minimize $\mathbf{c}^T\mathbf{x}$
subject to $\mathbf{Ax} = \mathbf{b}$
$\mathbf{a}_{m+1}^T\mathbf{x} \le b_{m+1}, \quad \mathbf{x} \ge \mathbf{0}$
To solve this new linear program, first notice that the additional constraint may
cut the original feasible domain down to a smaller one. If $\mathbf{x}^*$ remains feasible, then of course it
remains optimal. But the new feasible domain may exclude $\mathbf{x}^*$, as shown in Figure 4.5. In
this case, we do not even have a basic feasible solution to start the simplex algorithm.
Also notice that B is no longer a basis in the new problem. In fact, if the additional
constraint is not redundant, the dimensionality of any new basis becomes m + 1, instead
of m.

Figure 4.5 (the added constraint may leave $\mathbf{x}^*$ feasible, or it may cut $\mathbf{x}^*$ off)

To solve the new problem with an additional constraint, we add a slack variable
$x_{n+1}$ and consider the following linear programming problem:

Minimize $\mathbf{c}_B^T\mathbf{x}_B + \mathbf{c}_N^T\mathbf{x}_N + 0x_{n+1}$ (4.59a)
subject to $\mathbf{Bx}_B + \mathbf{Nx}_N = \mathbf{b}$ (4.59b)
$(\mathbf{a}_{m+1,B})^T\mathbf{x}_B + (\mathbf{a}_{m+1,N})^T\mathbf{x}_N + x_{n+1} = b_{m+1}$ (4.59c)
$\mathbf{x}_B, \mathbf{x}_N \ge \mathbf{0}, \quad x_{n+1} \ge 0$ (4.59d)

where $\mathbf{a}_{m+1,B}$ and $\mathbf{a}_{m+1,N}$ are the subrows of $\mathbf{a}_{m+1}$ corresponding to $\mathbf{x}_B$ and $\mathbf{x}_N$, respectively.
We now pass the slack variable to the basis B and consider a new basis $\bar{\mathbf{B}}$ defined
by
$$\bar{\mathbf{B}} = \begin{bmatrix}\mathbf{B} & \mathbf{0}\\ \mathbf{a}_{m+1,B}^T & 1\end{bmatrix} \tag{4.60}$$

It is easy to verify that $\bar{\mathbf{B}}$ is nonsingular and its inverse matrix is given by

$$\bar{\mathbf{B}}^{-1} = \begin{bmatrix}\mathbf{B}^{-1} & \mathbf{0}\\ -\mathbf{a}_{m+1,B}^T\mathbf{B}^{-1} & 1\end{bmatrix} \tag{4.61}$$
With the new basis $\bar{\mathbf{B}}$, we can define

$$\bar{\mathbf{x}}_B = \bar{\mathbf{B}}^{-1}\begin{bmatrix}\mathbf{b}\\ b_{m+1}\end{bmatrix} \tag{4.62}$$

Then
$$\bar{\mathbf{x}} = \begin{bmatrix}\bar{\mathbf{x}}_B\\ \mathbf{0}\end{bmatrix}$$
is a basic solution (not necessarily feasible) to the new problem with an additional
constraint. Moreover, we can show the following result.
Lemma 4.4. Let B be an optimal basis to the original linear programming prob-
lem. If $\bar{\mathbf{x}}$, essentially defined by (4.62), is nonnegative, then it is an optimal solution to
the new linear programming problem with the additional constraint.
Proof. Since the basic solution $\bar{\mathbf{x}}$ is nonnegative, it is a basic feasible solution. In
order to declare it an optimal solution, we need to show the reduced cost for each
nonbasic variable is nonnegative, i.e.,

$$c_q - \begin{bmatrix}\mathbf{c}_B\\ 0\end{bmatrix}^T\bar{\mathbf{B}}^{-1}\begin{bmatrix}\mathbf{A}_q\\ a_{m+1,q}\end{bmatrix} \ge 0 \quad \text{for each nonbasic index } q \tag{4.63}$$

Since B is an optimal basis to the original linear program, we have

$$r_q = c_q - \mathbf{c}_B^T\mathbf{B}^{-1}\mathbf{A}_q \ge 0 \tag{4.64}$$

Noting that
$$\begin{bmatrix}\mathbf{c}_B\\ 0\end{bmatrix}^T\bar{\mathbf{B}}^{-1} = \begin{bmatrix}\mathbf{c}_B^T\mathbf{B}^{-1} \mid 0\end{bmatrix}$$
we see that condition (4.63) is true and the proof is completed.
On the other hand, if $\bar{\mathbf{x}}_B$ is not nonnegative, then the primal feasibility condition is
violated by at least one primal variable. In this case, we can restore primal feasibility
by employing the dual simplex algorithm, starting with the dual basic feasible
solution
$$\begin{bmatrix}\mathbf{w}\\ 0\end{bmatrix}$$
where $\mathbf{w}^T = \mathbf{c}_B^T\mathbf{B}^{-1}$. The following example illustrates this situation.

Example 4.7
Consider the problem,

Minimize $-2x_1 - x_2$

subject to $x_1 + x_2 + x_3 = 2$

$x_1, x_2, x_3 \ge 0$

It is easy to verify that $x_1^* = 2$, $x_2^* = x_3^* = 0$ is the optimal solution to this problem
with an optimal basis $\mathbf{B} = [1]$. Moreover, we have the index set of basic variables $\mathbf{B} = \{1\}$,
the nonbasic index set $\mathbf{N} = \{2, 3\}$, and $\mathbf{c}_B = [-2]$.
One more constraint is added to form a new linear program

Minimize $-2x_1 - x_2$

subject to $x_1 + x_2 + x_3 = 2$

$x_1 + x_4 = 1$

$x_1, x_2, x_3, x_4 \ge 0$

It is clear that $\mathbf{x}^*$ becomes infeasible for the new problem.


We now form a new basis
$$\bar{\mathbf{B}} = \begin{bmatrix}1 & 0\\ 1 & 1\end{bmatrix}$$
with $\mathbf{c}_{\bar{B}} = \begin{bmatrix}-2\\ 0\end{bmatrix}$. The dual solution is defined by $\mathbf{w}^T = \mathbf{c}_{\bar{B}}^T\bar{\mathbf{B}}^{-1}$. For the reduced costs $r_j$
($j = 2, 3$), we have $r_2 = 1$ and $r_3 = 2$, which implies that $\mathbf{w}$ is dual feasible. However, since
$$\bar{\mathbf{x}}_B = \begin{bmatrix}\bar{x}_1\\ \bar{x}_4\end{bmatrix} = \bar{\mathbf{B}}^{-1}\mathbf{b} = \begin{bmatrix}2\\ -1\end{bmatrix}$$
we know the corresponding primal solution is infeasible. Therefore we can restore primal feasibility by the dual simplex method. The rest follows Example 4.5 exactly to the end.
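The numbers in Example 4.7 are easy to verify numerically. A small sketch (ours) reproduces the dual feasibility and the primal infeasibility:

```python
# Check Example 4.7: B-bar from (4.60), dual solution, reduced costs.
import numpy as np

B_bar = np.array([[1.0, 0.0],          # B = [1], a_{m+1,B} = [1]
                  [1.0, 1.0]])
c_Bbar = np.array([-2.0, 0.0])         # costs of x1 and the slack x4
b = np.array([2.0, 1.0])

x_B = np.linalg.solve(B_bar, b)        # [2, -1]: primal infeasible
w = np.linalg.solve(B_bar.T, c_Bbar)   # w^T = c_B^T B-bar^{-1} = (-2, 0)
A2 = np.array([1.0, 0.0])              # column of x2 in the new problem
A3 = np.array([1.0, 0.0])              # column of x3 in the new problem
r2 = -1.0 - w @ A2                     # -> 1, dual feasible
r3 =  0.0 - w @ A3                     # -> 2, dual feasible
print(x_B, r2, r3)
```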

Case 4 (removing a constraint). This case is more complicated than the ones
we have considered so far. However, if the constraint, say $\mathbf{a}_k^T\mathbf{x} \le b_k$, that we wish
to remove is nonbinding, i.e., $\mathbf{a}_k^T\mathbf{x}^* < b_k$, then it can be removed without affecting the
optimality of the current optimal solution. To check whether the kth constraint is binding,
we simply look at the dual variable $w_k$: if $w_k = 0$, then the complementary slackness
condition allows the constraint to be nonbinding.
On the other hand, if we want to remove a binding constraint, the task becomes
difficult. We may have to solve the new linear programming problem from the beginning.

4.8 CONCLUDING REMARKS

In this chapter, we have introduced the fundamental concept of duality theory in linear
programming. Two variants of the simplex algorithm, namely the dual simplex algorithm

and the primal-dual algorithm, have been derived based on this very concept. We also
studied post-optimality analysis, which could assess the sensitivity of an optimal solution
or optimal basis with respect to various changes made in the input data of a linear
programming problem.

REFERENCES FOR FURTHER READING

4.1. Balinsky, M. L., and Gomory, R. E., "A mutual primal-dual simplex method," in Recent
Advances in Mathematical Programming, ed. R. L. Graves and P. Wolfe, McGraw Hill, New
York (1963).
4.2. Balinsky, M. L., and Tucker, A. W., "Duality theory of linear programs: A constructive
approach with applications," SIAM Review 3, 499-581 (1969).
4.3. Barnes, J. W., and Crisp, R. M., "Linear programming: a survey of general purpose algo-
rithms," AIIE Transactions 7, No. 3, 49-63 (1975).
4.4. Chvátal, V., Linear Programming, Freeman, San Francisco (1983).
4.5. Dantzig, G. B., Linear Programming and Extensions, Princeton University Press, Princeton,
NJ (1963).
4.6. Farkas, J., "Theorie der einfachen Ungleichungen," Journal für die reine und angewandte
Mathematik 124, 1-27 (1902).
4.7. Gill, P. E., Murray, W., and Wright, M. H., Numerical Linear Algebra and Optimization,
Vol. 1, Addison-Wesley, Redwood City, CA (1991).
4.8. Goldfarb, D., and Todd, M. J., "Linear Programming," in Optimization, Handbook in Opera-
tions Research and Management Science, ed. Nemhauser, G. L. and Rinnooy Kan, A. H. G.,
Vol. 1, 73-170, Elsevier-North Holland, Amsterdam (1989).
4.9. Lemke, C. E., "The dual method for solving the linear programming problem," Naval Re-
search Logistics Quarterly 1, No. 1 (1954).
4.10. Luenberger, D. G., Introduction to Linear and Nonlinear Programming, 2d ed., Addison-
Wesley, Reading, MA (1973).
4.11. Peterson, E. L., An Introduction to Linear Optimization, Lecture notes, North Carolina State
University, Raleigh, NC (1990).

EXERCISES

4.1. Prove that the symmetric pair in Example 4.2 are indeed a pair of primal and dual problems
by converting the primal problem into its standard form first.
4.2. Find the linear dual program of the following problems:
(a) Minimize $9x_1 + 6x_2$
subject to $3x_1 + 8x_2 \ge 4$
$5x_1 + 2x_2 \ge 7$
$x_1, x_2 \ge 0$

(b) Maximize $4x_1 + 7x_2$

subject to $3x_1 + 5x_2 \le 9$
$8x_1 + 2x_2 \le 6$
$x_1, x_2 \ge 0$

Combining the results of (a) and (b), what's your conclusion?


4.3. Show that (a-1) and (a-2), and (b-1) and (b-2), are primal-dual pairs:
(a-1) Minimize $\mathbf{c}^T\mathbf{x}$ subject to $\mathbf{Ax} = \mathbf{b}$
(a-2) Maximize $\mathbf{b}^T\mathbf{w}$ subject to $\mathbf{A}^T\mathbf{w} = \mathbf{c}$
(b-1) Minimize $\mathbf{c}^T\mathbf{x}$ subject to $\mathbf{Ax} \le \mathbf{b}$, $\mathbf{x} \le \mathbf{0}$
(b-2) Maximize $\mathbf{b}^T\mathbf{w}$ subject to $\mathbf{A}^T\mathbf{w} \ge \mathbf{c}$, $\mathbf{w} \le \mathbf{0}$

4.4. Find the dual linear program of the following problem:

Minimize $9x_1 + 6x_2 - 4x_3 + 100$

subject to $3x_1 + 8x_2 - 5x_3 \ge 14$

$x_1 \le 0$, $x_2 \ge 0$, $x_3$ unrestricted

4.5. Find the dual problems of the following linear programming problems:
(a) Minimize $\mathbf{c}^T\mathbf{x}$ subject to $\mathbf{Ax} \ge \mathbf{b}$, $\mathbf{x} \ge \mathbf{0}$
(b) Maximize $\mathbf{b}^T\mathbf{w}$ subject to $\mathbf{A}^T\mathbf{w} \le \mathbf{c}$, $\mathbf{w} \ge \mathbf{0}$
(c) Minimize $\mathbf{c}^T\mathbf{x}$ subject to $\mathbf{Ax} = \mathbf{b}$, $\mathbf{l} \le \mathbf{x} \le \mathbf{u}$

($\mathbf{l}$ and $\mathbf{u}$ are vectors of lower bounds and upper bounds.)

(d) Minimize $\mathbf{c}^T\mathbf{x}$ subject to $\mathbf{Ax} = \mathbf{0}$, $\sum_{i=1}^n x_i = 1$, $\mathbf{x} \ge \mathbf{0}$

(This is the famous Karmarkar standard form, which will be studied in Chapter 6.)

(e) Minimize $\sum_{i=1}^N x_i$
subject to $x_i - \alpha\sum_{j=1}^N P_{ij}^k x_j \ge R_i^k$, $\forall i = 1, 2, \ldots, N$, $\forall k = 1, 2, \ldots, K$
$x_i$ unrestricted

where $\mathbf{P}^k$ is an $N \times N$ (probability) matrix for $k = 1, 2, \ldots, K$, $\alpha \in (0, 1)$, and $R_i^k \in R^+$, $\forall i, k$.
(This is the policy iteration problem in dynamic programming.)
4.6. Construct an example to show that both the primal and dual linear problems have no
feasible solutions. This indicates that the infeasibility of one problem does not imply the
unboundedness of the other one in a primal-dual pair.
4.7. For an infeasible linear program, show that if its dual linear program is feasible, then the
dual must be unbounded.
4.8. For a linear programming problem (P) in its standard form, assume that A is an m x n
matrix with full row rank. Answer the following questions with reasons.
(a) For each basis B, let $\mathbf{w}^T(\mathbf{B}) = \mathbf{c}_B^T\mathbf{B}^{-1}$ be the vector of simplex multipliers. Is $\mathbf{w}(\mathbf{B})$
always a feasible solution to its dual problem?
(b) Can every dual feasible solution be represented as $\mathbf{w}^T(\mathbf{B}) = \mathbf{c}_B^T\mathbf{B}^{-1}$ for some basis B?
(c) Since A has full row rank, can we guarantee that (P) is nondegenerate?
(d) If (P) is nondegenerate, can we guarantee that its dual is also nondegenerate?
(e) Is it possible that both (P) and its dual are degenerate?
(f) Is it possible that (P) has a unique optimal solution with finite objective value but its
dual problem is infeasible?
(g) Is it possible that both (P) and its dual are unbounded?
(h) When (P) and its dual are both feasible, show that the duality gap vanishes.
4.9. Consider a two-person zero-sum game with the following payoff matrix to the row player:

    Strategies    1     2     3
        1         2    -1     0
        2        -3     1     1

(This means the row player has two strategies and the column player has three
strategies. If the row player chooses his/her second strategy and the column player chooses
his/her third strategy, then the column player has to pay the row player $1.)
Let $x_1$, $x_2$, and $x_3$ be the probabilities with which the column player selects his/her
first, second, and third strategies over many plays of the game. Keep in mind that the
column player wishes to minimize the maximal expected payoff to the row player.
(a) What linear program will help the column player to determine his probability distribution
of selecting different strategies?
(b) Find the dual problem of the above linear program.

(c) Interpret the dual linear program.


(d) Solve the dual linear program graphically.
(e) Use the dual optimal solution to compute the column player's probabilities.
(f) Write down and interpret the complementary slackness conditions for the two-person
zero-sum game.
4.10. Here is a description of the transportation problem:
A company needs to ship a product from m locations to n destinations. Suppose
that $a_i$ units of the product are available at the ith origin ($i = 1, 2, \ldots, m$), and $b_j$ units are
required at the jth destination ($j = 1, 2, \ldots, n$). Assume that the total amount of available
units at all origins equals the total amount required at all destinations. The cost of shipping
one unit of product from origin i to destination j is $c_{ij}$, and you are asked to minimize the
transportation cost.
(a) Formulate the problem as a linear programming problem.
(b) Write its dual linear program.
(c) Write down its complementary slackness conditions.
(d) Given that $m = 3$, $n = 4$, $a_1 = 3$, $a_2 = 3$, $a_3 = 4$; $b_1 = 2$, $b_2 = 3$, $b_3 = 2$, and $b_4 = 3$,
with the cost matrix

                 Destination
              1     2     3     4
         1    7     2    -2     8
Origin   2   19     5    -2    12
         3    5     8    -9     3

and assuming that $\mathbf{w} = (0, 3, -4, 7, 2, -5, 7)^T$ is an optimal dual solution, find an optimal
solution to the original (primal) problem.
4.11. Closely related to Farkas' theorem of alternatives is Farkas' transposition theorem: "There
is a solution $\mathbf{x}$ to the linear system $\mathbf{Ax} = \mathbf{b}$ and $\mathbf{x} \ge \mathbf{0}$ if, and only if, $\mathbf{b}^T\mathbf{w} \ge 0$ whenever
$\mathbf{A}^T\mathbf{w} \ge \mathbf{0}$." Prove Farkas' transposition theorem.
4.12. Show that there is a solution $\mathbf{x}$ to the linear system $\mathbf{Ax} \le \mathbf{b}$ if, and only if, $\mathbf{b}^T\mathbf{w} \ge 0$ whenever
$\mathbf{A}^T\mathbf{w} = \mathbf{0}$ and $\mathbf{w} \ge \mathbf{0}$. This result is called Gale's transposition theorem.
4.13. Show that there is a solution $\mathbf{x}$ to the linear system $\mathbf{Ax} \le \mathbf{b}$ and $\mathbf{x} \ge \mathbf{0}$ if, and only if,
$\mathbf{b}^T\mathbf{w} \ge 0$ whenever $\mathbf{A}^T\mathbf{w} \ge \mathbf{0}$ and $\mathbf{w} \ge \mathbf{0}$.
4.14. Prove Gordan's transposition theorem: There is a solution $\mathbf{x}$ to the strict homogeneous linear
system $\mathbf{Ax} < \mathbf{0}$ if, and only if, $\mathbf{w} = \mathbf{0}$ is the only solution of $\mathbf{A}^T\mathbf{w} = \mathbf{0}$, $\mathbf{w} \ge \mathbf{0}$.
4.15. Use Farkas' lemma to construct a proof of the strong duality theorem of linear programming.
4.16. Why is x* an optimal solution to the linear programming problem with new demands in
Section 4.4.1 ?
4.17. Show that, in applying the primal-dual method, if we end up with a restricted primal problem
with positive optimal objective value and $\mathbf{y}^{*T}\mathbf{A}_j \le 0$, $\forall j \notin T$, then the original primal
problem is infeasible and its dual is unbounded.
4.18. Consider the following linear program:

Minimize $2x_1 + x_2 - x_3$

subject to $x_1 + 2x_2 + x_3 \le 8$

$-x_1 + x_2 - 2x_3 \le 4$

$x_1, x_2, x_3 \ge 0$
First, use the revised simplex method to find the optimal solution and its optimal dual
variables. Then use sensitivity analysis to answer the following questions.
(a) Find a new optimal solution if the cost coefficient of x2 is changed from 1 to 6.
(b) Find a new optimal solution if the coefficient of $x_2$ in the first constraint is changed
from 2 to 0.25.
(c) Find a new optimal solution if we add one more constraint x2 + x3 = 3.
(d) If you were to choose between increasing the right-hand side of the first and the second
constraints, which one would you choose? Why? What is the effect of this increase on
the optimal value of the objective function?
(e) Suppose that a new activity $x_6$ is proposed with a unit cost of 4 and a consumption
vector $\mathbf{A}_6 = (1, 2)^T$. Find a corresponding optimal solution.
5

Complexity Analysis
and the Ellipsoid Method

The simplex approach described in previous chapters has been an extremely efficient
computational tool ever since it was introduced by G. B. Dantzig in 1947. For certain
problems, however, at least in theory, the method was shown to be very inefficient. This
leads to the study of the computational complexity of linear programming. The worst-
case analysis shows that the simplex method and its variants may take an exponential
number (depending on the problem size) of pivots to reach an optimal solution and the
method may become impractical in solving very large scale general linear programming
problems. Therefore, research work has been directed to finding an algorithm for linear
programming with polynomial complexity.
The first such algorithm was proposed by L. G. Khachian in 1979, based on the
method of central sections and the method of generalized gradient descent with space
dilation, which were developed for nonlinear optimization by several other Soviet math-
ematicians. In theory, Khachian's ellipsoid method has a better time bound than the
simplex method, but it seems to be of little practical value at least at the present time.
The practical performance of the variants of the simplex method is far better than that
of the ellipsoid method.
In this chapter we start with the concept of computational complexity, discuss the
performance of the simplex method in the context of complexity analysis, then introduce
the basic ideas of the ellipsoid method, and conclude with the performance of Khachian's
algorithm.


5.1 CONCEPTS OF COMPUTATIONAL COMPLEXITY

The concept of complexity analysis was introduced in the 1970s to evaluate the performance
of an algorithm. The worst-case analysis measures the degree of difficulty of
problem solving under the worst scenario. The computational complexity provides us
with an index for assessing the growth in the computational effort of an algorithm as a function
of the size of the problem in the worst-case analysis. The complexity of an algorithm
is usually measured in this context by the number of elementary operations, such as
additions, multiplications, and comparisons, which depends on the algorithm and the
total size of the input data in binary representation.
For a general iterative scheme, as discussed in Chapter 3, its complexity is deter-
mined by the product of the total number of iterations and the number of operations at
each iteration. The total number of iterations certainly depends on the accuracy level
required, while the number of elementary operations depends upon the binary represen-
tation of the input size. Consider a linear programming problem

Minimize $\mathbf{c}^T\mathbf{x}$ (5.1a)
subject to $\mathbf{Ax} = \mathbf{b}, \quad \mathbf{x} \ge \mathbf{0}$ (5.1b)

where A is an m × n matrix with $m, n \ge 2$, $\mathbf{b} \in R^m$, $\mathbf{c}, \mathbf{x} \in R^n$, and the input data are
all integers (possibly converted from rational data to this form). By specifying the
values of m, n, A, b, c, we define an instance of the linear program. If we further define
the input length of an instance to be the number of binary bits needed to record all the
data of the problem and denote it by L, then the size of an instance of the problem can
be represented by the triplet (m, n, L). Consequently, the complexity of an algorithm for
linear programming becomes a function of the triplet, namely $f(m, n, L)$. Suppose that
there exists a constant $r > 0$ such that the total number of elementary operations
required by the algorithm on any instance of the problem is no more than $r f(m, n, L)$;
then we say the algorithm is of order of complexity $O(f(m, n, L))$. When the complexity
function $f(m, n, L)$ is a polynomial function of m, n, and L, the algorithm is said to be
polynomially bounded or to be of polynomial complexity. Otherwise, the algorithm is a
nonpolynomial-time algorithm.
Notice that in the binary system, it takes $(r+1)$ bits to represent a positive integer
$\xi \in [2^r, 2^{r+1})$ for a nonnegative integer r. Therefore, for a positive integer $\xi$, we require
$\lceil\log(1+\xi)\rceil$ binary bits to represent it, where $\lceil\cdot\rceil$ denotes the round-up integer value.
Adding one more bit for the sign, a total of $1 + \lceil\log(1+|\xi|)\rceil$ binary bits are needed for
encoding an arbitrary integer $\xi$. For linear program (5.1), the input length is given by

$$L = \lceil 1 + \log(1+m)\rceil + \lceil 1 + \log(1+n)\rceil + \sum_{j=1}^n\left\{1 + \lceil 1 + \log(1+|c_j|)\rceil\right\} + \sum_{i=1}^m\sum_{j=1}^n\left\{1 + \lceil 1 + \log(1+|a_{ij}|)\rceil\right\} + \sum_{i=1}^m\left\{1 + \lceil 1 + \log(1+|b_i|)\rceil\right\} \tag{5.2}$$

In our complexity analysis, since only an upper bound on the computational effort is
required, we do not need an exact L in defining the size of an instance of a problem. A
common estimate is given by
$$L = \left\lceil 1 + \log m + \log n + \sum_{j=1}^n\{1 + \log(1+|c_j|)\} + \sum_{i=1}^m\sum_{j=1}^n\{1 + \log(1+|a_{ij}|)\} + \sum_{i=1}^m\{1 + \log(1+|b_i|)\}\right\rceil \tag{5.3}$$

or
$$L = \sum_{j=1}^n\lceil 1 + \log(1+|c_j|)\rceil + \sum_{i=1}^m\sum_{j=1}^n\lceil 1 + \log(1+|a_{ij}|)\rceil + \sum_{i=1}^m\lceil 1 + \log(1+|b_i|)\rceil \tag{5.4}$$
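As a concrete illustration of (5.4) (a sketch of ours; logarithms are base 2, matching the binary encoding):

```python
# Estimate the input length L of an integer-data instance via (5.4).
import math
import numpy as np

def input_length(A, b, c):
    bits = lambda v: math.ceil(1 + math.log2(1 + abs(int(v))))
    return (sum(bits(v) for v in c)
            + sum(bits(v) for v in np.ravel(A))
            + sum(bits(v) for v in b))
```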

We now proceed to show that the simplex method is not of polynomial complexity,
although a vast amount of practical experience has confirmed that in most cases the
number of iterations is a linear function of m and a sublinear function of n.

5.2 COMPLEXITY OF THE SIMPLEX METHOD

The computational complexity of the simplex method depends upon the total number of
iterations and the number of elementary operations required at each iteration. Different
implementation details result in different complexity. Variants of the simplex method
were designed to achieve better computational performance. Following the computational
procedure in Chapter 3, it is not difficult to estimate that the revised simplex method
requires about $m(n-m) + (m+1)^2$ multiplications and $m(n+1)$ additions at each
iteration. As for Dantzig's original simplex method, it requires about $m(n-m) + n + 1$
multiplications and $m(n-m+1)$ additions at each iteration. The key point is that both
of them are of order $O(mn)$.
How many iterations are required? Each iteration of the simplex method and its
variants hops from one extreme point to a neighboring extreme point. For a linear
programming problem in its standard form, the feasible domain contains up to C(n, m)
extreme points that an algorithm could possibly visit. Since

$$C(n, m) = \frac{n!}{m!(n-m)!} \ge \left(\frac{n}{m}\right)^m \ge 2^m \quad \text{whenever } n \ge 2m,$$
it is quite plausible to require an exponential order of iterations. This fear of exponential
effort was confirmed by some worst-case examples specially designed for the simplex
method and its variants.

The first such example was given by V. Klee and G. L. Minty in 1971 to show that
Dantzig's simplex method must traverse all $2^n - 1$ extreme points to reach the optimal
solution.
Example 5.1 (Klee-Minty's example)
For $0 < \varepsilon < 1/2$,
Maximize $x_n$

subject to $0 \le x_1 \le 1$ (5.5a)

$\varepsilon x_{i-1} \le x_i \le 1 - \varepsilon x_{i-1}$, $\quad i = 2, 3, \ldots, n$ (5.5b)

$x_i \ge 0$, $\quad i = 1, 2, \ldots, n$ (5.5c)

Obviously the origin is a basic feasible solution. If we start with the origin and apply
the largest-reduction rule to the entering nonbasic variables, the simplex method takes $2^n - 1$
iterations to visit every extreme point of the feasible domain. For n = 2 and n = 3, Figures
5.1 and 5.2 illustrate the situation. A mathematical proof based on a linear transformation
of the example is included in Exercise 5.3.

Figure 5.1 (n = 2: the simplex path in the $(x_1, x_2)$-plane from $\mathbf{x}^0 = (0, 0)$ through $\mathbf{x}^1 = (1, \varepsilon)$)
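To experiment with this example, the constraints (5.5a)-(5.5b) can be generated programmatically. The sketch below (ours; the helper name is hypothetical) produces them in the form $\mathbf{Ax} \le \mathbf{b}$ expected by most LP solvers, with $\mathbf{x} \ge \mathbf{0}$ handled separately:

```python
# Build the Klee-Minty constraints (5.5) as A x <= b; maximize x_n.
import numpy as np

def klee_minty(n, eps=0.25):
    rows, rhs = [], []
    for i in range(n):
        lo, hi = np.zeros(n), np.zeros(n)
        hi[i] = 1.0
        if i > 0:
            lo[i - 1], lo[i] = eps, -1.0
            hi[i - 1] = eps
            rows.append(lo); rhs.append(0.0)   # eps*x_{i-1} - x_i <= 0
        rows.append(hi); rhs.append(1.0)        # x_i + eps*x_{i-1} <= 1
    return np.array(rows), np.array(rhs)
```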

Variants of the simplex method may change the entering or leaving rules (pivot-
ing scheme) to avoid traversing every extreme point. But different bad examples were
reported for different variants. This leads us to believe that the simplex method and its
variants are of exponential complexity.
However, the bad examples rarely happen in real-world problems. It has been
observed in the past forty years that real-life problems of moderate size require the
simplex method to take 4m to 6m iterations to complete two phases. It is conjectured that,
for n large relative to m, the number of iterations is expected to be $\alpha \times m$, where
$\exp(\alpha) < \log_2(2 + n/m)$. Similar results were confirmed by Monte Carlo experiments

Figure 5.2 (n = 3: the simplex path visits all eight extreme points of the distorted cube)

with artificial probability distributions. Hence the expected computational effort of the
simplex method is of $O(m^2n)$.
When sparsity issues are addressed, a regression equation of the form $Km^{\alpha}nd^{0.33}$
usually provides a better fit for the complexity of the simplex method, where K is a
constant, $1.25 < \alpha < 2.5$, and d is the number of nonzero elements in matrix A divided
by nm. This explains why the simplex method is efficient in practice, although it is of
exponential complexity in theory.

5.3 BASIC IDEAS OF THE ELLIPSOID METHOD

After the simplex method was realized to be of exponential complexity, a major theo-
retical question arose: "Is there a polynomial-time algorithm for linear programming?"
An affirmative answer was finally provided by L. G. Khachian in 1979. He showed how
one could adapt the ellipsoid method for convex programming (of which linear program-
ming is a special case) developed by N. Z. Shor, D. B. Yudin, and A. S. Nemirovskii to
give a linear programming algorithm of polynomial complexity. More precisely, Yudin
and Nemirovskii showed that the ellipsoid method related to Shor's work approximates
the exact solution within any given tolerance E > 0 in a number of iterations which is
polynomial in both the size of input data and log (1/E). Khachian further proved that
when the method is applied to linear programming problems with integer coefficients,
even an exact solution can be obtained in polynomial time. In this section, we introduce
the basic ideas of the ellipsoid method for linear programming.
Consider a system of n variables in m (strict) linear inequalities, i.e.,

$$\sum_{j=1}^n a_{ij}x_j < b_i, \quad i = 1, 2, \ldots, m \tag{5.6}$$

or

$$\mathbf{Ax} < \mathbf{b} \tag{5.6a}$$

with A being an m × n matrix, $\mathbf{x} \in R^n$, and $\mathbf{b} \in R^m$. Our objective is to find a solution
of (5.6) if it exists. The ellipsoid method starts with a spheroid whose radius is large
enough to include a solution of the system of inequalities if one exists. Denoting the set
of solutions in the initial spheroid by P, the algorithm proceeds by constructing a series
of ellipsoids, $E_k$ at the kth iteration, such that $P \subseteq E_k$. The ellipsoids are constructed in
a way that their volumes shrink geometrically. Since the volume of P can be proven to
be positive when $P \ne \emptyset$, one can show that after a polynomial number of iterations the
algorithm either finds that the center point of the current ellipsoid is a solution or concludes
that no solution exists for (5.6).
We now describe the method in geometric terms. Given a nonnegative number r
and a point z E Rn, a spheroid (sphere) centered at z with radius r in the n-dimensional
Euclidean space is defined by
$$S(\mathbf{z}, r) = \left\{\mathbf{x} \in R^n \,\middle|\, \sum_{i=1}^n(x_i - z_i)^2 \le r^2\right\} = \left\{\mathbf{x} \in R^n \mid (\mathbf{x}-\mathbf{z})^T(\mathbf{x}-\mathbf{z}) \le r^2\right\} \tag{5.7}$$

The volume of $S(\mathbf{z}, r)$ is denoted by vol$(S(\mathbf{z}, r))$. Given an n × n nonsingular matrix A
and a point $\mathbf{c} \in R^n$, an affine transformation $T(\mathbf{A}, \mathbf{c})$ maps every point $\mathbf{x} \in R^n$ to a new
point $\mathbf{A}(\mathbf{x}-\mathbf{c}) \in R^n$. An ellipsoid is the image of the unit sphere $S(\mathbf{0}, 1)$ under some
affine transformation. Therefore an ellipsoid can be represented by
$$E = \{\mathbf{x} \in R^n \mid (\mathbf{x}-\mathbf{c})^T\mathbf{A}^T\mathbf{A}(\mathbf{x}-\mathbf{c}) \le 1\} \tag{5.8}$$

The point $\mathbf{c}$ is defined to be the center of E, and the volume of E is then given by
$$\text{vol}(E) = \det(\mathbf{A}^{-1}) \times \text{vol}(S(\mathbf{0}, 1)) \tag{5.9}$$

where $\det(\mathbf{A}^{-1})$ is the determinant of the inverse matrix of A. By a half-ellipsoid $\frac{1}{2}E$,
we mean the intersection of E with a halfspace whose bounding hyperplane, $H = \{\mathbf{x} \in R^n \mid \mathbf{a}^T\mathbf{x} = \beta\}$ for some vector $\mathbf{a} \in R^n$ and scalar $\beta$, passes through the center of E. In
other words, we may define

$$\tfrac{1}{2}E = \{\mathbf{x} \in E \mid \mathbf{a}^T\mathbf{x} \le \beta\} \tag{5.10}$$

Example 5.2
In Figure 5.3, $E = S(\mathbf{0}, 1)$ is the 2-dimensional unit sphere, and the shaded area is $\frac{1}{2}E$, given
by the intersection of E with the halfspace $\{(x_1, x_2) \in R^2 \mid x_1 \ge 0\}$. Passing through the points
(1, 0), (0, 1), and (0, -1), a new ellipsoid

$$\bar{E} = \{\mathbf{x} \in R^2 \mid (9/4)(x_1 - 1/3)^2 + (3/4)x_2^2 \le 1\}$$

is constructed to include $\frac{1}{2}E$ with a minimum volume $\text{vol}(\bar{E}) = [4\sqrt{3}/9] \times \text{vol}(S(\mathbf{0}, 1))$.
The center of $\bar{E}$ is at (1/3, 0) and the defining matrix is

$$\mathbf{A} = \begin{bmatrix}3/2 & 0\\ 0 & \sqrt{3}/2\end{bmatrix} \quad \text{with } \det(\mathbf{A}) = 3\sqrt{3}/4$$

Figure 5.3

We can further extend the result in Example 5.2 to the n-dimensional case. For
$E = S(\mathbf{0}, 1) \subseteq R^n$ with the half-ellipsoid $\frac{1}{2}E = \{\mathbf{x} \in E \mid x_1 \ge 0\}$, we can construct a new
ellipsoid


$$\bar{E} = \left\{\mathbf{x} \in R^n \,\middle|\, \left(\frac{n+1}{n}\right)^2\left(x_1 - \frac{1}{n+1}\right)^2 + \frac{n^2-1}{n^2}\sum_{i=2}^n x_i^2 \le 1\right\} \tag{5.11}$$

whose center is at
$$\left(\frac{1}{n+1}, 0, \ldots, 0\right)$$
and
$$\text{vol}(\bar{E}) = \left(\frac{n}{n+1}\right)\left(\frac{n^2}{n^2-1}\right)^{(n-1)/2} \times \text{vol}(E)$$

The associated affine matrix A is an n-dimensional diagonal matrix with $(n+1)/n$
as its first diagonal element and $[(n^2-1)/n^2]^{1/2}$ as the remaining diagonal elements.
The picture of $\bar{E}$ is shown in Figure 5.4.
The picture of E is shown in Figure 5.4.
In Figure 5.4, we see the ellipsoid $\bar{E}$ is determined by three parameters $\tau$, $\sigma$, and
$\delta$, where
$$\tau = 1/(n+1) \tag{5.12a}$$
$$\sigma = 2/(n+1) \tag{5.12b}$$
$$\delta = n^2/(n^2-1) \tag{5.12c}$$
Compared to E, $\bar{E}$ moves its center from the origin to $(\tau, 0, \ldots, 0)$, shrinks
in the $x_1$ direction by the factor $\sqrt{\delta(1-\sigma)} = n/(n+1)$, and expands in all orthogonal
directions by the factor $\sqrt{\delta} = n/\sqrt{n^2-1}$. Hence we call $\tau$, $\sigma$, and $\delta$ the step, dilation,
and expansion parameters.
There are two interesting consequences of the facts mentioned above. First, note
that affine transformations preserve ratios of volumes, and every ellipsoid can be mapped

Figure 5.4

to the unit sphere by an appropriate affine transformation. In Exercise 5.6, we can further
prove that

$$\left(\frac{n}{n+1}\right)\left(\frac{n^2}{n^2-1}\right)^{(n-1)/2} \le e^{-1/2(n+1)} \quad \text{for any integer } n > 1$$

Hence we have the following result:

Lemma 5.1. Every half-ellipsoid $\frac{1}{2}E$ is contained in an ellipsoid $\bar{E}$ whose volume
is less than $e^{-1/2(n+1)}$ times the volume of E.

Second, for a convex polyhedral set P contained in an ellipsoid E, if the center of
E lay outside P, then P would be contained in some half-ellipsoid $\frac{1}{2}E$ and, consequently,
in a smaller ellipsoid $\bar{E}$. Hence we have the following lemma:

Lemma 5.2. The smallest ellipsoid E containing a convex polyhedral set P has
its center in P.

Lemma 5.2 actually suggests an iterative scheme to solve the system of inequalities
(5.6). Here is the basic idea: if part of the solution set of (5.6) forms a convex polyhedron
P and it is contained in an ellipsoid Ek at the kth iteration, then we could check the center
of $E_k$. If the center of $E_k$ belongs to P, we have found a solution to (5.6). Otherwise, we can
replace $E_k$ by the smaller ellipsoid $E_{k+1} = \bar{E}_k$ and repeat this process. Since Lemma 5.1
indicates that the volume of $E_k$ shrinks at least by the exponential factor $e^{-1/2(n+1)}$ after
each iteration, this iterative scheme requires only a polynomial number of iterations to
reach a conclusion, if we know where to start and when to terminate.

To start up the iterative scheme, consider the following known result (to be proven
in Exercise 5.7):

Lemma 5.3. If the system of inequalities (5.6) has any solution, then it has a
solution $\mathbf{x} \in R^n$ such that
$$|x_j| \le 2^L, \quad j = 1, 2, \ldots, n \tag{5.13}$$
where L is the input size given by (5.3) with $c_j = 0$ for all j.

Hence we can define $E_0 = S(\mathbf{0}, 2^{2L})$, which is an n-dimensional sphere with
radius equal to $2^{2L}$. In this case, the convex polyhedron P defined by (5.6) and (5.13) is
contained in $E_0$, so we can proceed with the iterative scheme.
To terminate the iterative scheme, we should know the following result (to be
proven in Exercise 5.8):

Lemma 5.4. If the system of inequalities (5.6) has a solution, then the volume
of its solution set inside the cube $\{\mathbf{x} \in R^n \mid |x_i| \le 2^L,\ i = 1, \ldots, n\}$ is at least $2^{-(n+1)L}$.

Hence we can terminate the iterative scheme when $\text{vol}(E_k) < 2^{-(n+1)L}$. In this
case, (5.6) has no solution.
Summarizing Lemmas 5.1-5.4, we can outline the basic geometry of the ellipsoid
method for solving a system of strict linear inequalities (5.6) as follows:

Step 1 (initialization): Let $E_0 = S(\mathbf{0}, 2^{2L})$ and $k = 0$.

Step 2 (checking for solution): If the center $\mathbf{x}^k$ of $E_k$ satisfies (5.6), then stop.
Otherwise, let $E_{k+1}$ be the smaller ellipsoid $\bar{E}_k$ which contains the polyhedron P
defined by (5.6) and (5.13). Increase k by 1.
Step 3 (stopping): If $\text{vol}(E_k) < 2^{-(n+1)L}$, then stop with the conclusion that
(5.6) has no solution. Otherwise, go to Step 2.

5.4 ELLIPSOID METHOD FOR LINEAR PROGRAMMING

Following the basic ideas described in the previous section, we introduce the ellipsoid
method for linear programming in this section. The major task is to construct E of Step
2 in algebraic terms. To do so, we let $\mathbf{a}_i^T = (a_{i1}, a_{i2}, \ldots, a_{in}) \in R^n$ for $i = 1, \ldots, m$,
and rewrite system (5.6) as
$$\mathbf{a}_i^T\mathbf{x} < b_i, \quad i = 1, 2, \ldots, m \tag{5.14}$$
Moreover, we let $E_k$ be an ellipsoid defined by $\{\mathbf{x} \in R^n \mid (\mathbf{x}-\mathbf{x}^k)^T\mathbf{B}_k^{-1}(\mathbf{x}-\mathbf{x}^k) \le 1\}$, where
$\mathbf{x}^k$ is the center of $E_k$ and $\mathbf{B}_k^{-1} = \mathbf{A}_k^T\mathbf{A}_k$ for some affine transformation matrix $\mathbf{A}_k$. Notice
that when $\mathbf{A}_k$ is nonsingular, $\mathbf{B}_k^{-1}$ is positive definite. Furthermore, if $\mathbf{x}^k$ is not a solution
of (5.14), then there exists an index i such that $\mathbf{a}_i^T\mathbf{x}^k \ge b_i$, and the potential solutions fall
in the half-ellipsoid $\frac{1}{2}E_k = \{\mathbf{x} \in E_k \mid \mathbf{a}_i^T\mathbf{x} \le \mathbf{a}_i^T\mathbf{x}^k\} = \{\mathbf{x} \in E_k \mid -\mathbf{a}_i^T\mathbf{x} \ge -\mathbf{a}_i^T\mathbf{x}^k\}$. Refer

back to Figure 5.4. Based on the three parameters of step, dilation, and expansion
defined by (5.12), the new ellipsoid $E_{k+1}$ is defined by

$$\mathbf{x}^{k+1} = \mathbf{x}^k - \tau\frac{\mathbf{B}_k\mathbf{a}_i}{\sqrt{\mathbf{a}_i^T\mathbf{B}_k\mathbf{a}_i}} \tag{5.15}$$

and
$$\mathbf{B}_{k+1} = \delta\left(\mathbf{B}_k - \sigma\frac{\mathbf{B}_k\mathbf{a}_i\mathbf{a}_i^T\mathbf{B}_k}{\mathbf{a}_i^T\mathbf{B}_k\mathbf{a}_i}\right) \tag{5.16}$$

Since $\mathbf{B}_k^{-1}$ is symmetric and positive definite, it can be shown that $\mathbf{B}_{k+1}^{-1}$ is also a symmetric
positive definite matrix, and the set
$$\{\mathbf{x} \in R^n \mid (\mathbf{x}-\mathbf{x}^k)^T\mathbf{B}_k^{-1}(\mathbf{x}-\mathbf{x}^k) \le 1,\ \mathbf{a}_i^T\mathbf{x} < b_i\} \tag{5.17}$$

is contained in the ellipsoid $E_{k+1} = \{\mathbf{x} \in R^n \mid (\mathbf{x}-\mathbf{x}^{k+1})^T\mathbf{B}_{k+1}^{-1}(\mathbf{x}-\mathbf{x}^{k+1}) \le 1\}$. Moreover,
the volume of $E_{k+1}$ is
$$\left(\frac{n}{n+1}\right)\left(\frac{n^2}{n^2-1}\right)^{(n-1)/2} \text{ times the volume of } E_k$$
Now, the ellipsoid method for solving a system of strict linear inequalities can be
described in algebraic terms:

Step 1 (initialization): Let $k = 0$, $E_0 = S(\mathbf{0}, 2^{2L})$, $\mathbf{B}_0 = 2^{2L}\mathbf{I}$, and $\mathbf{x}^0 = \mathbf{0}$.

Step 2 (checking for solution): If $\mathbf{x}^k$ satisfies (5.14), then stop with the solution
$\mathbf{x}^k$. Otherwise, identify an index i such that $\mathbf{a}_i^T\mathbf{x}^k \ge b_i$, calculate $\mathbf{x}^{k+1}$ and
$\mathbf{B}_{k+1}$ according to (5.15) and (5.16), and increase k by 1.
Step 3 (stopping): If $\text{vol}(E_k) < 2^{-(n+1)L}$, then stop with the conclusion that
(5.14) has no solution. Otherwise, go to Step 2.

Notice that after each iteration $\text{vol}(E_k)$ is reduced at least by a factor of $e^{-1/2(n+1)}$.
Since the starting volume is $\text{vol}(S(\mathbf{0}, 2^{2L}))$ and the smallest ending volume is $2^{-(n+1)L}$,
the total number of iterations is at most a constant times $n^2L$. This is shown in
Exercise 5.11. Also notice that the formulas (5.12), (5.15), and (5.16) for $\mathbf{x}^{k+1}$ and
$\mathbf{B}_{k+1}$ assume exact arithmetic. For a real implementation, one must use finite-precision
arithmetic to approximate exact numbers. This may cause computational errors. Nevertheless,
Khachian indicated that if we take 23L bits of precision before the decimal point
and 38nL after the point, then this is sufficient for rounding approximation to an exact
number. However, if the values of $\mathbf{x}^{k+1}$ and $\mathbf{B}_{k+1}$ are rounded to this specified number
of bits, the ellipsoid $E_{k+1}$ may not contain the required half-ellipsoid. Khachian further
showed that if $\mathbf{B}_{k+1}$ is multiplied by the factor $2^{1/4n}$, which is slightly larger than 1, then
$E_{k+1}$ will always contain the desired half-ellipsoid. This guarantees that the ellipsoid method
terminates within $O(n^2L)$ iterations with an exact solution. For our limited interests,
unless otherwise noted we will assume throughout this chapter that exact arithmetic is used.
The following simple example shows how the algorithm works.

Example 5.3
Let (5.6) be given by
$$x_1 < 0, \qquad x_2 < 0$$
In this case $\mathbf{a}_1^T = (1, 0)$, $\mathbf{a}_2^T = (0, 1)$, $\mathbf{b}^T = (0, 0)$, $L = 2 + \log_2 5$, and $2^{2L} = 400$. The
algorithm starts with
$$\mathbf{B}_0 = \begin{bmatrix}400 & 0\\ 0 & 400\end{bmatrix}$$
and terminates in two iterations with
$$\mathbf{x}^1 = \begin{bmatrix}-20/3\\ 0\end{bmatrix}, \qquad \mathbf{B}_1 = \begin{bmatrix}1600/9 & 0\\ 0 & 1600/3\end{bmatrix}$$
$$\mathbf{x}^2 = \begin{bmatrix}-20/3\\ -40\sqrt{3}/9\end{bmatrix}, \qquad \mathbf{B}_2 = \begin{bmatrix}6400/27 & 0\\ 0 & 6400/27\end{bmatrix}$$
The geometric picture is given by Figure 5.5.

Figure 5.5
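The two iterations of Example 5.3 can be reproduced with a few lines of code. The following sketch (ours) implements the central-cut update (5.15)-(5.16):

```python
# Basic ellipsoid update for a violated constraint a^T x < b_i.
import numpy as np

def ellipsoid_step(x, B, a):
    n = len(x)
    tau, sigma, delta = 1/(n+1), 2/(n+1), n**2/(n**2 - 1)
    Ba = B @ a
    root = np.sqrt(a @ Ba)
    x_new = x - tau * Ba / root                                # (5.15)
    B_new = delta * (B - sigma * np.outer(Ba, Ba) / (a @ Ba))  # (5.16)
    return x_new, B_new

x, B = np.zeros(2), 400.0 * np.eye(2)              # B_0 = 2^{2L} I
x, B = ellipsoid_step(x, B, np.array([1.0, 0.0]))  # x^1 = (-20/3, 0)
x, B = ellipsoid_step(x, B, np.array([0.0, 1.0]))  # x^2 = (-20/3, -40*sqrt(3)/9)
print(x, B)                                        # B_2 = diag(6400/27, 6400/27)
```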

The duality theory in Chapter 4 indicates that in order to solve a linear programming
problem in its canonical form, i.e.,
Maximize $\mathbf{c}^T\mathbf{x}$ (5.18a)
subject to $\mathbf{Ax} \le \mathbf{b}$, $\mathbf{x} \ge \mathbf{0}$, (5.18b)
we only need to consider the following system of inequalities:
$\mathbf{c}^T\mathbf{x} = \mathbf{b}^T\mathbf{w}$ (5.19a)
$\mathbf{Ax} \le \mathbf{b}$, $\mathbf{x} \ge \mathbf{0}$ (5.19b)
$\mathbf{A}^T\mathbf{w} \ge \mathbf{c}$, $\mathbf{w} \ge \mathbf{0}$ (5.19c)

We know that system (5.19) is solvable if and only if the original problem (5.18) has
a feasible solution and a finite optimum. Moreover, if (x, w) is a solution to system
(5.19), then x is an optimal solution to (5.18). Notice that system (5.19) is not of the
strict inequality form, and the ellipsoid method may not be applicable. Fortunately, we
can perturb system (5.19) by a very small number 2-L to convert the weak inequality
form to strict inequalities. The following lemma validates this perturbation scheme.

Lemma 5.5. Suppose that the system

$$\mathbf{a}_i^T\mathbf{x} < b_i + 2^{-L}, \quad i = 1, \ldots, m$$

has a solution; then
$$\mathbf{a}_i^T\mathbf{x} \le b_i, \quad i = 1, \ldots, m$$
has a solution.

Hence the ellipsoid method can be applied to system (5.19) with a perturbation
factor $2^{-L}$ for strict inequalities. In this way, it is clearly seen that a polynomial-time
algorithm for linear inequalities yields a polynomial-time algorithm for linear programming
problems.
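To make the reduction concrete, the sketch below (our illustration; names are ours) assembles system (5.19) as one block of weak inequalities $\mathbf{Gz} \le \mathbf{h}$ in the combined variable $\mathbf{z} = (\mathbf{x}, \mathbf{w})$, writing the equality (5.19a) as two opposing inequalities; every row can then be relaxed by $2^{-L}$ and treated strictly:

```python
# Assemble (5.19) as G z <= h with z = (x, w).
import numpy as np

def optimality_system(A, b, c):
    m, n = A.shape
    Z = np.zeros
    G = np.vstack([
        np.hstack([ c,         -b        ])[None, :],  # c^T x - b^T w <= 0
        np.hstack([-c,          b        ])[None, :],  # b^T w - c^T x <= 0
        np.hstack([ A,          Z((m, m))]),           # A x <= b
        np.hstack([ Z((n, n)), -A.T      ]),           # A^T w >= c
        -np.eye(n + m),                                # x >= 0, w >= 0
    ])
    h = np.concatenate([[0.0, 0.0], b, -c, Z(n + m)])
    return G, h
```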

5.5 PERFORMANCE OF THE ELLIPSOID METHOD FOR LP

In theory, to solve a system of m strict inequalities in n variables, the ellipsoid method
requires at most $O(n^2L)$ iterations, and at each iteration it requires $O(n^2L)$ elementary
operations in updating $\mathbf{x}^{k+1}$ and $\mathbf{B}_{k+1}$ to the required precision level. Therefore the total
complexity of the ellipsoid method in this case is $O(n^4L^2)$, which is far better than that
of the simplex method. However, from a practical point of view, the ellipsoid method
may have little use. There are several disadvantages of this approach to solving linear
programming problems.
The first disadvantage is the slow convergence. Consider system (5.19); it has n + m
variables, and hence the ellipsoid method is applied to a system of linear inequalities in
$R^{n+m}$. The initial ellipsoid may have an astronomical volume if we cannot cleverly
identify a good bound for the feasible domains of both problem (5.18) and its dual
problem. Moreover, the solutions of system (5.19) lie in the hyperplane $\mathbf{c}^T\mathbf{x} = \mathbf{b}^T\mathbf{w}$;
hence, even if (5.19) is feasible, the volume of the feasible set is zero. By perturbing the
right-hand side in (5.19), the corresponding feasible set still has a very small volume; thus
the number of iterations is likely to be very large. In fact, practitioners have confirmed
the extremely slow convergence.
The second disadvantage is that, if the method determines that (5.19) is infeasible
after a long period of tedious computation, it is not clear whether the original linear
program (5.18) is infeasible or unbounded. Of course there is a remedy strategy, but
more computational work is required.

The third disadvantage is due to sparsity. So far, the ellipsoid method does not
seem able to exploit sparsity. We may start with a very simple matrix $\mathbf{B}_0$, but the
number of fill-in elements in $\mathbf{B}_k$ grows very rapidly. Thus even if the number of
iterations could be reduced significantly, the ellipsoid method would still have problems
in solving large-scale linear programming problems.
After all, a fundamental difficulty is due to the limitations of finite-precision arith-
metic. It is unlikely that any reasonable implementation of the method would be of
polynomial time.

5.6 MODIFICATIONS OF THE BASIC ALGORITHM

To improve the slow convergence of the ellipsoid method, variations of the basic ellipsoid
method were developed. Since the number of iterations depends upon the volume ratio of
$E_{k+1}$ to $E_k$, research has been conducted to generate smaller ellipsoids at each iteration by
considering deep cuts, surrogate cuts, and parallel cuts. Researchers have also replaced
the role of ellipsoids in the basic method by certain polyhedra called simplices. In this
section we discuss some of these modifications.

5.6.1 Deep Cuts

In the basic ellipsoid method, suppose that $\mathbf{x}^k$ violates the ith constraint of (5.14);
the ellipsoid $E_{k+1}$ constructed according to (5.15) and (5.16) contains the half-ellipsoid
$\frac{1}{2}E_k = \{\mathbf{x} \in E_k \mid \mathbf{a}_i^T\mathbf{x} \le \mathbf{a}_i^T\mathbf{x}^k\}$. In reality we only require that $E_{k+1}$ contain the smaller
portion $\{\mathbf{x} \in E_k \mid \mathbf{a}_i^T\mathbf{x} < b_i\} \subset E_k$. Hence we may obtain an ellipsoid of smaller volume
by using the deep cut $\mathbf{a}_i^T\mathbf{x} \le b_i$ instead of the cut $\mathbf{a}_i^T\mathbf{x} \le \mathbf{a}_i^T\mathbf{x}^k$, which passes through the
center of $E_k$. This is illustrated in Figure 5.6.

Figure 5.6 (a deep cut)
Sec. 5.6 Modifications of the Basic Algorithm 105

Figure 5.7

To derive the formulas for the smaller ellipsoid, we consider the basic case where
E is the unit sphere $S(\mathbf{0}, 1)$ and one of the inequalities in (5.14) reads $x_1 > t$ for some
$0 \le t < 1$. As Figure 5.7 shows, the feasible set P defined by (5.13) and (5.14) can be
included in an ellipsoid
$$\bar{E} = \left\{\mathbf{x} \in R^n \,\middle|\, \left(\frac{n+1}{n(1-t)}\right)^2\left(x_1 - \frac{1+nt}{n+1}\right)^2 + \frac{n^2-1}{n^2(1-t^2)}\sum_{i=2}^n x_i^2 \le 1\right\} \tag{5.20}$$

whose center is
$$\left(\frac{1+nt}{n+1}, 0, \ldots, 0\right)$$

and whose volume is
$$(1-t)(1-t^2)^{(n-1)/2}\left(\frac{n}{n+1}\right)\left(\frac{n^2}{n^2-1}\right)^{(n-1)/2}$$
times the volume of the unit sphere. Notice that
$$\text{vol}(\bar{E}) = (1-t)(1-t^2)^{(n-1)/2}\,\text{vol}(\bar{E}\big|_{t=0}) \tag{5.21}$$

where $\bar{E}\big|_{t=0}$ is the ellipsoid of (5.11). When $t = 0$, $\bar{E}$ coincides with that ellipsoid, and $\bar{E}$
becomes smaller as t increases. When t approaches 1, $\bar{E}$ shrinks to a single point with null volume.
Parallel to the previous derivation, the ellipsoid $\bar{E}$ is defined by (5.15) and (5.16)
with the new parameters
$$\tau = \frac{1+nt}{n+1}, \qquad \sigma = \frac{2(1+nt)}{(n+1)(1+t)}, \qquad \delta = \frac{n^2(1-t^2)}{n^2-1} \tag{5.22}$$

where
$$t = \frac{\mathbf{a}_i^T\mathbf{x}^k - b_i}{\sqrt{\mathbf{a}_i^T\mathbf{B}_k\mathbf{a}_i}} \tag{5.23}$$
Computing t for each inequality in (5.14), if any one t is greater than or equal to one,
then system (5.14) has no feasible solution. Otherwise we can select the deepest cut, the
one with the largest t, for constructing $E_{k+1}$. Conceptually, deep cuts should lead to faster volume
reduction, and hence faster convergence of the ellipsoid method. But, as reported by
researchers, the improvement obtained can be rather disappointing.
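In code, the deep-cut parameters differ from the central-cut ones only through t. A sketch (ours, following (5.22)-(5.23)); note that at t = 0 the values reduce to (5.12):

```python
# Deep-cut parameters for a violated constraint a^T x < b_i.
import numpy as np

def deep_cut_params(x, B, a, b_i, n):
    t = (a @ x - b_i) / np.sqrt(a @ B @ a)   # (5.23); t >= 1 => infeasible
    tau = (1 + n*t) / (n + 1)
    sigma = 2*(1 + n*t) / ((n + 1)*(1 + t))
    delta = n**2 * (1 - t**2) / (n**2 - 1)
    return t, tau, sigma, delta
```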

5.6.2 Surrogate Cuts

A surrogate cut is generated by combining some inequalities of (5.14) to achieve a deeper
cut than any cut generated by a single constraint of (5.14). In theory, any cut of the form
$$\sum_{i=1}^m u_i\mathbf{a}_i^T\mathbf{x} \le \sum_{i=1}^m u_ib_i$$

is valid as long as $u_i \ge 0$ for $i = 1, \ldots, m$, since no points that satisfy (5.14) are cut off
by this inequality. It can be shown that the deepest surrogate cut at the kth iteration of
the ellipsoid method is the one whose $u_i$ coefficients are obtained by solving
$$\max_{\mathbf{u} \ge \mathbf{0}}\ \frac{\mathbf{u}^T(\mathbf{Ax}^k - \mathbf{b})}{(\mathbf{u}^T\mathbf{A}\mathbf{B}_k\mathbf{A}^T\mathbf{u})^{1/2}} \tag{5.24}$$
where A is defined in (5.6). In practice, since solving (5.24) requires a substantial amount
of computation, only surrogate cuts that can be generated from two constraints are
considered. Figure 5.8 illustrates a surrogate cut.

Figure 5.8

5.6.3 Parallel Cuts

Suppose system (5.14) contains a pair of parallel constraints

$$\mathbf{a}_i^T\mathbf{x} < b_i \quad \text{and} \quad \mathbf{a}_j^T\mathbf{x} < b_j \tag{5.25}$$

where $\mathbf{a}_j = -\mathbf{a}_i$ and $-b_j < b_i$. We then consider how to use the two constraints simultaneously
to generate a new ellipsoid. At the kth iteration, we let $\alpha = (\mathbf{a}_i^T\mathbf{x}^k - b_i)/\sqrt{\mathbf{a}_i^T\mathbf{B}_k\mathbf{a}_i}$
and $\bar{\alpha} = (\mathbf{a}_j^T\mathbf{x}^k - b_j)/\sqrt{\mathbf{a}_i^T\mathbf{B}_k\mathbf{a}_i}$. Suppose that $\alpha\bar{\alpha} < 1/n$ and $\alpha \le -\bar{\alpha} \le 1$; then formulas
(5.15) and (5.16) with the new parameters
$$\rho = \left[4(1-\alpha^2)(1-\bar{\alpha}^2) + n^2(\bar{\alpha}^2-\alpha^2)^2\right]^{1/2} \tag{5.26a}$$
$$\sigma = \frac{1}{n+1}\left[n + \frac{2}{(\alpha-\bar{\alpha})^2}\left(1 - \alpha\bar{\alpha} - \frac{\rho}{2}\right)\right] \tag{5.26b}$$
$$\tau = \frac{\sigma(\alpha-\bar{\alpha})}{2} \tag{5.26c}$$
$$\delta = \frac{n^2}{n^2-1}\left(1 - \frac{\alpha^2+\bar{\alpha}^2-\rho/n}{2}\right) \tag{5.26d}$$

generate an ellipsoid $E_{k+1}$ that contains the slice $\{\mathbf{x} \in E_k \mid -b_j \le \mathbf{a}_i^T\mathbf{x} \le b_i\}$ of $E_k$. Figure 5.9 shows
parallel cuts on the unit sphere.

Figure 5.9 (parallel cuts)

5.6.4 Replacing Ellipsoid by Simplex

In the early development stage of the ellipsoid method, A. Yu. Levin had already used
simplices rather than ellipsoids to achieve iterative volume reductions. This can be
viewed as a polyhedral version of the ellipsoid method. After Khachian proposed his
method, this idea regained researchers' interest.
To describe this idea, we assume that there are n + 1 points $\mathbf{v}^0, \mathbf{v}^1, \ldots, \mathbf{v}^n$ in $R^n$
such that no hyperplane passes through all of them. Then the convex hull (as defined in
Chapter 2) generated by these n + 1 points in $R^n$ forms a simplex $S(\mathbf{v}^0, \ldots, \mathbf{v}^n)$. It is
obvious that a simplex in $R^2$ is a triangle, and in $R^3$ a tetrahedron. The center of this
simplex is the point defined by
$$\mathbf{c} = \frac{1}{n+1}\sum_{i=0}^n\mathbf{v}^i \tag{5.27}$$

Similar to a half-ellipsoid, we define a half-simplex $\frac{1}{2}S$ to be the intersection of a simplex
S with a halfspace whose bounding hyperplane passes through the center of S.
For a given simplex $S(\mathbf{v}^0, \ldots, \mathbf{v}^n)$, let $\frac{1}{2}S(\mathbf{v}^0, \ldots, \mathbf{v}^n)$ be the intersection of
$S(\mathbf{v}^0, \ldots, \mathbf{v}^n)$ and the halfspace $\{\mathbf{x} \in R^n \mid \mathbf{a}^T\mathbf{x} \le b\}$, and let

$$e(\mathbf{x}) = b - \mathbf{a}^T\mathbf{x} \tag{5.28}$$

for $\mathbf{x} \in S$. Moreover, we let $e(\mathbf{v}^k) = \max\{e(\mathbf{v}^i) \mid i = 0, \ldots, n\}$. Since $\frac{1}{2}S \ne \emptyset$, we have
$e(\mathbf{v}^k) > 0$. We now define

$$d_i = 1 - \frac{e(\mathbf{v}^i)}{n^2e(\mathbf{v}^k)}, \quad i = 0, 1, \ldots, n \tag{5.29a}$$

and
$$\bar{\mathbf{v}}^i = \mathbf{v}^k + \frac{1}{d_i}(\mathbf{v}^i - \mathbf{v}^k), \quad i = 0, 1, \ldots, n \tag{5.29b}$$

It is straightforward to show that the new simplex $S(\bar{\mathbf{v}}^0, \ldots, \bar{\mathbf{v}}^n)$ contains the half-simplex
$\frac{1}{2}S(\mathbf{v}^0, \ldots, \mathbf{v}^n)$, and that $\text{vol}(S(\bar{\mathbf{v}}^0, \ldots, \bar{\mathbf{v}}^n)) < e^{-1/2(n+1)^2}\,\text{vol}(S(\mathbf{v}^0, \ldots, \mathbf{v}^n))$. Consequently,
the following lemma holds:

Lemma 5.6. Every half-simplex $\frac{1}{2}S$ is contained in a simplex $\bar{S}$ whose volume
is less than $e^{-1/2(n+1)^2}$ times the volume of S.

Then a polyhedral version of the ellipsoid method follows.
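One update of this polyhedral scheme is compact in code. The sketch below (ours) carries out (5.28)-(5.29) for a simplex stored as an array of vertices:

```python
# One simplex update: shrink toward the vertex deepest inside a^T x <= beta.
import numpy as np

def simplex_step(V, a, beta):
    """V is an (n+1) x n array whose rows are v^0, ..., v^n."""
    n = V.shape[1]
    e = beta - V @ a                # e(v^i) = beta - a^T v^i, see (5.28)
    k = int(np.argmax(e))           # vertex with the largest slack
    d = 1 - e / (n**2 * e[k])       # d_i of (5.29a); d_k = 1 - 1/n^2
    return V[k] + (V - V[k]) / d[:, None]   # v-bar^i of (5.29b)
```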

5.7 CONCLUDING REMARKS

Linear programming practitioners took the efficiency of the simplex method for
granted for a long time. However, worst-case analysis shows the algorithm is of
exponential complexity. The gap in between still requires substantial effort to reach full
understanding.
On the other hand, Khachian's algorithm is of polynomial complexity, which settles
a significant theoretical question about the degree of difficulty of linear programming
problems. However, even with considerable modification, the ellipsoid method seems to
be inferior to the simplex method for practical computation.
Although the ellipsoid method has also shown its theoretical significance in solving
nonlinear and combinatorial optimization problems, where the constraints may be known only
implicitly and may be exponential in number, a polynomial-time algorithm with better
practical performance for linear programming problems was still in high demand. In 1984,
N. Karmarkar finally provided a promising result and stimulated exciting developments
in this area. We shall study Karmarkar's algorithm in the next chapter.

REFERENCES FOR FURTHER READING

5.1. Bland, R. G., Goldfarb, D., and Todd, M. J., "The ellipsoid method: a survey," Operations
Research 29, 1039-1091 (1981).
5.2. Borgwardt, K. H., The Simplex Method: A Probabilistic Analysis, Springer-Verlag, Berlin
(1987).
5.3. Burrell, B. P., and Todd, M. J., "The ellipsoid method generates dual variables," Mathematics
of Operations Research 10, 688-700 (1985).

5.4. Chvátal, V., Linear Programming, Freeman, San Francisco (1983).


5.5. Gács, P., and Lovász, L., "Khachian's algorithm for linear programming," Mathematical
Programming Study 14, 61-68 (1981).
5.6. Goldfarb, D, and Todd, M. J., "Linear Programming," in Optimization, Handbook in Opera-
tions Research and Management Science, ed. Nemhauser, G. L., and Rinnooy Kan, A. H. G.,
Vol. 1, 73-170, Elsevier-North Holland, Amsterdam (1989).
5.7. Grötschel, M., Lovász, L., and Schrijver, A., The Ellipsoid Method and Combinatorial Opti-
mization, Springer-Verlag, Heidelberg (1988).
5.8. Khachian, L. G., "A polynomial algorithm in linear programming" (in Russian), Doklady
Akademiia Nauk SSSR 224, 1093-1096, (English translation) Soviet Mathematics Doklady
20, 191-194 (1979).
5.9. Khachian, L. G., "Polynomial algorithms in linear programming" (in Russian), Zhurnal
Vychislitel'noi Matematiki i Matematicheskoi Fiziki 20, 51-68, (English translation) USSR
Computational Mathematics and Mathematical Physics 20, 53-72 (1980).
5.10. Klee, V., and Minty, G. L., "How good is the simplex algorithm?," in Inequalities III, ed.
O. Shisha, Academic Press, New York, 159-179 (1972).
5.11. Kozlov, M. K., Tarasov, S. P., and Khachian, L. G., "Polynomial solvability of convex
quadratic programming" (in Russian), Doklady Akademiia Nauk USSR 5, 1051-1053 (1979).
5.12. Levin, A. Yu., "On an algorithm for the minimization of convex functions" (in Russian),
Doklady Akademiia Nauk USSR 160, 1244-1247, (English translation) Soviet Mathematics
Doklady 6, 286-290 (1965).
5.13. Shamir, R., "The efficiency of the simplex method: a survey," Management Science 33,
301-334 (1987).
5.14. Shor, N. Z., "Utilization of space dilation operation in minimization of convex functions" (in
Russian), Kibernetika 1, 6-12, (English translation) Cybernetics 6, 7-15 (1970).
5.15. Shor, N. Z., Minimization Methods for Non-differentiable Functions, Springer-Verlag, Berlin
(1985).
5.16. Todd, M. J., "Improved bounds and containing ellipsoids in Karmarkar's linear programming
algorithm," Mathematics of Operations Research 13, 650-659 (1988).
5.17. Ye, Y., "Karmarkar's algorithm and the ellipsoidal method," Operations Research Letters 4,
177-182 (1987).
5.18. Yudin, D. B., and Nemirovskii, A. S., "Informational complexity and efficient methods for the
solution of convex extremal problems" (in Russian), Ekonomika i Matematicheskie Metody
12, 357-369, (English translation) Matekon 13, 3-25 (1976).

EXERCISES

5.1. Compare the graphs of $f_1(n) = n^2$, $f_2(n) = n^3$, $f_3(n) = 2^n$, $f_4(n) = 100n^2$, and $f_5(n) = (0.001)2^n$, for $n \ge 0$.
(a) Does a quadratic algorithm always perform better than a cubic algorithm? Why?
(b) Does a polynomial algorithm always perform better than an exponential algorithm?
Why?
5.2. Show that $C(n, m) \ge 2^m$ for nonnegative integers n and m with $n \ge 2m$.

5.3. Consider Klee-Minty's example, letting $\theta = 1/\varepsilon$ and the linear transformation $w_1 = x_1$,
$w_i = (x_i - \varepsilon x_{i-1})/\varepsilon^{i-1}$ for $i = 2, \ldots, n$. Show that problem (5.5) is equivalent to
$$\text{Maximize } \sum_{i=1}^n w_i$$
$$\text{subject to } w_1 \le 1$$
$$w_i + 2\sum_{k=1}^{i-1} w_k \le \theta^{i-1} \quad \text{for } i = 2, \ldots, n$$
$$w_1, \ldots, w_n \ge 0.$$


5.4. (a) Solve the foregoing equivalent problem, starting from the origin and using the largest-reduction
rule for entering variables, for the cases n = 2 and n = 3.
(b) Draw the graph for each case.
(c) Show by induction on n that the simplex method traverses $2^n - 1$ extreme
points.
5.5. (a) Prove that in $R^n$, $\bar{E}$ given by Equation (5.11) contains $\frac{1}{2}S(\mathbf{0}, 1)$.
(b) Noting that affine transformations preserve ratios of volumes, determine the volume of
$\bar{E}$.
5.6. Show that
$$\left(\frac{n}{n+1}\right)\left(\frac{n^2}{n^2-1}\right)^{(n-1)/2} \le e^{-1/2(n+1)} \quad \text{for any integer } n > 1$$
5.7. Let P be the polyhedron defined by
$$\sum_{j=1}^n a_{ij}x_j \le b_i, \quad i = 1, 2, \ldots, m$$
$$\mathbf{x} \ge \mathbf{0}$$
with L defined by (5.3) with $c_j = 0$ for $j = 1, \ldots, n$. Show that for every extreme point
$\mathbf{v}$ of P, its maximum norm $\max_i|v_i| < 2^L/n$, and its entries are rational numbers with
denominator at most $2^L$. (Hint: Express $\mathbf{v}$ by Cramer's rule and use Hadamard's inequality
to derive the result.)
5.8. Show that if the system of inequalities (5.6) has a solution, then the volume of its solution set inside the
n-dimensional cube $|x_i| \le 2^L$ is at least $2^{-(n+1)L}$. (Hint: You may assume that (5.6) has a
positive solution, so P in Exercise 5.7 has an interior solution. Then consider the polytope
formed by
$$\sum_{j=1}^n a_{ij}x_j \le b_i, \quad i = 1, 2, \ldots, m$$
$$x_j \ge 0, \quad j = 1, 2, \ldots, n$$
It has n + 1 extreme points $\mathbf{v}_0, \ldots, \mathbf{v}_n$ which are not on a hyperplane. Therefore the volume
of the polytope is at least
$$\frac{1}{n!}\left|\det\begin{pmatrix}1 & \cdots & 1\\ \mathbf{v}_0 & \cdots & \mathbf{v}_n\end{pmatrix}\right|$$
Then follow the result of Exercise 5.7 to finish the proof.)



5.9. When $\mathbf{B}_k$ is a symmetric positive definite matrix, prove that $\mathbf{B}_{k+1}$ defined by (5.16) has the
same property.
5.10. Show that the set defined by (5.17) is contained in the ellipsoid
$$E_{k+1} = \{\mathbf{x} \in R^n \mid (\mathbf{x} - \mathbf{x}^{k+1})^T\mathbf{B}_{k+1}^{-1}(\mathbf{x} - \mathbf{x}^{k+1}) \le 1\}.$$

5.11. Consider the ellipsoid method in Section 5.3. What is the volume of Eo? Show that the
total number of iterations needed is O(n 2 L).
5.12. Use Farkas' lemma in Chapter 4 to prove Lemma 5.3.
5.13. Show that the ellipsoid defined by (5.20) has its center at $((1+nt)/(n+1), 0, \ldots, 0)$ and
has volume equal to
$$(1-t)(1-t^2)^{(n-1)/2}\left(\frac{n}{n+1}\right)\left(\frac{n^2}{n^2-1}\right)^{(n-1)/2}$$
times the volume of the unit sphere.
5.14. Prove that in $R^n$, $\bar{E}$ given by Equations (5.15), (5.16), (5.22), and (5.23) contains the desired
feasible solution set, and determine the volume of $\bar{E}$.
5.15. Consider the simple system of linear inequalities $x_1 > 1/2$, $x_2 > 1/2$. Solve the problem by
the basic ellipsoid method and by the modified method with deep cuts. Does the idea of deep
cuts help?
5.16. Consider Exercise 5.15. Generate the surrogate cut $x_1 + x_2 > 1$ and then apply the modified
ellipsoid method to solve the problem.
5.17. Consider the simple system of linear inequalities $x_1 > 1/4$, $x_1 < 1/2$, $x_2 < 1/2$. Solve the
problem by the ellipsoid method with parallel cuts.
5.18. Prove that the deepest surrogate cut at the kth iteration of the ellipsoid method is the one
whose $u_i$ coefficients are obtained by solving (5.24).
5.19. In generating parallel cuts, if $b_j = -b_i$, calculate the parameters $\tau$, $\sigma$, and $\delta$. Compare the
ranks of $\mathbf{B}_k$ and $\mathbf{B}_{k+1}$, and conclude that $E_{k+1}$ becomes flat in the direction of $\mathbf{a}_i$.
5.20. For any $\mathbf{x} \in S(\mathbf{v}^0, \ldots, \mathbf{v}^n)$, we have
$$\mathbf{x} = \sum_{i=0}^n u_i\mathbf{v}^i$$
for some nonnegative $u_i$. Define $\bar{u}_i = d_iu_i$ for $i \ne k$ and
$$\bar{u}_i = d_iu_i + \frac{e(\mathbf{x})}{n^2e(\mathbf{v}^k)} \quad \text{for } i = k$$
If $\mathbf{x}$ further satisfies $\mathbf{a}^T\mathbf{x} < b$, show that $\mathbf{x}$ belongs to $S(\bar{\mathbf{v}}^0, \ldots, \bar{\mathbf{v}}^n)$.

5.21. Prove that the ratio r between the volume of the new simplex and the volume of the given simplex
in Lemma 5.6 is less than $e^{-1/2(n+1)^2}$. [Hint: Note the facts that $\bar{\mathbf{v}}^k = \mathbf{v}^k$; each $\bar{\mathbf{v}}^i$ with
$i \ne k$ lies on the line passing through $\mathbf{v}^k$ and $\mathbf{v}^i$; and the distance from $\mathbf{v}^k$ to $\bar{\mathbf{v}}^i$ equals
the distance from $\mathbf{v}^k$ to $\mathbf{v}^i$ divided by $d_i$. Hence
$$r = \prod_{i \ne k}\frac{1}{d_i}\,.]$$
6

Karmarkar's Projective
Scaling Algorithm

In the fall of 1984, N. K. Karmarkar of AT&T Bell Laboratories proposed a new


polynomial-time algorithm for linear programming. Unlike the ellipsoid method, the new
algorithm not only possesses better complexity than the simplex method in the worst-case
analysis but also shows the potential to rival the simplex approach in large-scale
real-world applications. This development quickly captured the attention of everyone in
the field.
Radically different from the simplex method, Karmarkar's original algorithm considers
a linear programming problem over a simplex structure and moves through the
interior of the polytope of the feasible domain by transforming the space at each step so as to
place the current solution at the center of the polytope. The concept of reaching the optimum
through the interior has stimulated much new research in developing so-called
interior-point methods. Numerous extensions and variants have been reported.
In this chapter, we first introduce the basic idea of Karmarkar's algorithm, then
describe the algorithm in detail with a proof of polynomial-time complexity. Some
extensions and a computer implementation procedure will also be discussed. The so-
called affine scaling algorithms will be left for discussion in the next chapter.

6.1 BASIC IDEAS OF KARMARKAR'S ALGORITHM

As discussed in Chapter 5, the philosophy of solving an optimization problem via an


iterative scheme is to start with a "rough" solution and successively improve the current
solution until a desired goal is met. The performance of an iterative algorithm depends


upon two key factors: (1) How many steps (iterations) does it take? (2) How much
computation does it involve in each iteration?
The simplex method starts with an extreme point and keeps moving to a better
neighboring extreme point at each iteration until an optimal solution or infeasibility
is reached. In this scheme, the computational work at each iteration is minimized by
limiting the searches to only those edge directions which lead to adjacent extreme points.
But, as the Klee-Minty example showed, the simplex method may have to travel a
long path on the boundary of the feasible domain and visit almost every extreme point
before it stops. This boundary approach suffers from heavy computation in large-scale
applications, since the feasible domain may contain a huge number of extreme points.
Therefore one alternative idea is to travel across the interior of the feasible domain along
a "shorter path" in order to reduce the total number of iterations. However, this interior-
point approach usually requires the consideration of all feasible directions for a better
movement at each iteration. In other words, the new philosophy is to reduce the number
of iterations at the expense of heavier computation at each iteration.
In general, it is not an easy task to identify the "best direction of movement" among
all feasible directions at a particular interior point of the feasible domain. However,
Karmarkar noticed two fundamental insights, assuming the feasible domain is a polytope.

1. If the current interior solution is near the center of the polytope, then it makes sense
to move in the direction of steepest descent of the objective function to achieve a
minimum value.
2. Without changing the problem in any essential way, an appropriate transformation
can be applied to the solution space such that the current interior solution is placed
near the center in the transformed solution space.

The first insight can be observed in Figure 6.1. Since $\mathbf{x}^1$ is near the center of
the polytope, we can improve the solution substantially by moving it in a direction of
steepest descent. But if an off-center point $\mathbf{x}^2$ is so moved, it will soon be out of the
feasible domain before much improvement is made.

Figure 6.1

Karmarkar observed the second insight via the so-called projective transformation, whereby straight lines remain straight lines while angles and distances distort such that we can view any interior point as the center of the polytope in a distorted picture. One can use imagination to verify this with Figure 6.1 by viewing it at an angle and distance that makes x2 appear to be near the center of the polytope. Such a distortion scarcely alters anything essential to the problem but merely looks at it from a different viewpoint.
With these two fundamental insights, the basic strategy of Karmarkar's projective
scaling algorithm is straightforward. We take an interior solution, transform the solution
space so as to place the current solution near the center of the polytope in the transformed
space, and then move it in the direction of steepest descent, but not all the way to the
boundary of the feasible domain in order to have it remain as an interior solution. Then
take the inverse transformation to map the improved solution back to the original solution
space as a new interior solution. We repeat the process until an optimum is obtained
with the desired accuracy.

6.2 KARMARKAR'S STANDARD FORM

Following the basic strategy of the projective scaling, Karmarkar's algorithm has a preferred standard form for linear programming:

    Minimize    c^T x                    (6.1a)
    subject to  Ax = 0                   (6.1b)
                e^T x = 1,  x ≥ 0        (6.1c)

where A is an m × n matrix of full row rank, e^T = (1, 1, ..., 1) is an n-vector of all ones, and c, x ∈ R^n.
A feasible solution vector x of problem (6.1) is defined to be an interior solution if every variable x_i is strictly positive. Note from (6.1c) that the feasible domain is a bounded set, hence it becomes a polytope. A consistent problem in Karmarkar's standard form certainly has a finite infimum. Karmarkar made two assumptions for his algorithm.

(A1) Ae = 0, so that x^0 = e/n = (1/n, ..., 1/n)^T is an initial interior solution.

(A2) The optimal objective value of problem (6.1) is zero.

We shall see later how a linear programming problem can be cast into Karmarkar's standard form satisfying the two assumptions. Here are a couple of examples that fit our description.

Example 6.1

    Minimize    -x1 + 1
    subject to  x2 - x3 = 0
                x1 + x2 + x3 = 1,  x1, x2, x3 ≥ 0

Example 6.2

    Minimize    -x1 - 2x2 + 4x5
    subject to  x2 - x3 = 0
                2x1 - 2x2 + 4x3 - 4x5 = 0
                x1 + 2x2 + x4 - 4x5 = 0
                x1 + x2 + x3 + x4 + x5 = 1,  x ≥ 0

6.2.1 The Simplex Structure

Expression (6.1c) defines the regular simplex in the n-dimensional Euclidean space, namely

    Δ = {x ∈ R^n | x_1 + ... + x_n = 1, x_i ≥ 0, i = 1, ..., n}        (6.2)

It is clearly seen that in R^1, Δ = {1}, which is a singleton; in R^2, it is the line segment between the points (0, 1) and (1, 0); in R^3, it is the triangular area formed by (0, 0, 1), (0, 1, 0), and (1, 0, 0); and in R^4, it becomes the pyramid with vertices at (0, 0, 0, 1), (0, 0, 1, 0), (0, 1, 0, 0), and (1, 0, 0, 0). It is also easy to see that, in R^n, Δ has exactly n vertices, C(n, 2) edges, C(n, n - 1) facets, and its center at e/n. Just noting the coordinates of the center and each vertex of Δ (see Figure 6.2), we can show that the radius of the smallest circumscribing spheroid of Δ is given by

    R = √(n - 1) / √n        (6.3)

Figure 6.2

Similarly, the radius of the largest inscribing spheroid in Δ is given by

    r = 1 / √(n(n - 1))        (6.4)
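As a quick numerical check of (6.3) and (6.4), the following short computation (a NumPy sketch of ours, not part of the original text) compares the closed-form radii with the distances from the center e/n to a vertex and to the center of a facet of Δ:

    import numpy as np

    n = 4
    center = np.ones(n) / n

    # Distance from the center e/n to the vertex (1, 0, ..., 0): this is R
    vertex = np.zeros(n); vertex[0] = 1.0
    print(np.linalg.norm(vertex - center), np.sqrt((n - 1) / n))

    # The point of the facet x_n = 0 closest to e/n is that facet's own center
    # (1/(n-1), ..., 1/(n-1), 0); its distance from e/n is r
    facet_center = np.ones(n) / (n - 1); facet_center[-1] = 0.0
    print(np.linalg.norm(facet_center - center), 1 / np.sqrt(n * (n - 1)))

Both printed pairs agree (√3/2 and 1/√12 for n = 4).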

6.2.2 Projective Transformation on the Simplex

Let x̄ be an interior point of Δ, i.e., x̄_i > 0 for i = 1, ..., n and x̄_1 + ... + x̄_n = 1. We can define an n × n diagonal matrix

    X̄ = diag(x̄)        (6.5)

with x̄_i as its ith diagonal element. It is obvious that matrix X̄ is nonsingular and its inverse matrix X̄^{-1} is also a diagonal matrix, but with 1/x̄_i as its ith diagonal element, for i = 1, ..., n. Moreover, we can define a projective transformation T_x̄ from Δ to Δ such that

    T_x̄(x) = X̄^{-1} x / (e^T X̄^{-1} x)   for each x ∈ Δ        (6.6)

Notice that X̄^{-1}x is an n-dimensional column vector and e^T X̄^{-1}x is a scalar which equals the sum of all elements in the vector X̄^{-1}x. Therefore, the elements in T_x̄(x) are normalized with sum equal to 1. In other words, T_x̄(x) ∈ Δ, and T_x̄ is indeed a well-defined mapping from Δ to itself.

Example 6.3

Consider the simplex Δ in R^3 as shown in Figure 6.3. Let x = (1, 0, 0)^T, y = (0, 1, 0)^T, z = (0, 0, 1)^T, a = (3/10, 1/10, 3/5)^T, b = (1/3, 0, 2/3)^T, c = (0, 1/7, 6/7)^T, and d = (3/4, 1/4, 0)^T.
Since point a is an interior point, we can define

    X_a = diag(3/10, 1/10, 3/5)

Then we have

    X_a^{-1} = diag(10/3, 10, 5/3)

Moreover, we see that T_a(x) = (1, 0, 0)^T, T_a(y) = (0, 1, 0)^T, T_a(z) = (0, 0, 1)^T, T_a(a) = (1/3, 1/3, 1/3)^T, T_a(b) = (1/2, 0, 1/2)^T, T_a(c) = (0, 1/2, 1/2)^T, T_a(d) = (1/2, 1/2, 0)^T.
Figure 6.3

Example 6.3 showed that the scale and the angle in the transformed space are distorted such that a current interior point, in this case point a, becomes the center of Δ. In general, we can show the following results:

(T1) T_x̄ is a well-defined mapping from Δ to Δ, if x̄ is an interior point of Δ.
(T2) T_x̄(x̄) = e/n becomes the center of Δ.
(T3) T_x̄(x) is a vertex of Δ if x is a vertex.
(T4) T_x̄(x) is on the boundary of Δ if x is on the boundary.
(T5) T_x̄(x) is an interior point of Δ if x is in the interior.
(T6) T_x̄ is a one-to-one and onto mapping with an inverse transformation T_x̄^{-1} such that

    T_x̄^{-1}(y) = X̄ y / (e^T X̄ y)   for each y ∈ Δ        (6.7)
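The transformation and its inverse are easy to verify numerically. The sketch below (ours; the function names T and T_inv are not from the text) reproduces part of Example 6.3 and checks properties (T2) and (T6):

    import numpy as np

    def T(x_bar, x):
        # Projective transformation (6.6): scale by diag(x_bar)^{-1}, then normalize
        u = x / x_bar
        return u / u.sum()

    def T_inv(x_bar, y):
        # Inverse transformation (6.7): scale by diag(x_bar), then normalize
        v = x_bar * y
        return v / v.sum()

    a = np.array([3/10, 1/10, 3/5])
    b = np.array([1/3, 0, 2/3])

    print(T(a, a))            # (1/3, 1/3, 1/3): a goes to the center, property (T2)
    print(T(a, b))            # (1/2, 0, 1/2), as computed in Example 6.3
    print(T_inv(a, T(a, b)))  # recovers b, property (T6)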

6.3 KARMARKAR'S PROJECTIVE SCALING ALGORITHM

Consider a linear programming problem in Karmarkar's standard form (6.1). Its feasible domain is a polytope formed by the intersection of the null space of the constraint matrix A, i.e., {x | Ax = 0}, and the simplex Δ in R^n. Let x̄ > 0 be an interior feasible solution; then the projective transformation T_x̄ maps x ∈ Δ to

    y = T_x̄(x) = X̄^{-1} x / (e^T X̄^{-1} x)

and we can denote x in terms of its image y by the formula

    x = X̄ y / (e^T X̄ y)        (6.8)

Plugging the value of x into problem (6.1) according to Equation (6.8), and remembering that T_x̄ maps Δ onto Δ, we have a corresponding problem in the transformed space, namely,

    minimize    c^T X̄ y / (e^T X̄ y)        (6.1'a)
    subject to  A X̄ y = 0                   (6.1'b)
                e^T y = 1,  y ≥ 0            (6.1'c)

Note that in problem (6.1') the image of x̄, i.e., y = T_x̄(x̄) = e/n, becomes a feasible solution that sits at the center of the simplex Δ. If we denote the constraint matrix by

    B = [ A X̄ ]
        [ e^T ]        (6.9)

then any direction d ∈ R^n in the null space of matrix B, i.e., Bd = 0, is a feasible direction of movement for y. But remember that the distance from the center of Δ to its closest boundary is given by the radius r in Equation (6.4). Therefore, if we denote the norm of d by ||d||, then

    y(α) = y + αr (d / ||d||)        (6.10)

remains feasible to problem (6.1') as long as d lies in the null space of matrix B and 0 ≤ α ≤ 1. In particular, if 0 ≤ α < 1, then y(α) remains an interior solution, and its inverse image

    x(α) = T_x̄^{-1}(y(α)) = X̄ y(α) / (e^T X̄ y(α))        (6.11)

becomes a new interior solution to the original problem (6.1). Also note that since

    r = 1/√(n(n-1)) > 1/n

we may replace Equation (6.10) by

    y(α) = y + (α/n)(d / ||d||)        (6.10')

for 0 ≤ α ≤ 1, to obtain a new interior feasible solution.


After determining the structure of the feasible directions in the transformed space, we focus on finding a good feasible direction that eventually leads to an optimal solution. Since y is at the center of Δ, from the first insight mentioned in Section 6.1, it makes sense to move along the steepest descent of the objective function. Although the objective function (6.1'a) is no longer a linear function (actually, it is a fractional linear function), Karmarkar pointed out that the linear numerator function c^T X̄y could be a good indication of the reduction of the objective function. Therefore, we take its negative gradient, which is -(c^T X̄)^T, or equivalently -X̄c, as a good candidate. In order to keep feasibility, we further project the negative gradient into the null space of the constraint matrix B. From basic knowledge of linear algebra, we have the following formula for the projected negative gradient:

    d = -[I - B^T (B B^T)^{-1} B] X̄c        (6.12)
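Formula (6.12) is a standard least-squares projection and can be computed with one linear solve. A direct transcription (our sketch; a production implementation would factor B B^T rather than solve from scratch):

    import numpy as np

    def moving_direction(A, c, x_bar):
        # Projected negative gradient (6.12) at the interior point x_bar
        n = len(x_bar)
        X = np.diag(x_bar)
        B = np.vstack([A @ X, np.ones((1, n))])   # constraint matrix B of (6.9)
        g = X @ c                                 # gradient of the numerator c^T Xy
        return -(g - B.T @ np.linalg.solve(B @ B.T, B @ g))

For the data of Example 6.4 below (A = [0, 1, -1], c = (-1, 0, 0)^T, x̄ = (1/3, 1/3, 1/3)^T), this returns (2/9, -1/9, -1/9)^T.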
Now it is easy to describe the basic steps of Karmarkar's algorithm. The algorithm starts with an interior solution in the original space, maps the solution to the center of Δ by a projective transformation, applies Equation (6.12) to find a good moving direction, chooses an appropriate step-length and uses Equation (6.10') to move to a new interior feasible solution in the transformed space, and then maps the new solution back to the original space according to Equation (6.11) to gain a fair amount of reduction in the objective function. By repeating this iterative process, Karmarkar showed his algorithm could terminate in O(nL) iterations to reach an optimal solution. We shall study his proof in the next section. Here we provide an iterative procedure for the implementation of Karmarkar's algorithm.

Step 1 (initialization): Set k = 0, x^0 = e/n, and L to be a large positive integer.

Step 2 (optimality check): If

    c^T x^k ≤ 2^{-L} (c^T e / n)

then stop with an optimal solution x* = x^k. Otherwise, go to Step 3.

Step 3 (find a better solution): Let

    X_k = diag(x^k)

    B_k = [ A X_k ]
          [  e^T  ]

    d^k = -[I - B_k^T (B_k B_k^T)^{-1} B_k] X_k c

    y^{k+1} = e/n + (α/n)(d^k / ||d^k||)   for some 0 < α ≤ 1

    x^{k+1} = X_k y^{k+1} / (e^T X_k y^{k+1})

Set k = k + 1; go to Step 2.

Note that in this computational procedure x^k is always an interior feasible solution; X_k is an n × n diagonal matrix with the ith element of vector x^k as its ith diagonal element; B_k is the constraint matrix of a linear programming problem in Karmarkar's standard form as defined in Equation (6.9); d^k is a feasible direction of the projected negative gradient as defined in Equation (6.12); y^{k+1} is a new interior feasible solution in the transformed space as defined in Equation (6.10'); and x^{k+1} is a new interior feasible solution as defined by Equation (6.11). Also note that the constant L in Step 2 is usually chosen to be the problem size as defined in Chapter 5, or a multiple of the problem size, such that 2^{-L} < ε for a given tolerance ε > 0. We shall prove in the next section that, if the step size α is chosen to be 1/3, then the algorithm terminates in O(nL) iterations. But for real applications, a larger value of α tends to speed up the convergence.
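Before turning to a worked example, here is a direct transcription of Steps 1 through 3 (a sketch of ours; the stopping rule uses a plain tolerance in place of 2^{-L}, and the cost of Example 6.1 is rewritten as c = (0, 1, 1)^T, since -x1 + 1 = x2 + x3 on the simplex and this form has the zero optimum required by (A2)):

    import numpy as np

    def karmarkar(A, c, alpha=0.9, tol=1e-6, max_iter=100):
        # Karmarkar's projective scaling algorithm under assumptions (A1), (A2)
        m, n = A.shape
        e = np.ones(n)
        x = e / n                                  # Step 1: start at the center
        for _ in range(max_iter):
            if c @ x <= tol * (c @ e / n):         # Step 2: optimality check
                break
            X = np.diag(x)                         # Step 3: find a better solution
            B = np.vstack([A @ X, e])              # constraint matrix B_k of (6.9)
            g = X @ c
            d = -(g - B.T @ np.linalg.solve(B @ B.T, B @ g))   # direction (6.12)
            y = e / n + (alpha / n) * d / np.linalg.norm(d)    # move (6.10')
            x = X @ y / (e @ X @ y)                # map back via (6.11)
        return x

    A = np.array([[0.0, 1.0, -1.0]])               # constraint of Example 6.1
    c = np.array([0.0, 1.0, 1.0])                  # equivalent cost with zero optimum
    print(karmarkar(A, c))                         # approaches the optimum (1, 0, 0)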
The following example illustrates one iteration of Karmarkar's algorithm.
Example 6.4

Solve Example 6.1 by Karmarkar's algorithm.
First we see that the linear programming problem is in Karmarkar's standard form, which satisfies both assumptions (A1) and (A2). Hence we start with

    x^0 = (1/3, 1/3, 1/3)^T

and note that A = [0, 1, -1] and c^T = (-1, 0, 0).
Now check Step 2. From Equation (5.4), we can choose L = 20 and easily see that the objective value at x^0 is too high. Therefore we have to find a better solution.
For Step 3, we define

    X_0 = diag(1/3, 1/3, 1/3)

then A X_0 = [0, 1/3, -1/3] and

    B_0 = [ 0   1/3  -1/3 ]
          [ 1    1     1  ]

Moreover, the moving direction is given by

    d^0 = -[I - B_0^T (B_0 B_0^T)^{-1} B_0] X_0 c = (2/9, -1/9, -1/9)^T

with norm ||d^0|| = √6/9. For purposes of illustration, we choose α = 1/√6 to obtain a new solution in the transformed space

    y^1 = (1/3, 1/3, 1/3)^T + (1/3)(1/√6)(9/√6)(2/9, -1/9, -1/9)^T = (4/9, 5/18, 5/18)^T

Hence the new interior feasible solution is given by

    x^1 = X_0 y^1 / (e^T X_0 y^1) = (4/9, 5/18, 5/18)^T

Continuing this iterative process, Karmarkar's algorithm will stop at the optimal solution x* = (1, 0, 0)^T. It is worth mentioning that if we take α = 6/√6 > 1, then y^1 = (1, 0, 0)^T and x^1 = x*. Hence direction d^0 really points to the optimal solution.

6.4 POLYNOMIAL-TIME SOLVABILITY

In this section we show that Karmarkar's algorithm terminates in O(nL) iterations under assumptions (A1) and (A2). The key to proving this polynomial-time solvability is to find

an appropriate step-length α such that the objective value after each iteration decreases at a geometric rate. In particular, Karmarkar showed that, for α = 1/3,

    c^T x^k ≤ e^{-k/5n} (c^T x^0)   for k = 1, 2, ...        (6.13)

In this way, for L (or a multiple of it) large enough such that 2^{-L}(c^T x^0) ≈ 0, we need only choose k satisfying

    e^{-k/5n} (c^T x^0) ≤ 2^{-L} (c^T x^0)        (6.14)

Then we can terminate the algorithm to the precision level we want. Taking the natural logarithm of (6.14), we see the requirement becomes

    k ≥ (5 log_e 2) nL        (6.15)

In other words, if k > 5nL, the algorithm could be terminated with c^T x^k < 2^{-L}(c^T x^0). Hence Karmarkar's algorithm requires only a polynomial number O(nL) of iterations.
Notice that (6.13) is equivalent to

    log_e (c^T x^k) ≤ log_e (c^T x^0) - k/5n        (6.16)

or

    n log_e (c^T x^k) ≤ n log_e (c^T x^0) - k/5        (6.17)

This shows that the requirement (6.13) will be met if at each iteration we can reduce the function value of n log_e (c^T x) by at least a constant of 1/5. Remember that the direction of movement in Karmarkar's algorithm was chosen to be the projected negative gradient in order to reduce the function value of c^T X_k y, which is clearly different from the desired function n log_e (c^T x). To link these two different objectives together, Karmarkar defined a potential function for each interior point x of Δ and cost vector c as follows:

    f(x; c) = Σ_{j=1}^n log_e (c^T x / x_j)        (6.18)
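A small numerical illustration of (6.18) (ours, not from the text): evaluating the potential on the two iterates of Example 6.4, with the cost written in the equivalent form c = (0, 1, 1)^T so that c^T x stays positive, shows a drop of about 0.47, comfortably more than the constant 1/5 discussed above.

    import numpy as np

    def potential(x, c):
        # Karmarkar potential (6.18); requires c^T x > 0 and x > 0
        return np.sum(np.log(c @ x / x))

    c = np.array([0.0, 1.0, 1.0])
    x0 = np.array([1/3, 1/3, 1/3])
    x1 = np.array([4/9, 5/18, 5/18])               # iterate from Example 6.4

    print(potential(x0, c) - potential(x1, c))     # about 0.47 >= 1/5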

Two simple properties can be derived from this definition. First, in the transformed solution space, we have a corresponding potential function

    f'(y) = f(y; X_k c) = Σ_{j=1}^n log_e (c^T X_k y / y_j)        (6.19a)

Remember that

    y = T_{x^k}(x) = X_k^{-1} x / (e^T X_k^{-1} x)

hence we have

    f'(y) = Σ_{j=1}^n log_e (c^T x / x_j) + Σ_{j=1}^n log_e x_j^k        (6.19b)

Since Σ_{j=1}^n log_e x_j^k = log_e (det X_k), where det X_k is the determinant of the diagonal matrix X_k, the previous equation shows that the potential function is an invariant under the projective transformation T_{x^k}, which satisfies the relation

    f'(y) = f(x; c) + log_e (det X_k)        (6.20)
The second property is based on the observation that

    f(x^k) = f'(e/n) - log_e (det X_k)

and

    f(x^{k+1}) = f'(y^{k+1}) - log_e (det X_k)

Therefore if we can reduce the potential function f'(e/n) by a constant in the transformed solution space at each iteration, then f(x^k) is reduced by the same amount after each iteration taken in the original space. In particular, if we can show that

    f'(y^{k+1}) ≤ f'(e/n) - 1/5   for k = 0, 1, 2, ...        (6.21)

then

    f(x^{k+1}) ≤ f(x^k) - 1/5   for k = 0, 1, 2, ...

Consequently, we have

    f(x^k) ≤ f(x^0) - k/5   for k = 1, 2, ...

or

    n log_e (c^T x^k) - Σ_{j=1}^n log_e x_j^k ≤ n log_e (c^T x^0) - Σ_{j=1}^n log_e x_j^0 - k/5

Note that x^0 is at the center e/n of Δ and the function value of Σ_{j=1}^n log_e x_j over Δ is maximized at the center of Δ; hence condition (6.17),

    n log_e (c^T x^k) ≤ n log_e (c^T x^0) - k/5

is immediately achieved to guarantee the polynomial-time termination of Karmarkar's algorithm.
The remaining work is to show that condition (6.21) holds for an appropriately chosen step-length α in Karmarkar's algorithm. Recall from (6.19a) that

    f'(y) = n log_e (c^T X_k y) - Σ_{j=1}^n log_e y_j

We examine its two terms separately. First we show a lemma as follows.

Lemma 6.1. In Karmarkar's algorithm, let

    y = e/n + (α/n)(d/||d||)   for some 0 ≤ α ≤ 1

where

    d = -[I - B_k^T (B_k B_k^T)^{-1} B_k] X_k c

then

    n log_e (c^T X_k y) ≤ n log_e (c^T X_k e / n) - α        (6.22)

Proof. Note that the direction vector d is obtained as the projection of the negative cost vector -X_k c; hence c^T X_k d = -||d||^2. Then we have

    c^T X_k y = (c^T X_k e)/n - (α/n)||d||

Moreover, we define

    y(β) = e/n + β (d/||d||)

and S'(e/n, β) to be the spheroid in the transformed space which has a center at e/n with a radius β ≥ 0. In this way, if we take

    β = R = √((n-1)/n)

then y(R) is the minimizer of the following problem:

    Minimize    c^T X_k y
    subject to  A X_k y = 0
                y ∈ S'(e/n, R)

which is a relaxation of the problem

    Minimize    c^T X_k y
    subject to  A X_k y = 0
                e^T y = 1,  y ≥ 0

Notice that the latter problem is closely related to problem (6.1). By Karmarkar's second assumption (A2), we know its optimum value is zero. Hence we know the optimal objective value of the relaxed problem is nonpositive and

    c^T X_k y(R) = (c^T X_k e)/n - R ||d|| ≤ 0

This implies that

    ||d|| ≥ (1/R) (c^T X_k e)/n

Since R = √((n-1)/n) < 1, we further have

    c^T X_k y = (c^T X_k e)/n - (α/n)||d|| ≤ (1 - α/(nR)) (c^T X_k e)/n ≤ (1 - α/n) (c^T X_k e)/n

Taking logarithms on both sides and using the fact that log_e (1 - α/n) ≤ -α/n, we have the desired result (6.22).

To take care of the other term, -Σ_{j=1}^n log_e y_j, in the potential function, we have the following lemma.

Lemma 6.2. If y ∈ S'(e/n, α/n), then

    -Σ_{j=1}^n log_e y_j ≤ -Σ_{j=1}^n log_e (1/n) + α^2 / (2(1-α)^2)        (6.23)

Proof. Since y ∈ S'(e/n, α/n), we know

    y_j ≥ 1/n - α/n

and hence n y_j ≥ 1 - α, for j = 1, 2, ..., n. Taking the Taylor series expansion of log_e (1 + (n y_j - 1)), for each j, there is a μ_j between 1 and n y_j such that

    log_e (n y_j) = (n y_j - 1) - (n y_j - 1)^2 / (2 μ_j^2)

In other words, we have μ_j ≥ 1 - α such that

    log_e (n y_j) ≥ (n y_j - 1) - (n y_j - 1)^2 / (2(1 - α)^2)

Notice that

    Σ_{j=1}^n (n y_j - 1) = 0

and

    Σ_{j=1}^n (n y_j - 1)^2 = ||n y - e||^2 = n^2 ||y - e/n||^2 ≤ α^2

therefore

    Σ_{j=1}^n log_e (n y_j) ≥ -α^2 / (2(1 - α)^2)

and (6.23) follows directly.

Combining (6.22) and (6.23), we see the potential function satisfies

    f'(y) ≤ f'(e/n) - α + α^2 / (2(1 - α)^2)   for an appropriate α

In particular, if we choose α = 1/3, then

    f'(y) ≤ f'(e/n) - 5/24

Therefore condition (6.21) is satisfied, and we have the following result as a major theorem for polynomial-time solvability.

Theorem 6.1. Under the assumptions (A1) and (A2), if the step-length is chosen to be α = 1/3, then Karmarkar's algorithm stops in O(nL) iterations.

The computational work at each iteration of Karmarkar's algorithm is dominated by inverting the matrix B_k B_k^T. A simpleminded direct implementation with exact arithmetic requires O(n^3) elementary operations to find the inverse matrix. Hence the total complexity of Karmarkar's algorithm becomes O(n^4 L). On the other hand, for finite-precision mathematics, carrying out all computations to the O(L) precision level requires O(n^3 L) bit operations in inverting a matrix; hence Karmarkar's algorithm requires a total of O(n^4 L^2) bit operations. However, as shown by N. Karmarkar, using the rank-one updating method, the average computation per iteration can be reduced to O(n^{2.5} L) bit operations with O(L) precision. This reduction results in a total of O(n^{3.5} L^2) bit operations. Also note that, although when the step-length is set to be 1/3 we can achieve the theoretic polynomial-time solvability, in real applications we may use a much larger step-length to speed up the convergence. It has been confirmed that the new method typically requires only 20 to 50 iterations to provide highly accurate solutions, even for very large problems. We shall discuss further implementation issues in Chapter 10.
Note that at each iteration of Karmarkar's algorithm, the current solution always stays in the interior of the feasible domain, even when the algorithm terminates with a solution x^k such that c^T x^k < 2^{-L}(c^T e/n). In order to obtain an exact extreme-point optimal solution, we have to further verify the basic and nonbasic solution variables. This can be done by a polynomial-time procedure called the purification scheme. The basic idea is quite simple. Looking at problem (6.1), there are n + m + 1 constraints (including both explicit and nonnegativity constraints) in total. If n linearly independent constraints are binding at x^k, then it is already a basic feasible solution. Otherwise, we can find a nonzero direction d in the null space of the binding constraints. If c^T d < 0, then we move along direction d, otherwise along -d, until some additional constraints become binding for feasibility considerations. Since the feasible domain is bounded, we can always find a new solution with at least one more binding constraint. The objective value of this new solution is obviously at least as good as c^T x^k. Repeating this process, a basic feasible solution x* can eventually be identified such that c^T x* < 2^{-L}(c^T e/n).
Since we can begin with the m + 1 linearly independent explicit constraints, the purification scheme takes at most n - (m + 1) steps. Also note that in each step the computational complexity is polynomial, hence the purification scheme is a polynomial-time procedure. An efficient implementation requires a complexity bound of O(m^2 n).
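One purification step is easy to sketch in code (ours; it uses a dense null-space routine and handles only the nonnegativity constraints becoming active, which is all that can happen inside the simplex):

    import numpy as np
    from scipy.linalg import null_space

    def purify_step(A, c, x, tol=1e-10):
        # Move along a null-space direction of the binding constraints of (6.1)
        # until at least one more nonnegativity constraint becomes binding.
        n = len(x)
        rows = [A, np.ones((1, n))]                # Ax = 0 and e^T x = 1 always bind
        for i in np.flatnonzero(x <= tol):         # plus the binding x_i = 0
            r = np.zeros((1, n)); r[0, i] = 1.0
            rows.append(r)
        N = null_space(np.vstack(rows))
        if N.shape[1] == 0:
            return x                               # already a basic feasible solution
        d = N[:, 0]
        if c @ d > 0:
            d = -d                                 # never let the objective increase
        neg = d < -tol                             # largest t keeping x + t d >= 0
        t = np.min(-x[neg] / d[neg])
        return x + t * d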
It is also worth mentioning that the diagonal elements of the matrix X A^T (A X^2 A^T)^{-1} A X could serve as indicators of optimal basis information. To illustrate this idea, we further define (M)^+ to be the generalized inverse of matrix M, DIAG(M) to be a column vector formed by the diagonal elements of matrix M, and X_p to be a diagonal matrix with p_i as its ith diagonal element, for an n-dimensional column vector p. We then define a new column vector

    u(p) = DIAG (X_p A^T (A X_p^2 A^T)^+ A X_p)

In this way, we can consider the following method for locating an optimal extreme-point solution x* from an approximated primal solution x^k:

Step 1: Given a small number ε > 0, set j = 0 and p^0 = x^k.

Step 2: Increase j by 1, compute p^j = u(p^{j-1}). Find

    I_1 = {i | p_i^j ≥ 1 - ε, 1 ≤ i ≤ n}   and   I_2 = {i | p_i^j ≤ ε, 1 ≤ i ≤ n}

Step 3: If I_1 ∪ I_2 = {1, 2, ..., n}, then stop. Otherwise, go to Step 2.

It can be shown that, as ε goes to zero, if x^k is sufficiently close to a nondegenerate optimal vertex x* of the linear programming problem, then {p^j} converges to a vector p* with m ones and n - m zeros at a cubic rate of convergence. In practice, when the above algorithm terminates, we set x_i* = 0 for those i ∈ I_2, and solve the remaining system of linear equations Ax* = b. Further information can be found in the original work of R. Tapia and Y. Zhang.

6.5 CONVERTING TO KARMARKAR'S STANDARD FORM

Consider a standard-form general linear programming problem

    Minimize    c^T x        (6.24a)
    subject to  Ax = b       (6.24b)
                x ≥ 0        (6.24c)

Our objective is to convert this problem into the standard form (6.1) required by Karmarkar, while satisfying the assumptions (A1) and (A2). We shall first see how to convert problem (6.24) into Karmarkar's form and then discuss the two assumptions.
The key feature of Karmarkar's standard form is the simplex structure, which of course results in a bounded feasible domain. Thus we want to regularize problem (6.24) by adding a bounding constraint

    e^T x ≤ Q

for some positive integer Q derived from the feasibility and optimality considerations. In the worst case, we can choose Q = 2^L, where L is the problem size. If this constraint is binding at optimality with the objective value of magnitude -2^{O(L)}, then we can show that the given problem (6.24) is unbounded.
By introducing a slack variable x_{n+1}, we have a new linear program:

    Minimize    c^T x                        (6.25a)
    subject to  Ax = b                       (6.25b)
                e^T x + x_{n+1} = Q          (6.25c)
                x ≥ 0,  x_{n+1} ≥ 0          (6.25d)


In order to keep the matrix structure of A undisturbed for sparsity manipulation,
we introduce a new variable Xn+2 = 1 and rewrite the constraints of problem (6.25) as
Ax- bxn+2 = 0 (6.26b)
e 7 X+ Xn+1- Qxn+2 = 0 (6.26c)
7
e X+ Xn+1 + Xn+2 = Q + 1 (6.26d)
X ::=: 0, Xn+l ::=: 0, Xn+2 ::=: 0 (6.26e)
Note that the constraint Xn+2 = 1 is a direct consequence of (6.26c) and (6.26d). To
normalize (6.26d) for the required simplex structure, we apply the transformation Xj =
(Q + 1)yj, for j = 1, ... , n + 2, to (6.26). In this way, we have an equivalent linear
programming problem
Minimize (Q + 1)(c7 y) (6.27a)
subject to Ay- byn+2 =0 (6.27b)
7
e Y + Yn+ 1 - Qyn+2 =0 (6.27c)
7
e Y + Yn+l + Yn+2 = 1 (6.27d)

Y ::=: 0, Yn+1 ::=: 0, Yn+2 ::=: 0 (6.27e)


Problem (6.27) is now in the standard form required by Karmarkar. In order to satisfy
assumption (A1), we may introduce an artificial variable Yn+ 3 with a large cost coefficient
128 Karmarkar's Projective Scaling Algorithm Chap. 6

M as designed in the big-M method and consider the following problem:


Minimize (Q + 1)(cT y) + Myn+3 (6.28a)
subject to Ay- byn+2 - [Ae- b]yn+3 = 0; (6.28b)

eT y + Yn+l - QYn+2- (n + 1- Q)Yn+3 = 0; (6.28c)

eT Y + Yn+l + Yn+2 + Yn+3 = 1; (6.28d)


YJ 2: 0, j = 1, ... , n +3 (6.28e)
Notice that y = ej (n + 3) is clearly an initial interior feasible solution to problem
(6.28). Moreover, a value M of magnitude 2°(L) exists which does not increase the
problem size and ensures a zero value of the artificial variable Yn+3 at optimality, provided
that problem (6.27) has a feasible domain.
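The construction of (6.28) is mechanical and can be automated. The sketch below (ours; Q and M are passed in as parameters rather than derived from the problem size L) assembles the Karmarkar-form data from (A, b, c) and verifies assumption (A1); the simplex constraint (6.28d) is just the e^T row of form (6.1) and needs no explicit storage:

    import numpy as np

    def to_karmarkar_form(A, b, c, Q, M):
        # Build the homogeneous constraints (6.28b)-(6.28c) and cost (6.28a).
        # Column order: x_1..x_n, slack x_{n+1}, homogenizing x_{n+2}, artificial x_{n+3}.
        m, n = A.shape
        art = (A @ np.ones(n) - b).reshape(-1, 1)          # the column Ae - b
        top = np.hstack([A, np.zeros((m, 1)), -b.reshape(-1, 1), -art])
        mid = np.concatenate([np.ones(n), [1.0, -Q, -(n + 1 - Q)]])
        A_new = np.vstack([top, mid])
        c_new = np.concatenate([(Q + 1) * c, [0.0, 0.0, M]])
        return A_new, c_new

    A = np.array([[1.0, 1.0]]); b = np.array([1.0]); c = np.array([1.0, 2.0])
    A_new, c_new = to_karmarkar_form(A, b, c, Q=100, M=1e6)
    print(A_new @ np.ones(A_new.shape[1]))   # all zeros, so (A1) holds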
Taking care of assumption (A2) poses a more difficult problem for us. It is obvious that not every linear programming problem has a zero optimal objective value. However, if somehow the optimal objective value z* of a given linear program is known, we can simply subtract z* from the objective function (6.28a) to get a zero optimal objective value. The real challenge comes from those linear programming problems with unknown optimal objective values. We shall discuss this subject in the next section.

6.6 HANDLING PROBLEMS WITH UNKNOWN OPTIMAL OBJECTIVE VALUES

Assumption (A2) requires the optimal objective value of a given linear program to be
zero. For those linear programming problems with a known optimal objective value, this
assumption can be easily taken care of. But for those with unknown optimal objective
values, we have to figure out a process to obtain that piece of information.
Originally, Karmarkar used the so-called sliding objective function method to handle the problem. We let z* be the unknown optimum value of the objective function and pick an arbitrary value z̄. Suppose we run Karmarkar's algorithm pretending that z̄ is the minimum value of the objective function, i.e., we try to minimize c^T x - z̄ for the given linear program. We also modify Step 3 of Karmarkar's algorithm as follows:
"After finding y^{k+1}, we check if

    c^T X_k y^{k+1} / (e^T X_k y^{k+1}) < z̄

If so, we choose a point ȳ^{k+1} on the line segment between e/n and y^{k+1} such that

    c^T X_k ȳ^{k+1} / (e^T X_k ȳ^{k+1}) = z̄

and assign x^{k+1} = T^{-1}(ȳ^{k+1}) instead of T^{-1}(y^{k+1})."


In this way, if z* ≤ z̄, then at each iteration of Karmarkar's algorithm, either we obtain a constant reduction (say 1/5 in our case) in the potential function or find a point that achieves the assumed minimum z̄. On the other hand, for z̄ < z*, eventually we get a proof that the assumed minimum is lower than the actual minimum, by noticing that Karmarkar's iteration is no longer able to produce a constant reduction in the potential function.
With this modification, we can describe the sliding objective function method as follows. Given that a lower bound l and an upper bound u on the objective function are known (otherwise, we can take l = -2^{O(L)} and u = 2^{O(L)} to start with), we further define a tentative lower bound l' and upper bound u' by

    l' = l + (1/3)(u - l)        (6.29)

and

    u' = l + (2/3)(u - l)        (6.30)

We pretend that l' is the minimum value of the objective function and run the modified algorithm. Karmarkar showed that in a polynomial number of iterations, the algorithm either identifies that l' is lower than the actual minimum or finds a feasible solution with an objective value lower than u'. For suppose l' is not lower than the actual minimum; then the constant reduction in the potential function in each iteration will force c^T x to be lower than u'. When l' is found to be too low or u' is too high, we replace l by l' or u by u' correspondingly and rerun the algorithm. Since the range u - l ≥ 0 shrinks geometrically after each run, we know that in O(nL) runs the range is reduced from 2^{O(L)} to 2^{-O(L)} and an optimal solution will be identified.
Another way to handle the unknown optimal objective values is to use the information of dual variables. Consider the dual of the linear programming problem (6.1). We have

    Maximize    z                                                   (6.31a)
    subject to  Σ_{i=1}^m a_ij w_i + z ≤ c_j,  j = 1, 2, ..., n     (6.31b)
                w ∈ R^m,  z ∈ R                                     (6.31c)

Notice that the dual problem (6.31) is always feasible, since we can choose any value of w_1, w_2, ..., w_m and let

    z = min_{j=1,...,n} ( c_j - Σ_{i=1}^m a_ij w_i )        (6.32)

such that (w, z) becomes a feasible solution to problem (6.31). For simplicity, we can write (6.31b) as

    A^T w + z e ≤ c        (6.31b')

and write (6.32) as

    z = min_j (c - A^T w)_j        (6.32')
If a given linear program (6.1) satisfies assumption (A2), then we know z ≤ 0 in the dual problem (6.31). Moreover, any dual feasible solution (w, z) provides a lower bound for the optimal objective value z* of problem (6.1). One immediate question is, how do we define dual variables associated with each iteration of Karmarkar's algorithm? Once this is settled, we can discuss how to use the dual information to handle problems with unknown optimal objective values.
To get a hint on the definition of dual variables at each iteration, we first consider the form of the dual variables (w*, z*) at optimum. Assume that x* is the optimal solution to problem (6.1) and denote matrix X* = diag(x_1*, ..., x_n*). At optimum, we know A^T w* ≤ c. By complementary slackness, we further have X* A^T w* = X* c. In order to represent w* in terms of x*, we multiply both sides by A X*. Hence we have

    A (X*)^2 A^T w* = A (X*)^2 c        (6.33)

This suggests that we might obtain good dual solutions by defining

    w^k = (A X_k^2 A^T)^{-1} A X_k^2 c        (6.34)

and

    z^k = min_j (c - A^T w^k)_j        (6.35)

at each iteration of Karmarkar's algorithm. This is indeed true under the nondegeneracy assumption, owing to the following theorem:

Theorem 6.2. Under the assumptions (A1) and (A2), if the iterates {x^k} defined in Karmarkar's algorithm converge to a nondegenerate basic feasible solution x* of problem (6.1), then {(w^k, z^k)} defined by (6.34) and (6.35) converges to an optimal solution of its dual problem (6.31).

Proof. Let X̄* be the principal submatrix of X* corresponding to the basic variables in x* and

    [ Ā   ]
    [ ē^T ]

be the basis matrix of the given linear program corresponding to x*. Then Ā has rank m and so does ĀX̄*. Hence we know Ā(X̄*)^2 Ā^T is nonsingular. Consequently, A(X*)^2 A^T = Ā(X̄*)^2 Ā^T is nonsingular.
By definition (6.34), we know (A X_k^2 A^T) w^k = A X_k^2 c for k = 1, 2, .... Noticing that matrix (A X_k^2 A^T) converges to the nonsingular matrix A(X*)^2 A^T and vector A X_k^2 c converges to A(X*)^2 c, it follows that w^k converges to the unique solution w* of Equation (6.33). But we already know that the optimal solution to problem (6.31) also satisfies Equation (6.33), hence {(w^k, z^k)} must converge to the optimal dual solution.
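Definitions (6.34) and (6.35) cost one linear solve per iteration. A sketch (ours), applied to the starting point of Example 6.4:

    import numpy as np

    def dual_estimate(A, c, x):
        # Dual estimates (6.34)-(6.35) at an interior iterate x
        X2 = np.diag(x**2)
        w = np.linalg.solve(A @ X2 @ A.T, A @ X2 @ c)
        z = np.min(c - A.T @ w)
        return w, z

    A = np.array([[0.0, 1.0, -1.0]])
    c = np.array([-1.0, 0.0, 0.0])
    print(dual_estimate(A, c, np.array([1/3, 1/3, 1/3])))
    # w = [0.], z = -1: a valid lower bound on min c^T x, attained at (1, 0, 0)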

The nondegeneracy assumption in Theorem 6.2 is essential to its validity. In order to deal with the general case as well as to handle problems with unknown optimal objective values, Todd and Burrell proposed a new way to define dual variables at each iteration. Their basic idea is to incorporate dual information {(w^k, z^k)} into Karmarkar's algorithm, with {z^k} being monotonically nondecreasing such that z^k can be used as an estimate of the unknown optimum value of the objective function.
Notice that for a primal feasible solution x, c^T x - z^k = c^T x - z^k e^T x = (c - z^k e)^T x; therefore we define

    c(z^k) = c - z^k e        (6.36)

In this way, when z* is unknown, we can consider replacing c by c(z^k) in the objective function at the kth iteration as an estimate. Now, assume that we can modify Karmarkar's algorithm by finding a sequence of feasible solutions x^k, w^k, and z^k such that

    x^k ∈ F = {x ∈ R^n | Ax = 0, e^T x = 1, x > 0}        (6.37)

    w^k ∈ R^m                                             (6.38)

    z^k = min_j (c - A^T w^k)_j                           (6.39)

    f(x^{k+1}; c(z^{k+1})) ≤ f(x^k; c(z^k)) - 1/5         (6.40)

at each iteration, for k = 0, 1, .... Then, before the optimum is reached, we know z^k ≤ z* < c^T x^k. Moreover, (6.37) and (6.40) directly imply that c^T x^k ≤ c^T x^0. Together with the definition of the potential function (6.18) and inequality (6.40), we know that

    f(x^k; c(z*)) ≤ f(x^0; c(z*)) - k/5        (6.41)

Therefore, the modified algorithm will converge in the same way as Karmarkar's algorithm. The remaining question is how to construct such a sequence of improved solutions.
For k = 0, since we know how to take care of assumption (A1), we can choose

    x^0 = e/n

and a corresponding z^0. Then (6.37)-(6.40) are clearly satisfied. We are now interested in knowing how to find x^{k+1}, w^{k+1}, and z^{k+1} satisfying (6.37)-(6.40), given that we have proceeded through the kth iteration. Before doing so, we need some notation and a key lemma. First, for a p × n matrix M with rank p, we denote by P_M = I - M^T (M M^T)^{-1} M the projection mapping onto the null space of M, i.e., {d ∈ R^n | Md = 0}. Also denote by

    P_e = I - e e^T / n

the projection mapping onto {d ∈ R^n | e^T d = 0}. Furthermore, we denote

    B̄ = [ A   ]
         [ e^T ]        (6.42)

Suppose that A has full row rank and Ae = 0; then B̄ has full row rank and

    P_B̄ = P_A P_e = P_e P_A        (6.43)

The key lemma is stated as follows.

Lemma 6.3. In applying the modified Karmarkar's algorithm with a given cost vector c ∈ R^n and explicit constraint matrix A such that Ae = 0, let d^k = -P_B̄ c, ŵ = (A A^T)^{-1} A c, and ẑ = min_j (c - A^T ŵ)_j. Then we have

    c^T ( e/n + (α/n)(d^k/||d^k||) ) ≤ (1 - α/n)(c^T e / n) + (α/n) ẑ

Proof. Since d^k is the projection of -c, we have ||d^k||^2 = c^T P_B̄ c = -c^T d^k, and

    c^T ( e/n + (α/n)(d^k/||d^k||) ) = (c^T e)/n - (α/n)||d^k||

Thus it suffices to show that

    ||d^k|| ≥ (c^T e)/n - ẑ

Notice that

    d^k = -P_B̄ c = -P_e P_A c = -P_e (c - A^T ŵ) = -(c - A^T ŵ) + e e^T (c - A^T ŵ)/n

Since Ae = 0, we get

    d_j^k = (c^T e)/n - (c - A^T ŵ)_j,   j = 1, ..., n

Also, before the optimum is reached,

    (c^T e)/n > ẑ

For some i, we have

    ẑ = (c - A^T ŵ)_i

hence

    d_i^k = (c^T e)/n - ẑ ≥ 0   and   ||d^k|| ≥ d_i^k = (c^T e)/n - ẑ
With the help of Lemma 6.3, we show how to find x^{k+1}, w^{k+1}, and z^{k+1} after the kth iteration. Let ŵ = (A X_k^2 A^T)^{-1} A X_k^2 c(z^k) and ẑ = min_j (c - A^T ŵ)_j. There are two cases, depending upon whether or not ẑ ≤ z^k.

Case 1. If ẑ ≤ z^k, then ẑ will not be a better estimate than z^k. We shall focus on satisfying (6.37) and (6.40). In this case, since

    min_j (c(z^k) - A^T ŵ)_j ≤ 0

and x^k ∈ F, we have

    min_j (X_k c(z^k) - X_k A^T ŵ)_j ≤ 0

We now apply Lemma 6.3 with c = X_k c(z^k), A = A X_k, and B̄ = B_k. Since the corresponding ẑ is nonpositive, this tells us that the transformed objective value can be reduced by a factor of (1 - α/n) by taking a step-length of α. Thus the potential function f(·; c(z^k)) can be reduced by at least 1/5 as before, if we move in the original space along the direction

    d^k = -X_k P_{B_k} X_k (c - z^k e)

This suggests that we set w^{k+1} = w^k, z^{k+1} = z^k and move along d^k for a new x^{k+1}; then (6.37)-(6.40) hold for the (k+1)th iteration.

Case 2. If ẑ > z^k, then min_j (c(z^k) - A^T ŵ)_j > 0 and

    min_j (X_k c(z^k) - X_k A^T ŵ)_j > 0        (6.44)

Note that

    X_k c(z^k) - X_k A^T ŵ = P_{AX_k} X_k c(z^k) = P_{AX_k} (X_k c - z^k x^k)

If we denote u = P_{AX_k} X_k c and v = P_{AX_k} x^k, then

    X_k c(z^k) - X_k A^T ŵ = u - z^k v

and (6.44) becomes

    min_j (u - z^k v)_j > 0

Now let z̄ = c^T x^k > z^k. We see that

    e^T (u - z̄ v) = (P_{AX_k} e)^T (X_k c - z̄ x^k) = e^T (X_k c - z̄ x^k) = c^T x^k - z̄ = 0

Therefore, min_j (u - z̄ v)_j ≤ 0, since the sum of its components is zero. Consequently, there exists z^{k+1} with z^k < z^{k+1} ≤ z̄ such that

    min_j (u - z^{k+1} v)_j = 0

In this case, z^{k+1} becomes a better estimate and we can define

    w^{k+1} = (A X_k^2 A^T)^{-1} A X_k^2 c(z^{k+1})        (6.45)

Note that

    X_k c(z^{k+1}) - X_k A^T w^{k+1} = u - z^{k+1} v        (6.46)
Since x^k > 0, we know min_j (c(z^{k+1}) - A^T w^{k+1})_j = 0, and hence

    min_j (c - A^T w^{k+1})_j = z^{k+1}        (6.47)

Thus z^k < z^{k+1} ≤ z*. Combining (6.40) with the definition of the potential function, we can show that

    f(x^k; c(z^{k+1})) ≤ f(x^k; c(z^k))        (6.48)

Moreover, from (6.46), we know min_j (X_k c(z^{k+1}) - X_k A^T w^{k+1})_j = 0; hence Lemma 6.3 can be applied with c = X_k c(z^{k+1}), A = A X_k, and B̄ = B_k. Since the corresponding ẑ = 0, the potential function f(·; c(z^{k+1})) can be reduced by at least 1/5 as before by moving in the original space along the direction

    d^k = -X_k P_{B_k} X_k (c - z^{k+1} e)
Combining the analysis of both cases, we state the modified step in Karmarkar's algorithm as follows:

At iteration k with x^k, w^k, and z^k, set X_k = diag(x^k), and compute

    u = P_{AX_k} X_k c,   v = P_{AX_k} x^k

If min_j (u - z^k v)_j ≤ 0, then set

    z^{k+1} = z^k,   w^{k+1} = w^k

Otherwise, find

    z^{k+1} > z^k   with   min_j (u - z^{k+1} v)_j = 0

and set

    w^{k+1} = (A X_k^2 A^T)^{-1} A X_k^2 c(z^{k+1})

Compute d^k = -X_k P_e (u - z^{k+1} v), and

    x̄^{k+1} = x^k + (1/(3n)) (d^k / ||d^k||)

Set

    x^{k+1} = x̄^{k+1} / (e^T x̄^{k+1})
The modified algorithm then generates a sequence {x^k} of primal feasible solutions and a sequence {(w^k, z^k)} of dual solutions such that both c^T x^k and z^k converge to the unknown optimal objective value z*.
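The only nonstandard computation in the modified step is locating z^{k+1} with min_j (u - z^{k+1} v)_j = 0. Since g(z) = min_j (u_j - z v_j) is a concave piecewise-linear function of z with g(z^k) > 0 and g(z̄) ≤ 0 in Case 2, a simple bisection suffices (a sketch of ours):

    import numpy as np

    def next_z(u, v, z_k, z_bar, iters=60):
        # Bisection for the root of g(z) = min_j (u - z v)_j on (z_k, z_bar]
        g = lambda z: np.min(u - z * v)
        lo, hi = z_k, z_bar
        for _ in range(iters):
            mid = 0.5 * (lo + hi)
            if g(mid) > 0:
                lo = mid
            else:
                hi = mid
        return hi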

6.7 UNCONSTRAINED CONVEX DUAL APPROACH

As pointed out in the previous section, the dual problem of Karmarkar's linear program inherits some interesting properties. In this section, we show that, given an arbitrarily small number ε > 0, an ε-optimal solution to a general linear program in Karmarkar's standard form can be found by solving an unconstrained convex programming problem.
Let us focus on the linear programming problem (6.1) and its dual problem (6.31), with an additional assumption that problem (6.1) has a strictly interior feasible solution x such that x_j > 0 for j = 1, ..., n. We consider the following simple geometric inequality:

    Σ_{j=1}^n e^{y_j} ≥ Π_{j=1}^n (e^{y_j} / x_j)^{x_j}        (6.49)

which holds for arbitrary y_j ∈ R and x_j > 0, j = 1, 2, ..., n, with

    Σ_{j=1}^n x_j = 1

The equality in (6.49) occurs if and only if

    e^{y_j} = λ x_j,   j = 1, 2, ..., n        (6.50)

for a constant λ > 0. We further expand (6.49) by substituting

    y_j = (Σ_{i=1}^m a_ij w_i - c_j) / μ,   for j = 1, 2, ..., n and μ > 0

Taking logarithms on both sides and rearranging terms, we have

    Σ_{j=1}^n (Σ_{i=1}^m a_ij w_i - c_j) x_j ≤ μ Σ_{j=1}^n x_j log_e x_j + μ log_e { Σ_{j=1}^n exp[(Σ_{i=1}^m a_ij w_i - c_j)/μ] }        (6.51)

which holds true for arbitrary w_i ∈ R, i = 1, 2, ..., m, and x_j > 0, j = 1, 2, ..., n, with Σ_{j=1}^n x_j = 1 and μ > 0. Moreover, inequality (6.51) becomes an equality if and only if

    x_j = exp[(Σ_{i=1}^m a_ij w_i - c_j)/μ] / Σ_{l=1}^n exp[(Σ_{i=1}^m a_il w_i - c_l)/μ],   j = 1, 2, ..., n        (6.52)

Now, let us assume that the n-vector x also satisfies the constraint (6.1b) of the linear programming problem. Then

    Σ_{j=1}^n a_ij x_j = 0,   i = 1, 2, ..., m

and

    Σ_{j=1}^n (Σ_{i=1}^m a_ij w_i - c_j) x_j = -Σ_{j=1}^n c_j x_j        (6.53)

Hence, after rearrangement, (6.51) reduces to

    -μ log_e { Σ_{j=1}^n exp[(Σ_{i=1}^m a_ij w_i - c_j)/μ] } ≤ c^T x + μ Σ_{j=1}^n x_j log_e x_j        (6.54)

which holds true for arbitrary w_i ∈ R, i = 1, 2, ..., m, and x_j > 0, j = 1, 2, ..., n, satisfying constraints (6.1b), (6.1c). The equality holds if and only if (6.52) is true. Note that in (6.54), the term Σ_{j=1}^n x_j log_e x_j is usually named the entropy function associated with a probability function.

6.7.1 ε-Optimal Solution

In nonlinear programming literature, inequality (6.54) is usually referred to as the "weak duality theorem," where the right-hand side is minimized and the left-hand side is maximized. To derive an ε-optimal solution we simply consider the maximization of the left-hand side of (6.54), with respect to unconstrained w_i, i = 1, 2, ..., m. If we let

    h(w; μ) = -μ log_e { Σ_{j=1}^n exp[(Σ_{i=1}^m a_ij w_i - c_j)/μ] }        (6.55)

it can be shown that h(w; μ) is a strictly concave function of w. Also, under the assumption that there is a feasible interior solution to the linear programming problem (6.1), inequality (6.54) implies that h(w; μ) is bounded from above. Hence a unique maximum solution w* exists.
Taking derivatives of h(w; μ) at w*, we have

    ∂h(w*; μ)/∂w_i = -Σ_{j=1}^n a_ij exp[(Σ_{l=1}^m a_lj w_l* - c_j)/μ] / Σ_{j=1}^n exp[(Σ_{l=1}^m a_lj w_l* - c_j)/μ] = 0,   i = 1, 2, ..., m        (6.56)

Taking second-order derivatives, we can easily verify that w* really achieves the maximum of h(w; μ) over w ∈ R^m.
Let us define the n-vector x* as follows:

    x_j* = exp[(Σ_{i=1}^m a_ij w_i* - c_j)/μ] / Σ_{l=1}^n exp[(Σ_{i=1}^m a_il w_i* - c_l)/μ],   j = 1, 2, ..., n        (6.57)

Then, Equation (6.56) implies that x* satisfies the constraint (6.1b), and Equation (6.57) implies that x* satisfies the constraints (6.1c). Hence x* is a feasible solution to problem (6.1). Moreover, each x_j* satisfies the condition specified in (6.52) with w = w*, and hence (6.54) becomes an equality with x and w being replaced by x* and w*, respectively. We summarize the previous results as follows:

Theorem 6.3. Let w* be the unique maximum of the concave function h(w; μ) with μ > 0. If x* is defined by Equation (6.57), then

    h(w*; μ) = -μ log_e { Σ_{j=1}^n exp[(Σ_{i=1}^m a_ij w_i* - c_j)/μ] } = c^T x* + μ Σ_{j=1}^n x_j* log_e x_j*        (6.58)
Notice that, for x ≥ 0 and e^T x = 1,

    -log_e n ≤ Σ_{j=1}^n x_j log_e x_j ≤ 0

Consequently, we know h(w*; μ) approaches c^T x* as μ goes to 0. Hence, when μ is sufficiently small, we can find a near-optimal solution x* to the linear programming problem (6.1) by solving an unconstrained maximization problem of the concave objective function h(w; μ), or equivalently, minimizing the unconstrained convex function -h(w; μ). The remaining question is, "How small should μ be such that x* obtained by (6.57) is ε-optimal, i.e., c^T x* - z* ≤ ε?"
To answer this question, we define

    z* = min_{j=1,...,n} ( c_j - Σ_{i=1}^m a_ij w_i* )        (6.59)

It can be easily seen that (w_1*, ..., w_m*, z*) is a feasible solution to the dual program (6.31). Without loss of generality, we assume that the minimum of the right-hand side of Equation (6.59) occurs at j = 1 and

    z* = c_1 - Σ_{i=1}^m a_i1 w_i*        (6.60)

Taking the logarithm of x_1* as defined in Equation (6.57), and multiplying the result by μ, we have

    μ log_e x_1* = (Σ_{i=1}^m a_i1 w_i* - c_1) - μ log_e { Σ_{l=1}^n exp[(Σ_{i=1}^m a_il w_i* - c_l)/μ] }        (6.61)

Combining (6.58) and (6.60), we see that

    μ log_e x_1* = -z* + c^T x* + μ Σ_{j=1}^n x_j* log_e x_j*        (6.62)

Moreover, from the theory of linear programming, we know 0 ≤ c^T x* - z*. Therefore,

    0 ≤ c^T x* - z* = μ log_e x_1* - μ Σ_{j=1}^n x_j* log_e x_j*
                    = μ Σ_{j=1}^n x_j* log_e x_1* - μ Σ_{j=1}^n x_j* log_e x_j*
                    = μ Σ_{j=1}^n log_e (x_1* / x_j*)^{x_j*}        (6.63)

Since x_j* > 0, for j = 1, 2, ..., n, and

    Σ_{j=1}^n x_j* = 1

considering the geometric inequality again, we have

    Σ_{j=1}^n x_1* ≥ Π_{j=1}^n (x_1* / x_j*)^{x_j*}        (6.64)

Since 1 ≥ x_1*, we have

    n ≥ Π_{j=1}^n (x_1* / x_j*)^{x_j*}        (6.65)

Therefore,

    Σ_{j=1}^n log_e (x_1* / x_j*)^{x_j*} ≤ log_e n        (6.66)

and Equation (6.63) reduces to

    0 ≤ c^T x* - z* ≤ μ log_e n        (6.67)

Now for any given tolerance ε > 0, we can define μ = ε / log_e n to guarantee that 0 ≤ c^T x* - z* ≤ ε.
Hence we have the following result:

Theorem 6.4. For any given ε > 0, we choose

    μ = ε / log_e n

and let w* be the unique minimum of the convex function -h(w; μ). If x* is defined by Equation (6.57), then

    0 ≤ c^T x* - z* ≤ ε        (6.68)

and (x*; w*, z*) becomes an ε-optimal solution pair to the primal problem (6.1) and its dual problem (6.31).
The following example illustrates the unconstrained dual approach to linear programming problems in Karmarkar's standard form.

Example 6.5

    Minimize    -x3
    subject to  x1 - x2 = 0
                x1 + x2 + x3 = 1,  x ≥ 0

It is easy to see that (0, 0, 1) is the optimal solution. In this case, we have a corresponding unconstrained convex programming problem:

    Minimize    μ log_e { exp[z/μ] + exp[-z/μ] + exp[1/μ] }
    subject to  z ∈ R

Taking its first-order necessary condition, we see z* = 0. Also by (6.57), we have

    x_1* = x_2* = 1 / (1 + 1 + exp[1/μ]),    x_3* = exp[1/μ] / (1 + 1 + exp[1/μ])

Therefore, both x_1* and x_2* decrease to 0 and x_3* increases to 1 as μ decreases to 0.
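The closed-form recovery in Example 6.5 is easy to tabulate (our sketch; the log-sum-exp shift keeps the exponentials from overflowing for small μ):

    import numpy as np

    def x_star(mu):
        # Primal recovery (6.57) for Example 6.5 at the dual optimum z* = 0
        s = np.array([0.0, 0.0, 1.0]) / mu        # exponents (A^T z* - c)/mu
        s -= s.max()                               # log-sum-exp shift for stability
        w = np.exp(s)
        return w / w.sum()

    for mu in [1.0, 0.5, 0.1]:
        print(mu, x_star(mu))                      # tends to (0, 0, 1) as mu -> 0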

The unconstrained convex programming approach allows us to have a different view of linear programming problems. The potential of customizing different unconstrained optimization techniques, including the descent methods, conjugate direction methods, and quasi-Newton methods, for finding an ε-optimal solution to the linear programming problem is certainly worthy of further exploration.

6.7.2 Extension

The work in the previous section actually suggests that we consider a perturbed problem (P_μ):

    Minimize    c^T x + μ Σ_{j=1}^n x_j log_e x_j
    subject to  Ax = 0
                e^T x = 1
                x ≥ 0
and its unconstrained convex dual program (D_μ):

    Maximize    h(w; μ)
    subject to  w ∈ R^m

Under the assumption that problem (6.1), and hence (P_μ), has an interior feasible solution, there is no duality gap between problems (P_μ) and (D_μ). Moreover, when A has full row rank, for any given tolerance ε > 0, by choosing

    μ = ε / log_e n

the optimal solution w* of problem (D_μ) generates a primal feasible solution x* of problem (6.1), according to Equation (6.57), and a dual feasible solution (w*, z*) of problem (6.31), according to Equation (6.59), such that |c^T x* - z*| ≤ ε.
For a linear programming problem in its standard form, we consider a corresponding problem (P'_μ):

    Minimize    c^T x + μ Σ_{j=1}^n x_j log_e x_j
    subject to  Ax = b
                x > 0

Replacing the inequality (6.49) by the following one:

    e^y ≥ t y - t log_e t + t,   for t > 0        (6.69)

(with equality if and only if t = e^y), and following a similar derivation procedure with

    x_j = exp{ [(Σ_{i=1}^m a_ij w_i - c_j)/μ] - 1 },   for j = 1, 2, ..., n        (6.70)

we can construct an unconstrained concave program (D'_μ):

    Maximize    h'(w; μ) = Σ_{i=1}^m b_i w_i - μ Σ_{j=1}^n exp{ [(Σ_{i=1}^m a_ij w_i - c_j)/μ] - 1 }
    subject to  w ∈ R^m

With an additional assumption that problem (P'_μ) has a bounded feasible domain, a sufficiently small μ can be determined such that the optimal solution w* of problem (D'_μ) generates an ε-optimal solution x* to the original linear programming problem in standard form according to the following conversion formula:

    x_j* = exp{ [(Σ_{i=1}^m a_ij w_i* - c_j)/μ] - 1 },   for j = 1, 2, ..., n        (6.71)
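A gradient ascent on h'(w; μ) is particularly simple because ∇h'(w; μ) = b - A x(w), where x(w) is the candidate primal point (6.70); ascent stops when A x(w) ≈ b, and (6.71) then returns the ε-optimal point. A sketch (ours; fixed step size, no safeguards, small illustrative data):

    import numpy as np

    def entropic_dual_solve(A, b, c, mu=0.05, step=0.05, iters=3000):
        # Maximize h'(w; mu) of (D'_mu) by gradient ascent
        w = np.zeros(A.shape[0])
        for _ in range(iters):
            x = np.exp((A.T @ w - c) / mu - 1.0)   # candidate primal point (6.70)
            w += step * (b - A @ x)                # gradient of h' is b - A x(w)
        return np.exp((A.T @ w - c) / mu - 1.0)    # conversion formula (6.71)

    # min x1 + 2 x2  subject to  x1 + x2 = 1, x >= 0; the optimum is (1, 0)
    A = np.array([[1.0, 1.0]]); b = np.array([1.0]); c = np.array([1.0, 2.0])
    print(entropic_dual_solve(A, b, c))            # close to (1, 0), within O(mu)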

6.8 CONCLUDING REMARKS

Karmarkar's projective scaling algorithm has stimulated a great amount of research interest in linear programming. Since the work was introduced in 1984, many variants have been proposed and many more are to come. The fundamental difference between Karmarkar's algorithm and simplex methods is the philosophy of moving in the interior
versus moving on the boundary of the polytope. It is not true that Karmarkar-based
interior-point methods are going to replace the simplex methods, at least in the foresee-
able future. Both approaches are very sensitive to the structure of problems. The per-
formance is heavily affected by the sophistication of implementation. A hybrid method
of using the interior approach at the beginning for drastic reduction and shifting to the
simplex method for a final basic feasible solution seems attractive. We shall study the
interior-point methods further and discuss implementation issues in coming chapters.

REFERENCES FOR FURTHER READING

6.1. Anstreicher, K. M., "A combined phase I- phase II projective algorithm for linear program-
ming," Mathematical Programming 43, 209-223 (1989).
6.2. Anstreicher, K. M., "On the performance of Karmarkar's algorithm over a sequence of
iterations," SIAM Journal on Optimization I, 22-29 (1991).
6.3. Bayer, D., and Lagarias, J. C., "Karmarkar's linear programming algorithm and Newton's
method," Mathematical Programming 50, 291-330 (1991).
6.4. Fang, S. C., "A new unconstrained convex programming approach to linear programming," OR Report No. 243, North Carolina State University, Raleigh, NC (1990); Zeitschrift für Operations Research 36, 149-161 (1992).
6.5. Fang, S. C., and Tsao, J. H-S., "Solving standard form linear programs via unconstrained
convex programming approach with a quadratically convergent global algorithm," OR Report
No. 259, North Carolina State University, Raleigh, NC (1991).
6.6. Gay, D., "A variant of Karmarkar's linear programming algorithm for problems in standard
form," Mathematical Programming 37, 81-90 (1987).
6.7. Hooker, J. N., "Karmarkar's linear programming algorithm," Interfaces 16, 75-90 (1986).
6.8. Karmarkar, N., "A new polynomial time algorithm for linear programming," Proceedings of
the 16th Annual ACM Symposium on the Theory of Computing, 302-311 (1984).
6.9. Karmarkar, N., "A new polynomial time algorithm for linear programming," Combinatorica 4, 373-395 (1984).
6.10. Kojima, M., "Determining basic variables of optimal solutions in Karmarkar's new LP algo-
rithm," Algorithmica 1, 499-515 (1986).

6.11. Kortanek, K. 0., and Zhu, J., "New purification algorithms for linear programming," Naval
Research Logistics 35, 571-583 (1988).
6.12. Monteiro, R. C., "Convergence and boundary behavior of the projective scaling trajectories
for linear programming," Mathematics of Operations Research 16, No. 4 (1991).
6.13. Rajasekera, J. R., and Fang, S. C., "On the convex programming approach to linear programming," Operations Research Letters 10, 309-312 (1991).
6.14. Shanno, D. F., "Computing Karmarkar projection quickly," Mathematical Programming 41,
61-71 (1988).
6.15. Shub, M., "On the asymptotic behavior of the projective rescaling algorithm for linear pro-
gramming," Journal of Complexity 3, 258-269 (1987).
6.16. Stone, R. E., and Tovey, C. A., "The simplex and projective scaling algorithm as iteratively
reweighted least squares methods," SIAM Review 33, 220-237 (1991).
6.17. Tapia, R. A., and Zhang, Y., "Cubically convergent method for locating a nearby vertex in
linear programming," Journal of Optimization Theory and Applications 67, 217-225 (1990).
6.18. Tapia, R. A., and Zhang, Y., "An optimal-basis identification technique for interior-point
linear programming algorithms," Linear Algebra and Its Applications, 152, 343-363 (1991).
6.19. Todd, M. J., and Burrell, B. P., "An extension to Karmarkar's algorithm for linear programming using dual variables," Algorithmica 1, 409-424 (1986).
6.20. Todd, M. J., and Ye, Y., "A centered projective algorithm for linear programming," Mathe-
matics of Operations Research 15, 508-529 (1990).
6.21. Vanderbei, R. J., "Karmarkar's algorithm and problems with free variables," Mathematical
Programming 43, 31-44 (1989).
6.22. Ye, Y., "Karmarkar's algorithm and the ellipsoidal method," Operations Research Letters 4,
177-182 (1987).
6.23. Ye, Y., "Recovering optimal basic variables in Karmarkar's polynomial algorithm for linear
programming," Mathematics of Operations Research 15, 564-572 (1990).
6.24. Ye, Y., and Kojima, M., "Recovering optimal dual solutions in Karmarkar's polynomial
algorithm for linear programming," Mathematical Programming 39, 305-317 (1987).

EXERCISES

6.1. Focus on the n-dimensional Euclidean space.
(a) For a given point x ∈ Δ, looking at its coordinates, how can we identify whether it is a vertex of Δ? on an edge of Δ? in the interior of Δ? at the center of Δ?
(b) From (a), prove that Δ has n vertices and C(n, 2) edges.
(c) Show that the distance between the center and any vertex of Δ is given by

    R = √(n - 1) / √n

and the distance between the center and any facet of Δ is given by

    r = 1 / √(n(n - 1))

6.2. For a projective transformation T_x̄, prove results (T1) through (T6). What can one say about its inverse transformation?
6.3. Does the projective transformation T_x̄ map a line segment in Δ to a line segment? Why?
6.4. Why is x(α) in Equation (6.11) an interior feasible solution to problem (6.1)? Prove it.
6.5. Show that if matrix A in Equation (6.9) has full rank, then the matrix BB^T is invertible and hence the direction d in Equation (6.12) is well defined.
6.6. Carry out one more iteration of Example 6.4. Is it closer to the optimal solution?
6.7. Show that the function

    g(x) = Σ_{j=1}^n log_e x_j

achieves its maximum value over x ∈ Δ at

    x* = e/n

6.8. Apply Karmarkar's algorithm to solve Example 6.2.


6.9. Convert the linear programming problems in Exercise 3.16 into Karmarkar's standard form satisfying Assumption (A1).
6.10. In solving problem (6.25) with Q = 2^L, if (6.25c) becomes a binding constraint at optimality with the objective value of magnitude -2^{O(L)}, show that problem (6.24) is unbounded.
6.11. Show how the inequality c^T x^k ≤ c^T x^0 is implied by (6.37) and (6.40).
6.12. Show that (6.43) is true under the assumptions that A has full row rank and Ae = 0.
6.13. Consider h(w; μ) as defined by (6.55).
(a) Find its gradient vector ∇h(w; μ).
(b) Find its Hessian matrix H(w; μ).
(c) Show that H(w; μ) = ADA^T for a special diagonal matrix D with negative diagonal elements.
(d) When A is assumed to be of full row rank, show that H(w; μ) is symmetric, negative definite.
(e) Conclude that h(w; μ) is a strictly concave function of w.
6.14. Derive the dual objective function h(w; μ) for Example 6.2.
6.15. Code Karmarkar's algorithm and test the linear programming problems of Exercise 6.9.
7

Affine Scaling Algorithms

Since its introduction in 1984, Karmarkar's projective scaling algorithm has become the most notable interior-point method for solving linear programming problems. This pioneering work has stimulated a flurry of research activities in the field. Among all reported variants of Karmarkar's original algorithm, the affine scaling approach has especially attracted researchers' attention. This approach uses the simple affine transformation to replace Karmarkar's original projective transformation and allows people to work on linear programming problems in standard form. The special simplex structure required by Karmarkar's algorithm is relaxed.
The basic affine scaling algorithm was first presented by I. I. Dikin, a Soviet
mathematician, in 1967. Later, in 1985, the work was independently rediscovered by
E. Barnes and R. Vanderbei, M. Meketon, and B. Freedman. They proposed using the
(primal) affine scaling algorithm to solve the (primal) linear programs in standard form
and established convergence proof of the algorithm. A similar algorithm, the so-called
dual affine scaling algorithm, was designed and implemented by I. Adler, N. Karmarkar,
M. G. C. Resende, and G. Veiga for solving (dual) linear programs in inequality form.
Compared to the relatively cumbersome projective transformation, the implementation of
both the primal and dual affine scaling algorithms become quite straightforward. These
two algorithms are currently the variants subject to the widest experimentation and exhibit
promising results, although the theoretical proof of polynomial-time complexity was lost
in the simplified transformation. In fact, N. Megiddo and M. Shub's work indicated
that the trajectory leading to the optimal solution provided by the basic affine scaling
algorithms depends upon the starting solution. A bad starting solution, which is too
close to a vertex of the feasible domain, could result in a long journey traversing all
vertices. Nevertheless, the polynomial-time complexity of the primal and dual affine
scaling algorithms can be reestablished by incorporating a logarithmic barrier function
on the walls of the positive orthant to prevent an interior solution being "trapped" by the


boundary behavior. Along this direction, a third variant, the so-called primal-dual affine
scaling algorithm, was presented and analyzed by R. Monteiro, I. Adler, and M. G. C.
Resende, also by M. Kojima, S. Mizuno, and A. Yoshise, in 1987. The theoretical issue
of polynomial-time complexity was successfully addressed.
In this chapter, we introduce and study the abovementioned variants of affine scaling, using an integrated theme of iterative scheme. Attention will be focused on the three basic elements of an iterative scheme, namely, (1) how to start, (2) how to synthesize a good direction of movement, and (3) how to stop an iterative algorithm.

7.1 PRIMAL AFFINE SCALING ALGORITHM

Let us consider a linear programming problem in its standard form:

    Minimize    c^T x                    (7.1a)
    subject to  Ax = b,  x ≥ 0           (7.1b)

where A is an m × n matrix of full row rank, c and x are n-dimensional column vectors, and b is an m-dimensional column vector.
Notice that the feasible domain of problem (7.1) is defined by

    P = {x ∈ R^n | Ax = b, x ≥ 0}

We further define the relative interior of P (with respect to the affine space {x | Ax = b}) as

    P^0 = {x ∈ R^n | Ax = b, x > 0}        (7.2)

An n-vector x is called an interior feasible point, or interior solution, of the linear programming problem if x ∈ P^0. Throughout this book, for any interior-point approach, we always make a fundamental assumption:

    P^0 ≠ ∅

There are several ways to find an initial interior solution to a given linear programming problem. The details will be discussed later. For the time being, we simply assume that an initial interior solution x^0 is available and focus on the basic ideas of the primal affine scaling algorithm.

7.1.1 Basic Ideas of Primal Affine Scaling

Remember from Chapter 6 the two fundamental insights observed by N. Karmarkar in designing his algorithm. Since they are still the guiding principles for the affine scaling algorithms, we repeat them here:

(1) if the current interior solution is near the center of the polytope, then it makes sense to move in the direction of steepest descent of the objective function to achieve a minimum value;
(2) without changing the problem in any essential way, an appropriate transformation can be applied to the solution space such that the current interior solution is placed near the center in the transformed solution space.

In Karmarkar's formulation, the special simplex structure

    Δ = {x ∈ R^n | x_1 + ... + x_n = 1, x_i ≥ 0, i = 1, ..., n}

and its center point e/n = (1/n, 1/n, ..., 1/n)^T were purposely introduced for the realization of the above insights. When we directly work on the standard-form problem,
the simplex structure is no longer available, and the feasible domain could become an
unbounded polyhedral set. All the structure remaining is the intersection of the affine
space {x ∈ R^n | Ax = b} formed by the explicit constraints and the nonnegative orthant
{x ∈ R^n | x ≥ 0} required by the nonnegativity constraints. It is obvious that the
nonnegative orthant does not have a real "center" point. However, if we position ourselves
at the point e = (1, 1, ..., 1)^T, at least we still keep equal distance from each facet,
or "wall," of the nonnegative orthant. As long as the moving distance is less than one
unit, any new point that moves from e remains in the interior of the nonnegative orthant.
Consequently, if we were able to find an appropriate transformation that maps a current
interior solution to the point e, then, in parallel with Karmarkar's projective scaling
algorithm, we can state a modified strategy as follows.
"Take an interior solution, apply the appropriate transformation to the solution space so as
to place the current solution at e in the transformed space, and then move in the direction of
steep descent in the null space of the transformed explicit constraints, but not all the way to
the nonnegativity walls in order to remain as an interior solution. Then we take the inverse
transformation to map the improved solution back to the original solution space as a new
interior solution. Repeat this process until the optimality or other stopping conditions are
met."

An appropriate transformation in this case turns out to be the so-called affine


scaling transformation. Hence people named this variant the affine scaling algorithm.
Also, because it is directly applied to the primal problems in standard form, its full name
becomes the primal affine scaling algorithm.

Affine scaling transformation on the nonnegative orthant. Let x^k ∈ R^n
be an interior point of the nonnegative orthant R^n_+, i.e., x_i^k > 0 for i = 1, ..., n. We
define an n × n diagonal matrix

X_k = diag(x^k) = diag(x_1^k, x_2^k, ..., x_n^k)   (7.3)

It is obvious that matrix X_k is nonsingular with an inverse matrix X_k^{-1}, which is also a
diagonal matrix but with 1/x_i^k being its ith diagonal element, for i = 1, ..., n.

The affine scaling transformation is defined from the nonnegative orthant R^n_+ to
itself by

y = T_k(x) = X_k^{-1} x   (7.4)

Note that transformation (7.4) simply rescales the ith component of x by dividing it by the
positive number x_i^k. Geometrically, it maps a straight line to another straight line.
Hence it was named the affine scaling transformation. Figure 7.1 illustrates the geometric
picture of the transformation in two-dimensional space. Note that for the two-dimensional
inequality constraints, such as the case depicted by Figure 7.1, the scaling variables
include the slack variables, too. As a matter of fact, each edge of the polygon corresponds
to a slack variable being set to zero. However, it is difficult to represent the whole picture
in the same figure.
Figure 7.1 (the affine scaling transformation in two-dimensional space)

The following properties of T_k can be easily verified:

(T1) T_k is a well-defined mapping from R^n_+ to R^n_+, if x^k is an interior point of R^n_+.
(T2) T_k(x^k) = e.
(T3) T_k(x) is a vertex of R^n_+ if x is a vertex.
(T4) T_k(x) is on the boundary of R^n_+ if x is on the boundary.
(T5) T_k(x) is an interior point of R^n_+ if x is in the interior.
(T6) T_k is a one-to-one and onto mapping with an inverse transformation T_k^{-1} such that

T_k^{-1}(y) = X_k y,  for each y ∈ R^n_+.   (7.5)

Primal affine scaling algorithm. Suppose that an interior solution x^k to the
linear programming problem (7.1) is known. We can apply the affine scaling transformation
T_k to "center" its image at e. By the relationship x = X_k y shown in (7.5), in the
transformed solution space we have a corresponding linear programming problem

Minimize (c^k)^T y   (7.1'a)
subject to A_k y = b,  y ≥ 0   (7.1'b)

where c^k = X_k c and A_k = A X_k.
In Problem (7.1'), the image of x^k, i.e., y^k = T_k(x^k), becomes e, which keeps unit
distance away from the walls of the nonnegative orthant. Just as we discussed in Chapter
6, if we move along a direction d_y^k that lies in the null space of the matrix A_k = A X_k, for
an appropriate step-length α_k > 0, then the new point y^{k+1} = e + α_k d_y^k remains interior
feasible to problem (7.1'). Moreover, its inverse image x^{k+1} = T_k^{-1}(y^{k+1}) = X_k y^{k+1}
becomes a new interior solution to problem (7.1).
Since our objective is to minimize the value of the objective function, the strategy
of adopting the steepest descent applies. In other words, we want to project the negative
gradient −c^k onto the null space of matrix A_k to create a good direction d_y^k with improved
value of the objective function in the transformed space. In order to do so, we first define
the null space projection matrix by

P_k = I − A_k^T (A_k A_k^T)^{-1} A_k = I − X_k A^T (A X_k^2 A^T)^{-1} A X_k   (7.6)

Then, the moving direction d_y^k, similar to (6.12), is given by

d_y^k = P_k(−c^k) = −[I − X_k A^T (A X_k^2 A^T)^{-1} A X_k] X_k c   (7.7)

Note that the projection matrix P_k is well defined as long as A has full row rank
and x^k > 0. It is also easy to verify that A X_k d_y^k = 0. Figure 7.2 illustrates this projection
mapping.

Figure 7.2 (the negative gradient −c^k at y^k is projected onto the null space of A_k; dashed lines indicate constant objective planes)

Now we are in a position to translate, in the transformed solution space, the current
interior solution y^k = e along the direction d_y^k to a new interior solution y^{k+1} > 0 with
an improved objective value. In doing so, we have to choose an appropriate step-length
α_k > 0 such that

y^{k+1} = e + α_k d_y^k > 0   (7.8)

Notice that if d_y^k ≥ 0, then α_k can be any positive number without leaving the
interior region. On the other hand, if (d_y^k)_i < 0 for some i, then α_k has to be smaller
than 1/[−(d_y^k)_i], since y_i^{k+1} = 1 + α_k (d_y^k)_i must stay positive.

Therefore we can choose 0 < α < 1 and apply the minimum ratio test

α_k = min_i { α / [−(d_y^k)_i]  |  (d_y^k)_i < 0 }   (7.9)

to determine an appropriate step-length that guarantees the positivity of y^{k+1}. When α
is close to 1, the current solution is moved "almost all the way" to the nearest positivity
wall to form a new interior solution in the transformed space. This translation is also
illustrated in Figure 7.2.
Our next task is to map the new solution y^{k+1} back to the original solution space to
obtain an improved solution x^{k+1} to problem (7.1). This can be done by applying
the inverse transformation T_k^{-1} to y^{k+1}. In other words, we have

x^{k+1} = T_k^{-1}(y^{k+1}) = X_k y^{k+1}
        = x^k + α_k X_k d_y^k
        = x^k − α_k X_k P_k X_k c
        = x^k − α_k X_k [I − X_k A^T (A X_k^2 A^T)^{-1} A X_k] X_k c
        = x^k − α_k X_k^2 [c − A^T (A X_k^2 A^T)^{-1} A X_k^2 c]
        = x^k − α_k X_k^2 [c − A^T w^k]   (7.10)

where

w^k = (A X_k^2 A^T)^{-1} A X_k^2 c   (7.11)

This means the moving direction in the original solution space is d_x^k = −X_k^2 [c −
A^T w^k] and the step-length is α_k, while d_y^k = −X_k [c − A^T w^k] in the transformed space.
Several important observations can be made here:

Observation 1. Note that d_y^k = −P_k c^k and d_x^k = X_k d_y^k. Since P_k is a projection
matrix, we see that

c^T x^{k+1} = c^T x^k + α_k c^T X_k d_y^k
            = c^T x^k + α_k (c^k)^T d_y^k
            = c^T x^k − α_k (d_y^k)^T d_y^k
            = c^T x^k − α_k ||d_y^k||^2   (7.12)



This implies that x^{k+1} is indeed an improved solution if the moving direction d_y^k ≠ 0.
Moreover, we have the following lemmas:

Lemma 7.1. If there exists an x^k ∈ P^0 with d_y^k > 0, then the linear programming
problem (7.1) is unbounded.
Proof. Since d_y^k is in the null space of the constraint matrix A X_k and d_y^k > 0, we
know y^{k+1} = e + α_k d_y^k is feasible to problem (7.1') for any α_k > 0. Consequently,
letting α_k tend to positive infinity, Equation (7.12) implies that c^T x^{k+1} approaches
minus infinity, while x^{k+1} = x^k + α_k X_k d_y^k remains in P.

Lemma 7.2. If there exists an x^k ∈ P^0 with d_y^k = 0, then every feasible solution
of the linear programming problem (7.1) is optimal.
Proof. Remember that P_k is a null space projection matrix. For d_y^k = −P_k X_k c = 0,
we know that X_k c is in the orthogonal complement of the null space of matrix A X_k.
Since the orthogonal complement in this case is the row space of matrix A X_k, there
exists a vector u^k such that

(A X_k)^T u^k = X_k c,  or  (u^k)^T A X_k = c^T X_k

Since X_k^{-1} exists, it follows that (u^k)^T A = c^T. Now, for any feasible solution x,

c^T x = (u^k)^T A x = (u^k)^T b

Since (u^k)^T b does not depend on x, the value of c^T x remains constant over P.

Lemma 7.3. If the linear programming problem (7.1) is bounded below and its
objective function is not constant, then the sequence {c^T x^k | k = 1, 2, ...} is well defined
and strictly decreasing.
Proof. This is a direct consequence of Lemmas 7.1 and 7.2 and Equation (7.12).

Observation 2. If x^k is actually a vertex point, then expression (7.11) reduces
to w^k = (B^T)^{-1} c_B, which was defined as the "dual vector" in Chapter 4. Hence we
call w^k the dual estimates (corresponding to the primal solution x^k) in the primal affine
scaling algorithm. Moreover, in this case, the quantity

r^k = c − A^T w^k   (7.13)

reduces to c − A^T (B^T)^{-1} c_B, which is the so-called reduced cost vector in the simplex
method. Hence we call r^k the reduced cost vector associated with x^k in the affine scaling
algorithm.

Notice that when r^k ≥ 0, the dual estimate w^k becomes a dual feasible solution
and (x^k)^T r^k = e^T X_k r^k becomes the duality gap of the feasible solution pair (x^k, w^k),
i.e.,

c^T x^k − b^T w^k = (x^k)^T r^k = e^T X_k r^k   (7.14)

In case e^T X_k r^k = 0 with r^k ≥ 0, we have achieved primal feasibility at x^k, dual
feasibility at w^k, and the complementary slackness conditions. In other words, x^k is primal
optimal and w^k is dual optimal.
Based on the above discussions, here we outline an iterative procedure for the
primal affine scaling algorithm.

Step 1 (initialization): Set k = 0 and find x^0 > 0 such that A x^0 = b. (Details
will be discussed later.)
Step 2 (computation of dual estimates): Compute the vector of dual estimates

w^k = (A X_k^2 A^T)^{-1} A X_k^2 c

where X_k is a diagonal matrix whose diagonal elements are the components of x^k.
Step 3 (computation of reduced costs): Calculate the reduced cost vector

r^k = c − A^T w^k

Step 4 (check for optimality): If r^k ≥ 0 and e^T X_k r^k ≤ ε (a given small positive
number), then STOP. x^k is primal optimal and w^k is dual optimal. Otherwise, go
to the next step.
Step 5 (obtain the direction of translation): Compute the direction

d_y^k = −X_k r^k

Step 6 (check for unboundedness and constant objective value): If d_y^k > 0,
then STOP. The problem is unbounded. If d_y^k = 0, then also STOP. x^k is primal
optimal. Otherwise go to Step 7.
Step 7 (compute step-length): Compute the step-length

α_k = min_i { α / [−(d_y^k)_i]  |  (d_y^k)_i < 0 },  where 0 < α < 1

Step 8 (move to a new solution): Perform the translation

x^{k+1} = x^k + α_k X_k d_y^k

Reset k ← k + 1 and go to Step 2.
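Before the numerical example below, here is a compact sketch of Steps 1-8 in Python with dense NumPy linear algebra. It is only an illustration under simplifying assumptions (a known interior starting point, naive solves instead of factorizations); the function name and tolerance handling are ours, not part of the original presentation.

```python
import numpy as np

def primal_affine_scaling(A, b, c, x0, alpha=0.99, eps=1e-8, max_iter=500):
    """Sketch of Steps 2-8 of the primal affine scaling algorithm."""
    x = np.array(x0, dtype=float)           # Step 1: interior point, Ax = b, x > 0
    for _ in range(max_iter):
        X2 = x * x                          # diagonal of X_k^2
        # Step 2: dual estimates w^k = (A X_k^2 A^T)^{-1} A X_k^2 c, see (7.11)
        w = np.linalg.solve(A @ (X2[:, None] * A.T), A @ (X2 * c))
        r = c - A.T @ w                     # Step 3: reduced costs (7.13)
        if np.all(r >= 0) and x @ r <= eps: # Step 4: e^T X_k r^k small => optimal
            return x, w
        dy = -x * r                         # Step 5: d_y^k = -X_k r^k
        if not np.any(dy < 0):              # Step 6: no negative component
            if np.allclose(dy, 0):
                return x, w                 # constant objective over P
            raise ArithmeticError("problem (7.1) is unbounded below")
        step = alpha / np.max(-dy[dy < 0])  # Step 7: minimum ratio test (7.9)
        x = x + step * x * dy               # Step 8: x^{k+1} = x^k + a_k X_k d_y^k
    return x, w

# Example 7.1 data (the iterates should match those computed below):
# A = np.array([[1., -1., 1., 0.], [0., 1., 0., 1.]])
# b = np.array([15., 15.]);  c = np.array([-2., 1., 0., 0.])
# primal_affine_scaling(A, b, c, [10., 2., 7., 13.])  # -> x ~ [30, 15, 0, 0]
```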
The following example illustrates the steps of the primal affine scaling algorithm.
Example 7.1
Minimize −2x_1 + x_2
subject to x_1 − x_2 + x_3 = 15
           x_2 + x_4 = 15
           x_1, x_2, x_3, x_4 ≥ 0

In this case,

A = [ 1  −1  1  0
      0   1  0  1 ],   b = [15  15]^T,   and   c = [−2  1  0  0]^T

Let us start with, say, x^0 = [10  2  7  13]^T, which is an interior feasible solution. Hence,

X_0 = diag(10, 2, 7, 13)

Moreover,

w^0 = (A X_0^2 A^T)^{-1} A X_0^2 c = [−1.33353  −0.00771]^T

r^0 = c − A^T w^0 = [−0.66647  −0.32582  1.33353  0.00771]^T
Since some components of r^0 are negative and e^T X_0 r^0 = 2.1187, we know that
the current solution is nonoptimal. Therefore we proceed to synthesize the direction of
translation with

d_y^0 = −X_0 r^0 = [6.6647  0.6516  −9.3347  −0.1002]^T

Suppose that α = 0.99 is chosen; then the step-length

α_0 = 0.99 / 9.3347 = 0.10606

Therefore, the new solution is

x^1 = x^0 + α_0 X_0 d_y^0 = [17.06822  2.13822  0.07000  12.86178]^T
Notice that the objective function value has been improved from −18 to −31.99822.
The reader may continue the iterations further and verify that the iterative process converges
to the optimal solution x* = [30  15  0  0]^T with optimal value −45.

Convergence of the primal affine scaling algorithm. Our objective is to
show that the sequence {x^k} generated by the primal affine scaling algorithm (without
stopping at Step 6) converges to an optimal solution of the linear programming problem
(7.1). In order to simplify our proof, we make the following assumptions:

1. The linear programming problem under consideration has a bounded feasible domain with nonempty interior.
2. The linear programming problem is both primal nondegenerate and dual nondegenerate.

The first assumption rules out the possibility of terminating the primal affine scaling
algorithm with unboundedness, and it can be further shown (see Exercise 7.5) that these
two assumptions imply that (i) the matrix A X_k is of full rank for every x^k ∈ P and (ii)
the vector r^k has at most m zeros for every w^k ∈ R^m.
We start with some simple facts.

Lemma 7.4. When the primal affine scaling algorithm applies, lim_{k→∞} X_k r^k = 0.

Proof. Since {c^T x^k} is monotonically decreasing and bounded below (by the first
assumption), the sequence converges. Hence Equations (7.12) and (7.9) imply that

0 = lim_{k→∞} (c^T x^k − c^T x^{k+1}) = lim_{k→∞} α_k ||d_y^k||^2 ≥ lim_{k→∞} (α / ||d_y^k||) ||d_y^k||^2

Noticing that α > 0 and ||d_y^k|| ≥ 0, we have

lim_{k→∞} ||d_y^k|| = lim_{k→∞} ||X_k r^k|| = 0

The stated result follows immediately.

The reader may recall that the above result is exactly the complementary slackness
condition introduced in Chapter 4. Let us define C ⊂ P to be the set in which the
complementary slackness holds. That is,

C = {x^k ∈ P | X_k r^k = 0}   (7.15)

Furthermore, we introduce D ⊂ P to be the set in which the dual feasibility
condition holds, i.e.,

D = {x^k ∈ P | r^k = c − A^T w^k ≥ 0}   (7.16)

In view of the optimality conditions of the linear programming problem, it is easy
to prove the following result.

Lemma 7.5. For any x ∈ C ∩ D, x is an optimal solution to the linear programming problem (7.1).

We are now ready to prove that the sequence {x^k} generated by the primal affine
scaling algorithm does converge to an optimal solution of problem (7.1). First, we show
that

Theorem 7.1. If {x^k} converges, then x* = lim_{k→∞} x^k is an optimal solution to
problem (7.1).

Proof. We prove this result by contradiction. First notice that when {x^k} converges
to x*, x* must be primal feasible. However, let us assume that x* is not primal optimal.
Since r^k(·) is a continuous function of x^k, we know r* = lim_{k→∞} r^k is well
defined. Moreover, Lemma 7.4 implies that

X* r* = lim_{k→∞} X_k r^k = 0

Hence we have x* ∈ C. By our assumption and Lemma 7.5, we know that x* ∉ D.
Therefore, there exists at least one index j such that r*_j < 0. Remembering that x* ∈ C,
we have x*_j = 0. Owing to the continuity of r^k, there exists an integer K such that for
any k ≥ K, r_j^k < 0. However, consider that

x_j^{k+1} = x_j^k − α_k (x_j^k)^2 r_j^k

Since (x_j^k)^2 r_j^k < 0, we have x_j^{k+1} > x_j^k > 0 for all k ≥ K, which contradicts the fact
that x_j^k → x*_j = 0. Hence we know our assumption must be wrong and x* is primal
optimal.

The remaining work is to show that the sequence {x^k} indeed converges.

Theorem 7.2. The sequence {x^k} generated by the primal affine scaling algorithm
is convergent.

Proof. Since the feasible domain is nonempty, closed, and bounded, owing to
compactness the sequence {x^k} has at least one accumulation point in P, say x*. Our
objective is to show that x* is also the only accumulation point of {x^k} and hence it
becomes the limit of {x^k}.
Noting that r^k(·) is a continuous function of x^k and applying Lemma 7.4, we can
conclude that x* ∈ C. Furthermore, the nondegeneracy assumption implies that every
element in C, including x*, must be a basic feasible solution (vertex of P). Hence we
can denote its nonbasic variables by x*_N and define N as the index set of these nonbasic
variables. In addition, for any δ > 0, we define a "δ-ball" around x* by

B_δ = {x^k ∈ P | x_N^k < δe}

Let r* be the reduced cost vector corresponding to x*. The primal and dual nondegeneracy assumption ensures us to find an ε > 0 such that

min_{j∈N} |r*_j| > ε

Remember that the nondegeneracy assumption forces every member of C to be a
vertex of P, and there are only a finite number of vertices in P; hence C has a finite
number of elements and we can choose an appropriate δ > 0 such that

B_{2δ} ∩ C = {x*}   (7.17)

and

min_{j∈N} |r_j^k| > ε,  for each x^k ∈ B_{2δ}   (7.18)

Recalling that

x_j^{k+1} = x_j^k − α_k (x_j^k)^2 r_j^k

and owing to the boundedness assumption, we know that the step-length α_k at each
iteration is a positive but bounded number. Therefore, for appropriately chosen ε and δ,
if x^k ∈ B_δ, which is sufficiently close to x*, we see that α_k (x_j^k)^2 |r_j^k| < δ. Hence we can
define a set

S_{ε,δ} = {x^k ∈ B_δ | α_k (x_j^k)^2 |r_j^k| < δ, for each j ∈ N}

Now, for any x^k ∈ S_{ε,δ}, (7.18) and the definition of S_{ε,δ} imply that

x_j^{k+1} = x_j^k − α_k (x_j^k)^2 r_j^k < δ + δ = 2δ,  for each j ∈ N

This means x^{k+1} ∈ B_{2δ} if x^k ∈ S_{ε,δ}.
Now we are ready to show that x* is the only accumulation point of {x^k} by
contradiction. We suppose that {x^k} has more than one accumulation point. Since x* is
an accumulation point, the sequence {x^k} visits S_{ε,δ} infinitely often. But because x* is
not the only accumulation point, the sequence has to leave S_{ε,δ} infinitely often. However,
each time the sequence leaves S_{ε,δ}, it stays in B_{2δ}\S_{ε,δ}. Therefore, infinitely many
elements of {x^k} fall in B_{2δ}\S_{ε,δ}. Notice that this difference set has a compact closure,
and the subsequence of {x^k} belonging to B_{2δ}\S_{ε,δ} must have an accumulation point in
the compact closure. Noting the definition of C, we know that every accumulation point
of {x^k} must belong to it. However, C is disjoint from the closure of B_{2δ}\S_{ε,δ}. This fact,
together with Equation (7.17), causes a contradiction. Thus x* is indeed the limit of the
sequence {x^k}.

More results on the convergence of the affine scaling algorithm under degeneracy
have appeared recently. Some references are included at the end of this chapter for
further information.

7.1.2 Implementing the Primal Affine Scaling Algorithm

Many implementation issues need to be addressed. In this section, we focus on the start-
ing mechanisms, checking for optimality, and finding an optimal basic feasible solution.

Starting the primal affine scaling algorithm. Parallel to our discussion for
the revised simplex method, here we introduce two mechanisms, namely, the big-M
method and the two-phase method, for finding an initial interior feasible solution. The first
method is more easily implemented and suitable for most applications. However,
more serious commercial implementations often consider the second method for stability.

Big-M Method. In this method, we add one more artificial variable x_a, associated with a large positive number M, to the original linear programming problem to make
(1, 1, ..., 1)^T ∈ R^{n+1} become an initial interior feasible solution to the following problem:

Minimize c^T x + M x_a   (7.19a)

subject to [A | b − Ae] [x; x_a] = b,  x ≥ 0, x_a ≥ 0   (7.19b)

where e = (1, 1, ..., 1)^T ∈ R^n.


Compared to the big-M method for the revised simplex method, here we have
only n + 1 variables, instead of n + m. When the primal affine scaling algorithm is
applied to the big-M problem (7.19) with sufficiently large M, since the problem is
feasible, we either arrive at an optimal solution to the big-M problem or conclude that
the problem is unbounded. Similar to the discussions in Chapter 4, if the artificial
variable remains positive in the final solution (x*, x_a*) of the big-M problem, then the
original linear programming problem is infeasible. Otherwise, either the original problem is identified to be unbounded below, or x* solves the original linear programming
problem.
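As a small illustration, the big-M problem (7.19) can be assembled as follows. This is a sketch with our own names; the default M = 10^6 is an arbitrary choice, and in practice M must dominate the scale of the data.

```python
import numpy as np

def big_m_setup(A, b, c, M=1e6):
    """Build problem (7.19): one artificial column b - Ae with cost M."""
    m, n = A.shape
    e = np.ones(n)
    A_big = np.hstack([A, (b - A @ e)[:, None]])  # [A | b - Ae]
    c_big = np.append(c, M)                       # minimize c^T x + M x_a
    x0 = np.ones(n + 1)                           # A_big @ x0 = Ae + (b - Ae) = b
    return A_big, c_big, x0
```

The returned x0 = (1, ..., 1)^T is interior feasible by construction and can be fed directly to the primal affine scaling sketch given earlier.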

Two-Phase Method. Let us choose an arbitrary x^0 > 0 and calculate v = b − A x^0.
If v = 0, then x^0 is an initial interior feasible solution. Otherwise, we consider the
following Phase-I linear programming problem with n + 1 variables:

Minimize u   (7.20a)

subject to [A | v] [x; u] = b,  x ≥ 0, u ≥ 0   (7.20b)

It is easy to verify that the vector (x^0; 1)^T is an interior feasible solution to the Phase-I problem. Hence the primal affine scaling
algorithm can be applied to solve this problem. Moreover, since the Phase-I problem is
bounded below by 0, the primal affine scaling algorithm will always terminate with an
optimal solution, say (x*; u*)^T. Again, similar to the discussions in Chapter 4, if u* > 0,
then the original linear programming problem is infeasible. Otherwise, since the Phase-I
problem treats the problem in a higher-dimensional space, we can show that, except for
very rare cases with measure zero, x* > 0 will become an initial interior feasible solution
to the original problem.
Note that the difference in dimensionality between the original and Phase-I problems could cause extra computations for a simpleminded implementation. First of all,
owing to numerical imprecision in computers, the optimal solution x* obtained from
Phase I could become infeasible to the original problem. In other words, we need to
restore primal feasibility before the second-phase computation. Second, the difference in
dimensionality between the fundamental matrices A X_k^2 A^T (of the original problem) and \bar{A} \bar{X}_k^2 \bar{A}^T
(of the Phase-I problem) could prevent us from using the same "symbolic factorization
template" (to be discussed in Chapter 10) for fast computation of their inverse matrices.
Therefore, it would be helpful if we could operate the Phase-I iterations in the original
n-dimensional space.
In order to do so, let us assume we are at the kth iteration of applying the primal
affine scaling to the Phase-I problem. We denote

\bar{x}^k = (x^k; u^k)
to be the current solution, and let

\bar{A} = [A | v],  \bar{X}_k = diag(\bar{x}^k),  \bar{c} = (0, ..., 0, 1)^T ∈ R^{n+1}

Remember that the gradient of the Phase-I objective function is \bar{c}; hence the moving
direction in the original space of the Phase-I problem is given by

\bar{d}^k = −\bar{X}_k [I − \bar{A}_k^T (\bar{A}_k \bar{A}_k^T)^{-1} \bar{A}_k] \bar{c}^k   (7.21)

where \bar{A}_k = \bar{A} \bar{X}_k and \bar{c}^k = \bar{X}_k \bar{c}. If we further define

\bar{w}^k = (\bar{A}_k \bar{A}_k^T)^{-1} \bar{A}_k \bar{c}^k   (7.22)

then we have

\bar{d}^k = −\bar{X}_k (\bar{c}^k − \bar{A}_k^T \bar{w}^k)   (7.23)

Simple calculation results in

\bar{A}_k \bar{A}_k^T = A X_k^2 A^T + (u^k)^2 v v^T   (7.24)

and

\bar{A}_k \bar{c}^k = [A X_k | u^k v] (0, ..., 0, u^k)^T = (u^k)^2 v   (7.25)

Combining (7.22), (7.24), and (7.25), we see that

\bar{w}^k = [A X_k^2 A^T + (u^k)^2 v v^T]^{-1} (u^k)^2 v   (7.26)

Applying the Sherman-Morrison-Woodbury formula (Lemma 4.2), we have

\bar{w}^k = (A X_k^2 A^T)^{-1} v / [(u^k)^{-2} + γ]   (7.27)

where γ = v^T (A X_k^2 A^T)^{-1} v. Plugging the corresponding terms into (7.23), we see that

\bar{d}^k = −\bar{X}_k^2 (\bar{c} − \bar{A}^T \bar{w}^k) = [ X_k^2 A^T \bar{w}^k ; −(u^k)^2 (1 − v^T \bar{w}^k) ]   (7.28)

Observing that

(u^k)^2 (1 − v^T \bar{w}^k) = (u^k)^2 (1 − γ / [(u^k)^{-2} + γ]) = 1 / [(u^k)^{-2} + γ]

we further have

\bar{d}^k = (1 / [(u^k)^{-2} + γ]) [ X_k^2 A^T (A X_k^2 A^T)^{-1} (b − A x^0) ; −1 ]   (7.29)

Notice that the scalar multiplier in (7.29) will be absorbed into the step-length, and
the last element of the moving direction is −1. Hence we know that the algorithm tries
to reduce u all the time. From this expression, we clearly see that the computation of \bar{d}^k can be
performed in the original n-dimensional space, and the template for the factorization of
A X_k^2 A^T can be used for both Phase I and Phase II.
In order to compute the step-length, we consider that

\bar{x}^{k+1} = \bar{x}^k + β_k \bar{d}^k > 0

Similar to the previous discussion, the step-length can be chosen as

β_k = α · min_j { −\bar{x}_j^k / (\bar{d}^k)_j  |  (\bar{d}^k)_j < 0 },  for some 0 < α < 1

An interesting and important point to be observed here is that the Phase-I iterations
may be initiated at any time (even during the Phase-II iterations). Once we detect that
the feasibility of a current iterate is lost owing to numerical inaccuracies that stem from
the finite word length of computers, Phase-I iterations can be applied to restore the
feasibility. Hence sometimes we call it a "dynamic infeasibility correction" procedure.
Sophisticated implementations should have this feature built in, since the primal method
is quite sensitive to numerical truncations and round-off errors.
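A sketch of this Phase-I direction, following (7.29), is given below. The names are ours; u is the current (scalar) artificial variable and v = b − A x^0 is the fixed artificial column set at initialization.

```python
import numpy as np

def phase1_direction(A, x, u, v):
    """Phase-I moving direction (d_x, d_u) via (7.29), using only the
    original-space matrix A X_k^2 A^T; the last component is always
    negative, so the artificial variable u is driven downward."""
    X2 = x * x
    Mv = np.linalg.solve(A @ (X2[:, None] * A.T), v)  # (A X_k^2 A^T)^{-1} v
    gamma = v @ Mv
    scale = 1.0 / (u ** -2 + gamma)   # scalar factor, absorbed by the step-length
    d_x = scale * (X2 * (A.T @ Mv))   # X_k^2 A^T (A X_k^2 A^T)^{-1} (b - A x^0)
    d_u = -scale
    return d_x, d_u                   # A d_x + d_u v = 0, so feasibility is kept
```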
Having determined the starting mechanisms, we focus on the stopping rules for the
implementation of the primal affine scaling algorithm.

Stopping rules. As we mentioned earlier, once the K-K-T conditions are met,
an optimal solution pair is found. Hence we use the conditions of (1) primal feasibility,
(2) dual feasibility, and (3) complementary slackness as the stopping rules. However, in
real implementations these conditions are somewhat relaxed to accommodate the numerical difficulties due to limitations of machine accuracy.
Let x^k be a current solution obtained by applying the primal affine scaling algorithm.
The primal feasibility condition requires that

(I) PRIMAL FEASIBILITY

A x^k = b,  x^k ≥ 0   (7.30)

In practice, the primal feasibility is often measured by

σ_p = ||A x^k − b|| / (||b|| + 1),  with x^k ≥ 0   (7.31)

Note that for x^k ≥ 0, if σ_p is small enough, we may accept x^k to be primal feasible.
The addition of 1 in the denominator of (7.31) is to ensure numerical stability in
computation.

(II) DUAL FEASIBILITY

The dual feasibility requires the nonnegativity of the reduced costs, i.e.,

r^k = c − A^T w^k ≥ 0   (7.32)

where w^k is the dual estimate defined in Equation (7.11). A practical measure of
dual feasibility could be defined as

σ_d = ||r^k|| / (||c|| + 1)   (7.33)

where ||r^k|| and ||c|| are calculated only over those i such that r_i^k < 0. When σ_d is
sufficiently small, we may claim that dual feasibility is satisfied by w^k.

(III) COMPLEMENTARY SLACKNESS

The complementary slackness condition requires that

(x^k)^T r^k = e^T X_k r^k = 0   (7.34)

Since

c^T x^k − b^T w^k = e^T X_k r^k   (7.35)

where x^k is primal feasible and w^k is dual feasible, we may define

σ_c = c^T x^k − b^T w^k   (7.36)

to measure the complementary slackness condition.
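In code, the three measures might be collected as follows. This is a sketch with our own names; note that the norms in σ_d run only over the violated components, as (7.33) prescribes.

```python
import numpy as np

def stopping_measures(A, b, c, x, w):
    """Relaxed K-K-T measures (7.31), (7.33), and (7.36)."""
    r = c - A.T @ w
    sigma_p = np.linalg.norm(A @ x - b) / (np.linalg.norm(b) + 1.0)
    neg = r < 0                        # indices with negative reduced cost
    sigma_d = (np.linalg.norm(r[neg]) / (np.linalg.norm(c[neg]) + 1.0)
               if neg.any() else 0.0)
    sigma_c = c @ x - b @ w            # duality gap e^T X_k r^k, by (7.35)
    return sigma_p, sigma_d, sigma_c
```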

In practice, we choose sufficiently small positive thresholds for σ_p, σ_d, and σ_c and
use them to decide whether the current iterate meets the stopping rules. According to the
authors' experience, we have observed the following behavior of the primal affine scaling
algorithm:

1. At each iteration, the computational bottleneck is the computation of the dual estimates w^k.
2. Although the primal feasibility condition is theoretically maintained, the numerical
truncation and round-off errors of computers could still cause infeasibility. Therefore, the primal feasibility needs to be carefully checked. Once infeasibility is
detected, we may apply the Phase-I "dynamic infeasibility correction" procedure
to restore the primal feasibility.
3. The value of the objective function decreases dramatically in the early iterations, but
the decreasing trend slows down considerably when the current solution becomes
closer to optimality.

4. The algorithm is somewhat sensitive to primal degeneracy, especially when the
iteration proceeds near optimality. But this is not universally true. In many cases,
even with the presence of primal degeneracy, the algorithm still performs quite
well.

Finding a basic feasible solution. Notice that, just like Karmarkar's algorithm, at each iteration the current solution of the primal affine scaling algorithm always
stays in the interior of the feasible domain P. In order to obtain a basic feasible solution,
the purification scheme and related techniques described in Chapter 6 can be applied here.
7.1.3 Computational Complexity

Compared to Karmarkar's projective transformation, the affine scaling transformation is
less complicated and more natural. The implementation of the primal affine scaling algorithm is
also simple enough. It does not need the assumption of "zero optimal objective value,"
nor does it require the special "simplex structure." By far, it is one of the most "popular"
variants of the interior-point method. According to R. Vanderbei, M. Meketon, and B.
Freedman's experiment, for problems with dense constraint matrices, their primal affine
scaling implementation takes about 7.3885 m^{−0.0187} n^{0.1694} iterations to reach an optimal
solution within ε = 10^{−3}. The result was derived from a regression analysis of 137
randomly generated problems.
Although in practice the primal affine scaling algorithm performs very well, there is no
proof that the algorithm is a polynomial-time algorithm. Actually, N. Megiddo and
M. Shub showed that the affine scaling algorithm might visit the neighborhoods of all
the vertices of the Klee-Minty cube when a starting point is pushed to the boundary.

Potential push method. To avoid being trapped by the boundary behavior, a
recentering method called the potential push is introduced. The idea is to push a current
solution x^k to a new interior solution \hat{x}^k which is away from the positivity walls but
without increasing its objective value, and then continue the iterations from \hat{x}^k. Figure 7.3
illustrates this concept.
In Figure 7.3, we move from x^{k−1} to a new solution x^k along the direction d_x^{k−1}
provided by the primal affine scaling algorithm. Then we recenter x^k to \hat{x}^k by a "potential
push" along the direction \hat{d}_x^k such that x^k and \hat{x}^k have the same objective value but \hat{x}^k is
away from the boundary.
To achieve this goal, first we define a potential function p(x), for each x > 0, as

p(x) = −∑_{j=1}^n log_e x_j   (7.37)

The value of the potential function p(x) becomes larger when x is closer to a positivity
wall x_j = 0. Hence it creates a force to "push" x away from too close an approach to
a boundary by minimizing p(x).
Figure 7.3 (an objective step along d_x^{k−1} from x^{k−1} to x^k, followed by a recentering potential step along \hat{d}_x^k to \hat{x}^k on the same constant objective plane)

With the potential function, we focus on solving the following "potential push" problem:

Minimize p(x)   (7.38a)

subject to Ax = b,  x > 0   (7.38b)
           c^T x = c^T x^k   (7.38c)

Note that (7.38b) requires the solution of problem (7.38) to be an interior feasible solution
to the original linear programming problem; (7.38c) requires it to keep the same objective
value as x^k; and minimizing p(x) forces the solution away from the positivity walls.
Therefore, we can take the optimal solution of problem (7.38) as \hat{x}^k.
Similar to our discussions for the Phase-I problem, if we directly apply the primal
affine scaling algorithm to solve the potential push problem, we have a mismatch in
dimensionality, since problem (7.38) has one more constraint than the original linear
programming problem. In order to implement the potential push method in a framework consistent
with the primal affine scaling algorithm, we need to take care of requirement
(7.38c) separately. Also notice that we do not really need to find an optimal solution
to the potential push problem. Any feasible solution to problem (7.38) with an improved
value of p(x) can be adopted as \hat{x}^k.
One way to achieve this goal is to take x^k as an initial solution to problem (7.38),
then project the negative gradient of p(x) onto the null space of the constraint matrix A
as a potential moving direction, say p^k. But in order to keep the same objective value,
we first project the negative gradient of the objective function c^T x onto the null space
of A and denote it as g. Then, the recentering (or push) direction \hat{d}_x^k is taken to be the
component of p^k which is orthogonal to g. Finally, along this direction, we conduct a
line search for an optimal step-length.

Mathematically speaking, we let P = I − A^T (A A^T)^{-1} A be the projection mapping
and ∇p(x) be the gradient of the potential function. Then, we have

p^k = −P(∇p(x^k)) = [I − A^T (A A^T)^{-1} A] (1/x^k)   (7.39)

where

1/x^k = (1/x_1^k, ..., 1/x_n^k)^T

Similarly,

g = −P c = −[I − A^T (A A^T)^{-1} A] c   (7.40)

We now decompose p^k into two components, one along g and the other orthogonal
to it. The first component can be expressed as μg, for some scalar μ, since it is along the
direction of g. Therefore the orthogonal component can be expressed as

\hat{d}_x^k = p^k − μg   (7.41)

Moreover, the orthogonality condition requires that

(\hat{d}_x^k)^T g = 0   (7.42)

which determines the value of μ by

μ = (p^k)^T g / (g^T g)   (7.43)

and, consequently,

\hat{d}_x^k = p^k − [(p^k)^T g / (g^T g)] g   (7.44)

Figure 7.4 illustrates this situation.

Figure 7.4 (decomposition of p^k into the component μg along g and the component \hat{d}_x^k orthogonal to g)

Now we focus on finding an appropriate step-length κ such that the point x^k is
translated to a new solution \hat{x}^k = x^k + κ \hat{d}_x^k which has the same objective value as c^T x^k
but a lower value of p(x). To do so, we conduct a line search along the direction
\hat{d}_x^k. One of the easiest ways is via binary search. Note that the maximum value (say \bar{κ})
that κ can assume is given by

\bar{κ} = min_j { −x_j^k / (\hat{d}_x^k)_j  |  (\hat{d}_x^k)_j < 0 }   (7.45)

Hence we only have to search for κ in the interval (0, \bar{κ}) such that p(\hat{x}^k) assumes a
minimum value.
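The whole push can be sketched as follows. The names are ours; instead of a true binary search we scan a few candidate step-lengths, in the spirit of the coarse adjustments recommended below, and we assume g ≠ 0, i.e., the objective is not constant on P.

```python
import numpy as np

def potential_push(A, c, x, n_probe=5):
    """One recentering push following (7.39)-(7.45)."""
    AAT = A @ A.T
    proj = lambda z: z - A.T @ np.linalg.solve(AAT, A @ z)  # P z
    p_k = proj(1.0 / x)                    # (7.39): -P grad p(x^k)
    g = -proj(c)                           # (7.40): projected -c
    d = p_k - ((p_k @ g) / (g @ g)) * g    # (7.44): component orthogonal to g
    if not np.any(d < 0):
        return x                           # no wall limits the push
    kappa_bar = np.min(x[d < 0] / -d[d < 0])        # (7.45)
    phi = lambda k: -np.sum(np.log(x + k * d))      # p(x + kappa d)
    probes = np.linspace(0.0, 0.95 * kappa_bar, n_probe + 1)[1:]
    best = min(probes, key=phi)
    return x + best * d if phi(best) < phi(0.0) else x
```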
Several issues are worth mentioning here:

1. When the potential push method is applied after each iteration of the primal affine
scaling algorithm, since P needs to be evaluated only once for all iterations and
a binary search is relatively inexpensive, the evaluation of the potential function
required during the search becomes the most time-consuming operation associated
with the potential push.
2. The purpose of applying the potential push is to gain faster convergence by staying
away from the boundary. If the extra speed of convergence obtained by the potential
push appears to be marginal, then it is not worth spending any major effort on it.
Some coarse adjustments are good enough in this case. According to the authors'
experience, no more than four or five searches per affine scaling iteration are needed
to estimate \hat{x}^k.
3. Recall that Karmarkar's potential function is given by (6.18), namely,

f(x; c) = n log_e(c^T x) − ∑_{j=1}^n log_e x_j

Hence f(x; c) = n log_e(c^T x) + p(x), assuming that c^T x > 0. When the potential
push is applied after each iteration of the primal affine scaling, we see the first
term in f(x; c) is reduced by the affine scaling and the second term is reduced by
the potential push. Thus the flavor of Karmarkar's approach is preserved.
4. Since the flavor of Karmarkar's potential function is preserved, it is conjectured that
primal affine scaling together with the potential push could result in a polynomial-time
algorithm. But so far, no rigorous complexity proof has been provided.

Logarithmic barrier function method. Another way to stay away from the
positivity walls is to incorporate a barrier function, with extremely high values along the
boundaries {x ∈ R^n | x_j = 0, for some 1 ≤ j ≤ n}, into the original objective function.
Minimizing this new objective function will automatically push a solution away from
the positivity walls. The logarithmic barrier method considers the following nonlinear
optimization problem:

Minimize F_μ(x) = c^T x − μ ∑_{j=1}^n log_e x_j   (7.46a)

subject to Ax = b,  x > 0   (7.46b)

where μ > 0 is a scalar. If x*(μ) is an optimal solution to problem (7.46), and if x*(μ)
tends to a point x* as μ approaches zero, then it follows that x* is an optimal solution
to the original linear programming problem. Also notice that the positivity constraint
x > 0 is actually embedded in the definition of the logarithmic function. Hence, for any
fixed μ > 0, the Newton search direction d_μ at a given feasible solution x is obtained
by solving the following quadratic optimization problem:

Minimize (1/2) d^T ∇²F_μ(x) d + ∇F_μ(x)^T d   (7.47a)

subject to A d = 0   (7.47b)

where ∇F_μ(x) = c − μ X^{-1} e and ∇²F_μ(x) = μ X^{-2}.
In other words, the Newton direction is in the null space of matrix A and it
minimizes the quadratic approximation of F_μ(x). We let λ_μ denote the vector of Lagrange
multipliers; then d_μ and λ_μ satisfy the following system of equations:

μ X^{-2} d_μ + A^T λ_μ = −(c − μ X^{-1} e),   A d_μ = 0

It follows that

λ_μ = −(A X^2 A^T)^{-1} A X (Xc − μe)   (7.48)

and

d_μ = −(1/μ) X [I − X A^T (A X^2 A^T)^{-1} A X] (Xc − μe)   (7.49a)
Taking the given solution to be x = x^k and comparing d_μ with the primal affine
scaling moving direction d_x^k, we see that

d_μ = (1/μ) d_x^k + X_k [I − X_k A^T (A X_k^2 A^T)^{-1} A X_k] e   (7.49b)

The additional component X_k [I − X_k A^T (A X_k^2 A^T)^{-1} A X_k] e = X_k P_k e can be viewed as
a force which pushes a solution away from the boundary. Hence some people call it
a "centering force," and call the logarithmic barrier method a "primal affine scaling
algorithm with centering force."
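A sketch of the direction (7.49a) follows; the names are ours. Splitting out the vector z = X_k c − μe makes the decomposition (7.49b) into the affine scaling term and the centering force easy to see.

```python
import numpy as np

def barrier_direction(A, c, x, mu):
    """Newton direction d_mu of (7.49a) at an interior point x."""
    X2 = x * x
    M = A @ (X2[:, None] * A.T)                           # A X_k^2 A^T
    z = x * c - mu * np.ones_like(x)                      # X_k c - mu e
    Pz = z - x * (A.T @ np.linalg.solve(M, A @ (x * z)))  # [I - XA^T M^{-1} AX] z
    return -(x * Pz) / mu                                 # -(1/mu) X_k [...] z
```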

While classical barrier function theory requires that x^k solve problem (7.46) explicitly before μ = μ_k is reduced, C. Gonzaga has pointed out that there exist μ_0 > 0,
0 < ρ < 1, and a > 0 such that choosing d_{μ_k} by (7.49), x^{k+1} = x^k + a d_{μ_k}, and μ_{k+1} = ρ μ_k
yields convergence to an optimal solution x* of the original linear programming problem
in O(√n L) iterations. This could result in a polynomial-time affine scaling algorithm
with complexity O(n^3 L). A simple and elegant proof is due to C. Roos and J.-Ph. Vial,
similar to the one proposed by R. Monteiro and I. Adler for the primal-dual algorithm.

7.2 DUAL AFFINE SCALING ALGORITHM

Recall that the dual linear programming problem of problem (7.1) is

Maximize b^T w   (7.50a)
subject to A^T w + s = c,  s ≥ 0,  w unrestricted   (7.50b)

Similar to the dual simplex method, the dual affine scaling algorithm starts with a dual
feasible solution and takes steps toward optimality by progressively increasing the objective function while dual feasibility is maintained in the process.
Notice that problem (7.50) contains both unrestricted variables w ∈ R^m and nonnegative variables s ∈ R^n. In this case, (w; s) is defined to be an interior feasible solution
if A^T w + s = c and s > 0. Also note that for the w-variables, there is no meaning of "centering," since they are unrestricted. But for the s-variables, we can treat them as we treated the
x-variables in the primal problem.

7.2.1 Basic Ideas of Dual Affine Scaling

The dual affine scaling algorithm also consists of three key parts, namely, starting with
an interior dual feasible solution, moving to a better interior solution, and stopping with
an optimal dual solution. We shall discuss the starting mechanisms and stopping rules
in later sections. In this section we focus on the iterates.
Given that at the kth iteration we have an interior dual solution (w^k; s^k) such
that A^T w^k + s^k = c and s^k > 0, our objective is to find a good moving direction
(d_w^k; d_s^k) together with an appropriate step-length β_k > 0 such that a new interior solution
(w^{k+1}; s^{k+1}) is generated by

w^{k+1} = w^k + β_k d_w^k   (7.51a)
s^{k+1} = s^k + β_k d_s^k   (7.51b)

which satisfies

A^T w^{k+1} + s^{k+1} = c   (7.52a)
s^{k+1} > 0   (7.52b)

and

b^T w^{k+1} ≥ b^T w^k   (7.52c)

Plugging (7.51) into (7.52a) and remembering that A^T w^k + s^k = c, we have a requirement
for the moving direction, namely,

A^T d_w^k + d_s^k = 0   (7.53a)

In order to get a better objective value, we plug (7.51a) into (7.52c), which results in
another requirement for the moving direction:

b^T d_w^k ≥ 0   (7.53b)

To take care of (7.52b), the affine scaling method is applied. The basic idea is to recenter
s^k at e = (1, 1, ..., 1)^T ∈ R^n in the transformed space so that the distance to each
positivity wall is known. In this way, any movement within unit distance certainly
preserves the positivity requirement.
Similar to what we did in the primal affine scaling algorithm, we define an affine
scaling matrix S_k = diag(s^k), which is a diagonal matrix with s_i^k as its ith diagonal
element. In this way, S_k^{-1} s^k = e, and every s-variable is transformed (or scaled) into a
new variable u ≥ 0 such that

u = S_k^{-1} s   (7.54a)

and

u^k = S_k^{-1} s^k = e   (7.54b)

Moreover, if d_u^k is a direction of cost improvement in the transformed space, then its
corresponding direction in the original space is given by

d_s^k = S_k d_u^k   (7.54c)
Now we can study the iterates of the dual affine scaling algorithm in the transformed
(or scaled) space. In order to synthesize a good moving direction in the transformed
space, requirement (7.53a) implies that

A^T d_w^k + d_s^k = 0  ⟹  A^T d_w^k + S_k d_u^k = 0  ⟹  S_k^{-1} A^T d_w^k = −d_u^k

Multiplying both sides by A S_k^{-1}, we get

A S_k^{-2} A^T d_w^k = −A S_k^{-1} d_u^k

Assuming that A is of full row rank, we obtain

d_w^k = −(A S_k^{-2} A^T)^{-1} A S_k^{-1} d_u^k   (7.55a)

By defining Q_k = (A S_k^{-2} A^T)^{-1} A S_k^{-1}, (7.55a) is simplified as

d_w^k = −Q_k d_u^k   (7.55b)

The above equation says that d_w^k is actually determined by d_u^k in the transformed
space. If we can find an appropriate direction d_u^k such that (7.53b) is satisfied, then we
can achieve our goal. To do so, we simply let

d_u^k = −Q_k^T b   (7.56a)

then we have

b^T d_w^k = −b^T Q_k d_u^k = b^T Q_k Q_k^T b = ||Q_k^T b||^2 ≥ 0

Combining (7.56a) and (7.55b), we see that

d_w^k = (A S_k^{-2} A^T)^{-1} b   (7.56b)

Consequently, from (7.53a), we have the following moving direction in the original
space:

d_s^k = −A^T d_w^k = −A^T (A S_k^{-2} A^T)^{-1} b   (7.56c)
Once the moving direction (d_w^k; d_s^k) is known, the step-length β_k is dictated by the
positivity requirement on s^{k+1}, as in the primal affine scaling algorithm, namely,

1. If d_s^k = 0, then the dual problem has a constant objective value in its feasible
domain and (w^k; s^k) is dual optimal.
2. If d_s^k ≥ 0 (but ≠ 0), then problem (7.50) is unbounded.
3. Otherwise,

β_k = α · min_i { s_i^k / [−(d_s^k)_i]  |  (d_s^k)_i < 0 },  where 0 < α < 1

Note that, similar to the way we defined dual estimates in the primal affine scaling
algorithm, if we define

x^k = −S_k^{-2} d_s^k   (7.57)

then A x^k = A S_k^{-2} A^T d_w^k = b. Therefore, x^k can be viewed as a "primal estimate" in
the dual affine scaling algorithm. Once the primal estimate satisfies x^k ≥ 0, it
becomes a primal feasible solution with a duality gap c^T x^k − b^T w^k. Moreover, if
c^T x^k − b^T w^k = 0, then (w^k; s^k) must be dual optimal and x^k primal optimal. This
information can be used to define stopping rules for the dual affine scaling algorithm.

7.2.2 Dual Affine Scaling Algorithm

Based on the basic ideas discussed in the previous section, we outline a dual affine
scaling algorithm here.

Step 1 (initialization): Set k = 0 and find a starting solution (w^0; s^0) such that
A^T w^0 + s^0 = c and s^0 > 0. (Details will be discussed later.)
Step 2 (obtain directions of translation): Let S_k = diag(s^k) and compute

d_w^k = (A S_k^{-2} A^T)^{-1} b  and  d_s^k = −A^T d_w^k

Step 3 (check for unboundedness): If d_s^k = 0, then STOP. (w^k; s^k) is dual
optimal. If d_s^k ≥ 0 (but ≠ 0), then also STOP. The dual problem (7.50) is unbounded.
Step 4 (computation of the primal estimate): Compute the primal estimate

x^k = −S_k^{-2} d_s^k

Step 5 (check for optimality): If x^k ≥ 0 and c^T x^k − b^T w^k ≤ ε, where ε is a
preassigned small positive number, then STOP. (w^k; s^k) is dual optimal and x^k is
primal optimal. Otherwise, go to the next step.
Step 6 (computation of step-length): Compute the step-length

β_k = min_i { α s_i^k / [−(d_s^k)_i]  |  (d_s^k)_i < 0 },  where 0 < α < 1

Step 7 (move to a new solution): Update the dual variables (w; s) by

w^{k+1} = w^k + β_k d_w^k
s^{k+1} = s^k + β_k d_s^k

Set k ← k + 1 and go to Step 2.
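A compact sketch of these steps in Python follows (our names; dense linear algebra only, as in the primal sketch).

```python
import numpy as np

def dual_affine_scaling(A, b, c, w0, alpha=0.99, eps=1e-8, max_iter=500):
    """Sketch of Steps 2-7 of the dual affine scaling algorithm."""
    w = np.array(w0, dtype=float)
    s = c - A.T @ w                                  # Step 1 requires s > 0
    for _ in range(max_iter):
        S2inv = 1.0 / (s * s)                        # diagonal of S_k^{-2}
        dw = np.linalg.solve(A @ (S2inv[:, None] * A.T), b)   # (7.56b)
        ds = -A.T @ dw                               # (7.56c), keeps A^T w + s = c
        if np.allclose(ds, 0):
            return w, s                              # Step 3: constant dual objective
        if not np.any(ds < 0):
            raise ArithmeticError("dual problem (7.50) is unbounded")
        x = -S2inv * ds                              # Step 4: primal estimate (7.57)
        if np.all(x >= 0) and c @ x - b @ w <= eps:  # Step 5: small duality gap
            return w, s
        beta = alpha * np.min(s[ds < 0] / -ds[ds < 0])   # Step 6
        w, s = w + beta * dw, s + beta * ds          # Step 7
    return w, s

# Example 7.2: with the data of Example 7.1, dual_affine_scaling(A, b, c,
# [-3., -3.]) should approach w* = (-2, -1) with dual objective -45.
```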


Now we present an example to illustrate the dual affine scaling algorithm.
Example 7.2
Consider the dual problem of Example 7.1 and solve it by using the dual affine scaling
algorithm.
First note the dual problem is

Maximize 15w_1 + 15w_2
subject to w_1 + s_1 = −2
           −w_1 + w_2 + s_2 = 1
           w_1 + s_3 = 0
           w_2 + s_4 = 0
           s_1, s_2, s_3, s_4 ≥ 0,  w_1, w_2 unrestricted

It is easy to verify that w^0 = [−3  −3]^T and s^0 = [1  1  3  3]^T constitute an
initial interior feasible solution. Hence,

S_0 = diag(1, 1, 3, 3)  and  d_w^0 = (A S_0^{-2} A^T)^{-1} b = [23.53211  34.67890]^T
and

-23.53211l
do= -AT do = -11.14679
s w -23.53211
[
-34.67890
Then

x0 = -S 0 2 d~ = (23.53211 11.14679 2.61467 3.8532ll



Although x^0 ≥ 0, the duality gap is c^T x^0 − b^T w^0 = 54.08257, which is far bigger than
zero. Hence the current solution is not optimal yet.
To calculate the step-length, we choose α = 0.99. Consequently,

β_0 = (0.99 × 1) / 23.53211 = 0.04207

Updating the dual variables, we have

w^1 = [−3  −3]^T + 0.04207 × [23.53211  34.67890]^T = [−2.01000  −1.54105]^T

and

s^1 = [1  1  3  3]^T + 0.04207 × [−23.53211  −11.14679  −23.53211  −34.67890]^T
    = [0.01000  0.53105  2.01000  1.54105]^T

So far, we have finished one iteration of the dual affine scaling algorithm. Iterating
again, we obtain

w^2 = [−2.00962  −1.01494]^T
s^2 = [0.00962  0.00531  2.00962  1.01494]^T
x^2 = [29.80444  14.80452  0.00001  0.19548]^T

This time, x^2 > 0 and the duality gap has drastically reduced to

c^T x^2 − b^T w^2 = −44.80435 − (−45.36840) = 0.56405

which is clearly closer to zero. The reader may carry out more iterations and verify that the
optimal value is attained at w* = [−2  −1]^T and s* = [0  0  2  1]^T with an optimal
objective value of −45. The corresponding primal solution x* is located at [30  15  0  0]^T.

7.2.3 Implementing the Dual Affine Scaling Algorithm

In this section we introduce two methods, the "big-M method" and "upper bound
method," to find an initial dual feasible interior solution for the dual affine scaling al-
gorithm. Then we discuss the stopping rules and report some computational experience
regarding dual affine scaling.

Starting the dual affine scaling algorithm. The problem here is to find
(w^0; s^0) such that A^T w^0 + s^0 = c and s^0 > 0. Note that, in a special case, if c > 0, then
we can immediately choose w^0 = 0 and s^0 = c as an initial interior feasible solution
for the dual affine scaling algorithm. Unfortunately, this special case does not happen
every time, and we have to depend upon other methods to start the dual affine scaling
algorithm.

Big-M Method. One of the most widely used methods to start the dual affine
scaling is the big-M method. In this method, we add one more artificial variable, say w_a,
and a large positive number M. Then consider the following "big-M" linear programming
problem:

Maximize b^T w + M w_a

subject to A^T w + p w_a + s = c   (7.58)

w, w_a unrestricted and s ≥ 0

where p ∈ R^n is a column vector whose ith component, i = 1, ..., n, is defined by

p_i = 1 if c_i ≤ 0;  p_i = 0 if c_i > 0

Now, we define \bar{c} = max_i |c_i|, choose θ > 1, and set w = 0, w_a = −θ\bar{c}, and
s = c + θ\bar{c}p. It is clearly seen that (0; −θ\bar{c}; c + θ\bar{c}p)^T is feasible to the big-M problem
(7.58) with c + θ\bar{c}p > 0. Hence we have found an initial interior feasible solution to the
big-M problem to start the dual affine scaling algorithm.
Note that w_a starts at −θ\bar{c} < 0 and is forced to increase in the iterative process,
since M is a large positive number. At some point we expect to see w_a become
nonnegative, unless the original problem (7.50) is infeasible. When w_a
approaches or even crosses zero at the kth iteration, we can take \hat{w} = w^k and \hat{s} = s^k + p w_a^k
to start the dual affine scaling algorithm for the original dual linear programming problem
(7.50). If w_a does not approach or cross zero, then it can be shown that the original
problem (7.50) is infeasible. Showing this is left to the reader as an exercise.
Also note that both θ and M are responsible for the magnitude of M w_a. Their values
could be "tweaked" simultaneously for numerical stability and robustness.

Upper Bound or Artificial Constraint Method. In this method, we assume that,
for a sufficiently large positive number M, one of the optimal solutions to the original
primal linear programming problem (7.1) falls in the ball S(0; M), and we consider a
corresponding "upper-bounded" linear programming problem:

Minimize c^T x

subject to Ax = b and 0 ≤ x ≤ u

where u = [M  M  ...  M]^T ∈ R^n. The additional upper-bound constraints are
artificially added to create a dual problem with a trivial initial interior solution. Actually,
the dual of the upper-bounded problem is given by

Maximize b^T w − u^T v

subject to A^T w + s − v = c,  s ≥ 0, v ≥ 0, and w unrestricted

The vector v is sometimes called the vector of surplus variables. Remembering the
definitions of \bar{c} and θ in the previous paragraph, we see that w^0 = 0, v^0 = θ\bar{c}e > 0, and
s^0 = c + θ\bar{c}e > 0 form an interior feasible solution to the dual upper-bounded problem.
Subsequently, the dual affine scaling algorithm can be applied.
Subsequently, the dual affine scaling algorithm can be applied.

The success of this method depends upon the choice of M. It has to be sufficiently
large to include at least one optimal solution to problem (7.50). If the original linear
programming problem is unbounded, the choice of M becomes a real problem.

Stopping rules for dual affine scaling. For the dual affine scaling algorithm,
we still use the K-K-T conditions as the optimality test. Note that dual feasibility is
maintained by the algorithm throughout the entire iterative procedure. Hence we only
need to check the primal feasibility and complementary slackness.
Combining (7.56c) and (7.57), we see that the primal estimate is given by

x^k = −S_k^{-2} d_s^k = S_k^{-2} A^T (A S_k^{-2} A^T)^{-1} b   (7.59)

It is easy to see that the explicit constraints Ax = b are automatically satisfied by
any x^k defined according to formula (7.59). Therefore, if x^k ≥ 0, then it must
be primal feasible. Also note that, if we convert problem (7.50) into a standard-form
linear programming problem and apply the primal affine scaling algorithm to it, the associated dual
estimates eventually result in formula (7.59).
Once we have reached dual feasibility at w^k and primal feasibility at x^k, the
complementary slackness is measured by σ_c = c^T x^k − b^T w^k. When σ_c is smaller than a
given threshold, we can terminate the dual affine scaling algorithm.

Experiences with dual affine scaling. In light of the fact that the dual affine
scaling algorithm is equivalent to the primal affine scaling algorithm applied to the dual
problem, similar convergence properties of the dual affine scaling can be established
as we did for the primal affine scaling algorithm. The computational effort in each
iteration of the dual affine scaling is about the same as in the primal affine scaling. To
be more specific, the computational bottleneck of the primal affine scaling is to invert the
matrix A X_k^2 A^T, and the bottleneck of the dual affine scaling is to invert the matrix A S_k^{-2} A^T.
But these two matrices have exactly the same structure, although they use different
scalings. Any numerical method, for example Cholesky factorization, that improves the
computational efficiency of one algorithm definitely improves the performance of the
other one.
Based on the authors' experience, we have observed the following characteristics
of the dual affine scaling algorithm:

1. For a variety of practical applications, we have noted a general tendency that
the dual affine scaling algorithm converges faster than the primal affine scaling
algorithm. However, the major drawback of the dual affine scaling algorithm is
that it does not give good estimates of the primal variables.
2. The problem of losing feasibility in the primal affine scaling algorithm is not a
serious problem for dual affine scaling. Actually, since dual feasibility is
maintained by choosing d_s^k = −A^T d_w^k, one could approximate the
inverse matrix of A S_k^{-2} A^T in computing d_w^k and still obtain a feasible direction
d_s^k. Hence the dual method is less sensitive to numerical truncation and round-off
errors.

3. The dual affine scaling algorithm is still sensitive to dual degeneracy, but less
sensitive to primal degeneracy.
4. The dual affine scaling algorithm improves its dual objective function in a very
fast fashion. However, attaining primal feasibility is quite slow.

7.2.4 Improving Computational Complexity

Like the primal affine scaling algorithm, the dual affine scaling has no theoretical
proof of being a polynomial-time algorithm. The philosophy of "staying away from the
boundary" to gain faster convergence also applies here. In this section, we introduce the
power series method and the logarithmic barrier function method to improve the performance
of the dual affine scaling algorithm.

Power series method. In applying the primal affine scaling algorithm, if we
take the step-length α_k to be infinitesimally small, then the locus of x^k can be viewed as
a continuous curve extending from the starting point x^0 to the optimal solution x*. As a
matter of fact, in the limit, we may pose the following equation:

dx(α)/dα = lim_{α_k→0} (x^{k+1} − x^k)/α_k = X_k d_y^k = −X_k^2 (c − A^T w^k)

as a first-order differential equation and attempt to find a solution function which describes
the continuous curve. This smooth curve is called a continuous trajectory, and the moving
direction d_x^k = X_k d_y^k at each iteration is simply the tangential direction (or first-order
approximation) of this curve at an interior point of P.
As we can see from Figure 7.5, the first-order approximation deviates from the
continuous trajectory easily. A higher-order approximation may stay closer to the continuous trajectory that leads to an optimal solution. The basic idea of the power series
method is to find higher-order approximations of the continuous trajectory in terms of
truncated power series.

Figure 7.5 (first-order steps drifting away from the continuous trajectory leading to x*)

The same idea applies to the dual affine scaling algorithm. As a matter of fact,
a continuous version of the dual affine scaling algorithm may be obtained by letting
β_k → 0 and solving Equation (7.51) as a system of ordinary differential equations.
These equations specify a vector field on the interior of the feasible domain. Our
objective here is to generate higher-order approximations of the continuous trajectories
by means of truncated power series.
Combining (7.56b) and (7.56c), we first write a system of differential equations
corresponding to (7.51) as follows:

dw(β)/dβ = (A S(β)^{-2} A^T)^{-1} b   (7.60a)

ds(β)/dβ = −A^T dw(β)/dβ   (7.60b)

with the initial conditions

w(0) = w^0 and s(0) = s^0   (7.60c)

where S(β) = diag(s(β)) with s(β) = s^0 + β d_s > 0.
In order to find the solution functions w({J) and s({J) which trace out a continuous
trajectory, we may consider expanding them in power series at the current solution
w(O) = w 0 and s(O) = s0 such that

(7.6la)

and

s({J) =so+ f{Jj (~) [djs(~)] = f{Jj (~) [djs(~)] (7.6lb)


j=l 1. d{J {3=0 j=O 1. d{J {3=0
If we denote

f<i> = (~) [di f(~)]


l. df3 {3=0
for a function f(fJ), then (7.61) becomes
00

w({J) = L {Jjw<j> (7.62a)


j=O
and
00

s({J) = L {Jj s<j> (7.62b)


j=O
Equation (7.62) can be truncated at any desirable order to get an approximation
of the continuous trajectory. Of course, higher-order truncation depicts the continuous
trajectory more closely but at higher computational expense. In general, to obtain a kth

(k ≥ 1) order approximation of w(β) and s(β), we need to compute w^{<j>} and s^{<j>} for
j = 1, ..., k. But Equation (7.60b) implies that

s^{<j>} = −A^T w^{<j>},  for j = 1, 2, ...   (7.63)

Hence the key is to compute w^{<j>} for j ≥ 1.
We start with Equation (7.60a) and denote M(β) = A S(β)^{-2} A^T. Then we have

dw(β)/dβ = [M(β)]^{-1} b   (7.64)

where M^{<0>} = M(0) = A S(0)^{-2} A^T, in which S(0)^{-2} is the diagonal matrix with 1/(s_i^0)^2
being its ith diagonal element. Equivalently,

M(β) dw(β)/dβ = b   (7.65a)

Taking the kth derivative on both sides, we have

∑_{j=0}^k [k! / (j!(k−j)!)] [d^{(k−j)}M(β)/dβ^{(k−j)}] [d^{(j+1)}w(β)/dβ^{(j+1)}] = 0   (7.65b)

In other words, we have

∑_{j=0}^k (j+1) M^{<k−j>} w^{<j+1>} = 0   (7.65c)

which further implies that

w^{<k+1>} = −[1/(k+1)] [M^{<0>}]^{-1} ∑_{j=1}^k j M^{<k−j+1>} w^{<j>},  for k ≥ 1   (7.66)

Hence our focus is shifted to finding M^{<k−j+1>} w^{<j>} for j = 1, 2, ..., k and computing
w^{<k+1>} in a recursive fashion.
Remembering that M(β) = A S(β)^{-2} A^T, we let Y(β) = S(β)^{-2} be a diagonal
matrix with 1/(s_i(β))^2 as its ith diagonal element. Then, we have

M(β) = A Y(β) A^T   (7.67)

and

M^{<k>} = A Y^{<k>} A^T,  for k ≥ 1   (7.68)

In order to compute Y^{<j>}, we further define Z(β) = S(β)^2 to be a diagonal matrix
with (s_i(β))^2 as its ith diagonal element. In this way, we see that

Z(β) Y(β) = I.

Taking the kth derivative results in

∑_{j=0}^k Z^{<k−j>} Y^{<j>} = 0,  for all k ≥ 1

Consequently,

Y^{<k>} = −Y^{<0>} ∑_{j=0}^{k−1} Z^{<k−j>} Y^{<j>},  for all k ≥ 1   (7.69)

where Y^{<0>} = Y(0), which is a diagonal matrix with 1/(s_i^0)^2 being its ith diagonal
element, and Z^{<0>} = Z(0), which is a diagonal matrix with (s_i^0)^2 being its ith diagonal
element.
Now, if we know Z^{<k−j>}, then Y^{<k>} can be obtained by Equation (7.69). But this
is an easy job, since Z(β) = S(β)^2. Taking the kth derivative on both sides, we have

Z^{<k>} = ∑_{j=0}^k S^{<k−j>} S^{<j>}   (7.70)

where S^{<j>} for each j is a diagonal matrix which takes the ith component of s^{<j>},
i.e., s_i^{<j>}, as its ith diagonal element. In particular, S^{<0>} = S(0), which is the diagonal
matrix with s_i^0 as its ith diagonal element.
Summarizing what we have derived so far, we can start with the initial conditions

w^{<0>} = w^0,  s^{<0>} = s^0,  S^{<0>} = diag(s^0)

Proceed with

w^{<1>} = [M(0)]^{-1} b = [A S_0^{-2} A^T]^{-1} b,  and  s^{<1>} = −A^T w^{<1>}

Remembering that Z^{<0>} = Z(0) = diag((s^0)^2) and Y^{<0>} = Y(0) = [Z(0)]^{-1}, from
Equation (7.70) we can derive Z^{<1>}. Then, Y^{<1>} can be derived from Equation (7.69).
Now, recursively, we can compute w^{<2>} by Equations (7.66) and (7.68); compute s^{<2>} by

s^{<k>} = −A^T w^{<k>}   (7.71)

for k = 2; compute Z^{<2>} by Equation (7.70); and compute Y^{<2>} by Equation (7.69).
Proceeding with this recursive procedure, we can approximate w(β) and s(β) by a power
series up to the desired order.
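For concreteness, here is a second-order truncation coded along the recursion above. The names are ours; every diagonal matrix is kept as a vector, and the single factorization of M^{<0>} = A S_0^{-2} A^T is reused for both solves.

```python
import numpy as np

def power_series_step(A, b, w0, s0, beta):
    """Second-order approximation w0 + beta*w1 + beta^2*w2 of (7.62)."""
    y0 = 1.0 / (s0 * s0)                   # diagonal of Y<0> = S(0)^{-2}
    M0 = A @ (y0[:, None] * A.T)           # M<0> = A S_0^{-2} A^T
    w1 = np.linalg.solve(M0, b)            # w<1> = [M<0>]^{-1} b
    s1 = -A.T @ w1                         # s<1>, by (7.63)
    z1 = 2.0 * s0 * s1                     # Z<1> = 2 S<0> S<1>, by (7.70)
    y1 = -y0 * z1 * y0                     # Y<1> = -Y<0> Z<1> Y<0>, by (7.69)
    M1w1 = A @ (y1 * (A.T @ w1))           # M<1> w<1> with M<1> = A Y<1> A^T
    w2 = -0.5 * np.linalg.solve(M0, M1w1)  # (7.66) with k = 1
    s2 = -A.T @ w2                         # (7.71)
    return w0 + beta * w1 + beta**2 * w2, s0 + beta * s1 + beta**2 * s2
```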
Notice that the first-order power series approximation results in the same moving direction as the dual affine scaling algorithm at a current solution. In order to get a
higher-order approximation, additional matrix multiplications and additions are needed.
However, there is only one matrix inversion, that of A S_0^{-2} A^T, involved, which is needed anyway
by the first-order approximation. Since matrix inversion dominates the computational
complexity of matrix multiplication and addition, it might be cost-effective to incorporate higher-order approximation. According to the authors' experience, we found the
following characteristics of the power series method:

1. Higher-order power series approximation becomes more insensitive to degeneracy.


2. Compared to the dual affine scaling algorithm, a power series approximation of
order four or five seems to be able to cut the total number of iterations by half.
3. The power series method is more suitable for dense problems.

As far as the computational complexity is concerned, although it is conjectured that


the power series method might result in a polynomial-time algorithm, no formal proof
has yet been given.

Logarithmic barrier function method. Similar to that of the primal affine


scaling algorithm, we can incorporate a barrier function, with extremely high values
along the boundaries {(w; s) I Sj = 0, for some 1 :::=: j :::=: n }, into the original objective
function. Now, consider the following nonlinear programming problem:
n
Maximize FJL(w, J.L) = bT w + J.L 2:)oge(cj- AJ w) (7.72a)
j=i

subject to AT w < c (7.72b)


where f.L > 0 is a scalar and AJ is the transpose of the jth column vector of matrix
A. Note that if w*(J.L) is an optimal solution to problem (7.72), and if w*(J.L) tends to
a point w* as f.L approaches zero, then it follows that w* is an optimal solution to the
original dual linear programming problem.
The Lagrangian of problem (7.72) becomes
n
L(w, A.)= bT w + J.L 2.:: loge(cj - AJ w) +A. T (c-AT w)
j=i

where A. is a vector of Lagrangian multipliers. Since Cj - AJ w > 0, the complementary


slackness condition requires that A.= 0, and the associated K-K-T conditions become
b- J.LAS- 1e = 0, and s> 0
Assuming that wk and sk = c - AT wk > 0 form a current interior dual feasible solution,
we take one Newton step of the K-K-T conditions. This results in a moving direction
1
b.w = -(ASJ; 2 AT)- 1 b- (ASJ; 2 AT)- 1 ASJ; 1e (7.73)
J.L
Compare to d~ in (7.56b), we see that -(ASJ;2 AT)- 1ASJ; 1e is an additional term in
the logarithmic barrier method to push a solution away from the boundary. Therefore,
sometimes the logarithmic barrier function method is called dual affine scaling with
centering force.
Sec. 7.3 The Primal-Dual Algorithm 177

By appropriately choosing the barrier parameter J.L and step-length at each iteration,
C. Roos and J.-Ph. Vial provided a very simple and elegant polynomiality proof of the
dual affine scaling with logarithmic barrier function. Their algorithm terminates in at
most O(,jfi) iterations. Earlier, J. Renegar had derived a polynomial-time dual algo-
rithm based upon the methods of centers and Newton's method for linear programming
problems.
Instead of using (7.72), J. Renegar considers the following function:
n
f(w, /3) = t loge(bT W- /3) + L loge(Cj- AJ w) (7.74)
j=l

where f3 is an underestimate for the minimum value of the dual objective function (like the
idea used by Todd and Burrell) and t is allowed to be a free variable. A straightforward
calculation of one Newton step at a current solution (wk; sk) results in a moving direction
(7.75)
where
bT (AS.k 2 AT)- 1 AS.k 1e + bT wk- f3
y = (bTwk- {3) 2/t + bT(AS.k2 AT)- 1b
By carefully choosing values oft and a sequence of better estimations {f3k}, J. Renegar
showed that his dual method converges in O(,jfiL) iterations and results in a polynomial-
time algorithm of a total complexity 0 (n 3·5 L) arithmetic operations. Subsequently, P.M.
Vaidya improved the complexity to O(n 3 L) arithmetic operations. The relationship
between Renegar's method and the logarithmic barrier function method can be clearly
seen by comparing (7.73) and (7.75).

7.3 THE PRIMAL-DUAL ALGORITHM

As in the simplex approach, in addition to primal affine scaling and dual affine scaling,
there is a primal-dual algorithm. The primal-dual interior-point algorithm is based on
the logarithmic barrier function approach. The idea of using the logarithmic barrier
function method for convex programming problems can be traced back to K. R. Frisch
in 1955. After Karmarkar's algorithm was introduced in 1984, the logarithmic barrier
function method was reconsidered for solving linear programming problems. P. E. Gill,
W. Murray, M.A. Saunders, J. A. Tomlin, and M. H. Wright used this method to develop
a projected Newton barrier method and showed an equivalence to Karmarkar's projective
scaling algorithm in 1985. N. Megiddo provided a theoretical analysis for the logarithmic
barrier method and proposed a primal-dual framework in 1986. Using this framework, M.
Kojima, S. Mizuno, and A. Yoshise presented a polynomial-time primal-dual algorithm
for linear programming problems in 1987. Their algorithm was shown to converge in at
most O(nL) iterations with a requirement of O(n 3 ) arithmetic operations per iteration.
Hence the total complexity is 0 (n 4 L) arithmetic operations. Later, R. C. Monteiro and
I. Adler refined the primal-dual algorithm to converge in at most 0 (,jfiL) iterations
178 Affine Scaling Algorithms Chap. 7

with O(n 2 ·5 ) arithmetic operations required per iteration, resulting in a total of O(n 3 L)
arithmetic operations.

7.3.1 Basic Ideas of the Primal-Dual Algorithm

Consider a standard-form linear program:


Minimize cT x

subject to Ax= b, (P)

and its dual:


Maximize bT w

subject to AT w +s = c, s :::: 0, w unrestricted (D)

We impose the following assumptions for the primal-dual algorithm:

=
(Al) The setS {x E Rn I Ax= b, x > 0} is nonempty.
=
(A2) The set T {(w; s) E Rm x Rn I ATw + s = c, s > 0} is nonempty.
(A3) The constraint matrix A has full row rank.

Under these assumptions, it is clearly seen from the duality theorem that problems
(P) and (D) have optimal solutions with a common value. Moreover, the sets of the
optimal solutions of (P) and (D) are bounded.
Note that, for x > 0 in (P), we may apply the logarithmic barrier function technique,
and consider the following family of nonlinear programming problems (P p.):
n
Minimize cT x - fJ., L loge Xj
j=!

subject to Ax= b, x>O


where p., > 0 is a barrier or penalty parameter.
As p., --+ 0, we would expect the optimal solutions of problem (P p.) to converge to
an optimal solution of the original linear programming problem (P). In order to prove
it, first observe that the objective function of problem (PIL) is a strictly convex function,
hence we know (Pp.) has at most one global minimum. The convex programming theory
further implies that the global minimum, if it exists, is completely characterized by the
Kuhn-Tucker conditions:
Ax=b, X>O (primal feasibility) (7.76a)
ATw+s = c, S>O (dual feasibility) (7.76b)
XSe- p.,e = 0 (complementary slackness) (7.76c)
where X and S are diagonal matrices using the components of vectors x and s as diagonal
elements, respectively.
Sec. 7.3 The Primal-Dual Algorithm 179

Under assumptions (AI) and (A2) and assuming that (P) has a bounded feasible
region, we see problem (Pp.) is indeed feasible and assumes a unique minimum at x(JL),
for each JL > 0. Consequently, the system (7.76) has a unique solution (x; w; s) E
Rn x Rm x Rn. Hence we have the following lemma:

Lemma 7.5. Under the assumptions (Al) and (A2), both problem (Pp.) and sys-
tem (7.76) have a unique solution.
Observe that system (7.76) also provides the necessary and sufficient conditions
(the K-K-T conditions) for (w(JL); S(JL)) being a maximum solution of the following
program (Dp.):
n

Maximize bT w + JL L loge Sj
j=1

subject to AT w + s = c, s > 0, w unrestricted


Note that Equation (7.76c) can be written componentwise as
for j = 1, ... , n (7.76c')
Therefore, when the assumption (A3) is imposed, x uniquely determines w from Equa-
tions (7.76c') and (7.76b). We let (X(JL); w(JL); S(JL)) denote the unique solution to system
(7.76) for each fL > 0. Obviously, we see X(JL) E Sand (w(JL); s(JL)) E T. Moreover,
the duality gap becomes
g(JL) = CT X(JL) - bT W(JL)
= (cT - w(JL)T A)x(JL)

= S(JL)T X(JL) = nJL (7.77)

Therefore, as JL ~ 0, the duality gap g(JL) converges to zero. This implies that x(JL) and
w(JL) indeed converge to the optimal solutions of problems (P) and (D), respectively.
Hence we have the following result:

Lemma 7.6. Under the assumptions (Al)-(A3), as JL ~ 0, X(JL) converges to


the optimal solution of program (P) and (w(JL); s(JL)) converges to the optimal solution
of program (D).
For JL > 0, we let r denote the curve, or path, consisting of the solutions of system
(7.76), i.e.,
r = {(x(JL); W(JL); s(JL)) 1 (x(JL); w(JL); s(JL)) solves (7.76) for some fL > 0} (7.78)
As JL ~ 0, the path r leads to a pair of primal optimal solution x* and dual optimal
solution (w*; s*). Thus following the path r serves as a theoretical model for a class of
primal-dual interior-point methods for linear programming. For this reason, people may
classify the primal-dual approach as a path-following approach.
Given an initial point (x 0 ; w0 ; s0 ) E S x T, the primal-dual algorithm generates a
sequence of points {(xk; wk; sk) E S x T} by appropriately choosing a moving direction
180 Affine Scaling Algorithms Chap. 7

(d~; d~; d~) and step-length f3k at each iteration. To measure a "deviation" from the
curve r at each (xk; wk; sk), we introduce the following notations, fork= 0, 1, 2, ... ,
fori=1,2, ... ,n (7.79a)

(7.79b)

¢~n = min{¢7; i = 1, 2, ... , n} (7 .79c)

¢;ve
ek _- k (7.79d)
¢min

Obviously, we see that ek ::: 1 and (xk; wk; sk) E r if and only if ek = 1. We shall
see in later sections, when the deviation e0 at the initial point (x 0 ; w 0 ; s 0 ) E S x T is
large, the primal-dual algorithm reduces not only the duality gap but also the deviation.
With suitably chosen parameters, the sequence of points {(xk; wk; sk) E S x T} generated
by the primal-dual algorithm satisfy the inequalities
c 7 xk+l - b7 wk+ 1 = (1 - 2/(nek))(c7 xk - b7 wk) (7.80a)
ek+ 1 -2:::: (1- 1/(n + 1))(ek- 2), if 2 < ek (7.80b)
(7.80c)
The first inequality (7.80a) ensures that the duality gap decreases monotonically.
With the remaining two inequalities we see the deviation ek becomes smaller than 3 in
at most 0 (n log e 0 ) iterations, and then the duality gap converges to 0 linearly with the
convergence rate at least (1 - 2/(3n)).

7.3.2 Direction and Step-Length of Movement

We are now in a position to develop the key steps of the primal-dual algorithm. Let
us begin by synthesizing a direction of translation (moving direction) (d~; d~; d~) at a
current point (xk; wk; sk) such that the translation is made along the curve r to a new
point (xk+ 1 ; wk+l; sk+ 1). This task is taken care of by applying the famous Newton's
method to the system of equations (7.76a)-(7.76c).

Newton direction. Newton's method is one of the most commonly used tech-
niques for finding a root of a system of nonlinear equations via successively approximat-
ing the system by linear equations. To be more specific, suppose that F (z) is a nonlinear
mapping from Rn toRn and we need to find a z* E Rn such that F(z*) = 0. By using the
multivariable Taylor series expansion (say at z = zk), we obtain a linear approximation:
F(zk + .6.z) ~ F(zk) + J(zk).6.z (7.81)
where J(zk) is the Jacobian matrix whose (i, j)th element is given by

[a~iZ;(z) Jz=zk
Sec. 7.3 The Primal-Dual Algorithm 181

and t..z is a translation vector. As the left-hand side of (7.81) evaluates at a root of
F (z) = 0, we have a linear system
(7.82)
A solution vector of equation (7.82) provides one Newton iterate from zk to zk+l =
zk +d~ with a Newton direction d~ and a unit step-length. When J(z*) is nonsingular and
the starting point z0 is "close enough" to z*, Newton's method converges quadratically
to z*. But this spectacular convergence rate is only a "local" behavior. For a general
nonlinear mapping F (z), if z0 is not close enough to z*, the Newton iteration may diverge
hopelessly.
Let us focus on the nonlinear system (7.76a-c). Assume that we are at a point
(xk; wk; sk) for some f.Lk > 0, such that xk, sk > 0. The Newton direction (d~; d~; d~)
is determined by the following system of linear equations:

(7.83)
[l
where Xk and Sk are the diagonal matrices formed by xk and sk, respectively. Multiplying
it out, we have
(7.84a)
(7.84b)
(7.84c)
where
(7.85)
Notice that if xk E Sand (wk; sk) E T, then tk = 0 and uk = 0 correspondingly.
To solve system (7.83), we multiply both sides of Equation (7.84b) by AXkSk -t.
Then, we have
(7.86)
Now from Equation (7.84c), we have
d; = X,i; vk- X,i; Skd~.
1 1
(7.87)
Following (7.85), we denote X,i; 1vk = J.LkX_;; 1e- Ske as pk. Using Equation (7.84a) in
the above equation would produce
AXkS,i; 1d; = AXkS,i; 1pk - tk (7.88)
Substituting Equation (7.88) back into Equation (7.86) yields
d~ = [AXkS,i; 1ATrl (AXkS,i; 1(uk- pk) + tk) (7.89a)

where XkS,i; 1 is a positive definite diagonal matrix.


182 Affine Scaling Algorithms Chap. 7

Once d~ is obtained, d~ and d~ can be readily computed by


dk = uk- AT dk (7 .89b)
s w

and
dkX = xk s-I[pk-
k
dkJ S
(7.89c)
Again, for (xk; wk; sk) E S x T, Equations (7.89a)-(7.89c) are simplified as

d~ = -[AD~Arr 1 AS; 1 vk (7.90a)


dks =-AT dkw (7.90b)
d~ = s; 1[vk- Xkd~] (7.90c)

where D~ = XkSi: 1 and I\ = diag ( y'x.kjSf).


It is important to note that d~, d~, and d~ in (7.90) are closely related. If we
denote vector

(7.91a)

and matrix
", ?' . ,, ~, . '1 l A

Q = DkA' (ADj;A' )- ADk

then (d~; d~; d~) can be rewritten as

d~ = Dk(I- Q)rk(f.L) (7.90a')


d~ = -(AD~AT)- 1 ADkrk(f.L) (7.90b')

d~ = Di: 1Qrk (t-L) (7.90c')

Since matlix Q is the orthogonal projection matrix onto the column space of matrix
DkAT, we see that

(7.9lb)
(7.91c)

After obtaining a Newton direction at the kth iteration, the primal-dual algorithm
iterates to a new point according to the following translation:
xk+I = xk + ,Bkd~
wk+I = wk + ,Bkd~
sk+ 1 = l + ,Bkd~
with an appropriately chosen step-length ,Bk at the kth iteration such that xk+I E S and
(wk+I; sk+I) E T.
Sec. 7.3 The Primal-Dual Algorithm 183

Step-length and penalty parameter. When (xk; wk; sk) E S x T, the primal-
dual algorithm needs two parameters a and r, such that 0 ::::: r < a < 1, to control the
penalty (or barrier) parameter pf and the step-length f3k at the kth iteration.
For the penalty parameter, remembering the notations defined in (7.79), since we
want to reduce the duality gap, n¢~ve' we may choose the penalty parameter to be a
smaller number by setting
(7.92)

In this way, definition (7 .85) implies that vk ::::: 0.


As to the step-length f3k, the choice is closely related to the complementary slack-
ness. Note that Equations (7.84c) and (7.85) imply that xfd~ + sfd;; = p,k- ¢f. Hence
the complementary slackness varies quadratically in terms of the step-length f3, since

(7.93a)
i = 1,2, ... ,n

Moreover, since (d~l d~ = 0, we see the average complementary slackness, and hence
the duality gap, changes linearly in /3, i.e.,

(7.93b)

Ignoring the quadratic term in (7.93a) and lowering the value p,k = a¢~ve by a
factor r < a, we can define a linear function
(7.94)

The function ¢f (/3) can be either convex or concave depending upon the value of d;; d~.
For the convex piece, since d;;d~ ~ 0, the curve of ¢f(f3) lies above the curve of
1/fk (/3) for 0 ::::: f3 ::::: 1. However, a concave piece of ¢f (/3) may intersect 1/rk (/3) as
shown in Figure 7.6. In order to control the deviation parameter (Jk while reducing the
complementary slackness, we choose
for all f3 E (0, fJ),

0< 7J < 1, and i = 1, ... , n } (7.95)

Then the step-length f3k at the kth iteration is defined by


(7.96)

The geometrical significance of ak and f3k is depicted in Figure 7.6. It is clearly seen
from the figure that the choice of f3k depends on the choice of 0 < r < 1 to ensure the
existence of ak.
Note that when (xk; wk; sk) E S x T, since (d~; d~; d~) is a solution to (7.84)
with tk = 0 and uk = 0, we know that Axk+ 1 = b and AT w + s = c. Moreover, the
definition of ak in (7.95) further implies that xk+ 1 > 0 and sk+ 1 > 0. In other words,
(xk+ 1 ; wk+I; sk+ 1) E S x T.
184 Affine Scaling Algorithms Chap. 7

L---------------------~--~---~
0 ak Figure 7.6

7.3.3 Primal-Dual Algorithm

We now are ready to state the primal-dual algorithm as following:

Step 1 (initialization): Set k = 0 and find a starting solution (x 0 ; w 0 ; s0 ) E S x T.


Let E > 0 be a tolerance for duality gap and CY, r be control parameters such that
O:sr<CY<l.
Step 2 (checking for optimality): If c7 xk - b 7 wk < E, then stop. Otherwise,
continue.
Step 3 (finding the direction of translation): Define <P~ve and ¢~in by (7.79). Let
J-Lk = CY</J~ve and vk = J-Lke- XkSke. Computed~, d~, d~ according to (7.90).
Step 4 (calculating step-length): Compute ak by (7.95) and f3k by (7.96).
Step 5 (moving to a new solution): Let
xk+l = xk + f3kd~
wk+! = wk + f3kd~

sk+ 1 = sk + {3kd~
Set k ~ k + 1 and go to Step 2.
7.3.4 Polynomial-Time Termination

Unlike the pure primal affine scaling and the pure dual affine scaling, the primal-dual al-
gorithm is a polynomial-time algorithm. The well-chosen step-length f3k at each iteration
leads to the nice convergence results:

Theorem 7.3. If the step-length f3k < 1 at the kth iteration, then
k 4(CY-r) 4(CY-r)
{3> >------::----:- (7.97a)
- n(l - 20' + ek0'2) - n(l + 0'2)ek
Sec. 7.3 The Primal-Dual Algorithm 185

cT xk+l _ bT wk+l = (1 _ (1 _ u)f3k)(cT xk _ bT wk) (7.97b)


ek+J- ujr::::: (1- v)(ek- ujr) if ujr < ek (7.97c)
if ek ::::: ()" /r (7.97d)

where
4(u- r)r
v = ----- -,-----
n(l + u 2 ) + 4(u- r)r

On the other hand, if f3k = 1, then


cT xk+I - bT wk+l = u(cT xk- bT wk) (7.98a)
ek+I ::::: O"jT (7.98b)

Proof Let us define

(7.99)

where rk(J.Lk) is defined by (7.91a). What we want to prove first is that

¢; (/3) :=:: r/ (/3) forf3E[0,1] and i=l, ... ,n (7.100)

Note that the first two terms of Equation (7 .93a) are bounded below by the first
two terms of Equation (7.99) for all f3 E [0, 1]. Hence we only have to evaluate the
last quadratic term. By (7.91b) and (7.9lc), we know ~(J.Lk) is the orthogonal sum of
I k k
vectors D;; dx and Dkds. Therefore,
A A

Moreover, we see that

Hence we conclude that (7.100) holds.


Now consider the case of f3k < 1. We let y = max{/3 I r/C/3) :::: ljrk (/3) }, where
ljrk(fJ) is defined by (7.94) with r = u. Since J.Lk = u¢~ve• with (7.96), we know
f3k = ak :=:: y. Moreover, putting (7.94) and (7.99) together, by computing a positive
solution of the equation

(7.101)
we get

(7.102)
186 Affine Scaling Algorithms Chap. 7

On the other hand, by (7.91), we have


n
jjrk(~Lk)jj2 = L)C¢7)1/2- O"¢~veC¢7)-l/2f
i=l
n
= n¢~ve- 20"n¢~ve + 0" 2(¢:ve) 2 2)¢7)-l
i=l
::: n¢~ve(l - 20" + 0" ¢:vef¢~in)
2

= n¢~ve(l - 20" + 0"28k)


since gk ::: 1 (7.103)
Combining the last two inequalities in (7.103) with (7.102), we obtain the inequality
(7.97a).
Notice that x7d~ + sfd;; =ILk- ¢f and (d~f (d;) = 0. We further have
n
¢;:;,' = (ljn) L ¢7+1 = (ljn)(xk+l)T (sk+')
i=l
(7.104)
Since ILk = O"¢~ve = O"(cT xk- bT wk)jn, Equation (7.97b) follows immediately.
Recalling the definitions (7.94), (7.95), and (7.96), we see that
¢~fr{ = 1/lk ({3k) = ¢!un + f3k ( r ¢;ve - ¢~in) (7.105)
Together with (7.104), we have

gk+l _ ~ = ¢~ve + f3k ( O"¢~ve - ¢~ve) ()"


r ¢~in+ f3k ( r¢~ve -¢~in) r
_ (1 - f3k) ( r¢~ve - O"¢~in)
- [¢!un + f3k (r¢~ve- ¢~in)Jr

-
- [
1- rek f3k
1-(1-r8k)f3k
l( ak - -()")
r
o.1 06)
When gk ::: O"jr, the right-hand side of (7.106) becomes nonpositive. Conse-
quently, so does the left-hand side, and gk+I ::: O"jr. This proves (7.97d).
On the other hand, when gk > T•
(7.106) implies that

gk+l- ~< [1- r8kf3k ] (ek- ~) (7.107)


r- 1+r8kf3k r

Note that (7.97a) implies


Sec. 7.3 The Primal-Dual Algorithm 187

Substituting this inequality into (7.107), we have the result (7.97c).


Finally, let us prove (7.98a) and (7.98b) in case f3k = 1. It is not difficult to see
that (7.98a) can be derived in the same way as we derive (7.97b). As to (7.98b), we first
note that as f3k = 1,
for i = 1, 2, ... , n (7.109)
Hence,

(7.110)
r
and the proof is complete.

In view of Theorem 7 .3, if k* is the earliest iteration at which ek* :=: CJ1r, then

~
r
< ek ::: (1- v/ (e 0 - ~)
r
+ ~.
r
vk < k*
and
ek <-
()
v k:::: k*
-r '
If such a k* does not exist, then ek > CJ I r, V k and

-IJ <
r
ek ::: (1- v) k(O
e - -CJ)
r
+ -,
CJ
r
\fk

Notice that 0 < v < 1, in either case we have


ek::: max {CJir,e 0 }' vk
Then it is clear to see that ek gets smaller than (CJ lr) + 1 in at most O(n loge e 0 )
(say, k) iterations. Consequently, it follows from Equation (7.97a) that
40 - CJ)(CJ- r) k
( 1- CJ)f3k > V k >_ (7.111)
- n(l + CJ 2 )(% + 1)'

By the inequality (7.97b), the duality gap cT xk- bT wk attains the given accuracy
E and the iteration stops in at most

0 ( n loge ( cT xo ~ bT wo))
additional iterations. Hence the primal-dual algorithm terminates in at most

0 (n loge eO) +0 ( n loge ( CT XO ~ bT WO))


iterations.
There are various ways of setting the control parameters CJ and r such that 0 :=:
r < CJ < 1. As a special case, we let CJ = 112 and r = 114, then
k 4
f3 > -
- nek
188 Affine Scaling Algorithms Chap. 7

and (7 .80) follows. Also notice that at each iteration of the primal-dual algorithm, the
computational bottleneck is the inversion of the matrix AD~AT. A direct implementation
requires O(n 3 ) elementary operations for matrix inversion and results in an O(n 4 L)
complexity for the primal-dual algorithm. Definitely, this complexity can be reduced by
better implementation techniques.

7.3.5 Starting the Primal-Dual Algorithm

In order to apply the primal-dual algorithm, we start with an arbitrary point (x0 ; w 0 ; s0 ) E
R"+m+n such that x0 > 0 and s0 > 0.
In case Ax 0 = b and AT w 0 + s0 = c, we know x0 E S and (w0 ; s0 ) E T and we
have a starting solution for the primal-dual algorithm. Otherwise, consider the following
pair of artificial primal and dual linear programs:

Minimize c T x+nxn+l
0
subject to Ax + (b - Ax )xn+l = b (AP)

(AT w 0 + s0 - c)T X+ Xn+2 =A

where Xn+l and Xn+2 are two artificial variables and ;rr and A are sufficiently large positive
numbers to be specified later;

Maximize bT w + AWm+l
subject to AT w +(AT w 0 + s 0 - c)wm+l + s =c (AD)

(b- Ax
0
l w + Sn+l = 7r
Wm+l + Sn+2 = 0
(s; Sn+l; Sn+2) ~ 0

where Wm+J, Sn+l and Sn+2 are artificial variables.


Notice that if we choose ;rr and A such that

;rr > (b- Ax0 )T w0 (7.112a)

A > (AT w 0 + s0 - c) T x0 (7.112b)

then (x0 , x~+ 1 , x~+ 2 ) and (w0 , w~,+ 1; s 0 , s~+ 1 , s~+ 2 ) are feasible solutions to the artificial
problems (AP) and (AD), respectively, where

x~+l = 1
x~+2 = A - (AT wo +so - c) T xo

w~+l = -1
Sec. 7.3 The Primal-Dual Algorithm 189

In this case, the primal-dual algorithm can be applied to the artificial problems
(AP) and (AD) with a known starting solution. Actually, the optimal solutions of (AP)
and (DP) are closely related to those of the original problems (P) and (D). The following
theorem describes this relationship:

Theorem 7.4. Let x* and (w*; s*) be optimal solutions of the original problems
(P) and (D). In addition to (7.112a) and (7.112b), suppose that
(7.112c)
and
Tr > (b - Ax 0l w* (7.112d)
Then the following two statements are true:

(i) A feasible solution (x, Xn+l, Xn+2) of (AP) is a minimizer if and only if x solves
(P) and Xn+l = 0.
(ii) A feasible solution (w, Wm+l; s, sn+I, sn+2) of (AD) is a maximizer if and only if
(w; s) solves (D) and Wm+l = 0.

Proof. Since x* is feasible to (P), if we further define that x,~+l = 0 and x~+ 2 = ).. -
(AT w0 +s 0 -cl x*, then (x*, x~+!, x~+ 2 ) is feasible to (AP). Suppose that (x, Xn+!, Xn+2)
is feasible to (AP) with Xn+I > 0, then
cT x* + nx,:+l = w*Tb = w*T (Ax+ (b- Ax0 )xn+d
Note that AT w* + s* = c, Xn+l > 0, and (7.112d). We see that

CT X* * < (C- S*)T X + TrXn+l ::S


+ TrXn+l CT X + TrXn+l
since s*T x :;: 0. This means that (x, Xn+ 1, Xn+2) cannot be an optimal solution to (AP) un-
less Xn+l = 0. Furthermore, through the property of continuity, we know (x*, x~+l, x~+ 2 )
is an optimal solution to (AP). Therefore, if a feasible solution (x, :Xn+l, :Xn+2) of (AP)
is optimal, then :Xn+l = 0 and cTx = cT x*. Because x satisfies all the constraints of (P),
it must be an optimal solution to (P).
Conversely, if (x, 0, :Xn+2) is a feasible solution of (AP) and xis an optimal·solution
of (P), then the objective value cTx + TrXn+l coincides with the minimal value cT x* +
nx~+l· Hence it is an optimal solution of (AP). This concludes the proof of part (i).
Similarly, we can prove part (ii).

7.3.6 Practical Implementation

In the real implementation of the primal-dual algorithm, it is a very difficult task to keep
(xk; wk; sk) E S x T due to numerical problems. Also the choice of the control parameters
190 Affine Scaling Algorithms Chap. 7

greatly affects the performance of the algorithm. Much effort has been devoted to
designing a version of the primal-dual algorithm for practical implementation. In this
section, we introduce one version of the primal-dual algorithm that allows us to start
with an arbitrary point (x 0 ; w0 ; s0 ) with x 0 , s0 > 0. This version produces a sequence of
iterates {(xk; wk; sk)), with xk, sk > 0, which leads to an optimal solution, although they
no l.c:mger stay on the curve of S x T. It is important to know that, at this moment, there
i< 'iO rigorous convergence proof for this version of the primal-dual algorithm, but it is
··,;dely used in many commercial packages.

Moving direction. The basic idea of this version follows from the analysis of
Section 3.2. Assume that we are at a point (xk; wk; sk) for some fJ-k > 0, such that
xk, sk > 0. The Newton direction (d~; d~; d~) is determined by Equations (7.89a)-
(7.89c). Combining (7.89a), (7.89b), and (7.89c), we have
(7.113a)

where D~ = XkS/; 1 and Pk =I- DkAT(AD~AT)- 1 ADk, which is the projection matrix
onto the null space of matrix ADk.
If we further define that
k kA A -1
dxm = p, DkPkDkXk e,
then (7.113a) becomes
(7.113b)
The first term of (7.113b) is usually called the centering direction, since in light of
the potential push, it is nothing but the projection of the push vector (ljxk) which helps
the algorithm stay away from the walls of the primal polytope. The second term is called
the objective reduction direction, since it is the projected negative gradient of the primal
objective function which leads to a reduction in the primal objective function. The third
term is called the feasibility direction, since tk is a measure of primar feasibility. Also
note that Adkxctr = 0, and Ad~obj. = 0. Hence these two directions are in the null space of
matrix A, and the primal feasibility is solely affected by dkxfeas .
In practice, if we start with an arbitrary point (x0 ; w0 ; s0 ) with x 0 , s0 > 0, the value
of t0 might be very large, since x0 could be far from being feasible. At this point, the
main effort of the algorithm will be in finding a feasible point near the central trajectory.
Once a feasible solution is found (say, at the kth iteration) the algorithm will try to
keep tk' = 0 for all k' ::: k, except for the case that feasibility is lost due to numerical
truncation or round-off errors. In this way, d~1 '"·' will eventually vanish from the picture.
In a similar fashion one can carry out the analysis of moving directions on the dual
side, i.e, d~ and d~. It is left as an exercise for the reader.

Step-length. Once the moving direction is obtained, we are ready to move to a


new point (xk+ 1; wk+ 1; sk+ 1) with xk+ 1 > 0 and sk+ 1 > 0. To do so, we let
(7.114a)
Sec. 7.3 The Primal-Dual Algorithm 19"1

wk+l = wk +,Bod~ (7.114b)


sk+J = l +.Bod~ (7.114c)
where ,Bp and ,8 0 are the step-lengths in the primal and dual spaces, respectively. The
nonnegativity requirements of xk+ 1 and sk+ 1 dictate the choice of the step-lengths ,Bp
and ,8 0 . One simple way, as we did before, is to take
1
,Bp = (7.115a)
max {1, -dUaxf}
and
1
.Bo = (7.115b)
max {1, -dUasf}
where a < 1, (d;); is the ith component of d~, xt is the ith component of xk, (d;); is
the i th component of d~, and st is the i th component of sk.

Adjusting Penalty Parameters and Stopping Rules. Notice that the mov-
ing direction at the kth iteration is determined by the value of the penalty parameter J.L k.
Strictly speaking, the translation described above has to be carried out several times for
a fixed value of J.Lk, so that the Newton steps actually converge to the central trajectory
corresponding to that J.Lk. However, it is apparent that doing so would be an "overkill."
Recall that at optimality J.Lk has to be brought to zero to satisfy the complementary slack-
ness. Therefore, in practical implementations, the value of J.Lk is reduced from iteration
to iteration and only one Newton step is carried out for a given value of J.Lk.
The way in which J.Lk can be reduced at each iteration is suggested by the algorithm
itself. From Equation (7.76c) we see that J.L = s7 xjn. Plugging in the values of xk and
sk gives us a reasonably good measure of the penalty parameter for the current point.
According to our experience, sometimes, a lower value of J.i, say, O"[(skf xk]/n with
a < 1, could accelerate the convergence of the algorithm. There have been other similar
ideas reported by various authors on the choice of J.Lk. Nevertheless, the above simple
rule seems to work well for a variety of practical problems solved by the authors.
As far as the stopping rules are concerned, we may check the primal feasibility, dual
feasibility, and complementary slackness. Notice that the primal feasibility is measured
by tk, dual feasibility by uk, and complementary slackness by vk as defined by (7 .85).

Step-by-Step Procedure. As a summary of our discussion, we now provide


a step-by-step procedure for the implementation of the new version of the primal-dual
interior-point algorithm.

Step 1 (starting the algorithm): Set k = 0. Choose an arbitrary (x0 ; w0 ; s0 ) with


x0 > 0 and s0 > 0, and choose sufficiently small positive numbers E 1, E2 , and E3 .
Step 2 (intermediate computations): Compute
(xk) T sk
J.L k = -'---'---
n
192 Affine Scaling Algorithms Chap. 7

tk =b- Axk, uk = c- ATwk- sk, vk = p}e- xkske, pk = Xk' 1vk, and


i>~ = xksk'', where xk and sk are diagonal matrices whose diagonal entries are
xf and sf, respectively.
Step 3 (checking for optimality): If
lit! I !lull
llbll + 1 < Ez,
and
llcll + 1
then STOP. The solution is optimal. Otherwise go to the next step.
[Note: !lull and llcll are computed only when the dual constraints are violated.
If u :::: 0, then there is no need to compute this measure of optimality.]
Step 4 (calculating directions of translation): Compute

d~ = ( AD~Ar) _, (AD~ (uk- pk) + tk)


dks = uk- AT dkw
d; = i>~ (pk - d~)

Step 5 (checking for unboundedness): If

tk = 0, d; > 0, and cr d; < 0

then the primal problem (P) is unbounded. If

uk = 0, d~ > 0,

then the dual problem (D) is unbounded. If either of these cases happens, STOP.
Otherwise go to the next step.
Step 6 (finding step-lengths): Compute the primal and dual step-lengths
1
f3p= max {1 , - x, axik}
dkj

and

where a < 1 (say, 0.99).


Step 7 (moving to a new point): Update the solution vectors
xk+l + f3pd;
+- xk

wk+l +- wk + .BDd~

sk+ +- sk + f3Dd~
1

Set k +- k + 1 and go to Step 2.


Sec. 7.3 The Primal-Dual Algorithm 193

Now we present a numerical example to illustrate the algorithm.

Example 7.3
Consider the same problem as in Example 7.1 and Example 7.2. We begin with an arbitrary
assignment of x0 = [1 1 1 l]T, w0 = [0 O]T, s0 = [1 1 1 1]T. With this
information, we see that Xo, So and :06
are all equal to the identity matrix I, and p,O = 1.
We now compute

t 0 =b-Ax0 = [14 13]r, u 0 =c-ATw0 -s0 = [-3 0 -1 -1]T

v0 = J.L 0 e- XoSoe = [0 0 0 0] T, p0 = X01v0 = [0 0 0 0] T

Therefore,

d~ = + t0 J = [~:~ ~:~] [ ~~] [~:~]


1
(Afi6Arr [Afi6 (u0 - p
0
) =

rl?=u 0 -ATd~=[-9.4 -2.8 -7.4 -10.2]T

d~ = D6(p 0 - rl?) = [9.4 2.8 7.4 10.2] T

Although d~ > 0 and cT d~ < 0, we see from t 0 that the primal is still infeasible at
this moment. Hence we proceed further.
We choose a = 0.99. Using the formula to compute the step-lengths, we find that
(Jp = 1.0 and fJD = 1/10.30303 = 0.097059. Therefore the updated solution becomes

1] T + 1.0 X [9.4 2.8 7.4 10.2] T

[10.4 3.8 8.4 11.2] T

1] T + 0.0.097059 X ( -9.4 -2.8 -7.4 -10.2] T

[0.08765 0.72824 0.28176 0.00999] T

and

w 1 = (0 0) + 0.0.097059 X (6.4 9.2] T = [0.62118 0.89294] T

The new solution x 1 is already primal feasible, which is in tune with our previous
discussions. The reader is urged to carry out more iterations to see that an optimal solution
with

x* = [30 15 0 0] T, w* = [-2 -1]r, and s* = [0 0 2 1]T

is finally reached.

7.3.7 Accelerating via Power Series Method

As we discussed before, ideally it takes several Newton steps for a given penalty pa-
rameter to get onto the central trajectory, although we found that in most cases it is
adequate to carry out only one Newton step for each penalty parameter. In order to track
194 Affine Scaling Algorithms Chap. 7

the continuous central trajectories more closely, we may consider using the power-series
approximation method as we did for the dual affine scaling algorithm.
To simplify the discussion, we choose the smaller one between f3 p and f3 D as a
common step-length f3 for both the primal and dual iterations and focus on a current
point, say (x0 ; w 0 ; s0 ). In the limiting case of f3 -+ 0, (7.84) can be rewritten in the
following continuous version:

A dx(f3) = t(f3) (7.116a)


df3

AT dw(f3) + ds(f3) = u(f) (7.116b)


df3 df3

S(f3)dx(f3) +X(f3)ds(f3) =v(f3) (7.116c)


df3 df3
such that x(O) = x0 , w(O) = w0 , and s(O) = s0 , where t(f3) = b - Ax(/3), u(f3) =
c-AT w(f3) - s(f3), v(f3) = p.,e- X(f3)S(f3)e, and X(f3) and S(f3) are the diagonal
matrices whose diagonal elements are Xj ({3) and Sj ({3), respectively.
Now, what we have to do is to find a solution to the system depicted by Equa-
tion (7.116) in the form of a truncated power series. This can be carried out exactly as
we did for the dual affine scaling algorithm. The only difference is that, in addition to the
expansions of w(f3) and s(f3), we need to consider the expansions of x(f3), t(f3), u(/3),
and v(f3) around f3 = 0 as well. Owing to the similarity in procedure, the algebraic
simplifications are left for the readers as an exercise.
Based on our experience, we note the following characteristics of the primal-dual
interior point algorithm:

1. The algorithm is essentially a one-phase method.


2. The computational burden per iteration is more or less the same as that of the
primal or the dual affine scaling algorithm.
3. The improvement in convergence rate obtained by performing the power series
enhancement to the primal-dual algorithm is not as significant as we obtained in
the dual affine scaling algorithm.
4. Owing to its "self-correcting nature" (at least, in the case of restoring feasibility
that might have been lost due to numerical errors of computers), the primal-dual
algorithm is found to be numerically robust.

7.4 CONCLUDING REMARKS

In this chapter we have studied the basic concepts of affine scaling including the primal,
dual, and primal-dual algorithms. Many extensions have been made to enhance the basic
affine scaling algorithms. However, it is important to understand that the research work
in this area is still ongoing. Different barrier functions including the entropy and inverse
functions have been proposed. Unfortunately, no polynomial convergence result has
References for Further Reading 195

been achieved at this moment. A unified treatment will definitely help the development
of the interior-point methods for linear programming. The idea of using interior-point
methods to solve quadratic and convex programming problems with linear constraints
has also been explored by many researchers. We shall study these interesting topics in
later chapters.

REFERENCES FOR FURTHER READING

7.1 Adler, I., Karmarkar, N., Resende, M. G. C., and Veiga, G., "An implementation of Kar-
markar's algorithm for linear programming," Mathematical Programming 44,297-335 (1989).
7.2 Adler, I., and Resende, M.G. C., "Limiting behavior of the affine scaling continuous trajec-
tories for linear programming problems," Mathematical Programming 50, 29-51 (1991).
7.3 Barnes, E. R., "A variation of Karmarkar's algorithm for solving linear programming prob-
lems," Mathematical Programming 36, 174-182 (1986).
7.4 Cavalier, T. M., and Soyster, A. L., "Some computational experience and a modification of
the Karmarkar algorithm," presented at the 12th Symposium on Mathematical Programming,
Cambridge, MA (1985).
7.5 Dikin, I. I., "Iterative solution of problems of linear and quadratic programming" (in Rus-
sian), Doklady Akademiia Nauk USSR 174, 747-748, (English translation) Soviet Mathematics
Doklady 8, 674-675 (1967).
7.6 Frisch, K. R., "The logarithmic potential method of convex programming," Technical Report,
University Institute of Economics, Oslo, Norway (1955).
7.7 Freund, R. M., "Polynomial-time algorithms for linear programming based only on primal
affine scaling and projected gradients of a potential function," Mathematical Programming
51, 203-222 (1991).
7.8 Gill, P. E., Murray, W., Saunders, M.A., Tomlin, J. A., and Wright, M. H., "On projected bar-
rier methods for linear programming and an equivalence to Karmarkar's projective method,"
Mathematical Programming 36, 183-209 (1986).
7.9 Gonzaga, C., "An algorithm for solving linear programming problems in O(n 3 L) opera-
tions," in Progress in Mathematical Programming: Interior-Point and Related Methods, ed.
N. Megiddo, Springer-Verlag, New York, 1-28 (1989).
7.10 Gonzaga, C., "Polynomial affine algorithms for linear programming," Mathematical Pro-
gramming 49, 7-21 (1990).
7.11 Huard, P., "Resolution of mathematical programming with nonlinear constraints by the
method of centers," in Nonlinear Programming, ed. J. Abadie, North-Holland, Amsterdam,
Holland, 207-219 (1967).
7.12 Karmarkar, N., Lagarias, J. C., Slutsman, L., and Wang, P., "Power series variants of
Karmarkar-type algorithms," AT&T Technical Journal68, No. 3, 20-36 (1989).
7.13 Kojima, M., Mizuno, S., and Yoshise, A., "A primal-dual interior point method for linear pro-
gramming," in Progress in Mathematical Programming: Interior-Point and Related Methods,
ed. N. Megiddo, Springer-Verlag, New York, 29-48 (1989).
7.14 Megiddo, N ., "On the complexity of linear programming," in Advances in Economical Theory,
ed. T. Bewely, Cambridge University Press, Cambridge, 225-268 (1987).
196 Affine Scaling Algorithms Chap. 7

7.15 Megiddo, N., Progress in Mathematical Programming: Interior-Point and Related Methods,
Springer-Verlag, New York (1989).
7.16 Megiddo, N., and Shub, M., "Boundary behavior of interior point algorithms in linear pro-
gramming," Mathematics of Operations Research 14,97-146 (1989).
7.17 Monteiro, R. C., and Adler, I., "Interior path following primal-dual algorithms. Part I: Linear
programming," Mathematical Programming 44, 27-42 (1989).
7.18 Monteiro, R. C., Adler, I., and Resende, M. C., "A polynomial-time primal-dual affine scaling
algorithm for linear and convex quadratic programming and its power series extension,"
Mathematics of Operations Research 15, 191-214 (1990).
7.19 Renegar, J., "A polynomial-time algorithm based on Newton's method for linear program-
ming," Mathematical Programming 40, 59-93 (1988).
7.20 Roos, C., "A new trajectory following polynomial-time algorithm for linear programming
problem," Journal of Optimization Theory and Applications 63, 433-458 ( 1989).
7.21 Roos, C., and Vial, J.-Ph., "Long steps with the logarithmic penalty barrier function in linear
programming," in Economic Decision Making: Games, Economics, and Optimization, ed.
J. Gabszevwicz, J.-F. Richard, and L. Wolsey, Elsevier Science Publisher B.V., 433-441
(1990).
7.22 Sun, J., "A convergence proof for an affine-scaling algorithm for convex quadratic program-
ming without nondegeneracy assumptions," manuscript to appear in Mathematical Program-
ming (1993).
7.23 Tseng, P., and Luo, Z. Q., "On the convergence of affine-scaling algorithm," manuscript to
appear in Mathematical Programming 53 (1993).
7.24 Tsuchiya, T., "A study on global and local convergence of interior point algorithms for linear
programming" (in Japanese), PhD thesis, Faculty of Engineering, The University of Tokyo,
Tokyo, Japan (1991).
7.25 Vanderbei, R. J., "Karmarkar's algorithm and problems with free variables," Mathematical
Programming 43, 31-44 (1989).
7.26 Vanderbei, R. J., "ALPO: Another linear program solver," Technical Memorandum, No.
11212-900522-18TM, AT&T Bell Laboratories (1990).
7.27 Vanderbei, R. J., and Lagarias, J. C., "I. I. Dikin's convergence result for the affine-scaling
algorithm," Contemporary Mathematics 114, 109-119 (1990).
7.28 Vanderbei, R. J., Meketon, M. S., and Freedman, B. A., "A modification of Karmarkar's
linear programming algorithm," Algorithmica 1, 395-407 (1986).
7.29 Vaidya, P.M., "An algorithm for linear programming which requires O(((m + n)n 2 + (m +
n)l. 5 n)L) arithmetic operations," Mathematical Programming 47, 175-201 (1990).
7.30 Ye, Y., "An O(n 3 L) potential reduction algorithm for linear programming," Contemporary
Mathematics 114, 91-107 (1990).
7.31 Zhang, Y., Tapia, R. A., and Dennis, J. E., "On the superlinear and quadratic convergence
of primal-dual interior point linear programming algorithms," SIAM Journal on Optimization
2, 304-324 (1992).
Exercises 197

EXERCISES

7.1. You are given two algorithms, A and B. Algorithm A solves systems of linear equations;
Algorithm B solves linear programming problems.
(a) How can you use Algorithm A to solve a linear programming problem?
(b) How can you use Algorithm B to solve a system of linear equations?
(c) Combining (a) and (b), what is your conclusion? Why?
7.2. Consider the following linear programming problem:
Minimize -x, + 1
subject to x3 - x4 = 0

(a) Draw a graph of its feasible domain. Notice that (0, 0, 0.5, 0.5) is a vertex. Use the
revised simplex method to find its moving direction at this vertex and display it on the
graph.
(b) Note that (0.01, O.Ql, 0.49, 0.49) is an interior feasible solution which is "near" to the
vertex in (a). Use Karmarkar's algorithm to find its moving direction at this solution
and display it on the graph.
(c) Use the primal affine scaling algorithm to find its moving direction at
(0.01, 0.01, 0.49, 0.49) and display it on the graph.
(d) Use the primal affine scaling algorithm with logarithmic barrier function to find its
moving direction at (0.01, 0.01, 0.49, 0.49) and display it on a graph.
(e) Compare the directions obtained from (a) -(d). What kind of observations can be made?
Do you have any reason to support your observations?
7.3. Focus on the same linear programming problem as in Exercise 7.2.
(a) Find its dual problem and draw a graph of the dual feasible domain.
(b) Show that (1, -2) is an interior feasible solution to the dual linear program.
(c) Apply the dual affine scaling algorithm to find its moving direction at this point and
display it on the graph of the dual feasible domain.
(d) Is this moving direction pointing to the dual optimal solution?
(e) Apply the dual affine scaling algorithm with logarithmic barrier function to find its
moving direction at this point and display it on the graph of the dual feasible domain.
(f) Is the direction obtained in (e) better than that in (c)? Why?
7.4. Focus on the same linear programming problem again.
(a) Starting with the primal feasible solution x = (0.01, 0.01, 0.49, 0.49) and dual feasible
solution w = (1, -2), apply the primal-dual algorithm as stated in Section 7.3.6 under
"Step-by-Step Procedure" to find its moving directions dx and dw.
(b) Display the moving directions on the corresponding graphs.
(c) Can you make further observations and explain why?
7.5. Given a linear programming problem with bounded feasible domain, if the problem is both
primal and dual nondegenerate and xk is a primal feasible solution, show that
(a) AXk is of full row rank (assuming that m < n).
(b) The set C defined in (7.15) is a set of vertices of the polytope P of primal feasible
domain.
198 Affine Scaling Algorithms Chap. 7

7.6. Consider a linear programming problem with lower bounds:

Minimize c7 x

subject to Ax = b

where A is an m x n matrix with full row rank and q E R".


(a) Convert it into a standard form linear programming problem with exactly n variables.
(b) Find the dual linear program of (a). Show that when q = 0, a regular dual program is
obtained.
(c) Our objective is to design an interior-point method to solve the problem. The basic
philosophy is to map a current interior solution xk ( Axk = b and xk > q ) to the
"center" of the first 011hant of R" (i.e., e = (l, ... , ll).
(i) Find such a transformation and prove it is one-to-one and onto from the set {x E
R" I x 2: q} to the set (y E R" I y 2: 0}.
(ii) Write down the corresponding linear program in the transformed space.
(iii) In the transformed space, project the negative gradient of the objective function into
the null space of the constraints. What is the moving direction?
(iv) Derive the corresponding moving direction in the original space.
(v) Apply the primal affine scaling algorithm to the conve11ed standard linear program
of (a). Compare the moving direction with the one obtained in (iii). What is your
conclusion?
(vi) Continue the work of (iii): how do you choose an appropriate step-length to keep
feasibility?
(vii) Give the formula for updating a current interior solution.
(viii) What is your stopping rule?
(ix) How do you find an initial interior solution?
(x) Finally, state a step-by-step procedure to solve a linear programming problem with
lower bounds.
7.7. Consider the primal affine scaling with logarithmic barrier function. Define P AXk to be
the projection map onto the null space of matrix AXk. and show that the moving direction
(7.49a) at a current solution xk can be written as

7.8. In this problem, we try to outline a proof showing the primal affine scaling algorithm with
logarithmic barrier function is a polynomial-time algorithm. This proof is due to Roos and
Vial.
(a) Show that

where z (xk, ti) minimizes I :kz -ell


with the constraints A 7 y+z =candy E Rm. [Hint: Consider the first-order optimality
conditions of the minimization problem.]
Exercises 199

(b) Problem (a) indicates that the 2-norm of

Xkc )
p AXk ( ---;;; - e

can be used as a measure for the distance of a given point xk to the point xk (J-Lk) on
the central trajectory. Let us denote this distance measure by o(xk, J-Lk), i.e.,

Show that o(xk(J-Lk), J-Lk) = 0 and z(xk(J-Lk), J-Lk) = z(J-Lk).


(c) A new solution is given by xk+ 1 = xk +dt. Show that

k+l X~z (xk' 1-Lk)


x = 2xk - _:.:--'--,--.:...
I-Lk

(d) Prove that, if o(xk, J-Lk) < 1, then xk+l is an interior feasible solution to (P). Moreover,
o(xk+ 1, J-Lk) ::S o(xk, J-Lk). This implies that if we repeatedly replace xk by xk+ 1, with
fixed I-Lk, then we obtain a sequence of points which converges to x*(J-Lk) quadratically.
(e) Choose 0 < e < I and let J-Lk+l = (l - e)J-Lk. Show that

o (xk+t, I-Lk+ I) < I~ e (o(xk+l, J-Lk) + eJ/1)


(f) Let o(xk, J-Lk) ::S 1/2 and e = I/6J/1. Show that o(xk+ 1, Jl-k+J) ::S 1/2.
(g) Notice that when z(xk, J-Lk) is determined, then y(xk, J-Lk) is also determined by AT y +
z =c. Now, if o(xk, J-Lk) ::S l, show that y(xk, J-Lk) is dual feasible. Moreover,
J-Lk(n _ o(Xk, J-Lk)J/1) ::S CT Xk _ bT y(Xk, J-Lk) ::S J-Lk(n + o(Xk, J-Lk)J/1)
(h) Given an initial interior feasible solution x0 > 0 and a barrier parameter f-Lo > 0 such
that o(x0 , Jl- 0 ) .:s
1/2. Also let q be a large positive integer. We state our algorithm as
follows:
begin
0
e := I/6J/1, x := x0 , J-L := J-L ;

while ntL > e- 4 do


begin
z := z(x, J.L);
X2z
x:=2x--;
1-L
J-L := (1- e)J-L;

end
end
Let q 0 = -loge(nJ-L 0), and show that the algorithm terminates after at most 6(q-q 0 )Jn
iterations. The final points x and y(x, J-L) are interior solutions satisfying
200 Affine Scaling Algorithms Chap. 7

7.9. For the dual affine scaling algorithm, explain the meaning of "primal estimate" as defined
in (7.59).
7.10. For the primal-dual algorithm, try to decompose d~ and d: as we did for d~ in (7.113).
Then analyze different components.
7.11. We take x0 = e, w0 = 0, and s0 = e.
(a) Show that (7.112a) becomes n > 0.
(b) Show that (7.112b) becomes A. > n- cT e.
(c) What about (7.112c) and (7.112d)?
7.12. Derive the power-series expansions for x(/3), w(/3), s(/3), t(/3), u(/3), and v(/3) in the
primal-dual algorithm.
7.13. Develop computer codes for the primal affine scaling, dual affine scaling, and primal-dual
algorithms and test those problems in Exercise 3.16.
8

Insights into the


Interior-Point Methods

In Chapter 7 we have studied three polynomial-time interior-point algorithms, namely


the primal affine scaling with logarithmic barrier function, the dual affine scaling with
logarithmic barrier function, and the primal-dual algorithms. Actually they are strongly
connected and can be treated by an integrated approach. In this chapter we first show
that the moving directions of these three algorithms are merely the Newton directions
along three different algebraic paths that lead to the solution of the Karush-Kuhn-Tucker
conditions of a given linear programming problem under suitable assumptions. More-
over, the dual information embedded in the primal algorithm and the primal information
embedded in the dual algorithm can be recovered in the primal-dual algorithm but with
different scaling matrices. Based on these findings, we then introduce a general theory
of constructing new interior-point methods.

8.1 MOVING ALONG DIFFERENT ALGEBRAIC PATHS

Let us consider a linear programming problem (Program P) in its standard form:


Minimize CT X (8.la)
subject to Ax = b, x ::: 0 (8.lb)
where A is an m x n matrix. Its dual problem (Program D) is in the following form:
Maximize bT w (8.2a)

subject to AT w + s = c, s ::: 0 (8.2b)

201
202 Insights into the Interior-Point Methods Chap. 8

For any positive scalar f-L, we can incorporate a logarithmic barrier function either into
the primal program P and consider a corresponding problem (Program P11-):
n

Minimize cT x - f-L L loge Xj (8.3a)


j=t

subject to Ax == b, x>O (8.3b)


or into the dual program D and consider a corresponding problem (Program DIL):
n

Maximize bT w + f-L L loge Sj (8.4a)


j=l

subject to AT w +s = c, S>O (8.4b)


In Chapter 7, we have seen that the Karush-Kuhn-Tucker conditions of programs P11- and
D11- lead to the same system:
ATw+s-c=O (8.5a)
Ax-b=O (8.5b)
XSe- 1-Le = 0 (8.5c)
X> 0, S > 0 (8.5d)
where X and S are diagonal matrices using the components of vectors x and s as diagonal
elements, respectively.
To assure the existence of a unique optimal solution to program P11- and program D!L,
or equivalently the existence of a unique solution to the system (8.5), we assume that

(Al) There exists a primal interior feasible solution, i.e.,


S ={X ERn I Ax= b, X> 0} =/::. ¢
(A2) There exists a dual interior feasible solution, i.e.,
T ={(w; s) E Rm X Rn I AT w + s = c, s > 0} =I= ¢
(A3) The constraint matrix A has full row rank.

Notice that, under the above assumptions, as 1-L approaches 0, the unique solution to the
system of equations (8.5) solves the given linear program P and its dual problem D.
However, for 1-L > 0, we can actually approach the solution of XSe - f-Le = 0 from
different but equivalent algebraic paths. To be more specific, for Xj > 0 and Sj > 0
(j = 1, ... , n ), consider the following functions:
f(xj,sj) =f.L-XjSj (8.6a)
f1,
g(xj,sj) = - -Sj (8.6b)
Xj
1-L
h (xj, Sj) = - -Xj (8.6c)
Sj
Sec. 8.1 Moving along Different Algebraic Paths 203

Although they are different in format, the above three functions are all algebraically
equivalent to the condition (8.5c), since
2
{ (x, s) E R " If (x1 , SJ) = 0, x1 > 0, s1 > 0, for j = 1, ... , n}
211
= {(x,s) E R lg(xj,Sj) =O,x1 > O,s1 > 0, for j = 1, ... ,n}
= { (x, s) E R
2
" I h(xj, s1) = 0, Xj > 0, s1 > 0, for j = 1, ... , n}
= {(x, s) E R 2" I XSe - J.Le = 0, x > 0, s > 0}

In this way solving system (8.5) is equivalent to solving one of the following three
systems:
ATw+s-c=O (8.7a)
Ax-b=O (8.7b)
j(Xj, Sj) = 0, for j = 1, ... , n (8.7c)
X> 0, S > 0 (8.7d)

ATw+s-c=O (8.8a)
Ax- b = 0 (8.8b)
g(Xj, Sj) = 0, for j = 1, ... , n (8.8c)
X> 0, S > 0 (8.8d)

ATw+s-c=O (8.9a)
Ax- b = 0 (8.9b)
h(Xj, Sj) = 0, for j = 1, ... , n (8.9c)
X> 0, S > 0 (8.9d)
To solve any one of the above three systems, let us assume that (xk; wk; sk) E R" x
Rm x R" such that AT wk + sk = c, Axk = b, xk > 0, and sk > 0. We shall apply the
famous Newton method to solve these systems at (xk; wk; sk). Note that only functions
f, g, and h are nonlinear in these three systems. Therefore, when the Newton method
is applied, we need only linearize them for obtaining a moving direction.

8.1.1 Primal Affine Scaling with Logarithmic Barrier


Function

Let us focus on system (8.8) first. Taking one Newton step with a linearization of the
function g(x1 , s1 ) = 0, we have
k
O-g(x1k ,s1k )= [ \lg(x k ,s1k ) ] T ( Xj - Xjk )
1
s1 - s1
204 Insights into the Interior-Point Methods Chap. 8

Substituting (8.6b) for the function g and multiplying it out, we see that

Consequently, we have
sf -1 ~ hxf)'' -!) =:n c
Sj ~ ;; - ( (1)') Xj

Since the above equation holds for j = 1, ... , n, by taking matrix Xk = diag (xk), we
have
(8.10)

Moving along Newton direction, the linear equations (8.8a) and (8.8b) are preserved. By
(8.8a), s =c-AT wand (8.10) becomes
1
x = -X~(AT w + 2p.XJ: 2 xk- c)
fJ
Multiplying both sides by matrix A and applying (8.8b), we see that
1
b =Ax= -AX~(AT w + 2J-LXJ: 2 xk- c)
fJ
Consequently,

Plugging w into the formula of x, we have

Notice that AXke = Axk = b, we see the Newton direction is given by


f..xk = x- xk

= _]:_Xk[I- XkAT (AX~AT)- 1 AXk](Xkc- J-Le)


fJ
Since the above direction is exactly the same as the direction given by formula (7 .49a) at
x = xk, we can conclude that the primal affine scaling algorithm with logarithmic barrier
function actually takes the Newton direction along the algebraic path of g(x, s) = 0.

8.1.2 Dual Affine Scaling with Logarithmic Barrier Function

This time, let us focus on the system (8.9) to show that the dual affine scaling algorithm
with logarithmic barrier function actually takes the Newton direction along the algebraic
Sec. 8.1 Moving along Different Algebraic Paths 205

path of h(x, s) = 0. Note that one Newton step with the linearization of h(x1 , Sj) = 0
results in

Using formula (8.6c) for function h, we have

and

Xj ~ 1- ((:;) 2 ) Sj

Note the above equation holds for j = 1, ... , n. By taking matrix Sk = diag (sk), we
have

X= S_, s-2
2{J.,ke-fJ.,kS (8.11)

Again, moving along the Newton direction preserves the linear equations (8.9a) and
(8.9b). By (8.9b), we have

b =Ax= 2fJ.,ASk" 1e- fJ.,ASk" 2 s


= 2fJ.,ASk" 1e- fJ.,ASk" 2 (c- AT w)
However (8.9a) says that c = AT wk + sk, hence
b = 2fJ.,ASk" 1e- fJ.,ASk" 2 AT wk- fJ.,ASk" 2 sk + fJ.,ASk" 2 AT w
Therefore, we finally obtain the Newton direction
/::;.Wk=W-Wk

1
= -(ASk" 2 AT)- 1b- (ASk"2 AT)- 1ASk" 1e
fJ.,
Comparing this direction to (7.73), we see that the dual affine scaling algorithm with loga-
rithmic barrier function takes the Newton direction along the algebraic path of h(x, s) = 0.

8.1.3 The Primal-Dual Algorithm

Finally, we work on the system (8.7) to derive the moving directions of the primal-
dual algorithm. Simply by taking one Newton step with a linearization of the function
f(x1 , s1) = 0, we have
k
O-f(x1k ,s1k )= [ 'Vf(x1k ,s1k )] T ( Xj - Xj )
k
s1 - s1
206 Insights into the Interior-Point Methods Chap. 8

Using formula (8.6a) for function f, we have


k

xjk sjk - fJ.,


k
= - (sj, xjk) (Xj- xj)
k
Sj- sj
Note the above equation holds for j = 1, ... , n, hence

(8.12a)

Moreover, moving along the Newton direction assures that

(8.12b)
and
(8.12c)

Note that (8.12a), (8.12b), and (8.12c) form a system of linear equations with unknown
variables fixb fiwb and fisk. Using (8.12b) and (8.12c) to eliminate fixk and fisk in
(8.12a), we obtain

fiwk = (AXkSZ 1AT)- 1ASZ 1(XkSke- fJ.,e)


= -(AXkszl AT)-1 Aszlvk(fJ.,). (8.13a)

where vk(fJ.,) = fJ.,e- xkske.


Plugging fiwk into (8.12c), we have

fisk= -AT fiwk

=AT (AXkSZ 1AT)- 1ASZ 1vk(fJ.,). (8.13b)

After fisk is known, fixk immediately follows from (8.12a) as

SZ XkAT (AXkSZ 1AT)-I ASZ 1](XkSke- fJ.,e)


1 1
fixk = -[SZ -

= [SZ 1 - SZ 1XkAT (AXkSZ 1AT)- 1ASZ 1]vk(fJ.,)

= Sz 1[vk(fJ.,)- Xkfisk] (8.13c)

Comparing (8.13) to formula (7.90), we clearly see that the primal~dual algorithm takes
the Newton direction along the algebraic path of f(x, s) = 0.
Now, combining the results obtained in the previous three subsections yields the following theorem:

Theorem 8.1. The moving directions of the primal affine scaling algorithm with logarithmic barrier function, the dual affine scaling algorithm with logarithmic barrier function, and the primal-dual algorithm are the Newton directions along three different and yet equivalent algebraic paths that lead to the solution of the Karush-Kuhn-Tucker conditions (8.5).
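Since the primal-dual direction (8.13) will reappear throughout this chapter, a quick numerical check is useful. The following sketch (Python with NumPy; the randomly generated test data are our own illustrative choice, not from the text) builds a strictly feasible primal-dual pair, evaluates the closed-form direction (8.13a)-(8.13c), and confirms that it solves the linearized system (8.12a)-(8.12c).

```python
# Numerical check of the primal-dual Newton direction (8.13).
import numpy as np

rng = np.random.default_rng(0)
m, n, mu = 3, 7, 0.5

A = rng.standard_normal((m, n))          # full row rank (almost surely)
x = rng.uniform(0.5, 2.0, n)             # interior primal point, x > 0
b = A @ x                                # so Ax = b holds exactly
w = rng.standard_normal(m)
s = rng.uniform(0.5, 2.0, n)             # dual slacks, s > 0
c = A.T @ w + s                          # so A^T w + s = c holds exactly

v = mu * np.ones(n) - x * s              # v_k(mu) = mu e - X_k S_k e
M = A @ np.diag(x / s) @ A.T             # A X_k S_k^{-1} A^T
dw = -np.linalg.solve(M, A @ (v / s))    # (8.13a)
ds = -A.T @ dw                           # (8.13b)
dx = (v - x * ds) / s                    # (8.13c)

print(np.allclose(s * dx + x * ds, v))   # (8.12a) holds
print(np.allclose(A @ dx, 0))            # (8.12b) holds
print(np.allclose(A.T @ dw + ds, 0))     # (8.12c) holds
```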

8.2 MISSING INFORMATION

In Chapter 7, the primal approach and dual approach were treated separately as if they
were independent problems. However, Theorem 8.1 indicates that the moving directions
of both the primal affine scaling and dual affine scaling with logarithmic barrier function
are closely related to that of the primal-dual method. Hence we shall further exploit the
dual information in the primal approach and the primal information in the dual approach.

8.2.1 Dual Information in the Primal Approach

We first study the dual information in the primal affine scaling algorithm. From (8.10), we have

$$s = 2\mu X_k^{-2} x_k - \mu X_k^{-2} x = 2\mu X_k^{-2} x_k - \mu X_k^{-2} (x_k + \Delta x_k)$$
$$= \mu X_k^{-2} x_k - \mu X_k^{-2} X_k \left[ I - X_k A^T (A X_k^2 A^T)^{-1} A X_k \right] \left( -\frac{1}{\mu} X_k c + e \right)$$
$$= \mu X_k^{-2} x_k - \mu X_k^{-1} \left( -\frac{1}{\mu} X_k c + e \right) + \mu A^T (A X_k^2 A^T)^{-1} A X_k \left( -\frac{1}{\mu} X_k c + e \right)$$
$$= c - A^T (A X_k^2 A^T)^{-1} A X_k (X_k c - \mu e)$$

Since we are moving along the Newton direction, both the primal and dual feasibility conditions are preserved. Hence we can define

$$w = (A X_k^2 A^T)^{-1} A X_k (X_k c - \mu e)$$

In this way, we find the dual information

$$\Delta w_k = w - w_k = (A X_k^2 A^T)^{-1} A X_k (X_k c - \mu e) - w_k$$
$$= (A X_k^2 A^T)^{-1} A X_k^2 \left( c - A^T w_k - \mu X_k^{-1} e \right)$$
$$= (A X_k^2 A^T)^{-1} A X_k (X_k S_k e - \mu e)$$
$$= -(A X_k^2 A^T)^{-1} A X_k v_k(\mu) \tag{8.14}$$

Comparing (8.14) with (8.13a), we see that the dual moving direction embedded in the primal affine scaling algorithm with logarithmic barrier function has exactly the same form as that of the primal-dual algorithm but with a different scaling matrix, which, of course, depends on the primal information $X_k$ only.

8.2.2 Primal Information in the Dual Approach

Similar to what we did in the last subsection, we can derive the embedded primal information of the dual affine scaling. Starting from Equation (8.11), we have

$$x = 2\mu S_k^{-1} e - \mu S_k^{-2} s = 2\mu S_k^{-1} e - \mu S_k^{-2} \left[ s_k - \frac{1}{\mu} A^T (A S_k^{-2} A^T)^{-1} (b - \mu A S_k^{-1} e) \right]$$
$$= \mu S_k^{-1} \left[ e + S_k^{-1} A^T (A S_k^{-2} A^T)^{-1} \left( \frac{1}{\mu} A X_k e - A S_k^{-1} e \right) \right]$$
$$= \mu S_k^{-1} \left[ e + S_k^{-1} A^T (A S_k^{-2} A^T)^{-1} A S_k^{-1} \left( \frac{1}{\mu} S_k X_k e - e \right) \right]$$
$$= \mu S_k^{-1} \left[ I - S_k^{-1} A^T (A S_k^{-2} A^T)^{-1} A S_k^{-1} \right] \left( -\frac{1}{\mu} S_k X_k e + e \right) + x_k$$

Hence we know

$$\Delta x_k = -\left[ S_k^{-1} - S_k^{-2} A^T (A S_k^{-2} A^T)^{-1} A S_k^{-1} \right] (X_k S_k e - \mu e)$$
$$= \left[ S_k^{-1} - S_k^{-2} A^T (A S_k^{-2} A^T)^{-1} A S_k^{-1} \right] v_k(\mu) \tag{8.15}$$

Comparing (8.15) to (8.13c), we see that the primal moving direction embedded in the dual affine scaling algorithm with logarithmic barrier function has exactly the same form as that of the primal-dual algorithm but with a different scaling matrix.
The results we found in the above two subsections can be summarized in the
following theorem:

Theorem 8.2. The form of either the dual moving direction embedded in the
primal affine scaling or the primal moving direction embedded in the dual affine scaling
can be found in the primal-dual algorithm but with different scaling matrices.
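The relation stated in Theorem 8.2 can also be seen numerically. Continuing the NumPy sketch above (an illustration on the same random data), the following lines place the primal-dual direction (8.13a) next to the embedded dual direction (8.14); the two expressions share the same algebraic form and differ only in the diagonal scaling matrix.

```python
# (8.13a) versus (8.14): same form, different diagonal scaling.
v = mu * np.ones(n) - x * s
dw_pd = -np.linalg.solve(A @ np.diag(x / s) @ A.T, A @ (v / s))   # (8.13a)
dw_p  = -np.linalg.solve(A @ np.diag(x * x) @ A.T, A @ (x * v))   # (8.14)
# dw_p replaces the scaling X_k S_k^{-1} of the primal-dual method by X_k^2.
```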

8.3 EXTENSIONS OF ALGEBRAIC PATHS

The concept of "moving along the Newton direction on different algebraic paths" not
only provides us a unified view to examine the primal affine scaling with logarithmic
barrier function, dual affine scaling with logarithmic barrier function, and primal-dual
algorithms but also serves as a platform to study new interior-point algorithms. At least
in theory there are infinitely many algebraic paths that could lead to the solution of
the given Karush-Kuhn-Tucker conditions, and each path may generate a new moving
direction associated with a potential interior-point algorithm. If a suitable step-length can
be decided at each iteration for a convergence proof, then a new interior-point algorithm
is introduced for further studies. An example of moving along a new path is given below.
Consider the function

$$r(x_j, s_j) = \log_e \frac{x_j s_j}{\mu}$$

defined on $x_j > 0$, $s_j > 0$, $j = 1, 2, \ldots, n$, and $\mu > 0$. In this way, solving system (8.5) is equivalent to solving the following system:

$$A^T w + s - c = 0 \tag{8.16a}$$
$$Ax - b = 0 \tag{8.16b}$$
$$r(x_j, s_j) = 0, \quad \text{for } j = 1, \ldots, n \tag{8.16c}$$
$$x > 0, \quad s > 0 \tag{8.16d}$$

We consider the moving direction at a given point $(x_k; w_k; s_k)$ such that $Ax_k = b$, $A^T w_k + s_k = c$, $x_k > 0$, and $s_k > 0$. One Newton step at this point with a linearization of the function $r(x_j, s_j) = 0$ yields

$$-\log_e \frac{x_j^k s_j^k}{\mu} = \left( \frac{1}{x_j^k},\; \frac{1}{s_j^k} \right) \begin{pmatrix} x_j - x_j^k \\ s_j - s_j^k \end{pmatrix}$$

Since the above expression holds for $j = 1, 2, \ldots, n$, its vector form becomes

$$X_k^{-1} \Delta x_k + S_k^{-1} \Delta s_k = -\theta(\mu) \tag{8.17}$$

where

$$\theta(\mu) = \left( \log_e \frac{x_1^k s_1^k}{\mu},\; \log_e \frac{x_2^k s_2^k}{\mu},\; \ldots,\; \log_e \frac{x_n^k s_n^k}{\mu} \right)^T$$

Moreover, moving along the Newton direction preserves the linear equations; hence we have $A \Delta x_k = 0$ and $A^T \Delta w_k + \Delta s_k = 0$. These two equations together with (8.17) form a system of linear equations in terms of $\Delta x_k$, $\Delta w_k$, and $\Delta s_k$. The solution of this system becomes

$$\Delta x_k = -\left[ X_k - X_k S_k^{-1} A^T (A X_k S_k^{-1} A^T)^{-1} A X_k \right] \theta(\mu) \tag{8.18a}$$
$$\Delta w_k = (A X_k S_k^{-1} A^T)^{-1} A X_k \theta(\mu) \tag{8.18b}$$
$$\Delta s_k = -A^T (A X_k S_k^{-1} A^T)^{-1} A X_k \theta(\mu) \tag{8.18c}$$

Comparing (8.18) with (8.13), we see that the moving directions along this new path are different from the previous results. Which algebraic path leads to computational superiority remains an unanswered theoretical question.
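The new direction (8.18) is as easy to evaluate as the old one. The sketch below reuses the randomly generated strictly feasible pair of the earlier sketches (A, x, s, mu) and checks all three defining equations of the direction.

```python
# Moving direction (8.18) along the path r(x_j, s_j) = 0.
theta = np.log(x * s / mu)                         # theta(mu), componentwise

M = A @ np.diag(x / s) @ A.T                       # A X_k S_k^{-1} A^T
dw = np.linalg.solve(M, A @ (x * theta))           # (8.18b)
ds = -A.T @ dw                                     # (8.18c)
dx = -(x * theta) + (x / s) * (A.T @ dw)           # (8.18a)

print(np.allclose(dx / x + ds / s, -theta))        # (8.17) holds
print(np.allclose(A @ dx, 0))                      # primal feasibility preserved
print(np.allclose(A.T @ dw + ds, 0))               # dual feasibility preserved
```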

8.4 GEOMETRIC INTERPRETATION OF THE MOVING DIRECTIONS

Different geometric viewpoints have been proposed to interpret the moving directions of each individual affine scaling algorithm. Our objective in this section is to provide a geometric view which interprets the moving directions of the primal affine scaling with logarithmic barrier, the dual affine scaling with logarithmic barrier, and the primal-dual algorithms in a unified way. Later on, we show that, for each of the above three algorithms, an associated minimization problem can be defined such that the solution of the associated problem becomes the moving direction of the corresponding affine scaling algorithm.

In order to achieve this goal, the concept of the null space of a matrix needs to be strengthened through the following two lemmas:

Lemma 8.1. Assume that $m \le n$ and $A$ is an $(m \times n)$-dimensional matrix with full row rank. If $U$ is an $[n \times (n-m)]$-dimensional matrix of full rank such that $AU = 0$, then

$$U (U^T U)^{-1} U^T x = \left[ I - A^T (A A^T)^{-1} A \right] x \quad \text{for each } x \in R^n.$$

Proof. For each $x \in R^n$, since matrix $A$ has full row rank, $x$ can be decomposed as

$$x = A^T u^1 + U u^2$$

where $u^1 \in R^m$ and $u^2 \in R^{n-m}$. Hence $Ax = A A^T u^1 + A U u^2 = A A^T u^1$ and, consequently, $u^1 = (A A^T)^{-1} A x$. Similarly, we see $U^T x = U^T A^T u^1 + U^T U u^2 = U^T U u^2$ and, consequently, $u^2 = (U^T U)^{-1} U^T x$. In other words, we have

$$x = A^T (A A^T)^{-1} A x + U (U^T U)^{-1} U^T x \quad \text{for } x \in R^n$$

and hence

$$U (U^T U)^{-1} U^T x = \left[ I - A^T (A A^T)^{-1} A \right] x \quad \text{for } x \in R^n$$

Notice that, if we define an operator $P = U (U^T U)^{-1} U^T = \left[ I - A^T (A A^T)^{-1} A \right]$, then $P^2 = P$ and $AP = 0$. Also note that, since matrix $A$ is assumed to be of full row rank, the null space of $A$ is an $(n-m)$-dimensional subspace of $R^n$. This subspace is, of course, isomorphic to $R^{n-m}$, and matrix $U$ in Lemma 8.1 actually serves as an isomorphism between the null space of $A$ and the Euclidean space $R^{n-m}$. Furthermore, we can prove the following result:

Lemma 8.2. Let $A$ and $U$ be defined as in Lemma 8.1 and $Q$ be an $(n \times n)$-dimensional matrix which is symmetric and positive definite. Then, we have

$$U (U^T Q^{-2} U)^{-1} U^T = Q \left[ I - Q A^T (A Q^2 A^T)^{-1} A Q \right] Q$$

Proof. Since $Q$ is positive definite, $Q^{-1}$ exists. If we define $\tilde A = AQ$ and $\tilde U = Q^{-1} U$, then $\tilde A$ is an $m \times n$ matrix with full row rank and $\tilde U$ is an $n \times (n-m)$ matrix of full rank. Moreover, $\tilde A \tilde U = AU = 0$. The result follows from Lemma 8.1.
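Both lemmas are easy to confirm numerically. The sketch below is our own illustration on random data; the null-space basis comes from SciPy's null_space routine, an implementation convenience not used in the text.

```python
# Numerical check of Lemmas 8.1 and 8.2.
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(1)
m, n = 3, 7
A = rng.standard_normal((m, n))          # full row rank almost surely
U = null_space(A)                        # n x (n - m) basis with AU = 0

P = U @ np.linalg.inv(U.T @ U) @ U.T
P2 = np.eye(n) - A.T @ np.linalg.solve(A @ A.T, A)
print(np.allclose(P, P2))                # Lemma 8.1
print(np.allclose(P @ P, P))             # P is a projection
print(np.allclose(A @ P, 0))             # AP = 0

d = rng.uniform(0.5, 2.0, n)
Q = np.diag(d)                           # symmetric positive definite
L1 = U @ np.linalg.inv(U.T @ np.diag(1 / d**2) @ U) @ U.T
L2 = Q @ (np.eye(n)
          - Q @ A.T @ np.linalg.solve(A @ Q @ Q @ A.T, A @ Q)) @ Q
print(np.allclose(L1, L2))               # Lemma 8.2
```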

With these two lemmas, we can start developing a unified geometric interpretation
of the moving directions in different affine scaling algorithms.

8.4.1 Primal Affine Scaling with Logarithmic Barrier Function

For the primal affine scaling algorithm with logarithmic barrier function, consider Program $P_\mu$ of (8.3). For a positive $\mu$, we define

$$p(x) = c^T x - \mu \sum_{j=1}^n \log_e x_j$$

Then $p(x)$ is a convex and continuously differentiable function defined over the constraint set (8.3b). In particular, for a given interior feasible solution $x_k$, we have a first-order approximation

$$p(x) \approx p(x_k) + [\nabla p(x_k)]^T (x - x_k)$$

where $\nabla p(x_k) = c - \mu X_k^{-1} e$. Finding the steepest descent direction at $x_k$ is equivalent to minimizing $[\nabla p(x_k)]^T (x - x_k)$. Thus we consider a subproblem $P_s$ of $P_\mu$:

Minimize $[\nabla p(x_k)]^T (x - x_k)$
subject to $A(x - x_k) = 0$
$\| Q^{-1}(x - x_k) \|^2 \le \beta^2$

where $Q^{-1}$ is the inverse matrix of an $(n \times n)$-dimensional symmetric positive definite matrix $Q$ and $\beta < 1$ is a well-chosen scalar such that the surface of the ellipsoid $\{ x \in R^n \mid \| Q^{-1}(x - x_k) \|^2 = \beta^2 \}$ becomes inscribed in the feasible domain of program $P_\mu$. In this case, the principal axes of the ellipsoid are the eigenvectors of $Q^{-1}$. In particular, if we choose $Q = X_k$, then $Q^{-1} = X_k^{-1}$ and problem $P_s$ can be treated in a null-space version. To be more specific, by noting that $(x - x_k)$ is in the null space of matrix $A$, we can find a vector $h \in R^{n-m}$ and use the isomorphism $U$ (between $R^{n-m}$ and the null space of $A$) to replace $(x - x_k)$ by $Uh$ in problem $P_s$. Consequently, problem $P_s$ becomes

Minimize $[\nabla p(x_k)]^T U h$
subject to $\| X_k^{-1} U h \|^2 \le \beta^2$

Note that the above problem is solvable by considering its Lagrangian:

$$L_1(h, \lambda) = [\nabla p(x_k)]^T U h + \lambda \left( \| X_k^{-1} U h \|^2 - \beta^2 \right)$$

where $\lambda \ge 0$ is a Lagrangian multiplier. Taking the partial derivative of $L_1$ with respect to $h$ and setting it to zero at the optimum $h_k$, we have

$$U^T \nabla p(x_k) + 2\lambda \left( U^T X_k^{-2} U \right) h_k = 0$$

Because matrix $U$ has full rank and $X_k$ is a diagonal matrix, $U^T X_k^{-2} U$ is a nonsingular square matrix. Consequently,

$$h_k = -\frac{1}{2\lambda} \left( U^T X_k^{-2} U \right)^{-1} U^T \nabla p(x_k)$$

Remember that $U$ is an isomorphism between $R^{n-m}$ and the null space of matrix $A$ in $R^n$. We transform $h_k$ back to the null space of matrix $A$ by

$$\Delta x_k = U h_k = -\frac{1}{2\lambda} U \left( U^T X_k^{-2} U \right)^{-1} U^T \nabla p(x_k) \tag{8.19}$$

Noting that $\nabla p(x_k) = c - \mu X_k^{-1} e$, we apply Lemma 8.2 to (8.19) with $Q = X_k$. In this way, we see that

$$\Delta x_k = -\frac{1}{2\lambda} X_k \left[ I - X_k A^T (A X_k^2 A^T)^{-1} A X_k \right] X_k \left( c - \mu X_k^{-1} e \right)$$
$$= -\frac{1}{2\lambda} X_k \left[ I - X_k A^T (A X_k^2 A^T)^{-1} A X_k \right] (X_k c - \mu e) \tag{8.20}$$

Comparing (8.20) with (7.49a) and noting that $1/2\lambda$ is a positive scalar which does not alter the direction of a vector, we can conclude that the moving direction of the primal affine scaling with logarithmic barrier function algorithm is provided by the solution of the subprogram $P_s$. This also provides a geometric interpretation of the abovementioned moving direction.

8.4.2 Dual Affine Scaling with Logarithmic Barrier Function

With the same idea, we now consider the dual case. This time, we define

$$q(w, s) = b^T w + \mu \sum_{j=1}^n \log_e s_j$$

and assume that $(w_k, s_k)$ is a solution to program $D_\mu$ of (8.4). In this way, $[\nabla q(w_k, s_k)]^T = (b^T, \mu e^T S_k^{-1})$. Since the $w$-variables are unrestricted, we only have to construct an ellipsoid in the $s$-space and consider the following subproblem $D_s$ of program $D_\mu$:

Maximize $(b^T, \mu e^T S_k^{-1}) \begin{pmatrix} w - w_k \\ s - s_k \end{pmatrix}$
subject to $[A^T \mid I] \begin{pmatrix} w - w_k \\ s - s_k \end{pmatrix} = 0$
$\| Q^{-1}(s - s_k) \|^2 \le \beta^2$

where $Q^{-1}$ is the inverse matrix of an $(n \times n)$-dimensional symmetric positive definite matrix $Q$ and $\beta < 1$ is a well-chosen scalar such that the nonnegativity constraints of program $D_\mu$ are replaced by the inscribing ellipsoid $\{ s \in R^n \mid \| Q^{-1}(s - s_k) \|^2 \le \beta^2 \}$. In particular, $Q^{-1}$ can be chosen as $S_k^{-1}$ for the consideration of a null-space version of program $D_s$.

To be more specific, we let

$$\bar U = \begin{bmatrix} I \\ -A^T \end{bmatrix}$$

Then $[A^T \mid I]\,\bar U = 0$ and $\bar U$ can be considered as an isomorphism between $R^m$ and the null space of $[A^T \mid I]$, i.e.,

$$\begin{pmatrix} w - w_k \\ s - s_k \end{pmatrix} = \bar U v \quad \text{for some } v \in R^m$$

In other words, we have $\Delta w_k = v$, $\Delta s_k = -A^T v$. Moreover, the subproblem $D_s$ becomes

Maximize $(b^T, \mu e^T S_k^{-1}) \bar U v$
subject to $\| -S_k^{-1} A^T v \|^2 \le \beta^2$
$v \in R^m$

To solve this problem, we consider its Lagrangian

$$L_2(v, \lambda) = b^T v - \mu e^T S_k^{-1} A^T v - \lambda \left( \| S_k^{-1} A^T v \|^2 - \beta^2 \right) \tag{8.21}$$

where $\lambda \ge 0$ is a Lagrangian multiplier. By setting its partial derivative with respect to $v$ to zero at the optimal solution $v_k$ and applying Lemma 8.2, we eventually have

$$\Delta w_k = v_k = \frac{1}{2\lambda} (A S_k^{-2} A^T)^{-1} \left( b - \mu A S_k^{-1} e \right) \tag{8.22a}$$

and

$$\Delta s_k = -A^T v_k = -\frac{1}{2\lambda} A^T (A S_k^{-2} A^T)^{-1} \left( b - \mu A S_k^{-1} e \right) \tag{8.22b}$$

Note that $1/2\lambda$ is only a positive scalar. By comparing (8.22) to (7.73), we conclude that the moving direction of the dual affine scaling with logarithmic barrier function algorithm is provided by the solution of the subprogram $D_s$. This is consistent with the geometric interpretation we derived for the primal case.

8.4.3 The Primal-Dual Algorithm

In order to interpret the moving directions of the primal-dual interior-point algorithm in the same context, we need to construct a primal-dual optimization problem $(PD)_\mu$ such that its subproblem $(PD)_s$ produces the desired directions. Note that the barrier function method requires the parameter $\mu$ to decrease to zero. Therefore, without loss of generality, we may assume that $\mu < 1$ in this subsection. In this case, if $x$ is a feasible solution to problem $P_\mu$ and $(w, s)$ is a feasible solution to problem $D_\mu$, then

$$c^T x - \mu \sum_{j=1}^n \log_e x_j - \left( b^T w + \mu \sum_{j=1}^n \log_e s_j \right) = x^T s - \mu \sum_{j=1}^n \log_e (x_j s_j) \ge \mu \sum_{j=1}^n \left[ x_j s_j - \log_e (x_j s_j) \right] \ge 0$$

The desired primal-dual optimization problem can be defined as a problem which minimizes the gap between problems $P_\mu$ and $D_\mu$ subject to the primal and dual interior feasibility conditions, i.e., problem $(PD)_\mu$ has the following form:

Minimize $c^T x - \mu \sum_{j=1}^n \log_e x_j - \left( b^T w + \mu \sum_{j=1}^n \log_e s_j \right)$   (8.23a)
subject to $Ax = b$, $x > 0$   (8.23b)
$A^T w + s = c$, $s > 0$   (8.23c)

If we define

$$\bar A = \begin{bmatrix} A & 0 & 0 \\ 0 & A^T & I \end{bmatrix}$$

and use $p(x)$ and $q(w, s)$ to represent the objective functions of $P_\mu$ and $D_\mu$, respectively, then problem $(PD)_\mu$ is simplified as follows:

Minimize $r(x; w; s) = p(x) - q(w, s)$
subject to $\bar A \begin{pmatrix} x \\ w \\ s \end{pmatrix} = \begin{pmatrix} b \\ c \end{pmatrix}$
$x > 0$, $s > 0$

Suppose that $(x_k; w_k; s_k)$ is a feasible solution of $(PD)_\mu$. The steepest descent direction suggests that we consider the following subproblem $(PD)_s$:

Minimize $[\nabla r(x_k; w_k; s_k)]^T \begin{pmatrix} x - x_k \\ w - w_k \\ s - s_k \end{pmatrix}$
subject to $\bar A \begin{pmatrix} x - x_k \\ w - w_k \\ s - s_k \end{pmatrix} = 0$
$\| Q_1^{-1}(x - x_k) \|^2 \le \beta_1^2$
$\| Q_2^{-1}(s - s_k) \|^2 \le \beta_2^2$

where $Q_1$ and $Q_2$ are symmetric positive definite matrices and $\beta_1, \beta_2 < 1$ are well-chosen scalars such that the corresponding ellipsoids are inscribed in the feasible domain of $(PD)_\mu$. In particular, we can choose $Q_1 = X_k^{1/2} S_k^{-1/2}$ for the primal variables and $Q_2 = X_k^{-1/2} S_k^{1/2}$ for the dual slacks and consider a null-space version of problem $(PD)_s$.

To do so, let us start with a matrix $U$ satisfying $AU = 0$. By defining

$$\bar U = \begin{bmatrix} U & 0 \\ 0 & I \\ 0 & -A^T \end{bmatrix}$$

we see $\bar A \bar U = 0$ and $\bar U$ serves as an isomorphism between $R^n$ and the null space of $\bar A$. More explicitly, we have

$$\begin{pmatrix} x - x_k \\ w - w_k \\ s - s_k \end{pmatrix} = \bar U \begin{pmatrix} u^1 \\ u^2 \end{pmatrix} \quad \text{where } u^1 \in R^{n-m},\; u^2 \in R^m$$

Therefore, $\Delta x_k = U u^1$, $\Delta w_k = u^2$, and $\Delta s_k = -A^T u^2$. Consequently, problem $(PD)_s$ becomes a null-space version:

Minimize $[\nabla r(x_k; w_k; s_k)]^T \begin{pmatrix} U u^1 \\ u^2 \\ -A^T u^2 \end{pmatrix}$
subject to $\| X_k^{-1/2} S_k^{1/2} U u^1 \|^2 \le \beta_1^2$
$\| -X_k^{1/2} S_k^{-1/2} A^T u^2 \|^2 \le \beta_2^2$
$u^1 \in R^{n-m}$, $u^2 \in R^m$

The Lagrangian of the above problem is given by

$$L_3(u^1, u^2, \lambda_1, \lambda_2) = [\nabla p(x_k)]^T U u^1 - b^T u^2 + \mu e^T S_k^{-1} A^T u^2 + \lambda_1 \left( \| X_k^{-1/2} S_k^{1/2} U u^1 \|^2 - \beta_1^2 \right) + \lambda_2 \left( \| X_k^{1/2} S_k^{-1/2} A^T u^2 \|^2 - \beta_2^2 \right)$$

where $\lambda_1$ and $\lambda_2$ are nonnegative Lagrange multipliers.

Recall that $r(x; w; s) = p(x) - q(w, s)$. We now solve the subproblem $(PD)_s$ by setting

$$\frac{\partial L_3}{\partial u^1} = 0 \quad \text{and} \quad \frac{\partial L_3}{\partial u^2} = 0$$

In this way, we have

$$u^1 = -\frac{1}{2\lambda_1} \left( U^T X_k^{-1} S_k U \right)^{-1} U^T \left( c - \mu X_k^{-1} e \right) \tag{8.24a}$$

and

$$u^2 = \frac{1}{2\lambda_2} \left( A X_k S_k^{-1} A^T \right)^{-1} \left( b - \mu A S_k^{-1} e \right) \tag{8.24b}$$

Transforming back to the original space, we see that

$$\Delta x_k = U u^1 = -\frac{1}{2\lambda_1} U \left( U^T X_k^{-1} S_k U \right)^{-1} U^T \left( c - \mu X_k^{-1} e \right)$$

Applying Lemma 8.2 results in

$$\Delta x_k = -\frac{1}{2\lambda_1} X_k^{1/2} S_k^{-1/2} \left[ I - X_k^{1/2} S_k^{-1/2} A^T \left( A X_k S_k^{-1} A^T \right)^{-1} A X_k^{1/2} S_k^{-1/2} \right] X_k^{1/2} S_k^{-1/2} \left( c - \mu X_k^{-1} e \right)$$
$$= -\frac{1}{2\lambda_1} \left[ X_k S_k^{-1} - X_k S_k^{-1} A^T \left( A X_k S_k^{-1} A^T \right)^{-1} A X_k S_k^{-1} \right] \left( c - A^T w_k - \mu X_k^{-1} e \right) \tag{8.25a}$$

Similarly, we have

$$\Delta w_k = u^2 = \frac{1}{2\lambda_2} \left( A X_k S_k^{-1} A^T \right)^{-1} \left( b - \mu A S_k^{-1} e \right) = \frac{1}{2\lambda_2} \left( A X_k S_k^{-1} A^T \right)^{-1} \left( A X_k e - \mu A S_k^{-1} e \right)$$
$$= \frac{1}{2\lambda_2} \left( A X_k S_k^{-1} A^T \right)^{-1} A S_k^{-1} \left( X_k S_k e - \mu e \right) \tag{8.25b}$$

Moreover,

$$\Delta s_k = -A^T u^2 = -\frac{1}{2\lambda_2} A^T \left( A X_k S_k^{-1} A^T \right)^{-1} A S_k^{-1} \left( X_k S_k e - \mu e \right) \tag{8.25c}$$

Noting that both $\lambda_1$ and $\lambda_2$ are nonnegative, and comparing (8.25) to (7.90), we can confirm that the moving directions of the primal-dual algorithm are given by the solution of the subproblem $(PD)_s$.

8.5 GENERAL THEORY

The geometric interpretation of the moving directions of the affine scaling algorithms
suggests that we study two crucial factors. First, we need a symmetric positive definite
scaling matrix to open an appropriate ellipsoid in the null space of the constraint matrix for
consideration. Second, we need an appropriate objective function such that its first-order
approximation is optimized. In the previous section, we have incorporated logarithmic
barrier functions into the original objective and applied diagonal matrices for scaling.
Here we want to further extend this approach to study more general results.

8.5.1 General Primal Affine Scaling

In this subsection, we focus on the primal program $P$ defined by (8.1). Instead of choosing the logarithmic barrier function, for $\mu > 0$, let us use a general concave barrier function $\phi(x)$ which is well defined and differentiable on the relative interior of the primal feasible domain and consider the following problem $(P_\phi)_\mu$:

Minimize $c^T x - \mu \phi(x)$   (8.26a)
subject to $Ax = b$   (8.26b)
$x > 0$   (8.26c)

Under the interior-point assumption (A1) on problem $(P)$, let $x_k$ be a feasible solution to problem $(P_\phi)_\mu$ and $\nabla \phi$ be the gradient of $\phi$. We also let $Q^{-1}$ be an arbitrary symmetric positive definite matrix and $\beta < 1$ be a positive scalar such that the ellipsoid $\{ x \in R^n \mid \| Q^{-1}(x - x_k) \|^2 \le \beta^2 \}$ becomes inscribed in the feasible domain of problem $(P_\phi)_\mu$. Our focus is to find a good moving direction vector $\Delta x_k = x - x_k$ from the ellipsoid such that $x$ is still feasible, i.e., $A \Delta x_k = 0$, and the objective value $c^T x - \mu \phi(x)$ is minimized.

Taking the first-order approximation of the objective function at the current interior solution $x_k$, we have

$$c^T x - \mu \phi(x) \approx c^T x_k - \mu \phi(x_k) + \left[ c - \mu \nabla \phi(x_k) \right]^T \Delta x_k$$

Therefore, we focus on the following subproblem $(P_\phi)_s$:

Minimize $[c - \mu \nabla \phi(x_k)]^T \Delta x_k$   (8.27a)
subject to $A \Delta x_k = 0$   (8.27b)
$\| Q^{-1} \Delta x_k \|^2 \le \beta^2$   (8.27c)

In order to solve (8.27), we make use of the isomorphism $U$ between $R^{n-m}$ and the null space of matrix $A$ such that $\Delta x_k$ is replaced by $Uh$ with $h \in R^{n-m}$, which eliminates the constraint $A \Delta x_k = 0$ in a null-space version of the problem. In this way, we have an equivalent problem:

Minimize $[c - \mu \nabla \phi(x_k)]^T U h$   (8.28a)
subject to $\| Q^{-1} U h \|^2 \le \beta^2$   (8.28b)
$h \in R^{n-m}$   (8.28c)

The Lagrangian of problem (8.28) is given by

$$L(h, \lambda) = \left[ c - \mu \nabla \phi(x_k) \right]^T U h + \lambda \left( \| Q^{-1} U h \|^2 - \beta^2 \right)$$

where $\lambda \ge 0$ is the Lagrange multiplier associated with the inequality constraint. Setting $\partial L / \partial h = 0$ and solving for $h$ results in a solution

$$h_k = -\frac{1}{2\lambda} \left( U^T Q^{-2} U \right)^{-1} U^T \left( c - \mu \nabla \phi(x_k) \right)$$

Consequently, from Lemma 8.2, a moving direction

$$\Delta x_k = -Q \left[ I - Q A^T (A Q^2 A^T)^{-1} A Q \right] Q \left( c - \mu \nabla \phi(x_k) \right) \tag{8.29}$$

is generated for the general primal affine scaling algorithm. Also note that, when $\phi(x)$ is strictly concave and twice differentiable, the Hessian matrix $H$ of $-\phi(x)$ becomes symmetric positive definite. Actually, up to the positive factor $\mu$, $H$ is the Hessian of the objective function $c^T x - \mu \phi(x)$ of program $(P_\phi)_\mu$. If we choose $H^{1/2}$ to be the scaling matrix $Q^{-1}$ (or equivalently, $H = Q^{-2}$), then

$$\Delta x_k = -H^{-1/2} \left[ I - H^{-1/2} A^T (A H^{-1} A^T)^{-1} A H^{-1/2} \right] H^{-1/2} \left( c - \mu \nabla \phi(x_k) \right) \tag{8.30}$$

is the projected Newton direction with respect to the general barrier function.

Note that the classic inverse function can be used as a barrier function, i.e.,

$$\phi(x) = -\frac{1}{r} \sum_{j=1}^n \frac{1}{x_j^r} \quad \text{for } r > 0$$

In this case,

$$\nabla \phi(x) = X^{-r-1} e \tag{8.31a}$$

and

$$H = (r+1) X^{-r-2} \tag{8.31b}$$

The Karush-Kuhn-Tucker conditions become

$$A^T w + s = c, \quad s > 0 \tag{8.32a}$$
$$Ax = b, \quad x > 0 \tag{8.32b}$$
$$X^{r+1} S e = \mu e \tag{8.32c}$$

Comparing (8.32) with (8.5), as $r \to 0$, we see the two systems are closely related. Plugging (8.31a) and (8.31b) into formula (8.30), D. den Hertog, C. Roos, and T. Terlaky designed their inverse barrier method by moving along the projected Newton direction

$$\Delta x_k = -H^{-1/2} \left[ I - H^{-1/2} A^T (A H^{-1} A^T)^{-1} A H^{-1/2} \right] H^{-1/2} \left( c - \mu X_k^{-r-1} e \right)$$

with a proper step-length such that the algorithm terminates after at most a polynomial number of iterations to reach an $\epsilon$-optimal solution, under the assumptions of having an interior feasible solution and a bounded primal feasible domain. As $r \to 0$, the inverse barrier function algorithm approaches the logarithmic barrier function algorithm with the same complexity bound. Moreover, R. Sheu and S.-C. Fang used the general direction (8.29) to construct a generic path-following algorithm for linear programming and imposed some sufficient conditions on a general barrier function to achieve polynomial-time performance.
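As an illustration of (8.29)-(8.31), the following sketch evaluates the general primal direction under the inverse barrier. It continues with the NumPy setup of the earlier sketches (A, x, c assumed given with $Ax = b$, $x > 0$); the function name and the choice $r = 1$ are our own illustrative assumptions.

```python
# General primal affine scaling direction with the inverse barrier.
def general_primal_direction(A, x, c, mu, r=1.0):
    grad_phi = x ** (-r - 1)                 # (8.31a), componentwise
    Hinv = x ** (r + 2) / (r + 1)            # diagonal of H^{-1} from (8.31b)
    g = c - mu * grad_phi                    # gradient of c^T x - mu phi(x)
    # (8.29) with Q^2 = H^{-1} reduces to -H^{-1}(g - A^T w):
    AH = A * Hinv                            # A H^{-1}, columns scaled
    w = np.linalg.solve(AH @ A.T, AH @ g)
    return -Hinv * (g - A.T @ w)

dx = general_primal_direction(A, x, c, mu)
print(np.allclose(A @ dx, 0))                # the step stays in the null space of A
```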

8.5.2 General Dual Affine Scaling

In this subsection, we shift our focus to the dual program $D$ defined by (8.2). As in the general primal affine scaling, we replace the logarithmic barrier function, for $\mu > 0$, by a general concave barrier function $\psi(s)$ which is well defined and differentiable on the relative interior of the dual feasible domain. Now consider the following problem $(D_\psi)_\mu$:

Maximize $b^T w + \mu \psi(s)$   (8.33a)
subject to $A^T w + s = c$   (8.33b)
$s > 0$   (8.33c)

Under the interior-point assumption (A2) on problem $(D)$, let $(w_k; s_k)$ be a feasible solution to problem $(D_\psi)_\mu$ and $\nabla \psi$ be the gradient of $\psi$. Again we let $Q^{-1}$ be an arbitrary symmetric positive definite matrix and $\beta < 1$ be a positive scalar such that the ellipsoid $\{ s \in R^n \mid \| Q^{-1}(s - s_k) \|^2 \le \beta^2 \}$ becomes inscribed in the feasible domain of problem $(D_\psi)_\mu$. In order to find good moving direction vectors $\Delta w_k = w - w_k$ and $\Delta s_k = s - s_k$, we focus on the following subproblem $(D_\psi)_s$:

Maximize $\left[ b^T \mid (\mu \nabla \psi(s_k))^T \right] \begin{pmatrix} \Delta w_k \\ \Delta s_k \end{pmatrix}$   (8.34a)
subject to $[A^T \mid I] \begin{pmatrix} \Delta w_k \\ \Delta s_k \end{pmatrix} = 0$   (8.34b)
$\| Q^{-1} \Delta s_k \|^2 \le \beta^2$   (8.34c)

Remember the isomorphism

$$\bar U = \begin{bmatrix} I \\ -A^T \end{bmatrix}$$

between $R^m$ and the null space of matrix $\bar A^T = [A^T \mid I_n]$, such that $\Delta w_k = v$ and $\Delta s_k = -A^T v$, for $v \in R^m$. A null-space version of problem $(D_\psi)_s$ becomes

Maximize $\left[ b^T \mid (\mu \nabla \psi(s_k))^T \right] \bar U v$   (8.35a)
subject to $\| -Q^{-1} A^T v \|^2 \le \beta^2$   (8.35b)
$v \in R^m$   (8.35c)

The Lagrangian of problem (8.35) is given by

$$L(v, \lambda) = b^T v - \left[ \mu \nabla \psi(s_k) \right]^T A^T v - \lambda \left( \| Q^{-1} A^T v \|^2 - \beta^2 \right)$$

where $\lambda \ge 0$ is the Lagrange multiplier associated with the inequality constraint (8.35b). Setting $\partial L / \partial v = 0$ and solving for $v$, we have

$$v_k = \frac{1}{2\lambda} \left[ (A Q^{-2} A^T)^{-1} b - (A Q^{-2} A^T)^{-1} A \left( \mu \nabla \psi(s_k) \right) \right]$$

Consequently, we have

$$\Delta w_k = v_k = \frac{1}{2\lambda} \left[ (A Q^{-2} A^T)^{-1} b - (A Q^{-2} A^T)^{-1} A \left( \mu \nabla \psi(s_k) \right) \right] \tag{8.36a}$$

and

$$\Delta s_k = -A^T v_k \tag{8.36b}$$

for the general dual affine scaling algorithm. Also note that, when $\psi(s)$ is strictly concave and twice differentiable, the Hessian matrix $H$ of $-\psi(s)$ becomes symmetric positive definite. If we choose $H^{1/2}$ to be the scaling matrix $Q^{-1}$ (or equivalently, $H = Q^{-2}$), then the corresponding formulas for $\Delta w_k$ and $\Delta s_k$ can be derived. When the classic inverse function is taken to be the barrier function for the dual approach, i.e.,

$$\psi(s) = -\frac{1}{r} \sum_{j=1}^n \frac{1}{s_j^r} \quad \text{for } r > 0$$

a corresponding dual algorithm can be further developed.

8.6 CONCLUDING REMARKS

In this chapter we have provided an algebraic view as well as a geometric interpretation to gain more insights into the primal affine scaling, dual affine scaling, and primal-dual algorithms. From the algebraic point of view, at least in theory, we may have infinitely many algebraic paths that lead to the solutions of the Karush-Kuhn-Tucker conditions. Moving along the Newton direction of each such path with appropriate step-lengths may result in a new algorithm for further analysis.

The geometric interpretation relies on the special structure of a corresponding subproblem. Basically, it takes an appropriate scaling matrix and a scalar to open an inscribed ellipsoid in the feasible domain such that the inequality constraints can be replaced. Then we consider the projected (negative) gradient of the objective function in the null space of the constraint matrix as a potential moving direction. The shape of the inscribed ellipsoid is determined by the scaling matrix, and the projected gradient depends on the barrier function applied.

Based on the geometric view, a general scheme which generates the moving directions of the generalized primal affine scaling and dual affine scaling has been included. As to the generalization of the primal-dual algorithm, the difficulty lies in finding a pair of primal barrier function $\phi(x)$ and dual barrier function $\psi(s)$ such that both programs $(P_\phi)_\mu$ and $(D_\psi)_\mu$ have a common system of Karush-Kuhn-Tucker conditions. If this can be done, the generalization follows immediately. But so far, except by using the logarithmic barrier function for both the primal and the dual, no other successful case has been reported.

REFERENCES FOR FURTHER READING

8.1 Gonzaga, C., "Search directions for interior linear programming methods," Algorithmica 6, 153-181 (1991).
8.2 den Hertog, D., Roos, C., and Terlaky, T., "Inverse barrier methods for linear programming," Report of the Faculty of Technical Mathematics and Informatics, No. 90-27, Delft University of Technology (1990).
8.3 Sheu, R. L., and Fang, S. C., "Insights into the interior-point methods," OR Report No. 252, North Carolina State University, Raleigh, NC (1990); Zeitschrift für Operations Research 36, 200-230 (1992).
8.4 Sheu, R. L., and Fang, S. C., "On the generalized path-following methods for linear programming," OR Report No. 261, North Carolina State University, Raleigh, NC (1992).
8.5 Ye, Y., "An extension of Karmarkar's algorithm and the trust region method for quadratic programming," in Progress in Mathematical Programming: Interior-Point and Related Methods, ed. N. Megiddo, Springer-Verlag, New York, 49-64 (1989).
8.6 Zimmermann, U., "Search directions for a class of projective methods," Zeitschrift für Operations Research 34, 353-379 (1990).

EXERCISES

8.1. Show that (8.18) is indeed a solution to the system (8.17) together with $A \Delta x_k = 0$ and $A^T \Delta w_k + \Delta s_k = 0$.

8.2. If we define $P = U (U^T U)^{-1} U^T = [I - A^T (A A^T)^{-1} A]$, show that $P^2 = P$ and $AP = 0$.

8.3. Show that $v_k$ in (8.22a) is indeed an optimal solution to the null-space version of program $D_s$.

8.4. From (8.19), we know that

$$\Delta x_k = \frac{1}{2\lambda} U (U^T X_k^{-2} U)^{-1} U^T \left( -\nabla p(x_k) \right) = \frac{1}{2\lambda} U (U^T X_k^{-2} U)^{-1} U^T \left( -c + \mu X_k^{-1} e \right)$$

Show that $U (U^T X_k^{-2} U)^{-1} U^T$ is a projection mapping. Hence the moving direction in the primal affine scaling with logarithmic barrier function can be viewed as the negative gradient of $p(x)$ projected into the null space of matrix $A$.

8.5. From (8.22), first try to show that

$$\begin{pmatrix} \Delta w_k \\ \Delta s_k \end{pmatrix} = \frac{1}{2\lambda} \bar U (A S_k^{-2} A^T)^{-1} \bar U^T \nabla q(w_k, s_k) = \frac{1}{2\lambda} \bar U (A S_k^{-2} A^T)^{-1} \bar U^T \begin{pmatrix} b \\ \mu S_k^{-1} e \end{pmatrix}$$

Then show that $\bar U (A S_k^{-2} A^T)^{-1} \bar U^T$ is not a projection mapping. Hence the moving direction in the dual affine scaling with logarithmic barrier function cannot be viewed as the negative gradient of $q(w; s)$ projected into the null space. The reason is mostly due to the unrestricted variables $w$. This phenomenon will not happen for the symmetric dual problem, which requires both $w$ and $s$ to be nonnegative.

8.6. Derive (8.24a) and (8.24b) from

$$\frac{\partial L_3}{\partial u^1} = 0 \quad \text{and} \quad \frac{\partial L_3}{\partial u^2} = 0$$

8.7. In order to derive a geometric interpretation of the moving directions (8.18) associated with the algebraic path $r(x_j, s_j) = \log_e (x_j s_j / \mu) = 0$, we define

$$t(x; w; s) = -x^T s + \sum_{j=1}^n x_j s_j \log_e \frac{x_j s_j}{\mu}$$

Now consider the following subproblem:

$$\left\{ \text{minimize } \left[ \nabla t(x_k; w_k; s_k) \right]^T \begin{pmatrix} x - x_k \\ w - w_k \\ s - s_k \end{pmatrix} \;\middle|\; A \Delta x_k = 0, \; A^T \Delta w_k + \Delta s_k = 0 \right\}$$

By choosing $X_k^{1/2} S_k^{-1/2}$ and $X_k^{-1/2} S_k^{1/2}$ as the scaling matrices for $x$ and $s$, respectively, show that the solution of this subproblem provides the moving directions (8.18).

8.8. Replace the objective function of (8.3a) by

$$c^T x + \frac{\mu}{r} \sum_{j=1}^n \frac{1}{x_j^r}$$

and verify that its Karush-Kuhn-Tucker conditions are given by (8.32).

8.9. Taking the inverse barrier function

$$\psi(s) = -\frac{1}{r} \sum_{j=1}^n \frac{1}{s_j^r} \quad \text{for } r > 0$$

and using $H^{1/2}$ as the scaling matrix $Q^{-1}$, derive the corresponding dual moving directions $\Delta w_k$ and $\Delta s_k$.

8.10. Consider using the entropy function

$$\phi(x) = -\sum_{j=1}^n x_j \log_e x_j$$

as the barrier function.

(a) By plugging the entropic barrier function into the objective function of (8.26a), derive the corresponding Karush-Kuhn-Tucker conditions.
(b) Find the gradient and Hessian of the entropic barrier function at a given solution $x_k$.
(c) Take $Q^{-1} = H^{1/2}$ as the scaling matrix to derive the corresponding formula of the primal moving direction $\Delta x_k$.

8.11. Take

$$\psi(s) = -\sum_{j=1}^n s_j \log_e s_j$$

as the barrier function.

(a) Derive the K-K-T conditions for problem (8.33).
(b) Compare your results with 8.10(a) to see if they represent the same system.
9

Affine Scaling for Convex Quadratic Programming
The linearly constrained convex quadratic programming problems and linear programming problems are closely related by having the same structure of their feasible domains. In the past, linear programming algorithms have been naturally extended to solve quadratic programming problems. For example, in 1959 P. Wolfe extended the simplex method for quadratic programming. Just as in solving linear programming problems, in the worst-case analysis, the simplex-based algorithm could take an exponential number of iterations to reach optimality. Another example is that the ellipsoid method proposed by M. K. Kozlov, S. P. Tarasov, and L. G. Khachian in 1979 led to the first notable polynomial-time algorithm for solving quadratic programming problems. Similar to the case of linear programming, despite the theoretic significance, the related implementation issues made this approach much less attractive. Therefore, it is easy to understand that after N. Karmarkar proposed his first interior-point method to solve linear programming problems in 1984, many researchers have devoted their efforts to developing interior-point methods for quadratic programming.

In this chapter, we look into extending the affine scaling approach to solving quadratic programming problems. Since the theoretic complexity analysis in this case is somewhat parallel to that of the linear programming case, we merely focus on introducing practical implementations and leave out detailed complexity analysis. In particular, we concentrate on developing the primal affine scaling and primal-dual algorithms for quadratic programming first and then extend the results to solving general convex programming problems with linear constraints.

9.1 CONVEX QUADRATIC PROGRAM WITH LINEAR CONSTRAINTS

9.1.1 Primal Quadratic Program

Let $Q$ be an $n \times n$ matrix, and $c$ and $x$ be $n$-dimensional column vectors. A quadratic function is defined on $R^n$ by

$$z(x) = \frac{1}{2} x^T Q x + c^T x$$

Notice that $\nabla_x z(x) = Qx + c$ and $\nabla_x^2 z(x) = Q$. Hence we know that $z(x)$ becomes a convex quadratic function when $Q$ is positive semidefinite. Moreover, if $Q$ is positive definite, then $z(x)$ becomes a strictly convex function which attains its unique minimum at $\bar x = -Q^{-1} c$, obtained by setting $\nabla_x z(x) = 0$. The minimum value of $z(x)$ is equal to $-\frac{1}{2} c^T Q^{-1} c$ at $\bar x$.

Given that $Q$ is positive semidefinite, $A$ is an $m \times n$ matrix, and $b \in R^m$, a linearly constrained convex quadratic programming problem (or primal quadratic program, in short) is defined as follows:

Minimize $z(x) = \frac{1}{2} x^T Q x + c^T x$   (9.1a)
subject to $Ax = b$ and $x \ge 0$   (9.1b)

Assume that matrix $A$ has full row rank and $b$ can be represented as a nonnegative linear combination of the column vectors of $A$. Then we know the feasible domain is nonempty and there is no redundant constraint. Note that although the feasible domain of problem (9.1) has the same structure as that of a standard-form linear programming problem, the contour surface of the quadratic objective function is different from that of a linear program. To be more specific, given a constant $k$, the solutions of $z(x) = k$ may form an ellipsoid whose principal axes are inversely proportional to the square roots of the eigenvalues of $Q$. Because of the ellipsoidal contours of the objective function, unlike linear programming problems, a quadratic program may achieve its optimum at any feasible solution, no matter whether it is in the interior or on the boundary. Figures 9.1 and 9.2 illustrate the two possibilities. In the first case, the minimizer $\bar x$ of the unconstrained convex quadratic function $z(x)$ is feasible to the linear constraints (9.1b). Hence the unconstrained minimum is equal to the constrained minimum. In the second case, $\bar x$ is no longer feasible to (9.1b), hence the constrained minimum is no less than the unconstrained minimum. Although the first case is much simpler to handle, most real-life problems fall in the second category. Hence we put emphasis on the second case.

9.1.2 Dual Quadratic Program

The concept of duality also applies to quadratic programs. When $Q$ is positive definite, corresponding to the primal problem (9.1), we have a dual Lagrangian problem:

Maximize $-\frac{1}{2} v^T Q v + b^T w$   (9.2a)
subject to $-Qv + A^T w + s = c$, $s \ge 0$   (9.2b)

Figure 9.1 (constant cost surfaces)
Figure 9.2

Under the previously mentioned assumptions and taking $v = x$, we have the following Karush-Kuhn-Tucker conditions for convex quadratic programming problems:

$$Ax = b, \quad x \ge 0 \tag{9.3a}$$
$$-Qx + A^T w + s = c, \quad s \ge 0 \tag{9.3b}$$
$$XSe = 0 \tag{9.3c}$$

where $X$ and $S$ are diagonal matrices using the elements of $x$ and $s$ as diagonal elements, respectively.

In order to ensure that a solution to the primal can be recovered from a solution of the dual, it is necessary for $Q$ to satisfy two additional conditions, namely, $Q$ must be nonsingular and $A Q^{-1} A^T$ must be positive definite. In fact, if $Q$ is assumed to be positive definite, instead of positive semidefinite, then $Q^{-1}$ exists and $A Q^{-1} A^T$ is positive definite when $A$ has full row rank. For simplicity, we assume that $Q$ in (9.1a) is symmetric and positive definite in this chapter.

9.2 AFFINE SCALING FOR QUADRATIC PROGRAMS

As mentioned earlier, right after Karmarkar's work with linear programming, many researchers tried and were able to extend Karmarkar's projective scaling algorithm for solving quadratic programs. For example, S. Kapoor and P. M. Vaidya developed a projective scaling algorithm which requires $O(n^{3.67} (\log(n + m + 2)) (\log L) L)$ arithmetic operations. Similarly, Y. Ye and E. Tse proposed an algorithm with $O(n^4 L^3)$ complexity. However, owing to the complicated nature of the projective transformation, although these algorithms have polynomial-time bounds, they lack the computational support of effective implementation. Our objective is to introduce some practical implementations based on the affine scaling approach, in particular, the primal affine scaling and primal-dual algorithms.

9.2.1 Primal Affine Scaling for Quadratic Programming

With some caution in handling the subtle differences between linear programming and
quadratic programming problems, the primal affine scaling algorithm developed for linear
programs can be extended to solve quadratic programming problems. In accordance with
the philosophy that we have been following throughout the book, we focus on three key
issues for developing an iterative algorithm, namely, (1) obtaining a starting feasible
solution, (2) synthesizing a direction of translation with appropriate step-length for an
improved solution, and (3) finding stopping criteria to terminate the iterative process.

Obtaining a starting solution. Since the feasible domain defined by (9.1b)


for a quadratic programming problem is identical to that of a standard form linear pro-
gramming problem, the Phase-1 method developed for linear programs in Chapter 7 can
be applied to obtain a starting primal interior feasible solution for quadratic programming
problems. The details can be found in Chapter 7.

Synthesizing a direction of translation. In Chapter 7, we have shown that


the primal affine scaling algorithm for linear programs actually moves along the direction
of the projected negative gradient of a linear objective function. Unfortunately, moving
along the direction of the negative gradient of a quadratic objective function may become
a poor direction of translation from a given interior point. As a matter of fact, this is
a common problem with methods based on projected gradients of nonlinear objective
functions. Figure 9.3 illustrates this situation.
As shown in the figure, a convex quadratic function (9.la) has elliptic contours.
When the ratio between the largest and smallest eigenvalues of the defining matrix Q be-
comes larger, the elliptic contours become "narrower." Under this situation, minimizing
228 Affine Scaling for Convex Quadratic Programming Chap. 9

Optimal solution
Directions of translation/

Figure 9.3

a convex quadratic function (even an unconstrained one) turns out to be ill-conditioned and, instead of pointing straight to the minimum, the projected negative gradient directions may lead to a very inefficient zig-zag path.

However, this undesirable situation will not happen when the defining matrix $Q$ has identical eigenvalues. For example, if $Q = kI$ for a constant $k$ and identity matrix $I$, then the convex quadratic function (9.1a) has spheric contours. In this case, the projected negative gradient direction may point directly toward the optimum, as shown in Figure 9.4.

Figure 9.4

Transforming from Elliptic to Spheric Contours. It is easy to understand that Figure 9.4 represents a much more desirable case than Figure 9.3. We now show that a convex quadratic function with elliptic contours can be transformed into another convex quadratic function with spheric contours in the transformed space.

Consider the convex quadratic function $z(x)$ defined by (9.1a). For a symmetric positive semidefinite matrix $Q$, we can always find a lower triangular matrix $L$ such that

$$Q = L L^T \tag{9.4}$$

where $L^T$ is the transpose matrix of $L$. Because $L$ can be found by the standard Cholesky factorization process, we usually call it a Cholesky factor of matrix $Q$. Notice that when $Q$ is positive definite, both $L$ and $L^T$ are nonsingular.

Now consider a transformation from $R^n$ to $R^n$ such that

$$x' = L^T x \tag{9.5a}$$

Moreover, we define

$$c' = L^{-1} c \tag{9.5b}$$

Then (9.1a) becomes

$$z(x') = \frac{1}{2} x'^T x' + c'^T x'$$

Note that the transformed convex quadratic function has spheric contours because of the identical eigenvalues in its defining matrix $I$. Working in the transformed space, we expect to avoid the undesirable "zig-zagging" problem. Also note that

$$z(x') + \frac{1}{2} c'^T c' = \frac{1}{2} (x' + c')^T (x' + c')$$

Since $\frac{1}{2} c'^T c'$ is only a constant, minimizing $z(x')$ is equivalent to minimizing $z'(x') = \frac{1}{2}(x' + c')^T (x' + c')$.

Application to quadratic programming. For a quadratic programming problem with symmetric positive definite $Q$, we shall apply the transformation (9.5a) to it first, then follow the primal affine scaling idea to generate a direction of translation in the affinely scaled space by projecting the negative gradient of the transformed quadratic function onto the null space of the scaled constraints. The actual moving direction is finally obtained by transforming the direction of translation back to the original space. In light of the previous discussion, the transformed quadratic program becomes:

Minimize $z' = \frac{1}{2} (x' + c')^T (x' + c')$   (9.6a)
subject to $\begin{bmatrix} A & 0 \\ I & -(L^T)^{-1} \end{bmatrix} \begin{bmatrix} x \\ x' \end{bmatrix} = \begin{bmatrix} b \\ 0 \end{bmatrix}$, $x \ge 0$   (9.6b)

Notice that the new quadratic objective function has spheric contours in the transformed space, the original constraint $Ax = b$ remains, the transformation $x' = L^T x$ is incorporated into the constraints, and $x'$ is unrestricted although $x$ is required to be nonnegative.

This transformed problem certainly doubles the number of variables in the original problem. But from later development, we can see this causes no special problem, because once $x$ is known, $x' = L^T x$ is automatically defined.

As in Chapter 7, let us assume that a primal interior feasible solution $x_k > 0$ is known. By taking its elements as diagonal elements, we form a diagonal scaling matrix $X_k$. Remember that the basic idea of affine scaling is to scale the interior solution $x_k$ to be at $e = (1, 1, \ldots, 1)^T$. Therefore, in the scaled space we have scaled variables

$$y = X_k^{-1} x \tag{9.7a}$$

As to the unrestricted variables $x'$, they need not be scaled. Hence a scaling matrix is defined by

$$\bar X_k = \begin{bmatrix} X_k & 0 \\ 0 & I \end{bmatrix} \tag{9.7b}$$

In this way, we have the following correspondence between the original variables and the scaled variables:

$$\begin{pmatrix} x \\ x' \end{pmatrix} = \bar X_k \begin{pmatrix} y \\ x' \end{pmatrix} \tag{9.7c}$$

Moreover, if we define

$$U_k = \begin{bmatrix} A & 0 \\ I & -(L^T)^{-1} \end{bmatrix} \bar X_k = \begin{bmatrix} A X_k & 0 \\ X_k & -(L^T)^{-1} \end{bmatrix} \tag{9.8a}$$

then the constraints (9.6b) in the scaled space become

$$U_k \begin{pmatrix} y \\ x' \end{pmatrix} = \begin{pmatrix} b \\ 0 \end{pmatrix}, \quad y \ge 0 \tag{9.8b}$$

Moreover, the quadratic objective function $z'$ remains the same in the scaled space, since only the unrestricted variables $x'$ are involved. The gradient of this objective function at a given point $(y; x')$ in the scaled space is given by

$$\nabla z' = \begin{bmatrix} 0 \\ x' + c' \end{bmatrix} \tag{9.9a}$$

In particular, at $\begin{pmatrix} y \\ x' \end{pmatrix} = \begin{pmatrix} e \\ L^T x_k \end{pmatrix}$, the gradient vector becomes

$$\nabla z'_k = \begin{bmatrix} 0 \\ x'_k + c' \end{bmatrix} \tag{9.9b}$$

The affine scaling approach suggests that we project the gradient vector $\nabla z'_k$ onto the null space of the constraint matrix $U_k$ and take its negative as a direction of translation in the scaled space. In other words, if we denote the projection matrix by $P_k = [I - U_k^T (U_k U_k^T)^{-1} U_k]$, then the direction of translation in the scaled space is given by

$$\bar d^k = \begin{bmatrix} \bar d_y^k \\ \bar d_{x'}^k \end{bmatrix} = -P_k \nabla z'_k \tag{9.10}$$

Our objective is to find an explicit formula for $\bar d_y^k$ and transform it back to the original space as the actual moving direction $d_x^k$. To achieve our goal, following the development in Chapter 7, we define the so-called dual estimates as follows:

$$w^k = \begin{bmatrix} w_1^k \\ w_2^k \end{bmatrix} = (U_k U_k^T)^{-1} U_k \nabla z'_k \tag{9.11}$$

In this way, $w^k$ is a solution of the equation

$$(U_k U_k^T) w^k = U_k \nabla z'_k \tag{9.12a}$$

and

$$\bar d^k = -\nabla z'_k + U_k^T w^k \tag{9.12b}$$

Owing to the fact that

$$U_k U_k^T = \begin{bmatrix} A X_k^2 A^T & A X_k^2 \\ X_k^2 A^T & X_k^2 + Q^{-1} \end{bmatrix} \tag{9.13}$$

it follows immediately from (9.8a), (9.9b), and (9.12a) that

$$A X_k^2 A^T w_1^k + A X_k^2 w_2^k = 0 \tag{9.14a}$$

and

$$X_k^2 A^T w_1^k + (X_k^2 + Q^{-1}) w_2^k = -(L^T)^{-1} (x'_k + c') \tag{9.14b}$$

The second equation implies that

$$w_2^k = -(X_k^2 + Q^{-1})^{-1} \left[ X_k^2 A^T w_1^k + (L^T)^{-1} (x'_k + c') \right] \tag{9.15}$$

Substituting it in the first one and rearranging terms, we have

$$A X_k \left[ I - X_k (X_k^2 + Q^{-1})^{-1} X_k \right] X_k A^T w_1^k = A X_k^2 (X_k^2 + Q^{-1})^{-1} (L^T)^{-1} (x'_k + c') \tag{9.16a}$$

Now, observing that

$$I - X_k (X_k^2 + Q^{-1})^{-1} X_k = I - (I + X_k^{-1} Q^{-1} X_k^{-1})^{-1}$$
$$= (I + X_k^{-1} Q^{-1} X_k^{-1} - I)(I + X_k^{-1} Q^{-1} X_k^{-1})^{-1}$$
$$= X_k^{-1} Q^{-1} X_k^{-1} (I + X_k^{-1} Q^{-1} X_k^{-1})^{-1}$$
$$= X_k^{-1} Q^{-1} (X_k^2 + Q^{-1})^{-1} X_k$$
$$= X_k^{-1} (X_k^2 Q + I)^{-1} X_k$$
$$= X_k^{-1} (Q + X_k^{-2})^{-1} X_k^{-1}$$

and

$$X_k^2 (X_k^2 + Q^{-1})^{-1} = (I + Q^{-1} X_k^{-2})^{-1} = (Q + X_k^{-2})^{-1} Q$$

Equation (9.16a) reduces to

$$A (Q + X_k^{-2})^{-1} A^T w_1^k = A (Q + X_k^{-2})^{-1} Q (L^T)^{-1} (x'_k + c') \tag{9.16b}$$

Noting that $Q (L^T)^{-1} (x'_k + c') = Q x_k + c$ and denoting

$$H_k = (Q + X_k^{-2})^{-1} \tag{9.17}$$

we see that

$$w_1^k = (A H_k A^T)^{-1} A H_k (Q x_k + c) \tag{9.18a}$$

Following (9.15), we further have

$$w_2^k = -(X_k^2 + Q^{-1})^{-1} \left[ X_k^2 A^T w_1^k + (L^T)^{-1} (x'_k + c') \right] = -(X_k^2 + Q^{-1})^{-1} X_k^2 A^T w_1^k - (X_k^2 + Q^{-1})^{-1} (x_k + Q^{-1} c) \tag{9.18b}$$

However, from (9.8a), (9.9b), and (9.12b) we know that

$$\bar d_y^k = X_k (A^T w_1^k + w_2^k) \tag{9.19}$$

Plugging (9.18a) and (9.18b) into (9.19) and carefully rearranging terms, we have a simple result:

$$A^T w_1^k + w_2^k = A^T w_1^k - (X_k^2 + Q^{-1})^{-1} X_k^2 A^T w_1^k - (X_k^2 + Q^{-1})^{-1} (x_k + Q^{-1} c)$$
$$= \left[ I - (X_k^2 + Q^{-1})^{-1} X_k^2 \right] A^T w_1^k - (X_k^2 + Q^{-1})^{-1} (x_k + Q^{-1} c)$$
$$= X_k^{-2} Q^{-1} (I + X_k^{-2} Q^{-1})^{-1} A^T w_1^k - (X_k^2 + Q^{-1})^{-1} (x_k + Q^{-1} c)$$

(since $[I - (X_k^2 + Q^{-1})^{-1} X_k^2] = (X_k^{-2} Q^{-1}) [I + X_k^{-2} Q^{-1}]^{-1}$)

$$= X_k^{-2} (Q + X_k^{-2})^{-1} A^T w_1^k - X_k^{-2} (I + Q^{-1} X_k^{-2})^{-1} (x_k + Q^{-1} c)$$
$$= X_k^{-2} (Q + X_k^{-2})^{-1} A^T w_1^k - X_k^{-2} (Q + X_k^{-2})^{-1} Q (x_k + Q^{-1} c)$$
$$= X_k^{-2} \left[ H_k A^T w_1^k - H_k (Q x_k + c) \right]$$
$$= X_k^{-2} \left( H_k A^T \left[ (A H_k A^T)^{-1} A H_k (Q x_k + c) \right] - H_k (Q x_k + c) \right)$$
$$= -X_k^{-2} \left[ I - H_k A^T (A H_k A^T)^{-1} A \right] H_k (Q x_k + c) \tag{9.20}$$

Therefore, we finally obtain a direction of translation in the scaled space:

$$\bar d_y^k = -X_k^{-1} \left[ I - H_k A^T (A H_k A^T)^{-1} A \right] H_k (Q x_k + c) \tag{9.21}$$

Mapping it back to the original $x$-space by premultiplying the scaling matrix $X_k$, we obtain

$$d_x^k = -\left[ I - H_k A^T (A H_k A^T)^{-1} A \right] H_k (Q x_k + c), \quad \text{where } H_k = (Q + X_k^{-2})^{-1} \tag{9.22}$$

The primal affine scaling algorithm takes $d_x^k$ as its moving direction and iterates from the known solution $x_k$ to a new interior solution $x_{k+1} = x_k + \alpha_k d_x^k$, for an appropriate step-length $\alpha_k$.

An interesting observation can be made here. For (9.22), we may define

$$\bar P_k = I - H_k A^T (A H_k A^T)^{-1} A$$

such that $d_x^k = -\bar P_k H_k (Q x_k + c)$. Compared to the linear programming case, this projection matrix depends not only upon the affine scaling matrix $X_k$, but also upon the second-derivative information $Q$ of the objective function. Therefore, the above direction of translation has a combined effect of pure affine scaling and Newton's method. To be more specific, when $x_k$ is sufficiently away from the positivity walls of the first orthant, $H_k$ is dominated by $Q^{-1}$ and $d_x^k$ behaves like a Newton direction. On the other hand, as $x_k$ gets close to the positivity walls, $H_k$ is dominated by $X_k^2$ and $d_x^k$ behaves like a pure affine scaling scheme.

It is also important to observe that since we are not directly inverting $Q$, $d_x^k$ exists even when $Q$ is only positive semidefinite. In this case, if we set $Q = 0$, then $d_x^k$ reduces to the direction of translation for the primal affine scaling in the linear programming case.

Finding an appropriate step-length. Unlike the linear programming case, quadratic programming problems have elliptic (spheric being one kind) contours. Moving along a given direction can sometimes decrease the objective value first, and then increase the value beyond a certain point. Therefore, it is more complicated to determine a proper step-length for quadratic programming. Our principle is to choose a step-length such that the quadratic objective value is reduced by a maximum amount without violating the nonnegativity requirements on $x$.

In other words, we consider two intermediate step-lengths. The first one, denoted by $\alpha_k^1$, is the maximum step size without violating the nonnegativity requirements on $x$. The second one, denoted by $\alpha_k^2$, is the maximum step size without increasing the objective value. The actual step-length $\alpha_k$ is, of course, taken to be the minimum of the two. Figure 9.5 shows the case where $\alpha_k^1 < \alpha_k^2$, and Figure 9.6 illustrates the case where $\alpha_k^1 > \alpha_k^2$.

Figure 9.5 (x becomes negative beyond this point)
Figure 9.6 (objective value increases beyond this point)

As we derived for the linear programming case, when $d_x^k \ge 0$, we have $\alpha_k^1 = +\infty$. Otherwise, let $d_i^k$ be the $i$th component of $d_x^k$; then $\alpha_k^1$ is determined by the following minimum ratio test:

$$\alpha_k^1 = \min_i \left\{ \frac{\alpha x_i^k}{-d_i^k} \;\middle|\; d_i^k < 0 \right\} \quad \text{for some } 0 < \alpha < 1 \tag{9.23}$$

It is clearly seen that $\alpha_k^1 > 0$.

In order to determine $\alpha_k^2$, we let $x = x_k + \alpha d_x^k$ and evaluate the objective function (9.1a) at this point as

$$z(\alpha) = \frac{1}{2}(x_k + \alpha d_x^k)^T Q (x_k + \alpha d_x^k) + c^T (x_k + \alpha d_x^k)$$

Differentiating $z$ with respect to $\alpha$ and setting it to zero, we get

$$\alpha_k^2 = -\frac{(d_x^k)^T (Q x_k + c)}{(d_x^k)^T Q d_x^k} \tag{9.24}$$

It is left to the reader to verify that $\alpha_k^2 \ge 0$. With (9.23) and (9.24), we finally obtain an appropriate step-length

$$\alpha_k = \min \{ \alpha_k^1, \alpha_k^2 \} \tag{9.25}$$

Stopping rules. Similar to what we have for the linear programming case, based on the Karush-Kuhn-Tucker conditions (9.3), we can check primal feasibility, dual feasibility, and complementary slackness for optimality. When all three conditions are satisfied, we can terminate the algorithm with an optimal solution. To be more precise, we preselect three sufficiently small positive numbers $\epsilon_1$, $\epsilon_2$, and $\epsilon_3$. At the $k$th iteration with a feasible solution $x_k$, we define

$$H_k = (Q + X_k^{-2})^{-1}; \quad w_1^k = (A H_k A^T)^{-1} A H_k (Q x_k + c); \quad \text{and} \quad s_k = (Q x_k + c) - A^T w_1^k$$

Then we stop the algorithm when the following conditions are met:

(I) PRIMAL FEASIBILITY:

$$\frac{\| A x_k - b \|}{\| b \| + 1} \le \epsilon_1$$

(II) DUAL FEASIBILITY:

Either $s_k \ge 0$ or $\min_i s_i^k \ge -\epsilon_2$

(III) COMPLEMENTARY SLACKNESS:

$$(x_k)^T s_k \le \epsilon_3$$

Also notice that, with these notations, $d_x^k = -H_k s_k$.


A step-by-step implementation procedure. Summarizing what we have discussed, we can design a primal affine scaling algorithm for convex quadratic programs as follows:

Step 1 (initialization): Compute $\bar x = -Q^{-1} c$. If $A \bar x = b$ and $\bar x \ge 0$, then stop with an optimal solution $\bar x$.
Otherwise, set $k = 0$ and apply the Phase-1 linear programming method to find an interior feasible solution $x_0$ such that $A x_0 = b$ and $x_0 > 0$. Moreover, choose $\epsilon_1$, $\epsilon_2$, $\epsilon_3$ to be three sufficiently small positive numbers and $\alpha < 1$ to be a positive constant.

Step 2 (computing dual estimates): Compute $H_k = (Q + X_k^{-2})^{-1}$. Evaluate $w_1^k$ by solving

$$(A H_k A^T) w_1^k = A H_k (Q x_k + c)$$

and compute the dual slackness

$$s_k = (Q x_k + c) - A^T w_1^k$$

Step 3 (checking for optimality): If

$$\frac{\| A x_k - b \|}{\| b \| + 1} \le \epsilon_1 \quad \text{and} \quad \min_i s_i^k \ge -\epsilon_2 \quad \text{and} \quad (x_k)^T s_k \le \epsilon_3$$

then stop with an optimal (or near-optimal) solution $x_k$. Otherwise, go to the next step.

Step 4 (finding the direction of translation): Compute a direction of translation

$$d_x^k = -H_k s_k$$

Step 5 (computing the step-length): Compute

$$\alpha_k^2 = -\frac{(d_x^k)^T (Q x_k + c)}{(d_x^k)^T Q d_x^k}$$

If $d_x^k \ge 0$, then set $\alpha_k = \alpha_k^2$. Otherwise, calculate

$$\alpha_k^1 = \min_i \left\{ \frac{\alpha x_i^k}{-d_i^k} \;\middle|\; d_i^k < 0 \right\}$$

and set $\alpha_k = \min\{\alpha_k^1, \alpha_k^2\}$.

Step 6 (moving to a new solution): Perform the translation

$$x_{k+1} \leftarrow x_k + \alpha_k d_x^k$$

Set $k \leftarrow k + 1$ and go to Step 2.

The following example illustrates this solution procedure.

Example 9.1

Minimize $\frac{1}{2}(2x_1^2 + 3x_2^2 + 5x_3^2) + x_1 + 2x_2 - 3x_3$
subject to $x_1 + x_2 = 5$
$x_2 + x_3 = 10$
$x_1, x_2, x_3 \ge 0$

It is clear that

$$A = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix} \quad \text{and} \quad Q = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 5 \end{bmatrix}$$

Also note that the unconstrained minimum $\bar x = -Q^{-1} c = [-0.5\;\; -0.6667\;\; 0.6000]^T$ is infeasible; therefore we have to go to Step 2.

Let us start with an initial interior solution $x_0 = [0.3532\;\; 4.6468\;\; 5.3532]^T$. (As a matter of fact, the reader may verify that this solution can be obtained by invoking the Phase-1 linear programming method with a starting vector of all ones.) Now, we have

$$H_0 = (Q + X_0^{-2})^{-1} = \begin{bmatrix} 0.0998 & 0 & 0 \\ 0 & 0.3283 & 0 \\ 0 & 0 & 0.1986 \end{bmatrix}$$

The dual estimate is given by

$$w_1^0 = (A H_0 A^T)^{-1} A H_0 (Q x_0 + c) \approx [-3.5690\;\; 21.1137]^T$$

Therefore,

$$s_0 = (Q x_0 + c) - A^T w_1^0 = [5.2756\;\; -1.6044\;\; 2.6517]^T$$

Since the negative component of $s_0$ is too large, it is not optimal yet. Hence we proceed to synthesize the direction of translation:

$$d_x^0 = -H_0 s_0 = [-0.5267\;\; 0.5267\;\; -0.5267]^T$$

Because $d_x^0$ is not nonnegative, we set $\alpha = 0.99$ to calculate $\alpha_0^1 = 0.6640$. Also we compute $\alpha_0^2 = 1.8098$. Therefore the actual step-length is $\alpha_0 = 0.6640$. Making the translation, we get a new interior solution

$$x_1 = x_0 + \alpha_0 d_x^0 = [0.0035\;\; 4.9965\;\; 5.0035]^T$$

The reader may carry out further iterations to verify that the optimal solution is $[0\;\; 5\;\; 5]^T$.
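The procedure above is short enough to be coded directly. The following NumPy sketch implements Steps 2 through 6; the tolerances, the constant $\alpha = 0.99$, and the iteration cap are illustrative choices of ours. Applied to the data of Example 9.1, it reproduces the first iterate $x_1$ computed above and then continues toward $[0\; 5\; 5]^T$.

```python
# Primal affine scaling for a convex QP (Steps 2-6 of the procedure).
import numpy as np

def primal_affine_scaling_qp(A, b, Q, c, x, alpha=0.99,
                             eps=(1e-6, 1e-6, 1e-6), max_iter=100):
    for _ in range(max_iter):
        Hk = np.linalg.inv(Q + np.diag(1.0 / x**2))     # (Q + X_k^{-2})^{-1}
        g = Q @ x + c
        w1 = np.linalg.solve(A @ Hk @ A.T, A @ Hk @ g)  # dual estimates
        s = g - A.T @ w1                                # dual slackness
        if (np.linalg.norm(A @ x - b) / (np.linalg.norm(b) + 1) <= eps[0]
                and s.min() >= -eps[1] and x @ s <= eps[2]):
            break                                       # Step 3: stop
        d = -Hk @ s                                     # Step 4: (9.22)
        a2 = -(d @ g) / (d @ Q @ d)                     # Step 5: (9.24)
        neg = d < 0
        a1 = np.min(alpha * x[neg] / (-d[neg])) if neg.any() else np.inf
        x = x + min(a1, a2) * d                         # Step 6
    return x

A = np.array([[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]])
b = np.array([5.0, 10.0])
Q = np.diag([2.0, 3.0, 5.0])
c = np.array([1.0, 2.0, -3.0])
x0 = np.array([0.3532, 4.6468, 5.3532])
print(primal_affine_scaling_qp(A, b, Q, c, x0))         # approx [0, 5, 5]
```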
9.2.2 Improving Primal Affine Scaling for Quadratic Programming

Similar to Theorem 7.2, it can be proven that the sequence of iterates $\{x_k\}$ generated by the primal affine scaling algorithm for quadratic programming problems indeed converges to an optimal solution under appropriate assumptions. However, as far as the computational complexity is concerned, there is no evidence that the algorithm has a polynomial-time bound. The authors have implemented the algorithm and found satisfactory performance in solving quadratic programming problems. The characteristics of its performance are pretty much like those of the linear programming case. In this section we introduce some ideas for performance improvement.

Range-space formulation vs. null-space formulation. Notice that the actual moving direction of (9.22) is obtained through the projection of the negative gradient of the objective function onto the null space of the scaled constraint matrix. Hence it is sometimes called a null-space formulation. An alternative formulation of the direction of translation is the range-space formulation. We shall look into this aspect in this subsection.

Recall that for an $m \times n$ matrix $M$ of full row rank, the associated null-space projection matrix is defined as $[I - M^T (M M^T)^{-1} M]$ and the range-space projection matrix is defined as $[M^T (M M^T)^{-1} M]$. In this way, for a vector $v \in R^n$, its projection onto the null space of $M$ is given by $v_n = [I - M^T (M M^T)^{-1} M] v$ and its projection onto the range space of $M^T$ is given by $v_r = [M^T (M M^T)^{-1} M] v$, such that $v = v_n + v_r$ and $v_n^T v_r = 0$.

Also recall that, if matrix $M_c$ of dimensionality $(n-m) \times n$ is the orthogonal complement of $M$ such that $M M_c^T u = 0$ for each vector $u \in R^{n-m}$, then the projection onto the null space of $M$ is equivalent to the projection onto the range space of $M_c^T$. In this setting, we have

$$I - M^T (M M^T)^{-1} M = M_c^T (M_c M_c^T)^{-1} M_c \tag{9.26}$$

With these concepts in mind, let us revisit formula (9.22). Noting that $H_k = (Q + X_k^{-2})^{-1}$ is a symmetric positive definite matrix, we denote $T_k$ as its Cholesky factor such that $T_k T_k^T = H_k$. Substituting into Equation (9.22), we see

$$d_x^k = -\left[ I - T_k T_k^T A^T (A T_k T_k^T A^T)^{-1} A \right] T_k T_k^T (Q x_k + c) = -T_k \left[ I - A_k^T (A_k A_k^T)^{-1} A_k \right] T_k^T (Q x_k + c) \tag{9.27}$$

where $A_k = A T_k$ and $A_k^T$ is the transpose matrix of $A_k$.

Now, suppose that matrix $A_c$ is the orthogonal complement of $A$; then $A_c^k \equiv A_c (T_k^T)^{-1}$ is the orthogonal complement of $A_k$. Moreover, (9.27) becomes

$$d_x^k = -T_k (A_c^k)^T \left[ (A_c^k)(A_c^k)^T \right]^{-1} (A_c^k) T_k^T (Q x_k + c)$$
$$= -A_c^T \left[ A_c (T_k^T)^{-1} T_k^{-1} A_c^T \right]^{-1} A_c (Q x_k + c)$$
$$= -A_c^T \left[ A_c H_k^{-1} A_c^T \right]^{-1} A_c (Q x_k + c)$$
$$= -A_c^T \left[ A_c (Q + X_k^{-2}) A_c^T \right]^{-1} A_c (Q x_k + c) \tag{9.28}$$

Here (9.28) is a range-space formulation of the moving direction which depends on the orthogonal complement $A_c$. Given that $A = [B \mid N]$ with a basis matrix $B$ and non-basis matrix $N$, we know from Chapter 3 that

$$A_c = \left[ -(B^{-1} N)^T \mid I_{n-m} \right] \tag{9.29}$$

Taking a closer look at the primal affine scaling algorithm for quadratic programming, we see the most time-consuming work is the computation of the moving direction $d_x^k$. We now compare the null-space formulation (9.22) with the range-space formulation (9.28) for solving a convex quadratic program with $n$ variables and $m$ constraints.

In the null-space formulation, the complexity of inverting $(Q + X_k^{-2})$ to form $H_k$ is $O(n^3)$, forming $(A H_k A^T)$ is $O(m n^2)$, and inverting $(A H_k A^T)$ is $O(m^3)$. On the other hand, in the range-space formulation, the complexity of forming $A_c (Q + X_k^{-2}) A_c^T$ is $O((n-m) n^2)$ and inverting this matrix is $O((n-m)^3)$. Therefore the following observations can be made:

1. If $n \gg m$, then use the null-space formulation for computing $d_x^k$.
2. If $n \approx m$ or $n - m \ll m$, then use the range-space formulation for computing $d_x^k$.

An additional advantage of using the range-space formulation is that we can use a good amount of approximation in performing the matrix inversion in Equation (9.28), since feasibility is automatically guaranteed by the definition of $A_c$. In contrast, the null-space formulation is sensitive to truncation and round-off errors when the projection matrix is close to the identity matrix. Thus, in general, the range-space method is more robust toward computational errors.
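The equivalence of the two formulations is easy to confirm on a small random instance. In the sketch below, $A_c$ is assembled via (9.29) under the assumption that the first $m$ columns of $A$ form the basis $B$; the data are our own illustrative choices.

```python
# Null-space direction (9.22) versus range-space direction (9.28).
import numpy as np

rng = np.random.default_rng(2)
m, n = 2, 5
A = rng.standard_normal((m, n))
x = rng.uniform(0.5, 2.0, n)
Qm = np.diag(rng.uniform(1.0, 3.0, n))
c = rng.standard_normal(n)
g = Qm @ x + c

Hk = np.linalg.inv(Qm + np.diag(1.0 / x**2))

# Null-space formulation (9.22):
w1 = np.linalg.solve(A @ Hk @ A.T, A @ Hk @ g)
d_null = -Hk @ (g - A.T @ w1)

# Range-space formulation (9.28) with A_c from (9.29):
B, N = A[:, :m], A[:, m:]
Ac = np.hstack([-np.linalg.solve(B, N).T, np.eye(n - m)])
d_range = -Ac.T @ np.linalg.solve(Ac @ (Qm + np.diag(1.0/x**2)) @ Ac.T,
                                  Ac @ g)
print(np.allclose(d_null, d_range))      # the two formulations agree
```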

Potential push method. Just as in the linear programming case, the primal affine scaling algorithm for quadratic programming may also "get trapped" at a wrong vertex of the feasible domain. In order to avoid this potential problem, we introduce a "potential push method" in this subsection and a "logarithmic barrier function method" in the next subsection. Because these two methods are extensions of what we had in Chapter 7, we only outline the results without formal proof.

The basic concept of the "potential push" method is illustrated by Figure 9.7. An old solution $x_{k-1}$ with a moving direction $d_x^{k-1}$ leads to a current solution $x_k$. However, the current solution is "off the center," and our objective is to push the "off-centered" solution to a better position which is away from the boundary but has the same objective value. In order to do so, we define a potential function (9.30) and try to find a projected normal vector $\bar a_n^k$ and a projected push vector $\bar a_p^k$ at $x_k$ such that the better position can be determined by minimizing the potential function (9.30) along the path

$$x(t) = x_k + (\cos \theta t - 1)\, \bar a_n^k + (\sin \theta t)\, \bar a_p^k \tag{9.31}$$

where $\theta$ is a parameterization on $t$.

Figure 9.7 (recentering on a constant objective surface)

By a tedious derivation, we can show that

$$\bar a_n^k = A_c^T (A_c Q A_c^T)^{-1} A_c g^k \tag{9.32a}$$

and

$$\bar a_p^k = \kappa A_c^T (A_c H_k A_c^T)^{-1} A_c v^k \tag{9.32b}$$

where $A_c$ is the orthogonal complement of $A$, $H_k = (Q + X_k^{-2})^{-1}$, $g^k$ is the gradient of the objective function at $x_k$, $v^k = 1/x^k - \xi g^k$ (for some scalar $\xi$), and $\kappa$ is a constant determined by (see page 162 for the definition of $1/x^k$)

$$\kappa = \left[ \frac{(g^k)^T A_c^T (A_c Q A_c^T)^{-1} A_c g^k}{(v^k)^T A_c^T (A_c H_k A_c^T)^{-1} A_c Q A_c^T (A_c H_k A_c^T)^{-1} A_c v^k} \right]^{1/2}$$

In real implementation, once $\bar a_n^k$ and $\bar a_p^k$ are found, a binary search procedure is usually applied to find an approximate minimizer $x(t)$ of the potential function (9.30) along the path (9.31). According to the authors' experience, the extra computational effort for the potential push is well repaid for a variety of large-scale quadratic programming problems. But again, there is no theoretic proof of polynomial-time complexity for the potential push method.

Logarithmic barrier function method. Just as in the linear programming
case, the logarithmic barrier function method can also be extended for quadratic pro-
gramming. By imposing a logarithmic barrier on the positivity walls with a barrier
parameter μ > 0, we consider the following problem:

    Minimize (1/2) x^T Q x + c^T x - μ Σ_{j=1}^{n} log_e x_j        (9.33a)

    subject to Ax = b,  x > 0        (9.33b)

Similar to what we did in Chapter 7, a new moving direction at a current interior feasible
solution x^k is obtained as

    d^k = -H_k (g^k - A^T w^k)        (9.34)

where H_k = (Q + X_k^{-2})^{-1}, g^k = Qx^k + c - μ X_k^{-1} e, and w^k = (A H_k A^T)^{-1} A H_k g^k.

Based on this approach, D. Goldfarb and S. Liu developed an algorithm for convex
quadratic programming and claimed a computational complexity of O(n^3 L).
Comparing Equation (9.34) with Equation (7.49a), we see the connection between
the linear and quadratic cases.
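To make this concrete, the following sketch (in Python with NumPy; the function and variable names are ours, and dense linear algebra is assumed) evaluates the moving direction of Equation (9.34), as written above, at an interior feasible point x^k > 0:

    import numpy as np

    def qp_barrier_direction(Q, A, c, x, mu):
        # H = (Q + X^{-2})^{-1}, with X = diag(x)
        H = np.linalg.inv(Q + np.diag(1.0 / x**2))
        # gradient of the barrier objective: g = Qx + c - mu X^{-1} e
        g = Q @ x + c - mu / x
        # dual estimate: w = (A H A^T)^{-1} A H g
        w = np.linalg.solve(A @ H @ A.T, A @ H @ g)
        # moving direction of Eq. (9.34)
        return -H @ (g - A.T @ w)

Since A d = -A H g + A H A^T w = 0 by the choice of w, the direction keeps Ax = b satisfied; a step-length must still be chosen to keep x > 0.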

9.3 PRIMAL-DUAL ALGORITHM FOR QUADRATIC PROGRAMMING

9.3.1 Basic Concepts

For a given barrier parameter μ > 0, let us focus on problem (9.33). The associated
Karush-Kuhn-Tucker conditions become

    Ax = b,  x > 0        (primal feasibility)        (9.35a)
    -Qx + A^T w + s = c,  s > 0        (dual feasibility)        (9.35b)
    XSe = μe        (complementary slackness)        (9.35c)
Now assume that (x^k; w^k; s^k) is a current solution with x^k > 0, s^k > 0 for μ_k > 0.
Invoking Newton's method would yield a system of equations for the directions of
translation. This system is given by

    [  A     0     0   ] [ d_x^k ]       [ Ax^k - b                   ]
    [ -Q    A^T    I   ] [ d_w^k ]  = -  [ -Qx^k + A^T w^k + s^k - c  ]        (9.36)
    [ S_k    0    X_k  ] [ d_s^k ]       [ X_k S_k e - μ_k e          ]

where X_k and S_k are the diagonal matrices formed by vectors x^k and s^k, respectively.
Note that (9.36) can be written as

    A d_x^k = t^k,  where t^k = b - Ax^k        (9.37a)
    -Q d_x^k + A^T d_w^k + d_s^k = u^k,  where u^k = Qx^k + c - A^T w^k - s^k        (9.37b)
    S_k d_x^k + X_k d_s^k = v^k,  where v^k = μ_k e - X_k S_k e        (9.37c)

Premultiplying Equation (9.37b) by X_k, we have

    -X_k Q d_x^k + X_k A^T d_w^k + X_k d_s^k = X_k u^k        (9.38)

Subtracting Equation (9.37c) from (9.38) would produce

    -(S_k + X_k Q) d_x^k + X_k A^T d_w^k = X_k u^k - v^k        (9.39)

In other words, we see

    -d_x^k + (S_k + X_k Q)^{-1} X_k A^T d_w^k = (S_k + X_k Q)^{-1} (X_k u^k - v^k)        (9.40)

Premultiplying Equation (9.40) by A, we get

    -A d_x^k + A (S_k + X_k Q)^{-1} X_k A^T d_w^k = A (S_k + X_k Q)^{-1} (X_k u^k - v^k)        (9.41)

Together with (9.37a), the above equation implies that

    A (S_k + X_k Q)^{-1} X_k A^T d_w^k = A (S_k + X_k Q)^{-1} (X_k u^k - v^k) + t^k        (9.42)

Consequently, we have

    d_w^k = [A (S_k + X_k Q)^{-1} X_k A^T]^{-1} [A (S_k + X_k Q)^{-1} (X_k u^k - v^k) + t^k]        (9.43)

By Equation (9.40) we further have

    d_x^k = (S_k + X_k Q)^{-1} X_k A^T d_w^k - (S_k + X_k Q)^{-1} [X_k u^k - v^k]
          = (S_k + X_k Q)^{-1} [X_k (A^T d_w^k - u^k) + v^k]        (9.44)

Also from Equation (9.37c) we get

    d_s^k = X_k^{-1} (v^k - S_k d_x^k)        (9.45)
Therefore, a new solution can be obtained by choosing appropriate step-lengths α_P^k and
α_D^k in the primal and dual spaces, such that

    x^{k+1} = x^k + α_P^k d_x^k        (9.46a)
    w^{k+1} = w^k + α_D^k d_w^k        (9.46b)

and

    s^{k+1} = s^k + α_D^k d_s^k        (9.46c)

To determine appropriate step-lengths, we have to maintain the positivity requirements
of x and s. Also the Newton step restricts a maximum step-length of 1. Therefore, just
as in the linear programming case, we take

    α_P^k = min { 1,  α × min { -x_i^k / d_{x,i}^k : d_{x,i}^k < 0 } }        (9.47a)

and

    α_D^k = min { 1,  α × min { -s_i^k / d_{s,i}^k : d_{s,i}^k < 0 } }        (9.47b)

where d_{x,i}^k is the ith component of d_x^k, d_{s,i}^k the ith component of d_s^k, x_i^k the ith component
of x^k, s_i^k the ith component of s^k, and 0 < α < 1 is a constant.
In addition, at each iteration, the barrier parameter μ_k can be updated according to
the formula

    μ_k = α (x^k)^T s^k / n,  for a constant 0 < α < 1

After determining how to iterate from one solution to another, the remaining
work is to start and stop the iterations. Similar to the development in Chapter 7, here we
introduce a version of the primal-dual algorithm which can start with an arbitrary triplet
(x^0, w^0, s^0) with appropriate dimensions and x^0, s^0 > 0. The algorithm terminates when
the Karush-Kuhn-Tucker conditions (9.35) are met.

9.3.2 A Step-By-Step Implementation Procedure

Based on the basic concepts we discussed in the previous subsection, we list a version of
the primal-dual algorithm for convex quadratic programming. Although no theoretical
proof of convergence or polynomial-time complexity has been derived for this version,
experience in solving real-life problems supports its computational efficiency and
effectiveness.

Step 1 (starting the algorithm): Set k = 0. Start with any (x^0; w^0; s^0) such that
x^0 > 0 and s^0 > 0. Fix ε_1, ε_2, and ε_3 to be three sufficiently small positive numbers
and 0 < α < 1.
Step 2 (intermediate computations): Compute

    μ_k = α (x^k)^T s^k / n
    t^k = b - A x^k
    u^k = Q x^k + c - A^T w^k - s^k
    v^k = μ_k e - X_k S_k e

Step 3 (checking for optimality): If

    μ_k < ε_1,    ||t^k|| / (||b|| + 1) < ε_2,    and    ||u^k|| / (||Qx^k + c|| + 1) < ε_3

then STOP. The current solution is optimal. Otherwise go to the next step.
(Note: Compute ||u^k|| and ||Qx^k + c|| only for those dual constraints which
are violated, i.e., corresponding to the negative components of u^k. If u^k ≥ 0, then
there is no need to compute this measure of optimality.)

Step 4 (finding the directions of translation): Compute the moving directions

    d_w^k = [A (S_k + X_k Q)^{-1} X_k A^T]^{-1} [A (S_k + X_k Q)^{-1} (X_k u^k - v^k) + t^k]
    d_x^k = (S_k + X_k Q)^{-1} [X_k (A^T d_w^k - u^k) + v^k]

and

    d_s^k = X_k^{-1} (v^k - S_k d_x^k)

Step 5 (finding step-lengths): Compute the primal and dual step-lengths with
0 < α < 1 (say, 0.99):

    α_P^k = min { 1,  α × min { -x_i^k / d_{x,i}^k : d_{x,i}^k < 0 } }

and

    α_D^k = min { 1,  α × min { -s_i^k / d_{s,i}^k : d_{s,i}^k < 0 } }

Step 6 (moving to a new solution): Update the solution vectors as

    x^{k+1} ← x^k + α_P^k d_x^k
    w^{k+1} ← w^k + α_D^k d_w^k
    s^{k+1} ← s^k + α_D^k d_s^k

Set k ← k + 1 and go to Step 2.
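For illustration, here is a compact transcription of the above procedure in Python with NumPy. The code is ours, not the authors': it uses dense linear algebra, collapses the three tolerances ε_1, ε_2, ε_3 into a single eps, and uses sigma for the barrier-reduction constant and alpha for the step-length fraction (the text denotes both by α).

    import numpy as np

    def primal_dual_qp(Q, A, b, c, x, w, s, sigma=0.1, alpha=0.99,
                       eps=1e-8, max_iter=100):
        # Requires only x > 0 and s > 0 to start; w is unrestricted.
        n = len(x)
        for _ in range(max_iter):
            mu = sigma * (x @ s) / n             # Step 2: barrier parameter
            t = b - A @ x                        # primal residual t^k
            u = Q @ x + c - A.T @ w - s          # dual residual u^k
            v = mu * np.ones(n) - x * s          # centering residual v^k
            if mu < eps and np.linalg.norm(t) < eps and np.linalg.norm(u) < eps:
                break                            # Step 3: (near) optimality
            # Step 4: directions of translation, Eqs. (9.43)-(9.45)
            D = np.linalg.inv(np.diag(s) + np.diag(x) @ Q)   # (S_k + X_k Q)^{-1}
            dw = np.linalg.solve(A @ D @ np.diag(x) @ A.T,
                                 A @ D @ (x * u - v) + t)
            dx = D @ (x * (A.T @ dw - u) + v)
            ds = (v - s * dx) / x
            # Step 5: step-lengths keeping x and s positive, capped at 1
            ap = min(1.0, alpha * min((-x[i] / dx[i] for i in range(n)
                                       if dx[i] < 0), default=np.inf))
            ad = min(1.0, alpha * min((-s[i] / ds[i] for i in range(n)
                                       if ds[i] < 0), default=np.inf))
            # Step 6: move to the new solution
            x, w, s = x + ap * dx, w + ad * dw, s + ad * ds
        return x, w, s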

Now we present a simple example to illustrate the above procedure.


Example 9.2
Consider the same problem as shown in Example 9.1. As the starting points, let us choose
x^0 = [1 1 1]^T, w^0 = [0 0]^T, s^0 = [1 1 1]^T. In this case, both X_0 and S_0 are
identity matrices.
We now compute

    t^0 = b - Ax^0 = [3 5]^T,    u^0 = Qx^0 + c - A^T w^0 - s^0 = [2 4 1]^T
    v^0 = μ^0 e - X_0 S_0 e = [0 0 0]^T,    μ^0 = 1

It is clear to see that we are not optimal yet, since the primal vector x^0 is not feasible at
all. Therefore Step 4 requires that

    d_w^0 = [  2.3077   -1.3846 ] [ 4.6667 ]   [ -1.9231 ]
            [ -1.3846    3.2308 ] [ 9.1667 ] = [ 23.1538 ]

    d_x^0 = [-1.3077  4.3077  3.6923]^T

    d_s^0 = [1.3077  -4.3077  -3.6923]^T



If we choose α = 0.99, then Step 5 implies that α_P^0 = 0.7571 and α_D^0 = 0.2298. Following
Step 6, we have the new solution vectors:

    x^1 = [1 1 1]^T + 0.7571 × [-1.3077 4.3077 3.6923]^T
        = [0.0100 4.2612 3.7953]^T
    s^1 = [1 1 1]^T + 0.2298 × [1.3077 -4.3077 -3.6923]^T
        = [1.3005 0.0100 0.1514]^T

and

    w^1 = [0 0]^T + 0.2298 × [-1.9231 23.1538]^T = [-0.4419 5.3213]^T

Note that the new primal vector x^1 is closer to satisfying the primal feasibility requirement.
The reader is urged to carry out further iterations in order to terminate the procedure with
an optimal solution.

9.3.3 Convergence Properties of the Primal-Dual Algorithm

As indicated earlier, both the convergence and polynomiality properties of the above
version of the primal-dual algorithm have not been fully investigated, although it indeed
works very well for solving many real-life problems. To answer the theoretical questions,
R. C. Monteiro and I. Adler have reported another version of the primal-dual algorithm
which converges to optimality with O(n^3 L) complexity.
Their version requires starting with a solution (x^0; w^0; s^0) such that the primal
feasibility and dual feasibility conditions (9.35a, b) are met. These two conditions are
also required to be maintained at each iteration. In other words, t^k and u^k must be kept
at 0 for each k. In addition, the starting solution (x^0; w^0; s^0) and a corresponding μ^0 are
required to satisfy the condition

    ||f^0 - μ^0 e|| ≤ θ μ^0

where f^0 = [x_1^0 s_1^0, x_2^0 s_2^0, ..., x_n^0 s_n^0]^T and θ is a small positive number, say 0.1.
The algorithm updates the value of the barrier parameter μ according to the following
formula:

    μ^{k+1} = μ^k (1 - δ/√n)

where δ is also a small positive number, say 0.1. Their moving directions are also
synthesized according to Equations (9.43), (9.44), and (9.45) with t^k = 0 and u^k = 0.
But, instead of evaluating at the real x^k and s^k, they introduced the so-called "adjusted"
values x̄^k and s̄^k such that

    |x̄_i^k - x_i^k| / |x_i^k| ≤ γ  for each i

and

    |s̄_i^k - s_i^k| / |s_i^k| ≤ γ  for each i

where γ is again a small positive number, say 0.1. The moving directions are computed
by using x̄^k and s̄^k instead of x^k and s^k in Equations (9.43)-(9.45).
Since both the primal and dual feasibility conditions are maintained, R. C. Monteiro
and I. Adler showed that their algorithm takes at most O(√n max(log ε^{-1}, log n, log μ^0))
iterations to terminate with (x^k)^T s^k ≤ ε. With the barrier parameter μ^0 satisfying the
condition log(μ^0) = O(L), the algorithm stops in O(√n L) iterations. Now, at each
iteration the complexity analysis is pretty much like the linear programming case. Hence
the total complexity can be brought down to O(n^3 L) arithmetic operations.
From a practical point of view, although Monteiro and Adler's primal-dual algo-
rithm is theoretically interesting, it is not computationally attractive. Basically, several
parameters in the algorithm are heuristically chosen, and it is difficult to find "univer-
sally good" values of these parameters. Furthermore, it is not that easy to find a starting
solution satisfying all the requirements, and the computations of x̄^k and s̄^k and related
approximations are found to adversely affect the efficiency of the algorithm.

9.4 CONVEX PROGRAMMING WITH LINEAR CONSTRAINTS

In this section, we further extend the primal affine scaling algorithm to solve convex
programming problems with linear constraints. In a certain sense this extension is expected,
since a convex function can be approximated by a quadratic function in a neighborhood
of each point of interest. Therefore, it is a natural extension from quadratic to convex
programming.

9.4.1 Basic Concepts

For simplicity, we assume that f(x) is a twice continuously differentiable real-valued
function defined on R^n. Let us denote its gradient vector by ∇f(x) and Hessian matrix
by ∇²f(x) at a point x ∈ R^n. Then f is a convex function on R^n if and only if ∇²f(x) is
positive semidefinite for every x ∈ R^n. Moreover, if an unconstrained convex function f
has a minimizer in R^n, then its minimizer(s) can be determined by solving ∇f(x) = 0. In
particular, f is strictly convex if ∇²f(x) is positive definite for every x ∈ R^n.
In this case, f has a unique minimizer if it indeed has one. As to the constrained convex
programming problem, it becomes more complicated than solving ∇f(x) = 0. We shall
see this in the later discussion.
Given that f is a twice continuously differentiable strictly convex function, A is
an m × n matrix, and b ∈ R^m, a linearly constrained convex programming problem is
defined as follows:

    Minimize f(x)        (9.48a)
    subject to Ax = b and x ≥ 0        (9.48b)

Assume that the constraint matrix A has full row-rank and b lies in the range space of
A. Then there is no redundant constraint, and the feasible domain defined by (9.48b)
is nonempty. Just as in the quadratic programming case, each contour surface of f(x)

is no longer a hyperplane, although the feasible domain stays the same as in a linear
programming problem.
The Karush-Kuhn-Tucker conditions for problem (9.48) are given by

    Ax = b,  x ≥ 0        (9.49a)
    -∇f(x) + A^T w + s = 0,  s ≥ 0        (9.49b)
    XSe = 0        (9.49c)

Notice that when ∇²f(x) is positive definite, conditions (9.49a)-(9.49c) can be used for
the optimality test. Also notice that if f(x) = (1/2) x^T Q x + c^T x, then ∇f(x) = Qx + c and
conditions (9.49a)-(9.49c) become conditions (9.3a)-(9.3c). In other words, the work of
linearly constrained quadratic programming is a special case of this general setting.
The basic idea of designing a primal affine scaling algorithm for a linearly con-
strained convex programming problem is quite straightforward. We simply replace
the roles of Q and Qx + c in the quadratic programming algorithm by ∇²f(x) and ∇f(x),
respectively, to handle the convex case. In this way, for an interior feasible solution x^k,
the moving direction is determined by

    d_x^k = -H_k (∇f(x^k) - A^T w^k),  where H_k = (∇²f(x^k) + X_k^{-2})^{-1}        (9.50)

and w^k = (A H_k A^T)^{-1} A H_k ∇f(x^k) is the corresponding dual estimate.
A new iterate is generated according to the following translation:

    x^{k+1} = x^k + α_k d_x^k        (9.51)

where α_k is an appropriate step-length.
Remember that we need two parameters ᾱ_k and α_f to decide α_k in the quadratic
programming case. Here, the choice of ᾱ_k is still the same as (9.23) requires, because
the nonnegativity requirement is unchanged. As to α_f, we let x = x^k + α d_x^k and evaluate
the objective function (9.48a) at x as

    f(α) = f(x^k + α d_x^k)

Then we can conduct a line search on α to find a minimizer (or approximate minimizer)
and set it to be α_f. In this way, the objective function is guaranteed to decrease when
we move along the direction d_x^k with a step-length no longer than α_f. The final step-length
is, of course, the smaller of the two, i.e.,

    α_k = min {ᾱ_k, α_f}

Once the moving direction and step-length are known, the algorithm iterates from one
interior feasible solution to another. Moreover, the Karush-Kuhn-Tucker conditions are
checked for the termination of the algorithm. To be more precise, we preselect three
sufficiently small positive numbers ε_1, ε_2, and ε_3. At the kth iteration with a feasible
solution x^k, we define

    H_k = (∇²f(x^k) + X_k^{-2})^{-1};    w^k = (A H_k A^T)^{-1} A H_k ∇f(x^k);    and s^k = ∇f(x^k) - A^T w^k

Then we stop the algorithm when the following conditions are met:

(I) PRIMAL FEASIBILITY:

    ||Ax^k - b|| / (||b|| + 1) ≤ ε_1

(II) DUAL FEASIBILITY:

    either s^k ≥ 0 or ||s^k|| / (||∇f(x^k)|| + 1) ≤ ε_2 (with the norms computed over the violated, i.e., negative, components of s^k only)

(III) COMPLEMENTARY SLACKNESS:

    (x^k)^T s^k ≤ ε_3

9.4.2 A Step-by-Step Implementation Procedure

Based on the basic concepts, a primal affine scaling algorithm is assembled as follows
to solve convex programming problems with linear constraints.

Step 1 (initialization): Solve ∇f(x) = 0 for x̄. If Ax̄ = b and x̄ ≥ 0, then stop
with an optimal solution x̄. Otherwise, set k = 0 and apply the Phase-I linear
programming method to find an interior feasible solution x^0 such that Ax^0 = b and
x^0 > 0. Moreover, choose ε_1, ε_2, ε_3 to be three sufficiently small positive numbers
and 0 < α < 1 to be a positive constant.
Step 2 (computing dual estimates): Compute H_k = (∇²f(x^k) + X_k^{-2})^{-1}. Evaluate
w^k by solving

    [A H_k A^T] w^k = A H_k ∇f(x^k)

and compute the dual slackness

    s^k = ∇f(x^k) - A^T w^k

Step 3 (checking for optimality): If

    ||Ax^k - b|| / (||b|| + 1) ≤ ε_1

and

    either s^k ≥ 0 or ||s^k|| / (||∇f(x^k)|| + 1) ≤ ε_2

and

    (x^k)^T s^k ≤ ε_3

then stop with an optimal (actually, near optimal) solution xk. Otherwise, go to
the next step.
Step 4 (finding a direction of translation): Compute a direction of translation
d~ = -Hksk
Step 5 (computing step-length): Using the line search to find af by approximating
the minimizer of
f(a) = f (xk +ad~)
If d~ :::: 0, then set ak = af. Otherwise, calculate
I . ax; k
ak = ~m { -df
k d; < 0 I }
and set ak = min{ak, af}.
Step 6 (moving to a new solution): Perform the translation
xk+
1
+- xk + akd~
Set k +- k + 1 and go to Step 2.
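A minimal sketch of one translation of this procedure in Python with NumPy may clarify Steps 2 through 5; the helper names are ours, and grad and hess are assumed to be user-supplied callables returning ∇f(x) and ∇²f(x):

    import numpy as np

    def affine_scaling_step(grad, hess, A, x, alpha=0.99):
        # Step 2: H_k, dual estimate w^k, and dual slackness s^k
        H = np.linalg.inv(hess(x) + np.diag(1.0 / x**2))
        g = grad(x)
        w = np.linalg.solve(A @ H @ A.T, A @ H @ g)
        s = g - A.T @ w
        d = -H @ s                    # Step 4: direction of translation
        neg = d < 0                   # Step 5: boundary step bound
        a_bar = alpha * np.min(-x[neg] / d[neg]) if neg.any() else np.inf
        return d, a_bar

A caller would then search the objective f(x^k + α d) over 0 ≤ α ≤ ā_k for an (approximate) minimizer α_f and move with α_k = min{ā_k, α_f}.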
Again, we have to point out that both the convergence proof and the modifications needed
for a polynomial-time complexity analysis are subject to further investigation. Here we only
outline an implementation procedure with no further theoretical indications.

9.5 CONCLUDING REMARKS

In this chapter, we have seen how the variants of affine scaling can be extended to
solve linearly constrained quadratic programming and convex programming problems.
The basic ideas are quite simple, but a more rigorous study of the convergence proofs and
polynomial complexity analysis remains a challenge. Here we merely introduce some
practical implementations. Moreover, from these practical implementations, we see the
potential of developing a general-purpose affine scaling solver for linear, quadratic, and
linearly constrained convex programming problems, because the basic ideas follow the
same logic. Researchers are also studying the linear complementarity problem and other
types of nonlinear programming problems from the interior-point perspective.

REFERENCES FOR FURTHER READING

9.1 Cheng, Y-C., Houck, D. J., Jr., Meketon, M.S., Slutsman, L., Vanderbei, R. J., and Wang, P.,
"The AT&T KORBX® System," AT&T Technical Journal 68, No. 3, 7-19 (1989).
9.2 Fang, S. C., and Tsao, J. H-S., "An unconstrained convex programming approach to solv-
ing convex quadratic programming problems," OR Report No. 263, North Carolina State
University, Raleigh, NC (1992).

9.3 Freedman, B. A., Puthenpura, S. C., and Sinha, L. P., "A new Karmarkar-based algorithm
for optimizing convex, non-linear cost functions with linear constraints," Technical Memo-
randum No. 54142-870217-01TM, AT&T Bell Laboratories (1987).
9.4 Goldfarb, D., and Liu, S., "An O(n^3 L) primal interior point algorithm for convex quadratic
programming," Mathematical Programming 49, 325-340 (1991).
9.5 Jarre, F., "On the convergence of the method of analytical centers when applied to con-
vex programs," manuscript, Institut für Angewandte Mathematik und Statistik, Universität
Würzburg, Würzburg, Germany (1987).
9.6 Jarre, F., "The method of analytical centers for smooth convex programs," PhD thesis, Insti-
tut für Angewandte Mathematik und Statistik, Universität Würzburg, Würzburg, Germany
(1987).
9.7 Kapoor, S., and Vaidya, P.M., "Fast algorithms for convex quadratic programming and mul-
ticommodity flows," Proceedings of the 18th Annual Symposium on Theory of Computing,
Berkeley, CA, 147-159 (1986).
9.8 Kozlov, M. K., Tarasov, S. P., and Khachian, L. G., "Polynomial solvability of convex
quadratic programming" (in Russian), Doklady Akademiia Nauk USSR 5, 1051-1053 (1979).
9.9 Mehrotra, S., and Sun, J., "An algorithm for convex quadratic programming that requires
O(n^{3.5} L) arithmetic operations," Mathematics of Operations Research 15, 342-363 (1990).
9.10 Mehrotra, S., and Sun, J., "A method of analytic centers for quadratically constrained convex
quadratic programs," SIAM Journal of Numerical Analysis 28, 529-544 (1991).
9.11 Mehrotra, S., and Sun, J., "An interior point method for solving smooth convex programs
based on Newton's method," Contemporary Mathematics 114, 265-284 (1991).
9.12 Monteiro, R. C., and Adler, I., "Interior path following primal-dual algorithms. Part II: convex
quadratic programming," Mathematical Programming 44, 43-66 (1989).
9.13 Monteiro, R. C., Adler, I., and Resende, M. C., "A polynomial-time primal-dual affine scaling
algorithm for linear and convex quadratic programming and its power series extension,"
Mathematics of Operations Research 15, 191-214 (1990).
9.14 Sonnevand, G., "An analytical center for polyhedrons and new classes of global algorithms
for linear (smooth, convex) programming," in Proceedings of the 12th IFIP Conference on
System Modeling and Optimization, Budapest, Lecture Notes in Control and Information Sciences 84,
Springer-Verlag, New York, 866-876 (1985).
9.15 Wolfe, P., "The simplex method for quadratic programming," Econometrica 27, 382-398
(1959).
9.16 Ye, Y., "An extension of Karmarkar's algorithm and the trust region method for quadratic
programming," in Progress in Mathematical Programming: Interior-Point and Related Meth-
ods, ed. N. Megiddo, Springer-Verlag, New York, 49-64 (1989).
9.17 Ye, Y., and Tse, E., "A polynomial-time algorithm for convex quadratic programming,"
Manuscript, Department of Engineering-Economic Systems, Stanford University, Stanford,
CA (1986).

EXERCISES

9.1. Let L be an n × n symmetric matrix which is positive definite on the null space of matrix A,
where A is an m × n matrix with rank m. Show that the matrix

    [ L    A^T ]
    [ A     0  ]

is nonsingular.
9.2. Let P = {x ∈ R^n : -1 ≤ x_i ≤ 1, i = 1, ..., n} and let f be a quadratic norm-function such that
f(x) = -||x||^2.
(a) Verify that any vertex of P is a local minimum point of f.
(b) Can you use the algorithms developed in this chapter to solve this problem? Why?
9.3. Solve the following quadratic programming problems:
(a) Minimize x^2 - 14x + y^2 - 6y - 5 subject to x + y ≤ 2 and -x - 2y ≥ -3.
(b) Maximize 6xy - 2x^2 - 9y^2 + 18x - 9y + 10 subject to x + 2y ≤ 12 and x, y ≥ 0.
9.4. Show that
(a) X_k^2 (X_k^2 + Q^{-1})^{-1} = (Q + X_k^{-2})^{-1} Q.
(b) I - X_k (X_k^2 + Q^{-1})^{-1} X_k = X_k^{-1} (Q + X_k^{-2})^{-1} X_k^{-1}.
9.5. Derive Equation (9.24) and verify that α_f ≥ 0.
9.6. Consider the following convex programming (primal) problem:

    Minimize f(x)
    subject to g(x) ≤ 0
               h(x) = 0,  x ∈ X

Its Lagrangian dual problem is defined by

    Maximize φ(v, w)
    subject to v ≥ 0 and w unconstrained

where

    φ(v, w) = inf { f(x) + v^T g(x) + w^T h(x) : x ∈ X }

and v and w are the corresponding vectors of Lagrangian multipliers. Use this concept to show
that problems (9.1) and (9.2) are a pair of dual problems. [Hint: For a given w, the function

    (1/2) x^T Q x + c^T x - w^T A x

is convex and attains its minimum at x = v such that Qv + c - A^T w = 0. Also, the minimum
value is -(1/2) v^T Q v.]
9.7. Show that Equation (9.3) consists of the Karush-Kuhn-Tucker conditions for the problem
depicted by Equation (9.1).
9.8. From Equation (9.30), show that φ^k = φ_1^k + φ_2^k, where φ_1^k denotes the first term of
(9.30) and

    φ_2^k = - Σ_{i=1}^{n} log_e (x_i^k)

Prove that φ_1^k is rotation invariant (i.e., you may rotate x^k such that x^k → R x^k, where R is
orthonormal (R R^T = I), without affecting the minimization of φ_1^k) and φ_2^k is scale invariant
(i.e., you may scale x^k without affecting the minimization of φ_2^k).

9.9. For the (ā_n^k, ā_p^k) pair depicted by Equations (9.32a) and (9.32b), show that

    (ā_n^k)^T Q ā_p^k = 0
    (ā_n^k)^T Q ā_n^k = (ā_p^k)^T Q ā_p^k
    (g^k)^T ā_p^k = 0
    (g^k)^T ā_n^k = (ā_n^k)^T Q ā_n^k

(These are often called the boundary conditions for the potential push for QP problems.)
9.10. For the x(t) given by Equation (9.31), show that

    lim_{||Q|| → 0} x(t) = x^k + ρ_k t v^k

such that c^T v^k = 0, Av^k = 0, and ρ_k is a positive constant. (Note: This is the potential
push equation for the LP case.) [Hint: Note that θ in Equation (9.31) can be chosen as the
spectral norm of Q.]
9.11. Show that x(t) is a geodesic in the transformed space (where the constant objective surfaces
are spherical), and hence the locus of x(t) represents a great circle. [Hint: Show that

    [d^2 x'(t)/dt^2]^T [dx'(t)/dt] = 0,  where x'(t) = L^T x(t) and LL^T = Q

which implies zero acceleration in the tangent space. This is the striking property of
geodesics.]
9.12. Assume that the vector y^0 = e. Generate a sequence of vectors {y^k} by using the relationship

    y^{k+1} = Q y^k

Show that the ratio

    λ_k = (y^k)^T y^{k+1} / (y^k)^T y^k

approaches the spectral radius (i.e., the largest eigenvalue) of Q. [Note: λ_k for sufficiently
large values of k can be considered as a practical choice of θ for the potential push in
Equation (9.31).]
9.13. Carry out one more iteration of Example 9.1.
9.14. Carry out one more iteration of Example 9.2.
10

Implementation of Interior-Point Algorithms
In recent years the interior-point algorithms have shown their efficiency in solving large-
scale linear and quadratic programming problems with a wide variety of successful ap-
plications. As a matter of fact, some large-scale problems became solvable owing to the
invention of these techniques. However, it is important to understand that implementa-
tion techniques play a key role in the claimed efficiency of these methods. For example,
we can easily implement the primal affine scaling algorithm for linear programming in
APL language in less than an hour, involving less than twenty lines of coding, but the
performance of such an implementation could be far from satisfactory. Many implemen-
tation issues need to be carefully addressed in order to achieve the expected performance.
Nowadays, with the advent of vector/parallel processing capabilities of modern comput-
ers, implementation skills are much more involved than ever before. This is particularly
true for any serious commercial software package.
In this chapter, we point out the computational bottleneck of interior-point algo-
rithms and focus on some implementation techniques, including the Cholesky factoriza-
tion, conjugate gradient, and LQ factorization methods, to tackle the bottleneck problem.
By no means does this chapter provide a complete treatment; it only touches the tip of
an iceberg.

10.1 THE COMPUTATIONAL BOTTLENECK

So far we have studied Karmarkar's projective scaling algorithm, the primal affine scaling
algorithm, dual affine scaling algorithm, primal-dual algorithm, affine scaling with loga-
rithmic barrier function method, affine scaling with power-series method, and extended
affine scaling algorithms for linearly constrained quadratic and convex programming


problems. For all these interior-point algorithms, as discussed in previous chapters,
most computational time is spent in inverting a fundamental matrix M to find a moving
direction at each iteration: for example, M = AX_k^2 A^T in the primal affine scaling
algorithm, M = AS_k^{-2} A^T in the dual affine scaling algorithm, M = AX_k S_k^{-1} A^T in the
primal-dual algorithm, and M = A(Q + X_k^{-2})^{-1} A^T in the quadratic programming
affine scaling algorithm.
Note that this time-consuming task is equivalent to solving a system of linear
equations

    M u = v        (10.1)

where M is an m × m positive definite symmetric matrix and u, v ∈ R^m are m-dimensional
column vectors. Therefore, it is a crucial challenge to solve system (10.1) in the most
efficient manner. Actually, solving a system of linear equations is not a new problem. Many
books have been written for this purpose. Here we only focus on the three most popular
methods, namely, Cholesky factorization, conjugate gradient, and LQ factorization, and
discuss related implementation issues.

10.2 THE CHOLESKY FACTORIZATION METHOD

The idea of the Cholesky factorization method is quite simple. Instead of solving system
(10.1) directly, since the fundamental matrix M in (10.1) is symmetric and positive
definite, based on Cholesky, we first factorize it as a matrix product of an m × m lower
triangular matrix L and its transpose matrix L^T, i.e., M = LL^T. In this way, (10.1)
becomes

    L L^T u = v        (10.2)

We further define z = L^T u. By solving

    L z = v        (10.3)

for z first and then solving

    L^T u = z        (10.4)

for u, we find a solution to system (10.1) in two stages. Since L is a lower triangular
matrix, we can easily identify z_1 first, then z_2, z_3, ..., z_m by simple arithmetic operations.
Usually this process is called forward solve. Similarly, because L^T is an upper triangular
matrix, we can easily identify u_m first, then u_{m-1}, u_{m-2}, ..., u_1 by simple arithmetic
operations. Therefore, it is often called backward solve.
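With library routines the whole two-stage solve takes only a few lines. The following snippet (ours, using SciPy routines for the factorization and the triangular solves) illustrates the idea on a tiny system:

    import numpy as np
    from scipy.linalg import cholesky, solve_triangular

    # a small symmetric positive definite system M u = v
    M = np.array([[4.0, 1.0],
                  [1.0, 3.0]])
    v = np.array([1.0, 2.0])

    L = cholesky(M, lower=True)                  # M = L L^T
    z = solve_triangular(L, v, lower=True)       # forward solve:  L z = v
    u = solve_triangular(L.T, z, lower=False)    # backward solve: L^T u = z
    assert np.allclose(M @ u, v)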
It is easy to understand the advantage of forward solve and backward solve. But
the key to success is to find the Cholesky factor L in an efficient manner. Before
we introduce potential factorization algorithms, let us study a fundamental theorem of
Cholesky factorization.

Theorem 10.1. If M is an (m × m)-dimensional symmetric positive definite
matrix, then there is a unique lower triangular matrix L with positive diagonal elements
such that M = LL^T.

Proof. We prove this result by the induction method.
The result is obviously true for m = 1. Now, assuming the result is true for
m = n - 1, we have to show it holds true for m = n, where n is a positive integer. Since
M is symmetric and positive definite, we can partition it as

    M = [ d    u^T ]
        [ u    M_1 ]        (10.5)

where d > 0, u is an n - 1 vector, and M_1 is an (n - 1) × (n - 1) submatrix. It can be
further written as

    M = [ √d      0 ] [ 1    0   ] [ √d    u^T/√d ]
        [ u/√d    I ] [ 0   M̄_1  ] [ 0       I    ]        (10.6)

where I is the (n - 1) × (n - 1) identity matrix and M̄_1 = M_1 - uu^T/d.
M̄_1 is clearly symmetric. It is also positive definite, since for any nonzero vector
x ∈ R^{n-1},

    x^T M̄_1 x = [-x^T u/d | x^T] [ d    u^T ] [ -u^T x/d ]
                                  [ u    M_1 ] [    x     ]  = z^T M z > 0

where z = [-x^T u/d | x^T]^T ∈ R^n. Therefore, by our assumption, M̄_1 = M_1 - uu^T/d has
a unique triangular factorization with positive diagonals, say M̄_1 = L_1 L_1^T. Thus M may
be expressed as

    M = [ √d      0 ] [ 1    0  ] [ 1    0    ] [ √d    u^T/√d ]
        [ u/√d    I ] [ 0   L_1 ] [ 0   L_1^T ] [ 0       I    ]

      = [ √d      0  ] [ √d    u^T/√d ]        (10.7)
        [ u/√d   L_1 ] [ 0     L_1^T  ]

Since L_1 is unique, it is clear that

    L = [ √d      0  ]
        [ u/√d   L_1 ]

is also unique, and the proof is complete.

10.2.1 Computing the Cholesky Factor

We now focus on computing techniques for finding the Cholesky factor L of a symmetric
positive definite matrix M. Let mij and lij be the (i, j)th element of matrices M and
L, respectively, for i, j = 1, 2, ... , m. Since M = LLT, by matrix multiplication, we
know that
j

mij = ~likljk (10.8)


k=l

Because L is a lower triangular matrix, we need only consider the elements lij with
i ~ j. In this case, (10.8) implies that, for j = 1,
(10.9a)
256 Implementation of Interior-Point Algorithms Chap. 10

and
i = 2, ... ,m (10.9b)
Moreover, for j = 2, 3, ... , m, we can first compute

j-1 ) 1/2
ljj = (
mjj- :z=zJk (10.10a)
k=l

and then compute

for i = j + 1, j + 2, ... , m (10.10b)

In this scheme, the columns of L are computed one by one, but the part of the matrix
remaining to be factored is not accessed during the scheme. Also because the inner
product of subrows of L is calculated in (10.10), this scheme is called an inner-product
form. The inner-product form certainly is not the only way of computing the Cholesky
factor. As a matter of fact, the proof of Theorem 10.1 itself is a constructive proof.
It suggests a scheme called outer-product form of computing the rows of L one by
one. Details of this new scheme will be provided in the exercises. As to the detailed
implementation, the inner-product form scheme can be easily coded as follows:

Algorithm C-1

    l_11 ← √m_11
    for i = 2 to m
        l_i1 ← m_i1 / l_11
    end
    for j = 2 to m
        for i = j to m
            s ← m_ij
            for k = 1 to j - 1
                s ← s - l_ik l_jk
            end
            if i = j
                l_jj ← √s
            else
                l_ij ← s / l_jj
            end
        end
    end
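A direct transcription of Algorithm C-1 into Python (our code, for illustration only; a serious implementation would exploit sparsity as discussed later in this section) reads:

    import numpy as np

    def cholesky_inner_product(M):
        # inner-product form of the Cholesky factorization (Algorithm C-1)
        m = M.shape[0]
        L = np.zeros_like(M, dtype=float)
        L[0, 0] = np.sqrt(M[0, 0])
        for i in range(1, m):
            L[i, 0] = M[i, 0] / L[0, 0]
        for j in range(1, m):
            for i in range(j, m):
                s = M[i, j] - sum(L[i, k] * L[j, k] for k in range(j))
                L[i, j] = np.sqrt(s) if i == j else s / L[j, j]
        return L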

Note that Algorithm C-1 is based on the fact that the inner products between the subrows
of L and the lower triangular portion of M can be overwritten by the corresponding
elements of L (once L is known, we do not need M any longer). To speed up the
performance, we may consider using a computer with vector processing capability. In
this environment, for i = 1, ..., m and j = 1, ..., i, let m_ij = (m_i1, ..., m_ij)^T be a
column vector and consider the following coding, which overwrites m_ij with l_ij:

Algorithm C-2

    l_11 ← √m_11
    for i = 2 to m
        l_i1 ← m_i1 / l_11
    end
    for j = 2 to m
        for i = j to m
            s ← m_ij - m_i(j-1)^T m_j(j-1)
            if i = j
                m_jj ← √s
            else
                m_ij ← s / m_jj
            end
        end
    end

The difference between Algorithm C-1 and Algorithm C-2 may look subtle, but it clearly
illustrates that detailed implementation at the coding level can gain speed and reduce
the memory requirement. Another interesting issue to note here is that, in Algorithms C-1
and C-2, the operations describing the vector inner products are on the subrows of a ma-
trix, which may perform less efficiently if the matrix elements are stored columnwise (as
in the case of FORTRAN). Therefore, if we choose to implement the algorithm in FOR-
TRAN, the code has to be reorganized to allow the operations to access contiguous
memory locations and thereby cut down memory access time. As to C programming,
since the elements are stored rowwise, Algorithms C-1 and C-2 can be implemented
without any degradation in performance due to memory access.
Newer FORTRAN and C compilers also allow so-called recursive functions
and subroutines. This is an interesting feature, whereby a function or subroutine can call
itself. This feature can be effectively used in Cholesky factorization. The way recursion
can be invoked varies from compiler to compiler for particular applications, and hence
is beyond our scope. The important message to a serious program developer is to study
the compiler before implementing any interior-point algorithm.
Another important aspect one should not leave out in this discussion is block
Cholesky factorization.

10.2.2 Block Cholesky Factorization

Since matrix operations can be highly parallelizable (several row or column
operations can be done simultaneously), when we are dealing with large-scale problems
with a special block structure in the constraint matrix, we should further study the Cholesky
factorization algorithm in block form.

Consider an (m × m)-dimensional symmetric positive definite matrix M partitioned
into p^2 subblocks:

    M = [ M_11  ...  M_1p ]
        [  .    ...   .   ]
        [ M_p1  ...  M_pp ]

such that m = pr, where r is referred to as the block size. The Cholesky factor of M
can be partitioned accordingly as

    L = [ L_11            ]
        [  .     .        ]
        [ L_p1  ...  L_pp ]

By directly equating LL^T = M with the block structure, we see that

    L_11 L_11^T = M_11        (10.11a)
    L_i1 L_11^T = M_i1,  for i = 2, ..., p        (10.11b)

Moreover, by matrix multiplication, for p ≥ i ≥ j ≥ 2,

    M_ij = Σ_{k=1}^{j} L_ik L_jk^T

and hence

    L_ij L_jj^T = M_ij - Σ_{k=1}^{j-1} L_ik L_jk^T        (10.12)

If we denote

    S_ij = M_ij - Σ_{k=1}^{j-1} L_ik L_jk^T

then, for p ≥ i ≥ j ≥ 2, L_jj is the Cholesky factor of S_jj, and L_ij is the solution of the
matrix equation Z L_jj^T = S_ij. Hence a block Cholesky factorization scheme is obtained:

Algorithm C-3

    compute the Cholesky factor L_11 of M_11
    for i = 2 to p
        solve Z L_11^T = M_i1 for L_i1
    end
    for j = 2 to p
        for i = j to p
            S ← M_ij
            for k = 1 to j - 1
                S ← S - L_ik L_jk^T
            end
            if i = j
                compute the Cholesky factor L_jj of S
            else
                solve Z L_jj^T = S for L_ij
            end
        end
    end

Note that Algorithm C-3 may use Algorithm C-1 or C-2 to find the Cholesky factors
of the block submatrices. Also note that since L_jj^T is upper triangular, solving Z L_jj^T = S
is relatively simple. Recursive subroutines can be used here quite efficiently, if
the compiler supports this feature. One key factor that affects the performance of block
Cholesky factorization is the choice of the block size r, which often needs careful thinking
and experimentation. The development of block Cholesky factorization algorithms and
their implementations, especially on vector/parallel processors, is an active research area.
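The following sketch (ours, in Python with NumPy and SciPy) implements Algorithm C-3 for the simple case m = pr, delegating the diagonal blocks to a library Cholesky routine and the off-diagonal blocks to triangular solves:

    import numpy as np
    from scipy.linalg import cholesky, solve_triangular

    def block_cholesky(M, r):
        # block Cholesky factorization (Algorithm C-3); assumes m = p * r
        m = M.shape[0]
        p = m // r
        L = np.zeros_like(M, dtype=float)
        blk = lambda X, i, j: X[i*r:(i+1)*r, j*r:(j+1)*r]   # (i, j)th block
        for j in range(p):
            for i in range(j, p):
                S = blk(M, i, j) - sum(blk(L, i, k) @ blk(L, j, k).T
                                       for k in range(j))
                if i == j:
                    # L_jj is the Cholesky factor of S
                    L[i*r:(i+1)*r, j*r:(j+1)*r] = cholesky(S, lower=True)
                else:
                    # solve Z L_jj^T = S for L_ij, i.e., L_jj Z^T = S^T
                    Z = solve_triangular(blk(L, j, j), S.T, lower=True).T
                    L[i*r:(i+1)*r, j*r:(j+1)*r] = Z
        return L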

10.2.3 Sparse Cholesky Factorization

For large-scale problems, it is quite possible that most elements of matrix M have zero
value. The sparsity is measured by the ratio between the number of nonzero elements
and the total number of elements in a matrix. When the sparsity ratio is relatively low,
say 0.01 or even smaller, we say the matrix is a sparse matrix. Otherwise, we have a
dense matrix. However, there is no clear-cut threshold sparsity ratio.
When sparse matrices are involved, it is no longer necessary to keep track of their
every element. Most attention needs to be focused on the "position and value" of nonzero
elements only. The techniques which help us manipulate the sparse matrix operations
are often called the sparse matrix techniques. Many books have been written on this
topic.
As to applying Cholesky factorization methods to a symmetric positive definite
sparse matrix M, the key concern is to prevent the Cholesky factor L from being dense.
The following example shows that a relatively sparse matrix M could have a relatively
dense Cholesky factor L.
Example 10.1
(from A. George and J. W. Liu): Let

    M = [ 4      1      2      0.5      2  ]
        [ 1      0.5    0      0        0  ]
        [ 2      0      3      0        0  ]
        [ 0.5    0      0      0.625    0  ]
        [ 2      0      0      0       16  ]

Applying Algorithm C-1, we have

    L = [ 2                                  ]
        [ 0.5     0.5                        ]
        [ 1      -1       1                  ]
        [ 0.25   -0.25   -0.5    0.5         ]
        [ 1      -1      -2     -3       1   ]

Observe that, although M is relatively sparse, the corresponding Cholesky factor L
is really dense. This phenomenon of increasing the number of nonzero elements is called
fill-in, which is a direct consequence of the structure of M. Fill-in no doubt increases
the computational burden, since more nonzero elements need to be taken care of. In
addition, the storage requirements obviously increase with the fill-in phenomenon.
A commonly adopted technique to reduce fill-ins is to permute the rows and
columns of M such that it attains a structure which in turn makes the Cholesky fac-
tor sparse. To be more specific, we choose P to be an appropriate permutation matrix
which permutes the rows of M into a desirable structure. Then, instead of handling the
system (10.1), we consider an equivalent system

    (P M P^T)(P u) = P v        (10.13)

Note that for any permutation matrix P, the matrix PMP^T is still symmetric and positive
definite. It is also interesting to note that, in PMP^T, P permutes the rows of M and P^T
permutes the columns of M.
Example 10.2
For Example 10.1, if we choose a permutation matrix

    P = [ 0  0  0  0  1 ]
        [ 0  0  0  1  0 ]
        [ 0  0  1  0  0 ]
        [ 0  1  0  0  0 ]
        [ 1  0  0  0  0 ]

then the rows and columns of M are permuted as

    P M P^T = [ 16     0       0      0      2   ]
              [ 0      0.625   0      0      0.5 ]
              [ 0      0       3      0      2   ]
              [ 0      0       0      0.5    1   ]
              [ 2      0.5     2      1      4   ]

In this way, the Cholesky factor of PMP^T becomes

    L = [ 4                                      ]
        [ 0      0.791                           ]
        [ 0      0       1.73                    ]
        [ 0      0       0      0.707            ]
        [ 0.5    0.632   1.15   1.41    0.129    ]

Compared to the Cholesky factor in Example 10.1, the new Cholesky factor is relatively
sparse. Consequently, solving system (10.13) is more efficient than solving system (10.1).
With the abovementioned concept, we understand that the key issue in sparse Cholesky
factorization is to find an appropriate permutation P for a given symmetric positive def-
inite matrix M such that the number of fill-ins is minimized. Unfortunately, minimizing
fill-ins is not a simple problem in general. So far, only heuristics have been proposed
by various researchers to provide acceptable, but not necessarily optimal, results.
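The effect of a good permutation is easy to check numerically. The snippet below (ours) rebuilds the matrix of Example 10.1, applies the reversal permutation of Example 10.2, and counts the nonzeros of the two Cholesky factors (15 versus 9):

    import numpy as np

    M = np.array([[4.0, 1.0, 2.0, 0.5,   2.0],
                  [1.0, 0.5, 0.0, 0.0,   0.0],
                  [2.0, 0.0, 3.0, 0.0,   0.0],
                  [0.5, 0.0, 0.0, 0.625, 0.0],
                  [2.0, 0.0, 0.0, 0.0,  16.0]])
    P = np.eye(5)[::-1]                       # reversal permutation of Example 10.2

    L_orig = np.linalg.cholesky(M)            # dense factor: 15 nonzeros
    L_perm = np.linalg.cholesky(P @ M @ P.T)  # sparse factor: 9 nonzeros
    print(np.count_nonzero(L_orig), np.count_nonzero(L_perm))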

The most popular fill-in reduction scheme is the so-called minimum degree reorder-
ing algorithm. Here we briefly outline the algorithm, leaving the reader to find detailed
explanations and theoretical insights in other books.
The algorithm may be best understood by graphical illustrations. First of all, let
us establish a relationship between graphs and matrices. For an (m × m)-dimensional
matrix M, we define an ordered graph of M, denoted by G^M. In G^M, there are
m nodes, and a node i is connected to node j (where i ≠ j) by a link if the (i, j)th
element of M is not zero, i.e., m_ij ≠ 0. Figure 10.1 illustrates this situation with the
help of an example, where the off-diagonal nonzeros of M are depicted by asterisks.

1
* *
2
*
* 3 * *
* 4 * *
5 *
* * * 6
7
* * *
M

Figure 10.1

Two nodes i and j are adjacent if they are connected by a link. The degree of
a node i, denoted by Deg(i), is defined to be the number of adjacent nodes of i, in
other words, the number of links connected to node i. For example, in Figure 10.1,
Deg(1) = 2, Deg(2) = 1, and Deg(7) = 3.
The idea of the minimum degree reordering algorithm is to eliminate a node with
the minimum degree from the graph, one at a time, until every node is eliminated. Once
a node is eliminated, the degrees of the nodes in the remaining graph may change and hence
need to be updated. The node elimination sequence eventually suggests a candidate
for the desired permutation matrix P.
We now describe the minimum degree reordering algorithm in terms of the graph
elimination model, where an elimination graph is defined as a graph which is subjected
to the elimination of selected nodes. Here we eliminate the nodes of G^M one by one
in a systematic order. At each step the resulting graph is labeled as G_k^M, where the
subscript k denotes the step number.

MINIMUM DEGREE REORDERING ALGORITHM

Step 1 (initialization): Set G_0^M ← G^M and k = 1.
Step 2 (minimum degree selection): In the elimination graph G_{k-1}^M, choose a
node i of minimum degree.
Step 3 (graph elimination): Form an elimination graph G_k^M by eliminating the
node i from G_{k-1}^M.
Step 4 (loop or stop): Set k ← k + 1. If k > m (the number of nodes of G^M),
then stop. Otherwise go to Step 2.

At the end, a permuted graph is obtained by swapping the step number k and the node
number i. Moreover, a permutation matrix P is generated by assigning p_ki = 1, for
k = 1, ..., m (where i is the node eliminated at step k), and all other elements being zero.
Notice that more than one node can assume the minimum degree in Step 2. Differ-
ent heuristics of node selection give different versions of the minimum degree reordering
algorithm. In the simple case without any further information, we may break ties arbi-
trarily.
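As a concrete illustration, the following Python sketch (ours) implements the reordering with an arbitrary tie-breaker. It models Step 3 by the standard elimination-graph rule that the neighbors of an eliminated node become pairwise adjacent, which is how the fill produced by that step is accounted for:

    import numpy as np

    def minimum_degree_order(M):
        m = M.shape[0]
        # adjacency structure of the ordered graph G^M
        adj = {i: {j for j in range(m) if j != i and M[i, j] != 0}
               for i in range(m)}
        order = []
        while adj:
            i = min(adj, key=lambda node: len(adj[node]))  # minimum degree node
            neighbors = adj.pop(i)                         # eliminate node i
            for a in neighbors:
                adj[a].discard(i)
                adj[a] |= neighbors - {a}   # neighbors of i become a clique
            order.append(i)
        P = np.zeros((m, m))
        P[range(m), order] = 1.0            # p_{k,i} = 1 for the kth eliminated node i
        return order, P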
The following example illustrates the minimum degree algorithm applied to the
example of Figure 10.1 with an arbitrary tie-breaker.
Example 10.3

    k = 1, node selected = 2, min. degree = 1
    k = 2, node selected = 5, min. degree = 1
    k = 3, node selected = 1, min. degree = 2
    k = 4, node selected = 7, min. degree = 2
    k = 5, node selected = 6, min. degree = 2
    k = 6, node selected = 3, min. degree = 1

Finally,

    k = 7, node selected = 4, min. degree = 0

Swapping the step numbers and the node numbers gives the permuted graph:

    node 2 → 1
    node 5 → 2
    node 1 → 3
    node 7 → 4
    node 6 → 5
    node 3 → 6
    node 4 → 7

The resulting permutation matrix becomes

    P = [ 0  1  0  0  0  0  0 ]
        [ 0  0  0  0  1  0  0 ]
        [ 1  0  0  0  0  0  0 ]
        [ 0  0  0  0  0  0  1 ]
        [ 0  0  0  0  0  1  0 ]
        [ 0  0  1  0  0  0  0 ]
        [ 0  0  0  1  0  0  0 ]

10.2.4 Symbolic Cholesky Factorization

When the Cholesky factorization method is applied to find moving directions at each
iteration of the previously mentioned interior-point algorithms, we need to factorize
M = AD_kA^T repeatedly, where D_k is a diagonal matrix with positive diagonal elements
for each k. It would be awfully tedious if we had to permute every AD_kA^T in order to
reduce fill-ins.
Fortunately, closer observation indicates that although AD_kA^T changes along with
the value of D_k at each iteration, the positions of the nonzero elements remain intact. This
means the sparsity structure is preserved as in AA^T. Therefore, in the implementation
of an interior-point algorithm for large-scale problems, it is advantageous to perform a
symbolic factorization first. In this phase, we focus on AA^T to analyze the positions in
which the nonzero entries of the computational result would occur. The minimum degree
reordering algorithm could be applied to reduce the fill-ins. Once this work is done, we
record the positions of the nonzero elements as a template. Then, at each iteration, since
the positions of the nonzero elements are known, we need only find the numerical value of
each nonzero element. Correspondingly, we may call this a numerical factorization phase.
Figure 10.2 illustrates this two-phase procedure using block diagrams.

10.2.5 Solving Triangular Systems

Once the Cholesky factor of matrix M is computed, solving system (10.1) is equivalent
to solving the triangular systems (10.3) and (10.4).

Figure 10.2 The two-phase factorization procedure: minimum degree reordering and symbolic Cholesky factorization are done only once; numeric Cholesky factorization is done at each iteration

Forward solve. Since L is a lower triangular matrix, system (10.3) can be
solved by getting z_1 from the first equation, z_2 from the second, ..., and z_m from the
last. Therefore we call it a forward solve procedure. More specifically, we have

    z_1 = v_1 / l_11        (10.14a)

and

    z_i = ( v_i - Σ_{k=1}^{i-1} l_ik z_k ) / l_ii,  for i = 2, ..., m        (10.14b)

It is easy to code as follows:

Algorithm F-1

    z_1 = v_1 / l_11
    for i = 2 to m
        s = 0
        for k = 1 to i - 1
            s ← s + l_ik z_k
        end
        z_i = (v_i - s) / l_ii
    end

Similar to Algorithm C-1, since we access matrix L row by row, and inner products
of row vectors are involved in (10.14b), Algorithm F-1 can be modified for vector
processing. This scheme is certainly more appropriate if matrix L is stored rowwise
(as in C programming). We leave this to the reader.
If matrix L is stored column by column and the sparsity of the solution vector is being
considered, the following coding scheme is more efficient:

Algorithm F-2

    for i = 1 to m
        z_i = v_i / l_ii
        for k = (i + 1) to m
            v_k ← v_k - z_i l_ki
        end
    end

The reader is asked to verify that Algorithm F-2 solves system (10.3) and accesses
the matrix L column by column. Note that if vi turns out to be zero at the beginning
of the ith step, then Zi must be zero and the inner loop can be completely skipped.
Hence the sparsity issue is exploited. When a columnwise storage scheme (for example,
FORTRAN) is used, Algorithm F-2 is more efficient.

Backward solve. On the other hand, since L^T is an upper triangular matrix, system
(10.4) can be solved by getting u_m from the last equation, u_{m-1} from the second last,
..., and u_1 from the first. This forms a backward solve procedure. To be more specific,
we have

    u_m = z_m / l_mm        (10.15a)

and

    u_{m-i} = ( z_{m-i} - Σ_{k=m-i+1}^{m} l_{k(m-i)} u_k ) / l_{(m-i)(m-i)},  for i = 1, ..., m - 1        (10.15b)

It is easy to code as follows:

Algorithm B-1

    u_m = z_m / l_mm
    for i = 1 to m - 1
        s = 0
        for k = m - i + 1 to m
            s ← s + l_{k(m-i)} u_k
        end
        u_{m-i} = (z_{m-i} - s) / l_{(m-i)(m-i)}
    end

Similar to Algorithm F-1, other coding schemes are available for further consideration.
Algorithm B-1 is only one of the simple implementations.
The forward solve and backward solve, together with Cholesky factorization, have
become the most popular method used by many interior-point algorithms for solving
system (10.1). Other methods, including the conjugate gradient method and the LQ
factorization method, will be introduced in subsequent sections.

10.3 THE CONJUGATE GRADIENT METHOD

In addition to the Cholesky factorization method, the conjugate gradient method can also
be applied to solve system (10.1) with a symmetric positive definite matrix M. The
method was originally suggested by M. R. Hestenes and E. Stiefel in 1952. Like the
steepest descent method, it is classified as an error correction method, which means
that the algorithm starts with an approximate solution (say u^k), evaluates an error
function, and then iterates along a direction (say d^k) with an appropriate step-length to
reduce the error. Instead of moving directly along the negative gradient direction for
a maximum reduction of the error function, the moving directions are required to be
mutually conjugate with respect to the matrix M, i.e.,

    (d^k)^T M d^j = 0,  for k ≠ j        (10.16)

Therefore, this method carries the name "conjugate gradient."


For an (m x m)-dimensional matrix M, the conjugate gradient theory guarantees
that the mth iterate um is an exact solution of Mu = v. Of course, it is quite possible
that the method produces a sufficiently accurate solution prior to reaching um. Moreover,
as shown later, the most complicated arithmetic required by the method is merely the
matrix-to-vector multiplications (in computing Muk). Hence the method is well suited
for solving large sparse systems.
Now let us introduce the basic ideas of the conjugate gradient method. Suppose
that uk is a current approximate of the system Mu = v. We define

(10.17)

to be a corresponding error vector (or residual vector), and

(10.18)

to be an error function at uk. Remembering that M is symmetric, by combining (10.17)


and (10.18) we obtain that

(10.19)

Assume that the next iterate uk+ 1 is determined by

(10.20)

where ak is an appropriate step-length and dk is an appropriate direction of translation


such that the value of the error function is reduced by this translation. Figure 10.3
illustrates a two-dimensional example, in which uk+ 1 is on the straight line passing uk.
The slope of the line is determined by dk and the step-size ak is proportional to the
distance between uk and uk+ 1. Our objective here is to find dk and ak. Suppose that
dk is known. In order to achieve a maximum reduction in hk+ 1 , we try to find a local
minimum of hk+1 along dk by plugging (10.20) into (10.19) and setting the derivative
of hk+ 1 with respect to ak to be zero. This yields the result

(dk)T ric (dk)T ric


(10.21)
(dk)T Mdk = (dkf pk'
Figure 10.3 Constant cost surfaces of h^k in two dimensions, with the steepest descent direction at u^k and the actual solution u at the global minimum of h^k

Note that the steepest descent method suggests that we consider using the negative
gradient of h^k with respect to u^k as d^k. In this case, we have

    -dh^k/du^k = -2 (M u^k - v) = 2 r^k        (10.22)

which means the negative gradient of h^k is proportional to the residual r^k. Therefore,
the residual vector can be used in place of d^k. With this in mind, the conjugate gradient
method intends to stay close to r^k while satisfying the conjugacy requirements (10.16).
Therefore, when u^0 is arbitrarily chosen, we can take d^0 = r^0. After that, we may
consider taking d^k as the component of r^k orthogonal to M d^{k-1}, for k ≥ 1. In this way,
we define

    d^k = r^k - β_k p^{k-1}        (10.23)

where β_k is a scalar to be fixed by the conjugacy condition (M d^{k-1})^T d^k = 0. Conse-
quently, we know that

    β_k = (p^{k-1})^T r^k / (p^{k-1})^T p^{k-1},  where p^{k-1} = M d^{k-1}        (10.24)

Also, it is interesting to note that

    r^{k+1} = v - M u^{k+1} = v - M [u^k + α_k d^k]
            = (v - M u^k) - α_k M d^k = r^k - α_k p^k        (10.25)
Based on this idea, the conjugate gradient algorithm can be stated as follows:



Algorithm CG: Set u^0 to be arbitrary, k = 0, and ε > 0 sufficiently small.
Compute d^0 = r^0 = v - Mu^0. Repeat:

    p^k = M d^k
    α_k = [(d^k)^T r^k] / [(d^k)^T p^k]
    u^{k+1} = u^k + α_k d^k
    r^{k+1} = r^k - α_k p^k
    β_{k+1} = [(p^k)^T r^{k+1}] / [(p^k)^T p^k]
    d^{k+1} = r^{k+1} - β_{k+1} p^k
    k ← k + 1

until ||r^{k+1}|| ≤ ε; output u^{k+1} as the solution.
As mentioned earlier, owing to the orthogonality relationships, the conjugate gradient
method in theory generates an exact solution of the system (10.1) in at most m itera-
tions. Therefore, strictly speaking, this method is a finite algorithm. However, owing
to numerical round-off and/or truncation errors, it is quite possible for the algorithm to
take more than m steps to reach a satisfying result. On the other hand, it is also possible
for the algorithm to terminate in fewer than m steps with acceptable accuracy.
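A direct transcription of Algorithm CG into Python (ours; the iteration cap above m reflects the round-off issue just mentioned) is given below:

    import numpy as np

    def conjugate_gradient(M, v, u0=None, eps=1e-10, max_iter=None):
        m = len(v)
        u = np.zeros(m) if u0 is None else u0.copy()
        r = v - M @ u                     # initial residual
        d = r.copy()                      # d^0 = r^0
        for _ in range(max_iter or 2 * m):
            p = M @ d
            a = (d @ r) / (d @ p)         # step-length alpha_k, Eq. (10.21)
            u = u + a * d
            r_new = r - a * p             # residual update, Eq. (10.25)
            if np.linalg.norm(r_new) <= eps:
                break
            beta = (p @ r_new) / (p @ p)  # conjugacy coefficient beta_{k+1}
            d = r_new - beta * p          # next direction, as in Algorithm CG
            r = r_new
        return u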
In connection with our application to the interior-point algorithms, as a primal
algorithm converges, the dual estimate varies little from one iteration to another. There-
fore, we can set u^0 to be the dual estimate of the previous iteration and expect Algo-
rithm CG to terminate quickly. For example, in the primal affine scaling algorithm, we
set M = AX_k^2 A^T and v = AX_k^2 c; then, solving Mu = v provides the dual estimate w^k at
the kth iteration.
Also note that the most intensive computation at each iteration of the conjugate
gradient algorithm is the multiplication of the matrix M by a vector d^k. The sparse matrix
techniques can be implemented effectively to perform this operation. In general, if M
has, on the average, γ nonzero elements per row, then each iteration of Algorithm CG
requires on the order of (γ + 5)m multiplications. Hence, an exact solution can be
generated in on the order of (γ + 5)m^2 multiplications. If we start with a good estimate, a
satisfactory solution is expected to be obtained in k < m iterations; then the algorithm
requires only on the order of (γ + 5)km multiplications.
Several ad-hoc enhancements have been suggested by various researchers for the
conjugate gradient method. A detailed discussion of these techniques is beyond the scope
of this book.

10.4 THE LQ FACTORIZATION METHOD

The idea of LQ factorization is to utilize the power-series expansion in matrix form.
Given that the symmetric positive definite matrix M in system (10.1) has the form

    M = I - B        (10.26)

where B is symmetric positive definite with all eigenvalues being positive and less than
one, then the series

    Σ_{k=0}^{∞} B^k = I + B + B^2 + ...        (10.27)

is convergent and

    M^{-1} = (I - B)^{-1} = Σ_{k=0}^{∞} B^k        (10.28)

In this way, a solution of system (10.1) is provided by

    u = M^{-1} v = ( Σ_{k=0}^{∞} B^k ) v        (10.29)

Therefore the required matrix inversion can be replaced by matrix multiplications and
additions.
However, for a general linear programming problem, the fundamental matrix M =
AD_k^2 A^T does not necessarily fit this scheme, unless the constraint matrix A is properly
manipulated. In this section we introduce the LQ factorization method to achieve this
purpose. For simplicity, we focus on the implementation of the primal affine scaling
algorithm and leave other algorithms to the reader.
To begin with, we define an (m × n)-dimensional matrix Q to be orthonormal if
QQ^T = I. Moreover, the norm of matrix Q is defined by

    ||Q|| = max_{||x||=1} ||Qx||

Similarly,

    ||Q^T|| = max_{||y||=1} ||Q^T y||

For an orthonormal matrix Q with full row-rank m < n, we have the following special
property of its matrix norm.

Theorem 10.2. Let Q be an (m × n)-dimensional orthonormal matrix with full
row-rank m < n; then ||Q|| ≤ 1 and ||Q^T|| = 1.

Proof. Since Q is orthonormal with row-rank m < n, by the Gram-Schmidt or-
thogonalization process, there exists an [(n - m) × n]-dimensional matrix R such that

    Q̄ = [ Q ]
        [ R ]

becomes an (n × n)-dimensional orthonormal matrix. It can be readily shown that
Q̄^{-1} = Q̄^T and ||Q̄x|| = ||x|| for all x, measured in the 2-norm. Hence, we have

    1 = ||Q̄|| = max_{||x||=1} ||Q̄x|| ≥ max_{||x||=1} ||Qx|| = ||Q||

This completes the first part of the proof. Now, for the second part,

    ||Q^T|| = max_{||y||=1} ||Q^T y|| = max_{||y||=1} || Q̄^T [y; 0] || = max_{||y||=1} || [y; 0] || = 1

and we are done.

The LQ factorization method is based upon the following fundamental theorem.

Theorem 10.3 (fundamental theorem for LQ factorization). Let A be an
(m × n)-dimensional matrix with full row-rank m < n; then there exist an (m × m)-
dimensional lower triangular matrix L and an (m × n)-dimensional orthonormal matrix
Q such that A = LQ.

Proof. Since A has full row-rank with m < n, by Theorem 10.1, we know that
AA^T = LL^T for a lower triangular matrix L with positive diagonal elements. Hence
L^{-1} exists and Q = L^{-1}A is well defined. Moreover,

    Q Q^T = L^{-1} A A^T (L^{-1})^T = L^{-1} L L^T (L^T)^{-1} = I

Hence we know that Q is orthonormal and A = LQ.

Note that for a linear programming problem in its standard form, i.e.,

    Minimize c^T x
    subject to Ax = b,  x ≥ 0

if the constraint matrix A is (m × n)-dimensional with full row-rank m < n, then A can
be factorized as the product of L and Q according to Theorem 10.3. In this case we
have Q = L^{-1}A, where L is obtained by Cholesky factorization (say, Algorithm C-1).
However, computing Q does not require L^{-1} explicitly. If we let Q_j and A_j be
the jth columns of Q and A, respectively, for j = 1, ..., n, then Q_j can be obtained
by applying a "forward solve" (say, Algorithm F-1) to the system LQ_j = A_j, for j =
1, ..., n. In summary, we have the following algorithm for LQ factorization:

Algorithm Q

Step 1: Compute the Cholesky factor L for AA^T.
Step 2: For j = 1 to n, use forward solve to find Q_j from the system

    L Q_j = A_j

End.

Once the LQ factorization is done, the original linear programming problem can be
expressed as

    Minimize c^T x        (10.30a)
    subject to Qx = b',  x ≥ 0        (10.30b)

where b' = L^{-1}b (which can be obtained by applying a forward solve to the system
Lb' = b).
As mentioned earlier, we shall focus on the primal affine scaling algorithm and
leave similar development of other interior-point algorithms to the reader. Remember
that in each iteration of the primal affine scaling algorithm, most computational time is
spent on finding the dual estimate w^k for the moving direction d^k. To be more precise,
for the problem (10.30), we need to compute

    w^k = (Q X_k^2 Q^T)^{-1} Q X_k^2 c        (10.31)

where X_k is a diagonal matrix formed by the current primal solution x^k at the kth iteration.
Or, equivalently, w^k is a solution vector to the system

    M u = v        (10.32)

where M = (Q X_k^2 Q^T) and v = Q X_k^2 c.
Now, since Q is an orthonormal matrix, we would like to show that (Q X_k^2 Q^T)^{-1}
can be represented as a convergent power series in matrix form. Then we can compute
w^k according to the basic idea mentioned at the beginning of this section. To achieve
this objective, let us focus on the kth solution x^k > 0 and choose

    λ_max = max_i (x_i^k)^2  and  λ_min = min_i (x_i^k)^2        (10.33)

In this way, λ_max, λ_min > 0. Moreover, for any α > λ_max, we have

    Q X_k^2 Q^T = α (I - Q X̄ Q^T)        (10.34)

where X̄ is a diagonal matrix with its ith diagonal element being

    x̄_i = (α - (x_i^k)^2) / α < 1,  for i = 1, ..., n

We now denote the matrix Q X̄ Q^T by B. It is clear that B is symmetric and positive
definite with all eigenvalues being positive. Moreover, if κ_max is the largest eigenvalue
of B (often called the spectral radius of B and denoted by ρ(B)), we see that

    κ_max = ||B|| = ||Q X̄ Q^T|| ≤ ||Q|| ||X̄|| ||Q^T|| ≤ (α - λ_min)/α < 1        (10.35)

This further implies that each eigenvalue of B is less than 1. Therefore, the power series
(10.27) is convergent and

    (Q X_k^2 Q^T)^{-1} = (1/α) (I - B)^{-1} = (1/α) Σ_{k=0}^{∞} B^k        (10.36)

Observe that the matrix power series (10.27) can be approximated by a matrix polynomial
P_r(B) of degree r > 0, i.e.,

    (I - B)^{-1} = P_r(B) + E_r(B)        (10.37)

where

    P_r(B) = I + B + B^2 + ... + B^r        (10.38)

and

    E_r(B) = Σ_{k=r+1}^{∞} B^k        (10.39)

By (10.37), we have

    ||E_r(B)|| = ||(I - B)^{-1} - P_r(B)|| ≤ Σ_{k=r+1}^{∞} ||B||^k = ||B||^{r+1} / (1 - ||B||)        (10.40)

Since ||B|| = ρ(B), (10.40) implies that

    ||E_r(B)|| ≤ [ρ(B)]^{r+1} / [1 - ρ(B)]        (10.41)

Consequently, for an arbitrarily small tolerance level ε > 0, in order to obtain a good
matrix approximation with ||E_r(B)|| ≤ ε, we simply have to choose an r such that

    r ≥ ⌈ log([1 - ρ(B)] ε) / log[ρ(B)] - 1 ⌉        (10.42)

i.e., the smallest integer larger than

    log([1 - ρ(B)] ε) / log[ρ(B)] - 1

Observe that, from Equation (10.42), the value of r increases as ε gets smaller or
ρ(B) approaches unity. This factor directly affects the computational effort.
Summarizing what we have discussed, we can devise an algorithm, based on the
LQ factorization, to solve the computational bottleneck (10.32) at the kth iteration of
the primal affine scaling algorithm.

Algorithm LQ-1

Step 1: Choose ε > 0 to be sufficiently small. Set

λ_min = min_i (x_i^k)^2 and choose α > λ_max = max_i (x_i^k)^2

Set

ρ(B) = 1 - λ_min/α and r = ⌈ log([1 - ρ(B)]ε) / log[ρ(B)] - 1 ⌉

Step 2: Set s^(0) = w^(0) = Q X_k^2 c and j = 1. Set x̄_i = (α - (x_i^k)^2)/α, for
i = 1, ..., n. Compute B = Q X̄ Q^T, where X̄ is a diagonal matrix with x̄_i being
its ith diagonal element.

Step 3: Repeat
s^(j) ← B s^(j-1)
w^(j) ← w^(j-1) + s^(j)
j ← j + 1
until j = r + 1.

Step 4: Set w^k ← (1/α) w^(r).
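A direct NumPy transcription of Algorithm LQ-1 might look as follows (a sketch under our own naming conventions; the dense matrix B formed here is eliminated in the sparse variant discussed next):

import numpy as np

def lq1_dual_estimate(Q, xk, c, eps=1e-8):
    # Approximate w^k = (Q Xk^2 Q^T)^{-1} Q Xk^2 c by the truncated power series.
    lam = xk ** 2                          # (x_i^k)^2 for i = 1, ..., n
    alpha = 1.01 * lam.max()               # any alpha > lambda_max will do
    rho = 1.0 - lam.min() / alpha          # the bound on rho(B) from Step 1
    r = int(np.ceil(np.log((1.0 - rho) * eps) / np.log(rho) - 1.0))
    B = Q @ np.diag((alpha - lam) / alpha) @ Q.T   # B = Q Xbar Q^T
    s = w = Q @ (lam * c)                  # s^(0) = w^(0) = Q Xk^2 c
    for _ in range(r):                     # Step 3: accumulate r more terms
        s = B @ s
        w = w + s
    return w / alpha                       # Step 4: w^k = (1/alpha) w^(r)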

One potential drawback of applying Algorithm LQ-1 to solve large-scale problems is
due to sparsity considerations. For a given sparse matrix A, after factorization, Q may
become very dense and difficult to manipulate. Fortunately, this potential problem can
be overcome. The idea is to express Q X̄ Q^T as L^{-1} A X̄ A^T (L^T)^{-1}. Remember that the
sparsity issue of the Cholesky factor L has been handled in a previous section. Also note
that premultiplying by L^{-1} is equivalent to applying a "forward solve" and
postmultiplying by (L^T)^{-1} is equivalent to a "backward solve." Therefore, we can replace
Step 3 in Algorithm LQ-1 by the following procedure:

Repeat:
Use "backward solve" to compute u^(j-1) for the system of equations
L^T u^(j-1) = s^(j-1)
Use "forward solve" to compute s^(j) for the system of equations
L s^(j) = A X̄ A^T u^(j-1)
w^(j) ← w^(j-1) + s^(j)
j ← j + 1
until j = r + 1.

With this modification, since A and L are sparse, the sparsity issue of Q is bypassed.
Any sparse matrix multiplication technique can be used here. Also note that, in Step 2,
we no longer have to compute B, and s^(0) = w^(0) = L^{-1} A X_k^2 c.
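In code, one application of B in this factored form might look as follows (a sketch of ours; A may be kept in any sparse format, and scipy.linalg.solve_triangular stands in for the book's forward and backward solves):

import numpy as np
from scipy.linalg import solve_triangular

def apply_B(L, A, xbar, s):
    # Compute B s = L^{-1} A Xbar A^T (L^T)^{-1} s without forming Q or B.
    u = solve_triangular(L, s, lower=True, trans='T')   # backward solve: L^T u = s
    t = A @ (xbar * (A.T @ u))                          # sparse products with A
    return solve_triangular(L, t, lower=True)           # forward solve: L s' = t

Calling apply_B in place of the dense product B s^(j-1) in Step 3 reproduces Algorithm LQ-1 without ever storing the dense Q.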
Another potential drawback of Algorithm LQ-1 is the large value of r required by the
algorithm. In theory, we know that as λ_min/λ_max → 0 or, equivalently, as ρ(B) → 1,
the value of r approaches infinity. This means that more and more iterations of Step 3
are needed, which results in inefficient computation. This inevitably happens in
interior-point methods, because when we approach an optimal vertex, even from the interior,
λ_min/λ_max still approaches 0.
To overcome this potential problem, we may consider a fixed-point scheme. From
(10.31)-(10.34), we know that

w^k = (Q X_k^2 Q^T)^{-1} Q X_k^2 c = (1/α)(I - B)^{-1} v

where B = Q X̄ Q^T and v = Q X_k^2 c. If we further denote w̄ = α w^k, then w̄ = (I - B)^{-1} v.
This implies that

w̄ = B w̄ + v    (10.43)

In other words, w̄ is a fixed-point solution of (10.43). Hence we can replace
Algorithm LQ-1 by the following iterative scheme:

Algorithm LQ-2

Step 1: Choose ε > 0 to be sufficiently small. Choose

α > λ_max = max_i (x_i^k)^2

Set v = Q X_k^2 c. Set

x̄_i = (α - (x_i^k)^2)/α, for i = 1, ..., n

Compute B = Q X̄ Q^T.

Step 2: Set j = 1 and select an arbitrary w^(0).
Repeat:
w^(j) = B w^(j-1) + v
j ← j + 1
until ||w^(j) - w^(j-1)|| ≤ ε.

Step 3: Assign
w^k = (1/α) w^(j)
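A sketch of the fixed-point loop (function and variable names are ours):

import numpy as np

def lq2_dual_estimate(B, v, alpha, w0, eps=1e-8):
    # Iterate w^(j) = B w^(j-1) + v until successive iterates agree to within eps.
    w_prev = np.asarray(w0, dtype=float)
    while True:
        w = B @ w_prev + v
        if np.linalg.norm(w - w_prev) <= eps:
            return w / alpha               # Step 3: w^k = (1/alpha) w^(j)
        w_prev = w

In a large-scale setting, the product B @ w_prev would again be computed by the backward solve / sparse multiply / forward solve sequence sketched above rather than with an explicit B.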

Notice that Algorithm LQ-2 allows an arbitrary starting point w^(0). In practice, when
the primal affine scaling algorithm converges, the dual solution w^k varies little and
ρ(B) → 1. Hence, a considerable advantage can be gained by setting w^(0) to the
previous dual estimate w^{k-1}. In this way, though the convergence may slow down, we
are close to a solution from the beginning.
Unlike Algorithm LQ-1, which is a finite algorithm, Algorithm LQ-2 produces an infinite
sequence {w^(j): j = 0, 1, 2, ...}. We need to show that this sequence indeed converges
and produces a solution to system (10.32).

Theorem 10.4. The sequence {w^(j): j = 0, 1, 2, ...} generated by Algorithm LQ-2
is a Cauchy sequence and thus converges. Moreover, if we let w̄ be its limit point,
then w^k = (1/α)w̄ solves the system (10.32).

Proof. Let p ≥ 1. Since w^(j+1) - w^(j) = B(w^(j) - w^(j-1)) = B^j(w^(1) - w^(0)), we have

||w^(j+p) - w^(j)|| ≤ ||w^(j+p) - w^(j+p-1)|| + ··· + ||w^(j+1) - w^(j)||
 = ||B^{j+p-1}(w^(1) - w^(0))|| + ··· + ||B^j(w^(1) - w^(0))||
 ≤ [ρ(B)]^{j+p-1} ||w^(1) - w^(0)|| + ··· + [ρ(B)]^j ||w^(1) - w^(0)||
 ≤ ([ρ(B)]^j / [1 - ρ(B)]) ||w^(1) - w^(0)||

which approaches 0 as j approaches +∞. Hence {w^(j)} is a Cauchy sequence, and thus
converges. From Step 2, we know its limit point w̄ satisfies

w̄ = B w̄ + v

Hence w̄ = (I - B)^{-1} v, and w^k = (1/α)w̄ solves (10.32).

The proof also shows that the rate of convergence of Algorithm LQ-2 is at least linear
in ρ(B). As a final remark, it is noteworthy that the iterative scheme presented in
Algorithm LQ-2 may be accelerated. Let 0 < θ < 1 be arbitrary and consider Step 2 of
Algorithm LQ-2 modified as

w^(j) = θ w^(j-1) + (1 - θ)[B w^(j-1) + v]

It can be shown that the sequence {w^(j): j = 0, 1, 2, ...} generated by the modified
Algorithm LQ-2 is also a Cauchy sequence. Its convergence rate is at least linear in
ρ(θI + (1 - θ)B). This could improve the rate of convergence, but finding an optimal θ
for a general setting is not an easy problem.
A more general approach, which treats the infinitely summable series by Chebyshev
approximation to accelerate the convergence, is also under current investigation.
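For reference, the relaxed update is a one-line change to the loop body (a sketch; Exercise 10.14 characterizes the values of θ for which the spectral radius, and hence the rate, actually improves):

def relaxed_step(B, v, w, theta):
    # One relaxed iteration: w^(j) = theta*w^(j-1) + (1 - theta)*(B w^(j-1) + v).
    return theta * w + (1.0 - theta) * (B @ w + v)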

10.5 CONCLUDING REMARKS

In this chapter we have introduced three methods to overcome the computational
bottleneck of implementing interior-point algorithms. While all three methods (Cholesky, CG,
and LQ) appear promising, the most popular is the Cholesky factorization method. This
is partially because a vast amount of literature is available in this area and the method
is numerically stable. Moreover, any floating-point exceptions can be easily "tracked"
and "trapped" in implementations of Cholesky factorization.
Sparsity considerations are very important in solving large-scale problems. In
addition to what has been introduced in this chapter, one idea is to split the constraint
matrix into two parts, a dense part and a sparse part, and treat them separately. For the
sparse part we may use sparse matrix techniques, such as the ones described previously,
while a low-rank updating scheme or the conjugate gradient method may be used for the
dense part. This approach, often referred to as the column dropping technique, can result
in a substantial reduction in computer run-time for a number of cases. However, it
requires extreme care to ensure the robustness of the implemented software, which
has been recognized as a challenging task. The major problem is numerical instability:
even though the original system is nonsingular, the subsystem of equations to
be solved with some columns deleted may turn out to be singular. Another idea is to
solve dual problems as if they were primal. In this situation, as long as the original
problem does not contain both dense rows and dense columns, we may still be able to
take advantage of one of the two formulations.
Other techniques that have been tested for the implementation of interior-point
methods include (1) matrix reduction via the elimination of singleton columns and
corresponding variables, (2) fixing variables at their bounds, whenever applicable, (3) scaling
the data to improve numerical stability in computation, and (4) applying various acceleration
techniques to Cholesky factorization. The authors' limited experience with these
techniques indicates that little improvement can be obtained in overall efficiency for
solving real-life problems.
Finally, we want to mention that there have been attempts to apply decomposition
principles in conjunction with interior-point methods to solve large-scale problems.
Basically, a large-scale problem is decomposed into a restricted master problem and several
subproblems, and then interior-point methods are applied. However, as far as the authors
know, there is no evidence showing any significant improvement.

REFERENCES FOR FURTHER READING

10.1. Adler, I., Karmarkar, N., Resende, M. G. C., and Veiga, G., "An implementation of Karmarkar's algorithm for linear programming," Mathematical Programming 44, 297-335 (1989).
10.2. Adler, I., Karmarkar, N., Resende, M. G. C., and Veiga, G., "Data structures and programming techniques for the implementation of Karmarkar's algorithm," ORSA Journal on Computing 1, 84-106 (1989).
10.3. Cheng, Y-C., Houck, D. J., Jr., Meketon, M. S., Slutsman, L., Vanderbei, R. J., and Wang, P., "The AT&T KORBX® System," AT&T Technical Journal 68, No. 3, 7-19 (1989).
10.4. Duff, I. S., Erisman, A. M., and Reid, J. K., Direct Methods for Sparse Matrices, Clarendon Press, Oxford (1986).
10.5. Gay, D., "Massive memory buys little speed for complete in-core sparse Cholesky factorizations," Technical Report, AT&T Bell Laboratories (1988).
10.6. George, A., and Liu, J. W., Computer Solution of Large Sparse Positive Definite Systems, Prentice-Hall, Englewood Cliffs, NJ (1981).
10.7. Gill, P. E., Murray, W., and Wright, M. H., Numerical Linear Algebra and Optimization, Vol. 1, Addison-Wesley, Redwood City, CA (1991).
10.8. Golub, G. H., and Van Loan, C. F., Matrix Computations, Johns Hopkins University Press, Baltimore, MD (1983).
10.9. Hestenes, M. R., and Stiefel, E., "Methods of conjugate gradients for solving linear systems," Journal of Research of the National Bureau of Standards 49, 409-436 (1952).
10.10. Housos, E. C., Huang, C. C., and Liu, J. M., "Parallel algorithms for the AT&T KORBX® System," AT&T Technical Journal 68, No. 3, 37-47 (1989).
10.11. Markowitz, H. M., "The elimination form of the inverse and its application to linear programming," Management Science 3, 255-269 (1957).
10.12. Pan, V., "How can we speed up matrix multiplications?," SIAM Review 26, 393-415 (1984).
10.13. Pissanetzky, S., Sparse Matrix Technology, Academic Press, New York (1984).
10.14. Puthenpura, S., Saigal, R., and Sinha, L. P., "Application of LQ factorization in implementing the Karmarkar algorithm and its variants," Technical Memorandum, No. 51173-900205-01TM, AT&T Bell Laboratories (1990).
10.15. Saigal, R., "An infinitely summable series implementation of interior point methods," Technical Report 92-37, Department of Industrial and Operations Engineering, University of Michigan, Ann Arbor, May (1992).
10.16. Saigal, R., "Matrix partitioning methods for interior point algorithms," Technical Report 92-39, Department of Industrial and Operations Engineering, University of Michigan, Ann Arbor, June (1992).
10.17. Vanderbei, R. J., "An implementation of the minimum-degree algorithm using simple data structures," Technical Memorandum, No. 11212-900115-02TM, AT&T Bell Laboratories (1990).
10.18. Vanderbei, R. J., "ALPO: Another linear program solver," Technical Memorandum, No. 11212-900522-18TM, AT&T Bell Laboratories (1990).
10.19. Van Loan, C., "A survey of matrix computations," Technical Report, Cornell University (1990).
10.20. Wilkinson, J. H., The Algebraic Eigenvalue Problem, Oxford University Press (1965).

EXERCISES

10.1. Let M be an n × n symmetric positive definite matrix. Show that
(a) Any principal submatrix of M is positive definite.
(b) M is nonsingular and its inverse matrix M^{-1} is also positive definite.
(c) B^T M B is positive definite if and only if B is nonsingular.
(d) m_ii > 0, for i = 1, ..., n.
(e) max_{1≤i≤n} m_ii ≥ max_{1≤i,j≤n} |m_ij|.
(f) m_ii m_jj ≥ m_ij^2, for all i, j.
10.2. Let P be a permutation matrix that interchanges only two columns or two rows.
(a) Show that P is symmetric.
(b) Is the product of two such permutations symmetric? Why?
(c) Suppose that R is a product of a finite number of such permutations and M is an n × n symmetric matrix; show that any diagonal element of R^T M R must be a diagonal element of M.
10.3. Is the product of two orthogonal matrices orthogonal? Why?
10.4. Let M be an n × n nonsingular matrix with QR factorization M = QR.
(a) Is this factorization unique?
(b) If M has another QR factorization M = Q̃R̃, what is the connection between Q and Q̃, and between R and R̃?
10.5. Derive the "outer product form" of the Cholesky factorization algorithm based on the proof of Theorem 10.1. [Hint: Express

M = L_1 M_1 L_1^T, M_1 = L_2 M_2 L_2^T, ..., M_{n-1} = L_n I_n L_n^T

where L_i, for i = 1, 2, ..., n, are lower triangular matrices and I_n is the n × n identity matrix. Then show that L, the Cholesky factor of M, is L_1 + L_2 + ··· + L_n - (n - 1)I_n.]
10.6. Let A be an (m × n)-dimensional matrix (m < n) with rank r ≤ m. Assume that A = UV^T, where U is m × r and V is n × r. Show that U and V are of rank r.
10.7. Let A be an (m × n)-dimensional matrix (m < n) with full row rank. Prove that AA^T is symmetric and positive definite. If

A = [  1  3  -2 ]
    [ -1  2   5 ]

find the Cholesky factor of AA^T.
10.8. Let A be an (m × n)-dimensional matrix (say m < n). Also let B be an n × m matrix such that BA = I. Show that B exists if and only if the columns of A are linearly independent. Furthermore, show that B is unique if and only if the rows of A are linearly independent. Construct such a B for a given A. [Note: B is called the "left inverse" of A.]
10.9. Consider the matrix M in Example 10.1. Compute its Cholesky factor L, then apply the minimum degree reordering and recompute the Cholesky factor. Compare the two results.
10.10. Verify that Algorithm F-2 gives the correct answer and accesses L column by column.
10.11. Prove that the directions d^k, for k = 1, 2, ..., obtained by (10.23) and (10.24) are indeed mutually conjugate with respect to the matrix M.
10.12. For the matrix A of Exercise 10.7, perform the LQ factorization to get the corresponding matrix Q.
10.13. Provide a simple example, say 1 ≤ m < n ≤ 10, such that A is relatively sparse but Q is very dense.
10.14. Consider the fixed-point iteration scheme in connection with the LQ factorization technique, where

w^(j) = θ w^(j-1) + (1 - θ)[B w^(j-1) + v]

Let κ_max and κ_min be the largest and smallest eigenvalues of B. Show that

ρ(θI + (1 - θ)B) < ρ(B) if and only if 0 > θ > -κ_min/(1 - κ_min)
10.15. An advantage of the C programming language is its dynamic memory allocation, which substantially reduces the memory requirements for implementing interior-point algorithms. If you know what "dynamic memory allocation" is, discuss how to implement the minimum degree reordering, the Cholesky factorization, and the forward/backward solves with a dynamic memory allocation scheme.