
Linear Algebra and Linear Programming

Lecture notes

Dr. Hong Duong


School of Mathematics
University of Birmingham
Birmingham B15 2TT, UK
Email: h.duong@bham.ac.uk
Abstract

These are the lecture notes for the module “Linear Algebra and Linear Programming”
taught to second-year students in the Jinan-Birmingham dual degree undergraduate
programmes. The notes consist of two parts. The first part (Chapters 1 & 2) is the
continuation of the module “Vectors, Geometry & Linear Algebra” taught in the first
year and develops the theory of vector spaces covering eigenvectors, characteristic poly-
nomials, inner products, and diagonalisation. The second part (Chapters 3–11) focuses
on Linear Programming (LP) covering LP modelling, geometric method, duality theory,
the simplex method and some extensions of the simplex method.

Linear algebra grew out of the development of techniques at the start of the 18th
century by Leibniz, Cramer and Gauss to solve systems of linear equations. Cayley
developed matrix algebra in the middle of the 19th century and the definition of a vector
space was made by Peano at the end of the 19th century, resulting in a theory of linear
transformations and vector spaces in the early 20th century. Linear algebra is not only
fundamental to both pure and applied mathematics, but also has applications ranging
from quantum theory to Google search algorithms.

If linear algebra grew out of the solution of systems of linear equations, then linear
programming grew out of attempts to solve systems of linear inequalities, allowing one
to optimise linear functions subject to constraints expressed as inequalities. The theory
was developed independently at the time of World War II by the Soviet mathematician
Kantorovich, for production planning, and by Dantzig, to solve complex military planning
problems. Koopmans applied it to shipping problems and the technique enjoyed rapid
development in the postwar industrial boom. The first complete algorithm to solve linear
programming problems, called the simplex method, was published by Dantzig in 1947 and
in the same year von Neumann established the theory of duality. In 1975, Kantorovich
and Koopmans shared the Nobel Prize in Economics for their work and Dantzig’s simplex
method has been voted the second most important algorithm of the 20th century after
the Monte Carlo method. Linear programming is a modern and immensely powerful
technique that has numerous applications, not only in business and economics, but also
in engineering, transportation, telecommunications, and planning.
By the end of the module students should be able to:

• understand and use the basic concepts of linear algebra and matrices, including
linear transformations, eigenvectors and the characteristic polynomial,

• understand the basic theory of inner products and apply it to questions of orthog-
onality and/or diagonalisability,

• explain the basic techniques of linear programming (graphical method and simplex
method),

• construct linear programming models of a variety of managerial problems and interpret
the results obtained by applying the linear programming techniques to these problems,

• explain why and when the simplex method fails to provide a solution and how to
resolve such a situation

• present, prove and use the results of duality theory and interpret them,

• explain the computational complexity of SIMPLEX and LP


Acknowledgments These lecture notes are based on the lecture notes Linear Algebra
written by Dr Anton Evseev and Professor Sergey Shpectorov and the lecture notes Linear
Programming written by Dr Yun-Bin Zhao. I would like to thank them for sharing their
notes.

Contents

1 Linear maps, eigenvalues and eigenvectors, and matrix diagonalisation 6

1.1 Linear maps, matrix representation, rank-nullity theorem . . . . . . . . . . 7

1.2 Change of basis and similar matrices . . . . . . . . . . . . . . . . . . . . . 21

1.3 Diagonal and diagonalisable matrices . . . . . . . . . . . . . . . . . . . . . 22

1.4 Eigenvalues and eigenvectors . . . . . . . . . . . . . . . . . . . . . . . . . . 24

1.5 Characteristic polynomial . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

1.6 An algorithm for diagonalisation . . . . . . . . . . . . . . . . . . . . . . . 36

1.7 Some applications of diagonalisation . . . . . . . . . . . . . . . . . . . . . 40

1.7.1 Systems of difference equations . . . . . . . . . . . . . . . . . . . . 40

1.7.2 Systems of differential equations . . . . . . . . . . . . . . . . . . . 44

1.8 Jordan canonical form . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

2 Bilinear forms and inner products 48

2.1 Symmetric bilinear forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

2.2 The Gram matrix of a bilinear form . . . . . . . . . . . . . . . . . . . . . . 51

2.3 Orthogonality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

2.4 Inner products and inner product spaces . . . . . . . . . . . . . . . . . . . 54

2.5 Length of a vector, Pythagoras’s theorem and Cauchy-Schwartz inequality 56

2.6 Gram-Schmidt orthogonalisation . . . . . . . . . . . . . . . . . . . . . . . 59

3 Introduction to Linear Programming 66

3.1 What is Linear Programming? . . . . . . . . . . . . . . . . . . . . . . . . . 67


3.2 Definition of LP: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

3.3 Terminology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69

3.4 Standard Forms of LPs: . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71

3.5 Canonical Form of LPs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72

3.6 LP Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72

3.7 A brief history of LPs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79

4 LPs with 2 Variables: Geometric Method 80

5 Introduction to Convex Analysis 86

5.1 Basic concepts in convex analysis . . . . . . . . . . . . . . . . . . . . . . . 86

5.2 Extreme Points, Directions, Representation of Polyhedra . . . . . . . . . . 91

6 Duality theory 95

6.1 Motivation and derivation of dual LP problems . . . . . . . . . . . . . . . 96

6.2 Primal and dual problems of LPs . . . . . . . . . . . . . . . . . . . . . . . 98

6.3 Duality theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101

6.4 Network flow problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104

6.5 Optimality Conditions & Complementary slackness . . . . . . . . . . . . . 108

6.6 Complementary slackness condition . . . . . . . . . . . . . . . . . . . . . 112

6.7 More Applications of Duality Theory . . . . . . . . . . . . . . . . . . . . . 114

7 The Simplex Method (I): Theory 116

7.1 Basic Feasible Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117

7.1.1 Review of Linear Algebra . . . . . . . . . . . . . . . . . . . . . . . 117

7.1.2 Existence of the Optimal Extreme Point . . . . . . . . . . . . . . . 117

7.1.3 Basic Feasible Solution . . . . . . . . . . . . . . . . . . . . . . . . . 119

7.1.4 The number of the BFS . . . . . . . . . . . . . . . . . . . . . . . . 121

7.1.5 Correspondence between BFSs & extreme points . . . . . . . . . . 122

7.2 The simplex method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125

7.2.1 Reformulation of LP by the current BFS . . . . . . . . . . . . . . . 126


7.2.2 Checking the Optimality . . . . . . . . . . . . . . . . . . . . . . . . 128

7.2.3 Analysis for the case zj − cj > 0 and yj = B −1 aj has a positive


component for some nonbasic variable . . . . . . . . . . . . . . . . 129

8 The simplex method (II): Algorithm 133

8.1 The algorithm for minimization problems . . . . . . . . . . . . . . . . . . . 133

8.2 The simplex method in Tableau Format . . . . . . . . . . . . . . . . . . . 135

8.2.1 Pivoting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137

8.2.2 The simplex method in tableau Format . . . . . . . . . . . . . . . . 138

8.3 Degeneracy and cycling in the simplex method . . . . . . . . . . . . . . . . 141

8.4 Efficiency of the simplex method . . . . . . . . . . . . . . . . . . . . . . . 144

8.5 Further Examples for Simplex Methods & Degeneracy . . . . . . . . . . . 148

8.6 Extracting Information from Optimal Simplex Tableau . . . . . . . . . . . 152

8.6.1 Find optimal solution and objective value . . . . . . . . . . . . . . 152

8.6.2 Find the dual optimal solution and objective value . . . . . . . . . 153

8.6.3 Find the inverse of optimal basis . . . . . . . . . . . . . . . . . . . 154

8.6.4 Recovering original LP problems from the optimal tableau . . . . . 155

8.6.5 Recovering missing data in simplex tableau . . . . . . . . . . . . . 156

9 Sensitivity Analysis 157

9.1 Change in the cost vector . . . . . . . . . . . . . . . . . . . . . . . . . . . 158

9.2 Change in the right-hand-side . . . . . . . . . . . . . . . . . . . . . . . . . 160

9.3 Adding a new activity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161

9.4 Adding a new constraint . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163

10 The Big-M Method and Two-Phase method 167

10.1 The Big-M method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168

10.2 The Two-phase Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173

11 The Dual Simplex Method 181

11.1 Dual information from the (primal) optimal simplex tableau . . . . . . . . 182


11.2 The Dual Simplex Method . . . . . . . . . . . . . . . . . . . . . . . . . . . 184

12 Practice exercises 187

12.1 Exercises for Chapter 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187

12.2 Exercises for Chapter 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190

12.3 Exercises for Chapters 3 & 4 . . . . . . . . . . . . . . . . . . . . . . . . . . 191

12.4 Exercises for Chapter 5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192

12.5 Exercises for Chapter 6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193

12.6 Exercises for Chapters 7 & 8 & 9 . . . . . . . . . . . . . . . . . . . . . . . 194

12.7 Exercises for Chapters 10 & 11 . . . . . . . . . . . . . . . . . . . . . . . . 195

Index 195

Bibliography 195

Chapter 1

Linear maps, eigenvalues and


eigenvectors, and matrix
diagonalisation

In this chapter we study linear maps and their properties. The following topics will
be covered:

• linear maps and matrix representation

• rank-nullity theorem

• linear transformations and change of basis,

• diagonal and diagonalisable matrices

• eigenvalues and eigenvectors,

• characteristic polynomials,

• diagonalisation.

At the end of this chapter, students should be able to

• compute matrix representation of a linear map with respect to given bases

• apply rank-nullity theorem

• perform change of basis,


• find the characteristic polynomial of a linear transformation,

• calculate eigenvalues and eigenvectors of a linear transformation by solving the


characteristic polynomial,

• determine diagonalisability and perform diganolisation of a matrix.

[Tao16, pages 39-93], [Str16, Chapters 6 & 8] and [KW98, Chapters 8-12] cover similar
topics to those in this chapter and may be useful additional reading.

1.1 Linear maps, matrix representation, rank-nullity


theorem

In this section we recapitulate relevant definitions and properties of linear maps that
have been covered in the module Vectors, Geometry & Linear Algebra.
Throughout this section, we denote by U and V two vector spaces over a common field
F.

Definition 1.1 (Linear map). A function (or mapping) f : U → V is called a linear map
(or linear transformation) if the following two conditions are satisfied

(i) For all u, v ∈ U : f (u + v) = f (u) + f (v),

(ii) For all u ∈ U and a ∈ F: f (au) = af (u).

Note that the vector spaces U and V can be equal. Linear maps appear naturally in
many applications in economics, physics and in everyday life. Below are some examples
of linear maps.

Example 1.2 (Linear transformations in computer graphics). Let U = R2. The following
are examples of linear transformations that play important roles in computer graphics:

(1) symmetry about the origin

\[ \mathbb{R}^2 \to \mathbb{R}^2, \qquad \begin{pmatrix} x \\ y \end{pmatrix} \mapsto \begin{pmatrix} -x \\ -y \end{pmatrix}, \]


(2) symmetry about the x-axis

\[ \mathbb{R}^2 \to \mathbb{R}^2, \qquad \begin{pmatrix} x \\ y \end{pmatrix} \mapsto \begin{pmatrix} x \\ -y \end{pmatrix}, \]

(3) symmetry about the y-axis

\[ \mathbb{R}^2 \to \mathbb{R}^2, \qquad \begin{pmatrix} x \\ y \end{pmatrix} \mapsto \begin{pmatrix} -x \\ y \end{pmatrix}, \]

(4) rotation around the origin through an angle θ in the counterclockwise direction (a short
numerical check is given after this list)

\[ \mathbb{R}^2 \to \mathbb{R}^2, \qquad \begin{pmatrix} x \\ y \end{pmatrix} \mapsto \begin{pmatrix} \cos(\theta)x - \sin(\theta)y \\ \sin(\theta)x + \cos(\theta)y \end{pmatrix} = \begin{pmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}, \]

(5) dilation with scale factor λ

\[ \mathbb{R}^2 \to \mathbb{R}^2, \qquad \begin{pmatrix} x \\ y \end{pmatrix} \mapsto \begin{pmatrix} \lambda x \\ \lambda y \end{pmatrix}. \]

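As a small numerical illustration (an addition to these notes, not part of the original text), the rotation in item (4) can be checked in Python with numpy; the function name rotation_matrix is introduced only for this sketch.

```python
import numpy as np

def rotation_matrix(theta):
    """Matrix of the counterclockwise rotation of R^2 through the angle theta."""
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

# Rotating the point (1, 0) by pi/2 should give (approximately) (0, 1).
print(rotation_matrix(np.pi / 2) @ np.array([1.0, 0.0]))
```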
Example 1.3 (Linear maps in economics). There are m products, P1 , . . . , Pm , that are
made up from n elements, e1 , . . . , en . Product Pi (i = 1, . . . , m) contains aij units of
element ej (j = 1, . . . , n). Let xi (i = 1, . . . , n) be the price of each unit of element ei .
What is the price list of all products?

Let pi (i = 1, . . . , m) be the price of Product Pi . Then

\[ p_i = \sum_{j=1}^{n} a_{ij} x_j. \]

We can write all these equations in a matrix form

p = Ax,

where

\[ p = \begin{pmatrix} p_1 \\ \vdots \\ p_m \end{pmatrix}, \qquad
   A = \begin{pmatrix} a_{11} & \dots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \dots & a_{mn} \end{pmatrix}, \qquad
   x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}. \]


The price list is an example of a linear map

\[ \mathbb{R}^n \to \mathbb{R}^m, \qquad
   \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} \mapsto
   \begin{pmatrix} p_1 \\ \vdots \\ p_m \end{pmatrix} =
   \begin{pmatrix} a_{11} & \dots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \dots & a_{mn} \end{pmatrix}
   \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}. \]

When m = n we obtain a linear transformation on Rn .

Example 1.4 (Linear maps in daily life: bills for the meals). We actually encounter linear
maps daily. A group of m students go for a dinner. The meals are made up of n courses.
The meal for student i, i = 1, . . . , m, consists of aij units of course j, j = 1, . . . , n. The
price of a unit of course j is xj , j = 1, . . . , n. Then the price list of all students is

\[ \begin{pmatrix} p_1 \\ \vdots \\ p_m \end{pmatrix} =
   \begin{pmatrix} a_{11} & \dots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \dots & a_{mn} \end{pmatrix}
   \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}. \]

We notice that mathematically this problem is exactly the same as in Example 1.3.

Example 1.5 (Linear maps in Physics). A typical example of a linear map is Newton’s
second law in classical mechanics, expressing a linear dependence between the force $\vec{F}$
acting on an object and the acceleration $\vec{a}$ that is generated by the force:

\[ \vec{F} = m \vec{a}, \]

where m is the mass of the object. Here U = R3 . Many other laws in Physics such as
Fourier’s law of heat conduction, Fick’s law of diffusion and Hooke’s law of elasticity also
satisfy similar linear relations.

The following proposition states basic properties of a linear map, see Theorem 10.2
in J1VGLA Lecture Notes.

Proposition 1.6 (Basic properties of linear maps). Let f : U → V be a linear map.


Then

(i) f (0) = 0,

(ii) For all u ∈ U : f (−u) = −f (u),


(iii) For all u1 , . . . , um ∈ U and a1 , . . . , am ∈ F for some m ∈ N, we have

\[ f\Big(\sum_{i=1}^{m} a_i u_i\Big) = \sum_{i=1}^{m} a_i f(u_i). \]

To define a function f : U → V , one needs to specify what f (u) is for every single
element u ∈ U , and there are usually infinitely many such elements. The most useful
property of linear maps is that one only needs to specify f on a basis of the domain vector
space U (as long as U is finite-dimensional). This is given in the following theorem, which
is Theorem 10.3 in the J1VGLA Lecture Notes.

Theorem 1.7. Suppose that U is finite-dimensional. Let {u1 , . . . , un } be a basis of U .


Given any elements v1 , . . . , vn ∈ V , there exists a unique linear map f : U → V such that

f (u1 ) = v1 , f (u2 ) = v2 , . . . , f (un ) = vn . (1.1)

Moreover this map f satisfies the identity

f (a1 u1 + . . . + an un ) = a1 v1 + . . . + an vn (1.2)

for any a1 , . . . , an ∈ F.

Proof. We first show the existence of a linear map f satisfying the condition (1.1). Let
u ∈ U . Since {u1 , . . . , un } is a basis of U , u can be written as

u = α1 u1 + . . . + αn un

for some α1 , . . . , αn ∈ F. Then we define a mapping f : U → V by

f (u) := α1 v1 + . . . + αn vn .

One can verify easily that f is linear. We have u1 = 1u1 + 0u2 + . . . + 0un . Thus by
definition
f (u1 ) = 1v1 + 0v2 + . . . + 0vn = v1 .

Similarly we obtain f (u2 ) = v2 , . . . , f (un ) = vn . That is (1.1) is satisfied. The identity


(1.2) follows from the linearity of f .

Now suppose that there exist two linear maps f1 and f2 both satisfying (1.1). Let
u = α1 u1 + . . . + αn un ∈ U . Then by the linearity of f1 we have

f1 (u) = α1 f1 (u1 ) + . . . + αn f1 (un ) = α1 v1 + . . . + αn vn .

Similarly we also get f2 (u) = α1 v1 + . . . + αn vn . Thus f1 = f2 .


We also recall the definition of coordinate of a vector with respect to a given basis.

Definition 1.8 (coordinates). Let V be an n-dimensional vector space and B = {u1 , . . . , un }
be an ordered basis of V . Let v ∈ V . The (uniquely determined) elements a1 , . . . , an ∈ F
such that
v = a1 u1 + . . . + an un

are called the coordinates of v. The coordinate vector of v is the column vector, [v]B , defined
by

\[ [v]_B = \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix}. \]

Inspired by Theorem 1.7 we consider the following definition.

Definition 1.9 (matrix of a linear map). Let U and V be finite-dimensional vector spaces
over a common field F with dim U = n and dim V = m. Let f : U → V be a linear map.
Let B = {u1 , . . . , un } be a basis of U and C = {v1 , . . . , vm } be a basis of V . The matrix
of f with respect to the bases B and C is the m × n matrix A with columns

[f (u1 )]C , [f (u2 )]C , . . . , [f (un )]C .

Example 1.10 (matrix representation of a linear map with respect to standard bases).
Let us consider again Examples 1.3 and 1.4, where the price list p ∈ Rm was
computed from that of the elements (or courses) x by

p = Ax,

where A = (aij ) (i = 1, . . . , m; j = 1, . . . , n). Let bn = (e1 , . . . , en ) and bm = (e'1 , . . . , e'm ) be
the standard bases of Rn and Rm respectively. We have [x]bn = x and [Ax]bm = Ax = A[x]bn
since

\[ x = \sum_{i=1}^{n} x_i e_i, \qquad y = \sum_{i=1}^{m} y_i e_i' \qquad \forall x \in \mathbb{R}^n, \; y \in \mathbb{R}^m. \]

Hence the matrix of this linear map with respect to these standard bases is exactly A.

We consider another example.

Example 1.11. Let f : R2 → R2 be a linear map given by

\[ f\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} x_1 + 2x_2 \\ 2x_1 - x_2 \end{pmatrix}. \]

Let B = {u1 , u2 } be a basis of R2 where

\[ u_1 = \begin{pmatrix} -1 \\ 1 \end{pmatrix}, \qquad u_2 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}. \]

Let us find the matrix of f with respect to B. We have

\[ f(u_1) = \begin{pmatrix} 1 \\ -3 \end{pmatrix} = -2 u_1 - u_2, \qquad
   f(u_2) = \begin{pmatrix} 3 \\ 1 \end{pmatrix} = -u_1 + 2 u_2. \]

Therefore,

\[ [f(u_1)]_B = \begin{pmatrix} -2 \\ -1 \end{pmatrix}, \qquad [f(u_2)]_B = \begin{pmatrix} -1 \\ 2 \end{pmatrix}. \]

Hence, by definition, the matrix representation of f with respect to B is given by

\[ A = \begin{pmatrix} -2 & -1 \\ -1 & 2 \end{pmatrix}. \]

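The computation in Example 1.11 can be reproduced numerically: the j-th column of A is the solution of the linear system whose coefficient matrix has the basis vectors as columns. The following sketch (an addition to the notes, assuming Python with numpy) illustrates this.

```python
import numpy as np

# Columns of U are the basis vectors u1, u2 of B.
U = np.array([[-1.0, 1.0],
              [ 1.0, 1.0]])

def f(v):
    """The linear map f(x1, x2) = (x1 + 2*x2, 2*x1 - x2)."""
    return np.array([v[0] + 2 * v[1], 2 * v[0] - v[1]])

# Column j of A is [f(u_j)]_B, i.e. the solution of U c = f(u_j).
A = np.column_stack([np.linalg.solve(U, f(U[:, j])) for j in range(2)])
print(A)  # expected: [[-2. -1.], [-1.  2.]]
```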
The following theorem provides a relation between [f (u)]c and [u]B for a general vector
u in U via the matrix representation. This is an important result which enables the study
of a general linear map via the study of its matrix representation.

Theorem 1.12. Let U and V be finite-dimensional vector spaces over a common field F
with dim U = n, dim V = m. Let B = {u1 , . . . , un } be a basis of U and C = {v1 , . . . , vm }
be a basis of V . Let A be the matrix representation of f with respect to the bases B and
C. Then for every u ∈ U , we have

[f (u)]C = A[u]B .

Proof. Let [u]_B = (b_1 , . . . , b_n )^T , so that u = \sum_{j=1}^{n} b_j u_j . Let A be the matrix representation
of f with respect to B and C, with

\[ A = \begin{pmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \dots & a_{mn} \end{pmatrix}. \]

According to the definition, the j-th column of A, j = 1, . . . , n, is the coordinate vector [f (uj )]C
of f (uj ) with respect to the basis C. Thus

\[ f(u_j) = \sum_{i=1}^{m} a_{ij} v_i. \]


Using this equation and the linearity of f , we have

\[ f(u) = f\Big(\sum_{j=1}^{n} b_j u_j\Big) = \sum_{j=1}^{n} b_j f(u_j)
        = \sum_{j=1}^{n} b_j \sum_{i=1}^{m} a_{ij} v_i
        = \sum_{i=1}^{m} \Big(\sum_{j=1}^{n} a_{ij} b_j\Big) v_i, \]

which means that

\[ [f(u)]_C = \begin{pmatrix} \sum_{j=1}^{n} a_{1j} b_j \\ \vdots \\ \sum_{j=1}^{n} a_{mj} b_j \end{pmatrix}
 = \begin{pmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \dots & a_{mn} \end{pmatrix}
   \begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix} = A[u]_B. \]

Thus we obtain [f (u)]_C = A[u]_B as claimed and finish the proof of the theorem.

Note that sometimes we denote A by $[f]^B_C$. Thus we have

\[ [f(u)]_C = [f]^B_C \, [u]_B. \]

If U = V we usually use the same basis B for U and for V . In this case, instead of saying
“the matrix of f with respect to B and B” we will say “the matrix of f with respect to
B”. The following theorem is, in some sense, a converse to Theorem 1.12.

Theorem 1.13. Let U and V be finite-dimensional vector spaces over a common field F
with dim U = n and dim V = m. Let B and C be two given ordered bases of U and V
respectively. Let A be an m × n matrix with entries in F. Then

(i) There exists a unique linear map f : U → V such that the matrix of f with respect to B
and C is A.

(ii) Suppose that g : U → V is a function such that for all u ∈ U we have [g(u)]C =
A[u]B . Then g is linear and g is the (unique) linear map from U to V which has
matrix representation A with respect to B and C.

Proof. (i) Let B = {u1 , . . . , un } and C = {v1 , . . . , vm }. Define w_j = \sum_{i=1}^{m} a_{ij} v_i for j =
1, . . . , n. By Theorem 1.7, there exists a unique linear map f : U → V such that f (uj ) =
wj for all j = 1, . . . , n. Therefore, the coordinate vector [f (uj )]C is (a1j , a2j , . . . , amj )T ,
which is the j-th column of A. Thus by definition of a matrix representation, A is the
matrix representation of f with respect to B and C.

(ii). Let f : U → V be the linear map constructed in the previous part, so that f
has matrix representation A with respect to B and C. We need to prove that f = g.


Let u ∈ U be arbitrary. By Theorem 1.12, [f (u)]C = A[u]B . Thus [f (u)]C = [g(u)]C .


It follows that f (u) = g(u). Since u is arbitrary, we have f = g which means g is the
(unique) linear map from U to V which has matrix representation A with respect to B
and C.

The following Theorem, cf. Theorem 10.15 and its proof in J1VGLA Lecture Notes,
provides a formula for computing the matrix representation of the composition of two
linear maps.

Theorem 1.14 (Matrix of the composition of two linear maps). Let U, V and W be
finite-dimensional vector spaces over a field F with bases B, C and D respectively. Let
f : U → V and g : V → W be linear maps. Let K be the matrix of f with respect to B
and C and let L be the matrix of g with respect to C and D. Then LK is the matrix of
the composition of g and f , g ◦ f , with respect to B and D.

Now we introduce important concepts related to a linear map.

Definition 1.15 (Kernel, image, rank and nullity of a linear map). Let f : U → V be a
linear map. The kernel (also known as the null space), ker(f ), the image (also known as
the range), im(f ), the rank, rank(f ), and the nullity, nullity(f ), of f are defined respectively
by

• ker(f ) := {u ∈ U : f (u) = 0},

• im(f ) := {f (u) : u ∈ U } = {v ∈ V : there exists u ∈ U such that f (u) = v},

• rank(f ) := dim(im(f )),

• nullity(f ) := dim(ker(f )).

The following figure demonstrates these concepts.

[Figure 1.1: Kernel and image of a linear map f from a vector space U to a vector space
V . The kernel of f , which contains all pre-images of the vector 0, is a subspace of U , while
the image of f , which consists of all f (u), with u ∈ U , is a subspace of V .]

The following lemma shows that ker(f ) and im(f ) are two subspaces of U and V respec-
tively.

Lemma 1.16. (i) The kernel of f is a subspace of U ,

(ii) The image of f is a subspace of V .

The following theorem states an important relation between the rank, the nullity and
the dimension of U .

Theorem 1.17 (Rank-nullity theorem). Let U and V be finite-dimensional vector spaces


and let f : U → V be a linear map. Then

rank(f ) + nullity(f ) = dim(U ).

Proof. Suppose that 0 ≤ k ≤ n is the nullity of f and let {u1 , . . . , uk } be a basis of ker(f ),
which is a subspace of U . By Theorem 9.49 in the J1VGLA Lecture Notes, we can extend
this set to a basis {u1 , . . . , uk , uk+1 , . . . , un } of U . Let B = {f (uk+1 ), . . . , f (un )} so that
B has n − k elements. We will show that B is a basis of im(f ). It is clear that B is a
subset of im(f ), so the span of B is contained in im(f ). Let us show that in fact, B spans


the whole of im(f ). Let v ∈ im(f ). Then v = f (u) for some u ∈ U . Since {u1 , . . . , un } is
a basis of U , we have u = a1 u1 + . . . + an un for some a1 , . . . , an ∈ F. Using the linearity
of f , we have

v = f (u) = a1 f (u1 ) + . . . + ak f (uk ) + ak+1 f (uk+1 ) + . . . + an f (un )

= ak+1 f (uk+1 ) + . . . + an f (un ),

where we have used the fact that f (u1 ) = . . . = f (uk ) = 0 since u1 , . . . , uk ∈ ker(f ). The
above equation shows that v ∈ span(f (uk+1 ), . . . , f (un )), thus im(f ) ⊂ span(f (uk+1 ), . . . , f (un )).
Therefore im(f ) = span(f (uk+1 ), . . . , f (un )). We now prove that the set B is linearly in-
dependent. Suppose that there exist b1 , . . . , bn−k ∈ F such that

b1 f (uk+1 ) + . . . + bn−k f (un ) = 0.

Since f is linear, we have

0 = b1 f (uk+1 ) + . . . + bn−k f (un ) = f (b1 uk+1 + . . . + bn−k un ) = 0.

This means that u := b1 uk+1 + . . . + bn−k un ∈ ker(f ). Since {u1 , . . . , uk } is a basis of


ker(f ), it follows that

b1 uk+1 + . . . + bn−k un = c1 u1 + . . . + ck uk

for some c1 , . . . , ck ∈ F. Thus

c1 u1 + . . . + ck uk − b1 uk+1 − . . . − bn−k un = 0.

Since {u1 , . . . , un } is a basis of U (thus they are linearly independent), we deduce that

c1 = . . . = ck = b1 = . . . = bn−k = 0.

Hence B is linearly independent and thus, it is a basis of im(f ), and we obtain that

dim(U ) = n = k + (n − k) = dim(ker(f )) + dim(im(f )) = nullity(f ) + rank(f ).

This finishes the proof of the theorem.

Example 1.18. Let U = R3 , V = R2 and consider a linear map f : U → V given by

f ((x, y, z)T ) = (x − y, x + 2z)T .


The kernel of f is determined by

ker(f ) = {(x, y, z)T ∈ R3 : x = y and x = −2z} = {x(1, 1, −1/2)T : x ∈ R}.

Thus dim(ker(f )) = 1 and according to the rank-nullity theorem dim(im(f )) = dim(U ) −


dim(ker(f )) = 3 − 1 = 2.

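As a numerical aside (an addition to these notes), the rank and nullity in Example 1.18 can be checked with numpy; matrix_rank is used as a stand-in for computing dim(im(f )).

```python
import numpy as np

# Matrix of f(x, y, z) = (x - y, x + 2z) with respect to the standard bases.
A = np.array([[1.0, -1.0, 0.0],
              [1.0,  0.0, 2.0]])

rank = np.linalg.matrix_rank(A)
nullity = A.shape[1] - rank  # rank-nullity: nullity = dim(U) - rank
print(rank, nullity)         # expected: 2 1
```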
Theorem 1.19 (Special matrix representation). Let f : U → V be a linear map between
two finite-dimensional vector spaces U and V . Let n = dim(U ), m = dim(V ) and r =
rank(f ). Then there exists a basis of U and a basis of V such that the matrix of f with
respect to these bases is

\[ \begin{pmatrix} I_r & 0_{r,\,n-r} \\ 0_{m-r,\,r} & 0_{m-r,\,n-r} \end{pmatrix}, \]

where Ir denotes the identity matrix of order r and 0a,b denotes the a × b matrix of 0s.

The proof of this theorem is omitted.

Definition 1.20 (transition matrix ). Let U be a finite-dimensional vector space over a


field F with dim U = n. Let B = {u1 , . . . , un } and B 0 = {u01 , . . . , u0n } be two ordered bases
of U . The transition matrix from B to B 0 is the n × n matrix P with columns

[u1 ]B 0 , . . . , [un ]B 0 .

Example 1.21 (example of transition matrix). Let U = R2 and let B = {u1 , u2 } and
B' = {u'1 , u'2 } be two ordered bases of U where

\[ u_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \quad u_2 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \quad
   u'_1 = \begin{pmatrix} 2 \\ 1 \end{pmatrix}, \quad u'_2 = \begin{pmatrix} 3 \\ 2 \end{pmatrix}. \]

We have

\[ u_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix} = 2\begin{pmatrix} 2 \\ 1 \end{pmatrix} - \begin{pmatrix} 3 \\ 2 \end{pmatrix} = 2u'_1 - u'_2; \quad \text{thus } [u_1]_{B'} = \begin{pmatrix} 2 \\ -1 \end{pmatrix}, \]

\[ u_2 = \begin{pmatrix} 1 \\ 1 \end{pmatrix} = -\begin{pmatrix} 2 \\ 1 \end{pmatrix} + \begin{pmatrix} 3 \\ 2 \end{pmatrix} = -u'_1 + u'_2; \quad \text{thus } [u_2]_{B'} = \begin{pmatrix} -1 \\ 1 \end{pmatrix}. \]

Hence the transition matrix from B to B' is

\[ P = \begin{pmatrix} 2 & -1 \\ -1 & 1 \end{pmatrix}. \]

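The transition matrix of Example 1.21 can also be obtained numerically by solving, for each u_j, the linear system whose solution is [u_j]_{B'}. A minimal numpy sketch (an addition to the notes, not part of them):

```python
import numpy as np

# Columns are the basis vectors: B = {u1, u2}, B' = {u1', u2'}.
B  = np.array([[1.0, 1.0],
               [0.0, 1.0]])
Bp = np.array([[2.0, 3.0],
               [1.0, 2.0]])

# Column j of the transition matrix from B to B' is [u_j]_{B'},
# i.e. the solution of Bp @ c = u_j (solved column by column).
P = np.linalg.solve(Bp, B)
print(P)  # expected: [[ 2. -1.], [-1.  1.]]
```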

Remark 1 (Relation between transition matrix and matrix representation of a linear


transformation, see Theorem 10.18, VGA Lecture Notes). Let U be a finite-dimensional
vector space over a field F. Let B and B 0 be two ordered bases of U . Let P be the
transition matrix from B to B 0 . Then P is also the matrix representation of the identity
transformation 1U with respect to the bases B in the domain and B 0 in the co-domain.

The transition matrix is useful when one wants to change coordinate systems.

Theorem 1.22 (change of coordinate systems). Let U be a finite-dimensional vector


space over a field F. Let B and B 0 be two ordered bases of U . Let P be the transition
matrix from B to B 0 . Then for any v ∈ U , we have

[v]B 0 = P [v]B .

Proof. Let v ∈ U . Suppose that

\[ [v]_B = \begin{pmatrix} \alpha_1 \\ \vdots \\ \alpha_n \end{pmatrix}, \]

that is,
v = α1 u1 + . . . + αn un . (1.3)

Suppose that the transition matrix from B to B' is of the form

\[ P = \begin{pmatrix} p_{11} & p_{12} & \dots & p_{1n} \\ p_{21} & p_{22} & \dots & p_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ p_{n1} & p_{n2} & \dots & p_{nn} \end{pmatrix}. \]

By definition of a transition matrix, we have

u_1 = p_{11} u'_1 + p_{21} u'_2 + . . . + p_{n1} u'_n
u_2 = p_{12} u'_1 + p_{22} u'_2 + . . . + p_{n2} u'_n
  ⋮                                                                    (1.4)
u_n = p_{1n} u'_1 + p_{2n} u'_2 + . . . + p_{nn} u'_n .

Substituting (1.4) back into (1.3) we obtain

v = α_1 (p_{11} u'_1 + p_{21} u'_2 + . . . + p_{n1} u'_n ) + . . . + α_n (p_{1n} u'_1 + p_{2n} u'_2 + . . . + p_{nn} u'_n )
  = (p_{11} α_1 + p_{12} α_2 + . . . + p_{1n} α_n ) u'_1 + . . . + (p_{n1} α_1 + p_{n2} α_2 + . . . + p_{nn} α_n ) u'_n .

This means that

\[ [v]_{B'} = \begin{pmatrix} p_{11}\alpha_1 + p_{12}\alpha_2 + \dots + p_{1n}\alpha_n \\ p_{21}\alpha_1 + p_{22}\alpha_2 + \dots + p_{2n}\alpha_n \\ \vdots \\ p_{n1}\alpha_1 + p_{n2}\alpha_2 + \dots + p_{nn}\alpha_n \end{pmatrix}
 = \begin{pmatrix} p_{11} & p_{12} & \dots & p_{1n} \\ p_{21} & p_{22} & \dots & p_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ p_{n1} & p_{n2} & \dots & p_{nn} \end{pmatrix}
   \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_n \end{pmatrix} = P[v]_B. \]

This completes the proof of the theorem.

Example 1.23 (change of coordinate system). The change of coordinate systems plays an
important role in many real applications such as computer graphics and global positioning
systems (GPS). For example, let U = R2, let B = {e1 , e2 } be the standard basis (coordinate
system) of U and let B' = {v1 , v2 } be another coordinate system where

\[ v_1 = \tfrac{1}{2} e_1 + \tfrac{\sqrt{3}}{2} e_2, \qquad v_2 = -\tfrac{\sqrt{3}}{2} e_1 + \tfrac{1}{2} e_2. \]

Thus B' is obtained from B by a rotation counterclockwise through an angle π/3 about
the origin (see Example 1.2). The transition matrix from B' to B is

\[ P = \begin{pmatrix} \tfrac{1}{2} & -\tfrac{\sqrt{3}}{2} \\[2pt] \tfrac{\sqrt{3}}{2} & \tfrac{1}{2} \end{pmatrix}. \]

Now let v = v1 + v2 , that is, the coordinate vector of v with respect to B' is

\[ [v]_{B'} = \begin{pmatrix} 1 \\ 1 \end{pmatrix}. \]

Then the coordinate vector of v with respect to the standard basis B is

\[ [v]_B = P[v]_{B'} = \begin{pmatrix} \tfrac{1}{2} & -\tfrac{\sqrt{3}}{2} \\[2pt] \tfrac{\sqrt{3}}{2} & \tfrac{1}{2} \end{pmatrix} \begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} \tfrac{1-\sqrt{3}}{2} \\[2pt] \tfrac{1+\sqrt{3}}{2} \end{pmatrix}. \]

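A small numerical check of Example 1.23 (added here as an illustration, assuming numpy):

```python
import numpy as np

# Transition matrix from B' to B for the rotation through pi/3.
P = np.array([[0.5,            -np.sqrt(3) / 2],
              [np.sqrt(3) / 2,  0.5           ]])

v_Bprime = np.array([1.0, 1.0])  # coordinates of v = v1 + v2 in the basis B'
v_B = P @ v_Bprime               # coordinates of the same vector in the standard basis
print(v_B)                       # approximately [(1 - sqrt(3))/2, (1 + sqrt(3))/2]
```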
Theorem 1.24. Let B and B' be two ordered bases of U . Let P be the transition matrix
from B to B' . Then P is invertible and P −1 is the transition matrix from B' to B.

Proof. We first prove the following claim:

Claim: Let A be an n × n matrix over F. Suppose that Aw = 0 for all w ∈ Fn . Then


A = 0.


Indeed, let ej , j = 1, . . . , n, be the j-th unit vector of Fn , that is, ej = (0, . . . , 0, 1, 0, . . . , 0)T
where the 1 is in the j-th position. According to the hypothesis, Aej = 0 for all j = 1, . . . , n.
Since Aej is the j-th column of A, all columns of A are zero, thus A = 0.

Now we will use this claim to prove the statement of the Theorem. Let Q be the
transition matrix from B 0 to B. Let v ∈ U be an arbitrary vector in U . By applying
Theorem 1.22 twice, the first time for P and the second one for Q, we have

[v]B 0 = P [v]B = P (Q[v]B 0 ) = (P Q)[v]B 0 ,

where the last equality is due to the associativity of matrix multiplication. Since every
element of Fn can be written in the form [v]B 0 for some v ∈ U , the above equation implies
that
w = (P Q)w for all w ∈ Fn ,

or equivalently
(P Q − I)w = 0 for all w ∈ Fn .

According to the claim above, we deduce that P Q = I. Similarly we also obtain QP = I.


Thus P is invertible and P −1 = Q is the transition matrix from B 0 to B. Vice versa, we
also have Q is invertible and Q−1 = P .

Remark 2. Because of Theorem 1.24, some textbooks/sources will define the transition
matrix as being P −1 .

To conclude this section, we include two known theorems that will be useful later on.

Theorem 1.25. Let A and B be n × n matrices.

(i) det(AB) = det(A) det(B);

(ii) If A is invertible, then det(A−1 ) = det(A)−1 ;

(iii) det(A) = det(AT );

(iv) Suppose that A has a block upper-triangular form


 
\[ A = \begin{pmatrix} A_1 & * & \dots & * \\ 0 & A_2 & \dots & * \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & A_k \end{pmatrix}, \]


where Aj is an nj × nj -matrix for j = 1, . . . , k for some integers n1 , . . . , nk satisfying


n1 + . . . + nk = n, each ∗ denotes an arbitrary block of the appropriate dimensions,
and each 0 is the zero matrix of the appropriate dimensions. Then

det(A) = det(A1 ) det(A2 ) . . . det(Ak ).

Proof. See Chapter 8 in the lecture notes J1VGA of Vectors, Geometry & Linear Algebra.

Theorem 1.26. Let A be an n × n matrix over a field F. Then the following statements
are equivalent

(i) det(A) = 0,

(ii) There exists a non-zero w ∈ Fn such that Aw = 0,

(iii) A is not invertible,

(iv) The columns of A are linearly dependent.

Proof. See Corollary 8.34 and Corollary 8.35 in the lecture notes J1VGA of Vectors,
Geometry & Linear Algebra.

1.2 Change of basis and similar matrices

Let F be a field, U be a finite-dimensional vector space over F, dim U = n, and f be


a linear transformation of U . The vector space U often has (infinitely many) different
bases, and with respect to each basis f has a matrix representation. The following theorem provides a
formula for computing the matrix of f when changing the basis.

Theorem 1.27 (change of basis). Let B = {u1 , . . . , un } and B 0 = {u01 , . . . , u0n } be two
ordered bases of U . Let P be the transition matrix from B to B 0 . Let A and A0 be the
matrix of f with respect to B and B 0 respectively. Then

A0 = P AP −1 .

Proof. We need to show that P AP −1 is the matrix of f with respect to B 0 . According to


Theorem 1.13-Part (ii) it is sufficient to show that for all u ∈ U one has

[f (u)]B 0 = (P AP −1 )[u]B 0 . (1.5)


Since P is the transition matrix from B to B 0 , according to Theorem 1.22, we have

[f (u)]B 0 = P [f (u)]B . (1.6)

Since A is the matrix representation of f with respect to B, according to Theorem 1.12,


we have
[f (u)]B = A[u]B . (1.7)

By Theorem 1.24, P −1 is the transition matrix from B 0 to B. Using Theorem 1.22 again
we get
[u]B = P −1 [u]B 0 . (1.8)

Bringing (1.6), (1.7) and (1.8) together we obtain

[f (u)]B 0 = P [f (u)]B = P A[u]B = P AP −1 [u]B 0 = (P AP −1 )[u]B 0 ,

which is (1.5) as required. This finishes the proof of this theorem.

Inspired by the above theorem, we consider the following definition of similarity be-
tween two matrices.

Definition 1.28 (similar matrices). Two n × n-matrices A and B are said to be similar
if there is an invertible matrix P such that B = P AP −1 .
   
Example 1.29. $A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}$ is similar to $B = \begin{pmatrix} -6 & -4 \\ 16 & 11 \end{pmatrix}$, since B = P AP −1 , which
can be verified by direct computations, where

\[ P = \begin{pmatrix} 1 & -1 \\ -1 & 2 \end{pmatrix}. \]

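The similarity claimed in Example 1.29 can be verified directly; the following numpy snippet is an illustrative addition to the notes:

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
P = np.array([[1.0, -1.0], [-1.0, 2.0]])

# Check that P A P^{-1} equals the matrix B of Example 1.29.
B = P @ A @ np.linalg.inv(P)
print(B)  # expected: [[-6. -4.], [16. 11.]]
```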
In subsequent sections, we will see that similar matrices have many interesting prop-
erties in common.

1.3 Diagonal and diagonalisable matrices

Matrices that arise from real applications are often of high dimension. Computations
with such matrices are complicated and costly. For instance, suppose that one wants to
raise a matrix A to a large power, say A1001 . Using usual matrix multiplication will be


extremely hard. In such situations, one wants to simplify complex matrices into simpler
ones. One particular class of matrices that particularly facilitate such tasks are diagonal
matrices.

Definition 1.30 (diagonal matrix). An n × n matrix A is said to be diagonal if all off-
diagonal entries are zeros, that is, aij = 0 whenever i ≠ j. In this case, we write
A = diag(a1 , . . . , an ).

Computations with diagonal matrices are easy:

• addition: diag(a1 , . . . , an ) + diag(b1 , . . . , bn ) = diag(a1 + b1 , . . . , an + bn ),

• scalar multiplication: αdiag(a1 , . . . , an ) = diag(αa1 , . . . , αan ),

• matrix multiplication:

diag(a1 , . . . , an )diag(b1 , . . . , bn ) = diag(a1 b1 , . . . , an bn ).

In particular,
diag(a1 , . . . , an )^2 = diag(a_1^2 , . . . , a_n^2 ),

and more generally, raising diagonal matrices to a high power can be done easily: for any
k ∈ N,
diag(a1 , . . . , an )^k = diag(a_1^k , . . . , a_n^k ).

By combining elementary rules above, we obtain that for any polynomial p:

p(diag(a1 , . . . , an )) = diag(p(a1 ), . . . , p(an )).

Example 1.31. Let A = diag(a1 , . . . , an ) and f (x) = 5x^3 − 4x^2 + 3x + 2. Then

f (A) = 5A^3 − 4A^2 + 3A + 2I = diag(5a_1^3 − 4a_1^2 + 3a_1 + 2, . . . , 5a_n^3 − 4a_n^2 + 3a_n + 2).

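A quick numerical illustration of Example 1.31 for a concrete diagonal matrix (an added sketch, assuming numpy; the values a = (1, 2, 3) are chosen only for this example):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
A = np.diag(a)

# f(A) = 5A^3 - 4A^2 + 3A + 2I acts entrywise on the diagonal of A.
f_A = (5 * np.linalg.matrix_power(A, 3) - 4 * np.linalg.matrix_power(A, 2)
       + 3 * A + 2 * np.eye(3))
print(np.diag(f_A))                     # [  6.  32. 110.]
print(5 * a**3 - 4 * a**2 + 3 * a + 2)  # same values, computed entrywise
```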
Next we are particularly interested in the question whether a matrix A is similar to


some diagonal matrix.

Definition 1.32 (diagonalisable matrix). An n × n matrix A is said to be diagonalisable


if there exists a diagonal matrix D such that A is similar to D. In other words, A is said
to be diagonalisable if there exist a diagonal matrix D and an invertible matrix P such
that A = P DP −1 .


 
Example 1.33. $A = \begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix}$ is diagonalisable since A is similar to the diagonal
matrix $D = \begin{pmatrix} 1 & 0 \\ 0 & 3 \end{pmatrix}$. This is because A = P DP −1 where $P = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$.

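Assuming the matrices of Example 1.33 as reconstructed above, the diagonalisation can be verified numerically (an added sketch):

```python
import numpy as np

A = np.array([[2.0, -1.0], [-1.0, 2.0]])
P = np.array([[1.0,  1.0], [ 1.0, -1.0]])
D = np.diag([1.0, 3.0])

# Verify the diagonalisation A = P D P^{-1}.
print(np.allclose(A, P @ D @ np.linalg.inv(P)))  # True

# High powers become cheap: A^10 = P D^10 P^{-1}.
print(np.allclose(np.linalg.matrix_power(A, 10),
                  P @ np.diag([1.0**10, 3.0**10]) @ np.linalg.inv(P)))  # True
```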
Computations with diagonalisable matrices are also easy: for any polynomial f and
a diagonalisable matrix A with A = P DP −1 where D = diag(d1 , . . . , dn ), we have

f (A) = f (P DP −1 ) = P f (D)P −1 = P diag(f (d1 ), . . . , f (dn ))P −1 .

We also have a corresponding concept of diagonalisability for a linear transformation

Definition 1.34 (diagonalisable linear transformation). A linear transformation f : U →


U is said to be diagonalisable if there exists a basis C of U such that the matrix of f with
respect to C is diagonal.

1.4 Eigenvalues and eigenvectors

Let f : U → U be a linear transformation from a vector space U to itself. A simple


example of such a transformation is the identity transformation: f (u) = u for all u ∈ U .
Another simple example is the dilation that is a multiple of the identity: f (u) = λu for
all u ∈ U for some scalar λ.

Definition 1.35 (eigenvector and eigenvalue of a linear transformation). An eigenvector


v of f is a non-zero vector v ∈ U such that f (v) = λv for some scalar λ. The scalar λ is
called the eigenvalue of f corresponding to v.

Definition 1.36 (eigenspace). Let λ ∈ F. The λ-eigenspace of f is the set

Uλ := {v ∈ U : f (v) = λv}

Theorem 1.37. Let λ ∈ F. The λ-eigenspace of f is a subspace of U . Moreover,


Uλ = ker(λidU − f ) and

λ is an eigenvalue of f ⇐⇒ Uλ 6= {0} ⇐⇒ ker(λidU − f ) 6= {0}.

Above, idU is the identity map on U , that is, idU : U → U satisfies idU (u) = u for
all u ∈ U .


Proof. Let v ∈ U . Then (λidU − f )(v) = λv − f (v). Therefore

v ∈ Uλ ⇔ λv − f (v) = 0 ⇔ (λidU − f )(v) = 0 ⇔ v ∈ ker(λidU − f ).

This means that Uλ = ker(λidU − f ). In particular, it follows from Lemma 1.16 that Uλ is a


subspace of U .

Similarly we also have the concept of an eigenvalue/eigenvector of a square matrix.

Definition 1.38 (eigenvector and eigenvalue of a square matrix). Let A be an n × n matrix
over F. An eigenvector v of A is a non-zero vector v ∈ Fn such that Av = λv for some
λ ∈ F. The scalar λ is called the eigenvalue of A corresponding to v.

Example 1.39.

• A = diag(a1 , . . . , an ) and f (v) = Av, then the basis vectors e1 , . . . , en are eigenvec-
tors for f with eigenvalues a1 , . . . , an respectively.

• f : R2 → R2 with f (u) = Au for all u ∈ R2 where $A = \begin{pmatrix} 1 & 4 \\ 1 & 1 \end{pmatrix}$. Then (2, 1)T is an
eigenvector of f with eigenvalue 3 since

\[ f\begin{pmatrix} 2 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 & 4 \\ 1 & 1 \end{pmatrix}\begin{pmatrix} 2 \\ 1 \end{pmatrix} = \begin{pmatrix} 6 \\ 3 \end{pmatrix} = 3\begin{pmatrix} 2 \\ 1 \end{pmatrix}. \]

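A numerical check of the second item in Example 1.39 (an added sketch, assuming numpy):

```python
import numpy as np

A = np.array([[1.0, 4.0], [1.0, 1.0]])
v = np.array([2.0, 1.0])

print(A @ v)                 # [6. 3.] = 3 * v, so v is an eigenvector with eigenvalue 3
print(np.linalg.eigvals(A))  # the eigenvalues of A are 3 and -1 (order may vary)
```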
The following theorem shows the equivalence between the set of eigenvalues of a linear
transformation and that of its matrix representation with respect to some basis.

Theorem 1.40. Let U be a finite-dimensional vector space and let B be an (ordered) basis


of U . Let f be a linear transformation on U . Suppose that A is the matrix representation
of f with respect to B. Then λ ∈ F is an eigenvalue of f if and only if it is an eigenvalue
of A.

Proof. We have λ ∈ F is an eigenvalue of f if and only if there exists 0 6= v ∈ U such that


f (v) = λv. Since [f (v)]B = A[v]B , we have

f (v) = λv ⇔ [f (v)]B = λ[v]B ⇔ A[v]B = λ[v]B .

Thus λ ∈ F is an eigenvalue of f if and only if it is an eigenvalue of A.


Because of the above theorem, it is often more convenient to work with a matrix
representation of a linear transformation.

Lemma 1.41. Suppose that U is an n-dimensional vector space and suppose that f :
U → U is a linear transformation. Suppose that U has an ordered basis B = (u1 , . . . , un )
such that each uj is an eigenvector of f with eigenvalue λj . Then the matrix A of f with
respect to B is diagonal: A = diag(λ1 , . . . , λn ). Conversely, if B = (u1 , . . . , un ) is a basis
such that the matrix A of f with respect to B is diagonal, A = diag(λ1 , . . . , λn ) then each
uj , j = 1, . . . , n is an eigenvector of f with eigenvalue λj .

Proof. Suppose that uj is an eigenvector of f with eigenvalue λj . Then f (uj ) = λj uj .


Thus [f (uj )]B is the column vector with j th entry equal to λj , and all other entries zero.
Putting all these column vectors together we obtain A = diag(λ1 , . . . , λn ). Conversely,
suppose that the matrix of f with respect to B is diagonal A = diag(λ1 , . . . , λn ). Then
[f (uj )]B = (0, . . . , λj , . . . , 0)T , which implies that f (uj ) = λj uj . Hence uj is an eigenvector
with eigenvalue λj .

The following theorem provides conditions for a square matrix to be diagonalisable


and how to diagonalise it.

Theorem 1.42. Let A be an n × n matrix and suppose that (v1 , . . . , vn ) is an ordered


basis of Rn such that each vj is an eigenvector of A with eigenvalue λj , j = 1, . . . , n. Then
we have
A = Qdiag(λ1 , . . . , λn )Q−1 ,

where Q is the n × n-matrix with columns v1 , . . . , vn :

Q = (v1 , . . . , vn ).

We will provide two proofs using different knowledge studied previously. The first
proof will use Theorem 1.27 on the formula for a change of basis. In this proof, one
combines matrix representation, transition matrix, with the recently introduced concepts
of eigenvectors and eigenvalues. The second proof directly applies matrix multiplications.

First proof. We will use Theorem 1.27. Let B 0 = (v1 , . . . , vn ) be the ordered basis of Rn
given in the hypothesis of the theorem. Let B = (e1 , . . . , en ) be the standard basis of Rn .


The transition matrix from B 0 to B is

P = ([v1 ]B , . . . , [vn ]B ) = (v1 , . . . , vn ) = Q.

Let f be a linear transformation determined by

f (v) = Av for all v ∈ Rn .

Since B is the standard basis of Rn , we have [w]B = w for all w ∈ Rn . Thus

[f (v)]B = f (v) = Av = A[v]B for all v ∈ Rn ,

which implies that A = $[f]^B_B$ , that is, A is the matrix of f with respect to the basis B. We
now compute the matrix of f with respect to B' . We have

f (vj ) = Avj = λj vj ,

which implies that $[f]^{B'}_{B'}$ = diag(λ1 , . . . , λn ). By applying Theorem 1.27 we obtain

A = Q diag(λ1 , . . . , λn ) Q−1

as claimed.

Second proof. Recalling that Q = (v1 . . . vn ) where Avj = λj vj for 1 ≤ j ≤ n, then we


have

AQ = A(v1 . . . vn ) = (Av1 . . . Avn ) = (λ1 v1 . . . λn vn ) = (v1 . . . vn )diag(λ1 , . . . , λn )

= Qdiag(λ1 , . . . , λn ),

which means that AQ = Qdiag(λ1 , . . . , λn ). The above computations follow just from
the definition of matrix multiplications (remembering that v1 , . . . , vn are column vectors).
Since {v1 , . . . , vn } forms a basis of Rn , they are linearly independent. Thus Q is invertible.
Therefore, we get A = Q diag(λ1 , . . . , λn )Q−1 as claimed.

Theorem 1.43. Let A be an n × n matrix. Let v1 , . . . , vm ∈ Fn be eigenvectors of A


with eigenvalues λ1 , . . . , λm respectively. Suppose that λ1 , . . . , λm ∈ F are distinct. Then
v1 , . . . , vm are linearly independent.

Proof. We prove by contradiction: suppose that v1 , . . . , vm are linearly dependent. Then


there are some scalars α1 , . . . , αm that are not all equal to zero such that

α1 v1 + . . . + αm vm = 0. (1.9)


Without loss of generality we assume that α1 is non-zero. Applying (A − λm I) to both


sides of (1.9) and using the fact that A − λm I is linear we obtain

α1 (A − λm I)v1 + α2 (A − λm I)v2 + . . . + αm (A − λm I)vm = 0. (1.10)

Since vj is an eigenvector with eigenvalue λj , we have, for j = 1, . . . , m,

(A − λm I)vj = Avj − λm vj = λj vj − λm vj = (λj − λm )vj .

In particular, (A − λm I)vm = 0. Equation 1.10 becomes

α1 (λ1 − λm )v1 + . . . + αm−1 (λm−1 − λm )vm−1 = 0.

By applying both sides of this equation with (A − λm−1 I) and using again the argument
above we eliminate vm−1 and get

α1 (λ1 − λm )(λ1 − λm−1 )v1 + . . . + αm−2 (λm−2 − λm )(λm−2 − λm−1 )vm−2 = 0.

Repeating this process we can eliminate vm−2 , . . . , v2 and finally get

α1 (λ1 − λm )(λ1 − λm−1 ) . . . (λ1 − λ2 )v1 = 0.

By the hypothesis the λi are all distinct, thus (λ1 − λm )(λ1 − λm−1 ) . . . (λ1 − λ2 ) is non-zero.
Since α1 is also non-zero, we must have v1 = 0. This contradicts the fact that v1 is
an eigenvector. Hence the assumption that v1 , . . . , vm are linearly dependent cannot be
true, that is, they are linearly independent.

Corollary 1.44. Let λ1 , . . . , λm ∈ F be distinct and let Uλ1 , . . . , Uλm be the corresponding
eigenspaces. If Cj is a basis of Uλj for each j = 1, . . . , m, then the set

C1 ∪ C2 ∪ . . . ∪ Cm

is linearly independent.

Proof. Suppose that Cj = {w1,j , . . . , wdj ,j } for each j = 1, . . . , m. Then we have

C1 ∪ C2 ∪ . . . ∪ Cm = {wi,j : 1 ≤ j ≤ m, 1 ≤ i ≤ dj }.

Suppose that

\[ \sum_{j=1}^{m} \sum_{i=1}^{d_j} a_{ij} w_{ij} = 0 \tag{1.11} \]

for some scalars aij ∈ F. We need to show that all aij are zero. For each j = 1, . . . , m,
define

\[ v_j = \sum_{i=1}^{d_j} a_{ij} w_{ij}. \]

Since Cj is a basis of Uλj and Uλj is a subspace of U , we have vj ∈ Uλj . Equation (1.11)
becomes

\[ \sum_{j=1}^{m} v_j = 0. \]

If at least one of v1 , . . . , vm is non-zero then the equality above means that we have a
linearly dependent set of eigenvectors with distinct eigenvalues, which contradicts
Theorem 1.43. Hence vj = 0 for all j = 1, . . . , m. By definition of vj , we obtain

\[ \sum_{i=1}^{d_j} a_{ij} w_{ij} = 0. \]

Since Cj = {wij : 1 ≤ i ≤ dj } is a basis of Uλj , it is linearly independent. Therefore, we


must have aij = 0 for all 1 ≤ j ≤ m and 1 ≤ i ≤ dj , which implies that C1 ∪ C2 ∪ . . . ∪ Cm
is linearly independent as claimed.

Lemma 1.41 and Corollary 1.44 provide a strategy for diagonalising a linear transfor-
mation f : first find all the eigenvalues λ1 , . . . , λm of f , then find a basis Cj for each
eigenspace Uλj and consider C = C1 ∪ C2 ∪ . . . ∪ Cm . Then by Corollary 1.44 C is linearly
independent, and if the size of C is n then C is a basis of U , with respect to which f is
represented by a diagonal matrix according to Lemma 1.41.

1.5 Characteristic polynomial

Let A be the matrix of a linear transformation f : U → U with respect to a fixed


basis B. According to Theorem 1.37, λ ∈ F is an eigenvalue of f if and only if ker(λidU −
f ) 6= {0}. The matrix of λidU − f with respect to B is λIn − A. By Theorem 1.26,
ker(λIn − A) 6= {0} ⇐⇒ det(λIn − A) = 0. Therefore, λ ∈ F is an eigenvalue of f if and
only if det(λIn − A) = 0.

Definition 1.45 (characteristic polynomial). The characteristic polynomial of an n × n-


matrix A is the polynomial χA (λ) = det(λIn − A).

By the argument preceding the definition, we have


Theorem 1.46. Let λ ∈ F. Then λ is an eigenvalue of A if and only if χA (λ) = 0, i.e.,
if and only if λ is a root of the characteristic polynomial χA of A.

Remark 3. Note that, since det(λI −A) = (−1)n det(A−λI), det(λI −A) and det(A−λI)
have the same roots. Because of this reason, the characteristic polynomial of A can be
alternatively defined as det(A − λI) as in some literature.

Example 1.47. Let F = C and

\[ A = \begin{pmatrix} 7 & 10 \\ -5 & -8 \end{pmatrix}. \]

The characteristic polynomial of A is

\[ \chi_A(\lambda) = \det\begin{pmatrix} \lambda - 7 & -10 \\ 5 & \lambda + 8 \end{pmatrix} = (\lambda - 7)(\lambda + 8) + 50 = \lambda^2 + \lambda - 6 = (\lambda - 2)(\lambda + 3). \]

It follows that the eigenvalues of A are λ1 = 2 and λ2 = −3. To find eigenvectors
corresponding to λ1 we solve the equation

\[ \begin{pmatrix} 7 & 10 \\ -5 & -8 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = 2 \begin{pmatrix} x \\ y \end{pmatrix}, \]

i.e., we have to solve the system of equations

7x + 10y = 2x;        −5x − 8y = 2y.

This system is equivalent to x = −2y. Thus Uλ1 = {(x, y)T ∈ C2 : x = −2y}, which has a
basis (−2, 1)T . Similarly, to find eigenvectors corresponding to λ2 , we solve

\[ \begin{pmatrix} 7 & 10 \\ -5 & -8 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = -3 \begin{pmatrix} x \\ y \end{pmatrix}, \]

which is equivalent to

7x + 10y = −3x;        −5x − 8y = −3y.

Solving this system gives x = −y. Thus Uλ2 = {(x, y)T ∈ C2 : x = −y}, which has a basis
(1, −1)T .

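The eigenvalues and eigenvectors found in Example 1.47 can be checked numerically (an added sketch; np.linalg.eig may return the eigenvalues in a different order and rescales the eigenvectors):

```python
import numpy as np

A = np.array([[7.0, 10.0], [-5.0, -8.0]])

# Eigenvalues should be 2 and -3, the roots of lambda^2 + lambda - 6.
eigvals, eigvecs = np.linalg.eig(A)
print(eigvals)

# Each column of eigvecs is a (normalised) eigenvector; it is proportional
# to (-2, 1)^T for lambda = 2 and to (1, -1)^T for lambda = -3.
print(eigvecs)
```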
Example 1.48. Let

\[ A = \begin{pmatrix} 2 & 3 \\ -3 & -2 \end{pmatrix}. \]

The characteristic polynomial of A is

\[ \chi_A(\lambda) = \det\begin{pmatrix} \lambda - 2 & -3 \\ 3 & \lambda + 2 \end{pmatrix} = (\lambda - 2)(\lambda + 2) + 9 = \lambda^2 + 5. \]

If F = R, χA has no roots. Thus A has no eigenvalues over R. If F = C then χA has two
roots λ1 = −√5 i and λ2 = √5 i. Thus A has two eigenvalues λ1,2 over C. This example
shows that it is essential to remember the field that one is working over.

Theorem 1.49. The characteristic polynomials of similar matrices are equal.

Proof. Suppose that A and B are similar matrices, so that B = P −1 AP for some invertible
matrix P . We recall the following property of a determinant: for square matrices of equal
size A1 , A2 :
det(A1 A2 ) = det(A1 ) det(A2 ),

and in particular, if A1 is invertible then det(A1^{−1} ) det(A1 ) = 1. We have

χB (λ) = det(λIn − B) = det(λIn − P −1 AP ) = det(P −1 λIn P − P −1 AP )

= det(P −1 (λIn − A)P ) = det(P −1 ) det(λIn − A) det(P ) = det(λIn − A)

= χA (λ).

Definition 1.50 (trace of a matrix). Let A = (aij ) be an n × n matrix over a field F.
The trace of A, tr(A), is defined to be the sum of the diagonal entries of A:

\[ \mathrm{tr}(A) = \sum_{i=1}^{n} a_{ii} \in F. \]

Theorem 1.51. The characteristic polynomial χA (λ) of an n × n matrix A has the form

\[ \chi_A(\lambda) = \lambda^n - \mathrm{tr}(A)\,\lambda^{n-1} + \dots + (-1)^n \det(A). \]

Proof. This can be proved by induction. See [Tao16, Theorem 4, page 179].

Example 1.52.

• For n = 2, $A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}$, then

\[ \chi_A(\lambda) = \det\begin{pmatrix} a_{11} - \lambda & a_{12} \\ a_{21} & a_{22} - \lambda \end{pmatrix}
 = (a_{11} - \lambda)(a_{22} - \lambda) - a_{12} a_{21}
 = \lambda^2 - (a_{11} + a_{22})\lambda + a_{11} a_{22} - a_{12} a_{21}
 = \lambda^2 - \mathrm{tr}(A)\lambda + \det(A). \]

• For n = 3, $A = \begin{pmatrix} a & 0 & 0 \\ 0 & b & 0 \\ 0 & 0 & c \end{pmatrix}$, then

\[ \chi_A(\lambda) = \det\begin{pmatrix} \lambda - a & 0 & 0 \\ 0 & \lambda - b & 0 \\ 0 & 0 & \lambda - c \end{pmatrix} = (\lambda - a)(\lambda - b)(\lambda - c). \]

• More generally, the characteristic polynomial of a diagonal matrix A = diag(a1 , . . . , an ) is easily
calculated:

\[ \chi_A(\lambda) = (\lambda - a_1) \dots (\lambda - a_n). \]

From now on we assume that F = C (unless otherwise specified). This is due to the
following important theorem.

Theorem 1.53 (Fundamental theorem of algebra). Every non-constant polynomial with real
or complex coefficients has at least one complex root.

Corollary 1.54. Let p = p(z) be a non-zero polynomial of degree n with complex coeffi-
cients. Then p(z) can be written in the form

p(z) = a(z − λ1 )m1 (z − λ2 )m2 . . . (z − λk )mk ,

where 0 6= a ∈ C, λ1 , . . . , λk ∈ C are distinct, and m1 , . . . , mk are positive integers


satisfying m1 + . . . + mk = n.

The fundamental theorem of algebra is one of the most important theorems in Math-
ematics and has a long history. Many proofs have been proposed including those by
d’Alembert in 1746, Euler (1749), de Foncenex (1759), Lagrange (1772), and Laplace


(1795). A rigorous proof was first published by Argand in 1806 and revised in 1813.
For more information on the history of the fundamental theorem of algebra, one can see
[Sma81]. As far as I know (I do not have Argand’s paper), the following well-known proof,
using the extreme value theorem (a real-valued continuous function on a closed and bounded
set must attain a minimum), is very close to the one by Argand.

Proof of Theorem 1.53. Let p(z) = p0 + p1 z + . . . + pn z^n be a polynomial of degree n with
complex coefficients, with neither p0 nor pn equal to zero. Suppose for contradiction that
p has no roots in the complex plane. Define

K = {z ∈ C : |p(z)| ≤ |p0 |}.

Since p is a polynomial, |p(z)| tends to +∞ as |z| goes to infinity. Thus K is a compact
set (that is, K is a closed and bounded set). Since K is compact and |p| is continuous on K,
according to the extreme value theorem, |p| attains a minimum in K and this is also a
global minimum of |p| on the whole complex plane (since outside K we have |p| > |p0 |).
Without loss of generality we can assume that the minimum is attained at the origin (since
otherwise, if it is attained at some point a, we can consider the polynomial p(z + a)), so
that the minimal value is |p(0)| = |p0 |. Thus p is of the following form

p(z) = p0 + pm z^m + . . . + pn z^n,

where pm ≠ 0 is the first nonzero coefficient of p(z) following p0 . We will show that there
exists a point z* such that |p(z*)| < |p0 |, which would contradict the assumption that |p|
attains its global minimum at 0. Define

\[ z^* = r \left( -\frac{p_0}{p_m} \right)^{1/m}, \]

where r is a small positive number which will be chosen later, and (−p0 /pm )^{1/m} denotes
any of the m-th roots of −p0 /pm . Clearly z* ≠ 0. We evaluate the value of p at z*:

\[ p(z^*) = p_0 - p_0 r^m + p_{m+1} r^{m+1} (-p_0/p_m)^{(m+1)/m} + \dots + p_n r^n (-p_0/p_m)^{n/m} = p_0 - p_0 r^m + R, \]

where the remainder term R is given by

\[ R := p_{m+1} r^{m+1} (-p_0/p_m)^{(m+1)/m} + \dots + p_n r^n (-p_0/p_m)^{n/m}. \]


Let s := r (|p0 /pm |)^{1/m} . Using the triangle inequality, we can estimate the remainder as
follows (assuming r is small enough that s < 1):

\begin{align*}
|R| &\le \sum_{i=m+1}^{n} r^i \, |p_i| \, \Big|\frac{p_0}{p_m}\Big|^{i/m} \\
    &\le r^{m+1} \Big|\frac{p_0}{p_m}\Big|^{(m+1)/m} \max_{m+1 \le i \le n} |p_i| \, (1 + s + \dots + s^{n-m-1}) \\
    &= r^{m+1} \Big|\frac{p_0}{p_m}\Big|^{(m+1)/m} \max_{m+1 \le i \le n} |p_i| \, \frac{1 - s^{n-m}}{1-s} \\
    &\le \frac{r^{m+1} \max_{m+1 \le i \le n} |p_i|}{1-s} \Big|\frac{p_0}{p_m}\Big|^{(m+1)/m}.
\end{align*}

The right-hand side is of order r^{m+1} , while |p0 | r^m /2 is of order r^m ; hence for sufficiently
small r (in particular r < 1) we have |R| < |p0 | r^m /2. Thus for such an r, we have

\[ |p(z^*)| = |p_0 - p_0 r^m + R| \le |p_0|(1 - r^m) + |R| < |p_0|(1 - r^m) + |p_0| r^m/2 = |p_0|(1 - r^m/2) < |p_0| = |p(0)|. \]

This is a contradiction with the assumption that |p| attains its global minimum at the
origin.

The number mj in the corollary above is called the multiplicity of the root λj (j =
1, . . . , k). We now apply Corollary 1.54 to the characteristic polynomial χA of an n × n
matrix A:

χA (λ) = (λ − λ1 )^{m1} (λ − λ2 )^{m2} . . . (λ − λk )^{mk} ,        (1.12)

for distinct λ1 , . . . , λk ∈ F. By Theorem 1.46, λ1 , . . . , λk are precisely the eigenvalues of
A. Let Uλ1 , . . . , Uλk be the corresponding eigenspaces. We define the geometric multiplicity
di of the eigenvalue λi , i = 1, . . . , k, to be
di = dim(Uλi ). (1.13)

To distinguish the two multiplicities, we refer to mi as the algebraic multiplicity of the


eigenvalue λi . The following theorem provides a relationship between them.

Theorem 1.55 (Algebraic and geometric multiplicities). Let A be an n × n-matrix with


complex entries. Let mi and di , i = 1, . . . , k, be the algebraic and geometric multiplicities
of the eigenvalue λi that are defined in (1.12) and (1.13) respectively. Then we have

di ≤ mi ∀i = 1, . . . , k.


Proof. We define a linear transformation f : Cn → Cn by f (v) = Av. Let B = {e1 , . . . , en } be
the standard basis of Cn . Then the matrix representation of f with respect to B is A.
Let {v1 , . . . , vdi } be a basis of the eigenspace Uλi . We extend it to a basis

B' = {v1 , . . . , vdi , vdi +1 , . . . , vn }

of Cn . Since {v1 , . . . , vdi } is a basis of Uλi , we have f (vj ) = λi vj for j = 1, . . . , di . Let A'
be the matrix of f with respect to B' . Then A' has the form

\[ A' = \begin{pmatrix}
\lambda_i & 0 & \dots & 0 & * & \dots & * \\
0 & \lambda_i & \dots & 0 & * & \dots & * \\
\vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\
0 & 0 & \dots & \lambda_i & * & \dots & * \\
0 & 0 & \dots & 0 & * & \dots & * \\
\vdots & \vdots & & \vdots & \vdots & & \vdots \\
0 & 0 & \dots & 0 & * & \dots & *
\end{pmatrix}, \]

where ∗ indicates an unknown entry. Applying Theorem 1.25 part (iv), we obtain that

χA' (λ) = det(λIn − A' ) = (λ − λi )^{di} p(λ),

for some polynomial p(λ) which is the characteristic polynomial of the bottom right block in A' . By
Theorem 1.27, A and A' are similar and hence by Theorem 1.49 we have χA (λ) = χA' (λ) =
(λ − λi )^{di} p(λ). According to the factorisation (1.12), the biggest power of λ − λi dividing χA (λ) is
(λ − λi )^{mi} . This implies that di ≤ mi as claimed.

Theorem 1.56 (diagonalisability). A matrix is diagonalisable if and only if the algebraic


multiplicity and geometric multiplicity of each eigenvalue are equal.

Proof. Suppose that A has k distinct eigenvalues λ1 , . . . , λk with corresponding algebraic


and geometric multiplicities mi and di . Suppose that the algebraic multiplicity and geo-
metric multiplicity of each eigenvalue are equal, that is mi = di for all 1 ≤ i ≤ k. Then
we choose a basis Ci for each eigenspace Uλi and define C = C1 ∪ . . . ∪ Ck . Then accord-
ing to Corollary 1.44 C is linearly independent. The size of C is |C| = d1 + . . . + dk =
m1 + . . . + mk = n. Therefore C, which consists of eigenvectors of A, is a basis of Fn .
According to Theorem 1.42 A is diagonalisable.


Conversely, suppose that A is diagonalisable, that is there exist a diagonal matrix D


and an invertible matrix P such that A = P DP −1 . A and D have the same characteristic
polynomial and therefore the same eigenvalues with the same algebraic multiplicities.
Moreover for all 1 ≤ i ≤ k the dimensions of the nullspaces of λ_i I − A and λ_i I − D are
equal. Thus the geometric multiplicity di of each λi is the same as the dimension of the
nullspace of λi I − D. Without loss of generality, assume D is of the form

D = diag(λ1 , . . . , λ1 , . . . , λk , . . . , λk )

where λi (1 ≤ i ≤ k) appears mi times. Hence the diagonal matrix λi I − D has exactly


mi zeros in its main diagonal and n − mi non-zeros entries. Thus the dimension of the
nullspace of λi I − D is exactly mi , which is the algebraic multiplicity of λi .

Corollary 1.57. The linear transformation f : U → U is diagonalisable if and only if


d_i = m_i for all i = 1, ..., k. That is, f is diagonalisable if and only if the geometric
multiplicity of each of its eigenvalues is equal to the algebraic multiplicity.

1.6 An algorithm for diagonalisation

Let A be an n × n matrix. Is A diagonalisable? If yes, how to diagonalise A (that


is, how to find a diagonal matrix D and an invertible matrix P such that A = P DP −1 )?
Using Theorem 1.42 and Theorem 1.56, we can answer these questions by the following
steps.

Step 1. Compute the characteristic polynomial χA (λ) and factorise it

    χ_A(λ) = (λ − λ_1)^{m_1} ... (λ − λ_k)^{m_k},

with distinct λ1 , . . . , λk ∈ C.

Step 2. For each j = 1, . . . , k find a basis vj,1 , . . . , vj,dj for the eigenspace Uλj with
dj = dim(Uλj ).

• If dj < mj for some j ∈ {1, . . . , k} then A is not diagonalisable.

• If dj = mj for all j then A is diagonalisable and P is the matrix whose columns are
the basis vectors and D is the diagonal matrix D = diag(λ1 , . . . , λ1 , . . . , λk , . . . , λk )
where each λ_j appears d_j times.
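The steps above can be carried out mechanically in a computer algebra system. The following is a minimal sketch (not part of the original notes) using Python with sympy, an assumed tool, applied to the matrix of Example 1.60 below.

```python
import sympy as sp

A = sp.Matrix([[1, 2, -4], [-10, -11, 20], [-6, -6, 11]])

# Step 1: characteristic polynomial, factorised.
lam = sp.symbols('lambda')
print(sp.factor(sp.det(lam * sp.eye(3) - A)))

# Step 2: compare algebraic and geometric multiplicities of each eigenvalue.
eigenvectors = []
diagonalisable = True
for eigenvalue, alg_mult, basis in A.eigenvects():
    if len(basis) < alg_mult:            # geometric < algebraic multiplicity
        diagonalisable = False
    eigenvectors.extend(basis)

if diagonalisable:
    P = sp.Matrix.hstack(*eigenvectors)  # columns are the eigenvectors
    D = P.inv() * A * P                  # diagonal matrix, so A = P D P^(-1)
    print(P, D)
else:
    print("A is not diagonalisable")
```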


Example 1.58. Let

    A = [ 1  1 ]
        [ 0  1 ].
We have
χA (λ) = (λ − 1)2 .

So A has only one eigenvalue, λ_1 = 1, with algebraic multiplicity m_1 = 2. Let us find
eigenvectors with respect to λ1 . We need to solve
    
    [ 1  1 ] [ x ]   [ x ]
    [ 0  1 ] [ y ] = [ y ],

that is equivalent to the system of equations

x + y = x, y = y.

Hence y = 0 and x is arbitrary,

Uλ1 = {(x, 0) : x ∈ C}.

It follows that the geometric multiplicity d_1 = dim(U_{λ_1}) = 1. Since d_1 < m_1, A is not


diagonalisable.

Example 1.59. Let f : C2 → C2 be the standard linear transformation associated with


the matrix

    A = [  7   10 ]
        [ −5  −8 ],
that is f (v) = Av for all v ∈ C2 . We have already computed in Example 1.47:

χA (λ) = (λ − 2)(λ + 3),

so A has two distinct eigenvalues λ1 = 2 and λ2 = −3. Moreover, Uλ1 has a basis
v1 = (−2, 1)T and Uλ2 has a basis v2 = (1, −1)T . Since m1 = m2 = 1, f is diagonalisable.
The matrix of f with respect to the basis B′ = {v_1, v_2} is the diagonal matrix

    D = [ 2   0 ]
        [ 0  −3 ].


Example 1.60. Let f : C3 → C3 be the standard linear map associated with the matrix
 
    A = [   1    2   −4 ]
        [ −10  −11   20 ]
        [  −6   −6   11 ].

The characteristic polynomial of A is


 
    χ_A(λ) = det(λI_3 − A) = det [ λ−1    −2     4   ]
                                 [  10   λ+11  −20   ]  = (λ + 1)^2 (λ − 3).
                                 [   6     6   λ−11  ]

Hence A has two eigenvalues λ1 = −1 and λ2 = 3 with algebraic multiplicities m1 = 2


and m2 = 1 respectively. This implies that d2 = 1 and d1 is either 1 or 2. Let us find
eigenvectors v = (x1 , x2 , x3 )T corresponding to λ1 . We need to solve
  
    (A + I)v = [   2    2   −4 ] [ x_1 ]
               [ −10  −10   20 ] [ x_2 ] = 0,
               [  −6   −6   12 ] [ x_3 ]

that is,

    2x_1 + 2x_2 − 4x_3 = 0,
    −10x_1 − 10x_2 + 20x_3 = 0,
    −6x_1 − 6x_2 + 12x_3 = 0.

This system is equivalent to the single equation x_1 + x_2 − 2x_3 = 0. Thus

    U_{λ_1} = {(x_1, x_2, (x_1 + x_2)/2)^T : x_1, x_2 ∈ C}
            = {x_1 (1, 0, 1/2)^T + x_2 (0, 1, 1/2)^T : x_1, x_2 ∈ C}
            = ⟨(1, 0, 1/2)^T, (0, 1, 1/2)^T⟩.

Hence d_1 = dim(U_{λ_1}) = 2 and U_{λ_1} has a basis

C1 = {v1 = (1, 0, 1/2)T , v2 = (0, 1, 1/2)T }.

Since d1 = m1 and d2 = m2 , f is diagonalisable. Let us find an eigenvector with eigenvalue


λ2 = 3. We need to solve
  
    (A − 3I)v = [  −2    2   −4 ] [ x_1 ]
                [ −10  −14   20 ] [ x_2 ] = 0.
                [  −6   −6    8 ] [ x_3 ]


One vector satisfying this equation is

    v_3 = (−1, 5, 3)^T.

Thus f is diagonalisable. The matrix of f with respect to the basis

B 0 = {v1 , v2 , v3 }

is the diagonal matrix

    D = [ −1   0  0 ]
        [  0  −1  0 ]
        [  0   0  3 ].

Example 1.61. Let

    A = [ 1  1  0 ]
        [ 0  1  1 ]
        [ 0  0  1 ].
As in Example 1.58, A is non-diagonalisable since

    χ_A(λ) = (λ − 1)^3,    λ_1 = 1,  m_1 = 3,  d_1 = 1 < m_1.

Example 1.62. Let

    A = [ 6   3  −2 ]
        [ 0  −2   0 ]
        [ 4   0  −3 ].
The characteristic polynomial of A is
 
    det(λI − A) = det [ λ−6   −3     2  ]
                      [   0   λ+2    0  ]
                      [  −4    0    λ+3 ]

                = (λ + 2) det [ λ−6    2  ]
                              [  −4   λ+3 ]

                = (λ + 2)[(λ − 6)(λ + 3) + 8]

                = (λ + 2)^2 (λ − 5).


Thus
    λ_1 = −2,  m_1 = 2;    λ_2 = 5,  m_2 = d_2 = 1.

We now compute d1 . Let u = (x, y, z)T ∈ Uλ1 . We have


  
    [ −8  −3  2 ] [ x ]
    [  0   0  0 ] [ y ] = 0.
    [ −4   0  1 ] [ z ]

Solving this system gives z = 4x, y = 0. Therefore,

Uλ1 = {(x, 0, 4x) : x ∈ C} = h(1, 0, 4)T i.

Hence d1 = 1 < m1 . Thus A is non-diagonalisable.

1.7 Some applications of diagonalisation

1.7.1 Systems of difference equations

Let x0 ∈ Rn be a given (initial) vector and A be a given n × n matrix. The equations

xk = Axk−1 for k ≥ 1, (1.14)

is called a system of difference equations. These types of equations appear in many practical
applications, where the vector x_k is often used to model the “state” of a biological, physical
or economic system at time (day/month/year) k, given the initial state x_0. Suppose
that A is diagonalisable. One can solve (1.14) as follows. We have

x1 = Ax0

x2 = Ax1 = A(Ax0 ) = A2 x0

x3 = Ax2 = A(A2 x0 ) = A3 x0
..
.

and in general for all k ≥ 1


xk = Ak x0 .

Thus to find xk , one needs to compute Ak . This can be done by diagonalising A = P DP −1 ,


where D is the diagonal matrix D = diag(λ1 , . . . , λn ) where λ1 , . . . , λn are eigenvalues of


A and P is the matrix formed by corresponding (column) eigenvectors. Then we have

    A^k = P D^k P^{−1},  and thus  x_k = P D^k P^{−1} x_0,

which can be computed easily since D^k = diag(λ_1^k, ..., λ_n^k).
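As a hedged illustration (not part of the original notes), the formula x_k = P D^k P^{−1} x_0 can be evaluated numerically with numpy, assuming A is diagonalisable; the 2 × 2 matrix below is only an example.

```python
import numpy as np

A = np.array([[1.0, 1.0], [1.0, 0.0]])      # example matrix, assumed diagonalisable
x0 = np.array([1.0, 0.0])
k = 10

eigvals, P = np.linalg.eig(A)               # numerically, A = P D P^(-1)
xk = P @ np.diag(eigvals**k) @ np.linalg.inv(P) @ x0   # x_k = P D^k P^(-1) x_0

# sanity check against repeated multiplication
print(xk, np.linalg.matrix_power(A, k) @ x0)
```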

Below we consider a concrete and interesting example.

The Fibonacci sequence: The sequence of Fibonacci numbers Fn are defined re-
cursively as follows: given F0 = 0, F1 = 1 and for n ≥ 2

Fn = Fn−1 + Fn−2 . (1.15)

Thus, the sequence is

0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, . . .

This sequence of numbers appears in nature, for instance in the branching of trees and the
arrangement of leaves on a stem, as well as in biology, for example in modelling a population
of rabbits over the months. The sequence, and particularly the associated golden ratio, also
has many applications in architecture and music.

Theorem 1.63 (Explicit formula for Fibonacci numbers). For n ≥ 1:


    F_n = ((1 + √5)^n − (1 − √5)^n) / (2^n √5).                (1.16)

Proof. There are many possible ways to prove this theorem. Here we will show a proof
using Linear Algebra. For k ≥ 1, we define
 
    u_k = (F_k, F_{k−1})^T.

Thus u_1 = (1, 0)^T and from (1.15) we have

    u_{k+1} = (F_{k+1}, F_k)^T = (F_k + F_{k−1}, F_k)^T = [ 1  1 ] (F_k, F_{k−1})^T = A u_k,
                                                          [ 1  0 ]

where

    A = [ 1  1 ]
        [ 1  0 ].

It follows that for all n ≥ 2,

un+1 = Aun = A2 un−1 = . . . = An u1 ,


that is,

    (F_{n+1}, F_n)^T = A^n (1, 0)^T.                (1.17)
The key now is to calculate A^n, the n-th power of the matrix A. This will be efficiently
done via diagonalisation. First we need to find the eigenvalues of A. The characteristic
polynomial of A is

    p_A(λ) = det(λI − A) = det [ λ−1  −1 ] = λ^2 − λ − 1.
                               [ −1    λ ]

By solving the quadratic equation pA (λ) = 0 we find two eigenvalues of A:


    λ_1 = (1 + √5)/2    and    λ_2 = (1 − √5)/2.
Since the two eigenvalues of A are real and distinct, A is diagonalisable. Next we need
to find a basis which consists of eigenvectors of A. Let v_1 = (x_1, y_1)^T and v_2 = (x_2, y_2)^T be
eigenvectors corresponding to λ_1 and λ_2 respectively, that is

    A (x_1, y_1)^T = λ_1 (x_1, y_1)^T    and    A (x_2, y_2)^T = λ_2 (x_2, y_2)^T.

Solving these equations yields

    (x_1, y_1)^T = (λ_1, 1)^T    and    (x_2, y_2)^T = (λ_2, 1)^T.

According to Theorem 1.42 we have

A = P DP −1 ,

where

    P = [ λ_1  λ_2 ],    P^{−1} = (1/(λ_1 − λ_2)) [  1  −λ_2 ],    D = [ λ_1   0  ]
        [  1    1  ]                              [ −1   λ_1 ]        [  0   λ_2 ].

Thus

    A^n = P D^n P^{−1} = (1/(λ_1 − λ_2)) [ λ_1  λ_2 ] [ λ_1^n    0    ] [  1  −λ_2 ].
                                         [  1    1  ] [  0     λ_2^n  ] [ −1   λ_1 ]
Substituting this formula into (1.17) we get
        
    (F_{n+1}, F_n)^T = (1/(λ_1 − λ_2)) [ λ_1  λ_2 ] [ λ_1^n    0    ] [  1  −λ_2 ] (1, 0)^T
                                       [  1    1  ] [  0     λ_2^n  ] [ −1   λ_1 ]
                     = (1/(λ_1 − λ_2)) (λ_1^{n+1} − λ_2^{n+1},  λ_1^n − λ_2^n)^T.


By equating the corresponding entries, we obtain


    F_n = (λ_1^n − λ_2^n)/(λ_1 − λ_2) = ((1 + √5)^n − (1 − √5)^n)/(2^n √5).

This finishes the proof of the theorem.
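A quick numerical sanity check of formula (1.16) against the recurrence (1.15), written in Python purely as an illustration (the floating-point value is rounded to the nearest integer):

```python
from math import sqrt

def fib_binet(n):
    s5 = sqrt(5)
    return round(((1 + s5)**n - (1 - s5)**n) / (2**n * s5))

a, b = 0, 1                     # F_0, F_1
for n in range(1, 15):
    a, b = b, a + b             # after this step, a equals F_n
    assert fib_binet(n) == a    # formula (1.16) agrees with the recurrence (1.15)
print("ok")
```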

One can derive many interesting properties of Fibonacci numbers using the explicit
formula obtained in Theorem 1.63. The next proposition states a connection between the
so-called golden ratio and the Fibonacci numbers.

Proposition 1.64 (The golden ratio).

    lim_{n→∞} F_{n+1}/F_n = (1 + √5)/2.

The ratio above is known as the golden ratio which appears ubiquitously in architec-
ture and designs.

Proof. Using the explicit formula for F_n given in Theorem 1.63 we get

    lim_{n→∞} F_{n+1}/F_n
        = lim_{n→∞} [((1 + √5)^{n+1} − (1 − √5)^{n+1})/(2^{n+1} √5)] · [2^n √5 / ((1 + √5)^n − (1 − √5)^n)]
        = (1/2) lim_{n→∞} ((1 + √5)^{n+1} − (1 − √5)^{n+1}) / ((1 + √5)^n − (1 − √5)^n)
        = (1/2) lim_{n→∞} ((1 + √5) − (1 − √5)((1 − √5)/(1 + √5))^n) / (1 − ((1 − √5)/(1 + √5))^n)
        = (1 + √5)/2,

where to obtain the last equality we have used the fact that

    lim_{n→∞} ((1 − √5)/(1 + √5))^n = 0,

which holds because |(1 − √5)/(1 + √5)| < 1. This finishes the proof of this proposition.

Using the method in the proofs of Theorem 1.63 and Proposition 1.64, one can find
explicit formulas and properties for similar recursively defined sequences.


1.7.2 Systems of differential equations

We know that the solution to the following ordinary differential equation (ODE)

    dx(t)/dt = a x(t),    x(0) = x_0,

where a, x0 ∈ R are given numbers, is given by

x(t) = eat x0 .

Now suppose we have an n × n system of ODEs

    dx(t)/dt = A x(t),    x(0) = x_0,                (1.18)

where x_0 ∈ R^n and A ∈ R^{n×n} are given. How does one solve this system? Can one define
the exponential of a matrix so that

x(t) = eAt x0 ?

We recall that the exponential function can be defined by the following power series

    e^x := Σ_{i=0}^{∞} x^i/i! = 1 + x + x^2/2 + x^3/6 + ... .

In the above formula, the right-hand side consists of powers of x. We already know how
to define powers of a matrix. Inspired by this, we can define the exponential of a matrix as
follows.

Definition 1.65 (exponential of a matrix). The exponential of a matrix A is defined as


the sum of the infinite series

    exp(A) := Σ_{i=0}^{∞} A^i/i!.

When A is diagonalisable, say A = P D P^{−1} with D = diag(d_1, ..., d_n), one can find exp(A)
using the formula exp(A) = P exp(D) P^{−1} with exp(D) = diag(e^{d_1}, ..., e^{d_n}). Once the exponential function of a matrix has been
defined, one can solve the system (1.18): its solution is given by x(t) = exp(At) x_0.
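As a hedged sketch (assuming scipy is available), the solution x(t) = exp(At) x_0 can be evaluated numerically; the matrix below is the one used in Example 1.66 that follows, while the initial condition is only an assumed illustration.

```python
import numpy as np
from scipy.linalg import expm   # matrix exponential

A = np.array([[0.0, 1.0], [1.0, 0.0]])   # the matrix of Example 1.66 below
x0 = np.array([1.0, 2.0])                # an assumed initial condition

for t in (0.0, 0.5, 1.0):
    print(t, expm(A * t) @ x0)           # x(t) = exp(At) x0
```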

Example 1.66. Solve the following system of ODEs



    x′(t) = y(t),    x(0) = x_0,
    y′(t) = x(t),    y(0) = y_0.


Let

    A = [ 0  1 ],    x(t) := (x(t), y(t))^T,    x(0) = (x_0, y_0)^T.
        [ 1  0 ]

Then the above system can be written in matrix form as in (1.18):

    dx(t)/dt = A x(t),    x(0) = x_0.                (1.19)

To find the exponential of A we find its eigenvalues and corresponding eigenvectors. A


has two eigenvalues λ1 = 1 and λ2 = −1 with corresponding eigenvectors
   
    u_1 = (1, 1)^T,    u_2 = (1, −1)^T.

Thus A can be diagonalised as


   −1
1 1 1 0 1 1
A=    .
1 −1 0 −1 1 −1

Hence the exponential matrix e^{tA} is given by

    e^{tA} = [ 1   1 ] [ e^t    0     ] [ 1   1 ]^{−1}
             [ 1  −1 ] [ 0    e^{−t}  ] [ 1  −1 ]

           = [ 1   1 ] [ e^t    0     ] (1/2) [ 1   1 ]
             [ 1  −1 ] [ 0    e^{−t}  ]       [ 1  −1 ]

           = (1/2) [ e^t    e^{−t} ] [ 1   1 ]
                   [ e^t   −e^{−t} ] [ 1  −1 ]

           = (1/2) [ e^t + e^{−t}    e^t − e^{−t} ]
                   [ e^t − e^{−t}    e^t + e^{−t} ].

Hence the solution to the system (1.19) is given by

    x(t) = (x(t), y(t))^T = e^{tA} x_0 = (1/2) [ e^t + e^{−t}    e^t − e^{−t} ] (x_0, y_0)^T
                                               [ e^t − e^{−t}    e^t + e^{−t} ]

         = (1/2) ( x_0(e^t + e^{−t}) + y_0(e^t − e^{−t}),  x_0(e^t − e^{−t}) + y_0(e^t + e^{−t}) )^T.


1.8 Jordan canonical form

As we have seen in the previous sections and examples, not every matrix is diagonalisable.
In this section we introduce a form more general than the diagonal one, namely the Jordan
canonical form, in which every matrix can be represented.

Definition 1.67 (Jordan block). Let λ ∈ C and ℓ ∈ N. The Jordan λ-block of size ℓ is
the ℓ × ℓ matrix

    J_ℓ(λ) = [ λ  1  0  ...  0  0 ]
             [ 0  λ  1  ...  0  0 ]
             [ .  .  .       .  . ]
             [ 0  0  0  ...  λ  1 ]
             [ 0  0  0  ...  0  λ ].

Thus J_ℓ(λ) is a bidiagonal matrix whose entries are equal to λ on the main diagonal
and to 1 on the diagonal above. For instance

    J_1(λ) = (λ),    J_2(λ) = [ λ  1 ],    J_3(λ) = [ λ  1  0 ]
                              [ 0  λ ]              [ 0  λ  1 ]
                                                    [ 0  0  λ ].

Theorem 1.68 (Jordan canonical form). Let f : U → U be a linear transformation where


U is an n-dimensional vector space over C. Then there exists a basis C of U such that
the matrix J of f with respect to C is a block-diagonal matrix made of Jordan blocks, i.e.
 
    J = [ J_{ℓ_1}(λ_1)       0        ...       0        ]
        [      0        J_{ℓ_2}(λ_2)  ...       0        ]
        [      .              .        .        .        ]
        [      0              0        ...  J_{ℓ_k}(λ_k) ]

for some natural numbers ℓ_1, ..., ℓ_k and some λ_1, ..., λ_k ∈ C (not necessarily distinct).
Moreover the matrix J, which is called the Jordan canonical form of f, is uniquely
determined by f up to permutation of the Jordan blocks.

A proof of Theorem 1.68 is technical and is out of the scope of this module. For
further information, see for instance [Str06, Appendix B]. The Jordan canonical form


plays an important role in solving linear systems for two reasons: (1) it has a
very simple structure (upper triangular, with 1s only just above the diagonal) and (2) by
Theorem 1.68 it exists for every square matrix. For instance, for the matrix A
in Example 1.62 (recall that A is non-diagonalisable)
 
    A = [ 6   3  −2 ]
        [ 0  −2   0 ]
        [ 4   0  −3 ],

we have A = SJS −1 where


   
    S = [ 1    1    2 ],    J = [ −2   1  0 ]
        [ 0  −7/3   0 ]        [  0  −2  0 ]
        [ 4    0    1 ]        [  0   0  5 ].
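For completeness, a Jordan decomposition can also be computed symbolically; the sketch below uses sympy (an assumed tool, not part of the notes) on the matrix above. The transformation matrix it returns may differ from S, since the decomposition is only unique up to the choice of basis vectors.

```python
import sympy as sp

A = sp.Matrix([[6, 3, -2], [0, -2, 0], [4, 0, -3]])
P, J = A.jordan_form()                   # A = P * J * P**(-1)
print(J)                                 # a 2x2 Jordan block for -2 and a 1x1 block for 5
print(sp.simplify(P * J * P.inv() - A))  # zero matrix
```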

Chapter 2

Bilinear forms and inner products

In previous chapters, we have studied vector spaces and various important concepts
built on the two basic operations of a vector space (addition and
scalar multiplication), such as bases, linear transformations, eigenvalues and eigenvectors
and diagonalisation. In this chapter, we will study another important operation with
vectors, namely inner products. Via inner products, we will be able to study geometrical
properties of vectors such as orthogonality and length of a vector. The following topics
will be covered:

• bilinear forms,

• the Gram matrix of a bilinear form,

• orthogonality,

• inner products,

• length of a vector,

• Pythagoras’s theorem and Cauchy-Schwartz inequality,

• Gram-Schmidt orthogonalisation.

At the end of this chapter, students should be able to

• determine whether a map is a bilinear form,

• compute the Gram matrix of a bilinear form,


• compute the length of a vector,

• apply Pythagoras’s theorem and Cauchy-Schwartz inequality,

• construct an orthogonal basis of a vector space by performing the Gram-Schmidt


orthogonalisation process.

[Tao16, pages 175-221], [Str16, Chapter 4] and [KW98, Chapters 3-5] contain relevant
material to this chapter that you may find helpful.

2.1 Symmetric bilinear forms

Throughout this chapter we consider a vector space U over a field F.

Definition 2.1 (Bilinear form). A bilinear form on U is a map Φ : U × U → F such that


the following two conditions are satisfied

(B1) Φ(u + v, w) = Φ(u, w) + Φ(v, w) and Φ(au, v) = aΦ(u, v) for all u, v, w ∈ U and
a ∈ F.

(B2) Φ(u, v + w) = Φ(u, v) + Φ(u, w) and Φ(u, av) = aΦ(u, v) for all u, v, w ∈ U and
a ∈ F.

In other words, Φ : U × U → F is bilinear if Φ is linear in the first argument when the


second argument is fixed and is linear in the second argument when the first argument is
fixed. Checking (B1) and (B2) is equivalent to checking the identities

Φ(au + v, w) = aΦ(u, w) + Φ(v, w) and (2.1)

Φ(u, av + w) = aΦ(u, v) + Φ(u, w) (2.2)

for all u, v, w ∈ U and a ∈ F. Since Φ is linear in each argument, by Proposition 1.6
we have

1.6 we have
Φ(0, u) = Φ(u, 0) = 0 ∀u ∈ U.


Proposition 2.2. Let Φ be a bilinear form on U . Let u1 , . . . , un , v ∈ U and a1 , . . . , an ∈ F.


Then we have

    Φ(Σ_{i=1}^n a_i u_i, v) = Σ_{i=1}^n a_i Φ(u_i, v),
    Φ(v, Σ_{i=1}^n a_i u_i) = Σ_{i=1}^n a_i Φ(v, u_i).

Definition 2.3 (symmetric bilinear form). A bilinear form Φ : U × U → F is said to be


symmetric if Φ(u, v) = Φ(v, u) for all u, v ∈ U .

If the identity Φ(u, v) = Φ(v, u) is satisfied for all u, v ∈ U then in order to prove
bilinearity of Φ it suffices to check either (B1) or (B2), the other condition follows auto-
matically.

For notational convenience, we often refer to a bilinear form as (−, −) and write (u, v)
when the arguments are explicit.

The standard scalar product on F^n. For u = (x_1, ..., x_n)^T ∈ F^n and
v = (y_1, ..., y_n)^T ∈ F^n, the standard scalar product (u, v) of u and v is defined by

    (u, v) := Σ_{i=1}^n x_i y_i.

It is easy to verify that the standard scalar product is a symmetric bilinear form.

Definition 2.4. Let G = (gij )1≤i,j≤n be an n × n-matrix over F. We say that G is


symmetric if G = GT , i.e. gij = gji for all i, j ∈ {1, . . . , n}.

Bilinear form associated with a matrix. Let G = (gij )1≤i,j≤n be an n × n-matrix


over F. Given u = (x1 , . . . , xn )T ∈ Fn and v = (y1 , . . . , yn )T ∈ Fn , we define
    (u, v)_G := u^T G v = Σ_{i=1}^n Σ_{j=1}^n g_{ij} x_i y_j.                (2.3)

Lemma 2.5. The form on Fn associated with the matrix G defined in (2.3) is a bilinear
form.

Proof. We will verify (2.1) and (2.2). Let u, v, w ∈ Fn and a ∈ F. We have

(au + v, w)G = (au + v)T Gw = a uT Gw + v T Gw = a(u, w)G + (v, w)G .

Thus (2.1) is satisfied. (2.2) is verified similarly.


If G is symmetric, i.e., g_{ij} = g_{ji} for all i, j ∈ {1, ..., n}, then we have

    (u, v)_G = Σ_{i=1}^n Σ_{j=1}^n g_{ij} x_i y_j = Σ_{i=1}^n Σ_{j=1}^n g_{ji} y_j x_i = Σ_{i=1}^n Σ_{j=1}^n g_{ij} y_i x_j = (v, u)_G,

where to obtain the third equality we have switched the roles of indices i and j. Hence if
G is symmetric, so is the bilinear form (·, ·)G .

In particular, when G = I_n (the identity matrix), i.e.,

    g_{ij} = 1 if i = j,  and  g_{ij} = 0 if i ≠ j,

then

    (u, v)_G = Σ_{i=1}^n Σ_{j=1}^n g_{ij} x_i y_j = Σ_{i=1}^n x_i y_i.

Thus (u, v)G becomes the standard scalar product on Fn .


 
Example 2.6. Let F = R and

    G = [ 1  2 ]
        [ 2  1 ].

Then G is symmetric and the (symmetric) bilinear form on R^2 associated with G is given by: for
x = (x_1, x_2)^T, y = (y_1, y_2)^T ∈ R^2:

    (x, y)_G = x_1 y_1 + 2 x_1 y_2 + 2 x_2 y_1 + x_2 y_2.

2.2 The Gram matrix of a bilinear form

Assume that U is a finite-dimensional vector space, dim U = n, over a field F. Let


(·, ·) be a bilinear form on U and B = {u1 , . . . , un } be a basis of U . Like a linear map, all
the information about a bilinear form can also be recorded by a matrix.

Definition 2.7 (The Gram matrix). The Gram matrix of the bilinear form (·, ·) with
respect to the basis B is the n × n-matrix G = (gij )1≤i,j≤n over F that is defined by

gij = (ui , uj ).

The Gram matrix is useful since we can calculate the values of the form at any pair of
arguments using their coordinates. The following theorem transforms a general bilinear
form on U into a bilinear form on Fn that is associated to the Gram matrix G.

Theorem 2.8. Let G be the Gram matrix of the bilinear form (·, ·) with respect to a given
basis B of U . Then for any u, v ∈ U we have

(u, v) = ([u]B )T G[v]B . (2.4)


Proof. Let [u]_B = (x_1, ..., x_n)^T and [v]_B = (y_1, ..., y_n)^T, i.e.

    u = Σ_{i=1}^n x_i u_i    and    v = Σ_{i=1}^n y_i u_i.

Using Proposition 2.2 we compute

    (u, v) = (Σ_{i=1}^n x_i u_i, v) = Σ_{i=1}^n x_i (u_i, v) = Σ_{i=1}^n x_i (u_i, Σ_{j=1}^n y_j u_j)
           = Σ_{i=1}^n Σ_{j=1}^n x_i y_j (u_i, u_j) = Σ_{i=1}^n Σ_{j=1}^n x_i y_j g_{ij} = ([u]_B)^T G [v]_B.

Thus (u, v) = ([u]B )T G[v]B as claimed.

Proposition 2.9. The Gram matrix of a symmetric bilinear form with respect to any
basis of U is symmetric.

Proof. Let (·, ·) be a symmetric bilinear form on U . Let B = {u1 , . . . , un } be an arbitrary


basis of U and G = (gij ) be the Gram matrix of the bilinear form with respect to B.
Since the bilinear form is symmetric, we have

gij = (ui , uj ) = (uj , ui ) = gji for all 1 ≤ i, j ≤ n,

thus G is also symmetric.

Example 2.10. Let U = P2 (R) be the vector space of polynomials of degree at most 2
together with the zero polynomial with real coefficients. We consider a form on U defined
by

    (u, v) = ∫_{−1}^{1} u(x) v(x) dx.                (2.5)

By linearity of integrals, it is easy to check that the above form defines a symmetric bilinear
form on U. In fact, we verify (2.1): for u, v, w ∈ U,

    (au + v, w) = ∫_{−1}^{1} (a u(x) + v(x)) w(x) dx = a ∫_{−1}^{1} u(x) w(x) dx + ∫_{−1}^{1} v(x) w(x) dx
                = a (u, w) + (v, w).

(2.2) is verified analogously. In addition, we have


    (u, v) = ∫_{−1}^{1} u(x) v(x) dx = ∫_{−1}^{1} v(x) u(x) dx = (v, u).


Thus the bilinear form is symmetric. We now compute the Gram matrix of this bilinear
form with respect to the basis B = {1, x, x2 } of U . We have
    g_11 = (1, 1) = ∫_{−1}^{1} 1 dx = 2,
    g_12 = (1, x) = ∫_{−1}^{1} x dx = [x^2/2]_{−1}^{1} = 0,
    g_13 = (1, x^2) = ∫_{−1}^{1} x^2 dx = [x^3/3]_{−1}^{1} = 2/3,
    g_22 = (x, x) = ∫_{−1}^{1} x^2 dx = 2/3,
    g_23 = (x, x^2) = ∫_{−1}^{1} x^3 dx = [x^4/4]_{−1}^{1} = 0,
    g_33 = (x^2, x^2) = ∫_{−1}^{1} x^4 dx = [x^5/5]_{−1}^{1} = 2/5.
By symmetry we get

    g_21 = g_12 = 0,    g_31 = g_13 = 2/3,    g_32 = g_23 = 0.

Thus we obtain

    G = [  2    0   2/3 ]
        [  0   2/3   0  ]
        [ 2/3   0   2/5 ].
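The entries g_ij = (u_i, u_j) can also be obtained by symbolic integration. The following is a small, hedged check of Example 2.10 using Python with sympy (an assumption about the available tools, not part of the notes).

```python
import sympy as sp

x = sp.symbols('x')
basis = [sp.Integer(1), x, x**2]          # the basis B = {1, x, x^2} of P2(R)

# G[i, j] = integral of basis[i] * basis[j] over [-1, 1]
G = sp.Matrix(3, 3, lambda i, j: sp.integrate(basis[i] * basis[j], (x, -1, 1)))
print(G)   # expected: [[2, 0, 2/3], [0, 2/3, 0], [2/3, 0, 2/5]]
```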

2.3 Orthogonality

Let (·, ·) be a symmetric bilinear form on U .

Definition 2.11. We say that vectors u, v ∈ U are orthogonal with respect to (·, ·), and
write u ⊥ v, if (u, v) = 0.

We say that {u1 , . . . , un } is an orthogonal set if ui ⊥ uj for all i 6= j ∈ {1, . . . , n}.

It is important to note that orthogonality is with respect to a given symmetric bilinear


form. For instance on U = R^2, the two vectors

    u = (1, 0)^T,    v = (−2, 1)^T

are orthogonal with respect to the symmetric bilinear form (·, ·)_G associated to the symmetric matrix

    G = [ 1  2 ]
        [ 2  1 ]


since
(u, v)G = uT Gv = 0.

However they are not orthogonal with respect to the standard scalar product (·, ·) on R2
since
(u, v) = −2 6= 0.

One important property of the concept of orthogonality is that it allows us to check


whether an orthogonal set is linearly independent or not.

Theorem 2.12. Let {u1 , . . . , un } be orthogonal vectors in U such that (ui , ui ) 6= 0 for all
i = 1, . . . , n. Then u1 , . . . , un are linearly independent.

Proof. Suppose that there exist α_1, ..., α_n ∈ F such that

    Σ_{i=1}^n α_i u_i = 0.

For each i = 1, ..., n, taking the bilinear form of both sides with u_i and using the fact that
(u_i, u_j) = 0 for all i ≠ j, we get

    0 = (Σ_{j=1}^n α_j u_j, u_i) = Σ_{j=1}^n α_j (u_j, u_i) = α_i (u_i, u_i).

Since (ui , ui ) 6= 0, we get αi = 0 for all i = 1, . . . , n. Thus u1 , . . . , un are linearly


independent.

2.4 Inner products and inner product spaces

From now on we assume that F = R, that is we consider vector spaces over R.

Definition 2.13 (Inner products and inner product spaces). Let (·, ·) be a symmetric
bilinear form on U . We say that (·, ·) is an inner product if it is positive definite, that is
the two following conditions are satisfied

(i) (u, u) ≥ 0 for all u ∈ U ,

(ii) If u ∈ U is such that (u, u) = 0 then u = 0.

The vector space U equipped with an inner product is called an inner product space.


Example 2.14. Let U = R^n and (·, ·) be the standard scalar product, i.e.,

    (u, v) = u^T v = Σ_{i=1}^n u_i v_i    for all u, v ∈ R^n.

Then (·, ·) is symmetric. Furthermore

    (u, u) = Σ_{i=1}^n u_i^2 ≥ 0

and (u, u) = 0 implies that ui = 0 for all i = 1, . . . , n that is u = 0. Hence the standard
scalar product is an inner product and Rn equipped with this inner product is an inner
product space.

We recall the following fact from Analysis: if f : [a, b] → R is a continuous function,
where a < b are real numbers, and f(x) ≥ 0 for all x ∈ [a, b], then

    ∫_a^b f(x) dx ≥ 0,

and moreover, the equality holds if and only if f (x) = 0 for all x ∈ [a, b].

Example 2.15 (Frobenius inner product on matrices). Let A, B be two matrices in


M_{m,n}(R). The Frobenius inner product ⟨A, B⟩_F is defined by

    ⟨A, B⟩_F := Σ_{i=1}^m Σ_{j=1}^n A_{ij} B_{ij}.

The inner product induces the Frobenius norm

    ‖A‖_F = √(⟨A, A⟩_F) = ( Σ_{i=1}^m Σ_{j=1}^n A_{ij}^2 )^{1/2}.

Example 2.16. We recall the bilinear form in Example 2.10: U = P2 (R) with the sym-
metric bilinear form defined in (2.5). Then for u ∈ U we have
    (u, u) = ∫_{−1}^{1} u(x)^2 dx ≥ 0,

and moreover (u, u) = 0 implies that u = 0. Thus the symmetric bilinear form (2.5) is an
inner product and P2 (R) equipped with it becomes an inner product space.

Example 2.17. Let (·, ·) be the bilinear form on R^4 associated with the matrix

    G = diag(1, 1, 1, −1) = [ 1  0  0   0 ]
                            [ 0  1  0   0 ]
                            [ 0  0  1   0 ]
                            [ 0  0  0  −1 ].


Since G is symmetric, the bilinear form is also symmetric. However, it is not positive
definite since, for example with u = (0, 0, 0, 1)T then (u, u) = −1 < 0. This form plays
an important role in Special Relativity, in the context of 4-dimensional space time, with
−1 corresponding to the time coordinate and the 1s corresponding to the three spatial
coordinates.

Theorem 2.18. Let (·, ·) be a symmetric bilinear form on U . Suppose that U has an
orthogonal basis {u1 , . . . , un } with (ui , ui ) > 0 for all i = 1, . . . , n. Then (·, ·) is positive
definite, that is it is an inner product on U .

Proof. Let u ∈ U and suppose that

    u = Σ_{i=1}^n x_i u_i.

For i = 1, ..., n, we define a_i = (u_i, u_i). Then a_i > 0. Using the fact that (u_i, u_j) = 0 for
j ≠ i, we have

    (u_i, u) = (u_i, Σ_{j=1}^n x_j u_j) = Σ_{j=1}^n x_j (u_i, u_j) = x_i (u_i, u_i) = a_i x_i.

Now we compute

    (u, u) = (Σ_{i=1}^n x_i u_i, u) = Σ_{i=1}^n x_i (u_i, u) = Σ_{i=1}^n a_i x_i^2 ≥ 0.

Moreover, (u, u) = 0 implies that xi = 0 for all i = 1, . . . , n, that is u = 0. Hence (·, ·) is


positive definite.

2.5 Length of a vector, Pythagoras’s theorem and


Cauchy-Schwartz inequality

In inner product spaces, we are able to introduce geometrical concepts between vec-
tors such as length of a vector, distance and angle between two vectors. We can also
generalise a well-known theorem, namely Pythagoras’s theorem for right triangles, to a
more general context.

Let U be an inner product space equipped with inner product (·, ·).


Definition 2.19. Let u ∈ U. We define the length ‖u‖ of u by

    ‖u‖ = √((u, u)).

Example 2.20. Let U = R^n be equipped with the standard scalar product. Then the length
of a vector u = (u_1, ..., u_n)^T ∈ R^n is given by the classical formula

    ‖u‖ = √((u, u)) = √(u_1^2 + ... + u_n^2).

Example 2.21. For the inner product (2.5) on P2 (R) in Example 2.10, the length of the
vector 1 + x + x2 is
    ‖1 + x + x^2‖ = √( ∫_{−1}^{1} (1 + x + x^2)^2 dx )
                  = √( ∫_{−1}^{1} (1 + 2x + 3x^2 + 2x^3 + x^4) dx )
                  = √( [x + x^2 + x^3 + x^4/2 + x^5/5]_{−1}^{1} ) = √(22/5).

As in the Euclidean space, Pythagoras’ theorem also applies to orthogonal vectors in


general inner product spaces.

Theorem 2.22 (Pythagoras’ theorem). If u, v ∈ U are orthogonal then

    ‖u + v‖^2 = ‖u‖^2 + ‖v‖^2.

Proof. Since (u, v) = (v, u) = 0, we have

    ‖u + v‖^2 = (u + v, u + v) = (u, u) + (u, v) + (v, u) + (v, v) = (u, u) + (v, v) = ‖u‖^2 + ‖v‖^2

as claimed.

Theorem 2.23 (Cauchy-Schwartz inequality). Let u, v ∈ U . Then

    |(u, v)| ≤ ‖u‖ ‖v‖.

Proof. If v = 0 then the inequality holds true since both sides are equal to 0. We consider
the case v ≠ 0. We define

    λ = (u, v)/‖v‖^2.


Then we have

    0 ≤ ‖u − λv‖^2
      = ‖u‖^2 − 2λ(u, v) + λ^2 ‖v‖^2
      = ‖u‖^2 − 2 (u, v)^2/‖v‖^2 + (u, v)^2/‖v‖^2
      = (‖u‖^2 ‖v‖^2 − (u, v)^2)/‖v‖^2,

which implies that (u, v)^2 ≤ ‖u‖^2 ‖v‖^2, that is, |(u, v)| ≤ ‖u‖ ‖v‖. Moreover, if the equality
holds, then ‖u − λv‖ = 0 and hence u = λv, that is, u and v are linearly dependent. Conversely, if u and v are
linearly dependent (and v ≠ 0) then there exists a scalar a such that u = av. Then we have

    |(u, v)| = |(av, v)| = |a| ‖v‖^2 = ‖u‖ ‖v‖.

In other words, the equality holds if and only if u and v are linearly dependent.

The Cauchy-Schwartz inequality plays an important role in many branches of mathematics.
Below are two typical examples.

Example 2.24. Let U = R^n and u = (u_1, ..., u_n)^T, v = (v_1, ..., v_n)^T; then the Cauchy-
Schwartz inequality becomes

    (Σ_{i=1}^n u_i v_i)^2 ≤ (Σ_{i=1}^n u_i^2)(Σ_{i=1}^n v_i^2).

Example 2.25. Let U = C([a, b], R) be the set of all continuous functions from [a, b] to
R. Then for f, g ∈ U we can define an inner product by
    (f, g) := ∫_a^b f(x) g(x) dx.

The Cauchy-Schwartz inequality becomes

    ( ∫_a^b f(x) g(x) dx )^2 ≤ ( ∫_a^b f(x)^2 dx )( ∫_a^b g(x)^2 dx ).

Definition 2.26 (distance between two vectors). Let (·, ·) be an inner product on a real
vector space V . The distance d(u, v) between two vectors u and v in V is defined by

d(u, v) = ku − vk.


Definition 2.27 (angle between two vectors). Let (·, ·) be an inner product on a real
vector space V . The angle between non-zero vectors u and v in V is the unique element
θ ∈ [0, π] such that
    (u, v) = ‖u‖ ‖v‖ cos θ.

Note that according to the Cauchy-Schwartz inequality

    0 ≤ |(u, v)|/(‖u‖ ‖v‖) ≤ 1,

thus the definition above is well-defined. In particular, u ⊥ v ⟺ θ = π/2.

2.6 Gram-Schmidt orthogonalisation

Suppose that (·, ·) is a symmetric bilinear form on a vector space U over R. Let
u_1, ..., u_n be vectors in U. From these vectors, we wish to construct a family of orthogonal
vectors v_1, ..., v_n such that ⟨v_1, ..., v_n⟩ = ⟨u_1, ..., u_n⟩. The following Gram-Schmidt
orthogonalisation process allows us to do so.

Gram-Schmidt process

    v_1 = u_1,
    v_2 = u_2 − [(u_2, v_1)/(v_1, v_1)] v_1,
    v_3 = u_3 − [(u_3, v_1)/(v_1, v_1)] v_1 − [(u_3, v_2)/(v_2, v_2)] v_2,
    ...
    v_k = u_k − Σ_{i=1}^{k−1} [(u_k, v_i)/(v_i, v_i)] v_i,
    ...
    v_n = u_n − Σ_{i=1}^{n−1} [(u_n, v_i)/(v_i, v_i)] v_i.

Note that in the computations, if (v_k, v_k) = 0 for some 1 ≤ k ≤ n − 1 then we cannot
compute vk+1 since that would require dividing by 0. In this case, the process stops at
the k-th step. If (vk , vk ) 6= 0 for all k ∈ {1, . . . , n − 1}, then we obtain n new vectors


v1 , . . . , vn ∈ U . We emphasize that in the Gram-Schmidt process the symmetric bilinear


form (·, ·) is not required to be an inner product.
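The process above is easy to implement. The following is a minimal numerical sketch in Python/numpy (an illustration, not part of the original notes), for a symmetric bilinear form given by a matrix G as in Section 2.1; taking G to be the identity gives the standard inner product.

```python
import numpy as np

def gram_schmidt(vectors, G):
    """Orthogonalise `vectors` with respect to the bilinear form (u, v) = u^T G v."""
    form = lambda u, v: float(u @ G @ v)
    orthogonal = []
    for u in vectors:
        v = np.array(u, dtype=float)
        for w in orthogonal:
            v = v - (form(u, w) / form(w, w)) * w   # subtract the component along w
        if np.isclose(form(v, v), 0.0):
            print("(v_k, v_k) = 0: the process stops at this step")
            break
        orthogonal.append(v)
    return orthogonal

# Example 2.31 below: the standard inner product on R^2 (G = identity).
print(gram_schmidt([np.array([2.0, 1.0]), np.array([1.0, 1.0])], np.eye(2)))
# expected: [array([2., 1.]), array([-0.2, 0.4])]
```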

Theorem 2.28. Suppose that u1 , . . . , un ∈ U . Assume that the Gram-Schmidt process


constructed above gets to the end. Then v1 , . . . , vn are orthogonal.

Proof. We prove by induction on k, 2 ≤ k ≤ n, that v_1, ..., v_k are orthogonal. For k = 2, we have

    (v_2, v_1) = (u_2 − [(u_2, u_1)/(u_1, u_1)] u_1, u_1) = (u_2, u_1) − [(u_2, u_1)/(u_1, u_1)] (u_1, u_1) = (u_2, u_1) − (u_2, u_1) = 0.
Suppose that the statement holds true up to k − 1 for some 3 ≤ k ≤ n, that is, v_1, ..., v_{k−1}
are orthogonal. We need to prove that v_1, ..., v_k are also orthogonal. Indeed, for any
1 ≤ j < k we have

    (v_k, v_j) = (u_k − Σ_{i=1}^{k−1} [(u_k, v_i)/(v_i, v_i)] v_i, v_j) = (u_k, v_j) − Σ_{i=1}^{k−1} [(u_k, v_i)/(v_i, v_i)] (v_i, v_j)
               = (u_k, v_j) − [(u_k, v_j)/(v_j, v_j)] (v_j, v_j) = (u_k, v_j) − (u_k, v_j) = 0,

where to obtain the third equality we have used the induction hypothesis that (v_i, v_j) = 0
for 1 ≤ i ≠ j ≤ k − 1. So v_k is orthogonal to each of v_1, ..., v_{k−1}. Since v_1, ..., v_{k−1} are
orthogonal, we obtain that v_1, ..., v_k are orthogonal, that is, the statement holds true for
k. Hence the statement holds true for all 2 ≤ k ≤ n.

Theorem 2.29. Suppose that (·, ·) is an inner product on U and that {u_1, ..., u_n} is a
basis of U. Then the Gram-Schmidt process constructed above gets to the end and {v_1, ..., v_n}
forms an orthogonal basis of U.

Proof. We first note that if vj is any vector constructed by the Gram-Schmidt process,
then by the formula defining {vk }1≤k≤j , vj belongs to the span of {u1 , . . . , uj }, i.e., vj ∈
hu1 , . . . , uj i. Hence if vk is constructed during the process, then

hv1 , . . . , vk−1 i ⊂ hu1 , . . . , uk−1 i (2.6)

since vj ∈ hu1 , . . . , uk−1 i for all j = 1, . . . , k − 1. Suppose that the Gram-Schmidt does
not get to the end, that is (vk , vk ) = 0 for some 1 ≤ k ≤ n − 1. Then by definition of the
inner product vk = 0. This implies that
    u_k = Σ_{i=1}^{k−1} [(u_k, v_i)/‖v_i‖^2] v_i.


So uk ∈ hv1 , . . . , vk−1 i. Together with (2.6) this implies that uk ∈ hu1 , . . . , uk−1 i. This
means that u1 , . . . , uk are linearly dependent which is a contradiction since {u1 , . . . , un } is
a basis of U . Therefore vk 6= 0 for all k = 1, . . . , n − 1. So the Gram-Schmidt does get to
the end. In addition using the same argument we deduce that vn 6= 0. Thus vi 6= 0 for all
i ∈ {1, . . . , n}. According to Theorem 2.12, together with the fact that {v1 , . . . , vn } are
orthogonal, we conclude that {v1 , . . . , vn } are linearly independent forming an orthogonal
basis of U .

Given a finite-dimensional space U, Theorem 2.29 provides us with an algorithm for testing
whether a symmetric bilinear form (·, ·) on U is an inner product and for constructing (in some
cases) an orthogonal basis of U.

Step 1. We start with any basis {u1 , . . . , un } of U and apply the Gram-Schmidt
process to this basis. If (v_k, v_k) = 0 for some 1 ≤ k ≤ n then (·, ·) is not an inner product,
since according to Theorem 2.29, if it were an inner product we would have v_k ≠ 0, and hence (v_k, v_k) > 0, for all k = 1, ..., n.

Step 2. Suppose that (vk , vk ) 6= 0 for all k = 1, . . . , n, so the Gram-Schmidt does get
to the end. According to Theorem 2.28 we get an orthogonal basis {v1 , . . . , vn } of U .

• If (vk , vk ) < 0 for some k ∈ {1, . . . , n} then (·, ·) is not positive definite and obviously
is not an inner product.

• If (vk , vk ) > 0 for all k ∈ {1, . . . , n} then according to Theorem 2.18, (·, ·) is an
inner product.

We illustrate the above algorithm by several examples below.

Example 2.30. Let U = R^2 and consider the symmetric bilinear form on U associated
with the matrix

    G = [ 1  2 ]
        [ 2  1 ],

i.e., for u = (x_1, x_2)^T ∈ U and v = (y_1, y_2)^T ∈ U, we have

    (u, v)_G = u^T G v = x_1 y_1 + 2 x_1 y_2 + 2 x_2 y_1 + x_2 y_2.

We want to check whether (·, ·)_G is an inner product on U. We apply the Gram-Schmidt process
to the standard basis {e_1, e_2} of R^2:

    v_1 = e_1 = (1, 0)^T,    v_2 = e_2 − [(e_2, v_1)_G/(v_1, v_1)_G] v_1.


By definition of (·, ·)_G we compute

    (e_2, v_1)_G = (0, 1) [ 1  2 ] (1, 0)^T = 2,
                          [ 2  1 ]

    (v_1, v_1)_G = (1, 0) [ 1  2 ] (1, 0)^T = 1.
                          [ 2  1 ]

Thus

    v_2 = e_2 − 2 v_1 = (0, 1)^T − 2 (1, 0)^T = (−2, 1)^T.

We have

    (v_2, v_2)_G = (−2, 1) [ 1  2 ] (−2, 1)^T = −3 < 0.
                           [ 2  1 ]
Hence (·, ·)G is not an inner product. However, the Gram-Schmidt does get to the end
and we obtain an orthogonal basis (with respect to (·, ·)G ) {v1 , v2 }.

Example 2.31. Consider U = R^2 equipped with the standard inner product. Let

    u_1 = (2, 1)^T,    u_2 = (1, 1)^T.

Then {u_1, u_2} is a non-orthogonal basis of U. We apply the Gram-Schmidt process to these
vectors:

    v_1 = u_1 = (2, 1)^T,    v_2 = u_2 − [(u_2, v_1)/(v_1, v_1)] v_1 = (1, 1)^T − (3/5)(2, 1)^T = (−1/5, 2/5)^T.

Then {v_1, v_2} is an orthogonal basis of U, which can be verified directly.

Example 2.32. As in Examples 2.10 and 2.16, we consider the inner product on U = P_2(R)
given by

    (u, v) = ∫_{−1}^{1} u(x) v(x) dx.

In Example 2.16 we already proved that (·, ·) is an inner product. Using the computations
in Example 2.10, we now apply the Gram-Schmidt process to the standard basis {1, x, x^2} of U:

    v_1 = 1,
    v_2 = x − [(x, 1)/(1, 1)] · 1 = x − (0/2) · 1 = x,
    v_3 = x^2 − [(x^2, x)/(x, x)] x − [(x^2, 1)/(1, 1)] · 1 = x^2 − (0/(2/3)) x − ((2/3)/2) · 1 = x^2 − 1/3.

Then {v_1, v_2, v_3} form an orthogonal basis of U = P_2(R).


Thus the Gram-Schmidt orthogonalisation process provides us with a procedure for constructing
an orthogonal basis of an inner product space from an arbitrary basis. Sometimes,
we also want to have an orthonormal basis, which is an orthogonal basis consisting of unit
vectors.

Definition 2.33. A basis {u1 , . . . , un } of an inner product space U is called an orthonor-


mal basis if it is an orthogonal basis and ui is a unit vector, that is kui k = 1 for all
i = 1, . . . , n.

A typical example of an orthonormal basis is the standard basis {e1 , e2 , e3 } of R3


equipped with the standard scalar inner product since

e1 · e2 = e1 · e3 = e2 · e3 = 0, ke1 k = ke2 k = ke3 k = 1.

Remark 4. To obtain an orthonormal basis from an orthogonal basis, we simply need to


normalise each basis vector by dividing the vector by its length. Thus, if {u1 , . . . , un } is
an orthogonal basis, then
    { u_1/‖u_1‖, ..., u_n/‖u_n‖ }
is an orthonormal basis. Thus by normalising the orthogonal basis obtained from the Gram-
Schmidt orthogonalisation process we can get an orthonormal basis from any arbitrary
basis.

Example 2.34. We recall from Example 2.32 that {v1 , v2 , v3 } where

v1 = 1, v2 = x, v3 = x2 − 1/3,

is an orthogonal basis of the vector space U = P2 (R) with inner product


    (u, v) = ∫_{−1}^{1} u(x) v(x) dx.

We compute the length of each vector v_1, v_2, v_3:

    ‖v_1‖ = √( ∫_{−1}^{1} 1 dx ) = √2,    ‖v_2‖ = √( ∫_{−1}^{1} x^2 dx ) = √(2/3),
    ‖v_3‖ = √( ∫_{−1}^{1} (x^2 − 1/3)^2 dx ) = √(8/45).

Hence we obtain an orthonormal basis for the above inner product space, which is given by

    { v_1/‖v_1‖, v_2/‖v_2‖, v_3/‖v_3‖ } = { 1/√2, x/√(2/3), (x^2 − 1/3)/√(8/45) }.


A great advantage of an orthogonal/orthonormal basis is that it is easy to compute


the coordinate vector of any vector. Instead of solving a system of equations, one only
needs to compute the inner product of the given vector with each of the basis vectors. That
is the content of the following theorem.

Theorem 2.35. Let {u1 , . . . , un } be an orthogonal basis of an inner product space U .


Then for any vector u ∈ U , one can write
    u = Σ_{i=1}^n [(u, u_i)/‖u_i‖^2] u_i.

Moreover,

    ‖u‖^2 = Σ_{i=1}^n ((u, u_i)/‖u_i‖)^2.

Proof. Let u ∈ U. Since {u_1, ..., u_n} is a basis of U, there exist scalars a_1, ..., a_n such that

    u = Σ_{j=1}^n a_j u_j.

Using the orthogonality of {u_1, ..., u_n} we have, for any i ∈ {1, ..., n},

    (u, u_i) = (Σ_{j=1}^n a_j u_j, u_i) = Σ_{j=1}^n a_j (u_j, u_i) = a_i (u_i, u_i),

which implies that a_i = (u, u_i)/(u_i, u_i) = (u, u_i)/‖u_i‖^2. Thus u = Σ_{i=1}^n [(u, u_i)/‖u_i‖^2] u_i. Now we compute

    ‖u‖^2 = (u, u) = (Σ_{i=1}^n a_i u_i, Σ_{j=1}^n a_j u_j) = Σ_{i=1}^n Σ_{j=1}^n a_i a_j (u_i, u_j) = Σ_{i=1}^n a_i^2 ‖u_i‖^2 = Σ_{i=1}^n ((u, u_i)/‖u_i‖)^2.

When {u1 , . . . , un } is an orthonormal basis, since kui k = 1 for all i = 1, . . . , n, the


theorem above becomes

Theorem 2.36. Let {u1 , . . . , un } be an orthonormal basis of an inner product space U .


Then for any vector u ∈ U , one can write
    u = Σ_{i=1}^n (u, u_i) u_i.

Moreover,

    ‖u‖^2 = Σ_{i=1}^n (u, u_i)^2.


The above theorem provides a method to decompose/express a “complex object”


u (such as a complex signal) in terms of (as a linear combination of) simpler objects
u_1, ..., u_n (such as simpler/basic signals). Because of this, orthogonal/orthonormal bases
have many applications, for instance in Fourier analysis, least squares approximation, statistical
analysis of large data sets, and signal, image and video processing. Some of these subjects will
be studied in year 3.

Example 2.37. Let U = R^n be the Euclidean vector space with the standard inner product
and let B = {e_1, ..., e_n} be the standard basis. Since

    (e_i, e_j) = 1 if i = j,  and  (e_i, e_j) = 0 if i ≠ j,

B is an orthonormal basis of R^n. For a vector u = (u_1, ..., u_n)^T ∈ R^n, we already know that

    u = Σ_{i=1}^n u_i e_i = Σ_{i=1}^n (u, e_i) e_i    and    ‖u‖^2 = Σ_{i=1}^n u_i^2.

Theorem 2.36 simply generalises this example to an orthonormal basis for a general inner
product space.

Example 2.38. We recall from Example 2.34 that {u_1, u_2, u_3}, where

    u_1 = 1/√2,    u_2 = x/√(2/3),    u_3 = (x^2 − 1/3)/√(8/45),

is an orthonormal basis for the inner product space P_2(R) with

    (u, v) = ∫_{−1}^{1} u(x) v(x) dx.

Let u = 2x2 − 3x + 4 be an element in P2 (R). Then u can be written as

u = a1 u1 + a2 u2 + a3 u3 ,

where

    a_1 = (u_1, u) = ∫_{−1}^{1} (1/√2)(2x^2 − 3x + 4) dx = 28/(3√2),
    a_2 = (u_2, u) = ∫_{−1}^{1} (x/√(2/3))(2x^2 − 3x + 4) dx = −2/√(2/3) = −√6,
    a_3 = (u_3, u) = ∫_{−1}^{1} ((x^2 − 1/3)/√(8/45))(2x^2 − 3x + 4) dx = (16/45)/√(8/45) = 8/√90.
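These integrals are easy to verify symbolically. The sketch below (using sympy, an assumption about the tools available) recomputes the coefficients and checks that a_1 u_1 + a_2 u_2 + a_3 u_3 recovers u, as Theorem 2.36 predicts.

```python
import sympy as sp

x = sp.symbols('x')
inner = lambda u, v: sp.integrate(u * v, (x, -1, 1))

u1 = 1 / sp.sqrt(2)
u2 = x / sp.sqrt(sp.Rational(2, 3))
u3 = (x**2 - sp.Rational(1, 3)) / sp.sqrt(sp.Rational(8, 45))
u = 2*x**2 - 3*x + 4

a = [sp.simplify(inner(ui, u)) for ui in (u1, u2, u3)]
print(a)                                           # the coefficients a1, a2, a3
print(sp.expand(a[0]*u1 + a[1]*u2 + a[2]*u3 - u))  # 0, the expansion recovers u
```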

Chapter 3

Introduction to Linear Programming

This chapter begins the second part of these lecture notes, which is about
Linear Programming (LP). The chapter provides the key concepts of LP, including its
definition and terminology, and the standard and canonical forms. Some typical and important
practical problems that can be modelled and solved using LP are also discussed. The
following topics will be covered:

• definition of LP,

• terminologies of LP,

• standard forms of LPs,

• canonical forms of LPs,

• LP modelling.

At the end of this chapter, students will be able to

• understand the definition of an LP and the relevant terminology,

• write LPs in standard and canonical forms,

• recognise and formulate some typical problems in real applications into LPs.

[BT97, Chapter 1] and [DT97, Chapter 1] discuss similar topics as in this chapter that
you may find useful.


3.1 What is Linear Programming?

• Mathematical optimization (mathematical programming) is a mathematical method


for determining a way to achieve the best outcome (e.g. maximum profit or lowest
cost) in a given list of requirements (constraints).

• Linear programming (LP) is a specific case of mathematical optimization. It is a


technique for minimizing (or maximizing) a linear objective function, subject to
linear equality and/or linear inequality constraints.

A good understanding of the theory and algorithms for linear programming is essential
for understanding

• nonlinear programming (nonlinear optimization),

• integer programming (integer optimization),

• combinatorial optimization problems,

• game theory,

• big-data sparse representation.

So, the theory of LP serves as a guide and motivating force for developing results for other
mathematical optimization problems with either continuous or discrete variables.

Example 3.1. A small machine shop manufactures two models: standard and deluxe.

• Each standard model requires 2 hours of grinding and 4 hours of polishing.

• Each deluxe model requires 5 hours of grinding and 2 hours of polishing.

• The manufacturer has three grinders and two polishers. Therefore, in a 40-hour week
there are 120 hours of grinding capacity and 80 hours of polishing capacity.

• There is a net profit of £3 on each standard model and £4 on each deluxe model.


To maximize the profit, the manager must decide on the allocation of the available pro-
duction capacity to standard and deluxe models.

Let x1 and x2 be the numbers of standard and deluxe models manufactured, respectively.

For this small machine shop, there are two constraints (restrictions):

• Grinding restriction:
2x1 + 5x2 ≤ 120.

• Polishing restriction:
4x1 + 2x2 ≤ 80.

The variables x1 , x2 are nonnegative:

x_1 ≥ 0,  x_2 ≥ 0.    (Why?)

The net profit is


3x1 + 4x2 .

Therefore, we obtain the following model:

Maximize 3x1 + 4x2


subject to 2x1 + 5x2 ≤ 120,
4x1 + 2x2 ≤ 80,
x1 , x2 ≥ 0.

This is a linear programming problem.
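An LP this small can also be solved directly with an off-the-shelf solver. The following is a hedged sketch using scipy.optimize.linprog (an assumption about the available library, not part of the notes); since linprog minimises, we minimise the negative profit.

```python
from scipy.optimize import linprog

# maximise 3x1 + 4x2  <=>  minimise -3x1 - 4x2
res = linprog(c=[-3, -4],
              A_ub=[[2, 5],    # grinding:  2x1 + 5x2 <= 120
                    [4, 2]],   # polishing: 4x1 + 2x2 <= 80
              b_ub=[120, 80],
              bounds=[(0, None), (0, None)])
print(res.x, -res.fun)         # should report roughly x = (10, 20) with profit 110
```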

3.2 Definition of LP:

Definition 3.2. A linear programming problem is the problem of minimizing or maximiz-


ing a linear function subject to linear constraints. The constraints may be equalities
or inequalities.


Minimize (Maximize) c1 x1 + c2 x2 + ... + cn xn

Subject to a11 x1 + a12 x2 + ... + a1n xn {≤, =, ≥} b1 ,

a21 x1 + a22 x2 + ... + a2n xn {≤, =, ≥} b2 ,


..
.

am1 x1 + am2 x2 + ... + amn xn {≤, =, ≥} bm ,

xj ≥ 0, j = 1, ..., n.

LP in matrix notation: The LP can be written as

Minimize (Maximize) cT x

subject to Ax (≥, =, ≤) b,

x ≥ 0,

where

    A = [ a_11  a_12  ...  a_1n ],    b = (b_1, b_2, ..., b_m)^T,    c = (c_1, c_2, ..., c_n)^T,    x = (x_1, x_2, ..., x_n)^T.
        [ a_21  a_22  ...  a_2n ]
        [  .     .          .   ]
        [ a_m1  a_m2  ...  a_mn ]

All vectors are understood as column vectors, unless otherwise stated.

The problem can be also represented via the columns of A. Denoting

A = [a1 , a2 , ..., an ]

where aj is the jth column of A, the LP can be written as


    Minimize (Maximize)  Σ_{j=1}^n c_j x_j
    subject to           Σ_{j=1}^n a_j x_j (≥, =, ≤) b,
                         x ≥ 0.

3.3 Terminology

• The function
cT x = c1 x1 + ... + cn xn


is called the objective function (criterion function, goal, or target).

• c = (c_1, c_2, ..., c_n)^T is called the cost coefficient vector.

• x1 , x2 , ..., xn are called the decision variables.

• The inequality ‘Ax ≤ b’ is called the constraint, and Σ_{j=1}^n a_ij x_j ≤ b_i denotes the
i-th constraint.

• A is called the coefficient matrix.

• b is called the right-hand-side vector.

• If x = (x1 , x2 , ..., xn )T satisfies all the constraints, it is called a feasible point, or


feasible solution.

• The set of all such points is called the feasible region (or feasible set).

Example 3.3.

Minimize 2x1 + x2

subject to x1 + x2 ≤ 8,

−3x1 + 5x2 ≤ 10,

x1 ≥ 0, x2 ≥ 0.

• What is the matrix form for this LP?

• What is the feasible region for this LP?

For this example, the problem has two decision variables x1 and x2 . The objective function
to be minimized is 2x1 + x2 . The LP problem is to find a point in the feasible region with
the smallest objective value.


3.4 Standard Forms of LPs:

Minimize cT x

subject to Ax = b,

x ≥ 0.

Throughout this course, we call the above minimization problem as a standard form. In
fact, any LP problem can be converted to this form.

Reduction to the standard form:

• An inequality can be easily transformed into an equality by adding a slack vari-


able
dT x ≤ α ⇐⇒ dT x + t = α, t ≥ 0,

dT x ≥ α ⇐⇒ dT x − t = α, t ≥ 0.

In matrix form:
Ax ≤ b ⇐⇒ Ax + s = b, s ≥ 0.

• A ‘free’ variable can be replaced by two nonnegative variables: If xj is unrestricted


in sign, then
    x_j = x_j′ − x_j″,  where x_j′, x_j″ ≥ 0.

• Converting a maximization (minimization) problem into a minimization (max-


imization) problem: Note that over any feasible region, we have

maximum cT x = −minimum (−cT x).

So, by multiplying the coefficients of the objective function by −1, the maximization
problem can be converted into a minimization problem.

• Eliminating upper and lower bounds:

Lj ≤ xj ≤ Uj ⇐⇒ xj ≤ Uj and xj ≥ Lj

which are reduced to the first case.


Example 3.4. Write down the standard form of the following linear programming:

max 4x1 + x2 − x3
s.t. x1 + 3x3 ≤ 6,
3x1 + x2 + 3x3 ≥ 9,
x1 , x2 ≥ 0, x3 is unrestricted in sign.
Example 3.5. Write down the standard form of the following linear programming:

min 3x1 − 4x2 + 2x3


s.t. 5x1 − 3x2 + x3 ≥ 7,
−4x1 + 9x3 ≤ 10,
x1 + x2 − 2x3 = 5,
x1 ≥ 0, x3 ≥ 0.

3.5 Canonical Form of LPs

The LP problem
min cT x
s.t. Ax ≥ b,
x≥0
is called the canonical form of the LP.

Later, we will see that the canonical form is very useful in the duality representation
of LPs. Introducing a slack vector, the canonical form can be easily converted to the
standard form
min cT x
s.t. Ax − s = b
x ≥ 0, s ≥ 0.

Conversely, a standard form can be easily transformed into a canonical form. (Do this as an
exercise.)

3.6 LP Modeling

Linear programming is an extremely powerful tool for addressing a wide range of


practical problems. It is used extensively in business, economics, and engineering prob-
lems. LP models have been widely used in transportation, energy, telecommunications,


manufacturing, production planning, routing, flight crew scheduling, resource allocation,


assignment, and design, to name but a few.

Basic steps in the development of an LP model:

• Determine the decision variables.

• Determine the objective function.

• Determine the explicit constraints.

• Determine the implicit constraints.

Example 3.6 (Production Planning).

• A manufacturing firm has to decide on the mix of Ipads and IPhones to be produced.

• A market research indicated that at most 1000 units of Ipads and 4000 units of
IPhones can be sold per month.

• The maximum number of man-hours available is 50,000 per month.

• To produce one unit, an Ipad requires 20 man-hours and an IPhone requires 15


man-hours.

• The unit profits of the Ipads and IPhones are £300 and £400 respectively.

It is desired to find the numbers of units of Ipads and IPhones that the firm must produce
in order to maximize its profit.

Determine the decision variables. Suppose that the number of the units for Ipads is x1
and the number of the units for IPhones is x2 .

Determine the explicit constraints. There are two kinds of constraints: The market
constraints and man-hours restriction. For the former, we have

x1 ≤ 1000,

x2 ≤ 4000.


For the latter, we have


20x1 + 15x2 ≤ 50, 000.

Determine the objective function. The total profit is given by

300x1 + 400x2 .

Determine the implicit constraints. The number of Ipads and IPhones cannot be neg-
ative, so x1 , x2 ≥ 0.

Therefore, the problem is formulated as the following linear programming problem:

Maximize 300x1 + 400x2

subject to 20x1 + 15x2 ≤ 50, 000,

x1 ≤ 1000,

x2 ≤ 4000,

x1 , x2 ≥ 0.

Example 3.7 (The Workforce Planning Problem). Consider a restaurant that is open
seven days a week. Based on past experience, the number of workers needed on a particular
day is given as follows:

    Day     | Mon | Tue | Wed | Thu | Fri | Sat | Sun
    Number  |  14 |  13 |  15 |  16 |  19 |  18 |  11

Every worker works five consecutive days, and then takes two days off, repeating this
pattern indefinitely. How can we minimize the number of workers that staff the restaurant?

Define the variables: Let the days be numbered 1 through 7 and let x_i be the number
of workers who begin their five-day shift on day i.

Let us consider Monday as an example. People who work on Monday must have started their
shift on Thursday, Friday, Saturday, Sunday or Monday. Thus the total number of workers on Monday is

x1 + x4 + x5 + x6 + x7


which must be at least 14. So we have the following constraint

x1 + x4 + x5 + x6 + x7 ≥ 14.

Similarly, we may consider the other days. Thus, the linear programming model for this
problem is given as follows.

    min   Σ_{i=1}^7 x_i
    s.t.  x_1 + x_4 + x_5 + x_6 + x_7 ≥ 14   (Mon),
          x_1 + x_2 + x_5 + x_6 + x_7 ≥ 13   (Tue),
          x_1 + x_2 + x_3 + x_6 + x_7 ≥ 15   (Wed),
          x_1 + x_2 + x_3 + x_4 + x_7 ≥ 16   (Thu),
          x_1 + x_2 + x_3 + x_4 + x_5 ≥ 19   (Fri),
          x_2 + x_3 + x_4 + x_5 + x_6 ≥ 18   (Sat),
          x_3 + x_4 + x_5 + x_6 + x_7 ≥ 11   (Sun),
          x_i ≥ 0,  i = 1, ..., 7.
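As a hedged illustration (again assuming scipy is available), the model above can be solved numerically; note that the LP as stated does not force the x_i to be integers, so the solver returns the LP optimum, which may be fractional.

```python
import numpy as np
from scipy.optimize import linprog

demand = [14, 13, 15, 16, 19, 18, 11]        # Mon ... Sun
A = np.zeros((7, 7))
for day in range(7):
    for start in range(7):
        if (day - start) % 7 <= 4:           # a shift starting on day `start` covers 5 consecutive days
            A[day, start] = 1

# linprog uses "<=" constraints, so the ">=" rows are multiplied by -1.
res = linprog(c=np.ones(7), A_ub=-A, b_ub=-np.array(demand), bounds=[(0, None)] * 7)
print(res.x, res.fun)                        # optimal shift pattern and total number of workers
```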

Example 3.8 (The Diet Problem).

• There are m different types of food, F1 , . . . , Fm , that supply varying quantities of the
n nutrients, N1 , . . . , Nn , that are essential to good health.

• Let cj be the minimum daily requirement of nutrient Nj .

• Let bi be the price per unit of food Fi .

• Let aij be the amount of nutrient Nj contained in one unit of food Fi .

The problem is to supply the required nutrients at minimum cost.

Let y_i be the number of units of food F_i to be purchased per day.

The cost per day of such a diet is

b1 y1 + b2 y2 + · · · + bm ym . (3.1)


The amount of nutrient Nj contained in this diet is

a1j y1 + a2j y2 + · · · + amj ym

for j = 1, . . . , n.

We need to ensure that all the minimum daily requirements are met, that is,

a1j y1 + a2j y2 + · · · + amj ym ≥ cj , j = 1, . . . , n. (3.2)

Of course, we cannot purchase a negative amount of food. So we have

y1 ≥ 0, y2 ≥ 0, . . . , ym ≥ 0. (3.3)

The problem is to minimize (3.1) subject to (3.2) and (3.3). This is exactly the following
LP problem:

Minimize b1 y1 + b2 y2 + · · · + bm ym
s.t a1j y1 + a2j y2 + · · · + amj ym ≥ cj , j = 1, . . . , n,
y1 ≥ 0, y2 ≥ 0, . . . , ym ≥ 0.

Example 3.9 (The Transportation Problem).

• There are I production plants, P1 , ..., PI , that supply a certain commodity, and there
are J markets, M1 , . . . , MJ , to which this commodity must be shipped.

• Plant Pi possesses an amount si of the commodity (i = 1, 2, . . . , I), and market Mj


must receive the amount rj of the commodity (j = 1, . . . , J).

• Let cij be the cost of transporting one unit of the commodity from plant Pi to market
Mj .

The problem is to meet the market requirements at minimum transportation cost.

1) Let xij be the quantity of the commodity shipped from plant Pi to market Mj .


2) The total transportation cost is

    Σ_{i=1}^{I} Σ_{j=1}^{J} x_ij c_ij.                (3.4)

The amount sent from plant P_i is

    Σ_{j=1}^{J} x_ij

and since the amount available at plant P_i is s_i, we must have

    Σ_{j=1}^{J} x_ij ≤ s_i,    for i = 1, ..., I.                (3.5)

3) The amount sent to market M_j is

    Σ_{i=1}^{I} x_ij

and since the amount required there is r_j, we must have

    Σ_{i=1}^{I} x_ij ≥ r_j,    for j = 1, ..., J.                (3.6)

4) It is assumed that we cannot send a negative amount from plant Pi to Mj . So, we


have
xij ≥ 0, i = 1, . . . , I, j = 1, . . . , J. (3.7)

The problem is to minimize the objective (3.4) subject to the constraints (3.5), (3.6)
and (3.7), i.e.,

    Minimize  Σ_{i=1}^{I} Σ_{j=1}^{J} x_ij c_ij
    s.t.      Σ_{j=1}^{J} x_ij ≤ s_i,   i = 1, ..., I,
              Σ_{i=1}^{I} x_ij ≥ r_j,   j = 1, ..., J,
              x_ij ≥ 0,   i = 1, ..., I,  j = 1, ..., J.

Example 3.10 (The Assignment Problem).


• There are 100 persons available for 10 jobs.

• The value of person i working 1 day at job j is aij (i = 1, . . . , 100, j = 1, . . . , 10).

• Let us suppose that only one person is allowed on a job at a time (two people cannot
work on the same job at the same time).

The problem is to choose an assignment of persons to jobs to maximize the total value.

An assignment is a choice of numbers x_ij, for i = 1, ..., 100 and j = 1, ..., 10, where
x_ij represents the proportion of person i's time that is to be spent on job j. Thus,

    Σ_{j=1}^{10} x_ij ≤ 1,    i = 1, ..., 100,                (3.8)

    Σ_{i=1}^{100} x_ij ≤ 1,    j = 1, ..., 10,                (3.9)

and

    x_ij ≥ 0,    i = 1, ..., 100,  j = 1, ..., 10.                (3.10)

• Inequality (3.8) reflects the fact that a person cannot spend more than 100 percent
of their time working.

• Inequality (3.9) means that the total time spent on a job is not allowed to exceed a
day.

• (3.10) says that no one can work a negative amount of time on any job.

Subject to constraints (3.8), (3.9) and (3.10), we wish to maximize the total value,
    Σ_{i=1}^{100} Σ_{j=1}^{10} a_ij x_ij.                (3.11)

This is the following LP problem:


    max   Σ_{i=1}^{100} Σ_{j=1}^{10} a_ij x_ij
    s.t.  Σ_{j=1}^{10} x_ij ≤ 1,    i = 1, ..., 100,
          Σ_{i=1}^{100} x_ij ≤ 1,    j = 1, ..., 10,
          x_ij ≥ 0,    i = 1, ..., 100,  j = 1, ..., 10.


3.7 A brief history of LPs

Linear programming was developed during the Second World War to plan expenditures
and returns in order to reduce costs to the army and increase losses to the enemy.

It was kept secret until 1947. Postwar, many industries found uses for it in their daily
planning.

• Leonid Kantorovich, a Russian mathematician, developed linear programming


problems in 1939.

• George B. Dantzig published the simplex method in 1947.

• John von Neumann developed the theory of the duality in the same year.

• The linear programming problem was first shown to be solvable in polynomial time
by Leonid Khachiyan in 1979.

• A larger theoretical and practical breakthrough in the field came in 1984 when
Narendra Karmarkar introduced a new interior point method for solving linear
programming problems.

Chapter 4

LPs with 2 Variables: Geometric Method

In this chapter, we will study LPs with 2 variables and how to solve them using the
geometric method. Topics that will be covered:

• notion of steepest descent,

• graphical method for solving LPs with two variables.

At the end of this chapter students should be able to use the geometric method to solve a
two-variable LP problem, determining whether it is infeasible, unbounded or has a finite
solution. [DT97, Chapter 2] presents similar topics to this chapter that you may find
helpful.

The following result contains the underlying idea of the geometric method for solving
LPs with two variables that we will study in this chapter.

Proposition 4.1. Let f (x) be a continuously differentiable function, where x ∈ Rn . At
any point x with ∇f (x) ≠ 0, moving in the direction of the gradient ∇f (x) by any small
stepsize increases the function value, and moving in the direction −∇f (x) by any small
stepsize decreases the function value.

This can be seen from the directional derivative:


lim_{λ→0} [f (x + λd) − f (x)] / λ = ∇f (x)T d.
Corollary 4.2. For the linear function cT x we have ∇(cT x) = c; hence, moving in the direction c,
the value of cT x increases, and moving in the direction −c, the function value decreases.

Geometric solution:

Minimizing cT x corresponds to moving the plane cT x = constant in the direction of


−c as far as possible.

• When the feasible region is unbounded, we might move the plane indefinitely while
always intersecting the feasible region, and hence the problem is unbounded.

• For a bounded feasible region (and for some unbounded ones), the objective hyperplane
must stop at a certain point, since otherwise it would leave the feasible region. In
such cases, the problem has a finite optimal solution.

• The feasible set might be empty, and hence the problem is infeasible.

Example 4.3 (Finite solution case). Solve the LP geometrically (or graphically)

Minimize x1 + x2

subject to 2x1 + x2 ≤ 6,

−x1 + x2 ≤ 4,

x1 , x2 ≥ 0.

Three steps to solve the problem:

1. Sketch the feasible set.

2. Pick a point in the feasible set and draw the objective hyperplane through it.

3. For a minimization problem, move the objective hyperplane in the direction of
−c to reach the optimal solution. For a maximization problem, move the objective
hyperplane in the direction of c to obtain the optimal solution.

[Figure: axes x1 and x2; the lines 2x1 + x2 = 6 and −x1 + x2 = 4; the objective hyperplane is translated in the direction −c towards the optimal point (0, 0).]

Figure 4.1: Figure for Example 4.3: In the left picture, the feasible domain is the shaded
region bounded by the two lines 2x1 + x2 = 6 and −x1 + x2 = 4 as well as the two axes.
In the right picture, we move the objective hyperplane x1 + x2 = constant along the direction
of the negative of the cost coefficient vector until reaching the optimal point (0, 0).
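For readers who want to check the geometric answer numerically, here is a minimal sketch (an addition to the notes, assuming scipy is available) that solves Example 4.3 with scipy.optimize.linprog.

```python
from scipy.optimize import linprog

# Example 4.3 in matrix form: minimize c^T x subject to A_ub x <= b_ub, x >= 0.
c = [1, 1]                      # objective x1 + x2
A_ub = [[2, 1],                 #  2x1 + x2 <= 6
        [-1, 1]]                # -x1 + x2 <= 4
b_ub = [6, 4]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")
print(res.x, res.fun)           # expected: [0. 0.] and 0.0, matching the geometric solution
```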

Example 4.4 (Unbounded case). Solve the LP geometrically

Maximize x1 + x2

subject to −x1 + x2 ≤ 4,

x1 , x2 ≥ 0.

Still, we follow the above three steps, and it is easy to see from Figure 4.2 that this
problem has an unbounded solution (no finite optimal solution).

[Figure: axes x1 and x2; the line −x1 + x2 = 4; the shaded unbounded feasible region; the objective function hyperplane is translated in the direction of c.]

Figure 4.2: Example 4.4: The feasible region, which is formed by the line −x1 + x2 = 4
and the two axes, is unbounded. Moving the objective hyperplane along the direction of
the cost coefficient vector will increase the objective function infinitely. Thus the problem
has an unbounded solution.

Example 4.5 (Infeasible case). Solve the LP geometrically

Minimize x1 + x2

subject to −x1 + x2 ≥ 4,

x1 − 2x2 ≥ 6,

x1 , x2 ≥ 0.
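As a hedged illustration (not in the original notes), the same solver can be used to detect the unbounded and infeasible cases of Examples 4.4 and 4.5; by scipy's convention, linprog reports these outcomes through its status field (2 indicates infeasibility, 3 unboundedness).

```python
from scipy.optimize import linprog

# Example 4.4: maximize x1 + x2  <=>  minimize -(x1 + x2), subject to -x1 + x2 <= 4, x >= 0.
unbounded = linprog([-1, -1], A_ub=[[-1, 1]], b_ub=[4], bounds=(0, None), method="highs")
print(unbounded.status)   # 3: the problem appears to be unbounded

# Example 4.5: minimize x1 + x2 subject to -x1 + x2 >= 4, x1 - 2x2 >= 6, x >= 0;
# the ">=" rows are multiplied by -1 to obtain the "<=" form linprog expects.
infeasible = linprog([1, 1], A_ub=[[1, -1], [-1, 2]], b_ub=[-4, -6],
                     bounds=(0, None), method="highs")
print(infeasible.status)  # 2: the problem appears to be infeasible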

In summary, given an LP, there are only three cases:

1. The problem has a finite optimal solution (a unique optimal solution or infinitely
many optimal solutions).

2. The solution of the LP is unbounded.

3. The problem is infeasible, i.e., the feasible region is empty.

Proposition 4.6. If a linear programming problem has a finite optimal solution, then it
has either a unique solution or infinitely many solutions.

Proof. Consider the linear programming problem

min cT x
s.t Ax = b,
x ≥ 0.

Suppose that the optimal solution of the LP is not unique. Then it has at least two
different optimal solutions, denoted by x̂ and x̄.

Since x̂ and x̄ are optimal solutions, they must be feasible to the problem, and the
objective values at these two solutions are equal, i.e.

Ax̂ = b, x̂ ≥ 0, Ax̄ = b, x̄ ≥ 0

and
cT x̂ = cT x̄ = z ∗ ,

where z ∗ denotes the optimal objective value. Now, let’s consider the point of the form

x(α) = αx̂ + (1 − α)x̄ (4.1)

where α can be any number in (0,1).

First, we note that x̂ ≥ 0 and x̄ ≥ 0. By the construction of x(α), we must have x(α) ≥ 0.

Second, we note that

Ax(α) = A(αx̂ + (1 − α)x̄)

= αAx̂ + (1 − α)Ax̄

= αb + (1 − α)b

= b.

Therefore, x(α) is a feasible solution to the linear programming problem. Finally, checking

the function value at x(α), we have

cT x(α) = cT (αx̂ + (1 − α)x̄)

= αcT x̂ + (1 − α)cT x̄

= αz ∗ + (1 − α)z ∗

= z∗,

which indicates that the objective value at x(α) is also optimal.

Therefore, x(α) is an optimal solution of the LP. Since α can be any number in (0,1),
all x(α) given by (4.1) are optimal to the LP, and hence the LP problem has infinitely
many optimal solutions.
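The argument can be illustrated numerically. The sketch below (an addition to the notes, using a made-up LP with two known optimal vertices) checks that every point x(α) on the segment between them remains feasible and attains the same objective value.

```python
import numpy as np

# Illustrative LP (not from the notes): minimize x1 + x2 s.t. x1 + x2 >= 1, x >= 0.
# Both x_hat = (1, 0) and x_bar = (0, 1) are optimal with value 1.
c = np.array([1.0, 1.0])
x_hat = np.array([1.0, 0.0])
x_bar = np.array([0.0, 1.0])

for alpha in np.linspace(0, 1, 5):
    x_a = alpha * x_hat + (1 - alpha) * x_bar       # the point x(alpha) from (4.1)
    feasible = (x_a.sum() >= 1 - 1e-12) and (x_a >= 0).all()
    print(alpha, x_a, c @ x_a, feasible)            # objective stays 1, feasibility holds
```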

Chapter 5

Introduction to Convex Analysis

In this chapter, we study basic knowledge in convex analysis that will be used in
subsequent chapters. The following topics will be covered

• basic concepts in convex analysis (convex set, convex functions, etc)

• extreme points and directions,

• representation of polyhedra.

At the end of this chapter, students should be able to

• understand basic concepts in convex analysis,

• compute extreme points and extreme directions of simple convex sets.

[BT97, Chapter 2] contains similar material to this chapter that you may find useful.

5.1 Basic concepts in convex analysis

Hyperplane and half-space:

• A hyperplane H in Rn is a set of the form

{x : pT x = α}

where p is a nonzero vector called the normal or the gradient to the hyperplane.


• A hyperplane divides Rn into two regions, called half-spaces. A half-space is the


collection of points of the form

{x : pT x ≥ α} or {x : pT x ≤ α}.

We often use the following symbols:

H = {x : pT x = α},

H + = {x : pT x ≤ α},

H − = {x : pT x ≥ α}.

Convex combination

When 0 ≤ α ≤ 1, the point αx + (1 − α)y is called a convex combination of x and
y. As α ranges over [0, 1], these points form the line segment between x and y.

In general, we have the following definition.

Definition 5.1. A vector b is said to be a convex combination of the vectors u1 , ..., uk if

b = ∑_{j=1}^{k} λj uj ,  where  ∑_{j=1}^{k} λj = 1,  λj ≥ 0,  j = 1, ..., k.
 
Example 5.2. Write b = (2/3, 1/2)T as a convex combination of the three vectors

u1 = (1, 0)T ,  u2 = (0, 1)T ,  u3 = (1, 1)T .

We need to find α, β, γ ≥ 0 such that

b = αu1 + βu2 + γu3 and α + β + γ = 1.

These are equivalent to

α + γ = 2/3,
β + γ = 1/2,
α + β + γ = 1,
α, β, γ ≥ 0.


This system can be easily solved directly to obtain

α = 1/2, β = 1/3, γ = 1/6.

Thus b is written as a convex combination of {u1 , u2 , u3 } as

b = (1/2) u1 + (1/3) u2 + (1/6) u3 .
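As a quick numerical check (an addition to the notes, assuming numpy is available), the 3 × 3 system above can be solved directly; the computed coefficients are exactly 1/2, 1/3 and 1/6.

```python
import numpy as np

# Solve  alpha + gamma = 2/3,  beta + gamma = 1/2,  alpha + beta + gamma = 1.
M = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 1.0]])
rhs = np.array([2/3, 1/2, 1.0])

alpha, beta, gamma = np.linalg.solve(M, rhs)
print(alpha, beta, gamma)        # 0.5, 0.333..., 0.166..., i.e. 1/2, 1/3, 1/6
```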

Definition 5.3. • A polyhedron P ⊂ Rn is the set of points that satisfy a finite


number of linear inequalities, i.e.,

P = {x : Ax ≤ b},

where A is an m × n matrix and b ∈ Rm .

• A bounded polyhedron is called a polytope.

Convex set

A set T is a convex set if x1 , x2 ∈ T implies that λx1 + (1 − λ)x2 ∈ T for all 0 ≤ λ ≤ 1.

We may prove the following result.

Proposition 5.4. (i) Any hyperplane H = {x : pT x = α} is convex.

(ii) The half-spaces {x : pT x ≥ α} and {x : pT x ≤ α} are convex.

Proposition 5.5. (i) The intersection of convex sets is convex.

(ii) Since each half-space is convex and a polyhedron is the intersection of a finite
number of half-spaces, a polyhedron is convex.

Proof of (i). Let S1 , ..., Sn be convex sets in Rn . Let


n
\
S= Si
i=1

be the intersection of Si ’s. We now prove that S is a convex set. Indeed, let x, y ∈ S.
We now prove that for any α ∈ (0, 1), we have x(α) = αx + (1 − α)y ∈ S. In fact, since


x, y ∈ S which is the intersection of Si ’s, the vectors x and y are in Si for all i = 1, ..., n.
By the convexity of Si , we must have

x(α) = αx + (1 − α)y ∈ Si , ∀i = 1, ..., n.

Thus x(α) is in the intersection of these sets, and hence x(α) ∈ S.

Example 5.6. 1 . The set of solutions to the linear system of equations Ax = b is


convex (Why?)

2 . The set of solutions to the system Ax = b, x ≥ 0 is convex (Why?)

3 . Given a ∈ Rn and r > 0, the ball B = {x : kx − ak ≤ r} is a convex set. (Why?)

4 . Given a positive definite matrix Q and a vector y ∈ Rn , the ellipsoid

E = {x : (x − y)T Q(x − y) ≤ 1}

is a convex set. (Why?)

Cone and Convex Cone

A set C is called a cone if x ∈ C implies that λx ∈ C for all λ ≥ 0. A convex cone is


a cone that is also a convex set.

Proposition 5.7. A polyhedron is a cone if and only if it can be represented as

P = {x : Ax ≤ 0}

for some matrix A.

Proof. If the set can be represented as P = {x : Ax ≤ 0} for some matrix A, then for any
x ∈ P we have αx ∈ P for any α ≥ 0. Thus, P is a cone.

Conversely, if P is a polyhedron, then there exists a matrix A and a vector b such


that P can be represented as
P = {x : Ax ≤ b}.

We now further assume that P is a cone. By the definition of cone, for any x ∈ P , it
must hold that αx ∈ P for any α ≥ 0. In particular, by taking α = 0, we see that 0 ∈ P ,


thus b ≥ 0. Now let x ∈ P ; for every α > 0 we have αx ∈ P , so A(αx) ≤ b, i.e. Ax ≤ b/α. Letting α → ∞ gives
Ax ≤ 0. Thus P ⊆ {x : Ax ≤ 0}. Since b ≥ 0, it further implies that

P ⊆ {x : Ax ≤ 0} ⊆ P,

so P = {x : Ax ≤ 0}.

Convex function

A function f of the vector (x1 , x2 , . . . , xn ) is said to be convex if the following inequality


holds for any two vectors x, y ∈ Rn :

f (λx + (1 − λ)y) ≤ λf (x) + (1 − λ)f (y), ∀λ ∈ [0, 1].

In the 1-dimensional case, the foregoing inequality can be interpreted as follows: λf (x) +


(1 − λ)f (y) represents the height of the chord joining (x, f (x)) and (y, f (y)) at the point
λx + (1 − λ)y. The convexity means the height of the chord is at least as large as the
height of the function itself.

Proposition 5.8. The objective function cT x is convex.

Proof. This can be verified directly. Let x, y ∈ Rn and λ ∈ [0, 1], then

cT (λx + (1 − λ)y) = λcT x + (1 − λ)cT y.

Example 5.9. The function f (x) = x2 , x ∈ R, is convex. Indeed, let x, y ∈ R and


λ ∈ [0, 1]. We need to verify that

f (λx + (1 − λ)y) = (λx + (1 − λ)y)2 ≤ λf (x) + (1 − λ)f (y) = λx2 + (1 − λ)y 2 .

By subtracting the left-hand side from the right-hand side, the above inequality is equivalent
to
λ(1 − λ)(x − y)2 ≥ 0,

which is true for all x, y ∈ R and λ ∈ [0, 1].


In summary,

• the feasible set of linear programming is a polyhedron,

• the feasible region of LPs is a convex set, and

• the objective of the LP is a convex function.

5.2 Extreme Points, Directions, Representation of Polyhedra

Definition 5.10 (Extreme point). A point x in a convex set X is called an extreme point
of X, if x cannot be represented as a strict convex combination of two distinct points in
X. In other words, if x = λx1 + (1 − λ)x2 with 0 < λ < 1 and x1 , x2 ∈ X, then we must
have x = x1 = x2 .

Definition 5.11 (Rays). Let d ≠ 0 be a vector and x0 be a point. The collection of points
of the form {x0 + λd : λ ≥ 0} is called a ray.

x0 is called the vertex of the ray, and d is the direction of the ray.

Definition 5.12 (Direction of a convex set). Given a convex set, a vector d ≠ 0 is

called a direction (or recession direction) of the set if, for each x0 in the set, the ray
{x0 + λd : λ ≥ 0} also belongs to the set.

Therefore, starting at any point x0 in the set, one can recede along d for any step
length λ ≥ 0 and remain within the set.

Example 5.13.

1. What are the directions of a bounded convex set?

2. What are the directions of the polyhedron X = {x : Ax ≤ b, x ≥ 0}?

3. What are the directions of the polyhedron X = {x : Ax = b, x ≥ 0}?

Example 5.14. Consider the set

X = {(x1 , x2 )T : x1 − 2x2 ≥ −6, x1 − x2 ≥ −2, x1 ≥ 0, x2 ≥ 1}.


What are the directions of X? Let x0 = (x1 , x2 )T be an arbitrary fixed feasible point. Then
d = (d1 , d2 )T is a direction of X if and only if

(d1 , d2 )T ≠ 0

and
x0 + λd = (x1 + λd1 , x2 + λd2 )T ∈ X, ∀λ ≥ 0.

Therefore,

x1 − 2x2 + λ(d1 − 2d2 ) ≥ −6,

x1 − x2 + λ(d1 − d2 ) ≥ −2,

x1 + λd1 ≥ 0,

x2 + λd2 ≥ 1,

for all λ ≥ 0. Since the last two inequalities must hold for fixed x1 and x2 and for all
λ ≥ 0, we conclude that d1 , d2 ≥ 0. Similarly, from the first two inequalities above, we
conclude that
d1 − 2d2 ≥ 0, d1 − d2 ≥ 0.

Since d2 ≥ 0, the condition d1 ≥ 2d2 already implies d1 ≥ d2 . Therefore, (d1 , d2 )T is a direction


of X if and only if it satisfies the following conditions:

(d1 , d2 )T ≠ 0,

d1 ≥ 0, d2 ≥ 0, d1 ≥ 2d2 .

Definition 5.15 (Extreme direction of a convex set). An extreme direction of a convex


set is a direction of the set that cannot be represented as a positive combination of two
distinct directions of the set.

Two directions d1 and d2 are said to be distinct if d1 cannot be represented as a

positive multiple of d2 , i.e., d1 ≠ αd2 for any α > 0.

Example 5.16. 1. What are the extreme directions of the set X in Example 5.14?

2. What are the extreme directions of the first orthant {x : x ≥ 0}?


Theorem 5.17. A polyhedron

P = {x ∈ Rn : Ax ≤ b} ≠ ∅

has a finite number of extreme points and extreme rays.

Proof. This theorem can be proved directly using convex analysis. However, that ap-
proach is beyond the scope of this course. Instead, as will be seen in Theorem 7.5, the
set of extreme points of a polyhedron is the same as the set of basic feasible solutions of
a linear programming problem defined on the polyhedron. Thus the number of extreme
points is the same as the number of basic feasible solutions, which is finite (cf. Section
7.1.4).

Theorem 5.18. Let P be a polyhedron. If the LP problem

min{cT x : x ∈ P }

has a finite optimal solution, then there is an optimal solution that is an extreme point.

Proof. As with Theorem 5.17, this theorem can also be proved directly using convex
analysis. However, it can be seen as a direct consequence of Theorem 7.5 and Theorem
7.6 in Chapter 7.

How to find an extreme direction?

Finding an extreme direction of a polyhedron can be converted into finding an extreme


point of a certain polyhedron.

1). Consider the set


{x : Ax ≤ b, x ≥ 0}.

The directions of the set are characterized by the following conditions:

d ≥ 0, d ≠ 0, and Ad ≤ 0.

This defines a polyhedral cone, called recession cone.

To eliminate duplication, these directions may be normalized, e.g. eT d = 1, where e is the vector of all ones. Then the


set of recession directions of X can be given by

D = {d : Ad ≤ 0, eT d = 1, d ≥ 0}.


Therefore, the extreme directions of X are precisely the extreme points of D.

2). Similarly, we may prove that the extreme directions of the nonempty polyhedral
set
X = {x : Ax = b, x ≥ 0}

correspond to extreme points of the following set:

D = {d : Ad = 0, eT d = 1, d ≥ 0}.

Representation of a Polyhedron The following theorem provides an important


representation of a polyhedron in terms of extreme points and extreme rays. The proof
is beyond the scope of the lecture notes and hence is omitted.

Theorem 5.19 (Minkowski’s Theorem). If P = {x ∈ Rn : Ax ≤ b} is nonempty and


rank (A) = n, then

P = { x : x = ∑_{j=1}^{k} λj xj + ∑_{i=1}^{l} µi di ,  ∑_{j=1}^{k} λj = 1,
          λj ≥ 0, j = 1, ..., k,  µi ≥ 0, i = 1, ..., l },

where {x1 , x2 , ..., xk } is the set of extreme points of P and {d1 , d2 , ..., dl } is the set of
extreme rays of P .

Minkowski’s theorem will be used in Chapter 7 to prove the existence of an optimal


extreme point for a linear programming problem under certain conditions.

Chapter 6

Duality theory

In this chapter we will study an important topic of LP, namely duality theory. Duality
in linear programming is essentially a unifying theory that develops the relationships
between a given LP problem (the primal problem) and another related LP problem (the
dual problem). We also study the optimality condition, i.e., the so-called Karush-Kuhn-
Tucker (KKT) optimality condition, for LP problems. The duality theory and optimality
conditions form the backbone of linear and nonlinear optimization problems.

The following topics will be covered

• definition of a dual problem,

• weak duality theorem,

• strong duality theorem,

• optimality conditions,

• complementary slackness condition,

• some further applications of duality theory.

At the end of this chapter, students should be able to

• formulate the dual problem of an LP,

• understand weak and strong duality theorems,

• understand the derivation of optimality conditions from duality theorems,


• write down optimality conditions for an LP problem.

[BT97, Chapter 5] and [DT97, Chapter 5] contain relevant topics to this chapter that you
may find helpful.

6.1 Motivation and derivation of dual LP problems

Let us consider a common situation in economics: minimising cost vs. maximising


profit.

Example 6.1 (minimising cost vs. maximising profit). There are two products, A and
B, that are made up from two elements, e and f . Product A contains 5 units of element
e and 2 units of element f and can be sold with price 16 Yuan, while product B contains
3 units of element e and 4 units of element f and can be sold with price 10 Yuan.

Consumption/customer’s perspective: A customer wants to buy some product A and


B. To fulfill his/her task, the customer needs at least 100 units of element e and 60 units
of element f . How many (units of) product A and B should the customer buy in order to
minimise the cost?

Let x and y be the units of product A and B that the customer buys. The above
problem can be formulated into a minimising LP problem as follows

Minimise 16x + 10y (the total cost)

subject to

5x + 3y ≥ 100 (the total units of element e),

2x + 4y ≥ 60 (the total units of element f ),

x, y ≥ 0 (the customer needs both products).

This LP can be written in a matrix form as follows

Minimise cT x̄,

subject to 
Ax̄ ≥ b,

x̄ ≥ 0,


where

x̄ = (x, y)T ,   A = [5 3; 2 4],   b = (100, 60)T ,   c = (16, 10)T .

Production’s perspective. Now the supplier wants to assign a price to each unit of
elements e and f so that the customer’s need can be met and the profit is maximal. Let u
and v be the prices that the supplier should assign to each unit of elements e and f , respectively. His/her
task can be formulated into the following maximising LP problem

Maximise 100u + 60v (the total profit)

subject to

5u + 2v ≤ 16 (the price for product A should not exceed 16 Yuan),

3u + 4v ≤ 10 (the price for product B should not exceed 10 Yuan),

u, v ≥ 0 (the prices for each unit of each element).

We also can write this LP in a matrix form using the notations above as

Maximise bT ȳ,

subject to 
AT ȳ ≤ c,

ȳ ≥ 0,

where ȳ = (u, v)T . These are two dual problems.
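As an illustration (not part of the original example), one can solve both problems numerically and observe that their optimal values coincide. The sketch below uses scipy.optimize.linprog, converting the "≥" constraints and the maximisation into the "≤"/minimisation form that the solver expects.

```python
import numpy as np
from scipy.optimize import linprog

A = np.array([[5, 3], [2, 4]])
b = np.array([100, 60])
c = np.array([16, 10])

# Primal (customer): minimize c^T x  s.t.  A x >= b, x >= 0.
primal = linprog(c, A_ub=-A, b_ub=-b, bounds=(0, None), method="highs")

# Dual (supplier): maximize b^T y  s.t.  A^T y <= c, y >= 0, i.e. minimize -b^T y.
dual = linprog(-b, A_ub=A.T, b_ub=c, bounds=(0, None), method="highs")

print(primal.fun, -dual.fun)   # the two optimal values coincide (strong duality)
```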

Suppose now that in the example above we only know the minimisation problem. How do we
attempt to solve this? Since we are looking for a minimum value of the objective function,
we first need to find a lower bound for it. To do so, we combine the constraints to obtain
a lower bound of the type

u(5x + 3y) + v(2x + 4y) = (5u + 2v)x + (3u + 4v)y ≥ 100u + 60v, (6.1)

by multiplying the first and second constraints by u ≥ 0 and v ≥ 0 respectively then


adding them up. To ensure that the newly obtained estimate provides a lower bound for
the objective function, we require that

5u + 2v ≤ 16 and 3u + 4v ≤ 10.


Finally, to achieve the best possible lower bound for the objective function, we should
maximise the lower bound in (6.1):

maximise 100u + 60v.

Bringing all steps together, we end up with the following problem

maximise 100u + 60v.

subject to

5u + 2v ≤ 16,
3u + 4v ≤ 10,
u, v ≥ 0.

This is exactly the problem from the production’s perspective in Example 6.1.

6.2 Primal and dual problems of LPs

Duality deals with pairs of LPs and the relationship between their solutions. One
problem is called primal and the other the dual.

It does not matter which problem is called the primal, since the dual of the dual is the
primal (see Proposition 6.6 below).

Let A be an m × n matrix, b ∈ Rm and c ∈ Rn . We define the standard LP as the


primal problem.

Primal Problem:

(LP ) Minimize cT x

s.t. Ax = b, x ≥ 0.

Its dual problem is defined by

Dual problem:

(DP ) Maximize bT y

s.t. AT y ≤ c,

where y ∈ Rm .


By introducing a slack vector, the dual problem can be equivalently written as

(DP ) Maximize bT y

s.t. AT y + s = c, s ≥ 0

where s ∈ Rn is called the dual slack variable.

Example 6.2. Find the dual problem to the LP

Minimize 6x1 + 8x2

s.t. 3x1 + x2 − x3 = 4,

5x1 + 2x2 − x4 = 7,

x1 , x2 , x3 , x4 ≥ 0.

By the definition, the dual problem of the above LP is

Maximize 4w1 + 7w2

s.t. 3w1 + 5w2 ≤ 6

w1 + 2w2 ≤ 8

−w1 ≤ 0

−w2 ≤ 0.

When an LP is not given in standard form, we may first convert it to the standard
form, and then write down its dual problem.

Example 6.3. Find the dual problem to the LP

Minimize 6x1 + 8x2

s.t. 3x1 + x2 − x3 ≥ 4,

5x1 + 2x2 − x4 ≤ 7,

x1 , x2 , x3 ≥ 0, x4 is free.

Proposition 6.4. If the primal problem is given by (canonical form)

(LP ) Minimize cT x

s.t. Ax ≥ b, x ≥ 0,


then its dual problem is

(DP ) Maximize bT y

s.t. AT y ≤ c, y ≥ 0.

Example 6.5. Find the dual problem to the LP

Minimize 6x1 + 8x2

s.t. 3x1 + x2 − x3 ≥ 4,

5x1 + 2x2 − x4 ≤ 7,

x1 , x2 , x3 , x4 ≥ 0.

Proposition 6.6. The dual of the dual is the primal.

Proof. Suppose that the following standard LP is the primal problem:

(LP ) Minimize cT x

s.t. Ax = b, x ≥ 0.

According to the definition, the dual problem of the LP is given by

(DP ) Maximize bT y

s.t. AT y + s = c, s ≥ 0.

We now consider the dual of (DP). First we transform this problem into the standard form.
Thus we have

(DP )   Minimize   −bT y′ + bT y′′

         s.t.   AT y′ − AT y′′ + s = c,   s ≥ 0,  y′ ≥ 0,  y′′ ≥ 0.

This can be written as


 
0
y
 
(DP ) Minimize (−b , b , 0 )  y 
T T T 00
 
 
s
   
0 0
y y
   
s.t. (AT , −AT , I)  y 00  = c,  y 00  ≥ 0.
   
   
s s


The dual of this standard form is given by

(DDP )   Maximize   cT z

          s.t.   (AT , −AT , I)T z ≤ (−b, b, 0)T ,

which can be further written as

(DDP ) Maximize cT z

s.t. Az ≤ −b, −Az ≤ b, z ≤ 0.

It is equivalent to

(DDP ) Minimize −cT z

s.t. −Az = b, z ≤ 0.

Setting w = −z, the above problem becomes

(DDP ) Minimize cT w

s.t. Aw = b, w ≥ 0.

This is the primal problem.

6.3 Duality theorem

Denote by Fp and Fd the feasible regions of primal and dual problems, respectively,
i.e.
Fp = {x : Ax = b, x ≥ 0},

Fd = {(y, s) : AT y + s = c, s ≥ 0}.

The following result shows that the objective value at any primal feasible solution is at
least as large as the objective value at any feasible dual solution.

Theorem 6.7 (Weak duality Theorem). Let x be any feasible point of the primal problem,
i.e. x ∈ Fp and y be any feasible point of the dual problem, i.e., (y, s) ∈ Fd . Then

cT x ≥ bT y.


Proof. For any feasible point x ∈ Fp and (y, s) ∈ Fd , we have Ax = b and x ≥ 0, and
s = c − AT y ≥ 0, and thus

cT x − bT y = cT x − (Ax)T y = xT (c − AT y) = xT s ≥ 0.

The last inequality above follows from the fact that (x, s) ≥ 0.

The quantity cT x − bT y is often called the duality gap. From the weak duality
theorem we have the following immediate consequences:

Corollary 6.8. If the primal LP is unbounded (i.e., cT x → −∞) , then the dual LP must
be infeasible.

Corollary 6.9. If the dual LP is unbounded, then the primal LP must be infeasible.

Thus, if either problem has an unbounded objective value, then the other problem
possesses no feasible solution.

Corollary 6.10. If x is feasible to the primal LP, y is feasible to the dual LP, and
cT x = bT y, then x must be optimal to the primal LP and y must be optimal to the dual
LP.

Unboundedness in one problem implies infeasibility in the other problem.

• Is this property symmetric?

• Does infeasibility in one problem imply unboundedness in the other?

The answer is “not necessarily”.

Example 6.11. Consider the primal problem

Minimize −x1 − x2

s.t. x1 − x2 ≥ 1,

−x1 + x2 ≥ 1,

x1 , x2 ≥ 0.


Its dual problem is given by

Maximize w1 + w2

s.t. w1 − w2 ≤ −1,

−w1 + w2 ≤ −1,

w1 , w2 ≥ 0.

Both problems are infeasible.

The last corollary above identifies a sufficient condition for the optimality of a primal-
dual pair of feasible solutions, namely that their objective values coincide. One natural
question to ask is whether this is a necessary condition. The answer is yes, as shown by
the next result.

Theorem 6.12 (Strong Duality Theorem). If both the primal LP and the dual LP have
feasible solutions, then they both have optimal solutions, and for any primal optimal
solution x and dual optimal solution y we have that cT x = bT y.

Proof. As will be seen in Chapter 11 one can solve both the primal and the dual problem
simultaneously using the simplex method.

Therefore, for LPs, exactly one of the following statements is true

• Both problems have optimal solutions x∗ and (y ∗ , s∗ ) with cT x∗ = bT y ∗ .

• One problem has unbounded objective value, in which case the other problem must
be infeasible.

• Both problems are infeasible.

In Summary,

P optimal ⇔ D optimal

P unbounded ⇒ D infeasible


D unbounded ⇒ P infeasible

P infeasible ⇒ D unbounded or infeasible

D infeasible ⇒ P unbounded or infeasible

6.4 Network flow problems

A flow network (also known as a transportation network) is a directed graph where


each edge has a capacity and each edge receives a flow. The amount of flow on an edge
cannot exceed the capacity of the edge. Often in operations research, a directed graph is
called a network, the vertices are called nodes and the edges are called arcs. A flow must
satisfy the restriction that the amount of flow into a node equals the amount of flow out
of it, unless it is a source, which has only outgoing flow, or sink, which has only incoming
flow. A network can be used to model traffic in a computer network, circulation with
demands, fluids in pipes, currents in an electrical circuit, or anything similar in which
something travels through a network of nodes.

Definition 6.13. A network N = (V, E) is a directed graph where V and E are respectively
the set of vertices (nodes) and edges (arcs). Given two vertices i, j ∈ V we denote the
edge between them by (i, j).

Definition 6.14. The capacity of an edge is the maximum amount of flow that can pass
through an edge in both directions. Formally it is a map c : E → R+ , called the capacity
map.

In most networks there are one or more nodes which are distinguished. A source is a
node such that all edges through it are oriented away from it. Similarly a sink is a node
such that all edges through it are oriented towards it.

Given a function g on the edges g : E → R, denote by gij the value of g on (i, j) ∈ E.

Definition 6.15. A flow is a map f : E → R+ that satisfies the following constraints:

• capacity constraint: The flow of an edge cannot exceed its capacity, in other words:
fij ≤ cij , for each (i, j) ∈ E;


• conservation of flows: The sum of the flows entering a node must equal the sum
of the flows exiting that node, except for the sources and the sinks:
∑_{i:(i,j)∈E} fij = ∑_{k:(j,k)∈E} fjk ,  for each j ∈ V \ {S, T }.
In brief: flows generate at sources and terminate at sinks.

Example 6.16 (The max-flow problem). Let N = (V, E) be a directed network with only
one source S ∈ V , only one sink T ∈ V and a set of intermediate nodes. Assume the flow
is never negative, namely if along an edge (i, j) the flow fij is strictly positive, then the
flow fji in the opposite direction is 0. The maximal flow problem is to maximise the flow
from source to sink subject to the network edge capacities and conservation of flow:

Maximise   ∑_{j:(S,j)∈E} fSj

subject to   fij ≤ cij ,  for each (i, j) ∈ E,

             ∑_{i:(i,j)∈E} fij = ∑_{k:(j,k)∈E} fjk ,  for each j ∈ V \ {S, T },

             fij ≥ 0,  for each (i, j) ∈ E.

In the context of flow analysis, there is only an interest in considering how units are
transferred between source and sink. Namely, we can re-write the above linear problem by
considering the whole paths from S to T . Let P be the set of all possible paths from S to
T . We associate with each path p ∈ P a quantity xp specifying how much of the flow from
S to T is being transferred along the path. With this view, the conservation constraint
above is unnecessary and we obtain the following simpler linear programming problem

Maximise   ∑_{p∈P} xp

subject to   ∑_{p∈P : (i,j)∈p} xp ≤ cij ,  for each (i, j) ∈ E,

             xp ≥ 0,  for all p ∈ P.

Example 6.17 (The min-cut problem). Let N = (V, E) be a directed network with only
one source S ∈ V , only one sink T ∈ V and a set of intermediate nodes. Let yij denote
the weight to assign to an edge (i, j). The problem is to separate S and T with minimum
total weighted capacity. The condition that source and sink are separated can be stated as
follows: for every path p ∈ P from the source S to the sink T , the weight of the path is at


least 1.

Minimise   ∑_{(i,j)∈E} cij yij

subject to   ∑_{(i,j)∈p} yij ≥ 1,  for all p ∈ P,

             yij ≥ 0,  for each (i, j) ∈ E.

This is the dual problem to the max-flow problem.

The max-flow min-cut theorem is a special case of the duality theorem. It states that
in a flow network, the maximum amount of flow passing from the source to the sink is
equal to the total weight of the edges in the minimum cut. The max-flow min-cut theorem
has many practical applications in networks, transportation and scheduling.

Example 6.18 (Example of the max-flow min-cut theorem). We consider the following
network

Figure 6.1: Example of the max-flow min-cut theorem. The network consists of 5 nodes
{S, A, B, C, T } including the source S and the sink T . The arrow indicates the direction
of flow in each edge, and the number beside each edge is the capacity of that edge.

In this example

• the network is N = (V, E) where

• V = {vertices (nodes)} = {S, A, B, C, T },

• E = {edges} = {(S, A), (S, B), (A, C), (A, T ), (B, T ), (C, T )}.


Recall that the max-flow problem is

Maximise   ∑_{p∈P} xp

subject to   ∑_{p∈P : (i,j)∈p} xp ≤ cij ,  for each (i, j) ∈ E,

             xp ≥ 0,  for all p ∈ P.

In the above problem, P denotes the set of all paths from the source S to the sink T . In
this example, there are three paths from S to T

p1 = {S, A, C, T }, p2 = {S, A, T } and p3 = {S, B, T }.

Let x1 , x2 , x3 be the flow transferred along the corresponding path. The max-flow problem
for this example is

Maximise x1 + x2 + x3

subject to x1 + x2 ≤ 2 (for (S,A)),

x1 ≤ 3 (for (A,C)),

x2 ≤ 4 (for (A,T)),

x3 ≤ 7 (for (S,B)),

x3 ≤ 6 (for (B,T)),

x1 ≤ 5 (for (C,T)),

x1 , x2 , x3 ≥ 0.

From the constraints we have

x1 + x2 + x3 = (x1 + x2 ) + x3 ≤ 2 + 6 = 8

where the equality is obtained for instance when x1 = x2 = 1 and x3 = 6.
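As a numerical check (an addition to the notes), the path-based max-flow LP for this network can be passed to scipy.optimize.linprog; negating the objective converts the maximisation into the minimisation that the solver expects.

```python
from scipy.optimize import linprog

# Path variables x1 (S-A-C-T), x2 (S-A-T), x3 (S-B-T); maximize x1 + x2 + x3.
c = [-1, -1, -1]                         # linprog minimizes, so negate the objective
A_ub = [[1, 1, 0],                       # edge (S,A): x1 + x2 <= 2
        [1, 0, 0],                       # edge (A,C): x1      <= 3
        [0, 1, 0],                       # edge (A,T): x2      <= 4
        [0, 0, 1],                       # edge (S,B): x3      <= 7
        [0, 0, 1],                       # edge (B,T): x3      <= 6
        [1, 0, 0]]                       # edge (C,T): x1      <= 5
b_ub = [2, 3, 4, 7, 6, 5]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")
print(res.x, -res.fun)                   # maximum flow value 8, matching the minimum cut
```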

Recall that the dual problem to the general max-flow problem is

Minimise   ∑_{(i,j)∈E} cij yij

subject to   ∑_{(i,j)∈p} yij ≥ 1,  for all p ∈ P,

             yij ≥ 0,  for each (i, j) ∈ E.


In this example, the dual problem is given by

Minimise 2y1 + 7y2 + 3y3 + 4y4 + 6y5 + 5y6

subject to y1 + y3 + y6 ≥ 1 (for p1 ),

y1 + y4 ≥ 1 (for p2 ),

y2 + y5 ≥ 1 (for p3 ),

y1 , y2 , y3 , y4 , y5 , y6 ≥ 0.

From the constraints we have

2y1 +7y2 +3y3 +4y4 +6y5 +5y6 = (y1 +y3 +y6 )+(y1 +y4 )+6(y2 +y5 )+y2 +2y3 +3y4 +4y6 ≥ 8,

where the equality is obtained when y1 = y5 = 1 and y2 = y3 = y4 = y6 = 0.

We now interpret the result from the min-cut problem. Recall that a cut is simply a
partition of the nodes V into two disjoint parts V = V1 ∪ V2 such that S ∈ V1 and T ∈ V2 .
The capacity of a cut is the sum of the capacities of the edges going out of the cut (from V1 to V2 ). Examples of
cuts with their capacities are

• {S, A} ∪ {B, C, T } with capacity 14 = 3 + 4 + 7,

• {S, A, B} ∪ {C, T } with capacity 13 = 3 + 4 + 6,

• {S, A, C} ∪ {B, T } with capacity 16 = 4 + 5 + 7, etc.

The cut with minimum capacity is {S, B} ∪ {A, C, T } whose capacity is 8 = 2 + 6 which
is the same as the optimal max flow.

6.5 Optimality Conditions & Complementary slackness

The optimality condition can be obtained from the strong duality theorem of the last
lecture.

The strong duality theorem provides the condition to identify optimal solutions:

x ∈ Rn is an optimal solution of LP if and only if it satisfies the following three


conditions:


• x is feasible to the primal, i.e., Ax = b, x ≥ 0.

• There exists a y ∈ Rm such that y is feasible to the dual, i.e., AT y ≤ c.

• There is no duality gap, i.e., cT x = bT y.

Thus we have the following theorem.

Theorem 6.19 (Optimality Condition). Consider the standard LP problem:

Minimize cT x

s.t. Ax = b, x ≥ 0,

where b ∈ Rm , c ∈ Rn , A ∈ Rm×n . Then x̄ is an optimal solution to the LP if and only if


there exists ȳ such that the following three conditions hold

AT ȳ ≤ c, (6.2)

Ax̄ = b, x̄ ≥ 0, (6.3)

bT ȳ = cT x̄. (6.4)

Remark. Similarly, the optimality condition for the dual problem is the same as
above.

Therefore, by solving the optimality conditions, we can obtain the optimal solutions

for the primal and dual problems at the same time.

Proposition 6.20 (Reformulation of optimality conditions). The optimality conditions


(6.2) - (6.4), i.e.,

AT ȳ ≤ c

Ax̄ = b, x̄ ≥ 0,

bT ȳ = cT x̄

can be restated as follows

AT ȳ + s̄ = c, s̄ ≥ 0

Ax̄ = b, x̄ ≥ 0,

x̄T s̄ = 0.


Proof. First, adding the slack variable vector s̄ to inequality (6.2), we have

AT ȳ + s̄ = c, s̄ ≥ 0

Since b = Ax̄, we have

cT x̄ − bT ȳ = cT x̄ − (Ax̄)T ȳ = cT x̄ − x̄T (AT ȳ) = cT x̄ − x̄T (c − s̄) = x̄T s̄.

Thus, the third condition (6.4) is equivalent to

x̄T s̄ = 0.

As a result, we may restate Theorem 6.19 as follows.

Theorem 6.21 (Optimality Condition). Consider the standard LP problem

Minimize cT x

s.t. Ax = b, x ≥ 0

where b ∈ Rm , c ∈ Rn , A ∈ Rm×n . Then x̄ is an optimal solution to the LP if and only if


there exists ȳ such that the following three conditions hold

AT ȳ + s̄ = c, s̄ ≥ 0,

Ax̄ = b, x̄ ≥ 0,

x̄T s̄ = 0.

Example 6.22. What is the optimality condition of the following LP?

Minimize −2x1 − x2

s.t. x1 + 2x2 ≥ 15,

−x1 − x2 ≥ −20,

x1 , x2 ≥ 0.

Verify whether (15, 0) and (20, 0) are optimal solutions to the problem.

Outline of the solution:


• Introduce slack variables x3 and x4 and transform the problem to the standard form

Minimize −2x1 − x2

s.t. x1 + 2x2 − x3 = 15,

−x1 − x2 − x4 = −20,

x1 , x2 , x3 , x4 ≥ 0.

• The optimality conditions for this LP are given as follows:

x1 + 2x2 − x3 = 15,

−x1 − x2 − x4 = −20,

x1 , x2 , x3 , x4 ≥ 0,

y1 − y2 ≤ −2,

2y1 − y2 ≤ −1,

−y1 ≤ 0, −y2 ≤ 0,

−2x1 − x2 = 15y1 − 20y2 .

• Consider the point (x1 , x2 ) = (15, 0). From the first two equalities above, we see that
(x3 , x4 ) = (0, 5) satisfies the first three conditions. Thus at x = (15, 0, 0, 5)T the
above optimality conditions are reduced to

y1 − y2 ≤ −2

2y1 − y2 ≤ −1

−y1 ≤ 0, −y2 ≤ 0

−30 = 15y1 − 20y2 .

It is easy to verify that this system is inconsistent (there exists no y satisfying these
conditions). Thus, (x1 , x2 ) = (15, 0) is not an optimal solution to the LP.

• Consider the point (x1 , x2 ) = (20, 0). Clearly, this point, together with (x3 , x4 ) =
(5, 0), satisfies the first three conditions of the optimality. We now check if there


exists a y satisfying the remaining conditions, i.e.,

y1 − y2 ≤ −2

2y1 − y2 ≤ −1

−y1 ≤ 0, −y2 ≤ 0

−40 = 15y1 − 20y2 .

Clearly, (y1 , y2 ) = (0, 2) satisfies these conditions, thus the point (x1 , x2 ) = (20, 0)
is an optimal solution to the LP problem.
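The conclusion can be double-checked by solving the original LP directly; the following sketch (an addition to the notes, assuming scipy is available) recovers the optimal point (20, 0).

```python
from scipy.optimize import linprog

# Example 6.22: minimize -2x1 - x2 s.t. x1 + 2x2 >= 15, -x1 - x2 >= -20, x >= 0,
# rewritten with "<=" rows for linprog.
c = [-2, -1]
A_ub = [[-1, -2],    # -(x1 + 2x2) <= -15
        [1, 1]]      #   x1 + x2   <= 20
b_ub = [-15, 20]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")
print(res.x, res.fun)   # expected optimum (20, 0) with value -40, as found above
```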

6.6 Complementary slackness condition

From the optimality condition (see Theorem 6.21), we note that at the optimal so-
lution x∗ and (y ∗ , s∗ ), we have (x∗ )T s∗ = 0, which is equivalent to x∗j s∗j = 0 for every
j = 1, ..., n by the nonnegativity of x∗ and s∗ . Thus, at the optimal solution, at least one of the
terms x∗j and s∗j must be zero for each j. This is called the complementary slackness condition.

Consider the LP

Minimize cT x

s.t. Ax = b, x ≥ 0,

where b ∈ Rm , c ∈ Rn , A ∈ Rm×n . From Theorem 6.21, x̄ is an optimal solution to the LP if


and only if there exists ȳ such that the following three conditions hold

AT ȳ + s̄ = c, s̄ ≥ 0. (Dual feasibility)

Ax̄ = b, x̄ ≥ 0. (Primal feasibility)

x̄T s̄ = 0. (Complementary slackness condition)

So we may restate Theorem 6.21 again as follows.

Theorem 6.23. x and (y, s) are optimal solutions to primal and dual problems, respec-
tively, if and only if they satisfy the primal feasibility, dual feasibility and the complemen-
tary slackness condition.


Example 6.24. Consider the linear program

min 5x1 + 12x2 + 4x3


s.t. x1 + 2x2 + x3 = 10,
2x1 − x2 + 3x3 = 8,
x1 , x2 , x3 ≥ 0.

You are given the information that x2 and x3 are strictly positive in the optimal solution.
Use the complementary slackness conditions to solve the dual problem.

Solution: The dual problem is

max 10y1 + 8y2


s.t. y1 + 2y2 ≤ 5,
2y1 − y2 ≤ 12,
y1 + 3y2 ≤ 4.

Since x2 and x3 are positive in the optimal solutions, the corresponding constraints in
the dual problem become equalities. Thus y = (y1 , y2 ) is an optimal solution to the dual
problem if and only if it satisfies the first constraint and the following system

2y1 − y2 = 12 and y1 + 3y2 = 4.

Solving this system gives y1 = 40/7, y2 = −4/7, which satisfies the first constraint since

y1 + 2y2 = 32/7 < 5.

Thus it is an optimal solution to the dual problem.

Note that we can also use this information to find the optimal solution to the primal
problem. In fact, since the first constraint in the dual problem is an inequality, by the
complementary slackness conditions, the first variable in the primal problem is zero, that
is x1 = 0. Substituting this into the primal constraints, we obtain x2 = 22/7, x3 = 26/7, which
are nonnegative. Also we can verify that the objective values coincide:

10y1 + 8y2 = 5x1 + 12x2 + 4x3 = 368/7.

Thus x = (0, 22/7, 26/7) and y = (40/7, −4/7) are respectively optimal solutions to the
primal and the dual problems.
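As a numerical sanity check (not part of the original solution), both problems can be solved with scipy.optimize.linprog and compared with the values obtained from complementary slackness.

```python
import numpy as np
from scipy.optimize import linprog

c = np.array([5, 12, 4])
A = np.array([[1, 2, 1], [2, -1, 3]])
b = np.array([10, 8])

# Primal: minimize c^T x s.t. A x = b, x >= 0.
primal = linprog(c, A_eq=A, b_eq=b, bounds=(0, None), method="highs")
# Dual: maximize b^T y s.t. A^T y <= c (y free), i.e. minimize -b^T y.
dual = linprog(-b, A_ub=A.T, b_ub=c, bounds=(None, None), method="highs")

print(primal.x)                # approximately (0, 22/7, 26/7)
print(dual.x)                  # approximately (40/7, -4/7)
print(primal.fun, -dual.fun)   # both approximately 368/7
```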


A solution (x∗ , s∗ ) is called a strictly complementary solution if (x∗ )T s∗ = 0 and x∗ + s∗ > 0. The following result shows that such a solution always exists whenever both the primal and the dual problems are feasible.

Theorem 6.25 (Strict Complementarity). If primal and dual problems both have feasible
solutions, then both problems have a pair of strictly complementary solutions x∗ ≥ 0 and
s∗ ≥ 0 with
(x∗ )T s∗ = 0, x∗ + s∗ > 0.

Moreover, the supports

I ∗ = {j : x∗j > 0} and J ∗ = {j : s∗j > 0}

are invariant for all pairs of strictly complementary solutions.

The proof of this theorem can be found in [Gre94], where it is proved that a primal-dual pair of optimal solutions is unique if and only if it is a strictly complementary pair
of basic solutions; see also Remark 5 in Chapter 8 for further discussion.

6.7 More Applications of Duality Theory

Example 6.26. Solve the following LP

Minimize x1 + x2 + 6x3

s.t. −x1 + x2 + x3 ≥ 2,

x1 − 2x2 + 2x3 ≥ 6,

x1 , x2 , x3 ≥ 0.

Example 6.27. Use duality theory to prove Farkas' Lemma: exactly one of the
following systems has a solution (is feasible).

• (System 1) Ax = b, x ≥ 0.

• (System 2) AT y ≤ 0, bT y > 0.

Proof. If System 1 has a solution, then for any y such that AT y ≤ 0 we have

bT y = (Ax)T y = xT (AT y) ≤ 0.


The inequality follows from the fact that AT y ≤ 0 and x ≥ 0. Hence System 2 cannot
have a solution.

Now suppose that System 1 has no solution. Then the following LP

Minimize 0T x

s.t. Ax = b, x ≥ 0

is infeasible. According to the duality theorem, its dual problem which is given as

Maximize bT y

s.t. AT y ≤ 0

is either unbounded or infeasible. However, this dual problem is feasible (y = 0 is a feasible
point), so it must be unbounded. As a result, there exists a y such that
AT y ≤ 0 and bT y > 0, i.e., System 2 has a solution.
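Farkas' Lemma can also be explored numerically. The sketch below (illustrative only; it assumes SciPy is installed and its linprog routine with the HiGHS solver is available, and the function name is ours) reports which of the two systems is feasible for a given A and b, exactly along the lines of the proof above.

import numpy as np
from scipy.optimize import linprog

def farkas_alternative(A, b):
    """Return 1 if System 1 (Ax = b, x >= 0) has a solution, and 2 otherwise.
    By Farkas' Lemma, exactly one of the two systems is feasible."""
    A, b = np.asarray(A, float), np.asarray(b, float)
    m, n = A.shape
    # System 1 as a feasibility LP with zero objective.
    res = linprog(c=np.zeros(n), A_eq=A, b_eq=b,
                  bounds=[(0, None)] * n, method="highs")
    if res.status == 0:       # a feasible (hence optimal) point was found
        return 1
    # Otherwise, max b^T y s.t. A^T y <= 0 is unbounded, so System 2 is feasible.
    return 2

A = [[1.0, 0.0], [0.0, 1.0]]
print(farkas_alternative(A, [2.0, 3.0]))    # 1: x = (2, 3) solves System 1
print(farkas_alternative(A, [-1.0, 0.0]))   # 2: y = (-1, 0) solves System 2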

Chapter 7

The Simplex Method (I): Theory

How do we solve an LP problem? George B. Dantzig introduced the simplex method
for solving LPs in 1947. The method was selected by the editors of Computing
in Science & Engineering / CiSE (January/February 2000, pp. 22-23, vol. 2) as the
second most significant algorithm of the 20th century. In this chapter, we start to learn
this important method. This chapter introduces essential concepts and develops the
theory underlying the simplex method. The implementation and analysis of the method will
be discussed in subsequent chapters. The following topics will be covered:

• review of Linear Algebra,

• basic feasible solutions (BFS),

• reformulation of an LP in terms of the current BFS,

• termination of the iteration and declaration of optimality.

At the end of this chapter, students should be able to

• compute basic feasible solutions,

• reformulate an LP in terms of the current BFS,

• understand the principles of the simplex method.

[BT97, Chapter 3] and [DT97, Chapter 3] contain material relevant to this chapter that
you may find useful.


7.1 Basic Feasible Solution

7.1.1 Review of Linear Algebra

For matrices, we may perform some elementary row operations. These operations are
most helpful in solving a system of linear equations and in finding the inverse of a matrix.

Elementary row operations:

An elementary row operation on a matrix A is one of the following operations:

1. Row i and row j are interchanged.

2. Row i is multiplied by a nonzero scalar α.

3. Row i is replaced by row i plus α times row j.

An elementary row operation on a matrix A is equivalent to pre-multiplying A by a suitable
elementary matrix. Similarly, an elementary column operation is equivalent to post-multiplying A by
such a matrix.

Solving a system of linear equations:

Theorem 7.1. Ax = b if and only if A′x = b′ , where (A′ , b′ ) is obtained from (A, b)
by a finite number of elementary row operations. (Elementary row operations never
change the solution set of a system of linear equations.)

By row operations, A can be reduced to an upper triangular matrix, and then we can solve the system by
back substitution. The process of reducing A to an upper triangular matrix with ones on
the diagonal is called Gaussian reduction of the system. By further reduction, we may
reduce A to the identity matrix; this process is called Gauss-Jordan reduction of the
system.

Calculation of the inverse: by elementary row operations, reduce (A | I) to (I | B);
then B = A−1 .
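As an illustration of the Gauss-Jordan procedure just described (this Python sketch, using numpy, is not part of the original notes and the function name is ours), the inverse can be computed by reducing (A | I) to (I | B):

import numpy as np

def inverse_by_row_reduction(A):
    """Reduce the augmented matrix (A | I) to (I | B); then B = A^{-1}."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    M = np.hstack([A, np.eye(n)])                # the matrix (A | I)
    for i in range(n):
        p = i + np.argmax(np.abs(M[i:, i]))      # row interchange (partial pivoting)
        M[[i, p]] = M[[p, i]]
        M[i] /= M[i, i]                          # scale row i so the pivot is 1
        for r in range(n):
            if r != i:
                M[r] -= M[r, i] * M[i]           # eliminate column i from row r
    return M[:, n:]

print(inverse_by_row_reduction([[1.0, 2.0], [3.0, 4.0]]))
# approximately [[-2, 1], [1.5, -0.5]]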

7.1.2 Existence of the Optimal Extreme Point

Still, we consider the standard form of LP:

min{cT x : Ax = b, x ≥ 0},


for which we have the following result

Theorem 7.2 (Existence of Optimal Extreme Point). Assume that the feasible region is
nonempty. Then

• a (finite) optimal solution exists if and only if cT dj ≥ 0 for j = 1, ..., l, where
d1 , ..., dl are the extreme directions of the feasible region.

• Otherwise, if there is an extreme direction such that cT dj < 0, then the optimal
objective value of LP is unbounded.

• If an optimal solution exists, then at least one extreme point is optimal.

Proof. Let x1 , x2 , ..., xk be the extreme points of the constraint set, and let d1 , ..., dl be
the extreme directions. By Minkowski's theorem, any feasible point x can be represented
as

x = Σ_{j=1}^{k} λj xj + Σ_{j=1}^{l} µj dj ,

where

Σ_{j=1}^{k} λj = 1, λj ≥ 0, j = 1, ..., k, and µj ≥ 0, j = 1, ..., l.

Thus, the LP can be transformed into the following problem in the variables λ and µ:

Minimize Σ_{j=1}^{k} (cT xj )λj + Σ_{j=1}^{l} (cT dj )µj
s.t. Σ_{j=1}^{k} λj = 1, λj ≥ 0, j = 1, ..., k,
µj ≥ 0, j = 1, ..., l.

Since the µj ’s can be made arbitrarily large, the minimum is −∞ if cT dj < 0 for some j.
Thus, the necessary condition for the existence of the optimal solution is

cT dj ≥ 0, ∀j = 1, ..., l.

If this condition holds, we can show that an optimal solution exists and is attained at an
extreme point. In fact, if cT dj ≥ 0 for all j ≤ l, then in order to minimize the objective function,
µj should be taken to be zero for all j. In order to minimize the first term of the objective,
we simply find the minimum cT xj , say cT xp , and let λp = 1, and all other λj ’s equal to
zero.


• In summary, the optimal value of the linear program is finite if and only if
cT dj ≥ 0 for all extreme directions.

• Furthermore, if this is the case, then we find the minimum point by picking the
minimum objective value among all extreme points.

• This shows that if an optimal solution exists, we must be able to find an optimal
extreme point.

7.1.3 Basic Feasible Solution

To solve a linear program algebraically, we need to address the following issues.

• How to identify (recognize) an extreme point?

• If the current extreme point is not optimal, how to move from it to the next better
extreme point?

• How to terminate the algorithm and declare optimality?

To address the first issue, we need the concept of Basic Feasible Solution (or basic
feasible point).

Definition 7.3 (Basic Feasible Solution). Consider the system

Ax = b, x ≥ 0.

where A is an m × n matrix and b ∈ Rm . Suppose A has full row rank, i.e., rank(A) = m. After


possibly rearranging the columns of A, let

A = [B, N ],

where B is an m × m invertible matrix and N is an m × (n − m) matrix. Partitioning x
accordingly as x = (xB , xN )T , the system becomes

[B, N ](xB , xN )T = b,


i.e.,
BxB + N xN = b.

Note that
xB = B −1 b, xN = 0

is a solution to the above system. This solution is called a basic solution. Thus a basic
solution has at least n − m zero variables. In addition, if x is a basic solution, then the columns
of A corresponding to the non-zero variables of x are linearly independent.

• B —— the basic matrix (or simply basis),

• N —— the nonbasic matrix (or simply nonbasis).

• xB —– basic variable

• xN —– nonbasic variable

• If xB ≥ 0, (B −1 b, 0) is called a basic feasible solution of the system.

• If xB = B −1 b > 0, then x = (B −1 b, 0) is called a non-degenerate basic feasible
solution; otherwise it is called a degenerate basic feasible solution.

Example 7.4. Consider the polyhedral set:

x1 + x2 + x3 = 6,

x2 + x4 = 3,

x1 , x2 , x3 , x4 ≥ 0.

The constraint matrix is

A = [a1 , a2 , a3 , a4 ] = [1 1 1 0; 0 1 0 1].

Basic feasible solutions correspond to choices of a 2 × 2 invertible submatrix B. The
following are the possible ways of extracting B out of A.

1. B = [a1 , a2 ] = [1 1; 0 1], xB = (x1 , x2 )T = B −1 b = (3, 3)T , xN = (x3 , x4 )T = (0, 0)T .

2. B = [a1 , a4 ] = [1 0; 0 1], xB = (x1 , x4 )T = B −1 b = (6, 3)T , xN = (x2 , x3 )T = (0, 0)T .

3. B = [a2 , a3 ] = [1 1; 1 0], xB = (x2 , x3 )T = B −1 b = (3, 3)T , xN = (x1 , x4 )T = (0, 0)T .

4. B = [a2 , a4 ] = [1 0; 1 1], xB = (x2 , x4 )T = B −1 b = (6, −3)T , xN = (x1 , x3 )T = (0, 0)T .

5. B = [a3 , a4 ] = [1 0; 0 1], xB = (x3 , x4 )T = B −1 b = (6, 3)T , xN = (x1 , x2 )T = (0, 0)T .

Note that the points corresponding to 1, 2, 3, and 5 are basic feasible solutions. The
point obtained in 4 is not a basic feasible solution because it violates the nonnegativity
restrictions. In other words, we have four basic feasible solutions, namely

x1 = (3, 3, 0, 0)T , x2 = (6, 0, 0, 3)T , x3 = (0, 3, 3, 0)T , x4 = (0, 0, 6, 3)T .

These points belong to R4 since, after introducing the slack variables, we have four variables.
These basic feasible solutions, projected into R2 (that is, into the (x1 , x2 )-plane), give rise
to the following four points: (3, 3), (6, 0), (0, 3) and (0, 0). These points are precisely the
extreme points of the feasible region.
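The enumeration carried out by hand in Example 7.4 can be automated. The sketch below (an illustration in Python with numpy, not part of the original notes) loops over all choices of m columns, keeps the invertible ones, and flags which basic solutions are feasible.

import itertools
import numpy as np

def basic_solutions(A, b):
    """List the basic solutions of Ax = b, x >= 0, marking the feasible ones."""
    A, b = np.asarray(A, float), np.asarray(b, float)
    m, n = A.shape
    results = []
    for cols in itertools.combinations(range(n), m):
        B = A[:, cols]
        if abs(np.linalg.det(B)) < 1e-12:        # singular submatrix: not a basis
            continue
        xB = np.linalg.solve(B, b)
        x = np.zeros(n)
        x[list(cols)] = xB
        results.append((cols, x, bool(np.all(xB >= -1e-12))))
    return results

A = [[1, 1, 1, 0],
     [0, 1, 0, 1]]
for cols, x, feasible in basic_solutions(A, [6, 3]):
    print(cols, x, "BFS" if feasible else "not feasible")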

7.1.4 The number of the BFS

The possible number of basic feasible solutions is bounded by the number of ways of
extracting m columns from A to form a basis, so in general,

the number of basic feasible solutions ≤ C(n, m) = n!/(m!(n − m)!).


7.1.5 Correspondence between BFSs & extreme points

Theorem 7.5. A point is a basic feasible solution if and only if it is an extreme point.

Proof. Let S = {x ∈ Rn : Ax = b, x ≥ 0} be the feasible region. Suppose that x is a


basic feasible solution. Then there exists a basis B and a non-basis N such that xN = 0.
Suppose that x is not extreme. Then there exist y, z in the feasible region, with y ≠ z, such that
x = λy + (1 − λ)z for some 0 < λ < 1. We have 0 = xN = λyN + (1 − λ)zN . Since
yN , zN ≥ 0 we must have yN = zN = 0. Furthermore since y, z ∈ S, we have

Ay = ByB + N yN = b = Az = BzB + N zN ,

which implies ByB = BzB = b so B(yB − zB ) = 0. Since B is invertible, we have yB = zB .


Hence y = z. This is a contradiction, so x must be extreme.

Now suppose that x is an extreme point of S. Since x ∈ S it is feasible. We need to


show that it is basic. Suppose that x is not a basic feasible solution. Let P = {i : xi > 0}
and O = {i : xi = 0}. Since x is feasible Ax = AP xP + AO xO = b, thus AP xP = b. Since
x is not a basic feasible solution, the columns of AP are linearly dependent. According
to Theorem 1.26, there exists a non-zero vector yP ∈ Rk , where k is the cardinality of
P , such that AP yP = 0. We take yO = 0 and y = (yP , yO )T . Consider the two points x + εy
and x − εy for small ε. For ε > 0 small enough, these two points are both feasible, since
A(x ± εy) = Ax ± εAy = b and x ± εy ≥ 0. Hence

x = (1/2)(x + εy) + (1/2)(x − εy).

So x is not extreme, which is a contradiction. Hence x must be a basic feasible solution.

Thus, the set of basic feasible solutions coincides with the set of extreme points (the
former is an algebraic description and the latter is a geometric description). Since an LP
with a finite optimal value has an optimal solution at an extreme point, an optimal basic
feasible solution can always be found for such problems.

Theorem 7.6 (Fundamental theorem of Linear Programming). i) If there is a feasible


solution, i.e., {x : Ax = b, x ≥ 0} ≠ ∅, then there is a basic feasible solution;


ii) If there is an optimal solution for LP, there is a basic feasible solution that is
optimal.

Proof. i) Let P = {x : Ax = b, x ≥ 0} be the feasible region. Suppose that P 6= ∅ and


x is a feasible solution, that is, Ax = b and x ≥ 0. We will construct a basic feasible
solution from x. Without loss of generality, suppose that only the first p coordinates of x are positive. Thus

a1 x1 + . . . + ap xp = b, (7.1)

where aj is the j-th column of A. We consider two cases.


Case 1: a1 , . . . , ap are linearly independent. Then p ≤ m = rank(A).

• If p = m, we take B = [a1 , . . . , ap ], then x is a basic feasible solution with respect


to this basis.

• If p < m, then since rank(A) = m we can find (m − p) further columns from ap+1 , . . . , an which,
together with a1 , . . . , ap , form a basis B with respect to which x is a basic feasible solution.

Case 2: {a1 , . . . , ap } are linearly dependent. Then there exist y1 , . . . , yp , not all zero,
such that
y1 a1 + . . . + yp ap = 0. (7.2)

Without loss of generality we can assume that yi > 0 for some i ∈ {1, . . . , p} (otherwise,
we can multiply the above equation by −1). From (7.1) and (7.2) we deduce that

(x1 − εy1 )a1 + . . . + (xp − εyp )ap = b.

Hence A(x − εy) = b where y = (y1 , . . . , yp , 0, . . . , 0)T . For sufficiently small ε we have

x1 − εy1 > 0, . . . , xp − εyp > 0.

We choose ε by
ε = min{ xi /yi : i = 1, . . . , p, yi > 0 }.
Then we have
(x − εy) ≥ 0 and A(x − εy) = b.

Thus (x − εy) is a feasible solution. In addition, due to the choice of ε, the first p
coordinates of (x − εy) are non-negative and at least one of them is zero. Hence we reduce


p by at least 1. By repeating this process, we will eventually reach either Case 1 or p = 0. In both cases,


there will be a basic feasible solution. This completes the proof of Part i).

Part ii). Suppose that x is an optimal solution. Without loss of generality we assume
that x has a minimal number of positive coordinates. We will construct an optimal basic
feasible solution from x. If x = 0 then x is basic (and the cost is zero). If x ≠ 0, let
I = {i1 , . . . , ip } be the set of indices of the positive coordinates of x. As in Part i), there are two
cases.
Case 1: the column vectors {ai1 , . . . , aip } of A are linearly independent. In this case we can
proceed exactly as in Part i) and obtain an optimal basic feasible solution. Note
that optimality is not affected in this case.
Case 2: the column vectors {ai1 , . . . , aip } of A are linearly dependent. We proceed
similarly to the second case in Part i). However, we need to show that optimality
is preserved during the process. To show that x − εy is optimal, we consider its objective
value

cT (x − εy) = cT x − εcT y.

For sufficiently small |ε|, the vector x − εy is a feasible solution (for positive or negative
values of ε). It follows that we must have cT y = 0, since otherwise we could find a small
ε of the proper sign such that cT (x − εy) < cT x, which would violate the optimality of x.
Thus x − εy is still optimal.

Example 7.7. Consider the LP problem in standard form with

A = [−1 3 0 0 0; 0 2 −1 −1 0; 0 0 0 −1 1],  b = (3, 2, 0)T ,  x = (3, 2, 1, 1, 1)T .

Let us see how to construct a BFS from the feasible point x = (3, 2, 1, 1, 1)T . The given x
has no zero entries, so if we try to replicate the proof of Theorem 7.6, we have m = 3
and p = 5. Therefore we need to solve the system Ay = 0 and pick one solution, for
example y = (0, 0, 1, −1, −1)T .

Note that y has only one component, y3 , that is strictly positive, so ε = min{xi /yi : i =
1, . . . , p, yi > 0} = x3 /y3 = 1 and x − εy = (3, 2, 0, 2, 2)T . We take this as our new point


x̃ = (3, 2, 0, 2, 2)T , which is still a feasible point. It has one coordinate equal to 0, and therefore
Ax̃ = Ãx̃ (dropping the zero coordinate), where à consists of the columns of A corresponding to the
nonzero coordinates of x̃:

à = [−1 3 0 0; 0 2 −1 0; 0 0 −1 1].

Let us solve the system Ãỹ = 0 and pick one solution, for example ỹ = (3, 1, 2, 2)T . We calculate
ε = min{x̃i /ỹi : ỹi > 0} = min{3/3, 2/1, 2/2, 2/2} = 1. Extending ỹ by a zero in the third position,
we take x̃ − εỹ = (0, 1, 0, 0, 0)T . This is a BFS. In fact, if we select the matrix B made
of the second, fourth and fifth columns of A, we can see that the basic part of x̃ − εỹ equals
B −1 b = (1, 0, 0)T , so x̃ − εỹ is the corresponding (degenerate) basic feasible solution.
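The reduction used in Example 7.7 can be carried out mechanically. The following Python sketch (illustrative only, under the assumptions of Theorem 7.6; the function name is ours and numpy is assumed) repeatedly finds a vector in the null space of the columns with positive entries and uses the ratio ε to drive a coordinate to zero. The BFS it returns may differ from the one found above, since it depends on which null-space vector is picked.

import numpy as np

def bfs_from_feasible_point(A, b, x, tol=1e-10):
    """Starting from a feasible x (Ax = b, x >= 0), reduce its support until the
    corresponding columns of A are linearly independent (proof of Theorem 7.6 i))."""
    A, b = np.asarray(A, float), np.asarray(b, float)
    x = np.asarray(x, float).copy()
    assert np.allclose(A @ x, b) and np.all(x >= -tol)
    while True:
        P = np.where(x > tol)[0]                     # indices of positive coordinates
        AP = A[:, P]
        if AP.shape[1] == 0 or np.linalg.matrix_rank(AP) == AP.shape[1]:
            return x                                 # independent columns: x is basic
        yP = np.linalg.svd(AP)[2][-1]                # a nonzero vector with A_P y_P = 0
        if not np.any(yP > tol):
            yP = -yP                                 # make sure some component is positive
        eps = np.min(x[P][yP > tol] / yP[yP > tol])  # the ratio test of the proof
        x[P] = x[P] - eps * yP                       # at least one coordinate drops to 0
        x[np.abs(x) < tol] = 0.0

A = [[-1, 3, 0, 0, 0],
     [ 0, 2, -1, -1, 0],
     [ 0, 0, 0, -1, 1]]
print(bfs_from_feasible_point(A, [3, 2, 0], [3, 2, 1, 1, 1]))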

In the next section, we are going to study how to solve LPs by the simplex method.

7.2 The simplex method

• The simplex method proceeds from one BFS to an adjacent (neighboring) one
in such a way as to continually improve the value of the objective function (moving
from one extreme point to another with a better, or at least no worse, objective value).

• It also discovers whether the feasible region is empty, and whether the optimal
value is unbounded. In practice, the method only enumerates a small portion of
the extreme points of the feasible region.

• The key to the simplex method lies in recognizing the optimality of a given extreme
point solution based on local consideration without having to enumerate all extreme
points or basic feasible solutions.

• The basic idea of the simplex method is to confine the search to extreme points of
the feasible region in an intelligent way; the key is to characterize extreme points
algebraically so that a computer can find them.


7.2.1 Reformulation of LP by the current BFS

For simplicity, throughout the remainder of this course, the vector c is understood as
a row vector. We always assume that the problem is in standard form

min cx

s.t. Ax = b, x ≥ 0,

where A is an m × n matrix with full row rank, i.e., rank(A) = m, b is a vector in Rm , and
c is a vector in Rn .

Initial basis and corresponding objective value

Suppose that we have a basic feasible solution (B −1 b, 0). The corresponding objective value f0 is given by

f0 = (cB , cN )(B −1 b, 0)T = cB B −1 b.

For this given basis B, we would like to check if it is an optimal solution of the
problem.

If it is not, we would like to find another basic feasible solution at which the objective
value is smaller than the current objective value f0 .

Reformulation of the problem

Let x be a feasible solution of Ax = b, and let xB and xN be its basic and nonbasic
variables for this given basis. Thus, we have

b = Ax = [B, N ](xB , xN )T = BxB + N xN .

Multiplying by B −1 we have

xB = B −1 b − B −1 N xN = B −1 b − Σ_{j∈R} B −1 aj xj = b̄ − Σ_{j∈R} yj xj ,

where R denotes the current set of indices of the nonbasic variables, and

b̄ = B −1 b


and the column vector yj is given by

yj = B −1 aj , j ∈ R.

The corresponding objective value is given by

f = cx
= cB xB + cN xN
= cB (B −1 b − B −1 N xN ) + cN xN
= cB B −1 b − (cB B −1 N − cN )xN
= f0 − (z − cN )xN
= f0 − Σ_{j∈R} (zj − cj )xj ,

where the row vector


z = cB B −1 N,

i.e.,
zj = cB B −1 aj = cB yj , j ∈ R.

(aj means the jth column vector of A, j ∈ R.)

Therefore, the LP can be written as

Minimize f = f0 − Σ_{j∈R} (zj − cj )xj
subject to Σ_{j∈R} yj xj + xB = b̄,   (7.3)
xj ≥ 0, j ∈ R, and xB ≥ 0.

For this problem, xB can be viewed as a vector of slack variables, so the above LP can be further
rewritten as

Minimize f = f0 − Σ_{j∈R} (zj − cj )xj
subject to Σ_{j∈R} yj xj ≤ b̄,   (7.4)
xj ≥ 0, j ∈ R.

Definition:


(i) The value −(zj − cj ) = cj − zj is referred to as the reduced cost coefficient.

(ii) If b̄ = B −1 b > 0, the current BFS is called non-degenerate. If one of the components
of the vector b̄ is zero, the BFS is called degenerate.
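To make these quantities concrete, the following Python sketch (illustrative, not part of the original notes; numpy assumed) computes b̄, the columns yj and the values zj − cj for a given basis, here applied to the data of Example 7.4 with an arbitrary cost vector.

import numpy as np

def reformulation_data(A, b, c, basic):
    """Return (b_bar, Y, z_minus_c) for the basis given by the column indices in `basic`."""
    A, b, c = np.asarray(A, float), np.asarray(b, float), np.asarray(c, float)
    nonbasic = [j for j in range(A.shape[1]) if j not in basic]
    B, N = A[:, basic], A[:, nonbasic]
    Binv = np.linalg.inv(B)
    b_bar = Binv @ b                        # values of the basic variables
    Y = Binv @ N                            # the columns y_j = B^{-1} a_j, j in R
    z_minus_c = c[basic] @ Y - c[nonbasic]  # z_j - c_j for the nonbasic variables
    return b_bar, Y, z_minus_c

A = [[1, 1, 1, 0],
     [0, 1, 0, 1]]
b = [6, 3]
c = [-3, 1, 0, 0]                            # an illustrative cost vector
print(reformulation_data(A, b, c, basic=[0, 1]))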

7.2.2 Checking the Optimality

Case 1: zj − cj ≤ 0 for all j ∈ R.

Proposition 7.8. From (7.4), we see that if (zj − cj ) ≤ 0 for all j ∈ R, then the current
basic feasible solution is optimal.

Proof. Since zj − cj ≤ 0 for all j ∈ R, from f = f0 − Σ_{j∈R} (zj − cj )xj we see that f ≥ f0
for any feasible solution. For the current basic feasible solution, we know that f = f0
since xj = 0 for all j ∈ R. So the current basic feasible solution is optimal.

Case 2: zj − cj > 0 for some j ∈ R.

For such j, there are two possible cases:

• If zj − cj > 0 and the vector


yj = B −1 aj ≤ 0,

then LP has an unbounded solution (or simply, we say that the LP is unbounded)
since we can send xj to +∞ in (7.4).

• If zj − cj > 0 and the vector

yj = B −1 aj = (y1j , ..., ymj )T

has a positive component, then the current objective value can be further
decreased (provided the current BFS is non-degenerate).

Principles of simplex method:

The simplex procedure moves from one basis to an adjacent basis, by introducing one
variable into the basis, and removing another variable from the basis. The criteria for
entering and leaving are summarized below.

• Entering: If zk − ck > 0, xk may enter the basis.


• Leaving: If

b̄r /yrk = min{ b̄i /yik : 1 ≤ i ≤ m, yik > 0 },

then xBr (the rth component of xB ) may leave the basis.

During the process:

1. If zj − cj ≤ 0 for all nonbasic variables, terminate: we have an optimal solution.

2. If for some k ∈ R we have zk − ck > 0 and yk = B −1 ak ≤ 0, then the LP is unbounded.

3. Otherwise, use the above rules to choose the entering and leaving variables, form the new
basis, and obtain an improved basic feasible solution.

7.2.3 Analysis for the case zj −cj > 0 and yj = B −1 aj has a positive
component for some nonbasic variable

In fact, choose one of the positive numbers, say zk − ck , perhaps the most positive
of all the zj − cj , j ∈ R. Set

xj = 0 for j ∈ R − {k}.

Denote yk = (y1k , y2k , . . . , yrk , . . . , ymk )T .
We see from (7.3) that
f = f0 − (zk − ck )xk (7.5)


and

xB = (xB1 , xB2 , . . . , xBr , . . . , xBm )T = b̄ − yk xk = (b̄1 − y1k xk , b̄2 − y2k xk , . . . , b̄r − yrk xk , . . . , b̄m − ymk xk )T . (7.6)

Assume the current basic feasible solution is non-degenerate, i.e., b̄ > 0. Now let us
analyze what will happen when we increase the value of xk .

(i) From (7.5), the objective value will decrease.

(ii) For those i with yik ≤ 0, when xk increases, (7.6) implies that the component xBi does not
decrease, so it remains nonnegative.

(iii) For those yik > 0, increasing xk , the corresponding component xBi will decrease.
From (7.6), the first basic variable dropping to zero corresponds to the minimum of b̄i /yik
for positive yik . More precisely, we can increase xk until

xk = b̄r /yrk = min{ b̄i /yik : 1 ≤ i ≤ m, yik > 0 }. (7.7)

Under non-degeneracy, xk > 0. From (7.5) and the fact that zk − ck > 0, it follows that
the objective can be strictly improved. As xk increases from level 0 to b̄r /yrk , a new improved
feasible solution is obtained. Substituting xk given by (7.7) into (7.6) yields the following:

xBi = b̄i − (yik /yrk ) b̄r , i = 1, 2, ..., m,   xk = b̄r /yrk ,   and all other xj 's are zero. (7.8)

From (7.8), we have xBr = 0, and hence at most m variables are positive. The
corresponding columns in A are

aB1 , aB2 , ..., aBr−1 , ak , aBr+1 , ..., aBm .

Proposition 7.9. These columns are linearly independent since yrk ≠ 0.


Indeed, by the definition of yk , we have yk = B −1 ak . Thus,

ak = Byk = aB1 y1k + ... + aBr yrk + ... + aBm ymk .

Since yrk ≠ 0, when the rth column of B is replaced by ak , the resulting vectors are still
linearly independent. This fact follows from the next lemma, which answers the following
natural question: if we have a basis of Rn and one of its vectors, say aj , is replaced by another
vector, say a, under what condition does the new set of vectors still form a basis?

Lemma 7.10. Let a1 , a2 , ..., an form a basis of Rn . Then any vector a can be represented
as

a = Σ_{j=1}^{n} λj aj .

If we replace aj by a, then the vectors a1 , ..., aj−1 , a, aj+1 , ..., an are linearly independent if
and only if, in the above representation, λj ≠ 0.

Proposition 7.11. The point (7.8) is a basic feasible solution corresponding to the basis

B̃ = [aB1 , aB2 , ..., aBr−1 , ak , aBr+1 , ..., aBm ],

which is formed by replacing the rth column of B by ak .

Proof. Let’s versify that


 
xB1
 
 xB2
 

 ..
 
 .


[aB1 , aB2 , ..., aBr−1 , ak , aBr+1 , ..., aBm ] 

=b

 xk 
 ..
 
 .


 
xBm


where xB1 , xB2 , ..., xk , ..., xBm are given by (7.8). In fact, the left-hand side equals

(b̄1 − (y1k /yrk ) b̄r ) aB1 + ... + (b̄r /yrk ) ak + ... + (b̄m − (ymk /yrk ) b̄r ) aBm
= (b̄1 − y1k xk ) aB1 + ... + xk ak + ... + (b̄m − ymk xk ) aBm
= Σ_{j=1}^{m} b̄j aBj + ( −b̄r aBr − y1k xk aB1 − ... − y(r−1)k xk aBr−1 + xk ak − y(r+1)k xk aBr+1 − ... − ymk xk aBm )
= Σ_{j=1}^{m} b̄j aBj + xk ( ak − Σ_{j=1}^{m} yjk aBj )
= B b̄ + xk (ak − Byk )
= b + 0 = b.

The last equality follows from the fact that yk = B −1 ak , i.e., Byk = ak .

Chapter 8

The simplex method (II): Algorithm

In this chapter, we continue studying the simplex method, focusing on its algorithmic
aspects. We will study how to implement the simplex method in
practice. The following topics will be covered:

• description of the simplex algorithm,

• the simplex method in tableau format,

• further examples of the simplex methods,

• extracting information from the optimal simplex tableau.

At the end of the chapter, students should be able to

• understand the main steps of the simplex algorithm,

• apply the algorithm to solve an LP problem using the tableau format,

• extract useful information from the optimal simplex tableau.

[BT97, Chapter 3] and [DT97, Chapter 3] contain material similar to this chapter that you
may find useful.

8.1 The algorithm for minimization problems

Initial Step


Choose a starting basic feasible solution with basis B.

Main Step

Step 1. Solve the system BxB = b. Set

xB = B −1 b = b̄,

xN = 0,

f = cB xB = cB B −1 b = cB b̄.

Step 2. Set w = cB B −1 . Calculate

zj − cj = waj − cj for all nonbasic variables.

Set
zk − ck = max{zj − cj },
j∈R

where R is the current set of indices associated with the nonbasic variables.

If zk − ck ≤ 0, then stop (the current basic feasible solution is an optimal solution).


Otherwise, go to Step 3.

Step 3. Solve the system Byk = ak , obtaining yk = B −1 ak . If yk ≤ 0, then stop (the


optimal value of LP is unbounded). Otherwise, go to step 4.

Step 4. Let xk enter the basis. The index r of the current basic variable xBr which
leaves the basis is determined by the following minimum ratio test:

b̄r /yrk = min{ b̄i /yik : 1 ≤ i ≤ m, yik > 0 },

where yik denotes the ith component of yk .

Update the basis B where ak replaces aBr , update the index set R, and go to step 1.

Remarks: At each iteration, one of the following three actions is executed.

• We may stop with an optimal solution if zk − ck ≤ 0;

• We may stop with an unbounded optimal value if zk − ck > 0 and yk ≤ 0;


• or else we generate a new basic feasible solution if zk − ck > 0 and yk ≰ 0, and the
objective function strictly decreases.

Therefore, in the absence of degeneracy (i.e. xB > 0), the objective strictly decreases
at each iteration, and hence the basic feasible solutions generated by the simplex method
are distinct. Since there is only a finite number of basic feasible solutions, the method
stops in a finite number of steps with either an optimal solution or an unbounded
optimal value.

Theorem 8.1 (Convergence). In the absence of degeneracy (and assuming feasibility),
the simplex method stops in a finite number of iterations, either with an optimal basic
feasible solution or with the conclusion that the optimal value is unbounded.
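Steps 1-4 translate almost line by line into code. The following Python sketch (an illustrative implementation, not part of the original notes; it assumes numpy, a known starting basic feasible solution and, for termination, non-degeneracy, since no anti-cycling rule is included) carries out the algorithm in the matrix form above.

import numpy as np

def revised_simplex(A, b, c, basic):
    """Minimise c x subject to Ax = b, x >= 0, starting from the basis `basic`
    (a list of m column indices giving a basic feasible solution)."""
    A, b, c = np.asarray(A, float), np.asarray(b, float), np.asarray(c, float)
    m, n = A.shape
    basic = list(basic)
    while True:
        B = A[:, basic]
        b_bar = np.linalg.solve(B, b)                    # Step 1: x_B = B^{-1} b
        w = np.linalg.solve(B.T, c[basic])               # Step 2: w = c_B B^{-1}
        nonbasic = [j for j in range(n) if j not in basic]
        zc = np.array([w @ A[:, j] - c[j] for j in nonbasic])
        if np.all(zc <= 1e-9):                           # optimal
            x = np.zeros(n)
            x[basic] = b_bar
            return x, c @ x
        k = nonbasic[int(np.argmax(zc))]                 # entering variable
        yk = np.linalg.solve(B, A[:, k])                 # Step 3: y_k = B^{-1} a_k
        if np.all(yk <= 1e-9):
            raise ValueError("the optimal value is unbounded")
        ratios = np.full(m, np.inf)                      # Step 4: minimum ratio test
        pos = yk > 1e-9
        ratios[pos] = b_bar[pos] / yk[pos]
        r = int(np.argmin(ratios))
        basic[r] = k                                     # a_k replaces a_{B_r}; repeat

# The data of Example 8.2 in the next section, starting from the slack basis:
A = [[1, 1, 2, 1, 0, 0],
     [1, 1, -1, 0, 1, 0],
     [-1, 1, 1, 0, 0, 1]]
b = [9, 2, 4]
c = [1, 1, -4, 0, 0, 0]
print(revised_simplex(A, b, c, basic=[3, 4, 5]))   # optimum -17 at x1 = 1/3, x3 = 13/3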

8.2 The simplex method in Tableau Format

All the operations in simplex method can be performed in a Tableau. Let B be a


starting basis. The LP problem can be represented as follows:

Minimize f

subject to f − cB xB − cN xN = 0 (8.1)

BxB + N xN = b (8.2)

xB , xN ≥ 0.

(8.2) can be written as


xB + B −1 N xN = B −1 b.

Multiplying both sides by cB yields

cB xB + cB B −1 N xN = cB B −1 b.

Adding this equality to (8.1) leads to

f + (cB B −1 N − cN )xN = cB B −1 b.



So the LP is reformulated as

Minimize f

subject to f + 0xB + (cB B −1 N − cN )xN = cB B −1 b,


xB + B −1 N xN = B −1 b,

xB , xN ≥ 0.

Here we think of f as a variable to be minimized. We put all information of the above


problem in the following tableau:

xB xN RHS
f 0 cB B −1 N − cN cB B −1 b
xB I B −1 N B −1 b

The table contains all information of the linear programming that we need to proceed
with the simplex method:

• The right-hand-side gives us the objective value,

• The right-hand-side also gives basic variable value B −1 b,

• The cost row gives us cB B −1 N − cN , which consists of the zj − cj for nonbasic


variables,

• Vector yj , j ∈ R, i.e., the columns of B −1 N.

We need a scheme to do the following:

(i) Update the basic variable and their values

(ii) Update the zj − cj values of the new nonbasic variables.

(iii) Update the yj columns.


All of these tasks can be accomplished simultaneously by a simple pivoting operation,
which is actually a series of elementary row operations on the matrix

[1 cB cN 0; 0 B N b].

8.2.1 Pivoting

If xk enters the basis and xBr leaves the basis, then pivoting on yrk includes the
following operations:

• Divide row r by yrk .

• For i = 1, ..., m and i 6= r, update the ith row by adding to it −yik times the new
rth row.

• Update row zero by adding to it −(zk − ck ) times the new rth row.

The following two tables show the situation before and after pivoting operations.

Table 8.1: Before pivoting

       xB1 ··· xBr ··· xBm ···   xj    ···   xk    ··· RHS
f       0  ···  0  ···  0  ··· zj − cj ··· zk − ck ··· cB b̄
xB1     1  ···  0  ···  0  ···  y1j    ···  y1k    ··· b̄1
⋮
xBr     0  ···  1  ···  0  ···  yrj    ···  yrk    ··· b̄r
⋮
xBm     0  ···  0  ···  1  ···  ymj    ···  ymk    ··· b̄m

Implication of the pivoting operation:

• xk entered and xBr left the basis. This is illustrated on the left-hand-side of the
tableau by replacing xBr with xk .


Table 8.2: After pivoting

       xB1 ···        xBr         ··· xBm ···               xj                ··· xk ···            RHS
f       0  ··· −(zk − ck )/yrk    ···  0  ··· (zj − cj ) − (yrj /yrk )(zk − ck ) ···  0 ··· cB b̄ − (zk − ck ) b̄r /yrk
xB1     1  ···   −y1k /yrk        ···  0  ···     y1j − (yrj /yrk ) y1k          ···  0 ···   b̄1 − (y1k /yrk ) b̄r
⋮
xk      0  ···    1/yrk           ···  0  ···          yrj /yrk                  ···  1 ···        b̄r /yrk
⋮
xBm     0  ···   −ymk /yrk        ···  1  ···     ymj − (yrj /yrk ) ymk          ···  0 ···   b̄m − (ymk /yrk ) b̄r

• The right-hand-side gives the current values of the basic variables. The nonbasic
variables are kept zero.

• Suppose that the original columns for the new basic and nonbasic variables are B̂
and N̂ , respectively. The pivoting operation, which is a sequence of elementary row
operations, results in a new tableau that gives the new B̂ −1 N̂ under the nonbasic
variables, an updated set of zj − cj 's for the new nonbasic variables, and the values
of the new basic variables and the objective.

8.2.2 The simplex method in tableau Format

Initialization step:

Find an initial basic feasible solution with basis B. Form the following initial tableau.

xB xN RHS
f 0 cB B −1 N − cN cB B −1 b
xB I B −1 N B −1 b

Main Step:

Step 1. Let zk − ck = max{zj − cj : j ∈ R}.

• If zk − ck ≤ 0, stop. The current point is optimal.


• Otherwise, xk will enter, and go to Step 2

Step 2. Examine yk .

• If yk ≤ 0, then stop (The LP is unbounded).

• Otherwise, go to Step 3

Step 3. Determine the index r by minimum ratio test to find the variable xBr to
leave.

Step 4.

• Update the tableau by pivoting at yrk .

• Update the basic and nonbasic variables where xk enters the basis and xBr leaves
the basis, and repeat the main step.
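The tableau version of the method, including the pivoting operations of Section 8.2.1, fits in a few lines of code. The sketch below (illustrative only, not part of the original notes; numpy assumed) takes a problem whose starting basis is an identity block of zero-cost slack columns, so the initial objective row is simply −c, and it is applied to the data of Example 8.2, which follows.

import numpy as np

def pivot(T, r, k):
    """Pivot the tableau T on entry (r, k); row 0 is the objective row and the
    last column is the RHS, exactly as in the tableaux of this chapter."""
    T[r] = T[r] / T[r, k]                      # divide row r by y_rk
    for i in range(T.shape[0]):
        if i != r:
            T[i] = T[i] - T[i, k] * T[r]       # eliminate column k from the other rows

def tableau_simplex(A, b, c, basic):
    """Minimise c x s.t. Ax = b, x >= 0, where `basic` indexes an identity block
    of zero-cost (slack) columns, so no restoration step is needed."""
    A, b, c = np.asarray(A, float), np.asarray(b, float), np.asarray(c, float)
    m, n = A.shape
    T = np.zeros((m + 1, n + 1))
    T[0, :n], T[1:, :n], T[1:, n] = -c, A, b    # objective row holds z_j - c_j = -c_j
    basic = list(basic)
    while True:
        k = int(np.argmax(T[0, :n]))            # most positive z_j - c_j
        if T[0, k] <= 1e-9:                     # optimal tableau
            x = np.zeros(n)
            x[basic] = T[1:, n]
            return x, T[0, n]
        col = T[1:, k]
        if np.all(col <= 1e-9):
            raise ValueError("the optimal value is unbounded")
        ratios = np.full(m, np.inf)
        pos = col > 1e-9
        ratios[pos] = T[1:, n][pos] / col[pos]  # minimum ratio test
        r = 1 + int(np.argmin(ratios))
        pivot(T, r, k)
        basic[r - 1] = k

A = [[1, 1, 2, 1, 0, 0],
     [1, 1, -1, 0, 1, 0],
     [-1, 1, 1, 0, 0, 1]]
print(tableau_simplex(A, [9, 2, 4], [1, 1, -4, 0, 0, 0], basic=[3, 4, 5]))
# x = (1/3, 0, 13/3, 0, 6, 0), optimal value -17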

Example 8.2. Solve the following LP problem by the simplex method

Minimise x1 + x2 − 4x3

s.t. x1 + x2 + 2x3 ≤ 9,

x1 + x2 − x3 ≤ 2,

−x1 + x2 + x3 ≤ 4,

x1 , x2 , x3 ≥ 0.

Adding slack variables leads to

Minimise x1 + x2 − 4x3 + 0x4 + 0x5 + 0x6

subject to x1 + x2 + 2x3 + x4 = 9,

x1 + x2 − x3 + x5 = 2,

−x1 + x2 + x3 + x6 = 4,

x1 , x2 , x3 , x4 , x5 , x6 ≥ 0.

Since b = (9, 2, 4)T ≥ 0, the initial basis can be chosen as B = [a4 , a5 , a6 ] = I, so B −1 b = b̄ ≥ 0.
Thus, the initial tableau is given by

Initial iteration


x1 x2 x3 x4 x5 x6 RHS
f −1 −1 4 0 0 0 0
x4 1 1 2 1 0 0 9
x5 1 1 −1 0 1 0 2
x6 −1 1 1∗ 0 0 1 4

• From this table, we see that x3 is a nonbasic variable, and z3 − c3 = 4 > 0. Thus,
the current basic feasible solution (x4 , x5 , x6 ) = (9, 2, 4) is not optimal. x3 will enter
the basis.

• Now we are going to determine which vector among x4 , x5 , x6 will leave the basis.
By checking y3 = (2, −1, 1)T , we note that it has two positive components. By the
minimum ratio test: min{9/2, 4/1} = 4. The variable x6 leaves the basis, and the marked
element in the above table is the pivoting element.

• We do pivoting operations to get a new basic feasible solution. Thus the algorithm
proceeds to the next iteration.

Iteration 1.

x1 x2 x3 x4 x5 x6 RHS
f 3 −5 0 0 0 −4 −16
x4 3∗ −1 0 1 0 −2 1
x5 0 2 0 0 1 1 6
x3 −1 1 1 0 0 1 4

• x1 is a nonbasic variable, and z1 − c1 = 3 > 0. Thus, the current basic feasible


solution (x4 , x5 , x3 ) = (1, 6, 4) is not optimal. x1 will enter the basis.

• Now we determine which vector among x4 , x5 , x3 will leave the basis. By checking
y1 = (3, 0, −1)T and the minimum ratio test indicates that x4 should leave the basis.

• We do pivoting operations, and get the following tableau:

Iteration 2.


x1 x2 x3 x4 x5 x6 RHS
f 0 −4 0 −1 0 −2 −17
x1 1 −1/3 0 1/3 0 −2/3 1/3
x5 0 2 0 0 1 1 6
x3 0 2/3 1 1/3 0 1/3 13/3

Now, zj − cj ≤ 0 for all nonbasic variables, and thus an optimal solution is found and
it is given by
x∗1 = 1/3, x∗2 = 0, x∗3 = 13/3.

The optimal value is


z ∗ = −17.

The optimal basis B consists of the column a1 , a5 and a3 , i.e., B = [a1 , a5 , a3 ].

8.3 Degeneracy and cycling in the simplex method

When applying the simplex method to solve a linear programming problem, it is
possible that one starts from some basic feasible solution and ends up at the exact same
basic feasible solution after a sequence of pivots. This phenomenon is called "cycling" and
can occur when a basic feasible solution is degenerate. In the worst case, the simplex
method can cycle forever. The first cycling example was given by Hoffman in 1952; it
had 11 variables and 3 equations. Since then, many examples have been constructed. To
demonstrate the cycling phenomenon, we consider the following famous example, which was
given by E. M. L. Beale in 1955 [Bea55]. It is conjectured that this is the simplest example
of cycling which is not totally degenerate, in the sense that none can be constructed with
fewer variables regardless of the number of equations.


Example 8.3 (Beale 1955). Solve the following LP by using simplex method:

3 1
Minimise − x1 + 150x2 − x3 + 6x4
4 50
1 1
subject to x1 − 60x2 − x3 + 9x4 ≤ 0,
4 25
1 1
x1 − 90x2 − x3 + 3x4 ≤ 0,
2 50
x3 ≤ 1,

x1 , x2 , x3 , x4 ≥ 0.

Solution. In each iteration below, the pivoting entry is marked by an asterisk. Adding slack
variables x5 , x6 , x7 , we obtain the following problem

3 1
Minimise − x1 + 150x2 − x3 + 6x4
4 50
1 1
subject to x1 − 60x2 − x3 + 9x4 + x5 = 0,
4 25
1 1
x1 − 90x2 − x3 + 3x4 + x6 = 0,
2 50
x3 + x7 = 1,

x1 , x2 , x3 , x4 , x5 , x6 , x7 ≥ 0.

Iteration 1.

x1 x2 x3 x4 x5 x6 x7 RHS
f 3/4∗ −150 1/50 -6 0 0 0 0
x5 1/4 -60 -1/25 9 1 0 0 0
x6 1/2 -90 -1/50 3 0 1 0 0
x7 0 0 1 0 0 0 1 1

Iteration 2.

x1 x2 x3 x4 x5 x6 x7 RHS
f 0 30 7/50 -33 -3 0 0 0
x1 1 -240 -4/25 36 4 0 0 0
x6 0 30∗ 3/50 -15 -2 1 0 0
x7 0 0 1 0 0 0 1 1

Iteration 3.


x1 x2 x3 x4 x5 x6 x7 RHS
f 0 0 2/25 -18 -1 -1 0 0
x1 1 0 8/25∗ -84 -12 8 0 0
x2 0 1 1/500 -1/2 -1/15 1/30 0 0
x7 0 0 1 0 0 0 1 1

Iteration 4.

x1 x2 x3 x4 x5 x6 x7 RHS
f -1/4 0 0 3 2 -3 0 0
x3 25/8 0 1 -525/2 -75/2 25 0 0
x2 -1/60 1 0 1/40∗ 1/120 -1/60 0 0
x7 -25/8 0 0 525/2 75/2 -25 1 1

Iteration 5.

x1 x2 x3 x4 x5 x6 x7 RHS
f 1/2 -120 0 0 1 -1 0 0
x3 -125/2 10500 1 0 50∗ -150 0 0
x4 -1/4 40 0 1 1/3 -2/3 0 0
x7 125/2 10500 0 0 50 150 1 1

Iteration 6.

x1 x2 x3 x4 x5 x6 x7 RHS
f 7/4 -330 -1/50 0 0 2 0 0
x5 -5/4 210 1/50 0 1 -3 0 0
x4 1/6 -30 -1/150 1 0 1/3∗ 0 0
x7 0 0 1 0 0 0 1 1

Iteration 7.

x1 x2 x3 x4 x5 x6 x7 RHS
f 3/4∗ −150 1/50 -6 0 0 0 0
x5 1/4 -60 -1/25 9 1 0 0 0
x6 1/2 -90 -1/50 3 0 1 0 0
x7 0 0 1 0 0 0 1 1


This tableau is exactly the same as the initial tableau. Therefore, the simplex method will cycle
endlessly. We observe that the basic feasible solution in each iteration is degenerate
(but not totally degenerate) and the objective value actually does not change. It is
this degeneracy that causes cycling. There are several ways to avoid cycling. The most
commonly used method is Bland's rule, which is formally stated as follows:

1. Choose the lowest-numbered (i.e., leftmost) nonbasic column with a negative (reduced) cost.

2. If more than one row satisfies the minimum ratio test, choose the row whose basic
variable has the lowest index.

Let us apply the Bland’s rule to the above example. In the Iteration 5. we choose x1 to
enter instead of x5 . Iteration 5’.

x1 x2 x3 x4 x5 x6 x7 RHS
f 1/2 -120 0 0 1 -1 0 0
x3 -125/2 10500 1 0 50 -150 0 0
x4 -1/4 40 0 1 1/3 -2/3 0 0
x7 125/2∗ 10500 0 0 50 150 1 1

This choice avoids the cycling.
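In code, Bland's rule only changes how the entering column and the leaving row are selected; the pivoting itself is unchanged. A minimal Python sketch (illustrative, not from the notes; it assumes the same tableau layout as before, with row 0 holding the zj − cj values and the last column holding the RHS):

import numpy as np

def bland_entering(T, n):
    """Lowest-indexed column with z_j - c_j > 0 (i.e. with negative reduced cost)."""
    for j in range(n):
        if T[0, j] > 1e-9:
            return j
    return None                                   # none: the current BFS is optimal

def bland_leaving(T, k, basic):
    """Among the rows attaining the minimum ratio, return the one (as a tableau row
    index) whose basic variable has the smallest index."""
    col, rhs = T[1:, k], T[1:, -1]
    pos = col > 1e-9
    if not np.any(pos):
        return None                               # unbounded direction
    ratios = np.full(col.shape, np.inf)
    ratios[pos] = rhs[pos] / col[pos]
    best = ratios.min()
    ties = [i for i in range(len(basic)) if pos[i] and abs(ratios[i] - best) < 1e-9]
    return 1 + min(ties, key=lambda i: basic[i])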

8.4 Efficiency of the simplex method

An interesting property of the simplex method is that it is remarkably efficient in


practice. However, in 1972, Klee and Minty [KM72] provided an example, which is now
known as the Klee–Minty cube, showing that the worst-case complexity of the simplex
method as formulated by Dantzig is exponential time. Since then the Klee–Minty cube
has been used to analyze the performance of many algorithms both in the worst case and
on average. The Klee-Minty example is given by


Example 8.4 (Klee-Minty cube).

Maximise 2n−1 x1 + 2n−2 x2 + . . . + 2xn−1 + xn

subject to x1 ≤ 5,

4x1 + x2 ≤ 25,

8x1 + 4x2 + x3 ≤ 125,


..
.

2n x1 + 2n−1 x2 + . . . + 4xn−1 + xn ≤ 5n ,

x1 , . . . , xn ≥ 0.

The LP has n variables, n constraints and 2^n extreme points. The simplex method with
Dantzig's pivoting rule, starting from the origin, will go through each of the extreme points before obtaining the
optimal solution at (0, 0, . . . , 5^n ). To illustrate this, we consider the Klee-Minty cube for
the case n = 3, that is

Maximise 4x1 + 2x2 + x3

subject to x1 ≤ 5,

4x1 + x2 ≤ 25,

8x1 + 4x2 + x3 ≤ 125,

x1 , x2 , x3 ≥ 0.

It is easy to see that 4x1 + 2x2 + x3 ≤ 8x1 + 4x2 + x3 ≤ 125, and both inequalities become
equalities when x1 = x2 = 0, x3 = 125. Thus the optimal value is 125 and is achieved
at (x1 , x2 , x3 ) = (0, 0, 125). Adding s1 , s2 , s3 as slack variables, we obtain the following
problem
problem

Maximise 4x1 + 2x2 + x3

subject to x1 + s1 = 5,

4x1 + x2 + s2 = 25,

8x1 + 4x2 + x3 + s3 = 125,

x1 , x2 , x3 , s1 , s2 , s3 ≥ 0.

Below are all the tableaux


x1 x2 x3 s1 s2 s3 RHS
−f 4 2 1 0 0 0 0
s1 1 0 0 1 0 0 5
s2 4 1 0 0 1 0 25
s3 8 4 1 0 0 1 125

x1 x2 x3 s1 s2 s3 RHS
−f 0 2 1 −4 0 0 −20
x1 1 0 0 1 0 0 5
s2 0 1 0 −4 1 0 5
s3 0 4 1 −8 0 1 85

x1 x2 x3 s1 s2 s3 RHS
−f 0 0 1 4 −2 0 −30
x1 1 0 0 1 0 0 5
x2 0 1 0 −4 1 0 5
s3 0 0 1 8 −4 1 65

x1 x2 x3 s1 s2 s3 RHS
−f −4 0 1 0 −2 0 −50
s1 1 0 0 1 0 0 5
x2 4 1 0 0 1 0 25
s3 −8 0 1 0 −4 1 25

x1 x2 x3 s1 s2 s3 RHS
−f 4 0 0 0 2 −1 −75
s1 1 0 0 1 0 0 5
x2 4 1 0 0 1 0 25
x3 −8 0 1 0 −4 1 25

x1 x2 x3 s1 s2 s3 RHS
−f 0 0 0 −4 2 −1 −95
x1 1 0 0 1 0 0 5
x2 0 1 0 −4 1 0 5
x3 0 0 1 8 −4 1 65

x1 x2 x3 s1 s2 s3 RHS
−f 0 −2 0 4 0 −1 −105
x1 1 0 0 1 0 0 5
s2 0 1 0 −4 1 0 5
x3 0 4 1 −8 0 1 85

x1 x2 x3 s1 s2 s3 RHS
−f −4 −2 0 0 0 −1 −125
s1 1 0 0 1 0 0 5
s2 4 1 0 0 1 0 25
x3 8 4 1 0 0 1 125

Thus the simplex method applied to the Klee-Minty cube for n = 3, starting at the origin,
goes through all of the 8 vertices before reaching the optimal value.
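For experiments, the Klee-Minty instance is easy to generate for any n. The Python sketch below (illustrative, not part of the original notes; the function name is ours) builds the data in the maximization form used above; feeding it to a simplex implementation with Dantzig's rule, started from the slack basis, exhibits the 2^n iterations.

import numpy as np

def klee_minty(n):
    """Return (c, A, b) for: maximise c x subject to A x <= b, x >= 0."""
    c = np.array([2.0 ** (n - 1 - j) for j in range(n)])     # 2^{n-1}, ..., 2, 1
    A = np.zeros((n, n))
    for i in range(n):
        for j in range(i):
            A[i, j] = 2.0 ** (i + 1 - j)                      # coefficient 2^{i-j+1}
        A[i, i] = 1.0
    b = np.array([5.0 ** (i + 1) for i in range(n)])          # 5, 25, ..., 5^n
    return c, A, b

c, A, b = klee_minty(3)
print(c)    # [4. 2. 1.]
print(A)    # rows (1,0,0), (4,1,0), (8,4,1)
print(b)    # [5. 25. 125.]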

Remark 5 (Uniqueness of optimal solutions to a linear programming problem). It has


been proved [Gre94] that a primal-dual pair of optimal solutions is unique if and only if
it is a strictly complementary pair of basic solutions (see Theorem 6.25 for the definition
of a strictly complementary solution). However, in general, it may happen that both the
primal and the dual linear programming problems have multiple optimal solutions. For


example, consider the following LP problem

Minimise x1 + x2 + 2x3

subject to x1 + x3 = 1,

x2 + x3 = 1,

x3 ≤ 0,

x1 , x2 ≥ 0.

This problem has multiple solutions, for instance (1, 1, 0) and (0, 1, 1). In fact, any feasible
point is optimal. The dual problem is

Maximise y1 + y2

subject to y1 ≤ 1,

y2 ≥ 1,

y1 + y2 ≤ 2.

The dual also has multiple solutions, for example (1, 1) and (0, 2).

Remark 6. There are other algorithms for solving linear-programming problems. In par-
ticular, interior-point methods refer to a class of methods for solving linear programming
problems that move through the interior of the feasible region such as Khachiyan’s ellip-
soidal algorithm and Karmarkar’s algorithm. This is in contrast to the simplex algorithm,
which finds an optimal solution by traversing the edges between vertices of the feasible
region. Interior-point methods often have worst-case polynomial complexity; however, in
most practical applications, the simplex method is often very efficient.

Remark 7 (Open problems in Linear Programming). In 1998 the mathematician Stephen


Smale (Fields Medal in 1966) compiled a list of 18 problems in mathematics to be solved
in the 21st century, known as Smale’s problems. This list was compiled in the spirit of
Hilbert’s famous list of problems produced in 1900 for the 20th century. The 9-th problem
in the list is the following

The linear programming problem: Find a strongly-polynomial time algorithm which for
given matrix A ∈ Rm×n and b ∈ Rm decides whether there exists x ∈ Rn with Ax ≥ b.


8.5 Further Examples for Simplex Methods & De-


generacy

Example 8.5. Solve the following LP by using simplex method:

Minimise −3x1 + x2

s.t. x1 + 2x2 ≤ 4,

−x1 + x2 ≤ 1,

x1 , x2 ≥ 0.

Example 8.6. Solve the following LP by using simplex method:

Minimise −x1 − 3x2

s.t. x1 − 2x2 ≤ 4,

−x1 + x2 ≤ 3,

x1 , x2 ≥ 0.

The following tableaux summarize the computation.

x1 x2 x3 x4 RHS
f 1 3 0 0 0
x3 1 −2 1 0 4
x4 −1 1∗ 0 1 3

x1 x2 x3 x4 RHS
f 4 0 0 −3 −9
x3 −1 0 1 2 10
x2 −1 1 0 1 3

Since z1 − c1 = 4 > 0, x1 should enter the basis, however, the corresponding y1 =


(−1, −1)T ≤ 0. Thus, the LP problem has unbounded optimal value.


Example 8.7. Solve the following LP by using simplex method

Maximise −3x1 + x2

s.t. x1 − 3x2 ≥ −3,

2x1 + x2 ≥ −2,

2x1 + x2 ≤ 8,

x1 and x2 are unrestricted.

Degeneracy cases

When the right-hand-side vector b̄ is not strictly positive (at least one of its components is
zero), we can still use the simplex method to solve the problem. The following is such an
example.

Example 8.8 (Investment problem). Someone has $ 20,000 to finance various invest-
ments. There are five categories of investments, each with an associated return and risk:

Investment Return Risk


Mortgages 8 2
Commercial loans 10 6
Government securities 0 6
Personal loans 12 8
Savings 3 0

His goal is to allocate the money to the categories so as to maximize the return, subject
to the following constraints:

(a) The average risk is no more than 6 (all averages taken over the invested money, not
over the savings).

(b) The amount in mortgages and personal loans combined should be no higher than the
amount in commercial loans.

Model the problem:

Since there is no return from government securities, we do not consider such an


investment. Denote


• x1 — the amount invested in mortgages,

• x2 — the amount allocated to commercial loans,

• x3 and x4 are the amounts invested in personal loans and savings, respectively.

Note that constraint (a) implies that

(2x1 + 6x2 + 8x3 )/(x1 + x2 + x3 ) ≤ 6,

which can be written as

−4x1 + 2x3 ≤ 0.

We may formulate the problem as follows:

(P1 ) Maximise 8x1 + 10x2 + 12x3 + 3x4

subject to x1 − x2 + x3 ≤ 0,

−4x1 + 2x3 ≤ 0,

x1 + x2 + x3 + x4 = 20, 000,

x1 , x2 , x3 , x4 ≥ 0.

This can be written as a standard form

(P2 ) Minimise −8x1 − 10x2 − 12x3 − 3x4

subject to x1 − x2 + x3 + x5 = 0,

−4x1 + 2x3 + x6 = 0,

x1 + x2 + x3 + x4 = 20, 000,

x1 , x2 , x3 , x4 , x5 , x6 ≥ 0.

Thus, the initial table is

x1 x2 x3 x4 x5 x6 RHS
f 8 10 12 3 0 0 0
x5 1 −1 1 0 1 0 0
x6 −4 0 2 0 0 1 0
x4 1 1 1 1 0 0 20,000


Restoring the simplex tableau (eliminating the cost coefficient of the basic variable x4 from the objective row) gives

x1 x2 x3 x4 x5 x6 RHS
f 5 7 9 0 0 0 −60, 000
x5 1 −1 1∗ 0 1 0 0
x6 −4 0 2 0 0 1 0
x4 1 1 1 1 0 0 20,000

Replacing x5 by x3 , the next table will be

x1 x2 x3 x4 x5 x6 RHS
f −4 16 0 0 −9 0 −60, 000
x3 1 −1 1 0 1 0 0
x6 −6 2∗ 0 0 −2 1 0
x4 0 2 0 1 −1 0 20,000

Replacing x6 by x2 , we have

x1 x2 x3 x4 x5 x6 RHS
f 44 0 0 0 7 −8 −60, 000
x3 −2 0 1 0 0 1/2 0
x2 −3 1 0 0 −1 1/2 0
x4 6∗ 0 0 1 1 −1 20,000

Replacing x4 by x1 , we have

x1 x2 x3 x4 x5 x6 RHS
f 0 0 0 −22/3 −1/3 −2/3 −620, 000/3
x3 0 0 1 1/3 1/3 1/6 20,000/3
x2 0 1 0 1/2 −1/2 0 10,000
x1 1 0 0 1/6 1/6 −1/6 10,000/3

The optimal solution is given by


 
10, 000 20, 000
(x∗1 , x∗2 , x∗3 , x∗4 ) = , 10, 000, ,0 .
3 3

Thus, he will allocate $ 10,000/3 to mortgages, $10,000 to commercial loans, $20,000/3


to personal loans, and no allocation to savings. Note that the optimal value of the
standard-form problem (P2 ) is −620,000/3, and thus the maximum return for the original problem (P1 ) is

f ∗ = 620,000/3.

8.6 Extracting Information from Optimal Simplex Tableau

Suppose that we have obtained the final (optimal) tableau of the simplex method. What
information can be extracted from this final tableau, and how can this information be
used?

8.6.1 Find optimal solution and objective value

Example 8.9. The optimal tableau is given as follows:

x1 x2 x3 x4 x5 x6 RHS
f 0 0 0 −22/3 −1/3 −2/3 −620, 000/3
x3 0 0 1 1/3 1/3 1/6 20,000/3
x2 0 1 0 1/2 −1/2 0 10,000
x1 1 0 0 1/6 1/6 −1/6 10,000/3

The optimal solution is given by

(x∗1 , x∗2 , x∗3 , x∗4 , x∗5 , x∗6 )T = (10,000/3, 10,000, 20,000/3, 0, 0, 0)T .

The optimal value of the objective function is

f ∗ = −620,000/3.


8.6.2 Find the dual optimal solution and objective value

The dual optimal solution is given by

w∗ = cB B −1 ,

where B is the optimal basis. For the following two cases, we may find the dual optimal
solution immediately without any effort from the optimal tableau.

Case 1: The canonical form,

min{cx : Ax ≥ b, x ≥ 0}.

For this type of problem, the dual optimal solution is given by

wi∗ = (cB B −1 )i = −(zn+i − cn+i ), i = 1, ..., m,

which are the "reduced cost coefficients" of the slack variables xn+i (i = 1, ..., m) in
the optimal tableau.

Case 2: If the original problem is of the form

min{cx : Ax ≤ b, x ≥ 0},

then, the dual optimal solution is given as

wi∗ = (cB B −1 )i = zn+i − cn+i , i = 1, ..., m

which are “reduced cost coefficients” for slack variables xn+i (i = 1, ..., m) in the optimal
tableau.

For the standard case,

min{cx : Ax = b, x ≥ 0},

to determine the dual optimal solution we need to know B −1 . Is this information available
in the optimal tableau?


8.6.3 Find the inverse of optimal basis

In fact, at each iteration the current B −1 can be obtained immediately from the
current tableau; in particular, the inverse of the optimal basis can be found in the optimal tableau.

Assume that the original tableau contains an identity matrix. The process of reducing the
basis matrix B of the original tableau to an identity matrix in the current tableau is
equivalent to premultiplying rows 1 through m of the original tableau by B −1 to produce
the current tableau. This also converts the identity matrix of the original tableau to B −1 .
Therefore, B −1 can be extracted from the current tableau as the submatrix in rows 1
through m under the original identity columns.

Example 8.10. Consider the problem

Minimise −8x1 − 10x2 − 12x3 − 3x4

subject to x1 − x2 + x3 ≤ 0,

−4x1 + 2x3 ≤ 0,

x1 + x2 + x3 + x4 = 20, 000,

x1 , x2 , x3 , x4 ≥ 0.

Introducing the slack variables x5 , x6 leads to

Minimize −8x1 − 10x2 − 12x3 − 3x4

subject to x1 − x2 + x3 + x5 = 0,

−4x1 + 2x3 + x6 = 0,

x1 + x2 + x3 + x4 = 20, 000,

x1 , x2 , x3 , x4 , x5 , x6 ≥ 0.

Initial table is given as

x1 x2 x3 x4 x5 x6 RHS
f 8 10 12 3 0 0 0
x5 1 −1 1 0 1 0 0
x6 −4 0 2 0 0 1 0
x4 1 1 1 1 0 0 20,000


Final tableau (optimal tableau) is given by

x1 x2 x3 x4 x5 x6 RHS
f 0 0 0 −22/3 −1/3 −2/3 −620, 000/3
x3 0 0 1 1/3 1/3 1/6 20,000/3
x2 0 1 0 1/2 −1/2 0 10,000
x1 1 0 0 1/6 1/6 −1/6 10,000/3

What are B and B −1 ?

B = [a3 , a2 , a1 ] = [1 −1 1; 2 0 −4; 1 1 1],   B −1 = [1/3 1/6 1/3; −1/2 0 1/2; 1/6 −1/6 1/6].
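A quick numerical check of the matrices just read off (illustrative Python, not part of the notes):

import numpy as np

B = np.array([[1, -1,  1],
              [2,  0, -4],
              [1,  1,  1]], dtype=float)            # columns a3, a2, a1
B_inv = np.array([[ 1/3,  1/6, 1/3],
                  [-1/2,  0.0, 1/2],
                  [ 1/6, -1/6, 1/6]])               # read off under x5, x6, x4

print(np.allclose(B @ B_inv, np.eye(3)))            # True
print(B_inv @ np.array([0, 0, 20000]))              # (20000/3, 10000, 10000/3), the RHS column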
Example 8.11. The following simplex tableau shows the optimal solution of a linear
programming. It is known that x4 and x5 are the slack variables in the first and second
constraints of the original problem. The constraints of the original problem are of the ‘≤’
type.

x1 x2 x3 x4 x5 RHS
f 0 −2 0 −3 −2 −35
x3 0 1/4 1 1/2 0 5/2
x1 1 −1/2 0 −1/6 1/3 5/2

The initial identity basis corresponds to x4 and x5 . So in the optimal tableau, the
inverse of the optimal basis B is given by

B −1 = [1/2 0; −1/6 1/3].

8.6.4 Recovering original LP problems from the optimal tableau

In many situations, from the optimal tableau, we may recover the original LP problem.

Example 8.12. (Same example as Example 8.11) The following simplex tableau shows
the optimal solution of a linear programming problem. It is known that x4 and x5 are the slack
variables in the first and second constraints of the original problem. The constraints are
of the ‘≤’ type


x1 x2 x3 x4 x5 RHS
f 0 −2 0 −3 −2 −35
x3 0 1/4 1 1/2 0 5/2
x1 1 −1/2 0 −1/6 1/3 5/2

1. Determine optimal basis B.

2. Determine the second column of A, i.e., a2 .

3. Find the RHS vector b.

4. Determine the cost coefficient c of the objective.

5. What is the original LP problem?

8.6.5 Recovering missing data in simplex tableau

The data in the simplex tableaux might be lost or corrupted, in which case we need
to recover these data.

Example 8.13. The following is the optimal tableau of simplex method for a given min-
imization problem. The objective is to minimize −2x1 + 3x2 , and the slack variables are
x3 and x4 corresponding to the first and second inequalities, respectively. The constraints
are of the ‘≤’ type.

x1 x2 x3 x4 RHS
f b −1 f g -6
x3 c 0 1 1/5 4
x1 d e 0 2 a

Find the values of a, b, c, d, e, f, g in the above tableau.

Chapter 9

Sensitivity Analysis

What is sensitivity analysis and why do we care about it?

• In most practical applications, some of the problem data are uncertain or cannot be
known exactly. In this situation, the problem data are estimated. It is important to
be able to find the new optimal solution of the problem as other estimates of some
of the data become available, without the expensive task of resolving the problem
from scratch.

• Also at early stages of problem formulation some factors may be overlooked. It is


important to update the current solution in a way that takes care of these factors.

• Furthermore, in many situations the constraints are not very rigid. For example,
a constraint may reflect the availability of some resources. This availability can
be increased by extra purchase, and the like. It is desirable to examine the effect
of relaxing some of the constraints on the value of the optimal objective without
having to resolve the problem.

These and other related topics constitute sensitivity analysis. [BT97, Chapter 5] and
[DT97, Chapter 7] contain material similar to this chapter that you may find useful.

We still consider the standard linear programming problem:

min{cx : Ax = b, x ≥ 0}.

Suppose that the simplex method produces an optimal basis B.


We shall describe how to make use of the optimality conditions in order to find a new
optimal solution, if some of the problem data change, without re-solving the problem from
scratch. In particular, the following variations in the problem will be considered.

• Change in the cost vector c.

• Change in the right-hand-side vector b.

• Change in the constraint matrix A.

• Addition of a new activity.

• Addition of a new constraint.

9.1 Change in the cost vector

Changes in the cost vector c will change the cost row of the final tableau; that is,
dual feasibility may be affected (lost). Suppose ck is changed to c′k . This includes two
cases:

• Case I: xk is a nonbasic variable.

• Case II: xk is a basic variable.

We consider only the first case in this course, i.e., ck is changed to c′k and xk is
nonbasic.

In this case, cB is not changed, and hence zj = cB B −1 aj is not changed.

• For j ≠ k, the value zj − cj is not changed, and it satisfies zj − cj ≤ 0.

• For j = k, we have

zk − c′k = (zk − ck ) + (ck − c′k ).

Thus, if ck is increased, i.e., if c′k ≥ ck , then we still have zk − c′k ≤ 0, and the current
solution remains optimal. More generally, if c′k ≥ zk , the solution does not change.
Otherwise, zk − c′k > 0, in which case xk must be introduced into the basis and the
simplex method is continued as usual.


Example 9.1. Consider the problem

Minimise −2x1 + x2 − x3

s.t. x1 + x2 + x3 ≤ 6,

−x1 + 2x2 ≤ 4,

x1 , x2 , x3 ≥ 0.

The optimal tableau is given by

x1 x2 x3 x4 x5 RHS
f 0 −3 −1 −2 0 −12
x1 1 1 1 1 0 6
x5 0 3 1 1 1 10

Suppose that c2 = 1 is changed to c′2 = −3. Since x2 is nonbasic,

z2 − c′2 = (z2 − c2 ) + (c2 − c′2 ) = −3 + 4 = 1 > 0,

and all other zj − cj are unaffected. Hence x2 enters the basis and, by the minimum ratio
test (min{6/1, 10/3} = 10/3), it replaces x5 . So we get the following table

x1 x2 x3 x4 x5 RHS
f 0 1 −1 −2 0 −12
x1 1 1 1 1 0 6
x2 0 3∗ 1 1 1 10

Continue the simplex iteration, and find the new optimal solution as follows.

x1 x2 x3 x4 x5 RHS
f 0 0 −4/3 −7/3 −1/3 −46/3
x1 1 0 2/3 2/3 −1/3 8/3
x2 0 1 1/3 1/3 1/3 10/3

So the optimal solution of the changed LP is (x∗1 , x∗2 , x∗3 ) = (8/3, 10/3, 0), and the optimal value is
−46/3.
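As a sanity check (not part of the notes; SciPy and its linprog routine are assumed to be available), the modified problem can be solved directly:

import numpy as np
from scipy.optimize import linprog

# Example 9.1 with c2 changed from 1 to -3.
c = [-2, -3, -1]
A_ub = [[1, 1, 1],
        [-1, 2, 0]]
b_ub = [6, 4]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 3, method="highs")
print(res.x, res.fun)     # approximately (8/3, 10/3, 0) and -46/3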


9.2 Change in the right-hand-side

Suppose that the RHS vector b is changed to b′ . Then

• zj − cj = cB B −1 aj − cj does not change,

• and the solution B −1 b is changed to B −1 b′ . Thus, primal feasibility might be
affected. If one of the components of B −1 b′ is negative, the simplex method can be
continued on the final tableau by using the dual simplex method.

Otherwise, if B −1 b′ is nonnegative, the current basis remains optimal,
and the new optimal value is cB B −1 b′ .

Example 9.2. Consider the problem

Minimise −2x1 + x2 − x3

s.t. x1 + x2 + x3 ≤ 6,

−x1 + 2x2 ≤ 4,

x1 , x2 , x3 ≥ 0.

The optimal tableau is given by

x1 x2 x3 x4 x5 RHS
f 0 −3 −1 −2 0 −12
x1 1 1 1 1 0 6
x5 0 3 1 1 1 10

i) Suppose that the right-hand-side vector b = (6, 4)T is replaced by b′ = (5, 4)T . Check
whether this change affects the current optimal solution of the problem.

ii) What if the RHS vector is replaced by b′ = (−3, 4)T ?

Solution: The initial identity basis corresponds to x4 and x5 . So in the optimal tableau,
the inverse of the optimal basis B is given by

B −1 = [1 0; 1 1].


Suppose that the RHS vector b = (6, 4)T is replaced by b′ = (5, 4)T . We compute

B −1 b′ = [1 0; 1 1](5, 4)T = (5, 9)T > 0.

Thus the current basis is still optimal. The optimal solution becomes

xB = (x1 , x5 )T = B −1 b′ = (5, 9)T .

The optimal objective value is

cB B −1 b′ = (−2, 0)(5, 9)T = −10.

Now suppose that the RHS vector is changed to b′ = (−3, 4)T . We compute

B −1 b′ = [1 0; 1 1](−3, 4)T = (−3, 1)T .

Since B −1 b′ has a negative entry, the current basis becomes infeasible. The dual simplex
method (which will be studied in Chapter 11) implies that the dual problem is unbounded.
Hence the changed problem is infeasible. For this specific case, we can deduce that the
changed problem is infeasible directly since the first constraint is replaced by

x1 + x2 + x3 ≤ −3,

which is impossible for x1 , x2 , x3 ≥ 0.
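The same feasibility check can be written in a few lines of numpy; this is only a sketch of the computation B −1 b′ carried out above, with the matrix entries read off the optimal tableau (the library and names are illustrative, not part of the notes).

```python
# Right-hand-side change check of Example 9.2 using numpy.
import numpy as np

B_inv = np.array([[1.0, 0.0],
                  [1.0, 1.0]])      # read off from the x4, x5 columns of the optimal tableau
c_B = np.array([-2.0, 0.0])         # costs of the basic variables x1, x5

for b_new in (np.array([5.0, 4.0]), np.array([-3.0, 4.0])):
    x_B = B_inv @ b_new
    if np.all(x_B >= 0):
        print("basis still optimal:", x_B, "value", c_B @ x_B)
    else:
        print("primal feasibility lost:", x_B, "-> continue with the dual simplex method")
# Expected: (5, 9) with value -10, then (-3, 1), which triggers the dual simplex step.
```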

9.3 Adding a new activity

Suppose that a new activity xn+1 with cost cn+1 and consumption column an+1 is con-
sidered for possible production. Without re-solving the problem, we can easily determine
whether producing xn+1 is worthwhile. First calculate zn+1 − cn+1 .

• If zn+1 − cn+1 ≤ 0 (minimization problem), then x∗n+1 = 0 since it is a nonbasic
variable, and the current solution remains optimal.

• If zn+1 − cn+1 > 0, then xn+1 is introduced into the basis and the simplex method
continues to find the new optimal solution.


Example 9.3. Consider the problem

Minimise −2x1 + x2 − x3

s.t. x1 + x2 + x3 ≤ 6,

−x1 + 2x2 ≤ 4,

x1 , x2 , x3 ≥ 0.

The optimal tableau is given by

x1 x2 x3 x4 x5 RHS
f 0 −3 −1 −2 0 −12
x1 1 1 1 1 0 6
x5 0 3 1 1 1 10

Let’s find the new optimal solution if a new activity x6 ≥ 0 with c6 = 1 and a6 = (−1, 2)T is
introduced.

From the optimal tableau, we find that the inverse of the optimal basis B is

B −1 = [ 1 0 ; 1 1 ].

We compute

z6 − c6 = cB B −1 a6 − c6 = (−2, 0) [ 1 0 ; 1 1 ] (−1, 2)T − 1 = 1 > 0,

y6 = B −1 a6 = [ 1 0 ; 1 1 ] (−1, 2)T = (−1, 1)T .

We incorporate this information into the last optimal simplex tableau:

x1 x2 x3 x4 x5 x6 RHS
f 0 −3 −1 −2 0 1 −12
x1 1 1 1 1 0 -1 6
x5 0 3 1 1 1 1∗ 10

We continue with the simplex method: x6 enters the basis and x5 leaves. Pivoting on the
entry marked with a star in the above tableau, we obtain the following tableau.


x1 x2 x3 x4 x5 x6 RHS
f 0 −6 −2 −3 -1 0 −22
x1 1 4 2 2 1 0 16
x6 0 3 1 1 1 1 10

This tableau is optimal. Thus the optimal solution to the changed problem is (x∗1 , x∗2 , x∗3 , x∗6 ) =
(16, 0, 0, 10), and the optimal value is −22.
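Pricing out a candidate column is just a couple of matrix-vector products. The following numpy sketch reproduces the computation of z6 − c6 and y6 above; the library choice and variable names are illustrative only.

```python
# Pricing out the new activity x6 of Example 9.3 with numpy.
import numpy as np

B_inv = np.array([[1.0, 0.0],
                  [1.0, 1.0]])
c_B = np.array([-2.0, 0.0])              # basic variables x1, x5
a6, c6 = np.array([-1.0, 2.0]), 1.0

z6_minus_c6 = c_B @ (B_inv @ a6) - c6    # reduced cost of the candidate column
y6 = B_inv @ a6                          # its updated column in the tableau
print(z6_minus_c6, y6)
# Expected: 1 > 0 and y6 = (-1, 1), so x6 is worth introducing into the basis.
```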

9.4 Adding a new constraint

Suppose that a new constraint is added to the problem.

• If the optimal solution to the original problem satisfies the added constraint, it is
then obvious that the point is also an optimal solution to the new problem.

• If the optimal solution to the original problem does not satisfy the new constraint,
i.e., if the constraint cuts away the optimal point, we can use the dual simplex method
to find the new optimal solution.

The details are as follows.

Suppose that B is the optimal basis before the constraint

am+1 x ≤ bm+1

is added. The corresponding system is given by the following system

f + (cB B −1 N − cN )xN = cB B −1 b,

xB + B −1 N xN = B −1 b.

Notice that the new constraint can be written as

am+1,B xB + am+1,N xN + xn+1 = bm+1 .

Here the row vector am+1 is decomposed into (am+1,B , am+1,N ), and xn+1 is a nonnegative
slack variable. Multiplying the second equation above by am+1,B and subtracting it from
the new constraint gives


the following system:

f + (cB B −1 N − cN )xN = cB B −1 b,
xB + B −1 N xN = B −1 b,
(am+1,N − am+1,B B −1 N )xN + xn+1 = bm+1 − am+1,B B −1 b.

The basic feasible solution to this system is

xB = B −1 b,
xN = 0,
xn+1 = bm+1 − am+1,B B −1 b.

The only possible violation of optimality in the new problem is the sign of bm+1 − am+1,B B −1 b.
If

bm+1 − am+1,B B −1 b ≥ 0,

then the current solution is optimal. Otherwise, if

bm+1 − am+1,B B −1 b < 0,

then the dual simplex method is used to restore feasibility.

Example 9.4. Consider the problem

Minimise −2x1 + x2 − x3

s.t. x1 + x2 + x3 ≤ 6,

−x1 + 2x2 ≤ 4,

x1 , x2 , x3 ≥ 0.

The optimal tableau is given by

x1 x2 x3 x4 x5 RHS
f 0 −3 −1 −2 0 −12
x1 1 1 1 1 0 6
x5 0 3 1 1 1 10


Consider the problem with the added constraint

−x1 + 2x3 ≥ 2.

Clearly the optimal point (x1 , x2 , x3 ) = (6, 0, 0) does not satisfy this constraint. The
constraint −x1 + 2x3 ≥ 2 is rewritten as

x1 − 2x3 + x6 = −2,

where x6 is a nonnegative slack variable. This row is added to the optimal tableau of the
above problem to obtain the following tableau.

x1 x2 x3 x4 x5 x6 RHS
f 0 −3 −1 −2 0 0 −12
x1 1 1 1 1 0 0 6
x5 0 3 1 1 1 0 10
x6 1 0 −2 0 0 1 −2

Multiply row 1 by −1 and add to row 3 in order to restore column x1 to a unit vector.

x1 x2 x3 x4 x5 x6 RHS
f 0 −3 −1 −2 0 0 −12
x1 1 1 1 1 0 0 6
x5 0 3 1 1 1 0 10
x6 0 −1 −3∗ −1 0 1 −8

The dual simplex method can then be applied to the above tableau. By the minimum ratio test
of the dual simplex method, we see that x3 enters the basis to replace x6 . Thus, we get the
following tableau.

x1 x2 x3 x4 x5 x6 RHS
f 0 −8/3 0 −5/3 0 −1/3 −28/3
x1 1 2/3 0 2/3 0 1/3 10/3
x5 0 8/3 0 2/3 1 1/3 22/3
x3 0 1/3 1 1/3 0 −1/3 8/3


So the optimal solution of the changed problem is

(x∗1 , x∗2 , x∗3 ) = (10/3, 0, 8/3), with optimal value −28/3.

In summary, when a constraint is added, restore the simplex tableau first, and then
continue the iterations (using the dual simplex method, as sketched below) until the new
optimal solution is found.
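As a sanity check on Example 9.4, one can also re-solve the augmented problem directly with an LP solver; the sketch below uses scipy (an illustrative choice, not part of the notes) and simply rewrites the new ≥ constraint as a ≤ constraint.

```python
# Cross-check of Example 9.4: re-solve the problem with the extra constraint
# -x1 + 2*x3 >= 2 (written as x1 - 2*x3 <= -2) instead of repairing the tableau by hand.
import numpy as np
from scipy.optimize import linprog

c = np.array([-2.0, 1.0, -1.0])
A_ub = np.array([[ 1.0, 1.0,  1.0],
                 [-1.0, 2.0,  0.0],
                 [ 1.0, 0.0, -2.0]])
b_ub = np.array([6.0, 4.0, -2.0])

res = linprog(c, A_ub=A_ub, b_ub=b_ub, method="highs")
print(res.x, res.fun)   # expected: (10/3, 0, 8/3) with optimal value -28/3
```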

Chapter 10

The Big-M Method and the Two-Phase Method

The simplex method assumes that an initial basic feasible solution (BFS) is at hand.
In many cases, such a BFS is not readily available, and some extra work may be needed
to get the simplex method started.

Example 10.1. Consider the following two groups of constraints

1.

x1 + 2x2 ≤ 4,

−x1 + x2 ≤ 1,

x1 , x2 ≥ 0.

2.

x1 + x2 + x3 ≤ 6,

−2x1 + 3x2 + 2x3 ≥ 3,

x1 , x2 , x3 ≥ 0.

For the first group of constraints, an initial basis can be found without any difficulty.
However, for the second group this is not the case, and some extra work is needed to obtain
an initial basis.

In this chapter we will study two procedures that can be used to construct an initial basis:


• The big-M method.

• The two-phase method.

Both involve artificial variables to obtain an initial basic feasible solution. The "Big M"
refers to a large number associated with the artificial variables, represented by the letter
M. As suggested by its name, the two-phase method consists of two phases. Phase I is
used to find a basic feasible solution for the original constraints, and in Phase II the
original linear programming problem is solved using the simplex method.

At the end of this chapter, students should be able to apply the big-M method and
the two-phase method to solve some LPs.

10.1 The Big-M method

Observation:

In general, any LP can be transformed into a problem of the following form:

Minimise cx

subject to Ax = b

x ≥ 0,

where b ≥ 0. (Can you prove this statement?)

When we use the simplex method, we also assume that we have this form.

The big-M approach is to get rid of the artificial variables by assigning them coefficients
in the original objective function in such a way as to make their presence in the basis at a
positive level very unattractive from the objective function point of view.
Consider the following problem:

Minimize cx + M (eT xa )

subject to Ax + xa = b,

x, xa ≥ 0.

where M is a very large positive number.

The term M eT xa can be interpreted as a penalty to be paid by any solution with
xa ≠ 0.


Even though the starting solution x = 0, xa = b is feasible to the above problem, it


has a very unattractive objective value.

Therefore, the simplex method will try to get the artificial variables out of the basis,
and then continue to find an optimal solution of the original problem.

Analysis of the Big-M method.

Two cases may arise after solving the Big-M problem by the simplex method:

• The simplex method arrives at an optimal solution of the Big-M problem.

• The Big-M problem has an unbounded objective value.

Relationship between the Big-M problem and the Original LP problem

Case A: The Big-M problem has a finite optimal solution. There are two subcases:

• (x, xa ) = (x∗ , 0) is an optimal solution of the Big-M problem. Then x∗ is the optimal
solution to the original problem.

Proof. In fact, if x is an arbitrary feasible point of the original LP, then (x, 0) is a
feasible solution to the Big-M problem. Since (x∗ , 0) is the optimal solution to the
Big-M problem, we have
cT x∗ + 0 ≤ cT x + 0,

i.e.,
cT x ∗ ≤ cT x

which implies that x∗ is indeed an optimal solution of the original LP problem.

• (x∗ , x∗a ) is an optimal solution to the Big-M problem and x∗a ≠ 0.

In this case, the original LP has no feasible point, i.e., the problem is infeasible.
(Why?)

Case B: The Big-M problem has an unbounded optimal value.

• If zk − ck = max(zj − cj ) > 0 and yk ≤ 0, and all the artificial variables are equal to
zero, then the original LP is unbounded. (Why?)


• If zk − ck = max(zj − cj ) > 0 and yk ≤ 0, and not all the artificial variables are equal
to zero, then the original LP is infeasible. (Why?)

Example 10.2. Solve the LP by Big-M method

Minimise x1 − 2x2

subject to x1 + x2 ≥ 2,

x2 ≤ 3,

x1 , x2 ≥ 0.

Adding a surplus variable x3 , a slack variable x4 and an artificial variable x5 , the Big-M LP
can be written as

Minimise x1 − 2x2 + M x5

subject to x1 + x2 − x3 + x5 = 2,

x2 + x4 = 3,

x1 , x2 , x3 , x4 , x5 ≥ 0.

The initial basic variables are x5 and x4 ,


   
B = [a5 a4 ] = [ 1 0 ; 0 1 ], N = [a1 a2 a3 ] = [ 1 1 −1 ; 0 1 0 ],

cB = (c5 c4 ) = (M, 0), cN = (c1 c2 c3 ) = (1, −2, 0).

Note that in the simplex method, we work with tableaux (including the initial one) of the
form

xB xN RHS
f 0 cB B −1 N − cN cB B −1 b
xB I B −1 N B −1 b

where the reduced costs zj − cj are zero for all basic variables.

There are two ways to find the initial tableau. The first approach is to compute all


the quantities appearing in the tableau above.


 
cB B −1 N − cN = (M, 0) [ 1 1 −1 ; 0 1 0 ] − (1, −2, 0) = (M − 1, M + 2, −M),

cB B −1 b = (M, 0) · (2, 3)T = 2M, B −1 N = N = [ 1 1 −1 ; 0 1 0 ], B −1 b = b = (2, 3)T .

Thus we obtain the following initial tableau for the simplex method

x1 x2 x3 x4 x5 RHS
f M − 1 M+2 −M 0 0 2M
x5 1 1∗ −1 0 1 2
x4 0 1 0 1 0 3

The second approach is to simply start with the tableau

xB xN RHS
f −cB −cN 0
xB I B −1 N B −1 b

which encodes the formulation (8.1)-(8.2) in Section 8.2. Then we do elementary row
operations to reduce zj − cj for all basic variables to zero. In this example, we can start
with the following tableau

x1 x2 x3 x4 x5 RHS
f −1 2 0 0 −M 0
x5 1 1 −1 0 1 2
x4 0 1 0 1 0 3

Then we reduce zj − cj for all basic variables, x5 and x4 , to zero by elementary row
operations. Replacing the f -row by itself plus M times the x5 -row leads to the following
tableau:

x1 x2 x3 x4 x5 RHS
f M − 1 M+2 −M 0 0 2M
x5 1 1∗ −1 0 1 2
x4 0 1 0 1 0 3


This is exactly the initial tableau for the simplex method obtained in the first approach.
x2 enters and x5 leaves:

x1 x2 x3 x4 x5 RHS
f −3 0 2 0 −(M + 2) −4
x2 1 1 −1 0 1 2
x4 −1 0 1∗ 1 −1 1

x3 enters and x4 leaves:

x1 x2 x3 x4 x5 RHS
f −1 0 0 −2 −M −6
x2 0 1 0 1 0 3
x3 −1 0 1 1 −1 1

So the optimal solution of the original problem is

x∗ = (x∗1 , x∗2 ) = (0, 3)

and the optimal value is


f ∗ = −6.
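In floating-point software one typically replaces the symbolic M by a large finite number. The sketch below assembles the Big-M problem of Example 10.2 in equality form and hands it to scipy; the value M = 1e5 and the solver are illustrative choices, not anything prescribed by the notes.

```python
# Numerical sketch of the Big-M construction in Example 10.2 (M is a finite "large" number here).
import numpy as np
from scipy.optimize import linprog

M = 1e5
c = np.array([1.0, -2.0, 0.0, 0.0, M])            # x1, x2, x3 (surplus), x4 (slack), x5 (artificial)
A_eq = np.array([[1.0, 1.0, -1.0, 0.0, 1.0],
                 [0.0, 1.0,  0.0, 1.0, 0.0]])
b_eq = np.array([2.0, 3.0])

res = linprog(c, A_eq=A_eq, b_eq=b_eq, method="highs")
print(res.x, res.fun)
# Expected: x = (0, 3, 1, 0, 0) with value -6; the artificial variable x5 is driven to zero.
```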

Example 10.3. Solve the LP by using Big-M method

Minimise −x1 − 3x2 + x3

subject to x1 + x2 + 2x3 ≤ 4,

−x1 + x3 ≥ 4,

x3 ≥ 3,

x1 , x2 , x3 ≥ 0.

Example 10.4. Solve the LP by using Big-M method


Minimize −x1 − x2

subject to x1 − x2 − x3 = 1,

−x1 + x2 + 2x3 − x4 = 1,

x1 , x2 , x3 , x4 ≥ 0.
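Examples 10.3 and 10.4 are left as exercises; they are designed to exercise the two unpleasant outcomes of the Big-M analysis. From the constraints, Example 10.3 appears infeasible (x3 ≥ 3 is incompatible with x1 + x2 + 2x3 ≤ 4), and Example 10.4 appears to have an unbounded objective. The hedged scipy sketch below (status 2 = infeasible, status 3 = unbounded in scipy's convention) can be used to check your Big-M computations; it is a cross-check, not part of the method.

```python
# Sanity check of Examples 10.3 and 10.4 with a direct solver call.
import numpy as np
from scipy.optimize import linprog

# Example 10.3: x3 >= 3 conflicts with x1 + x2 + 2*x3 <= 4, so infeasibility is expected.
res1 = linprog(c=[-1, -3, 1],
               A_ub=[[1, 1, 2], [1, 0, -1], [0, 0, -1]],   # >= constraints written as <=
               b_ub=[4, -4, -3], method="highs")
print(res1.status, res1.message)

# Example 10.4: the objective can be pushed to -infinity along a feasible ray,
# so an "unbounded" status is expected.
res2 = linprog(c=[-1, -1, 0, 0],
               A_eq=[[1, -1, -1, 0], [-1, 1, 2, -1]],
               b_eq=[1, 1], method="highs")
print(res2.status, res2.message)
```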

10.2 The Two-phase Method

Another approach to finding an initial basic feasible solution is to solve a Phase I problem,
which produces a basic feasible solution for the original constraints; we can then initiate
Phase II, in which the simplex method is applied to solve the original linear programming
problem. The combination of Phases I and II gives rise to the two-phase simplex method.

Phase I. Solve the following linear programming problem starting with the basic
feasible solution x = 0 and xa = b:

Minimise eT xa

subject to Ax + xa = b,

x, xa ≥ 0.

• If at optimality xa ≠ 0, then stop. The original problem has no feasible solution.

• Otherwise let the basic and nonbasic variables be xB and xN .

Phase II. Solve the following linear programming problem starting with the basic
feasible solution xB = B −1 b and xN = 0 :

Minimise cB xB + cN xN

subject to xB + B −1 N xN = B −1 b,

xB , xN ≥ 0.

The purpose of Phase I is to get us to an extreme point of the feasible
region, while Phase II takes us from this feasible point to an optimal point.


Remark: For the system Ax = b and x ≥ 0, we change the restrictions by adding an


artificial vector xa leading to the system

Ax + xa = b, x ≥ 0, xa ≥ 0.

This gives an immediate basic feasible solution of the new system, namely xa = b and
x = 0. Even though we now have a starting basic feasible solution and the simplex method
can be applied, we have in fact changed the problem. In order to get back to our original
problem, we must force these artificial variables to zero, because Ax = b if and only if
Ax + xa = b with xa = 0. In other words, artificial variables are only a tool to get the
simplex method started, however, we must guarantee that these variables will eventually
drop to zero.
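Phase I is itself an ordinary LP, so it can be prototyped with any LP solver. The sketch below builds the Phase I problem min eT xa subject to Ax + xa = b for the standard-form data of Example 10.5 below; note that, for simplicity, it attaches an artificial variable to every row, whereas the worked example reuses the slack x5 as a basic variable and only adds x6 and x7. The numpy/scipy usage is an illustrative choice.

```python
# Minimal Phase I sketch: minimise the sum of artificial variables subject to Ax + xa = b.
import numpy as np
from scipy.optimize import linprog

A = np.array([[ 1.0, 1.0, -1.0,  0.0, 0.0],
              [-1.0, 1.0,  0.0, -1.0, 0.0],
              [ 0.0, 1.0,  0.0,  0.0, 1.0]])
b = np.array([2.0, 1.0, 3.0])
m, n = A.shape

c_phase1 = np.concatenate([np.zeros(n), np.ones(m)])   # objective e^T x_a
A_eq = np.hstack([A, np.eye(m)])                       # [A | I] acting on (x, x_a)

res = linprog(c_phase1, A_eq=A_eq, b_eq=b, method="highs")
print(res.fun)    # 0 means a feasible point of the original constraints has been found
print(res.x[:n])  # a basic feasible solution from which Phase II can start
```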

Example 10.5. Solve the following LP by using two-phase method

Minimise x1 − 2x2

subject to x1 + x2 ≥ 2,

−x1 + x2 ≥ 1,

x2 ≤ 3,

x1 , x2 ≥ 0.

By introducing the surplus variables x3 , x4 and the slack variable x5 , the problem becomes

Minimise x1 − 2x2

subject to x1 + x 2 − x3 = 2,

−x1 + x2 − x4 = 1,

x2 + x5 = 3,

x1 , x2 , x3 , x4 , x5 ≥ 0.

Phase I


Adding artificial variables x6 and x7 , we consider the Phase-I problem:

Minimise x6 + x7

subject to x1 + x2 − x3 + x6 =2

−x1 + x2 − x4 + x7 = 1

x2 + x5 =3

x1 , x2 , x3 , x4 , x5 , x6 , x7 ≥ 0.

Let f0 = x6 + x7 . The Phase I minimises the objective f0 .

x1 x2 x3 x4 x5 x6 x7 RHS
f0 0 0 0 0 0 −1 −1 0
x6 1 1 −1 0 0 1 0 2
x7 −1 1 0 −1 0 0 1 1
x5 0 1 0 0 1 0 0 3

First, restore the simplex tableau. Since x5 , x6 , x7 are basic variables, we may add rows 1,
2 to row 0 so that z6 − c6 = z7 − c7 = 0.

x1 x2 x3 x4 x5 x6 x7 RHS
f0 0 2 −1 −1 0 0 0 3
x6 1 1 −1 0 0 1 0 2
x7 −1 1* 0 −1 0 0 1 1
x5 0 1 0 0 1 0 0 3

Since z2 − c2 = 2 > 0, x2 enters the basis; by the minimum ratio test, x7 leaves. After the
pivoting operations, we have the following tableau.

x1 x2 x3 x4 x5 x6 x7 RHS
f0 2 0 −1 1 0 0 −2 1
x6 2* 0 −1 1 0 1 −1 1
x2 −1 1 0 −1 0 0 1 1
x5 1 0 0 1 1 0 −1 2

Replacing x6 by x1 , we have


x1 x2 x3 x4 x5 x6 x7 RHS
f0 0 0 0 0 0 −1 −1 0
x1 1 0 −1/2 1/2 0 1/2 −1/2 1/2
x2 0 1 −1/2 −1/2 0 1/2 1/2 3/2
x5 0 0 1/2 1/2 1 −1/2 −1/2 3/2

which is the optimal tableau of the Phase I. There is no artificial variable in the optimal
basis of the first phase. We have a starting basic feasible solution for proceeding to the
second phase.

Phase II

We remove the artificial variables from the tableau and do not consider them any more.
Replace the first row by the coefficients of the equation f − x1 + 2x2 = 0, since the original
objective is f = x1 − 2x2 .

x1 x2 x3 x4 x5 RHS
f −1 2 0 0 0 0
x1 1 0 −1/2 1/2 0 1/2
x2 0 1 −1/2 −1/2 0 3/2
x5 0 0 1/2 1/2 1 3/2

Restoring the simplex tableau gives rise to

x1 x2 x3 x4 x5 RHS
f 0 0 1/2 3/2 0 −5/2
x1 1 0 −1/2 (1/2)* 0 1/2
x2 0 1 −1/2 −1/2 0 3/2
x5 0 0 1/2 1/2 1 3/2

Since z4 − c4 = 3/2 > 0, x4 is the entering variable. By the minimum ratio test, x1 is the
leaving variable.


x1 x2 x3 x4 x5 RHS
f −3 0 2 0 0 −4
x4 2 0 −1 1 0 1
x2 1 1 −1 0 0 2
x5 −1 0 1* 0 1 1

It is evident that x3 is the entering variable and x5 is the leaving variable.

x1 x2 x3 x4 x5 RHS
f −1 0 0 0 −2 −6
x4 1 0 0 1 1 2
x2 0 1 0 0 1 3
x3 −1 0 1 0 1 1

The optimal solution to the original LP problem is given by

x∗ = (x∗1 , x∗2 ) = (0, 3), with optimal value f ∗ = −6.

Note that phase I moves from the infeasible point (0,0) to the infeasible point (0,1),
and finally to the feasible point (1/2, 3/2). From this extreme point, phase II moves to
the feasible point (0,2) and finally to the optimal point (0, 3).

Analysis of the Two-Phase Method:

At the end of Phase I, there are two possible cases:

Case A: xa ≠ 0. In this case, the original problem is infeasible.

Proof. In fact, if there is an x ≥ 0 with Ax = b, then (x, 0) is a feasible solution of the
Phase I problem, and the objective value at this feasible solution is 0, contradicting the
optimality of the Phase I solution, at which the objective is positive.

Case B: xa = 0. There are two subcases:

• If all artificial variables are out of the basis, remove the artificial columns from the
final tableau; a basic feasible solution is then at hand, and we proceed to Phase II.


• If some artificial variables remain in the basis at zero level, we may eliminate the
columns corresponding to the nonbasic artificial variables of Phase I and then proceed
directly to Phase II, where the remaining artificial variables can be driven out of the
basis (or kept at zero level) without increasing the original objective value.

Example 10.6. Solve the following LP problem:

Minimise 2x1 + 3x2 + 4x3

s.t. x1 + 2x2 + x3 ≥ 3,

2x1 − x2 + 3x3 ≥ 4,

x1 , x2 , x3 ≥ 0.

First, we introduce surplus variables x4 and x5 to write the problem in standard form

Minimise 2x1 + 3x2 + 4x3

s.t. x1 + 2x2 + x3 − x4 = 3,

2x1 − x2 + 3x3 − x5 = 4,

x1 , ..., x5 ≥ 0.

Phase I: now we introduce artificial variables x6 and x7 and consider the following phase-I
problem

Minimise x6 + x7

s.t. x1 + 2x2 + x3 − x4 + x6 = 3,

2x1 − x2 + 3x3 − x5 + x7 = 4,

x1 , ..., x7 ≥ 0.

x1 x2 x3 x4 x5 x6 x7 RHS
f0 0 0 0 0 0 −1 −1 0
x6 1 2 1 -1 0 1 0 3
x7 2 −1 3 0 −1 0 1 4

Restore the initial simplex tableau.

x1 x2 x3 x4 x5 x6 x7 RHS
f0 3 1 4 -1 -1 0 0 7
x6 1 2 1 -1 0 1 0 3
x7 2 −1 3∗ 0 −1 0 1 4


x3 enters, x7 leaves

x1 x2 x3 x4 x5 x6 x7 RHS
f0 1/3 7/3 0 -1 1/3 0 -4/3 5/3
x6 1/3 7/3 0 -1 1/3 1 -1/3 5/3
x3 2/3 -1/3 1 0 -1/3 0 1/3 4/3

x2 enters, x6 leaves.

x1 x2 x3 x4 x5 x6 x7 RHS
f0 0 0 0 0 0 -1 -1 0
x2 1/7 1 0 -3/7 1/7 3/7 -1/7 5/7
x3 5/7 0 1 -1/7 -2/7 1/7 2/7 11/7

This is an optimal solution to the Phase I problem. The artificial variables x6 and x7 are
not basic variables at optimality. Thus we remove the columns x6 and x7 and move to
Phase II.

Phase II: we insert the original cost row to obtain the tableau:

x1 x2 x3 x4 x5 RHS
f -2 -3 -4 0 0 0
x2 1/7 1 0 -3/7 1/7 5/7
x3 5/7 0 1 -1/7 -2/7 11/7

We restore the simplex tableau:

x1 x2 x3 x4 x5 RHS
f 9/7 0 0 -13/7 -5/7 59/7
x2 1/7 1 0 -3/7 1/7 5/7
x3 5/7∗ 0 1 -1/7 -2/7 11/7

x1 enters, x3 leaves.

x1 x2 x3 x4 x5 RHS
f 0 0 −9/5 −8/5 −1/5 28/5
x2 0 1 −1/5 −2/5 1/5 2/5
x1 1 0 7/5 −1/5 −2/5 11/5


Thus the optimal solution is


x∗ = (11/5, 2/5, 0),

and optimal value


f ∗ = 28/5.
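A one-call cross-check of Example 10.6 with scipy (illustrative only; the two-phase computation above is the method being taught):

```python
# Cross-check of Example 10.6 with a direct solver call.
from scipy.optimize import linprog

res = linprog(c=[2, 3, 4],
              A_ub=[[-1, -2, -1], [-2, 1, -3]],   # >= constraints written as <=
              b_ub=[-3, -4], method="highs")
print(res.x, res.fun)   # expected: (11/5, 2/5, 0) with optimal value 28/5
```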

Chapter 11

The Dual Simplex Method

In this chapter, we study another extension/variant of the simplex method, the dual
simplex method. This method is useful and necessary in, for instance, the following
situations.

In what follows, c is still understood as a row vector.

• In certain instances, it is difficult to find a starting basic feasible solution to a linear


program without adding artificial variables.

• However, in these same instances, it is often possible to find a starting basis which
is not necessarily primal feasible, but which is dual feasible (i.e., all zj − cj ≤ 0
for a minimization problem).

• The dual simplex method is a variant of the simplex method that produces a sequence
of simplex tableaux which maintain dual feasibility and complementary slackness
while striving toward primal feasibility.

• The dual simplex method solves the dual problem directly on the (primal) simplex
tableau. At each iteration we move from a basic feasible solution of the dual problem
to an improved basic feasible solution until optimality of the dual (and also the
primal) is reached, or else until we conclude that the dual is unbounded and that
the primal is infeasible.


11.1 Dual information from the (primal) optimal simplex tableau

Consider the following canonical form:

Minimise cx

subject to Ax ≥ b

x ≥ 0.

The slack variables are xn+1 , ..., xn+m . Let’s assume that we have the following optimal
simplex tableau:

f x1 x2 ··· xn xn+1 ··· xn+m RHS


f 1 z1 − c1 z2 − c2 ··· zn − cn zn+1 − cn+1 ··· zn+m − cn+m cB b̄
xB1 0 y11 y12 ··· y1n y1,n+1 ··· y1,n+m b̄1
xB2 0 y21 y22 ··· y2n y2,n+1 ··· y2,n+m b̄2
.. .. .. .. .. .. .. .. .. ..
. . . . . . . . . .
xBm 0 ym1 ym2 ··· ymn ym,n+1 ··· ym,n+m b̄m

From this optimal tableau, we may also get the optimal solution to the dual problem.
In fact, define
w = cB B −1 .

Then we have
zj − cj = cB B −1 aj − cj = waj − cj , j = 1, ..., n.

At the optimal tableau we have

zj − cj ≤ 0, j = 1, ..., n,

which implies that

waj − cj ≤ 0, j = 1, ..., n,

i.e., wA ≤ c.

Note that
an+i = −ei , cn+i = 0, i = 1, ..., m


(where ei is the ith column of the identity matrix) and so we have

zn+i − cn+i = wan+i − cn+i = w(−ei ) − 0 = −wi , i = 1, ..., m

It follows from zn+i − cn+i ≤ 0 that w ≥ 0. Therefore, w is feasible to the dual problem.
Denoting

w∗ = cB B −1 ,

the dual objective value is

w∗ b = cB B −1 b = cB b̄,

i.e., the primal and dual objective values are equal. By weak duality, we conclude that w∗ is
dual optimal.

Theorem 11.1. At optimality of the primal minimization problem in the canonical form,

w∗ = cB B −1

is an optimal solution to the dual problem. Furthermore,

wi∗ = −(zn+i − cn+i ) = −zn+i

for i = 1, ..., m.

Extracting primal and dual solutions from optimal simplex forms

When the optimal solution is attained, we may get both the primal and dual optimal
solutions from the final tableau of the simplex method by using the following correspondence:

1. Slack variables of the primal problem correspond to the dual variables.

2. The variables of the primal problem correspond to the dual slack variables.

Note that a symmetric analysis holds for the equality constrained case.

Theorem 11.2. For an optimal tableau with basis B, taking

w∗ = cB B −1 ,

the primal optimality conditions zj − cj ≤ 0 (j = 1, ..., n) imply that w∗ aj − cj ≤ 0 for


j = 1, ..., n. Thus w∗ is optimal for the dual problem.
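As a concrete illustration of Theorem 11.1, consider the final tableau obtained for the problem of Example 10.6 (Minimise 2x1 + 3x2 + 4x3 subject to x1 + 2x2 + x3 ≥ 3, 2x1 − x2 + 3x3 ≥ 4, x ≥ 0). The surplus columns x4, x5 carry the reduced costs −8/5 and −1/5, so w∗ = (8/5, 1/5). The numpy sketch below (an illustration, not part of the notes) verifies dual feasibility and strong duality numerically.

```python
# Reading the dual optimum off the final tableau of Example 10.6 (the same LP is solved
# again by the dual simplex method in Example 11.3): w* = -(z_{n+i} - c_{n+i}) = (8/5, 1/5).
import numpy as np

A = np.array([[1.0,  2.0, 1.0],
              [2.0, -1.0, 3.0]])
b = np.array([3.0, 4.0])
c = np.array([2.0, 3.0, 4.0])

w_star = np.array([8/5, 1/5])
print(w_star @ A <= c + 1e-12)   # dual feasibility  w A <= c
print(w_star >= 0)               # w >= 0
print(w_star @ b)                # 28/5, equal to the primal optimal value (strong duality)
```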


11.2 The Dual Simplex Method

Consider the linear program:

min{cx : Ax = b, x ≥ 0},

and suppose that we have a dual feasible point, i.e., the initial tableau of the simplex method
is given as follows,

f x1 ··· xj ··· xk ··· xn RHS


f 1 z1 − c1 ··· zj − cj ··· zk − ck ··· zn − cn cB b̄
xB1 0 y11 ··· y1j ··· y1k ··· y1n b̄1
xB2 0 y21 ··· y2j ··· y2k ··· y2n b̄2
.. .. .. .. .. .. .. .. .. ..
. . . . . . . . . .
xB r 0 yr1 ··· yrj ··· yrk ··· yrn b̄r
.. .. .. .. .. .. .. .. .. ..
. . . . . . . . . .
xBm 0 ym1 ··· ymj ··· ymk ··· ymn b̄m

where zj − cj ≤ 0, j = 1, ..., n.

We have several cases:

Case 1. If b̄ ≥ 0, then the current point is already optimal to the linear program.

Case 2. If b̄r < 0 for some r, and the rth row vector is nonnegative, i.e., yrj ≥ 0,
for all j = 1, ..., n, then the dual is unbounded (and hence the primal problem is infeasible).

Case 3. Suppose b̄r < 0 for some r and there exists at least one element yrj < 0 in the rth
row. In this case, let xBr leave the basis, and use the following minimum ratio test to
determine the entering variable:

(zk − ck )/yrk = min{ (zj − cj )/yrj : yrj < 0 }.

After pivoting at yrk , it is easy to see that the new reduced costs satisfy

(zj − cj )′ = (zj − cj ) − (yrj /yrk )(zk − ck ) ≤ 0,


and hence the dual feasibility is kept. Furthermore, we may show that the dual objective
is improved, in fact, the new objective value is

cB B −1 b − (zk − ck )b̄r /yrk ≥ cB B −1 b,

which follows from the fact

zk − ck ≤ 0, b̄r < 0, yrk < 0.

Thus, after each pivot, the dual simplex method moves from one dual basic feasible
point to another with an improved objective value. Under a non-degeneracy assumption,
the algorithm terminates after finitely many iterations and finds the primal and dual optimal
solutions.

Summary of the dual simplex method (minimization problem)

Initialization Step: Find a basis B of the primal such that zj − cj ≤ 0 for all j.

Main Step:

Step 1. If b̄ = B −1 b ≥ 0, stop, the current solution is optimal. Otherwise, select the


pivot row r with b̄r < 0, say, b̄r = min{b̄i }.

Step 2. If yrj ≥ 0 for all j, stop, the dual is unbounded (and hence the primal is
infeasible). Otherwise, select the pivot column k by the following minimum ratio test:
 
(zk − ck )/yrk = min{ (zj − cj )/yrj : yrj < 0 }.

Step 3. Pivot at yrk and return to Step 1.
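The summary above translates almost line by line into code. The following Python sketch is a minimal, illustrative implementation of the dual simplex iteration on a dense tableau (the 1e-9 tolerances and the data layout are my own choices, and no anti-cycling rule is included); it reproduces the iterations of Example 11.3 below.

```python
# Minimal dual simplex sketch on a tableau stored as a numpy array.
# Row 0 holds (z_j - c_j, c_B b_bar); rows 1..m hold (y_ij, b_bar_i); basis[i] records
# which variable is basic in row i+1.  This mirrors Steps 1-3 of the summary above.
import numpy as np

def dual_simplex(T, basis):
    T = T.astype(float)
    while True:
        rhs = T[1:, -1]
        if np.all(rhs >= -1e-9):                      # Step 1: primal feasible -> optimal
            return T, basis
        r = 1 + int(np.argmin(rhs))                   # leaving row (most negative b_bar)
        row = T[r, :-1]
        if np.all(row >= -1e-9):                      # Step 2: dual unbounded, primal infeasible
            raise ValueError("primal problem is infeasible")
        neg = np.where(row < -1e-9)[0]
        k = neg[np.argmin(T[0, neg] / row[neg])]      # minimum ratio (z_j - c_j)/y_rj over y_rj < 0
        T[r] /= T[r, k]                               # Step 3: pivot at y_rk
        for i in range(T.shape[0]):
            if i != r:
                T[i] -= T[i, k] * T[r]
        basis[r - 1] = k

# Example 11.3 below starts from the tableau with basis (x4, x5):
T0 = np.array([[-2., -3., -4., 0., 0.,  0.],
               [-1., -2., -1., 1., 0., -3.],
               [-2.,  1., -3., 0., 1., -4.]])
T, basis = dual_simplex(T0, [3, 4])
print(T[0, -1], basis)   # expected optimal value 28/5 with basis {x2, x1}
```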


Example 11.3. Solve the following LP problem:

Minimise 2x1 + 3x2 + 4x3

s.t. x1 + 2x2 + x3 ≥ 3,

2x1 − x2 + 3x3 ≥ 4,

x1 , x2 , x3 ≥ 0.

Multiplying both sides of the inequalities by −1 and adding slack variables x4 and x5 , we
may form the first tableau as follows:

x1 x2 x3 x4 x5 RHS
f −2 −3 −4 0 0 0
x4 −1 −2 −1 1 0 −3
x5 −2∗ 1 −3 0 1 −4
Since b̄2 = −4 < 0, x5 leaves the basis. Which variable enters the basis? By the minimum
ratio test,

min{ (z1 − c1 )/y21 , (z3 − c3 )/y23 } = min{ −2/−2, −4/−3 } = min{1, 4/3} = 1,
which indicates that x1 enters the basis. So after pivoting operations, we get the next
tableau.

x1 x2 x3 x4 x5 RHS
f 0 −4 −1 0 −1 4
x4 0 −(5/2)∗ 1/2 1 −1/2 −1
x1 1 −1/2 3/2 0 −1/2 2

Since b̄1 = −1 < 0, x4 must leave the basis. By the minimum ratio test, x2 enters the
basis. After pivoting, we get the optimal tableau

x1 x2 x3 x4 x5 RHS
f 0 0 −9/5 −8/5 −1/5 28/5
x2 0 1 −1/5 −2/5 1/5 2/5
x1 1 0 7/5 −1/5 −2/5 11/5
Thus the optimal solution is
x∗ = (11/5, 2/5, 0),

and optimal value


f ∗ = 28/5.

Chapter 12

Practice exercises

This chapter includes some additional practice exercises.

12.1 Exercises for Chapter 1

Q 1. Consider the following functions

(a) f : R2 → R, f (x) = (x1^3 + x2^3)^(1/3) for x = (x1 , x2 ) ∈ R2 .

(b) f : R3 → R, f (x) = (x1^5 + x2^5 − 3x1^2 x2 x3^2 + x1^2 x2^2 x3 + 3x1^4 x2 + x3^5)^(1/5) for x = (x1 , x2 , x3 ) ∈ R3 .

(c) f : R3×3 → R, f (A) = (det(A))^(1/3) for A ∈ R3×3 .

(d) f : R3×3 → R, f (A) = tr(A) + (det(A))^(1/3) for A ∈ R3×3 .

(e) f : Pn (R) → R, f (p) = (a0 a1 · · · a5 )^(1/6) for p = a0 + a1 x + . . . + an x^n ,

where Pn is the vector space of all real polynomials of degree ≤ n.

For each of the above functions, answer the following questions (below U denotes the corre-
sponding vector space in each case):

1. Show that f (0) = 0 and f (−x) = −f (x) for all x ∈ U .


2. Show that f (ax) = af (x) for all a ∈ R and x ∈ U .

3. Is f a linear map?

Q 2. Consider the following functions (note that in each case, C is a vector space over
itself)
(a)
f : C → C, f (z) = z̄ for z ∈ C.

(b)
f : C → C, f (z) = Re(z) for z ∈ C.

For each of the above functions, answer the following questions:

1. Show that f (z1 + z2 ) = f (z1 ) + f (z2 ) for all z1 , z2 ∈ C.

2. Is f a linear map?

Q 3. Let f : P3 (R) → P2 (R) be a map defined by

f (a0 + a1 x + a2 x2 + a3 x3 ) = (a0 + a2 ) + (a1 + a3 )x + (a0 + 2a1 + 2a2 + a3 )x2 .

1. Show that f is a linear map.

2. Find the matrix representation of f with respect to the bases

B = {1, t, t(t + 1), (t − 1)t(t + 1)} and C = {1, t, t(t + 1)}.

Q 4. Let f : R2×2 → R2×2 be a map defined by


   
f ( [ a b ; c d ] ) = [ a+b  b+2c ; a+b+c  a+2b+3c+d ]

1. Show that f is a linear map.

2. What is the matrix representation of f with respect to the standard basis of R2×2 .

Q 5. Let A, B, C be three different bases of the same finite-dimensional vector space. Let
P be the transition matrix from A to B, and Q the transition matrix from B to C.
What is the transition matrix from A to C? Prove your finding.


Q 6. 1. Show that B = {M1 , M2 , M3 , M4 } is a basis of R2×2 , where

M1 = [ 1 1 ; 0 0 ], M2 = [ −1 1 ; 0 0 ], M3 = [ 0 0 ; 1 0 ], M4 = [ 0 0 ; 0 1 ].

2. Find the coordinate vector of M = [ 1 2 ; 3 4 ] with respect to B.

3. Find the transition matrix from the standard basis of R2×2 to B.

Q 7. Consider the bases B = {u1 , u2 } and B′ = {u′1 , u′2 } of R2 , where

u1 = (1, 0)T , u2 = (1, 1)T , u′1 = (2, 1)T , u′2 = (1, 2)T .

1. Find the transition matrix from B to B′.

2. Find the transition matrix from B′ to B.

3. Compute the coordinate vector [u]B with u = (−3, 2)T , then use part 1 to compute [u]B′ .

4. Compute [u]B′ directly and compare with part 3.


 
Q 8. Consider the matrix A = [ 10 −7 ; 2 1 ]. Which of the following matrices is similar to A?

B = [ 2 7 ; 1 9 ], C = [ 12 3 ; 0 2 ], D = [ 2 −1 ; 6 9 ].
Q 9. Show that the following matrices are not diagonalisable:

[ 2 1 0 ; 0 3 1 ; 0 0 2 ],   [ 1 1 0 ; 0 1 1 ; 0 0 2 ],   [ −1 1 0 ; 0 −1 1 ; 0 0 −1 ].
Q 10. Find all the eigenvalues of the following matrices:

A = [ a+c b ; a b+c ],  B = [ a+d b c ; a b+d c ; a b c+d ],  C = [ a+e b c d ; a b+e c d ; a b c+e d ; a b c d+e ].


 
Q 11. Consider the 2 × 2 matrix A = [ 0.4 0.2 ; 0.6 0.8 ].

1. Find all eigenvalues and eigenvectors of A.

2. Compute A^n where n is a positive integer.

3. What is the limit of A^n as n → +∞?

12.2 Exercises for Chapter 2

Q 12. Let V be an inner product space. Show that for all u, v ∈ V the following identities
hold:

||u + v||^2 + ||u − v||^2 = 2(||u||^2 + ||v||^2),

and

⟨u, v⟩ = (1/4) [ ||u + v||^2 − ||u − v||^2 ].
Q 13. Suppose that {u1 , . . . , un } is a set of orthonormal vectors in an inner product space
V . Find ||u1 + . . . + un ||.

Q 14. Using the Cauchy-Schwarz inequality show that if a1 , . . . , an are positive real num-
bers then
(a1 + . . . + an )(1/a1 + . . . + 1/an ) ≥ n^2 .
Q 15. Let V be an inner product space. Let u, u1 , . . . , un ∈ V . Show that if u is orthogonal
to u1 , . . . , un then it is orthogonal to the subspace spanned by u1 , . . . , un .

Q 16. Let P2 be the vector space of real polynomials of degree ≤ 2. In P2 we define an
inner product

⟨p, q⟩ = ∫_0^1 p(x) q(x) dx.

Apply the Gram-Schmidt process to transform the standard basis {1, x, x^2} into an orthonormal
basis.


12.3 Exercises for Chapters 3 & 4

Q 17. Is the following problem a linear programming problem? If not, formulate it into
a linear programming problem.

Minimize |x1 | + |x2 |

subject to −x1 + x2 ≤ 12,

x1 + x2 ≤ 1,

x1 , x2 ≥ 0.

Q 18. Is the following problem a linear programming problem? If not, formulate it into
a linear programming problem.

Minimize (c1 x + c2 y + c3 )/(c4 x + c5 y + d)

subject to a1 x + a2 y ≤ b1 ,

a3 x + a4 y ≥ b2 ,

c4 x + c5 y + d > 0.

Q 19. Using the graphical method, solve the following LP problem

Maximize 2x1 + x2

subject to −2x1 + 3x2 ≤ 12,

x1 + x2 ≤ 5,

3x1 − 2x2 ≤ 6,

x1 , x2 ≥ 0.

Q 20. Show that the following LP problem has infinitely many optimal solutions

Minimize 2x1 + x2

subject to x2 ≥ 3,

x1 ≤ 2,

x1 , x2 ≥ 0.

What happens if we replace Minimize by Maximize?


Q 21. Solve the following LP problem

Minimize 2x1 + x2

subject to x2 ≥ 3,

x1 ≤ 2,

x1 + 2x2 ≤ 4,

x1 , x2 ≥ 0.

12.4 Exercises for Chapter 5

Q 22. Is a union of convex sets convex?

Q 23. Show that the Cartesian product of two convex sets is also a convex set.

Q 24. Let f : Rn → R be a convex function. Show that the sublevel sets {x : f (x) ≤ a}
and {x : f (x) < a} with a ∈ R are convex sets.

Q 25. Consider the set S = {(x, y) ∈ R2 : −x + 2y ≤ 2, x − 3y ≤ 3, x ≥ 0, y ≥ 0}.


Identify all extreme points and extreme directions of S. Represent the point (4, 2) as a
convex combination of the extreme points plus a nonnegative combination of the extreme
directions.

Q 26. Consider the following domain

A = {(x, y) ∈ R2 : −2x + y ≤ 4, x + y ≤ 5, 3x − 2y ≤ 6, x ≥ 0, y ≥ 0}.

1. Find the maximum of the function 2x + 5y where (x, y) ∈ A.

2. Find all the extreme points of A.

3. Compute the values of the function 2x + 5y at the extreme points of A, and compare
the maximum one with the result in part 1.

Q 27. Consider the following domain

A = {(x, y) ∈ R2 : −x + y ≤ 4, 2x + y ≤ 5, x + 2y ≥ 2, 3x − 2y ≤ 6, x ≥ 0, y ≥ 0}.

1. Find the minimum of the function x − 5y where (x, y) ∈ A.


2. Find all the extreme points of A.

3. Compute the values of the function x − 5y at the extreme points of A, and compare
the minimum one with the result in part 1.

12.5 Exercises for Chapter 6

Q 28. Find the dual problems for the following linear programming problems

(a) Minimize 2x1 + x2 − x3

subject to −2x1 + 3x2 + x3 ≤ 12,

x1 + x2 − 3x3 ≤ 5,

3x1 − 2x2 + 4x3 ≤ 6,

x1 , x2 , x3 ≥ 0.

(b) Minimize 3x1 + x2 − x3

subject to −x1 − 3x2 + x3 ≥ 10,

x1 + x2 − 2x3 ≤ 5,

3x1 − 2x2 + 4x3 ≥ 6,

x1 , x2 , x3 ≥ 0.

Q 29. Consider the following linear programming problem

Maximize 2x1 + 3x2

subject to −2x1 + 3x2 ≤ 6,

x1 + x2 ≤ 5,

x1 − 2x2 ≤ 4,

x1 , x2 ≥ 0.

1. Solve the above problem using the graphical method.

2. Find the dual problem.

3. Using duality theorems and complementary slackness conditions to solve the dual
problem.


12.6 Exercises for Chapters 7 & 8 & 9

Q 30. Consider the following linear programming problem

Minimize −2x1 − 3x2 + x3

subject to −2x1 + 3x2 − 2x3 ≤ 6,

x1 + 2x2 + x3 ≤ 4,

x1 + x2 + x3 ≤ 5,

x1 , x2 , x3 ≥ 0.

1. Find a basic feasible solution.

2. Solve the problem using the simplex method.

3. In each of the following change of the data, determine if the optimal solution is
affected. If it is affected, find the new optimal solution

(a) The cost coefficient vector is changed to (c′1 , c′2 , c′3 ) = (−2, 4, 1).

(b) The right-hand-side vector is changed to (b′1 , b′2 , b′3 ) = (6, 4, 6).

(c) The right-hand-side vector is changed to (b′1 , b′2 , b′3 ) = (6, 6, 5).

Q 31. Solve the following linear programming problem

Maximize x + 2y

subject to 2x + 3y ≤ 6,

2x + y ≤ 4,

x ≤ 5,

x, y ≥ 0.

To understand how the optimal solution to the above problem is affected when the data is
changed, solve the following problems.

Maximize x + (2 + a)y

subject to 2x + 3y ≤ 6,

2x + y ≤ 4,

x ≤ 5,

x, y ≥ 0.


Maximize x + 2y

subject to 2x + 3y ≤ 6 + a,

2x + y ≤ 4,

x ≤ 5,

x, y ≥ 0.

You are encouraged to create your own problems by changing other data. Note that this
problem has only two variables, so you can solve it and observe the changes geometrically.

12.7 Exercises for Chapters 10 & 11

Q 32. Solve the following LP problems using any of the methods you have studied.

(1) Minimise x1 + x2 + 4x3

s.t. x1 + 2x2 + x3 ≥ 3,

2x1 − 2x2 + 3x3 ≥ 4,

x1 , x2 , x3 ≥ 0.

(2) Minimise x1 + x2 + 4x3

s.t. x1 + 2x2 + x3 ≥ 3,

2x1 − 2x2 + 3x3 ≤ 4,

x1 , x2 , x3 ≥ 0.

You are encouraged to create your own similar problems where the constraints are mixed.

Index

algebraic multiplicity, 34
algorithm for minimization problems, 133
angle between two vectors, 56
basic feasible solution, 119
bilinear form, 49
canonical form, 72
change of basis, 21
change of coordinate systems, 18
characteristic polynomial, 29
coefficient matrix, 70
complementary slackness condition, 112
constraint, 70
constraint matrix, 70
convex combination, 87
coordinate vector, 11
coordinates, 11
cost coefficient, 70
decision variables, 70
diagonal matrices, 23
diagonalisable, 26
diagonalisable matrix, 23
diagonalise, 26
difference equations, 40
differential equations, 44
distance, 56
dual problem, 98
eigenspace, 24
eigenvalue, 24
eigenvector, 24
extreme direction, 92
extreme point, 91
feasible point, 70
feasible region, 70
feasible set, 70
feasible solution, 70
Fibonacci numbers, 41
fundamental theorem of algebra, 32
geometric method, 80
geometric multiplicity, 34
Gram matrix, 51
Gram-Schmidt process, 59
half-space, 87
hyperplane, 86
image, 14
inner product, 54
inner product space, 54
Jordan canonical form, 46
kernel, 14
length of a vector, 56
linear map, 7
linear programming, 67
linear transformation, 7
mathematical optimization, 67
matrix of a linear map, 11
matrix representation, 12
max-flow min-cut theorem, 106
max-flow problem, 105
min-cut problem, 105
Minkowski’s Theorem, 94
network flow, 104
new activity, 161
new constraint, 163
null space, 14
nullity, 14
objective function, 70
optimality condition, 109
orthogonal, 53
orthogonal basis, 60
orthonormal basis, 63
pivoting, 137
polyhedron, 88
power of a matrix, 42
primal problem, 98
range, 14
rank, 14
rank-nullity theorem, 15
right-hand-side vector, 70
sensitivity analysis, 157
similar matrices, 22
standard form, 71
steepest descent, 80
strong duality theorem, 103
symmetric bilinear form, 50
tableau format, 136
the big-M method, 168
the dual simplex method, 181
the simplex method, 125, 133
the two-phase method, 168
transition matrix, 17
weak duality theorem, 101

