Regarded as mathematical entities, the objective and the constraints in a Mathematical Programming problem are functions of several real variables; therefore, before entering Optimization Theory and Methods, we need to recall several basic notions and facts about the spaces Rn where these functions live, as well as about the functions themselves. The reader is supposed to know most of the facts to follow, so he/she should not be surprised by the "cooking book" style which we intend to use below.
A.1.1 A point in Rn
A point in Rn (called also an n-dimensional vector) is an ordered collection x = (x1 , ..., xn ) of
n reals, called the coordinates, or components, or entries of vector x; the space Rn itself is the
set of all collections of this type.
APPENDIX A. PREREQUISITES FROM LINEAR ALGEBRA
1. The entire Rn ;
2. The trivial subspace containing the single zero vector 0 = (0, ..., 0) (this vector/point is also called the origin);
3. The set {x ∈ Rn : x1 = 0} of all vectors x with the first coordinate equal to zero.
The latter example admits a natural extension:
4. The set of all solutions to a homogeneous (i.e., with zero right hand side) system of linear equations

   {x ∈ Rn :  a11 x1 + ... + a1n xn = 0,
              a21 x1 + ... + a2n xn = 0,
              .....
              am1 x1 + ... + amn xn = 0}          (A.1.1)

always is a linear subspace in Rn. This example is "generic", that is, every linear subspace in Rn is the solution set of a (finite) system of homogeneous linear equations, see Proposition A.3.6 below.
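Example 4 is easy to experiment with numerically. The following sketch (Python with numpy; the particular coefficient matrix is a hypothetical illustration) extracts a basis of the solution subspace of a homogeneous system from the SVD of the coefficient matrix:

```python
import numpy as np

# A hypothetical coefficient matrix of a homogeneous system A x = 0 (m=2, n=4).
A = np.array([[1., 2., 0., 1.],
              [0., 1., 1., 0.]])

# The solution set {x : A x = 0} is a linear subspace; a basis for it can be
# read off the SVD: right singular vectors corresponding to zero singular values.
U, s, Vt = np.linalg.svd(A)
rank = int((s > 1e-10).sum())
null_basis = Vt[rank:].T          # columns span the solution subspace

# Every column indeed solves the system, and the dimension is n - rank(A).
assert np.allclose(A @ null_basis, 0)
assert null_basis.shape[1] == A.shape[1] - rank
```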
5. Linear span of a set of vectors. Given a nonempty set X of vectors, one can form a linear subspace Lin(X), called the linear span of X; this subspace consists of all vectors x which can be represented as linear combinations ∑_{i=1}^{N} λi xi of vectors from X (here N is an arbitrary positive integer, the λi are reals, and the xi belong to X).
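Membership in a linear span can be tested numerically: x ∈ Lin(X) exactly when the best least-squares approximation of x by linear combinations of vectors from X has zero error. A small sketch (numpy assumed; the vectors are hypothetical illustrations):

```python
import numpy as np

# X: a set of vectors stored as columns; Lin(X) is all their linear combinations.
X = np.array([[1., 0.],
              [0., 1.],
              [0., 0.]])        # spans the x1-x2 coordinate plane in R^3

def in_lin_span(X, x, tol=1e-10):
    """Check x ∈ Lin(X) by solving the least-squares problem min ||X λ − x||."""
    lam, *_ = np.linalg.lstsq(X, x, rcond=None)
    return np.linalg.norm(X @ lam - x) < tol

assert in_lin_span(X, np.array([2., -3., 0.]))      # the combination 2x^1 − 3x^2
assert not in_lin_span(X, np.array([0., 0., 1.]))   # third coordinate nonzero
```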
A.1.3.B. Sums and intersections of linear subspaces. Let {Lα }α∈I be a family (finite or
infinite) of linear subspaces of Rn . From this family, one can build two sets:
∑
1. The sum Lα of the subspaces Lα which consists of all vectors which can be represented
α
as finite sums of vectors taken each from its own subspace of the family;
∩
2. The intersection Lα of the subspaces from the family.
α
A collection {f1, ..., fm} of vectors is called a basis of Rn, if
1. The collection is linearly independent;
2. Every vector from Rn is a linear combination of vectors from the collection (i.e., Lin{f1, ..., fm} = Rn).
Besides the bases of the entire Rn , one can speak about the bases of linear subspaces:
A collection {f1, ..., fm} of vectors is called a basis of a linear subspace L, if
1. The collection is linearly independent;
2. L = Lin{f1, ..., fm}, i.e., all vectors fi belong to L, and every vector from L is a linear combination of the vectors f1, ..., fm.
In order to avoid trivial remarks, it makes sense to agree once and for all that an empty set of vectors is linearly independent, and that a linear combination with no terms equals 0. With this convention, the trivial linear subspace L = {0} also has a basis, specifically, an empty set of vectors.
Theorem A.1.2 (i) Let L be a linear subspace of Rn . Then L admits a (finite) basis, and all
bases of L are comprised of the same number of vectors; this number is called the dimension of
L and is denoted by dim (L).
We have seen that Rn admits a basis comprised of n elements (the standard basic
orths). From (i) it follows that every basis of Rn contains exactly n vectors, and the
dimension of Rn is n.
(ii) The larger is a linear subspace of Rn , the larger is its dimension: if L ⊂ L′ are linear
subspaces of Rn , then dim (L) ≤ dim (L′ ), and the equality takes place if and only if L = L′ .
We have seen that the dimension of Rn is n; according to the above convention, the
trivial linear subspace {0} of Rn admits an empty basis, so that its dimension is 0.
Since {0} ⊂ L ⊂ Rn for every linear subspace L of Rn , it follows from (ii) that the
dimension of a linear subspace in Rn is an integer between 0 and n.
(iii) Let L be a linear subspace in Rn . Then
(iii.1) Every linearly independent subset of vectors from L can be extended to a basis of L;
(iii.2) From every spanning subset X for L – i.e., a set X such that Lin(X) = L – one
can extract a basis of L.
It follows from (iii) that
– every linearly independent subset of L contains at most dim (L) vectors, and if it
contains exactly dim (L) vectors, it is a basis of L;
– every spanning set for L contains at least dim (L) vectors, and if it contains exactly
dim (L) vectors, it is a basis of L.
(iv) Let L be a linear subspace in Rn, and f1, ..., fm be a basis in L. Then every vector x ∈ L admits exactly one representation

x = ∑_{i=1}^{m} λi(x)fi

as a linear combination of the basis vectors, and the mapping

x ↦ (λ1(x), ..., λm(x))

is a one-to-one mapping of L onto Rm which is linear, i.e., for every i = 1, ..., m one has

λi(x + y) = λi(x) + λi(y) ∀(x, y ∈ L);
λi(νx) = νλi(x) ∀(x ∈ L, ν ∈ R).          (A.1.2)
The reals λi (x), i = 1, ..., m, are called the coordinates of x ∈ L in the basis f 1 , ..., f m .
E.g., the coordinates of a vector x ∈ Rn in the standard basis e1 , ..., en of Rn – the
one comprised of the standard basic orths – are exactly the entries of x.
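In computations, the coordinates of x in a given basis are obtained by solving a linear system whose matrix has the basis vectors as columns. A sketch (numpy; the basis below is a hypothetical example):

```python
import numpy as np

# A hypothetical basis f^1, f^2 of R^2, stored as columns of F.
F = np.array([[1., 1.],
              [0., 1.]])

x = np.array([3., 2.])

# The coordinates λ(x) in the basis are the unique solution of F λ = x.
lam = np.linalg.solve(F, x)
assert np.allclose(F @ lam, x)

# Linearity of x ↦ λ(x): coordinates of a sum are sums of coordinates.
y = np.array([-1., 4.])
assert np.allclose(np.linalg.solve(F, x + y), lam + np.linalg.solve(F, y))
```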
(v) [Dimension formula] Let L1, L2 be linear subspaces of Rn. Then

dim(L1 + L2) + dim(L1 ∩ L2) = dim(L1) + dim(L2).
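The dimension formula dim(L1 + L2) + dim(L1 ∩ L2) = dim(L1) + dim(L2) is easy to verify numerically: represent L1, L2 by spanning columns, get dim(L1 + L2) as the rank of the stacked columns, and recover the intersection from the null space of [B1, −B2]. A sketch (numpy; the subspaces are randomly generated illustrations):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two subspaces of R^6, given by spanning columns; L2 deliberately shares
# one direction with L1, so the intersection is (generically) a line.
B1 = rng.standard_normal((6, 3))
B2 = np.hstack([B1[:, :1], rng.standard_normal((6, 1))])

dim_L1 = np.linalg.matrix_rank(B1)
dim_L2 = np.linalg.matrix_rank(B2)
dim_sum = np.linalg.matrix_rank(np.hstack([B1, B2]))   # dim(L1 + L2)

# dim(L1 ∩ L2): solutions of B1 u = B2 v form the null space of [B1, -B2];
# mapping each solution (u, v) to B1 u spans the intersection.
M = np.hstack([B1, -B2])
U, s, Vt = np.linalg.svd(M)
null = Vt[int((s > 1e-10).sum()):].T
inter = B1 @ null[:B1.shape[1], :]
dim_int = np.linalg.matrix_rank(inter) if inter.size else 0

# The dimension formula.
assert dim_L1 + dim_L2 == dim_sum + dim_int
```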
Important convention. When speaking about adding n-dimensional vectors and multiplying
them by reals, it is absolutely unimportant whether we treat the vectors as the column ones,
or the row ones, or write down the entries in rectangular tables, or something else. However,
when matrix operations (matrix-vector multiplication, transposition, etc.) become involved, it
is important whether we treat our vectors as columns, as rows, or as something else. For the
sake of definiteness, from now on we treat all vectors as column ones, independently of how
we refer to them in the text. For example, when saying for the first time what a vector is, we
wrote x = (x1 , ..., xn ), which might suggest that we were speaking about row vectors. We stress that this is not the case, and the only reason for using the notation x = (x1 , ..., xn ) instead of the "correct" column-vector notation is to save space and to avoid ugly formulas when speaking about functions with vector arguments. After we have agreed that there is no such thing as a row vector in this Lecture course, we can use (and do use) without any harm whatever notation we want.
Exercise A.1 1. Mark in the list below those subsets of Rn which are linear subspaces, find
out their dimensions and point out their bases:
(a) Rn
(b) {0}
(c) ∅
(d) {x ∈ Rn : ∑_{i=1}^{n} i·xi = 0}
(e) {x ∈ Rn : ∑_{i=1}^{n} i·xi^2 = 0}
(f) {x ∈ Rn : ∑_{i=1}^{n} i·xi = 1}
(g) {x ∈ Rn : ∑_{i=1}^{n} i·xi^2 = 1}
3. Consider the space Rm×n of m × n matrices with real entries. As far as linear operations – addition of matrices and multiplication of matrices by reals – are concerned, this space can be treated as a certain RN .
(a) Find the dimension of Rm×n and point out a basis in this space
(b) In the space Rn×n of square n × n matrices, there are two interesting subsets: the set Sn of symmetric matrices {A = [Aij ] : Aij = Aji } and the set Jn of skew-symmetric matrices {A = [Aij ] : Aij = −Aji }.
i. Verify that both Sn and Jn are linear subspaces of Rn×n
ii. Find the dimension and point out a basis in Sn
iii. Find the dimension and point out a basis in Jn
iv. What is the sum of Sn and Jn ? What is the intersection of Sn and Jn ?
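A quick numerical illustration of items i–iv (numpy; the particular matrix is a hypothetical example): every square matrix splits uniquely into a symmetric and a skew-symmetric part, so the sum of Sn and Jn is all of Rn×n, while their intersection is trivial:

```python
import numpy as np

# Every square matrix is the sum of a symmetric and a skew-symmetric matrix.
A = np.array([[1., 2.],
              [3., 4.]])

S = (A + A.T) / 2        # symmetric part, lies in S^n
J = (A - A.T) / 2        # skew-symmetric part, lies in J^n

assert np.allclose(S, S.T)
assert np.allclose(J, -J.T)
assert np.allclose(S + J, A)

# dim S^n = n(n+1)/2 and dim J^n = n(n-1)/2 add up to n^2 = dim R^{n×n},
# consistent with S^n + J^n = R^{n×n} and S^n ∩ J^n = {0}.
n = 2
assert n*(n+1)//2 + n*(n-1)//2 == n*n
```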
3. [positive definiteness]: The quantity ⟨x, x⟩ always is nonnegative, and it is zero if and only
if x is zero.
Remark A.2.1 The outlined 3 properties – bi-linearity, symmetry and positive definiteness – form a definition of a Euclidean inner product, and there are infinitely many mutually different ways to satisfy these properties; in other words, there are infinitely many different Euclidean inner products on Rn . The standard inner product ⟨x, y⟩ = xT y is just a particular case of this general notion. Although in the sequel we normally work with the standard inner product, the reader should remember that the facts we are about to recall are valid for all Euclidean inner products, and not only for the standard one.
The notion of an inner product underlies a number of purely algebraic constructions, in partic-
ular, those of inner product representation of linear forms and of orthogonal complement.
f (x) = ⟨f, x⟩ ∀x

for a uniquely defined vector f , and, vice versa, every function of this form is a linear form on Rn ;
(iii) The above one-to-one correspondence between the linear forms and vectors on Rn is
linear: adding linear forms (or multiplying a linear form by a real), we add (respectively, multiply
by the real) the vector(s) representing the form(s).
The orthogonal complement L⊥ of a linear subspace L is defined as

L⊥ = {f : ⟨f, x⟩ = 0 ∀x ∈ L}.
Theorem A.2.2 (i) Twice taken, orthogonal complement recovers the original subspace: when-
ever L is a linear subspace of Rn , one has
(L⊥ )⊥ = L;
(ii) The larger is a linear subspace L, the smaller is its orthogonal complement: if L1 ⊂ L2 are linear subspaces of Rn , then L1⊥ ⊃ L2⊥ ;
(iii) The intersection of a subspace and its orthogonal complement is trivial, and the sum of
these subspaces is the entire Rn :
L ∩ L⊥ = {0}, L + L⊥ = Rn .
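These properties are easy to check numerically: an orthonormal basis of L⊥ (w.r.t. the standard inner product) can be read off the SVD of a matrix whose columns span L. A sketch (numpy; the subspace is a random illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((5, 2))                  # columns span L ⊂ R^5

def orth_complement(B, tol=1e-10):
    """Orthonormal basis of L^⊥ (standard inner product), via SVD of B."""
    U, s, Vt = np.linalg.svd(B, full_matrices=True)
    rank = int((s > tol).sum())
    return U[:, rank:]                           # directions orthogonal to span(B)

C = orth_complement(B)
assert np.allclose(B.T @ C, 0)                   # L ⊥ L^⊥
assert B.shape[0] == np.linalg.matrix_rank(B) + C.shape[1]   # dim L + dim L^⊥ = n

# (L^⊥)^⊥ recovers L: every column of B equals its projection onto (L^⊥)^⊥.
D = orth_complement(C)
assert np.allclose(D @ (D.T @ B), B)
```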
Remark A.2.2 From Theorem A.2.2.(iii) and the Dimension formula (Theorem A.1.2.(v)) it follows, first, that for every subspace L in Rn one has

dim(L) + dim(L⊥) = n,

and, second, that every vector x ∈ Rn admits a unique decomposition

x = xL + xL⊥

into a sum of two vectors: the first of them, xL , belongs to L, and the second, xL⊥ , belongs to L⊥ . This decomposition is called the orthogonal decomposition of x taken with respect to L, L⊥ ; xL is called the orthogonal projection of x onto L, and xL⊥ – the orthogonal projection of x onto the orthogonal complement of L. Both projections depend on x linearly, for example,

(x + y)L = xL + yL ,   (νx)L = νxL .

A collection f1, ..., fm of vectors is called orthonormal if the vectors are mutually orthogonal unit vectors:

i ≠ j ⇒ ⟨fi, fj⟩ = 0;   ⟨fi, fi⟩ = 1, i = 1, ..., m.
A.2. SPACE Rn: EUCLIDEAN STRUCTURE
Theorem A.2.3 (i) An orthonormal collection f 1 , ..., f m always is linearly independent and is
therefore a basis of its linear span L = Lin(f 1 , ..., f m ) (such a basis in a linear subspace is called
orthonormal). The coordinates of a vector x ∈ L w.r.t. an orthonormal basis f 1 , ..., f m of L are
given by explicit formulas:

x = ∑_{i=1}^{m} λi(x)fi  ⇔  λi(x) = ⟨x, fi⟩.
Example of an orthonormal basis in Rn : The standard basis {e1 , ..., en } is orthonormal with
respect to the standard inner product ⟨x, y⟩ = xT y on Rn (but is not orthonormal w.r.t.
other Euclidean inner products on Rn ).
Proof of (i): Taking inner product of both sides in the equality

x = ∑_j λj(x)fj

with fi, we get

⟨x, fi⟩ = ⟨∑_j λj(x)fj, fi⟩
        = ∑_j λj(x)⟨fj, fi⟩    [bilinearity of inner product]
        = λi(x)                [orthonormality of {fi}]
(ii) If f 1 , ..., f m is an orthonormal basis in a linear subspace L, then the inner product of
two vectors x, y ∈ L in the coordinates λi (·) w.r.t. this basis is given by the standard formula
⟨x, y⟩ = ∑_{i=1}^{m} λi(x)λi(y).

Proof:

x = ∑_i λi(x)fi,  y = ∑_i λi(y)fi
⇒ ⟨x, y⟩ = ⟨∑_i λi(x)fi, ∑_i λi(y)fi⟩
         = ∑_{i,j} λi(x)λj(y)⟨fi, fj⟩    [bilinearity of inner product]
         = ∑_i λi(x)λi(y)                [orthonormality of {fi}]
(iii) Every linear subspace L of Rn admits an orthonormal basis; moreover, every orthonor-
mal system f 1 , ..., f m of vectors from L can be extended to an orthonormal basis in L.
Important corollary: All Euclidean spaces of the same dimension are “the same”. Specifi-
cally, if L is an m-dimensional space in a space Rn equipped with an Euclidean inner product
⟨·, ·⟩, then there exists a one-to-one mapping x 7→ A(x) of L onto Rm such that
• The mapping converts the ⟨·, ·⟩ inner product on L into the standard inner product on
Rm :
⟨x, y⟩ = (A(x))T A(y) ∀x, y ∈ L.
Indeed, by (iii) L admits an orthonormal basis f 1 , ..., f m ; using (ii), one can immediately
check that the mapping
x 7→ A(x) = (λ1 (x), ..., λm (x))
which maps x ∈ L into the m-dimensional vector comprised of the coordinates of x in the
basis f 1 , ..., f m , meets all the requirements.
Proof of (iii) is given by the Gram-Schmidt orthogonalization process, important in its own right, which works as follows. We start with an arbitrary basis h1, ..., hm in L and step by step convert it into an orthonormal basis f1, ..., fm. At the beginning of a step t of the construction, we already have an orthonormal collection f1, ..., ft−1 such that Lin{f1, ..., ft−1} = Lin{h1, ..., ht−1}. At a step t we set

gt = ht − ∑_{j=1}^{t−1} ⟨ht, fj⟩fj,   ft = gt/√⟨gt, gt⟩;

it is immediately seen that the collection f1, ..., ft is orthonormal and that Lin{f1, ..., ft} = Lin{h1, ..., ht}. After m steps of the orthogonalization process, we end up with an orthonormal system f1, ..., fm of vectors from L such that Lin{f1, ..., fm} = L, i.e., with an orthonormal basis of L.
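The process above can be sketched in a few lines (Python with numpy; the starting basis is a hypothetical example):

```python
import numpy as np

def gram_schmidt(H):
    """Orthonormalize the columns h^1, ..., h^m of H (assumed linearly independent)."""
    F = []
    for t in range(H.shape[1]):
        # subtract the projections of h^t on f^1, ..., f^{t-1}, then normalize
        g = H[:, t] - sum((H[:, t] @ f) * f for f in F)
        F.append(g / np.linalg.norm(g))
    return np.column_stack(F)

H = np.array([[1., 1., 0.],
              [1., 0., 1.],
              [0., 1., 1.]])
F = gram_schmidt(H)

assert np.allclose(F.T @ F, np.eye(3))                    # orthonormal
# Spans agree step by step: Lin{f^1..f^t} = Lin{h^1..h^t}.
for t in range(1, 4):
    assert np.linalg.matrix_rank(np.hstack([F[:, :t], H[:, :t]])) == t
```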
Exercise A.2 1. What is the orthogonal complement (w.r.t. the standard inner product) of the subspace {x ∈ Rn : ∑_{i=1}^{n} xi = 0} in Rn ?
2. Find an orthonormal basis (w.r.t. the standard inner product) in the linear subspace {x ∈
Rn : x1 = 0} of Rn
A.3. AFFINE SUBSPACES IN Rn
xL = ∑_{i=1}^{m} (xT fi)fi.
(L1 + L2)⊥ = L1⊥ ∩ L2⊥;   (L1 ∩ L2)⊥ = L1⊥ + L2⊥.
5. Consider the space of m × n matrices Rm×n , and let us equip it with the “standard inner
product” (called the Frobenius inner product)
⟨A, B⟩ = ∑_{i,j} Aij Bij
(a) Verify that in terms of matrix multiplication the Frobenius inner product can be writ-
ten as
⟨A, B⟩ = Tr(AB T ),
where Tr(C) is the trace (the sum of diagonal elements) of a square matrix C.
(b) Build an orthonormal basis in the linear subspace Sn of symmetric n × n matrices
(c) What is the orthogonal complement of the subspace Sn of symmetric n × n matrices
in the space Rn×n of square n × n matrices?
(d) Find the orthogonal decomposition, w.r.t. S2 , of the matrix

    [1 2]
    [3 4]
Definition A.3.1 [Affine subspace] An affine subspace (a plane) in Rn is a set of the form
M = a + L = {y = a + x | x ∈ L}, (A.3.1)
E.g., shifting the linear subspace L comprised of vectors with zero first entry by a vector a =
(a1 , ..., an ), we get the set M = a + L of all vectors x with x1 = a1 ; according to our terminology,
this is an affine subspace.
An immediate question about the notion of an affine subspace is: what are the "degrees of freedom" in decomposition (A.3.1), i.e., how strictly does M determine a and L? The answer is as follows:
The linear subspace L in decomposition (A.3.1) is uniquely defined by M :

L = M − M = {x − y | x, y ∈ M }. (A.3.2)

The shifting vector a is not uniquely defined by M and can be chosen as an arbitrary vector from M .
Corollary A.3.1 Let {Mα } be an arbitrary family of affine subspaces in Rn , and assume that the set M = ∩α Mα is nonempty. Then M is an affine subspace.
From Corollary A.3.1 it immediately follows that for every nonempty subset Y of Rn there exists
the smallest affine subspace containing Y – the intersection of all affine subspaces containing Y .
This smallest affine subspace containing Y is called the affine hull of Y (notation: Aff(Y )).
All this strongly resembles the story about linear spans. Can we further extend this analogy and get a description of the affine hull Aff(Y ) in terms of elements of Y , similar to the one of the linear span ("the linear span of X is the set of all linear combinations of vectors from X")? Sure we can!
Let us choose somehow a point y0 ∈ Y , and consider the set
X = Y − y0 .
All affine subspaces containing Y should contain also y0 and therefore, by Proposition A.3.1,
can be represented as M = y0 + L, L being a linear subspace. It is absolutely evident that an
affine subspace M = y0 + L contains Y if and only if the subspace L contains X, and that the
larger is L, the larger is M :
L ⊂ L′ ⇒ M = y0 + L ⊂ M ′ = y0 + L′ .
Thus, to find the smallest among affine subspaces containing Y , it suffices to find the smallest
among the linear subspaces containing X and to translate the latter space by y0 :
Now, we know what Lin(Y − y0 ) is – this is the set of all linear combinations of vectors from Y − y0 , so that a generic element of Lin(Y − y0 ) is

x = ∑_{i=1}^{k} μi(yi − y0)   [k may depend on x]
with yi ∈ Y and real coefficients µi . It follows that the generic element of Aff(Y ) is
y = y0 + ∑_{i=1}^{k} μi(yi − y0) = ∑_{i=0}^{k} λi yi,

where

λ0 = 1 − ∑_i μi,   λi = μi, i ≥ 1.
We see that a generic element of Aff(Y ) is a linear combination of vectors from Y . Note, however,
that the coefficients λi in this combination are not completely arbitrary: their sum is equal to
1. Linear combinations of this type – with the unit sum of coefficients – have a special name –
they are called affine combinations.
We have seen that every vector from Aff(Y ) is an affine combination of vectors of Y . Is the converse true, i.e., does Aff(Y ) contain all affine combinations of vectors from Y ? The answer is positive. Indeed, if

y = ∑_{i=1}^{k} λi yi

is an affine combination of vectors from Y , then, using the equality ∑_i λi = 1, we can write it also as

y = y0 + ∑_{i=1}^{k} λi(yi − y0),
y0 being the “marked” vector we used in our previous reasoning, and the vector of this form, as
we already know, belongs to Aff(Y ). Thus, we come to the following
When Y itself is an affine subspace, it, of course, coincides with its affine hull, and the above
Proposition leads to the following
Corollary A.3.2 An affine subspace M is closed with respect to taking affine combinations of
its members – every combination of this type is a vector from M . Vice versa, a nonempty set
which is closed with respect to taking affine combinations of its members is an affine subspace.
Affinely independent sets. A linearly independent set x1 , ..., xk is a set such that no nontrivial linear combination of x1 , ..., xk equals zero. An equivalent definition is given by Theorem A.1.2.(iv): x1 , ..., xk are linearly independent if the coefficients in a linear combination
x = ∑_{i=1}^{k} λi xi
are uniquely defined by the value x of the combination. This equivalent form reflects the essence of the matter – what we indeed need is the uniqueness of the coefficients in expansions. Accordingly, this equivalent form is the prototype for the notion of an affinely independent set: we want to introduce this notion in such a way that the coefficients λi in an affine combination

y = ∑_{i=0}^{k} λi yi

of affinely independent y0 , ..., yk are uniquely defined by the value y of the combination. Non-uniqueness would mean that the same y can be written as

∑_{i=0}^{k} λi yi = ∑_{i=0}^{k} λ′i yi

for two different collections of coefficients λi and λ′i with unit sums of coefficients; if it is the case, then

∑_{i=0}^{k} (λi − λ′i)yi = 0,

so that the coefficients μi = λi − λ′i form a nontrivial collection with zero sum for which the combination ∑_i μi yi vanishes. Thus, we have motivated the following
∑_{i=0}^{k} λi yi = 0,  ∑_{i=0}^{k} λi = 0  ⇒  λ0 = λ1 = ... = λk = 0.
With this definition, we get the result completely similar to the one of Theorem A.1.2.(iv):
Corollary A.3.3 Let y0 , ..., yk be affinely independent. Then the coefficients λi in an affine
combination

y = ∑_{i=0}^{k} λi yi   [∑_i λi = 1]
of the vectors y0 , ..., yk are uniquely defined by the value y of the combination.
Verification of affine independence of a collection can be immediately reduced to verification of linear independence of a closely related collection:
Proposition A.3.4 k + 1 vectors y0 , ..., yk are affinely independent if and only if the k vectors
(y1 − y0 ), (y2 − y0 ), ..., (yk − y0 ) are linearly independent.
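Proposition A.3.4 yields an immediate numerical test for affine independence: form the differences yi − y0 and check their rank. A sketch (numpy) which also illustrates the example of the collection 0, e1 , ..., en :

```python
import numpy as np

def affinely_independent(Y):
    """y^0, ..., y^k (rows of Y) are affinely independent iff the differences
    y^i - y^0, i = 1..k, are linearly independent (cf. Proposition A.3.4)."""
    diffs = Y[1:] - Y[0]
    return np.linalg.matrix_rank(diffs) == len(Y) - 1

n = 3
Y = np.vstack([np.zeros(n), np.eye(n)])   # the collection 0, e^1, ..., e^n

assert affinely_independent(Y)            # affinely independent ...
assert np.linalg.matrix_rank(Y) < len(Y)  # ... yet linearly dependent (contains 0)
```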
From the latter Proposition it follows, e.g., that the collection 0, e1 , ..., en comprised of the origin and the standard basic orths is affinely independent. Note that this collection is linearly dependent (as is every collection containing zero). You should definitely know the difference between the two notions of independence we deal with: linear independence means that no nontrivial linear combination of the vectors can be zero, while affine independence means that no nontrivial linear combination from a certain restricted class (those with zero sum of coefficients) can be zero. Therefore, there are more affinely independent sets than linearly independent ones: a linearly independent set is for sure affinely independent, but not vice versa.
Affine bases and affine dimension. Propositions A.3.2 and A.3.3 reduce the notions of
affine spanning/affine independent sets to the notions of spanning/linearly independent ones.
Combined with Theorem A.1.2, they result in the following analogies of the latter two statements:
Proposition A.3.5 [Affine dimension] Let M = a + L be an affine subspace in Rn . Then the
following two quantities are finite integers which are equal to each other:
(i) minimal # of elements in the subsets of M which affinely span M ;
(ii) maximal # of elements in affine independent subsets of M .
The common value of these two integers exceeds the dimension dim L of L by 1.
By definition, the affine dimension of an affine subspace M = a + L is the dimension dim L of
L. Thus, if M is of affine dimension k, then the minimal cardinality of sets affinely spanning
M , same as the maximal cardinality of affine independent subsets of M , is k + 1.
Theorem A.3.1 [Affine bases] Let M = a + L be an affine subspace in Rn .
A. Let Y ⊂ M . The following three properties of Y are equivalent:
(i) Y is an affine independent set which affinely spans M ;
(ii) Y is affine independent and contains 1 + dim L elements;
(iii) Y affinely spans M and contains 1 + dim L elements.
We already know that the standard basic orths e1 , ..., en form a basis of the entire space Rn .
And what about affine bases in Rn ? According to Theorem A.3.1.A, you can choose as such a
basis a collection e0 , e0 + e1 , ..., e0 + en , e0 being an arbitrary vector.
Barycentric coordinates. Let M be an affine subspace, and let y0 , ..., yk be an affine basis
of M . Since the basis, by definition, affinely spans M , every vector y from M is an affine
combination of the vectors of the basis:
y = ∑_{i=0}^{k} λi yi   [∑_{i=0}^{k} λi = 1],
and since the vectors of the affine basis are affinely independent, the coefficients of this com-
bination are uniquely defined by y (Corollary A.3.3). These coefficients are called barycentric
coordinates of y with respect to the affine basis in question. In contrast to the usual coordinates
with respect to a (linear) basis, the barycentric coordinates cannot be quite arbitrary: their sum must be equal to 1.
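Barycentric coordinates can be computed by solving a linear system: stack the coordinates of the basis vectors with a row of ones enforcing the unit-sum constraint. A sketch (numpy; the affine basis is a hypothetical choice):

```python
import numpy as np

# An affine basis of R^2 viewed as an affine subspace: 3 affinely
# independent points (a hypothetical choice).
Y = np.array([[0., 0.],
              [1., 0.],
              [0., 1.]])

def barycentric(Y, y):
    """Coefficients λ with Σλ_i = 1 and Σλ_i y^i = y: solve a linear system
    whose last row enforces the unit-sum constraint."""
    A = np.vstack([Y.T, np.ones(len(Y))])        # point coordinates plus the Σλ=1 row
    b = np.append(y, 1.0)
    return np.linalg.solve(A, b)

lam = barycentric(Y, np.array([0.25, 0.5]))
assert np.isclose(lam.sum(), 1.0)                # an affine combination ...
assert np.allclose(lam @ Y, [0.25, 0.5])         # ... reproducing the point
```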
Comment. The “outer” description of a linear subspace/affine subspace – the “artist’s” one
– is in many cases much more useful than the “inner” description via linear/affine combinations
(the “worker’s” one). E.g., with the outer description it is very easy to check whether a given
vector belongs or does not belong to a given linear subspace/affine subspace, which is not that
easy with the inner one5) . In fact both descriptions are “complementary” to each other and
perfectly well work in parallel: what is difficult to see with one of them, is clear with another.
The idea of using "inner" and "outer" descriptions of the entities we meet with – linear subspaces, affine subspaces, convex sets, optimization problems – the general idea of duality – is, I would say, the main driving force of Convex Analysis and Optimization, and in the sequel we will meet different implementations of this fundamental idea all the time.
• Subspaces of dimension 0 are translations of the only 0-dimensional linear subspace {0},
i.e., are singleton sets – vectors from Rn . These subspaces are called points; a point is a
solution to a square system of linear equations with nonsingular matrix.
• Subspaces of dimension 1 are lines. The "inner" description of a line l is

l = {y = λ0 y0 + λ1 y1 : λ0 + λ1 = 1},

where y0 , y1 is an affine basis of l; you can choose as such a basis any pair of distinct points on the line.
The "outer" description of a line is as follows: it is the set of solutions to a linear system with n variables and n − 1 linearly independent equations.
• Subspaces of dimension > 1 and < n − 1 have no special names; sometimes they are called affine planes of such and such dimension.
• Affine subspaces of dimension n − 1, due to the important role they play in Convex Analysis, have a special name – they are called hyperplanes. The outer description of a hyperplane is that a hyperplane is the solution set of a single linear equation
aT x = b

with nontrivial left hand side (a ≠ 0). In other words, a hyperplane is the level set a(x) = const of a nonconstant linear form a(x) = aT x.

5) In principle it is not difficult to certify that a given point belongs to, say, a linear subspace given as the linear span of some set – it suffices to point out a representation of the point as a linear combination of vectors from the set. But how could you certify that the point does not belong to the subspace?
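From the outer description, membership in a hyperplane amounts to checking a single linear equation. A minimal sketch (numpy; the particular a and b are hypothetical):

```python
import numpy as np

# A hyperplane in R^3: the solution set of one nontrivial linear equation a^T x = b.
a = np.array([1., -2., 1.])
b = 3.0

def on_hyperplane(x, tol=1e-10):
    """Check a^T x = b up to numerical tolerance."""
    return abs(a @ x - b) < tol

assert on_hyperplane(np.array([3., 0., 0.]))
assert not on_hyperplane(np.array([0., 0., 0.]))

# Its affine dimension is n - 1: shifting by any one solution x0 turns the
# hyperplane into the (n-1)-dimensional linear subspace {x : a^T x = 0}.
```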
• The “largest possible” affine subspace – the one of dimension n – is unique and is the
entire Rn . This subspace is given by an empty system of linear equations.