
MS&E310 Lecture Note #02

Mathematical Preliminaries

Yinyu Ye
Department of Management Science and Engineering
Stanford University
Stanford, CA 94305, U.S.A.

http://www.stanford.edu/~yyye

Chapter 1 and Appendixes A, B.1-B.2, C.1


Real n-Space; Euclidean Space

• R, R+ , int R+
• Rn , Rn+ , int Rn+
• x ≥ y means xj ≥ yj for j = 1, 2, ..., n
• 0: all zero vector; and e: all one vector
• Inner-Product:

x • y := x^T y = ∑_{j=1}^n x_j y_j

• Norm: ∥x∥_2 = √(x^T x), ∥x∥_∞ = max{|x_1|, |x_2|, ..., |x_n|}, ∥x∥_p = (∑_{j=1}^n |x_j|^p)^{1/p}

• The dual of the p-norm, denoted by ∥·∥_*, is the q-norm, where 1/p + 1/q = 1.
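As a quick numerical illustration (a sketch, not part of the original note; numpy is assumed available), the norms above and the p-q duality relation 1/p + 1/q = 1 can be checked on a small made-up vector:

```python
import numpy as np

x = np.array([3.0, -4.0, 1.0])

# The three norms defined above
assert np.isclose(np.linalg.norm(x, 2), np.sqrt(x @ x))
assert np.isclose(np.linalg.norm(x, np.inf), np.abs(x).max())

p = 3.0
assert np.isclose(np.linalg.norm(x, p), (np.abs(x) ** p).sum() ** (1.0 / p))

# Dual norm: q with 1/p + 1/q = 1; Hölder's inequality gives
# x.y <= ||x||_p * ||y||_q
q = p / (p - 1.0)
y = np.array([1.0, 2.0, -2.0])
assert x @ y <= np.linalg.norm(x, p) * np.linalg.norm(y, q)
```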
• Column vector:
x = (x1 ; x2 ; . . . ; xn )


and row vector:


x = (x1 , x2 , . . . , xn )

• A set of vectors a1 , ..., am is said to be linearly dependent if there are scalars λ1 , ..., λm , not all
zero, such that the linear combination

∑_{i=1}^m λ_i a_i = 0

• A linearly independent set of vectors that span Rn is a basis.
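A numerical way to test linear dependence (an illustrative sketch with numpy and made-up vectors) is to stack the vectors as columns and compare the matrix rank with the number of vectors:

```python
import numpy as np

# Three vectors in R^3; a3 = a1 + a2, so the set is linearly dependent
a1 = np.array([1.0, 0.0, 0.0])
a2 = np.array([1.0, 1.0, 0.0])
a3 = np.array([2.0, 1.0, 0.0])

A = np.column_stack([a1, a2, a3])
rank = np.linalg.matrix_rank(A)

# Dependent: rank (2) is less than the number of vectors (3), i.e.,
# some nontrivial combination sums to the zero vector
assert rank < 3
assert np.allclose(1.0 * a1 + 1.0 * a2 - 1.0 * a3, 0.0)
```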


Matrices

• Rm×n , ai. , a.j , aij


• A_I denotes the submatrix of A whose rows belong to I, A_J denotes the submatrix of A whose columns belong to J, and A_IJ the submatrix with rows in I and columns in J.

• 0: all zero matrix, and I : the identity matrix


• N (A), R(A):
Theorem 1 Each linear subspace of R^n is generated by finitely many vectors, and is also the
intersection of finitely many hyperplanes; that is, for each linear subspace L of R^n there are
matrices A and C such that L = N(A) = R(C).
• det(A), tr(A)
• Inner Product:

A • B = tr A^T B = ∑_{i,j} a_{ij} b_{ij}


• The operator norm ∥A∥:

∥A∥_2 := max_{0 ≠ x ∈ R^n} ∥Ax∥_2 / ∥x∥_2

• Sometimes we use X = diag(x)


• Eigenvalues and eigenvectors
Av = λ · v
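A small eigenvalue example (illustrative only; numpy assumed) verifying the defining relation Av = λv for a symmetric 2×2 matrix:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# numpy returns the eigenvalues lam and the eigenvectors as columns of V
lam, V = np.linalg.eig(A)

# Check A v = lambda * v for every eigenpair
for i in range(len(lam)):
    assert np.allclose(A @ V[:, i], lam[i] * V[:, i])
```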


Symmetric Matrices

• Sn
• The Frobenius norm:

∥X∥_f = √(tr X^T X) = √(X • X)

• Positive Definite (PD): Q ≻ 0 iff xT Qx > 0, for all x ̸= 0


• Positive SemiDefinite (PSD): Q ≽ 0 iff xT Qx ≥ 0, for all x
• PSD set: S^n_+, int S^n_+
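One way to test Q ≻ 0 in practice (a sketch with numpy; the example matrix is made up): all eigenvalues must be positive, equivalently a Cholesky factorization exists, and the definition x^T Qx > 0 can be spot-checked on random nonzero vectors:

```python
import numpy as np

Q = np.array([[2.0, -1.0],
              [-1.0, 2.0]])

# Q is PD iff all eigenvalues are positive ...
assert np.all(np.linalg.eigvalsh(Q) > 0)

# ... equivalently, a Cholesky factorization Q = L L^T exists
L = np.linalg.cholesky(Q)
assert np.allclose(L @ L.T, Q)

# Spot-check the definition x^T Q x > 0 on random nonzero vectors
rng = np.random.default_rng(0)
for _ in range(100):
    x = rng.standard_normal(2)
    assert x @ Q @ x > 0
```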


Known Inequalities

• Cauchy-Schwarz: given x, y ∈ Rn , xT y ≤ ∥x∥∥y∥.


• Triangle: given x, y ∈ Rn , ∥x + y∥ ≤ ∥x∥ + ∥y∥.
• Arithmetic-geometric mean: given x ∈ R^n_+,

(∑_j x_j)/n ≥ (∏_j x_j)^{1/n}.
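A quick numeric check of the arithmetic-geometric mean inequality (illustrative values only):

```python
import math

x = [1.0, 4.0, 2.0, 8.0]
n = len(x)

arithmetic = sum(x) / n                  # (sum_j x_j) / n
geometric = math.prod(x) ** (1.0 / n)    # (prod_j x_j)^(1/n)

# AM >= GM, with equality iff all entries are equal
assert arithmetic >= geometric
```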


Hyperplanes and Half-spaces


H = {x : ax = ∑_{j=1}^n a_j x_j = b}

H^+ = {x : ax = ∑_{j=1}^n a_j x_j ≤ b}

H^− = {x : ax = ∑_{j=1}^n a_j x_j ≥ b}


Figure 1: Hyperplane and half-spaces: the line 3x + 5y = 15 through (5, 0) and (0, 3) separates the half-spaces 3x + 5y < 15 and 3x + 5y > 15.


System of Linear Equations

Solve for x ∈ Rn from:


a_1 x = b_1
a_2 x = b_2
⋮
a_m x = b_m

which in matrix form reads Ax = b.


Figure 2: System of linear equations: the lines 2x + 3y = 12 and 3x + 2y = 12 intersect at (2.4, 2.4).


Fundamental Theorem of Linear Equations

Theorem 2 Given A ∈ R^{m×n} and b ∈ R^m, the system {x : Ax = b} has a solution if and only if
the system A^T y = 0, b^T y ≠ 0 has no solution.

A vector y, with AT y = 0 and bT y ̸= 0, is called an infeasibility certificate for the system.


Example Let A = (1; −1) and b = (1; 1). Then, y = (1/2; 1/2) is an infeasibility certificate.
Alternative systems: {x : Ax = b} and {y : AT y = 0, bT y ̸= 0}.
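The example above can be checked numerically (a sketch; numpy assumed): A = (1; −1) is a 2×1 matrix, so Ax = b asks for a single scalar x with x = 1 and −x = 1, which is impossible, and y = (1/2; 1/2) certifies this:

```python
import numpy as np

A = np.array([[1.0],
              [-1.0]])          # the column vector (1; -1)
b = np.array([1.0, 1.0])

y = np.array([0.5, 0.5])        # proposed infeasibility certificate

# y is a certificate: A^T y = 0 but b^T y != 0
assert np.allclose(A.T @ y, 0.0)
assert b @ y != 0.0

# Sanity check that Ax = b really has no solution: the least-squares
# residual of min_x ||Ax - b|| is strictly positive
x_ls, residual, _, _ = np.linalg.lstsq(A, b, rcond=None)
assert residual[0] > 0
```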


Figure 3: b is not in the set {Ax : x}, and y is the distance vector from b to the set.


Affine, Convex, Linear and Conic Combinations

When x and y are two distinct points in Rn and α runs over R ,

{z : z = αx + (1 − α)y}

is the line connecting x and y. When 0 ≤ α ≤ 1, it is called the convex combination of x and y and it is
the line segment between x and y.

{z : z = αx + βy},
for multipliers α, β, is the linear combination of x and y, and it is the plane containing the origin,
x, and y. When α ≥ 0, β ≥ 0, it is called the conic combination of x and y.


Convex Sets

• Set notations: x ∈ Ω, y ̸∈ Ω, S ∪ T , S ∩ T
• Ω is said to be a convex set if for every x1 , x2 ∈ Ω and every real number α ∈ [0, 1], the point
αx1 + (1 − α)x2 ∈ Ω.
• The convex hull of a set Ω is the intersection of all convex sets containing Ω
• Intersection of convex sets is convex
• A point in a convex set is an extreme point if and only if it cannot be represented as a convex
combination of two distinct points in the set.

• A set is polyhedral if and only if it has a finite number of extreme points.

Two standard forms for data A ∈ R^{m×n}, b ∈ R^m, c ∈ R^n (with integers m ≤ n):

Equality Form: {x : Ax = b, x ≥ 0}   and   Inequality Form: {y : A^T y ≤ c},

where the former is the intersection of hyperplanes and the nonnegative (up-right) orthant, and the latter is
the intersection of half-spaces.


Proof of convex set

• All solutions to the system of linear equations, {x : Ax = b}, form a convex set.
• All solutions to the system of linear inequalities, {x : Ax ≤ b}, form a convex set.
• All solutions to the system of linear equations and inequalities, {x : Ax = b, x ≥ 0}, form a
convex set.

• Ball is a convex set: given center y ∈ Rn and radius r > 0, B(y, r) = {x : ∥x − y∥ ≤ r}.
• Ellipsoid is a convex set: given center y ∈ Rn and positive definite matrix Q,
E(y, Q) = {x : (x − y)T Q(x − y) ≤ 1}.


More on proof of convex set

Consider the set B of all b, for a fixed A, such that the set, {x : Ax = b, x ≥ 0}, is feasible.

Show that B is a convex set.

Example:
B = {b : {(x1 , x2 ) : x1 + x2 = b, (x1 , x2 ) ≥ 0} is feasible}.


Convex Cones

• A set C is a cone if x ∈ C implies αx ∈ C for all α > 0


• A convex cone is a cone that is also a convex set.
• Dual cone:

C* := {y : y • x ≥ 0 for all x ∈ C}.


Cone Examples

• Example 2.1: The n-dimensional non-negative orthant, R^n_+ = {x ∈ R^n : x ≥ 0}, is a convex cone. The dual cone is itself.

• Example 2.2: The set of all positive semi-definite matrices in S^n, S^n_+, is a convex cone, called the positive semi-definite matrix cone. The dual cone is itself.

• Example 2.3: The set {x ∈ R^n : x_1 ≥ ∥x_{−1}∥}, N^n_2, is a convex cone in R^n called the second-order cone. The dual cone is itself.

• Example 2.4: The set {x ∈ R^n : x_1 ≥ ∥x_{−1}∥_p}, N^n_p, is a convex cone in R^n called the p-order cone with p ≥ 1. The dual cone is the q-order cone with 1/q + 1/p = 1.


Polyhedral Convex Cones

• A cone C is (convex) polyhedral if C can be represented by

C = {x : Ax ≤ 0} or {x : x = Ay, y ≥ 0}

for some matrix A. In the latter case, C is generated by the columns of A.

• The nonnegative orthant is a polyhedral cone but the second-order cone is not polyhedral.


Figure 4: Polyhedral and non-polyhedral cones.


Carathéodory’s theorem

The following theorem states that a polyhedral cone can be generated by a set of basic directional vectors.

Theorem 3 Given matrix A ∈ Rm×n where n > m, let convex polyhedral cone C = {Ax : x ≥ 0}.
For any b ∈ C,

b = ∑_{i=1}^d a_{j_i} x_{j_i},  x_{j_i} ≥ 0, ∀i,

for some linearly independent vectors a_{j_1}, ..., a_{j_d} chosen from a_1, ..., a_n.

Note that we must have d ≤ m.


Real Functions

• Continuous functions C
• Weierstrass theorem: a continuous function f (x) defined on a compact set (bounded and closed)
Ω ⊂ Rn has a minimizer in Ω.
• The least upper bound or supremum of f over Ω

sup{f (x) : x ∈ Ω}

and the greatest lower bound or infimum of f over Ω

inf{f (x) : x ∈ Ω}

• A function f (x) is called homogeneous of degree k if f (αx) = αk f (x) for all α ≥ 0.


Let c ∈ R^n be given and x ∈ int R^n_+. Then c^T x is homogeneous of degree 1 and

φ(x) = n log(c^T x) − ∑_{j=1}^n log x_j

is homogeneous of degree 0.

Let C ∈ S^n be given and X ∈ int S^n_+. Then x^T Cx is homogeneous of degree 2, C • X and
det(X) are homogeneous of degree 1 and n, respectively; and

Φ(X) = n log(C • X) − log det(X)

is homogeneous of degree 0.

• The gradient vector (for f ∈ C^1):

∇f(x) = {∂f/∂x_i}, for i = 1, ..., n.

• The Hessian matrix (for f ∈ C^2):

∇²f(x) = {∂²f/(∂x_i ∂x_j)}, for i = 1, ..., n; j = 1, ..., n.


• Vector function: f = (f1 ; f2 ; ...; fm )


• The Jacobian matrix of f is

∇f(x) = (∇f_1(x); ∇f_2(x); ...; ∇f_m(x)),

i.e., its i-th row is the gradient of f_i.


Convex Functions

• f is a convex function iff for 0 ≤ α ≤ 1,

f (αx + (1 − α)y) ≤ αf (x) + (1 − α)f (y).

• The level set of a convex f is convex:

L(z) = {x : f(x) ≤ z}.

• The convex set {(z; x) : f(x) ≤ z} is called the epigraph of f .


• tf(x/t) is a convex function of (t; x) for t > 0 if f is a convex function of x; moreover, it is homogeneous
of degree 1.


Proof of convex function

Consider the minimal-objective value function of b for fixed A and c:

z(b) := minimize cT x
subject to Ax = b,
x ≥ 0.

Show that z(b) is a convex function in b for all feasible b.


Theorems on functions

Taylor’s theorem or the mean-value theorem:

Theorem 4 Let f ∈ C^1 on a region containing the line segment [x, y]. Then there is an α, 0 ≤ α ≤ 1,
such that

f(y) = f(x) + ∇f(αx + (1 − α)y)(y − x).

Furthermore, if f ∈ C^2 then there is an α, 0 ≤ α ≤ 1, such that

f(y) = f(x) + ∇f(x)(y − x) + (1/2)(y − x)^T ∇²f(αx + (1 − α)y)(y − x).

Theorem 5 Let f ∈ C 1 . Then f is convex over a convex set Ω if and only if


f (y) ≥ f (x) + ∇f (x)(y − x)
for all x, y ∈ Ω.
Theorem 6 Let f ∈ C 2 . Then f is convex over a convex set Ω if and only if the Hessian matrix of f is
positive semi-definite throughout Ω.


Linear least-squares problem

Given A ∈ Rm×n and c ∈ Rn ,

(LS)  minimize  ∥c − A^T y∥²
      subject to  y ∈ R^m.

A closed-form solution:

AA^T y = Ac, i.e., y = (AA^T)^{−1} Ac.

c − A^T y = c − A^T(AA^T)^{−1}Ac = c − P c

Projection matrices: P = A^T(AA^T)^{−1}A (onto the row space of A) and I − P = I − A^T(AA^T)^{−1}A (onto the null space of A).
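A small numerical sketch of (LS) and the projection with numpy (the data A, c below are made up for illustration):

```python
import numpy as np

# A is m x n with full row rank
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
c = np.array([1.0, 2.0, 3.0])

# Normal equations: A A^T y = A c
y = np.linalg.solve(A @ A.T, A @ c)

# Projection of c onto the row space of A, and the residual c - A^T y
P = A.T @ np.linalg.solve(A @ A.T, A)
residual = c - A.T @ y

assert np.allclose(P @ c, A.T @ y)     # P c = A^T y
assert np.allclose(A @ residual, 0.0)  # the residual lies in N(A)
```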


Figure 5: Projection P c of c onto a subspace.


Cholesky decomposition method

AAT = LΛLT

LΛLT y∗ = Ac
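A sketch of solving the normal equations this way with numpy (illustrative data; note that numpy's `cholesky` returns the square-root form AA^T = LL^T rather than the LΛL^T form above, which differs only by a scaling of the columns of L):

```python
import numpy as np

A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
c = np.array([1.0, 2.0, 3.0])

M = A @ A.T                  # symmetric positive definite for full-row-rank A
L = np.linalg.cholesky(M)    # M = L L^T

# Two triangular solves: L z = A c (forward), then L^T y = z (backward)
z = np.linalg.solve(L, A @ c)
y_star = np.linalg.solve(L.T, z)

assert np.allclose(M @ y_star, A @ c)
```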


System of nonlinear equations

Given f (x) : Rn → Rn , the problem is to solve n equations for n unknowns:

f (x) = 0.

Given a point xk , Newton’s Method sets

f (x) ≃ f (xk ) + ∇f (xk )(x − xk ) = 0.

xk+1 = xk − (∇f (xk ))−1 f (xk )


or solve for direction vector dx :

∇f (xk )dx = −f (xk ) and xk+1 = xk + dx .
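A minimal sketch of Newton's method on a made-up 2×2 system (numpy assumed): f_1(x) = x_1² + x_2² − 4 and f_2(x) = x_1 − x_2, whose positive root is x_1 = x_2 = √2:

```python
import numpy as np

def f(x):
    return np.array([x[0] ** 2 + x[1] ** 2 - 4.0,
                     x[0] - x[1]])

def jacobian(x):
    return np.array([[2.0 * x[0], 2.0 * x[1]],
                     [1.0, -1.0]])

x = np.array([1.0, 0.5])                       # starting point x^0
for _ in range(20):
    dx = np.linalg.solve(jacobian(x), -f(x))   # solve grad f(x^k) d_x = -f(x^k)
    x = x + dx                                 # x^{k+1} = x^k + d_x

assert np.allclose(x, [np.sqrt(2.0), np.sqrt(2.0)])
```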


Figure 6: Newton's method for root finding.


The quasi-Newton method

For minimization of an objective function f(x), apply Newton's method to the gradient equation ∇f(x) = 0:

x^{k+1} = x^k − α(∇²f(x^k))^{−1}∇f(x^k),

where the scalar α ≥ 0 is called the step size. More generally,

x^{k+1} = x^k − αM^k ∇f(x^k),

where M^k is an n × n symmetric matrix. In particular, if M^k = I, the method is called the gradient
method.
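A sketch of the gradient method (M^k = I) on a made-up convex quadratic f(x) = ½x^T Qx − b^T x, whose minimizer is x* = Q^{−1}b (numpy assumed; the fixed step size 0.2 is chosen below the stability threshold 2/λ_max(Q)):

```python
import numpy as np

Q = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 1.0])

def grad(x):
    return Q @ x - b            # gradient of 1/2 x^T Q x - b^T x

x = np.zeros(2)
alpha = 0.2                     # fixed step size, M^k = I
for _ in range(200):
    x = x - alpha * grad(x)     # x^{k+1} = x^k - alpha * grad f(x^k)

x_star = np.linalg.solve(Q, b)  # the exact minimizer
assert np.allclose(x, x_star, atol=1e-6)
```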


Convergence and Big O

• {x^k}_{k=0}^∞ denotes a sequence x^0, x^1, x^2, ..., x^k, ....

• xk → x̄ iff
∥xk − x̄∥ → 0

• For g(x) ≥ 0, a real-valued function of a real nonnegative variable, the notation g(x) = O(x) means
that g(x) ≤ c̄x for some constant c̄;

• g(x) = Ω(x) means that g(x) ≥ cx for some constant c;


• g(x) = Θ(x) means that cx ≤ g(x) ≤ c̄x for some constants c and c̄.
• g(x) = o(x) means that g(x) goes to zero faster than x does:

lim_{x→0} g(x)/x = 0
