
MS&E310 Lecture Note #02

Mathematical Preliminaries

Yinyu Ye
Department of Management Science and Engineering
Stanford University
Stanford, CA 94305, U.S.A.

http://www.stanford.edu/~yyye

Chapter 1 and Appendixes A, B.1-B.2, C.1


Real n-Space; Euclidean Space

• R, R+ , int R+
• Rn , Rn+ , int Rn+
• x ≥ y means xj ≥ yj for j = 1, 2, ..., n
• 0: all zero vector; and e: all one vector
• Inner-Product:

x • y := x^T y = ∑_{j=1}^n x_j y_j

• Norm: ∥x∥_2 = √(x^T x), ∥x∥_∞ = max{|x_1|, |x_2|, ..., |x_n|}, ∥x∥_p = (∑_{j=1}^n |x_j|^p)^{1/p}

• The dual of the p-norm, denoted by ∥·∥_*, is the q-norm, where 1/p + 1/q = 1.
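As a quick numerical illustration (a sketch, not part of the original note; numpy is assumed available), the norms above and the p-q duality relation 1/p + 1/q = 1 can be checked on a small made-up vector:

```python
import numpy as np

x = np.array([3.0, -4.0, 1.0])

# The three norms defined above
assert np.isclose(np.linalg.norm(x, 2), np.sqrt(x @ x))
assert np.isclose(np.linalg.norm(x, np.inf), np.abs(x).max())

p = 3.0
assert np.isclose(np.linalg.norm(x, p), (np.abs(x) ** p).sum() ** (1.0 / p))

# Dual norm: q with 1/p + 1/q = 1; Hölder's inequality gives
# x.y <= ||x||_p * ||y||_q
q = p / (p - 1.0)
y = np.array([1.0, 2.0, -2.0])
assert x @ y <= np.linalg.norm(x, p) * np.linalg.norm(y, q)
```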
• Column vector:
x = (x1 ; x2 ; . . . ; xn )


and row vector:


x = (x1 , x2 , . . . , xn )

• A set of vectors a1 , ..., am is said to be linearly dependent if there are scalars λ1 , ..., λm , not all
zero, such that the linear combination

∑_{i=1}^m λ_i a_i = 0

• A linearly independent set of vectors that span Rn is a basis.
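A numerical way to test linear dependence (an illustrative sketch with numpy and made-up vectors) is to stack the vectors as columns and compare the matrix rank with the number of vectors:

```python
import numpy as np

# Three vectors in R^3; a3 = a1 + a2, so the set is linearly dependent
a1 = np.array([1.0, 0.0, 0.0])
a2 = np.array([1.0, 1.0, 0.0])
a3 = np.array([2.0, 1.0, 0.0])

A = np.column_stack([a1, a2, a3])
rank = np.linalg.matrix_rank(A)

# Dependent: rank (2) is less than the number of vectors (3), i.e.,
# some nontrivial combination sums to the zero vector
assert rank < 3
assert np.allclose(1.0 * a1 + 1.0 * a2 - 1.0 * a3, 0.0)
```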


Matrices

• Rm×n , ai. , a.j , aij


• A_I denotes the submatrix of A whose rows belong to I, A_J denotes the submatrix of A whose columns belong to J, and A_IJ the submatrix with rows in I and columns in J.

• 0: all zero matrix, and I : the identity matrix


• N (A), R(A):
Theorem 1 Each linear subspace of R^n is generated by finitely many vectors, and is also the
intersection of finitely many hyperplanes; that is, for each linear subspace L of R^n there are
matrices A and C such that L = N(A) = R(C).
• det(A), tr(A)
• Inner Product:

A • B = tr A^T B = ∑_{i,j} a_{ij} b_{ij}


• The operator norm ∥A∥:

∥A∥_2 := max_{0 ≠ x ∈ R^n} ∥Ax∥_2 / ∥x∥_2

• Sometimes we use X = diag(x)


• Eigenvalues and eigenvectors
Av = λ · v
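A small eigenvalue example (illustrative only; numpy assumed) verifying the defining relation Av = λv for a symmetric 2×2 matrix:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# numpy returns the eigenvalues lam and the eigenvectors as columns of V
lam, V = np.linalg.eig(A)

# Check A v = lambda * v for every eigenpair
for i in range(len(lam)):
    assert np.allclose(A @ V[:, i], lam[i] * V[:, i])
```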


Symmetric Matrices

• Sn
• The Frobenius norm:

∥X∥_f = √(tr X^T X) = √(X • X)

• Positive Definite (PD): Q ≻ 0 iff xT Qx > 0, for all x ̸= 0


• Positive SemiDefinite (PSD): Q ≽ 0 iff xT Qx ≥ 0, for all x
• PSD set: S^n_+, int S^n_+
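One way to test Q ≻ 0 in practice (a sketch with numpy; the example matrix is made up): all eigenvalues must be positive, equivalently a Cholesky factorization exists, and the definition x^T Qx > 0 can be spot-checked on random nonzero vectors:

```python
import numpy as np

Q = np.array([[2.0, -1.0],
              [-1.0, 2.0]])

# Q is PD iff all eigenvalues are positive ...
assert np.all(np.linalg.eigvalsh(Q) > 0)

# ... equivalently, a Cholesky factorization Q = L L^T exists
L = np.linalg.cholesky(Q)
assert np.allclose(L @ L.T, Q)

# Spot-check the definition x^T Q x > 0 on random nonzero vectors
rng = np.random.default_rng(0)
for _ in range(100):
    x = rng.standard_normal(2)
    assert x @ Q @ x > 0
```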


Known Inequalities

• Cauchy-Schwarz: given x, y ∈ Rn , xT y ≤ ∥x∥∥y∥.


• Triangle: given x, y ∈ Rn , ∥x + y∥ ≤ ∥x∥ + ∥y∥.
• Arithmetic-geometric mean: given x ∈ R^n_+,

(∑_j x_j)/n ≥ (∏_j x_j)^{1/n}.
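A quick numeric check of the arithmetic-geometric mean inequality (illustrative values only):

```python
import math

x = [1.0, 4.0, 2.0, 8.0]
n = len(x)

arithmetic = sum(x) / n                  # (sum_j x_j) / n
geometric = math.prod(x) ** (1.0 / n)    # (prod_j x_j)^(1/n)

# AM >= GM, with equality iff all entries are equal
assert arithmetic >= geometric
```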


Hyperplanes and Half-spaces


H = {x : ax = ∑_{j=1}^n a_j x_j = b}

H^+ = {x : ax = ∑_{j=1}^n a_j x_j ≤ b}

H^− = {x : ax = ∑_{j=1}^n a_j x_j ≥ b}


Figure 1: Hyperplane and half-spaces: the line 3x + 5y = 15 through (5, 0) and (0, 3) separates the half-spaces 3x + 5y < 15 and 3x + 5y > 15.


System of Linear Equations

Solve for x ∈ Rn from:


a_1 x = b_1
a_2 x = b_2
⋮
a_m x = b_m

which in matrix form reads Ax = b.


Figure 2: System of linear equations: the lines 2x + 3y = 12 and 3x + 2y = 12 intersect at (2.4, 2.4).


Fundamental Theorem of Linear Equations

Theorem 2 Given A ∈ R^{m×n} and b ∈ R^m, the system {x : Ax = b} has a solution if and only if
the system A^T y = 0, b^T y ≠ 0 has no solution.

A vector y, with AT y = 0 and bT y ̸= 0, is called an infeasibility certificate for the system.


Example Let A = (1; −1) and b = (1; 1). Then, y = (1/2; 1/2) is an infeasibility certificate.
Alternative systems: {x : Ax = b} and {y : AT y = 0, bT y ̸= 0}.
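The example above can be checked numerically (a sketch; numpy assumed): A = (1; −1) is a 2×1 matrix, so Ax = b asks for a single scalar x with x = 1 and −x = 1, which is impossible, and y = (1/2; 1/2) certifies this:

```python
import numpy as np

A = np.array([[1.0],
              [-1.0]])          # the column vector (1; -1)
b = np.array([1.0, 1.0])

y = np.array([0.5, 0.5])        # proposed infeasibility certificate

# y is a certificate: A^T y = 0 but b^T y != 0
assert np.allclose(A.T @ y, 0.0)
assert b @ y != 0.0

# Sanity check that Ax = b really has no solution: the least-squares
# residual of min_x ||Ax - b|| is strictly positive
x_ls, residual, _, _ = np.linalg.lstsq(A, b, rcond=None)
assert residual[0] > 0
```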


Figure 3: b is not in the set {Ax : x}, and y is the distance vector from b to the set.


Affine, Convex, Linear and Conic Combinations

When x and y are two distinct points in Rn and α runs over R ,

{z : z = αx + (1 − α)y}

is the line connecting x and y. When 0 ≤ α ≤ 1, it is called the convex combination of x and y and it is
the line segment between x and y.

{z : z = αx + βy},
for multipliers α, β, is the linear combination of x and y, and it is the plane containing the origin,
x, and y. When α ≥ 0, β ≥ 0, it is called the conic combination of x and y.


Convex Sets

• Set notations: x ∈ Ω, y ̸∈ Ω, S ∪ T , S ∩ T
• Ω is said to be a convex set if for every x1 , x2 ∈ Ω and every real number α ∈ [0, 1], the point
αx1 + (1 − α)x2 ∈ Ω.
• The convex hull of a set Ω is the intersection of all convex sets containing Ω
• Intersection of convex sets is convex
• A point in a convex set is an extreme point if and only if it cannot be represented as a convex
combination of two distinct points in the set.

• A set is polyhedral if and only if it has a finite number of extreme points.

Two standard forms for data A ∈ R^{m×n}, b ∈ R^m, c ∈ R^n (with integers m ≤ n):

Equality Form: {x : Ax = b, x ≥ 0}   and   Inequality Form: {y : A^T y ≤ c},

where the former is the intersection of hyperplanes and the nonnegative (up-right) orthant, and the latter is
the intersection of half-spaces.


Proof of convex set

• All solutions to the system of linear equations, {x : Ax = b}, form a convex set.
• All solutions to the system of linear inequalities, {x : Ax ≤ b}, form a convex set.
• All solutions to the system of linear equations and inequalities, {x : Ax = b, x ≥ 0}, form a
convex set.

• Ball is a convex set: given center y ∈ Rn and radius r > 0, B(y, r) = {x : ∥x − y∥ ≤ r}.
• Ellipsoid is a convex set: given center y ∈ Rn and positive definite matrix Q,
E(y, Q) = {x : (x − y)T Q(x − y) ≤ 1}.


More on proof of convex set

Consider the set B of all b, for a fixed A, such that the set, {x : Ax = b, x ≥ 0}, is feasible.

Show that B is a convex set.

Example:
B = {b : {(x1 , x2 ) : x1 + x2 = b, (x1 , x2 ) ≥ 0} is feasible}.


Convex Cones

• A set C is a cone if x ∈ C implies αx ∈ C for all α > 0


• A convex cone is a cone that is also a convex set.
• Dual cone:

C* := {y : y • x ≥ 0 for all x ∈ C}.


Cone Examples

• Example 2.1: The n-dimensional non-negative orthant, R^n_+ = {x ∈ R^n : x ≥ 0}, is a convex cone. The dual cone is itself.

• Example 2.2: The set of all positive semi-definite matrices in S^n, S^n_+, is a convex cone, called the positive semi-definite matrix cone. The dual cone is itself.

• Example 2.3: The set {x ∈ R^n : x_1 ≥ ∥x_{−1}∥}, N^n_2, is a convex cone in R^n called the second-order cone. The dual cone is itself.

• Example 2.4: The set {x ∈ R^n : x_1 ≥ ∥x_{−1}∥_p}, N^n_p, is a convex cone in R^n called the p-order cone with p ≥ 1. The dual cone is the q-order cone with 1/q + 1/p = 1.


Polyhedral Convex Cones

• A cone C is (convex) polyhedral if C can be represented by

C = {x : Ax ≤ 0} or {x : x = Ay, y ≥ 0}

for some matrix A. In the latter case, C is generated by the columns of A.

• The nonnegative orthant is a polyhedral cone but the second-order cone is not polyhedral.


Figure 4: Polyhedral and non-polyhedral cones.


Carathéodory’s theorem

The following theorem states that a polyhedral cone can be generated by a set of basic directional vectors.

Theorem 3 Given matrix A ∈ Rm×n where n > m, let convex polyhedral cone C = {Ax : x ≥ 0}.
For any b ∈ C,

b = ∑_{i=1}^d a_{j_i} x_{j_i},  x_{j_i} ≥ 0, ∀i,

for some linearly independent vectors a_{j_1}, ..., a_{j_d} chosen from a_1, ..., a_n.

Note that we must have d ≤ m.


Real Functions

• Continuous functions C
• Weierstrass theorem: a continuous function f (x) defined on a compact set (bounded and closed)
Ω ⊂ Rn has a minimizer in Ω.
• The least upper bound or supremum of f over Ω

sup{f (x) : x ∈ Ω}

and the greatest lower bound or infimum of f over Ω

inf{f (x) : x ∈ Ω}

• A function f (x) is called homogeneous of degree k if f (αx) = αk f (x) for all α ≥ 0.


Let c ∈ R^n be given and x ∈ int R^n_+. Then c^T x is homogeneous of degree 1 and

φ(x) = n log(c^T x) − ∑_{j=1}^n log x_j

is homogeneous of degree 0.

Let C ∈ S^n be given and X ∈ int S^n_+. Then x^T Cx is homogeneous of degree 2, C • X and
det(X) are homogeneous of degree 1 and n, respectively; and

Φ(X) = n log(C • X) − log det(X)

is homogeneous of degree 0.

• The gradient vector (for f ∈ C^1):

∇f(x) = {∂f/∂x_i}, for i = 1, ..., n.

• The Hessian matrix (for f ∈ C^2):

∇²f(x) = {∂²f/(∂x_i ∂x_j)}, for i = 1, ..., n; j = 1, ..., n.


• Vector function: f = (f1 ; f2 ; ...; fm )


• The Jacobian matrix of f is

∇f(x) = (∇f_1(x); ∇f_2(x); ...; ∇f_m(x)),

i.e., its i-th row is the gradient of f_i.


Convex Functions

• f is a convex function iff for 0 ≤ α ≤ 1,

f (αx + (1 − α)y) ≤ αf (x) + (1 − α)f (y).

• The level set of a convex f is convex:

L(z) = {x : f(x) ≤ z}.

• The convex set {(z; x) : f(x) ≤ z} is called the epigraph of f .


• tf(x/t) is a convex function of (t; x) for t > 0 if f is a convex function of x; moreover, it is homogeneous
of degree 1.


Proof of convex function

Consider the minimal-objective value function of b for fixed A and c:

z(b) := minimize cT x
subject to Ax = b,
x ≥ 0.

Show that z(b) is a convex function in b for all feasible b.


Theorems on functions

Taylor’s theorem or the mean-value theorem:

Theorem 4 Let f ∈ C^1 on a region containing the line segment [x, y]. Then there is an α, 0 ≤ α ≤ 1,
such that

f(y) = f(x) + ∇f(αx + (1 − α)y)(y − x).

Furthermore, if f ∈ C^2 then there is an α, 0 ≤ α ≤ 1, such that

f(y) = f(x) + ∇f(x)(y − x) + (1/2)(y − x)^T ∇²f(αx + (1 − α)y)(y − x).

Theorem 5 Let f ∈ C 1 . Then f is convex over a convex set Ω if and only if


f (y) ≥ f (x) + ∇f (x)(y − x)
for all x, y ∈ Ω.
Theorem 6 Let f ∈ C 2 . Then f is convex over a convex set Ω if and only if the Hessian matrix of f is
positive semi-definite throughout Ω.


Linear least-squares problem

Given A ∈ Rm×n and c ∈ Rn ,

(LS)  minimize  ∥c − A^T y∥²
      subject to  y ∈ R^m.

A closed-form solution:

AA^T y = Ac, i.e., y = (AA^T)^{−1} Ac.

c − A^T y = c − A^T(AA^T)^{−1}Ac = c − P c

Projection matrices: P = A^T(AA^T)^{−1}A (onto the row space of A) and I − P = I − A^T(AA^T)^{−1}A (onto the null space of A).
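A small numerical sketch of (LS) and the projection with numpy (the data A, c below are made up for illustration):

```python
import numpy as np

# A is m x n with full row rank
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
c = np.array([1.0, 2.0, 3.0])

# Normal equations: A A^T y = A c
y = np.linalg.solve(A @ A.T, A @ c)

# Projection of c onto the row space of A, and the residual c - A^T y
P = A.T @ np.linalg.solve(A @ A.T, A)
residual = c - A.T @ y

assert np.allclose(P @ c, A.T @ y)     # P c = A^T y
assert np.allclose(A @ residual, 0.0)  # the residual lies in N(A)
```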


Figure 5: Projection P c of c onto a subspace.


Cholesky decomposition method

AAT = LΛLT

LΛLT y∗ = Ac
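A sketch of solving the normal equations this way with numpy (illustrative data; note that numpy's `cholesky` returns the square-root form AA^T = LL^T rather than the LΛL^T form above, which differs only by a scaling of the columns of L):

```python
import numpy as np

A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
c = np.array([1.0, 2.0, 3.0])

M = A @ A.T                  # symmetric positive definite for full-row-rank A
L = np.linalg.cholesky(M)    # M = L L^T

# Two triangular solves: L z = A c (forward), then L^T y = z (backward)
z = np.linalg.solve(L, A @ c)
y_star = np.linalg.solve(L.T, z)

assert np.allclose(M @ y_star, A @ c)
```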


System of nonlinear equations

Given f (x) : Rn → Rn , the problem is to solve n equations for n unknowns:

f (x) = 0.

Given a point xk , Newton’s Method sets

f (x) ≃ f (xk ) + ∇f (xk )(x − xk ) = 0.

xk+1 = xk − (∇f (xk ))−1 f (xk )


or solve for direction vector dx :

∇f (xk )dx = −f (xk ) and xk+1 = xk + dx .
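A minimal sketch of Newton's method on a made-up 2×2 system (numpy assumed): f_1(x) = x_1² + x_2² − 4 and f_2(x) = x_1 − x_2, whose positive root is x_1 = x_2 = √2:

```python
import numpy as np

def f(x):
    return np.array([x[0] ** 2 + x[1] ** 2 - 4.0,
                     x[0] - x[1]])

def jacobian(x):
    return np.array([[2.0 * x[0], 2.0 * x[1]],
                     [1.0, -1.0]])

x = np.array([1.0, 0.5])                       # starting point x^0
for _ in range(20):
    dx = np.linalg.solve(jacobian(x), -f(x))   # solve grad f(x^k) d_x = -f(x^k)
    x = x + dx                                 # x^{k+1} = x^k + d_x

assert np.allclose(x, [np.sqrt(2.0), np.sqrt(2.0)])
```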


Figure 6: Newton's method for root finding.


The quasi-Newton method

For minimization of an objective function f(x), apply Newton's method to the gradient equation ∇f(x) = 0:

x^{k+1} = x^k − α(∇²f(x^k))^{−1}∇f(x^k),

where the scalar α ≥ 0 is called the step size. More generally,

x^{k+1} = x^k − αM^k ∇f(x^k),

where M^k is an n × n symmetric matrix. In particular, if M^k = I, the method is called the gradient
method.
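A sketch of the gradient method (M^k = I) on a made-up convex quadratic f(x) = ½x^T Qx − b^T x, whose minimizer is x* = Q^{−1}b (numpy assumed; the fixed step size 0.2 is chosen below the stability threshold 2/λ_max(Q)):

```python
import numpy as np

Q = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 1.0])

def grad(x):
    return Q @ x - b            # gradient of 1/2 x^T Q x - b^T x

x = np.zeros(2)
alpha = 0.2                     # fixed step size, M^k = I
for _ in range(200):
    x = x - alpha * grad(x)     # x^{k+1} = x^k - alpha * grad f(x^k)

x_star = np.linalg.solve(Q, b)  # the exact minimizer
assert np.allclose(x, x_star, atol=1e-6)
```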


Convergence and Big O

• {x^k}_{k=0}^∞ denotes a sequence x^0, x^1, x^2, ..., x^k, ....

• xk → x̄ iff
∥xk − x̄∥ → 0

• For g(x) ≥ 0, a real-valued function of a real nonnegative variable, the notation g(x) = O(x) means
that g(x) ≤ c̄x for some constant c̄;

• g(x) = Ω(x) means that g(x) ≥ cx for some constant c;


• g(x) = Θ(x) means that cx ≤ g(x) ≤ c̄x for some constants c and c̄.
• g(x) = o(x) means that g(x) goes to zero faster than x does:

lim_{x→0} g(x)/x = 0
