Yangjing Zhang
15-Aug-2023
NUS
Contacts
About
Assessment
Today’s content
• Introduction
1. Motivating examples
2. Graphical illustration
3. General nonlinear programming
4. Basic calculus and linear algebra
5. Necessary/Sufficient optimality conditions
6. Convex sets/functions
• Principal component analysis
1. Recap linear algebra (eigenvalues/eigenvectors) and statistics
(mean/covariance)
2. PCA
Simple examples
Example 1
If K units of capital and L units of labor are used, a company can
produce KL units of a product. Capital can be purchased at $4 per
unit and labor can be purchased at $1 per unit. A total of $8000 is
available to purchase capital and labor. How can the firm maximize the
quantity of the product manufactured?
maximize KL
subject to 4K + L ≤ 8000, K ≥ 0, L ≥ 0.
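At the optimum the budget constraint is active: substituting L = 8000 − 4K gives K(8000 − 4K), which is maximized at K = 1000, L = 4000 (product 4,000,000). A minimal numerical sketch (fmincon requires the Optimization Toolbox and minimizes, so we negate the objective):

obj = @(x) -x(1)*x(2);      % x = [K; L]; negate because fmincon minimizes
A = [4 1]; b = 8000;        % budget: 4K + L <= 8000
lb = [0; 0];                % K >= 0, L >= 0
x = fmincon(obj, [1; 1], A, b, [], [], lb, []);
% x is approximately [1000; 4000]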
Simple examples
Example 2
It costs a company $c to produce a unit of a product. If the company
charges $p per unit, and the customers demand D(p) units, what price
should the company charge to maximize its profit?
maximize (p − c)D(p)
subject to p ≥ 0.
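As a concrete instance (the demand model is an assumption for illustration, not from the slides): with linear demand D(p) = a − bp, a, b > 0, the profit is (p − c)(a − bp), whose derivative a + bc − 2bp vanishes at the optimal price p∗ = (a + bc)/(2b) = a/(2b) + c/2, provided a > bc so that p∗ > c.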
Linear regression
[Figure: scatter plot of Price (×10⁶) versus Area, from the Kaggle Housing Prices dataset: https://www.kaggle.com/datasets/yasserh/housing-prices-dataset]
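Viewed as an unconstrained optimization problem, linear regression chooses coefficients a, b to minimize the sum of squared residuals Σi (pricei − a·areai − b)². A minimal least-squares sketch (area and price are assumed n-by-1 columns from the dataset above):

A = [area, ones(size(area))];   % design matrix with an intercept column
coef = A \ price;               % backslash solves min ||A*coef - price||^2
a = coef(1); b = coef(2);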
Optimal transportation (OT)
Example: OT
s supplies a1, …, as of goods must be transported to meet t demands b1, …, bt of customers. The cost of transporting one unit from supply i to customer j is cij. What is the optimal transportation plan (xij units transported from supply i to customer j) that minimizes the total cost?

Cost: Σij cij xij

Constraints: Σi xij = bj for each j (demands); Σj xij = ai for each i (supplies); xij ≥ 0.
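Since the objective and constraints are all linear, OT is a linear program. A minimal sketch on a toy instance (the data values are made up; linprog requires the Optimization Toolbox):

a = [5; 7];                       % supplies (sum equals sum of demands)
b = [4; 5; 3];                    % demands
C = [1 2 3; 4 1 2];               % C(i,j): cost from supply i to customer j
[s, t] = size(C);
f = reshape(C', [], 1);           % stack x row by row: (x11, x12, ..., x23)
Aeq = [kron(eye(s), ones(1, t));  % row sums = supplies
       kron(ones(1, s), eye(t))]; % column sums = demands
x = linprog(f, [], [], Aeq, [a; b], zeros(s*t, 1));
Xplan = reshape(x, t, s)'         % optimal s-by-t transportation plan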
Graphical illustration in 1-dimension

maximize f(x) = x√(24 − x)

[Figure: plot of f(x) for 0 ≤ x ≤ 30]

x = 16 achieves the maximal value 16√(24 − 16) = 32√2 ≈ 45.25. (Indeed, f′(x) = √(24 − x) − x/(2√(24 − x)) = 0 gives 2(24 − x) = x, i.e., x = 16.)
Graphical illustration in 2-dimension
• Note that (x1 − 4)² + (x2 − 6)² measures the squared distance between the two points (x1, x2)ᵀ and (4, 6)ᵀ
• Graphically, we want to find the feasible point (x1, x2)ᵀ with the shortest distance from (4, 6)ᵀ
minimize (x1 − 4)² + (x2 − 6)²
subject to x1 ≤ 4, x2 ≤ 6, 3x1 + 2x2 ≤ 18, x1 ≥ 0, x2 ≥ 0.

[Figure: feasible region and the optimal point in the (x1, x2)-plane]

(x1, x2) = (34/13, 66/13)ᵀ achieves the minimal value (34/13 − 4)² + (66/13 − 6)² = 36/13.
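Expanding the objective, (x1 − 4)² + (x2 − 6)² = x1² + x2² − 8x1 − 12x2 + 52, so after dropping the constant this is a quadratic program. A minimal sketch (quadprog requires the Optimization Toolbox and minimizes (1/2)xᵀHx + fᵀx):

H = 2*eye(2);
f = [-8; -12];
A = [1 0; 0 1; 3 2];              % x1 <= 4, x2 <= 6, 3x1 + 2x2 <= 18
b = [4; 6; 18];
x = quadprog(H, f, A, b, [], [], zeros(2, 1));
% x is approximately [34/13; 66/13] = [2.6154; 5.0769]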
General nonlinear programming
Nonlinear programming
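In general, a nonlinear program (NLP) takes the form

minimize f(x)
subject to gi(x) ≤ 0, i = 1, …, m,
           hj(x) = 0, j = 1, …, p,

where x ∈ Rⁿ and at least one of f, gi, hj is nonlinear. All the examples above fit this template (OT, whose objective and constraints are all linear, is the special case of a linear program).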
Terminology and notation
Local/global minimizer
[Figure: a univariate function on [0, 3.5] with its global minimizer and a local minimizer marked]

• A local/global minimizer may not be unique
• A global minimizer is a local minimizer. However, the converse is not true in general
Local/global minimizer
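Formally, x∗ is a local minimizer of f if there exists ε > 0 such that f(x∗) ≤ f(x) for all x with ‖x − x∗‖ < ε, and a global minimizer if f(x∗) ≤ f(x) for all x in the domain of f.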
Basic calculus and linear algebra
Interior point
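A point x of a set S ⊆ Rⁿ is an interior point of S if there exists ε > 0 such that the ball {y ∈ Rⁿ : ‖y − x‖ < ε} is contained in S.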
Hessian matrix
• The (i, j)-entry of Hf(x) is ∂²f/∂xi∂xj (x). In general, Hf(x) is not symmetric. However, if f has continuous second-order derivatives, then the Hessian matrix is symmetric.
Example
For instance, take f(x) = x1³ + 2x1x2 + x2², x = (x1; x2) ∈ R². Then

Hf(x) = [∂²f/∂x1∂x1(x), ∂²f/∂x1∂x2(x); ∂²f/∂x2∂x1(x), ∂²f/∂x2∂x2(x)] = [6x1, 2; 2, 2]
Positive (semi)definite matrices
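A symmetric matrix A ∈ Rⁿˣⁿ is positive semidefinite if xᵀAx ≥ 0 for all x ∈ Rⁿ, and positive definite if xᵀAx > 0 for all x ≠ 0. Equivalently, A is positive semidefinite (respectively, positive definite) if and only if all of its eigenvalues are nonnegative (respectively, positive) — the test used in the examples below.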
Necessary/Sufficient optimality conditions
Unconstrained NLP
minimize f(x)
Necessary/sufficient condition

Necessary condition If x∗ is a local minimizer of f, then ∇f(x∗) = 0 (i.e., x∗ is a stationary point) and Hf(x∗) is positive semidefinite.

• This allows us to confine our search for global minimizers within the set of stationary points.

Sufficient condition If
(1) x∗ is a stationary point, i.e., ∇f(x∗) = 0, and
(2) Hf(x∗) is positive definite,
then x∗ is a local minimizer of f.
Example
Let f(x) = x1² + x2² − 2x2, x = (x1; x2) ∈ R².
(1) Find a stationary point of f.
(2) Verify that the stationary point is a local minimizer (by checking the sufficient condition).

Solution. We let

∇f(x∗) = [∂f/∂x1(x∗); ∂f/∂x2(x∗)] = [2x1∗; 2x2∗ − 2] = 0 ⇒ x∗ = [0; 1]

Thus x∗ = (0, 1)ᵀ is a stationary point of f. Next, we verify that

Hf(x∗) = [∂²f/∂x1∂x1(x∗), ∂²f/∂x1∂x2(x∗); ∂²f/∂x2∂x1(x∗), ∂²f/∂x2∂x2(x∗)] = [2, 0; 0, 2]

is positive definite (since the eigenvalues of Hf(x∗) are 2, 2, which are positive).
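A quick numerical cross-check of this example (a sketch; fminunc requires the Optimization Toolbox):

f = @(x) x(1)^2 + x(2)^2 - 2*x(2);
[xstar, fval] = fminunc(f, randn(2, 1));
% xstar is approximately [0; 1], fval approximately -1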
Example
Let f(x) = x1²x2 + x1x2³ − x1x2, x = (x1; x2) ∈ R².
(1) Verify that x = (0; 1) is a stationary point of f.
(2) Is x = (0; 1) a local minimizer?

Solution.

∇f(x) = [2x1x2 + x2³ − x2; x1² + 3x1x2² − x1] ⇒ ∇f(0; 1) = [0; 0]

Thus x = (0; 1) is a stationary point of f. Next,

Hf(x) = [2x2, 2x1 + 3x2² − 1; 2x1 + 3x2² − 1, 6x1x2] ⇒ Hf(0; 1) = [2, 2; 2, 0]

is not positive semidefinite (since the eigenvalues of Hf(0; 1) are 1 + √5 > 0 and 1 − √5 < 0). The necessary condition fails, thus x = (0; 1) is not a local minimizer (it is in fact a saddle point).
Convex sets/functions
Convex sets
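A set C ⊆ Rⁿ is convex if λx + (1 − λ)y ∈ C for all x, y ∈ C and all λ ∈ [0, 1]; geometrically, C contains the line segment between any two of its points.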
(Strictly) convex functions
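A function f is convex if f(λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y) for all x, y and all λ ∈ [0, 1], and strictly convex if the inequality is strict whenever x ≠ y and λ ∈ (0, 1); f is (strictly) concave if −f is (strictly) convex.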
Geometrical interpretation
For a convex function f, the line segment joining (x, f(x)) and (y, f(y)) lies above the graph of f on [x, y]. For a concave function f, the line segment joining (x, f(x)) and (y, f(y)) lies below the graph of f on [x, y].
Benefit of convexity
minimize f(x)
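When f is convex, every local minimizer of f is a global minimizer; moreover, every stationary point (∇f(x) = 0) is a global minimizer, so the stationarity condition alone certifies global optimality.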
Test for convexity
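Standard second-order test: if f is twice continuously differentiable, then f is convex if and only if Hf(x) is positive semidefinite for all x; if Hf(x) is positive definite for all x, then f is strictly convex. For a quadratic f the Hessian is a constant matrix, so the test reduces to a single eigenvalue check. A minimal sketch (the quadratic below is an illustrative choice, not from the slides):

Q = [2 2; 2 6];     % Hessian of f(x) = x1^2 + 2*x1*x2 + 3*x2^2
lambda = eig(Q);    % eigenvalues 4 - 2*sqrt(2) and 4 + 2*sqrt(2), both positive
all(lambda >= 0)    % returns logical 1: f is convex (in fact strictly convex)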
Preliminaries
Notation
Eigenvalues/eigenvectors
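Recall: λ is an eigenvalue of A ∈ Rⁿˣⁿ with eigenvector v ≠ 0 if Av = λv. A symmetric matrix has real eigenvalues and an orthonormal set of eigenvectors, and hence admits the decomposition A = QΛQᵀ with Q orthogonal (columns: eigenvectors) and Λ diagonal (diagonal entries: eigenvalues).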
Eigenvalues/eigenvectors: An example
" #
1.5 0.5
Consider A = .
0.5 1.5
lecture1 39/59
• Eigenvalue decomposition:

A = [−0.7071, 0.7071; 0.7071, 0.7071] [1, 0; 0, 2] [−0.7071, 0.7071; 0.7071, 0.7071]ᵀ

where the first column [−0.7071; 0.7071] is eigenvector 1 (eigenvalue 1) and the second column [0.7071; 0.7071] is eigenvector 2 (eigenvalue 2).
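The decomposition can be reproduced numerically (a sketch; eigenvector signs and the ascending eigenvalue order returned by MATLAB may differ from the display above):

A = [1.5 0.5; 0.5 1.5];
[V, D] = eig(A)    % columns of V: eigenvectors; D = diag(1, 2)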
Mean/covariance

• Mean vector:

µ = x̄ = (1/n) Σᵢ₌₁ⁿ xi.

• (Sample/Empirical) Covariance matrix:

Σ = (1/(n − 1)) Σᵢ₌₁ⁿ (xi − µ)(xi − µ)ᵀ.
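MATLAB's built-ins implement exactly these formulas (a sketch; X is an assumed n-by-p data matrix with one sample per row):

mu = mean(X);      % 1-by-p sample mean
Sigma = cov(X);    % p-by-p covariance, with the 1/(n-1) normalization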
Mean/covariance: An example
" # " #
1 2
Mean µ = , covariance Σ = . We generate 1000 observations
4 3
of a Gaussian random variable
xi ∼ N (µ, Σ), i = 1, . . . , 1000.
mu = [1;4];
Sigma = [2 0; 0 3];
for i = 1:1000
xi = mvnrnd ( mu , Sigma ) ;
plot ( xi (1) , xi (2) , 'k . ') ;
hold on ;
end
lecture1 42/59
Mean/covariance: An example
" # " # " # " #
1 2 2 1 2 −2
µ= Σ= µ= Σ=
4 2 3 4 −2 3
Can you find a vector that would approximate the data? Blue one!
lecture1 45/59
PCA: Intuition
" # " #
1 2 2
µ= Σ= Consider eigenvectors of covariance matrix Σ
4 2 3
. λ = 0.4382 " #
−0.7882
eigenvector 1 =
0.6154
. λ = 4.5616 " #
0.6154
eigenvector 2 =
0.7882
. eigenvector 2 accounts for most
of the data
lecture1 46/59
Principal component analysis
PCA

Example 1
p = 2 variables on n = 8 samples:
(1, 2), (3, 3), (3, 5), (5, 4), (5, 6), …

[Figure: scatter plot of the 8 samples]
PCA: step-by-step explanation
PCA: step-by-step explanation

[Figure: the data before (left) and after (right) standardization]
PCA: step-by-step explanation

Standardize the data and compute the covariance matrix of the standardized data (i.e., the correlation matrix), which yields

Sigma =
    1.0000    0.9087
    0.9087    1.0000
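A sketch of the computation that produces this matrix (the variable X, holding the 8-by-2 data matrix, is an assumed name; zscore requires the Statistics and Machine Learning Toolbox):

Xs = zscore(X);    % standardize: zero mean, unit standard deviation per column
Sigma = cov(Xs)    % covariance of standardized data = correlation matrix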
PCA: step-by-step explanation

Compute the eigenvalue decomposition of Sigma, which yields eigenvalues 1.9087 and 0.0913 with eigenvectors [0.7071; 0.7071] and [−0.7071; 0.7071], respectively.

[Figure: standardized data with the two principal directions overlaid]
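The principal directions can be computed by an eigenvalue decomposition, sorted by decreasing eigenvalue (a sketch, continuing with the variable names above):

[V, D] = eig(Sigma);
[lambda, idx] = sort(diag(D), 'descend');   % lambda = [1.9087; 0.0913]
Q = V(:, idx);                              % columns of Q: principal directions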
PCA: step-by-step explanation

Recast the data along the principal directions:

Xnew = XQ

where the columns of Q are the eigenvectors of Sigma, ordered by decreasing eigenvalue.
PCA: step-by-step explanation

Figure 8: Recast the data along the first principal direction. Discard the second component and the dimension is reduced by 1.
Standard PCA workflow
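Putting the steps together, a minimal end-to-end sketch (X is an assumed n-by-p data matrix with one sample per row; k is the number of components to keep):

Xs = zscore(X);                             % 1. standardize the data
Sigma = cov(Xs);                            % 2. covariance (correlation) matrix
[V, D] = eig(Sigma);                        % 3. eigenvalue decomposition
[lambda, idx] = sort(diag(D), 'descend');   % 4. sort by decreasing eigenvalue
Q = V(:, idx);                              % principal directions
Xnew = Xs * Q;                              % 5. recast the data
k = 1;
Xred = Xnew(:, 1:k);                        % 6. keep the first k components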
Application: Iris flower data set
[Figure 9 and Figure 10: the Iris data projected onto the first two principal components; principal component 1 explains 72.8% and principal component 2 explains 23.0% of the variance]

Programming Exercise
Do PCA on the Iris flower data set and plot Figure 9 and Figure 10.
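A possible starting point (a sketch; fisheriris ships with MATLAB, and pca/zscore/gscatter require the Statistics and Machine Learning Toolbox):

load fisheriris                         % meas: 150-by-4 measurements; species: labels
[Q, Xnew, lambda] = pca(zscore(meas));  % principal directions, scores, variances
explained = 100 * lambda / sum(lambda);
gscatter(Xnew(:,1), Xnew(:,2), species);
xlabel(sprintf('Principal component 1 (%.1f%%)', explained(1)));
ylabel(sprintf('Principal component 2 (%.1f%%)', explained(2)));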
References
K.-C. Toh.
MA5268: Theory and algorithms for nonlinear optimization.