Chapter 3: Basics of mathematical optimization
$$(\text{UNLP}) \quad \min_{w \in \mathbb{R}^n} F(w),$$
where
$F(w)$ - objective (loss) function,
$w$ - decision (optimization) variable (model parameter).
Here, UNLP is used to refer to the unconstrained optimization problem.
A point $w^* \in \mathbb{R}^n$ is a global minimum point of $F$ if
$$F(w) \ge F(w^*), \quad \text{for any } w \in \mathbb{R}^n.$$
The function $F(w) = w_1 w_2\, e^{-w_1^2 - w_2^2}$ has two local minimum points, at $\left(-\frac{\sqrt{2}}{2}, \frac{\sqrt{2}}{2}\right)$ and $\left(\frac{\sqrt{2}}{2}, -\frac{\sqrt{2}}{2}\right)$.
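As a quick numerical check (a sketch, not part of the original slides), these minima can be located with scipy.optimize.minimize; the starting points below are chosen near each minimizer and are otherwise arbitrary.

import numpy as np
from scipy.optimize import minimize

# F(w) = w1*w2*exp(-w1^2 - w2^2)
F = lambda w: w[0] * w[1] * np.exp(-w[0]**2 - w[1]**2)

# Start near each suspected minimizer; BFGS converges to the nearby one
for w0 in ([-1.0, 1.0], [1.0, -1.0]):
    res = minimize(F, w0, method="BFGS")
    print(res.x, res.fun)   # approx. (-0.7071, 0.7071) and (0.7071, -0.7071), F = -1/(2e)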
3.3. Global and Local Minimum Point...
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm

# Evaluate F(w) = w1*w2*exp(-w1^2 - w2^2) on a 100x100 grid over [-2, 2]^2
X = np.linspace(-2, 2, 100)
Y = X.copy()
X, Y = np.meshgrid(X, Y)
Z = X*Y*np.exp(-X*X-Y*Y)

fig = plt.figure()
ax = fig.add_subplot(111)
# Reversed Greys colourmap for filled contours
cpf = ax.contourf(X, Y, Z, 20, cmap=cm.Greys_r)
# Set the colours of the contours and labels so they're white where the
# contour fill is dark (Z < 0) and black where it's light (Z >= 0)
colours = ['w' if level < 0 else 'k' for level in cpf.levels]
cp = ax.contour(X, Y, Z, 20, colors=colours)
ax.clabel(cp, fontsize=12, colors=colours)
plt.show()
Figure: Contour plot for $F(w) = w_1 w_2\, e^{-w_1^2 - w_2^2}$.
3.3. Global and Local Minimum Point...
Now, we can find a sufficiently small $\lambda \in [0, 1]$ so that $w^* + \lambda(w - w^*)$ is in the neighborhood of (i.e., near to) $w^*$. Hence, by local minimality and convexity,
$$F(w^*) \le F(w^* + \lambda(w - w^*)) = F((1-\lambda)w^* + \lambda w) \le (1-\lambda)F(w^*) + \lambda F(w) < (1-\lambda)F(w^*) + \lambda F(w^*) = F(w^*),$$
where the strict inequality uses the assumption $F(w) < F(w^*)$. This implies that $F(w^*) < F(w^*)$, which is a contradiction. Hence, the assumption is false and $w^*$ is a global minimum point.
3.4. Optimality Criteria
Q: (i) How do we know that a given point is a local minimizer for a loss
function? (ii) How do we find a minimum point?
Note that:
$w^*$ is a local minimum point of $F(w)$ means that there is a neighborhood $N(w^*)$ such that
$$F(w^*) \le F(w), \quad \text{for any } w \in N(w^*).$$
In particular, for every direction $d \in \mathbb{R}^n$ and all sufficiently small $\alpha > 0$,
$$F(w^*) \le F(w^* + \alpha d).$$
Hence, at $w^*$ there can be no descent direction, i.e., no direction $d$ with
$$\nabla F(w^*)^\top d < 0.$$
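A minimal sketch of how descent directions are used in practice (an illustration, not from the slides): the choice $d = -\nabla F(w)$ always satisfies $\nabla F(w)^\top d < 0$ away from stationary points, and repeating small steps along it gives gradient descent. The step size 0.5 is an ad-hoc choice.

import numpy as np

def F(w):
    return w[0] * w[1] * np.exp(-w[0]**2 - w[1]**2)

def gradF(w):
    e = np.exp(-w[0]**2 - w[1]**2)
    return np.array([w[1] * (1 - 2 * w[0]**2) * e,
                     w[0] * (1 - 2 * w[1]**2) * e])

w = np.array([-1.5, 0.5])      # arbitrary starting point
for _ in range(100):
    d = -gradF(w)              # descent direction: gradF(w) @ d = -||gradF(w)||^2 < 0
    w = w + 0.5 * d            # a small step along d decreases F
print(w)                       # approaches the local minimizer (-sqrt(2)/2, sqrt(2)/2)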
Example
For the function $F(w) = w_1 w_2\, e^{-w_1^2 - w_2^2}$, at the points $\left(\frac{\sqrt{2}}{2}, \frac{\sqrt{2}}{2}\right)$ and $\left(-\frac{\sqrt{2}}{2}, -\frac{\sqrt{2}}{2}\right)$ we have $\nabla F(w) = 0$, but neither of them is a local minimum point (they are local maximum points; the origin $(0,0)$ is a stationary saddle point as well).
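To classify such stationary points one can apply the second-order test (a sketch, not from the slides): the Hessian of $F$ is positive definite at a local minimum, negative definite at a maximum, and indefinite at a saddle point.

import numpy as np

s = np.sqrt(2) / 2

def hessF(w):
    # Analytic Hessian of F(w) = w1*w2*exp(-w1^2 - w2^2)
    x, y = w
    g = np.exp(-x**2 - y**2)
    fxx = y * g * (4 * x**3 - 6 * x)
    fyy = x * g * (4 * y**3 - 6 * y)
    fxy = g * (1 - 2 * x**2) * (1 - 2 * y**2)
    return np.array([[fxx, fxy], [fxy, fyy]])

for p in ([s, s], [-s, -s], [0.0, 0.0], [-s, s]):
    print(p, np.linalg.eigvalsh(hessF(p)))
# (s, s), (-s, -s): both eigenvalues negative -> local maxima
# (0, 0): eigenvalues -1 and 1                -> saddle point
# (-s, s): both eigenvalues positive          -> local minimum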
Example
Solve
$$\nabla F(w) = \begin{pmatrix} w_1 - 2 \\ w_2 \end{pmatrix} = 0,$$
which gives the stationary point $w^* = (2, 0)^\top$.
Setting the gradient to zero,
$$\nabla F(w) = -A^\top (y - Aw) + \gamma w = 0.$$
This is equivalent to
$$\left[A^\top A + \gamma I_{m+1}\right] w = A^\top y.$$
Here, for a given $\gamma > 0$, the matrix $D := A^\top A + \gamma I_{m+1}$ and the vector $b = A^\top y$ are known, since they are defined through the dataset.
Next, solve the system of linear equations $Dw = b$ to determine $w$.
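In NumPy this is a single call to a linear solver; a minimal sketch (here A, y, and gamma are placeholders):

import numpy as np

def ridge_weights(A, y, gamma):
    n = A.shape[1]                   # number of parameters, m + 1
    D = A.T @ A + gamma * np.eye(n)
    b = A.T @ y
    return np.linalg.solve(D, b)     # solve D w = b; avoids forming D^{-1} explicitly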
3.5. Optimality ... Linear Regression Models...
Example: Given the average monthly temperature in Germany from May 2019 to May 2020 (from: https://www.statista.com/statistics/982472/average-monthly-temperature-germany/):

Month       May  Jun  Jul  Aug  Sep  Oct  Nov  Dec  Jan  Feb  Mar  Apr  May
Temp. (°C) 10.9 19.8 18.9 19.0 14.1 10.9  5.2  3.7  3.3  5.3  5.3 10.5 11.9

find a reasonable prediction for the average temperature in June 2020.
Solution:
1. Identify the type of function class to fit to the data. For this, visualize the data.
Clearly, a function $f(x, w)$ that is linear with respect to $x$ (a straight line) will not be a good fit.
2. For instance, we may fit a polynomial
$$f(x, w) = w_5 x^5 + w_4 x^4 + w_3 x^3 + w_2 x^2 + w_1 x + w_0.$$
3.5. Optimality ... Linear Regression Models...
3. Dataset $\{(x_j, y_j) \mid j = 1, \ldots, 13\}$:
$\{(1, 10.9), (2, 19.8), (3, 18.9), (4, 19.0), (5, 14.1), (6, 10.9), (7, 5.2), (8, 3.7), (9, 3.3), (10, 5.3), (11, 5.3), (12, 10.5), (13, 11.9)\}$
The matrix $A$ has rows $(x_j^5, x_j^4, x_j^3, x_j^2, x_j, 1)$:
$$A = \begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 \\
32 & 16 & 8 & 4 & 2 & 1 \\
243 & 81 & 27 & 9 & 3 & 1 \\
1024 & 256 & 64 & 16 & 4 & 1 \\
3125 & 625 & 125 & 25 & 5 & 1 \\
7776 & 1296 & 216 & 36 & 6 & 1 \\
16807 & 2401 & 343 & 49 & 7 & 1 \\
32768 & 4096 & 512 & 64 & 8 & 1 \\
59049 & 6561 & 729 & 81 & 9 & 1 \\
100000 & 10000 & 1000 & 100 & 10 & 1 \\
161051 & 14641 & 1331 & 121 & 11 & 1 \\
248832 & 20736 & 1728 & 144 & 12 & 1 \\
371293 & 28561 & 2197 & 169 & 13 & 1
\end{bmatrix}$$
The Hessian matrix of $F$ is
$$H(w) = A^\top A + \gamma I,$$
which is positive definite for $\gamma > 0$; hence the solution of $Dw = b$ is the unique minimizer.
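Putting the example together in NumPy (a sketch; the value gamma = 0.1 is an arbitrary illustration, the slides do not fix it): build $A$, solve the regularized normal equations, and evaluate the fitted polynomial at $x = 14$, i.e., June 2020.

import numpy as np

y = np.array([10.9, 19.8, 18.9, 19.0, 14.1, 10.9, 5.2,
              3.7, 3.3, 5.3, 5.3, 10.5, 11.9])
x = np.arange(1, 14)
A = np.vander(x, 6)                 # rows (x^5, x^4, x^3, x^2, x, 1), as above
gamma = 0.1                         # assumed regularization weight
w = np.linalg.solve(A.T @ A + gamma * np.eye(6), A.T @ y)
print(np.polyval(w, 14))            # prediction for June 2020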
Elastic-net
$$\min_{w \in \mathbb{R}^{m+1}} \left\{ F(w) = \frac{1}{2} \sum_{j=1}^{13} |y_j - f(x_j, w)|^2 + \gamma_2 \|w\|_2^2 + \gamma_1 \|w\|_1 \right\},$$
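Because of the $\|w\|_1$ term there is no closed-form solution; iterative solvers are used instead. A hedged sketch with scikit-learn's ElasticNet, whose objective $\frac{1}{2n}\|y - Xw\|^2 + \alpha\, l_1\text{-ratio}\, \|w\|_1 + \frac{1}{2}\alpha(1 - l_1\text{-ratio})\|w\|_2^2$ matches the structure above up to scaling; $\gamma_1, \gamma_2$ must therefore be translated into (alpha, l1_ratio), and the values below are placeholders.

import numpy as np
from sklearn.linear_model import ElasticNet

x = np.arange(1, 14)
y = np.array([10.9, 19.8, 18.9, 19.0, 14.1, 10.9, 5.2,
              3.7, 3.3, 5.3, 5.3, 10.5, 11.9])
X = np.vander(x, 6)[:, :-1]          # powers x^5..x^1; the intercept w0 is fitted separately
# In practice the columns should be rescaled first; the raw powers span 1 to 371293
model = ElasticNet(alpha=0.1, l1_ratio=0.5, max_iter=100_000).fit(X, y)
print(model.predict(np.vander(np.array([14]), 6)[:, :-1]))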
$$(\text{LP}) \quad \min_{x \in \mathbb{R}^n} \; c^\top x + b \quad \text{s.t.} \quad Ax = a, \; Bx \le b,$$
where
$f(x) = c^\top x + b$ - linear objective function,
$Ax = a$ - linear equality constraints,
$Bx \le b$ - linear inequality constraints,
$S = \{x \in \mathbb{R}^n \mid Ax = a; \; Bx \le b\}$ - feasible set of (LP).
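A minimal sketch of solving an LP of this form with scipy.optimize.linprog, which minimizes $c^\top x$ subject to A_ub x <= b_ub and A_eq x = b_eq; the data below are made up for illustration.

import numpy as np
from scipy.optimize import linprog

c = np.array([-1.0, -2.0])                 # minimize -x1 - 2*x2
B = np.array([[1.0, 1.0]])                 # one inequality: x1 + x2 <= 4
b = np.array([4.0])
res = linprog(c, A_ub=B, b_ub=b, bounds=[(0, None), (0, None)])
print(res.x, res.fun)                      # optimum at (0, 4), value -8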
3.2.1. Constrained Optimization Problems - Quadratic Programming
$$(\text{QP}) \quad \min_{x \in \mathbb{R}^n} \; \tfrac{1}{2} x^\top Q x + q^\top x \quad \text{s.t.} \quad Ax = a, \; Bx \le b,$$
where
$f(x) = \frac{1}{2} x^\top Q x + q^\top x$ - quadratic objective function,
$Ax = a$ - linear equality constraints,
$Bx \le b$ - linear inequality constraints,
$S = \{x \in \mathbb{R}^n \mid Ax = a; \; Bx \le b\}$ - feasible set of (QP).
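A small QP in this form can be handled, for instance, with scipy.optimize.minimize and SLSQP (a general constrained method; dedicated QP solvers also exist). The data Q, q and the single inequality constraint are made up for illustration.

import numpy as np
from scipy.optimize import minimize

Q = np.array([[2.0, 0.0], [0.0, 2.0]])
q = np.array([-2.0, -5.0])
f = lambda x: 0.5 * x @ Q @ x + q @ x       # = (x1-1)^2 + (x2-2.5)^2 up to a constant
cons = [{"type": "ineq", "fun": lambda x: 3.0 - x[0] - x[1]}]   # encodes x1 + x2 <= 3
res = minimize(f, x0=np.zeros(2), method="SLSQP", constraints=cons)
print(res.x)                                # approx. (0.75, 2.25)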
3.2.1. .... Constrained Optimization ...
Feasible Set
A point $x \in \mathbb{R}^n$ is called a feasible point of the NLP if $h_i(x) = 0$, $i = 1, 2, \ldots, p$, and $g_j(x) \le 0$, $j = 1, 2, \ldots, m$.
Represent the set of all feasible points of the NLP by
$$S := \{x \in \mathbb{R}^n \mid h_i(x) = 0, \; i = 1, \ldots, p; \; g_j(x) \le 0, \; j = 1, \ldots, m\}.$$
Any point that lies outside the feasible set is infeasible (not
admissible) to the optimization problem.
Infeasible points are usually not considered during the
optimization process.
3.2.1. ... Constrained Optimization ...
Example 1:
$$(\text{NLP1}) \quad \min_x \; \tfrac{1}{2} x_1^2 + x_1 x_2^2$$
s.t.
$$x_1 x_2^2 - 1 = 0,$$
$$-x_1^2 + x_2 \le 0,$$
$$x_2 \ge 0.$$
In this example
there is one equality constraint $h_1(x) = x_1 x_2^2 - 1$ and
two inequality constraints $g_1(x) = -x_1^2 + x_2 \le 0$ and $g_2(x) = -x_2 \le 0$.
Observe that $x = (1, 1)^\top$ is a feasible point, while $(0, 0)^\top$ is not feasible; i.e., $x = (0, 0)^\top$ does not belong to the feasible set
$$S = \left\{x \in \mathbb{R}^2 \mid x_1 x_2^2 - 1 = 0, \; -x_1^2 + x_2 \le 0, \; x_2 \ge 0\right\}.$$
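Checking feasibility of these two points directly from the definitions (a small sketch, not from the slides):

import numpy as np

h1 = lambda x: x[0] * x[1]**2 - 1
g1 = lambda x: -x[0]**2 + x[1]
g2 = lambda x: -x[1]

def is_feasible(x, tol=1e-9):
    return abs(h1(x)) <= tol and g1(x) <= tol and g2(x) <= tol

print(is_feasible([1.0, 1.0]))    # True:  h1 = 0, g1 = 0, g2 = -1
print(is_feasible([0.0, 0.0]))    # False: h1 = -1 violates the equality constraint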
3.2.1. Introduction to Constrained Optimization ...
Questions
Q1: How do we verify that a point $x \in \mathbb{R}^n$ is an optimal solution to the NLP? (We need optimality criteria.)
Q2: What methods are available to solve a constrained nonlinear optimization problem? (Methods for constrained optimization.)
3.2.1. ... Constrained Optimization ...
An inequality constraint $g_j$ is called active at a point $x$ if $g_j(x) = 0$; the set of active constraints at $x$ is denoted by $A(x)$.
Descent direction
A vector $d$ is a descent direction for the objective function $f$ at the point $x$ if
$$f(x + \alpha d) < f(x) \quad \text{for all sufficiently small } \alpha > 0;$$
that is, a movement from the point $x$ in the direction of the vector $d$ reduces the value of the function $f$. To first order,
$$f(x + \alpha d) - f(x) \approx \alpha\, d^\top \nabla f(x),$$
so $d^\top \nabla f(x) < 0$ implies $f(x + \alpha d) - f(x) < 0$ for small $\alpha > 0$.
Linearizing the constraints along a direction $d$ at a feasible point $x$:
$$i = 1, \ldots, p: \quad h_i(x + \alpha d) \approx \underbrace{h_i(x)}_{=0} + \alpha \underbrace{d^\top \nabla h_i(x)}_{=0} \;\Rightarrow\; h_i(x + \alpha d) \approx 0,$$
$$j \in A(x): \quad g_j(x + \alpha d) \approx \underbrace{g_j(x)}_{=0} + \alpha \underbrace{d^\top \nabla g_j(x)}_{<0} \;\Rightarrow\; g_j(x + \alpha d) \le 0.$$
Lagrange function
The function
$$L(x, \lambda, \mu) = f(x) + \sum_{i=1}^{p} \lambda_i h_i(x) + \sum_{j=1}^{m} \mu_j g_j(x)$$
Optimality conditions
$$\frac{\partial L}{\partial x_1} = 0 \;\Rightarrow\; 2x_1 + \lambda + \mu = 0 \;\Rightarrow\; x_1 = -\frac{1}{2}(\lambda + \mu) \quad (18)$$
$$\frac{\partial L}{\partial x_2} = 0 \;\Rightarrow\; -2x_2 + 2\lambda - \mu = 0 \;\Rightarrow\; x_2 = \frac{1}{2}(2\lambda - \mu) \quad (19)$$
Complementarity
$$\mu g(x) = 0 \;\Rightarrow\; \mu(x_1 - x_2 - 3) = 0 \;\Rightarrow\; \mu\left(-\frac{1}{2}(\lambda + \mu) - \frac{1}{2}(2\lambda - \mu) - 3\right) = 0.$$
Substituting $\lambda = \mu - \frac{2}{3}$ (from the equality constraint, equation (20)) yields
$$\mu\left(-\frac{3}{2}\mu - 2\right) = 0 \;\Rightarrow\; \mu = 0 \text{ or } \mu = -\frac{4}{3}.$$
Since $\mu \ge 0$ is required, $\mu^* = 0$, and hence $\lambda^* = -\frac{2}{3}$. Therefore,
$$x_1^* = -\frac{1}{2}(\lambda^* + \mu^*) = -\frac{1}{2}\left(-\frac{2}{3} + 0\right) = \frac{1}{3} \quad (21)$$
$$x_2^* = \frac{1}{2}(2\lambda^* - \mu^*) = \frac{1}{2}\left(2 \times \left(-\frac{2}{3}\right) - 0\right) = -\frac{2}{3}. \quad (22)$$
Consequently, the point $x^* = \left(\frac{1}{3}, -\frac{2}{3}\right)^\top$ is the only candidate for a local minimum.
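The KKT system can also be solved symbolically. In the sketch below the objective $f(x) = x_1^2 - x_2^2$ and the equality constraint $h(x) = x_1 + 2x_2 + 1$ are inferred from the stationarity conditions (18)-(19) and the substitution $\lambda = \mu - \frac{2}{3}$, so they are assumptions rather than quoted from the slides.

import sympy as sp

x1, x2, lam, mu = sp.symbols("x1 x2 lam mu", real=True)
f = x1**2 - x2**2          # inferred objective
h = x1 + 2*x2 + 1          # assumed equality constraint, h(x) = 0
g = x1 - x2 - 3            # inequality constraint, g(x) <= 0
L = f + lam*h + mu*g

# Stationarity, primal feasibility (equality), and complementarity
eqs = [sp.diff(L, x1), sp.diff(L, x2), h, mu*g]
for sol in sp.solve(eqs, [x1, x2, lam, mu], dict=True):
    if sol[mu] >= 0 and g.subs(sol) <= 0:   # dual and primal feasibility
        print(sol)                           # x1 = 1/3, x2 = -2/3, lam = -2/3, mu = 0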
$$(\text{NLP}) \quad \min_x f(x)$$
$$\text{s.t.} \quad h_i(x) = 0, \; i = 1, 2, \ldots, p;$$
$$g_j(x) \le 0, \; j = 1, 2, \ldots, m.$$
$$S = \{x \in \mathbb{R}^n \mid h_i(x) = 0, \; i = 1, 2, \ldots, p; \; g_j(x) \le 0, \; j = 1, 2, \ldots, m\}$$
is a convex set;
The Lagrange function
$$L(x, \lambda, \mu) = f(x) + \sum_{i=1}^{p} \lambda_i h_i(x) + \sum_{j=1}^{m} \mu_j g_j(x)$$
Optimization problem:
$$(\text{NLP}) \quad \min_{A_1, A_2} W(A_1, A_2) = \rho L \left(\frac{2}{\sqrt{3}} A_1 + A_2\right) \quad (23)$$
subject to
$$\text{displacement constraint:} \quad F\left(\frac{8}{\sqrt{3}\, A_1} + \frac{3}{A_2}\right) \le \sigma_0 \quad (24)$$
$$\text{stress constraint bar 1:} \quad -\sigma_0 \le \frac{2F}{A_1} \le \sigma_0 \quad (25)$$
$$\text{stress constraint bar 2:} \quad -\sigma_0 \le \frac{\sqrt{3}F}{A_2} \le \sigma_0 \quad (26)$$
$$A_1 \ge 0, \quad A_2 \ge 0. \quad (27)$$
3.2. Optimality Criteria ...KKT Conditions
In standard form:
$$(\text{NLP}) \quad \min_{A_1, A_2} W(A_1, A_2) = \rho L \left(\frac{2}{\sqrt{3}} A_1 + A_2\right) \quad (28)$$
$$\text{subject to:} \quad F\left(\frac{8}{\sqrt{3}\, A_1} + \frac{3}{A_2}\right) - \sigma_0 \le 0 \quad (29)$$
$$\frac{2F}{A_1} - \sigma_0 \le 0 \quad (30)$$
$$-\frac{2F}{A_1} - \sigma_0 \le 0 \quad (31)$$
$$\frac{\sqrt{3}F}{A_2} - \sigma_0 \le 0 \quad (32)$$
$$-\frac{\sqrt{3}F}{A_2} - \sigma_0 \le 0 \quad (33)$$
$$A_1 \ge 0, \quad A_2 \ge 0. \quad (34)$$
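A numerical sketch for this truss problem using scipy.optimize.minimize with SLSQP; the constants $\rho$, $L$, $F$, $\sigma_0$ below are made-up placeholder values, not data from the slides.

import numpy as np
from scipy.optimize import minimize

rho, Lbar, Fload, sigma0 = 1.0, 1.0, 1.0, 10.0    # assumed data

W = lambda A: rho * Lbar * (2 / np.sqrt(3) * A[0] + A[1])
# SLSQP convention: each "ineq" constraint requires fun(A) >= 0
cons = [
    {"type": "ineq", "fun": lambda A: sigma0 - Fload * (8 / (np.sqrt(3) * A[0]) + 3 / A[1])},
    {"type": "ineq", "fun": lambda A: sigma0 - 2 * Fload / A[0]},
    {"type": "ineq", "fun": lambda A: sigma0 + 2 * Fload / A[0]},
    {"type": "ineq", "fun": lambda A: sigma0 - np.sqrt(3) * Fload / A[1]},
    {"type": "ineq", "fun": lambda A: sigma0 + np.sqrt(3) * Fload / A[1]},
]
res = minimize(W, x0=[1.0, 1.0], method="SLSQP", constraints=cons,
               bounds=[(1e-6, None), (1e-6, None)])    # A1, A2 > 0
print(res.x, res.fun)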