Fundamentals of optimization

What is optimization?
(Reference: http://en.wikipedia.org/wiki/Mathematical_optimization)

Optimization is the use of specific methods to determine the "best" solution to a problem, for example:
◦ Find the best functional representation for data
◦ Find the best hyperplane to classify data

Every optimization problem has three ingredients:
◦ Objective function (here we look at minimization problems)
◦ Decision variables
◦ Constraints
UNCONSTRAINED CASE
Data science for Engineers
Optimization for Data Science

min_x f(x)

[Figure: a function f(x) with a local minimum at x*_1 and a global minimum at x*_2; the minimizer x* attains the minimum value f* = f(x*).]
Scalar case:

min_x f(x),  x ∈ R

Necessary and sufficient conditions for x* to be the minimizer of the function f(x):
◦ First order (necessary): f'(x*) = 0
◦ Second order (sufficient): f''(x*) > 0

Example: f(x) = 3x^4 − 4x^3 − 12x^2 + 3

First order condition:
f'(x) = 12x^3 − 12x^2 − 24x = 0
= 12x(x^2 − x − 2) = 0
= 12x(x + 1)(x − 2) = 0
⇒ x = 0, x = −1, x = 2

Second order condition:
f''(x) = 36x^2 − 24x − 24
f''(0) = −24 < 0 (local maximum)
f''(−1) = 36 > 0 (local minimum)
f''(2) = 72 > 0 (local minimum)

f(−1) = −2 and f(2) = −29, so x = 2 is the global minimizer.
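The critical-point analysis above can be checked numerically; a minimal sketch (my own, not part of the course material):

```python
# Sanity check of the worked example: f(x) = 3x^4 - 4x^3 - 12x^2 + 3.
f = lambda x: 3 * x**4 - 4 * x**3 - 12 * x**2 + 3
df = lambda x: 12 * x**3 - 12 * x**2 - 24 * x   # f'(x) = 12x(x + 1)(x - 2)
d2f = lambda x: 36 * x**2 - 24 * x - 24         # f''(x)

# Classify each critical point with the second-order condition.
for x in (0.0, -1.0, 2.0):
    kind = "local min" if d2f(x) > 0 else "local max"
    print(f"x = {x:4.1f}: f = {f(x):6.1f}, f'' = {d2f(x):6.1f} -> {kind}")
```

The smallest objective value among the local minima, f(2) = −29, identifies the global minimizer.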
UNCONSTRAINED MULTIVARIATE OPTIMIZATION

z = f(x1, x2, …, xn)

Example: z = x1^2 + x2^2

[Figure: surface and contour plot of z = x1^2 + x2^2; the contours are circles around the origin, with the values of z increasing outward.]
At x^k = x* (the minimizer of f(x)), the second-order Taylor expansion of f around x* gives

f(x) ≈ f(x*) + [∇f(x*)]^T (x − x*) + (1/2)(x − x*)^T ∇^2 f(x*)(x − x*)

Since ∇f(x*) = 0 at the minimizer, this reduces to

f(x) − f(x*) ≈ (1/2)(x − x*)^T ∇^2 f(x*)(x − x*)

For x* to be a minimizer, the right-hand side has to be positive:

(x − x*)^T ∇^2 f(x*)(x − x*) > 0,  i.e.  v^T ∇^2 f(x*) v > 0 for every direction v ≠ 0.

The Hessian matrix is said to be positive definite at a point if all the eigenvalues of the Hessian matrix are positive.

Scalar case: min_x f(x), x ∈ R.  Multivariate case: min_x f(x), x ∈ R^n, with x = [x1, x2, …, xn]^T.
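The eigenvalue test for positive definiteness, sketched for z = x1^2 + x2^2 (my own example code, assuming numpy):

```python
import numpy as np

# For z = x1^2 + x2^2 the Hessian is constant: [[2, 0], [0, 2]].
H = np.array([[2.0, 0.0],
              [0.0, 2.0]])

# Positive definite <=> all eigenvalues of the (symmetric) Hessian are > 0.
eigvals = np.linalg.eigvalsh(H)
is_pos_def = bool(np.all(eigvals > 0))
print(eigvals, is_pos_def)  # [2. 2.] True
```

Both eigenvalues are positive, so every point of this surface satisfies the second-order condition.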
GRADIENT (STEEPEST) DESCENT (OR) LEARNING RULE

Starting from an initial guess x^0, repeat

x^{k+1} = x^k − α ∇f(x^k)

where α > 0 is the step size (learning rate); each step moves in the direction of steepest descent, −∇f(x^k).

[Figure: surface plot of f(X) over (X1, X2).]
Worked example. Gradient ∇f(X) = [8x1 + 3x2 − 5.5 ; 3x1 + 5x2 − 4], step size α = 0.135, starting point X0 = [2, 2].

Step 1: X1 = X0 − α ∇f(X0)
X1 = [2; 2] − 0.135 [8(2) + 3(2) − 5.5 ; 3(2) + 5(2) − 4] = [−0.2275; 0.3800],  f(X1) = 0.0399

Step 2: X2 = X1 − α ∇f(X1)
X2 = [−0.2275; 0.3800] − 0.135 [8x1,1 + 3x1,2 − 5.5 ; 3x1,1 + 5x1,2 − 4] = [0.6068; 0.7556],  f(X2) = −2.0841

Step 3: X3 = X2 − α ∇f(X2) = [0.3879; 0.5398]

Step 4: X4 = X3 − α ∇f(X3) = [0.4928; 0.5583],  f(X4) = −2.3675

Optimal solution: X_opt = [0.5; 0.5],  f(X_opt) = −2.3750

The gradient is zero at the optimum point.

[Figure: contour plots showing the iterates X0 → X1 → X2 → X3 → X4 approaching the minimum at (0.5, 0.5).]
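The iteration above can be reproduced in a few lines. The slides only show the gradient; the objective f(x) = 4x1^2 + 2.5x2^2 + 3x1x2 − 5.5x1 − 4x2 used below is my reconstruction from that gradient (its values match f(X1) = 0.0399 and f(X_opt) = −2.375):

```python
import numpy as np

def f(x):
    # Objective reconstructed from the gradient on the slides (assumption).
    x1, x2 = x
    return 4 * x1**2 + 2.5 * x2**2 + 3 * x1 * x2 - 5.5 * x1 - 4 * x2

def grad_f(x):
    # Gradient as given on the slides.
    x1, x2 = x
    return np.array([8 * x1 + 3 * x2 - 5.5, 3 * x1 + 5 * x2 - 4])

x = np.array([2.0, 2.0])  # starting point X0
alpha = 0.135             # step size used on the slides

for k in range(100):
    x = x - alpha * grad_f(x)  # x_{k+1} = x_k - alpha * grad f(x_k)

print(np.round(x, 4), round(f(x), 4))  # [0.5 0.5] -2.375
```

The first iterate comes out as [−0.2275, 0.3800], matching Step 1, and the sequence converges to the optimum where the gradient vanishes.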
Fundamentals of optimization
Multivariate optimization with constraints

min_{x1,x2} 2x1^2 + 4x2^2
s.t. 3x1 + 2x2 = 12

[Figure: contours of the objective together with the line 3x1 + 2x2 = 12. All points on this line represent the feasible region; the unconstrained minimum is not the same as the constrained minimum.]
Fundamentals of optimization
Multivariate optimization with constraints

min_{x1,x2} 2x1^2 + 4x2^2
s.t. 3x1 + 2x2 ≥ 12

[Figure: contours of the objective with the boundary line 3x1 + 2x2 = 12; the half-plane 3x1 + 2x2 ≥ 12 is the feasible region, and the constrained minimum lies on its boundary.]
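A numerical sketch of this contrast (solver and starting point are my choices, not from the slides), using scipy:

```python
import numpy as np
from scipy.optimize import minimize

# Objective and constraint from the slide; solver choice (SLSQP) is mine.
obj = lambda x: 2 * x[0]**2 + 4 * x[1]**2

unconstrained = minimize(obj, x0=[1.0, 1.0])
constrained = minimize(
    obj, x0=[1.0, 1.0], method="SLSQP",
    # 'ineq' means fun(x) >= 0, i.e. 3x1 + 2x2 >= 12.
    constraints=[{"type": "ineq", "fun": lambda x: 3 * x[0] + 2 * x[1] - 12}],
)
print(np.round(unconstrained.x, 2))  # unconstrained minimum near [0, 0]
print(np.round(constrained.x, 2))    # near [3.27, 1.09], on 3x1 + 2x2 = 12
```

The unconstrained minimizer (0, 0) is infeasible here, so the constrained minimizer moves onto the constraint boundary.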
Fundamentals of optimization
Multivariate optimization with equality constraints

At the constrained optimum x*, the gradient of the objective and the gradient of the constraint are parallel:

−∇f(x*) = λ* ∇h(x*)

In higher dimensions, and when there is more than one equality constraint, the condition becomes −∇f(x*) = Σ_i λ_i* ∇h_i(x*).
Fundamentals of optimization
Multivariate optimization with equality constraints

min_{x1,x2} 2x1^2 + 4x2^2
s.t. 3x1 + 2x2 − 12 = 0

Form the Lagrangian L(x1, x2, λ) = 2x1^2 + 4x2^2 − λ(3x1 + 2x2 − 12) and set ∇L = 0:

∂L/∂x1 = 4x1 − 3λ = 0
∂L/∂x2 = 8x2 − 2λ = 0
∂L/∂λ = −(3x1 + 2x2 − 12) = 0

Solving: x1* = 3.27, x2* = 1.09, λ* = 4.36
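The stationarity system above can also be solved symbolically; a sketch using sympy (library choice mine):

```python
import sympy as sp

# Solve the Lagrange conditions for min 2x1^2 + 4x2^2 s.t. 3x1 + 2x2 = 12.
x1, x2, lam = sp.symbols("x1 x2 lam")
L = 2 * x1**2 + 4 * x2**2 - lam * (3 * x1 + 2 * x2 - 12)  # Lagrangian

# Stationarity: all partial derivatives of L equal to zero.
sol = sp.solve([sp.diff(L, v) for v in (x1, x2, lam)], (x1, x2, lam))
print(sol)  # x1 = 36/11, x2 = 12/11, lam = 48/11, i.e. (3.27, 1.09, 4.36)
```

The exact rational solution (36/11, 12/11, 48/11) rounds to the values reported on the slide.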
General formulation
Multivariate optimization

min_x f(x)
s.t.
h_i(x) = 0, i = 1, …, m
g_j(x) ≤ 0, j = 1, 2, …, l

with Lagrange multipliers λ_i ∈ R, i = 1, …, m for the equality constraints and μ_j ≥ 0, j = 1, …, l for the inequality constraints (the KKT conditions).

➢ In general it is more difficult to use the KKT conditions to solve for the optimum of an inequality-constrained problem (than for a problem with equality constraints only), because we do not know a priori which constraints are active at the optimum.
➢ Instead, the KKT conditions are used to verify that a point we have reached is a candidate optimal solution.
Fundamentals of optimization
Multivariate optimization: quadratic programming

min_{x1,x2} 2x1^2 + 4x2^2
s.t.
3x1 + 2x2 ≤ 12
2x1 + 5x2 ≥ 10
x1 ≤ 1

[Figure: the feasible region bounded by the three constraint lines, including the vertical line x1 = 1.]
Fundamentals of optimization
Multivariate optimization: quadratic programming

min_{x1,x2} 2x1^2 + 4x2^2
s.t.
3x1 + 2x2 ≤ 12  (a)
2x1 + 5x2 ≥ 10  (b)
x1 ≤ 1          (c)

Lagrangian:
L(x1, x2, μ1, μ2, μ3) = 2x1^2 + 4x2^2 + μ1(3x1 + 2x2 − 12) + μ2(10 − 2x1 − 5x2) + μ3(x1 − 1)

First order KKT conditions:
4x1 + 3μ1 − 2μ2 + μ3 = 0
8x2 + 2μ1 − 5μ2 = 0
μ1(3x1 + 2x2 − 12) = 0
μ2(10 − 2x1 − 5x2) = 0
μ3(x1 − 1) = 0
μ_i ≥ 0, i = 1, 2, 3
Fundamentals of optimization
Multivariate optimization: quadratic programming

Active (A) / Inactive (I) constraints:

| Sl.no | (a) | (b) | (c) | Solution (x, μ) | Possible optimum (Y/N) | Remark |
|---|---|---|---|---|---|---|
| 1 | A | A | A | infeasible | N | Equations do not have a valid solution. |
| 2 | A | A | I | x = [3.6364, 0.5455], μ = [−5.2, −1.45, 0] | N | x1 ≤ 1 is not satisfied; μ1 < 0, μ2 < 0 |
| 3 | A | I | A | x = [1, 4.5], μ = [−18, 0, 50] | N | μ1 < 0 |
| 4 | I | A | A | x = [1, 1.6], μ = [0, 2.56, 1.12] | Y | All constraints and KKT conditions satisfied |
| 5 | A | I | I | x = [3.27, 1.09], μ = [−4.36, 0, 0] | N | x1 ≤ 1 is not satisfied |
| 6 | I | A | I | x = [1.21, 1.51], μ = [0, 2.45, 0] | N | x1 ≤ 1 is not satisfied |
| 7 | I | I | A | x = [1, 0], μ = [0, 0, −4] | N | 2x1 + 5x2 ≥ 10 is not satisfied |
| 8 | I | I | I | x = [0, 0], μ = [0, 0, 0] | N | 2x1 + 5x2 ≥ 10 is not satisfied |
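The enumeration in the table above can be automated; a sketch (my own implementation of the procedure, assuming numpy):

```python
import itertools
import numpy as np

# Constraints written as g_i(x) <= 0, matching L = f + sum(mu_i * g_i):
#   (a) 3x1 + 2x2 - 12 <= 0, (b) 10 - 2x1 - 5x2 <= 0, (c) x1 - 1 <= 0
A = np.array([[3.0, 2.0], [-2.0, -5.0], [1.0, 0.0]])  # gradients of the g_i
b = np.array([12.0, -10.0, 1.0])                      # g_i(x) = A[i] @ x - b[i]
Q = np.array([[4.0, 0.0], [0.0, 8.0]])                # grad f(x) = Q @ x

candidates = []
for active in itertools.product([True, False], repeat=3):
    idx = [i for i in range(3) if active[i]]
    n = len(idx)
    # Unknowns: x (2 entries) plus one multiplier per active constraint.
    # Stationarity: Q x + A[idx].T mu = 0; active constraints: A[idx] x = b[idx].
    K = np.zeros((2 + n, 2 + n))
    K[:2, :2] = Q
    K[:2, 2:] = A[idx].T
    K[2:, :2] = A[idx]
    rhs = np.concatenate([np.zeros(2), b[idx]])
    sol, *_ = np.linalg.lstsq(K, rhs, rcond=None)
    if not np.allclose(K @ sol, rhs, atol=1e-8):
        continue  # no valid solution for this active set (case 1 in the table)
    x, mu = sol[:2], sol[2:]
    # Keep only points that are feasible and have non-negative multipliers.
    if np.all(A @ x - b <= 1e-9) and np.all(mu >= -1e-9):
        candidates.append((x, mu))
        print("candidate optimum:", np.round(x, 4), "mu:", np.round(mu, 4))
```

Only the active set of case 4 survives both checks, giving the candidate x = [1, 1.6] with μ = [2.56, 1.12], in agreement with the table.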