

Optimization Theory

Don Johnson


This work is produced by The Connexions Project and licensed under the Creative Commons Attribution License †

Optimization theory is the study of the extremal values of a function: its minima and maxima. Topics in this theory range from conditions for the existence of a unique extremal value to methods, both analytic and numerical, for finding the extremal values and the values of the independent variables at which the function attains its extremes. In this book, minimizing an error criterion is an essential step toward deriving optimal signal processing algorithms. An appendix summarizing the key results of optimization theory is essential to understanding optimal algorithms.

1 Unconstrained Optimization

The simplest optimization problem is to find the minimum of a scalar-valued function of a scalar variable f(x) (the so-called objective function) and where that minimum is located. Assuming the function is differentiable, the well-known conditions for finding the minima, local and global, are¹

(d/dx) f(x) = 0

(d²/dx²) f(x) > 0

All values of the independent variable x satisfying these relations are locations of local minima. Without the second condition, solutions to the first could be maxima, minima, or inflection points. Solutions to the first equation are termed the stationary points of the objective function. To find the global minimum, that value (or values) where the function achieves its smallest value, each candidate extremum must be tested: the objective function must be evaluated at each stationary point and the smallest selected. If, however, the objective function can be shown to be strictly convex, then only one solution of (d/dx) f(x) = 0 exists, and that solution corresponds to the global minimum. The function f(x) is strictly convex if, for any choice of distinct x1, x2 and any scalar a with 0 < a < 1,

f(a x1 + (1 − a) x2) < a f(x1) + (1 − a) f(x2)

Convex objective functions occur often in practice and are more easily minimized because of this property.

When the objective function f(·) depends on a complex variable z, subtleties enter the picture. If the function f(z) is differentiable, its extremes can be found in the obvious way: find the derivative, set it equal to zero, and solve for the locations of the extrema. However, there are many situations in which this function is not differentiable. In contrast to functions of a real variable, non-differentiable functions of a complex variable occur frequently. The simplest example is f(z) = |z|². The minimum value of this function obviously occurs at the origin. To calculate this obvious answer, a complication arises: the function f(z) = |z|² = z z̄ is not analytic with respect to z and hence not differentiable. More generally, the derivative of a function with respect to a complex-valued variable cannot be evaluated directly when the function depends on the variable's conjugate.
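Although f(z) = |z|² is not differentiable in the complex-analytic sense, a derivative with respect to z̄ can still be defined through the real and imaginary parts via the rule ∂/∂z̄ = (1/2)(∂/∂x + i ∂/∂y), and for |z|² it evaluates to z. A minimal numerical sketch of this rule (the test point z0 and the finite-difference step h are hypothetical choices, not from the text):

```python
# Numerical check of the conjugate derivative of f(z) = |z|^2.
# With z = x + iy, the rule is df/dzbar = (1/2)(df/dx + i df/dy),
# which for |z|^2 should evaluate to z itself.

def f(z):
    return abs(z) ** 2

def d_dzbar(f, z, h=1e-6):
    # central differences along the real and imaginary axes
    dfdx = (f(z + h) - f(z - h)) / (2 * h)
    dfdy = (f(z + 1j * h) - f(z - 1j * h)) / (2 * h)
    return 0.5 * (dfdx + 1j * dfdy)

z0 = 1.5 - 2.0j              # hypothetical test point
g = d_dzbar(f, z0)           # numerically close to z0
```

The derivative with respect to z itself can be checked the same way with (1/2)(∂/∂x − i ∂/∂y), yielding z̄.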

∗ Version 1.4: Mar 22, 2008 4:24 pm GMT-5

† http://creativecommons.org/licenses/by/1.0

¹ The maximum of a function is found by finding the minimum of its negative.

http://cnx.org/content/m11240/1.4/

This complication can be resolved with either of two methods tailored for optimization problems. The first is to express the objective function in terms of the real and imaginary parts of z and find the function's minimum with respect to these two variables.² This approach is unnecessarily tedious but will yield the solution. The second, more elegant, approach relies on two results from complex variable theory. First, the quantities z and z̄ can be treated as independent variables, each considered a constant with respect to the other. A variable and its conjugate are thus viewed as the result of applying an invertible linear transformation to the variable's real and imaginary parts. Thus, if the real and imaginary parts can be considered as independent variables, so can the variable and its conjugate, with the advantage that the mathematics is far simpler. In this way, ∂|z|²/∂z = z̄ and ∂|z|²/∂z̄ = z. Seemingly, the next step to minimizing the objective function is to set the derivatives with respect to each quantity to zero and then solve the resulting pair of equations. As the following theorem suggests, that solution is overly complicated. See Churchill [2] for more about the analysis of functions of a complex variable.

Theorem 1: If the function f(z, z̄) is real-valued and analytic with respect to z and z̄, all stationary points can be found by setting the derivative (in the sense just given) with respect to either z or z̄ to zero [1].

Thus, to find the minimum of |z|², compute the derivative with respect to either z or z̄. In most cases, the derivative with respect to z̄ is the most convenient choice.³ Thus, ∂|z|²/∂z̄ = z, and the stationary point is z = 0. As this objective function is strictly convex, the objective function's sole stationary point is its global minimum.

When the objective function depends on a vector-valued quantity x, the evaluation of the function's stationary points is a simple extension of the scalar-variable case. However, testing stationary points as possible locations for minima is more complicated [3]. The gradient of the scalar-valued function f(x) of a vector x (dimension N) equals an N-dimensional vector where each component is the partial derivative of f(·) with respect to each component of x:

∇x f(x) = [∂f(x)/∂x1, …, ∂f(x)/∂xN]ᵀ

For example, the gradient of xᵀAx is Ax + Aᵀx. This result is easily derived by expressing the quadratic form as a double sum (Σi Σj Aij xi xj) and evaluating the partials directly. When A is symmetric, which is often the case, this gradient becomes 2Ax.

The gradient "points" in the direction of the maximum rate of increase of the function f(·). This fact is often used in numerical optimization algorithms. The method of steepest descent is an iterative algorithm where a candidate minimum is augmented by a quantity proportional to the negative of the objective function's gradient to yield the next candidate:

xk = xk−1 − α ∇x f(xk−1),  α > 0

If the objective function is sufficiently "smooth" (there aren't too many minima and maxima), this approach will yield the global minimum. Strictly convex functions are certainly smooth enough for this method to work.

The gradient of the gradient of f(x), denoted by ∇x² f(x), is a matrix whose jth column is the gradient of the jth component of f's gradient. This quantity is known as the Hessian, defined to be the matrix of all the second partials of f(·):

[∇x² f(x)]ij = ∂²f(x)/∂xi∂xj

The Hessian is always a symmetric matrix.

The minima of the objective function f(x) occur when ∇x f(x) = 0 and ∇x² f(x) > 0, i.e., the Hessian is positive definite. Thus, for a stationary point to be a minimum, the Hessian evaluated at that point must be a positive definite matrix. When the objective function is strictly convex, this test need not be performed. For example, the objective function f(x) = xᵀAx is convex whenever A is positive definite and symmetric.⁴

² The multi-variate minimization problem is discussed in a few paragraphs.

³ Why should this be? In the next few examples, try both and see which you feel is "easier".

⁴ Note that the Hessian of xᵀAx is 2A.
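The steepest-descent iteration can be sketched in a few lines of code. This is a minimal illustration, not the text's algorithm verbatim: the matrix A, the starting point, the step size α, and the iteration count are all hypothetical choices.

```python
# A minimal pure-Python sketch of the method of steepest descent on the
# strictly convex quadratic f(x) = x^T A x with A symmetric positive
# definite, whose gradient is 2 A x.

A = [[2.0, 0.0], [0.0, 1.0]]   # hypothetical symmetric, positive definite A
n = len(A)

def grad(x):
    # gradient of x^T A x for symmetric A: 2 A x
    return [2.0 * sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]

def steepest_descent(x0, alpha=0.1, iters=200):
    x = list(x0)
    for _ in range(iters):
        g = grad(x)
        # x_k = x_{k-1} - alpha * grad f(x_{k-1})
        x = [x[i] - alpha * g[i] for i in range(n)]
    return x

x_min = steepest_descent([3.0, -4.0])
# x_min is driven toward the origin, the quadratic form's global minimum
```

For this quadratic each iteration contracts every component by a fixed factor, so any sufficiently small α converges; in general the step size must be chosen with more care.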

When the independent vector is complex-valued, the issues discussed in the scalar case also arise. Because of the complex-valued quantities involved, how to evaluate the gradient becomes an issue: is ∇z or ∇z̄ more appropriate? In contrast to the case of complex scalars, the choice in the case of complex vectors is unique. To implement the method of steepest descent, for example, the quantity pointing in the direction of the maximum rate of change of f(z, z̄) is needed.

Theorem 2: Let f(z, z̄) be a real-valued function of the vector-valued complex variable z, where the dependence on the variable and its conjugate is explicit. By treating z and z̄ as independent variables, the quantity pointing in the direction of the maximum rate of change of f(z, z̄) is ∇z̄ f(z) [1].

To show this result, consider the variation of f given by

δf = Σi [(∂f/∂zi) δzi + (∂f/∂z̄i) δz̄i] = (∇z f)ᵀ δz + (∇z̄ f)ᵀ δz̄

This quantity is concisely expressed as δf = 2 Re{(∇z̄ f)ᴴ δz}. By the Schwarz inequality, the maximum value of this variation occurs when δz is in the same direction as ∇z̄ f. Thus, the direction corresponding to the largest change in the quantity f(z, z̄) is the direction of its gradient with respect to z̄. To find the stationary points of a scalar-valued function of a complex-valued vector, the gradient with respect to the conjugate must be used; we must solve

∇z̄ f(z) = 0   (1)

For solutions of this equation to be minima, the Hessian, defined to be the matrix of mixed partials ∇z(∇z̄ f(z)), must be positive definite. For example, the required gradient of the objective function zᴴAz is given by Az, implying for positive definite A that the stationary point is z = 0. The Hessian of the objective function is simply A, confirming that the minimum of a quadratic form is always the origin.
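The claim that the conjugate gradient of zᴴAz is Az can be checked numerically by applying the componentwise rule ∂/∂z̄i = (1/2)(∂/∂xi + i ∂/∂yi). A minimal pure-Python sketch; the Hermitian matrix A, the test point z0, and the finite-difference step h are hypothetical choices:

```python
# Numerical check that the gradient of f(z) = z^H A z with respect to the
# conjugate vector is A z, for a Hermitian positive definite A.

A = [[2.0, 0.5 + 0.5j], [0.5 - 0.5j, 1.0]]   # hypothetical Hermitian A
n = len(A)

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]

def f(z):
    # z^H A z = sum_i conj(z_i) * (A z)_i; real-valued for Hermitian A
    Az = matvec(A, z)
    return sum((z[i].conjugate() * Az[i]).real for i in range(n))

def grad_zbar(f, z, h=1e-6):
    # per-component Wirtinger rule: d/dzbar_i = (1/2)(d/dx_i + i d/dy_i)
    g = []
    for i in range(n):
        zp, zm = list(z), list(z)
        zp[i] += h; zm[i] -= h
        dfdx = (f(zp) - f(zm)) / (2 * h)
        zp, zm = list(z), list(z)
        zp[i] += 1j * h; zm[i] -= 1j * h
        dfdy = (f(zp) - f(zm)) / (2 * h)
        g.append(0.5 * (dfdx + 1j * dfdy))
    return g

z0 = [1.0 + 2.0j, -1.0 + 0.5j]   # hypothetical test point
g = grad_zbar(f, z0)             # each component numerically close to (A z0)_i
Az0 = matvec(A, z0)
```

Because f is quadratic in the real coordinates, the central differences here are essentially exact; for general objective functions they are only an approximation.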
References

[1] D.H. Brandwood. A complex gradient operator and its application in adaptive array theory. IEE Proc., Pts. F and H, 130:11–16, 1983.

[2] R.V. Churchill and J.W. Brown. Complex Variables and Applications. McGraw-Hill, 1989.

[3] D.G. Luenberger. Optimization by Vector Space Methods. Wiley, 1969.

