You are on page 1of 8

ams/econ 11b

supplementary notes

ucsc

Optimization in several variables I,


rst and second order conditions
c 2010, Yonatan Katznelson

This note covers the basics of (unconstrained) optimization, and attempts to explain things like why critical points are critical, and why the second derivative test works. The key denitions and facts are summarized in neat little boxes, (like this one).

1.

Local extreme values and critical points.

1.1 Local extreme values. The value z0 = f (x0 , y0 ) is called a local minimum value of the function if there is a number > 0 such that f (x, y) f (x0 , y0 ) (1) for all points (x, y) satisfying |x x0 | < and |y y0 | < . Likewise, z1 = f (x1 , y1 ) is a local maximum value if f (x, y) f (x1 , y1 ) (2) The for all points (x, y) satisfying |x x1 | < and |y y1 | < , for some number > 0. local maximum and minimum values of a function are also called local extreme values, when the distinction between maximum and minimum is not important. More generally, w = f (1 , . . . , xn ) is a local maximum (or minimum) value if and only x if there is some number > 0 such that f (x1 , . . . , xn ) f (1 , . . . , xn ) x (or f (x1 , . . . , xn ) f (1 , . . . , xn ), respectively) x for all points (x1 , . . . , xn ) satisfying |xj xj | < , for j = 1, 2, . . . , n. 1.2 First order conditions. By studying the rst order Taylor approximation of the function z = f (x, y), we can identify the points (x0 , y0 ) where local extreme values might occur. Recall that the rst order approximation can be written f (x, y) f (x0 , y0 ) fx (x0 , y0 )(x x0 ) + fy (x0 , y0 )(y y0 ), (3) where the approximation is accurate for all points (x, y) suciently close to (x0 , y0 ), i.e., for all points satisfying |x x0 | < and |y y0 | < for some positive number .
The condition |x x | < and |y y | < , for some number > 0 is a more precise way of saying suciently 1 1 close to (x1 , y1 ). For our purposes in this note, the precise value of is not important. The important thing is that exists.

Suppose that fx (x0 , y0 ) > 0, then by choosing y = y0 and x1 > x0 , the approximation (??) tells us that f (x1 , y0 ) f (x0 , y0 ) fx (x0 , y0 )(x1 x0 ) > 0, as long as x1 is suciently close to x0 . This means that f (x0 , y0 ) cannot possibly be a local maximum value, since there are points arbitrarily close to (x0 , y0 ) where the function takes larger values. Still assuming that fx (x0 , y0 ) > 0 and staying with y = y0 , but choosing x1 < x0 , we see that f (x1 , y0 ) f (x0 , y0 ) fx (x0 , y0 )(x1 x0 ) < 0, as long as x1 is suciently close to x0 . This implies that f (x0 , y0 ) cannot possibly be a local minimum value, since there are points arbitrarily close to (x0 , y0 ) where the function takes smaller values. The preceding two paragraphs show that if fx (x0 , y0 ) > 0, then f (x0 , y0 ) is not a local extreme value. Almost identical arguments show that if fx (x0 , y0 ) < 0, then f (x0 , y0 ) is not a local extreme value. Of course, there is nothing special about the variable x in this context, i.e., if fy (x0 , y0 ) = 0, then the same arguments show that f (x0 , y0 ) is not a local extreme value. This means that if f (x0 , y0 ) is a local extreme value, then fx (x0 , y0 ) = 0 and fy (x0 , y0 ) = 0. Finally, the arguments work just as well for a function of 17 variables as a function of two variables, so we draw the following general conclusion.

Fact 1. If f (1 , . . . , xn ) is a local extreme value of the function f (x1 , . . . , xn ), then x fxj (1 , . . . , xn ) = 0 x for j = 1, . . . , n.

This fact motivates the following denition.

Denition 1. A point (1 , . . . , xn ) is called a critical point (or a stationary point) of the function x f (x1 , . . . , xn ) if the n equations fxj (1 , . . . , xn ) = 0 x for j = 1, . . . , n. (4)

are all satised. The value of the function at a critical point, f (1 , . . . , xn ), is called a x critical value (or a stationary value). The equations in (??) are called the rst order conditions for f (1 , . . . , xn ) to be a x local extreme value.
This conclusion also relies on the implicit assumption that the function is suciently dierentiable, which means that its partial derivatives are all dened (and continuous) in the neighborhood of the points in question. Other conclusions in this note likewise rely on the similar assumptions, which we make without stating them explicitly.

The rst order conditions must be satised in order that f (1 , . . . , xn ) to be a local x extreme value, but they do not guarantee that the critical value is a local extreme value. The function f (x, y) = x2 y 2 has exactly one critical point, (0, 0), since the two rst order conditions for this function

Example 1.

fx = 2x = 0 fy = 2y = 0 force x = 0 and y = 0. The lone critical value is f (0, 0) = 0. Next observe that f (x, 0) = x2 > 0 for any x = 0, however small |x| is. This implies that f (0, 0) = 0 is not a local maximum, since the function takes larger values at points arbitrarily close to (0, 0). Likewise, f (0, y) = y 2 < 0 for any y = 0, however small |y| is, which means that the function takes smaller values at points arbitrarily close to (0, 0) as well. I.e., f (0, 0) is not a local minimum either. It follows that the critical value f (0, 0) is not a local extreme value in this case.

Example 2.
are

The rst order conditions for the function g(x, y) = x2 + y 2 2x + 6y + 10 gx = 2x 2 = 0 gy = 2y + 6 = 0.

The rst equation has the solution x0 = 1 and the second equation has the solution y0 = 3. So (x0 , y0 ) = (1, 3) is the only critical point for this function, and the critical value is g(1, 3) = 0. By completing the square separately in x and y, we see that g(x, y) = x2 + y 2 2x + 6y + 10 = (x 1)2 + (y + 3)2 . It follows that g(x, y) 0 for all (x, y), and g(x, y) = 0 only at the critical point (1, 3). I.e., if (x, y) = (1, 3), then g(x, y) > 0 = g(1, 3), which means that g(1, 3) is a local minimum value, and in fact that it is the global minimum value, since it is the smallest value of the function over all.

2.

A necessary digression: quadratic forms.

The two examples above illustrate the fact that analyzing the critical values of quadratic functions is relatively easy, requiring only basic algebra (i.e., completing the square) and the fact that u2 > 0 for all u = 0. For our purposes, we need only analyze the behavior of quadratic functions that have no constant term and no linear terms, for example, functions of the form Q(u, v) = au2 + buv + cv 2 , in the case of two variables, Q(u, v, w) = au2 + bv 2 + cw2 + duv + euw + f vw,
They

are necessary conditions, but they are not sucient conditions. I.e., to explain the second derivative test.

in the case of three variables, etc. This type of function is called a quadratic form. Specically, what we will eventually need to know about quadratic forms is when such a form produces only positive values, only negative values, or when it produces both positive and negative values. This behavior is completely determined by the coecients of the quadratic form, as I will illustrate for forms in two and three variables. Moreover, we dont need calculus to do this analysis. 2.1 The case of two variables. Consider the form Q(u, v) = au2 + buv + cv 2 . If a = 0, then we can factor out an a from the rst two terms, and write b Q(u, v) = au2 + buv + cv 2 = a u2 + uv + cv 2 . a
b Adding the term 4a2 v 2 inside the parentheses, and subtracting the term parentheses shows that Q(u, v) can be written as
2

b2 2 4a v

outside the

b2 b Q(u, v) = a u2 + uv + 2 v 2 a 4a b v 2a
2

b2 2 v + cv 2 4a (5)

=a u+

4ac 4a

b2

v2.

The expression 4ac b2 that appears in the numerator of the coecient of v 2 in the second line is called the discriminant of the form Q(u, v), and denoted by D. There are ve possibilities to consider. i. If D > 0 and a > 0, then both terms in the third line of (??) are nonnegative for all choices of (u, v). I.e., in this case Q(u, v) 0 for all points (u, v) and, more importantly, Q(u, v) > 0 if (u, v) = (0, 0). ii. If D > 0 and a < 0, then both terms in the third line of (??) are nonpositive for all choices of (u, v). I.e., in this case Q(u, v) 0 for all points (u, v) and, more importantly, Q(u, v) < 0 if (u, v) = (0, 0). iii. If D < 0 and a > 0, then D/4a < 0. It follows that Q(0, 1) = D/4a < 0 and Q(1, 0) = a > 0, and we see that Q produces both positive and negative values. iv. If D < 0 and a < 0, then D/4a > 0. It follows that Q(0, 1) = D/4a > 0 and Q(1, 0) = a < 0, and once again, Q produces both positive and negative values. v. If D = 0, then Q(u, v) = a (u + bv/2a)2 . So, if a > 0, then Q(u, v) 0 for all (u, v), and if a < 0, then Q(u, v) 0 for all (u, v). Comments: Cases iii. and iv. can be combined to yield the conclusion: if D < 0, then Q(u, v) produces both positive and negative values.

If a = 0 and c = 0, then we have Q(u, v) = b2

4acb2 4c

u2 + c v +

2 b 2c u .

In this case,

D= < 0, and the conclusion above still holds, i.e., Q produces both positive and negative values. If a = 0 = c, then D = b2 < 0 and Q(u, v) = buv. In this case, Q(1, 1) = b and Q(1, 1) = b, and so, whatever the sign of b,we see that Q produces both positive and negative values. I included case v., D = 0, for completeness sake, but it isnt helpful in the formulation of the second derivative test. The key points of the preceding analysis are summarized below.

Fact 2. If Q(u, v) = au2 + buv + cv 2 and D = 4ac b2 = 0, then 1. If D > 0 and a > 0, then Q(u, v) > 0 for all (u, v) = (0, 0). 2. If D > 0 and a < 0, then Q(u, v) < 0 for all (u, v) = (0, 0). 3. If D < 0, then Q(u, v) produces both positive and negative values.

2.2 The case of three variables. As you can imagine, with three variables, there are more cases to consider in the algebraic analysis. Ill spare you the details, and merely summarize the main points in Fact ??.

Fact 3. Let Q(u, v, w) = au2 + bv 2 + cw2 + duv + euw + f vw, and set D2 = 4ab d2 and D3 = 4abc + def be2 af 2 cd2 .

1. If a > 0, D2 > 0 and D3 > 0, then Q(u, v, w) > 0 for all (u, v, w) = (0, 0, 0). 2. If a < 0, D2 > 0 and D3 < 0, then Q(u, v, w) < 0 for all (u, v, w) = (0, 0, 0). 3. If D2 < 0, or if D2 > 0 but a D3 < 0, then Q(u, v, w) yields both positive and negative values.

I wont discuss the case of quadratic forms with more than three variables here, beyond saying that Facts ?? and ?? generalize directly to give similar criteria.

3.

The second order conditions.

We have already seen that for f (1 , . . . , xn ) to be a local extreme value, it is necessary x that the point (1 , . . . , xn ) be a critical point of the function. I.e., the point must satisfy the x
5

rst order conditions in (??). On the other hand, we also saw that the rst order conditions, by themselves, are not enough to guarantee that the critical value f (1 , . . . , xn ) is a local x extreme value. In the case that f (x1 , . . . , xn ) is a quadratic function, it is relatively easy to analyze the nature of the critical value by completing the square. To extend this same analysis to general (nonquadratic) functions, we use the second order Taylor approximation, centered at the critical point (1 , . . . , xn ), x
n

f (x1 , . . . , xn ) f (1 , . . . , xn ) + x
j=1

fxj (1 , . . . , xn )(xj xj ) x (6) x fxj xk (1 , . . . , xn )(xj xj )(xk xk ),

1 + 2

j=1 k=1

which is accurate when the dierences |xj xj |, j = 1, . . . , n, are all suciently small. If the point (1 , . . . , xn ) satises the rst order conditions (??), then the coecients x fxj (1 , . . . , xn ) of the linear terms in (??) are all zero. This means that, when centered at x a critical point, Taylors second order approximation may be written as f (x1 , . . . , xn ) f (1 , . . . , xn ) x 1 2
n n

fxj xk (1 , . . . , xn )(xj xj )(xk xk ), x


j=1 k=1

(7)

after subtracting the constant term, f (1 , . . . , xn ), from both sides of (??). x The expression on the right-hand side of (??) is a quadratic form in the variables uj = (xj xj ). To simplify the exposition a little, Ill use the notation 1 Qf (u1 , . . . , un ) = 2 and rewrite (??) as f (x1 , . . . , xn ) f (1 , . . . , xn ) Qf (u1 , . . . , un ). x Also, observe that (x1 , . . . , xn ) = (1 , . . . , xn ) is the same as (u1 , . . . , un ) = (0, . . . , 0). x It follows from (??) that we have the following three cases. Q1. If Qf (u1 , . . . , un ) > 0 for all points (u1 , . . . , un ) = (0, . . . , 0), then f (x1 , . . . , xn ) > f (1 , . . . , xn ) x for all points (x1 , . . . , xn ) = (1 , . . . , xn ) that are suciently close to (1 , . . . , xn ). This x x means that f (1 , . . . , xn ) is a relative minimum value. x
Though what I mean by easy may not jibe with what you mean by easy, and in any case, easy or not, the analysis is admittedly somewhat tedious.

fxj xk (1 , . . . , xn )uj uk , x
j=1 k=1

(8)

Q2. If Qf (u1 , . . . , un ) < 0 for all points (u1 , . . . , un ) = (0, . . . , 0), then f (x1 , . . . , xn ) < f (1 , . . . , xn ) x for all points (x1 , . . . , xn ) = (1 , . . . , xn ) that are suciently close to (1 , . . . , xn ). This x x means that f (1 , . . . , xn ) is a relative maximum value. x Q3. If Qf (u1 , . . . , un ) takes both positive and negative values, then both phenomena, f (x1 , . . . , xn ) > f (1 , . . . , xn ) x and f (x1 , . . . , xn ) < f (1 , . . . , xn ), x occur for points (x1 , . . . , xn ) that are arbitrarily close to (1 , . . . , xn ), which implies x that f (1 , . . . , xn ) is a neither a relative minimum value, nor a relative maximum value. x The coecients of Qf are given by the values of the various second order derivatives of the function f (x1 , . . . , xn ) at the critical point (1 , . . . , xn ), and these determine which of x the three cases, above, occur (if any). For functions of two or three variables, the analysis that we need is contained in Facts ?? and ??, respectively. 3.1 The case of two variables. If (x0 , y0 ) is a critical point of the function f (x, y), then the double sum on the right of (??) simplies to 1 1 f (x, y) f (x0 , y0 ) fxx (x0 , y0 )(x x0 )2 + fxy (x0 , y0 )(x x0 )(y y0 ) + fyy (y y0 )2 , 2 2 and the quadratic form Qf may be written simply in this case as 1 1 Qf (u, v) = fxx (x0 , y0 )u2 + fxy (x0 , y0 )uv + fyy (x0 , y0 )v 2 . 2 2 The discriminant of this form is D = fxx (x0 , y0 )fyy (x0 , y0 ) fxy (x0 , y0 )2 , which is why we call the function Df (x, y) = fxx (x, y)fyy (x, y) fxy (x, y)2 (9) the discriminant of f (x, y). Fact ?? (about quadratic forms in two variables), combined with observations Q1, Q2 and Q3, above, yield the second derivative test for functions of two variables. Fact 4. Suppose that (x0 , y0 ) satises the rst order conditions fx (x0 , y0 ) = 0 then: 1. If Df (x0 , y0 ) > 0 and fxx (x0 , y0 ) > 0, then f (x0 , y0 ) is a local minimum value of the function f (x, y). 2. If Df (x0 , y0 ) > 0 and fxx (x0 , y0 ) < 0, then f (x0 , y0 ) is a local maximum value of the function f (x, y). 3. If Df (x0 , y0 ) < 0, then f (x0 , y0 ) is neither a local minimum value nor a local maximum value of the function f (x, y). and fy (x0 , y0 ) = 0,

The conditions in 1. (in 2.) are called the second order conditions for a local minimum value (a local maximum value) at (x0 , y0 ). 3.2 The case of three variables. In this case, Taylors approximation (??), centered at a critical point (x0 , y0 , z0 ) of the function g(x, y, z) simplies to g(x, y, z) g(x0 , y0 , z0 ) a(x x0 )2 + b(y y0 )2 + c(z z0 )2 + d(x x0 )(y y0 ) + e(x x0 )(z z0 ) + f (y y0 )(z z0 ), where a= and d = gxy (x0 , y0 , z0 ), e = gxz (x0 , y0 , z0 ) and f = gyz (x0 , y0 , z0 ), as you can check, if you so desire. So, the quadratic form Qg is given by Qg (u, v, w) = au2 + bv 2 + cw2 + duv + euw + f vw, where, to remind you, u = (x x0 ), v = (y y0 ) and w = (z z0 ). Motivated by Fact ??, we form the discriminants D2 (x, y, z) = gxx (x, y, z)gyy (x, y, z) gxy (x, y, z)2 and D3 (x, y, z) = gxx (x, y, z)gyy (x, y, z)gzz (x, y, z) + 2gxy (x, y, z)gxz (x, y, z)gyz (x, y, z) gxx (x, y, z)gyz (x, y, z)2 gyy (x, y, z)gxz (x, y, z)2 gzz (x, y, z)gxy (x, y, z)2 . Based on observations Q1, Q2 and Q3 and Fact ??, we then formulate the second derivative test for functions of three variables in Fact ?? gxx (x0 , y0 , z0 ) , 2 b= gyy (x0 , y0 , z0 ) gzz (x0 , y0 , z0 ) and c = , 2 2

Fact 5. Suppose that (x0 , y0 , z0 ) satises the rst order conditions gx (x0 , y0 , z0 ) = 0, Then: 1. If D3 (x0 , y0 , z0 ) > 0, D2 (x0 , y0 , z0 ) > 0 and gxx (x0 , y0 , z0 ) > 0, then g(x0 , y0 , z0 ) is a local minimum value. 2. If D3 (x0 , y0 , z0 ) < 0, D2 (x0 , y0 , z0 ) > 0 and gxx (x0 , y0 , z0 ) < 0, then g(x0 , y0 , z0 ) is a local maximum value. 3. If D2 (x0 , y0 , z0 ) < 0, or if D2 (x0 , y0 , z0 ) > 0 but D3 (x0 , y0 , z0 ) gxx (x0 , y0 , z0 ) < 0, then g(x0 , y0 , z0 ) is neither a local minimum value nor a local maximum value. gy (x0 , y0 , z0 ) = 0 and gz (x0 , y0 , z0 ) = 0.

You might also like