
# Lagrange Multipliers without Permanent Scarring

Dan Klein

1 Introduction
This tutorial assumes that you want to know what Lagrange multipliers are, but are more interested in getting the intuitions and central ideas than in formal machinery. It contains nothing which would qualify as a formal proof, but the key ideas needed to read or reconstruct the relevant formal results are provided. If you don't understand Lagrange multipliers yet, that's fine; that's what the tutorial is for. If you don't understand vector calculus at all, in particular gradients of functions and surface normal vectors, the majority of the tutorial is likely to be somewhat unpleasant. Understanding of vector spaces, spanned subspaces, and linear combinations is a bonus (a few sections will be somewhat mysterious if these concepts are unclear).

Lagrange multipliers are a mathematical tool for constrained optimization of differentiable functions. In the basic, unconstrained version, we have some (differentiable) function f(x) that we want to maximize (or minimize). We can do this by finding the extreme points of f, which are points where the gradient ∇f is zero or, equivalently, where each of the partial derivatives is zero. If we're lucky, the points we find this way will turn out to be (local) maxima, but they can also be minima or saddle points. We can tell the different cases apart by a variety of means, including checking properties of the second derivatives or simply inspecting the function values. Hopefully this is all familiar from calculus, though maybe it's more concretely clear when dealing with functions of just one variable.

All kinds of practical problems can crop up in unconstrained optimization which we won't worry about here. One is that f and its derivatives can be expensive to compute, causing people to worry about how many evaluations are needed to find a maximum. A second problem is that there can be (infinitely) many local maxima which are not global maxima, causing people to despair. We're going to ignore these issues, which are as big or bigger problems for the constrained case.
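As a concrete sketch of the unconstrained recipe described above (none of this code is from the tutorial; sympy and the sample function are illustrative choices for this rewrite), we can find a critical point by setting the partial derivatives to zero, then classify it with the second derivatives:

```python
import sympy as sp

x, y = sp.symbols("x y", real=True)

# An illustrative (differentiable) objective: a downward paraboloid.
f = -x**2 - 2 * y**2

# Extreme points: where every partial derivative is zero.
critical = sp.solve([sp.diff(f, x), sp.diff(f, y)], [x, y], dict=True)
print(critical)  # [{x: 0, y: 0}]

# Second-derivative check: a negative-definite Hessian means a local maximum.
H = sp.hessian(f, (x, y))
print(H.is_negative_definite)  # True
```

The same pattern works for any number of variables: one equation per partial derivative, then a Hessian check to tell maxima, minima, and saddle points apart.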
In constrained optimization, we have the same function f to maximize as before. However, we also have restrictions on which points we are interested in; the points which satisfy our constraints are referred to as the feasible region. A simple way to constrain the feasible region is to add boundaries, such as insisting that each coordinate x_i be positive. Boundaries complicate matters because extreme points on the boundaries will not, in general, meet the zero-derivative criterion, and so must be searched for in other ways. You probably had to deal with boundaries in calculus class. Boundaries correspond to inequality constraints, which we will say relatively little about in this tutorial. Lagrange multipliers can help deal with both equality constraints and inequality constraints, but for the majority of the tutorial we will be concerned only with equality constraints, which restrict the feasible region to points lying on some surface inside the domain. Each constraint will be given by a function g(x), and we will only be interested in points where g(x) = 0.1

1 If you want a constraint of the form g(x) = c, you can just move the c to the left: g(x) - c = 0.
2 Trial by Example

Let's do some example maximizations. First, we'll have an example of not using Lagrange multipliers.

2.1 A No-Brainer

Let's say you want to know the maximum value of a function f(x) subject to the constraint x = 1 (see figure 1). It isn't the most challenging example, but we'll come back to it once the Lagrange multipliers show up. Here we can just substitute the value x = 1 into f and get our maximum value, f(1). However, it highlights a basic way that we might go about dealing with constraints: substitution.

Figure 1: A one-dimensional domain.

2.2 Substitution

Let

    f(x, y) = -x^2 - 2y^2    (1)

This is the downward cupping paraboloid shown in figure 2. The unconstrained maximum is clearly at (0, 0), while the unconstrained minimum is not even defined (you can find points with f as low as you like). Now let's say we constrain x and y to lie on the unit circle. To do this, we add the constraint

    g(x, y) = x^2 + y^2 - 1 = 0    (2)

Then, we maximize (or minimize) f by first solving the constraint for one of the variables explicitly:

    x^2 = 1 - y^2    (3)

and substituting into f:

    f = -(1 - y^2) - 2y^2    (4)
      = -1 - y^2             (5)

Then, we're back to a one-dimensional unconstrained problem, which has a maximum at

    y = 0,  x = ±1,  f = -1    (6)

Finding the constrained minimum here is slightly more complex, and highlights one weakness of this approach: the one-dimensional problem is still actually somewhat constrained, in that y must be in [-1, 1]. The minimum value occurs at both of these boundary points: at y = ±1 we have x = 0 and f = -2. Intuitively, we're stuck on a circle which trades x^2 for y^2 linearly, while y^2 costs twice as much in f; so f is largest with all the weight on x^2 and smallest with all of it on y^2.

Figure 2: The paraboloid f(x, y) = -x^2 - 2y^2.

2.3 Inflating Balloons

The main problem with substitution is that, despite our stunning success in the last section, it's usually very hard to do. Rather than inventing a new problem and discovering this the hard way, let's stick with the f from the last section and consider how the Lagrange multiplier method would work.

Figure 3(left) shows a contour plot of f. The contours, or level curves, are ellipses which are wide in the x dimension, and which represent points having the same value of f. The arrows point in the directions of greatest increase of f; note that the direction of greatest increase is always perpendicular to the level curves. The dark circle in the middle is the feasible region satisfying the constraint x^2 + y^2 = 1.

Figure 3: The paraboloid f(x, y) = -x^2 - 2y^2, along with two different constraints. Left is the unit circle x^2 + y^2 = 1; right is the line x + y = 1.

Consider what happens as the ellipse expands. Imagine the ellipses as snapshots of an inflating balloon: the size-zero ellipse has the highest value of f. At first, the values of f are high, but the level curves do not intersect the feasible circle. As the balloon expands, the value of f along the ellipse decreases. When the long axis of the ellipse finally touches the circle at (±1, 0), we have the maximum constrained value for f: any larger, and no point on the level curve will be in the feasible circle. The ellipse then continues to grow, intersecting the circle at four points, until the ellipse surrounds the circle and only the short axis endpoints (0, ±1) are still touching. This is the minimum (f = -2); beyond this value, the level curves do not intersect the circle at all. These are the same extremes we found by substitution, but feel free to verify them by checking derivatives!

At both the minimum and the maximum, the two curves are tangent,2 as in figure 4(left). This shouldn't be too surprising. If the two curves were not tangent, then they would cross, as in figure 4(right). Imagine a point (call it p) where they cross. Since the contour (light curve) is a level curve, the points to one side of the contour have greater value of f, while the points on the other side have lower value. Since we may move anywhere along the constraint circle and still satisfy the constraint, we can nudge p along the circle to either side of the contour, to either increase or decrease f. So p cannot be an extreme point.

2 Differentiable curves which touch but do not cross are tangent.

5 0. and nudge along that direction.96 −0.75 0 0 0. while the partial derivative with respect to recovers the constraint . . way of looking at the tangent requirement.5 0 0.1 −1. This should also make sense – the gradient is the direction of steepest ascent.04 −1. the gradient of will be perpendicular to as well. Consider again the zooms in ﬁgure 4.08 −0. but at least it illustrates the method. The contour is a level curve. So here’s another. equivalent.7 −0.0.8 0. It was so easy to solve with substition that the Lagrange multiplier method isn’t any easier (if fact it’s harder). Let’s re-solve the circle-paraboloid problem from above using this method. there is no local move you can make to increase which does not take you out of the feasible region .5 0. The is our ﬁrst Lagrange multiplier.04 0.08 1 0.94 −0. If the direction of steepest increase and decrease take you off perpendicularly off of . really!) rests on it. Formally.5 1 1. we can write our claim that the normal vectors are parallel at an extreme point as: So.02 −1 −0. The two curves being tangent at a point is equivalent to the normal vectors being parallel at that point.   # &        &  STP I P P VI £ ¡ R RR £ r ¥R ¤ F£ R¥ £ & P  £ '¡  '¢   £¡ 2 0  £ 3'¡ & & P  £ ¦¤¡  ¦¤¢   £¡ &  £ ¦¤¡  &  ¡ #  #  2 & & P #  0  § £ ¤¡  #  £¡ ¤¢  # 2   #     0  ¢  # ¡ & 0'¡  £ 0  £¡ '¢  # 0  § £ %f'¡  &  2 0  £ 31'¡ 0  § £ f'¡  #  0 0  § £ ¤¡  £ & &     &   & (7) (8) (9) (10) (11) (12) (13) (14) (15) . then.8 −0. Now think about the normal vectors of the contour and constraint curves.06 −0.1 1. maxima.92 −1. which generalizes better.02 −0.6 −0.06 0.65 −1 −0. and. But that means that at an extreme point . the direction of steepest ascent had better be perpendicular to . while it is ﬁne for to have a non-zero gradient.04 0. get a non-zero direction along .6 Figure 4: Level curves of the paraboloid. is normal to it.7 −0. 
our method for ﬁnding extreme points3 which satisfy the constraints is to look for point where the following equations hold true: We can compactly represent both equations at once by writing the Lagrangian: and asking for points where The partial derivatives with respect to recover the parallel-normals equations. even if you are not at an unconstrained maximum of .5 −1. Otherwise. This intuition is very important. or neither. At a solution .02 0. The Lagrangian is: and we want 3 We can sort out after we ﬁnd them which are minima. intersecting the constraint circle.98 −0. we must be on .5 0. the entire enterprise of Lagrange multipliers (which are coming soon. and so the gradient of . we can project onto .5 −1 −0.5 −0. increasing but staying on .
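The parallel-normals claim is easy to spot-check at the constrained maximum (1, 0) of the circle-paraboloid problem. The following sketch (sympy is an illustrative choice, not something the tutorial uses) evaluates both gradients there and reads off the multiplier:

```python
import sympy as sp

x, y = sp.symbols("x y", real=True)
f = -x**2 - 2 * y**2        # the paraboloid
g = x**2 + y**2 - 1         # the circle constraint, g = 0

def grad(h):
    return sp.Matrix([sp.diff(h, x), sp.diff(h, y)])

# Evaluate both normal vectors at the constrained maximum (1, 0).
gf = grad(f).subs({x: 1, y: 0})   # gradient of f: (-2, 0)
gg = grad(g).subs({x: 1, y: 0})   # gradient of g: (2, 0)

# They are parallel; the scalar relating them is the Lagrange multiplier.
lam = sp.simplify(gf[0] / gg[0])
print(lam)             # -1
print(gf == lam * gg)  # True
```

The same check at (0, 1) gives a different multiplier; λ is part of the solution, not something fixed in advance.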

Let's re-solve the circle-paraboloid problem from above using this method. It was so easy to solve with substitution that the Lagrange multiplier method isn't any easier (in fact it's harder), but at least it illustrates the method. The Lagrangian is:

    Λ(x, y, λ) = -x^2 - 2y^2 - λ(x^2 + y^2 - 1)    (12)

and we want ∇Λ(x, y, λ) = 0, which gives the equations

    ∂Λ/∂x = -2x - 2λx = 0           (13)
    ∂Λ/∂y = -4y - 2λy = 0           (14)
    ∂Λ/∂λ = -(x^2 + y^2 - 1) = 0    (15)

We can see from the first two equations that we must have either x = 0 or λ = -1, and either y = 0 or λ = -2 (x and y cannot both be zero, since the constraint would be violated). If x = 0, then the constraint gives y = ±1, and λ = -2. If y = 0, then x = ±1, and λ = -1. These are the minimum and maximum, respectively. Actually solving such a system of equations can be hard, but note that the Lagrangian is a function of n + 1 variables (the n components of x, plus λ), so we do have the right number of equations to hope for unique, existing solutions: n from the partial derivatives with respect to x, plus one from the partial derivative with respect to λ.

To recap: in unconstrained optimization, we knew we were at a local extreme because the gradient of f was zero; there was no local direction of motion which increased f. Along came the constraint and dashed all hopes of the gradient being completely zero at a constrained extreme, because we were confined to g. However, we still wanted there to be no direction of increase inside the feasible region. This occurred whenever the gradient at x*, while probably not zero, had no components which were not along the normal of g at x*. In other words, ∇f does not have to be zero at a solution x*; it just has to be entirely contained in the (one-dimensional) subspace spanned by ∇g(x*).

Now let's say we instead want the constraint that x and y sum to 1 (x + y = 1). Then we have the situation in figure 3(right). Before we do anything numeric, convince yourself from the picture that the maximum is going to occur in the (+,+) quadrant, at a point where the line is tangent to a level curve of f. Also convince yourself that the minimum will not be defined: values of f get arbitrarily low in both directions along the line away from the maximum. Again, we can write the Lagrangian, differentiate, and solve for zero:

    Λ(x, y, λ) = -x^2 - 2y^2 - λ(x + y - 1)    (16)
    ∂Λ/∂x = -2x - λ = 0                        (17)
    ∂Λ/∂y = -4y - λ = 0                        (18)
    ∂Λ/∂λ = -(x + y - 1) = 0                   (19)

which gives x = -λ/2 and y = -λ/4. Since x and y sum to one, -3λ/4 = 1, so λ = -4/3, x = 2/3, and y = 1/3. At those values, f = -(2/3)^2 - 2(1/3)^2 = -2/3, the constrained maximum.

2.4 More Dimensions

If we want to have multiple constraints, this method still works perfectly well, though it gets harder to draw the pictures to illustrate it. Formally, we will insist that a solution satisfy each constraint g_i(x) = 0, and the last statement of the recap generalizes: ∇f no longer has to be parallel to a single ∇g; it just has to be entirely contained in the subspace spanned by the ∇g_i(x*). (We will also want the gradient of g to be non-zero along the
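To see the spanning condition with more than one constraint, here is a small made-up three-dimensional instance (the objective and both constraints are this rewrite's invention, not the tutorial's): the paraboloid idea extended to three variables, with two linear constraints and one multiplier per constraint. We solve ∇Λ = 0 and then check that ∇f lands in the span of ∇g1 and ∇g2:

```python
import sympy as sp

x, y, z, l1, l2 = sp.symbols("x y z l1 l2", real=True)

# Hypothetical example: 3-D paraboloid with two equality constraints.
f = -x**2 - 2 * y**2 - 3 * z**2
g1 = x + y + z - 1   # the coordinates must sum to 1
g2 = z               # z must be 0

# One multiplier per constraint in the Lagrangian.
L = f - l1 * g1 - l2 * g2
sol = sp.solve([sp.diff(L, v) for v in (x, y, z, l1, l2)],
               [x, y, z, l1, l2], dict=True)[0]
print(sol)

# At the solution, grad f is a linear combination of grad g1 and grad g2.
def grad(h):
    return sp.Matrix([sp.diff(h, v) for v in (x, y, z)])

assert grad(f).subs(sol) == (l1 * grad(g1) + l2 * grad(g2)).subs(sol)
```

Because g2 pins z to 0, the solution reduces to the line-constraint example above: x = 2/3, y = 1/3, with l1 = -4/3 and l2 soaking up the z-direction component.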