Dynamic Optimization: The Calculus of Variations and Optimal Control in Economics and Management (Series Volume 4)
Morton I. Kamien and Nancy L. Schwartz, Northwestern University
NORTH-HOLLAND · New York · Amsterdam · Oxford

Section 20
Dynamic Programming

A third approach to dynamic optimization problems, called dynamic programming, was developed by Richard Bellman. It has been fruitfully applied to both discrete and continuous time problems. We discuss only the latter.

The basic principle of dynamic programming, called the principle of optimality, can be put roughly as follows. An optimal path has the property that, whatever the initial conditions and the control values over some initial period, the control (or decision) variables over the remaining period must be optimal for the remaining problem, with the state resulting from the early decisions regarded as the initial condition.

Consider

    max ∫_0^T f(t,x,u) dt + φ(x(T),T)    (1)
    subject to x' = g(t,x,u), x(0) = x₀.

Define the optimal value function J(t₀,x₀) as the best value that can be obtained starting at time t₀ in state x₀. This function is defined for all 0 ≤ t₀ ≤ T and for any feasible state x₀ that may arise. Thus,

    J(t₀,x₀) = max ∫_{t₀}^T f(t,x,u) dt + φ(x(T),T)    (2)
    subject to x' = g(t,x,u), x(t₀) = x₀.

Then, in particular,

    J(T,x(T)) = φ(x(T),T).    (3)

We break up the integral in (2) as

    J(t₀,x₀) = max [ ∫_{t₀}^{t₀+Δt} f dt + ∫_{t₀+Δt}^T f dt + φ(x(T),T) ],    (4)

where Δt is taken to be very small and positive. Next, by the dynamic programming principle, we argue that the control function u(t), t₀+Δt ≤ t ≤ T, must be optimal for the problem beginning at time t₀+Δt in state x(t₀+Δt) = x₀+Δx, where Δx is the change in the state over the short interval. The last two terms of (4) therefore sum to J(t₀+Δt, x₀+Δx), while the first integral is approximately f(t₀,x₀,u)Δt, so

    J(t₀,x₀) = max [ f(t₀,x₀,u)Δt + J(t₀+Δt, x₀+Δx) ].    (5)

Assuming that J is continuously differentiable, expand J(t₀+Δt, x₀+Δx) by Taylor's theorem around (t₀,x₀):

    J(t₀+Δt, x₀+Δx) = J(t₀,x₀) + J_t(t₀,x₀)Δt + J_x(t₀,x₀)Δx + h.o.t. (higher-order terms).    (6)

Substitute (6) into (5), subtract J(t₀,x₀) from both sides, note that Δx is approximately g(t₀,x₀,u)Δt, divide through by Δt, and let Δt → 0. Since the argument applies at any (t,x) from which the problem might be begun, the result is

    -J_t(t,x) = max_u [ f(t,x,u) + J_x(t,x)g(t,x,u) ],    (7)

a partial differential equation to be satisfied by the optimal value function J, with boundary condition (3).

As an example, consider the problem

    min ∫_0^∞ e^{-rt}(ax² + bu²) dt    (10)
    subject to x' = u, x(0) = x₀ > 0, where a > 0, b > 0.

With f = e^{-rt}(ax² + bu²) and g = u, and with minimization replacing maximization, (7) becomes

    -J_t = min_u [ e^{-rt}(ax² + bu²) + J_x u ].    (11)

Differentiate to find the optimal u:

    2e^{-rt}bu + J_x = 0, so u = -J_x e^{rt}/2b.    (12)

Substituting for u from (12) into (11) yields

    -J_t = e^{-rt}ax² + J_x²e^{rt}/4b - J_x²e^{rt}/2b.

Collecting terms and multiplying through by e^{rt} gives

    ax² - J_x²e^{2rt}/4b + e^{rt}J_t = 0.    (13)

To solve the partial differential equation, we propose a general form of the solution to see if there is some set of parameter values for which the proposed solution satisfies the partial differential equation. Let us "try"

    J(t,x) = e^{-rt}Ax²,    (14)

where A is a constant to be determined. From the problem statement (10), the optimal value must be positive, so A > 0. For (14) compute

    J_t = -re^{-rt}Ax²  and  J_x = 2e^{-rt}Ax.

Substituting into (13) gives

    ax² - 4e^{-2rt}A²x²e^{2rt}/4b - rAx² = 0.

Simplify:

    A²/b + rA - a = 0.    (15)

Thus, (14) solves (13) if A is the positive root of the quadratic equation (15); that is,

    A = [-r + (r² + 4a/b)^{1/2}] b/2.    (16)

The optimal control is now determined from (14) and (12) to be

    u = -Ax/b.    (17)

Note that (17) gives the optimal control in terms of the state variable; this is the so-called feedback form. If a solution in terms of t is desired, one can recall that x' = u = -Ax/b and solve this differential equation for x(t) = x₀e^{-At/b}, from which u is readily determined.
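The feedback solution (16)–(17) is easy to evaluate numerically. The following sketch (in Python, with illustrative parameter values a = 1, b = 1, r = 0.05, and x0 = 1 that are not from the text) computes the positive root A, simulates the closed-loop path x(t) = x₀e^{-At/b}, and checks that the discounted cost along that path is approximately J(0,x₀) = Ax₀², as the trial solution (14) predicts.

```python
import numpy as np

# Illustrative parameter values (assumed for this sketch; not from the text)
a, b, r, x0 = 1.0, 1.0, 0.05, 1.0

# Positive root of A²/b + rA - a = 0, as in (16)
A = 0.5 * b * (-r + np.sqrt(r**2 + 4.0 * a / b))
assert abs(A**2 / b + r * A - a) < 1e-12  # A satisfies the quadratic (15)

# Closed-loop path under the feedback rule (17): x' = u = -Ax/b
t = np.linspace(0.0, 10.0, 2001)
x = x0 * np.exp(-A * t / b)
u = -A * x / b

# Discounted cost along the path; it should approximate J(0, x0) = A * x0²
integrand = np.exp(-r * t) * (a * x**2 + b * u**2)
cost = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(t))  # trapezoid rule
print("A =", A)
print("discounted cost ≈", cost, " vs  A*x0² =", A * x0**2)
```

The two printed numbers agree closely; truncating the horizon at t = 10 is harmless here because the integrand decays like e^{-(r+2A/b)t}.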
A simpler form of the optimality conditions is available for infinite horizon autonomous problems, such as the preceding example. An infinite horizon autonomous problem can be written in the form

    max ∫_{t₀}^∞ e^{-rt} f(x,u) dt    (18)
    subject to x' = g(x,u).

Hence,

    J(t₀,x₀) = max ∫_{t₀}^∞ e^{-rt} f(x,u) dt = e^{-rt₀} max ∫_{t₀}^∞ e^{-r(t-t₀)} f(x,u) dt.

The value of the integral on the right depends on the initial state, but it is independent of the initial time. Now let

    V(x₀) = max ∫_{t₀}^∞ e^{-r(t-t₀)} f(x,u) dt.

Then

    J(t,x) = e^{-rt}V(x),   J_t = -re^{-rt}V(x),   J_x = e^{-rt}V'(x).

Substituting into (7) and multiplying through by e^{rt} yields the basic ordinary differential equation

    rV(x) = max_u [ f(x,u) + V'(x)g(x,u) ]    (19)

obeyed by the optimal current value function V(x) associated with problem (18).

EXERCISES

1. Solve the example of this section using (19).

2. Solve by dynamic programming:

    min ∫_0^T (c₁u² + c₂x) dt
    subject to x' = u, x(0) = 0, x(T) = B.

[Hints: c₂x - J_x²/4c₁ + J_t = 0. Try a solution of the form J(t,x) = a + bxt + hx²/t + kt³, where a, b, h, and k are constants to be determined. Compare the solution with that found earlier by other methods.]

FURTHER READING

For an introduction to dynamic programming, see Bellman, Howard, or Nemhauser. Bellman and Dreyfus show the relationships among dynamic programming, the calculus of variations, and optimal control. Beckmann discusses some applications of dynamic programming to economics.

Our discussion has focused on continuous time dynamic programming. In applications, dynamic programming's greatest strength may be in discrete problems, particularly where the underlying functions are not smooth and "nice." The dynamic programming approach permits very efficient computer algorithms to be developed for such problems. This topic is of great interest but lies outside the primary thrust of this book. See any of the references mentioned above for an introduction to this area.

Section 21
Stochastic Optimal Control

Stochastic features have appeared in many of our examples, including uncertain consumer or machine lifetime and uncertain rival behavior. The literature contains further examples, with uncertainty regarding, for example, the timing of illness, catastrophe, expropriation, or technical breakthrough. These applications involve a known probability distribution function that typically depends on time and occasionally on a state variable. The widespread use of this approach attests to its serviceability, but it will not do for all stochastic problems of interest. Another approach to stochastic modeling, which has become especially prevalent in modern finance, is the topic of this section.

The movement of the state variable may not be fully deterministic but may be subject to stochastic disturbance. To consider such problems, we make some assertions about the stochastic calculus of Itô, which forms the basis for the analysis. The dynamic programming tools of the last section will also be used; since a random element enters into the movement of the system, the optimal control must be stated in feedback form, in terms of the state of the system, rather than in terms of time alone (because the state that will be obtained cannot be known in advance, owing to the stochastic disturbance).

Instead of the usual differential equation x' = g(t,x,u), we have the formal stochastic differential equation

    dx = g(t,x,u) dt + σ(t,x,u) dz,    (1)

where dz is the increment of a stochastic process z that obeys what is called Brownian motion or white noise, or is a Wiener process. The expected rate of change is g, but there is a disturbance term. Briefly, for a Wiener process z, and for any partition t₀, t₁, t₂, ... of the time interval, the random variables z(t₁) - z(t₀), z(t₂) - z(t₁), z(t₃) - z(t₂), ... are independently and normally distributed with mean zero and variances t₁ - t₀, t₂ - t₁, t₃ - t₂, ..., respectively. It then turns out that the differential elements dt and dz have the following multiplication table:

          dz   dt
    dz |  dt    0        (2)
    dt |   0    0
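A quick Monte Carlo check (a Python sketch with an assumed step size and sample count, not from the text) illustrates the heuristic behind table (2): over a short interval Δt the Wiener increment Δz is normal with mean zero and variance Δt, so (Δz)² has mean Δt, while Δz·Δt and (Δt)² are of smaller order than Δt and are discarded.

```python
import numpy as np

rng = np.random.default_rng(0)          # assumed seed, for reproducibility
dt, n = 1e-3, 1_000_000                 # assumed step size and sample count

dz = rng.normal(0.0, np.sqrt(dt), n)    # Wiener increments: mean 0, variance dt

print("mean of (dz)² :", (dz**2).mean(), "   (compare with dt =", dt, ")")
print("mean of dz·dt :", (dz * dt).mean(), "   (order dt^{3/2})")
print("(dt)²         :", dt**2, "   (order dt²)")
```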
Since (dz)² = dt, the differential of a function y = F(t,z), where z is a Wiener process, will include a second partial derivative. In particular, in expanding by Taylor series, we get

    dy = F_t dt + F_z dz + ½F_tt(dt)² + F_tz dt dz + ½F_zz(dz)² + h.o.t.,

so

    dy = (F_t + ½F_zz) dt + F_z dz    (3)

on using the multiplication table (2). Subscripts indicate partial derivatives. Similarly, if

    y = F(t,x),    (4)

where x obeys (1), then y is stochastic since x is. The stochastic differential of y is found using Taylor's theorem and (2). We get

    dy = F_t dt + F_x dx + ½F_xx(dx)²,    (5)

where dx is given by (1). This rule (5) is known as Itô's theorem. Of course, (3) is a special case of (5). One can substitute from (1) into (5) and use (2) to simplify the result, obtaining the equivalent statement

    dy = (F_t + F_x g + ½F_xx σ²) dt + F_x σ dz.    (6)

The Itô stochastic calculus extends to many variables, with many stochastic processes. For instance, let x = (x₁, ..., xₙ) and

    dx_i = g_i(t,x) dt + Σ_{j=1}^n σ_ij(t,x) dz_j,   i = 1, ..., n.    (7)

Let the Wiener processes dz_i and dz_j have correlation coefficient ρ_ij. Finally, let y = F(t,x). Then Itô's theorem gives the rule for the stochastic differential:

    dy = Σ_{i=1}^n (∂F/∂x_i) dx_i + (∂F/∂t) dt + ½ Σ_{i=1}^n Σ_{j=1}^n (∂²F/∂x_i∂x_j) dx_i dx_j,    (8)

where the dx_i are given in (7) and the products dx_i dx_j are computed using (7) and the multiplication table

    dz_i dz_j = ρ_ij dt,   i,j = 1, ..., n,    dz_i dt = 0,   i = 1, ..., n,    (9)

where the correlation coefficient ρ_ii = 1 for all i = 1, ..., n.

The rules for integration of a stochastic differential equation are different from the ordinary calculus rules. For example, in the usual case, if dy = y dx, then y = e^x. But in the stochastic calculus, the differential equation dy = y dz has solution y = e^{z - t/2}. The method of verification is the same in each case; one differentiates the proposed solution by the appropriate rules and checks whether the differential equation is satisfied. To verify the stochastic example, write

    y = e^{z - t/2} = F(t,z)

and differentiate using (3). Since F_z = y, F_zz = y, and F_t = -y/2, we get

    dy = (-y/2 + y/2) dt + y dz = y dz,

as claimed. As a second example, the stochastic differential equation

    dx = ax dt + bx dz

has solution

    x(t) = x₀ e^{(a - b²/2)t + bz}.

To verify, we denote the right side by F(t,z), compute F_t = (a - b²/2)x, F_z = bx, F_zz = b²x, and plug into (3) to get

    dx = [(a - b²/2)x + b²x/2] dt + bx dz = ax dt + bx dz,

as claimed. As a third example, we seek the stochastic differential of x = Q/P, where Q and P obey

    dP/P = a dt + b dz,    (10)
    dQ/Q = c dt,    (11)

with a, b, c given constants. In the ordinary deterministic case, we should have dx/x = dQ/Q - dP/P. However, using Itô's theorem (8), we compute

    dx = x_P dP + x_Q dQ + ½x_PP(dP)² + x_PQ dP dQ + ½x_QQ(dQ)²
       = -(Q/P²) dP + dQ/P + (Q/P³)(dP)² - dP dQ/P² + 0.

Multiplying through by 1/x = P/Q gives

    dx/x = -dP/P + dQ/Q + (dP/P)² - (dP/P)(dQ/Q).

Substituting from (10) and (11) and simplifying gives

    dx/x = (c + b² - a) dt - b dz.
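The second example can also be checked by simulation. The sketch below (Python, with assumed illustrative values a = 0.1, b = 0.3, x₀ = 1) integrates dx = ax dt + bx dz by the simple Euler-Maruyama scheme along one sampled path of z and compares the result with the closed form x(t) = x₀e^{(a-b²/2)t + bz(t)}; the two agree up to discretization error.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed illustrative parameters (not from the text)
a, b, x0 = 0.1, 0.3, 1.0
T, n = 1.0, 10_000
dt = T / n

dz = rng.normal(0.0, np.sqrt(dt), n)   # increments of the Wiener process z
z = np.cumsum(dz)                      # z(t) sampled on the grid
t = dt * np.arange(1, n + 1)

# Euler-Maruyama integration of dx = a x dt + b x dz
x = np.empty(n + 1)
x[0] = x0
for k in range(n):
    x[k + 1] = x[k] + a * x[k] * dt + b * x[k] * dz[k]

# Closed-form solution from the second example
x_exact = x0 * np.exp((a - 0.5 * b**2) * t + b * z)

print("terminal values:", x[-1], "vs", x_exact[-1])
```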
Now consider the stochastic optimal control problem

    max E[ ∫_0^T f(t,x,u) dt + φ(x(T),T) ]    (12)
    subject to dx = g(t,x,u) dt + σ(t,x,u) dz, x(0) = x₀,

where E denotes the expected value. To find necessary conditions for a solution, we follow the method of the preceding section. Define J(t₀,x₀) to be the maximum expected value obtainable in a problem of the form of (12), starting at time t₀ in state x(t₀) = x₀:

    J(t₀,x₀) = max E[ ∫_{t₀}^T f(t,x,u) dt + φ(x(T),T) ]    (13)
    subject to dx = g dt + σ dz, x(t₀) = x₀.

Then, as in (20.2)-(20.6), we obtain

    J(t,x) = max E[ f(t,x,u)Δt + J(t+Δt, x+Δx) ].    (14)

Assuming that J is twice continuously differentiable, we expand the function on the right around (t,x):

    J(t+Δt, x+Δx) = J(t,x) + J_t(t,x)Δt + J_x(t,x)Δx + ½J_xx(t,x)(Δx)² + h.o.t.    (15)

But recalling the differential constraint in (12), we have approximately

    Δx = gΔt + σΔz,    (16)
    (Δx)² = g²(Δt)² + σ²(Δz)² + 2gσΔtΔz = σ²Δt + h.o.t.,

where use has been made of (2) and the foresight that we will soon divide by Δt and then let Δt → 0. Substitute from (16) into (15) and then put the result into (14) to get

    J(t,x) = max E[ fΔt + J + J_tΔt + J_x gΔt + J_x σΔz + ½J_xx σ²Δt + h.o.t. ].    (17)

Note that the stochastic differential of J is being computed in the process. Now take the expectation in (17); the only stochastic term in (17) is Δz and its expectation is zero by assumption. Also, subtract J(t,x) from each side, divide through by Δt, and finally let Δt → 0 to get

    -J_t(t,x) = max_u [ f(t,x,u) + J_x(t,x)g(t,x,u) + ½σ²(t,x,u)J_xx(t,x) ].    (18)

This is the basic equation for the stochastic optimal control problem (12). It has boundary condition

    J(T,x(T)) = φ(x(T),T).    (19)

Conditions (18) and (19) should be compared with (20.7) and (20.3).

To illustrate the use of these necessary conditions, consider a stochastic modification of the example of Section 20:

    min E ∫_0^∞ e^{-rt}(ax² + bu²) dt    (20)
    subject to dx = u dt + σx dz,   a > 0, b > 0, σ > 0.

Note that σ is a constant parameter here; the function σ(t,x,u) = σx. Substituting the special form of (20) into (18) yields

    -J_t = min_u [ e^{-rt}(ax² + bu²) + J_x u + ½σ²x²J_xx ].    (21)

The minimizing u is

    u = -J_x e^{rt}/2b.    (22)

Substituting (22) into (21) and simplifying gives

    -e^{rt}J_t = ax² - J_x²e^{2rt}/4b + ½σ²x²J_xx e^{rt}.    (23)

Now try a solution to this partial differential equation of the same form as that which worked in the deterministic case:

    J(t,x) = e^{-rt}Ax².    (24)

Compute the required partial derivatives of (24), substitute into (23), and simplify to find that A must satisfy

    A²/b + (r - σ²)A - a = 0.    (25)

Since only the positive root made sense before, we take the positive root here:

    A = {σ² - r + [(r - σ²)² + 4a/b]^{1/2}} b/2.    (26)

Again using (24) in (22) gives the optimal control,

    u = -Ax/b,    (27)

where A is given by (26). Compare this with the findings in Section 20.
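The effect of the disturbance on the feedback coefficient can be seen numerically. The sketch below (Python, with assumed illustrative values a = 1, b = 1, r = 0.05, not from the text) evaluates A from (26) for several values of σ and compares it with the deterministic coefficient from (20.16); at σ = 0 the two coincide, and A rises with σ, so the rule u = -Ax/b drives the state toward zero more aggressively the noisier the system.

```python
import numpy as np

def A_deterministic(a, b, r):
    # Positive root of A²/b + rA - a = 0, as in (20.16)
    return 0.5 * b * (-r + np.sqrt(r**2 + 4.0 * a / b))

def A_stochastic(a, b, r, sigma):
    # Positive root of A²/b + (r - σ²)A - a = 0, as in (25)-(26)
    return 0.5 * b * (sigma**2 - r + np.sqrt((r - sigma**2)**2 + 4.0 * a / b))

# Assumed illustrative parameters (not from the text)
a, b, r = 1.0, 1.0, 0.05
for sigma in (0.0, 0.2, 0.4):
    print(f"sigma = {sigma:.1f}:  A = {A_stochastic(a, b, r, sigma):.4f}"
          f"  (deterministic A = {A_deterministic(a, b, r):.4f})")
```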
A simpler form of (18) is available for problems that are autonomous and have an infinite time horizon. The optimal expected return can then be expressed in current value terms, independently of t. The procedure was followed in the preceding section, and the results are analogous. Let

    V(x₀) = max E ∫_{t₀}^∞ e^{-r(t-t₀)} f(x,u) dt    (28)
    subject to dx = g(x,u) dt + σ(x,u) dz, x(t₀) = x₀,

so

    J(t,x) = e^{-rt}V(x).    (29)

Substituting from (29) into (18) gives

    rV(x) = max_u [ f(x,u) + V'(x)g(x,u) + ½σ²(x,u)V''(x) ],    (30)

which should be compared with (20.19).

The next example, based on work by Merton, concerns allocating personal wealth among current consumption, investment in a sure or riskless asset, and investment in a risky asset, in the absence of transaction costs. Let

    W = total wealth,
    w = fraction of wealth in the risky asset,
    s = return on the sure asset,
    a = expected return on the risky asset,
    σ² = variance per unit time of the return on the risky asset,
    c = consumption,
    U(c) = c^b/b = utility function, b < 1.

The change in wealth is given by

    dW = [s(1-w)W + awW - c] dt + wWσ dz.    (31)

The deterministic portion is composed of the return on the funds in the sure asset, plus the expected return on the funds in the risky asset, less consumption. The objective is maximization of the expected discounted utility stream. For convenience, we assume an infinite horizon:

    max E ∫_0^∞ e^{-rt}(c^b/b) dt    (32)
    subject to (31) and W(0) = W₀.

This is an infinite horizon autonomous problem with one state variable W and two controls c and w. Although (30) was developed for a problem with just one state variable and one control, it is readily extended to the present case. Using the specifications of (31) and (32), (30) becomes

    rV(W) = max_{c,w} [ c^b/b + V'(W)[s(1-w)W + awW - c] + ½w²W²σ²V''(W) ].    (33)

Calculus gives the maximizing values of c and w in terms of the parameters of the problem, the state W, and the unknown function V:

    c = [V'(W)]^{1/(b-1)},    w = V'(W)(s-a)/σ²WV''(W).    (34)

We assume the optimal solution involves investment in both assets at all times. Substituting from (34) into (33) and simplifying gives

    rV(W) = (V')^{b/(b-1)}(1-b)/b + sWV' - (s-a)²(V')²/2σ²V''.    (35)

Let us "try" a solution to this nonlinear second order differential equation of the form

    V(W) = AW^b,    (36)

where A is a positive parameter to be determined. Compute the required derivatives of (36) and substitute the results into (35). After simplification, one gets

    Ab = {[r - sb - (s-a)²b/2σ²(1-b)]/(1-b)}^{b-1}.    (37)

Hence the optimal current value function is (36), with A as specified in (37). To find the optimal control functions, use (36) and (37) in (34):

    c = W(Ab)^{1/(b-1)},    w = (a-s)/(1-b)σ².    (38)

The individual consumes a constant fraction of wealth at each moment. The optimal fraction depends on all the parameters; it varies directly with the discount rate and with the riskiness of the risky asset. The optimal division of wealth between the two kinds of assets is a constant, independent of total wealth. The portion devoted to the risky asset varies directly with the expected return on the risky asset and inversely with the variance of that return.

EXERCISES

1. Solve problem (20) using the current optimal value function V(x).

2. Find a control function c(t) to

    max E ∫_0^∞ e^{-rt} c^a(t) dt
    subject to dx = (bx - c) dt + hx dz, x(0) = x₀ > 0,

where z(t) is a Wiener process.

SOLUTION. c(t) = [(r - ab)/(1-a) + ah²/2] x(t).

FURTHER READING

For stochastic problems in this text, review examples and exercises in Sections I8, I9, I11, II10, and II15, for instance. See also Cropper (1976, 1977), Dasgupta and Heal, Kamien and Schwartz (1971a, 1974a, 1977a), Long (1975), Raviv, and Robson. Dreyfus (1965, pp. 215-224) gives a derivation of the necessary conditions of this section and some examples. Arnold provides a good readable treatment of the stochastic calculus. See Merton (1969) for a more thorough discussion of the example of this section and the methodology of solution, and for analysis of the more realistic case of an individual with a finite planning horizon. Brock (1976) is an excellent "user's manual"; Exercise 2 is discussed fully by Brock. For further applications of stochastic optimal control, see Merton (1971) (consumption and asset management), Fischer (index bonds), Gonedes and Lieber (production planning), Constantinides and Richard (cash management), and Tapiero (advertising).
