OPTIMIZATION AND CONTROL
Richard Weber

Contents

DYNAMIC PROGRAMMING
1 Dynamic Programming: The Optimality Equation
  1.1 Control as optimization over time
  1.2 The principle of optimality
  1.3 Example: the shortest path problem
  1.4 The optimality equation
  1.5 Markov decision processes
2 Some Examples of Dynamic Programming
  2.1 Example: managing spending and savings
  2.2 Example: exercising a stock option
  2.3 Example: accepting the best offer
3 Dynamic Programming over the Infinite Horizon
  3.1 Discounted costs
  3.2 Example: job scheduling
  3.3 The infinite-horizon case
  3.4 The optimality equation in the infinite-horizon case
  3.5 Example: selling an asset
4 Positive Programming
  4.1 Example: possible lack of an optimal policy
  4.2 Characterization of the optimal policy
  4.3 Example: optimal gambling
  4.4 Value iteration
  4.5 Example: pharmaceutical trials
5 Negative Programming
  5.1 Stationary policies
  5.2 Characterization of the optimal policy
  5.3 Optimal stopping over a finite horizon
  5.4 Example: optimal parking
  5.5 Optimal stopping over the infinite horizon
6 Average-cost Programming
  6.1 Average-cost optimization
  6.2 Example: admission control at a queue
  6.3 Value iteration bounds
  6.4 Policy improvement

LQG SYSTEMS
7 LQ Models
  7.1 The LQ regulation model
  7.2 The Riccati recursion
  7.3 White noise disturbances
  7.4 Continuous-time LQ regulation
  7.5 Linearization of nonlinear models
8 Controllability
  8.1 Disturbances
  8.2 Tracking
  8.4 Controllability in continuous-time
  8.5 Example: broom balancing
9 Infinite Horizon Limits
  9.1 Example: satellite in a plane orbit
  9.2 Stabilizability
  9.3 Stabilizability in continuous-time
  9.4 Example: pendulum
  9.5 Infinite-horizon LQ regulation
  9.6 The [A, B, C] system
10 Observability
  10.1 Observability
  10.2 Observability in continuous-time
  10.3 Examples
  10.4 Imperfect state observation with noise
11 Kalman Filtering and Certainty Equivalence
  11.2 The Kalman filter
  11.3 Certainty equivalence
  11.4 Example: inertialess rocket with noisy position sensing

CONTINUOUS-TIME MODELS
12 Dynamic Programming in Continuous Time
  12.1 The optimality equation
  12.2 Example: LQ regulation
  12.3 Example: estate planning
  12.4 Example: harvesting
13 Pontryagin's Maximum Principle
  13.1 Heuristic derivation
  13.2 Example: bringing a particle to rest in minimal time
  13.3 Connection with Lagrangian multipliers
  13.4 Example: use of the transversality conditions
14 Applications of the Maximum Principle
  14.1 Problems with terminal conditions
  14.2 Example: monopolist
  14.3 Example: insects as optimizers
  14.4 Example: rocket thrust optimization
15 Controlled Markov Jump Processes
  15.1 The dynamic programming equation
  15.3 Uniformization in the infinite horizon case
  15.4 Example: admission control at a queue
16 Controlled Diffusion Processes
  16.1 Diffusion processes and controlled diffusion processes
  16.2 Example: LQG in continuous time
  16.4 Addendum on PMP

Schedules

The first 6 lectures are devoted to dynamic programming in discrete-time and cover both finite and infinite-horizon problems; discounted-cost, positive and negative programming; value iteration and policy improvement.

The next 5 lectures are devoted to the LQG model (linear systems, quadratic costs) and cover the important ideas of controllability and observability, and the Riccati equation.

The final 5 lectures are devoted to continuous-time models and include treatment of Pontryagin's maximum principle and the Hamiltonian, and Markov decision processes.

Each of the 16 lectures is designed to be somewhat self-contained, and so there will be one lecture on 'Negative Programming', one on 'Controllability', etc. Examples and applications are important in this course, so there are one or more worked examples in each lecture.

Examples sheets

There are three examples sheets, corresponding to the thirds of the course. There are two or three questions for each lecture, some theoretical and some of a problem nature. Each question is marked to indicate the lecture with which it is associated.

Lecture Notes and Handouts

There are printed lecture notes for the course and other occasional handouts. There are sheets summarising notation and what you are expected to know for the exams. The notes include a list of keywords and I will be drawing your attention to these as we go along. If you have a good grasp of the meaning of each of these keywords, then you will be well on your way to understanding the important concepts of the course.

WWW pages

http://www.statslab.cam.ac.uk/~rrw1/oc/index.html

Books

The following books are recommended.

D. P. Bertsekas, Dynamic Programming, Prentice Hall, 1987.
D. P. Bertsekas, Dynamic Programming and Optimal Control.
L. M. Hocking, Optimal Control: An introduction to the theory and applications, Oxford.
S. Ross, Introduction to Stochastic Dynamic Programming, Academic Press, 1983.
P. Whittle, Optimization Over Time, Volumes I and II, Wiley, 1982-83.

Ross's book is probably the easiest to read.
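However, it only covers Part I of the course. Whittle's book is good for Part II and Hocking's book is good for Part III. The books can be found in Sections 3B and 3D of the DPMMS library.

Notation differs from book to book. My notation will be closest to that of Whittle's books and consistent throughout. For example, I will always denote a minimal cost function by $F(\cdot)$ (whereas, in the recommended books you will find $F$, $V$, $\phi$, $J$ and many other symbols used for this quantity).

1 Dynamic Programming: The Optimality Equation

We introduce the idea of dynamic programming and the principle of optimality. We give notation for state-structured models, and introduce ideas of feedback, open-loop, and closed-loop controls, a Markov decision process, and the idea that it can be useful to model things in terms of time to go.

1.1 Control as optimization over time

Optimization is a key tool in modelling. Sometimes it is important to solve a problem optimally. Other times either a near-optimal solution is good enough, or the real problem does not have a single criterion by which a solution can be judged. However, even then optimization is useful as a way to test thinking. If the 'optimal' solution is ridiculous it may suggest ways in which both modelling and thinking can be refined.

Control theory is concerned with dynamic systems and their optimization over time. It accounts for the fact that a dynamic system may evolve stochastically and that key variables may be unknown or imperfectly observed (as we see, for instance, in the UK economy).

This contrasts with the optimization models in the IB course (such as those for LP and network flow models); these were static and nothing was random or hidden. It is these three new features: dynamic and stochastic evolution, and imperfect state observation, that give rise to new types of optimization problem and which require new ways of thinking.

We could spend an entire lecture discussing the importance of control theory and tracing its development through the windmill, steam governor, and so on. Such 'classic control theory' is largely concerned with the question of stability, and there is much of this theory which we ignore, e.g., the Nyquist criterion and dynamic lags.

1.2 The principle of optimality

A key idea is that optimization over time can often be regarded as 'optimization in stages'. We trade off our desire to obtain the lowest possible cost at the present stage against the implication this would have for costs at future stages. The best action minimizes the sum of the cost incurred at the current stage and the least total cost that can be incurred from all subsequent stages, consequent on this decision. This is known as the Principle of Optimality.

Definition 1.1 (Principle of Optimality) From any point on an optimal trajectory, the remaining trajectory is optimal for the corresponding problem initiated at that point.

1.3 Example: the shortest path problem

Consider the 'stagecoach problem' in which a traveler wishes to minimize the length of a journey from town A to town J by first traveling to one of B, C or D, then onwards to one of E, F or G, then onwards to one of H or I, and then finally to J. Thus there are 4 'stages'. The arcs are marked with distances between towns.

[Figure: Road system for stagecoach problem]

Solution. Let $F(X)$ be the minimal distance required to reach J from X. Then clearly, $F(J) = 0$, $F(H) = 3$ and $F(I) = 4$. Next,
$$F(F) = \min[\,6 + F(H),\ 3 + F(I)\,] = 7,$$
and so on. Recursively, we obtain $F(A) = 11$ and simultaneously an optimal route, i.e., A→D→F→I→J (although it is not unique).

The study of dynamic programming dates from Richard Bellman, who wrote the first book on the subject (1957) and gave it its name. A very large number of problems can be treated this way.

1.4 The optimality equation

The optimality equation in the general case. In discrete-time $t$ takes integer values, say $t = 0, 1, \ldots$. Suppose $u_t$ is a control variable whose value is to be chosen at time $t$. Let $U_{t-1} = (u_0, \ldots, u_{t-1})$ denote the partial sequence of controls (or decisions) taken over the first $t$ stages. Suppose the cost up to the time horizon $h$ is given by
$$C = G(U_{h-1}) = G(u_0, u_1, \ldots, u_{h-1}).$$
Then the principle of optimality is expressed in the following theorem.

Theorem 1.2 (The principle of optimality) Define the functions
$$G(U_{t-1}, t) = \inf_{u_t, u_{t+1}, \ldots, u_{h-1}} G(U_{h-1}).$$
Then these obey the recursion
$$G(U_{t-1}, t) = \inf_{u_t} G(U_t, t+1), \quad t < h,$$
with terminal evaluation $G(U_{h-1}, h) = G(U_{h-1})$.

The proof is immediate from the definition of $G(U_{t-1}, t)$, i.e.,
$$G(U_{t-1}, t) = \inf_{u_t} \Big[ \inf_{u_{t+1}, \ldots, u_{h-1}} G(u_0, \ldots, u_{t-1}, u_t, u_{t+1}, \ldots, u_{h-1}) \Big].$$

The state structured case. The control variable $u_t$ is chosen on the basis of knowing $U_{t-1} = (u_0, \ldots, u_{t-1})$ (which determines everything else). But a more economical representation of the past history is often sufficient. For example, we may not need to know the entire path that has been followed up to time $t$, but only the place to which it has taken us. The idea of a state variable $x \in \mathbb{R}^d$ is that its value at $t$, denoted $x_t$, is calculable from known quantities and obeys a plant equation (or law of motion)
$$x_{t+1} = a(x_t, u_t, t).$$
Suppose we wish to minimize a cost function of the form
$$C = \sum_{t=0}^{h-1} c(x_t, u_t, t) + C_h(x_h), \tag{1.1}$$
by choice of controls $\{u_0, \ldots, u_{h-1}\}$. Define the cost from time $t$ onwards as
$$C_t = \sum_{\tau=t}^{h-1} c(x_\tau, u_\tau, \tau) + C_h(x_h), \tag{1.2}$$
and the minimal cost from time $t$ onwards as an optimization over $\{u_t, \ldots, u_{h-1}\}$ conditional on $x_t = x$,
$$F(x, t) = \inf_{u_t, \ldots, u_{h-1}} C_t.$$
Here $F(x, t)$ is the minimal future cost from time $t$ onward, given that the state is $x$ at time $t$. Then by an inductive proof, one can show as in Theorem 1.2 that
$$F(x, t) = \inf_u \big[\, c(x, u, t) + F(a(x, u, t), t+1) \,\big], \quad t < h, \tag{1.3}$$
with terminal condition $F(x, h) = C_h(x)$. Here $x$ is a generic value of $x_t$.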
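The backward recursion can be sketched in a few lines of Python for the stagecoach problem of Section 1.3. This is only an illustration: the arc lengths in `DIST` are assumed values, chosen to agree with the quantities quoted in the text ($F(H) = 3$, $F(I) = 4$, $F(F) = 7$, $F(A) = 11$), and the function names are hypothetical.

```python
# Assumed arc lengths for the stagecoach problem (the original road diagram
# is not reproduced here; these values agree with F(H)=3, F(I)=4, F(F)=7).
DIST = {
    "A": {"B": 2, "C": 4, "D": 3},
    "B": {"E": 7, "F": 4, "G": 6},
    "C": {"E": 3, "F": 2, "G": 4},
    "D": {"E": 4, "F": 1, "G": 5},
    "E": {"H": 1, "I": 4},
    "F": {"H": 6, "I": 3},
    "G": {"H": 3, "I": 3},
    "H": {"J": 3},
    "I": {"J": 4},
}


def solve_shortest_path(dist, target):
    """Return (F, best_next): F[x] is the minimal distance from x to target
    and best_next[x] is a minimizing successor of x."""
    F = {target: 0}
    best_next = {}

    def cost_to_go(x):
        # Optimality equation: F(x) = min over successors y of d(x, y) + F(y).
        if x not in F:
            options = {y: d + cost_to_go(y) for y, d in dist[x].items()}
            best_next[x] = min(options, key=options.get)
            F[x] = options[best_next[x]]
        return F[x]

    for town in dist:
        cost_to_go(town)
    return F, best_next


def optimal_route(start, target, best_next):
    """Follow the minimizing successors from start until target is reached."""
    route = [start]
    while route[-1] != target:
        route.append(best_next[route[-1]])
    return route


F, best_next = solve_shortest_path(DIST, "J")
route = optimal_route("A", "J", best_next)
print(F["A"], route)
```

The route returned depends on how ties are broken, which reflects the remark that the optimal route is not unique; every route found this way has the same total length $F(A)$.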
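The minimizing $u$ in (1.3) is the optimal control $u(x, t)$, and the values of $u_0, \ldots, u_{t-1}$ are irrelevant. The optimality equation (1.3) is also called the dynamic programming equation (DP) or Bellman equation.

The DP equation defines an optimal control problem in what is called feedback or closed loop form, with $u_t = u(x_t, t)$. This is in contrast to the open loop formulation in which $\{u_0, \ldots, u_{h-1}\}$ are to be determined all at once at time 0. A policy (or strategy) is a rule for choosing the value of the control variable under all possible circumstances as a function of the perceived circumstances. To summarise:

(i) The optimal $u_t$ is a function only of $x_t$ and $t$, i.e., $u_t = u(x_t, t)$.

(ii) The DP equation expresses the optimal $u_t$ in closed loop form. It is optimal whatever the past control policy may have been.

(iii) The DP equation is a backward recursion in time (from which we get the optimum at $h-1$, then $h-2$ and so on). The later policy is decided first.

'Life must be lived forward and understood backwards.' (Kierkegaard)

1.5 Markov decision processes

Consider now stochastic evolution. Let $X_t = (x_0, \ldots, x_t)$ and $U_t = (u_0, \ldots, u_t)$ denote the $x$ and $u$ histories at time $t$. As above, state structure is characterised by the fact that the evolution of the process is described by a state variable $x$, having value $x_t$ at time $t$, with the following properties.

(a) Markov dynamics: (i.e., the stochastic version of the plant equation)
$$P(x_{t+1} \mid X_t, U_t) = P(x_{t+1} \mid x_t, u_t).$$

(b) Decomposable cost: (i.e., cost given by (1.1)).

These assumptions define state structure. For the moment we also require:

(c) Perfect state observation: The current value of the state is observable. That is, $x_t$ is known at the time at which $u_t$ must be chosen. So, letting $W_t$ denote the observed history at time $t$, we assume $W_t = (X_t, U_{t-1})$. Note that $C$ is determined by $W_h$, so we might write $C = C(W_h)$.

These assumptions define what is known as a discrete-time Markov decision process (MDP). Many of our models will be of this type. As above, the cost from time $t$ onwards is given by (1.2). Denote the minimal expected cost from time $t$ onwards by
$$F(W_t) = \inf_\pi E_\pi[\,C_t \mid W_t\,],$$
where $\pi$ denotes a policy, i.e., a rule for choosing the controls $u_0, \ldots, u_{h-1}$. We can assert the following theorem.

Theorem 1.3 $F(W_t)$ is a function of $x_t$ and $t$ alone, say $F(x_t, t)$. It obeys the optimality equation
$$F(x_t, t) = \inf_{u_t} \big\{ c(x_t, u_t, t) + E[\,F(x_{t+1}, t+1) \mid x_t, u_t\,] \big\}, \quad t < h, \tag{1.4}$$
with terminal condition $F(x_h, h) = C_h(x_h)$.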
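Moreover, a minimizing value of $u_t$ in (1.4) (which is also only a function of $x_t$ and $t$) is optimal.

Proof. The value of $F(W_h)$ is $C_h(x_h)$, so the asserted reduction of $F$ is valid at time $h$. Assume it is valid at time $t+1$. The DP equation is then
$$F(W_t) = \inf_{u_t} \big\{ c(x_t, u_t, t) + E[\,F(x_{t+1}, t+1) \mid X_t, U_t\,] \big\}. \tag{1.5}$$
But, by assumption (a), the right-hand side of (1.5) reduces to the right-hand member of (1.4). All the assertions then follow.

2 Some Examples of Dynamic Programming

We illustrate the method of dynamic programming and some useful 'tricks'.

2.1 Example: managing spending and savings

An investor receives annual income from a building society of $x_t$ pounds in year $t$. He consumes $u_t$ and adds $x_t - u_t$ to his capital, $0 \le u_t \le x_t$. The capital is invested at interest rate $\theta \times 100\%$, and so his income in year $t+1$ increases to
$$x_{t+1} = a(x_t, u_t) = x_t + \theta(x_t - u_t).$$
He desires to maximize his total consumption over $h$ years, $C = \sum_{t=0}^{h-1} u_t$.

Solution. In the notation we have been using, $c(x_t, u_t, t) = u_t$ and $C_h(x_h) = 0$. This is a time-homogeneous model, in which neither costs nor dynamics depend on $t$. It is easiest to work in terms of 'time to go', $s = h - t$. Let $F_s(x)$ denote the maximal reward obtainable, starting in state $x$ and when there is time $s$ to go. The dynamic programming equation is
$$F_s(x) = \max_{0 \le u \le x} \big[\, u + F_{s-1}(x + \theta(x - u)) \,\big],$$
where $F_0(x) = 0$ (since no more can be obtained once time $h$ is reached). Here, $x$ and $u$ are generic values for $x_s$ and $u_s$.

We can substitute backwards and soon guess the form of the solution. First,
$$F_1(x) = \max_{0 \le u \le x} \big[\, u + F_0(x + \theta(x - u)) \,\big] = \max_{0 \le u \le x} [\,u + 0\,] = x.$$
Next,
$$F_2(x) = \max_{0 \le u \le x} \big[\, u + F_1(x + \theta(x - u)) \,\big] = \max_{0 \le u \le x} \big[\, u + x + \theta(x - u) \,\big].$$
Since $u + x + \theta(x - u)$ is linear in $u$, its maximum occurs at $u = 0$ or $u = x$, and so
$$F_2(x) = \max[\,(1 + \theta)x,\ 2x\,] = \max[\,1 + \theta,\ 2\,]\,x = \rho_2 x.$$
This motivates the guess $F_{s-1}(x) = \rho_{s-1} x$. Trying this, we find
$$F_s(x) = \max_{0 \le u \le x} \big[\, u + \rho_{s-1}(x + \theta(x - u)) \,\big] = \max[\,(1 + \theta)\rho_{s-1},\ 1 + \rho_{s-1}\,]\,x = \rho_s x.$$
Thus our guess is verified and $F_s(x) = \rho_s x$, where $\rho_s$ obeys the recursion implicit in the above, i.e., $\rho_s = \rho_{s-1} + \max[\,\theta \rho_{s-1},\ 1\,]$. This gives
$$\rho_s = \begin{cases} s, & s \le s^*, \\ (1 + \theta)^{s - s^*} s^*, & s \ge s^*, \end{cases}$$
where $s^*$ is the least integer such that $1 + s^* \le (1 + \theta)s^*$, i.e., $s^* \ge 1/\theta$, so $s^* = \lceil 1/\theta \rceil$. The optimal strategy is to invest the whole of the income in years $0, \ldots, h - s^* - 1$ (to build up capital) and then consume the whole of the income in years $h - s^*, \ldots, h - 1$.

There are several things worth remembering from this example.

(i) It is often useful to frame things in terms of time to go, $s$.

(ii) Although the form of the dynamic programming equation can sometimes look messy, try working backwards from $F_0(x)$ (which is known). Often a pattern will emerge from which we can piece together a solution.

(iii) When the dynamics are linear, the optimal control lies at an extreme point of the set of feasible controls. This form of policy, which either consumes nothing or consumes everything, is known as bang-bang control.

2.2 Example: exercising a stock option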
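The owner of a call option has the option to buy a share at a fixed 'striking price' $p$. The option must be exercised by day $h$. If he exercises the option on day $t$ and then immediately sells the share at the current price $x_t$, he can make a profit of $x_t - p$. Suppose the price sequence obeys the equation $x_{t+1} = x_t + \epsilon_t$, where the $\epsilon_t$ are i.i.d. random variables for which $E|\epsilon| < \infty$. The aim is to exercise the option optimally.

Let $F_s(x)$ be the value function (maximal expected profit) when the share price is $x$ and there are $s$ days to go. Show that (i) $F_s(x)$ is non-decreasing in $s$, (ii) $F_s(x) - x$ is non-increasing in $x$, and (iii) $F_s(x)$ is continuous in $x$. Deduce that the optimal policy can be characterised as follows.

There exists a non-decreasing sequence $\{a_s\}$ such that an optimal policy is to exercise the option the first time that $x \ge a_s$, where $x$ is the current price and $s$ is the number of days to go before expiry of the option.

Solution. The state variable at time $t$ is, strictly speaking, $x_t$ plus a variable which indicates whether the option has been exercised or not. However, it is only the latter case which is of interest, so $x$ is the effective state variable. Since dynamic programming makes its calculations backwards, from the termination point, it is often advantageous to write things in terms of the time to go, $s = h - t$. So if we let $F_s(x)$ be the value function (maximal expected profit) with $s$ days to go, then
$$F_0(x) = \max\{x - p,\ 0\},$$
and so the dynamic programming equation is
$$F_s(x) = \max\{\,x - p,\ E[F_{s-1}(x + \epsilon)]\,\}, \quad s = 1, 2, \ldots$$
Note that the expectation operator comes outside, not inside, $F_{s-1}(\cdot)$.

One can use induction to show (i), (ii) and (iii). For example, (i) is obvious, since increasing $s$ means we have more time over which to exercise the option. However, for a formal proof,
$$F_1(x) = \max\{\,x - p,\ E[F_0(x + \epsilon)]\,\} \ge \max\{x - p,\ 0\} = F_0(x).$$
Now suppose, inductively, that $F_{s-1} \ge F_{s-2}$. Then
$$F_s(x) = \max\{\,x - p,\ E[F_{s-1}(x + \epsilon)]\,\} \ge \max\{\,x - p,\ E[F_{s-2}(x + \epsilon)]\,\} = F_{s-1}(x),$$
whence $F_s$ is non-decreasing in $s$. Similarly, an inductive proof of (ii) follows from
$$F_s(x) - x = \max\{\,-p,\ E[F_{s-1}(x + \epsilon) - (x + \epsilon)] + E(\epsilon)\,\},$$
since the left hand side, $F_s(x) - x$, inherits the non-increasing character of the term $F_{s-1}(x + \epsilon) - (x + \epsilon)$ on the right hand side.

Thus the optimal policy can be characterized as stated. For from (ii), (iii) and the fact that $F_s(x) \ge x - p$ it follows that there exists an $a_s$ such that $F_s(x)$ is greater than $x - p$ if $x < a_s$ and equals $x - p$ if $x \ge a_s$. It follows from (i) that $a_s$ is non-decreasing in $s$. The constant $a_s$ is the smallest $x$ for which $F_s(x) = x - p$.

2.3 Example: accepting the best offer

We are to interview $h$ candidates for a job. At the end of each interview we must either hire or reject the candidate we have just seen, and may not change this decision later. Candidates are seen in random order and can be ranked against those seen previously. The aim is to maximize the probability of choosing the candidate of greatest rank.

Solution. Let $W_t$ be the history of observations up to time $t$, i.e., after we have interviewed the $t$th candidate. All that matters are the value of $t$ and whether the $t$th candidate is better than all her predecessors: let $x_t = 1$ if this is true and $x_t = 0$ if it is not. In the case $x_t = 1$, the probability she is the best of all $h$ candidates is
$$P(\text{best of } h \mid \text{best of first } t) = \frac{P(\text{best of } h)}{P(\text{best of first } t)} = \frac{1/h}{1/t} = \frac{t}{h}.$$
Now the fact that the $t$th candidate is the best of the $t$ candidates seen so far places no restriction on the relative ranks of the first $t - 1$ candidates; thus $x_t = 1$ and $W_{t-1}$ are statistically independent and we have
$$P(x_t = 1 \mid W_{t-1}) = \frac{P(W_{t-1} \mid x_t = 1)}{P(W_{t-1})}\, P(x_t = 1) = P(x_t = 1) = \frac{1}{t}.$$
Let $F(0, t-1)$ be the probability that under an optimal policy we select the best candidate, given that we have seen $t - 1$ candidates so far and the last one was not the best of those. Dynamic programming gives
$$F(0, t-1) = \frac{t-1}{t}\, F(0, t) + \frac{1}{t} \max\Big( \frac{t}{h},\ F(0, t) \Big).$$
This implies $F(0, t-1) \ge F(0, t)$ for all $t \le h$. Therefore, since $t/h$ and $F(0, t)$ are respectively increasing and non-increasing in $t$, it must be that for small $t$ we have $F(0, t) > t/h$ and for large $t$ we have $F(0, t) < t/h$.

[...]

Hence a schedule can be optimal only if the jobs are taken in decreasing order of the indices $r_i \beta^{p_i} / (1 - \beta^{p_i})$. This type of reasoning is known as an interchange argument.

There are a couple of points to note. (i) An interchange argument can be useful for solving a decision problem about a system that evolves in stages. Although such problems can be solved by dynamic programming, an interchange argument, when it works, is usually easier. (ii) The decision points need not be equally spaced in time. Here they are the points at which a number of jobs have been completed.

3.3 The infinite-horizon case

In the finite-horizon case the cost function is obtained simply from the optimality equation by the backward recursion from the terminal point. However, when the horizon is infinite there is no terminal point and so the validity of the optimality equation is no longer obvious.

Let us consider the time-homogeneous Markov case, in which costs and dynamics do not depend on $t$, i.e., $c(x, u, t) = c(x, u)$. Suppose also that there is no terminal cost, i.e., $C_h(x) = 0$.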
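Define the $s$-horizon cost under policy $\pi$ as
$$F_s(\pi, x) = E_\pi \Big[ \sum_{t=0}^{s-1} \beta^t c(x_t, u_t) \,\Big|\, x_0 = x \Big],$$
where $E_\pi$ denotes expectation over the path of the process under policy $\pi$. If we take the infimum with respect to $\pi$ we have the infimal $s$-horizon cost
$$F_s(x) = \inf_\pi F_s(\pi, x).$$
Clearly, this always exists and satisfies the optimality equation
$$F_s(x) = \inf_u \big\{ c(x, u) + \beta E[\,F_{s-1}(x_1) \mid x_0 = x,\ u_0 = u\,] \big\},$$
with terminal condition $F_0(x) = 0$.

The infinite-horizon cost under policy $\pi$ is also quite naturally defined as
$$F(\pi, x) = \lim_{s \to \infty} F_s(\pi, x).$$

[...]

D (discounted programming): $0 < \beta < 1$, and $|c(x, u)| < B$ for all $x$, $u$.

[...]

Each day there is a probability $1 - \beta$ that the asset becomes worthless. Find the policy that maximizes his expected return and express it as the unique root of an equation. Show that if $\beta > 1/2$ and $g(x) = 2/x^3$, $x \ge 1$, then he should sell the first time the sale price is at least $\sqrt{\beta/(1 - \beta)}$.

Solution. There are only two states, depending on whether he has sold or not. Let these be 0 and 1 respectively. The optimality equation is
$$F(1) = \int_0^\infty \max[\,y,\ \beta F(1)\,]\, g(y)\, dy = \beta F(1) + \int_{\beta F(1)}^\infty [\,y - \beta F(1)\,]\, g(y)\, dy.$$
Hence
$$(1 - \beta) F(1) = \int_{\beta F(1)}^\infty [\,y - \beta F(1)\,]\, g(y)\, dy. \tag{3.5}$$
That this equation has a unique root, $F(1) = F^*$, follows from the fact that the left and right hand sides are increasing and decreasing in $F(1)$ respectively. Thus he should sell when he can get at least $\beta F^*$. His maximal reward is $F^*$.

Consider the case $g(y) = 2/y^3$, $y \ge 1$. The left hand side of (3.5) is less than the right hand side at $F(1) = 1$ provided $\beta > 1/2$. In this case the root is greater than 1 and we compute it from
$$(1 - \beta) F(1) = \frac{2}{\beta F(1)} - \frac{\beta F(1)}{[\beta F(1)]^2} = \frac{1}{\beta F(1)},$$
and thus $F^* = 1/\sqrt{\beta(1 - \beta)}$ and $\beta F^* = \sqrt{\beta/(1 - \beta)}$. If $\beta \le 1/2$ he should sell at any price.

Notice that discounting arises in this problem because at each stage there is a probability $1 - \beta$ that a 'catastrophe' will occur that brings things to a sudden end. This characterization of the manner in which discounting can arise is often quite useful.

4 Positive Programming

We address the special theory of maximizing positive rewards (noting that there may be no optimal policy, but that if a policy has a value function that satisfies the optimality equation then it is optimal), and the method of value iteration.

4.1 Example: possible lack of an optimal policy

Positive programming concerns minimizing non-positive costs, $c(x, u) \le 0$. The name originates from the equivalent problem of maximizing non-negative rewards, $r(x, u) \ge 0$, and for this section we present results in that setting. The following example shows that there may be no optimal policy.

Suppose the possible states are the non-negative integers and in state $x$ we have a choice of either moving to state $x + 1$ and receiving no reward, or moving to state 0, obtaining reward $1 - 1/x$, and then remaining in state 0 thereafter and obtaining no further reward. The optimality equation is
$$F(x) = \max\{\,1 - 1/x,\ F(x + 1)\,\}, \quad x > 0.$$
Clearly $F(x) = 1$, $x > 0$, but the policy that chooses the maximizing action in the optimality equation always moves on to state $x + 1$ and hence has zero reward.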
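Clearly, there is no policy that actually achieves a reward of 1.

4.2 Characterization of the optimal policy

The following theorem provides a necessary and sufficient condition for a policy to be optimal: namely, its value function must satisfy the optimality equation. This theorem also holds for the case of strict discounting.

Theorem 4.1 Suppose D or P holds and $\pi$ is a policy whose value function $F(\pi, x)$ satisfies the optimality equation
$$F(\pi, x) = \sup_u \big\{ r(x, u) + \beta E[\,F(\pi, x_1) \mid x_0 = x,\ u_0 = u\,] \big\}.$$
Then $\pi$ is optimal.

Proof. Let $\pi'$ be any policy and suppose it takes $u_t(x) = f_t(x)$. Since $F(\pi, x)$ satisfies the optimality equation,
$$F(\pi, x) \ge r(x, f_0(x)) + \beta E_{\pi'}[\,F(\pi, x_1) \mid x_0 = x,\ u_0 = f_0(x)\,].$$
By repeated substitution of this into itself, we find
$$F(\pi, x) \ge E_{\pi'} \Big[ \sum_{t=0}^{s-1} \beta^t r(x_t, u_t) \,\Big|\, x_0 = x \Big] + \beta^s E_{\pi'}[\,F(\pi, x_s) \mid x_0 = x\,]. \tag{4.1}$$
In case P we can drop the final term on the right hand side of (4.1) (because it is non-negative) and then let $s \to \infty$; in case D we can let $s \to \infty$ directly, observing that this term tends to zero. Either way, we have $F(\pi, x) \ge F(\pi', x)$.

4.3 Example: optimal gambling

A gambler has $i$ pounds and wants to increase this to $N$. At each stage she can bet any fraction of her capital, say $j \le i$. Either she wins, with probability $p$, and now has $i + j$ pounds, or she loses, with probability $q = 1 - p$, and has $i - j$ pounds. Let the state space be $\{0, 1, \ldots, N\}$. The game stops upon reaching state 0 or $N$. The only non-zero reward is 1, upon reaching state $N$. Suppose $p \ge 1/2$. Prove that the timid strategy, of always betting only 1 pound, maximizes the probability of the gambler attaining $N$ pounds.

Solution. The optimality equation is
$$F(i) = \max_{j:\, j \le i} \{\, p F(i + j) + q F(i - j) \,\}.$$
To show that the timid strategy is optimal we need to find its value function, say $G(i)$, and show that it is a solution to the optimality equation. We have $G(i) = p\,G(i + 1) + q\,G(i - 1)$, with $G(0) = 0$, $G(N) = 1$. This recurrence gives
$$G(i) = \begin{cases} \dfrac{1 - (q/p)^i}{1 - (q/p)^N}, & p > 1/2, \\[2ex] \dfrac{i}{N}, & p = 1/2. \end{cases}$$
If $p = 1/2$, then $G(i) = i/N$ clearly satisfies the optimality equation. If $p > 1/2$ we simply have to verify that
$$G(i) = \frac{1 - (q/p)^i}{1 - (q/p)^N} = \max_{j:\, j \le i} \Big\{ p\, \frac{1 - (q/p)^{i+j}}{1 - (q/p)^N} + q\, \frac{1 - (q/p)^{i-j}}{1 - (q/p)^N} \Big\}.$$
It is a simple exercise to show that $j = 1$ is a maximizing value of $j$.

4.4 Value iteration

The infimal cost function $F$ can be approximated by successive approximation or value iteration. This is an important and practical method of computing $F$. Let us define
$$F_\infty(x) = \lim_{s \to \infty} F_s(x) = \lim_{s \to \infty} \inf_\pi F_s(\pi, x). \tag{4.2}$$
This limit exists (by monotone convergence under N or P, or by the fact that under D the cost incurred after time $s$ is vanishingly small).

Notice that (4.2) reverses the order of $\lim_{s \to \infty}$ and $\inf_\pi$ in the definition of the infimal infinite-horizon cost, $F(x) = \inf_\pi \lim_{s \to \infty} F_s(\pi, x)$. The following theorem states that we can interchange the order of these operations and that therefore $F_s(x) \to F(x)$. However, in case N we need an additional assumption:

F (finite actions): There are only finitely many possible values of $u$ in each state.

Theorem 4.2 Suppose that D or P holds, or N and F hold. Then $F_\infty(x) = F(x)$.

Proof. First we prove '$\le$'. Given any $\bar\pi$,
$$F_\infty(x) = \lim_{s \to \infty} F_s(x) = \lim_{s \to \infty} \inf_\pi F_s(\pi, x) \le \lim_{s \to \infty} F_s(\bar\pi, x) = F(\bar\pi, x).$$
Taking the infimum over $\bar\pi$ gives $F_\infty(x) \le F(x)$.

Now we prove '$\ge$'. In the positive case, $c(x, u) \le 0$, so $F_s(x) \ge F(x)$.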
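Now let $s \to \infty$. In the discounted case, with $|c(x, u)| < B$, imagine subtracting $B > 0$ from every cost. This reduces the infinite-horizon cost under any policy by exactly $B/(1 - \beta)$, and $F(x)$ and $F_\infty(x)$ also decrease by this amount. All costs are now negative, so the result we have just proved applies. [Alternatively, note that
$$F_s(x) - \beta^s B/(1 - \beta) \le F(x) \le F_s(x) + \beta^s B/(1 - \beta)$$
(can you see why?) and hence $\lim_{s \to \infty} F_s(x) = F(x)$.]

In the negative case,
$$F_\infty(x) = \lim_{s \to \infty} \min_u \{ c(x, u) + E[\,F_{s-1}(x_1) \mid x_0 = x,\ u_0 = u\,] \}$$
$$= \min_u \{ c(x, u) + \lim_{s \to \infty} E[\,F_{s-1}(x_1) \mid x_0 = x,\ u_0 = u\,] \}$$
$$= \min_u \{ c(x, u) + E[\,F_\infty(x_1) \mid x_0 = x,\ u_0 = u\,] \}, \tag{4.3}$$
where the first equality follows because the minimum is over a finite number of terms and the second equality follows by Lebesgue monotone convergence (since $F_s(x)$ increases in $s$). Let $\pi$ be the policy that chooses the minimizing action on the right hand side of (4.3). This implies, by substitution of (4.3) into itself, and using the fact that N implies $F_\infty \ge 0$,
$$F_\infty(x) = E_\pi \Big[ \sum_{t=0}^{s-1} c(x_t, u_t) + F_\infty(x_s) \,\Big|\, x_0 = x \Big] \ge E_\pi \Big[ \sum_{t=0}^{s-1} c(x_t, u_t) \,\Big|\, x_0 = x \Big].$$
Letting $s \to \infty$ gives $F_\infty(x) \ge F(\pi, x) \ge F(x)$.

4.5 Example: pharmaceutical trials

A doctor has two drugs available to treat a disease. One is a well-established drug and is known to work for a given patient with probability $p$, independently of its success for other patients. The new drug is untested and has an unknown probability of success $\theta$, which the doctor believes to be uniformly distributed over $[0, 1]$. He treats one patient per day and must choose which drug to use. Suppose he has observed $s$ successes and $f$ failures with the new drug. Let $F(s, f)$ be the maximal expected-discounted number of future patients who are successfully treated if he chooses between the drugs optimally from this point onwards. For example, if he uses only the established drug, the expected-discounted number of patients successfully treated is $p + \beta p + \beta^2 p + \cdots = p/(1 - \beta)$. The posterior distribution of $\theta$ is
$$f(\theta \mid s, f) = \frac{(s + f + 1)!}{s!\, f!}\, \theta^s (1 - \theta)^f, \quad 0 \le \theta \le 1,$$
and the posterior mean is $\bar\theta(s, f) = (s + 1)/(s + f + 2)$. The optimality equation is
$$F(s, f) = \max \Big[ \frac{p}{1 - \beta},\ \frac{s + 1}{s + f + 2} \big( 1 + \beta F(s + 1, f) \big) + \frac{f + 1}{s + f + 2}\, \beta F(s, f + 1) \Big].$$
It is not possible to give a nice expression for $F$, but we can find an approximate numerical solution. If $s + f$ is very large, say 300, then $\bar\theta(s, f) = (s + 1)/(s + f + 2)$ is a good approximation to $\theta$. Thus we can take
$$F(s, f) \approx (1 - \beta)^{-1} \max[\,p,\ \bar\theta(s, f)\,], \quad s + f = 300,$$
and work backwards. For $\beta = 0.95$, one obtains the following table.

[Table: threshold values of $p$ for $\beta = 0.95$, indexed by $s$ and $f$; the entries are not legible in this copy.]

These numbers are the greatest values of $p$ for which it is worth continuing with at least one more trial of the new drug. For example, with $s = 3$, $f = 3$ it is worth continuing with the new drug when $p = 0.6 < 0.6133$. At this point the probability that the new drug will successfully treat the next patient is 0.5 and so the doctor should actually prescribe the drug that is least likely to cure! This example shows the difference between a myopic policy, which aims to maximize immediate reward, and an optimal policy, which forgoes immediate reward in order to gain information and possibly greater rewards later on.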
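The backward calculation described above can be sketched as follows. This is a rough illustration under stated assumptions, not the computation used to produce the original table: the recursion is truncated after `horizon` further trials (100 by default, rather than at $s + f = 300$), the threshold is located by bisection on $p$, and the function names are hypothetical.

```python
def continuation_advantage(p, s0, f0, beta, horizon):
    """Value of giving the new drug one more trial at (s0, f0), minus the
    value p/(1-beta) of switching to the established drug for ever."""
    retire = p / (1.0 - beta)
    n = s0 + f0 + horizon
    # Boundary approximation: F(s, f) ~ max[p, posterior mean] / (1 - beta).
    values = [max(p, (s0 + k + 1) / (n + 2)) / (1.0 - beta)
              for k in range(horizon + 1)]
    # Work backwards; k counts the successes observed beyond s0.
    for m in range(horizon - 1, -1, -1):
        trials = s0 + f0 + m
        cont = []
        for k in range(m + 1):
            mean = (s0 + k + 1) / (trials + 2)  # posterior mean of theta
            cont.append(mean * (1.0 + beta * values[k + 1])
                        + (1.0 - mean) * beta * values[k])
        if m == 0:
            return cont[0] - retire
        values = [max(retire, c) for c in cont]


def threshold(s, f, beta=0.95, horizon=100, tol=1e-6):
    """Greatest p for which one more trial of the new drug is worthwhile."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if continuation_advantage(mid, s, f, beta, horizon) >= 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for s, f in [(0, 0), (1, 0), (0, 1), (3, 3)]:
    print(s, f, round(threshold(s, f), 4))
```

Bisection is justified because the advantage of continuing is strictly decreasing in $p$: the option of switching to the established drug can only be taken up later, so its value is discounted by at least $\beta$.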
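Notice that it is worth using the new drug at least once if $p < 0.7614$, even though at its first use the new drug will only be successful with probability 0.5.

5 Negative Programming

We address the special theory of minimizing positive costs (noting that the action that extremizes the right hand side of the optimality equation gives an optimal policy), and stopping problems and their solution.

5.1 Stationary policies

A Markov policy is a policy that specifies the control at time $t$ to be simply a function of the state and time. In the proof of Theorem 4.1 we used $u_t = f_t(x_t)$ to specify the control at time $t$. This is a convenient notation for a Markov policy, and we write $\pi = (f_0, f_1, \ldots)$. If in addition the policy does not depend on time, it is said to be a stationary Markov policy, and we write $\pi = (f, f, \ldots) = f^\infty$.

5.2 Characterization of the optimal policy

Negative programming concerns minimizing non-negative costs, $c(x, u) \ge 0$. The name originates from the equivalent problem of maximizing non-positive rewards, $r(x, u) \le 0$.

The following theorem gives a necessary and sufficient condition for a stationary policy to be optimal: namely, it must choose the optimal $u$ on the right hand side of the optimality equation. Note that in the statement of this theorem we are requiring that the infimum over $u$ is attained as a minimum over $u$.

Theorem 5.1 Suppose D or N holds. Suppose $\pi = f^\infty$ is the stationary Markov policy such that
$$c(x, f(x)) + \beta E[\,F(x_1) \mid x_0 = x,\ u_0 = f(x)\,] = \min_u \big\{ c(x, u) + \beta E[\,F(x_1) \mid x_0 = x,\ u_0 = u\,] \big\}.$$
Then $F(\pi, x) = F(x)$, and $\pi$ is optimal.

Proof. Suppose this policy is $\pi = f^\infty$. Then by substituting the optimality equation into itself and using the fact that $\pi$ specifies the minimizing control at each stage,
$$F(x) = E_\pi \Big[ \sum_{t=0}^{s-1} \beta^t c(x_t, u_t) \,\Big|\, x_0 = x \Big] + \beta^s E_\pi[\,F(x_s) \mid x_0 = x\,]. \tag{5.1}$$
In case N we can drop the final term on the right hand side of (5.1) (because it is non-negative) and then let $s \to \infty$; in case D we can let $s \to \infty$ directly, observing that this term tends to zero. Either way, we have $F(x) \ge F(\pi, x)$.

A corollary is that, when the minimum over $u$ is attained (for instance under assumption F), an optimal policy always exists. Neither Theorem 5.1 nor this corollary is true for positive programming (see the example in Section 4.1).

5.3 Optimal stopping over a finite horizon

One way that the total-expected cost can be finite is if it is possible to enter a state from which no further costs are incurred. Suppose $u$ has just two possible values: $u = 0$ (stop), and $u = 1$ (continue). Suppose there is a termination state, say 0, that is entered upon choosing the stopping action. Once this state is entered the system stays in that state and no further cost is incurred thereafter.

The finite-horizon dynamic programming equation is therefore
$$F_s(x) = \min\{\, k(x),\ c(x) + E[\,F_{s-1}(x_1) \mid x_0 = x,\ u_0 = 1\,] \,\}, \tag{5.2}$$
with $F_0(x) = k(x)$, $c(0) = 0$.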
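Consider the set of states in which it is at least as good to stop now as to continue one more step and then stop:
$$S = \{\, x : k(x) \le c(x) + E[\,k(x_1) \mid x_0 = x,\ u_0 = 1\,] \,\}.$$
Clearly, it cannot be optimal to stop if $x \notin S$, since in that case it would be strictly better to continue one more step and then stop. The following theorem characterises all finite-horizon optimal policies.

Theorem 5.2 Suppose $S$ is closed (so that once the state enters $S$ it remains in $S$). Then an optimal policy for all finite horizons is: stop if and only if $x \in S$.

Proof. The proof is by induction. If the horizon is $s = 1$, then obviously it is optimal to stop only if $x \in S$. Suppose the theorem is true for a horizon of $s - 1$. As above, if $x \notin S$ then it is better to continue for one more step and stop rather than stop in state $x$. If $x \in S$, then the fact that $S$ is closed implies $x_1 \in S$ and so $F_{s-1}(x_1) = k(x_1)$. But then (5.2) gives $F_s(x) = k(x)$. So we should stop if $x \in S$.

The optimal policy is known as a one-step look-ahead rule (OSLA).

5.4 Example: optimal parking

A driver is looking for a parking space on the way to his destination. Each parking space is free with probability $p$ independently of whether other parking spaces are free or not. The driver cannot observe whether a parking space is free until he reaches it. If he parks $s$ spaces from the destination, he incurs cost $s$, $s = 0, 1, \ldots$. If he passes the destination without having parked the cost is $D$. Show that an optimal policy is to park in the first free space that is no further than $s^*$ from the destination, where $s^*$ is the greatest integer $s$ such that $(Dp + 1)q^s \ge 1$.

Solution. When the driver is $s$ spaces from the destination it only matters whether the space is available ($x = 1$) or full ($x = 0$). The optimality equation gives
$$F_s(0) = q F_{s-1}(0) + p F_{s-1}(1),$$
$$F_s(1) = \min \begin{cases} s, & \text{(take available space)} \\ q F_{s-1}(0) + p F_{s-1}(1), & \text{(ignore available space)} \end{cases}$$
where $F_0(0) = D$, $F_0(1) = 0$.

Suppose the driver adopts a policy of taking the first free space that is $s$ or closer. Let the cost under this policy be $k(s)$, where
$$k(s) = ps + q\,k(s - 1),$$
with $k(0) = qD$. The general solution is of the form $k(s) = -q/p + s + c\,q^s$. So after substituting and using the boundary condition at $s = 0$, we have
$$k(s) = -\frac{q}{p} + s + \Big( D + \frac{1}{p} \Big) q^{s+1}, \quad s = 0, 1, \ldots$$
It is better to stop now (at a distance $s$ from the destination) than to go on and take the first available space if $s$ is in the stopping set
$$S = \{\, s : s \le k(s - 1) \,\} = \{\, s : (Dp + 1)q^s \ge 1 \,\}.$$
This set is closed (since $s$ decreases) and so by Theorem 5.2 this stopping set describes the optimal policy.

If the driver parks in the first available space past his destination and walks back, then $D = 1 + qD$, so $D = 1/p$ and $s^*$ is the greatest integer such that $2q^s \ge 1$.

5.5 Optimal stopping over the infinite horizon

Let us consider the stopping problem over the infinite-horizon.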
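As above, let $F_s(x)$ be the infimal cost given that we are required to stop by time $s$. Let $F(x)$ be the infimal cost when all that is required is that we stop eventually. Since less cost can be incurred if we are allowed more time in which to stop, we have
$$F_s(x) \ge F_{s+1}(x) \ge F(x).$$
Thus by monotone convergence $F_s(x)$ tends to a limit, say $F_\infty(x)$, and $F_\infty(x) \ge F(x)$.

Consider the problem of stopping a symmetric random walk on the integers, where $c(x) = 0$, $k(x) = \exp(-x)$. The policy of stopping immediately, $\pi$, has $F(\pi, x) = \exp(-x)$, and this satisfies the infinite-horizon optimality equation,
$$F(x) = \min\{\, \exp(-x),\ (1/2)F(x + 1) + (1/2)F(x - 1) \,\}.$$
However, $\pi$ is not optimal. A symmetric random walk is recurrent, so we may wait until reaching as large an integer as we like before stopping; hence $F(x) = 0$. Inductively, one can see that $F_s(x) = \exp(-x)$. So $F_\infty(x) > F(x)$.

Example: Theorem 4.1 is not true for negative programming

Consider the above example, but now suppose one is allowed never to stop. Since continuation costs are 0 the optimal policy for all finite horizons and the infinite horizon is never to stop. So $F(x) = 0$ and this satisfies the optimality equation above. However, $F(\pi, x) = \exp(-x)$ also satisfies the optimality equation and is the cost incurred by stopping immediately. Thus it is not true (as it is for positive programming) that a policy whose cost function satisfies the optimality equation is optimal.

The following lemma gives conditions under which the infimal finite-horizon cost does converge to the infimal infinite-horizon cost.

Lemma 5.3 Suppose all costs are bounded as follows.
$$\text{(a)}\ \ K = \sup_x k(x) < \infty, \qquad \text{(b)}\ \ C = \inf_x c(x) > 0. \tag{5.3}$$
Then $F_s(x) \to F(x)$ as $s \to \infty$.

Proof. (*starred*) Suppose $\pi$ is an optimal policy for the infinite horizon problem and stops at the random time $\tau$. Then its cost is at least $(s + 1)\,C\,P(\tau > s)$. However, since it would be possible to stop at time 0 the cost is also no more than $K$, so
$$(s + 1)\,C\,P(\tau > s) \le F(x) \le K.$$
In the $s$-horizon problem we could follow $\pi$, but stop at time $s$ if $\tau > s$. This implies
$$F(x) \le F_s(x) \le F(x) + K\,P(\tau > s) \le F(x) + \frac{K^2}{(s + 1)C}.$$
By letting $s \to \infty$, we have $F_\infty(x) = F(x)$.

[...]

... and the hypothesis is weakened to: for each $x$ there exists a $u$ such that (6.3) holds when $\le$ is replaced by $\ge$.

Proof. Suppose $u$ is chosen by some policy $\pi$. By repeated substitution of (6.3) into itself we have
$$\phi(x) \le -t\lambda' + E_\pi \Big[ \sum_{s=0}^{t-1} c(x_s, u_s) \,\Big|\, x_0 = x \Big] + E_\pi[\,\phi(x_t) \mid x_0 = x\,].$$
Divide this by $t$ and let $t \to \infty$ to obtain
$$0 \le -\lambda' + \lim_{t \to \infty} \frac{1}{t} E_\pi \Big[ \sum_{s=0}^{t-1} c(x_s, u_s) \,\Big|\, x_0 = x \Big],$$
where the final term on the right hand side is simply the average-cost under policy $\pi$. Minimizing the right hand side over $\pi$ gives the result. The claim for $\le$ replaced by $\ge$ is proved similarly.

Theorem 6.2 Suppose there exists a constant $\lambda$ and bounded function $\phi$ satisfying (6.2). Then $\lambda$ is the minimal average-cost and the optimal stationary policy is the one that chooses the optimizing $u$ on the right hand side of (6.2).

Proof.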
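Equation (6.2) implies that (6.3) holds with equality when one takes $\pi$ to be the stationary policy that chooses the optimizing $u$ on the right hand side of (6.2). Thus $\pi$ is optimal and $\lambda$ is the minimal average-cost.

The average-cost optimal policy is found simply by looking for a bounded solution to (6.2). Notice that if $\phi$ is a solution of (6.2) then so is $\phi + (\text{a constant})$, because the constant will cancel from both sides of (6.2). Thus $\phi$ is undetermined up to an additive constant. In searching for a solution to (6.2) we can therefore pick any state, say $\bar x$, and arbitrarily take $\phi(\bar x) = 0$.

6.2 Example: admission control at a queue

Each day a consultant is presented with the opportunity to take on a new job. The jobs are independently distributed over $n$ possible types and on a given day the offered type is $i$ with probability $a_i$, $i = 1, \ldots, n$. Jobs of type $i$ pay $R_i$ upon completion. Once he has accepted a job he may accept no other job until that job is complete. The probability that a job of type $i$ takes $k$ days is $(1 - p_i)^{k-1} p_i$, $k = 1, 2, \ldots$. Which jobs should the consultant accept?

Solution. Let 0 and $i$ denote the states in which he is free to accept a job, and in which he is engaged upon a job of type $i$, respectively. Then (6.2) is
$$\lambda + \phi(0) = \sum_{i=1}^n a_i \max[\,\phi(0),\ \phi(i)\,],$$
$$\lambda + \phi(i) = (1 - p_i)\phi(i) + p_i[\,R_i + \phi(0)\,], \quad i = 1, \ldots, n.$$
Taking $\phi(0) = 0$, these have solution $\phi(i) = R_i - \lambda/p_i$, and hence
$$\lambda = \sum_{i=1}^n a_i \max[\,0,\ R_i - \lambda/p_i\,].$$
The left hand side is increasing in $\lambda$ and the right hand side is decreasing in $\lambda$. Hence there is a root, say $\lambda^*$, and this is the maximal average-reward. The optimal policy takes the form: accept only jobs for which $p_i R_i \ge \lambda^*$.

6.3 Value iteration bounds

Value iteration in the average-cost case is based upon the idea that $F_s(x) - F_{s-1}(x)$ approximates the minimal average-cost for large $s$.

Theorem 6.3 Define
$$m_s = \min_x \{ F_s(x) - F_{s-1}(x) \}, \qquad M_s = \max_x \{ F_s(x) - F_{s-1}(x) \}. \tag{6.4}$$
Then $m_s \le \lambda \le M_s$, where $\lambda$ is the minimal average-cost.

Proof. (*starred*) Suppose that the first step of an $s$-horizon optimal policy follows Markov plan $f$. Then
$$F_s(x) = F_{s-1}(x) + [\,F_s(x) - F_{s-1}(x)\,] = c(x, f(x)) + E[\,F_{s-1}(x_1) \mid x_0 = x,\ u_0 = f(x)\,].$$
Hence
$$F_{s-1}(x) + m_s \le c(x, u) + E[\,F_{s-1}(x_1) \mid x_0 = x,\ u_0 = u\,],$$
for all $x$, $u$. Applying Theorem 6.1 with $\phi = F_{s-1}$ and $\lambda' = m_s$ implies $m_s \le \lambda$. The bound $\lambda \le M_s$ is established in a similar way.

This justifies the following value iteration algorithm. At termination the algorithm provides a stationary policy that is within $\epsilon \times 100\%$ of optimal.

(0) Set $F_0(x) = 0$, $s = 1$.

(1) Compute $F_s$ from
$$F_s(x) = \min_u \{ c(x, u) + E[\,F_{s-1}(x_1) \mid x_0 = x,\ u_0 = u\,] \}.$$

(2) Compute $m_s$ and $M_s$ from (6.4). Stop if $M_s - m_s \le \epsilon m_s$. Otherwise set $s := s + 1$ and go to step (1).

6.4 Policy improvement

Policy improvement is an effective method of improving stationary policies.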
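Policy improvement in the average-cost case

In the average-cost case a policy improvement algorithm can be based on the following observations. Suppose that for a policy $\pi = f^\infty$, we have that $\lambda$, $\phi$ is a solution to
$$\lambda + \phi(x) = c(x, f(x)) + E[\,\phi(x_1) \mid x_0 = x,\ u_0 = f(x)\,],$$
and suppose for some policy $\pi_1 = f_1^\infty$,
$$\lambda + \phi(x) \ge c(x, f_1(x)) + E[\,\phi(x_1) \mid x_0 = x,\ u_0 = f_1(x)\,], \tag{6.5}$$
with strict inequality for some $x$. Then, following the lines of proof in Theorem 6.1, the average-cost under $\pi_1$ is no greater than $\lambda$, the average-cost under $\pi$. If there is no $\pi_1$ for which (6.5) holds then $\pi$ satisfies (6.2) and is optimal. This justifies the following policy improvement algorithm.

(0) Choose an arbitrary stationary policy $\pi_0$. Set $s = 1$.

(1) For a given stationary policy $\pi_{s-1} = f_{s-1}^\infty$ determine $\phi$, $\lambda$ to solve
$$\lambda + \phi(x) = c(x, f_{s-1}(x)) + E[\,\phi(x_1) \mid x_0 = x,\ u_0 = f_{s-1}(x)\,].$$
This gives a set of linear equations, and so is intrinsically easier to solve than (6.2).

(2) Now determine the policy $\pi_s = f_s^\infty$ from
$$c(x, f_s(x)) + E[\,\phi(x_1) \mid x_0 = x,\ u_0 = f_s(x)\,] = \min_u \{ c(x, u) + E[\,\phi(x_1) \mid x_0 = x,\ u_0 = u\,] \},$$
taking $f_s(x) = f_{s-1}(x)$ whenever this is possible. By applications of Theorem 6.1, this yields a strict improvement whenever possible. If $\pi_s = \pi_{s-1}$ then the algorithm terminates and $\pi_{s-1}$ is optimal. Otherwise, return to step (1) with $s := s + 1$.

If both the action and state spaces are finite then there are only a finite number of possible stationary policies and so the policy improvement algorithm will find an optimal stationary policy in finitely many iterations. By contrast, the value iteration algorithm can only obtain more and more accurate approximations of $\lambda$.

Policy improvement in the discounted-cost case

In the case of strict discounting, the following theorem plays the role of Theorem 6.1. The proof is similar, by repeated substitution of (6.6) into itself.

Theorem 6.4 Suppose there exists a bounded function $G$ such that for all $x$ and $u$,
$$G(x) \le c(x, u) + \beta E[\,G(x_1) \mid x_0 = x,\ u_0 = u\,]. \tag{6.6}$$
Then $G \le F$, where $F$ is the minimal discounted-cost function.

[...]

... and (iv) a control transferring $x_0$ to $x_r$ with minimal cost $\sum_{t=0}^{r-1} u_t^\top u_t$ is
$$u_t = B^\top (A^\top)^{r-t-1} (M_r M_r^\top)^{-1} (x_r - A^r x_0), \quad t = 0, \ldots, r - 1.$$

Proof. (i) The system (8.3) has a solution for arbitrary $\Delta$ if and only if $M_r$ has rank $n$. (ii) $M_r M_r^\top$ is singular if and only if there exists $w \ne 0$ such that $M_r M_r^\top w = 0$, and
$$M_r M_r^\top w = 0 \iff w^\top M_r M_r^\top w = 0 \iff M_r^\top w = 0.$$
(iii) The rank of $M_r$ is non-decreasing in $r$, so if the system is $r$-controllable then it is $s$-controllable for $s \ge r$. But the rank is constant for $r \ge n$ by the Cayley-Hamilton theorem. (iv) Consider the Lagrangian
$$\sum_{t=0}^{r-1} u_t^\top u_t + \lambda^\top \Big( \Delta - \sum_{t=0}^{r-1} A^{r-t-1} B u_t \Big),$$
where $\Delta = x_r - A^r x_0$, giving
$$u_t = \tfrac{1}{2} B^\top (A^\top)^{r-t-1} \lambda.$$
Now we can determine $\lambda$ from (8.3).

8.4 Controllability in continuous-time

Theorem 8.4 (i) The $n$ dimensional system $[A, B, \cdot\,]$ is controllable if and only if the matrix $M_n$ has rank $n$, or (ii) equivalently, if and only if
$$G(t) = \int_0^t e^{As} B B^\top e^{A^\top s}\, ds$$
is positive definite for all $t > 0$. (iii) If the system is controllable then a control that achieves the transfer from $x(0)$ to $x(t)$ with minimal control cost $\int_0^t u_s^\top u_s\, ds$ is
$$u(s) = B^\top e^{A^\top (t - s)} G(t)^{-1} \big( x(t) - e^{At} x(0) \big).$$

Note that there is now no notion of $r$-controllability. However, $G(t) \downarrow 0$ as $t \downarrow 0$, so the transfer becomes more difficult and costly as $t \downarrow 0$.

8.5 Example: broom balancing

Consider the problem of balancing a broom in an upright position on your hand. By Newton's laws, the system obeys $m(\ddot u \cos\theta + L\ddot\theta) = mg \sin\theta$. For small $\theta$ we have $\cos\theta \sim 1$ and $\theta \sim \sin\theta = (x - u)/L$, so with $\alpha = g/L$ the plant equation is
$$\ddot x = \alpha(x - u),$$
equivalently,
$$\frac{d}{dt} \begin{pmatrix} x \\ \dot x \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ \alpha & 0 \end{pmatrix} \begin{pmatrix} x \\ \dot x \end{pmatrix} + \begin{pmatrix} 0 \\ -\alpha \end{pmatrix} u.$$

[Figure: Force diagram for broom balancing]

Since
$$[\,B \ \ AB\,] = \begin{pmatrix} 0 & -\alpha \\ -\alpha & 0 \end{pmatrix},$$
the system is controllable if $\theta$ is initially small.

9 Infinite Horizon Limits

We present one further example of controllability in continuous-time, the notion of stabilizability, and the infinite-horizon limit for the LQ regulation problem.

9.1 Example: satellite in a plane orbit

Consider a satellite of unit mass in a planar orbit and take polar coordinates $(r, \theta)$. Then
$$\ddot r = r\dot\theta^2 - \frac{c}{r^2} + u_r, \qquad \ddot\theta = -\frac{2\dot r \dot\theta}{r} + \frac{1}{r} u_\theta,$$
where $u_r$ and $u_\theta$ are the radial and tangential components of thrust. If $u = 0$ then a possible orbit (such that $\ddot r = \ddot\theta = 0$) is with $r = \rho$ and $\dot\theta = \omega = \sqrt{c/\rho^3}$.

Recall that one reason for taking an interest in linear models is that they tell us about controllability around an equilibrium point. Imagine there is a perturbing force. Take coordinates of perturbation
$$x_1 = r - \rho, \quad x_2 = \dot r, \quad x_3 = \theta - \omega t, \quad x_4 = \dot\theta - \omega.$$
Then, with $n = 4$, $m = 2$,
$$\dot x \sim \begin{pmatrix} 0 & 1 & 0 & 0 \\ 3\omega^2 & 0 & 0 & 2\omega\rho \\ 0 & 0 & 0 & 1 \\ 0 & -2\omega/\rho & 0 & 0 \end{pmatrix} x + \begin{pmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 0 \\ 0 & 1/\rho \end{pmatrix} \begin{pmatrix} u_r \\ u_\theta \end{pmatrix} = Ax + Bu.$$
It is easy to check that $M_2 = [\,B \ \ AB\,]$ has rank 4 and that therefore the system is controllable.

But suppose $u_\theta = 0$ (tangential thrust fails). Then
$$B = \begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix}, \qquad M_4 = [\,B \ \ AB \ \ A^2B \ \ A^3B\,] = \begin{pmatrix} 0 & 1 & 0 & -\omega^2 \\ 1 & 0 & -\omega^2 & 0 \\ 0 & 0 & -2\omega/\rho & 0 \\ 0 & -2\omega/\rho & 0 & 2\omega^3/\rho \end{pmatrix}.$$
Since $(2\omega\rho, 0, 0, \rho^2) M_4 = 0$, this is singular and has rank 3. The uncontrollable component is the angular momentum, $2\omega\rho\,\delta r + \rho^2\,\delta\dot\theta = \delta(r^2\dot\theta)|_{r = \rho,\ \dot\theta = \omega}$.

On the other hand, if $u_r = 0$ then the system is controllable. We can change the radius by tangential braking or thrust.

9.2 Stabilizability

Using the stationary control $u_t = Kx_t$, we have
$$x_t = Ax_{t-1} + Bu_{t-1} = (A + BK)x_{t-1}.$$

Definition 9.1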
We sy thot Piso stbiey marie inthe dcr tine sense fall agencies of Tbe cy ove the unt dem the comps pow, |= "The [4,3 eso a to stale of tre ete K sachet A+ BK ce sahity nar and eee that 24 ast Noe that ue = Ka nar and Maro, I soking contra such that» it lutea oly to coer coutel ofthis spe sine, ae we So Ll such Cottle ‘ie opel conor the ie rao Let probe 93 Stabi To contin in the plant eauation deel Pt Bk in continuous-time Ae + Ba, Site wi onto = Ker Bote amen hme l= SImyin We say € i stabi matric the continuous tine sensei al egewaes of ave neat el prt, and race Bat x; > 244 Bxample: pendulum we wish to stile @ 10 zero by application ofa force w. Then G = pbsind + We hangs th tte variable to 2 = (08) and write a(6)~( states) ~( hw) = (42 )G)*() Suppose we ry ste Wh a como = Ne = Aa. Ten aeon=( 4° « 4) sd this ha eignnlues 1 Y=GE=E. Se ether “gh — K > 0 ad one sgenalie Ti ponte real pe. ich cane cere tm fice inwtabity gl KD ‘nd egevals ae purely ian, which mane we wis genoa Base eclaon: Sosirewail ration rant hea fineon of 8 we wel (and tix Wo come et of ‘liton toh Le] pubes poo) 9.5 Infinite-horizon LQ rogulation Consider the tae armgencous eas and wet the Site Boon et in vee of ie tons The terminal ert, when #0 ist! Fs) =, nal hat ll wee = Oy without hee of gael. Lemma 9.2 Suppor y= 0, > 0, Q > 0 ond [,B, «|i contol or stl eh (Un) has» fe Hin root Cots are non-native, Fo) non-leeeasing in: Now #2) Thi "lly non derenng ns ft eeryw-Tochow hat eye bs one of so arguments esyster sconrlae then 2H is Lounded hesuse there ply which, for any zp a il rng te state rset at mt nepal Bie et hold ew er with ner thereto. Whe yao fabian then tere c= Kak shat nae ad singe = 0, we ae elke A BK ia saber ae) 207 [Saye xT eK') «coe ence is either case we haves uppee ound an =H tend to a Et fe ‘very Dy catering F ~ cy, the vector with m une in the jth place and eo ‘bene, we coclue thatthe jek element Othe ago oi eunergen Then faking 4 de fllws thet the of aga! 
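elements of $\Pi_s$ also converge.

Both value iteration and policy improvement are effective ways to compute the solution to an infinite-horizon LQ regulation problem. Policy improvement goes along the lines developed in Lecture 6.

The following theorem establishes the efficacy of value iteration. It is similar to Theorem 4.2, which established the same fact for bounded D, N and P programming. The LQ regulation problem is a negative programming problem; however, we cannot apply Theorem 4.2, because in general the terminal cost $x^\top\Pi_0 x$ is not zero.

Theorem 9.3 Suppose that $R > 0$, $Q > 0$ and the system $[A, B, \cdot\,]$ is controllable. Then
(i) The equilibrium Riccati equation

$$\Pi = f\Pi \qquad (9.1)$$

has a unique non-negative definite solution $\Pi$.
(ii) For any finite non-negative definite $\Pi_0$ the sequence $\{\Pi_s\}$ converges to $\Pi$.
(iii) The gain matrix $\Gamma$ corresponding to $\Pi$ is a stability matrix.

Proof. (*starred*) Define $\Pi$ as the limit of the sequence $f^{(s)}0$. By the previous lemma we know that this limit exists and that it satisfies (9.1). Consider $u_t = Kx_t$ and $x_{t+1} = (A + BK)x_t$ for arbitrary $x_0$, where $K = -(Q + B^\top\Pi B)^{-1}B^\top\Pi A$ and $\Gamma = A + BK$. We can write (9.1) as

$$\Pi = R + K^\top QK + \Gamma^\top\Pi\Gamma, \qquad (9.2)$$

and hence

$$x_t^\top\Pi x_t = x_t^\top(R + K^\top QK)x_t + x_{t+1}^\top\Pi x_{t+1} \ge x_{t+1}^\top\Pi x_{t+1}.$$

Thus $x_t^\top\Pi x_t$ decreases and, being bounded below by zero, it tends to a limit. Thus $x_t^\top(R + K^\top QK)x_t$ tends to 0. Since $R + K^\top QK$ is positive definite this implies $x_t \to 0$, which implies (iii). Hence for arbitrary finite non-negative definite $\Pi_0$,

$$\Pi_s = f^{(s)}\Pi_0 \ge f^{(s)}0 \to \Pi. \qquad (9.3)$$

However, if we choose the fixed policy $u_t = Kx_t$ then it follows that

$$\Pi_s \le f^{(s)}0 + (\Gamma^\top)^s\Pi_0\Gamma^s \to \Pi. \qquad (9.4)$$

Thus (9.3) and (9.4) imply (ii).

Finally, if non-negative definite $\tilde\Pi$ also satisfies (9.1) then $\tilde\Pi = f^{(s)}\tilde\Pi \to \Pi$, whence (i).

9.6 The [A, B, C] system

The notion of controllability rested on the assumption that the initial value of the state was known. If, however, one must rely upon imperfect observations, then the question arises whether the value of the state (either in the past or in the present) can be determined from these observations. The discrete-time system $[A, B, C]$ is defined by the plant equation and observation relation

$$x_t = Ax_{t-1} + Bu_{t-1}, \qquad y_t = Cx_{t-1}.$$

Here $y \in \mathbb{R}^p$ is observed, but $x$ is not. We suppose $C$ is $p \times n$.

[...] and (iv) the determination of $x_0$ can be expressed [...]

Proof. If the system has a solution for $x_0$ (which is so by hypothesis) then this solution is unique if and only if the matrix $N_r$ has rank $n$, whence assertion (i). Assertion (iii) follows from (i).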
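The equivalence of conditions (i) and (ii) can be verified directly. If we define the deviation $\eta_t = \tilde y_t - CA^{t-1}x_0$ then the equation amounts to $\eta_t = 0$, $1 \le t \le r$. If these equations were not consistent we could still define a 'least-squares' solution to them by minimizing any positive-definite quadratic form in these deviations with respect to $x_0$. In particular, we could minimize $\sum_t \eta_t^\top\eta_t$. This minimization gives (10.4). If equations (10.3) indeed have a solution (i.e., are mutually consistent, as we suppose) and this is unique then expression (10.4) must equal this solution; the actual value of $x_0$. The criterion for uniqueness of the least-squares solution is that $N_r^\top N_r$ should be nonsingular, which is also condition (ii).

Note that we have again found it helpful to bring in an optimization criterion in proving (iv); this time, not so much to construct one definite solution out of many, but rather to construct a 'best-fit' solution where an exact solution might not have existed. This approach lies close to the statistical approach necessary when observations are corrupted by noise.

10.2 Observability in continuous-time

Theorem 10.2 (i) The $n$-dimensional continuous-time system $[A, \cdot\,, C]$ is observable if and only if the matrix $N_n$ has rank $n$, or (ii) equivalently, if and only if

$$H(t) = \int_0^t e^{A^\top s}C^\top Ce^{As}\,ds$$

is positive definite for all $t > 0$. (iii) If the system is observable then the determination of $x(0)$ can be written

$$x(0) = H(t)^{-1}\int_0^t e^{A^\top s}C^\top\tilde y(s)\,ds,$$

where

$$\tilde y(t) = y(t) - \int_0^t Ce^{A(t-s)}Bu(s)\,ds.$$

10.3 Examples

Example. Observation of population

Consider two populations whose sizes are changing according to the equations

$$\dot x_1 = \lambda_1x_1, \qquad \dot x_2 = \lambda_2x_2.$$

Suppose we observe $x_1 + x_2$, so

$$A = \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix}, \qquad C = \begin{pmatrix} 1 & 1 \end{pmatrix}, \qquad N_2 = \begin{pmatrix} 1 & 1 \\ \lambda_1 & \lambda_2 \end{pmatrix},$$

and so the individual populations are observable if $\lambda_1 \ne \lambda_2$.

Example. Radioactive decay

Suppose $j = 1, 2, 3$ and decay is from 1 to 2 at rate $\alpha$, from 1 to 3 at rate $\beta$ and from 2 to 3 at rate $\gamma$. We observe only the accumulation in state 3. Then $\dot x = Ax$, where

$$A = \begin{pmatrix} -\alpha-\beta & 0 & 0 \\ \alpha & -\gamma & 0 \\ \beta & \gamma & 0 \end{pmatrix}, \qquad C = \begin{pmatrix} 0 & 0 & 1 \end{pmatrix}, \qquad N_3 = \begin{pmatrix} 0 & 0 & 1 \\ \beta & \gamma & 0 \\ \alpha\gamma - \beta(\alpha+\beta) & -\gamma^2 & 0 \end{pmatrix}.$$

The determinant of $N_3$ is $\gamma(\alpha+\beta)(\beta-\gamma)$, so the system is observable when this is nonzero.

Example. Satellite

Recall the linearised equation $\dot x = Ax$ for perturbations of the orbit of a satellite (here taking $\rho = 1$), where

$$\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = \begin{pmatrix} r - \rho \\ \dot r \\ \theta - \omega t \\ \dot\theta - \omega \end{pmatrix}, \qquad A = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 3\omega^2 & 0 & 0 & 2\omega \\ 0 & 0 & 0 & 1 \\ 0 & -2\omega & 0 & 0 \end{pmatrix}.$$

By taking $C = [\,0\;0\;1\;0\,]$ we see that the system is observable on the basis of angle measurements alone, but it is not observable for $\tilde C = [\,1\;0\;0\;0\,]$, i.e., on the basis of radius measurements alone:

$$N_4 = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & -2\omega & 0 & 0 \\ -6\omega^3 & 0 & 0 & -4\omega^2 \end{pmatrix}, \qquad \tilde N_4 = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 3\omega^2 & 0 & 0 & 2\omega \\ 0 & -\omega^2 & 0 & 0 \end{pmatrix}.$$

10.4 Imperfect state observation with noise

The full LQG model, whose description has been deferred until now, assumes linear dynamics, quadratic costs and Gaussian noise. Imperfect observation is the most important point. The model is

$$x_t = Ax_{t-1} + Bu_{t-1} + \epsilon_t, \qquad (10.5)$$
$$y_t = Cx_{t-1} + \eta_t, \qquad (10.6)$$

where $\epsilon_t$ is process noise, $y_t$ is the observation at time $t$ and $\eta_t$ is the observation noise. The state observations are degraded in that we observe only $Cx_{t-1}$. Assume

$$\operatorname{cov}\begin{pmatrix} \epsilon \\ \eta \end{pmatrix} = E\left[\begin{pmatrix} \epsilon \\ \eta \end{pmatrix}\begin{pmatrix} \epsilon \\ \eta \end{pmatrix}^{\!\top}\right] = \begin{pmatrix} N & L \\ L^\top & M \end{pmatrix}$$

and that $x_0 \sim N(\hat x_0, V_0)$.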
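Let $W_t = (Y_t, U_{t-1}) = (y_1, \ldots, y_t;\, u_0, \ldots, u_{t-1})$ denote the observed history up to time $t$. Of course we assume that $t$, $A$, $B$, $C$, $N$, $L$, $M$, $\hat x_0$ and $V_0$ are also known; $W_t$ denotes what might be different if the process were rerun. In the next lecture we turn to the question of estimating $x$ from $y$.

We consider the issues of state estimation and optimal control and shall show that

(i) $\hat x_t$ can be calculated recursively from the Kalman filter (a linear operator):

$$\hat x_t = A\hat x_{t-1} + Bu_{t-1} + H_t(y_t - C\hat x_{t-1}),$$

which is like the plant equation except that the noise is given by an innovation process, $\tilde y_t = y_t - C\hat x_{t-1}$, rather than the white noise.

(ii) If with full information (i.e., $y_t = x_t$) the optimal control is $u_t = K_tx_t$, then without full information the optimal control is $u_t = K_t\hat x_t$, where $\hat x_t$ is the best linear least squares estimate of $x_t$ based on the information $(Y_t, U_{t-1})$ at time $t$.

Many of the ideas we encounter in this analysis are unrelated to the special state structure and are therefore worth noting as general observations about control with imperfect information.

11 Kalman Filtering and Certainty Equivalence

We present the important concepts of the Kalman filter, certainty-equivalence and the separation principle.

11.1 Preliminaries

Lemma 11.1 Suppose $x$ and $y$ are jointly normal with zero means and covariance matrix

$$\operatorname{cov}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} V_{xx} & V_{xy} \\ V_{yx} & V_{yy} \end{pmatrix}.$$

Then the distribution of $x$ conditional on $y$ is Gaussian, with

$$E(x \mid y) = V_{xy}V_{yy}^{-1}y, \qquad (11.1)$$

and

$$\operatorname{cov}(x \mid y) = V_{xx} - V_{xy}V_{yy}^{-1}V_{yx}. \qquad (11.2)$$

Proof. Both $y$ and $x - V_{xy}V_{yy}^{-1}y$ are linear functions of $x$ and $y$ and therefore they are Gaussian. From $E\big[(x - V_{xy}V_{yy}^{-1}y)y^\top\big] = 0$ it follows that they are uncorrelated, and this implies they are independent. Hence the distribution of $x - V_{xy}V_{yy}^{-1}y$ conditional on $y$ is identical with its unconditional distribution, and this is Gaussian with zero mean and the covariance matrix given by (11.2).

The estimate of $x$ in terms of $y$ defined as $\hat x = Hy = V_{xy}V_{yy}^{-1}y$ is known as the linear least squares estimate of $x$ in terms of $y$. Even without the assumption that $x$ and $y$ are jointly normal, this linear function of $y$ has a smaller covariance matrix than any other unbiased estimate for $x$ that is a linear function of $y$. In the Gaussian case, it is also the maximum likelihood estimator.

11.2 The Kalman filter

Let us make the LQG and state-structure assumptions of Section 10.4:

$$x_t = Ax_{t-1} + Bu_{t-1} + \epsilon_t, \qquad (11.3)$$
$$y_t = Cx_{t-1} + \eta_t. \qquad (11.4)$$

Notice that both $x_t$ and $y_t$ can be written as linear functions of the unknown noise and the known values of $u_0, \ldots, u_{t-1}$. Thus the distribution of $x_t$ conditional on $W_t = (Y_t, U_{t-1})$ must be normal, with some mean $\hat x_t$ and covariance matrix $V_t$. The following theorem describes recursive updating relations for these two quantities.

Theorem 11.2 (The Kalman filter) Suppose that conditional on $W_0$, the initial state $x_0$ is distributed $N(\hat x_0, V_0)$ and the state and observations obey the recursions of the LQG model (11.3)-(11.4). Then conditional on $W_t$, the current state is distributed $N(\hat x_t, V_t)$. The conditional mean and variance obey the updating recursions

$$\hat x_t = A\hat x_{t-1} + Bu_{t-1} + H_t(y_t - C\hat x_{t-1}), \qquad (11.5)$$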
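$$V_t = N + AV_{t-1}A^\top - (L + AV_{t-1}C^\top)(M + CV_{t-1}C^\top)^{-1}(L^\top + CV_{t-1}A^\top), \qquad (11.6)$$

where

$$H_t = (L + AV_{t-1}C^\top)(M + CV_{t-1}C^\top)^{-1}. \qquad (11.7)$$

Proof. The proof is by induction on $t$. Consider the moment when $u_{t-1}$ has been determined but $y_t$ has not yet been observed. The distribution of $(x_t, y_t)$ conditional on $(W_{t-1}, u_{t-1})$ is jointly normal with means

$$E(x_t \mid W_{t-1}, u_{t-1}) = A\hat x_{t-1} + Bu_{t-1}, \qquad E(y_t \mid W_{t-1}, u_{t-1}) = C\hat x_{t-1}.$$

Let $\Delta_{t-1} = \hat x_{t-1} - x_{t-1}$, which by an inductive hypothesis is $N(0, V_{t-1})$. Consider the innovations

$$\xi_t = x_t - E(x_t \mid W_{t-1}, u_{t-1}) = x_t - (A\hat x_{t-1} + Bu_{t-1}) = \epsilon_t - A\Delta_{t-1},$$
$$\zeta_t = y_t - E(y_t \mid W_{t-1}, u_{t-1}) = y_t - C\hat x_{t-1} = \eta_t - C\Delta_{t-1}.$$

Conditional on $(W_{t-1}, u_{t-1})$, these quantities are normally distributed with zero means and covariance matrix

$$\operatorname{cov}\begin{pmatrix} \epsilon_t - A\Delta_{t-1} \\ \eta_t - C\Delta_{t-1} \end{pmatrix} = \begin{pmatrix} N + AV_{t-1}A^\top & L + AV_{t-1}C^\top \\ L^\top + CV_{t-1}A^\top & M + CV_{t-1}C^\top \end{pmatrix} = \begin{pmatrix} V_{\xi\xi} & V_{\xi\zeta} \\ V_{\zeta\xi} & V_{\zeta\zeta} \end{pmatrix}.$$

Thus it follows from Lemma 11.1 that the distribution of $\xi_t$ conditional on knowing $(W_{t-1}, u_{t-1}, \zeta_t)$ (which is equivalent to knowing $W_t$) is normal with mean $V_{\xi\zeta}V_{\zeta\zeta}^{-1}\zeta_t$ and covariance matrix $V_{\xi\xi} - V_{\xi\zeta}V_{\zeta\zeta}^{-1}V_{\zeta\xi}$. These give (11.5)-(11.7).

11.3 Certainty equivalence

We say that a quantity $a$ is policy-independent if $E_\pi(a \mid W_0)$ is independent of $\pi$.

Theorem 11.3 Suppose LQG model assumptions hold. Then (i)

$$F(W_t) = \hat x_t^\top\Pi_t\hat x_t + \cdots \qquad (11.8)$$

where $\hat x_t$ is the linear least squares estimate of $x_t$ whose evolution is determined by the Kalman filter in Theorem 11.2 and '$+\cdots$' indicates terms that are policy independent; (ii) the optimal control is given by

$$u_t = K_t\hat x_t,$$

where $\Pi_t$ and $K_t$ are the same matrices as in the full information case of Theorem 7.2.

It is important to grasp the remarkable fact that (ii) asserts: the optimal control $u_t$ is exactly the same as it would be if all unknowns were known and took values equal to their linear least squares estimates (equivalently, their conditional means) based upon observations up to time $t$. This is the idea known as certainty equivalence. As we have seen in the previous section, the distribution of the estimation error $\hat x_t - x_t$ does not depend on $U_{t-1}$. The fact that the problems of optimal estimation and optimal control can be decoupled in this way is known as the separation principle.

Proof. The proof is by backward induction. Suppose (11.8) holds at $t$. Recall that

$$\hat x_t = A\hat x_{t-1} + Bu_{t-1} + H_t\zeta_t, \qquad \Delta_{t-1} = \hat x_{t-1} - x_{t-1}.$$

Then with a quadratic cost of the form $c(x, u) = x^\top Rx + 2u^\top Sx + u^\top Qu$, we have

$$F(W_{t-1}) = \min_{u_{t-1}} E\big[c(x_{t-1}, u_{t-1}) + \hat x_t^\top\Pi_t\hat x_t + \cdots \mid W_{t-1}, u_{t-1}\big]$$
$$= \min_{u_{t-1}} E\big[c(\hat x_{t-1} - \Delta_{t-1}, u_{t-1}) + (A\hat x_{t-1} + Bu_{t-1} + H_t\zeta_t)^\top\Pi_t(A\hat x_{t-1} + Bu_{t-1} + H_t\zeta_t) \mid W_{t-1}, u_{t-1}\big] + \cdots$$
$$= \min_{u_{t-1}} \big[c(\hat x_{t-1}, u_{t-1}) + (A\hat x_{t-1} + Bu_{t-1})^\top\Pi_t(A\hat x_{t-1} + Bu_{t-1})\big] + \cdots,$$

where we use the fact that conditional on $W_{t-1}$, $u_{t-1}$, both $\Delta_{t-1}$ and $\zeta_t$ have zero means and are policy independent.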
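This ensures that when we expand the quadratics in powers of $\Delta_{t-1}$ and $H_t\zeta_t$ the expected values of the linear terms in these quantities are zero and the expected values of the quadratic terms (represented by $+\cdots$) are policy independent.

11.4 Example: inertialess rocket with noisy position sensing

Consider the scalar case of controlling the position of a rocket by inertialess control of its velocity, but in the presence of imperfect position sensing:

$$x_t = x_{t-1} + u_{t-1}, \qquad y_t = x_t + \eta_t,$$

where $\eta_t$ is white noise with variance 1. Suppose it is desired to minimize

$$E\Big[\sum_{t=0}^{h-1} u_t^2 + Dx_h^2\Big].$$

Notice that the observational relation differs from the usual model of $y_t = Cx_{t-1} + \eta_t$. To derive a Kalman filter formula for this variation we argue inductively from scratch. Suppose $\hat x_{t-1} - x_{t-1} \sim N(0, V_{t-1})$. Consider a linear estimate of $x_t$,

$$\hat x_t = \hat x_{t-1} + u_{t-1} + H_t(y_t - \hat x_{t-1} - u_{t-1}).$$

(The relevant innovation process is now $\tilde y_t = y_t - \hat x_{t-1} - u_{t-1}$.) Subtracting the plant equation and substituting $y_t = x_t + \eta_t$ gives

$$\Delta_t = (1 - H_t)\Delta_{t-1} + H_t\eta_t.$$

The variance of $\Delta_t$ is therefore

$$\operatorname{var}\Delta_t = (1 - H_t)^2V_{t-1} + H_t^2.$$

Minimizing this with respect to $H_t$ gives $H_t = V_{t-1}(1 + V_{t-1})^{-1}$, so the variance in the least squares estimate of $x_t$ obeys the recursion

$$V_t = V_{t-1} - V_{t-1}^2(1 + V_{t-1})^{-1} = V_{t-1}/(1 + V_{t-1}).$$

Hence

$$V_t^{-1} = V_{t-1}^{-1} + 1 = \cdots = V_0^{-1} + t.$$

If there is complete lack of information at the start, then $V_0^{-1} = 0$, $V_t = 1/t$ and

$$\hat x_t = \hat x_{t-1} + u_{t-1} + \frac{V_{t-1}(y_t - \hat x_{t-1} - u_{t-1})}{1 + V_{t-1}} = \frac{(t-1)(\hat x_{t-1} + u_{t-1}) + y_t}{t}.$$

As far as the optimal control is concerned, suppose an inductive hypothesis that $F(W_t) = \hat x_t^2\Pi_t + \cdots$, where '$\cdots$' denotes policy independent terms. Then

$$F(W_{t-1}) = \inf_u\big\{u^2 + E[\hat x_{t-1} + u + H_t(y_t - \hat x_{t-1} - u)]^2\Pi_t + \cdots\big\} = \inf_u\big\{u^2 + (\hat x_{t-1} + u)^2\Pi_t + \cdots\big\}.$$

Minimizing over $u$ we obtain the optimal control and Riccati equation

$$u_{t-1} = -\frac{\Pi_t}{1 + \Pi_t}\hat x_{t-1}, \qquad \Pi_{t-1} = \Pi_t - \Pi_t^2/(1 + \Pi_t) = \Pi_t/(1 + \Pi_t).$$

Hence $\Pi_t = D/(1 + D(h - t))$ and so the optimal control is the certainty equivalence control $u_t = -D\hat x_t/(1 + D(h - t))$. This is the same control as in the deterministic case, but with $x_t$ replaced by $\hat x_t$.

CONTINUOUS-TIME MODELS

12 Dynamic Programming in Continuous Time

We consider deterministic dynamic programming in continuous time.

12.1 The optimality equation

In continuous time the plant equation is

$$\dot x = a(x, u, t).$$

Let us consider a discounted cost of

$$C = \int_0^T e^{-\alpha t}c(x, u, t)\,dt + e^{-\alpha T}C(x(T), T).$$

The discount factor over $\delta$ is $e^{-\alpha\delta} = 1 - \alpha\delta + o(\delta)$. So the optimality equation is

$$F(x, t) = \inf_u\big[c(x, u, t)\delta + e^{-\alpha\delta}F(x + a(x, u, t)\delta,\, t + \delta) + o(\delta)\big].$$

By considering the term that multiplies $\delta$ in the Taylor series expansion we obtain

$$\inf_u\Big[c(x, u, t) - \alpha F + \frac{\partial F}{\partial t} + \frac{\partial F}{\partial x}a(x, u, t)\Big] = 0, \quad t < T, \qquad (12.1)$$

with $F(x, T) = C(x, T)$. In the undiscounted case, we simply put $\alpha = 0$.

The following theorem states that if we can find a policy whose value function satisfies the DP equation then that policy is optimal.

Theorem 12.1 Suppose a policy $\pi$, using a control $u$, has a value function $F$ which satisfies the DP equation (12.1) for all values of $x$ and $t$.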
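Then $\pi$ is optimal.

Proof. Consider any other policy, using control $v$, say. Then along the trajectory defined by $\dot x = a(x, v, t)$ we have

$$-\frac{d}{dt}e^{-\alpha t}F(x, t) = e^{-\alpha t}\Big[c(x, v, t) - \Big(c(x, v, t) - \alpha F + \frac{\partial F}{\partial t} + \frac{\partial F}{\partial x}a(x, v, t)\Big)\Big] \le e^{-\alpha t}c(x, v, t).$$

The inequality holds because the term in round brackets is non-negative. Integrating this inequality along the $v$ path, from $x(0)$ to $x(T)$, gives

$$F(x(0), 0) - e^{-\alpha T}C(x(T), T) \le \int_0^T e^{-\alpha t}c(x, v, t)\,dt.$$

Thus the $v$ path incurs a cost of at least $F(x(0), 0)$, and hence $\pi$ is optimal.

12.2 Example: LQ regulation

The undiscounted continuous time DP equation for the LQ regulation problem is

$$0 = \inf_u\big[x^\top Rx + u^\top Qu + F_t + F_x^\top(Ax + Bu)\big].$$

Suppose we try a solution of the form $F(x, t) = x^\top\Pi(t)x$, where $\Pi(t)$ is a symmetric matrix. Then $F_x = 2\Pi(t)x$ and the optimizing $u$ is $u = -\tfrac12Q^{-1}B^\top F_x = -Q^{-1}B^\top\Pi(t)x$. Therefore the DP equation is satisfied with this $u$ if

$$0 = x^\top\Big[R + \Pi A + A^\top\Pi - \Pi BQ^{-1}B^\top\Pi + \frac{d\Pi}{dt}\Big]x,$$

where we use the fact that $2x^\top\Pi Ax = x^\top\Pi Ax + x^\top A^\top\Pi x$. Hence we have a solution to the DP equation if $\Pi(t)$ satisfies the Riccati differential equation of Section 7.4.

12.3 Example: estate planning

A man is considering his lifetime plan of investment and expenditure. He has an initial level of savings $x(0)$ and no income other than that which he obtains from investment at a fixed interest rate. His total capital is therefore governed by the equation

$$\dot x(t) = \beta x(t) - u(t),$$

where $\beta > 0$ and $u$ is his rate of expenditure. He wishes to maximize

$$\int_0^T e^{-\alpha t}\sqrt{u(t)}\,dt,$$

for a given $T$. Find his optimal policy.

Solution. The optimality equation is

$$0 = \max_u\Big[\sqrt u - \alpha F + \frac{\partial F}{\partial t} + \frac{\partial F}{\partial x}(\beta x - u)\Big].$$

Suppose we try a solution of the form $F(x, t) = f(t)\sqrt x$. For this to work we need

$$0 = \max_u\Big[\sqrt u - \alpha f\sqrt x + f'\sqrt x + \frac{f}{2\sqrt x}(\beta x - u)\Big].$$

By $d[\,\cdot\,]/du = 0$, the optimizing $u$ is $u = x/f^2$ and the optimized value is

$$\frac{\sqrt x}{f}\Big[\frac12 - \Big(\alpha - \frac12\beta\Big)f^2 + ff'\Big]. \qquad (12.2)$$

We have a solution if we can choose $f$ to make the bracketed term in (12.2) equal to 0. We have the boundary condition $F(x, T) = 0$, which imposes $f(T) = 0$. Thus we find

$$f(t)^2 = \frac{1 - e^{-(2\alpha - \beta)(T - t)}}{2\alpha - \beta}.$$

By Theorem 12.1 we have established the form of the optimal control, which in closed loop form is $u = x/f^2$.

12.4 Example: harvesting

A fish population of size $x$ obeys the plant equation

$$\dot x = a(x, u) = \begin{cases} a(x) - u, & x > 0, \\ a(x), & x = 0. \end{cases}$$

The function $a(x)$ reflects the facts that the population can grow when it is small, but is subject to environmental limitations when it is large. It is desired to maximize the discounted total harvest $\int_0^T ue^{-\alpha t}\,dt$, subject to $0 \le u \le u_{\max}$.

Solution. The DP equation (with discounting) is

$$\sup_u\Big[u - \alpha F + \frac{\partial F}{\partial t} + \frac{\partial F}{\partial x}(a(x) - u)\Big] = 0, \quad t < T.$$

Since $u$ occurs linearly, the maximizing $u$ is $0$ or $u_{\max}$ according as $F_x > 1$ or $F_x < 1$, where $u_{\max}$ is the largest practicable fishing rate. Suppose $F(x, t) \to F(x)$ as $T \to \infty$, and $\partial F/\partial t \to 0$. Then

$$\alpha F = \sup_u\big[u + F_x(a(x) - u)\big]. \qquad (12.3)$$

Let us make a guess that $F(x)$ is concave, and then deduce that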
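$$u = \begin{cases} 0, & x < \bar x, \\ a(\bar x), & x = \bar x, \\ u_{\max}, & x > \bar x. \end{cases}$$

Clearly, $\bar x$ is the operating point. We suppose

$$\dot x = \begin{cases} a(x) > 0, & x < \bar x, \\ a(x) - u_{\max} < 0, & x > \bar x. \end{cases}$$

We say that there is chattering about the point $\bar x$, in the sense that $u$ switches between its maximum and minimum values either side of $\bar x$, effectively taking the value $a(\bar x)$ at $\bar x$. To determine $\bar x$ we note that

$$F(\bar x) = \int_0^\infty e^{-\alpha t}a(\bar x)\,dt = a(\bar x)/\alpha. \qquad (12.4)$$

So from (12.3) and (12.4) we have

$$F_x(x) = \frac{\alpha F(x) - u(x)}{a(x) - u(x)} \to 1 \quad\text{as } x \nearrow \bar x \text{ or } x \searrow \bar x. \qquad (12.5)$$

Assuming $F$ is concave, $F_{xx}$ is negative if it exists. So we must have

$$F_{xx} = \frac{\alpha F_x}{a(x) - u} - \Big(\frac{\alpha F - u}{a(x) - u}\Big)\Big(\frac{a'(x)}{a(x) - u}\Big) = \Big(\frac{\alpha F - u}{a(x) - u}\Big)\Big(\frac{\alpha - a'(x)}{a(x) - u}\Big) \simeq \frac{\alpha - a'(x)}{a(x) - u(x)},$$

where the last step follows because (12.5) holds in a neighbourhood of $\bar x$. It is required that $F_{xx}$ be negative. But the denominator changes sign at $\bar x$, so the numerator must do so also, and therefore we must have $a'(\bar x) = \alpha$.

Figure: Growth rate $a(x)$ subject to environment pressures.

There is a sacrifice of long term yield for immediate return. If the initial population is greater than $\bar x$ then the optimal policy is to fish at rate $u_{\max}$ until we reach $\bar x$ and then fish at rate $u = a(\bar x)$. As $\alpha \nearrow a'(0)$, $\bar x \searrow 0$. So for sufficiently large $\alpha$ it becomes optimal to wipe out the fish population.

13 Pontryagin's Maximum Principle

We explain Pontryagin's maximum principle, derive it and give examples of its use.

13.1 Heuristic derivation

Pontryagin's maximum principle (PMP) states a necessary condition that must hold on an optimal trajectory. It is a calculation for a fixed initial value of the state, $x(0)$. In comparison, the DP approach is a calculation for a general initial value of the state. PMP can be used as both a computational and analytic technique (and in the second case can solve the problem for general initial value).

Consider first a time-invariant formulation, with plant equation $\dot x = a(x, u)$, instantaneous cost $c(x, u)$, stopping set $S$ and terminal cost $K(x)$. The value function $F(x)$ obeys the DP equation (without discounting)

$$\inf_{u \in U}\Big[c(x, u) + \frac{\partial F}{\partial x}a(x, u)\Big] = 0, \qquad (13.1)$$

outside $S$, with terminal condition

$$F(x) = K(x), \quad x \in S. \qquad (13.2)$$

Define the adjoint variable

$$\lambda = -F_x. \qquad (13.3)$$

This is a column $n$-vector, and is to be regarded as a function of time as the state moves along the optimal trajectory. The proof that $F_x$ exists in the required sense is actually a tricky technical matter. We also define the Hamiltonian

$$H(x, u, \lambda) = \lambda^\top a(x, u) - c(x, u), \qquad (13.4)$$

a scalar, defined at each point of the path as a function of the current $x$, $u$ and $\lambda$.

Theorem 13.1 (PMP) Suppose $u(t)$ and $x(t)$ represent the optimal control and state trajectory. Then there exists an adjoint trajectory $\lambda(t)$ such that together $u(t)$, $x(t)$ and $\lambda(t)$ satisfy

$$\dot x = H_\lambda, \qquad [\,= a(x, u)\,] \qquad (13.5)$$
$$\dot\lambda = -H_x, \qquad [\,= -\lambda^\top a_x + c_x\,] \qquad (13.6)$$

and for all $t$, $0 \le t \le T$, and all feasible controls $v$,

$$H(x(t), v, \lambda(t)) \le H(x(t), u(t), \lambda(t)), \qquad (13.7)$$

i.e., the optimal control $u(t)$ is the value of $v$ maximizing $H(x(t), v, \lambda(t))$.

'Proof.'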
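Our heuristic proof is based upon the DP equation; this is the most direct and enlightening way to derive conclusions that may be expected to hold in general.

Assertion (13.5) is immediate, and (13.7) follows from the fact that the minimizing value of $u$ in (13.1) is optimal. We can write (13.1) in incremental form as

$$F(x) = \inf_u\big[c(x, u)\delta + F(x + a(x, u)\delta)\big] + o(\delta).$$

Using the chain rule to differentiate with respect to $x_i$ yields

$$-\lambda_i(t) = \frac{\partial c}{\partial x_i}\delta - \lambda_i(t + \delta) - \sum_j\frac{\partial a_j}{\partial x_i}\lambda_j(t + \delta)\delta + o(\delta),$$

whence (13.6) follows.

Notice that (13.5) and (13.6) each give $n$ equations. Condition (13.7) gives a further $m$ equations (since it requires stationarity with respect to variation of the $m$ components of $u$). So in principle these equations, if nonsingular, are sufficient to determine the $2n + m$ functions $u(t)$, $x(t)$ and $\lambda(t)$.

One can make other assertions, including specification of end-conditions (the so-called transversality conditions).

Theorem 13.2 (i) $H = 0$ on the optimal path. (ii) The sole initial condition is specification of the initial $x$. The terminal condition

$$(\lambda + K_x)^\top\sigma = 0 \qquad (13.8)$$

holds at the terminal $x$ for all $\sigma$ such that $x + \epsilon\sigma$ is within $o(\epsilon)$ of the termination point of a possible optimal trajectory for all sufficiently small positive $\epsilon$.

'Proof.' Assertion (i) follows from (13.1), and the first assertion of (ii) is evident. We have the terminal condition (13.2), from whence it follows that $(F_x - K_x)^\top\sigma = 0$ for all $x$, $\sigma$ such that $x$ and $x + \epsilon\sigma$ lie in $S$ for all small enough positive $\epsilon$. However, we are only interested in points where an optimal trajectory makes its first entry to $S$, and at these points (13.3) holds. Thus we must have (13.8).

13.2 Example: bringing a particle to rest in minimal time

A particle with given initial position and velocity $x_1(0)$, $x_2(0)$ is to be brought to rest at position 0 in minimal time. This is to be done using the control force $u$, such that $|u| \le 1$, with dynamics of $\dot x_1 = x_2$ and $\dot x_2 = u$. That is,

$$\frac{d}{dt}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} + \begin{pmatrix} 0 \\ 1 \end{pmatrix}u, \qquad (13.9)$$

and we wish to minimize

$$C = \int_0^T 1\,dt,$$

where $T$ is the first time at which $x = (0, 0)$. The Hamiltonian is

$$H = \lambda_1x_2 + \lambda_2u - 1,$$

which is maximized by $u = \operatorname{sign}(\lambda_2)$. The adjoint variables satisfy $\dot\lambda_i = -\partial H/\partial x_i$, so

$$\dot\lambda_1 = 0, \qquad \dot\lambda_2 = -\lambda_1. \qquad (13.10)$$

The terminal $x$ must be 0, so in (13.8) we can only take $\sigma = 0$ and so (13.8) provides no additional information for this problem.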
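However, if at termination $\lambda_1 = \alpha$, $\lambda_2 = \beta$, then in terms of time to go $s$ we can compute

$$\lambda_1 = \alpha, \qquad \lambda_2 = \beta + \alpha s.$$

These reveal the form of the solution: there is at most one change of sign of $\lambda_2$ on the optimal path; $u$ is maximal in one direction and then possibly maximal in the other.

Appealing to the fact that $H = 0$ at termination (when $x_2 = 0$), we conclude that $|\beta| = 1$. We now consider the case $\beta = 1$. The case $\beta = -1$ is similar.

If $\beta = 1$, $\alpha \ge 0$ then $\lambda_2 = 1 + \alpha s \ge 0$ for all $s \ge 0$ and

$$u = 1, \qquad x_2 = -s, \qquad x_1 = s^2/2.$$

In this case the optimal trajectory lies on the parabola $x_1 = x_2^2/2$, $x_1 \ge 0$, $x_2 \le 0$. This is half of the switching locus $x_1 = \pm x_2^2/2$.

If $\beta = 1$, $\alpha < 0$ then $u = -1$ or $u = 1$ as the time to go is greater or less than $s_0 = 1/|\alpha|$. In this case,

$$u = -1, \quad x_2 = s - 2s_0, \quad x_1 = 2s_0s - \tfrac12s^2 - s_0^2, \qquad s \ge s_0,$$
$$u = 1, \quad x_2 = -s, \quad x_1 = \tfrac12s^2, \qquad s \le s_0.$$

The control rule expressed as a function of $s$ is open-loop, but in terms of $(x_1, x_2)$ and the switching locus, it is closed-loop.

Notice that the path is sensitive to the initial conditions, in that the optimal path is very different for two points just either side of the switching locus.

Figure 2: Optimal trajectories for the problem.

13.3 Connection with Lagrangian multipliers

An alternative way to understand the maximum principle is to think of $\lambda$ as a Lagrangian multiplier associated with the constraint $\dot x = a(x, u)$. Consider the Lagrangian form

$$L = \int_0^T\big[-c - \lambda^\top(\dot x - a)\big]\,dt - K(x(T)),$$

to be maximized with respect to the $(x, u, \lambda)$ path. Here $x(t)$ first enters a stopping set $S$ at $t = T$. We integrate $\lambda^\top\dot x$ by parts to obtain

$$L = -\lambda(T)^\top x(T) + \lambda(0)^\top x(0) + \int_0^T\big[\dot\lambda^\top x + \lambda^\top a - c\big]\,dt - K(x(T)).$$

The integrand must be stationary with respect to $x(t)$ and hence $\dot\lambda = -H_x$, i.e., (13.6). The expression must also be stationary with respect to $\epsilon > 0$, $x(T) + \epsilon\sigma \in S$, and hence $(\lambda(T) + K_x(x(T)))^\top\sigma = 0$, i.e., (13.8). It is good to have this alternative view, but the treatment is less immediate and less easy to rigorise.

13.4 Example: use of transversality conditions

If the terminal time is constrained then (as we see in the next lecture) we no longer have Theorem 13.2 (i), i.e., that $H$ is maximized to 0, but the other claims of Theorems 13.1 and 13.2 continue to hold.

Consider a problem with the dynamics (13.9), but with $u$ unconstrained, $x(0) = (0, 0)$ and cost function

$$C = \frac12\int_0^T u(t)^2\,dt - x_1(T),$$

where $T$ is fixed and given. Here $K(x) = -x_1(T)$ and the Hamiltonian is

$$H(x, u, \lambda) = \lambda_1x_2 + \lambda_2u - \tfrac12u^2,$$

which is maximized at $u(t) = \lambda_2(t)$. Now $\dot\lambda_i = -\partial H/\partial x_i$ gives

$$\dot\lambda_1 = 0, \qquad \dot\lambda_2 = -\lambda_1.$$

In the terminal condition, $(\lambda + K_x)^\top\sigma = 0$, $\sigma$ is arbitrary and so we also have

$$\lambda_1(T) - 1 = 0, \qquad \lambda_2(T) = 0.$$

Thus the solution must be $\lambda_1(t) = 1$ and $\lambda_2(t) = T - t$. Hence the optimal applied force is $u(t) = T - t$, which decreases linearly with time and reaches zero at $T$.

14 Applications of the Maximum Principle

We discuss the terminal conditions of the maximum principle and further examples of its use. The arguments are typical of those used to synthesise a solution to an optimal control problem by use of the maximum principle.

14.1 Problems with terminal conditions

Suppose $a$, $c$, $S$ and $K$ are all $t$-dependent.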
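The DP equation for $F(x, t)$ is now to be

$$\inf_u\big[c + F_t + F_xa\big] = 0, \qquad (14.1)$$

outside a stopping set $S$, with $F(x, t) = K(x, t)$ for $(x, t)$ in $S$. However, we can reduce this to a formally time-invariant case by augmenting the state variable $x$ by the variable $t$. We then have the augmented variables

$$x \to \begin{pmatrix} x \\ t \end{pmatrix}, \qquad a \to \begin{pmatrix} a \\ 1 \end{pmatrix}, \qquad \lambda \to \begin{pmatrix} \lambda \\ \lambda_0 \end{pmatrix}.$$

We keep the same definition (13.4) as before, that $H = \lambda^\top a - c$, and take $\lambda_0 = -F_t$. It now follows from (14.1) that on the optimal trajectory $H(x, u, \lambda)$ is maximized to $-\lambda_0$.

Theorem 13.1 still holds, as can be verified. However, to (13.6) we can now add

$$\dot\lambda_0 = -H_t = c_t - \lambda^\top a_t, \qquad (14.2)$$

and the transversality condition

$$(\lambda + K_x)^\top\sigma + (\lambda_0 + K_t)\tau = 0, \qquad (14.3)$$

which must hold at the termination point $(x, t)$ if $(x + \epsilon\sigma, t + \epsilon\tau)$ is within $o(\epsilon)$ of the termination point of an optimal trajectory for all small enough positive $\epsilon$.

We can now understand what to do with various types of terminal condition. If the stopping rule specifies only a fixed terminal time $T$ then $\tau$ must be zero and $\sigma$ is unconstrained, so that (14.3) becomes $\lambda(T) = -K_x$. The problem in Section 13.4 is like this.

If there is a free terminal time then $\tau$ is unconstrained and so (14.3) gives $-\lambda_0(T) = K_t$. An example of this case appears in Section 14.2 below.

If the system is time-homogeneous, in that $a$ and $c$ are independent of $t$, but the terminal cost $K(x, T)$ depends on $T$, then (14.2) implies that $\lambda_0$ is constant and so the maximized value of $H$ is constant on the optimal orbit. The problem in Section 13.2 could have been treated this way. We take $K(x, T) = T$ and deduce from the transversality condition that, since $\tau$ is unconstrained, $\lambda_0 = -K_t = -1$. Thus $H = \lambda_1x_2 + \lambda_2u$ is maximized to 1 at all points of the optimal trajectory.

14.2 Example: monopolist

Miss Prout holds the entire remaining stock of Cambridge elderberry wine for the vintage year 1959. If she releases it at rate $u$ (in continuous time) she realises a unit price $p(u) = 1 - u/2$ for $0 \le u \le 2$, and $p(u) = 0$ for $u \ge 2$. She holds an amount $x$ at time 0 and wishes to release it in a way that maximizes her total discounted return, $\int_0^T e^{-\alpha t}up(u)\,dt$ (where $T$ is unconstrained).

Solution. The plant equation is $\dot x = -u$ and the Hamiltonian is

$$H(x, u, \lambda) = e^{-\alpha t}up(u) - \lambda u = e^{-\alpha t}u(1 - u/2) - \lambda u.$$

Note that $K = 0$. Maximizing with respect to $u$ and using $\dot\lambda = -H_x$ gives

$$u = 1 - \lambda e^{\alpha t}, \qquad \dot\lambda = 0, \qquad t \ge 0,$$

so $\lambda$ is constant. The terminal time is unconstrained, so the transversality condition gives $\lambda_0(T) = -K_t|_{t=T} = 0$. Therefore, since we require $H$ to be maximized to $-\lambda_0(T) = 0$ at $T$, we have $u(T) = 0$, and hence

$$\lambda = e^{-\alpha T}, \qquad u = 1 - e^{-\alpha(T - t)}, \qquad t \le T,$$

where $T$ is then the time at which all wine has been sold, and so

$$x = \int_0^T u\,dt = T - \big(1 - e^{-\alpha T}\big)/\alpha.$$

Thus $u$ is implicitly a function of $x$, through $T$. The optimal value function is

$$F(x) = \int_0^T\big(u - u^2/2\big)e^{-\alpha t}\,dt = \frac12\int_0^T\big(e^{-\alpha t} - e^{\alpha t - 2\alpha T}\big)\,dt = \frac{\big(1 - e^{-\alpha T}\big)^2}{2\alpha}.$$

14.3 Example: insects as optimizers

A colony of insects consists of workers and queens, of numbers $w(t)$ and $q(t)$ at time $t$. If a time-dependent proportion $u(t)$ of the colony's effort is put into producing workers ($0 \le u(t) \le 1$), then $w$, $q$ obey the equations

$$\dot w = auw - bw, \qquad \dot q = c(1 - u)w,$$

where $a$, $b$, $c$ are constants, with $a > b$. The function $u$ is to be chosen to maximize the number of queens at the end of the season. Show that the optimal policy is to produce only workers up to some moment, and produce only queens thereafter.

Solution. The Hamiltonian is

$$H = \lambda_1(auw - bw) + \lambda_2c(1 - u)w.$$

The adjoint equations and transversality conditions (with $K = -q$) give

$$-\dot\lambda_0 = H_t = 0, \qquad -\dot\lambda_1 = H_w = \lambda_1(au - b) + \lambda_2c(1 - u), \qquad -\dot\lambda_2 = H_q = 0,$$
$$\lambda_1(T) = -K_w = 0, \qquad \lambda_2(T) = -K_q = 1,$$

and hence $\lambda_0(t)$ is constant and $\lambda_2(t) = 1$ for all $t$. Therefore $H$ is maximized by $u = 0$ or $u = 1$ as $\lambda_1a - c \le 0$ or $\ge 0$. Since $\lambda_1(T) = 0$, we must have $u(T) = 0$.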
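If $t$ is a little less than $T$, $\lambda_1$ is small and $u = 0$, so the equation for $\lambda_1$ is

$$\dot\lambda_1 = \lambda_1b - c. \qquad (14.4)$$

As long as $\lambda_1$ is small, $\dot\lambda_1 < 0$. Therefore as the remaining time $s$ increases, $\lambda_1(s)$ increases, until such point that $\lambda_1a - c \ge 0$. The optimal control then becomes $u = 1$ and $\dot\lambda_1 = -\lambda_1(a - b) < 0$, which implies that $\lambda_1(s)$ continues to increase as $s$ increases, right back to the start. So there is no further switch in $u$.

The point at which the single switch occurs is found by integrating (14.4) from $t$ to $T$, to give $\lambda_1(t) = (c/b)\big(1 - e^{-(T - t)b}\big)$, and so the switch occurs where $\lambda_1a - c = 0$, i.e., $1 - e^{-(T - t)b} = b/a$, or

$$t_{\text{switch}} = T + (1/b)\log(1 - b/a).$$

Experimental evidence suggests that social insects do closely follow this policy and adopt a switch time that is nearly optimal for their natural environment.

14.4 Example: rocket thrust optimization

Regard a rocket as a point mass with position $x$, velocity $v$ and mass $m$. Mass is changed only by expulsion of matter in the jet. Suppose the jet has vector velocity $k$ relative to the rocket and the rocket is subject to external force $f$. Then the condition of momentum conservation yields

$$(m - \delta m)(v + \delta v) + (v - k)\,\delta m - mv = f\,\delta t,$$

where $\delta m$ is the mass expelled in time $\delta t$, and this gives the so-called 'rocket equation',

$$m\dot v = k\,\frac{\delta m}{\delta t} + f.$$

Suppose the jet speed $|k| = 1/b$ is fixed, but the direction and the rate of expulsion of mass can be varied. Then the control is the thrust vector $u = k\,\delta m/\delta t$, subject to $|u| \le 1$, say. Find the control that maximizes the height that the rocket reaches.

Solution. The plant equation (in $\mathbb{R}^3$) is

$$\dot x = v, \qquad m\dot v = u + f, \qquad \dot m = -b|u|.$$

We take dual variables $p$, $q$, $r$ corresponding to $x$, $v$, $m$. Then

$$H = p^\top v + \frac{q^\top(u + f)}{m} - rb|u| - c$$

(where if the costs are purely terminal $c = 0$), and $u$ must maximize

$$\frac{q^\top u}{m} - br|u|.$$

The optimal $u$ is in the direction of $q$, so $u = |u|q/|q|$ and $|u|$ maximizes $|u|\big(|q|/m - br\big)$. Thus the optimal thrust should be maximal, intermediate or null according as $|q|/m - rb$ is positive, zero or negative.

The control is bang-bang and $p$, $q$, $r$ are determined from the dual equations. If the rocket is launched vertically then $f = -mg$ and the dual equations give $\dot p = 0$, $\dot q = -p$ and $\dot r = qu/m^2 > 0$. Suppose we want to maximize the height that the rocket attains. Let $m_0$ be the mass of the rocket structure, so that the maximum height has been reached if $m = m_0$ and $v \le 0$. Since $K = -x$ at termination, the transversality conditions give $p(T) = 1$, $q(T) = 0$. Thus $p(s) = 1$, $q(s) = s$, and $|u|$ must maximize $|u|(s/m - br)$. One can check that $(d/ds)(s/m - rb) > 0$, and hence we should use full thrust from launch up to some time, and thereafter coast to maximum height on zero thrust.

15 Controlled Markov Jump Processes

We conclude with models for controlled optimization problems in a continuous-time stochastic setting. This lecture is about controlled Markov jump processes, which are relevant when the state space is discrete.

15.1 The dynamic programming equation

The DP equation in incremental form is

$$F(x, t) = \inf_u\big\{c(x, u)\,\delta t + E[F(x(t + \delta t), t + \delta t) \mid x(t) = x,\, u(t) = u]\big\}.$$

If appropriate limits exist then this can be written in the limit $\delta t \downarrow 0$ as

$$\inf_u\big[c(x, u) + F_t(x, t) + \Lambda(u)F(x, t)\big] = 0.$$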
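Here $\Lambda(u)$ is the operator defined by

$$\Lambda(u)\phi(x) = \lim_{\delta t \downarrow 0}\left[\frac{E[\phi(x(t + \delta t)) \mid x(t) = x,\, u(t) = u] - \phi(x)}{\delta t}\right] \qquad (15.1)$$

or

$$\Lambda(u)\phi(x) = \lim_{\delta t \downarrow 0}E\left[\frac{\phi(x(t + \delta t)) - \phi(x)}{\delta t}\,\Big|\, x(t) = x,\, u(t) = u\right],$$

the conditional expectation of the 'rate of change' of $\phi(x)$ along the path. The operator $\Lambda$ converts a scalar function of state, $\phi(x)$, to another such function, $\Lambda\phi(x)$. However, its action depends upon the control $u$, so we write it as $\Lambda(u)$. It is called the infinitesimal generator of the controlled Markov process. Equation (15.1) is equivalent to

$$E[\phi(x(t + \delta t)) \mid x(t) = x,\, u(t) = u] = \phi(x) + \Lambda(u)\phi(x)\,\delta t + o(\delta t).$$

This equation takes radically different forms depending upon whether the state space is discrete or continuous. Both are important, and we examine their forms in turn, beginning with a discrete state space.

15.2 The case of a discrete state space

Suppose that $x$ can take only values in a discrete set, labelled by an integer $j$, say, and that the transition intensity

$$\lambda_{jk}(u) = \lim_{\delta t \downarrow 0}\frac{1}{\delta t}P\big(x(t + \delta t) = k \mid x(t) = j,\, u(t) = u\big)$$

is defined for all $u$ and $j \ne k$. Then

$$E[\phi(x(t + \delta t)) \mid x(t) = j,\, u(t) = u] = \sum_{k \ne j}\lambda_{jk}(u)\phi(k)\,\delta t + \Big(1 - \sum_{k \ne j}\lambda_{jk}(u)\,\delta t\Big)\phi(j) + o(\delta t),$$

whence it follows that

$$\Lambda(u)\phi(j) = \sum_k\lambda_{jk}(u)[\phi(k) - \phi(j)]$$

and the DP equation becomes

$$\inf_u\Big[c(j, u) + F_t(j, t) + \sum_k\lambda_{jk}(u)[F(k, t) - F(j, t)]\Big] = 0. \qquad (15.2)$$

This is the optimality equation for a Markov jump process.

15.3 Uniformization in the infinite horizon case

In this section we explain how (in the infinite horizon case) the continuous time DP equation (15.2) can be rewritten to look like a discrete-time DP equation. Once this is done then all the ideas of Lectures 1-6 can be applied. In the discounted cost case (15.2) undergoes the usual modification to

$$\inf_u\Big[c(j, u) - \alpha F(j, t) + F_t(j, t) + \sum_k\lambda_{jk}(u)[F(k, t) - F(j, t)]\Big] = 0.$$

In the infinite horizon case, everything becomes independent of time and we have

$$\inf_u\Big[c(j, u) - \alpha F(j) + \sum_k\lambda_{jk}(u)[F(k) - F(j)]\Big] = 0. \qquad (15.3)$$

Suppose we can choose a $B$ large enough that it is possible to define

$$\lambda_{jj}(u) = B - \sum_{k \ne j}\lambda_{jk}(u) \ge 0,$$

for all $j$ and $u$. By adding $(B + \alpha)F(j)$ to both sides of (15.3), the DP equation can be written

$$(B + \alpha)F(j) = \inf_u\Big[c(j, u) + \sum_k\lambda_{jk}(u)F(k)\Big].$$

Finally, dividing by $B + \alpha$, this can be written as

$$F(j) = \inf_u\Big[\bar c(j, u) + \beta\sum_kp_{jk}(u)F(k)\Big], \qquad (15.4)$$

where

$$\bar c(j, u) = \frac{c(j, u)}{B + \alpha}, \qquad \beta = \frac{B}{B + \alpha}, \qquad p_{jk}(u) = \frac{\lambda_{jk}(u)}{B}, \qquad \sum_kp_{jk}(u) = 1.$$

This makes the dynamic programming equation look like a case of discounted dynamic programming in discrete time, or of negative programming if $\alpha = 0$. All the results we have for those cases can now be used (e.g., value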
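iteration, OSLA rules, etc.). The trick of using a large $B$ to make the reduction from a continuous to a discrete time formulation is called uniformization.

In the undiscounted case we could try a solution to (15.2) of the form $F(j, t) = -\gamma t + \phi(j)$. Substituting this in (15.2), we see that this gives a solution provided

$$0 = \inf_u\Big[c(j, u) - \gamma + \sum_k\lambda_{jk}(u)[\phi(k) - \phi(j)]\Big].$$

By adding $B\phi(j)$ to both sides of the above, then dividing by $B$, setting $\bar\gamma = \gamma/B$, and making the other substitutions above (but with $\alpha = 0$), this is equivalent to

$$\phi(j) + \bar\gamma = \inf_u\Big[\bar c(j, u) + \sum_kp_{jk}(u)\phi(k)\Big], \qquad (15.5)$$

which has the same form as the discrete-time average-cost optimality equation of Lecture 6. The theorems and techniques of that lecture can now be applied.

15.4 Example: admission control at a queue

Consider a queue of varying size $0, 1, \ldots$, with constant service rate $\mu$ and arrival rate $u$, where $u$ is controllable between 0 and a maximum value $\lambda$. Let $c(x, u) = ax - Ru$. This corresponds to paying a cost $a$ per unit time for each customer in the queue and receiving a reward $R$ at the point that each new customer is admitted (and therefore incurring reward at rate $Ru$ when the arrival rate is $u$). Let us take $B = \lambda + \mu$, and without loss of generality assume $B = 1$. The average cost optimality equation from (15.5) is

$$\phi(0) + \gamma = \inf_u\big[-Ru + u\phi(1) + (\mu + \lambda - u)\phi(0)\big] = \inf_u\big[u\{-R + \phi(1) - \phi(0)\} + (\mu + \lambda)\phi(0)\big],$$
$$\phi(x) + \gamma = \inf_u\big[ax - Ru + u\phi(x + 1) + \mu\phi(x - 1) + (\lambda - u)\phi(x)\big]$$
$$= \inf_u\big[ax + u\{-R + \phi(x + 1) - \phi(x)\} + \mu\phi(x - 1) + \lambda\phi(x)\big], \quad x > 0.$$

Thus $u$ should be chosen to be 0 or $\lambda$ as $-R + \phi(x + 1) - \phi(x)$ is positive or negative.

Let us consider what happens under the policy that takes $u = \lambda$ for all $x$. The relative costs for this policy, say $f$, are given by

$$f(x) + \gamma = ax - R\lambda + \lambda f(x + 1) + \mu f(x - 1), \quad x > 0.$$

The solution to the homogeneous part of this recursion is of the form $f(x) = d_11^x + d_2(\mu/\lambda)^x$. Assuming $\lambda < \mu$, and since we desire a solution for $f$ that does not grow exponentially, we take $d_2 = 0$, and so the solution is effectively the solution to the inhomogeneous part, i.e.,

$$f(x) = \frac{ax(x + 1)}{2(\mu - \lambda)}, \qquad \gamma = \frac{a\lambda}{\mu - \lambda} - \lambda R.$$

Applying the idea of policy improvement, we conclude that a better policy is to take $u = 0$ (i.e., not admit a customer) if $-R + f(x + 1) - f(x) > 0$, i.e., if

$$\frac{(x + 1)a}{\mu - \lambda} - R > 0.$$

Further policy improvement may be needed to reach the optimal policy. However, this policy already exhibits an interesting property: it rejects customers for smaller queue length $x$ than does a policy which rejects a customer if and only if

$$\frac{(x + 1)a}{\mu} - R > 0.$$

This second policy is optimal if one is purely concerned with whether or not an individual customer that joins when there are $x$ customers in front of him will show a profit, on the basis of the difference between the reward $R$ and his expected holding cost $(x + 1)a/\mu$. This example exhibits the difference between individual optimality (which is myopic) and social optimality.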
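The two rejection rules in this example can be compared numerically. The following is a minimal sketch, assuming the 'always admit' relative cost $f(x) = ax(x+1)/(2(\mu-\lambda))$ with $\gamma = a\lambda/(\mu-\lambda) - \lambda R$ under the normalisation $B = \lambda + \mu = 1$; the parameter values and the function names are illustrative only and do not come from the notes. It uses exact rational arithmetic to check that $f$ satisfies the recursion and then finds the smallest queue length at which each rule rejects.

```python
from fractions import Fraction

def relative_cost(x, a, lam, mu):
    """f(x) = a x (x + 1) / (2 (mu - lam)): relative cost under 'always admit'."""
    return a * x * (x + 1) / (2 * (mu - lam))

def average_cost(a, lam, mu, R):
    """gamma = a lam / (mu - lam) - lam R for the 'always admit' policy."""
    return a * lam / (mu - lam) - lam * R

def first_rejected(a, rate, R):
    """Smallest queue length x with (x + 1) a / rate - R > 0."""
    x = 0
    while (x + 1) * a / rate - R <= 0:
        x += 1
    return x

# Illustrative parameters (assumed), normalised so that B = lam + mu = 1.
a, R = Fraction(1), Fraction(9, 2)
lam, mu = Fraction(3, 10), Fraction(7, 10)

gamma = average_cost(a, lam, mu, R)

# Residual of f(x) + gamma = a x - R lam + lam f(x+1) + mu f(x-1), for x > 0.
residuals = [
    relative_cost(x, a, lam, mu) + gamma
    - (a * x - R * lam
       + lam * relative_cost(x + 1, a, lam, mu)
       + mu * relative_cost(x - 1, a, lam, mu))
    for x in range(1, 20)
]

improved = first_rejected(a, mu - lam, R)   # one step of policy improvement
individual = first_rejected(a, mu, R)       # myopic individual rule
print(residuals[:3], improved, individual)
```

With these assumed values the improved policy starts rejecting at a shorter queue than the individually optimal rule, which is the qualitative point of the example.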
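The socially optimal policy is more reluctant to admit customers because it anticipates that more customers are on the way; thus it feels less badly about forgoing the profit on a customer that presents himself now, recognizing that admitting such a customer can cause customers who are admitted after him to suffer greater delay. As expected, the policies are nearly the same if the arrival rate $\lambda$ is small.

16 Controlled Diffusion Processes

We give a brief introduction to controlled continuous-time stochastic models with a continuous state space, i.e., controlled diffusion processes.

16.1 Diffusion processes and controlled diffusion processes

The Wiener process $\{B(t)\}$ is a scalar process for which $B(0) = 0$, the increments in $B$ over disjoint time intervals are statistically independent and $B(t)$ is normally distributed with zero mean and variance $t$. ('B' stands for Brownian motion.) This specification is internally consistent because, for example,

$$B(t) = B(t_1) + [B(t) - B(t_1)]$$

has, for $0 \le t_1 \le t$, zero mean and variance $t_1 + (t - t_1) = t$.

If $\delta B$ is the increment of $B$ in a time interval of length $\delta t$ then

$$E(\delta B) = 0, \qquad E[(\delta B)^2] = \delta t, \qquad E[(\delta B)^j] = o(\delta t) \text{ for } j > 2,$$

where the expectation is one conditional on the past of the process. Note that since

$$E[(\delta B/\delta t)^2] = O[(\delta t)^{-1}] \to \infty,$$

the formal derivative $\epsilon = dB/dt$ (continuous-time 'white noise') does not exist in a mean-square sense, but expectations such as

$$E\left[\Big\{\int\alpha(t)\epsilon(t)\,dt\Big\}^2\right] = E\left[\Big\{\int\alpha(t)\,dB(t)\Big\}^2\right] = \int\alpha(t)^2\,dt$$

make sense if the integral is convergent.

Now consider a stochastic differential equation

$$\delta x = a(x, u)\,\delta t + g(x, u)\,\delta B,$$

which we shall write formally as

$$\dot x = a(x, u) + g(x, u)\epsilon.$$

This, as a Markov process, has an infinitesimal generator with action

$$\Lambda(u)\phi(x) = \lim_{\delta t \downarrow 0}E\left[\frac{\phi(x(t + \delta t)) - \phi(x)}{\delta t}\,\Big|\, x(t) = x,\, u(t) = u\right] = \phi_xa + \tfrac12\phi_{xx}g^2 = \phi_xa + \tfrac12N\phi_{xx},$$

where $N(x, u) = g(x, u)^2$. The DP equation is thus

$$\inf_u\big[c + F_t + F_xa + \tfrac12NF_{xx}\big] = 0.$$

In the vector case the DP equation is

$$\inf_u\big[c + F_t + F_x^\top a + \tfrac12\operatorname{tr}(NF_{xx})\big] = 0.$$

16.2 Example: LQG in continuous time

The DP equation is

$$\inf_u\big[x^\top Rx + u^\top Qu + F_t + F_x^\top(Ax + Bu) + \tfrac12\operatorname{tr}(NF_{xx})\big] = 0.$$

In analogy with the discrete and deterministic continuous cases that we have considered previously, we try a solution of the form

$$F(x, t) = x^\top\Pi(t)x + \gamma(t).$$

This leads to the same Riccati equation as in Section 12.2,

$$0 = x^\top\Big[R + \Pi A + A^\top\Pi - \Pi BQ^{-1}B^\top\Pi + \frac{d\Pi}{dt}\Big]x,$$

and also, as in Section 7.3,

$$\frac{d\gamma}{dt} + \operatorname{tr}(N\Pi(t)) = 0, \quad\text{giving}\quad \gamma(t) = \int_t^T\operatorname{tr}(N\Pi(\tau))\,d\tau.$$

16.3 Example: passage to a stopping set

Consider a problem of movement on the unit interval $0 \le x \le 1$ in continuous time, $\dot x = u + \epsilon$, where $\epsilon$ is white noise of power $v$. The process terminates at time $T$ when $x$ reaches one end or the other of the interval.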
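The cost is made up of an integral term $\tfrac12\int_0^T(L + Qu^2)\,dt$, penalising both control and time spent, and a terminal cost which takes the value $C_0$ or $C_1$ according as termination takes place at 0 or 1.

Show that in the deterministic case $v = 0$ one should head straight for one of the termination points at a constant rate and that the value function $F(x)$ has a piecewise linear form, with possibly a discontinuity at one of the boundary points if that boundary point is the optimal target from no interior point of the interval.

Show, in the stochastic case, that the dynamic programming equation with the control value optimized out can be linearised by a transformation $F(x) = \alpha\log\phi(x)$ for a suitable constant $\alpha$, and hence solve the problem.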
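The linearising transformation in this example can be checked numerically. The following is a minimal sketch, assuming that the stochastic DP equation takes the form $\inf_u[\tfrac12(L + Qu^2) + uF_x + \tfrac12vF_{xx}] = 0$ with $F(0) = C_0$, $F(1) = C_1$, so that the minimizing control is $u = -F_x/Q$ and the equation becomes $\tfrac12L - F_x^2/(2Q) + \tfrac12vF_{xx} = 0$. Under that assumption the choice $\alpha = -Qv$ removes the quadratic term and leaves $\phi'' = \kappa^2\phi$ with $\kappa^2 = L/(Qv^2)$. The parameter values and function names below are illustrative only.

```python
import math

def solve_passage(L, Q, v, C0, C1):
    """Return (F, phi) with F = -Q v log(phi) and phi'' = kappa^2 phi.

    Assumes the optimized DP equation
        L/2 - F_x**2 / (2 Q) + (v / 2) F_xx = 0,  F(0) = C0,  F(1) = C1.
    """
    kappa = math.sqrt(L / Q) / v
    p0 = math.exp(-C0 / (Q * v))   # phi(0), from F(0) = C0
    p1 = math.exp(-C1 / (Q * v))   # phi(1), from F(1) = C1
    # phi(x) = A e^{kappa x} + B e^{-kappa x}, fitted to the two boundary values.
    A = (p1 - p0 * math.exp(-kappa)) / (math.exp(kappa) - math.exp(-kappa))
    B = p0 - A

    def phi(x):
        return A * math.exp(kappa * x) + B * math.exp(-kappa * x)

    def F(x):
        return -Q * v * math.log(phi(x))

    return F, phi

def dp_residual(F, x, L, Q, v, h=1e-4):
    """Finite-difference residual of L/2 - F_x^2/(2Q) + (v/2) F_xx at x."""
    Fx = (F(x + h) - F(x - h)) / (2 * h)
    Fxx = (F(x + h) - 2 * F(x) + F(x - h)) / h**2
    return L / 2 - Fx**2 / (2 * Q) + 0.5 * v * Fxx

# Illustrative parameters (assumed).
L, Q, v, C0, C1 = 1.0, 1.0, 0.5, 0.0, 0.5
F, phi = solve_passage(L, Q, v, C0, C1)
worst = max(abs(dp_residual(F, k / 10, L, Q, v)) for k in range(1, 10))
print(F(0.0), F(1.0), worst)
```

The residual is evaluated only at interior points; the corresponding feedback control under the same assumptions would be $u(x) = -F_x(x)/Q$.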
