You are on page 1of 12




Optimal Cost and Policy for a Markoyian Replacement Problemr
E. L . Se n N r xr n u o S. L M e n cu sr Communicatedbv Y. C. Ho

Abstract. We consider the computation of the optimal cost and policy associatedwith a two-dimensional Markov replacement problem with partial observations,for two specialcasesof observation quality. Relying'on structural results available for the optimal policy associatedwith these two particular models, we show that, in both cases,the infinitehonzon, optimal discountedcost function is piecewise linear, and provide formulas for computing the cost and the policy. Several examples illustrate the usefulnessof the results. Key Words. Markov chain, dynamic programming, discounted cost, optimal policy.

l. Introduction In this paper, we consider the computation of the optimal cost and policy for a discretetime, two-dimensionalMarkov replacementprocess with partial observations, two specialcasesof observationquality. Let for l:0, 1,2, . . . } denote stateassociated the with a machine that produces {x,,

'Thjs researchwas supported by the Air Force Office of Scientific ResearchGrant AFOSR-860029,by the National Science FoundationGrant ECS-86- 7860,by the AdvancedTechnology I Program of the State of Texas,and by the Air Force Ofhce of ScientificResearch(AFSC) Contracr F 49620-89 -0044 -C 2Graduate Student,Departmentof Electricaland Computer Engineering, University of Texas, Austin, Texas, rProfessor, Departmentof Eleclricaland Computer Engineering,University of Texas,Austin, Texas.


0022-3239 | i 100041 05$06,50/0 .( l g9l Plenun Publishing Corporation /9

That i s.sincethey provide upper and lower boundsfor are the optimal value of the cost function in the partially observed(po) case .. and as we will see. The decisions available to the decision maker are inspect. . Ref. 2}: U. l.for example. 11. the piecewise give the sarneresults the optimal cost and policy. Ref.that the infinite-horizon optimal cost function is optimal policy is finitely. 3..only one action is optimal. I.. are not finitely transient. Chapter5). Ross (Ref. r(t) and p(t) will usually be denotedby r andp.In addition.11. This characterization implies that one need only look within the classof structuredpolicies when searchingfor an optimal designadaptivepoliciesto overcomeuncertainties of in changes the valuesof the parameters the model (see.. This should be contrastedwith the caseof perfect observations. all the relevantinformation for selecting control action at time I is summarized (see. so that this structural information can be usedto designefficient algorithmsfor computingthe optimal cost and policy (see.produce" standsfor pro{produce.for example. 2}. OC TOBER9 9 l t0'l it em s .no observations the stateare i n s p e cfo r p 1 < p < p 2 . thos.The complicationarises. 0<p(r) <l. 5) and White (Ref. 5..3.among other reasons. DecisionIVIodel dependenton the stateof the The cost of the item producedis assumed being 0 if the machine is in the good state and C if the machine.etc. as well as severalvariations of the here (Refs. and theclosed u. 8-10. 20 (equipment checkingand target iearch). i:1. and Ref l1.our hope is that theoreticalinsights for more complex problems tan be derivedfrom theseresults. or four intervals such that. replace}: {0. two.p(t)).and of "inspection" refers to produce with inspection. several See.Ref. Refs. which only two actionsare availin able.r. Ref' 18 (one-armed bandit problem).tr(t):(l-p(t). ( see. a function of p. chapter 3) the conditionalprobability distribution.Ref.inspection region. l4).12) gavesufficientconditions for the optimal policy to be characterized.we describethe models consideredhere in more detail. 1. Using theseresultsand the dynamic programming (Dp) formulation (Refs. for all valuesofp in each of theseintervals.somestructuredoptimal policieswhich Sondik'sresults. ducewithout inspection.bad}.7 l . someconcludingremarksregardto ing the extension more generalmodels are given in Section5' and Notation 2. 71.the cost costs are for machineis in the bad state. that can be obtained by using Sondik's algorithm (wheneverthe optimal policiesare finitely transient). 5. 2. .). ta k e s v a l u e si n X : { 0 .for example. 22). problemconsidered replacement the modelsof the linearity of the optimal cost function also follows from the hence.2.1. . i. piecewise Thereare however.N O. by three numbersp. s u c h th a t i t i s o p ti m alto producefor 0< p< p1 and pz < pS p s . Recall that Sondik (Refs. iinear. ) the control process. Sincethe true stateof the machineis not known. and more recentlyRefs. I and 2. 590. the by e.This is the spirit in which this paper ts beingwritten. three. introducingsomenotation to be used throughout the work. finally.106 JorA: vot-. S pr!h < 1 . Section of 3 containsthe characterization the optimal cost function. Theseexpressions compute u.2}.g.transient.Refs. because the control actionshave to be specified eachof the uncountablyinfinite for number of valuesof p. sinceit providesan easyway iterativeprocedure. as 0< p. t= 0..for the PO case. p' 846) ' O t her and R respecti vely authors prefer to specifythe inspectioncost as the production cost plus a . Refs' I 5. l6) ' decisionproblemsthat are naturally modinteresting ih. The information obtained can in turn be of to ihe parameters or used.5. of analyses. D enote by {u. 13.e.e. 4 (advertisingmodel). ln the next section. 2l (internal audit timing).Ref . g. p.8) cost is piecewise optimal discounted showed. respecrively.4-12). seeRefs.3. simplilying the computationconsiderably. our problem considered replacement on research computationaltechniquesprovides exact solutions for certain approachto overcomethis difficultyhas beenthat of finding structural resultscharacterizing optimal cost and the policy (see.Ref' l7 eled using two states..e'g'. {0. of specialinterestto us are the resultsconcerningthe structureof the optimal policy associated with the modelsconsidered here. valueof a. I l. the optimal cost and policy. ur.e loop (CL) case. a n dto repl ace p3< p< 1.the computationof the optimal cost and policy is not easy. omitting explicit dependence t.2.g. in Thesecases of interest.e.the state being perfectly observed the inspectiontimes. here.e{0. denotes decision The the taken at time r. the piecewise linear whenever associated with structured optimal policiesassociated We will seethat almost all the herearefinitely transient.a t for structured optimal policy may consistof one. withp(l) the probabilirythat the machineis in the bad stateat time r givenpast observations and actions. namely. Ref' 19 (inspectionof standby systems). whereone needonly computethe control for eachof the finite numberof valuesof the state.e.and in these casesour approach still applies to linearity of the optimal cost function' show the piecewise linearity is used here to developformulas to In addition. also referredto as informationvector. with respect to perform sensitivity the model. ocroBER 1991 l JOTA:VOL .12. on Evenfor thesesimplemodels.The inspectionand replacement and given by 1 simplicity taken to be independentof the underlying state.9.. we show that the infinite_horizon. x .they require no This is particularlyuseful. 1 } : {good.and Ref. No. Four examples in areexamined detail in Section4. l. (optimal stopping times). whichu.The two special at cases considered hereare: the completelyunobserved (cU) case.We referto these intervals regions as (i.. 7.

2. l. if the machrne in is replaced the beginningof period . of observ_ ation.The observation process t:1.) and qit(u. Thesematrices are given by to epoch(Ref.The modeljust described the one studiedby Ross in Ref.(n(t)).u. .239). (3) . e( J. .=0. . U u.... t {y.:g1 ') for all valuesof r. Then.X x U R.l .}P' is an admissible in statex.v the initial probability distribution zr and the strategyg (seeRef.p.eU. and whetherto replace machineor not.eU.therefore. and g': [0.eU. ll). lr Let P(u. l/p(tc) is the expected cost accruedwhen an optimal policy is selected. 1] (l a) . u. 1] + U that is.The machineevolvesaccordingto transition probabilities pa(a. l]+ U.. B. 1. is of mapsgr: [0. and observation a objectiveis to find an optimal admissiblecontrol policy that minimizesthe expected discountedcosl Jr(x). 1. is the cost accruedwhen the machine is measurable poiicy.225). . At time /. u.1f.)lLt..(rc(t)). Define V p(r)=t nf Js( n) .eU. is the discountfactor. (2) . } t . The results to be presentedin the sequelfor Ross' model can also be obtained in a straightforwardmanner for White's model.r : 0. case. l99l VO NO l99l JO TA: L. u.p (u ):Pr y . The results to be presented here can be obtained in a straightforwardmanner for this modelas well. 12.u.we need only conpolicies. 22) replacement models.u. OCTOBER 71. .: i . X1. i for t:0. . Similar commentscan be made in relation to Wang's (Ref.1. selected. l\ I lt+ t. . It is further assumed that no item was produced duringthat period. Ref.l' ] is the expectation-with on D induced b. Xt+ 1.. However. NO. ll) in the sequence by in which the eventstake place. given by p(o):p(r)=[t.). p. 2.can still be written as u. wher enow n( l) is t he a post er iorpr obabilit ydist r ibut ion costcan be expressed This is because expected the definedoverthe two states. If the machineis replaced. on the otherhand. y. 236). {g. ) ) r . a B function c'.More precisely. the stateof the machineat the beginning period t+ I is determined of by the transition probability matrix p(2). This is model differsfrom that described white (Ref.:g..i. pp. cll . 3. the replacement the machineat period l places machinein the good of the respectto the uniqueprobability measure here. . is the probability makinga correct 4..Eg.a i. To avoid trivialities.236.1 @ )--Px ' + r:i f x .*.eY . 5.). 16. 5. 5g7). example. Y' w ' l . ( 1b) where d is the probability of machinefailure in one time step.).eU.. If g.I . For x..r. u. and in the functional of the probabilities in the definition equation satisfiedby the cost function.0].108 JOTA: VOL. p.r:l a ' : 0. . where e [0. u: 0. u. be the transition probability matriceswith entriespa(rz. see siderstationary Let c(u.In eithercase.with the discretetopology.r t-o " " l 4" o i . u . } gD- sequences. The observationprocessis related to the stateand the control processes meansof the conditional probabilities by q .a t : k f x . u. 7l. u. for Ref. sincethe computationof n(t) doesnot involve any observations.2}... . c(x. c1x. the policy is said to be stationary(when computing optimal policiesin the infinite-horizoncase. r.This leads minor changes stateat the nextdecision pii(u. takesvalues X: {0.): (c( 0. If no observations available. Pri x r= l } ) are given. .2.:g. for l:0. 1. explicitlyin termsof n(t).). 1. . . u . defined by P .action. i l t . 23. givenby f..Let D*=(Xx Ux Y)in We are interested the infinite-horizon with elements {0.: u\s . E u. one must decidewhetherto inspectthe item producedor not. ) . 9) and Hughes' (Ref.Qo Q. and let D be the in be the space infinite sequences of Borel o-algebraobtainedby endowing D.}. then it is assumed at that the machrne will be in the goodstareat the end of that period(Ref. .: uj . {Xo. andg:{g.5..) entriesof the observation the matricese@). l :0.2. in Assumethat the initial probabilities :0 r(0 ) : (P r{ x 6 } . Ross' model. . I I.5. I }.l..o>l'r-'. a sequence Borel measurable ' are suchthat u. . . The of represents realizatton the state. assume we 0<C<1<R (seeRefs. .we assume0 < 0 < l.p. leN u {0}. ! t . and actionr. .O CTO BER 109 fixedfee (see. trf . or Ref..: i . 1] x [0. { wrth qip@. this case{g'}[1 is a deterministic sequence.('.eX .: url. I . . p. will the it bc in the good statcat the end ofperiod r. 140-144) white's model. 1] x [0. 0<B <l.

0) : rQ) : p(l . are these structured pe[O.u: 0. i s al w aysess For q6:0.a.2): r(t. (v) T(l . the case R w henever > C /(1 .p. I. It is well known (e. r. n what f ollows. where T ( 0.5 (CU case) : T( (vi ) r(0. o'): trQp@)P(u)e. (6 ) The next lemma collectssomeof the propertiesof the maps T(k. u= 0 . D/ Q .we take advantage the fact that the of conditionaldistribution vector z is characterized the scalarconditional by probabilityp. for eachkeX.1 . Consider first the two-action CU case. p. For the remainderof the paper.1 . OC TOB E R N.l i l (1 -p)(l . ft= 0. I]" and "proconsidered. l. e (a )+B L D (k.s. p.+ p(l -q. l) . o ): (l .l .g.p.'. u )..(u)the 2x2 diagonal matrix rvith entries qit@). u:0. u )< T (\. and it is constantfor fixed C.o.3. ul )| . V p(T(k. p. u) for pef\. u) = xQe@)P(u)/D(k. tor u:0. ) .u) is monotonenondecreasing [0. 1).2q. .+ p (t-q ..5 one obtains D (k. N O. t henz! > 1 and T\ 0.r. T-' sati sfyi ng (p)< p lor pef O . l.0) . Theorem 3. (vi i ) For0< a<b< l. a:0.that is.p.'( p) : ( p. ( ii) T (0 . r Q ) sat isf ies p) >p forpe [0. presentthe characterizatton the cost functton in modelsdescribed Section 1.p. p.0): D( k. 1). is the probability that the next observationwill be k. given the probability distributiona and action u. 5. reducingthe optimizationproblem (4) to a scalarone. From Lemma 2. for u:0. u ) i n [0 . T(k. p. l]" to be optimal. l)t and Qr. the expecteddisgiventhat the machwhen an optimal policy is selected countedcost accrued ine startsin the bad statewith probabilityp. u)). w i th uniquef ixedpoint p: 1.0 ) + 0. and fluturecostsare discountedat rate P.)1.r ." and wherethe first term in bracesis associated the secondone with the action "replace.0) ) . l . 1. u ):l (l -p )0 (l and -q " )+ p q . p. u). B.7 1O . 1].T (k .' . 5) that Vp(rc)is the unique solution of V B@ ):m i n \rc . r. with e=(1. z3) and p> T(10. 1. (i v) T(0. T. u). u) f or pe[ 0. L e t q " e (0 .0 ).3 that. u ):[(1 -p )0 q .'@) : ( ba) / ( l.R+ Bvp( e) }.u )< T . 0. 0) = ( ') : p( le) + e.)-pq. menand so the proof is omitted. for the two special of 3.p<T(0.0) l( t . see Ref . tionedabove)..l .0). l) . r) i n [0.u) hasf ixedpoint sz! : l ar d zl: @q) lQ q. k: 0. l ]. 1. r@ . h e n: T (i) T(k..1(vi)and Eq. r." the latter is also referredto ducefor for as a controllimit policy. R. l. Aiso. one can rvrite the functional equation for Vp(p).when qe:0. Theor em3.2):0.2): 0.u) is the updatedconditional probabilitygivenobservation action u. t. | . o: 0. . r. 5) gave necessary and sufficientconditionsfor the policy"producefor all pe[0. 4a. when only two actions are policies "producefor allpe[0. 1. p.)]i K I-p )q . of It follows from Ref. (5b) Thus. 1) . l].2 in P. V p(p):min{Cp+ FVp( T( p) ) .. k: 0. OC TOB E R l 99l lll given that the initial probability distribution is a. T(k.2 in Ret 1l or Lemma 2.7l . ' . Hence.110 J O T A :VOL . In particular. p. u: 0."Also.< Il (2. u) for pe(zoz. I l t1 . for the two models of the problem considered replacement herq. oeU[ k:O ) f) (4) where D(k. Bayes'rule it is given by T(k.5 .2. (5a) k:0.p . 1. ( 7) r(0. Lem m a2 . Characterization the Optimal Cost CU Model.q. 2) : | / 2. 1. p. r.P( 1 ..A) >b.Refs. and prior distribution z.' (0 . u) >p f or pe [ 0. l 99l J OTA : V OL. andp: 1 is it s uniquef ixedpoint . In addition.2. u). t: with the action "produce. u:0.ate as follows: B.p. T ( l. and future costsare discounted at :. T t \ D. and using k. lf q. we of In the next section. Ross (Ref. we know that r(k.the optimal policy has one of the four in structures described Section l.p*l and replace pe(p*.. In addition. in ueU. (6). 1) . o:0.p. l ]. Note that T (k . L ( iii) T -' (1 .i: 0. l . note lhat R+ BVp(0) is independent p. t ) p<T( l. It is essentially Corollary5. 5.ef 24 (the maps are slightly differentin this model for the reasons . p.0): T( 1. l than zer o.u) hasf ixedpoint szl: I and zL: @Q . tr.

To the best of our knowledge. is constant. for by the defi ni tion t he Eis. from Lem m a . I ). p. 6. are convex. l rirl Since"I* I is the smallestintegern for which T-'(p*)<0. te N . l]. for the length of all the line segments. JO TA: L. 116).however). Also. i>1.1(vi)and the continuityof I(. (8) length of E1 is strictly greater than zero. Hence. Next.r-t.11. In addition.. NO .a .J+ 1. with T(p)e E1.1 (v i i ).but he did not considerthe three-action model.2. 6.we just showedthat Vp(') is affine in Er. 2 l E j l < l E j * tl . is piecewise linear. These j>1..)s affi ne. Wang'swork (Ref. that the restriction of Vp( )to.5 .7 lN o. VB( ). sinceby Lemma3.2in Ref. see a Ref .o-t.Then.1. When p*: l. Although his results can be used to show the piecewise linearity of the optimal cost function.replace) CU modei.. restricrion the of the optimal cost function to pe(. it is alwaysoptimal to producefor 0 <p < 9. N> 2. 17) showed the piecewise linearity of the optimal cost function associated with a stoppingtime model (the proof is not clear. 5. This lower bound implies an upper bound on the number of line segments describing the cost function. In a later work CL (Ref. Note that *t 1pye1p*. 5. 1] . T. Proposition3.0) ". ) and ?". greaterthan zero.for peE1. Thus.4a. O CTO BER 1991 113 we assume that the optimal policy is a control-limit policy.) is affinetoo. ow . it since is optimal to produce. .v 1 011. it follows from the definition of the d's that I is the least upper bound for the length of the 4's. 113. Let J+ 1 be the number of line segments describingVp(p)p. However. there is a uniform lower bound.p+.( r * "( p*) . 9) was aimedat showingthat controllimit policies were optimal for: the two-action CU case. From (7) and for p*<1. p. it is clear that Vp(p)1o-.r1.the E.).We want to characterize Vp(p)no.Wangextended results the N-dimensional his to case. we assume t hatd < p * < L Following Sawaki's notation (Ref..r-t to each E1 is affine. i>1. i > 2. Remark3.9. l .The sameapproachfollowed herecan be used to obtain the resultsof Ref. one obtainsthat T-" { p *1= I .Let E) denotethe length of subintervail. Furthermore. Wang studieda more generalmodel for the two-action CU problem than the one treated here. of [ 0. Continuing recursively all E1 one obtains for . Hence. 0.p . Monahan (Ref. s i n c e VB (. (5 ).l. E t* . ) are strictly monotonefor pe [0.3. 17.disjoint intervals.1. (e) and sincethe maps ?"(.e.the piecewise linearity of the optimal cost function associated with the infinite-horizonreplacement models described above has not been reported previously (see. T. However.from Lemma 2. and so the orderingrelation(9) is not necessarily satisfied i: 1.'( . For the two-action (produce.From the definitionin (8) and the propertiesof Z(p).. consider peEz.1 ): T (p )e E. Remark3. Wang did not do so. r'e): re): p(r e)+ 0 .t ( p*) 1. Remark3. T ' (p )e (p * . Theorem3. I99l VO 71.But peEl impiiesT(p)e(p*. p * ). the i .Wang also gave analyticalexpressions computingthe optimal cost and the optimal policy for for this problem. l et 0< p* < l . 1l i .Er *r also has posit ive of measure. ocToB E R . i N we have that V p (p ): C p + F V B(r(i l ). i: 1. 10). by deflningrecursively T -'(P) = T .: lpel}. roQ)=p. Below. 5 9 1i n R e f. ]. 25 and 26).and E q. ll. recent surveyslike Refs.. the infinite-horizon. since the unifying element of these problemsis the quality of the observations. p*l: Tr 1l) : lA. it is optimal to produceand V p (p ): C p + BV p (r@ )). for peE2. of E1 : { p e l } . t hus. and. Note that thesesubintervals satisfythe recursion E 1 : { p e [0 . and Vp( ) is affine. the resultfollowsfrom Ref. Proof.optimal cost it E.For peE r. Vp(f(')) is constant. ( l. define the subintervals (subsets) [0.( l pr ) . where T ' (p )= T (T ' -' (p ))...we will seethat thereare somecases which Ey+r satisfles for (9).t (T .112 J o T A :Vo L . const it ut e par t it ion. Vp(. we have that Ei.

formulasrepresents sinceobtaining Vp(p) with these a minimal computationaleffort compared ro that required for solving the problemwith.the optimal cost function is piecewise relation (9). O .replace)CL model is piecewise three-action linear.ll4 N R J OT AVOL .thus more structuredpoliciesto consider.the value-iteration algorithm. associated for linear. T' (0 ) }. l].and are more admissible p now also depends the observations on availableduring inspection.act ion he CU t model can still be appliedto eachof the structuredpoliciesavailablein this is case. T'(0).the optimal cost and policy will be found if the initial guessfor the degreeof (Ref.g. l):0 and appr oach f(1. 8.In addition. expression with respectto the parameters making them attractive for sensitivityanalyses of the model.sincethere actions.3. to be derivedin the next section. The piecewise linearity of the optimal cost function has allowed us to obtain also in this case formulasto computethe optimal costand the control However. Let p'' denotethe value of p deflningthe control-limit policy at iteration n.computationally comparisonof the differentcostscalcuthe latedhereis reasonable.This doesnot represent a problem though.Ref. we have also developed an analytical expressionto compute p*. the piecewise . 1).is that the intersection finitelytransient. and we just statethe resultin the next proposition. as n+:c) of the sequence partitions {E"}. (7). the optimal policy has in one of the four structures described Sectionl. Next. In thosecases.4. consequently. (10a) 6. the (r0 b ) it that algorithm from the theoryofcontraction mappings is guaranteed since (10) converges uniformly to the unique solution Vp(p) of (7) as n + co (see. i:1.inspect. Sincethe formulas give the exact solution and not just an approximation. Consider the three-actionCL case. Theorern3.3.given the data of the problem. while the othersarecostsassociated with particular (possiblynonstationary) policies. O CTO BER NO l99l 115 resultscould be taken as the startingpoint for extendingthe resultsstated CL below for the three-action model to hieher dimensions. sincewe can compute the optimal cost and the control policies.for each fixed set ol parametervalues. R + P V h-tG)i .1]" are finitelytransient. a Then. .computed assuminga policy structure different from the optimal mentionedbefore. Sincef(0.R } .1 is still valid in this caseand Et*t satisfies computedfrom the equations optimal cost function in this casecan also be derivedin the next section. they constitutean easyway to obtain an insight into the behavior of the processwith respectto. we do not presentit here for the sakeof brevity (seeRef. The piecewise algorithm (seeRefs. the by defined Sondik (Refs. expressions the optimal cost function is piecewise the optimal cost and policy. T(e ). p ' l : T ' 1 p )e (p ^ . . As will be seenbelow. 30). Proposition3.However. OC TOB E1991 : JO TA: YO L.2. theoptimalpolicyis not by Sondik.First note that the problem now is more complexthan that of the previoussection. e.I was obtained. 27). It is important to. optimal cost function for the (produce. wheneverp* does not to belong thesetof points{0. 71. seeRef.. and neither this nor that for the optimal cost involve an iterative procedure.28. We illustratethis in the examplesbelow. and linearity of the cost function follows. Tt(O). t he sam e usedin t he t wo. Sondik'sterminology. 8. linearity can also be obtained ftom analyzRemark 3. seealsoRef.If Sondik's algorithm is used in this case.only one of the costs computed is the optimal one (recall that the functional equation has a unique solution). for eachsuchn.p*. f"@)}. for all the structured the and select optimal one as the one that givesthe smallestcost. i=1. theoptimalpolicies as for "produceforpe[0. The Proposition3.l . of This wasin fact the way in which the result of Proposition3. sincethe partition E.292) is smaller than the actualnumberof line the approximation in segments the optimal cost function.uncertainties changesin the values of the parametersof the or model. p*] and replace pe(p*.. Example 4.which is the optimal policy structure. p.'. is the limiting partition (p'. as definedin (8). 29.3 linear: the proof of below).However. we flrst show how to obtain the formulas in the two-action CU case(the formulasfor the three-action CL caseare obtainedsimilarly).l. for example. and for eachn for which p" < 1 definethe subintervals Ei : { p e L \.4. V ' p (p ):mi n i c p + BV ' p ' Q @ D . 0. The infinite-horizon. for example. Lemma I and Corollary. Remark 3.from Ref. 1):1 for Qr : 1. 5. limits pr. the Ei constitute partition.2.we limits pi. . .observethat.7. in The reason.2.l l }. say. . have not found necessary and suffflcient conditions to state.3. CL Model. 5 and 3) usedto solveEq.7 1 . v te Q )= mi n { C P . and the the between set of points wherethe optimal polioy is discontinuous set of next valuesfor the information vector is never empty (cf. ing the value-iteration namelv.fiom Ref.i . f@). and then we will address possibilityof extendingtheseresultsto the PO case. definingthe structuredoptimal policies. Sincethe analysis similarto that of the previoussectlon. . sayE'. of [0. .5. i eN .' . . Theorem3. This comparisonis valid since.provide the sameresultsas thosethat can be obtainedby using the algorithm suggested Whenp* e {0. but lengthier.

(11) with Vp(l) given by Eq. we simply made taking into accountthe observations needto apply Eq.. p * 1 .I.then one obtains that t-l i-l is the generalexpression Vp(p)xrr.o-t given by Eq. . let as a VB()=Vp(p)xp-. Finally. .. one obtainsfrom (7) and (11) that Next.More precisely. In order to obtain the formulas to compute the optimal cost and policy for the two-action CU model.. sincethe ("r+ 1)th line segment specified [0. Proposition 3. the fact rhat n n| p*:[(i . in the previous sections. . OCTOBER Thus. NO.B)vp(l)l/ order to comof pute the solution to the problem in the two-action cu case.+ l betw fn adj acenti ne segments T h e i n te rse& i on f orpe E1 . Eq.r1.0) .l_g). from (7) and the definition of the 4's. 0 2 ) equaa peE2. ( 14).l \-l.0 ) k( t .' :.' .is .ii llrri ti l l E r: (a ' ._ll F'(p):Cpl"ft: 'L p(t -O)-l )l+ 'P ' Y ilt ) / '' B ' -if l-g )' -' -ll f r_--____ a gl g-B '-'-t _ Bl . j:o lrl<1. is in substituting(14) into this expression. (11). a .l ( 16) :cp L B t\-e)t+c L p ' 0 -(1 -a )' ) + B 2 v p 0 ). .I. and P'"'(' ) is ( | .3.. .. Therefore. and the line segments describingVp(p)no. .. -p )' ' (1 ' i. constant. a /].8".'-a ta\ f ." Note that vB 0)0-P ) :' c (1 -0 )' I -1 ----. givenby r'1. . noted above.J.namely 0 .*urdto obtain recursive for in tion for all the.r is the smallestintegerk for which the left-handside of ( 18) is less than d. p*].sinc. Vp(l) is computed using Eq. Vp(l) is computed by v p (r) :R + B P ' ( 0 ) . (11) Now.l ^^t^B +CPIP k .0) r _:__=__:l. E t* r:I0 . i: 1.B) .lf <0.:1. B -t B(t o)-r ' I t (14) r--! (l-0 )* . N O. that is. ir 'i. . . / (15) g .ing it inaulttion. (13) can be rewritten as . r +l. . (16) . J. .'l c atr . J. One nice characteristic (18) is that it depends of explicitly only on the parameters the problem. . . from Remark 3.l 'L p-l Bk-'(t-otu-'-tll P0-e )r------::--:1---. JOTA: VOL. we also have that a' :T-' ( p*) . F'(p):cp L po(t-0)o+F ' v B 1 )+ cI B o (r-(l-9 )k )... . we know thar integern suchthat T-"(p*)<0..0 ) . Then p*. C. (7) recursively. p e E r.1.. l .r1Observethat the only unknown rn for . .r . and at < 0 . we obtain from (7) that VB Q ): C p + BV p (l ).J + 1 .i :1 . (1 3 ) for i: 1.. fn. the value p that definesthis policy. Thus. obtain that we V n(l l : T(p)eE1._ tl . __f_ {^* ur r Bk r r .The are number of line segments J * 1 can be calculated from the following inequality: j= t L ort-t: I ari:(a(r"-\)lQ-1).to simplify the notation.. (16) i s.e) k .. o tl .we onlv have . OC TOBER9 9 l l 117 I 16 1991 71. .7 l .:1 q . we just comparePt(p1 and Zp(l). is given by Eq. Assume that there is an optimal stationary control lirnit policy associated with the two-action CU model. L'.. Formulasfor the CU Model.r C( r .Then. q .truigrrli. by (8).u.rln. p . The aboveresultsare "/+l is the smallest summarizedin the next proposition./+ I line segments [0. (18) p(l-e\-r I ( l.B 0 . Therefore.IOTA:VOL . for peE2. (16).e Vp(p) : Cp+ p vp(T(p)) : Cp+ gICr( p) + FIt p(1)\ ll c fn 5A ^^B'( t. To find the critical valuep*. rvhere"/* I is the smallestinteger k in (18) for which the left-hand side is lessthan zero. R. If we denote bV F(p) the lth line segment.B r + r r . .o\r .

l ). 1.5369462.1 / (Z .00000p159.0. p*.l(iv) and (v). k :0 . 0. Now. + 153. note that there are sevenline segments describingZp(p) for pe (0.5369462.54959 p e (0. the maps (k.1783 0. pe (0.0i 2<5 < l. 42r 01 161.1 ( r e) + 7.4 0 < l l Q .4605770.s6342r + 4. 0: 0.l. CL Note on the PO Case.e). 4. p e (0.computeVp(l) andp* with (16) and (17). p): 10. 0. 9: 0. p e (0.0)> 0.p e [0 .5 . J OT AVOL .563 5.57 r68p 15'l + . C: 4. 9). T-'(1 . is of interestto seesomeof it the complicationsthat arisewhen working directly (analytically)with the PO problem. R: 10. OC TOB E R : VO 71.In this case. observe that. 0. As mentioned Remark 3.0864824].607 07931.p-1 . respectively. (17) and (18) with right-handside 0 in yield.5 I 93. as Wang states(Ref. and two line segmentsdescribing Vp(p) for pe(0.For 1>qo2l/(2.optimal cost function has only one line segto ment. for 5 lf qo:6 /(2 . the fixed point z! is lessthan 1.I sati sfy one mi ght be tempted i nferthat to T ( k . Example4.6070793.0 )> p .p*.5145193J and p e (0. I : 5. 6073388. one of the motivations to is studythe modelsof the previoussection that theyprovideupperand lower boundsfor the optimal cost in the PO problem.607 931. for p* <7.1.46057701. 0. 1 00000001.0864824.i ilill I rl' .respectively. it also can be interpretedas the or numberof periodsafter which it is optimal to replacea brand new machine. in followingthe approachsuggested this paper.Another reasonfor studying thesetwo casesarisesfrom the difficulty of dealing directly with the PO problem.0 ).10005. p e (0.optimal a result. 0.d ).0.3.88782."The associated optimal cost is given by 23. The stationarystructuredoptimal policy is: "produce lor pe [0. 145 Qt) (r-p)lil-. then w e havethat I.0).5< qs<1/Q.r r e> ( r .the sameexpressions obtainedby Wang in Ref. Beforewe presentsomeexamples.44782. s i n c e /(2 . As we stated above. p e (0.6070793.334045'71.51451931.7r230p+ vp( 155.15262p+ 152. after somealgebra. 9.17 20. p . 56. and the term in brackets ( l9) becomes in (20) rl . since in this casethe policy calls for never replacingthe 783421.32544. 0).0) l p-a L\40 / 'rq. one attempts ing 0.514 5193.0000000. 0. Considerthe three-actionCL model with the following data. The srructured optimal policy now changesfrom four to two regions. 0 < d < l .53694621. Observe that Eq. NO . 1.0. changeg from 0.81182p+ 152.2600507.0. 0. 0.Also. -/* I can be interpreted as the number of line segmentsdescribing Vp(p)rc.o*t. pe (0. 6: 0. p e (0.4006411. B :0985. corresponding VB(p)no. 53694621.5. ( 20)is only sat isf ied p<0. 51.0. 18.1 ro 0. f or As linearity of the infinite-horizon.(19) is only satisfied somecombinationof valuesof qs andp.l ..1. and replaceforpe(0. .54600p15'7 . fo r 0 . However. or after somealgebra. linearfor all qssatisfyoptimal costfunctionis piecewise the infinite-horizon. / -.01965. p e [0.83191. to solve intervalsusing (14) and and their associated and computethe line segments (15). Considerthe two-action CU problem. From Lemrna 2. O CTO BER 1991 119 for"Iin (18). 501 152.5<qo<l /(2 . Examples illustrate some of the ideaspresented The following examples above.A similar interpretationcan be made for the three-action problem. Unfortunately linearityof the costfunction for 0.4616r. 0.5369462.0).6070793). k:0.)'-ir-r) ! ] .88782. which complicatesthe problem even further. If. Corollary 2. one finds that to havepiecewise iinearityof the cost function it is requiredthat Z(0.96460p 40457. 21.56075. forexarnpl e. optimal cost funcFinally.' . 0.l . | l).82985.40064 13 + . If this is not the case.45950.33 15. using to provepiecewise the sameapproachusedin Proposition3.2600 l.11. we ha v e th a t.ll8 1991 . the infinite-horizon.0).6 ).inspect for p e (0.5< q6. 1. 98. the infinite-horizon In tion for the two-actionCU problemhas -I*2line segments. 0.51451931. 0. this case.0 ). If p*: l.33618p+ 151.09230p+ 7 42.49645p| 54.0)> that f -'(0. JO TA: L. p e (0. H e n ce. (21) is a closed-lbrmformula for the seems that the piecewise cost cannot be established completegenerality in for 0.7 iN O.

09.8691. if d doesnot change.52171.65825. The optimal policy the to "producefor p e [0. and its determinationis equivalentto solving for p* for each y.0p+151. 36641.e.yet fail to be optimal for the infinite-horizonproblem. Thus.N O. consider casewhen I changes I :0.08713. .26065.577379 I can solvethe whole problem again.401201.e) + 4. 18.4962. satisfactory systemcontinuesto perform in a preselected This examplealso revealsthe following.9233 .00000p+ 151.178461.97982. pe [0. To the bestof our knowledge.41 + t8. 0. pe (0.Thus. Unlessone knows in advance obtain by using the value-iteration the structure of the stationary.097 p+ 22. 0. a particular structured policy may occur at any time during the iterative procedure. 20.0.3210p+ 151. | 6r.57731911. case.iliil . 1 0.00000. 0. 15. ?"( 0)T'( 0) . .56355.967'77.5458p157. . 0.r+ 1) is not knownrin advance.38101931. t53.33463).57729121.9233. . 0. (23) 13.iorthe number of line segments describingthe cost function. policy structureswhich are not optimal for somefinite horizon cannot be eliminatedas suboptimal. | 4p 256.33463.48742211. pe(0.0562. I .evenafter several of the grid have been tested (with the correspondingtime consumption involved). Thus.89336p+ 141.00000001. 0.0.65825 specified (23). 31.607 . w .40120.607211.Of course. pe(0.461111.2261 rp 0. p e(0.08713]. p e(0.51502.0.097 3664.0.0080949]. . .7959p+ 152.0000000.Sincefor this problem modify the value of some of the parametersto designed for changein some other parameters)so that the compensate an undesired way. As shown above.2525291.487 4221.5773791.changes any of the other parameters the model may change in of p* and. 4. 24. Considerthe two-action CU case.14156.592'7. 0.0.5772912.9666r. + 7.. . p e (0.[0.R.46111. .0. + 7. 0. However.436127 61.5'7'1 121. with formula Also. one might be temptedto solve (7) for the valuesof the cost function at the only values of p which can occur in the model when starti ng i th a br and new m achine( nam ely{0.481 4221]andp e (0.380193. 13. knowing the value of the optimal cost function for countably many valuesof pe [0.57729| 2.69048p149.9452p+153. + 155. pe(0. vp(P): Q2) pe(0. However.0000p+ 159. C).577379 1. l] doesnot give enough information to make a statement about the sensitivity the optimal value p* with respect the paramof to etersof the problem." The optimal cost 29 is givenby The structured optimal policy for this example still has four regions if d to changes any value in [0. in in changes the optimal policy due to suchsmall changes the parameters on of the model could not be studiedbefore.0754p+ 152. 21.000001 2l .4753p+154.09.infinite-horizon optimal policy. note that the number of line for in segments eachofthe "produce" regionschanges differentvaluesof 9. pe(0.7 p + 146.7 17 + 144. As is mentionedin Ref. pe (0. The optimal cost is given by 23. p e (0. O C T O B E R 9 9 1 7 JOTA:VOL .4945. 1. pe(0. pe(0. adaptive policies can be (for example. thus.0. f '( 0) }) .17860341. inspectfor is : p e (0. 11. 10.0080949.We are able to study smallchanges the parameters in of the model because the analyticexpressions of found as a consequence of the piecewise linearity of the optimal cost. the infinite-horizon It is clear that the same kind of analysis carried out here can be done of for any of the other parameters the problem (i.252s29 tl. seeRef.but vp(P): 25. l .4367276.In fact.6890p 0.6017. 10.260651. The results were very difficult to algorithm. pe(0.l0).P.33760p+ 145.31980151.(p): p ( t . Note how a relativelysmall changein the valueof 0 resultedin a significant qhange the structureof the optimal policy. pe(0. the number o[points for which Vp(p) has to be evaluated(that is. 0. p € (0.. t6. OC TOBER 9 9 1 1 121 p*:0. pe(0.4874221.515021.1 6427 1281. note the sizeof the interval for which the line segment in resultswith those is 4.replacef or p e (0. it is very dfficult to decidewhen the optimal policy has been reached. l . 1 .56497 144.p. I.and henceto choices decidewhen to stop the computationalprocedure. the equationsfound in this work can be usedto obtain an insight into the way the systemrespondsto uncertainties.17846. estima' tion of the minimum number of iterations [for examplein algorithm (10)].0. 31.8774. 71. pe(0.60'121.'73232.3632. 20. 29. that a flnite-horizonoptimal policy is also optimal for requiredto guarantee remainsan outstandingproblem.1 20 1 NO J O TA: VO L.3198015. 0. We comparedthese given by the value-iterationalgorithm.1346p+ 152. 4.because necessity discretizthe of in ation oftendoesnot permit high confidence the resultsobtainedby following the DP algorithm. Now. + 143.9677'1.71507p+ 143.0973664.alue the parametersof the model. small of changes the parameters the model changethe optimal cost function in of and p*.64421p14499467 p e(0.

15557 . are algorithm might not be attractiveif real applications being considered. pe(0.13. pe (0.10000. p e (0.valuesof p very closeto I are questionable.597167.56953.4931p 17 . Example4. Tt(0) ) ).31601.namely: vp( p'l: Example4. Theorem5.2154. C.24002J.Let B :0 . . pe 10. criterionis the average mance 18. 0) ] . flor the samecomputer and computerload).8927p 11.1 7 .528. and Us ing(1 6 )a n d (1 7 ). * :0.0315. pe(0. 1].5608.9141.0000p p e [0. R :10.7120p+ 18.0. pe(0. p e(0. 16. when working with the discounted and cost.and sincethe immediatecostsare the not averaged. 15.the expressions problem for which the perforfor equivalentexpressions the replacement cost (seeRef. p * :0 . I : .2400p 22. there are 1l line segments describingVp(p)no.0. 15.271001.24002. pe(0.9 9 9 9 . the expressions although the valuesobtained may sufficewhen solving the initial control and analysis would be not only expensive problem.316011.9141.0 0 0 . might be impracticalto consider sets are illustrates.63 4.0)+T(1.In addition.8759.651321.343901. pe(0. 0.0. P *:0.569s31. t8.0 7 7 .0.1067p+ 18.w e o b ta i nVe 0 )= . +24.7 l .9794p+ 17. vBQ)-.0.p.1 .9 and leave0. The optimalpolicyis not finitelytransient.s961ll.636s01.and we are dealing with the infiniteof the horizon problem). p* gi ves ri se to two dif f er entpoint s I I .3.1604.551241.6'1 r n :1 5 . 5.8927+ Il .468561. OC TOB E 99l JOTA:VOL 71.2224. 2851.5954. The latter is given by t23 all it if many parameters uncertain.32.2365p+ 17. 0 :0 . Take B = 0.8644p . + 20.r ( 1. solving the problem again using the DP if only one parameterchanges. Obsen'e that this particular examplesaysthat.9902p 19 + .1).00000.0.0815. l.8644p 20.27100.0. consideragain the two-actionCU problem. i' ri ! .9 .1028. 75].If one usesalgorithm(10).ut. developed hand.the immediatecostsdo not at really get discounted the initial times.0. .52t70.0617 i 55571.5925. p e(0. C:4. .67285 and VBQ):26. and p* :0.4476. pe(A. 13. V p(I):18.122 lR N J O T A :VOL .7 . s l ast casetook 52 minutesof CPU (comparedto lessthan a secondwhen using derived here.601.4931p+ 17. To further illustrate the complexity of the strictly PO case.However. a changeof optimality In criterion should be considered. pe(0. . Vp (l ):2272. 9.190001.1515p+ 17.46856. + 7. p 11 ." /:10. Now.for n: 10. + l7. Using the formulas presentedhere. 0.O. 0 :0 .521701.55124.9794p + 16.:+:10.612s81.19000.9849 9.5046.R as in Example 4. considerthe two-action CU case. pe(0.2442. with a grid of 1001points in the interval [0.0.5961l.l l i i l ltLlll T C f or t h e tw o -a c ti o n U m o d e l . one obt ai n s fo r n :1 0 0 0 . 1.000. we obtain p*:0.9441. : P Thi V ^1) :1 5 . 0.but also hard to perform in the sense determining actual effect of the uncertainties on the optimal cost and policy.6 0 9 . resultsobtainedmight not be very useful. C :4 .since ?n(0.0.0. If 4e changes from 0. . ( 0.000001..7'742. Q4) p e (0. pe (0.00000.882. OCTOBER 1991 optimal cost is piecewise linear.1301. 18. + 4. p*.409511.1 . 1l .I.588. 16.44s971.68619.2365p 17 597 + .0).501371. p.44597. Let fr:0 .2400p 22.68619:Tr(0 ).38441.61258. f@ ).686191. p*.384411.I5l5p 17.0. pe(0.0. + 17 . pe(0.2227.10. + 7. by letting B + 1. Example4. On the other herecan be usedto obtain. Qs) p e(0. Z. the setof pointswherethe optimalpolicy is discontinuous contained in the set of next states for the information vector but (e.40951.0000p 24.1067p+ 17. the following example as of potentialchanges.62 p* :0. is apparentthat a sensitivity in termsof computertime (this is so for any computationalalgorithm since p takes uncountably many values. + 18.9902p+ 19.10000]. even Furthermore. 0. 27. the + 16.4450.0.28.366. 0) . 18.5 to 0.0. 4.65132.9388.50137.5005. + p e [0. 0. p e(0. R :10.NO.7120p can be verified that the optimal cost function is still piecewise linear. p e ( pe(0.0. h e n . That {{I/(9)}) is is. which .2.

Journal olMathematical Analysisand Applications. pp.N. 26. 10. 15.p* . problem is ated with a two-dimensional... 24. V o l . E. ManagementScience.qs. 1978.esses: Theory. 1024. ?' -' ( Subjectto Partial Obseruation. Vol. New Jersey. P. p* . IEEE Transactionson Automatic control. 0 ). 1 9 8 5 ..Prentice-Hall. J. D.124 l 99l J O T A :VOL . L. 37. SciManagement Ross.1989.1980. 1971. 1J. The Optimal Control of Partiallt' Obseruable Markou Decision ouer Processes the InfiniteHorizon: Di. e nc e. AdaptiueContol of Discrete DiscountedMarkou Decision Chains. Graduate School of Business. pp_ 1_16. l . Por-rocrc.340-348. Journal of Applied probability. Lovrrov. New york. 3. partially Obseraed Research. D. 13. Journal of Applied Probability.M. E. 843 852. 7.. 26. 13. p p.Z35243. ResearchPaper No. 46 .AccountingReview.ncus.79l-797. E.. Journal of Optimization Theory and Applicat i o n s . 29. A.1 9 8 2 . C. S. This in turn of piecewise cases the observation linearfor two special allowed us to develop formulas to compute the optimal cost and policy.. i9 (seeRef. Dynamic Programming. and the PO case. 0). Bentsexrs. Texas.I.. 19'77. 14. 1967.2s9-273.454-465. Stochastic OptimalControl.1988.pp. 1987. 1 9 8 1 . t l2 .G.. Frnraxoez-Gnuclren. 1971.Management Science. pp. R. 25.596. 'illlii'i I i 8. Journal of the Operational ResearchSociety... pp.T.partially observedreplacement quality. S.l lE.tsing hnperfecr ldormation.' ( 0..EnglewoodCliffs. | 1.scounted cosrs.. HenxaNorz-LEntre. 5. Jecoes. pp.pp. dingsofthe 27th conferenceon Decisionand contror.5. 587..In of abovecan be usedto obtain resultssimilar to addition. Vp(P)o. 1976. Journal of Mathematical Analysis and Applications. O. 1987. pp. R. SoNorr.(1 . 71. On the OptimalSolution the One-Armed of Bandit Adaptiue Control Problem. 56 68. C. K. Optimal Inspectionand Repatr of a Production ProcessSubject to Deterioration. C. p*. MoNasau. pp. R.5.V o l . 5.I.. 17. 0). pp.1987.p p ... 174. OptitnalInternal Audit Timing.O . 27. 16. 17. M. 91. above. into Piet'ewise V ol. HenNaNorz-Lenvn.1911 . C. for StandbySystems. 28.Naval 19"19.concerningwhether..we expectthat the knowledgegainedwill enablethe solution of here.table for 1979.the inspection standby thoseshownin this work in other applications. Markou Decision Ar-enrcHr. C. p p . StanfordUniversity. and Mancus. Vol...2 -' (0.Hucnrs. Optimal Stopping in a Partially ObseruableBinary-Valued Markou chain with Costly Perfect Information.Models. Asrnov. Wnrrr. with the kind of modelsconsidered associated severalopen questions like the one posedin Ref. 0 ). 3.Journal of Applied Probability. Systems..Yol. C.. pp. P. and Mlncus. 21.2 2 7 2 3 5 . 19. T . T'heOptimal Control oJ'Pufiially Obseruable Stanford University. OC TOB E R l J OTA : V OL.Vol.a. Markoo Processes. On TheAdaptiuecontrol of a partiallv obseruable proceeMarkou Decision process. D. Austin. pp. WaNc. and ScHrnsn. p p . 20.1. OptimalReplacenrent Policyt1ei1fu Unobseruable Srales. 33. C. W. p. and so on. This was illustrated with severalexamples. Engineering PhD Thesis.trurs.19g2. Wexc. and SerowaN. p.7 2 8 1 . A. Yol.l l 7 G1 l 8 4 .T. 1978. A Surueyof parrially Obseruable Markou Decision pro. Vol. Processes. 559.ot is describedby : l'. OC TOB E R l 99l 125 in turn give rise to four different points [?" '(0. This is the set of structuredpoliciesremains the sameconsidered study.and Algorithms. Computationally FeasibleBounds Partially Obserued fuIarkou for DecisionProcesses. Annposr. Conclusions optimal cost function associWe haveshown that the infinite-horizon. P. 10.ManagementSciencc. Journal of OptimizationTheory and Applications.rNp.5005. 0)1.241. meaning that. t r .282-304. Optimal Control of Markou Processes Information.0). L. ControlunderMarkouianDeterioration. Soruorx.Vol.826-832. Qualitl. currentlyunder References State with Incomplete K.... Optimat Inspection policies Communicationsin Statistics-stochasticModels. 23. 12.. S. 9. Vol. S. SolutionProceduresfor Markou Decision Processes.. in Ref. Vol. 30).. A Markau Quality Contol Proce. pp. ComputingOptimal Control policies-Two Actions. 18. New york. the ideaspresented of e. C. V o l .g..421-424. J.cHA.operations Research. Vol. r' ( 1. 23. 22. units considered ofthese resultsto the PO casehas not yet been Although the extension obtained. and Geven. Transformationof Partially Obseruahle Linear Ones. C. C. 2g. AcademicPress. Structural Results Partially Obserr. 2 6 . 52. J. Boundson the Optimal Costfor a Replatement LogisticsQuarterly. J.pp. C.Departmcntof E[ectrical l9 il. Research Partinl Obseruations. C. T-tQ. pp. 0).. r20412i0..5. Vol.Voi.. Vol.Vol. 1041-1053. 1 9 . OperationsResearch.205. N O. WHrrE. W . O.TheDiscrere Time Ca. T -' (1 . pp. 1965. G. P..7N. E. and M.p * . 198. l . lvlarkou DecisionProcesses Sewarr.0:o Z' 20+l iine segments.E. 52. Tnov. vol. with respectto the making thesemodels suitable for sensitivityanalyses parameters the model. J-5. Hucnas. MoN. Problenrwith Wstrr. .0).227. 1988. t978.. Minimum-Cost Checking{. Yol. Operations Research. pp. A Noteon Quality ControlunderMarkouianDeterioration. 3.5.415-422. E. Brnrsexes. for 4o=0.. AdaptiueControlof Markou processes witlr Incompletestate Infarmation and unknown parameters. I. WHtre. Kuvan.

935-944. Illinois. C... 23-52. 30. J . pp. Partiltlly Obseruable Markou Decision Processes with an Auerage Cost Criterion.E. 29. and Comon puting. Markou Decision Processes. L. K.. 1980. J.. S.. L. New York.. ANonryrNov. 555-561. 4. S. Florida. and WHrre. pp. J. 39. J I ti il . and Mancus. and Uvwov. Vol. Joumal of the Operations Research Society ofJapan. AnaRosu'rnrs. and Mancus. pp. A. Optinal Contolfor Partially Obseruable Markou Decision Processes ouer an Infinite Horizor?. and Scnwetrzen.126 1991 JOTA:VOL.. Dynamic Programming and Its Applications. C... A. Automation and Remote Control.N. 1989.471-512. Kocl.NO.Control.. S. BayesianDecisionProblemsand Markou Chains.. V. On E. Proceedings the 28th Conference Decision and Control.21.. 1991.Proceedings of the 2?th Annual Allerton Conference Communication.1989. FenN. Annals of Vol. New York. New York. G. Wnrre. A.. and lcrrxewa. Edited by M. 29. Senurrc. I. A. Vol. and Marcus. pp.. Optimal Control of a Partinlly Obseruable Disuete Markou Process. Discounted and Undiscounted Value Iteration in Markors Decision Problems: A Suruey. 32.71. l. Snwert. 1989. European Journal ofOperational Research. Mrn rrN. JJ.1-15. 4. Puterman. P.1979.John Wiley and Sons. 1261-1272. Comments the Sensitioity the Optimal on of Cost and the Optimal Policyfor a Discrete Markou DecisionProcess. J . Academic Press. pp.. 28. 2't. pp. of on pp. L. E. 1.New York. Tampa. Vol. l-16. D. Monticello. SenNrx. 1967.rNnez-GAUcHERAND. OCTOBER 26. L. On the Compulationof the Optimal Cost Functionfor Discrete Time Markou Models with Partial Obseroations. 31. A. 1978. FpoencRusN. OperationsResearch.