A STOCHASTIC MODEL OF COMPUTER-HUMAN INTERACTION FOR
LEARNING DIALOGUE STRATEGIES
Esther Levin and Roberto Pieraccini
AT&T Labs-Research, 180 Park Avenue, Florham Park, NJ 07932-0971, USA
{esther|roberto}@research.att.com
ABSTRACT

Recent progress in the field of spoken natural language understanding has expanded the scope of spoken language systems to include mixed initiative dialogue. Currently there are no agreed-upon theoretical foundations for the design of such systems. In this work we propose a stochastic model of computer-human interactions. This model can be used for learning and adaptation of the dialogue strategy and for objective evaluation.

1. INTRODUCTION

Man-machine interactions, from simple touch tone menus to more complex speech based systems, are ubiquitous in today's world. In the natural language and AI research communities there are even more complex examples of dialogue systems that make use of natural ways of communication like speech and language [1, 2, 3, 6]. In this paper we show that such dialogue systems can be formally described in terms of their state space, action set and strategy. The state of a dialogue system represents all the knowledge the system has at a certain time about the internal and external resources it interacts with (e.g. remote databases or machinery, user input, etc.). The action set of the dialogue system includes all possible actions it can perform, including different interactions with the user (e.g. asking the user for input, providing the user some output, confirmations, etc.), interactions with other external resources (e.g. querying a database), and internal processing. The dialogue strategy specifies, for each state reached, the next action to be invoked. The strategy is usually devised by a designer who tries to predict all possible situations the dialogue system can get into (in terms of conditions on the dialogue state) and the appropriate actions to take in those situations. There exist no scientifically motivated guiding principles for the design of the strategy, and therefore this design process can be considered an art rather than engineering or science. As a result, today there are no known methods for objective evaluation and comparison of dialogue systems, even those designed for the same application. There are no methods for automating the design of the strategy, nor for automatic adaptation of the dialogue strategy in the presence of interactions with users and their feedback.

Here we propose to state the problem of dialogue strategy design as an optimization problem. We assume that there is an implicit objective function (expected dialogue cost) that drives the design of the dialogue system. This objective function can be written as a sum of different terms, each representing the cost of a particular dialogue dimension. Some of these dimensions can be measured directly by the system, like dialogue duration, cost of internal processing, cost of accessing external databases or other resources, and cost of ineffectiveness (e.g. number of errors the system made due to poor speech recognition); others quantify such abstract dimensions as user satisfaction (e.g. a simple happy/not-happy-with-the-system feedback from the user at the end of the dialogue, number of users hanging up before the completion of the dialogue goal, etc.). The actions taken by the system may affect some or all of the terms of the objective function, and therefore an optimal strategy is the result of a correct trade-off between them. We illustrate this formalization with two examples. One is a simple form filling application for which an optimal strategy is derived analytically; it quantifies a reasonable strategy adopted in real systems [5]. The other example refers to our research database retrieval dialogue system [6]. Due to the complexity of the system, the optimal strategy cannot be derived analytically in this case. Hence we propose to represent the dialogue system as a stochastic model known as a Markov Decision Process (MDP) and to use reinforcement learning algorithms [4] to find the optimal strategy automatically.

2. FORMALIZATION

Here we introduce a formal model that describes, without loss of generality, any man-machine dialogue system in terms of state space, action set and strategy.

s_t - the state of the dialogue system at time t, which includes the currently available information about internal and external processes controlled by the dialogue system.

S - the space of the system states (finite or infinite). It includes two special states: s_I, an initial state, and s_F, a final state.

a_t - the action performed by the system at time t. The next state of the dialogue, s_{t+1}, depends on the current state s_t and the current action a_t. This dependence is in general not deterministic.

A - the set of all system actions (usually finite).

Finally, the strategy of the system is described in terms of a mapping between the state space S and the action set A. The strategy of a dialogue system specifies conditions on the current state upon which a certain action is invoked.

To illustrate these concepts we consider a very simple form filling application: the goal of the system consists in filling all the slots of a form by asking the user the appropriate questions. We assume that the user will always answer the system's questions obediently.

State space: A state s of the system is represented by k_1, ..., k_N, where N is the number of slots to be filled and

k_i = 1 if slot i is filled, 0 otherwise. (1)

S consists, in this case, of 2^N states.

Action set: A = {A_0, A_1, ..., A_N}, where A_i, i = 1, ..., N represents a question about the i-th slot. Generally, an action corresponding to a system question requires the use of a speech recognizer, an understanding system, or other devices for different input modalities, in order to collect the answer from the user. A_0 corresponds to the action of ending the dialogue and submitting the form.
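To make this formalization concrete, the following sketch (our illustration, not code from the paper; N = 3 and all identifiers are assumed names) encodes the state space, the action set, the deterministic slot-filling transitions, and one strategy, i.e. a mapping from S to A that asks about the first unfilled slot and submits when the form is complete:

```python
from itertools import product

N = 3  # number of form slots (illustrative value)

# State space S: each state is a tuple (k_1, ..., k_N), k_i = 1 iff slot i is filled.
S = list(product((0, 1), repeat=N))          # 2**N states
s_I = (0,) * N                               # initial state: no slot filled
s_F = "FINAL"                                # final state, reached after submitting

# Action set A: A_0 submits the form; A_i (i = 1..N) asks about slot i.
A = ["A0"] + [f"A{i}" for i in range(1, N + 1)]

def transition(s, a):
    """Deterministic state transition: A_i fills slot i, A_0 ends the dialogue."""
    if a == "A0":
        return s_F
    i = int(a[1:])                            # index of the slot asked about
    return s[:i - 1] + (1,) + s[i:]

def strategy(s):
    """A strategy is a mapping S -> A: ask about the first unfilled slot,
    submit the form once every slot is filled."""
    for i, k in enumerate(s, start=1):
        if k == 0:
            return f"A{i}"
    return "A0"

# Generate one dialogue session from s_I to s_F.
s, path = s_I, []
while s != s_F:
    a = strategy(s)
    path.append(a)
    s = transition(s, a)
print(path)  # one question per slot, then the submit action A_0
```

A session under this strategy asks each of the N questions exactly once and then submits, which is the common-sense behavior the paper formalizes.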
State transitions: In this system the state transitions are deterministic: the system starts in the initial state, where k_i = 0, i = 1, ..., N; an action A_i, i = 1, ..., N deterministically changes the current state to the next one, for which k_i = 1; A_0 brings the system to the final state.

Strategy: Common sense suggests the following reasonable strategy for this system: in each state ask a question about one of the slots that is not filled yet, and submit the form when all the slots are filled.

3. A QUANTITATIVE MODEL

As explained in the previous section, the system design for a new application consists of determining the state space S and the action set A, and finding a good strategy, which is usually the most time consuming part of the design. Since the state space in most applications is very big or even infinite, strategy design involves an iterative process that consists in testing the system, finding states in which the current strategy results in an unreasonable action, correcting it, and so on, until a reasonably stable strategy is found. In what follows we propose a model for quantifying the concept of a good strategy. The advantages of this quantitative model include the possibility of objective evaluation and an automatic way of designing an optimal strategy, or learning it from real or simulated data.

3.1. Markov Decision Process

The quantitative dialogue model relies on the description of the dialogue system in terms of a Markov Decision Process (MDP) [4]. Formally, a Markov Decision Process is a stochastic model that consists of:

- a state space S, including initial and final states s_I and s_F. At each discrete time step only one state is active, the active state at t = 0 being the initial state s_I;
- a set of actions A;
- transition probabilities P_T(s_t | s_{t-1}, a_{t-1}), describing the probability of the next active state given the previous one and the previous action. The Markovian property of this model assumes that:

P(s_t | s_{t-1}, a_{t-1}, ..., s_0, a_0) = P_T(s_t | s_{t-1}, a_{t-1}). (2)

Each action in a given state is associated with a reinforcement (cost or reward) c ∈ ℝ. An MDP is characterized by cost probabilities P_C(c_t = C | s_t = S, a_t = A), describing the probability of getting a reinforcement C when executing an action A in state S.

In a generative mode, the system starts at state s_I, chooses an action from the possible set of actions A, receives a reinforcement signal c drawn randomly from the appropriate cost probability P_C, and reaches the next state according to P_T. This process continues until the system reaches the final state s_F. Assuming that the system reached a final state, the cost of such a session (a path through the state space) is the sum of all the costs involved:

C_D = Σ_{t=0}^{T_F} c_t, (3)

where s_{T_F} = s_F. The expected cost C_D is the expectation of the session cost with respect to the two probabilities P_T and P_C, and it depends on the particular sequence of actions chosen during the session.

A strategy of an MDP is a mapping between states and actions, indicating which action the system should take in each state. An optimal strategy is a strategy that minimizes the expected cost C_D.

3.2. Dialogue system as an MDP

Using an MDP for modeling man-machine dialogue requires the following assumptions:

- The transitions between the states of the dialogue are characterized by a stochastic Markov process P_T(s_t | s_{t-1}, a_{t-1}).
- There are costs c associated to dialogue actions in each state of the dialogue, distributed according to P_C(c_t = C | s_t = S, a_t = A).
- The optimal strategy for a dialogue system results from the minimization of an expected dialogue session cost.

With these assumptions we pose the problem of finding a good strategy for a dialogue session as the optimization problem of minimizing the expected cost of a dialogue session. In general, the expected cost can be written as a sum of multiple terms,

C_D = Σ_i λ_i C_i, (4)

where the terms C_i represent the expected values of important dimensions of the quality of a dialogue session, i.e., the duration of the dialogue, the number of errors in recognition or understanding of the user, the cost of accessing external databases, the distance from the user goal, etc., and the λ_i are positive weights. The λ's measure the actual cost per unit of the respective dimensions, and control their relative importance.

4. EXAMPLE 1: SIMPLE FORM FILLING

In this example we assume that the user input is noisy (e.g. due to errors introduced by a speech recognizer), and the goal of the system consists in filling the form correctly. We assume that the error rate p of user answers depends on the type of question the system asked.

State space: as in the example in section 2.

Action set: A = {A_0, A_1, ..., A_N, A_{1,2}, ..., A_{N,N-1}}, where A_0 corresponds to the action of ending the dialogue and submitting the form; A_i, i = 1, ..., N represents a question about the i-th slot; A_{i,j}, i, j = 1, ..., N is a compound question about the i-th and j-th slots (an example of a compound question is asking the user for a date, instead of asking separately for month and day number). We assume that the error rate for the user's answer is p_1 for single-slot questions A_i, i = 1, ..., N, and p_2 > p_1 for compound questions.

State transitions: Each question modifies the current state similarly to the example of section 2, by deterministically changing the values of k_i to 1 for those entries i the user was asked for.

Reasonable Strategy: A strategy in this example depends on the actual values of p_1 and p_2. If p_2 is close to p_1, a reasonable strategy will prefer compound questions, to reduce the duration of the dialogue. When p_2 is much larger than p_1 and compound questions would result in an unreasonable error rate, single-slot questions should be preferred.

Expected dialogue cost: Here the goal of the system is to fill the form slots correctly with the shortest dialogue, and therefore the dialogue cost is

C_D = λ_Q N_Q + λ_U N_U + λ_E N_E, (5)

where:
N_Q is the expected number of questions in the session,
N_U is the expected number of unfilled slots in the submitted form, and
N_E is the expected number of erroneously filled slots in the form, determined by the error rate for the kind of question used to fill the slots.

Cost distributions: The costs in this application are deterministic:

c(a_t, s_t) = λ_Q + λ_E p_1     if a_t = A_i, i = 1, ..., N
            = λ_Q + 2 λ_E p_2   if a_t = A_{i,j}, i, j = 1, ..., N
            = λ_U N_U           if a_t = A_0 (6)

Optimal strategy: The optimal strategy minimizing the expected cost (5) quantifies the reasonable strategy by specifying the exact conditions on the error rates p_1 and p_2 for choosing the right actions:

- Don't ask any questions and submit an empty form if λ_U < λ_Q + λ_E p_1.
- Else, if the error rate for a compound question is much larger than the one for a single question, p_2 - p_1 > λ_Q / (2 λ_E), ask only single-slot questions without repetitions, and submit the form when filled.
- Else, ask compound questions about unfilled slots in the form, and submit the form when filled.

5. EXAMPLE 2: AMICA

The AMICA [6] dialogue system is a research mixed initiative dialogue system for spontaneously spoken input that was initially developed for the ARPA ATIS task. The application here is an intelligent interface between a user and a relational database.

State space: The current state of the system is represented by s = (M, Q, N_D, C, R), where M is a meaning template representing the current user request (obtained by a speech understanding module); Q is a database query; N_D is the number of data tuples obtained by retrieving data from the database according to Q, with N_D = -1 if no retrieval was attempted yet; C is a template of additional constraints; R is a template of constraints to relax.

Action set:

A = A_Q ∪ A_C ∪ A_R ∪ A_O, (7)

where:
A_Q is the action that forms a query Q (i.e. appending the additional constraints from C to M, and removing the relaxed constraints specified in R) and retrieves the data from the database according to Q.
A_C is a set of actions that correspond to asking the user for an additional constraint. This set is parameterized by the attribute the system suggests to constrain.
A_R is a set of actions that correspond to asking the user to relax a constraint. This set is parameterized by the attribute the system suggests to relax.
A_O is the action of showing or verbalizing the retrieved data to the user.

State transitions: The state transitions in this case are deterministic for all actions except the actions of asking the user for constraining/relaxing the query, where the next state is determined by the user's answer to the question with the appropriate probability: the system starts in an initial state where M is set to the initial user request, C, Q, R are empty, and N_D = -1; actions in A_C modify the C template in the current state by appending or not the appropriate constraints according to the user's answer; actions in A_R modify the R template in the current state by appending or not the appropriate constraints according to the user's answer; A_Q sets Q and N_D; A_O brings the system to the final state.

Reasonable Strategy: A strategy that we found useful and reasonable for this system is as follows: the system starts in an initial state in which an initial user query was specified. If this query is under-constrained (we specified a set of conditions on the query under which we believe that it might result in too large a data set to be retrieved from the database), the system will generate a question asking for additional constraints. Then a query is formed according to the user's answer, and the data is retrieved. There are three possible situations: if there is no data to match the request (N_D = 0), the system will ask the user to relax a constraint, form a new query, and retrieve the data; if the number of tuples retrieved is too large (e.g. N_D > 3), the system will ask the user for additional constraints; otherwise (e.g. 0 < N_D ≤ 3), the system will output the data.

Expected dialogue cost:

C_D = λ_Q N_Q + λ_R N_D + λ_O f(N_D), (8)

where N_Q is the expected number of questions, measuring the length of the dialogue; N_D is the expected number of tuples retrieved from the database during the session, measuring the cost of data retrieval; and f(N_D) is the expected channel cost associated with the number of tuples the system showed the user by the end of the session. It reflects the preference of users for short and concise, but non-empty, outputs. In our case:

f(N_D) = 0     if 1 ≤ N_D ≤ 3
       = C_1   if N_D = 0
       = C_2   if N_D > 3 (9)

Referring to the system description in [6], we can map specific modules to the cost terms of equation (8): the minimal information module tries to minimize the database access cost λ_R N_D; the constraining and relaxation modules try to minimize the cost of the output channel λ_O f(N_D); and all of them contribute to the duration cost λ_Q N_Q.

Cost distributions: The costs are random variables as follows:

c(a_t, s_t) = λ_Q          if a_t ∈ A_C ∪ A_R
            = λ_R N_D      if a_t = A_Q
            = λ_O f(N_D)   if a_t = A_O (10)

Optimal strategy: The optimal strategy minimizing the expected cost trades off the cost components of equation (8). We cannot derive the optimal strategy analytically. Currently we are experimenting with a reinforcement learning algorithm for learning the optimal strategy from interactions [4].

6. SUMMARY

In this paper we propose a formal quantitative model for man-machine dialogue systems. First, we introduce a general formalization of such systems in terms of their state space, action set and strategy. With this formalization we can describe any dialogue system without loss of generality, but it does not provide a quantitative analysis of dialogue system qualities. Then, we proceed with the main assumption that a good strategy for a dialogue system is one minimizing an objective function that reflects the costs of all the important dialogue dimensions. With this assumption we can model any man-machine dialogue system using a Markov decision process, a stochastic model commonly used today for control, games, and other applications, and use reinforcement learning algorithms for designing the optimal strategy automatically. This paradigm also allows us to objectively evaluate and compare different strategies and different systems for the same application.

REFERENCES

[1] Glass, J. et al., "The MIT ATIS System: December 1994 Progress Report," Proc. of the 1995 ARPA Spoken Language Systems Technology Workshop, Austin, Texas, Jan. 1995.

[2] Sadek, M. D., Bretier, P., Cadoret, V., Cozannet, A., Dupont, P., Ferrieux, A., and Panaget, F., "A Cooperative Spoken Dialogue System Based on a Rational Agent Model: A First Implementation on the AGS Application," Proc. of the ESCA/ETR Workshop on Spoken Dialogue Systems, Hanstholm, Denmark, 1995.

[3] Stallard, D., "The BBN ATIS4 Dialogue System," Proc. of the 1995 ARPA Spoken Language Systems Technology Workshop, Austin, Texas, Jan. 1995.

[4] Kaelbling, L. P., Littman, M. L., and Moore, A. W., "Reinforcement Learning: A Survey," Journal of Artificial Intelligence Research, Vol. 4, pp. 237-285, May 1996.

[5] Marcus, S. M., Brown, D. W., Goldberg, R. G., Schoeffler, M. S., Wetzel, W. R., and Rosinski, R. R., "Prompt Constrained Natural Language - Evolving the Next Generation of Telephony Services," Proc. of ICSLP '96, Philadelphia (PA), October 1996.

[6] Pieraccini, R., and Levin, E., "AMICA: the AT&T Mixed Initiative Conversational Architecture," Proc. of Eurospeech '97, Rhodes (Greece), Sept. 1997.