CHRISTEL KEMKE, RICHARD GALKA, MONIRUL HASAN Department of Computer Science, University of Manitoba, CANADA

Abstract. This paper gives an overview of a speech and language interface to an agent system operating in the domain of interior design. The user can instruct the system to place objects in a room, and query it about spatial relations among these objects. The core issue of this research is to provide a framework for the development of intelligent design systems, based on the integration of a natural language interface, an ontology, domain rules and constraints, and a suitable representation of actions, combined under the paradigm of an intelligent agent system.

1. Introduction

The "intelligent interior design system" is a prototype of a combined natural language processing and visual scene creation system, with an agent system at its centre, which acts in the domain of interior design. The system handles instructions and questions related to the positioning of furniture in the room, and the addition of furniture to or removal of furniture from the scene. The system works with speech input processed through the Dragon NaturallySpeaking™ speech recognition system. It performs a syntactic and semantic analysis of the recognized verbal input and produces a frame structure representing the user's query or command. The analyzed query or command is processed by the system through accessing and modifying the knowledge base, which contains an ontology of the domain, i.e. essentially concepts of furniture and other interior design objects. Spatial relations between those objects are also modeled in the knowledge base. The system integrates commonsense rules, like the transitivity of spatial relations, and simple design constraints about placing objects in a room. The system maintains a visual representation of the scene, showing the results of successful instructions given by the user. We describe in other publications [10-15] our general approach for the development of adaptable speech and language interfaces, which has been used for various other intelligent agent systems. Similar research projects,
which inspired this work, are in particular agent systems with natural language interfaces like TRAINS [1,20] and CommandTalk [18]; BeRP [8] and Verbmobil [22] regarding the connection of speech and language; and WordsEye [3] for visual scene creation based on verbal descriptions.

2. Processing Examples

In the "intelligent interior design system", the user can issue command sentences like: "Put a green table in front of the fireplace." The system creates a green table based on the conceptual specification of table stored in the knowledge base. The created instance of the table is assigned the color value 'green'. In the conceptual hierarchy, table is a subconcept of moveable-furniture, and thus the table can be re-positioned (moved) later. The resulting visual scene created for this command input is shown in the left picture of Figure 1. When interpreting commands or questions, the system currently assumes a fixed position of the user looking at the scene, and interprets spatial relations like 'left-of' or 'behind' with respect to the user's viewpoint. The right picture in Figure 1 shows the change of the scene after the user issues the instruction

"Put a blue sofa besides the table."
The instruction to put a blue sofa besides the green table is ambiguous, and due to the absence of any other rules or constraints to resolve this ambiguity, the system enters a clarification dialog with the user: "What did you mean by beside? Left or right?" This results in the interpretation 'left-of' for 'besides', based on the user's response, which is in this example supposed to be "left".

Figure 1: Modification of a visual scene as response to a verbal instruction



The system confirms a successful action to the user with verbal feedback: "The sofa has been successfully placed beside the green table." If the verbal input is a query, the system formulates an answer by retrieving the requested information from the knowledge base, for example: "Where is the green table?" Since the knowledge base stores spatial relations between objects, which are updated whenever actions occur in the scene, the system can retrieve information about the current relative positions of objects present in the scene. When answering questions about the location of moveable items, the system prefers to describe those locations in relation to non-moveable objects. Thus, for the question above, the system generates the answer "The green table is in front of the fireplace." instead of specifying its location relative to the blue sofa. It is also possible to remove objects from the scene, simply by issuing an instruction like "Remove the green table". The object denoted as "green table" is identified by the system, its visual representation is removed from the scene, and the corresponding instance of 'table' is removed from the knowledge base together with all its relations to other currently existing objects.

3. System Architecture and Processing at a Glance

The overall processing within the system, connecting the natural language interface and the agent system, is illustrated in Figure 2. The analysis and representation of a natural language instruction is shown with the example of a put-instruction. The processing starts with a syntactic analysis, followed by the creation of a case frame as the basic linguistic representation of the verbal instruction. The processing includes several reasoning steps, for example, identifying the sofa addressed in the user's instruction as the concrete instance of a sofa named sofa-1, which is present in the scene.
Furthermore, the system has to resolve the ambiguity of beside (left-of or right-of), which in this case takes place in cooperation with the user, through a clarification dialogue, since the agent system otherwise has no evidence to decide whether the sofa is to be placed left or right of the fireplace. The result is an instantiated frame representation of the action to be performed, including references to the parameter objects involved in the action. This representation is used for execution, i.e., if it is part of a command sentence, the visual scene is updated accordingly, provided this does not violate any constraints.
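As a sketch of this interpretation step, the mapping from a linguistic case frame to an executable action frame, including the clarification dialogue for 'beside', might look as follows. All names and data structures here are hypothetical illustrations, not the system's actual implementation:

```python
# Illustrative sketch of the frame-interpretation step; all names are
# hypothetical and do not reflect the system's actual code.

AMBIGUOUS_PREPOSITIONS = {"beside": ["left-of", "right-of"]}

def interpret_frame(case_frame, scene, ask_user):
    """Turn a linguistic case frame into an executable action frame."""
    # Resolve the definite description ("the sofa") to a scene instance.
    obj = scene[case_frame["object"]]           # e.g. "sofa" -> "sofa-1"
    prep, landmark = case_frame["destination"]  # e.g. ("beside", "fireplace")
    # Ambiguous spatial prepositions trigger a clarification dialogue.
    if prep in AMBIGUOUS_PREPOSITIONS:
        options = AMBIGUOUS_PREPOSITIONS[prep]
        prep = ask_user(f"What did you mean by {prep}? "
                        + " or ".join(options), options)
    return {"action": "move", "object": obj,
            "destination": (prep, landmark)}

scene = {"sofa": "sofa-1", "fireplace": "fireplace-1"}
frame = {"action": "put", "object": "sofa",
         "destination": ("beside", "fireplace")}
# Simulate the user answering "left" in the clarification dialogue.
answer = lambda question, options: "left-of"
print(interpret_frame(frame, scene, answer))
# -> {'action': 'move', 'object': 'sofa-1', 'destination': ('left-of', 'fireplace')}
```

In the unambiguous case the dialogue step is simply skipped and the destination is passed through unchanged.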


"Put the red sofa besides the fireplace."

NL Input Analysis

sentence-type: command verb: put dir-object: NP1 (det the) (adj red) (noun sofa) destination: PP1 (prep besides)(det the) (noun fireplace)

NL Frame Interpreter

KB access

action: put object: sofa(sofa-1) ∧ colour(sofa-1)=red destination: besides(fireplace)

Action Interpreter

Clarification Dialogue

action: move object: sofa-1 source: location(sofa-1) destination: left-of(fireplace)

Figure 2. Processing and representation of a natural language command.

In the following sections, we describe the linguistic processing and the underlying knowledge representation of the interior design system in more detail.



4. Linguistic Processing

The natural language inputs considered here, which establish the communication with the agent system, typically consist of commands, questions, and possibly declarative statements, as well as confirmations as part of clarification dialogues between system and user. The natural language input is processed with a standard parsing algorithm, the Earley parser, using a grammar adapted to the typical sentence types and structures of such verbal interactions, plus a domain-specific vocabulary stored in the lexicon. The lexicon provides a connection to the knowledge base, since lexical items (words) may refer to respective concepts in the knowledge base. For example, the words "sofa" and "couch" are entries in the lexicon, which both refer to the concept sofa in the knowledge base. The major part of the lexicon is constituted by verbs, nouns, adjectives, and prepositions, which are the syntactic categories relevant in the interior design domain. The lexicon also contains information on necessary and optional complements of verbs, which is taken into account in the grammar rules and used during parsing. For example, verbs like "put" require a direct object (what to put) as well as a destination phrase (where to put it); verbs like "move" can in addition have a source specification (from where to move it). The parser determines the sentence type (e.g. query or command) and the main syntactic constituents of the sentence, like subject noun phrase, object noun phrase, and prepositional phrases for locations, as well as an indicator for the queried part of the sentence in case of questions. A command sentence, for example, is characterized by a leading verb, like "move", followed by complements, like a noun phrase for the affected object (the "theme" or "patient") and one or more prepositional phrases describing locations (e.g. the "source" and "destination" in case of a move-action).
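The connection between lexicon, knowledge base concepts, and verb complements described above can be sketched as follows. This is a hypothetical illustration of the idea, not the system's actual lexicon format:

```python
# Hypothetical sketch of a lexicon linking words to KB concepts and
# recording required and optional verb complements.

LEXICON = {
    "sofa":  {"cat": "noun", "concept": "sofa"},
    "couch": {"cat": "noun", "concept": "sofa"},   # synonym, same concept
    "put":   {"cat": "verb", "required": ["dir-object", "destination"],
              "optional": []},
    "move":  {"cat": "verb", "required": ["dir-object", "destination"],
              "optional": ["source"]},
}

def check_complements(verb, complements):
    """Report missing required complements for a verb, as the parser might."""
    entry = LEXICON[verb]
    return [role for role in entry["required"] if role not in complements]

print(check_complements("put", {"dir-object": "the red sofa"}))
# -> ['destination']  (a bare "Put the red sofa." is incomplete)
```

Both "sofa" and "couch" map to the same concept, so either word resolves to the same knowledge-base entry during interpretation.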
The structural representation of the sentence is processed further through a shallow syntactic-semantic analysis, in which the contents or fillers for a case-frame representation [6,7] are determined. Example: "Put the red sofa beside the fireplace". The system recognizes the sentence type as a command sentence, due to the structure verb ("put") followed by an object-NP ("the red sofa"). The prepositional phrase "beside the fireplace" is interpreted as a location, since it contains a spatial preposition ("beside") followed by a physical object ("the fireplace"). The system can now construct a case frame representation as shown in Figure 2. Case frames are suitable for systems like the "intelligent interior design system", since they combine syntactic and semantic information, and are thus useful to bridge the gap between the verbal input, the knowledge base, and the agent system.


5. Knowledge Representation

The interior design system focuses on certain types of physical objects, like furniture, lamps, etc. and other floor plan items like fireplaces, windows, doors, and actions related to positioning and moving furniture and similar objects. Descriptions of these objects are provided through concepts in the knowledge base, which also contain characteristic features of such objects, like color and size, and spatial relations, as interpretations of prepositions and location phrases. General and domain-dependent rules and constraints are important in this application, in order to check for the applicability and meaningfulness of user instructions.

The representational framework is based on structured concepts, which are arranged in a taxonomic hierarchy. A formal semantics clearly defines the meaning of those concepts and the relations between them. The knowledge representation is based on the formalism of description logics [2]. Description Logic (DL) languages were conceived and used in the first instance for the representation of static concepts and their structural properties. Work on adding dynamic concepts, like actions and events, to the description logic formalism started in the mid-eighties [12,14,15]; more recent work is described in particular in [4,5,11]. The knowledge base maintains conceptual knowledge about objects in general, e.g. that a table is moveable and has a color, and information about specific objects present in the current scene, e.g. that a green table is in front of the fireplace (in more formal terms: there is an instance of the concept 'table', with 'color'-value 'green', and relation 'in-front-of' to a unique instance of the concept 'fireplace'). Spatial relations are modeled as concepts in their own right, which can connect object concepts. Instances of such a relation represent pairs of physical objects which stand in the respective relation to each other, e.g. if a specific table called "table-1" is left of a specific sofa called "sofa-1", then the pair (table-1, sofa-1) is an instance of the concept left-of. Actions are described in a format similar to case frames, but with additional roles for the precondition and the effect of the action, described as logic formulas whose predicates refer to the object and relation concepts defined in the concept hierarchy. The description of preconditions and effects in a logic format is necessary to apply planning and reasoning methods. For more details and examples of the representation of actions we developed for this kind of communicating agent, see [10,11].
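A minimal sketch of this kind of knowledge base — a concept hierarchy, typed instances with features, and spatial relations whose instances are pairs of objects — might look as follows. This is an illustration only, not the system's DL implementation:

```python
# Illustrative toy knowledge base: ISA hierarchy, instances, relations.
# All concept and instance names are hypothetical.

ISA = {"table": "moveable-furniture", "sofa": "moveable-furniture",
       "moveable-furniture": "furniture", "fireplace": "fixed-object"}

def is_a(concept, ancestor):
    """Walk up the taxonomic hierarchy to test subsumption."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = ISA.get(concept)
    return False

# Instances of concepts, with feature values such as color.
instances = {"table-1": {"concept": "table", "color": "green"},
             "fireplace-1": {"concept": "fireplace"}}

# A relation concept holds pairs of instances that stand in the relation.
relations = {"in-front-of": {("table-1", "fireplace-1")}}

assert is_a("table", "moveable-furniture")   # so table-1 may be moved later
assert ("table-1", "fireplace-1") in relations["in-front-of"]
```

The subsumption test is what licenses, for example, re-positioning an instance of table: table inherits moveability from moveable-furniture.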



Domain rules and constraints are implemented and utilized in the system's reasoning processes through a connection to CLIPS [24]. The rules typically specify spatial conditions and constraints, e.g. "if x is left of y, then y is right of x" or "x can be placed on top of y only if y has a flat surface". Here is an example of a simple CLIPS rule, expressing the transitivity of the UNDER relation:

(defrule UNDER-TRANSITIVITY
  (direction ?X ?Y UNDER)
  (direction ?Y ?Z UNDER)
  =>
  (assert (direction ?X ?Z UNDER)))
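The effect of such a transitivity rule can be mirrored outside CLIPS as a simple forward-chaining closure. The following Python sketch is an illustration of the rule's semantics, not the system's actual rule engine:

```python
# Forward-chaining sketch of the UNDER-TRANSITIVITY rule above;
# an illustration only, not the system's CLIPS engine.

def transitive_closure(facts):
    """facts: set of (x, y, rel) triples. Adds (x, z, rel) whenever
    (x, y, rel) and (y, z, rel) hold, until no new facts appear."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for (x, y, r1) in list(facts):
            for (y2, z, r2) in list(facts):
                if y == y2 and r1 == r2 and (x, z, r1) not in facts:
                    facts.add((x, z, r1))
                    changed = True
    return facts

facts = {("rug-1", "table-1", "UNDER"), ("table-1", "lamp-1", "UNDER")}
closure = transitive_closure(facts)
# The derived fact ("rug-1", "lamp-1", "UNDER") is now included.
```

In CLIPS, the same derivation happens incrementally each time a new direction fact is asserted into working memory.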

When interpreting a user's instruction, the agent system matches the input representation against the rules in the CLIPS rule base. If an instruction would violate a constraint, it is refused by the agent. The agent instead provides an explanation, e.g. "The table cannot be put on top of the sofa, because the sofa does not have a flat surface." or "The sofa cannot be placed behind the door, because the door cannot have objects behind it."

6. Conclusion and Outlook

We presented work on a prototypical "intelligent interior design system", an agent system for interior design equipped with a natural language interface and visual scene creation. As future work, a realistic application of the "intelligent interior design system" in connection with a commercial design system would be desirable. We looked into integrating our system with commercial design systems like the IKEA kitchen planner, but these endeavors have not yet been successful, since they would require access to the source code of the commercial design system, which is in general not publicly available. Some improvements to the "intelligent interior design system" would also be required. Firstly, the rules that model spatial constraints are very unspecific. The introduction of more sophisticated design rules and guidelines would be appropriate and useful. Secondly, the system currently works with relative relations to describe the physical locations of objects. This creates problems, for example, if an object is removed. A combination of relative relations and absolute descriptions of locations should be used instead. Based on our experience in developing the interior design system, and on our artificial intelligence background, we outline in the next section some requirements and recommendations for the conception and development of Intelligent Virtual Design Environments (IVDEs) in general.


7. General Requirements and Recommendations for IVDEs

In order to specify and elicit requirements for Intelligent Virtual Design Environments, it makes sense to look at the words that make up this term, and to discuss in which way they contribute to the overall vision of an IVDE. The core term is, of course, "design". This clarifies the domain we deal with, and also the application, which aims at designers and possibly lay persons who need a design system for their personal use. Thus, we may have to deal with two interest groups: professional designers, who are experts in the domain and thus familiar with the tools and techniques involved in design; and lay persons, who have only marginal knowledge in this area and thus need a more intuitive, "everyday" presentation of the design process and outcome. Whereas professional designers will have to deal with design tasks and processes in all areas and on various levels of complexity and abstraction, lay persons will focus on simple design tasks: for example, in the home environment, furnishing a room or house, or planning a kitchen or bathroom; in the office environment, designing stationery for personal or business use, like templates for letters; and in the hobby or leisure area, designing clothes or jewelry. Whereas for lay persons a rough description, provided in non-specialized terminology, is adequate and sufficient, a professional design typically requires very detailed instructions and specifications, which can subsequently be used as the basis for an industrial production process involving teams of human workers and/or manufacturing machines.
Thus, the "design environment" will have to deal with different domains (for professional designers and lay persons); the interaction between system and user will have to take different levels of expertise about the design process and design methods into account; and the requirements regarding the outcome will vary and this has to be taken into consideration as well. What is common to both groups and another core term in "IVDE" is that the environment should bare "intelligence". What is this "intelligence" and how can we capture it in an IVDE? The intelligence of a design environment comes from two sides: first of all, it is captured intelligence of designers. This is meant in the sense of the development of "expert systems", which are now more often called "knowledge based systems". An IVDE has to provide a considerable amount of design expertise, which it needs to provide adequate assistance to the designer. A major issue in the development of IVDEs is thus the representation of design knowledge, and - as a more basic foundation - knowledge about the design domain, e.g. kitchen furniture and equipment, and basic principles of physical domains, like the representation of spatial relations and rules and constraints for reasoning about them, as described in the first part of this paper for the interior design system.



On the technical level, hierarchical taxonomies of structured concept descriptions, representing domain objects together with their attributes and the relations between them, seem suitable. In addition, we need a set of rules describing guidelines and constraints on various levels, including purely physical constraints, domain- and application-dependent rules and constraints, and rules reflecting design guidelines and principles. For example, the interior design system described above currently works with relative relations only, like "beside" or "left-of". It does not deal with coordinates or absolute location specifications. We found, as expected, that we need absolute locations: relative relations alone are insufficient, since many rules and reasoning processes cannot be described this way. The issue now is to integrate both forms of representation and include them both in the reasoning processes. This bears all kinds of standard problems known in artificial intelligence and knowledge representation, including questions of consistency and ambiguity, complexity and efficiency, the closed-world assumption, etc. I would assume that some of these problems are discussed in the workshop on "Understanding and Constructing Spatio-Visual Representations". What seems to be common and basic to all design systems is their foundation in a visual, spatial domain. It might be worthwhile to discuss in which way design domains have common physical grounds, and how we can describe those. When trying to work out some common-sense rules in the domain of interior design, like "two objects can't occupy the same location at any given time", I found that the case is not as obvious and easy as I thought. Some objects can occupy the same space at a time (for example, a table lamp can be put on top of a table), whereas other objects can't be in the same space at a time; e.g. we don't want to put two sofas on top of each other.
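The integration of relative and absolute locations mentioned above could start from deriving relative relations out of absolute positions. The following is a hypothetical sketch under the paper's assumption of a fixed user viewpoint; the coordinate convention is an illustration, not the system's representation:

```python
# Hypothetical sketch: deriving the relative relation "left-of" from
# absolute 2D positions, so both representations can coexist in a KB.
# Convention (assumed): smaller x means further left from the user's
# fixed viewpoint.

positions = {"table-1": (2.0, 5.0), "sofa-1": (4.5, 5.0)}

def left_of(a, b, positions):
    """True if object a is left of object b from the user's viewpoint."""
    return positions[a][0] < positions[b][0]

assert left_of("table-1", "sofa-1", positions)
# The inverse relation follows for free: sofa-1 is right of table-1.
assert not left_of("sofa-1", "table-1", positions)
```

With absolute positions available, removing an object no longer invalidates the locations of the remaining objects, since their relative relations can be re-derived at any time.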
One reason for this same-space problem is that our system started out with a 2D visual representation of the scene. I assumed, very naively, that a bird's-eye perspective would be sufficient for our purposes, since a lot of planners and design systems in industrial settings work this way. But I was wrong. Objects in an interior design system have to be arranged in 3D. Objects like lamps can be put on tables; fireplaces are inside walls, pictures hang on walls, lamps can hang from the ceiling, etc. Currently the system has a "fake" 3D perspective, in which certain objects like doors, windows, and fireplaces are presented as parts of the walls. But even then, certain phenomena cause problems, e.g. putting an object inside another object, for example, a shelf inside a cabinet. The IKEA kitchen planner, for example, is unable to deal with this case. A more sophisticated knowledge representation is required, where the cabinet is a kind of container and the second object is a possibly contained object, or, alternatively, the definition of a rule which states when an object can be contained in another object, and vice versa. This kind of knowledge representation, regarding objects and rules, is "domain-dependent", since what is represented and how it is represented depends on the domain (e.g. furniture is represented with color and size for an interior design system, and the rules include that sofas cannot be placed on top of sofas), or "application-dependent", since some of the representations and rules depend on what is being done. For example, a warehouse design for IKEA might deal with cabinets and other furniture, described in the same way as for an interior design system, but in this case we can stack sofas, and cabinet doors do not have to be opened. Thus, certain constraints important for the interior design application are not valid for the warehouse design. There are design rules on various levels, from simple, basic physical constraints to philosophical considerations. Rules for interior design systems, for example, deal with comfortable and convenient living and working. On the most basic level, it has to be ensured that doors (of the room and of the cabinets) can be opened and that there are no obstacles in the way. It also has to be ensured that there is enough space to walk around and handle items, especially if the design is for a functional space, like a workplace or kitchen. Such rules follow basic design principles, based on physical and application-oriented constraints. On a higher level, there are questions of aesthetics or design philosophies, like Feng Shui. I consider this special, high-level design expertise. It would be interesting to develop an IVDE which delivers alternative designs based on certain design principles or philosophies, to be selected by the user. Since these principles are often not "hard" rules, but more like guidelines, they can be violated. Thus, an IVDE interacting with a user should allow different settings for the relevance of such rules. On a more technical level, this raises questions about the reasoning techniques we use, and how we can combine different forms of guidelines, rules, and constraints.
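The distinction between rules that must never be violated and guidelines that may be overridden could be sketched as a two-tier constraint check. The rules and structure below are hypothetical illustrations of the idea:

```python
# Illustrative sketch of checking rules at different levels of
# "seriousness": hard constraints block an action, soft guidelines
# only produce a warning the user may override. All rules are
# hypothetical examples.

HARD_RULES = [
    ("no-stacking-sofas",
     lambda act: not (act["verb"] == "put-on-top" and
                      act["object-type"] == "sofa" and
                      act["target-type"] == "sofa")),
]
SOFT_RULES = [
    ("keep-walkways-clear",
     lambda act: act.get("blocks-walkway", False) is False),
]

def check(action):
    """Return (refused-by, warnings) for a proposed action."""
    refused = [name for name, ok in HARD_RULES if not ok(action)]
    warnings = [name for name, ok in SOFT_RULES if not ok(action)]
    return refused, warnings

act = {"verb": "put-on-top", "object-type": "sofa", "target-type": "sofa"}
print(check(act))   # -> (['no-stacking-sofas'], [])
```

An agent could refuse any action with a non-empty refused list, while presenting warnings to the user for confirmation, matching the adjustable-relevance settings suggested above.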
The above-mentioned rules are at different levels of "seriousness": some are hard constraints, since their violation creates physically impossible constellations; others are necessary in a certain domain or application setting, since otherwise the design misses its purpose; some rules or guidelines might be "soft" (they can be violated), unspecific (vague, with no clear and exact specification), or abstract, and thus require a different kind of representation, for example through fuzzy sets and systems for vague rules, or a hierarchical approach to design for abstract rules. One conclusion, summarizing the points above, is that we need a collaboration of professional designers, who have the expertise and knowledge in the design domain, and computer scientists, who have expertise in knowledge representation and reasoning. These two elements seem to be the core issue in providing "Intelligent Design Environments". The last point to discuss here is the issue of providing a suitable "Virtual Environment". "Virtual" means that we design using a computer as the tool and medium of the design process. The term "Environment" indicates that we think of a set of tools, techniques, and methods the IVDE should provide, as well as various forms of interacting with the system, i.e. the interface between the user (designer) and the IVDE. We focus in our research on verbal communication, i.e. natural language and speech input and output. Since we deal with a visual domain, with actions and objects in a spatial setting, one focus of this research has to be on how objects are described and referred to. The resolution of references and the inclusion of deictic expressions (going back to pointing gestures) have to be taken into account and specifically addressed in design systems and their interfaces. Context-dependency is also very important, since people will work with the system in a sequential manner, step by step developing and modifying the design through verbal actions. The TRAINS project [1] is an early example of an interactive planning system with a natural language interface to a virtual agent system. The topics of "context awareness" and "situatedness" [17] have recently been discussed in the area of agent systems which act and interact in work scenarios, in spatial environments, with a temporal component. The graphical presentation of the scenario in our interface is marginal. For a "real" Virtual Design Environment, the visual representation has a lot of relevance, and more emphasis has to be put on this aspect. It is an open question what form of visual representation, and what level of detail, is adequate and suitable for a user of the system. Designers sometimes work with sketch drawings, or with line drawings according to exact specifications and measurements, using dedicated icons and symbols for specific objects. On the other hand, a client of a designer, or a lay person using the system, might have to see more realistic visual presentations in order to get an impression of the design.
In our interior design system, the visual presentation was not a focus of research, but nevertheless we faced a dilemma: whether to use more realistic images (which increases the amount of work to implement the design system, and means the system has to deal with larger images, which reduces its efficiency), or to use plain line drawings with basic shapes, or something in between. We found in the end that for our system and non-expert users, an additional software component which can create images according to a user's description, based on a prototypical shape of the object plus additional features, would be the most suitable approach. Objects in the system are described as instances of concepts with different features, like type (e.g. sofa), size, and color. Based on these descriptions, we can generate images dynamically, which has the advantage that users can create objects and change their attributes on the fly, and the stored information about those objects is consistent with the visual image. In a wider view on design interfaces and environments, going somewhat beyond our research scope, it will be necessary and suitable to look at the integration of interaction modalities. There has been considerable research in this area, for example by Wahlster's research groups at the DFKI [26] and André's multimedia group at the University of Augsburg [27]. The topic of multi-modal interaction is to combine various forms of input, e.g. gestures (mouse as well as hand movements), speech and natural language, eye movements, facial expressions, etc., into one interface, and to integrate this with different forms of output, e.g. visual output through images and icons, verbal output through speech or text, and gestures and other forms of deixis, like highlighting. My research focuses on natural language interfaces, with a clear justification: a) speech is for us the most natural form of explicit, complex communication; b) in some settings people cannot use their hands to act; and c) some people are physically disabled and might not be able to use manual inputs. Nevertheless, I consider that a major aspect of a successful IVDE is to provide users with a variety of options regarding the tools, techniques, and media of interaction.

Acknowledgements
The prototype system described here was developed by graduate students at the University of Manitoba under the supervision of the first author. Richard Galka provided the CLIPS version of the system. Monirul Hasan is currently maintaining and expanding the interior design system. This work has been partly supported by NSERC.

1. J. F. Allen, et al., The TRAINS Project: A Case Study in Defining a Conversational Planning Agent. Journal of Experimental and Theoretical AI, 7, pp. 7-48, 1995.
2. F. Baader, D. Calvanese, D. McGuinness, D. Nardi, P. Patel-Schneider (eds.), The Description Logic Handbook, Cambridge University Press, 2003.
3. B. Coyne and R. Sproat, WordsEye: An Automatic Text-to-Scene Conversion System, SIGGRAPH, 2001. See also: Semantic Light, www.semanticlight.com
4. P. T. Devanbu and D. J. Litman, Taxonomic Plan Reasoning, Artificial Intelligence, 84, pp. 1-35, 1996.
5. B. Di Eugenio, An Action Representation Formalism to Interpret Natural Language Instructions, Computational Intelligence, 14, pp. 89-133, 1998.
6. C. F. Baker, C. J. Fillmore and J. B. Lowe, The Berkeley FrameNet Project, COLING-ACL, Montreal, Canada, 1998.
7. C. Fillmore, The Case for Case. In E. Bach and R. Harms (eds.), Universals in Linguistic Theory, pp. 1-90, Holt, Rinehart and Winston, New York, 1968.
8. D. Jurafsky, C. Wooters, G. Tajchman, J. Segal, A. Stolcke, E. Fosler, and N. Morgan, The Berkeley Restaurant Project, Proc. ICSLP, pp. 2139-2142, 1994.
9. D. Jurafsky and J. H. Martin, Speech and Language Processing, Prentice-Hall, 2000.
10. C. Kemke, Speech and Language Interfaces for Agent Systems, Proc. IEEE/WIC/ACM IAT-04, pp. 565-566, Beijing, China, September 2004.
11. C. Kemke, A Formal Approach to Describing Action Concepts in Taxonomical Knowledge Bases. In N. Zhong, Z. W. Ras, S. Tsumoto, E. Suzuki (eds.), Foundations of Intelligent Systems, Lecture Notes in Artificial Intelligence, Vol. 2871, Springer, pp. 657-662, 2003.
12. C. Kemke, What Do You Know about Mail? Knowledge Representation in the SINIX Consultant. Artificial Intelligence Review, 14:253-275, 2000.



13. C. Kemke, About the Ontology of Actions. Technical Report MCCS-01-328, Computing Research Laboratory, New Mexico State University, 2001.
14. C. Kemke, Die Darstellung von Aktionen in Vererbungshierarchien (Representation of Actions in Inheritance Hierarchies). In Hoeppner (ed.), GWAI-88, Proceedings of the German Workshop on Artificial Intelligence, Springer, 1988.
15. C. Kemke, Representation of Domain Knowledge in an Intelligent Help System. Proc. of the Second IFIP Conference on Human-Computer Interaction INTERACT'87, pp. 215-220, Stuttgart, FRG, 1987.
16. J. McCarthy, Formalizing Common Sense: Papers by John McCarthy. Ablex, 1990.
17. G. Rickheit and I. Wachsmuth (eds.), Situated Communication, Mouton de Gruyter, 2006.
18. A. Stent, J. Dowding, J. M. Gawron, E. Owen Bratt and R. Moore, The CommandTalk Spoken Dialogue System. Proc. 37th Annual Meeting of the ACL, pp. 183-190, University of Maryland, College Park, MD, 1999.
19. M. C. Torrance, Natural Communication with Robots. S.M. Thesis, MIT Department of Electrical Engineering and Computer Science, January 28, 1994.
20. D. Traum, L. K. Schubert, M. Poesio, N. Martin, M. Light, C. H. Hwang, P. Heeman, G. Ferguson, J. F. Allen, Knowledge Representation in the TRAINS-93 Conversation System. Int. J. of Expert Systems 9(1), Special Issue on Knowledge Representation and Inference for Natural Language Processing, pp. 173-223, 1996.
21. S. Thrun, M. Beetz, M. Bennewitz, W. Burgard, A. B. Cremers, F. Dellaert, D. Fox, D. Haehnel, C. Rosenberg, N. Roy, J. Schulte, and D. Schulz, Probabilistic Algorithms and the Interactive Museum Tour-Guide Robot Minerva. International Journal of Robotics Research, 19(11):972-999, 2000.
22. W. Wahlster, VERBMOBIL: Erkennung, Analyse, Transfer, Generierung und Synthese von Spontansprache (Recognition, Analysis, Transfer, Generation, and Synthesis of Spontaneous Speech). Report, DFKI GmbH, June 1997.
23. PowerLoom.
24. CLIPS.
25. Dragon.
26.
27.