
INTELLIGENT MULTIMEDIA PRESENTATION SYSTEMS:

A PROPOSAL FOR A REFERENCE MODEL


M. BORDEGONI (CNR ITIA, Milano, Italy)
G. FACONTI (CNR CNUCE, Pisa, Italy)
T. RIST (DFKI, Saarbrücken, Germany)
S. RUGGIERI (Università di Pisa, Italy)
P. TRAHANIAS (FORTH-ICS, Heraklion, Crete, Greece)
M. WILSON (CRLC-RAL, Chilton, Didcot, UK)

The development of so-called intelligent multimedia presentation systems
(IMMPSs) is currently being actively pursued by research groups worldwide. A
common goal of the research community is to develop mechanisms for the
automated generation of multimedia presentations. Up to now, some large
prototypes of IMMPSs have been built, and the user interfaces of some applications
already include automated components for certain generation tasks such as text or
graphics generation. Unfortunately, there is no common agreement on a generic
architecture for IMMPSs with clear functional definitions of the subcomponents involved.
Moreover, even the terminology used in the descriptions of existing IMMPSs varies
considerably across research teams. With the proposal of a reference model for
IMMPSs, this paper tries to fill a major methodological gap and thus may
provide a sound basis for ongoing and future developments of IMMPSs. In essence,
the proposed reference model consists of several layers referring to the particular
subtasks which occur in multimedia presentation generation. Following the
paradigm of knowledge-based computing, we introduce a minimum set of explicitly
encoded knowledge, and assign it to some logically distinct knowledge sources.
These sources allow knowledge to be shared among components of different layers in a
client-server fashion. In order to demonstrate the use of the reference model, we
provide a comparison of two IMMPSs by redescribing them in terms of the model.

1 Introduction
With an increasing number of applications in which the "manual" authoring
of presentations is no longer feasible, the development of mechanisms for
the automated generation of multimedia presentations has become a shared
goal across many disciplines. Since the use of multiple media for conveying
information does not guarantee effective and intelligible presentations per se,
these mechanisms need to be intelligent in the sense that they are able to
make appropriate design decisions. Up to now, a lot of research and development
work has been conducted addressing aspects of automating multimedia
presentation generation, and even some large prototypes of IMMPSs have
been developed (e.g., MMI2, WIP, MIPS, COMET [1]). However, no generic
model has emerged so far. Each project began from scratch, relying only on
the past experience of the developers. Thus it is not surprising that there is
no agreement on the terminology to be used, on the functional definition of
an IMMPS, or on a generic architecture which reflects the logical structure of
processes for the generation of multimedia presentations.
With the proposal for a reference model for IMMPSs, this paper tries to fill a
major methodological gap and thus may provide a sound basis for ongoing and
future developments of IMMPSs. Agreement on a reference model in the
field will have several advantages. First of all, there is a general motivation for
having a reference model for any class of related computing systems. Among
other things, a reference model helps to analyze and compare existing systems
of a certain class on the basis of a common generic architecture, and by
means of a common terminology. Moreover, the generic architecture at the
core of a reference model will foster the modular development of future large
systems, as each module can be assigned a well-defined role (in terms of
the reference model), with well-defined interfaces between system components
and to other systems.
Our reference model is targeted towards the class of systems (or components
of superordinate systems) whose task is to present information to the user in
an effective way. Here, the attribute effective means that the particular
information needs of the individual users are best met under a given set of
presentation constraints, such as resource limitations and the user's knowledge
and style preferences. Since in the vast majority of non-trivial applications the
information needs will vary from user to user and from situation to situation,
a presentation system should be able to flexibly generate various presentations
for one and the same information content to be communicated. Having identified
the class of systems to be captured by the model, we now turn
to the general guidelines that have driven our design decisions:
• Adequate modularisation: To facilitate the development and comparison
  of practical large systems, the reference model must comprise a
  modularisation of a generic process for multimedia presentation generation.
  Ideally, this modularisation breaks down the generation process
  into logically distinct and computationally feasible subtasks.
• Appropriate degree of abstraction: The reference model should, on the
  one hand, reflect the peculiarities of multimedia generation. On the other
  hand, it should be general enough to capture the whole class of IMMPSs.
  Certainly, the generic architecture should abstract from concrete
  implementations, as it is always possible to rely on different mechanisms to
  accomplish a single generation subtask, and different formats for
  representing knowledge are always a choice.
• Identification and classification of knowledge sources: As intelligent
  presentation design is a knowledge-intensive task, the reference model should
  exhibit the basic set of logically distinct knowledge sources which are
  required for multimedia presentation generation. The reference model
  should also make clear how processes and knowledge sources are related
  to each other. In particular, private knowledge sources, i.e., sources for
  which a single owner component can be located, should be distinguished
  from those sources which are shared among several components.
• Modeling of shared sources in the client-server paradigm: To facilitate
  sharing of knowledge sources (in the generic architecture as well as in
  concrete system implementations), shared sources should be modelled following
  the client-server paradigm. Such sources will be referred to as expert
  modules; they are deemed to serve requests from client modules, possibly
  belonging to other systems.
• Openness to other standards: Multimedia generation comprises subtasks
  which have been dealt with in other disciplines. Therefore, the model
  should be open to combination with existing or potential standards in these
  disciplines. For example, the Computer Graphics Reference Model [2] may
  be used to instantiate the subcomponent for graphics generation in the
  generic architecture of our model. Existing reference models for hypermedia
  presentations such as the Dexter model [3] or the AHM (Amsterdam
  Hypermedia Model [4]) may be used for the description of generated
  presentation specifications. Also, one may rely on a standardized
  language for the exchange of knowledge between components, such as
  KQML (Knowledge Query and Manipulation Language [5]). Vice versa,
  the reference model can itself become a component of a superordinate
  model, such as the well-known Seeheim Model [6], which has been proposed
  as a generic architecture for user interfaces.
The rest of the paper is organized as follows. First, some basic notions are
introduced in order to characterize the class of intelligent multimedia
presentation systems. The core of the model, which is a generic architecture of
a generation process for multimedia presentations, is given in section 3. To
demonstrate the use of the model, two existing IMMPSs are redescribed with
the proposed terminology and architecture in section 4.
2 Basic Notions
Presentation systems are designed for achieving goals by means of presentations
that are perceivable and, consequently, subject to interpretation by
their intended user, who is always assumed to be a human being.
Presentation goals which have to be achieved by the system, possibly accompanied
by a set of presentational commands or presentation constraints
affecting the presentation process, constitute the primary input to the
presentation system. Both goals and commands are assumed to be formulated
outside the presentation system, i.e., by the user, or by an external system, or
by a superordinate component in case the presentation system is a part of a
larger system. Goals and commands include a high-level reference to collections
of data together with the purpose (or the intention) for communicating
information. As an example, a goal can be formulated as an encoding of the
fact that the system has to inform the user of the location of a specific switch
in a control panel. Similarly, a presentational command can be a representation
of design constraints such as the minimum size the switch must have in order to be
perceivable, if graphics is chosen as the presentation medium.
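As an illustration only, the switch example above could be encoded roughly as follows. This is a minimal sketch, not a format prescribed by the reference model: the class names, field names and string encodings are all hypothetical assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass(frozen=True)
class PresentationGoal:
    """Hypothetical encoding of a goal: what to communicate and why."""
    intention: str       # purpose of the communication, e.g. "inform"
    data_reference: str  # high-level reference to application data


@dataclass(frozen=True)
class PresentationalCommand:
    """Hypothetical encoding of a design constraint on the presentation."""
    constraint: str
    parameters: Dict[str, Any] = field(default_factory=dict)
    applies_to_medium: str = "any"


# The switch example: inform the user where a specific switch is located,
# subject to a minimum size if graphics is chosen as the medium.
goal = PresentationGoal("inform", "location-of(switch-7, control-panel)")
command = PresentationalCommand(
    "minimum-size", {"object": "switch-7", "pixels": 24}, "graphics"
)
```

Keeping goals and commands as separate types mirrors the distinction drawn in the text: the goal carries the intention, while the command only constrains how the goal is realized.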
Application data/knowledge provides the semantic grounding of each
presentation which may be generated by a system. As with presentation goals
and commands, it is assumed that application data/knowledge is part of the
input to a presentation system. In other words, there must be an external
source (e.g., an application system or a database) that makes available to the
presentation system the application data necessary for achieving posted goals.
Following the example above, the application can provide the switch and the
control panel geometries required to identify the switch position. However, the
request to inform the user of the switch position could have been formulated
by a different system.
A multimedia presentation is a presentation which comprises material in
different media such as text, graphics, sound, video, etc. However, it cannot
be denied that there is a lot of confusion about the fundamental notion of
media. One reason for this confusion is certainly that
the term media is used with different meanings in different contexts, such as in
semiotics, psychology, telecommunications, or computer science. The closely
related term modality is a further source of confusion. Some authors seem to
use both terms as synonyms, while others tend to reserve modality for input
only and, vice versa, media for output. Still others try to assign these terms
to different categories, e.g., medium for the system used to convey a piece of
information, whereas modality denotes the way in which a presentation is sensed.
Unfortunately, we are not aware of any attempt
so far that clearly distinguishes both concepts from each other without becoming
inconsistent when generalizing the definition to cover all possible media
which may occur in a presentation. To overcome this problem we only employ
the term medium in our model. We follow [7] and regard any single mechanism
by which to express information as a medium. Consequently, multimedia is an
adjective referring to the use of multiple media.
A multimedia presentation system in that sense is a presentation system
that starts from a given presentation goal as triggering input, and generates a
multimedia presentation as output.
Intelligent multimedia presentation systems are essentially knowledge-based
systems, i.e., systems which rely on the notion of knowledge, taken as the
justified true beliefs of an abstraction of data, in order to make appropriate
design decisions. The knowledge present in an IMMPS may include a number
of knowledge sources that are difficult to quantify explicitly. However, starting from
the experience of previous systems, the following knowledge sources seem to
be indispensable:
(i) knowledge to reason about application data, such as term interpretation
and data characterization;
(ii) the discourse model and the context references;
(iii) the user's goals and plans, capabilities, attitudes, knowledge or beliefs;
(iv) general knowledge about the design of multimedia presentations in terms
of design constraints, cognitive theories, media models and device models,
together with the design/realization for a specific medium in a multimedia
presentation, the system's characteristics relevant to the presentation
process, and the knowledge for identifying an effective and coordinated
collection of media for the representation and presentation of application
data.
The knowledge present in an IMMPS may include, and often does include,
explicitly encoded knowledge sources other than those described. However, as
a working definition we will classify a multimedia presentation system as an
intelligent system if it exploits at least the above-mentioned knowledge sources
to achieve presentation goals.
3 Outline of the Reference Model
The core of the reference model is a generic architecture for IMMPSs. The
conceptual design of this architecture reflects a modularisation of the design
process into layers which are responsible for particular subtasks, and a
separation of shared knowledge sources (called experts) from the layers. Fig. 1
introduces the IMMPS reference architecture. It is composed of four layers,
namely control, content, layout and presentation. In addition to their private
knowledge, these layers may exploit explicitly encoded knowledge provided by
a knowledge server which is composed of the application, context, user and
design expert modules. The system receives goals and presentational commands
from the goal formulation, designs a presentation by asking the application for
application knowledge, and finally communicates the presentation to the user.
Furthermore, a knowledge exchange format (i.e., a language and a protocol) is
used for the exchange of information among the system components. However,
this format is not explicitly dealt with here. It is only assumed that it underlies
each arrow between communicating components.

[Fig. 1: Generic IMMPS Architecture. Components shown: goal formulation, application, control layer, content layer, layout layer, presentation layer, knowledge server (application, context, user and design experts), external servers, external clients, user.]

[Fig. 2: Information Flow through Layers. Components shown in order: goal formulation, control layer, content layer, layout layer, presentation layer, user. Goals and presentational commands pass from the goal formulation to the control layer and on to the content layer; media communication acts and presentational commands pass to the layout layer; unrendered presentations and presentational commands pass to the presentation layer; presentations reach the user. Notifications pass in the opposite direction.]

Although the goal formulation and the application play different roles, in most
real systems they appear as a single component, usually called the application.
Our distinction emphasizes the difference between the availability of
application knowledge/data, and the use of that knowledge/data to satisfy
presentation goals.
Also, it is necessary to model some interactions of the system with external
entities. This is the case, for example, when the user expert acquires knowledge
directly from the user, or from an input subsystem. In the figures, interactions
with unspecified systems, called external servers/clients, are denoted by
dashed arrows. The modeling could be done by means of data capture metafiles
as in the Computer Graphics Reference Model [2].
3.1 Layers
The four layers of the reference architecture are collectively responsible for
realizing the goal being requested. Each layer performs a transformation on its
input and delivers the result to the next layer in the hierarchy as illustrated in
Fig. 2. The inputs to the presentation generation process are goals/commands.
While being processed through the layers, goals/commands are eventually
transformed into multimedia presentations. Although the flow of information
during this transformation is primarily "top-down", it must be stressed that
many interactions can occur between components. This is indicated by the
notification arrows in Fig. 2. Each layer notifies the layer above of the result
of its processing, possibly together with some additional information, such as
causes of failure, recovery strategies, explanations, etc. The individual layers
are described in the following paragraphs.
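The top-down transformation with upward notifications can be sketched as a simple function pipeline. This is a deliberately simplified illustration under stated assumptions: the names `Notification`, `make_layer` and `run_pipeline` are hypothetical, each layer is reduced to a single function, and the negotiations between layers discussed later are not modelled.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Notification:
    """Result report a layer sends upwards (hypothetical structure)."""
    layer: str
    success: bool
    info: str = ""


# A layer maps its input to (output, notification).
Layer = Callable[[object], Tuple[object, Notification]]


def make_layer(name: str, transform: Callable[[object], object]) -> Layer:
    """Wrap a transformation so that failures become notifications."""
    def run(data: object) -> Tuple[object, Notification]:
        try:
            return transform(data), Notification(name, True)
        except ValueError as err:
            return None, Notification(name, False, f"cause of failure: {err}")
    return run


def run_pipeline(layers: List[Layer], goal: object):
    """Pass the goal down through the layers; collect notifications."""
    notifications: List[Notification] = []
    data = goal
    for layer in layers:
        data, note = layer(data)
        notifications.append(note)
        if not note.success:
            break  # a failing layer stops the downward flow
    # Notifications travel bottom-up, so the deepest layer comes first.
    return data, list(reversed(notifications))


# Placeholder transformations standing in for the four layers.
layers = [
    make_layer("control", lambda goal: goal),
    make_layer("content", lambda goal: {"acts": [goal]}),
    make_layer("layout", lambda acts: {"unrendered": acts}),
    make_layer("presentation", lambda unrendered: f"rendered({unrendered})"),
]
result, notes = run_pipeline(layers, "inform-switch-location")
```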
Control Layer
The control layer (cf. Fig. 3) consistently coordinates the presentation process in time.
Its task is to choose the next goal to be achieved or the next presentational
command to be executed. The task of goal selection may occur in multimedia
generation for two different reasons. First, it might be the case that the
(external) goal formulator poses a set of goals to the presentation system, either
all at once or in an undetermined piecemeal fashion. In the latter case, an
already started generation process may be interrupted in order to achieve a
new incoming goal immediately. The second reason is that presentation goals
given as input to the system may be complex, so that they have to be split
into sets of less complex goals. While the decomposition of goals will be done
by the goal refinement module of the content layer, the control layer must
still decide in which order the subgoals will actually be processed. Deciding
the next goal to be achieved can be a very simple task. It could involve only
popping a goal from the discourse model in the context expert. On the other
hand, some private knowledge may be present in the control manager, if the
decision involves more complex reasoning. The role of the GF Interface module
is to convert the messages to be exchanged between goal formulation and
control manager into the appropriate format. This task is not carried out by the
underlying knowledge exchange format, since the transformation depends on
the specific representation language(s) used in the IMMPS.
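The simplest form of goal selection mentioned above, popping the next goal from the discourse model, might look as follows. This is a minimal sketch under the assumption that the discourse model is a plain stack; `DiscourseModel` and `ControlManager` and their methods are hypothetical names, and the GF Interface conversion step is omitted.

```python
from typing import List, Optional


class DiscourseModel:
    """Stand-in for the discourse model held by the context expert."""

    def __init__(self) -> None:
        self._stack: List[str] = []

    def push(self, goal: str) -> None:
        self._stack.append(goal)

    def pop(self) -> Optional[str]:
        return self._stack.pop() if self._stack else None


class ControlManager:
    """Chooses the next goal; no private knowledge in this sketch."""

    def __init__(self, discourse: DiscourseModel) -> None:
        self.discourse = discourse

    def post(self, goals: List[str]) -> None:
        # Push in reverse so that the first goal posted is processed first.
        for goal in reversed(goals):
            self.discourse.push(goal)

    def interrupt(self, goal: str) -> None:
        # A newly arriving goal pre-empts everything still pending.
        self.discourse.push(goal)

    def next_goal(self) -> Optional[str]:
        return self.discourse.pop()


manager = ControlManager(DiscourseModel())
manager.post(["show-panel", "highlight-switch"])
```

A control manager with more complex ordering decisions would replace `next_goal` with reasoning over its private knowledge; the stack discipline is only the simplest case.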
Content Layer
The content layer (cf. Fig. 4) serves to determine a set of so-called media
communication acts. These are communication acts enriched by semantic/logic
content; they are assigned to a particular medium which should be used to
convey that content. Communication acts are an extension of speech acts [8]
to multimedia communications (see also [9, 10]). The task is accomplished by
means of coordinated goal refinement, content selection, and media selection.
As presentation goals may be formulated at a high level of abstraction, they
need to be refined accordingly. The term goal refinement is used to capture
both the decomposition of a goal into a set of subgoals and the specialization
of a goal. During goal refinement, the content of the final presentation will be
determined. The module for content selection assists in carrying out this task.
In a concrete system, this module may appear as a retrieval and filter component
which communicates with the application expert. The output of the goal
refinement process is a set of communication acts and a structural description
of the relations that may hold between these acts. As soon as communication
acts have been worked out, it must be decided which media should be employed
to convey them best. For this task, the reference architecture includes a
component for media selection. Since there are many dependencies among choices,
the architecture also foresees a coordination module. Its mission is to merge
communication acts passed by the modules for content and media selection.
In case of successful coordination, the acts (now called media communication
acts) are handed over to the layout layer. The coordination process can,
however, require negotiations among goal refinement, content/media selection,
and the subsequent layout layer. Consider the switch example, where the minimum
size specified for the switch to be perceived might be too big for the
control board to fit in the indicated window.
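The path from a goal to media communication acts can be sketched as below. All names, the decomposition table and the media selection rule are hypothetical assumptions made for illustration; the negotiation among modules and the structural description of relations between acts are left out.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class CommunicationAct:
    act_type: str
    content: str


@dataclass(frozen=True)
class MediaCommunicationAct:
    act: CommunicationAct
    medium: str


def refine_goal(goal: str) -> List[str]:
    """Goal refinement: decompose a goal into subgoals (toy table)."""
    table = {
        "inform-switch-location": ["show-panel", "highlight-switch", "name-switch"],
    }
    return table.get(goal, [goal])


def select_content(subgoal: str, application_data: Dict[str, str]) -> CommunicationAct:
    """Content selection: retrieve the data that a subgoal refers to."""
    return CommunicationAct("inform", application_data.get(subgoal, subgoal))


def select_medium(act: CommunicationAct) -> str:
    """Media selection: toy rule sending spatial content to graphics."""
    return "graphics" if "geometry" in act.content else "text"


def content_layer(goal: str, application_data: Dict[str, str]) -> List[MediaCommunicationAct]:
    """Coordination: merge content and media choices into media communication acts."""
    acts = [select_content(sub, application_data) for sub in refine_goal(goal)]
    return [MediaCommunicationAct(act, select_medium(act)) for act in acts]


application_data = {
    "show-panel": "panel geometry",
    "highlight-switch": "switch geometry",
    "name-switch": "label S7",
}
media_acts = content_layer("inform-switch-location", application_data)
```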

[Fig. 3: Control Layer. Components shown: goal formulation, GF interface, control manager, knowledge server, content layer.]

[Fig. 4: Content Layer. Components shown: control layer, goal refinement, content selection, media selection, coordination, knowledge server, layout layer.]

Layout Layer
In this layer, the media communication acts are transformed into a presentation
layout. The layout layer is composed of the layout manager, the media design
and realization, and the coordination (cf. Fig. 5). The task of the first component
is to design the general layout structure of the presentation, to dispatch
the media communication acts to the relevant media design and
realization modules, and to notify the content layer. The results of single-media
processing are coordinated in order to generate an unrendered presentation
to be passed to the presentation layer. From the presentation layer, the layout layer receives
notifications. Finally, it sends a notification to the content layer to report the
result of elaborating the media communication acts.
The determination of a presentation layout can incur various negotiations
among the media-specific generators involved. As mentioned above, a negotiation
process can also occur between components of different layers. For
example, in order to determine the contents of a cross-media
reference expression, the content layer may have to ask the layout manager in
which part of the display space a certain picture will occur.
It should be noted that the unrendered presentation cannot be said to be an
internal representation of a presentation, but only of a piece of it, since the
layout layer has a restricted view of the presentation process and may be
asked to achieve goals which are less complex than the overall one. On the
other hand, the output of this layer entails all the information required to "run"
the presentation properly. To represent this information, one could rely on a
specification language as proposed in [4].
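The dispatching role of the layout manager can be illustrated with a small sketch. It assumes, purely for illustration, that media design and realization collapse into one function per medium and that an unrendered presentation is a list of tagged strings; `GENERATORS` and `layout_layer` are hypothetical names, and negotiation between generators is not modelled.

```python
from typing import Callable, Dict, List, Tuple

# (medium, content) pairs stand in for media communication acts.
MediaAct = Tuple[str, str]

# One hypothetical design-and-realization function per supported medium.
GENERATORS: Dict[str, Callable[[str], str]] = {
    "text": lambda content: f"<text>{content}</text>",
    "graphics": lambda content: f"<graphics>{content}</graphics>",
}


def layout_layer(acts: List[MediaAct]) -> Tuple[List[str], List[str]]:
    """Return (unrendered presentation parts, notifications to the content layer)."""
    parts: List[str] = []
    notifications: List[str] = []
    for medium, content in acts:
        generator = GENERATORS.get(medium)
        if generator is None:
            # Report the failure upwards instead of silently dropping the act.
            notifications.append(f"failure: no generator for medium '{medium}'")
            continue
        parts.append(generator(content))
    if not notifications:
        notifications.append("success")
    return parts, notifications


parts, notes = layout_layer([("graphics", "panel"), ("text", "Switch S7")])
```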

[Fig. 5: Layout Layer. Components shown: content layer, layout manager, media design (text, graphics, audio, video, ...), media realization (text, graphics, audio, video, ...), layout coordination, knowledge server, presentation layer.]

Presentation Layer
While the output of the layout layer is an unrendered presentation, there must
be a further component which takes this representation as input and converts
it into a presentation perceivable by the user. This part of a presentation system
is often called the "presentation display component" or "presentation runtime
environment". The presentation layer of the current reference model captures
the functionalities of such components. A coordination module dispatches each
part of the unrendered presentation to the suitable device interface(s). The
result presented to the user is the (spatially and temporally) coordinated fusion
of these outputs.

3.2 Knowledge Server


The Knowledge Server shown in Fig. 1 provides the layers of the reference
architecture with several types of knowledge. It consists of four expert modules,
designed along the lines of knowledge bases. Each of them represents knowledge
on a particular aspect of the presentation process: application, user,
context and design. They all share the same general structure. The experts
are accessed by other components in a client-server fashion. Requests are
answered taking into account the user model and the context. Experts may
require services from, and provide services to, external knowledge sources, such
as the application and the user. In this sense, they are independent
modules that can easily be integrated with other systems which share the same knowledge.
Expert modules
The overall structure of an expert module is independent of the particular
class of systems we are describing. In principle, expert modules differ only in the
knowledge which they store and in the interface operations they provide. Fig. 6
shows the overall logical structure of a generic expert module.
The core of the expert module is the knowledge it stores. This knowledge
can consist of a number of logically distinct knowledge bases, each for a
logically different aspect to be dealt with, or only a single knowledge base, when
such a distinction is not relevant. The inference engine provides a uniform and
general view of the stored knowledge and of that inferable from it. Maintenance
involves incorporating new knowledge, which could be inconsistent with
the current state. If this is the case, the inconsistency must be resolved by some
triggered action performed by an integrity checking sub-module. The three
components, knowledge base, inference engine and integrity checking, are
referred to as the Knowledge Base System (KBS) of the expert.
The experts provide services to the other components of the system through
the server manager. Its clients are the layers of the presentation system, other
experts or external entities, namely the external sources or the application.
The server manager transforms the interface operations into (a collection of)
messages to the inference engine, collects the answers, and responds to
the client. Two interfaces, context expert and user expert, allow the inference
engine to acquire knowledge from the context expert and the user
expert, respectively. Finally, the acquisition interface supports access to other undefined
servers. This is necessary when the knowledge of the expert depends on other
factors in addition to user and context. As an example, the application expert
has to interact with the application to request application data.
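The structure just described can be approximated in a few lines. This is a sketch under strong simplifying assumptions: knowledge is a flat key-value store, "inference" is plain lookup, and integrity checking resolves an inconsistency simply by rejecting the new fact, which is only one of many possible triggered actions. The names `KnowledgeBaseSystem`, `ExpertModule`, `tell`, `ask` and `request` are hypothetical, and the context and user expert interfaces are omitted.

```python
from typing import Callable, Dict, Optional


class KnowledgeBaseSystem:
    """Knowledge base, trivial inference engine and integrity checking."""

    def __init__(self) -> None:
        self._facts: Dict[str, str] = {}

    def tell(self, key: str, value: str) -> bool:
        # Integrity check: refuse knowledge that contradicts the current state.
        if key in self._facts and self._facts[key] != value:
            return False
        self._facts[key] = value
        return True

    def ask(self, key: str) -> Optional[str]:
        # "Inference" is reduced to direct lookup in this sketch.
        return self._facts.get(key)


class ExpertModule:
    """Server manager in front of a KBS, with an optional acquisition interface."""

    def __init__(self, name: str,
                 acquire: Optional[Callable[[str], Optional[str]]] = None) -> None:
        self.name = name
        self.kbs = KnowledgeBaseSystem()
        self._acquire = acquire  # access to an external knowledge source

    def request(self, key: str) -> Optional[str]:
        """Serve a client request, consulting the external source if needed."""
        answer = self.kbs.ask(key)
        if answer is None and self._acquire is not None:
            answer = self._acquire(key)
            if answer is not None:
                self.kbs.tell(key, answer)  # maintenance: incorporate new knowledge
        return answer


# The application expert asks the application (here a dict) for its data.
application = {"switch-7.position": "(120, 45)"}
application_expert = ExpertModule("application", application.get)
```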

[Fig. 6: General structure of an expert module. Components shown: external knowledge sources, acquisition interface, knowledge bases (KB ... KB), integrity checking and inference engine (grouped as the KBS), server manager, clients, context expert interface, user expert interface, context expert, user expert.]

It is often useful and intuitive to have a logically structured view of a
knowledge base. This is the reason for the presence of several knowledge bases. Real
systems are not required to preserve this logical distinction in an
implementation. For instance, the implementation could combine all knowledge in a single
knowledge base, or structure it from other points of view.
The architecture of an expert module is not directly constrained by the class of
IMMPSs. This means that the notion of expert module is generic and suitable
for other applications as well. Although it will be constrained to a certain
extent when instantiating clients and external knowledge sources, it still remains
general enough to allow for interactions with unspecified servers/clients. This
is actually the role that the external servers/clients play in our architecture:
they are supposed to model interactions with unspecified systems. This
allows the user expert to be merged with a functionally equivalent module of another
application, for instance the input system. The table below summarizes our
instantiation of the knowledge server.
Modules of the Knowledge Server

Expert       Ext. Know. Sources             Clients                                  Knowledge
-----------  -----------------------------  ---------------------------------------  ---------------------
Application  Application, External servers  Layers, External clients                 Application knowledge
Context      External servers               Layers, Other experts, External clients  Context model
User         External servers               Layers, Other experts, External clients  User model
Design       External servers               Layers, External clients                 Design knowledge

4 Using the Reference Model


In this section we demonstrate how the reference model can be used to re-
describe existing systems in a common terminology. MMI2 and WIP have
been chosen, because two of the authors were involved in the development of
these systems.
4.1 MMI2
The system MMI2 (A Multi-Modal Interface for Man Machine Interaction with
Knowledge Based Systems) was developed with the purpose of demonstrating
the architecture and development method required to produce large-scale
co-operative interfaces to knowledge-based systems [11]. Within the project, two
demonstrators about local and wide area network design were produced [12]. The
main concerns in the MMI2 project were the architectural notion of "expert"
and the use of a common meaning representation (CMR). An "expert" is
a module performing specific tasks, with its own private data structures,
which allows a sufficiently coherent set of processes to be gathered in a
single module. This corresponds to our notion of module. CMR is the common
communication language among the components of the system. Communication
between the application and the application expert is in the language of
the application. In our terminology, CMR is a knowledge exchange format. It is
used to support fission and fusion of information between media and to supply
a common discourse context through which to resolve references made within
and between media. Each CMR packet contains one or more CMR acts, along
with the status, mode and time for those acts. A media field identifies the
medium through which the packet was received as user input, or the one for which
it is destined as system output. MMI2 was devised for co-operative dialogues.
This implies that its components are designed for input-output (or two-way)
interactions between the user and the application. Since we are dealing only
with output generation, we isolate and describe this part of the system,
even though it is actually tightly integrated with the input subsystem.
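The packet structure described above (one or more acts, plus status, mode, time and a media field) can be pictured as a record type. The sketch below is not the actual CMR definition: the field types, the example values and the representation of an act as a plain string are hypothetical assumptions, since the text only names the fields.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CMRAct:
    # Placeholder: the real CMR formalism is not reproduced here.
    content: str


@dataclass
class CMRPacket:
    acts: List[CMRAct]  # one or more CMR acts
    status: str         # status of the acts (type assumed)
    mode: str           # mode of the acts (type assumed)
    time: float         # time of the acts (type assumed)
    media: str          # medium the packet came from or is destined for

    def __post_init__(self) -> None:
        if not self.acts:
            raise ValueError("a CMR packet contains one or more CMR acts")


# A packet destined for graphical output; all values are made up.
packet = CMRPacket(
    acts=[CMRAct("placeholder act")],
    status="ok",
    mode="output",
    time=0.0,
    media="graphics",
)
```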
Goal formulation: The goal formulation manages input acquisition and
application data updating. In MMI2, a goal is not directly specified by the user.
The goal formulation processes the (multimedia) inputs from the user, building
up a goal (or a presentational command) as the result of the fusion of several
coordinated inputs concerned with output generation. Using the terminology
of [13], only the "attitudes" User wants to know and User wants are passed to
the presentation system. They correspond to goals and presentational
commands, respectively.
Control Layer: The control manager (called dialogue controller) classifies the
CMR passed by the goal formulation as a goal or a presentational command. A
pair consisting of a CMR and its classification is called a user-desire. The
classification is done on the basis of the form of the CMR, and the form of previous
system presentations. The discourse model (stored in the context expert) is
structured simply as a stack. The control manager pops the top of the stack to
decide the next action: achieving a goal, executing a presentational command,
or notifying the goal formulation. A notification from the content layer can
contain some recovery strategies if the presentation process fails. These strategies
embody sub-goals to be achieved and notifications to the goal formulation,
which are pushed onto the discourse model.
Content Layer: The role of the content layer is to convey to the user the
system's intentions, in such a way that the rules of cooperativity in dialogue
are followed. The goal refinement module first performs an analysis (called
informal semantics) of the input goal, exploiting some private knowledge,
in order to provide pragmatic, dialogue-oriented functionalities beyond the
narrow range of selecting the data described by the goal (called formal
semantics). The main functionalities are repair (identification of error conditions
which prevent straightforward handling of goals) by means of task plans,
explanations (the determination and provision of presentations that are more
informative than formal semantics alone would give), and clarification (assessing
goals, checking their validity, etc.). A submodule called communication planner
produces communication acts which are then forwarded to the modules for
content selection and media selection, respectively. The content selection module
interprets the communication acts by decomposing them in terms of simple
predicates directly interpretable by the application expert, and then collecting
and composing the answers from it. Media selection relies on media selection
rules, which are based on cognitive studies (see [14] for an overview). Finally, a
coordination module collects the media communication acts and sends them
to the layout layer.
Layout Layer: The layout manager (called interface expert in MMI2)
maintains a record of window positions and provides locations for new windows. It
passes media communication acts received from the content layer to the media
design, media realization and coordination. The output media in MMI2
are English, French and Spanish natural language, and graphics. The graphics media
design and realization is supervised by a graphics manager, which has at its
disposal a number of graphics tools, including tables, bar charts, pie charts,
scatter plots, and network tools.
Presentation Layer: The coordination module mainly consists of a window
manager (SunView/X-windows), together with a component dispatching the
unrendered presentations to the suitable device interface. The only (output)
device is the display.
User Expert: The user expert dynamically acquires and stores knowledge
about the users, including: the user's general knowledge of the domain of
application (e.g., which domain objects the user knows about); the user's
knowledge of the MMI2 system (e.g., which MMI2 commands the user knows about);
the user's preferences (e.g., which currency the user would prefer); a
stereotype hierarchy of users in the domain, which allows multiple inheritance
from different stereotypes; and the way that human experts decide that an
interlocutor belongs to a particular stereotype. Information about the current
user is derived from the discourse (stored in the context expert) or supplied
by the user directly (e.g., by changing the assigned stereotype) through a
graphical interface. This interface shows the inheritance network of user
stereotypes, the knowledge within a particular user model, the predicates
permitted in user models, and any inconsistencies between beliefs in a user
model or derived from its parents. The graphical interface is modelled
as an external client of the user expert.
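A stereotype hierarchy with multiple inheritance and a check for inconsistent beliefs might look like the sketch below. The class name Stereotype, the breadth-first lookup order, and the example beliefs are assumptions made for illustration; the paper does not specify how MMI2 resolves conflicting inherited beliefs.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Set


class Stereotype:
    """A user model node; the current user is modelled as a leaf stereotype."""

    def __init__(self, name: str,
                 parents: Iterable["Stereotype"] = (),
                 beliefs: Optional[Dict[str, object]] = None):
        self.name = name
        self.parents = list(parents)
        self.beliefs = dict(beliefs or {})

    def lookup(self, predicate: str, default: object = None) -> object:
        """Own beliefs win; otherwise search the parents breadth-first, in order."""
        queue = deque([self])
        seen = set()
        while queue:
            node = queue.popleft()
            if id(node) in seen:
                continue
            seen.add(id(node))
            if predicate in node.beliefs:
                return node.beliefs[predicate]
            queue.extend(node.parents)
        return default

    def inherited(self) -> Dict[str, Set[object]]:
        """Every value asserted for each predicate, here or in any ancestor."""
        collected: Dict[str, Set[object]] = {}
        todo = [self]
        seen = set()
        while todo:
            node = todo.pop()
            if id(node) in seen:
                continue
            seen.add(id(node))
            for predicate, value in node.beliefs.items():
                collected.setdefault(predicate, set()).add(value)
            todo.extend(node.parents)
        return collected

    def inconsistencies(self) -> List[str]:
        """Predicates for which the model and its ancestors disagree."""
        return sorted(p for p, values in self.inherited().items() if len(values) > 1)
```

A graphical client such as the one described above could display the output of inconsistencies() next to the inheritance network.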
Application and Application Expert: In MMI2 the application is an expert
system (called NEST: a Network design Expert SysTem) providing knowledge
on network design (location of machines, graphics objects, etc.). In
particular, its knowledge base contains all the definitions of the needed
objects, i.e. both the various network components and the topological
information relative to the buildings. The role of the application expert is
to provide the data described by a (part of a) goal and the denotations of the
symbols used in the CMR packets, by exploiting the application knowledge,
according to the user preferences (e.g., currency, unit of measure preferred,
etc.) and the context.
Context Expert: This module provides contextual functionalities (essentially
anaphora and ellipsis resolution) which are involved in the contextual
processing of each move, and joins an MMI2 standard reified world with a
reified world of the application. The MMI2 standard world contains conceptual
labels representing the reified objects of the layout, the graphics objects
and the word sense representatives that are relevant to the communication for
the English, French and Spanish languages. The world of the application is
represented by means of the relevant terms of the application and the word
sense representatives that are relevant to the communication for the English,
French and Spanish languages.
Design Expert: The design expert provides knowledge about design con-
straints, cognitive impact of the used media, device model and media charac-
teristics. The knowledge is tailored to the user preferences and to the context.

4.2 WIP
The design of the WIP system 15 (WIP is the German abbreviation for
Knowledge-Based Presentation of Information) was influenced by the
observation that communication is always situated (i.e., depends on some
context). This is taken into account by considering the user's preferences
and design constraints (called generation parameters). A further basic
assumption in the WIP system is that not only the generation of text and
dialog contributions but also the design of multimedia presentations is a
planning task. The current prototype of WIP generates multimedia explanations
and instructions on assembling, using or maintaining physical devices. The
major design goals of WIP are the generation of coordinated presentations
from a common representation, the adaptation of these presentations to the
intended target audience and situation, and the incrementality of all
processes constituting the design and realization of the multimedia output.
In addition, page layout is addressed as a rhetorical force.
Goal Formulation: In WIP goal formulation can be done by the user via
a menu interface. This interface allows the user to modify the generation
parameters and to choose a goal to be achieved (in MMI2, instead, the goal is
built up by the goal formulation as a function of the user's inputs). A goal is
expressed as a mental state that the presentation is to bring about in the
viewer.
Control Layer: In WIP the selection of the next goals to be accomplished
is done by a subcomponent of the presentation planner. Thus WIP's
presentation planner actually spans two layers of our generic reference
architecture. In contrast to MMI2, WIP only generates non-interactive
presentations. Therefore, no recovery strategy is provided by the content
layer when the user interrupts a presentation.
Content Layer: At the heart of the presentation system is a parallel
top-down planning module. Its task is to find a presentation strategy for the
given goal by incrementally generating a refinement-style plan in the form of
a directed acyclic graph (DAG), drawing on a set of presentation strategies.
These strategies either reflect general presentation knowledge or embody more
specific knowledge (provided by the application expert) of how to present a
certain subject. The leaves of the planned DAG are specifications for
elementary multimedia communication acts, which are elaborated by media design
and realization in the layout layer. WIP's presentation planner instantiates
the content layer module of the reference model since it is responsible for
goal refinement, content selection, media selection, and coordination.
Whereas in the MMI2 system the goal refinement, media selection, and
coordination tasks are performed sequentially, in the WIP system they are
performed concurrently. The reason for using such an integrated approach is
that interdependencies between these processes can be handled within a
uniform processing mechanism and that the
approach also allows for incremental output generation.
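Refinement-style planning into a DAG can be sketched in a few lines. The sketch is sequential rather than parallel or incremental, the strategy table and goal names are invented for illustration, and it assumes the strategies contain no cycles; it is not a description of WIP's actual planner.

```python
from typing import Dict, List, Optional

# Hypothetical presentation strategies: goal -> sub-goals.
# A goal without an entry is treated as an elementary communication act.
STRATEGIES: Dict[str, List[str]] = {
    "explain-switch-on": ["show-switch", "describe-action"],
    "show-switch": ["depict(switch)", "label(switch)"],
    "describe-action": ["label(switch)", "assert(push switch)"],
}


def refine(goal: str,
           strategies: Dict[str, List[str]],
           dag: Optional[Dict[str, List[str]]] = None) -> Dict[str, List[str]]:
    """Expand a goal top-down and return the plan as node -> children.

    A sub-goal reached along several paths is expanded once and shared,
    which is what makes the plan a DAG rather than a tree.
    """
    if dag is None:
        dag = {}
    if goal in dag:
        return dag                      # already planned, reuse the node
    children = strategies.get(goal, [])
    dag[goal] = list(children)
    for child in children:
        refine(child, strategies, dag)
    return dag


def leaves(dag: Dict[str, List[str]]) -> List[str]:
    """Elementary acts, in the order in which they were first planned."""
    return [node for node, children in dag.items() if not children]
```

In this toy plan the act label(switch) is shared by two parent goals, so a single realization could serve both.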
Layout Layer: The WIP layout manager stores a set of document types,
together with some layout constraints for each. In addition, communicative
acts asking for reply (e.g., is visible(object)) are answered by the layout
manager, after dispatching the request to the appropriate modules for media
design and realization. In WIP, these are modules for generating 3D graphics,
German natural language and English natural language. In illustrated
instructions for technical equipment, graphics are used in order to accomplish
presentation tasks, such as depicting a domain object in a certain state,
showing an object's location, or visualizing the course of an action. The
developers of WIP operationalized certain 2D and 3D illustration techniques
frequently used by human illustrators; the formalization is based on a
compositional semantics of pictures. Using graphical design strategies,
graphics design is in principle a goal-driven planning process. However, it
does not seem feasible to strictly separate a graphics design phase from a
realization phase, as some realization operators have side effects which are
computationally expensive to anticipate. A solution to this problem is to
interleave graphics design and realization and to allow for feedback. It is
also noteworthy that WIP performs a fine-grained coordination of text and
graphics generation. For example, WIP is able to generate cross-media 16
deictic assertions like "The on/off switch is located in the upper-left part
of the picture" and referring expressions which are themselves composed of
different media, such as showing a picture together with the assertion "The
switch on the frontside."
Presentation Layer: The presentation layer of the WIP system comprises a
window manager and an interface to a PostScript printer.
User Expert: The WIP stereotype user model distinguishes novice and expert
users. The user's goals, preferences and knowledge are stored in the knowledge
bases. The user model is updated after a goal has been achieved. From that
time on, the user is assumed to know the information conveyed by the
presentation of the goal.
Application and Application Expert: The application knowledge is partly
codified as propositions in a terminological logic 17 and partly as geometric
wire-frames for the 3D graphics generation. The propositionally represented
knowledge is used both for the generation of text and graphics, as the main
source of knowledge about the domain.
Context Expert: The context knowledge consists of a document design plan
and some predicates for managing referring expressions. The former is
incrementally built by WIP's presentation planner. The leaves of the resulting
document design plan are specifications for elementary media communication
acts, such as speech acts and pictorial acts, which are to be accomplished by
WIP's text and graphics generator, respectively.
Design Expert: The design expert of WIP stores knowledge on generation
parameters given by the user (e.g., constraints on the document layout, such
as short/long presentation), layout constraints dynamically inferred from the
context and the user preferences and, finally, knowledge on device
availability, which is acquired from the operating system modelled as an
external server/client.
5 Conclusions
While significant results and expertise have been gained from building the
first generation of IMMPSs, there have been no promising attempts to bring
together the different lines of expertise from across disciplines and
viewpoints and assemble them into a sound corpus of scientific theory. The
purpose of this paper was to outline a preliminary version of a reference
model for IMMPSs.
The proposed model may be summarized by the equation:
"IMMPS = Layers + Experts"
The design of the model reflects both a decomposition of multimedia
generation into logically distinct subtasks (represented as Layers) and a
separation of these tasks from the knowledge sources (called Experts) which
might be exploited to accomplish them. The model is meant as a first step
towards a broader agreement of the scientific and industrial community on the
topic. Further work and discussions are necessary to obtain a refined
reference model
which eventually can be forwarded to the relevant bodies devoted to standard-
ization activities. However, the current model can already be used to analyze
and compare real systems and may provide guidance for the development of
new IMMPSs.
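As a closing illustration, the separation between layers and experts can be expressed as a small sketch in which layers act as clients that query shared knowledge sources. All class names, the two example layers, and the stored knowledge are hypothetical; the reference model itself prescribes no particular implementation.

```python
from typing import Callable, Dict, List


class Expert:
    """A knowledge source that answers queries from any layer (server role)."""

    def __init__(self, knowledge: Dict[str, str]):
        self._knowledge = dict(knowledge)

    def query(self, key: str, default: str = "") -> str:
        return self._knowledge.get(key, default)


Experts = Dict[str, Expert]
Step = Callable[[dict, Experts], dict]


class Layer:
    """A processing subtask that consults experts instead of owning knowledge."""

    def __init__(self, name: str, step: Step):
        self.name = name
        self.step = step

    def process(self, data: dict, experts: Experts) -> dict:
        return self.step(data, experts)


def run(layers: List[Layer], experts: Experts, goal: str) -> dict:
    """Pass a presentation goal through the layers in order."""
    data = {"goal": goal}
    for layer in layers:
        data = layer.process(data, experts)
    return data


def content_step(data: dict, experts: Experts) -> dict:
    # The content layer asks the application expert for the data behind the goal.
    return {**data, "content": experts["application"].query(data["goal"])}


def layout_step(data: dict, experts: Experts) -> dict:
    # The layout layer asks the design expert which medium is preferred.
    return {**data, "medium": experts["design"].query("preferred_medium", "text")}
```

Replacing an expert's knowledge changes the result without touching any layer, which is the property the separation is meant to provide.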

References
1. S. K. Feiner and K. R. McKeown. Automating the Generation of Coordinated
Multimedia Explanations. In Maybury 18.
2. Computer Graphics Reference Model. International Organization for
Standardization, ISO/IEC IS 11072, 1992.
3. F. Halasz and M. Schwartz. The Dexter hypertext reference model.
Communications of the ACM, Vol. 37, No. 2, 1994.
4. L. Hardman, D. Bulterman, and G. van Rossum. The Amsterdam hypermedia
model: Adding time and context to the Dexter model. Communications of the
ACM, Vol. 37, No. 2, 1994.
5. KQML Advisory Group. An Overview of KQML: A Knowledge Query and
Manipulation Language. http://retriever.cs.umbc.edu:80/kqml/.
6. G. E. Pfaff, editor. User Interface Management Systems: Proceedings of the
Seeheim Workshop. Springer-Verlag, 1985.
7. F. Roth and E. Hefley. Intelligent Multimedia Presentation Systems: Research
and Principles. In Maybury 18.
8. J. R. Searle. What is a speech act? In M. Black, editor, Philosophy in
America, pages 221-239. 1965.
9. E. André and T. Rist. Towards a Plan-Based Synthesis of Illustrated
Documents. In Proceedings of ECAI '90, Stockholm, 1990.
10. M. T. Maybury. Planning Multimedia Explanations Using Communicative
Acts. In Intelligent Multimedia Interfaces 18.
11. M. D. Wilson. Enhancing multimedia interfaces with intelligence. Multimedia
systems and applications, 1995.
12. The MMI2 Demonstrator Systems: A Multi-Modal Interface for Man Machine
Interaction with Knowledge Based Systems. Technical Report RAL-94-016,
Rutherford Appleton Laboratory, UK, 1994.
13. D. Sedlock, G. Doe, M. Wilson, and D. Trotzig. Formal and informal
interpretation for co-operative dialogue.
14. H. R. Chappel, M. D. Wilson, and B. Cahour. Engineering User Models to
Enhance Multi-modal Dialogue. In J. A. Larson and C. Unger, editors,
Engineering for Human-Computer Interaction, pages 297-313. Elsevier Science
Publishers, Amsterdam, 1992.
15. E. André, W. Finkler, W. Graf, T. Rist, A. Schauder, and W. Wahlster. WIP:
The Automatic Synthesis of Multimodal Presentations. In Maybury 18.
16. E. André and T. Rist. Referring to World Objects with Text and Pictures. In
Proceedings of Coling '94, Osaka, 1994.
17. J. Heinsohn, D. Kudenko, B. Nebel, and H. J. Profitlich. RAT - representation
of actions using terminological logics. Technical report, DFKI, Saarbrücken,
Germany, 1992.
18. M. Maybury, editor. Intelligent Multimedia Interfaces. AAAI/The MIT Press,
1993.
