1 Introduction
With an increasing number of applications in which the "manual" authoring
of presentations is no longer feasible, the development of mechanisms for
the automated generation of multimedia presentations has become a shared
goal across many disciplines. Since the use of multiple media for conveying
information does not guarantee effective and intelligible presentations per se,
these mechanisms need to be intelligent in the sense that they are able to
make appropriate design decisions. Much research and development work has
addressed aspects of automating multimedia presentation generation, and even
some large-size prototypes of intelligent multimedia presentation systems
(IMMPSs) have been developed (e.g., MMI2, WIP, MIPS, COMET [1]). However, no
generic model has emerged so far. Each project began from scratch, relying
only on the past experience of the developers. Thus it is not surprising that
there is no agreement on the terminology to be used, on the functional
definition of an IMMPS, or on a generic architecture which reflects the
logical structure of processes for the generation of multimedia presentations.
With the proposal of a reference model for IMMPSs, this paper tries to fill a
major methodological gap and thus may provide a sound basis for ongoing and
future developments of IMMPSs. Agreement on a reference model in the
field will have several advantages. First of all, there is a general motivation
for having a reference model for any class of related computing systems. Among
other things, a reference model helps to analyze and compare existing systems
of a certain class on the basis of a common generic architecture and by
means of a common terminology. Moreover, the generic architecture at the
core of a reference model will foster the modular development of future
large-size systems, as each module can be assigned a well-defined role (in
terms of the reference model), with well-defined interfaces between system
components and to other systems.
Our reference model is targeted towards the class of systems (or components
of superordinate systems) whose task is to present information to the user in
an effective way. Here, the attribute effective means that the particular
information needs of the individual users are best met under a given set of
presentation constraints, such as resource limitations and the user's knowledge
and style preferences. Since in the vast majority of non-trivial applications
the information needs will vary from user to user and from situation to
situation, a presentation system should be able to flexibly generate various
presentations for one and the same information content to be communicated.
Having identified the class of systems to be captured by the model, let us
turn our attention to the general guidelines that have driven our design
decisions:
Adequate modularisation: To facilitate the development and comparison
of practical large-size systems, the reference model must comprise a
modularisation of a generic process for multimedia presentation
generation. Ideally, this modularisation breaks down the generation process
into logically distinct and computationally feasible subtasks.
Appropriate degree of abstraction: The reference model should, on the
one hand, reflect the peculiarities of multimedia generation. On the other
hand, it should be general enough to capture the whole class of IMMPSs.
In particular, the generic architecture should abstract from concrete
implementations, as it is always possible to rely on different mechanisms to
accomplish a single generation subtask, and different formats for
representing knowledge can always be chosen.
Identification and classification of knowledge sources: As intelligent
presentation design is a knowledge-intensive task, the reference model should
exhibit the basic set of logically distinct knowledge sources which are
required for multimedia presentation generation. The reference model
should also make clear how processes and knowledge sources are related
to each other. In particular, private knowledge sources, i.e., sources for
which a single owner component can be located, should be distinguished
from those sources which are shared among several components.
Modeling of shared sources in the client-server paradigm: To facilitate
sharing of knowledge sources (in the generic architecture as well as in
concrete system implementations), shared sources should be modelled following
the client-server paradigm. Such sources will be referred to as expert
modules; they are meant to serve requests from client modules, possibly
belonging to other systems.
Openness to other standards: Multimedia generation comprises subtasks
which have been dealt with in other disciplines. Therefore, the model
should be open to combination with existing or potential standards in these
disciplines. For example, the Computer Graphics Reference Model [2] may
be used to instantiate the subcomponent for graphics generation in the
generic architecture of our model. Existing reference models for
hypermedia presentations such as the Dexter model [3] or the AHM
(Amsterdam Hypermedia Model [4]) may be used for the description of
generated presentation specifications. One may also rely on a standardized
language for the exchange of knowledge between components, such as
KQML (Knowledge Query and Manipulation Language [5]). Conversely,
the reference model can itself become a component of a superordinate
model, such as the well-known Seeheim Model [6], which has been proposed
as a generic architecture for user interfaces.
The rest of the paper is organized as follows. First, some basic notions are
introduced in order to characterize the class of intelligent multimedia
presentation systems. The core of the model, which is a generic architecture of
a generation process for multimedia presentations, is given in section 3. To
demonstrate the use of the model, two existing IMMPSs are redescribed with
the proposed terminology and architecture in section 4.
2 Basic Notions
Presentation systems are designed for achieving goals by means of
presentations that are perceivable and, consequently, subject to interpretation
by their intended user, who is always assumed to be a human being.
Presentation goals which have to be achieved by the system, possibly
accompanied by a set of presentational commands or presentation constraints
affecting the presentation process, constitute the primary input to the
presentation system. Both goals and commands are assumed to be formulated
outside the presentation system, i.e., by the user, by an external system, or
by a superordinate component in case the presentation system is a part of a
larger system. Goals and commands include a high-level reference to
collections of data together with the purpose (or the intention) of
communicating the information. As an example, a goal can be formulated as an
encoding of the fact that the system has to inform the user of the location of
a specific switch in a control panel. Similarly, a presentational command can
be a representation of design constraints such as the minimum size the switch
must have to be perceivable, if graphics is chosen as the presentation medium.
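To make the distinction between goals and presentational commands concrete, the following is a minimal sketch of how the switch example might be encoded. All class, field, and constraint names (PresentationGoal, PresentationCommand, PresentationRequest, "inform-location", "min-size-mm") are hypothetical and chosen for illustration only; the reference model does not prescribe a representation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class PresentationGoal:
    """What is to be achieved: an intention plus a reference to data."""
    intention: str   # purpose of the communication (hypothetical vocabulary)
    data_ref: str    # high-level reference to a collection of application data


@dataclass(frozen=True)
class PresentationCommand:
    """A design constraint that affects how the goal is achieved."""
    constraint: str
    value: float
    medium: Optional[str] = None   # None means the constraint holds for any medium


@dataclass
class PresentationRequest:
    """Primary input to the presentation system: a goal plus optional commands."""
    goal: PresentationGoal
    commands: List[PresentationCommand] = field(default_factory=list)

    def constraints_for(self, medium: str) -> List[PresentationCommand]:
        # Keep commands that are medium-independent or target this medium.
        return [c for c in self.commands if c.medium in (None, medium)]


# The switch example: inform the user where a switch is located, and require
# a minimum size for the switch if graphics is the chosen medium.
request = PresentationRequest(
    goal=PresentationGoal("inform-location", "control-panel/switch"),
    commands=[PresentationCommand("min-size-mm", 5.0, medium="graphics")],
)
```

In this sketch the size constraint only becomes relevant once graphics has been selected, which mirrors the conditional wording of the example above.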
Application data/knowledge provides the semantic grounding of each
presentation which may be generated by a system. As with presentation goals
and commands, it is assumed that application data/knowledge is part of the
input to a presentation system. In other words, there must be an external
source (e.g., an application system or a database) that makes available to the
presentation system the application data necessary for achieving posted goals.
Following the example above, the application can provide the switch and the
control panel geometries required to identify the switch position. However, the
request to inform the user of the switch position could have been formulated
by a different system.
A multimedia presentation is a presentation which comprises material in
different media such as text, graphics, sound, video, etc. However, there is
considerable confusion about the fundamental notion of media. One reason for
this confusion is certainly that the term media is used with different
meanings in different contexts, such as in semiotics, psychology,
telecommunications, or computer science. The closely related term modality is
a further source of confusion. Some authors seem to use both terms as
synonyms, while others tend to reserve modality for input only and media for
output. Still others try to assign these terms to different categories, e.g.,
medium for the system used to convey a piece of information, whereas modality
denotes the way in which a presentation is sensed. Unfortunately, we are not
aware of any attempt so far that clearly distinguishes the two concepts from
each other without becoming inconsistent when the definition is generalized
to cover all possible media which may occur in a presentation. To overcome
this problem we only employ the term medium in our model. We follow [7] and
regard any single mechanism by which to express information as a medium.
Consequently, multimedia is an adjective referring to the use of multiple
media.
A multimedia presentation system in that sense is a presentation system
that starts from a given presentation goal as triggering input, and generates a
multimedia presentation as output.
Intelligent multimedia presentation systems are essentially knowledge-based
systems, i.e., systems which rely on the notion of knowledge, understood as
justified true beliefs about an abstraction of data, in order to make
appropriate design decisions. The knowledge present in an IMMPS may include a
number of knowledge sources that are difficult to quantify explicitly.
However, starting from the experience of previous systems, the following
knowledge sources seem to be indispensable:
(i) knowledge to reason about application data, such as term interpretation
and data characterization
(ii) the discourse model and the context references
(iii) the user's goals and plans, capabilities, attitudes, knowledge or beliefs
(iv) general knowledge about the design of multimedia presentations in terms
of design constraints, cognitive theories, media models and device models,
together with the design/realization for a specific medium in a multimedia
presentation, the system's characteristics relevant to the presentation
process and the knowledge for identifying an effective and coordinated
collection of media for the representation and presentation of application
data.
The knowledge present in an IMMPS may include, and often does include,
explicitly encoded knowledge sources other than those described. However, as
a working definition we will classify a multimedia presentation system as an
intelligent system if it exploits at least the above-mentioned knowledge
sources to achieve presentation goals.
3 Outline of the Reference Model
The core of the reference model is a generic architecture for IMMPSs. The
conceptual design of this architecture reflects a modularisation of the design
process into layers which are responsible for particular subtasks, and a
separation of shared knowledge sources (called experts) from the layers. Fig. 1
introduces the IMMPS reference architecture. It is composed of four layers,
namely control, content, layout and presentation. In addition to their private
knowledge, these layers may exploit explicitly encoded knowledge provided by
a knowledge server which is composed of the application, context, user and
design expert modules. The system receives goals and presentational commands
from the goal formulation, designs a presentation, asking the application for
application knowledge, and finally communicates the presentation to the user.
Furthermore, a knowledge exchange format (i.e., a language and a protocol) is
used for the exchange of information among the system components. However,
this format is not explicitly dealt with here. It is only assumed that it
underlies each arrow between communicating components.
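The role of the knowledge server can be illustrated with a minimal sketch in which layers act as clients that send requests to one of the four expert modules. The class and method names (ExpertModule, KnowledgeServer, ask, tell, request, update) and the key/value representation of knowledge are assumptions made for this illustration; the reference model leaves the knowledge exchange format open.

```python
from typing import Any, Dict


class ExpertModule:
    """A shared knowledge source that answers requests from client modules."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._facts: Dict[str, Any] = {}

    def tell(self, key: str, value: Any) -> None:
        self._facts[key] = value

    def ask(self, key: str, default: Any = None) -> Any:
        return self._facts.get(key, default)


class KnowledgeServer:
    """Groups the application, context, user and design experts."""

    EXPERTS = ("application", "context", "user", "design")

    def __init__(self) -> None:
        self._experts = {name: ExpertModule(name) for name in self.EXPERTS}

    def _expert(self, name: str) -> ExpertModule:
        if name not in self._experts:
            raise KeyError(f"unknown expert module: {name}")
        return self._experts[name]

    def update(self, expert: str, key: str, value: Any) -> None:
        self._expert(expert).tell(key, value)

    def request(self, expert: str, key: str, default: Any = None) -> Any:
        return self._expert(expert).ask(key, default)


# A layer (the client) consults the user expert before making a design decision.
server = KnowledgeServer()
server.update("user", "expertise", "novice")
level = server.request("user", "expertise")
```

Routing every request through one server object keeps the layers unaware of where a piece of shared knowledge is stored, which is the point of the client-server separation.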
[Fig. 1: The IMMPS reference architecture. Labels recoverable from the figure:
Goal formulation, Application, Knowledge Server, External servers, Control,
Layout Layer, Presentation Layer, Expert, Design, Present. commands,
Notifications, Unrendered pres., Presentations, User.]
Although the goal formulation and the application play different roles, in most
real systems they appear as a single component, usually called the
application. Our distinction emphasizes the difference between the availability
of application knowledge/data, and the use of that knowledge/data to satisfy
presentation goals.
It is also necessary to model some interactions of the system with external
entities. This is the case, for example, when the user expert acquires
knowledge directly from the user, or from an input subsystem. In the figures,
interactions with unspecified systems, called external servers/clients, are
denoted by dashed arrows. The modeling could be done by means of data capture
metafiles as in the Computer Graphics Reference Model [2].
3.1 Layers
The four layers of the reference architecture are collectively responsible for
realizing the goal being requested. Each layer performs a transformation on its
input and delivers the result to the next layer in the hierarchy as illustrated
in Fig. 2. The input to the presentation generation process consists of
goals/commands. While being processed through the layers, goals/commands are
eventually transformed into multimedia presentations. Although the flow of
information during this transformation is primarily "top-down", it must be
stressed that many interactions can occur between components. This is
indicated by the notification arrows in Fig. 2. Each layer notifies the layer
above of the result of its processing, possibly together with some additional
information, such as causes of failure, recovery strategies, explanations,
etc. The individual layers are described in the following paragraphs.
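The top-down flow with upward notifications can be sketched as a chain of layers, each of which transforms its input, hands the result to the layer below, and reports back to the layer above. The Layer and Notification classes, the use of ValueError to signal a failed transformation, and the string-based placeholder transformations are all assumptions of this sketch, not part of the reference model.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Tuple


@dataclass
class Notification:
    layer: str       # which layer is reporting
    ok: bool         # whether processing succeeded from this layer downwards
    info: str = ""   # additional information, e.g. the cause of a failure


class Layer:
    def __init__(self, name: str, transform: Callable[[Any], Any],
                 below: Optional["Layer"] = None) -> None:
        self.name = name
        self.transform = transform
        self.below = below

    def process(self, data: Any) -> Tuple[Any, List[Notification]]:
        """Return the final result and the notifications, lowest layer first."""
        try:
            out = self.transform(data)
        except ValueError as err:
            # This layer failed: report the cause upwards, produce no result.
            return None, [Notification(self.name, False, str(err))]
        if self.below is None:
            return out, [Notification(self.name, True)]
        result, notes = self.below.process(out)
        ok = notes[-1].ok
        notes.append(Notification(
            self.name, ok, "" if ok else "failure reported from below"))
        return result, notes


# Four layers with placeholder transformations that only tag their input.
presentation = Layer("presentation", lambda p: f"rendered({p})")
layout = Layer("layout", lambda acts: f"layout({acts})", presentation)
content = Layer("content", lambda goal: f"acts({goal})", layout)
control = Layer("control", lambda goals: f"selected({goals})", content)

result, notes = control.process("goal")
```

A real system would let a layer react to a failure notification, for instance by trying a recovery strategy, rather than only passing the failure upwards as this sketch does.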
Control Layer
The control layer consistently coordinates the presentation process in time.
Its task is to choose the next goal to be achieved or the next presentational
command to be executed. The task of goal selection may occur in multimedia
generation for two different reasons. First, it might be the case that the
(external) goal formulator poses a set of goals to the presentation system,
either all at once or in an undetermined piecemeal fashion. In the latter
case, an already started generation process may be interrupted in order to
achieve a new incoming goal immediately. The second reason is that
presentation goals given as input to the system may be complex, so that they
have to be split into sets of less complex goals. While the decomposition of
goals will be done by the goal refinement module of the content layer, the
control layer still has to decide in which order the subgoals will actually be
processed. Deciding the next goal to be achieved can be a very simple task. It
could involve only popping a goal from the discourse model in the context
expert. On the other hand, some private knowledge may be present in the
control manager if the decision involves more complex reasoning. The role of
the GF Interface module is to convert the messages to be exchanged between
goal formulation and control manager into the appropriate format. This task is
not carried out by the underlying knowledge exchange format, since the
transformation depends on the specific representation language(s) used in the
IMMPS.
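The simplest form of goal selection described above can be sketched as an agenda from which the control manager pops the next goal. The ControlManager class, its method names, and the policy that an urgent goal or a set of subgoals is placed at the front of the agenda are assumptions of this sketch; a real control manager may use private knowledge for a more elaborate decision.

```python
from collections import deque
from typing import Deque, Iterable, Optional


class ControlManager:
    """Keeps an agenda of goals and decides which one is processed next."""

    def __init__(self) -> None:
        self._agenda: Deque[str] = deque()

    def post(self, goal: str, urgent: bool = False) -> None:
        # An urgent goal interrupts ongoing work and is handled immediately;
        # other goals arriving piecemeal are queued in arrival order.
        if urgent:
            self._agenda.appendleft(goal)
        else:
            self._agenda.append(goal)

    def post_subgoals(self, subgoals: Iterable[str]) -> None:
        # Subgoals produced by goal refinement are processed before the
        # remaining goals, in the order given. extendleft reverses its
        # argument, hence the explicit reversal.
        self._agenda.extendleft(reversed(list(subgoals)))

    def next_goal(self) -> Optional[str]:
        return self._agenda.popleft() if self._agenda else None


manager = ControlManager()
manager.post("describe-panel")
manager.post("explain-maintenance")
manager.post("warn-overheating", urgent=True)
first = manager.next_goal()
```

Here the urgent goal is returned first although it was posted last, which corresponds to interrupting an already started generation process.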
Content Layer
The content layer (cf. Fig. 4) serves to determine a set of so-called media
communication acts. These are communication acts enriched by semantic/logic
content; they are assigned to a particular medium which should be used to
convey that content. Communication acts are an extension of speech acts [8]
to multimedia communications (see also [9, 10]). The task is accomplished by
means of coordinated goal refinement, content selection, and media selection.
As presentation goals may be formulated at a high level of abstraction, they
need to be refined accordingly. The term goal refinement is used to capture
both the decomposition of a goal into a set of subgoals and the specialization
of a goal. During goal refinement, the content of the final presentation will
be determined. The module for content selection assists in carrying out this
task. In a concrete system, this module may appear as a retrieval and filter
component which communicates with the application expert. The output of the
goal refinement process is a set of communication acts and a structural
description of the relations that may hold between these acts. As soon as
communication acts have been worked out, it must be decided which media should
be employed to convey them best. For this task, the reference architecture
includes a component for media selection. Since there are many dependencies
among choices, the architecture also foresees a coordination module. Its
mission is to merge communication acts passed by the modules for content and
media selection. In case of successful coordination, the acts (now called
media communication acts) are handed over to the layout layer. The
coordination process can, however, require negotiations among goal refinement,
content/media selection, and the subsequent layout layer. Consider the switch
example, where the minimum size specified for the switch to be perceivable
might be too big for the control board to fit in the indicated window.
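The negotiation hinted at in the switch example can be sketched as follows: media selection proposes candidate media in order of preference, and coordination accepts the first candidate that the layout side can accommodate. The names (CommunicationAct, MediaCommunicationAct, coordinate, make_layout_check), the pixel figures, and the simple first-fit policy are assumptions of this sketch, not the mechanism of any particular IMMPS.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class CommunicationAct:
    kind: str      # e.g. "inform"
    content: str   # semantic content selected for the act


@dataclass(frozen=True)
class MediaCommunicationAct:
    act: CommunicationAct
    medium: str    # medium assigned to convey the content


LayoutCheck = Callable[[CommunicationAct, str], bool]


def make_layout_check(window_px: int, panel_px_needed: int) -> LayoutCheck:
    """Hypothetical layout feedback: graphics only fits if the panel, drawn
    large enough for the switch to be perceivable, fits into the window."""
    def fits(act: CommunicationAct, medium: str) -> bool:
        return medium != "graphics" or panel_px_needed <= window_px
    return fits


def coordinate(act: CommunicationAct, candidates: List[str],
               fits: LayoutCheck) -> Optional[MediaCommunicationAct]:
    """Return the act with the first acceptable medium, or None on failure."""
    for medium in candidates:
        if fits(act, medium):
            return MediaCommunicationAct(act, medium)
    return None   # the caller would have to renegotiate the goal or constraints


act = CommunicationAct("inform", "location of the switch")
# The panel would need 600 px but the window offers only 400 px,
# so graphics is rejected and the next candidate is tried.
chosen = coordinate(act, ["graphics", "text"], make_layout_check(400, 600))
```

When no candidate is acceptable, the None result stands for the case in which coordination fails and the decision has to go back to goal refinement or to the constraints themselves.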
[Figures of the individual layers and of the expert modules. Labels recoverable
from the figures: Control layer, Goal Formulation, Content Selection, Media
Selection, Coordination, Control Manager, Knowledge Server; Layout Manager,
Media Design (Graphics, Audio, Video, Text), Media Realization (Graphics,
Audio, Video, Text), Layout Coordination, Presentation Layer; KB, Acquisition
Interface, Integrity Checking, Inference Engine, KBS, Clients.]
of the application. The MMI2 standard world contains conceptual labels
representing the reified objects of the layout, the graphics objects and the
word-sense representatives that are relevant to the communication for the
English, French and Spanish languages. The world of the application is
represented by means of the relevant terms of the application and the
word-sense representatives that are relevant to the communication for the
English, French and Spanish languages.
Design Expert: The design expert provides knowledge about design con-
straints, cognitive impact of the used media, device model and media charac-
teristics. The knowledge is tailored to the user preferences and to the context.
4.2 WIP
The design of the WIP system [15] (WIP stands for the German abbreviation of
Knowledge-Based Presentation of Information) was influenced by the
observation that communication is always situated (i.e., depends on some
context). This is taken into account by considering the user's preferences and
design constraints (called generation parameters). A further basic assumption
in the WIP system is that not only the generation of text and dialog
contributions, but also the design of multimedia presentations is a planning
task. The current prototype of WIP generates multimedia explanations and
instructions on assembling, using or maintaining physical devices. The major
design goals of WIP are the generation of coordinated presentations from a
common representation, the adaptation of these presentations to the intended
target audience and situation, and the incrementality of all processes
constituting the design and realization of the multimedia output. In addition,
page layout is addressed as a rhetorical force.
Goal Formulation: In WIP, goal formulation can be done by the user via
a menu interface. This interface allows the user to modify the generation
parameters and to choose a goal to be achieved (in MMI2, by contrast, the goal
is built up by the goal formulation as a function of the user's inputs). A
goal is expressed as a mental state which the presentation viewer is to reach.
Control Layer: In WIP, the selection of the next goals to be accomplished
is done by a subcomponent of the presentation planner. Thus WIP's
presentation planner actually spans two layers of our generic reference
architecture. In contrast to MMI2, WIP only generates non-interactive
presentations. Therefore, no recovery strategy is provided by the content
layer when the user interrupts a presentation.
Content Layer: At the heart of the presentation system is a parallel top-down
planning module. Its task is to find a presentation strategy for the given
goal by incrementally generating a refinement-style plan in the form of a
directed acyclic graph (DAG) by means of presentation strategies. These
strategies either reflect general presentation knowledge or embody more
specific knowledge (provided by the application expert) of how to present a
certain subject. The leaves of the planned DAG are specifications for
elementary multimedia communication acts, which are elaborated by the media
design and realization in the layout layer. WIP's presentation planner
instantiates the content layer module in the reference model since it is
responsible for goal refinement, content selection, media selection, and
coordination. Whereas in the MMI2 system the goal refinement, media selection,
and coordination tasks are performed sequentially, in the WIP system they are
performed concurrently. The reason for using such an integrated approach is
that interdependencies between these processes can be handled within a uniform
processing mechanism and that the approach also allows for incremental output
generation.
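The idea of a refinement-style plan whose leaves are elementary acts can be illustrated with a small sketch. It is deliberately simplified: the expansion is sequential rather than parallel and incremental as in WIP, the strategies are a plain lookup table, and all goal and act names are invented for the example. The table is assumed to be acyclic.

```python
from typing import Dict, List, Tuple


def refine(goal: str, strategies: Dict[str, List[str]]
           ) -> Tuple[Dict[str, List[str]], List[str]]:
    """Expand a goal top-down into a DAG.

    Returns the DAG as a mapping from each node to its children, plus the
    leaves (nodes without an applicable strategy) in order of discovery.
    """
    dag: Dict[str, List[str]] = {}
    leaves: List[str] = []
    stack = [goal]
    while stack:
        node = stack.pop()
        if node in dag:
            continue   # shared subgoal: expand it once, so the plan is a DAG
        children = list(strategies.get(node, []))
        dag[node] = children
        if not children:
            leaves.append(node)   # elementary act, left to media design
        stack.extend(reversed(children))   # visit children left to right
    return dag, leaves


# Hypothetical strategies for explaining where a switch is located.
strategies = {
    "explain-switch": ["show-panel", "describe-location"],
    "show-panel": ["depict(panel)"],
    "describe-location": ["depict(panel)", "assert(upper-left)"],
}
dag, leaves = refine("explain-switch", strategies)
```

The node depict(panel) is reached from two parents but expanded only once, which is what distinguishes the resulting plan from a tree.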
Layout Layer: The WIP layout manager stores a set of document types,
together with some layout constraints for each. In addition, communicative acts
asking for a reply (e.g., is visible(object)) are answered by the layout
manager, after dispatching the request to the appropriate modules for media
design and realization. In WIP, these are modules for generating 3D graphics,
German natural language and English natural language. In illustrated
instructions for technical equipment, graphics are used in order to accomplish
presentation tasks, such as depicting a domain object in a certain state,
showing an object's location, or visualizing the course of an action. The
developers of WIP operationalized certain 2D and 3D illustration techniques
frequently used by human illustrators; the formalization is based on a
compositional semantics of pictures. Using graphical design strategies,
graphics design is in principle a goal-driven planning process. However, it
does not seem feasible to strictly separate a graphics design phase from a
realization phase, as some realization operators have side effects which are
computationally expensive to anticipate. A solution to this problem is to
interleave graphics design and realization and to allow for feedback. It is
also noteworthy that WIP does a fine-grained coordination of text and graphics
generation. For example, WIP is able to generate cross-media deictic
assertions [16] like "The on/off switch is located in the upper-left part
of the picture" and referring expressions which are themselves composed of
different media, such as showing a picture together with the assertion: "The
switch on the frontside."
Presentation Layer: The presentation layer of the WIP system comprises a
window manager and an interface to a PostScript printer.
User Expert: The WIP stereotype user model distinguishes novice and expert
users. Users' goals, preferences and knowledge are stored in the knowledge
bases. The user model is updated after a goal has been achieved. From that
time on, the user is supposed to know the information conveyed by the
presentation of the goal.
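The update rule described for the user model can be sketched in a few lines. The UserModel class, its method names, and the representation of conveyed information as a set of strings are assumptions of this sketch and do not reproduce WIP's actual knowledge bases.

```python
from typing import Iterable, Set


class UserModel:
    """Stereotype user model that records what the user is assumed to know."""

    STEREOTYPES = ("novice", "expert")

    def __init__(self, stereotype: str) -> None:
        if stereotype not in self.STEREOTYPES:
            raise ValueError(f"unknown stereotype: {stereotype}")
        self.stereotype = stereotype
        self.known: Set[str] = set()

    def goal_achieved(self, conveyed: Iterable[str]) -> None:
        # After a goal has been achieved, the user is assumed to know
        # everything that its presentation conveyed.
        self.known.update(conveyed)

    def knows(self, fact: str) -> bool:
        return fact in self.known


user = UserModel("novice")
before = user.knows("switch-location")
user.goal_achieved(["switch-location"])
after = user.knows("switch-location")
```

A planner could consult knows() to avoid presenting the same information twice to the same user.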
Application and Application Expert: The application knowledge is partly
codified as propositions in a terminological logic [17] and partly as geometric
wire-frames for the 3D graphics generation. The propositionally represented
knowledge is used both for the generation of text and graphics, as the main
source of knowledge about the domain.
Context Expert: The context knowledge consists of a document design plan
and some predicates for managing referring expressions. The former is
incrementally built by WIP's presentation planner. The leaves of the resulting
document design plan are specifications for elementary media communication
acts, such as speech acts and pictorial acts, which are to be accomplished by
WIP's text and graphics generator, respectively.
Design Expert: The design expert of WIP stores knowledge on generation
parameters (e.g., constraints on the document layout, like short/long
presentation, etc.) given by the user, layout constraints dynamically inferred
from the context and the user preferences and, finally, knowledge on device
availability which is acquired from the operating system, modelled as an
external server/client.
5 Conclusions
While significant results and expertise have been gained from building the
first generation of IMMPSs, there have been no promising attempts to bring
together the different lines of expertise from across disciplines and viewpoints
and assemble them into a sound corpus of scientific theory. The purpose of this
paper was to outline a preliminary version of a reference model for IMMPSs.
The proposed model may be summarized by the equation:
"IMMPS = Layers + Experts"
The design of the model reflects both a decomposition of multimedia generation
into logically distinct subtasks (represented as Layers) and a separation
of these tasks from the knowledge sources (called Experts) which might be
exploited to accomplish them. The model is meant as a first step towards
a broader agreement of the scientific and industrial community on the topic.
Further work and discussions are necessary to obtain a refined reference model
which eventually can be forwarded to the relevant bodies devoted to
standardization activities. However, the current model can already be used to
analyze and compare real systems and may provide guidance for the development
of new IMMPSs.
References
1. S. K. Feiner and K. R. McKeown. Automating the Generation of Coordinated
Multimedia Explanations. In Maybury [18].
2. Computer Graphics Reference Model. International Organization for
Standardization, ISO/IEC IS 11072, 1992.
3. F. Halasz and M. Schwartz. The Dexter Hypertext Reference Model.
Communications of the ACM, Vol. 37, No. 2, 1994.
4. L. Hardman, D. Bulterman, and G. van Rossum. The Amsterdam Hypermedia
Model: Adding Time and Context to the Dexter Model. Communications of the
ACM, Vol. 37, No. 2, 1994.
5. KQML Advisory Group. An Overview of KQML: A Knowledge Query and
Manipulation Language. http://retriever.cs.umbc.edu:80/kqml/.
6. G. E. Pfaff, editor. User Interface Management Systems: Proceedings of the
Seeheim Workshop. Springer-Verlag, 1985.
7. F. Roth and E. Hefley. Intelligent Multimedia Presentation Systems: Research
and Principles. In Maybury [18].
8. J. R. Searle. What is a Speech Act? In M. Black, editor, Philosophy in
America, pages 221-239. 1965.
9. E. André and T. Rist. Towards a Plan-Based Synthesis of Illustrated
Documents. In Proceedings of ECAI '90, Stockholm, 1990.
10. M. T. Maybury. Planning Multimedia Explanations Using Communicative
Acts. In Maybury [18].
11. M. D. Wilson. Enhancing Multimedia Interfaces with Intelligence. Multimedia
Systems and Applications, 1995.
12. The MMI2 Demonstrator Systems: A Multi-Modal Interface for Man Machine
Interaction with Knowledge Based Systems. Technical Report RAL-94-016,
Rutherford Appleton Laboratory, UK, 1994.
13. D. Sedlock, G. Doe, M. Wilson, and D. Trotzig. Formal and Informal
Interpretation for Co-operative Dialogue.
14. H. R. Chappel, M. D. Wilson, and B. Cahour. Engineering User Models to
Enhance Multi-modal Dialogue. In J. A. Larson and C. Unger, editors,
Engineering for Human-Computer Interaction, pages 297-313. Elsevier Science
Publishers, Amsterdam, 1992.
15. E. André, W. Finkler, W. Graf, T. Rist, A. Schauder, and W. Wahlster. WIP:
The Automatic Synthesis of Multimodal Presentations. In Maybury [18].
16. E. André and T. Rist. Referring to World Objects with Text and Pictures. In
Proceedings of Coling '94, Osaka, 1994.
17. J. Heinsohn, D. Kudenko, B. Nebel, and H. J. Profitlich. RAT -
Representation of Actions Using Terminological Logics. Technical report, DFKI,
Saarbrücken, Germany, 1992.
18. M. Maybury, editor. Intelligent Multimedia Interfaces. AAAI/The MIT Press,
1993.