The Human Face of Ambient Intelligence
Cognitive, Emotional, Affective, Behavioral and Conversational Aspects
Atlantis Ambient and Pervasive Intelligence, Volume 9
Series editor
Ismail Khalil, Johannes Kepler University Linz, Linz, Austria
Aims and Scope of the Series
The book series ‘Atlantis Ambient and Pervasive Intelligence’ publishes high-
quality titles in the fields of Pervasive Computing, Mixed Reality, Wearable
Computing, Location-Aware Computing, Ambient Interfaces, Tangible Interfaces,
Smart Environments, Intelligent Interfaces, Software Agents and other related
fields. We welcome submission of book proposals from researchers worldwide who
aim at sharing their results in this important research area.
For more information on this series and our other book series, please visit our
website at:
www.atlantis-press.com/publications/books
Atlantis Press
29, avenue Laumière
75019 Paris, France
Preface

I have written this book to help you explore ambient intelligence (AmI) in all its
complexity, intricacy, variety, and breadth, the many faces of a topical subject that
encompasses so much of modern and future life’s issues and practicalities, and can
be applied and made useful to the everyday lifeworld. Indeed, AmI technology will
pervade and impact virtually every aspect of people’s lives: home, work, learning,
social, public and infotainment environments, and on the move. This vision of a next
wave in information and communication technology (ICT) with far-reaching societal
implications is postulated to offer the possibility of a ‘killer existence’, signifying that
it will alter people’s perception of the physical and social world and thus their
notions of action in it, as well as their sense of self and the sense of their relations to
each other, things, and places. AmI is a field where a wide range of scientific and
technological areas and human-directed disciplines converge on a common vision
of the future and the fascinating possibilities and enormous opportunities such future
will bring and open up (in terms of the numerous novel applications and services that make interaction in both real and cyber spaces more intelligent and alluring) that are
created by the incorporation of computer intelligence and technology into people’s
everyday lives and environments. While the boundaries to what may become
technologically feasible and what kind of impact this feasibility may have on
humans are for the future to tell, some scientists foresee an era when the pace of
technological change and its shaping influence (progress of computer intelligence
and reliance of humans on computer technology) will be so fast, profound, and
far-reaching that human existence will be irreversibly altered.
To facilitate your embarking on exploring the realm of AmI, I have designed the
book around three related aims: to help you gain essential underpinning knowledge
and reflect on the potentials, challenges, limitations, and implications pertaining to
the realization of the AmI vision—with consideration of its revisited core notions
and assumptions; to help you develop a deeper understanding of AmI, as you make
connections between your understandings and experiences (e.g., of using computer
There are several factors that have stimulated my innate curiosity to jump into the
ever-evolving or blossoming field of ICT and subsequently stirred my interest in
embarking on writing this book, an intellectual journey into the modern, high-tech
world. I have always been interested in and intrigued by science, technology, and
society as fields of study. The world of science and technology (S&T) has gone
through overwhelming and fast advances that have had significant intended and
unintended effects within modern societies. My interest in exploring issues at the
intersection of those fields, in particular, stems from a deep curiosity about the
contemporary world we live in as to how it functions and the patterns of changing
directions it pursues and also from a desire to meet people from different academic
and cultural backgrounds for the purpose of social and lifelong learning as an
ongoing, voluntary, and self-motivated pursuit of knowledge.
Having always been fascinated by the mutual process where science, technology,
and society are shaped simultaneously, I have decided to pursue a special academic
career by embarking on studying diverse subject areas, which has resulted, hitherto,
in an educational background encompassing knowledge from diverse disciplines,
ranging from computer science and engineering to social sciences and humanities.
My passion for other human-directed sciences, which are of relevance to this book,
sprouted in me around the age of fifteen when I read—first out of sheer curiosity—a
mesmerizing book on cognitive and behavioral psychology in the summer of 1988.
And this passion continues to flourish throughout my intellectual and academic
journey. In recent years, I have developed a great interest in interdisciplinary and
transdisciplinary scholarly research and academic writing. Having earned several
Master’s degrees and conducted several studies in the area of ICT, I have more
specifically become interested in topical issues pertaining to AmI, including
affective and aesthetic computing, cognitive and emotional context awareness,
Providing guidelines for the reading of this book is an attempt to domesticate the
unruly readers—who learn, interpret, and respond in different ways. The intention
of this book is to explore the technological, human, and social dimensions of the
large interdisciplinary field of AmI. In the book, I demonstrate the scope and
In response to the growing need for a more holistic view of AmI and a clear
collaborative approach to ICT innovation and the development of successful and
meaningful human-inspired applications, this book addresses interdisciplinary, if
not transdisciplinary, aspects of a rapidly evolving area of AmI, as a crossover
approach related to many computer science and artificial intelligence topics as well
as various human-directed sciences (namely cognitive psychology, cognitive sci-
ence, social sciences, humanities). Up to now, most books on AmI have focused
their analysis on the advancement of enabling technologies and processes and their
potential only. A key feature of this book is the integration of technological, human,
social, and philosophical dimensions of AmI. In other words, its main strength lies
in the inclusiveness pertaining to the features of the humanlike understanding and
intelligent behavior of AmI systems based on the latest developments and prospects
in research and emerging computing trends and the relevant knowledge from
human and social disciplines and sub-disciplines.
No comprehensive book has, to the best of my knowledge, been produced
elsewhere with respect to covering the characteristics of the intelligent behavior of
AmI systems and environments—i.e., the cornerstones of AmI in terms of being
sensitive to users, taking care of their needs, reacting and pre-acting intelligently to
spoken and gestured indications of desires, responding to explicit speech and
gestures as commands of control, supporting social processes and being social
agents in group interactions, engaging in intelligent dialog and mingling socially
with human users, and eliciting pleasant experiences and positive emotions in users
through the affective quality of aesthetic artifacts and environments as well as the
intuitiveness and smoothness of interaction as to computational processes and the
richness of interaction as to content information and visual tools.
In addition, this book explains AmI through a holistic approach—by which it can
indeed be fully developed and advanced, encompassing technological and societal
different perspectives. In all, people in many disciplines will find the varied
coverage of the main elements that comprise the emerging field of AmI as a
socio-technological phenomenon to be of interest. My hope is that this book will be
well suited to people living in modern, high-tech societies.
Who Contributed to the Book and What Are Its Prospects?
Contents
1 Introduction . . . . . 1
  1.1 The Many Faces of AmI . . . . . 1
    1.1.1 The Morphing Power, Constitutive Force, and Disruptive Nature of AmI as ICT Innovations . . . . . 1
    1.1.2 Foundational and Defining Characteristics of AmI . . . . . 4
    1.1.3 The Essence of the (Revisited) AmI Vision . . . . . 4
    1.1.4 AmI as a Novel Approach to Human–Machine Interaction and a World of Machine Learning . . . . . 5
    1.1.5 Human-Inspired Intelligences in AmI Systems . . . . . 6
    1.1.6 Human-like Cognitive, Emotional, Affective, Behavioral, and Conversational Aspects of AmI . . . . . 8
    1.1.7 Context Awareness and Natural Interaction as Computational Capabilities for Intelligent Behavior . . . . . 8
    1.1.8 Situated Forms of Intelligence as an Emerging Trend in AmI Research and Its Underlying Premises . . . . . 9
    1.1.9 Underpinnings and Open Challenges and Issues . . . . . 12
  1.2 The Scope and Twofold Purpose of the Book . . . . . 15
  1.3 The Structure of the Book and Its Contents . . . . . 16
  1.4 Research Strategy: Interdisciplinary and Transdisciplinary Approaches . . . . . 18
  References . . . . . 19
    2.12.9 Philosophy . . . . . 61
    2.12.10 Sociology and Anthropology (Social, Cultural, and Cognitive) . . . . . 62
  References . . . . . 63
  7.5 Speech Perception and Production: Key Issues and Features . . . . . 354
    7.5.1 The Multimodal Nature of Speech Perception . . . . . 354
    7.5.2 Vocal-Gestural Coordination and Correlation in Speech Communication . . . . . 358
  7.6 Context in Human Communication . . . . . 361
    7.6.1 Multilevel Context Surrounding Spoken Language (Discourse) . . . . . 362
    7.6.2 Context Surrounding Nonverbal Communication Behavior . . . . . 364
  7.7 Modalities and Channels in Human Communication . . . . . 365
  7.8 Conversational Systems . . . . . 366
    7.8.1 Key Research Topics . . . . . 366
    7.8.2 Towards Believable ECAs . . . . . 367
    7.8.3 Embodied Conversational Agents (ECAs) . . . . . 367
    7.8.4 Research Endeavor and Collaboration for Building ECAs . . . . . 368
    7.8.5 SAIBA (Situation, Agent, Intention, Behavior, Animation) Framework . . . . . 369
    7.8.6 Communicative Function Versus Behavior and the Relationship . . . . . 370
    7.8.7 Taxonomy of Communicative Functions and Related Issues . . . . . 372
    7.8.8 Deducing Communicative Functions from Multimodal Nonverbal Behavior Using Context . . . . . 374
    7.8.9 Conversational Systems and Context . . . . . 375
    7.8.10 Basic Contextual Components in the (Extended) SAIBA Framework . . . . . 376
    7.8.11 The Role of Context in the Disambiguation of Communicative Signals . . . . . 377
    7.8.12 Context or Part of the Signal . . . . . 379
    7.8.13 Contextual Elements for Disambiguating Communicative Signals . . . . . 380
    7.8.14 Modalities and Channels and Their Impact on the Interpretation of Utterances and Emotions . . . . . 381
    7.8.15 Applications of SAIBA Framework: Text- and Speech-Driven Facial Gestures Generation . . . . . 383
    7.8.16 Towards Full Facial Animation . . . . . 386
    7.8.17 Speech-Driven Facial Gestures Based on HUGE Architecture: An ECA Acting as a Presenter . . . . . 387
  7.9 Challenges, Open Issues, and Limitations . . . . . 389
  References . . . . . 393
About the Author
book on the social shaping of AmI and the IoT as science-based technologies—a
study in science, technology, and society (STS). This book has been completed and
delivered to the publisher.
Bibri has a genuine interest in interdisciplinary and transdisciplinary research. In
light of his varied academic background, his research interests include AmI, the
IoT, social shaping of science-based technology, philosophy and sociology of
scientific knowledge, sustainability transitions and innovations, governance of
socio-technical changes in technological innovation systems, green and knowledge-
intensive entrepreneurship/innovation, clean and energy efficiency technology,
green economy, ecological modernization, eco-city and smart city, and S&T and
innovation policy. As to his career objective, he would like to take this opportunity
to express his strong interest in working as an academic or in pursuing an inter-
disciplinary Ph.D. in a well-recognized research institution or center for research
and innovation.
Chapter 1
Introduction
Since the early 1990s, computer scientists have had the vision that ICT could do
much more and offer a whole range of fascinating possibilities. ICT could weave into
the fabric of society and offer useful services—in a user-friendly, unobtrusive, and
natural way—that support human action, interaction, and communication in various
ways wherever and whenever needed (e.g., Weiser 1991; ISTAG 2001). At present,
ICT pervades modern society and has a significant impact on people’s everyday
lives. And the rapidly evolving innovations and breakthroughs in computing and the
emergence of the ensuing new paradigms in ICT continue to demonstrate that there
is a tremendous untapped potential for adding intelligence and sophistication to ICT
to better serve people and transform the way they live, by unlocking its transfor-
mational effects as today’s constitutive technology. In recent years, a range of new
visions of a next wave in ICT with far-reaching societal implications, such as AmI,
ubiquitous computing, pervasive computing, calm computing, the Internet of
Things, and so on, and how they will shape the everyday of the future have gained worldwide attention, and are evolving from visions to achievable real-
ities, thanks to the advance, prevalence, and low cost of computing devices, mini-
ature sensors, wireless communication networks, and pervasive computing
infrastructures. AmI is the most prevalent new vision of ICT in Europe. Due to its
disruptive nature, it has created prospective futures in which novel applications and
services seem to be conceivable, and consequently visionaries, policymakers, and
leaders of research institutes have placed large expectations on this technology,
mobilized and marshaled R&D resources, and inspired and aligned various stake-
holders towards its realization and delivery. As a science-based technological
innovation, it is seen as indispensable for bringing more advanced solutions for
societal problems, augmenting everyday life and social practices, and providing a
whole range of novel services to consumers.
Indeed, common to all technological innovations is that they have strong effects
on people. They are very meaningful innovations because they do offer advance-
ments in products and services that can have significant impacts on people’s
everyday lives and many spheres of society. The underlying premise is that they
have power implications in the sense that they encapsulate and form what is held as
scientific knowledge and discourse, which is one of today’s main sources of
legitimacy in knowledge production as well as policy- and decision-making in
modern society. Thanks to this legitimization capacity, technological innovations
can play a major role in engendering social transformations—in other words, the
power effects induced by scientific discourse determine their success and expansion
in society. They embody a morphing power, in that they change how society
functions, creating new social realities and reshaping how people construct their
lives. Therefore, they represent positive configurations of knowledge which have
more significant intended and unintended effects within modern society. They have
indeed widely been recognized as a vehicle for societal transformation, especially
as a society moves from one technological epoch to another (e.g., from industrial to
post-industrial/information society). Over the past few decades, the technological epoch has been predominantly associated with ICT or computing—more specifically, from the 1960s, through the second half of the twentieth century, to the beginning of the twenty-first century.
The first half of the twenty-first century is heralding new behavioral patterns of
European society towards technology: ICT has become more sophisticated, thanks
to innovation, and deeply embedded into the fabric of European society—social,
cultural, economic, and political structures and practices. Hence, it is instigating and
unleashing far-reaching societal change, with its constitutive effects amounting to a
major shift in the way society is starting to function and unfold (ISTAG 2006). ICT as a constitutive technology is a vision that builds upon the AmI
vision (Ibid). In this vision, computing devices will be available unobtrusively
everywhere and by different means, supporting human action, interaction, and
communication in a wide variety of ways whenever needed. This implies that a
degree of social transformation is present in AmI scenarios, whether they are
visionary, conceived by the creators of AmI technology, extrapolated from the
present based on their view to illustrate the potential and highlight the merits of
that technology, or substantiated, determined by findings from in-depth studies
aimed at reconciling the futuristic and innovative claims of the AmI vision and its
realistic assumptions.
AmI has been a multidisciplinary field strongly driven by a particular vision of
how the potential of ICT development can be mobilized to shape the everyday of
the future and improve the quality of people’s lives. This has been translated into
concrete strategies, whereby the AmI vision has been attributed a central role in
shaping the field of new ICT and establishing its scenarios, roadmaps, research
agendas, and projects. With this in mind, ICT as a constitutive technology repre-
sents a widening and deepening of AmI strategies at the level of societal
The AmI vision was essentially proposed and published in 1999 by the Information Society Technologies Advisory Group (ISTAG), the committee which advises the
European Commission’s Information Society Directorate General—the Information
Society Technology (IST) program. It postulates a new paradigmatic shift in
computing and constitutes a large-scale societal discourse as a cultural manifesta-
tion and historical event caused to emerge as a result of the remaking of social
knowledge, with strong implications for reshaping the overarching discourse of
information society. It offers technological evolution driven by integrating intelli-
gence in ICT applications and services in ways to transform computer technology
into an integral part of everyday life, and thus make significant impacts on society.
This vision has been promoted by, and attracted a lot of interest from, government
science and technology agencies, research and innovation policy, industry, tech-
nical research laboratories, research centers, and universities.
Materialized as a multidisciplinary field within—or rather inspired by the vision
of—ubiquitous computing, attracting substantial research, innovation, funding, and
public attention as well as leading to the formation of many consortiums and research
groups, AmI provides an all-encompassing and far-reaching vision on the future of
ICT in the information society, a vision of the future information society where
everyday environments will be permeated by computer intelligence and technology:
humans will be surrounded and accompanied by advanced sensing and computing
devices, multimodal user interfaces, intelligent software agents, and wireless and
ad-hoc networking technology, which are everywhere, invisibly woven into the
fabric of space, in virtually all kinds of everyday objects (e.g., computers, mobile
phones, watches, clothes, furniture, appliances, doors, walls, paints, lights, books,
paper money, vehicles, and even the flow of water and air), in the form of tiny
microelectronic processors and networks of miniature sensors and actuators, func-
tioning unobtrusively in the background of human life and consciousness. The
logically malleable nature of this computationally augmented everyday environment
—seamlessly composed of a myriad of heterogeneous, distributed, networked, and
always-on computing devices, available anytime, anywhere, and by various means,
enabling people to interact naturally with smart objects which in turn communicate
with each other and other people’s objects and explore their environment—lends
itself to a limitless functionality: it is aware of people’s presence and context; adaptive,
responsive, and anticipatory to their desires and intentions; and personalized and
tailored to their needs, thereby intelligently supporting their daily lives through
providing unlimited services in new, intuitive ways and in a variety of settings.
The essence of the AmI vision lies in the idea that the integration of computer intelligence and
technology into people’s everyday lives and environments may have positive,
AmI is heralding and giving rise to new ways of interaction and interactive appli-
cations, which strive to take the holistic nature of the human user into account—e.g.,
context, behavior, emotion, intention, and motivation. It has emerged as a result of
amalgamating recent discoveries in human communication, computing, and cogni-
tive science towards natural HCI—AmI technology is enabled by effortless (implicit
human–machine) interactions attuned to human senses and adaptive and proactive to
users. Therefore, as an integral part of everyday life, AmI promises to provide
efficient support and useful services to people in an intuitive, unobtrusive, and
natural fashion. This is enabled by the human-like understanding of AmI interactive
systems and environments and the varied features of their intelligent behavior,
manifested in taking care of needs, reacting and pre-acting intelligently to verbal and
nonverbal indications of desires; reacting to explicit spoken and gestured com-
mands; supporting social processes and being competent social agents in social
interactions; engaging in intelligent dialogs and mingling socially with human users;
and eliciting pleasant experiences and positive emotions in users through the
The class of AmI applications and systems in focus, under investigation and
review, exhibits human-like understanding and intelligent supporting behavior in
relation to cognitive, emotional, social, and conversational processes and behaviors
of humans. Human-like understanding can be described as the ability of AmI
systems (agents) to analyze (or interpret and reason about) and estimate (or infer)
what is going on in the human’s mind (e.g., ideally how a user perceives a given
context—as an expression of a certain interpretation of a situation), which is a form
of mindreading or, in the case of conversational systems, interpreting communi-
cative intents, as well as in his/her body and behavior, which is a form of facial-,
gestural-, corporal-, and psychophysiological reading or interpreting and disam-
biguating multimodal communicative behavior), as well as what is happening in the
social, cultural, and physical environments. Here, context awareness technology is
given a prominent role. Further, input for computational understanding processes is
observed information acquired from multiple sources (diverse sensors) about the
human user’s cognitive, emotional, psychophysiological, behavioral, and social
states over time (i.e., human behavior monitoring), and dynamic models for the
human’s mental, physiological, conversational, and social processes. For the
human’s psychological processes, such a model may encompass emotional states
and cognitive processes and behaviors. For the human’s physiological processes,
such a model may include skin temperature, pulse, galvanic skin response, and
heart rate (particularly in relation to emotions), and activities. For the human’s
conversational processes, such a model may comprise a common knowledge base,
communication errors and recovery schemes, and language and, ideally, its cog-
nitive, psychological, neurological, pragmatic, and sociocultural dimensions. For
the human’s social processes, such a model may entail adaptation, cooperation,
accommodation, and so on as forms of social interaction. AmI requires different
types of models: cognitive, emotional, psychophysiological, behavioral, social,
cultural, physical, and artificial environment. Examples of methods for analysis on
the basis of these models include facial expression analysis, gesture analysis, body
analysis, eye movement analysis, prosodic features analysis, psychophysiological
analysis, communicative intents analysis, social processes analysis, and so forth.
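As a rough, hedged sketch of what such a dynamic model might look like computationally, the following example maintains a smoothed estimate of a user’s arousal from physiological readings of the kind listed above (heart rate, skin temperature, galvanic skin response). All names, signal ranges, and weights are invented for illustration; real AmI systems rely on far richer, empirically learned models:

from dataclasses import dataclass

@dataclass
class SensorReading:
    heart_rate: float  # beats per minute
    skin_temp: float   # degrees Celsius
    gsr: float         # galvanic skin response, in microsiemens

@dataclass
class PhysiologicalModel:
    # Dynamic model of one aspect of the user's state: arousal over time.
    arousal: float = 0.0  # smoothed estimate in [0, 1]
    alpha: float = 0.2    # smoothing factor governing temporal dynamics

    def update(self, r: SensorReading) -> float:
        # Crude instantaneous estimate from roughly normalized signals; a real
        # system would learn this mapping from labeled multimodal data.
        instant = (0.5 * (r.heart_rate - 60) / 60
                   + 0.3 * r.gsr / 10
                   + 0.2 * (r.skin_temp - 33) / 4)
        instant = min(1.0, max(0.0, instant))
        # Exponential smoothing reflects the evolving nature of human states
        # rather than reacting to any single noisy sample.
        self.arousal = (1 - self.alpha) * self.arousal + self.alpha * instant
        return self.arousal

model = PhysiologicalModel()
for reading in (SensorReading(72, 33.5, 2.1), SensorReading(95, 34.2, 6.8)):
    print(f"estimated arousal: {model.update(reading):.2f}")

A fuller system would fuse many more modalities (facial expression, prosody, gesture) and feed such state estimates into the analysis methods enumerated above.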
In light of the above, the class of AmI applications showing intelligent behavior is
required to be equipped with context awareness (the ability to sense, recognize, and
react to contextual variables) and natural interaction (the use of natural modalities
like facial expressions, hand gestures, body postures, and speech) as human-like
human cognition and thus action—the subtlety and intricacy of meaning attribution
to (perception of) context and the evolving nature of the latter, i.e., details of
context are too subjective, elusive, fluid, and difficult to recognize to be modeled
and encoded. Indeed, sensor data are limited or imperfect and existing models must
necessarily be oversimplified. That is to say, they suffer from limitations pertaining
to comprehensiveness, dynamicity, fidelity with real-world phenomena, and
robustness, and thus are associated with inaccuracies. Therefore, an ambience
created based on sensor information about the human’s states and behaviors and
computational (dynamic) models for aspects of human functioning may not be the
most effective way of supporting human users in their daily activities or assisting
them in coping with their tasks, by providing services that are assumed—because of
their delivery being done in a particular knowledgeable manner—to improve the
quality of their life. One implication of an irrelevant behavior of the system is a loss
of control over the environment and of freedom to act within it or interact with its
artifacts. Consequently, some scholars have called for shunning modeling and antic-
ipating actions as much as possible, particularly in relation to such application
domains as smart home environments and highly demanding circumstances or
tasks. In particular, the vision of the intelligent, caring environment seems to fail to
bring real benefits. Indeed, if the artificial actors (devices) gain control over human
users, it becomes questionable as to whether they will bring an added value to users.
Hence, AmI applications and environments should instead focus on (and ideally
possess) the capacity to respond to unanticipated circumstances of the users’ actions,
an aspect which in fact makes interactive computer systems, so far, fundamentally
different from human communication (e.g., Hayes and Reddy 1983). Regardless,
user interfaces in AmI systems should, given the constraints of existing technolo-
gies and from an engineering perspective, minimize modeling and anticipating
actions of the growing variety of users and an infinite richness of interactive situ-
ations. Many of these critical perspectives can be framed within a wider debate over
invisible and disappearing user interfaces underlying AmI technology and the
associated issues pertaining to the weakness of plans as resources in situated actions
(e.g., Suchman 1987, 2005), the negotiation among people involved in situations
(e.g., Lueg 2002), context as an issue of negotiation through interaction (e.g.,
Crutzen 2005), exposing ambiguities and empowering users (e.g., José et al. 2010),
and the development of critical user participatory AmI applications (e.g., Criel and
Claeys 2008).
All in all, the basic premise of situated forms of intelligence is to design AmI
technology that can capitalize on what humans have to offer in terms of intelligence
already embedded in scenarios, practices, and patterns of everyday life and envi-
ronment and hence leverage their own cognitive processes and behavior to
generate alternative forms of situated intelligence. Instead of AmI technology being
concerned with offering to decide and do things for people or perform tasks on their
behalf—and hence modeling the elusive and complex forms of real-life intelli-
gence, it should offer people further resources to act and thus choose and think,
thereby engaging them more actively by empowering them into the process of
spur-of-the-moment situated cognition and thus action. This entails assisting people
in better assessing their choices and decisions and thus enhancing their actions and
activities. Overall, a quest for situated forms of intelligence is seen by several
eminent scholars as an invigorating alternative for artificial intelligence research
within AmI.
It has widely been acknowledged that the realization (and evolution) of the AmI vision poses enormous challenges and a plethora of open issues in the sense of not being
brought to a conclusion and subject to further thought. AmI is an extremely complex,
complicated, and intricate phenomenon, with so many unsettled questions.
Specifically, AmI is a subject of much debate and current research in the area is
ambiguous; it involves a lot of details or so many parts that make it difficult to deal
with; and it entails many complexly arranged and interrelated elements and factors
which make it demanding to resolve. Therefore, there is a lot to tackle, address,
solve, draw out and develop, and unravel or disentangle in the realm of AmI.
As a multidisciplinary paradigm or ‘crossover approach’, AmI is linked to many topics related to computer science, artificial intelligence, human-directed scientific
areas (e.g., cognitive psychology, cognitive science, cognitive neuroscience, etc.),
social sciences (e.g., sociology, anthropology, social psychology, etc.), and
humanities (e.g., human communication, single and interdisciplinary subfields of
linguistics, communication and cultural studies, philosophy, etc.). The relevance of
these disciplines and technological and scientific areas to AmI stems from its vision
being far-reaching and all-encompassing in nature and postulating a paradigmatic
change in computing and society.
To create computer systems that emulate (a variety of aspects of) human
functioning for use in a broadened scope of application domains is no easy task. It
has been recognized by high-profile computer scientists and industry experts to be a
daunting challenge. Building AmI systems poses real challenges, many of which
pertain to system engineering, design, and modeling. This involves the develop-
ment of enabling technologies and processes necessary for the proper operation of
AmI systems and the application and convergence of advanced theoretical models from many diverse scientific and social disciplines—simulated and implemented in machines or computer systems within the areas of AmI and artificial intelligence in the form of mimicked processes and behaviors as well as computationally formalized knowledge.
AmI research and development needs to address and overcome many design,
engineering, and modeling challenges. These challenges concern human-inspired
applications pertaining to various application domains, such as context-aware
computing, emotion-aware/affective computing, and conversational systems. They
include, and are not limited to: paradigms that govern the assemblage of such
systems; techniques and models of knowledge, representation, and run-time
behavior of such systems; methodologies and principles for engineering context
human (verbal and nonverbal) communication have made major strides in pro-
viding new insights into understanding cognitive, emotional, physiological, neu-
rological, behavioral, and social aspects of human functioning. Although much
work remains to be done, complex models have been developed for a
variety of aspects of human contexts and processes and implemented in a variety of
application domains within the areas of AmI and artificial intelligence and at their
intersection. Though these models have yielded and achieved good results in lab-
oratory settings, they tend to lack usability in real life. If knowledge about human functioning is computationally available—that is, if models of human contexts and processes are represented in a formal and explicit form, developed on the basis of concrete interdisciplinary research work, and incorporated into the everyday human environment in computer systems that observe contexts (e.g., psychological and physiological states) and monitor the actions of humans along with changes in the environment in circumambient ways—then these systems become able to carry out a more in-depth, human-like analysis of the human context and processes, and thus come up with well-informed actions in support of the user’s cognitive, emotional, and social needs. Moreover, advanced knowledge from the
human-directed sciences needs to be amalgamated, using relevant frameworks for
combining the constituents, to obtain the intended functioning of human-inspired
AmI systems in terms of undertaking actions in a knowledgeable manner in some
applications (e.g., biomedical systems, healthcare systems, and assisted living
systems) while applying a strengthened collaboration with humans in others (e.g.,
cognitive and emotional context-aware systems, affective systems, and
emotion-aware systems). This can result in a close coupling between the user and
the agent, where the human user partners with the system in the sense of negotiating about what actions of the latter are suitable for the situation of the former.
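This user-system partnership can be illustrated with a minimal sketch of such a negotiation, in which the agent proposes an action based on its inferred model of the user but defers to the user before acting. The function names, states, and confirmation mechanism are illustrative assumptions, not an established AmI interface:

from typing import Callable, Optional

def propose_action(inferred_state: str) -> Optional[str]:
    # The agent's model maps inferred user states to candidate actions.
    proposals = {"stressed": "play calming music", "cold": "raise thermostat"}
    return proposals.get(inferred_state)

def negotiate(inferred_state: str, user_confirms: Callable[[str], bool]) -> str:
    action = propose_action(inferred_state)
    if action is None:
        return "no action proposed"
    # Rather than acting autonomously, the agent asks the user to confirm,
    # keeping the human in control of the environment.
    return action if user_confirms(action) else "action declined by user"

# Example: the user accepts the agent's proposal.
print(negotiate("stressed", user_confirms=lambda action: True))

Keeping the confirmation step explicit is one simple way to preserve the user’s control that critical scholars insist on.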
However, human-directed disciplines involve volatile theories, subjectivities,
pluralism of theoretical models, and a plethora of unsolved issues. Adding to this is
the generally understood extraordinary complexity of social sciences as well as
humanities (especially human communication with regard to pragmatics, sociolin-
guistics, psycholinguistics, and cultural dimensions of nonverbal communication
behavior), due to the reflexive nature of social and human processes as well as the
changing and evolving social and human conditions. This is most likely to carry over
its effects to modeling and implementation of knowledge about processes and
aspects of human functioning into AmI systems—user interfaces—and their
behavior. Computational modeling of human behavior and context and achieving a
human-like computational understanding (analysis of what is going on in the mind
and behavior of humans and in their physical, social, and cultural environments) has proven to be the most challenging task in AmI and artificial intelligence alike. In
fact, these challenges are argued to be the main reason why AmI has hitherto failed
to scale from prototypes to realistic environments and systems. While machine
learning and ontological techniques, coupled with recent hybrid approaches, have
proven to hold a tremendous potential to reduce the complexity associated with
modeling human activities and behaviors and situations of life, the fact remains that
most of the current reasoning processes—intelligent processing of sensor data and
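As a hedged sketch of the machine-learning techniques just mentioned, the following example trains a small decision-tree classifier to recognize a user’s activity from sensor-derived features. The features, labels, and tiny training set are invented for illustration; real activity-recognition systems work with far larger, noisier corpora and typically combine such classifiers with ontological or rule-based reasoning:

from sklearn.tree import DecisionTreeClassifier

# Feature vectors: [mean acceleration (g), ambient light (lux), hour of day]
X_train = [
    [0.05, 300, 9],   # sitting at a desk, morning
    [0.07, 250, 14],  # sitting, afternoon
    [1.20, 800, 8],   # walking outdoors, morning
    [1.10, 50, 22],   # walking indoors, night
    [0.02, 5, 23],    # lying down, night
    [0.03, 2, 1],     # lying down, night
]
y_train = ["sitting", "sitting", "walking", "walking", "lying", "lying"]

clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(X_train, y_train)

# An ontological layer (not shown) could refine the predicted label using
# background knowledge, e.g., location or calendar context.
print(clf.predict([[0.04, 280, 10]]))  # expected: ['sitting']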
1.2 The Scope and Twofold Purpose of the Book

This book addresses the human face of AmI in terms of the cognitive, emotional,
affective, behavioral, and conversational features that pertain to the various appli-
cation domains where AmI systems and environments show human-like under-
standing and exhibit intelligent behavior in relation to a variety of aspects of human
functioning—states and processes of human users. These systems and environments
are imminent since they use essentially the state-of-the-art enabling technologies and
related computational processes and capabilities underlying the functioning of AmI
as (nonhuman) intelligent entities. It also includes ambitious ideas within the same
realm whose realization seems to be still far away due to unsolved technological and
social challenges. In doing so, this book details and elucidates the rich potential of
AmI from a technological, human, and social perspective; the plethora of difficult
encounters and bottlenecks involved in making AmI a reality, a deployable and
achievable paradigm; and the existing and future prerequisite enabling technologies.
It moreover discusses in compelling and rich ways the recent discoveries and
established knowledge in human-directed sciences and their application and con-
vergence in the ambit of AmI as a computing paradigm, as well as the application
and convergence of major current and future computing trends.
Specifically, this book has a twofold purpose. First, it aims to explore and assess
the state-of-the-art enabling technologies and processes; to review and discuss the
key computational capabilities underlying the AmI functioning; and to identify the
main challenges and limitations associated with the design, modeling, and imple-
mentation of AmI systems and applications, with an emphasis on various aspects of
human functioning. This is intended to inform and enlighten various research
communities of the latest developments and prospects in the respective research
area as well as to provide a seminal reference for researchers, designers, and
engineers who are concerned with the design and development of cognitive and
emotional context-aware, affective, socially intelligent, and conversational systems
and applications. Second, it intends to explore and discuss the state-of-the-art
human-inspired AmI systems and applications (in which knowledge from the
human-directed sciences such as cognitive science, cognitive psychology, social
sciences, and humanities is incorporated) and provide new insights and ideas on
how these could be further enhanced and advanced. This class of AmI applications
is augmented with aspects of cognitive intelligence, emotional intelligence, social
intelligence, and conversational intelligence at the cognitive and behavioral level.
In more detail, the main aim of this book is to support scholars, scientists, experts,
1.3 The Structure of the Book and Its Contents

The book is divided into two distinct but interrelated sections, each dealing with a
different dimension or aspect of AmI and investigation at an advanced level. It opens
with a scene-setting chapter (this chapter, Sect. 1.1). This chapter contains a more
detailed introduction to Part I and Part II of the book. The major themes, issues,
assumptions, and arguments are introduced and further developed and elaborated on
in subsequent chapters. It moreover includes an outline of the book’s scope, pur-
pose, structure, and contents, in addition to providing a brief descriptive account of
the research strategy espoused in the book: a combination of interdisciplinary and
transdisciplinary approaches. Part I (Chaps. 2–6) looks at different permutations of
enabling technologies and processes as well as core computational capabilities.
As to enabling technologies and processes, it covers sensor and MMES technology,
multi-sensor systems and data fusion techniques, capture/recognition approaches,
pattern recognition/machine learning algorithms, logical and ontological modeling
methods and reasoning techniques, hybrid approaches to representation and rea-
soning, conventional and multimodal user interfaces, and software and artificial
intelligent agents. As to core computational capabilities, it comprises context
awareness, implicit and natural interaction, and intelligent behavior in relation to
human-inspired AmI applications.
Part II (Chaps. 7–9) deals with a variety of human-inspired AmI applications,
namely cognitive and emotional context-aware systems, affective/emotion-aware
systems, multimodal context-aware affective systems, context-aware emotionally
intelligent systems, socially intelligent systems, explicit natural and touchless
systems, and conversational systems. It provides a detailed review and synthesis of
a set of theoretical concepts and models pertaining to emotion, emotional intelli-
gence, cognition, affect, aesthetics, presence, nonverbal communication behavior,
linguistics, pragmatics, sociolinguistics, psycholinguistics, and cognitive linguis-
tics. With their explanatory power, these conceptual and theoretical frameworks,
coupled with the state-of-the-art enabling technologies and computational processes
and capabilities, can be used to inform the design, modeling, evaluation, and
implementation of the respective human-inspired AmI applications.
Parts I and II are anchored in, based on the nature of the topic, philosophical and
analytical discussions, worked out with great care and subtlety of detail, along with
theoretical and practical implications and alternative research directions, high-
lighting an array of new approaches to and emerging trends around some of the core
concepts and ideas of AmI that provide a more holistic view of AmI.
With its three parts, the book comprises 10 chapters, which have a standardized
scholastic structure, making them easy to navigate. Each chapter draws on some of
the latest developments and prospects as to findings in the burgeoning research area
of AmI, along with voices of high-profile and leading scholars, scientists, and
experts. Moreover, the chapters can be used in various ways, depending on the
reader’s interests: as a stand-alone overview of contemporary (theoretical, empiri-
cal, and analytical) research on AmI; as a seminal resource or reference for pro-
spective students and researchers embarking on studies in computing, ICT
innovation, science and technology, and so forth. In addition, Part II can be used as
a complement to the Part I chapters, enabling students, researchers, and others to
make connections between their perceptions and understandings, relevant research
evidence, and theoretical concepts and models, and the experiences and visions of
computer scientists and AmI creators and producers.
The contents of this book are structured to achieve two outcomes. Firstly, it is
written so the reader can read it easily from end to end—based on his/her back-
ground and experience. It is a long book that is packed with value to various classes
of readers. Whether the reader diligently sits down and reads it in a few sessions at
home/at the library or goes through a little every now and then, he/she will find it
interesting to read and accessible—especially those readers with passionate interest
in or deep curiosity about new visions of the future of technology. Secondly, it is
written so that the reader can call upon specific parts of its content information in an
easy manner. Furthermore, each of its chapters can be read on its own or in
sequence. It is difficult to assign a priority rating to the chapters given that the book
is intended for readers with different backgrounds and interests, but the reader will
get the most benefit from reading the whole book in the order it is written, so that
he/she can gain a better understanding of the phenomenon of AmI. However, if you
are short of time and must prioritize, start with those chapters you find of highest
priority based on your needs, desires, or interests. Hence, as to how important the
topics are, the choice is yours—based on your own reflection and assessment.
Overall, the book has been carefully designed to provide you with the material
and repository required to explore the realm of AmI. AmI is an extremely complex,
intricate, varied, and powerful phenomenon, and it is well worth exploring in some
depth. The best way to enable the reader to embark on such an exploration is to
seamlessly integrate technological, human, and social dimensions in ways that build
on and complement one another. Achieving this combination is the main strength
and major merit of this book and succeeding in doing so is meant to provide the
reader with valuable insights into imminent AmI technologies, their anticipated
implications for and role in people’s future lives, potential ways of addressing and
handling the many challenges in making it a reality, and alternative research
directions for delivering the essence of the AmI vision. This is believed to be of no
small achievement in its own right, and certainly makes the book a rewarding reading
experience for anyone who feels they could benefit from a greater understanding of
the domain. I encourage you to make the most of this opportunity to explore AmI,
an inspiring vision of a next wave of ICT with far-reaching implications for modern,
high-tech society. While some of us might shy away from foreseeing what the
future era of AmI will look like, it is certain to be a very different world. I wish you
well on the exploration journey.
1.4 Research Strategy: Interdisciplinary and Transdisciplinary Approaches

This research work operates out of the understanding that advances in knowledge
and an ever-increasing awareness of the complexity of emerging phenomena have
led researchers to pursue multifaceted problems that cannot be resolved from the
vantage point of a single discipline or sometimes an interdisciplinary field as an
organizational unit that crosses boundaries between academic disciplines. AmI is a
phenomenon that is too complex and dynamic to be addressed by single disciplines
or even an interdisciplinary field. Indeed, impacts of AmI applications in terms of
context, interaction, and intelligent behavior, for instance, well exceed the highly
interdisciplinary field, hence the need for espousing a transdisciplinary perspective as to some of its core aspects. Besides, it is suggested that interdisciplinary efforts
remain inadequate in impact on theoretical development for coping with the
changing human circumstance. Still, an interdisciplinary approach remains relevant to
look at AmI as a field of tension between social, cultural, and political practices and
the application and use of new technologies or an area where a wide range of
technological and scientific areas come together around a common vision of the
future and the enormous opportunities such future will open up. Thus, in the context
of AmI, some research topics remain within the framework of disciplinary research,
and other research topics cannot be accomplished in disciplinary research. In light
of this, both interdisciplinary and transdisciplinary research approaches are
espoused in this book to investigate the AmI phenomenon. Adopting this research
strategy has made it possible to flexibly respond to the topic under inquiry and
uncover the best way of addressing it. It is aimed at contributing to an integral
reflection upon where the still-emerging field of AmI is coming from and where it is
believed it should be heading.
Seeking to provide a holistic understanding of the AmI phenomenon for a
common purpose or in the pursuit of a common task, an interdisciplinary approach
insists on the mixing of disciplines and theories. Thereby, it crosses boundaries
between disciplines to create new perspectives based on interactional knowledge
beyond these disciplines. It is of high importance because it allows interlinking
different analyses and spilling over disciplinary boundaries. The field of AmI
should see the surge of interdisciplinary research on the incidence of technological,
social, cultural, political, ethical, and environmental issues as well as strategic
thinking toward the social acceptance of AmI technology, with the capacity to
create methods for innovation and policy. Pooling various perspectives and mod-
ifying them so as to become better suited to AmI as a problem at hand is therefore very
important to arrive at a satisfactory form of multidisciplinary AmI. The subject of
AmI appears differently when examined by different disciplines, for instance, his-
tory, sociology, anthropology, philosophy, cultural studies, innovation studies, and
so on.
A transdisciplinary approach insists on the fusion of disciplines with an outcome
that exceeds the sum of each, focusing on issues that cross and dissolve disciplinary
References
Aarts E, de Ruyter B (2009) New research perspectives on Ambient Intelligence. J Ambient Intell
Smart Environ 1(1):5–14
Bibri SE (2014) The potential catalytic role of green entrepreneurship—technological eco-innovations and ecopreneurs’ acts—in the structural transformation to a low-carbon or green economy: a discursive investigation. Master thesis, Department of Economics and Management, Lund University
Criel J, Claeys L (2008) A transdisciplinary study design on context-aware applications and
environments. A critical view on user participation within calm computing. Observatorio
(OBS*) J 5:057–077
Crutzen CKM (2005) Intelligent ambience between heaven and hell. Inf Commun Ethics Soc 3(4):219–232
Gunnarsdóttir K, Arribas-Ayllon M (2012) Ambient Intelligence: a narrative in search of users.
Lancaster University and SOCSI, Cardiff University, Cesagen
Hayes PJ, Reddy RD (1983) Steps toward graceful interaction in spoken and written man-machine
communication. Int J Man Mach Stud 19(3):231–284
ISTAG (2001) Scenarios for Ambient Intelligence in 2010. ftp://ftp.cordis.lu/pub/ist/docs/
istagscenarios2010.pdf. Viewed 22 Oct 2009
ISTAG (2006) Shaping Europe’s future through ICT. http://www.cordis.lu/ist/istag.htm. Viewed
22 Mar 2011
José R, Rodrigues H, Otero N (2010) Ambient intelligence: beyond the inspiring vision. J Univ
Comput Sci 16(12):1480–1499
Lindblom J, Ziemke T (2002) Social situatedness: Vygotsky and beyond. In: The 2nd international
workshop on epigenetic robotics: modeling cognitive development in robotic systems, pp 71–78,
Edinburgh, Scotland
Lombard M, Ditton T (1997) At the heart of it all: the concept of presence. J Comput Mediat
Commun 3(2)
Lueg C (2002) Operationalizing context in context-aware artifacts: benefits and pitfalls. Hum
Technol Interface 5(2)
Markopoulos P, de Ruyter B, Privender S, van Breemen A (2005) Case study: bringing social
intelligence into home dialogue systems. ACM Interact 12(4):37–43
Nijholt A, Rist T, Tuijnenbreijer K (2004) Lost in Ambient Intelligence? In: Proceedings of CHI 2004, Vienna, Austria, pp 1725–1726
Punie Y (2003) A social and technological view of ambient intelligence in everyday life: what bends the trend? Eur Media Technol Everyday Life Netw 2000–2003, Institute for Prospective Technological Studies, Directorate General Joint Research Center, European Commission
Schindehutte M, Morris MH, Pitt LF (2009) Rethinking marketing—The entrepreneurial
imperative. Pearson Education, New Jersey
Suchman L (1987) Plans and situated actions: the problem of human-machine Communication.
Cambridge University Press, Cambridge
Suchman L (2005) Introduction to plans and situated actions II: human-machine reconfigurations,
2nd expanded edn. Cambridge University Press, New York/Cambridge
ter Maat M, Heylen D (2009) Using context to disambiguate communicative signals. In:
Esposito A, Hussain A, Marinaro M, Martone R (eds) Multimodal signals, LNAI 5398.
Springer, Berlin, pp 164–169
Vilhjálmsson HH (2009) Representing communicative function and behavior in multimodal
communication. In: Esposito A, Hussain A, Marinaro M, Martone R (eds) Multimodal signals:
cognitive and algorithmic issues. Springer, Berlin, pp 47–59
Weiser M (1991) The computer for the 21st century. Sci Am 265(3):94–104
Part I
Enabling Technologies and Computational Processes and Capabilities
Chapter 2
Ambient Intelligence: A New Computing Paradigm and a Vision of a Next Wave in ICT
2.1 Introduction
AmI has emerged in the past 15 years or so as a new computing paradigm and a
vision of a next wave in ICT. It postulates a paradigmatic shift in computing and
offers a vision of the future of ICT—with far-reaching societal implications, rep-
resenting an instance of the configuration of social-scientific knowledge. AmI is a
multidisciplinary field (within ubiquitous computing) where a wide range of sci-
entific and technological areas and human-directed sciences converge on a common
vision of the future and the enormous opportunities and immense possibilities such a future will open up and bring, created by the incorporation of
machine intelligence into people’s everyday lives. In other words, it is said to hold
great potential and promise in terms of social transformations. As such, it has
increasingly gained legitimacy as an academic and public pursuit and discourse in
the European information society: scientists and scholars, industry experts and
consortia, government science and technology agencies, science and technology
policymakers, universities, and research institutes and technical laboratories are
making significant commitments to AmI.
By virtue of its very definition, implying a certain desired view on the world, it
represents more a vision of the future than a reality. And as shown by and known
from preceding techno-visions and forecasting studies, the future reality is most
likely to end up being very different from the way it is initially envisioned. Indeed,
techno-visions seem to face a paradox, in that they fail to balance between inno-
vative and futuristic claims and realistic assumptions. This pertains to unreasonable
prospects, of limited modern applicability, on how people, technology, and society
will evolve, as well as to a generalization or oversimplification of the rather specific
or complex challenges involved in enabling future scenarios or making them real. Also, crucially, techno-utopia is a relevant risk in such a strong focus on
ambitious and inspiring visions of the future of technology. Techno-utopian dis-
courses surround the advent of new technological innovations or breakthroughs, on
the basis of which these discourses promise revolutionary social changes. The
central issue with techno-visions is the technologically deterministic view under-
lying many of the envisioned scenarios, ignoring or falling short in considering the
user and social dynamics involved in the innovation process.
Furthermore, recent years have—due to the introduction of technological innovations or breakthroughs and their amalgamation with recent discoveries in human-directed sciences—witnessed an outburst of claims for new paradigms and paradigm shifts in relation to a plethora of visions of next waves in ICT, AmI included—a kind of new-paradigm and paradigm-shift epidemic.
Many authors and scholars have a tendency to categorize AmI—as recent techno-
scientific achievements or advances in S&T—as a paradigm and thus paradigm
shift in relation to computing, ICT, society, and so on. In fact, there has been a near
passion for labeling new technological visions as paradigms and paradigm shifts as
a way to describe a certain stage of technological development within a given
society. While such visions emanate from the transformational effects of comput-
ing, predominately, where paradigm and paradigm shift actually hold, they still
entail a lot of aspects of discursive nature in the sense of a set of concepts, ideas,
claims, assumptions, premises, and categorizations that are historically contingent
and socio-culturally specific and generate truth effects accordingly. The underlying
assumption is that while AmI, as a set of new technological applications, is the result of
scientific discovery or innovation, it is still directed towards humans and targeted at
complex, dynamic social realities made of an infinite richness of circumstances, and
involving intertwined factors and situated social dynamics. In other words, AmI has
been concerned with people-centered approaches in the practice of technological
development. I therefore argue that there is a computing paradigm profile relating to
AmI, as to ubiquitous computing (of which AmI constitutes one of the major visions), but there is no paradigm in society, nor should there be. Accordingly, AmI as a technological vision involves paradigmatic, non-paradigmatic, pre-paradigmatic, and post-paradigmatic dimensions, as well as discursive aspects.
However, at the technological level, AmI is characterized by human-like cog-
nitive and behavioral capabilities, namely context awareness, implicit and natural
interaction, and intelligence (cognitive, emotional, social, and conversational). By
being equipped with advanced enabling technologies and processes and what this
entails in terms of miniature smart sensors, sophisticated data processing and
machine learning techniques, and hybrid modeling approaches to knowledge representation and reasoning, AmI should be capable of thinking and behaving intelligently in support of human users, by providing personalized, adaptive, responsive, and
proactive services in a variety of settings: living spaces, workspaces, social and
public places, and on the move. With the progress in the fields of microelectronics
(i.e., miniaturization and processing power of sensing and computing devices),
embedded systems, wireless and mobile communication networks, and software
intelligent agents/user interfaces, the AmI vision is evolving into a deployable and
achievable computing paradigm.
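To make these three service behaviors concrete, the following minimal Python sketch (all class and function names are hypothetical, invented for illustration rather than drawn from any AmI platform) contrasts responsiveness, adaptation, and anticipation as behaviors of a single ambient service:

```python
from dataclasses import dataclass

@dataclass
class Context:
    """A deliberately simplified snapshot of the user's situation."""
    location: str      # e.g., "home", "office"
    activity: str      # e.g., "reading", "commuting"
    time_of_day: int   # hour of day, 0-23

class AmbientService:
    """Illustrative sketch of responsive, adaptive, and proactive behavior."""

    def __init__(self):
        self.preferences = {}  # learned (location, activity) -> feedback
        self.history = []      # past contexts, used for anticipation

    def respond(self, ctx: Context) -> str:
        """Responsive: react to the current context."""
        if ctx.activity == "reading":
            return "raise reading light, mute notifications"
        return "no immediate action"

    def adapt(self, ctx: Context, feedback: str) -> None:
        """Adaptive: revise learned preferences from user feedback."""
        self.preferences[(ctx.location, ctx.activity)] = feedback

    def anticipate(self, ctx: Context) -> str:
        """Proactive: predict a need from recurring past patterns."""
        similar = [c for c in self.history
                   if c.location == ctx.location
                   and c.time_of_day == ctx.time_of_day]
        self.history.append(ctx)
        if similar:
            return "pre-arrange environment as on similar past occasions"
        return "observe and learn"

service = AmbientService()
print(service.respond(Context("home", "reading", 21)))
```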
The aim of this chapter is to give insights into the origin and context of the AmI
vision; to shed light on the customary assumptions behind the dominant vision of
AmI, underlying many of its envisioned scenarios, and provide an account on its
current status; to outline and describe a generic typology for AmI; to provide an
overview of the technological factors behind AmI and the many, diverse research
topics and areas associated with AmI; to introduce and describe human-directed
sciences as well as artificial intelligence and their relationships and contributions to
AmI; and to discuss key paradigmatic, non-paradigmatic, pre-paradigmatic, and
post-paradigmatic dimensions of AmI. Moreover, this chapter intends to provide
essential underpinning conceptual tools for exploring the subject of AmI further in
the remaining chapters.
Much of what characterizes AmI can be traced back to the origins of ubiquitous
computing. AmI as a new computing paradigm has evolved as a result of an
evolutionary technological development, building upon preceding computing par-
adigms, including mainframe computing, desktop computing, multiple computing,
and ubiquitous computing (UbiComp). As a vision of a next wave in ICT, a kind of
shift in computer technology and its role in society, AmI became widespread and
prevalent in Europe about a decade after the emergence of the UbiComp vision in
the USA, a future world of technology first sketched in 1991 by Mark Weiser,
chief scientist at the Xerox Palo Alto Research Center (PARC) in California, when
he published a paper in Scientific American which spoke of a third generation of
computing systems, an era when computing technology would vanish into the
background. Weiser (1991) writes: ‘First were mainframes, each shared by lots of
people. Now we are in the personal computing era, person and machine staring
uneasily at each other across the desktop. Next comes ubiquitous computing, or the
age of calm technology, when technology recedes into the background of our lives.
Alan Kay of Apple calls this “Third Paradigm” computing’. So, about 25 years ago,
Mark Weiser predicted this technological development and described it in his
influential article “The Computer for the 21st Century” (Weiser 1991). Widely
credited as the first to have coined the term ‘ubiquitous computing’, Weiser alluded
to it as omnipresent computing devices and computers that serve people in their
everyday lives, functioning unobtrusively in the background of their consciousness
and freeing them from tedious routine tasks. In a similar fashion, the European
Union’s Information Society Technologies Advisory Group (ISTAG) used the term
‘ambient intelligence’ in its 1999 vision statement to describe a vision where
‘people will be surrounded by intelligent and intuitive interfaces embedded in
everyday objects around us and an environment recognizing and responding to the
presence of individuals in an invisible way’ (ISTAG 2001, p. 1). In the European
vision of AmI (or the future information society), ‘the emphasis is on greater
user-friendliness, more efficient services support, user-empowerment, and support
for human interactions’ (ISTAG 2001, p. 1). Issues on key difference between the
two visions and concepts are taken up in the next section.
The research within UbiComp and the development of the vision in the USA have been furthered in concert with universities, research centers and laboratories, governmental agencies, and industries. The universities involved include MIT, Berkeley, Harvard, Yale, Stanford, Cornell, and Georgia Tech's College of Computing, among others. As an example, MIT has contributed significant research in
the field of UbiComp, notably Hiroshi Ishii’s Things That Think consortium at the
Media Lab and Project Oxygen. It is worth pointing out that research undertaken at
those universities has been heavily supported by government funding, especially by
the Defense Advanced Research Projects Agency (DARPA), which is the central
research and development organization for the Department of Defense (DoD), and
the National Science Foundation (NSF), an independent federal agency. Many corporations have additionally undertaken UbiComp research, either on their own or in consortia with other companies and/or universities; these include Microsoft, IBM, Xerox, HP, Intel, Cisco Systems, and Sun Microsystems.
Inspired by the UbiComp vision, the AmI vision in Europe was promoted by
certain stakeholders—a group of scholars and experts, a cluster of ICT companies,
research laboratories, governmental agencies, and policymakers. AmI was originally developed in 1998 by Philips, for the time frame 2010–2020, as a vision of the
future of ICT (consumer electronics, telecommunications, and computing) where
user-friendly devices support ubiquitous information, communication, and enter-
tainment. In 1999, Philips joined the Oxygen alliance, an international consortium
of industrial partners within the MIT Oxygen project. In 2000, plans were made to
construct a feasibility and usability facility dedicated to AmI. A major step in
developing the vision of AmI in Europe came from ISTAG, a group of scholars and industry experts who first advanced the vision of AmI in 1999. In that year, ISTAG published a vision statement for the European Community's Framework Programme (FP) 5 for Research and Technological Development (RTD) that laid down a challenge to start creating an AmI landscape. During 2000, a scenario exercise was launched, as a collaborative endeavor between the Joint Research Center's Institute for Prospective Technological Studies (IPTS-JRC) and DG Information Society, to assist in developing a better understanding of the implications of this landscape; the development and testing of scenarios involved about 35 experts from across Europe. In parallel with the development of the AmI vision at Philips (at the time, the ISTAG working group was chaired by Dr. Martin Schuurmans, CEO of Philips Industrial Research), a number of other initiatives started to explore AmI further with the launch of, and the funneling of expenditure into, research projects. ISTAG continued to develop the vision under the IST program of the European Union (EU) FP6 and FP7 for RTD. Since 1999, it has made consistent efforts to secure increased attention for ICT and a higher pace of ICT development in Europe (Punie 2003).
vocal champion for, the vision of AmI. With ISTAG and the EU IST RTD funding
program, huge efforts have been made in the EU to mobilize research and industry
towards laying the foundation of an AmI landscape and realizing the vision of AmI.
There has been strong governmental and institutional support for AmI.
Notwithstanding the huge financial support and funding provided and the intensive
research in academic circles and in the industry, coupled with the strong interest
stimulated by European policy makers, the current state of research and develop-
ment shows that the vision of AmI is facing enormous challenges and hurdles in its
progress towards realization and delivery in Europe.
The history of ICT and social studies of new technologies have shown the importance of social innovation as an ingredient in technology innovation and the central role of multiple methods of participative design as innovation instruments; the dominant vision has also been criticized for failing to make explicit the consideration for human values and concerns in the design choices and decisions that will shape AmI technology. Seeing the user as a shaper of technology, these views call for a more active participatory role for users in technology innovation and design, and thereby challenge the passive role of the user as a mere adopter of new technologies (e.g., Alahuhta and Heinonen 2003).
Furthermore, putting emphasis on the user in AmI innovation research plays a key role in the development of related applications and services. However, it is questionable whether the current or dominant user-centered design approaches—albeit originating from participatory design—place the user at as central a stage as they often claim, a claim that goes together with the vision of AmI (e.g., Criel and Claeys 2008). As to the humanistic philosophy of technology design, experience has shown that it is very challenging to give people the lead and consider their values and concerns in the ways systems and applications are developed and applied. In other words, the difficulty with the human-centered design approach is that it is far from clear how this can be achieved, owing to the paucity of knowledge and the lack of tools to integrate user behavior as a parameter in system design and product and service development (Punie 2003; Riva et al. 2003). As to social innovation,
while it is considered decisive in producing successful technological systems as
well as in the acceptance of new technologies, it is often seen to be very challenging
as well as too costly and time-consuming for technology creators to take on board.
Regardless, in reference to the AmI vision, Aarts and Grotenhuis (2009) underscore
the need for a value shift: ‘…we need a more balanced approach in which tech-
nology should serve people instead of driving them to the max’. This argument
relates to social innovation in the sense of directing the development of new
technologies towards responding to users’ needs and addressing social concerns. In
other words, technological development has to be linked with social development.
The underlying assumption is that failing to make this connection is likely to result
in people rejecting new technologies and in societal actors misdirecting and misallocating resources, e.g., the mobilization of professionals, experts, companies, and technical R&D.
Nevertheless, as many argue, visions of the future of technology are meant to
provoke discussion or promote debate and depict plausible futures or communicate
possible scenarios, in addition to mobilizing and marshalling resources and inspiring and aligning key stakeholders in the same direction. As Gunnarsdóttir and
Arribas-Ayllon (2012, p. 30) point out, ‘[t]he AmI vision emerges from a pedigree
of expectations about the future of computing…The original scenarios are central to
making up new worlds and building expectations around prospective lifestyles and
users. Rhetorically, they contribute to conditions that make visions of AmI seem-
ingly possible. But they also engender capacities to investigate what is actually
possible. Incorporating new challenges and anticipating problems modulates the
course of expectations… New visions are adapted to accommodate contingent
futures—uncertainties about design principles, experiences, identities and …'
AmI and UbiComp share many similar assumptions, claims, ideas, terminologies,
and categorizations. They depict a vision of the future information society where
everyday human environment will be permeated by computer intelligence and
technology: humans will be surrounded and accompanied by advanced sensing and
computing devices, intelligent multimodal interfaces, intelligent software agents,
and wireless and ad hoc networking technology (networks formed spontaneously, without prior planning, from available network elements), which are everywhere, invisibly
embedded in human natural surroundings, in virtually all kinds of everyday objects
in order to make them smart. This computationally augmented everyday environ-
ment is aware of people’s presence and context, and is adaptive, responsive, and
anticipatory to their needs and desires, thereby intelligently supporting their daily
lives through providing unlimited services in new, intuitive ways and in a variety of
settings. In other words, smart everyday objects can interact and communicate with
each other and other people’s objects, explore their own environment (situations,
events, locations, user states, etc.), and interact with human users, therefore helping
them to cope with their daily tasks in a seamless and intuitive way.
While AmI and UbiComp visions converge on the pervasion of microprocessors
and communication capabilities into everyday human environments and thus the
omnipresence and always-on interconnection of computing resources and services,
AmI places a particularly strong focus on intelligent interfaces that are sensitive to
users’ needs, adaptive to and anticipatory of their desires and intentions, and
responsive to their emotions. Philips has distinguished AmI from UbiComp as a
related vision of the future of technology, characterizing the AmI vision in terms of its incorporation into diverse spheres of living and working, and the applications (Punie 2003).
In fact, the vision of the future of technology is reflected in a variety of terms that
closely resemble each other, including, in addition to AmI and UbiComp, pervasive
computing, ubiquitous networking, everywhere computing, sentient computing,
proactive computing, calm computing, wearable computing, invisible computing,
affective computing, haptic computing, the Internet of Things, Things that Think,
and so on. These terms are used by different scholars and industry players to
promote the future vision of technology in different parts of the world. For example,
AmI is used in Europe, and the term was coined by Emile Aarts of Philips Research
in 1998 and adopted by the European Commission. Its equivalent in the USA is
UbiComp; Mark Weiser is credited with coining the term in the late 1980s, during his tenure as Chief Scientist/Technologist at the Xerox Palo Alto Research
Center (PARC). He wrote some of the earliest papers on the subject, largely
defining it and sketching out its major concerns (Weiser 1991; Weiser et al. 1999).
Ubiquitous networking is more prevalent in Japan. Essentially all these terms mean
pretty much the same thing: regardless of their locations, researchers are all
investigating and developing similar technologies and dealing with similar chal-
lenges and problems (see Wright 2005).
Different types of UbiComp systems have been proposed based upon merging different sets of core properties, including ubiquity and transparency; distribution, mobility, intelligence, and augmented reality; autonomy and iHCI; AmI; and so forth.
A generic typology for AmI can improve its definition, reduce or remove the ambiguity surrounding what constitutes it, and thereby assist in the development of AmI systems. While typologies are not panaceas, a generic one for AmI systems is necessary, as it helps to define what AmI is and what it is not and assists the designers and developers of AmI systems and applications in gaining a better understanding of AmI as a new computing paradigm (Gill and Cormican 2005). A typology commonly refers to the study and interpretation of types, or a taxonomy according to general type; it thus groups models or artifacts that describe different aspects of the same or shared characteristics. There exist various approaches to AmI
typology, involving technological or human views or a combination of these and
supporting different characteristics pertaining to computational tasks and compe-
tencies depending on the application domain, among others. There exist many
theoretical models in the literature (e.g., Aarts and Marzano 2003; Hellenschmidt and
Kirste 2004; Riva et al. 2005; Gill and Cormican 2005) that look at technological
dimensions as to what enables or initiates an AmI system or take a combined view
of the characteristic of what an AmI system should involve, that is, what constitutes
and uniquely distinguishes AmI from other computing paradigms or technologies.
Based on the foundational tenets of AmI as a paradigm that builds upon a people-centered philosophy, Gill and Cormican (2005) propose an AmI system typology based on a combined perspective—the technological and human sides of AmI—involving tasks and skills as two main areas that together define what an
AmI system should entail—what is and what is not an AmI system. As illustrated in
Fig. 2.1, the outer ring represents the tasks that the AmI system needs to recognize and respond to, and the inner ring represents the skills that the AmI system should encompass. The authors state that the tasks (habits, needs, gestures, emotions, and context) are human-oriented, in that they represent the human characteristics that the AmI system must be aware of, whereas the skills (sensitive/responsive, intuitive/adaptive, people-centered, and omnipresent) are technology-oriented, in that they represent the technology characteristics that the AmI system must have or inherently accomplish as abilities to interact with human actors. They also note that the link between the two areas is of an inseparable, interlinked, and interdependent nature.
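As a rough illustration, Gill and Cormican's two-ring typology can be encoded as a simple data structure; the sketch below (in Python, with names of our own invention) captures the idea that a candidate system counts as AmI only when both rings are fully covered:

```python
from dataclasses import dataclass, field

@dataclass
class AmITypology:
    # Outer ring: human-oriented tasks the system must recognize and respond to
    tasks: set = field(default_factory=lambda: {
        "habits", "needs", "gestures", "emotions", "context"})
    # Inner ring: technology-oriented skills the system must possess
    skills: set = field(default_factory=lambda: {
        "sensitive/responsive", "intuitive/adaptive",
        "people-centered", "omnipresent"})

    def qualifies(self, tasks_covered: set, skills_present: set) -> bool:
        """Reflects the inseparable link between the two rings: a system
        is AmI only if it covers every task and exhibits every skill."""
        return self.tasks <= tasks_covered and self.skills <= skills_present

typology = AmITypology()
print(typology.qualifies(
    {"habits", "needs", "gestures", "emotions", "context"},
    {"sensitive/responsive", "intuitive/adaptive",
     "people-centered", "omnipresent"}))  # True: both rings covered
```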
To elaborate further on the link between the tasks and skills, the AmI system
needs to take care of needs, be sensitive to users, anticipate and respond intelli-
gently to spoken or gestured indications of desire, react to explicit spoken and
gestured commands, support the social processes of humans and be competent
agents in social interactions, engage in intelligent dialog or mingle socially with
human users, and elicit pleasant user experiences and positive emotions in users.
AmI thus involves supporting different kinds of needs associated with living, work,
social, and healthcare environments. These needs differ as to the necessity level—
i.e., either they improve the quality of people’s lives or sustain human lives. For
AmI technology to be able to interact with the human actor—what it must innately
accomplish as its aptitudes—and thus provide efficient services in support of the
user, it has to be equipped with such human-like computational capabilities as
context awareness functionality (see Chap. 3), natural interaction and intelligent
behavior (see Chap. 6), emotional and social intelligence (see Chap. 8), and cog-
nitive supporting behavior (see Chap. 9). These computational competencies enable
AmI systems to provide adaptive, responsive, and anticipatory services.
Responsiveness, adaptation, and anticipation (see Chap. 6 for a detailed account
and discussion and Chaps. 8 and 9 for application examples) are based either on
pre-programmed heuristics or real-time learning and reasoning capabilities. However,
according to Gill and Cormican (2005, p. 6) for an AmI system to be
sensitive/responsive, it ‘needs to be tactful and sympathetic in relation to the
feelings of the human actor, has to react quickly, strongly, or favorably to the
various situations it encounters. In particular, it needs to respond and be sensitive to
a suggestion or proposal. As such, it needs to be responsive, receptive, aware,
perceptive, insightful, precise, delicate, and most importantly finely tuned to the
requirements of the human actor and quick to respond’. For AmI to be adaptive, it
‘needs to be able to adapt to the human actor directly and instinctively. This should
be accomplished without being discovered or consciously perceived therefore it
needs to be accomplished instinctively i.e., able to be adjusted for use in different
conditions. The characteristics it is required to show are spontaneity, sensitivity,
discerning, insightful and at times shrewd’ (Ibid). And for AmI to be anticipatory
and proactive, it needs to predict the human actor’s needs and desires and pre-act in
a way that is articulated as desirable and appropriate and without conscious
mediation. It is required to think on its own, make decisions based on predictions or
expectations about the future, and act autonomously so the human actor does not
have to work to use it—the AmI system frees people from manual control of the
environment. As such, it needs to be predictive, aware, knowledgeable, experi-
enced, and adaptively curious and confident. This characteristic is, according to
Schmidhuber (1991), important to decrease the mismatch between anticipated states
and states actually experienced in the future. He introduces the concept of curiosity
for intelligent agents as a measure of the mismatch between expectations and future
experienced reality.
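Read numerically, and under our own hypothetical naming, Schmidhuber's idea can be sketched as a one-line measure: curiosity as the size of the gap between an anticipated state and the state actually experienced:

```python
def curiosity(predicted: float, observed: float) -> float:
    """Prediction error as a simple curiosity signal (our illustrative
    reading of Schmidhuber's 1991 mismatch idea, not his exact formula)."""
    return abs(predicted - observed)

# Two anticipated-versus-experienced room temperatures: the second situation
# yields a larger mismatch and is therefore more "interesting" to the model,
# which should explore it to improve its predictions.
readings = [(21.0, 21.5), (21.0, 25.0)]
print([curiosity(p, o) for p, o in readings])  # [0.5, 4.0]
```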
Considering the nascent nature of the AmI paradigm, any proposed typology for AmI normally results from, and builds on, previous, ongoing, and/or future (theoretical and empirical) research in the area of AmI, thereby evolving continuously with the
purpose of improving definitions and reducing the ambiguity around what consti-
tutes AmI. Indeed, since the inception of AmI, a number of typologies have been,
and continue to be, developed, revised, refined, restructured, expanded, or adapted
to reflect various renditions pertaining to the amalgamation of computational tasks
and competencies—how they have been, and are being, combined in relation to
various application domains (e.g., ambient assisted living, smart home environ-
ment, workspace, healthcare environment, social environment, etc.) as to what they
entail in terms of the underlying technologies used for the implementation of AmI
systems (e.g., capture technologies, data processing methods, pattern recognition
techniques, modeling and reasoning approaches, etc.) and in terms of the nature of
intelligent services to be provided. Therefore, typologies constantly evolve as new
research results transpire and knowledge advances. This process will continue as
AmI evolves as a computing paradigm and becomes more established and popular as
an academic discourse.
However, the existing literature on AmI remains heavy on speculation and weak
on empirical evidence and theory building—extant typologies, frameworks, and
models have poor explanatory power, and the applications and systems that have
been developed in the recent years are far from real-world implementation, i.e.,
generally evaluated and instantiated in laboratory settings. This concerns more the
vision of ‘human-centric computing’, as most of the many concepts that have
already been tested out as prototypes in field trials relate more to the vision of
UbiComp. Hence, a thorough empirical and theory-building endeavor is necessary for AmI, as both a new computing paradigm and a vision of a next wave in ICT, to have strong academic buy-in and practical relevance in relation to the future form of technological development in the information society. At present, the
growth of academic interest in AmI as a ‘paradigmatic shift in computing and
society’ (Punie 2003) is such that it is becoming part of mainstream debate in the
technological social sciences in Europe.
Computing can be defined as: ‘any goal-oriented activity requiring, benefiting from, or creating
computers. Thus, computing includes designing and building hardware and soft-
ware systems for a wide range of purposes; processing, structuring, and managing
various kinds of information; doing scientific studies using computers; making
computer systems behave intelligently; creating and using communications and
entertainment media; finding and gathering information relevant to any particular
purpose, and so on. The list is virtually endless, and the possibilities are vast’
(ACM, AIS and IEEE-CS 2005, p. 9).
According to Kuhn (1962, 1996), a paradigm denotes the explanatory power and
thus universality of a theoretical model and its broader institutional implications for
the structure, organization, and practice of science. A theoretical model is a theory or
a group of related theories designed to provide explanations within a scientific
domain or subdomain for a community of practitioners—in other words, a scientific
discipline- or subfield-shared cognitive or intellectual framework encompassing the
basic assumptions, ways of reasoning, and approaches or methodologies that are
universally acknowledged by a scientific community. A comprehensive theoretical
model involves a conceptual foundation for the domain; describes problems within the domain and specifies their solutions; is grounded in prior
empirical findings and scientific literature; is able to predict outcomes in situations
where these outcomes can occur far in the future; guides the specification of a priori
postulations and hypotheses; uses rigorous methodologies to investigate them; and
provides a framework for interpretation and understanding of unexpected outcomes
or results of scientific investigations. Kuhn’s notion of paradigm is based on the
existence of an agreed upon set of concepts for a scientific domain, and this set forms
or constitutes the shared knowledge and specialized language of a discipline (e.g.,
computer science) or sub-discipline (e.g., artificial intelligence, software engineer-
ing). This notion of paradigm (an all-encompassing set of assumptions resulting in the organization of scientific theories and practices) involves searching for an invariant dominant paradigm governing scientific research. And ‘successive transition from
one paradigm to another via revolution is the usual developmental pattern of mature
science’ (Kuhn 1962, p. 12). This is what Kuhn (1962) dubbed ‘paradigm shifts’.
A paradigm shift is, according to him, a change in the basic assumptions, thought
patterns or ways of reasoning, within the ruling theory of science—in other words, a
radical and irreversible scientific revolution from a dominant scientific way of
looking at the world. This applies to computing, as I will try to exemplify below. In
accordance with Kuhn’s (1962) conception, a paradigm shift in computing should
meet three conditions or encompass three criteria: it must be grounded in a
anytime’ (Punie 2003, p. 12). This paradigm shift ‘has the objective to make
communication and computer systems simple, collaborative and immanent.
Interacting with the environment where they work and live, people will naturally
and intuitively select and use technology according to their own needs’ (Riva et al.
2003, p. 64).
Further to Kuhn’s (1996) conception of paradigm shift, AmI, stemming from UbiComp, is accepted by a community of practitioners and has a body of successful
practice. As mentioned earlier, there is a strong institutional and governmental
support for and commitment to AmI—industry associations, scholarly and scientific
research community, and policy and politics. The research and innovation within
AmI are active across Europe at the levels of technology farsightedness, science and
technology policy, research and technology development, and design of next
generation technologies (see Punie 2003; Wright 2005). They pertain predomi-
nantly to the areas of microelectronics (miniaturization of mechatronic systems,
devices, and components), embedded systems, and distributed computing. In par-
ticular, the trends toward AmI are noticeably driving research and development into
ever smaller sizes of computing devices. AmI is about smart dust: networked miniature sensors and actuators and micro-electro-mechanical systems (MEMS) incorporating smart micro-sensors and actuators with microprocessors and several other components so small as to be virtually indiscernible or invisible. The miniaturization trend is increasingly enabling the development of various types and
formats of sensing and computing devices that allow registering and processing
various human parameters (information about people) in an unobtrusive way, without
disturbing users or actors (see Chap. 4 for more detail on miniaturization trends and
related issues).
In the very near future, both the physical and human world will be overwhelmed
by or strewn with huge quantities of tiny devices (e.g., active and passive RFID
tags), embedded into everyday objects and attached to people, for the purpose of
their identification, traceability, and monitoring. Today, RFID tags are attached to
many objects and are expected to be embedded in virtually all kinds of everyday
objects, with the advancement of the Internet of Things. In recent years, efforts have
been directed towards designing remote devices and simple isolated appliances that might be acceptable to the users and consumers of AmI technology, which
‘prepares the ground for a complete infiltration of our environment with even more
intelligent and interconnected devices. People should become familiar with AmI;
slowly and unspectacularly; getting used to handing over the initiative to artificial
devices. There is much sensing infrastructure already installed for handling secu-
rity… What remains to be done is to shift the domain of the intended monitoring
just enough to feed the ongoing process of people getting used to these controls and
forgetting the embarrassment of being permanently monitored, in other words—
having no off-switch’ (Crutzen 2005, p. 220). At present, the environment of
humans, the public and the private, is pervaded by huge quantities of active devices
of various types and forms, computerized enough to automate day-to-day decisions
and thus act autonomously on behalf of human agents. However, the extensive incorporation of computer technology into people's everyday lives, and thus the institutional dimension, entails that there are clear political advantages to a break with the existing societal paradigm—which is not fully technologized—whereby AmI finds strong institutional (and governmental) support.
The main goal of AmI is to make computing technology available everywhere, simple to use
and intuitive to interact with, and accessible to people with minimal technical
knowledge. The AmI vision is evolving towards an achievable and deployable
computing paradigm, thanks to the recent advances in embedded systems, micro-
electronics, wireless communication networks, multimodal user interfaces, and
intelligent agents. These enabling technologies are expected to evolve even more.
They are a key prerequisite for realizing the AmI vision, especially in terms of its UbiComp dimension: the technology necessary for turning the vision into reality and making it happen. AmI systems are increasingly maturing and proliferating across a
range of application domains.
Embedded systems constitute one of the components for ambience in AmI. AmI
is characteristically embedded: many networked devices are integrated into the
environment. The recent advances in embedded systems have brought significant
improvements. Modern embedded systems, which are dedicated to handling a particular task, are based on microcontrollers (i.e., processors with integrated memory and peripheral interfaces). An embedded system is a computer system with a dedicated task, often involving reactive computing.
Common to all explicit user interfaces is that the user explicitly requests an action from the
computer, the action is carried out by the computer, and then the system responds
with an appropriate reply. In AmI computing, on the other hand, the user and the
system are in an implicit interaction where the system is aware of the context in
which it operates or is being used and responds or adapts its behavior to the
respective context. This relates to iHCI: ‘the interaction of a human with the
environment and with artifacts’ as a process which entails that ‘the system acquires
implicit input from the user and may present implicit output to the user’ (Schmidt
2005, p. 164). Hence, iHCI involves a number of the so-called naturalistic user
interfaces, including facial user interfaces, gesture user interfaces, voice interfaces,
motion tracking interfaces, eye-based interfaces, and so on.
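The contrast can be caricatured in a few lines of Python; the function names and the sensed fields below are illustrative assumptions, not drawn from any particular iHCI toolkit:

```python
def explicit_interaction(command: str) -> str:
    """Explicit HCI: the user requests an action, the system executes
    it and responds with a reply."""
    if command == "turn on lights":
        return "lights on"
    return "unknown command"

def implicit_interaction(sensed: dict) -> str:
    """iHCI: the system takes implicit input (sensed context) and adapts
    its behavior without any explicit request, producing implicit output."""
    if sensed.get("ambient_lux", 1000) < 50 and sensed.get("occupied"):
        return "lights raised gently"
    return "no adaptation needed"

print(explicit_interaction("turn on lights"))
print(implicit_interaction({"ambient_lux": 20, "occupied": True}))
```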
The intelligent agent as a paradigm became widely recognized during the 1990s
(Russell and Norvig 2003; Luger and Stubblefield 2004), a period that marked the
emergence of the UbiComp vision. In computing, the term ‘intelligent agent’ may be
used to describe a software agent that has some intelligence, a certain degree of
autonomy, ability to react to the environment, and goal-oriented behavior. There are
many different types of agents (see Chap. 6), but common to all of them is that they
act autonomously on behalf of users—deciding and executing tasks on their own authority. Intelligent agents represent one of the most promising
technologies in AmI—intelligent user interfaces—because they are associated with
computational capabilities such as adaptation, responsiveness, and anticipation
relating to service delivery. Accordingly, capture technologies, pattern recognition
techniques, ontological and hybrid modeling and reasoning techniques, and actu-
ators have attracted increasing attention as AmI computing infrastructures and
wireless communication networks become financially affordable and technically mature.
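A bare-bones sketch of such an agent, exhibiting the properties just listed (autonomy, reactivity to the environment, and goal-oriented behavior), might look as follows; it is a toy illustration under assumed names, not a reference implementation:

```python
class IntelligentAgent:
    """A toy agent: goal-oriented, reactive, and autonomous."""

    def __init__(self, goal_temp: float):
        self.goal_temp = goal_temp  # goal-oriented: a target to maintain

    def perceive(self, environment: dict) -> float:
        # Reactive: read the current state of the environment.
        return environment["temperature"]

    def act(self, environment: dict) -> str:
        # Autonomous: decide and execute on its own authority,
        # on behalf of the user.
        temp = self.perceive(environment)
        if temp < self.goal_temp - 0.5:
            return "heating on"
        if temp > self.goal_temp + 0.5:
            return "cooling on"
        return "idle"

agent = IntelligentAgent(goal_temp=21.0)
print(agent.act({"temperature": 18.0}))  # -> "heating on"
```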
In all, intelligent environments in which AmI can exist, involving the home, work, learning, and social settings, are increasingly becoming computationally augmented: equipped with smart miniature sensors and actuators and
information processing systems. These intelligent environments will be common-
place in the very near future. This can be explained by the dramatic reduction in the
cost and the advancement of computing, networking, and communication tech-
nologies, which have indeed laid the foundations for the vision of AmI to become
an achievable computing paradigm. In sum, it can be said that AmI is primarily
based on technological progress in the aforementioned fields. The required research
components in which significant progress has to be made in order to further develop
and realize the AmI vision include: in terms of the ambient component, MEMS and sensor technology, embedded systems, ubiquitous communications, input and output device technology, adaptive software, and smart materials; and, in terms of the intelligence component, contextual awareness, natural interaction, computational
intelligence, media handling and management, and emotional computing (ISTAG
2003).
As a result of the continuous effort to realize and deploy the AmI paradigm, which
continues to unfold due to the advance and prevalence of multi-sensory, minia-
turized devices, smart computing devices, and advanced wireless communication
networks, all AmI areas are under vigorous investigation in the creation of smart
environments, ranging from low-level data collection (i.e., sensing, signal pro-
cessing, fusion), to intermediate-level information processing (i.e., recognition,
interpretation, reasoning), to high-level application and service delivery (i.e.,
adaptation and actions), to networking and middleware infrastructures. As a mul-
tidisciplinary paradigm and a ‘crossover approach’, AmI is strongly linked to many topics related to computer science, artificial intelligence, and networking.
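The three processing levels named above can be wired together as one pipeline; the following Python skeleton is purely illustrative (the rules and field names are invented), but it shows how low-level fusion feeds intermediate interpretation, which in turn drives high-level service delivery:

```python
def collect(sensors: list) -> dict:
    """Low level: sensing, signal processing, and fusion of raw readings."""
    return {s["name"]: s["value"] for s in sensors}

def interpret(fused: dict) -> str:
    """Intermediate level: recognition, interpretation, and reasoning."""
    if fused.get("motion") and fused.get("hour", 12) >= 23:
        return "user heading to bed"
    return "unknown situation"

def deliver(situation: str) -> str:
    """High level: adaptation and actions (service delivery)."""
    actions = {"user heading to bed": "lower lights, lock doors"}
    return actions.get(situation, "do nothing")

sensors = [{"name": "motion", "value": True},
           {"name": "hour", "value": 23}]
print(deliver(interpret(collect(sensors))))  # -> "lower lights, lock doors"
```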
In terms of computer science, artificial intelligence, and networking, topics
include, and are not limited to: context-aware, situated, affective, haptic, sentient,
wearable, invisible, calm, smart, mobile, distributed, and location-aware computing;
embedded systems; knowledge-based and perceptual user interfaces; micropro-
cessors and information processing units; machine learning and reasoning tech-
niques; ontological modeling and reasoning techniques; real-time operating systems; multi-agent software; human-centered software engineering; sensor systems and networks; MEMS and NEMS; multimodal communication protocols;
wireless and mobile communication networks; smart materials for multi-application
smart cards; embodied conversational agents; and so forth (Punie 2003; Bettini
et al. 2010; Schmidt 2005; Oulasvirta and Salovaara 2004; Chen and Nugent 2009;
Picard 2000; Sanders 2009; Lyshevski 2001; Vilhjálmsson 2009).
Creating AmI environments requires collaboration among scholars and experts from several research areas of AmI, which can be clustered into: ubiquitous communication and networking, context awareness, intelligence, and natural HCI.
The first area involves fixed, wireless, mobile, and ad-hoc networking systems,
discovery mechanisms, software architectures, system integration, and mobile
devices. The second area encompasses sensors, smart devices, and software
architectures for multi-platform interfaces, as well as capture, tracking, positioning,
monitoring, mining, and aggregation techniques. The third area includes pattern
recognition algorithms, ontological modeling and reasoning, and autonomous
intelligent decision making. The last area involves multimodal interaction, hyper-
media interfaces, and agent-based interfaces. These areas have some overlaps
among them.
Psychology is the scientific study of mental processes and behavior. Cognitive psychology is one of the more recent approaches and additions to psychological research; it is the subfield of psychology that studies internal mental information-manipulation processes and internal structures.
Cognitive psychology, cognitive science, and AI involve the study of the phe-
nomenon of cognition or intelligence, with cognitive psychology focused on the
nature of cognition in humans, cognitive science in both humans and computers,
and AI particularly in machines and computers. Sharing the aim of understanding the nature and organizing principles of the mind, they address everything from low-level perception mechanisms to high-level reasoning and what these entail, thereby spanning many levels of analysis. They all pride themselves on their scientific basis and experimental rigor. As contributors to the cognitive revolution, they
are built on the radical notion that it is possible to study, with scientific precision,
the actual processes of thought. Insofar as research methods are taken to be com-
putational in nature, AI has come to play a central role in cognitive science
(Rapaport 1996). And given its interdisciplinary nature, cognitive science espouses
a wide variety of methodologies, drawing on scientific research methods from
cognitive psychology, cognitive neuroscience, and computer science. Cognitive
science and AI use computer intelligence to understand how humans think.
Computers as tools are widely used to investigate various cognitive phenomena.
In AI, computational modeling makes use of simulation techniques to investigate
how human intelligence may be structured (Sun 2008). Testing computer programs
by what they accomplish and how they accomplish it is said, in the field of AI, to
be doing cognitive science: using AI to understand the human mind. Cognitive
science also provides insights into how to present information to or structure
knowledge for human beings so they can use it most effectively in terms of pro-
cessing and manipulation. In addition, cognitive science employs cognitive para-
digms to understand how information processing systems such as computers can
simulate cognition or how the brain implements information-processing functions.
In relation to this, del Val (1999) suggests that in order for cognitive psychology to
be useful to AI, it needs to study common-sense knowledge and reasoning in
realistic settings and to focus on studying how people do well the things they do
well. Also, analyzing AI systems provides ‘a new understanding of both human
intelligence and other intelligences. However, it is difficult to study the mind with a
similar one—namely ours. We need a better mirror. As you will see, in artificial
intelligent systems we have this mirror’ (Fritz 1997). Moreover, both cognitive scientists and cognitive psychologists were long the protagonists of reason and therefore tended to reinforce the view that emotions interfere with cognition, but have now discovered, building on more than two decades of mounting work, that it is impossible to understand how we think without understanding how we experience emotions. This area of study has become a prime focus in AI—specifically affective computing—in recent years (addressed in the previous chapter).
Furthermore, core theoretical ideas of cognitive science, of which psychology is
the thematic heart, are drawn from AI; many cognitive scientists try to build
functioning models of how the mind works. AI is considered one of the fields (in
addition to linguistics, neuroscience, philosophy, anthropology, and psychology)
that contributed to the birth of cognitive science (Miller 2003). Cognitive science
could be synonymous with AI when the mind is understood as something that can
be simulated through software and hardware—a computer scientist’s view (Boring
2003). AI and cognitive psychology are closely allied endeavors, with AI focused on ways of engineering intelligent entities. Cognitive psychology evolved as one of the significant facets of the interdisciplinary subject of
cognitive science, which attempts to amalgamate a range of approaches in research
on the mind and mental processes (Sun 2008). Owing to the use of computational
metaphors and terminology, cognitive psychology has benefited greatly from the
flourishing of research in cognitive science and AI. One major contribution of
cognitive science and AI to cognitive psychology is the information processing
model of cognition. This is the dominant paradigm in the field of psychology: a way of thinking and reasoning about mental processes that envisions them as software programs running on the computer that is the human brain. In this account,
humans are viewed as dynamic information processing systems whose mental
operations are described in computational terminology, e.g., inputs, structures,
representations, processes, and outputs, and metaphors, e.g., the mind functions as a
computer. The cognitive revolution was, from its inception, guided by the metaphor
that the mind is like a computer, and ‘cognitive psychologists were interested in the
software’ programs, and this ‘metaphor helped stimulate some crucial scientific
breakthroughs. It led to the birth of AI and helped make our inner life a subject
suitable for science’ (Lehrer 2007). ‘The notion that mental states and processes
intervene between stimuli and responses sometimes takes the form of a “compu-
tational” metaphor or analogy, which is often used as the identifying mark of
contemporary cognitive science: The mind is to the brain as software is to hard-
ware; mental states and processes are (like) computer programs implemented (in the
case of humans) in brain states and processes’ (Rapaport 1996, p. 2). All in all,
advances in AI, discoveries in cognitive science, and advanced understanding of
human cognition (as an information processing system) are, combined, generating a whole set of fertile insights and new ideas that are increasingly altering the way we
think about how we think and how we should use this understanding to advance
technology towards the level of human functioning. One corollary of this is the
socio-technological phenomenon of AmI, especially the intelligent behavior of AmI
systems associated with facilitating and enhancing human cognitive intelligence,
thanks to cognitive context awareness and natural interaction.
Cognitive science is widely applied across several fields and has much to its
credit, owing to its widely acknowledged accomplishments beyond AI and AmI. It
has offered a wealth of knowledge to the field of computing and computer science,
especially foundational concepts and theoretical models which have proven to be
valuable and seminal in the design and modeling of computing systems—the way
they cognitively function and intelligently behave (e.g., social intelligence, emo-
tional intelligence, and conversational intelligence). Indeed, it is widely acknowledged that the major strides cognitive science has made in the past two decades, coupled with recent discoveries in computing and advances in AI, have led to the phenomenon of AmI: the birth of a new paradigm in computing and a novel approach to HCI. In more detail, the amalgamation of recent discoveries in
cognitive science—which make it possible to acquire a better understanding of the cognitive information-processing aspects of the human mind—and the breakthroughs at the level of the enabling technologies and computational processes and capabilities (e.g., context awareness, natural interaction, and intelligent behavior) makes it increasingly possible to build ground-breaking intelligent (human-inspired) systems
based on this understanding. This new development entails advanced knowledge in
human functioning as to cognitive, emotional, behavioral, and social aspects and
processes and how they interrelate, coupled with innovations pertaining to system
engineering, design, and modeling. Moreover, the evolving wave of research in
computing has given rise to, and continues to inspire, a whole range of new
computing trends, namely context-aware, affective, haptic, situated, invisible, sentient, calm, and aesthetic computing. In particular, the interdisciplinary
research approach increasingly adopted in the field of computing is qualitatively
shaping research endeavors towards realizing the full potential of AmI as a com-
puting paradigm. This approach has generated a wealth of interactional knowledge
about the socio-technological phenomenon of AmI.
Cognitive science spans many levels of analysis pertaining to the human mind and the artificial brain, from low-level sensation, perception, and action mechanisms to high-level reasoning, inference, and decision making. This entails a range of brain functional systems, including the cognitive, neural, evaluation, decision, motor, and monitoring systems. One major research
challenge in AmI is to create context-aware computers that are able to adapt in
response to the human users’ cognitive states and processes, with the aim to
facilitate and enhance their cognitive intelligence abilities when performing tasks in
a variety of settings.
Linguistics is the scientific study of natural language, the general and universal
properties of language. It covers the structure, sounds, meaning, and other
dimensions of language as a system. Linguistics encompasses a range of single and
interdisciplinary subfields. Single subfields include morphology, syntax, phonol-
ogy, phonetics, lexicon, semantics, and pragmatics, and interdisciplinary subfields
include sociolinguistics, psycholinguistics, cognitive linguistics, and neurolinguis-
tics (see Chap. 6 for a detailed account). It collaborates with AI, cognitive science,
cognitive psychology, and neurocognitive science. Chapter 6 provides an overview
addressing the use of computational linguistics: structural linguistics, linguistic
production, and linguistic comprehension as well as psycholinguistics, neurolin-
guistics, and cognitive linguistics in relation to conversational agents and other AI
systems.
Human communication is the field of study that is concerned with how humans
communicate, involving all forms of verbal and nonverbal communication. As a
natural form of interaction, it is highly complex, manifold, and dynamic.
2.12.9 Philosophy
References
Aarts E (2005) Ambient intelligence drives open innovation. ACM J Interact 12(4):66–68
Aarts E, Grotenhuis F (2009) Ambient intelligence 2.0: towards synergetic prosperity. In:
Tscheligi M, Ruyter B, Markopoulos P, Wichert R, Mirlacher T, Meschterjakov A,
Reitberger W (eds) Proceedings of the European Conference on Ambient Intelligence.
Springer, Salzburg, pp 1–13
Aarts E, Marzano S (2003) The new everyday: visions of ambient intelligence. 010 Publishers,
Rotterdam
Aarts E, Harwig R, Schuurmans M (2002) Ambient intelligence. In: Denning P (ed) The invisible future: the seamless integration of technology in everyday life. McGraw-Hill, New York, pp 235–250
Alahuhta P, Heinonen S (2003) A social and technological view of ambient intelligence in everyday life: what bends the trend? Research report RTE 2223/03, VTT, Espoo
Azodolmolky S, Dimakis N, Mylonakis V, Souretis G, Soldatos J, Pnevmatikakis A,
Polymenakos L (2005) Middleware for in-door ambient intelligence: the polyomaton system.
In: Proceedings of the 2nd international conference on networking, next generation networking
middleware (NGNM 2005), Waterloo
Ben-Ari M (1990) Principles of concurrent and distributed programming. Prentice Hall Europe,
New Jersey
Bettini C, Brdiczka O, Henricksen K, Indulska J, Nicklas D, Ranganathan A, Riboni D (2010) A
survey of context modelling and reasoning techniques. J Pervasive Mob Comput Spec Issue
Context Model Reasoning Manage 6(2):161–180
Boring RL (2003) Cognitive science: at the crossroads of the computers and the mind. Assoc
Comput Mach 10(2):2
Bourdieu P (1988) Homo academicus. Stanford University Press, Stanford
Bourdieu P, Wacquant L (1992) An invitation to reflexive sociology. University of Chicago Press,
Chicago
Burgelman JC (2001) How social dynamics influence information society technology: lessons for
innovation policy. OECD, social science and innovation. OECD, Paris, pp 215–222
Chen L, Nugent C (2009) Ontology-based activity recognition in intelligent pervasive
environments. Int J Web Inf Syst 5(4):410–430
Cornelius R (1996) The science of emotions. Prentice Hall, Upper Saddle River
Criel J, Claeys L (2008) A transdisciplinary study design on context-aware applications and
environments, a critical view on user participation within calm computing. Observatorio
(OBS*) J 5:057–077
Cross N (2001) Designerly ways of knowing: design discipline versus design science. Des Issues
17(3):49–55
Crutzen CKM (2005) Intelligent ambience between heaven and hell. Inf Commun Ethics Soc 3
(4):219–232
Luger G, Stubblefield W (2004) Artificial intelligence: structures and strategies for complex
problem solving. The Benjamin/Cummings Publishing Company, San Francisco
Lyshevski SE (2001) Nano- and microelectromechanical systems: fundamentals of nano- and
microengineering. CRC Press, Boca Raton
March ST, Smith GF (1995) Design and natural science research on information technology. Decis
Support Syst 15:251–266
McCarthy J (2007) What is artificial intelligence? Computer Science Department, Stanford
University, Stanford
McCorduck P (2004) Machines who think. AK Peters Ltd, Natick
Miles I, Flanagan K, Cox D (2002) Ubiquitous computing: toward understanding European
strengths and weaknesses. European Science and Technology Observatory Report for IPTS,
PREST, Manchester
Miller GA (2003) The cognitive revolution: a historical perspective. Trends Cogn Sci 7:141–144
Norman DA (1981) What is cognitive science? In: Norman DA (ed) Perspectives on cognitive
science. Ablex Publishing, Norwood, pp 1–11
Ortony A, Clore GL, Collins A (1988) The cognitive structure of emotions. Cambridge University
Press, Cambridge
Oulasvirta A, Salovaara A (2004) A cognitive meta-analysis of design approaches to interruptions
in intelligent environments. In: CHI 2004, late breaking results paper, Vienna, Austria, 24–29
Apr 2004, pp 1155–1158
Passer MW, Smith RE (2006) The science of mind and behavior. McGraw-Hill, Boston
Picard R (2000) Perceptual user interfaces: affective perception. Commun ACM 43(3):50–51
Poole D, Mackworth A, Goebel R (1998) Computational intelligence: a logical approach. Oxford
University Press, New York
Poslad S (2009) Ubiquitous computing: smart devices, environments and interaction. Wiley,
Hoboken
Punie Y (2003) A social and technological view of ambient intelligence in everyday life: what
bends the trend? In: The European media and technology in everyday life network, 2000–2003,
Institute for Prospective Technological Studies Directorate General Joint Research Center
European Commission
Rapaport WJ (1996) Understanding understanding: semantics, computation, and cognition,
pre-printed as technical report 96–26. SUNY Buffalo Department of Computer Science,
Buffalo
Riva G, Loreti P, Lunghi M, Vatalaro F, Davide F (2003) Presence 2010: the emergence of
ambient intelligence. In: Riva G, Davide F, IJsselsteijn WA (eds) Being there: concepts, effects
and measurement of user presence in synthetic environments. IOS Press, Amsterdam
Riva G, Vatalaro F, Davide F, Alcañiz M (2005) Ambient intelligence: the evolution of
technology, communication and cognition towards the future of human-computer interaction.
IOS Press, Amsterdam
Rose S (1997) Lifelines: biology beyond determinism. Oxford University Press, Oxford
Russell JA (2003) Core affect and the psychological construction of emotion. Psychol Rev 110(1):145–172
Russell S, Norvig P (2003) Artificial intelligence—a modern approach. Pearson Education, Upper
Saddle River
Sanders D (2009) Introducing AI into MEMS can lead us to brain-computer interfaces and
super-human intelligence. Assembly Autom 29(4):309–312
Scherer KR, Schorr A, Johnstone T (eds) (2001) Appraisal processes in emotion: theory, methods,
research. Oxford University Press, New York
Schmidhuber J (1991) Curious model building control systems. In: International joint conference
on artificial neural networks, IEEE, Singapore, pp 1458–1463
Schmidt A (2005) Interactive context-aware systems interacting with ambient intelligence. In:
Riva G, Vatalaro F, Davide F, Alcañiz M (eds) Ambient intelligence: the evolution of
technology, communication and cognition towards the future of human-computer interaction.
IOS Press, Amsterdam
3.1 Introduction
research, based on findings that every human interaction is contextual, that is, situated: defined and influenced by the context of a situation as humans perceive and evaluate it over time. In other words, context has become of particular interest to the HCI community, as interaction with applications and their interfaces increasingly takes place in less well-structured environments.
Context awareness technology has been of prime focus in AmI research.
Research on context awareness has been intensively active for over two decades in
academic circles as well as in the industry, spanning a range of computing fields,
including HCI, AmI, UbiComp, mobile computing, and AI (e.g., affective com-
puting, conversational agents). Indeed, recent years have witnessed a great interest in, and a proliferation of scholarly writings on, the topic of context awareness,
reflecting both the magnitude and diversity of research in the field of context-aware
computing. The body of research on the use of context awareness technology for
developing AmI applications that are flexible, adaptable, and possibly capable of
acting autonomously on behalf of users continues to flourish within a variety of
application domains. As research shows, it is becoming increasingly evident that AmI environments, and hence the context-aware applications that can support living, work, and social places, will be commonplace in the near future due to recent developments in computer hardware, software, and networking technologies.
These encompass miniaturized sensors, sensor networks, pattern recognition/
machine learning techniques, ontological context modeling and reasoning tech-
niques, intelligent agents, wireless and mobile communication technology, mid-
dleware platforms, and so forth. Most of these technologies constitute the object of
subsequent chapters, whereby they are described, discussed, and put into per-
spective to provide an understanding of their role in the functioning of AmI
applications and environments. However, while there exist numerous technologies
for the development and implementation of context-aware applications, which
indicate that most research focuses on the development of technologies for context
awareness as well as the design of context-aware applications, there is a need,
within the field of context-aware computing, for conducting further studies with
regard to understanding how users perceive, use, and experience context-aware
interaction in different settings. In other words, the focus should be shifted from the technological to the human and social dimensions of AmI. Context awareness poses many issues and challenges that should be addressed and overcome in order to realize the full potential of the AmI vision.
AmI constitutes an approach to HCI that is built upon the concept of implicit HCI (iHCI) (see Chap. 6). Creating an ambient intelligent human–computer interface is based on the iHCI model, which takes the user's context into account as implicit input. One key feature of this model is the use of natural human forms of communication, based on verbal and nonverbal multimodal communication behavior (see Chap. 7). These can be used by iHCI applications to acquire contextual information about the user, e.g., emotional, cognitive, and physiological states and actions, so as to respond intelligently to the user's current context. iHCI applications also use and respond to other subsets of context associated with the environment, such as places, locations, and physical conditions. In this chapter, context awareness is primarily considered from the viewpoint of HCI applications.
Given the scope of this book, the emphasis is on AmI applications showing
human-like understanding, interacting, and intelligent behavior in relation to cog-
nitive, emotional, social, and conversational processes of humans. Furthermore, to
establish implicit interaction, particularly context-aware functionality, various ele-
ments are required in order to collect, fuse, aggregate, process, propagate, interpret,
and reason about context information in support of users’ needs. These computa-
tional elements are addressed in the next chapter.
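Before moving on, the iHCI idea can be made concrete with a minimal Python sketch. All names, states, and rules below are hypothetical, invented for exposition rather than drawn from any cited system; the point is only that implicit contextual cues, acquired without explicit user commands, drive the selection of an application behavior.

from dataclasses import dataclass

@dataclass
class ImplicitContext:
    # Implicit inputs an iHCI application might acquire without explicit commands.
    emotional_state: str = "neutral"   # e.g., inferred from facial expression or voice
    cognitive_load: str = "low"        # e.g., inferred from gaze or task switching
    location: str = "unknown"          # e.g., inferred from positioning sensors

def respond(ctx: ImplicitContext) -> str:
    # Select an application behavior from implicit context rather than explicit input.
    if ctx.emotional_state == "frustrated" and ctx.cognitive_load == "high":
        return "suppress notifications and simplify the interface"
    if ctx.location == "meeting_room":
        return "switch to silent, glanceable output"
    return "default interactive behavior"

print(respond(ImplicitContext(emotional_state="frustrated", cognitive_load="high")))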
Furthermore, research shows that context awareness has proven to be a complex, multilevel problem with regard to realization. First, delimiting what constitutes context and context information has been no easy task, and this difficulty overarches all research in context-aware computing. Several conceptual and technical definitions have been suggested, generating a cacophony that has led to an exasperating confusion in the field. Second, the current vision of user centrality in AmI technology design has been questioned and continues to be a subject of criticism. This is related to the issue of the disappearance of user interfaces associated with context-aware applications, in terms of who defines the context and adaptation rules, and to other HCI issues. Third, detecting contextual data using state-of-the-art sensor technologies, and how this affects reasoning processes in terms of inferring high-level abstractions of contexts from limited and imperfect data, seems to be an insurmountable issue in the field of AmI. Fourth, modeling context, especially human-factor-related context (e.g., emotional state, cognitive state, social state, etc.), has proven to be one of the most challenging tasks when it comes to context representation and reasoning and the related knowledge domains and adaptation rules. Of these issues, this chapter covers the first and second topics. The two remaining issues are addressed in Chaps. 4 and 5, respectively.
Specifically, this chapter looks into the concept of context in relation to both
human interaction and HCI, espousing a transdisciplinary approach, and delves into
the technological and social dimensions of context awareness, focusing on key
aspects that are theoretically disputable and questionable in the realm of AmI and
pointing out key challenges, open issues, and limitations.
context and context awareness is conceptualized in cognitive science, AI, and AmI.
The situated nature and inherent complexity of interaction (as cognitive and social
process and behavior) make it very difficult to grasp context and context awareness in relation to human interaction. Human interaction, while systematic, is never planned; instead, it is situated and ad hoc, done for a particular purpose as necessary, for its circumstances are never fully anticipated and are continuously changing, to draw on Suchman (2005). It entails meaning, which is subjective and evolves in time and is hence open to re-interpretation and re-assessment; this meaning influences the perception of the context of a situation, which defines and shapes human interaction. Hence, a transdisciplinary approach remains the most relevant way to look at context as a complex problem, as it insists on the fusion of different elements of a set of theories, with a result that exceeds the simple sum of each. Aiming for transdis-
ciplinary insight, the present study of context draws on several theories, such as
situated cognition, situated action, social interaction, social behavior, human com-
munication, and so on. Understanding the tenets of several pertinent theories allows
a more complete understanding of context both in relation to human interaction and
to HCI. Among the most holistic, these theories are drawn from cognitive science,
social science, humanities, philosophy, constructivism, and constructionism. The
intent here is to set side-by-side elements of a relevant set of theories that have clear
implications for the concept under study—context.
Tackling the topic of context, fully grasping it, and clearly conceptualizing it are
all difficult tasks. The underlying assumption is that context touches upon the
elementary structures of interactions in the everyday life world. Human interaction
is a highly complex and manifold process due to the complexity inherent in that
which constitutes context that defines, shapes, and changes that interaction. This
complexity lies particularly in the interlocking and interdependent relationship
between diverse subsets of rather subjectively perceived contextual entities, not
only as to persons, but also to objects, events, situations, and places. So context is
more about meanings that are constructed in interaction with these entities than
about these entities. The constructivist worldview posits that human interaction is always contextually situated, and meaning is ascribed to it within this changing context, i.e., through evolving perceptions or reinterpretations of a situation. This is related to the view that reality is one of intersubjectively constructed meanings that are defined in interaction with regard to the different entities involved in a given situation, rather than a world that consists of epitomes or facts that epitomize objects. The latter is the objectivist worldview, where distinct objects have properties independent of the observer, that is, the meaning of a phenomenon is inherent to the phenomenon and can be experienced by interacting with it.
However, context is interwoven with the view on social and physical reality and the
ontological nature and structure of the life-world—how phenomena and things in
the reality are related and classified—with respect to social interaction. The social and human sciences posit that cognitive, social, and cultural contexts must be taken into account for explaining social interactions and related processes, a perspective which emphasizes contextualizing behavior when seeking to explain social behavior.
and so on. In other words, context constitutes an infinite richness of assumptions and
factors, against which relevant facts and concerns are delimited in the form of
dynamic, collective interweaving of internal and external entities, including moti-
vational, emotional, cognitive, physiological, biochemical, pragmatic, empirical,
ethical, normative, intellectual, behavioral, relational, paralinguistic, extra-linguistic,
social, cultural, situational, physical, and spatiotemporal elements. Hence, context
can be described as a complex set, or the totality, of intertwined circumstances which
provide a setting for interaction. It is in terms of the setting formed by those cir-
cumstances that everything can be fully understood, evaluated, and eventually
reacted to. In all, contextual assumptions selected on the basis of daily life situations enable us to delimit the relevant facts and concerns that condition our judgments, claims, and decisions against myriad other circumstances, and the overall context conditions
our perceptions and understandings of the social world—meaning, truth, relevance,
and rationality—and hence our notions of actions in it. Context is ‘the totality of
contextual assumptions and selections that give meaning and validity to any piece of
information; that is, context awareness is an ideal, and ideals usually resist complete
realization. This is why we need them: because they resist full realization, they give
us critical distance to what is real.’ (Ulrich 2008, p. 6)
However, in context-aware computing, it is important to look at everyday human
interactions and the way in which they get shaped and influenced by context, when
attempting to model and implement context awareness in computing systems. To
understand the relationship between human context and interaction, there is a need
to excavate to add much to our current understanding as to what constitutes context
and what underlies the selectivity of our contextual assumptions that condition—
defines, surrounds, and continuously change—our (inter)actions. However, there is
a fundamental difference between human and nonhuman context awareness—
context awareness in computing. According to Ulrich (2008, p. 7), the crucial
difference between the two ‘can be expressed in various ways. In the terms of
practical philosophy…, human context includes the dimension of practical-
normative reasoning in addition to theoretical-empirical reasoning, but machines
can handle the latter only. In phenomenological terms, human context is not only a
“representational” problem (as machines can handle it) but also an “interactional”
problem, that is, an issue to be negotiated through human interaction…. In semiotic
terms, finally, context is a pragmatic rather than merely semantic notion, but
machines operate at a syntactic or at best approximated semantic level of under-
standing’. This conspicuous difference implies that the specifics of context in real
life are too selective, subjective, subtle, fluid, and difficult to identify, capture, and
represent in computationally formal models. This would subsequently make it difficult for context-aware applications to make sensible estimations about the
meaning of what is happening in the surrounding situation or environment, e.g.,
what someone is feeling, thinking, or needing at a given moment, and to undertake
in a knowledgeable manner actions that improve our wellbeing or support our tasks.
Indeed, it always makes sense to question contextual assumptions that condition
our interaction, as the context needs to be selected, framed, negotiated, and
reconstructed, and thus is never given in the first place, and this goes much deeper
than how interactive computer systems understand us and our context and what
they decide for us, e.g., every human interaction involves a situational, physical,
psychological, social, and ethical/moral context.
combination in order to yield optimal context recognition results (see Chaps. 4 and 5 for illustrative examples). Examples of context-aware applications include: emotion-aware, cognitive task-aware, activity-aware, location-aware, event-aware, situation-aware, conversational context-aware, and affective context-aware systems. Indeed, conversational and affective systems, a category which falls under AI research, have recently started to focus on context, namely dialog, environmental, and cultural context, and on the contextual appropriateness of emotions and multimodal context-aware affective interaction, respectively (see Chaps. 7 and 8).
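Since such systems typically combine several recognizers to yield better context recognition results, confidence-weighted voting is one plausible (though by no means the only) fusion strategy. The minimal sketch below, with hypothetical labels and confidence values, illustrates the principle.

from collections import Counter

def fuse_predictions(predictions: list[tuple[str, float]]) -> str:
    # Combine outputs of several context recognizers by confidence-weighted voting.
    scores = Counter()
    for label, confidence in predictions:
        scores[label] += confidence
    return scores.most_common(1)[0][0]

# Hypothetical outputs of three recognizers observing the same user.
print(fuse_predictions([("stressed", 0.7), ("stressed", 0.6), ("relaxed", 0.8)]))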
Furthermore, judging from most of the context research thus far, one of the main issues continues to be the lack of clarity of, or the ambiguity surrounding, what constitutes context: how to define the term and how properly or best to make use of it. There is an exasperating lack of agreement as to what characterizes context: there are almost as many different technical definitions as there are research areas within context-aware computing. Researchers in the field seem to have no propensity to espouse an agreed-upon technical definition. Hence, it is likely that context will continue to take on different technical connotations depending on its context of use. Yet there is a need for a more comprehensive definition of context with high potential to be actually implementable in context awareness architectures; in other words, an operational definition that enables context-aware applications to sense and combine as many aspects of context as possible for better understanding of, and thus satis-
fying users’ needs. Towards this end, it is important to focus on a discussion of the
difference between context in its original complex definition and the so-called
ontological, logical, and probabilistic models of context being implemented in AmI
applications. It is also of significance to shift the focus of the context debate from whether it is technically feasible to capture the (complex) meaning of context in a more theoretic view, to what can be done to develop innovative technologies, techniques, and mechanisms pertaining to design and modeling that make it possible to operationalize complex concepts of context, close to context as understood in those academic disciplines specializing in, or devoted to, the study of context (see Goodwin and Duranti 1992 for an overview). The significance of taking this into
account stems from the high potential to enhance the functioning and performance
of context-aware applications, and thus the acceptance and use of AmI technology.
Especially, at the current stage of research, it seems to be unfeasible to adopt a
conceptual or theoretical definition given the constraints of existing technologies and
engineering practice that dictate the design and modeling of computational artifacts.
Indeed, the development of context-aware artifacts appears to be technology-driven,
driven by what is technically feasible rather than by what constitutes context in
real-world scenarios. This implies that some, if not most, cognitive, emotional, and social aspects of context cannot be sensed by existing technology. Consequently, the context determined, or the ambience created, by context-aware artifacts may differ from what the people involved in the situation have negotiated and from how they perceive the actual context, that is, the subjective, socially situated interpretation of context. Indeed,
context is more about meanings that are constructed in interaction with entities, such as objects, people, places, events, situations, environments, and so on, than about the entities as such, which is a strange switch to make in light of the constraints of the state-of-the-art enabling technologies and computational processes.
The purpose here is to take a theoretical tour through the various ways of under-
standing or interpreting context. The definition of context is still a matter of debate,
although defining core concepts is a fundamental step in carrying out scientific
research. The scholarly literature on context awareness, whether theoretical,
empirical, or analytical, shows that the definition of context has widely been rec-
ognized to be a difficult issue to tackle in context-aware computing. This difficulty overarches all research in context awareness, notwithstanding that the semantics of what constitutes context and context information have been studied extensively and discussed profusely. Many definitions have been suggested. They are often classified into technical and conceptual: a restricted application-specific approach and an encompassing theoretical approach. A technical definition of context is
associated with the technical representation of context information in a particular
application domain. It is technology-driven, that is, driven by what is technically feasible with regard to the existing enabling technologies, especially the sensors used to measure features of context and the representation formalisms used to encode and reason about context information. Accordingly, this technical approach entails that the context of the application is defined by the designer and bounded by his or her conception of how to operationalize, and thus conceptualize, context. In this case, the
representation of the context of an entity in a system is of interest to a service
storage or provider for assessing the relevance and user-dependent features of the
service to be delivered. In all, a technical definition can be applied to a context representation as a computational and formal scheme and provides ways to distinguish (subsets of) contexts from each other, e.g., location, emotional state, cognitive state, task, activity, time, spatial extent, and so on. Examples of technical
definitions can be found in Schmidt et al. (1999), Turner (1999), Chen and Kotz
(2000), Strang et al. (2003), Loke (2004), Kwon et al. (2005), Lassila and Khushraj
(2005) and Kim et al. (2007).
Similarly, there are many conceptual definitions of context. The most cited one in
the literature is the one provided by Dey (2000), from the perspective that
context-aware applications look at the who’s, where’s, when’s and what’s of dif-
ferent entities and use this information to determine why a situation is occurring. He
accordingly describes context as: ‘any information that can be used to characterize
the situation of an entity. An entity is a person, place, or object that is considered
relevant to the interaction between a user and an application, including the user and
applications themselves. Context is typically the location, identity and state of
people, groups and computational and physical objects’ (Dey 2000; Dey et al. 2001).
Dey’s definition depicts that the concept of entity is fundamentally different from that of context: context is what can be said about, or used to describe, an entity. For example,
a user as an entity has such constituents of context as location, emotional state,
cognitive state, intention, social setting, cultural setting, and so on. Dey (2000) also
provides a comprehensive overview of existing definitions of context (e.g., Schilit
et al. 1994; Pascoe 1998; Chen and Kotz 2000; Hull et al. 1997; Brown 1996). These
are adopted in the literature on context awareness. Pascoe (1998) suggests that
context as a subjective concept is defined by the entity that perceives it. Chen and
Kotz (2000) make a distinction between active and passive aspects of context by
defining context as ‘a set of environmental states and settings that either determines
an application’s behavior or in which an application event occurs and is interesting to
the user’. Schilit et al. (1994) view context as the user’s location, the social situation,
and the nearby resources. Turning to further context definitions, Schmidt et al. (1999) describe
context using a context model with three dimensions of Environment (physical and
social), Self (device state, physiological and cognitive), and Activity (behavior and
task). Göker and Myrhaug (2002) present the AmbieSense system, where the user context
encompasses five elements: environment context, personal context, task context,
social context, and spatiotemporal context. Looking at the ‘Context of Work’, Kirsh
(2001) suggests a more complex description of context: ‘highly structured amalgam
of informational, physical and conceptual resources that go beyond the simple facts
of who or what is where and when to include the state of digital resources, people’s
concepts and mental state, task state, social relations and the local work culture, to
name a few ingredients.’ This definition captures a good many aspects of what constitutes a context at a conceptual level. It encapsulates more of the features that make up context than other definitions do. Yet this definition, like most conceptual definitions, remains far from real-world implementation: difficult to operationalize or turn into workable systems, given the existing technological boundaries. Indeed, as noted by Kirsh (2001), there are many aspects of context that
have not yet been technologically sensed and could be very difficult to capture,
highlighting that this is a non-trivial task. Besides, context frameworks derived from
theoretical definitions of context are usually not based on a systematic analysis of
context and need to be supported by empirical data. Furthermore, Abowd and
Mynatt (2002) suggest that context can be thought of in terms of ‘who is using the
system’; ‘for what the system is being used’; ‘where the system is being used’;
‘when the system is being used’; and ‘why the system is being used’. While this
approach provides initial definitions of key features of context that could be rec-
ognized, it indicates neither how these features relate to specific activities nor how
they could be combined in the inference of a context abstraction. This applies to similar definitions that suggest such classification schemes as person, task, object, situation, event, and environment. Speaking of relationships between entities,
Crowley et al. (2002) introduce the concepts of role and relation in order to char-
acterize a situation. Roles involve only one entity, describing its activity. An entity is
observed to play a role. Relations are defined as predicate functions on several
entities, describing the relationship between entities playing roles. A related defi-
nition provided by Gross and Prinz (2000) describes an (awareness) context as ‘the
interrelated conditions in which something exists or occurs’.
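To make the contrast between such conceptual definitions and their technical operationalization more tangible, the following sketch shows one plausible way Dey's entity-centered definition might be rendered as a data structure. The attribute names are hypothetical, and a real context awareness architecture would involve far more machinery.

from dataclasses import dataclass, field
from typing import Any

@dataclass
class Entity:
    # A person, place, or object relevant to the user-application interaction.
    name: str
    kind: str  # "person", "place", or "object"
    context: dict[str, Any] = field(default_factory=dict)  # what can be said about it

user = Entity("user42", "person", context={
    "location": "office",
    "emotional_state": "calm",      # human-factor contexts: the hardest to sense
    "cognitive_state": "focused",
    "social_setting": "alone",
})
print(user.context["location"])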
alienating the concept from its complex meaning as related to human interaction—
in more theoretic view—to serve technological purposes. Building context-aware
artifacts is not an easy task, and the implementation of context awareness is
computationally limited and its implications are not always well understood. There
is ‘little awareness of human context awareness in this fundamental and rich sense,
which relates what we see and do not only to our physical environment but equally
to our emotional, intellectual, cultural and social-interactive environment; our sense
of purpose and value, our interests and worldviews; our identity and autonomy as
human beings. Instead, context appears to have been largely reduced to the physical
(natural as well as architectural or otherwise human-constructed) properties of the
location and its nearby environment, including wireless capabilities and the
opportunities they afford for co-location sensing and information retrieval.
The advantage, of course, is that we can then entrust our devices and systems with
the task of autonomously detecting, and responding to, features of the context…
I would argue that information systems research and practice, before trying to
implement context awareness technically, should invest more care in understanding
context awareness philosophically and should clarify, for each specific application,
ways to support context-conscious and context-critical thinking on the part of users.
In information systems design, context-aware computing and context-critical
thinking must somehow come together, in ways that I fear we do not understand
particularly well as yet.’ (Ulrich 2008, pp. 4, 8).
single entity and does not depend on the relationship with other entities’, e.g., the
location of a spatial entity, such as a person or a building. Relational context
describes ‘a type of context that depends on the relation between distinct entities’,
e.g., containment which defines a containment relationship between entities, such as
an entity building that contains a number of entity persons. Related to developments
in foundational ontologies, this categorization of context is analogous to the
ontological categories of moment defined in Guizzardi (2005).
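A minimal sketch of this intrinsic/relational distinction might look as follows; the entities, coordinates, and containment test are hypothetical illustrations, not drawn from Guizzardi's ontological account.

from dataclasses import dataclass

@dataclass
class SpatialEntity:
    name: str
    position: tuple[float, float]  # intrinsic context: depends on this entity alone

def contains(bounds: tuple[tuple[float, float], tuple[float, float]],
             entity: SpatialEntity) -> bool:
    # Relational context: a containment relation between two distinct entities.
    (x_min, y_min), (x_max, y_max) = bounds
    x, y = entity.position
    return x_min <= x <= x_max and y_min <= y <= y_max

person = SpatialEntity("Alice", (3.0, 4.0))
building = ((0.0, 0.0), (10.0, 10.0))  # bounding box of a hypothetical building entity
print(contains(building, person))       # True: the building contains the person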
with a particular emphasis on AmI systems that aim at providing intelligent services
in relation to user’s cognitive, emotional, and social needs. To establish context-
aware functionality, various computational components are required to collect, fuse,
aggregate, process, and propagate context information in support of users’ needs,
desires, and intentions. This involves a wide variety of technologies, including
miniaturized multi-sensors, pattern recognition/machine learning techniques, onto-
logical modeling and reasoning techniques, intelligent agents, networking, wireless
and mobile communication, middleware platforms, and so on. In particular with
sensor technology, AmI systems are augmented with awareness of their milieu, which contributes to enhancing such autonomic computing features as self-learning, self-configuring, self-executing, and self-optimizing. These autonomic features, enhanced by the efficiency of multi-sensor fusion technology as well as the gain of rich information, offer great potential to boost the functionality of context-aware applications to the extreme, thus providing countless smart services to users within a
variety of settings: at work, at home, and on the move. Research in sensor technology
is rapidly burgeoning. With the advancement and prevalence of sensors (in addition
to computing devices), context-aware applications are increasingly proliferating,
spanning a variety of domains.
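The collect-aggregate-interpret chain mentioned above can be caricatured in a few lines; the sensor stubs, feature names, and thresholds below are entirely hypothetical and stand in for the far richer processing that real middleware performs.

def collect(sensors):
    # Low-level acquisition: read a raw sample from each sensor stub.
    return {name: read() for name, read in sensors.items()}

def aggregate(samples):
    # Intermediate processing: derive features from raw samples.
    return {"indoors": samples["light"] < 300, "moving": samples["accel"] > 0.5}

def interpret(features):
    # High-level inference: map features to an abstract context label.
    if features["indoors"] and not features["moving"]:
        return "working_at_desk"
    return "on_the_move"

sensors = {"light": lambda: 120.0, "accel": lambda: 0.1}  # hypothetical sensor stubs
print(interpret(aggregate(collect(sensors))))             # -> working_at_desk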
While numerous technologies for the design, development, and implementation of context-aware applications are rapidly advancing and maturing, given the intensive research on context awareness conducted over the last two decades in academic circles and in industry, the R&D activities and projects within, and thus the advancement of, context awareness technology differ from one application domain
awareness than others as well as to, arguably, the complexity associated with some
types of context compared to others. Examples of areas that have witnessed intensive research in the field of context-aware computing include location-aware and spatiotemporal-aware applications, in relation to both ubiquitous and mobile computing, in addition to activity-based context-aware applications in the context of assisted living within smart home environments. Cognitive, emotional, social, and conversational context awareness, on the other hand, is an area in the field of AmI that has only recently started to attract researchers. Recent studies (e.g., Kwon et al. 2005; Kim et al. 2007; Zhou and Kallio 2005; Zhou et al. 2007; Cearreta et al. 2007; Samtani et al. 2008) have started to focus on this research topic; thus, it is still in its infancy, and this wave of research nevertheless appears to be evolving at a snail's pace.
applications automatically according to the context, e.g., when the mobile phone is close to a phone it runs the contact list application, and in the supermarket it executes the shopping list application. A common example of an active context-aware application is a mobile phone that changes its time automatically when it enters a new time zone, or that restricts phone calls when the user is in a meeting. This is opposed to passive context-aware applications, whereby a mobile phone would instead notify or prompt the user to perform the action. Context awareness is even more important when it
comes to complex mobile devices in which productivity tools with communication
and entertainment devices converge to make mobiles highly multifunctional, per-
sonal smart devices. Especially, mobile phones have been transformed into a ter-
minal capable of accessing the internet, receiving television, taking pictures,
enabling interactive video telephony, reading RFIDs, sending a print request to a
printer at home, and much more (see Wright 2005). The ways in which such mul-
tifunctional, powerful devices are going to behave in AmI environments will vary from one setting to another, including indoors, business meetings, offices, schools,
outdoors (e.g., touristic places, parks, marketplaces), on the move (e.g., walking in a
shopping mall and running), and so on. For example, if you enter a shopping mall,
your mobile phone could alert you whether any of your friends are also there, and
even identify precisely in which spot they are located, and also alert you to special
offers on products and services of interest to you based on, for example, your habits,
preferences, and prior history. Kwon and Sadeh (2004) proposed context-aware comparative shopping and developed an active, multi-agent-based context-aware system that behaves autonomously. This system can be aware of a user's location and automatically make educated guesses about user preferences to determine the best purchase. In their Autonomic Middleware for Ubiquitous eNvironment (AMUN) applied to the Smart Doorplate Project, Trumler et al. (2005) propose a system that tries to capture the user's location, walking direction, speed, and so on; the location of a user wearing a special badge is traced, and the system shows relevant information to the user when he or she approaches a specific room. As noted, most of the examples presented above are mainly associated with location-aware applications, which, to capture and use context within the AmI environment, have focused on the user's external and physical context through physical devices such as smart sensors, stereo cameras, and RFID. More of these applications, as well as activity-based context-aware applications, will be introduced as part of recent AmI
projects in the next two chapters. Examples of cognitive and emotional context-aware applications, in addition to being introduced in the next two chapters, will be elucidated and discussed in more detail in Chaps. 8 and 9. The intent of mentioning different examples of context-aware applications is to highlight
the emerging research trends around other types of contexts of psychological,
behavioral, and social nature. While all types of contexts are crucial for the development of context-aware applications, the real challenge lies in creating applications that are able to adapt in response to the user's context in a synchronized, dynamic fashion when analyzing and reasoning about different, yet interrelated, components of that context.
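The active/passive distinction drawn above can be captured in a tiny sketch: the same contextual trigger (here, a hypothetical time-zone change) either produces an autonomous adaptation or a prompt for user confirmation.

def on_timezone_change(new_tz: str, mode: str = "active") -> str:
    # Active applications adapt autonomously; passive ones prompt the user first.
    if mode == "active":
        return f"clock set to {new_tz} automatically"
    return f"ask user: 'You appear to be in {new_tz}. Update the clock?'"

print(on_timezone_change("UTC+2", mode="active"))
print(on_timezone_change("UTC+2", mode="passive"))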
It is recognized that the realization of the AmI vision presents enormous and daunting challenges across computer science, many of which pertain to system engineering, design, and modeling. As mentioned above, context is a difficult topic to tackle, and context awareness has proven to be a complex, multilevel problem with regard to realization: low-level sensor data acquisition, intermediate-level information processing (interpretation and reasoning), and high-level application action. Context recognition (awareness) comprises many different computational tasks, namely context modeling, context detection and monitoring, information processing and pattern recognition, and application actions. These tasks are not easy to deal with.
Thus, context awareness poses many challenges and open issues that need to be addressed and overcome to bring the field of AmI closer to the realization, delivery, and deployment of the next generation of AmI systems. In their project on
context-aware computing, Loke and his colleagues summarize some of these
challenges and open issues as follows:
• general principles and paradigms that govern the assembly of such systems;
• techniques and models of the information, structure and run-time behavior of such
systems;
• an identification of the classes of such systems, each with their specific design patterns,
models, applicable techniques, and design;
• principles and tailored methodologies for engineering context awareness;
• general methods for acquiring, modeling, querying…and making sense of context
information for such systems, with an involvement (and possible interaction) of data
analysis techniques and ontologies;
• the reliability of such systems given that they need to take action proactively [and
function when they are needed];
• the performance of such systems given that they need to be timely in acting;
• effective models of user interaction with such systems, including their update,
improvements over time, and maintenance and the development of query languages;
• enabling proactivity in such systems through learning and reasoning; and
• integration with the services computing paradigm for the provision of context as a
service to a wide range of applications (Loke et al. 2008).
Other challenges and issues include: the predictability of such systems, given that they need to react in the ways they are supposed to; the dependability of such systems, given that they need to deliver what they promise; the modeling of human functioning (e.g., emotional, cognitive, conversational, and social processes); the effective management of increasingly sophisticated context information; a critical review of the operationalization of context in context-aware artifacts and its impact on how context is conceptualized, especially in relation to human-factor-related context; full user participation in the design, development, configuration, and use of systems; and understanding different users' needs and demands and, more importantly, how they can
In the literature on context awareness, context (what can be said about an entity) tends to be treated as synonymous with situation; hence, the two have been used interchangeably. As noticed above, several definitions of context are somewhat tautologous: context is described as comprising contextual features, assuming ‘context’ and ‘situation’ are tantamount. Situation describes the states of relevant entities, or context represents any information (contextual aspects) that characterizes the situation of an entity (e.g., Dey 2001). This reflects the sense in which the notion of context is applied to
context-aware computing, i.e., everything that could be relevant to a given person
(user) doing a given thing in a given setting. In fact, different dimensions of context, such as physical, cognitive, emotional, and social, are referred to, in context-aware computing, as high-level abstractions of context or situation (see Bettini et al. 2010 and Perttunen et al. 2009 for examples). These are inferred by applying pattern recognition techniques using machine learning algorithms, or semantic reasoning using semantic descriptions and domain knowledge of context, on the basis of the observations of physical sensors, that is, only what can be measured as physical properties. This implies that the term ‘context’, as it is used, can be ambiguous. Most definitions of context in the technical literature indicate that while context is viewed as being linked to situations, the nature of this link remains unclear; situation seems to consist of everything surrounding an entity as an object of enquiry, while context comprises specific features that characterize a situation (Lueg 2002). Thus, there is a
distinction between context and situation. There is more to consider when looking at
context from a perspective that is motivated by research in situated cognition and
situated action. This perspective, which is influenced by the notion of ‘situation’,
is the focus of what remains of the discussion in this section. As the notion of
situation has some similarities to the notion of ‘context’ in those disciplines devoted
to the study of context (see Goodwin and Duranti 1992), context and situation must
have distinguishing features as well. The concept ‘situated’ is common across a wide
range of disciplines, including social science (sociology), computer science, artificial
intelligence, and cognitive science. The social connotation of ‘situated’, which ‘has
its origins in the sociology literature in the context of the relation of knowledge,
identity, and society’ (Lueg 2002) is partly lost as the concept has been reduced (in
terms of complexity) from something social in content and conceptual in form to
merely ‘interactive’ or ‘located in some time and place’ (Clancey 1997). It is this
connotation ‘that allows highlighting the differences between “context” as used in
work on context-aware artifacts and the original “situation”. A “situation” is an
observer-independent and potentially unlimited resource that is inherently open to
re-interpretation. “Context”, to the contrary, as an expression of a certain interpre-
tation of a situation is observer-dependent and therefore no longer open to
re-interpretation: the meaning of aspects included in the context description is more
or less determined. Other potentially relevant aspects may or may not be included in
the context description… The openness to re-interpretation matters as (individual)
users may decide to assign significance to aspects of the environment that were not
considered as significant before.’ (Lueg 2002, pp. 44–45). Understanding the way in
which meanings are constructed in interaction with the environment and how intense
our interaction is can help us gain insights into why a situation may very well be
open to re-interpretation. Schmidt (2005, p. 167) states: ‘All [inter]actions carried
out by a human take place in context—in a certain situation. Usually interaction with
our immediate environment is very intense… even if we don’t recognize it to a great
extent. All contexts and situations are embedded in the world, but the perception of
the world is dictated by the instantaneous context someone is in.’
Interaction entails a process of exchange of mental and social representations between people, whereby these people construe meaning by means of representations, i.e., give meaning to these representations while observing and representing. Accordingly, context represents a meaning that is
generated based on mental and social representations of people, objects, places,
events, and processes as contextual entities—that is, a subjective, socially situated
interpretation of some aspects of the situation in which interactions occur. This
process is too immediate and fluid to capture all the aspects of the environment—
what constitutes the situation; hence the need for re-evaluations and thus
re-interpretation of the situation (assigning significance to more aspects of the
environment) as the interaction evolves. This explains why an observer may perceive
an interaction differently as it unfolds through the changing context (by including
more of its relevant aspects). One implication in context-aware computing is that a higher-level context (e.g., retrieving information, going to bed, making a decision, feeling bored when interacting with an e-learning application, etc.) may be inferred at a certain moment, but just before this inferred context changes the application's behavior, the context may (unpredictably) change on the user's part in ways that the system (agent) may not register. As a result, the system may behave inappropriately,
meaning that its action becomes irrelevant and thus annoying or frustrating to the
user. This can be explained by the fact that contextual elements of a situation, such as location, time, lighting, objects, work context, business process, and personal events, at an atomic level of the context, may not change, while other aspects, such as cognitive, emotional, and biochemical states and processes of people, social dynamics, and intentions, may well do so, or simply other components of context may be brought in that would render the inference irrelevant. At the current stage of research, context-aware applications are not capable of accommodating the changing or dynamic nature of context and how it shapes and influences interaction. In all, the rationale for working with 'context' and 'situation' as two distinct concepts is to enhance the functioning and performance of context-aware applications in AmI environments.
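One pragmatic, if partial, mitigation suggested by this discussion is to re-validate an inference just before acting on it. The sketch below, with a hypothetical staleness threshold and context labels, illustrates the idea without claiming to solve the deeper problem of situational change the system never registers.

import time

STALENESS_LIMIT = 2.0  # seconds an inference is trusted (hypothetical threshold)

def infer_context() -> tuple[str, float]:
    # Return an inferred high-level context with the time of inference.
    return "user_reading", time.monotonic()

def act(inference: tuple[str, float]) -> str:
    # Re-validate the inference just before acting: the situation may have moved on.
    label, inferred_at = inference
    if time.monotonic() - inferred_at > STALENESS_LIMIT:
        fresh_label, _ = infer_context()
        if fresh_label != label:
            return "abort: context changed between inference and action"
    return f"adapt interface for context '{label}'"

print(act(infer_context()))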
Context-aware systems are not capable of handling interactive situations the way
humans do. This entails understanding the meanings ascribed to interaction acts
through the continuously changing context as an ongoing interpretation of the
overall situation, and dynamically adjusting or reacting to new or unanticipated
circumstances. On the difference of (human) situated actions and (computer)
planned actions, Lucy Suchman writes: ‘The circumstances of our actions are never
fully anticipated and are continuously changing around us. As a consequence our
actions, while systematic, are never planned in the strong sense that cognitive
science would have it. Plans are a weak resource for what is primarily an ad-hoc
activity.’ (Suchman 2005, p. 20). The idea of situated action is that plans are
resources that need to be combined with many other situational variables as
resources to generate behavior; hence, they are far from being determining in
setting our actions. Researchers in situatedness, notably Suchman (1987, 2005) and
Clancey (1997), who have investigated the specific characteristics of usage situations, understand the characteristics of a situation as resources for human cognition
and human (inter)action, contrary to most researchers developing context-aware
artifacts (Lueg 2002).
on the creation of models for all sorts of contexts, situations, and environments based on the view of developers. Context-aware application development is about having developers define what aspects of the world constitute context among the infinite richness of other circumstances, thereby interpreting and evaluating context in a
way that stops the flow of meaning, by closing off the opportunity of including
emergent contextual aspects or re-assessing the significance assigned to some
previous aspects of the situational environment. One implication is that, as in a lot
of context-aware applications, the user does not have the possibility to negotiate the
meaning of the context and thus the relevance of the so-called ‘intelligent’ behavior.
All the user has to do is obey what the developers define for him or her (adaptation rules), although the inferred context is based only on what is computationally measurable as contextual aspects, that is, limited and imperfect data. Consequently, the
outcome of decisions as to the behavior of computational artifacts—delivery of
ambient services—stays in the hands of the developers who understand when and
why things may happen based on particular contextual features. In this sense, the
developer is determining the behavior of the user without negotiating whether it is
suitable or not. Context is not only a ‘representational’ issue (as computers can
handle it), but also an ‘interactional’ issue to be negotiated through human inter-
action (Dourish 2004, pp. 4–6). Besides, developers can never model how people
attach meaning to and negotiate contexts, the logic underlying the socio-cognitive
processes of subjective, socially situated perception, evaluation, interpretation, and association-making in relation to places, objects, events, processes, and people. The
reality is that developers will continue to define what aspects of the world constitute
context and context-dependent application actions, regardless of whether they are
relevant or not for the user. This is due to the fact that within the constraints of
existing computing technologies, taking meaning of context into account is a
strange switch to make, as it ‘undermines the belief in the existence of a “model of
the user’s world” or a “model of behavior”’ (Criel and Claeys 2008, p. 66). There is
little knowledge and computational tools to incorporate user behavior in system
design (Riva et al. 2003). And a strong effort is needed in the direction of user
behavior and world modeling ‘to achieve in user understanding the same level of
confidence that exists in modeling technology’ (Punie 2003). However, it is still
useful—technically feasible—to create systems that allow the user to accept or decline whether an application should act, that is, deliver an ambient service, based on a particular inferred context—a situated form of intelligence or user-driven adapta-
tion. Especially, it is unfeasible, at least at the current stage of research in AI and AmI, to computationally model how humans interpret and re-interpret situations to dynamically shape the meaning of the context that defines and changes their interaction.
Whether personalized, adaptive, responsive, or proactive, an application action as a
ready-made behavior of the system based on particular patterns of analysis and
reasoning on context should not be taken for granted to be relevant to all users, as long as the context that defines interaction arises in a situation that consists of a potentially unlimited number of contextual aspects: resources for human cognition
and action. There is ‘an infinite richness of aspects that constitute the contexts of
purposeful action’ (Ulrich 2008, p. 7). In particular, interacting with context-aware
systems should entail the negotiation of the relevance of the system's actions to the human actor's situation, especially since our acting is not routine acting in its entirety.
Besides, translations of context-aware systems’ representations, as Crutzen (2005,
p. 226) argues: ‘must not fit smoothly without conflict into the world for which they
are made ready. A closed readiness is an ideal which is not feasible, because in the
interaction situation the acting itself is ad-hoc and therefore unpredictable. The
ready-made behavior and the content of ICT-representations should then be dif-
ferentiated and changeable to enable users to make ICT-representations ready and
reliable for their own spontaneous and creative use’. Like services, ‘information and
the ways we understand and use it are fundamentally contextual, that is, conditioned
by contextual assumptions through which we delimit relevant “facts” (observations)
and “values” (concerns) against the infinite richness of other circumstances we
might consider. Accordingly, we cannot properly appreciate the meaning, rele-
vance, and validity of information, and of the claims we base on it, without some
systematic tools for identifying contextual assumptions and unfolding their
empirical and normative selectivity. Context awareness of the third kind is about
giving…users more control over this fundamental selectivity.’ (Ulrich 2008, p. 1).
‘Learning, thinking, and knowing are relations among people engaged in activity in,
with, and arising from the socially and culturally structured world’ (Lave 1991).
Fundamentally, situations are subject to negotiation among the people involved in
the situation (e.g., Wenger 1998). The inability to computationally capture this aspect of negotiation has implications for the performance of context-aware applications.
Agre (2001) contends that context-aware applications may fail annoyingly as soon
as their wrong choices or decisions become significant. This argument stems from
the fact that people use various features of their environment (situations) as resources
for the social construction of entities, such as places, objects, and events.
Accordingly, abstracting from situations to context should be based on a description that is so multi-dimensionally rich that it includes as many potentially relevant aspects of a situation as possible, rather than a description that is more or less pre-determined. In other words, the classes of situations that will influence the
behavior of applications have to be selected from a flexible, dynamic, semantic,
extensible, and evolvable model for what should have an influence on such appli-
cations. It is methodologically relevant to, regardless of technical implementation,
‘ask how we can systematically identify and examine contextual selections, our own
ones as well as those of other people…. Only thus can we be in control of our options
for choosing selections’ (Ulrich 2008, pp. 6–7). However, a computational artifact is
incapable of registering features of socially constructed environment (Lueg 2002).
An example taken from Lueg (2002, p. 45) is context-aware buildings, where, using
currently available context awareness technology, ‘a room in such a building could
monitor its electronic schedule, the number of persons in the room, and the
prevalence of business clothing among the persons in the room. The room could
compute that the current context is a “business meeting context” and could instruct
attendees’ mobile phones not to disturb the meeting; business-related information
could be projected onto the room’s multipurpose walls. However, being a social
setting in the first place, a meeting does not only depend on the already mentioned
aspects but also on what has been negotiated among participants of the meeting. This
means that even if a particular situation fits the description of a “meeting context”,
the situation may have changed into an informal get together and vice versa. The
subtle changes are hardly recognizable as commonly mentioned context aspects,
such as…location, identity, state of people, groups and computational and physical
objects, may not change at all. In a sense, the context does not change while the
surrounding situation does. Examples for such situational changes are unexpected
breaks or being well ahead of the schedule so that a meeting finishes earlier than
expected. Once the meeting has changed its nature, it may no longer be appropriate
to block calls and it may no longer be appropriate to project business-related
information on walls (as it would demonstrate that the hosting company’s expensive
technology did not recognize the change in the meeting situation).’ Another example
is provided by Robertson (2000) of a business situation that changes while the computational artifacts cannot sense the changes recognized by the people involved in the situation. While many researchers have in recent years contributed related viewpoints to AmI and HCI more generally, these insights have only just started to attract the attention they deserve in the discussion of AmI applications or context-aware arti-
facts. In all, as Lueg (2002) contends, there remains ‘an explicit distinction between
the concept of context that is operationalized and the original usage situation…as a
social setting that has been negotiated among peers in the first place’, and accord-
ingly, ‘developers of context-aware artifacts should pay considerable attention to the
fact that the context determined by artifacts may differ from what the persons
involved in the situation have negotiated’ (Ibid, p. 43).
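Lueg's meeting-room example can be paraphrased as a naive rule, which makes the failure mode visible: the measurable features stay constant while the negotiated situation changes, so the inferred context never updates. The rule and thresholds below are hypothetical.

def classify_room(schedule_says_meeting: bool, occupants: int,
                  business_dress_ratio: float) -> str:
    # Naive rule in the spirit of the meeting-room example: only measurable
    # aspects feed the inference; the negotiated social situation does not.
    if schedule_says_meeting and occupants >= 2 and business_dress_ratio > 0.6:
        return "business_meeting"  # silence phones, project slides
    return "informal"

# The measurable inputs are identical before and after the meeting informally
# winds down, so the inferred context stays "business_meeting" regardless.
print(classify_room(True, 6, 0.9))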
model, perceiving emotions as the very first step entails identifying emotions and
discriminating between accurate (appropriate) and inaccurate (inappropriate)
expressions of emotion, which is an important ability to understand and analyze
emotional state. Also, cultural aspects are part of the situation or background relevant
to a person; in relation to emotions, cultural variations are great as different cultures
may assign different meanings to different facial expressions, e.g., a smile as a facial
expression can be considered a friendly gesture in one culture while it can signal
embarrassment in another culture. Operationalizing emotional context should take cultural specificities into account so as to enable the related context-aware artifacts to be tailored to, or to accommodate, user variations if they are to be widely accepted (see Chap. 7 for further discussion). In all, the difference between context-aware artifacts driven by what is technically feasible and what might be helpful in a contextual situation matters, with consideration of the social, cultural, emotional, and cognitive aspects that cannot be computationally detected, modeled, and understood by currently available enabling technologies and processes. One implication is that
context-aware applications will fail in their choices as long as the inferred context
differs from the actual context in which users may find themselves or from the way
they perceive it. Regardless, ‘there is little hope that research on context-aware arti-
facts will succeed in overcoming the problem that context understood as a model of a
situation—is always limited… [F]or context-aware artifacts it may be difficult or
impossible to determine an appropriate set of canonical contextual states. Also, it may
be difficult to determine what information is necessary to infer a contextual state.’
(Lueg 2002, p. 44).
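The cultural-variation point raised above (a smile read as friendliness in one culture and as embarrassment in another) can be sketched as a culture-conditioned lookup; the mapping below is purely hypothetical and would require empirical grounding in any real system.

# Hypothetical culture-conditioned interpretation table.
INTERPRETATION = {
    ("smile", "culture_A"): "friendliness",
    ("smile", "culture_B"): "embarrassment",
}

def interpret_expression(expression: str, culture: str) -> str:
    # Interpret a detected expression relative to the user's cultural context.
    return INTERPRETATION.get((expression, culture), "unknown")

print(interpret_expression("smile", "culture_A"))  # -> friendliness
print(interpret_expression("smile", "culture_B"))  # -> embarrassment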
The above approach to operationalizing context relates to the so-called bottom–up approach to context definition, which is based on the availability of particular technologies that can sense (and model) some aspects of context, which remain sufficient to enable the development of a functional context-aware system. As for the top–down approach to context definition, it entails identifying all the components that constitute a context, after which the system designer can select what is appropriate to include as sensor technologies, along with suitable pattern recognition algorithms and/or representation and reasoning techniques. This implies that a system designer, working backward, looks at the nature of the context the application is concerned with and then attempts to combine relevant sensors with machine learning methods and modeling approaches, based on the analysis of the various context features associated with the intended use of the application. While this approach is gaining a
growing interest in the field of context-aware computing, owing to the advance of
technologies for the design, development, and implementation of context-aware
applications, there are still some challenges and open issues to address and over-
come when it comes to operationalizing complex contexts, such as physical
activities, cognitive activities, emotional processes, social processes, communica-
tive intents, and so on. It is worth noting that this approach, although rewarding at
practical level, remains far from complete application or concrete implementation.
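A top-down design exercise of the kind just described might be caricatured as follows: starting from the target context, the designer works backward to candidate sensors and recognition techniques. The table below is a hypothetical illustration, not a validated design.

TARGET_CONTEXT = "emotional_state"

# Hypothetical design table: components of the target context mapped to
# candidate sensors and recognition techniques.
COMPONENTS = {
    "facial_expression": {"sensor": "camera", "technique": "image classifier"},
    "voice_prosody": {"sensor": "microphone", "technique": "spectral features"},
    "physiological": {"sensor": "skin-conductance wearable", "technique": "threshold rules"},
}

def plan(components: dict) -> list[str]:
    # Work backward from context components to enabling technologies.
    return [f"{name}: {c['sensor']} + {c['technique']}" for name, c in components.items()]

print(f"Top-down plan for '{TARGET_CONTEXT}':")
for line in plan(COMPONENTS):
    print(" -", line)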
Indeed, experiences with the development of context-aware artifacts have shown
that the top–down approach and thus the operationalization of the concept of
context is associated with a lot of difficulties. Researchers and software engineers
usually start with comprehensive definitions but end up operationalizing much
simpler concepts of context (see Lueg 2002). Good examples are the definitions
provided by Dey et al. (2001), Schmidt et al. (1999), Gross and Prinz (2000), Kirsh
(2001), and Göker and Myrhaug (2002). While the definitions are rather comprehensive and the conceptual models seem rich, involving many aspects that constitute context and qualitative features of context information, the actual implementation of the definitions in some of these researchers' context awareness architectures consists of a number of explicitly defined attributes, such as physical locations and conditions related to a context, computational and physical objects of a context, and human members of a context. In all, the bottom–up approach to context definition and its related operationalization perspectives still dominate over the top–down one, and it seems that simplifications are necessary when developing context-aware applications. There is a propensity towards alienating the concept of context from its multifaceted meaning in more theoretical disciplines in order to serve technical purposes (e.g., Lueg 2002). This pertains particularly to context-aware artifacts that exhibit human-like understanding and supportive behavior—human factors related context-aware applications. The simplified ways in which context has been operationalized corroborate the intention of
researchers, designers, and computer scientists to make context awareness projects
happen in reality. AmI 'applications are very fragile…, designers and researchers feel this pain…, but they compensate for this by the hard to beat satisfaction of building this technology [AmI]. The core of their attraction to this lies in "I can make it", "It is possible" and "It works". It is the technically possible and makeable that always gets the upper hand. Who wants to belong to the nondesigners?' (Crutzen 2005, p. 227).
Arguably, the simplifications observed when operationalizing context are not so much a matter of choice for researchers as a consequence of the constraints of computing as to the design and modeling of human context, which is infinitely rich, constantly changing, intrinsically unpredictable, and inherently dynamic and multidimensional, and thus intractable. There will always be a difference between human context in its original complex definition and its operationalization—the context information that is sensed and the context model that is implemented—irrespective of the advancement of sensor technology (e.g., MEMS, NEMS) and pattern recognition algorithms/machine learning techniques (e.g., handling uncertainty and vagueness of context information), and, more recently, the merger of different representation and reasoning techniques (e.g., ontological and logical approaches with rule-based and probabilistic methods). The latter is an attempt to overcome potential problems associated with the operationalization of context in terms of computational formalisms for representation and reasoning, e.g., reconciling probabilistic reasoning with languages that do not support uncertainty of context information, such as ontology languages (concrete examples of context awareness architectures or projects that have applied the hybrid approach to context modeling and reasoning are
provided in Chap. 5). In fact, resolving the trade-off between expressiveness and complexity, as well as handling uncertainty and vagueness in context modeling, coupled with the miniaturization of capture technology (sensors) and what this entails in terms of efficiency improvements in such features as computational speed, bandwidth, memory, high-performance communication networks, energy efficiency, and so on, holds promising potential for achieving and deploying the AmI paradigm.
Simplifications associated with operationalizing context in relation to ontologi-
cal modeling of context—conceptualization of context and encoding related key
concepts and the relationships among them, using the commonly shared terms in
the context domain—are explained by what the term ‘ontology’ means in computer
science. While the term is inspired by a philosophical perspective (ontology being the branch of philosophy concerned with articulating the nature and structure of the lifeworld), in computing it signifies a set of concepts and their definitions and interrelationships intended to describe the world; this depends on the ease with which real-world concepts (e.g., context, interaction, user behavior) can be captured by software engineers and on the computational capabilities provided by existing ontologies, such as the expressive power of models. The challenge facing computer
scientists in general, and AmI design engineers in particular, in the field of context-aware computing research, is to computationally capture what constitutes context as a phenomenon in real life, which is conceived in a multidimensional way, spanning historical, social, cultural, ethical, psychological, behavioral, physical, and normative aspects. When speaking of a phenomenon that is of interest
in the ‘world’, the term Universe of Discourse (UoD) (e.g., context) is used; it is
well established within conceptual modeling (e.g., Sølvberg and Kung 1993). In
addition, the frame problem (e.g., Pylyshyn 1987), one of the most difficult problems in classical representation-based AI (and one that persists in AmI), concerns what aspects or features of the world (e.g., human context) must be included in a sufficiently detailed world model (e.g., an ontological context model) and how this model can be kept up-to-date when the world changes (e.g., context changes with, and is modulated through, interactions, or is an expression of a certain interpretation, and ongoing re-interpretation, of the situations in which interactions take
place). Indeed, the frame problem has proven to be intractable in the general case
(e.g., Dreyfus 2001), and aspects of the world are constantly changing, intrinsically
unpredictable, and infinitely rich (Pfeifer and Rademakers 1991). However, while aspects of the world become context through the way system developers or ontology modelers use them in interpretation, and not because of their inherent properties, context models, in particular those represented through ontological formalisms, are to be evaluated based on their comprehensiveness, expressiveness, dynamicity, fidelity to real-world phenomena, accuracy, internal consistency, robustness, and coherence, to name a few criteria. What seems certain, though, is that there is no certainty that research on context-aware systems will succeed in surmounting the issue that context models as implemented are always limited (see Chap. 5 for further discussion). In other words, context-aware applications will never be able to conceive of context—contextual assumptions and
The real concern is that such applications should not fail annoyingly when the system's wrong choices become significant because of inefficient context measurement and thus inference; the fact remains that most of the reasoning mechanisms or processes suggested for context-aware applications entail extremely complex inferences based on limited and imperfect data. The difficulty of handling emotional and cognitive context at an operational level lies in the insurmountable complexity inherent in dealing with such issues as fuzziness, uncertainty, vagueness, and incompleteness of contextual information at the measurement, representation, and reasoning levels. It is because 'contexts may be associated with a certain level of uncertainty, depending on both the accuracy of the sensed information and precision of the deduction process' that 'we cannot directly sense the higher level contexts' (Bettini et al. 2010).
A rudimentary example is the difficulty of modeling the feeling of 'having cold'; 'we will probably never be able to model such entities' (Criel and Claeys 2008). The physical world itself and our measurements of it are prone to uncertainty—capturing imprecise, incomplete, vague, and sometimes conflicting data about the physical world seems to be inevitable. Besides, not all modeling approaches (representation and reasoning techniques) in context-aware computing support fuzziness and uncertainty of context information. For example, the ontological approach to context modeling does not adequately address the issue of representing, reasoning about,
and overcoming uncertainty in context information (see, e.g., Bettini et al. 2010;
Perttunen et al. 2009). To address this problem, methods such as probabilistic logic,
fuzzy logic, Hidden Markov Models (HMM) and Bayesian networks (see next
chapter) are adopted in certain models to deal with uncertainty issues for they offer a
deeper support for modeling and reasoning about uncertainty. For example,
Bayesian networks are known to be well suited for combining uncertain information
from a large number of physical sensors and inferring higher level contexts.
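As a rough illustration of this kind of probabilistic fusion, the sketch below combines two uncertain sensor observations under a naive-Bayes independence assumption to infer a higher-level context. All priors, likelihoods, sensor names, and context labels are invented for the example; they are not taken from any of the cited architectures.

```python
# Naive-Bayes fusion of uncertain sensor evidence into a posterior over
# higher-level contexts. Numbers are illustrative only.
contexts = ["working", "resting"]
prior = {"working": 0.5, "resting": 0.5}

# P(observation | context) for two sensors, assumed conditionally independent
likelihood = {
    "keyboard_active": {"working": 0.8, "resting": 0.1},
    "low_heart_rate":  {"working": 0.3, "resting": 0.7},
}

def fuse(observations):
    """Posterior over contexts given a list of observed sensor events."""
    post = dict(prior)
    for obs in observations:
        for c in contexts:
            post[c] *= likelihood[obs][c]
    z = sum(post.values())          # normalization constant
    return {c: p / z for c, p in post.items()}

print(fuse(["keyboard_active", "low_heart_rate"]))
# -> {'working': 0.774..., 'resting': 0.225...}
```

Adding further independent observations simply multiplies in more likelihood terms, which is why this scheme scales naturally to a large number of sensors, at the price of the strong independence assumption.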
However, probabilistic methods, according to Chen and Nugent (2009), suffer from a number of shortcomings, such as ad hoc static models, inflexibility (i.e., each context model needs to be computationally learned), data scarcity, limited scalability, and poor reusability (i.e., one user's context model may differ from another's), and so on.
Nevertheless, hybrid methods have been proposed and recently applied in a number of context-aware computing projects to overcome the limitations of the different modeling methods. This is making it increasingly easier for developers to build new applications and services in AmI environments and to reuse various ways of handling uncertainty. In particular, reasoning about uncertainty aims to improve the quality of context information, typically taking 'the form of multi-sensor fusion where data from different sensors are used to increase confidence, resolution or any other context quality metrics', as well as to infer new types of context information, typically by deducing higher level contexts from lower level contexts, such as the emotional state and activity of a user (Bettini et al. 2010). This will have a great impact on how context can be modeled and reasoned about, and thus operationalized, in a qualitative way. Indeed, the operationalizations of context in context-aware artifacts have an impact on how context is conceptualized.
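The second form of reasoning mentioned in the quoted passage, deducing higher-level contexts from lower-level ones, can be caricatured with a single rule. The context names and the rule below are invented purely for illustration, in the spirit of the Bettini et al. (2010) description rather than as any system's actual implementation:

```python
# Deducing a higher-level context from lower-level contexts via a simple rule.
lower_level = {"location": "office", "posture": "sitting", "keyboard": "active"}

def deduce_higher_level(ctx: dict) -> str:
    """Apply one hand-written rule over lower-level context attributes."""
    if ctx.get("location") == "office" and ctx.get("keyboard") == "active":
        return "engaged in desk work"
    return "unknown"

print(deduce_higher_level(lower_level))   # -> engaged in desk work
```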
3.17 Evaluation of Context-Aware Artifacts
Evaluation of context-aware artifacts is critical because it provides useful information on how the system and its underlying components work in real-world situations. March and Smith (1995, p. 260) state:
‘In much of the computer science literature it is realized that constructs, models, and
methods that work “on paper” will not necessarily work in real-world contexts.
Consequently, instantiations provide the real proof. This is evident, for example, in
AI where achieving “intelligent behavior” is a research objective. Exercising
instantiations that purport to behave intelligently is the primary means of identi-
fying deficiencies in the constructs, models, and methods underlying the instanti-
ation.’ Evaluation is valuable for both designers and implementers of context-aware
systems as to assessing whether the systems being designed are effective in terms of
meeting the expectations of the users when implemented in real-world situations.
In the field of computer science, there exists an array of constructs, models, and
methods that are robust, effective, and of high performance, but designed initially
for specific applications within AI, HCI, and more recently AmI. This implies that
these components may not always function as expected when used in general applications—for purposes other than those for which they were originally developed. Indeed, as pointed out above, methods and models for context recog-
nition differ in terms of handling data abundance, uncertainty of context informa-
tion, uncertainty on reasoning, multi-sensor fusion, scalability, dynamicity, and
management of information flow. There is a wide variety of constructs, models, and
methods with significant differences in application. For example, behavioral
methods for emotion recognition and theoretical models of emotions (see Chap. 8
for more detail) can be applied to many different systems with performances
varying over the domain of application within context-aware computing, affective
computing, and conversational agents. Another example of constructs is the Web Ontology Language (OWL), a de facto standard in context-aware computing which is currently being used for conceptual modeling—to implement context models.
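To give a flavor of what such an implemented context model looks like, the sketch below encodes a toy context ontology programmatically with the Python rdflib library and serializes it to Turtle, one of the W3C syntaxes for OWL/RDF. The namespace, class, and property names are hypothetical and deliberately simplistic; published context ontologies are far richer.

```python
# A toy context ontology built with rdflib (illustrative names only).
from rdflib import Graph, Namespace, RDF, RDFS, OWL

CTX = Namespace("http://example.org/context#")
g = Graph()
g.bind("ctx", CTX)

# Concepts: a Context is described by a Location, an Activity, and a Person
for cls in (CTX.Context, CTX.Location, CTX.Activity, CTX.Person):
    g.add((cls, RDF.type, OWL.Class))
for prop, rng in ((CTX.hasLocation, CTX.Location),
                  (CTX.hasActivity, CTX.Activity),
                  (CTX.involves, CTX.Person)):
    g.add((prop, RDF.type, OWL.ObjectProperty))
    g.add((prop, RDFS.domain, CTX.Context))
    g.add((prop, RDFS.range, rng))

# One concrete, instantiated context
g.add((CTX.ctx1, RDF.type, CTX.Context))
g.add((CTX.ctx1, CTX.hasLocation, CTX.LivingRoom))
g.add((CTX.ctx1, CTX.hasActivity, CTX.WatchingTV))

print(g.serialize(format="turtle"))
```

Note how naturally the encoding captures explicitly defined attributes (location, activity, participants), and how little of the richer, dynamic notion of context survives the translation, which is precisely the simplification discussed above.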
Considering the fact that it was originally designed for computational efficiency in reasoning, OWL as a modeling language falls short in offering suitable abstractions for constructing conceptual models, as defended extensively in Guizzardi et al. (2002) and Guizzardi (2005). Today's W3C Semantic Web standard, meanwhile, suggests a specific formalism for encoding ontologies, with several variants as to expressive power (McGuinness and van Harmelen 2004). More examples of constructs,
models, and methods and the differences in relation to their use in the field of
context-aware computing are covered in subsequent chapters. However, the main
argument is that evaluation becomes ‘complicated by the fact that performance is
related to intended use, and the intended use of an artifact can cover a range of
tasks… Not only must an artifact be evaluated, but the evaluation criteria them-
selves must be determined for the artifact in a particular environment. Progress is
Over the years several significant changes have emerged in computing (or ICT) and
its application in different Human Activity Systems (HAS). These changes have led
to a new wave of design methods embracing new dimensions to deal with funda-
mental issues in ICT design and development. Examples of the most common,
major phase shifts include: from HCI to MetaMan (MM); from Human Computer
Communication (HCC) via Computer to Computer Communication (CCC) to
Thing to Thing Communication (TTC); from virtual reality (VR) to hybrid reality
(HR); from Informing Systems (IS) to co-Creating Systems (CS); from require-
ments specification to co-design; from Technology driven (Td) to Demands driven
(Dd) development; from expert methods (EM) via Participatory Methods (PM) to
Stakeholder Methods (SM); and so forth. In terms of participative design, various methods have been proposed and applied to ICT design and development. Among these are user-centered design and participatory design, dominant design philosophies that emphasize user-centrality and participation. They are usually utilized in HCI design (yet not restricted to interactive technologies) to strive to create useful user interfaces that respond to different classes of users and satisfy their needs.
They continue to be used to create functional, useful, usable, intelligent, emo-
tionally appealing, and aesthetically pleasant interactive systems, including AmI
applications. They both involve a variety of methods that emphasize user-centrality
and participation in different forms and formats.
process? What is their impact? Has this concept become an empty signifier (Laclau
and Mouffe 1985)? In relation to context-aware computing, researchers found that, in contradiction to the discourse of 'putting the user central', almost half of the pictures used in promotional material of AmI applications for the home contained no humans, only devices (Ben Allouch et al. 2005).
The social connotation of ‘user participation’ is partly lost as the term has been
reduced from something social and political in content and conceptual in form to
merely situated in some setting, thereby diverging from its origin. Indeed, in the area of HCI, having users participate in the design and development process is not taken to mean participation in its more theoretical sense. User participation is circumscribed in user-oriented design practice, as the meaning attached to the concept of participation remains partial and shaped, in most cases, by the distribution of power—between designers and users—in the province of technology design. To give the reader a better idea, it is of import to trace the origin of 'user participation' in the design of early information systems and how it has evolved. This is of relevance for the discussion of key issues relating to the use of
user-centered design models in the dominant design trend and the implication of
this use for the development of context-aware applications.
means to generate, exploit, and enhance the knowledge upon which technologies
are built. Taken up more broadly, PD is described as a democratic, cooperative,
interactive and contextual design philosophy. It epitomizes democracy as it ensures
that users and designers are on the same footing, and sees user participation as a
vehicle for user empowerment in various ways. It maintains roles for designers and
users but calls for users to play a more active part in the imagination and specifi-
cation of technologies. Thereby, it seeks to break barriers between designers and
users and facilitate knowledge exchange between them through mutual involvement
in the design process. Indeed, the co-design process is about shared effective
communication and social collaboration which supports well-informed decisions
and actions in the event of desired democratic change. Drawing on Suchman’s
(2002) account, it is useful to think of design processes more as shaping and staging
encounters between multiple parties and less as ways that designers can formulate
needs and measure outcomes. Moreover, as a contextual approach, PD is about
designers acting in a social cultural setting where the users feed into the process by
providing knowledge needed to build and improve the use of interactive systems
that aim to facilitate daily activities within that setting. PD works well not because of an inherent superiority to other methods, but rather because it draws advantage from cultural rationalities and practices specific to the setting in which it emerged. The
quintessence of the process is that different people come together and meet to
exchange knowledge, which draws attention to the context of that encounter and the
bidirectionality of the exchange; it is about what people bring into the encounter
and what they take away from it (Irani et al. 2010). Besides, designers should be
able to respond to different situations, whereby the users challenge their ability to
benefit from the contextual situated experience and knowledge. Furthermore, PD
seeks to better understand human users by exploring new knowledge for under-
standing the nature of participative design through an interlocutory space between
the designers and users and for improving the performance of such design through
developing innovative solutions for how to creatively involve users in the devel-
opment of technological systems. Researchers in PD are concerned with a more
human, creative, and effective relationship between the designers and users of
technology, and in that way between technology and the human activities that
provide the rationale for technological systems to exist (Suchman 1993). In the
context of AmI, it is more important than ever that new technologies allow, motivate, and require users to play a responsible role as co-designers, modifiers, and value co-creators, which is not currently the case in context-aware computing. This is further discussed below.
The UCD perspective has emerged as a strong call for designing well-informed ICT solutions and has become a prime focus in HCI research and practice. UCD is the dominant trend in HCI design, a widely practiced design philosophy rooted in the idea that users must be at the center of the design process. In it, designers try to know as much as possible about their users. Grounded in the understanding of users, UCD
allows designers to work together with users to articulate their needs, wants, goals,
expectations, and limitations. Within UCD practices users are asked to give feed-
back through specific user evaluations and tests to improve the design of interactive
systems. The attention is given to users during requirements gathering and usability
testing, which usually occur iteratively until the relevant objective has been
attained. However, within user-informed design—e.g., interaction design (Arnall
2006), and experience design (Forlizzi and Battarbee 2004)—information about the
users is gathered before developing a design and the user is included at a certain
moment in the design process (Geerts et al. 2007). When organizing co-design
sessions ‘the user is integrated in a very early stage of the conceptual and interface
design process and the focus is on the mutual learning process of developer and
user.’ (Criel and Claeys 2008, p. 61).
Underlying the notion of UCD is the idea that users should not be forced to change how they perform daily activities, when using designed systems, to accommodate what the designer proposes as solutions; rather, systems should facilitate how users perform their daily activities and be effectively suited to their skills and experiences. Of importance is also to lower the technical knowledge threshold required to make constructive use of functionality, communication, and processing.
The more the user is actively involved, the more successful the designed technological solutions are.
However, research shows that the mainstream trends in the design of
context-aware applications do not fully pursue the participatory philosophy of
design. In other words, the claim about user-centrality in design remains at the level
of discourse, as it has been difficult to translate the UCD guidelines into real-world
actions. Thus, there is a gap between theory and practice as to the user involvement
when it comes to HCI design, the design of context-aware applications in particular
and of interactive technologies in general. In fact, full user participation in the design process has long been questioned and contested, and continues to be challenged in the field of context-aware computing. Consequently, many views argue that the vision of AmI may not evolve as envisioned with regard to supporting the user, owing to the uncertainty surrounding the current vision of user participation. It is doubtful if the AmI vision puts the user at such a central stage as
designers often claim (Criel and Claeys 2008). It is the dominant design discourse shaping the process of creating context-aware applications that most likely explains the failure to see a real breakthrough in research within AmI.
Full user participation in the design process is one of the most contentious issues
raised in the realm of HCI. When talking about user participation in the develop-
ment of technologies and their applications and services, one can in ‘best’ cases
speak of a certain form of partial participation, but in no way of full participation—
of more or less equal power relations. Indeed, partial participation is the de facto
standard in most UCD methods concerned with HCI design. There are different methods that can be clustered under the name 'UCD', and all of them lean on the participation of the user in the innovation process.
Involving interdisciplinary teams, performing user research, and organizing co-design sessions (where users are allowed to work together with the designer(s) or with other users) are common practices in UCD, but they differ from how things are done within PD. User participation as applied in UCD is similar (comparable) but
not identical to PD in which users are considered as partners with the designers.
Experiences of HCI design show that user participation is not currently being applied according to the original idea developed within PD—that users fully participate and thus actively contribute to the design process through shared design sessions and workshops, exchanging feedback and suggestions with designers.
Although the UCD approach involves consulting directly with users, it is said not to be fully participatory in practice, as users are not involved in all stages of the design process and consequently do not shape the decisions and outcomes of design solutions. There are limitations related to both user research and
There is a firm belief that users will never fully participate in the design of context-aware applications and environments. Most work in developing context-aware artifacts appears to be technology-driven, that is, development is driven by what is technically feasible.
3.19 Empowering Users and Exposing Ambiguities: Boundaries …
reconsidering the role of users. The significance of letting users handle some of
the ambiguities that may arise and the semantic connections of the AmI system
lies in its potential to overcome many of the complex issues relating to the need
for accurate or perfect sensing and interpretation of the state of the world (e.g.,
the human’s psychological and social states overtime) that many AmI scenarios
seem to proclaim (José et al. 2010).
2. Users should be able to understand the logic applied in context-aware applica-
tions, meaning that they should be able to know why a certain action is performed
or an application behaves in a certain way. Schmidt (2005) argues for an AmI
interaction model in which users can always choose between implicit and explicit
interfacing: ‘The human actor should know…why the system has reacted as it
reacted’. This is very important because it enables the user to interfere in and
choose how the application should behave in a given situation. People should be
active shapers of their ambient environments, not passive consumers of ambient
services. Contrariwise, developers tend to determine when context-dependent
actions should be performed by defining the inferred data for a certain (implicit
contextual) input, and, in this case, users are expected to passively use or receive
them, without any form of negotiation. In other words, users do not have the
possibility to influence the inferred context or decline the associated intelligent
responses of context-aware systems; they are obliged to accept what the devel-
opers have to offer as ambient intelligent services. ‘When things happen without
the understanding of a person but only by the developer, the developer is
determining the behavior of that person in a non-democratic way…A lot of
applications have these problems but in context-aware applications the life of the
person is affected without the feeling of direct computer interaction’ (Criel and
Claeys 2008, p. 70). Again, this is about empowering people through, as José
et al. (2010, p. 1488) contend, ‘enabling them to generate their own meaning for
the interaction with AmI systems. This should provide a real path towards
user-driven AmI scenarios that provide meaningful functionality that is effec-
tively valued by potential users. Rather than removing the “burden” of choosing,
AmI should make decisions easier to judge and support new practices that allow
people to more intelligently undertake their lives… Instead of having the system
deciding for us, we can leverage on the system for making our choices more
informed and promoting serendipity. Moreover, giving people more control may
be an essential step in unleashing the creativity and the everyday life connection
that has so often been missing from AmI research, extending it into more playful
and creative practices.’
Moreover, in relation to the argument that people should understand why applications behave as they behave, explanations should be unambiguous: in a human-understandable rather than a mystic computerized way. To give users a better understanding of the logic of context-aware applications, it is suggested to present a diagnosis to the user that explains why the different context-aware actions taking place in the AmI environment occur, keeping in mind that the provided information should be presented in a graphical way or in a human-understandable form.
References
Abowd GD, Mynatt ED (2002) Charting past, present, and future research in ubiquitous
computing. In: Carroll JM (ed) Human-computer interaction in the new millennium. Addison
Wesley, Boston, pp 513–536
Agre PE (2001) Changing places: contexts of awareness in computing. Human Comput Interact 16(2–3)
Arnall T (2006) A graphic language for touch-based interactions. Paper presented at the mobile
interaction with the real world (MIRW 2006), Espoo, Finland
Asaro PM (2000) Transforming society by transforming technology: the science and politics of
participatory design. Account Manage Inf Technol 10(4):257–290
Barkhuus L, Dey A (2003a) Is context-aware computing taking control away from the user? Three
levels of interactivity examined. In: Ubiquitous computing, pp 149–156
Barkhuus L, Dey A (2003b) Location-based services for mobile telephony: a study of users’
privacy concerns. In: Proceedings of Interact, ACM Press, Zurich, Switzerland, pp 709–712
Beck E (2001) On participatory design in Scandinavian computing research. University of Oslo,
Department of Informatics, Oslo
Bellotti V, Edwards WK (2001) Intelligibility and accountability: human considerations in
context-aware systems. Human Comput Interact 16(2–4):193–212
Ben Allouch S, Van Dijk JAGM, Peters O (2005) Our future home recommended: a content
analysis of ambient intelligence promotion material. Etmaal van de Communicatiewetenschap.
Amsterdam, The Netherlands
Bjerknes G, Ehn P, Kyng M (eds) (1987) Computers and democracy—a Scandinavian challenge.
Aldershot
Bettini C, Brdiczka O, Henricksen K, Indulska J, Nicklas D, Ranganathan A, Riboni D (2010) A
survey of context modelling and reasoning techniques. J Pervasive Mobile Comput Spec Issue
Context Model Reasoning Manage 6(2):161–180
Bianchi-Berthouze N, Mussio P (2005) Introduction to the special issue on “context and emotion
aware visual computing”. J Vis Lang Comput 16:383–385
Bravo J, Alaman X, Riesgo T (2006) Ubiquitous computing and ambient intelligence: new
challenges for computing. J Univ Comput Sci 12(3):233–235
Brooks RA (1991) Intelligence without representation. Artif Intell 47(1–3):139–159
Brown PJ (1996) The stick-e document: a framework for creating context-aware applications. In:
Proceedings of EP’96, Palo Alto, pp 259–272
Brown PJ, Jones GJF (2001) Context-aware retrieval: exploring a new environment for
information retrieval and information altering. Pers Ubiquit Comput 5(4):253–263
Carpentier N (2007) Introduction: participation and media. In: Cammaerts B, Carpentier N
(eds) Reclaiming the media: communication rights and democratic media roles. Intellect,
Bristol
Cearreta I, López JM, Garay-Vitoria N (2007) Modelling multimodal context-aware affective
interaction. Laboratory of Human–Computer Interaction for Special Needs, University of the
Basque Country
Chen G, Kotz D (2000) A survey of context-aware mobile computing research. Paper TR2000-381, Department of Computer Science, Dartmouth College
Chen L, Nugent C (2009) Ontology-based activity recognition in intelligent pervasive
environments. Int J Web Inf Syst 5(4):410–430
Cheverst K, Mitchell K, Davies N (2001) Investigating context-aware information push vs.
information pull to tourists. In: Proceedings of mobile HCI 01
Clancey WJ (1997) Situated cognition. Cambridge University Press, Cambridge
Cowie R, Douglas-Cowie E, Cox C (2005) Beyond emotion archetypes: databases for emotion
modelling using neural networks. Neural Networks 18(4):371–388
Criel J, Claeys L (2008) A transdisciplinary study design on context aware applications and
environments. A critical view on user participation within calm computing. Observatorio
(OBS*) J 5:057–077
Crowley J, Coutaz J, Rey G, Reignier P (2002) Perceptual components for context aware
computing. In: Proceedings of UbiComp: ubiquitous computing, 4th international conference,
Springer, Berlin
Crutzen CKM (2005) Intelligent ambience between heaven and hell. Inf Commun Ethics Soc 3(4):219–232
Dey AK (2000) Providing architectural support for building context-aware applications. PhD
thesis, College of Computing, Georgia Institute of Technology
Dey AK (2001) Understanding and using context. Pers Ubiquit Comput 5(1):4–7
Dey AK, Abowd GD, Salber D (2001) A conceptual framework and a toolkit for supporting the
rapid prototyping of context-aware applications. Human Comput Interact 16(2–4):97–166
Dockhorn C, Ferreira P, Pires L, Van Sinderen M (2005) Designing a configurable services
platform for mobile context-aware applications. J Pervasive Comput Commun 1(1)
Dourish P (2001) Where the action is: the foundations of embodied interaction. MIT Press, Cambridge
Dourish P (2004) What we talk about when we talk about context. Pers Ubiquit Comput 8(1):19–30
Dreyfus H (2001) On the internet. Routledge, London
Edwards WK, Grinter RE (2001) At home with ubiquitous computing: seven challenges. In: Proceedings of the UbiComp 01, Atlanta, GA. Springer, pp 256–272
Ekman P (1984) Expression and nature of emotion. Erlbaum, Hillsdale
Elovaara P, Igira FT, Mörtberg C (2006) Whose participation? Whose knowledge?—exploring PD
in Tanzania–Zanzibar and Sweden. In: Proceedings of the ninth Participatory Design
Conference, Trento
Erickson T (2002) Ask not for whom the cell phone tolls: some problems with the notion of
context-aware computing. Commun ACM 45(2):102–104
Forlizzi J, Battarbee K (2004) Understanding experience in interactive systems. Paper presented at
the DIS2004, Cambridge
Geerts D, Jans G, Vanattenhoven J (2007) Terminology. Presentation at citizen media meeting,
Leuven, Belgium
Giunchiglia F, Bouquet P (1988) Introduction to contextual reasoning: an artificial intelligence
perspective. Perspect Cogn Sci 3:138–159
Goodwin C, Duranti A (eds) (1992) Rethinking context: language as an Interactive phenomenon.
Cambridge University Press, Cambridge
Lave J (1991) Situated learning in communities of practice. In: Resnick LB, Levine JM,
Teasley SD (eds) Perspectives on socially shared cognition. American Psychological
Association, Washington DC, pp 63–82
Leahu L, Sengers P, Mateas M (2008) Interactionist AI and the promise of ubicomp, or, how to put
your box in the world without putting the world in your box. In: Proceedings of the 10th Int
conf on Ubiquitous comput, pp 134–143, ACM, Seoul, Korea
Lee Y, Shin C, Woo W (2009) Context-aware cognitive agent architecture for ambient user
interfaces. In: Jacko JA (ed) Human–computer interaction. Springer, Berlin Heidelberg,
pp 456–463
Lieberman H, Selker T (2000) Out of context: computer systems that adapt to, and learn from,
context. IBM Syst J 39:617–632
Lim BY, Dey AK, Avrahami D (2009) Why and why not explanations improve the intelligibility
of context-aware intelligent systems. Proc CHI 2009:2119–2128
Lim BY, Dey AK (2009) Assessing demand for intelligibility in context aware applications.
Carnegie Mellon University, Pittsburgh
Lindblom J, Ziemke T (2002) Social situatedness: Vygotsky and beyond. In 2nd Int Workshop on
Epigenetic Robotics: modeling cognitive development in robotic systems, p. 7178, Edinburgh,
Scotland
Loke SW (2004) Logic programming for context-aware pervasive computing: language support,
characterizing situations, and Integration with the Web. In: Proceedings IEEE/WIC/ACM
international conference on web intelligence, pp 44–50
Loke S, Ling C, Gaber M, Rakotonirainy A (2008) Context aware computing, ARC research network in enterprise information infrastructure, viewed 03 January 2012. http://www.eii.edu.au/taskforce0607/cac/, http://hercules.infotech.monash.edu.au/EII-CAC/
Lueg C (2002) Operationalizing context in context-aware artifacts: benefits and pitfalls. Human Technol Interface 5(2):43–47
March ST, Smith GF (1995) Design and natural science research on information technology. Decis
Support Syst 15:251–266
McGuinness DL, van Harmelen F (2004) OWL web ontology language overview. W3C Recommendation, viewed 28 March 2011. http://www.w3.org/TR/owl-features/
Muir B (1994) Trust in automation: part I. Theoretical issues in the study of trust and human intervention in automated systems. Ergonomics 37(11):1905–1922
Nardi BA (1996) Studying context: a comparison of activity theory, situated action models, and
distributed cognition. In: Nardi BA (ed) Context and consciousness. The MIT Press,
Cambridge, pp 69–102
Nes M (2005) The gaps between the digital divides, University of Oslo, viewed 16 March 2009.
http://folk.uio.no/menes/TheGapsBetweenTheDigitalDivides.pdf
Newell A, Simon HA (1972) Human problem solving. Prentice Hall, New Jersey
Noldus L (2003) HomeLab as a scientific measurement and analysis instrument. Philips Res
34:27–29
Norman D (2005) Human-centered design considered harmful. Interactions 12(4):14–19
Nygaard K, Bergo TO (1973) Planning, management and data processing. Handbook for the
labour movement, Tiden Norsk Forlag, Oslo
Obrenovic Z, Starcevic D (2004) Modeling multimodal human–computer interaction. IEEE
Comput 37(9):65–72
O’Hare GMP, O’Grady MJ (2003) Gulliver’s genie: a multi-agent system for ubiquitous and
intelligent content delivery. Comput Commun 26:1177–1187
Pascoe J (1998) Adding generic contextual capabilities to wearable computers. In: Proceedings of
the 2nd IEEE international symposium on wearable computers: IEEE computer society
Pateman C (1970) Participation and democratic theory. Cambridge University Press, Cambridge
Perttunen M, Riekki J, Lassila O (2009) Context representation and reasoning in pervasive
computing: a review. Int J Multimedia Eng 4(4)
Pfeifer R, Rademakers P (1991) Situated adaptive design: toward a methodology for knowledge
systems development. In: Brauer W, Hernandez D (eds) Proceedings of the conference on
distributed artificial intelligence and cooperative work. Springer, Berlin, pp 53–64
Pfeifer R, Scheier C (1999) Understanding Intelligence. MIT Press
Philipose M, Fishkin KP, Perkowitz M, Patterson DJ, Hahnel D, Fox D, Kautz H (2004) Inferring
activities from interactions with objects. IEEE Pervasive Comput Mobile Ubiquitous Syst 3
(4):50–57
Prekop P, Burnett M (2003) Activities, context and ubiquitous computing. Comput Commun
26:1168–1176
Ptaszynski M, Dybala P, Shi W, Rzepka R, Araki K (2009) Towards context aware emotional intelligence in machines: computing contextual appropriateness of affective states. Graduate
School of Information Science and Technology, Hokkaido University, Hokkaido
Punie Y (2003) A social and technological view of ambient intelligence in everyday life: what
bends the trend? In: The European Media and Technology in Everyday Life Network, 2000–
2003, Institute for Prospective Technological Studies Directorate General Joint Research
Center European Commission
Pylyshyn ZW (1987) The robot’s dilemma: the frame problem in artificial intelligence. Ablex
Publishing Corporation, Norwood
Riva G, Loreti P, Lunghi M, Vatalaro F, Davide F (2003) Presence 2010: the emergence of
ambient intelligence. In: Riva G, Davide F, IJsselsteijn WA (eds) Being there: concepts, effects
and measurement of user presence in synthetic environments. Ios Press, Amsterdam, pp 60–81
Robertson T (2000) Building bridges: negotiating the gap between work practice and technology
design. Human Comput Stud 53:121–146
Rogers Y (2006) Moving on from Weiser's vision of calm computing: engaging UbiComp experiences. In: UbiComp 2006, Orange County, California, USA. Springer, LNCS vol 4206, pp 404–421
Salovey P, Mayer JD (1990) Emotional intelligence. Imagin Cogn Personal 9:185–211
Samtani P, Valente A, Johnson WL (2008) Applying the SAIBA framework to the tactical language and culture training system. In: Parkes P, Parsons M (eds) The 7th international conference on autonomous agents and multiagent systems (AAMAS 2008), Estoril, Portugal
Scherer KR (1992) What does facial expression express? In: Strongman K (ed) International
review of studies on emotion, vol 2. Wiley, New York, pp 139–165
Scherer KR (1999) Appraisal theory. In: Dalgleish T, Power MJ (eds) Handbook of cognition and
emotion. Wiley, New York, pp 637–663
Schilit B, Adams N, Want R (1994) Context-aware computing applications. In: Proceedings of IEEE
workshop on mobile computing systems and applications, Santa Cruz, CA, USA, pp 85–90
Schmidt A (2003) Ubiquitous computing: computing in context. Ph.D. dissertation, Lancaster
University
Schmidt A (2005) Interactive context-aware systems interacting with ambient intelligence. In:
Riva G, Vatalaro F, Davide F, Alcañiz M (eds) Ambient intelligence: the evolution of
technology, communication and cognition towards the future of human–computer interaction.
IOS Press, Amsterdam, pp 159–178
Schmidt A, Beigl M, Gellersen HW (1999) There is more to context than location. Comput Graphics UK 23(6):893–901
Servaes J (1999) Communication for development: one world, multiple cultures. Hampton Press, Cresskill
Strang T, Linnhoff-Popien C, Frank K (2003) CoOL: a context ontology language to enable
contextual interoperability. In: Proceedings of distributed applications and interoperable
systems: 4th IFIP WG6.1 international conference, vol 2893, Paris, France, pp 236–247
Suchman L (1987) Plans and situated actions: the problem of human–machine communication. Cambridge University Press, Cambridge
Suchman L (1993) Participatory design: principles and practice. Lawrence Erlbaum, NJ
Suchman L (2002) Located accountabilities in technology production. Scand J Inf Sys 14(2):91–105
Suchman L (2005) Introduction to plans and situated actions II: human–machine reconfigurations,
2nd edn. Cambridge University Press, New York/Cambridge
Sølvberg A, Kung DC (1993) Information systems engineering: an introduction. Springer, Berlin
Tarjan RE (1987) Algorithm design. Commun ACM 30(3):205–212
Teixeira J, Vinhas V, Oliveira E, Reis L (2008) A new approach to emotion assessment based on
biometric data. In: Proceedings of WI–IAT’08, pp 459–500
Tobii Technology AB (2006) Tobii 1750 eye tracker. Sweden, viewed 15 December 2012. www.tobii.com
Trumler W, Bagci F, Petzold J, Ungerer T (2005) AMUN–autonomic middleware for ubiquitous
environments applied to the smart doorplate project. Adv Eng Inform 19:243–252
Turner RM (1999) A model of explicit context representation and use for intelligent agents. In:
Proceedings of modeling and using context: 2nd international and interdisciplinary conference,
vol 1688, Trento, Italy, pp 375–388
Tähti M, Arhippainen L (2004) A proposal of collecting emotions and experiences. Interact
Experiences HCI 2:195–198
Tähti M, Niemelä M (2005) 3E—expressing emotions and experiences, Medici Data oy, VTT
Technical Research Center of Finland, Finland
Ulrich W (2008) Information, context, and critique: context awareness of the third kind. In: The
31st information systems research seminar in Scandinavia, Keynote talk presented to IRIS 31
Wenger E (1998) Communities of practice: learning, meaning, and identity. Cambridge University
Press, Cambridge
Winograd T (1996) Bringing design to software. ACM, New York
Wright D (2005) The dark side of ambient intelligence. Foresight 7(6):33–51
Zhou J, Kallio P (2005) Ambient emotion intelligence: from business awareness to emotion
awareness. In: Proceeding of 17th international conference on systems research, informatics
and cybernetics, Baden, Germany
Zhou J, Yu C, Riekki J, Kärkkäinen E (2007) AmE framework: a model for emotion-aware
ambient intelligence, University of Oulu, Department of Electrical and Information
Engineering, Faculty of Humanities, Department of English, VTT Technical Research Center
of Finland
Chapter 4
Context Recognition in AmI
Environments: Sensor and MEMS
Technology, Recognition Approaches,
and Pattern Recognition Methods
4.1 Introduction
There exists a vast range of AmI architectures that essentially aim to provide the appropriate infrastructure for AmI systems. Typically, they include many sensors of
diverse types, information processing systems or computing devices where mod-
eling and reasoning occur, and actuators through which the system acts, reacts, or
pre-acts in the physical world. There are many permutations of enabling technol-
ogies and computational processes of AmI, which result in many heterogeneous
components (devices and systems and associated software applications) which have
to interconnect and communicate seamlessly across disparate networks as part of
vast architectures enabling context awareness, machine learning and reasoning,
ontological representation and reasoning, and adaptation of services. The sensors
are basically utilized to acquire the contextual data needed for the context recog-
nition process—that is, observed information serving as input for AmI systems to analyze, model, and understand the user's context, so as to undertake actions accordingly in a knowledgeable manner. Sensor technology is thus a key enabler of context
awareness functionality in AmI systems. Specifically, to acquire, fuse, process,
propagate, interpret, and reason about context data in the AmI space to support
adaptation of services requires using dedicated sensors and signal and data pro-
cessing techniques, in addition to sophisticated context recognition algorithms
based on a wide variety of methods and techniques for modeling and reasoning.
The challenge of incorporating context awareness functionality in the AmI service
provision system lies in the complexity associated with sensing, learning, capturing,
representing, processing, and managing context information.
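The sense, model, reason, and act loop just outlined can be caricatured in a few lines of Python. Every component below is a stub with hypothetical names and values; in a real AmI architecture these stages would be distributed across sensor networks, middleware, reasoners, and actuator infrastructures.

```python
def sense():
    """Acquire raw contextual data from heterogeneous sensors (stubbed)."""
    return {"temperature_c": 29.5, "presence": True}

def model(raw):
    """Abstract raw readings into context attributes."""
    return {"occupied": raw["presence"], "too_warm": raw["temperature_c"] > 26}

def reason(ctx):
    """Decide on an adaptation from the modeled context."""
    if ctx["occupied"] and ctx["too_warm"]:
        return "start_cooling"
    return "idle"

def act(action):
    """Drive actuators in the physical environment (stubbed)."""
    print(f"actuator command: {action}")

act(reason(model(sense())))   # -> actuator command: start_cooling
```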
Context-aware systems are increasingly maturing and rapidly proliferating,
spanning a variety of application domains, owing to recent advances in capture
technologies, the diversity of recognition approaches, multi-sensor fusion techniques,
and sensor networks, as well as pattern recognition algorithms and representation
and reasoning techniques. Numerous recognition approaches have been developed
and studied, and a wide variety of related projects have been carried out within
various domains of context awareness. Most early research work on context awareness focused on the user's physical context, which can be inferred using different types of sensing facilities, including stereo-type cameras, RFID, and smart devices.
While most attempts to use context awareness within AmI environments were
centered on the physical elements of the environment or users, in recent years,
research in the area of context recognition has shifted the focus to human elements of
context, such as emotional states, cognitive states, physiological states, activities,
and behaviors. This has led to the development and employment of different rec-
ognition methods, mainly vision-based, multisensory-based, and sensor-based
context and activity recognition approaches. Furthermore, investigating methods for
context recognition in terms of approaches to context information modeling and
reasoning techniques for context information constitutes a large part of a growing
body of research on context awareness technology and its use in the development of
AmI applications that are adaptable and capable of acting autonomously on behalf of
users. Different types of contexts can be recognized using machine learning tech-
niques to associate sensor perceptions to human-defined context labels—classifi-
cation. Specifically, the sensor context data which are acquired and pre-processed
are analyzed using machine learning techniques to create context models and carry
out further pattern recognition—e.g., probabilistic reasoning—to determine
high-level context. Deriving high-level context information from raw sensor data by
means of interpretation and reasoning is about bringing meaning to low-level con-
textual data. However, there is a multitude of recognition algorithms beyond those based on machine learning techniques that have been proposed and studied, on the basis of the manner in which context is modeled, represented, and reasoned about.
Accordingly, different modeling methods and reasoning techniques have been used
in the field of context-aware computing, apart from supervised and unsupervised
learning methods, including ontological, logical, rule-based, and case-based repre-
sentation and reasoning. Recent research work shows a propensity towards adopting
hybrid approaches to representation and reasoning, which entail integrating related
techniques based on the application domain (see Chap. 5). The key aim is to harness
the context awareness functionality as to generating accurate high-level abstractions
of contexts, such as physical activities, emotional, states, cognitive states, and
communication intents.
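A toy illustration of such classification-based context recognition follows: sensor-derived feature vectors are associated with human-defined context labels, and the trained model is then used to label new observations. The features, data, and labels are synthetic, and scikit-learn is used merely as one convenient example of a machine learning toolkit.

```python
# Supervised context recognition: map sensor features to context labels.
from sklearn.tree import DecisionTreeClassifier

# Features per sample: [mean accelerometer magnitude, ambient noise (dB)]
X = [[0.1, 30], [0.2, 35], [1.8, 55], [2.1, 60], [0.9, 70], [1.0, 75]]
y = ["sitting", "sitting", "walking", "walking", "commuting", "commuting"]

clf = DecisionTreeClassifier(random_state=0).fit(X, y)
print(clf.predict([[1.9, 58]]))   # -> ['walking'] on this toy data
```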
The intent of the chapter is to review the state-of-the-art sensor devices, rec-
ognition approaches, data processing techniques, and pattern recognition methods
underlying context recognition in AmI environments. An overview of the recent
advances and future development trends in the area of sensor technology is pro-
vided, focusing on novel multi-sensor data fusion techniques and related signal
processing methods. In addition, the evolving trend of miniaturization is high-
lighted, with a focus on MEMS technology and its role in the advancement of
sensing and computing devices. The observed future development trends include:
the miniaturization of sensing devices, the widespread use of multi-sensor fusion
techniques and systems, and the increasing applicability of autonomous sensors.
A sensor is defined as a device that detects or measures some type of input from the
physical environment or physical property, such as temperature, light, sound,
motion, pressure, or other environmental phenomena, and then indicates or reacts to
it in a particular way. The output is a signal, in the form of a human-readable display at the sensor location or recorded data to be transmitted over a network for further processing—e.g., to middleware for context management. Commonly,
sensors can be classified according to the type of energy they detect as signals: light
sensors (e.g., photocells, photodiodes), photo/image sensor (e.g., stereo-type cam-
era, infrared), sound sensors (e.g., microphones), temperature sensors (e.g., ther-
mometers), heat sensors (e.g., bolometer), electrical sensors (e.g., galvanometer),
pressure sensors (e.g., barometer, pressure gauges), motion sensors (e.g., radar gun,
speedometer, tachometer), orientation sensors (e.g., gyroscope), physical movement
sensors (e.g., accelerometers), and so forth.
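The definition above, a device that measures a physical property and emits a signal for onward processing, maps naturally onto a minimal programming interface. The following sketch is illustrative only: the class names, the stubbed reading, and the record format destined for middleware are all invented for the example.

```python
import time
from abc import ABC, abstractmethod

class Sensor(ABC):
    """A device that detects or measures some input from the environment."""
    def __init__(self, unit: str):
        self.unit = unit

    @abstractmethod
    def measure(self) -> float:
        """Sample the physical property this sensor detects."""

    def read(self) -> dict:
        """Produce a timestamped signal suitable for transmission, e.g.,
        to context-management middleware."""
        return {"t": time.time(), "value": self.measure(), "unit": self.unit}

class Thermometer(Sensor):
    def __init__(self):
        super().__init__(unit="degC")

    def measure(self) -> float:
        return 21.7   # stub; real hardware access would go here

print(Thermometer().read())   # {'t': ..., 'value': 21.7, 'unit': 'degC'}
```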
Various kinds of sensors can be used to detect various attributes of context associated with both the physical environment and human factors related context. Schmidt et al. (1999) catalog dif-
ferent ways of sensing that could be utilized for detecting context. Table 4.1
illustrates a tabulated version of their discussion.
How many and what types of sensors can be used in a given context-aware
system is determined by the way in which context is operationalized (defined so that
it can be technically measured and thus conceptualized) and the number of the
entities of context that are to be incorporated in the system based on the application
domain, such as location, lighting, temperature, time, physical and computational
objects, task, user’s state and goal, personal event, social dynamics, proximity to
people, and so on, and also whether and how these entities can be combined to
generate high-level abstractions of context (e.g., the cognitive, physical, emotional, and social dimensions of context). In relation to human factors related context, various kinds of sensors have been used to detect human movement, e.g., verbal and nonverbal behavior, which provides a wealth of contextual information as implicit input to context-aware systems, indicating the user's emotional, cognitive, physiological, and social states as well as activities. Human movement as a source of context information has been under rigorous investigation in the area of context recognition, in particular in relation to sensing devices.
Recent advances in sensor technology have given rise to a new class of miniaturized devices characterized by novel signal processing methods, high performance, multi-sensor fusion techniques, and high-speed electronic circuits. Subsequently, research
on context awareness has started to focus on the use of multiple miniature dense
sensors as to embedding context awareness in computer systems of various scales.
A multitude of sensors are already entrenched in very-small or large ICT, and it is
only a matter of time when advanced use can be gained from these complex
technologies; it is predicted that AmI will be densely populated by ICT devices and
systems with potentially powerful nano-bio-information and communication
capabilities (Riva et al. 2005). The miniaturization trend is increasingly making it
After the UbiComp vision gained footing during the 1990s, numerous research initiatives were launched within the field of sensor technology under the label of micro-system design and embedded systems, spanning Canada, the European continent, and Japan. But most early research initiatives in this area took place in the USA.
The field of sensor technology has changed quite dramatically over the past two decades due to the advent of such technologies as MEMS, piezo-materials, and VLSI video. Research shows that MEMS are by far the most important in enabling the rise of microscale sensors and actuators. Sensor technology has, thanks to the trend of miniaturization, undergone some significant transitions and continues to evolve rapidly. The focus of research has shifted mainly from macro- to microscale devices, owing to the development of MEMS technology. In view of that, the
criteria that are being used to gauge the operational capabilities of the evolving
miniaturized devices include: intelligence, system-on-a-chip integration, high per-
formance, computational speed, integrity, efficiency, size, communication, reli-
ability, energy, cost, and so on. ‘Until recently…sensors tended to be simple,
unintelligent, connected directly into control systems, and static…, but all that is
changing. Wireless networks are becoming increasingly common and some smaller
sensors are becoming mobile so that networks of sensors can work in mobile teams
(or swarms)… Sensors are becoming “Smart Sensors” that can pre-process their
own data to improve quality and reduce communications' (Sanders 2009a). Emphasis will be given to (the integrated large-scale) MEMS, in addition to providing a short account of piezo-materials and VLSI video.
A great variety of MEMS has been designed and used in the field of computing, including AI, AmI, UbiComp, and HCI, but commonly used MEMS differ from what is called 'large-scale integrated MEMS' in terms of complexity. Many such MEMS are too complex to be studied (Ibid). Nevertheless, 'novel optimized…
MEMS architectures (with processors or multiprocessors, memory hierarchies and
multiple parallelism to guarantee high-performance computing and decision mak-
ing), new smart structures and actuators/sensors, ICs and antennas, as well as other
subsystems play a critical role in advancing the research, developments, and
implementation’ (Ibid, p. 15). Accordingly, the so-called flip-chip, which replaces
wire banding to connect ICs with micro- and nanoscale actuators and sensors, offers
benefits in the implementation of advanced flexible packaging, improves reliability
and survivability, and reduces weight and size, in addition to other improvements
Fig. 4.2 Flip-chip monolithic MEMS with actuators and sensors. Source: Lyshevski (2001)
Fig. 4.3 High-level functional block diagram of large-scale MEMS with rotational and translational actuators and sensors. Source: Lyshevski (2001)
In sum, large-scale integrated MEMS are of far greater complexity than the MEMS that are being used today, as they can integrate 'thousands of nodes of
high-performance actuators/sensors and smart structures controlled by ICs and
antennas; high-performance processors or superscalar multiprocessors; multi-level
memory and storage hierarchies with different latencies (thousands of secondary
and tertiary storage devices supporting data archives); interconnected, distributed,
heterogeneous databases; high-performance communication networks (robust,
adaptive intelligent networks).’ (Ibid).
As mentioned above, apart from MEMS, there is a suite of technologies
underlying the rise of miniaturized sensors, including piezo-materials and VLSI
video. Made typically of ceramics, piezo-materials ‘give off an electrical charge
when deformed and, conversely, deform when in the presence of an electrical field.
Put a charge in, the material deforms; deform the material, it sends out a charge.
Piezos are particularly useful as surface-mount sensors for measuring physical
movement and stress in materials. But more importantly, piezos are useful not just
for sensing, but for effecting—manipulating the analog world. This is an indicator
of the real significance of the sensor decade. Our devices won’t merely sense and
observe. They will also interact with the physical world on our behalf.’ (Saffo
1997). As far as VLSI video is concerned, it is a videocam built ‘on a single chip:
the charge-coupled device (CCD), all the circuitry needed and even the lens will be
glued directly to the chip’ (Ibid). The rapid global progress in VLSI is one of the
factors that have driven the development of MEMS. VLSI technology (or CMOS)
can be used to perform the fabrication of microelectronics (ICs), and the fabrication
of motion microstructures is also based upon VLSI technology or micromachining;
microelectronics and micromachining are two basic components of MEMS
(Lyshevski 2001). In relation to AmI, VLSI video is of particular relevance to the design and performance of the kind of context-aware applications (multimodal user interfaces) that use emotional and cognitive cues from external carriers, such as affect display and eye movement, to recognize or infer the user's emotional and cognitive states, as well as to conversational agents which attempt to receive and respond to the user's multimodal nonverbal communication behavior.
Like other enabling technologies of AmI, MEMS technology poses many issues
and challenges associated with research and development, design and engineering,
and manufacturing and fabrication. Due to the scope of this chapter, a great number
of problems and phenomena will not be covered here, including fabrication and
manufacturing. Lyshevski’s (2001) book is the essential reading for those who are
interested to explore MMES and NMES in their complexity and variety.
There are fundamental and computational problems posed by the complexity of
large scale MEMS that need to be addressed, formulated, and solved. The emer-
gence of high-performance computing has dramatically affected the fundamental
and applied research in MEMS, creating a number of very challenging problems.
To advance the theory and engineering practice of MEMS requires high-
performance computing and advanced theory (Ibid). Given the size and complex-
ity of MEMS, the standard concepts of classical and fundamental theories of
physics (e.g., quantum mechanics, molecular dynamics, electromagnetics,
mechanics and thermodynamics, circuitry theories, and other fundamental con-
cepts) and conventional computing technologies (e.g., modeling, simulation),
cannot be straightforwardly applied to large-scale integrated micro-scale devices
(MEMS), given their very high degree of complexity. 'Current advances and
developments in modeling and simulation of complex phenomena in NEMS and
MEMS are increasingly dependent upon new approaches to robustly map, compute,
visualize, and validate the results clarifying, correlating, defining, and describing
the limits between the numerical results and the qualitative-quantitative analytic
analysis in order to comprehend, understand, and grasp the basic features.
Simulations of NEMS and MEMS require terascale computing that will be avail-
able within a couple of years. The computational limitations and inability to […] MEMS will need the ability to cope with technology or communication failures, and large-scale deployments and large amounts of data will need new computer science algorithms' (Sanders 2009a).
Advanced intradisciplinary research and thus scholarly collaboration between
researchers from different subfields of computing is necessary to design, develop,
and implement high-performance MEMS. In addition to the complexity of
large-scale MEMS requiring new fundamental and applied research and develop-
ments, ‘there is a critical need for coordination across a broad range of hardware
and software. For example, design of advanced microscale actuators/sensors and
smart structures, synthesis of optimized (balanced) architectures, development of
new programing languages and compilers, performance and debugging tools,
operating system and resource management, high-fidelity visualization and data
representation systems, design of high-performance networks, etc. New algorithms
and data structures, advanced system software and distributed access to very large
data archives, sophisticated data mining and visualization techniques, as well as
advanced data analysis are needed. In addition, advanced processor and multipro-
cessors are needed to achieve sustained capability required of functionally usable
large-scale…MEMS.’ (Lyshevski 2001, p. 17).
The set of long-range goals that challenge the design, manufacturing, develop-
ment, and deployment of high-performance MEMS include advanced materials and
process technology; microsensors and microactuators; sensing and actuation
mechanisms; sensors-actuators-ICs integration and MEMS configurations; pack-
aging, microassembly, and testing; MEMS design, optimization, and modeling; and
MEMS applications and their deployment (Ibid). Research into modeling and
improving MEMS manufacturing and design techniques, in addition to HCI, AmI,
and AI, will lead to useful advances for the immediate- and medium-term future,
while ‘in the longer term, understanding the properties of MEMS materials and then
creating more capable and intelligent MEMS machines will lead to direct
brain-computer interfaces that will allow us to communicate our ideas directly to
machines (and to other human members of virtual teams) and that may change our
world beyond recognition.’ (Sanders 2009a). The boundaries to what may be
technologically feasible are for the future to tell.
Over the last two decades, sensor technology has undergone a significant change,
especially in relation to the area of context-aware and affective computing, giving
rise to a new class of sensing devices characterized by multi-sensor data fusion
techniques and miniaturization. This has been boosted by recent discoveries in
cognitive science and AI and advances in micro-engineering enabled by interdis-
ciplinary research endeavors. In particular, there has been an ever-increasing interest in multi-sensor data fusion; a comprehensive review of the data fusion state of the art explores its conceptualizations, benefits, challenging aspects, and existing methodologies, and also highlights and describes several future directions of research in the data fusion
community. MEMS will add much to multi-sensor (context-aware and affective)
systems, which are considered to be more rewarding in relation to a number of
application domains. Given their integration features as to signal processing, wire-
less communication, control and optimization, and self-organization and decision
making, MEMS will revolutionize sensing devices by improving energy efficiency,
intelligence, memory, computational speed, and bandwidth. These are crucially
important factors for the effective operation of sensors, especially when sensors deal with
huge amounts of raw data collected from multiple and often heterogeneous sources.
On a single silicon chip, MEMS integrate smart microscale sensors for detecting and
measuring changes of physical variables (and also human actions, activities, and
behaviors); microelectronics/ICs for signal processing, data acquisition, and deci-
sion making; and smart microscale actuators for activating real-world systems, e.g.,
context-aware systems, affective systems, and conversational systems.
Fig. 4.4 Use of multiple, diverse sensors for emotional, cognitive, and situational context awareness
Given that the emphasis in this chapter is on emotional and cognitive context
awareness in relation to sensing devices and information processing systems, a
layered architecture for abstraction from raw sensor data to multi-sensor based
emotional context is illustrated and described. This architecture is also applicable to
cognitive and situational context—with relevant changes in sensor types and related
signal processing and computation techniques. The keystones of the multi-sensor
context-aware system idea are:
• Integration of multiple, diverse sensors, assembled for collection or acquisition
of multi-sensor data independently of any specific application (e.g., emotional
state, cognitive state, situational state, task state);
• Association of multi-sensor data with emotional, cognitive, situational, or
activity contexts in which the user is, for instance, feeling frustrated, making a decision, or watching TV; and
• Implementation of sensors and signal and data processing approaches and
pattern recognition methods for inferring or estimating emotional, cognitive, or
situational context from sensor data (values and cues).
To recognize an emotional context, pre-processing of multi-sensor data received
as digital signals from multiple, diverse sensors (used to detect facial, gestural,
psychophysiological, and speech cues) entails that these sensors are equipped with
interfaces that allow them to interact with one another using dedicated cross-
processing algorithms for the purpose of fusing and aggregating data from multiple
sources and transforming them into cues (application-dependent features), which
are used for further analysis through machine learning techniques to create emo-
tional context models and carry out further pattern recognition—making inferences
about the emotional context. Emotional context-aware systems are typically based
on a layered architecture for sensor-based computation of emotional context as
illustrated in Fig. 4.5, with separate layers for raw sensor data, features extracted from individual sensors, and context derived from cues. The idea is to abstract from low-level sensor data by creating a model layer that turns the multi-sensor perceptions into application actions.
Fig. 4.5 Layered architecture for abstraction from raw sensor data to multi-sensor based emotional context
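To make this layering concrete, the following is a minimal sketch in Python, assuming a hypothetical heart-rate sensor and invented names (Sensor, cue_layer, context_layer); it illustrates the separation of layers rather than any published implementation.

```python
# A minimal sketch of the three-layer abstraction (sensor -> cue -> context).
# All names here are illustrative assumptions, not a published design.

class Sensor:
    """Wraps one physical sensor's raw data stream."""
    def __init__(self, name, read_fn):
        self.name = name
        self.read_fn = read_fn              # returns a window of raw samples

    def read_window(self):
        return self.read_fn()

def cue_layer(sensors, extractors):
    """Reduce each sensor's raw window to named cues (features),
    independently of any specific application."""
    cues = {}
    for s in sensors:
        window = s.read_window()
        for cue_name, fn in extractors[s.name].items():
            cues[f"{s.name}.{cue_name}"] = fn(window)
    return cues

def context_layer(cues, classify):
    """Combine all available cues into one high-level context label."""
    return classify(cues)

# Usage: a slow scalar sensor (heart rate) feeding a trivial classifier.
hr = Sensor("heart_rate", lambda: [72, 75, 80, 78])
extractors = {"heart_rate": {"mean": lambda w: sum(w) / len(w)}}
cues = cue_layer([hr], extractors)
print(context_layer(cues, lambda c: "calm" if c["heart_rate.mean"] < 90 else "aroused"))
```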
The sensor layer is defined by an open-ended collection of sensors, given that
emotion is multimodal in nature and involves multiple channels. Accordingly, the
data provided by sensors can be of different formats, ranging from slow sensors that
supply scalars (e.g., heart rate, galvanic skin response, electroencephalographic
response) to fast and complex sensors that provide larger volume data (e.g.,
microphone for capturing emotiveness and prosodic features of speech, video
camera for capturing facial expressions, and accelerometer for capturing gestures).
In a general architecture for context awareness, the sensor layer involves an open-ended collection of many different sensors gathering a large volume of data about
various contextual features pertaining to the user, including cognitive state, task,
social dynamics, personal event, location, lighting, time, temperature, specific
motion pattern, behavior, absolute position, intention, work process, and so on—
more specifically, a great diversity and multiplicity of sensors, such as image sensors, audio sensors, biosensors, light sensors, temperature sensors, motion sensors, physical movement sensors, and location sensors, to name but a few. These sensors are
utilized to acquire the contextual data needed for the recognition process as to
various entities of context.
Between the sensor layer and the emotional context model layer figures a cue layer.
This layer, in multi-sensor emotional context-aware systems, introduces cues as
abstraction from raw sensor data that represent features extracted from the data
stream of a single sensor. As shown in Fig. 4.5, many diverse cues can be derived
from the same sensor (image, motion, audio, or wearable). In reference to mobile
context-aware devices—but of relevance also to emotional context-aware systems,
Gellersen et al. (2001) point out that this ‘abstraction from sensors to cues serves to
reduce the data volume independent of any specific application, and is also referred
to as “cooking the sensors”… Just as the architecture does not prescribe any specific
set of sensors, it also does not prescribe specific methods for feature extraction in
this layer. However, in accordance with the idea of shifting complexity from
algorithms to architecture it is assumed that cue calculation will be based on
comparatively simple methods. The calculation of cues from sensor values may for
instance be based on simple statistics over time (e.g., average over the last second,
standard deviation of the signal, quartile distance, etc.) or slightly more complex
mappings and algorithms (e.g., calculation of the main frequencies from an audio
signal over the last second, pattern of movement based on acceleration values). The
cue layer hides the sensor interfaces from the context layer it serves, and instead
provides a smaller and uniform interface defined as a set of cues describing the
sensed system environment. This way, the cue layer strictly separates the sensor
layer and context layer which means context can be modeled in abstraction from
sensor technologies and properties of specific sensors. Separation of sensors and
cues also means that both sensors and feature extraction methods can be developed
and replaced independently of each other. This is an important requirement in
context-aware systems and has motivated the development of [various] architec-
tures’. Architectures for emotional context awareness typically incorporate a spe-
cific set of specialized sensors and feature extraction algorithms. It is important to
extract meaningful features from raw data in order to derive the emotional context made visible by these features.
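As a rough illustration of the kind of simple cue calculation described in the quote above, the sketch below derives the mean, standard deviation, quartile distance, and main frequency from a one-second window of samples; the function name and feature set are assumptions for illustration only.

```python
import numpy as np

def cues_from_window(samples, rate_hz):
    """Compute simple per-sensor cues over one window of samples,
    following the kinds of statistics named above (illustrative only)."""
    x = np.asarray(samples, dtype=float)
    q75, q25 = np.percentile(x, [75, 25])
    # Dominant frequency from the FFT magnitude, ignoring the DC bin.
    spectrum = np.abs(np.fft.rfft(x - x.mean()))
    freqs = np.fft.rfftfreq(x.size, d=1.0 / rate_hz)
    main_freq = freqs[1:][spectrum[1:].argmax()] if x.size > 2 else 0.0
    return {
        "mean": x.mean(),
        "std": x.std(),
        "quartile_distance": q75 - q25,
        "main_frequency_hz": main_freq,
    }

# A 100 Hz signal alternating every sample has a dominant frequency of 50 Hz.
print(cues_from_window([0, 1] * 50, rate_hz=100)["main_frequency_hz"])
```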
The context layer introduces a set of emotional contexts which are abstractions
of real-world emotions (the state of a person’s emotions) of both negative and
positive valence, each as a function of the combined available cues. It is only at this level
of abstraction, after facial, gestural, psychophysiological, and prosodic feature
extraction and dimension reduction, data normalization, and noise elimination in
the cue layer, that information from multiple, diverse sensors is combined for
computation of emotional context. The architecture for emotional context aware-
ness does not prescribe specific methods for computationally reasoning about
emotional context from potential cues. Ontological algorithms, rule-based algo-
rithms, statistical methods, and neural networks may be employed; it is also feasible
to adopt a hybrid approach, which can integrate some of these approaches at the
level of representation, reasoning, or both, depending on the characteristics of the
concrete application (see next chapter for further detail). In the case of using only facial expression as an emotion carrier to detect emotional cues for computing some basic emotional states of the user, emotional context awareness can be treated as a typical machine learning classification problem (using supervised or unsupervised techniques): the process of mapping between raw sensor data and an emotional context description. In this case, the context-aware application automatically identifies or
recognizes a user’s emotional state based on a facial expression from a digital
image or a video frame from a video source, by comparing selected facial features
from the image and a facial expression database, for instance, thereby inferring
high-level emotional context abstraction on the basis of one emotion channel—
facial cues. In fact, most research has centered upon recognizing facial expression
as a source that conveys a wealth of contextual information. Otherwise, emotional
context is calculated from all available cues generated from diverse types of sen-
sors. The mapping from cues to emotional context may be explicit, for instance
when certain cues are known to be relevant indicators of an emotional context—
e.g., emotional states deduced from six universal facial displays: happiness, anger,
disgust, sadness, fear, and surprise—in relation to specific applications, or implicit
as to other types of idiosyncratic or complex emotional states (e.g., interest,
uninterest, boredom, frustration) in the outcome of unsupervised or supervised
learning techniques. If an ontological approach to modeling and reasoning is used as a basis for the recognition algorithm, emotional context recognition can be processed
using equivalency and subsumption reasoning in description logic, i.e., to test if two
emotional context concepts are equivalent or if an emotional context concept is
subsumed by one or more context concepts (see Chap. 5 for clarification).
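Treating facial-cue-based emotion recognition as a classification problem can be sketched, under heavy simplification, with an off-the-shelf classifier; in the snippet below the synthetic feature vectors merely stand in for landmark-derived facial features, so the prediction itself is meaningless and only the pipeline shape is of interest.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

LABELS = ["happiness", "anger", "disgust", "sadness", "fear", "surprise"]

# Synthetic stand-in for facial-feature vectors (e.g., normalized distances
# between landmark points); a real system would extract these in the cue layer.
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 12))
y = rng.integers(0, len(LABELS), size=600)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Map a cue vector to one of the six universal facial displays.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X_tr, y_tr)
print(LABELS[int(clf.predict(X_te[:1])[0])])
```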
Research is burgeoning within the area of emotional and cognitive context aware-
ness. A range of related specialized hardware and software systems, including signal
processing methods, multi-sensor data fusion techniques, pattern recognition algo-
rithms, (hybrid) representation and reasoning techniques, multimodal user inter-
faces, and intelligent agents are under vigorous investigation—design, testing,
evaluation, and instantiation in laboratory settings. Today's state-of-the-art enabling technologies and processes of human factors related context awareness are viewed as satisfactory, and the increasing level of R&D into the next generation of these technologies and processes is projected to yield further advances. The aim is to
augment future AmI systems with human-like cognitive and emotional capabilities
to enhance their functioning and boost their performance not only in terms of context
awareness but also in terms of affective computing and computational intelligence
(e.g., dialog and conversational systems and behavioral systems that can adapt to
human behavioral patterns). One key aim of ongoing research endeavors is to
improve the recognition or inference of highly complex, dynamic, and multidi-
mensional human contexts, such as multifaceted emotional states, demanding tasks,
and synchronized cognitive activities. Towards this end, it is necessary to advance
sensor technology, develop novel signal and data processing techniques and algo-
rithms, and create new dynamic models that consider the relationship between
cognition, emotion, and motivation in relation to human context, among others. In relation to sensor technology, it is of equal importance to advance the use of natural modalities (natural human communication forms), as they are crucial for the effective functionality of emotional and cognitive context awareness in terms of
providing a wealth of contextual information. Emotional and cognitive context can
indeed be captured as an implicit input based on multiple signals from the user’s
verbal and nonverbal communication behavior. Hence, human factors context-aware
systems will be equipped with miniaturized, multisensory devices—embedded in user interfaces, attached to the human body, and spread in the environment—that can jointly detect complex cues of human context. Given their computational capa-
bilities, these sophisticated devices are aimed at capturing dynamic contextual
information by reading multimodal sources (e.g., emotional cues coming from facial
expressions, gestures, and speech and its prosodic features; cognitive cues coming
from eye movement and facial displays), thereby enabling complex inferences of
high-level context abstractions—emotional and cognitive states. Cognitive cues can
also be captured using software inference algorithms to recognize or infer the user’s
intention as a cognitive context. For a detailed account of emotional and cognitive
context awareness, the reader is directed to Chaps. 8 and 9, respectively.
Given the criticality of emotion and cognition in human functioning processes,
as part of AmI research, multi-sensor emotional and cognitive context-aware
systems need to be thoroughly tested and evaluated as instantiations in their
operating environments. The evaluation of their performance should be carried out […]
The assumption is that not all emotional cues can be available together, as context
may affect the accessibility of emotional cues that are relevant. Also, the context (e.g.,
physical conditions) in which a user is in a concrete moment may also influence
his/her emotional states, which are likely to be externalized and translated into a form
intelligible to an affective or emotion-aware system through relevant emotion
channels. Moreover, in terms of sociocultural environment, various factors can have
an effect on emotion expression and identification, e.g., verbal cues related to the user
language or idiosyncratic facial emotional properties associated with the user’s
culture. Furthermore, it is important to note that the more channels are involved, the more robust the estimation of the user's emotional states; the same goes for combining modalities for multimodal recognition. In fact, there might be limits to the distance at which, for instance, speech is audible (easy to hear), whereupon facial expressions and gestures become a more relevant source (emotion carrier) of affective
information. In other words, the advantages of having many different sensors
embedded in user interfaces of affective systems and distributed in the environment
are valued, as some sensors or sensor nodes may fail at any time and local events and
situations may distort some sensor readings. Research in affective computing (e.g.,
MIT Media Lab 2014) is currently investigating how to combine other modes than
visual and auditory to accurately determine users’ emotional states. The assumption
is that a more robust estimation of the user’s emotional states and thus relevant,
real-time emotional responsive services is dependent on a sound interpretation and
processing based on a complete detection of emotional information—i.e., multiple,
diverse sensors, assembled for acquisition of multi-sensor data. Put differently, the
potential of machine learning techniques can only be exploited to generate sophis-
ticated inferences about emotional states through reasoning processes—if based on
complete sensor data. While it is more effective to consider various modalities and
channels and thus multiple, diverse sensors, when it comes to capturing emotional
states, sensing emotional information and perceiving emotional states must be based
on multi-sensor fusion technology, a process inspired by (emulating) the
human cognitive processes of sensation and perception. This entails creating novel
signal processing, specialized data processing, and machine learning mechanisms for
efficient fusion and aggregation of sensor data into features and making inferences
about emotional states. All in all, multi-sensor fusion for multimodal recognition of
emotions provides many intuitive benefits that should be exploited to develop
sophisticated and powerful affective and emotion-aware systems.
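A minimal sketch of such fusion at the decision level, assuming per-modality probability estimates and invented weights, might look as follows; failed or out-of-range modalities (e.g., inaudible speech) are simply skipped, illustrating the robustness argument above.

```python
import numpy as np

EMOTIONS = ["happiness", "anger", "sadness", "surprise"]

def fuse_decisions(modality_probs, weights):
    """Decision-level fusion: a weighted average of per-modality probability
    estimates, skipping modalities whose sensors failed (None)."""
    acc = np.zeros(len(EMOTIONS))
    total = 0.0
    for name, probs in modality_probs.items():
        if probs is None:                   # failed or out-of-range modality
            continue
        acc += weights[name] * np.asarray(probs)
        total += weights[name]
    return acc / total if total > 0 else None

# Speech is inaudible (speaker too far away), so face and gesture decide.
fused = fuse_decisions(
    {"face":    [0.6, 0.1, 0.2, 0.1],
     "gesture": [0.5, 0.2, 0.2, 0.1],
     "speech":  None},
    weights={"face": 0.5, "gesture": 0.3, "speech": 0.2},
)
print(EMOTIONS[int(np.argmax(fused))])      # -> happiness
```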
Humans detect signals through their sensory organs (i.e., visual, audio, and touch receptors) and associate these signals with a concept (e.g., the positive or negative state of a person's emotions)—attaching a meaning to the sensory information as part of the cognitive process of perception,
which involves recognition, interpretation, and evaluation as mental sub-processes
(see below for more detail). The information processing model of cognition is the
dominant paradigm in the disciplines of cognitive psychology, cognitive science,
and AI (e.g., machine learning). Thus, multi-sensor systems have grown out of these disciplines, and the information processing view is supported by many years of research across many fields.
From a cognitive psychology perspective, mental processes are the brain
activities that handle information when sensing and perceiving objects, events, and
people and their states (as well as when solving problems, making decisions, and
reasoning). Humans are viewed as dynamic information processing systems whose
mental operations are described in computational terminology, e.g., inputs, struc-
tures, representations, processes, and outputs. The information processing model is a way of thinking about mental processes, envisioning them as software programs running on the computer that is the brain. This relates to the mental information-manipulation processes that operate between stimulus and response.
‘The notion that mental…processes intervene between stimuli and responses
sometimes takes the form of a ‘computational’ metaphor or analogy, which is often
used as the identifying mark of contemporary cognitive science: The mind is to the
brain as software is to hardware; mental states and processes are (like) computer
programs implemented (in the case of humans) in brain states and processes' (Rapaport 1996, p. 2). For an overview of the information processing model and human cognition, as well as cognitive psychology, cognitive science, and AI and the relationship between them and their contribution to AmI—beyond multi-sensor systems, the reader is directed to Chap. 8.
In sum, the underlying idea of multi-sensor fusion in AmI is to emulate, within emotion-aware and context-aware systems, the human cognitive processes of sensation and perception. These multi-sensor systems can therefore be seen as a computational rendition of the human cognitive processes of sensation and perception in terms of detecting and fusing sensory information from various types of sensors and linking sensor readings or observations to emotional states as human-defined concepts.
concepts. Therefore, the design of multi-sensor context-aware systems and the
development of related computational cognitive processes—detection, processing,
interpretation, and recognition algorithms—attempt to mimic the human sensory
organs and the associated sensation and perception processes. In computing, the
sensing process involves the acquisition and pre-processing of low-level data col-
lected by multisensory devices and the recognition process entails the interpretation
of and reasoning on information to generate high-level abstractions of contexts. The
typical computational processes underlying context recognition encompass:
detection, fusion, aggregation, classification, interpretation, evaluation, and infer-
ence (in addition to learning in the case of machine learning). However, the way
sensory devices and recognition algorithms function in computers is still far from how human sensory organs detect signals and how the human brain fuses sensory information and further processes it for perception and thus recognition.
Computational artifacts and processes are circumscribed by the constraints of
existing technologies as well as engineering theory and practice. In other words, the
sensors and pre-processing and analysis mechanisms—of the multi-sensor systems
—are technology-driven, i.e., their development is driven by what is technically
feasible, rather than by how the cognitive processes of sensation and perception
function according to cognitive psychology theories (e.g., Passer and Smith 2006).
In fact, there is a tendency not only in context-aware systems but in all kinds of
computer systems towards reducing the complexity of various human cognitive
processes, such as problem solving, emotion, attention, motivation, reasoning,
decision making, and so on (in other words: alienating the concepts from their
complex meaning in more theoretical disciplines, such as cognitive psychology) to
serve technical purposes. Thus, the way the cognitive processes of sensation and perception are operationalized as concepts has an impact on how multi-sensor devices and related computational processes (i.e., signal and data processing algorithms and pattern recognition techniques) are designed, developed, and implemented, and how they function in real-world (operating) environments—that is, in a simplified way that results in imperfect sensing, imperfect inferences, and thus imperfect behaviors.
One implication of the oversimplification underlying the design and modeling of
AmI systems, including multi-sensor context-aware systems, is that in recent years,
some scholars have suggested and others strongly advocated revisiting the whole notion of intelligence in AmI in such a way as to give humans a key role in influencing
the representations and thus shaping the actions of nonhuman machines, by
exposing humans to the ambiguities raised by the imperfections pertaining to the
functioning and behavior of AmI systems. As José et al. (2010, p. 1487) state,
‘Letting people handle some of the semantic connections of the system and the
ambiguities that may arise, would overcome many of the complex issues associated
with the need to perfectly sense and interpret the state of the world that many AmI
scenarios seem to announce… [W]e should recognize that many of the complex
inference problems suggested for AmI are in fact trivial when handled by people.
Moreover, even when inferences are simple, systems are not uniform and there will
always be some type of technical discontinuity that may affect sensing and thus the
ability to always get it right.’
Indeed, there is a fundamental difference between computer systems and humans
in terms of cognitive functions as well as biological designs. It may be useful to
provide some theoretical insights drawn from cognitive psychology to give a rough
idea about what characterizes human sensory organs and related cognitive pro-
cesses. Human senses are realized by different sensory receptors. The receptors for
visual, auditory, tactile, olfactory, and gustatory signals are found in the eyes, ears,
skin, nose, and tongue, respectively. The information gathered in these receptors—
sensory information—during the perceptual analysis of the received stimuli is supplied to the brain, which fuses and processes it in a very dynamic, intricate, and often unpredictable way. Different models of sensation–perception have been
studied in cognitive psychology. There are several unsolved issues associated with the mental model in psychology. Among these is how humans perceive a significant amount of overall, familiar structure in situations, places, objects, and events—on the basis of
mental and social representations. Humans resort to existing schemata that provide
a recognizable meaning to make sense of what constitutes reality in its complex
aspects. Inspired by the human cognitive perception process, context-aware systems
infer a high-level abstraction of context through executing recognition, interpreta-
tion, and evaluation mechanisms. Deriving high-level context information from
sensor data (values and cues) together with dynamic models (human knowledge
represented in a computational and formal format) by means of such mechanisms is
about bringing meaning to low-level context data. To model, represent, and reason
about context, different context information representation and reasoning tech-
niques have been developed and applied in the field of context-aware computing
based on a wide variety of approaches, e.g., probabilistic methods, rule-based
methods, ontology-based (description logic) approaches, and hybrid approaches.
Regardless of the type of approach to context representation and reasoning, recognition algorithms have not yet reached a mature stage, and thus do not operate
or function at the human cognitive level. There is a long way to go to emulate
human cognitive representations, structures, and processes associated with the
perception process that occurs in the human brain. In fact, understanding mental
information-manipulation processes and internal representations and structures used
in cognition has for long been the key concern of cognitive scientists, who indeed
seek to investigate how information is sensed, perceived, represented, stored,
manipulated, and transformed in the human brain. Among the most challenging
research questions is to understand and implement in computer systems the way in
which affective and motivational states influence sensation and perception as
cognitive processes, and how to computationally model what constitutes the cog-
nitive processes as encompassing information processing at the subconscious level,
not only at the conscious level—the ability to think and reason, which is restricted
or exclusive to humans and has been under research in AI and cognitive science for
several decades. The underlying assumption is that there is a plethora of information within us and around us at all moments, shaping our perceptions and conclusions and allowing decisions or actions to be made about what is around us.
These and many other aspects of human cognitive functioning (or cognition) cannot
be modeled in artificial systems, and hence it is unfeasible for multi-sensor
context-aware systems to operate at the level of human cognitive information
processing—in terms of the sensation and perception processes. In fact, although
the notion of intelligence as ‘an integral part of some of the most enticing AmI
scenarios’ ‘has inspired a broad body of research into new techniques for improving
the sensing, inference and reasoning processes’ (José et al. 2010), no real break-
through in context awareness research is perceived in this regard. The meaningful
interpretation of and efficient reasoning about information remains by far the main
hurdle in the implementation of context-aware systems due to the fact that most of
the interpretation and reasoning processes involve complex inferences based on
imperfect and inadequate sensor data as well as oversimplified cognitive, emotional,
behavioral, social, cultural, and even physical models. A number of subtasks for reliably realizing the recognition and interpretation of contexts as implicit input are not solved yet, and at the current stage of research in context awareness this seems close to impossible (Schmidt 2005).
As a result of the continuous efforts to realize and deploy the AmI paradigm, which is
evolving due to the advance and prevalence of smart, miniaturized sensors and
computing devices, research is currently being carried out in all domains associated
with AmI, ranging from low-level data acquisition (i.e., sensing, signal processing,
fusion), to intermediate-level information processing (i.e., recognition, interpreta-
tion, reasoning), to high-level application and service delivery (i.e., adaptation and
actions). Most research in AmI focuses on the development of technologies for
context awareness as well as the design of context-aware applications. This
involves MEMS, multi-sensor fusion techniques, data processing, pattern recog-
nition algorithms, multimodal user interfaces, software agents, actuators, and query
languages.
Context awareness is a prerequisite technology for the realization of the AmI vision,
hence the growing interest and burgeoning research in the area of context recog-
nition. This has emerged as a significant research issue related to the thriving
development of AmI towards the realization of intelligent environments. This
relates to the fact that the system’s understanding (analysis and estimation) of the
user’s context, which is based on observed information and dynamic models, is a
precondition for the delivery of (relevant) intelligent services, or that various
entities (e.g., emotional states, cognitive states, tasks, social dynamics, situations,
events, places, and objects) in an AmI environment provide important contextual
information that should be exploited such that the intelligent behavior of the system within such an environment is pertinent to the user's context. Context
recognition has been an intensively active and rapidly evolving research area in
AmI. While early work—use of context awareness within AmI environments—
directed the focus towards the analysis of physical information, such as location and
physical conditions, as a means to recognize physical context, more recent research
has shifted to the employment of multiple miniature sensors entrenched in computer
interfaces and spread in the surrounding environment to recognize complex features
of context. These sensors are used to acquire the contextual data required for the
process of recognizing—detecting, interpreting, and reasoning about—such con-
texts as emotional states, cognitive states, task states, and social settings. Therefore,
the focus in research within AmI is being directed towards human factors related
context. Accordingly, a multitude of recognition approaches and pattern recognition methods that have been proposed and studied are being experimented with; the main difference between them lies in the manner in which different types of context, in relation to various application domains, are modeled, represented, reasoned about, and used. Indeed, existing approaches to context modeling and
reasoning, such as probabilistic methods, ontology-based approaches, rule-based
methods, and relational databases are often integrated for optimal results and in
response to the increasing complexity of new context-aware applications as well as
the advancement of context awareness technology in terms of the operationalization
of context and its conceptualization and application, giving rise to a whole set of
novel complex pattern recognition algorithms. In all, existing approaches to context
recognition are thus numerous and differ in many technical and computational
aspects. Context awareness has been extensively studied in relation to various
domains, and work in the field has generated a variety and multiplicity of lab-based
applications, but few real-world ones, involving the use of various pattern rec-
ognition algorithms. In this chapter, the emphasis is on machine learning approa-
ches to context recognition algorithms, and a wide range of related applications are
surveyed. Ontology-based and hybrid approaches and related applications are
addressed in Chap. 5.
The wearable sensor-based approach to activity recognition involves wearable sensors that can be attached to a human actor whose behavior is being monitored or to objects that constitute the environment where the human actor is performing a given activity—sensor augmentation of artifacts of use in daily living. On-body sensors include accelerometers, gyroscopes,
biosensors, vital sign processing devices, and RFID tags (which use radio waves to remotely identify people or objects carrying reactive tags). Based on networked RFID tags, humans are expected to be overwhelmed by huge amounts of personalized real-time responses in AmI environments. The wearable sensor-based activity recognition approach has been extensively used in the recognition of human physical activities (Bao and Intille 2004; Huynh 2008; Lee and Mase 2002; Parkka et al.
2006), such as walking, sitting down/up, or physical exercises. Radar as an indirect
system has also been used for human walking estimation (van Dorp and Groen
2003). Tapia and Intille (2007) have used wireless accelerometers and a heart rate
monitoring device for real-time recognition of physical activities and their intensi-
ties. Wearable sensors have been used to recognize daily activities in a scalable
manner (Huynh et al. 2007). Accelerometers sensing movements in three dimen-
sions have been employed in wearable implementations (DeVaul et al. 2003; Ling
2003; Sung et al. 2005), or incorporated into a mobile phone (Fishkin 2004). As a novel
approach, a wrist-mounted video camera has been used to capture finger movements
and arm-mounted sensing of electrical activity relating to hand movement (Vardy
et al. 1999). In all, due to their reduced cost and wide availability, accelerometers are probably the most frequently used wearable sensors for data acquisition and activity recognition for human body movements. However, given the prerequisites
of wearable computers (Rhodes 1997), it is crucial to keep sensors to a minimum and
as resource friendly as possible. For this reason, many researchers have considered
using fewer accelerometers to measure different aspects of user body positions (Kern
et al. 2002; Lee and Mase 2002; Park et al. 2002), attempting to avoid over-complicated and overly resource-intensive processes, which may otherwise put constraints on the real-world implementation of AmI systems. For example, Van
Laerhoven et al. (2002) have used more than thirty accelerometers to build models of
a user’s posture. While wearable sensors provide some benefits, they are associated
with some limitations. ‘The wearable sensor based approach is effective and also
relatively inexpensive for data acquisition and activity recognition for certain types
of human activities, mainly human physical movements. Nevertheless, it suffers
from two drawbacks. First, most wearable sensors are not applicable in real-world
application scenarios due to technical issues such as size, ease of use and battery life
in conjunction with the general issue of acceptability or willingness of the user to
wear them. Second, many activities in real-world situations involve complex
physical motions and complex interactions with the environment. Sensor observa-
tions from wearable sensors alone may not be able to differentiate activities
involving simple physical movements' (Chen and Nugent 2009, p. 413). In fact,
operationalizing many types of human activities and their contexts in daily living—human interactions with artifacts in the situated environment—poses many technical issues that need to be addressed, especially the oversimplification of concepts.
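Before turning to the object-based approach, a minimal sketch of accelerometer-based activity classification may help fix the idea; synthetic data stands in for labeled windows, and the feature choices are invented for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

ACTIVITIES = ["sitting", "walking", "exercising"]

def window_features(acc_xyz):
    """Features from one window of 3-axis accelerometer samples
    (shape: n_samples x 3); the feature choices are illustrative."""
    mag = np.linalg.norm(acc_xyz, axis=1)           # per-sample magnitude
    return [mag.mean(), mag.std(), mag.max() - mag.min(),
            *acc_xyz.mean(axis=0)]                  # per-axis means

# Synthetic stand-in for labeled training windows: each class differs in
# overall movement intensity, which the features above pick up.
rng = np.random.default_rng(1)
X = [window_features(rng.normal(size=(50, 3)) * (1 + 2 * k))
     for k in range(3) for _ in range(100)]
y = [k for k in range(3) for _ in range(100)]

clf = DecisionTreeClassifier(max_depth=4).fit(X, y)
test = window_features(rng.normal(size=(50, 3)))    # low intensity -> sitting
print(ACTIVITIES[int(clf.predict([test])[0])])
```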
Accordingly, the object-based activity recognition approach has emerged to address the drawbacks associated with the wearable-based recognition approach (Philipose et al. 2004). Based on real-world observations, this approach
entails that ‘activities are characterized by the objects that are manipulated during
their operation. Simple sensors can often provide powerful clues about the activity
being undertaken. As such it is assumed that activities can be recognized from sensor
data that monitor human interactions with objects in the environment… It has been,
in particular, under vigorous investigation in the creation of intelligent pervasive
environments for ambient assisted living (AAL)… Sensors in an SH can monitor an
inhabitant’s movements and environmental events so that assistive agents can infer
the undergoing activities based on the sensor observations, thus providing
just-in-time context-aware ADL assistance.’ (Chen and Nugent 2009, p. 413).
An interesting European project called ‘Opportunity’, which started in 2009 and
finished in 2011, picks up on recognizing context and activity as the very essential
methodological underpinnings of any (AmI) scenario and investigates methodolo-
gies to design context-aware systems: ‘(1) working over long periods of time
despite changes in sensing infrastructure (sensor failures, degradation); (2) provid-
ing the freedom to users to change wearable device placement; (3) that can be
deployed without user-specific training’ (CORDIS 2011). The activities of the
project center on developing what is called opportunistic context-aware systems
that ‘recognize complex activities/contexts despite the absence of static assump-
tions about sensor availability and characteristics’; ‘are based on goal-oriented
sensor assemblies spontaneously arising and self-organizing to achieve a common
activity/context recognition goal'; 'are embodied and situated, relying on self-supervised learning to achieve autonomous operation'; 'make best use of the
available resources, and keep working despite…changes in the sensing environ-
ment’. One of the interesting works done in this project is the development of
‘classifier fusion methods suited for opportunistic systems, capable of incorporating
new knowledge online, monitoring their own performance, and dynamically
selecting most appropriate information sources’, as well as unsupervised dynamic
adaptation to cope with changes and trends in sensor infrastructure.
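A generic flavor of such classifier fusion with online self-monitoring can be sketched as a weighted-majority scheme; this is an illustrative assumption in the spirit of opportunistic fusion, not the Opportunity project's actual method.

```python
class WeightedFusion:
    """Online weighted-majority fusion over per-classifier votes; weights
    shrink for classifiers that prove unreliable, and the fused decision
    tolerates classifiers dropping in and out of the sensor assembly."""
    def __init__(self, names, beta=0.8):
        self.w = {n: 1.0 for n in names}
        self.beta = beta                     # penalty for a wrong vote

    def predict(self, votes):                # votes: {classifier: label}
        scores = {}
        for name, label in votes.items():
            scores[label] = scores.get(label, 0.0) + self.w[name]
        return max(scores, key=scores.get)

    def update(self, votes, truth):          # online self-monitoring
        for name, label in votes.items():
            if label != truth:
                self.w[name] *= self.beta

fusion = WeightedFusion(["acc_wrist", "acc_hip", "mic"])
votes = {"acc_wrist": "walking", "acc_hip": "walking", "mic": "sitting"}
print(fusion.predict(votes))                 # -> walking
fusion.update(votes, truth="walking")        # downweights the microphone
```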
The sensor data that are collected are analyzed using various pattern recognition algorithms based on ontology, machine learning, data mining (the discovery of previously unknown properties in the data extracted from databases), or hybrid approaches. The use of
these techniques depends on the type of emotion channels or carriers that are
considered by a given application (in terms of operationalizing and modeling
emotions) to infer a user’s emotional state. Example sources for affective infor-
mation include emotiveness, prosodic features of speech, facial expressions, hand
gestures, and psychophysiological responses. These can be combined depending on
the features of the concrete AmI systems in relation to various application domains
(e.g., affective system, emotional intelligent system, emotional context-aware sys-
tem, context-aware affective system). Research shows that affective and context-
aware HCI applications are increasingly being equipped with the so-called multi-
modal user interfaces (i.e., facial, gesture, voice, and motion tracking interfaces),
which incorporate a wide variety of miniature dense sensors used to detect a user’s
emotional state by reading multimodal sources. Such applications are therefore
increasingly using multi-sensor fusion technology for multimodal recognition of
emotional states, as discussed above.
In computing, studies on emotion may be classified heuristically into two cat-
egories: face-based (micro-movement) recognition and non-face-based
(macro-movement and speech) recognition. The first category, which relates to
simple emotional states, involves recognizing emotions from facial expressions
using image analysis and understanding, and the second category, which pertains to
complex emotional states, focuses on recognition of emotions by modeling and
recognition based on hand gestures, body movement, and speech as human
behaviors (see Chap. 8 for a large body of work on emotion recognition). Laser sensing has been used for face and gesture recognition in HCI (Reilly 1998). Another popular method for emotion recognition is the use of biometric data (Teixeira et al. 2008). Dedicated systems often facilitate the challenge of emotion detection (Ikehara et al. 2003;
Sheldon 2001; Vick and Ikehara 2003). Vital sign processing devices and other
specialized sensors have been used to detect emotional cues from heart rate, pulse,
skin temperature, galvanic skin response, electroencephalographic response, blood
pressure, perspiration, brain waves, and so on to help derive emotional states.
Miniaturization of computing devices, thanks to nano- and micro-engineering, is
making possible the development of wearable devices that can register parameters
without disturbing users.
Integrating sensors and microprocessors in everyday objects so they can think and
interact with each other and with the environment is common to the vision of AmI;
it also represents the core of UbiComp vision. Indeed, AmI and UbiComp visions
assume that people will be surrounded by intelligent user interfaces supported by
sensing and computing devices and wireless communication networks, which are
embedded in virtually all kinds of everyday objects, such as mobile phones, books,
paper money, clothes, and so on. Sensor-based and multi-sensor-based approaches are the most commonly used in the augmentation of mobile devices and artifacts with awareness of their environment and situation as context. In relation to
mobile and ubiquitous computing, Gellersen et al. (2001) have attempted to inte-
grate diverse simple sensors as an alternative to generic sensors for positioning and
vision, an approach which is ‘aimed at awareness of situational context that cannot
be inferred from location, and targeted at resource constraint device platforms that
typically do not permit processing of visual context.’ The authors have investigated
multi-sensor context awareness in a number of projects and developed various
device prototypes, including Technology Enabling Awareness (TEA): an awareness
module used for augmentation of a mobile phone, the Smart-Its platform for aware
mobile devices, and the Media-cup exemplifying context-enabled everyday arti-
facts. (See Beigl et al. (2001) for experience with design and use of
computer-augmented everyday artifacts). The sensor data collected were analyzed
using different methods for computing situational context, such as statistical
methods, rule-based algorithms, and neural networks.
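In the spirit of such rule-based computation of situational context from simple cues, a toy sketch might look as follows; the cue names and thresholds are entirely made up.

```python
def situational_context(cues):
    """Toy rule-based mapping from simple sensor cues to a situational
    context label; cue names and thresholds are entirely made up."""
    if cues["light"] < 10 and cues["motion"] < 0.1:
        return "device in pocket or bag"
    if cues["motion"] > 2.0 and cues["step_frequency_hz"] > 1.5:
        return "user walking"
    if cues["noise_db"] > 70:
        return "noisy public place"
    return "stationary indoors"

print(situational_context(
    {"light": 5, "motion": 0.05, "step_frequency_hz": 0.0, "noise_db": 40}))
```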
Not only are sensors (embedded in user interfaces) used for detecting emotional and
cognitive states in HCI, but also for receiving new forms of explicit input so that
assistive agents can execute commands based on the sensor detection of signals,
thus performing many tasks effectively. In this case, user movement as explicit
input can be employed as part of a multimodal input or unimodal input design. To
provide intuitiveness and simplicity of interaction and hence reduce the cognitive
burden to manipulate systems, facial movements, gestures, and speech can allow
new forms of explicit input. Eye gaze, head movement, and mouth motion as facial
movements and hand gestures are being investigated in the area of HCI so that they
can be used as direct commands to computer systems. For example, using dedicated
sensors, facial interfaces with eye gaze tracking capability, a type of interface that is
controlled completely by the eyes, can track the user’s eye motion and translate it
into a command to perform different tasks, such as scrolling, dragging items, and
opening documents. Adjouadi et al. (2004) describe a system whereby eye position
coordinates were obtained using corneal reflections and then translated into
mouse-pointer coordinates. In a similar approach, Sibert and Jacob (2000) show a
significant speed advantage of eye gaze selection over mouse selection and consider
it as a natural, hands-free method of input. Adjouadi et al. (2004) propose a remote
eye gaze tracking system as an interface for persons with severe motor disability.
Similarly, facial movements have been used as a form of explicit input. As an
alternative to aid people with hand and speech disabilities, visual tracking of facial
movements has been used to manipulate and control mouse cursor movements, e.g.,
moving the head with an open mouth which causes an object to be dragged (Pantic
and Rothkrantz 2003). Also, de Silva et al. (2004) describe a system that tracks
mouth movements. In terms of gestures, utilizing distance sensors, Ishikawa et al. (2005) propose a touchless input system based on gesture commands. As regards speech, it can be very promising as a new form of explicit input in various application domains. On a mobile phone, given the size of its keypad, a message may be cognitively demanding to type but very easy to speak to the phone. The whole
idea is to incorporate multiple modalities as new forms of explicit input to enhance
usability as a benefit to HCI. The limitation of one modality is offset by the
strengths of another, or rather used based on the context in which the user is, since
the context determines which modality can be accessible.
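As a hedged sketch of how eye gaze might be translated into pointer commands, the following maps normalized gaze estimates to screen pixels with smoothing and a dwell-time 'click'; all class names and parameters are illustrative, not drawn from the cited systems.

```python
class GazePointer:
    """Toy mapping from normalized gaze estimates (0..1) to screen pixels,
    with exponential smoothing to damp tracker jitter and a dwell-time
    'click'; all parameters are illustrative."""
    def __init__(self, width, height, alpha=0.3, dwell_frames=30):
        self.w, self.h, self.alpha = width, height, alpha
        self.x = self.y = 0.5
        self.dwell, self.dwell_frames = 0, dwell_frames

    def update(self, gx, gy):
        moved = abs(gx - self.x) + abs(gy - self.y) > 0.02
        # Smooth the raw gaze estimate before converting to pixels.
        self.x += self.alpha * (gx - self.x)
        self.y += self.alpha * (gy - self.y)
        self.dwell = 0 if moved else self.dwell + 1
        click = self.dwell >= self.dwell_frames   # steady gaze acts as a click
        return int(self.x * self.w), int(self.y * self.h), click

pointer = GazePointer(1920, 1080)
print(pointer.update(0.25, 0.75))   # -> (pixel_x, pixel_y, click_flag)
```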
Models have been developed for a variety of aspects of human context (e.g., emotional states, cognitive states, situations, social settings, activities, etc.). These models of human contexts are represented in a formal and computational format, and incorporated in the context-aware systems that observe or monitor the cognitive, emotional, social, and physical state or behavior of the user so that such systems can perform a more in-depth analysis of the human context, which can result in a context-aware environment that may affect the situation of users by undertaking, in a knowledgeable manner, actions that provide different kinds of support or assistance. Investigating
approaches to context information modeling and reasoning techniques for context
information constitutes a large part of a growing body of research on the use of
context awareness as a technique for developing AmI applications that can adapt to
and act autonomously on behalf of users. Pattern recognition algorithms in
context-aware (and affective) computing have been under vigorous investigation in
the development of AmI applications and environments for ambient support. This is
resulting in a creative or novel use of pattern recognition algorithms. A multitude of
such algorithms and their integration have been proposed and investigated on the
basis of the way in which the contexts are operationalized, modeled, represented, and
reasoned about. This can be done during a specification process whereby, in most
cases, either concepts of context and their interrelationships are described based on
human knowledge (from human-directed disciplines) and represented in a compu-
tational format that can be used as part of reasoning processes to infer context, or
contexts are learned and recognized automatically, i.e., machine learning techniques
are used to build context models and perform further means of pattern recognition—
i.e., probabilistic and statistical reasoning. While several context recognition algo-
rithms have been applied in the area of context-aware computing, the most com-
monly used ones are those that are based on machine learning techniques, especially
supervised and unsupervised methods, and on ontological, logical, and integrated
approaches. Indeed, machine learning techniques and ontological approaches have
been integrated in various context-aware applications. This falls under what has
come to be known as ‘hybrid context modeling and reasoning approaches’, which
involve both knowledge representation formalisms and reasoning mechanisms.
Hybrid approaches involve other methods, such as rule-based methods, case-based
methods, and logic programing. Ontological and hybrid approaches are addressed in
more detail in Chap. 5.
This subsection aims to describe conceptual context models in terms of what
constitutes context information and the aspects and classes of contexts; provide an
overview of machine learning techniques and related methods; briefly describe
ontological and logical modeling and reasoning approaches; review work applying
these techniques and approaches; address uncertainty of context information; col-
lect together work dealing with uncertainty of context information in relation to
different approaches to context information modeling; and synthesize different
mechanisms for reasoning on uncertainty in the literature with a focus on proba-
bility theory and logic theory.
Conceptual context models are concerned with what constitutes context and its
conceptual structure. The semantics of what constitutes ‘context’ has been widely
discussed in the literature. And it is covered in more detail in the previous chapter
along with a detailed discussion of context operationalization in context-aware
computing. Likewise, defining what constitutes context information has been
studied extensively. Context information refers to the representation of the situation
—a set of contextual features—of an entity (e.g., user) in a computer system, where
these contextual features are of interest to a service provider for assessing the
timeliness and user-dependent aspects of assistive service delivery. There is a wide
variety of works that identify qualitative features of context information. Context is
framed by Schmidt et al. (1999) as comprising two main components, human
factors and physical environment. Human factors related context encompasses three
categories: information on the user (knowledge of habits, emotional state,
bio-physiological conditions), the user’s tasks (activity, engaged tasks, general
goals), and the user’s social environment (social interaction, co-location of other,
group dynamics). Similarly, physical environment related context encompasses
three categories: location (absolute position, relative position, co-location), infra-
structure (computational communication and information resources, task perfor-
mance), and physical conditions (light, temperature, pressure, noise). Their model is
one of the first endeavors in the field of context-aware computing to explicitly
conceptualize context or model context information. As illustrated in Fig. 4.6,
context is modeled using features, namely there is a set of relevant features for each
context and a value range is defined for each feature. Building on Schmidt et al.
(1999), Göker and Myrhaug (2002) present the AmbieSense system, where the user context consists of five elements: environment context (the place where the user is); personal
context (physiological and cognitive state); task context (activity); social context
(social aspects of the current user context); and spatiotemporal context (time and
spatial extent for the user context). In the context of work, Krish (2001) describes
context as ‘highly structured amalgam of informational, physical and conceptual
resources that go beyond the simple facts of who or what is where and when to
include the state of digital resources, people’s concepts and mental state, task state,
social relations and the local work culture, to name a few ingredients.' Based on the model of Schmidt et al. (1999), Korpipaa et al. (2003) present a context structure with
the following properties: context type, context value, source, confidence, time-
stamp, and attributes. The Context Toolkit by Dey et al. (2001) is based on a
framework consisting of context widgets, aggregators, interpreters, services, and
discoverers, and in this framework: widgets collect context information, aggrega-
tors assemble information that concerns a certain context entity, interpreters analyze
or process the information to generate a high-level abstraction of context, services
perform actions on the environment using the context information, and discoverers
find the other components in the environment. There have been many attempts to model context, e.g., Dey et al. (2001), Jang and Woo (2003), and Soldatos et al. (2007), to name but a few. It is worth noting that most of the above work does not provide
computational and formal representations of the proposed models using any related
technique.
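To make these conceptual models concrete, the sketch below encodes the context structure of Korpipaa et al. (2003), with its properties of context type, value, source, confidence, timestamp, and attributes, as a minimal Python data class. The field names and the example values are illustrative assumptions, not part of the original model.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict

@dataclass
class ContextAtom:
    """One atomic piece of context information, after Korpipaa et al. (2003)."""
    context_type: str                  # e.g., 'environment:light'
    value: Any                         # e.g., 'dark'
    source: str                        # sensor or inference that produced the value
    confidence: float                  # likelihood that the value reflects reality
    timestamp: datetime = field(default_factory=datetime.utcnow)
    attributes: Dict[str, Any] = field(default_factory=dict)

# Illustrative atom: a light-level reading from a hypothetical photodiode sensor.
atom = ContextAtom("environment:light", "dark", source="photodiode-1", confidence=0.9)
print(atom.context_type, atom.value, atom.confidence)
```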
One of the challenges in context-aware computing (or AmI) has been to provide
frameworks that cover the class of applications that exhibit human-like under-
standing and intelligent behavior. In this context, human-like understanding sig-
nifies analyzing (or interpreting and reasoning about) and estimating (or inferring)
the human's context—the states of the user, ideally in the manner in which the user perceives them (what is going on in his/her mind)—a process whose input is the observed information about the user's cognitive, emotional, psychophysiological, and social states over time (i.e., human behavior monitoring), together with dynamic human process and human context models. As to human-like intelligent
behavior, it entails the system coming up with and firing the context-dependent
actions that provide support to the user’s cognitive, emotional, and social needs.
Acting upon or interacting based on human factors related context relates to human
functioning, which is linked to the behavioral patterns of individuals in the different
systems that they form part of within their environment. In reference to human
aspects in AmI, Bosse et al. (2007) propose a framework combining different
ingredients, as shown in Fig. 4.7, including human state and history models,
environment state and history models, profiles and characteristics models of
humans, ontologies and knowledge from psychological and/or social disciplines,
dynamic process models about human functioning, dynamic environment process
models, and methods for analysis on the basis of such models. Examples of such
analysis methods—in relation to AmI in general—include prosodic features anal-
ysis, facial expression analysis, gesture analysis, body analysis, eye movement
analysis, psychophysiological analysis, communicative intent analysis, social pro-
cess analysis, and so on.
Fig. 4.7 Framework to combine the ingredients. Source Bosse et al. (2007)
‘Making artifacts able to compute and communicate does not make them intelligent: the key (and challenge) to really adding intelligence to the environment lies in the way how the system learns and keeps up to date with the needs of the user by itself. A thinking machine, you might conclude—not quite but close: if you rely on
the intelligent environment you expect it to operate correctly every time without
tedious training or updates and management. You might be willing to do it once but
not constantly even in the case of frequent changes of objects…or preferences in the
environment. A learning machine, I’ll say.’ (Riva et al. 2005).
Machine learning is a subfield of computer science (specifically a subspecialty of
AI) that is concerned with the development of software programs that provide
computers with the ability to learn from experiences without following explicitly
programed instructions—that is, to teach themselves to grow and change when
exposed to new data. In a widely quoted, more formal definition provided by Mitchell (1997, p. 2), ‘A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E’. This definition of machine learning in fundamentally operational terms resonates with the idea that computers can think, which underlies the vision of AmI and UbiComp. Machine
learning is employed in AmI where designing and programing ontological,
rule-based, and logical algorithms is inadequate, unsuitable, or infeasible. However,
it is computationally infeasible to build models for all sorts of situations of life and environments—in other words, training sets or trained classes are finite, environments are dynamic, and the future is uncertain, adding to the limited sensor data. Hence, notwithstanding the huge potential of machine learning techniques, the underlying probability theory usually does not yield guarantees of the performance of algorithms; rather, probabilistic (and statistical) bounds on the performance are quite common. This relates to computational learning theory, a branch of
theoretical computer science that is concerned with the computational analysis of
machine learning algorithms and their performance in relation to different appli-
cation domains. Furthermore, machine learning has been extensively applied in
various HCI application domains within the area of AmI and AI, e.g., context-aware
computing, affective computing, and conversational systems. In AmI, machine learning and reasoning aim at monitoring the actions of humans along with the state changes of the environment using various types of sensors, as well as actuators to react and pre-act in response to human actors. A major strand of context and
activity recognition algorithms is based on supervised and unsupervised learning
approaches as machine learning techniques. Machine learning entails various types
of algorithms, which can be classified into different categories based on the type of
input available during machine training or the desired outcome of the algorithm—
e.g., context recognition, including, in addition to supervised and unsupervised,
semi-supervised learning (combines both labeled and unlabeled examples to gen-
erate an appropriate classifier), transductive inference (attempts to predict new
outputs on specific test cases from observed training cases), learning to learn (learns
its own inductive bias based on previous experience), and reinforcement learning (the agent acts in a dynamic environment by executing actions that cause the observable state of that environment to change; in the process of acting, it attempts to gather information about how the environment reacts to its actions and to synthesize a sequence of actions that maximizes some notion of cumulative reward), among others. However, other strands of context and activity recognition algorithms are broadly based on logical, ontological, or hybrid modeling and
reasoning.
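As a concrete illustration of the reinforcement learning category just described, the following minimal sketch implements tabular Q-learning, in which an agent acting in a toy environment accumulates reward. The environment dynamics and all parameter values are assumptions made purely for illustration.

```python
import random

states, actions = range(4), range(2)           # toy environment: 4 states, 2 actions
Q = {(s, a): 0.0 for s in states for a in actions}
alpha, gamma, epsilon = 0.1, 0.9, 0.2          # learning rate, discount, exploration

def step(s, a):
    """Hypothetical environment dynamics: returns (next_state, reward)."""
    s_next = (s + a) % 4
    return s_next, 1.0 if s_next == 3 else 0.0

s = 0
for _ in range(1000):
    # Epsilon-greedy action selection: mostly exploit, sometimes explore.
    a = random.choice(actions) if random.random() < epsilon else max(actions, key=lambda a: Q[(s, a)])
    s_next, r = step(s, a)
    # Q-learning update: move Q(s, a) toward reward plus discounted best future value.
    Q[(s, a)] += alpha * (r + gamma * max(Q[(s_next, a2)] for a2 in actions) - Q[(s, a)])
    s = s_next
```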
Supervised Learning: The basic idea of supervised learning is to classify data in
formal categories that an algorithm is trained to recognize. In context-aware
computing, supervised learning entails using a learning data set or labeled data
upon which an algorithm is trained; following training, the algorithm classifies unknown contextual data and thus grows and changes as it gets exposed to new experiences. In this sense, the machine learning process examines a set of atomic contexts which have been pre-assigned to categories, and makes inductive abstractions based on these data that assist in the process of classifying future atomic contexts into, for example, cognitive, emotional, or situational context. Approaches based on supervised learning require a substantial training period during which several examples of each context and related concepts are collected and analyzed.
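A minimal sketch of this supervised workflow, assuming the scikit-learn library and invented feature vectors (e.g., pre-extracted audio or acceleration features) labeled with context classes by an annotator:

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical training data: feature vectors extracted from sensor signals,
# each labeled with a context class by a human annotator.
X_train = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.1], [0.8, 0.2]]
y_train = ["meeting", "meeting", "walking", "walking"]

clf = DecisionTreeClassifier().fit(X_train, y_train)   # training period

# After training, the classifier assigns unseen contextual data to a class.
print(clf.predict([[0.85, 0.15]]))                     # -> ['walking']
```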
Fig. 4.8 Context awareness as an adaptive process where the system incrementally creates a
model of the world it observes. Source Adapted from Van Laerhoven and Gellersen (2001)
models on the basis of domain knowledge and training data, and then performing
inference according to the rules of probability theory.’ The detection and recog-
nition of emotional states from facial expressions can thus be achieved through
various classifiers or methods. It is worth mentioning that emotion recognition
based on facial expressions is a de facto standard in context-aware computing and
affective computing as well as emotionally intelligent and conversational systems.
As emotions are inherently multimodal, to provide a more robust estimation of the user's emotional state, different modalities can be combined, and so too can classifiers.
Caridakis et al. (2006) combine facial expressions and speech prosody, and
Balomenos et al. (2004) combine facial expressions and hand gestures. Further, in
the context of activity recognition, HMMs are adopted in Patterson et al. (2005),
Ward et al. (2006) and Boger et al. (2005), dynamic and naïve Bayesian networks
in Philipose et al. (2004), Wang et al. (2007) and Albrecht and Zukerman (1998),
decision trees in Tapia and Intille (2007), nearest neighbor in Lee and Mase (2002)
and SVMs in Huynh (2008). With regard to wearable computing, HMMs are used for learning significant locations and predicting user movement with GPS (Ashbrook and Starner 2002), and neural networks are used in Van Laerhoven et al. (2002) with many sensors (accelerometers) to build models of and analyze the user's body
movement. Brdiczka et al. (2007) propose a four-layered situation learning
framework, which acquires different parts of a situation model, namely situations
and roles, with different levels of supervision. Situations and roles are learned from
individual audio and video data streams. The learning-based approach has a 100 %
recognition rate of situations with pre-segmentation.
Among the above models and algorithms for supervised learning, HMMs and
Bayes networks are thus far the most commonly applied methods in the area of
context recognition. While both of these methods have been shown to be successful
in context-aware computing, they are both very complex and require a large amount of labeled training and test data. This is in fact the main disadvantage of supervised learning in the case of probabilistic methods, adding to the fact that it could be computationally costly to learn each context in a probabilistic model for an infinite richness or large diversity of contexts in real-world application scenarios
(see Chen and Nugent 2009). Moreover, context-aware applications usually incorporate different contextual features of the user that must be combined in the inference of a particular dimension of context, and one feature may, in turn, involve different types of sensor data; e.g., the emotional feature of a user's context may include data from image sensors, voice sensors, and biosensors. Adding to this the variations of users' states and behaviors, the repetitive diversification of the partitioning of the training and test sets may not lead to the desired outcome with regard to the generalization of the context recognition models. This has implications for the
textual data into relevant context labels. Machine learning methods in the case of
probabilistic methods ‘choose a trade-off between generalization and specification
when acquiring concepts from sensor data recordings, which does not always meet
the correct semantics, hence resulting in wrong detections of situations’ (Bettini
et al. 2010, p. 11). A core objective of a learning algorithm is to generalize from its training experience. Unsupervised learning probabilistic methods can be used to serve various purposes in this regard, such as modeling uncertainty,
reasoning on uncertainty, and capturing domain heuristics (see, e.g., Bettini
et al. 2010; Chen and Nugent 2009). It is worth mentioning that uncertainty is one
of the weaknesses of ontological approaches in terms of both modeling and rea-
soning. However, unsupervised learning probabilistic methods are usually static and highly context-dependent, adding to their limitations as to the assignment of handcrafted probabilistic parameters (e.g., modeling uncertainty, capturing heuristics) for the computation of the context likelihood (see Chen and Nugent 2009).
Indeed, they seem to be less applied than supervised learning in the domain of
context recognition.
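For contrast with the supervised case, the following minimal sketch clusters unlabeled feature vectors with k-means (scikit-learn assumed; the data are invented). The resulting clusters still have to be mapped to meaningful context labels by the developer, which hints at why unsupervised methods are less applied in context recognition.

```python
from sklearn.cluster import KMeans

# Unlabeled feature vectors from a sensor stream (illustrative values).
X = [[0.1, 0.9], [0.15, 0.85], [0.9, 0.1], [0.85, 0.2]]

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)   # cluster index per sample; semantics must be assigned afterwards
```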
2. to aggregate, fuse, and transform sensor data into semantic terms; and
3. to perform description-logic based reasoning, e.g., subsumption, to interpret atomic context concepts and then deduce or infer a high-level context abstraction.
The logical and ontological approaches to context representation and reasoning
are acknowledged to be semantically clear in computational reasoning. See next
chapter for a detailed account of ontological approach to context modeling and
reasoning, including its strengths and weaknesses. The strength of logical
approaches lies in the easiness to integrate domain knowledge and heuristics for
context models and data fusion, and the weakness ‘in the inability or inherent
infeasibility to represent fuzziness and uncertainty’; they ‘offer no mechanism for
deciding whether one particular model is more effective than another, adding to ‘a
lack of learning ability associated with logic based methods’ (Ibid).
Like supervised and unsupervised learning probabilistic methods, there is a range of
logical modeling methods and reasoning mechanisms with regard to logical theories
(e.g., situation theory, event theory, lattice theory) and representation formalisms
(e.g., first-order logic, inductive logic, description logic, fuzzy logic). In terms of
logical representation, first-order logic can express facts about objects and their
properties and interrelationships, and allows the use of predicates and quantifiers
(see Russell and Norvig 2003; Luger and Stubblefield 2004). In a project called
Gaia, a predicate logic representation of context information is developed by
Ranganathan and Campbell (2003) based on logic programing using XSB (Sagonas
et al. 1994). In this model, a first order predicate is associated with each context,
with its designation describing the context type. As a logic operator, quantification
is always done over finite sets, and can be used in addition to other logic operators,
such as conjunction, disjunction, and negation, to combine the context predicates
into more complex context descriptions (see Perttunen et al. 2009). Ranganathan
and Campbell (2004) applied AI planning techniques, namely ‘STRIPS’ planning (Brachman and Levesque 2004), to the Gaia system. In his thesis, Ranganathan (2005) states that they believed planning to be computationally too costly for their system. Henricksen and Indulska (2006) applied predicate logic to infer a
situation abstraction. High-level situation abstractions are expressed in their model
using a novel form of predicate logic that balances efficient evaluation against
expressive power. They define a grammar for formulating high-level situation
abstractions that model real-world situations in order to evaluate more complex
conditions than can be captured by assertions. Assertions are used to define the sets over which the quantification is performed. Assertions interpreted under a closed-world assumption with three-valued logic are used to reduce the values in quantified expressions describing situations. High-level situation abstractions can
be incrementally combined to form more complex logical expressions. Moreover,
the context predicates can be combined using different logic operators into more
complex context descriptions. This is similar to the case of description-logic based
reasoning, where the fillers of a number of properties can be linked to form a context description, enabling the inference of an unknown context described by the perceived properties.
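A rough sketch of this predicate style of representation, loosely modeled on Gaia's four-part context predicates and on the combination of predicates by conjunction over a finite fact base; the predicate names and facts are illustrative assumptions:

```python
# Context predicates as tuples: (context_type, subject, relater, object).
facts = {
    ("location", "alice", "entering", "kitchen"),
    ("activity", "alice", "doing", "cooking"),
}

def holds(context_type, subject, relater, obj):
    """Check whether a context predicate holds over the finite fact base."""
    return (context_type, subject, relater, obj) in facts

# Conjunction of predicates forms a more complex context description.
in_kitchen_cooking = holds("location", "alice", "entering", "kitchen") and \
                     holds("activity", "alice", "doing", "cooking")
print(in_kitchen_cooking)  # True
```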
By their very nature, humans are exquisitely attuned to their context. They rec-
ognize, understand, and respond to it without being explicitly or necessarily aware
of doing so. This indicates the subtlety of human sensory organs and internal
representations and structures involved in handling context in real-world situations
in terms of cognitive information processing. Once established, cognitive schemata facilitate the interpretation of new experiences, enabling humans, for example, to perform accurately in new, unseen contexts. In other words, humans resort to schemas that provide a recognizable—yet evolving in time—meaning of contexts in order to make sense of a complex reality in terms of interaction. The specifics of context are dynamic, volatile, subjective, fluid, intricate, and subtle. Hence, they are difficult to identify, measure, and model (operationalize), which may well hinder the system from estimating or making predictions about users'
cognitive and emotional needs at a given moment. Our measurement of the
real world is prone to uncertainty due to the use of imprecise sensors—and thus
imperfect sensing. As contextual data often originate from sensors, uncertainty
becomes unavoidable. Likewise, computational models and reasoning mechanisms
must necessarily be (over) simplified, as they are circumscribed by existing tech-
nologies. Simulating the representations, structures, and mental information processes of humans in computer systems has been a daunting challenge in AI.
Consequently, context-aware systems are faced with the inevitability of employing alternative techniques to deal with the issues of uncertainty, vagueness, erroneousness, and incompleteness of sensor context information in relation to
modeling methods and reasoning algorithms. A real challenge in context-aware
computing is to build robust, accuracy-enhanced, and comprehensive context
models that can deal with these issues. Bettini et al. (2010) point out that
context-aware applications are required to capture and make sense of imprecise and conflicting data about the physical world, as measurements of it are inherently prone to uncertainty. One aspect to consider in this regard is to formally conceptualize context entities as dynamic rather than as static entities with fixed routines, common-sense patterns, and heuristics. While the respective problem seems to be difficult to
eradicate, especially when it comes to dealing with human functioning (emotional,
cognitive and behavioral processes), it is useful to develop innovative techniques and methods to deal with it in ways that reduce its effect on the performance of context-aware applications due to imperfect inferences. In context-aware
applications, adaptation decisions ‘are made based on evaluation of context infor-
mation that can be erroneous, imprecise or conflicting’, and hence ‘modeling of
quality of context information and reasoning on context uncertainty is a very
important feature of context modeling and reasoning’ (Bettini et al. 2010, p. 2).
Failure to overcome the issue of uncertainty has implications for the quality of
context-aware applications in terms of the relevancy of delivered services—wrong
choices as to context-dependent actions—due to wrong detections of situations or
imperfect inferences of high-level abstraction of contexts. Therefore, uncertainty is
increasingly becoming a topic of importance and thus gaining a place in the research
area of context-aware computing in relation to low-level data acquisition,
intermediate-level information processing, and high-level service delivery and
applications. Many computational problems associated with context-aware func-
tionality, namely learning, sensing, representation, interpretation, reasoning, and acting, entail that software agents operate with uncertain, imprecise, or incomplete
contextual information. Different types of software objects in the environment must
be able to reason about uncertainty, including ‘entities that sense uncertain contexts,
entities that infer other uncertain contexts from these basic, sensed contexts, and
applications that adapt how they behave on the basis of uncertain contexts. Having a
common model of uncertainty that is used by all entities in the environment makes it
easier for developers to build new services and applications in such environments
and to reuse various ways of handling uncertainty.’ (Bettini et al. 2010, p. 12). In a
recent review of context representation and reasoning in pervasive computing
(Perttunen et al. 2009), the authors stated that there is only a handful of works on context-aware systems that deal with representation and reasoning under uncertainty.
Over the last decade, there have been some attempts to create models that deal with
uncertainty issues when representing and reasoning about context information.
A number of research projects have focused on modeling of quality of context
information and reasoning on context uncertainty as an important feature of context
modeling and reasoning. Among the early efforts to address and overcome uncer-
tainty is the work by Schmidt et al. (1999) and Dey et al. (2000). Schmidt and
colleagues associate each of their context values with a certainty measure which
captures the likelihood that the value accurately reflects reality, whereas Dey and
colleagues suggest a method whereby ambiguous information can be resolved by a
mediation process involving the user. This solution is particularly viable when the
context information is manageable in terms of the volume and not subject to rapid
change, so that the user is not unreasonably burdened (Bettini et al. 2010). In Gray
and Salber (2001), the authors discuss the issue of information quality in general
and include it as a type of meta-information in their context model. They describe
six quality attributes: coverage, resolution, accuracy, repeatability, frequency, and
timeliness. Lei et al. (2002) describe a context service that allows different quality metrics to be associated with context information. Ranganathan et al.
(2004a, b) provide a classification of different types of quality metrics that can be
associated with location information acquired from different types of sensors. These
metrics are: (1) resolution, which is the region that the sensor states the mobile
object is in, and can be expressed either as a distance or as a symbolic location,
depending on the type of sensor, e.g., GPS and card-reader, respectively;
(2) confidence, which is measured as the probability that the person is actually within a certain area, calculated based on which sensors can detect that person in the area of concern; and (3) freshness, which is measured based on the
time that has elapsed since the sensor reading, assuming that all sensor readings
have an expiry time.
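The freshness metric, for instance, can be sketched as a simple expiry check; the reading structure and the expiry time below are assumptions for illustration only.

```python
from datetime import datetime, timedelta

def is_fresh(reading_time: datetime, expiry: timedelta) -> bool:
    """Freshness in the spirit of Ranganathan et al. (2004a, b): a reading is
    usable only while the time elapsed since it was taken is below its expiry."""
    return datetime.utcnow() - reading_time < expiry

reading_time = datetime.utcnow() - timedelta(seconds=30)
print(is_fresh(reading_time, expiry=timedelta(minutes=1)))  # True: 30 s < 1 min
```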
Furthermore, an attempt at modeling uncertain context information with Bayesian networks has been undertaken by Truong et al. (2005). They suggest
representing Bayesian networks in a relational model, where p-classes are used to
store probabilistic information, i.e., their properties have concomitant constraints:
parents-constraint and conditional probability table constraint. In Henricksen and Indulska's (2006) model, whose interpretation is based on three-valued logic under the closed-world assumption, the ‘possibly true’ value is used to represent uncertain information. To represent context information, Mäntyjärvi and Seppänen
(2002) adopted fuzzy logic as manifested in vague predicates to represent various
types of user activities as concepts. Ranganathan et al. (2004a) developed an
uncertainty model and describe reasoning with vague and uncertain information in
the Gaia system, their distributed middleware system for enabling Active Spaces.
This model is based on a predicate representation of contexts, where a confidence value is assigned to each context predicate and can be interpreted in one of two ways: as a probability in probabilistic logic or as a membership value in fuzzy logic. In other words, it measures the probability (in the case of probabilistic logic) or the membership value (in the case of fuzzy logic) of the event that corresponds to the context predicate holding true. Thus, this model uses various mechanisms such as
fuzzy logic, probabilistic logic, and Bayesian networks. However, the authors state
that probabilities or confidence values to be associated with types of context
information cannot be known by the designer. Approaches to inferring and pre-
dicting context information from sensor data in a bottom-up manner are proposed
by Mayrhofer (2004) and Schmidt (2002). In the ECORA framework (Padovitz et al. 2008), a hybrid architecture for context-oriented pervasive computing, context
information is represented as a simple multi-dimensional vector of sensor mea-
surements, a space where a context is described as a range of values. A confidence
value is derived on the basis of the current sensor measurements (observations) and
the context descriptions to represent the ambiguity or uncertainty in the occurrence
of a context.
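The ECORA-style derivation of a confidence value can be sketched as follows: a context is described as a region in a multi-dimensional sensor space, and the current measurement vector yields a degree of membership in that region. The region bounds and the crude membership function are illustrative assumptions, not ECORA's actual formulas.

```python
def confidence(measurement, context_region):
    """Fraction of sensor dimensions whose current value falls inside the
    value range describing the context (a crude membership degree)."""
    inside = sum(lo <= m <= hi for m, (lo, hi) in zip(measurement, context_region))
    return inside / len(context_region)

# 'meeting' described by ranges over (noise level, light level, motion).
meeting = [(0.0, 0.4), (0.3, 1.0), (0.0, 0.2)]
print(confidence([0.2, 0.8, 0.5], meeting))  # ~0.67: ambiguous occurrence
```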
As to ontological approaches to context modeling and reasoning, there are a few
projects that have attempted to address the issue of representing and reasoning about
uncertainty. Straccia (2005) and Ding and Peng (2004) propose to extend existing
ontology languages and related reasoning tools to support fuzziness and uncertainty
while retaining decidability. However, according to Bettini et al. (2010), the few existing preliminary proposals to extend OWL-DL ontology languages and related reasoning tools to represent and reason about fuzziness and uncertainty did not, at the time of writing, properly support uncertainty in context data. As echoed in a recent
survey carried out by Perttunen et al. (2009), none of the description logic-based
approaches are capable of dealing with uncertainty and vagueness. Although some
work (e.g., Schmidt 2006; Reichle et al. 2008) attempted to combine ontological modeling with modeling of uncertainty as a way to approach the issue, it falls short in considering and preserving the benefits of formal ontologies. In all, sum-
marizing a review of work on modeling vagueness and uncertainty, Perttunen et al.
(2009) note that no work presents a model that satisfies all the requirements for
context representation and reasoning; and seemingly ‘the benefit of modeling
uncertainty and fuzziness has not been evaluated beyond the capability of repre-
senting it’, meaning that ‘the work doesn’t make it clear how easy it is to utilize such
models in applications…and in what kind of applications does it benefit the users.’
Based on the literature, empirical work that deals with representing and rea-
soning under uncertainty in relation to cognitive and emotional context-aware
systems is scant, regardless of whether the context pattern recognition algorithm is based on machine learning techniques or ontological approaches to modeling and
reasoning. This can probably be explained by the fact that the research within
emotional and cognitive context awareness is still in its infancy, and thus the
associated modeling methods and reasoning algorithms are not as mature as those
related to situational context, activity, and location.
However, it is argued that probabilistic logics are associated with some difficulties,
manifested in their tendency to multiply the computational complexities of their
probabilistic and logical components.
Hidden Markov Models (HMMs): HMMs have a wide applicability in context
awareness for different problems, including learning, inference, and prediction.
They represent ‘stochastic sequences as Markov chains; the states are not directly
observed, but are associated with observable evidences, called emissions, and their
occurrence probabilities depend on the hidden states’ (Bettini et al. 2010, p. 14).
They have been used for location prediction. For example, Ashbrook and Starner (2002) adopt HMMs that can learn significant locations and predict user movement with GPS sensors. In a similar approach, Liao et al. (2007) adopt a hierar-
chical HMM that can learn and infer a user’s daily actions through an urban
community. Multiple levels of abstraction are used in their model to bridge the gap
between raw GPS sensor measurements and high-level information.
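A minimal NumPy sketch of the HMM machinery behind such work: the forward algorithm computes the likelihood of an observed emission sequence given transition and emission probabilities. The two hidden states and all probability values are toy assumptions.

```python
import numpy as np

A = np.array([[0.7, 0.3],      # transition probabilities between 2 hidden states
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],      # emission probabilities: P(observation | state)
              [0.2, 0.8]])
pi = np.array([0.5, 0.5])      # initial state distribution
obs = [0, 1, 1]                # observed emission indices

alpha = pi * B[:, obs[0]]      # forward variable at t = 0
for o in obs[1:]:
    alpha = (alpha @ A) * B[:, o]   # propagate, then weight by emission likelihood
print(alpha.sum())             # P(observation sequence | model)
```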
Bayesian networks: Based on probability theory, Bayesian networks can be used
for a wide range of problems in AI and AmI: perception using dynamic Bayesian
networks (e.g., Russell and Norvig 2003), learning using the expectation-
maximization algorithm (e.g., Poole et al. 1998; Russell and Norvig 2003), and
reasoning using the Bayesian inference algorithm (e.g., Russell and Norvig 2003;
Luger and Stubblefield 2004). They ‘are directed acyclic graphs, where the nodes
are random variables representing various events and the arcs between nodes rep-
resent causal relationships. The main property of a Bayesian network is that the
joint distribution of a set of variables can be written as the product of the local
distributions of the corresponding nodes and their parents’ (Bettini et al. 2010,
p. 14). They represent conditional probabilities efficiently when the dependencies in the joint distribution are sparse, and are well suited for inferring higher level contexts and combining uncertain information from a large
number of sources (Ibid).
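At its simplest, the Bayesian reasoning referred to above reduces to Bayes' rule over a two-node network (a context causing a sensor reading); the priors and likelihoods below are invented for illustration.

```python
# P(context) priors and P(stove sensor on | context) likelihoods (toy numbers).
prior = {"cooking": 0.2, "not_cooking": 0.8}
likelihood = {"cooking": 0.9, "not_cooking": 0.1}

# Posterior P(context | stove sensor on) by Bayes' rule.
evidence = sum(prior[c] * likelihood[c] for c in prior)
posterior = {c: prior[c] * likelihood[c] / evidence for c in prior}
print(posterior)   # {'cooking': ~0.69, 'not_cooking': ~0.31}
```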
The first topic concerns capture technologies and related signal and data processing approaches. The second topic deals with pattern recognition methods and
algorithms and related models that are used to learn or represent and interpret and
reason about contexts. Part of this topic, in relation to ontological approach, pertains
to explicit specification of key concepts and their interrelationships for a certain
context domain and their formal representation using the commonly shared ter-
minology in that domain. The third topic is concerned with context-dependent
actions or ambient service delivery, involving application and adaptation rules. In
all, context-aware functionality is established through capturing, collecting, orga-
nizing, and processing context information to support adaptation of services in the
AmI spaces. This occurs at different levels of the system. This implies that
context-aware applications are based on a multilayered architecture, encompassing
different, separate layers of context information processing, i.e., raw sensor data,
feature extraction, classification or clustering (in the case of supervised or unsupervised learning methods), and high-level context derivation from semantic or logical information (in the case of ontological and logical approaches).
Figure 4.9 illustrates, in addition to the physical layer of sensors, three layers of context information processing, along with example techniques and methods that have typically been used in context-aware computing. The arrows depict the flow of
context data/information.
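The multilayered flow can be read as a simple pipeline; the sketch below, whose function bodies are invented placeholders, only fixes the order of the layers described in this section.

```python
def extract_features(raw_samples):            # layer 2: aggregate and transform
    return [sum(raw_samples) / len(raw_samples)]

def classify(features):                       # layer 3: assign a context label
    return "active" if features[0] > 0.5 else "idle"

def act(context_label):                       # application level: adapt a service
    print(f"adapting service for context: {context_label}")

raw = [0.4, 0.9, 0.7]                         # layer 1: raw sensor readings
act(classify(extract_features(raw)))          # data flows upward through the layers
```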
Layer 1—Physical sensors: Signals in the environment are detected from mul-
tiple sources using various types of sensors. This sensor layer is usually defined by an open-ended (unrestricted) collection of sensors embedded within computer systems, i.e., user interfaces, attached to humans or objects, or spread in the environment. The data supplied by sensors in a particular context-aware application can be very different, ranging from slow sensors to fast and complex sensors (e.g., MEMS, multi-sensors) that provide larger volumes of data, like those used for detecting human activities or emotional states. It is also expected that the update rate can vary greatly from one sensor to another, depending on the nature of the context. Some generic context-aware applications may deal with a large amount of context
information types beyond location, co-location, time, and identity, to include
emotional states, cognitive states, and activities; the current temporal and spatial
location; physical conditions; and preference details. Moreover, manually entered information also constitutes part of context information. Context-aware applications involve both implicit and explicit inputs: on the one hand, context data are acquired from invisibly embedded sensors (or software equivalents), and, on the other hand, via keyboard, touch screen, pointing device, or manual gestures. Context-aware
services execute service logic, based on information provided explicitly by end
users and implicitly by sensed context information (Dey 2001; Brown et al. 1997;
Schmidt 2005). Based on context data and explicit user input, the application logic defines which new data can be inferred as new context data at the inference level and then which action(s) should be performed at the application level. But before this,
sensor data should first be aggregated, fused, and transformed into features.
Layer 2—Context data processing and computation: this layer is dedicated to aggregating, fusing, organizing, and propagating context data for further computation.
On this layer signal processing, data processing, and pattern recognition methods
are used to recognize context either from sensor signals and labeled annotations
(classification) or from the data stream of multiple sensors—groups of similar
examples (clustering). It is worth noting that architectures for context-aware applications usually do not prescribe specific methods for feature extraction. Referred to as ‘cooking the sensors’ (Golding and Lesh 1999), abstraction from sensors to cues provides the advantage of reducing the data volume independent of any specific application (Gellersen et al. 2001). The bottom part of this layer provides a uniform interface defined as a set of cues describing the sensed user context. In this sense,
‘the cue layer strictly separates the sensor layer and context layer which means
context can be modeled in abstraction from sensor technologies and properties of
specific sensors. Separation of sensors and cues also means that both sensors and
feature extraction methods can be developed and replaced independently of each other’.
The actions taken at the application level can be oriented towards ambient services
that support the user’s cognitive and emotional needs. Used to describe queries and
subscriptions, context query languages (CQLs) (see Reichle et al. 2008; Haghighi et al. 2006 for detailed reviews) are broadly used by context-aware applications to access context information from context providers. Again, architectures for
context-aware applications do not prescribe specific languages for querying context
from service providers, and thus different query languages can be used; however,
the selection of the query language is based on the context representation tech-
niques used in layer 3 (ontological versus logical representation). Specifically, as
explained by Perttunen et al. (2009, p. 2), ‘The meaning of the queries must be
well-specified because in the implementation the queries are mapped to the rep-
resentations used in the middle layer. An important role of the middle layer and the
query language is to eliminate direct linking of the context providing components to
context consuming components… Thus, the query language should support que-
rying a context value regardless of its source. However, knowing the source of a
value may be useful for the client in the case of finding the cause of an erroneous
inference, for example, and can thus be included in the query response. It should be
noted that since the CQL acts as a facade for the applications to the underlying
context representation, the context information requirements of the applications are
imposed as much on the query language as on the context representation and
context sources… Procedural programing is typically used…to create queries and
to handle query responses, adapting the application according to context. In con-
trast, the context representation…can be purely declarative.’
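As a hedged illustration of querying a context value ‘regardless of its source’, the snippet below runs a SPARQL query over an RDF context graph using the rdflib library; the namespace, triples, and property names are hypothetical.

```python
import rdflib

EX = rdflib.Namespace("http://example.org/context#")  # hypothetical namespace
g = rdflib.Graph()
g.add((EX.alice, EX.hasLocation, rdflib.Literal("kitchen")))
g.add((EX.alice, EX.hasActivity, rdflib.Literal("cooking")))

# Query a context value without referring to the component that produced it.
q = """
PREFIX ex: <http://example.org/context#>
SELECT ?location WHERE { ex:alice ex:hasLocation ?location }
"""
for row in g.query(q):
    print(row.location)   # -> kitchen
```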
References
Adjouadi M, Sesin A, Ayala M, Cabrerizo M (2004) Remote eye gaze tracking system as a
computer interface for persons with severe motor disability. In: Proceedings of the 9th
international conference on computers helping people with special needs, Paris, pp 761–766
Albrecht DW, Zukerman I (1998) Bayesian models for keyhole plan recognition in an adventure
game. User Model User Adap Interaction 8:5–47
Ashbrook D, Starner T (2002) Learning significant locations and predicting user movement with GPS. In: The 6th international symposium on wearable computers. IEEE Computer Society, Los Alamitos, CA, pp 101–108
Balomenos T, Raouzaiou A, Ioannou S, Drosopoulos A, Karpouzis K, Kollias S (2004) Emotion
analysis in man–machine interaction systems. In: Bengio S, Bourlard H (eds) Machine learning
for multimodal interaction, vol 3361. Lecture Notes in Computer Science, Springer, pp 318–328
Bao L, Intille S (2004) Activity recognition from user annotated acceleration data. In: Proceedings
of pervasive, LNCS3001, pp 1–17
Ballard DH, Brown CM (1982) Computer vision. Prentice Hall, New Jersey
Barghout L, Sheynin J (2013) Real-world scene perception and perceptual organization: lessons
from computer vision. J Vision 13(9):709–709
Baron-Cohen S (1995) Mindblindness. MIT Press, Cambridge
Barwise J, Perry J (1981) Situations and attitudes. J Philos 78(11):668–691
Beigl M, Gellersen HW, Schmidt A (2001) Mediacups: experience with design and use of
computer-augmented everyday objects. Comput Netw 35(4):401–409
Farringdon J, Moore AJ, Tilbury N, Church J, Biemond PD (1999) Wearable sensor badge and
sensor jacket for contextual awareness. In: 3rd international symposium on wearable
computers. IEEE Computer Society, Los Alamitos, CA, pp 107–113
Farringdon J, Oni V (2000) Visual augmented memory (VAM). In: Proceedings of the IEEE
international symposium on wearable computing (ISWC’00), Atlanta, GA, pp 167–168
Fiore L, Fehr D, Bodor R, Drenner A, Somasundaram G, Papanikolopoulos N (2008)
Multi-camera human activity monitoring. J Intell Rob Syst 52(1):5–43
Fishkin KP (2004) A taxonomy for and analysis of tangible interfaces. Personal Ubiquitous
Comput 8(5):347–358
Galotti KM (2004) Cognitive psychology in and out of the laboratory. Wadsworth
Gärdenfors P (2003) How homo became sapiens: on the evolution of thinking. Oxford University
Press, Oxford
Gaura E, Newman R (2006) Smart MEMS and sensor systems. Imperial College Press, London
Gellersen HW, Schmidt A, Beigl M (2001) Multi-sensor context-awareness in mobile devices and
smart artefacts. Department of Computing, Lancaster University, Lancaster, UK, Teco
University of Karlsruhe, Germany
Golding A, Lesh N (1999) Indoor navigation using a diverse set of cheap wearable sensors. In:
Proceedings of the IEEE international symposium on wearable computing (ISWC99), San
Francisco, CA, pp 29–36
Goldman AI (2006) Simulating minds: the philosophy, psychology and neuroscience of mind
reading. Oxford University Press, Oxford
Gray PD, Salber D (2001) Modelling and using sensed context information in the design of
interactive applications. In: Proceedings of the 8th IFIP international conference on engineering
for human–computer interaction (EHCI ’01), vol 2254. Springer, Toronto, pp 317–335
Gunes H, Piccardi M (2005) Automatic visual recognition of face and body action units. In:
Proceedings of the 3rd international conference on information technology and applications,
Sydney, pp 668–673
Göker A, Myrhaug HI (2002) User context and personalisation. In: ECCBR workshop on case
based reasoning and personalisation, Aberdeen
Haghighi PD, Zaslavsky A, Krishnaswamy S (2006) An evaluation of query languages for
context-aware computing. In: 17th international conference on database and expert systems
applications. IEEE, Krakow, pp 455–462
Henricksen K, Indulska J (2006) Developing context-aware pervasive computing applications:
models and approach. Pervasive Mobile Comput 2(1):37–64
Huynh DTG (2008) Human activity recognition with wearable sensors. PhD thesis, TU Darmstadt,
Darmstadt
Huynh T, Schiele B (2006) Unsupervised discovery of structure in activity data using multiple
eigenspaces. In: The 2nd international workshop on location- and context-awareness (LoCA),
vol 3987, LNCS
Huynh T, Blanke U, Schiele B (2007) Scalable recognition of daily activities with wearable
sensors. In: The 3rd international symposium on location- and context-awareness (LoCA), vol
4718, pp 50–67
Ikehara CS, Chin DN, Crosby ME (2003) A model for integrating an adaptive information filter
utilizing biosensor data to assess cognitive load. In: Brusilovsky P, Corbett AT, de Rosis F
(eds) UM 2003, vol 2702. LNCS, Springer, Heidelberg, pp 208–212
Ishikawa T, Horry Y, Hoshino T (2005) Touchless input device and gesture commands. In:
Proceedings of the international conference on consumer electronics, Las Vegas, NV, pp 205–206
Ivanov Y, Bobick A (2000) Recognition of visual activities and interactions by stochastic parsing.
IEEE Trans Pattern Anal Mach Intell 22(8):852–872
Jain AK (2004) Multibiometric systems. Commun ACM 47(1):34–44
Jähne B, Haußecker H (2000) Computer vision and applications, a guide for students and
practitioners. Academic Press, Massachusetts
Jang S, Woo W (2003) Ubi-UCAM: a unified context-aware application model. In: Modeling and
using context, pp 1026–1027
José R, Rodrigues H, Otero N (2010) Ambient intelligence: beyond the inspiring vision. J Univ
Comput Sci 16(12):1480–1499
Kahn JM, Katz RH, Pister KSJ (1999) Next century challenges: mobile networking for “Smart Dust”. Department of Electrical Engineering and Computer Sciences, University of California
Kaiser S, Wehrle T (2001) Facial expressions as indicators of appraisal processes. In: Scherer KR,
Schorr A, Johnstone T (eds) Appraisal processes in emotions: theory, methods, research.
Oxford University Press, New York, pp 285–300
Kanade T, Cohn JF, Tian Y (2000) Comprehensive database for facial expression analysis. In:
International conference on automatic face and gesture recognition, France, pp 46–53
Kautz H (1991) A formal theory of plan recognition and its implementation. In: Allen J, Pelavin R,
Tenenberg J (eds) Reasoning about plans. Morgan Kaufmann, San Mateo, CA, pp 69–125
Kern K, Schiele B, Junker H, Lukowicz P, Troster G (2002) Wearable sensing to annotate meeting
recordings. In: The 6th international symposium on wearable computer. The University of
Washington, Seattle, pp 186–193
Kim S, Suh E, Yoo K (2007) A study of context inference for web-based information systems.
Electron Commer Res Appl 6:146–158
Kirsh D (2001) The context of work. Human Comput Interaction 16:305–322
Klette R (2014) Concise computer vision, Springer, Berlin
Korpipaa P, Mantyjarvi J, Kela J, Keranen H, Malm E (2003) Managing context information in
mobile devices. IEEE Pervasive Comput 2(3):42–51
Kwon OB, Choi SC, Park GR (2005) NAMA: a context-aware multi-agent based web service
approach to proactive need identification for personalized reminder systems. Expert Syst Appl
29:17–32
Laudon KC, Laudon JP (2006) Management information systems: managing the digital firm.
Pearson Prentice Hall, Upper Saddle River, NJ
Lee SW, Mase K (2002) Activity and location recognition using wearable sensors. IEEE Pervasive
Comput 1(3):24–32
Lee CM, Narayanan S, Pieraccini R (2001) Recognition of negative emotion in the human speech
signals. In: Workshop on automatic speech recognition and understanding
Lei H, Sow DM, John I, Davis S, Banavar G, Ebling MR (2002) The design and applications of a
context service. SIGMOBILE Mobile Comput Commun Rev 6(4):45–55
Liao L, Fox D, Kautz H (2007) Extracting places and activities from GPS traces using hierarchical
conditional random fields. Int J Rob Res 26(1):119–134
Ling B (2003) Physical activity recognition from acceleration data under semi-naturalistic
conditions. Masters thesis, Massachusetts Institute of Technology (MIT), MA
Luger G, Stubblefield W (2004) Artificial intelligence: structures and strategies for complex
problem solving. The Benjamin/Cummings Publishing Company, Inc
Lyshevski SE (2001) Nano- and microelectromechanical systems: fundamentals of nano- and
microengineering. CRC Press, Boca Ratón, EUA
Mayrhofer R (2004) An architecture for context prediction. In: Ferscha A, Hörtner H, Kotsis G
(eds) Advances in pervasive computing, vol 176, Austrian Computer Society (OCG)
Michel P, El Kaliouby R (2003) Real time facial expression recognition in video using support
vector machines. In: The 5th international conference on multimodal interfaces, Vancouver,
pp 258–264
Mitchell T (1997) Machine learning. McGraw Hill, London
MIT Media Lab (2014) Affective computing: highlighted projects. http://affect.media.mit.edu/
projects.php. Accessed 12 Oct 2013
Mäntyjärvi J, Seppänen T (2002) Adapting applications in mobile terminals using fuzzy context
information. In: Human computer interaction with mobile devices, pp 383–404
Morris T (2004) Computer vision and image processing. Palgrave Macmillan, London
Nilsson NJ (1986) Probabilistic logic. Artif Intell 28(1):71–87
Nilsson N (1998) Artificial intelligence: a new synthesis. Morgan Kaufmann Publishers,
Massachusetts
Oviatt S, Darrell T, Flickner M (2004) Multimodal interfaces that flex, adapt, and persist.
Commun ACM 47(1):30–33
Padovitz A, Loke SW, Zaslavsky A (2008) The ECORA framework: a hybrid architecture for
context-oriented pervasive computing. Pervasive Mobile Comput 4(2):182–215
Pantic M, Rothkrantz LJM (2003) Automatic analysis of facial expressions: the state of the art.
IEEE Trans Pattern Anal Mach Intell 22(12):1424–1445
Park S, Locher I, Savvides A, Srivastava MB, Chen A, Muntz R, Yuen S (2002) Design of a
wearable sensor badge for smart kindergarten. In: The sixth international symposium on
wearable computers. IEEE Computer Society, Los Alamitos, CA, pp 231–238
Parkka J, Ermes M, Korpipaa P, Mantyjarvi J, Peltola J, Korhonen I (2006) Activity classification
using realistic data from wearable sensors. IEEE Trans Inf Technol Biomed 10(1):119–128
Passer MW, Smith RE (2006) The science of mind and behavior. McGraw Hill, Boston, MA
Patterson DJ, Fox D, Kautz H, Philipose M (2005) Fine-grained activity recognition by
aggregating abstract object usage. In: Proceedings of the IEEE international symposium on
wearable computers, pp 44–51
Pearlson KE, Saunders CS (2004) Managing and using information systems: a strategic approach.
Wiley, New York
Perttunen M, Riekki J, Lassila O (2009) Context representation and reasoning in pervasive
computing: a review. Int J Multimedia Eng 4(4)
Philipose M, Fishkin KP, Perkowitz M, Patterson DJ, Hahnel D, Fox D, Kautz H (2004) Inferring
activities from interactions with objects. IEEE Pervasive Comput Mobile Ubiquitous Syst 3
(4):50–57
Poole D, Mackworth A, Goebel R (1998) Computational intelligence: a logical approach. Oxford
University Press, New York
Poslad S (2009) Ubiquitous computing smart devices, smart environments and smart interaction.
Wiley, New York
Quinlan R (1993) C4.5: Programs for machine learning. Morgan Kaufmann, San Mateo
Randell C, Muller H (2000) The shopping jacket: wearable computing for the consumer. Personal
Technol 4:241–244
Ranganathan A (2005) A task execution framework for autonomic ubiquitous computing. PhD
dissertation, University of Illinois at Urbana-Champaign, Urbana, Illinois
Ranganathan A, Campbell RH (2003) An infrastructure for context-awareness based on first order
logic. Personal Ubiquitous Comput 7(6):353–364
Ranganathan A, Campbell RH (2004) Autonomic pervasive computing based on planning. In:
Proceedings of international conference on autonomic computing, New York, pp 80–87, 17–18
May 2004
Ranganathan A, Al-Muhtadi J, Campbell RH (2004a) Reasoning about uncertain contexts in
pervasive computing environments. IEEE Pervasive Comput 3(2):62–70
Ranganathan A, Al-Muhtadi J, Chetan S, Campbell R, Mickunas MD (2004b) Middlewhere: a
middleware for location awareness in ubiquitous computing applications. In: Proceedings of the
5th ACM/IFIP/USENIX international conference on middleware. Springer, Berlin, pp 397–416
Rapaport WJ (1996) Understanding: semantics, computation, and cognition, pre-printed as
technical report 96–26. SUNY Buffalo Department of Computer Science, Buffalo
Reichle R, Wagner M, Khan MU, Geihs K, Valla M, Fra C, Paspallis N, Papadopoulos GA (2008)
A Context query language for pervasive computing environments. In: 6th Annual IEEE
international conference on pervasive computing and communications, pp 434–440
Reilly RB (1998) Applications of face and gesture recognition for human–computer interaction. In:
Proceedings of the 6th ACM international conference on multimedia, Bristol, pp 20–27
Reiter R (2001) Knowledge in action: logical foundations for specifying and implementing
dynamical systems. MIT Press, Cambridge
Rhodes B (1997) The wearable remembrance agent: a system for augmented memory. In: The 1st
international symposium on wearable computers. IEEE Computer Society, Los Alamitos, CA,
pp 123–128
Riva G, Loreti P, Lunghi M, Vatalaro F, Davide F (2003) Presence 2010: the emergence of
ambient intelligence. In: Riva G, Davide F, IJsselsteijn WA (eds) Being there: concepts, effects
and measurement of user presence in synthetic environments. Ios Press, Amsterdam, pp 60–81
Riva G, Vatalaro F, Davide F, Alcañiz M (2005) Ambient intelligence: the evolution of
technology, communication and cognition towards the future of human–computer interaction.
IOS Press, Amsterdam
Russell S, Norvig P (2003) Artificial intelligence—a modern approach. Pearson Education, Upper
Saddle River, NJ
Sagonas K, Swift T, Warren DS (1994) XSB as an efficient deductive database engine. In:
Proceedings of the ACM SIGMOD international conference on management of data.
Minneapolis, Minnesota, New York, pp 442–453
Saffo P (1997) Sensors: the next wave of infotech innovation, 1997 ten-year forecast. Institute for
the Future. http://www.saffo.com/essays/sensors.php. Accessed 25 March 2008
Salvucci DD, Anderson JR (2001) Automated eye movement protocol analysis. Human Comput
Interaction 16(1):38–49
Sanders DA (2008) Environmental sensors and networks of sensors. Sensor Rev 28(4):273–274
Sanders DA (2009a) Introducing AI into MEMS can lead us to brain-computer interfaces and
super-human intelligence. Assembly Autom 29(4)
Sanders DA (2009b) Ambient intelligence and energy efficiency in rapid prototyping and
manufacturing. Assembly Autom 29(3):205–208
Scherer KR (1992) What does facial expression express? In: Strongman K (ed) International
review of studies on emotion, vol 2, pp 139–165
Scherer KR (1994) Plato’s legacy: relationships between cognition, emotion, and motivation.
University of Geneva
Schweiger R, Bayerl P, Neumann H (2004) Neural architecture for temporal emotion
classification. In Andre E, Dybkjær L, Minker W, Heisterkamp P (eds) ADS 2004, vol
3068. LNCS (LNAI), Springer, Heidelberg, pp 49–52
Schmidt A (2002) Ubiquitous computing—computing in context. PhD dissertation, Lancaster
University
Schmidt A (2005) Interactive context-aware systems interacting with ambient intelligence. In:
Riva G, Vatalaro F, Davide F, Alcañiz M (eds) Ambient intelligence: the evolution of
technology, communication and cognition towards the future of human–computer interaction.
IOS Press, Amsterdam, pp 159–178
Schmidt A (2006) Ontology-based user context management, the challenges of imperfection and
time-dependence. In: On the move to meaningful internet systems: CoopIS, DOA, GADA, and
ODBASE, vol 4275. Lecture Notes in Computer Science, pp 995–1011
Schmidt A, Beigl M, Gellersen HW (1999) There is more to context than location. Comput
Graph UK 23(6):893–901
Sebe N, Lew MS, Cohen I, Garg A, Huang TS (2002) Emotion recognition using a cauchy naive
Bayes classifier. In: Proceedings of the 16th international conference on pattern recognition,
vol 1. IEEE Computer Society, Washington, DC, pp 17–20
Shapiro LG, Stockman GC (2001) Computer vision. Prentice Hall, New Jersey
Sheldon EM (2001) Virtual agent interactions, PhD thesis, Major Professor-Linda Malone
Sibert LE, Jacob RJK (2000) Evaluation of eye gaze interaction. In: Proceedings of the ACM
conference on human factors in computing systems, The Hague, pp 281–288
Soldatos J, Pandis I, Stamatis K, Polymenakos L, Crowley JL (2007) Agent based middleware
infrastructure for autonomous context-aware ubiquitous computing services. Comput Commun
30(3):577–591
Straccia U (2005) Towards a fuzzy description logic for the semantic web (preliminary report). In:
Proceedings of the second European semantic web conference, ESWC 2005, vol 3532. Lecture
Notes in Computer Science, Springer, Berlin
Sung M, Marci C, Pentland A (2005) Wearable feedback systems for rehabilitation. J NeuroEng
Rehabil 2(17):1–12
Tapia EM, Intille S (2007) Real-time recognition of physical activities and their intensities using
wireless accelerometers and a heart rate monitor. In: Paper presented at international
symposium on wearable computers (ISWC)
Teixeira J, Vinhas V, Oliveira E, Reis L (2008) A new approach to emotion assessment based on
biometric data. In: Proceedings of WI–IAT ‘08, pp 459–500
Tobii Technology (2006) AB, Tobii 1750 eye tracker, Sweden. www.tobii.com. Accessed 21 Nov
2012
Truong BA, Lee Y, Lee S (2005) Modeling uncertainty in context aware computing. In:
Proceedings of the 4th annual ACIS international conference on computer and information
science, pp 676–681
van Dorp P, Groen FCA (2003) Human walking estimation with radar. IEEE Proc Radar Sonar
Navig 150(5):356–365
Van Laerhoven K, Gellersen HW (2001) Multi sensor context awareness. Abstract, Department of
Computing, Lancaster University, Lancaster
Van Laerhoven K, Schmidt A, Gellersen H (2002) Multi-sensor context aware clothing. In: The
6th international symposium on wearable computer. IEEE Computer Society, Los Alamitos,
CA, pp 49–56
Vardy A, Robinson JA, Cheng LT (1999) The WristCam as input device. In: Proceedings of the
3rd international symposium on wearable computers, San Francisco, CA, pp 199–202
Vick RM, Ikehara CS (2003) Methodological issues of real time data acquisition from multiple
sources of physiological data. In: Proceedings of the 36th annual Hawaii international
conference on system sciences. IEEE Computer Society, Washington, DC, pp 1–156
Waldner JB (2008) Nanocomputers and swarm intelligence. ISTE, London
Wang S, Pentney W, Popescu AM, Choudhury T, Philipose M (2007) Common sense based joint
training of human activity recognizers. In: Proceedings of the international joint conference on
artificial intelligence, Hyderabad, India, pp 2237–2242
Ward JA, Lukowicz TP, Starner TG (2006) Activity recognition of assembly tasks using body-worn
microphones and accelerometers. IEEE Trans Pattern Anal Mach Intell 28(10):1553–1567
Weiser M (1991) The computer for the 21st Century. Sci Am 265(3):94–104
Wimmer M, Mayer C, Radig B (2009) Recognizing facial expressions using model-based image
interpretation. In: Esposito A, Hussain A, Marinaro M, Martone R (eds) Multimodal signals:
cognitive and algorithmic issues. Springer, Berlin, pp 328–339
Wobke W (2002) Two logical theories of plan recognition. J Logic Comput 12(3):371–412
Wright D (2005) The dark side of ambient intelligence. Foresight 7(6):33–51
Zadeh LA (1999) Fuzzy sets as a basis for a theory of possibility. Fuzzy Sets Syst 100:9–34
Zhou J, Yu C, Riekki J, Kärkkäinen E (2007) AmE framework: a model for emotion-aware ambient intelligence. University of Oulu, Department of Electrical and Information Engineering; Faculty of Humanities, Department of English; VTT Technical Research Centre of Finland
Chapter 5
Context Modeling, Representation,
and Reasoning: An Ontological
and Hybrid Approach
5.1 Introduction
Over the last decade, a number of context modeling and reasoning approaches have
been developed, ranging from simple early models to the current state of the art.
These models have been utilized to develop a large number of context-aware
applications for, or within, various application domains. With the experience gained
from developing this variety of context-aware applications, context information
models have evolved from static, unexpressive, inflexible representations to more
dynamic, semantic (highly expressive), and extensible ones, providing support for
reasoning about context with enhanced computational performance. Key-value
models are among the earliest models in context-aware applications. They use
simple key-value pairs to define a list of attributes and their values as an approach to
describing context information. At the onset, attribute-value models were quite often
used, e.g., in the Context Toolkit for building context-aware applications (Dey 2000).
Markup-based modeling is another approach to context information models; it uses
the Extensible Markup Language (XML), among a variety of markup languages.
The Composite Capabilities/Preference Profile (CC/PP) (Klyne et al. 2004) is a
context modeling approach involving both the key-value-pair and markup-based
approaches to context information models. The CC/PP approach is perhaps the first
context modeling approach to adopt the Resource Description Framework (RDF) and
to include elementary constraints and relationships between context types (Bettini
et al. 2010). It ‘can be considered a representative both of the class of key-value
models and of markup models, since it is based on RDF syntax to store key-value
pairs under appropriate tags. Simple kinds of reasoning over the elementary con-
straints and relationships of CC/PP can be performed with special purpose rea-
soners.’ (Ibid, p. 3). These early approaches to context information models have
many shortcomings and cannot respond to the growing complexity of the context
information used by context-aware applications. Indeed, they are criticized for their
limited capabilities in capturing a variety of context types, relationships, depen-
dencies, timeliness, and quality of context information; in allowing consistency
checking; and in supporting reasoning on (or inference of) higher level context
abstractions and context uncertainty (Ibid). This is according to the evaluations in
the literature surveys carried out by Indulska et al. (2003), Strang and
Linnhoff-Popien (2004), and Lum and Lau (2002). Recent research on context
modeling and reasoning has attempted to address many of these limitations, giving
rise to a new class of context information models characterized by more expressive
context modeling tools. A common feature of recent models is the capability to
define concepts, their interrelationships, and the constraints on their application.
Examples of these models include, but are not limited to: 4-ary predicates (Rom
et al. 2002), object-oriented models (Hofer et al. 2003), and fact-based models
(Bettini et al. 2010). These models, however, differ in terms of expressiveness and
reasoning efficiency, and offer quite a few distinctive features in terms of reducing
the complexity of context-aware application development. For example, the
‘fact-based context modeling approach…originated from attempts to create suffi-
ciently formal models of context to support query processing and reasoning, as well
as to provide modeling constructs suitable for use in software engineering tasks
such as analysis and design.’ (Ibid). Fact-based models use the Context Modeling
Language (CML) (see, e.g., Henricksen et al. 2004). CML is based on Object-Role
Modeling (ORM) but extends it with modeling constructs for ‘capturing the dif-
ferent classes and sources of context facts…: specifically, static, sensed, derived,
and user-supplied…information; capturing imperfect information using quality
metadata and the concept of ‘‘alternatives’’ for capturing conflicting assertions
(such as conflicting location reports from multiple sensors); capturing dependencies
between context fact types; and capturing histories for certain fact types and con-
straints on those histories.’ (Bettini et al. 2010).
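To make the contrast concrete, the following minimal Python sketch—whose class and field names are illustrative inventions, not actual CML constructs—shows how a flat key-value model represents context, and how a fact-based model of the kind quoted above might attach a source type, quality metadata, and conflicting ‘alternatives’ to a single context fact:

```python
from dataclasses import dataclass, field
from typing import List

# Key-value model: a flat list of attribute-value pairs, with no way
# to express quality, provenance, or conflicting sensor readings.
key_value_context = {"user": "alice", "location": "room_42"}

@dataclass
class ContextFact:
    """A single fact in a CML-like fact-based model (hypothetical names)."""
    subject: str            # the entity the fact is about
    fact_type: str          # e.g., 'locatedIn'
    value: str
    source: str             # static, sensed, derived, or user-supplied
    quality: float          # quality metadata, e.g., confidence in [0, 1]
    alternatives: List["ContextFact"] = field(default_factory=list)

# Two sensors give conflicting location reports; the fact-based model
# keeps both as 'alternatives' instead of silently overwriting one.
gps_report = ContextFact("alice", "locatedIn", "corridor", "sensed", 0.6)
wifi_report = ContextFact("alice", "locatedIn", "room_42", "sensed", 0.8,
                          alternatives=[gps_report])

# A trivial resolution policy: prefer the highest-quality assertion.
best = max([wifi_report] + wifi_report.alternatives, key=lambda f: f.quality)
print(best.value)  # -> room_42
```

The point of the sketch is only that the fact-based representation retains enough metadata for a consumer to resolve conflicts by an explicit policy, whereas the key-value dictionary cannot even express that a conflict exists.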
Situated as one of the latest waves of context modeling approaches, ontologies
have evolved as the types of context information used by context-aware applica-
tions have grown more sophisticated. Ontological approaches to context information
modeling can be considered a natural extension of CC/PP- and RDF-based
approaches ‘to satisfy the requirements of heterogeneity, relationship, and reason-
ing’ (Bettini et al. 2010, p. 4). Ontological context models are characterized by high
expressiveness and apply ontology-based reasoning on context using description
logics. Hence, ontologies are considered very suitable for context models.
Expressive power, in particular, is a factor that significantly influences reasoning
processes: it fuels sound context reasoning mechanisms. Indeed, the use of the Web
Ontology Language (OWL) as a representation scheme better supports automated
reasoning. There exist numerous representations for defining context ontologies and
for specifying context types and their descriptors and relationships, including OWL,
the W3C’s Semantic Web activities, and the Resource Description Framework
(RDF); these logic-based languages probably gave a boost to ontologies (Criel and
Claeys 2008). For examples of ontology-based context models, see Gu et al. (2005),
Chen et al. (2004b), and Korpipää et al. (2005). Furthermore, ontology researchers
have recently started to explore the possibility of integrating different models (e.g.,
representation sublanguages) and different types of reasoning mechanisms in order
to obtain more flexible, robust, and comprehensive systems. This hybrid approach to
context modeling is bringing, as research shows, many benefits.
5.4 Representation
To some extent, such identifiers are necessary for a context-aware system to be able
to identify the various entities of contexts in a unique way in the real-world domains
the system deals with. This uniqueness allows representations to be reused without
conflicts between identifiers. All work applying OWL naturally supports the
expression of unique identifiers (as URIs), whereas other work does not deal with
unique identifiers.
5.4.2 Validation
This allows software components to ensure that data are consistent with their
representation schema before performing any reasoning on, or processing of, those
data. According to Strang and Linnhoff-Popien (2004), a context representation
should allow data to be validated against it.
5.4.3 Expressiveness
5.4.6 Generality
This entails the ability of a context representation to support all kinds of context
information (Korpipää 2005). In this perspective, the generality of a context repre-
sentation is associated with the expressiveness of the representation language, since
expressiveness affects the ability to encode context information of different forms
of complexity.
5.5 Reasoning
5.5.3 Interoperability
As Perttunen et al. (2009, p. 5) state, ‘evaluated against the same set of axioms, a set
of assertions should always produce the same conclusions. This implies that when a
set of assertions represents a message, its receiver can derive the exact meaning the
sender had encoded in the message.’
While the congruency of inference conclusions represents a basic prerequisite
for interoperability, it entails a disadvantage in terms of strengthening ‘ontological
commitment’ (Studer et al. 1998). That is to say, ‘the more consequences are
encoded as axioms in the representation, the more its clients are tied to dealing with
the represented entities in the exact same manner’, a case which ‘is undesirable
when only a few of the entities of the representation are of interest to the client.’
(Perttunen et al. 2009, p. 5). The reuse of Web Ontology Language (OWL) ontology
modules is one way of dealing with this issue (Bechhofer et al. 2004).
There have been many attempts to synthesize and evaluate state-of-the-art
context models that are suitable for any kind of application and that can meet most
of the requirements set for context modeling, reasoning, and management. The
experience with the variety of context-aware applications developed on the basis of
various context models has influenced the set of requirements defined for generic
context models and for the context representation and reasoning of the system. In a
recent survey of context modeling and reasoning techniques, Bettini et al. (2008)
synthesize a set of requirements for a generic context information modeling, rea-
soning, and management approach. These requirements, presented below, need to
be taken into account when modeling context information.
There is a large variety of context information sources (e.g., mobile sensors,
biosensors, location sensors, image sensors) that context information models have
to handle. These sources differ—in addition to the quality of the information they
generate—in their means of collecting and interpreting information about certain
processes of the human world and/or states of the physical world; in update rate
(user profiles versus user behaviors and activities); in the dynamic nature of context
data; in semantic level; in the derivation of context data from existing context
information; and so on. Moreover, context-aware applications that depend on
mobile context information sources add to the issue of heterogeneity, owing to the
need for context information provisioning to be flexible and adaptable to the
changing environment. It is essential that context information models consider
different aspects and types of context information in terms of handling and
management.
5.6.3 Timeliness
5.6.4 Imperfection
The variable quality of context information may be associated with its dynamic and
varied nature. Accordingly, the changing patterns of the physical world affect the
sensed values, increasing inaccuracy over time or rendering context data incorrect.
Added to this are the potential incompleteness of context information and its
conflicts with other context information. It is therefore essential that a context
modeling approach incorporate the modeling of context information quality as a
means to support reasoning about context information.
5.6.5 Reasoning
The key features of modeling formalisms are the ease with which software
designers—who create context information models to enable context-aware
applications to manipulate context information—can translate real-world concepts
associated with various situations into modeling constructs and their interrelation-
ships, as well as the ease with which such applications can manipulate and utilize
context information.
The context modeling approach needs to support the representation of attributes for
appropriate access paths—i.e., dimensions along which context-aware applications
select context information—in order to pick out the pertinent objects. This is
associated with the efficiency of access to context information, a requirement that
the presence of numerous data objects and large models makes difficult to meet.
Those dimensions are, as stated by the authors, ‘often referred to as primary con-
text, in contrast to secondary context, which is accessed using the primary context.
Commonly used primary context attributes are the identity of context objects,
location, object type, time, or activity of user. Since the choice of primary context
attributes is application-dependent, given an application domain, a certain set of
primary context attributes is used to build up efficient access paths’ (Bettini et al.
2010, p. 3).
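As a rough illustration of primary context as an access path, the sketch below (hypothetical structures, not drawn from Bettini et al.) indexes context objects by two commonly used primary attributes, identity and location, so that an application can retrieve the pertinent objects without scanning the whole model:

```python
from collections import defaultdict

class ContextStore:
    """Index context objects by primary context attributes (illustrative)."""
    def __init__(self):
        self._by_identity = defaultdict(list)
        self._by_location = defaultdict(list)

    def add(self, obj: dict):
        # Each context object carries its primary attributes explicitly.
        self._by_identity[obj["identity"]].append(obj)
        self._by_location[obj["location"]].append(obj)

    def by_identity(self, identity: str):
        return self._by_identity[identity]

    def by_location(self, location: str):
        return self._by_location[location]

store = ContextStore()
store.add({"identity": "alice", "location": "kitchen",
           "type": "person", "time": "09:00"})
store.add({"identity": "stove", "location": "kitchen",
           "type": "appliance", "time": "09:01"})

# Secondary context (e.g., an ongoing activity) would be looked up via
# these primary-context access paths, here everything tagged 'kitchen'.
print([o["identity"] for o in store.by_location("kitchen")])
```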
The experience gained from developing context-aware applications has shown
that deriving, and taking into account, the requirements for the generic context
knowledge representation and reasoning of the system when modeling context
information is fraught with difficulty, owing to problematic issues that usually
emerge at the time of writing the definition of some context domain and devising
the related reasoning mechanisms. Context models are usually created for specific
use cases or applications. They have always been application dependent, and there
are not really generic context models suitable for all kinds of applications (Dey
2001). ‘As the context representation and reasoning of the system should be
divided between generic and application-specific, the generic representation and
reasoning can be encoded in the common ontologies, and the application-specific,
in turn, in ontologies extending the common ontology and as rules.’ (Perttunen
et al. 2009, p. 20). Moreover, deriving precisely the requirements for a generic
system of context information representation and reasoning is difficult, as such a
system should support all kinds of applications, some of which are not even known
at system design time. A straightforward way to approach the situation caused by
this inherent problem—because of which the design of the context information
representation and reasoning system necessarily relies on general requirements—is to
derive requirements from a typical application, thereby designing for the ‘average’
(Perttunen et al. 2009). Nonetheless, there have been some recent attempts (e.g.,
Strimpakou et al. 2006) to design and develop generic context models not tied to
specific application domains. Common to most approaches to generic context
models is that they should allow various levels of context abstraction to be defined,
support mappings and operations on contextual representations across various
contextual entities, and enable easy reuse and dynamic sharing across models as
well as applications. Unlike the system of context information representation and
reasoning, the context management system can be supported by generic mechanisms
applicable in any context domain and should not be bound to specific application
spaces. Overall, the design of a generic representation, reasoning, and management
approach to context information modeling remains a quite challenging endeavor,
and thus a research area that merits further attention.
As structural frameworks for organizing knowledge about humans and the world in
a computerized way, ontologies have a wide applicability. This spans diverse areas,
including AI (e.g., conversational agents, emotionally intelligent systems, affective
systems, expert systems, etc.), AmI (e.g., cognitive, emotional, activity, and loca-
tion context-aware systems), the Semantic Web, enterprise engineering, testing, and
academic research.
Due to their semantically expressive formalism, ontologies meet most of the
requirements set for representation and reasoning and are distinct from other
models in quite a few respects. Ontologies allow heterogeneous applications (e.g.,
mobile, ubiquitous, AmI, and AI applications) to be integrated; enable integration
and interoperability through shared structure and vocabulary across multiple
models and applications; enable reusability and portability of models between
various application domains and systems; amalgamate different representation
schemes and reasoning techniques; provide interfaces for interacting with
knowledge-based software agents; and so on. As to the latter, for example, an
ontology specifies a vocabulary with which to make assertions, which may con-
stitute inputs or outputs of software (knowledge) agents; as an interface specifica-
tion, it thus provides a language for communicating with the agent. The agent is not
required to use the terms of the ontology as an internal encoding of its knowledge,
while the definitions and formal constraints of the ontology put restrictions on what
can be meaningfully stated in that language (Gruber 2009). Fundamentally, in order to
The context representation problem has two sides: the encoding of knowledge and
the conceptual model. Ontology designers argue that the conceptual structure is
associated with more issues and challenges than the encoding process. Winograd
(2001) notes that, once understood, it is relatively straightforward to put what needs
to be encoded into data structures; the hard part is to come up ‘with con-
ceptual structures that are broad enough to handle all of the different kinds of
context, sophisticated enough to make the needed distinctions, and simple enough
to provide a practical base for programing.’
Ontological context modeling is the process of explicitly specifying a set of rep-
resentational primitives—i.e., key concepts and their interrelations—and other
distinctions that are relevant to modeling a domain of context, and of building a
representation structure or scheme that encodes such primitives and other distinc-
tions using the commonly shared vocabularies of the context domain. The repre-
sentational primitives include, as part of their explicit specification, constraints on
the logically consistent application and use of concepts and their interrelations. The
resulting context ontologies—explicit representations of contexts comprising
context categories and their relationships in, for instance, a cognitive, emotional,
social, or situational domain—are essentially shared context knowledge models that
enhance automated processing capabilities by enabling software agents to interpret
and reason about context information, thereby allowing intelligent decision support
in a knowledgeable manner. This is associated with the delivery of adaptive and
responsive services in a knowledgeable, autonomous manner. Contexts in context
ontologies are modeled on the basis of various contextual entities—e.g., emotional
state, cognitive state, task state, social state, environmental states, time, events, and
objects—as well as the interrelationships between these entities, a computational
feature which allows software agents to take advantage of semantic reasoning
directly to infer high-level context abstractions. Such dynamic context can be
derived from existing context information using intelligent analysis rather than
probabilistic methods.
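As a rough illustration of these ideas, the following Python sketch uses the rdflib library to declare a handful of representational primitives (classes, a property, and its domain and range constraints) and a few instance-level assertions, which a software agent can then query; the vocabulary is invented for illustration and is not taken from any published context ontology:

```python
from rdflib import Graph, Namespace, RDF, RDFS

# Hypothetical vocabulary for a small context ontology; the class and
# property names are illustrative, not drawn from SOUPA or CONON.
CTX = Namespace("http://example.org/context#")

g = Graph()
g.bind("ctx", CTX)

# Representational primitives: context categories and their relations.
g.add((CTX.EmotionalState, RDF.type, RDFS.Class))
g.add((CTX.Person, RDF.type, RDFS.Class))
g.add((CTX.hasEmotionalState, RDF.type, RDF.Property))
g.add((CTX.hasEmotionalState, RDFS.domain, CTX.Person))   # constraint
g.add((CTX.hasEmotionalState, RDFS.range, CTX.EmotionalState))

# Instance-level assertions produced, e.g., from interpreted sensor data.
g.add((CTX.alice, RDF.type, CTX.Person))
g.add((CTX.Stress, RDF.type, CTX.EmotionalState))
g.add((CTX.alice, CTX.hasEmotionalState, CTX.Stress))

# A software agent can now query the shared model with SPARQL.
q = "SELECT ?s WHERE { ctx:alice ctx:hasEmotionalState ?s . }"
for row in g.query(q, initNs={"ctx": CTX}):
    print(row.s)  # -> http://example.org/context#Stress
```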
Context ontologies thus provide a highly expressive context formalism that fuels
sound context reasoning mechanisms. To a large extent, the efficiency of reasoning
mechanisms is determined by the nature of the expressive system (formalism) used
to codify the context knowledge domain: its ability to deal with dynamic knowledge
updates, to encode complex context entities and relations (context information of
different forms of complexity), to use a simple representation, to account for the
uncertainty and incompleteness of context information, and so forth. Research
shows that ontological formalism is fundamental to the design, development, and
evaluation of reasoning mechanisms in context-aware applications.
Research in symbolic knowledge representation has been driven mostly by the
trade-off between the expressiveness of the representation and the complexity of
reasoning. This research remains active in investigating the interplay more closely
so as to achieve the most suitable solutions for data-intensive AmI systems (see
Perttunen et al. 2009). Ontologies—essentially descriptions of concepts and their
relationships—have emerged as an alternative solution: a common language for
defining user-specific rules based on description logics that support automated
reasoning. Description logics (Baader et al. 2003) have emerged because they
provide complete reasoning supported by optimized automatic mechanisms (Bettini
et al. 2010). While other reasoning techniques have been utilized in the field of
context-aware computing—such as probabilistic and statistical reasoning, logical
reasoning, case-based reasoning, and rule-based reasoning—the subset of OWL
admitting automatic reasoning (OWL-DL) is the most frequently used in various
application domains and is supported by various reasoning services.
context merely refer to using OWL inference, but the focus is on making inferences
using an external rule-based system. There is an important distinction between
inferences licensed by the ontology axioms and inferences based on arbitrary rules. In
the former, any reasoner for that ontology language produces the same results,
whereas in the latter both the ontology and the rules in the specific rule language are
needed, possibly also an identical rule engine. For this reason, much of the benefit of
using standard ontology languages is lost when inference is based on ad-hoc rules,
merely using the ontology terms as a vocabulary. Nevertheless, extending the rea-
soning beyond the inferences licensed by the ontology axioms is often necessary due
to the fact that the expressive power of the ontology language is often insufficient for
the task at hand.’ However, there is an active research to seek solutions to this issue
by extending OWL with rules (Maluszynski 2005). The semantic eWallet (Gandon
and Sadeh 2003) architecture for context awareness adopted more expressive
ontology languages obtained by extending OWL-DL with rules. Overall, ontologies
extending the common ontology and as rules are intended for encoding
application-specific as a category of the context representation and reasoning of the
system, whereas common ontologies are used to encode the generic representation
and reasoning.
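The distinction drawn above can be made concrete with a small, purely illustrative sketch: the first function derives only what follows from a subclass axiom (any conformant reasoner would reach the same conclusion), while the second applies an ad-hoc rule that only agents sharing exactly this rule and rule engine would reproduce; all names are hypothetical:

```python
# Inference licensed by an ontology axiom: any reasoner for the
# language derives it, because it follows from the axioms alone.
subclass_axioms = {"Stress": "EmotionalState"}   # Stress is-a EmotionalState
facts = {("alice", "hasState", "Stress")}

def axiom_closure(facts, subclass_axioms):
    derived = set(facts)
    for (s, p, o) in list(facts):
        if o in subclass_axioms:                  # class subsumption
            derived.add((s, p, subclass_axioms[o]))
    return derived

# Inference based on an arbitrary, ad-hoc rule: only agents that share
# this exact rule (and engine semantics) reach the same conclusion.
def adhoc_rules(facts):
    derived = set(facts)
    for (s, p, o) in facts:
        if p == "hasState" and o == "Stress":
            derived.add((s, "shouldReceive", "CalmingService"))
    return derived

print(axiom_closure(facts, subclass_axioms))
print(adhoc_rules(facts))
```

The first kind of conclusion travels with the ontology; the second travels only with the rule base, which is why much of the interoperability benefit of standard ontology languages is lost when inference rests on ad-hoc rules.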
Various OWL ontologies have been proposed for representing shared descriptions
of context, their commonality being that they are grounded in top-down taxonomic
hierarchies of the domain components. The SOUPA (Chen et al. 2004c) OWL
ontology for modeling context in pervasive environments and the CONON (Zhang
et al. 2005) ontology for smart home environments are recognized as being among
the most prominent proposals and notable examples of OWL ontologies.
OWL-DL ontological models of context have been adopted in several archi-
tectures for context awareness. The Context Broker Architecture (CoBrA) (Chen
et al. 2004a) for context awareness adopts the SOUPA (Chen et al. 2004c) ontology.
The authors note that reasoning is carried out both on the basis of the axioms in the
ontologies and through additional rule-based reasoning with arbitrary RDF triples,
using Jena’s rule-based OWL reasoning engine and the Java Expert System Shell
(Jess), respectively (Perttunen et al. 2009). Although there is no description of the
mechanism for detecting when OWL reasoning is not enough, ‘the system is said to
be able to query the ontology reasoner to find all relevant supporting facts, and to
convert the resulting RDF graph(s) into a Jess representation. A forward-chaining
procedure is executed in Jess and any new facts are converted back to RDF and
asserted to the ontology reasoner’ (Ibid, p. 11). The SOCAM (Gu et al. 2004b)
middleware is another architecture that espouses the CONON (Zhang et al. 2005)
ontology. As proposed in Bouquet et al. (2004), SOUPA and CONON can be
application designer instead of directly dealing with multiple sensors, thus allowing
context-aware applications to be designed without having to worry about which
sensors are being used or about evaluating the raw sensor data. In Gray and Salber
(2001), the authors discuss how the aspects of sensed contextual information
should be taken into account when designing context-aware applications. Their
work focuses on what they label the ‘meta-attributes’ of sensed contextual
information, as opposed to context information in general, such as sensory source,
representation forms, information quality, interpretation, reasoning, and actuation.
However, there have been attempts to incorporate various sensors into
context-aware systems in the desired manner. In ‘Building Distributed Context-
Aware Applications’ (Urnes et al. 2001), the authors address the problem of
dynamically and automatically managing a multitude of location sensors; the Jini
dynamic discovery protocol is utilized to interface with an arbitrary number of
location sensors and deliver their information to a position service. This protocol is
commonly employed to manage a wide variety of sensors.
Sensors and ontological context models are needed to deal with context in a
computerized way so that it can be supported in AmI environments. The context
recognition process entails acquiring sensor readings and mapping them to corre-
sponding properties defined in ontologies; aggregating and fusing multiple sensor
observations using context ontologies to create a high-level context abstraction; and
performing automated processing by allowing software agents to interpret infor-
mation and reason over the ontological context—making knowledge-based
intelligent decisions as to what application actions to take in response to user
needs. The idea is to abstract from low-level context by creating a new model layer
that takes the sensor perceptions as input and generates inferences and system
actions. Acquired without further interpretation, low-level context information from
physical sensors can be meaningless, trivial, vulnerable to small changes, or
uncertain (Ye et al. 2007). The derivation of higher level context information from
raw sensor values is a means of alleviating the limitations of low-level contextual
cues when modeling users’ behavior and interactions, limitations that risk reducing
the usefulness of context-aware applications (Bettini et al. 2010). High-level context
abstraction is a layer referred to in the literature as situational context (e.g.,
Gellersen et al. 2002) or situation (e.g., Dobson and Ye 2006; Dey 2001). As a
higher level concept for state representation, a situation brings meaning to the
application, so that the application becomes useful to the user through its relevant
actions. In context-aware applications, situations are external semantic interpreta-
tions of low-level context (Dobson and Ye 2006). They allow for a higher level
specification of human actions in the scene and the corresponding application
services (Bettini et al. 2010). These can be of an affective, cognitive, social, or
communicative nature, and the behavior of the AmI system is triggered by changes
of situation. Compared to
Fig. 5.1 Overview of the different layers of semantic context interpretation and abstraction.
Source Bettini et al. (2008)
low-level contextual cues, situations are more stable and easier to define and
maintain, and thus make the design and implementation of context-aware appli-
cations much easier, because the designer can operate at a high level of abstraction
rather than on all the context cues that create the situation (Ibid). Figure 5.1
illustrates the basic ideas discussed so far. The three layers of the pyramid, from
bottom to top, are: sensor-based low-level context information, which is seman-
tically interpreted by the high-level context layer; situations, which abstract from
low-level data and are reusable in different applications; and relationships defined
between situations, which can provide further abstraction and limit complexity
(Ibid). The top layer has narrow applicability in approaches to context-aware
applications, which usually focus on defining and recognizing contexts/situations.
Nevertheless, according to Bettini et al. (2008), one motivation behind approaches
that specify and model situation relationships ‘is to considerably reduce the search
space for potential situations to be recognized, once the actual situation is known
and knowing possible relationships (e.g., knowing possible successor situations of
the current situation).’
Soldatos et al. (2007) present a context model in which situations represent
environmental state descriptions based on entities and their properties. In this sit-
uation model, states are connected by transitions, which can be triggered by
changes in the properties of observed entities. However, including all potential
situations, their relationships, and transitions is not always possible, particularly in
informal settings and scenarios (Bettini et al. 2010). Indeed, Soldatos et al. (2007)
note that their context model may not seem scalable, since the situation states will
hardly capture all possible contexts.
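A loose, hypothetical rendering of such a situation model—situations as states, with transitions fired by changes in the properties of observed entities—might look as follows in Python (it does not reproduce Soldatos et al.’s actual model):

```python
# Situations as states; transitions triggered by entity-property changes.
TRANSITIONS = {
    # (current_situation, entity, property, new_value) -> next_situation
    ("Empty", "room", "occupancy", "occupied"): "Meeting",
    ("Meeting", "room", "occupancy", "empty"): "Empty",
    ("Meeting", "projector", "power", "on"): "Presentation",
}

def next_situation(current, entity, prop, value):
    # Unmatched property changes leave the situation unchanged.
    return TRANSITIONS.get((current, entity, prop, value), current)

situation = "Empty"
for event in [("room", "occupancy", "occupied"), ("projector", "power", "on")]:
    situation = next_situation(situation, *event)
    print(situation)   # Meeting, then Presentation
```

The scalability concern noted above is visible even here: the transition table must enumerate every situation and triggering property change in advance, which quickly becomes infeasible in informal settings.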
In all, establishing links between context information and sensor observations
through context properties defined in ontologies is a critical step in context
awareness functionality. The whole process of context awareness—involving
low-level sensor data acquisition, middle-level data aggregation and fusion based
on context ontologies, and information interpretation and high-level context rea-
soning—can be made more efficient and effective by employing faster and simpler
The InstanceStore system proposed by Horrocks et al. (2004) is likewise based on
the idea of improving the efficiency of reasoning with OWL-DL through the use of
relational database techniques.
Bettini et al. (2008) survey two hybrid approaches to context modeling: one is a
loosely coupled markup-based/ontological model, the CARE framework for context
awareness proposed by Agostini et al. (2009); the other, proposed by Henricksen
et al. (2004), combines the ontological approach with the fact-based approach of
the Context Modeling Language (CML). The
CARE framework espouses ‘a context modeling approach that is based on a loose
integration between a markup model and an ontological model. The integration
between these models is realized through the representation of context data by
means of CC/PP profiles which contain a reference to OWL-DL classes and rela-
tions. In order to preserve efficiency, ontological reasoning is mainly performed in
advance with respect to the service provision. Whenever relevant new context data
is acquired, ontological reasoning is started, and derived information is used, if still
valid, at the time of service provisioning together with efficient rule evaluation.
Complex context data (e.g., the user’s current activity) derived through ontological
reasoning can be used in rule preconditions in order to derive new context data such
as user preferences.’ (Bittini et al. 2008, p. 15). As to the hybrid fact-based/
ontological model, ‘the aim is to combine the particular advantages of CML models
(especially the handling of ambiguous and imperfect context information) with
interoperability support and various types of reasoning provided by ontological
models. The hybrid approach is based on a mapping from CML modeling con-
structs to OWL-DL classes and relationships. It is worth noting that, because of
some expressivity limitations of OWL-DL, a complete mapping between CML and
OWL-DL cannot be obtained. With respect to interoperability issues, the advan-
tages gained by an ontological representation of the context model are clearly
recognizable. However, with respect to the derivation of new context data, expe-
riences with the proposed hybrid model showed that ontological reasoning with
OWL-DL and its SWRL extension did not bring any advantage with respect to
reasoning with the CML fact-based model. For this reason, ontological reasoning is
performed only for automatically checking the consistency of the context model,
and for semantic mapping of different context models.’ (Ibid). Furthermore, with
respect to fact-based models, the CML, which provides a graphical notation
designed to support software engineering in the analysis and formal specification of
the context requirements of context-aware applications, offers various advantages,
including: capturing ‘the heterogeneity of the context information sources, histories
(timeliness) of context information’; providing ‘an easy mapping from real-world
concepts into modeling constructs’; providing ‘a good balance between expressive
power and efficient reasoning procedures for evaluation of simple assertions about
context and for reasoning about high-level context abstractions…expressed as a
form of predicate logic’, which is ‘well suited for expressing dynamic context
abstractions’. However, CML is less expressive than OWL-DL and ‘a possible
shortcoming of CML with respect to more expressive languages is the lack of
support for hierarchical context descriptions. Moreover, even if CML supports
Fig. 5.2 Context reasoning architecture. Source Lassila and Khushraj (2005)
Hierarchical hybrid models are assumed to bring clear advantages in terms of the
set of requirements defined for a generic context model used by context-aware
applications. For example, they can provide solutions to overcome the weaknesses
associated with expressive representation and reasoning in description logics, as
discussed above. Bettini et al. (2008) contend that a larger number of the identified
requirements is likely to be satisfactorily addressed by a hierarchical hybrid context
model, if hybrid approaches can be further extended to design such a model. They
propose a model intended to provide a more comprehensive solution in terms of
expressiveness and integration of different forms of reasoning. In the proposed
model, the representation formalism used to represent data retrieved from a module
executing some sensor data fusion technique should, in order to support the
scalability requirements of AmI services, enable the execution of efficient reasoning
techniques that infer high-level context data from raw data—for example, by
executing rule-based reasoning in a restricted logic programming language. As the
authors suggest, a more expressive, ontology-based context model is desirable on
top of this representation formalism, since the latter does not support a formal
definition of the semantics of context descriptions. As illustrated in Fig. 5.3, the
corresponding framework is composed of the following layers:
Fig. 5.3 Multilayer framework. Source Adapted from Bettini et al. (2010)
• Layer 1: This sensor data fusion layer can be organized as a peer-to-peer network
of software entities and is dedicated to acquiring, processing (using techniques
for sensor data fusion and aggregation), and propagating raw context data in the
AmI space in order to support cooperation and adaptation of services (see Mamei
and Zambonelli 2004). The arrows depict the flow of context data. At this layer,
signal processing algorithms are used to process raw context data from sensor
signals.
• Layer 2: This layer involves shallow context data representation, integration
with external sources, and efficient context reasoning. In particular, it includes a
module for efficient markup-based, RDF-based, or DB-based representation and
management of context data; modules for efficient shallow reasoning (logic-
and/or statistics-based); and data integration techniques for acquiring data from
external sources and for conflict resolution. The highly dynamic and hetero-
geneous outputs of layer 1 put hard demands on this middle layer.
• Layer 3: This layer, involving a realization/abstraction process that applies
ontological representation and reasoning, aims to specify the semantics of con-
text terms, which is critical for sharing and integration; to check the consistency
of the set of concepts and relationships describing a context scenario; and to
provide an automatic procedure for classifying sets of context data (particular
sets of instances of basic context data and their relationships) as more abstract
context abstractions (a minimal sketch of this layered flow follows the list).
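The following skeletal Python sketch traces one reading through the three layers; the functions and the simple threshold rule are placeholders standing in for real signal processing (layer 1), rule-based shallow reasoning (layer 2), and ontological classification (layer 3):

```python
# Layer 1: sensor data fusion - aggregate raw readings (here, a mean).
def fuse(readings):
    return sum(readings) / len(readings)

# Layer 2: shallow representation and efficient rule-based reasoning.
def shallow_reason(fused_noise_db):
    return {"noise_level": "high" if fused_noise_db > 70 else "low"}

# Layer 3: ontological abstraction - classify shallow context data as a
# more abstract context (a stand-in for description-logic classification).
def abstract_context(shallow):
    if shallow["noise_level"] == "high":
        return "CrowdedSpace"
    return "QuietSpace"

raw = [68.0, 74.5, 71.2]                 # microphone readings in dB
print(abstract_context(shallow_reason(fuse(raw))))   # -> CrowdedSpace
```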
The process of capturing, modeling, and representing emotional and cognitive states
and behaviors is one of the most difficult computational tasks in the area of HCI.
Moreover, while ontologies allow a formal, explicit specification of some aspects of
human emotion and cognition as shared conceptualizations, they have not matured
enough to enable the modeling of the interaction between emotion and cognition as
two distinct knowledge domains, whether in the area of context-aware computing,
affective computing, or computational intelligence. Current ontologies consist of the
concepts and relationships pertaining to emotional states (or emotion types), cog-
nitive states or activities, or communicative intents. Emotion ontologies have thus
been used in diverse HCI application domains within AmI and AI, including
context-aware computing (e.g., emotional context-aware systems, socially intelli-
gent systems), affective computing (e.g., emotion-aware systems, emotionally
intelligent systems), and computational intelligence (e.g., dialog acts, conversational
systems). Further, emotional and cognitive elements of context significantly affect
interaction in everyday life; therefore, they must influence and shape the interaction
of users with computational artifacts and environments. In human interaction,
emotional and cognitive states, whether as contextual elements or communicative
messages, can be conveyed through verbal and nonverbal communication behavior
as a reliable source. Human communication is highly complex, manifold, subtle,
fluid, and dynamic, especially in relation to the interpretation and evaluation of
behaviors conveying contextual information. Likewise, the interpretation and pro-
cessing of emotional and cognitive states has proven a daunting challenge to
emulate as part of human mental information-manipulation processes, with all that
this entails in terms of internal representations and structures of knowledge. This
carries over to making appropriate decisions and thus undertaking relevant actions,
e.g., delivering adaptive and responsive services. All of this implies that capturing,
representing, and processing emotional and cognitive elements of context requires
highly sophisticated computational techniques. It is no easy task to deal with
human-factors-related context in a computerized way, and novel context models are
needed more than ever. Context awareness technology uses verbal and nonverbal
cues to detect people’s emotional and cognitive states by reading multimodal
sources, using dedicated multiple, diverse sensors and related multi-sensor data fusion
techniques, but the interpretation and processing of the multimodal context data
collected from sensorized AmI environments must be supported by powerful
modeling, representation, and reasoning techniques to offset the imperfection and
inadequacy of sensor data, so that context-aware applications can adapt their
behavior in response to the emotional and/or cognitive states of the user. A great
variety of advanced theoretical models of emotion and cognition, and myriad new
findings from recent studies, are available; yet most work on developing
context-aware and affective systems seems to be technology-driven, guided by what
is technically feasible and computationally attainable. Moreover, a large body of
work on emotional and cognitive context-aware and affective systems tends to
operationalize concepts of the related states that are rather simple compared to what
is understood as psychological states in cognitive psychology, neurocognitive
science, and the philosophy of mind, the academic disciplines specialized in the
subject matter.
However, the semantic expressiveness and reasoning power of ontologies make
the ontological approach thus far a suitable solution for modeling the emotional and
cognitive context used by context-aware applications. Indeed, in relation to emo-
tions, ontology allows flexible description of emotions at different levels of con-
ceptualization, and it is straightforward to develop conceptual ontological models
that enable such a logical division, as exemplified below. Still, the modeling,
representation, and processing of emotional context in particular is regarded as one
of the most challenging tasks in the development of context-aware applications, as
the specifics of such context in real life are highly subjective, subtle, dynamic, and
fluid. And cognitive states are too tacit, intricate, dynamic, and difficult to identify—
even for the user to externalize and translate into a form intelligible to the system—
to be modeled easily. In fact, it is more intricate to deal computationally with
cognitive context than with emotional context, as the former is in most cases of an
internal nature, whereas the latter is often manifested externally via affect display
(see Chap. 8). It is difficult to recognize the cognitive context of the user (see Kim
et al. 2007).
Overall, emotional and cognitive context systems are based on a layered
architecture whose design quality is determined by the relevance of the multiplicity
and diversity of sensors embedded in the system and spread in the environment, as
well as by the level of semantic expressiveness and the degree of automation of the
intelligent processing (interpretation and reasoning) pertaining to the recognition of
emotional and cognitive states. Building architectures that support emotional and
cognitive context awareness is far more complex than building them for other types
of context, as they involve dynamic acquisition techniques, multi-sensor data fusion
approaches, specialized recognition algorithms, and complex mapping techniques—
e.g., mapping patterns of facial expressions, gestures, and voice, second by second,
as sensor readings onto corresponding properties defined in the respective context
ontologies to create high-level context abstractions: emotional or cognitive states.
Given the complexity inherent in human emotion and cognition, representing the
concepts of the related contexts and their relationships, and reasoning over the
related information, should integrate various approaches to modeling and reasoning
in order to enhance the quality of the context inference process, i.e., the transfor-
mation of atomic contexts into contexts of a higher level.
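As a crude sketch of the mapping step described above—multimodal sensor readings mapped to ontology-style properties and then abstracted to a high-level state—consider the following; every threshold, property name, and state label is invented for illustration:

```python
# Map raw multimodal readings to ontology-style context properties.
def map_readings(readings):
    return {
        "facialExpression": "frown" if readings["brow_tension"] > 0.7
                            else "neutral",
        "voicePitchVariance": readings["pitch_var"],
    }

# Abstract the mapped properties into a high-level emotional-state context.
def classify_state(props):
    if props["facialExpression"] == "frown" and props["voicePitchVariance"] > 0.5:
        return "Stress"
    return "Neutral"

readings = {"brow_tension": 0.82, "pitch_var": 0.64}   # one second of sensing
print(classify_state(map_readings(readings)))          # -> Stress
```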
Fig. 5.4 The ambient intelligence framework. Source Zhou et al. (2007)
According to the authors, the services can help users carry out their everyday
activities, generate emotional responses that positively impact their emotions, and
train them to mediate some aspects of their emotional intelligence associated with
the perception, assessment, and management of their own emotions and those of
others. Building on Goleman’s (1995) mixed model of emotional intelligence, the
authors mention that self-awareness, self-management, social skill, and social
awareness, as emotion capabilities, are fulfilled in an emotion experience. In this
framework, it is assumed that the emotional state generated by the user is the
context according to which responsive services are delivered; the user’s context
consists of cultural background, personal knowledge, present human communica-
tion, legacy emotion positions, and so on, and also refers to the emotion situation
that produces emotions.
However, the authors give no detail of how they address the issue of the
non-universality of emotions—i.e., emotions are interpreted differently across
cultures. It is important to underscore that a framework based on common emotion
properties could work in one cultural setting and might not in another. Rather,
emotional context-aware applications should be culture-specific, designed to be
tailored to the cultural variations of users, if they are to be widely accepted. Also,
this framework does not provide information on whether the expressed emotion
(negative or positive) is appropriate for the context or situation in which it is
expressed, a criterion that is important in the case of further implementation of the
Ability EIF. This is a useful input to consider in a final model for emotion-aware
AmI. The premise is that a system cannot help users improve their emotional
intelligence abilities if it is not emotionally intelligent itself. Indeed, part of the
authors’ future work is to investigate the feasibility and applicability of mediating
human emotional intelligence by providing ambient services. Further, the contex-
tual appropriateness of emotions, whether displayed through vocal or gestural
means, is a key element in understanding emotions, which is in turn a determining
factor for providing relevant responsive services. There is much to study before it is
possible to
implement the AmE framework and develop application prototypes. Indeed, the
authors point out that there is a need for further research with regard to ‘emotion
structure in English conversation for detecting emotions and identifying emotion
motivations’ as well as ‘emotion services modeling for a pervasive emotion-aware
service provision responding to emotion motivations’. Ontological modeling of
emotional context should take into account the complexity and context dependence
of emotions, rather than performing simple emotion recognition (valence classifi-
cation), in order to create effective affective context-aware applications. Moreover,
other sources of emotional cues (e.g., psychophysiological responses) may need to
be incorporated in the development of emotional context-aware applications.
Authors have mainly described models for the communication of emotions via
speech, face, and some contextual information (Obrenovic et al. 2005).
Fig. 5.5 Relationship among modules in the domain ontology of emotional concepts. Source
Cearreta et al. (2007)
Theory module: Describes the main types of emotion theories, such as dimensional
(Lang 1979), categorical (Ekman 1984), and appraisal (Scherer 1999) theories;
under each type, emotion can be represented in a different way.
Emotional cue module: Depicts external emotional representations in terms of
different media properties. One emotional cue will be weighted more heavily than
another depending on the context the user is in. To take into account all emotional
cues and the complete emotion, each type of emotional cue corresponds to one of
the three systems proposed by Lang (1979): verbal information, behavioral (con-
ductal) information, and psychophysiological responses.
User context module: Defines the user context, which consists of different context
elements or entities: personal, social, task, environment, and spatiotemporal (Göker
and Myrhaug 2002). This takes into account the complexity of emotion’s depen-
dence on, and the influence of, context at a given moment.
Context element module: Describes the context representation in terms of dif-
ferent context elements. Various factors can affect emotion expression and iden-
tification; e.g., verbal cues relate to the user’s language. As an important contextual
aspect of emotion detection, different emotional cues can be taken into account
according to the user context; e.g., in darkness, the speech emotional cue will be
more relevant while the facial emotional cue may not be. Indeed, not all emotional
cues are available together, as context affects which cues are relevant.
Media property module: Describes basic media properties for emotional cues,
which are used for the description of emotional cues. These media properties are
context-aware; e.g., the voice intensity value differs depending on gender, as a
personal context element. A media property can be basic, such as voice intensity,
or derived, such as voice intensity variation.
Element property module: Describes properties of context elements, which are
used for the description of context elements. As with media properties, a context
element property can be basic (e.g., temperature) or derived (e.g., mean temper-
ature). An emotional context can be composed of voice intensity, temperature,
facial expression, and speech paralinguistic parameters; with the composition of
the other context elements, the user context is completed.
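Read together, these modules suggest a structure along the following lines; the Python sketch below is a loose, hypothetical transcription into plain classes, not the OWL ontology of Cearreta et al. (2007):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class MediaProperty:            # media property module
    name: str                   # e.g., 'voice_intensity'
    derived: bool = False       # basic vs derived (e.g., intensity variation)

@dataclass
class EmotionalCue:             # emotional cue module (Lang's three systems)
    system: str                 # 'verbal', 'behavioral', 'psychophysiological'
    properties: List[MediaProperty] = field(default_factory=list)

@dataclass
class ContextElement:           # context element / element property modules
    kind: str                   # 'personal', 'social', 'task', 'environment', ...
    properties: dict = field(default_factory=dict)

@dataclass
class UserContext:              # user context module
    elements: List[ContextElement] = field(default_factory=list)

    def relevant_cues(self, cues: List[EmotionalCue]) -> List[EmotionalCue]:
        # Context decides which cues are usable, e.g., no facial cue in darkness.
        env = next((e for e in self.elements if e.kind == "environment"), None)
        if env and env.properties.get("light") == "dark":
            return [c for c in cues if c.system != "behavioral"]
        return cues

cues = [EmotionalCue("verbal", [MediaProperty("voice_intensity")]),
        EmotionalCue("behavioral", [MediaProperty("facial_expression")])]
ctx = UserContext([ContextElement("environment", {"light": "dark"})])
print([c.system for c in ctx.relevant_cues(cues)])   # -> ['verbal']
```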
Research on the cognitive aspects of context is still in its infancy. While a large part
of the growing body of research on context awareness technology investigates
approaches to context information modeling and reasoning, work on cognitive
context modeling appears to be less active. Based on the literature on context
awareness, very few methods for capturing, representing, and inferring cognitive
context have been developed and applied. And the few practical attempts to
implement cognitive context are far from real-world implementation, so concrete
applications using an algorithmic approach have not been realized. Noticeably,
frameworks for developing cognitive context-aware applications are far fewer than
those for developing emotional ones. In the cognitive context system proposed by
Kim et al. (2007), an ontology is used to implement the components of a prototype
deploying inference algorithms, and a probabilistic method is used to model cog-
nitive context. Therefore, this approach may be classified as belonging to a hybrid
category.
In a study carried out by Kim et al. (2007), the authors propose context
inference and service recommendation algorithms for the Web-based information
system (IS) domain. The context inference algorithm aims to recognize the user’s
intention as a cognitive context within the Web-based IS, while the service rec-
ommendation algorithm delivers user-adaptive or personalized services based on a
similarity measurement between the user’s preferences and the deliverable services.
In addition, the authors demonstrate cognitive context awareness on the Web-based
IS by implementing a prototype deploying the two algorithms. The aim of the
proposed system is to help the IS user work with an information system conve-
niently and to enable an existing IS to deliver AmI services. However, to apply the
context inference algorithm—that is, to recognize a user’s intention, which is
regarded as a cognitive context—the sources that the user uses on the Web-based IS
must first be discerned, and then the representatives of each source category must be
extracted and classified by means of a text categorization technique. For example, a
user may browse or refer to various sources, such as Web pages, PDF documents,
and MS Word documents; these sources, used while working with the Web-based
IS, reflect the user’s intention as a cognitive context, which can be inferred by
considering the combination of the source categories synthetically. The obtained
categories—representatives of sources resulting from the text categorization pro-
cess—are matched against the predefined categories in the IS context-category
memory, which contains various IS contexts (e.g., business trip request for con-
ference attendance, business trip request for international academic exchange, book
purchasing request). The predefined categories are assigned to each of these IS
contexts. The IS context, which can be extracted from the IS structure using content
analysis, is the user’s intention, or cognitive context, to be inferred. It is determined
after the process of comparing and scoring the categories has completed. The
perception—recognition and interpretation—of a user’s cognitive context enables
the system to recommend a personalized service to the user, using the service
recommendation algorithm, which selects user-adaptive services from the list by
considering the user preferences, normally recognized in advance. The relevant
service is extracted from a deliverable service list, which is obtained using the
inferred context and the user’s input data. Given the controversy surrounding the
invisibility notion driving context awareness, it would be more appropriate, in terms
of how the system should behave, to present the context-dependent information or
service and let the user decide what to do with it. Context-sensitive information or
services are always useful to the user; the expectation in context-aware computing
is that, by priming information or services with contextual features, or by providing
the information or service that is right for the context, performance—in terms of the
speed with which answers are found in the information—should further increase.
Figure 5.6 illustrates the context inference and service recommendation framework.
Fig. 5.6 Context inference and service recommendation procedure. Source Kim et al. (2007)
Considering the emphasis of this chapter, it is worth elaborating further on the
categorization method used as part of the inference algorithm in this system. Text
content-based categorization is used as a method for categorizing documents, an
approach which is, as the authors point out, ‘based on machine learning, where the
quality of a training text influences the result of the categorization critically’. This
approach, according to Pierre (2001, cited in Kim et al. 2007), can render good
results in a way that is robust and makes few assumptions about the content to be
analyzed. As one of the areas of text mining, this method automatically sorts
text-based documents into predefined categories; e.g., a system assigns themes such
as ‘science’, ‘sports’, or ‘politics’ to categories of general interest (Kim et al. 2007).
The authors state that this approach involves machine learning to create catego-
rizers automatically, a process that ‘typically examines a set of documents that have
been pre-assigned to categories, and makes inductive abstractions based on this
data that will assist it in categorizing future documents’, assuming that the quality
of a training text has a critical impact on the categorization result. To perform text
categorization, features must be extracted and then weighted, in the following steps:
tokenizing, word stemming, and feature selection and weighting (see Kim et al.
2007 for a brief description of the steps).
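These steps correspond closely to a standard text classification pipeline. As a hedged illustration (not the authors’ implementation), the following sketch lets scikit-learn’s tokenization and TF-IDF weighting stand in for the tokenizing, stemming, and feature selection and weighting steps, with a Naive Bayes categorizer trained on an invented set of pre-assigned documents:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Tiny, invented training set: documents pre-assigned to categories.
train_docs = [
    "flight hotel conference registration paper",
    "flight hotel exchange partner university visit",
    "order isbn publisher invoice book",
]
train_labels = ["conference_trip", "academic_exchange", "book_purchase"]

# Tokenizing and feature weighting via TF-IDF; a Naive Bayes categorizer
# then makes 'inductive abstractions' from the pre-assigned documents.
categorizer = Pipeline([
    ("tfidf", TfidfVectorizer()),   # tokenize and weight features
    ("nb", MultinomialNB()),
])
categorizer.fit(train_docs, train_labels)

# Categorize a new source the user consulted on the Web-based IS.
print(categorizer.predict(["conference paper registration and hotel"]))
# -> ['conference_trip']
```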
Categorizing each reference used by the user by means of text categorization
techniques is the first step in inferring the user’s cognitive context within the
Web-based IS through the context inference algorithm. That is, the category data
are used to infer the user’s cognitive context. The IS context-category memory—
which contains the IS contexts and categories to be matched with the categories
derived from the representative extraction phase in order to infer the user’s cog-
nitive context—should be based on an ontology, given its advantages in, according
to Khedr and Karmouch (2005, cited in Kim et al. 2007), enabling the system to
perceive rich meaning by utilizing inheritance and attribute data, in addition to its
expressive power, which allows the real meaning of an item to be understood. The
context inference algorithm involves three steps: (1) after the first reference cate-
gory is determined, the IS context that includes this category in the IS
context-category memory is activated; (2) the same is done for the second reference
category; (3) if the user acknowledges this context positively, the algorithm ter-
minates and the selected context is determined by the system to be the user’s
cognitive context; otherwise, step 2 is repeated.
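A minimal, hypothetical rendering of this three-step loop could look as follows; the context-category memory entries and the confirmation callback are invented for illustration:

```python
# IS context-category memory: each IS context lists its categories.
CONTEXT_CATEGORY_MEMORY = {
    "conference_trip": {"travel", "conference", "accommodation"},
    "book_purchase": {"book", "order", "invoice"},
}

def infer_cognitive_context(reference_categories, user_confirms):
    """Activate IS contexts per reference category until the user confirms."""
    active = set(CONTEXT_CATEGORY_MEMORY)
    for category in reference_categories:          # steps 1 and 2
        active = {c for c in active
                  if category in CONTEXT_CATEGORY_MEMORY[c]}
        for candidate in active:
            if user_confirms(candidate):           # step 3
                return candidate
    return None

# Simulated user acknowledgment of the proposed context.
print(infer_cognitive_context(
    ["travel", "conference"],
    user_confirms=lambda ctx: ctx == "conference_trip"))
# -> conference_trip
```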
Research shows that a large part of the recent work in the field of context-aware
computing applies an ontology-based approach to context modeling. Various
application-specific, generic, and hybrid context ontology solutions have indeed
been proposed and adopted in a wide range of architectures for context awareness.
Context ontologies seem to provide intuitive benefits for the development of
context-aware applications and compelling features for their implementation. They
allow context to be recognized through direct semantic reasoning that make
extensive use of semantic content—descriptions and domain knowledge. Numerous
studies (e.g., Bettini et al. 2010; Strimpakou et al. 2006; Gu et al. 2005; Khedr and
Karmouch 2005; Chen et al. 2004a, b, c; Korpip et al. 2005; Korpipaa et al. 2003;
Wang et al. 2004; Strang et al. 2003) have demonstrated the benefits of using
ontology-based context models. Evaluation of a research work on context modeling
(Strang and Linnhoff-Popien 2004) shows that the usage of ontologies exhibits
prominent benefits in AmI environments. Enriching context-aware applications
with semantic knowledge representation provides robust and straightforward
techniques for describing contextual facts and interrelationships in a precise and
traceable manner (Strang et al. 2003). Moreover, context ontologies address the
need of applications to access a widely shared representation of knowledge.
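As a toy illustration of these representational benefits, the following Python sketch uses the rdflib library (an assumption, chosen purely for illustration) to encode a small context ontology whose class hierarchy lets an application retrieve a fact via inheritance; all class and property names are invented.

from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, RDFS

CTX = Namespace("http://example.org/context#")
g = Graph()

# Class hierarchy: a Meeting is a kind of SocialActivity (inheritance data).
g.add((CTX.Meeting, RDFS.subClassOf, CTX.SocialActivity))
# A contextual fact with attribute data: the current activity is a Meeting.
g.add((CTX.aliceActivity, RDF.type, CTX.Meeting))
g.add((CTX.aliceActivity, CTX.locatedIn, Literal("room 2.31")))

# Asking for social activities still finds the meeting via the subclass link,
# which is the kind of semantic reasoning context ontologies enable.
query = """SELECT ?a WHERE {
    ?cls rdfs:subClassOf ctx:SocialActivity .
    ?a rdf:type ?cls .
}"""
for row in g.query(query, initNs={"ctx": CTX, "rdf": RDF, "rdfs": RDFS}):
    print(row.a)  # -> http://example.org/context#aliceActivity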
Whereas human context involves an infinite number and a wide variety of contextual elements that dynamically interact with each other to define and shape human interaction, context as operationalized technologically consists of a very limited number of contextual entities or explicitly defined attributes. It is also operationalized from a static view, as occurring at a point in time, rather than from a dynamic view, as constantly evolving. Although context models aim to capture a rich domain of knowledge, with concepts and their relationships as close as possible to the real world, based on advanced theoretical models, they are still circumscribed by technological boundaries. Indeed, most work in developing context-aware systems seems to be driven by what is technically feasible and computationally attainable. Also, or probably as a consequence, a large body of work on context-aware systems tends to operationalize concepts of various contexts in a rather simplified way compared to how context is conceptualized in situated theory, philosophy, constructivism, communication studies, and other academic disciplines devoted to the study of context or specialized in the subject matter (see Goodwin and Duranti 1992 for a detailed account).
Technological feasibility pertaining to the machine understandability and processability of semantic context content, that is, semantic reasoning making use of context semantic descriptions and domain knowledge, has implications for how context should be conceptualized. In fact, the computational feasibility issues relating to the notion of intelligence as alluded to in AmI are associated with an inherent complexity and intrinsic intricacy pertaining to sensing all kinds of patterns in the physical world and modeling all sorts of situations and environments. In a nutshell, context models are made to fit what technology has to offer in terms of existing computational representation and reasoning capabilities, rather than technology supporting and responding to how context needs to be modeled or conceptualized: ‘In the
terms of practical philosophy…, human context includes the dimension of
practical-normative reasoning in addition to theoretical-empirical reasoning, but
machines can handle the latter only. In phenomenological terms, human context is
not only a “representational” problem (as machines can handle it) but also an
“interactional” problem, that is, an issue to be negotiated through human interac-
tion…. In semiotic terms, finally, context is a pragmatic rather than merely semantic
notion, but machines operate at a syntactic or at best approximated semantic level of
understanding’ (Ulrich 2008, p. 7). Although theoretical criteria have been proposed for defining the user context from a theoretical or holistic view, context models are nonetheless based on a simplified set of concepts and their relationships. As a consequence of the way context is operationalized, driven by the constraints of existing technologies and of engineering theory and practice, it is deemed feasible to model any domain, or even the world, on the basis of user groups, as long as the computational models enable systems to bring (a certain degree of) utility to the user. Currently, the key concern in how context should be modeled, represented, processed, managed, disseminated, and communicated seems to be to make context-aware applications useful in terms of being able to adapt to some features of the user context in AmI environments. In fact, computer and design scientists argue that the concern of models should be utility, not truth. Hence, a context model is useful insofar as it contributes to developing context-aware applications. Context touches upon the basic structure of human and social interaction.
Context-aware applications constitute a high potential area for modelers in the
psychological and social disciplines to implement and assess their models. Conversely, it is valuable to sensitize researchers in computer science, AI, and AmI to the possibilities and opportunities for incorporating more substantial knowledge from human-directed disciplines into context-aware applications.
In sum, context models should be developed through collaborative research
endeavors, an approach that requires rigorous scholarly interdisciplinary and
transdisciplinary research. This has the potential to create new interactional and
holistic knowledge necessary to better understand the multifaceted phenomenon of
context and context awareness and thereby enhance context models. This in turn
reduces the complexity of, and advances, the development of context-aware
applications in ways that allow users to exploit them to the fullest, by benefitting
from a diverse range of context-aware personalized, adaptive, and responsive ser-
vices. Therefore, when designing AmI technologies, it is important for researchers or research teams to be aware of the limitations and specificity of technological knowledge, to challenge assumptions, and to constantly enhance models so as to increase the success of the deployment and adoption of new technologies. This should start from challenging technology-driven perspectives on context models, critically reviewing operationalizations of context and their implications for how context is conceptualized, and questioning the belief in the existence of models of the user’s world and models of the user’s behavior, as well as revolutionizing the formalisms used to codify context knowledge beyond hybrid approaches. What is needed is to create innovative modeling techniques and languages that can capture and encode context models with high fidelity to real-world phenomena, comprehensiveness, dynamicity, and robustness.
References
Agostini A, Bettini C, Riboni D (2005) Loosely coupling ontological reasoning with an efficient
middleware for context awareness. In: Proceedings of the 2nd annual international conference
on mobile and ubiquitous systems. Networking and services, pp 175–182
Agostini A, Bettini C, Riboni D (2009) Hybrid reasoning in the CARE middleware for context
awareness. Int J Web Eng Technol 5(1):3–23
Arpírez JC, Gómez-Pérez A, Lozano A, Pinto HS (1998) (ONTO)2 agent: an ontology-based
WWW broker to select ontologies. In: Gómez-Pérez A, Benjamins RV (eds) ECAI’98
workshop on applications of ontologies and problem-solving methods, Brighton, pp 16–24
Baader F, Calvanese D, McGuinness DL, Nardi D, Patel-Schneider PF (2003) The description
logic handbook: theory, implementation, and applications. Cambridge University Press, New
York
Bechhofer S, van Harmelen F, Hendler J, Horrocks I, McGuinness DL, Patel-Schneider PF, Stein LA (2004) OWL web ontology language reference. W3C
Bernaras A, Laresgoiti I, Corera J (1996) Building and reusing ontologies for electrical network
applications. In: Proceedings of the 12th European conference on artificial intelligence (ECAI),
pp 298–302
Bettini C, Pareschi L, Riboni D (2008) Efficient profile aggregation and policy evaluation in a
middleware for adaptive mobile applications. Pervasive Mobile Comput 4(5):697–718
Bettini C, Brdiczka O, Henricksen K, Indulska J, Nicklas D, Ranganathan A, Riboni D (2010) A
survey of context modelling and reasoning techniques. J Pervasive Mobile Comput 6(2):161–180
(Special Issue on Context Modelling, Reasoning and Management)
Bianchi-Berthouze N, Mussio P (2005) Introduction to the special issue on “context and emotion
aware visual computing”. J Vis Lang Comput 16:383–385
Bobillo F, Delgado M, Gómez-Romero J (2008) Representation of context-dependant knowledge
in ontologies: a model and an application. Expert Syst Appl 35(4):1899–1908
Borgo S, Guarino N, Masolo C (1996) A pointless theory of space based on strong connection and
congruence. In: Proceedings of principles of knowledge representation and reasoning (KR96),
Morgan Kaufmann, Boston, MA, pp 220–229
Bouquet P, Giunchiglia F, van Harmelen F, Serafini L, Stuckenschmidt H (2004) Contextualizing
ontologies. J Web Semant 1(4):325–343
Brachman RJ, Levesque HJ (2004) Knowledge representation and reasoning. Morgan Kaufmann,
Amsterdam
Bravo J, Alaman X, Riesgo T (2006) Ubiquitous computing and ambient intelligence: new
challenges for computing. J Univers Comput Sci 12(3):233–235
Cearreta I, Miguel J, Nestor L, Garay-Vitoria N (2007) Modelling multimodal context-aware
affective interaction. Laboratory of human-computer interaction for special needs, University
of the Basque Country
Chaari T, Dejene E, Laforest F, Scuturici VM (2007) A comprehensive approach to model and use
context for adapting applications in pervasive environments. Int J Syst Softw 80(12):1973–1992
Chen H, Finin T, Joshi A (2004a) Semantic web in the context broker architecture. Proceedings of
the 2nd IEEE international conference on pervasive computing and communications (PerCom
2004). IEEE Computer Society, pp 277–286
Chen H, Finin T, Joshi A (2004b) An ontology for context-aware pervasive computing
environments. Knowl Eng Rev 18(3):197–207 (Special Issue on Ontologies for Distributed
Systems)
Chen H, Perich F, Finin TW, Joshi A (2004c) SOUPA: standard ontology for ubiquitous and
pervasive applications. In: 1st annual international conference on mobile and ubiquitous
systems, MobiQuitous. IEEE Computer Society, Boston, MA
Chen L, Nugent C (2009) Ontology-based activity recognition in intelligent pervasive
environments. Int J Web Inf Syst 5(4):410–430
Criel J, Claeys L (2008) A transdisciplinary study design on context-aware applications and
environments’, a critical view on user participation within calm computing. Observatorio
(OBS*) J 5:057–077
Crutzen CKM (2005) Intelligent ambience between heaven and hell. Inf Commun Ethics Soc 3
(4):219–232
De Moor A, De Leenheer P, Meersman R (2006) DOGMA-MESS: a meaning evolution support
system for interorganizational ontology engineering. Paper presented at the 14th international
conference on conceptual structures, Aalborg, Denmark
Dey AK (2000) Providing architectural support for building context-aware applications. PhD
thesis, College of Computing, Georgia Institute of Technology
Dey AK (2001) Understanding and using context. Pers Ubiquit Comput 5(1):4–7
Dey AK, Abowd GD, Salber D (2001) A conceptual framework and a toolkit for supporting the
rapid prototyping of context-aware applications. Hum Comput Interact 16(2–4):97–166
Ding Z, Peng Y (2004) A probabilistic extension to ontology language OWL. In: Proceedings of
the 37th annual Hawaii international conference on system sciences (HICSS’04). IEEE
Computer Society, Washington, DC
Dobson S, Ye J (2006) Using fibrations for situation identification. Proceedings of pervasive 2006
Workshops. Springer, New York
Ekman P (1982) Emotions in the human face. Cambridge University Press, Cambridge
Ekman P (1984) Expression and nature of emotion. Erlbaum, Hillsdale, New Jersey
Fensel D (2003) Ontologies: a silver bullet for knowledge management and electronic commerce.
Springer, Berlin
Forbus KD, Kleer JD (1993) Building problem solvers. MIT Press, Cambridge, MA
Fowler M, Scott K (1997) UML distilled: applying the standard object modeling language.
Addison-Wesley, Reading, MA
Gandon F, Sadeh NM (2003) A Semantic e-wallet to reconcile privacy and context awareness.
Proceedings of ISWC 2003, 2nd international semantic web conference. Springer, Berlin,
pp 385–401
Gellersen HW, Schmidt A, Beigl M (2002) Multi-sensor context-awareness in mobile devices and
smart artifacts. Mobile Netw Appl 7(5):341–351
Gennari JH, Musen MA, Fergerson RW, Grosso WE, Crubézy M, Eriksson H, Noy NF, Tu SW
(2003) The evolution of Protégé: an environment for knowledge-based systems development.
Int J Hum Comput Stud 58(1):89–123
Göker A, Myrhaug HI (2002) User context and personalisation. In: ECCBR workshop on case
based reasoning and personalisation, Aberdeen
Goleman D (1995) Emotional intelligence. Bantam Books Inc, NY
Gomez-Perez A (1998) Knowledge sharing and reuse. In: Liebowitz J (ed) The handbook of
applied expert systems. CRC Press, Boca Raton, FL
Goodwin C, Duranti A (eds) (1992) Rethinking context: language as an interactive phenomenon.
Cambridge University Press, Cambridge
Gray PD, Salber D (2001) Modelling and using sensed context information in the design of
interactive applications. In: Proceedings of engineering for human-computer interaction: 8th
IFIP international conference, vol 2254. Toronto, pp 317–335
Gruber TR (1993) A translation approach to portable ontology specifications. Knowl Acquisition
5:199–221
Gruber TR (1995) Toward principles for the design of ontologies used for knowledge sharing. Int J
Hum Comput Stud 43(5–6):907–928
Gruber T (2009) Ontology. In: Liu L, Tamer Özsu M (eds) The encyclopedia of database systems.
Springer, Heidelberg
Gu T, Pung HK, Zhang DQ (2004a) Toward an OSGi-based infrastructure for context-aware
applications. Pervasive Comput 3(4):66–74
Gu T, Wang XH, Pung HK, Zhang DQ (2004b) An ontology-based context model in intelligent
environments. In: Proceedings of communication networks and distributed systems modeling
and simulation conference, San Diego, California, pp 270–275
Gu T, Pung HK, Zhang DQ (2005) A service-oriented middleware for building context-aware
services. J Network Comput Appl 28(1):1–18
Guarino N (1995) Formal ontology, conceptual analysis and knowledge representation. Int J Hum
Comput Stud 43(5–6):625–640
Guizzardi G (2005) Ontological foundations for structural conceptual models. PhD thesis,
University of Twente, The Netherlands
Guizzardi G, Herre H, Wagner G (2002) On the general ontological foundations of conceptual
modeling. In: Proceedings of the 21st int’l conference on conceptual modeling (ER-2002), vol
2503. LNCS, Finland
Henricksen K, Indulska J (2006) Developing context-aware pervasive computing applications:
models and approach. Pervasive Mobile Comput 2(1):37–64
Henricksen K, Livingstone S, Indulska J (2004) Towards a hybrid approach to context modelling,
reasoning and interoperation. In: Indulska J, Roure DD (eds) Proceedings of the 1st
international workshop on advanced context modelling, reasoning and management, University
of Southampton, Nottingham
Hofer T, Schwinger W, Pichler M, Leonhartsberger G, Altmann J, Retschitzegger W (2003)
Context-awareness on mobile devices—the hydrogen approach. Proceedings of the 36th annual
Hawaii international conference on system sciences (HICSS ‘03), vol 9. IEEE Computer Society
Hong JI, Landay JA (2001) An infrastructure approach to context-aware computing. Hum Comput
Interact 16:287–303
Horrocks I (2002) DAML+OIL: a reason-able web ontology language. In: Advances in database
technology—8th international conference on extending database technology, vol 2287. Prague,
Czech Republic, pp 2–13, 25–27 Mar 2002
Horrocks I, Patel-Schneider PF, van Harmelen F (2003) From SHIQ and RDF to OWL: the
making of a web ontology language. J Web Semant 1(1):7–26
Horrocks I, Patel-Schneider PF, Boley H, Tabet S, Grosof B, Dean M (2004) SWRL: a semantic
web rule language combining OWL and RuleML. W3C Member Submission, W3C, viewed 23
June 2009. http://www.w3.org/Submission/2004/SUBM-SWRL-20040521/
Indulska J, Robinson R, Rakotonirainy A, Henricksen K (2003) Experiences in using CC/PP in
context-aware systems. In: Chen MS, Chrysanthis PK, Sloman M, Zaslavsky AB (eds) Mobile
data management, vol 2574. Lecture notes in computer science. Springer, Berlin
Khedr M, Karmouch A (2005) ACAI: agent-based context-aware infrastructure for spontaneous
applications. J Network Comput Appl 28(1):19–44
Khushraj D, Lassila O, Finin T (2004) sTuples: semantic tuple spaces. In: The 1st annual international
conference on mobile and ubiquitous systems: networking and services, pp 268–277
Kim S, Suh E, Yoo K (2007) A study of context inference for Web-based information systems.
Electron Commer Res Appl 6:146–158
Klyne G, Reynolds F, Woodrow C, Ohto H, Hjelm J, Butler MH, Tran L (2004) Composite
capability/preference profiles (CC/PP): structure and vocabularies 1.0. W3C Recommendation, W3C
Kogut P, Cranefield S, Hart L, Dutra M, Baclawski K, Kokar M, Smith J (2002) UML for
ontology development. Knowl Eng Rev 17(1):61–64
Korpipää P, Malm E, Salminen I, Rantakokko T (2005) Context management for end user
development of context-aware applications. In: Proceedings of the 6th international conference
on mobile data management. ACM Press, Ayia Napa, Cyprus
Korpipää P (2005) Blackboard-based software framework and tool for mobile device context
awareness. PhD thesis, University of Oulu
Korpipää P, Mäntyjärvi J, Kela J, Keränen H, Malm E (2003) Managing context information in
mobile devices. IEEE Pervasive Comput 2(3):42–51
Lang PJ (1979) A bio-informational theory of emotional imagery. Psychophysiology 16:495–512
Lassila O, Khushraj D (2005) Contextualizing applications via semantic middleware. In:
Proceedings of the 2nd annual international conference on mobile and ubiquitous systems:
networking and services, San Diego, pp 183–189
Lum WY, Lau FCM (2002) A context-aware decision engine for content adaptation. IEEE
Pervasive Comput 1(3):41–49
Maluszynski J (2005) Combining rules and ontologies: a survey. REWERSE Technical Report I3-D3
Mamei M, Zambonelli F (2004) Programming pervasive and mobile computing applications with
the TOTA middleware. In: Proceedings of the 2nd IEEE international conference on pervasive
computing and communications. IEEE Computer Society
McGuinness DL, van Harmelen F (2004) OWL web ontology language. W3C Recommendation.
http://www.w3.org/TR/owl-features/. Viewed 25 May 2012
McIntyre G, Göcke R (2007) Towards affective sensing. Proceedings of HCII, vol 3
Motik B, Patel-Schneider PF, Parsia B (2008) OWL 2 web ontology language: structural
specification and functional-style syntax. World Wide Web Consortium, Working Draft
WD-owl2-syntax-20081202
Newmann NJ (1999) Sulawesi: a wearable application integration framework. Proceedings of the
3rd international symposium on wearable computers (ISWC ’99), San Francisco
Newmann NJ, Clark AF (1999) An intelligent user interface framework for ubiquitous mobile
computing. Proceedings of CHI ‘99
Nicklas D, Grossmann M, Mínguez J, Wieland M (2008) Adding high-level reasoning to efficient
low-level context management: a hybrid approach. In: 6th annual IEEE international
conference on pervasive computing and communications, pp 447–452
6.1 Introduction
As a new paradigm in ICT, AmI is heralding new ways of interaction, which will
radically change the interaction between humans and technology. AmI could be
seen as a novel approach to HCI, entailing a shift from conventional interaction and
user interfaces towards human-centric interaction and naturalistic user interfaces,
e.g., direct communication with all sorts of everyday objects. AmI has emerged as a
result of amalgamating recent discoveries in human communication, computing,
and cognitive science towards natural HCI. AmI technology is enabled by effortless (implicit human–machine) interactions that are attuned to human senses and adaptive and proactive toward users. This entails adding adaptive HCI methods to computing systems
based on new insights into the way people aspire to interact with these systems,
meaning augmenting them with context awareness, multimodal interaction, and
intelligence. The evolving model of natural HCI tries to take the holistic nature of
the human user into account—e.g., context, behavior, emotion, intention, motiva-
tion, and so on—when creating user interfaces for and conceptualizing interaction
in relation to AmI applications and environments. Human-like interaction capabilities aim to enhance AmI systems’ understanding of users and support their intelligent behavior. Therefore, human verbal and nonverbal communication behavior has become an important research topic in the field of HCI, especially as computers are increasingly becoming an integral part of everyday and social life. Research in this
area is burgeoning within the sphere of AmI. A diverse range of related capture
technologies are under vigorous investigation in the creation of AmI applications
and environments. Utilizing human verbal and nonverbal communication behavior allows users to interact with computer systems on a human level, as in face-to-face
human interaction. The trends toward AmI are driving research into more natural
forms of human–machine interaction, moving from explicit means of input towards
more implicit forms of input that support more natural forms of communication,
such as facial expressions, eye movement, hand gestures, body postures, and speech
and its paralinguistic features. Such forms of communication are also utilized by
context-aware systems to acquire information as input for interaction and interface
control in AmI environments. Recognized as an inherent part of direct human
communication, nonverbal behavior, in particular, plays a significant role in con-
veying context. It can provide a wealth of information about the user’s emo-
tional, cognitive, and physiological states as well as actions and behaviors, a type of
contextual information that can be captured implicitly by context-aware systems, so
that they can enhance their computational understanding of interaction with users
and thereby adapt their behavior in ways that intelligently respond to users’ needs.
Indeed, it is by having a greater awareness of context that context-aware systems
can become able to provide more intelligent services, in addition to rendering
interaction with users more intuitive and effortless. However, placing greater reli-
ance on knowledge of context, reducing interactions with users (minimizing input
from them and replacing it with knowledge of context), and providing intelligent
services signify that applications become invisible. Invisibility, the guiding principle of context-aware computing, has been a subject of much debate and criticism in recent years, for it poses a special conundrum and a real dilemma. As a vision, it remains of limited practical applicability.
This chapter examines, discusses, and classifies the different features of implicit
and natural HCI pertaining to ambient and multimodal interaction and user inter-
faces, intelligent agents, intelligent behavior (personalization, adaptation, respon-
siveness, and anticipation), and mental and physical invisibility, as well as related
issues, challenges, and limitations.
HCI concerns include: the joint performance of tasks by users and computers; the
structure of communication between users and computers; human capabilities to use
computers; algorithms and programing of user interfaces; engineering issues
relating to designing and building interfaces, the process of analysis, design and
implementation of interfaces; and design trade-offs (Ibid). HCI also deals with
enhancing usability and learnability of interfaces; techniques for evaluating the
performance of interfaces; developing new interfaces and interaction techniques;
the development and practical application of design methodologies to real-world
problems; prototyping new software and hardware systems; exploring new para-
digms for interaction (e.g., natural interaction); developing models and theories;
and so forth. HCI is of a highly interdisciplinary nature for it studies humans and
computers in conjunction. It integrates a range of fields of research and academic
disciplines, including engineering science, design science, and applied science, as
well as cognitive science, communication theory, linguistics, social anthropology,
and so on. Accordingly, it is concerned with scientific methodologies and processes
for investigating and designing interaction and user interfaces.
HCI has evolved over the last four decades and has been applied in various application areas, recently including context-aware computing. The idea of interaction
has evolved from an explicit timely bidirectional interaction between the human
user and the computer system to a more implicit multidirectional interaction. In
desktop applications, commonly used graphical user interfaces (GUIs) are built on event-based interaction, a direct dialog which occurs as a sequence of communication events between the user and the system; the basic idea is to assign events to interactions performed by the user (e.g., pressing a button), which are in turn linked to actions, e.g., calls of certain functions (Schmidt 2005). In new context-aware applications, by contrast, the user and the system are in an implicit dialog in which the system is aware of the context in which it operates, using naturalistic, multimodal user interfaces that combine graphical, facial, voice, gesture, and motion interfaces. In all, designing interaction and user interfaces for context-aware systems has its distinctive challenges, manifested in the complexity of the novel forms of interaction between users and computers, which aim at making interaction rich, smooth, intuitive, and reliable. This reflects a qualitative leap, crystallized in AmI as a paradigm shift in HCI.
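A minimal sketch of such event-based, explicit interaction, using Python’s standard tkinter toolkit purely as an illustration (the widget names and handler are invented):

import tkinter as tk

def on_press():
    # The action linked to the button-press event: a call of a certain function.
    label.config(text="Button pressed")

root = tk.Tk()
label = tk.Label(root, text="Waiting for input")
label.pack()
# The event-to-action assignment: pressing the button calls on_press().
tk.Button(root, text="Press me", command=on_press).pack()
root.mainloop()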
The field of HCI studies has undergone some significant transitions. The focus of
research has shifted from tasks to actions and from laboratories to real-world set-
tings where people would want to use and experience new technologies. Academic
design studies of innovation highlight the importance of observing real people in
real life situations and encourage approaches that make user participation an
inseparable part of technology production (Kelley 2002). Studies in the field of HCI
have gone through a number of milestones, including the emphases on functionality, usability, and, more recently, emotional and aesthetic computing.
Research within HCI has long struggled to address many issues that affect the
amount of effort the user must expend to provide input for the system and to interpret
the output of the system, and how much effort it takes to learn how to perform this.
Dix et al. (1998) observe significant differences when it comes to usability and the
time needed to learn how to operate a system. Usability is a key characteristic of the
user interface; it is concerned with the ease with which a user interface can be used
by its target users to achieve defined goals. Usability is also associated with the
functionality of the computer software and the process to design it. Functionality
refers to the ability to perform a task or function, e.g., software with greater func-
tionality is one that is capable of serving a purpose well or can provide functions
which meet stated and implied needs as intended by its user. In software technology,
usability refers to the capability of the software to be understood, learned, and used by, and to be attractive to, the user under specified conditions. In this context, it describes how well
a technological artifact can be used for its intended purpose by its target users in
terms of efficiency, effectiveness, and satisfaction. ISO 9241-11 (1998) suggests
measuring usability on three levels: effectiveness (i.e., information retrieval task),
efficiency (i.e., usefulness of time taken to do tasks), and satisfaction (fulfillment of
user’s needs). Usability of technology has been extensively researched in recent
years (e.g., Nielsen 1993; Norman 1988; Hix and Hartson 1993; Somervell et al.
2003; Nielsen and Budiu 2012).
In the context of AmI, usability has gone beyond efficiency and effectiveness to
include appropriateness of context of use—context awareness—for optimal satis-
faction of user’s needs. Context-aware computing promises a rich interaction
experience and a smooth interaction between humans and technology.
Context-aware applications provide intuitive interaction as well as ambient intel-
ligent services, namely adaptive, personalized, responsive, anticipative, and im-
mersive services. The so-called naturalistic, multimodal user interfaces are aimed at
reducing interaction with users by replacing it with knowledge of context, with the goal of reducing the physical and cognitive burden of manipulating and controlling applications and of better serving users, thereby increasing usability.
Usability not only represents the degree to which the design of a particular user
interface makes the process of using the system effective, efficient, satisfying, and
context-sensitive, but also takes into account emotional, cognitive and sociocultural
factors of users. It is recognized that HCI design that can touch humans in holistic
ways is fundamental in ensuring a satisfying user interaction experience. Alongside
the standard usability and functionality concerns there is an increasing interest in
questions concerning aesthetics and pleasure. Aesthetic and emotional computing is another milestone that studies in HCI design have gone through. Particularly in the area of AmI, design ideals have been confronted by visions of ‘emotional computing’, and HCI research has identified the central position of emotions and aesthetics in designing user experiences and computer artifacts. Design aesthetics is a focus of AmI systems. The basic idea is that high-quality design aesthetics can
profoundly influence people’s core affect through evoking positive affective states
such as sensuous delight and gratification. Aesthetics is thus associated with users’ emotions. It is used to describe a sense of pleasure, although its meaning is much broader, including any sensory perception (Wasserman et al. 2000). ‘Aesthetics’ comes from the Greek word aesthesis, meaning sensory perception and understanding or sensuous knowledge. In a notable work, Udsen and Jorgensen (2005) unravel recent aesthetic approaches to HCI, noting that studies on aesthetics in HCI have taken up different notions of aesthetics (Ibid). Lavie and Tractinsky (2004) provide a review of the different approaches to studying aesthetics, including studies in HCI. It is worth mentioning that aesthetics is a contested concept in the design of artifacts. Since it is linked to emotions, it touches very much on cultural context. Visual conventions have indeed proven not to be universal because the perception of aesthetics is subjective and socioculturally situated.
However, interfaces are increasingly becoming tailored to a wide variety of users
based on various specificities. In advocating user-centrality, HCI emphasizes the
central role of users in the design of technology, through allowing them to have far
greater involvement in the design process. Widely adopted principles of user-centered design (UCD) raise the perspectives of the user and the context of use to the
center of the design process. The premise of UCD, a common approach to HCI
design, is to balance functionality, usability, and aesthetic aspects. This requires
accounting for psychological, behavioral, social, and cultural variations of users as
a condition for building successful and acceptable interactive technologies.
Therefore, new directions in HCI design research call for more interdisciplinary research endeavors to create the new interactional knowledge necessary to design innovative interactive systems in terms of social intelligence (see Chaps. 8 and 9), in order to heighten the user interaction experience. Social intelligence capabilities are necessary for AmI systems to ensure users’ acceptability and pleasurability. All in all, interactive computer systems should function properly and intelligently and be usable, useful, efficient, aesthetically pleasant, and emotionally appealing; in short, they should elicit positive emotions and pleasant user experiences. For a detailed discussion of
emotional and aesthetic computing, see Chaps. 8 and 9.
A user interface is the system by which, and the space where, users interact with
computers. Users tend to be more familiar with (and aware of) user interfaces as a component than with other external components of the whole computer system when directing and manipulating it. This is due to the fact that users interact with systems in a multimodal fashion, using visual, voice, auditory, and tactile modalities. User interfaces include hardware (physical) components (e.g., input devices and output units) and related software (logical) components for processing the received
input. Problems can arise in the algorithms and programming of user interfaces; e.g., in the case of web browsing, a related software application may become unable (or fail) to locate, retrieve, present, and traverse information resources on Web sites, including Web pages, images, video, and other files. Explicit interaction requires a dialog between the user and the
computer, and this dialog ‘brings the computer inevitably to the center of the
activity and the users focus is on the interface or on the interaction activity.’
(Schmidt 2005). This form of interaction is obviously unsuitable for AmI applications, as explicit input is insufficient for such applications to function properly; they rather require a great awareness of the user context, so that they can adapt their functionality accordingly, i.e., in ways that better match the user’s needs, to some extent at least. It is simply difficult to imagine or achieve AmI with explicit interaction only, irrespective of the modality. Moreover, explicit HCI does not take into account the nonverbal behavior of users, leading some authors to characterize computers as ‘autistic’ in nature (Alexander and Sarrafzadeh 2004). It is thus in
contrast to the visions of calm computing and UbiComp (Weiser 1991; Weiser and
Brown 1998). As Schmidt (2005, p. 162) observes, explicit interaction contradicts
the idea of AmI and disappearing interfaces, and therefore new interaction para-
digms and HCI models are required to realize the vision of an AmI environment
which can offer natural interaction.
• Touch screens are displays that accept input by the touch of fingers or a stylus.
• Zooming user interfaces are graphical interfaces in which information objects
are represented at different levels of scale and detail: the user can change the
scale of the viewed area in order to show more detail.
Common to all these user interfaces is that the user explicitly requests an action
from the computer, the action is carried out by the computer, and then the system
responds with an appropriate reply.
The main goal of AmI is to make computing technology simple to use and interact with, intuitive, ubiquitous, and accessible to people with minimal knowledge, by becoming flexible, adaptable, and able to act autonomously on their behalf wherever they are. This implies that there are five main properties for AmI or UbiComp: iHCI, context awareness, autonomy, ubiquity, and intelligence. These properties tend to overlap considerably in their concepts; e.g., iHCI involves context awareness, intelligence, and autonomy. However, there are different internal system properties that characterize iHCI, among which are the following (Poslad 2009):
• iHCI versus explicit HCI: more natural and less conscious interaction instead of explicit interaction, which involves more devices and thus results in human overload. Computer interaction with humans needs to be more hidden, as much HCI is overly intrusive. Using implicit interaction, systems anticipate use.
• Embodied reality as the opposite of virtual reality: Weiser (1991) positioned UbiComp as an opposite of virtual reality, in which computing devices are integrated in the real world—embodied in the physical and human environment—instead of putting human users in computer-generated environments. UbiComp is described as computers ‘that fit the human environment instead of forcing humans to enter theirs’ (York and Pendharkar 2004). Devices are bounded by and aware of both the physical and the virtual environment so as to optimize their operation in their physical and human environment, and thus users have access to various services.
• Concept of the calm or disappearing computer model: computing devices are too small to be visible and are embedded, while user interfaces are visible yet unnoticeable, becoming part of the peripheral senses.
It may be useful to elaborate further on these system properties in the context of
AmI. The disappearance of user interfaces into our environment and from our perception entails that the computing and networking technology (supporting these interfaces) and its logic will physically disappear; i.e., technologies will be an integral part of interactions and the peripheral senses, and the technology behind them will be invisibly embedded in the everyday lifeworld and function unobtrusively in the background.
Diverse, multiple sensors and other computing devices will be entrenched in context-aware systems and spread in context-aware environments, serving to detect or capture implicit information about the user’s various contextual elements (e.g., cognitive states, emotional states, (psycho)physiological states, social states, social dynamics, events, activities, physical environment and conditions, spatiotemporal setting, etc.), to analyze and estimate what is going on in the user’s mind, behavior, and physical, social, and cultural environments, and to execute relevant context-dependent actions.
diverse range of services (e.g., personalized, adaptive, responsive, and proactive),
which will be delivered in a real-time fashion, with the environment appearing fully
interactive and reactive. Detecting and analyzing observed information for gener-
ating intelligent behavior is enabled and supported by flexible multimodal inter-
actions, using naturalistic user interfaces. Forms of implicit input, which support natural forms of communication, allow context-aware applications and systems to capture rich contextual information, which influences and fundamentally changes such applications and systems. Contextual elements are important implicit infor-
accordingly. To approach the goal of HCI emulating natural interaction, it is crucial
to include implicit elements into the interaction (Schmidt 2005). The quest for new
forms of interaction and novel user interfaces is motivated by observing how
interaction between humans differs from HCI. As noted by Schmidt (2005, p. 159):
‘Observing humans interacting with each other and new possibilities given by
emerging technologies indicate that a new interaction model is needed’, e.g., cre-
ating naturalistic user interfaces that are capable of detecting as much information
as possible about the user’s context necessary for inferring an accurate high-level
abstraction of context, as such interfaces can employ multiple sensory modalities
and thus channels for information transmission and for interface (or system) control. The more channels are involved, the more robust the estimation of the user’s context.
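The following toy sketch illustrates why additional channels make the estimate more robust: per-channel context estimates are fused by weighted voting, so a weak or noisy channel is outvoted by the others. Channel names, confidences, and labels are invented for the example.

def fuse_context_estimates(channel_estimates):
    """channel_estimates: (context_label, confidence) pairs, one per channel;
    returns the label with the highest accumulated confidence."""
    scores = {}
    for label, confidence in channel_estimates:
        scores[label] = scores.get(label, 0.0) + confidence
    return max(scores, key=scores.get)

estimates = [
    ("in_meeting", 0.6),  # audio channel: speech detected
    ("in_meeting", 0.7),  # calendar/location channel
    ("alone", 0.5),       # vision channel: a single face found
]
print(fuse_context_estimates(estimates))  # -> in_meeting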
Such implicit input enables the system to adapt its functionality accordingly. Therefore, implicit forms of input and output, and the process of acquiring the former from and presenting the latter to the user, are associated with context-aware systems and applications. The basic idea of
implicit interaction ‘is that the system can perceive the users interaction with the
physical environment and also the overall situation in which an action takes place.
Based on the perception the system can anticipate the goals of the user to some
extent and hence it may become possible to provide better support for the task the
user is doing. The basic claim is that…iHCI allows transparent usage of computer
systems. This enables the user to concentrate on the task and allows centering the
interaction in the physical environment rather than with the computer system’
(Schmidt 2005, p. 164).
Essentially, context-aware systems and applications involve both implicit and
explicit inputs and outputs—that is, context data are acquired from invisibly
embedded sensors (or software equivalents) as well as via keyboard, touchscreen,
pointing device, and/or manual gestures. According to Schmidt’s (2005) iHCI
model, explicit user interaction with a context-aware application is a way of
extending the context of the user in addition to being embedded into the context of
the user. Context-aware services execute service logic, based on information pro-
vided explicitly by end users and implicitly by sensed context information (Dey
2001; Brown et al. 1997). As to outputs, notwithstanding the lesser use of explicit output in early context-aware systems, combining explicit and implicit forms of output is becoming increasingly prevalent as a result of revisiting the notion of intelligence and addressing the issues of ambiguity and disempowerment associated with technology invisibility, which has for quite some time guided context-aware computing. Pushing information towards, and taking actions autonomously on behalf of, the user was the commonly adopted approach in most attempts to use context awareness within AmI environments.
The model of iHCI has a wide applicability, spanning a variety of application
domains and thus offering solutions to different problem domains relating to
context-aware computing, affective computing, and conversational agents, which
all involve context awareness at varying degrees. Applications that make use of
iHCI take the user’s context into account as implicit input, and respond to the user
accordingly through implicit output. The iHCI model, as proposed by Schmidt
(2005, p. 167), is centered on the standard HCI model ‘where the user is engaged with an application by a recurrent process of input and output’; in it, ‘the user’s center of attention is the context… The interaction with the physical [social, cultural, and artificial] environment is also used to acquire implicit input. The environment of the user can be changed and influenced by the iHCI application’.
However, the type of implicit input a system can acquire when its user interacts
with the environment and its artifacts depends on the application domain, so too
does how implicit output influences and changes the environment of the user.
In all, realizing iHCI requires new interaction paradigms and novel methods for the design and development of user interfaces that make no assumptions about the available input and output devices, or about usage scenarios and potential users, in a stereotypical way.
Another question is how implicit interaction can be tied in or integrated with explicit interaction. This is based on the assumption that implicit interaction is rarely the only form of interaction, hence the significance of its integration with explicit interaction; the problem is particularly severe in proactive applications. A related question is how to resolve conflicting inputs when implicit and explicit user interaction go together. A final question mentioned by the author is how to deal with ambiguities in iHCI, given that implicit interaction is often ambiguous. Disambiguating implicit interaction is of critical
importance in context-aware applications. Most of these questions relate to the
issues posed by the idea of the invisibility of technology. To iterate, the invisibility
of technology has been a subject of debate in the field of context-aware computing.
One interesting aspect of iHCI is the application feature of natural interaction. This is a key aspect heralding a radical change to the interaction between users and technology, as computers will be ubiquitous and invisible, supporting human action, interaction, and communication in various ways, wherever and whenever needed.
Using naturalistic user interfaces, AmI can anticipate and respond intelligently to
spoken or gestured indications of desires and wishes, and these could even result in
systems or agents that are capable of engaging in intelligent dialog (Punie 2003;
Riva et al. 2005). Utilizing implicit forms of inputs that support natural human forms
of communication, such as speech, facial movements, and gestural movements,
signifies that users will be able to interact naturally with computer systems in the way
face-to-face human interaction occurs. As one of the key human-like computational
capabilities of AmI, natural interaction has evolved as a solution to realize the full
potential of, and is one of the most significant challenges in, AmI. The idea of
mimicking natural interaction is to create computers that can emulate various aspects
of human interaction, using natural modalities, namely to understand and respond to
cognitive, emotional, social, and conversational processes of humans. The underlying assumption of augmenting AmI systems with human-like interaction capabilities that consider human intention, emotion, and behavior is that doing so enhances their intelligent functionality, with the aim of improving people’s lives by providing a panoply of adaptive, responsive, proactive, immersive, and communicative services.
Context-aware systems are enabled by effortless interactions, which are attuned
to human senses and sensitive to users and their context. To approach the aim of
creating interaction between humans and systems that verge on natural interaction,
it becomes crucial to utilize natural forms of communication and therefore include
implicit elements into the interaction. In more detail, the basic idea of natural
interaction is that the system can recognize the user’s cognitive, emotional, and
psychophysiological states as well as actions, using verbal and nonverbal com-
munication signals (facial, vocal, gestural, corporal, and action cues). Based on this,
the system can select, fine-tune, or anticipate actions according to the context of the
task or to the emotional state of the user, therefore providing support for the task the
user is doing. With natural interaction capabilities, systems become able to detect, understand, and adapt in response to the user’s cognitive and emotional states. Therefore, user interfaces that support natural modalities are important for context-aware systems to be able to interact naturally with users, behave intelligently, provide services, and support cognitive and emotional needs.
A context-aware user interface assumes ‘that things necessary for daily life embed
microprocessors, and they are connected over wireless network’ and that ‘user
interfaces control environmental conditions and support user interaction in a natural
and personal way. Hence, an ambient user interface is a user interface technology
which supports natural and personalized interaction with a set of hidden intelligent
interfaces’ (Lee et al. 2009, p. 458).
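In its simplest form, this select/fine-tune/anticipate behavior can be caricatured as a condition-action mapping from a recognized user state to a system adaptation, as in the following sketch; the states and adaptations are hypothetical examples, not a prescribed set.

# Hypothetical condition-action table from recognized states to adaptations.
ADAPTATIONS = {
    ("cognitive", "overloaded"): "defer non-urgent notifications",
    ("emotional", "frustrated"): "simplify the dialog and offer help",
    ("emotional", "calm"): "proceed with the current task flow",
}

def adapt(state_type, state_value):
    # Fall back to doing nothing when the recognized state is not covered.
    return ADAPTATIONS.get((state_type, state_value), "no adaptation")

print(adapt("emotional", "frustrated"))  # -> simplify the dialog and offer help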
There is more to natural interaction than just recognizing the user’s cognitive or
emotional context as implicit input and adapting in response to it as implicit output.
In addition to supporting users in their daily tasks and activities and responding to
their emotional states, AmI systems can, thanks to the integration of affective
computing into AmI, detect users’ emotions and produce emotional responses that have a positive effect on their emotions, as well as appear sensitive and show empathy to them and even help them improve their emotional intelligence abilities (e.g., Zhou and Kallio 2005; Zhou et al. 2007; Picard et al. 2001; Picard 1997). Furthermore, AmI systems are capable of understanding and responding to speech and
gestures as commands to perform a variety of tasks as new forms of explicit inputs
(see, e.g., Kumar et al. 2007; Adjouadi et al. 2004; Sibert and Jacob 2000; Pantic
and Rothkrantz 2003; de Silva et al. 2004). Applications utilizing natural modalities
such as facial movement, eye gaze, hand gestures, and speech to execute tasks have
a great potential to reduce the cognitive and physical burden needed for users to
operate and interact with computer systems. In addition, natural interaction enables AmI systems to engage in intelligent dialog or mingle socially with human users.
This relates to ECAs which are capable of creating the sense of face-to-face con-
versation with the human user, as these systems are able to receive multimodal
input and then produce multimodal output in nearly real-time (Vilhjálmsson 2009).
ECAs are concerned with natural interaction given that when constructing believ-
able conversational systems, the rules of human multimodal (verbal and nonverbal)
communication behavior are taken into account. ECAs ‘are capable of detecting and
understanding multimodal behavior of a user, reason about it, determine what the
most appropriate multimodal response is and act on this’ (ter Maat and Heylen
2009, p. 67). They involve, in addition to explicit interaction, implicit interaction in
the sense that to engage in an intelligent dialog with a human user, the conversa-
tional system needs to be aware of various contextual elements that surround the
multimodal communicative signals being received from the human user as explicit
input. The context elements that need to be captured as implicit input include the dialog context, the environmental context, and the cultural context (Samtani et al. 2008). Therefore, conversational systems are, like context-aware systems, based on the iHCI paradigm.
The subsequent chapters explore further cognitive and emotional context-aware,
affective, social, conversational, and touchless systems as HCI applications based
on natural interaction, along with a set of relevant examples of systems that have
been developed or are being developed.
Augmenting AmI systems with natural interaction capabilities entails using user
interfaces that are ambient, perceptual, reactive, and multimodal—that is, naturalistic.
As a research area in HCI, the natural interaction paradigm aims to provide models and
methods for design and development of what has come to be known as NUIs. These
provide multiple means of interfacing with a system and several distinct tools and
devices for input and output. The most descriptive identifier of the so-called NUIs is
the lack of a physical keyboard, pointing device, and/or touchscreen. In other words,
NUIs are based on or use natural modalities of human communication, such as
speech, facial expressions, eye gaze, hand gestures, body postures, paralinguistic
features, and so on. It is worth noting that NUIs may have multi-functionality: they can be used to acquire context as implicit input, to recognize emotions, to receive commands in the form of spoken and gestured signals as new forms of explicit input, and to detect multimodal communication behavior (see the sketch following the list below). Ideally, an AmI system should be equipped with user interfaces that support all these functionalities and that can be used flexibly in response to the user’s needs. NUIs include, but are not limited to:
• Facial user interfaces are graphical user interfaces which accept input in the form of facial gestures or expressions.
• Gesture interfaces are graphical user interfaces which accept input in the form of hand or head movements.
• Voice interfaces accept input and provide output by generating voice prompts.
The user input is made by responding verbally to the interface. In this context,
verbal signals can be used by computers as commands to perform tasks.
• Motion tracking interfaces monitor the user’s body motions and translate them
into commands.
• Eye-based interface is a type of interface that is controlled completely by the
eyes. It can track the user’s eye motion or movement and translate it into a
command that a system can execute to perform such tasks as scrolling up and
down, dragging icons, opening documents, and so on.
• Conversational interface agents attempt to personify the computer interface in
the form of an animated person (human-like graphical embodiment), and present
interactions in a conversational form.
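The multi-functionality noted above can be sketched as a small dispatcher that routes recognized natural-modality events either to explicit commands or to implicit context acquisition; the event names and command vocabulary are invented for illustration.

def handle_nui_event(modality, event):
    # Recognized events that act as new forms of explicit input (commands).
    explicit_commands = {
        ("voice", "open document"): "open_document",
        ("gesture", "swipe left"): "previous_page",
        ("eye", "dwell on icon"): "select_icon",
    }
    if (modality, event) in explicit_commands:
        return ("command", explicit_commands[(modality, event)])
    # Anything else is treated as implicit input and kept as context.
    return ("context", {modality: event})

print(handle_nui_event("voice", "open document"))  # explicit command
print(handle_nui_event("face", "smile"))           # implicit context cue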
Multi-channel and multi-modal are two terms that tend to be often mixed up or used
interchangeably. However, they refer to quite distinct ideas of interaction between
humans and between humans and computers (HCI). In human–human communi-
cation, the term ‘modality’ refers to any of the various types of sensory channels.
These are: vision, hearing, touch, smell and taste. Human senses are realized by
different sensory receptors. The receptors for visual, auditory, tactile, olfactory, and
gustatory signals are found in, respectively, the eyes, ears, skin, nose, and tongue.
Communication is inherently a sensory experience, and its perception occurs as a
multimodal (and thus multi-channel) process. Multimodal interaction entails a set of
varied communication channels provided by a combination of verbal and nonverbal
behavior involving speech, facial movements, gestures, postures, and paralinguistic
features, using multiple sensory organs. Accordingly, one modality entails a set of
communication channels using one sensory channel and different relevant classes of
verbal and nonverbal signals. Basically, nonverbal communication involves more
channels than verbal communication, including space, silence, touch, and smell, in
addition to facial expressions, gestures, and body postures. Indeed, research sug-
gests that nonverbal communication channels are more powerful than verbal ones;
nonverbal cues are more important in understanding human behavior than verbal
ones—what people say. Particularly, visual and auditory modalities, taken sepa-
rately, can enable a wide range of communication channels, irrespective of the class
of verbal and nonverbal signals (see next chapter for examples of channels). These
modalities and related verbal and nonverbal communication behaviors are of high
applicability in HCI, in particular in relation to context-aware computing, affective
computing, and conversational agents.
The terms ‘mode’ and ‘modality’ usually refer to how someone interacts with an
application, which depends on the intended use of that application, how they
provide input to the application, and how output is provided back to them. In HCI, a
modality is a sense through which the human can receive the output of the computer
and a sensor or an input device through which the computer can receive the input
from the human. It is a path of communication employed by the user interface to
carry input (e.g., keyboard, touchscreen, digitizing tablet, sensors) and output (e.g.,
display unit or monitor, loudspeaker) between the human and the computer.
For user input, visual modalities typically require the eyes only, whereas auditory modalities require the ears only, and tactile modalities require the fingers only.
The combination of these modalities is what constitutes multimodal interfaces. In other words, one is dealing with a multimodal interface when one can both type and speak (e.g., using vocal signals to send commands to a computer so that it can perform a given task) and both hear and see. The benefit of multiple input modalities is increased usability, as mentioned above; for example, in the case of new forms of explicit input, a message may be quite difficult to type on a mobile phone with a small keypad (cognitively demanding) but very easy to speak. Another benefit in the context of context-aware computing, affective
computing, and conversational agents is the accurate detection of a user’s emotional
state, the robust estimation of a user’s emotions, and the disambiguation of com-
municative signals (mapping detected multimodal behavior to intended emotional
communicative functions), respectively. In all, the weakness or unavailability of
one modality is offset by the strength or availability of another.
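A minimal sketch of this offsetting behavior is a fallback scheme in which the interface selects the best currently available modality; the quality scores and preference order are illustrative assumptions.

def choose_input_modality(quality, preference=("speech", "touch", "keyboard")):
    """quality: mapping from modality to signal quality in [0, 1]."""
    for modality in preference:
        if quality.get(modality, 0.0) > 0.5:  # usable signal
            return modality
    return "keyboard"  # last-resort default

# Speech is drowned out by ambient noise, so the interface falls back to touch.
print(choose_input_modality({"speech": 0.2, "touch": 0.9}))  # -> touch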
Furthermore, while the auditory, visual, olfactory, and tactile modalities are the ones frequently used in human-to-human communication, HCI commonly uses the auditory, visual, and tactile (mostly to carry input) modalities, given the nature of the interaction, which is based on computational processes and artificial agents. There are, however, other modalities through which the computer can send information to the human user, such as the tactile modality as output (e.g., the sense of pressure) and the olfactory modality.
Based on the above reasoning, multimodal interaction in HCI, comprising
mostly visual, auditory, and tactile modalities, provides multiple modes for the user
to interface with a system, including artificial and natural modes (e.g., keyboard,
mouse, touchscreen, explicit and/or implicit human verbal and nonverbal
signals). Hence, multimodal user interfaces provide several distinct tools for input and
output of data. Depending on the application, interfaces that may be integrated in a
multimodal user interface include, but are not limited to: web user interface (WUI),
natural-language interface, touchscreen display, zooming user interface, as well as
voice interface, speech interface, facial interface, gesture interface, motion interface,
and conversation interface agent. In the context of ECA, a conversation interface
agent inherently involves many interfaces, especially those associated with natural
modalities, as they may all be needed in a face-to-face conversation. In HCI, an
ECA represents a multimodal interface that uses natural modalities of human
conversation, including speech, facial gestures, hand gestures, and body stances
(Argyle and Cook 1976). In the context of emotional context-aware applications,
multimodal user interfaces allow capturing emotional cues as context information
from different communication channels using both visual and auditory sensory
modalities and relevant classes of verbal or nonverbal signals.
A simple reflex agent (see Fig. 6.1) observes the current environment—percept—
and acts upon it, ignoring the rest of the percept history. Its function is based on the
condition-action rule: if condition then action, and it only succeeds when the
environment is fully observable. Otherwise, when operating in a partially observable
environment, infinite loops become unavoidable unless the agent can
randomize its actions.
The simple reflex agent may be used in systems that incorporate the basic concept
of iHCI, i.e., systems that use situations as implicit elements to trigger their start. In
most of these systems there is a direct connection between the situation and the action
that is executed. That is, these systems carry out a predefined action when a certain
context is recognized (an if-then rule). A simple reflex agent works only if the correct
decision can be made on the basis of the current percept (Russell and Norvig 1995).
Thus, the recognition of the situation, the interpretation, and the reaction are simple to
describe, as shown in Fig. 6.1. A common example is an automatic outdoor lantern
device. Such lights are found at the entrances and floor levels of buildings.
Whenever a person approaches the entrance and it is dark, the light switches on
automatically. A simple sensor is used to detect the situation of interest, which is
hard-coded with an action (switching on the light for a certain period of time).
Fig. 6.1 Simple reflex agent. Source Russell and Norvig (2003)
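As a minimal sketch of this architecture, the following Python fragment encodes the lantern example as a condition-action rule in the style of Russell and Norvig: the agent maps the current percept directly to an action and keeps no percept history. The percept fields and action names are illustrative assumptions.

def lantern_agent(percept: dict) -> str:
    # if condition then action: no internal state, no look-ahead
    if percept["dark"] and percept["person_approaching"]:
        return "switch_light_on"
    return "do_nothing"

print(lantern_agent({"dark": True, "person_approaching": True}))   # switch_light_on
print(lantern_agent({"dark": False, "person_approaching": True}))  # do_nothing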
Due to storing an internal model of the world, a model-based reflex agent (see
Fig. 6.2) can handle a partially observable environment. In other words, the current
state is stored inside the agent, which maintains some sort of knowledge
representation of 'how the world works', representing the part of the world that
cannot currently be seen. This knowledge structure (internal model) depends on the
percept input history and thus reflects some of the unobserved aspects of the current
state. Like the simple reflex agent, the model-based agent's function is based on the
condition-action rule: if condition then action. The model-based reflex agent
resembles the intelligent software agents that are used in context-aware applications
based on ontological or logical context models, e.g., activity-based
context-aware applications.
Fig. 6.2 Model-based reflex agent. Source Russell and Norvig (2003)
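A minimal sketch of the same idea in Python, assuming a hypothetical room-occupancy scenario: the internal model, updated from the percept history, stands in for what cannot currently be observed, and the condition-action rule is evaluated against the model rather than the raw percept.

class ModelBasedReflexAgent:
    def __init__(self):
        self.model = {"room_occupied": False}  # internal model of 'how the world works'

    def update_model(self, percept: dict) -> None:
        # Motion implies occupancy; the absence of motion in a single percept
        # does not, so the model retains what can no longer be directly seen.
        if percept.get("motion_detected"):
            self.model["room_occupied"] = True
        if percept.get("door_closed_and_silent_for_long"):
            self.model["room_occupied"] = False

    def act(self, percept: dict) -> str:
        self.update_model(percept)
        # Condition-action rule evaluated against the model, not the raw percept.
        return "keep_heating_on" if self.model["room_occupied"] else "save_energy"

agent = ModelBasedReflexAgent()
print(agent.act({"motion_detected": True}))   # keep_heating_on
print(agent.act({"motion_detected": False}))  # keep_heating_on (state remembered)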
Fig. 6.3 Model-based, goal-oriented agent. Source Russell and Norvig (2003)
Goal-Based Agent
A goal-based agent (see Fig. 6.3) uses goal information, which describes situations
that are desirable, to expand on the capabilities of the model-based agent. This
added capability allows the agent to select among the multiple available possibil-
ities the one which reaches a goal state. This stems from the fact that awareness of
the current state of the environment may not always be enough to decide an action.
Search and planning are devoted to finding action sequences that reach the agent’s
goals. This agent is characterized by greater flexibility, because the knowledge that
supports its decisions is explicitly represented and can be modified. Also, its decision
making is fundamentally different from condition-action rules in that it involves
consideration of the future. Since it additionally involves an internal model of
'how the world works', the goal-based agent may be relevant to AmI systems that
provide predictive services.
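As a minimal sketch of goal-based behavior, the following Python fragment searches a toy state space for an action sequence that reaches a goal state; the smart-home states, actions, and transitions are illustrative assumptions. Breadth-first search stands in here for the search and planning machinery mentioned above.

from collections import deque

def plan(start: str, goal: str, transitions: dict) -> list:
    """Breadth-first search for a sequence of actions reaching the goal state."""
    frontier = deque([(start, [])])
    visited = {start}
    while frontier:
        state, actions = frontier.popleft()
        if state == goal:
            return actions
        for action, nxt in transitions.get(state, {}).items():
            if nxt not in visited:
                visited.add(nxt)
                frontier.append((nxt, actions + [action]))
    return []

# Hypothetical smart-home states and the actions linking them.
transitions = {
    "user_away": {"detect_arrival": "user_home"},
    "user_home": {"preheat": "home_warm", "turn_on_lights": "home_lit"},
    "home_warm": {"turn_on_lights": "comfortable"},
    "home_lit":  {"preheat": "comfortable"},
}
print(plan("user_away", "comfortable", transitions))
# ['detect_arrival', 'preheat', 'turn_on_lights'] (one goal-reaching sequence)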
A learning agent (see Fig. 6.4) is able to initially operate in unknown environments
and becomes more knowledgeable than its initial knowledge alone might allow. It
entails three distinctive elements: the ‘learning element’, which is responsible for
making improvements, the ‘performance element’, which is responsible for
selecting external actions, and the ‘problem generator’, which is responsible for
suggesting actions that will lead to new experiences. For future improvement, the
learning element employs feedback from the ‘critic’ on how the agent is performing
and determines accordingly how the performance component should be adapted.
The learning agent corresponds to what machine learning techniques (e.g.,
unsupervised learning algorithms) entail with regard to context recognition; in
particular, the performance component represents the entire agent: it takes in
(implicit) percepts and decides on (implicit, context-dependent) actions. The
learning agent is relevant to activity-based or cognitive context-aware applications.
Fig. 6.4 General learning agent. Source Russell and Norvig (2003)
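The following minimal Python sketch maps the four elements onto a toy scenario: the performance element selects actions, the critic supplies feedback, the learning element adapts the performance element's preferences, and the problem generator occasionally suggests exploratory actions. The lighting actions, the critic, and the learning rate are illustrative assumptions.

import random

class LearningAgent:
    def __init__(self, actions):
        self.preferences = {a: 0.0 for a in actions}  # performance element's knowledge

    def performance_element(self) -> str:
        return max(self.preferences, key=self.preferences.get)

    def problem_generator(self) -> str:
        return random.choice(list(self.preferences))  # suggest a new experience

    def learning_element(self, action: str, feedback: float) -> None:
        # Adapt the performance element using the critic's feedback.
        self.preferences[action] += 0.1 * (feedback - self.preferences[action])

    def step(self, critic, explore: float = 0.2) -> str:
        action = (self.problem_generator() if random.random() < explore
                  else self.performance_element())
        self.learning_element(action, critic(action))
        return action

# Hypothetical critic: dimming the lights in the evening pleases the user.
critic = lambda a: 1.0 if a == "dim_lights" else 0.0
agent = LearningAgent(["dim_lights", "brighten_lights", "do_nothing"])
for _ in range(200):
    agent.step(critic)
print(agent.performance_element())  # dim_lights (learned, not preprogrammed)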
Utility-Based Agent
Unlike a goal-based agent which only differentiates between goal states and
non-goal states, a utility-based agent (see Fig. 6.5) can define a measure of how
desirable a particular state is compared to other states. Comparing different world
states is done, using performance measure, on the basis of ‘how happy they would
make the agent’, a situation which can be described using the term utility. In this
sense, a utility function is used to map ‘a state to a measure of the utility of the
state’, onto a real number that describes the associated degree of happiness. The
concept of ‘utility’ or ‘value’, a measure of how valuable something is to an
intelligent agent, is based on the theory of economics, and used in computing to
make decisions and plans. With the probabilities and utilities of each possible action
outcome, a rational utility-based agent selects, based on what it expects to derive,
the action that maximizes the anticipated utility of the action outcomes. Perception,
representation, reasoning, and learning are computational processes that are used by
utility-based agent to model and keep track of its environment. The computational
tools that analyze how an agent can make choices or decisions include such models
as dynamic decision networks, Markov decision processes, and game theory. Many
of the computational processes underlying the utility-based agent seem to have
much in common with supervised learning algorithms for context recognition. The
utility-based agent can thus be used in location-, activity- and emotion-based
context-aware applications.
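A minimal sketch of expected-utility maximization, with assumed outcome probabilities and an assumed utility function mapping states to real numbers (the 'degree of happiness'): the agent compares actions by the probability-weighted utility of their outcomes and picks the maximum.

def expected_utility(outcomes, utility):
    """outcomes: list of (probability, resulting_state) pairs."""
    return sum(p * utility[state] for p, state in outcomes)

def choose_action(actions, utility):
    return max(actions, key=lambda a: expected_utility(actions[a], utility))

utility = {"user_comfortable": 1.0, "user_annoyed": -1.0, "no_change": 0.0}
actions = {
    "proactively_adjust_heating": [(0.8, "user_comfortable"), (0.2, "user_annoyed")],
    "ask_user_first":             [(0.7, "user_comfortable"), (0.3, "no_change")],
    "do_nothing":                 [(1.0, "no_change")],
}
print(choose_action(actions, utility))  # ask_user_first (0.7 > 0.6 > 0.0)

In this toy setting, asking the user first wins because its worst case is merely 'no change', which echoes the hybrid, user-in-control form of service delivery discussed later in this chapter.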
adaptation whereby the system is inherently reactive because the decision making is
based on the current context with no explicit regard to the future. There is an
assumption in AmI that the software agent should be so intelligent that it can
anticipate the user’s behavior and predict the user’s intentions. AmI represents
technology that can think on its own and predict, adapt, and respond to users'
needs. As to responsiveness, the intelligent agent detects and interprets emotional
cues as multimodal behavior, reasons about them, determines what the most appropriate
response is, and acts on it. It is worth noting that service-based behaviors and
responses involve both effectors and actuators (physical actors) to act, react, and
pre-act based either on pre-programmed heuristics (using ontologies) or real-time
reasoning (using machine learning) capabilities. AmI service types are discussed
further in the next section.
Learning is a key characteristic of the behavior of intelligent agents. It serves
AmI systems to build experience on various types of contexts in a large variety of
domains, as well as their relationships, in real-world scenarios. This is used
primarily to classify or infer new contexts and predict users’ behaviors and actions.
To iterate, it is the experience of the intelligent agent that determines the behavior
of an autonomous system. Machine learning is used to augment AmI systems with
the ability to learn from the user’s context (e.g., states, behaviors) by building and
refining models, specifically in relation to supervised learning algorithms which
keep track of their earlier perceived experiences and employ them to learn the
parameters of the stochastic context models in a dynamic way. This enables AmI
interfaces (agents) to learn from users' states or behaviors in order to anticipate their
future needs, in addition to recognizing new or unknown contextual patterns.
However, the difficulty with intelligent agents is that they can become unpredict-
able. A consequence of the ability of intelligent software agents to learn, to
adapt, and to self-initiatively alter their configuration and even their program
structure is that they can react differently to the same control signals at different
points in time. The more intelligent agents learn, the less predictably they behave
(e.g., Rieder 2003).
In all, AmI systems involve various autonomous active devices which entail the
employment of a range of artificial and software intelligent agents. These include,
and are not limited to: push and pull agents (context-aware applications), world
agents, physical agents, distributed agents, multi-agents, and mobile agents, but to
name a few. World agents incorporate an amalgam of classes of agents to allow
autonomous behaviors. Physical agents perceive through sensors and acts through
actuators. Distributed agents are executed on physically distinct (networked)
computers; multi-agent systems are distributed agents that do not have the capa-
bilities to achieve a goal alone and therefore must communicate; and mobile agents
are capable to relocate their execution onto different processors (Franklin and
Graesser 1997). Indeed, an intelligent software agent could run on a user’s com-
puter but could also move around on and across various networks and while exe-
cuting its task, it can collect, store, process, and distribute data.
An AmI intelligent agent is assumed to possess human-like cognitive, emotional,
and social skills. It is claimed by AmI computer scientists that computers will have
a human-like understanding of humans and hence will affect their inner world by
undertaking actions in a knowledgeable manner that improve the quality of their
lives. Put differently, AmI seeks to mimic complex natural human processes not only
as a computational capability in its own right, but also as a feature of intelligence that can
be used to facilitate and enhance cognitive, emotional, and social intelligence
abilities of humans. However, some views are skeptical towards the concept of
AmI, considering it as questionable, inferior to human intelligence, and something
nonhuman. ‘There may possibly…be a reaction against the concept of AmI as
something nonhuman that completely envelops and surrounds people even if it is
unobtrusive or completely invisible. It will be important to convey the intention
that, in the ambient environment, intelligence is provided through interaction,
or participation and can be appreciated more as something that is
non-threatening, an assistive feature of the system or environment which
addresses the real needs and desires of the user’ (ISTAG 2003, pp. 12–13) (bold
in the original).
AmI can offer a wide variety of services to the user, namely personalized, adaptive,
responsive, and proactive services. AmI is capable of meeting needs and of
anticipating and adapting and responding intelligently to spoken or gestured indications
of desire, and this could even lead to systems that are capable of engaging in
intelligent dialog (ISTAG 2001; Punie 2003). In terms of iHCI, the range of
application areas that utilize the iHCI model is potentially huge, but given the scope of
this chapter, the emphasis is on context-aware applications in relation to ubiquitous
computing and mobile computing that provide these kinds of personalized, adaptive,
and proactive services. It is important to note that context-aware applications should
adopt a hybrid form of interactivity to provide these types of services—that is,
user-driven (visibility) and system-driven (invisibility) approaches.
6.8.1 Personalization
Having information on the specific characteristics of the user and their context
available, it becomes possible to create applications that can be tailored to the user's
needs. Personalization, sometimes also referred to as tailoring of applications, is a
common feature of both desktop and ubiquitous computing applications. It has been
widely investigated (see, e.g., Rist and Brandmeier 2002; Rossi et al. 2001;
Stiermerling et al. 1997). It entails accommodating the variations between users in
terms of habits (i.e., customs, conducts, routines, practices, traditions, conventions,
about the user, the larger becomes the privacy threat. Although considered
unethical, encroachments upon privacy continue nowadays and will continue in the AmI
era, committed by government agencies in association with the ICT industry and
marketing companies, thereby redirecting the data collected originally for the
purpose of personalized service provision to other uses deemed unjustified and
unacceptable, putting the personal data of individuals at risk. Notwithstanding the effort
to overcome privacy issues, the privacy conundrum remains unsolved. How to
‘ensure that personal data can be shared to the extent the individual wishes and no
more’ is ‘not an easy question to answer. Some safeguards can be adopted, but the
snag is that profiling and personalization…is inherent in AmI and operators and
service providers invariably and inevitably will want to ‘‘personalize’’ their offer-
ings as much as possible, and as they do, the risks to personal information will
grow’ (Wright 2005, p. 43). The ICT industry is required to address and overcome
the privacy issues that are most likely to cause many users to decline or distrust any
sort of personalized services in the medium and long term. Already, experiences
have shown numerous incidents that make personalization unwelcome (e.g., Wright
et al. 2008; Wright 2005).
However, AmI applications should allow the user, especially in non-trivial sit-
uations, to choose to accept or decline the proposed personalized services. Besides,
the control of context-aware interactions should lie in the users’ own hands and not
be dictated by developers as representatives of the ICT industry. This is most often
not the case in current context-aware applications where it is the developer who
decides how the application should behave, not the user. This issue should be
considered in future research endeavors focusing on the design and development of
context-aware applications in terms of personalized services. Designers and
developers in AmI should draw on new findings from recent social studies of new
technologies on user preferences, attitudes, and impression formation in relation to
the use of technology.
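As a minimal sketch of keeping this control in the user's hands, consider the following Python fragment, in which the application executes only trivial proposals silently and otherwise asks the user to accept or decline the proposed personalized service. The impact score, the threshold, and the console-based confirmation dialog are illustrative assumptions.

def deliver_personalized_service(proposal: str, impact: float,
                                 confirm, threshold: float = 0.3) -> bool:
    """Execute low-impact proposals directly; ask the user otherwise."""
    if impact < threshold:
        print(f"executing silently: {proposal}")
        return True
    if confirm(proposal):  # control lies in the user's own hands
        print(f"executing with consent: {proposal}")
        return True
    print(f"declined by user: {proposal}")
    return False

# A console-based confirm stands in for whatever dialog the interface offers.
ask = lambda p: input(f"Apply '{p}'? [y/n] ").strip().lower() == "y"
deliver_personalized_service("reorder groceries from usual list", 0.8, ask)
deliver_personalized_service("adjust screen brightness", 0.1, ask)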
Adaptation and responsiveness are key features of AmI. The adaptive behavior of
AmI systems in response to the user’s cognitive or emotional state is regarded as
one of the cornerstones of AmI. Another related feature of the behavior of AmI
systems is the ability to respond to human emotions as a communicative behavior.
AmI aims to provide services and control over interactive processes, and support
various cognitive, emotional, and social needs. See Chaps. 8 and 9 for examples of
adaptive and responsive applications and services and an elaborative discussion
on adaptation and responsiveness as intelligent computational capabilities. There is
much research in the field of HCI dedicated to cognitively and emotionally ambient
user interfaces and related capture technologies and pattern recognition techniques
(real-time reasoning capabilities and pre-programmed heuristics as the basis
for adaptation and responsiveness as well as anticipation; see below), in addition to
culturally dependent (Pantic and Rothkrantz 2003). Individuals differ on the basis of
their cultures and languages as to expressing and interpreting emotions. There are as
many emotional properties that are idiosyncratic as universal. There is hardly ever a
one-size-fits-all solution for the growing variety of users and interactions (Picard
2000). For more challenges and open issues involved in dealing with emotions in
the aforementioned computing domains, the reader is directed to Chaps.
7 and 8. Avoiding negative emotions and conveying, evoking, and eliciting positive
ones is critical to the success of AmI systems.
users’ intention as an internal context, which is tacit and thus difficult to learn or
capture. It is not easy even for the user to externalize and translate what is tacit into
a form intelligible to a computer system, adding to the fact that users' intentions are
based on subjective perceptions and change constantly or are subject to re-assessment.
‘Realizing implicit input reliably…appears at the current stage of research close to
impossible. Some ‘subtasks for realizing implicit input’ such as…anticipation of
user intention are not solved yet’ (Schmidt 2005, p. 164). Thus, it is more likely that
a computer system may fail in predicting what the user intends or plans to do and
thereby act outside the range of his/her expectations when taking proactive actions,
causing fear, frustration, or a sense of lost control, especially in instances where the
mismatch between the system's anticipation and the reality that was meant to be
experienced by the user is too significant. Schmidhuber (1991) intro-
duces, in relation to what is called adaptive curiosity and adaptive confidence, the
concept of curiosity for agents as a measure of the mismatch between expectations
and future experienced reality. This is a method that is used to decrease the mis-
match between anticipated states and states actually experienced in the future. His
rationale is that agents that are capable of monitoring their own curiosity explore
situations where they expect to encounter unexpected or novel experiences
and are more capable than others of dealing with complex, dynamic environments.
This can be useful to AmI systems in the sense of offering the potential to enhance
their anticipatory capabilities to provide relevant proactive services.
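A minimal sketch of this mismatch measure, under the simplifying assumption of a one-dimensional state and a simple exponentially updated predictor: curiosity is the absolute difference between the predicted and the experienced state, and updating the predictor shrinks that difference as the once-novel situation becomes expected. The scalar state and learning rate are illustrative assumptions.

class CuriousPredictor:
    def __init__(self, lr: float = 0.3):
        self.prediction = 0.0
        self.lr = lr

    def curiosity(self, observed: float) -> float:
        # Mismatch between expectation and experienced reality.
        return abs(self.prediction - observed)

    def update(self, observed: float) -> float:
        mismatch = self.curiosity(observed)
        self.prediction += self.lr * (observed - self.prediction)  # reduce mismatch
        return mismatch

model = CuriousPredictor()
for t, observed in enumerate([1.0, 1.0, 1.0, 1.0]):
    print(t, round(model.update(observed), 3))
# mismatch shrinks (1.0, 0.7, 0.49, ...) as the novel state becomes anticipated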
Regardless, the degree to which an AmI system's anticipatory behavior can be
determined by reasoning over dedicated representations or by using predictive
models is a priori decided by the designers of AmI systems. Here the idea of
invisibility comes into play with its contentious issues. The underlying assumption
is that proactive (as well as personalized, adaptive, and responsive) services should
be based on a hybrid approach to service delivery (or finding ways of how implicit
and explicit user interaction may be combined) in the non-trivial kind of AmI
applications. In this line of thinking, Schmidt (2005, p. 168) points out that the
question is how it is possible to achieve stability in the user interface without
confusing the user, for example, ‘when a device is showing different behavior
depending on the situation and the user does not understand why the system
behaves differently and in which way it might lead to confusion and frustration. It is
therefore central to build user interfaces where the proactive behavior of the system
is understandable and predictable by the user even if the details are hidden…’.
Users should be able to understand the logic applied in proactive applications,
meaning that they should know why a certain action is performed or an application
behaves the way it behaves. In addition, they must have the option to switch off the
context-aware proactive interaction or the so-called ‘intelligent’ functionality,
instead of just submitting to what the developer defines for them or passively
receiving proactive services without any form of negotiation; they should be able to
intervene in what should happen proactively when certain context conditions are
met by composing, during the design process, their own context-aware proactive logic
through defining their own rules; and they should be able to define their own meaning
for context topics, which is typically subjective and evolves over time.
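The following minimal Python sketch illustrates these requirements: the user, not the developer, composes the if-then rules over context topics, each fired rule is explainable by name, and the proactive behavior as a whole can be switched off. The context keys, rules, and actions are illustrative assumptions.

class UserRuleEngine:
    def __init__(self):
        self.rules = []      # (name, condition, action) composed by the user
        self.enabled = True  # user-controlled off-switch for the 'intelligence'

    def add_rule(self, name, condition, action):
        self.rules.append((name, condition, action))

    def on_context(self, context: dict):
        if not self.enabled:
            return
        for name, condition, action in self.rules:
            if condition(context):
                print(f"rule '{name}' fired -> {action}")  # explainable to the user

engine = UserRuleEngine()
engine.add_rule("evening arrival",
                lambda c: c["location"] == "home" and c["hour"] >= 18,
                "switch on hallway light")
engine.on_context({"location": "home", "hour": 20})  # rule fires
engine.enabled = False                                # user opts out
engine.on_context({"location": "home", "hour": 20})  # nothing happens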
6.9 Invisible, Disappearing, or Calm Computing
Disappearing or calm computing is one of the internal properties for iHCI, which is
in turn one of the main features of AmI and UbiComp. That is to say, the notion of
invisibility of technology and disappearing user interfaces is common to the visions
of AmI and UbiComp. AmI is about technology that is invisible, embedded in our
natural environments and enabled by effortless interactions. In other words, AmI
aims to create an active technology, physically and mentally invisible, seamlessly
integrated into everyday human environment. Invisibility of technology was crys-
tallized into a realist notion in the early 1990s. Weiser (1991) was the first who
focused on this characterization of computing: ‘The most profound technologies are
those that disappear. They weave themselves into the fabric of everyday life until
they are indistinguishable from it… This is not just a “user interface” problem…
Such machines cannot truly make computing an integral, invisible part of the way
people live their lives. Therefore we are trying to conceive a new way of thinking
about computers in the world, one that takes into account the natural human
environment and allows the computers themselves to vanish into the background.
Such a disappearance is a fundamental consequence not of technology, but of
human psychology…. Only when things disappear are we freed to use them without
thinking and so to focus beyond them on new goals.’ The idea that technology will
recede or vanish into the background of our lives and disappear from our con-
sciousness entails that the technology behind will invisibly be embedded and
integrated in everyday life world, and the user interface and its logics (e.g., rea-
soning processes, agent decisions) will be an integral part of interactions, a kind of a
natural extension to our daily tasks and activities.
However, technology invisibility as a phenomenon has proven to be conceptu-
ally diversified. In the literature on and in the discourse of AmI, the term ‘invisi-
bility’ has been used in multiple ways, meaning different things to different people
or based on contradictory or complementary perspectives. According to Crutzen
(2005, pp. 224–225), ‘physical invisibility or perceptual [mental] invisibility mean
that one cannot sense (smell, see, hear or touch) the AmI devices anymore; one
cannot sense their presence nor sense their full (inter)action, but only that part of
interaction output that was intended to change the environment of the individual
user.’ In point of fact, AmI working ‘in a seamless, unobtrusive and often invisible
way’ (ISTAG 2001) entails that even interaction output: adaptive, responsive, and
proactive actions of AmI systems, will be presented in the same way, that is,
without being discovered or consciously perceived by the user. However, more to
the meaning of invisibility, Schmidt (2005, pp. 173–174) conceives of it as ‘not
primarily a physical property of systems; often it is not even clearly related to the
properties of a system… It is not disputed that invisibility is a psychological
phenomenon experienced when using a system while doing a task. It is about the
human’s perception of a particular system in a certain environment.’ This notion of
conundrums that are no easy task to tackle in the pursuit of realizing the vision of
AmI. The whole idea is in fact controversial, spurring an incessant debate of a
philosophical and social nature. Adding to this is the growing criticism that ques-
tions its computational feasibility and real benefits to the user in relation to different
application domains. The fact is that most of the reasoning processes applied in
AmI applications—based on machine learning, logical, or ontological techniques or
a combination of these—involve complex inferences based on limited and imper-
fect sensor data and on oversimplified models.
The basic premise of mental invisibility in AmI is that the operation of the com-
puting devices (e.g., registering presence; monitoring and capturing behavior along
with the state change of the environment; detecting context, learning from user
experiences, reasoning, decision making, etc.) should be moved to the periphery of
our attention. The functioning of the computing devices unobtrusively in the
background is aimed at increasing the invisibility of AmI applications, which can be
accomplished by placing greater reliance on context information and reducing
interactions with or input from users, thus rendering interaction effortless, attuned
to human senses (by utilizing natural forms of communication), adaptive and
anticipatory to users, and autonomously acting. Thus, user interfaces will become
a visible, yet unnoticeable, part of peripheral senses. This is opposed to the old
computing paradigm where interaction is mostly of an explicit nature, entailing a
kind of a direct dialog between the user and the computer that brings the computer
and thus its operation inevitably to the center of the activity and the whole inter-
action to the center of the user’s attention—the user focus is on the interaction
activity. In AmI, technology will be an integral part of interactions: interactions
between artificial devices and functions of intelligent agents will take place in the
background of the life of the users to influence and change their environment. This
is enabled by context awareness, natural interaction, and autonomous intelligent
behavior as human-inspired computational capabilities. Specifically, augmented
with such computational capabilities, AmI systems can take care of the context in
which users find themselves, by retrieving contextual information (which typically
define and influence their interaction with the environment and its artifacts) and
responding intelligently to it in an autonomous way. Human behaviors and actions
as contextual information will be objects of interactions, ‘captured, recognized and
interpreted by a computer system as input’; and the system output ‘is seamlessly
integrated with the environment and the task of the user’ (Schmidt 2005, p. 64).
Invisible computing is about, quoting Donald Norman, ‘ubiquitous task-specific
computing devices’, which ‘are so highly optimized to particular tasks that they
blend into the world and require little technical knowledge on the part of their users’
(Riva et al. 2003, p. 41). Unobtrusiveness of AmI is about interaction that does not
involve a steep learning curve (ISTAG 2003). With the availability of things that
think on behalf of the user, technical knowledge required from users to make use of
computers will be lowered to the minimum, and computing devices will work in
concert to support people in coping with their tasks and performing their activities.
A myriad of intelligent agents will be made available to think on behalf of users and
exploit the rich sets of adaptive and proactive services available within AmI
environments. All in all, mental invisibility is about the perception of user interfaces
in AmI environments, which is experienced when users effortlessly and naturally
interact with user interfaces and what defines and influences this interaction and its
outcomes is done in the background of human life. It is important to underscore that
natural interaction is a salient defining factor for the perception of invisibility and
realization of mental invisibility: users will be able to interact naturally with
computer systems in the same way face-to-face human interaction takes place.
However, the logics of the computer (intelligent user interfaces) disappearing
does not necessarily mean that the computer becomes so intelligent that it can carry
out all sorts of tasks. Rather, it can be optimized only for particular types of tasks or
activities. In other words, there are some tasks that still require learning—that is,
technical knowledge or ‘minimal expertise’ to make use of computer functionality
and processing to execute these tasks given their complexity. Indeed, not all our
acting is routine acting in everyday life. Hence, the systems (user interfaces) used
for performing demanding tasks may not psychologically be perceived the way
Weiser would put it—‘weave themselves into the fabric of everyday life’—for there
is simply no natural or straightforward way of performing such tasks. Training is
therefore required to carry out the task and thus to use the system to do so, no matter how
intelligent a computer can become in terms of monitoring, capturing, learning or
interpreting, and reasoning on a user’s cognitive behavior to adapt to what the user
is doing. Yet, in this case, there are different factors that can influence the per-
ception of invisibility, which is strongly linked to the familiarity of the system used
for performing a particular task in a particular environment, which pertains to
non-routine tasks. In this context, perceptual invisibility and the degree of invisi-
bility is contingent on the extent to which people become familiar with (the use of)
the system to perform tasks. Accordingly, the computer as a tool can become a
‘natural extension’ to the task (Norman 1998). But this depends on the knowledge
of the user, the nature of the task, and the surrounding environment, as well as how
these factors interrelate. There are many variations as to the systems, the users, the
tasks, and the environments. In maintaining that invisibility is not primarily a
physical property of systems, Schmidt (2005) suggests four factors that can shape
the perception of invisibility: the user, the system, the task, and the environment,
and only the relationship between them can determine the degree of invisibility as
experience, which is again difficult to assess. To elaborate further on this, taking
this relationship into account, the way the user perceives the system and the task in
terms of whether and how they are complex depends on, in addition to the envi-
ronment being disturbing or conducive (physical, emotional, and social influences),
the experience of the user with using the system and performing the task (cognitive,
intellectual, and professional abilities). To frame it differently, the nature of the task
and the complexity of the system can be objective facts. From an objectivistic point
of view, the universe of discourse of task or system is comprised of distinct objects
with properties independent of who carries out the task. If two users do not understand
how to perform a task or use a system in the same way, it is due to lack of training,
limited knowledge, insufficient experience, unfamiliarity, and plain misunder-
standing. In a nutshell, the degree of invisibility is determined by the extent to
which either the user takes the system for granted or struggles in manipulating it,
and either he/she finds the task easy to perform or encounters difficulty in performing
it. Accordingly, the perception of invisibility can be linked to the
user's knowledge of and familiarity with using a system to perform a particular task,
and also to how new and complex this task is as perceived by each user. This notion of
invisibility is different from that which guides context-aware computing and puts
emphasis rather on the system; it is associated with facilitating or improving the
user’s performance of cognitive tasks through placing reliance on knowledge of
cognitive context (e.g., user’s intention, task goals, engaged tasks, work process,
etc.), when computationally feasible in some (demanding) tasks, and also eliciting
positive affective states through aesthetic and visual artifacts to enhance creative
cognition (see Chap. 9 for more detail). Here, the system may, depending on the
situation, disappear, form the (knowledgeable) user’s perception, and it is the
cognitive context awareness functionality that contributes to the system becoming a
‘natural extension’ to the task in this case rather than the user’s familiarity with the
system to perform the task.
In the context of everyday routine tasks, invisibility can essentially be achieved
for any tool, yet to some degree, if the user puts enough time into using it, a notion
which does not relate to the basic idea of AmI in the sense that some ICT-tools
(e.g., on- and off-switch buttons, gadgets, devices, etc.) are embedded invisibly in
the physical world. This is different, to note, from everyday objects as digitally
enhanced artifacts—augmented with microprocessors and communication capa-
bilities—with no change to the behavior with regard to their usage. Hence, the
necessity for analyzing the influence of AmI stems from how humans experience
everyday objects and tools in their environment. In our daily life, not only tech-
nologies but a lot of objects and things become mentally invisible, as we use them
without thinking in our routine acting or form a relationship with them so that they
are used subconsciously. They become part of our already unreflective acts and
interactions with the environment. They find their way invisibly into our lives,
disappearing from our perception and environment because of the effortlessness to
use them and of their evident continuous presence (which makes objects still
blended into the world, without having to be hidden or embedded invisibly in
the everyday life world). For example, a TV set is mentally invisible when we switch it
on as a matter of routine. But the moment we cannot switch it on, the TV becomes
very present in the action of trying to watch a daily favorite program. Similarly, a
computer becomes mentally invisible when we use it to do something (write or
chat) or as a matter of routine. But the moment the word-processing or commu-
nication application stops functioning, the whole computer becomes very present
and at the center of attention in the action of trying to continue writing or chatting.
We do not notice the technologies, things, or tools and their effects until they stop
functioning or act outside the range of our expectations. Nevertheless, these objects
can still be tractable in such situations due to their very physical presence, which is in
contrast to the basic ideas of AmI as to physical invisibility. As Crutzen (2005,
p. 226) argues, ‘Actions and interactions always cause changes, but not all activities
of actors are ‘present’ in interaction worlds. If changes are comparable and com-
patible with previous changes, they will be perceived as obvious and taken for
granted… [R]eady-to-hand interactions will not raise any doubts. Doubt is a nec-
essary precondition for changing the pattern of interaction itself. Heidegger gives
several examples of how doubt can appear and obvious tools will be
‘present-at-hand’ again: when a tool does not function as I expect, when the tool I am
used to is not available, and when the tool is getting in the way of reaching the
intended goal… [T]he ‘present-at-handness’…and the ‘ready-to-handness’…of a
tool are situated and they do not exclude each other. On the contrary, they offer the
option of intertwining use and design activities in interaction with the tool itself. This
intertwining makes a tool reliable, because it is always individual and situated… [T]
his can happen only through involved, embodied interaction. Intertwining of use
and design needs the presence at-hand of the ICT-representations… Their readiness-
to-hand should be doubtable. With AmI we are in danger of losing this ‘critical
transformative room’ … In our interaction with the AmI environment there is no
room for doubt between representation and interpretation of the ready-made inter-
actions with our environment. The act of doubting is a bridge between the obvious
acting and possible changes to our habitual acting. Actors and representations are
only present in an interaction if they are willing and have the potential to create doubt
and if they can create a disrupting moment in the interaction.’ There are many
routine tasks and daily activities that can be performed via ICT-tools, and they will
increase in number even more with the use of context-aware functionalities—
ICT-tools will vanish, with no physical presence. Whether performed via ICT-tools or
supported by context-aware functionalities, routine tasks can be classified as obvious
and hence mentally invisible. Dewey describes these unreflective responses and
actions as ‘fixed habits’, ‘routines’: ‘They have a fixed hold upon us, instead of our
having a free hold upon things. …Habits are reduced to routine ways of acting, or
degenerated into ways of action to which we are enslaved just in the degree in which
intelligence is disconnected from them. …Such routines put an end to the flexibility
of acting of the individual.’ (Dewey 1916). As further stated by Crutzen (2005,
p. 226), ‘Routines are repeated and established acting; frozen habits which are
executed without thinking. Routine acting with an ICT-tool means intractability; the
tool is not present anymore. The mutual interaction between the tool and the user is
lost.’ This notion of invisibility is the basic idea of AmI, where applications provide
adaptive and proactive services and carry out tasks autonomously on behalf of the
user. This is in line with the idea of ‘technologies…weave themselves into the
fabrics of everyday life’ (Weiser 1991). Here technology becomes accessible by
people to such an extent that they are not even aware of its physical presence and
thus its computational logics, engaging so many computing devices and intelligent
agents simultaneously without necessarily realizing that they are doing so. Hundreds of
computer devices 'will come to be invisible to common awareness' and users
‘will simply use them unconsciously to accomplish everyday tasks’ (Weiser 1991).
Mental invisibility connotes the integration of technology into the daily (inter)action
of humans with the environment and its artifacts, and will, as claimed by AmI, be
settled in their daily routines and activities. In sum, mental invisibility in AmI is
expected to result from equipping context-aware systems with ambient, naturalistic,
multimodal, and intelligent user interfaces and what this entails in terms of context
awareness, natural interaction, and intelligent behavior.
devices that are to be embedded in everyday objects, but also in computer systems.
The increasing miniaturization of computer technology is predicted to result in a
multitude of microprocessors and micro-sensors being integrated into user inter-
faces as part of AmI artifacts and environments and thus in the disappearance of
conventional explicit input and output media, such as keyboards, pointing devices,
touch screen, and displays (device, circuitry, and enclosure). See Chap. 4 for more
detail on the miniaturization trend in AmI.
In relation to context-aware systems, physical invisibility and seamless inte-
gration of a multitude of microelectronic devices and components (dedicated
hardware) that form ambient user interfaces without conventional input and output
media (but with visual output displays) has implications for the psychological
perception of such user interfaces and thus for mental invisibility. This relates to the
assumption that physical invisibility may lead to mental invisibility, but this is valid
only as long as the system does not react in ways it is not supposed to react or function
when it is not needed. Otherwise, as long as a tool is physically invisible, the process
of mental invisibility cannot start. Hence, the physical presence of ICT-tools or
computer systems is important in the sense that people can still control them if
something goes wrong, thereby shunning any issue of intractability. The smart
devices constituting context-aware systems cannot be controlled, for they are
too small to see and manipulate, or rather they are designed in ways not to be
accessed by users. Consequently, the assumption that physical invisibility will lead
to mental invisibility becomes, to some extent, erroneous, unless ICT-tools and
products function flawlessly or are faultlessly designed, which will never be the
case when it comes to interactive computer systems—whether in AmI or in any
vision of a next wave in computing. This is due to many reasons, among which:
failure of technologies during their instantiation is highly likely, as they
are computationally complex and technology-driven (constrained by existing
technologies); they undergo fast, insubstantial evaluation, which is often favored in
technology and HCI design to get new applications and systems quickly to the
market; and with an exponential increase in networked, embedded, always-on
devices, the probability of failure for any of them increases proportionally, adding
to the fact that technology is created in society and is thus the product of social
processes and hence of diverse social actors and factors—sociocultural situativity.
Besides, achieving a high degree of robustness and fault tolerance is what the ICT
industry covets or wishes for when it comes to ‘innovative’ technologies regardless
of the computing paradigm.
If the vision of invisible computing, as initially defined by Mark Weiser, actually
materializes (and whether it is still the way to follow completely is debatable), it will
radically change the way people perceive the digital and physical world, and much of the way they
understand and act in the social world. The AmI vision explicitly proposes to
transform society by fully technologizing it, and hence it is very likely that this will
have far-reaching, long-term implications for people’s everyday lives and human,
social, and ethical values (see Bohn et al. 2004).
behaves); dispossessing the user of the option to switch off intelligent functionalities;
partial user participation in system and application design; underestimation of the
subjectivity and situatedness of interaction and what defines and surrounds it; and
unaccountability of designers and developers include: disturbance, annoyance,
confusion, mistrust, insecurity, suspicion, and hostility, as well as marginalization
and disempowerment of users, discrimination and favoritism against users, and
power relations. The whole notion of invisibility of technology ‘is sometimes seen as
an attempt to have technology infiltrate everyday life unnoticed by the general
public in order to circumvent any possible social resistance’ (Bohn et al. 2004, p. 19).
Loss of control has implications for user acceptance of AmI technologies. It will
be very difficult for technologies to be accepted by the public, if they do not react in
ways they are supposed to react; do not function when they are needed; and do not
deliver what they promise (Beslay and Punie 2002). AmI applications need to be
predictable, reliable, and dependable. Similarly, physical invisibility may harm
acceptance because AmI systems become difficult to control (Punie 2003). This
intractability is due to the loss of mutual interaction between the technology and the
user. Perhaps, the interface as an omnipresent interlocutory space will lose its
central stage as a mediator in human-computer interactions (Criel and Claeys 2008).
As a consequence, an intelligent environment that takes decisions on user’s behalf
and what this entails in terms of reduced interaction with the user may very well
harm rather than facilitate AmI acceptance (see Punie 2005).
principle for AmI development in its social context. In fact, the idea will be par-
ticularly effective, instead of merely evoking an inspiring vision of an unproblematic
and peaceful 'computopia' in the twenty-first century. The idea that
technologies will ‘weave themselves into the fabric of everyday life until they are
indistinguishable from it’, i.e., context-aware systems ‘will come to be invisible to
common awareness’ so that ‘people will simply use them unconsciously to
accomplish everyday tasks’ and in this way ‘computers can find their way invisibly
into people’s lives’ (Weiser 1991) is just a faulty utopia associated with AmI
(Ulrich 2008, p. 5). The early vision of disappearing interfaces and invisibility of
technology as initially defined by Weiser is perhaps not the way to follow com-
pletely (Criel and Claeys 2008). Crutzen (2005, p. 225) contends, ‘The hiding of
AmI in daily aesthetic beautiful objects and in the infrastructure is like the wolf in
sheep’s clothing, pretending that this technology is harmless. Although “not seeing
this technology” could be counterproductive, it is suspicious that computing is
largely at the periphery of our attention and only in critical situations should come
to our attention. Who will decide how critical a situation is and who is then given
the power to decide to make the computing visible again’. And the physical
invisibility of AmI signifies ‘that the whole environment surrounding the individual
has the potential to function as an interface. Our body representations and the
changes the individual will make in the environment could be unconsciously the
cause of actions and interactions between the AmI devices’ (Ibid).
The vision of invisible computing has over the last decade been a subject of much
debate and criticism. The main critical voice or standpoint underlying this debate,
from within and outside the field of AmI, recognizes that users should be given the
lead in the ways that the so-called intelligent interfaces and services are designed and
implemented and that technologies should be conspicuous and controllable by
people. This involves exposing ambiguity and empowering users—that is, recon-
sidering the role of users, by making them aware of and enabling them to control what
is happening behind their backs and exposing them to the ambiguities raised by the
imperfect sensing, analysis, reasoning, and inference. Rather than focusing all the
efforts on the development of technologies for context awareness and on the design
and implementation of context-aware applications based on the guiding principle of
invisibility, research should—and it is time to—be directed towards revisiting the
notion of intelligence in context-aware computing, especially in relation to user
empowerment and visibility. Indeed, it has thus been suggested that it is time for the
AmI field to move beyond its vision of disappearing interfaces and technology
invisibility, among others, and embrace emerging trends around the notion of
intelligence as one of the core concepts of AmI. In other words, several eminent
scholars in and outside the field of AmI have advocated the proposed alternative
diversity of options to influence the behavior, use and design of the technology’
(Crutzen 2005, p. 227). All in all, the way forward is to make (some aspects of)
technology visible mentally and physically in aspects deemed necessary for enabling
users to control the behavior of computing devices and oversee their interactions with
the environment in human presence. Otherwise users may fail or find it difficult to
develop an adequate mental concept for AmI interactions and behaviors when
computing devices grow more sophisticated, gain more autonomy and authority,
function unobtrusively, and become invisible and embedded. To overcome the issues of
invisibility, new interaction paradigms and novel HCI models and methods for
design and development of user interfaces are needed. AmI requires a new turn in
HCI for interacting with small and embedded computing devices to serve people
well. AmI should not be so much about how aesthetically beautiful computing
devices or how seamlessly integrated are in AmI environments as it should be about
the way people would aspire to interact with these computing devices when they
become an integral part of their daily. The challenge to context-aware computing is to
advance the knowledge of context-aware applications that conceptualize and oper-
ationalize context based on more theoretic disciplines instead of alienating the
concept from its complex meaning to serve technical purposes. The key concern is no
longer to provide context information and context-dependent services but rather to
question the way the concept of context is defined and operationalized in the first
place. ‘Invisibility is not conducive to questioning. To make sure we are aware of
contextual assumptions and understand the ways they condition what we see, say,
and do, we have no choice but to go beyond the vision of invisibility… We probably
need to take the concept of context much more seriously than we have done so far…
I would argue that information systems research and practice, before trying to
implement context awareness technically, should invest more care in understanding
context awareness philosophically and should clarify, for each specific application,
ways to support context-conscious and context-critical thinking on the part of users.
In information systems design, context-aware computing and context-critical
thinking must somehow come together, in ways that I fear we do not understand
particularly well as yet’ (Ulrich 2008, p. 8).
The underlying assumption of complementing invisibility with visibility is to
enable users to have a certain degree of control over the behavior of intelligent agents
by having the possibility to mutually exchange representations or negotiate with
context-aware systems (intelligent agents), thereby influencing the execution of their
(ready-made) behavior. Any kind of agent-based negotiations can only succeed if
there is trust, e.g., the agents will represent the user at least as effectively as the user
would in similar circumstances (Luck et al. 2003). Otherwise technologies could
easily acquire an aspect of ‘them controlling us’ (ISTAG 2001). Furthermore, the
technology revealing what the system has to offer motivates users to relate the
possibilities of the technology to their actual needs, dreams, and wishes (Petersen
2004). Drawing on Crutzen (2005), our acting is not routine acting in its entirety, and
using an AmI system is negotiating about what actions of the system are appropriate
for the user or actor’s situation. The ready-made behavior of ICT-representations
should ‘be differentiated and changeable to enable users to make ICT-representations
ready and reliable for their own spontaneous and creative use; besides ‘translations
and replacements of ICT-representations must not fit smoothly without conflict into
the world for which they are made ready. A closed readiness is an ideal which is not
feasible, because in the interaction situation the acting itself is ad-hoc and therefore
unpredictable.’ (Ibid). Hence, a sound interface, nearby or remote, is the one that can
enable users to influence the decisions and actions of context-aware applications and
environments. It is important to keep in mind that people are active shapers of their
environments, not passive consumers of what technology has to offer as services in
their environments. Intelligence should, as José et al. (2010, p. 1487) state, ‘emerge
from the way in which people empowered with AmI technologies will be able to act
more effectively in their environment. The intelligence of the system would not be
measured by the ability to understand what is happening, but by the ability to achieve
a rich coupling with users who interpret, respond to, and trigger new behavior in the
system. This view must also accommodate the idea that intelligence already exists in
the way people organize their practices and their environments’. This entails that
human environments such as living places, workplaces, and social places, already
represent human intelligence with its subjectivity and situatedness at play. People
should be empowered into the process of improvised situatedness that characterizes
everyday life (Dourish 2001).
Reaching the current stage of research within implicit and natural HCI and achieving
the current state-of-the-art applications has been made possible by the
amalgamation of the breakthroughs at the level of the enabling technologies and
processes of AmI and new discoveries in cognitive science, AI, cognitive neuro-
science, communication engineering, human communication, and social sciences,
which, combined, make it possible to acquire a better understanding of the cognitive,
emotional, behavioral, and social aspects and processes underlying human-to-human
communication and how this complex and manifold process can be implemented
into computer systems. In this regard, it is important to underscore that interdisci-
plinary research endeavors have been of great influence on the advent of the new
HCI paradigm, which has made it possible to build ground-breaking or novel
interactive systems. Human communication and thus HCI entail many areas that
need to be meshed together through interdisciplinary research to create interactional
knowledge necessary to understand the phenomenon of AmI as a novel approach to
HCI. HCI within the area of AmI is too complex to be addressed by single disciplines
and also exceeds the highly interdisciplinary field—in some of its core concepts such
as context, interaction, and actions. It is suggested that interdisciplinary efforts
remain inadequate in impact on theoretical development for coping with the
changing human conditions (see Rosenfield 1992). Hence, a transdisciplinary
approach remains more pertinent for investigating HCI in relation to AmI as a
complex problem, since this approach insists on the fusion of different elements of a set
of theories with a result that exceeds the simple sum of each. Thus, any future
research agenda for HCI in AmI should draw on several theories, such as context in
theoretic disciplines, situated cognition, situated action, social interaction, social
behavior, verbal and nonverbal communication behavior, and so on. Understanding
the tenets of several pertinent theories allows a more complete understanding
of implicit and natural HCI. Among the most holistic, these theories are drawn
mainly from cognitive science, social science, humanities, human communication,
philosophy, constructivism and constructionism, and so on.
References
Balkenius C, Hulth N (1999) Attention as selection-for-action: a scheme for active perception. In:
Schweitzer G, Burgard W, Nehmzow U, Vestli SJ (eds) Proceedings of EUROBOT ‘99, IEEE
Press, pp 113–119
Barkhuus L, Dey AK (2003) Is context-aware computing taking control away from the user? Three
levels of interactivity examined. Proceedings of UbiComp. Springer, Heidelberg, pp 149–156
Beslay L, Punie Y (2002) The virtual residence: identity, privacy and security. The IPTS Report
67:17–23 (Special Issue on Identity and Privacy)
Bettini C, Brdiczka O, Henricksen K, Indulska J, Nicklas D, Ranganathan A, Riboni D (2010) A
survey of context modelling and reasoning techniques. J Pervasive Mobile Comput 6(2):161–180
(Special Issue on Context Modelling, Reasoning and Management)
Bibri SE (2012) A critical reading of the scholarly and ICT industry’s construction of ambient
intelligence for societal transformation of Europe. Master thesis, Malmö University
Bohn J, Coroama V, Langheinrich M, Mattern F, Rohs M (2004) Living in a world of smart
everyday objects—social, economic, and ethical implications. J Hum Ecol Risk Assess 10(5):763–786
Brown PJ, Jones GJF (2001) Context-aware retrieval: exploring a new environment for
information retrieval and information filtering. Pers Ubiquit Comput 5(4):253–263
Brown PJ, Bovey JD, Chen X (1997) Context-aware applications: from the laboratory to the
marketplace. IEEE Pers Commun 4(5):58–64
Cassell J, Sullivan J, Prevost S, Churchill E (eds) (2000) Embodied conversational agents. MIT
Press, Cambridge
Chen G, Kotz D (2000) A survey of context-aware mobile computing research. Technical Report
TR2000-381, Department of Computer Science, Dartmouth College
Cheverst K, Mitchell K, Davies N (2001) Investigating context-aware information push vs.
information pull to tourists. In: Proceedings of mobile HCI 01
Criel J, Claeys L (2008) A transdisciplinary study design on context-aware applications and
environments, a critical view on user participation within calm computing. Observatorio
(OBS*) J 5:057–077
Crutzen CKM (2005) Intelligent ambience between heaven and hell. Inf Commun Ethics Soc
3(4):219–232
de Silva GC, Lyons MJ, Tetsutani N (2004) Vision based acquisition of mouth actions for
human-computer interaction. In: Proceedings of the 8th pacific rim international conference on
artificial intelligence, Auckland, pp 959–960
Dewey J (1916) Democracy and education. The Macmillan Company, used edition: ILT Digital
Classics 1994. http://www.ilt.columbia.edu/publications/dewey.html. Viewed 25 June 2005
Dey AK (2001) Understanding and using context. Pers Ubiquit Comput 5(1):4–7
Dix A, Finlay J, Abowd G, Beale R (1998) Human computer interaction. Prentice Hall Europe,
Englewood Cliffs, NJ
Dourish P (2001) Where the action is. MIT Press
Dreyfus H (2001) On the internet. Routledge, London
Erickson T (2002) Ask not for whom the cell phone tolls: some problems with the notion of
context-aware computing. Commun ACM 45(2):102–104
Franklin S, Graesser A (1997) Is it an agent, or just a program?: a taxonomy for autonomous
agents. In: Proceedings of the 3rd international workshop on agent theories, architectures, and
languages. Springer, London
Gill SK, Cormican K (2005) Support ambient intelligence solutions for small to medium size
enterprises: Typologies and taxonomies for developers. In: Proceedings of the 12th
international conference on concurrent enterprising, Milan, Italy, 26–28 June 2005
Hayes PJ, Reddy RD (1983) Steps toward graceful interaction in spoken and written man-machine
communication. Int J Man Mach Stud 19:231–284
Hix D, Hartson HR (1993) Developing user interfaces: ensuring usability through product and
process. Wiley, London
ISO 9241-11 (1998) Ergonomic requirements for office work with visual display terminals
(VDTs), part 11: guidance on usability. International Organization for Standardization,
Geneva, Switzerland
ISTAG 2001 (2001) Scenarios for ambient intelligence in 2010. ftp://ftp.cordis.lu/pub/ist/docs/
istagscenarios2010.pdf. Viewed 22 Oct 2009
ISTAG 2003 (2003) Ambient intelligence: from vision to reality (For participation—in society and
business). http://www.ideo.co.uk/DTI/CatalIST/istag-ist2003_draft_consolidated_report.pdf.
Viewed 23 Oct 2009
José R, Rodrigues H, Otero N (2010) Ambient intelligence: beyond the inspiring vision. J Univ
Comput Sci 16(12):1480–1499
Karpinski M (2009) From speech and gestures to dialogue acts. In: Esposito A, Hussain A,
Marinaro M, Martone R (eds) Multimodal signals: cognitive and algorithmic issues. Springer,
Berlin, pp 164–169
Kasabov N (1998) Introduction: hybrid intelligent adaptive systems. Int J Intell Syst 6:453–454
Kelley T (2002) The art of innovation: lessons in creativity from IDEO, America’s leading design
firm. Harper Collins Business, London
Kim S, Suh E, Yoo K (2007) A study of context inference for web-based information systems.
Electron Commer Res Appl 6:146–158
Kumar M, Paepcke A, Winograd T (2007) EyePoint: practical pointing and selection using gaze
and keyboard. In: Proceedings of the CHI: conference on human factors in computing systems,
San Jose, CA, pp 421–430
Lavie T, Tractinsky N (2004) Assessing dimensions of perceived visual aesthetics of web sites.
Int J Hum Comput Stud 60(3):269–298
Lee Y, Shin C, Woo W (2009) Context-aware cognitive agent architecture for ambient user
interfaces. In: Jacko JA (ed) Hum Comput Interact. Springer, Berlin, pp 456–463
Lenat DB, Guha RV (1994) Enabling agents to work together. Commun ACM 37(7):127–142
Lenat DB, Guha RV, Pittman K, Pratt D, Shepherd M (1990) Cyc: toward programs with
commonsense. Commun ACM 33(8):30–49
Lieberman H, Selker T (2000) Out of context: computer systems that adapt to, and learn from,
context. IBM Syst J 39:617–632
Lindblom J, Ziemke T (2002) Social situatedness: Vygotsky and beyond. In: 2nd international
workshop on epigenetic robotics: modeling cognitive development in robotic systems,
Edinburgh, pp 71–78
Luck M, McBurney P, Preist C (2003) Agent technology: enabling next generation computing.
A roadmap for agent-based computing. Agentlink EU FP5 NoE
Lucky R (1999) Connections. Bi-monthly column, IEEE Spectrum
Lueg C (2002) Operationalizing context in context-aware artifacts: benefits and pitfalls. Hum
Technol Interface 5(2):1–5
Luger G, Stubblefield W (2004) Artificial intelligence: structures and strategies for complex
problem solving. The Benjamin/Cummings Publishing Company Inc
Nielsen J (1993) Usability engineering. Academic Press, Boston
Nielsen J, Budiu R (2012) Mobile usability. New Riders Press
Norman DA (1988) The design of everyday things. Doubleday, New York
Norman DA (1998) The invisible computer. MIT Press, Cambridge, MA
Pantic M, Rothkrantz LJM (2003) Toward an affect sensitive multimodal human-computer
interaction. Proc IEEE 91(9):1370–1390
Petersen MG (2004) Remarkable computing—the challenge of designing for the home. In: CHI
2004, Vienna, Austria, pp 1445–1448
Pfeifer R, Scheier C (1999) Understanding intelligence. MIT Press, Cambridge
Picard RW (1997) Affective computing. MIT Press, Cambridge
Picard RW (2000) Perceptual user interfaces: affective perception. Commun ACM 43(3):50–51
Picard RW, Vyzas E, Healey J (2001) Toward machine emotional intelligence: analysis of
affective physiological state. IEEE Trans Pattern Anal Mach Intell 23(10):1175–1191
Poslad S (2009) Ubiquitous computing: smart devices, environments and interaction. Wiley,
London
Punie Y (2003) A social and technological view of ambient intelligence in everyday life: what
bends the trend? In: The European media and technology in everyday life network, 2000–2003,
Institute for Prospective Technological Studies Directorate General Joint Research Center
European Commission
Punie Y (2005) The future of ambient intelligence in Europe: the need for more everyday life. In:
Media technology and everyday life in Europe: from information to communication. Roger
Silverstone Edition, Ashgate, pp. 141–165
Rieder B (2003) Agent technology and the delegation-paradigm in a networked society. Paper for
the EMTEL conference, 23–26 April, London
Rist T, Brandmeier P (2002) Customizing graphics for tiny displays of mobile devices. Pers
Ubiquit Comput 6(4):260–268
Riva G, Loreti P, Lunghi M, Vatalaro F, Davide F (2003) Presence 2010: the emergence of
ambient intelligence. In: Riva G, Davide F, IJsselsteijn WA (eds) Being there: concepts, effects
and measurement of user presence in synthetic environments. IOS Press, Amsterdam, pp 60–81
Riva G, Vatalaro F, Davide F, Alcañiz M (2005) Ambient intelligence: the evolution of
technology, communication and cognition towards the future of human-computer interaction.
IOS Press, Amsterdam
Rosen R (1985) Anticipatory systems: philosophical, mathematical and methodological founda-
tions. Pergamon Press, Oxford
Rosenfield PL (1992) The potential of transdisciplinary research for sustaining and extending
linkages between the health and social science. Soc Sci Med 35(11):1343–1357
Rossi D, Schwabe G, Guimares R (2001) Designing personalized web applications. In:
Proceedings of the tenth international conference on World Wide Web, pp 275–284
Russell S, Norvig P (1995) Artificial intelligence: a modern approach. Prentice-Hall Inc,
Englewood Cliffs, NJ
Russell S, Norvig P (2003) Artificial intelligence—a modern approach. Pearson Education, Upper
Saddle River, New Jersey
Samtani P, Valente A, Johnson WL (2008) Applying the SAIBA framework to the tactical
language and culture training system. In: Parkes P, Parsons M (eds) The 7th international
conference on autonomous agents and multiagent systems (AAMAS 2008). Estoril, Portugal
Schilit B, Adams N, Want R (1994) Context-aware computing applications. In: Proceedings of
IEEE workshop on mobile computing systems and applications, Santa Cruz, CA, pp 85–90
Schmidhuber J (1991) Adaptive confidence and adaptive curiosity. Technische Universität
München, Institut für Informatik
Schmidt A (2005) Interactive context-aware systems interacting with ambient intelligence. In:
Riva G, Vatalaro F, Davide F, Alcañiz M (eds) Ambient intelligence: the evolution of
technology, communication and cognition towards the future of human-computer interaction.
IOS Press, Amsterdam, pp 159–178
Schütz A, Luckmann T (1974) The structures of the life-world. Heinemann, London
Sibert LE, Jacob RJK (2000) Evaluation of eye gaze interaction. In: Proceedings of the ACM
conference on human factors in computing systems. The Hague, pp 281–288
Smith R, Conrey FR (2007) Agent-based modeling: a new approach for theory building in social
psychology. Person Soc Psychol Rev 11:87–104
Somervell J, Wahid S, McCrickard DS (2003) Usability heuristics for large screen information
exhibits. In: Rauterberg M, Menozzi M, Wesson J (eds) INTERACT 2003, Zurich, pp 904–907
Stiermerling O, Kahler H, Wulf V (1997) How to make software softer—designing tailorable
applications. In: Symposium on designing interactive systems, pp 365–376
Suchman L (1987) Plans and situated actions: the problem of human-machine communication.
Cambridge University Press, Cambridge
Suchman L (2005) Introduction to plans and situated actions II: human-machine reconfigurations,
2nd edn. Cambridge University Press, New York/Cambridge
Sutton RS, Barto AG (1998) Reinforcement learning: an introduction. MIT Press, Cambridge, MA
ter Maat M, Heylen D (2009) Using context to disambiguate communicative signals. In:
Esposito A, Hussain A, Marinaro M, Martone R (eds) Multimodal signals. Springer, Berlin,
pp 164–169
Udsen LE, Jorgensen AH (2005) The aesthetic turn: unraveling recent aesthetic approaches to
human-computer interaction. Digital Creativity 16(4):205–216
Ulrich W (2008) Information, context, and critique: context awareness of the third kind. In: The
31st information systems research seminar in Scandinavia, Keynote talk presented to IRIS 31
Vilhjálmsson HH (2009) Representing communicative function and behavior in multimodal
communication. In: Esposito A, Hussain A, Marinaro M, Martone R (eds) Multimodal signals:
cognitive and algorithmic issues. Springer, Berlin, pp 47–59
Wasserman V, Rafaeli A, Kluger AN (2000) Aesthetic symbols as emotional cues. In: Fineman S
(ed) Emotion in organizations. Sage, London, pp 140–167
Weiser M (1991) The computer for the 21st century. Sci Am 265(3):94–104
Weiser M, Brown JS (1998) The coming age of calm technology. In: Denning PJ, Metcalfe RM
(eds) Beyond calculation: the next fifty years of computing. Springer, New York, pp 75–85
Wooldridge M (2002) An introduction to multiagent systems. Wiley, London
Wooldridge M, Jennings NR (1995) Intelligent agents: theory and practice. Knowl Eng Rev 10(2):115–152
Wright D (2005) The dark side of ambient intelligence. Foresight 7(6):33–51
Wright D, Gutwirth S, Friedewald M, Punie Y, Vildjiounaite E (2008) Safeguards in a world of
ambient intelligence. Springer, Dordrecht
York J, Pendharkar PC (2004) Human-computer interaction issues for mobile computing in a
variable work context. Int J Hum Comput Stud 60:771–797
Zhou J, Kallio P (2005) Ambient emotion intelligence: from business awareness to emotion
awareness. In: Proceeding of 17th international conference on systems research, informatics
and cybernetics, Baden
Zhou J, Yu C, Riekki J, Kärkkäinen E (2007) AmE framework: a model for emotion-aware
ambient intelligence. University of Oulu, Department of Electrical and Information Engineering,
Faculty of Humanities, Department of English VTT Technical Research Center of Finland
Part II
Human-Inspired AmI Applications
Chapter 7
Towards AmI Systems Capable
of Engaging in ‘Intelligent Dialog’
and ‘Mingling Socially with Humans’
7.1 Introduction
The origins of ECAs (and SDSs) can be traced back to AI research in the 1950s
concerned with developing conversational interfaces. The research is commonly
considered a branch of HCI. However, it is only during the last decade, with major
advances in speech and natural interaction technology, that large-scale working
conversational systems have been developed and applied, where the incorporation
of components remains a key issue. As a research area in AI, ECAs attempt to
personify the computer interface in the form of an animated person (human-like
graphical embodiment) or robot (human-like physical embodiment), and present
interactions in a conversational form. Given the fundamental paradigm of AmI,
namely that interfaces disappear from the user’s consciousness and recede into the
background, the model of human-like graphical embodiment is of more relevance
in the context of AmI. A face-to-face conversation involving humans and virtual
beings is considered the highest form of intelligent behavior an AmI system can exhibit.
In this sense, AI relates to AmI in that the latter entails artificial systems that
possess human-inspired intelligence in terms of the processes and behaviors
associated with conversational acts—computational intelligence.
More recent research within ECAs has started to focus on context (dialog,
situation, environment, and culture) to disambiguate communicative signals and
generate multimodal communicative behavior. This research endeavor constitutes
one of the critical steps towards the aim of creating interaction between humans
and systems that verges on natural interaction. Conversational
systems are built on theoretical models of linguistics and its subfields as well as
nonverbal communication behavior, coupled with context awareness, natural
interaction, and autonomous intelligent behavior as computational capabilities
exhibited by agents. Within AI research in AmI, many theoretical perspectives of
human communication are being investigated, and new computational modeling
and simulation techniques are being developed to create believable human repre-
sentatives. Recent discoveries in human communication and neurocognitive
science make it possible to acquire a better understanding of a variety of aspects of
human functioning in terms of interaction (linguistic, pragmatic, psycholinguistic,
neurolinguistic, sociolinguistic, cognitive-linguistic, and paralinguistic aspects);
combined with the breakthroughs at the level of the enabling technologies, these
discoveries make it increasingly possible to build advanced conversational systems
based on this understanding.
This chapter addresses computational intelligence in terms of conversational and
dialog systems and computational processes and methods to support complex
communicative tasks. It aims to explore human verbal and nonverbal communi-
cation behavior and to shed light on recent attempts to investigate different aspects
of human communication with the aim of replicating and implementing them in
ECAs. In HCI, ECAs represent multimodal user interfaces where
modalities are the natural modalities of human conversation, namely speech, facial
expressions and gestures, hand gestures, and body postures (Cassell et al. 2000).
expression, production entails the process by which human agents express them-
selves through deciding, planning, encoding, and producing the message they wish
to communicate. Transmission involves sending the message through some
medium to the recipient, e.g., in verbal communication the only medium of con-
sequence through which the spoken message travels is air. Reception, also referred
to as comprehension, entails the process by which the recipient detects the message
through the sense of hearing and then decodes the expression produced by the
sender. The receiver interprets the information being exchanged and then gives the
sender feedback. Through this process, which is intrapersonal in nature, infor-
mation transmission affects each of the parties involved in the communication
process. Communication, whether verbal or nonverbal, involves, according to
Johnson (1989), three essential aspects: transmission of information, the meaning of
that transmission, and the behavioral effects of the transmission of the information.
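By way of illustration, the production–transmission–reception model lends itself to a simple computational reading, which is one reason it is attractive for conversational system design. The following minimal sketch (in Python) mirrors the three stages; all names (Message, produce, transmit, receive) are illustrative assumptions made here for exposition, not an established API.

from dataclasses import dataclass

@dataclass
class Message:
    """A communicative act: what is said plus how it is meant."""
    content: str               # the information being transmitted
    intent: str                # the meaning of the transmission
    modality: str = "verbal"   # e.g., verbal, facial, gestural

def produce(thought: str, intent: str) -> Message:
    # Production: the sender decides, plans, and encodes the message.
    return Message(content=thought, intent=intent)

def transmit(message: Message) -> Message:
    # Transmission: the message travels through some medium
    # (air, in verbal communication; a no-op here).
    return message

def receive(message: Message) -> str:
    # Reception/comprehension: the recipient detects and decodes the
    # expression, then returns feedback, closing the two-way loop.
    return f"understood '{message.content}' as {message.intent}"

feedback = receive(transmit(produce("hello", intent="greeting")))
print(feedback)

The point of the sketch is the loop structure: feedback from reception flows back to the sender, which is what makes the process a two-way exchange rather than a one-way transfer.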
From a different perspective, human communication can be clustered into four
levels: the content and form of messages, communicators, levels of communication,
and contexts and situations in which communication occurs (Littlejohn and Foss
2005). These levels provide a more structured view of human communication.
Furthermore, communication entails that communicating participants share an area
of communicative commonality, which is essential for a better understanding of the
content being exchanged between them. Characteristically, human communication
involves a common knowledge base that humans use to understand each
other. This common knowledge includes a complete world and
language model; language is a particular way of thinking and talking about the
world. The expectation of humans towards other humans in any communication act
is strongly influenced by the common knowledge they share. There are many types
of theories that attempt to describe the different models, levels, components, and
variables of how human communication as a complex and manifold process is
achieved. In all, human communication is a planned act performed by a human
agent for the purpose of causing some effect in an attentive human recipient using
both verbal and nonverbal behaviors. In other words, it entails a two-way com-
munication process of reaching mutual understanding, in which participants
exchange (encode–decode) representations pertaining to information, ideas,
thoughts, and feelings, as well as create, share, and ascribe meaning (to these
representations).
Human communication is the field of study that is concerned with how humans
communicate, involving all forms of verbal and nonverbal communication. As an
academic discipline, human communication draws from several disciplines,
including linguistics, sociolinguistics, psycholinguistics, cognitive linguistics,
behavioral science, sociology, anthropology, social constructivism, social con-
structionism, and so on. As a natural form of interaction, human communication is
highly complex, manifold, subtle, and dynamic. It makes humans the most pow-
erful communicators on the planet. To communicate with each other and convey
and understand messages, humans use a wide variety of verbal and nonverbal
communicative behaviors. As body movements, such behaviors are sometimes
classified into micro-movements (e.g., facial expressions, eye movement) and
macro-movements (e.g., gestures, corporal stances). They have been under vigorous
investigation in the creation of AmI systems for ambient services and conversa-
tional purposes, as they can be utilized as both implicit and explicit inputs for
interface control and interaction.
by having, more often than not, a greater impact than the words being said. It is
how, rather than what, the sender conveys that has the major effect on the receiver.
All in all, a great deal of our communication is of a nonverbal form. Facial
expression, followed by vocal intonation (with the actual words being of minor
significance), is relied on primarily by the listener to determine whether they are
liked or disliked in an engaged face-to-face conversation, as Pantic and
Rothkrantz’s (2003) findings indicate.
important role in carrying the contrast between what a person likes and dislikes
instead of relying completely on the words. In general, by conveying gestures, the
sender can capture the attention of the receiver and connect with him/her. Also the
receiver of the message usually tends to infer the intentions of the sender from the
nonverbal cues he/she receives, in order to decode or better understand what the
sender wants to say should the flow of communication be hindered due to, for
example, an incongruity between nonverbal cues and the spoken message. In relation to con-
veying emotions, Short et al. (1976) point out that the primacy of nonverbal
affective information—independent of modality—is demonstrated by studies
showing that when this visual information is in conflict with verbal information,
people tend to trust the visual information. Moreover, a communication act often
takes place when a sender expresses facially, or gestures, a desire to engage in a
face-to-face conversation, assuming that the sender and the receiver assign the
same meaning to the nonverbal signal.
Nonverbal communication behavior constitutes the basis of how humans interact
with one another. It has been extensively researched and profusely discussed. There is
a large body of theoretical, empirical and analytical scholarship on the topic. The
following studies constitute the basis for an ever-expanding understanding of how we
all nonverbally communicate: Andersen (2004, 2007), Argyle (1988), Bull (1987),
Burgoon et al. (1996), Floyd and Guerrero (2006), Guerrero et al. (1999), Hanna
(1987), Fridlund et al. (1987), Hargie and Dickson (2004), Siegman and Feldstein
(1987), Gudykunst and Ting-Toomey (1988), Ottenheimer (2007), Segerstrale and
Molnar (1997), and Freitas-Magalhães (2006), to name but a few. These works cover
a wide range of nonverbal communication from diverse perspectives, including
psychological, social, cultural, and anthropological perspectives.
Nonverbal communication is perhaps most easily elucidated in terms of the various
channels through which related messages pass, including face, hand, eye, body,
space, touch, smell, prosody, silence, time, and culture. Considering the purpose of
this chapter, only the key relevant channels are reviewed. The selection rests on
how much more meaningful and consequential some channels are than others with
regard to human-like graphical embodiment, that is, the way conversational agents
attempt to personify the computer interface in the form of an animated person.
Accordingly, face, hand, eye, body, prosody, and paralanguage seem to be of
higher relevance to the naturalistic multimodal user interfaces that computer
systems use to engage in intelligent dialog with humans in an AmI environment.
Next, body movements, facial gestures, eye movements and contact, and
paralanguage are addressed.
One of the most frequently observed conversational cues is hand gestures—in other
words, most people use hand movements regularly in conversational acts. Gestures
form part of the basis of how humans interact with one another, enabling them to
communicate a variety of feelings and thoughts; they are so natural that
interlocutors barely notice them. While some gestures have universal meanings, others are individually
learned and thus idiosyncratic. Researchers in kinesics—the study of nonverbal
communication through face and body movements—identify five major categories
of body movements: emblems, illustrators, affect displays, regulators, and adaptors
(Ekman and Friesen 1969; Knapp and Hall 1997). Emblems are body gestures that
directly translate into words or phrases, which are used consciously to communicate
the same meaning as the words, such as the ‘OK’ sign and the ‘thumbs-up’.
Illustrators are body gestures that enhance or illustrate verbal messages they
accompany, e.g., when referring to something to the right you may gesture toward
the right. Illustrators are often used when pointing to objects or communicating the
shape or size of objects you’re talking about. Therefore, most often you illustrate
with your hands, but you can also illustrate with head and general body movements,
e.g., you turn your head or your entire body toward the right. Affect displays are
gestures of the face (such as smiling or frowning) but also of the hands and general
body (e.g., body tension or relaxation) that communicate emotional meaning. Affect
displays are often unconscious, as when, for example, you smile or frown without
awareness. Sometimes, however, you may consciously frown rather than smile,
trying to convey your disapproval, or to deceive. Regulators are nonverbal behaviors
that monitor, control, coordinate, or maintain the speaking of another participant.
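As a rough illustration of how this kinesic taxonomy might be represented in a gesture-annotation component, consider the sketch below. The example gestures and the categorize function are invented; the adaptor examples follow Ekman and Friesen's (1969) description of adaptors as movements that satisfy personal needs.

# A toy annotation table for Ekman and Friesen's (1969) five kinesic
# categories; the example gestures are illustrative only.
KINESIC_CATEGORIES = {
    "emblems": ["ok_sign", "thumbs_up"],                 # translate directly into words
    "illustrators": ["point_right", "size_with_hands"],  # accompany and illustrate speech
    "affect_displays": ["smile", "frown", "body_tension"],  # communicate emotion
    "regulators": ["nod_for_turn", "raise_hand"],        # monitor and control speaking
    "adaptors": ["scratch_head", "adjust_glasses"],      # satisfy personal needs
}

def categorize(gesture: str) -> str:
    for category, examples in KINESIC_CATEGORIES.items():
        if gesture in examples:
            return category
    return "unknown"

print(categorize("thumbs_up"))  # -> emblems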
implications for the functioning of context-aware systems in the sense that they may
become unreliable when they fail to react the way they are supposed to. Otherwise,
as research suggests, a universal gesturing language must be created and taught in
order for context-aware, affective and conversational computing to work. Indeed,
some joint research endeavors (see, e.g., Vilhjálmsson 2009) are being undertaken
to define and build a universal nonverbal communication framework as a part of the
ongoing research in the area of conversational systems—modeling of human
multimodal nonverbal communication behavior. However, this solution may not be
as robust as interactive systems that can adapt and respond dynamically to each
user, context, and interaction; hence the risk that such technologies prove unviable
or unworkable. There is no such thing as a one-size-fits-all
solution for the diversity of users and interactions. Indeed, cross-cultural HCI has
emerged to respond to a need brought up by the inevitability of embedding
‘culturability’ in global ICT design. Even in human-to-human communication,
people are becoming increasingly aware of cultural variations and thus culturally
sensitive when using gestures in foreign countries. A discrepancy in the shared
knowledge of gestures may lead to communication difficulties and misunder-
standings as probably most people have experienced in everyday life. Different
cultures use different symbols to mean the same thing or use the same symbol to
mean different things. Among the main hurdles in implementing emotional and
social models of context as well as models of nonverbal communication behavior is
the meaningful interpretation of data collected implicitly from the users’ nonverbal
communication behaviors. More research is needed to investigate the implications
of sociocultural contexts in interpreting nonverbal communication behavior.
Gesture recognition is regarded as one of the most important capabilities in
designing effective emotional context-aware systems and emotion-aware
conversational agents. Whether of a gestural, facial, or corporal nature, nonverbal
communication behavior serves as a significant channel for conveying emotions
between conversational participants, as well as emotional context information,
which normally influences the patterns of conversational acts. Detection of
emotion will rely upon assessment of multimodal input, including gestural, facial,
and body movement (Gunes and Piccardi 2005; Kapur et al. 2005). Culturally
nuanced variations in gestures call for the use of multiple modes/modalities, rather
than reliance upon a single mode, to avoid ineffective or erroneous interpretation of
affective information.
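One common way to realize such multimodal assessment is late fusion: each modality is scored separately and the scores are then combined. The following is a minimal sketch under that assumption; the modality weights, emotion labels, and scores are placeholders rather than values from any cited system.

# Minimal late-fusion sketch for multimodal emotion detection.
# Per-modality scores would come from real classifiers; here they are given.
from collections import defaultdict

def fuse(modality_scores: dict, weights: dict) -> str:
    """Weighted late fusion: combine per-modality emotion scores.

    Relying on several modalities (face, gesture, prosody) guards against
    the culturally conditioned ambiguity of any single channel.
    """
    combined = defaultdict(float)
    for modality, scores in modality_scores.items():
        w = weights.get(modality, 1.0)
        for emotion, p in scores.items():
            combined[emotion] += w * p
    return max(combined, key=combined.get)

observations = {
    "face":    {"happiness": 0.7, "anger": 0.1},
    "gesture": {"happiness": 0.4, "anger": 0.3},
    "prosody": {"happiness": 0.2, "anger": 0.6},
}
print(fuse(observations, weights={"face": 0.5, "gesture": 0.2, "prosody": 0.3}))
# -> happiness (0.49 vs. 0.29 for anger)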
As an explicit affect display, facial expressions are highly informative about the
affective states of people, as they are associated with expressing emotional
reactions. The face is so visible that conversational participants can interpret a great
deal from the faces of each other. Facial expressions can be important for both the
speaker and the listener in the sense of allowing the listener to infer the speaker’s
emotional stance to their utterances, and the speaker to determine the listener’s
reactions to what is being uttered. Particularly, the listener/recipient relies heavily
on the facial expressions of the speaker/sender as a better indicator of what he/she
intends to convey as feelings, and therefore monitors facial expressions constantly as
they change during interaction. Facial cues can constitute communicative acts,
comparable to ‘speech acts’ directed at one or more interaction partners
(Bänninger-Huber 1992). Pantic and Rothkrantz’s (2003) findings indicate that
when engaged in conversation the listener determines whether they are liked or
disliked by relying primarily upon facial expression followed by vocal intonation,
with the spoken words or utterances being of minor significance. In line with this,
when visual information conveyed by facial expressions is in conflict with verbal
information, people tend to trust visual information (Short et al. 1976).
Facial expressions communicate various emotional displays irrespective of
cultural variations. Ekman and Friesen (1969) and Ekman (1982) identify six
universal facial displays: happiness, anger, disgust, sadness, fear, and surprise, and
show that they are expressed and interpreted in a similar way by people regardless of
their culture. In terms of conversational agents, the six universal facial expressions
occur in Cassell (1989), an embodied conversational system that integrates both
facial and gestural expressions into automatic spoken dialog. However, while
classic psychological theory holds that those six basic emotions are universally
displayed and recognized, more recent work argues that the expression of emotions
is culturally dependent and that emotions cannot be so easily categorized (Pantic
and Rothkrantz 2003). Similar to gestures, cultural variations are also applicable to
facial expressions as different cultures may assign different meanings to different
facial expressions, e.g., a smile as a facial display can be considered a friendly
gesture in one culture while it can signal embarrassment or even regarded as
insulting in another culture. Hence, to achieve wide adoption and ease the social
acceptance of AmI, it is critical to account for cultural variations in facial
expressions when designing AmI systems (context-aware systems, affective
systems, and conversational systems). Failure to recognize and account for
differences in culturally based, facially expressed emotions may have implications
for the performance of AmI systems. Lab-defined metrics for evaluating how well
technologies perform may not carry over to real-world instantiations of AmI
systems in different operating environments. In other words, what is technically
feasible and risk-free within the lab may have unforeseen implications in the
real-world environment. For a detailed discussion of facial expressions, e.g.,
unsettled issues concerning their universality, see the next chapter.
As mentioned above, facial gestures serve as means to regulate talking, that is, to
monitor, control, coordinate, or maintain the speaking. Thus, they are of pertinence
and applicability to conversational systems. As a form of nonverbal communica-
tion, a facial gesture is ‘made with the face or head used continuously in combi-
nation with or instead of verbal communication’ (Zoric et al. 2009). Considerable
research (e.g., Chovil 1991; Fridlund et al. 1987; Graf et al. 2002) has been done on
facial gestures, e.g., head movements, eyebrow movements, eye gaze directions,
eye blinks, frowning and so on. Knapp and Hall (1997, 2007) identify six general
ways in which nonverbal communication (involving facial gestures, prosodic pat-
terns or hand gestures) blends with verbal communication, illustrating the wide
variety of meta-communication functions that nonverbal messages may serve to
accentuate, complement, contradict, regulate, repeat, or substitute for other mes-
sages. To accentuate is when you use a nonverbal movement, such as raising your
vocal tone, to underscore some part of the verbal message, e.g., a particular phrase; to
complement is when you add nuances of meaning not communicated by verbal
message, e.g., a head nod to mark disapproval; to contradict is when your verbal
message is not congruent with your nonverbal gestures, e.g., crossing your fingers
to indicate that you’re lying; to regulate or control the flow of verbal messages, e.g.,
making hand gestures to indicate that you want to speak or put up your hand to
indicate that you’ve not finished and are not ready to relinquish the floor to the next
speaker; to repeat or restate the verbal message nonverbally, e.g., you motion with
your head or hand to repeat your verbal message; and finally, to substitute for or
take the place of verbal messages, e.g., you can nod your head to indicate ‘yes’ or
shake your head to indicate ‘no’. Likewise, some of Knapp and Hall’s (1997, 2007)
general ways involve hand gestures, but they are presented in this section for the
purpose of coherence.
In everyday communication humans employ facial gestures (e.g., head move-
ment, eyebrow movement, blinking, eye gaze, frowning, smiling, etc.) consciously
or unconsciously to regulate flow of speech, punctuate speech pauses, or accentuate
words/segments (Ekman and Friesen 1969). In this context, Pelachaud et al. (1996)
distinguish several roles of facial gestures:
• Conversational signals—facial gestures in this category include eyebrow
actions, rapid head movements, gaze directions, and eye blinks, and these occur
on accented items clarifying and supporting what is being said.
• Punctuators—facial gestures in this category involve specific head motions,
blinks, or eyebrow actions, and these gestures support pauses by grouping or
separating sequences of words.
• Manipulators—involve facial gestures that correspond to the biological needs of
a face and have nothing to do with the linguistic utterances, e.g., blinking to wet
the eyes or random head nods.
• Regulators—correspond to facial gestures that control the flow of conversation
(e.g., turn-taking, turn-yielding, and feedback-request), and these gestures
include eye gaze, eye-contact, and eyebrow actions. Speakers look at listeners
and raise their eyebrows when they want feedback and listeners raise eyebrows
in response (Chovil 1991). Emphasis generally involves raising or lowering of
the eyebrows (Argyle et al. 1973).
Which of these facial gestures can be implemented or applied to ECA systems is
contingent upon whether the ECA acts as a presenter or is involved in a face-to-face
conversation—a believable virtual human. For example, the work of Zoric et al.
(2009) deals with ECAs that act only as presenters, so only the first three roles are
applicable. Accordingly, the features included in the current version of their system
are: head and eyebrow movements and blinking during speech pauses; eye blinking
as manipulators; and amplitude of facial gestures dependent on speech intensity.
This system is described and illustrated at the end of this chapter.
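To convey the flavor of such prosody-driven behavior, the sketch below maps low-intensity audio frames (pauses) to punctuator gestures and scales gesture amplitude with speech intensity. It is a speculative toy, not Zoric et al.'s (2009) implementation; the Frame class, threshold, and gesture names are invented.

from dataclasses import dataclass

@dataclass
class Frame:
    intensity: float  # speech energy in this audio frame, normalized to 0..1

def facial_gestures(frames, pause_threshold=0.05):
    """Toy mapping from real-time speech analysis to facial gestures.

    During speech pauses: head/eyebrow movements and blinks (punctuators);
    during speech: gesture amplitude follows speech intensity.
    """
    events = []
    for i, frame in enumerate(frames):
        if frame.intensity < pause_threshold:
            events.append((i, "blink_or_nod", 0.3))      # punctuator at a pause
        else:
            amplitude = min(1.0, frame.intensity * 1.5)  # amplitude tracks intensity
            events.append((i, "eyebrow_raise", amplitude))
    return events

for event in facial_gestures([Frame(0.02), Frame(0.4), Frame(0.8)]):
    print(event)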
eye contact: when you catch someone’s eye, you become psychologically close
though physically far apart (De Vito 2002). Additionally, eye gaze plays a role in
reference to objects or events, and a critical aspect of conversational content
coordination is the ability to achieve joint reference. People excel at determining
where others are looking (Watt 1995). Gaze serves to coordinate the joint attention
of conversational participants as to an object or event when referring to it with
pointing gestures. Joint attention to an object or event also allows participants greater
flexibility in how they verbally refer to it, whether they resort to pointing gestures
or not (Clark and Marshall 1981). All in all, eye gaze is a powerful form of
nonverbal communication and a key aspect of social communication and interac-
tion. Therefore, eye movement is of high relevance to HCI as to designing con-
versational systems as well as context-aware and affective systems for what it
entails in terms of conveying emotional cues, indicating cognitive processes, and
having conversational functions.
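A crude geometric reading of joint attention can make the idea concrete: two participants jointly attend to an object when both gaze directions point at it. The following sketch assumes 2-D positions and absolute gaze angles, both simplifying assumptions made only for illustration.

import math

def looking_at(position, gaze_angle_deg, target, tolerance_deg=10.0) -> bool:
    """True if the gaze direction from `position` points at `target`.

    Positions are 2-D (x, y); gaze is an absolute angle in degrees.
    """
    dx, dy = target[0] - position[0], target[1] - position[1]
    angle_to_target = math.degrees(math.atan2(dy, dx))
    diff = abs((gaze_angle_deg - angle_to_target + 180) % 360 - 180)
    return diff <= tolerance_deg

def joint_attention(p1, g1, p2, g2, obj) -> bool:
    # Joint attention: both participants' gaze falls on the same object.
    return looking_at(p1, g1, obj) and looking_at(p2, g2, obj)

print(joint_attention((0, 0), 45.0, (4, 0), 135.0, (2, 2)))  # -> True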
between prosody and facial gestures (and expressions). The information extracted
from speech prosody is essential for generating facial gestures by analyzing natural
speech in real-time (Zoric et al. 2009). Prosody is crucial in spoken communication
as illustrated by an example from Truss (2003). In this example, punctuation rep-
resents the written equivalent of prosody. Although they have completely different
meanings and are pronounced differently, the two sentences below correspond to
exactly the same segmental content:
A woman, without her man, is nothing.
A woman: without her, man is nothing.
Significant differences in meaning are easily communicated depending on where
the speaker places the stress in a given sentence. Each sentence with a stress on a
given word (or a combination of two or more) may communicate something dif-
ferent, or each asks a different question if the sentence is in the form of a question,
even though the words are exactly the same. That is to say, all that distinguishes the
sentences is stress, the way they are uttered.
As a subfield of linguistics, grammar refers to the set of structural rules and prin-
ciples governing the composition of words, phrases, and sentences or the assembly
of various elements into meaningful sentences, in any given natural language. There
are several competing theories and models for the organization of words into
sentences; among them is ‘generative grammar’ (Chomsky 1965). Based on the
underlying premise that all humans have an internal capacity to acquire language,
Chomsky’s perspective of language learning implies that the ability to learn,
understand, and analyze linguistic information is innate (Rowe and Levine 2006).
Chomsky regards grammatical competence to be innate because one will still be
able to apply it in an infinite number of unheard examples without having to be
trained to develop it (Phillips and Tan 2010). It is argued that grammatical com-
petence defines an innate knowledge of rules because grammar is represented
mentally and manifested based on the individuals’ own understanding of acceptable
usage in a given language idiom. It is worth pointing out that the subtler sorts of
grammatical differences among languages, and the fact that the grammar of any
language is highly complex and defies exhaustive treatment, may pose challenges
for building a universal grammar framework that can be used in conversational
systems.
The term ‘generative grammar’ (Chomsky 1965) describes a finite set of rules
that can be applied to generate an infinite number of sentences: precisely those that
are grammatical in a given language and no others. This description is provided by
Chomsky (1957), who coined and popularized
the term. It is most widely used in the literature on linguistics. In Chomsky’s (1965)
own words: ‘…by a generative grammar I mean simply a system of rules that in
some explicit and well-defined way [generates or] assigns structural descriptions to
sentences.’ The idea of the ‘creative’ aspect of language, and that a grammar must
exist to describe the process by which a language can ‘make infinite use of finite
means’, is advocated by Wilhelm von Humboldt, one of the key figures quoted by
Chomsky as a spark for his ideas (Chomsky 1965). René Descartes is also a major
influence on Chomsky; Descartes’ concern with the creative powers of the mind
led him to regard natural language as an instrument of thought (Phillips and Tan
2010).
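The idea of making 'infinite use of finite means' is easy to demonstrate computationally. The toy context-free grammar below is invented for illustration (it is not a serious grammar of English): a finite rule set, recursive through the PP rule, generates an unbounded set of sentences.

import random

# A finite rule set; recursion through NP -> Det N PP makes the
# generated language infinite.
GRAMMAR = {
    "S":   [["NP", "VP"]],
    "NP":  [["Det", "N"], ["Det", "N", "PP"]],
    "VP":  [["V", "NP"]],
    "PP":  [["P", "NP"]],
    "Det": [["the"], ["a"]],
    "N":   [["linguist"], ["sentence"], ["rule"]],
    "V":   [["generates"], ["describes"]],
    "P":   [["with"], ["near"]],
}

def generate(symbol: str = "S") -> str:
    """Expand a symbol by recursively applying randomly chosen rules."""
    if symbol not in GRAMMAR:  # terminal symbol: emit it as a word
        return symbol
    production = random.choice(GRAMMAR[symbol])
    return " ".join(generate(s) for s in production)

print(generate())  # e.g., "the linguist describes a sentence with a rule"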
Literature shows that the term ‘generative grammar’ is used in multiple ways. It
refers, in theoretical linguistics, to a particular (Chomskian) approach to the study
of syntax. A generative grammar of a language attempts to provide a set of rules
that will correctly predict which combinations of words will form grammatical
sentences and that will, according to some approaches to generative grammar, also
predict the morphology of a sentence. Linguists working in the generativist
tradition claim that competence is the only level of language worth studying, as
this level gives insights into universal grammar, a theory credited to Noam
Chomsky which suggests that there are properties that all possible natural
languages share and that some rules of grammar are hard-wired into the brain and
manifest without being taught. Whether there is such a thing, and whether the
properties of a generative grammar arise from an ‘innate’ universal grammar, is,
however, still the subject of heated debate. Generative grammar also relates
to psycholinguistics in that it focuses on the biological basis for the acquisition and
use of human language. Indeed, Chomsky’s emphasis on linguistic competence
greatly spurred the development of psycholinguistics as well as neuro-linguistics. It
moreover distinguishes between linguistic performance, the production and com-
prehension of speech (see below for detail), and linguistic competence, the
knowledge of language, which is represented by mental grammar—the form of
language representation in the mind. Furthermore, given that generative grammar
characterizes sentences as either grammatically well-formed or not, and that its
rules function algorithmically to predict grammaticality as a discrete result, it is of
high relevance to computational linguistics and thus conversational systems. But
using theoretical models of generative grammar in modeling natural language may
raise the issue of standardization, as there are a number of competing versions of,
or approaches to, generative grammar currently practiced within linguistics,
including the minimalist program, lexical functional grammar, categorial grammar,
relational grammar, tree-adjoining grammar, head-driven phrase structure grammar,
and so forth. They all share the common
goal of developing a set of principles that account for well-formed natural language
expressions.
However, the knowledge of, and the ability to use, the grammatical rules of a
language to understand and convey meaning by producing and recognizing
well-formed sentences in accordance with these grammatical principles is what
defines grammatical competence. According to Chomsky (1965), competence is the
‘ideal’ language system that enables speakers to understand and generate an infinite
number (all kinds) of sentences in their language and to distinguish grammatical
from ungrammatical sentences. Grammatical competence involves two distinctive
components: morphology (word forms) and syntax (sentence structure).
Morphology is concerned with the internal structure of words and their formation,
identification, modification, and analysis into morphemes (roots, infixes, prefixes,
suffixes, inflexional affixes, etc.). Morphological typology represents a method for
categorizing languages that clusters them according to their common morphological
structures, i.e., on the basis of how morphemes are used in a language or how
languages form words by combining morphemes. For example, a fusional
language, a type of synthetic language which tends to overlay many morphemes to
denote syntactic or semantic change, uses bound morphemes such as affixes
(prefixes, suffixes, infixes), including word-forming affixes. Accordingly,
morphological competence is the
ability to form, identify, modify, and analyze words. On the other hand, syntax is
concerned with the patterns which dictate how words are combined to form sen-
tences. Specifically, it deals with the organization of words into sentences in terms
of a set of rules associated with grammatical elements (e.g., morphs, morphemes-
roots, words), categories (e.g., case and gender; concrete/abstract; (in)transitive and
active/passive voice; past/present/future tense; progressive, perfect, and imperfect
aspect), classes (e.g., conjugations, declensions, open and closed word classes),
structures (compound and complex words and sentences, phrases, clauses), pro-
cesses (e.g., transposition, affixation, nominalization, transformation, gradation),
and relations (e.g., concord, valency, government) (Council of Europe 2000).
Accordingly, syntactic competence is the ability to organize sentences to convey
meaning.
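As a toy illustration of morphological analysis, the following sketch segments a word into prefix, root, and suffixes by naive affix stripping. The affix lists and the segment function are invented; real morphological analyzers handle allomorphy and spelling changes (e.g., happy/happi-) that this sketch ignores.

# Toy morpheme segmentation by greedy affix stripping; illustrative only.
PREFIXES = ["un", "re", "dis"]
SUFFIXES = ["ness", "ing", "ed", "s"]

def segment(word):
    """Split a word into prefix + root + suffixes, naively."""
    morphemes = []
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) > len(prefix) + 2:
            morphemes.append(prefix)
            word = word[len(prefix):]
            break
    stripped = []
    changed = True
    while changed:
        changed = False
        for suffix in SUFFIXES:
            if word.endswith(suffix) and len(word) > len(suffix) + 2:
                stripped.append(suffix)
                word = word[: -len(suffix)]
                changed = True
                break
    return morphemes + [word] + list(reversed(stripped))

print(segment("unhappiness"))  # -> ['un', 'happi', 'ness'] (no y->i repair)
print(segment("replays"))      # -> ['re', 'play', 's']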
phonetics goes beyond audible sounds, entailing what happens in the mouth, throat,
nasal cavities, and lungs (respiration) in order to produce the sounds of language,
and extends to cognitive aspects associated with the perception of speech sounds.
Phonetic competence, for its part, is of a three-dimensional nature: articulatory,
auditory, and
acoustic. It entails, specifically, the knowledge of and the skill in the production,
perception, and transmission of the sounds of speech, phonemes, words, and sen-
tences: the distinctive features of phonemes, such as voicing, rounding, articulation,
accent, nasalization, and labialization; the phonetic composition of words in terms
of the sequence of phonemes and word stress and tones; and other sounds relating
to prosodic features of speech, which cannot be extracted from the characteristics of
phoneme segments, including pause, intonation/pitch, intensity, rhythm, fluctu-
ation, spectral slope, syllable length, the formant frequencies of speech sounds, and
so on. In the context of conversational agents, grammatical, semantic, pragmatic
and sociocultural dimensions of spoken language are treated as levels of linguistic
context.
Phonology is the subfield of linguistics that deals with the systematic use of
sounds to encode meaning in any spoken human language (Clark et al. 2007). It
entails the way sounds function within and across languages and the meaning
behind them. Sounds, as abstract units, are assumed to be the level of language at
which sound is structured for conveying linguistic meaning. Phonology has
traditionally centered lar-
gely on investigating the systems of phonemes. As a basic unit of a language’s
phonology, a phoneme can be combined with other phonemes to form meaningful
units such as words or morphemes (the smallest grammatical unit in a language);
the main difference between the two is that a word is freestanding, comprising one
or more morphemes, whereas a morpheme may or may not stand alone. Moreover,
as the smallest contrastive linguistic unit, one phoneme in a word may bring about a
change of meaning, e.g., the difference in meaning between the words tax and tag is
a result of the exchange of the phoneme /x/ for the phoneme /g/. However, just as a
language has morphology and syntax, it has phonology—phonemes, morphemes,
and words as sound units and their mental representation. In all, phonology deals
with the mental organization of physical sounds and the patterns formed by sound
combinations and restrictions on them within languages. Phonology is concerned
with sounds and gestures as abstract units (e.g., features, phonemes, onset and
rhyme, mora, syllables, articulatory gestures, articulatory features, etc.), and their
conditioned variations through, for example, allophonic rules, constraints, or der-
ivational rules (Kingston 2007). For example, phonemes constitute an abstract
underlying representation for morphemes or words, while speech sounds (phones)
make up the corresponding phonetic realizations. Allophones entail the different
speech sounds that constitute realizations of the same phoneme, separately or in a
given morpheme or word, which are perceived as equivalent to each other in a
given language. Allophonic variations may be conditioned, i.e., a phoneme can be
realized as an allophone in a particular phonological environment—distributional
variants of a single phoneme. And as far as phonological competence is concerned,
it involves the knowledge of, and the skill in, the use of sound-units to encode
used and the effects of language usage on society. There exist several relationships
between language and society, including: ‘social structure may either influence or
determine linguistic structure and/or behavior…’, ‘linguistic structure and/or
behavior may either influence or determine social structure…’; and ‘the influence is
bidirectional: language and society may influence each other…’ (Wardhaugh 2005).
In relation to the first relationship, which appears to be the one most at work and
most prevalent in almost all societies, language expresses, according to Lippi-Green
(1997, p. 31), the ‘way individuals situate themselves in relationship to others, the
way they group themselves, the powers they claim for themselves and the powers
they stipulate to others.’ People tend to position (express or create a representation
of) themselves in relation to others with whom they are interacting by using
(choosing) specific linguistic forms in utterances that convey social information.
A single utterance can reveal an utterer’s background, social class, or even social
intent, i.e., whether he/she wants to appear distant or friendly, deferential or familiar,
inferior or superior (Gumperz 1968). According to Romaine (1994, p. 19), what
renders ‘a particular way of speaking to be perceived as superior is the fact that it is
used by the powerful’. In all, linguistic choices carry social information about the
utterer, as they are made in accordance with the orderings of society. Accordingly
Gumperz (1968, p. 220) argues that the ‘communication of social information
presupposes the existence of regular relationships between language usage and
social structure’. Given this relationship between language and society, the linguistic
varieties utilized by different groups of people (speech communities) on the basis of
different social variables (e.g., status, education, religion, ethnicity, age, gender)
form a system that corresponds to the structure of society, and adherence to socio-
cultural norms is used to categorize individuals into different social classes. Each
speech community ascribes social values to specific linguistic forms in correlation
with the groups that use those forms. Gumperz (1968) provides a definition of
speech community: ‘any human aggregate characterized by regular and frequent
interaction by means of a shared body of verbal signs’, where the human aggregate
can be described as any group of people that shares some common attribute such as
region, race, ethnicity, gender, occupation, religion, age, and so on; interaction
denotes ‘a social process in which utterances are selected in accordance with socially
recognized norms and expectations’, and the ‘shared body of verbal signs’ is
described as the set of ‘rules for one or more linguistic codes and…for the ways of
speaking’ that develop as a ‘consequence of regular participation in overlapping
networks.’ It is worth noting that these rules of language choice vary based on
situation, the role of speakers, the relationship between speakers, place, time, and
so forth. Moreover,
William Labov is noted for introducing the study of language variation (Paolillo
2002), which is concerned with social constraints that determine language in its
contextual environment. The use of language varieties in different social situations is
referred to as code-switching. Varieties of language associated with specific regions
or ethnicities may, in many societies, be singled out for stigmatization because their
users are situated lower in the social hierarchy. Lippi-Green (1997) writes on the
tendency of the powerful to ‘exploit linguistic variation…in order to send complex
messages’ about the way groups are ranked or placed in society. Language variation
impacts the communication styles and daily lives of people, as well as the way they
communicate at intercultural and cross-cultural levels. Understanding the
sociocultural dimension of language is important for intercultural and cross-cultural
communication, since language usage varies among social classes and from place to
place. Second language learners must learn how ‘to produce and understand lan-
guage in different sociolinguistic contexts, taking into consideration such factors as
the status of participants, the purposes of interactions, and the norms or conventions
of interactions.’ (Freeman and Freeman 2004). Learning and practice opportunities
for language learners should include expressing attitudes, conveying emotions,
inferring emotional stances, understanding formal versus informal, and recognizing
idiomatic expressions.
Furthermore, sociolinguistics draws on linguistics, sociology, and anthropology.
Sociolinguists (or dialectologists) study the grammar, semantics, phonetics, pho-
nology, lexicon, and other aspects of social class dialects. The sociology of language
focuses on the effects of language on society, which is also one focus of sociolin-
guistics. Sociolinguistics is closely related to linguistic anthropology (the inter-
disciplinary study of how language influences social life) and the distinction
between these two interdisciplinary fields has even been questioned recently
(Gumperz and Cook-Gumperz 2008).
Sociolinguistic competence deals with the knowledge of social conventions
(norms governing relations between genders, generations, classes, social groups,
and ethnic groups) as well as behaviors, attitudes, values, prejudices, and prefer-
ences of different speech communities, which is necessary to understand socio-
cultural dimensions of language and thus use it in different sociolinguistic contexts.
In specific terms, sociolinguistic competence involves the ability to use language in
different communicative social situations—that is, to know and understand how to
speak given the circumstances one is in, as well as to distinguish between language
varieties on the basis of different social variables. According to Council of Europe
(2000), the matters taken up in sociolinguistic competence in relation to language
usage include: politeness conventions (e.g., impoliteness, positive politeness,
negative politeness); register differences (e.g., formal, neutral, informal, familiar);
dialect and accent (social class, ethnicity, national origin); linguistic markers of
social relations (e.g., use and choice of greetings, address forms and expletives;
conventions for turn-taking, offering, yielding, and keeping), and expressions of
wisdom (e.g., proverbs, idioms). For a detailed account of these matters with
illustrative examples, the reader is directed to Council of Europe (2000), the
Common European Framework of reference for Languages.
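To hint at how sociolinguistic competence might surface in a conversational agent, the sketch below selects a greeting register from simple social variables. The register labels echo the Council of Europe (2000) list; the greetings, variables, and decision rule are invented for illustration.

# Toy register-sensitive greeting selection; illustrative only.
GREETINGS = {
    "formal":   "Good morning, Dr. {name}.",
    "neutral":  "Hello, {name}.",
    "informal": "Hi {name}!",
    "familiar": "Hey {name}, what's up?",
}

def greet(name: str, relationship: str, setting: str) -> str:
    """Pick a greeting register from simple social variables."""
    if setting == "workplace" and relationship == "stranger":
        register = "formal"
    elif relationship in ("stranger", "acquaintance"):
        register = "neutral"
    elif relationship == "friend":
        register = "informal"
    else:  # close friend, family
        register = "familiar"
    return GREETINGS[register].format(name=name)

print(greet("Lee", relationship="stranger", setting="workplace"))  # formal
print(greet("Lee", relationship="friend", setting="home"))         # informal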
Pragmatics is the subfield of linguistics that studies the use of language in
contexts or the ways in which context contributes to meaning—in other words, how
people comprehend and produce communicative acts in a concrete speech situation.
Pragmatics emphasizes what might not be explicitly stated and the way people
interpret utterances in situational contexts. In relation to conversation analysis,
pragmatics distinguishes two intents or meanings in each communicative or speech
act: (1) the informative intent or the utterance meaning, and (2) the communicative
intent or speaker meaning (Leech 1983; Sperber and Wilson 1986). Pragmatics is
‘concerned not so much with the sense of what is said as with its force, that is, with
what is communicated by the manner and style of an utterance.’ (Finch 2000). In
other words, it deals with how the transmission of meaning depends not so much on
the explicit linguistic knowledge (e.g., grammar, semantics, lexicon, etc.) of the
speaker as on the inferred intent of the speaker or the situational context of the
utterance. Overall, pragmatics encompasses talk in interaction, speech act theory,
conversational implicature (the things that are communicated though not explicitly
expressed), in addition to other approaches to language behavior in linguistics,
sociology, philosophy and anthropology (Mey 1993).
Pragmatic competence is a key component of communicative language compe-
tence. It entails the knowledge of, and the skill in, the interpretation of (the meaning
of) utterances in situational contexts. The ability to understand the speaker’s intended
meaning is called pragmatic competence (Takimoto 2008; Koike 1989). In this sense,
pragmatic competence provides language users with effective means to overcome
ambiguities in speech communication, given that the meaning of utterances enacted
through speech relies on such contextual factors as place, time, manner, style, situ-
ation, the type of conversation, and the relationship between speakers, and that meaning
can be inferred from logical relations such as entailment, presupposition, and
implicature. Therefore, pragmatic competence entails that language users use lin-
guistic resources to produce speech acts or perform communication functions, have
command of discourse, cohesion and coherence, identify speech types, recognize
idiomatic expressions and sarcasm, and be sensitive to social and cultural environ-
ments. According to Council of Europe (2000), pragmatic competence involves
discourse competence, functional competence, and design competence; that is, it
deals with the language user’s knowledge of the principles according to which
messages are, respectively, organized, structured and arranged; used to perform
communicative functions; and sequenced according to interactional schemata.
Discourse competence, which is the ability to arrange and sequence statements to
produce coherent units of language, involves knowledge of, and ability to control, the
ordering of sentences with reference to topic/focus, given/new, and natural
sequencing (e.g., temporal); cause/effect (invertible); ability to structure and manage
discourse in terms of: thematic organization, coherence and cohesion, rhetorical
effectiveness, logical ordering, style and register; and so on. Discourse refers to a set
of statements that provide a language for talking within some thematic area.
Functional competence is, on the other hand, concerned with the use of utterances and
spoken discourse in communication for functional purposes; it is the ability to use
linguistic resources to perform communicative functions. It involves micro-
functions, macro-functions, and interaction schemata. Micro-functions entail cate-
gories for the functional use of single utterances, including ‘imparting and seeking
factual information: identifying, reporting, correcting, asking, answering; expressing
and finding out attitudes: factual (agreement/disagreement), knowledge (knowledge/
ignorance, remembering, forgetting, probability, certainty), modality (obligations,
necessity, ability, permission), volition (wants, desires, intentions, preference),
emotions (pleasure/displeasure, likes/dislikes, satisfaction, interest, surprise, hope,
disappointment, fear, worry, gratitude), moral (apologies, approval, regret,
Linguistic performance entails the act of carrying out speech communication or the
production of a set of specific utterances by native speakers. It is a concept that was
first coined by Chomsky (1965) as part of the foundations for his theory of
transformational generative grammar (see above). It is said that linguistic perfor-
mance reflects the intrinsic sound-meaning connections established by language
systems, e.g., phonology, phonetics, syntax, and semantics, and involves extra-
linguistic beliefs pertaining to the utterer, including attitude, physical well-being,
mnemonic skills, encyclopedic knowledge, absence of stress, and concentration.
(Ibid). Cognitive linguists argue that language is embedded in the experiences and
environments of its users, and knowledge of linguistic phenomena is essentially
conceptual in nature; they view meaning in terms of conceptualization—i.e., mental
spaces instead of models of the world; they assert that the cognitive processes of
storing and retrieving linguistic knowledge are not significantly different from those
associated with other knowledge, and that the cognitive abilities employed in
understanding language are similar to those used in other non-linguistic tasks;
and they deny that human linguistic ability (although part of it is innate) is separate
from the rest of cognition—that is, linguistic knowledge is intertwined with all
other cognitive processes and structures, not an autonomous cognitive faculty with
processes and structures of its own (see, e.g., Geeraerts and Cuyckens 2007; Croft
and Cruse 2004; Vyvyan and Green 2006; Vyvyan 2007; Vyvyan et al. 2007).
Moreover, aspects of cognition that are of interest to cognitive linguists include
conceptual metaphor and conceptual blending; cognitive grammar, conceptual
organization (categorization, metonymy, and frame semantics); gesture (nonverbal
communication behaviors); cultural linguistics; and pragmatics.
organize and uncover much of the information about any language that would
otherwise be still hidden under the vastness and infinite richness of data within that
language—its incalculability. The structural linguistics approach aims to understand the
structure of language using computational approaches, e.g., large linguistic corpora
like the ‘Penn Treebank’ (Marcus et al. 1993). The aim is to grasp how the language
functions on a structural level, so as to create better computational models of language.
Information about the structural data of language allows for the discovery
and implementation of similarity recognition between pairs of utterances (Angus
et al. 2012). While information regarding the structural data of a language can be
available for any language, there are differing patterns as to some aspects of the
structure of sentences. This usually constitutes the sort of intriguing information that
computational linguistics aims to uncover and that could lead to further
important discoveries regarding the underlying structure of some languages.
Different grammatical models can be employed for the parsing and generation of
sentences. As a subspecialty of computational linguistics, parsing and generation
deal with taking language apart and putting it together. Computational approaches
allow scientists not only to parse huge amounts of data reliably and efficiently and
generate grammatical structures, but also to generate the possibility for important
discoveries, depending on the natural features of a language.
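To make the parsing side of this concrete, the following is a minimal sketch of parsing with a toy context-free grammar using the NLTK library; the grammar, the sentence, and the choice of library are illustrative assumptions, not details of the treebank work cited above.

```python
# A minimal sketch of grammatical parsing with a toy context-free grammar,
# using NLTK; the grammar and sentence are invented for illustration.
import nltk

grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det N | N
VP -> V NP
Det -> 'the' | 'a'
N -> 'dog' | 'ball'
V -> 'chased' | 'saw'
""")

parser = nltk.ChartParser(grammar)
sentence = ['the', 'dog', 'chased', 'a', 'ball']
for tree in parser.parse(sentence):
    print(tree)  # prints the bracketed parse, e.g. (S (NP ...) (VP ...))
```

The same chart-parsing machinery scales, with suitably large grammars and corpora, to the reliable large-scale parsing described above.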
The linguistic production approach concerns how a computer system generates or
produces language, an area in which computational linguistics has made some
fascinating discoveries and remarkable progress. The production of language is a
key feature of AmI systems and ECAs, where a computer system receives speech
signals and responds to them in a human-like manner. A computer system can be
deemed capable of thought, a human-like interactive system, when it becomes
difficult for a subject to differentiate between the human and the computer. This
was proposed some six decades ago by Turing (1950) whose ideas remain
influential in the area of AI. The ELIZA program, which was devised by Joseph
Weizenbaum at MIT in 1966, is one of the very early attempts to design a computer
program that can converse naturally with humans. While the program seemed to be
able to understand what was uttered to it and to respond intelligently to written
statements and questions posed by a user, it only comprehended a few keywords in
each sentence and no more, using a pattern-matching routine (Weizenbaum 1966).
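To illustrate the keyword-and-pattern idea at the heart of such early programs, here is a minimal ELIZA-style sketch in Python; the rules are invented for illustration and are not Weizenbaum's actual script.

```python
# A minimal ELIZA-style sketch: keyword spotting plus pattern matching, in the
# spirit of Weizenbaum (1966); the rules below are illustrative only.
import re

RULES = [
    (re.compile(r"\bI need (.+)", re.I), "Why do you need {0}?"),
    (re.compile(r"\bI am (.+)", re.I),   "How long have you been {0}?"),
    (re.compile(r"\bmother\b", re.I),    "Tell me more about your family."),
]

def respond(utterance: str) -> str:
    for pattern, template in RULES:
        match = pattern.search(utterance)
        if match:
            return template.format(*match.groups())
    return "Please go on."  # default reply when no keyword matches

print(respond("I am sad"))  # -> "How long have you been sad?"
```

As the sketch makes plain, such a program manipulates surface patterns only; no model of meaning is involved.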
Nevertheless, the research in this domain has significantly improved, giving rise to
more sophisticated conversational systems. The computational methods used in the
production (and comprehension) of language have matured, and hence the
results generated by computational linguists have become more enlightening. Specific to com-
putational linguistics, current work in developing conversational agents shows how
new machine learning techniques (supervised learning algorithms and models) have
been instrumental in improving the computational understanding of language: how
speech signals are perceived and analyzed, and generated and realized, by computer
systems. This work adds to the endeavor towards making computers understand and
produce language in a more naturalistic manner. In this line of thinking, there exist
some specialized algorithms which are capable of modifying a system’s style of
production (speech generation) based on linguistic input from a human or on any of
the five dimensions of personality (Mairesse 2011). This work and other notable
ones (see below) use computational modeling approaches that aim at making HCI
much more natural.
The linguistic comprehension approach concerns how a computer system understands
language, that is, recognizes, interprets, and reasons about speech signals. There is a
proliferation of application domains of language comprehension that modern
computational linguistics entails, including search engines, e-learning/education,
e-health, automated customer service, activities of daily living (ADL), and con-
versational agents. The ability to create a software agent/program capable of
understanding human language has many broad possibilities, especially in relation
to the emerging paradigm of AmI, one of which is enabling human users to engage
in intelligent dialog or mingle socially with computer systems. Language perception
(speech analysis) involves the use of various types of pattern recognition algorithms
that fall under supervised machine learning methods, including Support Vector
Machine (SVM), neural network, dynamic and naive Bayes network, and Hidden
Markov Models (HMMs). Early work in language comprehension applied Bayesian
statistics to optical character recognition, as demonstrated by Bledsoe and Browning
(1959). An initial approach to applying signal modeling to language (where
unknown speech signals are analyzed or processed to look for patterns and to make
predictions based on their history) was achieved with the application of HMMs as
described by Rabiner (1989). This and other early attempts to understand spoken
language were grounded in work carried out in the 1970s. Indeed, similar
approaches to applying signal modeling to language were employed in early
attempts at speech recognition in the late 1970s using part-of-speech pair proba-
bilities (Bahl et al. 1978). Further endeavors to build conversational agents, from the
late 1970s up until now, are cited below.
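As a concrete illustration of the signal-modeling idea, the following is a minimal sketch of the HMM forward algorithm in the spirit of Rabiner (1989); the two-state model and its probabilities are toy values, not parameters from any cited system.

```python
# A minimal sketch of the HMM forward algorithm for scoring an observation
# sequence; the two-state model and all probabilities are toy values.
import numpy as np

A = np.array([[0.7, 0.3],    # state-transition probabilities
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],    # emission probabilities per state
              [0.2, 0.8]])
pi = np.array([0.5, 0.5])    # initial state distribution

def forward_likelihood(obs):
    """Return P(obs | model) by summing over all hidden-state paths."""
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    return alpha.sum()

print(forward_likelihood([0, 1, 1]))  # likelihood of a toy observation sequence
```

In speech recognition the observations are acoustic feature frames rather than binary symbols, but the recursion, which predicts each frame from the history encoded in the state distribution, is the same.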
non-native speaker. Indeed, when conversing with foreigners, people seem more
often to rely on the visual modality to ease their understanding of speech and thus of the
conversational content. Furthermore, some views argue for the visual-auditory nature
of the perception of speech on the basis that the visual modality carries complementary
information. Indeed, research shows that the role of visual modality when humans
perceive speech goes beyond just serving as redundant information—when part of
the missing auditory information can be recovered by vision. Dohen (2009)
maintains that the role of vision in the perception of speech is not just that of a
backup channel, nor does the visual information merely overlay the auditory one; auditory
and visual information are in fact fused for perceptual decision. McGurk and
MacDonald's (1976) effect, in which a [da] percept results from an
audio [ba] dubbed onto a visual [ga], demonstrates that there is more to vision
than just providing redundant information. As demonstrated by Summerfield
(1987), perceptual confusions between consonants differ one from another and
complement one another in the visual and the auditory modalities. To further
support the argument that visual information is not only of a redundant nature in
speech perception, Boë et al. (2000) point out that the [m]/[n] contrast, which exists in
more or less all the languages of the world, is not audible but visible.
Speaking entails producing gestures of paralinguistic as well as phonetic nature
that are intended to be heard and seen. The multimodal nature of speech perception
extends, in other words, from segmental to supra-segmental perception of
speech. Dohen (2009) points out that the production of prosodic information (e.g.,
prosodic focus) involves visible articulatory correlates that are perceived visually,
and adds that it is possible to put forward auditory-visual fusion, when the
acoustic prosodic information is degraded, so as to enhance speech perception. The
acoustic correlates of prosodic focus have been widely investigated (Dahan and
Bernard 1996; Dohen and Loevenbruck 2004). While prosody was for a long time
uniquely considered as acoustic/auditory, recent studies carried out in the lab (Graf
et al. 2002; Dohen and Loevenbruck 2004, 2005; Dohen et al. 2004, 2006; Beskow
et al. 2006) have demonstrated that prosody has also potentially visible correlates
(articulatory or other facial correlates) (Dohen 2009).
From a psycholinguistic perspective, there are three mechanisms that underlie
language perception: the language signal, the operations of the neuropsychological
system, and the language system (Garman 1990). The operations of the neuropsychological
system determine how language signals (spoken utterances) are perceived and generated,
a process that involves auditory pathways from the sensory organs to the
central processing areas of the brain; the language system involves silent verbal
reasoning and the contemplation of language knowledge. Drawing upon different psy-
cholinguistic models, there are characteristic cognitive processes that underlie the
fusion of the auditory and visual information in speech perception. Schwartz et al.
(1998) and Robert-Ribes (1995) analyzed the fusion models in the literature and
presented four main potential fusion architectures, as summarized in Fig. 7.1 by
Dohen (2009, p. 26):
Fig. 7.1 The four main types of auditory-visual fusion models. Source Schwartz et al. (1998) and
Robert-Ribes (1995)
• Direct Identification (DI): the auditory and visual channels are directly
compiled.
• Separate Identification (SI): the phonetic classification is operated separately
on both channels and fusion occurs after this separate identification. Fusion is
therefore relatively late and decisional.
• Recoding in the dominating modality (RD): the auditory modality is con-
sidered to be dominant and the visual channel is recoded under a compatible
format to that of the auditory representations. This is an early fusion process.
• Recoding in the motor modality (RM): the main articulatory characteristics are
estimated using the auditory and visual information. These are then fed to a
classification process. This corresponds to an early fusion.
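As a concrete rendering of the SI scheme from the list above, the sketch below classifies each modality separately and fuses the resulting posteriors in a late, decisional step; the phoneme set, probability values, and weighting rule are illustrative assumptions.

```python
# A toy sketch of Separate Identification (SI) fusion: each modality yields
# class posteriors independently, and a late, decisional fusion combines them.
# The phoneme inventory and probability values are invented for illustration.
import numpy as np

PHONEMES = ["ba", "da", "ga"]

def fuse_si(p_audio, p_video, audio_weight=0.5):
    """Weighted log-linear fusion of per-modality posteriors."""
    fused = (p_audio ** audio_weight) * (p_video ** (1.0 - audio_weight))
    return fused / fused.sum()  # renormalize to a distribution

p_audio = np.array([0.6, 0.3, 0.1])  # audio favors "ba" (e.g., dubbed [ba])
p_video = np.array([0.1, 0.3, 0.6])  # video favors "ga" (visual [ga])

fused = fuse_si(p_audio, p_video)
print(PHONEMES[int(np.argmax(fused))], fused)  # fused percept comes out as "da"
```

With these toy numbers the fused percept lands on the intermediate category, loosely echoing the McGurk effect described above; an RD or RM model would instead recode the visual stream before any classification takes place.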
Dohen (2009) reviews a number of studies on fusion models and draws different
conclusions and suggestions: the DI and SI models are easier to implement; visual
attention can modulate audiovisual speech perception; there are strong
inter-individual variations as well as inter-linguistic differences; the RD and RM
models seem to be more likely to reflect the cognitive processes underlying
auditory-visual fusion; and several behavioral studies provide valuable information
on the fusion process, adding to the role of vision in understanding speech in noise,
Speech is inherently verbal and gestural. Humans use a wide variety of articulatory,
facial and hand gestures when speaking. For example, gestural movements range
from simple actions of using the hand to point at objects to the more complex
actions that allow communication with others. There is also coordination between
speech and gestures, that is, our hands move along with orofacial articulatory
gestures.
the delay measured for vocal responses. As explained by Dohen (2009, p. 32) ‘this
delay could simply be due to coordination requirements: the vocal and gestural
responses would have to be synchronized at some point and when a gesture is
produced at the same time as speech, speech would wait in order for the synchrony
to be achieved.’
There is a set of contextual elements that surround and influence spoken language,
including linguistic (e.g., syntactic, semantic), pragmatic, sociolinguistic, and
extra-linguistic ones. Lyons (1968) describes several linguistic situations which appear
on different levels and in which the context should be used. For example, on the
syntactic level, a word can have multiple lexical categories (e.g., verb, noun), and
thus the context of the respective word formed by the surrounding words has to be
used to determine the exact lexical class of the word, whether it is a verb or a noun.
For example, a word like look can be a noun (e.g., Please have a look) or a verb
(e.g., I will look at it soon). A similar thing occurs on a semantic level—denotata of
words and sentences. The meaning of a single word and, on an even higher level, the
grammatical mood of a sentence depends on the context. An utterance classified as
declarative may become imperative, ironic, or express other meanings under the
right circumstances, depending on how it is uttered. This relates to prosodic
features associated with spoken utterances, with what tone of voice it was uttered,
which involves a whole set of variations in the characteristics of voice dynamics:
volume, tempo, pitch, speed, rhythm, intensity, fluctuation, continuity, and so on.
Indeed, prosody is used to nuance meaning and thus reflects various features of
utterances: the form pertaining to statement, question, or command or other aspects
of language that may not be grammatically or lexically encoded in the spoken
utterances. Prosody may facilitate lexical and syntactic processing and express
feelings and attitudes (Karpinski 2009). As Lyons (1977) states: ‘…a speaker will
tend to speak more loudly and at an unusually high pitch when he is excited or
angry (or, in certain situations, when he is merely simulating anger)…'. In nonverbal
communication parlance, the use of paralanguage serves primarily to change
meaning or convey emotions.
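Returning to the syntactic-level example above, the ambiguity of a word like look can be resolved by an off-the-shelf part-of-speech tagger that uses the surrounding words; the sketch below uses NLTK as one possible tool, and the expected tags are stated tentatively since they depend on the tagger model.

```python
# A minimal sketch of syntactic-level disambiguation: a part-of-speech tagger
# uses the surrounding words to decide whether "look" is a noun or a verb.
import nltk
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

for sentence in ["Please have a look", "I will look at it soon"]:
    tokens = nltk.word_tokenize(sentence)
    print(nltk.pos_tag(tokens))
# "look" should come out as NN (noun) in the first sentence and as VB (verb)
# in the second, driven purely by the surrounding context.
```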
Other contexts that are considered when it comes to spoken language include the
sociocultural, historical, pragmatic, and extra-linguistic, among others. Halliday and
Hasan (1976) consider various aspects of what they label the context of situation in
terms of the environment in which discourse is situated. This environment is used to
put the (spoken or written) text into perspective. For example, a set of seemingly
non-cohesive sentences may nonetheless be understood correctly as a coherent passage
of discourse under a particular set of contextual elements—the context of situation.
According to the authors, three different components can be used to describe the
context of situation: the field, the tenor, and the mode. The field is the current topic
under discussion (dialog context); the tenor entails the knowledge about and the
relationship between the participants in the discourse; and the mode is about the
communication channel (the genre of the interaction, the type of channel). In
relation to the latter, Karpinski (2009) points out that each communication channel
has its particular properties and it varies in the range of ‘meanings’ it may convey
and in the way it is used, e.g., a facial expression is frequently used for feedback
or emotional reactions. However, remaining on the contextual aspects of discourse,
relies heavily on the facial expressions or gestures of the speaker to decode how
his/her messages are being interpreted, i.e., inferring the speaker’s emotional stance to
his/her utterances. Likewise, the speaker/sender can determine the listeners’ reaction
to what is being said. Pantic and Rothkrantz (2003) found that when engaged in
conversation the listener determines whether he/she is liked or disliked by relying
primarily upon facial expressions followed by vocal intonation, while words tend to
be of minor weight. Facial, gestural, and corporal behavior constitute a rich source of
information that humans share in an implicit and subtle way and that has a seminal
shaping influence on the overall communication. In all, context affects the selection
and the interpretation of nonverbal communicative behavior, which in turn contrib-
utes to conveying the context of spoken utterances and decoding their meanings.
What context moreover influences is the selection of modalities, and thus com-
munication channels, used to express communicative intents. Multi-channel and
multi-modal are two terms that often tend to be mixed up or used interchangeably.
However, they refer to quite distinct ideas of interaction between humans and
between humans and computers (HCI). In human–human communication, the term
‘modality’ refers to any of the various types of sensory channels. Human senses are
realized by different sensory receptors (see previous chapter for further detail).
Communication is inherently a sensory experience, and its perception occurs as a
multimodal (and thus multi-channel) process. Multimodal interaction entails a set of
varied communication channels provided by a combination of verbal and nonverbal
behavior involving speech, facial movements, gestures, postures, and paralinguistic
features, using multiple sensory organs. Accordingly, one modality entails a set of
communication channels using one sensory channel and different relevant classes of
verbal and nonverbal signals. In reference to dialog act systems, Karpinski (2009)
describes modality as a set of communication channels using one sensory channel
and a relevant class of verbal or nonverbal signals. Basically, nonverbal commu-
nication involves more channels than verbal communication, including space,
silence, touch, and smell, in addition to facial expressions, gestures, and body
postures. Indeed, research suggests that nonverbal communication channels are
more powerful than verbal ones; nonverbal cues are more important in under-
standing human behavior than verbal ones—what people say. Particularly, visual
and auditory modalities, taken separately, can enable a wide range of communi-
cation channels, irrespective of the class of verbal and nonverbal signals. For
example, visual modality provides various channels from facial gestures (e.g.,
eyebrow raising, eyebrow lowering, eye blinking, eye gaze, and head nods, as well
as visual orofacial articulatory or other facial correlates of prosody) and from
gestures (e.g., fingers, arms, and hands). On the other hand, auditory modality
provides textual channels (e.g., words, syntactic structures) and prosodic channels
(e.g., pitch, tempo, rhythm, intonation).
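A simple data structure can make the modality/channel distinction described above concrete; the class and the channel names below are illustrative, not a standard taxonomy.

```python
# A toy encoding of the modality/channel distinction: one sensory modality
# groups several communication channels, each carrying a class of verbal or
# nonverbal signals. All names below are illustrative.
from dataclasses import dataclass, field

@dataclass
class Modality:
    sense: str                                    # the sensory channel, e.g. "visual"
    channels: dict = field(default_factory=dict)  # channel name -> signal classes

visual = Modality("visual", {
    "facial_gestures": ["eyebrow raise", "eye gaze", "head nod"],
    "hand_gestures":   ["pointing", "beat gesture"],
})
auditory = Modality("auditory", {
    "textual":  ["words", "syntactic structures"],
    "prosodic": ["pitch", "tempo", "rhythm", "intonation"],
})

# Multimodal interaction then combines channels drawn from several modalities.
for m in (visual, auditory):
    print(m.sense, "->", list(m.channels))
```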
The research on ECAs—or their components—has been active for more than two
decades in academic circles. It draws on theoretical and empirical research from
linguistics and its subfields (specifically psycholinguistics, pragmatics, sociolinguistics,
and cultural linguistics) as well as from human nonverbal communication
behavior. Many models, theories, frameworks, and rules of speech and gestural
communication have been investigated and applied to computer systems within the
area of AI. A large body of studies has been conducted in simulated environments by
experts and scholars in computational linguistics or joint research groups and has
resulted in the development of a range of systems that attempt to emulate natural
interaction. Many of these systems are, though, far from real-life implementation.
Far more needs to be done than has been accomplished thus far,
given the complexity associated with mimicking human verbal and nonverbal
communication behavior and simulating natural language and natural forms of communication
in computer systems, in addition to evaluating the constructs, models,
methods, and instantiations that underlie conversational systems.
In particular, the objective of research within AI and AmI is to build fully functional
and well-realized ECAs that are completely autonomous. One of the most inter-
esting investigations happening in the area of ECA is how to present communi-
cative behavior at two levels of abstraction, namely the higher level of
communicative intent or function and the lower level of physical behavior
description, using the SAIBA framework as an international research platform.
Prior to delving into the discussion of what has been accomplished and what is under
research in relation to conversational systems, it is worth providing some short
background information on ECAs.
ECAs are autonomous agents that have a human-like graphical embodiment, and
possess the ability to engage people in face-to-face conversation (Cassell et al.
2000). This agent can create the sense of face-to-face conversation with the human
user, as it is capable of receiving multimodal input and producing multimodal
output in nearly real-time (Vilhjálmsson 2009). In HCI, it represents a multimodal
user interface where modalities are the natural modalities of human conversation,
namely speech, facial expressions and gestures, hand gestures, and body postures
(Cassell et al. 2000). ECAs ‘are capable of detecting and understanding multimodal
behavior of a user, reason about it, determine what the most appropriate multimodal
response is and act on this.’ (ter Maat and Heylen 2009, p. 67). ECAs are concerned
with natural interaction given that when constructing believable ECAs, the rules of
human verbal and nonverbal communication behavior must be taken into account.
For almost three decades, there has been intensive research in academic circles as
well as in industry on UbiCom and AmI, with the goal of designing a host of next-generation
technologies that can support human action, interaction, and communication
in various ways: taking care of people's needs, responding intelligently to
their spoken or gestured indications of desire, and engaging in intelligent dialog or
mingling socially with human users. This relates to conversational systems/agents,
which can enable people to engage in intelligent interaction with AmI interfaces.
A collaborative research between scholars from various domains both within the
area of AmI and AI is necessary to achieve the goal of creating a fully interactive
environment. Indeed, one of the many research projects being undertaken at the MIT
within the field of AI is NECA project, which aims to develop a more sophisticated
generation of conversational systems/agents, virtual humans, which are capable of
speaking and acting in a human-like fashion (Salvachua et al. 2002). There is also an
international research community (a growing group of researchers) that are currently
working together on building conversational systems, a horde of believable virtual
humans that are capable to mingle socially with humanity (Vilhjálmsson 2009).
Again, collaboration in this regard is of critical importance for making strides towards
the goal. Building conversational systems requires bringing researchers together and
pooling their efforts, the knowledge of their research projects, in order to facilitate
and speed up the process. Vilhjálmsson (2009) recognizes that collaboration and
sharing of work among research communities that originally focus on separate
components or tasks relating to conversational systems ‘would get full conversa-
tional systems up and running much quicker and reduce the reinvention of the
wheel’. Towards this end, in 2007 an international group of researchers began laying
the lines for a framework that would help realize the goal, with a particular emphasis
on defining common interfaces in the multimodal behavior generation process for
ECA (Ibid). Following the efforts for stimulating collaboration, the research group
pooled its knowledge of various full agent systems and identified possible areas of
reuse and employment of standard interfaces, and, as a result, the group proposed the
so-called SAIBA framework as a general reference framework for ECA (Kopp et al.
2006; Vilhjálmsson and Stacy 2005). The momentum is towards constructing a
universal framework for the multimodal generation of communicative
behavior that allows researchers to build whole multimodal interaction systems.
Currently, research groups working on ECAs focus on different tasks, ranging from a
high level (communicative intents) to a low level (communicative signals) of the
SAIBA framework. The definition of this framework and its two main interfaces are
still at an early stage, but the increased interest and some promising patterns indicate
that the research group may be onto something important (Vilhjálmsson 2009). As an
international research platform, SAIBA is intended to foster the exchange of com-
ponents between different systems, which can be applied to autonomous conver-
sational agents. This is linked to the aspect of AmI interfaces that aim to engage in
intelligent dialog with human users and that use verbal and nonverbal communi-
cation signals as commands, that is, explicit inputs such as speech waveforms or gestural
cues from the user, to perform actions.
Furthermore, another type of collaborative research that is crucial towards the
aim of building conversational systems is the interdisciplinary scholarly research
work. Endeavors should focus on raising awareness among the active researchers in
the disciplines of linguistics and its subfields and nonverbal communication
behavior about the possibility to incorporate up-to-date empirical findings and
advanced theoretical models in conversational systems, in particular in relation to
the context surrounding nonverbal communication and verbal communication
(especially in relation to semantics), mouth-hand coordination, speech-face syn-
chronization, communication error and recovery, and so on. Modelers in linguistic,
psycholinguistic, sociolinguistic, pragmatic, and behavioral disciplines should
develop an interest in ECA as a high-potential application sphere for their new
models. They can simultaneously get inspiration for new problem specifications and
new areas that need to be addressed for further developments of the disciplines they
study in relation to ECA. Examples of computational modeling areas of high
topicality to ECA may include: multimodal speech perception and generation,
multimodal perception and generation of nonverbal behavior, situated cognition and
(inter)action, mind reading (e.g., communicative intent), psycholinguistic pro-
cesses, emotional processes, emotional intelligence, context awareness, generative
cultural models, multilingual common knowledge base, and so forth. Far-reaching
conversational systems crucially depend on the availability of adequate knowledge
about human communication. And interdisciplinary teams may involve, depending
on the research tasks, language experts, professional linguists, computer scientists,
AI experts, computational mathematicians, logicians, cognitive scientists, cognitive
psychologists, psycholinguists, neuroscientists, anthropologists, social scientists,
and philosophers, among others.
As mentioned above, the work of ECA researchers has introduced the SAIBA
framework as an attempt, in addition to stimulating sharing and collaboration, to
scaffold the production process, a time-critical process requiring high flexibility
that is entailed by the generation of natural multimodal output for embodied
conversational agents. The SAIBA framework involves two main interfaces: the Behavior
Markup Language (BML) at the lower level between behavior planning and
behavior realization (Kopp et al. 2006; Vilhjálmsson et al. 2007) and Function
Markup Language (FML) at the higher level between intent planning and behavior
planning (Heylen et al. 2008). As illustrated in Fig. 7.2, the framework divides the
overall behavior generation process into three sub-processes, starting with com-
municative intent planning, going through behavior planning, and ending with actual
Fig. 7.2 The SAIBA framework for multimodal behavior, showing how the overall process
consists of three sub-processes at different levels of abstraction, starting with communication intent
and ending in actual realization in the agent’s embodiment. Source Vilhjálmsson (2009)
realization through the agent's embodiment. In other words, the framework specifies
multimodal generation of communicative behavior at a macro-scale, comprising
processing stages on three different levels: (1) planning of a communicative intent,
(2) planning of a multimodal realization of this intent, and (3) realization of the
planned behaviors. The FML interface describes the higher level of communicative
intent, which does not make any claims about the surface form of the behavior, and
the BML interface describes the lower level of physical behavior, which is realized by
an animation mechanism, instantiating intent as a particular multimodal realization
(Vilhjálmsson 2009). Moreover, in the SAIBA framework, the communicative function
is separated from the actual multimodal behavior that is used to express the com-
municative function (ter Maat and Heylen 2009). As illustrative examples, the
function ‘request feedback’ is conceptually separated from the act of raising the
eyebrows and breathing in to signal that feedback is wanted, and the function ‘turn
beginning’ is conceptually separated from the act of breaking eye contact. The
separation is accomplished by putting the tasks of the communicative functions and
signals in two different modules that should be capable of communicating the
relevant functions and signals to each other, a process that is performed using FML
and BML as behavior specification languages (ter Maat and Heylen 2009).
Fig. 7.3 Rules that map functions to behavior assume a certain context like the social situation
and culture. Source Vilhjálmsson (2009)
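A toy sketch of this separation may help; the FML- and BML-like tags and the mapping rules below are invented for illustration and do not follow the actual FML or BML specifications.

```python
# A toy sketch of the SAIBA separation: intent planning produces an FML-like
# description, behavior planning maps it to BML-like behaviors, and a realizer
# would animate them. The tags and rules are illustrative, not the real specs.
def plan_intent(user_input: str) -> str:
    # Intent planner: decide *what* to communicate (FML level); the return
    # value here is a fixed placeholder rather than a real planning result.
    return '<fml><turn take="true"/><inform topic="greeting"/></fml>'

FML_TO_BML_RULES = {
    '<turn take="true"/>': '<gaze target="user"/><head nod="true"/>',
    '<inform topic="greeting"/>': '<speech>Hello!</speech><face smile="true"/>',
}

def plan_behavior(fml: str) -> str:
    # Behavior planner: map communicative functions to surface behaviors,
    # assuming a fixed (context-free) rule set for simplicity.
    bml = "".join(b for f, b in FML_TO_BML_RULES.items() if f in fml)
    return f"<bml>{bml}</bml>"

print(plan_behavior(plan_intent("Hi there")))
```

The point of the two-level design, as Fig. 7.3 suggests, is that the rule table in the middle can be swapped for a context- and culture-sensitive one without touching the intent planner at all.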
This category relates to functional competence, which is one of the key pragmatic
competences and concerned with the use of spoken discourse in communication.
‘Conversational competence is not simply a matter of knowing which particular
functions (microfunctions)…are expressed by which language forms. Participants
are engaged in an interaction, in which each initiative leads to a response and moves
the interaction further on, according to its purpose, through a succession of stages
from opening exchanges to its final conclusion. Competent speakers have an
understanding of the process and skills in operating it. A macro-function is char-
acterized by its interactional structure. More complex situations may well have an
internal structure involving sequences of macro-functions, which in many cases are
ordered according to formal or informal patterns of social interaction (schemata).’
(Council of Europe 2000, p. 125). This category of functions is therefore contextual
as well as socioculturally situated. Initiating a conversation, for example, is greatly
dependent on the cultural conventions, roles of and relationship between the par-
ticipants, politeness conventions, register differences, place, time, and so on. This category
has, moreover, been given different names; the widely accepted ones are interactional
(Cassell et al. 2001), envelope (Kendon 1990), and management (Thorisson 1997).
The second category involves the actual conversational content that gets exchanged
or interchanged across a live communication channel. Exchanging content evolves of
the communication participants' own accord once the interaction is established.
This has much to do with discourse competence, a pragmatic competence which
involves the ability to arrange and sequence utterances to produce coherent stretches
of language, including control of the ordering of sentences with reference to
topic/focus, given/new, and natural sequencing (e.g., temporal), and organization and
management of discourse in terms of: thematic organization, coherence and cohe-
sion, rhetorical effectiveness, logical ordering, and so on (Council of Europe 2000).
This category also relates to framing in terms of the structuration of discourses,
socially dominant discourses. In this context, framing entails organizing patterns that
give meaning to a diverse array of utterances and direct the construction of spoken
discourse, in the sense of giving meaning and coherence to its content. The third
category is concerned with functions describing mental states and attitudes, which in
turn influence the way in which other functions give rise to their own independent
behavior. This category is needed to take care of ‘the various functions contributing
to visible behavior giving off information, without deliberate intent’, as ‘the
second category covers only deliberate exchange of information’ (Vilhjálmsson 2009,
p. 52). In terms of ECAs, cognitive context (intended or unintended meaning) has
proven to be crucial to the functioning of conversational agents. What constitutes a
communicative function and how it should be distinguished from contextual ele-
ments is a critical issue in the current debate about FML (Heylen 2005). Table 7.1
illustrates some examples of all three categories, respectively.
Table 7.1 Interaction functions, content functions, and mental states and attitude functions

Interaction functions
• Initiation/closing: react, recognize, initiate, salute-distant, salute-close, break-away, etc.
• Turn-taking: take-turn, want-turn, yield-turn, give-turn, keep-turn, assign-turn, ratify-turn, etc.
• Speech-act: inform, ask, request, etc.
• Grounding: request-ack, ack, repair, cancel, etc.

Content functions
• Discourse structure: topics and segments
• Rhetorical structure: elaborate, summarize, clarify, contrast, etc.
• Information structure: rheme, theme, given, new, etc.
• Propositions: any formal notation (e.g., ‘own(A,B)’)

Mental states and attitude functions
• Emotion: anger, disgust, fear, joy, sadness, surprise, etc.
• Interpersonal relation: framing, stance, etc.
• Cognitive processes: difficulty to plan or remember

Source Vilhjálmsson (2009)
and emotional and attitudinal functions with facial expressions and prosody. One
class of nonverbal behavior may serve different functions. For example, prosody
may organize higher levels of discourse and contribute to topic identification
processes and turn taking mechanisms, as well as express feelings and attitudes (see
Karpinski 2009). To determine what a speaker intends to do, assuming that all these
categories of functions (interactive, informative, communicative, cognitive, emotional,
and attitudinal) might be involved at a certain point in time (e.g., engaging
in a topic relating to one of the socially dominant discourses, such as AmI, in which
the speaker has intellectual standing or institutional belonging but finds it hard to
comprehend how some of its aspects relate to power relations, corporate power, and
unethical practices), requires a sound interpretation of the different nonverbal
communicative behaviors as well as of how they interrelate in a dynamic way. And to
carry out this task necessitates using a set of various, intertwined contextual ele-
ments, consistent circumstances that surround the respective conversation. This can
be an extremely challenging task for an ECA to perform. However, for whatever is
formally graspable and computationally feasible, analyzing nonverbal communi-
cation behaviors using (machine-understandable and -processable entities of)
context is important for an ECA to plan, decide, and execute relevant communi-
cative behaviors. Contextual elements such as dialog, sociocultural setting, and the
environment play a significant role in the interpretation of the multimodal com-
municative behavior. Contextual variables are necessary to disambiguate multi-
modal communicative behaviors, that is, to determine the actual multimodal
communicative behavior that is used to express an interaction function, a process
which entails using the context to know which communicative functions are
appropriate at a certain point of time, and this knowledge of context can be used to
determine what was planned or intended with a given signal (ter Maat and Heylen
2009). Given the focus of the SAIBA framework, namely the generation of multimodal
communicative behavior, the emphasis in the following section is on the category of
interaction or conversational functions and the associated nonverbal communicative
behaviors (especially facial gestures).
dialog or mingle socially with human users. Besides, when building believable
ECAs, the rules of human communication must be taken into account; they include,
in addition to natural modalities, common knowledge base, and communication
error and recovery schemes, the diverse, multiple contextual entities that surround
and shape an interaction between human users and computer systems. To create
interaction between humans and computer systems that is closer to natural inter-
action, it is necessary to include various implicit elements into the communication
process (see Schmidt 2005). Like context-aware applications, ECAs need to detect
the user’s multimodal behavior and its surrounding context, interpret and reason
about behavior-context information, determine the most appropriate multimodal
response, and act on it.
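The detect-interpret-decide-act loop just described can be sketched as a simple pipeline; every component below is a placeholder assumption rather than an actual ECA implementation.

```python
# A minimal sketch of the sense-interpret-decide-act loop of a context-aware
# ECA; all component functions are illustrative placeholders.
def detect_behavior(sensors):
    # Multimodal input: speech, face, gesture (stubbed as a fixed signal here).
    return {"signal": "head_nod"}

def interpret(behavior, context):
    # Reason about the behavior in its surrounding context.
    if context.get("asked_yes_no"):
        return {"function": "affirmative"}
    return {"function": "backchannel"}

def decide(intent):
    # Determine the most appropriate multimodal response.
    return {"respond": "acknowledge"} if intent["function"] else {}

def act(response):
    # Realize the multimodal output (stubbed as a print).
    print("agent action:", response)

context = {"asked_yes_no": True}
act(decide(interpret(detect_behavior(sensors=None), context)))
```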
There are several technical definitions that have been suggested in the literature
on context awareness, generating a cacophony that has led to an exasperating
confusion in the field of context-aware computing. The ECA community is not immune
to the difficulty of defining context. Accordingly, in relation to conversational
systems, the concept of context is operationalized in a simplified way compared to
what is understood as context in human communication within the academic disciplines
specializing in the subject matter. Based on the literature on ECA, context consists
of three entities: the dialog context, the environmental context, and the cultural
context (see Samtani et al. 2008). These contextual elements are assumed to sur-
round and influence the interpretation of the communicative behavior that is being
detected and analyzed as multimodal signals by an ECA system and also to
determine its communicative intent and behavior. It is important to note that in
relation to the SAIBA framework no detail is provided as to which features of each
component of context are implemented in conversational systems, nor is there an
indication of how these features are interrelated in their implementation, e.g., the
current topic and the level of tension between participants as part of dialog context
relate to a particular socially dominant discourse and cultural conventions as part of
cultural context.
and the cultural context. The dialog context considers the dialog history, the current
topic under discussion, the communicating characters, and the level of tension
between them; the environmental context takes into account such elements as the
location, local time, and the current setting; and the cultural context includes cultural
conventions and rules, e.g., information on how to express certain
communicative functions in a culturally appropriate way. In this sense, specific theoretical
models of pragmatics and sociolinguistics are taken into account in the imple-
mentation of context into ECAs. Context is crucial to the task of producing a more
natural communicative multimodal output in real-time. It helps an artificial agent to
interpret and reason more intelligently about multimodal communicative input and to
carry out actions in a knowledgeable manner. In their work, Agabra et al. (1997)
demonstrate that using context is useful in expert domains and conclude that
contextual knowledge is essential to all knowledge-based systems. However, a
complete model of the context related to nonverbal behavior seems at the moment
unfeasible. One example of such theoretical models is Ekman and Friesen's (1969),
which encompasses classes of the consistent circumstances that surround and
influence the interpretation of nonverbal behavior other than the external environment,
external feedback, and the relationship of the nonverbal with the verbal behavior
(the classes currently under investigation in relation to SAIBA), namely awareness,
intentionality, and the type of information conveyed.
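The three context entities attributed above to the ECA literature can be given a minimal structural rendering; the field names below are illustrative guesses at what each entity might hold, not a schema from any cited system.

```python
# A toy structure for the three context entities attributed to the ECA
# literature (Samtani et al. 2008): dialog, environmental, and cultural.
# All field names are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class DialogContext:
    history: list = field(default_factory=list)  # past dialog acts
    topic: str = ""                              # current topic under discussion
    tension: float = 0.0                         # level of tension between speakers

@dataclass
class EnvironmentalContext:
    location: str = ""
    local_time: str = ""
    setting: str = ""                            # the current setting

@dataclass
class CulturalContext:
    conventions: dict = field(default_factory=dict)  # e.g., greeting conventions

@dataclass
class ECAContext:
    dialog: DialogContext
    environment: EnvironmentalContext
    culture: CulturalContext

ctx = ECAContext(DialogContext(topic="AmI"),
                 EnvironmentalContext(location="living room"),
                 CulturalContext())
print(ctx.dialog.topic)
```

A fuller model would also have to interrelate these entities, e.g., linking the current topic and tension level in the dialog context to the conventions in the cultural context, which, as noted above, is left unspecified in relation to SAIBA.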
intended functions are mapped into visible behavior using the current context. This
process entails selecting the relevant communicative behavior to perform when a
conversational function is provided, a selection that should in turn be informed by the context.
Providing the conversational function occurs after mapping BML to FML with
regard to the human user interacting with the agent. To iterate, this process entails
analyzing and interpreting the multimodal input received from the user in the
current context, and then generating an abstract description of the user’s commu-
nicative intent upon which the agent can act.
Mapping the detected multimodal behavior to the intended communicative
functions (the interaction category of functions) is what ter Maat and Heylen (2009)
call ‘disambiguation problem’, and reliance on knowledge of context is aimed to
remove the potential ambiguities that may surround the class of nonverbal com-
municative behavior that tends to have different roles (e.g., conversational signals,
punctuators, manipulators, regulators, etc.) in relation to interaction functions, such
as facial gestures (e.g., eyebrow actions, head motions, head nods, eye gaze, gaze
directions, blinks, eye-contact, etc.). For example, eyebrow actions, head motions,
and eye blinks serve as conversational signals and punctuators; eye blinks (to wet
the eyes) and head nods also serve as manipulators; and eyebrow actions serve as
regulators as well (Pelachaud et al. 1996). However, relying on knowledge of
context to disambiguate nonverbal communicative behavior is necessary for the
effective performance of ECAs as to realizing their own autonomous communi-
cative behavior. In other words, how well the generated communicative behavior
serves the communicative function for the agent is primarily contingent on how
accurately the agent detects, and how effectively it interprets and reasons about, the
context surrounding the detected communicative signal to determine its underlying
meaning. ‘When trying to disambiguate a signal…It does not really matter
which signal was chosen to express a certain function, the important part is to find
the meaning behind the signals…One has to know the context to know which
functions are appropriate at a certain point and this knowledge can be used to
determine what was intended with a detected signal. In this case, the context can act
as a filter, making certain interpretations unlikely.’ (ter Maat and Heylen 2009,
p. 72).
Therefore, by using context, an agent can determine what is intended by a certain
communicative behavior and why it is performed. Thereby, context provides a key
to decoding the meaning of the nonverbal communicative behavior. Accordingly,
depending on the situation, a conversational agent can determine the actual
meaning of an eye gaze or a head nod (e.g., if a head nod is shown just after a direct
yes/no question, then it probably means yes). Both signals may occur in different
contexts with widely diverging meanings. Facial gestures have different roles in
conversational acts. An eye gaze from the speaker might be part of a behavior
complex that signals the intention of a turn offer (ten Bosch et al. 2004; Cassell
et al. 1999) or it may signify a request for feedback (Heylen 2005; Beavin Bavelas
et al. 2002; Nakano et al. 2003). Likewise, a head nod can have different meanings:
it can signify yes, serve as a backchannel, convey an intensification
(ter Maat and Heylen 2009), or mark disapproval.
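The context-as-filter idea can be sketched as follows; the candidate functions and filtering rules are invented examples in the spirit of ter Maat and Heylen (2009), not their actual system.

```python
# A minimal sketch of context-as-filter disambiguation: the same detected
# signal maps to different communicative functions depending on the dialog
# context. Candidate lists and rules are illustrative only.
CANDIDATES = {
    "head_nod": ["affirmative", "backchannel", "intensification"],
    "eye_gaze": ["turn_offer", "feedback_request"],
}

def disambiguate(signal: str, context: dict) -> str:
    functions = CANDIDATES.get(signal, [])
    # The context filters out unlikely interpretations of the signal.
    if signal == "head_nod" and context.get("last_act") == "yes_no_question":
        return "affirmative"   # a nod right after a yes/no question means yes
    if signal == "head_nod" and context.get("user_is_speaking"):
        return "backchannel"   # nodding while the other speaks: "go on"
    return functions[0] if functions else "unknown"

print(disambiguate("head_nod", {"last_act": "yes_no_question"}))  # affirmative
print(disambiguate("head_nod", {"user_is_speaking": True}))       # backchannel
```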
express different possible appropriate functions based on the context. Therefore, the
context has to be used to find pointer elements to solve the disambiguation problem.
For example, to iterate, a head nod can mean yes, it can serve as a backchannel, it
can convey an intensification, and so on. In relation to cultural context, Lyons
(1977) states that in certain cultures the nodding of the head with or without an
accompanying utterance is indicative of assent or agreement. In all, pointer ele-
ments can help pick out the most likely function, although there are always going to
be some exceptions. ‘It is also possible that the person or agent producing an
ambiguous signal intended to communicate all the different meanings. Take for
example a backchannel utterance… [T]his can be a continuer, an indicator of the
listener that he does not want the turn and that the speaker should continue
speaking. Another function that uses a backchannel utterance as a signal is an
acknowledgement … In a lot of contexts the difference between these two functions
is hardly visible, in a lot of situation both a continuer and an acknowledgement
would fit. But it is not unimaginable that a backchannel utterance means both at the
same time… When disambiguating signals, these types of ambiguities should be
kept in mind and it should be realized that sometimes the function of a signal
simply is not clear…, and sometimes a signal can have multiple meanings at the
same time.’
Context also influences the selection of modalities and thus communication channels
used to express communicative intents. This is most likely to have implications for
the interpretation of the meaning of spoken utterances as well as of the emotional messages
conveyed, in particular, through nonverbal behaviors. This occurs based on how
modalities and communication channels enabled by these modalities are combined
depending on the context. In this vein, information conveyed through one modality
or channel may be interpreted differently than if it were delivered by another modality
or channel or, rather, as a result of a set of combined modalities and channels.
According to Karpinski (2009, p. 167), ‘each modality may provide information on
its own that can be somehow interpreted in the absence of other modalities, and that
can influence the process of communication as well as the information state of the
addressee.’ He contends that it is no easy task ‘to separate the contributions to the
meaning provided through various modalities or channels and the final message is
not their simple “sum.” The information conveyed through one modality or channel
may be contrary to what is conveyed through the other; it may modify it or extend it
in many ways. Accordingly, the meaning of a multimodal utterance should be, in
principle, always regarded and analyzed as a whole, and not decomposed into the
meaning of speech, gestures, facial expressions and other possible components. For
example, a smile and words of appraisal or admiration may produce the impression
of being ironic in a certain context.’ (Ibid, p. 167). This provides insights into
understanding how meaning conveyed through combined modalities and channels
may be interpreted differently in conversational acts. It is of import to account for
such nuances of meaning and the underlying modalities and channels when building
conversational systems as believable human representatives. Karpinski (2009)
proposes a system of dialog acts called DiaGest along with a conceptual framework
that allows for independent labeling of the contributions provided by various
modalities and channels. Involving the study of the communicational relevance of
selected lexical, syntactic, prosodic and gestural phenomena, this project considers
both auditory and visual modalities and defines four channels: text, prosody, facial
expression, and gestures as major ways of providing quasi-independent modal
contributions. In this context, modal contribution is defined ‘as the information
provided through a given modality within the boundaries of a given dialog act.’
(Ibid, p. 167). This concept is introduced to alleviate the problem of annotating both
separate modalities and the meaning of entire utterances as to dialog acts. Also,
in this study, dialog acts are conceptualized as multidimensional entities composed,
or built on the basis, of modal contributions provided by the aforementioned
channels. In this sense, a single modal contribution may, depending on the context,
constitute the realization of a dialog act.
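A toy rendering of such modal contributions may clarify the idea; the channel labels and the irony rule below are illustrative and are not drawn from the DiaGest annotation scheme itself.

```python
# A toy rendering of per-channel "modal contributions" within one dialog act:
# the four channels each contribute a label, and the act is read as a whole
# rather than as the sum of its parts. All labels are invented.
from dataclasses import dataclass

@dataclass
class DialogAct:
    text: str      # textual channel
    prosody: str   # prosodic channel
    face: str      # facial-expression channel
    gesture: str   # gestural channel

    def interpret(self) -> str:
        # Channels may contradict each other; the whole decides the reading.
        if self.text == "praise" and self.face == "smile" and self.prosody == "flat":
            return "possibly ironic"  # cf. a smile plus words of appraisal
        return "literal"

act = DialogAct(text="praise", prosody="flat", face="smile", gesture="none")
print(act.interpret())  # -> possibly ironic
```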
The affective information conveyed through one modality or channel may also
be interpreted differently in the absence of, or be contrary to what is conveyed
through, the other modality. This is of relevance to context-aware systems when it
comes to the interpretation of the user’s emotional states, which are implicitly
captured using multi-sensory devices embedded in the so-called multimodal user
interfaces. Emotional states are inherently multimodal and thus their perception is
multi-channel based. The premise is that the interpretation of emotional information
captured by one modality may differ in the absence or degradation of the other
modality depending on the context, e.g., noise may affect auditory sensors (to
capture emotiveness and acoustical prosodic features of speech related to emotion
conveyance), and darkness may affect visual sensors (to capture facial expressions).
Consequently, the interpretation of the user’s emotional states may not be as sound
as when both modalities and the relevant channels are combined in the perception of the
contextual data/information, in the sense of whether it is completely or partially
captured as implicit input from verbal and nonverbal communication signals.
Hence, the meaning of the (rather multimodal) emotional state should be, in
essence, analyzed and interpreted as a whole, and not decomposed into the meaning
of emotiveness, prosodic features, facial expressions, and other possible compo-
nents. Accordingly, there is much work that needs to be done in this regard to
advance both context-aware and conversational systems as to detecting and inter-
preting more subtle shades and meaning of verbal and nonverbal behavior, par-
ticularly when different modalities and thus channels are to be combined.
Especially, according to Karpinski (2009, p. 167), each modality and channel ‘has
its particular properties and they vary in the range of “meanings” they may convey
and in the way they are typically employed. For example, the modality of gestural
expression is frequently sufficient for answering propositional questions, ordering
Fig. 7.4 Communicative function annotated in a real-time chat message helps produce an
animated avatar that augments the delivery. Source Vilhjálmsson (2009)
thereby generates a realization of that message that best supports the intended
communication. In this case, the Generation Module could deliver the message as if
it were being spoken by the avatar, if the sender also has an animated one, and
produce all the supporting nonverbal behavior according to the FML to BML
mapping rules. The author claims that the performance of avatars can even be
personalized or tailored based on the recipient’s local or cultural setting. This
implies that the mapping rules would be applied to the agent's embodiment, taking
into account the cultural context, and that the same communicative function
associated with the message will be instantiated or realized in two different ways,
using a combination of verbal and nonverbal signals that correspond to commu-
nication rules of that local setting.
The second application, as shown in Fig. 7.5, is a classic ECA where a human
user interacts with a graphical representation of the agent on a large wall-size
display. In this application, following the description of the multimodal input
received from the user using something like BML, a special Understanding Module
interprets the behavior in the current context and generates an abstract description
of the user’s communicative intent in FML specification. The agent’s decisions
about how to respond are made at the abstract level inside a central Decision
Module and are similarly described in FML. Finally, a Generation Module
applies mapping rules to produce BML (behavior realization) that renders the
agent's intended functions visible to the human user, using the current context
(situation and culture).
The author points out that creating an agent in this fashion has some advantages,
one of which is isolating the abstract decision making module, which can be quite
complex, from the surface form of behavior, both on the input side and the output
side. He adds that it may be easier to tailor the agent’s interaction to different
cultural settings or use different means for communication, e.g., phoning a user
Fig. 7.5 An embodied conversational agent architecture where the central decision module only
deals with an abstract representation of intent. Source Vilhjálmsson (2009)
Facial animation has recently been under vigorous investigation in the creation of
ECA systems with a graphical embodiment that can, by analyzing text input or
natural speech signals, drive a full facial animation. The goal is to build believable
virtual human representatives. Towards this end, it is critical for ECAs to implement
facial gestures, facial expressions, orofacial articulatory gestures (visible correlates
of prosody), and lip movements associated with visemes. Research shows that there
is much work to be done on speech-driven facial gestures and full facial animation
driven by nonverbal speech. As stated in Zoric et al. (2009), there is a considerable
literature (e.g., Pelachaud et al. 1996; Graf et al. 2002; Cassell 1989; Bui et al. 2004;
Smid et al. 2004) on systems that use text input to drive facial animation and that
incorporate facial and head movements as well as lip movements. However, pro-
ducing lip movements alone does not ensure the naturalness of the face. Indeed,
there exist many systems (e.g., Zoric 2005; Kshirsagar and Magnenat-Thalmann
2000; Lewis 1991; Huang and Chen 1998; McAllister et al. 1997) that, although
capable of producing correct lip synchronization from the speech signal, miss the
‘natural experience of the whole face because the rest of the face has a marble look’
(Zoric et al. 2009). Existing systems that attempt to generate facial gestures by only
analyzing the speech signal mainly concentrate on a particular gesture or the general
dynamics of the face, and the related state-of-the-art literature lacks methods for
automatically generating a complete set of facial gestures (Zoric et al. 2009). Zoric
et al. (2009)
mention a set of research works and elaborate briefly on how they expand on each
other in relation to the generation of head movements, based on recent evidence
demonstrating that the pitch contour (F0), as an audio feature, is correlated with head
motions. They additionally introduce other systems that use speech features to drive
general facial animation. Examples of works involving such systems include Brand
(1999), Gutierrez-Osuna et al. (2005), Costa et al. (2001), and Albrecht et al. (2002).
The first work learns the dynamics of real human faces during speech
using two-dimensional image processing techniques. This work incorporates
lip movements, co-articulation, and other speech-related facial animation.
The second work learns speech-based orofacial dynamics from video and generates
facial animation with realistic dynamics. In the third work, the authors propose a
method to map audio features to video, analyzing only eyebrow movements. In the
fourth work, the authors introduce a method for automatic generation of several facial
gestures from speech, including ‘head and eyebrow raising and lowering dependent
on the pitch; gaze direction, movement of eyelids and eyebrows, and frowning
during thinking and word search pauses; eye blinks and lip moistening as punctu-
ators and manipulators; random eye movement during normal speech.’
In a recent work dealing with ECAs that act as presenters, Zoric et al. (2009)
attempt to model correlation between (nonverbal) speech signals and occurrence of
facial gestures, namely head and eyebrow movements and blinking during speech
pauses; eye blinking as manipulators; and amplitude of facial gestures dependent on
speech intensity. To generate facial gestures, they extract the needed information
from speech prosody, through analyzing natural speech in real-time. Prosodic
features of speech are taken into consideration given the abundance of their func-
tions, including, to reiterate, expressing feelings and attitudes; contributing to topic
identification processes and turn taking mechanisms in conversational interactions;
and reflecting various features of utterance pertaining to statements, questions,
commands or other aspects of language that may not be grammatically and lexically
encoded in the spoken utterances. Moreover, their work, which aims to develop a
system for full facial animation driven by speech signals in real-time, is based on
their previously developed HUGE architecture for statistically based facial ges-
turing, and, as pointed out by the authors, extends their previous work on automatic
real-time lip synchronization, which takes the speech signal as input and carries out
audio-to-visual mapping to produce visemes. The components of the system, which
is based on the speech signal as a special case of the HUGE architecture, are illustrated
in Fig. 7.6. The adaptation of the HUGE architecture to the speech signal as inducement
involves the following issues: definition of audio states correlated with specific
speech signal features; implementation of the automatic audio state annotation and
classification module; and integration of the existing Lip Sync system.
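As a rough illustration of the idea of audio states driving facial gestures, the toy sketch below annotates speech frames with coarse states derived from pitch and energy and triggers gestures probabilistically. The thresholds, state labels, and gesture probabilities are invented for the example and do not reproduce the actual HUGE implementation.

```python
# Toy sketch: coarse "audio states" statistically associated with gestures.
import random

def classify_audio_state(pitch_hz: float, energy: float) -> str:
    """Annotate a speech frame with a coarse audio state (assumed thresholds)."""
    if energy < 0.05:
        return "pause"          # silence: candidate for blinks/head resets
    if pitch_hz > 220:
        return "high_pitch"     # raised pitch: candidate for eyebrow raise
    return "speech"             # plain speech: lip sync only

GESTURE_PROBABILITY = {         # illustrative; learned statistically in HUGE
    "pause": [("eye_blink", 0.4), ("head_nod", 0.2)],
    "high_pitch": [("eyebrow_raise", 0.6)],
    "speech": [],
}

def gestures_for_frame(pitch_hz: float, energy: float) -> list:
    """Sample the gestures triggered by one annotated frame."""
    state = classify_audio_state(pitch_hz, energy)
    return [g for g, p in GESTURE_PROBABILITY[state] if random.random() < p]

print(gestures_for_frame(pitch_hz=250.0, energy=0.3))  # may yield an eyebrow raise
```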
For further information on HUGE architecture, supervised learning method
using statistical modeling and reasoning, facial gesture generation and related
issues, and other technical aspects of the project, the reader is directed to the
original document. Figure 7.7 shows snapshots from a facial animation generated
from nonverbal speech signal.
The authors state that their system is still at an early stage, and, as part of future
research, they were planning to add head and eyebrow movements correlated with
Fig. 7.6 Universal architecture of HUGE system adapted to audio data as inducement. Source
Zoric et al. (2009)
Fig. 7.7 From left to right: neutral pose, eyebrow movement, head movement, and eye blink.
Source Zoric et al. (2009)
pitch changes, as well as eye gaze, since it contributes greatly to the naturalness of
the face. They moreover intend to integrate as many of the rules found in the
literature on facial gestures as possible. They state that to have a believable human
representative it is important to implement, in addition to facial gestures, verbal and
emotional displays. They also mention that evaluation is an important step in
building a believable virtual human. Indeed, it is crucial to carry out a detailed
evaluation of ECAs in terms of the underlying components, namely constructs,
models, methods and instantiations. See Chap. 3 for more detail on the evaluation
of computational artifacts and related challenges. In the context of ECAs, it is as
important to scrutinize evaluation methods for assessing the different components
underlying such artificial artifacts as it is to evaluate what these artifacts embody
and their instantiations. As noted by Tarjan (1987), metrics must also be scrutinized
through experimental analysis. This relates to meta-evaluation, the evaluation of
evaluations, whereby metrics define what the evaluation research tries to accomplish
with regard to assessing the evaluation methods designed for evaluating how well
ECAs can perform. Periodic scrutiny of these metrics remains necessary to enhance
such methods as the research evolves within the ECA community; varied evaluation
methods can be studied and compared.
Although the research on ECAs has made progress with regard to receiving,
interpreting, and responding to multimodal communicative behavior, it still faces
many challenges and open issues relating to system engineering and modeling that
need to be addressed and overcome in order to achieve the goal of building virtual
humans or online beings. These challenges and open issues include, and are not
limited to:
• paradigms that govern the assembly of ECA systems;
• principles and methodologies for engineering computational intelligence;
• general approaches to modeling, understanding, and generating multimodal
verbal and nonverbal communication behavior, with an interaction of data
analysis techniques and ontologies;
• techniques and models of the knowledge, representation, and run-time behavior
of ECA systems;
• the performance of ECA systems given that they need to act in a (nearly)
real-time fashion, immediately and proactively responding to spoken and ges-
tured signals;
• enabling proactivity in ECA systems through dynamic learning and real-time
and pre-programmed heuristics reasoning;
• evaluation techniques of ECA systems; and
• programming of conversational multimodal interfaces and prototyping software
systems.
The way cognitive, emotional, neurological, physiological, behavioral, and
social processes as aspects of human functioning are combined, synchronized, and
interrelated is, at the current stage of research, impossible to mimic and model in
computer systems. Human communication is inherently complex and manifold with
regard to the use, comprehension, and production of language. Advanced discov-
eries in the area of computational intelligence will be based on the combination of
knowledge from linguistics, psycholinguistics, neurolinguistics, cognitive linguis-
tics, pragmatics, and sociolinguistics, as well as the cultural dimension of speech-
accompanying facial, hand, and corporal gestures. It is crucial to get people together
from these fields or working on cross connections of AmI with these fields to pool
their knowledge and work collaboratively. Modelers in these fields must become
interested in conversational and dialog systems associated with AmI research as a
high-potential application area for their models. Otherwise, the state of the art in
related models, albeit noteworthy, will contribute little to the advancement of
conversational systems towards achieving their full potential. In fact,
research in ECA has just started to emphasize the importance of pooling knowledge
from different growing groups of researchers pertaining to various full agent sys-
tems in order to construct virtual humans capable of mingling socially with human
users. As pointed out by Vilhjálmsson (2009, pp. 48, 57), ‘Building a fully functional
and beautifully realized embodied conversational agent that is completely autono-
mous, is in fact a lot more work than a typical research group can handle alone. It
may take individual research groups more than a couple of years to put together all
the components of a basic system [technically speaking only], where many of the
components have to be built from scratch without being part of the core research
effort… Like in all good conspiracy plots, a plan to make this possible is already
underway, namely the construction of a common framework for multimodal
behavior generation that allows the researchers to pool their efforts and speed up
construction of whole multimodal interaction systems.’
Linguistic subareas such as computational linguistics, psycholinguistics, and
neurolinguistics have contributed significantly to the design and development of
current conversational and dialog act systems. For example, computational lin-
guistics has provided knowledge and techniques for computer simulation of
grammatical models for the generation and parsing of sentences and computational
semantics, including defining suitable logics for linguistic meaning representations
and reasoning. However, research in the area of computational pragmatics and
computational sociolinguistics is still in its infancy, and therefore there is much
work to be done to implement pragmatic and sociolinguistic capabilities (compe-
tences) into conversational systems. Modeling pragmatic and sociolinguistic com-
ponents of language into artificial conversational systems is associated with
enormous challenges. As mentioned earlier, a few research institutions are currently
carrying out research within the areas of computational pragmatics and computa-
tional sociolinguistics—computational modeling of interactive systems in terms of
dialog acts, intention recognition/pragmatics, and interpretation and generation of
multimodal communicative behavior in different sociolinguistic contexts and based
on different pragmatic situations. Most work in building conversational systems is
becoming increasingly interdisciplinary in nature, involving knowledge from across
the fields of linguistics, psycholinguistics, neurolinguistics, computational linguis-
tics, computational pragmatics, computational sociolinguistics, cognitive science,
and speech-accompanying facial and hand gestures. In particular, taking into
account sociocultural and situational contexts in understanding and conveying
meaning, that is, the way in which such contexts contribute to meaning, is of high
importance to building successful ECA systems. In other words, sociolinguistic
and pragmatic components are critical in order to create AmI systems that can
engage in an intelligent dialog or mingle socially with human users. However,
formally capturing such dimensions in natural language modeling is no easy task.
References
Albas DC, McCluskey KW, Albas CA (1976) Perception of the emotional content of speech: a
comparison of two Canadian groups. J Cross Cult Psychol 7:481–490
Albrecht I, Haber J, Seidel H (2002) Automatic generation of non-verbal facial expressions from
speech. In: Proceedings of computer graphics international (CGI2002), pp 283–293
Andersen PA (2004) The complete idiot’s guide to body language. Alpha Publishing, Indianapolis
Andersen P (2007) Nonverbal communication: forms and functions. Waveland Press, Long Grove
Angus D, Smith A, Wiles J (2012) Conceptual recurrence plots: revealing patterns in human
discourse. IEEE Trans Visual Comput Graphics 18(6):988–997
Arbib MA (2003) The evolving mirror system: a neural basis for language readiness. In:
Christiansen M, Kirby S (eds) Language evolution: the states of the art. Oxford University
Press, Oxford, pp 182–200
Arbib MA (2005) From monkey-like action recognition to human language: an evolutionary
framework for neurolinguistics. Behav Brain Sci 28(2):105–124
Argyle M (1988) Bodily communication. International Universities Press, Madison
Argyle M, Cook M (1976) Gaze and mutual gaze. Cambridge University Press, Cambridge
Argyle M, Ingham R (1972) Gaze, mutual gaze, and proximity. Semiotica 6:32–49
Argyle M, Ingham R, Alkema F, McCallin M (1973) The different functions of gaze. Semiotica
7:19–32
Bahl LR, Baker JK, Cohen PS, Jelinek F, Lewis BL, Mercer RL (1978) Recognition of a
continuously read natural corpus. In: Proceedings of the IEEE international conference on
acoustics, speech and signal processing, Tulsa, Oklahoma, pp 422–424
Banich MT (1997) Breakdown of executive function and goal-directed behavior. In: Banich MT
(ed) Neuropsychology: the neural bases of mental function. Houghton Mifflin Company,
Boston, MA, pp 369–390
Bänninger-Huber E (1992) Prototypical affective microsequences in psychotherapeutic interactions.
Psychother Res 2:291–306
Beattie G (1978) Sequential patterns of speech and gaze in dialogue. Semiotica 23:29–52
Beattie GA (1981) A further investigation of the cognitive interference hypothesis of gaze patterns.
Br J Soc Psychol 20(4):243–248
Beavin Bavelas J, Coates L, Johnson T (2002) Listener responses as a collaborative process: the
role of gaze. J Commun 52:566–580
Benoît C, Mohamadi T, Kandel S (1994) Effects of phonetic context on audio-visual intelligibility
of French. J Speech Hear Res 37:1195–1203
Beskow J, Granström B, House D (2006) Visual correlates to prominence in several expressive
modes. In: Proceedings of interspeech 2006—ICSLP, Pittsburg, pp 1272–1275
Binnie CA, Montgomery AA, Jackson PL (1974) Auditory and visual contributions to the
perception of consonants. J Speech Hear Res 17(4):619–630
Bledsoe WW, Browning I (1959) Pattern recognition and reading by machine. Papers presented at
the eastern joint IRE-AIEE-ACM computer conference on—IRE-AIEE-ACM’59 (Eastern),
ACM Press, New York, pp 225–232, 1–3 Dec 1959
Boë LJ, Vallée N, Schwartz JL (2000) Les tendances des structures phonologiques: le poids de la
forme sur la substance. In: Escudier P, Schwartz JL (eds) La parole, des modèles cognitifs aux
machines communicantes—I. Fondements, Hermes, Paris, pp 283–323
Brand M (1999) Voice puppetry. In: Proceedings of SIGGRAPH 1999, pp 21–28
Bucholtz M, Hall K (2005) Identity and interaction: a sociocultural linguistic approach. Discourse
Stud 7(4–5):585–614
Bui TD, Heylen D, Nijholt A (2004) Combination of facial movements on a 3D talking head. In:
Proceedings of computer graphics international
Bull PE (1987) Posture and gesture. Pergamon Press, Oxford
Burgoon JK, Buller DB, Woodall WG (1996) Nonverbal communication: the unspoken dialogue.
McGraw-Hill, New York
Burr V (1995) An introduction to social constructivism. Sage, London
Canale M, Swain M (1980) Theoretical bases of communicative approaches to second language
teaching and testing. Appl Linguist 1:1–47
Guerrero LK, DeVito JA, Hecht ML (eds) (1999) The nonverbal communication reader. Waveland
Press, Long Grove, Illinois
Gumperz J (1968) The speech community. In: International encyclopedia of the social sciences.
Macmillan, London, pp 381–386. Reprinted in: Giglioli PP (ed) Language and social context.
Penguin, London, 1972, p 220
Gumperz J, Cook-Gumperz J (2008) Studying language, culture, and society: sociolinguistics or
linguistic anthropology? J Sociolinguistics 12(4):532–545
Gunes H, Piccardi M (2005) Automatic visual recognition of face and body action units. In:
Proceedings of the 3rd international conference on information technology and applications,
Sydney, pp 668–673
Gutierrez-Osuna R, Kakumanu PK, Esposito A, Garcia ON, Bojorquez A, Castillo JL, Rudomin I
(2005) Speech-driven facial animation with realistic dynamics. IEEE Trans Multimedia 7(1)
Hall TA (2001) Phonological representations and phonetic implementation of distinctive features.
Mouton de Gruyter, Berlin and New York
Halle M (1983) On distinctive features and their articulatory implementation. Nat Lang Linguist
Theory 1:91–105
Halliday MAK, Hasan R (1976) Cohesion in English. Longman Publication Group, London
Hanna JL (1987) To Dance is human: a theory of nonverbal communication. University of
Chicago Press, Chicago
Hargie O, Dickson D (2004) Skilled interpersonal communication: research, theory and practice.
Routledge, Hove
Hayes PJ, Reddy RD (1983) Steps toward graceful interaction in spoken and written man-machine
communication. Int J Man Mach Stud 19(3):231–284
Heylen D (2005) Challenges ahead: head movements and other social acts in conversations. In:
Halle L, Wallis P, Woods S, Marsella S, Pelachaud C, Heylen D (eds) AISB 2005, Social
intelligence and interaction in animals, robots and agents. The Society for the Study of
Artificial Intelligence and the Simulation of Behavior, Hatfield, pp 45–52
Heylen D, Kopp S, Marsella S, Pelachaud C, Vilhjálmsson H (2008) The next step towards a
functional markup language. In: Proceedings of Intelligent Virtual Agents. Springer,
Heidelberg
Holden G (2004) The origin of speech. Science 303:1316–1319
Hollender D (1980) Interference between a vocal and a manual response to the same stimulus. In:
Stelmach G, Requin J (eds) Tutorials in motor behavior. North-Holland, Amsterdam, pp 421–432
Honda K (2000) Interactions between vowel articulation and F0 control. In: Fujimura BDJO,
Palek B (eds) Proceedings of linguistics and phonetics: item order in language and speech
(LP’98)
Huang FJ, Chen T (1998) Real-time lip-synch face animation driven by human voice. In: IEEE
workshop on multimedia signal processing, Los Angeles, California
Hymes D (1971) Competence and performance in linguistic theory. In: Language acquisition:
models and methods, pp 3–28
Hymes D (2000) On communicative competence. In: Duranti A (ed.) Linguistic anthropology:
a reader. Blackwell, Malden, pp 53–73
Iverson J, Thelen E (1999) Hand, mouth, and brain: the dynamic emergence of speech and gesture.
J Consciousness Stud 6:19–40
Iverson J, Thelen E (2003) The hand leads the mouth in ontogenesis too. Behav Brain Sci
26(2):225–226
Jacko A, Sears A (eds) (2003) The human-computer interaction handbook: fundamentals, evolving
technologies, and emerging applications. Lawrence Erlbaum Associates, Hillsdale
Jakobson R, Fant G, Halle M (1976) Preliminaries to speech analysis: the distinctive features and
their correlates. MIT Press, Cambridge
Johnson FL (1989) Women’s culture and communication: an analytical perspective. In: Lont CM,
Friedley SA (eds) Beyond boundaries: sex and gender diversity in communication. George
Mason University Press, Fairfax, pp 301–316
Kaiser S, Wehrle T (2001) Facial expressions as indicator of appraisal processes. In: Scherer KR,
Schorr A, Johnstone T (eds) Appraisal theories of emotions: theories, methods, research.
Oxford University Press, New York, pp 285–300
Kapur A, Kapur A, Virji-Babul N, Tzanetakis G, Driessen PF (2005) Gesture-based affective
computing on motion capture data. In: Proceedings of the 1st international conference on
affective computing and intelligent interaction, Beijing, pp 1–7
Karpinski M (2009) From speech and gestures to dialogue acts. In: Esposito A, Hussain A,
Marinaro M, Martone R (eds) Multimodal signals: cognitive and algorithmic issues. Springer,
Berlin, pp 164–169
Kendon A (1967) Some functions of gaze direction in social interaction. Acta Psychol 26:1–47
Kendon A (1980) Gesticulation and speech: two aspects of the process of utterance. In: Key MR
(ed) The relationship of verbal and nonverbal communication. Mouton, The Hague, pp 207–227
Kendon A (1990) Conducting interaction: patterns of behavior in focused encounters. Cambridge
University Press, New York
Kendon A (1997) Gesture. Annu Rev Anthropol 26:109–128
Kendon A (2004) Gesture: visible action as utterance. Cambridge University Press, Cambridge
Kingston J (2007) The phonetics-phonology interface. In: DeLacy P (ed) The handbook of
phonology. Cambridge University Press, Cambridge, pp 253–280
Kita S (ed) (2003) Pointing: where language, culture, and cognition meet. Lawrence Erlbaum
Associates, Hillsdale
Kleck R, Nuessle W (1968) Congruence between the indicative and communicative functions of
eye-contact in interpersonal relations. Br J Soc Clin Psychol 7:241–246
Knapp ML, Hall JA (1997) Nonverbal communication in human interaction. Harcourt Brace, New
York
Knapp ML, Hall JA (2007) Nonverbal communication in human Interaction. Wadsworth, Thomas
Learning
Koike D (1989) Pragmatic competence and adult L2 acquisition: speech acts in interlanguage. The
Modern Language Journal 73(3):279–289
Kopp S, Krenn B, Marsella SC, Marshall AN, Pelachaud C, Pirker H, Thórisson KR, Vilhjálmsson
HH (2006) Towards a common framework for multimodal generation: the behavior markup
language. In: Gratch J, Young M, Aylett RS, Ballin D, Olivier P (eds) IVA 2006, LNCS, vol
4133. Springer, Heidelberg, pp 205–217
Kroy M (1974) The conscience, a structural theory. Keter Press Enterprise, Israel
Kshirsagar S, Magnenat-Thalmann N (2000) Lip synchronization using linear predictive analysis.
In: Proceedings of IEEE international conference on multimedia and exposition, New York
Langacker RW (1987) Foundations of cognitive grammar, theoretical prerequisites, vol 1. Stanford
University Press, Stanford
Langacker RW (1991) Foundations of cognitive grammar, descriptive application, vol 2. Stanford
University Press, Stanford
Langacker RW (2008) Cognitive grammar: a basic introduction. Oxford University Press, New
York
Lass R (1998) Phonology: an introduction to basic concepts. Cambridge University Press,
Cambridge
Lee SP, Badler JB, Badler NI (2002) Eyes alive. In: Proceedings of the 29th annual conference
on computer graphics and interactive techniques 2002, ACM Press, New York, pp 637–644
Leech G (1983) Principles of pragmatics. Longman, London
Levelt WJM, Richardson G, Heij WL (1985) Pointing and voicing in deictic expressions. J Mem
Lang 24:133–164
Lewis J (1991) Automated lip-sync: background and techniques. J Visual Comput Animation
2:118–122
Lippi-Green R (1997) The standard language myth. English with an accent: language,
ideology, and discrimination in the United States. Routledge, London, pp 53–62
Littlejohn SW, Foss KA (2005) Theories of human communication. Thomson Wadsworth,
Belmont
Samtani P, Valente A, Johnson WL (2008) Applying the SAIBA framework to the tactical
language and culture training system. In: Parkes P, Parsons M (eds) The 7th international
conference on autonomous agents and multiagent systems (AAMAS 2008), Estoril, Portugal
Scherer KR (1992) What does facial expression express? In: Strongman K (ed) International
review of studies on emotion, vol 2, pp 139–165
Scherer KR (1994) Plato’s legacy: relationships between cognition, emotion, and motivation,
University of Geneva
Schmidt A (2005) Interactive context-aware systems interacting with ambient intelligence. In:
Riva G, Vatalaro F, Davide F, Alcañiz M (eds) Ambient intelligence: the evolution of
technology, communication and cognition towards the future of human-computer interaction.
IOS Press, Amsterdam, pp 159–178
Schwartz JL (2004) La parole multisensorielle: Plaidoyer, problèmes, perspective. Actes des
XXVes Journées d’Etude sur la Parole JEP 2004, pp xi–xviii
Schwartz JL, Robert-Ribes J, Escudier P (1998) Ten years after summerfield: a taxonomy of
models for audiovisual fusion in speech perception. In: Campbell R, Dodd BJ, Burnham D
(eds) Hearing by eye II: advances in the psychology of speech reading and auditory-visual
speech. Psychology Press, Hove, pp 85–108
Schwartz JL, Berthommier F, Savariaux C (2004) Seeing to hear better: evidence for early
audio-visual interactions in speech identification. Cognition 93:B69–B78
Segerstrale U, Molnar P (eds) (1997) Nonverbal communication: where Nature meets culture.
Lawrence Erlbaum Associates, Mahwah
Short JA, Williams E, Christie B (1976) The social psychology of telecommunications. Wiley,
London
Siegman AW, Feldstein S (eds) (1987) Nonverbal behavior and communication. Lawrence
Erlbaum Associates, Hillsdale
Smid K, Pandzic IS, Radman V (2004) Autonomous speaker agent. In: Computer animation and
social agents conference CASA 2004, Geneva, Switzerland
Sperber D, Wilson D (1986) Relevance: communication and cognition. Blackwell, Oxford
Stemmer B, Whitaker HA (1998) Handbook of neurolinguistics. Academic Press, San Diego, CA
Stetson RH (1951) Motor phonetics: a study of speech movements in action. North-Holland,
Amsterdam
Sumby WH, Pollack I (1954) Visual contribution to speech intelligibility in noise. J Acoust Soc
Am 26(2):212–215
Summerfield AQ (1979) Use of visual information for phonetic perception. Phonetica 36:314–331
Summerfield Q (1987) Comprehensive account of audio-visual speech perception. In: Dodd B,
Campbell R (eds) Hearing by eye: the psychology of lip-reading. Lawrence Erlbaum
Associates, Hillsdale, pp 3–51
Takimoto M (2008) The effects of deductive and inductive instruction on the development of
language learners’ pragmatic competence. Mod Lang J 92(3):369–386
Tarjan RE (1987) Algorithm design. Commun ACM 30(3):205–212
ten Bosch L, Oostdijk N, de Ruiter JP (2004) Durational aspects of turn-taking in spontaneous
face-to-face and telephone dialogues. In Sojka P, Kopecek I, Pala K (eds) TSD 2004, LNCS,
vol 3206. Springer, Heidelberg, pp 563–570
ter Maat M, Heylen D (2009) Using context to disambiguate communicative signals. In:
Esposito A, Hussain A, Marinaro M, Martone R (eds) Multimodal signals, LNAI 5398.
Springer, Berlin, pp 164–169
Thorisson KG (1997) An embodied humanoid capable of real-time multimodal dialogue with people.
In: The 1st international conference on autonomous agents, ACM, New York, pp 536–537
Truss L (2003) Eats, shoots and leaves—the zero tolerance approach to punctuation. Profile Books
Ltd, London
Turing AM (1950) Computing machinery and intelligence. Mind 59(236):433–460
van Hoek K (2001) Cognitive linguistics. In: Wilson RA, Keil FC (eds) The MIT encyclopedia of
the cognitive sciences
8.1 Introduction
AmI aims to take the emotional dimension of users into account when designing
applications and environments. One of the cornerstones of AmI is the adaptive and
responsive behavior of systems to the user’s emotional states and emotions,
respectively. Technology designs, which can touch humans in sensible ways, are
essential in addressing affective needs and ensuring pleasant and satisfying user
interaction experiences. In recent years there has thus been a rising tendency in AI
and AmI to enhance HCI by humanizing computers, making them tactful, sympa-
thetic, and caring in relation to the feelings of their human users. One of the current issues
in AI is to create methods for efficient processing (in-depth, human-like analysis) of
emotional states or emotions in humans. Accordingly, a number of frameworks that
integrate affective computing (a research area in AI) and AmI have recently been
developed and applied across a range of domains. Including the affective computing
paradigm within AmI is an interesting approach; it contributes to affective
context-aware and emotion-aware systems. Therefore, AmI researchers are explor-
ing human emotions and emotional intelligence (as abilities related to emotion) and
advancing research on emotion-aware and affective context-aware technology, by
amalgamating fundamental theoretical models of emotion, emotion-aware HCI, and
affective context-aware HCI. The importance and implication of this research
emanates from its potential to enhance the quality of people’s lives. The premise is that
affective or emotion-aware applications can support users in their daily activities and
influence their emotions in a positive way, by producing emotional responses that
have positive impact on the users’ emotion and help them to improve their emotional
intelligence, i.e., abilities to understand, evaluate, and manage their emotions and
those of others, as well as to integrate emotions to facilitate their cognitive activities
or task performance. Another interesting related aspect of AmI is the system feature
of social intelligence. As AmI is envisioned to become an essential part of people’s
social life, AmI systems should support social processes of human users and be
8.2 Emotion
The scientific study of emotion (nonverbal aspects) dates back to the late 1800s—
with Darwin’s (1872) earliest and most widely recognized work on emotional
expressions in humans. Emotion has been extensively researched and widely dis-
cussed. It has been an important topic of study throughout most of the history of
psychology (Lazarus 1991). However, after more than a century of scientific
research and theory development, there is sharp disagreement on which traits
define the phenomenon of human emotion. There is still no definitive definition of
emotion. The term is used inconsistently, and dictionary definitions of many terms
associated with the emotional system demonstrate how difficult it is to clearly
articulate what is meant by emotion. Different scholars have perceived it differently
in terms of what dimensions it precisely consists of. Scientists find it
very difficult to agree on the definition of emotion, although there is some con-
sensus that emotions are constituted by different components (Kleinginna and
Kleinginna 1981). There is indeed no way to completely describe an emotion by
knowing some of its components. Nevertheless, some psychologists have attempted
to converge on some key common aspects of emotions or rather the emotional
complex. In general, emotion can be described as a complex, multidimensional
experience of an individual’s state of mind triggered by both external influences as
well as internal changes. In psychology, emotion often refers to complex, subjective
experiences involving many components, including cognitive, arousal, expressive,
organizing, and physical, as well as highly subjective meanings. Emotions are
induced affective states (Russell 2003) that typically arise as reactions to important
situational events in one’s environment (Reeve 2005). They arise spontaneously in
response to a stimulus event and biochemical changes and are accompanied by
(psycho)physiological changes, e.g., increased heartbeat and outward manifestation
(external expression). With different degrees of intensity, individuals often behave
in certain ways as a direct result of their emotional state; hence, behavior is con-
sidered to be essential to emotion.
Emotion is very complex. This complexity is manifested in what it involves as
components and as a program in the brain. Emotions are biologically regulated by the
executive functions of the prefrontal cortex and involve important interactions
between several brain areas, including the limbic system and the cerebral cortex, which
has multiple connections with the hypothalamus, thalamus, amygdala, and other limbic
system structures (Passer and Smith 2006). Neural structures involved in the cog-
nitive process of emotion operate biochemically, involving various neurotransmitter
substances that activate the emotional programs residing in the brain (Ibid).
Furthermore, emotion involves such factors as personality, motivation, mood,
temperament, and disposition in the sense of a state of readiness or a tendency to
behave in a specific way. It is a transient state of mind encompassing various
dynamic emotional processes evoked by experiencing (perceiving) different
sensations as a means to cope with the environment. Keltner and Haidt (1999)
describe emotions as dynamic processes that mediate the organism’s relation to a
continually changing social environment. Emotions orchestrate how we react
adaptively to the external environment, especially to the important events in our
lives. Specifically, emotional processes entail establishing, maintaining, or dis-
rupting the relation between the organism and the environment on matters of central
relevance to the individual (Campos et al. 1989). Emotions are thus strategies by
which individuals engage with the world (Solomon 1993).
arousal, emphasizes that all emotional responses require some sort of appraisal,
whether we are aware of it or not (Passer and Smith 2006). It can be described as
the intensity of physiological arousal that tells us how strongly we are feeling
something, such as fear or frustration or some other emotion, but it is the situational
cues telling us which feeling we are having that provide the information needed to
label that arousal (Ibid). Overall, appraisal theorists following a componential
patterning approach, as proposed by Frijda (1986), Roseman (1984), or Scherer
(1984), share the assumption that (a) emotions are elicited by a cognitive evaluation
of antecedent stimuli (situations and events) and that (b) the patterning of the
reactions in the different response domains, including physiology, action tenden-
cies, expression, and (subjective) feeling, is determined by the outcome of this
evaluation process. Scherer (1994) argues that a componential patterning approach
is suitable for understanding the complex interactions between various factors in the
dynamic unfolding of emotion. He also contends that ‘the apparent lack of
empirical evidence of component covariation in emotion is partly due to different
response characteristics of the systems concerned and the mixture of linear and
nonlinear systems’ and assures that concepts from nonlinear dynamic models are to
be adopted ‘to treat emotion as a turbulence in the flow of consciousness and make
use of catastrophe theory to predict sudden changes in the nature of the emotional
processes’ (Ibid).
Fig. 8.1 Example figure of SAM: the arousal dimension. Source Desmet (2002)
the focus of the emotional state (Scherer 1999). The appraisal theoretical model is
perhaps the most influential approach to emotion within psychology (Scherer
1999), but categorical models of emotions (see below for further discussion) remain
in frequent use in affective computing for practical reasons (Cearreta et al.
2007). Indeed, pragmatism and simplification in operationalizing and modeling
emotional states as fluid, complex concepts prevail in affective computing and AmI
alike. Regardless, if theoretical models of emotions are not taken comprehensively
into account, affective-aware AmI systems will never break through to the
mainstream as interactive systems. Thinking in computing has to step beyond
technological constraints and engineering perspectives: what is needed is precisely
what science requires but cannot yet measure.
People may misattribute the specific emotion types, but they rarely misattribute
their valence (Solomon 1993). One might, for example, confuse such emotions as
anger and frustration, or irritation and exasperation, but would be unlikely to
confuse happiness with sadness or admiration with detestation. Further to this point
and at the emotion classification level, there is no definitive taxonomy of emotions;
numerous taxonomies have been proposed. Common categorizations of emotions
include: negative versus positive emotions; basic versus complex emotions, primary
versus blended emotions, passive versus active, contextual versus non-contextual,
and so on. In addition, in terms of time occurrence, some emotions occur over a
period of seconds whereas others can last longer. There are a number of classifi-
cation systems of basic emotions compiled by a range of researchers (e.g., Ortony
and Turner 1990). Emotion classification concerns both verbal and nonverbal
communication behaviors, including facial expressions, gestures, and paralin-
guistic and emotive features of speech. However, the lack of standardization often
causes inconsistencies in emotion classification, particularly in facial expressions
and emotiveness, an issue that has implications for emotion modeling and impacts
emotion conceptualizations with regard to the recognition of affect displays used in
emotion computing, such as emotion-aware AmI and affective computing.
Many different disciplines have produced work on the subject of emotion, including
social science, human science, cognitive psychology, philosophy, linguistics,
nonverbal communication, neuroscience and its subfields social and affective
neuroscience, and so on. Studies of emotion within linguistics, nonverbal com-
munication, and social sciences are of particular relevance to emotion computing
technology—affective computing and context-aware computing. Studies in lin-
guistics investigate, among others, the expression of emotion through paralinguistic
features of speech and how emotion changes meaning to non-phonemic or prosodic
aspects, in addition to the expression of emotions through utterances. Beijer (2002)
describes an emotive utterance as any utterance in which the speaker’s emotional
involvement is expressed linguistically in a way that is informative for the listener. In non-
verbal communication, research is concerned with, among others, the expression of
emotion through facial and gestural behavior and the role of emotions in the
communication of messages. Emotion in relation to linguistics and nonverbal
communication is discussed in more detail in Chap. 7. Social sciences investigate
emotions for the role they play in social processes and interactions, and take up the
issue of emotion classification and emotion generation, among others. In relation to
emotion study in the social sciences, Darwin (1872) emphasized the nonver-
bal aspects of emotional expressions and hypothesized that emotions evolve via
natural selection and therefore have cross-culturally universal counterparts. Ekman
(1972) found evidence that humans share six basic emotions: fear, sadness, hap-
piness, anger, disgust, and surprise. From Freudian psychoanalytic perspective,
emotions are viewed as underlying forces and drives that directly influence
behavior (Freud 1975). From a cognitive perspective, emotions are about how we
perceive and appraise a stimulus event. Emotion requires thought, information
processing over perception, which leads to an appraisal that, in turn, leads to an
emotion (Cornelius 1996). Several theorists argue that evaluations or thoughts as a
cognitive activity is necessary for an emotion to occur (e.g., Frijda 1986; Scherer
et al. 2001; Ortony et al. 1988; Solomon 1993). Moreover, William James sees
emotions as ‘bodily changes’ arguing that emotional experience is largely due to the
experience of such changes (James 1884). This relates to somatic theories of
emotion that claim that bodily responses rather than judgments are essential to
emotions. Anthropological work claims that emotions are dependent on sociocul-
tural facts rather than ‘natural’ in humans, an argument which challenges the
Darwinian view of emotions as ‘natural’ in humans (Lewis and Haviland 1993;
Lutz 1988). Some anthropology studies, in addition to analyzing emotions by
contextualizing them in culture as the setting in which they are expressed when
seeking to explain emotional behavior, investigate the role of emotions in human
activities, a topic which is of relevance to the interaction between the user and
technology in relation to task performance. Indeed, HCI is emerging as a specialty
concern within, among other disciplines, sociology and anthropology in terms of
the interactions between technology and work as well as psychology in terms of the
application of theories of cognitive processes and the empirical analysis of user
behavior (ACM 2009). Moreover, within sociology, according to Lewis and
Haviland (1993), human emotions are viewed as ‘results from real, anticipated,
imagined, or recollected outcomes of social relations’. From the perspective of
sociology of emotions, people try to regulate and control their emotions to fit in
with the norms of the social situation, and everyday social interactions and situa-
tions are shaped by social discourses. Social constructionist worldviews posit
that emotions serve social functions and are culturally determined rather than
biologically fixed (as responses within the individual), and that they are emergent in
social interaction rather than a result of individual characteristics, biology, and evo-
lution (Plutchik and Kellerman 1980).
The term ‘emotional intelligence’ has been coined to describe attributes and skills
related to the concept of emotion (Koonce 1996). As such, it has recently gained
significant ground in the emerging field of affective computing and, more recently,
AmI. Emotional intelligence denotes the ability to perceive, assess, and manage
one’s emotions and others’. Salovey and Mayer (1990) define emotional intelli-
gence as ‘the ability to monitor one’s own and others’ feelings and emotions, to
discriminate among them and to use this information to guide one’s thinking and
actions’. According to Passer and Smith (2006), cognitive psychologists, emo-
tional intelligence means being aware of your emotions, and controlling and regulating your own
Affective computing is the branch of computer science and the area of AI that is
concerned with modeling emotions or simulating emotional processes into com-
puters or machines. It is a scientific area that works on the detection of and response
to user’s emotions (Picard 2000). Specifically, it deals with the study, design,
development, implementation, evaluation, and instantiation of systems that can
recognize, interpret, process, and act in response to emotions or emotional states.
This is to build computers that are able to convincingly emulate emotions or exhibit
human-like emotional capabilities. It is recognized that the inception of affective
With the aim to restore a proper balance between emotion and cognition in the
design of new technologies for addressing human (affective) needs, the MIT
affective computing team (Picard 1997; Zhou et al. 2007) carries out research in the
area of affective computing from a broad perspective, contributing to the develop-
ment of techniques for indirectly measuring mood, stress, and frustration through
natural interaction; techniques for enhancing self-awareness of affective
states and for selectively communicating them to others; and emotionally intelligent
systems, as well as pioneering studies on ethical issues in affective computing.
Notable projects cited in Zhou et al. (2007) include ERMIS,
HUMAINE, NECA, and SAFIRA. The prototype system ERMIS (Emotionally Rich
Man-machine Intelligent System) can interpret the user’s emotional states (e.g.,
interest, boredom, anger) from speech, facial expressions, and gestures.
The HUMAINE (Human–Machine Interaction Network on Emotion) project aims to
lay the foundations for emotional systems that can detect, register, model, under-
stand, and influence human emotional states. The aim of NECA project is to develop
a more sophisticated generation of conversational systems/agents, virtual humans,
which are capable of speaking and acting in a human-like fashion. Supporting
Affective Interactions for Real-time Applications (SAFIRA) project focuses on
developing techniques that support affective interactions. The MIT affective
Given the variety of systems being investigated in the area of affective computing,
there should be a lot more to its integration with AmI than just enhancing affective
context-aware systems. Indeed, AmI systems are capable of meeting needs and responding
intelligently to spoken or gestured wishes and desires without conscious mediation,
and these could even result in systems that are capable of engaging in intelligent
dialog (Punie 2003, p. 5). Hence, AmI systems should be able not only to auton-
omously adapt to the emotional state of the user, but also generate emotional
responses that elicit positive emotions by having an impact on the user’s emotions,
appear sensitive to the user, help the user to improve his/her emotional intelligence
skills, and even mingle socially with the user. In particular, the simulation of
emotional intelligence and human verbal and nonverbal communication in
computers is aimed at helping users to enhance different abilities associated with
emotion and at supporting social interaction processes. Conversational agents and
emotionally intelligent systems are both of interest to and a primary focus of affective
computing. Indeed, affective computing scholars and scientists are studying, in
addition to emotionally intelligent systems, a wide variety of technologies for
improving the emotional abilities of the user such as the self-awareness of the
emotional states and how to communicate them to others in a selective way. They
are also working on the development of advanced conversational agents, systems
which can interpret the user’s emotional state from speech and facial expressions
and gestures and can register, model, understand, and influence human emotional
states as well as support affective interactions.
parameters extracted from the speech waveform (related to pitch, speaking tempo,
voice quality, intonation, loudness and rhythm) can be useful in disambiguating
affective display, context still remains a determining factor in the process, especially if the
communication channel does not allow for the use of the textual component of the
linguistic message or is limited to transmission of lexical symbols that describe
emotional states.
The contextual appropriateness of emotions (whether transmitted by speech or
gestural means) is an initial step that is very crucial in order for a system to
understand and interpret emotions and thus provide emotional intelligence services.
While Mayer and Salovey (1997) argue that the ability to discriminate between
appropriate and inappropriate expressions of emotion is the key ability for inter-
preting and analyzing emotional states, Ptaszynski et al. (2009) conclude that
computing contextual appropriateness of emotional states is a key step towards a
full implementation of emotional intelligence in computers. Besides, emotions
should be perceived as context-sensitive engagements with the world, as demon-
strated by recent discoveries in the field of emotional intelligence (Ptaszynski et al.
2009). However, most research focuses on the development of technologies for
affective systems in AI and AmI as well as the design of such systems, but only a few
studies on contextual appropriateness of emotions and multimodal context-aware
affective interaction have been conducted. Furthermore, most of the behavioral
methods simply classify emotions into opposing pairs or focus only on simple
emotion recognition (Teixeira et al. 2008; Ptaszynski et al. 2009), ignoring the
complexity and the context reliance of emotions (Ptaszynski et al. 2009).
Nonetheless, there is a positive shift towards analyzing affective states
as emotion-specific rather than using methods that categorize emotions into simple
opposing pairs. This trend can be noticed in text mining and information extraction
approaches to emotion estimation (Tokuhisa et al. 2008). In all, understanding
users’ emotions requires accounting for context as a means to disambiguate and
interpret the meaning or intention of emotional states for a further affective com-
putational processing and relevant service delivery.
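As a minimal illustration of such context-based disambiguation, the sketch below fuses a (hypothetical) acoustic classifier posterior over emotions with a context-dependent prior in a naive Bayes-like manner; all labels and probabilities are assumed for the example.

```python
# Context fusion sketch: acoustic evidence alone is ambiguous, so weight it
# by a contextual prior before picking the most likely emotional state.

def disambiguate(acoustic_posterior: dict, context_prior: dict) -> str:
    """Fuse acoustic evidence with contextual expectation (naive Bayes style)."""
    scores = {e: p * context_prior.get(e, 0.0)
              for e, p in acoustic_posterior.items()}
    return max(scores, key=scores.get)

# High-arousal speech alone cannot separate anger from excited joy; knowing
# the situational context (e.g., the user just won a game) tips the balance.
acoustic = {"anger": 0.45, "joy": 0.40, "neutral": 0.15}
context = {"anger": 0.1, "joy": 0.7, "neutral": 0.2}   # assumed prior
print(disambiguate(acoustic, context))                 # -> "joy"
```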
As mentioned above, emotional states constitute one element of the user context that
a context-aware system should recognize in order to adapt its functionality to better
match user affective needs. A good context-aware system is one that can act in
response to the evaluation of the elements of the general context that are of central
concern to the user in an interrelated, dynamic fashion. With context gaining an
increased interest in affective computing, it becomes even more interesting to
integrate affective computing with context-aware computing. The role of affective
computing in context-aware computing is to equip context-aware applications with
the ability to understand and respond to the user’s needs according to the emotional
element of the user context. And speech, facial and corporal gestures have a great
Currently, there is a great variety of technologies that can be used for the design and
implementation of affective systems and affective context-aware systems. Here the
emphasis is on capture technologies and recognition techniques. Also, a classifi-
cation of studies on emotion detection and recognition is included. This is to
highlight the enabling role such technologies are playing in the implementation of
affective systems in terms of detecting or recognizing the emotional states of users
from their affective display. Externally expressed, affective display is considered
a reliable emotional channel, and includes vocal cues, facial cues, physiological
cues, gestures, action cues, etc. These channels are carriers of affective information,
which can be captured by affective systems for further computational interpretation
and processing for the delivery of a range of adaptive and responsive services.
would mitigate against their use in more sophisticated and critical applications, such
as conversational agents and affective systems, which rely heavily on the contextual
dimension of emotions. Such variations in recognition may rather be tolerable in
less critical applications that may use unimodal input such as facial expression. For
example, video games may alter some aspects of their content in response to the
viewer’s emotions, such as fear or anger, as inferred from their facial expressions.
neural network processing (see, e.g., Wimmer et al. 2009; Pantic and Rothkrantz
2000). See Chap. 4 for a detailed discussion on pattern recognition techniques
supported with illustrative examples relating to different types of context. Hand
gestures have been a common focus of body gesture detection methods (Pavlovic
et al. 1997). Body gesture refers to the position and movements of the body, and
many methods have been proposed to detect it (Aggarwal and Cai 1999). As an
illustrative example of affect display, facial expressions are addressed in a little more
detail in the next section as an affect display behavior and recognition method. But
before delving into this, it may be worth shedding light on the emotive function of
language—emotiveness—since this topic has not been covered so far, neither in relation
to conversational agents in the previous chapter nor in relation to affective systems.
may not have high confidence that it accurately recognizes, for some reason,
lexicon of all words describing emotional states, but their assessment of users’
affective display, mainly paralinguistic features and facial expression signals, is
more likely to yield a high-probability estimate of the current emotional state
of the user. However, speech remains the most precise tool for expressing complex
intentions (Ibid). In terms of affect recognition in speech, most research focused on
building computational models for the automatic recognition of affective expression
in speech investigates how acoustic (prosody) parameters extracted from the speech
waveform (related to voice quality, intensity, intonation, loudness and rhythm) can
help disambiguate the affect display without knowledge of the textual component of
the linguistic message.
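To make the notion of acoustic (prosody) parameters concrete, here is a compact, textbook-style sketch that estimates a frame-level F0 contour by autocorrelation and an RMS intensity contour from a raw waveform. It is an illustration under simplifying assumptions, not a production pitch tracker.

```python
# Simple prosodic descriptors from a waveform: F0 (autocorrelation) and RMS.
import numpy as np

def frame_f0(frame: np.ndarray, sr: int, fmin: int = 75, fmax: int = 400) -> float:
    """Estimate the F0 of one frame via the autocorrelation peak."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = sr // fmax, sr // fmin          # plausible pitch-period lags
    lag = lo + int(np.argmax(ac[lo:hi]))
    return sr / lag if ac[lag] > 0 else 0.0  # 0.0 marks an unvoiced frame

def prosody(signal: np.ndarray, sr: int, frame_len: float = 0.03) -> dict:
    """Summarize the F0 contour and RMS intensity over fixed-length frames."""
    n = int(sr * frame_len)
    frames = [signal[i:i + n] for i in range(0, len(signal) - n, n)]
    f0 = np.array([frame_f0(f, sr) for f in frames])
    rms = np.array([np.sqrt(np.mean(f ** 2)) for f in frames])
    voiced = f0[f0 > 0]
    return {"mean_f0": float(voiced.mean()) if voiced.size else 0.0,
            "f0_range": float(np.ptp(voiced)) if voiced.size else 0.0,
            "mean_intensity": float(rms.mean())}

# A synthetic 200 Hz tone stands in for recorded speech:
sr = 16000
t = np.linspace(0, 1, sr, endpoint=False)
print(prosody(np.sin(2 * np.pi * 200 * t), sr))
```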
Regarding the investigation of the structure of emotion in conversations, there
are various features that have been studied in conversation analysis (Karkkainen
2006; Wu 2004; Gardner 2001). In order to explore the structure of emotions in
English conversation, Zhou et al. (2007) study four features (see Table 8.1),
namely lexical choice, syntactic form, prosody, and sequential
positioning, which are facets that have been studied in conversation analysis by the
above authors. Álvarez et al. (2006) provide feature subset selection based on
evolutionary algorithms for automatic emotion recognition in spoken Spanish
language.
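For readers unfamiliar with evolutionary feature subset selection, the toy genetic-algorithm sketch below evolves a bit mask over acoustic features. It is written in the spirit of, but does not reproduce, Álvarez et al. (2006); the fitness function is a stand-in for the classifier-accuracy score a real system would compute on labeled emotional speech.

```python
# Toy genetic algorithm for feature subset selection (illustrative only).
import random

N_FEATURES = 12            # e.g., pitch, energy, tempo statistics (assumed)

def fitness(mask: list) -> float:
    """Stand-in score: reward informative features, penalize subset size."""
    informative = {0, 2, 5, 7}                  # assumption for the demo
    return sum(mask[i] for i in informative) - 0.1 * sum(mask)

def evolve(pop_size: int = 20, generations: int = 30, p_mut: float = 0.05) -> list:
    pop = [[random.randint(0, 1) for _ in range(N_FEATURES)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]          # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, N_FEATURES)           # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if random.random() < p_mut else g for g in child]
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

print(evolve())   # bit mask of selected acoustic features
```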
expressions happiness and fear are confused most often due to the similar muscle
activity around the mouth. This is also reflected by the Facial Action Coding System
(FACS), which describes the muscle activities within a human face (Ekman 1999).
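The following small sketch illustrates why such confusions arise: expressions that share action units (AUs), particularly in the mouth region, overlap in their FACS descriptions. The emotion-to-AU table is a simplified assumption based on commonly cited FACS mappings, not a complete specification.

```python
# Illustrative (simplified, assumed) emotion-to-action-unit table.
EMOTION_AUS = {
    "happiness": {6, 12, 25},      # cheek raiser, lip corner puller, lips part
    "fear": {1, 2, 4, 5, 20, 25},  # brow/lid actions, lip stretcher, lips part
    "sadness": {1, 4, 15},         # inner brow raiser, brow lowerer, lip depressor
}

def au_overlap(e1: str, e2: str) -> set:
    """Action units shared by two expressions; shared AUs invite confusion."""
    return EMOTION_AUS[e1] & EMOTION_AUS[e2]

# Shared mouth-region activity (here AU25) contributes to confusions:
print(au_overlap("happiness", "fear"))     # {25}
print(au_overlap("happiness", "sadness"))  # set()
```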
As an explicit affect display, facial expressions are highly informative about the
affective or emotional states of people. The face is so visible that conversational
participants can interpret a great deal from the faces of each other. Facial expressions
can be important for both speakers and listeners in the sense of allowing listeners to
infer speakers’ emotional stance towards their utterances and speakers to determine their
listeners’ reactions to what is being uttered or expressed. Facial cues can constitute
communicative acts, comparable to ‘speech acts’ directed at one or more interaction
partner (Bänninger-Huber 1992). Recognizing facial displays is one aspect of
natural HCI and one of the challenges in augmenting computer systems with
human–human (or human-like) interaction capabilities. Equipping systems with
facial expression recognition abilities is an attempt to create HCI applications that
are aimed to take the holistic nature of the human user into account—that is, to touch
humans in holistic and sensible ways, by considering human emotion, (expressive)
behavior and (cognitive) intention (for more detail on this dimension see next
chapter). This concerns the emerging and future affective systems in terms of
becoming more intuitive, aware, sensitive, adaptive, and responsive to the user.
Widespread applicability and comprehensive benefits motivate research on the
topic of natural interaction, one important feature of which is facial expression
recognition. Perceiving or being aware of human emotions via facial expressions
plays a significant role in determining the success of next-generation interactive
systems intended for different applications, e.g., computer-assisted or e-learning
systems, conversational agents, emotionally intelligent systems, emotional context-
aware systems, and emotion-aware AmI systems. The quality, success rate, and
acceptance of such applications, or combinations thereof, will rise significantly as
the technologies for their implementation, especially recognition and capture
techniques, evolve and advance. In a multidisciplinary work on automatic facial expression
interpretation, Lisetti and Schiano (2000) integrate human interaction, AI,
and cognitive science with an emphasis on pragmatics and cognition. Their work
provides a comprehensive overview of applications in emotion recognition. Also,
interdisciplinary research (interactional knowledge) crossing multiple disciplines
(including cognitive psychology, cognitive science, computer science, behavioral
science, communication behavior, and cultural studies) is necessary in order
to construct suitable, effective interaction methods and user interfaces and,
thus, successful and widely accepted (affective) interactive systems. Indeed, cultural
studies are very important when it comes to HCI design in all of its areas.
In terms of affective HCI, cultural variations are great, as different cultures may
assign different meanings to the same facial expression. For example, a smile can
be considered a friendly gesture in one culture while signaling embarrassment in
another. Hence, affective HCI, whether concerning affective and AmI systems or
conversational systems, should account for cultural variations as a key criterion
for building effective user interfaces. The implementation or instantiation of
technological systems in real-world environments may run counter to what
lab-based evaluation suggests about the performance of those technologies. In
fact, what is technically feasible and risk-free within the lab may have
implications in real life.
Considerable research is being carried out on the topic of facial displays, with a focus
on the relationship between facial expressions and gestures and emotional states,
within the field of affective, emotion-aware, and context-aware HCI. Automatic
recognition of human facial displays, expressions, and gestures has, particularly in
recent years, gained significant ground in natural HCI, i.e., the naturalistic user
interfaces used by affective, conversational, and AmI systems alike. The HCI
community is extensively investigating the potential of facial displays as a form of
implicit input for detecting the emotional states of users. To date, most research
within computing tends to center on recognizing and categorizing facial expressions
(see, e.g., Gunes and Piccardi 2005; Kapur et al. 2005). Recent research projects are
exploring how to track and detect facial movements corresponding to both lower
and upper facial features, with the hope of integrating state-of-the-art facial expression analysis
modules with new miniaturized (multi)sensors to reliably recognize different
emotions. A number of approaches to facial expression recognition have been
developed and applied to achieve real-time performance and provide robustness for
real-world applicability. Research is indeed focusing on building real-time systems
for facial expression recognition that run robustly in real-world environments,
with a view to the implementation of conversational and affective systems. Most of
the popular systems for facial expression recognition (e.g., Cohen et al. 2003; Sebe
et al. 2002; Tian et al. 2001; Pantic and Rothkrantz 2000; Edwards et al. 1998;
Cohn et al. 1999; Wimmer 2007; Wimmer et al. 2009) are built based on the
six universal facial expressions. Figure 8.2 illustrates one example of each of 'the
six universal facial expressions' (Ekman 1972, 1982) as they occur in Kanade et al.
(2000), a comprehensive database for facial expression analysis and automatic
face recognition.
In Ekman (1999), the Facial Action Coding System (FACS) describes the
muscle activities within a human face. Facial expressions are generated by
combinations of Action Units (AUs), which denote the motion of particular facial
regions and state the facial muscles concerned.

Fig. 8.2 The six universal facial expressions. Source: Kanade et al. (2000)

Based on principles of neurophysiology, anatomy, and biomechanics, motor
neurons supplying groups of muscle fibers with their innervation form motor units,
which are connected to the primary motor cortex of the brain via the pons, an area
that conveys the ability to move muscles independently and perform fine
movements. Theoretically, the fewer fibers there
are in each motor unit, the finer the degree of facial movement control. On the other
hand, extended systems such as the Emotional FACS (Friesen and Ekman 1982) denote
the relation between facial expressions and corresponding emotions. In an attempt
to expand his list of basic emotions, Ekman (1999) provides a range of positive and
negative emotions, not all of which are encoded in facial muscles, including
amusement, contempt, contentment, embarrassment, excitement, pride in achieve-
ment, relief, satisfaction, sensory pleasure, shame, and so on. Also, research shows
that some facial expressions can have several meanings at the same time, for they
normally have different functions and indicate different things. Research is actively
developing new approaches to address related issues in the area of facial
expression recognition in relation to affective HCI applications. 'Given the
multi-functionality of facial behavior and the fact that facial indicators of emotional
processes are often very subtle and change very rapidly…, we need approaches to
measure facial expressions objectively—with no connotation of meaning—on a
micro-analytic level. The Facial Action Coding System (FACS; Ekman and Friesen
1978) lends itself to this purpose; it allows the reliable coding of any facial action in
terms of the smallest visible unit of muscular activity (Action Units), each referred
to by a numerical code. As a consequence, coding is independent of prior
assumptions about prototypical emotion expressions. Using FACS, we can test
different hypotheses about linking facial expression to emotions’ (Kaiser and
Wehrle 2001, pp. 287–288).
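To make the FACS idea concrete, here is a toy lookup in the spirit of the emotional FACS extensions mentioned above: detected Action Units are matched against prototype AU sets per emotion. The prototype sets below are illustrative simplifications, not the official coding tables.

```python
# A toy EMFACS-style interpretation: observed AUs vs. prototype AU sets.
PROTOTYPES = {
    "happiness": {6, 12},           # cheek raiser + lip corner puller
    "sadness":   {1, 4, 15},
    "surprise":  {1, 2, 5, 26},
    "fear":      {1, 2, 4, 5, 20, 26},
    "anger":     {4, 5, 7, 23},
    "disgust":   {9, 15},           # nose wrinkler + lip corner depressor
}

def interpret(aus: set[int]) -> str:
    # Score each emotion by overlap between observed and prototypical AUs.
    def score(proto: set[int]) -> float:
        return len(aus & proto) / len(proto)
    best = max(PROTOTYPES, key=lambda e: score(PROTOTYPES[e]))
    return best if score(PROTOTYPES[best]) >= 0.5 else "unclear"

print(interpret({6, 12}))        # -> happiness
print(interpret({1, 2, 5, 26}))  # -> surprise
```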
Automatic facial expression recognition typically derives the expression
from the image and a facial expression database. It is important to extract
meaningful features in order to derive the facial expression visible from these features.
This task consists of various subtasks and involves a wide variety of techniques to
accomplish these subtasks, which generally include localizing facial features,
tracking them, and inferring the observable facial expressions. Several
state-of-the-art approaches into performing these subtasks could be found in the
literature (e.g., Chibelushi and Bourel 2003; Wimmer et al. 2009), some of which
will be referred to in this section. According to the survey of Pantic and Rothkrantz
(2000), the computational procedure of facial expression recognition involves three
phases: face detection, feature extraction, and facial expression classification
(happiness, anger, disgust, sadness, fear, surprise).
Phase 1: As with all computer vision tasks, different methods exist for performing face
detection as part of the overall procedure of facial expression recognition. The face
detection task can be executed automatically, as in Michel and El Kaliouby (2003)
and Cohn et al. (1999), or manually, by specifying the necessary information up front
so as to focus on the interpretation task itself, as in Cohen et al. (2003), Schweiger
et al. (2004), and Tian et al. (2001). However, according to Wimmer et al. (2009, p. 330), 'more
elaborate approaches make use of a fine grain face model, which has to be fitted
precisely to the contours of the visible face. As an advantage, the model-based
approach provides information about the relative location of the different facial
components and their deformation, which turns out to be useful for the subsequent
phases’.
Phase 2: Feature extraction is mainly concerned with the muscle activity of facial
expressions. Most approaches use the Facial Action Coding System (FACS).
Numbering over twenty, the muscles of facial expression allow a wide variety of
movements and convey a wide range of emotions (Gunes and Piccardi 2005; Kapur
et al. 2005) or emotional states. Specifically, muscle activity allows various
facial actions, depending on what expressive behavior is performed, and conveys a
wide range of emotions, which are characterized by a set of different shapes as they
reach the peak expression. 'Facial expressions consist of two important aspects: the
muscle activity while the expression is performed and the shape of the peak
expression', and the methods used in this phase tend 'to extract features that represent
one or both of these aspects' (Wimmer et al. 2009). When it comes to feature
extraction, approaches may slightly differ as to the number of feature points that are
to be extracted from the face, which depends on what area of the face is mostly the
focus as well as on the approach adopted. Within the face, Michel and El Kaliouby
(2003) extract the location of 22 feature points that are predominantly located
around the eyes and around the mouth. In their approach, they focus on facial
motion by manually specifying those feature points and determine their motion
between an image showing the neutral state of the face and another representing a
facial expression. In a similar approach, Cohn et al. (1998) use a hierarchical optical
flow approach called feature point tracking in order to determine the motion of 30
feature points. Schweiger et al. (2004) manually specify the region of the visible
face, while the approach of Wimmer et al. (2009) performs an automatic localization via
model-based image interpretation.
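The feature-point tracking just described can be sketched with pyramidal Lucas-Kanade optical flow, in the spirit of Cohn et al. (1998) and Michel and El Kaliouby (2003); detecting corners automatically here merely stands in for their manually specified points, and the file names are placeholders.

```python
# A sketch of feature-point tracking (Phase 2): points marked on a neutral
# frame are tracked into the peak-expression frame, and their displacements
# form the feature vector.
import cv2
import numpy as np

neutral = cv2.imread("neutral.png", cv2.IMREAD_GRAYSCALE)
peak = cv2.imread("peak.png", cv2.IMREAD_GRAYSCALE)

# E.g., 22 points around the eyes and mouth (Michel and El Kaliouby 2003);
# here good corners stand in for manual annotation.
p0 = cv2.goodFeaturesToTrack(neutral, maxCorners=22, qualityLevel=0.01,
                             minDistance=10)

# Pyramidal Lucas-Kanade flow from the neutral to the peak frame.
p1, status, _ = cv2.calcOpticalFlowPyrLK(neutral, peak, p0, None)

ok = status.ravel() == 1                       # keep successfully tracked points
displacement = (p1[ok] - p0[ok]).reshape(-1, 2)
print("motion vectors:", displacement)         # input to Phase 3 classification
```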
Phase 3: The last phase is concerned with the classification of facial expressions.
That is, it determines which one of the six facial expressions is derived or inferred
from the extracted features. Again, there are different approaches to facial
expression classification, clustered into supervised learning (e.g., HMMs, neural
networks, decision trees, support vector machines) and unsupervised learning (e.g.,
graphical models, multiple eigenspaces, variants of HMMs, Bayesian networks).
Michel and El Kaliouby (2003) train a support vector machine (SVM) to
determine one of the six facial expressions within the video sequences of the
comprehensive facial expression database developed by Kanade et al. (2000) for
facial expression analysis. This database, known as the Cohn-Kanade Facial
Expression database (CKFE-DB), contains 488 short image sequences of 97
different individuals performing the six universal facial expressions; each sequence
shows a neutral face at the beginning and then builds up to the peak expression. To
accomplish classification, Michel and El Kaliouby (2003) compare the first frame,
showing the neutral expression, to the last frame, showing the peak expression.
Basing their classification instead on supervised neural network learning, and in
order to extract the facial features, Schweiger et al. (2004) compute the optical flow
within six predefined regions of a human face. Other existing approaches follow
Ekman and Friesen's (1978) rules by first computing the visible action units (AUs)
and then inferring the facial expression.
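A toy version of Phase 3 in the style of Michel and El Kaliouby (2003) might look as follows: an SVM trained on per-point displacement vectors between the neutral and peak frames. The data below is synthetic; in their work it comes from the CKFE-DB.

```python
# Illustrative Phase 3: SVM classification of displacement feature vectors.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

LABELS = ["happiness", "sadness", "surprise", "fear", "anger", "disgust"]
rng = np.random.default_rng(1)

# 300 samples x (22 points * 2 coordinates) of synthetic displacements.
X = rng.normal(size=(300, 44))
y = rng.integers(len(LABELS), size=300)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = SVC(kernel="rbf").fit(X_tr, y_tr)
print("accuracy on synthetic data:", clf.score(X_te, y_te))
print("predicted:", LABELS[int(clf.predict(X_te[:1])[0])])
```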
However, the applications that have been developed so far are far from real-world
implementation. On the whole, the research in the field is still in its infancy.
In the following, an approach to the estimation of users' affective states in HCI
and two frameworks are presented and described, along with related example
applications. It is worth noting that both the approach and the frameworks are
preliminary and the proposed applications are still at very early stages. The
approach is a step towards the full implementation of the ability-based emotional
intelligence framework (EIF). The first framework is a modeling approach to
multimodal context-aware affective interaction: a domain ontology of
context-aware emotions, which serves particularly as a guide for the flexible
design of affective context-aware applications. The second framework is a model
for emotion-aware AmI, which aims to facilitate the development of applications
that take their users' emotions into account, by providing responsive services that
help users to enhance their emotional intelligence.
In this line of research, Ptaszynski et al. (2009) propose an approach to the
estimation of users' affective states in HCI: a method for verifying (computing) the
contextual appropriateness of affective states conveyed in conversations, which is
capable of specifying users' affective states in a more sophisticated way than
simple valence classification. Indeed, they assert that this approach is novel, as it
attempts to go beyond the first basic step of EIF, emotion recognition, and
represents a step forward in the implementation of EIF. Their argument for this
method making a step towards the practical implementation of EIF is that it
provides machine-computable means for verifying whether an affective state
conveyed in a conversation is contextually appropriate. Apart from specifying
what type of emotion was expressed, the proposed approach determines whether
the expressed emotion is appropriate for the context in which it is expressed or
appears; that is, the appropriateness of affective states is checked against their
contexts. One more important feature of this method is its contribution to the
standardization of emotion classification, as it uses the most reliable classification
available today. The proposed method uses an affect analysis system on textual
input to recognize users' emotions, that is, to determine the specific emotion types
as well as valence, and a Web mining technique to verify their contextual
appropriateness.
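The logic of checking contextual appropriateness can be caricatured in a few lines: an emotion recognized in the utterance is compared against the emotions its context is found to license. The toy lexicon and context table below merely stand in for Ptaszynski et al.'s affect analysis system and Web mining technique.

```python
# A heavily simplified sketch of contextual-appropriateness checking.
EMOTION_LEXICON = {"great": "joy", "awful": "sadness", "furious": "anger"}

# Emotions the (mined) context is assumed to license; toy stand-in for
# the Web mining step.
CONTEXT_ASSOCIATIONS = {
    "exam passed": {"joy", "relief"},
    "funeral": {"sadness"},
}

def recognize_emotion(utterance):
    # Keyword spotting stands in for a full affect analysis system.
    for word, emotion in EMOTION_LEXICON.items():
        if word in utterance.lower():
            return emotion
    return None

def appropriate(utterance, context):
    emotion = recognize_emotion(utterance)
    return emotion in CONTEXT_ASSOCIATIONS.get(context, set())

print(appropriate("I feel great!", "exam passed"))  # True
print(appropriate("I feel great!", "funeral"))      # False: flagged
```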
This approach has demonstrated the difficulty of disambiguating emotion types
and valence: the accuracy of determining the contextual appropriateness of
emotions was evaluated at 45 % for specific emotion types and at 50 % for
valence. Accordingly, the authors state that the system is still not perfect and its
components need improvement, but that it nevertheless defines a new set of goals
for affective computing and for the research of AI in general.
One interesting and important aspect of AmI is the system feature of social intel-
ligence: the ability to understand, manage, and, to some extent, negotiate complex
social interactions and environments. AmI is envisioned to be an integral part of
people’s social life. AmI systems should support the social interactive processes of
humans and be competent social agents in social interactions (Markopoulos et al.
2005; Nijholt et al. 2004; Sampson 2005). Emotions are a key element of socially
intelligent behavior. Accordingly, for AmI systems to serve human users well, they
are required to adapt to their emotions and thus elicit positive feelings in them, not
to be disturbing or inconvenient. Socially intelligent features of a system lie in
invoking positive feelings in the user (Markopoulos et al. 2005). A system designed
with socially intelligent features is one that is able to select and fine-tune its
behavior according to the affective (or emotional) state and cognitive state (task) of
the user (see Bianchi-Berthouze and Mussio 2005). The aim of AmI is to design
applications and environments that elicit positive emotions (or trigger emotional
states) and pleasurable user experiences. To ensure satisfactoriness and pleasur-
ability and thus gain acceptance for AmI, applications need not only to function
properly and intelligently and be usable and efficient, but they also need to be
aesthetically pleasant and emotionally alluring. In fact, Aarts and de Ruyter (2009,
p. 5) found that social intelligence, or elements of it, plays a central role in the
realization of the AmI vision, in addition to cognitive intelligence and computing.
They reaffirm that the notion of intelligence alluded to in AmI, i.e., the behavior of
AmI systems associated with context-aware, personalized, adaptive, and anticipatory
services, needs to capture the empathic, socialized, and conscious aspects of social
intelligence. AmI systems should demonstrate empathic awareness of users’ emo-
tions or emotional states and intentions by exhibiting human-like understanding and
supportive behavior; the way such systems communicate should emphasize com-
pliance with conventions; and the reasoning of such systems should be reliable,
transparent, and conscientious to the user so as to gain acceptance and ensure trust
and confidence. In relation to emotional awareness, the affective quality of AmI
artifacts and environments as well as the smoothness, intuitiveness, and richness of
interaction evoke positive feelings in users. Positive emotions can be induced both
by subjective, socioculturally situated interpretations of aesthetics and by
subjective experiences of interactive processes. Therefore, AmI systems should be
equipped with user interfaces that merge hypermedia, visual, aesthetic, naturalistic,
multimodal, and context-aware tools—social user interfaces. These involve artifi-
cial and software intelligent agents that interact with humans, creating the sense of
real-world social interaction, thereby supporting users’ social interactive processes.
With its learning capabilities, a socially intelligent agent is capable of learning from
repeated interactions with humans (social interactive processes) and behaving on the
basis of the learned patterns, while continuously improving the effectiveness of its
performance so as to become competent in social interactions. This is like other types of
learning machines, where the key and challenge to adding wit (intelligence) to the
environment lies in the way systems learn and keep up to date with the needs of the
user by themselves in light of the potential frequent changes of people, preferences,
and social dynamics in the environment. Social processes and social phenomena are
forms of social interaction. According to Smith and Conrey (2007), a social phe-
nomenon occurs as the result of repeated interactions between multiple individuals,
and these interactions can be viewed as a multi-agent system involving multiple
subagents interacting with each other and/or with their environments where the
outcomes of individual agents’ behaviors are interdependent in the sense that each
agent’s ability to achieve its goals depends on what other agents do apart from what
it does itself.
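Smith and Conrey's point can be illustrated with a minimal multi-agent simulation in which repeated pairwise interactions with interdependent outcomes produce a group-level regularity (here, a shared convention) that no single agent decides; all parameters below are arbitrary.

```python
# A minimal multi-agent illustration of a social phenomenon emerging from
# repeated, interdependent pairwise interactions.
import random

random.seed(0)
agents = [random.choice("AB") for _ in range(50)]  # initial conventions

for step in range(2000):
    i, j = random.sample(range(len(agents)), 2)
    # Each agent's outcome depends on what the partner does: interaction
    # "succeeds" only if both use the same convention.
    if agents[i] != agents[j]:
        loser = random.choice((i, j))       # the "loser" imitates
        winner = j if loser == i else i
        agents[loser] = agents[winner]      # learning from the interaction

counts = {c: agents.count(c) for c in "AB"}
print(counts)  # one convention typically comes to dominate: emergent order
```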
With its social intelligence features, AmI technology is heralding a radical
change in the interaction between human users and computers, giving rise to novel
interaction design that takes the holistic nature of the user into account. HCI
research is currently actively investigating how to devise computational tools that
support social interactive processes and addressing important questions about the
mechanisms underlying such processes. For instance, to address questions relating
to the subjective perception of interactions and aesthetics, the focus in HCI
research is on considering and developing new criteria when it comes to presenting
and customizing interactive tools to support affective processes. As mentioned in
Bianchi-Berthouze and Mussio (2005, p. 384), ‘Fogli and Piccinno suggest using
the metaphor of the working environment to reduce negative affective states in
end-users of computational tools and to improve their performance. Within the
working environment, they identify a key-role user that is an expert in a specific
domain (domain expert), but not in computer science, and that is also aware of the
needs of the user when using computational systems. The approach enables domain
experts to collaborate with software, and HCI engineers to design and implement
context- and emotion-aware interactive systems. The authors have developed an
interactive visual-environment…which enables the domain-expert user to define the
appearance, functionality and organization of the computational environment’.
As regards creating computational models that support social interactive
processes, although much work still needs to be done, dynamic models have been
developed for cognitive and emotional aspects of human functioning and imple-
mented in AmI applications. Although these models have achieved good results
and been implemented in laboratory settings, they still lack usability in real
life. This applies, by extension, to the enabling technologies underlying the func-
tioning of AmI systems. Put differently, the extant AmI systems are still
associated with shortcomings in accurately detecting, meaningfully interpreting,
and efficiently reasoning about the cognitive and emotional states of human users, and
therefore they are far from real-world implementation. In terms of social intelli-
gence, ‘[t]he vision of intelligence in AmI designs is taken…to a new level of
complication in describing the conditions that could introduce…true intelligence.
Thus, AmI 2.0 applications demonstrate only a minute step in that direction, e.g., of
facilitating users with the means for intelligent interaction, affective experience, but
also control. The gap that still needs bridging…relates to the following design
problems: (1) how to access and control devices in an AmI environment; (2) how to
bridge the physical and virtual worlds with tangible interfaces; (3) What protocols
are needed for end-user programing of personalized functionality; (4) how to
capture and influence human emotion; (5) how to mediate social interaction for
social richness, immediacy and intimacy; (6) how devices can persuade and
motivate people in a trustful manner, say, to adopt healthier lifestyles, and; (7) how
to guarantee inclusion and ethically sound designs…. [E]xperience research holds
the key to eventually bridging this gap between the fiction and concrete realizations.
For example, understanding experience from a deep personality point of view will
unlock unlimited possibilities to develop intelligent applications.’ (Gunnarsdóttir
and Arribas-Ayllon 2012, p. 29). It is not an easy task for AmI systems to emulate
the socially intelligent understanding and supportive behavior of humans, that is,
in particular, to select and fine-tune actions according to the affective and cognitive
state of users by analyzing and estimating what is going on in their mind and
behavior, based on information about their states and actions observed over time,
using sensor technologies and dynamic models of their cognitive and emotional
processes, coupled with exploiting the huge potential of machine learning tech-
niques. More effort is needed for further advancement of the mechanisms, tech-
niques, and approaches underlying the functioning of AmI systems as socially
intelligent entities. One important issue in this regard is that AmI systems
(intelligent social agents) need to be designed in such a way as to learn in-action
and in a dynamic way from the user's emotional and cognitive patterns in social
interactive processes, so as to be able to make educated or well-informed guesses
and inferences about the user's affective state and the context of the task, and
thereby determine the best behavior in a real-time manner. Any supporting
behavior performed by systems designed with socially intelligent features, in terms
of adaptation and responsiveness, should be based on a (dynamic) combination of
real-time reasoning capabilities and pre-programed heuristics. If, with regard to
humans and following the tenets of cognitivism, the application of time-saving
heuristics always results in simplifications in cognitive representations and
schemata, then pre-programed heuristics may well introduce bias into computational
processing, an intelligent agent's favoritism, which may carry its effects over to
application actions. This is predicated on the assumption that heuristics are fallible
and do not guarantee an accurate solution. In computer science, a heuristic
algorithm is one that can produce an acceptable solution to a problem in different
scenarios, but for which there is no formal proof of its correctness.
Besides, given the variety of users and interactions and the complexity inherent in
social interactive processes, it is not enough for intelligent agents to solely rely on
pre-programed heuristics in their functioning. The basic premise is that such
heuristics are likely to affect reasoning efficiency and hence the appropriateness of
actions, which might be disturbing or inconvenient to the user, thus failing to adapt
to users' emotions. Among the main challenges in AmI pertaining to socially
intelligent systems are the performance of such systems, given that they need to act
in a timely fashion; effective models of user interaction with such systems,
including their update and improvement over time; and enabling proactivity in
such systems through dynamic learning and real-time reasoning. There is a need
for novel approaches to integrating different learning techniques, modeling
approaches, and reasoning mechanisms to support social interactive processes; a
sketch of one such combination follows.
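One way to picture such a combination, under purely illustrative assumptions about names and thresholds, is an agent that prefers a model learned in-action from user feedback and falls back on a pre-programed heuristic only when the learned evidence is thin, thereby bounding the heuristic's bias.

```python
# A sketch of combining in-action learning with a heuristic fallback.
from collections import defaultdict

class SocialAgent:
    def __init__(self, threshold: float = 0.7):
        # counts[state][action] = times the user accepted that action.
        self.counts = defaultdict(lambda: defaultdict(int))
        self.threshold = threshold

    def observe(self, state: str, action: str, user_accepted: bool):
        # Learn in-action from each social interaction with the user.
        if user_accepted:
            self.counts[state][action] += 1

    def act(self, state: str) -> str:
        seen = self.counts[state]
        total = sum(seen.values())
        if total:
            action, n = max(seen.items(), key=lambda kv: kv[1])
            if n / total >= self.threshold:
                return action              # confident, learned choice
        return "ask_user"                  # fallible heuristic fallback

agent = SocialAgent()
agent.observe("user_frustrated", "simplify_interface", True)
agent.observe("user_frustrated", "simplify_interface", True)
print(agent.act("user_frustrated"))  # learned: simplify_interface
print(agent.act("user_bored"))       # no data yet: heuristic fallback
```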
By demonstrating a novel interaction between human users and computing
technology, socially interactive systems represent an evolution in the culture of
computing. On this culture, Shneiderman (2002) claims that the new computing is
about what people can do, while the old computing is about what computers can do.
Advances in knowledge of the affective and cognitive processes of humans, and of
how they can be modeled computationally, will carry this evolution further.
Research and design sensitive to the user's emotions is required in order for AmI
systems to be socially intelligent, able to select and tune actions according to the
emotional and cognitive states of the user, and thus ensure acceptability, appro-
priateness, and pleasurability. Research in AmI must primarily design effective
methods to evaluate all types of computational artifacts in real-world settings,
including affective, emotion-aware, context-aware affective, and cognitive
context-aware systems, conversational agents, and so on. But affective AmI
applications are especially important in this regard, as they directly concern the
emotions of users. As suggested by Leventhal and Scherer (1987, cited in Kaiser
and Wehrle 2001), the idea of using facial expressions as indicators is motivated
by the fact that emotion-
antecedent information processing can occur at different levels. ‘…appraisal pro-
cesses occurring on the sensory-motor or schematic level that are not or only with
great difficulty accessible through verbalization might be accessible via facial
expressions… Another reason for analyzing facial expressions in experimental
emotion research is that they are naturally accompanying an emotional episode,
whereas asking subjects about their feelings interrupts and changes the process’
(Kaiser and Wehrle 2001, p. 285). However, evaluation should capture not only
users’ emotions, but also other factors of relevance to the interaction experience.
Tähti and Niemelä (2005) argue that ‘to understand the multifaceted interaction
situation with complex AmI systems’, it is necessary to have ‘more profound
information than just the user’s feeling at the moment of interaction’, especially ‘to
understand the context, in which the feeling is evoked in the mind of the user.’
Furthermore, it is important to ensure that evaluation methods are dynamic, flexible,
and easy for evaluators to use. Over-complex formalization of evaluation methods
may interfere with collecting or capturing the relevant, precise information on user
emotions sought by the evaluator, who may become preoccupied with adhering to
the appropriate application of the method and eventually fail to spot important
information when conducting the evaluation. The complexity of the interaction with
AmI systems calls for, or justifies, simplicity in the evaluation method for
emotions. Failure to use or select a proper type of evaluation method has
implications for the task of evaluation. New evaluation methods should be designed
in a way that allows profound information on complex user experiences to be
obtained in a simplified way. There is a need for novel assessment tools that allow
collecting rich data on users' emotions when they are interacting with applications
in real-life situations. As Tähti and Niemelä (2005, p. 66) put it, 'considering the complexity of
interaction between a user and an AmI system, an emotion assessment method
should be able to capture both the emotion and its context to explain what aspect of
interaction affected to the feelings of the user. The method should be applicable for
field tests and easy to use. Furthermore, the method should minimize the influence
of the researcher on the evaluation and possibly enable long-term studying’.
The study of emotion is increasingly gaining attention among researchers in
affective computing and AmI in relation to emotional intelligence, social intelligence,
and emotion communication; therefore, there is a need for evaluation methods for
emotions that address the challenges associated with emotion technology,
especially since building this technology has proven to be one of the most daunting
challenges in computing. With the aim of addressing some issues relating to the
evaluation of emotions in AmI, Tähti and Niemelä (2005) developed a method for
evaluating emotions called Expressing Emotions and Experience (3E), a self-report
method that allows both pictorial and verbal reporting, combining verbal and
nonverbal user feedback on feelings and experience in a usage situation. It was
validated by comparing it to two other emotion assessment methods, SAM and
Emocards, which are also self-report instruments using pictograms for the
nonverbal assessment of emotions.
The development of 3E is described in detail in Tähti and Arhippainen (2004).
This method is a way to collect rich data on the user's feelings and the related
context (mental, physical, and social) whilst using an application, without too much
burden on the user. It moreover enables evaluators to gauge users' emotions by
allowing users to depict or express their emotions and experiences by drawing as
well as writing, thus providing information on their feelings and the motivations
behind them according to their preference, and without the simultaneous
intervention of the researcher. The authors point out that their method applies well
to AmI use situations that occur in real-world environments, does not necessarily
require the researcher's presence, and, as a projective method, may facilitate the
expression of negative emotions towards the evaluated system. However, the
authors state that this method does not apply well to evaluations whose purpose is
to evaluate detailed properties of an application. For a detailed discussion of the
key properties of AmI applications and their evaluation, the reader is directed to
Chap. 3.
Affective and AmI applications are seen as the most sophisticated computing
systems ever, as they involve complex dimensions of emotions, such as context
dependence of emotions, multimodal context-aware emotions, context-aware
emotional intelligence—contextual appropriateness of emotion, culture-dependent
emotions, and so on.
The expression of emotion varies between different cultures and even within the
same culture. In other words, there is no universal way of expressing emotions.
Emotions are expressed and interpreted differently in different cultures. Therefore,
affective applications should be designed in a flexible way if they are to be used by
a wider class of users. Also, it is
important for AmI systems—affective context-aware applications—to consider
adopting a hybrid approach to handling affective context-dependent actions—
delivery of responsive services, that is, merging invisibility and visibility, as users
may have different motives behind their emotional states. Personalization is
necessary for more efficient interaction and better acceptance of AmI systems.
Therefore, both affective computing and AmI should focus on producing applica-
tions that can be easily personalized to each user and that can merge explicit and
implicit affective interaction. Each user may have different motives behind an
emotional state, and hence there is a need to properly adjust parameters
accordingly, as well as to allow the user to accept or decline the so-called
responsive service. However, the above issues and challenges are only the tip of
the iceberg of what affective computing and AmI research should address and
overcome in order to design widely accepted technologies.
Discrete emotion theory posits that there are only a limited number of fundamental
emotions and that there exists a prototypical and universal expression pattern for
each of them. Facial expressions have been discrete emotion theorists’ main evi-
dence for holistic emotion programs (Ellsworth 1991; Ortony and Turner 1990).
However, the notion of basic emotions seems to be a subject of an endless debate,
and there are a lot of unsettled issues in this regard. Many theorists have criticized
the concept of basic or discrete emotions. The overemphasis on the face as
expressing discrete and fundamental emotions has been a corollary of Tomkins’
(1962) notion of innate affect programs affecting the facial muscles (Scherer 1994).
For Scherer (1992) and Kaiser and Scherer (1998) the complexity and variability of
different emotional states can be explained without resorting to a notion of basic
emotions, and the current emotion labels of a large number of highly differentiated
emotional states capture only clusters of regularly recurring ones. Further, findings
of universal prototypical patterns for the six facial expressions of emotion do not
enable researchers to interpret them as unambiguous indicators of emotions in
spontaneous interactions (Kaiser and Wehrle 2001). 'Given the
popularity of photographs displaying prototypical emotion expressions, we need to
remind ourselves that expression does not consist of a static configuration. Rather it
is characterized by constant change' (Scherer 1994, p. 4). Studying the link between
facial expressions and emotions involves a variety of problems; for instance, 'the
mechanisms linking facial expressions to emotions are not known' (Kaiser and
Wehrle 2001). Moreover, the affective information captured from a user may be
incomplete, which may have implications for the subsequent reasoning and inference
processes and thereby the appropriateness of responsive services. And even though
different communication channels of emotion might be available and accessible, it
can be challenging for an affective system to meaningfully interpret a user’s
emotional state in the sense of being able to join the contributions to the meaning of
emotions provided through various modalities in the analysis. ‘It is difficult to
separate the contributions to the meaning provided through various modalities or
channels and the final message is not their simple “sum.” The information conveyed
through one modality or channel may be contrary to what is conveyed through the
other; it may modify it or extend it in many ways. Accordingly, the meaning of a
multimodal utterance [e.g., emotion as multimodal expression] should be, in
principle, always regarded and analyzed as a whole, and not decomposed into the
meaning of speech, gestures, facial expressions and other possible components. For
example, a smile and words of appraisal or admiration may produce the impression
of being ironic in a certain context’ (Ibid). This applies to emotion as a multimodal
affective expression in the context of affective or AmI systems. A prosodic channel
(i.e., pitch, tempo, intonation) may modify or extend affective information that is
conveyed through the facial expression or gesture channel.
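The point that a multimodal expression is not the simple sum of its channels can be sketched as a late-fusion step that refuses to average away cross-channel conflict (the ironic smile case above); the scores and the conflict rule are illustrative assumptions.

```python
# A sketch of late fusion that treats cross-channel conflict as meaningful
# rather than averaging it away.
def fuse(channel_scores: dict[str, dict[str, float]]) -> str:
    # channel_scores: modality -> {emotion: confidence}
    top = {m: max(s, key=s.get) for m, s in channel_scores.items()}
    if len(set(top.values())) > 1:
        # Channels disagree: the whole may signal irony, masking, etc.
        return "conflict: defer to context"
    return next(iter(top.values()))

print(fuse({"face":    {"joy": 0.9, "anger": 0.1},
            "prosody": {"joy": 0.8, "anger": 0.2}}))   # -> joy
print(fuse({"face":    {"joy": 0.9, "anger": 0.1},
            "prosody": {"anger": 0.7, "joy": 0.3}}))   # -> conflict
```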
Furthermore, as to the visual and auditory modalities, affective information may be
degraded: noise may affect auditory sensors, and distance may affect the visual
modality. Therefore, the meaning of the user's emotional state may change
depending on whether the affective information is completely or incompletely
captured as implicit input from the user's affect displays as signals. The most
significant challenge for affective systems is to analyze and interpret the meaning
of a multimodal emotional expression as a whole, rather than decomposed into the
meanings of separate verbal and nonverbal signals. In addition, the auditory
modality differs from the visual modality in several respects. The visual modality
offers better emotion recognition, which has an impact on the quality of the
estimation of the user's emotional state, whereas the auditory modality is
omnidirectional and transient (Oviatt 2002). Computer systems tend to lack the
olfactory sensory modality, which is considered important when it comes to
communicating emotions among humans. In fact, touch is typically linked to
emotions, and communicates a wide variety of messages (Jones and Yarbrough
1985). The olfactory modality, too, often complements the visual and auditory
modalities when conveying emotions; indeed, it significantly shapes the patterns of
communication and the informational state of the receiver.
In all, much work remains to be done in affective computing and AmI on
interpreting the more subtle shades of multimodal emotional behavior. Affective
computing should take a holistic perspective on the conceptualization and
modeling of emotion in relation to human communication. This includes, among
other things, the synergistic relationship between multimodality, multi-channeling,
and the meaning of emotion in communicative acts, as well as the
non-intentionality and uncontrollability of communication behavior, including
facial expressions, paralanguage, and gesture, in relation to emotions. As nonverbal
communication studies show, a number of unintentional, uncontrolled signals are
produced during the process of emotion communication.
References
Aarts E, de Ruyter B (2009) New research perspectives on Ambient Intelligence. J Ambient Intell
Smart Environ 1(1):5–14
ACM SIGCHI (2009) Curricula for human–computer interaction. Viewed 20 Dec 2009. http://old.
sigchi.org/cdg/cdg2.html#2_1
Aggarwal JK, Cai Q (1999) Human motion analysis: a review. Comput Vis Image Underst 73
(3):428–440
Álvarez A, Cearreta I, López JM, Arruti A, Lazkano E, Sierra B, Garay N (2006) Feature subset
selection based on evolutionary algorithms for automatic emotion recognition in spoken
Spanish and standard Basque languages. In: Sojka P, Kopecek I, Pala K (eds) Text, speech and
dialog. LNAI, vol 4188, Springer, Berlin, pp 565–572
Andre E, Rehm M, Minker W, Buhler D (2004) Endowing spoken language dialogue systems with
emotional intelligence. LNCS, vol 3068. Springer, Berlin, pp 178–187
Argyle M (1990) The psychology of interpersonal behavior. Penguin, Harmondsworth
Asteriadis S, Tzouveli P, Karpouzis K, Kollias S (2009) Estimation of behavioral user state based
on eye gaze and head pose—application in an e-learning environment. Multimedia Tools Appl
41(3):469–493
Balomenos T, Raouzaiou A, Ioannou S, Drosopoulos A, Karpouzis K, Kollias S (2004) Emotion
analysis in man–machine interaction systems. In: Bengio S, Bourlard, H (eds) Machine
learning for multimodal interaction. Lecture Notes in Computer Science, vol 3361. Springer,
Berlin, pp 318–328
Bänninger-Huber E (1992) Prototypical affective microsequences in psychotherapeutic interac-
tions. Psychother Res 2:291–306
Beijer F (2002) The syntax and pragmatics of exclamations and other expressive/emotional
utterances. Working papers in linguistics 2. The Department of English, Lund University, Lund
Bianchi-Berthouze N, Mussio P (2005) Introduction to the special issue on “context and emotion
aware visual computing”. J Vis Lang Comput 16:383–385
Boyatzis R, Goleman D, Rhee K (2000) Clustering competence in emotional intelligence: insights
from the emotional competence inventory (ECI). In: Bar-On R, Parker JDA (eds) Handbook of
emotional intelligence. Jossey-Bass, San Francisco, pp 343–362
Braisby NR, Gellatly ARH (2005) Cognitive psychology. Oxford University Press, New York
Brehm JW, Self EA (1989) The intensity of motivation. Annu Rev Psychol 40:109–131
Calvo RA, D’Mello SK (2010) Affect detection: an interdisciplinary review of models, methods,
and their applications. IEEE Trans Affect Comput 1(1):18–37
Campos J, Campos RG, Barrett K (1989) Emergent themes in the study of emotional development
and emotion regulation. Dev Psychol 25(3):394–402
Caridakis G, Malatesta L, Kessous L, Amir N, Raouzaiou A, Karpouzis K (2006) Modeling
naturalistic affective states via facial and vocal expressions recognition. In: International
conference on multimodal interfaces (ICMI’06), Banff, Alberta, Canada
Cassell J, Pelachaud C, Badler N, Steedman M, Achorn B, Becket T, Douville B, Prevost S,
Stone M (1994) Animated conversation: rule-based generation of facial expressions, gesture
and spoken intonation for multiple conversational agents. In: Proceedings of SIGGAPH, ACM
Special Interest Group on Graphics, pp 413–420
Cearreta I, López JM, Garay-Vitoria N (2007) Modelling multimodal context-aware affective
interaction. Laboratory of Human–Computer Interaction for Special Needs, University of the
Basque Country
Chibelushi CC, Bourel F (2003) Facial expression recognition: a brief tutorial overview. In:
Fisher R (ed) On-line compendium of computer vision, CVonline
Chiu C, Chang Y, Lai Y (1994) The analysis and recognition of human vocal emotions. Presented
at international computer symposium, pp 83–88
Cohen I, Sebe N, Chen L, Garg A, Huang T (2003) Facial expression recognition from video
sequences: temporal and static modeling. Comput Vis Image Underst 91(1–2):160–187
(special issue on face recognition)
Cohn J, Zlochower A, Lien JJJ, Kanade T (1998) Feature-point tracking by optical flow
discriminates subtle differences in facial expression. In: Proceedings of the 3rd IEEE
international conference on automatic face and gesture recognition, pp 396–401
Cohn J, Zlochower A, Lien JJJ, Kanade T (1999) Automated face analysis by feature point
tracking has high concurrent validity with manual face coding. Psychophysiology 36:35–43
Cornelius R (1996) The science of emotions. Prentice Hall, Upper Saddle River
Cowie R, Douglas-Cowie E, Cox C (2005) Beyond emotion archetypes: databases for emotion
modelling using neural networks. Neural Netw 18(4):371–388
Damasio AR (1989) Time-locked multiregional retroactivation: a systems level proposal for the
neural substrates of recall and recognition. Cognition 33(1–2):25–62
Damasio A (1994) Descartes’ error: emotion, reason, and the human Brain. Grosset/Putnam,
New York
Darwin C (1872) The expression of emotion in man and animals. IndyPublish, Virginia
Dellaert F, Polzin T, Waibel A (1996a) Recognizing emotion in speech. In: Proceedings of ICSLP
1996, Philadelphia, PA, pp 1970–1973
Dellaert F, Polzin T, Waibel A (1996b) Recognizing emotion in speech. In: International
conference on spoken language processing (ICSLP)
Desmet P (2002) Designing emotions. Doctoral dissertation, Delft University of Technology
DeVito J (2002) Essentials of human communication. Allyn & Bacon, Boston
Edwards GJ, Cootes TF, Taylor CJ (1998) Face recognition using active appearance models. In:
Burkhardt H, Neumann B (eds) ECCV 1998. LNCS, vol 1407. Springer, Heidelberg, pp 581–595
Ekman P (1972) Universals and cultural differences in facial expressions of emotions. In: Cole J
(ed) Nebraska symposium on motivation. University of Nebraska Press, Lincoln, NB,
pp 207–282
Ekman P (1982) Emotions in the human face. Cambridge University Press, Cambridge
Ekman P (1984) Expression and nature of emotion. Erlbaum, Hillsdale
Ekman P (1993) Facial expression and emotion. Am Psychol 48(4):384–392
Ekman P (1994) All emotions are basic. In: Ekman P, Davidson RJ (eds) The nature of emotion:
fundamental questions. Oxford University Press, Oxford
Ekman P (1999) Facial expressions. In: Dalgleish T, Power MJ (eds) The handbook of cognition
and emotion. Wiley, New York, pp 301–320
Ekman P, Friesen WV (1972) Hand movements. J Commun 22:353–374
Ekman P, Friesen WV (1975) Unmasking the face: a guide to recognizing emotions from facial
clues. Prentice-Hall, Englewood Cliffs
Ekman P, Friesen WV (1978) The facial action coding system: a technique for the measurement of
facial movement. Consulting Psychologists Press, San Francisco
Ekman P, Rosenberg EL (eds) (1997) What the face reveals. Oxford University Press, Oxford
Ekman P, Friesen WV, Ellsworth P (1972) Emotion in the human face: guidelines for research and
an integration of findings. Pergamon Press, NY
Ellsworth PC (1991) Some implications of cognitive appraisal theories of emotion. In:
Strongman KT (ed) International review of studies on emotion, vol 1. Wiley, Chichester,
pp 143–161
Freud S (1975) Beyond the pleasure principle. Norton, New York
Friesen WV, Ekman P (1982) Emotional facial action coding system. Unpublished manuscript,
University of California at San Francisco
Frijda NH (1986) The emotions. Cambridge University Press, Cambridge
Frijda NH, Tcherkassof A (1997) Facial expressions as modes of action readiness. In: Russel JA,
Fernández-Dols JM (eds) The psychology of facial expression. Cambridge University Press,
Cambridge, pp 78–102
Galotti KM (2004) Cognitive psychology in and out of the laboratory. Wadsworth, Belmont
Gardner R (2001) When listeners talk: response tokens and listener stance. John Benjamins
Publishing Company, Amsterdam
Gardner RC, Lambert WE (1972) Attitudes and motivation in second language learning. Newbury
House, Rowley
Gavrila DM (1999) The visual analysis of human movement: a survey. Comput Vis Image Underst
73:82–98
Gavrila DM, Davis LS (1996) 3-D model-based tracking of humans in action: a multi-view
approach. In: Proceedings of IEEE conference on computer vision and pattern recognition,
IEEE Computer Society Press, pp 73–80
Goleman D (1995) Emotional intelligence. Bantam Books, New York
Graham JA, Argyle MA (1975) Cross-cultural study of the communication of extra-verbal
meaning by gestures. Int J Psychol 10:57–67
Graham JA, Ricci Bitti P, Argyle MA (1975) Cross-cultural study of the communication of
emotions by facial and gestural cues. J Hum Mov 1:68–77
Gray JA (1991) Neural systems emotions, and personality. In: Madden J IV (ed) Neuro-biology of
learning, emotion and effect. Raven Press, New York, pp 273–306
Gunes H, Piccardi M (2005) Automatic visual recognition of face and body action units. In:
Proceedings of the 3rd international conference on information technology and applications,
Sydney, pp 668–673
Gunnarsdóttir K, Arribas-Ayllon M (2012) Ambient intelligence: a narrative in search of users.
Lancaster University and SOCSI, Cardiff University, Cesagen
Hager JC, Ekman P, Friesen WV (2002) Facial action coding system. A Human Face, Salt Lake
City, UT
Heise D (2004) Enculturating agents with expressive role behavior. In: Agent culture: human–
agent interaction in a mutlicultural world. Lawrence Erlbaum Associates, Hillsdale, pp 127–
142
Hekkert P (2004) Design aesthetics: principles of pleasure in design. Department of Industrial
Design, Delft University of Technology, Delft
Huang TS, Pavlovic VI (1995) Hand gesture modeling, analysis, and synthesis. In: Proceedings of
international workshop on automatic face and gesture recognition, Zurich, Switzerland
Ikehara CS, Chin DN, Crosby ME (2003) A model for integrating an adaptive information filter
utilizing biosensor data to assess cognitive load. In: Brusilovsky P, Corbett AT, de Rosis F
(eds) UM 2003. LNCS, vol 2702. Springer, Heidelberg, pp 208–212
Izard CE (1994) Innate and universal facial expressions: evidence from developmental and
cross-cultural research. Psychol Bull 115:288–299
Jakobson R (1960) Closing statement: linguistics and poetics. In: Sebeok TA (ed) Style in
language. The MIT Press, Cambridge, pp 350–377
James W (1884) Psychological essay: what is an Emotion? Mind 9:188–205
Jones SE, Yarbrough AE (1985) A naturalistic study of the meanings of touch. Commun Monogr
52:19–56
Kaiser S, Scherer KR (1998) Models of ‘normal’ emotions applied to facial and vocal expressions
in clinical disorders’. In: Flack WF, Laird JD (eds) Emotions in psychopathology. Oxford
University Press, New York, pp 81–98
Kaiser S, Wehrle T (2001) Facial expressions as indicators of appraisal processes. In: Scherer KR,
Schorr A, Johnstone T (eds) Appraisal processes in emotions: theory, methods, research.
Oxford University Press, New York, pp 285–300
Kanade T, Cohn JF, Tian Y (2000) Comprehensive database for facial expression analysis. In:
Proceedings of the 4th IEEE international conference on automatic face and gesture recognition
(FG’00), Grenoble, France, pp 46–53
Kang BS, Han CH, Lee ST, Youn DH, Lee C (2000) Speaker dependent emotion recognition
using speech signals. In: Proceedings of ICSLP, pp 383–386
Kapur A, Virji-Babul N, Tzanetakis G, Driessen PF (2005) Gesture-based affective computing on
motion capture data. In: Proceedings of the 1st international conference on affective computing
and intelligent interaction, Beijing, pp 1–7
Murray I, Arnott J (1996) Synthesizing emotions in speech: is it time to get excited? In:
Proceedings of the international conference on spoken language processing (ICSLP’96),
Philadelphia, PA, USA, pp 1816–1819
Myers DG (2004) Theories of emotion, psychology. Worth Publishers, New York
Nakamura A (1993) Kanjo hyogen jiten (Dictionary of emotive expressions) (in Japanese),
Tokyodo
Nijholt A, Rist T, Tuijnenbreijer K (2004) Lost in ambient intelligence? In: Proceedings of CHI
2004, Vienna, Austria, pp 1725–1726
Noldus L (2003) Homelab as a scientific measurement and analysis instrument. Philips Res
2003:27–29
Ortony A, Turner TJ (1990) What’s basic about basic emotions? Psychol Rev 97:315–331
Ortony A, Clore GL, Collins A (1988) The cognitive structure of emotions. Cambridge University
Press, Cambridge
Oviatt S (2002) Multimodal interfaces. In: Jacko JA, Sears A (eds) A handbook of human–
computer interaction. Lawrence Erlbaum, New Jersey
Pantic M, Rothkrantz LJM (2000) Automatic analysis of facial expressions: the state of the art.
IEEE Trans Pattern Anal Mach Intell 22(12):1424–1445
Pantic M, Rothkrantz LJM (2003) Toward an affect sensitive multimodal human-computer
interaction. Proc IEEE 91(9):1370–1390
Passer MW, Smith RE (2006) The science of mind and behavior. McGraw Hill, Boston
Pavlovic VI, Sharma R, Huang TS (1997) Visual interpretation of hand gestures for
human–computer interaction: a review. IEEE Trans Pattern Anal Mach Intell 19(7):677–695
Petrides KV, Furnham A (2000) On the dimensional structure of emotional intelligence. Pers
Individ Differ 29:313–320
Petrides KV, Pita R, Kokkinaki F (2007) The location of trait emotional intelligence in personality
factor space. Br J Psychol 98:273–289
Phillips PJ, Flynn PJ, Scruggs T, Bowyer KW, Chang J, Hoffman K, Marques J, Min J,
Worek W (2005) Overview of the face recognition grand challenge. In: Proceeding of IEEE
computer society conference on computer vision and pattern recognition
Picard RW (1997) Affective computing. MIT Press, Cambridge
Picard RW (2000) Perceptual user interfaces: affective perception. Commun ACM 43(3):50–51
Picard RW (2010) Emotion research by the people, for the people. Emot Rev 2(3):250–254
Picard RW, Vyzas E, Healey J (2001) Toward machine emotional intelligence: analysis of
affective physiological state. IEEE Trans Pattern Anal Mach Intell 23(10):1175–1191
Plutchik R, Kellerman H (1980) Emotion: theory, research and experience. Academic Press, New
York
Pope LK, Smith CA (1994) On the distinct meanings of smiles and frowns. Cogn Emot 8:65–72
Ptaszynski M, Dybala P, Shi W, Rzepka R, Araki K (2009) Towards context aware emotional
intelligence in machines: computing contextual appropriateness of affective states. Graduate
School of Information Science and Technology, Hokkaido University, Hokkaido
Punie Y (2003) A social and technological view of ambient intelligence in everyday life: what
bends the trend? The European media and technology in everyday life network, 2000–2003.
Institute for Prospective Technological Studies Directorate General Joint Research Center
European Commission
Reeve J (2005) Understanding motivation and emotion. Wiley, New York
Riva G, Vatalaro F, Davide F, Alcañiz M (2005) Ambient intelligence: the evolution of
technology, communication and cognition towards the future of human–computer interaction.
IOS Press, Amsterdam
Roseman IJ (1984) Cognitive determinants of emotion: a structural theory. In: Shaver P
(ed) Review of personality and social psychology. Sage, Beverly Hills, pp 11–36
Russell JA (1980) A circumplex model of affect. J Pers Soc Psychol 39:1161–1178
Russell JA (2003) Core affect and the psychological construction of emotion. Psychol Rev 1:145–172
Sagisaka Y, Campbell N, Higuchi N (1997) Computing prosody. Springer, New York
Salovey P, Grewal D (2005) The science of emotional intelligence. Curr Dir Psychol Sci 14:281–285
Stevenson C, Stevenson L (1963) Facts and values—studies in ethical analysis. Yale University
Press, New Haven
Susskind JM, Littlewort G, Bartlett MS, Movellan J, Anderson AK (2007) Human and
computer recognition of facial expressions of emotion. Neuropsychologia 45:152–162
Tähti M, Arhippainen L (2004) A Proposal of collecting emotions and experiences. Interact
Exp HCI 2:195–198
Tähti M, Niemelä M (2005) 3e—expressing emotions and experiences. Medici Data oy, VTT
Technical Research Center of Finland, Finland
Tao J, Tieniu T (2005) Affective computing and intelligent interaction. LNCS, vol 3784. Springer,
Berlin, pp 981–995
Teixeira J, Vinhas V, Oliveira E, Reis L (2008) A new approach to emotion assessment based on
biometric data. In: Proceedings of WI–IAT’08, pp 459–500
ter Maat M, Heylen D (2009) Using context to disambiguate communicative signals. In:
Esposito A, Hussain A, Marinaro M, Martone R (eds) Multimodal signals. LNAI, vol 5398.
Springer, Berlin, pp 164–169
Tian YL, Kanade T, Cohn JF (2001) Recognizing action units for facial expression analysis. IEEE
Trans Pattern Anal Mach Intell 23(2):97–115
Tokuhisa R, Inui K, Matsumoto Y (2008) Emotion classification using massive examples extracted
from the Web. In: Proceedings of COLING 2008, pp 881–888
Tomkins SS (1962) Affect, imagery, consciousness: the positive affects. Springer, New York
Turk M, Robertson R (2000) Perceptual user interfaces. Commun ACM 43(3):33–44
Vick RM, Ikehara CS (2003) Methodological issues of real time data acquisition from multiple
sources of physiological data. In: Proceedings of the 36th annual Hawaii international
conference on system sciences. IEEE Computer Society, Washington DC, pp 1–156
Wimmer M (2007) Model-based image interpretation with application to facial expression
recognition. PhD thesis, Technische Universität München, Institute for Informatics
Wimmer M, Mayer C, Radig B (2009) Recognizing facial expressions using model-based image
interpretation. In: Esposito A, Hussain A, Marinaro M, Martone R (eds) Multimodal signals:
cognitive and algorithmic issues. Springer, Berlin, pp 328–339
Wu H (2004) Sensor data fusion for context-aware computing using Dempster-Shafer theory. PhD
thesis, Carnegie Mellon University
Yin X, Xie M (2001) Hand gesture segmentation, recognition and application. In: Proceedings of
IEEE international symposium on computational intelligence in robotics and automation,
Canada, pp 438–443
Zhang P (2008) Motivational affordances: reasons for ICT design and use. Commun ACM 51(11)
Zhou J, Kallio P (2005) Ambient emotion intelligence: from business awareness to emotion
awareness. In: Proceeding of 17th international conference on systems research, informatics
and cybernetics, Baden, Germany
Zhou J, Yu C, Riekki J, Kärkkäinen E (2007) AmE framework: a model for emotion-aware
ambient intelligence. University of Oulu, Department of Electrical and Information
Engineering, Faculty of Humanities, Department of English VTT Technical Research Center
of Finland
Chapter 9
The Cognitively Supporting Behavior
of AmI Systems: Context Awareness,
Explicit Natural (Touchless) Interaction,
Affective Factors and Aesthetics,
and Presence
9.1 Introduction
The cognitively supporting behavior of AmI systems involves different aspects and
thus application domains. One of the cornerstones of AmI is the adaptive behavior
of systems in response to the user’s cognitive state. The functionality of AmI
systems to act according to the user’s cognitive context is associated with cognitive
context-aware applications, which aim to reduce the cognitive burden involved in
performing tasks or carrying out activities, by helping users to cope with these tasks
in intuitive ways (and also freeing them from tedious ones). AmI aspires to create
technology that supports people’s cognitive needs, including decision making,
problem solving, visual perception, information searching, information retrieval,
and so on. These pertain to such cognitive activities as writing, reading, learning,
design, game playing, activity organizing, Internet surfing, and so forth. Hence,
AmI systems should be able to intelligently adapt to the user’s behaviors and
actions, by recognizing the cognitive dimension of context and modifying their
functionality accordingly. In addition, AmI systems should be able to utilize and
respond to speech and gestures (facial, hand, and eye gaze movements) as com-
mands (new forms of explicit inputs) to perform tasks more effectively and effi-
ciently on behalf of users. This design feature of AmI promises simplicity and
intuitiveness, and will enable the user to save considerable cognitive effort when,
for example, navigating between documents, surfing the Internet, scrolling, reading,
writing, and working. Most importantly, any reaction to cognitive behaviors or explicit gestured or spoken commands must be performed in a way that the user regards as appropriate and desirable.
One important aspect of AmI is the system feature of social intelligence.
A system designed with socially intelligent features is able to select and fine-tune its
behavior according to the cognitive state and affective state of the user and thus
invoke positive feelings in users. The aim of AmI is to design applications and
environments that elicit positive emotions or induce positive emotional states in users.
Unlike affect and emotion, whose terminology remains a matter of technical debate despite the major strides made by the cognitive sciences and neurosciences over the past two decades, cognition seems, overall, to be a well-understood notion. Cognition has
been studied in various disciplines, such as cognitive psychology, social psychol-
ogy, cognitive science, computer science, socio-cognitive engineering, cognitive
anthropology, neuroscience, linguistics, cognitive linguistics, phenomenology,
analytic philosophy, and so on. Hence, it has been approached and analyzed from
different perspectives. In other words, it means different things to different people.
‘To a computer scientist, the mind might be something that can be simulated
through software or hardware… On the other hand, to a cognitive psychologist, the
mind is the key to understanding human or animal behavior. To a cognitive neu-
roscientist, the mind is about the brain and its neurological underpinnings… The list
goes on’ (Boring 2003). In social cognition, which is a branch of social psychology,
the term ‘cognition’ is used to explain attitudes and group dynamics. In this
chapter, the emphasis is on the definition of cognition as related to cognitive
psychology and thus cognitive science because of its relevance to computing—and
thus AmI. In this sense, as a scientific term cognition refers to an information
processing view of the mental processes of humans as intelligent entities. In cog-
nitive science, intelligent entities also include highly autonomous computers. In
cognitive science, the term ‘cognitive’ is used to describe any kind of mental
process that can be examined in precise terms (Lakoff and Johnson 1999).
A process is any activity that involves more than one operation. Therefore, cognition can be seen as an information-processing system for perceiving and making sense of the world—and hence an experience-based system. Perception interprets and assigns
meaning, and sense-making refers to the process by which people give meaning to
experience. Cognitivism emphasizes that cognitions are based on perceptions and
that there is no cognition without mental representations of real objects, people,
events, and processes occurring in the world.
brain (Passer and Smith 2006). This process uses memory to recover any prior
knowledge we might have that could be relevant to solving a new problem (Braisby
and Gellatly 2005). The desired outcome we expect is the goal that directs the
course of our thinking to overcome the existing situation by guiding retrieval of
goal-relevant information from long-term memory (Ibid). Problem solving is con-
sidered as a fundamental human cognitive process that serves to deal with a situ-
ation or solve issues encountered in daily life. Decision making is the process of settling an issue by selecting among, and rejecting, available options, that is, choosing between alternatives. Therefore, it involves weighing the positives and negatives of each alternative, considering all the alternatives, and determining which alternative is the best for a given situation. That is, it entails mapping the likely consequences of decisions, working out the importance of individual factors, and choosing the best alternative. Most of the decisions we make or actions we take relate to some kind of problem we are trying to solve. Research shows
that emotion has a great impact on problem solving and decision making as cog-
nitive processes. As far as emotion and motivation are concerned, they are discussed in more detail in the previous chapter.
Cognitive psychologists have proposed a range of theoretical models of cogni-
tion. Similarly, cognitive scientists have developed various models in relation to
computing, such as computational model, decisional model, analytical model,
learning model, and formal reasoning model. These models are inspired by human
mental processes, namely computation, decision making, problem solving, learning,
and reasoning. They are highly applicable to AmI systems as autonomous
entities inspired by human cognitive intelligence, including cognitive context-aware
applications, which aim to facilitate and enhance mental abilities associated with
cognitive intelligence, by using computational capabilities.
co-location, group dynamics, activity, and emotional state. Internal context includes
psychophysiological state, cognitive state, and personal event. The external context
is a physical environment, while the internal context is a psychological context that
does not appear externally (Giunchiglia and Bouquet 1988; Kintsch 1988). The
focus of this chapter is on the cognitive (task) dimension of human context, which
may appear externally or internally. In general, human factors related context
encompass, according to Schmidt et al. (1999), three categories: information on the
user (knowledge of habits, emotional state, bio-physiological conditions), the user’s
tasks (activity, engaged tasks, general goals), and the user’s social environment
(social interaction, co-location of others, group dynamics). Regardless, a
context-aware application should be able to act in an interrelated, dynamic fashion
based on the interpretation of a set of atomic contextual elements that are of central
concern to the user, which can be transformed into a higher level abstraction of
context, prior to delivering relevant adaptive services.
AmI technology holds a great potential for permeating everyday life and changing
the nature of almost every human activity. The basic idea of AmI as an emerging
computing paradigm is about what people can do with what computers can do, in contrast to the old computing paradigm, which centers on what computers can do—i.e., systems should be intelligent enough to augment human cognitive intelligence in action and not only intelligent in executing complex tasks. In AmI, people should be empowered
through a smart computing environment that is aware of their cognitive context and
is adaptive and proactive in response to their cognitive needs, among others. In
other words, one feature of AmI is that the services delivered in AmI environments
should adaptively and proactively change according to the user’s cognitive context and be delivered to the user proactively. This feature emphasizes the context awareness and intelligence functionality of AmI systems, a technological feature which involves,
among others, augmenting interactive systems with cognitive capabilities that allow
them to better understand, support, and enhance those of users. Cognitive context is
one of the key elements of the context information amalgam necessary to guide
computational understanding of knowledge-based interaction particularly in rela-
tion to enhancing task and activity performance.
Underpinning AmI is the adaptive behavior of systems in response to the user’s
cognitive state. The computational functionality of AmI systems to act in accor-
dance with the cognitive dimension of context is associated with what is termed
cognitive context-aware applications. Such applications should be able to recognize
the user’s cognitive context in the state of performing a given task or carrying out a
given activity, by means of transforming atomic internal or external elements of
context into a high-level abstraction of context (i.e., sensor-based information is converted into a reusable semantic interpretation of low-level context information—a
process which is known as context inference), and adapt their behavior to best
match the inferred context, that is, meet the user’s cognitive need, by providing the
most relevant services in support of the user’s tasks or activities. The cognitive
dimension of context must be accurately detected, meaningfully interpreted, and efficiently reasoned about in order to determine an appropriate response and act upon
it. AmI supports a wide variety of cognitive needs, including decision making,
information searching, information retrieval, problem solving, visual perception,
reasoning, and so on, which are associated with such cognitive activities as writing,
reading, learning, planning, design, game playing, activity organizing, Internet
surfing, and so forth. In light of this, the aim of cognitive context-aware applica-
tions is to reduce the cognitive burden involved in performing tasks or carrying out
everyday life activities. For example, in a Web-based information system, the cognitive context awareness feature can help the user work with the system conveniently and enable an existing system to deliver AmI services (Kim et al. 2007). In this context,
the cognitive context, which relates to psychological state, can be inferred using such internal context elements as the user’s intention, work context, task goal, business process,
and personal event (Gwizdka 2000; Lieberman and Selker 2000). It is important to
note that the cognitive context may mean different psychological states at different
moments while performing a given task or carrying out a given activity and that one
task might involve one or more cognitive states, such as information retrieval and
problem solving. The range of scenarios for which cognitive context may be utilized is potentially huge. AmI systems can anticipate and intelligently adapt to the
user’s actions, by recognizing the cognitive dimension of context and modifying
their behavior accordingly, e.g., adapt interfaces to ease visual perception, tailor the
set of application-relevant data, enhance decision-making accuracy, recommend
and execute services, enhance memorization, increase the precision of information
retrieval, reduce frustration and thus help users avoid mistakes, stimulate cre-
ative thinking, facilitate problem solving, enhance learning, and so on.
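To make this pipeline concrete, the following minimal Python sketch shows one way atomic context elements might be abstracted into a high-level cognitive context that then selects a supporting service. The context labels, inference rules, and service names are hypothetical illustrations, not elements of any cited system.

```python
# Minimal sketch of cognitive context inference: atomic context elements
# are abstracted into a high-level cognitive context, which then selects
# an adaptive service. Labels, rules, and services are hypothetical.

from dataclasses import dataclass

@dataclass
class AtomicContext:
    active_app: str      # e.g., "browser", "editor"
    gaze_pattern: str    # e.g., "scanning", "fixated"
    task_goal: str       # e.g., "find-reference", "write-report"

def infer_cognitive_context(ctx: AtomicContext) -> str:
    """Context inference: map atomic elements to a high-level abstraction."""
    if ctx.active_app == "browser" and ctx.gaze_pattern == "scanning":
        return "information-searching"
    if ctx.task_goal == "write-report" and ctx.gaze_pattern == "fixated":
        return "problem-solving"
    return "unknown"

SERVICES = {
    "information-searching": "recommend relevant documents and sources",
    "problem-solving": "suppress notifications and surface task notes",
}

def adapt(ctx: AtomicContext) -> str:
    """Deliver the service that best matches the inferred context."""
    state = infer_cognitive_context(ctx)
    return SERVICES.get(state, "no adaptation")

if __name__ == "__main__":
    ctx = AtomicContext("browser", "scanning", "find-reference")
    print(adapt(ctx))  # -> recommend relevant documents and sources
```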
In context-aware computing research, emotional states as dimensions of the
emotional context have increasingly gained attention, compared to cognitive states
as dimensions of the cognitive context. This is due to the recent joint research
endeavors integrating affective computing with context-aware computing. On
the other hand, research has been less active on the topic of cognitive context. The
lack of interest in this area is likely to be explained by, among other things, the
daunting challenges and subtle intricacies associated with capturing, modeling, and
inferring the cognitive states of users, especially as novel recognition and modeling approaches based on nonverbal behavior are still evolving and hence have not yet matured. In other words, related research in HCI is still in its
infancy.
Advanced knowledge of human cognition, new discoveries in cognitive science,
and further advancement of AI are projected to have strong implications for AmI
system engineering, design, and modeling. One implication is that cognitive context-aware systems will be able to recognize complex cues of the user’s cognitive behavior using miniature multisensory devices as well as dynamic learning of stochastic models of cognitive contexts or activities—cognitive states and behaviors
pertaining to information handling when reasoning, visually perceiving objects,
solving problems, making decisions, and so on. This will enable cognitive
context-aware systems to make more accurate inferences about cognitive contexts
and what kind of services should be delivered in support of cognitive needs.
Thereby, computational resources and competencies are harnessed and channeled
towards facilitating and enhancing cognitive abilities of users in a more intuitive
way. Cognitive context-aware systems have a great potential to heighten user
interaction experience, by reducing the cognitive burden associated with perform-
ing difficult tasks and activities—the ever-increasing complexity of, and the mas-
sive use of ICT in, everyday life.
cognitive dimension of context can be inferred or deduced using facial cues or eye
movement as an external context—an atomic level of the context. To reiterate, the contribution of AI has been significant with regard to pattern recognition techniques,
ontological modeling techniques, naturalistic user interfaces, facial expression rec-
ognition, and computer vision.
Based on the literature, a few methods for capturing, representing, and inferring
cognitive context have been developed. And the few practical attempts to imple-
ment cognitive context are still far from real-world implementation. In other words,
concrete applications using software algorithmic approaches to cognitive context
recognition have not been instantiated in real-world environments. It is also noticed
that frameworks for developing cognitive context-aware applications seem to be far
fewer than those for developing affective context-aware applications.
In a study carried out by Kim et al. (2007), the authors propose the context
inference and service recommendation algorithms for the Web-based information
system (IS) domain. The context inference algorithm aims to recognize the user’s
intention as a cognitive context within the Web-based IS, while the service rec-
ommendation algorithm delivers user-adaptive or personalized services based on
the similarity measured between the user’s preferences and the services available for delivery. In addition, the authors demonstrate cognitive context awareness on the
Web-based IS through implementing the prototype deploying the two algorithms.
The aim of the proposed system deploying the context inference and service recommendation algorithms is to help the IS user work with an information system
conveniently and enable an existing IS to deliver AmI services. For further detail on
the context inference and service recommendation framework see Chap. 5.
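Kim et al. (2007) do not spell out their algorithms here, but the matching step they describe, measuring similarity between user preferences and deliverable services, can be sketched along the following lines. The cosine similarity measure, the feature names, and the scores are assumptions made for illustration, not details of the cited system.

```python
# Hypothetical sketch of preference-to-service matching via cosine
# similarity, in the spirit of the recommendation step described by
# Kim et al. (2007); feature names and scores are illustrative only.

import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# User preference profile over hypothetical features
# (e.g., text-heavy, visual, interactive).
user_prefs = [0.9, 0.2, 0.4]

services = {
    "document-summary": [0.8, 0.1, 0.3],
    "image-gallery":    [0.1, 0.9, 0.5],
    "guided-tutorial":  [0.5, 0.4, 0.9],
}

# Recommend services ranked by similarity to the user's preferences.
ranked = sorted(services, key=lambda s: cosine(user_prefs, services[s]),
                reverse=True)
print(ranked)  # -> ['document-summary', 'guided-tutorial', 'image-gallery']
```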
A few other studies have been done on the inference and adaptation to cognitive
context. Prekop and Burnett (2003) suggested a conceptual model of activity-
centric context, which focuses on creating context-aware applications that support
cognitive activities, but is far from a real-world implementation. Also, Kwon et al.
(2005) proposed a Need Aware Multi-Agent (NAMA) system, which attempts to
recognize both the cognitive context and the physical context, a research endeavor whose contribution lies in considering both types of context. While the Kim et al.
(2007) system considers only cognitive context, it adopts much the same algorithmic approach to cognitive context inference and service delivery or recommendation as NAMA; the inference algorithm used in the latter to recognize the user’s context is far from real-world application, and the method for collecting the internal context is not accurate.
Information searching and information retrieval are among the most frequent sets of
tasks or activities performed or carried out by users. They constitute either separate
actions in themselves or part of other complex tasks or activities. In either case, they
can be inferred or classified as cognitive dimensions of context by the so-called
cognitive context-aware applications. In reference to the first case, these applications recognize the cognitive dimension of context from the task the user is doing—in this case information searching or retrieval—by detecting one or more internal contexts at an atomic level, transforming them into a high-level abstraction of context, reasoning about it, and then delivering the relevant service: recommending a list of potentially needed documents to be retrieved, along with their sources, in relevance to the task and other contextual elements. One approach used by cognitive context-aware applications to perform the application action is the use of so-called metadata, with which documents are tagged. Metadata involve the document name,
the time and date of creation, and additional information related to the context in
which the system is being used. There is no limit to metadata (Ulrich 2008). To fire
or execute the context-dependent action, cognitive context-aware applications use a
context query language (e.g., Reichle et al. 2008) to access context information
from context providers to respond to the user’s cognitive need. Table 9.1 illustrates
some examples of how context can be used to retrieve documents (Schmidt 2005).
Both basic and context-driven metadata are an important part of the stored data and are necessary to retrieve documents. The association between documents based on the
context used is critical as a criterion for information retrieval. For example, all
documents that have been open together with a given document—same time, same
location, and same project—can be retrieved as an adaptive service to be delivered
to the user. In all, metadata is of importance to cognitive context-aware applications,
as it reduces the cognitive burden that the user would otherwise incur to complete
the task at hand—information searching and retrieval. Applications that automati-
cally capture context are central to the idea of AmI (Abowd 1999) and iHCI
(Schmidt 2005).
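A minimal sketch of this association-based retrieval might look as follows. The metadata fields mirror those mentioned above (time, location, project), while the document records and the helper function are invented for illustration.

```python
# Sketch of context-driven document retrieval: documents carry basic and
# context metadata, and retrieval exploits co-occurrence of context
# (same location, same project; times could be bucketed similarly).
# The data are invented.

docs = [
    {"name": "budget.xls",  "opened": "2014-03-01T10:05",
     "location": "office", "project": "alpha"},
    {"name": "notes.txt",   "opened": "2014-03-01T10:07",
     "location": "office", "project": "alpha"},
    {"name": "holiday.jpg", "opened": "2014-03-02T19:30",
     "location": "home",   "project": None},
]

def opened_together(target_name, docs, keys=("location", "project")):
    """Return documents sharing the given context keys with the target."""
    target = next(d for d in docs if d["name"] == target_name)
    return [d["name"] for d in docs
            if d["name"] != target_name
            and all(d[k] == target[k] for k in keys)]

print(opened_together("budget.xls", docs))  # -> ['notes.txt']
```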
Having information on the current context (e.g., the cognitive and physical dimensions of context), it becomes possible to build user interfaces that adapt to the user’s
cognitive and environmental context. For example, once the cognitive dimension of
context is recognized in the state of reading documents, physical changes in the
environment can be used as an external context (e.g., location, temperature, time,
lighting, etc.) by the system in order to adapt its functionality to the user’s cognitive
context, such as visual perception and visual attention. Having awareness of dif-
ferent, yet related, contextual dimensions, cognitive context-aware applications can adjust their interfaces for use in different situations, and this should occur without conscious mediation. Where context is available to systems during runtime, it becomes feasible to adjust the user interfaces at runtime; however, the requirements for the user interfaces are dependent on, in addition to the user and the context, the
application and the user interface hardware available (Schmidt 2005). Visual fea-
tures of the display like colors, brightness, contrast, arrangement of icons, and so on
can be adjusted depending on where the user moves and is located (e.g., a dim room,
living room, sunny space, in open air). Also, a display in a multi-display envi-
ronment may adapt in terms of the font and the font size according to the type of task
the user is engaged with (e.g., writing, reading, design, visual perception, chatting)
in a way that helps the user perform better and focus on the task at hand. However,
there is a variety of challenges associated with the topic of adaptive user interfaces,
among them user interface adaptation for distributed settings and user
interface adaptation in a single display (Schmidt 2005). As to the former, ‘in
environments where there is a choice of input and output devices it becomes central
to find the right input and output devices for a specific application in a given
situation. In an experiment where web content, such as text, images, audio-clips,
and videos are distributed in a display rich environment…context is a key concept
for determining the appropriate configuration…In particular to implement a system
where the user is not surprised where the content will turn up is rather difficult’
(Ibid, p. 169). As to the latter, ‘adapting the details in a single user interface at runtime is a further big challenge. Here in particular adaptation of visual and
acoustic properties according to a situation is a central issue…We carried out
experiments where fonts and the font size in a visual interface became dependent on
the situation. Mainly dependent on the user’s activity the size of the font was
changed. In a stationary setting the font was small whereas when the user was
walking the font was made larger to enhance readability’ (Ibid).
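The font-size experiment quoted above can be summarized in a few lines of code. The activity labels and point sizes below are illustrative assumptions, not the values used in the cited experiment.

```python
# Sketch of runtime user-interface adaptation: font size follows the
# user's activity, as in the experiment quoted above. The labels and
# point sizes are illustrative, not those of the original study.

FONT_SIZE_BY_ACTIVITY = {
    "stationary": 10,  # a small font suffices when the user is still
    "walking":    16,  # a larger font enhances readability in motion
}

def adapt_font(activity: str, default: int = 12) -> int:
    """Return the font size matching the recognized activity."""
    return FONT_SIZE_BY_ACTIVITY.get(activity, default)

# A display could poll an activity recognizer and re-render on change.
for activity in ("stationary", "walking"):
    print(activity, "->", adapt_font(activity), "pt")
```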
The different computational and communication resources that may surround the
user in a particular location are normally discovered by the system as contextual
information that can be used to support the user’s task. In fact, using resources
dependent on the location and the context more generally was a main motivation in
the early attempts at using context (Schilit 1995) in ubiquitous computing envi-
ronments. In addition to finding appropriate resources, context-aware systems also use context ‘to adjust the use of resource to match the requirements of the context’
(Schmidt 2005). In an AmI setting, computational resources refer to resources that
have certain functionality and are able to communicate with the rest of AmI
computing and network systems. In this setting, the user would need a highly
efficient means to access resources, as their number and types could be staggering
(Zhang 2009). Using computational resources that are in proximity of the user is
central to context-aware applications. The aim of detecting resources that are close
to the current whereabouts of the user is to reduce the physical and cognitive burden
for users as well as to avoid distracting the user from the task at hand. As
noted by Kirsh (1995), ordering and accessing items based on the concept of
physical proximity and the use of physical space as criteria is a very natural concept
for humans. To better meet the requirements of the current situation and thus better match the user’s needs, the selection of resources should be based on such contextual elements as the nature and requirements of the user’s activity and the user’s preferences, as well as on the status and condition of the resource and its network proximity, that is, the context of the resource entity.
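One plausible way to operationalize this selection is a weighted score over the contextual elements just listed. The weights, attributes, and resources in the sketch below are hypothetical.

```python
# Hypothetical weighted scoring of ambient resources, combining the
# contextual elements listed above: proximity, resource status, and
# fit with the user's activity and preferences.

resources = [
    {"name": "wall-display", "distance_m": 2.0, "available": True,  "fits_task": 0.9},
    {"name": "tablet",       "distance_m": 0.5, "available": True,  "fits_task": 0.6},
    {"name": "projector",    "distance_m": 8.0, "available": False, "fits_task": 0.8},
]

def score(r, w_proximity=0.4, w_fit=0.6):
    """Score a resource; an unavailable resource is ruled out."""
    if not r["available"]:
        return 0.0  # status/condition of the resource rules it out
    proximity = 1.0 / (1.0 + r["distance_m"])  # closer -> higher score
    return w_proximity * proximity + w_fit * r["fits_task"]

best = max(resources, key=score)
print(best["name"])  # -> 'wall-display'
```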
The potential of eye movement as a form of implicit and explicit input is under
rigorous investigation in the HCI community. Similarly, eye movement is increasingly
of facial behavior ‘is a prerequisite for developing more adapted models for
interpreting facial expressions in spontaneous interactions, i.e., models that do not
interpret each occurrence of a frown in terms of anger, sadness, or fear’ (Ibid).
There is a need for specialized research within the area of cognitive
context-aware computing with the goal of creating novel and robust tools and techniques for accurate measurement and detection of facial expressions as indicators of
cognitive cues or states. This area remains indeed under-researched, compared to
facial expressions for emotion recognition. ‘Given the multi-functionality of facial
behavior and the fact that facial indicators of emotional processes are often very
subtle and change very rapidly…, we need approaches to measure facial expres-
sions objectively—with no connotation of meaning—on a micro-analytic level. The
Facial Action Coding System (FACS)…lends itself to this purpose; it allows the
reliable coding of any facial action in terms of the smallest visible unit of muscular
activity (Action Units), each referred to by a numerical code. As a consequence,
coding is independent of prior assumptions about prototypical emotion expressions’
(Ibid, pp. 287–288).
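In code terms, FACS-style coding amounts to recording which numbered Action Units are active in a frame, without attaching any emotional meaning. The AU numbers and names below are standard FACS labels; the detected input, of course, is assumed.

```python
# Sketch of FACS-style coding: facial actions are recorded as numbered
# Action Units with no connotation of emotional meaning. AU names are
# standard FACS labels; the detected input frame is assumed.

ACTION_UNITS = {
    1: "inner brow raiser",
    2: "outer brow raiser",
    4: "brow lowerer",
    12: "lip corner puller",
    45: "blink",
}

def code_frame(active_aus):
    """Return an objective, meaning-free coding of one video frame."""
    return sorted((au, ACTION_UNITS.get(au, "unknown")) for au in active_aus)

# A frown coded as AU 4 says nothing, by itself, about anger or sadness;
# interpretation is a separate, later step.
print(code_frame({4, 45}))  # -> [(4, 'brow lowerer'), (45, 'blink')]
```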
motion of facial movements, and hand gestures are being investigated in AmI as to
how they can be used as a form of dynamic explicit input to control computer
systems in the sense of instructing them to execute tasks in a more effective and
efficient way. They can also be utilized to assist people with disabilities covering a wide variety of impairments: visually impaired users rely on the voice modality with some keyboard input, and hearing-impaired users on the visual modality with some speech input (see Vitense et al. 2002). Specifically, eye movement can be
used by the disabled who are unable to make normal use of explicit inputs such as the
keyboard, movements of the pointing device, and selections with the touch screen;
facial expressions by people with hand and speech disabilities; and gestures and
facial movements by people who suffer from blindness.
For regular users, eye movement may be more efficient than facial or gestural
movements in relation to a set of specific tasks. In other words, compared to HCI
designs using such movements as commands, eye gaze has greater potential to be
used as a hands-free method for many tasks associated with manipulating interactive
applications. For example, a gaze-based interface with eye gaze tracking capability,
a type of interface that is controlled completely by the eyes, can track the user’s eye
motion and translate it into a command to perform such tasks as opening documents
or scrolling, which the user would normally do by means of conventional explicit
inputs, using keystrokes with the keyboard, movements of the pointing device, and
selections with the touch screen. In more detail, in a system equipped with a
gaze-based interface, a user gazes at a given link, then blinks in order to click
through; gazes at a given file or folder then blinks to open it; moves his/her eye to
scroll down and up or move the cursor from right to left and around across icons to
search for a particular item; and so on. The horizontal, vertical, and rotational movements of the eyeball can be combined, depending on the nature of the task being carried out at a certain moment. The use of eye gaze information is, moreover, a natural choice for enhancing scrolling techniques, given the fact that the act of scrolling is
tightly linked to the users’ ability to absorb information through the eye as a visual
channel (Kumar et al. 2007). Adjouadi et al. (2004) describe a system whereby eye
position coordinates were obtained using corneal reflections and then translated into
mouse-pointer coordinates. In a similar approach, Sibert and Jacob (2000) show a
significant speed advantage of eye gaze selection over mouse selection and consider
it as a natural, hands free method of input. While this is concerned with healthy
subjects, Adjouadi et al. (2004) propose a remote eye gaze tracking system as an interface for persons with severe motor disabilities.
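The gaze-and-blink interaction just described can be summarized as a small event-translation loop. The event names and the scroll threshold below are assumptions made for illustration, not the interface of any cited system.

```python
# Sketch of a gaze-based interface loop: fixations position a cursor,
# a blink acts as a click, and a large vertical saccade scrolls.
# Event names and thresholds are illustrative assumptions.

def translate(event):
    """Map a raw gaze event to a user-interface command."""
    kind = event["kind"]
    if kind == "blink":
        return ("click", event["x"], event["y"])
    if kind == "fixation":
        return ("move_cursor", event["x"], event["y"])
    if kind == "saccade" and abs(event["dy"]) > 50:
        return ("scroll", "down" if event["dy"] > 0 else "up")
    return ("ignore",)

stream = [
    {"kind": "fixation", "x": 120, "y": 300},  # gaze settles on a link
    {"kind": "blink", "x": 120, "y": 300},     # blink -> click the link
    {"kind": "saccade", "dy": 80},             # large downward saccade
]
for e in stream:
    print(translate(e))
```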
Facial movements similarly allow a new form of explicit input as an
alternative to eye gaze movements. These two distinct movements can also be
combined depending on the task to be performed and how the user might prefer—or
need—to proceed. Indeed, as an alternative to aid people with hand and speech
disabilities, visual tracking of facial movements has been used to manipulate and
control mouse cursor movements, e.g., moving the head with an open mouth which
causes an object to be dragged (Pantic and Rothkrantz 2003). Likewise, de Silva
et al. (2004) describe a system that tracks mouth movements.
In addition, AmI systems are capable of using gestures and speech as commands to
assist the user in carrying out most routine tasks or activities. HCI design using
natural modalities as commands has a great potential to bring intuitiveness to the
interaction between AmI systems and users. Utilizing distance sensors, Ishikawa et al. (2005) propose a touchless input system based on gesture commands. Abawajy
(2009) describes a (common) scenario where an application uses a natural modality
(gestures) to perform a task: the possible scenario is when a user refers ‘to a number
of open documents on a computer whilst typing a document. Presently one must
acquire the mouse, locate the target icon, move the cursor to the icon and click. If
the correct document is successfully opened one then has to use scroll bars or a
mouse wheel to move through the pages. An alternative could use gestures similar
to the movements used when leafing through physical documents. For example, by
moving two or three fingers towards or away from the palm the user could move to
the next document whilst moving one finger could move from page to page. The
user would face less interruption and save considerable cognitive effort when
navigating between and within documents’ (Ibid, p. 67). This can also be
accomplished using speech modality. In this respect, the advantage of multiple
modalities is increased usability as well as accessibility. The limitation or infeasi-
bility of one modality can be counterbalanced by the strength or practicality of
another. On a mobile phone with a small keypad, a message may be cognitively
demanding to type but very easy to speak to the phone. Utilizing speech as a
command can be extended to be used in the event of writing, a feature that can be
used by both regular users and those with disabilities alike. Using speech as commands, one can easily manipulate the computer, e.g., by sending speech signals instructing it to switch off, open an application, log into a website, play a song, send an email, find an email address, search for a document, or enter a password.
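A simple command dispatcher illustrates how recognized gestures and speech tokens could be routed to one shared set of actions, so that either modality can drive the system. The vocabulary below is hypothetical.

```python
# Sketch of a multimodal command dispatcher: recognized speech tokens
# and gesture labels map onto one shared set of actions, so either
# modality can drive the system. The vocabulary is hypothetical.

ACTIONS = {
    "next_document": lambda: print("switching to next document"),
    "next_page":     lambda: print("turning the page"),
    "open_app":      lambda: print("opening application"),
}

COMMANDS = {
    ("gesture", "three-finger-sweep"): "next_document",
    ("gesture", "one-finger-sweep"):   "next_page",
    ("speech",  "open application"):   "open_app",
    ("speech",  "next page"):          "next_page",
}

def dispatch(modality: str, token: str) -> None:
    """Route a recognized (modality, token) pair to its action."""
    action = COMMANDS.get((modality, token))
    if action:
        ACTIONS[action]()
    else:
        print("unrecognized command:", modality, token)

dispatch("gesture", "three-finger-sweep")  # -> switching to next document
dispatch("speech", "next page")            # -> turning the page
```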
Additionally, body movements may be used as an explicit input to control
interfaces. They can be used to manipulate what has come to be known as tangible
interfaces, which allow combining digital information with physical objects, and
there are a number of products that are illustrative of tangible interfaces, e.g., a small
keychain computer that clears the display when shaken (Fishkin 2004). The whole
idea of new forms of explicit inputs is to simplify the interaction with computer
systems and harness intuitive processes for their manipulation.
As regards the design of AmI systems using facial, gestural, and bodily move-
ments, it is important to account for physical factors such as aging and disabilities.
Nonverbal behaviors, which have gained increased attention in AmI research, are
typically linked to physical movements of the human body. In fact, aging and
disability have been ignored in HCI design and ICT more generally. And new
technologies continue to be designed for a certain type of users. When interactive
applications get adapted for the needs of a particular target user group, they are less
likely to be appropriate for others. In other words, HCI design, when targeting specific groups, quickly tends to reinforce existing stereotypes.
of happiness) or discomfort (i.e., state of unhappiness) and the other is the level of
activation (i.e., state of alert) or deactivation (i.e., state of calmness) experienced by
the individual. These two dimensions are independent: a strong sense of pleasure can accompany low activation, and so can a strong sense of displeasure. Russell (2003)
defines core affect as a neurophysiological state that is consciously accessible as a
simple, non-reflective feeling. This definition is part of a notable recent work in
theoretical development in psychology carried out by the author, which contributed
significantly to the definition of a number of important affective concepts. Those that relate to aesthetic and emotional computing are introduced here.
Affective quality is a stimulus’ ability to cause a change in core affect (Ibid). Core
affect pertains to the individual while affective quality to the stimulus, such as
artifacts/objects, events, and places. Perception of affective quality refers to an
individual’s perception of a stimulus’s ability to change his/her core affect (Russell
2003). The perception of the stimulus leads to an appraisal through a thought process (cognitive information processing) that assesses the affective quality of the stimulus. Accordingly, an AmI artifact is a stimulus that is consciously sensed and perceived—recognized and affectively interpreted, leading
to an appraisal, which in turn leads to an emotional response. Perceived affective
quality of ICT artifacts has been studied by Zhang and Li (2004, 2005) as a concept
related to affect. As an elemental process, perception of affective quality has been
assigned other terms, such as evaluation, affective judgment, affective reaction, and
primitive emotion (Cacioppo et al. 1999; Russell 2003; Zajonc 1980). However, in
the context of HCI design, the affective quality of AmI artifacts may have an effect
on the user’s affect, i.e., aesthetically beautiful and emotionally appealing AmI systems
are likely to elicit positive emotions in users. In this sense, perception of affective
quality (of an artifact) is a construct that makes such a relation (Zhang and Li 2004).
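Russell’s two-dimensional structure lends itself to a compact representation. The sketch below models core affect as a (valence, activation) pair that a stimulus’s affective quality can shift; the numeric scales and the shift values are assumptions for illustration, not part of Russell’s account.

```python
# Sketch of Russell's core affect as a point in a two-dimensional space:
# valence (displeasure..pleasure) and activation (calm..alert). The
# numeric scales and the shift applied by a stimulus are assumptions.

from dataclasses import dataclass

@dataclass
class CoreAffect:
    valence: float     # -1.0 (displeasure) .. +1.0 (pleasure)
    activation: float  # -1.0 (calm) .. +1.0 (alert)

    def perceive(self, affective_quality):
        """A stimulus's affective quality shifts core affect (clamped)."""
        dv, da = affective_quality
        self.valence = max(-1.0, min(1.0, self.valence + dv))
        self.activation = max(-1.0, min(1.0, self.activation + da))

state = CoreAffect(valence=0.0, activation=-0.25)
# An aesthetically pleasing, calming artifact: positive valence shift,
# slight decrease in activation.
state.perceive((0.5, -0.25))
print(state)  # -> CoreAffect(valence=0.5, activation=-0.5)
```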
9.6.2 Aesthetics
Affect is related to, but different from, aesthetics. Zhang (2009, p. 6) notes:
‘…a simple way to differentiate and connect aesthetics and affect is to say that
aesthetics emphasizes the quality of an object or stimulus and the perception of such
quality in one’s environment, and affect emphasizes the innate feelings people have
that are induced by the object (such as emotions and affective evaluations)’. Affect
studies are concerned with individuals’ affective or emotional reactions to stimuli in
one’s environment, while aesthetics studies focus on objects and their effect on
people’s affect. Affect refers to the user’s psychological response to the perceptual
design details of the artifact (Demirbilek and Sener 2003). ‘Aesthetics’ comes from
the Greek word aesthesis, meaning sensuous knowledge or sensory perception and
understanding. It is a branch of philosophy. The meaning of aesthetics is implied to
be a broader one, including any sensual perceptions, but sometimes the concept is
used to describe a sense of pleasure (Wasserman et al. 2000). It is difficult to pin
down the concept of aesthetics. Lindgaard et al. (2006) contend that the concept of
aesthetics is considered to be elusive and confusing. It was the philosopher
Baumgarten who, in the eighteenth century, changed the meaning of the concept
into sense gratification or sensuous delight (Goldman 2001). The term has since
become more related to the pleasure attained from sensory perception—sensuous
delight as an aesthetic experience. Aesthetics can thus be said to be about the
experience of the beauty and quality of artifacts (or work of art) as gratifying to our
senses. As artifacts are produced to gratify our senses, ‘the concept has…been
applied to any aspect of the experience of art, such as aesthetic judgment, aesthetic
attitude, aesthetic understanding, aesthetic emotion, and aesthetic value. These are
all considered part of the aesthetic experience’ (Hekkert 2004, p. 2). However,
aesthetics can be found in many aspects of human life; nature as well as people can be experienced aesthetically. Indeed, aesthetics has been studied as part of a
wide range of disciplines and has a long history as an object of study. Its historical
development is, however, beyond the scope of this chapter. In relation to com-
puting, the reader can be directed to the book Aesthetic Computing, edited by
Fishwick (2006), which attempts to place aesthetics in its historical context,
and examines its broader sense in art and design, mathematics and computing, and
HCI—user interfaces—in the form of a set of collected articles and essays. This
book involves several scholars and practitioners from art, design, computer science,
and mathematics; they have contributed to laying the foundations for a new field
that applies the theory and practice of art to computing. In the context of computer
science, the contributors address aesthetics from a broader perspective, from
abstract qualities of symmetry to ideas of creative pleasure. Aesthetic computing
offers benefits to HCI in terms of enhancing usability through inducing positive
affective states in users. Pleasant, aesthetic design of artifacts enhances their
usability (Norman 2002).
As an affective experience, aesthetics is subjective because it is associated with
perception. That is, affective response (pleasure or displeasure from the sensory
perception of the affective quality of an artifact) is based on the way each individual
experiences an artifact and reacts affectively to it, meaning that aesthetic responses tend to vary from one individual to another. Aesthetics is about the effect on the perceiver, in terms of the degree to which his/her senses can be gratified when experiencing an artifact, rather than about the aesthetic potential of the artifact itself to invoke affective reactions. In this sense, gratification of the senses or sensuous delight is much
linked to such factors as the context, the situation, and the environment, as well as
idiosyncratic and sociocultural dimensions of the perceiver. This implies that the same aesthetic quality of an artifact may trigger different affective reactions in different
people. Aesthetic reactions differ in a lawful manner, ‘just like the process
underlying our emotions is uniform, yet leading to individual differences as a result
of interpretation differences’; and ‘it is only in this way that beauty can be said to lie
in the “eyes of the beholder”’ (Hekkert 2004, p. 4). Moreover, several concepts
related to aesthetics have been developed to signify the explicit meanings of
Following the tenets of cognitivism, cognitions are mental and social representa-
tions of real objects, processes, events, and situations that occur in the world.
Accordingly, they are based on perceptions, i.e., affected by subjective, socially
situated interpretation of these elements, and cognitive schemata facilitate percep-
tion of novel experiences. The cognitive system is seen as ‘an experience-based
system that reacts interpretatively and socially’ (Zhang 2008, p. 147). Although
they are abstractions, and thus often simplifications or alterations of the external
environment, they do constitute attempts to capture reality. Appraisal theory pro-
vides a descriptive framework for emotion based on perceptions, that is, the way
individuals experience objects, processes, events, and situations at the focus of the
emotional state (Scherer 1999). The process underlying the emotional response to
these elements can in fact most precisely be described by an appraisal model (e.g.,
Frijda 1986; Scherer 1992; Scherer et al. 2001; Roseman et al. 1994; Ortony and
Turner 1990; Ortony et al. 1988). These appraisal theorists posit that an emotion is
elicited by an appraisal of a situation, event, or object as potentially advantageous
or disadvantageous to a person’s concerns, e.g., on seeing a new smart mobile
phone a person may experience desire because he/she expects that possessing it will
fulfill his/her concern of being in the know of the latest technology. A key premise
of appraisal theory is that it is the interpretation of the situation, event, or object
rather than these themselves, which trigger the emotion. Appraisal theory postulates
that each emotional response of an individual has an idiosyncratic pattern of
appraisal, but there are few one-to-one relationships between an emotional response
and a situation, event, or object.
Given the intensity of the interaction between users and AmI artifacts, e.g., intel-
ligent functionality and visual and aesthetic tools, the interaction experience should
have a multidimensional effect, involving sense gratification resulting from aes-
thetically pleasant objects, pleasure and effectiveness of use resulting from the
interaction with the system at data and process levels, and fulfillment resulting from
achieving well-defined goals. In general, as regards sense gratification ‘following
thinking in evolutionary psychology, it is argued that we aesthetically prefer
environmental patterns and features that are beneficial for (the development of) the
senses’ functioning… If certain patterns in the environment contribute to the
functioning of our senses, it is reinforcing to expose ourselves to these patterns.
Hence, we have come to derive aesthetic pleasure from seeing, hearing, touching…
and thinking certain patterns that are beneficial to our primary sense’s functioning’
(Hekkert 2004, p. 1, 10). The aesthetic experience of AmI artifacts involves the
quality of their aesthetic features. These are associated with user interfaces and
encompass at the software level the visualizations of the content, menu and navi-
gation structure, fonts, color palette, graphical layouts, dynamic icons, animations,
images, musical sounds, and so on. At the hardware level, aesthetic features include
type of display, casing, size, shape, weight, temperature, material, color, buttons,
and so on. Both sets of aesthetic features are connected to the attractiveness and beauty of AmI artifacts as part of the full experience thereof. The other part of the experience of AmI artifacts concerns the processes associated with the use of the artifact in
terms of performing tasks or actions, such as touching, scrolling, clicking, pushing,
navigating, and receiving reactions from the user interface or device, e.g., images
and musical sound or auditory feedback. Drawing on Dewey (1934), the experience
of the artifact is shaped by a continuous alternation of doing and undergoing.
A typical everyday experience with an AmI artifact would involve interactivity with
both aspects. It is an experience since it is demarcated by a beginning and an end to
make for a whole; this experience is shaped by a continuous alternation of doing
and undergoing (Dewey 1934). However, high affective quality of designed arti-
facts can profoundly influence people’s core affect through evoking positive
affective states, such as delight and satisfaction. Therefore, it is strongly favorable
to take into account aesthetics in the design of AmI systems. The sensory aspects of
humans should be accounted for in all forms of design (Loewy 1951). Aesthetics
satisfies basic ICT users’ needs when they strive for a satisfying interactive experience that involves the senses, produces affective responses, and achieves certain
well-defined goals (Ben-Bassat et al. 2006; Tractinsky 2006), although it has been
difficult for users to articulate the different affective needs, and hence for HCI
designers to understand these needs. However, Norman (2004) contends that the
concept of aesthetic experience is implied to include emotional design. He makes
explicit, in a three-level processing for emotional design, the connection between
aesthetics and emotion: the visceral processing which requires visceral design that
leads to (pleasant) appearance, the behavioral processing which entails behavioral
design that is associated with the pleasure and effectiveness of use, and the
reflective processing which requires reflective design that is about personal satis-
faction, self-image, and memories. This three-level processing illustrates that
pleasure derivable from the appearance and satisfaction resulting from the func-
tioning of the artifact increase positive affect. Furthermore, aesthetics-based HCI
involves affective quality and rich-content information whose perception is affected
by a subjective, socioculturally situated interpretation of the AmI artifact and the
task. In relation to affective and emotional reactions (positive or negative), one’s
appropriation of the AmI artifact’s aesthetic quality and performance is based on a
set of intertwined factors involved in a particular use situation, e.g., colors and how
they can be combined and dynamically change in a user interface as affective
quality features of an AmI artifact. Psychological theory has it that colors invoke
emotions (pleasure or displeasure) and that people vary as to aesthetically judging
colors and affectively interpreting them based on cultural standards, in addition to
other factors such as personality, preferences, and gender. Colors as an aesthetic element are culturally dependent. Besides, it is argued that there are no universally agreed-upon principles separating what is aesthetically beautiful from what is not. The
pattern underlying our aesthetic responses or reactions, albeit uniform (i.e., visual system, affect system), can differ from one individual to another, just as the process underlying our emotions is unvarying (i.e., five organismic subsystems: the cognitive system (appraisal), the autonomic nervous system (arousal), the motor system (expression), the motivational system (action tendencies), and the monitor system (feeling) (Scherer 1993, 1994b)), yet leads to individual differences as a result of interpretation differences. Accordingly, like a number of aspects of artifact design aesthetics, the visual perception of colors tends to be subjective, varying from one individual to another. A constructivist worldview posits that reality is socially constructed, i.e.,
the constructions are not personal—the representation process involves other social
and cultural artifacts and therefore inevitably becomes social, although perception
necessarily is individual. Therefore, it is important to account for cultural variations
in interaction design aesthetics—social-cultural specificity of aesthetic representa-
tions. In terms of applying emotional studies to AmI design, inducing intended
emotions involves an experience-based system that reacts socially and interpreta-
tively, to draw on Zhang (2009). Visual conventions have proven not to be uni-
versal as perception of aesthetics is culturally situated. Implementing user interfaces
founded on assumptions that do not hold renders AmI design useless in the face
of cultural contingencies. Understanding how the process of interpretation occurs
when experiencing an artifact, aesthetically and during the interaction, holds a key
to designing for emotions through aesthetic means in computing. Fishwick (2006)
considers the importance of aesthetics and introduces aesthetic computing as a new
field of studies that ‘aims at adding qualitative representational aspects to visual
computing in order to support various cognitive processes… He argues that visual
programing is not only about technical issues but also about cultural and philo-
sophical assumptions on the notation used to represent computational structures.
His aim is not just to define the optimal aesthetic style, but also to support users to
explore new subjective perspectives’ (Bianchi-Berthouze and Mussio 2005, p. 384).
In ‘Aesthetic Computing’ (Fishwick 2006), the author explores aesthetic experience
beyond the representation of technological events.
New studies in the field of AmI as a novel approach to HCI are marking new
milestones, including the emphases on intelligent functionalities and capabilities
(i.e., context awareness, natural interaction, affective computing) and aesthetics
design and affective design. There is growing interest in merging affective, ambient,
and aesthetic aspects in HCI design. Interactive systems combining context-aware,
multimodal, perceptual, visual, and aesthetic features are increasingly proliferating,
spanning a wide range of ICT application areas. These systems offer new and
appealing possibilities to user interaction—pleasurable experience, aesthetic
appreciation, and positive feeling. The new computing culture is about how people
aspire to interact with technology and the effect they expect this will have on their
own cognitive world—e.g., affect and emotion. This experience-driven way of
acting is a qualitative leap crystallized into a new paradigm shift in HCI, marking a
movement toward a more human-centered philosophy of design and concomitantly
heralding the end of the old computing paradigm, which is about what computers can
do. Accordingly, among the things that many scholars have only recently come to
realize and make progress within is the relationship between aesthetics, affect, and
cognition. Aesthetics plays a key role in eliciting positive affective states, which in
turn influences cognitive processes associated with task performance. It can thus be
used to facilitate and stimulate cognitive abilities, either as an alternative to or a
combination with cognitive context-aware adaptive and responsive behavior,
depending on the nature of the task and the characteristics of the user’s cognitive
behavior. In general, the discourse has moved on from the goal of merely attaining
system functionality and usability to aesthetics, a movement from a cognitive
paradigm to a more affect-centric paradigm and from an instrumental orientation to an experiential orientation (Norman 2002, 2004; Zhang and Li 2004,
2005). Bosse et al. (2007, p. 45) point out that the human factors should ‘support
designs that address people’s emotional responses and aspirations, whereas
usability alone still demands a great deal of attention in both research and practice.
Consideration of these needs has generally fallen within the designer’s sphere of
activities, through the designer’s holistic contribution to the aesthetic and functional
dimensions of human-system interactions’. Accordingly, emotions are gaining an
increased attention in AmI design research. AmI emphasizes the significance of
emotional states in determining the unfolding of the interaction process. In it,
positive emotions can be induced by the affective quality of AmI systems and the
smoothness, simplicity, and richness of interaction due to new technological fea-
tures of AmI. The aesthetic experience is said to have an effect on the users’
cognitive behavior associated with performing tasks using computational artifacts.
It aids user cognition during interaction (e.g., Norman 2002; Spillers 2004).
Similarly, the strength of AmI environments in supporting affective design with
context-aware adaptive applications has implications for improving user perfor-
mance, as they elicit pleasant user experiences. However, the need of AmI
HCI researchers in the AmI community have recently started to focus on aesthetics and affect in AmI use. In addition to seeking to understand how aesthetics of AmI
artifacts can trigger and mediate affect, there is a growing interest in exploring how
these processes can have an effect on user performance—i.e., aid user cognition during
interaction with AmI systems. Some of the formal investigations on the effects of
aesthetic and affective related constructs on human ICT interaction factors such as
use and performance in various contexts are under way by scholars in the field of
HCI (Zhang 2009). Many studies demonstrate the significance of affect on (cog-
nitive) task performance. It has become of significance to apply new theoretical
models based on the understanding of the concepts of aesthetics, affect, and cog-
nition and their relationships. The underlying assumption of integrating aesthetics
in the design of AmI systems is that the aesthetic experience as part of the full
experience of an AmI artifact is more likely to influence the cognitive processes
involved in tasks and activities in various AmI use situations. This relates to the
system feature of social intelligence of AmI systems. A system designed with
repertoire and of building cognitive resources while negative emotions narrow the
individual’s thought repertoire. In view of that, it is the information processing
approach adopted by individuals, driven by their affective states, that shapes, to a large extent, how the task is perceived and thus performed: negative affective states
can make some simple tasks difficult and positive ones can make some difficult
tasks easier—e.g., by helping generate creative patterns of problem solving. Indeed,
Norman (2002, pp. 4–5) maintains that affect ‘regulates how we solve problems and
perform tasks. Negative affect can make it harder to do even easy tasks: positive
affect can make it easier to do difficult tasks. This may seem strange, especially to
people who have been trained in the cognitive sciences: affect changes how well we
do cognitive tasks… Now consider tools meant for positive situations. Here, any
pleasure derivable from the appearance or functioning of the tool increases positive
affect, broadening the creativity and increasing the tolerance for minor difficulties
and blockages. Minor problems in the design are overlooked. The changes in
processing style released by positive affect aids in creative problem solving which is
apt to overcome both difficulties encountered in the activity as well as those created
by the interface design’.
Runco 2004) and potential (Runco 2003), emphasizing research on individuals that
have potential for creativity but are not realizing it. In terms of models of creativity,
Plsek (1997) proposes the ‘directed-creativity cycle’, composed of observation,
analysis, generation, harvesting, enhancement, evaluation, implementation, and
living with it—these are clustered within: (1) preparation, (2) imagination,
(3) development, and (4) action.
Creativity is usually attributed to special imaginative or inventive operation, and
therefore involves a typical use of cognition—mental (information-manipulation)
processes, internal structures, and representations. In other words, the process of
creative cognition entails distinct cognitive patterns, dynamic connections, associ-
ations, and manipulation of mental elements in order to generate creative ideas.
Cognitive approach to creativity aims to understand the mental processes and
representations underlying creative thought (Sternberg 1999). To generate a tangible
creative outcome requires various resources, including intellectual abilities,
knowledge, styles of thinking, personality, environment, flexibility, openness to
experience, sensitivity, playfulness, intrinsic motivation, wide interest and curiosity,
and so on (Sternberg and Lubart 1996, 1999; Runco 2007). Runco (2007) notes
that creative personality varies from domain to domain, and perhaps, even from
person to person: ‘there is no one creative personality’ (Runco 2007, p. 315).
In the context of this chapter, creativity is particularly associated with the
relationship between affect and creative cognition, more specifically, how positive
affective states influence creative thinking in task performance (e.g., see Norman 2002; Kaufmann and Vosburg 1997). The premise is that positive affective states, which can be elicited and increased by exposure to and interaction with AmI systems as aesthetically beautiful, emotionally appealing, and intelligently behaving artifacts, are likely to broaden thought processes—hence enhanced creativity when it comes to performing tasks or carrying out activities. In the study of how
emotions can affect creativity, three broad lines of research can be distinguished
(Baas et al. 2008): (1) the comparison between positive and neutral emotional states; (2) the comparison between negative and neutral emotional states; and (3) the comparison between positive and negative emotional states. In relation to the third
line of research and affect changing the operating parameters of cognition, ‘positive
affect enhances creative, breadth-first thinking and makes people more tolerant of minor difficulties and more flexible and creative in finding solutions’ while ‘negative affect focuses cognition, enhancing depth-first processing and minimizing distractions’ (Norman 2002, p. 36). Negative affect has no leverage effect on creative performance (Kaufmann and Vosburg 1997). In relation to AmI, positive affect enables creative problem solving, which is apt to overcome difficulties encountered in the task or activity as well as those created by the user interface design and behavior. Furthermore, various studies suggest that positive affect increases cognitive flexibility, leading to unusual associations (Isen et al. 1985).
Sternberg (1999) points out that creativity occurs in a mental state where thought is
associative and a large number of mental representations are simultaneously active.
Creativity consists of making new combinations of associative elements (Poincaré
1913). Creative productions consist of novel combinations of pre-existing mental elements.
Immersion, in this context, can be understood as absorption in an engrossing total environment (Nechvatal 1999), a mental state that is, according to Varney (2006), often accompanied by intense focus, special excess, a distorted
sense of time, and effortless action. As to the latter, in particular, AmI environments are likely to induce immersion, as they provide applications that are flexible, adaptable, and capable of acting autonomously on behalf of users, in addition to aesthetically beautiful artifacts that trigger and mediate affect in ways that aid user cognition during interaction, which is crucial to the quality of user experience, impacting desirability and pleasurability, and hence positive mood and intense focus. Individuals in a positive mood state, which involves intensity (Batson
et al. 1992), have a broader focus of attention (Gasper and Clore 2000). In all,
‘total-immersion is implied complete presence…within the insinuated space of a
virtual surrounding where everything within that sphere relates necessarily to the
proposed “reality” of that world’s cyberspace and where the immersant is seem-
ingly altogether disconnected from exterior physical space’ (Nechvatal 2009, p. 14).
However, immersion can only be achieved through an ensemble of ingredients, which requires a holistic design approach. Hence the need to stimulate collaboration among people from such human-directed sciences as cognitive science, neuroscience, cognitive psychology, and the social sciences, or to work on cross-connections between presence technologies in AmI (computer science) and these disciplines, so as to combine their knowledge, capitalize on their strengths, and develop integral solutions not only for immersion-driven applications but also for different aspects of presence in relation to applications, services, and products. See below for further discussion.
The term derives from ‘telepresence’, the effect felt when controlling real-world objects remotely (Minsky 1980). Lombard and Ditton (1997) describe presence abstractly as an illusion that a mediated experience is not mediated. In developing the concept of presence further, they enumerate six conceptualizations thereof:
1. Presence can be a sense of social richness, the feeling one gets from social
interaction.
2. Presence can be a sense of realism, i.e., computer-generated environments
looking or seeming real.
3. Presence can be a sense of transportation, which is a more complex concept than the traditional feeling of being there, including users feeling as though something is ‘here’ with them or that they are sharing a common space with another person.
4. Presence can be a sense of immersion, through the senses or the state of mind.
5. Presence can provide users with the sense that they are social actors within the medium, where users are no longer passive viewers and, via presence, gain a sense of interactivity and control.
6. Presence can be a sense of the medium as a social actor.
A study carried out by Bracken and Lombard (2004) illustrates this idea of the
medium as a social actor with the suggestion that people interact with computers
socially. Focusing on children as the study sample, the researchers found that children’s confidence in their ability correlates with the positive encouragement they receive from a computer. In a similar study conducted by Nan et al. (2006), it was found that including AI-driven anthropomorphic agents on a Web site positively impacts people’s attitudes toward the site. The studies done by the above researchers also speak to the concept of presence as transportation, which in this case refers to the computer-generated identity, in the sense that users, through their interaction, perceive these fabricated personalities as really ‘there’. Communication media and web-based applications have been a
central pillar of presence since the term’s conception and a subject of different
studies (e.g., Rheingold 1993; Turkle 1995). Turkle focuses on the individual sense
of presence and Rheingold on the environmental sense of presence that commu-
nication provides.
However, Weimann (2000) argues that, based on the view of media scholars
who claim that virtual experiences are very similar to real-life ones, people can
confuse their own memories and have trouble remembering if those experiences
were mediated or not. This may apply to people, events, situations, and places.
Indeed, in terms of presence of objects, there is evidence that humans can cope well
with missing and even contrasting information and that they do not need a real-like
representation and full perceptual experience (Bianchi-Berthouze and Mussio
2005). This issue may be overcome or, at least, its effect mitigated thanks to recent advances in presence technologies. Riva et al. (2005) point out that presence
research can include the bias and context of subjective experience, evolving from
the effort to generate reality with increasing realism—the ‘perceptual illusion of
non-mediation’.
Heralding a paradigm break in the form of the post-desktop paradigm, AmI computing has broadened and reconfigured the conceptualization of many terms, including presence. Accordingly, AmI goes further than the early use of the term presence (e.g., Minsky 1980; Sheridan 1994), since its applications and uses are both widened and deepened. Riva et al. (2005) maintain: ‘Today man-machine interfaces have evolved considerably, and the inherent capacity of presence technologies is to support multiple users’ engagement and bidirectionality of exchange: the objectives and communication approach are thus different to control theory’. Indeed, AmI systems are characterized by human-like computational capabilities, including context awareness, implicit and natural interaction, and autonomous intelligent behavior, and involve distinctive enabling technologies, including smart miniaturized sensors, embedded systems, communication and networking technologies, and intelligent user interfaces/intelligent agents. These are to be exploited, in addition to virtual reality, mixed reality, augmented reality, embodied reality, hyper-reality, mediated reality, and ubiquitous virtual reality, for a successful substitution for being there oneself.
Furthermore, Lombard and Ditton’s (1997) aforementioned conceptualizations of presence apply to AmI, given the scope of the understanding and supporting
behavior characterizing AmI systems and environments—AmI takes care of and is
sensitive to needs; is capable of anticipating and responding intelligently to spoken
or gestured indications (cognitive, emotional and physiological cues) of desires
without conscious mediation, reacting to explicit spoken and gestured commands
for executing tasks, and supporting the social processes of humans and even being a
competent social agent in group interactions; can even engage in intelligent dialog
or mingle socially with humans; and elicits pleasant user experiences and positive
emotions in users through affective quality of aesthetic artifacts and environments
and smoothness, intuitiveness, and richness of interaction. Appropriate technologies
of presence, the sense of being there: ‘the experience of projecting one’s mind
through media to other places, people and designed environments’, combine vari-
ous types of media to create a non-mediation illusion—‘the closest possible
approximation to a sense of physical presence, when physical presence there may
be none’ (Ibid). Presence entails an amalgam of cognition, affect, attention, emo-
tion, motivation, and belief associated with the experience of interacting with AmI
technologies in relation to different settings: home, work, social environments, and
on the move. Riva et al. (2005) emphasize ‘the link between the technology—
through the concepts of ubiquitous computing and intelligent interface—and the
human experience of interacting in the world—through a neuro-psychological
vision centered on the concept of ‘presence’.’
In particular, more advances in ambient, naturalistic, intelligent user interfaces
will radically change interaction between technology and humans, e.g., tremendously easing and enriching the user interaction experience. This has direct implications for
presence and believability as to the mediated experience of interacting with any
entity (e.g., objects, places, events, situations, people, designed environments, etc.)
in any x-reality (e.g., virtual reality, mixed reality, augmented reality, embodied
reality, hyper-reality, mediated reality, ubiquitous virtual reality, etc.) within AmI
spaces. In fact, computing spaces are much more about the believability than the
reality of these entities—in many AmI applications and scenarios. Addressing the
issue of presence and believability in relation to computational artifacts, Casati and
Pasquinelli (2005) argue that the important issue is believability, not reality,
although the representation fidelity of, and perceptual interaction with, computa-
tional artifacts has been at the center of research within visual computing (visual
graphics) and virtual reality. The authors ‘provide evidence that humans do not
need a real-like representation and full perceptual experience but that they can cope
well with missing and even contrasting information. They argue that the processes
that play a role in the subjective feeling of presence are cognitive and perceptual
expectancies relative to the object to be perceived, and the sensory motor loop in
the experience of the perceived object. Insisting that a motor component plays a
prominent role in the perception of objects, they suggest an enactive interface, an
interface that enables users to act on an object and to see the consequences of their
actions, as a means to improve believability’ (Bianchi-Berthouze and Mussio 2005,
p. 384). Therefore, what is essential is the neuro-cognitive-perceptual processes involved in the experience of the simulated settings. Missing and contrasting information about these settings is likely to be overcome with natural interaction, context awareness, and intelligence as computational capabilities. The
perception of computing environments as real and believable is increasingly
becoming achievable due to the advances of many enabling technologies and thus
computational functionalities. AmI represents a computationally augmented envi-
ronment where human users interact and communicate with artificial devices, and
the latter explore their environment and learn from, and support, human users. This
entails that technologies become endowed with the human-like cognitive, emotional, communicative, and social competencies necessary to improve the naturalness of interaction and the intelligence of services: AmI systems and environments behave so pertinently in real time, with the user having full access to a wide variety of intelligent services from the augmented presence environments, that they seem fully interactive, adaptive, and responsive, and hence can be perceived, felt, and appear as real. Put differently, x-reality environments are likely to become indistinguishable from reality environments.
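To make the notion of an enactive interface discussed above more concrete, the following is a minimal sketch, in Python, of an action-perception loop in which a user acts on a virtual object and immediately perceives the consequences of the action. The object, actions, and textual rendering are hypothetical illustrations, not an implementation drawn from the works cited.

```python
# A minimal, illustrative sketch of an "enactive" interaction loop: the user
# acts on an object and immediately sees the consequences of the action.
# All names are hypothetical, chosen for illustration only.

from dataclasses import dataclass


@dataclass
class VirtualLamp:
    """A toy object whose state the user can act upon."""
    brightness: int = 0  # 0 (off) .. 10 (full)

    def act(self, action: str) -> str:
        # The sensory-motor loop: every action yields an observable consequence.
        if action == "brighten" and self.brightness < 10:
            self.brightness += 1
        elif action == "dim" and self.brightness > 0:
            self.brightness -= 1
        return self.render()

    def render(self) -> str:
        # Immediate perceptual feedback closes the action-perception loop.
        return "lamp: [" + "#" * self.brightness + "-" * (10 - self.brightness) + "]"


if __name__ == "__main__":
    lamp = VirtualLamp()
    for action in ("brighten", "brighten", "brighten", "dim"):
        print(action, "->", lamp.act(action))
```

The point of the sketch is the closed loop itself: every action returns an observable consequence, which is precisely the motor component argued above to improve believability.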
Achieving this, however, raises challenges and open issues relating to system engineering, design, and modeling, especially in relation to context awareness, natural interaction, and intelligence. Emulating these human capabilities and processes to augment presence environments is by no means an easy task. The question is to what extent human capabilities and processes can be captured in formal abstractions (or models) so that an AmI system can understand and communicate with the user on a human-like level.
Adding to this are the unsettled scientific issues relating to the human cognitive world and the complexity inherent in comprehending how human cognitive processes interrelate (e.g., the emotional complex, affect and cognition, motivation and emotion, etc.), function in dynamic and unpredictable ways, and relate to other factors such as different forms of energy and the biochemistry of the brain and body. Another question to be raised is whether modelers and designers will ever be able to formally computerize these relationships and dynamic patterns so that an application or system can give users the sense of a medium acting as a competent social actor in interaction, and provide users with the sense that they are social actors within the medium, where, via presence, they gain a sense of interactivity and control. The
underlying premise is that for the system to do so, it needs to display a human-like
cognitive, affective, and communicative behavior. AmI works in an unobtrusive and
invisible way (ISTAG 2001). The significant challenge lies in designing, modeling,
evaluating, and instantiating AmI systems and environments that coordinate with
human users’ cognitive, affective, and interactive patterns and behaviors, so that
they can be perceived as real subjects without missing or conflicting information,
that is, in harmony with human mental representations used in the perception and
making sense of real objects and subjects. In deconstructing subject-object relations
in AmI, Crutzen (2005, p. 224) states: ‘A necessary condition for the realization of
AmI environments is not only monitoring in circumambient ways the actions of
humans and the changes in their visible and invisible environment, AmI is also a
pattern of models of chains of interaction embedded in things. Objects in our daily
world—mostly inanimate—will be enriched by an intelligence that will make them
almost ‘subjects’, capable of responding to stimuli from the world around them and
even of anticipating the stimuli. In the AmI world the “relationship” between us and
the technology around us is no longer one of a user towards a machine or tool, but
of a person towards an “object-became-subject”, something that is capable of
reacting and of being educated’. Regardless of the conceptualization of presence, in augmented presence environments, the perceptual experience remains subjective and idiosyncratic to a great extent. That is to say, it depends on how each user experiences AmI systems and environments, e.g., the perception of the dynamics of the interaction and the extent to which it occurs naturally. Riva et al. (2005) maintain
that presence research is evolving to reproduce reality with ever increasing realism
and to include the context and bias of subjective experience, and suggests design
choices for novel computer-enriched environments that can enhance human
capacity to adapt to new situations.
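As a purely illustrative sketch of what such a formal abstraction might look like, consider the following Python fragment, which maps a handful of observable cues to a coarse affective-state label and keys a simple adaptation policy to it. The cues, thresholds, and labels are hypothetical assumptions made for the sake of illustration, not an established or validated model of affect.

```python
# A minimal, purely illustrative formal abstraction of a user's affective
# state. Cues, thresholds, and labels are hypothetical assumptions; they do
# not implement any published model.

from dataclasses import dataclass


@dataclass
class AffectiveCues:
    smile_score: float    # 0.0 .. 1.0, e.g., from facial-expression analysis
    voice_arousal: float  # 0.0 .. 1.0, e.g., from speech prosody
    task_errors: int      # recent error count in the ongoing task


def estimate_affective_state(cues: AffectiveCues) -> str:
    """Map observable cues to a coarse affective-state label."""
    if cues.smile_score > 0.6 and cues.task_errors <= 1:
        return "positive"
    if cues.voice_arousal > 0.7 and cues.task_errors > 3:
        return "negative-stressed"
    return "neutral"


def adapt_interaction(state: str) -> str:
    """A toy adaptation policy keyed to the estimated state, echoing the
    broaden (positive) versus narrow (negative) distinction discussed earlier."""
    policy = {
        "positive": "offer exploratory, open-ended suggestions",
        "negative-stressed": "reduce options and give step-by-step guidance",
        "neutral": "keep the default interaction style",
    }
    return policy[state]


if __name__ == "__main__":
    cues = AffectiveCues(smile_score=0.8, voice_arousal=0.3, task_errors=0)
    state = estimate_affective_state(cues)
    print(state, "->", adapt_interaction(state))
```

Even this toy fragment makes the difficulty palpable: the thresholds are arbitrary, the cues are noisy, and the underlying states interrelate in ways no lookup table can capture, which is exactly the modeling challenge raised above.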
Realizing presence technology as a key aspect of human-centered design sym-
bolizing AmI remains a question of technology feasibility: what existing and future
technologies will permit in terms of engineering, design, development, evaluation, and modeling of AmI systems—in other words, how far computer scientists and designers can go in simulating and implementing human cognition, perception, emotion, and interaction in next-generation technologies. It is argued
that most of the computer system and application engineering, design, and mod-
eling are technology-driven due to the fact that little knowledge, methods, models,
and tools are available to incorporate user cognitive, affective, and interactive
behavior as a parameter when designing computer systems. A strong effort must be
made in the direction of human behavior modeling to achieve in human under-
standing the same level of confidence that exists in designing and modeling new
technology. The real challenge may lie in taking into account a holistic view at the
level of human functioning processes: neurological, cognitive, affective, motivational, and communicative, as well as the micro-context of users’
everyday lives. This would go in favor of more in-depth studies of users in real life
settings or so-called living labs. Technology designers seem to believe, however, that these techniques are too costly and too time-consuming to take on board; indeed, they require considerable investments on different scales, but the question should rather be whether the results would justify the efforts. On an optimistic note, Riva et al. (2005) mention a number of challenging scenarios that
are envisioned as tests of whether presence technologies can make a real difference,
while foreseeing other scenarios beyond the state of the art to emerge. The chal-
lenging ones include:
• ‘Persistent hybrid communities: constructing large-scale virtual/mixed com-
munities that respond in real-time and exhibit effects of memory and behavioral
persistence while evolving according to their intrinsic social dynamics.
• Presence for conflict resolution, allowing people to be immersed and experience
situations of conflict or co-operation. By fostering communication and mutual
understanding between different parties these presence environments should
ultimately be empathy-inducing.
• Mobile mixed reality presence environments: moving freely and interacting in
real/augmented populated surroundings through natural and/or augmented
mediated tools.
• Personalized learning and training environments, stimulating a combination of
imaginary and physical actions and emotions through appropriate sets of
embedded nonverbal and multisensory cues for skill acquisition and learning’.
References
Alexander S, Sarrafzadeh A (2004) Interfaces that adapt like humans. In: Proceedings of the 6th Asia Pacific conference on computer human interaction (APCHI 2004), Rotorua, pp 641–645
Andreasen N (2005) The creating brain. Dana Press, New York
Asteriadis S, Tzouveli P, Karpouzis K, Kollias S (2009) Estimation of behavioral user state based on eye gaze and head pose—application in an e-learning environment. Multimed Tools Appl 41(3):469–493
Baas M, De Dreu CKW, Nijstad BA (2008) A meta-analysis of 25 years of mood-creativity research: hedonic tone, activation, or regulatory focus? Psychol Bull 134(6):779–806
Batson CD, Shaw LL, Oleson KC (1992) Differentiating affect, mood and emotion: toward
functionally based conceptual distinctions. Sage, Newbury Park
Ben-Bassat T, Meyer J, Tractinsky N (2006) Economic and subjective measures of the perceived
value of aesthetics and usability. ACM Trans Comput Hum Interact 13(2):210–234
Bianchi-Berthouze N, Mussio P (2005) Introduction to the special issue on “context and emotion aware visual computing”. J Vis Lang Comput 16:383–385
Blechman EA (1990) Moods, affect, and emotions. Lawrence Erlbaum Associates, Hillsdale, NJ
Boring RL (2003) Cognitive science: at the crossroads of the computers and the mind. Assoc
Comput Mach 10(2):2
Bosse T, Castelfranchi C, Neerincx M, Sadri F, Treur J (2007) First international workshop on
human aspects in ambient intelligence. In: Workshop at the European conference on ambient
intelligence, Darmstadt, Germany
Bower GH (1981) Mood and memory. Am Psychol 36:129–148
Bracken C, Lombard M (2004) Social presence and children: praise, intrinsic motivation, and
learning with computers. J Commun 54:22–37
Braisby NR, Gellatly ARH (2005) Cognitive psychology. Oxford University Press, New York
Brandtzæg PB (2005) Gender differences and the digital divide in Norway—Is there really a
gendered divide? In: Proceedings of the international childhoods conference: children and
youth in emerging and transforming societies, Oslo, Norway, pp 427–454
Brewin CR (1989) Cognitive change processes in psychotherapy. Psychol Rev 96:379–394
Cacioppo JT, Gardner WL, Berntson GG (1999) The affect system has parallel and integrative
processing components: form follows function. J Personal Soc Psychol 76:839–855
Casati R, Pasquinelli E (2005) Is the subjective feel of ‘presence’ an uninteresting goal? J Vis Lang
Comput 16(5):428–441
Clore GL, Schwarz N, Conway M (1994) Affective causes and consequences of social information
processing. In: Wyer RS, Srull TK (eds) Handbook of social cognition, vol 1. Erlbaum
Hillsdale, NJ, pp 323–418
Crutzen CKM (2005) Intelligent ambience between heaven and hell. Inf Commun Ethics Soc 3
(4):219–232
Damasio A (1994) Descartes’ error: emotion, reason, and the human brain. Grosset/Putnam, New York
Demirbilek O, Sener B (2003) Product design, semantics and emotional response. Ergonomics 46
(13–14):1346–1360
de Silva GC, Lyons MJ, Tetsutani N (2004) Vision based acquisition of mouth actions for
human-computer interaction. In: Proceedings of the 8th Pacific Rim international conference
on artificial intelligence, Auckland, pp 959–960
Dey AK (2000) Providing architectural support for building context-aware applications. PhD
thesis, College of Computing, Georgia Institute of Technology
Dewey J (1934) Art as experience. Berkley Publishing Group, New York
Exline R, Winters L (1965) Effects of cognitive difficulty and cognitive style on eye contact in interviews. In: Proceedings of the Eastern Psychological Association, Atlantic City, NJ, pp 35–41
Fiedler K, Asbeck J, Nickel S (1991) Mood and constructive memory effects on social judgment.
Cognit Emot 5:363–378
Fishkin KP (2004) A taxonomy for and analysis of tangible interfaces. Personal Ubiquitous
Comput 8(5):347–358
Kwon OB, Choi SC, Park GR (2005) NAMA: a context-aware multi-agent based web service
approach to proactive need identification for personalized reminder systems. Expert Syst Appl
29:17–32
Lakoff G, Johnson M (1999) Philosophy in the flesh: the embodied mind and its challenge to
Western thought. Basic Books, New York
Lazarus RS (1982) Thoughts on the relations between emotions and cognition. Am Psychol 37(10):1019–1024
Leder H, Belke B, Oeberst A, Augustin D (2004) A model of aesthetic appreciation and aesthetic
judgments. Br J Psychol 95:489–508
Lerner JS, Keltner D (2000) Beyond valence: toward a model of emotion-specific influences on
judgment and choice. Cognit Emot 14(4):473–493
Leventhal H, Scherer K (1987) The relationship of emotion to cognition: a functional approach to
a semantic controversy. Cognit Emot 1:3–28
Lieberman H, Selker T (2000) Out of context: computer systems that adapt to, and learn from,
context. IBM Syst J 39:617–632
Lindgaard G, Fernandes G, Dudek C, Brown J (2006) Attention web designers: you have 50
milliseconds to make a good first impression! Behav Inf Technol 25:115–126
Loewy R (1951) Never leave well enough alone. Simon and Schuster, New York
Lombard M, Ditton T (1997) At the heart of it all: the concept of presence. J Comput Mediat
Commun 3(2)
Luce MF, Bettman JR, Payne JW (1997) Choice processing in emotionally difficult decisions.
J Exp Psychol Learn Mem Cognit 23:384–405
Markopoulos P, de Ruyter B, Privender S, van Breemen A (2005) Case study: bringing social
intelligence into home dialogue systems. ACM Interact 12(4):37–43
Martin RA, Kuiper NA, Olinger J, Dance KA (1993) Humor, coping with stress, self-concept, and psychological well-being. Humor 6:89–104
Mayer RE (1999) The promise of educational psychology: learning in the content areas. Prentice
Hall, Upper Saddle River, NJ
Mendelsohn GA (1976) Associative and attentional processes in creative performance. J Personal
44:341–369
Minsky M (1980) Telepresence. MIT Press Journals, Cambridge, pp 45–51
Mumford MD (2003) Where have we been, where are we going? Taking stock in creativity
research. Creat Res J 15:107–120
Nan X, Anghelcev G, Myers JR, Sar S, Faber RJ (2006) What if a website can talk? Exploring the
persuasive effects of web-based anthropomorphic agents. J Mass Commun Q 83(3):615–631
Nechvatal J (1999) Immersive ideals/critical distances. PhD thesis, University of Wales
Nechvatal J (2009) Immersive ideals/critical distances. LAP Lambert Academic Publishing, Köln
Norman DA (2002) Emotion and design: attractive things work better. Interactions 9(4):36–42
Norman DA (2004) Emotional design: why we love (or hate) everyday things. Basic Books, Cambridge
Nygren TE, Isen AM, Taylor PJ, Dulin J (1996) The influence of positive affect on the decision
rule in risk situations: focus on outcome (and especially avoidance of loss) rather than
probability. Organ Behav Hum Decis Process 66:59–72
Ortony A, Clore GL, Collins A (1988) The cognitive structure of emotions. Cambridge University
Press, Cambridge, England
Ortony A, Turner TJ (1990) What’s basic about basic emotions? Psychol Rev 97:315–331
Pantic M, Rothkrantz LJM (2003) Toward an affect sensitive multimodal human-computer
interaction. Proc IEEE 91(9):1370–1390
Passer MW, Smith RE (2006) The science of mind and behavior. McGraw-Hill, Boston
Plsek PE (1997) Creativity, innovation and quality. ASQ Quality Press, Milwaukee
Poincaré H (1913) The foundations of science. Science Press, Lancaster
Prekop P, Burnett M (2003) Activities, context and ubiquitous computing. Comput Commun
26:1168–1176
The principal aim of this book was to explore, review, and discuss the state-of-the-art
enabling technologies, computational processes and capabilities, and human-
inspired AmI applications (in which knowledge from the human-directed sciences
such as cognitive science, social sciences, and humanities is incorporated) and to
provide new insights and ideas on how these components could be further enhanced
and advanced. Moreover, this book intended to identify, document, and address the
main challenges and limitations associated with the engineering, design, modeling,
and implementation of AmI systems, and to put forward alternative research avenues
that provide a more holistic view of AmI and present important contributions for
bringing the vision of the integration of computer intelligence into people’s everyday
lives closer to realization and delivery with real impacts.
The significance of the research combining technological, human, and social
dimensions of AmI lies in its potential to enhance the enabling technologies and
computational processes and capabilities underlying the functioning of AmI tech-
nology, by gaining a better understanding of a variety of aspects of human func-
tioning based on advanced knowledge from human-directed sciences and
effectively amalgamating and applying this knowledge in the field of AmI, with the
primary aim to build well-informed human-inspired AmI applications that can have
a profound and positive impact on people as to enhancing the quality of their lives.
The primary intention with regard to the design of human-inspired applications and
their use in everyday life practices is to contribute to the understanding of existing
problem domains, to emphasize the need for making an effort to broaden the scope
of problem domains, and to encourage search for new ones. Adding to this intent is
to contribute to the appropriate and pertinent solutions to some of the real issues
involved in the realization and deployment of AmI smart spaces. In this regard, it is
crucial to rethink how various human-like intelligences in the form of cognitive and
behavioral processes should be conceived, combined, interrelated, and implemented
in the next generation of AmI systems.
The design of AmI systems should follow a three-dimensional framework for research in AmI as a comprehensive approach: (1) research outputs, including
constructs, models, methods, and instantiations; (2) research activities, including
building, evaluating, theorizing, and justifying (e.g., March and Smith 1995); and
(3) interdisciplinary and transdisciplinary research undertakings (multiperspectival
and holistic analysis for achieving coherent knowledge and broad understanding of
AmI). Real, new problems (e.g., context-aware systems, affective/emotion-aware
systems, socially intelligent systems, conversational systems, etc.) must be properly
conceptualized and represented (using machine learning, ontological, logical, and
hybrid methods, as well as other novel approaches), appropriate techniques and
mechanisms (including sensors, intelligent components/information processing
units, actuators, and networks) for their solution must be constructed, and solutions
(various human-inspired AmI applications) must be implemented and evaluated in
their operating environments using appropriate metrics or criteria. Enabling tech-
nologies and processes involve a wide variety of sensors and actuators, data pro-
cessing approaches, machine learning methods, knowledge representation and
reasoning techniques, intelligent agents, and query languages necessary for the
design and implementation of AmI systems. Moreover, if significant progress is to
be made, AmI research must also develop an understanding of how and why
different systems work or fail, and identify during the evaluation and instantiation
phases which of the enabling technologies and processes are interfering with the
proper functioning of AmI systems in their variety. Such an understanding must
link together natural laws (from natural and formal science) governing AmI systems
with human and social rules (from human-directed sciences) governing the human
environments in which they operate.
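By way of illustration of the first two dimensions of this framework, the following Python sketch represents a context-aware problem with a construct (a context abstraction), a method (rule-based reasoning), an instantiation (a toy adaptation service), and a simple evaluation metric. All names, rules, and the metric are hypothetical, chosen for brevity rather than drawn from any particular system.

```python
# A minimal, illustrative sketch of conceptualizing a context-aware problem:
# a construct (Context), a method (rule-based reasoning), an instantiation
# (a toy adaptation service), and an evaluation metric. All names and rules
# are hypothetical assumptions for illustration.

from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Context:
    """Construct: a formal abstraction of the sensed situation."""
    location: str      # e.g., "home", "office"
    time_of_day: str   # e.g., "morning", "evening"
    activity: str      # e.g., "resting", "working"


# Method: each rule maps a sensed context to an optional service action.
Rule = Callable[[Context], Optional[str]]

RULES: List[Rule] = [
    lambda c: "dim lights" if c.activity == "resting" and c.time_of_day == "evening" else None,
    lambda c: "mute notifications" if c.activity == "working" else None,
]


def infer_actions(context: Context) -> List[str]:
    """Instantiation: apply all rules and collect the triggered actions."""
    return [action for rule in RULES if (action := rule(context)) is not None]


def precision(proposed: List[str], accepted: List[str]) -> float:
    """Evaluation: the share of proposed actions the user actually accepted."""
    return len(set(proposed) & set(accepted)) / len(proposed) if proposed else 0.0


if __name__ == "__main__":
    ctx = Context(location="home", time_of_day="evening", activity="resting")
    actions = infer_actions(ctx)
    print(actions, "precision:", precision(actions, accepted=["dim lights"]))
```

Rule-based reasoning is only one of the representational options named above; machine learning, ontological, logical, and hybrid methods would replace the hand-written rules, but the same construct-method-instantiation-evaluation decomposition applies.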
AmI is an exciting and fertile area for investigation with many intriguing and
probing questions and extensive work awaiting future interdisciplinary scholarly
research and collaborative industry innovation. This presupposes the necessity and
motivation for the AmI vision to become open to further interrogations that are
indeed causing it to fundamentally reconfigure its present beliefs and knowledge
claims and, accordingly, abandon some of the currently prevailing assumptions,
especially those pertaining to the notion of intelligence which has been an integral
part of some of the most tantalizing (visionary) scenarios. Besides, philosophically,
it is important for AmI to recognize and accept its historically conditioned
knowledge, which postulates the acceptance—like all knowledge formations which
are infused with ways-of-seeing—of partial, local, and specific analyses of social
reality, since social reality is not shaped in significant ways by more majestic and general structures. However, in the process of revisiting the AmI vision, moving beyond its
foundational farsightedness, it is of great importance to ensure that the user implications are made more explicit by answering the main question about how users are—and ought to be—configured in AmI; to surmount the inadequacy in, or make more explicit, the consideration for human values in the design choices that will influence AmI technology, as well as using these values as parameters for reading everyday life patterns with regard to the innovation process of AmI; to work strategically towards becoming more driven by humanistic concerns than by deterministic ones; and to accept, understand, and capitalize on the idea that the AmI innovation process is an
interactive process between technological and societal change, where technology
and society mutually shape and influence one another and they both unfold within
that process, thereby taking into account the user and social dynamics and under-
currents involved in and underlying the innovation process. The whole idea is that a
strong effort should be made in the direction of re-examining and reconfiguring the
vision to achieve in AmI progress from a human and social perspective the same
level of confidence and optimism that exists in advancing technology—i.e., it should
inspire researchers and scholars into a quest for the tremendous possibilities created
by exploring new understandings and adopting alternative strategies for rethinking
the whole idea of intelligence as an essential part of the incorporation of machine
intelligence into people’s everyday lives. A holistic view is one that considers people
and their behavioral patterns and everyday life scenarios and practices when looking
at intelligence, and thus leverages these elements to generate situated forms of
intelligence. This is most likely to make people want and aspire to give technology a
place in their lives and thus allow the incorporation of computer intelligence in their
everyday lives. AmI holds a great potential to frame the role of new technologies—
but only based on incorporating the user dimensions and the social dynamics in the
innovation process. The push philosophy of AmI alone remains inadequate to
generate successful and meaningful technological systems.
Moving beyond its foundational (visionary) scenarios can still be seen as a sign of progress towards delivery, after the vision has contributed significantly to establishing the field of AmI and thus accomplished its mission, by inspiring a whole generation of innovators, scholars, and researchers into a quest for the tremendous opportunities that have been enabled and created by, and foreseen coming from, the incorporation of computer intelligence into people’s everyday lives and environments to bring about a radical and technology-driven social transformation (see, e.g., José et al. 2010; Aarts and Grotenhuis 2009; Gunnarsdóttir and Arribas-Ayllon 2012). The
conspicuous reality pertaining to the scattering of research areas, the magnitude of
challenges, the numerous open and unsolved issues, the unintended implications,
the significant risks, the bottlenecks or stumbling blocks, and the unfeasibility and
unattainability associated with the notion of intelligence and thus the realization of the AmI vision all implies the high relevance of, and added sense of exigency as to, revisiting or reexamining the AmI vision. This, though, whether concerning the notion of intelligence or other prevailing assumptions, should not be seen as a failure of, or criticism of, the blossoming field of AmI, so to speak, but rather as an integral part
of the research advancement in which a vision of the future technology should not
be considered as an end in itself or a set of specified requirements. Instead, it should
be conceived as a place that marks the beginning of a journey from which to depart,
while stimulating debates and depicting possible futures along the way, towards
making it a reality. The underlying assumption is that the AmI field, anchored in the substantial research effort by which the AmI vision has in fact fulfilled its mission and role, can aim higher and thus dream realistically bigger, by capitalizing on the proposed alternative research directions, grasping the meaning and implications of what the AmI vision epitomizes for people, valuing holistic approaches, and embracing
emerging trends around the core notions of AmI. Indeed, there is a growing per-
ception that the centripetal movement of the recommended fresh ideas and new
avenues, coupled with human considerations in the future AmI innovation in light
of the emerging and growing body of research findings, enduring principles, per-
tinent solutions for many complex issues, unraveled intricacies, and addressed
challenges can have a significant impact on AmI-driven processes of social trans-
formation—what I identify as ‘the substantiated quintessence of AmI’. Hence, it is time to direct the effort towards new ways of thinking and striving for coherent knowledge and understanding of AmI, instead of concentrating on, and continuing to devote huge energies to, designing and building (very often reinventing the wheel) new technologies and their applications and services for enabling the visionary scenarios and making them real. Such scenarios were actually meant, when conceived by technology creators 15 years ago, to highlight the potentials and illustrate the merits of AmI technology. In particular, most of the visionary scenarios have proven to be unrealistic when compared with the reality they picture, or futuristic only to the extent that they correspond to the inspiring and aspiring AmI vision they intend to
instantiate. The whole idea is that AmI has long been driven by overblown research agendas concentrated primarily on the potential of technology and its
technical features—perhaps to serve economic and political purposes. It is time to deliver on the promises and confront the expectations with reality in the service of human and social purposes.
All the efforts being made towards a synergetic prosperity and fresh research
endeavors in AmI can be justified by the fact that, by all accounts (projects and reports, technology foresight studies, science and technology policies, research and technology development, and the design and development of new technologies), one can deduce that there is an unshakable belief in the development of technology towards AmI as an internet of things that think, with computer intelligence completely infiltrating the human environment, embedded everywhere, and minimal technical knowledge required to make use of computer technology as to functionality
and communication. Indeed, sensing and computing devices are already embedded
in many everyday objects and existing environments, and this trend will
undoubtedly continue to evolve. In particular, computing devices, which are able to
think and communicate, are becoming increasingly cheap, miniature, sophisticated,
powerful, smart, interconnected, and easy to use, thereby finding application in
virtually all aspects of people’s everyday lives. It is becoming increasingly evident
that AmI environments will be commonplace in the very near future to support living, work, learning, infotainment, and social spaces through naturalistic multimodal interaction and context-aware, personalized, adaptive, and responsive service provision.
It has been widely acknowledged that the dramatic reduction in cost and high
performance of ICT makes it accessible and widespread. That is to say, these two
factors play a key role in determining or shaping ICT use and application in each
computing era, from mainframe computing (1960–1980), through personal com-
puting (1980–1990) and multiple computing (2000 onwards), to everywhere
computing (2010 onwards). In view of this, sensing and computing devices, ubiquitous computing infrastructures, and wireless communication networks becoming technically mature and financially affordable, coupled with the rise of the internet and the emergence of the global computing trend, are laying the foundations for a number of AmI applications of varied scale, distribution, and intelligence in terms of system support and new services pertaining to everyday life as well as societal spheres. This is increasingly shaping the magnitude and massiveness of the uses of AmI. Thus, it is only a matter of the advance and prevalence of enhanced
enabling technologies and computational processes and capabilities underlying the
functioning of AmI that the AmI vision will materialize into a deployable com-
puting paradigm, if not a societal paradigm.
The construction of the AmI space is progressing on a hard-to-imagine scale.
A countless number of sensors, actuators, and computing devices (where analysis,
modeling, and reasoning occur) as key AmI technologies are being networked, and
their numbers are set to increase exponentially, by orders of magnitude towards
forming gigantic computing and networking infrastructures spread across different
geographical locations and connected by middleware architectures and global
networks. Middleware serves to link up several kinds of distributed components and
enable them to interact seamlessly across dispersed infrastructures and disparate
networks, in the midst of a variety of heterogeneous hardware and software systems
(e.g., computers, networks, applications, and services) needed for enabling smart
environments.
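Much middleware of this kind rests on a publish/subscribe pattern, whereby components interact through named topics rather than through direct references to one another. The following minimal, self-contained Python sketch illustrates the idea; the topic and component names are hypothetical, for illustration only.

```python
# A minimal, self-contained sketch of the publish/subscribe pattern that much
# middleware builds on: components communicate via named topics rather than
# direct references, decoupling heterogeneous devices and services.

from collections import defaultdict
from typing import Any, Callable, DefaultDict, List

Handler = Callable[[Any], None]


class Broker:
    """An in-process message broker: the simplest possible 'middleware'."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: Any) -> None:
        # Deliver the message to every component registered for the topic.
        for handler in self._subscribers[topic]:
            handler(message)


if __name__ == "__main__":
    broker = Broker()
    # A lighting service and a logger both react to a presence event, without
    # the sensor that publishes the event knowing that either of them exists.
    broker.subscribe("presence/livingroom", lambda m: print("lights: on for", m))
    broker.subscribe("presence/livingroom", lambda m: print("log:", m))
    broker.publish("presence/livingroom", {"person": "resident", "confidence": 0.93})
```

Production middleware typically adds discovery, persistence, and security on top of this decoupling, but the underlying principle of topic-based indirection is the same.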
At present, the environment of humans, the public and the private, is pervaded
by huge quantities of active devices of various types and forms, computerized
enough—e.g., equipped with artificially intelligent agents—to automate routine decisions and act autonomously on behalf of human agents. The increasing miniaturization of computer technology is making possible the development of miniature sensors that allow registering various human parameters without disturbing human actors, thereby enabling the inconspicuous infiltration of AmI into daily human environments. The purpose of this pervasion is to model and monitor the way
people live, through employing remote and nearby recognition systems for body
tracking, behavior monitoring, facial expressions, hand gestures, eye movements,
and voices, thanks to biometrics technology. Today, RFID tags are attached to
many objects and are expected to be entrenched in virtually all kinds of everyday
objects, with the advancement of the Internet of Things trend, handling address-
ability and traceability, monitoring and controlling devices, and automating process
controls and operative tools, and so on, on a hard-to-imagine scale. Likewise,
‘…much more than what we have done so far, and therefore AmI can no longer be
about a vision of a new world for the future, and driven by distant and overblown
research agendas focused mainly on technological features. AmI has the obligation
to start delivering valuable services’ (José et al. 2010, pp. 1497–1498). Rather, a
genuine value shift is needed to guide the evolution of AmI innovation. Aarts and
Grotenhuis (2009) underscore the need for a value shift: ‘…we need a more bal-
anced approach in which technology should serve people instead of driving them to
the max’. This argument relates to social innovation in the sense of directing the
development of new technologies towards responding to the user and social needs
and creating enduring collaborations between various stakeholders. A value shift
entails the necessity of approaching AmI in terms of a balance between conflicting
individual and social needs and impacts, rather than merely in terms of techno-
logical progress (ISTAG 2012). The underlying assumption is that failing to connect with social development is likely to result in people rejecting new technologies and in societal actors misallocating or misdirecting resources (e.g., technical R&D).
One way to achieve this objective is to view AmI development as entry into networks of social working relationships, involving technology designers, diverse classes of users, and other involved stakeholders, and what these entail in terms of codified, tacit, creative, and non-technological knowledge, which make AmI systems possible and enable them to find their way to domestication and social acceptance and subsequently to thrive. In other words, all the stakeholders involved in the value chain of AmI technology should focus on and work with how AmI, with its diverse application domains, connects to broader systems of socio-material relationships (hence the need for insights from social research) in the form of cooperatives of humans and nonhumans, through which various issues of concern can be dealt with.
Of particular significance in this regard is that, to reiterate, human values must constitute key drivers of AmI innovation and key parameters for reading everyday life patterns, an important trait of those innovators who judge the successfulness of their innovations on the basis of the extent to which they deliver real value to people and benefit them, first and foremost. Indeed, the key factors and
criteria for technology acceptance and appropriation are increasingly associated
with the way technology is aligned with human values (see José et al. 2010).
Human values form an important part of society, and guide people’s behavior in
many ways. Incorporating human values into, and bringing them to the forefront of,
the innovation process of AmI is about putting a strong emphasis on people and
their experience with technology as a view that is rather concerned with a much
broader set of issues than just ‘intelligent functionality’ and ‘intuitive usability’,
namely hedonism (pleasure, aesthetics, and sensuous gratification) as well as other
high-level values, such as self-direction (independent thought and action), crea-
tivity, ownership, freedom, privacy, and so on. Consequently, the necessity of the
AmI technological progress to be linked with human and social progress entails
changes to the context in which AmI technology creators and producers operate and
innovate. Besides, the ICT industry has to operate within the wider sociotechnical context, a networked ecosystem in which it is embedded, and thus consider the other stakeholders with their interests, meaning constructions, and notions of action.
The continued success of AmI as an ICT innovation will be based on the social
dimension of innovation and, thus, the participative and humanistic dimensions of
design—i.e., the ability and willingness of people to use or acclimatize to the
technological opportunities offered by AmI as well as their active involvement in
the design process, coupled with the consideration for human values in the fun-
damental design choices. This highlights the tremendous value of the emerging
approaches to and trends around technology design and innovation in addressing
the complexity of AmI context, enhancing related application and service devel-
opment, and even managing the unpredictable future as to emerging user behaviors
and needs in the context of AmI. Given its underpinnings—the collective interlacing of concerned people, participative and humanistic design processes, and needed technological systems and applications—social innovation is a sound and powerful way to mitigate the risk of unrealism associated with the AmI vision, and thus to work purposefully and strategically towards achieving the essence of the AmI vision,
‘…Although research aimed at improving and extending the knowledge in core sci-
entific and technology domains remains a necessity, it is at these interfaces between
scientific domains that exciting things happen…. The AmI vision should not be
‘oversold’ but neither ISTAG nor the IST research community should shrink from
highlighting the exciting possibilities that will be offered to individuals who will live
in the AmI space’ (Bold in the original). Further research should focus on providing
the knowledge that the involved societal actors will need to make informed decisions
about how to realize the AmI vision in its social context—predicated on the
assumption that it is high time for the AmI community to embrace new emerging
research trends around its core concepts and underlying assumptions.
References
Aarts E, Grotenhuis F (2009) Ambient intelligence 2.0: towards synergetic prosperity. In:
Tscheligi M, de Ruyter B, Markopoulos P, Wichert R, Mirlacher T, Meschtscherjakov A, Reitberger
W (eds) Proceedings of the European conference on ambient intelligence. Springer, Austria,
pp 1–13
Bell G, Dourish P (2007) Yesterday’s tomorrows: notes on ubiquitous computing’s dominant
vision. Pers Ubiquit Comput 11(2):133–143
Bibri SE (2014) The potential catalytic role of green entrepreneurship—technological
eco-innovations and ecopreneurs’ acts—in the structural transformation to a low-carbon or
green economy: a discursive investigation. Master's thesis, Department of Economics and
Management, Lund University
Crabtree A, Rodden T (2002) Technology and the home: supporting cooperative analysis of the
design space. In: CHI 2002, ACM Press
Gunnarsdóttir K, Arribas-Ayllon M (2012) Ambient intelligence: a narrative in search of users.
Lancaster University and SOCSI, Cardiff University, Cesagen
Hallnäs L, Redström J (2002) From use to presence: on the expressions and aesthetics of everyday computational things. ACM Trans Comput Hum Interact 9(2):106–124
ISTAG (2003) Ambient Intelligence: from vision to reality (For participation—in society &
business), viewed 23 October 2009. http://www.ideo.co.uk/DTI/CatalIST/istag–ist2003_draft_
consolidated_report.pdf
ISTAG (2012) Towards horizon 2020—recommendations of ISTAG on FP7 ICT work program
2013, viewed 15 March 2012. http://cordis.europa.eu/fp7/ict/istag/reports_en.html
José R, Rodrigues H, Otero N (2010) Ambient intelligence: beyond the inspiring vision. J Univers
Comput Sci 16(12):1480–1499
March ST, Smith GF (1995) Design and natural science research on information technology. Decis
Support Syst 15:251–266
Smith A (2003) Transforming technological regimes for sustainable development: a role for
alternative technology niches? Sci Public Policy 30(2):127–135