
Atlantis Ambient and Pervasive Intelligence

Series Editor: Ismail Khalil

Simon Elias Bibri

The Human Face of Ambient Intelligence
Cognitive, Emotional, Affective, Behavioral and Conversational Aspects
Volume 9
Atlantis Ambient and Pervasive Intelligence

Volume 9

Series editor
Ismail Khalil, Johannes Kepler University Linz, Linz, Austria
Aims and Scope of the Series

The book series ‘Atlantis Ambient and Pervasive Intelligence’ publishes high
quality titles in the fields of Pervasive Computing, Mixed Reality, Wearable
Computing, Location-Aware Computing, Ambient Interfaces, Tangible Interfaces,
Smart Environments, Intelligent Interfaces, Software Agents and other related
fields. We welcome submission of book proposals from researchers worldwide who
aim at sharing their results in this important research area.
For more information on this series and our other book series, please visit our
website at:

www.atlantis-press.com/publications/books
Atlantis Press
29, avenue Laumière
75019 Paris, France

More information about this series at www.atlantis-press.com


Simon Elias Bibri

The Human Face of Ambient Intelligence
Cognitive, Emotional, Affective, Behavioral and Conversational Aspects
Simon Elias Bibri
Halmstad University
Halmstad
Sweden

ISSN 1875-7669 ISSN 2215-1893 (electronic)


Atlantis Ambient and Pervasive Intelligence
ISBN 978-94-6239-129-1 ISBN 978-94-6239-130-7 (eBook)
DOI 10.2991/978-94-6239-130-7

Library of Congress Control Number: 2015941343

© Atlantis Press and the author(s) 2015


This book, or any parts thereof, may not be reproduced for commercial purposes in any form or by any
means, electronic or mechanical, including photocopying, recording or any information storage and
retrieval system known or to be invented, without prior permission from the Publisher.

Printed on acid-free paper


The entire effort of your mind, soul, and heart
working incessantly and together in tandem is
what it takes to become a master at any skill
and of what you love, but to nourish and
sustain virtually everything you yearn to
master takes more than just your grit and
perseverance. You need someone who, if you
are lucky enough, truly appreciates your
intellect, genuinely wants you to thrive and
prosper, heartily lifts up your spirit, and
constantly makes you feel that life is precious
and worth living because of the things that
you value and cherish the most in it—I am
privileged to be innately inspired into a quest
for the tremendous intangible possibilities
enabled by seeking, embracing, sharing,
questioning, challenging, and humanizing
knowledge.
This book is dedicated to you, Vera, for your
good nature, beautiful soul, authentic moral
fiber, and genuine intellectual curiosity.
And I want to thank you for your indirect
contribution to my most momentous,
enlightening, and memorable intellectual
journey as an episode of substantial metalevel
learning, and for your unending moral
support and being there for me when I need
you. It is so amazing to have someone to lean
on, to draw strength from, and to share
intellectual passions and daily experiences
with—I am so fortunate to have you in my life.
Preface

Aims and Major Themes

I have written this book to help you to explore ambient intelligence (AmI) in all its
complexity, intricacy, variety, and breadth, the many faces of a topical subject that
encompasses so much of modern and future life’s issues and practicalities, and can
be applied and made useful to the everyday lifeworld. Indeed, AmI technology will
pervade and impact virtually every aspect of people’s lives: home, work, learning,
social, public and infotainment environments, and on the move. This vision of a next
wave in information and communication technology (ICT) with far-reaching societal
implications is postulated to offer the possibility of a killer existence, signifying that
it will alter people’s perception of the physical and social world and thus their
notions of action in it, as well as their sense of self and the sense of their relations to
each other, things, and places. AmI is a field where a wide range of scientific and
technological areas and human-directed disciplines converge on a common vision
of the future and of the fascinating possibilities and enormous opportunities such a
future will bring and open up (as to the numerous novel applications and services that
are more intelligent and alluring with respect to interaction in both real and cyber
spaces), created by the incorporation of computer intelligence and technology into people’s
everyday lives and environments. While the boundaries to what may become
technologically feasible and what kind of impact this feasibility may have on
humans are for the future to tell, some scientists foresee an era when the pace of
technological change and its shaping influence (progress of computer intelligence
and reliance of humans on computer technology) will be so fast, profound, and
far-reaching that human existence will be irreversibly altered.
To facilitate your exploration of the realm of AmI, I have designed the
book around three related aims: to help you gain essential underpinning knowledge
and reflect on the potentials, challenges, limitations, and implications pertaining to
the realization of the AmI vision—with consideration of its revisited core notions
and assumptions; to help you develop a deeper understanding of AmI, as you make
connections between your understandings and experiences (e.g., of using computer
technology and your reliance on it in modern, high-tech society), relevant scientific
and social theories, recent research findings, and the visions and views of computer
scientists and technology creators; and, more importantly, to encourage you to take
part in an ongoing debate about AmI in the twenty-first century, examining con-
trasting perspectives on AmI across a wide range of everyday life domains and
practices.
In sum, this book offers a fresh, wide-ranging, and up-to-date approach to AmI,
combining scientific, academic, and practical relevance with critical reflection. The
latter is meant to question some underlying assumptions of AmI, to test the justi-
fication of taken-for-granted premises pertaining to AmI, or to cogitate intently
about AmI as a form of scientific knowledge in light of the grounds that support it.
The approach aims to provide fertile insights, new perspectives, and refreshing
research alternatives, aiming to contribute to bringing the field of AmI closer to
realization and delivery with concrete impact.

How Did the Book Come into Existence?

There are several factors that have stimulated my innate curiosity to jump into the
ever-evolving or blossoming field of ICT and subsequently stirred my interest in
embarking on writing this book, an intellectual journey into the modern, high-tech
world. I have always been interested in and intrigued by science, technology, and
society as fields of study. The world of science and technology (S&T) has gone
through overwhelming and fast advances that have had significant intended and
unintended effects within modern societies. My interest in exploring issues at the
intersection of those fields, in particular, stems from a deep curiosity about the
contemporary world we live in as to how it functions and the patterns of changing
directions it pursues and also from a desire to meet people from different academic
and cultural backgrounds for the purpose of social and lifelong learning as an
ongoing, voluntary, and self-motivated pursuit of knowledge for good reasons.
Having always been fascinated by the mutual process where science, technology,
and society are shaped simultaneously, I have decided to pursue a special academic
career by embarking on studying diverse subject areas, which has resulted, hitherto,
in an educational background encompassing knowledge from diverse disciplines,
ranging from computer science and engineering to social sciences and humanities.
My passion for other human-directed sciences, which are of relevance to this book,
sprouted in me around the age of fifteen when I read—first out of sheer curiosity—a
mesmerizing book on cognitive and behavioral psychology in the summer of 1988.
And this passion continues to flourish throughout my intellectual and academic
journey. In recent years, I have developed a great interest in interdisciplinary and
transdisciplinary scholarly research and academic writing. Having earned several
Master’s degrees and conducted several studies in the area of ICT, I have more
specifically become interested in topical issues pertaining to AmI, including
affective and aesthetic computing, cognitive and emotional context awareness,
natural human–computer interaction (HCI), computational sociolinguistics and
pragmatics, and interaction design, among other things.
In particular, what drew me to AmI as a distinguished example of a field that
lies at the intersection of technology, science, and society was an intrigue with
its postulation of a paradigmatic shift not only in computing but also in society.
This renders AmI a sphere of boundless knowledge, extending far beyond the ambit
of computer science and artificial intelligence to include a wide range of other
academic disciplines, ranging from human-directed scientific areas (e.g., cognitive
psychology, cognitive science, cognitive neuroscience) to social sciences (e.g.,
sociology, anthropology) and humanities (e.g., linguistics, communication studies,
philosophy). Indeed, AmI can only be fully developed by a holistic approach,
encompassing scientific, technical, and social research. Further, my interest in AmI
continues to flourish and I enjoy exploring this field. I yearn to discover further the
complexity, intricacy, and expansion of AmI, and to gain a greater understanding of
what may in the longer run determine its success as to its transformational effects on
society by fully technologizing it, that is, how the AmI vision, as it evolves, will
balance its futuristic and innovative claims against realistic assumptions.
In all, the scope of my academic and intellectual interests and the nature of the
field of AmI have had a seminal influence on my choice to undertake the chal-
lenging endeavor of exploring AmI as a new computing paradigm—with a par-
ticular emphasis on humanlike cognitive and behavioral aspects, namely context
awareness, natural interaction, conversational acts, emotional and social intelli-
gence, and affective and aesthetic interaction. Furthermore, over the past few years,
I have tremendously enjoyed the challenge of merging technological, scientific, and
social perspectives in the studies I have carried out as part of my Master studies.
Of all these, two main studies in particular inspired me to write this book:
(1) A Transdisciplinary Study on Context Awareness, Natural Interaction, and
Intelligent Behavior in Ambient Intelligence: Towards Emotionally and Cognitively
Human-inspired Computer Systems and (2) A Critical Reading of the Scholarly and
ICT Industry’s Construction of Ambient Intelligence for Societal Transformation.
I started writing this book in the fall/winter of 2012/2013, about 2 years after I
finished my fourth (research-based) Master’s degree in Computer Science with a
focus on AmI at Blekinge Institute of Technology, Sweden. The writing process
continued from then on, alongside my full-time Master studies, until June 2014. That
is to say, I was able to work on the book only during study breaks and vacations.
I can only hope the result proves worth the effort.

How Can the Book Be Read?

Providing guidelines for the reading of this book is an attempt to domesticate the
unruly readers—who learn, interpret, and respond in different ways. The intention
of this book is to explore the technological, human, and social dimensions of the
large interdisciplinary field of AmI. In the book, I demonstrate the scope and
complexity of the field by presenting and discussing different aspects of AmI as
both a computing paradigm and a new vision of ICT. This book focuses on
humanlike cognitive, emotional, social, and conversational understanding and
intelligent behavior of AmI systems in smart environments—in other words, on the
enabling technologies, processes, and capabilities that underlie the functioning of
AmI systems as a class of intelligent entities exhibiting cognitive and behavioral
patterns in the different systems and roles that they form part of within their
operating environments, where their situated forms of intelligence are supposed
to enable them to achieve a close coupling with their human and social
environment. The range of applications that relate to the scope of AmI in this book
is potentially huge in domains such as workspaces, living spaces, learning, health
care, assisted living in smart homes, and so forth. AmI applications are postulated to
be widened and deepened.

Why Does the Book Stand Out with What It Covers?

In response to the growing need for a more holistic view of AmI and a clear
collaborative approach to ICT innovation and the development of successful and
meaningful human-inspired applications, this book addresses interdisciplinary, if
not transdisciplinary, aspects of a rapidly evolving area of AmI, as a crossover
approach related to many computer science and artificial intelligence topics as well
as various human-directed sciences (namely cognitive psychology, cognitive science,
the social sciences, and the humanities). Up to now, most books about AmI have
focused their analysis only on the advancement of enabling technologies and processes
and their potential. A key feature of this book is the integration of technological, human,
social, and philosophical dimensions of AmI. In other words, its main strength lies
in the inclusiveness pertaining to the features of the humanlike understanding and
intelligent behavior of AmI systems based on the latest developments and prospects
in research and emerging computing trends and the relevant knowledge from
human and social disciplines and sub-disciplines.
No comprehensive book has, to the best of my knowledge, been produced
elsewhere with respect to covering the characteristics of the intelligent behavior of
AmI systems and environments—i.e., the cornerstones of AmI in terms of being
sensitive to users, taking care of needs, reacting and preacting intelligently to
spoken and gestured indications of desires, responding to explicit speech and
gestures as commands of control, supporting social processes and being social
agents in group interactions, engaging in intelligent dialog and mingling socially
with human users, and eliciting pleasant experiences and positive emotions in users
through the affective quality of aesthetic artifacts and environments as well as the
intuitiveness and smoothness of interaction as to computational processes and the
richness of interaction as to content information and visual tools.
In addition, this book explains AmI in a holistic approach—by which it can
indeed be fully developed and advanced, encompassing technological and societal
research. This is accomplished by amalgamating and organizing various strands of
scientific and social theories or concrete conceptual assumptions and their appli-
cability to AmI in a multifaceted, coherent, unified analysis reinforced by a
high-quality synthesis of knowledge from a large body of interdisciplinary research
on and relating to AmI. Apropos, the intent of this book in addressing interdisciplinary
and multidisciplinary aspects of AmI (as a crossover approach linking computer
science and artificial intelligence topics with human-directed disciplines such as
cognitive psychology, cognitive science, the social sciences, and the humanities, as
defined by leading scholars) is to encourage collaboration among people from these
scientific and academic disciplines and among those working on cross-connections of
AmI with these disciplines. Moreover, this book operates under the assumption that it is timely to
look back at the visionary user scenarios (portraying the bright side of life and ideal
type users in a perfect world) and the research outcomes pertaining to, and spanning
a wide variety of topics within, the field of AmI, after 15 years of substantial
research effort, and reflect on the overall achievements of this area. The underlying
premise is that the originators of the AmI vision have gained a much more thorough
understanding of the area of humanlike applications that needs to be examined to
solve current problems, as well as pertinent and well-informed solutions to several
of the specific issues involved in the realization and deployment of AmI spaces.

Who Am I Writing for?

The intended readership of the book includes students, academics, computer
and cognitive scientists, HCI designers and engineers, modelers in psychology and
linguistics, techno-social scientists, industry experts, research professionals and
leaders, and ICT professionals and entrepreneurs, whether they are new or already
working within the area of AmI. Specifically, I have written this book with two
kinds of readers in mind. First, I am writing to students taking advanced under-
graduate and postgraduate courses in computer science, artificial intelligence,
cognitive science, informatics, interaction design, software development, affective
computing, and aesthetic computing, as well as those pursuing studies in such
subject areas as ICT and society, social studies of new technologies, innovation and
entrepreneurship, and media and communication studies. I have assumed that most
of these students will already have some background in subjects related to com-
puting, human-directed scientific areas, social sciences, or humanities. Those
familiar with AmI will certainly get more out of it, and find more that appeals to
them in it, than those without that grounding. However, those with more limited
knowledge are supported with detailed explanations of key concepts and elabora-
tion on theoretical perspectives and their applicability and convergence. This is
meant to appease the uninitiated reader. Second, I hope that this book will be a useful
resource for people working on cross-connections of AmI with human-directed
scientific areas, social sciences, and humanities, and for anyone who is looking for
an accessible and essential reference on AmI in its various dimensions and from
different perspectives. In all, people in many disciplines will find the varied
coverage of the main elements that comprise the emerging field of AmI as a
socio-technological phenomenon to be of interest. My hope is that this book will be
well suited to people living in modern, high-tech societies.

Who Contributed to the Book, and What Are Its Prospects?

The book obviously benefited—indirectly—from the work of many others. As an
author, I know that I am not the exclusive originator; rather, the book is indebted to
other writings in the field of AmI that have inspired me to find an original
approach to writing a book that differs from other books on and related to AmI in
terms of the analytical approach, topicality of addressed issues, integration of major
research strands, nature of inclusiveness, diversity and richness of content, and
reflective thinking in terms of scrupulously analyzing and making judgments about
what has happened. This pertains to the research results and the overall accom-
plishments of the area of AmI. While this book has an ambitious agenda, clearly it
is not possible to deal with every aspect of AmI in a single book, nor can it cover all
of my chosen topics in equal depth. Hence, this book makes no claims that it is an
exhaustive study of the subject—but it will add great value for AmI scholars,
scientists, experts, advocates, and critics, as well as to those who are interested in
AmI as a new computing paradigm or a vision of a next wave in ICT, including
students, intellectuals, and academics from outside the field of AmI.
Lastly, I believe that I have achieved an important objective with this book—that
is, creating a valuable resource for the AmI community. I also believe that there is a
real need for a comprehensive book on AmI—a blossoming field that cuts across
several academic and scientific disciplines. Therefore, I hope that this book will be
enlightening, thought-provoking, and, more importantly, good reading for the target
audience, and that the first edition will eventually be well received.
Acknowledgments

I would like to express my sincerest gratitude to Per Flensburg (West University)
and Sara Eriksén (Blekinge Institute of Technology), professors of computer
science, for their unfailing encouragement. They are the best professors I have ever
had in my long academic journey. They are amazingly wonderful and incredibly
thoughtful people. They inspired me to be an academic author. There are so many
other great things about them. My sincerest appreciation is extended to Kamal
Dissaoui, General Director of EMSI School, for inspiring and encouraging me to
pursue studies in computer science and engineering.
I would like to take this opportunity to express particular thanks to Zeger
Karssen, Publishing Director of Atlantis Press, for his professionalism and
understanding; his effective communication style and great commitment to
developing strong partnerships with authors made interacting with him a
delight and choosing Atlantis-Springer Press worthwhile.
Special and profound gratitude goes to my beloved sister, Amina, for her
wholehearted love, immeasurable moral support, and unwavering encouragement.
She has long been a restorative counterbalance in my life.

Contents

1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ... 1
1.1 The Many Faces of AmI . . . . . . . . . . . . . . . . . . . . . . . . ... 1
1.1.1 The Morphing Power, Constitutive Force,
and Disruptive Nature of AmI as ICT Innovations ... 1
1.1.2 Foundational and Defining Characteristics of AmI ... 4
1.1.3 The Essence of the (Revisited) AmI Vision . . . . . ... 4
1.1.4 AmI as a Novel Approach to Human–Machine
Interaction and a World of Machine Learning. . . . ... 5
1.1.5 Human-Inspired Intelligences in AmI Systems . . . ... 6
1.1.6 Human-like Cognitive, Emotional, Affective,
Behavioral, and Conversational Aspects of AmI . . ... 8
1.1.7 Context Awareness and Natural Interaction
as Computational Capabilities
for Intelligent Behavior . . . . . . . . . . . . . . . . . . . ... 8
1.1.8 Situated Forms of Intelligence as an Emerging
Trend in AmI Research and Its Underlying
Premises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.1.9 Underpinnings and Open Challenges and Issues . . . . . 12
1.2 The Scope and Twofold Purpose of the Book. . . . . . . . . . . . . 15
1.3 The Structure of the Book and Its Contents . . . . . . . . . . . . . . 16
1.4 Research Strategy: Interdisciplinary and Transdisciplinary
Approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ... 18
References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ... 19


Part I Enabling Technologies and Computational Processes
and Capabilities

2 Ambient Intelligence: A New Computing Paradigm
and a Vision of a Next Wave in ICT . . . . . . . . . . . . . . . . . . . 23
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ... 23
2.2 The Origin and Context of the AmI Vision . . . . . . . . . . . ... 25
2.3 The Current Status, Unrealism, and Technological
Determinism of the AmI Vision . . . . . . . . . . . . . . . . . . . . . . 27
2.4 AmI Versus UbiComp as Visions . . . . . . . . . . . . . . . . . . . . . 30
2.5 AmI Versus UbiComp as Concepts . . . . . . . . . . . . . . . . . . . . 32
2.6 UbiComp and AmI: Definitional Issues . . . . . . . . . . . . . . . . . 33
2.7 More to the Characterizing Aspects of AmI . . . . . . . . . . . . . . 35
2.8 Typologies for AmI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
2.9 Paradigmatic, Non-paradigmatic, Pre-paradigmatic,
and Post-paradigmatic Dimensions of AmI. . . . . . . . . . . . ... 39
2.9.1 ICT and Computing . . . . . . . . . . . . . . . . . . . . . ... 39
2.9.2 Paradigm and Paradigm Shift . . . . . . . . . . . . . . . ... 40
2.9.3 Computing Paradigm and AmI as an Instance
of a New Computing Paradigm. . . . . . . . . . . . . . ... 41
2.9.4 AmI as a Paradigmatic Shift in Computing . . . . . ... 43
2.9.5 Non-paradigmatic Aspects of AmI . . . . . . . . . . . ... 45
2.9.6 Pre-paradigmatic and Post-paradigmatic
Aspects of AmI . . . . . . . . . . . . . . . . . . . . . . . . ... 46
2.10 Technological Factors Behind the AmI Vision . . . . . . . . . ... 47
2.11 Research Topics in AmI . . . . . . . . . . . . . . . . . . . . . . . . ... 50
2.11.1 Computer Science, Artificial Intelligence,
and Networking . . . . . . . . . . . . . . . . . . . . . . . . ... 50
2.11.2 Middleware Infrastructure . . . . . . . . . . . . . . . . . ... 51
2.12 Human-Directed Sciences and Artificial Intelligence
in AmI: Disciplines, Fields, Relationships,
and Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
2.12.1 Cognitive Psychology . . . . . . . . . . . . . . . . . . . . . . . 52
2.12.2 Cognitive Science . . . . . . . . . . . . . . . . . . . . . . . . . . 53
2.12.3 Artificial Intelligence (AI) . . . . . . . . . . . . . . . . . . . . 54
2.12.4 Relationship Between Cognitive Psychology,
Cognitive Science, and AI . . . . . . . . . . . . . . . . . ... 55
2.12.5 Contributions of Cognitive Disciplines
and Scientific Areas to AmI . . . . . . . . . . . . . . . . . . . 57
2.12.6 Neuroscience and Cognitive Neuroscience . . . . . . . . . 59
2.12.7 Linguistics: Single and Interdisciplinary Subfields. . . . 60
2.12.8 Human Communication . . . . . . . . . . . . . . . . . . . . . . 60
2.12.9 Philosophy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
2.12.10 Sociology and Anthropology (Social, Cultural,
and Cognitive) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63

3 Context and Context Awareness of Humans and AmI Systems:
Characteristics and Differences and Technological Challenges
and Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
3.2 Context from a Transdisciplinary Perspective . . . . . . . . . . . . . 69
3.3 Context (and Context Awareness) in Human Interaction . . . . . 71
3.4 Definitional Issues of Context and Their Implications
for Context-Aware Computing . . . . . . . . . . . . . . . . . . . . . . . 73
3.5 Conceptual Versus Technical Definitions of Context . . . . . . . . 75
3.6 Definition of Context Awareness . . . . . . . . . . . . . . . . . . . . . 77
3.7 Context Taxonomy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
3.8 Interactivity Levels of Context-Aware Applications . . . . . . . . . 81
3.9 Context-Aware Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
3.9.1 Technological Dimensions and Developments
and Application Domains. . . . . . . . . . . . . . . . . . . .. 82
3.9.2 There Is Much More to Context than the Physical
Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 83
3.9.3 Cognitive and Emotional Context-Aware
Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 85
3.9.4 Common Examples of Context-Aware
Applications and Services: Mobile Computing . . . . .. 86
3.10 Context Awareness: Challenges and Open Issues . . . . . . . . .. 88
3.11 Context and Situation . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 89
3.12 Individual and Sociocultural Meaning of Context
and Situation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 91
3.13 Situated Cognition, Action, and Intelligence . . . . . . . . . . . . .. 92
3.14 Context Inference, Ready-Made Behavior,
and Action Negotiation . . . . . . . . . . . . . . . . . . . . . . . . . . .. 93
3.15 Situation and Negotiation. . . . . . . . . . . . . . . . . . . . . . . . . .. 95
3.16 Operationalizing Context: Simplifications, Limitations,
and Challenges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
3.17 Evaluation of Context-Aware Artifacts. . . . . . . . . . . . . . . . . . 103
3.17.1 Constructs, Methods, Models, and Instantiations . . . . . 103
3.17.2 Evaluation Challenges . . . . . . . . . . . . . . . . . . . . . . . 107
3.18 Design of Context-Aware Applications
and User Participation . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 109
3.18.1 Major Phase Shifts and Design Methods . . . . . . . . .. 109
3.18.2 The Notion of Participation . . . . . . . . . . . . . . . . . .. 110
3.18.3 Participatory Design (PD): The Origin
of User Participation . . . . . . . . . . . . . . . . . . . . . . .. 111
3.18.4 User-Centered Design (UCD) . . . . . . . . . . . . . . . 112
3.18.5 User-Centrality in AmI . . . . . . . . . . . . . . . . . . 113
3.18.6 The Impoverishment of User Participation
and the Loss of Its Political Connotation . . . . . . . ... 114
3.18.7 Realities and Contradictions of User Participation
in Context-Aware Computing . . . . . . . . . . . . . . . ... 116
3.19 Empowering Users and Exposing Ambiguities:
Boundaries for Developing Critical User Participatory
Context-Aware Applications. . . . . . . . . . . . . . . . . . . . . . ... 118
References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ... 123

4 Context Recognition in AmI Environments: Sensor
and MEMS Technology, Recognition Approaches,
and Pattern Recognition Methods . . . . . . . . . . . . . . . . . . . . 129
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
4.2 Sensor Technology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
4.2.1 Sensor Definition and Sensor Types . . . . . . . . . . . . . 131
4.2.2 Sensor Information and Diversity of Sensing
Areas in Context-Aware Systems . . . . . . . . . . . . . . . 131
4.2.3 Emerging Trends in Sensor Technology. . . . . . . . . . . 132
4.3 Miniaturization Trend in AmI. . . . . . . . . . . . . . . . . . . . . . . . 133
4.3.1 Miniature System Devices and Their Potential . . . . . . 133
4.3.2 Early Dust, Skin, and Clay Projects. . . . . . . . . . . . . . 134
4.4 MEMS Technology. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
4.4.1 Defining Characteristics of MEMS . . . . . . . . . . . . . 136
4.4.2 Large Scale Integrated MEMS . . . . . . . . . . . . . . . . . 137
4.4.3 Potentials and Advantages . . . . . . . . . . . . . . . . . . . . 139
4.4.4 Technical and Theoretical Issues and Challenges . . . . 141
4.5 MEMS and Multi-sensor Fusion and Context-Aware
and Affective Computing . . . . . . . . . . . . . . . . . . . . . . . . . .. 143
4.6 Multi-sensor Based Context Awareness . . . . . . . . . . . . . . . .. 145
4.6.1 Multi-sensor Data Fusion and Its Application
in Context-Aware Systems . . . . . . . . . . . . . . . . . . .. 145
4.6.2 Layered Architecture for Emotional (and Cognitive)
Context Awareness . . . . . . . . . . . . . . . . . . . . . . . .. 146
4.6.3 Visual Approach to (Emotional) Context Reading . . .. 149
4.7 Research in Emotional and Cognitive Context Awareness . . .. 150
4.8 Multi-sensor Fusion for Multimodal Recognition
of Emotional States in Affective Computing. . . . . . . . . . . . .. 151
4.9 Multi-sensor Systems: Mimicking the Human Cognitive
Sensation and Perception Processes . . . . . . . . . . . . . . . . . . .. 153
4.10 The State-of-the-Art Context Recognition. . . . . . . . . . . . . . .. 158
4.10.1 Context Recognition Process . . . . . . . . . . . . . . . . .. 159
4.10.2 Movement Capture Technologies and Recognition
Approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 160
4.10.3 Context Recognition Techniques, Models,
and Algorithms. . . . . . . . . . . . . . . . . . . . . . . . 167
4.10.4 Uncertainty in Context-Aware Computing . . . . . . . . . 180
4.10.5 Basic Architecture of Context Information Collection,
Fusion, and Processing . . . . . . . . . . . . . . . . . . . . . . 185
References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189

5 Context Modeling, Representation, and Reasoning:
An Ontological and Hybrid Approach . . . . . . . . . . . . . . . . . . . 197
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
5.2 Evolution of Context Modeling and Reasoning . . . . . . . . . . . . 199
5.3 Requirements for Context Representation and Reasoning . . . . . 201
5.4 Representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
5.4.1 Unique Identifiers . . . . . . . . . . . . . . . . . . . . . . . . . . 201
5.4.2 Validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
5.4.3 Expressiveness . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202
5.4.4 Simplicity, Reuse, and Expandability. . . . . . . . . . . . . 202
5.4.5 Uncertainty and Incomplete Information . . . . . . . . . . 202
5.4.6 Generality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
5.5 Reasoning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
5.5.1 Efficiency, Soundness, and Completeness . . . . . . . . . 203
5.5.2 Multiple Reasoning/Inference Methods . . . . . . . . . . . 203
5.5.3 Interoperability . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
5.6 Requirement for Generic Context Models . . . . . . . . . . . . . . . 204
5.6.1 Heterogeneity and Mobility . . . . . . . . . . . . . . . . . . . 204
5.6.2 Relationships and Dependencies . . . . . . . . . . . . . . . . 205
5.6.3 Timeliness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205
5.6.4 Imperfection. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205
5.6.5 Reasoning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205
5.6.6 Usability of Modeling Formalisms . . . . . . . . . . . . . . 206
5.6.7 Efficient Context Provisioning . . . . . . . . . . . . . . . . . 206
5.7 Context Models in Context-Aware Computing:
Ontological Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
5.7.1 Origin and Definitional Issues of Ontology . . . . . . . . 208
5.7.2 Key Characteristics and Fundamentals of Ontology . . . 210
5.7.3 Ontology Components . . . . . . . . . . . . . . . . . . . . . . . 212
5.7.4 Ontological Context Modeling . . . . . . . . . . . . . . . . . 213
5.7.5 Ontological Context Reasoning. . . . . . . . . . . . . . . . . 219
5.7.6 OWL-Based Context Models: Examples
of Architectures for Context Awareness . . . . . . . . . .. 222
5.7.7 Key Components, Features, and Issues
of Architectures of Context-Aware Systems . . . . . . .. 223
5.7.8 Three-Layer Architecture of Context Abstraction . . .. 225
5.8 Hybrid Context Models . . . . . . . . . . . . . . . . . . . . . . . 227
5.8.1 Examples of Projects Applying Hybrid Approach
to Representation and/or Reasoning. . . . . . . . . . . . . . 228
5.8.2 Towards a Hierarchical Hybrid Model . . . . . . . . . . . . 231
5.8.3 Limitations of Hybrid Context Models. . . . . . . . . . . . 232
5.9 Modeling Emotional and Cognitive Contexts or States. . . . . . . 234
5.10 Examples of Ontology Frameworks: Context-Aware
and Affective Computing . . . . . . . . . . . . . . . . . . . . . . . . ... 236
5.10.1 AmE Framework: A Model
for Emotion-Aware AmI . . . . . . . . . . . . . . . . . . ... 236
5.10.2 Domain Ontology of Context-Aware Emotions . . . ... 238
5.10.3 Cognitive Context-Aware System:
A Hybrid Approach to Context Modeling . . . . . . ... 240
5.11 Key Benefits of Context Ontologies: Representation
and Reasoning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ... 244
5.12 Context Ontologies: Open Issues and Limitations . . . . . . . ... 245
5.13 Context Models Limitations, Inadequacies, and Challenges ... 247
5.13.1 Technology-Driven and Oversimplified
Context Models . . . . . . . . . . . . . . . . . . . . . . . . . . . 247
5.13.2 Context Models as User Groups Models . . . . . . . . . . 250
5.14 Holistic Approach to Context Models . . . . . . . . . . . . . . . . 251
References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 252

6 Implicit and Natural HCI in AmI: Ambient and Multimodal
User Interfaces, Intelligent Agents, Intelligent Behavior,
and Mental and Physical Invisibility . . . . . . . . . . . . . . . . . . 259
6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .... 259
6.2 Definitional Issues, Research Topics, and Shifts in HCI . . .... 260
6.3 HCI Design Aspects: Usability, Functionality, Aesthetics,
and Context Appropriateness . . . . . . . . . . . . . . . . . . . . . . . . 261
6.4 Computer User Interfaces. . . . . . . . . . . . . . . . . . . . . . . . . . . 263
6.4.1 Key Characteristics . . . . . . . . . . . . . . . . . . . . . . . . . 263
6.4.2 Explicit HCI Characterization . . . . . . . . . . . . . . . . . . 264
6.4.3 Explicit HCI Issues . . . . . . . . . . . . . . . . . . . . . . . . . 264
6.5 The New Paradigm of Implicit HCI (iHCI) . . . . . . . . . . . . . . 266
6.5.1 Internal System Properties of iHCI . . . . . . . . . . . . . . 266
6.5.2 iHCI Characterization . . . . . . . . . . . . . . . . . . . . . . . 267
6.5.3 Analyzing iHCI: Basic Issues . . . . . . . . . . . . . . . . . . 269
6.6 Natural Interaction and User Interfaces . . . . . . . . . . . . . . . . . 270
6.6.1 Application Domains: Context-Aware, Affective,
Touchless, and Conversational Systems . . . . . . . .... 270
6.6.2 Naturalistic User Interfaces (NUIs) . . . . . . . . . . .... 272
6.6.3 Multimodality and Multi-channeling in Human
Communication . . . . . . . . . . . . . . . . . . . . . . . .... 273
6.6.4 Multimodal Interaction and Multimodal
User Interfaces . . . . . . . . . . . . . . . . . . . . . . . . 273
6.6.5 Context Awareness, Multimodality, Naturalness,
and Intelligent Communicative Behavior in Human
Communication: A Synergic Relationship . . . . . . . .. 274
6.7 Intelligence and Intelligent Agents . . . . . . . . . . . . . . . . . . .. 276
6.7.1 Intelligent Agents in AI and Related Issues . . . . . . .. 277
6.7.2 Intelligent Agents in AmI and Related Issues:
Context-Aware Systems. . . . . . . . . . . . . . . . . . . . .. 282
6.8 Personalized, Adaptive, Responsive, and Proactive
Services in AmI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285
6.8.1 Personalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285
6.8.2 Adaptation and Responsiveness . . . . . . . . . . . . . . . . 288
6.8.3 Anticipation (and Proactiveness) . . . . . . . . . . . . . . . . 292
6.9 Invisible, Disappearing, or Calm Computing . . . . . . . . . . . . . 295
6.9.1 Characterization and Definitional Issues. . . . . . . . . . . 295
6.9.2 Mental Versus Physical Invisibility
and Related Issues . . . . . . . . . . . . . . . . . . . . . . . .. 297
6.9.3 Invisibility in Context-Aware Computing . . . . . . . . .. 303
6.9.4 Delegation of Control, Reliability, Dependability
in AmI: Social Implications . . . . . . . . . . . . . . . . . .. 303
6.9.5 Misconceptions and Utopian Assumptions . . . . . . . .. 306
6.9.6 Challenges, Alternative Avenues,
and New Possibilities . . . . . . . . . . . . . . . . . . . . . . . 308
6.10 Challenges to Implicit and Natural HCI . . . . . . . . . . . . . . . . . 311
6.11 Interdisciplinary and Transdisciplinary Research . . . . . . . . . . . 314
References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 314

Part II Human-Inspired AmI Applications

7 Towards AmI Systems Capable of Engaging
in ‘Intelligent Dialog’ and ‘Mingling Socially with Humans’ . . . . . 321
7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321
7.2 Perspectives and Domains of Communication. . . . . . . . . . . . . 323
7.3 Human Communication . . . . . . . . . . . . . . . . . . . . . . . . . . . . 323
7.3.1 Nonverbal Communication. . . . . . . . . . . . . . . . . . . . 325
7.3.2 Verbal Communication: A Theoretical Excursion
in Linguistics and Its Subfields. . . . . . . . . . . . . . . .. 336
7.4 Computational Linguistics and Relevant Areas of Discourse:
Structural Linguistics, Linguistic Production,
and Linguistic Comprehension . . . . . . . . . . . . . . . . . . . . . .. 352
7.5 Speech Perception and Production: Key Issues and Features. .. 354
7.5.1 The Multimodal Nature of Speech Perception. . . . . .. 354
7.5.2 Vocal-Gestural Coordination and Correlation
in Speech Communication . . . . . . . . . . . . . . . . . . .. 358
7.6 Context in Human Communication . . . . . . . . . . . . . . . . . . .. 361
7.6.1 Multilevel Context Surrounding Spoken Language
(Discourse) . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 362
7.6.2 Context Surrounding Nonverbal Communication
Behavior . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 364
7.7 Modalities and Channels in Human Communication . . . . . . . . 365
7.8 Conversational Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . 366
7.8.1 Key Research Topics. . . . . . . . . . . . . . . . . . . . . . . . 366
7.8.2 Towards Believable ECAs . . . . . . . . . . . . . . . . . . . . 367
7.8.3 Embodied Conversational Agents (ECAs) . . . . . . . . . 367
7.8.4 Research Endeavor and Collaboration
for Building ECAs . . . . . . . . . . . . . . . . . . . . . . . .. 368
7.8.5 SAIBA (Situation, Agent, Intention, Behavior,
Animation) Framework . . . . . . . . . . . . . . . . . . . . .. 369
7.8.6 Communicative Function Versus Behavior
and the Relationship . . . . . . . . . . . . . . . . . . . . . . .. 370
7.8.7 Taxonomy of Communicative Functions
and Related Issues . . . . . . . . . . . . . . . . . . . . . . . .. 372
7.8.8 Deducing Communicative Functions
from Multimodal Nonverbal Behavior
Using Context . . . . . . . . . . . . . . . . . . . . . . . . . . .. 374
7.8.9 Conversational Systems and Context . . . . . . . . . . . .. 375
7.8.10 Basic Contextual Components in the (Extended)
SAIBA Framework . . . . . . . . . . . . . . . . . . . . . . . .. 376
7.8.11 The Role of Context in the Disambiguation
of Communicative Signals . . . . . . . . . . . . . . . . . . .. 377
7.8.12 Context or Part of the Signal . . . . . . . . . . . . . . . . .. 379
7.8.13 Contextual Elements for Disambiguating
Communicative Signals . . . . . . . . . . . . . . . . . . . . .. 380
7.8.14 Modalities and Channels and Their Impact
on the Interpretation of Utterances and Emotions . . .. 381
7.8.15 Applications of SAIBA Framework: Text-
and Speech-Driven Facial Gestures Generation . . . . .. 383
7.8.16 Towards Full Facial Animation. . . . . . . . . . . . . . . .. 386
7.8.17 Speech-Driven Facial Gestures Based on HUGE
Architecture: an ECA Acting as a Presenter . . . . . . .. 387
7.9 Challenges, Open Issues, and Limitations. . . . . . . . . . . . . . .. 389
References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 393
8 Affective Behavioral Features of AmI: Affective Context-Aware,
Emotion-Aware, Context-Aware Affective, and Emotionally
Intelligent Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . 403
8.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 403
8.2 Emotion. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 405
8.2.1 Definitional Issues . . . . . . . . . . . . . . . . . . . . . . . . . 405
8.2.2 Componential Patterning Approach . . . . . . . . . . . . . . 406
8.2.3 Motivation and Its Relationship to Emotion . . . . . . . . 407
8.2.4 Theoretical Models of Emotions: Dimensional,
Appraisal, and Categorical Models . . . . . . . . . . . . . . 409
8.2.5 Emotion Classification. . . . . . . . . . . . . . . . . . . . . . . 410
8.2.6 Affect Display . . . . . . . . . . . . . . . . . . . . . . . . . . . . 410
8.2.7 A Selection of Relevant Studies . . . . . . . . . . . . . . . . 411
8.3 Emotional Intelligence: Definitional Issues and Models . . . . . . 412
8.4 Affective Computing and AmI Computing . . . . . . . . . . . . . . . 414
8.4.1 Understanding Affective Computing . . . . . . . . . . . . . 414
8.4.2 Examples of the State-of-the-Art Application
Projects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 415
8.4.3 Integration of Affective and AmI Computing:
Advancing Emotional Context-Aware Systems . . . . . . 416
8.4.4 More Contributions of Affective Computing
to AmI Computing . . . . . . . . . . . . . . . . . . . . . . . . . 417
8.4.5 Emotional Intelligence in Affective Computing
and Affective AmI . . . . . . . . . . . . . . . . . . . . . . . . . 418
8.4.6 Context in Affective Computing:
Conversational and Emotional Intelligent Systems. . . . 419
8.4.7 Emotions in AmI Research . . . . . . . . . . . . . . . . . . . 420
8.5 Affective and Context-Aware Computing and Affective
Display . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 421
8.5.1 Context and Multimodal Recognition . . . . . . . . . . . . 421
8.5.2 Recognizing Affect Display and Other Emotional
Cues in Affective and Context-Aware
HCI Applications . . . . . . . . . . . . . . . . . . . . . . . . . . 423
8.5.3 Studies on Emotion Recognition: Classification
and Issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 424
8.6 Areas of Affective Computing . . . . . . . . . . . . . . . . . . . . . . . 425
8.6.1 Facial, Prosodic, and Gestural Approaches
to Emotion. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 425
8.6.2 A Linguistic Approach to Emotion: Emotiveness . . . . 426
8.7 Facial Expressions and Computing . . . . . . . . . . . . . . . . . . . . 427
8.7.1 Facial Expressions: Theoretical Perspectives. . . . . . . . 428
8.7.2 Recognizing Emotion from Facial Expressions:
Humans and HCI Applications . . . . . . . . . . . . . . . . . 429
8.7.3 Research Endeavors in Facial Expression
Recognition in HCI. . . . . . . . . . . . . . . . . . . . . . . 430
8.7.4 The Common Three-Phase Procedure
of Facial Expression Recognition . . . . . . . . . . . .... 431
8.8 Approaches, Frameworks, and Applications . . . . . . . . . . .... 433
8.8.1 Towards Context-Aware Affective AmI Systems:
Computing Contextual Appropriateness
of Affective States . . . . . . . . . . . . . . . . . . . . . .... 434
8.8.2 Multimodal Context-Aware Affective Interaction .... 435
8.8.3 Emotion-Aware AmI . . . . . . . . . . . . . . . . . . . . .... 435
8.9 Socially Intelligent AmI Systems: Visual, Aesthetic,
Affective, and Cognitive Aspects . . . . . . . . . . . . . . . . . .... 436
8.10 Evaluation of AmI Systems in Real-World Settings:
Emotions and User Experience . . . . . . . . . . . . . . . . . . . .... 440
8.11 Issues, Limitations, and Challenges . . . . . . . . . . . . . . . . .... 444
8.11.1 Application of Ability EIF and the Issue
of Complexity . . . . . . . . . . . . . . . . . . . . . . . . .... 444
8.11.2 Debatable Issues of Emotions in Affective
Computing and AmI . . . . . . . . . . . . . . . . . . . . .... 445
8.11.3 Interpretative and Cultural Aspects of Emotions . .... 447
8.11.4 The Link Between Facial Expressions
and Emotions: Controversies and Intricacies . . . . .... 448
8.11.5 The Significance of the Identification
of the Intention of Emotions. . . . . . . . . . . . . . . .... 449
8.11.6 The Impact of Multimodality on Emotion
Meaning and Interpretation . . . . . . . . . . . . . . . .... 451
References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .... 453

9 The Cognitively Supporting Behavior of AmI Systems:
Context Awareness, Explicit Natural (Touchless) Interaction,
Affective Factors and Aesthetics, and Presence. . . . . . . . . . . . . 461
9.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 461
9.2 The Usage of the Term ‘Cognition’ in Cognitive
Psychology and Cognitive Science . . . . . . . . . . . . . . . . . . . . 463
9.3 Cognitive/Mental Processes . . . . . . . . . . . . . . . . . . . . . . . . . 464
9.4 Cognitive Context-Aware Computing . . . . . . . . . . . . . . . . . . 465
9.4.1 Internal and External Context . . . . . . . . . . . . . . . . . . 465
9.4.2 Cognitive Context Awareness. . . . . . . . . . . . . . . . . . 466
9.4.3 Methods for Capturing Cognitive Context . . . . . . . . . 468
9.4.4 Application Areas of Cognitive Context Awareness. . . 469
9.4.5 Eye Gaze and Facial Expressions: Cognitive
Context That Appears Externally . . . . . . . . . . . . . .. 472
9.4.6 Challenges and Limitations . . . . . . . . . . . . . . . . . .. 475
9.5 New Forms of Explicit Input and Challenges . . . . . . . . . . . . . 477
9.5.1 Speech, Eye Gaze, Facial Expressions,
and Gestures . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 477
9.6 The Relationship Between Aesthetics, Affect,
and Cognition in AmI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 481
9.6.1 Affect and Related Concepts and Theories . . . . . . . . . 481
9.6.2 Aesthetics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 483
9.6.3 Artifact Experience Versus Aesthetic Experience . . . . 485
9.6.4 Appraisal Theory: Emotional Response
to the External Environment. . . . . . . . . . . . . . . . . .. 487
9.6.5 Aesthetics and Affect in AmI Design
and Use Context . . . . . . . . . . . . . . . . . . . . . . . . . .. 487
9.6.6 The Evolving Affective-Ambient-Aesthetic
Centric Paradigm . . . . . . . . . . . . . . . . . . . . . . . . .. 490
9.6.7 Affect and Cognition in the AmI Use Context . . . . .. 491
9.6.8 Relationship Between Affect, Mood,
and Cognition . . . . . . . . . . . . . . . . . . . . . . . . . . .. 492
9.6.9 Creativity and the Relationship Between
Affect and Creative Cognition or Thought . . . . . . . .. 494
9.6.10 The Effect of Aesthetics and Intelligent
Behavior of AmI Systems on Mood and Immersion . .. 496
9.7 Presence in Computing and AmI . . . . . . . . . . . . . . . . . . . .. 497
9.7.1 Definitions of Presence . . . . . . . . . . . . . . . . . . . . .. 497
9.7.2 Expanding and Reconfiguring the Concept
of Presence in AmI . . . . . . . . . . . . . . . . . . . . . . . . . 499
9.7.3 Interdisciplinary Research in Presence . . . . . . . . . . . . 500
9.7.4 Challenges to Presence in AmI . . . . . . . . . . . . . . . . . 502
References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 504

Part III Conclusion

10 Concluding Remarks, Practical and Research Implications,
and Reflections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 513
10.1 A Comprehensive Design Approach to AmI Systems . . . .... 513
10.2 The Need for Interdisciplinary Research . . . . . . . . . . . . .... 514
10.3 Revisiting the AmI Vision—Rethinking the Notion
of Intelligence—and Fresh Possibilities and Opportunities .... 515
10.4 The Inconspicuous, Rapid Spreading of AmI Spaces . . . . .... 517
10.5 Future Avenues for AmI Technology Development:
A General Perspective . . . . . . . . . . . . . . . . . . . . . . . . . .... 519
10.6 The Seminal Role of Social Innovation and Participative
and Humanistic Design in the Sustainability
of AmI Technology. . . . . . . . . . . . . . . . . . . . . . . . . . . .... 521
References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .... 523
List of Figures

Figure 2.1 Ambient intelligence system. Source Gill
and Cormican (2005) . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
Figure 4.1 Smart dust. Source Kahn et al. (1999) . . . . . . . . . . . . . . .. 135
Figure 4.2 Flip-chip monolithic MEMS with actuators
and sensors. Source Lyshevski (2001) . . . . . . . . . . . . . . .. 137
Figure 4.3 High-level functional block diagram of large-scale
MEMS with rotational and translational actuators
and sensors. Source Lyshevski (2001) . . . . . . . . . . . . . . .. 138
Figure 4.4 Use of multiple, diverse sensors for emotional,
cognitive, and situational context awareness . . . . . . . . . . .. 146
Figure 4.5 Layered architecture for abstraction from raw sensor
data to multi-sensor based emotional context . . . . . . . . . . .. 147
Figure 4.6 Context feature space. Source Schmidt et al. (1999) . . . . . .. 169
Figure 4.7 Framework to combine the ingredients.
Source Bosse et al. (2007) . . . . . . . . . . . . . . . . . . . . . . .. 171
Figure 4.8 Context awareness as an adaptive process where
the system incrementally creates a model of the world
it observes. Source Adapted from Van Laerhoven and
Gellersen (2001) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 173
Figure 4.9 Basic multilayered architecture underlying context
information processing . . . . . . . . . . . . . . . . . . . . . . . . . .. 187
Figure 5.1 Overview of the different layers of semantic context
interpretation and abstraction. Source Bettini et al. (2008) . .. 226
Figure 5.2 Context reasoning architecture. Source Lassila and
Khushraj (2005). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 230
Figure 5.3 Multilayer framework. Source Adapted from
Bettini et al. (2010) . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 231
Figure 5.4 The ambient intelligence framework.
Source Zhou et al. (2007) . . . . . . . . . . . . . . . . . . . . . . . .. 237
Figure 5.5 Relationship among modules in the domain ontology
of emotional concepts. Source Cearreta et al. (2007) . . . . .. 239


Figure 5.6 Context inference and service recommendation
and procedure. Source Kim et al. (2007). . . . . . . . . . . . . . . . . 242
Figure 5.7 Prototype framework. Source Kim et al. (2007) . . . . . . . . .. 242
Figure 6.1 Simple reflex agent. Source Russell and Norvig (2003) . . .. 279
Figure 6.2 Model-based reflex agent. Source Russell
and Norvig (2003) . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 279
Figure 6.3 Model-based, goal-oriented agent. Source Russell
and Norvig (2003) . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 280
Figure 6.4 General learning agent. Source Russell and Norvig (2003). .. 281
Figure 6.5 Utility-based agent. Source Russell and Norvig (2003) . . . .. 282
Figure 7.1 The four main types of auditory-visual fusion models.
Source Schwartz et al. (1998) and Robert-Ribes (1995) . . .. 357
Figure 7.2 The SAIBA framework for multimodal behavior,
showing how the overall process consists of three
sub-processes at different levels of abstraction, starting
with communication intent and ending in actual
realization in the agent’s embodiment.
Source Vilhjálmsson (2009) . . . . . . . . . . . . . . . . . . . . . .. 370
Figure 7.3 Rules that map functions to behavior assume a certain
context like the social situation and culture.
Source Vilhjálmsson (2009) . . . . . . . . . . . . . . . . . . . . . .. 373
Figure 7.4 Communicative function annotated in a real-time chat
message helps produce an animated avatar that
augments the delivery. Source Vilhjálmsson (2009) . . . . . .. 384
Figure 7.5 An embodied conversational agent architecture where
the central decision module only deals with an abstract
representation of intent. Source Vilhjálmsson (2009) . . . . .. 385
Figure 7.6 Universal architecture of HUGE system adapted to
audio data as inducement. Source Zoric et al. (2009) . . . . .. 388
Figure 7.7 From left to right: neutral pose, eyebrow movement,
head movement, and eye blink.
Source Zoric et al. (2009) . . . . . . . . . . . . . . . . . . . . . . . .. 388
Figure 8.1 Example figure of SAM: the arousal dimension.
Source Desmet (2002) . . . . . . . . . . . . . . . . . . . . . . . . . .. 409
Figure 8.2 The six universal facial expressions.
Source Kanade et al. (2000) . . . . . . . . . . . . . . . . . . . . . .. 431
List of Tables

Table 4.1 Sensor technologies. . . . . . . . . . . . . . . . . . . . . . 132
Table 4.2 Real-world situations related to sensor data . . . . . . . . . 146
Table 7.1 Interaction function, content functions, and mental
states and attitude functions . . . . . . . . . . . . . . . . ........ 374
Table 8.1 Structure of emotion in English conversation . . . . ........ 427
Table 9.1 Using context metadata to retrieve documents . . . ........ 470

About the Author

Simon E. Bibri is a Research Associate at the School of Business and Engineering,
Halmstad University. He has a true passion for academic and lifelong learning and a
natural thirst for knowledge. Having above all been intrigued by the relationship
between scientific knowledge, technological systems, and society, he has wittingly
and voluntarily chosen to pursue an unusual academic journey by embarking on
studying a diverse range of subjects—at the intersection of science, technology, and
society. His intellectual pursuits and endeavors have resulted, hitherto, in an edu-
cational background encompassing knowledge from, and meta-knowledge about,
different academic disciplines. He holds a Bachelor of Science in computer engi-
neering with a major in ICT strategic management, a research-based Master of
Science in computer science with a focus on Ambient Intelligence and ICT for
sustainability, a Master of Science in computer science with a major in informatics,
a Master of Science in entrepreneurship and innovation, a Master of Science in
strategic leadership toward sustainability, a Master of Science in sustainable urban
planning, a Master of Social Science with a major in business administration
(MBA), a Master of Arts in communication and media for social change, a post-
graduate degree in economics and management, and other certificates in project
management, teaching for sustainability, economics of innovation, and policy and
politics in the European Union. He has received his Master’s degrees from different
universities in Sweden, namely Lund University, Blekinge Institute of Technology,
West University, and Malmö University.
Before starting his Master studies’ endeavor, Bibri worked as an ICT strategist
and business engineer. In 2004, he founded a small business and consulting firm
where he served as a sustainability and green ICT strategist for four years. Over the
last few years, he has been involved in a number of research and consulting projects
pertaining to the Internet of Things (IoT), green ICT strategy, sustainability inno-
vations, entrepreneurship and business model innovation, clean and energy effi-
ciency technology, sustainable urban planning, and eco-city and smart city. Since
his graduation in June 2014, he has been working as a freelance consultant in his
areas of expertise, giving lectures on specialized topics, and writing his second
book on the social shaping of AmI and the IoT as science-based technologies—a
study in science, technology, and society (STS). This book has been completed and
delivered to the publisher.
Bibri has a genuine interest in interdisciplinary and transdisciplinary research. In
light of his varied academic background, his research interests include AmI, the
IoT, social shaping of science-based technology, philosophy and sociology of
scientific knowledge, sustainability transitions and innovations, governance of
socio-technical changes in technological innovation systems, green and knowledge-
intensive entrepreneurship/innovation, clean and energy efficiency technology,
green economy, ecological modernization, eco-city and smart city, and S&T and
innovation policy. As to his career objective, he would like to take this opportunity
to express his strong interest in working as an academic or in pursuing an inter-
disciplinary Ph.D. in a well-recognized research institution or center for research
and innovation.
Chapter 1
Introduction

1.1 The Many Faces of AmI

1.1.1 The Morphing Power, Constitutive Force, and Disruptive Nature of AmI as ICT Innovations

Since the early 1990s, computer scientists have had the vision that ICT could do
much more and offer a whole range of fascinating possibilities. ICT could weave into
the fabric of society and offer useful services—in a user-friendly, unobtrusive, and
natural way—that support human action, interaction, and communication in various
ways wherever and whenever needed (e.g., Weiser 1991; ISTAG 2001). At present,
ICT pervades modern society and has a significant impact on people’s everyday
lives. And the rapidly evolving innovations and breakthroughs in computing and the
emergence of the ensuing new paradigms in ICT continue to demonstrate that there
is a tremendous untapped potential for adding intelligence and sophistication to ICT
to better serve people and transform the way they live, by unlocking its transfor-
mational effects as today’s constitutive technology. In recent years, a range of new
visions of a next wave in ICT with far-reaching societal implications, such as AmI,
ubiquitous computing, pervasive computing, calm computing, the Internet of
Things, and so on, and how they will shape the everyday of the future have generated
and gained worldwide attention, and are evolving from visions to achievable real-
ities, thanks to the advance, prevalence, and low cost of computing devices, mini-
ature sensors, wireless communication networks, and pervasive computing
infrastructures. AmI is the most prevalent new vision of ICT in Europe. Due to its
disruptive nature, it has created prospective futures in which novel applications and
services seem to be conceivable, and consequently visionaries, policymakers, and
leaders of research institutes have placed large expectations on this technology,
mobilized and marshaled R&D resources, and inspired and aligned various stake-
holders towards its realization and delivery. As a science-based technological
innovation, it is seen as indispensable for bringing more advanced solutions for
societal problems, augmenting everyday life and social practices, and providing a
whole range of novel services to consumers.
Indeed, common to all technological innovations is that they have strong effects
on people. They are very meaningful innovations because they do offer advance-
ments in products and services that can have significant impacts on people’s
everyday lives and many spheres of society. The underlying premise is that they
have power implications in the sense that they encapsulate and form what is held as
scientific knowledge and discourse, which is one of today’s main sources of
legitimacy in knowledge production as well as policy- and decision-making in
modern society. Thanks to this legitimization capacity, technological innovations
can play a major role in engendering social transformations—in other words, the
power effects induced by scientific discourse determine their success and expansion
in society. They embody a morphing power, in that they change how society
functions, creating new social realities and reshaping how people construct their
lives. Therefore, they represent positive configurations of knowledge which have
more significant intended and unintended effects within modern society. They have
indeed widely been recognized as a vehicle for societal transformation, especially
as a society moves from one technological epoch to another (e.g., from industrial to
post-industrial/information society). Over the past few decades, the technological
epoch has been predominantly associated with ICT or computing, more specifically
from the 1960s, through the second half of the twentieth century, to the beginning
of the twenty-first century.
The first half of the twenty-first century is heralding new behavioral patterns of
European society towards technology: ICT has become more sophisticated, thanks
to innovation, and deeply embedded into the fabric of European society—social,
cultural, economic, and political structures and practices. Hence, it is instigating and
unleashing far-reaching societal change, with its constitutive effects amounting to a
major shift in the way the society is starting to function (ISTAG 2006) and is
unfolding. ICT as a constitutive technology is a vision that builds upon the AmI
vision (Ibid). In this vision, computing devices will be available unobtrusively
everywhere and by different means, supporting human action, interaction, and
communication in a wide variety of ways whenever needed. This implies that a
degree of social transformation is present in AmI scenarios, whether they are
visionary, conceived by the creators of AmI technology, extrapolated from the
present based on their view to illustrate the potential and highlight the merits of
that technology, or substantiated, determined by findings from in-depth studies
aimed at reconciling the futuristic and innovative claims of the AmI vision and its
realistic assumptions.
AmI has been a multidisciplinary field strongly driven by a particular vision of
how the potential of ICT development can be mobilized to shape the everyday of
the future and improve the quality of people’s lives. This has been translated into
concrete strategies, whereby the AmI vision has been attributed a central role in
shaping the field of new ICT and establishing its scenarios, roadmaps, research
agendas, and projects. With this in mind, ICT as a constitutive technology repre-
sents a widening and deepening of AmI strategies at the level of societal
applications (ISTAG 2006), and AmI as a new paradigm in ICT is assumed to
transform the role of ICT in society and ultimately the way people live and work. In
the AmI vision, ICT and its applications are therefore widened and deepened as
well, implying a drastic shift in such dimensions as ICT users and consumers,
incorporation of ICT into different living and working spheres, the multiplicity of
applications and services offered, and the scale of the stakeholders involved (Punie
2003).
In light of the tremendous opportunities residing in deploying and implementing
AmI systems on different scales, intelligence, and distribution, a horde of new
applications and services is being heralded and unleashed, which is leading to
capabilities and triggering intentions which are, in turn, creating unintended effects. Put
differently, AmI offerings are creating new users and consumers and reshaping what
people want and need and transforming the way they do things. Technological
innovation-oriented philosophy asserts that people prefer technology offerings that
provide advanced performance and value—the technological superiority (see
Schindehutte et al. 2009). In a nutshell, AmI is demonstrating the potential to
engage people's minds and imaginations. In fact, common to technological visions of
the future is that they have the power not only to catch people’s minds and
imaginations, but also to inspire them into a quest for new possibilities
and untapped opportunities, encourage them to think outside common mindsets,
and prompt them to reconfigure their current convictions. This is enabled by, in the
context of AmI, the incorporation of computer intelligence and technology into
people’s everyday lives. Once fully released into society, AmI offerings will
become subject to several forces and processes that will change their path in both
predictable and unpredictable directions, and they will concurrently with the society
evolve in an emergent series of exchanges (see Bibri 2014). AmI offerings are
active forces—human creations with power over humans. In other words, AmI
technologies are disruptive innovations and directed at a complex, dynamic social
environment, entailing situated and intertwined dynamics and made of an infinite
richness of circumstances that cannot be easily predicted or fully anticipated. As
technological developments, AmI may offer new innovation opportunities that, in
relation to social transformation which results from a merger of systematic and
unique factors, cannot be foreseen as to its intended effects until AmI technology
reaches and also permeates society. But to mitigate the risks and uncertainties
surrounding the development of AmI, it is crucial that it takes into consideration the
social context where it is embedded and operates—and its evolution is actually
determined. Accordingly, future scenarios should be considered with caution in
terms of the ambitious and inspiring vision of AmI they intend to instantiate and in
terms of the achievability of reality they intend to predict. Crucially, technological
advancement should go hand in hand with the social dynamics and undercurrents
involved in the innovation process of AmI. Indeed, it is posited that technology
develops dependently of society, in a process where they mutually affect each other
and evolve and are shaped at the same time as part of the innovation process.

1.1.2 Foundational and Defining Characteristics of AmI

The AmI vision was essentially proposed and published in 1999 by Information
Society Technologies Advisory Group (ISTAG), the committee which advises the
European Commission’s Information Society Directorate General—the Information
Society Technology (IST) program. It postulates a new paradigmatic shift in
computing and constitutes a large-scale societal discourse as a cultural manifesta-
tion and historical event caused to emerge as a result of the remaking of social
knowledge, with strong implications for reshaping the overarching discourse of
information society. It offers technological evolution driven by integrating intelli-
gence in ICT applications and services in ways to transform computer technology
into an integral part of everyday life, and thus make significant impacts on society.
This vision has been promoted by, and attracted a lot of interest from, government
science and technology agencies, research and innovation policy, industry, tech-
nical research laboratories, research centers, and universities.
Materialized as a multidisciplinary field within—or rather inspired by the vision
of—ubiquitous computing, attracting substantial research, innovation, funding, and
public attention as well as leading to the formation of many consortiums and research
groups, AmI provides an all-encompassing and far-reaching vision on the future of
ICT in the information society, a vision of the future information society where
everyday environments will be permeated by computer intelligence and technology:
humans will be surrounded and accompanied by advanced sensing and computing
devices, multimodal user interfaces, intelligent software agents, and wireless and
ad-hoc networking technology, which are everywhere, invisibly woven into the
fabric of space, in virtually all kinds of everyday objects (e.g., computers, mobile
phones, watches, clothes, furniture, appliances, doors, walls, paints, lights, books,
paper money, vehicles, and even the flow of water and air), in the form of tiny
microelectronic processors and networks of miniature sensors and actuators, func-
tioning unobtrusively in the background of human life and consciousness. The
logically malleable nature of this computationally augmented everyday environment
—seamlessly composed of a myriad of heterogeneous, distributed, networked, and
always-on computing devices, available anytime, anywhere, and by various means,
enabling people to interact naturally with smart objects which in turn communicate
with each other and other people’s objects and explore their environment—lends
itself to a limitless functionality: it is aware of people’s presence and context; adaptive,
responsive, and anticipatory to their desires and intentions; and personalized and
tailored to their needs, thereby intelligently supporting their daily lives through
providing unlimited services in new, intuitive ways and in a variety of settings.

1.1.3 The Essence of the (Revisited) AmI Vision

The essence of AmI vision lies in that the integration of computer intelligence and
technology into people’s everyday lives and environments may have positive,
profound, and long-term impacts on people and society—radical social transfor-
mation as promised by the creators of AmI technology. This does not necessarily
mean according to the visionary scenarios developed fifteen years ago, at the
inception of the AmI vision. Indeed, these scenarios are grounded in unrealistic
assumptions pertaining to unreasonable prospects, of limited modern applicability,
on how people, technology, and society will evolve, as well as to an oversimpli-
fication of the rather complex challenges involved in enabling future scenarios or
making them for real. They also underscore the view of technological determinism
as to technological progress and its societal dissemination, which makes the vision
of AmI fall short in considering the user and social dynamics and undercurrents
involved in the innovation process. The current reality is different from the way it
was predicted back then, fifteen years ago, and as a result the AmI vision is being
revisited and thus undergoing major restructuring based on new and alternative
research directions in relation to its core notions (especially intelligence),
embracing key emerging trends that are conjectured to take the vision closer to
realization and delivery in ways that bring it back to its quintessence. So, if visions
are to be built, challenged, and then transformed—in this particular context achieve
an appropriate balance between futuristic and innovative claims and realistic
assumptions—and thus become true, people will live in a world of AmI, where
computers are omnipresent, invisibly integrated in everyday life world, and func-
tioning in a close coupling with humans, not unobtrusively, to support living,
working, learning, social, and infotainment spaces. This implies, in addition to
transforming people’s lives for the better, that interaction between human users and
technology will radically change.

1.1.4 AmI as a Novel Approach to Human–Machine Interaction and a World of Machine Learning

AmI is heralding and giving rise to new ways of interaction and interactive appli-
cations, which strive to take the holistic nature of the human user into account—e.g.,
context, behavior, emotion, intention, and motivation. It has emerged as a result of
amalgamating recent discoveries in human communication, computing, and cogni-
tive science towards natural HCI—AmI technology is enabled by effortless (implicit
human–machine) interactions attuned to human senses and adaptive and proactive to
users. Therefore, as an integral part of everyday life, AmI promises to provide
efficient support and useful services to people in an intuitive, unobtrusive, and
natural fashion. This is enabled by the human-like understanding of AmI interactive
systems and environments and the varied features of their intelligent behavior,
manifested in taking care of needs, reacting and pre-acting intelligently to verbal and
nonverbal indications of desires; reacting to explicit spoken and gestured com-
mands; supporting social processes and being competent social agents in social
interactions; engaging in intelligent dialogs and mingling socially with human users;
and eliciting pleasant experiences and positive emotions in users through the
affective quality of aesthetic artifacts and environments as well as the intuitiveness
and smoothness of interaction as to computational processes and the richness of
interaction as to content information and visual tools.
To iterate, the world of AmI is about a vision of a future filled with smart and
interacting everyday objects and devices. It entails incorporating tiny microelec-
tronic processors as well as sensing, actuating, and communication capabilities into
such objects and devices and thus enriching them by intelligence to make them
‘smart’ in terms of anticipating and responding to stimuli from the world that
surrounds them. Hence, the interaction of human users with technological artifacts
is no longer to be conceived of as from a user towards a nonhuman machine, but
rather of a user towards an object with human-like cognitive and behavioral aspects,
‘object-becomes-subject’, something that is able to learn, reason, and react. Indeed,
AmI is seen as a world of machine learning and reasoning, where computing
systems and environments observe the states of humans and monitor their behaviors
and actions along with the changes in the environment in circumambient ways,
using multiple, diverse sensors and advanced multi-sensor data fusion techniques,
and based on pre-programmed heuristics, real-time, ontological, and/or hybrid rea-
soning capabilities, these systems and environments can respond to and anticipate
people’s desires, wishes, and intentions, thereby providing various services using
actuators to react and pre-act in the physical world.
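
To make this sense–fuse–reason–act cycle more concrete, the following minimal Python sketch illustrates one pass of such a loop. It is purely illustrative: the sensor readings, the naive fusion step, the thresholds, and the actuator commands are invented assumptions rather than part of any particular AmI platform.

```python
# Hypothetical sketch of the AmI sense -> fuse -> reason -> act cycle.
# All sensor names, thresholds, and rules are illustrative assumptions.

from dataclasses import dataclass
from statistics import mean


@dataclass
class ContextEstimate:
    occupancy: bool        # whether someone is present
    activity: str          # e.g., "resting" or "working"
    ambient_light: float   # normalized, 0.0 (dark) to 1.0 (bright)


def fuse(motion_samples, heart_rates, light_levels):
    """Naive multi-sensor fusion: combine noisy readings into one estimate."""
    occupancy = any(motion_samples)
    activity = "resting" if mean(heart_rates) < 70 else "working"
    return ContextEstimate(occupancy, activity, mean(light_levels))


def decide(ctx):
    """Pre-programmed heuristics mapping the context estimate to actuator commands."""
    commands = []
    if ctx.occupancy and ctx.activity == "working" and ctx.ambient_light < 0.4:
        commands.append("raise_desk_lamp_brightness")
    if not ctx.occupancy:
        commands.append("switch_lights_off")
    return commands


if __name__ == "__main__":
    # One pass of the loop with made-up sensor readings.
    estimate = fuse(motion_samples=[True, False, True],
                    heart_rates=[82, 85, 80],
                    light_levels=[0.25, 0.30, 0.28])
    print(estimate, decide(estimate))
```

In a deployed system, the hard-coded fusion and heuristic rules above would of course be replaced by the multi-sensor data fusion techniques and machine learning, ontological, or hybrid reasoning capabilities discussed later in the book.
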
In all, objects and devices in AmI are treated as humans when they become
sufficiently complex that their cognitive processes and behavior are difficult to
understand. This entails that interaction design takes into account concepts that
enhance the kind of human-like understanding and behavior of AmI, such as social
intelligence, emotional intelligence, cognitive intelligence, conversational intelli-
gence, as well as aesthetic and affective interaction. The interaction between human
factors and design in the realm of AmI continues to be a challenging task in the
working-out of human use issues in the design of AmI applications and services.
The human factors are intended to support AmI designs in terms of addressing
human users’ tacit nature of subjective perceptions and emotional responses and
aspirations. New technology designs, which can touch humans in sensible ways, are
essential in addressing affective needs and ensuring pleasant and satisfying user
interaction experiences. In particular, the emotional state of the human user has a
key role in determining and shaping the unfolding of the interaction process.

1.1.5 Human-Inspired Intelligences in AmI Systems

The ‘intelligence’ alluded to in AmI pertains particularly to the environments,
networks, devices, and actions, where it resides and manifests itself and its asso-
ciations with aspects of human functioning in terms of cognitive, affective, and
behavioral processes and established concepts of artificial intelligence and cognitive
science. The areas of artificial intelligence that have been integrated into AmI
encompass: cognitive intelligence, emotional computing, social intelligence, and
conversational intelligence and what these capabilities entail in terms of sensing,
pattern recognition, modeling and reasoning, and behaviors, e.g., smart sensors,
machine learning, ontologies, and actuators, respectively. Artificial intelligence is
the branch of computer science that is concerned with understanding the nature of
human intelligence and creating computer systems capable of emulating intelligent
processing and behavior—i.e., the modeling and simulation of intelligent aspects of
humans into machines. AmI relates to artificial intelligence in that AmI deals with
intelligent systems that possess human-inspired cognitive, emotional, social, and
conversational intelligence in terms of both computational processes and behaviors.
These aspects of human intelligence are the most interesting features of AmI sys-
tems. To exhibit human-like intelligent behavior requires high intelligence from AmI
systems. In AmI, cognitive intelligence is associated with context awareness in the
sense of supporting, facilitating, or augmenting such abilities as decision-making
and its accuracy, problem solving, reasoning, complex ideas comprehension,
learning, creativity, visual perception, information retrieval precision, planning, and
so on. Emotional intelligence involves improving users’ abilities to understand,
evaluate, and manage their own emotions and those of others, as well as to integrate
emotions to facilitate their cognitive activities. Social intelligence entails invoking
positive feelings in human users, by eliciting positive emotions and pleasant user
experiences. AmI systems aim at supporting social processes, the forms or modes of
interaction between humans such as adaptation, cooperation, and accommodation,
and being competent agents in social interactions (see Markopoulos et al. 2005;
Nijholt et al. 2004). The latter pertains to various conceptualizations of what is called
‘presence’ with respect to computer-mediated human–human or human–agent
interaction, such as a sense of social richness, manifested in the feeling that one can
have from social interaction; a sense of transportation wherein users feel as if they
are sharing common space with one person or a group of people together; and a sense
of the medium, a computer system/agent, as a social actor (Lombard and Ditton
1997) with the suggestion that people interact with computers socially (e.g., users
receive encouragement, praise, or emotional responses). In addition, a system
designed with socially intelligent features is able to select and fine-tune its behavior
according to the cognitive state (task) and affective state of the user. Both positive
emotions induced by subjective experiences of interaction (i.e., smoothness and
intuitiveness pertaining to computational processes and richness pertaining to
information content) and emotional states triggered by subjective, socially
situated interpretation of aesthetics (i.e., affective quality pertaining to visual and
computational artifacts and environments) are hypothesized to improve user per-
formance. As to conversational intelligence, it entails enabling users to engage in
face-to-face conversations and mingle socially with computers, using embodied
conversational agents (ECAs), which have a human-like graphical embodiment
(personify the user/computer interface in the form of an animated person), as they are
capable of receiving multimodal input, communicative signals, and generating
multimodal output, communicative behavior, in nearly real-time (e.g., Vilhjálmsson
2009; ter Maat and Heylen 2009).

1.1.6 Human-like Cognitive, Emotional, Affective, Behavioral, and Conversational Aspects of AmI

The class of AmI applications and systems on focus, under investigation and
review, exhibits human-like understanding and intelligent supporting behavior in
relation to cognitive, emotional, social, and conversational processes and behaviors
of humans. Human-like understanding can be described as the ability of AmI
systems (agents) to analyze (or interpret and reason about) and estimate (or infer)
what is going on in the human’s mind (e.g., ideally how a user perceives a given
context—as an expression of a certain interpretation of a situation), which is a form
of mindreading or, in the case of conversational systems, interpreting communi-
cative intents, as well as in his/her body and behavior, which is a form of facial-,
gestural-, corporal-, and psychophysiological reading or interpreting and disam-
biguating multimodal communicative behavior, as well as what is happening in the
social, cultural, and physical environments. Here, context awareness technology is
given a prominent role. Further, input for computational understanding processes is
observed information acquired from multiple sources (diverse sensors) about the
human user’s cognitive, emotional, psychophysiological, behavioral, and social
states over time (i.e., human behavior monitoring), and dynamic models for the
human’s mental, physiological, conversational, and social processes. For the
human’s psychological processes, such a model may encompass emotional states
and cognitive processes and behaviors. For the human’s physiological processes,
such a model may include skin temperature, pulse, galvanic skin response, and
heart rate (particularly in relation to emotions), and activities. For the human’s
conversational processes, such a model may comprise a common knowledge base,
communication errors and recovery schemes, and language and, ideally, its cog-
nitive, psychological, neurological, pragmatic, and sociocultural dimensions. For
the human’s social processes, such a model may entail adaptation, cooperation,
accommodation, and so on as forms of social interaction. AmI requires different
types of models: cognitive, emotional, psychophysiological, behavioral, social,
cultural, physical, and artificial environment. Examples of methods for analysis on
the basis of these models include facial expression analysis, gesture analysis, body
analysis, eye movement analysis, prosodic features analysis, psychophysiological
analysis, communicative intents analysis, social processes analysis, and so forth.
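
As a rough illustration of how such dynamic models might be organized computationally, the following Python sketch keeps separate physiological and psychological models that are updated from observed information and consulted by a simple analysis step. All field names, thresholds, and the toy inference rule are assumptions made purely for illustration, not a description of any existing AmI system.

```python
# Minimal sketch of dynamic user models; every field name, threshold, and
# inference rule here is an invented, illustrative assumption.

from dataclasses import dataclass, field


@dataclass
class PhysiologicalModel:
    skin_temperature: float = 36.5      # degrees Celsius
    heart_rate: float = 70.0            # beats per minute
    galvanic_skin_response: float = 0.2


@dataclass
class PsychologicalModel:
    emotional_state: str = "neutral"
    cognitive_load: float = 0.0         # 0.0 (idle) to 1.0 (overloaded)


@dataclass
class UserModel:
    physiology: PhysiologicalModel = field(default_factory=PhysiologicalModel)
    psychology: PsychologicalModel = field(default_factory=PsychologicalModel)
    observations: list = field(default_factory=list)

    def update(self, reading: dict) -> None:
        """Fold a new sensor observation into the dynamic models (naively)."""
        self.observations.append(reading)
        self.physiology.heart_rate = reading.get(
            "heart_rate", self.physiology.heart_rate)
        self.physiology.skin_temperature = reading.get(
            "skin_temperature", self.physiology.skin_temperature)
        # Toy inference standing in for psychophysiological analysis.
        if self.physiology.heart_rate > 95:
            self.psychology.emotional_state = "stressed"

    def support_action(self) -> str:
        """A stand-in for deciding how to support the user's current state."""
        if self.psychology.emotional_state == "stressed":
            return "offer calming support"
        return "no intervention needed"


if __name__ == "__main__":
    model = UserModel()
    model.update({"heart_rate": 102, "skin_temperature": 37.1})
    print(model.psychology.emotional_state, "->", model.support_action())
```
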

1.1.7 Context Awareness and Natural Interaction as Computational Capabilities for Intelligent Behavior

In light of the above, the class of AmI applications showing intelligent behavior are
required to be equipped with context awareness (the ability to sense, recognize, and
react to contextual variables) and natural interaction (the use of natural modalities
like facial expressions, hand gestures, body postures, and speech) as human-like
computational capabilities. It is worth noting that such forms of communication are
utilized by context-aware systems to acquire information as input for interaction
and interface control in AmI environments. They can provide a wealth of infor-
mation about the user’s emotional, cognitive, and physiological states as well as
actions and behaviors, a type of contextual information that can be captured
implicitly by context-aware systems. This is intended to augment the computational
understanding of AmI systems when interacting with users to come up with better
informed actions, thereby adapting their behavior in ways that respond to users’
needs. Natural modalities are also utilized to perform conversational acts in AmI
with respect to intelligent dialog and social mingling with humans. However,
placing greater reliance on knowledge of context, reducing interactions with users
(minimizing input from them and replacing it with knowledge of context), and
providing intelligent services signify that applications become invisible.
Invisibility, where context awareness is given a prominent role, has been a subject
of much debate and criticism in the recent years for it poses a special conundrum
(see below for further discussion). Further to the point, the human-like supporting
behavior entails the system undertaking intelligent actions to provide support to the
user’s cognitive, emotional, social, and conversational needs. This entails utilizing
special user interfaces—equipped with ambient, perceptual, multimodal, hyper-
media, visual, and aesthetical tools. The aim of AmI as a novel approach to HCI is
to create interaction between humans and systems that is closer to natural inter-
action, by mimicking a variety of aspects of human functioning: behavioral patterns
of people in the different systems and functions that they form part of within their
environment. People react with congruence among the various dimensions of their
cognitive world to different (interactive) situations in their environment, and their
functioning takes place in a particular time frame as an integral part of the unfolding
or development phase in which they are operating.
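
As an illustration of the point above about minimizing explicit user input and replacing it with knowledge of context, the following Python sketch consults implicitly sensed contextual cues first and falls back to an explicit question only when context is insufficient. The context keys, cue values, and lighting choices are invented for illustration and assume nothing about a real interface toolkit.

```python
# Illustrative-only sketch of replacing explicit input with knowledge of
# context; the function names and context keys are invented assumptions.

def ask_user(question: str) -> str:
    """Explicit interaction fallback (here just a console prompt)."""
    return input(question + " ")


def choose_lighting(context: dict) -> str:
    """Prefer implicitly sensed context; ask the user only when it is missing."""
    activity = context.get("activity")              # e.g., inferred from posture
    expression = context.get("facial_expression")   # e.g., from a vision pipeline
    if activity == "reading":
        return "bright, focused light"
    if expression == "tired":
        return "dim, warm light"
    # Context is insufficient, so fall back to an explicit question.
    return ask_user("How would you like the lighting?")


if __name__ == "__main__":
    print(choose_lighting({"activity": "reading"}))           # no question asked
    print(choose_lighting({"facial_expression": "tired"}))    # no question asked
```
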

1.1.8 Situated Forms of Intelligence as an Emerging Trend in AmI Research and Its Underlying Premises

Situated forms of intelligence have emerged as an alternative strategy for rethinking
the connotation of intelligence as alluded to in the AmI vision—the smart envi-
ronment that senses and reacts and pre-acts to people, responding intelligently and
anticipating their desires, wishes, and intentions, without conscious mediation. This
meaning of intelligence relates to the controversial idea of what is known as ‘mental
invisibility’, whereby the system is to take care of the context in which the human
users find themselves, by recognizing their behaviors and actions along with
changes in the environment and (re)acting autonomously on their behalf accord-
ingly. From this perspective, users cannot sense the presence of AmI systems, user
interfaces, nor their full interaction, but only the generated interactive behavior
intended to support, or change the environment of, the user. Consequently, AmI
research has long concentrated strongly on the development of computational
models for all sorts of possible situations of everyday life and environments,
inspiring a whole generation of researchers, thereby producing a large and broad
body of research, into newfangled techniques, methods, and approaches for
enhancing the computational processes of sensing, multi-fusion data processing,
pattern recognition, inference, and reasoning, as well as advanced enabling tech-
nologies, namely sensors, actuators, and information processing systems. This
particular notion of intelligence, a prevailing assumption underlying many of the
envisioned AmI scenarios, has however been a subject of debate, fueled by many
critics in the field. The whole idea is that the vision of ‘artificial-becoming-human’
intelligence has proven to be a fallacy and failure, while AmI research and
development continues to grapple with what the (artificial) intelligence represents.
In fact, the concepts of human intelligence and artificial intelligence are ill-defined,
never precisely delineated. As human intelligence will always retain a definitional
elusiveness and looseness, so will artificial intelligence, to the same extent. Indeed,
human intelligence has taken on many definitions such as in terms of capacity for
logical thinking, abstract thought, learning, understanding, emotional and social
knowledge, creativity, problem solving, planning, and communication, to name but
a few. However, while in the visionary work of the AmI research community
eminent failings of artificial reasoning are not accounted for (Gunnarsdóttir and
Arribas-Ayllon 2012), the vision of true intelligence continues to be reconstructed
around new concepts, ideas, and problems to solve (Aarts and de Ruyter 2009).
Hypothetically, intelligent actions can be triggered as a result of a close coupling
between the system/agent and the user (e.g., Lindblom and Ziemke 2002) or can be
taken autonomously on behalf of the user. In the latter case, the idea that AmI
systems should be able to sense, analyze, model, and understand—i.e., detect,
capture, conceptualize, encode, interpret, reason about, and infer—contexts or sit-
uations in a way that they can adaptively and proactively take the most suitable or
pertinent actions (intelligent behavior) has generated an increasing level of reproach
that basically challenges and questions the computational feasibility of the notion of
intelligence prevailing in AmI, which pertains to the inherent complexity and
intrinsic intricacy associated with sensing all kinds of patterns in the physical world
and modeling all sorts of contexts and situations. In this book, alternative ways to
look at intelligence as a core concept in AmI research are documented, surveyed,
and scrupulously discussed in relation to the underlying components of context
awareness and implicit and natural interaction, with consideration of relevant
implications for AmI research. They essentially revolve around situated forms of
intelligence as to the behavior of the software and artificial agent and an intensified
collaboration with the human user—e.g., negotiation about what actions of the AmI
system are suitable for the human user’s situation.
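
The following Python sketch is one minimal way such a negotiation might look, assuming an invented confidence score, action names, and threshold: the agent acts autonomously only when it is sufficiently confident in its own interpretation of the situation, and otherwise defers the decision to the human user.

```python
# Hypothetical sketch of a close coupling between agent and user: the agent
# proposes rather than imposes, negotiating low-confidence actions with the
# user. Thresholds, action names, and confidences are illustrative assumptions.

from dataclasses import dataclass


@dataclass
class Proposal:
    action: str
    confidence: float   # the agent's own estimate, between 0.0 and 1.0


def propose(situation: str) -> Proposal:
    """Toy situation-to-action mapping with made-up confidence values."""
    table = {
        "user_asleep": Proposal("lower_thermostat", 0.9),
        "user_frowning_at_screen": Proposal("suggest_a_break", 0.5),
    }
    return table.get(situation, Proposal("do_nothing", 1.0))


def negotiate(proposal: Proposal, confirm=input) -> str:
    """Act autonomously only when confident; otherwise defer to the user."""
    if proposal.confidence >= 0.8:
        return "executing: " + proposal.action
    answer = confirm("May I " + proposal.action.replace("_", " ") + "? (y/n) ")
    if answer.strip().lower().startswith("y"):
        return "executing: " + proposal.action
    return "standing by"


if __name__ == "__main__":
    print(negotiate(propose("user_asleep")))   # confident enough to act alone
    # A low-confidence proposal such as propose("user_frowning_at_screen")
    # would instead trigger a question to the user before anything is done.
```
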
The underlying assumption is that in most non-trivial (human-inspired) appli-
cations, the context amalgam created supposedly to have a human-like under-
standing of human users and their functioning based on observed information and
computationally formalized knowledge (from the human-directed disciplines) is
argued to be inadequate to guide the system’s actions due to the situated nature of
human cognition and thus action—the subtlety and intricacy of meaning attribution
to (perception of) context and the evolving nature of the latter, i.e., details of
context are too subjective, elusive, fluid, and difficult to recognize to be modeled
and encoded. Indeed, sensor data are limited or imperfect and existing models must
necessarily be oversimplified. That is to say, they suffer from limitations pertaining
to comprehensiveness, dynamicity, fidelity with real-world phenomena, and
robustness, and thus are associated with inaccuracies. Therefore, an ambience
created based on sensor information about human’s states and behaviors and
computational (dynamic) models for aspects of human functioning may not be the
most effective way of supporting human users in their daily activities or assisting
them in coping with their tasks, by providing services that are assumed—because of
their delivery being done in a particular knowledgeable manner—to improve the
quality of their life. One implication of an irrelevant behavior of the system is a loss
of control over the environment and of freedom to act within it or interact with its
artifacts. Consequently, some scholars have called for shunning modeling and antic-
ipating actions as much as possible, particularly in relation to such application
domains as smart home environments and highly demanding circumstances or
tasks. Especially, the vision of the intelligent, caring environment seems to fail to
bring real benefits. Indeed, if the artificial actors (devices) gain control over human
users, it becomes questionable as to whether they will bring an added value to users.
Hence, AmI applications and environments should instead focus on (and ideally
possess) the capacity to respond to unanticipated circumstances of the users’ actions,
an aspect which in fact makes interactive computer systems, so far, fundamentally
different from human communication (e.g., Hayes and Reddy 1983). Regardless,
user interfaces in AmI systems should, given the constraints of existing technolo-
gies and from an engineering perspective, minimize modeling and anticipating
actions of the growing variety of users and an infinite richness of interactive situ-
ations. Many of these critical perspectives can be framed within a wider debate over
invisible and disappearing user interfaces underlying AmI technology and the
associated issues pertaining to the weakness of plans as resources in situated actions
(e.g., Suchman 1987, 2005), the negotiation among people involved in situations
(e.g., Lueg 2002), context as an issue of negotiation through interaction (e.g.,
Crutzen 2005), exposing ambiguities and empowering users (e.g., José et al. 2010),
and the development of critical user participatory AmI applications (e.g., Criel and
Claeys 2008).
All in all, the basic premise of situated forms of intelligence is to design AmI
technology that can capitalize on what humans have to offer in terms of intelligence
already embedded in scenarios, practices, and patterns of everyday life and envi-
ronment and hence leverage on their own cognitive processes and behavior to
generate alternative forms of situated intelligence. Instead of AmI technology being
concerned with offering to decide and do things for people or perform tasks on their
behalf—and hence modeling the elusive and complex forms of real-life intelli-
gence, it should offer people further resources to act and thus choose and think,
thereby engaging them more actively by empowering them into the process of
spur-of-the-moment situated cognition and thus action. This entails assisting people
in better assessing their choices and decisions and thus enhancing their actions and
activities. Overall, a quest for situated forms of intelligence is seen by several
eminent scholars as an invigorating alternative for artificial intelligence research
within AmI.

1.1.9 Underpinnings and Open Challenges and Issues

It has widely been acknowledged that the realization (and evolution) of AmI vision
pose enormous challenges and a plethora of open issues in the sense of not being
brought to a conclusion and subject to further thought. AmI is an extremely complex,
complicated, and intricate phenomenon, with so many unsettled questions.
Specifically, AmI is a subject of much debate and current research in the area is
ambiguous; it involves a lot of details or so many parts that make it difficult to deal
with; and it entails many complexly arranged and interrelated elements and factors
which make it demanding to resolve. Therefore, there is a lot to tackle, address,
solve, draw out and develop, and unravel or disentangle in the realm of AmI.
As a multidisciplinary paradigm or ‘crossover approach’, AmI is linked to a range
of topics related to computer science, artificial intelligence, human-directed scientific
areas (e.g., cognitive psychology, cognitive science, cognitive neuroscience, etc.),
social sciences (e.g., sociology, anthropology, social psychology, etc.), and
humanities (e.g., human communication, single and interdisciplinary subfields of
linguistics, communication and cultural studies, philosophy, etc.). The relevance of
these disciplines and technological and scientific areas to AmI stems from its vision
being far-reaching and all-encompassing in nature and postulating a paradigmatic
change in computing and society.
To create computer systems that emulate (a variety of aspects of) human
functioning for use in a broadened scope of application domains is no easy task. It
has been recognized by high-profile computer scientists and industry experts to be a
daunting challenge. Building AmI systems poses real challenges, many of which
pertain to system engineering, design, and modeling. This involves the develop-
ment of enabling technologies and processes necessary for the proper operation of
AmI systems and the application and convergence of advanced theoretical models
from many diverse scientific and social disciplines in terms of their simulation and
implementation into machines or computer systems within the areas of AmI and
artificial intelligence as mimicked in the form of processes and behaviors as well as
computationally formalized knowledge.
AmI research and development needs to address and overcome many design,
engineering, and modeling challenges. These challenges concern human-inspired
applications pertaining to various application domains, such as context-aware
computing, emotion-aware/affective computing, and conversational systems. They
include, and are not limited to: paradigms that govern the assemblage of such
systems; techniques and models of knowledge, representation, and run-time
behavior of such systems; methodologies and principles for engineering context
awareness, affective interaction, and computational intelligence; methods for
detecting, modeling, understanding, and querying of relevant information for such
systems; the reliability of such systems given that they need to function when they
are needed; the predictability of such systems given that they need to react in ways
they are supposed to; the dependability of such systems given that they need to
deliver what they promise; the performance of such systems given that they need to
act in real-time or be timely in acting; and enabling adaptation in such systems
through dynamic learning and a combination of real-time and pre-programmed
heuristics reasoning; as well as full user participation in the design and development
of such systems and understanding different users’ needs, and how they can be
fulfilled in different settings; to name but a few.
To further advance enabling technologies and processes and thus computational
capabilities of AmI systems requires collaborative endeavors in the form of inter-
disciplinary teams that bring together researchers from diverse research areas within
and outside the ambit of computer science, natural science, and formal science. The
value of interdisciplinary research lies in bringing well-informed engineered and
designed technologies, as this research approach seeks a broader understanding of
AmI as a technological phenomenon for a common purpose. In doing so, it enhances
the computational understanding of a variety of aspects of human functioning—e.g.,
the way perception, emotion, intention, reasoning, and intelligent actions as human
cognitive and behavioral processes work, co-operate, and interrelate—to ultimately
develop effective and successful applications that deliver valuable services and that
can span a whole range of potential domains. It is the amalgamation of recent
discoveries in human-directed sciences—that make it possible to acquire a better
understanding of a variety of aspects of human functioning—and the breakthroughs
at the level of enabling technologies and computational processes, thanks to artificial
intelligence, that has made it possible to build a horde of human-inspired systems
based on this understanding. In other words, AmI innovations stem from the com-
bined progress in different ICT fields and often from combining these with human
and social disciplines. It is therefore necessary and fruitful to intensively stimulate
interdisciplinary endeavors among scholars from human-directed disciplines and
diverse fields of computing. The underlying premise is that current research should
not only concentrate on designing and building new technologies and applications,
but also strive for coherent knowledge and understanding of AmI. AmI applications
can be realized only partly with sensor technologies in terms of acquiring infor-
mation about human users and their functioning and environment, but their full
realization is crucially contingent upon the availability of adequate knowledge (e.g.,
context, activity, emotion, cognition, dialog acts, etc.) in the form of dynamic
models for analysis of and reasoning about the observed information.
It is in the complexity of capturing, representing, processing, and using
knowledge about human functioning where the challenge lies as to the incorpora-
tion of context awareness, affective computing, and computational intelligence
functionalities in the AmI service and support provision chain. In recent years,
scientific research within the areas focusing on human functioning, such as cog-
nitive psychology, cognitive science, cognitive neuroscience, social sciences, and
human (verbal and nonverbal) communication has made major strides in pro-
viding new insights into understanding cognitive, emotional, physiological, neu-
rological, behavioral, and social aspects of human functioning. Although much
work still needs and remains to be done, complex models have been developed for a
variety of aspects of human contexts and processes and implemented in a variety of
application domains within the areas of AmI and artificial intelligence and at their
intersection. Though these models have yielded and achieved good results in lab-
oratory settings, they tend to lack usability in real life. If knowledge about human
functioning is computationally available—models of human contexts and processes
are represented in a formal and explicit form and developed based on concrete
interdisciplinary research work, and incorporated in everyday human environment
in computer systems that observe the contexts (e.g., psychological and physio-
logical states) and monitor the actions of humans along with the changes in the envi-
ronment in circumambient ways—then these systems become able to carry out a
more in-depth, human-like, analysis of the human context and processes, and thus
come up with well-informed actions in support of the user in terms of cognitive,
emotional, and social needs. Moreover, advanced knowledge from the
human-directed sciences needs to be amalgamated, using relevant frameworks for
combining the constituents, to obtain the intended functioning of human-inspired
AmI systems in terms of undertaking actions in a knowledgeable manner in some
applications (e.g., biomedical systems, healthcare systems, and assisted living
systems) while applying a strengthened collaboration with humans in others (e.g.,
cognitive and emotional context-aware systems, affective systems, and
emotion-aware systems). This can result in a close coupling between the user and
the agent, where the human users partner with the system in the sense of negoti-
ating about what actions of the latter are suitable for the situation of the former.
However, human-directed disciplines involve volatile theories, subjectivities,
pluralism of theoretical models, and a plethora of unsolved issues. Adding to this is
the generally understood extraordinary complexity of social sciences as well as
humanities (especially human communication with regard to pragmatics, sociolin-
guistics, psycholinguistics, and cultural dimensions of nonverbal communication
behavior), due to the reflexive nature of social and human processes as well as the
changing and evolving social and human conditions. This is most likely to carry over
its effects to modeling and implementation of knowledge about processes and
aspects of human functioning into AmI systems—user interfaces—and their
behavior. Computational modeling of human behavior and context and achieving a
human-like computational understanding (analysis of what is going on in the mind
and behavior of humans (and) in their physical, social, and cultural environments)
has proven to be the most challenging task in AmI and artificial intelligence alike. In
fact, these challenges are argued to be the main reason why AmI is failing, hitherto,
to scale from prototypes to realistic environments and systems. While machine
learning and ontological techniques, coupled with recent hybrid approaches, have
proven to hold a tremendous potential to reduce the complexity associated with
modeling human activities and behaviors and situations of life, the fact remains that
most of the current reasoning processes—intelligent processing of sensor data and
formalized knowledge at a higher level of automation—entail extremely complex
inferences to generate high-level abstractions of situations and activities grounded in
relatively limited, uncertain, fuzzy, and/or imperfect observed information about the
human’s state and behavior over time, adding to the oversimplified models for the
human’s psychological, social, and conversational processes.
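
The following Python sketch caricatures such a hybrid arrangement under purely illustrative assumptions: a hard-coded stand-in for a learned activity recognizer, invented probabilities, and a small ontology-like rule layer. Low-level labels are abstracted into a high-level situation only when the evidence is strong enough, and the system abstains otherwise rather than acting on an uncertain inference.

```python
# Illustrative hybrid-reasoning sketch; all labels, rules, and probabilities
# are invented assumptions rather than outputs of a real recognizer.

def recognize_activities(sensor_window: list) -> dict:
    """Stand-in for a trained classifier over a window of sensor data.
    It returns activity labels with invented probabilities."""
    return {"cooking": 0.55, "watching_tv": 0.30, "sleeping": 0.15}


# Ontology-like mapping from low-level activities to high-level situations.
HIGH_LEVEL_RULES = {
    "cooking": "preparing a meal at home",
    "watching_tv": "relaxing in the living room",
    "sleeping": "resting at night",
}


def infer_situation(activity_probs: dict, threshold: float = 0.6) -> str:
    """Abstract to a high-level situation only if the best hypothesis is strong."""
    best_activity, best_prob = max(activity_probs.items(), key=lambda kv: kv[1])
    if best_prob < threshold:
        return "uncertain: defer to the user or gather more evidence"
    return HIGH_LEVEL_RULES.get(best_activity, "unknown situation")


if __name__ == "__main__":
    probabilities = recognize_activities(sensor_window=[])
    print(infer_situation(probabilities))   # below threshold, so the system abstains
```
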

1.2 The Scope and Twofold Purpose of the Book

This book addresses the human face of AmI in terms of the cognitive, emotional,
affective, behavioral, and conversational features that pertain to the various appli-
cation domains where AmI systems and environments show human-like under-
standing and exhibit intelligent behavior in relation to a variety of aspects of human
functioning—states and processes of human users. These systems and environments
are imminent since they use essentially the state-of-the-art enabling technologies and
related computational processes and capabilities underlying the functioning of AmI
as (nonhuman) intelligent entities. It also includes ambitious ideas within the same
realm whose realization seems to be still far away due to unsolved technological and
social challenges. In doing so, this book details and elucidates the rich potential of
AmI from a technological, human, and social perspective; the plethora of difficult
encounters and bottlenecks involved in making AmI a reality, a deployable and
achievable paradigm; and the existing and future prerequisite enabling technologies.
It moreover discusses in compelling and rich ways the recent discoveries and
established knowledge in human-directed sciences and their application and con-
vergence in the ambit of AmI as a computing paradigm, as well as the application
and convergence of major current and future computing trends.
Specifically, this book has a twofold purpose. First, it aims to explore and assess
the state-of-the-art enabling technologies and processes; to review and discuss the
key computational capabilities underlying the AmI functioning; and to identify the
main challenges and limitations associated with the design, modeling, and imple-
mentation of AmI systems and applications, with an emphasis on various aspects of
human functioning. This is intended to inform and enlighten various research
communities of the latest developments and prospects in the respective research
area as well as to provide a seminal reference for researchers, designers, and
engineers who are concerned with the design and development of cognitive and
emotional context-aware, affective, socially intelligent, and conversational systems
and applications. Second, it intends to explore and discuss the state-of-the-art
human-inspired AmI systems and applications (in which knowledge from the
human-directed sciences such as cognitive science, cognitive psychology, social
sciences, and humanities is incorporated) and provide new insights and ideas on
how these could be further enhanced and advanced. This class of AmI applications
is augmented with aspects of cognitive intelligence, emotional intelligence, social
intelligence, and conversational intelligence at the cognitive and behavioral level.
More in detail, the main aim of this book is to support scholars, scientists, experts,
and researchers interested in the understanding of the different dimensions of AmI
and exploiting its potential and concretizing its merits in relation to the improve-
ment of the quality of people’s lives in modern, high-tech society.

1.3 The Structure of the Book and Its Contents

The book is divided into two distinct but interrelated sections, each dealing with a
different dimension or aspect of AmI and investigation at an advanced level. It opens
with a scene setting chapter (this chapter, Sect. 1.1). This chapter contains a more
detailed introduction to Part I and Part II of the book. The major themes, issues,
assumptions, and arguments are introduced and further developed and elaborated on
in subsequent chapters. It moreover includes an outline of the book’s scope, pur-
pose, structure, and contents, in addition to providing a brief descriptive account of
the research strategy espoused in the book: a combination of interdisciplinary and
transdisciplinary approaches. Part I (Chaps. 2–6) looks at different permutations of
enabling technologies and processes as well as core computational capabilities.
As to enabling technologies and processes, it covers sensor and MEMS technology,
multi-sensor systems and data fusion techniques, capture/recognition approaches,
pattern recognition/machine learning algorithms, logical and ontological modeling
methods and reasoning techniques, hybrid approaches to representation and rea-
soning, conventional and multimodal user interfaces, and software and artificial
intelligent agents. As to core computational capabilities, it comprises context
awareness, implicit and natural interaction, and intelligent behavior in relation to
human-inspired AmI applications.
Part II (Chaps. 7–9) deals with a variety of human-inspired AmI applications,
namely cognitive and emotional context-aware systems, affective/emotion-aware
systems, multimodal context-aware affective systems, context-aware emotionally
intelligent systems, socially intelligent systems, explicit natural and touchless
systems, and conversational systems. It provides a detailed review and synthesis of
a set of theoretical concepts and models pertaining to emotion, emotional intelli-
gence, cognition, affect, aesthetics, presence, nonverbal communication behavior,
linguistics, pragmatics, sociolinguistics, psycholinguistics, and cognitive linguis-
tics. With their explanatory power, these conceptual and theoretical frameworks,
coupled with the state-of-the-art enabling technologies and computational processes
and capabilities, can be used to inform the design, modeling, evaluation, and
implementation of the respective human-inspired AmI applications.
Parts I and II are anchored in, based on the nature of the topic, philosophical and
analytical discussions, worked out with great care and subtlety of detail, along with
theoretical and practical implications and alternative research directions, high-
lighting an array of new approaches to and emerging trends around some of the core
concepts and ideas of AmI that provide a more holistic view of AmI.
With its three parts, the book comprises 10 chapters, which have a standardized
scholastic structure, making them easy to navigate. Each chapter draws on some of
the latest developments and prospects as to findings in the burgeoning research area
of AmI, along with voices of high-profile and leading scholars, scientists, and
experts. Moreover, the chapters can be used in various ways, depending on the
reader’s interests: as a stand-alone overview of contemporary (theoretical, empiri-
cal, and analytical) research on AmI; as a seminal resource or reference for pro-
spective students and researchers embarking on studies in computing, ICT
innovation, science and technology, and so forth. In addition, Part II can be used as
a complement to the Part I chapters, enabling students, researchers, and others to
make connections between their perceptions and understandings, relevant research
evidence, and theoretical concepts and models, and the experiences and visions of
computer scientists and AmI creators and producers.
The contents of this book are structured to achieve two outcomes. Firstly, it is
written so the reader can read it easily from end to end—based on his/her back-
ground and experience. It is a long book that is packed with value to various classes
of readers. Whether the reader diligently sits down and reads it in a few sessions at
home/at the library or goes through a little every now and then, he/she will find it
interesting to read and accessible—especially those readers with passionate interest
in or deep curiosity about new visions of the future of technology. Secondly, it is
written so that the reader can call upon specific parts of its content information in an
easy manner. Furthermore, each of its chapters can be read on its own or in
sequence. It is difficult to assign a priority rating to the chapters given that the book
is intended for readers with different backgrounds and interests, but the reader will
get the most benefit from reading the whole book in the order it is written, so that
he/she can gain a better understanding of the phenomenon of AmI. However, if you
are short of time and must prioritize, start with those chapters you find of highest
priority based on your needs, desires, or interests. Hence, as to how important the
topics are, the choice is yours—based on your own reflection and assessment.
Overall, the book has been carefully designed to provide you with the material
and repository required to explore the realm of AmI. AmI is an extremely complex,
intricate, varied, and powerful phenomenon, and it is well worth exploring in some
depth. The best way to enable the reader to embark on such an exploration is to
seamlessly integrate technological, human, and social dimensions in ways that build
on and complement one another. Achieving this combination is the main strength
and major merit of this book and succeeding in doing so is meant to provide the
reader with valuable insights into imminent AmI technologies, their anticipated
implications for and role in people’s future lives, potential ways of addressing and
handling the many challenges in making it a reality, and alternative research
directions for delivering the essence of the AmI vision. This is believed to be no
small achievement in its own right, and certainly makes the book a rewarding reading
experience for anyone who feels they could benefit from a greater understanding of
the domain. I encourage you to make the most of this opportunity to explore AmI,
an inspiring vision of the next wave of ICT with far-reaching implications for modern,
high-tech society. While some of us might shy away from foreseeing what the
future era of AmI will look like, it is certain to be a very different world. I wish you
well on the exploration journey.

1.4 Research Strategy: Interdisciplinary and Transdisciplinary Approaches

This research work operates out of the understanding that advances in knowledge
and an ever-increasing awareness of the complexity of emerging phenomena have
led researchers to pursue multifaceted problems that cannot be resolved from the
vantage point of a single discipline or sometimes an interdisciplinary field as an
organizational unit that crosses boundaries between academic disciplines. AmI is a
phenomenon that is too complex and dynamic to be addressed by single disciplines
or even an interdisciplinary field. Indeed, the impacts of AmI applications in terms of
context, interaction, and intelligent behavior, for instance, extend well beyond even a highly
interdisciplinary field, hence the need to espouse a transdisciplinary perspective
as to some of its core aspects. Besides, it is suggested that interdisciplinary efforts
remain inadequate in their impact on theoretical development for coping with
changing human circumstances. Still, an interdisciplinary approach remains relevant for
looking at AmI as a field of tension between social, cultural, and political practices and
the application and use of new technologies or an area where a wide range of
technological and scientific areas come together around a common vision of the
future and the enormous opportunities such future will open up. Thus, in the context
of AmI, some research topics remain within the framework of disciplinary research,
and other research topics cannot be accomplished in disciplinary research. In light
of this, both interdisciplinary and transdisciplinary research approaches are
espoused in this book to investigate the AmI phenomenon. Adopting this research
strategy has made it possible to flexibly respond to the topic under inquiry and
uncover the best way of addressing it. It is aimed at contributing to an integral
reflection upon where the still-emerging field of AmI is coming from and where it is
believed it should be heading.
Seeking to provide a holistic understanding of the AmI phenomenon for a
common purpose or in the pursuit of a common task, the interdisciplinary approach
insists on the mixing of disciplines and theories. It thereby crosses boundaries
between disciplines to create new perspectives based on interactional knowledge
beyond these disciplines. It is of high importance because it allows interlinking
different analyses and spilling over disciplinary boundaries. The field of AmI
should see the surge of interdisciplinary research on the incidence of technological,
social, cultural, political, ethical, and environmental issues as well as strategic
thinking toward the social acceptance of AmI technology, with the capacity to
create methods for innovation and policy. Pooling various perspectives and modifying
them so that they become better suited to AmI as the problem at hand is therefore very
important for arriving at a satisfactory form of multidisciplinary AmI. The subject of
AmI appears differently when examined by different disciplines, for instance, his-
tory, sociology, anthropology, philosophy, cultural studies, innovation studies, and
so on.
The transdisciplinary approach insists on the fusion of disciplines into an outcome
that exceeds the sum of its parts, likewise focusing on issues that cross and dissolve disciplinary
boundaries, but with the aim of creating holistic, unified knowledge. Transdisciplinarity lends
itself to the exploration of multifaceted problems. This knowledge is by definition
situated at once between the disciplines, across the disciplines, and beyond each
single discipline, as these spaces contain a lot of useful knowledge in the presence of
several levels and categories of reality. Transdisciplinarity concerns the coherent knowledge
that the joint action of disciplines can generate, the discovery of which necessarily passes
through disciplinary knowledge. Common procedural postulates applied in this
regard are the existence of levels and categories of reality and the logic of the
included middle. And the objective is to better understand the AmI world, of
which one of the imperatives is the overarching unity of knowledge. Aiming for
transdisciplinary insight, the present analysis draws on several theories—including
human communication, natural interaction, human intelligence, context, cognition,
emotion, situated cognition, situated action, aesthetics, linguistics, and culture.
Understanding the tenets of many relevant theories allows a more complete under-
standing of AmI. Among the most holistic, these theories are drawn from cognitive
science, cognitive psychology, social sciences, humanities, and philosophy. The
purpose here is to set side by side elements of some theories that have strong and
clear implications for the notion of AmI, rather than provide a detailed review of
an exhaustive set of theories from academic disciplines specialized in the related
subject matters.
In sum, given the nature of the topic under investigation, transdisciplinary
research is seen as complementary to interdisciplinary research. It is important to
note that transdisciplinarity is radically distinct from interdisciplinarity due to its
goal, the understanding of the present world of AmI, which cannot be, in any case,
accomplished in the framework of disciplinary research, while the goal of inter-
disciplinarity always remains within the framework of disciplinary research—AmI
as a computing paradigm or a crossover approach linked to computer science
topics. The confusion surrounding the difference between these research strategies
is in general explained by the fact that they both overflow disciplinary boundaries.
It is argued that this confusion is disadvantageous because it hides the enormous
potential of transdisciplinarity.

References

Aarts E, de Ruyter B (2009) New research perspectives on Ambient Intelligence. J Ambient Intell
Smart Environ 1(1):5–14
Bibri SE (2014) The potential catalytic role of green entrepreneurship—technological eco–
innovations and ecopreneurs’ acts—in the structural transformation to a low-carbon or green
economy: a discursive investigation. Master Thesis, Department of Economics and
Management, Lund University
Criel J, Claeys L (2008) A transdisciplinary study design on context-aware applications and
environments. A critical view on user participation within calm computing. Observatorio
(OBS*) J 5:057–077
Crutzen CKM (2005) Intelligent ambience between heaven and hell. Inf Commun Ethics Soc 3
(4):219–232
Gunnarsdóttir K, Arribas-Ayllon M (2012) Ambient Intelligence: a narrative in search of users.
Lancaster University and SOCSI, Cardiff University, Cesagen
Hayes PJ, Reddy RD (1983) Steps toward graceful interaction in spoken and written man-machine
communication. Int J Man Mach Stud 19(3):231–284
ISTAG (2001) Scenarios for Ambient Intelligence in 2010. ftp://ftp.cordis.lu/pub/ist/docs/
istagscenarios2010.pdf. Viewed 22 Oct 2009
ISTAG (2006) Shaping Europe’s future through ICT. http://www.cordis.lu/ist/istag.htm. Viewed
22 Mar 2011
José R, Rodrigues H, Otero N (2010) Ambient intelligence: beyond the inspiring vision. J Univ
Comput Sci 16(12):1480–1499
Lindblom J, Ziemke T (2002) Social situatedness: Vygotsky and beyond. In: The 2nd international
workshop on epigenetic robotics: modeling cognitive development in robotic systems, pp 71–78,
Edinburgh, Scotland
Lombard M, Ditton T (1997) At the heart of it all: the concept of presence. J Comput Mediat
Commun 3(2)
Lueg C (2002) Operationalizing context in context-aware artifacts: benefits and pitfalls. Hum
Technol Interface 5(2)
Markopoulos P, de Ruyter B, Privender S, van Breemen A (2005) Case study: bringing social
intelligence into home dialogue systems. ACM Interact 12(4):37–43
Nijholt A, Rist T, Tuijnenbreijer K (2004) Lost in Ambient Intelligence? In: Proceedings of
CHI 2004, Vienna, Austria, pp 1725–1726
Punie Y (2003) A social and technological view of ambient intelligence in everyday life: what
bends the trend? Eur Media Technol Everyday Life Netw 2000–2003, Institute for Prospective
Technological Studies Directorate General Joint Research Center European Commission
Schindehutte M, Morris MH, Pitt LF (2009) Rethinking marketing—The entrepreneurial
imperative. Pearson Education, New Jersey
Suchman L (1987) Plans and situated actions: the problem of human-machine Communication.
Cambridge University Press, Cambridge
Suchman L (2005) Introduction to plans and situated actions II: human-machine reconfigurations,
2nd expanded edn. Cambridge University Press, New York/Cambridge
ter Maat M, Heylen D (2009) Using context to disambiguate communicative signals. In:
Esposito A, Hussain A, Marinaro M, Martone R (eds) Multimodal signals, LNAI 5398.
Springer, Berlin, pp 164–169
Vilhjálmsson HH (2009) Representing communicative function and behavior in multimodal
communication. In: Esposito A, Hussain A, Marinaro M, Martone R (eds) Multimodal signals:
cognitive and algorithmic issues. Springer, Berlin, pp 47–59
Weiser M (1991) The computer for the 21st century. Sci Am 265(3):94–104
Part I
Enabling Technologies and Computational Processes and Capabilities

Chapter 2
Ambient Intelligence: A New Computing Paradigm and a Vision of a Next Wave in ICT

2.1 Introduction

AmI has emerged in the past 15 years or so as a new computing paradigm and a
vision of a next wave in ICT. It postulates a paradigmatic shift in computing and
offers a vision of the future of ICT—with far-reaching societal implications, rep-
resenting an instance of the configuration of social-scientific knowledge. AmI is a
multidisciplinary field (within ubiquitous computing) where a wide range of scientific
and technological areas and human-directed sciences converge on a common
vision of the future and the enormous opportunities and immense possibilities such a
future will open up and bring, created by the incorporation of
machine intelligence into people’s everyday lives. In other words, it is said to hold
great potential and promise in terms of social transformations. As such, it has
increasingly gained legitimacy as an academic and public pursuit and discourse in
the European information society: scientists and scholars, industry experts and
consortia, government science and technology agencies, science and technology
policymakers, universities, and research institutes and technical laboratories are
making significant commitments to AmI.
By virtue of its very definition, implying a certain desired view on the world, it
represents more a vision of the future than a reality. And as shown by and known
from preceding techno-visions and forecasting studies, the future reality is most
likely to end up being very different from the way it is initially envisioned. Indeed,
techno-visions seem to face a paradox, in that they fail to balance between inno-
vative and futuristic claims and realistic assumptions. This pertains to unreasonable
prospects, of limited modern applicability, on how people, technology, and society
will evolve, as well as to a generalization or oversimplification of the rather specific
or complex challenges involved in enabling future scenarios or making them for
real. Also, crucially, techno-utopia is a relevant risk in such a strong focus on
ambitious and inspiring visions of the future of technology. Techno-utopian dis-
courses surround the advent of new technological innovations or breakthroughs, on
the basis of which these discourses promise revolutionary social changes. The
central issue with techno-visions is the technologically deterministic view under-
lying many of the envisioned scenarios, ignoring or falling short in considering the
user and social dynamics involved in the innovation process.
Furthermore, recent years have—due to the introduction of technological
innovations or breakthroughs and their amalgamation with recent discoveries in
human-directed sciences—witnessed an outburst of claims for new paradigms and
paradigm shifts in relation to a plethora of visions of next waves in ICT, social
studies of AmI included—a kind of new paradigm and paradigm-shift epidemic.
Many authors and scholars have a tendency to categorize AmI—as recent techno-
scientific achievements or advances in S&T—as a paradigm and thus paradigm
shift in relation to computing, ICT, society, and so on. In fact, there has been a near
passion for labeling new technological visions as paradigms and paradigm shifts as
a way to describe a certain stage of technological development within a given
society. While such visions emanate from the transformational effects of comput-
ing, predominantly, where paradigm and paradigm shift actually hold, they still
entail a lot of aspects of discursive nature in the sense of a set of concepts, ideas,
claims, assumptions, premises, and categorizations that are historically contingent
and socio-culturally specific and generate truth effects accordingly. The underlying
assumption is that while AmI as new technological applications is the result of
scientific discovery or innovation, it is still directed towards humans and targeted at
complex, dynamic social realities made of an infinite richness of circumstances, and
involving intertwined factors and situated social dynamics. In other words, AmI has
been concerned with people-centered approaches in the practice of technological
development. I therefore argue that there is a computing paradigm profile relating to
AmI as to ubiquitous computing—which constitutes one of its major visions, but there
is no paradigm in society—nor should there be. Accordingly, AmI as a techno-
logical vision involves paradigmatic, non-paradigmatic, pre-paradigmatic, and post-
paradigmatic dimensions, as well as discursive aspects.
However, at the technological level, AmI is characterized by human-like cog-
nitive and behavioral capabilities, namely context awareness, implicit and natural
interaction, and intelligence (cognitive, emotional, social, and conversational). By
being equipped with advanced enabling technologies and processes and what this
entails in terms of miniature smart sensors, sophisticated data processing and
machine learning techniques, and hybrid modeling approaches to knowledge rep-
resentation and reasoning, AmI should be capable of thinking and behaving intelligently
in support of human users, by providing personalized, adaptive, responsive, and
proactive services in a variety of settings: living spaces, workspaces, social and
public places, and on the move. With the progress in the fields of microelectronics
(i.e., miniaturization and processing power of sensing and computing devices),
embedded systems, wireless and mobile communication networks, and software
intelligent agents/user interfaces, the AmI vision is evolving into a deployable and
achievable computing paradigm.
The aim of this chapter is to give insights into the origin and context of the AmI
vision; to shed light on the customary assumptions behind the dominant vision of
AmI, underlying many of its envisioned scenarios, and provide an account on its
current status; to outline and describe a generic typology for AmI; to provide an
overview on technological factors behind AmI and the many, diverse research
topics and areas associated with AmI; to introduce and describe human-directed
sciences as well as artificial intelligence and their relationships and contributions to
AmI; and to discuss key paradigmatic, non-paradigmatic, pre-paradigmatic, and
post-paradigmatic dimensions of AmI. Moreover, this chapter intends to provide
essential underpinning conceptual tools for exploring the subject of AmI further in
the remaining chapters.

2.2 The Origin and Context of the AmI Vision

Much of what characterizes AmI can be traced back to the origins of ubiquitous
computing. AmI as a new computing paradigm has evolved as a result of an
evolutionary technological development, building upon preceding computing par-
adigms, including mainframe computing, desktop computing, multiple computing,
and ubiquitous computing (UbiComp). As a vision of a next wave in ICT, a kind of
shift in computer technology and its role in society, AmI became widespread and
prevalent in Europe about a decade after the emergence of the UbiComp vision in
the USA, a future world of technology which was spotted in 1991 by Mark Weiser,
chief scientist at the Xerox Palo Alto Research Center (PARC) in California, when
he published a paper in Scientific American which spoke of a third generation of
computing systems, an era when computing technology would vanish into the
background. Weiser (1991) writes: ‘First were mainframes, each shared by lots of
people. Now we are in the personal computing era, person and machine staring
uneasily at each other across the desktop. Next comes ubiquitous computing, or the
age of calm technology, when technology recedes into the background of our lives.
Alan Kay of Apple calls this “Third Paradigm” computing’. So, about 25 years ago,
Mark Weiser predicted this technological development and described it in his
influential article “The Computer for the 21st Century” (Weiser 1991). Widely
credited as the first to have coined the term ‘ubiquitous computing’, Weiser alluded
to it as omnipresent computing devices and computers that serve people in their
everyday lives, functioning unobtrusively in the background of their consciousness
and freeing them from tedious routine tasks. In a similar fashion, the European
Union’s Information Society Technologies Advisory Group (ISTAG) used the term
‘ambient intelligence’ in its 1999 vision statement to describe a vision where
‘people will be surrounded by intelligent and intuitive interfaces embedded in
everyday objects around us and an environment recognizing and responding to the
presence of individuals in an invisible way’ (ISTAG 2001, p. 1). In the European
vision of AmI (or the future information society), ‘the emphasis is on greater
user-friendliness, more efficient services support, user-empowerment, and support
for human interactions’ (ISTAG 2001, p. 1). Issues on key difference between the
two visions and concepts are taken up in the next section.
The research within UbiComp and the development of the vision in the USA have
been furthered in concert with other universities, research centers and laboratories,
governmental agencies, and industries. The universities involved include
MIT, Berkeley, Harvard, Yale, Stanford, Cornell, Georgia Tech’s College of
Computing, and so on. As an example, MIT has contributed significant research in
the field of UbiComp, notably Hiroshi Ishii’s Things That Think consortium at the
Media Lab and Project Oxygen. It is worth pointing out that research undertaken at
those universities has been heavily supported by government funding, especially by
the Defense Advanced Research Projects Agency (DARPA), which is the central
research and development organization for the Department of Defense (DoD), and
the National Science Foundation (NSF) as an independent federal agency. Many
other corporations have additionally undertaken UbiComp research, either on their
own or in consortia with other companies and/or universities. These
include Microsoft, IBM, Xerox, HP, Intel, Cisco Systems, Sun Microsystems, and
so forth.
Inspired by the UbiComp vision, the AmI vision in Europe was promoted by
certain stakeholders—a group of scholars and experts, a cluster of ICT companies,
research laboratories, governmental agencies, and policymakers. AmI was origi-
nally developed in 1998 by Philips for the time frame 2010–2020 as a vision of the
future of ICT (consumer electronics, telecommunications, and computing) where
user-friendly devices support ubiquitous information, communication, and enter-
tainment. In 1999, Philips joined the Oxygen alliance, an international consortium
of industrial partners within the MIT Oxygen project. In 2000, plans were made to
construct a feasibility and usability facility dedicated to AmI. A major step in
developing the vision of AmI in Europe came from ISTAG, a group
of scholars and industry experts who first advanced the vision of AmI in 1999. In
this year, ISTAG published a vision statement for the European community
Framework Program (FP) 5 for Research and Technological Development
(RTD) that laid down a challenge to start creating an AmI landscape. During 2000,
a scenario exercise was launched to assist in further developing a better under-
standing of the implications of this landscape as a collaborative endeavor between
the Joint Research Center’s Institute for Prospective Technological Studies
(IPTS-JRC) and DG Information Society, and the development and testing of
scenarios involved about 35 experts from across Europe. In parallel with the
development of the AmI vision at Philips at the time ISTAG working group was
chaired by CEO of Philips Industrial Research Dr. Martin Schuurmans, a number of
other initiatives started to explore AmI further with the launch of and the funneling
of expenditure in research projects. ISTAG continued to develop the vision under
the IST program of the European Union (EU) FP6 and FP7 for RTD. It has since
1999 made consistent efforts for ICT to get an increased attention and a higher pace
of development in Europe (Punie 2003). Indeed, it is a strong promoter of, and a
vocal champion for, the vision of AmI. With ISTAG and the EU IST RTD funding
program, huge efforts have been made in the EU to mobilize research and industry
towards laying the foundation of an AmI landscape and realizing the vision of AmI.
There has been a strong governmental and institutional support for AmI. AmI has
been embedded in one of the funding instruments of the European Commission
(EC), notably under its FP5, FP6, and FP7. EC is a key player in the further
development of the AmI vision; it used it for the launch of its FP5 and FP6,
following the advice of ISTAG. In particular, AmI was one of the key concepts
being used to develop the Information Society aspects of the EU’s RTD FP 6. The
association of AmI with the European policies towards the knowledge society and
the financial backing in the FP IST research programs contributed to making AmI a
very active research topic. European industry, consortiums, universities, research
institutes, and member states have also been mobilized to contribute to the reali-
zation of the AmI vision, by devoting funds to AmI research (e.g., Wright 2005).
As a result of many research initiatives and endeavors, the AmI vision gained a
strong footing in Europe. This has led to the establishment of roadmaps, research
agendas, projects, and other endeavors across Europe, spanning a variety of
domains, such as context awareness computing, multimodal communication mod-
eling, micro-systems design, embedded systems, multimedia, service provisioning,
privacy and security, affective computing, and so on. Virtually all European AmI
projects have been undertaken by consortia, which typically comprise partners from
different countries and different sectors, especially universities and industry (Wright
2005). The increase of AmI projects and research activities has been driving up the
EC budget, apart from the heavy investment undertaken by, and the huge funding
spent by, many European corporations, companies, universities, and other involved
stakeholders from different sectors in the EU. In addition, in the aftermath of the
first European symposium on AmI (EUSAI) that took place in 2004, many con-
ferences and forums have been, and continue to be, held across Europe to date,
addressing a range of topics within AmI research and practice. The goal of all these
efforts and stakeholder motivation is to spur innovation and the S&T knowledge
base for well-being, competitiveness, and growth in the future European informa-
tion society (Punie 2003), by unlocking the transformational effects of ICT. AmI
can be used as a medium to achieve innovation (Aarts 2005). AmI has a great
potential to lead to ‘radical social transformations’ and new ICT to ‘shape Europe’s
future’ (ISTAG 2003, 2006). Innovation has long been recognized as a vehicle for
societal transformation, especially as a society moves from one technological epoch
to another.

2.3 The Current Status, Unrealism, and Technological Determinism of the AmI Vision

Notwithstanding the huge financial support and funding provided and the intensive
research in academic circles and in the industry, coupled with the strong interest
stimulated by European policy makers, the current state of research and develop-
ment shows that the vision of AmI is facing enormous challenges and hurdles in its
progress towards realization and delivery in Europe. Demonstrably, the ‘AmI
Space’ has not materialized as foreseen or envisaged 15 years ago—by ISTAG. No
real breakthrough in AmI research has been perceived or achieved thus far, although AmI
environments are intelligent and AmI applications and services make people’s lives
better. It is argued that among the causes why AmI environments have not
broken through into the mainstream are the prevailing assumptions in the vision of
AmI, underlying many of the envisioned scenarios pertaining to the
pre-configuration of users in, and the kind of society envisaged with, AmI—i.e.,
unrealism and technological determinism. Like preceding techno-visions, by virtue
of its very definition, implying a certain desired view on the world, AmI represents
more a vision of the future than reality. And as shown by and known from fore-
casting studies, the future reality is most likely to end up being very different from
the way it is initially envisioned or predicted. Indeed, techno-visions appear to face
a paradox, in that they fail to balance between innovative and futuristic claims and
realistic assumptions. This pertains to unreasonable prospects, of limited modern
applicability, on how people and technology will evolve, as well as to an over-
simplification of the rather complex challenges involved in enabling future sce-
narios or making them for real. Also, techno-utopia is a relevant risk in such a
strong focus on such aspiring and inspiring visions of the future of technology.
Techno-utopian discourses are common with the advent of new technological
innovations or breakthroughs, on the basis of which these discourses promise
revolutionary social changes. The central issue with techno-visions is the techno-
logically deterministic view underlying many of the envisioned scenarios.
However, techno-visions seem to fail to deliver what they promise or to realize their
full potential, regardless of the extent to which visionaries, research leaders, and
policymakers build expectations, mobilize and marshal R&D resources, and inspire
and align strategic stakeholders towards the realization and delivery of such visions.
The main reason for this phenomenon lies in the difficulty of avoiding unrealism
and technological determinism.
A key implication of technological determinism is overlooking the user and social
dynamics and undercurrents involved in the innovation process. This implies that
techno-visions only look at what is technologically feasible and have a
one-dimensional account of how social change occurs (Burgelman 2001). This may
involve the risk of people becoming disinclined to accept, absorb, or adapt to
technological innovation opportunities and the promised radical social transforma-
tion consequently becoming a fallacy. Similarly, one of the ramifications of unre-
alism—e.g., a design process grounded in the unrealistic assumptions pervading
(user) scenarios—is the irrelevant and unrealistic systems and applications that no
one will use, adopt, or benefit from. What is needed is to ‘better understand what
people want, how they take advantage of available devices, and how to craft devices
and systems in ways that intelligently inserts them into ordinary everyday affairs—
not just the affairs of one individual at a time, but into the ordinary interactions found
in group activity or social settings more generally’ (Gunnarsdóttir and
Arribas-Ayllon 2012, p. 32). In light of this, if no real breakthrough in AmI research
and development is perceived, it would be mainly because of the prevailing vision of
user participation (see Criel and Claeys 2008), in addition to ignoring what recent
history of ICT and social studies of new technologies have shown in terms of the
importance of social innovation as an ingredient in technology innovation and the
central role of multiple methods of participative design as innovation instruments, as
well as failing to make explicit the consideration for human values and concerns in
the design choices and decisions that will shape AmI technology. Seeing the user as
a shaper of technology, these views call upon a more active participatory role in
technology innovation and design, and thereby challenge the passive role of the user
as a mere adopter of new technologies (e.g., Alahuhta and Heinonen 2003).
Furthermore, putting emphasis on the user in AmI innovation research plays a
key role in the development of related applications and services. However, it is
unquestionable that the current or dominant user-centered design approaches—
albeit originated from participatory design—place the user at such a central stage as
they often claim, which goes together with the vision of AmI (e.g., Criel and Claeys
2008). As to the humanistic philosophy of technology design, experiences have
shown that it is very challenging to give people the lead and consider their values
and concerns in the ways systems and applications are developed and applied. In
other words, the difficulty with the human-centered design approach is that it is far from
clear how this can be achieved due to the availability of little knowledge and the
lack of tools to integrate user behavior as a parameter in system design and product
and service development (Punie 2003; Riva et al. 2003). As to social innovation,
while it is considered decisive in producing successful technological systems as
well as in the acceptance of new technologies, it is often seen to be very challenging
as well as too costly and time-consuming for technology creators to take on board.
Regardless, in reference to the AmI vision, Aarts and Grotenhuis (2009) underscore
the need for a value shift: ‘…we need a more balanced approach in which tech-
nology should serve people instead of driving them to the max’. This argument
relates to social innovation in the sense of directing the development of new
technologies towards responding to users’ needs and addressing social concerns. In
other words, technological development has to be linked with social development.
The underlying assumption is that failing to make this connection is likely to result
in people rejecting new technologies and in societal actors misdirecting and mis-
allocating resources, e.g., the mobilization of professionals, experts, companies, and
technical R&D.
Nevertheless, as many argue, visions of the future of technology are meant to
provoke discussion or promote debate and depict plausible futures or communicate
possible scenarios, adding to mobilizing and marshalling resources and inspiring
and aligning key stakeholders into the same direction. As Gunnarsdóttir and
Arribas-Ayllon (2012, p. 30) point out, ‘[t]he AmI vision emerges from a pedigree
of expectations about the future of computing…The original scenarios are central to
making up new worlds and building expectations around prospective lifestyles and
users. Rhetorically, they contribute to conditions that make visions of AmI seem-
ingly possible. But they also engender capacities to investigate what is actually
possible. Incorporating new challenges and anticipating problems modulates the
course of expectations… New visions are adapted to accommodate contingent
futures—uncertainties about design principles, experiences, identities and
preferences… Visionaries and research leaders continue to imagine new
socio-technical arrangements in which…experiences are profoundly changing. The
new interaction paradigm between people and technology will be embedded in an
ecological utopia…based on values associated with intimate connections between
people and things… [A] greater vision needs to be cultivated to sustain both
research and…funding interests.’
With the purpose of reflecting on what it ‘means for the AmI vision, and its
foundational role for AmI at large’ to ‘move from visionary perspectives of the
future to a new focus on the challenge of actually being able to deliver real value
today’, José et al. (2010, p. 1480, 1482) suggest ‘that it is time for the AmI field to
move beyond its founding vision and embrace important emerging trends that may
bring this field closer to realization, delivery and real social impact’ and that revolve
‘around some of its core concepts, more specifically the notion of intelligence, the
system view and the requirements process. The main motivation is to search for
alternative research directions that may be more effective in delivering today the
essence of the AmI vision, even if they mean abandoning some of the currently
prevailing approaches and assumptions’.

2.4 AmI Versus UbiComp as Visions

AmI and UbiComp share many similar assumptions, claims, ideas, terminologies,
and categorizations. They depict a vision of the future information society where
everyday human environment will be permeated by computer intelligence and
technology: humans will be surrounded and accompanied by advanced sensing and
computing devices, intelligent multimodal interfaces, intelligent software agents,
and wireless and ad-hoc (a system of network elements combined to form a network
entailing no planning) networking technology, which are everywhere, invisibly
embedded in human natural surroundings, in virtually all kinds of everyday objects
in order to make them smart. This computationally augmented everyday environ-
ment is aware of people’s presence and context, and is adaptive, responsive, and
anticipatory to their needs and desires, thereby intelligently supporting their daily
lives through providing unlimited services in new, intuitive ways and in a variety of
settings. In other words, smart everyday objects can interact and communicate with
each other and other people’s objects, explore their own environment (situations,
events, locations, user states, etc.), and interact with human users, therefore helping
them to cope with their daily tasks in a seamless and intuitive way.
While AmI and UbiComp visions converge on the pervasion of microprocessors
and communication capabilities into everyday human environments and thus the
omnipresence and always-on interconnection of computing resources and services,
AmI places a particularly strong focus on intelligent interfaces that are sensitive to
users’ needs, adaptive to and anticipatory of their desires and intentions, and
responsive to their emotions. Philips has distinguished AmI from UbiComp as a
related vision of the future of technology, by characterizing the AmI vision as a
seamless smart environment capable of anticipating and intelligently responding to
people’s needs and motivations, and acting autonomously on their behalf
(Gunnarsdóttir and Arribas-Ayllon 2012). ISTAG (2003) claims that AmI emerged
in parallel with UbiComp but is different from it, in that AmI is concerned more
with the use of the technology than basic technology: what characterizes this dif-
ference particularly are the focus (users in their environment versus next-generation
computing technology) and the orientation (user-pull versus technology push) of
technology (Ibid). Weiser (1993, p. 75) wrote: ‘Since we started this work at PARC
in 1988 a few places have begun work on this possible next-generation computing
environment in which each person is continually interacting with hundreds of
nearby wirelessly interconnected computers. The goal is to achieve the most
effective kind of technology, that which is essentially invisible to the user. To bring
computers to this point while retaining their power will require radically new kinds
of computers of all sizes and shapes to be available to each person. I call this future
world “Ubiquitous Computing”’. At the core of the AmI vision, on the other hand,
are three technologies: ubiquitous computing, ubiquitous communication, and
intelligent user-friendly interfaces. Ubiquitous computing means integration of
microprocessors into everyday objects, ubiquitous communication enables these
objects to communicate with each other and human users by means of wireless and
ad-hoc networking, and intelligent user-friendly interfaces allow the inhabitants of
the AmI environment to interact with the environment in a natural and personalized
way (Riva et al. 2005). Accordingly, AmI stems from the convergence of these
three key technologies.
To a large extent, the distinctive characteristics have been set by the
ISTAG reports on AmI: according to the vision statement, ‘on convergence humans
will be surrounded by intelligent interfaces supported by computing and networking
technology which is everywhere, embedded in everyday objects… AmI implies a
seamless environment of computing, advanced networking technology and specific
interfaces. It is aware of the specific characteristics of human presence and per-
sonalities, takes care of needs and is capable of responding intelligently to spoken
or gestured indications of desire, and even can engage in intelligent dialog. AmI
should also be unobtrusive, often invisible: everywhere and yet in our conscious-
ness—nowhere unless we need it. Interaction should be relaxing and enjoyable for
the citizen, and not involve a steep learning curve’ (ISTAG 2001, p. 11; ISTAG
2003, p. 8). In other words, AmI can be described as the merger of two important
visions: ‘ubiquitous computing’ and ‘social user interfaces’: ‘It builds on advanced
networking technologies, which allow robust, ad-hoc networks to be formed by a
broad range of mobile devices and other objects (ubiquitous computing). By adding
adaptive user-system interaction methods, based on new insights in the way people
like to interact with computing devices (social user interfaces), digital environments
can be created which improve the quality of life of people by acting on their behalf.
These context-aware systems combine ubiquitous information, communication, and
entertainment with enhanced personalization, natural interaction and intelligence’
(Riva et al. 2003, p. 63). In all, AmI is a vision in which ICT and its applications
and uses are widened and deepened—a drastic shift in the users of the technology,
its incorporation into diverse spheres of living and working, and the applications
(Punie 2003).
In fact, the vision of the future of technology is reflected in a variety of terms that
closely resemble each other, including, in addition to AmI and UbiComp, pervasive
computing, ubiquitous networking, everywhere computing, sentient computing,
proactive computing, calm computing, wearable computing, invisible computing,
affective computing, haptic computing, the Internet of Things, Things that Think,
and so on. These terms are used by different scholars and industry players to
promote the future vision of technology in different parts of the world. For example,
AmI is used in Europe, and the term was coined by Emile Aarts of Philips Research
in 1998 and adopted by the European Commission. Its equivalent in the USA is
UbiComp; Mark Weiser was first credited with dubbing the term in the late 1980s,
during his tenure as a Chief Scientist/Technologist at the Xerox Palo Alto Research
Center (PARC). He wrote some of the earliest papers on the subject, largely
defining it and sketching out its major concerns (Weiser 1991; Weiser et al. 1999).
Ubiquitous networking is more prevalent in Japan. Essentially all these terms mean
pretty much the same thing: regardless of their locations, researchers are all
investigating and developing similar technologies and dealing with similar chal-
lenges and problems (see Wright 2005).

2.5 AmI Versus UbiComp as Concepts

AmI as a concept is similar to UbiComp—intelligence everywhere. Similar to the
vision, however, views from the European scholarly community argue that they
differ in some aspects. AmI and UbiComp as concepts can still imply a slightly
different focus. AmI is the direct extension of the concept UbiComp, but it is much
more than this, as the AmI system should be adaptive and responsive to the user’s
needs and behavior (Riva et al. 2003; ISTAG 2001). The term AmI has a recent
provenance and is not clearly discerned from earlier concepts, such as UbiComp
(ISTAG 2003). Indeed, to the set of core system properties initially proposed by
Weiser (1991) two additional ones have been added: computers (1) can operate
autonomously, on behalf of the user or without human intervention, be self-
governed, and (2) handle a multiplicity of dynamic interactions and actions, gov-
erned by intelligent decision making and interaction, which involves artificial
intelligence techniques (Poslad 2009). Weiser (1991) suggested three main internal
properties in order for the UbiComp systems to be interleaved into the world:
(1) computers need to be networked, distributed, and transparently accessible, as
wireless communication networks and the Internet were far less pervasive; (2) HCI needs
to be hidden (implicit), for it was overly intrusive; and (3) computers need to be
aware of the context of physical and human environment in order to operate in their
physical and human environment in an optimal way. According to Poslad (2009),
different types of UbiComp systems have been proposed based upon merging dif-
ferent sets of core properties, including ubiquity and transparency; distributed
mobile, intelligence, augmented reality; autonomy and iHCI; AmI; and so forth.

2.6 UbiComp and AmI: Definitional Issues

In general, the term ‘ubiquitous’ means omnipresent: appearing or existing
everywhere. Combined with computing, it forms the term ‘ubiquitous computing’,
which was introduced by Mark Weiser in the early 1990s, and denotes that technology
in all its forms—computing, communication, and networking—will permeate
everyday human environment. It is a concept in computer science wherein com-
puting can occur using any device and system, in any location and co-location, and
in any design format, enabling the human user to interact with such diverse forms of
computers as laptops, smart cards and devices, tablets, and terminals in everyday
objects. UbiComp is a way to describe computers that ‘fit the human environment
instead of forcing humans to enter theirs’ (York and Pendharkar 2004, pp. 773–
774). In more detail, UbiComp entails sensing and computing devices (and related
services) being omnipresent, situated in physical and human world environment,
and functioning unobtrusively in the background while being intuitive to human
usage to such an extent that users are not even aware of their presence or sense their
interaction—i.e., the UbiComp devices disappear into the environment and from the
perception of users such that the latter can engage many (hidden) devices
simultaneously without necessarily being aware of doing so, simply using them
unconsciously to accomplish everyday tasks in a variety of settings. UbiComp is
about technology vanishing, being invisibly woven, into the fabric of everyday life
and being massively used by people (Weiser 1991).
Thus far, there is no canonical definition of AmI, although many attempts have
been undertaken over the last 15 years to define the concept of AmI. AmI is a
difficult concept to define precisely; hence, it has been used in multiple ways.
Definitions are fundamental to, and lay the foundation of, the understanding of AmI
as a new concept, as they illustrate the properties of AmI and elucidate the term in
relation to related terms. What is common to all definitions in the literature on AmI
is that it is conceived as distributing computation in the environment and a novel
approach to HCI—i.e., human-centric or social user interfaces. The most basic
prerequisite of AmI is that it is focused on the human actor and thus concerned with
people-centered practice of technology development. Indeed, most attempts to
define and redefine the notion of AmI by the studies that flooded in after the pub-
lication of the ISTAG reports on AmI in 2001 and 2003 emphasize this shared
characteristic—AmI denotes a shift towards ‘human-centered computing’ (e.g.,
Aarts et al. 2002). AmI claims to place the user at the center of future design and
development of technologies and provides guiding principles for how this should be
accomplished. In AmI, technologies should be designed and developed for people
rather than making people adapt to technologies. Reiterating ISTAG’s (2001,
p. 11) description of AmI for clarification purposes, ‘…humans will be surrounded
by intelligent interfaces supported by computing and networking technology which
is everywhere, embedded in everyday objects… AmI… is aware of the specific
characteristics of human presence and personalities, takes care of needs and is
capable of responding intelligently to spoken or gestured indications of desire, and
even can engage in intelligent dialog. AmI should also be unobtrusive, often
invisible: everywhere and yet in our consciousness nowhere unless we need it.
Interaction should be relaxing and enjoyable for the citizen, and not involve a steep
learning curve’. This description points out some of the most fundamental ideas
underlying the AmI concept: ‘the idea of a radical and technology driven change to
existing environments and people’s lives; the view of networked devices strongly
embedded into the environment; the idea of transparent systems that do not need to
be noticed by people; the anticipatory and proactive nature of the system that frees
people from manual control of the environment; and intelligent interfaces that will
be able to understand and adapt, not only to the presence of people, but also to
situations of everyday life, including people’s moods, activities or expectations’
(José et al. 2010, p. 1481). In a nutshell, AmI is an adaptive, responsive, and
proactive technology that is omnipresent.
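
To make this adaptive, responsive, and proactive behavior more tangible, the following minimal Python fragment illustrates the kind of sense-infer-act loop such descriptions imply. It is an illustrative sketch only: the class, function, and rule names are hypothetical and are not drawn from any framework or source cited in this chapter, and the hard-coded heuristics merely stand in for the learning and reasoning capabilities discussed later.

```python
from dataclasses import dataclass


@dataclass
class Context:
    """A toy snapshot of the sensed situation; real AmI context models are far richer."""
    occupant_present: bool
    ambient_light: float  # illuminance in lux, e.g., from a light sensor
    hour_of_day: int      # 0-23


def infer_activity(ctx: Context) -> str:
    """Pre-programmed heuristics standing in for learned context inference."""
    if not ctx.occupant_present:
        return "away"
    if ctx.hour_of_day >= 22 or ctx.hour_of_day < 6:
        return "resting"
    return "active"


def proactive_actions(ctx: Context) -> list:
    """Map the inferred situation to unobtrusive, anticipatory responses."""
    activity = infer_activity(ctx)
    if activity == "away":
        return ["switch lights off"]
    if activity == "resting":
        return ["dim lights", "silence notifications"]
    if ctx.ambient_light < 150:
        return ["raise lighting gently"]
    return []


# Sense -> infer -> act: an evening scenario with an occupant and low ambient light
print(proactive_actions(Context(occupant_present=True, ambient_light=90.0, hour_of_day=20)))
# ['raise lighting gently']
```

In a real AmI environment, such hand-written rules would be replaced by context models, machine learning, and reasoning over far richer sensor data, as discussed in Chaps. 3 and 6.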
Other attempts to define AmI revolve essentially around the same set of con-
structs. Gill and Cormican (2005, p. 3) define AmI as ‘a people centered technology
that is intuitive to the needs and requirements of the human actor. They are
non-intrusive systems that are adaptive and responsive to the needs and wants of
different individuals’. AmI is described as technology that is capable of automating a
platform embedding the required devices for powering context-aware, personalized,
adaptive and anticipatory services (Aarts and Marzano 2003). AmI is lauded to be ‘a
new paradigm in information technology, in which people are empowered through a
digital environment that is aware of their presence and context, and is sensitive,
adaptive, and responsive to their needs, habits, gestures and emotions’ (Riva et al.
2003, p. 63). To Horvath (2002, cited in Gill and Cormican 2005), who advances the
definition further in practical terms, AmI signifies that ‘we will be surrounded by
intelligent interfaces embedded in everyday objects… These interfaces register our
presence, automatically carry out certain tasks based on given criteria, and learn
from our behavior in order to anticipate our needs’. Delving more into the human
actors’ interactions with AmI systems, Lindwer et al. (2003, cited in Gill and
Cormican 2005, p. 3) describe AmI as a technology that is ‘invisible, embedded in
our natural surroundings, present whenever we need it,’ the technology is easily
‘enabled by simple and effortless interactions,’ that are ‘attuned to all our senses,
adaptive to users and context and autonomously acting’.

2.7 More to the Characterizing Aspects of AmI

AmI has recently been adopted as a concept to refer to a multidisciplinary subject,
which embraces a variety of pre-existing fields, such as computer science, engi-
neering, cognitive neuroscience, human communication, and so on. Fundamentally,
multiple definitions and descriptions emerge when dealing with multidimensional
concepts or investigating new emerging multifaceted phenomena. AmI is an
evolving socio-technological phenomenon for which there is no clear and widely
acknowledged definition. The research within AmI is ambiguous and vast, which
makes it difficult to delineate the concept of AmI, although defining concepts is a
fundamental step in doing scientific research. This indeed has implications for
understanding the concept and hampers the advance of AmI. AmI as a new
paradigm in ICT is ill-defined, which is at present hindering its development (Gill
and Cormican 2005). The scholarly literature on AmI is almost as heterogeneous as
the approaches into the conceptualization, modeling, design, and development of
AmI systems within a variety of application domains. This has generated and led to
a profusion of definitions. There is a cornucopia of applications in the domain of
AmI supporting (or combining) different sets and scales of core properties (e.g.,
context awareness, implicit interaction, intelligence, and distribution) to different
degrees; various types of settings (e.g., home, learning, social, and work environ-
ments) to different degrees; multiple forms of smart computing devices (e.g.,
various types of sensors, MEMS, NEMS, VLSI video, and RFID); and a vast range
of possibilities for combining multiple systems to form interacting systems of
systems, and so forth. For example, on smart sensor technologies, Lindwer et al.
(2003, cited in Gill and Cormican 2005, p. 3) highlight there is a ‘large difference in
abstraction level between the thinking about Ambient Intelligence systems and the
micro-, nano-, and optoelectrical components needed to implement those systems’.
This makes the definitions of AmI not that useful to AmI designers and developers
as a research community.

[Fig. 2.1 Ambient intelligence system. Source: Gill and Cormican (2005)]

This substantiates that definitions of AmI need something
extra to assist AmI engineers in the creation and development of AmI systems—
e.g., generic typologies or frameworks. However, the extension of computing
power into everyday life scenarios in the context of AmI certainly requires
advanced knowledge from diverse human-directed disciplines beyond the proper
ambit of computing, such as cognitive psychology, cognitive science, neuroscience,
social science, behavioral science, linguistics, communication, and philosophy, to
name a few. This makes it certainly overwhelming to understand the concept and
philosophy of AmI. Adding to the lack of an agreed-upon definition is the alphabet
soup of metaphors created by computer scientists and ICT industry designers and
experts that commonly fall under the technology of the future, as mentioned earlier.
This has generated a cacophony leading to an exasperating confusion in the field,
including the elusiveness of new concepts. In all, AmI defies a concise analytical
definition, although one can often point to examples of application domains that
entail specific technological dimensions. However, while most definitions tend to
capture key shared characteristics of AmI as a new computing paradigm (or a
metaphor depicting a vision of a next wave in ICT), a generic typology can still be
useful in understanding this paradigm. A typology can better facilitate an under-
standing of the AmI concept and philosophy (Gill and Cormican 2005).

2.8 Typologies for AmI

A generic typology for AmI can improve its definition and reduce or remove the
ambiguity surrounding what constitutes it and thereby assist in the development of
AmI systems. While typologies are not panaceas, a generic one for AmI systems is
necessary, as it helps to define what AmI is and what it is not and assists the
designers and developers of AmI systems and applications by giving them a better
understanding of AmI as a new computing paradigm (Gill and Cormican 2005).
A typology commonly refers to the study and interpretation of types or a taxonomy
according to general type. It thus groups models or artifacts that describe different
aspects of the same or shared characteristics. There exist various approaches to AmI
typology, involving technological or human views or a combination of these and
supporting different characteristics pertaining to computational tasks and compe-
tencies depending on the application domain, among others. There exist many
theoretical models in the literature (e.g., Aarts and Marzano 2003; Hellenschmidt and
Kirste 2004; Riva et al. 2005; Gill and Cormican 2005) that look at technological
dimensions as to what enables or initiates an AmI system or take a combined view
of the characteristics of what an AmI system should involve, that is, what constitutes
and uniquely distinguishes AmI from other computing paradigms or technologies.
Based on the foundational tenets of AmI as a paradigm that builds upon
people-centered philosophy, Gill and Cormican (2005) propose an AmI system
typology based on a combined perspective—technological and human side of the
AmI—involving tasks and skills as two main areas that together define what an
AmI system should entail—what is and what is not an AmI system. As illustrated in
Fig. 2.1, the outer ring represents the tasks that the AmI system needs to recognize
and respond to, and the inner ring represents the skills that the AmI system should
encompass. The authors stated that the tasks: habits, needs, gestures, emotions,
and context are human-orientated, in that they represent the human characteristics
that the AmI must be aware of, whereas the skills: sensitive/responsive,
intuitive/adaptive, people-centered, and omnipresent, are technology-orientated, in
that they represent the technology characteristics that the AmI must have or
inherently accomplish as abilities to interact with the human actors. They also
mentioned that the link between the two areas is of an inseparable, interlinked, and
interdependent nature.
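By way of illustration, the two-ring typology can be rendered as a simple data structure that pairs the human-orientated tasks with the technology-orientated skills. The sketch below (in Python) is only a minimal reading of Gill and Cormican's model; the class name AmISystemTypology, its fields, and the covers check are illustrative assumptions, not part of the original typology.

from dataclasses import dataclass

@dataclass
class AmISystemTypology:
    # Outer ring: human-orientated tasks the system must recognize and respond to.
    tasks: frozenset = frozenset({"habits", "needs", "gestures", "emotions", "context"})
    # Inner ring: technology-orientated skills the system must possess.
    skills: frozenset = frozenset({"sensitive/responsive", "intuitive/adaptive",
                                   "people-centered", "omnipresent"})

    def covers(self, observed_tasks, offered_skills):
        # Under this reading, a candidate system counts as AmI only if it
        # addresses every task and exhibits every skill in the typology.
        return self.tasks <= set(observed_tasks) and self.skills <= set(offered_skills)

For example, a system that senses context and emotions but offers no people-centered skill would fail the covers check, which is one way a typology can mark what is and what is not an AmI system.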
To elaborate further on the link between the tasks and skills, the AmI system
needs to take care of needs, be sensitive to users, anticipate and respond intelli-
gently to spoken or gestured indications of desire, react to explicit spoken and
gestured commands, support the social processes of humans and be competent
agents in social interactions, engage in intelligent dialog or mingle socially with
human users, and elicit pleasant user experiences and positive emotions in users.
AmI thus involves supporting different kinds of needs associated with living, work,
social, and healthcare environments. These needs differ as to the necessity level—
i.e., either they improve the quality of people’s lives or sustain human lives. For
AmI technology to be able to interact with the human actor—what it must innately
accomplish as its aptitudes—and thus provide efficient services in support of the
user, it has to be equipped with such human-like computational capabilities as
context awareness functionality (see Chap. 3), natural interaction and intelligent
behavior (see Chap. 6), emotional and social intelligence (see Chap. 8), and cog-
nitive supporting behavior (see Chap. 9). These computational competencies enable
AmI systems to provide adaptive, responsive, and anticipatory services.
Responsiveness, adaptation, and anticipation (see Chap. 6 for a detailed account
and discussion and Chaps. 8 and 9 for application examples) are based either on
pre-programed heuristics or real-time learning and reasoning capabilities. However,
according to Gill and Cormican (2005, p. 6) for an AmI system to be
sensitive/responsive, it ‘needs to be tactful and sympathetic in relation to the
feelings of the human actor, has to react quickly, strongly, or favorably to the
various situations it encounters. In particular, it needs to respond and be sensitive to
a suggestion or proposal. As such, it needs to be responsive, receptive, aware,
perceptive, insightful, precise, delicate, and most importantly finely tuned to the
requirements of the human actor and quick to respond’. For AmI to be adaptive, it
‘needs to be able to adapt to the human actor directly and instinctively. This should
be accomplished without being discovered or consciously perceived therefore it
needs to be accomplished instinctively i.e., able to be adjusted for use in different
conditions. The characteristics it is required to show are spontaneity, sensitivity,
discerning, insightful and at times shrewd’ (Ibid). And for AmI to be anticipatory
and proactive, it needs to predict the human actor’s needs and desires and pre-act in
a way that is articulated as desirable and appropriate and without conscious
mediation. It is required to think on its own, make decisions based on predictions or
expectations about the future, and act autonomously so the human actor does not
have to work to use it—the AmI system frees people from manual control of the
environment. As such, it needs to be predictive, aware, knowledgeable, experi-
enced, and adaptively curious and confident. This characteristic is, according to
Schmidhuber (1991), important to decrease the mismatch between anticipated states
and states actually experienced in the future. He introduces the concept of curiosity
for intelligent agents as a measure of the mismatch between expectations and future
experienced reality.
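One minimal, hedged way to read this mismatch measure is as a prediction-error score over context features: the system compares the state it anticipated with the state it actually observes and uses the gap to decide whether its model of the user needs revising before it pre-acts again. The sketch below is an illustrative rendering of that reading, not Schmidhuber's formulation; the function names, the squared-error measure, and the threshold are assumptions.

def curiosity_signal(anticipated, experienced):
    # Mean squared mismatch between anticipated and experienced feature vectors.
    pairs = list(zip(anticipated, experienced))
    return sum((a - e) ** 2 for a, e in pairs) / len(pairs)

def should_revise_model(anticipated, experienced, threshold=0.1):
    # A large mismatch suggests the anticipatory model is stale, so the
    # system should temper proactive behavior and update its expectations.
    return curiosity_signal(anticipated, experienced) > threshold

# Example with normalized context features (hypothetical values):
print(should_revise_model([0.2, 0.8, 0.5], [0.6, 0.7, 0.1]))  # True: revise the model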
Considering the sprouting nature of the AmI paradigm, any proposed typology for
AmI normally results from and builds on previous, ongoing, and/or future (theoretical
and empirical) research in the area of AmI, thereby evolving continuously with the
purpose of improving definitions and reducing the ambiguity around what consti-
tutes AmI. Indeed, since the inception of AmI, a number of typologies have been,
and continue to be, developed, revised, refined, restructured, expanded, or adapted
to reflect various renditions pertaining to the amalgamation of computational tasks
and competencies—how they have been, and are being, combined in relation to
various application domains (e.g., ambient assisted living, smart home environ-
ment, workspace, healthcare environment, social environment, etc.) as to what they
entail in terms of the underlying technologies used for the implementation of AmI
systems (e.g., capture technologies, data processing methods, pattern recognition
techniques, modeling and reasoning approaches, etc.) and in terms of the nature of
intelligent services to be provided. Therefore, typologies constantly evolve as new
research results transpire and knowledge advances. This process will continue as
AmI evolves as a computing paradigm and becomes more established and popular as
an academic discourse.
However, the existing literature on AmI remains heavy on speculation and weak
on empirical evidence and theory building—extant typologies, frameworks, and
models have poor explanatory power, and the applications and systems that have
been developed in recent years are far from real-world implementation, i.e.,
generally evaluated and instantiated in laboratory settings. This concerns more the
vision of ‘human-centric computing’, as most of the many concepts that have
already been tested out as prototypes in field trials relate more to the vision of
UbiComp. Hence, thorough empirical and theorizing endeavor is necessary for AmI
as both a new computing paradigm and a vision of a next wave in ICT to have
strong academic buy-in and practical relevance in relation to the future form of the
kind of technological development in the information society. At present, the
growth of academic interest in AmI as a ‘paradigmatic shift in computing and
society’ (Punie 2003) is such that it is becoming part of mainstream debate in the
technological social sciences in Europe.

2.9 Paradigmatic, Non-paradigmatic, Pre-paradigmatic, and Post-paradigmatic Dimensions of AmI

For what it entails as a metaphor depicting a future vision of technology, AmI
involves aspects, or represents an instance, of both a new computing paradigm as
well as a vision of a next wave in ICT (or a new paradigm in ICT of a loose profile
nature) with societal implications. This is because AmI characterization involves
merging two major trends: (1) ubiquitous computing and communication, distrib-
uting computation in everyday human environment or integration of micropro-
cessors and networked sensors and actuators in everyday objects, and (2) social and
human-centric user interfaces as a novel approach to HCI, which entails a trans-
formation of the role of ICT in society and eventually of how people live and work.
Issues relating to AmI as a paradigmatic shift in computing are also discussed here
given their relevance. Before delving into the discussion of AmI as a new com-
puting paradigm and a paradigmatic shift in computing, it may be useful to first
look at some key concepts that make up this discussion, namely ‘ICT’, ‘comput-
ing’, ‘paradigm’, and ‘paradigm shift’.

2.9.1 ICT and Computing

Short for information and communication technology, ICT is an umbrella
term that describes a set of technologies used to access, create, store, retrieve,
disseminate, exchange, manage, and transmit information in a digital format. ICT
involves computing systems (e.g., laptops, wearable computers, smart mobile
phones, augmented-reality devices, Internet network, telecommunication systems,
sensors and actuators, etc.) and the associated innumerable software applications
and services. ICT applications span over a myriad of domains and are integrated in
almost all sectors of society. It is often spoken of based on the context of use, e.g.,
living, smart homes, learning, healthcare, energy efficiency, and so on. ICT is
commonly synonymous with information technology (IT), the engineering field that
deals with the use of information and communication systems to handle information
and aid its transmission by a microelectronics-based combination of computing,
networking, and telecommunications, as well as with the knowledge and skills
needed to use such systems securely and intelligently within a wide spectrum of
situations of use. The Information Technology Association of America (ITAA)
defines IT as ‘the study, design, development, implementation, support or man-
agement of computer-based information systems, particularly software applications
and computer hardware’ (Veneri 1998, p. 3).
ICT has been used interchangeably with computing, but there is a distinction
between the two concepts, in that computing theory is concerned with the way
computer systems and software programs are created and function, and ICT theory
deals with the application of ICT in and its effects on society. Generally, computing
can be defined as: ‘any goal-oriented activity requiring, benefiting from, or creating
computers. Thus, computing includes designing and building hardware and soft-
ware systems for a wide range of purposes; processing, structuring, and managing
various kinds of information; doing scientific studies using computers; making
computer systems behave intelligently; creating and using communications and
entertainment media; finding and gathering information relevant to any particular
purpose, and so on. The list is virtually endless, and the possibilities are vast’
(ACM, AIS and IEEE-CS 2005, p. 9).

2.9.2 Paradigm and Paradigm Shift

According to Kuhn (1962, 1996), a paradigm denotes the explanatory power and
thus universality of a theoretical model and its broader institutional implications for
the structure, organization, and practice of science. A theoretical model is a theory or
a group of related theories designed to provide explanations within a scientific
domain or subdomain for a community of practitioners—in other words, a scientific
discipline- or subfield-shared cognitive or intellectual framework encompassing the
basic assumptions, ways of reasoning, and approaches or methodologies that are
universally acknowledged by a scientific community. A comprehensive theoretical
model involves a conceptual foundation for the domain; understands and describes
problems within the domain and specifies their solutions; is grounded in prior
empirical findings and scientific literature; is able to predict outcomes in situations
where these outcomes can occur far in the future; guides the specification of a priori
postulations and hypotheses; uses rigorous methodologies to investigate them; and
provides a framework for interpretation and understanding of unexpected outcomes
or results of scientific investigations. Kuhn’s notion of paradigm is based on the
existence of an agreed upon set of concepts for a scientific domain, and this set forms
or constitutes the shared knowledge and specialized language of a discipline (e.g.,
computer science) or sub-discipline (e.g., artificial intelligence, software engineer-
ing). This notion of paradigm: an all-encompassing set of assumptions resulting in
the organization of scientific theories and practices, involves searching for invariant
dominant paradigm governing scientific research. And ‘successive transition from
one paradigm to another via revolution is the usual developmental pattern of mature
science’ (Kuhn 1962, p. 12). This is what Kuhn (1962) dubbed ‘paradigm shifts’.
A paradigm shift is, according to him, a change in the basic assumptions, thought
patterns or ways of reasoning, within the ruling theory of science—in other words, a
radical and irreversible scientific revolution from a dominant scientific way of
looking at the world. This applies to computing, as I will try to exemplify below. In
accordance with Kuhn’s (1962) conception, a paradigm shift in computing should
meet three conditions or encompass three criteria: it must be grounded in a
meta-theory, be accepted by practitioners of a scientific community, and have a body
of successful practice. This is the case for AmI with regard to its UbiComp strand.

2.9.3 Computing Paradigm and AmI as an Instance of a New Computing Paradigm

Like all scientific paradigms, the computing paradigm is based on the existence of a
widely agreed upon set of concepts and theories, a theoretical model, based on
computer science, computer engineering, IT, information systems, and software
engineering. These five sub-disciplines constitute the field of computing (ACM,
AIS and IEEE-CS 2005). As subdomains of scientific research, they have many
overlaps among them in their theories, methodologies, and practices as they form
the domain of computing. The focus here is on computer science and engineering
given their synergy as well as their greater relevance to the topic of paradigm.
Computer science is concerned with the study of the theoretical foundations of
information (e.g., structures, representation) and computation (e.g., mechanisms,
algorithms) and the practical techniques and methods for their implementation in
the designed computer systems. Computer scientists deal with the systematic study
and creation of algorithmic processes that describe, create, and transform infor-
mation and formulate abstractions (or conceptualizations) to model and design
complex systems (Denning et al. 1989; Wegner 1976). Integrating several fields of
computer science and electrical engineering (IEEE and ACM 2004), computer
engineering is concerned with the study, development, and application of computer
systems and applications, hardware and software aspects of computing, such as
designing chips, sensors, actuators, information processing units, operating sys-
tems, and other hardware components and devices and software mechanisms and
processes.
Broadly, research in computing entails two key dimensions: the first is based on
broad types of design science and natural science research activities: build, evaluate,
theorize, and justify, and the second is based on broad types of design research
produced outputs: representational constructs, models, methods, and instantiations
(see March and Smith 1995 for an overview). Design is at the core of computing. As
a scientific paradigm, design science entails an agreed upon set of principles, rules,
methods, and activities used to construct technological artifacts to achieve certain
goals—intended uses. Design science has its roots in engineering and other applied
sciences, which are important for technology development. There is a large body of
work (e.g., Venable 2006; March and Smith 1995; Cross 2001) on meta-theory, a
theory about computing theories, pertaining to engineering science and design sci-
ence, which has engendered several theorems in relation to the field of computing.
Indeed, theory and theorizing are important ingredients in the evolution and practice
of computing as a field of research and development. Like in other scientific para-
digms, theory in computing is a primary output and theorizing plays a central role in
the advancement of engineering, design, and modeling of computing systems. The
foundational tenets and practice of computing paradigm—conceptual and theoretical
model and practical knowledge—are based on hard sciences, such as natural science
and formal science which involve methodological rigor and legitimacy. ‘Natural
science is concerned with explaining how and why things are… Natural scientists
develop sets of concepts, or specialized language, with which to characterize phe-
nomena. These are used in higher order constructions—laws, models, and theories—
that make claims about the nature of reality. Theories—deep, principled explana-
tions of phenomena—are the crowning achievements of natural science research.
Products of natural science research are evaluated against norms of truth, or
explanatory power. Claims must be consistent with observed facts, the ability to
predict future observations being a mark of explanatory success. Progress is
achieved as new theories provide deeper, more encompassing, and more accurate
explanations’ (March and Smith 1995, p. 253). Formal sciences, which are con-
cerned with formal systems, such as logic, mathematics, statistics, theoretical
computer science, information theory, game theory, systems theory, decision theory,
and portions of linguistics, aid the natural sciences by providing information about
the structures the latter use to describe and explain the world, and what inferences
may be made about them. The characteristics of hard science include: pro-
ducing testable predictions; performing controlled experiments; relying on quanti-
fiable data and mathematical models; a high degree of accuracy and objectivity; and
generally applying a purer form of the scientific method (Wilson 2012; Lemons
1996; Rose 1997; Diamond 1987).
In light of Kuhn’s notion of scientific paradigm, entailing UbiComp as one of its
two main constituting paradigms, AmI represents a third computing paradigm (as
opposed to keeping computation bottled in a desktop-bound personal computer
(PC) and sharing mainframes by lots of people). AmI paradigm, the age of calm
technology, posits that computing technology recedes or vanishes into the back-
ground of everyday life (e.g., Weiser 1991). This paradigm has also been referred to
as invisible computing and disappearing computing. In AmI, many distributed
computing devices are hidden in the environment and come to be invisible
to common consciousness. The increasing, continuous process of miniaturization of
mechatronic systems, devices, and components, thanks to micro-engineering, is
increasingly making this computing paradigm deployable, resulting in processors
and tiny sensors and actuators being integrated into more and more everyday
objects, leading to the physical disappearance of computing technology into the
environment. This rapidly evolving development exemplifies a ‘successive transi-
tion from one [computing] paradigm to another via [technological] revolution’
(Kuhn 1962), which represents a developmental pattern of computing as a mature
science. This implies that the new theoretical model pertaining to computing
embodies an explanatory power, which in turn has institutional implications for the
structure and organization of computing as a scientific discipline. AmI represents an
instance of this new computing paradigm with regard to the new ways of designing,
developing, and building computing devices and systems; structuring, representing,
processing, and managing various kinds of information associated with context
awareness, natural interaction, and intelligence functionalities; making computing
devices and systems behave autonomously and equipping them with affective and
conversational capabilities; creating and using advanced (based on presence tech-
nology) computer-mediated human–human and human–agent communications; and
handling and managing media; and so on. Gunnarsdóttir and Arribas-Ayllon (2012)
found that the AmI paradigm even has the generative and performative power to
harness not only technological, but also ‘social-psychological, cultural, political and
moral imaginations into a collective quest for novel reconfigurations of
human-world relationships’, a feature which relates to AmI as paradigmatic shift in
computing.

2.9.4 AmI as a Paradigmatic Shift in Computing

Following Kuhn’s conception of paradigm shift—the element of a drastic break in
intellectual and thus political practice, AmI assumes a paradigmatic shift in com-
puting—in terms of UbiComp as a key constituent of AmI. With that in mind,
UbiComp did herald a paradigm break with the desktop paradigm, shifting
from computation bottled in desktop-bound PC to computation distributed in the
environment. Weiser (1991) positioned UbiComp as embodied reality, where
computers are integrated in the real-world, as opposed to virtual reality, putting
human users in computer-generated environments. He wrote: ‘The most profound
technologies are those that disappear. They weave themselves into the fabric of
everyday life until they are indistinguishable from it… This is not just a “user
interface” problem… Such machines cannot truly make computing an integral,
invisible part of the way people live their lives. Therefore we are trying to conceive
a new way of thinking about computers in the world, one that takes into account the
natural human environment and allows the computers themselves to vanish into the
background. Such a disappearance is a fundamental consequence not of technology,
but of human psychology’. Referring to AmI as a paradigmatic shift in computing
(and society), Miles et al. (2002, pp. 4–9) state: ‘It is probably one occasion where
the overused phrase “paradigm shift” is appropriate because it implies a radical shift
in such dimensions as the users of the technology, its incorporation into different
spheres of living and working, the skills required, the applications and content
provided, the scale and nature of the markets and the players involved’. However,
the vision of AmI assumes many shifts, including ‘in computing systems from
mainframe computing (1960–1980) over personal computing (1980–1990) and
multiple computing devices per person… (2000 onwards) to invisible computing
(2010 onwards)’, ‘in communication processes from people talking to people over
people interacting with machines to machines/devices/software agents talking to
each other and interacting with people’; ‘in using computers as a tool to computers
performing tasks without human intervention’; ‘a decoupling of technological
artifact and its functionality/use to multi-purpose devices/services’; ‘in accessibility
and networking from on/off over many access points to always on, anywhere,
anytime’ (Punie 2003, p. 12). This paradigm shift ‘has the objective to make
communication and computer systems simple, collaborative and immanent.
Interacting with the environment where they work and live, people will naturally
and intuitively select and use technology according to their own needs’ (Riva et al.
2003, p. 64).
More to Kuhn’s (1996) conception of paradigm shift, AmI stemming from
UbiComp is accepted by a community of practitioners and has a body of successful
practice. As mentioned earlier, there is a strong institutional and governmental
support for and commitment to AmI—industry associations, scholarly and scientific
research community, and policy and politics. The research and innovation within
AmI are active across Europe at the levels of technology farsightedness, science and
technology policy, research and technology development, and design of next
generation technologies (see Punie 2003; Wright 2005). They pertain predomi-
nantly to the areas of microelectronics (miniaturization of mechatronic systems,
devices, and components), embedded systems, and distributed computing. In par-
ticular, the trends toward AmI are noticeably driving research and development into
ever smaller sizes of computing devices. AmI is about smart dust with networked
miniature sensors and actuators and micro-electro-mechanical systems (MEMS)
incorporating smart micro-sensors and actuators with microprocessors and several
other components so small to be virtually indiscernible or invisible. The minia-
turization trend is increasingly enabling the development of various types and
formats of sensing and computing devices that allow registering and processing
various human parameters (information about people) in an unobtrusive way, without
disturbing users or actors (see Chap. 4 for more detail on miniaturization trends and
related issues).
In the very near future, both the physical and human world will be overwhelmed
by or strewn with huge quantities of tiny devices (e.g., active and passive RFID
tags), entrenched into everyday objects and attached to people, for the purpose of
their identification, traceability, and monitoring. Today, RFID tags are attached to
many objects and are expected to be embedded in virtually all kinds of everyday
objects, with the advancement of the Internet of Things. In recent years, efforts have
been directed towards designing remote devices and simple isolated appliances—
that might be acceptable to the users and consumers of AmI technology, which
‘prepares the ground for a complete infiltration of our environment with even more
intelligent and interconnected devices. People should become familiar with AmI;
slowly and unspectacularly; getting used to handing over the initiative to artificial
devices. There is much sensing infrastructure already installed for handling secu-
rity… What remains to be done is to shift the domain of the intended monitoring
just enough to feed the ongoing process of people getting used to these controls and
forgetting the embarrassment of being permanently monitored, in other words—
having no off-switch’ (Crutzen 2005, p. 220). At present, the environment of
humans, the public and the private, is pervaded by huge quantities of active devices
of various types and forms, computerized enough to automate day-to-day decisions
and thus act autonomously on behalf of human–agents. However, the extensive
incorporation of computer technology into people’s everyday lives and thus the
inevitable employment of artificial intelligent agents to automate day-to-day deci-
sions involve repercussions that are difficult to foresee. In fact, the question to be
raised is whether people really want to live in a world permeated with computer
devices that take on their routine decision-making activities.

2.9.5 Non-paradigmatic Aspects of AmI

AmI has been concerned with people-centered practice of technological develop-
ment. This implies that AmI is (claimed to be) about technologies that are fully
designed for and adapted to people (human cognition, behavior, and needs)—i.e.,
based on new insights in the way people like to interact with such technologies and
their applications, smart environments can be created which improve the quality of
their life. If the people are the principal actors in the AmI paradigm, the relevant
socio-technological reality must be only of the people’s own construction.
Following this reasoning, how can there be a general AmI theory, let alone a
paradigm? There can only be a scattered archipelago of local socio-technological
perspectives pertaining to the incorporation of computer technology into people’s
everyday lives and environments and how this can bring them a better life—in other
words, how the promises made by AmI concerning the transformation of the role of
ICT in society can transform the way people live and work. In addition to this
argument, AmI travels under many aliases—context-aware computing, situated
computing, sentient computing, wearable computing, invisible computing, calm
computing, pervasive computing, disappearing computing, affective computing,
and so forth. Such scattering or dispersion of computing trends does not provide the
conditions for, or facilitate, the generation of a coherent body of theory. In many
cases, computing sources do not refer in any systematic way to one another, but
keep on generating alternative labels with some of them even from the ground up,
in the process reinventing the wheel or starting from scratch without zeroing in on
generating ‘expert opinion’. There are still further reasons why the notion of a
paradigm (shift) may not apply to AmI in relation to society. One key consideration
is that the elements of the AmI paradigm are contradictory. While AmI technologies
should be designed for and adapted to people, the people who are to live in AmI
and the IoT are not asked for their views as part of the design and innovation
process. Another consideration is that AmI concerns normative values and, thus, is
concerned with various policy frameworks, rather than explanatory and
meta-theoretical frameworks. It is more a vision of the future information society—
and, to add, promoted by certain ICT companies, institutions, and policymakers for
particular ends—than a reality. By virtue of its very definition, it is normative,
signifying a certain desired view on the socio-technological world, and also serves
political-economic purposes. Overall, AmI is not necessarily anti-theoretical but it
is intellectually fragmented. The work of several AmI authors can be contextualized
in terms of their institutional belonging, scholarly affiliation, social location, cul-
tural inclination, ideological commitment, and socio-political status. In particular,
institutional dimension entails that there are clear political advantages to a break
with existing societal paradigm—which is not fully technologized, thereby AmI
finding strong institutional (and governmental) support.

2.9.6 Pre-paradigmatic and Post-paradigmatic Aspects of AmI

Like all paradigms in (technological) social science, AmI being post-paradigmatic
or, at least, non-paradigmatic—in relation to society—has to do obviously with not
being grounded on a solid, meta-theoretical base that transcends contingent human
actions—i.e., it lacks a theoretical model with an explanatory power and universal
nature (and as taken to assume a paradigmatic shift in society, it does not dem-
onstrate a drastic break in intellectual and thus political practice).
AmI is pre-paradigmatic because there is no scholarly consensus available in
social sciences and humanities (and other human-directed sciences) upon which it is
based. Human-directed sciences (see below for elucidation) involve volatile theo-
ries, pluralism of theoretical models, and a plethora of unsolved issues. Adding to
this is the generally understood extraordinary complexity of social sciences (and
humanities), as they involve social and political processes which are reflexive in
nature (see Bourdieu and Wacquant’s (1992) notion of reflexive sociology), i.e.,
social actors act upon theories themselves, which are hence adapted in action (see
Bourdieu’s (1988) analyses of social science in action). This is most likely to carry
over its effects to the implementation of knowledge about cognitive, emotional,
social, and behavioral processes of humans into AmI systems and thus their
behavior. But the AmI vision continues to be performed to elucidate the role of
paradigm-making to communicate complex problems and address multiple issues
pertaining to how people would want what they want. In addition, as a new
approach to HCI, AmI integrates a range of human-directed disciplines and
sub-disciplines, including cognitive science, cognitive psychology, cognitive neu-
roscience, social sciences (e.g., anthropology, sociology, etc.), human verbal and
nonverbal communication, linguistics, media and cultural studies, and philosophy,
but to name a few. However, through identifying limitations, complications, and
new possibilities, disciplinary (and sub-disciplinary) synergies further complicate
the AmI vision (Gunnarsdóttir and Arribas-Ayllon 2012).
AmI is post-paradigmatic because the conditions of inquiry within the field
reflect and acknowledge the gaps, risks, limits, and discontinuities that the
so-called AmI paradigm fails to notice. Gunnarsdóttir and
Arribas-Ayllon (2012, p. 16) point out, ‘[a] striking feature of the AmI narrative is
continuous modulation of promises… But we also identify highly reflexive prac-
tices of anticipating possibilities, limitations and dangers, with which the future
horizon is modified and adjusted. One is the unique strategy of deliberately com-
plicating the expectations [as ‘an innovation practice, subjecting AmI developments
to an ever-growing number of disciplines and methodological approaches which
require continuous experimentation, monitoring and reporting’] by aggregating
disciplines to carefully explore the subtleties of ordinary reasoning, communication
and interaction in everyday situations. Another strategy is the world-making that
situates AmI in a social economy and a culture undergoing radical changes [i.e.,
‘accounting for contingencies is a rhetorical strategy creating worlds in which AmI
visions and technologies seek alignment with socio-economic and cultural imagi-
nations, and respond to changes in the global environment)’]. The third is to ear-
nestly engage in the contemplation of futures to be avoided.’ In line with this
thinking, José et al. (2010, p. 1480) argue that the inspiring vision of AmI ‘should
no longer be the main driver for AmI research’ and it is necessary to re-interpret its
role; it is time for the AmI field to move beyond its foundational vision and thus
rethink its currently prevailing assumptions, claims, and approaches, by embracing
important emerging trends, among other things. Regardless, even new trends are
essentially subject to future interrogations—predicated on the assumption of the
perennial changing nature of the configuration of scientific and social knowledge.
All in all, in current usage, AmI paradigm (in society or in ICT) can be used in a
loose sense of an ‘intellectual framework’, similar to discourse, and not in Kuhn’s
specific meaning of an explanatory and meta-theoretical framework. Here discourse
refers to a specific, coherent set of concepts, ideas, terminologies, claims,
assumptions, visions, categorizations, and stories that are constructed, recon-
structed, transformed, and challenged in a particular set of social practices—in other
words, that are socially specific and historically contingent and that generate
(discursive) truth effects, e.g., meaning and relevance is given to social realities.

2.10 Technological Factors Behind the AmI Vision

The main goal of AmI is to make computing technology everywhere, simple to use
and intuitive to interact with, and accessible to people with minimal technical
knowledge. The AmI vision is evolving towards an achievable and deployable
computing paradigm, thanks to the recent advances in embedded systems, micro-
electronics, wireless communication networks, multimodal user interfaces, and
intelligent agents. These enabling technologies are expected to evolve even more.
They are a key prerequisite for realizing the AmI vision, especially in terms of its
UbiComp vision. This is about the technology necessary for turning it into reality,
making it happen. AmI systems are increasingly maturing and proliferating across a
range of application domains.
Embedded systems constitute one of the components for ambience in AmI. AmI
is characteristically embedded: many networked devices are integrated into the
environment. The recent advances in embedded systems have brought significant
improvements. Modern embedded systems, which are dedicated to handle a par-
ticular task, are based on microcontrollers (i.e., processors with integrated memory
and peripheral interfaces). An embedded system is a computer system with a
dedicated task, often with reactive computing—hardware and software systems are
subject to a real-time computing constraint, e.g., operational deadlines from event to
system response (e.g., Ben-Ari 1990), and is embedded as part of a complete device
often including electrical and mechanical parts—within a larger mechanical or
electrical system. Further, there are different approaches to processors (e.g., general
purpose, specialized, custom designed, etc.). Embedded systems differ in size and
cost, reliability, performance, and complexity, depending on the type of the tasks
they are dedicated to handle. As a common application today, many devices can be
controlled by embedded systems.
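To make the notion of a reactive, deadline-bound embedded task more tangible, the sketch below simulates a dedicated control loop that must respond to an event within a fixed operational deadline. It is a schematic illustration only; the deadline value, the event source, and the handler are hypothetical, and a real embedded controller would typically run on a microcontroller under a real-time operating system rather than in Python.

import time

DEADLINE_MS = 50  # hypothetical operational deadline from event to response

def poll_event():
    # Placeholder for reading a sensor register or an interrupt flag.
    return {"type": "temperature_high"}

def respond(event):
    # Placeholder for driving an actuator, e.g., switching a fan on.
    pass

def control_loop(iterations=3):
    for _ in range(iterations):
        start = time.monotonic()
        respond(poll_event())
        elapsed_ms = (time.monotonic() - start) * 1000
        # A real-time constraint means exceeding the deadline is a failure,
        # not merely degraded performance.
        assert elapsed_ms <= DEADLINE_MS, "missed real-time deadline"

control_loop()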
The progress of microelectronics has altered the nature of computing devices.
Advances in electronic components (computing power and storage capacity
increasing every 18–24 months at fixed costs) have significantly affected many aspects
of computing capabilities, including processing power, computational speed,
memory, energy optimization, performance, efficiency, and so on. This has made it
possible to entrench computing devices in everyday objects, a trend which is
rapidly evolving. In particular, miniaturization has been a key factor for incorpo-
rating multiple smart sensors and microprocessors in everyday objects. There is
already a huge amount of invisible computing devices embedded in laptops, mobile
phones, wearable computers, and various types of appliances. Sensors are
increasingly being manufactured on a microscopic scale, and with AmI this trend
will continue to grow exponentially. Computing devices are increasingly equipped
with quantum-based processing capacity and linked by mammoth bandwidth
wireless networks with limitless connectivity, ushering in the era of the always-on,
interconnected computing resources. This also relates to the IoT: the inter-
connection of uniquely identifiable embedded devices, physical and virtual objects,
and smart objects, using embedded systems, intelligent entities, and communication
and sensing-actuation capabilities to interact with each other and with the envi-
ronment via the Internet.
Recent advances in wireless and mobile networking technologies have drastically
improved the capacity (mega-bandwidth), speed, energy efficiency, availability, and
proliferation of communication networks. Three decades of development in these
technologies have enabled the massively distributed, embedded computing
devices characteristic of AmI to become networked or connected.
HCI has evolved over the last four decades, from an explicit timely bidirectional
interaction between the human user and the computer system to a more implicit
multidirectional interaction. The shift from explicit means of human inputs to more
implicit forms of inputs implies supporting natural human forms of communication
and thus natural interaction. In desktop applications, graphical user interfaces
(GUIs) as commonly used approaches are built on event based interaction, a direct
dialog which occurs as a sequence of communication events between the user and
the system (Schmidt 2005). This explicit HCI approach works through a user
conforming to static devices (e.g., keyboard, mouse, touch screen, and visual dis-
play unit) using them in a predefined way. Various types of explicit user interface
can be distinguished, including batch interfaces, command line interfaces, graphical
user interfaces (GUIs), Web user interfaces (WUI), natural-language interfaces,
touch screen, and zooming user interfaces (see Chap. 6 for more detail). Common
to all explicit user interfaces is that the user explicitly requests an action from the
computer, the action is carried out by the computer, and then the system responds
with an appropriate reply. In AmI computing, on the other hand, the user and the
system are in an implicit interaction where the system is aware of the context in
which it operates or is being used and responds or adapts its behavior to the
respective context. This relates to iHCI: ‘the interaction of a human with the
environment and with artifacts’ as a process which entails that ‘the system acquires
implicit input from the user and may present implicit output to the user’ (Schmidt
2005, p. 164). Hence, iHCI involves a number of the so-called naturalistic user
interfaces, including facial user interfaces, gesture user interfaces, voice interfaces,
motion tracking interfaces, eye-based interfaces, and so on.
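The contrast between explicit, event-based interaction and iHCI can be sketched schematically: in the explicit case the user issues a command and the system replies; in the implicit case the system derives input from sensed context and adapts without being asked. The fragment below is an illustrative sketch under assumed names (handle_click, adapt_lighting, sensed_context); it is not drawn from Schmidt's work or any particular toolkit.

# Explicit HCI: the user requests an action, the system carries it out and replies.
def handle_click(command):
    if command == "lights_on":
        return "lights switched on"
    return "unknown command"

# Implicit HCI (iHCI): the system infers intent from sensed context and adapts.
def adapt_lighting(sensed_context):
    # sensed_context is assumed to carry implicit inputs gathered by sensors.
    if sensed_context["presence"] and sensed_context["ambient_lux"] < 50:
        return "lights dimmed up gradually"  # implicit output, no explicit request
    return "no adaptation needed"

print(handle_click("lights_on"))
print(adapt_lighting({"presence": True, "ambient_lux": 20}))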
The intelligent agent as a paradigm became widely recognized during the 1990s
(Russell and Norvig 2003; Luger and Stubblefield 2004), a period that marked the
emergence of UbiComp vision. In computing, the term ‘intelligent agent’ may be
used to describe a software agent that has some intelligence, a certain degree of
autonomy, ability to react to the environment, and goal-oriented behavior. There are
many different types of agents (see Chap. 6), but common to all of them is that they
act autonomously on behalf of users—decide and execute tasks on their own
autonomy and authority. Intelligent agents represent one of the most promising
technologies in AmI—intelligent user interfaces—because they are associated with
computational capabilities such as adaptation, responsiveness, and anticipation
relating to service delivery. Accordingly, capture technologies, pattern recognition
techniques, ontological and hybrid modeling and reasoning techniques, and actu-
ators have attracted increasing attention as AmI computing infrastructures and
wireless communication networks become financially affordable and technically
matured.
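A common way to picture such an agent is as a perceive-decide-act cycle running with a degree of autonomy on the user's behalf. The sketch below is a generic, minimal agent loop of that kind, assuming hypothetical perceive, decide, and act functions; it is not a description of any specific AmI agent architecture.

def perceive(environment):
    # Gather observations, e.g., from context sensors (hypothetical).
    return {"user_present": environment.get("user_present", False)}

def decide(percept, goal="maintain_comfort"):
    # Goal-oriented choice of action based on the current percept.
    if goal == "maintain_comfort" and percept["user_present"]:
        return "adjust_temperature"
    return "idle"

def act(action, environment):
    # Execute the chosen action autonomously, without user intervention.
    environment["last_action"] = action
    return environment

env = {"user_present": True}
for _ in range(2):            # two cycles of the agent loop
    env = act(decide(perceive(env)), env)
print(env["last_action"])     # adjust_temperature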
In all, intelligent environments in which AmI can exist, involving the
home, work, learning, and social settings, are increasingly becoming computa-
tionally augmented: equipped with smart miniature sensors and actuators and
information processing systems. These intelligent environments will be common-
place in the very near future. This can be explained by the dramatic reduction in the
cost and the advancement of computing, networking, and communication tech-
nologies, which have indeed laid the foundations for the vision of AmI to become
an achievable computing paradigm. In sum, it can be said that AmI is primarily
based on technological progress in the aforementioned fields. The required research
components in which significant progress has to be made in order to further develop
and realize the AmI vision include: in terms of ambient components, MEMS and
sensor technology, embedded systems, ubiquitous communications, input and
output device technology, adaptive software, and smart materials, and in terms of
intelligence component, contextual awareness, natural interaction, computational
intelligence, media handling and management, and emotional computing (ISTAG
2003).

2.11 Research Topics in AmI

2.11.1 Computer Science, Artificial Intelligence, and Networking

As a result of the continuous effort to realize and deploy AmI paradigm, which
continues to unfold due to the advance and prevalence of multi-sensory, minia-
turized devices, smart computing devices, and advanced wireless communication
networks, all AmI areas are under vigorous investigation in the creation of smart
environments, ranging from low-level data collection (i.e., sensing, signal pro-
cessing, fusion), to intermediate-level information processing (i.e., recognition,
interpretation, reasoning), to high-level application and service delivery (i.e.,
adaptation and actions), to networking and middleware infrastructures. As a mul-
tidisciplinary paradigm and a ‘crossover approach’, AmI is strongly linked to a lot
of topics related to computer science, artificial intelligence, and networking.
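The layered flow just described, from low-level sensing through intermediate interpretation to high-level service delivery, can be summarized as a simple processing pipeline. The sketch below is a schematic, assumed decomposition (the stage names and the rule in interpret are illustrative), not an account of any deployed AmI stack.

def sense():
    # Low level: raw signal acquisition and fusion from distributed sensors.
    return {"motion": 0.9, "sound_db": 62, "hour": 23}

def interpret(signals):
    # Intermediate level: recognition, interpretation, and reasoning over signals.
    if signals["motion"] > 0.5 and signals["hour"] >= 22:
        return "user_active_late_night"
    return "no_notable_situation"

def deliver(situation):
    # High level: adaptation and actions, i.e., the actual service.
    if situation == "user_active_late_night":
        return "dim lights and lower media volume"
    return "do nothing"

print(deliver(interpret(sense())))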
In terms of computer science, artificial intelligence, and networking, topics
include, and are not limited to: context-aware, situated, affective, haptic, sentient,
wearable, invisible, calm, smart, mobile, distributed, and location computing;
embedded systems; knowledge-based and perceptual user interfaces; micropro-
cessors and information processing units; machine learning and reasoning tech-
niques; ontological modeling and reasoning techniques; real-time operation
systems; multi-agent software; human-centered software engineering; sensor sys-
tems and networks; MEMS and NEMS; multimodal communication protocols;
wireless and mobile communication networks; smart materials for multi-application
smart cards; embodied conversational agents; and so forth (Punie 2003; Bettini
et al. 2010; Schmidt 2005; Oulasvirta and Salovaara 2004; Chen and Nugent 2009;
Picard 2000; Senders 2009; Lyshevski 2001; Vilhjálmsson 2009).
To create AmI environments requires collaboration between scholars and experts
from several research areas of AmI, which can be clustered into: ubiquitous
communication and networking, context awareness, intelligence, and natural HCI.
The first area involves fixed, wireless, mobile, and ad-hoc networking systems,
discovery mechanisms, software architectures, system integration, and mobile
devices. The second area encompasses sensors, smart devices, and software
architectures for multi-platform interfaces, as well as capture, tracking, positioning,
monitoring, mining, and aggregation techniques. The third area includes pattern
recognition algorithms, ontological modeling and reasoning, and autonomous
intelligent decision making. The last area involves multimodal interaction, hyper-
media interfaces, and agent-based interfaces. These areas have some overlaps
among them.

2.11.2 Middleware Infrastructure

In addition to the above is the research area of middleware architecture (e.g.,
Azodolmolky et al. 2005; Strimpakou et al. 2006; Soldatos et al. 2007). It is
important to highlight the key role of middleware in AmI. (This topic is beyond the
scope of this book.) Indeed, advances in middleware research are critically
important, as middleware represents the logical glue: it connects several kinds of
distributed components, in the midst of a variety of heterogeneous hardware sys-
tems and software applications needed for realizing smart environments and their
proper functioning. Put differently, in order for the massively embedded, distrib-
uted, networked devices and systems, which are invisibly integrated into the
environment, to coordinate, middleware components, architectures, and services
are required. Middleware allows multiple processes running on various sensors, devi-
ces, computers, and networks to link up and interact to support daily activities
wherever needed. It is the coordination and cooperation between heterogeneous
devices, their ability to communicate seamlessly across disparate networks, rather
than their widespread presence that create AmI environments. These are highly
distributed, heterogeneous, and complex, involving myriad computing devices
whose numbers are set to continuously increase by orders of magnitude and which
are to be exploited in their full range to transparently provide services on a
hard-to-imagine scale, regardless of time and place. AmI infrastructures are highly
dynamic, while featuring a high degree of heterogeneity (e.g., Johanson et al. 2002;
Garlan et al. 2002), and middleware boosts interoperability, integration, coopera-
tion, and dynamicity (e.g., sensors join and leave the AmI infrastructure in a
dynamic fashion) necessary to support highly heterogeneous and distributed com-
ponents (e.g., agents) and scalable systems.
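One small way to picture this coordinating role is a registry through which heterogeneous components join and leave dynamically and exchange events without knowing about each other directly. The publish/subscribe sketch below is a toy illustration of that idea, with assumed names (Middleware, join, leave, publish); real AmI middleware involves far richer discovery, context management, and interoperability services.

from collections import defaultdict

class Middleware:
    """Toy publish/subscribe broker linking heterogeneous components."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # topic -> list of callbacks
        self.components = set()

    def join(self, name):
        # Components (sensors, agents, devices) join dynamically...
        self.components.add(name)

    def leave(self, name):
        # ...and may leave just as dynamically.
        self.components.discard(name)

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, payload):
        # Deliver the event to every interested component, whatever its type.
        for callback in self.subscribers[topic]:
            callback(payload)

bus = Middleware()
bus.join("presence_sensor")
bus.subscribe("presence", lambda p: print("heating agent reacts to", p))
bus.publish("presence", {"room": "kitchen", "occupied": True})
bus.leave("presence_sensor")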
Middleware components are intended to provide information on people and
objects—to identify them and their behavior, activities, actions, and locations in the
scope of multi-sensor indoor and outdoor infrastructures. Therefore, middleware is
crucial for context representation, interpretation, and management. The amalgam-
ation of sensing technologies, ubiquitous computing, and distributed middleware
aims at creating a new generation of pervasive or AmI services. Distributed pro-
cessing is empowered by middleware components for transfer of signals from
various sources and for realizing information fusion from multiple perceptive
components (Azodolmolky et al. 2005). Moreover, middleware can be used to
support and deploy data-centric distributed systems, such as network-monitoring
systems, sensor networks, the dynamic Web whose ubiquitous presence creates
very large application networks that spread over large geographical areas. It is
increasingly evident that intensive processing, the massive data dissemination, and
intelligent fusion in order to build dynamic knowledge bases are becoming
achievable, owing to recent advances and innovative solutions for operating
efficiencies, easing application and networking development, enhancing data
management, and boosting interoperability between applications. Therefore, sup-
porting AmI systems and applications necessitates a wide range of middleware
components, especially components for context awareness, since it relies on gathering
a huge amount of implicit contextual information from distributed sensors.
Building middleware infrastructures of such magnitude, multi-layering, and
complexity requires enormous research endeavor in design and engineering.
Middleware is one of the main technical and engineering challenges, as AmI
requires complex middleware components and architectures. There is a need to
develop new middleware technologies for adaptive, reliable, and scalable handling
of high-volume dynamic information flows for coping with the complexity of the
unprecedented extensity and velocity of information flow, constantly changing
underlying network connectivity, dynamic system organization, high sensitivity and
real-time processing of data, and massive volatile and unpredictable bursts of data
at geographically dispersed locations.

2.12 Human-Directed Sciences and Artificial Intelligence in AmI: Disciplines, Fields, Relationships, and Contributions

Directed at humans, AmI is moreover strongly linked to a number of fields and
subfields related to human-directed sciences. These include, but are not limited to:
cognitive psychology, cognitive science, cognitive neuroscience, human commu-
nication, linguistics, philosophy, sociology, and anthropology; a brief account of
these disciplines is provided below. Especially, the class of AmI applications on
focus in this book exhibits human-like understanding and intelligent supporting
behavior in relation to cognitive, emotional, social, and conversational processes
and behaviors of humans. The human-directed sciences are in AmI associated with
modeling in terms of incorporating related knowledge into AmI systems to enhance
their computational understanding and thus inform and guide their behavior, with
design in terms of how AmI systems should be constructed to better suit implicit
and natural forms of interaction with human users, and with, more broadly, HCI,
which is highly interdisciplinary: it studies humans and computers in conjunction,
and thus integrates a range of academic human-directed disciplines (see Chap. 5 for
more detail).

2.12.1 Cognitive Psychology

Psychology is the scientific study of the processes and behavior of the human brain.
Cognitive psychology is one of the recent psychological approaches and additions
to psychological research. It is thus the subfield of psychology that studies internal
mental information-manipulation processes and internal structures and
representations used in cognition between stimulus and response (e.g., Galotti
2004; Passer and Smith 2006). The core focus of cognitive psychology is on how
humans process information. Mental processes are the brain activities that handle
information when sensing and perceiving objects and situations, storing informa-
tion, solving problems, making decisions, learning, processing language, reasoning,
and so forth. The school of thought derived from the cognitive approach is known
as cognitivism, which is a theoretical framework for understanding the mind.
Cognitivists argue that thinking is so essential to psychology that the study of
thinking should become its own field (Lilienfeld et al. 2009). The cognitive
approach has achieved a phenomenal success, which is manifested in its current
dominance as the core model in contemporary psychology (e.g., Frijda 1986;
Cornelius 1996; Scherer et al. 2001; Ortony et al. 1988; Russell 2003; Galotti 2004;
Passer and Smith 2006). The information processing view is supported by many
years of research. Additionally, cognitive psychology has fueled a generation
of productive research, yielding deep and fertile insights into many aspects of
cognition. Major research areas in cognitive psychology include: sensation (e.g.,
sensory modalities, sensory memory); perception (e.g., attention, pattern rec-
ognition); categorization (e.g., categorical judgment and classification, category
representation and structure); memory (e.g., emotion and memory, working
memory, short-term memory, long-term memory, semantic memory); knowledge
representation (e.g., mental imagery, propositional encoding); language (e.g., gram-
mar, phonetics, language acquisition, language understanding and production);
thinking (e.g., decision making, formal and natural reasoning, problem solving);
emotion (e.g., cognitive appraisal processing, neuro-physiological arousal); but to
name a few. There are numerous practical applications for cognitive psychology
research, including ways to improve memory, how to stimulate creativity, how to
enhance decision-making accuracy, how to facilitate problem solving, how to
enhance learning, and so forth. Recently, cognitive psychology has started to focus
on the study of the relationship between cognition and emotion, as perception grew
among cognitive psychologists that cognition is impossible without emotion.
Emotion studies have contributed to ‘ground cognitive psychology—which has had
a penchant for the abstract—in the real-world, uncovering important science
behind’ how people make decisions in all walks of life (Lehrer 2007). Most of the
above research areas are of interest to cognitive science research based on complex
representations and computational processes.

2.12.2 Cognitive Science

Cognitive science is concerned with the interdisciplinary scientific study of cog-
nition, intelligence, or mind as information processors. It thus draws on a number of
research disciplines (analytical fields), embracing cognitive psychology, computer
science, cognitive neuroscience, neurophysiology, linguistics, cognitive and
cultural anthropology, philosophy (especially the philosophy of mind and lan-
guage), communication, and so on. The shared concern is the quest for under-
standing the nature of the mind. Cognitive science investigates how information is
sensed, perceived, represented, processed, stored, and transformed in the human
brain or computer systems. It involves researchers from several fields exploring
new areas of mind and developing theories based on human and computational
complex representations and processes. Some cognitive scientists limit their study
to human cognition, while others consider cognition independently of its imple-
mentation in human or computers: ‘cognition, be it real or abstract, human or
machine’ (Norman 1981, p. 1). Given its interdisciplinary nature, cognitive science
espouses a wide variety of scientific research methodologies, among which include
behavioral experiments, brain imagery, and neurobiological methods, in addition to
computational modeling or simulation. While cognitive science encompasses a
wide range of subject areas on cognition, it does not deal equally with every subject
area that might be relevant to the functioning of the human mind or intelligence.
Among the topics, which normally cover a wide range of intelligent behaviors,
include, but are not limited to, knowledge representation, knowledge and pro-
cessing of language, learning, memory, formal reasoning, perception and action,
and artificial intelligence.

2.12.3 Artificial Intelligence (AI)

AI is the branch of computer science (defined above) that is concerned with
understanding the nature of human intelligence (e.g., cognitive intelligence, emo-
tional intelligence, social intelligence, and conversational intelligence), and creating
computer systems capable of emulating intelligent behavior. Cognitive intelligence
as a general mental capability entails, among other things, the ability to think
abstractly, reason, comprehend complex ideas, learn from experience, plan, make
decisions, and solve problems. For what emotional, social, and conversational
intelligences entail in relation to AmI, see Chap. 1—introduction. AI also refers to
the modeling of intelligent cognitive and behavioral aspects of humans into
machines, such as learning, reasoning, problem solving, perception, planning,
creativity, language production, actuation, and so forth. John McCarthy, who
coined the term in 1956, defines AI as ‘the science and engineering of making
intelligent machines’ (McCarthy 2007). Another common definition of AI is the
study of intelligent agents, systems which perceive their environment and make
decisions and take actions that increase their chances of success (see, e.g., Russell
and Norvig 2003; Poole et al. 1998; Luger and Stubblefield 2004). In all, while
there are many definitions of AI in the literature, a common thread running through
all definitions is the study of cognitive phenomena or the simulation of human
intelligence into machines. Implementing aspects of human intelligence in com-
puter systems is one of the main practical goals of AI. In relation to AmI, to
simulate intelligence into computers, that is, to enable AmI systems to emulate
intelligent behavior, entails augmenting such systems with such capabilities as
sensation, perception (recognition and interpretation), reasoning, decision making,
actuation, and so on, as well as awareness of the cognitive, emotional, social, and
environmental dimensions of the user context, adding to responsiveness to task
commands transmitted through voice, facial expression, or gestures.
Research in AI is characterized by high specialization, deeply separated into
dedicated subfields that often fail to connect with each other (McCorduck 2004).
The lack of interdisciplinary and collaborative research endeavors is a major con-
cern in the field of AI. McCorduck (2004, p. 424) writes: ‘the rough shattering of
AI in subfields – vision, natural language, decision theory, genetic algorithms, robotics… – and these with their own sub-subfield – that would hardly have anything to say to each other’.
AI has become an essential part of the ICT industry, providing solutions for the
most complex problems encountered in computer science (Russell and Norvig
2003; Kurzweil 2005). In particular, AI systems have greatly improved over the last decade (Sanders 2009). AI is decisive in AmI research and practice. Computer
intelligence combines a wide range of advanced technologies, such as machine
learning, artificial neural networks, multisensory devices, data fusion techniques,
modeling techniques, context awareness, natural HCI, computer vision, intelligent
agents, and so forth.

2.12.4 Relationship Between Cognitive Psychology, Cognitive Science, and AI

Cognitive psychology, cognitive science, and AI involve the study of the phe-
nomenon of cognition or intelligence, with cognitive psychology focused on the
nature of cognition in humans, cognitive science in both humans and computers,
and AI particularly in machines and computers. Sharing the aim of understanding the nature and organizing principles of the mind, they address low-level perception mechanisms as well as high-level reasoning and what these entail, thereby spanning many levels of analysis. They all pride themselves on their scientific basis and experimental rigor. As contributors to the cognitive revolution, they
are built on the radical notion that it is possible to study, with scientific precision,
the actual processes of thought. Insofar as research methods are taken to be com-
putational in nature, AI has come to play a central role in cognitive science
(Rapaport 1996). And given its interdisciplinary nature, cognitive science espouses
a wide variety of methodologies, drawing on scientific research methods from
cognitive psychology, cognitive neuroscience, and computer science. Cognitive
science and AI use computer’s intelligence to understand how humans think.
Computers as tools are widely used to investigate various cognitive phenomena.
In AI, computational modeling makes use of simulation techniques to investigate
how human intelligence may be structured (Sun 2008). Testing computer programs
on what they can accomplish and how they accomplish it is said, in the field of AI, to
be doing cognitive science: using AI to understand the human mind. Cognitive
science also provides insights into how to present information to or structure
knowledge for human beings so they can use it most effectively in terms of pro-
cessing and manipulation. In addition, cognitive science employs cognitive para-
digms to understand how information processing systems such as computers can
simulate cognition or how the brain implements information-processing functions.
In relation to this, del Val (1999) suggests that in order for cognitive psychology to
be useful to AI, it needs to study common-sense knowledge and reasoning in
realistic settings and to focus on studying how people do well the things they do
well. Also, analyzing AI systems provides ‘a new understanding of both human
intelligence and other intelligences. However, it is difficult to study the mind with a
similar one—namely ours. We need a better mirror. As you will see, in artificial
intelligent systems we have this mirror’ (Fritz 1997). Moreover, both cognitive
scientists and cognitive psychologists long championed reason and therefore tended to reinforce the view that emotions interfere with cognition; they have now discovered, building on more than two decades of mounting work, that it is impossible to understand how we think without understanding how we experience emotions. This area of study has become a prime focus in AI—specifically affective computing—in recent years (addressed in the previous chapter).
Furthermore, core theoretical ideas of cognitive science, of which psychology is
the thematic heart, are drawn from AI; many cognitive scientists try to build
functioning models of how the mind works. AI is considered as one of the fields (in
addition to linguistics, neuroscience, philosophy, anthropology, and psychology)
that contributed to the birth of cognitive science (Miller 2003). Cognitive science
could be synonymous with AI when the mind is understood as something that can
be simulated through software and hardware—a computer scientist’s view (Boring
2003). AI and cognitive psychology are a unified endeavor, with AI focused on
cognitive science and ways of engineering intelligent entities. Cognitive psychol-
ogy evolved as one of the significant facets of the interdisciplinary subject of
cognitive science, which attempts to amalgamate a range of approaches in research
on the mind and mental processes (Sun 2008). Owing to the use of computational
metaphors and terminology, cognitive psychology has benefited greatly from the
flourishing of research in cognitive science and AI. One major contribution of
cognitive science and AI to cognitive psychology is the information processing
model of cognition. This is the dominant paradigm in the field of psychology,
which is a way of thinking and reasoning about mental processes, envisioning them
as software programs running on the human brain as if it were a computer. In this account,
humans are viewed as dynamic information processing systems whose mental
operations are described in computational terminology, e.g., inputs, structures,
representations, processes, and outputs, and metaphors, e.g., the mind functions as a
computer. The cognitive revolution was, from its inception, guided by the metaphor
that the mind is like a computer, and ‘cognitive psychologists were interested in the
software’ programs, and this ‘metaphor helped stimulate some crucial scientific
breakthroughs. It led to the birth of AI and helped make our inner life a subject
suitable for science’ (Lehrer 2007). ‘The notion that mental states and processes
intervene between stimuli and responses sometimes takes the form of a “compu-
tational” metaphor or analogy, which is often used as the identifying mark of
contemporary cognitive science: The mind is to the brain as software is to hard-
ware; mental states and processes are (like) computer programs implemented (in the
case of humans) in brain states and processes’ (Rapaport 1996, p. 2). All in all,
advances in AI, discoveries in cognitive science, and advanced understanding of
human cognition (information processing system) are, combined, generating a
whole set of fertile insights and new ideas that are increasingly altering the way we
think about how we think and how we should use this understanding to advance
technology towards the level of human functioning. One corollary of this is the
socio-technological phenomenon of AmI, especially the intelligent behavior of AmI
systems associated with facilitating and enhancing human cognitive intelligence,
thanks to cognitive context awareness and natural interaction.

2.12.5 Contributions of Cognitive Disciplines and Scientific Areas to AmI

One of the significant contributions of cognitive science and AI to computing is the creation and implementation of computer systems that are capable of emulating
human intelligent behavior. AmI technology represents an instance of this wave of
computing. In recent years, the evolution of cognitive science and the
advancement of AI have provided the ground for the vision of AmI to become a
reality, enabling AmI systems to evolve rapidly and spread across a whole range of
areas of applications. At present, tremendous opportunities reside in deploying and
implementing AmI systems of different scales, levels of intelligence, and degrees of distribution, thanks
to AI. To iterate, AI has become an essential part of the ICT industry, providing
solutions for the most difficult problems in computing (Russell and Norvig 2003;
Kurzweil 2005). AmI systems are increasingly performing well towards emulating
many aspects of human intelligence, by becoming highly intelligent entities due in
large part to the advance and prevalence of AI techniques. The engineering, design,
and modeling of such entities are made possible by simulating the human mind—as
complex mental information-manipulation processes. The cognitive science view of
humans as dynamic information processing systems whose mental operations are
described in computational terminology (e.g., sensory inputs, artificial neural net-
works, knowledge representation, reasoning mechanisms, outputs, etc.) has led to
simulating ‘broad areas of human cognition’ (Vera and Simon 1993)—i.e.,
implementing human cognitive models into computer systems, which has enabled
the vision of AmI to become deployable and achievable as a computing paradigm.
Examples of AI processes and models which emulate human cognition as an
information processing system, which have been utilized in AmI systems, include
sensing (inspired by human sensory receptors), artificial neural networks (inspired
by the structure of biological neural networks), reasoning/inference (inspired by the
cognitive ability to connect concepts and manipulate them mentally to generate
abstractions or descriptions), and perception and action (inspired by the ability of
biological actuators that perceive a stimulus and behave in response to it).
Human-made actuators are devices that receive signals or stimuli and respond with torque or force, while biological actuation is based upon electro-magnetic-mechanical-chemical processes and is accomplished through motor responses.
Computer system outputs can be classified into different types of actuators.
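As a simple illustration of this mapping from human-inspired processes to system components, the following minimal Python sketch wires a simulated sensor reading through a toy inference step to an actuator command. It is only an illustration of the idea; the sensor attributes, thresholds, and command names are hypothetical assumptions, not drawn from any particular AmI system described in this book.

```python
from dataclasses import dataclass

@dataclass
class SensorReading:
    """Simulated sensory input (inspired by human sensory receptors)."""
    ambient_light: float   # illuminance in lux
    occupancy: bool        # is someone present in the room?

def infer_situation(reading: SensorReading) -> str:
    """Toy reasoning/inference step: map raw readings to an abstract situation."""
    if reading.occupancy and reading.ambient_light < 50.0:
        return "occupied_dark"
    if reading.occupancy:
        return "occupied_bright"
    return "unoccupied"

def actuate(situation: str) -> str:
    """Toy actuation step: respond to the inferred situation (a 'motor response')."""
    commands = {
        "occupied_dark": "turn_lights_on",
        "occupied_bright": "do_nothing",
        "unoccupied": "turn_lights_off",
    }
    return commands[situation]

if __name__ == "__main__":
    reading = SensorReading(ambient_light=12.0, occupancy=True)
    situation = infer_situation(reading)
    print(situation, "->", actuate(situation))   # occupied_dark -> turn_lights_on
```

The point of the sketch is the sense–infer–act loop itself, not the particular rules, which a real system would replace with learned models and richer context.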
AmI systems can perform in a human-brain-like fashion and are even projected to
perform more powerfully than humans—in some instances. One of the goals of AI
is to develop complex computers that surpass human intelligence. Indeed, general
intelligence (known as strong or classical AI), which matches or exceeds human
intelligence, continues to be among the field’s long-term goals (Kurzweil 1999,
2005). While both AI and AmI face the challenges of achieving a human-level
understanding of the world, Leahu et al. (2008) claim this is the reason why AmI is
failing to scale from prototypes to realistic systems and environments. However,
next-generation AI is aimed at the construction of fully integrated artificial cognitive
systems that reach across the full spectrum of cognition, from low-level
perception/action to high-level reasoning. At the current stage of joint research
between AI and AmI, AmI systems seem to be able—in laboratory settings—to
emulate many aspects of cognitive intelligence as a property of the mind, encom-
passing such capacities as to learn from and leverage on human behavior, to adapt,
to anticipate, to perform complex inferences, to make decisions, to solve problems,
to perceive and produce language (e.g., speech acts with prosodic features and facial
gestures), and so on. This computational intelligence of AmI systems is being
extended to include abilities of facilitating and augmenting cognitive intelligence in
action, by understanding (a form of mindreading) the various cognitive dimensions of the user context and undertaking actions in a knowledgeable manner that supports
the user’s cognitive needs. One key aim of AI is to use the computational power of
computer systems to augment human intelligence in its various forms.
The complexity of AmI systems that results from their dynamic nature and the
need to provide a controllable environment for people constitutes a long-term
opportunity for the application of AI research. In order to realize the idea of AmI,
researchers must employ state-of-the-art AI techniques. As regards the integration of AI with AmI, with the aim of stimulating joint research among scholars working in the field of computer science, vigorous investigations are active on
diverse computing topics, including design of smart and miniaturized sensing and
computing devices, embedded and distributed computing, modeling formalism
languages, knowledge representation and reasoning, service management, intelli-
gent agent-based architectures, multi-agent software, real-time operating systems,
naturalistic and knowledge-based user interfaces, natural language processing,
speech and gesture recognition, computer vision, machine learning and reasoning,
complex decision making, multimodal communication protocols, and so on. These
topics currently constitute the focus areas within AI research.
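To give a concrete flavor of one of these topics, knowledge representation and reasoning, the following minimal forward-chaining sketch derives a higher-level activity and a service suggestion from simple context facts. The facts, rules, and labels are invented for illustration and are not taken from any system discussed in this book.

```python
# Context facts represented as (subject, attribute, value) triples.
facts = {("user", "location", "kitchen"), ("time", "period", "morning")}

rules = [
    # (set of antecedent facts, fact to add when all antecedents hold)
    ({("user", "location", "kitchen"), ("time", "period", "morning")},
     ("user", "activity", "preparing_breakfast")),
    ({("user", "activity", "preparing_breakfast")},
     ("system", "service", "show_recipe_suggestions")),
]

def forward_chain(facts, rules):
    """Repeatedly apply rules until no new facts can be derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for antecedents, consequent in rules:
            if antecedents <= derived and consequent not in derived:
                derived.add(consequent)
                changed = True
    return derived

if __name__ == "__main__":
    for fact in sorted(forward_chain(facts, rules)):
        print(fact)
```

Real AmI middleware would typically express such knowledge in ontologies and handle uncertainty, but the chaining principle is the same.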
Cognitive science is widely applied across several fields and has much to its
credit, owing to its widely acknowledged accomplishments beyond AI and AmI. It
has offered a wealth of knowledge to the field of computing and computer science,
especially foundational concepts and theoretical models which have proven to be
valuable and seminal in the design and modeling of computing systems—the way
they cognitively function and intelligently behave (e.g., social intelligence, emo-
tional intelligence, and conversational intelligence). Indeed, it is widely acknowl-
edged that it is the major strides cognitive science has made in the past two decades, coupled with recent discoveries in computing and advances in AI, that have led to the phenomenon of AmI, the birth of a new paradigm shift in computing and a
novel approach to HCI. In more detail, the amalgamation of recent discoveries in
cognitive science, which make it possible to acquire a better understanding of the cognitive information-processing aspects of the human mind, with the breakthroughs at the level of the enabling technologies and computational processes and capabilities (e.g., context awareness, natural interaction, and intelligent behavior) makes it
increasingly possible to build ground-breaking intelligent (human-inspired) systems
based on this understanding. This new development entails advanced knowledge in
human functioning as to cognitive, emotional, behavioral, and social aspects and
processes and how they interrelate, coupled with innovations pertaining to system
engineering, design, and modeling. Moreover, the evolving wave of research in
computing has given rise to, and continues to inspire, a whole range of new
computing trends, namely, hitherto, context-aware, affective, haptic, situated,
invisible, sentient, calm, and aesthetic computing. In particular, the interdisciplinary
research approach increasingly adopted in the field of computing is qualitatively
shaping research endeavors towards realizing the full potential of AmI as a com-
puting paradigm. This approach has generated a wealth of interactional knowledge
about the socio-technological phenomenon of AmI.
Cognitive science spans many levels of analysis pertaining to human mind and
artificial brain, from low-level sensation, perception, and action mechanisms to
high-level reasoning, inference, and decision making. This entails a range of brain
functional systems, including cognitive system, neural system, evaluation system,
decision system, motor system, monitor system, and so forth. One major research
challenge in AmI is to create context-aware computers that are able to adapt in
response to the human users’ cognitive states and processes, with the aim to
facilitate and enhance their cognitive intelligence abilities when performing tasks in
a variety of settings.

2.12.6 Neuroscience and Cognitive Neuroscience

Neuroscience is the interdisciplinary scientific study of the nervous system; it collaborates with such fields as computer science, AI, engineering, mathematics, linguistics, psychology, philosophy, and so on. Neuroscience has made major
strides in the past two decades with regard to advancing the understanding of
neurological patterns underlying affect, emotion, attention, and behavior.
Ontologies and knowledge from neurological disciplines are key components of
AmI applications—the structure of ambient software and hardware design.
Neurocognitive science is of particular relevance to presence technology covered in
Chap. 9.
Cognitive neuroscience is the interdisciplinary scientific study of higher cogni-
tive functions (e.g., object recognition, reasoning, language understanding, etc.) in
humans and their underlying neural substructures (bases), neural substrates of
mental processes as part of biological substrates. As an integrative field of study, it
draws mainly from cognitive science, cognitive psychology, neuroscience, and
computer science. It also has backgrounds in linguistics, philosophy, neurobiology,
neuropsychology, bioengineering, and so on. In investigating how cognitive
functions are generated by neural circuits in the brain, it relies upon theoretical
models in cognitive science and evidence from computational modeling and neu-
ropsychology. As its main goal is to understand the nature of cognitive functions
from a neural perspective, it entails two strands of research: behavioral strand, using
a combination of behavioral testing (experimental paradigm), and computational
strand, using theoretical computational modeling. In all, the concern of cognitive
neuroscience is to advance the understanding of the link between cognitive phe-
nomena and the underlying neural substrate of the brain.

2.12.7 Linguistics: Single and Interdisciplinary Subfields

Linguistics is the scientific study of natural language, the general and universal
properties of language. It covers the structure, sounds, meaning, and other
dimensions of language as a system. Linguistics encompasses a range of single and
interdisciplinary subfields. Single subfields include morphology, syntax, phonol-
ogy, phonetics, lexicon, semantics, and pragmatics; interdisciplinary subfields
include sociolinguistics, psycholinguistics, cognitive linguistics, and neurolinguis-
tics (see Chap. 6 for a detailed account). It collaborates with AI, cognitive science,
cognitive psychology, and neurocognitive science. Chapter 6 provides an overview
addressing the use of computational linguistics: structural linguistics, linguistic
production, and linguistic comprehension as well as psycholinguistics, neurolin-
guistics, and cognitive linguistics in relation to conversational agents and other AI
systems.

2.12.8 Human Communication

Human communication is the field of study that is concerned with how humans
communicate, involving all forms of verbal and nonverbal communication. As a
natural form of interaction, it is highly complex, manifold, and dynamic, making
humans the most powerful communicators on the planet. To communicate with
each other and convey and understand thoughts, feelings, messages, opinions, or
information, humans use a wide variety of verbal and nonverbal communicative
behaviors. As body movements, such behaviors are sometimes classified into
micro-movements (e.g., facial expressions, facial gestures, eye movement) and
macro-movements (e.g., hand gestures, body postures/corporal stances), in addition
to speech and its prosodic, paralinguistic, and extra-linguistic features. They have
been under vigorous investigation in the creation of AmI systems for context-aware
adaptive and responsive services, dialog acts, and explicit natural (touchless) interaction, as they can be utilized as both explicit and implicit inputs for interface
control and interaction.
The human-directed sciences or disciplines covered thus far have been at the
core of the study, design, development, and implementation of AmI systems. AmI
represents a class of applications that is characterized by human-like cognitive,
emotional, and behavioral (conversational and social) understanding, interacting,
and supporting behaviors as computational capabilities. All in all, the aim of AmI as
a novel approach to HCI is to create interaction between humans and systems that comes closer to natural and social interaction, by mimicking
the most pertinent aspects and processes of human functioning.

2.12.9 Philosophy

In this context, philosophy is concerned with general and fundamental questions and problems associated particularly with reality, values, and language (see
Teichmann and Evans 1999). Accordingly, reality is the conjectured state of
technological artifacts and environments—human-like or intelligent interactive
entities—as they in point of fact exist and will exist as well as some of their aspects
that are or might be imagined in the inspiring vision of AmI—aspects of limited or
no modern applicability with reference to intelligent interaction in both real and
cyber spaces. This also includes re-imagining and rebuilding expectations about the
potential and role that new ICT as smart artifacts and environments will have in
shaping the everyday of the future and the way people construct their lives, in
particular in relation to what the prevailing notion and assumption of intelligence in
the vision of AmI stands for or can possibly stand for. Especially, AmI scenarios are
constructed in ways that treat AmI as an ‘imagined concept’ (ISTAG 2003), and
thus represent visions of lifeworlds inhabited by potential human users who are
imagined. This pertains to what modern philosophers or thinkers refer to as
thoughts of things that are conceivable as coherent abstractions but not real. As to
values, AmI is associated with both human and ethical values in the sense that
technologies may pose risks to such values. Human values for which consideration is unlikely to be made explicit, or which may not be taken into account, in
the fundamental design choices that shape AmI technology can include hedonism
(pleasure and aesthetics) and other high-level values such as self-direction
(independent thought and action), creativity, ownership, freedom, togetherness, and
so on. Ethical values are associated predominantly with privacy, trust and confi-
dence, security (safety, harmony, and stability of self), and so forth. As philo-
sophical fields, ethics (which is concerned with the concepts of ‘right’ and ‘good’ in
relation to individual and social behavior) and aesthetics (which investigates the
concepts of ‘beauty’ and ‘pleasure’—see Chap. 9 for a detailed account) form the
field of axiology (e.g., von Hartmann 1908), which is the philosophical study of
values. As regards language, in the context of AmI, it pertains to the perceived
ability of AmI systems to mimic verbal and nonverbal human communication
behavior so as to become able to engage in intelligent dialog or mingle socially with
human users (see Chap. 7). Thus, philosophy of language in this context deals with
such fundamental problems as the nature and origin of meaning—what it means to
mean something and what underlies meaning, language use—understanding and
producing speech acts, and the relationship between language and social reality—
how it is used pragmatically and socioculturally in terms of situational and cultural
context. And the philosophical perspective in this book is of a critical and analytical
nature in the way of addressing the various problems in question.

2.12.10 Sociology and Anthropology (Social, Cultural, and Cognitive)

Sociology is the academic study of social behavior—i.e., behavior directed towards society, which in a sociological hierarchy is followed by social actions from people
and directed at other people. Social processes as forms of social interactions and
social relations come further along this ascending scale. It is concerned with such
aspects of social behavior as development, structure, institutions, and roots. As a
social science, it relates to AmI from the perspective of social change, social
processes, social interaction, social structure, and so on. Drawing on social sciences
and humanities, among others, anthropology is the scientific study of past and
present humans. It entails social anthropology and cultural anthropology which
emphasize, respectively, cross-cultural comparisons (e.g., relationships between the
traits of a few societies) and examination of social context, and cultural relativism
(e.g., others’ understanding of individuals’ beliefs and activities in terms of their
own culture) and holism (e.g., viewing properties of social systems as wholes, not
as sums or collections of parts), among others. As an approach within cultural
anthropology, cognitive anthropology is concerned with the ways in which people
perceive and think about aspects of the world, physical and social reality, seeking to
explain patterns of shared knowledge (e.g., scientific discourse), cultural innovation
(e.g., AmI, ICT, etc.), among others, using cognitive science methods and theo-
retical frameworks, coupled with insights from history, linguistics, ethnography,
hermeneutics, and so on. Cognitive anthropology serves as a link between the
material and ideational aspects of culture and human cognitive or thought processes
(D’Andrade 1995). Rooted in cultural relativism, it deals with the implicit
knowledge of people from different groups and how such knowledge changes the
way people perceive and connect with the world around them (ibid.). Both soci-
ology and anthropology are social sciences. Social science is the academic study of
society and the relationships among individuals that constitute part of society. In
AmI, a multidisciplinary team of sociologists, anthropologists, cognitive psychol-
ogists, philosophers, designers, engineers, and so forth is required ‘to represent
realistically the complexities and subtleties of daily human living’ (Hartson 2003).

References

Aarts E (2005) Ambient intelligence drives open innovation. ACM J Interact 12(4):66–68
Aarts E, Grotenhuis F (2009) Ambient intelligence 2.0: towards synergetic prosperity. In:
Tscheligi M, Ruyter B, Markopoulus P, Wichert R, Mirlacher T, Meschterjakov A,
Reitberger W (eds) Proceedings of the European Conference on Ambient Intelligence.
Springer, Salzburg, pp 1–13
Aarts E, Marzano S (2003) The new everyday: visions of ambient intelligence. 010 Publishers,
Rotterdam
Alahuhta P, Heinonen S (2003) A social and technological view of ambient intelligence in
everyday life: what bends the trend? Tech. Rep. Research report RTE 2223/03, VTT. Espoo
Aarts E, Harwig R, Schuurmans M (2002) Ambient intelligence. In: Denning P (ed) The invisible
future. The seamless integration of technology in everyday life. McGraw-Hill, New York,
pp 235–250
Azodolmolky S, Dimakis N, Mylonakis V, Souretis G, Soldatos J, Pnevmatikakis A,
Polymenakos L (2005) Middleware for in-door ambient intelligence: the polyomaton system.
In: Proceedings of the 2nd international conference on networking, next generation networking
middleware (NGNM 2005), Waterloo
Ben-Ari M (1990) Principles of concurrent and distributed programming. Prentice Hall Europe,
New Jersey
Bettini C, Brdiczka O, Henricksen K, Indulska J, Nicklas D, Ranganathan A, Riboni D (2010) A
survey of context modelling and reasoning techniques. J Pervasive Mob Comput Spec Issue
Context Model Reasoning Manage 6(2):161–180
Boring RL (2003) Cognitive science: at the crossroads of the computers and the mind. Assoc
Comput Mach 10(2):2
Bourdieu P (1988) Homo academicus. Stanford University Press, Stanford
Bourdieu P, Wacquant L (1992) An invitation to reflexive sociology. University of Chicago Press,
Chicago
Burgelman JC (2001) How social dynamics influence information society technology: lessons for
innovation policy. OECD, social science and innovation. OECD, Paris, pp 215–222
Chen L, Nugent C (2009) Ontology-based activity recognition in intelligent pervasive
environments. Int J Web Inf Syst 5(4):410–430
Cornelius R (1996) The science of emotions. Prentice Hall, Upper Saddle River
Criel J, Claeys L (2008) A transdisciplinary study design on context-aware applications and
environments, a critical view on user participation within calm computing. Observatorio
(OBS*) J 5:057–077
Cross N (2001) Designerly ways of knowing: design discipline versus design science. Des Issues
17(3):49–55
Crutzen CKM (2005) Intelligent ambience between heaven and hell. Inf Commun Ethics Soc 3
(4):219–232
D’Andrade RG (1995) The development of cognitive anthropology. Cambridge University Press,
Cambridge
del Val A (1999) How can psychology help artificial intelligence?. Interfaces da Psicologia,
University of Evora, Portugal
Denning PJ, Comer DE, Gries D, Mulder MC, Tucker A, Turner AJ, Young PR (1989) Computing
as a discipline. Commun ACM 32(1):9–23
Diamond J (1987) Soft sciences are often harder than hard sciences. Discover, pp. 34–39. http://
bama.ua.edu/~sprentic/607%20Diamond%201987.htm
Frijda NH (1986) The emotions. Cambridge University Press, Cambridge
Fritz W (1997) Intelligent systems and their societies, e-book. Buenos Aires, Argentina. http://
www.intelligent-systems.com.ar/intsyst/intsyst.htm
Galotti KM (2004) Cognitive psychology in and out of the laboratory. Wadsworth
Garlan D, Siewiorek D, Smailagic A, Steenkiste P (2002) Project aura: towards distraction-free
pervasive computing. IEEE Pervasive Comput 1(2):22–31
Gill SK, Cormican K (2005) Support ambient intelligence solutions for small to medium size
enterprises: typologies and taxonomies for developers. In: Proceedings of the 12th international
conference on concurrent enterprising, Milan, Italy, 26–28 June
Gunnarsdóttir K, Arribas-Ayllon M (2012) Ambient intelligence: a narrative in search of users.
Cesagen, Lancaster University and SOCSI, Cardiff University, Cardiff
Hartson R (2003) HomeLab as a force for ensuring usability. In: de Ruyter B (ed) 365 days’ ambient
intelligence research in HomeLab. Eindhoven, NL, (Royal Philips Electronics), pp 25–26
Hellenschmidt M, Kirste T (2004) A generic topology for ambient intelligence. In: Ambient
intelligence: second European symposium, EUSAI, Eindhoven, The Netherlands, 8–11 Nov,
pp 112–123
Horvath J (2002) Making friends with big brother? Telepolis, viewed 3 Oct 2005. http://www.
heise.de/tp/r4/artikel/12/12112/1.html
ISTAG (2001) In: Ducatel K, Bogdanowicz M, Scapolo F, Leijten J, Burgelman J-C (eds)
Scenarios for Ambient Intelligence in 2010. IPTS-ISTAG, EC, Luxembourg, viewed 22 Oct
2009. ftp://ftp.cordis.lu/pub/ist/docs/istagscenarios2010.pdf
ISTAG (2003) Ambient intelligence: from vision to reality (for participation—in society and
business), viewed 23 Oct 2009. http://www.ideo.co.uk/DTI/CatalIST/istag–ist2003_draft_
consolidated_report.pdf
ISTAG (2006) Shaping Europe’s future through ICT, viewed 22 Mar 2011. http://www.cordis.lu/
ist/istag.htm
Johanson B, Fox A, Winograd T (2002) The interactive workspaces project: experiences with
ubiquitous computing rooms. IEEE Pervasive Comput Mag 1(2):67–75
José R, Rodrigues H, Otero N (2010) Ambient intelligence: beyond the inspiring vision. J Univ
Comput Sci 16(12):1480–1499
Kuhn TS (1962) The structure of scientific revolutions. University of Chicago Press, Chicago
Kuhn TS (1996) The structure of scientific revolutions. University of Chicago Press, Chicago
Kurzweil R (1999) The age of spiritual machines. Penguin Books, New York
Kurzweil R (2005) The singularity is near. Penguin Books, New York
Leahu L, Sengers P, Mateas M (2008) Interactionist AI and the promise of ubicomp, or, how to put
your box in the world without putting the world in your box. In: Proceedings of the 10th
international conference on ubiquitous computing, ACM press, Seoul, Korea, pp 134–143
Lehrer JS (2007) Hearts and minds, viewed 20 June 2012. http://www.boston.com/news/
education/higher/articles/2007/04/29/hearts__minds/
Lemons J (1996) Scientific uncertainty and environmental problem solving. Blackwell Science,
Cambridge
Lilienfeld SO, Lynn SJ, Namy L, Woolf N (2009) Psychology: from inquiry to understanding.
Allyn & Bacon, Boston
Lindwer M, Marculescu D, Basten T, Zimmermann R, Marculescu R, Jung S, Cantatore E (2003)
Ambient intelligence vision and achievement: linking abstract ideas to real-world concepts.
Design, automation and test in Europe, p 10010
Luger G, Stubblefield W (2004) Artificial intelligence: structures and strategies for complex
problem solving. The Benjamin/Cummings Publishing Company, San Francisco
Lyshevski SE (2001) Nano- and microelectromechanical systems: fundamentals of nano- and
microengineering. CRC Press, Boca Raton
March ST, Smith GF (1995) Design and natural science research on information technology. Decis
Support Syst 15:251–266
McCarthy J (2007) What is artificial intelligence? Computer Science Department, Stanford
University, Stanford
McCorduck P (2004) Machines who think. AK Peters Ltd, Natick
Miles I, Flanagan K, Cox D (2002) Ubiquitous computing: toward understanding European
strengths and weaknesses. European Science and Technology Observatory Report for IPTS,
PREST, Manchester
Miller GA (2003) The cognitive revolution: a historical perspective. Trends Cogn Sci 7:141–144
Norman DA (1981) What is cognitive science? In: Norman DA (ed) Perspectives on cognitive
science. Ablex Publishing, Norwood, pp 1–11
Ortony A, Clore GL, Collins A (1988) The cognitive structure of emotions. Cambridge University
Press, Cambridge
Oulasvirta A, Salovaara A (2004) A cognitive meta-analysis of design approaches to interruptions
in intelligent environments. In: CHI 2004, late breaking results paper, Vienna, Austria, 24–29
Apr 2004, pp 1155–1158
Passer MW, Smith RE (2006) The science of mind and behavior. Mc Graw Hill, Boston
Picard R (2000) Perceptual user interfaces: affective perception. Commun ACM 43(3):50–51
Poole D, Mackworth A, Goebel R (1998) Computational intelligence: a logical approach. Oxford
University Press, New York
Poslad S (2009) Ubiquitous computing: smart devices, environments and interaction. Wiley,
Hoboken
Punie Y (2003) A social and technological view of ambient intelligence in everyday life: what
bends the trend? In: The European media and technology in everyday life network, 2000–2003,
Institute for Prospective Technological Studies Directorate General Joint Research Center
European Commission
Rapaport WJ (1996) Understanding understanding: semantics, computation, and cognition,
pre-printed as technical report 96–26. SUNY Buffalo Department of Computer Science,
Buffalo
Riva G, Loreti P, Lunghi M, Vatalaro F, Davide F (2003) Presence 2010: the emergence of
ambient intelligence. In: Riva G, Davide F, IJsselsteijn WA (eds) Being there: concepts, effects
and measurement of user presence in synthetic environments. IOS Press, Amsterdam
Riva G, Vatalaro F, Davide F, Alcañiz M (2005) Ambient intelligence: the evolution of
technology, communication and cognition towards the future of human-computer interaction.
IOS Press, Amsterdam
Rose S (1997) Lifelines: biology beyond determinism. Oxford University Press, Oxford
Russell JA (2003) Core affect and the psychological construction of emotion. Psychol Rev 1:145–
172
Russell S, Norvig P (2003) Artificial intelligence—a modern approach. Pearson Education, Upper
Saddle River
Sanders D (2009) Introducing AI into MEMS can lead us to brain-computer interfaces and
super-human intelligence. Assembly Autom 29(4):309–312
Scherer KR, Schorr A, Johnstone T (eds) (2001) Appraisal processes in emotion: theory, methods,
research. Oxford University Press, New York
Schmidhuber J (1991) Curious model building control systems. In: International joint conference
on artificial neural networks, IEEE, Singapore, pp 1458–1463
Schmidt A (2005) Interactive context-aware systems interacting with ambient intelligence. In:
Riva G, Vatalaro F, Davide F, Alcañiz M (eds) Ambient intelligence: the evolution of
technology, communication and cognition towards the future of human-computer interaction.
IOS Press, Amsterdam
Soldatos J, Dimakis N, Stamatis K, Polymenakos L (2007) A breadboard architecture for pervasive
context-aware services in smart spaces: middleware components and prototype applications.
Pers Ubiquit Comput 11(3):193–212
Strimpakou M, Roussak I, Pils C, Anagnostou M (2006) COMPACT: middleware for context
representation and management in pervasive computing. Pervasive Comput Commun 2
(3):229–245
Sun R (2008) The Cambridge handbook of computational psychology. Cambridge University
Press, New York
Teichmann J, Evans KC (1999) Philosophy: a beginner’s guide. Blackwell Publishing, Hoboken
The Joint Task Force for Computing Curricula 2005 ACM, AIS and IEEE-CS (2005) Computing
curricula 2005: the overview report covering undergraduate degree programs in computer
engineering, computer science, information systems, information technology, and software
engineering. A volume of the Computing Curricula Series, viewed 25 Sept 2010. http://www.
acm.org/education/curric_vols/CC2005-March06Final.pdf
The Joint Task Force for Computing Curricula IEEE Computer Society and Association for
Computing Machinery (2004) Computer engineering 2004: curriculum guidelines for
undergraduate degree programs in computer engineering. A Report in the Computing
Curricula Series
Venable J (2006) The role of theory and theorising in design science research. In: Hevner A,
Chatterjee S (eds) Proceedings of the 1st international conference on design science research in
information systems and technology
Veneri CM (1998) Here today, jobs of tomorrow: opportunities in information technology. Occup
Outlook Q 42(3):44–57
Vera AH, Simon HA (1993) Situated action: a symbolic interpretation. Cogn Sci 17(1):7–48
Vilhjálmsson HH (2009) Representing communicative function and behavior in multimodal
communication. In: Esposito A, Hussain A, Marinaro M, Martone R (eds) Multimodal signals:
cognitive and algorithmic issues. Springer, Berlin, pp 47–59
von Hartmann E (1908) Grundriss der Axiologie. Hermann Haacke, Leipzig
Wegner P (1976) Research paradigms in computer science. In: (IEEE) Proceedings of the 2nd
international conference on software engineering, San Francisco, California, 13–15 Oct,
pp 322–33
Weiser M (1991) The computer for the 21st century. Sci Am 265(3):94–104
Weiser M (1993) Some computer science issues in ubiquitous computing. Commun ACM 36
(7):75–84
Weiser M, Gold R, Brown JS (1999) The origins of ubiquitous computing research at PARC in the
late 1980s. IBM Syst J 38(4):396–693
Wilson TD (2012) Soft sciences don’t deserve the snobbery. The Los Angeles Times, California
Wright D (2005) The dark side of ambient intelligence. Forsight 7(6):33–51
York J, Pendharkar PC (2004) Human-computer interaction issues for mobile computing in a
variable work context. Int J Hum Comput Stud 60:771–797
Chapter 3
Context and Context Awareness of Humans and AmI Systems: Characteristics and Differences and Technological Challenges and Limitations

3.1 Introduction

An AmI environment is a context-aware system based on UbiComp, a computationally augmented everyday environment that enables people and devices (which function
invisibly and unobtrusively in the background) to interact naturally with each other
and with their own surroundings, and that is aware of people’s context and, thus,
adaptive, responsive, and anticipatory to their needs, thereby intelligently sup-
porting their everyday lives.
This introduction is intended to be more detailed because context and context
awareness are central issues to AmI in the sense that they are given a prominent role
in the notion of intelligence alluded to in the vision of AmI: the environment that
recognizes and intelligently reacts and pre-acts to people, responding to and
anticipating their desires and intentions. Accordingly, context awareness is asso-
ciated with most enabling technologies and computational processes and capabil-
ities underlying the functioning of AmI, which are the subject of subsequent
chapters (4–6). Thus, the intent is, in addition to introducing the topic of this
chapter, to briefly elucidate and highlight the connection between context aware-
ness and the other underlying components of AmI. Context awareness is a key
feature of AmI systems and environments—in other words, it is a prerequisite for
realizing the AmI vision. As a novel approach to HCI, AmI is heralding a new class
of systems called context-aware applications and thus new ways of interaction.
Context awareness promises a rich, smooth, and intuitive interaction between
human users and technology. The availability of contextual information and the use
of context offer new possibilities to adapt the behavior of interactive applications
and systems to the current situation, providing computing environments with the
ability to tailor services based on users’ needs and settings. Just like context affects
communicative intents and behaviors of humans (see Chap. 7) in human-to-human
communication, context shapes and fundamentally changes interactive applications
and systems. Indeed, context awareness has become an essential part of HCI in research, based on findings that every human interaction is contextual, situated, that
is, defined and influenced by—how humans perceive and evaluate in time—the
context of a situation. In other words, context has become of particular interest to
the HCI community, as the interaction with applications and their interfaces increasingly takes place in less well-structured environments.
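The idea of tailoring service behavior to the current situation can be made concrete with a small, hedged sketch. The context attributes and the adaptation policy below are hypothetical assumptions chosen only to illustrate the principle, not a description of any particular context-aware application.

```python
from dataclasses import dataclass

@dataclass
class UserContext:
    location: str        # e.g., "meeting_room", "home"
    activity: str        # e.g., "in_meeting", "relaxing"
    stress_level: float  # 0.0 (calm) .. 1.0 (stressed), e.g., from physiological sensing

def adapt_notification(context: UserContext, message: str) -> str:
    """Decide how to deliver a notification given the sensed user context."""
    if context.activity == "in_meeting":
        return f"defer: {message}"               # avoid interrupting the meeting
    if context.stress_level > 0.7:
        return f"summarize quietly: {message}"   # minimize cognitive load
    return f"deliver now: {message}"

if __name__ == "__main__":
    ctx = UserContext(location="meeting_room", activity="in_meeting", stress_level=0.4)
    print(adapt_notification(ctx, "New e-mail from project team"))
```

The adaptation rules are deliberately trivial; what matters is that the same explicit request (a notification) is handled differently depending on the implicit context.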
Context awareness technology has been of prime focus in AmI research.
Research on context awareness has been intensively active for over two decades in
academic circles as well as in the industry, spanning a range of computing fields,
including HCI, AmI, UbiComp, mobile computing, and AI (e.g., affective com-
puting, conversational agents). Indeed, recent years have witnessed a great interest in, and a proliferation of scholarly writings on, the topic of context awareness,
reflecting both the magnitude and diversity of research in the field of context-aware
computing. The body of research on the use of context awareness technology for
developing AmI applications that are flexible, adaptable, and possibly capable of
acting autonomously on behalf of users continues to flourish within a variety of
application domains. As research shows, it is becoming increasingly evident that
AmI environments, and hence context-aware applications, which can support
living, work, and social places, will be commonplace in the near future due to
recent developments in computer hardware, software, and networking technologies.
These encompass miniaturized sensors, sensor networks, pattern recognition/
machine learning techniques, ontological context modeling and reasoning tech-
niques, intelligent agents, wireless and mobile communication technology, mid-
dleware platforms, and so forth. Most of these technologies constitute the object of
subsequent chapters, whereby they are described, discussed, and put into per-
spective to provide an understanding of their role in the functioning of AmI
applications and environments. However, while there exist numerous technologies
for the development and implementation of context-aware applications, which
indicate that most research focuses on the development of technologies for context
awareness as well as the design of context-aware applications, there is a need,
within the field of context-aware computing, for conducting further studies with
regard to understanding how users perceive, use, and experience context-aware
interaction in different settings. In other words, the focus should be shifted from
technological to human and social dimensions of AmI. Context awareness poses
many issues and challenges that should be addressed and overcome in order to
realize the full potential of AmI vision.
AmI constitutes an approach to HCI that is built upon the concept of implicit
HCI (iHCI) (see Chap. 6). Creating an ambient intelligent human–computer
interface is based on the iHCI model, which takes the users’ context as implicit elements
into account. One key feature of this model is the use of natural human forms of
communication—based on verbal and nonverbal multimodal communication
behavior (see Chap. 7). These can be used by iHCI applications to acquire con-
textual information about the user—e.g., emotional, cognitive, and physiological
states and actions—so as to respond intelligently to the current user’s context. iHCI
applications also use and respond to other subsets of context associated with the
environment, such as places, locations, and physical conditions. In this chapter,
context awareness is primarily considered from the view point of HCI applications.
Given the scope of this book, the emphasis is on AmI applications showing
human-like understanding, interacting, and intelligent behavior in relation to cog-
nitive, emotional, social, and conversational processes of humans. Furthermore, to
establish implicit interaction, particularly context-aware functionality, various ele-
ments are required in order to collect, fuse, aggregate, process, propagate, interpret,
and reason about context information in support of users’ needs. These computa-
tional elements are addressed in the next chapter.
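A minimal sketch of such a chain of computational elements, collecting raw implicit inputs, fusing them, and interpreting them into a higher-level user state, might look as follows. All sources, labels, and thresholds are assumptions made purely for illustration; the actual elements are treated in the next chapter.

```python
def collect() -> dict:
    """Collect raw readings from (simulated) implicit input sources."""
    return {
        "heart_rate": 96,          # from a wearable sensor
        "speech_rate": 1.3,        # relative to the user's baseline
        "facial_expression": "frown",
    }

def fuse(raw: dict) -> float:
    """Fuse the readings into a single arousal score in [0, 1]."""
    score = 0.0
    score += 0.4 if raw["heart_rate"] > 90 else 0.0
    score += 0.3 if raw["speech_rate"] > 1.2 else 0.0
    score += 0.3 if raw["facial_expression"] == "frown" else 0.0
    return score

def interpret(arousal: float) -> str:
    """Map the fused score onto an abstract user state."""
    return "stressed" if arousal >= 0.6 else "calm"

if __name__ == "__main__":
    state = interpret(fuse(collect()))
    print("inferred user state:", state)   # -> stressed
```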
Furthermore, research shows that context awareness has proven to be a complex,
multilevel problem with regard to realization. First, what constitutes context and
context information, as a delimitation, has been no easy task, and this difficulty
overarches all research in context-aware computing. There are several conceptual
and technical definitions that have been suggested, generating a cacophony that has
led to an exasperating confusion in the field. Second, the current vision of user
centrality in AmI technology design has been questioned and continues to be a
subject of criticism. This is related to the issue of the disappearance of user interfaces associated with context-aware applications, in terms of who defines the context and
adaptation rules and other HCI issues. Third, detecting contextual data using the
state-of-the-art sensor technologies and how this affects reasoning processes in
terms of inferring high-level abstractions of contexts due to limited and imperfect
data seems to be an insurmountable issue in the field of AmI. Fourth, modeling
context, especially human factors related context (e.g., emotional state, cognitive
state, social state, etc.), has proven to be one of the most challenging tasks when it
comes to context representation and reasoning and the related knowledge domains
and adaptation rules. Of these issues, this chapter covers the first and second topics.
The two remaining issues are addressed in Chaps. 4 and 5, respectively.
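The third issue above, inferring high-level context from limited and imperfect data, can be illustrated with a very small Bayesian update over noisy evidence. The hypothesis, prior, and likelihood values are invented numbers; a real context-aware system would have to learn or model them from data.

```python
def posterior(prior: float, likelihood_given_true: float,
              likelihood_given_false: float) -> float:
    """P(context | evidence) by Bayes' rule for a binary context hypothesis."""
    evidence = (likelihood_given_true * prior
                + likelihood_given_false * (1.0 - prior))
    return likelihood_given_true * prior / evidence

if __name__ == "__main__":
    # Hypothesis: "the user is in a meeting"; two noisy, imperfect cues.
    p = 0.3                         # prior belief
    p = posterior(p, 0.9, 0.2)      # calendar entry says "meeting" (fairly reliable cue)
    p = posterior(p, 0.7, 0.4)      # accelerometer shows little movement (weak cue)
    print(f"belief the user is in a meeting: {p:.2f}")   # ~0.77, still uncertain
```

Even with both cues, the belief remains well below certainty, which is precisely why reasoning over imperfect sensor data is so difficult for higher-level context inference.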
Specifically, this chapter looks into the concept of context in relation to both
human interaction and HCI, espousing a transdisciplinary approach, and delves into
the technological and social dimensions of context awareness, focusing on key
aspects that are theoretically disputable and questionable in the realm of AmI and
pointing out key challenges, open issues, and limitations.

3.2 Context from a Transdisciplinary Perspective

Context and how it influences interaction is a multifaceted problem that cannot be resolved from the vantage point of a single discipline. An interdisciplinary approach also remains inadequate to tackle this issue, given that context is
inherently dynamic, fluid, subjective, subtle, and multifarious, and interaction is
accordingly situated, ad-hoc, manifold, and complex. Similarly, impacts of
context-aware applications in relation to HCI well exceed the interdisciplinary field,
as looking at context from a perspective that is inspired by research in situated
cognition and situated action (see Suchman 1987) or those theoretic disciplines
dedicated to the study of context (see Goodwin and Duranti 1992) may affect how
context and context awareness are conceptualized in cognitive science, AI, and AmI.
The situated nature and inherent complexity of interaction (as cognitive and social
process and behavior) make it very difficult to grasp context and context awareness,
in relation to human interaction. Human interaction, while systematic, is never
planned; instead, it is situated and ad-hoc: done for a particular purpose as necessary,
for its circumstances are never fully anticipated and continuously changing, to draw
on (Suchman 2005). It entails meaning, which is subjective and evolving in time
and hence open to re-interpretation/-assessment; this meaning influences the per-
ception of the context of a situation which defines and shapes human interaction.
Hence, a transdisciplinary approach remains the most relevant approach to look at
context as a complex problem, as it insists on the fusion of different elements of a set
of theories with a result that exceeds the simple sum of each. Aiming for transdis-
ciplinary insight, the present study of context draws on several theories, such as
situated cognition, situated action, social interaction, social behavior, human com-
munication, and so on. Understanding the tenets of several pertinent theories allows
a more complete understanding of context both in relation to human interaction and
to HCI. Among the most holistic, these theories are drawn from cognitive science,
social science, humanities, philosophy, constructivism, and constructionism. The
intent here is to set side-by-side elements of a relevant set of theories that have clear
implications for the concept under study—context.
Tackling the topic of context, fully grasping it, and clearly conceptualizing it are
all difficult tasks. The underlying assumption is that context touches upon the
elementary structures of interactions in the everyday life world. Human interaction
is a highly complex and manifold process due to the complexity inherent in that
which constitutes context that defines, shapes, and changes that interaction. This
complexity lies particularly in the interlocking and interdependent relationship
between diverse subsets of rather subjectively perceived contextual entities, not
only as to persons, but also to objects, events, situations, and places. So context is
more about meanings that are constructed in interaction with these entities than
about these entities. The constructivist worldview posits that human interaction is always contextually situated, and meaning to it is ascribed within this changing
context—i.e., evolving perceptions or reinterpretations of a situation. This is related
to the view that reality is one of intersubjective constructed meanings that are
defined in interaction with regard to the different entities involved in a given
situation, rather than a world that consists of epitomes or facts that epitomize
objects. The latter is related to the objectivistic worldview, where distinct objects have
properties independent of the observer, that is, the meaning of a phenomenon is
inherent to the phenomenon and can be experienced by interacting with it.
However, context is interwoven with the view on social and physical reality and the
ontological nature and structure of the life-world—how phenomena and things in
the reality are related and classified—with respect to social interaction. Social and
human sciences posit that cognitive, social, and cultural contexts are taken into
account for explaining social interactions and related processes, a perspective which
emphasizes contextualizing behavior when looking for explaining social behavior.
The complexity of human interaction, in particular context and meaning attri-
bution to it, has implications for the development of models, approaches, and
techniques and their implementation into AmI systems and environments as
intelligent entities. Therefore, in relation to AmI as a novel approach to HCI, an
advanced understanding of the relationship between context and interaction (see
next section) is crucial to designing well-informed and successful AmI systems and
environments that are able to emulate human-to-human interaction with regard to
such aspects as context awareness and natural interaction and the associated
intelligent behavior. AmI represents a smart interactive environment augmented
with context awareness functionality, a human-like interaction capability which is
necessary for such systems and environments to behave intelligently when inter-
acting with human users—that is, delivering smart services that better match uses’
emotional, cognitive, and social needs.

3.3 Context (and Context Awareness) in Human Interaction

Context is a fundamental aspect of everyday life. It is all around us: it defines, shapes, and changes the patterns underlying our interaction with each other and with
the environment and its artifacts. Humans ascribe meanings to interaction acts—
within the changing context. Specifically, meaning is influenced by context—how
context is perceived as an expression of a certain situation, so too is interaction in
this situation. Since all interactions entail meaning and meanings shape and influ-
ence how we interact—our interactive behavior—all interactions have a contextual
aspect. Indeed, meaning to interaction and, thus, meaningful interactive actions are
only made meaningful within the constitutive abstract space of a context.
Particularly, in social interaction, context conditions what we see, say and do, and
thus the way we think, understand, learn, and know, although we’re seldom
explicitly aware of it. Indeed, by their very nature, humans are exquisitely or deli-
cately attuned to their context. It is worth noting that it is the way people perceive
context that determines how they act, react, or pre-act—e.g., intelligently, sponta-
neously, intentionally, etc.—within a variety of, and based on different, situations. In
other words, context is about the meanings that are ascribed to its constituting
entities, which are constructed in interaction with such entities. In this sense, context
entails the mentally represented structures (e.g., schemata, representations, models,
and processes providing organization to learned experiences and facilitating learning
of novel experiences) and socioculturally shared constructs that are relevant for the
understanding of, and response to, interactive situations. Hence, it involves situation,
event, setting (place and time), physical conditions, ongoing actions, and roles of the
involved persons and their psychophysiological/emotional and biochemical states as
well as their mental and social representations: background, perspectives,
goals/intentions, knowledge, experiences, opinions, attitudes, values, ideologies,
and so on. In other words, context constitutes an infinite richness of assumptions and
factors, against which relevant facts and concerns are delimited in the form of
dynamic, collective interweaving of internal and external entities, including moti-
vational, emotional, cognitive, physiological, biochemical, pragmatic, empirical,
ethical, normative, intellectual, behavioral, relational, paralinguistic, extra-linguistic,
social, cultural, situational, physical, and spatiotemporal elements. Hence, context
can be described as a complex set, or the totality, of intertwined circumstances which
provide a setting for interaction. It is in terms of the setting formed by those cir-
cumstances that everything can be fully understood, evaluated, and eventually
reacted to. In all, contextual assumptions selected based on daily situations of life
enable us to delimit relevant facts and concerns that condition our judgments, claims,
and decisions against myriad other circumstances, and the overall context conditions
our perceptions and understandings of the social world—meaning, truth, relevance,
and rationality—and hence our notions of actions in it. Context is ‘the totality of
contextual assumptions and selections that give meaning and validity to any piece of
information; that is, context awareness is an ideal, and ideals usually resist complete
realization. This is why we need them: because they resist full realization, they give
us critical distance to what is real.’ (Ulrich 2008, p. 6)
However, in context-aware computing, it is important to look at everyday human
interactions and the way in which they get shaped and influenced by context, when
attempting to model and implement context awareness in computing systems. To
understand the relationship between human context and interaction, there is a need
to dig deeper in order to add to our current understanding of what constitutes context
and what underlies the selectivity of our contextual assumptions that condition—
define, surround, and continuously change—our (inter)actions. However, there is
a fundamental difference between human and nonhuman context awareness—
context awareness in computing. According to Ulrich (2008, p. 7), the crucial
difference between the two ‘can be expressed in various ways. In the terms of
practical philosophy…, human context includes the dimension of practical-
normative reasoning in addition to theoretical-empirical reasoning, but machines
can handle the latter only. In phenomenological terms, human context is not only a
“representational” problem (as machines can handle it) but also an “interactional”
problem, that is, an issue to be negotiated through human interaction…. In semiotic
terms, finally, context is a pragmatic rather than merely semantic notion, but
machines operate at a syntactic or at best approximated semantic level of under-
standing’. This conspicuous difference implies that the specifics of context in real
life are too selective, subjective, subtle, fluid, and difficult to identify, capture, and
represent in computationally formal models. This would subsequently make it
difficult for context-aware applications to make sensible estimations about the
meaning of what is happening in the surrounding situation or environment, e.g.,
what someone is feeling, thinking, or needing at a given moment, and to undertake
in a knowledgeable manner actions that improve our wellbeing or support our tasks.
Indeed, it always makes sense to question contextual assumptions that condition
our interaction, as the context needs to be selected, framed, negotiated, and
reconstructed, and thus is never given in the first place, and this goes much deeper
than how interactive computer systems understand us and our context and what
they decide for us, e.g., every human interaction involves a situational, physical,
psychological, social, and ethical/moral context.

3.4 Definitional Issues of Context and Their Implications for Context-Aware Computing

Typically, multiple definitions emerge when dealing with multifaceted concepts.
Context is an inherently complex and multifarious concept. At present, the number
of theoretical definitions of context is large. Notwithstanding the agreement on
many issues, there is still no definitive theoretical definition. In other words, context
has proven to be difficult to delineate theoretically: it remains an ill-defined concept. This ema-
nates from the complexity inherent in comprehending its characteristics—dynamic,
fluid, subtle, subjective, unstructured, changeable, volatile, indiscernible, intracta-
ble, and multidimensional—as well as how its entities or components interrelate
and coordinate dynamically to form an amalgam that shapes interaction. This
amalgam epitomizing what is to be selected as facts and values includes cognitive,
emotional, psychophysiological, biochemical, behavioral, situational, environ-
mental, spatiotemporal, social, cultural, normative, and/or historical aspects—a set
of intertwined contextual assumptions and selections.
Similarly, in context-aware computing, the term ‘context’ has been technically
used in multiple ways, with different meanings in different contexts—that is, it is so
malleable as to mean different things to different people. In other words, defining the
notion of context depends on the application domain and what this entails in terms of
the diversity and multiplicity of the features of context that can be incorporated in the
design and development of context-aware applications to achieve a particular
intended performance of the system. Features of context include the user’s mental
and physiological states, setting (place and time), location, situation, activity, social
environment, physical environment, and so forth. The way these features have been
selectively combined and incorporated in context-aware applications has led to a
great deal of confusion concerning the definition of context awareness in AmI, and
an alphabet soup of the so-called context-aware applications.
Within a specific application domain, the technical definition of context deter-
mines how context should be operationalized and thus measured, modeled, encoded,
processed, inferred, and responded to. Moreover, context-aware applications differ
as to the technical details concerning the use of sensor technologies, capture
approaches, recognition algorithms, modeling approaches (representation methods
and reasoning techniques), and query languages. It is worth pointing out that these
computational tools and processes are usually suitable for different applications. And
the suitability is contingent upon the nature of context being measured and assessed
and the features of the concrete applications. In many cases they can be used in
combination in order to yield optimal context recognition results (see Chaps. 4 and 5
for illustrative examples). Examples of context-aware applications include:
emotion-aware, cognitive task-aware, activity-aware, location-aware, event-aware,
situation-aware, conversational context-aware or affective context-aware systems.
Indeed, conversational and affective systems, a category which falls under AI
research, have recently started to focus on context, namely dialog, environmental,
and cultural context, and the contextual appropriateness of emotions and multimodal
context-aware affective interaction, respectively (see Chaps. 7 and 8).
Furthermore, from most of the context research thus far, one of the main issues
continues to be the lack of clarity of, or the ambiguity surrounding, what constitutes
context: how to define the term and how properly or best to make use of it. There is
an exasperating lack of agreement as to what characterizes context: there are almost
as many different technical definitions as research areas within context-aware
computing. Researchers in the field seem to have no propensity to espouse an
agreed upon technical definition. Hence, it is more likely that context will continue
to take on different technical connotations depending on its context of use. Yet there
is a need for a more comprehensive definition of context, with high potential to be
actually implementable in context awareness architectures—in other words, an
operational definition that enables context-aware applications to sense and combine
as many aspects of context as possible for better understanding of and thus satis-
fying users’ needs. Towards this end, it is important to focus on a discussion of the
difference between context in its original complex definition and the so-called
ontological, logical, and probabilistic models of context being implemented in AmI
applications. It is also of significance to shift the focus of the context debate from
whether it is technically feasible to capture the (complex) meaning of context in a
more theoretical view to what can be done to develop innovative technologies,
techniques, and mechanisms pertaining to design and modeling that allow the
operationalization of complex concepts of context, close to context as understood in those academic
disciplines specialized in the subject matter or devoted to the study of context (see
Goodwin and Duranti 1992 for an overview). The significance of taking this into
account stems from the high potential to enhance the functioning and performance
of context-aware applications, and thus the acceptance and use of AmI technology.
Especially, at the current stage of research, it seems to be unfeasible to adopt a
conceptual or theoretical definition given the constraints of existing technologies and
engineering practice that dictate the design and modeling of computational artifacts.
Indeed, the development of context-aware artifacts appears to be technology-driven,
driven by what is technically feasible rather than by what constitutes context in
real-world scenarios. This implies that some, if not most, of the cognitive, emotional, and
social aspects of context cannot be sensed by existing technology. Consequently, the
context determined or the ambience created by context-aware artifacts may differ
from what people involved in the situation have negotiated and how people perceive
the actual context—a subjective, socially situated interpretation of context. Indeed,
context is more about meanings that are constructed in interaction with entities, such as
objects, people, places, events, situations, environments, and so on, than about
entities as such, which is a strange switch to make in light of the constraints of the
state-of-the-art enabling technologies and computational processes.

3.5 Conceptual Versus Technical Definitions of Context

The purpose here is to take a theoretical tour through the various ways of under-
standing or interpreting context. The definition of context is still a matter of debate,
although defining core concepts is a fundamental step in carrying out scientific
research. The scholarly literature on context awareness, whether theoretical,
empirical, or analytical, shows that the definition of context has widely been rec-
ognized to be a difficult issue to tackle in context-aware computing. This difficulty
overarches all research in context awareness, notwithstanding that the semantics of what
constitutes context and context information have been studied extensively and dis-
cussed profusely. Nevertheless, many definitions have been suggested. They are often
classified into technical and conceptual: a restricted application-specific approach
and an encompassing theoretical approach. A technical definition of context is
associated with the technical representation of context information in a particular
application domain. It is technology-driven, that is, driven by what is technically feasible with
regard to the existing enabling technologies, especially the sensors used to measure
some features of context and the representation formalisms used to encode and reason
about context information. Accordingly, this technical approach entails that the
context of the application is defined by the designer and bounded by his/her con-
ception as to how to operationalize and thus conceptualize context. In this case, the
representation of the context of an entity in a system is of interest to a service
storage or provider for assessing the relevance and user-dependent features of the
service to be delivered. In all, a technical definition can be applied to a context
representation as a computational and formal scheme and provides ways to dis-
tinguish (subsets of) contexts from each other, e.g., location, emotional state,
cognitive state, task, activity, time, spatial extent, and so on. Examples of technical
definitions can be found in Schmidt et al. (1999), Turner (1999), Chen and Kotz
(2000), Strang et al. (2003), Loke (2004), Kwon et al. (2005), Lassila and Khushraj
(2005) and Kim et al. (2007).
Similarly, there are many conceptual definitions of context. The most cited one in
the literature is the one provided by Dey (2000), from the perspective that
context-aware applications look at the who’s, where’s, when’s and what’s of dif-
ferent entities and use this information to determine why a situation is occurring. He
accordingly describes context as: ‘any information that can be used to characterize
the situation of an entity. An entity is a person, place, or object that is considered
relevant to the interaction between a user and an application, including the user and
applications themselves. Context is typically the location, identity and state of
people, groups and computational and physical objects’ (Dey 2000; Dey et al. 2001).
Dey’s definition depicts that the concept of entity is fundamentally different from
that of context: context is what can be said about or describe an entity. For example,
a user as an entity has such constituents of context as location, emotional state,
cognitive state, intention, social setting, cultural setting, and so on. Dey (2000) also
provides a comprehensive overview of existing definitions of context (e.g., Schilit
et al. 1994; Pascoe 1998; Chen and Kotz 2000; Hull et al. 1997; Brown 1996). These
are adopted in the literature on context awareness. Pascoe (1998) suggests that
context as a subjective concept is defined by the entity that perceives it. Chen and
Kotz (2000) make a distinction between active and passive aspects of context by
defining context as ‘a set of environmental states and settings that either determines
an application’s behavior or in which an application event occurs and is interesting to
the user’. Schilit et al. (1994) view context as the user’s location, the social situation,
and the nearby resources. More to context definitions, Schmidt et al. (1999) describe
context using a context model with three dimensions of Environment (physical and
social), Self (device state, physiological and cognitive), and Activity (behavior and
task). Göker and Myrhaug (2002) present the AmbieSense system, where user context
encompasses five elements: environment context, personal context, task context,
social context, and spatiotemporal context. Looking at the ‘Context of Work’, Kirsh
(2001) suggests a more complex description of context: ‘highly structured amalgam
of informational, physical and conceptual resources that go beyond the simple facts
of who or what is where and when to include the state of digital resources, people’s
concepts and mental state, task state, social relations and the local work culture, to
name a few ingredients.’ This definition captures quite many aspects of what con-
stitutes a context from a conceptual level. It encapsulates more additional features
that make up context than other definitions. Yet this definition, like most con-
ceptual definitions, remains far from a real-world implementation: it is difficult to op-
erationalize or turn into workable systems, given the existing technological boundaries.
Indeed, as noted by Kirsh (2001), there are many aspects of context that
have not yet been technologically sensed and could be very difficult to capture,
highlighting that this is a non-trivial task. Besides, context frameworks derived from
theoretical definitions of context are usually not based on a systematic analysis of
context and need to be supported by empirical data. Furthermore, Abowd and
Mynatt (2002) suggest that context can be thought of in terms of ‘who is using the
system’; ‘for what the system is being used’; ‘where the system is being used’;
‘when the system is being used’; and ‘why the system is being used’. While this
approach provides initial definitions of key features of context that could be rec-
ognized, it indicates neither how these features relate to specific activities nor how
they could be combined in the inference of a context abstraction. This applies to
similar definitions that suggest such classification schemes as person, task, object,
situation, event, and environment. Speaking of relationships between entities,
Crowley et al. (2002) introduce the concepts of role and relation in order to char-
acterize a situation. Roles involve only one entity, describing its activity. An entity is
observed to play a role. Relations are defined as predicate functions on several
entities, describing the relationship between entities playing roles. A related defi-
nition provided by Gross and Prinz (2000) describes an (awareness) context as ‘the
interrelated conditions in which something exists or occurs’.
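
To make the gap between such conceptual definitions and their technical counterparts more tangible, the following minimal sketch (in Python, with hypothetical class and attribute names that are not drawn from any of the cited systems) illustrates how context in Dey's sense, that is, information characterizing the situation of an entity, tends to be flattened into a simple attribute structure once it is operationalized:

from dataclasses import dataclass, field
from typing import Any, Dict

@dataclass
class Entity:
    """A person, place, or object relevant to the user-application interaction (after Dey 2000)."""
    entity_id: str
    entity_type: str  # e.g., "person", "place", "object"
    # Context: any information that can be used to characterize this entity's situation.
    context: Dict[str, Any] = field(default_factory=dict)

    def update_context(self, feature: str, value: Any) -> None:
        """Record one feature of context (location, emotional state, task, ...)."""
        self.context[feature] = value

# A user as an entity, with a handful of context features attached to it.
user = Entity(entity_id="user-42", entity_type="person")
user.update_context("location", "office")
user.update_context("activity", "writing report")
user.update_context("emotional_state", "calm")
print(user.context)

Much of what the conceptual definitions emphasize, such as negotiated meaning, social and cultural setting, and intentions, is simply absent from such a representation, which is precisely the point of the discussion above.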

From a conceptually different angle, defining context from an activity point of
view, Nardi (1996) discusses a method for describing context with respect to learning
behavior. In light of a contrast between Activity Theory, Situated Action Models,
and Distributed Cognition, she concludes that Activity Theory appears the richest
framework to study context in its comprehensiveness. Fundamentally, like
Distributed Cognition, Activity Theory starts from the goal of the activity. Differing
objects are what differentiate one activity from another (Kozulin 1986;
Kuutti 1991). In other words, an activity is typified by the different objects that are
manipulated during its operation, human interaction with objects in the environment
and its artifacts. Indeed, this approach has emerged to address some issues asso-
ciated with existing context recognition approaches (e.g., Philipose et al. 2004). The
basic assumption is that the activity of the user can be used to determine the context
he/she is in, as it provides relevant contextual information, which can be used by
the system to guide its behavior in ways that respond intelligently to that context.
Activity Theory is also of particular relevance to cognitive context-aware systems
where a dimension of context, the high-level context, is to be deduced from internal
context (as an atomic level of the context) such as the user's intention and work process
as part of what he/she is doing as a cognitive activity, so that such systems can provide
cognitively adaptive services. However, attempting to computationally infer the
goal of the activity has proven to be no easy task, although sensors can provide
powerful cues about the (cognitive) activity being undertaken. Accordingly, while
Activity Theory could be regarded as a comprehensive approach to describing
context, turning it into a working system is a daunting challenge in the province of
context-aware computing. Indeed, realizing a system that senses, models, under-
stands, and infers a cognitive context has proven to be one of the most difficult
issues to deal with in context awareness. Consequently, when operationalizing the
concept of cognitive context, it stands to reason for software engineers to opt for a
simplified concept instead of a comprehensive one. This in fact applies to all types
of context (e.g., emotional context, social context, task context, situational context,
etc.). Simplifications (operationalizing contexts by focusing on particular aspects)
seem to be typical and necessary when developing context-aware systems.
Nevertheless, it suffices in some applications that the system developer applies the
notion of context and context awareness meaningfully enough to design applica-
tions that provide specific services to users.

3.6 Definition of Context Awareness

Context-aware computing is the catchphrase nowadays. Computer systems are
becoming ubiquitous, are not always in the same location, and might not be used by
the same user or within the same social environment, in addition to interaction
becoming more user-centered and users becoming more mobile. Hence, it is of high
relevance to study context awareness in relation to HCI in the emerging paradigm
of AmI.

As a system property or an application-specific trait, context awareness (also
referred to as sensor-based perception) indicates that a computer component is able
to acquire information about the context of the user and of itself for further pro-
cessing (interpretation and reasoning) towards inferring a high-level abstraction of
context. Underlying the technical term ‘context awareness’, originated in UbiCom,
is the idea that technology is able to sense, recognize, and react to contextual
variables, that is, to determine the actual context of its use and adapt its func-
tionality accordingly or respond appropriately to features of that context. This
represents a common thread running through all definitions; a veritable flood of
studies have defined the notion and applied it to different applications and com-
puting environments since the notion of context-aware computing was introduced
by Schilit et al. (1994). According to these authors, systems that utilize information
about the situation of their users, the environment, or the state of the system itself to
adapt their behavior are called context-aware systems (Schilit et al. 1994). This is the
most often-quoted and widely used definition in the literature on context awareness.
This definition depicts that a system analyzes and reacts to the surrounding,
changing context consisting of various elements of relevance to the situation of the
user. With emphasis on task as a constituent of the context of user as an entity, Dey
(2000) describes a context-aware system as one that uses context to provide rele-
vant information and services to the user, where relevancy depends on the user’s
task. Considering the focus of this book, this type of systems is associated with
what is called a cognitive context-aware application (see below), which offers relevant
information services to the user, where relevancy depends on the user’s cognitive
states or processes—different cognitive activities may be involved in a given task or
even a subtask—and the related intention or goal. According to Schmidt (2003),
a context-aware system entails the acquisition of context using sensors to perceive a
situation, the abstraction of context by matching a sensory reading to a context
concept, and application behavior through triggering actions based on the inferred
context. In this sense, context awareness capabilities provide computing environ-
ments with the ability to adapt the services they provide according to the user’s
current context. The provision of services by a general purpose context-aware
application should ideally occur as a result of a sequence of interdependent,
interrelated, synchronized changes in the states of the computational processes of
sensing, interpretation, reasoning, action, and, more importantly, monitoring (how
the user reacts to service), in response to the evaluation of a set of contextual
elements as both internal and external stimulus events that are of central relevance to
the major needs and goals of the user interacting with the application as an intel-
ligent entity.
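
As a purely illustrative sketch of the sequence just described (acquisition, abstraction, inference, application action, and monitoring), consider the following toy loop; the function names, the threshold values, and the single rule used for inference are hypothetical stand-ins for the far more elaborate mechanisms discussed in Chaps. 4 and 5:

from typing import Dict

def sense() -> Dict[str, float]:
    """Acquisition: raw sensor readings (stub values for illustration)."""
    return {"ambient_light": 12.0, "noise_level": 30.0, "hour_of_day": 23.0}

def interpret(readings: Dict[str, float]) -> Dict[str, str]:
    """Abstraction: match sensory readings to symbolic context concepts."""
    cues = {}
    cues["lighting"] = "dark" if readings["ambient_light"] < 50 else "bright"
    cues["noise"] = "quiet" if readings["noise_level"] < 40 else "noisy"
    cues["time"] = "night" if readings["hour_of_day"] >= 22 else "day"
    return cues

def reason(cues: Dict[str, str]) -> str:
    """Inference: derive a higher-level context abstraction from the cues."""
    if cues["lighting"] == "dark" and cues["noise"] == "quiet" and cues["time"] == "night":
        return "user_resting"
    return "user_active"

def act(context: str) -> str:
    """Application behavior: trigger an action based on the inferred context."""
    return "mute notifications" if context == "user_resting" else "deliver notifications"

def monitor(action: str) -> None:
    """Monitoring: observe how the user reacts to the service (stubbed here)."""
    print(f"service delivered: {action}; awaiting user reaction for adaptation")

# One pass of the interdependent sensing-interpretation-reasoning-action-monitoring cycle.
monitor(act(reason(interpret(sense()))))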
Furthermore, context awareness as a technological concept—applied to specific
applications—seems overall to be well understood and unambiguous in most cases,
as the definition of context awareness varies depending on the application domain:
the type of context or the number of the subsets of context of an entity that can be
incorporated in the design of a given application. Moreover, there is a tendency in
context-aware computing towards reducing the complexity of context awareness:
alienating the concept from its complex meaning as related to human interaction—
in more theoretic view—to serve technological purposes. Building context-aware
artifacts is not an easy task, and the implementation of context awareness is
computationally limited and its implications are not always well understood. There
is ‘little awareness of human context awareness in this fundamental and rich sense,
which relates what we see and do not only to our physical environment but equally
to our emotional, intellectual, cultural and social-interactive environment; our sense
of purpose and value, our interests and worldviews; our identity and autonomy as
human beings. Instead, context appears to have been largely reduced to the physical
(natural as well as architectural or otherwise human-constructed) properties of the
location and its nearby environment, including wireless capabilities and the
opportunities they afford for co-location sensing and information retrieval.
The advantage, of course, is that we can then entrust our devices and systems with
the task of autonomously detecting, and responding to, features of the context…
I would argue that information systems research and practice, before trying to
implement context awareness technically, should invest more care in understanding
context awareness philosophically and should clarify, for each specific application,
ways to support context-conscious and context-critical thinking on the part of users.
In information systems design, context-aware computing and context-critical
thinking must somehow come together, in ways that I fear we do not understand
particularly well as yet.’ (Ulrich 2008, p. 4, 8).

3.7 Context Taxonomy

In the literature on context-aware computing, several attempts have been under-
taken to classify context. There is therefore a wide variety of taxonomies that have
been offered based on different authors' perspectives. Only common taxonomies are
introduced, described, and discussed here. One approach to context categorization is
advanced by Schmidt et al. (1999), who frame context as comprising two main
components: human factors and physical environment. Human factors related
context encompasses three categories: information on the user (knowledge of
habits, emotional state, bio-physiological conditions), the user’s tasks (activity,
engaged tasks, general goals), and the user’s social environment (social interaction,
co-location of others, group dynamics). Similarly, physical environment related
context encompasses three categories: location (absolute position, relative position,
co-location), infrastructure (computational communication and information
resources, task performance), and physical conditions (light, temperature, pressure,
noise). Moreover, depending on the application domain, context-aware systems
may use different types of contextual elements to infer the dimension of context
such as physical or psychological. As another common taxonomy, context can be
classified into external and internal context. The external context is a physical
environment, while the internal context is a psychological context (or state) that
does not appear externally (Giunchiglia and Bouquet 1988; Kintsch 1988).
Examples of external context include: location, time, lighting, temperature,
co-location (or proximity) of people and objects, group dynamics, activity, feeling
(as external expression), spatiotemporal setting and so on. Examples of internal
context include: psychophysiological state, motivational state, affective state, cog-
nitive state, personal event, intention and so forth. A closer look at this taxonomy
depicts that human factors related context may appear externally and/or internally,
such as cognitive state. Cognitive processes can be indicated by facial expressions
(Scherer 1992) or accurately reflected by eye movements (Tobii Technology 2006).
As to user’s intention as an internal context, it can be recognized using software
inference algorithms (see Prekop and Burnett 2003; Kim et al. 2007; Kwon et al.
2005) based on the user’s ongoing task. Also, it is possible to infer the cognitive
dimension of the context like information retrieval, decision making and product
design by using internal context such as user’s intention, work context, personal
event and so on (Gwizdka 2000; Lieberman and Selker 2000). Likewise, using
external context such as location, place, lighting and time as a low level context, we
can deduce the physical dimension of the context like ‘having a meeting’, ‘shopping’
and ‘watching TV’ as a higher level of the context. See Chaps. 4 and 5 for more
detail on the process of context inference: transformation of low-level context data
into contextual cues and then into a higher level context abstraction. In relation to
this, from a context modeling perspective, Lee et al. (2009) point out that context
can be represented as a static-dynamic continuum; static context describes informa-
tion such as user profile while dynamic context describes wisdom obtained by
intelligent analysis, which involves reasoning and interpretation processes for
inferring high-level context abstraction. User profile (e.g., identity, habits, interests,
tastes, social category, culture, social status) as information provided by the user is
updated more rarely and in general does not need additional interpretation. Adding
to user profile, there are context data that can be derived from existing context
information, which can be obtained from databases or digital libraries such as maps.
Another type of context information source which differs in its update rate and
semantic level is the context data that represent certain states of the physical
environment such as location, time, temperature and lighting and provide fast and
near real-time access, while providing rather raw data that has to be interpreted
before being usable by applications. Dynamic context data include users’ adaptation
preferences, emotional displays, activities, and so on. By its very nature, dynamic
context data is usually associated with fuzziness, uncertainty and vagueness. Hence,
in terms of modeling, it is argued that ontologies are not well suited to represent
some dynamic context data, such as adaptation preferences; rather, such data can be more
profitably modeled by lower-complexity, restricted logics (e.g., Bettini et al. 2010).
Generally, predicate logic is well suited for expressing dynamic context
abstractions, and a three-valued logic offers deeper support for modeling and rea-
soning about uncertainty (see Chaps. 4 and 5 for a discussion of uncertainty issues).
However, further to context taxonomy, Dockhorn et al. (2005) propose an approach
that categorizes context into intrinsic or relational. They state that these concepts are
in line with conceptual theories in the areas of philosophy and cognitive science.
Intrinsic context describes ‘a type of context that belongs to the essential nature of a
single entity and does not depend on the relationship with other entities’, e.g., the
location of a spatial entity, such as a person or a building. Relational context
describes ‘a type of context that depends on the relation between distinct entities’,
e.g., containment which defines a containment relationship between entities, such as
an entity building that contains a number of entity persons. Related to developments
in foundational ontologies, this categorization of context is analogous to the
ontological categories of moment defined in Guizzardi (2005).
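
The taxonomies discussed above can be summarized, again purely for illustration, as a small data structure that separates external (physical-environment related) from internal (human-factors related) context and sketches the deduction of a higher-level context from low-level external cues; the field names and the toy rules are hypothetical and do not correspond to any particular system:

from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class ContextSnapshot:
    """Hypothetical grouping of context features along the taxonomies discussed above."""
    # External (physical-environment related) context.
    external: Dict[str, str] = field(default_factory=dict)   # e.g., location, time, lighting
    # Internal (human-factors related) context, typically much harder to sense.
    internal: Dict[str, str] = field(default_factory=dict)   # e.g., emotional state, intention

    def infer_high_level(self) -> Optional[str]:
        """Toy rule deducing a higher-level context from low-level external context."""
        if self.external.get("location") == "meeting_room" and self.external.get("co_location") == "colleagues":
            return "having_a_meeting"
        if self.external.get("location") == "supermarket":
            return "shopping"
        return None

snapshot = ContextSnapshot(
    external={"location": "meeting_room", "time": "10:00", "co_location": "colleagues"},
    internal={"emotional_state": "focused"},
)
print(snapshot.infer_high_level())  # -> "having_a_meeting"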

3.8 Interactivity Levels of Context-Aware Applications

There are different interactivity levels of context-aware applications. Common are
passive and active approaches, although they tend to go by different names. The level of
interactivity varies greatly within numerous context-aware applications, ranging
from presenting context-aware information or services to the user and letting him/her
decide, by manually defining parameters, how the application should behave; through
providing the user with services and information in an autonomous way that the
developer finds relevant, that is, based on what the application recognizes or infers as
context; to espousing a hybrid approach which entails combining both system-driven
and user-driven approaches. Chen and Kotz (2000) make a distinction between
active and passive aspects of context, stating that passive context-aware applications
present updated context (or sensor information) to the user and let the user decide
how to change the application behavior, whereas active context-aware applications auton-
omously change the application behavior according to the sensed information or on
the basis of sensor data. They believe that these two types of context provide a deeper
understanding of context-aware computing. This relates to the current debate over
complementing invisibility with visibility in context-aware computing (see Chap. 6
for a detailed discussion). However, in terms of context-aware personalized services,
the common two levels of interactivity have been termed differently: passive versus
active (Barkhuus and Dey 2003b; Chen and Kotz 2000); pull versus push (Cheverst
et al. 2001); interactive versus proactive (Brown and Jones 2001); and sometimes
implicit versus explicit. For example, in Brown and Jones (2001), proactive is
described the same way as active context awareness whereas interactive refers to
passive context awareness. However, pushing information towards the user is the
commonly used approach in context-aware computing. As noted by Erickson
(2002), researchers consider only push based applications to be context-aware. In
recent years, the perception has grown that context-aware services, whether per-
sonalized, adaptive, or proactive, should ideally be based on a hybrid model in
context-aware (or iHCI) applications. In their investigation of whether the user
should be left to pull the information on his own or information should be pushed
towards the user in context-aware systems, Cheverst et al. (2001) found that users’
sense of control decreases when autonomy of the service increases. For this reason
and many others that are taken up in Chaps. 6 and 10, a hybrid approach remains the
way forward as to complementing invisibility with visibility in context-aware
computing, as it provides a deep understanding of context awareness and also
reflects a way of considering user variations and accommodating them explicitly, an
aspect which is of value, if not necessary, for user and social acceptance of AmI
technologies. Therefore, to provide context-aware services, designers of
context-aware systems should consider psychological, behavioral, and sociocultural
factors.
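
The distinction between passive, active, and hybrid interactivity can be made concrete with a short, hedged sketch; the mode names and the confidence threshold are hypothetical design choices rather than established conventions:

from enum import Enum

class InteractivityMode(Enum):
    PASSIVE = "passive"  # present context to the user, who decides what to do
    ACTIVE = "active"    # change application behavior autonomously
    HYBRID = "hybrid"    # act autonomously only above a confidence threshold

def handle_context(inferred_context: str, confidence: float, mode: InteractivityMode) -> str:
    """Illustrative dispatch of a context-aware service under the three interactivity levels."""
    if mode is InteractivityMode.PASSIVE:
        return f"notify user: detected '{inferred_context}'; awaiting user decision"
    if mode is InteractivityMode.ACTIVE:
        return f"autonomously adapting behavior for '{inferred_context}'"
    # Hybrid: keep the user in control when the system is not confident enough.
    if confidence >= 0.9:
        return f"autonomously adapting behavior for '{inferred_context}'"
    return f"suggest adaptation for '{inferred_context}' and ask the user to confirm"

print(handle_context("in_meeting", confidence=0.75, mode=InteractivityMode.HYBRID))

The hybrid branch reflects the finding by Cheverst et al. (2001) cited above: the user's sense of control decreases as service autonomy increases, so autonomy is exercised only when the system is sufficiently confident.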

3.9 Context-Aware Systems

3.9.1 Technological Dimensions and Developments and Application Domains

The context-aware computing literature includes a vast range of context awareness
architectures that basically aim to provide the appropriate infrastructure for
context-aware systems (see Bravo et al. 2006). As a critical enabler for AmI, context
awareness entails a myriad of embedded, distributed, networked, and always-on
computing devices that serve users while being perceptive enough of situations,
events, settings, and the environment around them. A countless number of sensing
and other computing devices are invisibly embedded in everyday objects and spread
throughout the environment-augmented real or simulated settings, and able to
communicate seamlessly across disparate networks, in the midst of a variety of
heterogeneous hardware and software components and systems. With such com-
puting devices becoming entrenched in everyday objects such as computers, mobile
phones, cars, parking lots, doors, lights, appliances, tools, toys, and so on, a pro-
liferation of context-aware systems can be envisioned. Context is a fundamental
aspect of everyday life interaction. The design principles of context-aware systems
tend to address special requirements for diverse application domains, such as homes,
workplaces, cars, public places, and on the move. Tremendous opportunities reside
in deploying and implementing context-aware systems of different scales, intelli-
gence, and distribution, ranging from location-based context-aware applications;
cognition-based context-aware applications; activity-based context-aware applica-
tions in the context of assisted living within smart homes; context-aware mobile
phones that know what to do with incoming calls, determine the level of intrusiveness
that would be appropriate when trying to notify the user, and know how
to behave as to the provision of personalized services; context-aware bookshops that
interact with interested users in a personalized way; context-aware video games that alter their
narrative in response to the viewer's emotions as inferred from their facial expres-
sion; and context-aware toys that interact with children with understanding; to
context-aware parking areas that tell drivers where to park, to name but a few.
For the scope of this book, the emphasis is placed on computational artifacts as to
how context influences and changes their interaction with human users. Accordingly,
context awareness is primarily considered from the view point of HCI applications,
3.9 Context-Aware Systems 83

with a particular emphasis on AmI systems that aim at providing intelligent services
in relation to user’s cognitive, emotional, and social needs. To establish context-
aware functionality, various computational components are required to collect, fuse,
aggregate, process, and propagate context information in support of users’ needs,
desires, and intentions. This involves a wide variety of technologies, including
miniaturized multi-sensors, pattern recognition/machine learning techniques, onto-
logical modeling and reasoning techniques, intelligent agents, networking, wireless
and mobile communication, middleware platforms, and so on. In particular with
sensor technology, AmI systems are augmented with awareness of their milieu,
which contributes to enhancing such autonomic computing features as self-learning,
self-configuring, self-executing, and self-optimizing. These autonomic features
enhanced by the efficiency of multi-sensor fusion technology as well as the gain of
rich information offer a great potential to boost the functionality of context-aware
applications to the extreme, thus providing infinite smart services to users within a
variety of settings: at work, at home, and on the move. Research in sensor technology
is rapidly burgeoning. With the advancement and prevalence of sensors (in addition
to computing devices), context-aware applications are increasingly proliferating,
spanning a variety of domains.
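
As a minimal illustration of what fusing and aggregating context information can look like at the lowest level, the following sketch combines symbolic location estimates from several (simulated) sensing sources by a weighted vote; the source names, confidence values, and fusion rule are hypothetical and merely indicative of the kind of processing involved:

from collections import Counter
from typing import Dict, List

def fuse_location(estimates: List[Dict[str, object]]) -> str:
    """Toy multi-sensor fusion: weighted vote over symbolic location estimates
    coming from heterogeneous sources (e.g., Wi-Fi, RFID, GPS)."""
    votes: Counter = Counter()
    for estimate in estimates:
        votes[estimate["location"]] += estimate["confidence"]
    location, _ = votes.most_common(1)[0]
    return location

# Simulated readings from three independent sensing sources.
readings = [
    {"source": "wifi", "location": "kitchen", "confidence": 0.6},
    {"source": "rfid", "location": "kitchen", "confidence": 0.8},
    {"source": "gps",  "location": "garden",  "confidence": 0.3},
]
print(fuse_location(readings))  # -> "kitchen"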
While numerous technologies for the design, development, and implementation
of context-aware applications are rapidly advancing and maturing, given that there
has been, over the last two decades, intensive research in academic circles and in
industry on context awareness, R&D activities and projects within, and thus the
advancement of, context awareness technology differ from one application domain
to another. This is due to researchers giving more attention to some areas of context
awareness than others as well as to, arguably, the complexity associated with some
types of context compared to others. Examples of areas that have witnessed an
intensive research in the field of context-aware computing include location-aware
and spatiotemporal-aware applications in relation to both ubiquitous and mobile computing, in
addition to activity-based context-aware applications in the context of assisted
living within smart home environment. Cognitive, emotional, social, and conver-
sational context awareness, on the other hand, is an area in the field of AmI that has
recently started to attract researchers. Recent studies (e.g., Kwon et al. 2005; Kim
et al. 2007; Zhou and Kallio 2005; Zhou et al. 2007; Cearreta et al. 2007; Samtani
et al. 2008) have started to focus on this research topic. Thus, it
is still in its infancy. And this wave of research appears to be evolving at a snail’s
pace, nevertheless.

3.9.2 There Is Much More to Context than the Physical Environment

Mainstream research on context awareness has heavily focused on the physical envi-
ronment, but there is more—considerably more—to context than environment and
location. Particularly, location has been the primary factor used for denoting context
in the contemporary literature on ubiquitous computing because the context was
initially perceived as a matter of user location by the computer science community.
This relates to Bellotti and Edwards’s (2001, p. 196) argument, noting in a dis-
cussion of the state of context-aware computing: ‘only the basic nonhuman aspects
of context, with constrained conditions and well-defined, responsive behaviors, can
be handled by devices on their own.’ Most attempts to use context awareness within
AmI environments are focused on the physical elements of the environment, a user
or devices (Prekop and Burnett 2003). Indeed, context awareness has been reduced
to the physical properties of the location and its nearby environment, using global
positioning systems (GPS) and wireless capabilities and the opportunities they
afford for location and co-location sensing and information retrieval and assembly
such as multi-media presentations that can be wirelessly transmitted and displayed
on devices such as mobile phones and laptops. There is, as adequately discussed
above, so much more to context than location, co-location and other physical
characteristics of the user's environment, including physical conditions and spatio-
temporal aspects. In the current stage of AmI research, there is a growing realization
among scholars that context awareness goes far beyond such physical factors to
include more, considerably more, of human factors pertaining to user’s cognitive
state, emotional state, psychophysiological state, bio-physiological state, social
setting, cultural setting, and so on. It suffices to see an increasing awareness among
AmI designers and computer scientists of human context awareness in this fun-
damental and rich sense.
As a manifestation of operating on this understanding, architectures of
context-aware applications are becoming more generic, and context models are increas-
ingly growing more sophisticated. In terms of human factors related context, there
is a new emerging wave of research focusing on cognitive and emotional elements
of context. Research on cognitive and emotional context awareness has been less
active over the last two decades. Cognitive elements of context need more study
(Kim et al. 2007), while in the affective computing area research has paid little
attention to context (Cowie et al. 2005). Research shows that psychological context
is far more difficult to measure, model, and implement compared to the
physical context. Many studies within context awareness have a limitation related to
recognizing user internal states such as intentions and emotions in a static condition,
not to mention, in a dynamic condition. One of the likely reasons why the research
in such areas of human factors related context has not gone far in the field of AmI is
the insurmountable complexity inherent in sensing, modeling, and understanding
the human cognitive world. Indeed, cognitive states are tacit and difficult (even for
human user) to externalize and translate into a form comprehensible to a
context-aware system, while the difficulty with emotions lies particularly in
understanding the motivation behind them. Many authors have acknowledged the sig-
nificance of capturing the cognitive elements of a user's context (Schilit et al.
1994). This is essential for realizing the true potential of AmI, which is, indeed,
aimed primarily at supporting the user’s emotional and cognitive needs. One of the
cornerstones of AmI is the adaptive behavior of systems to the user’s cognitive or
emotional state (Noldus 2003).

3.9.3 Cognitive and Emotional Context-Aware Applications

Cognitive or emotional context-aware applications are capable of verifying and
validating cognitive or emotional context information, often acquired by sensors
from the user’s behavior. Cognitive information can be captured as implicit input,
for example, from facial expression and eye gaze, using multisensory devices
embedded in multimodal user interfaces and software equivalents, and as explicit
input from keyboard, touchscreen, and/or pointing devices. Affective or emotional
information is usually captured as implicit input using a combination of sensors
to detect different emotional cues by reading multiple sources. Furthermore, a
cognitive context-aware application is a class of AmI systems that aims at reducing
the burden associated with task performance by lessening the requirement for the
cognitive activities associated with difficult tasks, by either assisting the user in
accomplishing these tasks or carrying them out on behalf of the user, thereby
allowing for smooth and intuitive interaction. It should be able to sense and predict
and intelligently adapt to the user’s cognitive states, by recognizing the cognitive
dimension of context and modifying its behavior accordingly. On the other hand, an
emotional context-aware application as a class of AmI systems is able to recognize
the user’s emotional state and make inferences on how to adapt its behavior in
response to that state. Such an application has the potential to influence users’
emotions in a positive way by producing emotional responses and invoking positive
feelings. Including emotions in context-aware computing is a recent, challenging
endeavor that is increasingly attracting many researchers in the field of AmI. Hence,
the affective computing paradigm is becoming an integral part of AmI research—
affective context-aware computing. By utilizing affective computing, AmI sys-
tems can have human-like emotional, interactive capabilities. Affective computa-
tional tools enable AmI systems to use affective display behaviors and other means
to detect the emotional state of users by reading multimodal sources. An affective or
emotional context-aware system should be able to recognize the user’s emotional
state by detecting various emotional cues, which requires various types of dedicated
sensors for detecting emotiveness and vocal parameters of speech, facial expres-
sions, gestures, body movements, as well as heart rate, pulse, skin temperature,
galvanic skin response, and so on. Furthermore, AmI systems amalgamating
emotional and cognitive aspects of the user relate to what is known as social
intelligence. For a system to be socially intelligent it should be able to select and
fine-tune its behavior according to the affective and cognitive state (task) of the user
(Bianchi-Berthouze and Mussio 2005).
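
By way of illustration only, the following sketch shows one simple way of combining multimodal emotional cues at the decision level; the modality names, scores, and weights are hypothetical, and real affective context-aware systems rely on far more sophisticated recognition and fusion techniques (see Chaps. 7 and 8):

from typing import Dict

def fuse_emotion(cues: Dict[str, Dict[str, float]], weights: Dict[str, float]) -> str:
    """Toy decision-level fusion: weighted sum of per-modality emotion scores, highest label wins."""
    combined: Dict[str, float] = {}
    for modality, scores in cues.items():
        for label, score in scores.items():
            combined[label] = combined.get(label, 0.0) + weights.get(modality, 1.0) * score
    return max(combined, key=combined.get)

# Hypothetical per-modality emotion scores in [0, 1], e.g., produced by separate
# recognizers for vocal parameters, facial expressions, and physiological signals.
cues = {
    "speech":     {"stressed": 0.7, "calm": 0.3},
    "face":       {"stressed": 0.4, "calm": 0.6},
    "physiology": {"stressed": 0.8, "calm": 0.2},
}
weights = {"speech": 0.3, "face": 0.3, "physiology": 0.4}
print(fuse_emotion(cues, weights))  # -> "stressed"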
However, context-aware computing is not something that will be driven by
pre-existing information or computationally formal knowledge about users and thus
ready-made behaviors, but rather it should understand how human cognitive and
emotional states dynamically continuously evolve as contextual elements in order
to be able to deliver efficient, real-time services. Context-aware computing should
aim to support human users who are already better-equipped to figure out how a
certain context may emerge and change the interaction with the environment and its
artifacts. This relates to what has come to be known as situated intelligence, an issue
which is discussed in detail further down. Indeed, in context-aware computing, the
universes of discourse of emotion and cognition are considered to be the most
difficult to define and model by the system designers and ontology modelers in AI
and AmI due to the complexity of human functioning, especially in relation to
situated actions, situated cognition, meaning attribution, dynamic perception,
negotiation, and so forth.

3.9.4 Common Examples of Context-Aware Applications and Services: Mobile Computing

As a technological feature, context awareness is about firing various
context-dependent application actions, thereby delivering personalized, adaptive, or
proactive services that best meet the user’s needs. In other words, on the basis of
sensed (context) information, context-aware systems can change their content and
behavior autonomously. That is to say, they can dynamically self-select,
self-configure, and self-execute relevant services based on the information acquired
about the user’s setting. Examples of existing context-aware applications are
numerous, and vary greatly in terms of scale, complexity, distribution, and intelli-
gence. The intent here is only to provide a few examples of such applications and cast
light on some of the ambient services they can offer. Since the emergence of
AmI, there has been a tendency to associate context with location awareness in
mobile computing, owing to the advances of ultra-mobile computing devices, such
as smart mobile phones, personal digital assistants (PDAs), wearable computers, and
laptops. In this regard, the information and services users need change according to
their mobility—while they are on the move. Accordingly, context-aware systems
provide users, more often acting in an autonomous way, with the information and
services they need wherever they are and whenever they need them, in the specific
context in which they find themselves. O’Hare and O’Grady (2003) introduced
Gulliver’s Genie as a context-aware application that assists roaming tourists, where
intelligent agents collectively determine the user context and retrieve and assemble
multi-media presentations that are wirelessly transmitted and displayed on a PDA.
Gulliver’s Genie attempts to deliver personalized services based on the location
context. In a similar approach, Khedr and Karmouch (2005) propose Agent-based
Context-Aware Infrastructure (ACAI), which tries to recognize the current situation
and act on that understanding. There are also diverse examples of general-purpose
computing devices that can become a specific information appliance depending on
the context. One example is a laptop that becomes a TV screen capable of displaying
high quality video when needed or transforming itself into a transparent mirror
without human intervention. Another example is a mobile phone that runs its
applications automatically according to the context, e.g., when the mobile phone is
close to a phone, it runs the contact list application, and in the supermarket it executes
the shopping list application. A common example of an active context-aware application
is a mobile phone that changes its time automatically when it enters a new time zone
or restricts phone calls when the user is in a meeting. This is opposed to passive
context-aware applications whereby a mobile phone would notify or prompt the user
to perform the action instead. Context awareness is even more important when it
comes to complex mobile devices in which productivity tools with communication
and entertainment devices converge to make mobiles highly multifunctional, per-
sonal smart devices. Especially, mobile phones have been transformed into a ter-
minal capable of accessing the internet, receiving television, taking pictures,
enabling interactive video telephony, reading RFIDs, sending a print request to a
printer at home, and much more (see Wright 2005). The ways in which such mul-
tifunctional, powerful devices are going to behave in AmI environments will vary
from one setting to another, including indoors, business meetings, offices, schools,
outdoors (e.g., touristic places, parks, marketplaces), on the move (e.g., walking in a
shopping mall and running), and so on. For example, if you enter a shopping mall,
your mobile phone could alert you if any of your friends are also there, and
even identify precisely in which spot they are located, and also alert you to special
offers on products and services of interest to you based on, for example, your habits,
preferences, and prior history. Kwon and Sadeh (2004) proposed context-aware
comparative shopping and developed an active context-aware system that behaves
autonomously based on a multi-agent approach. This system can be aware of a user's location
and make educated guesses automatically about user preferences to determine the
best purchase. In their Autonomic Middleware for Ubiquitous eNvironment
(AMUN) applied to the Smart Doorplate Project, Trumler et al. (2005) propose a
system that tries to capture the user’s location, walking direction, speed, and so on;
the location of a user wearing a special badge is traced and the system shows relevant infor-
mation to the user when the user approaches a specific room. As can be noticed, most
of the examples presented above are mainly associated with location-aware appli-
cations, which, to capture and use context within AmI environments, have focused on
the user’s external and physical context through physical devices such as smart
sensors, stereo-type cameras, and RFID. More of these applications as well as
activity-based context-aware applications will be introduced as part of recent AmI
projects in the next two chapters. But examples of cognitive and emotional
context-aware applications will be, in addition to being introduced in the next two
chapters as well, elucidated and discussed in more detail in Chaps. 8 and 9. The
intent of mentioning different examples of context-aware applications is to highlight
the emerging research trends around other types of contexts of psychological,
behavioral, and social nature. While all types of contexts are crucial for the devel-
opment of context-aware applications, the real challenge lies in creating applications
that are able to adapt in response to the user’s context based on a synchronized,
dynamic fashion as to analyzing and reasoning about different, yet interrelated,
components of that context.
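
To ground the shopping-mall scenario mentioned above, the following toy sketch derives alerts from co-location with friends and from offers matching pre-stored interests; all data structures and names are hypothetical, and the example deliberately ignores the privacy and acceptance issues discussed elsewhere in this book:

from typing import Dict, List, Set

def mall_alerts(user_id: str,
                presence: Dict[str, str],
                friends: Dict[str, Set[str]],
                offers: Dict[str, List[str]],
                interests: Dict[str, Set[str]]) -> List[str]:
    """Toy location-aware service: alert the user about co-located friends and
    about offers matching the user's (assumed, pre-stored) interests."""
    here = presence[user_id]
    alerts = []
    for other, place in presence.items():
        if other != user_id and place == here and other in friends.get(user_id, set()):
            alerts.append(f"your friend {other} is also at {place}")
    for offer in offers.get(here, []):
        if offer in interests.get(user_id, set()):
            alerts.append(f"offer at {here}: {offer}")
    return alerts

presence = {"alice": "mall", "bob": "mall", "carol": "office"}
friends = {"alice": {"bob", "carol"}}
offers = {"mall": ["coffee discount", "shoe sale"]}
interests = {"alice": {"coffee discount"}}
print(mall_alerts("alice", presence, friends, offers, interests))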

3.10 Context Awareness: Challenges and Open Issues

It is recognized that the realization of the AmI vision presents enormous and daunting
challenges across computer science, many of which pertain to system engineering,
design and modeling. As mentioned above, context is a difficult topic to tackle and
context awareness has proven to be a complex, multilevel problem with regard to
realization—low-level sensor data acquisition, intermediate-level information pro-
cessing (interpretation and reasoning), and high-level application action. Context
recognition (awareness) comprises many different computational tasks, namely
context modeling, context detection and monitoring, information processing and
pattern recognition, and application actions. These tasks are not easy to deal with.
Thus, context awareness poses many challenges and open issues that need to be
addressed and overcome to bring the field of AmI closer to realization and
delivery-deployment of the next generation of AmI systems. In their project on
context-aware computing, Loke and his colleagues summarize some of these
challenges and open issues as follows:
• general principles and paradigms that govern the assembly of such systems;
• techniques and models of the information, structure and run-time behavior of such
systems;
• an identification of the classes of such systems, each with their specific design patterns,
models, applicable techniques, and design;
• principles and tailored methodologies for engineering context awareness;
• general methods for acquiring, modeling, querying…and making sense of context
information for such systems, with an involvement (and possible interaction) of data
analysis techniques and ontologies;
• the reliability of such systems given that they need to take action proactively [and
function when they are needed];
• the performance of such systems given that they need to be timely in acting;
• effective models of user interaction with such systems, including their update,
improvements over time, and maintenance and the development of query languages;
• enabling proactivity in such systems through learning and reasoning; and
• integration with the services computing paradigm for the provision of context as a
service to a wide range of applications (Loke et al. 2008).

Other challenges and issues include: the predictability of such systems given that
they need to react in ways they are supposed to; the dependability of such systems
given that they need to deliver what they promise; modeling of human functioning
(e.g., emotional, cognitive, conversational, and social processes); effective man-
agement of increasingly sophisticated context information; critical review of the opera-
tionalization of context in context-aware artifacts and their impact on how context is
conceptualized, especially in relation to human factor related context; full user
participation in design, development, configuration, and use of systems; under-
standing different users’ needs and demands, and, more importantly, how they can
be met or fulfilled by ambient services in AmI environments, to name but a few.


One of the significant challenges in AmI is to create applications that can be aware
of the context in its multifarious, multidimensional, and changing form and that are
capable of validating that context in accordance with the way the user dynamically
ascribes meaning to its constituting entities as an amalgam, and that allow the user
to be easily in command of how it can influence interaction. As supported by Ulrich
(2008, p. 6): ‘The challenge to context-aware computing is to enhance, rather than
substitute, human authorship, so that people (not their devices) can respond pur-
posefully and responsibly to the requirements and opportunities of the context. The
aim is…to give users meaningful and easy control of it.’ This is the kind of
understanding of human context that should be considered as a fundamental
underpinning for realizing the true potential of AmI—inspired by human interaction
and communication. Indeed, at issue is that interactive computer systems cannot be
entrusted with the task of responding to features of the context—making decisions
and autonomously acting on behalf of human users accordingly—given the fun-
damental difference between these systems and humans (see Chap. 6 for a detailed
discussion).

3.11 Context and Situation

In the literature on context awareness, context (what can be said about an entity)
tends to be synonymous with situation; hence, they have been used interchangeably.
As noticed above, several definitions of context are somewhat tautologous: context
is described as comprising contextual features, assuming ‘context’ and ‘situation’ are
tantamount. Situation describes the states of relevant entities or context represents
any information (contextual aspects) that characterizes the situation of an entity (e.g.,
Dey 2001). This reflects the sense in which the notion of context is applied to
context-aware computing, i.e., everything that could be relevant to a given person
(user) doing a given thing in a given setting. In fact, different dimensions of context,
such as physical, cognitive, emotional, and social, are referred to, in context-aware
computing, as high-level abstractions of context or situation—see Bettini et al.
(2010) and Perttunen et al. (2009) for examples, which are inferred by applying
pattern recognition techniques using machine learning algorithms or semantic rea-
soning using semantic descriptions and domain knowledge of context on the basis of
the observations of physical sensors—only what can be measured as physical
properties. This implies that the term ‘context’, as it is used, can be ambiguous. Most
definitions of context in the technical literature indicate that while context is viewed
as being linked to situations, the nature of this link remains unclear; situation seems
to consist of everything surrounding an entity as an object of enquiry while context
comprises specific features that characterize a situation (Lueg 2002). Thus, there is a
distinction between context and situation. There is more to consider when looking at
context from a perspective that is motivated by research in situated cognition and
situated action. This perspective, which is influenced by the notion of ‘situation’,
is the focus of what remains of the discussion in this section. As the notion of
situation has some similarities to the notion of ‘context’ in those disciplines devoted
to the study of context (see Goodwin and Duranti 1992), context and situation must
have distinguishing features as well. The concept ‘situated’ is common across a wide
range of disciplines, including social science (sociology), computer science, artificial
intelligence, and cognitive science. The social connotation of ‘situated’, which ‘has
its origins in the sociology literature in the context of the relation of knowledge,
identity, and society’ (Lueg 2002) is partly lost as the concept has been reduced (in
terms of complexity) from something social in content and conceptual in form to
merely ‘interactive’ or ‘located in some time and place’ (Clancey 1997). It is this
connotation ‘that allows highlighting the differences between “context” as used in
work on context-aware artifacts and the original “situation”. A “situation” is an
observer-independent and potentially unlimited resource that is inherently open to
re-interpretation. “Context”, to the contrary, as an expression of a certain interpre-
tation of a situation is observer-dependent and therefore no longer open to
re-interpretation: the meaning of aspects included in the context description is more
or less determined. Other potentially relevant aspects may or may not be included in
the context description… The openness to re-interpretation matters as (individual)
users may decide to assign significance to aspects of the environment that were not
considered as significant before.’ (Lueg 2002, pp. 44–45). Understanding the way in
which meanings are constructed in interaction with the environment and how intense
our interaction is can help us gain insights into why a situation may very well be
open to re-interpretation. Schmidt (2005, p. 167) states: ‘All [inter]actions carried
out by a human take place in context—in a certain situation. Usually interaction with
our immediate environment is very intense… even if we don’t recognize it to a great
extent. All contexts and situations are embedded in the world, but the perception of
the world is dictated by the instantaneous context someone is in.’
Interaction entails a process of exchange of mental and social representations
between people, in which these people construe meaning by means of representa-
tions, i.e., give meaning to these representations while observing and
representing. Accordingly, context represents a meaning that is
generated based on mental and social representations of people, objects, places,
events, and processes as contextual entities—that is, a subjective, socially situated
interpretation of some aspects of the situation in which interactions occur. This
process is too immediate and fluid to capture all the aspects of the environment—
what constitutes the situation; hence the need for re-evaluations and thus
re-interpretation of the situation (assigning significance to more aspects of the
environment) as the interaction evolves. This explains why an observer may perceive
an interaction differently as it unfolds through the changing context (by including
more of its relevant aspects). One implication in context-aware computing is that a
higher-level context (e.g., retrieving information, going to bed, making a decision,
feeling bored when interacting with an e-learning application, etc.) may be
inferred at a certain moment, but just before this inferred context changes the
application's behavior, the context may (unpredictably) change on the user's part
in a way that the system (agent) may not register. As a result, the system may behave inappropriately,
meaning that its action becomes irrelevant and thus annoying or frustrating to the
user. This can be explained by the fact that contextual elements as part of a situation,
such as location, time, lighting, objects, work context, business process, and personal
event as an atomic level of the context may not change while other aspects such as
cognitive, emotional, and biochemical states and processes of people, social
dynamics, and intentions may well change, or other components of context may
simply be brought in that would render the inference irrelevant. At the current stage of
research, context-aware applications are not capable of capturing the changing or
dynamic nature of context and how it shapes and influences interaction. In
all, the rationale for working with 'context' and 'situation' as two distinct
concepts is to enhance the functioning and performance of context-aware applica-
tions in AmI environments.

3.12 Individual and Sociocultural Meaning of Context and Situation

A situation represents an overarching environment in which the interaction that takes
place is defined by the context and also changes under its influence as an evolving
interpretation of that situation. In this sense, each distinct interpretation of situation
is associated with distinctive patterns of assigning meaning or significance to some
of its constituting aspects which form the (perceived) context that influences the
interaction, and there are few if any one-to-one relationships between an interaction
and the meaning given to it through the interpretation and re-interpretation of the
situation in which it takes place. It is the evolving interpretation of the situation—
and thus dynamic perception of context—rather than the situation itself, which
defines the patterns of interaction. Accordingly, context has no fixed meaning, but
rather different meanings for different people in different interactions—the meaning
of context is modulated through interactions and context changes with actions. Put
differently, the meaning of context information differs for every person on the basis
of the patterns underlying the selectivity and framing of the assumptions through
which he/she delimits relevant facts and concerns against the whole situation or
environment in which interaction takes place. Ulrich (2008, p. 7) points out that
context is a pragmatic and practical-normative rather than merely semantic and
theoretical-empirical notion, but computer systems operate at approximated
semantic level of understanding and can only handle theoretical-empirical reasoning.
This argument leads to questioning the claims made in the vision of AmI that AmI
applications are capable of—based on the context in which users find themselves—
anticipating and intelligently responding to their needs and desires.
The meaning we attach to things and many aspects of life is not inherent in them
but a result of mental and social representations. This is to say, meaning con-
struction entails individual and shared perceptions. As one shared premise of
the constructivist worldview is that reality is socially constructed, the construction
process involves social and cultural artifacts and therefore inevitably becomes
sociocultural, even though perception is necessarily individual. One implication is that
while contexts are perceived (from interpreted situations) by each individual, they
are associated with (socially) shared values and practices. In other words, context is
representational and interactional. Specifically, ‘relevant context emerges and
changes with human practice. It has to do with the…questions we face in our daily
lifeworld; with the shared core of views, values and visions for improvement on
which we can agree with other people; and with the support we can marshal from
them for our actions…. Contextual assumptions are unavoidable, and they…have a
judgmental and normative core that is rooted in individual purposefulness and
social practice. They imply empirical as well as normative selectivity of all our
judgments and claims, in that they determine what considerations of fact and value
we take to be relevant and what others we don’t consider so relevant. We are not
usually fully aware of these selections, nor do we overview all the consequences
they may have. We are, then, always at risk that our designs and actions have effects
that we do not adequately anticipate; that we raise claims that we cannot really
substantiate’ (Ulrich 2008, p. 7, 22). There is certainly myriad other circumstances
(social, ethical, intellectual, and motivational) that we might consider than those
contextual assumptions we select and through which we delineate relevant facts and
concerns that condition our judgments, claims, decisions, actions, and interactions
(with people, situations, objects, places, and events). People use various situational
features as resources for the social construction of people, objects, places, and
events, that is, it is through their continuous, concerted effort that these entities
become what they are as perceived.

3.13 Situated Cognition, Action, and Intelligence

Context-aware systems are not capable of handling interactive situations the way
humans do. This entails understanding the meanings ascribed to interaction acts
through the continuously changing context as an ongoing interpretation of the
overall situation, and dynamically adjusting or reacting to new or unanticipated
circumstances. On the difference between (human) situated actions and (computer)
planned actions, Lucy Suchman writes: ‘The circumstances of our actions are never
fully anticipated and are continuously changing around us. As a consequence our
actions, while systematic, are never planned in the strong sense that cognitive
science would have it. Plans are a weak resource for what is primarily an ad-hoc
activity.’ (Suchman 2005, p. 20). The idea of situated action is that plans are
resources that need to be combined with many other situational variables as
resources to generate behavior; hence, they are far from being determining in
setting our actions. Researchers in situatedness, notably Suchman (1987, 2005) and
Clancey (1997), who have investigated the specific characteristics of usage situa-
tions understand the characteristics of a situation as resources for human cognition
and human (inter)action, contrary to most researchers developing context-aware
artifacts (Lueg 2002).

In terms of situated intelligence, the cognitive processes and behavior of a
situated system should be the outcome of a close coupling between the system
(agent) and the environment (user) (Pfeifer and Scheier 1999; Lindblom and
Ziemke 2002). This assumes no ability of the AmI system to reason about the
meaning of what is taking place in its surroundings, e.g., the user's (perception of)
contextual features on the basis of observed information and dynamic models, so as to
undertake actions autonomously on behalf of the user. This implies that rather than
focusing on the development of models for all sorts of relevant situations of
everyday life, AmI research should focus on the development of new technologies
that enhance aspects of a close coupling between AmI systems and their human,
social, and cultural environments, with no need to model all those situations or
environments. Brooks (1991) called for avoiding modeling as much as possible,
suggesting alternatively that machine intelligence should ensue from basic
responsive elements that can create the suitable dynamics when interacting with
their environment. One implication of this conception is that the intelligence of the
system should be gauged against the ability to accomplish a deep coupling with
users who, with respect to the attribution of meaning, interpret, evaluate, and make
associations, and with respect to acting, react to and trigger relevant behavior in the
system. In this way, intelligence would evolve from how people can act more
effectively in their environment and organize their practices and settings based on
the way in which they can be empowered with AmI artifacts and environments. The
pertinence lies in a search for empowering people into the process of improvised or
unplanned situated action characterizing everyday life, rather than trying to model
the subtle and complex forms of intelligence embedded into the life-world (Dourish
2001). It can be inferred that the limitations of context-aware applications provide
justifications for questioning the claim about the promised intelligence that can be
provided through interaction in AmI environments to address ‘the real needs and
desires of users’ (ISTAG 2003), to iterate. As stated by Gunnarsdóttir and
Arribas-Ayllon (2012, p. 1), ‘the original promise of intelligence has largely failed.
This outcome points to a two-sided problem. The definitional looseness of intelli-
gence is permissive of what can be expected of the role and scope of artificial
reasoning in AmI interaction paradigms, while ordinary human reasoning and
knowing what people actually want remains persistently elusive. AmI research and
development to-date is still grappling with a problem of what the intelligence in
Ambient Intelligence can stand for.’

3.14 Context Inference, Ready-Made Behavior, and Action Negotiation

A situated form of intelligence in terms of system behavior (service delivery) is an
emerging trend around the notion of intelligence and is viewed as providing a refreshing
alternative for research in AmI. However, most current AmI research still focuses
on the creation of models for all sorts of contexts, situations, and environments
based on the view of developers. Developing context-aware applications is about having
developers define what aspects of the world constitute context among the infinite
richness of other circumstances, thereby interpreting and evaluating context in a
way that stops the flow of meaning, by closing off the opportunity of including
emergent contextual aspects or re-assessing the significance assigned to some
previous aspects of the situational environment. One implication is that, as in a lot
of context-aware applications, the user does not have the possibility to negotiate the
meaning of the context and thus the relevance of the so-called ‘intelligent’ behavior.
All the user has to do is to obey what the developers define for him/her—adaptation
rules, although the inferred context is based on only what is computationally
measurable as contextual aspects—limited and imperfect data. Consequently, the
outcome of decision as to the behavior of computational artifacts—delivery of
ambient services—stays in the hands of the developers who understand when and
why things may happen based on particular contextual features. In this sense, the
developer is determining the behavior of the user without negotiating whether it is
suitable or not. Context is not only a ‘representational’ issue (as computers can
handle it), but also an ‘interactional’ issue to be negotiated through human inter-
action (Dourish 2004, pp. 4–6). Besides, developers can never model how people
attach meaning to and negotiate contexts, the logic underlying the socio-cognitive
processes of subjective, socially situated perception, evaluation, interpretation, and
making association in relation to places, objects, events, processes, and people. The
reality is that developers will continue to define what aspects of the world constitute
context and context-dependent application actions, regardless of whether they are
relevant or not for the user. This is due to the fact that within the constraints of
existing computing technologies, taking meaning of context into account is a
strange switch to make, as it ‘undermines the belief in the existence of a “model of
the user’s world” or a “model of behavior”’ (Criel and Claeys 2008, p. 66). There is
little knowledge and there are few computational tools to incorporate user behavior in system
design (Riva et al. 2003). And a strong effort is needed in the direction of user
behavior and world modeling ‘to achieve in user understanding the same level of
confidence that exists in modeling technology’ (Punie 2003). However, it is still
useful—technically feasible—to create systems that allow the user to accept or
decline if an application should behave, deliver an ambient service, based on a
particular inferred context—a situated form of intelligence or user-driven adapta-
tion. Especially, it is unfeasible, as least at the current stage of research in AI and
AmI, computationally model how humans interpret and re-interpret situations to
dynamically shape the meaning of context that define and change their interaction.
Whether personalized, adaptive, responsive, or proactive, an application action as a
ready-made behavior of the system based on particular patterns of analysis and
reasoning on context should not be taken for granted to be relevant to all users as
long as the context that defines interaction arises in a situation that consists of a
potentially unlimited number of contextual aspects—resources for human cognition
and action. There is ‘an infinite richness of aspects that constitute the contexts of
purposeful action’ (Ulrich 2008, p. 7). In particular, interacting with context-aware
systems should entail the negotiation of the relevance of the actions of the system to
human actor’s situation; especially, our acting is not routine acting in its entirety.
Besides, translations of context-aware systems’ representations, as Crutzen (2005,
p. 226) argues: ‘must not fit smoothly without conflict into the world for which they
are made ready. A closed readiness is an ideal which is not feasible, because in the
interaction situation the acting itself is ad-hoc and therefore unpredictable. The
ready-made behavior and the content of ICT-representations should then be dif-
ferentiated and changeable to enable users to make ICT-representations ready and
reliable for their own spontaneous and creative use’. Like services, ‘information and
the ways we understand and use it are fundamentally contextual, that is, conditioned
by contextual assumptions through which we delimit relevant “facts” (observations)
and “values” (concerns) against the infinite richness of other circumstances we
might consider. Accordingly, we cannot properly appreciate the meaning, rele-
vance, and validity of information, and of the claims we base on it, without some
systematic tools for identifying contextual assumptions and unfolding their
empirical and normative selectivity. Context awareness of the third kind is about
giving…users more control over this fundamental selectivity.’ (Ulrich 2008, p. 1).
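One very simple way to approximate such a situated, user-driven form of adaptation is to let the user confirm or decline a proposed action before it is executed, rather than having the system act silently on an inferred context. The sketch below is a hypothetical illustration under that assumption; the inferred context labels, the proposed services, and the confirmation channel are invented for the example and do not represent an actual AmI architecture.

```python
# Illustrative sketch of user-driven adaptation: the system proposes a ready-made
# behavior for an inferred context, but the user negotiates whether it is carried out.

def propose_action(inferred_context):
    # Hypothetical mapping from an inferred context to a ready-made behavior.
    proposals = {
        "business_meeting": "block incoming calls and project the agenda",
        "informal_gathering": "allow calls and switch off the projection",
    }
    return proposals.get(inferred_context, "do nothing")

def user_confirms(prompt):
    # Stand-in for any confirmation channel (dialog box, gesture, voice command).
    return input(f"{prompt} [y/n] ").strip().lower() == "y"

def adapt(inferred_context):
    action = propose_action(inferred_context)
    if user_confirms(f"Context inferred as '{inferred_context}'. Apply: {action}?"):
        print(f"Executing: {action}")
    else:
        print("Action declined; the user's own reading of the situation prevails.")

if __name__ == "__main__":
    adapt("business_meeting")
```

The design choice illustrated here is deliberately modest: the system still infers context from limited data, but the final decision about the relevance of the ready-made behavior is negotiated with the user instead of being taken for granted.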

3.15 Situation and Negotiation

‘Learning, thinking, and knowing are relations among people engaged in activity in,
with, and arising from the socially and culturally structured world’ (Lave 1991).
Fundamentally, situations are subject to negotiation among the people involved in
the situation (e.g., Wenger 1998). The inability to computationally capture this aspect
of negotiation has implications for the performance of context-aware applications.
Agre (2001) contends that context-aware applications may fail annoyingly as soon
as their wrong choices or decisions become significant. This argument stems from
the fact that people use various features of their environment (situations) as resources
for the social construction of entities, such as places, objects, and events.
Accordingly, abstracting from situations to context should be based on a description
that is so multi-dimensionally rich that it includes as many potentially relevant aspects
of a situation as possible, rather than a description that is more or less
pre-determined. In other words, the classes of situations that will influence the
behavior of applications have to be selected from a flexible, dynamic, semantic,
extensible, and evolvable model for what should have an influence on such appli-
cations. It is methodologically relevant to, regardless of technical implementation,
‘ask how we can systematically identify and examine contextual selections, our own
ones as well as those of other people…. Only thus can we be in control of our options
for choosing selections’ (Ulrich 2008, pp. 6–7). However, a computational artifact is
incapable of registering features of a socially constructed environment (Lueg 2002).
An example taken from Lueg (2002, p. 45) is context-aware buildings, where, using
currently available context awareness technology, ‘a room in such a building could
monitor its electronic schedule, the number of persons in the room, and the
prevalence of business clothing among the persons in the room. The room could
compute that the current context is a “business meeting context” and could instruct
attendees’ mobile phones not to disturb the meeting; business-related information
could be projected onto the room’s multipurpose walls. However, being a social
setting in the first place, a meeting does not only depend on the already mentioned
aspects but also on what has been negotiated among participants of the meeting. This
means that even if a particular situation fits the description of a “meeting context”,
the situation may have changed into an informal get together and vice versa. The
subtle changes are hardly recognizable as commonly mentioned context aspects,
such as…location, identity, state of people, groups and computational and physical
objects, may not change at all. In a sense, the context does not change while the
surrounding situation does. Examples for such situational changes are unexpected
breaks or being well ahead of the schedule so that a meeting finishes earlier than
expected. Once the meeting has changed its nature, it may no longer be appropriate
to block calls and it may no longer be appropriate to project business-related
information on walls (as it would demonstrate that the hosting company’s expensive
technology did not recognize the change in the meeting situation).’ Another example
is provided by Robertson (2000) of a business situation that changes while the com-
putational artifacts could not sense the changes recognized by the people involved in
the situation. While many researchers have in recent years contributed related
viewpoints to AmI and HCI more generally, these insights have just started to attract
the attention they deserve in the discussion of AmI applications or context-aware arti-
facts. In all, as Lueg (2002) contends, there remains ‘an explicit distinction between
the concept of context that is operationalized and the original usage situation…as a
social setting that has been negotiated among peers in the first place’, and accord-
ingly, ‘developers of context-aware artifacts should pay considerable attention to the
fact that the context determined by artifacts may differ from what the persons
involved in the situation have negotiated’ (Ibid, p. 43).
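To make the gap between sensed aspects and negotiated situations a little more tangible, the following hypothetical sketch operationalizes the meeting-room example above as a simple rule over measurable attributes. The attributes, thresholds, and labels are assumptions chosen purely for illustration; they are not taken from any cited system.

```python
# Sketch of how the meeting-room example might be operationalized as a rule.
# The attributes and thresholds are illustrative assumptions only.

def infer_room_context(schedule_says_meeting, persons_in_room, business_clothing_ratio):
    """Infer a 'business_meeting' context from measurable aspects of the situation."""
    if schedule_says_meeting and persons_in_room >= 3 and business_clothing_ratio > 0.7:
        return "business_meeting"
    return "unknown"

# The rule fires on the measurable aspects alone...
print(infer_room_context(True, 6, 0.9))   # -> business_meeting

# ...and it keeps firing even after the participants have negotiated an informal
# get-together, because none of the sensed attributes (schedule, head count,
# clothing) has changed. The negotiated change in the situation is invisible to it.
print(infer_room_context(True, 6, 0.9))   # -> business_meeting (still)
```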

3.16 Operationalizing Context: Simplifications, Limitations, and Challenges

In context-aware computing, operationalizing the concept of context entails
defining it so that it can be technically measured (sensed) and computationally
modeled (expressed and represented). This has implications for the effectiveness of
the behavior of context-aware applications due to the incapability to capture the
whole situation relevant to the entity concerned with that behavior.
Operationalizing context entails focusing on some aspects that characterize a sit-
uation of an entity and thus excluding other relevant aspects due to the constraints
of existing technologies—system engineering, design and modeling. Accordingly,
system developers have to pre-determine that some aspects (e.g., location, identity,
activity, event, groups, physical objects, etc.) are significant while other aspects
(e.g., cognitive and emotional states, social relations, culture, bio-physiological
conditions, psychophysiological response, knowledge, experiences, values, etc.) are
less significant. Hence, the currently adopted approach to operationalizing context
has implications for the conceptualization and inference of context and thus the
context-dependent application actions. Depending on the application domain, to
operationalize context, researchers determine which technological features should
characterize specific applications in terms of which constituents of context should
be incorporated, how these should be sensed, modeled, and understood, and then how they should
influence the patterns of service provision, while taking into account what the
available technologies can offer. An example of this approach in the context of an
emotional context-aware system would be that the emotional state of the user as a
context is operationalized based on what the available emotion recognition tech-
nology can allow at the technological and computational level—that is, smart
sensors (e.g., image sensor to capture facial expressions, audio sensor to detect
paralinguistic parameters and emotiveness, and other wearable biosensors attached
to human users to measure psychophysiological data such as heart rate and elec-
troencephalographic response) and the suitable modeling and reasoning techniques
such as probabilistic methods, rule-based methods, ontological approaches, logical
programing, or a combination of these. In other words, operationalizing the emo-
tional state of the user entails only the type of the emotional cues that are mea-
surable, modelable, and processable, using currently available enabling
technologies and processes. Ideally, emotional cues should be implicitly sensed
from multiple sources and then combined and evaluated as information for inferring
the user’s emotional state. This could be a very difficult task to tackle by existing
technologies. There is in fact an inherent complexity associated with computational
modeling of all mental states and situations of life. Currently, most of the inter-
pretation and reasoning processes entail complex inferences based on limited,
vague, conflicting, or uncertain data, notwithstanding the tremendous potential of
machine learning and hybrid techniques.
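As a rough, purely illustrative sketch of what such multi-source operationalization could look like computationally, the fragment below fuses per-modality confidence scores into a single estimate of the user's emotional state by weighted averaging. The cue scores, weights, and emotion labels are invented assumptions; real systems would rely on trained classifiers and the far richer modeling and reasoning techniques mentioned above.

```python
# Highly simplified sketch of fusing multimodal emotional cues into one estimate.
# Cue scores, weights, and emotion labels are illustrative assumptions.

def fuse_emotion_cues(facial, vocal, physiological, weights=(0.4, 0.3, 0.3)):
    """Each argument maps emotion labels to a confidence in [0, 1] from one modality."""
    labels = set(facial) | set(vocal) | set(physiological)
    fused = {}
    for label in labels:
        fused[label] = (weights[0] * facial.get(label, 0.0)
                        + weights[1] * vocal.get(label, 0.0)
                        + weights[2] * physiological.get(label, 0.0))
    best = max(fused, key=fused.get)
    return best, fused

state, scores = fuse_emotion_cues(
    facial={"frustration": 0.6, "neutral": 0.4},         # e.g., from facial expressions
    vocal={"frustration": 0.7, "neutral": 0.3},          # e.g., paralinguistic parameters
    physiological={"frustration": 0.5, "neutral": 0.5},  # e.g., heart rate response
)
print(state, scores)
```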
In general, operationalizing context from an abstract concept—that is, framing it—
has to address which aspects of context, as an aspect of the world, must be included
in a sufficiently detailed context model and how this model can be kept up-to-date
when the context changes. So, the context model should be characterized by semantic expressiveness,
dynamicity, flexibility, extensibility, and evolvability. However, the technological
approach—what is typical for work on context-aware artifacts—is adopted primarily
to serve pragmatic purposes: defining the notion of context in ways that produce
context-aware artifacts that are able to detect, analyze, model, and understand situa-
tions or environments in a way that would allow the artifact to take appropriate actions
based on the intended use of the system in its application domain. The rationale for opting
for such an approach is that it works well and simplifies the system for some appli-
cations. However, it has pitfalls in terms of overlooking other important contextual
aspects. For example, an affective artifact will be incapable of registering the most
basic aspects of emotions: contextual appropriateness of emotions and culturality of
emotions. According to Salovey and Mayer’s (1990) ability emotional intelligence
model, perceiving emotions as the very first step entails identifying emotions and
discriminating between accurate (appropriate) and inaccurate (inappropriate)
expressions of emotion, which is an important ability to understand and analyze
emotional state. Also, cultural aspects are part of the situation or background relevant
to a person; in relation to emotions, cultural variations are great as different cultures
may assign different meanings to different facial expressions, e.g., a smile as a facial
expression can be considered a friendly gesture in one culture while it can signal
embarrassment in another culture. Operationalizing emotional context should take
into account cultural specificities so to enable related context-aware artifacts to be
tailored to or accommodate users variations if they are wanted to be widely accepted
(see Chap. 7 for further discussion). In all, the difference between context-aware
artifacts driven by what is technically feasible and what might be helpful in a con-
textual situation matter, with consideration of social, cultural, emotional, and cog-
nitive aspects that cannot be computationally detected, modeled, and understood by
currently available enabling technologies and processes. One implication is that
context-aware applications will fail in their choices as long as the inferred context
differs from the actual context in which users may find themselves or from the way
they perceive it. Regardless, ‘there is little hope that research on context-aware arti-
facts will succeed in overcoming the problem that context understood as a model of a
situation—is always limited… [F]or context-aware artifacts it may be difficult or
impossible to determine an appropriate set of canonical contextual states. Also, it may
be difficult to determine what information is necessary to infer a contextual state.’
(Lueg 2002, p. 44).
The above approach to operationalizing context relates to what is called the bottom–
up approach to context definition, which is based on the availability of particular
technologies that can sense (and model) some aspects of context, which remain
sufficient to enable the development of a functional context-aware system. As for the
top–down approach to context definition, it entails identifying all the components
that constitute a context and then the system designer can select what is appropriate
to include as sensor technologies along with suitable pattern recognition algorithms
and/or representation and reasoning techniques. This implies that a system designer,
working backward, looks at the nature of the context the application is concerned
with and then attempts to combine relevant sensors with machine learning methods
and modeling approaches based on the analysis of the various context features
associated with the intended use of the application. While this approach is gaining a
growing interest in the field of context-aware computing, owing to the advance of
technologies for the design, development, and implementation of context-aware
applications, there are still some challenges and open issues to address and over-
come when it comes to operationalizing complex contexts, such as physical
activities, cognitive activities, emotional processes, social processes, communica-
tive intents, and so on. It is worth noting that this approach, although rewarding at
practical level, remains far from complete application or concrete implementation.
Indeed, experiences with the development of context-aware artifacts have shown
that the top–down approach and thus the operationalization of the concept of
context is associated with a lot of difficulties. Researchers and software engineers
usually start with comprehensive definitions but end up operationalizing much
simpler concepts of context (see Lueg 2002). Good examples are the definitions
provided by Dey et al. (2001), Schmidt et al. (1999), Gross and Prinz (2000), Kirsh
(2001), and Göker and Myrhaug (2002). While the definitions are rather compre-
hensive and the conceptual models seem rich, involving many aspects that con-
stitute context and qualitative features of context information, the actual
implementation of the definitions in some of these researchers’ context awareness
architectures consists of a number of explicitly defined attributes, such as physical
locations and conditions related to a context, computational and physical objects of
a context, and human members of a context. In all, the bottom–up approach to
context definition and the related operationalization perspectives still dominate
over the top–down one. And it seems that simplifications are necessary
when developing context-aware applications. There is a propensity towards alien-
ating the concept of context from its multifaceted meaning in more theoretical
disciplines in order to serve technical purposes (e.g., Lueg 2002). This pertains
particularly to context-aware artifacts which show human-like understanding and
supporting behavior—human factors related context-aware applications. The sim-
plified ways in which context have been operationalized corroborate the intention of
researchers, designers, and computer scientists to make context awareness projects
happen in reality. AmI ‘applications are very fragile…, designers and researchers
feel this pain…, but they compensate for this by the hard to beat satisfaction of
building this technology [AmI]. The core of their attraction to this lies in ‘I can
make it’, ‘It is possible’ and ‘It works’. It is the technically possible and makeable
that always gets the upper hand’. Who wants to belong to the nondesigners?’
(Crutzen 2005, p. 227).
Arguably, the simplifications observed when operationalizing context are not so
much a matter of choice for researchers as it is about the constraints of computing
as to the design and modeling of human context, which is infinitely rich, constantly
changing, intrinsically unpredictable, and inherently dynamic and multidimensional
and thus intractable. There will always be a difference between human context in its
original complex definition and its operationalization—the context information that
is sensed and the context model that is implemented, irrespective of the advance-
ment of sensor technology (e.g., MEMS, NEMS) and pattern recognition
algorithms/machine learning techniques (e.g., handling uncertainty and vagueness of
context information), and, more recently, the merger of different representation and
reasoning techniques (e.g., ontological and logical approaches with rule-based and
probabilistic methods). This is an attempt to overcome potential problems associ-
ated with the operationalization of context in terms of computational formalism as
to representation and reasoning, e.g., reconciling probabilistic reasoning with rea-
soning in languages that do not support uncertainty of context information, such as
ontology languages (concrete examples of context awareness architectures or pro-
jects that have applied the hybrid approach to context modeling and reasoning are
provided in Chap. 5). In fact, resolving the trade-off between expressiveness and
complexity as well as uncertainty and vagueness in context modeling, coupled with
the miniaturization of capture technology (sensors) and what this entails in terms of
efficiency improvement as to such features as computational speed, bandwidth,
memory, high-performance communication networks, energy efficiency, and so on,
holds promising potential for achieving and deploying the AmI paradigm.
Simplifications associated with operationalizing context in relation to ontologi-
cal modeling of context—conceptualization of context and encoding related key
concepts and the relationships among them, using the commonly shared terms in
the context domain—are explained by what the term ‘ontology’ means in computer
science. While this term is inspired by a philosophical perspective (ontology is the
branch of philosophy that is concerned with articulating the nature and structure of
the lifeworld), in computing it signifies a set of concepts and their definitions and
interrelationships intended to describe the world, which depends on the ease with
which real-world concepts (e.g., context, interaction, user behavior) can be captured
by software engineers and the computational capabilities provided by existing
ontologies, such as the expressive power of models. The challenge facing computer
scientists in general, and AmI design engineers in particular, in the field of, and
research on, context-aware computing, is to computationally capture what constitutes
context as a phenomenon in real life, which is conceived in a multidimensional
way, identifying historical, social, cultural, ethical, psychological, behavioral,
physical, and normative aspects. When speaking of a phenomenon that is of interest
in the ‘world’, the term Universe of Discourse (UoD) (e.g., context) is used; it is
well established within conceptual modeling (e.g., Sølvberg and Kung 1993). In
addition, in terms of the frame problem (e.g., Pylyshyn 1987), which is one of the
most difficult problems in classical representation-based AI (and continues to be in
AmI), it entails what aspects or features of the world (e.g., human context) must be
included in a sufficiently detailed world model (e.g., ontological context model) and
how this model can be kept up-to-date when the world changes (e.g., context
changes with and is modulated through interactions or is an expression of certain
interpretation, and ongoing re-interpretation, of situations in which interactions take
place). Indeed, the frame problem has proven to be intractable in the general case
(e.g., Dreyfus 2001), and aspects of the world are constantly changing, intrinsically
unpredictable, and infinitely rich (Pfeifer and Rademakers 1991). However, while
aspects of the world become context through the way system developers or
ontology modelers use them in interpretation and not because of their inherent
properties, context models, in particular those represented through ontological
formalism are to be evaluated based on their comprehensiveness, expressiveness,
dynamicity, fidelity with real-world phenomena, accuracy, internal consistency,
robustness, and coherence, to name a few criteria. Regardless, what seems certain
is that there is no guarantee that research on context-aware systems will
succeed in surmounting the issue that context models as implemented are always
limited (see Chap. 5 for further discussion). In other words, context-aware appli-
cations will never be able to conceive of context—contextual assumptions and
selections—as generated through social processes and interactions and by which
our ways of understanding and representing the world are constructed and main-
tained. Rather, they will always conceptualize context as oversimplified models of
situations, a difference that will always matter in the interaction between humans
and technology, especially human factors related AmI applications such as cogni-
tive and emotional context-aware, affective, and conversational artifacts, as it may
be impossible to determine an appropriate set of canonical contextual states and
how they dynamically interrelate and evolve in interactions. Indeed, most of the
issues relating to simplifications when operationalizing context concern the human
factors related aspects of context, namely psychological, behavioral, social, and
cultural dimensions. To advance measuring contextual features and representing
them in a formal and computational format requires breakthrough in enabling
technologies and processes as well as engineering and computer science theories.
Although ‘technological progress has been made to create more semantic models,
only little research has been performed on how models can be designed by com-
munities… and even fewer by “ordinary users”. Almost all models are techno-
logical driven and expect that, in one or another way, it is possible to model a
domain, or even worse, the world. A domain or world model is mostly based on
user groups but context-aware applications are very dedicated for every user.
Because of that they are more difficult or even not possible to define them.
Moreover, mostly little attention is paid on what model users really need. In
addition, models change over time because with every action context changes as in
real life. As a result most current models can only be used in protected research
environments but not in real-world situations. Although ontologies allow a more
semantic description of an environment and many authors claim to be able to model
emotions and feelings of persons but forget that the meaning of context differs for
every person’ (Criel and Claeys 2008, p. 67). Emotional and cognitive states are
subjective and it will never be easy to model them.
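To give a small, concrete flavor of what an ontological context model amounts to in practice, the fragment below encodes a toy set of concepts and relationships (user, location, activity) as RDF triples with the Python rdflib library. The namespace and vocabulary are invented for the example, and the model deliberately captures none of the dynamicity, uncertainty, or subjective meaning discussed above, which is precisely the point.

```python
# Toy ontological context model: a few concepts and relationships as RDF triples.
# The namespace and vocabulary are invented assumptions; a usable model would need
# far more expressiveness, and would still face the limitations discussed above.
from rdflib import Graph, Namespace, Literal, RDF, RDFS

CTX = Namespace("http://example.org/context#")   # hypothetical namespace
g = Graph()
g.bind("ctx", CTX)

# Concepts (classes)
for cls in (CTX.User, CTX.Location, CTX.Activity):
    g.add((cls, RDF.type, RDFS.Class))

# Relationships (properties) and one instance-level description
g.add((CTX.isLocatedIn, RDF.type, RDF.Property))
g.add((CTX.performs, RDF.type, RDF.Property))
g.add((CTX.alice, RDF.type, CTX.User))
g.add((CTX.meetingRoom1, RDF.type, CTX.Location))
g.add((CTX.alice, CTX.isLocatedIn, CTX.meetingRoom1))
g.add((CTX.alice, CTX.performs, Literal("attending a meeting")))

print(g.serialize(format="turtle"))
```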
Indeed, the recent attempts undertaken to include (sense and model) emotional
and cognitive aspects of context in the development of context-aware applications
are far from real-world implementation. In particular, in terms of emotion recogni-
tion, the few practical attempts eventually still do not go beyond the basic recog-
nition step—perceiving emotions using facial expressions, voice, or biometric data,
instead of combining such data sources when implicitly sensing, analyzing, under-
standing, and deriving the emotional state of the user so that the computer system
can respond appropriately to the related needs of the user. Consequently,
using fragmented sources to implicitly capture the user's emotional state certainly
has implications for the performance of emotional context-aware (and affective)
applications in their operating environment. In relation to the issue of simplifications
of the emotional context model, most of the behavioral methods simply classify emo-
tions into opposing pairs or focus only on simple emotion recognition (Teixeira et al.
2008; Ptaszynski et al. 2009), ignoring the complexity and the context reliance
of emotions (see Chap. 8 for a detailed discussion). However, simplifications per-
taining to modeling complex contextual features seem to be of inconsequential
concern for many researchers as to the functioning of context-aware applications.
The real concern is that these should not fail annoyingly when the system’s wrong
choices become significant because of inefficient context measurement and thus
inference, as the fact remains that most of the reasoning mechanisms or processes
suggested for context-aware applications entail extremely complex inferences based
on limited and imperfect data. The difficulty of handling emotional and cognitive
context at the operational level lies in the insurmountable complexity inherent in dealing
with such issues as fuzziness, uncertainty, vagueness, and incompleteness of con-
textual information at the measurement, representation, and reasoning levels. It is because
‘contexts may be associated with a certain level of uncertainty, depending on both
the accuracy of the sensed information and precision of the deduction process’ that
‘we cannot directly sense the higher level contexts’ (Bettini et al. 2010).
A rudimentary example is the difficulty of modeling the feeling of 'having cold'; 'we
will probably never be able to model such entities' (Criel and Claeys 2008). Moreover,
the physical world itself and our measurements of it are prone to uncertainty;
capturing imprecise, incomplete, vague, and sometimes conflicting data about the
physical world seems to be inevitable. Besides, not all modeling approaches (rep-
resentation and reasoning techniques) in context-aware computing support fuzziness
and uncertainty of context information. For example, the ontological approach to con-
text modeling does not adequately address the issue of representing, reasoning about
and overcoming uncertainty in context information (see, e.g., Bettini et al. 2010;
Perttunen et al. 2009). To address this problem, methods such as probabilistic logic,
fuzzy logic, Hidden Markov Models (HMM) and Bayesian networks (see next
chapter) are adopted in certain models to deal with uncertainty issues for they offer a
deeper support for modeling and reasoning about uncertainty. For example,
Bayesian networks are known to be well suited for combining uncertain information
from a large number of physical sensors and inferring higher level contexts.
However, probabilistic methods, according to Chen and Nugent (2009), suffer from
a number of shortcomings, such as ad-hoc static models, inflexibility (i.e., each
context model needs to be computationally learned), data scalability, scarcity,
reusability (i.e., one user’s context model may be different from others), and so on.
Nevertheless, hybrid methods have been proposed and recently applied in a number
of projects of context-aware computing to overcome the limitations of different
modeling methods. This is making it increasingly easier for developers to build new
applications and services in AmI environments and to reuse various ways of han-
dling uncertainty. Particularly, reasoning on uncertainty aims to improve the quality
of context information by typically taking ‘the form of multi-sensor fusion where
data from different sensors are used to increase confidence, resolution or any other
context quality metrics’, as well as inferring new types of context information by
typically taking the form of deducing higher level contexts from lower level con-
texts, such as the emotional state and activity of a user (Bettini et al. 2010). This will
have great impact on how context can be modeled and reasoned about, and thus
operationalized, in a qualitative way. Indeed, operationalizations of context in
context-aware artifacts have an impact on how context is conceptualized.
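As a hedged, back-of-the-envelope illustration of the kind of probabilistic reasoning referred to above, the sketch below combines uncertain evidence from two sensors in a naive Bayes fashion to estimate the probability of a higher-level context (here, 'the user is stressed'). The prior, the sensor likelihoods, and the context label are fabricated numbers for the example, not values from any real system or study.

```python
# Naive Bayes-style sketch: combining uncertain evidence from two sensors to
# infer a higher-level context. All probabilities are illustrative assumptions.

def posterior_stressed(prior, likelihoods, observations):
    """P(stressed | observations), assuming conditional independence of sensors."""
    p_obs_given_s, p_obs_given_not_s = 1.0, 1.0
    for sensor, observed in observations.items():
        l_s, l_not_s = likelihoods[sensor]   # P(obs | stressed), P(obs | not stressed)
        p_obs_given_s *= l_s if observed else (1 - l_s)
        p_obs_given_not_s *= l_not_s if observed else (1 - l_not_s)
    evidence = prior * p_obs_given_s + (1 - prior) * p_obs_given_not_s
    return prior * p_obs_given_s / evidence

likelihoods = {
    "elevated_heart_rate": (0.8, 0.2),   # assumed sensor characteristics
    "tense_voice": (0.7, 0.3),
}
observations = {"elevated_heart_rate": True, "tense_voice": True}
print(posterior_stressed(prior=0.1, likelihoods=likelihoods, observations=observations))
# -> roughly 0.51: two noisy cues together raise, but do not settle, the inference
```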

3.17 Evaluation of Context-Aware Artifacts

3.17.1 Constructs, Methods, Models, and Instantiations

It is as important to evaluate a context-aware artifact as it is to build it, i.e., to
evaluate its underlying components: representational constructs, models, methods, and instantiations as
types of outputs produced by design research. According to March and Smith
(1995, pp. 256–258), ‘Constructs…form the vocabulary of a domain. They con-
stitute a conceptualization used to describe problems within the domain and to
specify their solutions. They form the specialized language and shared knowledge
of a discipline or sub-discipline. Such constructs may be highly formalized as in
semantic data modeling formalisms…or informal… A model is a set of propositions
or statements expressing relationships among constructs. In design activities,
models represent situations as problem and solution statements…, a representation
of how things are… A method is a set of steps (an algorithm or guideline) used to
perform a task. Methods are based on a set of underlying constructs (language) and
a representation (model) of the solution space… Although they may not be
explicitly articulated, representations of tasks and results are intrinsic to methods.
Methods can be tied to particular models in that the steps take parts of the model as
input… An instantiation is the realization of an artifact in its environment…both
specific information systems and tools that address various aspect of designing
information systems. Instantiations operationalize constructs, models, and meth-
ods… Instantiations demonstrate the feasibility and effectiveness of the models and
methods they contain.’ (See Chaps. 4 and 5 for examples of constructs, models,
methods, and instantiations). Further, build and evaluate are two basic activities that
constitute design science: building entails the process of constructing an artifact for
particular purposes, demonstrating that such an artifact can be constructed, and
evaluation refers to the process of determining how well the artifact performs
through developing criteria and assessing the performance of the artifact against
those criteria (Ibid). A context-aware artifact is built to perform a specific task:
sense and analyze the user’s context and behave intelligently according to that
context by providing relevant services. Demonstrating feasibility, the construction
of a context-aware artifact entails building constructs, models, methods and
instantiations; each is a technology (capture approaches, machine learning algo-
rithms, ontological modeling, actuators, query language, etc.) that, once built, must
be evaluated scientifically. The general purpose of evaluating a computational
artifact is to determine if any progress has been made. The basic question is, how
well does such an artifact perform in its operating environment? Commonly, to
evaluate an artifact requires the development of performance metrics and the
assessment of this artifact according to those metrics, which define what the
designer attempts to accomplish. Accordingly, evaluation must occur prior to
the deployment and implementation of the artifact in a real-world environment. The
evaluation of the context-aware artifacts that are to be instantiated involves issues
pertaining particularly to the efficiency and effectiveness of these artifacts and their
impacts on the user’s interaction experience in terms of providing intuitiveness,


smoothness, flexibility, satisfaction, visual thinking and so on. Therefore, among
the aspects to look at when evaluating context-aware artifacts are: their
usability, usefulness and accessibility; their reliability given that they need to take
action proactively; their performance given that they need to behave timely; the
relevance and suitability of their services (to be delivered to varied users in different
settings); as well as principles and methodologies for engineering context
awareness; methods for measuring, modeling, querying and making sense of con-
text information for such artifacts, with possible interaction of data analysis tech-
niques and ontologies; learning and reasoning mechanisms and so on. Research in
the evaluation activity within context-aware computing is to develop metrics and
compare the performance of different constructs, models, methods and instantia-
tions for context-aware artifacts. Essentially, these metrics define what the
researchers seek to achieve with context-aware artifacts in specific and
general-purpose applications, namely to effectively and efficiently perform in the
environment for which these artifacts are designed. By and large, the evaluation
aims to determine how well these computational artifacts recognize the context in
which they are being used and how well they modify their functionality according to that
context or intelligently adapt their behavior to features of the user's context. These per-
formance aspects are determined by the functioning and performance of the com-
ponents constituting the artifact as a whole, including constructs, models, methods,
and instantiations—in other words, techniques and models of the context infor-
mation, structure and run-time behavior of such artifacts. Further, different metrics
can be developed to assess the different components embodied in context-aware
systems. According to March and Smith (1995, p. 261), ‘Evaluation of constructs
tends to involve completeness, simplicity, elegance, understandability, and ease of
use. Data modeling formalisms [e.g., (onto)logical approach], for example, are
constructs with which to represent the logical structure of data…Models [e.g.,
context] are evaluated in terms of their fidelity with real-world phenomena, com-
pleteness, level of detail, robustness, and internal consistency…Often existing
models are extended to capture more of the relevant aspects of the task…Evaluation
of methods considers operationality (the ability to perform the intended task or the
ability of humans to effectively use the method if it is not algorithmic), efficiency,
generality, and ease of use…These [methods] can be evaluated for completeness,
consistency, ease of use, and the quality of results obtained by analysts applying the
method. Evaluation of instantiations considers the efficiency and effectiveness of
the artifact and its impacts on the environment and its users. A difficulty with
evaluating instantiations is separating the instantiation from the constructs, models,
and methods embodied in it…’ Of particular importance is to devise standardized
evaluation methods that enable the collection of precise information on various properties
and components of context-aware artifacts of different types and on the manner by
which these components interact in artificial systems, as well as on their appro-
priateness for different application domains. Examples of components include, and
are not limited to: conceptual context models, representation formalism, reasoning
mechanism, recognition pattern algorithms, and user-centered design methods.
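The following is a minimal sketch, not taken from the text, of how the March and Smith (1995) criteria cited above could be organized as evaluation metrics for the components of a context-aware artifact. All identifiers (ARTIFACT_CRITERIA, ComponentEvaluation, the example component name) are hypothetical illustrations.

```python
# A sketch of per-component evaluation metrics, paraphrasing March and Smith (1995).
from dataclasses import dataclass, field
from typing import Dict, List

# Evaluation criteria per component category (constructs, models, methods, instantiations).
ARTIFACT_CRITERIA: Dict[str, List[str]] = {
    "construct": ["completeness", "simplicity", "elegance", "understandability", "ease_of_use"],
    "model": ["fidelity", "completeness", "level_of_detail", "robustness", "internal_consistency"],
    "method": ["operationality", "efficiency", "generality", "ease_of_use"],
    "instantiation": ["efficiency", "effectiveness", "impact_on_environment", "impact_on_users"],
}

@dataclass
class ComponentEvaluation:
    """Scores (e.g., 0.0-1.0) for one component of a context-aware artifact."""
    name: str        # e.g., a hypothetical "location-activity context model"
    category: str    # one of the keys in ARTIFACT_CRITERIA
    scores: Dict[str, float] = field(default_factory=dict)

    def record(self, criterion: str, value: float) -> None:
        # Reject criteria that do not belong to this component category.
        if criterion not in ARTIFACT_CRITERIA[self.category]:
            raise ValueError(f"{criterion!r} is not a criterion for {self.category!r}")
        self.scores[criterion] = value

# Usage sketch: evaluating the context model embodied in a hypothetical artifact.
context_model = ComponentEvaluation(name="location-activity context model", category="model")
context_model.record("fidelity", 0.7)
context_model.record("internal_consistency", 0.9)
print(context_model)
```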
Taking an emotional context-aware application as an example of instantiations,
while this artifact embodies certain constructs, models and methods, the application
developer must select from among a wide array of available constructs, models, and
methods, and decide whether and how to combine these components for optimal
performance, especially in relation to delivering complex emotional services (see
Chap. 8 for application examples). Indeed, there exist several theoretical models of
emotions, including dimensional (Lang 1979), categorical (Ekman 1984), and appraisal (Scherer 1999), as there are several data modeling formalisms, constructs with which to represent the structure of context information, such as the Web Ontology Language (OWL), the Resource Description Framework (RDF), the Context Modeling Language (CML), and Object-Role Modeling (ORM). It is the appropriate choice of these components, coupled with the advantages and the opportunities their combination might offer, that differentiates emotional context-aware artifacts. Evaluations focus
on these differences and how they change the task of system development and
enhance the system performance.
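As a purely illustrative sketch of the design space the application developer faces here, one theoretical model of emotions can be paired with one context representation formalism, and each pairing is what an evaluation would then score. The emotion models and formalisms are those named above; the pairing logic and the scoring function are hypothetical placeholders.

```python
# Enumerating candidate combinations of emotion model and context formalism.
from itertools import product

EMOTION_MODELS = ["dimensional (Lang 1979)", "categorical (Ekman 1984)", "appraisal (Scherer 1999)"]
CONTEXT_FORMALISMS = ["OWL", "RDF", "CML", "ORM"]

def evaluate_combination(emotion_model: str, formalism: str) -> float:
    """Placeholder for an empirical evaluation of one model/formalism pairing
    against the metrics defined for the artifact (e.g., recognition accuracy,
    reasoning latency, ease of extending the context model)."""
    return 0.0  # a real study would measure this

candidates = [
    (model, formalism, evaluate_combination(model, formalism))
    for model, formalism in product(EMOTION_MODELS, CONTEXT_FORMALISMS)
]
best = max(candidates, key=lambda c: c[2])
print(f"Best-scoring combination under the chosen metrics: {best[0]} + {best[1]}")
```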
After the development of relevant metrics, empirical endeavor comes into play
as a necessary phase to perform the evaluation, i.e., gauging context-aware artifacts
and their underlying components against the identified metrics. Crucially, ‘con-
structs, models, methods, and instantiations must be exercised within their envi-
ronments…Often multiple constructs, models, methods, or instantiations are
studied and compared. Issues that must be addressed include comparability, subject
selection, training, time, and tasks.’ (March and Smith 1995, p. 261). When it
comes to the execution of the respective components in their environment, it is
important to remember that lab conditions and real-world environment may be two
completely different settings as to testing the functioning and performance of these
components. Commonly, constructs, methods, models, and instantiations are developed for laboratory environments since they are intended for, or serve, research ends prior to any deployment in a real-world environment. This is applicable
to research within context-aware computing. As an example of a database developed for a laboratory environment in relation to emotion recognition, the ‘Cohn-Kanade-Facial-Expression database (CKFE-DB)’ (Kanade et al. 2000), which is used in building systems for facial expression recognition that can be employed in context-aware as well as affective systems, contains facial expressions that volunteers were asked to act out, rather than natural expressions. Also, the 488 image sequences of 97 different persons performing the six universal facial expressions that this database contains were taken in a laboratory environment with predefined illumination conditions, a solid background, and frontal face views. Consequently, algorithms that perform well with these image sequences are not immediately appropriate for real-world scenes. To build a real-time system for facial expression recognition
that is embodied in a context-aware system that robustly runs in real-world envi-
ronments, it is necessary to evaluate, in this case, how an algorithm and a database, as components, perform in comparison with competing ones in a real-life setting—that is, outside lab conditions. Typically, algorithms and models are written or represented in different formalism languages and sometimes combine more than one, use different techniques, run on different computers, and are employed in different
application domains, e.g., context-aware systems, conversational agents, affective
systems. This may well motivate the development of metrics that can provide
objective measures of the relative performance of algorithms and databases independently of their implementations. Therefore, the evaluation of algorithms and
models designed for a specific task or situation and their comparison with one
another may yield useful information from which to judge their performance. Their
evaluation should contribute to their improvement towards robustness for both
real-world applicability and high performance. Indeed, at this stage of research
within context-aware computing it is critical to focus on conducting evaluation
activities on or assessing the performance of the various components underlying
context-aware systems, especially constructs and models, as they currently pose a
plethora of issues relating to real-world implementation given the complexity
surrounding the performance of context-aware artifacts. Early evaluation endeavors
can be of great import and rewarding down the road, as they could lead to creating
systems that can perform well in their real-world environments. Preferably grounded in empirical endeavor, additional research work is needed to determine what actually works in practice when it comes to the complex context-aware applications that are inspired by human functioning.
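The following is a minimal sketch, under assumed inputs, of the kind of comparison argued for above: the same recognition algorithm is scored on a laboratory corpus (acted expressions, controlled illumination) and on an in-the-wild corpus, so that the metric captures the drop in performance outside lab conditions. The datasets, the classifier, and the labels are all hypothetical stand-ins rather than real resources.

```python
# Comparing one algorithm's performance on lab-collected vs. real-world samples.
from typing import Callable, List, Tuple

Sample = Tuple[str, str]  # (image_id, true_expression_label)

def accuracy(classify: Callable[[str], str], samples: List[Sample]) -> float:
    """Fraction of samples for which the classifier predicts the true label."""
    correct = sum(1 for image_id, label in samples if classify(image_id) == label)
    return correct / len(samples) if samples else 0.0

def dummy_classifier(image_id: str) -> str:
    return "neutral"  # placeholder for a real facial expression recognizer

# Hypothetical evaluation corpora.
lab_corpus: List[Sample] = [("lab_001", "happiness"), ("lab_002", "neutral")]
field_corpus: List[Sample] = [("wild_001", "surprise"), ("wild_002", "anger")]

lab_acc = accuracy(dummy_classifier, lab_corpus)
field_acc = accuracy(dummy_classifier, field_corpus)
print(f"lab accuracy: {lab_acc:.2f}, in-the-wild accuracy: {field_acc:.2f}")
print(f"robustness gap: {lab_acc - field_acc:.2f}")
```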
Research within context-aware computing seeks to instantiate context-aware
artifacts as well as the tools that address various aspects of system design and
modeling. Ideally, instantiations of context-aware artifacts are to be executed in
a real-world environment, as it is precisely in such a setting that users would use such
artifacts. Their implementation can serve to gather more insights from diverse usage
scenarios to overcome potential shortcomings and improve future systems.
In particular, many things that are technically feasible within the lab may not work in the real world, adding to the fact that, in the field of context-aware computing, not enough data about the usage of context-aware artifacts is available. At this stage of
research, most evaluative information seems to come from laboratory studies where
users can be given predefined technologies, tasks, and instructions. Often such
information is of objective and subjective nature, e.g., quality of a context-aware
artifact and how useful it is, respectively. Field trials of context-aware systems can
rather provide evaluative information about usage patterns and subjective reactions
to technology by different classes of users. Whether in lab or field trials, multiple
instantiations of context-aware artifacts should be carried out, analyzed and com-
pared, and it is important to ensure that the metrics by which to compare perfor-
mance are as all-encompassing as possible to achieve better outcomes.
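As a minimal sketch of this point, several instantiations evaluated in lab or field trials can be compared against one shared, as-all-encompassing-as-possible metric set that mixes objective measures (e.g., timeliness) with subjective ones (e.g., reported usefulness). The instantiation names, metric values, and aggregation are invented for illustration.

```python
# Comparing multiple instantiations against a shared metric set.
SHARED_METRICS = ["usability", "usefulness", "reliability", "timeliness", "service_relevance"]

# Hypothetical trial results (0.0-1.0 per metric).
trial_results = {
    "instantiation_A (lab trial)": {"usability": 0.8, "usefulness": 0.6, "reliability": 0.9,
                                    "timeliness": 0.7, "service_relevance": 0.5},
    "instantiation_B (field trial)": {"usability": 0.6, "usefulness": 0.8, "reliability": 0.7,
                                      "timeliness": 0.6, "service_relevance": 0.8},
}

def mean_score(scores: dict) -> float:
    """Unweighted mean over the shared metric set."""
    return sum(scores[m] for m in SHARED_METRICS) / len(SHARED_METRICS)

for name, scores in sorted(trial_results.items(), key=lambda kv: mean_score(kv[1]), reverse=True):
    print(f"{name}: mean={mean_score(scores):.2f}")
```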
Furthermore, the instantiation of a context-aware artifact in its environment
demonstrates the effectiveness of the embodied constructs, models and methods.
That said, instantiations of context-aware systems may lead to new ways of thinking about the design and modeling of context-aware systems, in addition to improved
instantiations. Newell and Simon (1972, cited in March and Smith 1995) highlight
the significance of instantiations in computer science, describing it as ‘an empirical
discipline’, and stating that each new system ‘that is built is an experiment. It poses
a question to nature, and its behavior offers clues to the answer.’ Testing the
effectiveness of the performance of context-aware systems in real environment is
critical because it provides useful information on how the system and its underlying
components work in real-world situations. March and Smith (1995, p. 260) state:
‘In much of the computer science literature it is realized that constructs, models, and
methods that work “on paper” will not necessarily work in real-world contexts.
Consequently, instantiations provide the real proof. This is evident, for example, in
AI where achieving “intelligent behavior” is a research objective. Exercising
instantiations that purport to behave intelligently is the primary means of identi-
fying deficiencies in the constructs, models, and methods underlying the instanti-
ation.’ Evaluation is valuable for both designers and implementers of context-aware
systems for assessing whether the systems being designed are effective in terms of
meeting the expectations of the users when implemented in real-world situations.

3.17.2 Evaluation Challenges

In the field of computer science, there exists an array of constructs, models, and
methods that are robust, effective, and of high performance, but designed initially
for specific applications within AI, HCI, and more recently AmI. This implies that
these components may not always function as expected when used in general
applications—for purposes other than those for which they were originally
developed. Indeed, as pointed out above, methods and models for context recog-
nition differ in terms of handling data abundance, uncertainty of context informa-
tion, uncertainty on reasoning, multi-sensor fusion, scalability, dynamicity, and
management of information flow. There is a wide variety of constructs, models, and
methods with significant differences in application. For example, behavioral
methods for emotion recognition and theoretical models of emotions (see Chap. 8
for more detail) can be applied to many different systems with performances
varying over the domain of application within context-aware computing, affective
computing, and conversational agents. Another example of constructs is the Web Ontology Language (OWL), a de facto standard in context-aware computing which is
currently being used for conceptual modeling—to implement context models.
Considering the fact that it was originally designed for computational efficiency in
reasoning, OWL as a modeling language falls short of offering suitable abstractions
for constructing conceptual models, as defended extensively in (Guizzardi et al.
2002; Guizzardi 2005). Today’s W3C Semantic Web standard, meanwhile, suggests a specific formalism for encoding ontologies, with several variants as to expressive
power (McGuinness and van Harmelen 2004). More examples of constructs,
models, and methods and the differences in relation to their use in the field of
context-aware computing are covered in subsequent chapters. However, the main
argument is that evaluation becomes ‘complicated by the fact that performance is
related to intended use, and the intended use of an artifact can cover a range of
tasks… Not only must an artifact be evaluated, but the evaluation criteria them-
selves must be determined for the artifact in a particular environment. Progress is
achieved in design science when existing technologies are replaced by more
effective ones.’ (March and Smith 1995, p. 254).
To achieve effective outcomes for evaluating context-aware applications, stan-
dardization remains the way forward. Standardization provides a significant thrust
for further progress because it codifies best practices, enables and encourages reuse,
and facilitates interworking between complementary tools (Obrenovic and
Starcevic 2004). The lack of standardization of evaluation methods is more likely to
cause issues of inconsistency in assessing the performance of context-aware artifacts. However, since the research on context awareness has not matured yet, it might take quite a long time before standard evaluation solutions materialize. In fact,
what matters most in the technology market is to understand how and what it takes
to make the user accept technologies, rather than to design what users would like to
see or how they aspire to experience new technologies. Since context-aware
applications are not immune to marketability, fast, insubstantial evaluation is pre-
ferred in ICT and HCI design to get the products and services as quickly as possible
to the market (see Tähti and Niemelä 2005).
Regardless, in the province of context-aware computing, it is as important to
scrutinize evaluation methods for assessing the different components underlying
context-aware artifacts as it is to evaluate these artifacts and their instantiations. This is
due to the complexity inherent in the design, development, implementation, and
assessment of context-aware applications. As noted by Tarjan (1987), metrics must
also be scrutinized by experimental analysis. In meta-evaluation, that is, the evaluation of evaluations, metrics define what the evaluation research tries to accomplish with
regard to assessing the evaluation methods designed for evaluating the performance
of context-aware applications. Periodic scrutiny of these metrics remains necessary
to enhance such methods as the research evolves in the field of context-aware
computing. In the field of computer science, varied evaluation methods can be
studied and compared. One underlying assumption of assessing existing evaluation
methods is to determine how well a given method works in relation to a particular
application domain compared to other methods that are in use. Context-aware
applications differ in terms of complexity, scale, architectural design, the class of
context, the multiplicity and diversity of context constituents of an entity, the kinds
of delivered services (e.g., personalized, adaptive, responsive, proactive, etc.), and
so on. Furthermore, meta-evaluation may look into such features as dynamicity,
completeness, operationality (the ability to perform the intended evaluation task or
the ability of the evaluator to effectively use or apply the method), simplicity,
ease-of-use (less formalized), generality, consistency, and the quality of
meta-evaluation results obtained by the analyst applying the method. For example,
a dynamic-oriented evaluation method could enable a direct and efficient interaction
between the evaluator and the user or emphasize less interference of the evaluator in
the assessment process. It can also consider both dynamic and static aspects at the
same time during the evaluation of the instantiation of the artifact, depending on
which aspects to evaluate. Indeed, when evaluating evaluation methods for
context-aware artifacts, it is of import to look at the patterns of interaction between
the evaluator and the user in ways that do not affect the use of technology in its
Evaluation methods should be built upon a broad under-
standing of the real-world environment where the system is being used as well as a
multi-stakeholder perspective—e.g., user, designer, implementer, and assessor—in
addition to the evaluator’s knowledge of evaluation activities. Also, an easy-to-use evaluation method usually provides more flexibility in terms of changing evaluation
requirements during the evaluation process and considering contingencies or situ-
ational emergences, thereby enabling evaluators to respond dynamically to various
contextual or situational variables. Contrariwise, evaluation methods should not
require a slavish adherence to their application. Otherwise, they will blind the eval-
uator to critical issues relating to the performance of the artifact in its operating
environment.
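As a minimal sketch of the meta-evaluation idea described above, candidate evaluation methods can themselves be scored against meta-criteria such as dynamicity, completeness, operationality, simplicity, ease of use, generality, and consistency. The method names, scores, and weights below are hypothetical; the weights would in practice reflect the application domain at hand.

```python
# Scoring evaluation methods against meta-evaluation criteria.
META_CRITERIA_WEIGHTS = {
    "dynamicity": 0.2, "completeness": 0.2, "operationality": 0.2,
    "simplicity": 0.1, "ease_of_use": 0.1, "generality": 0.1, "consistency": 0.1,
}

# Hypothetical candidate evaluation methods and their meta-criterion scores (0.0-1.0).
candidate_methods = {
    "lab usability study": {"dynamicity": 0.3, "completeness": 0.6, "operationality": 0.8,
                            "simplicity": 0.9, "ease_of_use": 0.9, "generality": 0.5, "consistency": 0.8},
    "field trial with self-report": {"dynamicity": 0.8, "completeness": 0.7, "operationality": 0.6,
                                     "simplicity": 0.5, "ease_of_use": 0.6, "generality": 0.7, "consistency": 0.6},
}

def weighted_score(scores: dict) -> float:
    """Weighted sum over the meta-criteria (weights sum to 1.0)."""
    return sum(META_CRITERIA_WEIGHTS[c] * scores[c] for c in META_CRITERIA_WEIGHTS)

for method, scores in candidate_methods.items():
    print(f"{method}: {weighted_score(scores):.2f}")
```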
In relation to the evaluation of instantiations in real-world environments, with the
aim to address some issues relating to the evaluation of emotions in AmI, Tähti and
Niemelä (2005) develop a method for evaluating emotions called Expressing
Emotions and Experience (3E), which is a self-report method that allows both
pictorial and verbal reporting, combining verbal and nonverbal user feedback of
feelings and experience in a usage situation. It is validated by comparing it to two
emotion assessment methods, SAM and Emocards, which are self-report instruments
using pictograms for nonverbal assessment of emotions. The development of 3E is
described in detail in (Tähti and Arhippainen 2004). This method is a way to collect
rich data on users’ feelings and related context—mental, physical, and social—while using an application or service without placing too much burden on the user. It moreover enables researchers to gauge users’ emotions by allowing users to depict or express their emotions and experiences by drawing and writing, thereby providing information about their feelings and the motivations behind them in the way they prefer, and
without the concurrent intervention of the researcher. The authors claim that it
applies well to AmI use situations that occur in real-world environments, does not
necessarily require the researcher’s presence, and, as a projective method, may
facilitate expression of negative emotions towards the evaluated system.

3.18 Design of Context-Aware Applications and User Participation

3.18.1 Major Phase Shifts and Design Methods

Over the years several significant changes have emerged in computing (or ICT) and
its application in different Human Activity Systems (HAS). These changes have led
to a new wave of design methods embracing new dimensions to deal with funda-
mental issues in ICT design and development. Examples of the most common,
major phase shifts include: from HCI to MetaMan (MM); from Human Computer
Communication (HCC) via Computer to Computer Communication (CCC) to
Thing to Thing Communication (TTC); from virtual reality (VR) to hybrid reality
(HR); from Informing Systems (IS) to co-Creating Systems (CS); from require-
ments specification to co-design; from Technology driven (Td) to Demands driven
(Dd) development; from expert methods (EM) via Participatory Methods (PM) to
Stakeholder Methods (SM); and so forth. In terms of participative design, various
methods have been proposed and applied to ICT design and development. Among these are user-centered design and participatory design, dominant design philosophies that emphasize user-centrality and participation. They are usually utilized in HCI design (yet not restricted to interactive technologies) in striving to create useful user interfaces that respond to different classes of users and satisfy their needs.
They continue to be used to create functional, useful, usable, intelligent, emo-
tionally appealing, and aesthetically pleasant interactive systems, including AmI
applications. They both involve a variety of methods that emphasize user-centrality
and participation in different forms and formats.

3.18.2 The Notion of Participation

Participation is a highly contested and cryptic concept. It is associated with various philosophical underpinnings and multifarious interpretations, i.e., it can be approached
from different perspectives and refer to a wide variety of different situations by
different people. Hence, it is still under discussion. The origin of the concept as
associated with power relation issues in society is fading away under the diversity
of meanings adopted in different contexts. In reference to the importance of power
as a central entity that must be linked to participation, Servaes writes: ‘this “real”
form of participation has to be seen as participation [that] directly addresses power
and its distribution in society. It touches the very core of power relationships’
(Servaes 1999, p. 198). Carole Pateman makes a distinction between ‘partial’ and
‘full’ participation, defining partial participation as ‘a process in which two or more
parties influence each other in the making of decisions but the final power to decide
rests with one party only’, and full participation as a process in which each member
involved in a decision-making body possesses equal power to determine the out-
come of decisions (Pateman 1970). Furthermore, there is a variety of practices and
theories informed by the concept of participation. It has attracted attention among com-
puter scientists (particularly HCI scholars) and social scientists (specifically
researchers concerned with social studies of technologies, socio-technological
phenomena, and social change), as well as AmI creators and practitioners in the
context of technology design. In addition, being open to a wide variety of divergent
interpretations, this notion has led to exacerbated confusion or misconception as
to its practice, e.g., in HCI design, innovation, and industrial research. It has
moreover been criticized for long by many authors and continues to be challenged
in the prevalent discourse of design underlying the development of context-aware
applications (see below for further discussion). Some refer to it as an empty sig-
nifier (Carpentier 2007). This pitfall has proven to have implications for user par-
ticipation. What does it mean to have the users participating in the development
process? What is their impact? Has this concept become an empty signifier (Laclau
and Mouffe 1985)? In relation to context-aware computing, researchers determined that, in contradiction to the discourse of ‘putting the user central’, almost half of the pictures used in promotional material of AmI applications for the home contained no humans, only devices (Ben Allouch et al. 2005).
The social connotation of ‘user participation’ is partly lost as the term has been
reduced from something social and political in content and conceptual in form to
merely situated in some setting, thereby diverging from its origin. Indeed, in the
area of HCI, having the users participating in the design and development process is
not taken to mean participation in more theoretic views. User participation is
considered to be circumscribed in user-oriented design practice, as the meaning
attached to the concept of participation remains partial and influenced in most cases
in terms of power distribution—between designers and users—in the province of
technology design. To give a better idea to the reader, it is of import to trace the
origin of ‘user participation’ in design of early information systems and how it has
evolved. This is of relevance for the discussion of key issues relating to the use of
user-centered design models in the dominant design trend and the implication of
this use for the development of context-aware applications.

3.18.3 Participatory Design (PD): The Origin of User Participation

Originating in Scandinavian tradition, culture, and politics, PD draws authority from a very distinctive set of discourses of labor relations and social justice. The Nordic welfare region is the birthplace of the Scandinavian tradition of PD, where participation is often understood as a means of democracy (Elovaara et al. 2006). PD is
a research area that initially started from Trade Union Participation (Beck 2001).
It was driven by the central concern that workers needed to be able to participate in
the means of production. Therefore, representatives needed to understand new
technologies to be prepared for negotiations with management (Nygaard and Bergo
1973). The mode of negotiation in addition to the modes of engagement and
deliberation are drawn from Scandinavian strong traditions of union involvement in
workplace decision making (Winograd 1996). The political branch of PD evolved
as computer scientists made common cause with workers instead of management
when designing workplace information systems (Asaro 2000). Different political
and non-political researchers focused on the development of specific techniques for
involving users in design (see Bjerknes et al. 1987).
PD is characterized as a maturing area of research as well as an area of evolving
practice (Kensing and Blomberg 1998). PD researchers mainly address the politics
of design; the nature of participation; and methods, tools and techniques used to
foster participation (Ibid). PD involves design practitioners, researchers, and deci-
sion makers who advocate full user participation in design tasks and issues as a
means to generate, exploit, and enhance the knowledge upon which technologies
are built. Taken up more broadly, PD is described as a democratic, cooperative,
interactive and contextual design philosophy. It epitomizes democracy as it ensures
that users and designers are on the same footing, and sees user participation as a
vehicle for user empowerment in various ways. It maintains roles for designers and
users but calls for users to play a more active part in the imagination and specifi-
cation of technologies. Thereby, it seeks to break barriers between designers and
users and facilitate knowledge exchange between them through mutual involvement
in the design process. Indeed, the co-design process is about shared effective
communication and social collaboration which supports well-informed decisions
and actions in the event of desired democratic change. Drawing on Suchman’s
(2002) account, it is useful to think of design processes more as shaping and staging
encounters between multiple parties and less as ways that designers can formulate
needs and measure outcomes. Moreover, as a contextual approach, PD is about
designers acting in a sociocultural setting where the users feed into the process by
providing knowledge needed to build and improve the use of interactive systems
that aim to facilitate daily activities within that setting. PD works well not because
of an inherent superiority to other methods, but rather because it draws advantage from
cultural rationalities and practices specific to the setting in which it emerged. The
quintessence of the process is that different people come together and meet to
exchange knowledge, which draws attention to the context of that encounter and the
bidirectionality of the exchange; it is about what people bring into the encounter
and what they take away from it (Irani et al. 2010). Besides, designers should be
able to respond to different situations, whereby the users challenge their ability to
benefit from the contextual situated experience and knowledge. Furthermore, PD
seeks to better understand human users by exploring new knowledge for under-
standing the nature of participative design through an interlocutory space between
the designers and users and for improving the performance of such design through
developing innovative solutions for how to creatively involve users in the devel-
opment of technological systems. Researchers in PD are concerned with a more
human, creative, and effective relationship between the designers and users of
technology, and in that way between technology and the human activities that
provide the rationale for technological systems to exist (Suchman 1993). In the
context of AmI, it is more important than ever that new technologies should allow,
motivate, and require users to play a responsible role as co-designers, modifiers, and
value co-creators, which is not the case for context-aware computing. This is further
discussed below.

3.18.4 User-Centered-Design (UCD)

The UCD perspective has emerged as a strong call for designing well-informed ICT solutions and has become a prime focus in HCI research and practice. UCD is the
dominant trend in HCI design, a widely practiced design philosophy, rooted in the
idea that users must be at the center of the design process. In it, designers try to know as
much as possible about their users. Grounded in the understanding of users, UCD
allows designers to work together with users to articulate their needs, wants, goals,
expectations, and limitations. Within UCD practices users are asked to give feed-
back through specific user evaluations and tests to improve the design of interactive
systems. The attention is given to users during requirements gathering and usability
testing, which usually occur iteratively until the relevant objective has been
attained. However, within user-informed design—e.g., interaction design (Arnall
2006), and experience design (Forlizzi and Battarbee 2004)—information about the
users is gathered before developing a design and the user is included at a certain
moment in the design process (Geerts et al. 2007). When organizing co-design
sessions ‘the user is integrated in a very early stage of the conceptual and interface
design process and the focus is on the mutual learning process of developer and
user.’ (Criel and Claeys 2008, p. 61).
Underlying the notion of UCD is the idea that users should not be forced to change how they perform daily activities, using designed systems, to accommodate what the designer proposes as solutions; rather, the aim is to facilitate how users perform their daily activities and to suit systems effectively to their skills and
experiences. Of importance is also to lessen the technical knowledge threshold
required to make constructive use of functionality, communication, and processing.

3.18.5 User-Centrality in AmI

AmI is claimed to be built on a people-centered philosophy. Consequently, user-centrality is a focal point in AmI. A key implication of this is that users must be actively involved as co-designers of new technological systems. This is highlighted in the ISTAG: Scenarios for Ambient Intelligence 2010 (ISTAG 2001). But designing technologies from a user’s perspective, where users have some, albeit not far greater, involvement in the process of technology design, is not a new idea to the HCI community. Over three decades there has been growing attention to the role of
the user in design and innovation, manifested through co-employing ethnography
researchers, usability designers, user experience engineers, and end-users or their
representatives. In the field of context-aware computing, given the adaptive, per-
sonalized, and responsive nature of interactive services that context-aware systems
are concerned with, user participation in the design of such systems becomes crucial
in order to produce delivery mechanisms that respond effectively to users’ needs,
goals, and expectations, as an attempt to gain their acceptance of and trust in AmI
technologies. Thus, the UCD perspective still holds as a strong call for designing well-informed AmI solutions, and continues to be a major challenge in AmI as a
new paradigm of HCI in terms of which forms of user participation and empow-
erment to adopt or apply. The inevitability of the employment of user participation
and empowerment in the design of new technologies is justified by the fact that the
more the user is actively involved, the more successful are the designed techno-
logical solutions.
However, research shows that the mainstream trends in the design of
context-aware applications do not fully pursue the participatory philosophy of
design. In other words, the claim about user-centrality in design remains at the level
of discourse, as it has been difficult to translate the UCD guidelines into real-world
actions. Thus, there is a gap between theory and practice as to the user involvement
when it comes to HCI design, the design of context-aware applications in particular
and of interactive technologies in general. In fact, full user participation in the
design process has been questioned and contested for long, and continues to be
challenged in the field of context-aware computing. Consequently, many views
argue that the vision of AmI may not evolve as envisioned with regards to sup-
porting the user due to the uncertainty surrounding the current vision of user
participation. It is doubtful if the AmI vision puts the user at such a central stage as
designers often claim (Criel and Claeys 2008). It is the dominant design discourse
shaping the process of creating context-aware applications that is most likely the
cause of failing to see a real breakthrough in research within AmI.

3.18.6 The Impoverishment of User Participation and the Loss of Its Political Connotation

Full user participation in the design process is one of the most contentious issues
raised in the realm of HCI. When talking about user participation in the develop-
ment of technologies and their applications and services, one can in ‘best’ cases
speak of a certain form of partial participation, but in no way of full participation—
of more or less equal power relations. Indeed, partial participation is the de facto
standard in most UCD methods concerned with HCI design. There are different
methods that can be clustered into the name ‘UCD’ and all of them lean on the
participation of the user in the innovation process.
Involving interdisciplinary teams, performing user research, and organizing co-design sessions (where users are allowed to work together with the designer(s) or with other users) as common practices in UCD differ from how things can be
done within PD. User participation as applied in UCD is similar (comparable) but
not identical to PD in which users are considered as partners with the designers.
Experiences of HCI design show that user participation is not currently being
applied according to the original idea developed within PD—users are to fully
participate and thus actively contribute to the design process through shared design
sessions and workshops, exchanging feedbacks and suggestions with designers.
Although UCD approach involves consulting directly with the users, the approach
is said to be not fully participatory in practice, as users are not fully involved in all
stages of the design process, and consequently do not shape the decisions and outcomes of design solutions. These limitations, as related to both user research and co-design, extend to design practice in AmI. Regardless of the specific terminology
and the UCD model being used, there is a tendency in the design of AmI artifacts
towards reducing the complexity of user participation (in other words: alienating the
term from its complex meaning in more theoretical or political view) for technical
purposes. Consequently, it is uncertain that users are put at the center of the design
process as AmI designers often claim; rather, user participation continues to be
applied according to the way user-centrality is viewed by HCI designers. The
development of AmI applications is rather influenced by the dominant discourse of
UCD and what this entails in terms of the kind of meaning ascribed to user par-
ticipation and translated into practice. The way HCI designers and researchers
approach user involvement in the design process will impact on design research and
practice in AmI applications. How people in the field of HCI write about user
participation does not resonate with design as a dominant social practice in the
realm of technology. In other words, user participation remains at the level of
discourse, in spite of being ubiquitous in the visions of AmI, with all those claims
for user centrality and empowerment.
It becomes evident that UCD models, albeit labeled ‘participatory’, are not fully
participatory, which goes together with AmI as a new paradigm of HCI. This is due
to the narrow view inherent in the interpretation attached onto the concept of ‘user
participation’, which is driven by pragmatic ends. As argued by Criel and Claeys
(2008, pp. 61–62), the ‘widespread adoption of the concept “user participation”,
and diversification of design methods that put the user central, does not mean that
the original ideas on participation and user participation, as historical rooted within
PD, are also widely disseminated. The question even is if these design methods can
be defined as “participatory” if we look at the interpretation and meanings attached
onto the concept of “user participation”…’ Put differently, from most of the design
research within HCI thus far, one of the contentious issues is that the political
connotation of ‘user participation’ is fading away. This connotation is partly lost as
the term has been reduced from something cultural and political in content and
conceptual in form to merely located or situated in some design process or setting.
It is the political connotation of the term ‘user participation’ that allows highlighting
the differences between its use in work on AmI artifacts and the original ‘user
participation’. This divergence is driven by various factors in the provinces of AmI,
one of which is its use for marketing purposes. As determined by researchers in the
work on AmI, almost half of the pictures used in the promotional material of AmI
applications for the home contained no humans but devices, which contradicts the
discourse of ‘putting the user central’ (Ben Allouch et al. 2005). As Beck (2001,
p. 6) formulates it: PD ‘has come to include practices that share only the historical
link to participation as a vehicle for empowerment. In the non-political extreme,
user participation, once politically radical, has been picked up as a slogan for
marketing and other uses’. All in all, where in PD empowerment and participation
were central and political aims, the different UCD models are mostly not concerned
with empowerment and active involvement of users, a pitfall which seems to carry
over its effect to the predominant design trend in AmI.
3.18.7 Realities and Contradictions of User Participation in Context-Aware Computing

Given the very nature of AmI technology—active, personalized, adaptive, responsive, and proactive behavior—the focus of design in context-aware computing is no longer on the interaction of human users with technology as nonhuman machines, but rather on the interaction between human users and technological artifacts as subjects, whereby the system’s cognitive and behavioral processes are
expected to emulate those of humans with respect to interaction. Thus, to a great
extent, it becomes unreasonable to opt for even partial, not to mention full, user
participation. User participation in the design of AmI artifacts becomes an oxymoron
given the vision of invisibility underpinning AmI technology, the guiding principle
of context-aware computing in terms of interaction and service provision. Invisibility
means that the computer system is to take care of the context in which users find
themselves, by detecting, analyzing, and understanding context information and
responding to it autonomously. The invisibility of user interfaces and user partici-
pation exclude each other. If AmI technology should be mentally and physically
invisible and unobtrusive and not require a steep learning curve, then placing the
user at the center of AmI design becomes contradictory (ISTAG 2001). Put differ-
ently, empowering users by letting them ‘handle some of the semantic connections
of the system and the ambiguities that may arise’ and ‘enabling them to generate
their own meaning for the interaction with AmI systems’ is in fact ‘the opposite of
the view that AmI systems need to be transparent and invisible, and it may also seem
like a move backwards in a field that has so often proclaimed the ability to anticipate
user needs and react accordingly’ (José et al. 2010, p. 1487). Crutzen (2005) con-
tends that with designers, researchers, and experts overvaluing design, ‘design
within use’ has been reduced to the theme of ‘the adaptation of the technology’
which results in continuous measurement and interpretation of our behaviors and
actions, and the acceptance of AmI technology by users. But the conundrum of the
mental invisibility, in particular, is that humans who are to benefit from future AmI
environments are not consulted or asked for their views of what is appropriate as
‘intelligent’ services and acceptable as interactive technologies. The ambient ser-
vices to be delivered—personalized, adaptive, and anticipatory actions—are taken
for granted by designers to be relevant, although the context-triggered actions are
based on the context as understood and inferred by the system, not as a result of the
perception and selection of the user or, at least, of the negotiation between the human
user and the system. AmI systems are claimed to react and pre-act in a way that is
articulated as desirable and appropriate. For AmI technology it is the way designers
envision how technology should serve users or users should take advantage of
technology that seems to dictate the type of services that will be provided. This
implies that the designers conceive of ‘user-centeredness’ as based on a notion of
non-problematic interaction between human users and technology: ‘which means
technology that can think on its own and react to (or possibly even predict) indi-
vidual needs so people don’t have to work to use it’ (Crutzen 2005, p. 222).
Moreover, the adaptive and proactive behavior of AmI technology circumscribing
design within use gives designers full power to set the rules and boundaries as to the
design of context-aware applications and environments. When designers take this role, the partially active participative role of the user comes to an end—i.e., it is meant to vanish.
This has several implications for the use and acceptance of AmI technology. In all, as
summarized by Crutzen (2005, p. 224), ‘People are in danger of losing within the
activity of use the activity of ‘design’. In AmI the symbolic meaning of use and
design is reconstructed as an opposition in which ‘design’ is active and virtuous and
‘use’ is passive and not creative. This dominance of design discloses and largely
prevents the act of discovery of the users by the designer and acts of discovery on the
part of the users. Design is focused on generalized and classified users. Users are
turned into resources, which can be used by designers in the process of making
ICT-products. They do not have sufficient room for starting their own design pro-
cesses. Those who do not fit into regimented classes are seen as dissidents. In AmI,
designers are creating an artificial play in which they have given the active and
leading role to the artificial subjects. Users are ready-made sources of data for the
technology in their environment. By interpreting ‘user-centeredness’ in this way, the
active explicit participation aspect is lost…the user is reduced to an observable
object placed in a feedback loop that, in the opinion of the designers, converges to an
optimal intelligent environment with an action/communication oriented smart space
function in order to influence the user’. Consequently, it has been suggested that
AmI must move beyond its foundational vision and particularly revisit the notion of
intelligence, by embracing the emerging trends around it. This involves rethinking
the role of users in terms of empowerment and the exposure of the ambiguities
associated with the functioning of AmI applications. Allowing users to control and
manage some of the semantic connections—context inferences—of the system and
also trigger behaviors in it would overcome most of the complex issues relating to
the need to make AmI applications function flawlessly.
From a general perspective, whether designers try to know much about their users
within multiple forms of UCD or have complete control over the design of AmI
applications, interactive technologies are, in general, still designed for a certain type
of users. Technical decisions in the development of technologies aim at a certain
type of users (Kirk and Carol 2004; Nes 2005). Norman (2005) points out that when applications are adapted to the particular likes, dislikes, skills, and needs of a particular target user group, they will be less likely to be appropriate for others. Design
tends to quickly become a strengthening of existing stereotypes when targeting
specific groups, which ‘may conduct shortage to the individual preferences and
reality of a liquid identity, preferences and life of users’ (Criel and Claeys 2008). In
line with this argument, the assumption that individuals have a single preference and
cultural profile is problematic. Taxonomic models, in general, tend to overlook many
aspects of people’s experiences, and thus their use should be mitigated in the design
of AmI applications. They are commonly used as patterns to systematically cate-
gorize people and communities, which is likely to render design of such technologies
ineffective or unsuccessful. Research shows that this approach is analytically weak
in explicating differences in technology use because taxonomies that prevail in the
framing of technological design endeavors have become useless—they do not hold
anymore. It thus becomes highly relevant to re-examine the application of taxonomic
models in the realm of the design of AmI technology. In fact, the prevailing UCD
approach to interface design has been seen as more like ‘partial-frames’ and
‘lock-frames’; it is unable to capture the subtlety of individual, social, and cultural
dimensions of users. This implies that a great deal of technical decision-making in the UCD
process stays in the hands of the designers and developers as to the use and con-
figuration of applications. This can have strong implications for the design of AmI
applications. Instead, any viable design solution must be put into a much wider
perspective. In this regard, it is strategically valuable, albeit challenging, to develop
novel methods that allow the involvement of users both at the micro-level of their
everyday life as well as at the macro-level of their sociocultural life.
It is imperative to rethink user participation in the design practice of AmI
technology as well as to move away from stereotypes when targeting user groups.
Involving all types of users (e.g., literate, illiterate, disabled, aged, gendered, dis-
advantaged, etc.) in design decisions is crucial for creating the kind of technologies
that are designed for all, thereby enabling a wide variety of users and social groups
to become socially-through-digitally included. Future trends of UCD face enormous
challenges associated with the engagement with users to better understand the
psychological, behavioral, social and cultural dimensions of their context; the
articulation of targets of design and the interpretation of their needs, desires,
opportunities, and limitations; and the translation of the requirements into state-
ments about AmI applications. Successful creation and thus wide adoption of AmI
technology is determined by full user participation since related applications are
about people. Users ought not to be stochastically at the outer borders of AmI
systems. They should be moved from the periphery to the center of attention and
become a salient defining factor in the design and use of AmI applications, as
advocated by new emerging research trends. In all, the major thrust will come from
novel and creative ways of involving users in the design process of AmI systems. It
is timely and necessary to develop new knowledge and tools to incorporate the
complexity and subtlety of user behavior as parameters in system design and
development. In other words, real research endeavors must be undertaken and
strong effort must be made in the direction of user behavior design supported by full user participation to achieve the same level of confidence in understanding users as exists in designing technologies.

3.19 Empowering Users and Exposing Ambiguities: Boundaries for Developing Critical User Participatory Context-Aware Applications

There is a firm belief that users will never fully participate in the design of context-
aware applications and environments. Most work in developing context-aware
artifacts appears to be technology-driven, that is, development is driven by what is
technically and computationally feasible rather than by insights provided by users
into the way they aspire to interact with such artifacts and thus how they can be
designed. This is due to the fact that little knowledge and technology (methods,
models, and tools) are available to incorporate user behavior as a factor in system
design and development. However, this does not necessarily mean that there are no
alternatives to reconsider the role of users, by empowering them and exposing them
to some of the ambiguities raised by inaccurate sensing, inefficient inferences,
and unfitting actions. User empowerment is one of the key issues being addressed
as part of the ongoing endeavor of revisiting the notion of intelligence in AmI. In
this line of thinking, in a transdisciplinary study on context-aware applications and
environments, Criel and Claeys (2008) suggest a non-exhaustive set of recom-
mendations for the development of what they call ‘critical’ user participatory
context-aware applications and environments. They identify six key conditions that
need to be fulfilled to be able to speak of critical user participation in the devel-
opment and configuration of context-aware applications, in an attempt to address the
issue of user empowerment—to shift existing power relations in the advantage of
the user. They state that these conditions are formulated from both a technological and a human perspective, as developers and users both need to be accountable and
knowledgeable actors. The formulation of these conditions is based on the fol-
lowing assumptions:
• Users perceive context-aware applications nowadays as black boxes
• Users ‘should be able to look into the black box and define themselves for which
context data…a certain action should be performed’, rather than ‘developers
define what the output/inferred data will be for a certain input’
• ‘The opening of the black box onto a certain level is inevitable for empower-
ment of the users’
The conditions are described below as adapted from Criel and Claeys (2008) and
supported by other authors’ insights and views, along with some reflection:
1. ‘People should know about the computerized context that surrounds them.
Therefore they should be aware which context can be ‘sensed’ and understand
what it means. This doesn’t mean that users should know all sensor details but at
least what’s possible to detect and what not. A way to tackle this problem from
our perspective is that users could retrieve which context is measured in the
environment surrounding them at any time and any place. Therefore (maybe
separate) context-aware applications should be available to sense the environ-
ment and present the context topics that are measured in a human understand-
able way. Without access to this information users will always experience an
ambient environment as suspicious or even hostile and will fear the unknown.’
(Criel and Claeys 2008, pp. 69–70). Intelligibility can help expose the inner
workings of such applications that tend to be opaque to users due to their
implicit sensing (Lim and Dey 2009). The basic idea is that it is important to
expose ambiguity to and empower users, an issue which pertains to the ongoing
endeavor of revisiting the whole notion of intelligence in AmI in terms of
reconsidering the role of users. The significance of letting users handle some of
the ambiguities that may arise and the semantic connections of the AmI system
lies in its potential to overcome many of the complex issues relating to the need
for accurate or perfect sensing and interpretation of the state of the world (e.g.,
the human’s psychological and social states overtime) that many AmI scenarios
seem to proclaim (José et al. 2010).
2. Users should be able to understand the logic applied in context-aware applica-
tions, meaning that they should be able to know why a certain action is performed
or an application behaves in a certain way. Schmidt (2005) argues for an AmI
interaction model in which users can always choose between implicit and explicit
interfacing: ‘The human actor should know…why the system has reacted as it
reacted’. This is very important because it enables the user to interfere in and
choose how the application should behave in a given situation. People should be
active shapers of their ambient environments, not passive consumers of ambient
services. Contrariwise, developers tend to determine when context-dependent
actions should be performed by defining the inferred data for a certain (implicit
contextual) input, and, in this case, users are expected to passively use or receive
them, without any form of negotiation. In other words, users do not have the
possibility to influence the inferred context or decline the associated intelligent
responses of context-aware systems; they are obliged to accept what the devel-
opers have to offer as ambient intelligent services. ‘When things happen without
the understanding of a person but only by the developer, the developer is
determining the behavior of that person in a non-democratic way…A lot of
applications have these problems but in context-aware applications the life of the
person is affected without the feeling of direct computer interaction’ (Criel and
Claeys 2008, p. 70). Again, this is about empowering people through, as José
et al. (2010, p. 1488) contend, ‘enabling them to generate their own meaning for
the interaction with AmI systems. This should provide a real path towards
user-driven AmI scenarios that provide meaningful functionality that is effec-
tively valued by potential users. Rather than removing the “burden” of choosing,
AmI should make decisions easier to judge and support new practices that allow
people to more intelligently undertake their lives… Instead of having the system
deciding for us, we can leverage on the system for making our choices more
informed and promoting serendipity. Moreover, giving people more control may
be an essential step in unleashing the creativity and the everyday life connection
that has so often been missing from AmI research, extending it into more playful
and creative practices.’
Moreover, in relation to the argument that people should understand why
applications behave as they behave, explanations should be unambiguous: given in a human-understandable rather than a mystic, computerized way. To give users a better understanding of the logic of context-aware applications, it is sug-
gested to present a diagnosis to the user that explains why different
context-aware actions taking place in the AmI environment occur, keeping in
mind that the provided information should be presented in a graphical way or in a human
understandable language. Context-aware applications must be intelligible: being
able to ‘represent to their users what they know, how they know it, and what
they are doing about it’ (Bellotti and Edwards 2001). Support of intelligibility
can occur through automatically generating explanations of application behavior
that users want, which has the potential to increase user satisfaction and thus
trust and acceptance of context-aware applications (Lim and Dey 2009). While
little work has been done to compare the impact of different types of explana-
tions in the domain of context-aware computing (Lim et al. 2009), Lim and
Dey’s (2009) findings indicate that some types of explanation are more effective
than others in improving users’ understanding and trust of context-aware sys-
tems. Yet, as it is not clear what information users actually want to know and
will ask about, the authors explore and assess user demand for intelligibility:
which types of questions users want answered, and how answering them
improves user satisfaction of context-aware applications, and they provide an
overview of different types of explanations to support intelligibility in terms of
both questions users may ask of context-aware applications as well as a description
of their experimental design that uses surveys and scenarios to expose users to a
range of experiences with context-aware applications. Regardless, intelligibility
can help expose the inner workings of such applications that tend to be opaque
to users due to their implicit sensing (Lim and Dey 2009). Contrariwise, the lack
of transparency can hinder users from making sense of context-aware applica-
tions (Bellotti and Edwards 2001). The lack of application intelligibility can lead
users to mistrust the system, misuse it, or discard it altogether (Muir 1994).
Users of context-aware applications can find the lack of intelligibility frustrating
(Barkhuus and Dey 2003a).
3. A simple but important condition is that users always must have the option to
switch off the context-aware interactions, and accordingly the ultimate control
should lie in their hands rather than in the hands of developers. This is most often
not the case in current context-aware applications, where it is indeed the
developer who manages the application, not the user. Rather, the critical trans-
formative room that stands between the user and AmI ‘should include a
diversity of options to influence the behavior, use and design of the technology.
The off-switch is only one end of a rich spectrum of intervention tools… per-
vasive applications are very fragile and any design paradigm must include ways
in which the average user can fix problems’ (Crutzen 2005, p. 227). Leahu et al.
(2008) call for a redirection of the effort into what they label an ‘interactionist’
approach to AmI, in which the generation of intelligent behavior attempts to
capitalize on the fact that AmI is directed towards humans and, thus, can
leverage on their behavior to create alternative notions of situated intelligence.
In this line of thinking, Rogers (2006) contends that the specifics of context,
being too subjective, subtle, fluid, and difficult to identify and capture in models,
would hamper the system in making sensible predictions about the user's feelings,
wants, and needs. Accordingly, the author suggests an alternative research
agenda: a shift from proactive computing to proactive people, in which AmI
technologies can be designed to engage people more actively by extending their
practices rather than doing things for them or on their behalf.
4. Users should be able to participate in the decisions made about what actions to be
performed that are triggered by situations defined at the inference level. That is,
users should be able to intervene or adapt what should happen when certain
context conditions are met. In fact, many of the complex inference problems
suggested for AmI are trivial when handled by users. When developers define
what happens when, without user intervention, context-aware actions are
likely to become irrelevant and undesirable; indeed, developers can never create
the logic needed for individual users, given that context-aware applications are
inherently personal and specific to each user. Put differently, context-aware systems
are not uniform, and there will always be some kind of technical discontinuity
that may, even when inferences are simple, affect the ability to constantly get it
right when estimating what is going on in the human's mind or reasoning about
the meaning of what is happening in the surrounding physical or social environment.
As suggested by Edwards and Grinter (2001), we should accept that ambiguity
should not be hidden from those users who may need to understand the pragmatics
of interpretation and machine action, given that inference in the presence of
ambiguity is inherently prone to errors.
While the developers will still define the most basic context topics and related
actions that users can draw on to create their own rules, rule engines could be
combined with DSLs (domain-specific languages) to allow users to compose 'their
own context-aware logic by defining their own rules without having to become a
general-purpose developer' (Criel and Claeys 2008); a minimal sketch of such a
user-editable rule engine is given after this list. This would reduce the risk of
users losing, within the activity of use, the activity of design, and would allow
co-learning between the designer and the user. Otherwise the dominance of
design, with 'design' being active and virtuous and 'use' passive and not creative,
'discloses and largely prevents the act of discovery of the users by the
designer and acts of discovery on the part of the users' (Crutzen 2005).
5. Perhaps the most difficult condition to realize is that users become able to define
their own meaning for context topics, which is so subjective and evolving over
time that developers, albeit working in transdisciplinary teams, can never define
it for the user. While there is a possibility of using rules to implement a very
basic form of meaning, whereby users could define what they understand in
terms of the inferred context under certain conditions, it is necessary to take into
account the constraints of existing technologies, the fact that they cannot handle
meaning, and the fact that meaning is constructed within the interaction itself.
6. A last, but very important, condition is that developers and users both need to
be accountable and knowledgeable actors. Users have to take their responsibility
just as the developers, who, as Suchman (2002) points out, may design from a
'located accountability', not only from a 'view from nowhere' or 'detached intimacy'.
In the AmI era, where users will be empowered by the digital environment, it
becomes 'necessary to develop some critical digital literacy, and also some
critical literacy of the digital. A necessary condition to shift power relations
regarding technology, and more specific related to context-aware applications,
in favor of the user is inextricably linked with the will of users to take their
responsibility in autonomous behaving and controlling their everyday life world
where context-aware applications will possibly get integrated’ (Criel and Claeys
2008, p. 71).
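To make the last few conditions more concrete (user-intelligible explanations, an off switch, and user-defined rules), the following is a minimal, purely illustrative sketch in Python of a user-editable rule engine. It is not drawn from Criel and Claeys (2008) or any cited system, and all names (ContextRule, RuleEngine, the example readings) are hypothetical. Each rule carries its own plain-language rationale, triggered actions are reported together with an explanation of why they fired, and automation can be switched off entirely.

from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class ContextRule:
    """A user-editable rule: a readable name, a condition over context, an action."""
    name: str
    condition: Callable[[Dict[str, object]], bool]
    action: str       # symbolic action, e.g. "dim_lights"
    rationale: str    # plain-language explanation shown to the user

class RuleEngine:
    """Minimal engine: users add or remove rules and can switch automation off."""
    def __init__(self) -> None:
        self.rules: List[ContextRule] = []
        self.enabled = True  # the "off switch" the user controls

    def add_rule(self, rule: ContextRule) -> None:
        self.rules.append(rule)

    def evaluate(self, context: Dict[str, object]) -> List[str]:
        """Return triggered actions together with human-readable explanations."""
        if not self.enabled:
            return ["automation is switched off; no actions taken"]
        decisions = []
        for rule in self.rules:
            if rule.condition(context):
                decisions.append(f"{rule.action}: because {rule.rationale}")
        return decisions

# Example: a rule the user, not the developer, has defined.
engine = RuleEngine()
engine.add_rule(ContextRule(
    name="movie mode",
    condition=lambda ctx: ctx.get("activity") == "watching_tv" and ctx.get("lux", 0) > 200,
    action="dim_lights",
    rationale="you are watching TV and the room is still brightly lit",
))
print(engine.evaluate({"activity": "watching_tv", "lux": 350}))
# -> ['dim_lights: because you are watching TV and the room is still brightly lit']

The design choice worth noting is that the explanation is carried by the rule itself: because the user authored both the condition and the rationale, the system can account for its behavior in the user's own terms rather than in the developer's.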
Although meeting these conditions is a sound approach to designing successful
context-aware applications and environments, full user participation in the
design, use and configuration of context-aware applications is no easy task to
implement given the constraints inherent in engineering, design and modeling of
new technologies. Nevertheless, as argued by the authors, ‘although users will
never fully participate in the development and configuration of the
context-aware logic’, ‘consequent satisfaction of these conditions will make the
users more confident in context-aware environment and give them a greater
feeling of being in control’ (Criel and Claeys 2008, p. 71).

References

Abowd GD, Mynatt ED (2002) Charting past, present, and future research in ubiquitous
computing. In: Carroll JM (ed) Human-computer interaction in the new millennium. Addison
Wesley, Boston, pp 513–536
Agre PE (2001) Changing places: contexts of awareness in computing. Human Comput Interact 16(2–3)
Arnall T (2006) A graphic language for touch-based interactions. Paper presented at the mobile
interaction with the real world (MIRW 2006), Espoo, Finland
Asaro PM (2000) Transforming society by transforming technology: the science and politics of
participatory design. Account Manage Inf Technol 10(4):257–290
Barkhuus L, Dey A (2003a) Is context-aware computing taking control away from the user? Three
levels of interactivity examined. In: Ubiquitous computing, pp 149–156
Barkhuus L, Dey A (2003b) Location-based services for mobile telephony: a study of users’
privacy concerns. In: Proceedings of Interact, ACM Press, Zurich, Switzerland, pp 709–712
Beck E (2001) On participatory design in Scandinavian computing research. University of Oslo,
Department of Informatics, Oslo
Bellotti V, Edwards WK (2001) Intelligibility and accountability: human considerations in
context-aware systems. Human Comput Interact 16(2–4):193–212
Ben Allouch S, Van Dijk JAGM, Peters O (2005) Our future home recommended: a content
analysis of ambient intelligence promotion material. Etmaal van de Communicatiewetenschap.
Amsterdam, The Netherlands
Bjerknes G, Ehn P, Kyng M (eds) (1987) Computers and democracy—a Scandinavian challenge.
Aldershot
Bettini C, Brdiczka O, Henricksen K, Indulska J, Nicklas D, Ranganathan A, Riboni D (2010) A
survey of context modelling and reasoning techniques. J Pervasive Mobile Comput Spec Issue
Context Model Reasoning Manage 6(2):161–180
Bianchi-Berthouze N, Mussio P (2005) Introduction to the special issue on “context and emotion
aware visual computing”. J Vis Lang Comput 16:383–385
Bravo J, Alaman X, Riesgo T (2006) Ubiquitous computing and ambient intelligence: new
challenges for computing. J Univ Comput Sci 12(3):233–235
Brooks RA (1991) Intelligence without representation. Artif Intell 47(1–3):139–159
Brown PJ (1996) The stick–e document: a framework for creating context-aware applications. In:
Proceedings of EP’96, Palo Alto, pp 259–272
Brown PJ, Jones GJF (2001) Context-aware retrieval: exploring a new environment for
information retrieval and information altering. Pers Ubiquit Comput 5(4):253–263
Carpentier N (2007) Introduction: participation and media. In: Cammaerts B, Carpentier N
(eds) Reclaiming the media: communication rights and democratic media roles. Intellect,
Bristol
Cearreta I, López JM, Garay-Vitoria N (2007) Modelling multimodal context-aware affective
interaction. Laboratory of Human–Computer Interaction for Special Needs, University of the
Basque Country
Chen G, Kotz D (2000) A survey of context-aware mobile computing research. Paper TR2000–
381, Department of Computer Science, Dartmouth College
Chen L, Nugent C (2009) Ontology-based activity recognition in intelligent pervasive
environments. Int J Web Inf Syst 5(4):410–430
Cheverst K, Mitchell K, Davies N (2001) Investigating context-aware information push vs.
information pull to tourists. In: Proceedings of mobile HCI 01
Clancey WJ (1997) Situated cognition. Cambridge University Press, Cambridge
Cowie R, Douglas-Cowie E, Cox C (2005) Beyond emotion archetypes: databases for emotion
modelling using neural networks. Neural Networks 18(4):371–388
Criel J, Claeys L (2008) A transdisciplinary study design on context aware applications and
environments. A critical view on user participation within calm computing. Observatorio
(OBS*) J 5: 057–077
Crowley J, Coutaz J, Rey G, Reignier P (2002) Perceptual components for context aware
computing. In: Proceedings of UbiComp: ubiquitous computing, 4th international conference,
Springer, Berlin
Crutzen CKM (2005) Intelligent ambience between heaven and hell. Inf Commun Ethics Soc 3
(4):219–232
Dey AK (2000) Providing architectural support for building context-aware applications. PhD
thesis, College of Computing, Georgia Institute of Technology
Dey AK (2001) Understanding and using context. Pers Ubiquit Comput 5(1):4–7
Dey AK, Abowd GD, Salber D (2001) A conceptual framework and a toolkit for supporting the
rapid prototyping of context-aware applications. Human Comput Interact 16(2–4):97–166
Dockhorn C, Ferreira P, Pires L, Van Sinderen M (2005) Designing a configurable services
platform for mobile context-aware applications. J Pervasive Comput Commun 1(1)
Dourish P (2001) Where the Action Is. MIT Press
Dourish P (2004) What we talk about when we talk about context. Pers Ubiquitous Comput 8
(1):19–30
Dreyfus H (2001) On the internet. Routledge, London
Ekman P (1984) Expression and nature of emotion. Erlbaum, Hillsdale
Edwards WK, Grinter RE (2001) At Home With Ubiquitous Computing: Seven Challenges. In:
Proceedings of the UbiComp 01, Atlanta, GA. Springer-Verlag, pp 256–272
Elovaara P, Igira FT, Mörtberg C (2006) Whose participation? Whose knowledge?—exploring PD
in Tanzania–Zanzibar and Sweden. In: Proceedings of the ninth Participatory Design
Conference, Trento
Erickson T (2002) Ask not for whom the cell phone tolls: some problems with the notion of
context-aware computing. Commun ACM 45(2):102–104
Forlizzi J, Battarbee K (2004) Understanding experience in interactive systems. Paper presented at
the DIS2004, Cambridge
Geerts D, Jans G, Vanattenhoven J (2007) Terminology. Presentation at citizen media meeting,
Leuven, Belgium
Giunchiglia F, Bouquet P (1988) Introduction to contextual reasoning: an artificial intelligence
perspective. Perspect Cogn Sci 3:138–159
Goodwin C, Duranti A (eds) (1992) Rethinking context: language as an Interactive phenomenon.
Cambridge University Press, Cambridge
Gross T, Prinz W (2000) Gruppenwahrnehmung im kontext. Tagungsband der Deutschen
Computer-Supported Cooperative Work Konferenz (D-CSCW), Munich, Teubner, Stuttgart,
pp 115–126
Guizzardi G (2005) Ontological foundations for structural conceptual models. PhD thesis,
University of Twente, The Netherlands, TI–FRS No. 15
Guizzardi G, Herre H, Wagner G (2002) On the general ontological foundations of conceptual
Modeling. In: Proceedings of the 21st Int’l Conference on Conceptual Modeling (ER–2002),
LNCS 2503, Finland
Gunnarsdóttir K, Arribas-Ayllon M (2012) Ambient intelligence: a narrative in search of users.
Lancaster University and SOCSI, Cardiff University, Cesagen
Gwizdka J (2000) What’s in the context? Computer human interaction (CHI). The Hague, The
Netherlands
Göker A, Myrhaug HI (2002) User context and personalisation. ECCBR workshop on case based
reasoning and personalisation, Aberdeen
Hull R, Neaves P, Bedford-Roberts J (1997) Towards situated computing, In: Proceedings of the
1st IEEE international symposium on wearable computers, IEEE Computer Society
Irani L, Vertesi J, Dourish P, Philip K, Grinter R (2010) Postcolonial computing: a lens on design
and development. Proc CHI 2010:1311–1320
ISTAG (2001) Scenarios for ambient intelligence in 2010. In: Ducatel K, Bogdanowicz M,
Scapolo F, Leijten J, Burgelman J-C (eds) IPTS–ISTAG, EC: Luxembourg, viewed 22 October
2009. ftp://ftp.cordis.lu/pub/ist/docs/istagscenarios2010.pdf
ISTAG (2003) Ambient intelligence: from vision to reality (For participation—in society &
business), viewed 23 October 2009. http://www.ideo.co.uk/DTI/CatalIST/istag–ist2003_draft_
consolidated_report.pdf
José R, Rodrigues H, Otero N (2010) Ambient intelligence: beyond the inspiring vision, J Univ
Comput Sci 16(12):1480–1499
Kanade T, Cohn JF, Tian Y (2000) Comprehensive database for facial expression analysis. In:
International conference on automatic face and gesture recognition, France, pp 46–53
Kensing F, Blomberg J (1998) Participatory design: issues and concerns. Comput Support Coop
Work: J Collaborative Comput 7(3–4):167–185
Khedr M, Karmouch A (2005) ACAI: agent-based context-aware infrastructure for spontaneous
applications. J Netw Comput Appl 28(1):19–44
Kim S, Suh E, Yoo K (2007) A study of context inference for Web-based information systems.
Electron Commer Res Appl 6:146–158
Kintsch W (1988) The role of knowledge in discourse comprehension: a construction-integration
model. Psychol Rev 95(2):163–182
Kirk M, Carol Z (2004) Narrowing the digital divide: in search of a map to mend the
gap. J Comput Sci Coll Arch 20(2):168–175
Kirsh D (2001) The context of work. Human Comput Interact 16:305–322
Kozulin A (1986) The concept of activity in Soviet psychology. Am Psychol 41(3):264–274
Kuutti K (1991) Activity theory and its application to information systems research and
development. In: Missen HE (ed) Information systems research. Elsevier Science Publishers,
Amsterdam, pp 529–549
Kwon OB, Sadeh N (2004) Applying case-based reasoning and multiagent intelligent system to
context-aware comparative shopping. Decis Support Syst 37:199–213
Kwon OB, Choi SC, Park GR (2005) NAMA: a context-aware multi-agent based web service
approach to proactive need identification for personalized reminder systems. Expert Syst Appl
29:17–32
Laclau E, Mouffe C (1985) Hegemony and socialist strategy: towards a radical democratic politics.
Verso London, New York
Lang PJ (1979) A bio-informational theory of emotional imagery. Psychophysiology 16:495–512
Lassila O, Khushraj D (2005) Contextualizing applications via semantic middleware. In:
Proceedings of the second annual international conference on mobile and ubiquitous systems:
networking and services, San Diego, USA, pp 183–189
Lave J (1991) Situated learning in communities of practice. In: Resnick LB, Levine JM,
Teasley SD (eds) Perspectives on socially shared cognition. American Psychological
Association, Washington DC, pp 63–82
Leahu L, Sengers P, Mateas M (2008) Interactionist AI and the promise of ubicomp, or, how to put
your box in the world without putting the world in your box. In: Proceedings of the 10th Int
conf on Ubiquitous comput, pp 134–143, ACM, Seoul, Korea
Lee Y, Shin C, Woo W (2009) Context-aware cognitive agent architecture for ambient user
interfaces. In: Jacko JA (ed) Human–computer interaction. Springer, Berlin Heidelberg,
pp 456–463
Lieberman H, Selker T (2000) Out of context: computer systems that adapt to, and learn from,
context. IBM Syst J 39:617–632
Lim BY, Dey AK, Avrahami D (2009) Why and why not explanations improve the intelligibility
of context-aware intelligent systems. Proc CHI 2009:2119–2128
Lim BY, Dey AK (2009) Assessing demand for intelligibility in context aware applications.
Carnegie Mellon University, Pittsburgh
Lindblom J, Ziemke T (2002) Social situatedness: Vygotsky and beyond. In 2nd Int Workshop on
Epigenetic Robotics: modeling cognitive development in robotic systems, p. 7178, Edinburgh,
Scotland
Loke SW (2004) Logic programming for context-aware pervasive computing: language support,
characterizing situations, and Integration with the Web. In: Proceedings IEEE/WIC/ACM
international conference on web intelligence, pp 44–50
Loke S, Ling C, Gaber M, Rakotonirainy A (2008) Context aware computing, arc research
network in enterprise information infrastructure, viewed 03 January 2012. http://www.eii.edu.
au/taskforce0607/cac//http://hercules.infotech.monash.edu.au/EII–CAC/
Lueg C (2002) Operationalizing context in context-aware artifacts: benefits and pitfalls Human
Technol Interface 5(2), pp 43–47
March ST, Smith GF (1995) Design and natural science research on information technology. Decis
Support Syst 15:251–266
McGuinness DL, van Harmelen F (2004) OWL web ontology language overview. W3C
Recommendation, viewed 28 March 2011. http://www.w3.org/TR/owl–features/
Muir B (1994) Trust in automation: part I theoretical issues in the study of trust and human
intervention in automated systems. Ergonomics 37(11):1905–1922
Nardi BA (1996) Studying context: a comparison of activity theory, situated action models, and
distributed cognition. In: Nardi BA (ed) Context and consciousness. The MIT Press,
Cambridge, pp 69–102
Nes M (2005) The Gaps between the digital divides, University of Oslo, viewed 16 March 2009.
http://folk.uio.no/menes/TheGapsBetweenTheDigitalDivides.pdf
Newell A, Simon HA (1972) Human problem solving. Prentice Hall, New Jersey
Noldus L (2003) HomeLab as a scientific measurement and analysis instrument. Philips Res
34:27–29
Norman D (2005) Human-centered design considered harmful. Interactions 12(4):14–19
Nygaard K, Bergo TO (1973) Planning, management and data processing. Handbook for the
labour movement, Tiden Norsk Forlag, Oslo
Obrenovic Z, Starcevic D (2004) Modeling multimodal human–computer interaction. IEEE
Comput 37(9):65–72
O’Hare GMP, O’Grady MJ (2003) Gulliver’s genie: a multi-agent system for ubiquitous and
intelligent content delivery. Comput Commun 26:1177–1187
Pascoe J (1998) Adding generic contextual capabilities to wearable computers. In: Proceedings of
the 2nd IEEE international symposium on wearable computers: IEEE computer society
Pateman C (1970) Participation and democratic theory. Cambridge University Press, Cambridge
Perttunen M, Riekki J, Lassila O (2009) Context representation and reasoning in pervasive
computing: a review. Int J Multimedia Eng 4(4)
Pfeifer R, Rademakers P (1991) Situated adaptive design: toward a methodology for knowledge
systems development. In: Brauer W, Hernandez D (eds) Proceedings of the conference on
distributed artificial intelligence and cooperative work. Springer, Berlin, pp 53–64
Pfeifer R, Scheier C (1999) Understanding Intelligence. MIT Press
Philipose M, Fishkin KP, Perkowitz M, Patterson DJ, Hahnel D, Fox D, Kautz H (2004) Inferring
activities from interactions with objects. IEEE Pervasive Comput Mobile Ubiquitous Syst 3
(4):50–57
Prekop P, Burnett M (2003) Activities, context and ubiquitous computing. Comput Commun
26:1168–1176
Ptaszynski M, Dybala P, Shi W, Rzepka R, Araki K (2009) Towards context aware emotional
intelligence in machines: computing contextual appropriateness of affective states. Graduate
School of Information Science and Technology, Hokkaido University, Hokkaido
Punie Y (2003) A social and technological view of ambient intelligence in everyday life: what
bends the trend? In: The European Media and Technology in Everyday Life Network, 2000–
2003, Institute for Prospective Technological Studies Directorate General Joint Research
Center European Commission
Pylyshyn ZW (1987) The robot’s dilemma: the frame problem in artificial intelligence. Ablex
Publishing Corporation, Norwood
Riva G, Loreti P, Lunghi M, Vatalaro F, Davide F (2003) Presence 2010: the emergence of
ambient intelligence. In: Riva G, Davide F, IJsselsteijn WA (eds) Being there: concepts, effects
and measurement of user presence in synthetic environments. Ios Press, Amsterdam, pp 60–81
Robertson T (2000) Building bridges: negotiating the gap between work practice and technology
design. Human Comput Stud 53:121–146
Rogers Y (2006) Moving on from Weiser's vision of calm computing: engaging UbiComp
experiences. In: UbiComp 2006, Orange County, California, USA. Springer-Verlag Vol LNCS
4206, pp 404–421,
Salovey P, Mayer JD (1990) “Emotional intelligence”, Imagination, Cognition and Personality,
vol 9, pp 185–211
Samtani P, Valente A , Johnson WL (2008) “Applying the saiba framework to the tactical
language and culture training system.” In: Parkes P, Parsons M (eds) The 7th International
Conference on Autonomous Agents and Multiagent Systems (AAMAS 2008), Estoril, Portugal
Scherer KR (1992) What does facial expression express? In: Strongman K (ed) International
review of studies on emotion, vol 2. Wiley, New York, pp 139–165
Scherer KR (1999) Appraisal theory. In: Dalgleish T, Power MJ (eds) Handbook of cognition and
emotion. Wiley, New York, pp 637–663
Schilit B, Adams N, Want R (1994) Context-aware computing applications. In: Proceedings of IEEE
workshop on mobile computing systems and applications, Santa Cruz, CA, USA, pp 85–90
Schmidt A (2003) Ubiquitous computing: computing in context. Ph.D. dissertation, Lancaster
University
Schmidt A (2005) Interactive context-aware systems interacting with ambient intelligence. In:
Riva G, Vatalaro F, Davide F, Alcañiz M (eds) Ambient intelligence: the evolution of
technology, communication and cognition towards the future of human–computer interaction.
IOS Press, Amsterdam, pp 159–178
Schmidt A, Beigl M, Gellersen, HW (1999) There is more to context than location. Comput
Graphics UK 23(6):893–901
Servaes J (1999) Communication for development: one world, multiple cultures. Hampton Press,
Cresskill
Strang T, Linnhoff-Popien C, Frank K (2003) CoOL: a context ontology language to enable
contextual interoperability. In: Proceedings of distributed applications and interoperable
systems: 4th IFIP WG6.1 international conference, vol 2893, Paris, France, pp 236–247
Suchman L (1987) Plans and situated actions: the problem of human–machine Communication.
Cambridge University Press, Cambridge
Suchman L (1993) Participatory design: principles and practice. Lawrence Erlbaum, NJ
Suchman L (2002) Located accountabilities in technology production. Scand J Inf Sys 14(2):91–105
Suchman L (2005) Introduction to plans and situated actions II: human–machine reconfigurations,
2nd edn. Cambridge University Press, New York/Cambridge
Sølvberg A, Kung DC (1993) Information systems engineering: an introduction. Springer, Berlin
Tarjan RE (1987) Algorithm design. Commun ACM 30(3):205–212
Teixeira J, Vinhas V, Oliveira E, Reis L (2008) A new approach to emotion assessment based on
biometric data. In: Proceedings of WI–IAT’08, pp 459–500
Tobii Technology (2006) AB, Tobii 1750 eye tracker. Sweden, viewed 15 December 2012. www.
tobii.com
Trumler W, Bagci F, Petzold J, Ungerer T (2005) AMUN–autonomic middleware for ubiquitous
environments applied to the smart doorplate project. Adv Eng Inform 19:243–252
Turner RM (1999) A model of explicit context representation and use for intelligent agents. In:
Proceedings of modeling and using context: 2nd international and interdisciplinary conference,
vol 1688, Trento, Italy, pp 375–388
Tähti M, Arhippainen L (2004) A proposal of collecting emotions and experiences. Interact
Experiences HCI 2:195–198
Tähti M, Niemelä M (2005) 3E—expressing emotions and experiences, Medici Data oy, VTT
Technical Research Center of Finland, Finland
Ulrich W (2008) Information, context, and critique: context awareness of the third kind. In: The
31st information systems research seminar in Scandinavia, Keynote talk presented to IRIS 31
Wenger E (1998) Communities of practice: learning, meaning, and identity. Cambridge University
Press, Cambridge
Winograd T (1996) Bringing design to software. ACM, New York
Wright D (2005) The dark side of ambient intelligence. Foresight 7(6):33–51
Zhou J, Kallio P (2005) Ambient emotion intelligence: from business awareness to emotion
awareness. In: Proceeding of 17th international conference on systems research, informatics
and cybernetics, Baden, Germany
Zhou J, Yu C, Riekki J, Kärkkäinen E (2007) AmE framework: a model for emotion-aware
ambient intelligence, University of Oulu, Department of Electrical and Information
Engineering, Faculty of Humanities, Department of English VTT Technical Research Center
of Finland
Chapter 4
Context Recognition in AmI
Environments: Sensor and MEMS
Technology, Recognition Approaches,
and Pattern Recognition Methods

4.1 Introduction

There exists a vast range of AmI architectures that essentially aim to provide the
appropriate infrastructure for AmI systems. Typically, they include many sensors of
diverse types, information processing systems or computing devices where mod-
eling and reasoning occur, and actuators through which the system acts, reacts, or
pre-acts in the physical world. There are many permutations of enabling technol-
ogies and computational processes of AmI, which result in many heterogeneous
components (devices and systems and associated software applications) which have
to interconnect and communicate seamlessly across disparate networks as part of
vast architectures enabling context awareness, machine learning and reasoning,
ontological representation and reasoning, and adaptation of services. The sensors
are basically utilized to acquire the contextual data needed for the context recog-
nition process—that is, observed information as input for AmI systems to analyze,
model, and understand the user's context, so as to undertake appropriate actions in
a knowledgeable manner. Sensor technology is thus a key enabler of context
awareness functionality in AmI systems. Specifically, to acquire, fuse, process,
propagate, interpret, and reason about context data in the AmI space to support
adaptation of services requires using dedicated sensors and signal and data pro-
cessing techniques, in addition to sophisticated context recognition algorithms
based on a wide variety of methods and techniques for modeling and reasoning.
The challenge of incorporating context awareness functionality in the AmI service
provision system lies in the complexity associated with sensing, learning, capturing,
representing, processing, and managing context information.
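As a rough, hedged illustration of this sense, model/reason, and act decomposition (a sketch under simplified, assumed interfaces; the class and method names are invented for illustration and do not correspond to any particular AmI architecture), the following shows how sensors, a reasoning component, and actuators might be wired together:

from typing import Dict, List, Protocol

class Sensor(Protocol):
    def read(self) -> Dict[str, float]: ...   # raw, low-level observations

class Actuator(Protocol):
    def act(self, command: str) -> None: ...  # effect a change in the physical world

class Reasoner:
    """Turns fused low-level readings into a high-level context label."""
    def infer(self, observations: Dict[str, float]) -> str:
        # Placeholder logic; a real system would use learned or rule-based models.
        return "presence_detected" if observations.get("motion", 0.0) > 0.5 else "idle"

def control_loop(sensors: List[Sensor], reasoner: Reasoner, actuators: List[Actuator]) -> None:
    """One pass of the sense -> model/reason -> act cycle."""
    observations: Dict[str, float] = {}
    for sensor in sensors:
        observations.update(sensor.read())       # acquisition and fusion
    context = reasoner.infer(observations)       # interpretation and reasoning
    for actuator in actuators:
        actuator.act(f"adapt_to:{context}")      # adaptation of services

class MotionSensor:
    def read(self) -> Dict[str, float]:
        return {"motion": 0.8}  # stubbed reading for illustration

class LightActuator:
    def act(self, command: str) -> None:
        print(f"light actuator received: {command}")

control_loop([MotionSensor()], Reasoner(), [LightActuator()])
# -> light actuator received: adapt_to:presence_detected

In an actual AmI architecture, each of these components would typically be distributed across heterogeneous, networked devices rather than running in a single process.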
Context-aware systems are increasingly maturing and rapidly proliferating,
spanning a variety of application domains, owing to recent advances in capture
technologies, the diversity of recognition approaches, multi-sensor fusion techniques,
and sensor networks, as well as pattern recognition algorithms and representation
and reasoning techniques. Numerous recognition approaches have been developed
and studied, and a wide variety of related projects have been carried out within
various domains of context awareness. Most early research work on context
awareness focused on the user's physical context, which can be inferred using different
types of sensing facilities, including stereo-type cameras, RFID, and smart devices.
While most attempts to use context awareness within AmI environments were
centered on the physical elements of the environment or users, in recent years,
research in the area of context recognition has shifted the focus to human elements of
context, such as emotional states, cognitive states, physiological states, activities,
and behaviors. This has led to the development and employment of different rec-
ognition methods, mainly vision-based, multisensory-based, and sensor-based
context and activity recognition approaches. Furthermore, investigating methods for
context recognition in terms of approaches to context information modeling and
reasoning techniques for context information constitutes a large part of a growing
body of research on context awareness technology and its use in the development of
AmI applications that are adaptable and capable of acting autonomously on behalf of
users. Different types of contexts can be recognized using machine learning tech-
niques to associate sensor perceptions to human-defined context labels—classifi-
cation. Specifically, the sensor context data which are acquired and pre-processed
are analyzed using machine learning techniques to create context models and carry
out further pattern recognition—e.g., probabilistic reasoning—to determine
high-level context. Deriving high-level context information from raw sensor data by
means of interpretation and reasoning is about bringing meaning to low-level con-
textual data. However, there is a multitude of recognition algorithms beyond those
based on machine learning techniques that have been proposed and studied on the
basis of the manner in which the context is modeled, represented, and reasoned about.
Accordingly, different modeling methods and reasoning techniques have been used
in the field of context-aware computing, apart from supervised and unsupervised
learning methods, including ontological, logical, rule-based, and case-based repre-
sentation and reasoning. Recent research work shows a propensity towards adopting
hybrid approaches to representation and reasoning, which entail integrating related
techniques based on the application domain (see Chap. 5). The key aim is to harness
the context awareness functionality so as to generate accurate high-level abstractions
of contexts, such as physical activities, emotional states, cognitive states, and
communication intents.
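As a concrete but hedged illustration of this classification step (a minimal sketch assuming pre-processed feature vectors and human-defined labels; scikit-learn is used only as one possible toolkit, and the feature names and data are invented for illustration), the snippet below maps sensor-derived features to high-level context labels:

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Each row is a pre-processed feature vector extracted from raw sensor signals,
# e.g. [mean acceleration, acceleration variance, ambient light (lux), heart rate (bpm)].
X_train = np.array([
    [0.02, 0.001, 300.0, 62.0],   # sitting
    [0.05, 0.002, 280.0, 70.0],   # sitting
    [1.10, 0.450, 500.0, 95.0],   # walking
    [1.30, 0.500, 520.0, 101.0],  # walking
    [2.60, 1.200, 550.0, 140.0],  # running
])
# Human-defined context labels associated with the sensor perceptions.
y_train = ["sitting", "sitting", "walking", "walking", "running"]

clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X_train, y_train)

# A new, unseen window of sensor data is mapped to a high-level context label.
new_window = np.array([[1.20, 0.480, 510.0, 98.0]])
print(clf.predict(new_window))        # -> ['walking']
print(clf.predict_proba(new_window))  # class probabilities support probabilistic reasoning

The same pattern applies to other context dimensions (emotional or cognitive states, communication intents), with the feature extraction and the label set changing accordingly.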
The intent of the chapter is to review the state-of-the-art sensor devices, rec-
ognition approaches, data processing techniques, and pattern recognition methods
underlying context recognition in AmI environments. An overview of the recent
advances and future development trends in the area of sensor technology is pro-
vided, focusing on novel multi-sensor data fusion techniques and related signal
processing methods. In addition, the evolving trend of miniaturization is high-
lighted, with a focus on MEMS technology and its role in the advancement of
sensing and computing devices. The observed future development trends include:
the miniaturization of sensing devices, the widespread use of multi-sensor fusion
techniques and systems, and the increasing applicability of autonomous sensors.
As to data processing and pattern recognition methods, emphasis is laid on machine
learning probabilistic techniques, particularly in relation to emotional and cognitive
context awareness and affective systems.

4.2 Sensor Technology

4.2.1 Sensor Definition and Sensor Types

A sensor is defined as a device that detects or measures some type of input from the
physical environment or physical property, such as temperature, light, sound,
motion, pressure, or other environmental phenomena, and then indicates or reacts to
it in a particular way. The output is a signal in the form of a human-readable display
at the sensor location or recorded data that are to be transmitted over a network for
further processing—e.g., to middleware for context management. Commonly,
sensors can be classified according to the type of energy they detect as signals: light
sensors (e.g., photocells, photodiodes), photo/image sensor (e.g., stereo-type cam-
era, infrared), sound sensors (e.g., microphones), temperature sensors (e.g., ther-
mometers), heat sensors (e.g., bolometer), electrical sensors (e.g., galvanometer),
pressure sensors (e.g., barometer, pressure gauges), motion sensors (e.g., radar gun,
speedometer, tachometer), orientation sensors (e.g., gyroscope), physical movement
sensors (e.g., accelerometers), and so forth.

4.2.2 Sensor Information and Diversity of Sensing Areas in Context-Aware Systems

AmI provides possibilities to support people in their everyday life activities.
Acquisition of sensor information about humans and their behavior and functioning
is an important factor, in addition to adequate knowledge for analysis of this
information by computing devices. Observed information about the human’s var-
ious states and dynamic models for the human’s mental, physical, behavioral, and
social processes serve as input for the process of computational understanding,
which entails the analysis and estimation of what is going on in the human’s mind
and body and the surrounding physical, social, and cultural environments.
Accordingly, for a context-aware system, one class of AmI applications, to be able
to infer high-level context abstraction based on the interpretation of and reasoning
on context information, it is first necessary to acquire low-level data from physical
sensors (and other sources). Researchers from different application domains within
the field of context-aware computing have investigated context recognition for the
past two decades or so by developing a diversity of sensing devices (in addition to
methods and techniques for signal and data processing, pattern recognition, modeling,
and reasoning tasks). Thus, numerous (types of) sensors are currently being
used to detect various attributes of context associated with both the physical
environment and human factors related context. Schmidt et al. (1999) catalog
different ways of sensing that could be utilized for detecting context. Table 4.1
presents a tabulated version of their discussion.

Table 4.1 Sensor technologies

What to sense          Sensors
Optical/vision         Photo-diode, color sensor, IR and UV sensor
Audio                  Microphones
Motion                 Accelerometers, mercury switches and angular sensors
Location               GPS, active badges
Biosensors             Pulse, galvanic skin response measure, heart rate
Specialized sensors    Touch sensor, thermometer, barometer

Source Schmidt et al. (1999)
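To show how readings from several of the sensing modalities listed in Table 4.1 might be fused into a single feature vector for downstream recognition, here is a purely illustrative sketch; the sensor names, units, and the simple mean-based fusion scheme are assumptions rather than a reference design:

from statistics import mean
from typing import Dict, List

def fuse_readings(windows: Dict[str, List[float]]) -> Dict[str, float]:
    """Simple feature-level fusion: summarize each modality's window of raw
    samples into one scalar feature (here, the window mean)."""
    return {f"{modality}_mean": mean(samples) for modality, samples in windows.items()}

# Hypothetical synchronized windows of raw samples from heterogeneous sensors.
raw = {
    "accelerometer": [0.9, 1.1, 1.3, 1.0],      # motion (m/s^2)
    "microphone":    [0.20, 0.30, 0.25, 0.22],  # audio energy (normalized)
    "light":         [480.0, 500.0, 495.0],     # optical (lux)
    "heart_rate":    [96.0, 98.0, 97.0],        # biosensor (bpm)
}
features = fuse_readings(raw)
print(features)
# e.g. {'accelerometer_mean': 1.075, 'microphone_mean': 0.2425, ...}

Such a fused feature vector is what would then be handed to the modeling and reasoning stages discussed above.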
How many and what types of sensors can be used in a given context-aware
system is determined by the way in which context is operationalized (defined so that
it can be technically measured and thus conceptualized) and the number of the
entities of context that are to be incorporated in the system based on the application
domain, such as location, lighting, temperature, time, physical and computational
objects, task, user’s state and goal, personal event, social dynamics, proximity to
people, and so on, and also whether and how these entities can be combined to
generate high-level abstraction of context (e.g., the cognitive, physical, emotional,
and social dimension of context). In relation to human factors related context,
various kinds of sensors have been used to detect human movement, e.g., verbal
and nonverbal behavior, which provide a wealth of contextual information as
implicit input to context-aware systems, indicating user’s emotional, cognitive,
physiological, and social state as well as activities. Human movement as a source of
context information has been under rigorous investigation in the area of context
recognition, in particular in relation to sensing devices.

4.2.3 Emerging Trends in Sensor Technology

Recent advances in sensor technology have given rise to a new class of miniaturized
devices characterized by novel signal processing methods, high performance,
multi-fusion techniques, and high-speed electronic circuits. Subsequently, research
on context awareness has started to focus on the use of multiple miniature, dense
sensors so as to embed context awareness in computer systems of various scales.
A multitude of sensors are already entrenched in very small or large ICT devices, and it is
only a matter of time before advanced use can be gained from these complex
technologies; it is predicted that AmI will be densely populated by ICT devices and
systems with potentially powerful nano-bio-information and communication
capabilities (Riva et al. 2005). The miniaturization trend is increasingly making it
possible to incorporate multiple smart sensors in context-aware systems, owing to
sensors being manufactured on a micro- and nanoscopic scale. The trends toward
AmI are driving research into ever-smaller sizes of sensors capable of powerfully
sensing complex aspects of context at very low cost. The production of components
and devices with a low cost-to-performance ratio is further driven by the rapid
development of sensor manufacturing technologies. In context-aware systems as
well as affective and conversational systems, recognizing the user’s emotional,
cognitive, physiological, and behavioral states becomes possible because of the
advancement of multimodal user interfaces (see Chap. 6). These are equipped with
many types of miniature sensors, as they incorporate various types of naturalistic
interfaces, such as facial interface, gesture interface, voice interface, motion
tracking interface, gaze-based interface, and conversational interface.

4.3 Miniaturization Trend in AmI

4.3.1 Miniature System Devices and Their Potential

AmI is about the omnipresence of invisible technology in everyday human environments. Physical invisibility of technology is a key feature of AmI. Countless
tiny, distributed, networked devices are invisibly embedded in the environment.
This is enabled by the miniaturization of sensing and computing devices and their
seamless integration and communication. Alteration to the notion of a computer is
evident: new concepts wherein computing power is distributed or dispersed among
a multitude of dedicated devices are increasingly prevailing. In a world of AmI,
a myriad of invisible sensing devices and intelligent components will be entrenched
in virtually everything around us, and unobtrusively functioning in the background
of human life, interacting with each other and their environment.
Weiser (1991) proposed three basic forms for ubiquitous system devices: (1) tabs
which are wearable centimeter sized devices (e.g., smartphone); (2) pads which are
hand-held decimeter-sized devices (e.g., laptop), and (3) boards which are
meter-sized interactive display devices (e.g., horizontal surface computers). What
characterizes these three forms is that they are macro-sized, have a planar form, and
incorporate visual output displays. Three additional forms have been proposed for
ubiquitous computing devices: (1) dust: miniaturized devices can be without visual output displays,
ranging from nanometers through micrometers to millimeters; (2) skin: fabrics
based upon light emitting and conductive polymers, organic computer devices, can
be formed into more flexible non-planar display surfaces and everyday objects;
(3) clay: ensembles of MEMS can be formed into arbitrary three dimensional
shapes as artifacts, resembling many different kinds of physical objects (Poslad
2009). See below for related early research projects.
Miniaturization offers nanotechnology and nanoengineering breakthroughs in
computing. The research in the area of nanotechnology and nanoengineering is
expected to yield major shifts in ICT performance and the way mechatronic
components, devices, and systems are manufactured, designed, modeled, and
implemented, thereby drastically changing the nature and structure of computers.
The performance of ICT is associated with efficiency improvement in terms of such
features as computational speed, energy efficiency, bandwidth, memory, and
wireless communication network. And the miniaturization trend is increasingly
making it possible to develop both on-body and remote smart sensors that allow
registering various human parameters (e.g., emotional, cognitive, physiological, and
behavioral cues) in an unintrusive way, without disturbing human actors. This is
instrumental in enhancing the computational understanding of the human's mind,
body, actions, and activities (which entails analysis, interpretation, and reasoning
tasks that in turn occur in miniaturized intelligent components) and thus the system
behavior in the physical world (which takes place through tiny actuators or effec-
tuators) with regard to responding intelligently to facial and gestured indications
associated with human psychological and physiological states and multimodal
communication intents and behaviors. In particular, sensors are being manufactured
on a nano- and microscale—nano- and micro-sensors. This has been boosted by
rapid advances in such sensor technologies as piezo-materials, VLSI (Very Large
Scale Integration) video, optical gyros, MEMS (Micro-Electro-Mechanical
Systems) (Saffo 1997), and NEMS (Nano-Electro-Mechanical Systems). Applied
research and engineering in MEMS and NEMS have undergone major develop-
ments over the last three decades, and high-performance NEMS and MEMS have
been manufactured and implemented in a wide variety of sensors, including
accelerometers and microphones, actuators, molecular wires and transistors, and so
forth (Lyshevski 2001). NEMS and MEMS technology is giving rise to a new class
of devices and systems that are increasingly used in AmI and AI, spanning a wide
variety of application domains.
MEMS are distinct from the hypothetical vision of molecular nanotechnology,
where the research and development is primarily concentrated on design, modeling,
simulation, and fabrication of molecular-scale devices; molecular technology
allows designing and manufacturing the atomic-scale devices with atomic preci-
sion, designing nano-scale devices ranging from electromechanical motion devices
(e.g., rotational actuators and sensors) to nano-scale integrated circuits (e.g., diodes
and transistors, logic gates and switches, capacitors) (Ibid). Nanoengineering is
very challenging because of the complex multidisciplinary nature involving biology
and chemistry, engineering and physics, mathematics and medicine, and technology
and material science (Ibid).

4.3.2 Early Dust, Skin, and Clay Projects

After the UbiComp vision gained footing during 1990s, numerous research initia-
tives were launched within the field of sensor technology under the label of
micro-system design and embedded systems, spanning Canada, European continent,
and Japan. But most early research initiatives in this area took place in the USA.
The trend towards manufacturing miniature sensors started to flourish in the
mid-1990s, a few years after the vision of UbiComp was articulated by Mark Weiser.
Subsequently, UbiComp vision sparked a large number of research projects. Early
related projects in the USA, dedicated to embedded technology, were undertaken by
the ICT industry and universities. A technology known as ‘smart painting’, which is
a random network of wall-painted computers, was studied at MIT. A similar
technology is Smart Matter, a project that began in the late 1990s at PARC, a
subsidiary of Xerox Corporation. This project was underpinned by MEMS which
enabled to mass produce large numbers of integrated sensors, actuators, computer
devices, and communication systems (on a single chip) that can be embedded within
mobile devices and daily artifacts or spread throughout the environment (Wright
2005). A similar technology is ‘smart dust’ (Kahn et al. 1999), a project that was
launched by the University of California at Berkeley and started in 1997 and fin-
ished in 2001. The project developed tiny sensors dubbed ‘smart dust’: the proto-
type is illustrated in Fig. 4.1. Smart dust (or mote) is ‘a cloud of tiny speckles, each
one of millimeter dimension of active silicon'; each mote senses, communicates, and powers
itself: 'converts sunlight into electricity, locally elaborates information, localizes
itself, both in absolute and relative to other particles, communicates with other ones
within a few meters; furthermore, they jointly act as a coordinated array of sensors, a
distributed computer, and as a communications network’ (Riva et al. 2003). The
goal of the project was to develop a complete sensor network node in a single cubic
millimeter where tiny sensors are augmented with wireless connectivity capable of
organizing themselves into flexible networks, and characterized by free-space
communication at optical frequencies, power consumption of a few milliwatts,
adoption of power management strategies, processing power, and so on (Wright
2005; Riva et al. 2003).

Fig. 4.1 Smart dust. Source Kahn et al. (1999)

A sensor network is a low-speed network that is used to
connect sensors to actuators. Multiple sensor networks can be coupled to form
device networks. And as they become smart, sensors can pre-process their own data
to reduce communications (Sanders 2009a).
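As a hedged sketch of what such on-node pre-processing can look like (the window size, threshold, and node behavior are illustrative assumptions, not the smart dust design), a 'smart' sensor node might summarize a window of raw samples locally and transmit only when something noteworthy changes:

from statistics import mean
from typing import List, Optional

class SmartSensorNode:
    """Pre-processes its own raw samples and reports only significant changes,
    reducing the amount of data sent over the low-power sensor network."""
    def __init__(self, window_size: int = 10, change_threshold: float = 0.5) -> None:
        self.window: List[float] = []
        self.window_size = window_size
        self.change_threshold = change_threshold
        self.last_reported: Optional[float] = None

    def sample(self, value: float) -> Optional[float]:
        """Accept one raw sample; return a summarized reading to transmit, or None."""
        self.window.append(value)
        if len(self.window) < self.window_size:
            return None                  # keep accumulating locally
        summary = mean(self.window)
        self.window.clear()
        if self.last_reported is None or abs(summary - self.last_reported) > self.change_threshold:
            self.last_reported = summary
            return summary               # worth the radio transmission
        return None                      # suppress redundant traffic

node = SmartSensorNode(window_size=5, change_threshold=0.5)
for raw in [20.1, 20.0, 20.2, 20.1, 20.0,   # first window -> reported (no baseline yet)
            20.1, 20.0, 20.1, 20.2, 20.1]:  # second window -> suppressed (no real change)
    report = node.sample(raw)
    if report is not None:
        print(f"transmit: {report:.2f}")    # -> transmit: 20.08

Coupled over a sensor or device network, many such nodes trade a little local computation for a large reduction in communication, which is exactly the rationale cited above.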

4.4 MEMS Technology

The field of sensor technology has changed quite dramatically over the past two
decades due to the advent of such technologies as MEMS, piezo-materials, and
VLSI video. Research shows that MEMS are by far of most importance as to
enabling the rise of microscale sensors and actuators. Sensor technology has, thanks
to the trend of miniaturization, undergone some significant transitions and continues
to evolve rapidly. The focus of research has shifted mainly from macro to micro-
scale devices, owing to the development of MEMS technology. In view of that, the
criteria that are being used to gauge the operational capabilities of the evolving
miniaturized devices include: intelligence, system-on-a-chip integration, high per-
formance, computational speed, integrity, efficiency, size, communication, reli-
ability, energy, cost, and so on. ‘Until recently…sensors tended to be simple,
unintelligent, connected directly into control systems, and static…, but all that is
changing. Wireless networks are becoming increasingly common and some smaller
sensors are becoming mobile so that networks of sensors can work in mobile teams
(or swarms)… Sensors are becoming “Smart Sensors” that can pre-process their
own data to improve quality and reduce communications’ (Sanders 2009a). The
emphasis will be given to (the integrated large-scale) MEMS, in addition to pro-
viding a short account of piezo-materials and VLSI video.

4.4.1 Defining Characteristics of MEMS

Microelectromechanical systems (MEMS) technology (also written as micro-electro-mechanical
or MicroElectroMechanical systems, and referred to as microsystems technology
(MST) in Europe) is about tiny mechanical devices driven by electricity.
Lyshevski (2001, p. 26) defines MEMS ‘as batch-fabricated microscale devices (ICs
and motion microstructures) that convert physical parameters to electrical signals
and vice versa, and in addition, microscale features of mechanical and electrical
components, architectures, structures, and parameters are important elements of their
operation and design.’ As integrated systems, MEMS incorporate smart
micro-sensors and actuators with signal-conditioning electronics on a single silicon
chip. MEMS usually consist of a central unit to process data (the microprocessor) and
several components that interact with the outside such as micro-sensors (Waldner
2008) and micro-actuators. Specifically, the subsystems of MEMS comprise
microscale sensors (detection and measurement of changes of the physical variables), microelectronics/ICs (signal processing, data acquisition, decision making,
etc.); and microscale actuators (actuating of real-world systems) (Lyshevski 2001).
As microscale devices, MEMS are made up of components between 1 and 100 μm
(micrometres) in size (i.e., 0.001–0.1 mm), and generally range in size from 20 μm to
a millimeter—smaller than a millimeter long. MEMS can be fabricated using
modified semiconductor device fabrication technologies, such as molding and
plating, wet etching, dry etching, electro-discharge machining, and other technolo-
gies capable of manufacturing microscale devices. Typically, MEMS microscale
structures or subsystems—actuators, sensors, and ICs—are fabricated using com-
plementary metal oxide semiconductor (CMOS), surface micromachining and
photolithography, near-field optical microscopy and magneto-optics, and other
leading-edge microfabrication technologies and processes (Ibid).

4.4.2 Large Scale Integrated MEMS

A great variety of MEMS has been designed and used in the field of computing,
including AI, AmI, UbiComp, and HCI, but commonly used MEMS differ from
what is called ‘large scale integrated MEMS’ in terms of complexity. Many of such
MEMS are too complex to be studied (Ibid). Nevertheless, ‘novel optimized…
MEMS architectures (with processors or multiprocessors, memory hierarchies and
multiple parallelism to guarantee high-performance computing and decision mak-
ing), new smart structures and actuators/sensors, ICs and antennas, as well as other
subsystems play a critical role in advancing the research, developments, and
implementation’ (Ibid, p. 15). Accordingly, the so-called flip-chip, which replaces
wire banding to connect ICs with micro- and nanoscale actuators and sensors, offers
benefits in the implementation of advanced flexible packaging, improves reliability
and survivability, and reduces weight and size, in addition to other improvements
of performance characteristics (Ibid). Figure 4.2 illustrates flip-chip MEMS.

Fig. 4.2 Flip-chip monolithic MEMS with actuators and sensors. Source Lyshevski (2001)

The flip-chip assembly attaches actuators and sensors directly to ICs (components
to perform signal conditioning data acquisition, computations and control, decision
making, etc.), and they are mounted face down with bumps on the pads. This is to
form electrical and mechanical joints to the ICs substrate.
Monolithic MEMS are integrated microassembled structures (electromechanical
microsystems on a single chip) that combine microscale sensors and actuators as
motion microstructures (microsensors sense the physical variables, and microactua-
tors actuate real-world systems) with electrical/electronic ICs, composing the major
class of MEMS (Ibid). And the large scale integrated MEMS that can be
mass-produced at low cost integrates N nodes of actuators/sensors, smart structures;
ICs and antennas; processor (multiprocessor) and memories; high performance
interconnection networks (communication busses) and input–output (IO) subsystems.
Figure 4.3 illustrates a high-level functional block diagram of large-scale MEMS.

Fig. 4.3 High-level functional block diagram of large-scale MEMS with rotational and
translational actuators and sensors. Source Lyshevski (2001)

Large-scale integrated MEMS are of far greater complexity than the
MEMS that are being used today, as they can integrate ‘thousands of nodes of
high-performance actuators/sensors and smart structures controlled by ICs and
antennas; high-performance processors or superscalar multiprocessors; multi-level
memory and storage hierarchies with different latencies (thousands of secondary
and tertiary storage devices supporting data archives); interconnected, distributed,
heterogeneous databases; high-performance communication networks (robust,
adaptive intelligent networks).’ (Ibid).
As mentioned above, apart from MEMS, there is a suite of technologies
underlying the rise of miniaturized sensors, including piezo-materials and VLSI
video. Made typically of ceramics, piezo-materials ‘give off an electrical charge
when deformed and, conversely, deform when in the presence of an electrical field.
Put a charge in, the material deforms; deform the material, it sends out a charge.
Piezos are particularly useful as surface-mount sensors for measuring physical
movement and stress in materials. But more importantly, piezos are useful not just
for sensing, but for effecting—manipulating the analog world. This is an indicator
of the real significance of the sensor decade. Our devices won’t merely sense and
observe. They will also interact with the physical world on our behalf.’ (Saffo
1997). As far as VLSI video is concerned, it is a videocam built ‘on a single chip:
the charge-coupled device (CCD), all the circuitry needed and even the lens will be
glued directly to the chip’ (Ibid). The rapid global progress in VLSI is one of the
factors that have driven the development of MEMS. VLSI technology (or CMOS)
can be used to perform the fabrication of microelectronics (ICs), and the fabrication
of motion microstructures is also based upon VLSI technology or micromachining;
microelectronics and micromachining are two basic components of MEMS
(Lyshevski 2001). In relation to AmI, VLSI video is of particular relevance to the
design and performance of the kind of context-aware applications (multimodal
users interfaces) that use emotional and cognitive cues from external carriers, such
as affect display and eye movement, to recognize or infer the user’s emotional and
cognitive states, as well as to conversational agents which attempt to receive (and
respond) to user’s multimodal nonverbal communication behavior.

4.4.3 Potentials and Advantages

Miniaturization offers nanotech breakthroughs in computing which have a direct
effect on society. AmI and UbiComp offer a vision of the future of computing (or
ICT) in society. Looking at behavioral patterns of technological developments,
miniaturization of devices and systems is of great potential in shaping the future of
technology as well as the development of society. The miniaturization trend has
indeed become deeply embedded into modern society due to the associated
far-reaching benefits from the results. The idea was appreciated long before the
technology existed to actually make minute devices and systems and mainstream
their use at the societal level. As early as 1959, Richard Feynman, the Nobel Laureate,
considered the ability to manipulate matter on an atomic scale in a talk to the
American Physical Society in which he highlighted the potential role of
nanotechnology and nanoscale organic and inorganic systems on the society and
development. In fact, while the shift in cost and performance makes technology
widespread, accessible, and socially accepted, the shift in size defines each era of
computing, and thus indicates a new paradigmatic shift in computing: from
mainframe computing (prevailing for two decades, 1960–1980), through personal
computing (for a decade, 1980–1990) and multiple computing devices per person
(e.g., laptop, mobile phone, PDA) from 2000 onwards, to invisible com-
puting in this decade, the 2010s and onwards. While miniaturization of technology has
been the guiding principle of computing technology design for quite some time, it is
about to reach its mature stage in the era of AmI computing. The trend towards AmI
is driving research into ever smaller, cheaper, more effective, and smarter devices
and systems, critically demanding continuous fundamental, applied, and techno-
logical enhancement.
The creation and development of MEMS seems to be a salient factor with high
potential to turn the vision of AmI into reality, given the associated technological
offerings—performance improvement of multiple features of sensor technology.
Indeed, limited, imperfect, imprecise, or uncertain features of sensor data have
implications for the reasoning processes involved in complex inferences, regardless of the
acknowledged huge potential of machine learning techniques and algorithms.
However, MEMS technology has a serious, evolutionary engineering potential. The
development of sensors tending to acquire data and convert that data into electrical
signals to feed higher level systems has been driven by the need to reduce size and
cost while increasing performance; and MEMS could revolutionize the sensor
market by providing power, space, time efficiency, and reliability at minimal cost.
Due to the extremely high level of integration of electromechanical components with
accuracy, efficiency, reliability, low cost and maintenance, ruggedness, and sur-
vivability, MEMS can be applied to a wide variety of microscale devices, e.g.,
accelerometers, gyroscopes (Lyshevski 2001), and other types of sensors. Moreover,
MEMS provide high flexibility in terms of the way they can be embedded in,
spread over, or take the shape of various everyday objects. Ensembles of MEMS can
be formed into artifacts, resembling many different kinds of physical objects (Poslad
2009), as they are built on a single chip. Underlying MEMS technology is an
interesting mind-shift in chip design (Saffo 1997). Furthermore, the development in
the field of MEMS has been boosted by rapid global progress in ICs, VLSI, mate-
rials, microprocessors, memories, and so on that have revolutionized instrumenta-
tion, control, and systems design philosophy, and this progress has moreover
facilitated explosive development in data processing (massive localized processing
power) and (wireless) communications in high-performance systems (Lyshevski
2001). Recent development in MEMS with massive localized processing power has
tremendously improved the performance of sensor technology and allowed for more
accurate sensing and efficient control of complex processes and systems.
Intelligence, performance, efficiency, and reliability are considered as key criteria to
gauge the operational capabilities of MEMS. Designing high-performance,
intelligent MEMS is needed in order for applications to accomplish many functions,
such as programming and self-testing; collecting, compiling, and processing
information; multivariable embedded high-density array coordinated control; and
calculation and decision making with outcome prediction, actuation, and control
(Lyshevski 2001). In line with that, Sanders (2008) envisages a change through
amalgamating smart mobile sensors with advances in microprocessors, new algo-
rithms, remote calibration, automatic ranging, effective wireless communication, and
enough energy to move themselves around within their environment. Sensor
mobility is a particular focus in current research on AmI; it is expected to herald a
new class of sensor devices based on dynamically defined sensor configurations,
where the location of a sensor and its characteristics are flexibly changeable
depending on the situation: sensors can deploy themselves around the environment,
wearable devices may change depending on the activities, sensor location on the
body may change, and sensors may be added to or removed from an instrumented
environment (see European Project Opportunity 2009–2011).

4.4.4 Technical and Theoretical Issues and Challenges

Like other enabling technologies of AmI, MEMS technology poses many issues
and challenges associated with research and development, design and engineering,
and manufacturing and fabrication. Due to the scope of this chapter, a great number
of problems and phenomena will not be covered here, including fabrication and
manufacturing. Lyshevski's (2001) book is essential reading for those who are
interested in exploring MEMS and NEMS in their complexity and variety.
There are fundamental and computational problems posed by the complexity of
large scale MEMS that need to be addressed, formulated, and solved. The emer-
gence of high-performance computing has dramatically affected the fundamental
and applied research in MEMS, creating a number of very challenging problems.
Advancing the theory and engineering practice of MEMS requires high-
performance computing and advanced theory (Ibid). Given the size and complex-
ity of MEMS, the standard concepts of classical and fundamental theories of
physics (e.g., quantum mechanics, molecular dynamics, electromagnetics,
mechanics and thermodynamics, circuitry theories, and other fundamental con-
cepts) and conventional computing technologies (e.g., modeling, simulation)
cannot be straightforwardly applied to large-scale integrated microscale devices
(MEMS), given their high degree of complexity. 'Current advances and
developments in modeling and simulation of complex phenomena in NEMS and
MEMS are increasingly dependent upon new approaches to robustly map, compute,
visualize, and validate the results clarifying, correlating, defining, and describing
the limits between the numerical results and the qualitative-quantitative analytic
analysis in order to comprehend, understand, and grasp the basic features.
Simulations of NEMS and MEMS require terascale computing that will be avail-
able within a couple of years. The computational limitations and inability to
develop explicit mathematical models (some nonlinear phenomena cannot be
comprehended, fitted, and precisely mapped) focus advanced studies on the basic
research in robust modeling and simulation under uncertainties. Robust modeling,
simulation, and design are critical to advance and foster the theoretical and engi-
neering enterprises.’ (Ibid, p. 19). For a thorough study of ‘a broad class of fun-
damental and applied problems ranging from fundamental theories (quantum
mechanics and electromagnetics, electromechanics and thermodynamics, structural
synthesis and optimization, optimized architecture design and control, modeling
and analysis, etc.) and numerical computing (to enable the major progress in design
and virtual prototyping through the large scale simulations, data intensive com-
puting, and visualization)’, the reader is directed to Lyshevski (2001).
Advancing miniaturization towards the micro-level, with the ultimate goal of
designing and manufacturing large-scale intelligent MEMS that have microcomputers
(i.e., sensors, actuators, and ICs) as their core components, faces a great number of
challenging or unsolved issues. Some fundamental and computational problems
that have not been solved due to the complexity of large scale MEMS ‘include
nonlinearities and uncertainties which imply fundamental limits to formulate, set
up, and solve…design problems’ (Ibid, p. 20). Accordingly, given that micro-scale
devices must be controlled, an extremely challenging problem is to design MEMS
integrating control and optimization, self-organization and decision making, diag-
nostics and self-repairing, signal processing, and communication (Ibid). There is an
array of issues relating to sensor signals and conditioning, in particular, the inte-
gration of electronic circuitry with sensor systems and the integration of electronics
systems with MEMS, e.g., how advanced control techniques can be used to
improve the performance of accelerometers and how adaptive optical systems can
be combined with silicon-based systems (Gaura and Newman 2006). In addition,
there is a lack of a synergy theory to augment actuation, sensing, signal processing,
and control (Lyshevski 2001). In fact, the need to develop strategies to
integrate mechanical structures and ICs constitutes one of the main challenges in
MEMS fabrication (Ibid). Researchers in the field of MEMS are often faced with a
mammoth task when trying to adopt a top–down design strategy (Gaura and
Newman 2006).
Furthermore, constraints on resources such as energy, memory, size, and
bandwidth are considered among the difficult issues facing future research and
development in MEMS. In terms of size and energy, Sanders (2009a) points out
that the ‘potential need for smaller and more energy efficient sensors that can
operate autonomously in harsh industrial conditions…will drive research towards
more robust and fault tolerant MEMS that can automatically compensate for
variables such as temperature’. Considering that electronic sensor systems are
sometimes exposed to harsh environmental conditions, MEMS-based sensors or
sensors-on-a-chip could be robust in such conditions, as they seem to be able to
withstand high humidity, pressure, and temperature (Sanders 2008). In addition,
‘there are some interference problems that might become critical for wireless
communications between MEMS and they can also be limited by antenna size,
power, and bandwidth, and that is all being explored by some radio engineers.
MEMS will need the ability to cope with technology or communication failures and
large-scale deployments and large amounts of data will need new computer science
algorithms’ (Sanders 2009a).
Advanced intradisciplinary research and thus scholarly collaboration between
researchers from different subfields of computing is necessary to design, develop,
and implement high-performance MEMS. In addition to the complexity of
large-scale MEMS requiring new fundamental and applied research and develop-
ments, ‘there is a critical need for coordination across a broad range of hardware
and software. For example, design of advanced microscale actuators/sensors and
smart structures, synthesis of optimized (balanced) architectures, development of
new programing languages and compilers, performance and debugging tools,
operating system and resource management, high-fidelity visualization and data
representation systems, design of high-performance networks, etc. New algorithms
and data structures, advanced system software and distributed access to very large
data archives, sophisticated data mining and visualization techniques, as well as
advanced data analysis are needed. In addition, advanced processor and multipro-
cessors are needed to achieve sustained capability required of functionally usable
large-scale…MEMS.’ (Lyshevski 2001, p. 17).
The set of long-range goals that challenge the design, manufacturing, develop-
ment, and deployment of high-performance MEMS include advanced materials and
process technology; microsensors and microactuators; sensing and actuation
mechanisms; sensors-actuators-ICs integration and MEMS configurations; pack-
aging, microassembly, and testing; MEMS design, optimization, and modeling; and
MEMS applications and their deployment (Ibid). Research into modeling and
improving MEMS manufacturing and design techniques, in addition to HCI, AmI,
and AI, will lead to useful advances for the immediate and medium term future,
while ‘in the longer term, understanding the properties of MEMS materials and then
creating more capable and intelligent MEMS machines will lead to direct
brain-computer interfaces that will allow us to communicate our ideas directly to
machines (and to other human members of virtual teams) and that may change our
world beyond recognition.’ (Sanders 2009a). The boundaries to what may be
technologically feasible are for the future to tell.

4.5 MEMS and Multi-sensor Fusion and Context-Aware and Affective Computing

Over the last two decades, sensor technology has undergone a significant change,
especially in relation to the area of context-aware and affective computing, giving
rise to a new class of sensing devices characterized by multi-sensor data fusion
techniques and miniaturization. This has been boosted by recent discoveries in
cognitive science and AI and advances in micro-engineering enabled by interdis-
ciplinary research endeavors. In particular, there has been an ever-increasing
interest in interdisciplinary research on multi-sensor data fusion technology, driven
by its diverse application domains, such as mobile computing, UbiComp, AmI, and
AI, as well as its versatility and adaptability. Multi-sensor systems can significantly
contribute to the accuracy of the computation of measurement values and the
enhancement of the quality and availability of information (e.g., situational, emo-
tional, and cognitive). This pertains to new, sophisticated signal processing methods
for the improvement of sensor properties that are based on data fusion techniques.
As an area of systems engineering, electrical engineering, and applied mathematics,
signal processing refers to various techniques for enhancing the accuracy and
reliability of (digital) communications, and is concerned with operations on or
analysis of digitized (or analog) signals, using statistics over time or complex
mappings and algorithms for representing time-varying or spatially varying phys-
ical quantities. In relation to context-aware and affective computing, the
integration of multiple, diverse sensors with advanced signal processing algorithms
enables context-aware systems or affective systems to detect various contextual or
emotional cues that can be combined to make inferences about the user's situation,
activities, or psychological states (see below), or to recognize emotions.
Known for the highest degree of detection complexity, emotional and cognitive
states require miniature, multisensory devices with very sophisticated algorithms
for effective context awareness functionality and affective computation. MEMS
and multi-sensor fusion as technologies are intended to augment computer user
interfaces with the multi-sensor data fusion capabilities, efficiency, high performance,
reliability, and intelligence necessary for the effective functioning of emotional and
cognitive context-aware systems and affective systems as to accurate detection,
sound interpretation, rapid reasoning, and relevant, real-time delivery of adaptive
and responsive services to the user. Given the difficult task of recognizing psy-
chological states of human users, computer user interfaces must be equipped with
human-like interaction capabilities, including multimodality, multi-channeling,
dynamic perception, and intelligence—that is, multi-sensor fusion techniques and
MEMS technical features. There is a need for creating intelligent computers in the
form of direct brain-computer interfaces that will allow humans to communicate
what is going on in their mind (e.g., affective and cognitive states) directly to
computers, and MEMS and multi-fusion sensor technologies hold a great enabling
potential in this regard. In particular, if intelligent computers, which are exceeding
human performance in more and more tasks (Sanders 2009b), can be effectively
introduced into MEMS devices then computers ‘can be made to merge with us
more intimately and we should be able to combine our brain power with computer
capacity to create powerful’ artificial intelligent systems (Sanders 2009a).
The evolving multi-sensor data fusion (or distributed sensing) technology is
increasingly attracting interest among researchers within the area of computing, and
is gaining a significant ground in the research area of context awareness, affective
computing, and computational intelligence within AmI and AI alike. Multi-sensor
context awareness has been applied and widely investigated in a number of AmI and
AI projects. In response to a need for an analytical review of recent developments in
the multi-sensor data fusion domain, Bahador et al. (2013) provide a
comprehensive review of the data fusion state of the art, where they explore its
conceptualizations, benefits, challenging aspects, and existing methodologies, and
also highlight and describe several future directions of research in the data fusion
community. MEMS will add much to multi-sensor (context-aware and affective)
systems, which are considered to be more rewarding in relation to a number of
application domains. Given their integration features as to signal processing, wire-
less communication, control and optimization, and self-organization and decision
making, MEMS will revolutionize sensing devices by improving energy efficiency,
intelligence, memory, computational speed, and bandwidth. These are crucially
important factors for the effective operation of sensors, especially as sensors deal with
huge amounts of raw data collected from multiple and often heterogeneous sources.
On a single silicon chip, MEMS integrate smart microscale sensors for detecting and
measuring changes of physical variables (and also human actions, activities, and
behaviors); microelectronics/ICs for signal processing, data acquisition, and deci-
sion making; and smart microscale actuators for activating real-world systems, e.g.,
context-aware systems, affective systems, and conversational systems.

4.6 Multi-sensor Based Context Awareness

4.6.1 Multi-sensor Data Fusion and Its Application in Context-Aware Systems

In relation to context awareness, underlying the multi-sensor fusion methodology is
the idea that context, as an abstraction composed of various, interrelated contextual
elements, can be generated or inferred on the basis of information detected from
multiple, heterogeneous data sources, which provide different, yet related, sensor
information. Thus, sensors should be integrated to yield optimal context recognition
results—that is, to provide a robust estimation of context, e.g., situation,
emotional state, and cognitive state. In terms of situation, Table 4.2 lists two
examples of how situations and data from different sensors may relate.
The physical dimension of context, as a higher level of context like the ones
presented above, can be deduced by using external context such as location,
temperature, time, lighting, acceleration, audio, motion, and so on as an atomic
level of context. In the same manner, the cognitive or emotional dimension of
context (psychological context), such as 'making a decision' and 'retrieving infor-
mation' or 'feeling bored' and 'feeling frustrated', can be deduced by using
internal context such as the user's goal and work setting or the user's facial
expressions and psychophysiological responses. The use of a multi-sensor fusion
approach in emotional context-aware systems, in particular, allows gaining
simultaneous access to the varied information necessary for accurate estimation or
inference. Multi-sensor fusion systems have the potential to enhance the information
gain while keeping the overall bandwidth low (Van Laerhoven and Gellersen 2001).
Figure 4.4 illustrates a multi-sensor fusion approach.

Table 4.2 Real-world situations related to sensor data

Situation: User sleeps
Sensor data: It is dark, room temperature, silent, indoor location, time is 'nighttime', user is horizontal, specific motion pattern, absolute position is stable

Situation: User is watching TV
Sensor data: Light level/color is changing, certain audio level (not silent), room temperature, type of location is indoors, user is mainly stationary

Source: Gellersen et al. (2001)

Fig. 4.4 Use of multiple, diverse sensors for emotional, cognitive, and situational context
awareness
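
As a rough illustration of how such atomic context can be lifted to a higher-level situation (in the spirit of Table 4.2), the following minimal Python sketch applies hand-written rules to a set of sensor readings; the sensor names, thresholds, and rules are illustrative assumptions, not a prescribed design.

# Illustrative sketch: rule-based deduction of a situation from atomic context.
# Sensor names, thresholds, and rules are assumptions for illustration only.
def infer_situation(readings: dict) -> str:
    """Map low-level sensor readings to a coarse situation label."""
    dark = readings.get("light_level", 1.0) < 0.1          # normalized illuminance
    silent = readings.get("audio_level", 1.0) < 0.05       # normalized loudness
    nighttime = 0 <= readings.get("hour", 12) < 6
    horizontal = readings.get("posture") == "horizontal"
    stationary = readings.get("motion") == "stationary"
    indoors = readings.get("location_type") == "indoor"
    light_changing = readings.get("light_variance", 0.0) > 0.2

    if dark and silent and nighttime and horizontal and indoors and stationary:
        return "user sleeps"
    if light_changing and not silent and indoors and stationary:
        return "user is watching TV"
    return "unknown"

example = {"light_level": 0.02, "audio_level": 0.01, "hour": 2,
           "posture": "horizontal", "motion": "stationary",
           "location_type": "indoor", "light_variance": 0.0}
print(infer_situation(example))  # prints "user sleeps" for these readings

In practice, such hand-written rules are quickly superseded by the learned, probabilistic, or ontological approaches discussed later in this chapter, but they make the atomic-to-higher-level mapping tangible.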

4.6.2 Layered Architecture for Emotional (and Cognitive) Context Awareness

Given that the emphasis in this chapter is on emotional and cognitive context
awareness in relation to sensing devices and information processing systems, a
layered architecture for abstraction from raw sensor data to multi-sensor based
emotional context is illustrated and described. This architecture is also applicable to
cognitive and situational context—with relevant changes in sensor types and related
signal processing and computation techniques. The keystones of the multi-sensor
context-aware system idea are:
• Integration of multiple, diverse sensors, assembled for collection or acquisition
of multi-sensor data independently of any specific application (e.g., emotional
state, cognitive state, situational state, task state);
• Association of multi-sensor data with emotional, cognitive, situational, or
activity contexts in which the user is, for instance, feeling frustrated, making a
decision, or watching TV; and
• Implementation of sensors and signal and data processing approaches and
pattern recognition methods for inferring or estimating emotional, cognitive, or
situational context from sensor data (values and cues).
To recognize an emotional context, pre-processing of multi-sensor data received
as digital signals from multiple, diverse sensors (used to detect facial, gestural,
psychophysiological, and speech cues) entails that these sensors are equipped with
interfaces that allow them to interact with one another using dedicated cross-
processing algorithms for the purpose of fusing and aggregating data from multiple
sources and transforming them into cues (application-dependent features), which
are used for further analysis through machine learning techniques to create emo-
tional context models and carry out further pattern recognition—making inferences
about the emotional context. Emotional context-aware systems are typically based
on a layered architecture for sensor-based computation of emotional context as
illustrated in Fig. 4.5, with separate layers for: raw sensor data, features extracted
from individual sensors, and context derived from cues. The idea is to abstract from
low-level emotional context by creating a model layer that gets the multi-sensor
perceptions to generate application actions.

Fig. 4.5 Layered architecture for abstraction from raw sensor data to multi-sensor based
emotional context
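
To make the separation of layers in Fig. 4.5 more tangible, the following Python outline sketches one possible decomposition into a sensor layer, a cue layer, and a context layer; the class names, method signatures, and wiring are illustrative assumptions rather than a prescribed API.

# Illustrative sketch of the layered idea: sensors produce raw data, the cue layer
# abstracts per-sensor features, and the context layer fuses cues into a high-level
# (emotional) context label. Names and signatures are assumptions for illustration.
from typing import Callable, Dict, List

class SensorLayer:
    """Open-ended collection of sensors, each returning raw samples when read."""
    def __init__(self, sensors: Dict[str, Callable[[], List[float]]]):
        self.sensors = sensors

    def read_all(self) -> Dict[str, List[float]]:
        return {name: read() for name, read in self.sensors.items()}

class CueLayer:
    """Per-sensor feature extraction from raw data into cues."""
    def __init__(self, extractors: Dict[str, Callable[[List[float]], Dict[str, float]]]):
        self.extractors = extractors

    def extract(self, raw: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
        return {name: self.extractors[name](samples) for name, samples in raw.items()}

class ContextLayer:
    """Fuses all available cues into a high-level (emotional) context label."""
    def __init__(self, classify: Callable[[Dict[str, Dict[str, float]]], str]):
        self.classify = classify

    def infer(self, cues: Dict[str, Dict[str, float]]) -> str:
        return self.classify(cues)

An application would then plug concrete sensors, feature extraction methods, and a recognition algorithm into these layers and map the inferred context to adaptive actions.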
The sensor layer is defined by an open-ended collection of sensors given that
emotion is multimodal in nature and involves multiple channels. Accordingly, the
data provided by sensors can be of different formats, ranging from slow sensors that
supply scalars (e.g., heart rate, galvanic skin response, electroencephalographic
response) to fast and complex sensors that provide larger volume data (e.g.,
microphone for capturing emotiveness and prosodic features of speech, video
camera for capturing facial expressions, and accelerometer for capturing gestures).
In a general architecture for context awareness, the sensor layer involves an open-
ended collection of many different sensors gathering a large volume of data about
various contextual features pertaining to the user, including cognitive state, task,
social dynamics, personal event, location, lighting, time, temperature, specific
motion pattern, behavior, absolute position, intention, work process, and so on—
more specifically, a great diversity and multiplicity of sensors, such as image
sensor, audio sensor, biosensor, light sensor, temperature sensor, motion sensor,
physical movement sensor, location sensor, to name but a few. These sensors are
utilized to acquire the contextual data needed for the recognition process as to
various entities of context.
Between the sensor layer and the emotional context model layer figures a cue layer.
This layer, in multi-sensor emotional context-aware systems, introduces cues as
abstraction from raw sensor data that represent features extracted from the data
stream of a single sensor. As shown in Fig. 4.5, many diverse cues can be derived
from the same sensor (image, motion, audio, or wearable). In reference to mobile
context-aware devices—but of relevance also to emotional context-aware systems,
Gellersen et al. (2001) point out that this ‘abstraction from sensors to cues serves to
reduce the data volume independent of any specific application, and is also referred
to as “cooking the sensors”… Just as the architecture does not prescribe any specific
set of sensors, it also does not prescribe specific methods for feature extraction in
this layer. However, in accordance with the idea of shifting complexity from
algorithms to architecture it is assumed that cue calculation will be based on
comparatively simple methods. The calculation of cues from sensor values may for
instance be based on simple statistics over time (e.g., average over the last second,
standard deviation of the signal, quartile distance, etc.) or slightly more complex
mappings and algorithms (e.g., calculation of the main frequencies from a audio
signal over the last second, pattern of movement based on acceleration values). The
cue layer hides the sensor interfaces from the context layer it serves, and instead
provides a smaller and uniform interface defined as set of cues describing the
sensed system environment. This way, the cue layer strictly separates the sensor
layer and context layer which means context can be modeled in abstraction from
sensor technologies and properties of specific sensors. Separation of sensors and
cues also means that both sensors and feature extraction methods can be developed
and replaced independently of each other. This is an important requirement in
context-aware systems and has motivated the development of [various] architec-
tures’. Architectures for emotional context awareness typically incorporate a spe-
cific set of specialized sensors and feature extraction algorithms. It is important to
extract meaningful features from raw data in order to derive the emotional context
visible from these features.
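
Following the quoted suggestion that cue calculation can rely on comparatively simple statistics over a short window (e.g., average, standard deviation, quartile distance over the last second), a minimal Python sketch of such per-sensor cue extraction might look as follows; the window contents and sensor names are synthetic and purely illustrative.

# Illustrative sketch: computing cues (simple per-sensor features) from one second
# of raw samples. Window length, sensor names, and feature choices are assumptions.
import statistics

def cues_from_window(samples):
    """Derive simple statistical cues from a window of raw samples of one sensor."""
    ordered = sorted(samples)
    n = len(ordered)
    q1, q3 = ordered[n // 4], ordered[(3 * n) // 4]
    return {
        "mean": statistics.fmean(samples),
        "std": statistics.pstdev(samples),
        "quartile_distance": q3 - q1,
    }

# Each sensor is processed independently; the resulting cue vectors hide the sensor
# interfaces from the context layer.
gsr_window = [0.41, 0.42, 0.40, 0.45, 0.44, 0.43]   # synthetic galvanic skin response
acc_window = [1.0, 1.2, 0.9, 1.1, 1.0, 1.3]          # synthetic acceleration magnitude
cue_vector = {"gsr": cues_from_window(gsr_window),
              "acc": cues_from_window(acc_window)}
print(cue_vector)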
The context layer introduces a set of emotional contexts which are abstractions
of real-world emotions (the state of a person’s emotions) of both negative and
positive valence, each as a function of the combined available cues. It is only at this level
of abstraction, after facial, gestural, psychophysiological, and prosodic feature
extraction and dimension reduction, data normalization, and noise elimination in
the cue layer, that information from multiple, diverse sensors is combined for
computation of emotional context. The architecture for emotional context aware-
ness does not prescribe specific methods for computationally reasoning about
emotional context from potential cues. Ontological algorithms, rule-based algo-
rithms, statistical methods, and neural networks may be employed; it is also feasible
to adopt a hybrid approach, which can integrate some of these approaches at the
level of representation, reasoning, or both, depending on the characteristics of the
concrete application (see next chapter for further detail). In the case of using only
facial expression as emotion carrier to detect emotional cues for computing some
basic user’s emotional states, emotional context awareness can be treated as a
typical machine learning classification problem (using unsupervised techniques),
the process of mapping between raw sensor data and an emotional context
description. In this case, the context-aware application automatically identifies or
recognizes a user’s emotional state based on a facial expression from a digital
image or a video frame from a video source, by comparing selected facial features
from the image and a facial expression database, for instance, thereby inferring
high-level emotional context abstraction on the basis of one emotion channel—
facial cues. In fact, most research has centered upon recognizing facial expression
as a source that conveys a wealth of contextual information. Otherwise emotional
context is calculated from all available cues generated from diverse types of sen-
sors. The mapping from cues to emotional context may be explicit, for instance
when certain cues are known to be relevant indicators of an emotional context—
e.g., emotional states deduced from six universal facial displays: happiness, anger,
disgust, sadness, fear, and surprise—in relation to specific applications, or implicit
as to other types of idiosyncratic or complex emotional states (e.g., interest,
disinterest, boredom, frustration) in the outcome of unsupervised or supervised
learning techniques. If an ontological approach to modeling and reasoning is used as a
basis for the recognition algorithm, emotional context recognition can be processed
using equivalency and subsumption reasoning in description logic, i.e., to test if two
emotional context concepts are equivalent or if an emotional context concept is
subsumed by one or more context concepts (see Chap. 5 for clarification).
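
By way of a hedged illustration of the mapping from cues to emotional context, the sketch below trains a simple supervised classifier (using scikit-learn) on fused cue vectors and predicts an emotional context label; the feature layout, labels, and toy data are hypothetical, and in practice the choice among rule-based, ontological, statistical, neural, or hybrid approaches depends on the concrete application, as discussed in the next chapter.

# Illustrative sketch: mapping fused cue vectors to emotional context labels with a
# simple supervised classifier. Features, labels, and training data are hypothetical.
from sklearn.ensemble import RandomForestClassifier

# Each cue vector fuses features extracted from several sensors, e.g.
# [smile_intensity, brow_lowering, speech_pitch_variability, gsr_mean, gesture_energy]
training_cues = [
    [0.9, 0.1, 0.6, 0.3, 0.7],
    [0.8, 0.2, 0.5, 0.4, 0.6],
    [0.1, 0.8, 0.2, 0.7, 0.2],
    [0.2, 0.9, 0.3, 0.8, 0.1],
]
training_labels = ["happiness", "happiness", "frustration", "frustration"]

model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(training_cues, training_labels)

new_cues = [[0.15, 0.85, 0.25, 0.75, 0.2]]
print(model.predict(new_cues))  # -> ['frustration'] for these toy data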

4.6.3 Visual Approach to (Emotional) Context Reading

As to the inference of emotional context (and also the estimation of emotional states),
visual types of emotion channels in their multiple forms, especially facial cues, are
more widely used (and accessible) than auditory channels, hence the more frequent use of
image sensors than audio sensors and biosensors. Indeed, they are currently of wide
applicability in the area of context-aware (and affective) computing. Visual context
constitutes a rich resource from which more specific context can be derived by
means of feature extraction and image analysis (Gellersen et al. 2001). There exist
various visually augmented wearable and embedded systems that embody a camera
and face recognition software to obtain highly specific visual context (see
Farringdon and Oni 2000). This relates to computer vision, an area which deals
with the capture of all visual aspects of context associated with facial displays, hand
gestures, eye movement, and other body movements, using image sensors. As a
technological area, computer vision aims to apply its theoretical models to the
building of computer vision systems with human visual capabilities. As a field of
study, computer vision involves methods for acquiring, processing, analyzing, and
understanding images and high-dimensional data from the real-world in order to
produce numerical or symbolic information (Klette 2014; Shapiro and Stockman
2001; Morris 2004; Jähne and Haußecker 2000). Computer vision has also been
referred to as the enterprise of automating and integrating a wide range of processes
and representations for vision perception (Ballard and Brown 1982; Barghout and
Sheynin 2013). As a branch of scientific knowledge—in relation to AmI, computer
vision deals with the theory behind context-aware and emotion-aware systems that
extract various types of information from images, where related data can take such
forms as video sequences or views from multiple cameras, which can be embedded
in user interfaces, wearable, or spread in the environments. Relevant sub-areas
of computer vision include learning, motion estimation, video tracking, object
recognition, and event detection.
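
As a minimal, hedged example of this visual pipeline, the snippet below uses OpenCV's bundled Haar cascade to detect a face region in a single image; the detected crop would then be handed to a facial-expression or emotion recognition model (not shown). The image path and detector parameters are illustrative.

# Illustrative sketch: detecting a face region as the first step of visual (emotional)
# context reading. The image path and parameters are illustrative; a real system would
# pass the face crop on to a facial-expression recognition model.
import cv2

cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
face_detector = cv2.CascadeClassifier(cascade_path)

image = cv2.imread("frame.jpg")   # e.g., a frame grabbed from a video source
if image is None:
    raise FileNotFoundError("frame.jpg not found; supply any test image")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
faces = face_detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

for (x, y, w, h) in faces:
    face_crop = gray[y:y + h, x:x + w]   # region to feed an emotion classifier
    print("face region:", (x, y, w, h))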

4.7 Research in Emotional and Cognitive Context Awareness

Research is burgeoning within the area of emotional and cognitive context aware-
ness. A range of related specialized hardware and software systems, including signal
processing methods, multi-sensor data fusion techniques, pattern recognition algo-
rithms, (hybrid) representation and reasoning techniques, multimodal user inter-
faces, and intelligent agents are under vigorous investigation—design, testing,
evaluation, and instantiation in laboratory settings. Today's state-of-the-art enabling
technologies and processes of human-factors-related context awareness are viewed as
satisfactory, and the increasing level of R&D into the next generation of these
technologies and processes is projected to yield further advances. The aim is to
augment future AmI systems with human-like cognitive and emotional capabilities
to enhance their functioning and boost their performance not only in terms of context
awareness but also in terms of affective computing and computational intelligence
(e.g., dialog and conversational systems and behavioral systems that can adapt to
human behavioral patterns). One key aim of ongoing research endeavors is to
improve the recognition or inference of highly complex, dynamic, and multidi-
mensional human contexts, such as multifaceted emotional states, demanding tasks,
and synchronized cognitive activities. Towards this end, it is necessary to advance
sensor technology, develop novel signal and data processing techniques and algo-
rithms, and create new dynamic models that consider the relationship between
cognition, emotion, and motivation in relation to human context, among others. In
relation to sensor technology, it is of equal importance to advance the use of natural
modalities (natural human communication forms), as they are crucial for the
effective functionality of emotional and cognitive context awareness in terms of
providing a wealth of contextual information. Emotional and cognitive context can
indeed be captured as an implicit input based on multiple signals from the user’s
verbal and nonverbal communication behavior. Hence, human factors context-aware
systems will be equipped with miniaturized, multisensory devices—embedded in
user interfaces, attached to the human body, and spread in the environment—that can,
in combination, detect complex cues of human context. Given their computational capa-
bilities, these sophisticated devices are aimed at capturing dynamic contextual
information by reading multimodal sources (e.g., emotional cues coming from facial
expressions, gestures, and speech and its prosodic features; cognitive cues coming
from eye movement and facial displays), thereby enabling complex inferences of
high-level context abstractions—emotional and cognitive states. Cognitive cues can
also be captured using software inference algorithms to recognize or infer the user’s
intention as a cognitive context. For a detailed account of emotional and cognitive
context awareness, the reader is directed to Chaps. 8 and 9, respectively.
Given the criticality of emotion and cognition in human functioning processes,
as part of AmI research, multi-sensor emotional and cognitive context-aware
systems need to be thoroughly tested and evaluated as instantiations in their
operating environments. The evaluation of their performance should be carried out
against rigorous assessment metrics before their implementation in real-world
environments in order to avoid potential unintended consequences that might result
from their inappropriate functioning, for instance in terms of unpredictability and
unreliability: when they do not function when they are needed and do not react in
ways they are supposed to react. In this regard, it is important to advance compu-
tational modeling of human cognitive processes (e.g., perception, interpretation,
evaluation, emotion, reasoning, etc.) and behaviors based on advanced knowledge
from human-directed sciences. The application of fundamental and computational
theories should be supported by new interdisciplinary studies crossing diverse
academic disciplines, such as cognitive science, cognitive psychology, cognitive
neuroscience, human communication, and so on. Moreover, the development of
emotional and cognitive context-aware applications should give a central role to the
user, especially as research in the area is still in its infancy, hence the opportunity
to develop better systems. It is valuable to carry out new empirical, in-depth studies
on user perception of context-aware interaction, with a particular emphasis on
emotional and cognitive aspects of context with the aim to inform and enhance
dynamic models of context for a successful real-world implementation of
context-aware applications. While creating a context-aware system that mimics
human cognitive processes—a brain-computer interface—is a daunting challenge due
to the complexity inherent in modeling such processes into computer systems,
computer scientists postulate that creating intelligent MEMS machines will lead to
direct brain-computer interfaces that will enable humans to communicate their
cognitive activities and emotional states to computers and that (machine intelli-
gence) may change our world beyond recognition.

4.8 Multi-sensor Fusion for Multimodal Recognition of Emotional States in Affective Computing

Multi-sensor fusion technology is of high applicability in affective computing, as it
dovetails with the multimodal nature of emotions, in addition to its versatility.
Affective (emotion-aware HCI) applications (e.g., Zhou et al. 2007) should be
equipped with the so-called multimodal user interfaces, which incorporate a wide
variety of miniature sensors designed to detect different types of signals pertaining
to users’ emotional or emotion-related states by reading various emotional cues,
including facial cues, vocal cues, gestural and bodily cues, physiological cues,
psychophysiological cues, and action cues. In an attempt to emulate the way
humans sense and perceive emotional cues in others, affective or emotion-aware
applications use auditory, visual, and biometric sensing modalities: image sensor to
capture facial expression, eye gaze, and gestures; audio sensor to detect prosodic
patterns; speech recognition device to capture emotiveness; and wearable biosensor,
attached to users, to measure psychophysiological data. But most attempts to rec-
ognize emotion have used a single sensor approach, by focusing either on facial
expressions or hand gestures (by image analysis and understanding), speech,
or psychophysiological responses (see below for related studies). Affective systems
are increasingly being equipped with multisensory devices used for multimodal
detection and feature extraction of emotional cues. Planning to expand their work
on facial expression recognition, Wimmer et al. (2009) state that they aim at
integrating multimodal feature sets and apply multi-sensor fusion. More research
projects within the area of conversational agents and affective computing are
working on developing methods for implementing multi-sensor fusion in emotion
recognition. One interesting project being carried out at the MIT is called ‘Machine
Learning and Pattern Recognition with Multiple Modalities’ (MIT Media Lab
2014). The aim of this project is to develop new theory and recognition algorithms
to enable computer systems to make rapid and accurate inferences from multiple
modes of data, i.e., determining a user’s emotional or affective state using multiple
sensors, including video, mouse behavior, chair pressure patterns, or physiology.
The underlying assumption is that the more an affective or emotion-aware system
knows about its user and the emotional situations in which it is used the better it can
provide user assistance.
Recognizing emotional states entails the extraction of features from the collected
raw data as a result of detection. At this level of abstraction, after facial, gestural,
psychophysiological, and prosodic feature extraction and dimension reduction, data
normalization, and noise elimination, information from multiple, diverse sensors is
combined for computation of the state of a user’s emotions. This process usually
entails parsing or analyzing data through a variety of processing environments
(application fields of signal and data processing), such as feature extraction (e.g.,
image understanding); emotional speech processing (e.g., speech recognition and
synthesis tools); image (facial expressions) processing—in digital cameras and
computers; audio signal processing—for electrical signals representing sound (e.g.,
speech); speech signal processing—for processing spoken words (e.g., emotive-
ness); and quality improvement (e.g., noise reduction and image enhancement). For
example, vocal parameters and prosodic features are analyzed through speech
pattern recognition in Dellaert et al. (1996) and Lee et al. (2001). In sum, the
recognition process, which is done in a real-time fashion, involves signal and data
processing algorithms and machine learning techniques: the collected affective data
is converted into digital representations; fused and aggregated in the form of cues;
and then processed and interpreted to deduce emotional states that should make
sense for both the application and the user. Then the affective or emotion-aware
system can act upon the identified emotional states, by firing actions to support
users’ emotional needs—providing users with proper emotion services instantly.
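
One common way of combining such multimodal evidence is decision-level fusion, in which each modality produces its own probability distribution over emotional states and the distributions are then merged; the sketch below averages them with modality weights. The labels, probabilities, and weights are hypothetical, and feature-level fusion (combining cue vectors before classification) is an equally valid alternative.

# Illustrative sketch: decision-level fusion of per-modality emotion estimates.
# Labels, probabilities, and weights are hypothetical; a deployed system would obtain
# the per-modality distributions from trained recognizers.
def fuse_decisions(per_modality: dict, weights: dict) -> str:
    """Weighted average of per-modality probability distributions over emotions."""
    labels = next(iter(per_modality.values())).keys()
    total_weight = sum(weights[m] for m in per_modality)
    fused = {
        label: sum(weights[m] * per_modality[m][label] for m in per_modality) / total_weight
        for label in labels
    }
    return max(fused, key=fused.get)

estimates = {
    "face":   {"happiness": 0.6, "frustration": 0.3, "neutral": 0.1},
    "speech": {"happiness": 0.2, "frustration": 0.6, "neutral": 0.2},
    "gsr":    {"happiness": 0.1, "frustration": 0.7, "neutral": 0.2},
}
modality_weights = {"face": 0.5, "speech": 0.3, "gsr": 0.2}
print(fuse_decisions(estimates, modality_weights))  # -> 'frustration' here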
The accuracy and completeness of detected emotional information and thus the
efficiency of its processing and interpretation depend on the type of modality an
affective or emotion-aware system may utilize, such as visual, auditory, and/or touch,
as well as the number of emotion channels or carriers that it may have access to.
Ideally, different emotional cues should be considered, or some emotional cues can
be taken into account more than others, depending on the context; e.g., in relation to
the physical environment, in the case of darkness, speech or psychophysiological
emotional cues would be more relevant, as facial emotional cues may not be visible.
The assumption is that not all emotional cues can be available together, as context
may affect the accessibility of emotional cues that are relevant. Also, the context (e.g.,
physical conditions) in which a user is in a concrete moment may also influence
his/her emotional states, which are likely to be externalized and translated into a form
intelligible to an affective or emotion-aware system through relevant emotion
channels. Moreover, in terms of sociocultural environment, various factors can have
an effect on emotion expression and identification, e.g., verbal cues related to the user
language or idiosyncratic facial emotional properties associated with the user’s
culture. Furthermore, it is important to note that the more channels are involved, the
more robust the estimation of the user's emotional states. The same goes for modalities
as to their combination for multimodal recognition. In fact, there might be limits to the
distance at which, for instance, speech is audible (easy to hear), whereby facial
expressions and gestures become a more relevant source (emotion carrier) of affective
information. In other words, the advantages of having many different sensors
embedded in user interfaces of affective systems and distributed in the environment
are valued, as some sensors or sensor nodes may fail at any time and local events and
situations may distort some sensor readings. Research in affective computing (e.g.,
MIT Media Lab 2014) is currently investigating how to combine other modes than
visual and auditory to accurately determine users’ emotional states. The assumption
is that a more robust estimation of the user’s emotional states and thus relevant,
real-time emotional responsive services is dependent on a sound interpretation and
processing based on a complete detection of emotional information—i.e., multiple,
diverse sensors, assembled for acquisition of multi-sensor data. Put differently, the
potential of machine learning techniques can only be exploited to generate sophis-
ticated inferences about emotional states through reasoning processes—if based on
complete sensor data. While it is more effective to consider various modalities and
channels and thus multiple, diverse sensors, when it comes to capturing emotional
states, sensing emotional information and perceiving emotional states must be based
on multi-sensor fusion technology, a process which is inspired by (and emulates) the
human cognitive processes of sensation and perception. This entails creating novel
signal processing, specialized data processing, and machine learning mechanisms for
efficient fusion and aggregation of sensor data into features and making inferences
about emotional states. All in all, multi-sensor fusion for multimodal recognition of
emotions provides many intuitive benefits that should be exploited to develop
sophisticated and powerful affective and emotion-aware systems.

4.9 Multi-sensor Systems: Mimicking the Human Cognitive Sensation and Perception Processes

With high applicability in the field of context-aware computing and affective
computing, multi-sensor systems were initially inspired by the view of human
cognition as information processing, in particular the cognitive process of sensation–perception
where the brain fuses sensory information, received as signals, from the various
sensory organs (i.e., visual, audio, and touch receptors) and associates these signals
with a concept (e.g., the positive or negative state of a person's emotions)—attaching
a meaning to the sensory information as part of the cognitive process of perception,
which involves recognition, interpretation, and evaluation as mental sub-processes
(see below for more detail). The information processing model of cognition is the
dominant paradigm in the disciplines of cognitive psychology, cognitive science,
and AI (e.g., machine learning). Thus, multi-sensor systems have grown out of these
disciplines, and the information processing view is supported by many years of
research across many disciplines.
From a cognitive psychology perspective, mental processes are the brain
activities that handle information when sensing and perceiving objects, events, and
people and their states (as well as when solving problems, making decisions, and
reasoning). Humans are viewed as dynamic information processing systems whose
mental operations are described in computational terminology, e.g., inputs, struc-
tures, representations, processes, and outputs. The information processing model is a
way of thinking about mental processes, envisioning them as software programs
running on the computer that is the brain. This relates to the mental
information-manipulation processes that operate between stimulus and response.
‘The notion that mental…processes intervene between stimuli and responses
sometimes takes the form of a ‘computational’ metaphor or analogy, which is often
used as the identifying mark of contemporary cognitive science: The mind is to the
brain as software is to hardware; mental states and processes are (like) computer
programs implemented (in the case of humans) in brain states and processes’
(Rapaport 1996, p. 2). For an overview of the information processing model and human
cognition as well as cognitive psychology, cognitive science, and AI and the
relationship between them and their contribution to AmI—beyond multi-sensor
systems, the reader is directed to Chap. 8.
In sum, the underlying idea of multi-sensor fusion in AmI is to simulate the
human cognitive processes of sensation and perception into emotion-aware and
context-aware systems. These multi-sensor systems can therefore be seen as a
computational rendition of human cognitive processes of sensation and perception
in terms of detecting and fusing sensory information from various types of sensors
and linking sensor readings or observations to emotional states as human-defined
concepts. Therefore, the design of multi-sensor context-aware systems and the
development of related computational cognitive processes—detection, processing,
interpretation, and recognition algorithms—attempt to mimic the human sensory
organs and the associated sensation and perception processes. In computing, the
sensing process involves the acquisition and pre-processing of low-level data col-
lected by multisensory devices and the recognition process entails the interpretation
of and reasoning on information to generate high-level abstractions of contexts. The
typical computational processes underlying context recognition encompass:
detection, fusion, aggregation, classification, interpretation, evaluation, and infer-
ence (in addition to learning in the case of machine learning). However, the way
sensory devices and recognition algorithms function in computers is still far from
how human sensory organs detect signals and how the human brain fuses sensory
information and processes it further for perception and thus recognition.
Computational artifacts and processes are circumscribed by the constraints of
existing technologies as well as engineering theory and practice. In other words, the
sensors and pre-processing and analysis mechanisms—of the multi-sensor systems
—are technology-driven, i.e., their development is driven by what is technically
feasible, rather than by how the cognitive processes of sensation and perception
function according to cognitive psychology theories (e.g., Passer and Smith 2006).
In fact, there is a tendency not only in context-aware systems but in all kinds of
computer systems towards reducing the complexity of various human cognitive
processes, such as problem solving, emotion, attention, motivation, reasoning,
decision making, and so on (in other words: alienating the concepts from their
complex meaning in more theoretical disciplines, such as cognitive psychology) to
serve technical purposes. Thus, the way the sensation and perception cognitive
processes as concepts are operationalized has an impact on how multi-sensor devices
and related computational processes (i.e., signal and data processing algorithms and
pattern recognition techniques) are designed, developed, implemented, and function
in real-world (operating) environments—that is, in a simplified way that results in
imperfect sensing, imperfect inferences, and thus imperfect behaviors.
One implication of the oversimplification underlying the design and modeling of
AmI systems, including multi-sensor context-aware systems, is that in recent years,
some scholars have suggested, and others strongly advocated, revisiting the whole
notion of intelligence in AmI in such a way as to give humans a key role in influencing
the representations and thus shaping the actions of nonhuman machines, by
exposing humans to the ambiguities raised by the imperfections pertaining to the
functioning and behavior of AmI systems. As José et al. (2010, p. 1487) state,
‘Letting people handle some of the semantic connections of the system and the
ambiguities that may arise, would overcome many of the complex issues associated
with the need to perfectly sense and interpret the state of the world that many AmI
scenarios seem to announce… [W]e should recognize that many of the complex
inference problems suggested for AmI are in fact trivial when handled by people.
Moreover, even when inferences are simple, systems are not uniform and there will
always be some type of technical discontinuity that may affect sensing and thus the
ability to always get it right.’
Indeed, there is a fundamental difference between computer systems and humans
in terms of cognitive functions as well as biological designs. It may be useful to
provide some theoretical insights drawn from cognitive psychology to give a rough
idea about what characterizes human sensory organs and related cognitive pro-
cesses. Human senses are realized by different sensory receptors. The receptors for
visual, auditory, tactile, olfactory, and gustatory signals are found in the eyes, ears,
skin, nose, and tongue, respectively. The information gathered in these receptors—
sensory information—during the perceptual analysis of the received stimuli is
supplied to the brain, which fuses and processes it in a very dynamic, intricate, and
often unpredictable way. Different models of sensation–perception have been
studied in cognitive psychology. There are several unsolved issues associated with
the mental model in psychology. Among these, there is a significant amount of
interdisciplinary debate about the exact meaning of cognitive processes as concepts,
including sensation and perception. Many scholars within varying social sciences
consider the ability to consciously process information as the defining characteristic
of humans. However, commonly, sensation–perception process involves perceptual
analysis of signals (sensory perception), recognition by comparing sensory infor-
mation to previous encounters and classifying it into a meaningful category, and
subsequent interpretation and evaluation. Sensation refers to consciousness that
results from the stimulation of a sense organ, and perception is concerned with
recognition and interpretation of a stimulus (Galotti 2004). In terms of sensory
memory, initial memory holds fleeting (short-lived) impressions of sensory stimuli,
holds more information than is held by working memory, but cannot hold infor-
mation as long as it is held by working memory; the initial storage of sensory
information (within the senses) occurs while incoming messages are being trans-
mitted to the brain (Ibid). Recognition is a process of generating and comparing
descriptions of objects (and other entities from the external environment) currently
in view, retained in the working memory, with descriptions of objects seen pre-
viously, which are stored or reside in the long-term memory (Braisby and Gellatly 2005).
It entails seeing something as experienced before or familiar (Galotti 2004).
Recognition is considered as the initial process of perception before the information
is semantically classified, named, and then stored in the long-term memory. The
subsequent interpretation of information is associated with meaning attachment or
significance attribution. Theoretically, the cognitive activity of interpretation in
humans occurs after the brain fuses sensory information from multiple sensory
organs and processes it for recognition. The perception process entails the brain
becoming aware of (and evaluating in time) the information taken in via the human
multisensory system. In sum, sensation process is concerned with the processing of
multisensory information whereas perception with the recognition, interpretation,
and evaluation of this information using different cognitive processes, including
memory, attention, emotion, motivation, and reasoning. All these mental processes
interlink and interact in a dynamic way during the perception process. The cog-
nitive processes working together in the formation of thought serve to acquire
information and make both conscious as well as subconscious inferences about the
world around us, and senses utilized in this complex process serve as a means of
gathering and supplying information. This relates to what is called dynamic mental
model, which is sometimes referred to as a theory of mind (e.g., Goldman 2006;
Gärdenfors 2003; Baron-Cohen 1995; Dennett 1987), and may encompass emotion,
attention, motivation, and belief. This dynamic model is one of the models that are
actually used together with sensor observation as input to AmI systems to analyze
and estimate or infer what is going on in the human’s mind as cognitive and
emotional context.
Perceptions can be viewed as patterns for organizing our cognition of reality. In
other words, cognitions are based on perceptions, which represent mental and social
models. Studies on cognitive psychology, cognitive science, and AI, have shown
that situations, places, objects, and events are never perceived by working from
their inherent or individual component parts to the whole, but rather by ascribing an
overall, familiar structure to situations, places, objects, and events—on the basis of
mental and social representations. Humans resort to existing schemata that provide
a recognizable meaning to make sense of what constitutes reality in its complex
aspects. Inspired by human cognitive perception process, context-aware systems
infer a high-level abstraction of context through executing recognition, interpreta-
tion, and evaluation mechanisms. Deriving high-level context information from
sensor data (values and cues) together with dynamic models (human knowledge
represented in a computational and formal format) by means of such mechanisms is
about bringing meaning to low-level context data. To model, represent, and reason
about context, different context information representation and reasoning tech-
niques have been developed and applied in the field of context-aware computing
based on a wide variety of approaches, e.g., probabilistic methods, rule-based
methods, ontology-based (description logic) approaches, and hybrid approaches.
Regardless of the type of the approach to context representation and reasoning,
recognition algorithms have not yet reached a mature stage, and thus do not operate
or function at the human cognitive level. There is a long way to go to emulate
human cognitive representations, structures, and processes associated with the
perception process that occurs in the human brain. In fact, understanding mental
information-manipulation processes and internal representations and structures used
in cognition has for long been the key concern of cognitive scientists, who indeed
seek to investigate how information is sensed, perceived, represented, stored,
manipulated, and transformed in the human brain. Among the most challenging
research questions is to understand and implement in computer systems the way in
which affective and motivational states influence sensation and perception as
cognitive processes, and how to computationally model what constitutes the cog-
nitive processes as encompassing information processing at the subconscious level,
not only at the conscious level—the ability to think and reason, which is restricted
or exclusive to humans and has been under research in AI and cognitive science for
several decades. The underlying assumption is that there is a plethora of information
within us and around us at every moment, shaping our perceptions and
conclusions and allowing decisions or actions to be made about what is around us.
These and many other aspects of human cognitive functioning (or cognition) cannot
be modeled in artificial systems, and hence it is unfeasible for multi-sensor
context-aware systems to operate at the level of human cognitive information
processing—in terms of the sensation and perception processes. In fact, although
the notion of intelligence as ‘an integral part of some of the most enticing AmI
scenarios’ ‘has inspired a broad body of research into new techniques for improving
the sensing, inference and reasoning processes’ (José et al. 2010), no real break-
through in context awareness research is perceived in this regard. The meaningful
interpretation of and efficient reasoning about information remains by far the main
hurdle in the implementation of context-aware systems due to the fact that most of
the interpretation and reasoning processes involve complex inferences based on
imperfect and inadequate sensor data as well as oversimplified cognitive, emotional,
behavioral, social, cultural, and even physical models. A number of subtasks for
reliably realizing the recognition and interpretation of contexts as implicit input remain
unsolved, and this seems, at the current stage of research in context awareness, close
to impossible (Schmidt 2005).

4.10 The State-of-the-Art Context Recognition

As a result of the continuous efforts to realize and deploy the AmI paradigm, which is
evolving due to the advance and prevalence of smart, miniaturized sensors and
computing devices, research is currently being carried out in all domains associated
with AmI, ranging from low-level data acquisition (i.e., sensing, signal processing,
fusion), to intermediate-level information processing (i.e., recognition, interpreta-
tion, reasoning), to high-level application and service delivery (i.e., adaptation and
actions). Most research in AmI focuses on the development of technologies for
context awareness as well as the design of context-aware applications. This
involves MEMS, multi-sensor fusion techniques, data processing, pattern recog-
nition algorithms, multimodal user interfaces, software agents, actuators, and query
languages.
Context awareness is a prerequisite technology for the realization of the AmI vision,
hence the growing interest and burgeoning research in the area of context recog-
nition. This has emerged as a significant research issue related to the thriving
development of AmI towards the realization of intelligent environments. This
relates to the fact that the system’s understanding (analysis and estimation) of the
user’s context, which is based on observed information and dynamic models, is a
precondition for the delivery of (relevant) intelligent services, or that various
entities (e.g., emotional states, cognitive states, tasks, social dynamics, situations,
events, places, and objects) in an AmI environment provide important contextual
information that should be exploited so that the intelligent behavior of the
system within such an environment is pertinent to the user's context. Context
recognition has been an intensively active and rapidly evolving research area in
AmI. While early work—use of context awareness within AmI environments—
directed the focus towards the analysis of physical information, such as location and
physical conditions, as a means to recognize physical context, more recent research
has shifted to the employment of multiple miniature sensors entrenched in computer
interfaces and spread in the surrounding environment to recognize complex features
of context. These sensors are used to acquire the contextual data required for the
process of recognizing—detecting, interpreting, and reasoning about—such con-
texts as emotional states, cognitive states, task states, and social settings. Therefore,
the focus in research within AmI is being directed towards human factors related
context. Accordingly, a multitude of recognition approaches and pattern recognition
methods that have been proposed and studied are being experimented with, and the
main differences among them lie in the manner in which different types of
context, in relation to various application domains, are modeled, represented,
reasoned about, and used. Indeed, existing approaches to context modeling and
reasoning, such as probabilistic methods, ontology-based approaches, rule-based
methods, and relational databases are often integrated for optimal results and in
response to the increasing complexity of new context-aware applications as well as
the advancement of context awareness technology in terms of the operationalization
of context and its conceptualization and application, giving rise to a whole set of
novel complex pattern recognition algorithms. In all, existing approaches to context
recognition are thus numerous and differ in many technical and computational
aspects. Context awareness has been extensively studied in relation to various
domains, and work in the field has generated a variety and multiplicity of lab-based
applications, but only a few real-world ones, involving the use of various pattern rec-
ognition algorithms. In this chapter, the emphasis is on machine learning approa-
ches to context recognition algorithms, and a wide range of related applications are
surveyed. Ontology-based and hybrid approaches and related applications are
addressed in Chap. 5.

4.10.1 Context Recognition Process

Context recognition refers to the process whereby various contextual features or
states of the user as an entity (including location, lighting, time, temperature,
specific motion pattern, absolute position, behavior, intention, personal event,
mental state, psychophysiological state, affect displays, social dynamics, social
relations, task state, to name a few ingredients) are detected (or monitored) and
analyzed to infer a particular high-level abstraction of context—e.g., situational,
cognitive, emotional, or social dimensions of context. The process encompasses
many different tasks, namely context modeling and representation, detection and/or
behavior monitoring, data processing, and pattern recognition. Context awareness
functionality entails acquiring and processing contextual data, and then analyzing
them using machine learning techniques, or interpreting and reasoning about them using
ontological mechanisms tied to specific representation formalisms, in order to estimate or
infer a particular context. To perform emotional context recognition, for instance, it is thus
necessary to, using Chen and Nugent's (2009) terminology (see the sketch after this list):
1. create computational emotional context models in a way that enables software
agents to perform reasoning and manipulation;
2. detect facial, vocal, gestural, and psychophysiological aspects of the user as an
atomic level of the context or monitor and capture a user’s emotional behavior;
3. process observed information through aggregation and fusion to generate a
high-level abstraction of emotional context;
4. decide which context recognition algorithm to use, based on the manner in
which emotional context is modeled, represented, and reasoned about; and
finally
5. carry out pattern recognition to estimate or infer the user’s emotional state.
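To make these steps more concrete, the following is a minimal, purely illustrative Python sketch of steps (2)–(5); it is not drawn from any particular system in the literature. Hypothetical per-modality feature values (facial, vocal, psychophysiological) are fused into a single vector and classified with a simple nearest-centroid model trained on a handful of labeled examples; all feature names, values, and labels are invented for illustration.

import math

# Step 3 (illustrative): aggregate per-modality cues into one feature vector.
def fuse(facial, vocal, physio):
    return facial + vocal + physio

# Steps 1 and 4 (illustrative): a trivial computational 'model' of emotional
# context -- per-class centroids learned from labeled training examples.
def train_centroids(examples):
    centroids = {}
    for label, vectors in examples.items():
        dims = len(vectors[0])
        centroids[label] = [sum(v[d] for v in vectors) / len(vectors)
                            for d in range(dims)]
    return centroids

# Step 5 (illustrative): pattern recognition -- assign the label of the
# nearest class centroid to the newly observed feature vector.
def infer_emotion(centroids, observation):
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(centroids, key=lambda label: dist(centroids[label], observation))

# Labeled training vectors (step 2 would produce such values from sensors).
training = {
    "joy":     [fuse([0.9, 0.2], [0.7], [0.3]), fuse([0.8, 0.3], [0.6], [0.4])],
    "sadness": [fuse([0.1, 0.1], [0.2], [0.2]), fuse([0.2, 0.2], [0.1], [0.3])],
}
model = train_centroids(training)
print(infer_emotion(model, fuse([0.85, 0.25], [0.65], [0.35])))  # prints: joy

In a real system, the nearest-centroid rule would of course be replaced by one of the pattern recognition algorithms discussed later in this chapter.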
These steps are roughly applicable to the recognition of different dimensions
of context. Researchers from different application domains in the field of context-
aware computing have investigated context recognition for the past two decades by
developing and enhancing a variety of approaches, techniques, and algorithms in
relation to a wide variety of context-aware applications. Based on the way the
above tasks can be undertaken, context recognition can be categorized into different
classes.

4.10.2 Movement Capture Technologies and Recognition Approaches

With the omnipresence of embedded, networked, distributed sensing and computing
devices characteristic of AmI to support human action, interaction, and com-
munication whenever needed, the interactions between users and computers will
drastically change. In other words, in the world of AmI, everyday human envi-
ronment will be permeated by an overwhelming amount of active devices, and
therefore humans are likely to interact with most if not all kinds of sensors and
computing devices. Hundreds of thousands of such devices accompanying and
surrounding people will form, or function as, a unified interface through which they
can interact with AmI. The permeation process continues to expand with recog-
nition systems for facial expression, behavior monitoring, and body movement
tracking. With biometric technology, our faces, gestures, eyes, voices, and move-
ments will be used to model the way we live (Jain 2004; Oviatt et al. 2004). Human
movements can provide a wealth of contextual information as implicit input to
context-aware systems indicating the users’ emotional, cognitive, physiological,
and social states and their activities and behaviors. Accordingly, enabling tech-
nologies for capturing human movement data are advancing and proliferating,
taking many forms and being used in a wide variety of ways, either directly or
indirectly. With all these evolving developments, the AmI vision may no longer
seem a distant one.
Detecting or monitoring a user’s context is a decisive task in context recognition.
It is a process that is responsible for observing or capturing relevant contextual
information (in addition to dynamic models such as mental, physiological, behav-
ioral, and social) for context recognition systems to deduce or estimate a user’s
context. As to the approach, data type, and available integration possibilities of
existing capture technologies, there currently exist three main context recognition
approaches—multisensory-based, vision-based, and sensor-based (wearable sensors
and sensors attached to objects) context recognition. The most frequently used
context recognition approaches are based on multiple, diverse sensors. As pointed
out earlier, multi-sensor-based context recognition has evolved tremendously over
the last decade and attracted increasing attention among researchers as minia-
turized and smart sensors, wireless communication networks, and AmI infrastruc-
tures have technically matured and also become affordable. Multi-sensor fusion has
been under vigorous investigation in the development of new context-aware appli-
cations for user assistance or support. Other context recognition approaches have also
been used in relation to different application domains. This implies that the context
recognition approach has been and can be applied to different types of context-aware
systems. The suitability and performance of existing context recognition approaches
depend on the nature of the types of contexts (e.g., emotional state, cognitive state,
situational state, activity, etc.) being assessed and the technical features of the con-
crete applications, e.g., MEMS features, hybrid models of context. MEMS
advancements have brought forth many changes to sensors in terms of performance,
networkability, mobility, flexibility, self-configurability, self-localization, and
self-powering. Also, integrated approaches to context modeling and reasoning have
proven to be very effective as to achieving optimal results for context inferences.
Capture technologies have moreover been applied in application domains outside
context awareness, e.g., recognizing facial and eye movements and gestures as
commands.

4.10.2.1 Human Activities and Physical Movement

In activity recognition, vision-based, wearable sensor-based, and object-based rec-
ognition approaches have been used. The vision-based recognition approach entails using
visual sensing facilities, e.g., camera-based surveillance systems, for interaction and
visual activity recognition (Ivanov and Bobick 2000), human tracking and activity
recognition (Bodor et al. 2003), human activity monitoring and related environment
changes (Fiore et al. 2008), and face and body action recognition (Gunes and Piccardi
2005). Vision-based activity recognition ‘exploits computer vision techniques to
analyze visual observations for pattern recognition. Vision-based activity recognition
has been a research focus for a long period of time due to its important role in areas
such as human–computer interaction, user interface design, robot learning and sur-
veillance. Researchers have used a wide variety of modalities, such as single camera,
stereo and infrared, to capture activity contexts. In addition, they have investigated a
number of application scenarios, e.g., single actor or group tracking and recognition.
The typical computational process of vision-based activity recognition is usually
composed of four steps, namely object (or human) detection, behavior tracking,
activity recognition and finally a high-level activity evaluation.’ (Chen and Nugent
2009, p. 412). However, vision-based activity recognition approaches are associated
with some shortcomings: they ‘suffer from issues related to scalability and reusability
due to the complexity of real-world settings, e.g., highly varied activities in natural
environment. In addition, as cameras are generally used as recording devices, the
invasiveness of this approach as perceived by some also prevents it from large-scale
uptake in some applications, e.g., home environments’ (Ibid).
The sensor-based activity recognition approach is commonly used to monitor an actor's behavior
along with the state change of the environment. The typical computational process of
this approach consists of data collection using signal and data processing algorithms
and analysis and recognition of activity using data mining and machine learning
techniques. This approach involves wearable sensors that can be attached to a human
actor whose behavior is being monitored or to objects that constitute the environ-
ment where the human actor is performing a given activity—sensor augmentation of
artifacts of use in daily living. On-body sensors include accelerometers, gyroscopes,
biosensors, vital processing devices, and RFID tags (use radio waves to remotely
identify people or objects carrying reactive tags). Based on networked RFID tags,
humans are expected to be overwhelmed by a huge amount of personalized real-time
responses in AmI environments. However, wearable sensor-based activity recog-
nition approach has been extensively used in the recognition of human physical
activities (Bao and Intille 2004; Huynh 2008; Lee and Mase 2002; Parkka et al.
2006), such as walking, sitting down/up, or physical exercises. Radar as an indirect
system has also been used for human walking estimation (van Dorp and Groen
2003). Tapia and Intille (2007) have used wireless accelerometers and a heart rate
monitoring device for real-time recognition of physical activities and their intensi-
ties. Wearable sensors have been used to recognize daily activities in a scalable
manner (Huynh et al. 2007). Accelerometers sensing movements in three dimen-
sions have been employed in wearable implementations (DeVaul et al. 2003; Ling
2003; Sung et al. 2005), incorporated into a mobile phone (Fishkin 2004). As a novel
approach, a wrist-mounted video camera has been used to capture finger movements
and arm-mounted sensing of electrical activity relating to hand movement (Vardy
et al. 1999). In all, due to their reduced cost and wide availability, accelerometers are
probably the most frequently used as wearable sensors for data acquisition and
activity recognition for human body movements. However, given the prerequisites
of wearable computers (Rhodes 1997), it is crucial to keep sensors to a minimum and
as resource friendly as possible. For this reason, many researchers have considered
using fewer accelerometers to measure different aspects of user body positions (Kern
et al. 2002; Lee and Mase 2002; Park et al. 2002), attempting to avoid
overcomplicated and overly resource-intensive processes. Otherwise, this may put
constraints on the real-world implementation of AmI systems. For example, Van
Laerhoven et al. (2002) have used more than thirty accelerometers to build models of
a user’s posture. While wearable sensors provide some benefits, they are associated
with some limitations. ‘The wearable sensor based approach is effective and also
relatively inexpensive for data acquisition and activity recognition for certain types
of human activities, mainly human physical movements. Nevertheless, it suffers
from two drawbacks. First, most wearable sensors are not applicable in real-world
application scenarios due to technical issues such as size, ease of use and battery life
in conjunction with the general issue of acceptability or willingness of the user to
wear them. Second, many activities in real-world situations involve complex
physical motions and complex interactions with the environment. Sensor observa-
tions from wearable sensors alone may not be able to differentiate activities
involving simple physical movements' (Chen and Nugent 2009, p. 413). In fact,
operationalizing many types of human activities and their contexts in daily living—
human interactions with artifacts in the situated environment—poses many technical
issues that need to be addressed, especially oversimplification of concepts.
Accordingly, the object-based activity recognition approach has emerged to address
the drawbacks associated with the wearable sensor-based recognition approach
(Philipose et al. 2004). Based on real-world observations, this approach
entails that ‘activities are characterized by the objects that are manipulated during
their operation. Simple sensors can often provide powerful clues about the activity
being undertaken. As such it is assumed that activities can be recognized from sensor
data that monitor human interactions with objects in the environment… It has been,
in particular, under vigorous investigation in the creation of intelligent pervasive
environments for ambient assisted living (AAL)… Sensors in an SH can monitor an
inhabitant’s movements and environmental events so that assistive agents can infer
the undergoing activities based on the sensor observations, thus providing
just-in-time context-aware ADL assistance.’ (Chen and Nugent 2009, p. 413).
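Returning to the wearable, accelerometer-based approach discussed above, the signal-processing stage that precedes learning-based recognition can be illustrated with a short, generic Python sketch: a raw three-axis accelerometer stream is segmented into fixed-size windows, and simple statistical features (per-axis mean and standard deviation, plus mean signal magnitude) are computed for each window. The window length and sample values are illustrative assumptions, not parameters from the cited studies.

import statistics

def window_features(samples, window_size=50):
    """Split a stream of (x, y, z) accelerometer samples into fixed-size
    windows and compute per-window statistical features:
    [mean_x, mean_y, mean_z, std_x, std_y, std_z, mean_magnitude]."""
    features = []
    for start in range(0, len(samples) - window_size + 1, window_size):
        window = samples[start:start + window_size]
        xs, ys, zs = zip(*window)
        magnitude = [(x * x + y * y + z * z) ** 0.5 for x, y, z in window]
        features.append([
            statistics.mean(xs), statistics.mean(ys), statistics.mean(zs),
            statistics.pstdev(xs), statistics.pstdev(ys), statistics.pstdev(zs),
            statistics.mean(magnitude),
        ])
    return features

# Illustrative stream: 100 samples of a (nearly) static posture.
stream = [(0.02, -0.01, 0.98)] * 100
for vector in window_features(stream):
    print(vector)  # one feature vector per 50-sample window

Feature vectors of this kind are what the supervised classifiers discussed in Sect. 4.10.3 are typically trained on.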
An interesting European project called ‘Opportunity’, which started in 2009 and
finished in 2011, picks up on recognizing context and activity as the very essential
methodological underpinnings of any (AmI) scenario and investigates methodolo-
gies to design context-aware systems: ‘(1) working over long periods of time
despite changes in sensing infrastructure (sensor failures, degradation); (2) provid-
ing the freedom to users to change wearable device placement; (3) that can be
deployed without user-specific training’ (CORDIS 2011). The activities of the
project center on developing what is called opportunistic context-aware systems
that ‘recognize complex activities/contexts despite the absence of static assump-
tions about sensor availability and characteristics’; ‘are based on goal-oriented
sensor assemblies spontaneously arising and self-organizing to achieve a common
activity/context recognition goal’; ‘are embodied and situated, relying on
self-supervised learning to achieve autonomous operation'; 'make best use of the
available resources, and keep working despite…changes in the sensing environ-
ment’. One of the interesting works done in this project is the development of
‘classifier fusion methods suited for opportunistic systems, capable of incorporating
new knowledge online, monitoring their own performance, and dynamically
selecting most appropriate information sources’, as well as unsupervised dynamic
adaptation to cope with changes and trends in sensor infrastructure.
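The classifier-fusion idea pursued in the project can be conveyed with a deliberately simplified Python sketch: per-sensor classifiers vote on the current context, the fusion layer simply skips sensors that are currently unavailable, and each vote is weighted by the classifier's self-monitored accuracy. This is a generic weighted-vote scheme written for illustration; it does not reproduce the project's actual algorithms, and the sensor names and weights are invented.

def fuse_votes(predictions, weights):
    """Weighted majority vote over the per-sensor predictions that are
    actually available (missing sensors contribute nothing)."""
    scores = {}
    for sensor, label in predictions.items():
        if label is None:          # sensor failed or was taken off
            continue
        scores[label] = scores.get(label, 0.0) + weights.get(sensor, 1.0)
    return max(scores, key=scores.get) if scores else "unknown"

# Illustrative self-monitored accuracies used as vote weights.
weights = {"wrist_acc": 0.9, "hip_acc": 0.7, "ambient_mic": 0.5}

# One sensor is missing; the remaining ones still yield a decision.
predictions = {"wrist_acc": "walking", "hip_acc": None, "ambient_mic": "walking"}
print(fuse_votes(predictions, weights))   # prints: walking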

4.10.2.2 Emotional States

Researchers in the field of affective computing and context-aware computing have
investigated emotion recognition for the past 15 years or so by developing a
diversity of approaches and techniques for different tasks, namely emotion mod-
eling, emotion detection or emotional behavior monitoring, data processing, and
pattern recognition. In the area of HCI, emotional states are recognized using
multisensory-based, vision-based (discussed above) and sensor-based approaches,
where sensors are embedded in user interfaces of computer systems, spread in the
environment, or/and attached to the user—wearable sensors. In particular,
multisensory-based emotional context recognition exploits the emerging
multi-sensor fusion techniques and wireless sensor network technologies to detect a
user’s emotional state or monitor his/her emotional behavior. The sensor data which
are collected are analyzed using various pattern recognition algorithms based on
ontology, machine learning, data mining (the discovery of previously unknown
properties in the data extracted from databases) or hybrid approaches. The use of
these techniques depends on the type of emotion channels or carriers that are
considered by a given application (in terms of operationalizing and modeling
emotions) to infer a user’s emotional state. Example sources for affective infor-
mation include emotiveness, prosodic features of speech, facial expressions, hand
gestures, and psychophysiological responses. These can be combined depending on
the features of the concrete AmI systems in relation to various application domains
(e.g., affective system, emotional intelligent system, emotional context-aware sys-
tem, context-aware affective system). Research shows that affective and context-
aware HCI applications are increasingly being equipped with the so-called multi-
modal user interfaces (i.e., facial, gesture, voice, and motion tracking interfaces),
which incorporate a wide variety of miniature dense sensors used to detect a user’s
emotional state by reading multimodal sources. Such applications are therefore
increasingly using multi-sensor fusion technology for multimodal recognition of
emotional states, as discussed above.
In computing, studies on emotion may be classified heuristically into two cat-
egories: face-based (micro-movement) recognition and non-face-based
(macro-movement and speech) recognition. The first category, which relates to
simple emotional states, involves recognizing emotions from facial expressions
using image analysis and understanding, and the second category, which pertains to
complex emotional states, focuses on recognition of emotions by modeling and
recognition based on hand gestures, body movement, and speech as human
behaviors (see Chap. 8 for a large body of work on emotion recognition). Laster has
been used for face and gesture recognition for HCI (Reilly 1998). Another popular
method for emotion recognition is biometric data (Teixeira et al. 2008). Dedicated
systems often facilitate the challenge of emotion detection (Ikehara et al. 2003;
Sheldon 2001; Vick and Ikehara 2003). Vital sign processing devices and other
specialized sensors have been used to detect emotional cues from heart rate, pulse,
skin temperature, galvanic skin response, electroencephalographic response, blood
pressure, perspiration, brain waves, and so on to help derive emotional states.
Miniaturization of computing devices, thanks to nano- and micro-engineering, is
making possible the development of wearable devices that can register parameters
without disturbing users.

4.10.2.3 Cognitive States

Researchers in the field of context-aware computing have more recently started to
investigate cognitive context recognition by developing some approaches and
techniques for core tasks, namely cognitive context modeling, cognitive behavior
monitoring, data processing, and pattern recognition. Thus, research in cognitive
context awareness is still in its infancy. Cognitive states can be recognized using
(multi) sensor-based, vision-based, or/and software algorithm-based approaches.
As channels or carriers of cognitive information, eye movement and facial
expressions can provide data indicating the user’s cognitive states or processes and
thereby implicit input to cognitive context-aware systems. With the advance of
multi-sensor fusion techniques and the availability of advanced pattern recognition
mechanisms today, sensor-based recognition approach using eye movement
tracking and eye gaze, in particular, has attracted increasing attention among
researchers in HCI. It has been under vigorous investigation in the development of
cognitive context-aware applications (cognitive-based ambient user interfaces) for
cognitive support. Sensors can detect a user’s cognitive state or monitor his/her
cognitive behavior so that agents can infer the ongoing cognitive activities based on
the sensor observations or readings and thus provide cognitive support. Eye gaze
movement indicates changes in visual attention and reflects the cognitive states of
the user (Salvucci and Anderson 2001; Tobii Technology 2006). It has been
researched in an attempt to derive finer indicators of such cognitive activities as
writing, reading, information searching, and exploring. Eye tracking tools like Tobii
1750 eye tracker (Tobii Technology 2006), which can be embedded in user
interfaces, have a great enabling potential to gauge the cognitive state of the user.
As an indirect system, infrared has been used for eye movement protocol analysis
(Salvucci and Anderson 2001). Likewise, facial expressions can be used to detect
some cognitive processes. The facial muscles express thought (Scherer 1992, 1994),
that is, indicate cognitive processes. Kaiser and Wehrle (2001) found that a frown
as a facial expression indicates incomprehension. Frowning is associated with
problem solving as a mental state or process. It often occurs when an individual
encounters a difficulty in a task or does some hard thinking while concentrating on,
or attending to, a problem (Ibid). As to vision-based and software algorithm-based
recognition approaches, video cameras have been used for recognizing cognitive
task activities, such as writing, reading, and designing. Cognitive context such as
user’s intention can be inferred using software algorithms (see, e.g., Kim et al.
2007; Kwon et al. 2005) as equivalents to sensors.
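As a rough illustration of how eye-movement data can be turned into an estimate of cognitive context, the sketch below derives two common oculomotor features (mean fixation duration and saccade rate) and applies simple, invented thresholds to separate focused reading from visual search. Deployed systems rely on calibrated eye trackers and trained classifiers; the thresholds, labels, and sample values here are illustrative assumptions only.

def eye_movement_features(fixations_ms, duration_s):
    """fixations_ms: fixation durations in milliseconds observed over a
    recording lasting duration_s seconds."""
    mean_fixation = sum(fixations_ms) / len(fixations_ms)
    saccade_rate = (len(fixations_ms) - 1) / duration_s   # saccades per second
    return mean_fixation, saccade_rate

def infer_cognitive_state(mean_fixation, saccade_rate):
    # Illustrative heuristic: long fixations and a low saccade rate suggest
    # reading; short fixations and a high saccade rate suggest searching.
    if mean_fixation > 250 and saccade_rate < 3.0:
        return "reading"
    if mean_fixation < 200 and saccade_rate > 3.5:
        return "searching"
    return "uncertain"

fixations = [310, 290, 330, 305, 280]            # milliseconds, illustrative
features = eye_movement_features(fixations, duration_s=2.0)
print(infer_cognitive_state(*features))          # prints: reading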

4.10.2.4 Physical Environment: Location

Location has been extensively researched in the field of context-aware computing.
Broadly, location is detected using a sensor-based approach, where sensors are
ubiquitously distributed or dispersed across geographical areas, embedded in indoor
environments, or entrenched in objects. Sensor-based location recognition exploits
the emerging miniature low-cost sensors, sensor networks, and wireless communi-
cation networking technologies, in addition to the fact that ubiquitous computing
infrastructures have become technically mature. The Global Positioning System (GPS),
the most commonly used sensor, is a radio navigation system, a space-based global
navigation satellite system (GNSS) that allows determining one’s exact location and
time information, anywhere in the world. GPS sensors are fairly accurate and cheap,
and the supporting infrastructure is already in place. They have undergone radical
reinvention in terms of miniaturization, cost reduction, and high performance.
As envisaged by Saffo (1997), integrated sensor/GPS modules will become mini-
ature and inexpensive enough to integrate into courier parcels to track the location
in the not-too-distant future. Using hierarchical conditional random fields, GPS
traces have been applied to extract places as well as activities (Liao et al. 2007). In
terms of the computational process of sensor-based location recognition approach,
Ashbrook and Starner (2002) have enhanced the use of GPS by using a hidden
Markov model to predict the user's possible location. GPS is commonly considered
the current standard for outdoor location systems. Infrared (IR) sensors are,
on the other hand, preferred for indoor location, in addition to the current Wi-Fi
system. IR tags (Randell and Muller 2000) or active badges (using radio tags) (Dey
et al. 1999; Farringdon et al. 1999) have also been used for sensing location, in
addition to sensing which other people are in a given location or around a given
person. Lee and Mase (2002) adopt wearable sensors for location recognition.
Clarkson (2003) uses a wearable system capable of distinguishing coarse locations
and user situations. Different locations and situations of an individual user like
‘home’, ‘at work’, or ‘restaurant’ are recognized based on a clustering of video and
audio data recordings.
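The intuition behind using a Markov model to predict a user's next significant place can be conveyed with a stripped-down, first-order sketch in Python: transition probabilities are estimated from a history of visited places, and the most probable successor of the current place is returned. Ashbrook and Starner's actual system clusters raw GPS traces into places and uses a richer model; the toy version below, with invented place names, is for illustration only.

from collections import defaultdict

def train_transitions(place_history):
    """Estimate first-order transition probabilities from a sequence of
    visited places, e.g. ['home', 'work', 'cafe', ...]."""
    counts = defaultdict(lambda: defaultdict(int))
    for current, nxt in zip(place_history, place_history[1:]):
        counts[current][nxt] += 1
    return {place: {nxt: c / sum(successors.values())
                    for nxt, c in successors.items()}
            for place, successors in counts.items()}

def predict_next(transitions, current_place):
    successors = transitions.get(current_place)
    return max(successors, key=successors.get) if successors else None

history = ["home", "work", "cafe", "work", "home", "work", "home"]
model = train_transitions(history)
print(predict_next(model, "work"))   # prints: home (2 of 3 observed transitions)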

4.10.2.5 Mobile Devices and Everyday Smart Artifacts

Integrating sensors and microprocessors in everyday objects so they can think and
interact with each other and with the environment is common to the vision of AmI;
it also represents the core of UbiComp vision. Indeed, AmI and UbiComp visions
assume that people will be surrounded by intelligent user interfaces supported by
sensing and computing devices and wireless communication networks, which are
embedded in virtually all kinds of everyday objects, such as mobile phones, books,
paper money, clothes, and so on. However, sensor-based and multi-sensor-based
approaches are the most commonly used in the augmentation of mobile devices and
artifacts with awareness of their environment and situation as context. In relation to
mobile and ubiquitous computing, Gellersen et al. (2001) have attempted to inte-
grate diverse simple sensors as an alternative to generic sensors for positioning and
vision, an approach which is ‘aimed at awareness of situational context that cannot
be inferred from location, and targeted at resource constraint device platforms that
typically do not permit processing of visual context.’ The authors have investigated
multi-sensor context awareness in a number of projects and developed various
device prototypes, including Technology Enabling Awareness (TEA): an awareness
module used for augmentation of a mobile phone, the Smart-Its platform for aware
mobile devices, and the Media-cup exemplifying context-enabled everyday arti-
facts. (See Beigl et al. (2001) for experience with design and use of
computer-augmented everyday artifacts). The sensor data collected were analyzed
using different methods for computing situational context, such as statistical
methods, rule-based algorithms, and neural networks.
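The layered sensor-cue-context idea behind such multi-sensor awareness modules can be sketched as follows: each raw signal is first reduced to a small statistical cue, and a simple rule layer then maps the combination of cues to a situational context label. The sensor names, cue definitions, thresholds, and labels below are illustrative assumptions rather than the actual TEA or Smart-Its design.

import statistics

def cues(light_samples, accel_magnitudes, audio_levels):
    """Cue layer: reduce each raw signal to one statistic."""
    return {
        "light_mean": statistics.mean(light_samples),
        "motion_var": statistics.pvariance(accel_magnitudes),
        "audio_mean": statistics.mean(audio_levels),
    }

def situational_context(c):
    """Rule layer: map cue combinations to coarse situation labels."""
    if c["motion_var"] > 0.5 and c["light_mean"] > 200:
        return "walking outdoors"
    if c["motion_var"] < 0.05 and c["audio_mean"] > 60:
        return "sitting in a meeting"
    if c["light_mean"] < 10 and c["motion_var"] < 0.05:
        return "device idle in a pocket or drawer"
    return "unknown"

sample_cues = cues(light_samples=[300, 320, 310],
                   accel_magnitudes=[0.5, 2.0, 0.3, 2.2],
                   audio_levels=[45, 50, 48])
print(situational_context(sample_cues))   # prints: walking outdoors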
4.10.2.6 Human Movement as New Forms of Explicit Input

Not only are sensors (embedded in user interfaces) used for detecting emotional and
cognitive states in HCI, but also for receiving new forms of explicit input so that
assistive agents can execute commands based on the sensor detection of signals,
thus performing many tasks effectively. In this case, user movement as explicit
input can be employed as part of a multimodal input or unimodal input design. To
provide intuitiveness and simplicity of interaction and hence reduce the cognitive
burden to manipulate systems, facial movements, gestures, and speech can allow
new forms of explicit input. Eye gaze, head movement, and mouth motion as facial
movements and hand gestures are being investigated in the area of HCI so that they
can be used as direct commands to computer systems. For example, using dedicated
sensors, facial interfaces with eye gaze tracking capability, a type of interface that is
controlled completely by the eyes, can track the user’s eye motion and translate it
into a command to perform different tasks, such as scrolling, dragging items, and
opening documents. Adjouadi et al. (2004) describe a system whereby eye position
coordinates were obtained using corneal reflections and then translated into
mouse-pointer coordinates. In a similar approach, Sibert and Jacob (2000) show a
significant speed advantage of eye gaze selection over mouse selection and consider
it a natural, hands-free method of input. Adjouadi et al. (2004) propose a remote
eye gaze tracking system as an interface for persons with severe motor disability.
Similarly, facial movements have been used as a form of explicit input. As an
alternative to aid people with hand and speech disabilities, visual tracking of facial
movements has been used to manipulate and control mouse cursor movements, e.g.,
moving the head with an open mouth which causes an object to be dragged (Pantic
and Rothkrantz 2003). Also, de Silva et al. (2004) describe a system that tracks
mouth movements. In terms of gestures, utilizing distance sensors, Ishikawa et al.
(2005) propose a touchless input system based on gesture commands. As regards
speech, it can be very promising as a new form of explicit input in various appli-
cation domains. On a mobile phone, given the size of its keypad, a message may be
cognitively demanding to type but very easy to speak to the phone. The whole
idea is to incorporate multiple modalities as new forms of explicit input to enhance
usability as a benefit to HCI. The limitation of one modality is offset by the
strengths of another, or rather each is used based on the context in which the user is,
since the context determines which modality is accessible.

4.10.3 Context Recognition Techniques, Models, and Algorithms

As high-level contexts can be semantically abstracted from contextual cues extracted
from low-level context data obtained from physical sensors, human knowledge and
interpretation of the world must be formally conceptualized, modeled, and encoded
according to certain formalism. In the realm of context-aware computing, models
have been developed for a variety of aspects of human context (e.g., emotional
states, cognitive states, situations, social settings, activities, etc.). These models of
human contexts are represented in a formal and computational format, and incor-
porated in the context-aware systems that observe or monitor the cognitive, emo-
tional, social, and physical state or behavior of the user so such systems can perform
a more in-depth analysis of the human context, which can result in a context-aware
environment that may affect the situation of users by undertaking, in a knowledgeable
manner, actions that provide different kinds of support or assistance. Investigating
approaches to context information modeling and reasoning techniques for context
information constitutes a large part of a growing body of research on the use of
context awareness as a technique for developing AmI applications that can adapt to
and act autonomously on behalf of users. Pattern recognition algorithms in
context-aware (and affective) computing have been under vigorous investigation in
the development of AmI applications and environments for ambient support. This is
resulting in a creative or novel use of pattern recognition algorithms. A multitude of
such algorithms and their integration have been proposed and investigated on the
basis of the way in which the contexts are operationalized, modeled, represented, and
reasoned about. This can be done during a specification process whereby, in most
cases, either concepts of context and their interrelationships are described based on
human knowledge (from human-directed disciplines) and represented in a compu-
tational format that can be used as part of reasoning processes to infer context, or
contexts are learned and recognized automatically, i.e., machine learning techniques
are used to build context models and perform further means of pattern recognition—
i.e., probabilistic and statistical reasoning. While several context recognition algo-
rithms have been applied in the area of context-aware computing, the most com-
monly used ones are those that are based on machine learning techniques, especially
supervised and unsupervised methods, and on ontological, logical, and integrated
approaches. Indeed, machine learning techniques and ontological approaches have
been integrated in various context-aware applications. This falls under what has
come to be known as ‘hybrid context modeling and reasoning approaches’, which
involve both knowledge representation formalisms and reasoning mechanisms.
Hybrid approaches involve other methods, such as rule-based methods, case-based
methods, and logic programming. Ontological and hybrid approaches are addressed in
more detail in Chap. 5.
This subsection aims to describe conceptual context models in terms of what
constitutes context information and the aspects and classes of contexts; provide an
overview of machine learning techniques and related methods; briefly describe
ontological and logical modeling and reasoning approaches; review work applying
these techniques and approaches; address uncertainty of context information; col-
lect together work dealing with uncertainty of context information in relation to
different approaches to context information modeling; and synthesize different
mechanisms for reasoning on uncertainty in the literature with a focus on proba-
bility theory and logic theory.
4.10.3.1 Conceptual Models and Qualities of Context Information

Conceptual context models are concerned with what constitutes context and its
conceptual structure. The semantics of what constitutes ‘context’ has been widely
discussed in the literature, and it is covered in more detail in the previous chapter
along with a detailed discussion of context operationalization in context-aware
computing. Likewise, defining what constitutes context information has been
studied extensively. Context information refers to the representation of the situation
—a set of contextual features—of an entity (e.g., user) in a computer system, where
these contextual features are of interest to a service provider for assessing the
timeliness and user-dependent aspects of assistive service delivery. There is a wide
variety of works that identify qualitative features of context information. Context is
framed by Schmidt et al. (1999) as comprising two main components, human
factors and physical environment. Human factors related context encompasses three
categories: information on the user (knowledge of habits, emotional state,
bio-physiological conditions), the user’s tasks (activity, engaged tasks, general
goals), and the user's social environment (social interaction, co-location of others,
group dynamics). Similarly, physical environment related context encompasses
three categories: location (absolute position, relative position, co-location), infra-
structure (computational communication and information resources, task perfor-
mance), and physical conditions (light, temperature, pressure, noise). Their model is
one of the first endeavors in the field of context-aware computing to explicitly
conceptualize context or model context information. As illustrated in Fig. 4.6,
context is modeled using features, namely there is a set of relevant features for each
context and a value range is defined for each feature. Building on Schmidt et al.
(1999), Göker and Myrhaug (2002) present AmbieSense system, where user context
consist of five elements: environment context (place where user is); personal

Fig. 4.6 Context feature space. Source Schmidt et al. (1999)


170 4 Context Recognition in AmI Environments …

context (physiological and cognitive state); task context (activity); social context
(social aspects of the current user context); and spatiotemporal context (time and
spatial extent for the user context). In the context of work, Krish (2001) describes
context as ‘highly structured amalgam of informational, physical and conceptual
resources that go beyond the simple facts of who or what is where and when to
include the state of digital resources, people’s concepts and mental state, task state,
social relations and the local work culture, to name a few ingredients.’ Based on
Schmidt et al. (1999) model, Korpipaa et al. (2003) present a context structure with
the following properties: context type, context value, source, confidence, time-
stamp, and attributes. The Context Toolkit by Dey et al. (2001) is based on a
framework consisting of context widgets, aggregators, interpreters, services, and
discoverers, and in this framework: widgets collect context information, aggrega-
tors assemble information that concerns a certain context entity, interpreters analyze
or process the information to generate a high-level abstraction of context, services
perform actions on the environment using the context information, and discoverers
find the other components in the environment. There have been many attempts to
model context, e.g., Dey et al. (2001), Jang and Woo (2003), and Soldatos et al.
(2007), to name but a few. It should be noted that most of the above work does not provide
computational and formal representations of the proposed models using any related
technique.
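Schmidt et al.'s notion of a context feature space, in which each context is described by a set of relevant features with defined value ranges, lends itself quite directly to a computational form. The following is a minimal, illustrative encoding (not the authors' own implementation) that checks whether an observed set of feature values falls within the ranges declared for a context; the feature names, ranges, and the example context are invented.

from dataclasses import dataclass

@dataclass
class Feature:
    name: str
    low: float
    high: float                     # value range declared for this feature

    def matches(self, value: float) -> bool:
        return self.low <= value <= self.high

@dataclass
class ContextModel:
    label: str
    features: list                  # the set of relevant features

    def matches(self, observation: dict) -> bool:
        return all(f.name in observation and f.matches(observation[f.name])
                   for f in self.features)

# Illustrative context: the user is in a quiet, dim indoor environment.
quiet_indoors = ContextModel("quiet indoors", [
    Feature("light_lux", 5, 150),
    Feature("noise_db", 0, 40),
    Feature("temperature_c", 18, 26),
])

observation = {"light_lux": 80, "noise_db": 32, "temperature_c": 22}
print(quiet_indoors.matches(observation))   # prints: True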
One of the challenges in context-aware computing (or AmI) has been to provide
frameworks that cover the class of applications that exhibit human-like under-
standing and intelligent behavior. In this context, human-like understanding sig-
nifies analyzing (or interpreting and reasoning about) and estimating (or inferring)
the human’s context—the states of the user, in an ideal case, in the manner in
which the user perceives them (what is going on in his/her mind), a process for which
the input is the observed information about the user’s cognitive, emotional, psycho-
physiological, and social states over time (i.e., human behavior monitoring), and
dynamic human process and human context models. As to human-like intelligent
behavior, it entails the system coming up with and firing the context-dependent
actions that provide support to the user’s cognitive, emotional, and social needs.
Acting upon or interacting based on human factors related context relates to human
functioning, which is linked to the behavioral patterns of individuals in the different
systems that they form part of within their environment. In reference to human
aspects in AmI, Bosse et al. (2007) propose a framework combining different
ingredients, as shown in Fig. 4.7, including human state and history models,
environment state and history models, profiles and characteristics models of
humans, ontologies and knowledge from psychological and/or social disciplines,
dynamic process models about human functioning, dynamic environment process
models, and methods for analysis on the basis of such models. Examples of such
analysis methods—in relation to AmI in general—include prosodic features anal-
ysis, facial expression analysis, gesture analysis, body analysis, eye movement
analysis, psychophysiological analysis, communicative intent analysis, social pro-
cess analysis, and so on.
Fig. 4.7 Framework to combine the ingredients. Source Bosse et al. (2007)

As a template for the class of AmI applications showing human-like under-
standing and supporting behavior, the framework ‘can include slots where the
application-specific content can be filled to get an executable design for a working
system. This specific content together with the generic methods to operate on it,
provides a reflective coupled human-environment system, based on a tight coop-
eration between a human and an ambient system to show human-like understanding
of humans and to react from this understanding in a knowledgeable manner’ (Ibid,
p. 8).

4.10.3.2 Machine Learning Techniques: Supervised and Unsupervised Learning Models and Algorithms

Making artifacts ‘able to compute and communicate does not make them intelli-
gent: the key (and challenge) to really adding intelligence to the environment lies in
the way how the system learns and keeps up to date with the needs of the user by
itself. A thinking machine, you might conclude—not quite but close: if you rely on
the intelligent environment you expect it to operate correctly every time without
tedious training or updates and management. You might be willing to do it once but
not constantly even in the case of frequent changes of objects…or preferences in the
environment. A learning machine, I’ll say.’ (Riva et al. 2005).
Machine learning is a subfield of computer science (specifically a subspecialty of
AI) that is concerned with the development of software programs that provide
computers with the ability to learn from experiences without following explicitly
programmed instructions—that is, to teach themselves to grow and change when
exposed to new data. As a widely quoted, more formal definition provided by
Mitchell (1997, p. 2), ‘A computer program is said to learn from experience E with
respect to some class of tasks T and performance measure P, if its performance at
tasks in T, as measured by P, improves with experience E’. This definition of
machine learning in fundamentally operational terms resonates with the idea that
computers can think, which underlies the vision of AmI and UbiComp. Machine
learning is employed in AmI where designing and programming ontological,
rule-based, and logical algorithms is inadequate, unsuitable, or infeasible. However,
it is computationally unfeasible to build models for all sorts of situations of life and
environments—in other words, training sets or trained classes are finite, environ-
ments are dynamic, and the future is uncertain, adding to the limited sensor data.
Hence, notwithstanding the huge potential of machine learning techniques, the
underlying probability theory usually does not yield guarantees on the performance
of algorithms; rather, probabilistic (and statistical) bounds on performance are
quite common. This relates to computational learning theory, a branch of
theoretical computer science that is concerned with the computational analysis of
machine learning algorithms and their performance in relation to different appli-
cation domains. Furthermore, machine learning has been extensively applied in
various HCI application domains within the area of AmI and AI, e.g., context-aware
computing, affective computing, and conversational systems. In AmI, machine
learning and reasoning aims at monitoring the actions of humans along with the
state change of the environment using various types of sensors as well as actuators
to react and pre-act in response to human actors. A major strand of context and
activity recognition algorithms is based on supervised and unsupervised learning
approaches as machine learning techniques. Machine learning entails various types
of algorithms, which can be classified into different categories based on the type of
input available during machine training or the desired outcome of the algorithm—
e.g., context recognition, including, in addition to supervised and unsupervised,
semi-supervised learning (combines both labeled and unlabeled examples to gen-
erate an appropriate classifier), transductive inference (attempts to predict new
outputs on specific test cases from observed training cases), learning to learn (learns
its own inductive bias based on previous experience), reinforcement learning (the
agent acts in a dynamic environment by executing actions that cause the
observable state of that environment to change, and in the process of acting, it
attempts to gather information about how the environment reacts to its actions as
well as to synthesize a sequence of actions that maximizes some notion of cumu-
lative reward), and so on. However, other strands of context and activity recogni-
tion algorithms are broadly based on logical, ontological, or hybrid modeling and
reasoning.
Supervised Learning: The basic idea of supervised learning is to classify data in
formal categories that an algorithm is trained to recognize. In context-aware
computing, supervised learning entails using a learning data set or labeled data
upon which an algorithm is trained, and the algorithm classifies unknown con-
textual data following training and thus grows and changes as it gets exposed to new
experiences. In this sense, the machine learning process examines a set of atomic
contexts which have been pre-assigned to categories, and makes inductive
abstractions based on this data that assists in the process of classifying future atomic
contexts into, for example, cognitive, emotional, or situational contexts. Approaches
based on supervised learning require an important training period during which
several examples of each context and related concepts are collected and analyzed.
Hence, the training phase often needs as much human intervention as probably a
manual context specification phase would require in the case of ontological mod-
eling. In approaches based on supervised machine learning, the quality of a training
context influences the outcome of the classification critically. And the granularity of
the learned context concepts is influenced by the availability and nature of the
low-level sensor data (Bettini et al. 2010). In all, supervised learning algorithms
enable context-aware systems to keep track of previously observed experiences, in the
form of trained classes of context or a learning contextual data set, and employ
them to dynamically learn the parameters of the stochastic context (a pattern that
may be analyzed statistically but not predicted precisely) models, which allow them
to generate predictive models based on the observed context patterns. AmI inter-
faces learn from users’ behaviors and actions in order to predict their future needs.
Figure 4.8 illustrates an adaptive context awareness process where the system
incrementally creates a model of the world it observes. In this process, ‘the user of a
device can be taken into the loop to train and detect contexts on the spot; the device
learns new situations by example, with its user as the teacher. This flexible learning
scheme is often referred to as incremental learning: new classes can be trained
without having to retrain the ones that were already trained, and old classes can be
retrained should they have changed or become obsolete’ (Van Laerhoven and
Gellersen 2001, p. 2).
In this way, contexts can be learned and recognized automatically, that is, sensor
observations are associated with a human-defined context label using probabilistic and
statistical reasoning.
The general process of a supervised learning algorithm for context recognition
encompasses several stages, namely, borrowing Chen and Nugent’s (2009) termi-
nology in relation to activity recognition:

Fig. 4.8 Context awareness as an adaptive process where the system incrementally creates a
model of the world it observes. Source Adapted from Van Laerhoven and Gellersen (2001)

1. To acquire sensor data representative of context (e.g., emotional state, cognitive
state, task state, etc.), including labeled annotations of context.
2. To determine data features and related representation.
3. To aggregate and fuse data from multiple data sources (using multi-sensor data
fusion techniques) and transform them into cues or the application-dependent
features, e.g., through noise elimination, dimension reduction, data normaliza-
tion, and data fusion.
4. To divide the data into two sets: training and test.
5. To train the recognition algorithm on the training set.
6. To test the performance of the trained algorithm classifier on the test set.
7. To apply the algorithm in the context of context recognition.
Stages (4) to (7) are commonly repeated with different partitioning of the
training and test sets as a means of achieving better generalization of the context
recognition models; the short sketch below illustrates these stages.
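As a concrete, purely illustrative sketch of stages (4)–(7), assuming the scikit-learn library is available: labeled context feature vectors are partitioned into training and test sets, a classifier is trained on the former, its accuracy is checked on the latter, and the fitted model is then applied to a new observation. The feature values, feature meanings, and context labels are all invented for illustration.

from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Illustrative fused sensor features: [mean_motion, noise_level, light_level].
X = [[0.90, 70, 300], [0.80, 65, 280], [0.10, 30, 90], [0.05, 25, 80],
     [0.85, 72, 310], [0.12, 28, 85], [0.95, 68, 290], [0.08, 27, 75]]
y = ["walking_outdoors", "walking_outdoors", "resting_indoors", "resting_indoors",
     "walking_outdoors", "resting_indoors", "walking_outdoors", "resting_indoors"]

# Stage 4: divide the data into training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Stage 5: train the recognition algorithm on the training set.
classifier = DecisionTreeClassifier().fit(X_train, y_train)

# Stage 6: test the trained classifier on the held-out test set.
print("accuracy:", accuracy_score(y_test, classifier.predict(X_test)))

# Stage 7: apply the trained classifier to a newly observed feature vector.
print(classifier.predict([[0.07, 26, 82]]))   # e.g. ['resting_indoors']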
Classification and Classifiers: Classification is a key subprocess in the cog-
nitive process of human perception. In other words, it is a crucial step to achieving
the perception of any sensory information received by the human brain. In real life,
an observer of a context starts off with a perceptual analysis of the context, com-
pares this to previously encountered contexts, classifies the context into a meaningful
category, and subsequently interprets and evaluates the context—since the initially
perceived context is only an expression of a certain interpretation of a situation and
thus other relevant aspects may be included in the context description over time,
resulting in a context conclusion or inference. Recognition is a process of com-
paring descriptions of objects currently in view with descriptions of objects seen
previously, which reside in long-term memory (Braisby and Gellatly 2005). The
classification process applied in context-aware systems tends to follow the same
logic. That is to say, in the domain of context recognition, the basic idea of clas-
sification is to determine different context labels on the basis of a set of context
categories (training examples), learned from the real-world as models. The algo-
rithm is presented with a set of inputs and their desired outputs, e.g., association of
sensor data with real-world contexts, and the goal is to learn a general rule that
maps inputs (e.g., sensor data) to outputs (context labels), so that the algorithm can
map new sensor data into one of these context labels. The quality of classification,
how well a classifier performs, is inextricably linked to the nature and richness of
the learning experience of the algorithm, and also depends critically on the features
of the contextual data to be classified. Building new models, training new classes,
during the analysis of the collected sensor data is important for making future
inductive abstractions in terms of classifying unknown contextual data and thus
gaining experience. Classification of contexts is done using a classifier that is
learned from a comprehensive training set of annotated context examples.
Classifiers represent tasks entailing the use of pattern matching to determine a best
match between the features extracted from sensor data and a context description.
This is about the classification of sensor cues into a known category and the storing
of general patterns of context. From a cognitive psychology perspective, human
recognition is considered as the initial process of perception before the information
is semantically classified, named, and then stored in the long-term memory.
However, there are various supervised learning classifiers, and they vary in terms of
performance, which depends on the application domain of context awareness to
which they can be applied. For example, Wimmer et al. (2009) state that Binary
Decision Tree is a robust and quick classifier when it comes to facial expression
classification. The decision tree learning approach uses a decision tree as a predictive
model which maps sensor observations to inferences about the context’s target
value. In a cognitive context-aware system, Kim et al. (2007) adopted a Support
Vector Machine (SVM) classifier in a context inference algorithm because it showed
the highest performance among the compared classifiers in terms of accurate categoriza-
tion of text. As a set of related supervised learning methods used for classification,
an SVM training algorithm builds a model that predicts into which of two cate-
gories a new example falls, assuming that each training example is marked as
belonging to one of two categories. In view of this, it is not possible to say that one
classifier is superior to or better than another, nor is there a single classifier that
works best on all given problems. Thus, determining a suitable classifier for a
given problem domain is linked to the complexity and nature of that problem
domain. Accordingly, multi-class classifiers that are able to derive the class
membership from real valued features can be integrated.
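Since no single classifier works best on every problem, a common way of choosing one for a given context recognition task is to compare candidates empirically on the same labeled data, for instance with cross-validation. The sketch below (again assuming scikit-learn, with invented data) compares a decision tree and an SVM in exactly this way.

from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC

# Illustrative labeled context data: [feature_1, feature_2] -> context label.
X = [[0.90, 0.10], [0.80, 0.20], [0.85, 0.15], [0.95, 0.05],
     [0.10, 0.90], [0.20, 0.80], [0.15, 0.85], [0.05, 0.95]]
y = ["focused", "focused", "focused", "focused",
     "distracted", "distracted", "distracted", "distracted"]

candidates = {"decision tree": DecisionTreeClassifier(),
              "SVM": SVC(kernel="rbf")}

# 4-fold cross-validation: mean accuracy over the held-out folds.
for name, clf in candidates.items():
    scores = cross_val_score(clf, X, y, cv=4)
    print(name, "mean accuracy:", scores.mean())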
Common Classifiers in Context-Aware and Affective Computing: Based on the
literature, a wide range of classifiers are used in context-aware (and affective)
computing. They include, in addition to Binary Decision Tree and SVM, neural
networks, k-nearest neighbor, dynamic and naive Bayes, and Hidden Markov
Models (HMMs). These are also known as algorithms and models for supervised
learning and context recognition. They have been applied in a wide variety of
context awareness domains within both laboratory-based and real-world
environments. For example, SVM is used by Kim et al. (2007) in a context
inference algorithm to recognize or infer a user's intention as a cognitive context;
the sources that the user is using on the Web-based information system should be
discerned, and then the representatives of each source category should be extracted
and classified by means of a text categorization technique. In the context of emotion
recognition based on facial expressions, Michel and El Kaliouby (2003) utilize an
SVM to determine one of six facial expressions within the video sequences of
the comprehensive facial expression database developed by Kanade et al. (2000) for
facial expression analysis. Schweiger et al.'s (2004) classification method is based
on supervised neural network learning; they compute the optical flow within six
predefined regions of a human face in order to extract the facial features. Wimmer
et al. (2009) apply Binary Decision Tree (Quinlan 1993) as a classifier to infer the
correct facial expressions from the features extracted from a video frame. Sebe et al.
(2002) adopt a Cauchy naive Bayes classifier for emotion recognition from facial
expressions. A recent research project called 'Machine Learning and Pattern
Recognition with Multiple Modalities' (MIT Media Lab 2014), aiming to develop
new recognition algorithms that enable computer systems to make rapid and
accurate inferences from multiple modes of data, applies a Bayesian approach:
'formulating probabilistic
models on the basis of domain knowledge and training data, and then performing
inference according to the rules of probability theory.’ The detection and recog-
nition of emotional states from facial expressions can thus be achieved through
various classifiers or methods. It is worth mentioning that emotion recognition
based on facial expressions is a de facto standard in context-aware computing and
affective computing as well as emotionally intelligent and conversational systems.
As emotions are inherently multimodal, to provide a more robust estimation of the
user's emotional state, different modalities can be combined, and so too can classifiers.
Caridakis et al. (2006) combine facial expressions and speech prosody, and
Balomenos et al. (2004) combine facial expressions and hand gestures. Further, in
the context of activity recognition, HMMs are adopted in Patterson et al. (2005),
Ward et al. (2006) and Boger et al. (2005), dynamic and naïve Bayesian networks
in Philipose et al. (2004), Wang et al. (2007) and Albrecht and Zukerman (1998),
decision trees in Tapia and Intille (2007), nearest neighbor in Lee and Mase (2002)
and SVMs in Huynh (2008). With regard to wearable computing, HMMs are used
for 'Learning Significant Locations and Predicting User Movement with GPS'
(Ashbrook and Starner 2002), and neural networks in Van Laerhoven et al. (2002),
using many sensors (accelerometers) to build models of and analyze a user's body
movement. Brdiczka et al. (2007) propose a four-layered situation learning
framework, which acquires different parts of a situation model, namely situations
and roles, with different levels of supervision. Situations and roles are learned from
individual audio and video data streams. The learning-based approach has a 100 %
recognition rate of situations with pre-segmentation.
Among the above models and algorithms for supervised learning, HMMs and
Bayes networks are thus far the most commonly applied methods in the area of
context recognition. While both of these methods have been shown to be successful
in context-aware computing, they are both very complex and require a large
amount of labeled training and test data. This is in fact the main disadvantage of
supervised learning in the case of probabilistic methods, adding to the fact that it
could be computationally costly to learn each context in a probabilistic model for an
infinite richness or large diversity of contexts in real-world application scenarios
(see Chen and Nugent 2009). Moreover, given that context-aware applications
usually incorporate different contextual features of the user that should be combined
in the inference of a particular dimension of context, and one feature may, in turn,
involve different types of sensor data (e.g., the emotional feature of a user's
context includes data from image sensors, voice sensors, and biosensors), adding
to the variations of users' states and behaviors, the repetitive diversification of the
partitioning of the training and test sets may not lead to the desired outcome with
regard to the generalization of the context recognition models. This has
implications for the
accuracy of the estimation of context, that is, the classification of dynamic con-
textual data into relevant context labels. Probabilistic machine learning methods
'choose a trade-off between generalization and specification
when acquiring concepts from sensor data recordings, which does not always meet
the correct semantics, hence resulting in wrong detections of situations’ (Bettini
et al. 2010, p. 11). A core objective of a learning algorithm is to generalize from its
experience (see Bishop 2006), whereby generalization denotes the ability of a
learning mechanism to perform accurately on previously unseen context examples
after having experienced a learning data set, that is, the combination of context
patterns with their class labels, given that each pattern belongs to a certain
predefined class. While this trade-off decision must be made, the resulting context
models are often ad hoc and not reusable. Indeed, supervised learning algorithms
inherently suffer from several limitations, namely limited scalability, data scarcity,
inflexibility, and ad hoc static models; these methods 'should tackle technical
challenges in
terms of their robustness to real-world conditions and real-time performance’ (Chen
and Nugent 2009). Research endeavors in machine learning should focus on cre-
ating alternative theories based on new discoveries in human-directed sciences in
terms of developing less complicated, computationally elegant, and, more impor-
tantly, effective and robust algorithms, with wider applicability, irrespective of the
application domain.
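As a rough illustration of how such generalization is typically estimated in practice, the sketch below (Python with scikit-learn, entirely synthetic data and an arbitrary decision tree model) evaluates a classifier on held-out folds rather than on the examples it was trained on:

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))               # 60 synthetic feature vectors (cues)
y = (X[:, 0] + X[:, 1] > 0).astype(int)    # synthetic context labels

# Accuracy on held-out folds approximates performance on unseen context examples
scores = cross_val_score(DecisionTreeClassifier(), X, y, cv=5)
print(scores.mean())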

4.10.3.3 Unsupervised Learning Methods

Distinct from supervised learning, unsupervised learning tries to directly build
recognition models from unlabeled data. Having no labels, the learning
algorithm is left on its own to find groups of similar inputs or density estimates that can
be visualized effectively (Bishop 2006). Unsupervised learning thus provides
context-aware systems with the ability to find context patterns in cues as abstraction
from raw sensor data—i.e., features extracted from the data stream of multiple,
diverse sensors. Probabilistic algorithms can be used for finding explanations for
streams of data, helping recognition systems to analyze processes that occur over
time (Russell and Norvig 2003). Underlying the unsupervised learning algorithm is
the idea to manually assign a probability to each possible context and to use a
pre-established stochastic model to update the context likelihoods on the basis of
new sensor readings as well as the known state of the system (see Chen and Nugent
2009). The general process of an unsupervised learning algorithm for context
recognition includes, according to Chen and Nugent (2009, p. 414), the following steps (see the sketch after the list):
1. to acquire unlabeled sensor data;
2. to aggregate and transform the sensor data into features; and
3. to model the data using either density estimation (to estimate the properties of
the underlying probability density) or clustering methods (to discover groups of
similar examples to create learning models).
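A minimal sketch of this process is given below (Python with scikit-learn, synthetic data); the two clusters stand in for groups of similar context examples discovered without labels, and a Gaussian mixture illustrates the density estimation route:

import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
features = np.vstack([rng.normal(0.0, 0.1, size=(50, 2)),   # e.g., "idle"-like cues
                      rng.normal(1.0, 0.1, size=(50, 2))])  # e.g., "active"-like cues

clusters = KMeans(n_clusters=2, n_init=10).fit_predict(features)  # clustering
density = GaussianMixture(n_components=2).fit(features)           # density estimation
print(clusters[:5], density.means_)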
There exist a number of methods and algorithms for unsupervised learning that
are based on probabilistic reasoning, including Bayes networks, graphical models,
multiple eigenspaces, and various variants of HMMs. Huynh and Schiele (2006)
used multiple eigenspaces for discovery of structure in activity data. Liao et al.
(2007) adopted a hierarchical HMM that can learn and infer a user’s daily actions
through an urban community. Unsupervised learning probabilistic methods are
capable of handling the uncertainty and incompleteness of sensor data. Probabilities
can be used to serve various purposes in this regard, such as modeling uncertainty,
reasoning on uncertainty, and capturing domain heuristics (see, e.g., Bettini
et al. 2010; Chen and Nugent 2009). It is worth mentioning that uncertainty is one
of the weaknesses of ontological approaches in terms of both modeling and rea-
soning. However, unsupervised learning probabilistic methods are usually static
and highly context-dependent, adding to their limitation as to the assignment of the
handcrafted probabilistic parameters (e.g., modeling uncertainty, capturing heuris-
tics) for the computation of the context likelihood (see Chen and Nugent 2009).
Indeed, they seem to be less applied than supervised learning in the domain of
context recognition.

4.10.3.4 Logical and Ontological Modeling Methods and Reasoning Algorithms

Context is a domain of knowledge that can be formally represented and reasoned
about based on a variety of languages and reasoning mechanisms, such as the Web
Ontology Language (OWL) and logic programing. Context representation and reasoning
entails a context model that is semantically expressed in a computational way to
allow software agents to conduct reasoning and manipulation, using a wide variety
of knowledge representation and reasoning logic-based formalisms, including
description logic, first-order logic, fuzzy logic, sentential logic, modal logic, and
inductive logic (see, e.g., Bettini et al. 2010; Russell and Norvig 2003; Luger and
Stubblefield 2004; Nilsson 1998). The basic idea of the logical approach is to
represent the context knowledge domain using a logic-based formalism, to model
sensor data accordingly, and to use logical reasoning to carry out context recognition
(see Chen and Nugent 2009). Logical context recognition is close to the ontological
approach in nature, as the latter is also based on description logic, which is one of several
extensions of logic that are intended to handle specific domains of knowledge. In
reference to activity recognition, the general process of a logical approach, which
can also apply to context recognition, includes, according to Chen and Nugent
(2009, p. 415):
1. to use a logical formalism to explicitly define and describe a library of activity
[or context] models for all possible activities [or contexts] in a domain;
2. to aggregate and transform sensor data into logical terms and formula; and
3. to perform logical reasoning, e.g., deduction, abduction and subsumption, to
extract a minimal set of covering models of interpretation from the activity [or
context] model library based on a set of observed actions [or contextual fea-
tures], which could explain the observations.
The general process of an ontological approach includes the following steps (see the sketch after the list):
1. to use an expressive formalism to explicitly specify key concepts and their
interrelationships for all possible contexts (e.g., emotional states) in a domain
(e.g., emotion);
2. to aggregate, fuse, and transform sensor data into semantic terms; and
3. to perform descriptive-logic based reasoning, e.g., subsumption, to interpret
atomic context concepts and then deduce or infer a high-level context abstraction.
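The sketch below is a deliberately simplified, hypothetical stand-in for this ontological route, written as plain Python rules rather than OWL or description logic: sensor readings are mapped to atomic semantic terms, and a higher-level context is deduced when the terms required by a context concept are subsumed by the observed terms.

# Hypothetical mapping of sensor readings to atomic semantic terms
ATOMIC_TERMS = {
    "heart_rate":  lambda v: "elevated_heart_rate" if v > 100 else "normal_heart_rate",
    "speech_rate": lambda v: "fast_speech" if v > 180 else "normal_speech",
}
# Hypothetical high-level context concepts defined by the terms they require
HIGH_LEVEL = {"stressed": {"elevated_heart_rate", "fast_speech"}}

def infer_context(readings):
    terms = {ATOMIC_TERMS[sensor](value) for sensor, value in readings.items()}
    return [ctx for ctx, required in HIGH_LEVEL.items() if required <= terms]

print(infer_context({"heart_rate": 115, "speech_rate": 200}))  # ['stressed']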
The logical and ontological approaches to context representation and reasoning
are acknowledged to be semantically clear in computational reasoning. See the
next chapter for a detailed account of the ontological approach to context modeling
and reasoning, including its strengths and weaknesses. The strength of logical
approaches lies in the ease of integrating domain knowledge and heuristics into
context models and data fusion, and the weakness 'in the inability or inherent
infeasibility to represent fuzziness and uncertainty'; they 'offer no mechanism for
deciding whether one particular model is more effective than another', adding to 'a
lack of learning ability associated with logic based methods’ (Ibid).
Like supervised and unsupervised learning probabilistic methods, there is a range of
logical modeling methods and reasoning mechanisms with regard to logical theories
(e.g., situation theory, event theory, lattice theory) and representation formalisms
(e.g., first-order logic, inductive logic, description logic, fuzzy logic). In terms of
logical representation, first-order logic can express facts about objects and their
properties and interrelationships, and allows the use of predicates and quantifiers
(see Russell and Norvig 2003; Luger and Stubblefield 2004). In a project called
Gaia, a predicate logic representation of context information is developed by
Ranganathan and Campbell (2003) based on logic programing using XSB (Sagonas
et al. 1994). In this model, a first-order predicate is associated with each context,
with its designation describing the context type. As a logic operator, quantification
is always done over finite sets, and can be used in addition to other logic operators,
such as conjunction, disjunction, and negation, to combine the context predicates
into more complex context descriptions (see Perttunen et al. 2009). Ranganathan
and Campbell (2004) have used AI planning techniques to the Gaia system, namely
‘STRIPS’ (Brachman and Levesque 2004) planning. In his thesis, Ranganathan
(2005) states that that they believed that planning computationally was too costly
for their system. Henricksen and Indulska (2006) applied predicate logic to infer a
situation abstraction. High-level situation abstractions are expressed in their model
using a novel form of predicate logic that balances efficient evaluation against
expressive power. They define a grammar for formulating high-level situation
abstractions that model real-world situations in order to evaluate more complex
conditions than can be captured by assertions. Assertions are used to define the sets
over which the quantification is performed. Assertions that are interpreted under a
closed-world assumption with three-valued logic are used to reduce the values in
quantified expressions describing situations. High-level situation abstractions can
be incrementally combined to form more complex logical expressions. Moreover,
the context predicates can be combined using different logic operators into more
complex context descriptions. This is similar to the case of description-logic based
reasoning, where the fillers of a number of properties can be linked to form a
context description, the inference of unknown context described by the perceived
properties.
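In the spirit of such predicate-based models (though not their actual implementation), the following hypothetical Python fragment shows how context predicates can be combined with conjunction and quantification over a finite set to form a more complex context description:

people_present = {"alice", "bob"}                           # finite set to quantify over
location = {"alice": "meeting_room", "bob": "meeting_room"}
device_on = {"projector": True}

def in_meeting_room(person):                                # a context predicate
    return location.get(person) == "meeting_room"

# "Meeting in progress" := projector is on AND every person present is in the room
meeting_in_progress = device_on["projector"] and all(in_meeting_room(p)
                                                     for p in people_present)
print(meeting_in_progress)  # True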
In the context of activity recognition, as mentioned in Chen and Nugent (2009),
Kautz (1991) espoused first-order axioms to construct a library of hierarchical
plans for plan recognition, an approach that was extended by Wobke (2002), who
adopted situation theory to address the inferred plans as to their different proba-
bilities. Proposed by Barwise and Perry (1981), situation theory deals with
model-theoretic semantics of natural language in a formal logic system. Bouchard
and Giroux (2006) adopted action description logic and lattice theory for plan
recognition.
Further, several extensions of logic—logic-based formalisms—have been
designed to handle specific domains of knowledge in the area of UbiComp or AmI.
Examples include description logic, situation calculus, event (and time)
calculus, and sentential logic. The situation calculus (Reiter 2001) provides a
logical language for reasoning about action and change. In the context of activity
recognition, as mentioned in Chen and Nugent (2009), Chen 'exploited the event
theory—a logical formalism, for explicit specification, manipulation and reasoning
of events, to formalize an activity domain for activity recognition and assistance.
The major strength of Chen's work is its capabilities to handle temporal issues
and undecidability.’
As pointed out above, modeling methods and reasoning algorithms that are
based on formal logics have strengths and weaknesses. Particularly, they provide a
high level of abstraction and formality for specifying or describing contexts, but
they are known for their inability to handle uncertainty and incompleteness of
context information representation, in addition to the limitation as to reasoning
performance compared to probabilistic methods, which reduce their scalability in
real-world application scenarios. Moreover, they are recognized in the field of
context-aware computing as error-prone due to the ambiguity and incompleteness
of contextual information.

4.10.4 Uncertainty in Context-Aware Computing

4.10.4.1 Uncertainty of Context Data/Information

By their very nature, humans are exquisitely attuned to their context. They rec-
ognize, understand, and respond to it without being explicitly or necessarily aware
of doing so. This indicates the subtlety of human sensory organs and internal
representations and structures involved in handling context in real-world situations
in terms of cognitive information processing. Once established, cognitive
schemata facilitate the interpretation of new experiences, enabling humans, for
example, to perform accurately on new, unseen contexts. In other words, humans
resort to schemas that provide a recognizable, yet evolving over time, meaning of
contexts in order to make sense of a complex reality in terms of interaction. The
specifics of context are dynamic, volatile, subjective, fluid, intricate, and subtle.
Hence, they are difficult to identify, measure, and model (operationalize),
which may well hinder the system from estimating or making predictions about users'
cognitive and emotional needs at a given moment. Our measurement of the
real world is prone to uncertainty due to the use of imprecise sensors—and thus
imperfect sensing. As contextual data often originate from sensors, uncertainty
becomes unavoidable. Likewise, computational models and reasoning mechanisms
must necessarily be (over) simplified, as they are circumscribed by existing tech-
nologies. Simulating human representations, structures, and mental information
processes in computer systems has been a daunting challenge in AI.
Consequently, context-aware systems are faced with the inevitability of the
employment of alternative techniques to deal with the issues of uncertainty, vague-
ness, erroneousness, and incompleteness of sensor context information in relation to
modeling methods and reasoning algorithms. A real challenge in context-aware
computing is to build robust, accuracy-enhanced, and comprehensive context
models that can deal with these issues. Bettini et al. (2010) point out that
context-aware applications are required to capture and make sense of imprecise and
conflicting data about the physical world, as its measurements are inevitably prone to
uncertainty. One aspect to consider in this regard is to formally conceptualize
context entities as dynamic rather than as static entities with fixed routines,
common-sense patterns, and heuristics. While the respective problem seems difficult to
eradicate, especially when it comes to dealing with human functioning (emotional,
cognitive and behavioral processes), it is useful to develop innovative techniques
and methods to deal with it in ways that reduce its effect on the performance of
context-aware applications due to imperfect inferences. In context-aware
applications, adaptation decisions ‘are made based on evaluation of context infor-
mation that can be erroneous, imprecise or conflicting’, and hence ‘modeling of
quality of context information and reasoning on context uncertainty is a very
important feature of context modeling and reasoning’ (Bettini et al. 2010, p. 2).
Failure to overcome the issue of uncertainty has implications for the quality of
context-aware applications in terms of the relevancy of delivered services—wrong
choices as to context-dependent actions—due to wrong detections of situations or
imperfect inferences of high-level abstraction of contexts. Therefore, uncertainty is
increasingly becoming a topic of importance and thus gaining a place in the research
area of context-aware computing in relation to low-level data acquisition,
intermediate-level information processing, and high-level service delivery and
applications. Many computational problems associated with context-aware func-
tionality, namely learning, sensing, representation, interpretation, reasoning, and
acting, entail that software agents operate with uncertain, imprecise, or incomplete
contextual information. Different types of software objects in the environment must
be able to reason about uncertainty, including ‘entities that sense uncertain contexts,
entities that infer other uncertain contexts from these basic, sensed contexts, and
applications that adapt how they behave on the basis of uncertain contexts. Having a
common model of uncertainty that is used by all entities in the environment makes it
easier for developers to build new services and applications in such environments
and to reuse various ways of handling uncertainty.’ (Bettini et al. 2010, p. 12). In a
recent review of context representation and reasoning in pervasive computing
(Perttunen et al. 2009), the authors stated that there is only a handful of work on
context-aware systems that deals with representation and reasoning under
uncertainty.

4.10.4.2 Models for Uncertainty

Over the last decade, there have been some attempts to create models that deal with
uncertainty issues when representing and reasoning about context information.
A number of research projects have focused on modeling of quality of context
information and reasoning on context uncertainty as an important feature of context
modeling and reasoning. Among the early efforts to address and overcome uncer-
tainty is the work by Schmidt et al. (1999) and Dey et al. (2000). Schmidt and
colleagues associate each of their context values with a certainty measure which
captures the likelihood that the value accurately reflects reality, whereas Dey and
colleagues suggest a method whereby ambiguous information can be resolved by a
mediation process involving the user. This solution is particularly viable when the
context information is manageable in terms of the volume and not subject to rapid
change, so that the user is not unreasonably burdened (Bettini et al. 2010). In Gray
and Salber (2001), the authors discuss the issue of information quality in general
and include it as a type of meta-information in their context model. They describe
six quality attributes: coverage, resolution, accuracy, repeatability, frequency, and
timeliness. The context service described by Lei et al. (2002) allows different
quality metrics to be associated with context information. Ranganathan et al.
(2004a, b) provide a classification of different types of quality metrics that can be
associated with location information acquired from different types of sensors. These
metrics are: (1) resolution, which is the region that the sensor states the mobile
object is in, and can be expressed either as a distance or as a symbolic location,
depending on the type of sensor, e.g., GPS and card-reader, respectively;
(2) confidence, which is measured as the probability that the person is actually
within a certain area, which is calculated based on which sensors can detect that
person in the area of concern; and (3) freshness, which is measured based on the
time that has elapsed since the sensor reading, assuming that all sensor readings
have an expiry time.
Furthermore, an attempt at modeling uncertain context information with
Bayesian networks has been undertaken by Truong et al. (2005). They suggest
representing Bayesian networks in a relational model, where p-classes are used to
store probabilistic information, i.e., their properties have concomitant constraints:
parents-constraint and conditional probability table constraint. In Henricksen and
Indulska's (2006) model, whose interpretation is based on three-valued logic
under the closed-world assumption, the 'possibly true' value is used to represent
uncertain information. To represent context information, Mäntyjärvi and Seppänen
(2002) adopted fuzzy logic as manifested in vague predicates to represent various
types of user activities as concepts. Ranganathan et al. (2004a) developed an
uncertainty model and describe reasoning with vague and uncertain information in
the Gaia system, their distributed middleware system for enabling Active Spaces.
This model is based on a predicate representation of contexts, where a confidence
value, which can be interpreted as one of two values: a probability in probabilistic
logic or a membership value in fuzzy logic, is assigned for each context predicate.
In other words, it measures the probability in the case of probabilistic logic or the
membership value in the case of fuzzy logic of the event that corresponds to the
context predicate holding true. Thus, this model uses various mechanisms such as
fuzzy logic, probabilistic logic, and Bayesian networks. However, the authors state
that probabilities or confidence values to be associated with types of context
information cannot be known by the designer. Approaches to inferring and pre-
dicting context information from sensor data in a bottom-up manner are proposed
by Mayrhofer (2004) and Schmidt (2002). In the ECORA framework (Padovitz
et al. 2008), a hybrid architecture for context-oriented pervasive computing, context
information is represented as a simple multi-dimensional vector of sensor mea-
surements, a space where a context is described as a range of values. A confidence
value is derived on the basis of the current sensor measurements (observations) and
the context descriptions to represent the ambiguity or uncertainty in the occurrence
of a context.
As to ontological approaches to context modeling and reasoning, there are a few
projects that have attempted to address the issue of representing and reasoning about
uncertainty. Straccia (2005) and Ding and Peng (2004) propose to extend existing
ontology languages and related reasoning tools to support fuzziness and uncertainty
while retaining decidability. However, according to Bettini et al. (2010), ontology
languages and related reasoning tools did not, at the time of writing, properly
support uncertainty in context data, apart from a few existing preliminary proposals
to extend OWL-DL to represent and reason about fuzziness and uncertainty. As
echoed in a recent
survey carried out by Perttunen et al. (2009), none of the description logic-based
approaches are capable of dealing with uncertainty and vagueness. Although some
work (e.g., Schmidt 2006; Reichle et al. 2008) has attempted to combine ontological
modeling with modeling of uncertainty to approach the issue, it falls short of
considering and preserving the benefits of formal ontologies. In all, sum-
marizing a review of work on modeling vagueness and uncertainty, Perttunen et al.
(2009) note that no work presents a model that satisfies all the requirements for
context representation and reasoning; and seemingly ‘the benefit of modeling
uncertainty and fuzziness has not been evaluated beyond the capability of repre-
senting it’, meaning that ‘the work doesn’t make it clear how easy it is to utilize such
models in applications…and in what kind of applications does it benefit the users.’
Based on the literature, empirical work that deals with representing and rea-
soning under uncertainty in relation to cognitive and emotional context-aware
systems is scant, regardless of whether the context pattern recognition algorithm is
based on machine learning techniques or ontological approaches to modeling and
reasoning. This can probably be explained by the fact that the research within
emotional and cognitive context awareness is still in its infancy, and thus the
associated modeling methods and reasoning algorithms are not as mature as those
related to situational context, activity, and location.
4.10.4.3 Reasoning on Uncertainty in Context Information

In context-aware computing, many problems pertaining to learning, inference, and
prediction entail that software agents operate with uncertain information. Therefore,
researchers from both AmI and AI domains have developed and proposed a number
of mechanisms for reasoning on uncertainty using probability theory and logic
theory. Broadly, the purpose for reasoning on uncertainty is twofold: to improve the
quality of context information and to infer new kinds of context information, by
using multi-sensor fusion where data from different sensors are used to enhance
context quality metrics, and deducing higher level contexts or situations from
lower level ones, respectively; since it is not possible to directly capture the
higher level contexts, a certain level of uncertainty becomes likely in relation to
these contexts, which depends on both the accuracy of information detection and
the precision of the deduction process (Bettini et al. 2010).
Various approaches have been applied to reason on uncertainty in context
information. They are referred to in a recent survey of context modeling and rea-
soning techniques (Bettini et al. 2010). Before describing the main approaches, it
is important to note that most of them relate to probabilistic and logical methods,
since ontological and logical approaches are known for their inherent infeasibility
to represent fuzziness and uncertainty, and thus their inability to reason on
uncertainty.
Fuzzy logic: As a version of first-order logic, it allows the truth of a statement to
be represented as a value between 0 and 1 (see Russell and Norvig 2003).
Fuzzy systems used for uncertain representation and reasoning have commonly
been adopted in various application domains within AmI and AI. The fuzzy logic
(Zadeh 1999) approach can be utilized for representing and reasoning on context
uncertainty or vagueness. It is manifested in vague predicates (Brachman and
Levesque 2004). In fuzzy logic ‘confidence values represent degrees of membership
rather than probability. Fuzzy logic is useful in capturing and representing impre-
cise notions such as…“confidence” and reasoning about them. The elements of two
or more fuzzy sets can be combined (fused) to create a new fuzzy set with its own
membership function…Fuzzy logic is well suited for describing subjective con-
texts, performing multi-sensor fusion of these subjective contexts and resolving
potential conflicts between different contexts.’ (Bettini et al. 2010, p. 13).
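A minimal, hypothetical sketch of this fuzzy treatment is given below: confidence values are degrees of membership in [0, 1] produced by vague predicates, and memberships derived from different cues are fused with fuzzy operators (min for conjunction); the predicates and thresholds are invented for illustration.

def warm_membership(temp_c):                 # vague predicate "warm"
    return max(0.0, min(1.0, (temp_c - 18.0) / 10.0))

def crowded_membership(people_count):        # vague predicate "crowded"
    return max(0.0, min(1.0, people_count / 20.0))

temperature, people = 24.0, 15
# Fuzzy AND (min) fuses the two subjective contexts into one membership degree
uncomfortable = min(warm_membership(temperature), crowded_membership(people))
print(round(uncomfortable, 2))               # 0.6: degree of "warm and crowded"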
Probabilistic logic: In probabilistic logic the truth values of sentences are
probabilities (Nilsson 1986). Probabilistic logic is used to handle uncertainty with
the capacity of deductive logic, allowing ‘making logical assertions that are asso-
ciated with a probability' (Bettini et al. 2010), resulting in a more expressive
formalism. Fagin et al. (1990) propose a probabilistic logic based on
propositional logic and specify a complete axiomatization. Probabilistic logic allows
writing ‘rules that reason about events’ probabilities in terms of the probabilities of
other related events. These rules can be used both for improving the quality of
context information through multi-sensor fusion as well as for deriving higher level
probabilistic contexts. The rules can also be used for resolving conflicts between
context information obtained from different sources.’ (Bettini et al. 2010, p. 13).
However, it is argued that probabilistic logics are associated with some difficulties,
manifested in their tendency to multiply the computational complexities of their
probabilistic and logical components.
Hidden Markov Models (HMMs): HMMs have a wide applicability in context
awareness for different problems, including learning, inference, and prediction.
They represent ‘stochastic sequences as Markov chains; the states are not directly
observed, but are associated with observable evidences, called emissions, and their
occurrence probabilities depend on the hidden states’ (Bettini et al. 2010, p. 14).
They have been used for location prediction. For example, Ashbrook and Starner
(2002) adopt HMMs that can learn significant locations and predict user move-
ment with GPS sensors. In a similar approach, Liao et al. (2007) adopt a hierar-
chical HMM that can learn and infer a user’s daily actions through an urban
community. Multiple levels of abstraction are used in their model to bridge the gap
between raw GPS sensor measurements and high-level information.
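The following minimal sketch (Python with NumPy, made-up probabilities) illustrates the underlying idea rather than any of the cited systems: hidden context states are decoded from a sequence of observable emissions with the Viterbi algorithm.

import numpy as np

states = ["home", "work"]
start = np.array([0.6, 0.4])                 # P(state at t=0)
trans = np.array([[0.7, 0.3], [0.4, 0.6]])   # P(state_t | state_{t-1})
emit = np.array([[0.9, 0.1], [0.2, 0.8]])    # P(observation | state)
observations = [0, 0, 1, 1]                  # e.g., indices of coarse GPS cells

v = start * emit[:, observations[0]]         # Viterbi scores for the first step
backpointers = []
for obs in observations[1:]:
    scores = v[:, None] * trans * emit[:, obs][None, :]
    backpointers.append(scores.argmax(axis=0))
    v = scores.max(axis=0)

path = [int(v.argmax())]                     # backtrack the most likely state sequence
for bp in reversed(backpointers):
    path.append(int(bp[path[-1]]))
print([states[s] for s in reversed(path)])   # ['home', 'home', 'work', 'work']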
Bayesian networks: Based on probability theory, Bayesian networks can be used
for a wide range of problems in AI and AmI: perception using dynamic Bayesian
networks (e.g., Russell and Norvig 2003), learning using the expectation-
maximization algorithm (e.g., Poole et al. 1998; Russell and Norvig 2003), and
reasoning using the Bayesian inference algorithm (e.g., Russell and Norvig 2003;
Luger and Stubblefield 2004). They ‘are directed acyclic graphs, where the nodes
are random variables representing various events and the arcs between nodes rep-
resent causal relationships. The main property of a Bayesian network is that the
joint distribution of a set of variables can be written as the product of the local
distributions of the corresponding nodes and their parents’ (Bettini et al. 2010,
p. 14). They represent conditional probabilities efficiently when the dependencies
in the joint distribution are sparse, and are well suited for
inferring higher level contexts and combining uncertain information from a large
number of sources (Ibid).
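The hypothetical fragment below illustrates the basic idea on the smallest possible scale: a prior over a high-level context and local conditional probabilities for two sensor-derived pieces of evidence (assumed conditionally independent given the context) are combined by Bayes' rule into a posterior; all values are invented for illustration.

import math

prior = {"meeting": 0.3, "no_meeting": 0.7}                      # P(context)
likelihood = {                                                   # P(evidence | context)
    "meeting":    {"calendar_busy": 0.9, "room_occupied": 0.8},
    "no_meeting": {"calendar_busy": 0.2, "room_occupied": 0.3},
}
evidence = ["calendar_busy", "room_occupied"]

unnormalized = {c: prior[c] * math.prod(likelihood[c][e] for e in evidence)
                for c in prior}
total = sum(unnormalized.values())
posterior = {c: p / total for c, p in unnormalized.items()}
print(posterior)   # P(meeting | evidence) is roughly 0.84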

4.10.5 Basic Architecture of Context Information Collection, Fusion, and Processing

The process of recognizing relevant contexts consists of determining the ‘condi-
tions’ or ‘circumstances’ of entities, e.g., a user and his/her emotional state, cog-
nitive state, situational state, or activity, according to the universe of discourse
(respectively emotion, cognition, situation, and activity) associated with the
respective context-aware application. The aim is to make sensible predictions about
what users need, want, or feel and then undertake in a knowledgeable manner
actions that support their emotional and cognitive needs and daily tasks and
activities, by providing efficient and useful services. Accordingly, context-aware
computing involves various research topics, ranging from low-level sensor data
collection; to middle-level data fusion, representation, interpretation, and reasoning;
to high-level applications and service delivery. These topics can be framed
differently. The first topic concerns capture technologies and related signal and data
processing approaches. The second topic deals with pattern recognition methods and
algorithms and related models that are used to learn or represent and interpret and
reason about contexts. Part of this topic, in relation to ontological approach, pertains
to explicit specification of key concepts and their interrelationships for a certain
context domain and their formal representation using the commonly shared ter-
minology in that domain. The third topic is concerned with context-dependent
actions or ambient service delivery, involving application and adaptation rules. In
all, context-aware functionality is established through capturing, collecting, orga-
nizing, and processing context information to support adaptation of services in the
AmI spaces. This occurs at different levels of the system. This implies that
context-aware applications are based on a multilayered architecture, encompassing
different, separate layers of context information processing, i.e., raw sensor data,
feature extraction, classification or clustering (in the case of supervised or
unsupervised learning methods), and high-level context derivation from semantic or
logical information (in the case of ontological and logical approaches).
Figure 4.9 illustrates, in addition to the physical layer of sensors, three layers of
context information processing, along with example techniques and methods that
have typically been used in context-aware computing. The arrows depict the flow of
context data/information.
Layer 1—Physical sensors: Signals in the environment are detected from mul-
tiple sources using various types of sensors. This sensor layer is usually defined by
an open-ended (unrestricted) collection of sensors embedded within computer sys-
tems, i.e., user interfaces, attached to humans or objects, or spread in the envi-
ronment. The data supplied by sensors in a particular context-aware application can
be very different, ranging from slow sensors to fast and complex sensors (e.g.,
MEMS, multi-sensors) that provide larger volumes of data, like those used for
detecting human activities or emotional states. It is also expected that the update
rate can vary largely from one sensor to another, depending on the nature of the context.
Some generic context-aware applications may deal with a large amount of context
information types beyond location, co-location, time, and identity, to include
emotional states, cognitive states, and activities; the current temporal and spatial
location; physical conditions; and preferences details. Moreover, manually entered
information also constitutes part of context information. context-aware applications
involve both implicit and explicit inputs; on the one hand, context data are acquired
from invisibly embedded sensors (or software equivalents), and, on the other hand,
via keyboard, touch screen, pointing device, or manual gestures. Context-aware
services execute service logic, based on information provided explicitly by end
users and implicitly by sensed context information (Dey 2001; Brown et al. 1997;
Schmidt 2005). Based on context data and explicit user input, the application logic
defines which new data can be inferred as new context data at inference level and
then which action(s) should be performed at application level. But before this,
sensor data should first be aggregated, fused, and transformed into features.
Layer 2—Context data processing and computation: this layer is dedicated to
aggregating, fusing, organizing, and propagating context data for further computation.
Fig. 4.9 Basic multilayered architecture underlying context information processing

On this layer, signal processing, data processing, and pattern recognition methods
are used to recognize context either from sensor signals and labeled annotations
(classification) or from the data stream of multiple sensors—groups of similar
examples (clustering). Note that architectures for context-aware applications
usually do not prescribe specific methods for feature extraction. Referred to as
‘cooking the sensors’ (Golding and Lesh 1999), abstraction from sensors to cues
provides the advantage of reducing the data volume independently of any specific
application (Gellersen et al. 2001). The bottom part of this layer provides a uniform
interface defined as a set of cues describing the sensed user's context. In this sense,
‘the cue layer strictly separates the sensor layer and context layer which means
context can be modeled in abstraction from sensor technologies and properties of
specific sensors. Separation of sensors and cues also means that both sensors and
feature extraction methods can be developed and replaced independently of each
other’ (Ibid, p. 8). Accordingly, in relation to context recognition algorithms that
are based on the ontological approach, the initial two steps entail acquiring sensor
readings and mapping them to matching properties described in context ontologies,
and using the latter to aggregate and fuse sensor observations to generate a context.
However, at this level of context processing, context data as abstraction from raw
sensor data still have no meaning for users or applications, as they need to be
interpreted. Data are objective observations, which are unorganized and unpro-
cessed, and do not convey any specific meaning (Pearlson and Saunders 2004).
Data need to be organized in the form of information so that they have meaning to
the recipient. ‘Information is data that have been shaped into a form that is
meaningful and useful to human beings’ (Laudon and Laudon 2006, p. 13). This is
what the top part of this layer, context layer, deals with. The context layer intro-
duces a set of contexts which are abstractions of real-world situations, each as
function of available cues. It is only at this level of abstraction, after feature
extraction and data reduction in the cue layer, that information from different
sensors is combined for calculation of context. While cues are assumed to be
generic, context is considered to be more closely related to … [the system] and the
specific situations in which it is used. [M]ethods [that can be used] for calculating
context from cues [include:] rule-based algorithms, statistical methods and neural
networks… Conceptually, context is calculated from all available cues… The
mapping from cues to context may be explicit, for instance when certain cues are
known to be relevant indicators of a specific context, or implicit in the result of a
supervised or unsupervised learning technique.’ (Gellersen et al. 2001, p. 8).
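To make the flow from the physical sensor layer to the context layer concrete, the following hypothetical sketch reduces raw sensor readings to cues (features) and then calculates a context label from the available cues with a simple explicit rule; statistical or neural methods could equally be substituted at the last step, and all values and thresholds are invented for illustration.

import statistics

raw_accelerometer = [0.02, 0.05, 0.81, 0.76, 0.79, 0.03]   # physical sensor layer
raw_microphone = [0.10, 0.12, 0.65, 0.70, 0.68, 0.11]

def cues(samples):                                          # cue layer: data reduction
    return {"mean": statistics.mean(samples), "stdev": statistics.pstdev(samples)}

acc, mic = cues(raw_accelerometer), cues(raw_microphone)

# Context layer: explicit mapping from cues to an abstraction of the situation
context = "active" if acc["stdev"] > 0.2 and mic["mean"] > 0.3 else "idle"
print(acc, mic, context)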
Layer 3—Context representation and reasoning. This layer involves the appli-
cation of a multitude of representation and reasoning techniques, either in a separate
or combined manner (e.g., hybrid approach), with consideration of uncertainty of
context information and reasoning—when appropriate and feasible. The highly
dynamic outputs of the bottom part of layer 2 place hard demands on this layer,
which depends on the diversity and the multiplicity of the sensors involved in a
particular application. At this layer, in the case of an ontological approach, for
instance, semantic information is collected on the semantic level and used to enrich
the context data and thus produce context information that is meaningful and
understandable to the application. And on the inference level, information is used
from the semantic level, history information, and inference rules, to make predic-
tions or estimates about what the user context is and thus what kind of relevant
services need to be delivered. Some instances that bind properties with sensor
readings are indicators of a specific context, or a set of atomic context concepts are
transformed into a higher level abstraction of context through the inference process.
To iterate, it is feasible to integrate various approaches to context modeling and
reasoning.
Layer 4—Applications and service delivery. This layer is concerned with firing
the context-dependent actions. Decisions are made about what actions are to be
performed, i.e., delivery of adaptive or responsive services, triggered by situations
defined at the inference level. The type of service to be delivered is typically
dependent on the nature of the context and thus application domain. For example,
the actions taken at the application level can be oriented towards ambient services
that support the user’s cognitive and emotional needs. Used to describe queries and
subscriptions, context query languages (CQLs) (see Reichle et al. 2008; Haghighi
et al. 2006 for detailed reviews) are broadly used by context-aware applications to
access context information from context providers. Again, architectures for
context-aware applications do not prescribe specific languages for querying context
from service providers, and thus different query languages can be used; however,
the selection of the query language is based on the context representation tech-
niques used in layer 3 (ontological versus logical representation). Specifically, as
explained by Perttunen et al. (2009, p. 2), ‘The meaning of the queries must be
well-specified because in the implementation the queries are mapped to the rep-
resentations used in the middle layer. An important role of the middle layer and the
query language is to eliminate direct linking of the context providing components to
context consuming components… Thus, the query language should support que-
rying a context value regardless of its source. However, knowing the source of a
value may be useful for the client in the case of finding the cause of an erroneous
inference, for example, and can thus be included in the query response. It should be
noted that since the CQL acts as a facade for the applications to the underlying
context representation, the context information requirements of the applications are
imposed as much on the query language as on the context representation and
context sources… Procedural programing is typically used…to create queries and
to handle query responses, adapting the application according to context. In con-
trast, the context representation…can be purely declarative.’

References

Adjouadi M, Sesin A, Ayala M, Cabrerizo M (2004) Remote eye gaze tracking system as a
computer interface for persons with severe motor disability. In: Proceedings of the 9th
international conference on computers helping people with special needs, Paris, pp 761–766
Albrecht DW, Zukerman I (1998) Bayesian models for keyhole plan recognition in an adventure
game. User Model User Adap Interaction 8:5–47
Ashbrook D, Starner T (2002) Learning significant locations and predicting user movement with
GPS. In: The 6th international symposium on wearable computer. IEEE Computer Society, Los
Alamitos, CA, pp 101–108
Balomenos T, Raouzaiou A, Ioannou S, Drosopoulos A, Karpouzis K, Kollias S (2004) Emotion
analysis in man–machine interaction systems. In: Bengio S, Bourlard H (eds) Machine learning
for multimodal interaction, vol 3361. Lecture Notes in Computer Science, Springer, pp 318–328
Bao L, Intille S (2004) Activity recognition from user annotated acceleration data. In: Proceedings
of pervasive, LNCS3001, pp 1–17
Ballard DH, Brown CM (1982) Computer vision. Prentice Hall, New Jersey
Barghout L, Sheynin J (2013) Real-world scene perception and perceptual organization: lessons
from computer vision. J Vision 13(9):709–709
Baron-Cohen S (1995) Mindblindness. MIT Press, Cambridge
Barwise J, Perry J (1981) Situations and attitudes. J Philos 78(11):668–691
Beigl M, Gellersen HW, Schmidt A (2001) Mediacups: experience with design and use of
computer-augmented everyday objects. Comput Netw 35(4):401–409
Bettini C, Brdiczka O, Henricksen K, Indulska J, Nicklas D, Ranganathan A, Riboni D (2010) A
survey of context modelling and reasoning techniques. J Pervasive Mobile Comput Spec Issue
Context Model Reasoning Manag 6(2):161–180
Bishop CM (2006) Pattern recognition and machine learning, Springer
Bodor R, Jackson B, Papanikolopoulos N (2003) Vision based human tracking and activity
recognition. In: Proceedings of the 11th mediterranean conference on control and automation,
Rhodes, Greece
Boger J, Poupart P, Hoey J, Boutilier C, Mihailidis A (2005) A decision-theoretic approach to task
assistance for persons with dementia. In: Proceedings of the international joint conference on
artificial intelligence, IJCAI’05, pp 1293–1299
Bosse T, Castelfranchi C, Neerincx M, Sadri F, Treur J (2007) First international workshop on
human aspects in ambient intelligence. In: Workshop at the European conference on ambient
intelligence, Darmstadt, Germany
Bouchard B, Giroux S (2006) A smart home agent for plan recognition of cognitively-impaired
patients. J Comput 1(5):53–62
Brachman RJ, Levesque HJ (2004) Knowledge representation and reasoning. Morgan Kaufmann,
Amsterdam
Braisby NR, Gellatly ARH (2005) Cognitive psychology. Oxford University Press, New York
Brdiczka O, Crowley JL, Reignier P (2007) Learning situation models for providing context-aware
services. In: Proceedings of universal access in human–computer interaction, UAHCI 2007.
Lecture Notes in Computer Science, Springer, Berlin
Brown PJ, Bovey JD, Chen X (1997) Context-aware applications: from the laboratory to the
marketplace. IEEE Pers Commun 4(5):58–64
Caridakis G, Malatesta L, Kessous L, Amir N, Raouzaiou A, Karpouzis K (2006) Modeling
naturalistic affective states via facial and vocal expressions recognition. In: International
conference on multimodal interfaces (ICMI’06), Banff, Alberta, 2–4 Nov 2006
Chen L, Nugent C (2009) Ontology-based activity recognition in intelligent pervasive
environments. Int J Web Inf Syst 5(4):410–430
Clarkson B (2003) Life patterns: structure from wearable sensors. PhD thesis, Massachusetts
Institute of Technology
Community Research and Development Information Service (CORDIS) (2011) Project
Opportunity. http://cordis.europa.eu/home_en.html. Accessed 11 Dec 2011
Dellaert F, Polzin T, Waibel A (1996) Recognizing emotion in speech. In: Paper presented at the
4th international conference on spoken language processing (ICSLP)
Dennett DC (1987) The intentional stance. MIT Press, Cambridge
de Silva GC, Lyons MJ, Tetsutani N (2004) Vision based acquisition of mouth actions for
human–computer interaction. In: Proceedings of the 8th Pacific Rim International Conference
on Artificial Intelligence, Auckland, pp 959–960
Dey AK (2001) Understanding and using context. Personal Ubiquitous Comput 5(1):4–7
Dey AK, Manko J, Abowd G (2000) Distributed mediation of imperfectly sensed context in aware
environments. Technical representation, Georgia Institute of Technology
Dey AK, Abowd GD, Salber D (2001) A conceptual framework and a toolkit for supporting the
rapid prototyping of context-aware applications. Human Comput Interaction 16(2–4):97–166
Dey AK, Salber D, Abowd GD, Fetakwa M (1999) The conference assistant: combining
context-awareness with wearable computing. In: 3rd international symposium on wearable
computers. IEEE Computer Society, Los Alamitos, CA, pp 21–28
DeVaul R, Sung M, Gips J, Pentland A (2003) MIThril 2003: applications and architecture. In:
Proceedings of the 7th IEEE international symposium on wearable computers, White plains,
NY, pp 4–11
Ding Z, Peng Y (2004) A probabilistic extension to ontology language OWL. In: Proceedings of
37th annual Hawaii international conference on system sciences, pp 111–120
Fagin R, Halpern JY, Megiddo N (1990) A logic for reasoning about probabilities. Inf Comput 87
(1–2):78–128
Farringdon J, Moore AJ, Tilbury N, Church J, Biemond PD (1999) Wearable sensor badge and
sensor jacket for contextual awareness. In: 3rd international symposium on wearable
computers. IEEE Computer Society, Los Alamitos, CA, pp 107–113
Farringdon J, Oni V (2000) Visual augmented memory (VAM). In: Proceedings of the IEEE
international symposium on wearable computing (ISWC’00), Atlanta, GA, pp 167–168
Fiore L, Fehr D, Bodor R, Drenner A, Somasundaram G, Papanikolopoulos N (2008)
Multi-camera human activity monitoring. J Intell Rob Syst 52(1):5–43
Fishkin KP (2004) A taxonomy for and analysis of tangible interfaces. Personal Ubiquitous
Comput 8(5):347–358
Galotti KM (2004) Cognitive psychology in and out of the laboratory. Wadsworth
Gärdenfors P (2003) How homo became sapiens: on the evolution of thinking. Oxford University
Press, Oxford
Gaura E, Newman R (2006) Smart MEMS and sensor systems. Imperial College Press, London
Gellersen HW, Schmidt A, Beigl M (2001) Multi-sensor context-awareness in mobile devices and
smart artefacts. Department of Computing, Lancaster University, Lancaster, UK, Teco
University of Karlsruhe, Germany
Golding A, Lesh N (1999) Indoor navigation using a diverse set of cheap wearable sensors. In:
Proceedings of the IEEE international symposium on wearable computing (ISWC99), San
Francisco, CA, pp 29–36
Goldman AI (2006) Simulating minds: the philosophy, psychology and neuroscience of mind
reading. Oxford University Press, Oxford
Gray PD, Salber D (2001) Modelling and using sensed context information in the design of
interactive applications. In: Proceedings of the 8th IFIP international conference on engineering
for human–computer interaction (EHCI ’01), vol 2254. Springer, Toronto, pp 317–335
Gunes H, Piccardi M (2005) Automatic visual recognition of face and body action units. In:
Proceedings of the 3rd international conference on information technology and applications,
Sydney, pp 668–673
Göker A, Myrhaug HI (2002) User context and personalisation. In: ECCBR workshop on case
based reasoning and personalisation, Aberdeen
Haghighi PD, Zaslavsky A, Krishnaswamy S (2006) An evaluation of query languages for
context-aware computing. In: 17th international conference on database and expert systems
applications. IEEE, Krakow, pp 455–462
Henricksen K, Indulska J (2006) Developing context-aware pervasive computing applications:
models and approach. Pervasive Mobile Comput 2(1):37–64
Huynh DTG (2008) Human activity recognition with wearable sensors. PhD thesis, TU Darmstadt,
Darmstadt
Huynh T, Schiele B (2006) Unsupervised discovery of structure in activity data using multiple
eigenspaces. In: The 2nd international workshop on location- and context-awareness (LoCA),
vol 3987, LNCS
Huynh T, Blanke U, Schiele B (2007) Scalable recognition of daily activities with wearable
sensors. In: The 3rd international symposium on location- and context-awareness (LoCA), vol
4718, pp 50–67
Ikehara CS, Chin DN, Crosby ME (2003) A model for integrating an adaptive information filter
utilizing biosensor data to assess cognitive load. In: Brusilovsky P, Corbett AT, de Rosis F
(eds) UM 2003, vol 2702. LNCS, Springer, Heidelberg, pp 208–212
Ishikawa T, Horry Y, Hoshino T (2005) Touchless input device and gesture commands. In:
Proceedings of the international conference on consumer electronics, Las Vegas, NV, pp 205–206
Ivano Y, Bobick A (2000) Recognition of visual activities and interactions by stochastic parsing.
IEEE Trans Pattern Anal Mach Intell 22(8):852–872
Jain AK (2004) Multibiometric systems. Commun ACM 47(1):34–44
Jähne B, Haußecker H (2000) Computer vision and applications, a guide for students and
practitioners. Academic Press, Massachusetts
Jang S, Woo W (2003) Ubi-UCAM: a unified context-aware application model. In: Modeling and
using context, pp 1026–1027
José R, Rodrigues H, Otero N (2010) Ambient intelligence: beyond the inspiring vision. J Univ
Comput Sci 16(12):1480–1499
Kahn JM, Katz RH, Pister KSJ (1999) Next century challenges: mobile networking for “Smart
Dust”. Department of Electrical Engineering and Computer Sciences. University of California
Kaiser S, Wehrle T (2001) Facial expressions as indicators of appraisal processes. In: Scherer KR,
Schorr A, Johnstone T (eds) Appraisal processes in emotions: theory, methods, research.
Oxford University Press, New York, pp 285–300
Kanade T, Cohn JF, Tian Y (2000) Comprehensive database for facial expression analysis. In:
International conference on automatic face and gesture recognition, France, pp 46–53
Kautz H (1991) A formal theory of plan recognition and its implementation. In: Allen J, Pelavin R,
Tenenberg J (eds) Reasoning about plans. Morgan Kaufmann, San Mateo, CA, pp 69–125
Kern K, Schiele B, Junker H, Lukowicz P, Troster G (2002) Wearable sensing to annotate meeting
recordings. In: The 6th international symposium on wearable computer. The University of
Washington, Seattle, pp 186–193
Kim S, Suh E, Yoo K (2007) A study of context inference for web-based information systems.
Electron Commer Res Appl 6:146–158
Kirsh D (2001) The context of work. Human Comput Interaction 16:305–322
Klette R (2014) Concise computer vision, Springer, Berlin
Korpipaa P, Mantyjarvi J, Kela J, Keranen H, Malm E (2003) Managing context information in
mobile devices. IEEE Pervasive Comput 2(3):42–51
Kwon OB, Choi SC, Park GR (2005) NAMA: a context-aware multi-agent based web service
approach to proactive need identification for personalized reminder systems. Expert Syst Appl
29:17–32
Laudon KC, Laudon JP (2006) Management information systems: managing the digital firm.
Pearson Prentice Hall, Upper Saddle River, NJ
Lee SW, Mase K (2002) Activity and location recognition using wearable sensors. IEEE Pervasive
Comput 1(3):24–32
Lee CM, Narayanan S, Pieraccini R (2001) Recognition of negative emotion in the human speech
signals. In: Workshop on automatic speech recognition and understanding
Lei H, Sow DM, John I, Davis S, Banavar G, Ebling MR (2002) The design and applications of a
context service. SIGMOBILE Mobile Comput Commun Rev 6(4):45–55
Liao L, Fox D, Kautz H (2007) Extracting places and activities from GPS traces using hierarchical
conditional random fields. Int J Rob Res 26(1):119–134
Ling B (2003) Physical activity recognition from acceleration data under semi-naturalistic
conditions. Masters thesis, Massachusetts Institute of Technology (MIT), MA
Luger G, Stubblefield W (2004) Artificial intelligence: structures and strategies for complex
problem solving. The Benjamin/Cummings Publishing Company, Inc
Lyshevski SE (2001) Nano- and microelectromechanical systems: fundamentals of nano- and
microengineering. CRC Press, Boca Raton, USA
Mayrhofer R (2004) An architecture for context prediction. In: Ferscha A, Hörtner H, Kotsis G
(eds) Advances in pervasive computing, vol 176, Austrian Computer Society (OCG)
Michel P, El Kaliouby R (2003) Real time facial expression recognition in video using support
vector machines. In: The 5th international conference on multimodal interfaces, Vancouver,
pp 258–264
Mitchell T (1997) Machine learning. McGraw Hill, London
MIT Media Lab (2014) Affective computing: highlighted projects. http://affect.media.mit.edu/
projects.php. Accessed 12 Oct 2013
Mäntyjärvi J, Seppänen T (2002) Adapting applications in mobile terminals using fuzzy context
information. In: Human computer interaction with mobile devices, pp 383–404
Morris T (2004) Computer vision and image processing. Palgrave Macmillan, London
Nilsson NJ (1986) Probabilistic logic. Artif Intell 28(1):71–87
Nilsson N (1998) Artificial intelligence: a new synthesis. Morgan Kaufmann Publishers,
Massachusetts
Oviatt S, Darrell T, Flickner M (2004) Multimodal interfaces that flex, adapt, and persist.
Commun ACM 47(1):30–33
Padovitz A, Loke SW, Zaslavsky A (2008) The ECORA framework: a hybrid architecture for
context-oriented pervasive computing. Pervasive Mobile Comput 4(2):182–215
Pantic M, Rothkrantz LJM (2003) Automatic analysis of facial expressions: the state of the art.
IEEE Trans Pattern Anal Mach Intell 22(12):1424–1445
Park S, Locher I, Savvides A, Srivastava MB, Chen A, Muntz R, Yuen S (2002) Design of a
wearable sensor badge for smart kindergarten. In: The sixth international symposium on
wearable computers. IEEE Computer Society, Los Alamitos, CA, pp 231–238
Parkka J, Ermes M, Korpipaa P, Mantyjarvi J, Peltola J, Korhonen I (2006) Activity classification
using realistic data from wearable sensors. IEEE Trans Inf Technol Biomed 10(1):119–128
Passer MW, Smith RE (2006) The science of mind and behavior. McGraw Hill, Boston, MA
Patterson DJ, Fox D, Kautz H, Philipose M (2005) Fine-grained activity recognition by
aggregating abstract object usage. In: Proceedings of the IEEE international symposium on
wearable computers, pp 44–51
Pearlson KE, Saunders CS (2004) Managing and using information systems: a strategic approach.
Wiley, New York
Perttunen M, Riekki J, Lassila O (2009) Context representation and reasoning in pervasive
computing: a review. Int J Multimedia Eng 4(4)
Philipose M, Fishkin KP, Perkowitz M, Patterson DJ, Hahnel D, Fox D, Kautz H (2004) Inferring
activities from interactions with objects. IEEE Pervasive Comput Mobile Ubiquitous Syst 3
(4):50–57
Poole D, Mackworth A, Goebel R (1998) Computational intelligence: a logical approach. Oxford
University Press, New York
Poslad S (2009) Ubiquitous computing smart devices, smart environments and smart interaction.
Wiley, New York
Quinlan R (1993) C4.5: Programs for machine learning. Morgan Kaufmann, San Mateo
Randell C, Muller H (2000) The shopping jacket: wearable computing for the consumer. Personal
Technol 4:241–244
Ranganathan A (2005) A task execution framework for autonomic ubiquitous computing. PhD
dissertation, University of Illinois at Urbana-Champaign, Urbana, Illinois
Ranganathan A, Campbell RH (2003) An infrastructure for context-awareness based on first order
logic. Personal Ubiquitous Comput 7(6):353–364
Ranganathan A, Campbell RH (2004) Autonomic pervasive computing based on planning. In:
Proceedings of international conference on autonomic computing, New York, pp 80–87, 17–18
May 2004
Ranganathan A, Al-Muhtadi J, Campbell RH (2004a) Reasoning about uncertain contexts in
pervasive computing environments. IEEE Pervasive Comput 3(2):62–70
Ranganathan A, Al-Muhtadi J, Chetan S, Campbell R, Mickunas MD (2004b) Middlewhere: a
middleware for location awareness in ubiquitous computing applications. In: Proceedings of the
5th ACM/IFIP/USENIX international conference on middleware. Springer, Berlin, pp 397–416
Rapaport WJ (1996) Understanding: semantics, computation, and cognition, pre-printed as
technical report 96–26. SUNY Buffalo Department of Computer Science, Buffalo
Reichle R, Wagner M, Khan MU, Geihs K, Valla M, Fra C, Paspallis N, Papadopoulos GA (2008)
A Context query language for pervasive computing environments. In: 6th Annual IEEE
international conference on pervasive computing and communications, pp 434–440
Reilly RB (1998) Applications of face and gesture recognition for human–computer interaction. In:
Proceedings of the 6th ACM international conference on multimedia, Bristol, pp 20–27
Reiter R (2001) Knowledge in action: logical foundations for specifying and implementing
dynamical systems. MIT Press, Cambridge
Rhodes B (1997) The wearable remembrance agent: a system for augmented memory. In: The 1st
international symposium on wearable computers. IEEE Computer Society, Los Alamitos, CA,
pp 123–128
Riva G, Loreti P, Lunghi M, Vatalaro F, Davide F (2003) Presence 2010: the emergence of
ambient intelligence. In: Riva G, Davide F, IJsselsteijn WA (eds) Being there: concepts, effects
and measurement of user presence in synthetic environments. Ios Press, Amsterdam, pp 60–81
Riva G, Vatalaro F, Davide F, Alcañiz M (2005) Ambient intelligence: the evolution of
technology, communication and cognition towards the future of human–computer interaction.
IOS Press, Amsterdam
Russell S, Norvig P (2003) Artificial intelligence—a modern approach. Pearson Education, Upper
Saddle River, NJ
Sagonas K, Swift T, Warren DS (1994) XSB as an efficient deductive database engine. In:
Proceedings of the ACM SIGMOD international conference on management of data.
Minneapolis, Minnesota, New York, pp 442–453
Saffo P (1997) Sensors: the next wave of infotech innovation, 1997 ten-year forecast. Institute for
the Future. http://www.saffo.com/essays/sensors.php. Accessed 25 March 2008
Salvucci DD, Anderson JR (2001) Automated eye movement protocol analysis. Human Comput
Interaction 16(1):38–49
Sanders DA (2008) Environmental sensors and networks of sensors. Sensor Rev 28(4):273–274
Sanders DA (2009a) Introducing AI into MEMS can lead us to brain-computer interfaces and
super-human intelligence. Assembly Autom 29(4)
Sanders DA (2009b) Ambient intelligence and energy efficiency in rapid prototyping and
manufacturing. Assembly Autom 29(3):205–208
Scherer KR (1992) What does facial expression express? In: Strongman K (ed) International
review of studies on emotion, vol 2, pp 139–165
Scherer KR (1994) Plato’s legacy: relationships between cognition, emotion, and motivation.
University of Geneva
Schweiger R, Bayerl P, Neumann H (2004) Neural architecture for temporal emotion
classification. In Andre E, Dybkjær L, Minker W, Heisterkamp P (eds) ADS 2004, vol
3068. LNCS (LNAI), Springer, Heidelberg, pp 49–52
Schmidt A (2002) Ubiquitous computing—computing in context. PhD dissertation, Lancaster
University
Schmidt A (2005) Interactive context-aware systems interacting with ambient intelligence. In:
Riva G, Vatalaro F, Davide F, Alcañiz M (eds) Ambient intelligence: the evolution of
technology, communication and cognition towards the future of human–computer interaction.
IOS Press, Amsterdam, pp 159–178
Schmidt A (2006) Ontology-based user context management, the challenges of imperfection and
time-dependence. In: On the move to meaningful internet systems: CoopIS, DOA, GADA, and
ODBASE, vol 4275. Lecture Notes in Computer Science, pp 995–1011
Schmidt A, Beigl M, Gellersen HW (1999) There is more to context than location. Comput
Graph UK 23(6):893–901
Sebe N, Lew MS, Cohen I, Garg A, Huang TS (2002) Emotion recognition using a cauchy naive
Bayes classifier. In: Proceedings of the 16th international conference on pattern recognition,
vol 1. IEEE Computer Society, Washington, DC, pp 17–20
Shapiro LG, Stockman GC (2001) Computer vision. Prentice Hall, New Jersey
Sheldon EM (2001) Virtual agent interactions, PhD thesis, Major Professor-Linda Malone
Sibert LE, Jacob RJK (2000) Evaluation of eye gaze interaction. In: Proceedings of the ACM
conference on human factors in computing systems, The Hague, pp 281–288
Soldatos J, Pandis I, Stamatis K, Polymenakos L, Crowley JL (2007) Agent based middleware
infrastructure for autonomous context-aware ubiquitous computing services. Comput Commun
30(3):577–591
Straccia U (2005) Towards a fuzzy description logic for the semantic web (preliminary report). In:
Proceedings of the second European semantic web conference, ESWC 2005, vol 3532. Lecture
Notes in Computer Science, Springer, Berlin
Sung M, Marci C, Pentland A (2005) Wearable feedback systems for rehabilitation. J NeuroEng
Rehabil 2(17):1–12
Tapia EM, Intille S (2007) Real-time recognition of physical activities and their intensities using
wireless accelerometers and a heart rate monitor. In: Paper presented at international
symposium on wearable computers (ISWC)
Teixeira J, Vinhas V, Oliveira E, Reis L (2008) A new approach to emotion assessment based on
biometric data. In: Proceedings of WI–IAT ‘08, pp 459–500
Tobii Technology AB (2006) Tobii 1750 eye tracker, Sweden. www.tobii.com. Accessed 21 Nov
2012
Truong BA, Lee Y, Lee S (2005) Modeling uncertainty in context aware computing. In:
Proceedings of the 4th annual ACIS international conference on computer and information
science, pp 676–681
van Dorp P, Groen FCA (2003) Human walking estimation with radar. IEEE Proc Radar Sonar
Navig 150(5):356–365
Van Laerhoven K, Gellersen HW (2001) Multi sensor context awareness. Abstract, Department of
Computing, Lancaster University, Lancaster
Van Laerhoven K, Schmidt A, Gellersen H (2002) Multi-sensor context aware clothing. In: The
6th international symposium on wearable computer. IEEE Computer Society, Los Alamitos,
CA, pp 49–56
Vardy A, Robinson JA, Cheng LT (1999) The WristCam as input device. In: Proceedings of the
3rd international symposium on wearable computers, San Francisco, CA, pp 199–202
Vick RM, Ikehara CS (2003) Methodological issues of real time data acquisition from multiple
sources of physiological data. In: Proceedings of the 36th annual Hawaii international
conference on system sciences. IEEE Computer Society, Washington, DC, pp 1–156
Waldner JB (2008) Nanocomputers and swarm intelligence. ISTE, London
Wang S, Pentney W, Popescu AM, Choudhury T, Philipose M (2007) Common sense based joint
training of human activity recognizers. In: Proceedings of the international joint conference on
artificial intelligence, Hyderabad, India, pp 2237–2242
Ward JA, Lukowicz TP, Starner TG (2006) Activity recognition of assembly tasks using body-worn
microphones and accelerometers. IEEE Trans Pattern Anal Mach Intell 28(10):1553–1567
Weiser M (1991) The computer for the 21st Century. Sci Am 265(3):94–104
Wimmer M, Mayer C, Radig B (2009) Recognizing facial expressions using model-based image
interpretation. In: Esposito A, Hussain A, Marinaro M, Martone R (eds) Multimodal signals:
cognitive and algorithmic issues. Springer, Berlin, pp 328–339
Wobke W (2002) Two logical theories of plan recognition. J Logic Comput 12(3):371–412
Wright D (2005) The dark side of ambient intelligence. Foresight 7(6):33–51
Zadeh LA (1999) Fuzzy sets as a basis for a theory of possibility. Fuzzy Sets Syst 100:9–34
Zhou J, Yu C, Riekki J, Kärkkäinen E (2007) AmE framework: a model for emotion-aware
ambient intelligence. University of Oulu, Department of Electrical and Information Engineering,
Faculty of Humanities, Department of English; VTT Technical Research Centre of Finland
Chapter 5
Context Modeling, Representation,
and Reasoning: An Ontological
and Hybrid Approach

5.1 Introduction

Investigating context recognition in terms of approaches to context information
modeling and reasoning techniques for context information constitutes a large part
of a growing body of research on context awareness technology and its use in the
development of AmI applications that are adaptable, responsive, and capable of
acting autonomously on behalf of users. The benefits of formal context information
modeling are well understood. Indeed, the AmI community increasingly realizes
that to provide relevant and efficient adaptive and responsive services to users, it is
necessary to support the development of context-aware applications by adequate
context information modeling methods and reasoning techniques, especially as context
data collected from a variety of sources are often limited, imperfect, or failure
prone. The challenge of incorporating context awareness functionality in AmI
services lies in the complexity associated with, in addition to sensing, capturing,
representing, processing, and managing context information. Existing approaches to
context information modeling ‘differ in the ease with which real-world concepts can
be captured by software engineers, in the expressive power of the context infor-
mation models, in the support they can provide for reasoning about context
information, in the computational performance of the reasoning, and in the scala-
bility of the context information management’ (Bettini et al. 2010, p. 2). Context-
aware systems involves a large amount of dynamic context information—especially
in large-scale distributed systems—that needs to be constantly retrieved; effectively
interpreted; rapidly processed; securely disseminated to the interested context
consumers; and maintained, distributed, and synchronized in various context
repositories across a horde of administrative domains. Therefore, streamlined, solid
context management mechanisms need to be adopted, and a heterogeneous,
dynamic, scalable, and interoperable context representation scheme needs to be
established.


A number of context modeling and reasoning approaches have recently been
developed ranging from simple models—that mainly focus on addressing the needs
of one application class in a simple and straightforward manner—to the current
state-of-the-art context models—that use modeling concepts not tied to specific
application domains and involve novel technical features in the manner they rep-
resent and reason about context information of diverse types, including emotional
states, cognitive states, communicative intents, social states, and activities. Indeed,
a growing body of research on the use of context awareness technology for
developing AmI applications investigates approaches to integrating various context
modeling and reasoning techniques at different levels, with the purpose of har-
nessing formal knowledge from the human-directed sciences (models represented in
a computational format) in ways that enable context-aware applications or envi-
ronments to perform more in-depth analyses of human contexts and behaviors, and
to come up with better informed actions.
Researchers in the field of context-aware computing increasingly understand that
well-designed context models can be achieved by adopting ontology, an approach
that is assumed to evolve towards seamlessly integrating different representation and
reasoning techniques, by creating novel formats and tools to reconcile disparate
reasoning methods and interrelating different conceptual foundations of different
representation formalisms. Using ontologies for context modeling is a recent chal-
lenging endeavor and has gained growing interest among researchers in the field of
context-aware computing. Research shows that the majority of the recent work on
context awareness applies ontology-based approaches, for they seem to provide
benefits for the development of context-aware applications. Ontological context
modeling provides many advantages associated with semantic expressiveness as to
context formalism and with efficiency and automation as to reasoning about context
information using descriptive logic. With their semantic expressive formalism,
ontologies allow for integrating heterogeneous applications, facilitating interopera-
bility and integration across multiple models and applications, enabling reusability
and portability of models between various application domains and systems,
merging various reasoning techniques, and providing interfaces for communicating
with knowledge-based software agents. Therefore, ontologies are regarded as more
suited to context modeling than the existing modeling approaches that are based on
probabilistic or logical methods, which inherently suffer from a number of limita-
tions. Ontology-based approaches continue to win the battle over existing approa-
ches, and are expected to attain a strong foothold in the realm of context-aware
computing. With their features, they are increasingly becoming popular for
context-aware applications to provide automatic ambient support—for cognitive,
emotional, and social needs. However, ontologies are associated with some short-
comings, particularly with regard to capturing, representing, and processing con-
stantly changing information in a scalable manner as well as dealing with uncertainty
and fuzziness in context information as to both representation and reasoning. It is in
the complexity of capturing, representing, and processing context information where
the challenge lies with regard to the incorporation of context awareness functionality
in the AmI service provision chain (Pascoe and Morse 1999).
Given that a discussion of context recognition based on probabilistic methods and
the logical approach (close in nature to the ontological approach) was covered in the
previous chapter, the focus in this chapter is on ontological and hybrid approaches
to the formal representation of and reasoning on context information. This chapter
aims to review and show the state-of-the-art in the area of ontological and hybrid
context modeling, representation, and reasoning in AmI. In addition to focusing on
works on context information representation and reasoning that fall into the onto-
logical category, other relevant representation and reasoning techniques from the
literature on context-aware computing are included for comparative purposes.
Context is primarily considered from the viewpoint of adaptation in HCI, and
ontology is discussed in the applied context of software engineering.

5.2 Evolution of Context Modeling and Reasoning

Over the last decade, a number of context modeling and reasoning approaches have
been developed, ranging from simple early models to the current state-of-the-art
models. These models have been utilized to develop a large number of
context-aware applications for or within various application domains. With the
experiences with the development of the variety of context-aware applications,
context information models have evolved from static, unexpressive, inflexible
representations to more dynamic, semantic (high expressive power), and extensible
ones, providing support for reasoning about context with enhanced computational
performance. Key-value models are one of the early models in context-aware
applications. They use simple key-value pairs to define the list of attributes and their
values as an approach to describing context information. At the onset, attribute-value
models were quite often used, e.g., in the Context Toolkit for building context-aware
applications (Dey 2000). Markup-based modeling is another approach to context
information models; it uses the Extensible Markup Language (XML) among a variety of markup
languages. Composite Capabilities/Preference Profile (CC/PP) (Klyne et al. 2004) is
a context modeling approach involving both key-value pair and markup-based
approaches to context information models. CC/PP approach is perhaps the first
context modeling approach to adopt Resource Description Framework (RDF) and
to include elementary constraints and relationships between context types (Bettini
et al. 2010). It ‘can be considered a representative both of the class of key-value
models and of markup models, since it is based on RDF syntax to store key-value
pairs under appropriate tags. Simple kinds of reasoning over the elementary con-
straints and relationships of CC/PP can be performed with special purpose rea-
soners.’ (Ibid, p. 3). The above approaches to context information models have
many shortcomings and cannot respond to the growing complexity of the context
information used by context-aware applications. Indeed, they are criticized for their
limited capabilities in capturing a variety of context types, relationships, depen-
dencies, timeliness, and quality of context information; allowing consistency
checking; and supporting reasoning on (or inference of) higher level context
abstractions and context uncertainty (Ibid). This is according to the evaluation in the
literature surveys carried out by Indulska et al. (2003), Strang and Linnhoff-Popien
(2004), and Lum and Lau (2002). Recent research on context-aware modeling and
reasoning has attempted to address many of these limitations, giving rise to a new
class of context information models characterized by more expressive context
modeling tools. A common feature of recent models is that they have the capa-
bilities to define concepts and their interrelationships and the constraints on their
application. Examples of these models include, and are not limited to: 4-ary
predicates (Rom et al. 2002), object-oriented (Hofer et al. 2003), and fact-based
(Bettini et al. 2010). These models, however, differ in terms of expressiveness and
reasoning efficiency and offer many distinctive features in terms of reducing
the complexity of context-aware applications development. For example,
‘fact-based context modeling approach…originated from attempts to create suffi-
ciently formal models of context to support query processing and reasoning, as well
as to provide modeling constructs suitable for use in software engineering tasks
such as analysis and design.’ (Ibid). Fact-based models use Context Modeling
Language (CML) (see, e.g., Henricksen et al. 2004). CML is based on Object-Role
Modeling (ORM) but extends it with modeling constructs for ‘capturing the dif-
ferent classes and sources of context facts…: specifically, static, sensed, derived,
and user-supplied…information; capturing imperfect information using quality
metadata and the concept of ‘‘alternatives’’ for capturing conflicting assertions
(such as conflicting location reports from multiple sensors); capturing dependencies
between context fact types; and capturing histories for certain fact types and con-
straints on those histories.’ (Bettini et al. 2010).
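To make the contrast between the early key-value models and the richer fact-based models such as CML more concrete, the following minimal Python sketch may help; it is illustrative only, and the ContextFact structure, field names, and confidence values are assumptions made for the example rather than constructs taken from CML or the Context Toolkit.

from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional

# Early key-value style: flat attribute-value pairs with no metadata,
# no relationships, and no notion of quality or provenance.
key_value_context = {"user": "alice", "location": "kitchen", "activity": "cooking"}

# A fact-based style (loosely inspired by CML): each context fact carries its
# class (static, sensed, derived, or user-supplied), a timestamp, and quality
# metadata, so that conflicting assertions can be kept as alternatives.
@dataclass
class ContextFact:
    subject: str
    attribute: str
    value: str
    fact_class: str                        # 'static' | 'sensed' | 'derived' | 'user-supplied'
    timestamp: datetime = field(default_factory=datetime.utcnow)
    confidence: Optional[float] = None     # quality metadata

facts = [
    ContextFact("alice", "location", "kitchen", "sensed", confidence=0.8),
    ContextFact("alice", "location", "living_room", "sensed", confidence=0.4),  # conflicting alternative
    ContextFact("alice", "activity", "cooking", "derived", confidence=0.7),
]

# A consumer can reason over the alternatives instead of silently overwriting them.
best_location = max((f for f in facts if f.attribute == "location"), key=lambda f: f.confidence)
print(best_location.value)  # -> 'kitchen'
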
Situated as one of the latest waves of context modeling approaches, ontologies
have evolved as the types of context information used by context-aware applica-
tions grew more sophisticated. Ontological approaches to context information
modeling can be considered as a natural extension of CC/PP and RDF based
approaches ‘to satisfy the requirements of heterogeneity, relationship, and reason-
ing’ (Bettini et al. 2010, p. 4). Ontological context models are characterized by high
expressiveness and apply ontology-based reasoning on context using semantic
description logic. Hence, ontologies are considered very suitable for context
models. The expressive power, especially, is a factor that significantly influences
reasoning processes—fuelling sound context reasoning mechanisms. Indeed, the use of
the Web Ontology Language (OWL) as a representation scheme better supports
automated reasoning. There exist numerous representations to define context
ontologies, to specify context types and their descriptors and relationships,
including OWL, the W3C’s semantic web activities, and the Resource Description
Framework (RDF); these logic-based languages probably gave a boost to ontologies
(Criel and Claeys 2008). For examples of ontology-based context models see Gu
et al. (2005), Chen et al. (2004b), and Korpipää et al. (2005). Furthermore, ontology
researchers have recently started to explore the possibility of integrating different
models (e.g., representation sublanguage) and different types of reasoning mecha-
nisms in order to obtain more flexible, robust, and comprehensive systems. This
hybrid approach to context modeling is bringing, as research shows, many benefits.
Hence, it is increasingly gaining a significant place in the field of context-aware
computing, as an attempt to deal with the complexity and uncertainty of context
information that is to be handled by context-aware applications.
The research on context models has also been active as to the development of
context management systems to gather, manage, evaluate, secure, and disseminate
context information, in particular in relation to large-scale distributed systems.

5.3 Requirements for Context Representation and Reasoning

There is a large body of work on surveying, comparing, and combining repre-
sentation and reasoning requirements with the aim to create unified approaches
well as stimulate new research directions towards enhancing context information
models. This is an attempt to foster the development and deployment of
context-aware applications. In a recent survey of context representation and
reasoning in AmI (or UbiComp), Perttunen et al. (2009) provide a synthesis of a set
of requirements for context representation and reasoning based on two sources:
Strang and Linnhoff-Popien (2004) and Korpipää (2005). Given the overlap among
the requirements and features from these two sources, the authors attempted to
merge them to come up with a representative selection of requirements and features.
This selection is presented below based on the authors’ view of their relevancy and
is supported by analytical insights from their discussion and from other authors.

5.4 Representation

5.4.1 Unique Identifiers

At some scale, such identifiers are necessary for the context-aware system to be able
to identify the various entities of contexts in a unique way in the real-world domains
that the system deals with. This uniqueness allows for reusing the representations
without conflicts in identifiers. All work applying OWL naturally supports
expressing unique identifiers, while other work does not deal with unique identifiers.
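As a rough illustration of how OWL- and RDF-based representations express unique identifiers, the following Python sketch uses the rdflib library; the namespace, class names, and properties are invented for the example and are not drawn from any particular context ontology.

from rdflib import Graph, Namespace, RDF, Literal

# A hypothetical context namespace; every entity receives a globally unique URI,
# so representations can be merged and reused without identifier conflicts.
CTX = Namespace("http://example.org/context#")

g = Graph()
g.add((CTX.alice, RDF.type, CTX.User))
g.add((CTX.kitchen, RDF.type, CTX.Location))
g.add((CTX.alice, CTX.locatedIn, CTX.kitchen))
g.add((CTX.alice, CTX.hasActivity, Literal("cooking")))

print(g.serialize(format="turtle"))
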

5.4.2 Validation

This allows software components to ensure that data is consistent with its repre-
sentation schema before performing any reasoning on, or processing with, it.
According to Strang and Linnhoff-Popien (2004), a context representation should
allow validating data against it.
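A minimal sketch of the idea follows, using plain Python rather than a standard schema language such as XML Schema or SHACL; the schema, field names, and values are assumptions made purely for illustration.

# A toy 'schema' for a location context entry: required fields and their types.
LOCATION_SCHEMA = {"user": str, "room": str, "confidence": float}

def validate(entry: dict, schema: dict) -> bool:
    # Check that the data is consistent with its representation schema
    # before any reasoning on, or processing with, it is performed.
    return all(key in entry and isinstance(entry[key], expected)
               for key, expected in schema.items())

entry = {"user": "alice", "room": "kitchen", "confidence": 0.8}
print(validate(entry, LOCATION_SCHEMA))              # -> True
print(validate({"user": "alice"}, LOCATION_SCHEMA))  # -> False (rejected before reasoning)
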
5.4.3 Expressiveness

This pertains to the ability of a representation language, e.g., OWL-DL, to encode
or represent complex entities and relations. An expressive representation is in
tension with the reasoning mechanism in terms of its soundness, completeness,
and efficiency (Brachman and Levesque 2004). In Korpipää (2005) efficiency and
expressiveness are specified as requirements for context representation. In relation
to expressiveness and efficiency, an important sub-requirement for context repre-
sentation is the support for retraction. ‘Since contexts are usually recognized based
on sensor measurements and new measurements may replace previous values, the
previous measurements and the contexts derived based on the previous measure-
ments should be retracted, as well as the context derivation repeated. This generally
requires computing every conclusion again.’ (Perttunen et al. 2009, p. 6). Forbus
and Kleer (1993) adopted what is called ‘truth maintenance’ as a means to
preclude some of the computation involved in rule-based systems.
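The retraction issue can be illustrated with a deliberately naive Python sketch (it is not a truth-maintenance system in the above sense): when a new measurement replaces an old one, the conclusions derived from the old value are simply discarded and the derivation is repeated. The sensor names, thresholds, and the derivation rule are invented for the example.

measurements = {}   # latest sensor values, keyed by sensor id
derived = {}        # contexts derived from those values

def derive(meas: dict) -> dict:
    # Toy derivation rule: infer 'sleeping' from low light and low motion.
    if meas.get("light", 1.0) < 0.2 and meas.get("motion", 1.0) < 0.1:
        return {"activity": "sleeping"}
    return {"activity": "unknown"}

def update(sensor: str, value: float) -> None:
    global derived
    measurements[sensor] = value      # the new measurement replaces the previous one
    derived = derive(measurements)    # naive approach: recompute every conclusion

update("light", 0.1)
update("motion", 0.05)
print(derived)  # -> {'activity': 'sleeping'}
update("motion", 0.9)
print(derived)  # -> {'activity': 'unknown'}  (the earlier conclusion has been retracted)
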

5.4.4 Simplicity, Reuse, and Expandability

A system applying a simple representation to encode the context domain knowl-
edge supports reuse and expandability. In Korpipää (2005), simplicity, flexibility
and expandability are included as the requirements of context representation.
However, simplicity, or ease of use, somewhat conflicts with expressiveness;
especially, ‘it intuitively seems easier to encode the knowledge needed by a simple
application in a less expressive, but simple representation than in an overly
expressive representation, not providing real benefit for the particular application
being designed’, which is ‘a trade-off that has to be made in favor of more complex
applications utilizing the framework.’ (Perttunen et al. 2009, p. 6).

5.4.5 Uncertainty and Incomplete Information

As discussed previously, the real-world context is dynamic, intricate, intractable,
and unpredictable, and our measurement of the real world is prone to uncertainty
and vagueness due to the use of artificial devices—inaccurate or imprecise sensors.
Therefore, context-aware applications should be able to deal with or handle
uncertain and incomplete context information in relation to detection, representa-
tion, and reasoning. Strang and Linnhoff-Popien (2004) refer to the ability to deal
with ‘incompleteness and ambiguity’ and to represent ‘richness and quality’ of
context information.
5.4.6 Generality

This entails the ability to support all kinds of context information as to a context
representation (Korpipää 2005). In this perspective, generality of a context repre-
sentation is associated with the expressiveness of a representation language since it
affects its ability to encode context information of different forms of complexity.

5.5 Reasoning

5.5.1 Efficiency, Soundness, and Completeness

It is important to handle dynamic knowledge updates for the context representation
and reasoning system given the high volatility of context information. ‘Using the
most expressive system that provides sound, complete, and efficient-enough rea-
soning under dynamic knowledge base updates is desirable.’ (Perttunen et al. 2009,
p. 5). It could be of value to study the interplay between efficiency, soundness, and
completeness with respect to finding the most suitable trade-offs for context
representation.

5.5.2 Multiple Reasoning/Inference Methods

A context-aware system may incorporate multiple reasoning mechanisms or
inference methods operating on its context representation, a feature which entails
that some of the other requirements for reasoning should be loosened. However, in
some cases, with the different semantics being necessarily encoded in the individual
reasoners and with the same representation having multiple interpretations, inter-
operability can be hindered. Despite the existence of many implementations of
reasoners for a representation, and thus the variation of computational requirements
as to space and time, the resulting conclusion set is identical.

5.5.3 Interoperability

Comprehensive interoperability entails enabling sharing and reuse of representa-
tions. This can be accomplished through representing contexts in a syntactically and
semantically interoperable format. The loosely coupled components of a
context-aware system should conform to a common representation format for
message exchange and the reasoning processes be standardized to ensure that
different implementations of the processes produce identical results. Put differently,
as Perttunen et al. (2009, p. 5) state, ‘evaluated against the same set of axioms, a set
of assertions should always produce the same conclusions. This implies that when a
set of assertions represents a message, its receiver can derive the exact meaning the
sender had encoded in the message.’
While the congruency of inference conclusions represents a basic prerequisite
for interoperability, it entails a disadvantage in terms of strengthening ‘ontological
commitment’ (Studer et al. 1998). That is to say, ‘the more consequences are
encoded as axioms in the representation, the more its clients are tied to dealing with
the represented entities in the exact same manner’, a case which ‘is undesirable
when only a few of the entities of the representation are of interest to the client.’
(Perttunen et al. 2009, p. 5). The reuse of modules of a Web Ontology Language
(OWL) is one example to deal with this issue (Bechhofer et al. 2004).
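As a small illustration of the syntactic side of interoperability, the following Python sketch serializes a set of context assertions to Turtle with the rdflib library and parses them back on the receiving side; the vocabulary is invented for the example, and whether sender and receiver also share the semantics of the terms is the separate, harder question discussed above.

from rdflib import Graph, Namespace, Literal

CTX = Namespace("http://example.org/context#")

sender = Graph()
sender.add((CTX.alice, CTX.locatedIn, CTX.kitchen))
sender.add((CTX.kitchen, CTX.temperature, Literal(22)))

message = sender.serialize(format="turtle")    # a common representation format for exchange

receiver = Graph()
receiver.parse(data=message, format="turtle")  # the receiver reconstructs the same assertions
print(len(receiver))  # -> 2
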

5.6 Requirement for Generic Context Models

There have been many attempts to synthesize and evaluate the state-of-the-art
context models that are suitable for any kind of application and that can meet most
of the requirements set for the context modeling, reasoning, and management. The
experiences with the variety of context-aware applications developed based on
various context models have influenced the set of the requirements defined for
generic context models, i.e., the context representation and reasoning of the system. In a
recent survey of context modeling and reasoning techniques, Bettini et al. (2008)
synthesize a set of requirements for a generic context information modeling, rea-
soning, and management approach. These requirements quoted below need to be
taken into account when modeling context information.

5.6.1 Heterogeneity and Mobility

There is a large variety of context information sources (e.g., mobile sensors, bio-
sensors, location sensors, image sensors, etc.) that context information models have
to handle, which differ in—in addition to the quality of information they generate—
their means of collecting and interpreting information about certain processes of the
human world or/and states of the physical world, update rate (user profiles versus
user behaviors and activities), dynamic nature of context data, semantic level,
derivation of context data from existing context information, and so on. Moreover,
context-aware applications that are dependent on mobile context information
sources add to the issue of heterogeneity due to the need for context information
provisioning to be flexible and adaptable to the changing environment. It is
essential that context information models consider different aspects and types of
context information in terms of handling and management.
5.6.2 Relationships and Dependencies

In order to ensure that context-aware applications behave properly, various rela-
tionships between types of context information must be captured, especially when it
comes to complex context-aware applications like those involving cognitive,
emotional, and situational aspects of context, where context information entities
depend on the existence of other context information entities, e.g., atom contexts.
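A toy Python sketch of such a dependency follows, assuming atom contexts on which a composite, higher-level context depends; the context names and the composition rule are invented for illustration.

# Atomic context entities sensed or supplied independently.
atom_contexts = {"location": "office", "time_of_day": "afternoon", "calendar": "meeting_scheduled"}

# A composite context entity that is defined only if the atom contexts it depends on exist.
def in_meeting(atoms: dict):
    required = ("location", "time_of_day", "calendar")
    if not all(key in atoms for key in required):
        return None  # dependency not satisfied: the composite context is undefined
    return atoms["location"] == "office" and atoms["calendar"] == "meeting_scheduled"

print(in_meeting(atom_contexts))  # -> True
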

5.6.3 Timeliness

One of the features of context information that needs to be captured by context
information models and handled by context management systems is timeliness
(context histories). This entails that context-aware applications may need access to
past states, in addition to future states. In particular, context histories can become
difficult to manage, when the number of updates is extremely high.
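A minimal Python sketch of maintaining a bounded context history follows, assuming a simple in-memory store; a real context management system would persist, index, and distribute such histories, and the class and method names here are invented for the example.

from collections import deque
from datetime import datetime

class ContextHistory:
    # Keeps the most recent N values of a context attribute with timestamps,
    # so that applications can query past states as well as the current one.
    def __init__(self, max_entries: int = 1000):
        self._entries = deque(maxlen=max_entries)  # bounds storage when update rates are high

    def record(self, value):
        self._entries.append((datetime.utcnow(), value))

    def current(self):
        return self._entries[-1] if self._entries else None

    def since(self, t: datetime):
        return [(ts, v) for ts, v in self._entries if ts >= t]

history = ContextHistory(max_entries=3)
for room in ("kitchen", "hallway", "living_room", "bedroom"):
    history.record(room)
print(history.current())                  # the most recent location
print(len(history.since(datetime.min)))   # -> 3 (the oldest entry has been evicted)
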

5.6.4 Imperfection

The variable quality of context information may be associated with its dynamic and
varied nature. Accordingly, the changing patterns of the physical world affect the
sensed values in terms of increasing inaccuracy over time or rendering context data
incorrect. Adding to this is the potential incompleteness of context information as
well as its potential to conflict with other context information. It is therefore essential
that the context modeling approach incorporates modeling of context information quality as
a means to support reasoning on context information.

5.6.5 Reasoning

Context-aware applications often need reasoning capabilities to take a decision
according to whether any adaptation to the change of the user context is needed,
which involves the use of context information to assess whether there is a change to
the user context. Hence, it becomes important that the context modeling techniques
support consistency verification of the context model and context reasoning.
In particular, the context reasoning techniques should be
computationally efficient in terms of reasoning about high-level context abstractions
and/or deriving new context facts from existing ones.
5.6.6 Usability of Modeling Formalisms

The key features of modeling formalisms are the ease with which software designers,
who create context information models to enable context-aware applications to
manipulate context information, can translate real-world concepts associated with
various situations to the modeling constructs and their interrelationships, as well as
the ease with which such applications can manipulate and utilize context
information.

5.6.7 Efficient Context Provisioning

The context modeling approach needs to support the representation of attributes for
appropriate access paths—i.e., dimensions along which context-aware applications
select context information—in order to pick the pertinent objects. This is associated
with the efficiency of access to context information, which the presence of
numerous data objects and large models makes a difficult requirement to meet.
Those dimensions are, as stated by the authors, ‘often referred to as primary con-
text, in contrast to secondary context, which is accessed using the primary context.
Commonly used primary context attributes are the identity of context objects,
location, object type, time, or activity of user. Since the choice of primary context
attributes is application-dependent, given an application domain, a certain set of
primary context attributes is used to build up efficient access paths’ (Bettini et al.
2010, p. 3).
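A minimal Python sketch of the access-path idea follows, assuming an in-memory index keyed by commonly used primary context attributes (here, location and object type); a real context repository would rely on database indexes instead, and the objects and attribute names are invented for the example.

from collections import defaultdict

# Context objects described by a few attributes.
objects = [
    {"id": "o1", "type": "User", "location": "kitchen", "activity": "cooking"},
    {"id": "o2", "type": "Device", "location": "kitchen", "state": "on"},
    {"id": "o3", "type": "User", "location": "office", "activity": "reading"},
]

# Build access paths over the primary context attributes, so applications can
# select the pertinent objects without scanning the whole model.
index = {"location": defaultdict(list), "type": defaultdict(list)}
for obj in objects:
    for attr in index:
        index[attr][obj[attr]].append(obj["id"])

print(index["location"]["kitchen"])  # -> ['o1', 'o2']
print(index["type"]["User"])         # -> ['o1', 'o3']
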
The experiences with the development of context-aware applications have
shown that deriving and taking into account the requirements for the generic
context knowledge representation and reasoning of the system when modeling
context information is difficult due to the problematic issues relating to the
development of context information models, which usually emerge at the time of
writing the definition of some context domain and devising related reasoning
mechanisms. Context models are usually created for specific use cases or
applications. They have always been application dependent and there are not really
generic context models suitable for all kinds of applications (Dey 2001). ‘As the
context representation and reasoning of the system should be divided between
generic and application-specific, the generic representation and reasoning can be
encoded in the common ontologies, and the application-specific, in turn, in
ontologies extending the common ontology and as rules.’ (Perttunen et al. 2009,
p. 20). Moreover, deriving precisely the requirements for the generic system of
context information representation and reasoning is difficult, as the system should
support all kinds of applications, and some of which are not even known at system
design-time. A straightforward way to approach the situation caused by this
inherent problem—because of which the design of the context information repre-
sentation and reasoning system necessarily relies on general requirements—is to
derive requirements from a typical application, thereby designing for the ‘average’
(Perttunen et al. 2009). Nonetheless, there have been some recent attempts (e.g.,
Strimpakou et al. 2006) to design and develop generic context models, not tied to
specific application domains. Common to most approaches to generic context
models is that they should allow for defining various context abstraction levels,
support the mappings and operations on contextual representations across various
contextual entities, and enable an easy reuse and dynamic sharing across models as
well as applications. Unlike the system of context information representation and
reasoning, the context management system can be supported by generic mechanisms
applicable in any context domain and should not be bound to specific application
spaces. Overall, the design of a generic representation, reasoning, and management
approach to context information modeling remains quite a challenging endeavor,
and thus a research area that merits further attention.

5.7 Context Models in Context-Aware Computing: Ontological Approach

Incorporating context-aware functionality in the AmI service provision chain entails
capturing, representing, and processing context information. In other words, to be
able to deal with context computationally or in a computerized way so it can be
supported in AmI environments, a context model is needed. This is the way context
knowledge is represented within context-aware systems based on the way context is
operationalized and what this entails in terms of the nature of the contexts being
measured and modeled (or the application domain) and the features of the concrete
applications. The development of context-aware applications requires significant
modeling efforts to ensure that context information is comprehensively represented
in the context management system and that applications are able to perform con-
sistent and effective manipulation of and reasoning on context information.
A well-designed context model should provide highly expressive representation and
support efficient reasoning in support of fully integrated context-aware services
within AmI environments. Context formalism plays a role in fuelling context rea-
soning mechanisms, a computational aspect which, in turn, contributes to the
effectiveness and relevance of adaptive and responsive services delivered to the user.
As one of the recent frameworks used for context knowledge representation and
reasoning, ontology has emerged in response to the growing complexity of context
information used by context-aware applications. It is considered very suitable for
context models because of the semantic, expressive formalism it offers and the
possibility of applying ontology-based reasoning techniques as well as integrating
different representation and reasoning techniques. Hence, a wide range of complex,
profound, and generic context ontologies have recently been applied in context-aware
applications. Particularly, ontological features of context information modeling
reduce the complexity inherent in the development of context-aware applications.
It is relatively straightforward to build an ontological context model using a
description language. Specifically, the ontological approach ‘allows easy incorporation
of domain knowledge and machine understandability, which facilitates interopera-
bility, reusability, and intelligent processing at a higher level of automation.’ (Chen
and Nugent 2009, p. 410). Bettini et al. (2008) point out that ontological context
information models exploit the representation and reasoning power of description
logic: use the expressiveness of the language to describe complex context data that
cannot be represented by simple languages; share and/or integrate context among
different sources by means of a formal semantics to context data; and use the available
reasoning tools to check for consistency of the set of relationships describing a
context scenario as well as to recognize that a particular set of instances of basic
context data and their relationships reveal the presence of an unknown context—a
more abstract context characterization. Ideally, context models should
be able to capture and encode the patterns underlying how different context entities
may interact and build on or complete one another depending on the situation in a
dynamic way, as well as to account for variations between users as to the nuances of
their interactive or behavioral patterns and their environments. This seems at the
current stage of research difficult to realize, posing a challenge to system modeling.
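As a rough illustration of the last capability mentioned by Bettini et al. (2008)—recognizing a more abstract context from a set of basic context facts—the following Python sketch encodes two low-level facts with rdflib and uses a SPARQL query as a simple stand-in for the description-logic reasoning an OWL reasoner would perform; the vocabulary and the recognition rule are invented for the example.

from rdflib import Graph, Namespace, RDF, Literal

CTX = Namespace("http://example.org/context#")
g = Graph()
g.add((CTX.alice, RDF.type, CTX.User))
g.add((CTX.alice, CTX.locatedIn, CTX.meetingRoom1))
g.add((CTX.meetingRoom1, CTX.noiseLevel, Literal("high")))

# Recognize the higher-level context 'InMeeting' from the basic facts.
q = """
PREFIX ctx: <http://example.org/context#>
SELECT ?user WHERE {
  ?user a ctx:User ;
        ctx:locatedIn ?room .
  ?room ctx:noiseLevel "high" .
}
"""
for row in list(g.query(q)):
    g.add((row.user, RDF.type, CTX.InMeeting))    # assert the derived, more abstract context

print((CTX.alice, RDF.type, CTX.InMeeting) in g)  # -> True
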

5.7.1 Origin and Definitional Issues of Ontology

By ontology in context-aware computing is meant a set of concepts and their
associated definitions and interrelationships, intended to describe different types
of contexts as part of the life-world. Ontology has its origin in philosophy. As a part
of the major branch of philosophy known as metaphysics, ontology is the study of,
or concerned with articulating, the nature and organization of being, existence or
reality as such, as well as the basic categories of being and their relations.
Specifically, ontology deals with questions concerning what is or can be said to
exist as entities, and how such entities can be grouped and subdivided according to
similarities and differences. The word category is the more traditional term of
ontology used by Aristotle to classify anything that can be said or predicated about
anything (Sowa 2000). Drawing inspiration from philosophical ontology, some
researchers viewed computational ontologies as a kind of applied philosophy (Sowa
1984). Ontology offers an operational method to put theory to practice in compu-
tational systems (Gruber 2009). The legacy of computational ontology is a rich
body of theoretical knowledge about how to make ontological distinctions of a
domain, which can be captured by representational choices at the highest level of
abstraction, in a systematic and coherent manner (Ibid). For example, building
computational ontologies for worlds of data can apply many of the insights of
‘formal ontology’ that are driven by understanding ‘the real-world’ (Guarino 1995).
Indeed, it is argued that the representation of entities and events, along with their
properties and relations, according to a system of categories is what many
computational and philosophical ontologies have in common. Nevertheless, the
meaning of ‘ontology’ in philosophy is significantly different from the term
‘ontology’ adopted in computer science, in that, as a matter of focus, computer
scientists are less involved in debating scientific knowledge and methodologies
while philosophers are less concerned with establishing fixed meanings and ter-
minology. One implication of this in the context of AmI as a branch of computer
science is the propensity in context-aware systems towards reducing the complexity
of context as an ontological entity—alienating the concept from its complex nature
and structure for technical purposes or due to the constraints of existing system
engineering, design, and modeling methods. Also, work on context-aware systems
is typically based on having designers define what aspects of the world become
context in terms of its nature and structure, starting with philosophical (or com-
prehensive) definitions but operationalizing much simpler concepts of context. Here
context as a noun provides, following ontological philosophy, a kind of shorthand
for reference to a collection of events, objects, settings, physical conditions, and
processes that determine varied, interrelated states relevant to different entities.
In computer science ontology as a technical term refers to a formal represen-
tation of some domain of knowledge or the world that is hierarchical and comprises a
set of concepts and their interrelationships. In other words, ontology as an artifact
provides a shared vocabulary that is intended to enable the modeling of knowledge
about some domain by specifying a set of representational primitives. In computing,
researchers use the term to refer to ‘an explicit specification of a conceptualization’
(Gruber 1993). In this context, a conceptualization is an abstract, simplified view
of (some aspects of) the world that we wish to represent for some purpose. The
essential points of Gruber’s definition involve: (1) ontology defines different entities
(e.g., objects, concepts) that are assumed to exist in some domain of interest (e.g.,
activity, context), the relationships that hold among these entities, and other
distinctions that are relevant to modeling domain knowledge; and (2) the speci-
fication takes the form of the definitions or descriptions of representational
vocabulary (e.g., classes, relations, roles, functions), which provide meanings for
the vocabulary and formal constraints on its logically consistent or coherent
application (Gruber 2009). However, this definition is overly broad, allowing for a
range of specifications, e.g., logical theories expressed in predicate calculus (Smith
and Welty 2001). A more precise definition of ontology is provided by Studer et al.
(1998) who describe it as ‘a formal, explicit specification of a shared conceptual-
ization. A “conceptualization” refers to an abstract model of some phenomenon in
the world by having identified the relevant concepts of that phenomenon. “Explicit”
means that the type of concepts used, and the constraints on their use are explicitly
defined. For example, in medical domains, the concepts are diseases and symptoms,
the relations between them are causal and a constraint is that a disease cannot cause
itself. “Formal” refers to the fact that the ontology should be machine readable,
which excludes natural language. “Shared” reflects the notion that an ontology
captures consensual knowledge, that is, it is not private to some individual, but
accepted by a group.’
Ontologies have been applied in computing in multiple ways. As knowledge
base systems, context-aware applications are committed to context conceptualiza-
tions, which are created for machine manipulation: to enable certain kinds of
automated reasoning about context entities within some domain of knowledge. By
ontology is meant, in this context, logically sound representation and reasoning
mechanism. But one can mean by ontology a multitude of things. Gómez-Pérez
(1998) classifies ontology in a four-tier taxonomy: (1) domain ontology that pro-
vides a vocabulary for describing a particular domain; (2) task ontology that pro-
vides a vocabulary for the terms involved in a problem solving process;
(3) meta-ontology that provides the basic terms to codify domain and task ontology;
and (4) knowledge representation ontology that captures the representation primi-
tives in knowledge representation languages. Regardless, the core advantage of
utilizing ontology is to facilitate the knowledge sharing among the various parties
(including applications and users) involved in some domain of knowledge. In
relation to context-aware environments, drawing on Fensel (2003), ontology can be
described as a shared understanding of context knowledge domain that can be
communicated between people and heterogeneous applications.

5.7.2 Key Characteristics and Fundamentals of Ontology

As structural frameworks for organizing knowledge about humans and the world in
a computerized way, ontologies have a wide applicability. This spans diverse areas,
including AI (e.g., conversational agents, emotionally intelligent systems, affective
systems, expert systems, etc.), AmI (e.g., cognitive, emotional, activity, and loca-
tion context-aware systems), the Semantic Web, enterprise engineering, testing, and
academic research.
Due to their semantic expressive formalism, ontologies meet most of the
requirements set for representation and reasoning and are distinct from other
models in many respects. Ontologies allow for integrating heterogeneous
applications (e.g., mobile, ubiquitous, AmI, and AI applications), enabling inte-
gration and interoperability with regard to shared structure and vocabulary across
multiple models and applications, enabling reusability and portability of models
between various application domains and systems, amalgamating different repre-
sentation schemes and reasoning techniques, providing interfaces for interacting
with knowledge-based software agents, and so on. As to the latter, for example,
ontology specifies a vocabulary with which to make assertions, which may con-
stitute inputs or outputs of software (knowledge) agents, providing, as an interface
specification, a language for communicating with the agent, which is not required to
use the terms of the ontology as an internal encoding of its knowledge while the
definitions and formal constraints of the ontology put restrictions on what can be
meaningfully stated in that language (Gruber 2009). Fundamentally, in order to
commit to an ontology (i.e., supporting an interface using the ontology’s vocabu-
lary), statements asserted on inputs and outputs must be in logical consistency with
the definitions of the ontology and the constraints on its application (Gruber 1995).
Furthermore, ontologies provide a common vocabulary of a domain and define the
meaning of the terms and their interrelationships with different levels of formality.
The level of formality of the implemented ontology depends on the formality that
will be used to codify the terms and their meanings. Uschold (Uschold and
Gruninger 1996) classifies the level of formality along a range from highly informal,
through semi-informal and semi-formal, to rigorously formal ontologies, depending
on the degree to which the terms and their meanings are codified in a language
ranging from natural language to a rigorously formal language. There is a set of
principles for the design of formal ontologies. Gruber (1993) provides a preliminary
set of design criteria for the ontologies developed for knowledge sharing; they
include: clarity, coherence, completeness, extendibility, minimal encoding bias, and
minimal ontological commitment. Other criteria that have been demonstrated to be
useful in the development of ontology are ‘ontological distinction principle’ (Borgo
et al. 1996); diversification of ontologies to improve the power provided by mul-
tiple inheritance mechanisms and minimizing the semantic distance between sibling
concepts (Arpirez et al. 1998); and modularity (Bernaras et al. 1996).
In the field of AmI, the use of ontology in context-aware applications is typically
linked to context recognition (but also includes human-like understanding of what
is going on in the human’s mind and his/her behavior), a process which depends on
the representation of context information necessary for the system to interpret and
process different contextual features pertaining to the user in order to infer
high-level context abstractions as a crucial basis for delivering relevant services,
through firing context-dependent application actions. Context recognition is thus
performed through semantic reasoning that makes extensive use of semantic
descriptions and domain knowledge. There are now a variety of standardized lan-
guages and tools (discussed below) for creating and working with ontologies in the
area of context-aware computing. Most context ontologies use description
logic-based languages such as OWL for specifying knowledge domains and their
descriptors and relationships. In relation to this, due to their semantic nature,
context ontologies as conceptualizations are intended to be independent of data
representation (e.g., OWL, UML) and implementation approaches (e.g., Java,
Protégé). Ontology is specified in a language that allows abstraction away from data
modeling structures and implementation strategies (Gruber 2009). That ontologies
are typically formulated in languages which are closer in expressive power to logical
formalisms enables ‘the designer to be able to state semantic constraints without
forcing a particular encoding strategy’, that is, ‘in an ontology one might represent
constraints that hold across relations in a simple declaration (A is a subclass of B),
which might be encoded as a join on foreign keys in the relational model’ (Ibid).
5.7.3 Ontology Components

In general, ontology involves various components including class hierarchy, clas-
ses, super-classes, subclasses, attributes/properties, multi-attributes, instances,
functions, actors, processes, values, default values, inheritance, multi-inheritance,
variables, restrictions, relations, axioms, events, and so on. Some of these compo-
nents (e.g., classes, properties, relations, and instances) are used in most applica-
tions. Other components, which are more sophisticated, (e.g., axioms,
multi-inheritance, events) can be used depending on the application domain. This
typically relates to the nature and complexity of the system in terms of the speci-
fications for the ontological model involved. Regardless of the ontology language in
which contemporary ontologies are encoded or represented, they share many
structural similarities. The most commonly used components of an ontology, several of which are illustrated in the code sketch following this list, include:
• Classes: sets, types of objects, kinds of things, or concepts. Concepts could be
the description of a task, function, action, strategy, reasoning, or process
(Gruber 1993). Classes are usually organized in taxonomies. Sometimes the
definition of ontologies is diluted, that is, taxonomies are considered to be full
ontologies (Studer et al. 1998).
• Individuals: instances or objects are used to represent elements.
• Attributes: properties, features, characteristics, or parameters that classes (or
objects) can have.
• Relations: ways in which classes and instances (individuals) can be related to
one another. They represent a type of interaction between concepts of the
domain.
• Events: the changing of attributes or relations.
• Function terms: complex structures formed from certain relations that can be
used in place of an individual term in a statement. Functions are a special case of
relations in which the nth element of the relationship is unique for the n − 1
preceding elements (Gruber 1993).
• Axioms: used to model assertions—statements that are always true. Assertions,
including rules in a logical form, constitute what the ontology describes in its
domain of application as theories derived from axiomatic statements.
• Restrictions: formally stated descriptions of what must be true in order for some
assertions to be accepted as input.
• Rules: consist of conditions and consequences, in that when the conditions are
satisfied the consequences are processed. They are statements in the form of an
if-then (condition-consequent) sentence, which describe the logical inferences
that can be drawn from an assertion in a particular form. In the context of
context-aware applications, consequence can be context-dependent action or
new inferred context data. While logic to define rules can sometimes be very
complex, rule-based algorithms or reasoners have made it easier to develop
context-aware applications.
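To make these components more concrete, the following minimal sketch declares a tiny ontology fragment in Python using the rdflib library: a small class taxonomy, a datatype and an object property, a few individuals, and a simple condition-consequent rule evaluated over the asserted facts. The namespace, the class names (User, EmotionalState), and the properties (hasAge, hasEmotionalState, requiresAdaptation) are hypothetical and serve only to illustrate the component types listed above.

  from rdflib import Graph, Literal, Namespace, OWL, RDF, RDFS, XSD

  CTX = Namespace("http://example.org/context#")   # hypothetical vocabulary
  g = Graph()
  g.bind("ctx", CTX)

  # Classes and a small taxonomy (class, super-class, subclass)
  g.add((CTX.Person, RDF.type, OWL.Class))
  g.add((CTX.User, RDF.type, OWL.Class))
  g.add((CTX.User, RDFS.subClassOf, CTX.Person))
  g.add((CTX.EmotionalState, RDF.type, OWL.Class))

  # Attributes (datatype property) and relations (object property)
  g.add((CTX.hasAge, RDF.type, OWL.DatatypeProperty))
  g.add((CTX.hasEmotionalState, RDF.type, OWL.ObjectProperty))

  # Individuals (instances) with an attribute value and a relation
  g.add((CTX.alice, RDF.type, CTX.User))
  g.add((CTX.alice, CTX.hasAge, Literal(30, datatype=XSD.integer)))
  g.add((CTX.frustration, RDF.type, CTX.EmotionalState))
  g.add((CTX.alice, CTX.hasEmotionalState, CTX.frustration))

  # A simple condition-consequent rule over the asserted facts: if a user is in
  # the frustration state, assert a (hypothetical) context-dependent consequence.
  for user in g.subjects(CTX.hasEmotionalState, CTX.frustration):
      g.add((user, CTX.requiresAdaptation, Literal(True)))

  print(g.serialize(format="turtle"))

Axioms and restrictions would be expressed analogously, for instance as OWL class expressions, but are omitted here for brevity.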
5.7.4 Ontological Context Modeling

5.7.4.1 Conceptual Model and Context Structure

The context representation problem has two sides: the encoding of knowledge and
the conceptual model. It is argued by ontology designers that the conceptual structure is
associated with more issues and challenges than the encoding process. Winograd
(2001) notes that it is relatively straightforward to put what needs to be encoded
once understood into data structures, but the hard part is to come up ‘with con-
ceptual structures that are broad enough to handle all of the different kinds of
context, sophisticated enough to make the needed distinctions, and simple enough
to provide a practical base for programing.’
Ontological context modeling is the process of explicitly specifying a set of rep-
resentational primitives—i.e., key concepts and their interrelations—and other
distinctions that are relevant to modeling a domain of context, and of building
a representation structure or scheme that encodes such primitives and other distinc-
tions using the commonly shared vocabularies of the context domain. The repre-
sentational primitives include constraints on the logically consistent application and
use of concepts and their interrelations as part of their explicit specification. The
resulting context ontologies, the explicit representation of contexts that comprise
context categories and their relationships in, for instance a cognitive, emotional,
social, or situational domain, are essentially shared context knowledge models that
enhance automated processing capabilities by enabling software agents to interpret
and reason about context information, thereby allowing intelligent decision support
in a knowledgeable manner. This is associated with the delivery of adaptive and
responsive services in a knowledgeable, autonomous manner. Contexts in context
ontologies are modeled based on various contextual entities, e.g., emotional state,
cognitive state, task state, social state, environmental states, time, events, objects,
and so on, as well as the interrelationships between these entities, a computational
feature which allows software agents to take advantage of semantic reasoning
directly to infer high-level context abstraction. This dynamic context can be derived
from existing context information, using intelligent analysis, rather than using
probabilistic methods.

5.7.4.2 Languages for Implementing Context Ontology

In general, ontology can be implemented using a variety of modeling languages,
including OWL, the Unified Modeling Language (UML), RDF, the Ontology Inference
Layer (OIL), DAML+OIL (the predecessor of OWL), and the DARPA Agent Markup
Language (DAML). This variety can be explained by the fact that the general nature of
ontologies makes them independent of any language by which they can be
implemented. It is precisely the semantic description form through which ontolo-
gies can be formally specified that enables abstraction away from data modeling
and implementation. While different languages have been proposed to implement
context models, OWL and thus OWL-based ontologies are the most used within the
area of context-aware computing. In particular, the W3C supports standardization in
various application domains. Standardization provides a significant thrust for fur-
ther progress because it codifies best practices, enables and encourages reuse, and
facilitates interworking between complementary tools (Obrenovic and Starcevic
2004). Several variants varying in expressive power are proposed by today’s W3C
Semantic Web standard as a specific formalism for encoding ontologies
(McGuinness and van Harmelen 2004).
UML is increasingly gaining ground as a means of implementing ontologies, although it is a
standardized general-purpose modeling language in the field of object-oriented
software engineering. Kogut et al. (2002) provide a number of good reasons why
UML is a promising notation for ontologies. In fact, UML is widely adopted and
familiar to many software engineers and practitioners. It can be used to specify,
visualize, modify, construct, and document the artifacts of context-aware applica-
tions as object-oriented software-intensive systems. UML consists of a collection of
semiformal graphical notations. These notations can be used to support different
development methodologies (Fowler and Scott 1997). Its models can be automat-
ically transformed into other representations, such as Java by means of QVT-like
transformation languages, or into OWL using OWL formats. Moreover, most modeling
tools allow transformation of models using specific formats such as XML or OWL,
so they can be used by other modeling tools. Further, UML is extensible with two
mechanisms for customization: profiles and stereotypes. It includes built-in facili-
ties that allow defining profiles for a particular domain. A profile can specify additional
constraints on selected general concepts to capture context domain forms and
abstractions. This extension mechanism has enabled practitioners to extend the
semantics of the UML by defining stereotypes, tagged values, and con-
straints that can be applied to model elements. A stereotype allows defining a new
semantic meaning for a modeled element, tagged values allow ‘tagging’ any value
onto that modeled element, and constraints define the well-formedness of a model.
With the Object Constraint Language (OCL), UML allows additional information,
such as constraints and invariants, to be attached to the graphical speci-
fication in a more formal way (Oechslein et al. 2001).
OWL and UML were originally designed for purposes other than con-
ceptual context modeling: OWL for computational efficiency in reasoning and
UML for supporting software design and implementation. In contrast to what
ontological approaches to context information modeling assume—OWL and UML
offer the adequate conceptual foundations upon which ontological models can be
based—OWL and UML fall short in offering suitable abstractions for constructing
conceptual models. This is defended extensively in Guizzardi et al. (2002),
Guizzardi (2005).
5.7.4.3 OWL-DL as de Facto Standard Language for Context Encoding

A modeling language refers to an artificial language that can be used to describe
knowledge in a structure. Constituting a formal representation of some knowledge
domain in context ontology, this structure is defined by a consistent set of rules,
which are used for the interpretation and reasoning against context information.
Context encoding entails representing a context model using a particular knowledge
representation technique, such as ontology-based (description logic) language,
case-based reasoning (CBR), rule-based reasoning, logic programing language, and
so forth. A graphical modeling language, in turn, uses a diagram technique with
named symbols that represent concepts, lines that connect the symbols
and represent relationships, and various other graphical notations to represent
constraints. Most ontology-based context models are implemented using
OWL-DL—in other words, OWL-DL is the most frequently used language to
encode ontological context models. OWL-DL is typically the formalism of choice
in ontology-based models of context information (Horrocks et al. 2003), along with some
of its variants. Most ontology-based context representations apply
description logics (Baader et al. 2003)—OWL-DL. That OWL-DL is becoming a de facto
standard in context-aware applications (and various application domains) has also
been corroborated by two recent surveys of context modeling and reasoning,
namely Perttunen et al. (2009) and Bettini et al. (2008).

5.7.4.4 DL Knowledge Bases and OWL-DL Key Constructs

Most of the ontology-based work applies description logics, irrespective of the
nature of the type of context that is to be modeled. All types of ontological contexts
(e.g., emotional states, cognitive states, social states, task states, communicative
intents, activities, etc.) can be inferred through semantic reasoning making extensive
use of semantic descriptions and context domain knowledge. Broadly, description
logic (DL) knowledge bases, i.e., repositories consisting of context ontologies,
comprise two main components: the TBox and the ABox. The TBox contains a set of
terminological axioms T of the application domain, i.e., schema sentences formed
by, for example, OWL constructs and ontological concepts, and the ABox consists
of a set of assertional axioms A about individuals, i.e., ground facts (instances),
using the terminology of the TBox. DL languages involve three key constructs:
classes that represent objects, roles that describe relations between classes, and
individuals that represent instances of classes (see Perttunen et al. 2009).
By means of OWL-DL it is possible to model various knowledge domains by
defining classes, attributes of classes, binary relations between classes (roles),
individuals, features of individuals (datatype properties), and relations between
individuals (instance or object properties). Context knowledge domain can be
ontologically modeled using OWL-DL constructs. It may be useful to describe
some OWL-DL constructs and ontological concepts and how they may be used in
context knowledge representation, without any specific application in mind. The
conceptual context models can be represented in a formal ontology, which depends
on how the context can be operationalized; this implies that context-aware appli-
cations differ as to how many contextual entities they incorporate and how such
entities interrelate—comprehensive versus simple definitions of context.
Regardless, the context class enables having an integrated set of ontologies,
developed for various applications. In a formal context ontology represented by
OWL-DL, a context class is made of a set of subclasses (which are classes themselves),
thus forming a context hierarchy. There are simple and complex definitions of
classes. The former are straightforward and clear-cut, and the latter can be obtained
by using operators (e.g., property restrictions) that can force some or all values of a
certain property to fit in a particular class. The operators provided by OWL-DL
allow composing elementary descriptions through specific operators to build
complex descriptions of classes and properties. Examples of complex context data
that can be represented by structured OWL-DL expressions and be inferred by
means of reasoning tasks on the basis of low-level data directly acquired from
sensors, include information about emotional states, cognitive states, social states,
nonverbal communication intents and functions, dynamic user preferences
regarding the adaptation of services, human movements, and activities.
Furthermore, context ontology consists of context hierarchy. Typically, the class
user is the central entity in the context ontology and corresponds to all human
entities and offers various data type properties for incorporating both human factors
and physical environment related context. In context hierarchy, each class denotes a
type of context or a context entity and is described with a number of properties,
using literal or instances of other classes as their values, thus linking two classes.
Context entities may demonstrate various properties which are represented by
attributes. The property identifies context entities’ statuses (static or dynamic
attributes), and therefore captures all the context information that will be used to
characterize the situation of the upper entity. Put differently, if the fillers of a
number of properties of different context entities are observed and linked to form a
description of a specific (high-level) context, the context data described by the
perceived properties can then be inferred through descriptive reasoning against the
ontologies of context entities (see Chen and Nugent 2009). Subclass and super-class
properties denote the type and interrelationship between context entities. Moreover,
the attributes and associations may have an activation status parameter, which
indicates whether or not instances are currently activated. A context entity may be
linked to other context entities via different types of associations. Some associations
originate at the upper entity and point to one or more child entities, while other
associations may form generic associations among peer entities. An instance of
class such as ‘user’ can have a relationship such as ‘Doingtask’ or
‘Feelingfrustrated’ which links to an instance in the ‘Task’ or ‘Emotional State’
class. All classes and relationships can be added or removed as needed. Also, the
class ‘event’ is marked with a status feature that indicates the dynamic changing of
attributes or associations of classes or objects as instances of classes.
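The following sketch, written in Python with the owlready2 library, indicates how such a context hierarchy might be laid out as a DL knowledge base: the class and property definitions constitute the TBox, and the individuals with their property fillers constitute the ABox. The ontology IRI, the classes (User, EmotionalState, Task), and the properties (feelsEmotion, doesTask, hasName) are hypothetical names chosen for illustration, not a reference model.

  from owlready2 import Thing, FunctionalProperty, get_ontology

  onto = get_ontology("http://example.org/context-hierarchy.owl")  # hypothetical IRI

  with onto:
      # TBox: terminological axioms (classes and roles of the context hierarchy)
      class ContextEntity(Thing): pass
      class User(ContextEntity): pass
      class EmotionalState(ContextEntity): pass
      class Frustration(EmotionalState): pass
      class Task(ContextEntity): pass

      class feelsEmotion(User >> EmotionalState): pass      # object property (role)
      class doesTask(User >> Task): pass                    # object property (role)
      class hasName(User >> str, FunctionalProperty): pass  # datatype property

      # ABox: assertional axioms (ground facts about individuals)
      searching = Task("information_searching")
      frustrated = Frustration("frustration_episode_1")
      user1 = User("user1", hasName="Alice",
                   feelsEmotion=[frustrated], doesTask=[searching])

  onto.save(file="context-hierarchy.owl")  # serializes both TBox and ABox

Such a knowledge base can then be queried or classified by a DL reasoner, as discussed in the following subsections.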
5.7.4.5 OWL-DL Expressiveness Limitations

While it is shown that various knowledge representation techniques have been
experimented with and applied in the field of context-aware computing, a major
breakthrough has yet to be achieved. ‘[A]lthough using expressive (e.g.,
OWL-DL) ontologies is suggested in a number of papers, the evidence does not yet
show that these systems would meet all requirements. This lack of evidence comes
up as the small number of work reporting quantitative evaluation and as the
non-existence of work reporting large scale deployment.’ (Perttunen et al. 2009,
p. 22). Like many modeling languages, OWL-DL is associated with several limi-
tations, manifested in ontological context models supported by OWL-DL failing to
meet all the requirements set specifically for context representation. At the time of
writing, ontology languages provide very little support for modeling temporal aspects
(Bettini et al. 2010). Also, ‘despite the ability to express relations and
dependencies among context data makes the ontological model a satisfactory
solution for a wide range of context-aware applications, experiences with the
development of context ontologies show that the operators provided by OWL-DL
are sometimes inadequate to define complex context descriptions…This problem is
due to the fact that the constructors included in the OWL-DL language were chosen
in order to guarantee decidable reasoning procedures. For this reason, OWL-DL
does not include very expressive constructors that would be helpful for modeling
complex domains, such as users’ activities.’ (Ibid, p. 9). Lassila and Khushraj
(2005) point out that the lack of a composition constructor for properties remains an
issue in representing context with description logic. Moreover, in terms of relations
expressed through properties, there are some definitions that cannot be expressed in
OWL-DL. For example, ‘if a person A is employed by a person B that is the
employer of C, then A is colleague of C…In fact, the language—in order to
preserve its decidability—does not include a constructor for composing relations.
Similarly, OWL-DL does not include some expressive class constructors, such as
the ones that restrict the membership to a class only to those individual objects that
are fillers of two or more properties (these constructors are called role-value-maps
in the literature).’ (Bettini et al. 2010, p. 9). In an attempt to overcome some of
these limitations, in particular by including a restricted form of property compo-
sition, extensions of OWL have been proposed (Motik et al. 2008). Because ‘at the
time of writing the definition of some context domains with OWL-DL can be
problematic’, the Semantic Web community has recently investigated the possi-
bility of augmenting the expressivity of ontological languages through an extension
with rules, and brought it to the definition of logic languages such as Semantic Web
Rule Language (SWRL) as adopted by Chaari et al. (2007), Bettini et al. (2010).
Specifically, Chaari et al. (2007) use ontologies to represent context and rules to
trigger adaptation, an approach that, as Horrocks et al. (2004) point out, aims to use
the expressive combination of OWL and SWRL; however it is not possible to fully
translate SWRL rules into a rule-based reasoner. Besides, the ‘rule extensions are
not really hybrid approaches since rules are fully integrated in ontological rea-
soning. The main problem with this approach is that reasoning in OWL-DL is
already computationally expensive…and the proper integration of rules makes the
resulting language undecidable. A further research issue consists in extending
existing ontological languages to support fuzziness and uncertainty while retaining
decidability…’ (Bettini et al. 2010, p. 9). In addition, Perttunen et al. (2009) point
to a general finding that representing context is infrequently differentiated from
representing other knowledge, although context is considered a key factor in terms
of its use in context-aware systems, e.g., some representations do not define that
some facts are context rather than general facts about the domain. To highlight the
usefulness of this alternative approach to representation, the authors raised the
question of whether it is useful to differentiate contexts from other knowledge.
They point out that the main idea is to enumerate ‘all contexts that are required by
the system and defining them in terms of other domain knowledge or in terms of
other contexts’, which relates to the idea that ‘contexts are modeled as “first-class
objects”, context objects can have relations to other domain objects, and context can
have relations to other contexts’—propositional logic.
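As an illustration of the rule extensions mentioned above, the sketch below encodes the colleague example as a SWRL-style rule in Python with owlready2, which supports SWRL rules through its Imp class and can apply them with the bundled Pellet reasoner (a local Java runtime is assumed). The ontology IRI, the class Person, and the properties employedBy and colleagueOf are hypothetical names; the point is only to show a property composition that plain OWL-DL constructors cannot express.

  from owlready2 import Imp, Thing, get_ontology, sync_reasoner_pellet

  onto = get_ontology("http://example.org/colleagues.owl")  # hypothetical IRI

  with onto:
      class Person(Thing): pass
      class employedBy(Person >> Person): pass
      class colleagueOf(Person >> Person): pass

      # SWRL-style rule: two people employed by the same employer are colleagues.
      # This composition of properties is not expressible in plain OWL-DL.
      rule = Imp()
      rule.set_as_rule("employedBy(?a, ?b), employedBy(?c, ?b) -> colleagueOf(?a, ?c)")

      a = Person("A"); b = Person("B"); c = Person("C")
      a.employedBy = [b]
      c.employedBy = [b]

  sync_reasoner_pellet(infer_property_values=True)  # Pellet applies the rule
  print(a.colleagueOf)  # expected to include C (and, trivially, A itself)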

5.7.4.6 The Relationship Between Representation Formalism and Reasoning Mechanisms

Context ontologies provide a highly expressive context formalism that fuels sound
context reasoning mechanisms. They are basically shared knowledge models that
enhance automated processing capabilities by enabling software agents to interpret
and reason about context information. To a large extent, the efficiency of reasoning
mechanisms is determined by the nature of the expressive system (formalism) used
to codify context knowledge domain as to the ability to deal with dynamic
knowledge updates, to encode complex context entities and relations or context
information of different forms of complexity, to use a simple representation, to
consider uncertainty and incompleteness of context information, and so forth.
Research shows that ontological formalism is fundamental in the design, devel-
opment, and evaluation of reasoning mechanisms in context-aware applications.
The research in symbolic knowledge representation has been mostly driven by
the trade-off between the expressiveness of representation and the complexity of
reasoning. The research is still active as to investigating this interplay more closely
to achieve the most suitable solutions for data-intensive AmI systems (see Perttunen
et al. 2009). Ontology as essentially descriptions of concepts and their relationships
has emerged as an alternative solution, a common language for defining
user-specific rules based on semantic description logics that support automated
reasoning. Description logics (Baader et al. 2003) have emerged because they
provide complete reasoning supported by optimized automatic mechanisms (Bettini
et al. 2010). While other reasoning techniques have been utilized in the field of
context-aware computing such as probabilistic and statistical reasoning, logical
reasoning, case-based reasoning, and rule-based reasoning, the subset of the
OWL-DL admitting automatic reasoning is the most frequently used in various
application domains and supported by various reasoning services.
5.7.5 Ontological Context Reasoning

5.7.5.1 OWL-DL-Based Reasoning

In context ontology, reasoning refers to an algorithm that uses semantic description
logic (DL) to reason about context information in order to deduce high-level
context abstractions. This algorithm is executed for inferring new knowledge about
the current context based on the defined classes and properties and their relation-
ships, and on the sensors readings and other context sources, depending on the
application domain. The support of reasoning tasks, intelligent processing at a
higher level of automation, is a key benefit of ontologies with regard to simpler
representation formalisms. Generally, ontological context models use the reasoning
power of DL to identify potential inconsistencies in the context information and in the
definition of the classes and properties of the ontology, by performing consistency
checking of the set of concepts and relationships describing a given context,
a process which is critical in the definition of an ontology and its population by new
instances. They also use it for the derivation of new context information on the basis of
individual objects retrieved from multiple, diverse sensors and other context
sources, and of the classes and properties defined in ontologies (see Bettini et al.
2010)—in other words, to determine the presence of a high-level context abstrac-
tion, such as the user’s emotional state, cognitive state, situational state, task state,
or activity, based on a particular set of instances of basic context data and their
relationships.
Generally, description logic-based reasoning may use equivalency and sub-
sumption for context recognition, i.e., to test if two context concepts are equivalent
or if a context concept is subsumed by one or more context concepts. Key reasoning
operations with respect to ABox are realization, i.e., ‘determining the concepts
instantiated by a given individual’ and retrieval, i.e., ‘determining the set of indi-
viduals that instantiate a given concept’ (Baader et al. 2003, p. 310). This is related
to the context recognition algorithm based on the ontological approach, which
may differ slightly from one application to another with respect to the technical detail
of the involved phases, depending on the nature and complexity of the context
being assessed. In relation to this, the actual sensor readings and subsequent context
data/information computation or processing are normally bound with the way
context model is conceptualized and semantically expressed by representation
formalism, which is inextricably linked to the degree of automation pertaining to
reasoning mechanisms. Without a specific application in mind, the data upon which
the reasoning is performed are initially detected and pre-processed by sensors, and
then the sensor readings are mapped to matching properties described in context
ontologies. These are used to aggregate and fuse sensor observations to create a
high-level abstraction of a context or situation. The context to be recognized or
inferred is described at two levels of abstraction: one denotes the conceptual
description of context and the other denotes its instances that bind properties with
sensor readings. DL reasoner is used to check whether the conceptual description of
context is equivalent to any atomic context concept in TBox (a set of terminological
axioms). If this is the case, context can be recognized as the type of the current
context. If context is not equivalent to any atomic context concept, then DL rea-
soner is used to compute the most specific atomic concepts in Tbox subsuming
context, which is, in essence, the direct super-concept of atomic context. As an
example of context inference based on subsumption determination, the emotional or
cognitive dimension of context, such as decision making/information searching or
feeling frustrated/uninterested can be inferred as a higher level context by using the
internal context such as user’s intention, work context, and personal event, or facial
expressions, gestures, and psychophysiological responses, respectively, as an
atomic level of the context. Likewise, the physical dimension of the context, like
‘watching TV’ can be deduced by using the external context (i.e., type of location is
indoor, light level is changing, certain audio level (not silent), room temperature,
and user is stationary) as an atomic level of the context. The atomic contexts are
transformed into a higher level of the context—context inference—through rea-
soning mechanisms.
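A minimal sketch of this kind of subsumption-based recognition is given below, again in Python with owlready2 and its bundled HermiT reasoner (a local Java runtime is assumed). The 'WatchingTV' context is defined in the TBox as an intersection of restrictions over atomic context properties, and an observed context individual in the ABox is then expected to be realized as an instance of that defined class; all names and the IRI are hypothetical.

  from owlready2 import Thing, FunctionalProperty, get_ontology, sync_reasoner

  onto = get_ontology("http://example.org/situation-demo.owl")  # hypothetical IRI

  with onto:
      class Context(Thing): pass
      class Location(Thing): pass
      class IndoorLocation(Location): pass

      class hasLocation(Context >> Location, FunctionalProperty): pass
      class userIsStationary(Context >> bool, FunctionalProperty): pass
      class audioIsSilent(Context >> bool, FunctionalProperty): pass

      # TBox: 'WatchingTV' is a defined class, an intersection of restrictions over
      # atomic context properties, so it can be recognized through classification.
      class WatchingTV(Context):
          equivalent_to = [Context
                           & hasLocation.some(IndoorLocation)
                           & userIsStationary.value(True)
                           & audioIsSilent.value(False)]

      # ABox: one context individual built from (hypothetical) sensor readings.
      living_room = IndoorLocation("living_room")
      now = Context("ctx_now", hasLocation=living_room,
                    userIsStationary=True, audioIsSilent=False)

  sync_reasoner()  # HermiT: classification of classes and realization of individuals

  print(onto.WatchingTV in now.is_a)        # realization: expected to be True
  print(list(onto.WatchingTV.instances()))  # retrieval: individuals of WatchingTV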

5.7.5.2 Recognition Algorithm Based on Ontological Approach to Modeling and Reasoning

It may be useful to illustrate reasoning tasks as part of a complete recognition
algorithm based on ontological approach to modeling and reasoning. Chen and
Nugent (2009) proposed an integrated ontology-based approach to activity recog-
nition, which espouses ontologies for modeling activities, objects, and sensors, and
exploits semantic reasoning based on descriptive logic. They stated that this
approach has been implemented in the realms of a real-world activity recognition
situation in the context of assisted living within Smart Home (SH) environments.
As an assumption, SH knowledge repositories KR (T, A) are composed of a set of
terminological axioms T and a set of assertional axioms A. The key feature of their
ontology-based activity recognition algorithm is, as reported by the authors, that it
supports incremental progressive activity recognition at both coarse-grained (an
assistive agent can only make a suggestion for a type of activity to an actor as the
ongoing activity of assisted daily living) and fine-grained levels (an assistive agent
needs to compare the sensor readings with the specified property values of the
instances concerning contextual information). For the detailed algorithm and related
issues, the reader is directed to Chen and Nugent (2009).

5.7.5.3 Ontological Reasoning Weaknesses

Ontological reasoning, the DL-enabled inference mechanism, is a suitable approach to
context recognition, deducing new context information that can be understandable
to the user and support the user task. That notwithstanding, it raises serious per-
formance issues. These issues are confirmed in Agostini et al. (2009) and Wang
et al. (2004) by experimental evaluations with different ontology-based context rea-
soning architectures. The lack of procedural attachments is a problem in reasoning
with OWL-DL (Lassila and Khushraj 2005). To resolve this issue, the authors adopt
a hybrid reasoning procedure, where they integrate rule-based reasoning with DL rea-
soning. Special purpose procedural attachments (Brachman and Levesque 2004)
may well involve cost reduction of inference in contrast to a generic inference
mechanism. Scalability is another issue of OWL-DL. It is raised, according to
Bettini et al. (2008), by online execution of ontological reasoning when the
ontology is populated by a large number of individuals; various optimizations based
on the use of relational database techniques have been proposed in an attempt to
improve the efficiency of reasoning with OWL-DL. Scalability is a critical feature
of context modeling. It is a desirable property of context-aware systems as it
enables them to handle growing amounts of work in a graceful manner or to be
readily enlarged. In addition, the OWL-DL being designed for monotonic inference
signifies that assertions cannot trigger the change of the truth of previous assertions,
which does not fit modeling context due to the fact that the knowledge base evolves
with time (Perttunen et al. 2009). Also, although some DL systems support
retraction of assertions, which is required by context awareness, typical description
logics systems are not optimized for query answering in domains with rapid
changes in ABox information (Ibid). Parsia et al. (2006) report ongoing investi-
gation on how to exploit the monotonicity of DLs to reduce the cost of incremental
additions and retractions.
Still, the strength of ontological reasoning is in its efficiency and straightfor-
wardness compared to other reasoning methods such as probabilistic reasoning
where inference needs to be performed based on rules that must be dynamically
learned in order to make educated guesses about the user and the relevancy of the
services to be delivered. Besides, it is considered computationally expensive to
learn each context and its rule in a probabilistic model for an infinite richness of
contexts in real-world scenarios, especially when it comes to human factors related
contexts given the associated subtlety and subjectivity. Nonetheless, probabilistic
reasoning can be efficient in some applications related to location context, espe-
cially in terms of handling vagueness and uncertainty. In a location context
application Ranganathan et al. (2004) apply a form of probabilistic predicate logic
for which probabilistic reasoning is known to be sound and complete, and they
experience some advantages as to dealing with uncertainty.

5.7.5.4 Combining Ontology-Based and Rule-Based Reasoning Techniques

Research shows that a large part of ontology-based work in relation to AmI or
pervasive computing involves rule-based reasoning only. A part of this work adds
inferences encoded in the ontology's axiomatic statements. That is to say,
rule-based reasoning can be merged with ontology-based inference. Perttunen et al.
(2009, p. 20) note that ‘most of the recent work describing usage of OWL to represent
context merely refer to using OWL inference, but the focus is on making inferences
using an external rule-based system. There is an important distinction between
inferences licensed by the ontology axioms and inferences based on arbitrary rules. In
the former, any reasoner for that ontology language produces the same results,
whereas in the latter both the ontology and the rules in the specific rule language are
needed, possibly also an identical rule engine. For this reason, much of the benefit of
using standard ontology languages is lost when inference is based on ad-hoc rules,
merely using the ontology terms as a vocabulary. Nevertheless, extending the rea-
soning beyond the inferences licensed by the ontology axioms is often necessary due
to the fact that the expressive power of the ontology language is often insufficient for
the task at hand.’ However, there is active research seeking solutions to this issue
by extending OWL with rules (Maluszynski 2005). The semantic eWallet (Gandon
and Sadeh 2003) architecture for context awareness adopted more expressive
ontology languages obtained by extending OWL-DL with rules. Overall, ontologies
extending the common ontology, together with rules, are intended to encode the
application-specific part of the system's context representation and reasoning,
whereas common ontologies are used to encode the generic representation
and reasoning.
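To make this distinction concrete, the sketch below shows inference based on external, ad hoc rules that merely use ontology terms as a vocabulary, as opposed to inferences licensed by the ontology axioms themselves. The triple store, the ctx: terms, and the single forward-chaining rule are hypothetical, hand-rolled illustrations rather than the rule language of any particular system.

  from rdflib import Graph, Namespace, RDF

  CTX = Namespace("http://example.org/context#")  # hypothetical ontology vocabulary
  g = Graph()

  # Facts expressed with the ontology's terms (normally produced by sensing and
  # ontological reasoning)
  g.add((CTX.user1, RDF.type, CTX.User))
  g.add((CTX.user1, CTX.doesTask, CTX.information_searching))
  g.add((CTX.user1, CTX.feelsEmotion, CTX.frustration))

  # An external, application-specific rule: if a user is frustrated while searching
  # for information, derive a new fact stating that simplified content is preferred.
  def rule_simplify_content(graph):
      derived = []
      for user in graph.subjects(CTX.feelsEmotion, CTX.frustration):
          if (user, CTX.doesTask, CTX.information_searching) in graph:
              derived.append((user, CTX.prefersContent, CTX.simplified))
      return derived

  # Naive forward chaining: apply the rule until no new facts are derived.
  changed = True
  while changed:
      changed = False
      for triple in rule_simplify_content(g):
          if triple not in g:
              g.add(triple)
              changed = True

  print((CTX.user1, CTX.prefersContent, CTX.simplified) in g)  # True

Because the rule lives outside the ontology, any system consuming this model would need both the ontology and this specific rule (and rule engine) to reproduce the same inferences, which is precisely the interoperability cost noted above.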

5.7.6 OWL-Based Context Models: Examples of Architectures for Context Awareness

Various OWL ontologies have been proposed for representing shared descriptions
of context, with the commonality of being grounded in top-down taxonomic
hierarchies of the domain components. The SOUPA (Chen et al. 2004c) OWL
ontology for modeling context in pervasive environments and the CONON (Zhang
et al. 2005) ontology for smart home environments are recognized to be among the
most prominent proposals and notable examples of OWL ontologies.
OWL-DL ontological models of context have been adopted in several archi-
tectures for context awareness. Context Broker Architecture (CoBra) (Chen et al.
2004a) for context awareness adopts the SOUPA (Chen et al. 2004c) ontology. The
authors note that reasoning is carried out both based on the axioms in the ontologies
as well as utilizing additional rule-based reasoning with arbitrary RDF triples, using
Jena’s rule-based OWL reasoning engine and Java Expert System Shell (Jess),
respectively (Perttunen et al. 2009). Although there is no description of the
mechanism for detecting when OWL reasoning is not enough, ‘the system is said to
be able to query the ontology reasoner to find all relevant supporting facts, and to
convert the resulting RDF graph(s) into a Jess representation. A forward-chaining
procedure is executed in Jess and any new facts are converted back to RDF and
asserted to the ontology reasoner’ (Ibid, p. 11). The SOCAM (Gu et al. 2004b)
middleware is another architecture that espouses the CONON (Zhang et al. 2005)
ontology. As proposed in Bouquet et al. (2004), SOUPA and CONON can be
incorporated with application-specific models of context by means of extensions of
the OWL language (Bettini et al. 2008). In Gu et al. (2004a), the OWL ontology
language was also used to model context. Their ontological context model includes
the upper ontology and the domain ontology as layers. They represent contexts in
first-order predicate logic, using the vocabulary defined in the ontology within the
first-order expressions, and maintain that the reasoning tasks in their system
involve RDF Schema, OWL Lite axioms, and rule-based reasoning with arbitrary
RDF triples. An ontology approach that is similar to that of Gu et al. (2004a) is used
in the Semantic Space project (Wang et al. 2004) and in Nicklas et al. (2008), which
both use OWL to represent context information and rule-based reasoning. Also,
Lassila and Khushraj (2005) use description logics (DL) to represent context. In
their ontology, classes are employed to represent contexts and individuals or
instances to represent the current contextual information about an entity. As rea-
soning operations with respect to ABox, their system uses realization and retrieval,
i.e., ‘determining the concepts instantiated by a given individual’ and ‘determining
the set of instances that instantiate a given concept’ (Baader et al. 2003), respec-
tively. Using generic DL class constructors, they moreover consider specifying new
contexts in the form of an intersection of existing context classes.
Furthermore, in ACAI (Khedr and Karmouch 2005), agent-based context-aware
infrastructure for spontaneous applications, the authors divide the ontological
model into ‘levels of expressiveness’: relational ontology and dependency ontology
(used to represent the parameters of inference rules as ontology concepts and their
properties). To approach the problem domain, Bobillo et al. (2008) divide the
model into two ontologies: domain ontology and context ontology, in addition to
defining Context-Domain Relevance (CDR) ontology. The domain ontology
involves the entities, relations, and individuals pertaining to the domain being
modeled and context ontology describes the setting where the domain ontology is
applied, while CDR ontology is used to derive relevant knowledge from the domain
ontology.

5.7.7 Key Components, Features, and Issues of Architectures of Context-Aware Systems

As described and illustrated in the previous chapter, context-aware systems are
based on a multilayered architecture, encompassing different, separate levels of
context information processing and abstraction. The design quality of such archi-
tecture is determined by how different computing components, e.g., sensors,
information processing components, and actuators, connect together and efficiently
interact with each other, as well as how context is operationalized and thus con-
ceptualized, encoded, processed, and managed in terms of supporting such features
as comprehensiveness, expressiveness, simplicity, reusability, uncertainty and
incompleteness handling, efficiency, soundness, dynamicity, interoperability,
coordination, scalability, and heterogeneity. To build architectures that support
context awareness is not an easy task, especially when it comes to data-intensive
systems, managing a large amount of information coming from a variety of sensors
in the form of observations and mapping them to corresponding properties defined
in ontologies, prior to inferring high-level context abstractions upon which adap-
tation decisions are made to deliver relevant services. Therefore, one key feature of
the architecture of context-aware systems is to handle all the required sensors and,
ideally, to have the ability to handle changing sensors in a manner that scales to the
growing requirements of the system. Winograd (2001) discusses the advantages and
disadvantages of different architectures. And the literature on AmI includes a vast
range of AmI architectures that provide the appropriate infrastructure for
context-aware (or AmI) systems (Bravo et al. 2006). A large body of current
research in context awareness focuses on the architectures of the system, how to
manage and transfer the information around the system and how different com-
ponents of the system interrelate, including sensors, ontologies, networks, mid-
dleware, and interfaces. A wide variety of architectures of context-aware systems
have been proposed that employ a service infrastructure, where sensor and context
libraries, middleware (main processing), and network protocols are stored on a
central system and accessed via a ubiquitous network. The advantage of this
approach is that it simplifies ‘the tasks of creating and maintaining context-aware
systems’ (Hong and Landay 2001). The Sulawesi framework, which is developed to
support multi-modal interaction on a wearable device (Newmann 1999; Newmann
and Clark 1999), is a common integration platform that is ‘flexible enough to
encompass a wide variety of input devices, separating the service (agent) devel-
opment from the input mechanism’. The compelling feature of this approach to the
architecture of context-aware systems is that it separates all the software/protocol
handling the different input sensors (e.g., mouse, pen-input, GPS receivers, etc.)
from the set of software agents that analyze, interpret, and process the user’s
context. In addition, it allows for high adaptation of the context-aware system in
terms of changing the input sensors to suit different situations of or availability of
services to the user.
It is important to note that the complexity of conceptual context models in terms
of the context types they incorporate has implication for the design of
context-aware systems with regards to the number and diversity of sensors that
should be included in the overall structure of the context-aware system. The more
types of context and number of (diverse) sensors, the more complex the architec-
ture. Schmidt et al.’s (1999) context model involves a wide variety of sensors that
are associated with measuring physical and social environment, physiological and
cognitive states, and behavior and task. However, the actual implementation of the
rather comprehensive definitions of context in context awareness architecture tends
to consist of a number of explicitly defined features—much simpler concepts of
context are operationalized. The Context Toolkit by Dey et al. (2001) introduces
four categories of context information: identity, location, activity, and time. As an
approach to handle the diversity and number of sensors used in context sensing, the
Context Toolkit (Salber et al. 1999) attempts to remove the sensors from the
application designer, instead of having the designer deal directly with multiple sensors, thus allowing
context-aware applications to be designed without having to worry about what
sensors are being used and evaluating the raw sensor data. In Gray and Salber
(2001), the authors discuss how the aspects of sensorized contextual information
should be taken into account when designing context-aware applications. Their
work focuses on what they label ‘the meta-attributes’ of sensorized contextual
information, as opposed to context information in general, such as sensory source,
representation forms, information quality, interpretation, reasoning, and actuation.
However, there have been attempts undertaken to incorporate various sensors into
context-aware systems in the desired manner. In ‘Building Distributed Context-
Aware Application’ (Urnes et al. 2001), the authors attempt to address the problem
of dynamically and automatically managing a multitude of location sensors, and
‘Jini Dynamic Discovery Protocol’ is utilized to interface with an arbitrary number
of location sensors and deliver their information to a position service. This protocol
is commonly employed to manage a wide variety of sensors.
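The design idea of separating sensor handling from the software agents that interpret context, which Sulawesi and the Context Toolkit exemplify in different ways, can be sketched roughly as follows. The Sensor interface, the concrete sensor classes, and the aggregator are hypothetical illustrations of the pattern and do not correspond to either system's actual API.

  from abc import ABC, abstractmethod
  from typing import Dict

  class Sensor(ABC):
      """Abstracts a concrete input device behind a uniform read() interface."""
      name: str

      @abstractmethod
      def read(self) -> Dict[str, object]:
          ...

  class GPSSensor(Sensor):
      name = "gps"
      def read(self) -> Dict[str, object]:
          return {"lat": 56.67, "lon": 12.86}      # stubbed reading

  class LightSensor(Sensor):
      name = "light"
      def read(self) -> Dict[str, object]:
          return {"lux": 30, "changing": True}     # stubbed reading

  class ContextAggregator:
      """Applications query this component and never touch sensors directly, so
      sensors can be added, removed, or swapped without changing the application."""
      def __init__(self, sensors):
          self._sensors = list(sensors)

      def snapshot(self) -> Dict[str, Dict[str, object]]:
          return {s.name: s.read() for s in self._sensors}

  aggregator = ContextAggregator([GPSSensor(), LightSensor()])
  print(aggregator.snapshot())  # {'gps': {...}, 'light': {...}}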

5.7.8 Three-Layer Architecture of Context Abstraction

Sensors and ontological context models are needed to deal with context in a
computerized way in order to be supported in AmI environments. The context
recognition process entails acquiring sensor readings and mapping them to corre-
sponding properties defined in ontologies, aggregating and fusing multiple sensor
observations using context ontologies to create a high-level context abstraction, and
performing automated processing by allowing software agents to interpret infor-
mation and reason against ontological context—and making knowledge-based
intelligent decisions as to what application actions to take in response to the user's
needs. The idea is to abstract from low-level context by creating a new model layer
that gets the sensor perceptions as input and generates inferences and system
actions. Acquired without further interpretation, low-level context information from
physical sensors can be meaningless, trivial, vulnerable to small changes, or
uncertain (Ye et al. 2007). The derivation of higher level context information from
raw sensor values is a means to alleviate the issue of the limitation of low-level
contextual cues when modeling users’ behavior interactions that risks reducing the
usefulness of context-aware applications (Bettini et al. 2010). High-level context
abstraction is a layer that is referred to in the literature as situational context (e.g.,
Gellersen et al. 2002) or situation (e.g., Dobson and Ye 2006; Dey 2001). As a
higher level concept for a state representation, situation brings meaning to the
application so it becomes useful to the user by its relevant actions. In context-aware
applications, situations are external semantic interpretations of low-level context
(Dobson and Ye 2006). They allow for a higher level specification of human
actions in the scene and the corresponding application services (Bettini et al. 2010).
These can be of affective, cognitive, social, and communicative nature, and the
behavior of the AmI system is triggered by the change of situations. Compared to
low-level contextual cues, situations are more stable and easier to define and
maintain and, thus, make design and implementation of context-aware applications
much easier because the designer can operate at a high level of abstraction rather
than on all context cues that create the situation (Ibid). Figure 5.1 (overview of the
different layers of semantic context interpretation and abstraction; source: Bettini
et al. 2008) illustrates the
basic ideas of what has been discussed up till now. The descriptions of the three
layers, from the bottom to the top of the pyramid, are: sensor-based low-level
context information is semantically interpreted by the high-level context layer;
situations abstract from low-level data and are reusable in different applications; and
relationships defined between situations can provide for a further abstraction and
limitation of complexity (Ibid). The top layer has a narrow applicability in
approaches to context-aware applications, which usually focus on defining and
recognizing contexts/situations. Nevertheless, according to Bettini et al. (2008), one
motivation behind some approaches specifying and modeling situation relationships
‘is to considerably reduce the search space for potential situations to be recognized,
once the actual situation is known and knowing possible relationships (e.g.,
knowing possible successor situations of the current situation).’
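A rough, purely illustrative sketch of these three layers is given below: raw sensor readings are mapped to named properties (low-level context), property fillers are matched against declarative situation definitions (high-level context), and relationships between situations are used to narrow the set of candidate situations to check next. The property names, the situation definitions, and the successor relation are all hypothetical.

  # Layer 1: low-level context, i.e., sensor readings mapped to named properties.
  readings = {"location_type": "indoor", "light_changing": True,
              "audio_silent": False, "user_stationary": True}

  # Layer 2: situations defined declaratively over those properties.
  SITUATIONS = {
      "WatchingTV": {"location_type": "indoor", "audio_silent": False,
                     "user_stationary": True},
      "Sleeping":   {"location_type": "indoor", "audio_silent": True,
                     "user_stationary": True},
  }

  # Layer 3: relationships between situations (possible successors), used to
  # reduce the search space once the current situation is known.
  SUCCESSORS = {"WatchingTV": ["Sleeping"], "Sleeping": ["WatchingTV"]}

  def matches(definition, ctx):
      return all(ctx.get(key) == value for key, value in definition.items())

  def recognize(ctx, candidates=None):
      names = candidates if candidates is not None else SITUATIONS
      return [name for name in names if matches(SITUATIONS[name], ctx)]

  current = recognize(readings)
  print(current)  # ['WatchingTV']
  # Later, only the successors of the current situation need to be checked.
  later = {"location_type": "indoor", "audio_silent": True, "user_stationary": True}
  print(recognize(later, SUCCESSORS[current[0]]))  # ['Sleeping']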
Soldatos et al. (2007) present a context model in which situations represent
environmental state descriptions based on entities and their properties. In the sit-
uation model, states are connected with transitions, which can be triggered by
changes in the properties of observed entities. However, including all potential
situations, their relationships, and their transitions is not always possible, particularly in
informal settings and scenarios (Bettini et al. 2010). Indeed, Soldatos et al. (2007)
note that their context model may seem not to be scalable due to the fact that the
situation states will hardly capture all possible contexts.
In all, establishing links between context information with sensor observations
through context properties defined in ontologies is a critical step in context
awareness functionality. The whole process of context awareness involving
low-level sensor data acquisition, middle-level data aggregation and fusion based
on context ontologies, and information interpretation and high-level context rea-
soning can be made more efficient and effective by employing faster and simpler
functioning methods, i.e., streamlined in an integrated modeling, representation,
interpretation, and reasoning formalism. This allows a unified systematic approach
to the development of context-aware systems by exploiting seamless amalgamation
of a wide range of data, AI techniques, and AmI technologies. Furthermore, AmI
service task can take advantage of the computational understandability and pro-
cessability—at greater automation—of semantic content enabled by ontological
approaches, especially in their hybrid forms.

5.8 Hybrid Context Models

Hybrid approaches to context models have emerged as a recent challenging endeavor
in the field of context-aware computing. This perspective on context modeling and
reasoning aims at integrating different representation formalisms (composed by
different sublanguages) and reasoning techniques in order to obtain more flexible and
general systems. The key aim is to harness the functionality of context awareness as
to generating sophisticated high-level context abstractions, e.g., human activities and
behaviors. A wide range of hybrid ontological context models and reasoning tools
have been proposed and applied in context-aware architectures and applications, and
new hybrid approaches are under investigation aiming to address some of the current
open issues and challenges relating to existing methods for modeling context
information, especially concerning the formal conceptualization of such contextual
entities as the user’s emotional and cognitive states, complex human activities, and
social processes and relations. Such context entities or their combination with other
types of context are simply too complex to be modeled by a single representation
formalism or reasoning technique. Indeed, hybrid context models are needed because
ontological models are generally unsuited to the recognition of sophisticated context
data such as physical activities and emotional behaviors. Therefore, it is necessary
and fruitful to espouse an integration approach to enhance and expand the recog-
nition spectrum of context aspects based on an ontological approach, i.e., combining
rule-based reasoning, logic programing, probabilistic and/or statistical methods with
ontology-based representation and reasoning—inferences embedded in the ontology
axiomatic statements—when appropriate. In fact, ontologies have been employed to
provide a clear semantics to data derived through different reasoning techniques
(Bettini et al. 2010). Hybrid approaches seek to amalgamate different approaches to
obtain more dynamic, comprehensive, and efficient solutions to context modeling.
However, it is widely recognized that representation and reasoning are integrally tied
together (Brachman and Levesque 2004), and therefore highly expressive representa-
tions come at the cost of computationally inefficient reasoning.
This implies that the complexity of representation formalism that might result from
an integration approach would require a very complex, yet well-suited, reasoning
mechanism. ‘Though a single expressive representation language fulfilling most of
the identified requirements could probably be defined, there are strong indications that
the resulting complexity of reasoning would make it useless in real-world scenarios.
In the area of knowledge representation, an alternative approach to the use of a single
very expressive formalism has been identified in hybrid knowledge representation
formalisms; i.e., formalisms composed by different sublanguages to represent dif-
ferent kinds of knowledge, and loosely coupled reasoning procedures. One of the
advantages of such formalisms is that the complexity of hybrid reasoning is generally
no worse than the complexity of reasoning with the single sublanguages.’ (Bettini
et al. 2010, p. 15).

5.8.1 Examples of Projects Applying Hybrid Approach to Representation and/or Reasoning

Hybrid context modeling approaches may be based on a loose or complete inte-
gration between different models and reasoning tools. In the area of large-scale
distributed context-aware systems, the COMPACT (Strimpakou et al. 2006) middleware
framework for context representation and management in AmI adopts an
integration between a traditional database-based and an ontology-based model. In this
framework, the hybrid context modeling scheme aims to integrate the advantages of
both approaches, the semantic superiority and advantages of context ontology and
the administrative power of a relational, object-oriented, location-based context
model for distributing and managing context data, respectively. This is to achieve
maximum scalability and efficient context interpretation in large-scale distributed
context-aware systems. This hybrid context modeling approach is considered
adequate for addressing the heavy requirements of context awareness, by reducing
the inherent complexity of sharing, collaborating and synchronizing contextual
knowledge in an open and dynamic pervasive environment. The context knowledge
ontology called COMANTO describes general context types and interrelationships
that are not domain-application or situation-specific, and is integrated with the
location-based COMPACT context model which focuses on addressing context
management challenges in distributed pervasive environments. In addition, this
combined modeling approach aims to enable efficient management of context
information, support context data management mechanisms in distributed pervasive
environments, and allow for widely applicable context formalism. Overall, the
rationale for adopting hybrid context model in this framework is that the inability of
ontologies ‘to capture and process constantly changing information in a scalable
manner’, while ‘traditional relational context models address many traditional data
management aspects using a database-style management of context information’.
Enhanced context ontologies are capable of providing sophisticated reasoning
mechanisms, but cannot address critical data management challenges, which can be
addressed by classical context models that indeed exhibit prominent advantages in
areas where ontologies seem to fail, but do not allow feasible context taxonomy and
formalism adequate for fuelling context reasoning mechanisms (Strimpakou et al.
2006). Protégé (Gennari et al. 2003) ontology editor and knowledge-base
framework has been used to implement the COMANTO ontology in OWL.
The InstanceStore system proposed by Horrocks et al. (2004) is also based on the
idea of improving the efficiency of reasoning with OWL-DL based on the use of
relational database techniques.
Bettini et al. (2008) survey two hybrid approaches to context modeling: one
approach is a loosely coupled markup-based/ontological model, the CARE frame-
work for context awareness proposed by Agostini et al. (2009); and the other
approach, proposed by Henricksen et al. (2004), combines ontological approach
with fact-based approach proposed by the Context Modeling Language (CML). The
CARE framework espouses ‘a context modeling approach that is based on a loose
integration between a markup model and an ontological model. The integration
between these models is realized through the representation of context data by
means of CC/PP profiles which contain a reference to OWL-DL classes and rela-
tions. In order to preserve efficiency, ontological reasoning is mainly performed in
advance with respect to the service provision. Whenever relevant new context data
is acquired, ontological reasoning is started, and derived information is used, if still
valid, at the time of service provisioning together with efficient rule evaluation.
Complex context data (e.g., the user’s current activity) derived through ontological
reasoning can be used in rule preconditions in order to derive new context data such
as user preferences.’ (Bettini et al. 2008, p. 15). As to the hybrid fact-based/
ontological model, ‘the aim is to combine the particular advantages of CML models
(especially the handling of ambiguous and imperfect context information) with
interoperability support and various types of reasoning provided by ontological
models. The hybrid approach is based on a mapping from CML modeling con-
structs to OWL-DL classes and relationships. It is worth noting that, because of
some expressivity limitations of OWL-DL, a complete mapping between CML and
OWL-DL cannot be obtained. With respect to interoperability issues, the advan-
tages gained by an ontological representation of the context model are clearly
recognizable. However, with respect to the derivation of new context data, expe-
riences with the proposed hybrid model showed that ontological reasoning with
OWL-DL and its SWRL extension did not bring any advantage with respect to
reasoning with the CML fact-based model. For this reason, ontological reasoning is
performed only for automatically checking the consistency of the context model,
and for semantic mapping of different context models.’ (Ibid). Furthermore, with
respect to fact-based models, the CML, which provides a graphical notation
designed to support software engineering in the analysis and formal specification of
the context requirements of context-aware applications, offers various advantages,
including: capturing ‘the heterogeneity of the context information sources, histories
(timeliness) of context information’; providing ‘an easy mapping from real-world
concepts into modeling constructs’; providing ‘a good balance between expressive
power and efficient reasoning procedures for evaluation of simple assertions about
context and for reasoning about high-level context abstractions…expressed as a
form of predicate logic’, which is ‘well suited for expressing dynamic context
abstractions’. However, CML is less expressive than OWL-DL and ‘a possible
shortcoming of CML with respect to more expressive languages is the lack of
support for hierarchical context descriptions. Moreover, even if CML supports
queries over uncertain information through a three-valued logic, a deeper support for modeling and reasoning about uncertainty is desirable'.

Fig. 5.2 Context reasoning architecture. Source Lassila and Khushraj (2005)
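As a small illustration of the three-valued evaluation mentioned for CML, the sketch below evaluates simple context assertions as true, false, or unknown, so that queries over uncertain information degrade gracefully instead of failing; the facts, attribute names, and Kleene-style conjunction are illustrative only and do not reproduce the CML implementation.

from enum import Enum

class TV(Enum):
    TRUE = "true"
    FALSE = "false"
    UNKNOWN = "unknown"   # assertion cannot be decided from the current context

# Known context facts; None marks a value the sensors could not determine.
context = {
    ("alice", "located_in"): "office",
    ("alice", "activity"): None,       # activity sensing was inconclusive
}

def holds(entity: str, attribute: str, expected: str) -> TV:
    """Three-valued evaluation of a single context assertion."""
    if (entity, attribute) not in context:
        return TV.UNKNOWN
    value = context[(entity, attribute)]
    if value is None:
        return TV.UNKNOWN
    return TV.TRUE if value == expected else TV.FALSE

def conjunction(*values: TV) -> TV:
    """Kleene-style AND: false dominates, otherwise unknown dominates."""
    if TV.FALSE in values:
        return TV.FALSE
    if TV.UNKNOWN in values:
        return TV.UNKNOWN
    return TV.TRUE

print(holds("alice", "located_in", "office"))    # TV.TRUE
print(conjunction(holds("alice", "located_in", "office"),
                  holds("alice", "activity", "in_meeting")))  # TV.UNKNOWN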
In addition, Lassila and Khushraj (2005) adopt a hybrid reasoning process,
merging rule-based reasoning and DL reasoning in their architecture by adding the
former on top of the latter, as in the architecture illustrated in Fig. 5.2. This is
adopted to solve the problem of the lack of procedural attachments associated with
DL reasoning. The authors present an algorithm for combining a rule-based reasoner and a DL reasoner.
Khushraj et al. (2004) extend tuple spaces combining ontology-based reasoning
and query processing. Integration of tuple spaces and ontology-based representation
is done through enforcing an object field in every tuple to contain a DAML+OIL
individual (Horrocks 2002); checking the consistency of DAML+OIL individuals
from newly written tuples is done by a reasoner before they are committed to the
knowledge base; and querying is done using a query template which basically controls what queries are sent to the reasoner. The queries are performed and the query results combined by a special matcher agent according to a tailored algorithm. As a predecessor of OWL, DAML+OIL is the basis of the
context model of GAIA (Ranganathan et al. 2004), a middleware for active spaces, where reasoning for deriving new context data is performed by means of
rule-based inferencing and statistical learning. Riboni and Bettini (2009) also combined ontological and statistical reasoning in a context-aware activity recognition framework to recognize physical activities. Further to hybrid approaches, Agostini et al. (2005) suggest merging rule-based reasoning with DL reasoning. In their approach, an OWL-DL ontology is used to represent context, with user profiles represented separately using CC/PP. As a key feature of their reasoning approach, the outcomes of executing the rule-based system are not stored to the ABox; the data thus flows from the ABox to the rule-based system, following a unidirectional pattern. An ontology and a probabilistic model have been used in Yamada et al. (2007) to infer human activity from surrounding things.
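The unidirectional pattern just described can be pictured schematically as follows; the fact and rule vocabulary is hypothetical and the sketch is not the authors' implementation, but it shows the essential point that the rule layer only reads from the ABox while its outputs are kept outside it.

# Facts that a DL reasoner has already classified into the ABox
# (hypothetical vocabulary; in practice these would be OWL-DL individuals).
abox = {
    ("alice", "rdf:type", "Researcher"),
    ("alice", "currentActivity", "GivingPresentation"),
}

# Production rules: preconditions over ABox facts -> derived context data.
rules = [
    (lambda facts: ("alice", "currentActivity", "GivingPresentation") in facts,
     ("alice", "preference", "phone-silent")),
    (lambda facts: ("alice", "rdf:type", "Student") in facts,
     ("alice", "preference", "campus-news")),
]

# Derived facts are written to a separate store, never back into the ABox,
# so data flows from the ABox to the rule layer only (unidirectional).
derived = set()
for precondition, conclusion in rules:
    if precondition(abox):
        derived.add(conclusion)

print(derived)  # {('alice', 'preference', 'phone-silent')}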

5.8.2 Towards a Hierarchical Hybrid Model

Hierarchical hybrid models are assumed to bring clear advantages in terms of the set
of the requirements defined for a generic context model used by context-aware
applications. For example, they can provide solutions to overcome the weaknesses
associated with the expressive representation and reasoning in description logic, as
discussed above. Bettini et al. (2008) contend that a hierarchical hybrid context model is likely to satisfactorily address a larger number of the identified requirements if hybrid approaches can be further extended to design such a model.
They propose a model that is intended to provide a more comprehensive solution, in
terms of expressiveness and integration of different forms of reasoning. In the
proposed model, the representation formalism used to represent data retrieved from
a module executing some sensor data fusion technique should, in order to support
the scalability requirements of AmI services, enable the execution of efficient
reasoning techniques to infer high-level context data on the basis of raw ones by,
for example, executing rule-based reasoning in a restricted logic programing lan-
guage. As suggested by the authors, a more expressive, ontology-based context model is desirable on top of this representation formalism, since the latter does not by itself support a formal definition of the semantics of context descriptions. As illustrated in Fig. 5.3, the corresponding framework is composed of
the following layers:

Fig. 5.3 Multilayer framework. Source Adapted from Bettini et al. (2010)

• Layer 1: This layer, sensor data fusion, can be organized as a peer-to-peer network of software entities and is dedicated to acquiring, processing (using techniques for sensor data fusion and aggregation), and propagating raw context data in the AmI
space in order to support cooperation and adaptation of services (see Mamei and
Zambonelli 2004). The arrows depict the flow of context data. At this layer,
signal processing algorithms are used to process raw context data from sensor
signals.
• Layer 2: This layer involves shallow context data representation, integration
with external sources, and efficient context reasoning, and particularly includes
module for efficient markup-based, RDF-based, or DB-based representation and
management of context data; modules for efficient shallow reasoning (logics-
and/or statistics-based); and data integration techniques for acquiring data from
external sources and for conflict resolution. The highly dynamic and heterogeneous outputs of layer 1 put hard demands on the middle layer.
• Layer 3: This layer applies ontological representation and reasoning in a realization/abstraction process. It aims to specify the semantics of context terms, which is critical for sharing and integration; to check the consistency of the set of concepts and relationships describing a context scenario; and to provide an automatic procedure to classify sets of context data (particular sets of instances of basic context data and their relationships) as more abstract context abstractions.
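A minimal sketch of how data might flow through the three layers just listed is given below; the median fusion, the single rule, and the lookup that stands in for ontological classification are deliberately toy-sized placeholders for the techniques the framework leaves open.

import statistics
from typing import Dict, List, Optional

# Layer 1: sensor data fusion -- aggregate raw readings into a raw context value.
def fuse_temperature(readings: List[float]) -> float:
    """Median fusion smooths out single-sensor outliers."""
    return statistics.median(readings)

# Layer 2: shallow, efficient representation and rule-based reasoning.
def shallow_reasoning(raw: Dict) -> Dict:
    facts = dict(raw)
    if raw["temperature_c"] > 26 and raw["occupancy"] > 0:
        facts["room_state"] = "occupied_and_warm"
    return facts

# Layer 3: ontological abstraction -- map shallow facts to a high-level context
# concept (a plain lookup standing in here for DL classification).
ABSTRACTIONS = {"occupied_and_warm": "UncomfortableMeetingSituation"}

def abstract(facts: Dict) -> Optional[str]:
    return ABSTRACTIONS.get(facts.get("room_state"))

raw = {"temperature_c": fuse_temperature([27.1, 26.8, 31.0]), "occupancy": 4}
print(abstract(shallow_reasoning(raw)))  # UncomfortableMeetingSituation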

5.8.3 Limitations of Hybrid Context Models

Although they have proven to be advantageous in addressing many of the requirements set for a generic approach to context representation and reasoning, hybrid context models remain inadequate to address such complex technical issues as dynamicity,
heterogeneity, scalability, fuzziness, and uncertainty pertaining to context knowl-
edge representation and reasoning. There are still open issues associated with the
integration of diverse reasoning techniques, e.g., ‘how to reconcile probabilistic
reasoning with languages not supporting uncertainty’, and how to integrate the
conceptual foundations of different representation formalisms, e.g., ‘the open-world
semantics of ontologies with the closed-world semantics of DB-based models and
logic programing’ (Bettini et al. 2010). In this line of thinking, Gruber (2009) point
out that ontologies constitute models of data at the ‘semantic’ level while database
schema constitute models of data at the ‘physical’ level because ontology languages
are closer to first-order logic than those used to model databases in terms of
expressive power, and by being independent from lower level data models, ontolo-
gies are used to specify interfaces to independent knowledge-based services, inte-
grate heterogeneous databases, and enable interoperability among disparate systems.
Novel approaches, methods, and formats are needed to tackle the challenges
pertaining to the modeling of human interaction in new computing environments that are posed by real-world situations and complex human functioning—e.g., emotion, cognition, and social processes. This has strong implications for the performance of
context-aware applications as they need to be timely in acting as well as for the
reliability of such systems as they need to take action proactively. Hybrid context
model approaches are far from panaceas to the problems of context representation
and reasoning for they also suffer from a number of limitations, which usually
surface as applications become of larger scale, more complex, and generic. Current
hybrid context models coming in many forms are designed based on different
perspectives depending on the involved researchers. And the outcome is thus based
on how the effectiveness and suitability of a given hybrid context model is per-
ceived as to addressing issues of modeling and reasoning techniques posed by
existing systems in a particular application domain. So different hybrid models
could easily be criticized by the researchers who work with an approach different
from that for which they have been created. However, all the researchers involved in the development of hybrid context models agree that such models solve some
problems and create new ones or fail to provide adequate solutions. So why would
these different researchers define their hybrid approaches so differently? Perhaps it
is because they provide arguments to support their definition of hybrid approaches
and its associated purposes. In a nutshell, current hybrid approaches are scientifi-
cally subjective and driven by what is technologically feasible. Indeed, although a
multitude of representation and reasoning modules can be loosely or completely
combined at the semantic and inference levels of contextual models, such combi-
nations fail to enable context-aware applications to address the issue of the rele-
vancy and accuracy of service provision and to handle the multiplicity of services
over large-scale distributed systems. There are still heterogeneity issues relating to context-aware services, although context models have been proposed to bridge the
semantic gaps between services. The increasing number of users and context-aware
services—which depend on various multimodal sensing tools and processing
algorithms as well as the integration of an array of representation sublanguages and
reasoning techniques—makes it difficult to overcome the problem of heterogeneity.
It is becoming evident that novel approaches—beyond hybrid models—are needed
to deal with the sensor data coming from multiple sources and associated with
uncertainty and the representation of and reasoning on diverse, interrelated con-
textual entities with different levels of complexity in the context of ever-growing
and data-intensive AmI applications. Given that solutions for context-aware com-
puting seem to be always technology-driven, the most innovatively suitable solu-
tions—highly needed—are those that can address open technical issues and challenges a little beyond the constraints of existing technologies or the evolutionary
changes of technology—e.g., nanotechnology, nano-engineering. The principles
upon which many hybrid approaches are built are only extrapolated from engi-
neering and design sciences, which seem to fail to address the complexity inherent
in, and to handle unpredictable behavioral patterns of, AmI technologies that
continue to challenge the foundations of computer science. The amount of context
data will multiply manifold and grow more sophisticated as the number of people living in AmI environments increases over time, adding to the multitude of context providers and sources involved in large-scale distributed systems, coupled with the real-time retrieval, dissemination, and querying of context data and related
service delivery by multiple, disparate systems. Attaining seamless mobility across,
and overcoming the heterogeneity of, devices, models, applications, and services in
AmI environments seems to be a daunting challenge. And realizing the vision of AmI, the full potential of AmI technology, may be unattainable when looking at the reality of the evolutionary paths of, and the constraints of, existing technology,
adding to the weaknesses inherent in artificial systems in general.

5.9 Modeling Emotional and Cognitive Contexts or States

The process of capturing, modeling, and representing emotional and cognitive states
and behaviors is one of the most difficult computational tasks in the area of HCI.
Moreover, while ontologies allow formal, explicit specification of some aspects of
human emotion and cognition as shared conceptualization, they have not matured
enough to enable the modeling of interaction between emotion and cognition as two
distinct knowledge domains, whether in the area of context-aware computing, affective computing, or computational intelligence. Current ontologies consist of the
concepts and their relationships pertaining to emotional states (or emotion types),
cognitive states or activities, or communicative intents. Emotion ontologies have
thus been used in diverse HCI application domains within AmI and AI, including
context-aware computing (e.g., emotional context-aware systems, social intelligent
systems), affective computing (e.g., emotion-aware systems, emotionally intelligent
systems), and computational intelligence (e.g., dialog acts, conversational systems).
Further, emotional and cognitive elements of context significantly affect interaction
in everyday life; therefore, they must influence and shape the interaction of users
with computational artifacts and environments. In human interaction, emotional and
cognitive states, whether as contextual elements or communicative messages, can be
conveyed through verbal and nonverbal communication behavior as a reliable
source. Human communication is highly complex, manifold, subtle, fluid, and
dynamic, especially in relation to the interpretation and evaluation of behaviors
conveying contextual information. Likewise, the interpretation and processing of
emotional and cognitive states has proven to be a daunting challenge to emulate as
part of human mental information-manipulation processes and what this entails in
terms of internal representations and structures of knowledge. This carries over its
effects on making appropriate decisions and thus undertaking relevant actions, e.g.,
delivering adaptive and responsive services. This implies that capturing, repre-
senting, and processing emotional and cognitive elements of context requires highly
sophisticated computational techniques. It is not an easy task to deal with human-factors-related context in a computerized way, and novel context models are needed more than ever. Context awareness technology uses verbal and nonverbal cues to
detect people’s emotional and cognitive states through reading multimodal sources
using dedicated multiple, diverse sensors and related multi-sensor data fusion

techniques, but the interpretation and processing of the multimodal context data
collected from sensorized AmI environments should be supported by powerful
modeling, representation, and reasoning techniques to offset the imperfection and
inadequacy of sensor data, so that context-aware applications can adapt their
behavior in response to emotional or/and cognitive states of the user. A great variety
of advanced theoretical models of emotion and cognition and myriad new findings
from very recent studies are available, but most work in developing context-aware and affective systems seems to be technology-driven, by what is technically feasible and computationally attainable. Also, a large body of work on emotional and cognitive context-aware and affective systems tends to operationalize concepts of related states that are rather simple compared to what is understood as
psychological states in cognitive psychology, neurocognitive science, and the phi-
losophy of mind as academic disciplines specialized on the subject matter.
However, the semantic expressiveness and reasoning power of ontologies makes the ontological approach thus far a suitable solution to emotional and cognitive context modeling used by context-aware applications. Indeed, in relation to emotions,
ontology allows flexible description of emotions at different levels of conceptual-
ization, and it is straightforward to develop conceptual ontological models that
enable such a logical division, as exemplified below. Still, modeling, representation,
and processing of emotional context, in particular, is regarded as one of the most
challenging tasks in the development of context-aware applications, as the specifics
of such context in real life are too subjective, subtle, dynamic, and fluid. And
cognitive states are too tacit, intricate, dynamic, and difficult to identify—even for
the user to externalize and translate into a form intelligible to the system—to be
modeled. In fact, it is more intricate to computationally deal with cognitive context
than emotional context as the former is in most cases of an internal nature, whereas
the latter is often dealt with externally via affect display (see Chap. 8). It is difficult
to recognize the cognitive context of the user (see Kim et al. 2007).
Overall, emotional and cognitive context systems are based on a layered
architecture whose design quality is determined by the relevance of the multiplicity
and diversity of sensors embedded in the system and spread in the environment as
well as the level of the semantic expressiveness and the automation of intelligent
processing (interpretation and reasoning) pertaining to the recognition of emotional
and cognitive states. To build architectures that support emotional and cognitive
context awareness is far too complex compared to other types of context, as they
involve dynamic acquisition techniques, multi-sensor data fusion approaches,
specialized recognition algorithms, and complex mapping techniques, e.g., mapping
patterns of facial expressions, gestures, and voice every second as sensor readings
to corresponding properties defined in respective context ontologies to create
high-level context abstractions—emotional or cognitive states. Given the com-
plexity inherent in human emotion and cognition, representing concepts of related
contexts and their relationships and reasoning against related information should
integrate various approaches to modeling and reasoning in order to enhance the
quality of the context inference process, i.e., the transformation of atomic contexts into higher-level contexts.
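By way of illustration, the sketch below shows one very simple way such a layered pipeline could fuse multimodal cues into a higher-level emotional state; the cue names, reliability weights, and threshold are invented for the example and fall far short of what real affect recognition requires.

from typing import Dict

# Scores produced by per-modality recognizers (facial, vocal, gestural),
# each in [0, 1]; in a real system these come from sensors and classifiers.
cue_scores: Dict[str, Dict[str, float]] = {
    "face":    {"joy": 0.7, "frustration": 0.1},
    "voice":   {"joy": 0.4, "frustration": 0.3},
    "gesture": {"joy": 0.2, "frustration": 0.1},
}

# Illustrative per-modality reliability weights (not empirically grounded).
weights = {"face": 0.5, "voice": 0.3, "gesture": 0.2}

def fuse(cues: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Weighted late fusion of per-modality emotion scores."""
    fused: Dict[str, float] = {}
    for modality, scores in cues.items():
        for emotion, score in scores.items():
            fused[emotion] = fused.get(emotion, 0.0) + weights[modality] * score
    return fused

def high_level_state(fused: Dict[str, float], threshold: float = 0.4) -> str:
    """Map fused scores to a higher-level context abstraction."""
    emotion, score = max(fused.items(), key=lambda kv: kv[1])
    return emotion if score >= threshold else "neutral"

print(high_level_state(fuse(cue_scores)))  # joy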

5.10 Examples of Ontology Frameworks: Context-Aware and Affective Computing

Most proposed ontology-based frameworks used for developing emotional and cognitive context-aware and emotion-aware systems are typically based on
knowledge representation and reasoning techniques from AI and borrow theoretical
models from cognitive psychology and cognitive science. There are, however, some
technical differences and details at the application level of theoretical models of
emotion, in particular, depending on the application domain, e.g., emotionally
intelligent systems, emotion-aware systems, emotional context-aware systems,
context-aware affective systems, and conversational systems. Also, these systems
vary in the extent to which they focus on context in their functioning, given that they are intended to provide different types of emotional services to the user.
Regardless of the domain and type of application, it is important to critically review
operationalizations of emotional and cognitive states in computational artifacts and
their impact on how such states are conceptualized and represented. This has
implications for the efficiency of the inference of high-level context abstraction as
well as the appropriateness of the application actions—in other words, the rele-
vancy of the adaptive and responsive services that are intended to meet the user’s
emotional and cognitive needs. A few selected application examples of recent
projects applying an ontological approach to emotional and cognitive context
modeling are presented and described.

5.10.1 AmE Framework: A Model for Emotion-Aware AmI

In an attempt to facilitate the development of applications that take their user's emotions into account and participate in the emotion interaction, Zhou et al. (2007) propose the AmE framework: a model for emotion-aware AmI, as illustrated in Fig. 5.4.
This (preliminary) framework integrates AmI, affective computing, emotion ontol-
ogy, service ontology, service-oriented computing, and emotion-aware services. It
espouses an ontological approach to emotion modeling and emotion-aware service
modeling. Emotion modeling is, in this framework, responsible for two components:
emotion detection and emotion motivation acquisition. The former component
identifies positive and negative emotions that are represented by emotion actions
through facial expressions, hand gestures, body movements, and speech, whereas the
latter recognizes the intention of the emotion. Emotion-aware service modeling is
responsible for reacting to the identified emotion motivations by creating services,
delivering services (supplying appropriate emotion services to the users), and
managing the delivery of emotional services. The service creation involves emotion-
aware service composition (assembling existing services) and emotion-aware service
development (creating new services in response to identified emotion motivation).

Fig. 5.4 The ambient intelligence framework. Source Zhou et al. (2007)

According to the authors, the services can help users to carry out their everyday
activities, generate emotional responses that positively impact their emotions, and
train them to mediate some aspects of their emotional intelligence associated with the
perception, assessment, and management of their emotions and those of others.
Building on Goleman’s (1995) mixed models of emotional intelligence, the author
mentions that self-awareness, self-management, social skill and social awareness as
emotion capabilities are fulfilled in an emotion experience. In this framework, it is
assumed that the emotional state generated by the user is the context according to
which the responsive services are delivered; this context consists of cultural background, personal knowledge, present human communication, legacy emotion positions, and so on, and also refers to the emotion situation that produces emotions.
However, the authors give no detail of how they address the issues of the
non-universality of emotions—i.e., emotions are interpreted differently in various
cultures. It is important to underscore that a framework that is based on common
emotion properties could work in one cultural setting but might not work in another.
Rather, emotional context-aware applications should be culture specific and thus
designed in a way to be tailored to cultural variations of users if they are to be
widely accepted. Also, this framework does not provide information on whether the
expressed emotion (negative or positive) is appropriate for the context or situation it
is expressed in, a criterion that is important in the case of further implementation of
Ability EIF. This is a useful input to consider in the final model for emotion-aware
AmI. The premise is that a system cannot help users to improve their emotional
intelligence abilities if it is not emotionally intelligent itself. Indeed, part of the authors' future work is to investigate the feasibility and applicability of mediating
human emotional intelligence by providing ambient services. Further, contextual
appropriateness of emotions, whether displayed through vocal or gestural means, is
a key element in understanding emotions, which is in turn a determining factor for
providing relevant responsive services. There is much to study to be able to

implement the AmE framework and develop application prototypes. Indeed, the
authors point out that there is a need for further research with regard to ‘emotion
structure in English conversation for detecting emotions and identifying emotion
motivations’ as well as ‘emotion services modeling for a pervasive emotion-aware
service provision responding to emotion motivations’. Ontological modeling of
emotional context should take into account the complexity and the context dependence of emotions, rather than relying on simple emotion recognition (valence classification), in order to create effective affective context-aware applications. Moreover,
other sources of emotional cues (e.g., psychophysiological responses) may need to
be incorporated in the development of emotional context-aware applications.
Authors have mainly described models for the communication of emotions via
speech, face, and some contextual information (Obrenovic et al. 2005).

5.10.2 Domain Ontology of Context-Aware Emotions

In Cearreta et al. (2007), the authors propose a generic approach to modeling context-aware emotions, a domain ontology of context-aware emotions, taking different theoretical models of emotions into account. This ontology is defined based on references found in the literature, and introduces and describes important concepts
and mechanisms used in the affective computing domain to create models of
concrete emotions. The authors state that this application ontology contains all the
necessary concepts to model specific applications, e.g., affective recognizers in speech, and enables description of emotions at different levels of abstraction while
serving as a guide for flexible design of multimodal affective devices or
context-aware applications, independently of the starting model and the final way of
implementation. This domain ontology of context-aware emotion collects information obtained from different emotion channels (e.g., facial expressions, postural expressions, speech paralinguistic parameters, psychophysiological responses), supporting the development of multimodal affective applications. The authors
maintain that this generic ontology can be useful for the description of emotions
based on the various systems of emotion expression and detection which are
components that constitute user context. As illustrated in Fig. 5.5, concepts in the
domain ontology are grouped into seven global modules, representing different
aspects related to emotion modeling.
Emotion module: Describes the emotion of the user within a context, which can be, according to Lang (1979), composed of one or more kinds of emotional cues, such as verbal, facial, gestural, speech paralinguistic, and psychophysiological. The emotion of the user is influenced by the context he/she is in and may change over time for different reasons. Moreover, different models and theories (e.g.,
categorical, dimensional, appraisal) can be used to represent emotions in different
ways.

Fig. 5.5 Relationship among modules in the domain ontology of emotional concepts. Source
Cearreta et al. (2007)

Theory module: Describes the main types of theories, such as dimensional (Lang
1979), categorical (Ekman 1984), and appraisal (Scherer 1999). Under each type of theory, the emotion can be represented in a different way.
Emotional cue module: Depicts external emotional representations in terms of
different media properties. One emotional cue may be taken into account more than another depending on the context the user is in. To take all emotional cues and the complete emotion into account, each type of emotional cue corresponds to one of the three systems proposed by Lang (1979): verbal information, behavioral (conductal) information, and psychophysiological responses.
User context module: Defines the user context which consists of different context
elements or entities: personal, social, task, environment, and spatiotemporal (Göker
and Myrhaug 2002). This is to take into account the complexity of emotions' dependence on, and the influence exerted by, context at a given moment.
Context element module: Describes the context representation in terms of dif-
ferent context elements. Various factors can have an effect on emotion expression
and identification, e.g., verbal cues relate to the user's language. As an important contextual aspect when it comes to emotion detection, different emotional cues can be taken into account according to the user context, e.g., in the case of darkness, the speech emotional cue will be more relevant and the facial emotional cue may not be so. Indeed, not all emotional cues are available together, as context affects which cues are relevant.
Media property module: Describes basic media properties for emotional cues,
which are used for description of emotional cues. These media properties are
context-aware, e.g., voice intensity value is different depending on the gender in
terms of the personal context element. A media property can be basic, such as voice intensity, or derived, such as voice intensity variations.
Element property module: Describes properties for context elements, which are
used for description of context elements. In the same manner as media properties, a context element property can be basic (e.g., temperature) or derived (e.g., mean temperature). An emotional context can be composed of voice intensity, temperature, facial expression, and speech paralinguistic parameters, and through composition with the other context elements the user context is completed.
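To give a concrete feel for how these seven modules relate to one another, the sketch below mirrors them as plain Python data structures; the class names and fields are simplified approximations chosen for the example, not the ontology's actual OWL definitions.

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class MediaProperty:            # Media property module
    name: str                   # e.g. "voice_intensity"
    value: float
    derived_from: Optional[str] = None   # derived properties reference a basic one

@dataclass
class EmotionalCue:             # Emotional cue module (verbal, facial, speech, ...)
    channel: str
    properties: List[MediaProperty] = field(default_factory=list)

@dataclass
class ContextElement:           # Context element module (personal, social, task, ...)
    kind: str
    properties: dict = field(default_factory=dict)   # Element property module

@dataclass
class UserContext:              # User context module
    elements: List[ContextElement] = field(default_factory=list)

@dataclass
class Emotion:                  # Emotion module, described under a chosen theory
    theory: str                 # Theory module: "categorical", "dimensional", ...
    label: str                  # e.g. an Ekman category such as "anger"
    cues: List[EmotionalCue] = field(default_factory=list)
    context: Optional[UserContext] = None

# In a dark environment the speech cue is weighted over the facial cue.
ctx = UserContext([ContextElement("environment", {"illumination": "dark"})])
cue = EmotionalCue("speech_paralinguistic",
                   [MediaProperty("voice_intensity", 0.82)])
anger = Emotion(theory="categorical", label="anger", cues=[cue], context=ctx)
print(anger.label, [c.channel for c in anger.cues])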

The compelling feature of this ontology is that it can be used by developers to construct tools for a generic description of emotions that can be personalized to each user, language, and culture; to establish a formal description of emotions in an easily understandable way; and to enable more abstract description of emotions in
various ways. The authors suggested a practical example using the proposed domain ontology: an application ontology for describing the speech emotional cue, depending on the user's context-aware model of emotion, and represented it using Ekman's (1984) taxonomy—a categorical model of emotion. This application ontology is useful for developing speech recognition systems, and can be used for various purposes, e.g., as a description of or metadata about some emotions, or as part of a user profile. As future work, the authors mentioned that they were working on
applications that could be parameterized for particular emotional cues, using models
of emotions and user context. They pointed out that generic applications could be created by merging the emotional cue module and the context module, parameterized with the developed models. In addition, they planned to take into account more models related to emotion and user context, beyond the emotions they studied according to categorical, dimensional, and appraisal models, via modalities other than speech and more contextual information.
Obrenovic et al. (2005) also propose an ontology solution for description of
emotional cues. McIntyre and Göcke (2007) propose a novel approach to affective
sensing. In their approach, they use a generic model of affective communication and
a set of ontologies to analyze concepts and to enhance the recognition process,
taking the context and different emotion classification methods into account.

5.10.3 Cognitive Context-Aware System: A Hybrid Approach to Context Modeling

Research on cognitive aspects of context is still in its infancy. While a large part of the growing body of research on context awareness technology investigates approaches to context information modeling and reasoning, work on cognitive context modeling appears to be less active. Based on the literature on context awareness, very few methods for capturing, representing, and inferring cognitive context have been developed and applied. And the few practical attempts to implement cognitive context are far from real-world implementation, so concrete applications using an algorithmic approach have not been realized. Noticeably, frameworks for developing cognitive context-aware applications are far fewer than those for developing emotional ones. In a cognitive context system
proposed by Kim et al. (2007), ontology is proposed to implement components of a
prototype deploying inference algorithms, and a probabilistic method is used to
model cognitive context. Therefore, this approach may be classified as belonging to the hybrid category.

5.10.3.1 Inference and Service Recommendation Algorithms and the Prototype Framework

In a study carried out by Kim et al. (2007), the authors propose the context
inference and service recommendation algorithms for the Web-based information
system (IS) domain. The context inference algorithm aims to recognize the user’s
intention as a cognitive context within the Web-based IS, while the service rec-
ommendation algorithm delivers user-adaptive or personalized services based on
the similarity measurement between the user preferences and the deliver-enabled
services. In addition, the authors demonstrate cognitive context awareness on the
Web-based IS through implementing the prototype deploying the two algorithms.
The aim of the proposed system deploying the context inference and service rec-
ommendation algorithm is to help the IS user to work with an information system
conveniently and enable an existing IS to deliver AmI services. However, to apply
the context inference algorithm—that is, to recognize a user’s intention, which is
regarded as a cognitive context, the sources that the user uses on the Web-based IS
should be discerned and then the representatives of each source category should be
extracted and classified by means of a text categorization technique. For example, a
user may browse or refer to various sources, such as Web pages, PDF documents,
and MS Word documents and these sources used by the user (while using the
Web-based IS) reflect his/her intention as a cognitive context, which can be inferred
by considering the combination of each source category synthetically. The obtained
categories, representative of sources resulting from the text categorization process,
are matched with the predefined categories in the IS context-category memory,
which contains various IS contexts (e.g., business trip request for conference
attendance, business trip request for international academic exchange, book pur-
chasing request, etc.). The predefined categories are assigned to each of these IS
contexts. The IS context, which can be extracted through the IS structure using the
content analysis, is the user’s intention or cognitive context that should be inferred.
It is determined after the process of comparing and scoring the categories has
completed. The perception—recognition and interpretation—of a user’s cognitive
context enables the system to recommend a personalized service to the user, using
service recommendation algorithm that selects user-adaptive services from the list
considering the user preferences recognized normally in advance. The relevant
service is extracted from a deliver-enabled service list, which is obtained by using
the inferred context and the user’s input data. Given the controversy surrounding
the invisibility notion driving context awareness, it would be more appropriate in
terms of how the system should behave to present the context-dependent infor-
mation or service and let the user decide what to do with it. Context-sensitive
information or service is always useful to the user, but the recommendation in
context-aware computing is that by priming information or service with contextual
features or providing information or service that is right for the context, the per-
formance in terms of speed of response as to finding answers in the information
should still increase. Figure 5.6 illustrates the context inference and service rec-
ommendation framework.

Fig. 5.6 Context inference and service recommendation and procedure. Source Kim et al. (2007)
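A rough sketch of how the two algorithms interlock is given below; the category names, the overlap-based scoring, and the Jaccard-style similarity are illustrative stand-ins rather than the authors' actual formulas.

from typing import Dict, List, Set

# Predefined IS contexts and the categories assigned to them
# (the 'IS context-category memory'); names are invented for the example.
IS_CONTEXTS: Dict[str, Set[str]] = {
    "business_trip_conference": {"travel", "conference", "expense"},
    "book_purchasing_request": {"purchase", "library", "expense"},
}

def infer_context(observed_categories: Set[str]) -> str:
    """Score each IS context by its overlap with the observed source categories."""
    scores = {ctx: len(cats & observed_categories)
              for ctx, cats in IS_CONTEXTS.items()}
    return max(scores, key=scores.get)

# Deliver-enabled services described by simple feature sets, and the user's
# preferences expressed in the same vocabulary.
SERVICES: Dict[str, Set[str]] = {
    "flight_booking_assistant": {"travel", "booking", "calendar"},
    "library_order_form": {"purchase", "library"},
}

def recommend(context: str, preferences: Set[str]) -> List[str]:
    """Rank services by similarity to the preferences plus overlap with the context."""
    relevant = IS_CONTEXTS[context]
    def score(features: Set[str]) -> float:
        similarity = len(features & preferences) / len(features | preferences)
        return similarity + len(features & relevant)
    return sorted(SERVICES, key=lambda s: score(SERVICES[s]), reverse=True)

observed = {"travel", "conference"}            # output of text categorization
print(infer_context(observed))                 # business_trip_conference
print(recommend("business_trip_conference", {"calendar", "booking"}))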

As mentioned earlier, in addition to the context inference and service recommendation algorithms, the authors demonstrate cognitive context awareness
on the Web-based IS through implementing the prototype deploying the proposed
algorithms. The overview of this prototype is illustrated in Fig. 5.7.

Fig. 5.7 Prototype framework. Source Kim et al. (2007)

As shown in this framework, the prototype consists of two engines: a context inference engine and a service recommendation engine. The context inference engine is made up of a text categorization module and a context decision module. In the text categorization module, a Support Vector Machine (SVM) is used as a classifier for an accurate categorization given that this supervised learning algorithm
shows the highest performance compared to other classifiers. The SVM algorithm is used to implement the text categorization module. Free software for text categorization and extraction called MinorThird v.2.5.8.3 is used to conduct pilot tests
with sample texts. The context decision module is responsible for confirming the
user’s cognitive context using category data transferred by the text categorization
module. As far as the service recommendation engine is concerned, it is composed
of a context requirement decision and a service decision module. Using the inferred
context and the IS context data in the IS context-category memory, the context
requirement decision module finds the system requirements needed to execute the
user’s cognitive context, and also requests the user to input relevant minimal
information, including a user ID and password, which are necessary to log in to an IS.
This is needed to grasp the user’s cognitive context. The service module, on the
other hand, extracts service lists from external Web sites applying agent technology
and delivers user-adaptive services considering the user’s cognitive context as well
as user preferences. The prototype is implemented using C++ programing language,
which is generally associated with such application domains as systems software,
application software, embedded software, and high-performance server. C++ is
designed to comprehensively support multiple programing styles including data
abstraction and object-oriented programing. However, the author state that the
algorithms need more work in order to be suited for more practical situations. One
of their goals is to improve the algorithms to become robust enough for real-world
implementation. As to the prototype, they mentioned that they faced some technical issues, namely that the proposed system deploying the suggested algorithms is focused only on a specific IS (POSIS) rather than being compatible with other systems through deploying more generalized algorithms, in addition to the technical limitations of multi-screen control and agent technology.

5.10.3.2 Context Categorization and the Inference Algorithm

Considering the emphasis of this chapter, it may be worth elaborating further on the
categorization method as part of the inference algorithm used in this system. Text
content-based categorization is used as a method for categorizing documents, an
approach which is, as pointed out by the authors, ‘based on machine learning,
where the quality of a training text influence the result of the categorization criti-
cally’. This approach, according to Pierre (2001, cited Kim et al. 2007) can render
good results in a way that both is robust as well as makes few assumptions about
the context to be analyzed. As one of the areas in text mining, this method auto-
matically sorts text-based documents into predefined categories, e.g., a system
assigns themes such as ‘science’, ‘sports’, or ‘politics’ to the categories of general
interest (Kim et al. 2007). The authors state that this approach involves machine
learning to create categorizers automatically, a process that ‘typically examines a
set of documents that have been pre-assigned to categories, and makes inductive
abstractions based on this data that will assist it in categorizing future documents’,
assuming that the quality of a training text has a critical impact on the categorization
result. To perform text categorization, features should be extracted and then
weighted, including the following steps: tokenizing, word stemming, and feature
selection and weighting (see Kim et al. 2007, for a brief description of the steps).
Categorizing each reference used by the user using text categorization techniques is
the first step to infer the user’s cognitive context within the Web-based IS through
the context inference algorithm. That is, the category data are used to infer the
user’s cognitive context. The IS context-category memory which contains the IS
contexts and categories that are to be matched with the categories derived from the
representative extraction phase to infer the user’s cognitive context should be based
on ontology given its advantages in, according to Khedr and Karmouch (2005, cited
Kim et al. 2007) enabling the system to perceive the abundant meaning utilizing the
inheritance and attribute data, in addition to its expressiveness power that allow to
understand the real meaning of an item. The context inference algorithm involves
three steps: (1) after the first reference category is determined, the IS context that
includes this category in the IS context-category memory is activated; (2) the same
goes for the second reference category; (3) if the user acknowledges this context
positively, the algorithm function is terminated and the selected context is deter-
mined by the system as the user’s cognitive context; otherwise, step 2 is repeated.
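The loop described in these three steps can be approximated as follows; scikit-learn's linear SVM is used here merely as a convenient stand-in for the MinorThird-based categorization module, and the training texts, category names, and IS contexts are all invented for the example.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy training set for the text categorization module (invented examples).
train_texts = [
    "call for papers international conference registration fee",
    "hotel reservation and flight itinerary for the trip",
    "new textbook order from the university library catalogue",
    "invoice for purchasing reference books",
]
train_labels = ["conference", "travel", "library", "purchase"]

classifier = make_pipeline(TfidfVectorizer(), LinearSVC())
classifier.fit(train_texts, train_labels)

# IS context-category memory: which categories make up which IS context.
IS_CONTEXTS = {
    "business_trip_conference": {"conference", "travel"},
    "book_purchasing_request": {"library", "purchase"},
}

def infer_cognitive_context(references, confirm):
    """Steps 1-3: categorize each reference, activate the IS contexts that
    include that category, and ask the user to confirm a candidate."""
    activated = set()
    for text in references:                        # steps 1 and 2
        category = classifier.predict([text])[0]
        for ctx, cats in IS_CONTEXTS.items():
            if category in cats:
                activated.add(ctx)
        for ctx in sorted(activated):              # step 3: user acknowledgement
            if confirm(ctx):
                return ctx
    return None

refs = ["conference registration fee payment", "flight itinerary to Vienna"]
print(infer_cognitive_context(refs, confirm=lambda ctx: True))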

5.11 Key Benefits of Context Ontologies: Representation and Reasoning

Research shows that a large part of the recent work in the field of context-aware
computing applies an ontology-based approach to context modeling. Various
application-specific, generic, and hybrid context ontology solutions have indeed
been proposed and adopted in a wide range of architectures for context awareness.
Context ontologies seem to provide intuitive benefits for the development of
context-aware applications and compelling features for their implementation. They
allow context to be recognized through direct semantic reasoning that makes extensive use of semantic content—descriptions and domain knowledge. Numerous
studies (e.g., Bettini et al. 2010; Strimpakou et al. 2006; Gu et al. 2005; Khedr and
Karmouch 2005; Chen et al. 2004a, b, c; Korpipää et al. 2005; Korpipaa et al. 2003;
Wang et al. 2004; Strang et al. 2003) have demonstrated the benefits of using
ontology-based context models. Evaluation of a research work on context modeling
(Strang and Linnhoff-Popien 2004) shows that the usage of ontologies exhibits
prominent benefits in AmI environments. Enriching context-aware applications
with semantic knowledge representation provides robust and straightforward
techniques for describing contextual facts and interrelationships in a precise and
traceable manner (Strang et al. 2003). Moreover, context ontologies address the
need of applications to access a widely shared representation of knowledge.

Knowledge sharing is of particular importance in AmI environments, in which different heterogeneous and distributed components must interact for the exchange
of users’ context information. Sharing of both technologies and knowledge domains
is a major strength of ontology in context recognition. And ontologies enable
integration and interoperability with regard to shared structure and vocabulary
across multiple models and applications. They moreover allow for an easy capture
and encoding of real-world concepts—rich domain knowledge—in a computa-
tionally understandable and processable manner; scalability of context information management to a large number of users and contexts; multilevel and dynamic
recognition of context patterns; efficient computational performance of reasoning;
and so on. Another clear advantage of ontological context models is usability of
modeling formalisms; graphical tools make the design of such models viable to
developers that are not particularly familiar with description logics (Bettini et al.
2010). In terms of the set of the requirements defined for generic context models,
Bettini et al. (2008, p. 15) point out that ontological models have clear benefits
regarding support for heterogeneity, and ‘since they support the representation of
complex relationships and dependencies among context data, they are particularly
well suited to the recognition of high-level context abstractions’; however they do
not fulfill all the requirements for a generic context information modeling and
reasoning approach (Ibid).
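As a small illustration of the kind of semantic reasoning these benefits rest on, the sketch below derives a shared, high-level reading of a context fact purely from a class hierarchy; the vocabulary is invented and the transitive subclass lookup stands in for a full DL reasoner.

# Shared vocabulary: a tiny subclass hierarchy that several applications could
# agree on (invented classes, standing in for an OWL context ontology).
SUBCLASS_OF = {
    "VideoConference": "Meeting",
    "Meeting": "WorkActivity",
    "WorkActivity": "Activity",
}

def ancestors(cls: str) -> set:
    """Transitive closure over subClassOf: a minimal subsumption 'reasoner'."""
    result = set()
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        result.add(cls)
    return result

# An application written only against the abstract class 'WorkActivity' still
# recognizes the more specific fact reported by another component.
observed = ("bob", "engagedIn", "VideoConference")
if observed[2] == "WorkActivity" or "WorkActivity" in ancestors(observed[2]):
    print("suppress non-urgent notifications for", observed[0])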

5.12 Context Ontologies: Open Issues and Limitations

Enormous challenges lie in developing and implementing context-aware systems on different scales, and system modeling (of context) is one of these challenges within
AmI research and practice. There is still much to be done in this regard, as both the
current state-of-the-art context models and research on ontological context models
do not seem yet to provide and find, respectively, the most suitable and robust
solutions for data-intensive AmI systems. The field of context-aware computing
still lacks a major breakthrough—the evidence does not yet show that ontologies as
systems would meet all requirements, although using ontologies is the de facto standard in context awareness, as there is no work reporting large-scale deployment (Perttunen et al. 2009). This implies that there are many issues that need to be
addressed and overcome in order to effectively deal with context information in
sensorized AmI environments, from the flow of the sensor data, through mapping
the sensor readings with concepts and properties in ontologies, to the real-time
processing of context information. Various requirements have been defined to help
create generic and profound context models to ease, in turn, the development of
context-aware applications, but experiences continue to demonstrate that context
information is intricate to handle computationally, to model, to reason about, and to manage. Indeed, as extensively discussed in Chap. 3, context is quite a fluid, subtle,
and fuzzy concept, and thus it becomes hardly the case that a complete set of
requirements for context representation and reasoning can be rigorously applied.

Context in context-aware applications involves numerous types of contextual entities that interrelate and interact within a unified modeling, representation, and
reasoning framework. But the overwhelming variety of types of context incorpo-
rated in context-aware applications has been a stumbling block towards the creation
of profound and generic context ontology solutions for context-aware service
provision that would allow users to benefit from a diverse range of pervasive
services, seamlessly supported by the underlying technology (Strimpakou et al.
2006).
Although ontologies offer clear advantages compared to other representation and
reasoning approaches, they fall short, as research shows, in addressing the issue of uncertainty and vagueness of context information, which it is imperative to overcome because adaptation decisions in context-aware applications are all made based
on the interpretation and processing of context information. And this information
should be clear, consistent, and complete when collected by sensors, encoded, and
reasoned about. Otherwise the use of context-aware applications would be coun-
terproductive to users, which might have implications for their acceptance of new
technologies. This relates to the issue of the delegation of control to intelligent
software/systems agents to execute tasks on their own autonomy and authority as a
key feature of AmI systems, which has been criticized by many authors (e.g.,
Crutzen 2005; Ulrich 2008; Criel and Claeys 2008). Hence, considering uncertainty
in context information modeling and reasoning on context uncertainty is a very
crucial feature of context modeling and reasoning. According to Bettini et al. (2008), ontology languages and related reasoning tools do not, at the time of writing, properly support uncertainty in context data, apart from a few existing preliminary proposals to extend ontologies to represent and reason about fuzziness and uncertainty (see, e.g., Straccia 2005; Ding and Peng 2004). As echoed by Perttunen et al.
(2009), none of the description logic-based approaches can deal with uncertainty
and vagueness, although some work (e.g., Schmidt 2006; Reichle et al. 2008) has attempted to combine ontological modeling with modeling of uncertainty in an attempt to approach this issue. Notably, as a result of summarizing the reviewed work
on modeling vagueness and uncertainty, there is no modeling endeavor that meets
all the requirements for context representation and reasoning; and seemingly ‘the
benefit of modeling uncertainty and vagueness has not been evaluated beyond the
capability of representing it; that is, the work doesn’t make it clear how easy it is to
utilize such models in applications, what is the computational expense, and in what
kind of applications does it benefit the users.’ (Ibid, p. 20).
In addition, ontologies have limitations in terms of handling dynamic context
knowledge. As argued by Bettini et al. (2008), ‘ontologies are not well suited to
represent some dynamic context data such as users' adaptation preferences', and they suggest that such data can be more profitably modeled by lower-complexity,
restricted logics like those proposed in Henricksen and Indulska (2006) and Bettini
et al. (2008). There is a trade-off between expressiveness and dynamicity in context
information modeling. Formal context representation runs across the fundamental
trade-off between expressiveness and reasoning complexity because formal context
representation puts hard requirements on the context knowledge representation and
reasoning system as to dynamicity and expressiveness (Perttunen et al. 2009). The choice of ontological models may not always be satisfactory when considering the
trade-off between expressiveness and complexity (Bettini et al. 2010). Future research should focus on investigating this interplay to find the most suitable solutions—novel modeling techniques that combine the advantages of expressiveness with manageable reasoning complexity. Relinquishing one benefit for another
that seems to be more desirable has proven to be unsatisfactory in terms of bal-
ancing interoperability and heterogeneity, on the one hand, and uncertainty and
dynamicity, on the other hand, with regard to representing and reasoning on context
information using ontological models.
Furthermore, context ontologies lack full scalability. They are designed to
statically represent the knowledge of a domain rather than capturing constantly
changing context information in dynamic environments in a scalable manner
(Strimpakou et al. 2006). Higher level abstractions should be derived from dynamic
data produced by dedicated sensors that use dynamic acquisition and perception
techniques.

5.13 Context Models Limitations, Inadequacies, and Challenges

5.13.1 Technology-Driven and Oversimplified Context Models

As a description, a model must necessarily be an (over)simplified view of the real world. As such, it consists of a set of propositions expressing relationships
among concepts forming the vocabulary of some domain. It is a representation of
how things are with regard to a variety of aspects of human context and functioning
—as captured from ‘real-world’ by software engineers or based on computationally
formalized knowledge from the human-directed disciplines, such as social sciences,
neurocognitive science, cognitive psychology, the philosophy of mind, and so on.
Thus, it is represented in a formal and computational format as a shared concep-
tualization, an abstract model of some phenomenon identified as a set of relevant
concepts and their interrelations. By capturing consensual knowledge (conceptu-
alization), models are used as a basis for common understanding between system
designers, service providers, and users. For example, a computer system tracks the
eye movements of a user, and uses this information in a computational model that is
able to estimate the user’s ongoing cognitive processes, e.g., decision making,
information searching, and reading, and accordingly fire actions that facilitate such
processes. Conceptualizations are extremely valuable to designers; they define the
terminology used to describe and think about different phenomena, e.g., interaction,
context, cognition, emotion, behavior, and brain functioning. In relation to context-
aware computing, while the conceptualization of context is complex, involving an
infinite number and a wide variety of contextual elements that dynamically interact
with each other to define and shape human interaction, technologically context
consists of a very limited number of contextual entities or a number of explicitly
defined attributes. It is also (operationalized) based on a static view, occurring at a
point in time, rather than a dynamic view, constantly evolving. Although context
models aim and attempt to capture rich domain of knowledge, concepts and their
relationships as close as possible to real-world based on advanced theoretical
models, they are still circumscribed by technological boundaries. Indeed, most
work in developing context-aware systems seems to be driven by what is techni-
cally feasible and computationally attainable. Also, or probably as a consequence, a
large body of work on context-aware systems tend to operationalize concepts of
various contexts in a rather simplified way compared to how context is conceptu-
alized in situated theory, philosophy, constructivism, communication studies, or
rather academic disciplines devoted to the study of context or specialized on the
subject matter (see Goodwin and Duranti 1992 for a detailed account).
Technological feasibility pertaining to the machine understandability and process-
ability of semantic context content—semantic reasoning making use of context
semantic descriptions and domain knowledge—has implications for how context should be conceptualized. In fact, the computational feasibility issues relating to the
notion of intelligence as alluded to in AmI are associated with an inherent com-
plexity and intrinsic intricacy pertaining to sensing all kinds of patterns in the
physical world and modeling all sorts of situations and environments. In a nutshell,
context models should fit with what technology has to offer in terms of existing computational representation and reasoning capabilities rather than technology supporting and responding to how context needs to be modeled or conceptualized: 'In the
terms of practical philosophy…, human context includes the dimension of
practical-normative reasoning in addition to theoretical-empirical reasoning, but
machines can handle the latter only. In phenomenological terms, human context is
not only a “representational” problem (as machines can handle it) but also an
“interactional” problem, that is, an issue to be negotiated through human interac-
tion…. In semiotic terms, finally, context is a pragmatic rather than merely semantic
notion, but machines operate at a syntactic or at best approximated semantic level of
understanding.’ (Ulrich 2008, p. 7). Although theoretical criteria have been proposed for defining the user context from a theoretical or holistic view, context models are nevertheless based on a simplified set of concepts and their relationships. As a consequence of the way context is operationalized, driven by the constraints of existing technologies and of engineering theory and practice, it becomes feasible to model any domain, or, even worse, the world, an outcome of modeling that is based on user groups, as long as computational models can enable systems to bring (a certain degree of) utility to the user. Currently, the key concern of how context should be modeled, represented, processed, managed, disseminated, and communicated seems to be to make context-aware applications useful in terms of being able to adapt to
some features of the user context in AmI environments. In fact, computer and
design scientists argue that the concern of models should be utility, not truth. Hence, a context model is useful insofar as it contributes to developing context-aware
applications that are functional as to the provision of some feasibly-to-deliver
ambient services. As pointed out by Gruber (2009), ontology is a tool and product
of engineering and thus defined by its intended use—what matters is to provide the
representational mechanism ‘with which to instantiate domain models in knowledge
bases, make queries to knowledge-based services, and represent the results of
calling such services’. This is what context-aware applications as knowledge-based
systems are about.
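To make Gruber's point tangible, the following minimal sketch instantiates a tiny context 'domain model' in a knowledge base and then queries it, the way a knowledge-based service might. The use of the rdflib library and the ex: vocabulary (Person, Room, locatedIn, currentActivity) are illustrative assumptions, not part of any standard context ontology discussed above.

from rdflib import Graph, Namespace, Literal, RDF, RDFS

# A toy context ontology instantiated as a knowledge base and then queried,
# in the spirit of a "representational mechanism" for domain models.
EX = Namespace("http://example.org/ami#")
g = Graph()
g.bind("ex", EX)

# Schema: a few concepts and one relation between them
g.add((EX.Person, RDF.type, RDFS.Class))
g.add((EX.Room, RDF.type, RDFS.Class))
g.add((EX.locatedIn, RDFS.domain, EX.Person))
g.add((EX.locatedIn, RDFS.range, EX.Room))

# Instances: a simple situation
g.add((EX.anna, RDF.type, EX.Person))
g.add((EX.livingRoom, RDF.type, EX.Room))
g.add((EX.anna, EX.locatedIn, EX.livingRoom))
g.add((EX.anna, EX.currentActivity, Literal("watching_tv")))

# The kind of context query a knowledge-based service might issue
q = """
PREFIX ex: <http://example.org/ami#>
SELECT ?person ?activity WHERE {
    ?person ex:locatedIn ex:livingRoom .
    ?person ex:currentActivity ?activity .
}
"""
for person, activity in g.query(q):
    print(person, activity)

The point is not the particular syntax but that, once context is encoded in this way, 'understanding' it amounts to pattern matching over the model, which is exactly the limitation highlighted above.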
The computational rendition of context and the way it influences interaction in AmI are far from reality—from how context defines and shapes human interaction in the everyday lifeworld according to a constructivist worldview or situated theory, for instance. This has direct implications for the functioning and performance of
context-aware applications as interactive systems. Indeed, a number of recent
critical and social studies on AmI technology have drawn attention to the pitfalls of
reducing the complexity of context for technical purposes, bringing attention to the
psychological and social issues that context-aware applications may pose when
implemented in real-world environments (see Chaps. 3 and 10 for a related dis-
cussion). Current context models overlook details and involve abstraction inaccu-
racies when capturing context concepts. This is, though, seen as an inconsequential matter by some researchers who claim that the purpose of creating and incorpo-
rating context models in new interactive technologies is to enhance user interaction
experience; however, it is important for context-aware systems engineers and
designers to be aware of the inaccuracies and inadequacies inherent in context
models, so that they can find alternatives to avoid potential unintended conse-
quences that might result from inappropriate context-dependent application actions
(the system behavior). One alternative in this regard is to combine different levels of
interactivity in context-aware applications, namely passive and active, pull and
push, or interactive and proactive in the provision of context-aware personalized
and adaptive services (see Chap. 3 for a detailed account).
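One simple way to realize such a combination, sketched below under the assumption that the system attaches a confidence value to its context inference, is to push services proactively only when confidence is high, fall back to an interactive confirmation at intermediate confidence, and leave the initiative entirely to the user otherwise. The threshold values and the example action are hypothetical.

# Sketch of mixing proactive (push) and interactive (pull/confirm) delivery
# depending on how confident the system is in its context inference.
# Thresholds and the example action are illustrative assumptions only.

def deliver_service(action, confidence, ask_user):
    """Choose a delivery mode for a context-dependent action."""
    if confidence >= 0.9:
        # Proactive/push: act autonomously on the inferred context
        return f"executing '{action}' automatically"
    if confidence >= 0.6:
        # Interactive: propose the action and let the user decide
        if ask_user(f"Shall I {action}?"):
            return f"executing '{action}' after confirmation"
        return "action declined by the user"
    # Passive/pull: wait until the user explicitly requests a service
    return "waiting for an explicit user request"

print(deliver_service("dim the lights", 0.72, ask_user=lambda prompt: True))
print(deliver_service("dim the lights", 0.95, ask_user=lambda prompt: True))

Keeping the user in the loop at intermediate confidence is precisely the kind of alternative that mitigates inappropriate context-dependent actions caused by inaccurate models.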
Sound models are models that are codified using natural languages to capture the
structure of reality—semantically rich knowledge domains. This is unfeasible with
existing formal languages, which require that for ontological models to be machine
readable, they should exclude natural languages. While natural languages are
capable of capturing meanings that are constructed in interaction with persons,
places, or objects as contextual entities, existing formal representation languages
are capable of encoding only the definition of contextual entities as such. That is,
they cannot capture how different contextual entities are perceived by each indi-
vidual according to their cognitive-social representations nor the way these entities
intertwine in a given situation in their interpretation by each individual. Therefore,
the usefulness of context models is a matter of the extent to which context abstractions can capture and encode real-world situations, and they are accordingly evaluated in terms of their comprehensiveness, dynamicity, fidelity with real-world phenomena, meticulousness, internal consistency, robustness, and coherence, to name but a few.
The quality and prominence of current context-aware systems as artificial arti-
facts depend on the accurate conceptualization and the correct representation of
human functioning pertaining to emotional, cognitive, behavioral, and social pro-
cesses, particularly those associated with human interaction. Advanced computa-
tional modeling abstractions of knowledge about human functioning are needed,
and the state-of-the-art theoretical models from cognitive neuroscience and social
sciences, in particular, should be incorporated into AmI technologies.

5.13.2 Context Models as User Groups Models

Context is subjective, fluid, and dynamic, constantly evolving according to users’
meanings and actions, and because it is so specific to every user, it is more difficult, or even unfeasible, to define in models. Indeed, in most work on
context-aware systems, context models are mostly based on user groups rather than
on every user and interaction. Although much technological advancement has been achieved to create more advanced context models, little research has been carried out on how models can be designed by communities (e.g., De Moor et al. 2006), and even less on how they can be designed by various classes of users, e.g., marginalized users. In this
line of thinking, in relation to emotional context awareness, for instance, although
there is a great variety of theoretical models of emotions that can help create
culturally sensitive emotional context-aware applications, emotional elements of
context as expressive behaviors tend to be modeled based on common properties of
emotions. But many types of emotions are not universal, as individuals differ on the basis of their cultures and languages in how they express emotions. There are more emotional properties that are idiosyncratic than universal ones such as the ‘six facial expressions’ (Ekman 1982). In reference to affective computing, Picard (2000) points out
that there is hardly ever a one-size-fits-all solution for the growing variety of users
and interactions. Each person reflects emotions in a personal way, so there is, ideally, a need to properly adjust parameters to each one of them, an aspect which should be taken into account when modeling emotional context. While many
authors and researchers claim to be able to model emotions of individuals, they
‘forget that the meaning of context differs for every person and is modulated
through interactions’ (Criel and Claeys 2008). One of the most complex cognitive
human behaviors is to understand emotions (see Salovey and Mayer 1990).
However, context-aware applications create new challenging problems for designers and implementers when built on context models that are based on user groups: it often occurs that implemented applications do not fulfill users’ (emotional) expectations and do not allow users to properly exploit these systems. Moreover, many current evaluation methods for emotions in AmI are developed for laboratory settings, and hence the changeable context of use of context-aware applications—e.g., in field settings or real-life environments—sets new practical challenges for research on user emotional experience.
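Picard's point about adjusting parameters to each person can be illustrated with a thin per-user calibration layer placed on top of a generic, group-level emotion model. The features, weights, and the linear scoring below are illustrative assumptions, not a validated affect model.

# Per-user calibration on top of a group-level emotion model: the same
# observation is scored against the individual's own neutral baseline rather
# than a population norm. All numbers and feature names are illustrative.

GROUP_WEIGHTS = {"smile_intensity": 1.0, "voice_pitch_rise": 0.8}

class UserEmotionModel:
    def __init__(self):
        self.baseline = {k: 0.0 for k in GROUP_WEIGHTS}   # per-user neutral baseline
        self.n = 0

    def calibrate(self, neutral_sample):
        """Update the user's neutral baseline with a running mean."""
        self.n += 1
        for k, v in neutral_sample.items():
            self.baseline[k] += (v - self.baseline[k]) / self.n

    def score_positive_affect(self, sample):
        """Score deviation from this user's baseline, not from a group norm."""
        return sum(GROUP_WEIGHTS[k] * (sample[k] - self.baseline[k])
                   for k in GROUP_WEIGHTS)

model = UserEmotionModel()
model.calibrate({"smile_intensity": 0.2, "voice_pitch_rise": 0.1})
print(model.score_positive_affect({"smile_intensity": 0.6, "voice_pitch_rise": 0.3}))

Even such a small calibration step acknowledges that expressive baselines differ between individuals and cultures, a difference that group-level context models erase.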
Therefore, ontology and system designers not only in relation to emotional
context-aware systems but also within other context awareness domains should
engage users as target groups actively in the design process. This is a salient factor
for creating well-informed context-aware applications dedicated for relevant user
groups, thereby achieving the desired outcomes in terms of responding to the needs
and expectations of specific classes of users. A key-role user is an expert in a
specific domain, but not in computer science, and is also aware of the needs of the
user when using computational systems (Bianchi-Berthouze and Mussio 2005).
This approach enables domain experts to collaborate with HCI engineers to design
and implement context-aware systems (Ibid). Accordingly, novel insights into modeling context have to be driven by tacit, creative, and non-technological knowledge about the dynamics of how different groups of users aspire to interact with technologies in various settings. The codified knowledge and the push philosophy
of context-aware technology are not the most effective way of understanding how
users like to interact with AmI technology. Failure to engage different classes of
users poses a risk of them rejecting or not accepting technology. Context models
should not be developed to be used only in protected research or laboratory con-
ditions, but rather to be designed with the prime goal to be implemented success-
fully in real-world environments. The problem of assessing the usefulness of
context-aware applications in influencing user interaction experience is still com-
pounded by the lack of clear empirical evidence in relation to different classes and
cultural variations of users.

5.14 Holistic Approach to Context Models

Context models should be developed through a holistic approach, encompassing techno-
logical and human-directed research, thereby extending the remit of computer
science and AI to include cognitive science, philosophy, and social science. It is no
longer sufficient to perform specific research in areas of AI for AmI (e.g., hybrid
approach to representation and reasoning and performance features of context
modeling and reasoning techniques) and embody the results of that research in
particular context-aware applications. While developing context-aware applications
should be supported by adequate context modeling methods and efficient reasoning
techniques, context models should be based on a theoretically clear overall
approach. Indeed, technological advancement is rapid but appears to happen ad-hoc
when new techniques and methods become available. The technical complexity of
computational human context and state models and dynamic process models about
human functioning requires that all aspects of the development chain pool their
knowledge together and integrate their efforts. The coherence required to achieve
significant computational modeling development impact requires the engagement of researchers from both computer science and human-directed disciplines. It is war-
ranted to encourage people from these disciplines or working on cross connections
of AmI with these disciplines to collaborate or work together. The emphasis is on
the use of knowledge from these disciplines, specialized in the subject matter of
context, in context-aware applications, in order to support, in a knowledgeable
manner, users in their living, working, learning, communication, and social
respects. Context touches upon the basic structure of human and social interaction.
Context-aware applications constitute a high potential area for modelers in the
psychological and social disciplines to implement and assess their models. From the
other side, it is valuable to sensitize researchers in computer science, AI, and AmI
to the possibilities and opportunities to incorporate more substantial knowledge
from human-directed disciplines in context-aware applications.
In sum, context models should be developed through collaborative research
endeavors, an approach that requires rigorous scholarly interdisciplinary and
transdisciplinary research. This has the potential to create new interactional and
holistic knowledge necessary to better understand the multifaceted phenomenon of
context and context awareness and thereby enhance context models. This in turn
reduces the complexity of, and advances, the development of context-aware
applications in ways that allow users to exploit them to the fullest, by benefitting
from a diverse range of context-aware personalized, adaptive, and responsive ser-
vices. Therefore, when designing AmI technologies, it is of import for researchers
or research teams to be aware of the limitations and specificity of technological
knowledge, to challenge assumptions, and to constantly enhance models to increase
the successfulness of the deployment and adoption of new technologies. This
should depart from challenging technology-driven perspectives on context models,
critically reviewing operationalizations of context and their implications for how
context is conceptualized, and questioning the belief in the existence of models of
the user’s world and models of the user’s behavior, as well as revolutionizing the
formalisms used to codify context knowledge beyond hybrid approaches. What is
needed is to create innovative modeling techniques and languages that can capture
and encode context models with high fidelity to real-world phenomena, comprehensiveness, dynamicity, and robustness.

References

Agostini A, Bettini C, Riboni D (2005) Loosely coupling ontological reasoning with an efficient
middleware for context awareness. In: Proceedings of the 2nd annual international conference
on mobile and ubiquitous systems. Networking and services, pp 175–182
Agostini A, Bettini C, Riboni D (2009) Hybrid reasoning in the CARE middleware for context
awareness. Int J Web Eng Technol 5(1):3–23
Arpírez JC, Gómez-Pérez A, Lozano A, Pinto HS (1998) (ONTO)2 agent: an ontology-based
WWW broker to select ontologies. In: Gómez-Pérez A, RV Benjamins (eds) ECAI’98
workshop on applications of ontologies and problem-solving methods, Brighton, pp 16–24
Baader F, Calvanese D, McGuinness DL, Nardi D, Patel-Schneider PF (2003) The description
logic handbook: theory, implementation, and applications. Cambridge University Press, New
York
Bechhofer S, van Harmelen F, Hendler J, Horrocks I, McGuinnes DL, Patel-Schneider PF,
Stein LN (2004) OWL web ontology language reference. W3C
Bernaras A, Laresgoiti I, Corera J (1996) Building and reusing ontologies for electrical network
applications. In: Proceedings of the 12th European conference on artificial intelligence (ECAI),
pp 298–302
Bettini C, Pareschi L, Riboni D (2008) Efficient profile aggregation and policy evaluation in a
middleware for adaptive mobile applications. Pervasive Mobile Comput 4(5):697–718
Bettini C, Brdiczka O, Henricksen K, Indulska J, Nicklas D, Ranganathan A, Riboni D (2010) A
survey of context modelling and reasoning techniques. J Pervasive Mobile Comput 6(2):161–180
(Special Issue on Context Modelling, Reasoning and Management)
Bianchi-Berthouze N, Mussio P (2005) Introduction to the special issue on “context and emotion
aware visual computing”. J Visual Lang Comput 16:383–385
Bobillo F, Delgado M, Gómez-Romero J (2008) Representation of context-dependant knowledge
in ontologies: a model and an application. Expert Syst Appl 35(4):1899–1908
Borgo S, Guarino N, Masolo C (1996) A pointless theory of space based on strong connection and
congruence. In: Proceedings of principles of knowledge representation and reasoning (KR96),
Morgan Kaufmann, Boston, MA, pp 220–229
Bouquet P, Giunchiglia F, van Harmelen F, Serafini L, Stuckenschmidt H (2004) Contextualizing
ontologies. J Web Semant 1(4):325–343
Brachman RJ, Levesque HJ (2004) Knowledge representation and reasoning. Morgan Kaufmann,
Amsterdam
Bravo J, Alaman X, Riesgo T (2006) Ubiquitous computing and ambient intelligence: new
challenges for computing. J Univers Comput Sci 12(3):233–235
Cearreta I, Miguel J, Nestor L, Garay-Vitoria N (2007) Modelling multimodal context-aware
affective interaction. Laboratory of human-computer interaction for special needs, University
of the Basque Country
Chaari T, Dejene E, Laforest F, Scuturici VM (2007) A comprehensive approach to model and use
context for adapting applications in pervasive environments. Int J Syst Softw 80(12):1973–1992
Chen H, Finin T, Joshi A (2004a) Semantic web in the context broker architecture. Proceedings of
the 2nd IEEE international conference on pervasive computing and communications (PerCom
2004). IEEE Computer Society, pp 277–286
Chen H, Finin T, Joshi A (2004b) An ontology for context-aware pervasive computing
environments. Knowl Eng Rev 18(3):197–207 (Special Issue on Ontologies for Distributed
Systems)
Chen H, Perich F, Finin TW, Joshi A (2004c) SOUPA: standard ontology for ubiquitous and
pervasive applications. In: 1st annual international conference on mobile and ubiquitous
systems, MobiQuitous. IEEE Computer Society, Boston, MA
Chen L, Nugent C (2009) Ontology–based activity recognition in intelligent pervasive
environments. Int J Web Inf Syst 5(4):410–430
Criel J, Claeys L (2008) A transdisciplinary study design on context-aware applications and
environments’, a critical view on user participation within calm computing. Observatorio
(OBS*) J 5:057–077
Crutzen CKM (2005) Intelligent ambience between heaven and hell. Inf Commun Ethics Soc 3
(4):219–232
De Moor A, De Leenheer P, Meersman M (2006) DOGMA–MESS: A meaning evolution support
system for interorganizational ontology engineering. Paper presented at the 14th international
conference on conceptual structures, Aalborg, Denmark
Dey AK (2000) Providing architectural support for building context-aware applications. PhD
thesis, College of Computing, Georgia Institute of Technology
Dey AK (2001) Understanding and using context. Pers Ubiquit Comput 5(1):4–7
Dey AK, Abowd GD, Salber D (2001) A conceptual framework and a toolkit for supporting the
rapid prototyping of context-aware applications. Hum Comput Interact 16(2–4):97–166
Ding Z, Peng Y (2004) A probabilistic extension to ontology language OWL. In: Proceedings of
the 37th annual hawaii international conference on system sciences (HICSS’04). IEEE
Computer Society, Washington, DC
Dobson S, Ye J (2006) Using fibrations for situation identification. Proceedings of pervasive 2006
Workshops. Springer, New York
Ekman P (1982) Emotions in the human face. Cambridge University Press, Cambridge
Ekman P (1984) Expression and nature of emotion. Erlbaum, Hillsdale, New Jersey
Fensel D (2003) Ontologies: a silver bullet for knowledge management and electronic commerce.
Springer, Berlin
Forbus KD, Kleer JD (1993) Building problem solvers. MIT Press, Cambridge, MA
Fowler M, Scott K (1997) UML distilled: applying the standard object modeling language.
Addison-Wesley, Reading, MA
Gandon F, Sadeh NM (2003) A Semantic e-wallet to reconcile privacy and context awareness.
Proceedings of ISWC 2003, 2nd international semantic web conference. Springer, Berlin,
pp 385–401
Gellersen HW, Schmidt A, Beigl M (2002) Multi-sensor context-awareness in mobile devices and
smart artifacts. Mobile Netw Appl 7(5):341–351
Gennari JH, Musen MA, Fergerson RW, Grosso M, Crubezy WE, Eriksson H, Noy NF, Tu SW
(2003) The evolution of Protégé: an environment for knowledge-based systems development.
Int J Hum Comput Stud 58(1):89–123
Göker A, Myrhaug HI (2002) User context and personalisation. In: ECCBR workshop on case
based reasoning and personalisation, Aberdeen
Goleman D (1995) Emotional intelligence. Bantam Books Inc, NY
Gomez-Perez A (1998) Knowledge sharing and reuse. In: Liebowitz J (ed) The handbook of
applied expert systems. CRC Press, Boca Raton, FL
Goodwin C, Duranti A (eds) (1992) Rethinking context: language as an interactive phenomenon.
Cambridge University Press, Cambridge
Gray PD, Salber D (2001) Modelling and using sensed context information in the design of
interactive applications. In: Proceedings of engineering for human-computer interaction: 8th
IFIP international conference, vol 2254. Toronto, pp 317–335
Gruber TR (1993) A translation approach to portable ontology specifications. Knowl Acquisition
5:199–221
Gruber TR (1995) Toward principles for the design of ontologies used for knowledge sharing. Int J
Hum Comput Stud 43(5–6):907–928
Gruber T (2009) Ontology. In: Liu L, Tamer Özsu M (eds) The encyclopedia of database systems.
Springer, Heidelberg
Gu T, Pung HK, Zhang DQ (2004a) Toward an OSGi-based infrastructure for context-aware
applications. Pervasive Comput 3(4):66–74
Gu T, Wang XH, Pung HK, Zhang DQ (2004b) An ontology-based context model in intelligent
environments. In: Proceedings of communication networks and distributed systems modeling
and simulation conference, San Diego, California, pp 270–275
Gu T, Pung HK, Zhang DQ (2005) A service-oriented middleware for building context-aware
services. J Network Comput Appl 28(1):1–18
Guarino N (1995) Formal ontology, conceptual analysis and knowledge representation. Int J Hum
Comput Stud 43(5–6):625–640
Guizzardi G (2005) Ontological foundations for structural conceptual models. PhD thesis,
University of Twente, The Netherlands
Guizzardi G, Herre H, Wagner G (2002) On the general ontological foundations of conceptual
modeling. In: Proceedings of the 21st int’l conference on conceptual modeling (ER-2002), vol
2503. LNCS, Finland
Henricksen K, Indulska J (2006) Developing context-aware pervasive computing applications:
models and approach. Pervasive Mobile Comput 2(1):37–64
Henricksen K, Livingstone S, Indulska J (2004) Towards a hybrid approach to context modelling,
reasoning and interoperation. In: Indulska J, Roure DD (eds) Proceedings of the 1st
international workshop on advanced context modelling, reasoning and management, University
of Southampton, Nottingham
Hofer T, Schwinger W, Pichler M, Leonhartsberger G, Altmann J, Retschitzegger W (2003)
Context-awareness on mobile devices—the hydrogen approach. Proceedings of the 36th annual
Hawaii international conference on system sciences (HICSS ‘03), vol 9. IEEE Computer Society
Hong JI, Landay JA (2001) An infrastructure approach to context-aware computing. Hum Comput
Interact 16:287–303
Horrocks I (2002) DAML+OIL: a reason-able web ontology language. In: Advances in database
technology—8th international conference on extending database technology, vol 2287. Prague,
Czech Republic, pp 2–13, 25–27 Mar 2002
Horrocks I, Patel-Schneider PF, van Harmelen F (2003) From SHIQ and RDF to OWL: the
making of a web ontology language. J Web Semant 1(1):7–26
Horrocks I, Patel-Schneider PF, Boley H, Tabet S, Grosof B, Dean M (2004) SWRL: a semantic
web rule language combining OWL and RuleML. W3C Member Submission, W3C, viewed 23
June 2009. http://www.w3.org/Submission/2004/SUBM-SWRL-20040521/
Indulska J, Robinson R, Rakotonirainy A, Henricksen K (2003) Experiences in using CC/PP in
context-aware systems. In: Chen MS, Chrysanthis PK, Sloman M Zaslavsky AB (eds) Mobile
data management, vol 2574. Lecture notes in computer science. Springer, Berlin
Khedr M, Karmouch A (2005) ACAI: agent-based context-aware infrastructure for spontaneous
applications. J Network Comput Appl 28(1):19–44
Khushraj D, Lassila O, Finin T (2004) sTuples: semantic tuple spaces. In: The 1st annual international
conference on mobile and ubiquitous systems: networking and services, pp 268–277
Kim S, Suh E, Yoo K (2007) A study of context inference for Web-based information systems.
Electron Commer Res Appl 6:146–158
Klyne G, Reynolds F, Woodrow C, Ohto H, Hjelm J, Butler MH, Tran L (2004) Composite
capability/preference profiles (CC/PP): structure and vocabularies 1.0. W3C Recommendation,
Technical Representation, W3C
Kogut P, Cranefield S, Hart L, Dutra M, Baclawski K, Kokar M, Smith J (2002) UML for
ontology development. Knowl Eng Rev 17(1):61–64
Korpip P, Malm E, Salminen I, Rantakokko T (2005) Context management for end user
development of context-aware applications. In: Proceedings of the 6th international conference
on mobile data management. ACM Press, Ayia Napa, Cyprus
Korpipää P (2005) Blackboard-based software framework and tool for mobile device context
awareness. PhD thesis, University of Oulu
Korpipaa P, Mantyjarvi J, Kela J, Keranen H, Malm E (2003) Managing context information in
mobile devices. IEEE Pervasive Comput 2(3):42–51
Lang PJ (1979) A bio-informational theory of emotional imagery. Psychophysiology 16:495–512
Lassila O, Khushraj D (2005) Contextualizing applications via semantic middleware. In:
Proceedings of the 2nd annual international conference on mobile and ubiquitous systems:
networking and services, San Diego, pp 183–189
Lum WY, Lau FCM (2002) A context-aware decision engine for content adaptation. IEEE
Pervasive Comput 1(3):41–49
Maluszynski J (2005) Combining rules and ontologies: a survey. REWERSE, Technical
Representation, I3–D3
Mamei M, Zambonelli F (2004) Programming pervasive and mobile computing applications with
the TOTA middleware. In: Proceedings of the 2nd IEEE international conference on pervasive
computing and communications. IEEE Computer Society
McGuinness DL, van Harmelen F (2004) OWL web ontology language. W3C Recommendation.
http://www.w3.org/TR/owl-features/. Viewed 25 May 2012
McIntyre G, Göcke R (2007) Towards affective sensing. Proceedings of HCII, vol 3
Motik B, Patel-Schneider PF, Parsia B (2008) OWL 2 web ontology language: structural
specification and functional-style syntax. World Wide Web Consortium, Working Draft
WD-owl2-syntax-20081202
Newmann NJ (1999) Sulawesi: a wearable application integration framework. Proceedings of the
3rd international symposium on wearable computers (ISWC ‘99), San Francisco
Newmann NJ, Clark AF (1999) An intelligent user interface framework for ubiquitous mobile
computing. Proceedings of CHI ‘99
Nicklas D, Grossmann M, Mínguez J, Wieland M (2008) Adding high-level reasoning to efficient
low-level context management: a hybrid approach. In: 6th annual IEEE international
conference on pervasive computing and communications, pp 447–452
Obrenovic Z, Starcevic D (2004) Modeling multimodal human-computer interaction. IEEE
Comput 37(9):65–72
Obrenovic Z, Garay N, López JM, Fajardo I, Cearreta I (2005) An ontology for description of
emotional cues. In: Tao J, Tan T, Picard RW (eds) vol 3784. LNCS, pp 505–512
Oechslein C, Klügl F, Herrler R, Puppe F (2001) UML for behavior-oriented multi-agent
simulations. In: Dunin-Keplicz B, Nawarecki E (eds) From theory to practice in multi-agent
systems, 2nd international workshop of central and Eastern Europe on multi-agent systems.
Springer, Cracow, p 217ff
Parsia B, Halaschek-Wiener C, Sirin E (2006) Towards incremental reasoning through updates in
OWL DL. Proceedings of workshop on reasoning on the web, Edinburgh
Pascoe NR, Morse D (1999) Issues in developing context-aware computing. In: International
symposium on handheld and ubiquitous computing, vol 1707. Lecture notes in computer
science, Karlsruhe, pp 208–221
Perttunen M, Riekki J, Lassila O (2009) Context representation and reasoning in pervasive
computing: a review. Int J Multimedia Eng 4(4):1–28
Picard RW (2000) Perceptual user interfaces: affective perception. Commun ACM 43(3):50–51
Ranganathan A, Al-Muhtadi J, Campbell RH (2004a) Reasoning about uncertain contexts in
pervasive computing environments. IEEE Pervasive Comput 3(2):62–70
Ranganathan A, Mcgrath RE, Campbell RH, Mickunas MD (2004b) Use of ontologies in a
pervasive computing environment. Knowl Eng Rev 18(3):209–220
Reichle R, Wagner M, Khan MU, Geihs K, Valla M, Fra C, Paspallis N, Papadopoulos GA (2008)
A context query language for pervasive computing environments. In: 6th annual IEEE
international conference on pervasive computing and communications, pp 434–440
Riboni D, Bettini C (2009) Context-aware activity recognition through a combination of
ontological and statistical reasoning. In: Proceedings of the 6th international conference on
ubiquitous intelligence and computing, UIC-09, vol 5585. LNCS. Springer, Berlin, pp 39–53
Román M, Hess C, Cerqueira R, Ranganathan A, Campbell RH, Nahrstedt K (2002) Gaia: a
middleware platform for active spaces. SIGMOBILE Mobile Comput Commun Rev 6(4):65–67
Salber D, Dey AK, Abowd GD (1999) The context toolkit: aiding the development of
context-enabled applications. Proceedings of the conference on human factors in computing
systems, Pittsburgh, PA, pp 434–441
Salovey P, Mayer JD (1990) Emotional intelligence. Imagination Cogn Pers 9:185–211
Scherer KR (1999) Appraisal theory. In: Dalgleish T, Power MJ (eds) Handbook of cognition and
emotion. Wiley, New York, pp 637–663
Schmidt A (2006) Ontology-based user context management, the challenges of imperfection and
time-dependence, on the move to meaningful internet systems: CoopIS, DOA, GADA, and
ODBASE. Lecture notes in computer science, vol 4275. pp 995–1011
Schmidt A, Beigl M, Gellersen HW (1999) There is more to context than location. Comput
Grap UK 23(6):893–901
Smith B, Welty C (2001) Ontology-towards a new synthesis. Proceedings of the international
conference on formal ontology in information systems (FOIS2001). ACM Press, New York
Soldatos J, Pandis I, Stamatis K, Polymenakos L, Crowley JL (2007) Agent based middleware
infrastructure for autonomous context-aware ubiquitous computing services. Comput Commun
30(3):577–591
Sowa JF (1984) Conceptual structures. Information processing in mind and machine. Addison
Wesley, Reading, MA
Sowa JF (2000) Knowledge representation: logical, philosophical, and computational foundations.
Brooks Cole Publishing Co, Pacific Grove, CA
Straccia U (2005) Towards a fuzzy description logic for the semantic web (preliminary report). In:
Proceedings of the second European semantic web conference, ESWC 2005, vol 3532. Lecture
notes in computer science, Springer, Berlin
Strang T, Linnhoff-Popien C (2004) A context modeling survey. In: Indulska J, Roure DD
(eds) Proceedings of the 1st international workshop on advanced context modelling, reasoning
and management as part of UbiComp 2004—The 6th international conference on ubiquitous
computing. University of Southampton, Nottingham
Strang T, Linnhoff-Popien C, Frank K (2003) CoOL: a context ontology language to enable
contextual interoperability. In: Proceedings of distributed applications and interoperable
systems: 4th IFIP WG6.1 international conference, vol 2893. Paris, pp 236–247
Strimpakou M, Roussak I, Pils C, Anagnostou M (2006) COMPACT: middleware for context
representation and management in pervasive computing. Pervasive Comput Commun 2
(3):229–245
Studer R, Benjamins VR, Fensel D (1998) Knowledge engineering: principles and methods. Data
Knowl Eng 25(1–2):161–197
Ulrich W (2008) Information, context, and critique: context awareness of the third kind. In: The
31st information systems research seminar in Scandinavia, keynote talk presented to IRIS 31
Urnes T, Hatlen AS, Malm PS, Myhre O (2001) Building distributed context-aware applications.
Pers Ubiquit Comput 5:38–41
Uschold M, Grüninger M (1996) Ontologies: principles, methods, and applications. Knowl Eng
Rev 11(2):93–155
Wang XH, Gu T, Zhang DQ, Pung HK (2004) Ontology based context modeling and reasoning
using OWL. Proceedings of the 2nd IEEE annual conference on pervasive computing and
communications workshops. IEEE Computer Society, Washington, DC, p 18
Winograd T (2001) Architectures for context. Hum Comput Interact 16:401–419
Yamada N, Sakamoto K, Kunito G, Isoda Y, Yamazaki K, Tanaka S (2007) Applying ontology
and probabilistic model to human activity recognition from surrounding things. IPSJ Digital
Courier 3:506–517
Ye J, Coyle L, Dobson S, Nixon P (2007) Using situation lattices to model and reason about
context. In: Proceedings of the 4th international workshop on modeling and reasoning in
context, (MRC 2007)
Zhang D, Gu T, Wang X (2005) Enabling context-aware smart home with semantic technology.
Int J Hum Friendly Welfare Robotic Syst 6(4):12–20
Zhou J, Yu C, Riekki J, Kärkkäinen E (2007) AmE framework: a model for emotion-aware
ambient intelligence. University of Oulu, Department of Electrical and Information Engineering,
Faculty of Humanities, Department of English VTT Technical Research Center of Finland
Chapter 6
Implicit and Natural HCI in AmI:
Ambient and Multimodal User Interfaces,
Intelligent Agents, Intelligent Behavior,
and Mental and Physical Invisibility

6.1 Introduction

As a new paradigm in ICT, AmI is heralding new ways of interaction, which will
radically change the interaction between humans and technology. AmI could be
seen as a novel approach to HCI, entailing a shift from conventional interaction and
user interfaces towards human-centric interaction and naturalistic user interfaces,
e.g., direct communication with all sorts of everyday objects. AmI has emerged as a
result of amalgamating recent discoveries in human communication, computing,
and cognitive science towards natural HCI. AmI technology is enabled by effortless
(implicit human–machine) interactions attuned to human senses and adaptive and
proactive to users. This entails adding adaptive HCI methods to computing systems
based on new insights into the way people aspire to interact with these systems,
meaning augmenting them with context awareness, multimodal interaction, and
intelligence. The evolving model of natural HCI tries to take the holistic nature of
the human user into account—e.g., context, behavior, emotion, intention, motiva-
tion, and so on—when creating user interfaces for and conceptualizing interaction
in relation to AmI applications and environments. Human-like interaction capabilities aim to enhance the understanding and support the intelligent behavior of AmI systems. Therefore, human verbal and nonverbal communication behavior has become an important research topic in the field of HCI, especially as computers are increasingly becoming an integral part of everyday and social life. Research in this
area is burgeoning within the sphere of AmI. A diverse range of related capture
technologies are under vigorous investigation in the creation of AmI applications
and environments. Utilizing human verbal and nonverbal communication behavior
allows users to interact with computer systems on a human level, like face-to-face
human interaction. The trends toward AmI are driving research into more natural
forms of human–machine interaction, moving from explicit means of input towards
more implicit forms of input that support more natural forms of communication,
such as facial expressions, eye movement, hand gestures, body postures, and speech
and its paralinguistic features. Such forms of communication are also utilized by
context-aware systems to acquire information as input for interaction and interface
control in AmI environments. Recognized as an inherent part of direct human
communication, nonverbal behavior, in particular, plays a significant role in con-
veying context. Such behaviors can provide a wealth of information about the user’s emo-
tional, cognitive, and physiological states as well as actions and behaviors, a type of
contextual information that can be captured implicitly by context-aware systems, so
that they can enhance their computational understanding of interaction with users
and thereby adapt their behavior in ways that intelligently respond to users’ needs.
Indeed, it is by having a greater awareness of context that context-aware systems
can become able to provide more intelligent services, in addition to rendering
interaction with users more intuitive and effortless. However, placing greater reli-
ance on knowledge of context, reducing interactions with users (minimizing input
from them and replacing it with knowledge of context), and providing intelligent
services signify that applications become invisible. Invisibility, the guiding principle of context-aware computing, has been a subject of much debate and criticism in recent years, for it poses a special conundrum and a real dilemma. This vision
remains of limited modern applicability.
This chapter examines, discusses, and classifies the different features of implicit
and natural HCI pertaining to ambient and multimodal interaction and user inter-
faces, intelligent agents, intelligent behavior (personalization, adaptation, respon-
siveness, and anticipation), and mental and physical invisibility, as well as related
issues, challenges, and limitations.

6.2 Definitional Issues, Research Topics, and Shifts in HCI

HCI, a branch of computer science, involves a wide variety of areas, including
mobile computing, ubiquitous computing, AmI, and AI. There is thus a large body
of work that deals with various HCI application domains or types of HCI-based
systems. HCI goes beyond computers (e.g., laptops, PCs) to include many other
interactive devices, such as mobile phones, cameras, sensors, PDAs, DVDs,
machines, and so on. It is also referred to as computer-human interaction (CHI) or
man-machine interaction (MMI). Like many concepts in computer science, HCI has
multiple definitions. Indeed, there is no agreed-upon definition of the range of topics
that form the area of HCI. In computer science, however, a common thread running
through most definitions of HCI is that it deals with the study, development, and
implementation of the interaction between users and computers. The Association
for Computing Machinery defines HCI as ‘a discipline concerned with the design,
evaluation and implementation of interactive computing systems for human use and
with the study of major phenomena surrounding them.’ (ACM SIGCHI 2009). HCI
is the process of communicating information from or presenting services by com-
puter systems via display units to human users as a result of their manipulation and
control of such systems by means of explicit or implicit input devices. Its special
concerns include: the joint performance of tasks by users and computers; the
structure of communication between users and computers; human capabilities to use
computers; algorithms and programing of user interfaces; engineering issues
relating to designing and building interfaces, the process of analysis, design and
implementation of interfaces; and design trade-offs (Ibid). HCI also deals with
enhancing usability and learnability of interfaces; techniques for evaluating the
performance of interfaces; developing new interfaces and interaction techniques;
the development and practical application of design methodologies to real-world
problems; prototyping new software and hardware systems; exploring new para-
digms for interaction (e.g., natural interaction); developing models and theories;
and so forth. HCI is of a highly interdisciplinary nature for it studies humans and
computers in conjunction. It integrates a range of fields of research and academic
disciplines, including engineering science, design science, and applied science, as
well as cognitive science, communication theory, linguistics, social anthropology,
and so on. Accordingly, it is concerned with scientific methodologies and processes
for investigating and designing interaction and user interfaces.
HCI has evolved over the last four decades and has been applied in various
application areas, and recently in context-aware computing. The idea of interaction
has evolved from an explicit timely bidirectional interaction between the human
user and the computer system to a more implicit multidirectional interaction. In
desktop applications, graphical user interfaces (GUIs) as commonly used approa-
ches are built on event based interaction, a direct dialog which occurs as a sequence
of communication events between the user and the system, whereby the basic idea
is to assign events to interactions performed by the user (e.g., pressing a button),
which are linked to actions, e.g., calls of certain functions (Schmidt 2005). Whereas
in new context-aware applications, the user and the system are in an implicit dialog
where the system is aware of the context where it operates through using natural-
istic, multimodal user interfaces combining graphical, facial, voice, gestures, and
motion interfaces. In all, designing interaction and user interfaces for context-aware
systems has its distinctive challenges, manifested in the complexity of the novel
forms that aim at illuminating interaction between users and computers, by making
interaction rich, smooth, intuitive, and reliable. This reflects a qualitative leap
crystallized into AmI as a paradigm shift of HCI.

6.3 HCI Design Aspects: Usability, Functionality, Aesthetics, and Context Appropriateness

The field of HCI studies has undergone some significant transitions. The focus of
research has shifted from tasks to actions and from laboratories to real-world set-
tings where people would want to use and experience new technologies. Academic
design studies of innovation highlight the importance of observing real people in
real life situations and encourage approaches that make user participation an
inseparable part of technology production (Kelley 2002). Studies in the field of HCI
have gone through a number of milestones, including the emphases on function-
ality, usability, and, more recently emotional computing and aesthetic computing.
Research within HCI has for long struggled to address many issues that affect the
amount of effort the user must expend to provide input for the system and to interpret
the output of the system, and how much effort it takes to learn how to perform this.
Dix et al. (1998) observe significant differences when it comes to usability and the
time needed to learn how to operate a system. Usability is a key characteristic of the
user interface; it is concerned with the ease with which a user interface can be used
by its target users to achieve defined goals. Usability is also associated with the
functionality of the computer software and the process to design it. Functionality
refers to the ability to perform a task or function, e.g., software with greater func-
tionality is one that is capable of serving a purpose well or can provide functions
which meet stated and implied needs as intended by its user. In software technology,
usability refers to the capability of the software to be understood, learned, used and
attractive to the user under specified conditions. In this context, it describes how well
a technological artifact can be used for its intended purpose by its target users in
terms of efficiency, effectiveness, and satisfaction. ISO 9241-11 (1998) suggests
measuring usability on three levels: effectiveness (i.e., information retrieval task),
efficiency (i.e., usefulness of time taken to do tasks), and satisfaction (fulfillment of
user’s needs). Usability of technology has been extensively researched in recent
years (e.g., Nielsen 1993; Norman 1988; Hix and Hartson 1993; Somervell et al.
2003; Nielsen and Budiu 2012).
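As a rough illustration, the three ISO 9241-11 levels can be operationalized from simple task logs and a post-test questionnaire; the particular formulas below (completion rate, completed tasks per minute, normalized mean rating) are common generic choices and are not prescribed by the standard itself.

# Illustrative operationalization of effectiveness, efficiency, and satisfaction
# from task logs and questionnaire ratings (formulas are generic choices only).

def effectiveness(task_results):
    """Share of tasks completed successfully."""
    return sum(task_results) / len(task_results)

def efficiency(task_results, total_time_minutes):
    """Successfully completed tasks per minute of use."""
    return sum(task_results) / total_time_minutes

def satisfaction(ratings, scale_max=5):
    """Mean questionnaire rating, normalized to the 0..1 range."""
    return (sum(ratings) / len(ratings)) / scale_max

results = [True, True, False, True]              # 3 of 4 tasks completed
print(effectiveness(results))                    # 0.75
print(efficiency(results, total_time_minutes=12.0))
print(satisfaction([4, 5, 3, 4]))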
In the context of AmI, usability has gone beyond efficiency and effectiveness to
include appropriateness of context of use—context awareness—for optimal satis-
faction of user’s needs. Context-aware computing promises a rich interaction
experience and a smooth interaction between humans and technology.
Context-aware applications provide intuitive interaction as well as ambient intel-
ligent services, namely adaptive, personalized, responsive, anticipative, and im-
mersive services. The so-called naturalistic, multimodal user interfaces are aimed at
reducing interaction with users by replacing it with knowledge of context with the
goal to reduce the physical and cognitive burden to manipulate and control appli-
cations and better serve the users, thereby increasing usability.
Usability not only represents the degree to which the design of a particular user
interface makes the process of using the system effective, efficient, satisfying, and
context-sensitive, but also takes into account emotional, cognitive and sociocultural
factors of users. It is recognized that HCI design that can touch humans in holistic
ways is fundamental in ensuring a satisfying user interaction experience. Alongside
the standard usability and functionality concerns there is an increasing interest in
questions concerning aesthetics and pleasure. Aesthetic and emotional computing is
another milestone which studies in HCI design has gone through. Particularly in the
area of AmI, design ideals have been confronted by visions of ‘emotional com-
puting’, and HCI research has identified the central position of emotions and aes-
thetics in designing user experiences and computer artifacts. Design aesthetics is of
focus in AmI systems. The basic idea is that high quality of design aesthetics can
profoundly influence people’s core affect through evoking positive affective states
such as sensuous delight and gratification. Aesthetics is thus associated with user’s
emotions. It is used to describe a sense of pleasure, although its meaning is much
broader including any sensual perception (Wasserman et al. 2000). ‘Aesthetics’
comes from the Greek word aesthesis, meaning sensory perception and under-
standing or sensuous knowledge. In a notable work, Udsen and Jorgensen (2005)
unravel recent aesthetic approaches to HCI. It has been realized that studies on
aesthetics in HCI have taken different notations of aesthetics (Ibid). Lavie and
Tractinsky (2004) provide a review of the different approaches to studying aesthetics
including studies in HCI. It is worth mentioning that aesthetics is a contested concept
in design of artifact. Since it is linked to emotions, it touches very much on cultural
context. Visual conventions have indeed proven not to be universal because per-
ception of aesthetics is subjective and socioculturally situated.
However, interfaces are increasingly becoming tailored to a wide variety of users
based on various specificities. In advocating user-centrality, HCI emphasizes the
central role of users in the design of technology, through allowing them to have far
greater involvement in the design process. Widely adopted principles of
user-centered-design (UCD) raise the perspectives of user and context of use to the
center of the design process. The premise of UCD, a common approach to HCI
design, is to balance functionality, usability, and aesthetic aspects. This requires
accounting for psychological, behavioral, social, and cultural variations of users as
a condition for building successful and acceptable interactive technologies.
Therefore, new directions of HCI design research call for more interdisciplinary
research endeavor to create new interactional knowledge necessary to design
innovative interactive systems in terms of social intelligence (see Chaps. 8 and 9) in
order to heighten user interaction experience. Social intelligence capabilities are
necessary for AmI systems to ensure users’ acceptability and pleasurability. All in
all, interactive computer systems should function properly and intelligently and be
usable, useful, efficient, aesthetically pleasant, and emotionally appealing—in short,
elicit positive emotions and pleasant user experiences. For a detailed discussion of
emotional and aesthetic computing, see Chaps. 8 and 9.

6.4 Computer User Interfaces

6.4.1 Key Characteristics

A user interface is the system by which, and the space where, users interact with
computers. Users tend to be more familiar with (and aware of) user interfaces as a component than with other external components of the whole computer system when
directing and manipulating it. This is due to the fact that users interact with the
systems in a multimodal fashion, using visual, voice, auditory, and tactile modal-
ities. User interfaces include hardware (physical), (e.g., input devices and output
units) and related software (logical) components for processing the received
information and presenting feedback information to the user on the computer
monitor. Computer user interfaces denote the graphical, textual and auditory
information the computer system presents to the user, and the control sequences,
such as keystrokes with the keyboard, movements of the pointing device, and
selections with the touch screen, the user uses to control the computer system.
Traditionally, computer user interfaces provide a means of input, allowing the users
to manipulate a system, and output, allowing the system to indicate the results of
the users’ manipulation. And the aim of interaction between a human user and a
computer system at the user interface is effective manipulation and control of this
system to achieve the goal for which the user is using it. And HCI design seeks to
produce user interfaces that make it easy to direct and manipulate a computer
system in ways that lead to the expected results. This means that the user needs to
provide minimal input to achieve the desired output, and also that the computer
minimizes undesired outputs to the human user.

6.4.2 Explicit HCI Characterization

Nowadays, humans predominantly interact with computers via the medium of
graphical user interfaces (GUIs) as an explicit form of interaction. This explicit HCI
approach works through a user conforming to static devices (e.g., keyboard, mouse,
touch screen, and visual display unit) using them in a predefined way. It therefore
involves input and output devices as well as related software applications, e.g., to
display menus and commands and to present information to the user on the screen.
The basic process of a user initiated explicit interaction involves the following
steps: (1) the user requests the system to carry out a certain action; (2) the action is
carried out by the computer, in modern interfaces providing feedback on this
process; (3) and the system responds with an appropriate reply, which in some cases
may be empty (Schmidt 2005).
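The three-step cycle described by Schmidt can be reduced to a plain request-response dispatch, sketched below with hypothetical command names; the point is simply that every action originates in an explicit user request and ends in a (possibly empty) reply.

# Explicit interaction: (1) user request -> (2) action with feedback -> (3) reply.
# Command names and handlers are hypothetical.

COMMANDS = {
    "open_file": lambda arg: f"contents of {arg}",
    "save_file": lambda arg: "",                  # a reply may be empty
}

def explicit_interaction(command, argument):
    print(f"working on '{command}' ...")          # (2) feedback while the action runs
    handler = COMMANDS.get(command)
    if handler is None:
        return "unknown command"
    return handler(argument)                      # (3) the system's reply

print(explicit_interaction("open_file", "notes.txt"))   # (1) explicit user request
print(explicit_interaction("save_file", "notes.txt"))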

6.4.3 Explicit HCI Issues

Explicit HCI is associated with a number of issues in terms of user interaction
experience. Explicit user interfaces force users to master different techniques to use
interfaces and thus direct and manipulate applications, hence the continuous
research within HCI to develop design methodologies that help create user inter-
faces that are usable, i.e., can be operated with ease, and useful, i.e., allow the user
to complete relevant tasks. They moreover restrict the range of the interaction and
force the user to react in a specific way to continue the ongoing task, adding to
the issue of response time difference, the time between the user interaction that is
carried out and the response of the system, which may have implications for the
user interaction experience. This usually occurs due to some inefficiency in
algorithms and programing of user interfaces, e.g., in the case of web browsing, a
related software application becomes unable (or fails) to locate, retrieve, present, and
traverse information resources on Web sites, including Web pages, images, video,
and other files. Explicit interaction requires a dialog between the user and the
computer, and this dialog ‘brings the computer inevitably to the center of the
activity and the users focus is on the interface or on the interaction activity.’
(Schmidt 2005). This form of interaction is obviously unsuitable to AmI applica-
tions, as explicit input is insufficient for such applications to function properly; they
rather require a great awareness of the user context, so that they can adapt their
functionality accordingly, i.e., in ways that better match the user needs—to some
extent though. It is simply difficult to imagine or achieve AmI with explicit
interaction only, irrespective of the modality. Regardless, explicit HCI does not take
into account the nonverbal behavior of users, leading some authors to characterize
computers as ‘autistic’ in nature (Alexander and Sarrafzadeh 2004). It is thus in
contrast to the visions of calm computing and UbiComp (Weiser 1991; Weiser and
Brown 1998). As Schmidt (2005, p. 162) observes, explicit interaction contradicts
the idea of AmI and disappearing interfaces, and therefore new interaction para-
digms and HCI models are required to realize the vision of an AmI environment
which can offer natural interaction.

6.4.3.1 Explicit User Interfaces

In computing, a number of explicit user interface types can be distinguished, among
which include:
• Batch interfaces are non-interactive user interfaces, where the user specifies all
the details of the batch job in advance of batch processing, and receives the
feedback (output) when the processing is done. Background processes in current
systems do not allow a direct dialog between the user and the program.
• Command line interfaces are text based interfaces and accept input by typing a
command string with the computer keyboard and the system provides output by
printing text on the computer monitor.
• Graphical user interfaces (GUIs) are covered above. Computing systems and
devices have become user-friendlier with the introduction of the GUIs addressing
the blank screen problem that confronted early computer users. The computer
gave the user no (visual) indication of what the user was to do next. As common
characteristics, GUIs consist of windows, icons, menus, and push-buttons—these
change the look and feel of a computer system, specifically the interface between
the human and the computer, and allow the user to concentrate on the task.
• Web user interfaces (WUI) accept input and provide output by generating web
pages which are transmitted via the Internet and viewed by the user using a web
browser.
• Natural-language interfaces are used for search engines and on webpages where a
user types in a question and waits for a response.
• Touch screens are displays that accept input by the touch of fingers or a stylus.
• Zooming user interfaces are graphical interfaces in which information objects
are represented at different levels of scale and detail: the user can change the
scale of the viewed area in order to show more detail.
Common to all these user interfaces is that the user explicitly requests an action
from the computer, the action is carried out by the computer, and then the system
responds with an appropriate reply.

6.5 The New Paradigm of Implicit HCI (iHCI)

6.5.1 Internal System Properties of iHCI

The main goal of AmI is to make computing technology simple to use and interact
with, intuitive, ubiquitous, and accessible to people with minimal knowledge by
becoming flexible, adaptable, and capable of acting autonomously on their behalf
wherever they are. This implies that there are five main properties for AmI or
UbiComp: iHCI, context awareness, autonomy, ubiquity, and intelligence. These
properties tend to have some overlaps among them in their concepts, e.g., iHCI involves context awareness, intelligence, and autonomy. However, there are
different internal system properties that characterize iHCI, among which include
(Poslad 2009):
• iHCI versus explicit HCI: more natural and less conscious interaction instead of
explicit interaction which involves more devices and thus results in human
overload. Computer interaction with humans needs to be more hidden as much
HCI is overly intrusive. Using implicit interaction, systems anticipate use.
• Embodied reality as opposite of virtual reality: Weiser (1991) positioned
UbiCom as an opposite of virtual reality, where computing devices are inte-
grated in the real-world—embodied in the physical and human environment—
instead of putting human users in computer-generated environments. UbiCom is
described as computers ‘that fit the human environment instead of forcing
humans to enter theirs.’ (York and Pendharkar 2004). Devices are bounded by
and aware of both physical and virtual environment so as to optimize their
operation in their physical and human environment, and thus users have access
to various services.
• Concept of calm or disappearing computer model: computer devices are too
small to be visible, embedded, and user interfaces are visible, yet unnoticeable,
becoming part of peripheral senses.
It may be useful to elaborate further on these system properties in the context of
AmI. The disappearance of user interfaces into our environment and from our perception
entails that the computing and networking technology (supporting these interfaces)
and their logic will physically disappear, i.e., technologies will be an integral part of
interactions and peripheral senses, and the technology behind them will be invisibly embedded in the everyday lifeworld and function unobtrusively in the background. Diverse, multiple sensors and other computing devices will be embedded in context-aware systems and spread throughout context-aware environments, serving to detect or capture implicit information about the user's various contextual elements (e.g., cognitive states, emotional states, (psycho)physiological states, social states, social dynamics, events, activities, physical environment and conditions, spatiotemporal setting, etc.), for analysis and estimation of what is going on in the user's mind, in his/her behavior, and in the physical, social, and cultural environments, and to execute
relevant context-dependent actions. In this way, the user will have full access to a
diverse range of services (e.g., personalized, adaptive, responsive, and proactive),
which will be delivered in a real-time fashion, with the environment appearing fully
interactive and reactive. Detecting and analyzing observed information for gener-
ating intelligent behavior is enabled and supported by flexible multimodal inter-
actions, using naturalistic user interfaces. Forms of implicit inputs, which support
natural forms of communication, allow context-aware applications and systems to
capture rich contextual information, which influences and fundamentally changes such applications and systems. Contextual elements are important implicit information about the user that the system can use to adapt its behavior intelligently. To approach the goal of HCI emulating natural interaction, it is crucial to include implicit elements into the interaction (Schmidt 2005).
forms of interaction and novel user interfaces is motivated by observing how
interaction between humans differs from HCI. As noted by Schmidt (2005, p. 159):
‘Observing humans interacting with each other and new possibilities given by
emerging technologies indicate that a new interaction model is needed’, e.g., cre-
ating naturalistic user interfaces that are capable of detecting as much information
as possible about the user’s context necessary for inferring an accurate high-level
abstraction of context, as such interfaces can employ multiple sensory modalities
and thus channels for information transmission and for interface (or system) control. The more channels are involved, the more robust the estimation of the user's context.
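To make this intuition concrete, the following minimal sketch (the channel names, confidence values, and simple additive fusion rule are illustrative assumptions, not a method described in the text) shows how per-channel estimates of a user's context might be combined, with a weak or missing channel offset by the others:

from collections import defaultdict

def fuse_channels(channel_estimates):
    # channel_estimates maps a channel name (e.g., 'facial', 'vocal',
    # 'gestural') to a dict of {context_label: confidence}. Channels that
    # are unavailable are simply omitted; the remaining ones still yield
    # an estimate, illustrating the robustness gained from more channels.
    scores = defaultdict(float)
    for estimates in channel_estimates.values():
        for label, confidence in estimates.items():
            scores[label] += confidence
    if not scores:
        return None, 0.0
    best = max(scores, key=scores.get)
    return best, scores[best] / sum(scores.values())

# Hypothetical readings from three channels; the vocal channel disagrees.
label, confidence = fuse_channels({
    "facial": {"frustrated": 0.7, "neutral": 0.3},
    "vocal": {"frustrated": 0.4, "neutral": 0.6},
    "gestural": {"frustrated": 0.8, "neutral": 0.2},
})
print(label, round(confidence, 2))  # frustrated 0.63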

6.5.2 iHCI Characterization

The new paradigm of iHCI is characterized by the definition of iHCI, as provided by Schmidt (2005, p. 164): ‘the interaction of a human with the environment and
with artifacts which is aimed to accomplish a goal. Within this process the system
acquires implicit input from the user and may present implicit output to the user’.
Here implicit input refers to ‘actions and behavior of humans, which are done to achieve a goal and are not primarily regarded as interaction with a computer, but captured, recognized and interpreted by a computer system as input’; and implicit output denotes an ‘output of a computer that is not directly related to an explicit input and which is seamlessly integrated with the environment and the task of the user’ (Ibid). The user's actions and behavior are contextual elements captured by a
system to adapt its functionality accordingly. Therefore, implicit forms of input and
output, and the process of acquiring the former from and presenting the latter to the
user are associated with context-aware systems and applications. The basic idea of
implicit interaction ‘is that the system can perceive the users interaction with the
physical environment and also the overall situation in which an action takes place.
Based on the perception the system can anticipate the goals of the user to some
extent and hence it may become possible to provide better support for the task the
user is doing. The basic claim is that…iHCI allows transparent usage of computer
systems. This enables the user to concentrate on the task and allows centering the
interaction in the physical environment rather than with the computer system’
(Schmidt 2005, p. 164).
Essentially, context-aware systems and applications involve both implicit and
explicit inputs and outputs—that is, context data are acquired from invisibly
embedded sensors (or software equivalents) as well as via keyboard, touchscreen,
pointing device, and/or manual gestures. According to Schmidt’s (2005) iHCI
model, explicit user interaction with a context-aware application is a way of
extending the context of the user in addition to being embedded into the context of
the user. Context-aware services execute service logic, based on information pro-
vided explicitly by end users and implicitly by sensed context information (Dey
2001; Brown et al. 1997). As to outputs, notwithstanding the use of explicit output
to a lesser extent in early context-aware systems, combining explicit and implicit
forms of output is increasingly gaining prevalence as a result of revisiting the notion
of intelligence and addressing the issue of ambiguity and disempowerment asso-
ciated with technology invisibility, which has for quite some time guided
context-aware computing. Pushing information towards and taking actions auton-
omously on behalf of the user was the commonly adopted approach in most
attempts to use context awareness within AmI environments.
The model of iHCI has a wide applicability, spanning a variety of application
domains and thus offering solutions to different problem domains relating to
context-aware computing, affective computing, and conversational agents, which
all involve context awareness at varying degrees. Applications that make use of
iHCI take the user’s context into account as implicit input, and respond to the user
accordingly through implicit output. The iHCI model, as proposed by Schmidt
(2005, p. 167), is centered on the standard HCI model ‘where the user is engaged with an application by a recurrent process of input and output’, and in it ‘the user's center of attention is the context… The interaction with the physical [social, cultural, and artificial] environment is also used to acquire implicit input. The environment of the user can be changed and influenced by the iHCI application’.
However, the type of implicit input a system can acquire when its user interacts
with the environment and its artifacts depends on the application domain, so too
does how implicit output influences and changes the environment of the user.
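As a rough illustration of this input-output loop (the sensor keys, thresholds, and adaptation rules below are hypothetical, not drawn from Schmidt's model), an iHCI application might acquire implicit input from the user's interaction with the environment and present implicit output by quietly adapting its behavior:

def acquire_implicit_input(sensor_readings):
    # Implicit input: context captured from embedded sensors, not from
    # anything the user deliberately typed or clicked (hypothetical keys).
    return {
        "activity": sensor_readings.get("activity", "unknown"),
        "ambient_light": sensor_readings.get("light", 1.0),
        "location": sensor_readings.get("location", "unknown"),
    }

def present_implicit_output(context):
    # Implicit output: adaptations seamlessly integrated with the task,
    # not direct replies to an explicit request.
    actions = []
    if context["activity"] == "reading" and context["ambient_light"] < 0.3:
        actions.append("raise display brightness")
    if context["location"] == "meeting_room":
        actions.append("silence notifications")
    return actions

# One pass of the implicit interaction loop with hypothetical readings.
ctx = acquire_implicit_input({"activity": "reading", "light": 0.2,
                              "location": "meeting_room"})
print(present_implicit_output(ctx))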
In all, to realize iHCI requires new interaction paradigms and novel methods for
design and development of user interfaces that make no assumptions about the
available input and output devices or usage scenarios and potential users in a
stereotypical way.

6.5.3 Analyzing iHCI: Basic Issues

Placing reliance on context information through recognizing, interpreting, and reasoning about context to infer new context data and reacting to it (by usually
performing application actions) is a process that is non-trivial and often extremely
difficult to realize. A central concern, in particular, is the issue of linking the
perceived context to actions—firing context-dependent actions. Analyzing applications relevant to iHCI, Schmidt (2005) identifies four basic issues that are central and necessary to address in order to create context-aware applications:
• ‘Perception as precondition. To create applications that offer iHCI capabilities it
is inevitable to provide the system with perception for context. This includes the
domains of sensing, abstraction, and representation
• Finding and analyzing situations relevant for the application. When applica-
tions are based on implicit interaction it becomes a central problem to find the
situations that should have an effect on the behavior of the system
• Abstracting from situations to context.
Describing a situation is already an abstraction. To describe what should have an
influence on applications classes of situations have to be selected which will
influence the behavior of an application
• Linking context to behavior. To describe an iHCI application classes of situa-
tions and in a more abstracted way contexts must be linked to actions carried out
by the system’ (Schmidt 2005, p. 166).
These basic issues relate to the generic contextual model, a basic multi-layered
architecture for context awareness, described in Chap. 3.
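A compact way to picture these four issues, from perception through situation and context abstraction to linked behavior, is sketched below; the sensor values, situation labels, and action table are illustrative assumptions rather than part of Schmidt's analysis or of the architecture in Chap. 3.

# Layers loosely mirroring the four issues: perception as precondition,
# finding relevant situations, abstracting situations to context, and
# linking context to behavior (all names and thresholds are hypothetical).
def perceive(raw):
    return {"noise_db": raw["mic"], "people": raw["presence"]}

def describe_situation(features):
    if features["people"] > 3 and features["noise_db"] > 60:
        return "crowded_and_loud"
    return "quiet"

CONTEXT_OF = {"crowded_and_loud": "meeting", "quiet": "solo_work"}
ACTION_FOR = {"meeting": "defer non-urgent alerts",
              "solo_work": "deliver alerts now"}

raw_reading = {"mic": 68, "presence": 5}
context = CONTEXT_OF[describe_situation(perceive(raw_reading))]
print(context, "->", ACTION_FOR[context])  # meeting -> defer non-urgent alerts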
The author underlines some imminent questions that arise when considering the
use and development of iHCI systems. One question is how to represent fuzzy
borders and dynamic thresholds. This is because ‘it is often not possible to describe
contexts, especially reflecting complex types of situations, in well-defined sets.’
Some techniques and approaches have been proposed (see Chaps. 4 and 5) in an
attempt to address this issue. As to the issue of interface stability when users interact
with a system, the two central questions are: how to achieve a balance between
stability and using dynamic concepts such as refractory periods and hysteresis and
how to keep the user in charge of the interaction and not wondering about the
actions taken by the system? On this note, the author argues that the trade-off
between stability in the interface and adaptation of the interface is a key issue to
address when designing context-aware systems. The main argument for stability, a
severe problem that occurs particularly in proactive applications (see below for
clarification), ‘is that humans picture the interface and know where to look for a
function. This spatial memorizing becomes much harder or even impossible if
interface keeps changing. The counter argument is that if adaptation works well the
right functions are always at hand and there is no need to memorize where they
are…For the design of context-aware systems these issues should be taken into
account and the trade-off should be assessed.’ (Schmidt 2005, pp. 175–176).
Another question is how implicit interaction can be tied in with or integrated into explicit interaction. This is based on the assumption that implicit interaction is rarely the only form of interaction, hence the significance of its integration with explicit interaction. This, too, is a severe problem that occurs particularly in proactive applications. A related question is how to resolve conflicting inputs when implicit and explicit user interaction go together. A final question mentioned by the author is how to deal with ambiguities in iHCI, given that implicit interaction is often ambiguous. Disambiguating implicit interaction is of critical importance in context-aware applications. Most of these questions relate to the
issues posed by the idea of the invisibility of technology. To iterate, the invisibility
of technology has been a subject of debate in the field of context-aware computing.

6.6 Natural Interaction and User Interfaces

6.6.1 Application Domains: Context-Aware, Affective, Touchless, and Conversational Systems

One interesting aspect of iHCI is the application feature of natural interaction. This is
a key aspect heralding a radical change to the interaction between user and technology, as computers will be ubiquitous and invisible, supporting human action,
interaction, and communication in various ways, wherever and whenever needed.
Using naturalistic user interfaces, AmI can anticipate and respond intelligently to
spoken or gestured indications of desires and wishes, and these could even result in
systems or agents that are capable of engaging in intelligent dialog (Punie 2003;
Riva et al. 2005). Utilizing implicit forms of inputs that support natural human forms
of communication, such as speech, facial movements, and gestural movements,
signifies that users will be able to interact naturally with computer systems in the way
face-to-face human interaction occurs. As one of the key human-like computational
capabilities of AmI, natural interaction has evolved as a solution to realize the full
potential of, and is one of the most significant challenges in, AmI. The idea of
mimicking natural interaction is to create computers that can emulate various aspects
of human interaction, using natural modalities, namely to understand and respond to
cognitive, emotional, social, and conversational processes of humans. The underlying assumption of augmenting AmI systems with human-like interaction capabilities and having them consider human intention, emotion, and behavior is that this enhances their intelligent functionality, with the aim of improving people's lives by providing a panoply of adaptive, responsive, proactive, immersive, and communicative services.
Context-aware systems are enabled by effortless interactions, which are attuned
to human senses and sensitive to users and their context. To approach the aim of
creating interaction between humans and systems that verge on natural interaction,
it becomes crucial to utilize natural forms of communication and therefore include
implicit elements into the interaction. In more detail, the basic idea of natural
interaction is that the system can recognize the user’s cognitive, emotional, and
psychophysiological states as well as actions, using verbal and nonverbal com-
munication signals (facial, vocal, gestural, corporal, and action cues). Based on this,
the system can select, fine-tune, or anticipate actions according to the context of the
task or to the emotional state of the user, therefore providing support for the task the
user is doing. With natural interaction capabilities, systems become able to detect,
understand, and adapt in response to the user's cognitive and emotional states. Therefore, user interfaces that support natural modalities are important for context-aware systems to be able to interact naturally with users and behave intelligently, providing services and supporting cognitive and emotional needs.
A context-aware user interface assumes ‘that things necessary for daily life embed
microprocessors, and they are connected over wireless network’ and that ‘user
interfaces control environmental conditions and support user interaction in a natural
and personal way. Hence, an ambient user interface is a user interface technology
which supports natural and personalized interaction with a set of hidden intelligent
interfaces’ (Lee et al. 2009, p. 458).
There is more to natural interaction than just recognizing the user’s cognitive or
emotional context as implicit input and adapting in response to it as implicit output.
In addition to supporting users in their daily tasks and activities and responding to
their emotional states, AmI systems can, thanks to the integration of affective
computing into AmI, detect users’ emotions and produce emotional responses that
have a positive effect on their emotions as well as appear sensitive and show empathy
to them and even help them improve their emotional intelligence abilities (e.g.,
Zhou and Kallio 2005; Zhou et al. 2007; Picard et al. 2001; Picard 1997).
Furthermore, AmI systems are capable of understanding and responding to speech and
gestures as commands to perform a variety of tasks as new forms of explicit inputs
(see, e.g., Kumar et al. 2007; Adjouadi et al. 2004; Sibert and Jacob 2000; Pantic
and Rothkrantz 2003; de Silva et al. 2004). Applications utilizing natural modalities
such as facial movement, eye gaze, hand gestures, and speech to execute tasks have
a great potential to reduce the cognitive and physical burden needed for users to
operate and interact with computer systems. In addition, natural interaction enables
AmI systems to engage in intelligent dialog or mingle socially with human users.
This relates to ECAs which are capable of creating the sense of face-to-face con-
versation with the human user, as these systems are able to receive multimodal
input and then produce multimodal output in nearly real-time (Vilhjálmsson 2009).
ECAs are concerned with natural interaction given that when constructing believ-
able conversational systems, the rules of human multimodal (verbal and nonverbal)
communication behavior are taken into account. ECAs ‘are capable of detecting and
understanding multimodal behavior of a user, reason about it, determine what the
most appropriate multimodal response is and act on this’ (ter Maat and Heylen
2009, p. 67). They involve, in addition to explicit interaction, implicit interaction in
the sense that to engage in an intelligent dialog with a human user, the conversa-
tional system needs to be aware of various contextual elements that surround the
multimodal communicative signals being received from the human user as explicit
input. These context elements which need to be captured as implicit input include:
the dialog context, the environmental context, and the cultural context (Samtani
et al. 2008). Therefore, conversational systems are, like context-aware systems,
based on the iHCI paradigm.
The subsequent chapters explore further cognitive and emotional context-aware,
affective, social, conversational, and touchless systems as HCI applications based
on natural interaction, along with a set of relevant examples of systems that have
been developed or are being developed.

6.6.2 Naturalistic User Interfaces (NUIs)

Augmenting AmI systems with natural interaction capabilities entails using user
interfaces that are ambient, perceptual, reactive, and multimodal—that is, naturalistic.
As a research area in HCI, the natural interaction paradigm aims to provide models and
methods for design and development of what has come to be known as NUIs. These
provide multiple means of interfacing with a system and several distinct tools and
devices for input and output. The most descriptive identifier of the so-called NUIs is
the lack of a physical keyboard, pointing device, and/or touchscreen. In other words,
NUIs are based on or use natural modalities of human communication, such as
speech, facial expressions, eye gaze, hand gestures, body postures, paralinguistic
features, and so on. It is worth noting that NUIs may have multi-functionality: they
can be used to acquire context as implicit input, to recognize emotions, to receive
commands in the form of spoken and gestured signals as new forms of explicit inputs,
and to detect multimodal communication behavior. Ideally, an AmI system should be
equipped with user interfaces that support all these functionalities and that can be
used flexibly in response to the user’s needs. NUIs include, and are not limited to:
• Facial user interfaces are graphical user interfaces which accept input in the form of facial gestures or expressions.
• Gesture interfaces are graphical user interfaces which accept input in the form of hand or head movements.
• Voice interfaces accept input and provide output by generating voice prompts.
The user input is made by responding verbally to the interface. In this context,
verbal signals can be used by computers as commands to perform tasks.
• Motion tracking interfaces monitor the user’s body motions and translate them
into commands.
• Eye-based interface is a type of interface that is controlled completely by the
eyes. It can track the user’s eye motion or movement and translate it into a
command that a system can execute to perform such tasks as scrolling up and
down, dragging icons, opening documents, and so on.
• Conversational interface agents attempt to personify the computer interface in
the form of an animated person (human-like graphical embodiment), and present
interactions in a conversational form.

6.6.3 Multimodality and Multi-channeling in Human Communication

Multi-channel and multi-modal are two terms that tend to be mixed up or used
interchangeably. However, they refer to quite distinct ideas of interaction between
humans and between humans and computers (HCI). In human–human communi-
cation, the term ‘modality’ refers to any of the various types of sensory channels.
These are: vision, hearing, touch, smell and taste. Human senses are realized by
different sensory receptors. The receptors for visual, auditory, tactile, olfactory, and
gustatory signals are found in, respectively, the eyes, ears, skin, nose, and tongue.
Communication is inherently a sensory experience, and its perception occurs as a
multimodal (and thus multi-channel) process. Multimodal interaction entails a set of
varied communication channels provided by a combination of verbal and nonverbal
behavior involving speech, facial movements, gestures, postures, and paralinguistic
features, using multiple sensory organs. Accordingly, one modality entails a set of
communication channels using one sensory channel and different relevant classes of
verbal and nonverbal signals. Basically, nonverbal communication involves more
channels than verbal communication, including space, silence, touch, and smell, in
addition to facial expressions, gestures, and body postures. Indeed, research sug-
gests that nonverbal communication channels are more powerful than verbal ones;
nonverbal cues are more important in understanding human behavior than verbal
ones—what people say. Particularly, visual and auditory modalities, taken sepa-
rately, can enable a wide range of communication channels, irrespective of the class
of verbal and nonverbal signals (see next chapter for examples of channels). These
modalities and related verbal and nonverbal communication behaviors are of high
applicability in HCI, in particular in relation to context-aware computing, affective
computing, and conversational agents.

6.6.4 Multimodal Interaction and Multimodal User Interfaces

The terms ‘mode’ and ‘modality’ usually refer to how someone interacts with an
application, which depends on the intended use of that application, how they
provide input to the application, and how output is provided back to them. In HCI, a
modality is a sense through which the human can receive the output of the computer
and a sensor or an input device through which the computer can receive the input
from the human. It is a path of communication employed by the user interface to
carry input (e.g., keyboard, touchscreen, digitizing tablet, sensors) and output (e.g.,
display unit or monitor, loudspeaker) between the human and the computer.
For user input, the visual modalities typically require eyes only, whereas auditory
modalities require ears only, and tactile modalities require fingers only.
The combination of these modalities is what entails multimodal interfaces. In other
words, one is dealing with a multimodal interface when one can both type and speak (e.g., using vocal signals to send commands to a computer so that it can perform a given task), and both hear and see. The benefit of multiple input modalities is increased usability, as mentioned above; in the case of new forms of explicit input, for example, a message may be quite difficult to type (cognitively demanding) on a mobile phone with a small keypad but very easy to communicate verbally. Another benefit in the context of context-aware computing, affective
computing, and conversational agents is the accurate detection of a user’s emotional
state, the robust estimation of a user’s emotions, and the disambiguation of com-
municative signals (mapping detected multimodal behavior to intended emotional
communicative functions), respectively. In all, the weakness or unavailability of
one modality is offset by the strength or availability of another.
Furthermore, while auditory, visual, olfactory, and tactile modalities are the fre-
quently used ones in human-to-human communication, HCI commonly uses audi-
tory, visual, and tactile (mostly to carry input) modalities, given the nature of the
interaction—based on computational processes and artificial agents. However, there
are other modalities through which the computer can send information to the human
user, such as the tactile modality (e.g., the sense of pressure) and the olfactory modality.
Based on the above reasoning, multimodal interaction in HCI, comprising
mostly visual, auditory, and tactile modalities, provides multiple modes for the user
to interface with a system, including artificial and natural modes (e.g., keyboard,
mouse, touchscreen, explicit or/and implicit human verbal and nonverbal signals,
etc.). Hence, multimodal user interfaces provide several distinct tools for input and
output of data. Depending on the application, interfaces that may be integrated in a
multimodal user interface include, and are not limited to: web user interface (WUI),
natural-language interface, touchscreen display, zooming user interface, as well as
voice interface, speech interface, facial interface, gesture interface, motion interface,
and conversation interface agent. In the context of ECA, a conversation interface
agent involves inherently many interfaces, especially those associated with natural
modalities, as they may all be needed in a face-to-face conversation. In HCI, an
ECA represents a multimodal interface that uses natural modalities of human
conversation, including speech, facial gestures, hand gestures, and body stances
(Argyle and Cook 1976). In the context of emotional context-aware applications,
multimodal user interfaces allow capturing emotional cues as context information
from different communication channels using both visual and auditory sensory
modalities and relevant classes of verbal or nonverbal signals.

6.6.5 Context Awareness, Multimodality, Naturalness, and Intelligent Communicative Behavior in Human Communication: A Synergic Relationship

Communication and interaction between humans, as a natural form of communication and interaction, is highly complex and manifold. It involves invariably context awareness, multimodality, naturalness, and intelligence. Responding intelligently in human-to-human communication is determined by the way context
is perceived (meaning attachment to such entities as places, settings, people,
objects, etc.) and its evolving patterns are monitored in a specific situation and in a
particular environment. Context consists of specific aspects that characterize a
specific situation, a certain interpretation of some situational features. And situation
denotes everything that surrounds the communication, including the sociocultural
conventions; roles and knowledge of the participants; communication goals; local,
social, and physical environment, and so on. Accordingly, the situation in which the
communication takes place provides a common ground that generates implicit
conventions and calls for the implicitly shared common knowledge, i.e., internal
models of the world and language model, which influence and to some extent set
the rules for interaction, including communicative actions, as well as provide a key
to decode the meaning of verbal and nonverbal communicative behavior. However,
intelligence is specifically relevant for interpreting, reasoning about, and responding
to communication signals—communicative intents/functions and behaviors—
associated with a particular conversational act as part of interaction that is shaped
by a given context as an expression of a certain interpretation of a situation where
that interaction takes place. How intelligent a response, a communicative behavior
can be is determined by how well it fits with the contextuality and situatedness
underlying the interaction in terms of how accurately communicative intents are
read and communicative behaviors are discerned. In other words, the context and
situation are the key to the ‘right’ meaning, as, particularly in relation to commu-
nicative behavior, words, facial expressions, gestures and so on have often many
different meanings. To respond intelligently in human interaction entails figuring
out how appropriately a context tells you to act under specific circumstances, as
context may provide an array of ways or different possibilities of reacting.
Multimodality is concerned with the perception of verbal and nonverbal signals,
the degree and quality of which is most likely to have implication for the inter-
pretation of the meaning of communicative intents and behaviors, which, in turn,
augment the chance to deliver an intelligent response. This is because these intents
and behaviors also reveal or convey contextual information (e.g., cognitive, emo-
tional, psychophysiological states) beyond the context of dialog, the culture, and the
environment. Such contextual information also changes the interaction and thus the
ongoing conversational act. Constructivist theory posits that human interaction is
always contextually situated, and it is within this evolving context that meaning is
assigned to this interaction. Context defines and also influences interaction due to
perception of emergent contextual variables or re-interpretation of situation, which
entails including more modalities and thus channels as carriers of new information.
Accordingly, cognitive, emotional, and social behaviors in human interaction
provide information that changes the current context and hence how the interaction
evolves.
There is a synergic relationship between context awareness, multimodality,
naturalness, and intelligence. This is what typifies the applications that are based on
implicit and natural interaction paradigms, including context-aware systems,
affective systems, and conversational systems. For example, context-aware systems
use naturalistic, multimodal user interfaces in order to be able to recognize the
context in which they are being used so that they can adapt their functionality according to that context, thereby reacting and pre-acting intelligently to the user's spoken and ges-
tured indications. Intelligent behavior of such systems requires an accurate detection
of context data, which necessitates using multiple input modalities, and complex
interpretation and reasoning processes to be able to infer a relevant high-level
abstraction of context, which determines the best way to behave or act, by providing
services that match the user’s immediate needs. Context awareness thus enables
adaptation, responsiveness, and anticipation in terms of service provision as an
intelligent behavior. And verbal and nonverbal communication behavior provides
context-aware systems with the possibility to implicitly acquire context information,
which intelligent agents use to perform further means of processing and to take
actions autonomously. Context-aware systems are increasingly moving towards
NUIs that are ambient, multimodal, and intelligent. In all, for context-aware appli-
cations to be able to provide intelligent services to users they must be equipped with
naturalistic user interfaces to allow acquiring rich and accurate information about
user context, and thus robust estimation of high-level context abstractions.

6.7 Intelligence and Intelligent Agents

The ‘intelligence’ alluded to in AmI pertains particularly to the environments, networks, devices, and actions where it resides and manifests, and to its associations with aspects of human functioning in terms of cognitive, affective, and behavioral processes and with established concepts of AI and cognitive science. The areas of AI that have
been integrated into AmI encompass: cognitive intelligence in relation to context
awareness, emotional computing, social intelligence, and conversational intelligence
and what these entail in terms of sensing, machine learning/pattern recognition,
modeling and reasoning, and actuators/effectuators. AI is the branch of computer
science that is concerned with understanding the nature of human intelligence and
creating computer systems capable of emulating intelligent behavior (see Chap. 9 for
description of AI and its contribution to AmI). AmI relates to AI in that it deals with
intelligent systems that possess human-inspired cognitive, emotional, social, and
conversational intelligence in terms of both computational processes and behaviors.
Intelligent agents are an important and common topic in the literature on and a
major research area in AI and AmI alike. The intelligent agent as a paradigm
became widely acknowledged during the 1990s (Russell and Norvig 2003; Luger
and Stubblefield 2004). This period also marked the emergence of the vision of
UbiCom—in the early 1990s—and the vision of AmI—in the late 1990s. In computer science, namely in the fields of AI, HCI, AmI, UbiCom, and mobile computing, the term ‘intelligent agent’ may be used to describe a software agent that has some intelligence. Having originated in AI, the term may be considered an umbrella term for an intelligent agent in AmI.

6.7.1 Intelligent Agents in AI and Related Issues

In AI, Russell and Norvig (2003) define an intelligent agent as an autonomous
entity which observes an environment using sensors and acts upon it using actuators
(i.e., it is an agent that actuates systems by responding to command stimulus or
control signals) or effectors (i.e., it is an agent that produces a desired change in an
object in response to input), and directs its activity towards achieving goals.
Therefore, intelligent agents are characterized by autonomy, reaction to the envi-
ronment, goal-orientation, and persistence (Franklin and Graesser 1997). Here
persistence entails code that runs continuously and decides for itself when it
should perform some action, and autonomy refers to the ability to select and pri-
oritize tasks, make decisions, and change or display new behavior, which is based
on the experience of the system and hence without human intervention. Russell and
Norvig (1995, p. 35) state: ‘A system is autonomous to the extent that its behavior is determined by its own experience.’ It is worth noting that while some definitions of
intelligent agents emphasize their autonomy, thereby preferring the term autono-
mous intelligent agents, others (notably Russell and Norvig 2003) consider
goal-directed behavior as the essence of intelligence, thus preferring rational agents.
As echoed by Wooldridge and Jennings (1995), an agent is defined in terms of its
behavior. In this line of thinking, Kasabov (1998) describes an intelligent agent
system as one that should adapt online and in real-time; be able to analyze itself in
terms of behavior and success; learn and improve through interaction with the
environment; learn quickly from large amounts of data; and accommodate new
problem solving rules incrementally, among others. It can be noticed that intelligent
agents in AI are closely related to software agents in terms of being capable of behaving intelligently; hence, the term ‘intelligent agent’ is also used to describe a
software agent that has some intelligence. In distinguishing intelligent software
agents from intelligent agents in AI, Russell and Norvig (2003) point out that
intelligent agents are not just software programs, they can also be anything that is
capable of goal-directed behavior, such as machines, human beings, or a community or organization of human beings working together towards a goal. For example, ECAs are agents that have a human-like graphical embodiment and possess the ability to
engage people in face-to-face conversation (Cassell et al. 2000) to mingle socially.
Furthermore, depending on the application domain, AI agents or systems
encapsulate a wide variety of intelligent subagents, including input agents, pro-
cessing agents, mapping agents, decision agents, actuating agents, world agents,
physical agents, temporal agents, and so forth. As an example of a world agent,
an ECA, a believable agent that exhibits a personality via the use of an artificial character (graphical embodiment) for the interaction, incorporates a wide range of
subagents or classes of agents to be able to conduct an intelligent dialog as an
autonomous behavior with a human user. Moreover, the idea of a multi-agent system,
a system involving a number of varied sets of subagents, can be better illustrated
through a social phenomenon. This occurs, according to Smith and Conrey (2007),
as the result of repeated interactions between multiple individuals and these
interactions can be looked at as a multi-agent system involving multiple (sub)agents
interacting with each other and/or with their environments where the outcomes of
individual agent’s behaviors are interdependent in the sense that each agent’s ability
to achieve its goals depends on what other agents do apart from what it does itself.

6.7.1.1 Five Classes of Intelligent Agents—AI and AmI

As to the structure of an intelligent agent in AI, an agent program can be described
as an agent function which maps every possible percept sequence the agent
receives to a possible action the agent can perform (Russell and Norvig 1995,
2003). In AmI, an intelligent agent (e.g., a context-aware system) performs actions
based on the interpretation of and reasoning about the context data acquired from
sensors in its environment together with context models, thereby linking between
situation and action. A software agent encodes bit strings as its percept and action
(Russell and Norvig 1995). In both AmI and AI, the agent’s percept/implicit input
is mapped to an action that usually reflects, based on the goal of the agent, a
different level of intelligent behavior, e.g., reactive/adaptive, responsive, or pro-
active (see below for an elaborative discussion). Indeed, agents differ in their degree
of perceived intelligence and capability upon which five classes of agents can be
clustered: simple reflex agents, model-based reflex agents, goal-based agents,
utility-based agents, and general learning agents (Russell and Norvig 2003). These
are described below with attempts to link them to some AmI applications.

Simple Reflex Agent

A simple reflex agent (see Fig. 6.1) observes the current environment—percept—
and acts upon it, ignoring the rest of the percept history. Its function is based on the
condition-action rule: if condition then action, and only succeeds when the envi-
ronment is fully observable. Otherwise—if operating in partially observable envi-
ronments—infinite loops become unavoidable, unless the agent can randomize its actions.
The simple reflex agent may be used in systems that incorporate the basic concept
of iHCI, i.e., they use situations as implicit elements to trigger the start of systems. In
most of these systems there is a direct connection between the situation and the action
that is executed. That is, these systems carry out a predefined action when certain
context is recognized—if-then rule. A simple reflex agent works only if the correct
decision can be made on the basis of the current percept (Russell and Norvig 1995).
Thus, recognition of the situation, the interpretation, and the reaction is simple to
describe, as shown in Fig. 6.1. A common example is an automatic outdoor lantern
device. Such lights are found at the entrance and the floor-levels of buildings.
Whenever a person approaches the entrance and it is dark, the light switches on automatically. A simple sensor is used to detect the situation of interest, which is
hard-coded with an action (switching on the light for a certain period of time).

Fig. 6.1 Simple reflex agent. Source Russell and Norvig (2003)
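The lantern example reduces to a single condition-action rule, as in this minimal sketch (the percept keys and the 60-second duration are assumptions made for illustration):

def simple_reflex_lantern(percept):
    # Simple reflex agent: acts on the current percept only, ignoring
    # the percept history (condition-action rule).
    if percept["is_dark"] and percept["person_approaching"]:
        return "switch light on for 60 seconds"  # hard-coded action
    return "do nothing"

print(simple_reflex_lantern({"is_dark": True, "person_approaching": True}))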

Model-Based Reflex Agent

Due to storing internal models of the world, a model-based reflex agent (see
Fig. 6.2) can handle a partially observable environment. In other words, the agent's current state is stored inside the agent, which maintains some sort of knowledge
representation of ‘how the world works’ representing the part of the world that
cannot currently be seen. This knowledge structure (internal model) depends on the
percept input history and thus reflects some of the unobserved aspects of the current
state. Like the reflex agent, the model-based agent's function is based on the
condition-action rule: if condition then action. The model-based reflex agent
resembles intelligent software agents that are used in context-aware applications
which are based on ontological or logical context models, e.g., activity-based
context-aware applications.

Fig. 6.2 Model-based reflex agent. Source Russell and Norvig (2003)
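A minimal sketch of how an internal model supports acting under partial observability follows; the state variable, update rule, and thresholds are illustrative assumptions, not an implementation from the cited literature.

class ModelBasedReflexAgent:
    # Keeps an internal model of 'how the world works' so it can act even
    # when the current percept does not reveal the full state of the world.
    def __init__(self):
        self.state = {"occupant_present": False}  # unobserved aspect

    def update_model(self, percept):
        # Entry and exit events persist in the model between percepts.
        if percept.get("door_event") == "entered":
            self.state["occupant_present"] = True
        elif percept.get("door_event") == "left":
            self.state["occupant_present"] = False

    def act(self, percept):
        self.update_model(percept)
        # Condition-action rule evaluated against model plus current percept.
        if self.state["occupant_present"] and percept.get("temperature", 21) < 18:
            return "turn heating on"
        return "do nothing"

agent = ModelBasedReflexAgent()
agent.act({"door_event": "entered"})
print(agent.act({"temperature": 16}))  # heating on, although no one is sensed now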

Fig. 6.3 Model-based, goal-oriented agent. Source Russell and Norvig (2003)

Goal-Based Agent

A goal-based agent (see Fig. 6.3) uses goal information, which describes situations
that are desirable, to expand on the capabilities of the model-based agent. This
added capability allows the agent to select among the multiple available possibil-
ities the one which reaches a goal state. This stems from the fact that awareness of
the current state of the environment may not always be enough to decide an action.
Search and planning are devoted to finding action sequences that reach the agent’s
goals. This agent is characterized by more flexibility due to the explicit represen-
tation and the modification possibility of the knowledge that supports its decisions.
Also, decision making is fundamentally different from the condition-action rules, in
that it involves consideration of the future. Additionally involving an internal model of ‘how the world works’, the goal-based agent may be relevant to AmI systems that
provide predictive services.
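As a toy illustration of goal-directed behavior (the state space, actions, and goal below are invented for the example), the agent searches for an action sequence that reaches a desirable goal state instead of firing a fixed condition-action rule:

from collections import deque

def plan(start, goal, transitions):
    # Breadth-first search for an action sequence that reaches the goal state.
    frontier = deque([(start, [])])
    visited = {start}
    while frontier:
        state, actions = frontier.popleft()
        if state == goal:
            return actions
        for action, nxt in transitions.get(state, []):
            if nxt not in visited:
                visited.add(nxt)
                frontier.append((nxt, actions + [action]))
    return None

# Hypothetical smart-home states and the actions connecting them.
transitions = {
    "lights_on_blinds_open": [("dim lights", "lights_dim_blinds_open")],
    "lights_dim_blinds_open": [("close blinds", "movie_mode")],
}
print(plan("lights_on_blinds_open", "movie_mode", transitions))
# ['dim lights', 'close blinds']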

General Learning Agent

A learning agent (see Fig. 6.4) is able to initially operate in unknown environments
and becomes more knowledgeable than its initial knowledge alone might allow. It
entails three distinctive elements: the ‘learning element’, which is responsible for
making improvements, the ‘performance element’, which is responsible for
selecting external actions, and the ‘problem generator’, which is responsible for
suggesting actions that will lead to new experiences. For future improvement, the
learning element employs feedback from the ‘critic’ on how the agent is performing
and determines accordingly how the performance component should be adapted.
The learning agent corresponds to what machine learning techniques, notably unsupervised learning algorithms, entail with regard to context recognition; in particular, the performance component represents the entire agent: it takes in (implicit) percepts and decides on (implicit, context-dependent) actions. The learning agent is relevant to activity-based or cognitive context-aware applications.

Fig. 6.4 General learning agent. Source Russell and Norvig (2003)
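A heavily reduced sketch of the learning loop follows (it omits the problem generator, and the action names, feedback scheme, and learning rate are assumptions used only for illustration): the performance element chooses actions, and the learning element uses the critic's feedback to adjust future choices.

class LearningAgent:
    def __init__(self):
        # Learned preference values for each candidate action (hypothetical).
        self.preference = {"notify_now": 0.0, "defer": 0.0}

    def performance_element(self):
        # Select the action currently believed to work best.
        return max(self.preference, key=self.preference.get)

    def learning_element(self, action, feedback):
        # Critic feedback (+1 accepted, -1 dismissed) nudges the preference.
        self.preference[action] += 0.1 * feedback

agent = LearningAgent()
for _ in range(3):
    action = agent.performance_element()
    feedback = -1 if action == "notify_now" else +1  # user dislikes interruptions
    agent.learning_element(action, feedback)
print(agent.performance_element())  # defer: the agent learned the user's preference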

Utility-Based Agent

Unlike a goal-based agent which only differentiates between goal states and
non-goal states, a utility-based agent (see Fig. 6.5) can define a measure of how
desirable a particular state is compared to other states. Comparing different world
states is done, using a performance measure, on the basis of ‘how happy they would
make the agent’, a situation which can be described using the term utility. In this
sense, a utility function is used to map ‘a state to a measure of the utility of the
state’, i.e., onto a real number that describes the associated degree of happiness. The
concept of ‘utility’ or ‘value’, a measure of how valuable something is to an
intelligent agent, is based on the theory of economics, and used in computing to
make decisions and plans. With the probabilities and utilities of each possible action
outcome, a rational utility-based agent selects, based on what it expects to derive,
the action that maximizes the anticipated utility of the action outcomes. Perception,
representation, reasoning, and learning are computational processes that are used by
a utility-based agent to model and keep track of its environment. The computational
tools that analyze how an agent can make choices or decisions include such models
as dynamic decision networks, Markov decision processes, and game theory. Many
of the computational processes underlying the utility-based agent seem to have
much in common with supervised learning algorithms for context recognition. The
utility-based agent can thus be used in location-, activity- and emotion-based
context-aware applications.

Fig. 6.5 Utility-based agent. Source Russell and Norvig (2003)
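A minimal sketch of expected-utility action selection follows; the candidate actions, outcome probabilities, and utility values are invented for the purpose of illustration.

def expected_utility(outcomes):
    # Sum of probability-weighted utilities over one action's possible outcomes.
    return sum(p * u for p, u in outcomes)

def choose(actions):
    # Rational utility-based choice: pick the action whose anticipated
    # (expected) utility is highest.
    return max(actions, key=lambda a: expected_utility(actions[a]))

# Hypothetical outcomes given as (probability, utility) pairs.
actions = {
    "interrupt user with a reminder": [(0.6, 2), (0.4, -5)],   # may annoy
    "show reminder on ambient display": [(0.9, 1), (0.1, 0)],
}
print(choose(actions))  # show reminder on ambient display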

6.7.2 Intelligent Agents in AmI and Related Issues: Context-Aware Systems

Intelligent agents are closely related to software intelligent agents, an autonomous
computer or software program that perceives a context and behaves accordingly,
carrying out tasks on behalf of users using effectors. This is associated with AmI (or
iHCI) applications like context-aware applications. AmI emphasizes both the
autonomy and behavior of intelligent software agents. One of the cornerstones of
AmI is the autonomous, adaptive behavior of systems in response to the user’s
cognitive, emotional, physiological, and social states. In this context, autonomy
denotes the ability to interpret and reason about context, make knowledge-based
decisions or choices, and execute actions or exhibit new behavior. Autonomy is
usually based on the experience of the system in terms of learning of context
models using machine learning techniques (see below for more detail) and/or on the
use of description logic knowledge bases, i.e., ontological context repositories.
Behavior, on the other hand, entails learning from the user and improving through
interaction with the user environment to build experience as well as adaptation,
responsiveness, and proactivity (see below for more detail). Agents are autonomous
and have flexible behavior, i.e., possessing reactive, proactive, and social abilities
(Wooldridge 2002). A social ability here means an agent's capability to engage other
components through communication and coordination, which may collaborate on a
particular task, e.g., a pattern recognition algorithm which is based on a hybrid
approach to context modeling and reasoning (e.g., ontological, probabilistic, and logical programming approaches). The benefit of ontological context models lies in
providing shared knowledge models that improve automated processing capabilities
by allowing software intelligent agents to soundly interpret and reason about
context information, therefore enabling intelligent decision making in a knowl-
edgeable manner. However, besides epitomizing a complex intelligent agent, a
context-aware application essentially involves a set of subagents. To perform its
functionality, it is assembled in a hierarchical structure containing many subagents
that perform functions at different levels. Input agents perform signal processing,
computation, processing: process and make sense of sensor inputs—e.g., neural
network or HMMs based agents. In ontology-based recognition algorithms,
semantic mapping agents map sensor readings to corresponding properties in
context ontologies, thereby collecting semantic information to generate a context.
Reasoning or inference agents reason and make assumptions about the context and
the relevancy of the services to be delivered based on semantic information and
inference rules. Decision agents make decisions on what actions to perform at the
application level. Application agents fire relevant actions using effectors. Taken
together, the intelligent agent and subagents form a complete context-aware system
that can deliver services to the user with behaviors or responses that exhibit a form
of intelligence. Intelligent agents can also be integrated to form a world agent,
which comprises various classes of intelligent agents. For example, an ECA
involves a set of integrated intelligent agents, including a context-aware agent for
acquiring information about the user’s environment and culture, a multimodal
behavior agent for reading communicative signals, an affective agent for recog-
nizing and responding to emotions, and an emotionally intelligent agent for man-
aging emotional responses, to allow an autonomous behavior. These agents should
all be incorporated in a conversational system, so it can engage in an intelligent
dialog or mingle socially with a human user. In all, intelligent agents are key
components of AmI systems and undergird their autonomous intelligent behavior:
they carry out tasks on behalf of users by detecting, interpreting, and reasoning
about information, making decisions, performing actions, and exploiting the rich
sets of services available within AmI environments.
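The division of labor among these subagents can be pictured, very schematically, as a chain of functions; the names, the heart-rate feature, and the single rule at each stage are assumptions introduced purely for illustration.

def input_agent(sensor_stream):
    # Signal processing: turn raw readings into a feature.
    return {"heart_rate": sum(sensor_stream) / len(sensor_stream)}

def mapping_agent(features):
    # Map features to semantic properties of a context ontology.
    return {"arousal": "high" if features["heart_rate"] > 100 else "normal"}

def reasoning_agent(semantics):
    # Infer a higher-level context from the semantic information.
    return "user_under_stress" if semantics["arousal"] == "high" else "user_calm"

def decision_agent(context):
    # Decide which application-level action fits the inferred context.
    return "postpone notifications" if context == "user_under_stress" else "proceed normally"

def application_agent(action):
    # Fire the action through an effector (here, simply report it).
    return "executing: " + action

# Hypothetical heart-rate samples flowing through the whole chain.
print(application_agent(decision_agent(reasoning_agent(
    mapping_agent(input_agent([112, 118, 121]))))))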
The function of the service delivery subagent is associated with a key aspect of
the intelligent behavior of AmI systems as an ensemble of intelligent subagents,
namely adaptation, proactiveness, and responsiveness. These represent types of
services that are provided to users as complex goals—e.g., to support their cognitive,
emotional, and social needs. As to adaptation, an intelligent agent perceives the
context in which it operates and adapts to it appropriately. Specifically, it detects,
interprets, and reasons about, for example, a user’s cognitive or emotional state;
determines the most appropriate action; and then carries it out. With regard to
proactiveness, the intelligent agent makes decisions based on predictions or
expectations about the near future, and acts on behalf of users. In other words, it
learns from users’ behavior in order to anticipate their future needs and
self-initiatively perform tasks designed to make their life easier. Here, the agent
explicitly takes into account possible future events, which is not the case for
adaptation whereby the system is inherently reactive because the decision making is
based on the current context with no explicit regard to the future. There is an
assumption in AmI that the software agent should be so intelligent that it can
anticipate the user’s behavior and predict the user’s intentions. AmI represents
technology that can think on its own and predict, adapt, and respond to users’
needs. As to responsiveness, the intelligent agent detects and interprets emotional
cues as multimodal behavior, reasons about them, determines what the most appropriate
response is, and acts on it. It is worth noting that service-based behaviors and
responses involve both effectors and actuators (physical actors) to act, react, and
pre-act based either on pre-programmed heuristics (using ontologies) or real-time
reasoning (using machine learning) capabilities. AmI service types are discussed
further in the next section.
Learning is a key characteristic of the behavior of intelligent agents. It serves
AmI systems to build experience on various types of contexts in a large variety of
domains as well as their relationships, as in real-world scenarios. This is used
primarily to classify or infer new contexts and predict users’ behaviors and actions.
To iterate, it is the experience of the intelligent agent that determines the behavior
of an autonomous system. Machine learning is used to augment AmI systems with
the ability to learn from the user’s context (e.g., states, behaviors) by building and
refining models, specifically in relation to supervised learning algorithms which
keep track of their earlier perceived experiences and employ them to learn the
parameters of the stochastic context models in a dynamic way. This enables AmI interfaces (agents) to learn from users’ states or behaviors in order to anticipate their future needs, in addition to recognizing new or unknown contextual patterns.
However, the difficulty with intelligent agents is that they can become unpredict-
able. A consequence of the ability of software intelligent agents to learn, to adapt, and to self-initiatively change their configuration and even their program structure is that they can react differently to the same control signals at different
points in time. The more intelligent agents learn, the less predictably they behave
(e.g., Rieder 2003).
In all, AmI systems involve various autonomous active devices which entail the
employment of a range of artificial and software intelligent agents. These include,
and are not limited to: push and pull agents (context-aware applications), world
agents, physical agents, distributed agents, multi-agents, and mobile agents. World agents incorporate an amalgam of classes of agents to allow
autonomous behaviors. Physical agents perceive through sensors and act through
actuators. Distributed agents are executed on physically distinct (networked)
computers; multi-agent systems are distributed agents that do not have the capa-
bilities to achieve a goal alone and therefore must communicate; and mobile agents
are capable of relocating their execution onto different processors (Franklin and
Graesser 1997). Indeed, an intelligent software agent could run on a user’s com-
puter but could also move around on and across various networks and while exe-
cuting its task, it can collect, store, process, and distribute data.
An AmI intelligent agent is assumed to possess human-like cognitive, emotional,
and social skills. It is claimed by AmI computer scientists that computers will have
a human-like understanding of humans and hence will affect their inner world by
undertaking actions in a knowledgeable manner that improve the quality of their
life. Put differently, AmI seeks to mimic complex natural human processes not only
as a computational capability in its own right, but also as a feature of intelligence that can
be used to facilitate and enhance cognitive, emotional, and social intelligence
abilities of humans. However, some views are skeptical towards the concept of
AmI, considering it as questionable, inferior to human intelligence, and something
nonhuman. ‘There may possibly…be a reaction against the concept of AmI as
something nonhuman that completely envelops and surrounds people even if it is
unobtrusive or completely invisible. It will be important to convey the intention
that, in the ambient environment, intelligence is provided through interaction,
or participation and can be appreciated more as something that is
non-threatening, an assistive feature of the system or environment which
addresses the real needs and desires of the user’ (ISTAG 2003, pp. 12–13) (bold
in the original).

6.8 Personalized, Adaptive, Responsive, and Proactive Services in AmI

AmI can offer a wide variety of services to the user, namely personalized, adaptive,
responsive, and proactive services. AmI is capable of meeting needs and antic-
ipating and adapting and responding intelligently to spoken or gestured indications
of desire, and these could even lead to systems that are capable of engaging in
intelligent dialog (ISTAG 2001; Punie 2003). In terms of iHCI, the range of
application areas that utilize the iHCI model is potentially huge, but given the scope of
this chapter, the emphasis is on context-aware applications in relation to ubiquitous
computing and mobile computing that provide the kind of personalized, adaptive,
and proactive services. It is important to note that context-aware applications should
adopt a hybrid form of interactivity to provide these types of services—that is,
user-driven (visibility) and system-driven (invisibility) approaches.

6.8.1 Personalization

Having information on the specific characteristics of the user and their context
available, it becomes possible to create applications that can be tailored to the user’s
needs. Personalization, sometimes also referred to as tailoring of applications, is a
common feature of both desktop and ubiquitous computing applications. It has been
widely investigated (see, e.g., Rist and Brandmeier 2002; Rossi et al. 2001;
Stiermerling et al. 1997). It entails accommodating the variations between users in
terms of habits (i.e., customs, conducts, routines, practices, traditions, conventions,
patterns, tendencies, inclinations, likes, preferences, interests, and lifestyles) as well
as location, time, social category, and cultural profile. By the very nature of
context-aware systems, information about users and their situations can be collected
using both explicit and implicit forms of input in order to deliver personalized
output. Moreover, the range of application domains that utilize or involve per-
sonalization is potentially huge, such as e-education/e-learning, e-health,
e-business, e-communities, digital new and social media, mobile computing, and
activities of daily living (ADL) within smart homes (SH)—context-aware person-
alized assistance. The diversity and dynamics of AmI applications as well as their
users presume an increased level of personalization. This will add to the smoothness
of interaction and thus the enrichment of the user experience.
Personalization is where applications let the user specify his/her settings for how
the application should behave in a given situation (Chen and Kotz 2000). In
context-aware computing, personalization may involve two levels of interactivity:
passive and active. Passive context-aware applications present updated context or
sensor information to the user and let the user decide how to change the application
behavior, whereas active context-aware applications autonomously change the application behavior according to the sensed information (Ibid). These two approaches or levels of interactivity have been termed differently in the context of personalized services and information provided by context-aware applications, including, in addition to passive versus active (Barkhuus and Dey 2003): pull versus push (Cheverst et al. 2001); interactive versus proactive (Brown and Jones 2001); and sometimes explicit versus implicit.
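To make this distinction concrete, the following minimal Python sketch contrasts the two levels of interactivity; the class and function names and the context values are illustrative assumptions rather than constructs from the cited works. A passive context-aware application only surfaces the updated context and leaves the decision to the user, whereas an active one changes its own behavior autonomously.

# Illustrative sketch of passive vs. active context awareness (hypothetical names/values).

def read_context():
    # In a real system this would come from sensors; here it is hard-coded.
    return {"location": "meeting_room", "noise_level": "low"}

class PassiveContextAwareApp:
    """Presents updated context to the user and lets the user decide what to change."""
    def __init__(self):
        self.ringer = "loud"

    def on_context_update(self, context):
        # The application merely informs; it waits for an explicit user action.
        print(f"Context changed: {context}. Switch ringer to silent? (user decides)")

class ActiveContextAwareApp:
    """Autonomously changes application behavior according to the sensed context."""
    def __init__(self):
        self.ringer = "loud"

    def on_context_update(self, context):
        if context.get("location") == "meeting_room":
            self.ringer = "silent"  # behavior changes without user intervention
            print("Ringer silenced automatically.")

if __name__ == "__main__":
    ctx = read_context()
    PassiveContextAwareApp().on_context_update(ctx)
    ActiveContextAwareApp().on_context_update(ctx)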
In context-aware computing, researchers consider only push-based applications to be context-aware (Erickson 2002). In other words, pushing information towards the user has been the commonly used approach in context-aware computing. However, the perception has grown that personalized information and services, whose relevance to users’ needs changes over time, should ideally be based on a hybrid model in context-aware applications, i.e., one combining both approaches and thereby taking into account users’ preferences regarding the level of interactivity. In a study on context-aware mobile computing, Barkhuus and Dey (2003) analyzed users’ attitudes towards three levels of interactivity—personalization, active context awareness, and passive context awareness—and found that users ‘feel a lack of control when using the more
autonomous interactivity approaches but that they still prefer active and passive
context-aware features over personalization oriented applications in most cases’.
Nevertheless, the authors conclude that users are willing to accept a large degree of
autonomy from applications and thus give up partial control if the reward in use-
fulness is great enough—greater than the cost of limited control. Regardless, these
results provide useful insights into understanding the difference in users’ perception
of the available levels of interactivity. Hence, autonomy, which is driven by the
idea of invisibility that is guiding context-aware computing, should not be taken for
granted as a way of tailoring applications to users’ preferences, in particular, and
responding to their needs, in general. Chen and Kotz (2000) maintain that a hybrid
form of interactivity can provide a more profound understanding of context-aware
computing. In all, a hybrid approach to personalization is a means of
complementing invisibility with visibility in context-aware computing and also
reflects a way of accounting for user differences, an aspect which is critical for user
acceptance of AmI technologies. In particular, it is often difficult to interpret the user’s intentions as to whether he/she wants, in some cases, to change the settings of applications at all, which may result in these applications behaving outside the range of the user’s expectations. Therefore, it becomes important and fruitful to carry out further investigations on how different users perceive context-aware personalized services. Designers and researchers in context-aware computing should draw on or conduct new research into the specificities of users as to sociocultural, behavioral, and other relevant dimensions—based on ethnographic, in-depth studies of users in real life settings—in an attempt to better understand how different users and user groups would aspire to benefit from context-aware personalized services. Ideally, context-aware personalized services should be dedicated to every user, not just based on user groups as is the case for many applications, including e-education, e-health, and web applications, sometimes rendering context-aware personalized services (and information) undesirable due to lack of relevance, annoyance, and frustration (albeit unintentionally). This actually goes for
adaptive, responsive, and proactive services as well.
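A hybrid approach of this kind can be pictured as a thin policy layer in which the user’s stated preference for each service category determines whether a personalized action is executed autonomously (system-driven, invisible) or only proposed and confirmed (user-driven, visible). The sketch below is merely one conceivable shape for such a layer; the service names and preference values are invented for illustration.

# Hypothetical sketch of a hybrid (user-driven/system-driven) personalization policy.

USER_PREFERENCES = {
    # Per-service interactivity level chosen by the user, not by the developer.
    "lighting": "autonomous",   # the system may act without asking
    "news_feed": "suggest",     # the system must propose and wait for confirmation
}

def deliver_personalized_service(service, action, confirm):
    """Execute or merely propose a personalized action, per the user's preference."""
    mode = USER_PREFERENCES.get(service, "suggest")  # default to visibility
    if mode == "autonomous":
        print(f"[{service}] executing: {action}")
        return True
    if confirm(f"[{service}] proposed: {action}. Accept?"):
        print(f"[{service}] executing after confirmation: {action}")
        return True
    print(f"[{service}] declined by the user.")
    return False

if __name__ == "__main__":
    always_yes = lambda prompt: (print(prompt), True)[1]  # stand-in for a real dialog
    deliver_personalized_service("lighting", "dim to 40%", always_yes)
    deliver_personalized_service("news_feed", "show local news digest", always_yes)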
In context-aware computing, personalization is seen as a means of enhancing the
user experience by providing smoothness and enrichment to interaction and
meeting different users’ needs more effectively. Widely applicable, it offers many
benefits to users within a wide range of situations, among them: elimi-
nating repetitive tasks and preventing redundant work, which reduces the cognitive
and physical burden for a user to manipulate applications; filtering out information
not relevant to a user; providing more specific information that is relevant to a
user’s interests, habits, and environment; increasing the reliability of information;
accommodating personal preferences (i.e., allowing users to personalize websites,
information systems, and communication systems from the comfort of their own
activity, work, and setting); and so on. Personalization is therefore necessary for
more efficient interaction and for fine-tuning and better acceptance of technologies.
Researchers contend that the diversity and dynamics of applications call for an
increased level of tailoring applications, and that this emphasis on personalized
functionality will add to the user experience and smoothness of interaction
(Stiermerling et al. 1997). This is of high relevance to AmI, whose ultimate goal is to
heighten user experience and bring smoothness to user interaction.
There is a downside of personalization in AmI. This pertains to encroachments
upon privacy, security violation, intrusiveness, frustration, annoyance, loss of
control, and lack of relevance. While these risks constitute major concerns in AmI,
privacy is probably the most critical issue that worries people. The fact that AmI
technology is designed to provide personalized services to users signifies that it is
able to gather and store a large amount of sensitive information about users’
everyday interactions, communications, activities, behaviors, attitudes, preferences,
and so on, without user consent. The risk is that this personal information will be
disclosed to other sources, institutions, and individuals (see Punie 2003), and will
be abused either accidentally or intentionally (Wright 2005). The more AmI knows
about the user, the larger the privacy threat becomes. Although considered unethical, privacy-encroaching practices continue nowadays and will continue in the AmI era, committed by government agencies in association with the ICT industry and marketing companies, which redirect data originally collected for the purpose of personalized service provision towards other acts deemed unjustified and unacceptable, putting individuals’ personal data at risk. Notwithstanding the effort
to overcome privacy issues, the privacy conundrum remains unsolved. How to
‘ensure that personal data can be shared to the extent the individual wishes and no
more’ is ‘not an easy question to answer. Some safeguards can be adopted, but the
snag is that profiling and personalization…is inherent in AmI and operators and
service providers invariably and inevitably will want to ‘‘personalize’’ their offer-
ings as much as possible, and as they do, the risks to personal information will
grow’ (Wright 2005, p. 43). The ICT industry is required to address and overcome
the privacy issues that are most likely to cause many users to decline or distrust any
sort of personalized services in the medium and long-term. Already, experiences
have shown numerous incidents that make personalization unwelcome (e.g., Wright
et al. 2008; Wright 2005).
However, AmI applications should allow the user, especially in non-trivial sit-
uations, to choose to accept or decline the proposed personalized services. Besides,
the control of context-aware interactions should lie in the users’ own hands and not
be dictated by developers as representatives of the ICT industry. This is most often
not the case in current context-aware applications where it is the developer who
decides how the application should behave, not the user. This issue should be
considered in future research endeavors focusing on the design and development of
context-aware applications in terms of personalized services. Designers and
developers in AmI should draw on new findings from recent social studies of new
technologies on user preferences, attitudes, and impression formation in relation to
the use of technology.

6.8.2 Adaptation and Responsiveness

Adaptation and responsiveness are key features of AmI. The adaptive behavior of
AmI systems in response to the user’s cognitive or emotional state is regarded as
one of the cornerstones of AmI. Another related feature of the behavior of AmI
systems is the ability to respond to human emotions as a communicative behavior.
AmI aims to provide services and control over interactive processes, and support
various cognitive, emotional, and social needs. See Chaps. 8 and 9 for examples of
adaptive and responsive applications and services and an elaborative discussion
on adaptation and responsiveness as intelligent computational capabilities. There is
much research in the field of HCI dedicated to cognitively and emotionally ambient
user interfaces and related capture technologies and pattern recognition techniques
(real-time reasoning capabilities and pre-programed heuristics as the basis
for adaptation and responsiveness as well as anticipation; see below), in addition to
natural language interaction, emotional sensitivity, emotional intelligence, social
intelligence, and the relationship between emotion and cognition.
In human-centric computing, adaptation entails a system perceiving the context in
which it operates and adapting its behavior to that context appropriately, adjusting
for use in different conditions. The significance of research in adaptation in AmI
stems from its potential to improve people’s quality of life. On the adaptive skills as a
necessary feature of AmI in order to interact with the human actors, Gill and
Cormican (2005, p. 6) write: ‘AmI needs to be able to adapt to the human actor
directly and instinctively. This should be accomplished without being discovered or
consciously perceived therefore it needs to be accomplished instinctively… The
characteristics it is required to show are spontaneity, sensitivity, discerning,
insightful and at times shrewd’. For example, in relation to cognitive context
awareness, the system recognizes a user’s cognitive states or processes, such as
decision making, problem solving, learning, and reasoning, and supports him/her in
performing cognitive tasks, such as information searching, information retrieval,
product design, workshop organization, game playing, and so on (e.g., Kim et al.
2007; Lieberman and Selker 2000). The adaptive behavior of an AmI system is asso-
ciated with human factors related context, which encompasses cognitive state,
emotional state, bio-physiological conditions, activities, engaged tasks, goals, social
dynamics, and so forth. With awareness of such contextual elements, AmI systems
become able to intelligently adapt for use in different situations, which should occur
without conscious mediation. Computers will become unobtrusive, finding their way invisibly into people’s lives by means of users using computers ‘unconsciously to accomplish everyday tasks’ (Weiser 1991). This assumes that computers will be
equipped with human-like interaction and cognitive processing capabilities, using
implicit user interfaces that support natural human forms of communication and
multiple intelligent agents dedicated to performing complex computational functions. Naturalistic ambient user interfaces are among the most critical components of AmI systems for enabling adaptive behavior. These user interfaces are equipped with multisensory devices dedicated to reading context data (e.g., emotional
cues, cognitive cues, social cues, etc.) from multiple sources using multiple
modalities that do not dictate the number of communication channels that can
potentially be used for interfacing with the system. This context of use of AmI
applications is driving design of hardware and software towards ever-more-complex
technologies, e.g., multi-sensor fusion, knowledge-based and hypermedia interfaces,
hybrid forms of modeling and reasoning, machine learning and reasoning, and
multi-agent software. It is becoming increasingly possible to build applications that
adapt to cognitive and emotional states as both internal and external context. The use
of context awareness offers a great potential to dynamically adapt applications to the
current human situation.
Adaptive user interfaces entail adjusting the software part of the user interface at
runtime based on the available context with relevancy to the task at hand.
Generally, the requirements for the user interfaces are dependent on, in addition to
the user and the context, the application (e.g., ‘quality parameters for the visuali-
zation of certain content’) and the user interface hardware available (e.g., ‘device
with specific properties or a distributed configurable UI system with various input
and output options’) (Schmidt 2005). Accordingly, the visual features of a display
like colors, brightness, and contrast can be adjusted depending on where the user
moves with his/her laptop (e.g., dim room, living room, in open air). Also in a
multi-display environment, a relevant display can be selected with the right font and
size based on the type of the task the user is engaged with (e.g., writing, reading,
designing, information searching, game playing, etc.) in a way that helps the user
perform better and focus on the task at hand. However, there is a variety of challenges associated with the topic of adaptive user interfaces; user interface adaptation for distributed settings and user interface adaptation in a single display are two areas that exemplify the problem domain. As to the first area: ‘in environments
where there is a choice of input and output devices it becomes central to find the
right input and output devices for a specific application in a given situation. In an
experiment where web content, such as text, images, audio-clips, and videos are
distributed in a display rich environment…context is a key concept for determining
the appropriate configuration…In particular to implement a system where the user
is not surprised where the content will turn up is rather difficult’ (Schmidt 2005,
p. 169). As to the second area: ‘adapting the details in a single user interface a
runtime is a further big challenge. Here in particular adaptation of visual and
acoustic properties according to a situation is a central issue…We carried out
experiments where fonts and the font size in a visual interface became dependent on
the situation. Mainly dependent on the user’s activity the size of the font was
changed. In a stationary setting the font was small whereas when the user was
walking the font was made larger to enhance readability…’ (Ibid).
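The kind of experiment Schmidt describes can be approximated in a few lines: visual properties of the interface are derived from the sensed situation, so that, for instance, a walking user is given a larger, more readable font and the display brightness follows the ambient light. The thresholds, context fields, and values below are illustrative assumptions, not figures from the cited study.

# Illustrative adaptation of visual properties to the sensed situation
# (activity-dependent font size; all numeric values are invented for the example).

def choose_font_size(context):
    """Return a font size (pt) appropriate to the user's current activity."""
    if context.get("activity", "stationary") == "walking":
        return 24  # larger font to keep text readable while moving
    return 12      # a smaller font suffices in a stationary setting

def choose_brightness(context):
    """Map ambient light (0.0 dark .. 1.0 bright) to a display brightness level."""
    ambient = context.get("ambient_light", 0.5)
    return max(0.2, min(1.0, 1.3 - ambient))

if __name__ == "__main__":
    for ctx in ({"activity": "stationary", "ambient_light": 0.8},
                {"activity": "walking", "ambient_light": 0.2}):
        print(ctx, "->", choose_font_size(ctx), "pt,",
              round(choose_brightness(ctx), 2), "brightness")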
Like personalized services, adaptive services pose issues when they are delivered
based only on autonomous acting of the system. This relates to the issues pertaining
to the notion of invisibility in context-aware computing. Invisibility has its own
special conundrums. Issues in this regard include lack or loss of control, intru-
siveness, frustration, fear, mistrust, and suspicion. Therefore, it is necessary to
adopt a hybrid approach to the provision of adaptive services, i.e., combining
system-driven (or adaptation), which is autonomous and based on analytic and
reasoning patterns of context (i.e., a user’s cognitive state, emotional state, activity), and user-driven (or adaptability), which allows the user to decide how the system
should behave based on the dynamic, context-aware features provided by the
system. As a computational functionality, both adaptive and adaptable services
demonstrate an intelligent behavior and focus on the human actor, with the first
being associated with intelligence as a default feature and the second with intelli-
gence as presented to the user who can decide if the system should execute the
action. In addition, the provision of adaptive services is also associated with privacy
issues as it is based on gathering personal information on the user, such as facial,
gestural, and bodily movement. The price of the adaptation of AmI systems (as
intelligent devices) ‘is continuous measurement and interpretation of our body data
and movements’ (Crutzen 2005). Some of user’s information might become
privacy-sensitive when the agent processes it or combines it with other information
(Wright 2005).
Responsiveness is a feature of the intelligent behavior of AmI systems. AmI
environments facilitate human emotion experiences by providing users with
appropriate emotional services instantaneously (Zhou and Kallio 2005). Emotional
services can help users to perform their daily tasks by attempting to avoid negative
emotions that affect cognition, produce emotional responses that have a positive
effect on users’ emotions, and train users to mediate their emotional intelligence.
Accordingly, responsive services are associated with context-aware systems in
terms of responding to the emotional states of the user triggered when doing
cognitive tasks; affective systems in terms of displaying and producing emotions,
i.e., appearing sensitive, tactful, and empathetic; and conversational systems in
terms of responding to emotions conveyed verbally and nonverbally as part of the
user multimodal communicative behavior. AmI systems need to ‘be tactful and
sympathetic in relation to the feelings of the human actor, has to react quickly,
strongly, or favorably to the various situations it encounters. In particular, it needs
to respond and be sensitive to a suggestion or proposal. As such, it needs to be
responsive, receptive, aware, perceptive, insightful, precise, delicate, and most
importantly finely tuned to the requirements of the human actor and quick to
respond.’ (Gill and Cormican 2005, p. 6). Moreover, responsiveness is based on the
interpretation and real-time reasoning (supported by pre-programed heuristics) of
emotional/affective information. This requires that AmI systems be equipped with
perceptual and multimodal user interfaces in order to be able to capture the relevant
information about the users’ emotional states and emotions conveyed through affect
display or verbal and nonverbal signals: emotiveness, prosody, and facial, vocal,
and gestural cues.
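At the architectural level, responsiveness can be viewed as a short pipeline: multimodal cues are fused into an estimated emotional state, which is then mapped to a response intended to avoid or counteract negative emotions. The following sketch only illustrates that control flow under strong simplifying assumptions; the tiny rule table stands in for the trained affect-recognition models a real system would require, and all names and values are hypothetical.

# Simplified sketch of an emotion-responsive pipeline (assumed rule table;
# a real system would rely on trained affect-recognition models instead).

def estimate_emotional_state(cues):
    """Fuse coarse multimodal cues (facial, vocal) into a single label."""
    score = cues.get("facial_valence", 0) + cues.get("vocal_valence", 0)
    if score < 0:
        return "frustrated"
    if score > 1:
        return "content"
    return "neutral"

RESPONSES = {
    "frustrated": "slow down the dialog, simplify options, offer help",
    "neutral": "continue as before",
    "content": "proceed; optionally suggest a next step",
}

def respond(cues):
    state = estimate_emotional_state(cues)
    action = RESPONSES[state]
    print(f"estimated state: {state} -> response: {action}")
    return state, action

if __name__ == "__main__":
    respond({"facial_valence": -1, "vocal_valence": 0})
    respond({"facial_valence": 1, "vocal_valence": 1})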
Responsiveness is associated with significant challenges in the area of AmI and
AI. In other words, whether designing emotional context-aware, affective, or con-
versational systems, dealing with emotions in a computerized way is a daunting
challenge. It is difficult to handle emotions as communicative intents and behaviors
in relation to conversational agents in AmI environments given the multidimen-
sional and complex nature of human communication, involving linguistic, para-
linguistic, extra-linguistic, pragmatic, sociolinguistic, psycholinguistic, and
neuro-linguistic dimensions of spoken and gestured language and the relationship
between these dimensions. Also, as to emotion conveyance in dialog acts, movements of different body parts are used to convey feelings: the gestural expression is used
for expressing attitudes, the facial expression is used for emotional reactions,
prosody can express feelings and attitudes, and speech is the most precise tool for
expressing complex intentions (Karpinski 2009). AmI systems should be able to
accurately detect, soundly interpret, and rapidly reason on emotional stances as outward manifestations of emotions that humans show or use in dialog or conversational acts. They should moreover be able to sense dynamic emotional changes in humans from movements of different body parts and from speech, and learn to respond to them promptly, immediately, and even proactively; speech in particular requires a real-time response. Hence, the performance of AmI systems becomes very critical
given that they need to be timely in acting. Furthermore, more recent work argues
that emotions cannot be so easily classified and that the expression of emotions is
culturally dependent (Pantic and Rothkrantz 2003). Individuals differ on the basis of
their cultures and languages as to expressing and interpreting emotions. There are as
many emotional properties that are idiosyncratic as there are universal ones. There is hardly ever a
one-size-fits-all solution for the growing variety of users and interactions (Picard
2000). For more challenges and open issues involved in dealing with emotions in
the aforementioned computing domains, the reader is directed to Chaps.
7 and 8. To avoid negative emotions and convey, evoke, and elicit positive ones is
critical to the success of AmI systems.

6.8.3 Anticipation (and Proactiveness)

Anticipation and proactiveness constitute an interesting feature of the intelligent behavior of
AmI systems. One of the most fundamental ideas in the AmI vision is the antici-
patory and proactive nature of the AmI system that frees humans from routine tasks
and manual control of the environment. AmI proclaims that human environments
will be embedded with various types of sensors, computing devices, and networks
that can sense and monitor ongoing human activities and behaviors and proactively
respond to them. In AI, anticipation entails an intelligent agent perceiving the
environment, making decisions, and acting proactively on behalf of users (or human
actors) based on predictions about the near future. An anticipatory system differs from an adaptive system (agents may employ both anticipation and adaptation) in that the former tries to predict the future state of the environment and make use of the predictions in decision making, whereas the latter, which can perceive and react to people, involves decision making based on the current state of the environment with no regard to the future. In AmI, the software agent should be so intelligent that it
can anticipate or predict the user’s needs, intentions, and behaviors, with the goal to
ease people’s lives. ‘As you move through an environment, AmI interfaces register
your presence, self-initiatively perform tasks designed to make your life easier, and
learn from your behavior in order to anticipate your future needs… The promises of
intelligent…anticipation are directed to the individual’ (Crutzen 2005, pp. 221–
222). AmI is a world of machine learning, where computers monitor the activities
and behaviors of humans and the changes in their environment to predict what they
will need, want, and do next based on real-time reasoning capabilities (or
pre-programed heuristics). Supervised learning algorithms enable AmI systems to
keep track of previously perceived experiences—e.g., various atomic contexts and high-level context abstractions—and employ them to learn the parameters of the stochastic context models in a dynamic way, which allows them to generate predictive models that the AmI system (agent) uses to decide on what actions to take
proactively. AmI interfaces (agents) learn from users’ states and behaviors in order
to anticipate their future needs. Rosen (1985) describes an anticipatory system as a
system entailing a predictive model of its environment, which allows it to change
state at an instant in accord with the model’s predictions pertaining to a later instant.
Machine learning techniques started to incorporate anticipatory intelligent
capabilities in an implicit form as in reinforcement learning systems (Sutton and
Barto 1998; Balkenius 1995), which are concerned with how software agents
should take actions in an environment. Specifically, the agent acts in a dynamic
environment by executing actions which trigger the observable state of that envi-
ronment to change, and in the process of acting it attempts to gather information
about how the environment responds to its actions as well as to synthesize a
sequence of actions that maximizes some notion of cumulative reward. Anticipation improves the performance of machine learning techniques in coping with complex environments where intelligent agents need to direct their attention to gather
important information to take action (Balkenius and Hulth 1999).
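In its simplest form, the anticipatory loop sketched above reduces to three steps: keep a history of observed contexts, learn transition statistics from that history, and use the predicted next context to choose a proactive action. The sketch below uses a first-order frequency model purely for illustration; it is a stand-in for, not an implementation of, the stochastic context models and reinforcement learning techniques cited above, and all names are hypothetical.

# Minimal anticipatory-agent sketch: a first-order transition model learned from
# a context history is used to predict the next context and act proactively.

from collections import Counter, defaultdict

class AnticipatoryAgent:
    def __init__(self, actions):
        self.transitions = defaultdict(Counter)  # context -> Counter of successors
        self.actions = actions                   # predicted context -> proactive action
        self.previous = None

    def observe(self, context):
        """Update the predictive model with a newly sensed context."""
        if self.previous is not None:
            self.transitions[self.previous][context] += 1
        self.previous = context

    def predict_next(self):
        """Return the most frequently observed successor of the current context."""
        counts = self.transitions.get(self.previous)
        return counts.most_common(1)[0][0] if counts else None

    def act_proactively(self):
        predicted = self.predict_next()
        if predicted and predicted in self.actions:
            print(f"anticipating '{predicted}' -> {self.actions[predicted]}")

if __name__ == "__main__":
    agent = AnticipatoryAgent({"arrive_home": "switch on the hallway light"})
    for ctx in ["leave_office", "arrive_home", "leave_office", "arrive_home",
                "leave_office"]:
        agent.observe(ctx)
    agent.act_proactively()  # after 'leave_office', 'arrive_home' is predicted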
The anticipatory behavior of AmI systems entails recognizing the context or situation of interest and, based on a predictive context model, deriving the users’ needs, activities, and behaviors and then providing relevant proactive services. That is to say, proactive systems change their state based on anticipation, linking context to an action pertaining to a state that occurs later, so as to achieve a certain goal. Examples are numerous in this regard, ranging from trivial to
non-trivial (iHCI) applications. A common approach is to use situations or events to trigger the start of systems or applications, whereby a direct link between the context and the system or application is executed—an if-then rule. Such applica-
tions are widely discussed in Schilit et al. (1994), Brown et al. (1997). Starting and
stopping the application, in this approach, represents the minimal proactive appli-
cation. A common example is an automatic outdoor lantern device. Such lights are
often found at the entrance of buildings and at the floor-levels of buildings.
Whenever a person approaches the entrance and it is dark, the light switches on automatically. A simple sensor is used to detect the situation of interest, which is
hard-coded with an action (switching on the light for a certain period of time), a link
which emanates from the expectation that the person needs light when moving
towards the location. Executing commands or performing actions based on the
current context is a further approach. A typical example is a laptop that (proac-
tively) switches on or off, saves information, or runs its applications automatically
according to the situation. Another related example is a mobile phone that would
restrict incoming calls when the user is in a meeting or immersed in any situation
that may signal his/her unwillingness to receive calls. Selecting applications based
on the current context is another approach. Two typical examples, taken from
(Schmidt 2005, p. 168), are: to have a general purpose computer device that
becomes a specific information appliance depending on the context, e.g., ‘a PDA
that runs its applications automatically according to the context, e.g., when the PDA
is close to a phone it runs the phone book application, in the supermarket the
shopping list application is executed, and in the living room it becomes a remote
control’; and to use ‘context information to set default values so that they fit the
current situation, e.g., in meeting minutes the form is already preset with appro-
priate default values for time, date, location, and participants. This type of appli-
cation is closely related to applications that generate meta-data’.
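The if-then linking of situation and action that underlies these examples can be written down as a small rule table evaluated against the current context. The rules and context fields below are hypothetical and merely mirror the lantern, phone, and PDA examples just given; they are not drawn from the cited works.

# Hypothetical context-triggered if-then rules mirroring the examples in the text.

RULES = [
    # (condition over the sensed context, proactive action)
    (lambda c: c["dark"] and c["person_at_entrance"], "switch the entrance light on"),
    (lambda c: c["in_meeting"], "restrict incoming calls"),
    (lambda c: c["near_phone"], "open the phone book application"),
]

def evaluate(context):
    """Fire every rule whose condition holds in the current context."""
    fired = [action for condition, action in RULES if condition(context)]
    for action in fired:
        print("triggered:", action)
    return fired

if __name__ == "__main__":
    evaluate({"dark": True, "person_at_entrance": True,
              "in_meeting": False, "near_phone": False})
    evaluate({"dark": False, "person_at_entrance": False,
              "in_meeting": True, "near_phone": True})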
However, not all the delivered anticipatory services can work according to the user’s plans in AmI environments. This occurs when, for example, anticipating a
user’s intention as an internal context, which is tacit and thus difficult to learn or
capture. It is not easy even for the user to externalize and translate what is tacit into
a form intelligible to a computer system, adding to the fact that users’ intentions are based on subjective perceptions and change constantly or are subject to re-assessment.
‘Realizing implicit input reliably…appears at the current stage of research close to
impossible. Some ‘subtasks for realizing implicit input’ such as…anticipation of
user intention are not solved yet’ (Schmidt 2005, p. 164). Thus, it is more likely that
a computer system may fail in predicting what the user intends or plans to do and thereby act outside the range of his/her expectations when taking proactive actions, causing fear, frustration, or lack of control, especially in instances where the mismatch between the system’s anticipation and the reality that was meant to be experienced by the user is far too significant. Schmidhuber (1991) intro-
duces, in relation to what is called adaptive curiosity and adaptive confidence, the
concept of curiosity for agents as a measure of the mismatch between expectations
and future experienced reality. This is a method that is used to decrease the mis-
match between anticipated states and states actually experienced in the future. His
rationale is that agents that are capable of monitoring their own curiosity explore situations where they expect to engage with unexpected or novel user experiences and are capable of dealing with complex, dynamic environments better than others.
This can be useful to AmI systems in the sense of offering the potential to enhance
their anticipatory capabilities to provide relevant proactive services.
Regardless, the degree to which an AmI system’s anticipatory behavior can be
determined by reasoning over dedicated representations or by using predictive
models is a priori decided by the designers of AmI systems. Here the idea of
invisibility comes into play with its contentious issues. The underlying assumption
is that proactive (as well as personalized, adaptive, and responsive) services should
be based on a hybrid approach to service delivery (or finding ways of how implicit
and explicit user interaction may be combined) in the non-trivial kind of AmI
applications. In this line of thinking, Schmidt (2005, p. 168) points out that the
question is how it is possible to achieve stability in the user interface without
confusing the user, for example, ‘when a device is showing different behavior
depending on the situation and the user does not understand why the system
behaves differently and in which way it might lead to confusion and frustration. It is
therefore central to build user interfaces where the proactive behavior of the system
is understandable and predictable by the user even if the details are hidden…’.
Users should be able to understand the logic applied in proactive applications,
meaning that they should know why a certain action is performed or an application
behaves the way it behaves. In addition, they must have the option to switch off the
context-aware proactive interaction or the so-called ‘intelligent’ functionality,
instead of just submitting to what the developer defines for them or passively receiving proactive services without any form of negotiation; they should be able to intervene in what should happen proactively when certain context conditions are met by, during the design process, composing their own context-aware proactive logic and defining their own rules; and they should be able to assign their own meaning to context topics, which is typically subjective and evolving over time.
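One conceivable way to honor these requirements is to treat the proactive logic as data that the user, rather than the developer, composes and controls: each rule carries an explanation of why it fires, and the whole proactive layer can be switched off. The sketch below is only an assumption about how such a mechanism might look; none of its names or rules come from the literature cited here.

# Conceivable sketch of user-editable, explainable, and switch-off-able proactive logic.

class ProactiveController:
    def __init__(self):
        self.enabled = True   # global off-switch for the 'intelligent' functionality
        self.user_rules = []  # rules composed by the user, not dictated by the developer

    def add_rule(self, name, condition, action, explanation):
        self.user_rules.append({"name": name, "condition": condition,
                                "action": action, "why": explanation})

    def disable(self):
        self.enabled = False

    def step(self, context):
        if not self.enabled:
            print("proactive behavior switched off by the user")
            return
        for rule in self.user_rules:
            if rule["condition"](context):
                print(f"{rule['name']}: {rule['action']} (because: {rule['why']})")

if __name__ == "__main__":
    ctrl = ProactiveController()
    ctrl.add_rule("evening lights",
                  lambda c: c["time"] >= 18 and c["at_home"],
                  "dim the living-room lights",
                  "user-defined rule for evening comfort at home")
    ctrl.step({"time": 19, "at_home": True})
    ctrl.disable()
    ctrl.step({"time": 19, "at_home": True})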
6.9 Invisible, Disappearing, or Calm Computing

6.9.1 Characterization and Definitional Issues

Disappearing or calm computing is one of the internal properties for iHCI, which is
in turn one of the main features of AmI and UbiComp. That is to say, the notion of
invisibility of technology and disappearing user interfaces is common to the visions
of AmI and UbiComp. AmI is about technology that is invisible, embedded in our
natural environments and enabled by effortless interactions. In other words, AmI
aims to create an active technology, physically and mentally invisible, seamlessly
integrated into everyday human environment. Invisibility of technology was crys-
tallized into a realist notion in the early 1990s. Weiser (1991) was the first who
focused on this characterization of computing: ‘The most profound technologies are
those that disappear. They weave themselves into the fabric of everyday life until
they are indistinguishable from it… This is not just a “user interface” problem…
Such machines cannot truly make computing an integral, invisible part of the way
people live their lives. Therefore we are trying to conceive a new way of thinking
about computers in the world, one that takes into account the natural human
environment and allows the computers themselves to vanish into the background.
Such a disappearance is a fundamental consequence not of technology, but of
human psychology…. Only when things disappear are we freed to use them without
thinking and so to focus beyond them on new goals.’ The idea that technology will
recede or vanish into the background of our lives and disappear from our con-
sciousness entails that the technology behind will invisibly be embedded and
integrated in everyday life world, and the user interface and its logics (e.g., rea-
soning processes, agent decisions) will be an integral part of interactions, a kind of a
natural extension to our daily tasks and activities.
However, technology invisibility as a phenomenon has proven to be conceptu-
ally diversified. In the literature on and in the discourse of AmI, the term ‘invisi-
bility’ has been used in multiple ways, meaning different things to different people
or based on contradictory or complementary perspectives. According to Crutzen
(2005, pp. 224–225), ‘physical invisibility or perceptual [mental] invisibility mean
that one cannot sense (smell, see, hear or touch) the AmI devices anymore; one
cannot sense their presence nor sense their full (inter)action, but only that part of
interaction output that was intended to change the environment of the individual
user.’ In point of fact, AmI working ‘in a seamless, unobtrusive and often invisible
way’ (ISTAG 2001) entails that even interaction output: adaptive, responsive, and
proactive actions of AmI systems, will be presented in the same way, that is,
without being discovered or consciously perceived by the user. However, more to
the meaning of invisibility, Schmidt (2005, pp. 173–174) conceives of it as ‘not
primarily a physical property of systems; often it is not even clearly related to the
properties of a system… It is not disputed that invisibility is a psychological
phenomenon experienced when using a system while doing a task. It is about the
human’s perception of a particular system in a certain environment.’ This notion of
invisibility is strongly related to the familiarity of a system for performing or
solving a particular task, which puts into perspective the notion of a ‘natural
extension’ (Norman 1998) and the idea of ‘weave themselves into the fabrics of
everyday life’ (Weiser 1991). This is discussed in detail below. In line with this, to
Ulrich (2008, p. 24) invisibility means: ‘the machine is to take care of the context in
which users find themselves, by retrieving locally available information and
responding to it autonomously.’ Putting the emphasis on domestication and
acceptance, Punie (2003, p. 36) sees mental invisibility as one of the outcomes of a
domestication process, which is not ‘…necessarily harmonious, linear or complete.
Rather it is presented as a struggle between the user and technology, where the user
tries to tame, gain control, shape or ascribes meaning to the technological artifact.
This is not resistance to a specific technology but rather an active acceptance
process.’ To Weiser (1991) and Schmidt (2005), mental invisibility remains a
precondition for the domestication of technologies in the sense that these ‘tech-
nologies are domesticated when they are ‘taken for granted’, when they reach a
state of mind of being a “natural” part of everyday life. As such, they are not
anymore perceived as technologies, as machines, but rather as an almost natural
extension of the self.’ (Punie 2003, p. 35). However, as Crutzen (2005, p. 225)
contends, ‘[p]hysical invisibility is contradictory to mental invisibility because the
process of domestication is not a process initiated by the user. In our daily life a lot
of things and tools become mentally invisible. Because of their evident and con-
tinuous presence they disappear from our environment.’ In relation to this, Punie
(2003) states, ‘there is a difference between the physical and mental disappearance
of computing and that it is incorrect to assume that physical disappearance will lead
automatically to acceptance and use, and thus to mental invisibility. Rather, the
physical invisibility of technology is likely to be detrimental to its acceptance and
use merely because it is invisible, and hence becomes unmanageable’. In all, common
to the characterization and most definitions of invisibility are two key factors: the
psychological factor (involving intelligent agent functioning in the background of
human life, simple and effortless interaction attuned to natural senses and adaptive
to users, reducing input from users and replacing it with context information, and
provision of autonomous services) and the physical factor (entailing miniaturization
and embedding of computing devices, disappearance of conventional input and
output media).
The various interpretations of invisibility have generated a cacophony leading to
an exasperating confusion and contradiction in the area of AmI. This is evinced by
the contentions, misconceptions, and erroneous assumptions about mental and
physical invisibility. Related issues involve: misunderstanding about physical and
mental invisibility; erroneous assumptions about how physical invisibility relates to
mental invisibility; contradiction between physical and mental invisibility in rela-
tion to the process of domestication; the legitimacy of invisibility as a guiding idea
of context-aware computing; psychological and social implications of invisibility
for users; and whether invisibility is primarily a physical property of systems or not—to name but a few (see, e.g., Crutzen 2005; Punie 2003; Ulrich 2008; Bibri 2012; Criel
and Claeys 2008; Schmidt 2005). The notion of invisibility involves special
conundrums that are no easy task to tackle in the pursuit of realizing the vision of
AmI. The whole idea is in fact controversial, spurring an incessant debate of a
philosophical and social nature. Adding to this is the growing criticism that ques-
tions its computational feasibility and real benefits to the user in relation to different
application domains. The fact is that most of the reasoning processes applied in
AmI applications—based on machine learning, logical, or ontological techniques or
a combination of these—involve complex inferences based on limited and imper-
fect sensor data and on oversimplified models.

6.9.2 Mental Versus Physical Invisibility and Related Issues

6.9.2.1 Mental Invisibility

The basic premise of mental invisibility in AmI is that the operation of the com-
puting devices (e.g., registering presence; monitoring and capturing behavior along
with the state change of the environment; detecting context, learning from user
experiences, reasoning, decision making, etc.) should be moved to the periphery of
our attention. The functioning of the computing devices unobtrusively in the
background is aimed at increasingly invisibility of AmI applications, which can be
accomplished by placing greater reliance on context information and reducing
interactions with or input from users, and thus render interaction effortless, attuned
to human senses (by utilizing natural forms of communications), adaptive and
anticipatory to users, and autonomously acting. Thus, user interfaces will become
visible, yet unnoticeable, part of peripheral senses. This is opposed to the old
computing paradigm where interaction is mostly of an explicit nature, entailing a
kind of a direct dialog between the user and the computer that brings the computer
and thus its operation inevitably to the center of the activity and the whole inter-
action to the center of the user’s attention—the user focus is on the interaction
activity. In AmI, technology will be an integral part of interactions: interactions
between artificial devices and functions of intelligent agents will take place in the
background of the life of the users to influence and change their environment. This
is enabled by context awareness, natural interaction, and autonomous intelligent
behavior as human-inspired computational capabilities. Specifically, augmented
with such computational capabilities, AmI systems can take care of the context in
which users find themselves, by retrieving contextual information (which typically
define and influence their interaction with the environment and its artifacts) and
responding intelligently to it in an autonomous way. Human behaviors and actions
as contextual information will be objects of interactions, ‘captured, recognized and
interpreted by a computer system as input’; and the system output ‘is seamlessly
integrated with the environment and the task of the user’ (Schmidt 2005, p. 64).
Invisible computing is about, quoting Donald Norman, ‘ubiquitous task-specific
computing devices’, which ‘are so highly optimized to particular tasks that they
blend into the world and require little technical knowledge on the part of their users’
(Riva et al. 2003, p. 41). Unobtrusiveness of AmI is about interaction that does not
involve a steep learning curve (ISTAG 2003). With the availability of things that
think on behalf of the user, technical knowledge required from users to make use of
computers will be lowered to the minimum, and computing devices will work in
concert to support people in coping with their tasks and performing their activities.
A myriad of intelligent agents will be made available to think on behalf of users and
exploit the rich sets of adaptive and proactive services available within AmI
environments. All in all, mental invisibility is about the perception of user interfaces
in AmI environments, which is experienced when users effortlessly and naturally
interact with user interfaces and what defines and influences this interaction and its
outcomes is done in the background of human life. It is important to underscore that
natural interaction is a salient defining factor for the perception of invisibility and
realization of mental invisibility: users will be able to interact naturally with
computer systems in the same way face-to-face human interaction takes place.
However, the logics of the computer (intelligent user interfaces) disappearing
does not necessarily mean that the computer becomes so intelligent that it can carry
out all sorts of tasks. Rather, it can be optimized only for a particular type of tasks or
activities. In other words, there are some tasks that still require learning—that is,
technical knowledge or ‘minimal expertise’ to make use of computer functionality
and processing to execute these tasks given their complexity. Indeed, not all our
acting is routine acting in everyday life. Hence, the systems (user interfaces) used
for performing demanding tasks may not psychologically be perceived the way
Weiser would put it—‘weave themselves into the fabric of everyday life’—for there
is simply no natural or straightforward way of performing such tasks. Training is
thus required to carry out the task and thus use the system to do so, no matter how
intelligent a computer can become in terms of monitoring, capturing, learning or
interpreting, and reasoning on a user’s cognitive behavior to adapt to what the user
is doing. Yet, in this case, there are different factors that can influence the per-
ception of invisibility, which is strongly linked to the familiarity of the system used
for performing a particular task in a particular environment, which pertains to
non-routine tasks. In this context, perceptual invisibility and the degree of invisi-
bility are contingent on the extent to which people become familiar with (the use of)
the system to perform tasks. Accordingly, the computer as a tool can become a
‘natural extension’ to the task (Norman 1998). But this depends on the knowledge
of the user, the nature of the task, and the surrounding environment, as well as how
these factors interrelate. There are many variations as to the systems, the users, the
tasks, and the environments. In maintaining that invisibility is not primarily a
physical property of systems, Schmidt (2005) suggests four factors that can shape
the perception of invisibility: the user, the system, the task, and the environment,
and only the relationship between them can determine the degree of invisibility as
experience, which is again difficult to assess. To elaborate further on this, taking
this relationship into account, the way the user perceives the system and the task in
terms of whether and how they are complex depends on, in addition to the envi-
ronment being disturbing or conducive (physical, emotional, and social influences),
the experience of the user with using the system and performing the task (cognitive,
intellectual, and professional abilities). To frame it differently, the nature of the task
and the complexity of the system can be objective facts. From an objectivistic point
of view, the universe of discourse of task or system is comprised of distinct objects
with properties independent of who carries out the task. If two users do not understand
how to perform a task or use a system in the same way, it is due to lack of training,
limited knowledge, insufficient experience, unfamiliarity, and plain misunder-
standing. In a nutshell, the degree of invisibility is determined by the extent to
which either the user takes the system for granted or struggles in manipulating it,
and either he/she finds the task easy to perform or encounters difficulty in performing the task. Accordingly, the perception of invisibility can be linked to the
user’s knowledge of and familiarity with using a system to perform a particular task and also to how new and complex this task is as perceived by each user. This notion of
invisibility is different from that which guides context-aware computing and puts
emphasis rather on the system; it is associated with facilitating or improving the
user’s performance of cognitive tasks through placing reliance on knowledge of
cognitive context (e.g., user’s intention, task goals, engaged tasks, work process,
etc.), when computationally feasible in some (demanding) tasks, and also eliciting
positive affective states through aesthetic and visual artifacts to enhance creative
cognition (see Chap. 9 for more detail). Here, the system may, depending on the
situation, disappear from the (knowledgeable) user’s perception, and it is the
cognitive context awareness functionality that contributes to the system becoming a
‘natural extension’ to the task in this case rather than the user’s familiarity with the
system to perform the task.
In the context of everyday routine tasks, invisibility can essentially be achieved
for any tool, at least to some degree, if the user puts enough time into using it, a notion
which does not relate to the basic idea of AmI in the sense that some ICT-tools
(e.g., on- and off-switch buttons, gadgets, devices, etc.) are embedded invisibly in
the physical world. This is different, to note, from everyday objects as digitally
enhanced artifacts—augmented with micro processors and communication capa-
bilities—with no change to the behavior with regard to their usage. Hence, the
necessity for analyzing the influence of AmI stems from how humans experience
everyday objects and tools in their environment. In our daily life, not only tech-
nologies but a lot of objects and things become mentally invisible, as we use them
without thinking in our routine acting or form a relationship with them so that they
are used subconsciously. They become part of our already unreflective acts and
interactions with the environment. They find their way invisibly into our lives,
disappearing from our perception and environment because of the effortlessness to
use them and of their evident continuous presence (which makes objects still
blended into the world, without having to be hidden or embedded invisibly in
everyday life world). For example, a TV set is mentally invisible when we switch it
on as a matter of routine. But the moment we cannot switch it on, the TV becomes
very present in the action of trying to watch a daily favorite program. Similarly, a
computer becomes mentally invisible when we use it to do something (write or
chat) or as a matter of routine. But the moment the word-processing or commu-
nication application stops functioning, the whole computer becomes very present
and at the center of attention in the action of trying to continue writing or chatting.
We do not notice the technologies, things, or tools and their effects until they stop
functioning or act outside the range of our expectations. Nevertheless, these objects
can still be tractable in such situations due to their very physical presence, which is in
contrast to the basic ideas of AmI as to physical invisibility. As Crutzen (2005,
p. 226) argues, ‘Actions and interactions always cause changes, but not all activities
of actors are ‘present’ in interaction worlds. If changes are comparable and com-
patible with previous changes, they will be perceived as obvious and taken for
granted… [R]eady-to-hand interactions will not raise any doubts. Doubt is a nec-
essary precondition for changing the pattern of interaction itself. Heidegger gives
several examples of how doubt can appear and obvious tools will be
‘present-at-hand’ again: when a tool does not function as I expect, when the tool I am
used to is not available, and when the tool is getting in the way of reaching the
intended goal… [T]he ‘present-at-handness’…and the ‘ready-to-handness’…of a
tool are situated and they do not exclude each other. On the contrary, they offer the
option of intertwining use and design activities in interaction with the tool itself. This
intertwining makes a tool reliable, because it is always individual and situated… [T]
his can happen only through involved, embodied interaction. Intertwining of use
and design needs the presence at-hand of the ICT-representations… Their readiness-
to-hand should be doubtable. With AmI we are in danger of losing this ‘critical
transformative room’ … In our interaction with the AmI environment there is no
room for doubt between representation and interpretation of the ready-made inter-
actions with our environment. The act of doubting is a bridge between the obvious
acting and possible changes to our habitual acting. Actors and representations are
only present in an interaction if they are willing and have the potential to create doubt
and if they can create a disrupting moment in the interaction.’ There are many
routine tasks and daily activities that can be performed via ICT-tools, and they will
increase in number even more with the use of context-aware functionalities—
ICT-tools will vanish, no physical presence. Whether performed via ICT-tools or
supported by context-aware functionalities, routine tasks can be classified as obvious
and hence mentally invisible. Dewey describes these unreflective responses and
actions as ‘fixed habits’, ‘routines’: ‘They have a fixed hold upon us, instead of our
having a free hold upon things. …Habits are reduced to routine ways of acting, or
degenerated into ways of action to which we are enslaved just in the degree in which
intelligence is disconnected from them. …Such routines put an end to the flexibility
of acting of the individual.’ (Dewey 1916). As further stated by Crutzen (2005,
p. 226), ‘Routines are repeated and established acting; frozen habits which are
executed without thinking. Routine acting with an ICT-tool means intractability; the
tool is not present anymore. The mutual interaction between the tool and the user is
lost.’ This notion of invisibility is the basic idea of AmI, where applications provide
adaptive and proactive services and carry out tasks autonomously on behalf of the
user. This is in line with the idea of ‘technologies…weave themselves into the
fabrics of everyday life’ (Weiser 1991). Here technology becomes accessible by
people to such an extent that they are not even aware of its physical presence and
thus its computational logics, engaging so many computing devices and intelligent
agents simultaneously without necessarily realizing that they are doing so. Hundreds of
computer devices ‘will come to be invisible to common awareness’ and that users
‘will simply use them unconsciously to accomplish everyday tasks’ (Weiser 1991).
Mental invisibility connotes the integration of technology into the daily (inter)action
of humans with the environment and its artifacts, and will, as claimed by AmI, be
settled in their daily routines and activities. In sum, mental invisibility in AmI is
expected to result from equipping context-aware systems with ambient, naturalistic,
multimodal, and intelligent user interfaces and what this entails in terms of context
awareness, natural interaction, and intelligent behavior.

6.9.2.2 Physical Invisibility

Physical invisibility of technology is common to the vision of AmI. AmI is
embedded; countless distributed, networked, invisible sensing and computing
devices are hidden in the environment. Underlying the idea of invisibility is that
technology will disappear and invisibly be integrated and ubiquitously spread in
everyday life world. This is ‘a new way of thinking about computers in the world,
one that takes into account the natural human environment and allows the com-
puters themselves to vanish into the background. Such a disappearance is a fun-
damental consequence…of technology’ (Weiser 1991). Both physical and human
environment will be strewn with countless tiny devices, invisibly entrenched into
everyday objects and attached to people. AmI is ‘a world of smart dust with
networked sensors and actuators so small to be virtually invisible, where the clothes
you wear, the paint on your walls, the carpets on your floor, the paper money in
your pocket have a computer communications capability.’ (Wright 2005, p. 33).
This is made possible by progress in the development of microelectronics, thanks to
micro- and nano-engineering. Miniaturization of technology has for long guided,
and been a driving force for, technological development, but it is about to reach its
mature stage in AmI. Miniaturization has played, and continues to play, a key role
in the pervasion of technology, a complete infiltration of our environment with
intelligent, interconnected devices. In a world of AmI, myriad invisible devices will
be seamlessly embedded in virtually everything around us. The omnipresence and
always-on interconnection of computing resources is meant to support daily life, by
offering services whenever and wherever people need them.
With a continuous process of miniaturization of mechatronic computing sys-
tems, devices, and components along with their efficiency improvement pertaining
to computational speed, energy, bandwidth, and memory, AmI computing is
evolving from a vision to an achievable and deployable computing paradigm.
Regardless of their size, AmI technologies will be equipped with quantum-based processing capacity, terabytes of (or unlimited) memory, and mammoth bandwidth with limitless wireless network connectivity, ushering in the era of the always-on AmI as an internet of things. The miniaturization trend involves not only
devices that are to be embedded in everyday objects, but also in computer systems.
The increasing miniaturization of computer technology is predicted to result in a
multitude of microprocessors and micro-sensors being integrated into user inter-
faces as part of AmI artifacts and environments and thus in the disappearance of
conventional explicit input and output media, such as keyboards, pointing devices,
touch screens, and displays (device, circuitry, and enclosure). See Chap. 4 for more detail on the miniaturization trend in AmI.
In relation to context-aware systems, the physical invisibility and seamless integration of a multitude of microelectronic devices and components (dedicated hardware) that form ambient user interfaces without conventional input and output media (but with visual output displays) have implications for the psychological perception of such user interfaces and thus for mental invisibility. This relates to the assumption that physical invisibility may lead to mental invisibility, which is valid only as long as the system does not react in ways it is not supposed to react or function when it is not needed. Otherwise, as long as a tool is physically invisible, the process of mental invisibility cannot start. Hence, the physical presence of ICT-tools or computer systems is important in the sense that people can still control them if something goes wrong, thereby shunning any issue of intractability. The smart devices constituting context-aware systems cannot be controlled, for they are too small to see and manipulate, or rather they are designed in ways that prevent access by users. Consequently, the assumption that physical invisibility will lead to mental invisibility becomes, to some extent, erroneous, unless ICT-tools and products function flawlessly or are faultlessly designed, which will never be the case when it comes to interactive computer systems—whether in AmI or in any vision of a next wave in computing. This is due to many reasons: the failure of technologies during their instantiation is highly likely, as they are computationally complex and technology-driven (constrained by existing technologies); they undergo fast, insubstantial evaluation, which is often favored in technology and HCI design in order to get new applications and systems to market quickly; and, with an exponential increase in networked, embedded, always-on devices, the probability of failure of any one of them increases proportionally, in addition to the fact that the technology is created in society and is thus the product of social processes and hence of diverse social actors and factors—sociocultural situativity. Besides, achieving a high degree of robustness and fault tolerance is what the ICT industry covets or wishes for when it comes to 'innovative' technologies, regardless of the computing paradigm.
If it actually materializes as initially defined by Mark Weiser (and whether it is still the way to follow completely remains debatable), the vision of invisible computing will radically change the way people perceive the digital and physical world, and much of the way they understand and act in the social world. The AmI vision explicitly proposes to transform society by fully technologizing it, and hence it is very likely that this will have far-reaching, long-term implications for people's everyday lives and for human, social, and ethical values (see Bohn et al. 2004).

6.9.3 Invisibility in Context-Aware Computing

The vision of invisibility underlies the notion of context-aware computing—sensing, reasoning, inference, and action (service and information provision). The different descriptions or metaphors used for context-aware applications assume invisibility of technology and disappearing interfaces: AmI, UbiComp, pervasive computing, everywhere computing, calm computing, disappearing computing, proactive computing, sentient computing, and wearable computing. Context-aware applications are becoming invisible, unobtrusive, and autonomous by lacking conventional input and output media and by reducing interactions with users: they allow natural human forms of communication and place greater reliance on knowledge of context to provide more intelligent, autonomous services that reduce the cognitive and physical burden on users of manipulating and interacting with applications.
Invisibility is a driving force for and an end of the development of context-aware
computing. This is manifested in the research focusing on the development of
technologies for context awareness as well as the design of context-aware applica-
tions (e.g., MEMS, NEMS, autonomous intelligent agents, new machine learning
techniques, etc.) that comply with the vision of AmI. It is worth noting that AmI
remains a field that is strongly driven by a particular vision of how ICT would shape
the future, a vision developed by particular stakeholders or actors, hence the need for
alternative perspectives and research avenues. For the scope of this book, the
emphasis, as regards the topic of invisibility, is on context-aware applications rather than on
ubiquitous computing, i.e., the integration of microprocessors and communication
capabilities into everyday objects, enabling people to communicate directly with
their clothes, books, lights, doors, paper money, watches, pens, appliances, and
furniture, as well as these objects to communicate with each other and other people’s
objects. However, disappearing, invisible or calm computing poses its own special
conundrums (problems, dilemmas, paradoxes, and challenges) and thus calls for new
ways of thinking and alternative research directions—based on the premise that
technologies remain nonhuman machines. The implications of the vision of calm
computing add to the downside of AmI.
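To make the sensing, reasoning, inference, and action chain referred to above more concrete, the following minimal sketch is offered purely as an illustration; it is not drawn from any cited system, and all names (SensorReading, infer_context, act) are hypothetical. It shows, in Python, the simplest rule-based form of such a pipeline, in which implicit input is turned into a context-dependent service without any explicit user command.

from dataclasses import dataclass

@dataclass
class SensorReading:
    location: str       # e.g., "living_room"
    noise_level: float  # normalized to the range 0..1
    motion: bool

def infer_context(reading: SensorReading) -> str:
    """Crude rule-based inference of a high-level context label from raw sensing."""
    if reading.motion and reading.noise_level > 0.6:
        return "social_gathering"
    if reading.motion:
        return "active"
    return "idle"

def act(context: str) -> str:
    """Implicit output: a context-dependent service chosen without explicit user input."""
    return {
        "social_gathering": "lower notification volume",
        "active": "adjust lighting",
        "idle": "enter power-saving mode",
    }[context]

if __name__ == "__main__":
    reading = SensorReading(location="living_room", noise_level=0.8, motion=True)
    print(act(infer_context(reading)))  # -> lower notification volume

Even at this toy scale, the rules that map readings to actions are hidden from the user, which is precisely the invisibility, and the loss of inspectability, at issue in the discussion that follows.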

6.9.4 Delegation of Control, Reliability, Dependability in AmI: Social Implications

Invisibility underlies the notion of AmI computing—intelligent service provision. Technology invisibility (or the disappearance of the user interface) is inextricably linked with black-boxing the computer technology and represents an intrusive way of interaction. Computer devices blending into the world, requiring minimal technical
knowledge, reducing interactions with users, and allowing natural human forms of
communication does not necessarily mean that users would be so familiar with
computers that the interaction between them would become natural and thus occur
without conflict. The translations of the representations of AmI systems—exhibiting intelligent behaviors or firing context-dependent actions—must not fit smoothly
without conflict into the world of AmI (for which these behaviors or actions are
planned—made-ready). In real-world settings, interaction situations are always
subject to negotiation—to avoid conflicts—among the persons involved in the sit-
uation, who are also good at recognizing situation changes as they are part of the
negotiation that leads to changes (see Lueg 2002). In addition, our (inter)actions are
never planned; instead, they are contextual, situated and ad-hoc (done for a particular
purpose as necessary), as their circumstances are never fully anticipated and con-
tinuously changing around us. This is related to a wider debate that can be traced
back to the notable work by Suchman on the role of plans in situated action
(Suchman 1987). Suchman (2005, p. 20) states, ‘Plans are a weak resource for what
is primarily an ad-hoc activity’. This implies that plans are mostly resources that are
to be merged with many other situational and dynamic variables to generate actions.
Hence, they are far from having a prominent role in determining our decisions and
setting our actions. Put differently, our (inter)actions entail meaning, which is subjective and evolving in time (and hence open to re-interpretation), and how this meaning is constructed and reconstructed shapes our perception of the context of the situation and thus how we (inter)act—e.g., setting priorities for actions. Constructivist worldviews posit that interactions are fundamentally contextual and situated, and that meaning is ascribed to them within this changing (perception of) context; this is linked to how people see the reality around them not as a world that is a mere reflection of such entities as objects, places, and events, but as one of intersubjectively constructed meanings that are defined in interaction and by those who are involved in it.
Context then defines and changes interaction, and this unfolds in the form of a series
of intertwined patterns and exchanges, as context and interaction both evolve.
Hence, the situation (ambiance) determined by AmI artifacts and the actions
taken based on that situation differ from what the users involved in the situation
might have negotiated and the actions they might want to take according to the
outcome of the negotiation. Moreover, in AmI settings, most users do not under-
stand the logic (e.g., machine learning, knowledge representation and reasoning,
application and adaptation rules) applied in AmI applications, and will not be able
to look into these rather black boxes so to be able to define themselves for which
context data—created ambience—a certain action should be performed.
Determining what the action will be for certain implicit input, e.g., observed
information about user’ emotional state, cognitive state, activity, or social setting, is
the task of autonomous intelligent agents—which more often than not lack situated
forms of intelligence. The behavior of a situated agent is the outcome of a close
coupling between the agent and the environment of the user (Pfeifer and Scheier
1999; Lindblom and Ziemke 2002). In many cases, an autonomous agent senses,
analyzes, reason about, and acts upon its environment in the service of its own
agenda. When users become unable to open the black box onto a certain level
(communicate with agent) raises the question of empowerment of the user, which is
associated with loss of control. Minimizing the need for human intervention and
technical knowledge in highly complex, dynamic smart environments is about
6.9 Invisible, Disappearing, or Calm Computing 305

giving more power to intelligent agents as to taking care of tasks autonomously.


With the availability of Things that Think, minimal effort will be required from
users to make use of computers, and intelligent agents are assumed to work in
concert to support people in carrying out their everyday tasks or performing them
on their behalf. This implies that benefiting from the adaptive, responsive, and
proactive services of AmI systems is associated with delegating control and deci-
sion power to intelligent agents to execute tasks on their own authority and
autonomy. Therefore, it becomes relevant to speak of fears for the loss of control
since AmI assumes everyday life to be dependent on intelligent user interfaces
embedded and strewn in natural surroundings. The degree of the loss of control is
proportional to the degree of the system autonomy—e.g., users’ sense of control
decreases when autonomy of the service increases. AmI technologies are indeed
said to be able to easily acquire some aspects of them controlling people. The
argument is that AmI systems should not be given full control and thus autonomy,
as they may well fail annoyingly due to wrong choices becoming significant. This
raises the issue of accountability when the system as a corollary of wrong choices
exhibits unpredictability, unreliability, and undependability. It is necessary to have
some control and accounting mechanisms to determine ‘who is in control of an
autonomous system, and who is responsible if something goes wrong’ (Bohn et al.
2004). AmI should be controllable by users, which requires that they should be
given the lead in the ways that applications, interfaces, and services are designed,
configured, and implemented. (See Chap. 3 for an overview on boundaries for
developing critical user participatory AmI applications and environments.)
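One way of reading the call for control and accounting mechanisms is sketched below. It is an assumed, illustrative design only (the names AccountableAgent and DecisionRecord are hypothetical, not drawn from the cited sources): every autonomous decision is logged together with the triggering context and the component responsible for it, and the user retains a veto.

import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

@dataclass
class DecisionRecord:
    timestamp: float
    context: Dict
    action: str
    responsible_component: str
    overridden: bool = False

@dataclass
class AccountableAgent:
    policy: Callable[[Dict], str]     # maps sensed context to a proposed action
    user_veto: Callable[[str], bool]  # returns True if the user blocks the action
    log: List[DecisionRecord] = field(default_factory=list)

    def decide(self, context: Dict) -> Optional[str]:
        action = self.policy(context)
        record = DecisionRecord(time.time(), context, action, "adaptation_policy")
        if self.user_veto(action):
            record.overridden = True  # control stays with the user
            self.log.append(record)
            return None
        self.log.append(record)       # executed autonomously, but traceable afterwards
        return action

agent = AccountableAgent(
    policy=lambda c: "dim lights" if c.get("time") == "night" else "do nothing",
    user_veto=lambda action: action == "unlock door")
print(agent.decide({"time": "night"}), len(agent.log))  # -> dim lights 1

Such a log does not by itself answer the accountability question, but it at least makes it possible to reconstruct, after the fact, which context led to which action and whether the user was in a position to intervene.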
In a world of AmI, computer devices will be encountered in unfamiliar settings
and, by being ubiquitous and invisible, may not be recognizable or perceived as
computer devices. This may well frustrate users if their environment becomes
difficult to manage. This lack or loss of control may in some instances frighten
users, especially when ICT-tools are not present anymore and, thus, the mutual
interaction between the technology and the user is lost. So, when the system
autonomously reacts or pre-acts in a way that is unexpected by the user and the
latter cannot control the situation—intractability—because the ‘off-switch’ is sim-
ply not within reach or there is no evident presence of the system (or its command
components), this may cause feelings of fear. Users may get used to the effects of
the system, but when it acts outside the range of their expectations, it will only
frighten them because they cannot control it (Crutzen 2005, p. 225). In some
instances, people could feel surrounded by enemies or traitors (Lucky 1999).
As a result, users will experience AmI systems and environments as disturbing,
intrusive, and unfriendly, and will fear the unknown of their behavior. Feelings of
fear for ICT are related to feelings of locus of control and self-efficacy (Criel and
Claeys 2008): the perception of a personal capability to perform a particular set of
tasks. Criel and Claeys (2008) stipulate that, without a feeling of control and digital literacy, people will remain frightened of technological changes affecting their individual lives as well as the society in which they live.
Other issues are associated with technology invisibility, that is, with not understanding the AmI logic (i.e., not knowing why a certain action is performed or why an application behaves as it behaves), with dispossessing users of the option to switch off intelligent functionalities, with only partial user participation in system and application design, with the underestimation of the subjectivity and situatedness of interaction and of what defines and surrounds it, and with the unaccountability of designers and developers. These issues include disturbance, annoyance, confusion, mistrust, insecurity, suspicion, and hostility, as well as the marginalization and disempowerment of users, discrimination against or favoritism toward users, and power relations. The whole notion of invisibility of technology 'is sometimes seen as an attempt to have technology infiltrate everyday life unnoticed by the general public in order to circumvent any possible social resistance' (Bohn et al. 2004, p. 19).
Loss of control has implications for user acceptance of AmI technologies. It will be very difficult for technologies to be accepted by the public if they do not react in ways they are supposed to react, do not function when they are needed, and do not deliver what they promise (Beslay and Punie 2002). AmI applications need to be predictable, reliable, and dependable. Similarly, physical invisibility may harm acceptance because AmI systems become difficult to control (Punie 2003). This intractability is due to the loss of mutual interaction between the technology and the user. Perhaps the interface as an omnipresent interlocutory space will lose its central role as a mediator in human-computer interactions (Criel and Claeys 2008). As a consequence, an intelligent environment that takes decisions on the user's behalf, and what this entails in terms of reduced interaction with the user, may very well harm rather than facilitate AmI acceptance (see Punie 2005).

6.9.5 Misconceptions and Utopian Assumptions

Related to invisible computing, there are a lot of visions of limited modern applicability, dreams, and fallacies. To begin with, the fact that user interfaces will be psychologically imperceptible and the technology behind them physically invisible does not mean that the logic of the computer and the behavior and cognitive processes of software agents will come closer to human functioning, or that context-aware environments will become intelligent in a human-like way. Computers as nonhuman machines lack a certain amount of common sense (good sense and sound judgement in practical matters), and hence the impregnation of the lifeworld by AmI remains limited (see Dreyfus 2001). On the difference between human situated actions
and machine planned actions, Suchman (2005, p. 20) writes: ‘The circumstances of
our actions are never fully anticipated and are continuously changing around us. As
a consequence our actions, while systematic, are never planned in the strong sense
that cognitive science would have it. Plans are a weak resource for what is primarily
an ad-hoc activity’. Interactive computer systems lack or do not possess the
capacity to respond to unanticipated circumstances (Hayes and Reddy 1983).
Specifically, in his collaborative work (Lenat et al. 1990; Lenat and Guha 1994),
Lenat suggests that there is a fundamental difference between humans and com-
puters in terms of knowledge content and how it is used, and notes that humans are
equipped to deal with new and unexpected situations as they arise, whereas
computers cannot dynamically adjust to a new situation when it exceeds
their limitations. Furthermore, rendering technical knowledge minimal on the part of users, by placing greater reliance on knowledge of context, by reducing interactions and the burden of interacting with applications (replacing input from users with knowledge of context), and by utilizing natural human forms of communication, is definitely a great achievement that demonstrates technological advancement and offers a whole range of fascinating possibilities and opportunities. But the conundrum lies in that context-aware applications may well fail annoyingly when their wrong choices become significant, which usually happens due to inaccurate or imperfect sensing, interpretation, and inference of context, especially when it comes to such subtle, fluid contexts as users' intentions, emotional states, and social settings. Computer systems never work perfectly. As argued
by Ulrich (2008, p. 6), ‘…the idea that artificial systems should be intuitive and
easy to handle is valid; but making them and their handling of context “vanish into
the background” …risks coming close in effect (albeit not in intention) to a
machine-centered utopia’. The underlying assumption is, in addition to the above,
that AmI systems can never understand the meanings humans give to interactions or communication acts—within a changing context—nor emulate humans in the intersubjective construction of the meanings that are defined in interactions. A general
thesis of reciprocity of perspectives (Schütz and Luckmann 1974) cannot be spoken
of in communication acts between humans and computers; the difference is way
more striking as to interacting with computers compared to interacting with
humans. Moreover, while artificial systems may support the effort of taking ‘the
fundamentally contextual nature of all information more seriously than is common
now and accordingly would try to uncover contextual selectivity systematically’,
they ‘cannot tell us what selections are appropriate in terms of human motivation
and responsibility’, as ‘the key lies in understanding the selectivity of what we
observe and how we value that ‘that condition our judgments and claims’; hence,
‘we have to tell them; for only human agents are responsible. Only humans care,
and only humans (because they care) are critical, can question things with a view to
moving them closer to their visions for improvement' (Ulrich 2008). That said, the autonomous adaptation of context-aware systems' behaviors to features of the user's context signifies 'substituting rather than supporting human autonomy in framing the relevant context—as if the machine's awareness of context (or what is called so) could be compared to human intentionality' (Ibid, p. 6).
Therefore, it is necessary to question whether invisible computing, the current object of context-aware computing's fascination in the AmI paradigm, is actually useful in the most diverse scenarios of people's everyday lives. The idea of
invisible computing in its current narrow construal is, arguably, vulnerable to the
same criticism of technological symbolism and vague idealism made against pre-
ceding technological visions. In this case, the idealistic fascination appears to build
upon a romanticized view (or the new discourse surrounding the introduction) of
AmI as a breakthrough in technology—inspiring visions of calm computing but of
limited modern applicability. Invisibility ought to be redefined and embedded into a
broader understanding of technology in society so that it becomes a useful guiding
principle for AmI development in its social context. Only then will the idea be particularly effective, instead of merely evoking an inspiring vision of an unproblematic and peaceful 'computopia' in the twenty-first century. The idea that
technologies will ‘weave themselves into the fabric of everyday life until they are
indistinguishable from it’, i.e., context-aware systems ‘will come to be invisible to
common awareness’ so that ‘people will simply use them unconsciously to
accomplish everyday tasks’ and in this way ‘computers can find their way invisibly
into people’s lives’ (Weiser 1991) is just a faulty utopia associated with AmI
(Ulrich 2008, p. 5). The early vision of disappearing interfaces and invisibility of
technology as initially defined by Weiser is perhaps not the way to follow com-
pletely (Criel and Claeys 2008). Crutzen (2005, p. 225) contends, ‘The hiding of
AmI in daily aesthetic beautiful objects and in the infrastructure is like the wolf in
sheep’s clothing, pretending that this technology is harmless. Although “not seeing
this technology” could be counterproductive, it is suspicious that computing is
largely at the periphery of our attention and only in critical situations should come
to our attention. Who will decide how critical a situation is and who is then given
the power to decide to make the computing visible again’. And the physical
invisibility of AmI signifies ‘that the whole environment surrounding the individual
has the potential to function as an interface. Our body representations and the
changes the individual will make in the environment could be unconsciously the
cause of actions and interactions between the AmI devices’ (Ibid).

6.9.6 Challenges, Alternative Avenues, and New Possibilities

The vision of invisible computing has over the last decade been a subject of much
debate and criticism. The main critical voice or standpoint underlying this debate,
from within and outside the field of AmI, recognizes that users should be given the
lead in the ways that the so-called intelligent interfaces and services are designed and
implemented and that technologies should be conspicuous and controllable by
people. This involves exposing ambiguity and empowering users—that is, recon-
sidering the role of users, by making them aware of and enabling them to control what
is happening behind their backs and exposing them to the ambiguities raised by the
imperfect sensing, analysis, reasoning, and inference. Rather than focusing all the
efforts on the development of technologies for context awareness and on the design
and implementation of context-aware applications based on the guiding principle of
invisibility, research should—and it is time to—be directed towards revisiting the
notion of intelligence in context-aware computing, especially in relation to user
empowerment and visibility. Indeed, it has thus been suggested that it is time for the
AmI field to move beyond its vision of disappearing interfaces and technology
invisibility, among others, and embrace emerging trends around the notion of
intelligence as one of the core concepts of AmI. In other words, several eminent scholars in and outside the field of AmI have advocated such alternative research directions within context-aware computing, given their underlying benefits, which also contribute to user acceptance of AmI technologies. Ulrich (2008,
p. 6) states, ‘the challenge to context-aware computing is to enhance, rather than
substitute, human authorship, so that people (not their devices) can respond pur-
posefully and responsibly to the requirements and opportunities of the context. The
aim is not to make the context vanish but to give users meaningful and easy control of
it. The fundamental issue, then, is how we can make contexts visible.’ Domestic
technologies should be conspicuous rather than inconspicuous in terms of technology
revealing what the system has to offer (Petersen 2004) as intelligent services. Schmidt
argues for a context-aware interaction model in which users can always choose
between implicit and explicit interfacing: users ‘should know why the system has
reacted as it reacted’ (Schmidt 2005). This in fact provides a deeper understanding of
context-aware computing. Context-aware applications functioning unobtrusively—
sensing and processing information in the background of human life—and intelli-
gently reacting to people and anticipating and proactively responding to their desires
and intentions is no longer as fascinating an idea as it was during the inception of context-aware computing. Rather, what is becoming increasingly more desirable (yet challenging) is to create computational artifacts that enable users: to retrieve which context is measured in the environment surrounding them at any time and any place, and to be able to understand what it means; to understand and control the logic applied in context-aware applications, i.e., to know why an application behaves as it behaves and to decide how it should behave in a given situation; to switch off any context-aware interaction when needed; to intervene in what should happen, i.e., what actions are to be performed when certain context conditions are met, thereby composing their own context-aware logic by defining their own rules (a minimal illustrative sketch of such user-defined rules follows this paragraph); and, finally, to be given the opportunity to define their own meaning of context, which is subjective and evolving in time (see Chap. 4 for more detail). These alternative context-aware
artifacts may sound technically unfeasible, or at least very computationally difficult and expensive to achieve, at the current stage of research. But the design of
context-aware applications should support some kind of a hybrid approach to
interaction, especially in relation to service provision, where service offerings should
be presented in an explicit way to the user, a kind of user-driven service provision. As
Ulrich (2008, p. 22) suggests, ‘…the early ideal of invisibility of context-aware
computing may need to be replaced (or at least, complemented) with that of visibility:
a vision of computing that would render users aware of contextual assumptions and
give them contextual options’. Furthermore, it is equally important to ensure that
some parts of the technology is physically present so that users can have control over
it and thus manage their environments, for example, by switching off intelligent
functionalities, if something goes wrong—e.g., a system does not react in ways it is
supposed to react or does not function when it is needed. Especially, the idea of
accounting mechanisms for determining who is responsible if something goes wrong
seems to be a wobbly concept or may complicate matters computationally. In
addition to the off-switch, which ‘is only one end of a rich spectrum of intervention
tools’ and to the fact that AmI ‘applications are very fragile and any design paradigm
must include ways in which the average user can fix problems’, AmI should include a
310 6 Implicit and Natural HCI in AmI: Ambient …

diversity of options to influence the behavior, use and design of the technology’
(Crutzen 2005, p. 227). All in all, the way forward is to make (some aspects of)
technology visible mentally and physically in aspects deemed necessary for enabling
users to control the behavior of computing devices and oversee their interactions with
the environment in human presence. Otherwise users may fail or find it difficult to
develop an adequate mental concept for AmI interactions and behaviors when
computing devices grow more sophisticated, gain more autonomy and authority,
function unobtrusively, and become invisible, embedded. To overcome the issues of
invisibility, new interaction paradigms and novel HCI models and methods for
design and development of user interfaces are needed. AmI requires a new turn in
HCI for interacting with small and embedded computing devices to serve people
well. AmI should not be so much about how aesthetically beautiful computing devices are, or how seamlessly they are integrated in AmI environments, as about the way people would aspire to interact with these computing devices when they become an integral part of their daily lives. The challenge to context-aware computing is to
advance the knowledge of context-aware applications that conceptualize and oper-
ationalize context based on more theoretic disciplines instead of alienating the
concept from its complex meaning to serve technical purposes. The key concern is no
longer to provide context information and context-dependent services, but rather to
question the way the concept of context is defined and operationalized in the first
place. ‘Invisibility is not conducive to questioning. To make sure we are aware of
contextual assumptions and understand the ways they condition what we see, say,
and do, we have no choice but to go beyond the vision of invisibility… We probably
need to take the concept of context much more seriously than we have done so far…
I would argue that information systems research and practice, before trying to
implement context awareness technically, should invest more care in understanding
context awareness philosophically and should clarify, for each specific application,
ways to support context-conscious and context-critical thinking on the part of users.
In information systems design, context-aware computing and context-critical
thinking must somehow come together, in ways that I fear we do not understand
particularly well as yet’ (Ulrich 2008, p. 8).
The underlying assumption of complementing invisibility with visibility is to
enable users to have a certain degree of control over the behavior of intelligent agents
by having the possibility to mutually exchange representations or negotiate with
context-aware systems (intelligent agents), thereby influencing the execution of their
(ready-made) behavior. Any kind of agent-based negotiations can only succeed if
there is trust, e.g., that the agents will represent the user at least as effectively as the user would in similar circumstances (Luck et al. 2003). Otherwise, technologies could
easily acquire an aspect of ‘them controlling us’ (ISTAG 2001). Furthermore, the
technology revealing what the system has to offer motivates users to relate the
possibilities of the technology to their actual needs, dreams, and wishes (Petersen
2004). Drawing on Crutzen (2005), our acting is not routine acting in its entirety, and
using an AmI system is negotiating about what actions of the system are appropriate
for the user or actor’s situation. The ready-made behavior of ICT-representations
should ‘be differentiated and changeable to enable users to make ICT-representations
6.9 Invisible, Disappearing, or Calm Computing 311

ready and reliable for their own spontaneous and creative use; besides ‘translations
and replacements of ICT-representations must not fit smoothly without conflict into
the world for which they are made ready. A closed readiness is an ideal which is not
feasible, because in the interaction situation the acting itself is ad-hoc and therefore
unpredictable.’ (Ibid). Hence, a sound interface, nearby or remote, is the one that can
enable users to influence the decisions and actions of context-aware applications and
environments. It is important to keep in mind that people are active shapers of their
environments, not passive consumers of what technology has to offer as services in
their environments. Intelligence should, as José et al. (2010, p. 1487) state, ‘emerge
from the way in which people empowered with AmI technologies will be able to act
more effectively in their environment. The intelligence of the system would not be
measured by the ability to understand what is happening, but by the ability to achieve
a rich coupling with users who interpret, respond to, and trigger new behavior in the
system. This view must also accommodate the idea that intelligence already exists in
the way people organize their practices and their environments’. This entails that
human environments such as living places, workplaces, and social places, already
represent human intelligence with its subjectivity and situatedness at play. People
should be empowered into the process of improvised situatedness that characterizes
everyday life (Dourish 2001).

6.10 Challenges to Implicit and Natural HCI

To mimic, or rather come closer to, the aim of natural interaction—as it is impossible to realize a complete model of this form of interaction, at least at the current stage of research—as a key enabling technology for AmI poses many open issues and challenges associated with system engineering, system modeling, and system design. In relation to human natural forms of communication, challenges include, but are not limited to:
• interaction paradigms that govern the assembly of multimodal and perceptual user
interfaces associated with conversational agents, cognitive and emotional
context-aware systems, affective systems, emotionally intelligent systems, and so on;
• principles and tailor-made methodologies for engineering natural interaction;
• practical application of design methodologies to real-world interactive
problems;
• general methods for acquiring and modeling of verbal and nonverbal behavior
as direct communicative behavior, implicit contextual information, and emo-
tional display;
• techniques, theories, and models of the information and structure of multimodal
user interfaces;
• evaluation techniques of such interfaces; and
• algorithms and programming of such interfaces.
It is crucial to address and overcome these challenges in order to create AmI systems that are capable of emulating human interaction capabilities in terms of dynamic perception, and what this entails in terms of multimodality and multi-channeling, which are crucial for context-aware (iHCI) applications to function properly in understanding and supporting behavior and in responding to cognitive, emotional, social, and conversational needs and desires.
As to implicit interaction, key challenges are addressed in Chap. 3 of this book.
Accordingly, it is no easy task to achieve an advanced form of implicit interaction.
‘Realizing implicit input reliably as general concept appears at the current stage of
research close to impossible. A number of subtasks for realizing implicit input, such
as recognition and interpretation of situations…are not solved yet' (Schmidt 2005,
pp. 164–165). For example, machine learning methods ‘choose a trade-off between
generalization and specification when acquiring concepts from sensor data record-
ings, which does not always meet the correct semantics, hence resulting in wrong
detections of situations’ (Bettini et al. 2010, p. 11), thereby wrong choices and thus
irrelevant application actions—implicit output. See Chaps. 4 and 5 for a detailed
account of the issues associated with context recognition with respect to existing
supervised and unsupervised learning algorithms as well as ontological modeling
methods and reasoning mechanisms. To address some of the issues relating to
context recognition, emerging technologies such as MEMS, NEMS, and
multi-sensor fusion are expected to drastically change the way sensors can be
designed and function, so as to realize an advanced and reliable form of implicit input.
MEMS technology is expected to considerably enhance computational speed,
memory capacity, and bandwidth, as well as methods to achieve a dynamically
defined multi-parametric performance goal (e.g., reliability, accuracy, energy use,
etc.). However, MEMS technology also poses many challenges pertaining to
research and development, design and engineering, and manufacturing and fabri-
cation (see Chap. 4 for a detailed discussion). Regardless, technological advancement is rapid but seems to happen ad hoc, when new capture, modeling, and machine learning technologies become available, rather than on the basis of a theoretically clear approach, which keeps perpetually distancing technologies from computing theories and thus creating a gap between theory and practice in the sphere of AmI.
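One pragmatic response to such recognition errors, consistent with the hybrid interaction approach suggested earlier, is to let the system abstain from implicit action whenever its confidence is low and fall back on explicit interaction. The sketch below illustrates this in Python under assumed names (recognize_context, implicit_or_explicit); the 'classifier' is a hard-coded stand-in for a real learned model, not an implementation of any cited technique.

from typing import Dict

def recognize_context(sensor_features: Dict[str, float]) -> Dict[str, float]:
    """Stand-in for a trained classifier returning pseudo-probabilities per context label."""
    busy = min(1.0, 0.5 * sensor_features.get("keyboard_rate", 0.0)
                    + 0.5 * sensor_features.get("calendar_density", 0.0))
    return {"busy": busy, "available": 1.0 - busy}

def implicit_or_explicit(sensor_features: Dict[str, float], threshold: float = 0.8) -> str:
    scores = recognize_context(sensor_features)
    label, confidence = max(scores.items(), key=lambda kv: kv[1])
    if confidence >= threshold:
        return f"implicit action for '{label}' (confidence {confidence:.2f})"
    return "ask the user explicitly (confidence too low for an implicit action)"

print(implicit_or_explicit({"keyboard_rate": 0.9, "calendar_density": 0.9}))
print(implicit_or_explicit({"keyboard_rate": 0.5, "calendar_density": 0.4}))

An abstention threshold of this kind does not solve the underlying recognition problem, but it limits the cost of wrong detections by shifting borderline cases back to explicit interaction.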
Communication and interaction between humans as a natural form of interaction
is highly complex and manifold. By all indicators (based on current research in AmI),
realizing natural interaction remains a daunting challenge, if not an unattainable goal.
It can be a never-ending pursuit under interdisciplinary research endeavors that can
bring together eminent scholars from such fields as HCI, AI, cognitive science,
cognitive psychology, human communication, linguistics, anthropology, and phi-
losophy to focus their efforts and pool their knowledge towards the sought
objective. There are several valid reasons for which a computer (artificial) system
may never be able to interact with a user on a human level. In addition to the
differences between computer systems and humans as mentioned above in terms of
situated versus planned actions, inability of computers to answer unanticipated cir-
cumstances and to understand the meanings and intentions given by humans to
communication acts, and computers’ lack of a certain amount of common sense,
computer systems do not possess solutions to detect communication problems.
Communication between humans is not error free, as many conversational acts
involve some misunderstandings and ambiguities. These problems in real-life
interactive situations are resolved by the communication partners. ‘Often ambiguities
are rephrased and put into the conversation again to get clarity by reiteration of the
issue. Similarly misunderstandings are often detected by monitoring the response
of the communication partner. In case there is a misinterpretation issues are repeated
and corrected. When monitoring conversations it becomes apparent that efficient
communication relies heavily on the ability to recognize communication errors and
to resolve them. When building interactive systems that are invisible the ability to
detect communication problems and to have ways to resolve it becomes crucial. In
certain cases knowledge about the situation can provide the essential cues to solve the
problem.’ (Schmidt 2005, p. 163). See next chapter for further elaboration on
communication error in relation to linguistic performance. Further to comparing the
complex ways in which humans interact to the way humans interact with computers,
it is apparent that computers lack the capacity to meaningfully interpret context to
influence and change interaction with the user. The underlying assumption is that the meaning or perception of context is evolving in time, subjective, and socially situated, and therefore varies from one individual to another depending on an array of factors, including cognitive, emotional, motivational, biochemical, intellectual, social, cultural, normative, empirical, and so forth. Also, human communication differs from that of computer systems regarding the nature of the knowledge base that is used in communication and interaction between humans for understanding each other. This shared knowledge is cognitively and socioculturally represented and constructed. It involves a complete world and language model, which can be very difficult to grasp formally and to make effective use of computationally. What humans expect from
other humans is in any form of communication strongly influenced by the implicitly
shared common knowledge (see Schmidt 2005). See next chapter for more chal-
lenges as to mimicking human communication in relation to many different aspects.
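The error-handling strategy described by Schmidt (detecting a communication problem, exploiting situational cues where available, and otherwise resolving the problem by reiteration) can be caricatured in a few lines of Python. The sketch below is an illustration only; interpret and respond are hypothetical stand-ins for real language-understanding and dialog-management components.

from typing import Dict, List

def interpret(utterance: str) -> List[str]:
    """Stand-in for a language-understanding component; returns candidate intents."""
    if "light" in utterance:
        return ["turn_on_light", "dim_light"]
    if "music" in utterance:
        return ["play_music"]
    return []

def respond(utterance: str, situation: Dict[str, str]) -> str:
    candidates = interpret(utterance)
    if not candidates:
        return "I did not understand; could you rephrase?"       # detect the problem
    if len(candidates) == 1:
        return "executing " + candidates[0]
    # Ambiguity: first try situational cues, otherwise hand the problem back to the user.
    if situation.get("time_of_day") == "night" and "dim_light" in candidates:
        return "executing dim_light (resolved from the situation)"
    return "Did you mean " + " or ".join(candidates) + "?"        # clarify by reiteration

print(respond("the light please", {"time_of_day": "night"}))
print(respond("the light please", {}))
print(respond("blargh", {}))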
Given the challenges posed by mimicking natural interaction into computers, the
vision of AmI is unlikely to materialize according to the initial plan set by its
originators. Although some advocates of AmI claim that AmI is already upon us, a natural interaction paradigm appropriate to a fully robust AmI has a long way to go, if it transpires at all. Yet, the motivation for realizing natural interaction, coupled with observing the differences between human interaction and HCI, continues to
inspire researchers into a quest for novel forms of interaction. Not to demean the
value of the work that has already been done in the field of HCI, the new interactive technologies utilizing verbal and nonverbal behavior are undeniably a great achievement, providing advanced user interfaces, promising simplicity and intuitiveness, and supporting the AmI computing paradigm. Human-directed disciplines
have provided many foundational theories and approaches that have proven to be
influential in the way user interfaces are designed, function, and behave. No doubt,
there is still a vast unexplored zone in the area of human communication
(see next chapter) and thus a lot more to learn from in order to enhance interaction
capabilities of future-generation AmI applications and systems.

6.11 Interdisciplinary and Transdisciplinary Research

Reaching the current stage of research within implicit and natural HCI and achieving the current state-of-the-art in related applications has been made possible by the
amalgamation of the breakthroughs at the level of the enabling technologies and
processes of AmI and new discoveries in cognitive science, AI, cognitive neuro-
science, communication engineering, human communication, and social sciences—
that, combined, make it possible to acquire a better understanding of the cognitive,
emotional, behavioral, and social aspects and processes underlying human-to-human
communication and how this complex and manifold process can be implemented
into computer systems. In this regard, it is important to underscore that interdisci-
plinary research endeavors have been of great influence on the advent of the new
HCI paradigm, which has made it possible to build ground-breaking or novel
interactive systems. Human communication and thus HCI entail many areas that
need to be meshed together through interdisciplinary research to create interactional
knowledge necessary to understand the phenomenon of AmI as a novel approach to
HCI. HCI within the area of AmI is too complex to be addressed by single disciplines
and also exceeds the highly interdisciplinary field—in some of its core concepts such
as context, interaction, and actions. It is suggested that interdisciplinary efforts
remain inadequate in impact on theoretical development for coping with the
changing human conditions (see Rosenfield 1992). Hence, transdisciplinary
approach remains more pertinent to investigate HCI in relation to AmI—as a
complex problem, as this approach insists on the fusion of different elements of a set
of theories with a result that exceeds the simple sum of each. Thus, any future
research agenda for HCI in AmI should draw on several theories, such as context in
theoretic disciplines, situated cognition, situated action, social interaction, social
behavior, verbal and nonverbal communication behavior, and so on. Understanding
the tenets of several pertinent theories allows a more complete understanding
of implicit and natural HCI. Among the most holistic, these theories are drawn
mainly from cognitive science, social science, humanities, human communication,
philosophy, constructivism and constructionism, and so on.

References

ACM SIGCHI (2009) Curricula for human-computer interaction. http://old.sigchi.org/cdg/cdg2.html#2_1. Viewed 20 Dec 2009
Adjouadi M, Sesin A, Ayala M, Cabrerizo M (2004) Remote eye gaze tracking system as a
computer interface for persons with severe motor disability. In: Proceedings of the 9th
international conference on computers helping people with special needs, Paris, pp 761–766
Alexander S, Sarrafzadeh A (2004) Interfaces that adapt like humans. In: Proceedings of 6th
computer human interaction 6th Asia pacific conference (APCHI 2004), Rotorua, pp 641–645
Argyle M, Cook M (1976) Gaze and mutual gaze. Cambridge University Press, Cambridge
Balkenius C (1995) Natural intelligence in artificial creatures. PhD thesis, Department of
Cognitive Studies, Lund University, Lund
Balkenius C, Hulth N (1999) Attention as selection-for-action: a scheme for active perception. In:
Schweitzer G, Burgard W, Nehmzow U, Vestli SJ (eds) Proceedings of EUROBOT ‘99, IEEE
Press, pp 113–119
Barkhuus L, Dey AK (2003) Is context-aware computing taking control away from the user? Three
levels of interactivity examined. Proceedings of UbiComp. Springer, Heidelberg, pp 149–156
Beslay L, Punie Y (2002) The virtual residence: identity, privacy and security. The IPTS Report
67:17–23 (Special Issue on Identity and Privacy)
Bettini C, Brdiczka O, Henricksen K, Indulska J, Nicklas D, Ranganathan A, Riboni D (2010) A
survey of context modelling and reasoning techniques. J Pervasive Mobile Comput 6(2):161–180
(Special Issue on Context Modelling, Reasoning and Management)
Bibri SE (2012) A critical reading of the scholarly and ICT industry’s construction of ambient
intelligence for societal transformation of Europe. Master thesis, Malmö University
Bohn J, Coroama V, Langheinrich M, Mattern F, Rohs M (2004) Living in a world of smart
everyday objects—social, economic, and ethical implications. J Hum Ecol Risk Assess 10
(5):763–786
Brown PJ, Jones GJF (2001) Context-aware retrieval: exploring a new environment for
information retrieval and information altering. Pers Ubiquit Comput 5(4):253–263
Brown PJ, Bovey JD, Chen X (1997) Context-aware applications: from the laboratory to the
marketplace. IEEE Pers Commun 4(5):58–64
Cassell J, Sullivan J, Prevost S, Churchill E (eds) (2000) Embodied conversational agents. MIT
Press, Cambridge
Chen G, Kotz D (2000) A survey of context-aware mobile computing research. Paper
TR2000-381, Department of Computer Science, Dartmouth College
Cheverst K, Mitchell K, Davies N (2001) Investigating context-aware information push vs.
information pull to tourists. In: Proceedings of mobile HCI 01
Criel J, Claeys L (2008) A transdisciplinary study design on context-aware applications and
environments, a critical view on user participation within calm computing. Observatorio
(OBS*) J 5:057–077
Crutzen CKM (2005) Intelligent ambience between heaven and hell. Inf Commun Ethics Soc
3(4):219–232
de Silva GC, Lyons MJ, Tetsutani N (2004) Vision based acquisition of mouth actions for
human-computer interaction. In: Proceedings of the 8th pacific rim international conference on
artificial intelligence, Auckland, pp 959–960
Dewey J (1916) Democracy and education. The Macmillan Company, used edition: ILT Digital
Classics 1994. http://www.ilt.columbia.edu/publications/dewey.html. Viewed 25 June 2005
Dey AK (2001) Understanding and using context. Pers Ubiquit Comput 5(1):4–7
Dix A, Finlay J, Abowd G, Beale R (1998) Human computer interaction. Prentice Hall Europe,
Englewood Cliffs, NJ
Dourish P (2001) Where the action is. MIT Press
Dreyfus H (2001) On the internet. Routledge, London
Erickson T (2002) Ask not for whom the cell phone tolls: some problems with the notion of
context-aware computing. Commun ACM 45(2):102–104
Franklin S, Graesser A (1997) Is it an agent, or just a program?: a taxonomy for autonomous
agents. In: Proceedings of the 3rd international workshop on agent theories, architectures, and
languages. Springer, London
Gill SK, Cormican K (2005) Support ambient intelligence solutions for small to medium size
enterprises: Typologies and taxonomies for developers. In: Proceedings of the 12th
international conference on concurrent enterprising, Milan, Italy, 26–28 June 2005
Hayes PJ, Reddy RD (1983) Steps toward graceful interaction in spoken and written man-machine
communication. Int J Man Mach Stud I(19):231–284
Hix D, Hartson HR (1993) Developing user interfaces: ensuring usability through product and
process. Wiley, London
ISO 9241-11 (1998) Ergonomic requirements for office work with visual display terminals
(VDTs), part 11: guidance on usability. International Organization for Standardization,
Switzerland, Genève
ISTAG 2001 (2001) Scenarios for ambient intelligence in 2010. ftp://ftp.cordis.lu/pub/ist/docs/
istagscenarios2010.pdf. Viewed 22 Oct 2009
ISTAG 2003 (2003) Ambient intelligence: from vision to reality (For participation—in society and
business). http://www.ideo.co.uk/DTI/CatalIST/istag-ist2003_draft_consolidated_report.pdf.
Viewed 23 Oct 2009
José R, Rodrigues H, Otero N (2010) Ambient intelligence: beyond the inspiring vision. J Univ
Comput Sci 16(12):1480–1499
Karpinski M (2009) From speech and gestures to dialogue acts. In: Esposito A, Hussain A,
Marinaro M, Martone R (eds) Multimodal signals: cognitive and algorithmic issues. Springer,
Berlin, pp 164–169
Kasabov N (1998) Introduction: hybrid intelligent adaptive systems. Int J Intell Syst 6:453–454
Kelley T (2002) The art of innovation: lessons in creativity from IDEO, America’s leading design
firm. Harper Collins Business, London
Kim S, Suh E, Yoo K (2007) A study of context inference for web-based information systems.
Electron Commer Res Appl 6:146–158
Kumar M, Paepcke A, Winograd T (2007) EyePoint: practical pointing and selection using gaze
and keyboard. In: Proceedings of the CHI: conference on human factors in computing systems,
San Jose, CA, pp 421–30
Lavie T, Tractinsky N (2004) Assessing dimensions of perceived visual aesthetics of web sites.
Int J Hum Comput Stud 60(3):269–298
Lee Y, Shin C, Woo W (2009) Context-aware cognitive agent architecture for ambient user
interfaces. In: Jacko JA (ed) Hum Comput Interact. Springer, Berlin, pp 456–463
Lenat DB, Guha RV (1994) Enabling agents to work together. Commun ACM 37(7):127–142
Lenat DB, Guha RV, Pittman K, Pratt D, Shepherd M (1990) Cyc: toward programs with
commonsense. Commun ACM 33(8):30–49
Lieberman H, Selker T (2000) Out of context: computer systems that adapt to, and learn from,
context. IBM Syst J 39:617–632
Lindblom J, Ziemke T (2002) Social situatedness: Vygotsky and beyond. In: 2nd international
workshop on epigenetic robotics: modeling cognitive development in robotic systems,
Edinburgh, pp 71–78
Luck M, McBurney P, Priest C (2003) Agent technology: enabling next generation computing.
A roadmap for agent-based computing. Agentlink EU FP5 NoE
Lucky R (1999) Connections. In: Bi-monthly column in IEEE Spectrum
Lueg C (2002) Operationalizing context in context-aware artifacts: benefits and pitfalls. Hum
Technol Interface 5(2):1–5
Luger G, Stubblefield W (2004) Artificial intelligence: structures and strategies for complex
problem solving. The Benjamin/Cummings Publishing Company Inc
Nielsen J (1993) Usability engineering. Academic Press, Boston
Nielsen J, Budiu R (2012) Mobile usability. New Riders Press
Norman DA (1988) The design of everyday things. Doubleday, New York
Norman DA (1998) The invisible computer. MIT Press, Cambridge, MA
Pantic M, Rothkrantz LJM (2003) Toward an affect sensitive multimodal human-computer
interaction. Proc IEEE 91(9):1370–1390
Petersen MG (2004) Remarkable computing—the challenge of designing for the Home. In: CHI
2004, Vienna, Austria, pp 1445–1448
Pfeifer R, Scheier C (1999) Understanding intelligence. MIT Press, Cambridge
Picard RW (1997) Affective computing. MIT Press, Cambridge
Picard RW (2000) Perceptual user interfaces: affective perception. Commun ACM 43(3):50–51
Picard RW, Vyzas E, Healey J (2001) Toward machine emotional intelligence: analysis of
affective physiological state. IEEE Trans Pattern Anal Mach Intell 23(10):1175–1191
Poslad S (2009) Ubiquitous computing: smart devices, environments and interaction. Wiley,
London
Punie Y (2003) A social and technological view of ambient intelligence in everyday life: what
bends the trend? In: The European media and technology in everyday life network, 2000–2003,
Institute for Prospective Technological Studies Directorate General Joint Research Center
European Commission
Punie Y (2005) The future of ambient intelligence in europe: the need for more everyday life. In:
Media technology and everyday life in Europe: from information to communication. Roger
Silverstone Edition, Ashgate, pp. 141–165
Rieder B (2003) Agent technology and the delegation-paradigm in a networked society. Paper for
the EMTEL conference, 23–26 April, London
Rist T, Brandmeier P (2002) Customizing graphics for tiny displays of mobile devices. Pers
Ubiquit Comput 6(4):260–268
Riva G, Loreti P, Lunghi M, Vatalaro F, Davide F (2003) Presence 2010: the emergence of
ambient intelligence. In: Riva G, Davide F, IJsselsteijn WA (eds) Being there: concepts, effects
and measurement of user presence in synthetic environments. IOS Press, Amsterdam, pp 60–81
Riva G, Vatalaro F, Davide F, Alcañiz M (2005) Ambient intelligence: the evolution of
technology, communication and cognition towards the future of human-computer interaction.
IOS Press, Amsterdam
Rosen R (1985) Anticipatory systems: philosophical, mathematical and methodological founda-
tions. Pergamon Press, Oxford
Rosenfield PL (1992) The potential of transdisciplinary research for sustaining and extending
linkages between the health and social science. Soc Sci Med 35(11):1343–1357
Rossi D, Schwabe G, Guimares R (2001) Designing personalized web applications. In:
Proceedings of the tenth international conference on World Wide Web, pp 275–284
Russell S, Norvig P (1995) Artificial intelligence: a modern approach. Prentice-Hall Inc,
Englewood Cliffs, NJ
Russell S, Norvig P (2003) Artificial intelligence—a modern approach. Pearson Education, Upper
Saddle River, New Jersey
Samtani P, Valente A, Johnson WL (2008) Applying the SAIBA framework to the tactical
language and culture training system. In: Parkes P, Parsons M (eds) The 7th international
conference on autonomous agents and multiagent systems (AAMAS 2008). Estoril, Portugal
Schilit B, Adams N, Want R (1994) Context-aware computing applications. In: Proceedings of
IEEE workshop on mobile computing systems and applications, Santa Cruz, CA, pp 85–90
Schmidhuber J (1991) Adaptive confidence and adaptive curiosity. Technische Universitat
Munchen, Institut fur Informatik
Schmidt A (2005) Interactive context-aware systems interacting with ambient intelligence. In:
Riva G, Vatalaro F, Davide F, Alcañiz M (eds) Ambient intelligence: the evolution of
technology, communication and cognition towards the future of human-computer interaction.
IOS Press, Amsterdam, pp 159–178
Schütz A, Luckmann T (1974) The structures of the life-world. Heinemann, London
Sibert LE, Jacob RJK (2000) Evaluation of eye gaze interaction. In: Proceedings of the ACM
conference on human factors in computing systems. The Hague, pp 281–288
Smith R, Conrey FR (2007) Agent-based modeling: a new approach for theory building in social
psychology. Person Soc Psychol Rev 11:87–104
Somervell J, Wahid S, McCrickard DS (2003) Usability heuristics for large screen information
exhibits. In: Rauterberg M, Menozzi M, Wesson J (eds) INTERACT 2003, Zurich, pp 904–907
Stiermerling O, Kahler H, Wulf V (1997) How to make software softer—designing tailorable
applications. In: Symposium on designing interactive systems, pp 365–376
Suchman L (1987) Plans and situated actions: the problem of human-machine communication.
Cambridge University Press, Cambridge
Suchman L (2005) Introduction to plans and situated actions II: human-machine reconfigurations,
2nd edn. Cambridge University Press, New York/Cambridge
Sutton RS, Barto AG (1998) Reinforcement learning: an introduction. MIT Press, Cambridge, MA
ter Maat M, Heylen D (2009) Using context to disambiguate communicative signals. In:
Esposito A, Hussain A, Marinaro M, Martone R (eds) Multimodal signals. Springer, Berlin,
pp 164–169
Udsen LE, Jorgensen AH (2005) The aesthetic turn: unraveling recent aesthetic approaches to
human-computer interaction. Digital Creativity 16(4):205–216
Ulrich W (2008) Information, context, and critique: context awareness of the third kind. In: The
31st information systems research seminar in Scandinavia, Keynote talk presented to IRIS 31
Vilhjálmsson HH (2009) Representing communicative function and behavior in multimodal
communication. In: Esposito A, Hussain A, Marinaro M, Martone R (eds) Signals: cognitive
and algorithmic issues. Springer, Berlin, pp 47–59
Wasserman V, Rafaeli A, Kluger AN (2000) Aesthetic symbols as emotional cues. In: Fineman S
(ed) Emotion in organizations. Sage, London, pp 140–167
Weiser M (1991) The computer for the 21st century. Sci Am 265(3):94–104
Weiser M, Brown JS (1998) The coming age of calm technology. In: Denning PJ, Metcalfe RM
(eds) Beyond calculation: the next fifty years of computing. Springer, New York, pp 75–85
Wooldridge M (2002) An introduction to multiagent systems. Wiley, London
Wooldridge M, Jennings NR (1995) Intelligent agents: theory and practice. Knowl Eng Rev 10
(2):115–152
Wright D (2005) The dark side of ambient intelligence. Foresight 7(6):33–51
Wright D, Gutwirth S, Friedewald M, Punie Y, Vildjiounaite E (2008) Safeguards in a world of
ambient intelligence. Springer, Dordrecht
York J, Pendharkar PC (2004) Human-computer interaction issues for mobile computing in a
variable work context. Int J Hum Comput Stud 60:771–797
Zhou J, Kallio P (2005) Ambient emotion intelligence: from business awareness to emotion
awareness. In: Proceeding of 17th international conference on systems research, informatics
and cybernetics, Baden
Zhou J, Yu C, Riekki J, Kärkkäinen E (2007) AmE framework: a model for emotion-aware
ambient intelligence. University of Oulu, Department of Electrical and Information Engineering,
Faculty of Humanities, Department of English VTT Technical Research Center of Finland
Part II
Human-Inspired AmI Applications
Chapter 7
Towards AmI Systems Capable
of Engaging in ‘Intelligent Dialog’
and ‘Mingling Socially with Humans’

7.1 Introduction

Human communication has provided a wealth of knowledge that has proven to be
valuable and seminal in HCI research and practice. This involves the way inter-
active computer systems can be engineered, designed, and modeled, and the way they operate and
behave—e.g., perceive and respond to users’ multimodal verbal and nonverbal
communication signals in relation to a variety of application domains within both
AI and AmI. Verbal and nonverbal communication behavior has been extensively
studied and widely applied across several computing fields. The most significant
contribution of verbal and nonverbal communication behavior theories to AmI is
the development of naturalistic multimodal user interfaces, which can be imple-
mented in all kinds of applications emulating human functioning in terms of cog-
nitive, emotional, social, and conversational processes and behaviors. Entailing
specific user interfaces, AmI is capable of, among other things, responding intelligently to
spoken or gestured indications, reacting to explicit spoken and gestured commands,
engaging in intelligent dialogs, and mingling socially with human users. In other
words, as a research area in HCI, naturalistic multimodal user interfaces and thus
verbal and nonverbal communication behavior have been applied in context-aware
systems, affective systems, touchless systems, dialog act systems, and embodied
conversational agents (ECAs). The focus of this chapter is on the use and appli-
cation of verbal and nonverbal communication behavior in ECAs and dialog act
systems (sometimes referred to as spoken dialog systems (SDSs)). This has inspired
many researchers into a quest for creating interaction between humans and systems
in AmI environments that strive to emulate natural interaction. Recent approaches
in research in both AI and AmI have been influenced by the pursuit of modeling the common knowledge base that humans use in communication to understand each other, which encompasses a complete world and language model, as well as modeling verbal and nonverbal communication behavior, and making this knowledge accessible to computer systems.


The origins of ECA (and SDS) can be traced back to AI research in the 1950s
concerned with developing conversational interfaces. The research is commonly
considered a branch of HCI. However, it is only during the last decade, with major
advances in speech and natural interaction technology, that large-scale working
conversational systems have been developed and applied, where the incorporation
of components remains a key issue. As a research area in AI, ECAs attempt to
personify the computer interface in the form of an animated person (human-like
graphical embodiment) or robot (human-like physical embodiment), and present
interactions in a conversational form. Given the fundamental paradigm of AmI,
namely that interfaces disappear from the user’s consciousness and recede into the
background, the model of human-like graphical embodiment is of more relevance
in the context of AmI. A face-to-face conversation involving humans and virtual
beings is considered the highest form of intelligent behavior an AmI system can exhibit.
In this sense, AI relates to AmI in that the latter entails artificial systems that
possess human-inspired intelligence in terms of the processes and behaviors
associated with conversational acts—computational intelligence.
More recent research within ECAs has started to focus on context (dialog,
situation, environment, and culture) to disambiguate communicative signals and
generate multimodal communicative behavior. This research endeavor constitutes
one of the critical steps towards coming closer to the aim of creating interaction
between humans and systems that verge on natural interaction. Conversational
systems are built on theoretical models of linguistics and its subfields as well as
nonverbal communication behavior, coupled with context awareness, natural
interaction, and autonomous intelligent behavior as computational capabilities
exhibited by agents. Within AI research in AmI, many theoretical perspectives of
human communication are being investigated, and new computational modeling
and simulation techniques are being developed to create believable human repre-
sentatives. The combination of recent discoveries in human communication and
neurocognitive science, which make it possible to acquire a better understanding of a variety of aspects of human functioning in terms of interaction (linguistic, pragmatic, psycholinguistic, neurolinguistic, sociolinguistic, cognitive-linguistic, and paralinguistic aspects), together with the breakthroughs at the level of the enabling technologies, makes it increasingly possible to build advanced conversational systems
based on this understanding.
This chapter addresses computational intelligence in terms of conversational and
dialog systems and computational processes and methods to support complex
communicative tasks. It aims to explore human verbal and nonverbal communi-
cation behavior and shed light on the recent attempts undertaken to investigate
different aspects of human communication with the aim to replicate and implement
them into ECAs. In HCI, ECAs represent multimodal user interfaces where
modalities are the natural modalities of human conversation, namely speech, facial
expressions and gestures, hand gestures, and body postures (Cassell et al. 2000).
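To make the notion of a naturalistic multimodal interface more concrete, the following minimal Python sketch (purely illustrative and not drawn from Cassell et al. (2000) or any system cited here; all field names and values are hypothetical) shows how one perceived user turn combining speech, facial expression, hand gesture, and body posture might be represented before further interpretation.

from dataclasses import dataclass, field
from typing import Optional

@dataclass
class MultimodalTurn:
    """One perceived user turn as seen by a (hypothetical) ECA front end."""
    speech_text: Optional[str] = None            # output of a speech recognizer
    prosody: dict = field(default_factory=dict)  # e.g. {"pitch_hz": 210, "energy": 0.7}
    facial_expression: Optional[str] = None      # e.g. "smile", "frown"
    hand_gesture: Optional[str] = None           # e.g. "point_right", "thumbs_up"
    body_posture: Optional[str] = None           # e.g. "lean_forward"

def summarize_turn(turn: MultimodalTurn) -> str:
    """List the symbolic channels that were actually observed in this turn."""
    channels = {
        "speech": turn.speech_text,
        "face": turn.facial_expression,
        "hand": turn.hand_gesture,
        "posture": turn.body_posture,
    }
    observed = [f"{name}={value}" for name, value in channels.items() if value]
    return ", ".join(observed) or "no signal observed"

if __name__ == "__main__":
    turn = MultimodalTurn(speech_text="Could you show me the map?",
                          facial_expression="smile",
                          hand_gesture="point_right")
    print(summarize_turn(turn))

In a real ECA, each field would be produced by a dedicated recognizer and time-stamped so that the channels can be aligned and fused.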

7.2 Perspectives and Domains of Communication

Communication is of a multifarious, multifaceted, and diversified nature. The term
‘communication’ has been defined in multiple ways and approached from different
perspectives. Communication has been studied in various disciplines for a long
period of time. The disciplines in which the term is of fundamental use include the humanities, cognitive science, cognitive psychology, cognitive neuroscience, sociology, anthropology, organizational science, computer science (HCI, AmI, and AI), ICT, and human communication, to name but a few. Moreover, there is a
variety of specialties that can be found under the communication curricula of major
academic institutions, including interpersonal communication, intrapersonal com-
munication, verbal and nonverbal communication, intercultural communication,
cross-cultural communication, applied communication, organizational communi-
cation, human–computer communication (HCC), computer-mediated communica-
tion, mass communication, and so on. Accordingly, human communication occurs
on different levels, such as intrapersonal, interpersonal, intercultural, and cross-
cultural, and in small, medium, and large groups, as well as in a variety of settings,
such as home, working, social, and public environments. Interpersonal communi-
cation, which can be described as how two individuals send and receive messages,
is of particular relevance to AmI as an approach to HCI. HCI denotes, in addition to
influencing the functioning of the computer system by a human user by means of
explicit and/or implicit input and communicating information from computer sys-
tems to human users, the two-way process of communication between computer
systems and human users, which is the focus of this chapter.

7.3 Human Communication

The working definition of communication for this chapter intends to accommodate
the mainstream perspective as adopted by many scholars who study communica-
tion. Communication or the act of communicating can be described as a process of
interchanging and imparting thoughts, feelings, messages, opinions or information
from one entity to another via one or a combination of relevant means, such as
speech, prosody, facial gestures, body language, written texts, symbols, and aes-
thetics. For example, to communicate emotions in dialog acts, different body parts
movement can be used: the gestural expression for attitudes; the facial expression
for emotional reactions; prosody for expressing feelings and attitudes; and speech,
the most precise tool, for expressing complex intentions (Karpinski 2009).
Communication processes are interactions between at least two humans or agents
and through which information is intended, channeled and imparted, via some
medium, by a sender to a recipient/receiver. One common model in communication
theory is the transmission model, which consists of three basic elements of com-
munication: production, transmission and reception. Sometimes referred to as
expression, production entails the process by which human agents express them-
selves through deciding, planning, encoding, and producing the message they wish
to communicate. Transmission involves sending the message through some
medium to the recipient, e.g., in verbal communication the only medium of con-
sequence through which the spoken message travels is air. Reception, also referred
to as comprehension, entails the process by which the recipient detects the message
through the sense of hearing and then decodes the expression produced by the
sender. The receiver interprets the information being exchanged and then gives the
sender a feedback. Through this process, which is intrapersonal in nature, infor-
mation transmission affects each of the parties involved in the communication
process. Communication, whether verbal or nonverbal, involves, according to
Johnson (1989), three essential aspects: transmission of information, the meaning of
that transmission, and the behavioral effects of the transmission of the information.
From a different perspective, human communication can be clustered into four
levels: the content and form of messages, communicators, levels of communication,
and contexts and situations in which communication occurs (Littlejohn and Foss
2005). These levels provide a more structured view of human communication.
Furthermore, communication entails that communicating participants share an area
of communicative commonality, which is essential for a better understanding of the
content being exchanged between them. Characteristically, human communication
involves a common knowledge base used in communication between humans for
understanding each other. This common knowledge includes a complete world and
language model; language is a particular way of thinking and talking about the
world. The expectation of humans towards other humans in any communication act
is strongly influenced by the common knowledge they share. There are many types
of theories that attempt to describe the different models, levels, components, and
variables of how human communication as a complex and manifold process is
achieved. In all, human communication is a planned act performed by a human agent for the purpose of causing some effect in an attentive human recipient using
both verbal and nonverbal behaviors. In other words, it entails a two-way com-
munication process of reaching mutual understanding, in which participants
exchange (encode–decode) representations pertaining to information, ideas,
thoughts, and feelings, as well as create, share, and ascribe meaning (to these
representations).
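As a rough illustration of the transmission model described above (production, transmission, reception, plus feedback), the following toy Python sketch (illustrative only; the encoding scheme and the "noise" are placeholders, not a model of real speech) walks a message through the encode, transmit, and decode steps and closes the loop with a feedback utterance.

def produce(thought: str) -> str:
    """Production: the sender plans and encodes a thought into an utterance."""
    return f"<utterance>{thought}</utterance>"

def transmit(signal: str, noisy: bool = False) -> str:
    """Transmission: the medium (e.g. air for speech) carries, and may degrade, the signal."""
    return signal.replace("map", "m~p") if noisy else signal

def receive(signal: str) -> str:
    """Reception: the recipient detects and decodes the signal back into content."""
    return signal.removeprefix("<utterance>").removesuffix("</utterance>")

if __name__ == "__main__":
    sent = produce("Where is the map?")
    heard = receive(transmit(sent))
    print(heard)                                   # Where is the map?
    feedback = produce(f"You asked: {heard}")      # the recipient replies, closing the loop
    print(receive(transmit(feedback)))             # You asked: Where is the map?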
Human communication is the field of study that is concerned with how humans
communicate, involving all forms of verbal and nonverbal communication. As an
academic discipline, human communication draws from several disciplines,
including linguistics, sociolinguistics, psycholinguistics, cognitive linguistics,
behavioral science, sociology, anthropology, social constructivism, social con-
structionism, and so on. As a natural form of interaction, human communication is
highly complex, manifold, subtle, and dynamic. It makes humans the most pow-
erful communicators on the planet. To communicate with each other and convey
and understand messages, humans use a wide variety of verbal and nonverbal
communicative behaviors. As body movements, such behaviors are sometimes
classified into micro-movements (e.g., facial expressions, eye movement) and
macro-movements (e.g., gestures, corporal stances). They have been under vigorous
investigation in the creation of AmI systems for ambient services and conversa-
tional purposes, as they can be utilized as both implicit and explicit inputs for
interface control and interaction.

7.3.1 Nonverbal Communication

Nonverbal communication is the process of communicating through sending and
receiving nonverbal signals that can be communicated through non-linguistic
means, such as facial expressions, eye contact, hand gestures, body stances, para-
linguistic features, spatial arrangements, patterns of touch, expressive movement,
cultural symbols and differences, and other nonverbal acts, and an aggregate of
these acts. Non-linguistic means involve auditory modality and visual sensory
modality, a sense through which humans can receive communicative signals, e.g.,
acoustical prosodic features and lip-movements or facial gestures, respectively.
Nonverbal communication has different functions, depending on the context of
the conversation. Nonverbal messages may communicate the exact same meanings
as verbal messages, and also may occur in combination with verbal messages,
serving a meta-communication purpose, that is, nonverbal messages communicate
something about verbal messages. They can moreover convey a wealth of con-
textual information that can help decode spoken language and thus understand its
meanings. Specifically, with facial expressions and gestures, hand gestures, and
body language conveying the context of statements, this information is shared in an
implicit way which can be significant for the overall communication, as this
information constitutes part of the evolving context that influences interaction and
the meaning given to it; that is, the communication differs depending on whether this implicit information is available. The perceived need to complement written
messages with emoticons as often seen in SMSs, chat messages, and emails reflects
a desire to add context and convey meaning. Nonverbal communication can also be
used to serve a variety of functions, such as establishing and maintaining contact
using smiles, eye contact, and leaning forward and backward, for
example, as well as dissolving and breaking up interpersonal relationships through,
for instance, avoiding eye contact and frowning more often. It can also be utilized to
learn about and persuade other people you communicate with when, for example,
using eye gaze to communicate confidence or sincerity or using your facial expres-
sions to influence someone.
In daily life, we communicate more information nonverbally than verbally.
Research suggests that nonverbal communication is of greater importance than
verbal communication when it comes to understanding human behavior—the
nonverbal channels (e.g., face, eye, hand, body, prosody, silence, etc.) seem to be
more powerful than the verbal ones, what people can say in words alone. As
probably most people have experienced in everyday life, facial displays and hand
gestures, in particular, carry the significant part of a face-to-face communication,
by having more often a greater impact than the words being said. It is about how,
rather than what, the sender conveys the message that has a major effect on the
receiver. All in all, a great deal of our communication is of a nonverbal form. Facial
expression followed by vocal intonation (with the actual words being of minor
significance) are primarily relied on by the listener to determine whether they are
liked or disliked in an engaged face-to-face conversation, as Pantic and
Rothkrantz’s (2003) findings indicate. Research shows that hand gestures play an
important role in carrying the contrast between what a person likes and dislikes
instead of relying completely on the words. In general, by conveying gestures, the
sender can capture the attention of the receiver and connect with him/her. Also the
receiver of the message usually tends to infer the intentions of the sender from the nonverbal cues he/she receives, in order to decode or better understand what the sender wants to say should the flow of communication be hindered by, for example, an incongruity between nonverbal cues and the spoken message. In relation to con-
veying emotions, Short et al. (1976) point out that the primacy of nonverbal
affective information—independent of modality—is demonstrated by studies
showing that when this visual information is in conflict with verbal information,
people tend to trust visual information. Moreover, a communication act can more
often take place when a sender expresses facially or gestures a desire to engage in a
face-to-face conversation, assuming that both the sender and the receiver give the same meaning to the nonverbal signal.
Nonverbal communication behavior constitutes the basis of how humans interact
with one another. It has been extensively researched and profusely discussed. There is
a large body of theoretical, empirical and analytical scholarship on the topic. The
following studies constitute the basis for an ever-expanding understanding of how we
all nonverbally communicate: Andersen (2004, 2007), Argyle (1988), Bull (1987),
Burgoon et al. (1996), Floyd and Guerrero (2006), Guerrero et al. (1999), Hanna
(1987), Fridlund et al. (1987), Hargie and Dickson (2004), Siegman and Feldstein
(1987), Gudykunst and Ting-Toomey (1988), Ottenheimer (2007), Segerstrale and Molnar (1997), and Freitas-Magalhães (2006), to name but a few. These works cover
a wide range of nonverbal communication from diverse perspectives, including
psychological, social, cultural, and anthropological perspectives.
Nonverbal communication is perhaps most easily elucidated in terms of the various channels through which related messages pass, including face, hand, eye, body, space, touch, smell, prosody, silence, time, and culture. Considering the purpose of this chapter, only key relevant channels are reviewed. The selection is based on how much more meaningful and consequential some channels are than others with regard to human-like graphical embodiment, that is, how conversational agents attempt to personify the computer interface in the form of an animated person. Accordingly,
face, hand, eye, body, prosody, and paralanguage seem to be of higher relevance—
naturalistic multimodal user interfaces are used by computer systems to engage in
intelligent dialog with humans in an AmI environment. Next, body movements,
facial gestures, eye movements and contact, and paralanguage are addressed.

7.3.1.1 Body Movements

Human nonverbal communication entails a variety of behavioral modes. It is pro-
duced with the mouth (lip movement), the face, the hands, or/and other parts of the
body. Gestures are forms of nonverbal communication in which bodily movements
communicate particular messages, either jointly and in parallel with speech or in its
place. When we speak, we move our entire body (Kendon 1980, 1997; McNeill
1992; Kita 2003), in addition to articulatory gestures (Dohen 2009). Gestures are
distinct from physical nonverbal communication that does not communicate spe-
cific messages, such as expressive displays and proxemics (Kendon 2004).
Generally, they include movement of the hands, face, head, eye, and other parts of
the body. Visible behaviors such as head nods, gaze, gesture, and facial expressions
contribute to the communication of interpersonal and affective information. To date
most research has focused on recognizing and classifying facial expression,
although body movements, in particular arm and hand gestures, during a conver-
sation convey a wealth of contextual information to the listener (Gunes and Piccardi
2005; Kapur et al. 2005). Moreover, there is no such thing as a universal form of
human nonverbal communication and each culture has its own norms and styles of
communication.

7.3.1.2 Hand Gestures

One of the most frequently observed conversational cues is hand gestures—in other
words, most people use hand movements regularly in conversational acts. Gestures
form the basis of how humans interact with one another, enabling them to communicate a variety of feelings and thoughts, and therefore they are natural and invisible to
each other. While some gestures have universal meanings, others are individually
learned and thus idiosyncratic. Researchers in kinesics—the study of nonverbal
communication through face and body movements—identify five major categories
of body movements: emblems, illustrators, affect displays, regulators, and adaptors
(Ekman and Friesen 1969; Knapp and Hall 1997). Emblems are body gestures that
directly translate into words or phrases, which are used consciously to communicate
the same meaning as the words, such as the ‘OK’ sign and the ‘thumbs-up’.
Illustrators are body gestures that enhance or illustrate verbal messages they
accompany, e.g., when referring to something to the right you may gesture toward
the right. Illustrators are often used when pointing to objects or communicate the
shape or size of objects you’re talking about. Therefore, most often you illustrate
with your hands, but you can also illustrate with head and general body movements,
e.g., you turn your head or your entire body toward the right. Affect displays are
gestures of the face (such as smiling or frowning) but also of the hands and general
body (e.g., body tension or relaxation) that communicate emotional meaning. Affect
displays are often unconscious when, for example you smile or frown without
awareness. Sometimes, however, you may frown more than you smile consciously,
trying to convey your disapproval or deceit. Regulators are nonverbal behaviors
that monitor, control, coordinate, or maintain the speaking of another individual.
For example, a head nod may tell the speaker to keep on speaking. In terms of
serving to co-ordinate turn-taking transitions, according to Beattie (1978, 1981) and
Duncan (1972), the termination of gesture acts as a signal that the speaker is ready
to hand over the conversational floor, and is therefore a ‘turn-yielding’ cue.
Adaptors are gestures that are emitted without awareness and that usually satisfy
some personal need, such as scratching to relieve an itch. There are different cat-
egories of adaptors, including: self-adaptors, self-touching, and object-adaptors.
The first category concerns gestures done by the speaker such as rubbing his/her
nose or scratching left side of his/her head; the second category is movements
directed at the communication recipient, such as straightening your tie or folding
your arms in front of you to give or keep others a comfortable distance from you;
and the last category is gestures focused on objects, such as swinging a pen between
two fingers. As can be noticed, some of the above categories are associated with
facial gestures, but they are still presented in this section for the purpose of
coherence.
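The five kinesic categories above lend themselves to a simple lookup once a movement has been recognized. The following Python sketch (the gesture labels and their assignments are illustrative, not a validated coding scheme) shows how a recognizer's output might be mapped to Ekman and Friesen's categories.

from enum import Enum, auto
from typing import Optional

class KinesicCategory(Enum):
    EMBLEM = auto()          # translates directly into words, e.g. "thumbs up" for OK
    ILLUSTRATOR = auto()     # accompanies and illustrates the verbal message
    AFFECT_DISPLAY = auto()  # conveys emotional meaning
    REGULATOR = auto()       # monitors and coordinates the other person's speaking
    ADAPTOR = auto()         # satisfies a personal need, usually without awareness

# Hypothetical mapping from recognized movements to categories.
GESTURE_CATEGORY = {
    "thumbs_up": KinesicCategory.EMBLEM,
    "point_right": KinesicCategory.ILLUSTRATOR,
    "smile": KinesicCategory.AFFECT_DISPLAY,
    "head_nod": KinesicCategory.REGULATOR,
    "scratch_head": KinesicCategory.ADAPTOR,
}

def categorize(gesture: str) -> Optional[KinesicCategory]:
    return GESTURE_CATEGORY.get(gesture)

if __name__ == "__main__":
    for g in ("thumbs_up", "head_nod", "wave"):
        print(g, "->", categorize(g))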
Gestures are used as a nonverbal means to communicate between humans, but
the way they are used involves cultural dimensions. Hence, there is no such thing as
a universal form of gestural communication and each culture has its own rules of
communication. Not only are variations in gestures cross-cultural, but also
intra-cultural and inter-cultural. This is important for HCI community to understand
and account for when designing any class of naturalistic user interfaces. Failure to
consider cultural variations in interface design is likely to have implications for the
performance of AmI interactive systems when instantiated in their operating
environment. This concerns both context-aware systems and conversational sys-
tems. Indeed, arm and hand gestures can convey a wealth of contextual information
as implicit input to context-aware systems, and in order to have a sound interpre-
tation of this information for further means of processing—context inference, it is
important to account for cultural variations, differences in culturally based com-
munication styles. Similarly, gestures can be used by conversational agents as
contextual cues to decode and better understand the meaning of spoken language of
the human user, and so accounting for cultural differences is crucial to deliver a
relevant communicative function and behavior. Cultural variations are great when it
comes to nonverbal communication behavior, as different cultures may assign
different meanings to different gestures (e.g., emblems, regulators, adaptors). In
multicultural societies it is common that the same gesture may have different
meanings (e.g., the ‘thumbs up’ sign as an emblem means ‘okay’ in almost every
part of the world except in Australia where it is considered ‘impolite’), as different
cultures use different signs or symbols to mean the same thing. Therefore, under-
standing and accepting cultural differences is critical for social acceptance of AmI
technology. Specifically, to design successful, widely adopted AmI systems, it is critical to account for cultural variations, because miscalculating the relevance of cross-cultural communications can be counterproductive, and disregarding culturally sensitive communication styles can be considered improper and discourteous in
the context of conversational systems within AmI environments. This has also
implications for the functioning of context-aware systems in the sense that they may
become unreliable when they fail to react the way they are supposed to. Otherwise,
as research suggests, a universal gesturing language must be created and taught in
order for context-aware, affective and conversational computing to work. Indeed,
some joint research endeavors (see, e.g., Vilhjálmsson 2009) are being undertaken
to define and build a universal nonverbal communication framework as a part of the
ongoing research in the area of conversational systems—modeling of human
multimodal nonverbal communication behavior. However, this solution may not be
as robust as interactive systems that can adapt and respond dynamically to each
user, context, and interaction; hence, the issue of unviability or unworkability of
new technologies becomes likely. There is no such thing as a one-size-fits-all
solution for the diversity of users and interactions. Indeed, cross-cultural HCI has
emerged to respond to a need brought up by the inevitability of embedding
‘culturability’ in global ICT design. Even in human-to-human communication,
people are becoming increasingly aware of cultural variations and thus culturally
sensitive when using gestures in foreign countries. A discrepancy in the shared
knowledge of gestures may lead to communication difficulties and misunder-
standings as probably most people have experienced in everyday life. Different
cultures use different symbols to mean the same thing or use the same symbol to
mean different things. Among the main hurdles in implementing emotional and
social models of context as well as models of nonverbal communication behavior is
the meaningful interpretation of data collected implicitly from the users’ nonverbal
communication behaviors. More research is needed to investigate the implications
of sociocultural contexts in interpreting nonverbal communication behavior.
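A deliberately simplified Python sketch of culture-dependent gesture interpretation follows (the gesture-by-culture table is hypothetical and far too small for real use); it merely illustrates why a single, culture-blind mapping is insufficient and why an unknown pairing should be treated as ambiguous rather than guessed.

UNKNOWN = "ambiguous: ask for clarification"

# Hypothetical (gesture, culture) table; real systems would need empirically
# grounded and far richer models.
GESTURE_MEANINGS = {
    ("thumbs_up", "US"): "okay / approval",
    ("thumbs_up", "AU"): "impolite",        # the example mentioned in the text
    ("head_nod", "US"): "agreement",
}

def interpret(gesture: str, culture: str) -> str:
    """Prefer a culture-specific reading; otherwise flag the gesture as ambiguous."""
    return GESTURE_MEANINGS.get((gesture, culture), UNKNOWN)

if __name__ == "__main__":
    print(interpret("thumbs_up", "US"))
    print(interpret("thumbs_up", "AU"))
    print(interpret("thumbs_up", "JP"))   # not in the table, so left ambiguous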
Gesture recognition is regarded as one of the most important elements in designing effective emotional context-aware systems and emotion-aware conversational agents. Whether it be of a gestural, facial, or corporal nature, nonverbal communication behavior serves as a significant channel for conveying emotions between con-
versational participants as well as emotional context information, which normally
influence the patterns of conversational acts. Detection of emotion will rely upon
assessment of multimodal input, including gestural, facial and body movement
(Gunes and Piccardi 2005; Kapur et al. 2005). Culturally nuanced variations in
gestures presume the use of different modes/modalities rather than relying solely
upon one mode to shun ineffective or erroneous interpretation of affective
information.
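The reliance on multimodal input can be illustrated with a minimal late-fusion sketch in Python (the weights and scores below are made up): each modality contributes emotion scores, and a weighted average combines them so that no single channel is trusted alone.

from collections import defaultdict

def fuse(modality_scores: dict, weights: dict) -> dict:
    """Weighted late fusion of per-modality emotion scores (all values illustrative)."""
    fused = defaultdict(float)
    total = sum(weights.get(m, 0.0) for m in modality_scores) or 1.0
    for modality, scores in modality_scores.items():
        share = weights.get(modality, 0.0) / total
        for emotion, p in scores.items():
            fused[emotion] += share * p
    return dict(fused)

if __name__ == "__main__":
    scores = {
        "face":    {"joy": 0.7, "anger": 0.1},
        "gesture": {"joy": 0.4, "anger": 0.3},
        "voice":   {"joy": 0.6, "anger": 0.2},
    }
    weights = {"face": 0.5, "gesture": 0.2, "voice": 0.3}
    print(fuse(scores, weights))   # joy clearly dominates across channels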

7.3.1.3 Facial Movements

Before delving into the discussion on facial movements as part of nonverbal
communication behavior, it is first important to underline that facial behavior has
multi-functionality. Facial movements can be used by humans to allow commu-
nication—facial gestures—or to convey emotions—facial expressions. For exam-
ple, a smile or a frown ‘can have different meanings: it can be a speech-regulation
signal (e.g., a back-channel signal), a speech-related signal (illustrator), a means for
signaling relationship (e.g., when a couple is discussing a controversial topic, a
smile can indicate that although they disagree on the topic there is no ‘danger’ for
the relationship),…and an indicator for an emotion (affect display)’ (Kaiser and
Wehrle 2001, p. 287). Facial expressions are explicit emotional displays that can
occur during or outside a conversation. On the other hand, facial gestures tend to
serve as means, during a conversation to regulate talking, that is, to monitor,
control, coordinate, or maintain the speaking, which normally includes speech and
other hand and corporal gestures. Facial behavior has non-emotional, communi-
cative functions (Ekman 1979; Ekman and Friesen 1969; Fridlund 1994; Russell
and Fernández-Dols 1997). Furthermore, facial displays involve explicit verbal
displays (e.g., visemes) or have an explicit verbal message (e.g., an observation
about the shape or size of artifacts may be accompanied by widening of the eyes) (Zoric et al. 2009). A viseme describes the particular facial and oral movements that
occur alongside the voicing of phonemes. The term viseme was introduced based
on the interpretation of the phoneme as a basic unit of speech in the
acoustic/auditory domain (Fisher 1968).
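A toy phoneme-to-viseme lookup of the kind a lip-synchronization module might use is sketched below in Python; the grouping is illustrative only, since real systems rely on standardized and much larger viseme inventories.

# Hypothetical phoneme-to-viseme table; /p/, /b/, /m/ share one mouth shape
# because they are visually indistinguishable on the lips.
PHONEME_TO_VISEME = {
    "p": "lips_closed", "b": "lips_closed", "m": "lips_closed",
    "f": "lip_on_teeth", "v": "lip_on_teeth",
    "a": "mouth_open",  "o": "lips_rounded", "u": "lips_rounded",
    "s": "teeth_narrow", "z": "teeth_narrow",
}

def visemes_for(phonemes):
    """Map a phoneme sequence to the mouth shapes an animated face would show."""
    return [PHONEME_TO_VISEME.get(p, "neutral") for p in phonemes]

if __name__ == "__main__":
    print(visemes_for(["m", "a", "p"]))   # the word "map", roughly /m a p/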

7.3.1.4 Facial Expressions

As an explicit affect display, facial expressions are highly informative about the
affective states of people, as they are associated with expressing emotional
reactions. The face is so visible that conversational participants can interpret a great
deal from the faces of each other. Facial expressions can be important for both the
speaker and the listener in the sense of allowing the listener to infer the speaker’s
emotional stance to their utterances, and the speaker to determine the listener’s
reactions to what is being uttered. Particularly, the listener/recipient relies heavily
on the facial expressions of the speaker/sender as a better indicator of what he/she
intends to convey as feelings, and therefore monitors facial expressions constantly as
they change during interaction. Facial cues can constitute communicative acts,
comparable to ‘speech acts’ directed at one or more interaction partner
(Bänninger-Huber 1992). Pantic and Rothkrantz’s (2003) findings indicate that
when engaged in conversation the listener determines whether they are liked or
disliked by relying primarily upon facial expression followed by vocal intonation,
with the spoken words or utterances being of minor significance. In line with this,
when visual information conveyed by facial expressions is in conflict with verbal
information, people tend to trust visual information (Short et al. 1976).
Facial expressions communicate various emotional displays irrespective of
cultural variations. Ekman and Friesen (1969) and Ekman (1982) identify six
universal facial displays: happiness, anger, disgust, sadness, fear, and surprise, and
show that they are expressed and interpreted in a similar way by people regardless of
their culture. In terms of conversational agents, the six universal facial expressions occur in Cassell (1989), an embodied conversational system that integrates both facial and gestural expressions into automatic spoken dialog systems. However, while classic psychological theory holds that those six basic emotions are universally displayed and recognized, more recent work argues that the expression of emotions is culturally dependent and that emotions cannot be so easily categorized (Pantic and
Rothkrantz 2003). Similar to gestures, cultural variations are also applicable to
facial expressions as different cultures may assign different meanings to different
facial expressions, e.g., a smile as a facial display can be considered a friendly
gesture in one culture while it can signal embarrassment or even regarded as
insulting in another culture. Hence, to achieve a wide adoption and ease social
acceptance of AmI, it is critical to account for cultural variations in facial expres-
sions when designing AmI systems (context-aware systems, affective systems, and
conversational systems). Failure to recognize and account for differences in cul-
turally based, facially expressed emotions may have implications for the perfor-
mance of AmI systems. The metrics defined in the lab to evaluate how well technologies perform may be inessential to the real-world instantiations of AmI systems in different operating environments. In other words, what is technically feasible and risk-free within the lab may have implications in the real-world
environment. For a detailed discussion on facial expressions, e.g., unsettled issues
concerning their universality, see next chapter.

7.3.1.5 Facial Gestures

As mentioned above, facial gestures serve as means to regulate talking, that is, to
monitor, control, coordinate, or maintain the speaking. Thus, they are of pertinence
and applicability to conversational systems. As a form of nonverbal communica-
tion, a facial gesture is ‘made with the face or head used continuously in combi-
nation with or instead of verbal communication’ (Zoric et al. 2009). Considerable
research (e.g., Chovil 1991; Fridlund et al. 1987; Graf et al. 2002) has been done on
facial gestures, e.g., head movements, eyebrow movements, eye gaze directions,
eye blinks, frowning and so on. Knapp and Hall (1997, 2007) identify six general
ways in which nonverbal communication (involving facial gestures, prosodic pat-
terns or hand gestures) blends with verbal communication, illustrating the wide
variety of meta-communication functions that nonverbal messages may serve to
accentuate, complement, contradict, regulate, repeat, or substitute for other mes-
sages. To accentuate is when you use nonverbal movement like raising your voice
tonality to underscore some parts of the verbal message, e.g., a particular phrase; to
complement is when you add nuances of meaning not communicated by verbal
message, e.g., a head nod to mark disapproval; to contradict is when your verbal
message is not congruent with your nonverbal gestures, e.g., crossing your fingers
to indicate that you’re lying; to regulate or control the flow of verbal messages, e.g.,
making hand gestures to indicate that you want to speak or put up your hand to
indicate that you’ve not finished and are not ready to relinquish the floor to the next
speaker; to repeat or restate the verbal message nonverbally, e.g., you motion with
your head or hand to repeat your verbal message; and finally, to substitute for or
take the place of verbal messages, e.g., you can nod your head to indicate ‘yes’ or
shake your head to indicate ‘no’. Likewise, some of Knapp and Hall’s (1997, 2007)
general ways involve hand gestures, but they are presented in this section for the
purpose of coherence.
In everyday communication humans employ facial gestures (e.g., head move-
ment, eyebrow movement, blinking, eye gaze, frowning, smiling, etc.) consciously
or unconsciously to regulate the flow of speech, punctuate speech pauses, or accentuate
words/segments (Ekman and Friesen 1969). In this context, Pelachaud et al. (1996)
distinguish several roles of facial gestures:
• Conversational signals—facial gestures in this category include eyebrow
actions, rapid head movements, gaze directions, and eye blinks, and these occur
on accented items clarifying and supporting what is being said.
• Punctuators—facial gestures in this category involve specific head motions,
blinks, or eyebrow actions, and these gestures support pauses by grouping or
separating sequences of words.
• Manipulators—involve facial gestures that correspond to the biological needs of
a face and have nothing to do with the linguistic utterances, e.g., blinking to wet
the eyes or random head nods.
• Regulators—correspond to facial gestures that control the flow of conversation
(e.g., turn-taking, turn-yielding, and feedback-request), and these gestures
include eye gaze, eye-contact, and eyebrow actions. Speakers look at listeners
and raise their eyebrows when they want feedback and listeners raise eyebrows
in response (Chovil 1991). Emphasis generally involves raising or lowering of
the eyebrows (Argyle et al. 1973).
Which of these facial gestures can be implemented or applied to ECA systems is
contingent upon whether the ECA acts as a presenter or is involved in a face-to-face
conversation—a believable virtual human. For example, the work of Zoric et al.
(2009) deals with ECAs that act only as presenters so only the first three roles are
applicable for ECAs. Accordingly, the features included in the current version of
their system are: head and eyebrow movements and blinking during speech pauses;
eye blinking as manipulators; and amplitude of facial gestures dependent on speech
intensity. This system is described and illustrated at the end of this chapter.
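A rough, rule-based Python sketch inspired by this idea (not the actual Zoric et al. (2009) system; thresholds and gesture names are invented) is given below: it generates blink events during speech pauses and eyebrow or head movements on high-energy frames, with amplitude scaled by speech intensity.

def gestures_from_intensity(energy, pause_threshold=0.05, accent_threshold=0.6):
    """Turn a frame-wise speech intensity track into facial-gesture events."""
    events = []
    for frame, value in enumerate(energy):
        if value < pause_threshold:
            events.append((frame, "blink", 1.0))               # punctuator during a pause
        elif value > accent_threshold:
            amplitude = min(1.0, value)                        # louder speech, larger gesture
            events.append((frame, "eyebrow_raise", amplitude)) # conversational signal
            events.append((frame, "head_nod", 0.5 * amplitude))
    return events

if __name__ == "__main__":
    energy_track = [0.02, 0.30, 0.80, 0.70, 0.04, 0.50, 0.90]  # made-up per-frame values
    for frame, gesture, amplitude in gestures_from_intensity(energy_track):
        print(f"frame {frame}: {gesture} (amplitude {amplitude:.2f})")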

7.3.1.6 Eye Movement

Like other nonverbal communication behaviors, eye movement has a
multi-functional role, serving to convey emotions and to provide conversational
cues (part of which has been covered above), as well as ‘to express thought’
(Scherer 1992, 1994) (see Chap. 9 for more detail on eye gaze as an indicator of
thought/cognitive processes). Research shows that eye gaze is associated with
conveying interpersonal attitude, affect, attention, joint attention, turn-taking,
seeking feedback, reference, joint reference, and so on.
Eye gaze indicates spontaneous and emotional responses to your communication
instead of relying solely on conscious and verbal responses. It is highly informative
about the interpersonal attitude or emotional stance of conversational participants;
it is visible enough for them to interpret a great deal of interpersonal information.


Oculesis studies—research on the messages communicated by the eyes—show that
these messages vary depending on the duration, direction, frequency, and quality of
the eye behavior. Commonly, eye gaze patterns show specific distributions with few
gazes lasting more than a second, a deviation from which is subsequently associated
with an unusual form of interaction. People tend to evaluate others by their patterns
of gaze: people who look at their interlocutor a lot of the time are ‘friendly’ and
‘sincere’, whereas those who look at their interlocutor only a small part of the time
are judged as ‘defensive’ (Kleck and Nuessle 1968). People tend to look more at
conversants whom they like (Exline and Winters 1965). Eye movements may
moreover signal the nature of a relationship, whether positive or negative through,
respectively, an attentive glance or eye avoidance (De Vito 2002). However, the
duration rules vary from a culture to another in terms of the proper duration for eye
contact. Findings in Argyle and Ingham (1972) and Argyle (1988) indicate that in
USA the average length of mutual gaze (when both participants are simultaneously
looking at each other) is 1.18 s. Accordingly, a deviation in excess from this
duration may mean that the person is showing unusually high interest, whereas a
deviation in terms of eye contact falling short of this duration may lead to thinking
that the person is uninterested. Furthermore, by directing your gaze downward when breaking or avoiding eye contact, you communicate a lack of interest in the other person (De Vito 2002). Likewise, the direction of the eye is often culturally dependent; accordingly, by breaking directional rules you might convey different meanings, such as high or low interest, self-conscious aversion, or uneasiness, as research suggests. Also, the frequency of eye contact may signal either
interest or boredom. As far as the quality of eye behavior—how wide or how
narrow the eyes can be during interaction—is concerned, it communicates such
emotions as surprise, fear, and disgust (see Ekman 1994) as well as interest level
(De Vito 2002).
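As a toy illustration of how such duration findings might be operationalized, the following Python heuristic (thresholds are illustrative; real gaze analysis requires far more context) flags mutual-gaze episodes well above the cited 1.18 s average as possible high interest and those well below it as possible disinterest.

AVERAGE_MUTUAL_GAZE_S = 1.18   # average reported for the USA in the studies cited above

def interpret_gaze(duration_s: float, tolerance: float = 0.5) -> str:
    if duration_s > AVERAGE_MUTUAL_GAZE_S * (1 + tolerance):
        return "unusually high interest"
    if duration_s < AVERAGE_MUTUAL_GAZE_S * (1 - tolerance):
        return "possible disinterest"
    return "typical engagement"

if __name__ == "__main__":
    for duration in (0.3, 1.2, 2.5):
        print(f"{duration:.1f} s ->", interpret_gaze(duration))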
In addition, eye movement serves a variety of conversational functions. There is
a great deal of work elucidating the function of eye gaze and gesture in mediating
turn-taking (signaling others to speak), seeking feedback, compensating for
increased physical distance, and reference. The speaker informs the listener that the
channel of communication is open so he/she could now speak, e.g., when a speaker
asks a question or finishes a thought and then looks to the listener for a response.
Speakers break or look for an eye-contact with a listener at turn beginning (Argyle
and Cook 1976). Speakers and listeners show different patterns of gaze, with lis-
teners spending more time looking at speakers than vice versa (Ibid). In other
words, the average speaker maintains a high level of eye contact while listening and
a lower level while speaking. In the case of eye gaze mediating transitions, Kendon
(1967) found that speakers tend to look more at listeners as speakers draw to the
end of their turn to await confirmatory indication that the listener is ready to carry
on. Speakers select next speaker with gaze near the end of their own turn (Kendon
1990). Another function of eye gaze is to seek feedback when talking with someone
by looking at him/her intently as if to ask his/her opinion. Moreover, you psy-
chologically lessen the physical distance between yourself and another by making
eye contact; when you catch someone’s eye, you become psychologically close
though physically far apart (De Vito 2002). Additionally, eye gaze plays a role in
reference to objects or events, and a critical aspect of conversational content
coordination is the ability to achieve joint reference. People excel at determining
where others are looking (Watt 1995). Gaze serves to coordinate the joint attention
of conversational participants as to an object or event by referring to it by pointing
gestures. Joint attention to an object or event also allows participants greater
flexibility in how they verbally refer to it, whether they resort to pointing gestures
or not (Clark and Marshall 1981). All in all, eye gaze is a powerful form of
nonverbal communication and a key aspect of social communication and interac-
tion. Therefore, eye movement is of high relevance to HCI as to designing con-
versational systems as well as context-aware and affective systems for what it
entails in terms of conveying emotional cues, indicating cognitive processes, and
having conversational functions.

7.3.1.7 Paralanguage and Prosody

The study of paralanguage is known as paralinguistics. Paralanguage refers to
nonverbal information coded in different forms of communication, such as speech
and written language, which modifies/nuances meaning or conveys emotions. It is not
bound to any sensory modality, e.g., speech or vocal language can be heard, seen,
and even felt. It may be expressed consciously or unconsciously. It involves a set of
non-phonemic properties of speech, including vocal pitch (highness or lowness),
intonational contours, speaking tempo, volume (loudness), and rhythm, as well as
speaking styles, hesitations, sighs, gasps, and so on. A variation in any of these paralinguistic features communicates something, and the meanings will differ for the
receiver depending on how they can be combined, although the words may be the
same. Like other nonverbal communication behaviors, paralanguage has to do with
how a speaker says what he/she says rather than what he/she says. Therefore, the
paralinguistic features of speech play a key role in human speech communication.
All utterances and speech signals have paralinguistic properties, as speech requires
the presence of a voice that can be modulated as the communication evolves.
‘Paralinguistic phenomena occur alongside spoken language, interact with it, and
produce together with it a total system of communication… The study of para-
linguistic behavior is part of the study of conversation: the conversational use of
spoken language cannot be properly understood unless paralinguistic elements are
taken into account.’ (Abercrombie 1968).
Paralanguage can be used in assessing the effectiveness of communicating
messages and emotional stances and reactions. According to MacLachlan (1979), in
one-way communication (when one person is doing all or most of the speaking and
the other person is doing all or most of the listening), those who talk fast (about
50 % faster than normal) are more persuasive. In a way, a recipient agrees more
with a fast speaker than with a slow speaker and finds the fast speaker more
intelligent and objective. Paralanguage helps us interpret people and their
believability and emotions. Research suggests that paralinguistic features convey
emotions that can be accurately judged regardless of the content of the message, which
can involve both speech and other nonverbal communication behaviors (e.g., facial
expressions, gestures, body stances). Therefore, one as a communicator should be
aware of the influence of paralinguistic features on the interpretation of one’s
message by the receiver. A listener can accurately judge the emotional state of a
speaker from intonation and vocalizations. Paralinguistic cues or signals are often
used as a basis for evaluating communicators’ emotional states. Research suggests
that common language and cultural norms—shared knowledge as a complete world
and language model—are necessary for paralanguage cues to communicate emo-
tions between people. Paralanguage cues are not so accurate when used to com-
municate emotions to those who speak a different language (Albas et al. 1976).
Paralanguage is a significant research topic within ECA community (see below),
especially in relation to conversational systems with human-like graphical
embodiment. Besides, when building believable ECAs or AmI systems capable of
engaging in intelligent dialog, the rules of human verbal and nonverbal commu-
nication must be taken into account. In their project, Zoric et al. (2009) connect
speech related facial gestures with prosody to animate ECAs using only natural
speech as input. The authors note that knowledge needed for correlating facial
gestures and prosodic features extracted from the speech signal is based on the
results of paralinguistic and psychological research. Paralanguage has been inves-
tigated within the area of ECAs in particular and that of HCI in general.
Prosody is about paralinguistic properties of speech. In linguistics, prosody
refers to the rhythm, pitch, stress, and intonation of speech to convey information
about the structure and meaning of an utterance. Zoric et al. (2009, p. 13) define
prosody as: ‘characteristics of speech which cannot be extracted from the charac-
teristics of phoneme segments, where pauses in speech are also included. Its
acoustical correlates are pitch, intensity (amplitude), syllable length, spectral slope
and the formant frequencies of speech sounds.’ ‘Frequency code’ (Ohala 1984) is a
most fundamental and widespread phenomenon of prosody; it serves the purpose of
distinguishing questions from statements. Prosody may reflect various features of
the utterance: the form pertaining to statement, question, or command; emphasis
and contrast; or other aspects of language that may not be grammatically or lexi-
cally encoded in the spoken utterances. Prosody may facilitate or impede lexical
and syntactic processing, organize higher levels of discourse, and express feelings
and attitudes, as well as contribute to topic identification processes and turn taking
mechanisms (Karpinski 2009). Prosody is difficult to describe in a consistent way and is a source of endless controversies due to the abundance of its functions (Fox
2000). It is hence crucial for a conversational agent as a believable human repre-
sentative to consider and implement prosodic elements for an effective interpreta-
tion of the meaning of verbal and emotional messages as well as a clear delivery of
communicative behavior.
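Two of the acoustic correlates named above, intensity and pitch, can be estimated from raw audio with very little code. The following Python sketch (a naive autocorrelation pitch estimator with illustrative parameter values, not a production prosody analyzer) computes RMS energy and a rough F0 for a single frame of mono audio.

import numpy as np

def rms_intensity(frame: np.ndarray) -> float:
    """Root-mean-square energy of one audio frame."""
    return float(np.sqrt(np.mean(frame ** 2)))

def autocorr_pitch(frame: np.ndarray, sr: int, fmin: float = 75.0, fmax: float = 400.0) -> float:
    """Rough F0 estimate in Hz from the strongest autocorrelation peak in a plausible lag range."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min, lag_max = int(sr / fmax), int(sr / fmin)
    lag = lag_min + int(np.argmax(ac[lag_min:lag_max]))
    return sr / lag

if __name__ == "__main__":
    sr = 16000
    t = np.arange(0, 0.04, 1 / sr)                 # one 40 ms frame
    frame = 0.5 * np.sin(2 * np.pi * 180 * t)      # synthetic 180 Hz "voice"
    print("intensity:", round(rms_intensity(frame), 3))
    print("pitch_hz:", round(autocorr_pitch(frame, sr), 1))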
Prosody is linked to other nonlinguistic communication behaviors, and some of
its features that are concerned with modifying or nuancing meaning can be
expressed through punctuation in written communication. There is a correlation
between prosody and facial gestures (and expressions). The information extracted
from speech prosody is essential for generating facial gestures by analyzing natural
speech in real-time (Zoric et al. 2009). Prosody is crucial in spoken communication
as illustrated by an example from Truss (2003). In this example, punctuation rep-
resents the written equivalent of prosody. Although they have completely different
meanings and are pronounced differently, the two sentences below correspond to
exactly the same segmental content:
A woman, without her man, is nothing.
A woman: without her, man is nothing.
Significant differences in meaning are easily communicated depending on where
the speaker places the stress in a given sentence. Each sentence with a stress on a
given word (or a combination of two or more) may communicate something dif-
ferent, or each asks a different question if the sentence is in a form of a question,
even though the words are exactly the same. That is to say, all that distinguishes the
sentences is stress, the way they are uttered.

7.3.2 Verbal Communication: A Theoretical Excursion in Linguistics and Its Subfields

Verbal communication entails the process whereby conversational content is
transferred orally from a sender to a receiver via linguistic means. It is, in other
words, the process of sending and receiving verbal signals through speech. In
human communication, speech is referred to as verbal behavior provided through a
combination of spoken texts and sounds. Spoken text, a system of symbols or
lexemes, is governed and manipulated by semiotic rules of spoken discourse.
Semiotics, the field that is closely related to linguistics, is concerned with the study of signs and symbols, the ways in which signs and symbols and their meanings are created, decoded, and transformed in language as well as in other systems of communication. Semiotics encompasses syntactics (the structure of words and sentences), semantics
(meaning of words and sentences), and pragmatics (the role of context in the
interpretation of the meaning of words and sentences). On the other hand, sound is
concerned with phonology (sound systems and abstract sound units) and phonetics
(the acoustic properties, physiological production, and auditory perception of
speech sounds). Like written texts, which involve such nonverbal elements as
handwriting style, font, the use of emoticons, and arrangement of words, spoken
texts have nonverbal elements, such as rhythm, intonation, pitch, loudness, inten-
sity, voice quality, and speaking style. These are used, to iterate, to modify or
nuance meaning or convey emotions. Whether texts are in written or spoken form, emotions are conveyed through combinations of symbols used to express emotional content. As
the transmission system of language is characterized by an intricate set of dynamic
components, speech is deemed a highly complex form of communication. To form
meaningful words, speech involves rapid, coordinated movements of the lips,
tongue, mouth palate, vocal cords and breathing to articulate sounds. It is the most
precise, effective and flexible means of communicating complex intentions,
meanings, and emotions, as well as sharing experience and knowledge. This is true
of all human social groups. Research suggests that humans form sentences using
very complex and intricate patterns, but they are oblivious to the rules that regulate
and govern their own speech, as these rules seem to be obscure to their con-
sciousness. The topic of spoken communication is sweeping and multifaceted, but
the focus in this chapter is on the aspects of verbal communication behaviors that
are of relevance and applicability to HCI, with a particular emphasis on conver-
sational systems associated with AmI.
Linguistics is the scientific study of natural language, the general and universal
properties of language. The features that language possesses distinguish it from any possible
artificial language. The scientific study of language covers the structure, sounds,
meaning, and other dimensions of language as a system. Linguistics encompasses a
range of single and interdisciplinary subfields: single subfields include morphology,
syntax, phonology, phonetics, lexicon, semantics, and pragmatics, and
Interdisciplinary subfields include sociolinguistics, psycholinguistics, cognitive
linguistics, and neurolinguistics.

7.3.2.1 Communicative Language Competence

Communicative language competence can be described as the language-related
knowledge, abilities, and know-how that language users bring to bear to realize
their communicative acts and thus their communicative intents. Communicative
language competence can be considered as consisting of several components: lin-
guistic, sociolinguistic, and pragmatic. There are different approaches to com-
municative language competence. They tend to differ with regard to what they
include as components, but linguistic competence seems to be common to all of
them. There are, though, different competing frameworks for linguistic compe-
tences in terms of what they involve as linguistic components, e.g., morphology,
syntax, semantics, lexicon, phonology, phonetics, orthography (the study of correct
spelling according to established usage or a method of representing the sounds of
language by written signs and symbols), orthoepy (the study of the relationship
between the pronunciation of words and their orthography), and so on. The same
goes for these components, e.g., there are a number of competing theoretical models
for syntax and semantics. Regardless, for individuals to be able to effectively
communicate with one another, they need to, in addition to linguistic knowledge, be
cognizant about sociocultural and pragmatic dimensions of language. Besides,
language expresses much more than what is signified by its arbitrary signs as
signifiers that have basically no inherent relationship with what they signify.

7.3.2.2 Linguistic Competence

Linguistic competence entails the system of linguistic knowledge possessed by
native speakers, all areas and types of competences internalized, developed, and
transformed by language users, e.g., mental representations, capacities, and
know-hows. According to Chomsky, linguistic competence is the ‘mental reality’
which is responsible for all those aspects of language use that can be regarded as linguistic, and entails the ideal speaker-hearer's knowledge of his/her language
(Kroy 1974). As an integral part of an individual’s communicative language
competence, linguistic competence is associated with the extent and quality of
knowledge (e.g., accurate grammatical use of sentences, lucidity of meaning, pre-
cision of vocabulary, relevance of lexical expression, phonetic distinctions, trans-
lation of the abstract representations of speech units to articulatory gestures and
acoustic signals); readiness as the expressiveness/articulateness of knowledge; the
way knowledge is stored, structured, activated, recalled, retrieved, and manipulated
at a cognitive level. Generally, the cognitive storage, organization, and accessibility
of linguistic knowledge vary from one person (language user) to another and
depend, among other things, on the intellectual, genetic, social, and cultural factors
involved in language learning and usage. As mentioned above, there are a number of competing frameworks for linguistic competences in terms of what they involve as
components comprising knowledge, abilities, and know-how, such as morpholog-
ical, syntactic, semantic, lexical, phonological, and phonetic components. Indeed,
some views argue that linguistic competence is about grammar. One’s competence
is defined by the grammar (Kroy 1974; Evans and Green 2006) or a set of language
rules. Chomsky’s notion of linguistic competence is purely syntactic as well.

7.3.2.3 Grammar, Generative Grammar, and Grammatical Competences

As a subfield of linguistics, grammar refers to the set of structural rules and prin-
ciples governing the composition of words, phrases, and sentences or the assembly
of various elements into meaningful sentences, in any given natural language. There
are several competing theories and models for the organization of words into
sentences. The same goes for ‘generative grammar’ (Chomsky 1965). Based on the
underlying premise that all humans have an internal capacity to acquire language,
Chomsky’s perspective of language learning implies that the ability to learn,
understand, and analyze linguistic information is innate (Rowe and Levine 2006).
Chomsky regards grammatical competence to be innate because one will still be
able to apply it in an infinite number of unheard examples without having to be
trained to develop it (Phillips and Tan 2010). It is argued that grammatical com-
petence defines an innate knowledge of rules because grammar is represented
mentally and manifested based on the individuals’ own understanding of acceptable
usage in a given language idiom. It is worth pointing out that the subtler sorts of
grammatical differences in languages and the fact that the grammar of any language is
highly complex and defies exhaustive treatment may pose challenges for building a
universal grammar framework that can be used in conversational systems.
The term 'generative grammar' (Chomsky 1965) is used to describe a finite set of rules that can be applied to generate, at least hypothetically, an infinite number (or all kinds) of sentences, precisely those that are grammatical in a given language and no other. This description is provided by Chomsky (1957), who coined and popularized the term, and it is the one most widely used in the literature on linguistics. In Chomsky's (1965) own words: '…by a generative grammar I mean simply a system of rules that in some explicit and well-defined way [generates or] assigns structural descriptions to sentences.' The idea of the 'creative' aspect of language, and of a grammar that must exist to describe the process by which a language can 'make infinite use of finite means', is advocated by Wilhelm von Humboldt, one of the key figures quoted by Chomsky as a spark for his ideas (Chomsky 1965). René Descartes, whose concern with the creative powers of the mind led him to regard natural language as an instrument of thought, is also a major influence on Chomsky (Phillips and Tan 2010).
Literature shows that the term ‘generative grammar’ is used in multiple ways. It
refers, in theoretical linguistics, to a particular (Chomskian) approach to the study
of syntax. A generative grammar of a language attempts to provide a set of rules that will correctly predict which combinations of words will form grammatical sentences (and, in some approaches to generative grammar, also predict the morphology of a sentence). Linguists working in the generativist tra-
dition claim that competence is the only level of language that is studied, as this
level gives insights into the universal grammar, a theory credited to Noam
Chomsky which suggests that there are properties that all possible natural languages
have and that some rules of grammar are hard-wired into the brain and manifest
without being taught. This is, however, still the subject of a heated debate over whether there is such a thing and whether the properties of a generative grammar arise from an 'innate' universal grammar. Generative grammar also relates
to psycholinguistics in that it focuses on the biological basis for the acquisition and
use of human language. Indeed, Chomsky’s emphasis on linguistic competence
greatly spurred the development of psycholinguistics as well as neuro-linguistics. It
moreover distinguishes between linguistic performance, the production and com-
prehension of speech (see below for detail), and linguistic competence, the
knowledge of language, which is represented by mental grammar—the form of
language representation in the mind. Furthermore, given the fact that generative
grammar characterizes sentences as either grammatically well-formed or not and the
algorithmic nature of the functioning of its rules to predict grammaticality as a
discrete result, it is of high relevance to computational linguistics and thus conversational systems. But using theoretical models of generative grammar in modeling natural language may raise the issue of standardization, as there are a number of competing versions of, or approaches to, generative grammar currently practiced within linguistics, including the minimalist program, lexical functional grammar, categorial grammar, relational grammar, tree-adjoining grammar, head-driven phrase structure grammar, and so forth. They all share the common
goal of developing a set of principles that account for well-formed natural language expressions.
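To make this more concrete, the following minimal sketch, written in Python under the assumption that the NLTK toolkit is available (the toy grammar and word list are invented for illustration and correspond to none of the frameworks above), shows how a finite set of generative rules can classify a token sequence as well-formed or not, the kind of discrete grammaticality judgment that conversational systems can exploit.

import nltk

# A hand-written toy context-free grammar: a tiny stand-in for the far richer
# rule systems developed within the generative tradition.
toy_grammar = nltk.CFG.fromstring("""
S  -> NP VP
NP -> Det N
VP -> V NP | V
Det -> 'the' | 'a'
N  -> 'user' | 'system' | 'utterance'
V  -> 'parses' | 'generates' | 'speaks'
""")
parser = nltk.ChartParser(toy_grammar)

def is_grammatical(sentence):
    """Return True if the grammar assigns at least one structural description."""
    tokens = sentence.lower().split()
    try:
        return any(True for _ in parser.parse(tokens))
    except ValueError:  # a token not covered by the toy lexicon
        return False

print(is_grammatical("the system parses the utterance"))   # True
print(is_grammatical("utterance the parses system a"))     # False
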
However, the knowledge of, and the ability to use, the grammatical rules of a
language to understand and convey meaning by producing and recognizing
well-formed sentences in accordance with these grammatical principles is what
defines grammatical competence. According to Chomsky (1965), competence is the
‘ideal’ language system that enables speakers to understand and generate an infinite
number (all kinds) of sentences in their language and to distinguish grammatical
from ungrammatical sentences. Grammatical competence involves two distinctive
components: morphology (word forms) and syntax (sentence structure).
Morphology is concerned with the internal structure of words and their formation,
identification, modification, and analysis into morphemes (roots, infixes, prefixes,
suffixes, inflexional affixes, etc.). Morphological typology represents a method for
categorizing languages that clusters them according to their common morphological
structures, i.e., on the basis of how morphemes are used in a language or how
languages form words by combining morphemes, e.g., a fusional language, a type of synthetic language which tends to overlay many morphemes to denote syntactic or semantic change, using bound morphemes such as affixes (prefixes, suffixes, infixes), including word-forming affixes. Accordingly, morphological competence is the
ability to form, identify, modify, and analyze words. On the other hand, syntax is
concerned with the patterns which dictate how words are combined to form sen-
tences. Specifically, it deals with the organization of words into sentences in terms
of a set of rules associated with grammatical elements (e.g., morphs, morphemes-
roots, words), categories (e.g., case and gender; concrete/abstract; (in)transitive and
active/passive voice; past/present/future tense; progressive, perfect, and imperfect
aspect), classes (e.g., conjugations, declensions, open and closed word classes),
structures (compound and complex words and sentences, phrases, clauses), pro-
cesses (e.g., transposition, affixation, nominalization, transformation, gradation),
and relations (e.g., concord, valency, government) (Council of Europe 2000).
Accordingly, syntactic competence is the ability to organize sentences to convey
meaning.
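As a rough illustration of morphological analysis, the short Python sketch below segments word forms into a root and suffix morphemes; the suffix inventory and the greedy longest-suffix-first strategy are purely hypothetical simplifications, not a model of any particular morphological theory.

# Hypothetical suffix inventory, ordered longest first so that, e.g., 'ness'
# is stripped before a plural 's'. Real analyzers use full morphological lexicons.
SUFFIXES = ["ization", "ation", "ness", "ment", "ing", "es", "ed", "er", "ly", "s"]

def segment(word):
    """Greedily strip known suffixes, returning (root, list of suffix morphemes)."""
    morphemes = []
    stripped = True
    while stripped:
        stripped = False
        for suffix in SUFFIXES:
            if word.endswith(suffix) and len(word) - len(suffix) >= 3:
                morphemes.insert(0, suffix)
                word = word[: -len(suffix)]
                stripped = True
                break
    return word, morphemes

print(segment("greetings"))         # ('greet', ['ing', 's'])
print(segment("nationalizations"))  # ('national', ['ization', 's'])
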

7.3.2.4 Semantics and Semantic Competence

In addition to grammatical components, most communicative language competence frameworks include a semantic component (as well as lexical, phonological and phonetic components, which are addressed next, respectively). Semantics is the
study of meaning (of words and sentences). It focuses on the relationship between
words, phrases, and sentences as signifiers and what they represent as signified,
their denotata. Semantic competence is the knowledge of, and the ability to control, the organization of meaning in terms of both words and sentences; it consists of
lexical, grammatical and pragmatic semantics: lexical semantics deals with ques-
tions of word meaning (e.g., relation of word to general context: reference, con-
notation, and exponence of general specific notions; interlexical relations, such as
synonymy/antonymy, hyponymy, collocation, part-whole relations, translation
equivalence, and so on); grammatical semantics is concerned with the meaning of
sentences in terms of grammatical elements, categories, structures, and processes,
which are associated with syntax; and pragmatic semantics takes up issues relating
to logical relations, such as entailment, implicature, and presupposition (Ibid).
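The interlexical relations mentioned above can be probed computationally; the brief Python sketch below, assuming NLTK with its WordNet data downloaded (nltk.download('wordnet')), looks up synonyms, antonyms, and hyponyms for words chosen here arbitrarily for illustration.

from nltk.corpus import wordnet as wn   # requires nltk.download('wordnet')

# Synonymy and antonymy for one adjectival sense of 'happy'.
happy = wn.synsets("happy", pos=wn.ADJ)[0]
print("synonyms:", sorted({lemma.name() for lemma in happy.lemmas()}))
print("antonyms:", sorted({ant.name() for lemma in happy.lemmas()
                           for ant in lemma.antonyms()}))

# Hyponymy (the 'kind-of' relation) for one nominal sense of 'vehicle'.
vehicle = wn.synsets("vehicle", pos=wn.NOUN)[0]
print("some hyponyms:", [h.lemma_names()[0] for h in vehicle.hyponyms()[:5]])
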

7.3.2.5 Lexical Competence

Lexicon is concerned with the vocabulary of a given language, essentially a catalogue of its words. Lexical competence is the knowledge of, and the ability to use, the vocabulary of a language, including different types of words and expressions, as some analyses consider compound words, idiomatic expressions (certain categories), and other collocations (sequences of words that co-occur more often than chance or are used regularly together) to be part of the lexicon. Lexical competence
consists of grammatical elements and lexical elements (Ibid). Grammatical elements
belonging to closed word classes include articles, quantifiers, question words, rel-
atives, personal pronoun, possessives, demonstratives, prepositions, auxiliary verbs,
conjunctions. Lexical elements include single word forms and fixed expressions:
single word forms involve polysemy and members of the open word classes: noun, verb, adjective, and adverb, though these may include closed lexical sets; and
fixed expressions include sentential formulae (e.g., proverbs, relict archaisms, direct
exponents of language functions), phrasal idioms (e.g., semantically opaque, frozen
metaphors), fixed frames, fixed phrases (e.g., phrasal verbs, compound preposi-
tions), and fixed collocations (Ibid).

7.3.2.6 Phonological and Phonetic Competences

Phonology is often complemented by phonetics; the two are different from, yet related to, each other. The distinction between them constitutes a subject of con-
fusion to many people (outside the field of linguistics), and thus it is useful to
differentiate between them and elucidate how they interrelate in speech
communication.
As a subfield and a multidimensional subject of linguistics, phonetics is the
study of the production, transmission, and perception of speech sounds or sounds of
language. It is of particular importance in and high relevance to ECAs research,
since it is concerned with the physical properties of the sounds of speech in terms of
their physiological production, auditory perception, and acoustic transmission (see,
e.g., Lass 1998; Carr 2003), which all occur simultaneously in the process of speech
communication. As a field of research, phonetics involves three basic areas of
study: articulatory phonetics, which investigates the production of speech by the speaker's articulatory organs and vocal tract; acoustic phonetics, which studies the
transmission of speech from the speaker to the listener; and auditory phonetics,
which is concerned with the perception of speech by the listener. That being said, phonetics goes beyond audible sounds, entailing not only what happens in the mouth, throat, nasal cavities, and lungs (respiration) in order to produce the sounds of language but also the cognitive aspects associated with the perception of speech sounds. As to phonetic competence, it is three-dimensional in nature: articulatory, auditory, and
acoustic. It entails, specifically, the knowledge of and the skill in the production,
perception, and transmission of the sounds of speech, phonemes, words, and sen-
tences: the distinctive features of phonemes, such as voicing, rounding, articulation,
accent, nasalization, and labialization; the phonetic composition of words in terms
of the sequence of phonemes and word stress and tones; and other sounds relating
to prosodic features of speech, which cannot be extracted from the characteristics of
phoneme segments, including pause, intonation/pitch, intensity, rhythm, fluctu-
ation, spectral slope, syllable length, the formant frequencies of speech sounds, and
so on. In the context of conversational agents, grammatical, semantic, pragmatic
and sociocultural dimensions of spoken language are treated as levels of linguistic
context.
Phonology is the subfield of linguistics that deals with the systematic use of
sounds to encode meaning in any spoken human language (Clark et al. 2007). It
entails the way sounds function within and across languages and the meaning they convey. Sounds as abstract units are assumed to operate at the level of language that structures sound for conveying linguistic meaning. Phonology has traditionally centered lar-
gely on investigating the systems of phonemes. As a basic unit of a language’s
phonology, a phoneme can be combined with other phonemes to form meaningful
units such as words or morphemes (the smallest grammatical unit in a language);
the main difference between the two is that a word is freestanding, comprising one
or more morphemes, whereas a morpheme may or may not stand alone. Moreover,
as the smallest contrastive linguistic unit, one phoneme in a word may bring about a
change of meaning, e.g., the difference in meaning between the words tack and tag is a result of the exchange of the phoneme /k/ for the phoneme /g/. Moreover, just as a
language has morphology and syntax, it has phonology—phonemes, morphemes,
and words as sound units and their mental representation. In all, phonology deals
with the mental organization of physical sounds and the patterns formed by sound
combinations and restrictions on them within languages. Phonology is concerned
with sounds and gestures as abstract units (e.g., features, phonemes, onset and
rhyme, mora, syllables, articulatory gestures, articulatory features, etc.), and their
conditioned variations through, for example, allophonic rules, constraints, or der-
ivational rules (Kingston 2007). For example, phonemes constitute an abstract
underlying representation for morphemes or words, while speech sounds (phones)
make up the corresponding phonetic realizations. Allophones entail the different
speech sounds that constitute realizations of the same phoneme, separately or in a
given morpheme or word, which are perceived as equivalent to each other in a
given language. Allophonic variations may be conditioned, i.e., a phoneme can be
realized as an allophone in a particular phonological environment—distributional
variants of a single phoneme. And as far as phonological competence is concerned,
it involves the knowledge of, and the skill in, the use of sound-units to encode
meaning in a spoken language—in other words, the perception and production of
the sound-units of the language and their conditioned realizations.
Regarding the link between phonology and phonetics, the former relates to the
latter via the set of distinctive features which map the abstract representations of
speech units to articulatory gestures, acoustic signals, and/or perceptual represen-
tations (Halle 1983; Jakobson et al. 1976; Hall 2001).
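As a small computational counterpart to the notions of phoneme and minimal pair discussed above, the Python sketch below, assuming NLTK's CMU Pronouncing Dictionary is available (nltk.download('cmudict')), retrieves a word's phonemic composition in ARPAbet notation and tests whether two words differ in exactly one phoneme.

from nltk.corpus import cmudict   # requires nltk.download('cmudict')

pron = cmudict.dict()   # maps a word to one or more phoneme sequences

def phonemes(word):
    """Return the first listed phonemic transcription of a word."""
    return pron[word.lower()][0]

def minimal_pair(w1, w2):
    """True if the transcriptions have equal length and differ in one phoneme."""
    p1, p2 = phonemes(w1), phonemes(w2)
    return len(p1) == len(p2) and sum(a != b for a, b in zip(p1, p2)) == 1

print(phonemes("tack"))              # ['T', 'AE1', 'K']
print(minimal_pair("tack", "tag"))   # True: /k/ exchanged for /g/
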

7.3.2.7 Sociolinguistic and Pragmatic Competences

Fundamentally, human communication involves a shared or common knowledge
base that is used in interaction between humans to understand each other. This
shared knowledge is socioculturally constructed, and involves a complete world and
language model. Language (or discourse) is a particular way of understanding and
talking about the world. The expectation of humans towards other humans is in any
form of communication strongly influenced by this shared knowledge. Higher level
cultural models constitute shared forms of understanding the world, and thus affect
spoken (and written) language. Fisher (1997) advances the notion of cultural frames
as ‘socioculturally and cognitively generated patterns which help people to
understand their world by shaping other forms of deep structural discourse [or
language]’. They can be equated to social representations, which are, according to
Moscovici (1984), culture-specific and conventionalized by each society and
attuned to its values, as well as prescriptive in the sense that they shape the way
people think. This is manifested, among other things, in the use of language in
different sociolinguistic contexts and the pragmatic functions of the realization of
communicative intentions.
Common models or frameworks of communicative language competence
comprise sociolinguistic and pragmatic competences. Canale and Swain (1980)
hypothesize about ‘four components that make up the structure of communicative
[language] competence’ with the third being sociolinguistic competence and the
fourth being pragmatic competence. Sociolinguistic and pragmatic competences are
communicative competences, especially when the emphasis is on how to interpret
the speaker’s intended meaning in a particular utterance, apart from the literal
meaning (Myers-Scotton 2006). To advance the notion of language communicative
competence, Dell Hymes developed a functionalist theory which focuses on
socially situated performance in response to the abstract nature of linguistic com-
petence (Hymes 1971, 2000). The user’s intended meaning, which can be disam-
biguated from a communicative behavior (in the case of speech) using context, is of
particular relevance to natural HCI and thus conversational systems. This is also
important to account for when designing emotional context-aware applications, as
the performance of such applications depends on sound interpretation of the user’s
emotional states that can be captured implicitly as contextual information from the
user’s vocal cues—the context is the key to the right meaning, in other words.
As an interdisciplinary subfield of linguistics, sociolinguistics is the study of the
relation between language and society—the effect of society on the way language is
used and the effects of language usage on society. There exist several relationships
between language and society, including: ‘social structure may either influence or
determine linguistic structure and/or behavior…’, ‘linguistic structure and/or
behavior may either influence or determine social structure…’; and ‘the influence is
bidirectional: language and society may influence each other…’ (Wardhaugh 2005).
In relation to the first relationship, which appears to be the most prevalent in almost all societies, language expresses, according to Lippi-Green
(1997, p. 31), the ‘way individuals situate themselves in relationship to others, the
way they group themselves, the powers they claim for themselves and the powers
they stipulate to others.’ People tend to position (express or create a representation
of) themselves in relation to others with whom they are interacting by using
(choosing) specific linguistic forms in utterances that convey social information.
A single utterance can reveal an utterer’s background, social class, or even social
intent, i.e., whether he/she wants to appear distant or friendly, deferential or familiar,
inferior or superior (Gumperz 1968). According to Romaine (1994, p. 19), what
renders ‘a particular way of speaking to be perceived as superior is the fact that it is
used by the powerful’. In all, linguistic choices carry social information about the
utterer, as they are made in accordance with the orderings of society. Accordingly
Gumperz (1968, p. 220) argues that the ‘communication of social information
presupposes the existence of regular relationships between language usage and
social structure’. Given this relationship between language and society, the linguistic
varieties utilized by different groups of people (speech communities) on the basis of
different social variables (e.g., status, education, religion, ethnicity, age, gender)
form a system that corresponds to the structure of society, and adherence to sociocultural norms is used to categorize individuals into different social classes. Each
speech community ascribes social values to specific linguistic forms in correlation
with which a group uses those forms. Gumperz (1968) provides a definition of
speech community: ‘any human aggregate characterized by regular and frequent
interaction by means of a shared body of verbal signs’, where the human aggregate
can be described as any group of people that shares some common attribute such as
region, race, ethnicity, gender, occupation, religion, age, and so on; interaction
denotes ‘a social process in which utterances are selected in accordance with socially
recognized norms and expectations’, and the ‘shared body of verbal signs’ is
described as the set of ‘rules for one or more linguistic codes and…for the ways of
speaking’ that develop as a ‘consequence of regular participation in overlapping
networks.' It is worth noting that these rules of language choice vary based on situation,
role of speakers, relationship between speakers, place, time, and so forth. Moreover,
William Labov is noted for introducing the study of language variation (Paolillo
2002), which is concerned with social constraints that determine language in its
contextual environment. The use of language varieties in different social situations is
referred to as code-switching. Varieties of language associated with specific regions
or ethnicities may, in many societies, be singled out for stigmatization because their
users are situated lower in the social hierarchy. Lippi-Green (1997) writes on the
tendency of the powerful to ‘exploit linguistic variation…in order to send complex
messages’ about the way groups are ranked or placed in society. Language variation
impacts on communication styles and daily lives of people as well as on the way they
communicate at intercultural and cross-cultural levels. Understanding the sociocultural dimension of language is important for intercultural and cross-cultural communication, since language usage varies among social classes and from place to
place. Second language learners must learn how ‘to produce and understand lan-
guage in different sociolinguistic contexts, taking into consideration such factors as
the status of participants, the purposes of interactions, and the norms or conventions
of interactions.’ (Freeman and Freeman 2004). Learning and practice opportunities
for language learners should include expressing attitudes, conveying emotions,
inferring emotional stances, understanding formal versus informal, and recognizing
idiomatic expressions.
Furthermore, sociolinguistics draws on linguistics, sociology, and anthropology.
Sociolinguists (or dialectologists) study the grammar, semantics, phonetics, pho-
nology, lexicon, and other aspects of social class dialects. Sociology of language
focuses on the effects of language on the society, which is one focus of sociolin-
guistics. Sociolinguistics is closely related to linguistic anthropology (the inter-
disciplinary study of how language influences social life) and the distinction
between these two interdisciplinary fields has even been questioned recently
(Gumperz and Cook-Gumperz 2008).
Sociolinguistic competence deals with the knowledge of social conventions
(norms governing relations between genders, generations, classes, social groups,
and ethnic groups) as well as behaviors, attitudes, values, prejudices, and prefer-
ences of different speech communities, which is necessary to understand socio-
cultural dimensions of language and thus use it in different sociolinguistic contexts.
In specific terms, sociolinguistic competence involves the ability to use language in
different communicative social situations—that is, to know and understand how to
speak given the circumstances one is in, as well as to distinguish between language
varieties on the basis of different social variables. According to Council of Europe
(2000), the matters taken up in sociolinguistic competence in relation to language
usage include: politeness conventions (e.g., impoliteness, positive politeness,
negative politeness); register differences (e.g., formal, neutral, informal, familiar);
dialect and accent (social class, ethnicity, national origin); linguistic markers of
social relations (e.g., use and choice of greetings, address forms and expletives;
conventions for turn-taking, offering, yielding, and keeping), and expressions of
wisdom (e.g., proverbs, idioms). For a detailed account of these matters with
illustrative examples, the reader is directed to Council of Europe (2000), the
Common European Framework of Reference for Languages.
Pragmatics is the subfield of linguistics that studies the use of language in
contexts or the ways in which context contributes to meaning—in other words, how
people comprehend and produce communicative acts in a concrete speech situation.
Pragmatics emphasizes what might not be explicitly stated and the way people
interpret utterances in situational contexts. In relation to conversation analysis,
pragmatics distinguishes two intents or meanings in each communicative or speech
act: (1) the informative intent or the utterance meaning, and (2) the communicative
intent or speaker meaning (Leech 1983; Sperber and Wilson 1986). Pragmatics is
‘concerned not so much with the sense of what is said as with its force, that is, with
what is communicated by the manner and style of an utterance.’ (Finch 2000). In
other words, it deals with how the transmission of meaning depends not so much on
the explicit linguistic knowledge (e.g., grammar, semantics, lexicon, etc.) of the
speaker as on the inferred intent of the speaker or the situational context of the
utterance. Overall, pragmatics encompasses talk in interaction, speech act theory,
conversational implicature (the things that are communicated though not explicitly
expressed), in addition to other approaches to language behavior in linguistics,
sociology, philosophy and anthropology (Mey 1993).
Pragmatic competence is a key component of communicative language compe-
tence. It entails the knowledge of, and the skill in, the interpretation of (the meaning
of) utterances in situational contexts. The ability to understand the speaker’s intended
meaning is called pragmatic competence (Takimoto 2008; Koike 1989). In this sense,
pragmatic competence provides language users with effective means to overcome ambiguities in speech communication, given that the meaning of utterances enacted through speech relies on such contextual factors as place, time, manner, style, situ-
ation, the type of conversation, and relationship between speakers, or that meaning
can be inferred based on logical relations such as entailment, presupposition, and
implicature. Therefore, pragmatic competence entails that language users use lin-
guistic resources to produce speech acts or perform communication functions, have
command of discourse, cohesion and coherence, identify speech types, recognize
idiomatic expressions and sarcasm, and be sensitive to social and cultural environ-
ments. According to Council of Europe (2000), pragmatic competence involves
discourse competence, functional competence, and design competence, that is, it
deals with the language user’s knowledge of the principles according to which
messages are, respectively, organized, structured and arranged; used to perform
communicative functions; and sequenced according to interactional schemata.
Discourse competence, which is the ability to arrange and sequence statements to
produce coherent units of language, involves knowledge of, and ability to control, the
ordering of sentences with reference to topic/focus, given/new, and natural
sequencing (e.g., temporal); cause/effect (invertible); ability to structure and manage
discourse in terms of: thematic organization, coherence and cohesion, rhetorical
effectiveness, logical ordering, style and register; and so on. Discourse refers to a set
of statements that provide a language for talking within some thematic area.
Functional competence is, on the other hand, concerned with the use of utterances and
spoken discourse in communication for functional purposes; it is the ability to use
linguistic resources to perform communicative functions. It involves micro-
functions, macro-functions, and interaction schemata. Micro-functions entail cate-
gories for the functional use of single utterances, including ‘imparting and seeking
factual information: identifying, reporting, correcting, asking, answering; expressing
and finding out attitudes: factual (agreement/disagreement), knowledge (knowledge/
ignorance, remembering, forgetting, probability, certainty), modality (obligations,
necessity, ability, permission), volition (wants, desires, intentions, preference),
emotions (pleasure/displeasure, likes/dislikes, satisfaction, interest, surprise, hope,
disappointment, fear, worry, gratitude), moral (apologies, approval, regret,
sympathy); suasion: suggestions, requests, warnings, advice, encouragement, asking
help, invitations, offers; socializing: attracting attention, addressing, greetings,
introductions, toasting, leave-taking; structuring discourse: (28 micro-functions,
opening, turn-taking, closing, etc.); communication repair: (16 micro-functions)’.
Macro-functions represent categories for the functional use of spoken discourse
comprising a sequence of sentences that can sometimes be extended, e.g., description,
narration, commentary exposition, exegesis, explanation, demonstration, instruction,
argumentation, persuasion, and so on. Interaction schemata are the patterns of social
interaction which underlie communication (e.g., verbal exchange patterns); interac-
tive communicative activities involving structured sequences of actions by the parties
in turns form pairs (e.g., question: answer; statement: agreement/disagreement;
request/offer/apology: acceptance/non-acceptance; and greeting/toast: response) and
triplets (in which the first speaker acknowledges or responds to the interlocutor’s
reply), which are usually embedded in longer interactions. In all, pragmatic com-
petence seems to be one of the most challenging aspects of language performance
(addressed next).
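To give a feel for how such functional categories might be operationalized in a conversational system, the following Python sketch assigns single utterances to a handful of the micro-functions listed above using hand-written keyword rules; the rules, labels, and example utterances are invented for illustration, and real systems would rely on trained dialogue-act classifiers instead.

import re

# Illustrative keyword rules mapping utterances to a few micro-function labels.
MICRO_FUNCTION_RULES = [
    ("socializing: greeting",       re.compile(r"\b(hello|hi|good (morning|evening))\b")),
    ("seeking factual information", re.compile(r"\b(what|when|where|who|how)\b.*\?")),
    ("suasion: request",            re.compile(r"\b(please|could you|would you)\b")),
    ("socializing: leave-taking",   re.compile(r"\b(goodbye|bye|see you)\b")),
]

def tag_micro_function(utterance):
    """Return the first matching micro-function label, or 'unclassified'."""
    text = utterance.lower()
    for label, pattern in MICRO_FUNCTION_RULES:
        if pattern.search(text):
            return label
    return "unclassified"

for u in ["Hello there!", "Where is the meeting room?",
          "Could you turn on the lights?", "Bye for now."]:
    print(u, "->", tag_micro_function(u))
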
In sum, communicative language competence involves so many skills, broad
knowledge, and long experience necessary to perform communicative functions and
to realize communication intentions. Particularly, the sociocultural and pragmatic dimensions of language are becoming an increasingly important topic of interest to the ECA community, as researchers (e.g., Vilhjálmsson 2009; ter Maat and Heylen
2009; Samtani et al. 2008) have recently started to focus on various contextual
variables (dialog, situation, environment, and culture) that define and surround
communicative behavior (speech). These factors are deemed critical as to interpret
and disambiguate multimodal communicative signals, that is, to use context to
determine what was intended by communicative behaviors in a particular conver-
sation. Sociocultural setting, dialog and the environment are contextual elements that
play a significant role in the interpretation of the communicative behavior, specifi-
cally ‘in determining the actual multimodal behavior that is used to express a con-
versational function.’ (ter Maat and Heylen 2009, p. 71). See below for more detail.

7.3.2.8 Linguistic Performance and Communication Errors and Recovery

Linguistic performance entails the act of carrying out speech communication or the
production of a set of specific utterances by native speakers. It is a concept that was
first coined by Chomsky (1965) as part of the foundations for his theory of
transformational generative grammar (see above). It is said that linguistic perfor-
mance reflects the intrinsic sound-meaning connections established by language
systems, e.g., phonology, phonetics, syntax, and semantics, and involves extra-
linguistic beliefs pertaining to the utterer, including attitude, physical well-being,
mnemonic skills, encyclopedic knowledge, absence of stress, and concentration.
Moreover, linguistic performance is inextricably linked to the context, situation, and
environment, which surround and affect the speech communication; and in which it
takes place. That is to say, speech communication happens always in a certain
context, a specific situation, and in a particular environment and, subsequently, the
meaning of words and sentences as well as the way the communication is performed
is heavily influenced by the context, situation, and environment. These contingent
factors play therefore a fundamental role in determining how speech is perceived,
recognized, and produced. Speaking and responding in human-to-human commu-
nication is determined by the way context is interpreted or construed and its evolving
patterns are monitored in a specific situation and in a particular environment by both
the speaker and the listener. Context consists of specific aspects that characterize a
specific situation, a certain interpretation of situational features. And situation
denotes everything that surrounds the speech communication, including the cultural
conventions; communication goals; knowledge and roles of participants; and local,
social, physical and chemical environment. The situation in which the speech
communication takes place provides a common ground that generates implicit
conventions and calls for the implicitly shared common knowledge, i.e., world and
language model, which influence and to some extent set the rules for interaction,
including communicative actions, as well as provide a key to decode the meaning of
verbal behavior.
Furthermore, linguistic performance can demonstrate the concrete use of lin-
guistic competence, showing how language users differ from one another as to the
accessibility of linguistic knowledge (the competence of language) that is cogni-
tively stored and organized. Linguistic competence entails an ideal speaker-listener,
in a completely homogeneous speech communication, knowing his/her language
perfectly and ‘that it is unaffected by such grammatically irrelevant conditions as
memory limitations, distractions, shifts of attention and interest, and errors (random
or characteristic) in applying his knowledge of this language in actual perfor-
mance.’ (Chomsky 1965). Linguistic performance is governed by these principles
of cognitive structure that are technically not regarded as aspects of language
(Chomsky 2006).
According to Chomsky (1965), a fundamental distinction has to be made
between linguistic competence and linguistic performance. He argues that only
under an idealized situation whereby the speaker-hearer is unaffected by gram-
matically irrelevant conditions will performance (idealized capacity) mirror com-
petence. Hymes’s (1971) criticism of Chomsky’s notion of linguistic competence is
the inadequate distinction between competence and performance, commenting that
no significant progress in linguistics is possible without studying forms along with
the ways in which they are used. Functional theoretical perspectives of grammar
(approaches to the study of language that hold that the functions of language and its
elements are the key to understanding linguistic processes and structures) tend to
dismiss the sharp distinction between linguistic competence and linguistic perfor-
mance, and especially the primacy given to the study of the former. Functional
grammar analyzes, among others, ‘the entire communicative situation: the purpose
of the speech event, its participants, its discourse context’, and claims ‘that the
communicative situation motivates, constrains, explains, or otherwise determines
grammatical structure, and that structural or formal approaches are not merely
limited to an artificially restricted data base, but are inadequate even as structural
accounts’, thereby differing ‘from formal and structural grammar in that it purports
not to model but to explain; and the explanation is grounded in the communicative
situation’ (Nichols 1984). That is, functional theories of grammar tend to focus on,
in addition to the formal relations between linguistic elements, the way language is
used in communicative situational context (Ibid).
Communication between humans is not error free, and communication partners
usually resort to different ways to resolve communication errors. Errors may occur
consciously or unconsciously when carrying out speech communication acts—
linguistic performance. They commonly involve grammatical incorrectness, pho-
netic inaccuracy, articulation inappropriateness, lexical inexactness, meaning
ambiguity, false starts and other deviations, and other non-linguistic features, such
as disfluencies, short term misunderstandings, and propositional imprecision
(inability to formulate thoughts so as to make one’s meaning clear). All types of
errors may be detected by communication partners though not all corrected, which
depends on the nature of the error and the type of the conversation. In case there is
misunderstanding, issues are repeated and corrected; grammatically incorrect sen-
tences can be reorganized, inexact lexical elements can be modulated; inappropriate
articulations can be rearticulated; ambiguities associated with topics can be clarified
or elaborated on further; and so on.
Given the complexity of human spoken language, human users are going to
experience many communication errors during their interaction with conversational
systems within AmI environments. Yet worse, artificial systems are not able to
detect communication problems, not to mention resolve them, as humans do. Hayes
and Reddy (1983), computational linguists, noted a fundamental difference between humans and computer systems as to communication: interactive computer systems do not possess solutions to detect communication problems. In the case of conversational agents as AmI systems, communication problems are significantly likely to occur due to inaccurate detection of, unsound interpretation of, and inefficient reasoning on, multimodal communication behavior information received as signals from human users. Particularly, the intended meaning of human users is highly likely to be misinterpreted due to the subtlety and intricacy associated
with the pragmatic and sociolinguistic dimensions of language use, in particular, as
to speech communication. In some cases, with advanced context awareness and
sensor technology (e.g., MMES) as well as pattern recognition and ontological
modeling and reasoning techniques, conversational systems may well rely on the
knowledge about the situation to avoid and solve some communication problems,
as it can provide essential cues for the purpose. Overall, the effectiveness and
efficiency of communication between conversational systems and human users rely
heavily on the ability of these systems to recognize different kinds of communi-
cation errors and resolve them in real-time fashion.
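One simple recovery strategy is sketched below in Python: when the confidence a (hypothetical) recognizer attaches to an interpreted utterance falls below a threshold, the system asks the user to confirm or repeat rather than acting on a likely error. The Hypothesis class, thresholds, and dialogue moves are assumptions made for illustration, not a description of any particular AmI system.

from dataclasses import dataclass

@dataclass
class Hypothesis:
    text: str           # the recognizer's best guess at what was said
    confidence: float   # 0.0-1.0, as a hypothetical recognizer might report

CONFIRM_THRESHOLD = 0.75   # below this, ask the user to confirm
REJECT_THRESHOLD = 0.40    # below this, ask the user to repeat

def recovery_action(hyp: Hypothesis) -> str:
    if hyp.confidence >= CONFIRM_THRESHOLD:
        return f"ACT: {hyp.text}"
    if hyp.confidence >= REJECT_THRESHOLD:
        return f"CONFIRM: Did you say '{hyp.text}'?"
    return "REPAIR: Sorry, I did not catch that. Could you repeat?"

for h in [Hypothesis("turn on the living room lights", 0.92),
          Hypothesis("turn on the living room lights", 0.55),
          Hypothesis("turn on the living room lights", 0.20)]:
    print(recovery_action(h))
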

7.3.2.9 Psycholinguistics and Neurolinguistics

Neurolinguistics and psycholinguistics are two interdisciplinary subfields of lin-
guistics that tend to overlap to a great extent. They moreover converge on many
fields from which they draw theories. Neurolinguistics is concerned with the neural
mechanisms in the human brain that control the acquisition, comprehension, and
production of language—that is, how the brain processes information related to
language at neurophysiological and neurobiological levels. Much work in neuro-
linguistics is informed by psycholinguistic models, and focuses on investigating how the brain can implement the processes that psycholinguistics (drawing on cognitive psychology) proposes as necessary for the comprehension and production
of language. Neurolinguistics integrates many fields, including, in addition to lin-
guistics and psycholinguistics, neuroscience, neuropsychology, neurobiology,
cognitive science, computer science, and communication disorders. Readers interested in reading more on neurolinguistics may refer to Stemmer and Whitaker
(1998) and Ahlsén (2006). Psycholinguistics (or psychology of language) deals, on
the other hand, with the acquisition, comprehension, and production of language,
‘with language as a psychological phenomenon’ (Garman 1990). Psycholinguistics
provides insights into how we assemble our own speech and understand that of
others, how we store and use vocabulary, and how we manage to acquire language
in the first place (Field 2004). Psycholinguistics draws upon empirical findings from
cognitive psychology, and thus cognitive science, to explain the mental processes
underlying speech, namely acquisition, storage, comprehension, and production,
e.g., acquiring a new language, producing grammatical and meaningful sentences
out of grammatical structures and vocabulary, memorizing idiomatic expressions,
understanding utterances and words, and so on. Based on the relationship between
cognitive psychology, cognitive science, and AI (see next chapter for more detail)
theoretical models of psycholinguistics are of high relevance to building conver-
sational systems with respect to the processes of speech perception and production,
as applied in ECAs (see, e.g., Dohen 2009; Robert-Ribes 1995; Schwartz et al.
1998). According to Garman (1990), there are three key psycholinguistic elements
that are used to describe the mechanisms underlying language understanding and
production: language signal, operations of neuropsychological system, and lan-
guage system. Language signal refers to all forms of language expression, which are
generated and perceived by language users, and in the perception of which gaps are
closed and irregularities overlooked; the most striking characteristic of this element
is its perceptual invariance in speech. Operations of neuropsychological system
determine how language signals—speech cues—are perceived and generated, a
process which involves auditory pathways from sensory organs to the central
processing areas of the brain and then to the vocal tract. Language system involves
silent verbal reasoning, contemplation of language knowledge; it can be imple-
mented even when not using palpable language signals at all. As an interdisci-
plinary field, psycholinguistics integrates, in addition to linguistics and psychology,
cognitive science, neuropsychology, neuroscience, information theory, and speech
and language pathology in relation to how the brain processes language.

7.3.2.10 Cognitive Linguistics

Cognitive linguistics is a paradigm within linguistics that emerged following
Langacker’s (1987, 1991) notable work, a seminal, two-volume foundations of
cognitive grammar. Cognitive linguistics seeks to investigate the interrelations and
interactions between linguistic knowledge and its cognition and how language and
cognition mutually influence one another. Cognitive linguistics subsumes a number
of distinct theories and focuses on explicating the interrelationship between lan-
guage and cognitive faculties (van Hoek 2001). According to Geeraerts and
Cuyckens (2007), cognitive linguistics is the study of language in its cognitive
function and thus postulates that our interactions and encounters with the world are mediated through informational structures in the mind, and language is a means for organizing, processing, and conveying that information: a repository of world knowledge as a structured set of meaningful categories helps us store information
about new experiences as well as deal with new ones. Cognitive linguistics com-
prises three main subject areas of study: cognitive semantics, which deals largely
with lexical semantics (see above), separating meaning into meaning-construction
and knowledge representation and therefore studies much of the area devoted to
pragmatics and semantics; cognitive approaches to grammar, which is concerned
mainly with syntax and morphology; and cognitive phonology, which investigates
classification of various correspondences between morphemes and phonetic
sequences. Cognitive phonology is concerned with the sound systems of languages
as abstract/conceptual units. Cognitive phonology assumes that other aspects of grammar are directly accessible owing to its subordinate relationship with cognitive grammar, a cognitive approach to language developed by Langacker (1987, 1991, 2008) that considers the basic units of language to be conventional pairings of a phonological label with a semantic structure (where grammar entails constraints on the way these units are combined to generate phrases and sentences), thereby making it feasible to relate phonology to various syntactic,
semantic and pragmatic aspects. Cognitive approaches to grammar entail theories of
grammar, e.g., generative grammar, cognitive grammar, construction grammar
(developed by Langacker’s student Adele Goldberg), that relate grammar to mental
structures and processes in human mind or cognition. And cognitive semantics
holds that language as part of human cognitive ability can only describe the world
as organized within people’s conceptual spaces (Croft and Cruse 2004). The main
tenets of cognitive semantics include: grammar is a means of expressing the
utterer’s concept of the world; knowledge of language is acquired and contextual;
and the ability to use language involves more general cognitive resources, not a
special language module (Ibid). In fact, cognitive linguists' denial that the mind has any unique and autonomous language-acquisition module stands in contrast to the position adopted by generative grammarians, but is in line with the stance espoused by functionalists (see above). Denying that the mind involves
an autonomous linguistic faculty is one of the central positions or premises to which
cognitive linguistics adheres, in addition to understanding grammar in terms of
conceptualization and the claim that knowledge of language arises out of its use
(Ibid). Cognitive linguists argue that language is embedded in the experiences and
environments of its users, and knowledge of linguistic phenomena is essentially
conceptual in nature; they view meaning in terms of conceptualization—i.e., mental
spaces instead of models of the world; they assert that the cognitive processes of
storing and retrieving linguistic knowledge are not significantly different from those
associated with other knowledge, and that similar cognitive abilities are employed
in the use of language in understanding as those used in other non-linguistic tasks;
and they deny that human linguistic ability (although part of it is innate) is separate
from the rest of cognition—that is, linguistic knowledge is intertwined with all
other cognitive processes and structures, not an autonomous cognitive faculty with
processes and structures of its own (see, e.g., Geeraerts and Cuyckens 2007; Croft
and Cruse 2004; Vyvyan and Green 2006; Vyvyan 2007; Vyvyan et al. 2007).
Moreover, aspects of cognition that are of interest to cognitive linguists include
conceptual metaphor and conceptual blending; cognitive grammar, conceptual
organization (categorization, metonymy, and frame semantics); gesture (nonverbal
communication behaviors); cultural linguistics; and pragmatics.

7.4 Computational Linguistics and Relevant Areas of Discourse: Structural Linguistics, Linguistic Production, and Linguistic Comprehension

As a branch of AI, computational linguistics is the scientific study of language in relation to computing, or from a computational perspective. It deals with the rule-based modeling of natural language in computer systems. It is thus of pertinence to ECA researchers and of particular interest to the natural HCI and AmI
community. To model human language, computational linguistics draws theoretical
models from a variety of fields of linguistics as well as from cognitive science.
Computational linguistics aims at providing computational models of many cate-
gories of linguistic phenomena, which requires extensive computational resources
to study the linguistic behavior of conversational systems as complex and intelli-
gent systems by computational simulation. This chapter covers aspects of theo-
retical computational linguistics, taking up some issues in theoretical linguistics and
cognitive science, as well as applied computational linguistics, focusing on some
practical outcomes of modeling human language use in relation to conversational
systems. Among the emphases is the processing of speech as a language medium
and the related tasks in terms of speech analysis (recognition) and synthesis
(generation). Speech recognition and synthesis is concerned with how spoken
language can be interpreted/understood and generated using computers.
In this regard, there are different areas of discourse in computational linguistics,
including structural linguistics, linguistic production, and linguistic comprehension,
which are of relevance to conversational systems (natural HCI) under investigation.
A computational approach to the structure of linguistic data is very crucial to
organize and uncover much of the information about any language that would
otherwise remain hidden under the vastness and infinite richness of data within that language (incalculability). The structural linguistics approach aims to understand the structure of language using computational resources, e.g., large linguistic corpora like the 'Penn Treebank' (Marcus et al. 1993). This is to grasp how the language functions on a structural level, so as to create better computational models of language. Information about the structural data of language allows for the discovery
and implementation of similarity recognition between pairs of utterances (Angus
et al. 2012). While information regarding the structural data of a language can be
available for any language, there are differing patterns as to some aspects of the
structure of sentences. This usually constitutes the sort of intriguing information which computational linguistics aims to uncover and which could lead to further
important discoveries regarding the underlying structure of some languages.
Different grammatical models can be employed for the parsing and generation of
sentences. As a subspecialty of computational linguistics, parsing and generation
deal with taking language apart and putting it together. Computational approaches
allow scientists not only to parse huge amounts of data reliably and efficiently and to generate grammatical structures, but also to open up the possibility of important
discoveries, depending on the natural features of a language.
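As an illustrative sketch of working with structural data, the Python fragment below, assuming NLTK and its bundled sample of the Penn Treebank (nltk.download('treebank')), counts the grammar productions observed in parsed sentences; such counts are one common starting point for inducing probabilistic grammars used in parsing and generation.

from collections import Counter
from nltk.corpus import treebank   # requires nltk.download('treebank')

production_counts = Counter()
for tree in treebank.parsed_sents()[:200]:   # a small slice of the sample corpus
    for production in tree.productions():
        production_counts[production] += 1

# The most frequent structural rules observed in this slice.
for production, count in production_counts.most_common(5):
    print(count, production)
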
The linguistic production approach involves how a computer system generates or
produces language, an area in which computational linguistics has made some
fascinating discoveries and remarkable progress. The production of language is a
key feature of AmI systems and ECAs, where a computer system receives speech signals and is expected to respond to them in a human-like manner. A computer system may be regarded as capable of thought, and as a human-like interactive system, when it becomes difficult for a subject to differentiate between the human and the computer. This
was proposed some six decades ago by Turing (1950) whose ideas remain
influential in the area of AI. The ELIZA program, which was devised by Joseph
Weizenbaum at MIT in 1966, is one of the very early attempts to design a computer
program that can converse naturally with humans. While the program seemed to be
able to understand what was uttered to it and to respond intelligently to written
statements and questions posed by a user, it only comprehended a few keywords in each sentence, and no more, using a pattern-matching routine (Weizenbaum 1966).
Nevertheless, the research in this domain has significantly improved, giving rise to
more sophisticated conversational systems. The computational methods used in the
production (and comprehension) of language have matured, and hence the results generated by computational linguists have become more enlightening. Specific to com-
putational linguistics, current work in developing conversational agents shows how
new machine learning techniques (supervised learning algorithms and models) have
been instrumental in improving the computational understanding of language, how
speech signals are perceived and analyzed and generated and realized by computer
systems. This work adds to the endeavor towards making computers understand and
produce language in a more naturalistic manner. In this line of thinking, there exist
some specialized algorithms which are capable of modifying a system’s style of
production (speech generation) based on linguistic input from a human or on any of
the five dimensions of personality (Mairesse 2011). This work and others notable
ones (see below) use computational modeling approaches that aim at making HCI
much more natural.
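A flavor of ELIZA's keyword-spotting approach, mentioned above, can be conveyed with a few lines of Python; the patterns and canned responses below are invented for illustration and make no claim to reproduce Weizenbaum's original script.

import random
import re

# Illustrative ELIZA-style rules: a regular expression plus response templates.
RULES = [
    (re.compile(r"\bi need (.+)", re.I),
     ["Why do you need {0}?", "Would {0} really help you?"]),
    (re.compile(r"\bi am (.+)", re.I),
     ["How long have you been {0}?", "Why do you say you are {0}?"]),
    (re.compile(r"\bbecause (.+)", re.I),
     ["Is that the real reason?"]),
]
DEFAULT_RESPONSES = ["Please tell me more.", "I see. Go on."]

def respond(utterance):
    """Return a canned response built from the first matching pattern."""
    for pattern, responses in RULES:
        match = pattern.search(utterance)
        if match:
            return random.choice(responses).format(match.group(1).rstrip(".!?"))
    return random.choice(DEFAULT_RESPONSES)

print(respond("I need a quieter office"))
print(respond("I am tired of these meetings"))
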
The linguistic comprehension approach concerns how a computer system understands language, that is, how it recognizes, interprets, and reasons about speech signals. There is a proliferation of application domains of language comprehension that modern computational linguistics entails, including search engines, e-learning/education,
e-health, automated customer service, activities of daily living (ADL), and con-
versational agents. The ability to create a software agent/program capable of
understanding human language has many broad possibilities, especially in relation
to the emerging paradigm of AmI, one of which is enabling human users to engage
in intelligent dialog or mingle socially with computer systems. Language perception
(speech analysis) involves the use of various types of pattern recognition algorithms
that fall under supervised machine learning methods, including Support Vector
Machine (SVM), neural network, dynamic and naive Bayes network, and Hidden
Markov Models (HMMs). Early work in language comprehension applied Bayesian
statistics to optical character recognition, as demonstrated by Bledsoe and Browning
(1959). An initial approach to applying signal modeling to language (where
unknown speech signals are analyzed or processed to look for patterns and to make
predictions based on their history) was achieved with the application of HMMs as
described by Rabiner (1989). This and other early attempts to understand spoken
language were grounded in work carried out in the 1970s. Indeed, similar
approaches to applying signal modeling to language were employed in early
attempts at speech recognition in the late 1970s using part-of-speech pair proba-
bilities (Bahl et al. 1978). More endeavors to build conversational agents since the
late 70s up till now are cited below.
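Before turning to those, the signal-modeling idea behind HMM-based approaches mentioned above can be illustrated with a minimal Python sketch: it decodes the most likely sequence of hidden states (here, toy phone labels) from a sequence of observed acoustic symbols with the Viterbi algorithm. The states, observation symbols, and probabilities are invented for illustration and are not taken from any real recognizer.

```python
import numpy as np

# Toy HMM: two hidden "phone" states and three discrete acoustic observation symbols.
states = ["vowel", "consonant"]
obs_symbols = {"low_energy": 0, "mid_energy": 1, "high_energy": 2}

start_p = np.array([0.6, 0.4])                       # P(state at t=0)
trans_p = np.array([[0.7, 0.3],                      # P(state_t | state_{t-1})
                    [0.4, 0.6]])
emit_p = np.array([[0.1, 0.3, 0.6],                  # P(observation | state)
                   [0.6, 0.3, 0.1]])

def viterbi(observations):
    """Return the most likely hidden state sequence for the observed symbols."""
    T, N = len(observations), len(states)
    delta = np.zeros((T, N))           # best path probability ending in each state
    psi = np.zeros((T, N), dtype=int)  # back-pointers to the best previous state
    delta[0] = start_p * emit_p[:, observations[0]]
    for t in range(1, T):
        for j in range(N):
            scores = delta[t - 1] * trans_p[:, j]
            psi[t, j] = int(np.argmax(scores))
            delta[t, j] = scores[psi[t, j]] * emit_p[j, observations[t]]
    # Backtrack from the best final state.
    path = [int(np.argmax(delta[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(psi[t, path[-1]])
    return [states[i] for i in reversed(path)]

obs = [obs_symbols[o] for o in ["high_energy", "high_energy", "low_energy", "mid_energy"]]
print(viterbi(obs))  # ['vowel', 'vowel', 'consonant', 'consonant']
```

Real recognizers work with continuous acoustic features and far larger state spaces, but the same decode-the-hidden-sequence logic underlies the early work cited above.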

7.5 Speech Perception and Production: Key Issues and Features

7.5.1 The Multimodal Nature of Speech Perception

Perception of speech is the formation, from sensory information received from various sensors (acoustical and visual signals), of an internal representation that is suitable for interpretation and reasoning—intelligent processing. In the dominant AI approach, speech perception is said to 'start with the acoustic wave of a
human utterance and proceeds to an internal representation of what the speech is
about. A sequence of representations is used: the digitization of the acoustic wave
into an array of intensities; the formation of a small set of parametric quantities that
vary continuously with time (such as the intensities and frequencies of the formants,
bands of resonant energy characteristic of speech); a sequence of phons (members
of a finite alphabet of labels for characteristic sounds, analogous to letters);
a sequence of words; a parsed sequence of words reflecting grammatical structure; and finally a semantic data structure representing a sentence (or other utterance) that
reflects the meaning behind the sounds.’ (McGraw-Hill Science and Technology
Encyclopedia 2007). In reality, however, speech is multimodal in its perception (as well as in its production); it is perceived with the ears and the eyes (and produced with the mouth, the vocal tract, the hands, and the entire body) (Dohen 2009). Multimodal
perception of speech concerns both segmental perception of speech, phonemes or
words that are produced by a speaker, as well as supra-segmental perception of
speech, prosodic features such as intonation, rhythm, pitch, and phrasing which are
also crucial in spoken communication (Ibid). Both auditory and visual modalities
are therefore involved in the perception of speech.
Vocal language has both linguistic and paralinguistic properties that can be seen.
According to Schwartz (2004), the proportion of speech that can be perceived and understood with the eyes is estimated at 40–60 % of the phonemes and 10–20 % of the words (up to 60 % for some speakers) that can be recovered through lip-movement reading, an ability which is highly inter-speaker dependent. In line with this observation, Stetson (1951, cited in Dohen 2009, p. 25) states: 'speech is rather a set of movements made audible than
a set of sounds produced by movements’; it is multisensory and does not consist of
sounds which are produced just to be heard (Ibid). As a moderate offshoot of Stetson's assertion, based on some findings from other studies, Dohen (2009, p. 25)
concludes that ‘speech is (…) a set of movements made audible and visible’. Seen
from this perspective, speech as a communication behavior involves a set of
communication channels using visual and auditory sensory channels and various
classes of verbal and nonverbal signals. Among other things, the audio-visual perception of speech serves to provide redundant information; many studies (Sumby and Pollack 1954; Binnie et al. 1974; Summerfield 1979; MacLeod and Summerfield 1987; Benoît et al. 1994) have shown that the benefit of vision for speech perception is evident when the acoustic modality is degraded by noise. In line with this, Grant
and Seitz (2000) suggest that vision helps better understand speech in noise as well
as improve auditory detection of spoken utterances in noise. ‘…when we see a
speaker, we perceive him/her as speaking louder.’ (Dohen 2009, p. 26). In their
investigation on whether there are early interactions between audition and vision,
Schwartz et al. (2004) tested the intelligibility in noise of sequences which are not
distinguishable by lip-reading and found, among other things, that adding vision
significantly improves auditory perception (AV > A), which is interpreted as
potentially corresponding to reinforcement of the voicing feature by vision. In fact,
in terms of spoken mode in interactive situations, people tend to prefer direct
conversations because they can better perceive and understand the content being
exchanged as the use of multiple senses may aid in disambiguating communicative
signals using context as well as discerning communicative behavior in terms of
orofacial articulatory gestures. Speech consists of gestures which are produced to be
heard and seen (Dohen 2009). Depending on the situation, people are sometimes
aware of the beneficial differences between seeing each other while they are
speaking and speaking over the phone. Reisberg et al. (1987) demonstrate that
vision aids in perceiving speech in a foreign language or speech produced by a
non-native speaker. Indeed, when conversing with foreigners, people seem more
often to rely on the visual modality to ease their understanding of speech and thus
conversational content. Furthermore, some views argue for the visual-auditory nature of the perception of speech on the basis that the visual modality carries complementary
information. Indeed, research shows that the role of visual modality when humans
perceive speech goes beyond just serving as redundant information—when part of
the missing auditory information can be recovered by vision. Dohen (2009)
maintains that the role of vision in the perception of speech is not just that of a
backup channel or that the visual information merely overlays the auditory one; auditory and visual information are in fact fused for the perceptual decision. The McGurk and MacDonald (1976) effect, in which a [da] percept results from an audio [ba] dubbed onto a visual [ga], also demonstrates that there is more to vision
than just providing redundant information. As demonstrated by Summerfield
(1987), perceptual confusions between consonants differ one from another and
complement one another in the visual and the auditory modalities. To further
support the argument that visual information is not only of a redundant nature in
speech perception, Boë et al. (2000) point out that the [m]/[n] contrast, which exists in more or less all the languages of the world, is not audible but visible.
Speaking entails producing gestures of paralinguistic as well as phonetic nature
that are intended to be heard and seen. The multimodal nature of speech perception
involves, in other words, not only segmental but also supra-segmental perception of speech. Dohen (2009) points out that the production of prosodic information (e.g.,
prosodic focus) involves visible articulatory correlates that are perceived visually,
and adds that it is possible to put forward an auditory-visual fusion when the
acoustic prosodic information is degraded so as to enhance speech perception. The
acoustic correlates of prosodic focus have been widely investigated (Dahan and
Bernard 1996; Dohen and Loevenbruck 2004). While prosody was for a long time
uniquely considered as acoustic/auditory, recent studies carried out in the lab (Graf
et al. 2002; Dohen and Loevenbruck 2004, 2005; Dohen et al. 2004, 2006; Beskow
et al. 2006) have demonstrated that prosody has also potentially visible correlates
(articulatory or other facial correlates) (Dohen 2009).
From a psycholinguistic perspective, there are three mechanisms that underlie language perception: the language signal, the operations of the neuropsychological system, and the language system (Garman 1990). The operations of the neuropsychological system determine how language signals (spoken utterances) are perceived and generated, a process that involves auditory pathways from the sensory organs to the central processing areas of the brain, while the language system involves silent verbal reasoning and the contemplation of language knowledge. Drawing upon different psy-
cholinguistic models, there are characteristic cognitive processes that underlie the
fusion of the auditory and visual information in speech perception. Schwartz et al.
(1998) and Robert-Ribes (1995) analyzed the fusion models in the literature and
presented four main potential fusion architectures, as summarized in Fig. 7.1 by
Dohen (2009, p. 26):
Fig. 7.1 The four main types of auditory-visual fusion models. Source Schwartz et al. (1998) and
Robert-Ribes (1995)

• Direct Identification (DI): the auditory and visual channels are directly
compiled.
• Separate Identification (SI): the phonetic classification is operated separately
on both channels and fusion occurs after this separate identification. Fusion is
therefore relatively late and decisional.
• Recoding in the dominating modality (RD): the auditory modality is con-
sidered to be dominant and the visual channel is recoded into a format compatible with that of the auditory representations. This is an early fusion process.
• Recoding in the motor modality (RM): the main articulatory characteristics are
estimated using the auditory and visual information. These are then fed to a
classification process. This corresponds to an early fusion.
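To make the distinction between early and late fusion more tangible, the Python sketch below (with invented feature vectors and untrained, random weights purely for illustration) contrasts a DI-style scheme, in which audio and visual feature vectors are concatenated before a single classification step, with an SI-style scheme, in which each channel is classified separately and the per-channel decisions are combined afterwards.

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented stand-ins for per-frame feature vectors from the two channels.
audio_feat = rng.normal(size=8)    # e.g. spectral features
visual_feat = rng.normal(size=4)   # e.g. lip-shape features

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Direct Identification (DI): early, feature-level fusion.
# The channels are concatenated and a single (here random, untrained) linear
# classifier maps the joint vector to phoneme-class scores.
W_joint = rng.normal(size=(3, 12))
p_di = softmax(W_joint @ np.concatenate([audio_feat, visual_feat]))

# Separate Identification (SI): late, decision-level fusion.
# Each channel is classified on its own and the resulting posteriors are
# combined afterwards (here by a simple weighted product).
W_audio = rng.normal(size=(3, 8))
W_visual = rng.normal(size=(3, 4))
p_a = softmax(W_audio @ audio_feat)
p_v = softmax(W_visual @ visual_feat)
w_a, w_v = 0.7, 0.3                      # assumed channel reliabilities
p_si = (p_a ** w_a) * (p_v ** w_v)
p_si /= p_si.sum()

print("DI posterior:", np.round(p_di, 3))
print("SI posterior:", np.round(p_si, 3))
```

The RD and RM architectures would instead recode the visual channel (into an auditory-compatible or articulatory representation) before a single classification step; the sketch is only meant to show where in the pipeline the fusion happens.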
Dohen (2009) reviews a number of studies on fusion models and draws different
conclusions and suggestions: the DI and SI models are easier to implement; visual
attention can modulate audiovisual speech perception; there are strong
inter-individual variations as well as inter-linguistic differences; the RD and RM
models seem to be more likely to reflect the cognitive processes underlying
auditory-visual fusion; and several behavioral studies provide valuable information
on the fusion process, adding to the role of vision in understanding speech in noise,
improving auditory detection of spoken utterances in noise, and reinforcing the voicing feature (auditory perception) by vision, as mentioned above.
The above discussion aims to provide insights into understanding the complexity
inherent in the cognitive processes underlying segmental and supra-segmental
perception of speech, as well as to give an idea about the challenges HCI designers
may face when building naturalistic interactive systems, especially in relation to
speech analysis and synthesis, whether pertaining to conversational systems or
emotional context-aware systems. But the challenge is even greater when it comes to building believable conversational systems, since they are concerned with modeling natural
language, emulating human verbal communication behavior.
In relation to context awareness, the idea of multimodal, supra-segmental per-
ception of speech is very useful to the design of context-aware applications that use
speech recognition and analysis techniques to detect emotional cues as contextual
information from prosodic features of verbal signals. In this case, emotional
context-aware applications using speech as input can incorporate multisensory
fusion techniques based on a relevant fusion model of supra-segmental speech
perception to capture some aspects of the user’s emotional context from prosodic
visible articulatory or other facial correlates that are intended to convey emotions in
spoken communication. Incorporating visual cues in the process of speech recog-
nition can have an impact on the accuracy of the detection of the user's emotional state. This can in turn have a direct implication for the soundness of the interpretation
of and the efficiency of reasoning on the sensed contextual information. Moreover,
the use of visual modality for supra-segmental speech perception is important in
case the acoustic modality is degraded by noise or acoustic channels are unavail-
able. Indeed, not all emotional cues can be available together, as context will affect which cues are relevant. Prosody has potentially visible correlates, and when the
acoustic prosodic information is degraded an auditory-visual fusion can be put
forward in order to enhance speech perception (Dohen 2009) with regard to both
conversational and emotional information. All in all, speech as a medium to
transmit, impart and exchange thoughts, feelings, and complex meanings and
intentions involves very complex cognitive processes and abilities on both the perceptual and the behavioral level.
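As a rough illustration of how a system might extract simple prosodic cues from an acoustic signal for use as emotional context, the Python sketch below computes frame-wise energy and a crude autocorrelation-based pitch estimate from a synthetic waveform; the thresholds and the arousal heuristic are invented for illustration only and are not a validated emotion model.

```python
import numpy as np

SR = 16000  # sample rate in Hz

def frame_features(signal, frame_len=400, hop=200):
    """Return per-frame RMS energy and a crude autocorrelation pitch estimate."""
    energies, pitches = [], []
    for start in range(0, len(signal) - frame_len, hop):
        frame = signal[start:start + frame_len]
        energies.append(float(np.sqrt(np.mean(frame ** 2))))
        # Autocorrelation-based F0 estimate, restricted to 80-400 Hz.
        ac = np.correlate(frame, frame, mode="full")[frame_len - 1:]
        lo, hi = SR // 400, SR // 80
        lag = lo + int(np.argmax(ac[lo:hi]))
        pitches.append(SR / lag)
    return np.array(energies), np.array(pitches)

# Synthetic "utterance": a 200 Hz tone whose loudness rises over time.
t = np.arange(0, 1.0, 1 / SR)
signal = np.linspace(0.2, 1.0, t.size) * np.sin(2 * np.pi * 200 * t)

energy, pitch = frame_features(signal)
# Invented heuristic: rising energy and elevated mean pitch read as high arousal.
arousal = "high" if energy[-1] > 1.5 * energy[0] and pitch.mean() > 150 else "low"
print(f"mean pitch ~ {pitch.mean():.0f} Hz, arousal cue: {arousal}")
```

In the multimodal setting discussed above, such acoustic cues would be fused with visible articulatory or facial correlates whenever the acoustic channel is degraded or unavailable.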

7.5.2 Vocal-Gestural Coordination and Correlation in Speech Communication

Speech is inherently verbal and gestural. Humans use a wide variety of articulatory,
facial and hand gestures when speaking. For example, gestural movements range
from simple actions of using the hand to point at objects to the more complex
actions that allow communication with others. There is also coordination between
speech and gestures, that is, our hands move along with orofacial articulatory
gestures.
7.5.2.1 Hand-Mouth Coordination and Pre-planning of Motor Behavior in Speech

Research suggests that hand-mouth coordination has a role in the development of language and communication. Arbib (2003, 2005) and Holden (2004) argue that
there has been a co-evolution of manual and speech gestural systems towards
communication. Iversen and Thelen (2003) demonstrate that the motor control
system of the vocal tract and that of the hand develop in cooperation. This per-
spective underscores the underlying relationship between hand gestures and artic-
ulatory gestures—hand-mouth gestural combination in human communication. The
motor links between the hand and the mouth develop progressively from birth until
they reach synchrony at around 17 months on average (Iversen and Thelen 1999). As
illustrated in many studies (Goldin-Meadow and Butcher 2003; Pizzuto et al. 2005;
Volterra et al. 2005), using hand gestures to point at objects appears to play a
particular role at many stages of language development, especially in speech
development in infants.
Gesture-speech coordination in adult’s speech communication has been exten-
sively researched. Theoretically, both simple gestural movements as well as coor-
dination between orofacial and hand gestures require timing and coordination of
muscle activity in order to be effective. It is hypothesized that the brain forms a plan
of commands, which is then performed resulting in the desired action (Banich 1997;
Jacko and Sears 2003). Experimental investigations on speech ‘tend to lend cre-
dence to the idea of pre-planning of motor activity… Further support for the theory
that sequences of motor behavior are pre-planned comes from experiments
involving subjects speaking sentences of varying lengths. Subjects were given
sentences to speak when given a signal. The response time between the signal and
beginning speech related directly to the length of the sentence. If no preplanning is
occurred the response time would be expected to be the same irrespective of the
length of the sentence.’ (Abawajy 2009, p. 63). Moreover, the association between
speech and hand gestures seems to be of a motor rather than a perceptive nature
(Dohen 2009). In this regard, carrying out a number of experiments using a
dual-task paradigm, Hollender (1980) demonstrates that when combining speech
and gesture, there is a delay to the vocal response in comparison with the speech
alone condition for the two systems to be synchronized, and the conclusion is that
the system adapts due to limited processing capacities. There are other empirical
studies (see, e.g., Fagot and Pashler 1992; Castiello et al. 1991) that tend to suggest
that hand-mouth coordination may not be so strict. Indeed, research suggests that
while the relation between speech and lip movements is obvious, the correlation
between speech and gestures is not that significant due to cultural variations
between people. As regards pointing as a gestural movement, there is synchrony
between the gesture pointing at objects and the part of speech that determines the
action. In two separate studies on pointing and speech, Feyereisen (1997) and
Levelt et al. (1985) observe a delay in both the manual and vocal responses in the
gesture and speech condition, but this delay was greater for the vocal response, and
conclude that different processes compete for the same resources, which explains
the delay measured for vocal responses. As explained by Dohen (2009, p. 32) ‘this
delay could simply be due to coordination requirements: the vocal and gestural
responses would have to be synchronized at some point and when a gesture is
produced at the same time as speech, speech would wait in order for the synchrony
to be achieved.’

7.5.2.2 Speech-Driven Facial Gestures

There is a large body of work on speech-driven facial gestures, i.e., the correlation between the speech signal and the occurrence of facial gestures. A systematic investigation conducted
by Ekman (1979) shows that eyebrow movements occur during word searching
pauses or when emphasizing certain words or segments of the sentence. With
emphasis on the role of facial displays in conversation, Chovil’s (1991) findings
indicate that syntactic displays (punctuators, emphasized words) are the most frequent
facial gestures accompanying speech, and among which raising or lowering eyebrows
are the most relevant. In their investigation of the relationships between rapid eye-
brow movement and fundamental frequency changes, Cavé et al. (1996) suggest that
these are a consequence of linguistic and communicational choices rather than being automatically linked. In fact, the relation between facial gestures and vocal patterns is not
so strong, and variations among people are greater (Zoric et al. 2009). Further, Honda
(2000) connects pitch and head movement and Yehia et al. (2000) linearly map head
motion and pitch contour (F0). It is worth noting that regardless of whether the work is
related to psychological and paralinguistic (presented above) or phonological, neu-
rological, and cultural research, it remains relevant to synthesizing natural behavior of
ECAs in terms of facial gestures. Furthermore, there is a large body of research work
on nonverbal communication of the face. This work provides rules for generating
facial gestures (e.g., avoidance of gaze, eyebrows raising, and frowning) during
thinking or word-search pauses (Cassell et al. 1994b; Lee et al. 2002), rules for the use
of blinking as a manipulator (Pelachaud et al. 1996), and rules considering gaze in the
function of turn-taking (Lee et al. 2002). In general, the nonverbal communication of
the face involves different types of signals and displays, including facial gestures,
prosodic features of speech and related facial gestures, lip-movement (synchroniza-
tion), explicit verbal displays, and explicit emotional displays.
Hand-mouth coordination and speech-driven facial gestures have become a topic of interest to the computational linguistics community. This topic is of high relevance to building ECAs,
especially those with human-like graphical embodiment. It is no easy task, though,
to computationally model coordination between orofacial articulatory gestures and
hand gestures and correlation between nonverbal speech signals and facial gestures.
ECAs research shows that proper coordination of verbal and nonverbal commu-
nicative behavior is one of the significant challenges in ECA research, but reliance
on the careful production of multimodal cues serves many important communicative
functions. Modulating spoken language through synchronized or correlated pro-
sodic articulatory, facial and hand gestures to modify and nuance meaning or
convey emotions constitutes a new focus area in ECA research.
7.6 Context in Human Communication

Context is an important topic in the literature on human communication. It touches upon the basic patterns and structures of human interaction in daily life. It defines,
shapes, and changes interaction, as meaning to interaction is ascribed within the
(evolving) context. We use context unconsciously because we are unaware of how
our minds supply or perceive it and adapt to it. Human interaction is heavily based
on context as an infinite richness of assumptions and factors, against which relevant
facts and concerns are delimited in the form of dynamic interweaving of internal
and external entities, including motivational, emotional, cognitive, physiological,
biochemical, pragmatic, empirical, ethical, intellectual, behavioral, social, cultural,
normative, situational, physical, and spatiotemporal elements. All (inter)actions
carried out by humans take place in context—in a certain situation and within a
particular environment. Context influences not only the content and patterns of
communication, but also the selection of communicative behavior in terms of
modalities and thus channels as well as the interpretation of verbal and nonverbal
communication behavior. Meaning, whether enacted through speech or conveyed
through prosodic articulatory, facial, and hand movements, is influenced and
shaped by context—as context is perceived as an expression of a certain situation, so too is (inter)action in this situation. In other words, the meaning of the interaction constructed in a situation is given within its changing context.
Context is a fluid and ill-defined concept. It is of a multifarious and multifaceted
nature. Therefore, it is no easy task to delineate what constitutes a context,
whether in relation to face-to-face conversation functions (i.e., interactive, content,
cognitive, emotional, and attitudinal) or human interaction. This emanates from the
complexity inherent in comprehending its characteristics—dynamic, unstructured,
changeable, volatile, indiscernible, unconscious, intractable, and subtle—as well as
how its components interrelate dynamically to form a contextual amalgam that
shapes interaction. At present, the number of theoretical definitions of context is
large. Notwithstanding the agreement on many issues, there is still no definitive
theoretical definition of context. Likewise, there are several technical definitions
that have been suggested in the literature on context awareness, generating a
cacophony that has led to an exasperating confusion in the field of context-aware
computing.
Context may be divided heuristically into two categories, macro and micro
context: the first concerns the overall human interaction and may encompass
motivational, emotional, ethical, intellectual, sociocultural, and normative contex-
tual assumptions, and the second pertains to verbal or nonverbal language and may
encompass syntactic, prosodic, and semantic contextual elements, or informative,
interactive, communicative, intentional, behavioral, and environmental circum-
stances, respectively.
7.6.1 Multilevel Context Surrounding Spoken Language (Discourse)

There is a set of contextual elements that surround and influence spoken language,
including linguistic (e.g., syntactic, semantic), pragmatic, sociolinguistic, and
extra-linguistic. Lyons (1968) describes several linguistic situations which appear
on different levels and in which the context should be used. For example, on the
syntactic level, a word can have multiple lexical categories (e.g., verb, noun), and
thus the context of the respective word formed by the surrounding words has to be
used to determine the exact lexical class of the word, whether it is a verb or a noun.
For example, a word like look can be a noun (e.g., Please have a look) or a verb
(e.g., I will look at it soon). A similar thing occurs on a semantic level—denotata of
words and sentences. The meaning of a single word and, on an even higher level, the grammatical mood of a sentence depend on the context. An utterance classified as
declarative may become imperative, ironic, or express other meanings under the
right circumstances, depending on how it is uttered. This relates to prosodic
features associated with spoken utterances, with what tone of voice it was uttered,
which involves a whole set of variations in the characteristics of voice dynamics:
volume, tempo, pitch, speed, rhythm, intensity, fluctuation, continuity, and so on.
Indeed, prosody is used to nuance meaning and thus reflects various features of
utterances: the form pertaining to statement, question, or command or other aspects
of language that may not be grammatically or lexically encoded in the spoken
utterances. Prosody may facilitate lexical and syntactic processing and express
feelings and attitudes (Karpinski 2009). As Lyons (1977) states: ‘…a speaker will
tend to speak more loudly and at an unusually high pitch when he is excited or
angry (or, in certain situations, when he is merely simulating anger…’. In nonverbal
communication parlance, the use of paralanguage serves primarily to change
meaning or convey emotions.
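Returning to the syntactic example of the word look above, a trivially simplified sketch (in Python, with made-up cue lists) infers the lexical category of the ambiguous word from the word immediately preceding it, which is exactly the kind of surrounding-word context referred to above.

```python
# Made-up disambiguation rules: the word immediately before "look" hints at its
# lexical category (determiners suggest a noun; auxiliaries/"to"/pronouns suggest a verb).
NOUN_CUES = {"a", "an", "the", "another"}
VERB_CUES = {"will", "to", "would", "can", "i", "we", "they"}

def category_of_look(sentence: str) -> str:
    words = sentence.lower().replace(".", "").split()
    if "look" not in words:
        return "n/a"
    idx = words.index("look")
    prev = words[idx - 1] if idx > 0 else ""
    if prev in NOUN_CUES:
        return "noun"
    if prev in VERB_CUES:
        return "verb"
    return "unknown"

print(category_of_look("Please have a look"))      # noun
print(category_of_look("I will look at it soon"))  # verb
```

Real part-of-speech taggers use statistical models over much richer context, but the principle is the same: the surrounding words decide the lexical class.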
Other contexts that are considered when it comes to spoken language include
sociocultural, historical, pragmatic, and extra-linguistic, and so on. Halliday and
Hasan (1976) consider various aspects of what they label the context of situation in
terms of the environment in which discourse is situated. This environment is used to
put the (spoken or written) text into perspective. For example, a set of seemingly non-cohesive sentences might turn out not to be so, that is, they might be understood correctly as a coherent passage of discourse under a particular set of contextual elements—the context of situation.
According to the authors, three different components can be used to describe the
context of situation: the field, the tenor, and the mode. The field is the current topic
under discussion (dialog context); the tenor entails the knowledge about and the
relationship between the participants in the discourse; and the mode is about the
communication channel (the genre of the interaction, the type of channel). In
relation to the latter, Karpinski (2009) points out that each communication channel
has its particular properties and it varies in the range of ‘meanings’ it may convey
and in the way it is used, e.g., a facial expression is frequently used for feedback
or emotional reactions. However, remaining on the contextual aspects of discourse,
as another example on a higher level of discourse, consider AmI as a cluster of discourses—a discursive field where language is structured according to particular patterns that technology creators' and industry experts' utterances follow when they take part in the domain of AmI—circulating in European culture around the relationship between technology and society. It is in the broader social context that AmI as a set of knowledge constructions is ascribed meaning and form and, ultimately, applied. This is associ-
ated with discourse analysis, an analytical approach which serves to examine
spoken (and written) texts, by deducing how meaning is constructed and how this
construction shapes actions of historical actors. This is carried out by exploring
patterns in and across a collection of utterances within a particular thematic area and
identifying the social implications of different discursive constructions of reality.
Discourse analysis has been applied to a variety of relevant social and scientific
fields, including ICT, HCI, AmI, AI, cognitive science, (applied) linguistics and its
subfields, and so on. However, Foucault (1972) asserts that knowledge, whether
theoretical or practical, is fundamentally contextual and constantly a matter of
episteme. In other words, it is culturally specific and historically contingent. This is
one of the premises of social constructionism that we are fundamentally cultural and
historical beings and our knowledge about the world is the product of culturally and
‘historically situated interchanges among people’ (Burr 1995; Gergen 1985,
p. 267). Hence, our knowledge constitutes one construction of the world among many other possible constructions and is grounded on perennially changing claims, assumptions, and values.
Pragmatics is concerned with the ways in which context contributes to (the
interpretation of) the meaning of utterances in situational contexts. This implies that
meaning of utterances relies on such contextual factors as place, time, manner,
style, situation, the type of conversation, and relationship between speakers. Part of
pragmatic competence (e.g., functional competence, interactional competence) is
about linguistic performance which also involves extra-linguistic factors pertaining
to the speaker, including physical well-being, mnemonic skills, encyclopedic
knowledge, absence of stress, and concentration. These contextual factors have
influence on spoken language in terms of performance.
Spoken language is also influenced by sociocultural contexts. To use and
understand language in different sociolinguistic contexts entails being sensitive to
sociocultural conventions (norms governing relations between genders, generations,
classes, social groups, and ethnic groups) as well as behaviors, attitudes, values,
prejudices, and preferences of different speech communities. Accordingly, different
social contextual variables (e.g., status, education, religion, ethnicity, age, and
gender) are usually adhered to when attempting to convey and understand meaning
in sociocultural environment.
7.6.2 Context Surrounding Nonverbal Communication Behavior

There are various circumstantial factors that affect nonverbal communication behavior, including physical, pragmatic, sociolinguistic, interactive, informative,
intentional, paralinguistic, and extra-linguistic contexts. Ekman and Friesen (1969)
discuss various aspects pertaining to the interpretation of nonverbal behavior. One
element of their analysis is the usage of the behavior, i.e., looking at the context of
the behavior, the consistent circumstances that surround the behavior. These cir-
cumstances can be clustered into several categories:
• External condition—refers to environmental circumstances such as the setting
(home, work, school, classroom, formal meeting, etc.). This can also include
physical conditions (noise, lighting, temperature, pressure, etc.).
• Awareness—is about whether the communication actor knows he/she is per-
forming a particular nonverbal act at the moment he/she does it.
• Intentionality—specifies whether the communication actor does the behavior
deliberately or not.
• Relationship to verbal behavior—entails the relationship of the nonverbal
with the verbal behavior, e.g., if the nonverbal behavior accents, augments,
repeats, or contradicts certain words. This relates to Knapp and Hall's (1997)
identified general ways in which nonverbal communication blends with verbal
communication, which illustrate the wide variety of meta-communication
functions that nonverbal messages may serve to accent, complement, contradict,
regulate, repeat, or substitute for other messages.
• External feedback—defines signals that the listener sends back, e.g., raises
eyebrow, to the speaker to acknowledge that he/she perceives and evaluates the
speaker’s actions.
• Type of information conveyed—refers to the different classes of nonverbal
behavior, including communicative, informative, and interactive. Interactive—
an act by one communication participant which clearly tries to modify or
influence the interactive behavior of the other participant(s). Communicative—a
consciously intended act by the speaker to transmit or convey a specific message
to the receiver. Informative—an act (may or may not be intentional) that pro-
vides the listener with some information, that is, the act at least carries some
meaning for the listener. In relation to conversation analysis, pragmatics dis-
tinguishes two intents or meanings in each communicative or speech act: (1) the
informative intent or the utterance meaning, and (2) the communicative intent or
speaker meaning (Leech 1983; Sperber and Wilson 1986).
Like in verbal communication, pragmatic, sociolinguistic, and paralinguistic fac-
tors provide a key to decode the meaning of, or disambiguate, nonverbal commu-
nicative behavior. Moreover, in relation to speech, nonverbal behavior as an
important part of human communication provides a key to conveying the context of
statements or to decoding spoken language. In a conversation, the listener/recipient
relies heavily on the facial expressions or gestures of the speaker to decode how
his/her messages are being interpreted, i.e., inferring the speaker’s emotional stance to
his/her utterances. Likewise, the speaker/sender can determine the listeners’ reaction
to what is being said. Pantic and Rothkrantz (2003) found that when engaged in
conversation the listener determines whether he/she is liked or disliked by relying
primarily upon facial expressions followed by vocal intonation, while words tend to
be of minor weight. Facial, gestural, and corporal behavior constitute a rich source of
information that humans share in an implicit and subtle way and that has a seminal
shaping influence on the overall communication. In all, context affects the selection
and the interpretation of nonverbal communicative behavior, which in turn contrib-
utes to conveying the context of spoken utterances and decoding their meanings.

7.7 Modalities and Channels in Human Communication

What context moreover influences is the selection of modalities, and thus com-
munication channels, used to express communicative intents. Multi-channel and
multi-modal are two terms that tend to be often mixed up or used interchangeably.
However, they refer to quite distinct ideas of interaction between humans and
between humans and computers (HCI). In human–human communication, the term
‘modality’ refers to any of the various types of sensory channels. Human senses are
realized by different sensory receptors (see previous chapter for further detail).
Communication is inherently a sensory experience, and its perception occurs as a
multimodal (and thus multi-channel) process. Multimodal interaction entails a set of
varied communication channels provided by a combination of verbal and nonverbal
behavior involving speech, facial movements, gestures, postures, and paralinguistic
features, using multiple sensory organs. Accordingly, one modality entails a set of
communication channels using one sensory channel and different relevant classes of
verbal and nonverbal signals. In reference to dialog act systems, Karpinski (2009)
describes modality as a set of communication channels using one sensory channel
and a relevant class of verbal or nonverbal signals. Basically, nonverbal commu-
nication involves more channels than verbal communication, including space,
silence, touch, and smell, in addition to facial expressions, gestures, and body
postures. Indeed, research suggests that nonverbal communication channels are
more powerful than verbal ones; nonverbal cues are more important in under-
standing human behavior than verbal ones—what people say. Particularly, visual
and auditory modalities, taken separately, can enable a wide range of communi-
cation channels, irrespective of the class of verbal and nonverbal signals. For
example, visual modality provides various channels from facial gestures (e.g.,
eyebrow raising, eyebrow lowering, eye blinking, eye gaze, and head nods, as well
as visual orofacial articulatory or other facial correlates of prosody) and from
gestures (e.g., fingers, arms, and hands). On the other hand, auditory modality
provides textual channels (e.g., words, syntactic structures) and prosodic channels
(e.g., pitch, tempo, rhythm, intonation).
7.8 Conversational Systems

7.8.1 Key Research Topics

Research within conversational systems takes up many different topics. ECA as a 'crossover approach' relates to a wide range of computer science, AI, linguistics, and
nonverbal communication behavior topics, including calm and context-aware
computing, knowledge-based and multimodal user interfaces, sensor technology,
machine learning and reasoning, information/knowledge representation and pro-
cessing, multi-agent software, intelligent agents, speech recognition and synthesis,
natural language modeling, multimodal communication behavior modeling, and so
on. Most of the relevant computer science topics have been addressed in previous chapters in relation to context-aware computing, though they are equally applicable to conversational systems. Some of the research topics studied in computational
linguistics include: design of parsers for natural languages, computational grammar;
computational complexity of natural language modeled with the application of
context-sensitive grammar; and computational semantics (including defining logics
for linguistic meaning representation and reasoning). As interdisciplinary endeav-
ors, projects in computational linguistics may involve language experts, profes-
sional linguists, computer scientists (specialized in natural language processing), AI
experts, computational mathematicians, logicians, cognitive scientists, cognitive
psycholinguists, neuroscientists, anthropologists, social scientists, and philoso-
phers, among others. Many of these scholars are involved (as an interdisciplinary
team) in the international research project aimed at building conversational systems
or ECAs. In addition to, or in parallel with, research endeavors being undertaken
within the areas of computational linguistics and psycholinguistics across Europe
and in the USA, several research institutions are carrying out research within the
areas of computational pragmatics and computational sociolinguistics—computa-
tional modeling of interactive systems in terms of dialog acts, intention
recognition/pragmatics, and interpretation or generation of multimodal communi-
cative behavior in different sociolinguistic contexts and based on different prag-
matic situations. Most of the projects being conducted in relation to conversational
systems are of a multidisciplinary and interdisciplinary nature, involving knowledge
from and across the fields of linguistics, psycholinguistics, neurolinguistics, com-
putational linguistics, computational pragmatics, computational sociolinguistics,
cognitive science, and speech-accompanying gestures. For example, the Center of
Excellence ‘Cognitive Interaction Technology’ (CITEC) at Bielefeld University,
Germany, carries out interdisciplinary research into understanding the functional
processes of cognitive interaction with the goal of replicating them in computer
systems. In what remains of this chapter, the intent is to shed light on some key
research topics associated with conversational systems, with a particular emphasis on ECAs and the SAIBA framework. A comprehensive review of research topics on conversational systems is, however, beyond the scope of this chapter.
7.8.2 Towards Believable ECAs

Research on ECAs—or their components—has been active for more than two
decades in academic circles. It draws on theoretical and empirical research from
linguistics and its subfields (specifically psycholinguistics, pragmatics, sociolinguistics, and cultural linguistics) as well as human nonverbal communication
behavior. Many models, theories, frameworks, and rules of speech and gestural
communication have been investigated and applied to computer systems within the
area of AI. A large body of studies has been conducted in simulated environment by
experts and scholars in computational linguistic or joint research groups and
resulted in the development of a range of systems that attempt to emulate natural
interaction. Many of these systems are, though, far from real life implementation.
Far more remains to be done than has been accomplished thus far, given the complexity of mimicking human verbal and nonverbal communication behavior, of simulating natural language and natural forms of communication in computer systems, and of evaluating the constructs, models, methods, and instantiations that underlie conversational systems.
In particular, the objective of research within AI and AmI is to build fully functional
and well realized ECAs that are completely autonomous. One of the most inter-
esting investigations happening in the area of ECA is how to present communi-
cative behavior at two levels of abstraction, namely the higher level of
communicative intent or function and the lower level of physical behavior
description, using the SAIBA framework as an international research platform.
Prior to delving into the discussion of what has been accomplished and what is under research
in relation to conversational systems, it might be worth providing short background
information on ECAs.

7.8.3 Embodied Conversational Agents (ECAs)

ECAs are autonomous agents that have a human-like graphical embodiment, and
possess the ability to engage people in face-to-face conversation (Cassell et al.
2000). Such an agent can create the sense of face-to-face conversation with the human
user, as it is capable of receiving multimodal input and producing multimodal
output in nearly real-time (Vilhjálmsson 2009). In HCI, it represents a multimodal
user interface where modalities are the natural modalities of human conversation,
namely speech, facial expressions and gestures, hand gestures, and body postures
(Cassell et al. 2000). ECAs ‘are capable of detecting and understanding multimodal
behavior of a user, reason about it, determine what the most appropriate multimodal
response is and act on this.’ (ter Maat and Heylen 2009, p. 67). ECAs are concerned
with natural interaction given that when constructing believable ECAs, the rules of
human verbal and nonverbal communication behavior must be taken into account.
7.8.4 Research Endeavor and Collaboration for Building ECAs

For almost three decades, there has been intensive research in academic circles as
well as in the industry on UbiCom and AmI with the goal to design a horde of next
generation technologies that can support human action, interaction, and communi-
cation in various ways, taking care of people’s needs, responding intelligently to
their spoken or gestured indications of desire, and engaging in intelligent dialog or
mingling socially with human users. This relates to conversational systems/agents
which can enable people to engage in intelligent interaction with AmI interfaces.
A collaborative research between scholars from various domains both within the
area of AmI and AI is necessary to achieve the goal of creating a fully interactive
environment. Indeed, one of the many research projects being undertaken at MIT within the field of AI is the NECA project, which aims to develop a more sophisticated
generation of conversational systems/agents, virtual humans, which are capable of
speaking and acting in a human-like fashion (Salvachua et al. 2002). There is also an
international research community (a growing group of researchers) that is currently working together on building conversational systems, a horde of believable virtual humans that are capable of mingling socially with humans (Vilhjálmsson 2009).
Again collaboration in this regard is of critical importance to make a stride towards
the goal. Building conversational systems requires bringing researchers together and
pooling their efforts, the knowledge of their research projects, in order to facilitate
and speed up the process. Vilhjálmsson (2009) recognizes that collaboration and
sharing of work among research communities that originally focus on separate
components or tasks relating to conversational systems ‘would get full conversa-
tional systems up and running much quicker and reduce the reinvention of the
wheel’. Towards this end, in 2007 an international group of researchers began laying
the lines for a framework that would help realize the goal, with a particular emphasis
on defining common interfaces in the multimodal behavior generation process for
ECA (Ibid). Following the efforts for stimulating collaboration, the research group
pooled its knowledge of various full agent systems and identified possible areas of
reuse and employment of standard interfaces, and, as a result, the group proposed the
so-called SAIBA framework as a general reference framework for ECA (Kopp et al.
2006; Vilhjálmsson and Stacy 2005). The momentum is towards achieving the plan of constructing a universal framework for multimodal generation of communicative
behavior that allows the researchers to build whole multimodal interaction systems.
Currently, research groups working on ECAs focus on different tasks, ranging from a
high level (communicative intents) to a low level (communicative signals) of the
SAIBA framework. The definition of this framework and its two main interfaces are
still at an early stage, but the increased interest and some promising patterns indicate
that the research group may be onto something important (Vilhjálmsson 2009). As an
international research platform, SAIBA is intended to foster the exchange of com-
ponents between different systems, which can be applied to autonomous conver-
sational agents. This is linked to the aspect of AmI interfaces that aim to engage in
intelligent dialog with human users and that use verbal and nonverbal communi-
cation signals as commands, explicit inputs such as speech waveforms or gestural cues from the user, to perform actions.
Furthermore, another type of collaborative research that is crucial towards the
aim of building conversational systems is the interdisciplinary scholarly research
work. Endeavors should focus on raising awareness among the active researchers in
the disciplines of linguistics and its subfields and nonverbal communication
behavior about the possibility of incorporating up-to-date empirical findings and
advanced theoretical models in conversational systems, in particular in relation to
the context surrounding nonverbal communication and verbal communication
(especially in relation to semantics), mouth-hand coordination, speech-face syn-
chronization, communication error and recovery, and so on. Modelers in linguistic,
psycholinguistic, sociolinguistic, pragmatic, and behavioral disciplines should
develop an interest in ECA as a high-potential application sphere for their new
models. They can simultaneously get inspiration for new problem specifications and
new areas that need to be addressed for further developments of the disciplines they
study in relation to ECA. Examples of computational modeling areas of high
topicality to ECA may include: multimodal speech perception and generation,
multimodal perception and generation of nonverbal behavior, situated cognition and
(inter)action, mind reading (e.g., communicative intent), psycholinguistic pro-
cesses, emotional processes, emotional intelligence, context awareness, generative
cultural models, multilingual common knowledge base, and so forth. Far-reaching
conversational systems crucially depend on the availability of adequate knowledge
about human communication. And interdisciplinary teams may involve, depending
on the research tasks, language experts, professional linguists, computer scientists,
AI experts, computational mathematicians, logicians, cognitive scientists, cognitive
psychologists, psycholinguists, neuroscientists, anthropologists, social scientists,
and philosophers, among others.

7.8.5 SAIBA (Situation, Agent, Intention, Behavior, Animation) Framework

As mentioned above, the work of ECA researchers has introduced the SAIBA framework as an attempt to, in addition to stimulating sharing and collaboration, scaffold the time-critical, highly flexible production process required by the generation of natural multimodal output for embodied conversational agents. The SAIBA framework involves two main interfaces: Behavior
Markup Language (BML) at the lower level between behavior planning and
behavior realization (Kopp et al. 2006; Vilhjálmsson et al. 2007) and Function
Markup Language (FML) at the higher level between intent planning and behavior
planning (Heylen et al. 2008). As illustrated in Fig. 7.2, the framework divides the
overall behavior generation process into three sub-processes, starting with com-
municative intent planning, going through behavior planning, and ending in actual realization through the agent's embodiment.

Fig. 7.2 The SAIBA framework for multimodal behavior, showing how the overall process consists of three sub-processes at different levels of abstraction, starting with communication intent and ending in actual realization in the agent's embodiment. Source Vilhjálmsson (2009)

In other words, the framework specifies
multimodal generation of communicative behavior at a macro-scale, comprising
processing stages on three different levels: (1) planning of a communicative intent,
(2) planning of a multimodal realization of this intent, and (3) realization of the
planned behaviors. FML interface describes the higher level of communicative
intent, which does not make any claims about the surface form of the behavior, and
BML interface describes the lower level of physical behavior, which is realized by
an animation mechanism, instantiating intent as a particular multimodal realization
(Vilhjálmsson 2009). Moreover, in the SAIBA framework, the communicative function
is separated from the actual multimodal behavior that is used to express the com-
municative function (ter Maat and Heylen 2009). As illustrative examples, the
function ‘request a feedback’ is conceptually separated from the act of raising
eyebrows and breathing in to signal that you want a feedback, and the function ‘turn
beginning’ is conceptually separated from the act of breaking eye-contact. The
separation is accomplished by putting the tasks of the communicative functions and
signals in two different modules that should be capable of communicating the
relevant functions and signals to each other, a process that is performed using FML and BML as specification languages (ter Maat and Heylen 2009).
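As a purely illustrative sketch of this separation (the stage names follow SAIBA, but the data structures, function names, and mapping rules below are invented and are not the actual FML or BML specifications), the following Python fragment walks an intent through the three stages: intent planning produces an FML-like description, behavior planning maps it to BML-like behavior descriptions, and a realizer would then animate them.

```python
# Hypothetical, simplified stand-ins for FML-like and BML-like structures.
def plan_intent(dialogue_state):
    """Intent planning: decide WHAT to communicate (FML-like, no surface form)."""
    return {"function": "request_feedback", "topic": dialogue_state["topic"]}

# Invented mapping from communicative functions to behavior descriptions.
FUNCTION_TO_BEHAVIOR = {
    "request_feedback": [
        {"behavior": "gaze", "target": "listener"},
        {"behavior": "eyebrow_raise"},
        {"behavior": "pause", "duration_s": 0.4},
    ],
    "turn_begin": [
        {"behavior": "gaze_away"},
    ],
}

def plan_behavior(fml_like):
    """Behavior planning: decide HOW to express the intent (BML-like)."""
    return FUNCTION_TO_BEHAVIOR.get(fml_like["function"], [])

def realize(bml_like):
    """Behavior realization: a real ECA would drive animation and speech here."""
    for b in bml_like:
        print("realizing:", b)

intent = plan_intent({"topic": "weather"})
realize(plan_behavior(intent))
```

The point of the separation is that the first stage never mentions eyebrows or gaze, and the last stage never needs to know why a gaze shift was requested; the two interfaces carry exactly that division of labor.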

7.8.6 Communicative Function Versus Behavior and the Relationship

It is first important to understand the difference between communicative function, which is specified with FML, and communicative behavior, which is specified with BML. As pointed out above, in the act or process of communication, information (conversational content) is intended, channeled, and imparted by a sender to a receiver
via some medium, such as speech, facial gestures, hand gestures or a combination
of these. This implies that each act of communicating starts with a communicative
intent (what a speaker intends to do such as initiate a topic, request a feedback, give
turn, inform, clarify, express a stance, convey emotion or complex intention, etc.) as
a mental activity carried out in the mind of the sender before any of his/her com-
munication behavior gets produced as a way to transform the intent into a concrete
form, which occurs in accordance with the governing rules of spoken communication
involving both verbal and nonverbal behavior. What gets generated as a commu-
nicative behavior is the realization that fulfills or best serves the original intent in a
particular communication situation, given the unique constraints and conventions
(Vilhjálmsson 2009). The realization of vocal and/or gestural communicative signals eventually leads to the transmission of the intended messages. This does not
necessarily mean that the meaning is always understood by the receiver as intended
by the speaker. This implies that pragmatic, sociolinguistic, and semantic compe-
tences are still required to decode complex meanings and intentions. Indeed,
communicative intent is in turn surrounded and influenced by situational and social
contexts.
The existing large body of work that attempts to describe human behavior in
terms of its functions in communication emphasizes that these two abstractions
involve psychological, social, cultural, and behavioral processes of human func-
tioning. This conceptualization is of significant value in helping scholars to make
sense of human linguistic, cognitive, and social behavior. It is also very useful to
AmI and AI research with regard to conversational agents that involve multimodal
perception and generation of speech and gestural communication. Whether in
computing parlance or human communication field, there is no one single approach
into describing communication intent and communicative behavior. As
Vilhjálmsson (2009) point out, these have been termed differently, and the various
terms that have been used to describe communicative function at the higher level of
abstraction include: meaning, function, discourse function, intent, and goal, and
those used to describe the more concrete level of realization encompass: sign,
behavior, discourse device, realization, and action (Ibid). According to the author, the terms in the order listed tend to roughly correspond to each other (e.g., meaning/sign often occur together), and it is unlikely that a particular pair will be used as a standard. In addition, these terms can be interpreted in multiple ways or
adopted based on application domains, which indeed tend to differ as to opera-
tionalizing and conceptualizing aspects of human verbal and nonverbal commu-
nication behavior in terms of simplifications. Notwithstanding the perpetual slight
differences in their interpretation, these terms aim at the same purpose: ‘to create
meaningful and useful levels of abstraction to better understand or produce dis-
cernible acts’ (Ibid).
There is a large body of work (e.g., Argyle et al. 1973; Fehr and Exline 1987;
Chovil 1991; Kendon 1990; Cassell et al. 1994b; Thorisson 1997) that attempts to
describe human communicative behavior in relation to its communicative functions.
Research on the mapping from communicative functions to supporting communicative behaviors covers a wide variety of rules that have been discovered empirically. Examples of mapping rules (most of which are quoted from Vilhjálmsson 2009), showing how communicative functions have been correlated with visible behavior, include:
it is more likely that gestures occur with new material than given material (Cassell
et al. 1994a); people often change body posture when they change the conversation
topic (Cassell et al. 2001); emphasis generally involves raising or lowering of the
eyebrows (Argyle et al. 1973); speaker commonly selects next speaker with gaze
near the end of their own turn (Kendon 1990); speaker commonly breaks
eye-contact at turn beginning (Argyle and Cook 1976); speaker looks at listeners
and raises their eyebrows as he/she expects feedback and listeners raise eyebrows in
response (Chovil 1991), speaker signals the intention of a turn offer with an eye
gaze (ten Bosch et al. 2004; Cassell et al. 1999). For further clarification of how
communicative functions can be mapped to facial gestures (communicative
behavior) in relation to conversational signals, punctuators, and regulators, see
Pelachaud et al. (1996), and Ekman and Friesen (1969).
However, there are no standard approaches to mapping communicative functions to behaviors, as there is no such thing as universal human communication. Communication is socioculturally dependent in many of its aspects. It is
conventionalized by each society and attuned to its norms and practices. Therefore,
each culture has its own communication rules, although there may be some com-
parable aspects between some cultures. These rules may even slightly differ at
inter-cultural level, especially in relation to facial gestures as nonverbal commu-
nicative behaviors. In addition, it may not be feasible to empirically investigate and
discover the way communicative functions are mapped to communicative behaviors
across all cultures. Nor is it possible to 'talk about the mapping from function to behavior as absolute rules'; 'in fact mapping rules are merely regularities that have been dis-
covered empirically, but there are always going to be some exceptions.’
(Vilhjálmsson 2009, pp. 54–55). Therefore, the rule governing the mapping from
function to behavior in the context of ECA assumes a particular culture or social
situation that makes the rule applicable (see Fig. 7.3) (Ibid).
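The point that a mapping rule only holds relative to an assumed culture or social situation can be sketched as follows (Python; the cultures, functions, and behaviors in the table are invented for illustration and are not empirically grounded):

```python
# Invented rule table: a communicative function maps to different behaviors
# depending on the assumed culture / social situation (cf. Fig. 7.3).
RULES = {
    ("emphasis", "culture_A"): ["eyebrow_raise", "beat_gesture"],
    ("emphasis", "culture_B"): ["head_nod"],
    ("give_turn", "culture_A"): ["gaze_at_listener"],
    ("give_turn", "culture_B"): ["gaze_at_listener", "open_palm_gesture"],
}

def behaviors_for(function, culture, formal=False):
    """Look up behaviors for a function under an assumed cultural context.

    Rules are regularities, not absolutes, so a situational flag can still
    suppress or adjust the mapped behaviors (here: damp gestures when formal).
    """
    behaviors = list(RULES.get((function, culture), []))
    if formal:
        behaviors = [b for b in behaviors if not b.endswith("_gesture")]
    return behaviors

print(behaviors_for("emphasis", "culture_A"))               # ['eyebrow_raise', 'beat_gesture']
print(behaviors_for("emphasis", "culture_A", formal=True))  # ['eyebrow_raise']
```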

7.8.7 Taxonomy of Communicative Functions and Related Issues

Communicative functions are usually classified into different categories. Concerning face-to-face conversation in particular, communicative functions seem
to generally fall into one of three broad categories: (1) interaction functions,
(2) content functions and (3) mental state and attitude functions (Heylen et al.
2008). The first category entails establishing (or initiating), maintaining and closing
the communication channel, which is a metaphor for the social contract that
binds communication participants together in the common purpose of talking.

Fig. 7.3 Rules that map functions to behavior assume a certain context like the social situation
and culture. Source Vilhjálmsson (2009)

This category relates to functional competence, which is one of the key pragmatic
competences and concerned with the use of spoken discourse in communication.
‘Conversational competence is not simply a matter of knowing which particular
functions (microfunctions)…are expressed by which language forms. Participants
are engaged in an interaction, in which each initiative leads to a response and moves
the interaction further on, according to its purpose, through a succession of stages
from opening exchanges to its final conclusion. Competent speakers have an
understanding of the process and skills in operating it. A macro-function is char-
acterized by its interactional structure. More complex situations may well have an
internal structure involving sequences of macro-functions, which in many cases are
ordered according to formal or informal patterns of social interaction (schemata).’
(Council of Europe 2000, p. 125). This category of functions is therefore contextual
as well as socioculturally situated. Initiating a conversation, for example, is greatly
dependent on the cultural conventions, roles of and relationship between the par-
ticipants, politeness conventions, register differences, place, time, and so on. This category has, moreover, been given different names, but the widely accepted ones are interactional (Cassell et al. 2001), envelope (Kendon 1990), and management (Thorisson 1997).
The second category involves the actual conversational content that gets exchanged
or interchanged across a live communication channel. Exchanging content evolves of the communication participants' own accord once the interaction is established.
This has much to do with discourse competence, a pragmatic competence which
involves the ability to arrange and sequence utterances to produce coherent stretches
of language, including control of the ordering of sentences with reference to
topic/focus, given/new, and natural sequencing (e.g., temporal), and organization and
management of discourse in terms of: thematic organization, coherence and cohe-
sion, rhetorical effectiveness, logical ordering, and so on (Council of Europe 2000).
This category also relates to framing in terms of the structuration of discourses, particularly socially dominant discourses. In this context, framing entails organizing patterns that give meaning to a diverse array of utterances and direct the construction of spoken discourse in the sense of giving meaning and coherence to its content. The third
category is concerned with functions describing mental states and attitudes, which in
turn influence the way in which other functions give rise to their own independent
behavior (e.g., This category is needed to take care of ‘the various functions con-
tributing to visible behavior giving off information, without deliberate intent’, as ‘the
second category covers only deliberate exchange of information’ Vilhjálmsson 2009,
p. 52). In terms of ECAs, cognitive context (intended or unintended meaning) has
proven to be crucial to the functioning of conversational agents. What constitutes a
communicative function and how it should be distinguished from contextual ele-
ments is a critical issue in the current debate about FML (Heylen 2005). Table 7.1
illustrates some examples of all three categories, respectively.

7.8.8 Deducing Communicative Functions from Multimodal Nonverbal Behavior Using Context

Each category of function is associated with a class of multimodal nonverbal behavior. In realizing multimodal nonverbal behaviors, a set of communication channels is established for the purpose of engaging in interaction, conveying content, and expressing mental states and attitudes. Interaction functions are associated with facial and hand gestures and prosody; content functions with facial, hand, and corporal gestures; mental states with eye and head movements; and emotional and attitudinal functions with facial expressions and prosody (see Table 7.1). One

Table 7.1 Interaction functions, content functions, and mental states and attitude functions
Interaction functions
Function category | Example functions
Initiation/closing | React, recognize, initiate, salute-distant, salute-close, break-away, etc.
Turn-taking | Take-turn, want-turn, yield-turn, give-turn, keep-turn, assign-turn, ratify-turn, etc.
Speech-act | Inform, ask, request, etc.
Grounding | Request-ack, ack, repair, cancel, etc.
Content functions
Function category | Example functions
Discourse structure | Topics and segments
Rhetorical structure | Elaborate, summarize, clarify, contrast, etc.
Information structure | Rheme, theme, given, new, etc.
Propositions | Any formal notation (e.g., 'own(A,B)')
Mental states and attitude functions
Function category | Example functions
Emotion | Anger, disgust, fear, joy, sadness, surprise, etc.
Interpersonal relation | Framing, stance, etc.
Cognitive processes | Difficulty to plan or remember
Source Vilhjálmsson (2009)

One
class of nonverbal behavior may serve different functions. For example, prosody
may organize higher levels of discourse and contribute to topic identification
processes and turn taking mechanisms, as well as express feelings and attitudes (see
Karpinski 2009). To determine what a speaker intends to do, assuming that all these
categories of functions, interactive, informative, communicative, cognitive, emo-
tional, and attitudinal might all be involved at a certain point of time (e.g., engaging
in a topic relating to one of the socially dominant discourses such as AmI, to which
the speaker has intellectual standing or institutional belonging, but find it hard to
comprehend how some aspects of it relate to power relations, corporate power, and
unethical practices) requires a sound interpretation of the different nonverbal
communicative behaviors as well as how they interrelate in a dynamic way. And to
carry out this task necessitates using a set of various, intertwined contextual ele-
ments, consistent circumstances that surround the respective conversation. This can
be an extremely challenging task for an ECA to perform. However, for whatever is
formally graspable and computationally feasible, analyzing nonverbal communi-
cation behaviors using (machine-understandable and—processable entities of)
context is important for an ECA to plan, decide, and execute relevant communi-
cative behaviors. Contextual elements such as dialog, sociocultural setting, and the
environment play a significant role in the interpretation of the multimodal com-
municative behavior. Contextual variables are necessary to disambiguate multi-
modal communicative behaviors, that is, to determine the actual multimodal
communicative behavior that is used to express an interaction function, a process
which entails using the context to know which communicative functions are
appropriate at a certain point of time, and this knowledge of context can be used to
determine what was planned or intended with a given signal (ter Maat and Heylen
2009). Given the focus of SAIBA framework, generation of multimodal commu-
nicative behavior, the emphasis in the following section is on the category of
interaction or conversational functions, and the associated nonverbal communica-
tive behaviors (especially facial gestures).
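For illustration, the taxonomy of Table 7.1 can be represented directly as data that an ECA pipeline might use to tag detected or planned functions with their broad category; the sketch below uses illustrative labels only and does not constitute a standard vocabulary.

# A sketch encoding the taxonomy of Table 7.1 as data
# (category and function names are illustrative labels, not a standard).
FUNCTION_TAXONOMY = {
    "interaction": {
        "initiation_closing": ["react", "recognize", "initiate", "salute_distant", "break_away"],
        "turn_taking": ["take_turn", "want_turn", "yield_turn", "give_turn", "keep_turn"],
        "speech_act": ["inform", "ask", "request"],
        "grounding": ["request_ack", "ack", "repair", "cancel"],
    },
    "content": {
        "discourse_structure": ["topic", "segment"],
        "rhetorical_structure": ["elaborate", "summarize", "clarify", "contrast"],
        "information_structure": ["rheme", "theme", "given", "new"],
    },
    "mental_state_and_attitude": {
        "emotion": ["anger", "disgust", "fear", "joy", "sadness", "surprise"],
        "interpersonal_relation": ["framing", "stance"],
        "cognitive_processes": ["planning_difficulty", "remembering_difficulty"],
    },
}

def category_of(function: str):
    """Return the broad category a communicative function belongs to, if any."""
    for category, subcategories in FUNCTION_TAXONOMY.items():
        if any(function in functions for functions in subcategories.values()):
            return category
    return None

print(category_of("give_turn"))  # interaction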

7.8.9 Conversational Systems and Context

Context has recently become of particular interest to the ECA community. If context
defines and changes human interaction, it must influence any interactive entity
(computational system) that interacts with humans. Hence, context awareness
functionality is at the core of ECAs, and thus advanced context models should be
implemented in ECAs, especially as these are assumed to be believable human rep-
resentatives. The various contextual dimensions underlying human communication,
including linguistic, pragmatic, sociocultural, extra-linguistic, paralinguistic,
behavioral, situational, and environmental components, and how some of them
interrelate should be modeled into conversational systems, particularly those with
human-like graphical embodiment, so that they are able to engage in intelligent
dialog or mingle socially with human users. Besides, when building believable
ECAs, the rules of human communication must be taken into account; they include,
in addition to natural modalities, common knowledge base, and communication
error and recovery schemes, the diverse, multiple contextual entities that surround
and shape an interaction between human users and computer systems. To create
interaction between humans and computer systems that is closer to natural inter-
action, it is necessary to include various implicit elements into the communication
process (see Schmidt 2005). Like context-aware applications, ECAs need to detect
the user’s multimodal behavior and its surrounding context, interpret and reason
about behavior-context information, determine the most appropriate multimodal
response, and act on it.
There are several technical definitions that have been suggested in the literature
on context awareness, generating a cacophony that has led to an exasperating
confusion in the field of context-aware computing. The ECA community is not immune
to the difficulty of context definition. Accordingly, in relation to conversational
systems, the concept of context is operationalized in a simplified way compared to
what is understood as context in human communication in academic disciplines
specializing in the subject matter. Based on the literature on ECA, context consists
of three entities: the dialog context, the environmental context, and the cultural
context (see Samtani et al. 2008). These contextual elements are assumed to sur-
round and influence the interpretation of the communicative behavior that is being
detected and analyzed as multimodal signals by an ECA system and also to
determine its communicative intent and behavior. It is important to note that in
relation to the SAIBA framework no detail is provided as to which features of each
component of context are implemented in conversational systems, nor is there an
indication of how these features are interrelated in their implementation, e.g., the
current topic and the level of tension between participants as part of dialog context
relate to a particular socially dominant discourse and cultural conventions as part of
cultural context.

7.8.10 Basic Contextual Components in the (Extended) SAIBA Framework

Only recently has context become an important topic in research on conversational
agents. In relation to the SAIBA framework, Samtani et al. (2008) analyze the
context of multimodal behavior of interactive virtual agents and argue that selecting
the concrete conversational behavior to carry out as an agent when a conversational
function is provided cannot be accomplished without any context. To add context to
mapping Function Markup Language (FML) to Behavior Markup Language
(BML), they suggest a new representation, what has come to be known as Context
Markup Language (CML), a specification language that is created to communicate
context and consists of three parts: the dialog context, the environmental context,
and the cultural context. The dialog context considers the dialog history, the current
topic under discussion, the communication characters and the level of tension
between them; the environmental context takes into account such elements as the
location, local time, and the current setting; and the cultural context includes cul-
tural conventions and rules, e.g., information on how to culturally express certain communicative functions in an appropriate way. In this sense, specific theoretical
models of pragmatics and sociolinguistics are taken into account in the imple-
mentation of context into ECAs. Context is crucial to the task of producing a more
natural communicative multimodal output in real-time. It helps an artificial agent to
interpret and reason more intelligently on multimodal communicative input and to
carry out actions in a knowledgeable manner. In their work, Agabra et al. (1997)
demonstrate that using context is useful in expert domains and conclude that
contextual knowledge is essential to all knowledge-based systems. However, a complete model of the context related to nonverbal behavior seems infeasible at the moment. One example of a theoretical model is that of Ekman and Friesen (1969), which encompasses classes of the consistent circumstances that surround and influence the interpretation of nonverbal behavior other than the external environment, external feedback, and the relationship of the nonverbal with the verbal behavior (the classes currently under investigation in relation to SAIBA), namely awareness, intentionality, and the type of information conveyed.
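By way of illustration, the three-part context described for CML can be sketched as a simple data structure; the classes and field names below are assumptions made for the example and do not reproduce the actual CML specification.

from dataclasses import dataclass, field

# A hedged sketch of the three-part context described for CML (dialog,
# environmental, cultural). Field names are illustrative assumptions only.

@dataclass
class DialogContext:
    history: list = field(default_factory=list)       # previous utterances
    current_topic: str = ""                            # topic under discussion
    participants: list = field(default_factory=list)   # communication characters
    tension_level: float = 0.0                          # 0.0 (relaxed) to 1.0 (tense)

@dataclass
class EnvironmentalContext:
    location: str = ""      # e.g., "office"
    local_time: str = ""    # e.g., "14:30"
    setting: str = ""       # e.g., "formal meeting"

@dataclass
class CulturalContext:
    culture: str = ""                                   # e.g., "sv", "jp"
    conventions: dict = field(default_factory=dict)     # function -> preferred behavior

@dataclass
class Context:
    dialog: DialogContext
    environment: EnvironmentalContext
    culture: CulturalContext

# Example instantiation for a single interaction episode
ctx = Context(DialogContext(current_topic="AmI"),
              EnvironmentalContext(setting="formal meeting"),
              CulturalContext(culture="sv"))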

7.8.11 The Role of Context in the Disambiguation of Communicative Signals

Context plays a significant role in the interpretation of nonverbal communicative behaviors, i.e., in determining what is intended by (sometimes synchronized) facial gestures, acoustical prosodic features, and other nonverbal communicative signals. Context-aware systems detect, interpret, and reason about contextual information
pertaining to the user (that interacts with the system to achieve a certain goal) and
then infer a higher level abstraction of context (e.g., cognitive states, emotional
state, situational state, activity, etc.); and then respond to it by performing actions,
delivering adaptive services. The same logic, context awareness functionality, can
be embedded in conversational systems with the main difference being that the
context-dependent actions fired for service delivery become the context-based
generated multimodal communication behavior. Accordingly, conversational sys-
tems monitor and capture a human user’s multimodal communicative behavior
along with the environmental context, the dialog context, and the cultural context;
interpret and reason about behavioral-contextual information to infer what was
intended by, or deduce the meaning of, interactive, content, and/or mental states and
attitude functions; and then determine the most appropriate response and act on it,
by generating relevant, real-time multimodal communicative behavior. In the context of the SAIBA framework, this relates to mapping FML to BML: the agent's intended functions are mapped into visible behavior using the current context.
process entails selecting the relevant communicative behavior to perform when a
conversational function is provided, which should in turn be done through context.
Providing the conversational function occurs after mapping BML to FML with
regard to the human user interacting with the agent. To iterate, this process entails
analyzing and interpreting the multimodal input received from the user in the
current context, and then generating an abstract description of the user’s commu-
nicative intent upon which the agent can act.
Mapping the detected multimodal behavior to the intended communicative
functions (the interaction category of functions) is what ter Maat and Heylen (2009)
call ‘disambiguation problem’, and reliance on knowledge of context is aimed to
remove the potential ambiguities that may surround the class of nonverbal com-
municative behavior that tends to have different roles (e.g., conversational signals,
punctuators, manipulators, regulators, etc.) in relation to interaction functions, such
as facial gestures (e.g., eyebrow actions, head motions, head nods, eye gaze, gaze
directions, blinks, eye-contact, etc.). For example, eyebrow actions, head motions,
and eye blinks serve as conversational signals and punctuators; eye blinks (to wet
the eyes) and head nods also serve as manipulators; and eyebrow actions serve as
regulators as well (Pelachaud et al. 1996). Hence, relying on knowledge of context to disambiguate nonverbal communicative behavior is necessary for the effective performance of ECAs in realizing their own autonomous communicative behavior. In other words, how well the generated communicative behavior serves the communicative function for the agent is primarily contingent on how accurately the agent detects, and how effectively it interprets and reasons about, the context
surrounding the detected communicative signal to determine the underlying
meaning of it. ‘When trying to disambiguate a signal…It does not really matter
which signal was chosen to express a certain function, the important part is to find
the meaning behind the signals…One has to know the context to know which
functions are appropriate at a certain point and this knowledge can be used to
determine what was intended with a detected signal. In this case, the context can act
as a filter, making certain interpretations unlikely.’ (ter Maat and Heylen 2009,
p. 72).
Therefore, by using context, an agent can determine what is intended by a certain
communicative behavior and why it is performed. Thereby, context provides a key
to decoding the meaning of the nonverbal communicative behavior. Accordingly,
depending on the situation, a conversational agent can determine the actual
meaning of an eye gaze or a head nod (e.g., if a head nod is shown just after a direct
yes/no question, then it probably means yes). Both signals may occur in different
contexts with widely diverging meanings. Facial gestures have different roles in
conversational acts. An eye gaze from the speaker might be part of a behavior
complex that signals the intention of a turn offer (ten Bosch et al. 2004; Cassell
et al. 1999) or it may signify a request for feedback (Heylen 2005; Beavin Bavelas
et al. 2002; Nakano et al. 2003). Likewise, a head nod can have different meanings,
it can signify yes, it can serve as a backchannel, it can convey an intensification
(ter Maat and Heylen 2009), or it can mark disapproval.
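As a concrete illustration of context acting as a filter, the following sketch (an illustrative simplification, not a published algorithm) narrows the possible readings of a detected head nod using two of the contextual cues mentioned above: whether a yes/no question has just been asked and whether the person nodding currently holds the turn.

# A hedged sketch of context acting as a filter on an ambiguous signal.
# The candidate meanings and contextual cues are illustrative simplifications.
NOD_MEANINGS = {"affirmation", "backchannel", "intensification", "disapproval"}

def disambiguate_nod(after_yes_no_question: bool, nodder_is_speaker: bool) -> set:
    """Return the head-nod interpretations that remain plausible in context."""
    candidates = set(NOD_MEANINGS)
    if after_yes_no_question:
        # Right after a yes/no question, a nod is most plausibly an answer.
        candidates &= {"affirmation", "disapproval"}
    if not nodder_is_speaker:
        # A listener's nod is typically feedback rather than emphasis.
        candidates.discard("intensification")
    return candidates

print(disambiguate_nod(after_yes_no_question=True, nodder_is_speaker=False))
# {'affirmation', 'disapproval'}: still ambiguous, but the context has narrowed it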

7.8.12 Context or Part of the Signal

It is significant to differentiate between communicative signals and their context, by
delimiting what constitutes a signal against a context, i.e., specifying which ele-
ments are part of a signal and which are part of the context. According to ter Maat
and Heylen (2009, p. 70), ‘an important distinction to make when talking about
signals and their context is what exactly should be defined as context and what
should be considered part of a signal.’ It is though not easy to border what a
communicative signal exactly is and to distinguish between a signal and the con-
text. The parameters of a communicative signal may constitute part of the context of
that signal; they are intertwined with the contextual elements that surround the
signal. On the overlap context has with the definition of signals, ter Maat and
Heylen (2009, p. 70) underline that the difficulty of defining a signal is associated
with two problems: segmentation and classification. ‘The first is the act of deter-
mining what elements are parts of a particular signal. As we are trying to associate
signals with meanings, the question is “What are the meaning-bearing units that we
are talking about?” The input of a conversational agent usually is a constant stream
of multi-modal behaviors (position and movement of limbs, spoken words, etc). To
identify signals in this stream, it has to be cut at certain boundaries. But segmen-
tation not only consists of determining the boundaries of a signal, but also deter-
mining which elements within those boundaries are parts of a signal. Is the
movement of the eyebrows a single signal? Or should that signal include the
complete facial expression? And should the position or the movement of the head
be taken into account as well? The problem of classification arises when a particular
signal has been identified. Just as linguistic phonemes may infinitely differ in their
phonetic realization, the realization of the nonverbal equivalent of a phoneme will
each time be realized in a specific ways along various parameters. Examples of such
parameters are the placement, duration, extension and the speed with which
behavior is executed. Head nods, for instance, can differ in the speed, extension and
number of repetitions. One nod could be a small nod, performed very slowly while
another nod could be very fast and aggressive. This variation might correspond to a
systematic difference in meaning. The classification of a signal based on the settings
of the parameters along which it can vary is similar to the disambiguation of a
signal in context; just as a head nod can mean different things in different contexts;
a nod can also mean different things when the parameters are different. The question
may thus arise whether a variation in context should not be seen as a variation
within the signal.’ (Ibid) The authors conclude that while a line must be drawn to
define what a signal is exactly, it is the complete set of data, the parameters of the
signal itself and the context that matters for the disambiguation, with no signifi-
cance being given to where the segmentation line can be drawn as long as the
complete set remains the same. And how signals should be mapped to conversa-
tional functions is determined by the way signals are described.
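The point about parameters can be made concrete with a small data structure: a detected head nod is not an atomic token but a bundle of parameters whose settings may shift its reading. The parameters follow ter Maat and Heylen's examples (speed, extension, repetitions), while the thresholds and labels below are invented purely for illustration.

from dataclasses import dataclass

# A hedged sketch of a parameterized signal: a head nod described by the
# parameters along which its realization can vary. Thresholds and labels
# are invented for illustration only.

@dataclass
class HeadNod:
    speed: float        # nods per second
    extension: float    # amplitude of the head movement, in degrees
    repetitions: int    # number of consecutive nods

def classify_nod(nod: HeadNod) -> str:
    """Map a particular parameter setting to a coarse reading of the nod."""
    if nod.speed > 2.0 and nod.extension > 20.0:
        return "emphatic"      # fast and large: strong agreement or emphasis
    if nod.speed < 0.8 and nod.repetitions == 1:
        return "tentative"     # a single slow nod: hesitant acknowledgement
    return "neutral"

print(classify_nod(HeadNod(speed=2.5, extension=25.0, repetitions=3)))  # emphatic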

7.8.13 Contextual Elements for Disambiguating Communicative Signals

For an ECA to understand what communicative function was intended by the
communicative behavior upon receiving a multimodal communication signal from
the user, it is necessary to capture various contextual elements that surround such a
signal as well as to determine which of these elements are more relevant as to
helping to find the communicative function that is most likely meant. Indeed,
contextual elements as part of the overall situation can be infinite, hence the need to
assess their relevance to the signal. ter Maat and Heylen (2009) maintain that to
determine what contextual elements from the context are significant (e.g., to know
whether the actor of a signal is the speaker or the listener, which other signals are
sent at the same time, the placement of the signal in the sentence, etc.) is an important task in signal disambiguation. According to the authors, the
contextual elements that can make up the context and are used to disambiguate
signals can be divided into three basic categories: parameters, constraining elements
and pointer elements:
Parameters—This category of contextual elements involves the parameters of
the communicative signals themselves. These parameters are just as important as
the context. The way signals should be mapped to conversational functions is
basically determined by how signals are described. ‘When disambiguating signals,
the first thing that has to be done is specifying exactly what defines a signal and to
what communicative functions this signal (without taking context into account yet)
can be mapped.’ For example, a list must be created of all possible intentions of
different facial gestures, such as a head nod, an eye gaze, an eyebrow raising, an
eyebrow lowering, blinking, etc. The next step is to use the context of the behavior, which 'also
contains the parameters of the signals you are detecting. Using this information the
task is to make a list of all communicative functions that are possible in the current
context, merge this with the list of functions that the target behavior can mean and
use the resulting set as the intended function. Or, if the list of possible functions in the
context is too large (maybe even infinite) you can check ‘what a signal can mean
and then check which meaning fits the current context.’
Constraining elements—This category encompasses the contextual elements that
constrain the presence or choice of functions in certain contexts. Some functions are
impossible to express (expressive constraints) or are inappropriate in certain con-
texts (appropriateness constraints). As illustrative examples, respectively: a person cannot decline a turn he has not first been offered, and it is very inappropriate to greet a person when you have already been talking for quite a while or to give the turn while continuing to speak. In other words, the context determines which communicative
behaviors are appropriate or possible and which are not.
Pointer elements—This ‘category contains the contextual elements that do not
constrain the function choice but help to find the most likely one’, pointing in the
right direction. This is because the same signal may have widely diverging
meanings in different contexts, i.e., it can be interpreted as multiple intentions or
express different possible appropriate functions based on the context. Therefore, the
context has to be used to find pointer elements to solve the disambiguation problem.
For example, to iterate, a head nod can mean yes, it can serve as a backchannel, it
can convey an intensification, and so on. In relation to cultural context, Lyons
(1977) states that in certain cultures the nodding of the head with or without an
accompanying utterance is indicative of assent or agreement. In all, pointer ele-
ments can help pick out the most likely function, although there are always going to
be some exceptions. ‘It is also possible that the person or agent producing an
ambiguous signal intended to communicate all the different meanings. Take for
example a backchannel utterance… [T]his can be a continuer, an indicator of the
listener that he does not want the turn and that the speaker should continue
speaking. Another function that uses a backchannel utterance as a signal is an
acknowledgement … In a lot of contexts the difference between these two functions
is hardly visible, in a lot of situation both a continuer and an acknowledgement
would fit. But it is not unimaginable that a backchannel utterance means both at the
same time… When disambiguating signals, these types of ambiguities should be
kept in mind and it should be realized that sometimes the function of a signal
simply is not clear…, and sometimes a signal can have multiple meanings at the
same time.’

7.8.14 Modalities and Channels and Their Impact on the Interpretation of Utterances and Emotions

Context also influences the selection of modalities and thus communication channels
used to express communicative intents. This is most likely to have implications for
the interpretation of the meaning of spoken utterances as well as emotion messages
conveyed, in particular, through nonverbal behaviors. This occurs based on how
modalities and communication channels enabled by these modalities are combined
depending on the context. In this vein, information conveyed through one modality
or channel may be interpreted differently than if it were delivered by another modality or channel or, rather, by a combined set of modalities and channels.
According to Karpinski (2009, p. 167), ‘each modality may provide information on
its own that can be somehow interpreted in the absence of other modalities, and that
can influence the process of communication as well as the information state of the
addressee.’ He contends that it is of no easy task ‘to separate the contributions to the
meaning provided through various modalities or channels and the final message is
not their simple “sum.” The information conveyed through one modality or channel
may be contrary to what is conveyed through the other; it may modify it or extend it
in many ways. Accordingly, the meaning of a multimodal utterance should be, in
principle, always regarded and analyzed as a whole, and not decomposed into the
meaning of speech, gestures, facial expressions and other possible components. For
example, a smile and words of appraisal or admiration may produce the impression
of being ironic in a certain context.’ (Ibid, p. 167). This provides insights into
understanding how meaning conveyed through combined modalities and channels
may be interpreted differently in conversational acts. It is of import to account for
such nuances of meaning and the underlying modalities and channels when building
conversational systems as believable human representatives. Karpinski (2009)
proposes a system of dialog acts called DiaGest along with a conceptual framework
that allows for independent labeling of the contributions provided by various
modalities and channels. Involving the study of the communicational relevance of
selected lexical, syntactic, prosodic and gestural phenomena, this project considers
both auditory and visual modalities and defines four channels: text, prosody, facial
expression, and gestures as major ways of providing quasi-independent modal
contributions. In this context, modal contribution is defined ‘as the information
provided through a given modality within the boundaries of a given dialog act.’
(Ibid, p. 167). This concept is introduced to relieve the problem of annotating both
separate modalities as well as the meaning of entire utterances as to dialog acts. Also
in their study, dialog acts are conceptualized as multidimensional entities composed,
or built on the basis, of modal contributions provided by the aforementioned
channels. In this sense, a single modal contribution may, depending on the context,
constitute the realization of a dialog act.
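The notion of a modal contribution can be illustrated as follows: each of the four channels contributes its own labeled information within the boundaries of a dialog act, and the act is interpreted from the bundle as a whole rather than from any single channel. The class and field names below are illustrative and do not reproduce the DiaGest annotation scheme.

from dataclasses import dataclass, field

# A hedged sketch of modal contributions across the four channels considered
# in DiaGest-style annotation (text, prosody, facial expression, gestures).
# Names are illustrative and do not reproduce the actual annotation scheme.

@dataclass
class ModalContribution:
    channel: str   # "text", "prosody", "facial_expression", or "gesture"
    label: str     # channel-specific label, e.g., "words_of_praise", "smile"

@dataclass
class DialogAct:
    contributions: list = field(default_factory=list)

    def interpret(self) -> str:
        """Interpret the act as a whole rather than summing channel meanings."""
        labels = {(c.channel, c.label) for c in self.contributions}
        if ("text", "words_of_praise") in labels and ("facial_expression", "smile") in labels:
            # The combination may read as sincere praise or as irony,
            # depending on further context (a simplification of the point above).
            return "praise_or_irony"
        return "unresolved"

act = DialogAct([ModalContribution("text", "words_of_praise"),
                 ModalContribution("facial_expression", "smile")])
print(act.interpret())  # praise_or_irony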
The affective information conveyed through one modality or channel may also
be interpreted differently in the absence of, or be contrary to what is conveyed
through, the other modality. This is of relevance to context-aware systems when it
comes to the interpretation of the user’s emotional states, which are implicitly
captured using multi-sensory devices embedded in the so-called multimodal user
interfaces. Emotional states are inherently multimodal and thus their perception is
multi-channel based. The premise is that the interpretation of emotional information
captured by one modality may differ in the absence or degradation of the other
modality depending on the context, e.g., noise may affect auditory sensors (to
capture emotiveness and acoustical prosodic features of speech related to emotion
conveyance), and darkness may affect visual sensors (to capture facial expressions).
Consequently, the interpretation of the user’s emotional states may not be as sound
as if both modalities and relevant channels are combined in the perception of the
contextual data/information in the sense of whether it is completely or partially
captured as implicit input from verbal and nonverbal communication signals.
Hence, the meaning of the (rather multimodal) emotional state should be, in
essence, analyzed and interpreted as a whole, and not decomposed into the meaning
of emotiveness, prosodic features, facial expressions, and other possible compo-
nents. Accordingly, there is much work that needs to be done in this regard to
advance both context-aware and conversational systems as to detecting and inter-
preting more subtle shades and meaning of verbal and nonverbal behavior, par-
ticularly when different modalities and thus channels are to be combined.
This is especially so given that, according to Karpinski (2009, p. 167), each modality and channel 'has
its particular properties and they vary in the range of “meanings” they may convey
and in the way they are typically employed. For example, the modality of gestural
expression is frequently sufficient for answering propositional questions, ordering
simple actions or expressing certain attitudes. The facial expression is especially
frequently used for feedback (back-channeling) or emotional reactions. Speech is
normally the most precise tool for expressing complex intentions. Prosody may act
on different levels, facilitating (or impeding) lexical and syntactic processing,
expressing feelings and attitudes, organizing higher levels of discourse.’ The
challenge lies mostly when attempting to analyze the meaning of utterances and
emotions as a whole in the face of the multimodal perception of communicative
signals or contextual cues.
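One simple way to reflect this dependence on channel availability, sketched below under strong simplifying assumptions, is to weight each modality's emotion estimate by an estimated reliability (for instance, lowered for the auditory channel in noise or for the visual channel in darkness) before fusing them; actual multimodal fusion is considerably more sophisticated.

# A hedged sketch of reliability-weighted fusion of per-modality emotion
# scores. Reliabilities and scores are illustrative; real systems use far
# more sophisticated multimodal fusion.

def fuse_emotion(scores_by_modality: dict, reliability: dict) -> dict:
    """Combine per-modality emotion scores, weighting each by channel reliability."""
    fused = {}
    total_weight = sum(reliability.get(m, 0.0) for m in scores_by_modality) or 1.0
    for modality, scores in scores_by_modality.items():
        weight = reliability.get(modality, 0.0)
        for emotion, score in scores.items():
            fused[emotion] = fused.get(emotion, 0.0) + weight * score / total_weight
    return fused

audio_scores = {"joy": 0.2, "anger": 0.6}   # degraded by background noise
video_scores = {"joy": 0.7, "anger": 0.1}   # clear view of the face
print(fuse_emotion({"audio": audio_scores, "video": video_scores},
                   {"audio": 0.3, "video": 0.9}))
# joy dominates because the more reliable visual channel is weighted higher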
Among the key criteria against which models of human communication implemented into conversational systems should be evaluated are comprehensiveness, robustness, coherence, and fidelity to real-world phenomena. In
relation to the latter, one important element of human communication behavior that
should be taken into account when modeling and building conversational systems is
that certain communication channels, especially those provided by facial move-
ments and prosody, are frequently used unconsciously, although communication is
an intended act in many of its aspects. Karpinski (2009, pp. 168–169) points out, in
reference to DiaGest project, ‘it is still necessary to investigate the status and
communicational relevance of a number of unintentional, uncontrolled signals
produced during the process of communication and influencing the mental (infor-
mational) state of the observer.’ Adding to non-intentionality and uncontrollability
of communication behavior is the synergy underlying multimodality (and
multi-channeling), that is, the interaction of two or more modalities or channels, so
that their combined effect as to the contribution to the meaning of utterances and
emotions is greater than the sum of their separate effects.

7.8.15 Applications of the SAIBA Framework: Text- and Speech-Driven Facial Gesture Generation

The SAIBA framework with a division between communicative function and
behavior can be utilized within different application domains, particularly in rela-
tion to AmI and AI. In Vilhjálmsson (2009) two different applications where FML
and BML play an important role are described. The first application, as illustrated in
Fig. 7.4, is real-time computer-mediated communication in which users can com-
municate with each other over a communication channel that is narrower than a
typical face-to-face conversation, e.g., a user may send written messages to a friend
over instant messaging. The mediating system could analyze the sender's message upon its arrival and look for various nonverbal devices (i.e., facial gestures, hand
gestures) that have been shown to be associated with particular communicative
functions and annotate these communication functions in the textual message using
FML. At the recipient’s computer, a Generation Module can look at the written
message along with the function annotation (the annotated text) and apply mapping rules to produce BML that carries out the communication functions and

Fig. 7.4 Communicative function annotated in a real-time chat message helps produce an
animated avatar that augments the delivery. Source Vilhjálmsson (2009)

thereby generates a realization of that message that best supports the intended
communication. In this case, the Generation Module could deliver the message as if it were being spoken by the avatar, if the sender also has an animated one, and produce all the supporting nonverbal behavior according to the FML to BML
mapping rules. The author claims that the performance of avatars can even be
personalized or tailored based on the recipient’s local or cultural setting. This
implies that the mapping rules would be applied into the agent embodiment taking
into account the cultural context, and that the same communicative function
associated with the message will be instantiated or realized in two different ways,
using a combination of verbal and nonverbal signals that correspond to commu-
nication rules of that local setting.
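The flow of this first application can be sketched end to end: the sender's text is annotated with communicative functions (an FML-like layer), and the recipient's Generation Module maps those functions to avatar behaviors (a BML-like layer) according to the local cultural setting. The tags and mapping rules below are invented for illustration and are not the actual FML or BML vocabularies.

# A hedged sketch of the chat-mediation pipeline: annotate the sender's text
# with communicative functions, then realize them as avatar behaviors at the
# recipient's side. Tags and rules are invented and are not real FML or BML.

def annotate_functions(message: str) -> list:
    """Very rough FML-like annotation of a chat message."""
    functions = []
    if message.lower().startswith(("hi", "hello")):
        functions.append("greeting")
    if message.rstrip().endswith("?"):
        functions.append("request_feedback")
    functions.append("inform")
    return functions

CULTURAL_RULES = {  # function -> behavior per (illustrative) cultural setting
    "default": {"greeting": "wave", "request_feedback": "eyebrow_raise", "inform": "beat_gesture"},
    "formal":  {"greeting": "bow",  "request_feedback": "head_tilt",     "inform": "beat_gesture"},
}

def realize(functions: list, setting: str = "default") -> list:
    """Map annotated functions to avatar behaviors (a BML-like layer)."""
    rules = CULTURAL_RULES.get(setting, CULTURAL_RULES["default"])
    return [rules[f] for f in functions if f in rules]

message = "Hello, did you get the report?"
print(realize(annotate_functions(message), setting="formal"))
# ['bow', 'head_tilt', 'beat_gesture']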
The second application, as shown in Fig. 7.5, is a classic ECA where a human
user interacts with a graphical representation of the agent on a large wall-size
display. In this application, following the description of the multimodal input
received from the user using something like BML, a special Understanding Module
interprets the behavior in the current context and generates an abstract description
of the user’s communicative intent in FML specification. The agent’s decisions
about how to respond are made at the abstract level inside a central Decision
Module, which are similarly described in FML. Finally, a Generation Module
applies mapping rules to produce BML (behavior realization) that carries out the
agent’s intended functions visible to the human user, using the current context
(situation and culture).
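This second architecture can likewise be sketched as three functions chained per turn: an Understanding Module that turns the user's observed behavior into abstract intents (FML-like), a Decision Module that operates only on that abstraction, and a Generation Module that maps the agent's intended functions back to behavior (BML-like) using the current context. All names and mappings below are illustrative assumptions.

# A hedged sketch of the classic ECA loop (Understanding -> Decision ->
# Generation), with the decision stage isolated from surface behavior.
# Function names and mappings are illustrative only.

def understanding_module(observed_behavior: list, context: dict) -> list:
    """Interpret observed user behavior (BML-like input) into intents (FML-like)."""
    intents = []
    if "gaze_at_agent" in observed_behavior and context.get("user_is_speaking", False):
        intents.append("request_feedback")
    if "head_nod" in observed_behavior:
        intents.append("acknowledge")
    return intents

def decision_module(user_intents: list) -> list:
    """Decide the agent's communicative functions at the abstract level only."""
    if "request_feedback" in user_intents:
        return ["give_feedback", "keep_listening"]
    return ["keep_listening"]

def generation_module(agent_functions: list, context: dict) -> list:
    """Realize the agent's functions as visible behavior (BML-like) for this context."""
    rules = {"give_feedback": "head_nod", "keep_listening": "gaze_at_user"}
    return [rules[f] for f in agent_functions if f in rules]

context = {"user_is_speaking": True, "culture": "default"}
intents = understanding_module(["gaze_at_agent"], context)
print(generation_module(decision_module(intents), context))
# ['head_nod', 'gaze_at_user']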
The author points out that creating an agent in this fashion has some advantages,
one of which is isolating the abstract decision making module, which can be quite
complex, from the surface form of behavior, both on the input side and the output
side. He adds that it may be easier to tailor the agent’s interaction to different
cultural settings or use different means for communication, e.g., phoning a user

Fig. 7.5 An embodied conversational agent architecture where the central decision module only
deals with an abstract representation of intent. Source Vilhjálmsson (2009)

instead of a face-to-face interaction. It is important to note that this application is of high relevance to AmI systems that can engage in an intelligent dialog or mingle socially with human users, for example, through a conversational agent that personifies the user's computer interface in the form of an animated person (a graphical representation of the agent) and presents interactions in a conversational form.
In both applications, the mapping rules that produce BML carrying out the communicative functions are applied on the basis of situational and sociocultural contexts. This entails, in addition to computational linguistics,
computational pragmatics and sociolinguistics. The purpose of the research within
these two areas is to create methods for intention recognition/meaning interpretation
and generation of multimodal communicative behavior based on various situational
and sociocultural contexts. The combination of recent discoveries in psycholinguistics, pragmatics, sociolinguistics, and sociocultural linguistics, which make it possible to acquire a better understanding of the functional processes of socio-cognitive interaction (with the aim of replicating them in computer systems), with breakthroughs at the level of the enabling technologies makes it increasingly possible to build believable conversational systems based on this understanding.
Sociocultural linguistics is concerned with the study of language in its sociocultural
context. As a broad range of theories and methods, sociocultural linguistics has
emerged and developed as a response to the increasingly narrow association of
sociolinguistics with the quantitative analysis of linguistic features and their cor-
relation to sociological variables. William Labov, one of the leading scholars of sociolinguistics, is noted for introducing the quantitative study of language variation and change (Paolillo 2002), making the sociology of language into a scientific
discipline. Sociocultural linguistics emphasizes particularly an awareness of the
necessity for interdisciplinary scholarly approaches to society, culture and lan-
guage. It draws from diverse disciplines, such as sociolinguistics, linguistic
anthropology, and sociology of language, as well as some streams of social psy-
chology, social theory, discourse analysis, and the philosophy of language (see,
e.g., Bucholtz and Hall 2005).
In addition, one of the significant challenges for building ECAs as believable
human representatives is to generate a full facial animation, involving facial ges-
tures, visible correlates of prosody, explicit verbal displays, and explicit emotional
displays. Considerable research is being carried out on the topic of the relationship
between emotional states and facial displays (see next chapter for more detail).
However, a full facial animation is of critical relevance to both classic ECAs where
human users interact with a graphical representation of virtual beings as well as
AmI systems which can engage in intelligent dialog with human users.

7.8.16 Towards Full Facial Animation

Facial animation has recently been under vigorous investigation in the creation of ECA systems with a graphical embodiment that can, by analyzing text input or
natural speech signals, drive a full facial animation. The goal is to build believable
virtual human representatives. Towards this end, it is critical for ECAs to implement
facial gestures, facial expressions, orofacial articulatory gestures (visible correlates
of prosody), and lip-movement associated with visemes. Research shows that there
is much work to be done on speech-driven facial gestures and on full facial animation driven by nonverbal speech. As stated in Zoric et al. (2009), there is a considerable literature
(e.g., Pelachaud et al. 1996; Graf et al. 2002; Cassell 1989; Bui et al. 2004; Smid
et al. 2004) on the systems that use text input to drive facial animation and that
incorporate facial and head movements as well as lip movements. However, pro-
ducing lip movements alone does not ensure the naturalness of the face. Indeed, there exist
many systems (e.g., Zoric 2005; Kshirsagar and Magnenat-Thalmann 2000; Lewis
1991; Huang and Chen 1998; McAllister et al. 1997) that although they are capable
of producing correct lip synchronization from speech signal, they miss ‘natural
experience of the whole face because the rest of the face has a marble look’ (Zoric
et al. 2009). Existing systems that attempt to generate facial gestures by analyzing only the speech signal mainly concentrate on a particular gesture or the general dynamics of the face, and the related state-of-the-art literature lacks a method for automatically
generating a complete set of facial gestures (Zoric et al. 2009). Zoric et al. (2009)
mention a set of research works and elaborate briefly on how they expand on each
other in relation to the generation of head movements, based on recent evidence demonstrating that the pitch contour (F0), as an audio feature, is correlated with head motion. They additionally introduce other systems that use speech features to drive
general facial animation. Examples of works involving such systems include Brand
(1999), Gutierrez-Osuna et al. (2005), Costa et al. (2001), and Albrecht et al. (2002).
The first work learns the dynamics of real human faces during speech
using two-dimensional image processing techniques. This work incorporates
lip movements, co-articulation, and other speech-related facial animation.
The second work learns speech-based orofacial dynamics from video and generates
facial animation with realistic dynamics. In the third work, the authors propose a
method to map audio features to video analyzing only eyebrow movements. In the
latter work, the authors introduce a method for automatic generation of several facial
gestures from speech, including ‘head and eyebrow raising and lowering dependent
on the pitch; gaze direction, movement of eyelids and eyebrows, and frowning
during thinking and word search pauses; eye blinks and lip moistening as punctu-
ators and manipulators; random eye movement during normal speech.’

7.8.17 Speech-Driven Facial Gestures Based on HUGE Architecture: An ECA Acting as a Presenter

In a recent work dealing with ECAs that act as presenters, Zoric et al. (2009)
attempt to model the correlation between (nonverbal) speech signals and the occurrence of
facial gestures, namely head and eyebrow movements and blinking during speech
pauses; eye blinking as manipulators; and amplitude of facial gestures dependent on
speech intensity. To generate facial gestures, they extract the needed information
from speech prosody, through analyzing natural speech in real-time. Prosodic
features of speech are taken into consideration given the abundance of their func-
tions, including, to iterate, expressing feelings and attitudes; contributing to topic
identification processes and turn taking mechanisms in conversational interactions;
and reflecting various features of utterance pertaining to statements, questions,
commands or other aspects of language that may not be grammatically and lexically
encoded in the spoken utterances. Moreover, their work, which aims to develop a
system for full facial animation driven by speech signals in real-time, is based on
their previously developed HUGE architecture for statistically based facial ges-
turing, and, as pointed out by the authors, extends their previous work on automatic
real-time lip synchronization, which takes the speech signal as input and carries out audio-to-visual mapping to produce visemes. The components of the system, which is based on the speech signal as a special case of the HUGE architecture, are illustrated in Fig. 7.6. The adaptation of the HUGE architecture to the speech signal as inducement
involves the following issues: definition of audio states correlated with specific
speech signal features; implementation of the automatic audio state annotation and
classification module; and integration of the existing Lip Sync system.
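To give a flavor of how prosodic features might drive gesture triggers in such a system, the sketch below classifies short audio frames into coarse audio states from pitch and energy values and attaches a candidate facial gesture to each state; the states, thresholds, and gesture choices are invented for illustration and do not reproduce the actual HUGE implementation.

# A hedged sketch of mapping coarse prosodic features to facial-gesture
# triggers. States, thresholds, and gestures are invented for illustration
# and do not reproduce the actual HUGE implementation.

def audio_state(f0_hz, energy: float) -> str:
    """Classify a short frame of speech into a coarse audio state."""
    if f0_hz is None and energy < 0.05:
        return "pause"
    if f0_hz is not None and f0_hz > 220.0:
        return "high_pitch"
    return "speech"

GESTURE_TRIGGERS = {
    "high_pitch": "eyebrow_raise",   # raised pitch often co-occurs with emphasis
    "pause": "eye_blink",            # blinks and small head motions in pauses
    "speech": None,                  # no extra gesture beyond lip synchronization
}

frames = [(180.0, 0.4), (250.0, 0.6), (None, 0.01)]   # (F0 in Hz or None, energy)
for f0, energy in frames:
    state = audio_state(f0, energy)
    print(state, "->", GESTURE_TRIGGERS[state])
# speech -> None, high_pitch -> eyebrow_raise, pause -> eye_blink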
For further information on HUGE architecture, supervised learning method
using statistical modeling and reasoning, facial gesture generation and related
issues, and other technical aspects of the project, the reader is directed to the
original document. Figure 7.7 shows snapshots from a facial animation generated
from nonverbal speech signal.
The authors state that their system is still at an early stage, and, as part of future
research, they were planning to add head and eyebrow movements correlated with

Fig. 7.6 Universal architecture of HUGE system adapted to audio data as inducement. Source
Zoric et al. (2009)

Fig. 7.7 From left to right: neutral pose, eyebrow movement, head movement, and eye blink.
Source Zoric et al. (2009)

pitch changes, as well as eye gaze, since it contributes considerably to the naturalness of the
face. They moreover intend to integrate as many of the rules found in literature on
facial gestures as possible. They state that to have a believable human represen-
tative it is of import to implement, in addition to facial gestures, verbal and emo-
tional displays. They also mention that evaluation is an important step in
building a believable virtual human. Indeed, it is crucial to carry out a detailed
evaluation of ECAs in terms of the underlying components, namely constructs,
models, methods and instantiations. See Chap. 3 for more detail on the evaluation
of computational artifacts and related challenges. In the context of ECAs, it is as
important to scrutinize evaluation methods for assessing the different components underlying such artificial artifacts as it is to evaluate what these artifacts embody and
their instantiations. As noted by Tarjan (1987), metrics must also be scrutinized by
experimental analysis. This relates to meta-evaluation, evaluation of evaluations,
whereby metrics define what the evaluation research tries to accomplish with regard
to assessing the evaluation methods designed for evaluating how well ECAs can
perform. Periodic scrutiny of these metrics remains necessary to enhance such
methods as the research evolves within the ECA community; varied evaluation
methods can be studied and compared.

7.9 Challenges, Open Issues, and Limitations

Although the research on ECAs has made progress with regard to receiving,
interpreting, and responding to multimodal communicative behavior, it still faces
many challenges and open issues relating to system engineering and modeling that
need to be addressed and overcome in order to achieve the goal of building virtual
humans or online beings. These challenges and open issues include, and are not
limited to:
• paradigms that govern the assembly of ECA systems;
• principles and methodologies for engineering computational intelligence;
• general approaches to modeling, understanding, and generating multimodal
verbal and nonverbal communication behavior, with an interaction of data
analysis techniques and ontologies;
• techniques and models of the knowledge, representation, and run-time behavior
of ECA systems;
• the performance of ECA systems given that they need to act in a (nearly)
real-time fashion, immediately and proactively responding to spoken and ges-
tured signals;
• enabling proactivity in ECA systems through dynamic learning and real-time
and pre-programmed heuristic reasoning;
• evaluation techniques of ECA systems; and
• programming of conversational multimodal interfaces and prototyping software
systems.
The way cognitive, emotional, neurological, physiological, behavioral, and
social processes as aspects of human functioning are combined, synchronized, and
interrelated is impossible, at the current stage of research, to mimic and model in
computer systems. Human communication is inherently complex and manifold with
regard to the use, comprehension, and production of language. Advanced discov-
eries in the area of computational intelligence will be based on the combination of
knowledge from linguistics, psycholinguistics, neurolinguistics, cognitive linguis-
tics, pragmatics, and sociolinguistics, as well as the cultural dimension of speech-
accompanying facial, hand, and corporal gestures. It is crucial to get people together
from these fields or working on cross connections of AmI with these fields to pool
their knowledge and work collaboratively. Modelers in these fields must become
interested in conversational and dialog systems associated with AmI research as a
high-potential application area for their models. Otherwise the state-of-the art in
related models, albeit noteworthy, will not be of much contribution to the
advancement of conversational systems towards achieving a full potential. In fact,
research in ECA has just started to emphasize the importance of pooling knowledge
from different growing groups of researchers pertaining to various full agent sys-
tems in order to construct virtual humans capable of mingling socially with human
users. As pointed out by Vilhjálmsson (2009, p. 48, 57), ‘Building a fully functional
and beautifully realized embodied conversational agent that is completely autono-
mous, is in fact a lot more work than a typical research group can handle alone. It
may take individual research groups more than a couple of years to put together all
the components of a basic system [technically speaking only], where many of the
components have to be built from scratch without being part of the core research
effort… Like in all good conspiracy plots, a plan to make this possible is already
underway, namely the construction of a common framework for multimodal
behavior generation that allows the researchers to pool their efforts and speed up
construction of whole multimodal interaction systems.’
Linguistic subareas such as computational linguistics, psycholinguistics, and
neurolinguistics have contributed significantly to the design and development of
current conversational and dialog acts systems. For example, computational lin-
guistics has provided knowledge and techniques for computer simulation of
grammatical models for the generation and parsing of sentences and computational
semantics, including defining suitable logics for linguistic meaning representations
and reasoning. However, research in the area of computational pragmatics and
computational sociolinguistics is still in its infancy, and therefore there is much
work to be done to implement pragmatic and sociolinguistic capabilities (compe-
tences) into conversational systems. Modeling pragmatic and sociolinguistic com-
ponents of language into artificial conversational systems is associated with
enormous challenges. As mentioned earlier, a few research institutions are currently
carrying out research within the areas of computational pragmatics and computa-
tional sociolinguistics—computational modeling of interactive systems in terms of
dialog acts, intention recognition/pragmatics, and interpretation and generation of
multimodal communicative behavior in different sociolinguistic contexts and based
on different pragmatic situations. Most work in building conversational systems is
becoming increasingly interdisciplinary in nature, involving knowledge from across
the fields of linguistics, psycholinguistics, neurolinguistics, computational linguis-
tics, computational pragmatics, computational sociolinguistics, cognitive science,
and speech-accompanying facial and hand gestures. In particular, taking into
account sociocultural and situational contexts in understanding and conveying
meaning, that is, the way in which such contexts contribute to meaning, is of high
importance to building successful ECA systems. In other words, socio-linguistic
and pragmatic components are very critical in order to create AmI systems that can
engage in an intelligent dialog or mingle socially with human users. However, to
formally capture such dimensions in natural language modeling is no easy task,
unlike grammatical, semantic, and phonetic competences whose knowledge
domains seem relatively feasible to model and implement into computers, owing to
recent advances in computational modeling and simulation technologies as well as
theoretical computational linguistics. Sociolinguistic and pragmatic dimensions of
language entail complex patterns of meaning-making and other intricacies associ-
ated with situated cognition and action, in that the circumstances of our
socio-cognitive interactions are never fully anticipated and continuously evolve
around us, and as a result our interactions are never planned in the strong sense that
AI, AmI, and cognitive science would have it. Taking meaning construction and interaction situatedness into account is, from an engineering and computing point of view and within the constraints of existing enabling technologies, quite a strange switch to make. In fact, computer systems lacking the capacity to respond to unanticipated circumstances and to understand the meaning humans ascribe to communication acts are what make (current) conversational systems and humans, as interactive entities, fundamentally different. Therefore, building conversational systems requires a higher abstraction level of conceptualization as well as novel
engineering and computing paradigms.
As part of the pursuit of sharing knowledge and research findings through
existing researcher groups pooling their efforts and harnessing their collaborative
endeavors, new research groups should be formed to conduct empirical investiga-
tions that can produce insight on essential aspects of multimodal communication
behavior in relation to pragmatics and sociolinguistics and discover to which extent
nonverbal behavior, in particular, may be captured in formal categories that an
algorithm can be trained to recognize and respond to as part of communicative
function and behavior of conversational systems. Research questions that need to be
investigated in this regard should focus on the analysis of multimodal communi-
cation in different cultural settings, social contexts, and communication situations.
Advanced knowledge in such subfields of computational linguistics is needed in
order to overcome the challenges relating to developing and modeling language
communicative capabilities (competences) into conversational systems pertaining to
the understanding of the meaning of verbal and nonverbal signals as well as their
generation. Modeling theoretical models of pragmatics and sociolinguistics into
conversational systems can significantly contribute to creating successful believable
virtual humans that are capable of mingling socially with humans. Especially,
advanced understanding of pragmatics will enable ECA researchers to add subtle
rules to the perception and production of speech and much of its paralinguistic
aspects. There is so much to explore in this ambit. Pragmatics is associated with the
role and contribution of context as to speech meaning and act. It can provide fertile
insights into how the meaning of utterances relates to the context they are spoken in,
how and why they are used in particular situations to convey particular meaning or
information. From a different perspective, knowledge about the context can provide
important cues to resolve communication errors triggered by the nuances and
subtleties surrounding face-to-face conversations, by enabling conversational sys-
tems to recognize and resolve such errors in action. Indeed, effective communi-
cation in face-to-face conversation relies greatly on the ability to identify and
resolve communication errors. However, realizing this computational task reliably
appears, at the current stage of research, close to impossible. A number of subtasks for equipping conversational systems with the capability to recognize and resolve communication errors have not yet been solved. Computational linguists have argued that interactive computer systems do not have solutions to detect communication errors (e.g., Hayes and Reddy 1983), not to mention resolve them. Unlike interactive computer systems, humans are flexible in finding different ways, or combining them dynamically, to solve communication problems. In other words, they can come up with different solutions to the same problem, and when there is no alternative left or all else fails, they can fall back on common sense, whereas computers lack such common sense (Dreyfus 2001). In a
dialog, short-term misunderstandings and ambiguities (included in many conversations) are resolved by the communication participants; frequently ‘ambiguities are rephrased and put into the conversation again to get clarity by reiteration of the issue’; ‘misunderstandings are often detected by monitoring the response of the communication partner’; and in the case of ‘a misinterpretation issues are repeated and corrected’ (Schmidt 2005, pp. 162–163).
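To make these strategies concrete, the following is a minimal sketch in Python (an illustration constructed for this discussion, not a description of any existing system) of how a conversational component might approximate them: reiterating an ambiguous interpretation, confirming before acting when confidence is low, and repeating and correcting after the partner signals a misunderstanding. The class names, intents, and thresholds are hypothetical assumptions.

from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Interpretation:
    intent: str        # hypothesized meaning of the user's utterance
    confidence: float  # score from a hypothetical understanding module


@dataclass
class DialogState:
    history: List[str] = field(default_factory=list)  # system moves so far
    last_intent: Optional[str] = None                 # what the system last acted on


def next_move(candidates: List[Interpretation],
              partner_signals_misunderstanding: bool,
              state: DialogState,
              accept_threshold: float = 0.8,
              ambiguity_margin: float = 0.1) -> str:
    """Choose the next system move, approximating the human repair strategies."""
    if partner_signals_misunderstanding and state.last_intent:
        # Repetition and correction after a detected misinterpretation.
        move = f"correct: I understood '{state.last_intent}'; please restate your request"
    else:
        ranked = sorted(candidates, key=lambda c: c.confidence, reverse=True)
        best = ranked[0]
        runner_up = ranked[1] if len(ranked) > 1 else None
        if runner_up and best.confidence - runner_up.confidence < ambiguity_margin:
            # Reiteration: put the ambiguous issue back into the conversation.
            move = f"clarify: did you mean '{best.intent}' or '{runner_up.intent}'?"
        elif best.confidence < accept_threshold:
            # An extra precaution beyond the quoted strategies: confirm before acting.
            move = f"confirm: you want '{best.intent}', is that right?"
        else:
            state.last_intent = best.intent
            move = f"act: {best.intent}"
    state.history.append(move)
    return move


# Example: two near-tied readings trigger a clarification question.
state = DialogState()
print(next_move([Interpretation("book a table", 0.55),
                 Interpretation("book a cab", 0.50)], False, state))

A real system would, of course, need to ground such decisions in richer dialog history and context rather than a single confidence score, which is precisely where the unsolved subtasks mentioned above lie.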
Currently, research in human communication is highly specialized, deeply
separated into a plethora of subfields that often fail to connect with each other.
There is a rough split between verbal communication (linguistics) and nonverbal communication, across subfields such as grammar, phonetics, semantics, pragmatics, paralanguage, and gestures, in addition to cognitive linguistics, psycholinguistics, sociolinguistics, and neurolinguistics, all with their own sub-subfields that barely
have anything to communicate to each other. This is likely to carry over its effects
to the application of human communication in the research area of AI or AmI.
Without a unified verbal and nonverbal framework, it would be difficult to grasp
formally what human communication entails based on a theoretically clear overall
approach as part of the progress in the field. The current reality is that progress is
rapid but seems to be ad hoc, driven by new techniques as they become available. However,
without doubt, there is still a vast unexplored zone in the area of human commu-
nication from a variety of perspectives. Besides, modeling of human communica-
tion theories in computer technology is still in its infancy; research shows that only a few verbal and nonverbal theories have been investigated and implemented in relation to
AmI, AI, and HCI as subareas of computing. Indeed, a complete model of
face-to-face conversation between an ECA system and a human user, with a full
facial animation (speech-driven facial gestures) as well as facial emotional displays
and facial explicit verbal displays and messages, seems to be unfeasible at the
current stage of research. Practical attempts—existing systems—do not go beyond
the recognition and generation of a particular facial gesture from a certain prosodic
feature of speech, a set of particular emotional displays, or general dynamics of the
face, as mentioned earlier. Moreover, existing systems cannot fully perceive and
produce speech in real time. Methods, techniques, and algorithms used in
lab-based conversational systems to perceive and produce spoken language have
not matured enough and thus do not enable an advanced form of speech commu-
nication with human users. Accordingly, there is a lot more to mimic to augment
the communicative capabilities of ECA systems. In view of that, there is a long way to
go before ECA researchers are able to capture language use in the full dimensionality of human communication.
Natural language is of high complexity. The way language is used is extremely
intricate, inherently spontaneous, largely unconscious, and dynamically contextual and situated. Language use involves a close coupling between high-level cognitive
processes and high-level sociocultural constructs. As such, it entails a combination
of psychological, neurological, intellectual, social, cultural, anthropological, and
historical dimensions. Thus, the ability to learn, understand, and produce language
is a complex process, regardless of whether the learner is a human or a nonhuman.
The Council of Europe (2000, pp. 8–9) states: ‘Language systems are of great
complexity and the language of a large, diversified, advanced society is never
completely mastered by any of its users. Nor could it be, since every language is in
continuous evolution in response to the exigencies of its use in communication.’
Language evolution is regarded as a process, a mutual shaping process where
language, on the one hand, and communication, culture, and society, on the other
hand, are shaped at the same time, in response to social development, to changing
cultural norms and practices, and to the exigencies of use in communication.
Indeed, as one manifestation, the communicative content of semiosis (a process that
involves signs including the production of meaning) changes to adapt to different
contexts. Therefore, there is no such thing as a stable semiosis, as signs and their meanings change through use and thus are never fixed for longer periods of time; they are socially, culturally, and historically situated. The use of signs and symbols in specific contexts influences and changes their meaning, and consequently the meaning of signs and symbols may not remain current or endure over long stretches of time.
Synchronic linguistic studies aim to describe a language as it exists at a given time
while diachronic studies trace a language’s historical development. But, as literature
on language systems shows, there has been no comprehensive description of any
language as a formal system for the expression of meaning, and none of the
attempts undertaken to establish a standard form has ever achieved exhaustive
detail. Nevertheless, while one shies away from foreseeing what the future era of
AmI will bring, it is certain to be a very different world.

References

Abawajy JH (2009) Human-computer interaction in ubiquitous computing environments. Int J
Pervasive Comput Commun 5(1):61–77
Abercrombie D (1968) Paralanguage. Br J Disord Commun 3:55–59
Agabra J, Alvarez I, Brezillon P (1997) Contextual knowledge based system: a study and design in
enology. In: Proceedings of the international and interdisciplinary conference on modeling and
using context (CONTEXT-97), Federal University of Rio de Janeiro, pp 351–362
Ahlsén E (2006) Introduction to neurolinguistics. John Benjamins Publishing Company,
Amsterdam/Philadelphia
Albas DC, McCluskey KW, Albas CA (1976) Perception of the emotional content of speech: a
comparison of two Canadian groups. J Cross Cult Psychol 7:481–490
Albrecht I, Haber J, Seidel H (2002) Automatic generation of non-verbal facial expressions from
speech. In: Proceedings of computer graphics international (CGI2002), pp 283–293
Andersen PA (2004) The complete idiot’s guide to body language. Alpha Publishing, Indianapolis
Andersen P (2007) Nonverbal communication: forms and functions. Waveland Press, Long Grove
Angus D, Smith A, Wiles J (2012) Conceptual recurrence plots: revealing patterns in human
discourse. IEEE Trans Visual Comput Graphics 18(6):988–997
Arbib MA (2003) The evolving mirror system: a neural basis for language readiness. In:
Christiansen M, Kirby S (eds) Language evolution: the states of the art. Oxford University
Press, Oxford, pp 182–200
Arbib MA (2005) From monkey-like action recognition to human language: an evolutionary
framework for neurolinguistics. Behavioral Brain Sci 28(2):105–124
Argyle M (1988) Bodily communication. International Universities Press, Madison
Argyle M, Cook M (1976) Gaze and mutual gaze. Cambridge University Press, Cambridge
Argyle M, Ingham R (1972) Gaze, mutual gaze, and proximity. Semiotica 6:32–49
Argyle M, Ingham R, Alkema F, McCallin M (1973) The different functions of gaze. Semiotica
7:19–32
Bahl LR, Baker JK, Cohen PS, Jelinek F, Lewis BL, Mercer RL (1978) Recognition of a
continuously read natural corpus. In: Proceedings of the IEEE international conference on
acoustics, speech and signal processing, Tulsa, Oklahoma, pp 422–424
Banich MT (1997) Breakdown of executive function and goal-directed behavior. In: Banich MT
(ed) Neuropsychology: the neural bases of mental function. Houghton Mifflin Company,
Boston, MA, pp 369–390
Bänninger-Huber E (1992) Prototypical affective microsequences in psychotherapeutic interactions.
Psychother Res 2:291–306
Beattie G (1978) Sequential patterns of speech and gaze in dialogue. Semiotica 23:29–52
Beattie GA (1981) A further investigation of the cognitive interference hypothesis of gaze patterns.
Br J Soc Psychol 20(4):243–248
Beavin Bavelas J, Coates L, Johnson T (2002) Listener responses as a collaborative process: the
role of gaze. J Commun 52:566–580
Benoît C, Mohamadi T, Kandel S (1994) Effects of phonetic context on audio-visual intelligibility
of French. J Speech Hear Res 37:1195–1203
Beskow J, Granström B, House D (2006) Visual correlates to prominence in several expressive
modes. In: Proceedings of interspeech 2006—ICSLP, Pittsburg, pp 1272–1275
Binnie CA, Montgomery AA, Jackson PL (1974) Auditory and visual contributions to the
perception of consonants. J Speech Hear Res 17(4):619–630
Bledsoe WW, Browning I (1959) Pattern recognition and reading by machine. Papers presented at
the eastern joint IRE-AIEE-ACM computer conference on—IRE-AIEE-ACM’59 (Eastern),
ACM Press, New York, pp 225–232, 1–3 Dec 1959
Boë LJ, Vallée N, Schwartz JL (2000) Les tendances des structures phonologiques: le poids de la
forme sur la substance. In: Escudier P, Schwartz JL (eds) La parole, des modèles cognitifs aux
machines communicantes—I. Fondements, Hermes, Paris, pp 283–323
Brand M (1999) Voice puppetry. In: Proceedings of SIGGRAPH 1999, pp 21–28
Bucholtz M, Hall K (2005) Identity and interaction: a sociocultural linguistic approach. Discourse
Stud 7(4–5):585–614
Bui TD, Heylen D, Nijholt A (2004) Combination of facial movements on a 3D talking head. In:
Proceedings of computer graphics international
Bull PE (1987) Posture and gesture. Pergamon Press, Oxford
Burgoon JK, Buller DB, Woodall WG (1996) Nonverbal communication: the unspoken dialogue.
McGraw-Hill, New York
Burr V (1995) An introduction to social constructivism. Sage, London
Canale M, Swain M (1980) Theoretical bases of communicative approaches to second language
teaching and testing. Appl Linguist 1:1–47
Carr P (2003) English phonetics and phonology: an introduction. Blackwell Publishing,
Massachusetts
Cassell J (1989) Embodied conversation: integrating face and gesture into automatic spoken
dialogue systems. In: Luperfoy S (ed) Spoken dialogue systems. MIT Press, Cambridge
Cassell J, Douville B, Prevost S, Achorn B, Steedman M, Badler N, Pelachaud C (1994a)
Modeling the interaction between speech and gesture. In: Ram A, Eiselt K (eds) Proceedings of
the 16th annual conference of the cognitive science society. Lawrence Erlbaum Associates,
Publishers, Hillsdale, pp 153–158
Cassell J, Pelachaud C, Badler N, Steedman M, Achorn B, Becket T, Douville B, Prevost S,
Stone M (1994b) Animated conversation: rule-based generation of facial expressions, gesture
and spoken intonation for multiple conversational agents. In: Proceedings of SIGGAPH, ACM
Special Interest Group on Graphics, pp 413–420
Cassell J, Bickmore T, Billinghurst M, Campbell L, Chang K, Vilhjálmsson H, Yan H (1999)
Embodiment in conversational interfaces: reactions. In: Proceedings of the SIGCHI conference
on human factors in computing systems: the CHI is the Limit, ACM, Pittsburgh, pp 520–527
Cassell J, Sullivan J, Prevost S, Churchill E (eds) (2000) Embodied conversational agents. MIT
Press, Cambridge
Cassell J, Bickmore T, Campbell L, Vilhjalmsson H, Yan H (2001) More than just a pretty face:
conversational protocols and the affordances of embodiment. Knowl-Based Syst 14:55–64
Castiello U, Paulignan Y, Jeannerod M (1991) Temporal dissociation of motor responses and
subjective awareness. Brain 114:2639–2655
Cavé C, Guaïtella I, Bertrand R, Santi S, Harlay F, Espesser R (1996) About the relationship
between eyebrow movements and F0 variations. In: Proceedings of international conference on
spoken language processing, ICSLP’96, Philadelphia, PA, pp 2175–2178
Chomsky N (1957) Syntactic structures. Mouton, The Hague
Chomsky N (1965) Aspects of the theory of syntax. MIT Press, Cambridge
Chomsky N (2006) Language and mind. Cambridge University Press, Cambridge
Chovil N (1991) Discourse-oriented facial displays in conversation. Research on Language and
Social Interaction 25:163–194
Clark H, Marshall C (1981) Definite reference and mutual knowledge. In: Joshi A, Webber B, Sag I (eds) Elements of discourse understanding. Cambridge University Press, Cambridge
Clark JE, Yallop C, Fletcher J (2007) An introduction to phonetics and phonology, 3rd edn.
Oxford, Blackwell
Costa M, Lavagetto F, Chen T (2001) Visual prosody analysis for realistic motion synthesis of 3D
head models. In: Proceedings of international conference on augmented, virtual environments
and 3D imaging, pp 343–346
Council of Europe (2000) Common European framework of reference for languages: learning,
teaching, assessment. Language Policy Unit, Strasbourg
Croft W, Cruse DA (2004) Cognitive linguistics. Cambridge University Press, Cambridge
Dahan D, Bernard JM (1996) Interspeaker variability in emphatic accent production in French.
Lang Speech 39(4):341–374
De Vito J (2002) Human essentials of human communication. Allyn & Bacon, Boston, MA
Dohen M (2009) Speech through the ear, the eye, the mouth and the hand. In: Esposito A,
Hussain A, Marinaro M, Martone R (eds) Multimodal signals: cognitive and algorithmic
issues. Springer, Berlin, Heidelberg, pp 24–39
Dohen M, Loevenbruck H (2004) Pre-focal rephrasing, focal enhancement and post-focal
deaccentuation in French. In: Proceedings of the 8th ICSLP, pp 1313–1316
Dohen M, Loevenbruck H (2005) Audiovisual production and perception of contrastive focus in
French: a multispeaker study. In: Proceedings of interspeech, pp 2413–2416
Dohen M, Loevenbruck H, Cathiard MA, Schwartz JL (2004) Visual perception of contrastive
focus in reiterant French speech. Speech Commun 44:155–172
Dohen M, Loevenbruck H, Hill H (2006) Visual correlates of prosodic contrastive focus in French:
description and inter-speaker variabilities. In: Proceedings of speech prosody, pp 221–224
Dreyfus H (2001) On the Internet. Routledge, London
Duncan S (1972) Some signals and rules for taking speaker turns in conversation. J Pers Soc Psychol 23:283–292
Ekman P (1979) About brows: emotional and conversational signals. In: von Cranach M, Foppa K,
Lepenies W, Ploog D (eds) Human ethology: claims and limits of a new discipline. Cambridge
Press University, Cambridge
Ekman P (1982) Emotions in the human Face. Cambridge University Press, Cambridge
Ekman P (1994) All emotions are basic. In: Ekman P, Davidson RJ (eds) The nature of emotion:
fundamental questions. Oxford University Press, Oxford
Ekman P, Friesen WV (1969) The repertoire of nonverbal behavior: categories, origins, usage, and coding. Semiotica 1:49–98
Evans V, Green M (2006) Cognitive linguistics: an introduction. Edinburgh University Press,
Edinburgh
Exline R, Winters L (1965) Effects of cognitive difficulty and cognitive style on eye contact in
interviews. In: Proceedings of the eastern psychological association, Atlantic City, NJ, pp 35–41
Fagot C, Pashler H (1992) Making two responses to a single object: exploring the central
bottleneck. J Exp Psychol Hum Percept Perform 18:1058–1079
Fehr BJ, Exline RV (1987) Social visual interaction: a conceptual and literature review. In:
Siegman AW, Feldstein S (eds) Nonverbal behavior and communication. Lawrence Erlbaum
Associates, Hillsdale, pp 225–326
Feyereisen P (1997) The competition between gesture and speech production in dual-task
paradigms. J Mem Lang 36(1):13–33
Field J (2004) Psycholinguistics: the key concepts. Routledge, London
Finch G (2000) Linguistic terms and concepts. Palgrave Macmillan, New York
Fisher CG (1968) Confusions among visually perceived consonants. J Speech Hear Res 11(4):
796–804
Fisher K (1997) Locating frames in the discursive universe. Sociological Research Online 2(3):
U40–U62
Floyd K, Guerrero LK (2006) Nonverbal communication in close relationships. Lawrence Erlbaum
Associates, Mahwah
Foucault M (1972) The archaeology of knowledge. Routledge, London
Fox A (2000) Prosodic features and prosodic structures: the phonology of suprasegmentals. OUP,
Oxford
Freeman DE, Freeman YS (2004) Essential linguistics: what you need to know to teach reading,
ESL, spelling, phonics, and grammar. Heinemann, Portsmouth, NH
Freitas-Magalhães A (2006) The psychology of human smile. University Fernando Pessoa Press,
Oporto
Fridlund AJ (1994) Human facial expression: an evolutionary view. Academic Press, San Diego
Fridlund AJ, Ekman P, Oster H (1987) Facial expressions of emotion. In: Siegman A, Feldstein S
(eds) Nonverbal behavior and communication. Lawrence Erlbaum, Hillsdale
Garman M (1990) Psycholinguistics: central topics. Routledge, London
Geeraerts D, Cuyckens H (eds) (2007) The Oxford handbook of cognitive linguistics. Oxford
University Press, New York
Gergen K (1985) The social constructionist movement in modern social psychology. Am Psychol
40(3):266–275
Goldin-Meadow S, Butcher C (2003) Pointing toward two-word speech in young children. In:
Kita S (ed) Pointing: where language, culture, and cognition meet. Lawrence Erlbaum
Associates, Hillsdale, pp 85–107
Graf HP, Cosatto E, Strom V, Huang FJ (2002) Visual prosody: facial movements accompanying
speech. In: Proceedings of AFGR, pp 381–386
Grant KW, Seitz PF (2000) The use of visible speech cues for improving auditory detection of
spoken sentences. J Acoust Soc Am 108(3):1197–1208
Gudykunst WB, Ting-Toomey S (1988) Culture and interpersonal communication. Sage
Publications Inc, California
Guerrero LK, DeVito JA, Hecht ML (eds) (1999) The nonverbal communication reader. Waveland
Press, Long Grove, Illinois
Gumperz J (1968) The speech community. In: International encyclopedia of the social sciences.
Macmillan, London, pp 381–86. Reprinted In: Giglioli PP (ed) Language and Social Context.
Penguin, London, 1972, p 220
Gumperz J, Cook-Gumperz J (2008) Studying language, culture, and society: sociolinguistics or
linguistic anthropology? J Sociolinguistics 12(4):532–545
Gunes H, Piccardi M (2005) Automatic visual recognition of face and body action units. In:
Proceedings of the 3rd international conference on information technology and applications,
Sydney, pp 668–673
Gutierrez-Osuna R, Kakumanu PK, Esposito A, Garcia ON, Bojorquez A, Castillo JL, Rudomin I
(2005) Speech-driven facial animation with realistic dynamics. IEEE Trans Multimedia, 7(1)
Hall TA (2001) Phonological representations and phonetic implementation of distinctive features.
Mouton de Gruyter, Berlin and New York
Halle M (1983) On distinctive features and their articulatory implementation. Nat Lang Linguist
Theory 91–105
Halliday MAK, Hasan R (1976) Cohesion in English. Longman Publication Group, London
Hanna JL (1987) To Dance is human: a theory of nonverbal communication. University of
Chicago Press, Chicago
Hargie O, Dickson D (2004) Skilled interpersonal communication: research, theory and practice.
Routledge, Hove
Hayes PJ, Reddy RD (1983) Steps toward graceful interaction in spoken and written man-machine
communication. Int J Man Mach Stud I(19):231–284
Heylen D (2005) Challenges ahead: head movements and other social acts in conversations. In:
Halle L, Wallis P, Woods S, Marsella S, Pelachaud C, Heylen D (eds) AISB 2005, Social
intelligence and interaction in animals, robots and agents. The Society for the Study of
Artificial Intelligence and the Simulation of Behavior, Hatfield, pp 45–52
Heylen D, Kopp S, Marsella S, Pelachaud C, Vilhjálmsson H (2008) The next step Towards a
functional markup language. In: Proceedings of Intelligent Virtual Agents. Springer,
Heidelberg
Holden G (2004) The origin of speech. Science 303:1316–1319
Hollender D (1980) Interference between a vocal and a manual response to the same stimulus’. In:
Stelmach G, Requin J (eds) Tutorials in motor behavior. North-Holland, Amsterdam, pp 421–432
Honda K (2000) Interactions between vowel articulation and F0 control. In: Fujimura BDJO,
Palek B (eds) Proceedings of linguistics and phonetics: item order in language and speech
(LP’98)
Huang FJ, Chen T (1998) Real-time lip-synch face animation driven by human voice. In: IEEE
workshop on multimedia signal processing, Los Angeles, California
Hymes D (1971) Competence and performance in linguistic theory. In: Language acquisition:
models and methods, pp 3–28
Hymes D (2000) On communicative competence. In: Duranti A (ed.) Linguistic anthropology:
a reader. Blackwell, Malden, pp 53–73
Iverson J, Thelen E (1999) Hand, mouth, and brain: the dynamic emergence of speech and gesture.
J Consciousness Stud 6:19–40
Iverson J, Thelen E (2003) The hand leads the mouth in ontogenesis too. Behavioral Brain Science
26(2):225–226
Jacko A, Sears A (eds) (2003) The human-computer interaction handbook: fundamentals, evolving
technologies, and emerging applications. Lawrence Erlbaum Associates, Hillsdale
Jakobson R, Fant G, Halle M (1976) Preliminaries to speech analysis: the distinctive features and
their correlates. MIT Press, Cambridge
Johnson FL (1989) Women’s culture and communication: an analytical perspective. In: Lont CM,
Friedley SA (eds) Beyond boundaries: sex and gender diversity in communication. George
Mason University Press, Fairfax, pp 301–316
Kaiser S, Wehrle T (2001) Facial expressions as indicator of appraisal processes. In: Scherer KR,
Schorr A, Johnstone T (eds) Appraisal theories of emotions: theories, methods, research.
Oxford University Press, New York, pp 285–300
Kapur A, Kapur A, Virji-Babul N, Tzanetakis G, Driessen PF (2005) Gesture-based affective
computing on motion capture data. In: Proceedings of the 1st international conference on
affective computing and intelligent interaction, Beijing, pp 1–7
Karpinski M (2009) From Speech and Gestures to Dialogue Acts. In: Esposito A, Hussain A,
Marinaro M, Martone R (eds) Multimodal signals: cognitive and algorithmic issues. Springer,
Berlin, pp 164–169
Kendon A (1967) Some functions of gaze direction in social interaction. Acta Psychol 26:1–47
Kendon A (1980) Gesticulation and speech: two aspects of the process of utterance. In: Key MR
(ed) The relationship of verbal and nonverbal communication. Mouton, The Hague, pp 207–227
Kendon A (1990) Conducting interaction: patterns of behavior in focused encounters. Cambridge
University Press, New York
Kendon A (1997) Gesture. Ann Rev Anthropoly 26:109–128
Kendon A (2004) Gesture: visible action as utterance. Cambridge University Press, Cambridge
Kingston J (2007) The phonetics-phonology interface. In: DeLacy P (ed) The handbook of
phonology. Cambridge University Press, Cambridge, pp 253–280
Kita S (ed) (2003) Pointing: where language, culture, and cognition meet. Lawrence Erlbaum
Associates, Hillsdale
Kleck R, Nuessle W (1968) Congruence between the indicative and communicative functions of
eye-contact in interpersonal relations. Br J Soc Clin Psychol 7:241–246
Knapp ML, Hall JA (1997) Nonverbal communication in human interaction. Harcourt Brace, New
York
Knapp ML, Hall JA (2007) Nonverbal communication in human Interaction. Wadsworth, Thomas
Learning
Koike D (1989) Pragmatic competence and adult L2 acquisition: speech acts in interlanguage. The
Modern Language Journal 73(3):279–289
Kopp S, Krenn B, Marsella SC, Marshall AN, Pelachaud C, Pirker H, Thórisson KR, Vilhjálmsson
HH (2006) Towards a common framework for multimodal generation: the behavior markup
language. In: Gratch J, Young M, Aylett RS, Ballin D, Olivier P (eds) IVA 2006, LNCS, vol
4133. Springer, Heidelberg, pp 205–217
Kroy M (1974) The conscience, a structural theory. Keter Press Enterprise, Israel
Kshirsagar S, Magnenat-Thalmann N (2000) Lip synchronization using linear predictive analysis.
In: Proceedings of IEEE international conference on multimedia and exposition, New York
Langacker RW (1987) Foundations of cognitive grammar, theoretical prerequisites, vol 1. Stanford
University Press, Stanford
Langacker RW (1991) Foundations of cognitive grammar, descriptive application, vol 2. Stanford
University Press, Stanford
Langacker RW (2008) Cognitive grammar: a basic introduction. Oxford University Press, New
York
Lass R (1998) Phonology: an introduction to basic concepts. Cambridge University Press,
Cambridge (2000)
Lee SP, Badler JB, Badler NI (2002) Eyes alive. In: Proceedings of the 29th annual conference
on computer graphics and interactive techniques 2002, ACM Press, New York, pp 637–644
Leech G (1983) Principles of Pragmatics. Longman, London
Levelt WJM, Richardson G, Heij WL (1985) Pointing and voicing in deictic expressions. J Mem
Lang 24:133–164
Lewis J (1991) Automated lip-sync: background and techniques. J Visual Comput Animation
2:118–122
Lippi-Green R (1997) The standard language myth. English with an accent: language,
ideology, and discrimination in the United States. Routledge, London, pp 53–62
Littlejohn SW, Foss KA (2005) Theories of human communication. Thomson Wadsworth,
Belmont
Lyons J (1968) Introduction to theoretical linguistics. Cambridge University Press, London
Lyons J (1977) Semantics, vol 2. Cambridge University Press, London
MacLachlan J (1979) What people really think of fast talkers. Psychol Today 13:113–117
MacLeod A, Summerfield AQ (1987) Quantifying the contribution of vision to speech perception
in noise. Br J Audiol 21:131–141
Mairesse F (2011) Controlling user perceptions of linguistic style: trainable generation of
personality traits. Comput Linguist 37(3):455–488
Marcus MP, Santorini B, Marcinkiewicz MA (1993) Building a large annotated corpus of English:
the Penn Treebank. Comput Linguist 19(2):313–330
McAllister DF, Rodman RD, Bitzer DL, Freeman AS (1997) Lip synchronization of speech. In:
Proceedings of AVSP 1997
McGraw-Hill Science and Technology Encyclopedia (2007) Artificial intelligence, viewed 21 July
2012. http://www.answers.com/topic/artificial-intelligence
McGurk H, MacDonald J (1976) Hearing lips and seeing voices. Nature 264:746–748
McNeill D (1992) Hand and mind. University of Chicago Press, Chicago
Mey JL (1993) Pragmatics: an introduction. Blackwell, Oxford
Moscovici S (1984) The Phenomenon of social representations. In: Farr R, Moscovici S
(eds) Social representations. Cambridge University Press, Cambridge, pp 3–69
Myers-Scotton C (2006) Multiple voices: an introduction to bilingualism. Blackwell Publishing,
Australia
Nakano YI, Reinstein G, Stocky T, Cassell J (2003) Towards a model of face-to-face grounding.
In: ACL 2003: Proceedings of the 41st annual meeting on association for computational
linguistics, association for computational linguistics, vol 1. Morristown, NJ, pp 553–561
Nichols J (1984) Functional theories of grammar. Annu Rev Anthropol 13:97–117
Ohala JJ (1984) An ethological perspective on common cross-language utilization of F0 of voice.
Phonetica 41:1–16
Ottenheimer HJ (2007) The anthropology of language: an introduction to linguistic anthropology.
Thomson Wadsworth, Kansas State
Pantic M, Rothkrantz LJM (2003) Toward an affect sensitive multimodal human-computer
interaction. Proc IEEE 91(9):1370–1390
Paolillo JC (2002) Analyzing Linguistic variation: statistical models and methods. CSLI
Publications, Stanford, CA
Pelachaud C, Badler N, Steedman M (1996) Generating facial expressions for speech. Cogn Sci 20
(1):1–46
Phillips J, Tan C (2010) ‘Competence’, the literary encyclopedia, viewed 12 July 2012. http://
courses.nus.edu.sg/course/elljwp/competence.htm
Pizzuto E, Capobianco M, Devescovi A (2005) Gestural-vocal deixis and representational skills in
early language development. Interaction Studies 6(2):223–252
Rabiner L (1989) A tutorial on hidden Markov models and selected applications in speech
recognition. Proc IEEE 77(2):257–286
Reisberg D, McLean J, Goldfield A (1987) Easy to hear but hard to understand: a lipreading
advantage with intact auditory stimuli. In: Dodd B, Campbell R (eds) Hearing by eye: the
psychology of lip-reading. Lawrence Erlbaum Associates, Hillsdale, pp 97–114
Robert-Ribes J (1995) Modèles d’intégration audiovisuelle de signaux linguistiques: de la
perception humaine à la reconnaissance automatique des voyelles. Ph.D. thesis, Institut
National Polytechnique de Grenoble
Romaine S (1994) Language in society: an introduction to sociolinguistics. Oxford UP, Oxford
Rowe BM, Levine DP (2006) A Concise introduction to linguistics. Pearson Education, USA
Russell JA, Fernández-Dols JM (1997) What does a facial expression mean? In: Russel JA,
Fernández-Dols JM (eds) The psychology of facial expression. Cambridge University Press,
Cambridge, pp 3–30
Salvachua J, Huecas G, Rodriguez B, Quemada J (2002) Modelling a distributed multimedia
conference with rdf. In: Proceeding of the international semantic web conference, Sardinia,
Italia
Samtani P, Valente A, Johnson WL (2008) Applying the SAIBA framework to the tactical
language and culture training system. In: Parkes P, Parsons M (eds) The 7th international
conference on autonomous agents and multiagent systems (AAMAS 2008), Estoril, Portugal
Scherer KR (1992) What does facial expression express? In: Strongman K (ed) International
review of studies on emotion, vol 2, pp 139–165
Scherer KR (1994) Plato’s legacy: relationships between cognition, emotion, and motivation,
University of Geneva
Schmidt A (2005) Interactive context-aware systems interacting with ambient intelligence. In:
Riva G, Vatalaro F, Davide F, Alcañiz M (eds) Ambient intelligence: the evolution of
technology, communication and cognition towards the future of human-computer interaction.
IOS Press, Amsterdam, pp 159–178
Schwartz JL (2004) La parole multisensorielle: Plaidoyer, problèmes, perspective. Actes des
XXVes Journées d’Etude sur la Parole JEP 2004, pp xi–xviii
Schwartz JL, Robert-Ribes J, Escudier P (1998) Ten years after summerfield: a taxonomy of
models for audiovisual fusion in speech perception. In: Campbell R, Dodd BJ, Burnham D
(eds) Hearing by eye II: advances in the psychology of speech reading and auditory-visual
speech. Psychology Press, Hove, pp 85–108
Schwartz JL, Berthommier F, Savariaux C (2004) Seeing to hear better: evidence for early
audio-visual interactions in speech identification. Cognition 93:B69–B78
Segerstrale U, Molnar P (eds) (1997) Nonverbal communication: where Nature meets culture.
Lawrence Erlbaum Associates, Mahwah
Short JA, Williams E, Christie B (1976) The social psychology of telecommunications. Wiley,
London
Siegman AW, Feldstein S (eds) (1987) Nonverbal behavior and communication. Lawrence
Erlbaum Associates, Hillsdale
Smid K, Pandzic IS, Radman V (2004) Autonomous speaker agent. In: Computer animation and
social agents conference CASA 2004, Geneva, Switzerland
Sperber D, Wilson D (1986) Relevance: communication and cognition. Blackwell, Oxford
Stemmer B, Whitaker HA (1998) Handbook of neurolinguistics. Academic Press, San Diego, CA
Stetson RH (1951) Motor phonetics: a study of speech movements in action. Amsterdam,
North-Holland
Sumby WH, Pollack I (1954) Visual contribution to speech intelligibility in noise. J Acoust Soc
Am 26(2):212–215
Summerfield AQ (1979) Use of visual information for phonetic perception. Phonetica 36:314–331
Summerfield Q (1987) Comprehensive account of audio-visual speech perception. In: Dodd B,
Campbell R (eds) Hearing by eye: the psychology of lip-reading. Lawrence Erlbaum
Associates, Hillsdale, pp 3–51
Takimoto M (2008) The effects of deductive and inductive instruction on the development of
language learners’ pragmatic competence. Mod Lang J 92(3):369–386
Tarjan RE (1987) Algorithm design. Commun ACM 30(3):205–212
ten Bosch L, Oostdijk N, de Ruiter JP (2004) Durational aspects of turn-taking in spontaneous
face-to-face and telephone dialogues. In Sojka P, Kopecek I, Pala K (eds) TSD 2004, LNCS,
vol 3206. Springer, Heidelberg, pp 563–570
ter Maat M, Heylen D (2009) Using context to disambiguate communicative signals. In:
Esposito A, Hussain A, Marinaro M, Martone R (eds) Multimodal signals, LNAI 5398.
Springer, Berlin, pp 164–169
Thorisson KG (1997) An embodied humanoid capable of real-time multimodal dialogue with People.
In: The 1st international conference on autonomous agents, ACM, New York, pp 536–537
Truss L (2003) Eats, shoots and leaves—the zero tolerance approach to punctuation. Profile Books
Ltd, London
Turing AM (1950) Computing machinery and intelligence. Mind 59(236):433–460
van Hoek K (2001) Cognitive linguistics. In: Wilson RA, Keil FC (eds) The MIT encyclopedia of
the cognitive sciences
Vilhjálmsson HH (2009) Representing communicative function and behavior in multimodal
communication. In: Esposito A, Hussain A, Marinaro M, Martone R (eds) Multimodal signals:
cognitive and algorithmic issues. Springer, Berlin, pp 47–59
Vilhjálmsson HH, Stacy M (2005) Social performance framework. In: Workshop on modular
construction of human-like intelligence at the 20th national AAAI conference on artificial
intelligence, AAAI
Vilhjálmsson HH, Cantelmo N, Cassell J, Chafai NE, Kipp M, Kopp S, Mancini M, Marsella SC,
Marshall AN, Pelachaud C, Ruttkay Z, Thórisson KR, van Welbergen H, van der Werf RJ
(2007) The behavior markup language: recent Developments and challenges. In: Pelachaud C,
Martin JC, André E, Chollet G, Karpouzis K, Pelé D (eds) IVA 2007, LNCS, vol 4722.
Springer, Heidelberg, pp 99–111
Volterra V, Caselli MC, Capirci O, Pizzuto E (2005) Gesture and the emergence and development
of language. In: Tomasello M, Slobin D (eds) Elizabeth Bates: a festschrift. Lawrence Erlbaum
Associates, Mahwah, pp 3–40
Vyvyan E (2007) A glossary of cognitive linguistics. Edinburgh University Press, Edinburgh
Vyvyan E, Green M (2006) Cognitive linguistics: an introduction. Edinburgh University Press,
Edinburgh
Vyvyan E, Bergen B, Zinken J (2007) The Cognitive linguistics reader. Equinox, London
Wardhaugh R (2005) An introduction to sociolinguistics. Wiley, Hoboken
Watt R (1995) An examination of the visual aspects of human facial gesture. In: Emmot S
(ed) Information superhighways: multimedia users and futures. Academic Press, London
Weizenbaum J (1966) ELIZA—a computer program for the study of natural language
communication between man and machine. Commun ACM 9(1):36–45
Yehia H, Kuratate T, Vatikiotis-Bateson E (2000) Facial animation and head motion driven by
speech acoustics. In Hoole P (ed) 5th Seminar on speech production: models and data, Kloster
Seeon
Zoric G (2005) Automatic lip synchronization by speech signal analysis. Master thesis, Faculty of
Electrical Engineering and Computing, University of Zagreb
Zoric G, Smid K, Pandzic IS (2009) Towards facial gestures generation by speech signal analysis
using HUGE architecture. In: Esposito A, Hussain A, Marinaro M, Martone R (eds) Multimodal
signals: cognitive and algorithmic issues. Springer, Berlin, pp 112–120
Chapter 8
Affective Behavioral Features of AmI:
Affective Context-Aware, Emotion-Aware,
Context-Aware Affective, and Emotionally
Intelligent Systems

8.1 Introduction

AmI aims to take the emotional dimension of users into account when designing
applications and environments. One of the cornerstones of AmI is the adaptive and
responsive behavior of systems to the user’s emotional states and emotions,
respectively. Technology designs, which can touch humans in sensible ways, are
essential in addressing affective needs and ensuring pleasant and satisfying user
interaction experiences. In recent years there has thus been a rising tendency in AI
and AmI to enhance HCI by humanizing computers, making them tactful, sympathetic, and caring with respect to human feelings. One of the current issues
in AI is to create methods for efficient processing (in-depth, human-like analysis) of
emotional states or emotions in humans. Accordingly, a number of frameworks that
integrate affective computing (a research area in AI) and AmI have recently been
developed and applied across a range of domains. Including affective computing
paradigm within AmI is an interesting approach; it contributes to affective
context-aware and emotion-aware systems. Therefore, AmI researchers are explor-
ing human emotions and emotional intelligence (as abilities related to emotion) and
advancing research on emotion-aware and affective context-aware technology, by
amalgamating fundamental theoretical models of emotion, emotion-aware HCI, and
affective context-aware HCI. The importance and implication of this research
emanates from its potential to enhance the quality of people’s life. The premise is that
affective or emotion-aware applications can support users in their daily activities and
influence their emotions in a positive way, by producing emotional responses that
have positive impact on the users’ emotion and help them to improve their emotional
intelligence, i.e., abilities to understand, evaluate, and manage their emotions and
those of others, as well as to integrate emotions to facilitate their cognitive activities
or task performance. Another interesting related aspect of AmI is the system feature
of social intelligence. As AmI is envisioned to become an essential part of people’s
social life, AmI systems should support social processes of human users and be
competent agents in social interactions. Socially intelligent systems invoke positive
feelings in human users, by eliciting positive emotions and pleasant user experiences.
Positive emotions can be elicited by subjective experiences of interaction in terms of
smoothness, intuitiveness, and richness, and emotional states can be triggered by
subjective, socially situated perception of aesthetics in terms of the affective quality
of computational artifacts and environments. The focus in this chapter is on emotions
as a key component of socially intelligent behavior of AmI systems. The role of
emotions as such on task performance—support of (cognitive) activities—is
addressed in the next chapter. Furthermore, the user’s emotional state is a key
component of the user context that affective context-aware systems should accurately
identify, meaningfully interpret, and efficiently reason about to determine the most
appropriate response and act upon it, thereby delivering proper services that meet the
motivation behind those emotional states. Affective context-aware systems aim to
support users by intelligently adapting to their emotional states as articulated through
spoken and gestured indications. Verbal and nonverbal behavior is considered the
most reliable source such systems can use to capture emotional elements of context,
by reading multimodal sources. This requires perceptual and multimodal user
interfaces. Affective display can provide a great deal of information as implicit input to affective context-aware systems, as it indicates the users’ emotional states. It
constitutes a set of emotion channels carrying affective information, including vocal
cues, facial cues, and gestural cues.
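As a purely illustrative sketch of how such emotion channels might be combined, the Python fragment below performs a simple weighted late fusion of per-channel valence and arousal estimates. The channel names, value ranges, weights, and the assumption that each hypothetical recognizer outputs a (valence, arousal) pair are assumptions of this sketch, not the method of any particular system discussed here.

from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class ChannelEstimate:
    valence: float  # negative (-1) .. positive (+1)
    arousal: float  # relaxed (-1) .. aroused (+1)
    weight: float   # how much this channel is trusted at the moment


def fuse_affect(channels: Dict[str, ChannelEstimate]) -> Tuple[float, float]:
    """Combine per-channel cues into a single affective-state estimate."""
    total = sum(c.weight for c in channels.values())
    if total == 0:
        return 0.0, 0.0  # no usable evidence: default to neutral
    valence = sum(c.valence * c.weight for c in channels.values()) / total
    arousal = sum(c.arousal * c.weight for c in channels.values()) / total
    return valence, arousal


# Example: facial cues dominate while gesture tracking is momentarily unreliable.
estimate = fuse_affect({
    "vocal":   ChannelEstimate(valence=-0.2, arousal=0.6, weight=0.3),
    "facial":  ChannelEstimate(valence=-0.5, arousal=0.7, weight=0.6),
    "gesture": ChannelEstimate(valence=0.0,  arousal=0.4, weight=0.1),
})
print(estimate)  # approximately (-0.36, 0.64): negative valence, high arousal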
Currently, there is a great variety of theoretical models of emotions that can
inform the design of affective context-aware and emotion-aware applications, and
there are different technologies that can be used for their implementation, including
sensor technologies, pattern recognition techniques, modeling techniques, and
query languages. These have made it possible to computationally detect, model,
analyze, and reason about emotional states in humans. This is related to affective
computing, which is an area that works on the detection of and response to user’s
emotions. However, the design and implementation of affective context-aware and
emotion-aware applications face many challenges, many of which are technical and
some of which are what might be described as behavioral and sociocultural. It is
recognized that emotions are culture-dependent and thus there is no such thing as a
universal form of emotion expression—be it through emotiveness, emotional pro-
sodic features of speech, or affective display. Each culture has its own conventions
of communication, and hence individuals differ on the basis of their cultures and
languages as to expressing emotions. In relation to this, perception of interaction
and aesthetics is affected by individual, socially situated interpretation. Adding to
this are the challenges associated with system engineering, design, and modeling, as
addressed in the previous chapters.
The intent of this chapter is to examine and discuss the different aspects and
forms of the affective behavior of AmI systems, as well as to highlight the role of
affective computing as a research area of AI in AmI in advancing the field of AmI
with respect to emotionally human-inspired applications. Examples of HCI
application scenarios revealing important emerging trends in this research area
include: affective context-aware, emotion-aware, context-aware affective, and
emotionally intelligent systems.

8.2 Emotion

8.2.1 Definitional Issues

The scientific study of emotion (nonverbal aspects) dates back to the late 1800s—
with Darwin’s (1872) earliest and most widely recognized work on emotional
expressions in humans. Emotion has been extensively researched and widely dis-
cussed. It has been an important topic of study throughout most of the history of
psychology (Lazarus 1991). However, after more than a century of scientific
research and theory development, there is a sharp disagreement on which traits
define the phenomenon of human emotion. There is still no definite definition of
emotion. The term is used inconsistently. And dictionary definitions of many terms
associated with the emotional system demonstrate how difficult it is to clearly
articulate what is meant by emotion. It has been perceived differently by different scholars in terms of what dimensions it precisely consists of. Scientists find it
very difficult to agree on the definition of emotion, although there is some con-
sensus that emotions are constituted by different components (Kleinginna and
Kleinginna 1981). There is indeed no way to completely describe an emotion by
knowing some of its components. Nevertheless, some psychologists have attempted
to converge on some key common aspects of emotions or rather the emotional
complex. In general, emotion can be described as a complex, multidimensional
experience of an individual’s state of mind triggered by both external influences as
well as internal changes. In psychology, emotion often refers to complex, subjective
experiences involving many components, including cognitive, arousal, expressive,
organizing, and physical, as well as highly subjective meanings. Emotions are
induced affective states (Russell 2003) that typically arise as reactions to important
situational events in one’s environment (Reeve 2005). They arise spontaneously in
response to a stimulus event and biochemical changes and are accompanied by
(psycho)physiological changes, e.g., increased heartbeat and outward manifestation
(external expression). With different degrees of intensity, individuals often behave
in certain ways as a direct result of their emotional state; hence, behavior is con-
sidered to be essential to emotion.
Emotion is very complex. This is manifested in what it involves as components
of and as a program in the brain. Emotions are biologically regulated by the
executive functions of the prefrontal cortex and involve important interactions
between several brain areas, including limbic system and cerebral cortex which has
multiple connections with the hypothalamus, thalamus, amygdale, and other limbic
system structures (Passer and Smith 2006). Neural structures involved in the cog-
nitive process of emotion operate biochemically, involving various neurotransmitter
substances that activate the emotional programs residing in the brain (Ibid).
Furthermore, emotion involves such factors as personality, motivation, mood,
temperament, and disposition in the sense of a state of readiness or a tendency to
behave in a specific way. It is a transient state of mind encompassing various
dynamic emotional processes evoked by experiencing (perceiving) different
sensations as a means to cope with the environment. Keltner and Haidt (1999)
describe emotions as dynamic processes that mediate the organism’s relation to a
continually changing social environment. Emotions orchestrate how we react
adaptively to the external environment, especially to the important events in our
lives. Specifically, emotional processes entail establishing, maintaining, or dis-
rupting the relation between the organism and the environment on matters of central
relevance to the individual (Campos et al. 1989). Emotions are thus strategies by which individuals engage with the world (Solomon 1993).

8.2.2 Componential Patterning Approach

Modern emotion theories propose that emotion is an experience of a mental state
typified by the emotional complex—chains of events triggered by certain stimuli.
That is, once activated (cognitive appraisal processing over perception), emotions
arouse the body into action, generate motivational state, surface as expressive
behavior (overtly), and produce feelings. However, these chains of events are
interpreted differently by different theorists. That is, theorists in psychology tend to
diverge as to what precisely comprise the emotional complex. For David Myers, for
example, emotion involves ‘physiological arousal, expressive behaviors, and con-
scious experience’ (Myers 2004) or subjective feeling as termed by other theorists.
According to Galotti (2004), emotions are feeling states that involve patterns of
cognitive and behavioral reactions to events. Following the appraisal of the per-
ceived information through interpretation and meaning attribution, an emotional
response occurs to give the information its significance (Braisby and Gellatly 2005).
Most theorists endorse the classic ‘reaction triad’ and others add motivational
state/action tendency and/or cognitive (appraisal) processing (Scherer 1993).
Scherer (1993) describes emotion as a sequence of interdependent, interrelated,
synchronized changes in the states of five organismic subsystems: the cognitive
system (appraisal), the autonomic nervous system (arousal), the motor system
(expression), the motivational system (action tendencies), and the monitor system
(feeling), in response to the evaluation of an external or internal stimulus event that
is of central relevance to the major needs, goals and concerns of the organism. This
model places emphasis on the notion of system synchronization or modal inter-
linking during emotion episodes (Scherer 1994). Involving five functionally defined
systems, these emotional processes entail information processing over perception,
regulation of internal states (to respond to the presentation of stimuli), decision
making over competing motives, the control of external expressive behavior and a
feedback system across these four processes or episodes. While there is no
agreement on how these components are organized during emotional arousal, how
many different emotions can be distinguished, and when and how emotion begins
and when it ends (Scherer 1993), there exist many theories in (cognitive) psy-
chology that attempt to organize some of these components. For example,
the two-factor theory of emotion, which concerns the relationship between appraisal and physiological
arousal, emphasizes that all emotional responses require some sort of appraisal,
whether we are aware of it or not (Passer and Smith 2006). On this account, the intensity of physiological arousal tells us how strongly we are feeling something, such as fear or frustration or some other emotion, but it is the situational cues telling us which feeling we are having that provide the information needed to
label that arousal (Ibid). Overall, appraisal theorists following a componential
patterning approach, as proposed by Frijda (1986), Roseman (1984), or Scherer
(1984) share the assumption that (a) emotions are elicited by a cognitive evaluation
of antecedent stimuli (situations and events) and that (b) the patterning of the
reactions in the different response domains, including physiology, action tenden-
cies, expression, and (subjective) feeling is determined by the outcome of this
evaluation process. Scherer (1994) argues that a componential patterning approach
is suitable to understand the complex interactions between various factors in the
dynamic unfolding of emotion. He also contends that ‘the apparent lack of
empirical evidence of component covariation in emotion is partly due to different
response characteristics of the systems concerned and the mixture of linear and
nonlinear systems’ and assures that concepts from nonlinear dynamic models are to
be adopted ‘to treat emotion as a turbulence in the flow of consciousness and make
use of catastrophe theory to predict sudden changes in the nature of the emotional
processes’ (Ibid).
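Purely as an illustration of how the synchronized components described above might be handled by an affective system, the sketch below encodes one emotion episode as a single record over Scherer's five subsystems; the field types and example values are assumptions of this sketch rather than part of the model itself.

from dataclasses import dataclass


@dataclass
class EmotionEpisode:
    appraisal: str        # cognitive system: evaluation of the stimulus event
    arousal: float        # autonomic nervous system: bodily activation, 0..1
    expression: str       # motor system: facial, vocal, or gestural display
    action_tendency: str  # motivational system: e.g., approach or avoid
    feeling: str          # monitor system: subjective feeling label


# Example episode: a sudden threatening event.
episode = EmotionEpisode(
    appraisal="obstacle to an important goal",
    arousal=0.9,
    expression="raised voice, tense posture",
    action_tendency="avoid",
    feeling="fear",
)
print(episode)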

8.2.3 Motivation and Its Relationship to Emotion

In most classical philosophical treatments of emotion, emotion is produced by
cognitive appraisal processes and ‘involves the activation of important antecedent
motives or goals (see Aristoteles, Descartes, Spinoza, Hume…)’ (Scherer 1994).
The component process model of Scherer (1986) postulates interactions between
cognitive, affective, and motivational processes for the antecedents, the unfolding,
and the consequences of emotion. Neuropsychological approaches (e.g., Damasio’s
1989 memory model) demonstrate an integrated, dynamic approach into the
interaction between cognition, emotion, and motivation (Scherer 1994). At neural
system level, the behavioral activation system (BAS) and behavioral inhibition
system (BIS) systems tie motivation and emotion together as the BAS links
approach motives with positive emotions and the BIS links avoidance motives with
negative emotions (Passer and Smith 2006). These two distinct neural systems
underlie the universal tendencies to maximize pleasure and minimize pain: BAS
and BIS (Gray 1991; Passer and Smith 2006). BAS regulates approach motivation
whereas BIS regulates avoidance motivation (Ibid). According to Gray (1991),
BAS is associated with positive need fulfillment, so its activity drives behaviors
towards goals that are associated with pleasure, which in turn produce positive
emotions such as joy or delight, while BIS responds to stimuli relating to pain,
which produces negative emotions like frustration or fear.
Motives drive our behaviors, the forming of behavioral intentions, planning of
action, the initiation of action and eventually the translation of behavioral intentions
into actual behaviors. Motivation can be described as the driving force by which
individuals achieve their goals—in other words, it involves the reason for which
one chooses to act in a certain direction, which is shaped by one’s attitude. One’s
intentions capture the motivational factors that shape one’s behaviors. As a process,
motivation influences the direction, persistence, and vigor of goal-directed behavior
(Passer and Smith 2006). Motivation is said to be intrinsic or extrinsic (Gardner and
Lambert 1972). It entails ‘two clusters of sources: internal motives (needs, cogni-
tions, and emotions) and external events (environmental incentives)’ (Zhang 2008,
p. 145). Specifically, motives that drive our behaviors include maximizing pleasure, minimizing pain, satisfying curiosity, being engaged by novelty, and meeting needs, or, as less apparent reasons, attributing importance, meaning, or value to actions we intend to perform. Motivation can be conscious or unconscious. It is argued
that a significant portion of human behavior is energized and directed by uncon-
scious motives. Alternatively, goal-directed behavior is determined by the strength of our
expectations that a particular behavior will lead to a goal and by the incentive value
we place on that goal (Brehm and Self 1989).
Motivation is related to, but distinct from, emotion. As mentioned above,
emotion is one type of motive and thus impels our behaviors toward achieving a certain goal. Emotions are viewed as underlying forces and drives that directly
influence behavior (Freud 1975). In other words, emotion provides the affective
component to motivation and motivation directs behavior. Reeve (2005) points out
that emotion relates to motivation in two ways: (1) emotions are one class of
motives that direct and energize our behavior and (2) emotions serve as an ongoing
‘readout’ system to indicate how well or poorly our adaptation is going. As
expressed by Lazarus (1982), emotion is the result of an anticipated, experienced, or
imagined outcome of the patterns of adaptational transaction between the organism
and the environment. Emotional reactions, and thus subjective feelings, always
occur when our motives or goals are satisfied or dissatisfied. According to Lazarus
(2001), motivation and emotion are always linked because we react emotionally
only when our motives and goals are gratified or frustrated. The way people display
affective behaviors not only tells us about the intensity of their physiological
arousal, but also their motivational state in addition to their cognitive appraisal
patterns. Facial expressions are indicators of mental states and evaluation processes
(Kaiser and Wehrle 2001). Scherer (1992, 1994) points out that it is essential to
study the ways in which face and voice express both the motivational and cognitive
antecedents of emotion (appraisal results) and the functional adaptations generated
by the motivational antecedent. He argues that this componential patterning
approach is suitable to understand the complex interactions between various factors
in the dynamic unfolding of emotion (Scherer 1994).
In all, the relationship between emotion and motivation provides useful insights
into understanding how these two components inextricably interrelate. This is of
high relevance to affective computing and thus affective context-aware computing
(AmI). Hence, how emotion and motivation are linked needs to be taken into
account when modeling or simulating emotions into computer systems. Affective
AmI systems should, in addition to recognizing the user’s emotional states, identify
the associated intention. This is important in order to reason about the proper responsive services in support of the user’s emotions.
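A minimal sketch of this point, under purely illustrative assumptions (the emotion labels, activity contexts, motives, and responses below are hypothetical), is a rule table that maps a recognized emotional state together with the current activity to an inferred motive and a supportive response, rather than reacting to the emotion alone.

from typing import Tuple


def infer_motive_and_response(emotion: str, activity: str) -> Tuple[str, str]:
    """Map a recognized emotion plus the activity context to a plausible
    motive and a supportive system response."""
    if emotion == "frustration" and activity == "writing report":
        # A negative emotion during a goal-directed task: the blocked goal is
        # the likely motive, so reduce obstacles rather than merely soothe.
        return "complete the report", "silence notifications and offer assistance"
    if emotion == "boredom":
        return "seek stimulation", "suggest a short break or a new activity"
    if emotion == "anxiety" and activity == "commuting":
        return "arrive on time", "show the fastest route and notify contacts"
    # Default: acknowledge the state without assuming a specific motive.
    return "unknown", "adapt the ambience (lighting, music) to the detected mood"


print(infer_motive_and_response("frustration", "writing report"))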

8.2.4 Theoretical Models of Emotions: Dimensional, Appraisal, and Categorical Models

Fundamental theoretical models of emotions from cognitive psychology and their
computational simulation from cognitive science are significantly shaping and
advancing human-like cognitive (intelligent) behaviors of computers in the area of affective computing and AmI. Cognitive psychologists have proposed a range of theoretical models of emotion, of which the dimensional (Lang 1979), appraisal (Scherer 1999), and categorical (Ekman 1984) models are the most commonly used.
Emotion dimensions are a simplified description of basic properties of emotional
states (Schröder 2001). Evaluation/valence, activation, and power are the most
frequently encountered emotion dimensions. Decomposing emotions into these
underlying dimensions, activation (ready-to-act): aroused versus relaxed, valence:
positive versus negative, and power: dominant versus submissive, is a means to
understand emotions and a framework for analyzing emotional states. Several
theorists (e.g., Mehrabian and Russell 1974; Russell 1980; Lang 1980) have
advocated a dimensional approach to emotion. In the two-dimensional model of
affect, Russell (1980) argues that all emotions can be described in a space of two
dimensions: valence and activation. An example of a positive-activated emotion
would be excitement, while a positive-deactivated emotion would be relief.
Examples of negative-activated and negative-deactivated emotions would be anger
or irritation and sadness or gloom, respectively. As an evaluation method for
emotions, the Self-Assessment Manikin (SAM) (Lang 1980) is a self-report
instrument using pictograms for nonverbal assessment of emotions; it visually
represents three dimensions of emotion on three axes: (1) pleasure–displeasure,
(2) degree of arousal, and (3) dominance–submissiveness, as illustrated by Fig. 8.1.
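To make the dimensional description more concrete, the following minimal sketch (in Python) represents an emotional state as a point in the valence–activation–dominance space and labels its valence–activation quadrant in the spirit of Russell's (1980) circumplex; the emotion labels and coordinate values are illustrative assumptions rather than values drawn from the cited studies.

    # A minimal, illustrative sketch of the dimensional view of emotion.
    # Coordinates are hypothetical values on a -1..+1 scale, not empirical data.

    from dataclasses import dataclass

    @dataclass
    class EmotionalState:
        valence: float     # negative (-1) .. positive (+1)
        activation: float  # relaxed (-1) .. aroused (+1)
        dominance: float   # submissive (-1) .. dominant (+1)

        def quadrant(self) -> str:
            """Label the valence-activation quadrant (circumplex-style)."""
            v = "positive" if self.valence >= 0 else "negative"
            a = "activated" if self.activation >= 0 else "deactivated"
            return f"{v}-{a}"

    # Hypothetical placements of a few emotion labels in the space.
    EXAMPLES = {
        "excitement": EmotionalState(valence=0.8, activation=0.7, dominance=0.4),
        "relief":     EmotionalState(valence=0.6, activation=-0.5, dominance=0.2),
        "anger":      EmotionalState(valence=-0.7, activation=0.8, dominance=0.5),
        "sadness":    EmotionalState(valence=-0.6, activation=-0.6, dominance=-0.4),
    }

    if __name__ == "__main__":
        for name, state in EXAMPLES.items():
            print(f"{name:10s} -> {state.quadrant()}")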
Fig. 8.1 Example figure of SAM: the arousal dimension. Source Desmet (2002)

Appraisal theory provides a descriptive framework for emotion based on perceptions,
the way individuals experience events, objects, people, and processes at the focus of
the emotional state (Scherer 1999). The appraisal theoretical model is
perhaps the most influential approach to emotion within psychology (Scherer
1999), but categorical models of emotions (see below for further discussion) remain
of frequent use in affective computing for practical reasons (Cearreta et al.
2007). Indeed, pragmatism and simplifications in operationalizing and modeling
emotional states as fluid, complex concepts prevail in affective computing and AmI
alike. Regardless, if theoretical models of emotions are not taken more comprehensively
into account, affective-aware AmI systems will never break through to the mainstream
as interactive systems. Thinking in computing has to step beyond technological
constraints and engineering perspectives; what is needed is precisely that which
science needs to have but cannot yet measure.

8.2.5 Emotion Classification

People may misattribute specific emotion types, but they rarely misattribute
their valence (Solomon 1993). One might, for example, confuse such emotions as
anger and frustration, or irritation and exasperation, but it is unlikely that one would
confuse happiness with sadness or admiration with detestation. Further to this point
and at the emotion classification level, there is no definitive taxonomy of emotions;
numerous taxonomies have been proposed. Common categorizations of emotions
include: negative versus positive emotions; basic versus complex emotions; primary
versus blended emotions; passive versus active; contextual versus non-contextual;
and so on. In addition, in terms of time of occurrence, some emotions occur over a
period of seconds whereas others can last longer. There are a number of classifi-
cation systems of basic emotions compiled by a range of researchers (e.g., Ortony
and Turner 1990). Emotion classification concerns both verbal and nonverbal
communication behaviors, including facial expressions, gestures, and the paralinguistic
and emotive features of speech. However, the lack of standardization often
causes inconsistencies in emotion classification, particularly for facial expressions
and emotiveness, an issue that has implications for emotion modeling and impacts
emotion conceptualizations with regard to the recognition of affect display used in
emotion computing, such as emotion-aware AmI and affective computing.
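As a small illustration of how such categorizations might be operationalized in software, the sketch below layers two of the above schemes (valence grouping and membership in a basic-emotion set) over a handful of emotion labels; the groupings are illustrative choices for the example, not a standard taxonomy.

    # Illustrative, non-standard groupings of emotion labels under two common
    # categorization schemes: valence and basic vs. complex.

    VALENCE = {
        "happiness": "positive", "admiration": "positive", "relief": "positive",
        "anger": "negative", "frustration": "negative", "irritation": "negative",
        "sadness": "negative", "detestation": "negative",
    }

    # A basic-emotion set in the spirit of Ekman's six basic emotions.
    BASIC = {"happiness", "anger", "sadness", "fear", "disgust", "surprise"}

    def same_valence(a: str, b: str) -> bool:
        """Two labels that are easily confused usually still share a valence."""
        return VALENCE.get(a) == VALENCE.get(b)

    if __name__ == "__main__":
        print(same_valence("anger", "frustration"))   # True: confusable, same valence
        print(same_valence("happiness", "sadness"))   # False: rarely confused
        print("anger is basic:", "anger" in BASIC)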

8.2.6 Affect Display

In the context of affective and emotional context-aware systems, affect can be
described as an outward manifestation—external expressive behavior—of emotion
or feeling. This is a common thread running through most dictionary definitions
of affect. Below are some dictionary definitions of affect as related to emotion or
feeling:
• An outward, observable manifestation of a person's expressed feelings or emotions.
• An observed emotional expression or response.
• The external expression of emotion attached to ideas or mental representations
of objects.
• The feeling experienced in connection with an emotion.
• Observable behavior that represents the expression of a subjectively experienced
feeling state (emotion).
In this regard, examples of affect include sadness, fear, joy, frustration, or anger.
The term refers sometimes to affect display, which is a vocal, facial, or gestural
behavior that serves to indicate an affect. This definition is commonly adopted in
affective computing—recognition of and response to emotions. In this respect,
affect display refers to a user’s externally displayed affect, representing basically the
expressive behavior part of the contemporary Schererian emotional complex. More
concepts related to affect are covered in the next chapter, as they are linked to
aesthetic computing and ICT design in terms of the relationship between affect,
cognition, and creativity.

8.2.7 A Selection of Relevant Studies

Many different disciplines have produced work on the subject of emotion, including
social science, human science, cognitive psychology, philosophy, linguistics,
nonverbal communication, neuroscience and its subfields social and affective
neuroscience, and so on. Studies of emotion within linguistics, nonverbal com-
munication, and social sciences are of particular relevance to emotion computing
technology—affective computing and context-aware computing. Studies in lin-
guistics investigate, among others, the expression of emotion through paralinguistic
features of speech and how emotion lends meaning to non-phonemic or prosodic
aspects, in addition to the expression of emotions through utterances. Beijer (2002)
describes an emotive utterance as any utterance in which the speaker's emotional
involvement is expressed linguistically in a way that is informative for the listener. In non-
verbal communication, research is concerned with, among others, the expression of
emotion through facial and gestural behavior and the role of emotions in the
communication of messages. Emotion in relation to linguistics and nonverbal
communication is discussed in more detail in Chap. 7. Social sciences investigate
emotions for the role they play in social processes and interactions, and take up the
issue of emotion classification and emotion generation, among others. In relation to
emotion study in social sciences, Darwin (1872) emphasized the nonverbal
aspects of emotional expressions, and hypothesized that emotions evolve via
natural selection and therefore have cross-culturally universal counterparts. Ekman
(1972) found evidence that humans share six basic emotions: fear, sadness, hap-
piness, anger, disgust, and surprise. From Freudian psychoanalytic perspective,
emotions are viewed as underlying forces and drives that directly influence
behavior (Freud 1975). From a cognitive perspective, emotions are about how we
perceive and appraise a stimulus event. Emotion requires thought, information
processing over perception, which leads to an appraisal that, in turn, leads to an
emotion (Cornelius 1996). Several theorists argue that evaluation or thought, as a
cognitive activity, is necessary for an emotion to occur (e.g., Frijda 1986; Scherer
et al. 2001; Ortony et al. 1988; Solomon 1993). Moreover, William James sees
emotions as ‘bodily changes’ arguing that emotional experience is largely due to the
experience of such changes (James 1884). This relates to somatic theories of
emotion that claim that bodily responses rather than judgments are essential to
emotions. Anthropological work claims that emotions are dependent on sociocul-
tural facts rather than ‘natural’ in humans, an argument which challenges the
Darwinian view of emotions as 'natural' in humans (Lewis and Haviland 1993;
Lutz 1988). Some anthropological studies investigate, in addition to emotions
themselves (contextualizing them in culture as the setting in which they are expressed
in order to explain emotional behavior), the role of emotions in human
activities, a topic which is of relevance to the interaction between the user and
technology in relation to task performance. Indeed, HCI is emerging as a specialty
concern within, among other disciplines, sociology and anthropology in terms of
the interactions between technology and work as well as psychology in terms of the
application of theories of cognitive processes and the empirical analysis of user
behavior (ACM 2009). Moreover, within sociology, according to Lewis and
Haviland (1993), human emotions are viewed as ‘results from real, anticipated,
imagined, or recollected outcomes of social relations’. From the perspective of
sociology of emotions, people try to regulate and control their emotions to fit in
with the norms of the social situation, and everyday social interactions and situa-
tions are shaped by social discourses. Social constructionist worldviews posit
that emotions serve social functions and are culturally determined rather than
biologically fixed (responses within the individual), and that they emerge in social
interaction rather than resulting from individual characteristics, biology, and
evolution (Plutchik and Kellerman 1980).

8.3 Emotional Intelligence: Definitional Issues and Models

The term ‘emotional intelligence’ has been coined to describe attributes and skills
related to the concept of emotion (Koonce 1996). As such, it has recently gained
significant ground in the emerging field of affective computing and, more recently,
AmI. Emotional intelligence denotes the ability to perceive, assess, and manage
one’s emotions and others’. Salovey and Mayer (1990) define emotional intelli-
gence as ‘the ability to monitor one’s own and others’ feelings and emotions, to
discriminate among them and to use this information to guide one’s thinking and
actions’. According to Passer and Smith (2006), a cognitive psychologist, emo-
tional intelligence is to be aware of your emotions, control and regulate your own
emotional responses, and adapt to the challenges of daily life, as well as to
understand other people’s emotions, evaluate their emotional expressions, connect
and respond to them appropriately, and identify the emotions that would best
enhance a particular kind of thinking. Emotional intelligence is about the ability to combine
cognitive knowledge with emotional knowledge and use them in tandem.
Unsurprisingly, there is a sharp disagreement regarding the definition of emotional
intelligence with respect to both the conceptualization of the term and the opera-
tionalization of the concept. Currently, there are three main models of emotional
intelligence (EI):
• Ability EI model (Salovey and Mayer 1990)
• Trait EI model (Petrides and Furnham 2000; Petrides et al. 2007)
• Mixed models of EI (Goleman 1995; Kluemper 2008)
Based on the literature and the current research within affective computing
and AmI, the ability model and mixed models are the most commonly adopted.
Historically, the idea of emotional intelligence was first proposed by Salovey and
Mayer (1990), followed by Goleman (1995), who introduced the so-called mixed
models, an approach which frames emotional intelligence as a wide array of
competencies and skills that drive leadership performance. Goleman’s model out-
lines four main constructs:
1. Self-awareness: Refers to the ability to read one's emotions and recognize their
impact, while using visceral emotional reactions (often experienced as uneasiness,
reflecting intuition rather than rationality) to guide decisions.
2. Self-management: Entails controlling one’s emotions and impulses and adapting
to changing circumstances.
3. Social awareness: The ability to sense, understand, and react to others' emotions
while comprehending a social structure as made up of individuals that are
connected by one or more specific types of interdependency, such as common
interest, dislike, relationships of beliefs, knowledge, and so on.
4. Relationship management: Involves the ability to inspire, influence, and develop
others while managing conflict.
These emotional capabilities can be fulfilled in an emotion experience. Goleman
hypothesizes that individuals are born with a general emotional intelligence that
determines their potential for learning emotional competencies (Boyatzis et al.
2000). This implies that emotional competencies are learned capabilities that must
be worked on and can be developed and improved to achieve outstanding perfor-
mance. In 2000, the conceptual distinction between trait emotional intelligence and
ability emotional intelligence was introduced (Petrides and Furnham 2000). Trait
emotional intelligence is defined by Petrides et al. (2007) as ‘a constellation of
emotional self-perceptions located at the lower levels of personality'. Referring, in
other terms, to an individual's self-perceptions of their emotional abilities, trait
emotional intelligence is associated with behavioral dispositions and self-perceived
abilities.
Salovey and Mayer (1990) define ability emotional intelligence as a part of
human intelligence responsible for the ability to perceive emotions, integrate
emotions to facilitate thoughts, understand emotions, and regulate emotions.
Specifically, as their ability model claims, emotional intelligence encompasses four
abilities:
1. Perceiving emotions—the ability to detect and differentiate between emotions in
faces and voices—including the ability to identify one’s own emotions. As a
first step of the ability model, perceiving emotions entails identifying emotions
and discriminating between accurate (appropriate) and inaccurate (inappropri-
ate) expressions of emotion, which is an important ability to understand and
analyze emotional states—the third component of EI Framework (Mayer and
Salovey 1997). By enabling all other processing of emotional information,
perceiving emotions represents a basic aspect of emotional intelligence.
2. Using emotions—the ability to integrate and harness emotions to facilitate
thought and various cognitive activities such as thinking, decision making, and
problem solving.
3. Understanding emotions—the ability to comprehend emotion language and
complicated relationships among emotions, i.e., the ability to be sensitive to
slight variations between emotions, and to recognize and describe how emotions
evolve over time.
4. Managing emotions—the ability to regulate and control emotions in both our-
selves and in others by, for example, harnessing positive and negative emotions
and managing them to achieve intended goals.
The model proposes that individuals vary in their ability to relate emotional
processing to a wider cognition, which is manifested in certain adaptive behaviors.
It also views emotions as useful sources of information for making sense of and
navigating the social environment (Mayer and Salovey 1997; Salovey and Grewal
2005).

8.4 Affective Computing and AmI Computing

8.4.1 Understanding Affective Computing

Affective computing is the branch of computer science and the area of AI that is
concerned with modeling emotions or simulating emotional processes into com-
puters or machines. It is a scientific area that works on the detection of and response
to user’s emotions (Picard 2000). Specifically, it deals with the study, design,
development, implementation, evaluation, and instantiation of systems that can
recognize, interpret, process, and act in response to emotions or emotional states.
This is to build computers that are able to convincingly emulate emotions or exhibit
human-like emotional capabilities. It is recognized that the inception of affective
computing is credited to Rosalind Picard, director of the Affective Computing
Research Group at the MIT Media Lab, with the publication of Affective
Computing in 1997. According to her, computers must emulate the ability to
recognize, understand, have, and express emotions in order to be genuinely
intelligent and to interact naturally with humans (Picard 2010). In the early 2000s,
research in computer science started to focus on developing computing devices
endowed with emotional capabilities to recognize human affect display—externally
displayed affect that can be indicated by vocal, facial, or gestural means. The vision
of computers that can respond to emotion has spawned a new area of research
into perceptual user interfaces (PUIs) (Turk and Robertson 2000). Affective
computing involves a wide variety of theoretical models of emotions that can frame
the design of affective systems as well as different technologies that can be used to
implement such systems, including miniaturized, multisensory devices; sophisti-
cated pattern recognition techniques; and semantic and probabilistic modeling
approaches. This is to emulate how humans use their sensory modalities to sense
emotional cues, cognitive information processing to perceive emotions, and various
actuators or effectors to act and behave in response to emotions. Profoundly
interdisciplinary, affective computing integrates or draws from computer science,
cognitive science (e.g., Tao and Tieniu 2005), cognitive psychology, and human
communication. The area of affective computing that relates to context-aware
computing (AmI) is known as emotion-aware HCI AmI.

8.4.2 Examples of the State-of-the-Art Application Projects

With the aim of restoring a proper balance between emotion and cognition in the
design of new technologies for addressing human (affective) needs, the MIT
affective computing team (Picard 1997; Zhou et al. 2007) carries out research in the
area of affective computing from a broad perspective, contributing to the development
of techniques for indirectly measuring mood, stress, and frustration through
natural interaction; techniques for enhancing self-awareness of affective states and
for selectively communicating them to others; and emotionally intelligent
systems, as well as pioneering studies on ethical issues in affective computing.
Notable projects cited in Zhou et al. (2007) include ERMIS, HUMAINE, NECA,
and SAFIRA. The prototype system ERMIS (Emotionally Rich
Man-machine Intelligent System) can interpret the user’s emotional states (e.g.,
interest, boredom, anger) from speech, facial expressions, and gestures.
The HUMAINE (Human–Machine Interaction Network on Emotion) project aims to
lay the foundations for emotional systems that can detect, register, model, under-
stand, and influence human emotional states. The aim of the NECA project is to develop
a more sophisticated generation of conversational systems/agents, virtual humans,
which are capable of speaking and acting in a human-like fashion. The Supporting
Affective Interactions for Real-time Applications (SAFIRA) project focuses on
developing techniques that support affective interactions. The MIT affective
computing team is working on many research projects to develop techniques,
algorithms, theories, and models necessary for implementing affective systems, e.g.,
recognition of affect in speech, acoustic parameters, and nonverbal communication
signals, especially facial expressions and gestures.

8.4.3 Integration of Affective and AmI Computing: Advancing Emotional Context-Aware Systems

Incorporating emotions into context-aware computing is a recent and challenging endeavor. It
is increasingly attracting many researchers in the field of AmI. Researchers predict
that this area will gain a stronger foothold in the near future. Unsurprisingly, affective
computing has become an integral part of research within AmI—affective
context-aware computing. Incorporating affective computing paradigm within AmI
seems to be an interesting approach and an important step to advance the research and
development of affective context-aware systems. Utilizing affective computing, AmI
systems as interactive entities can have human-like emotional capabilities, that is,
multimodal user interfaces capable of recognizing emotions from different sources
and responding to these emotions. One of the goals of affective computing is to design
computing devices and systems that are capable of convincingly emulating human
emotions or exhibit natural emotional capabilities. This is of high relevance to AmI
systems. Affective computational tools enable AmI systems to use affect display, among
other means, as an indicator of emotional behavior, reading multimodal sources to
detect and react to the emotional state of the user. In other words, an emotion-aware
AmI system should be able to recognize the user’s emotional state by detecting
various affective cues and psychophysiological cues, which requires using multi-
sensory devices or various types of dedicated sensors for sensing vocal parameters
(prosodic features), speech, facial expression, gestures, body movements as well as
heart rate, pulse, skin temperature, galvanic skin response, and so on. Miniaturization
of computing devices is making possible the development of microsensors and
nanosensors (see Chap. 4 for a detailed account and discussion) and wearable
computers that can record parameters or read signals in an unobtrusive way. After recognizing
the user’s emotional state, an affective AmI system can, through the process of
interpretation, identify the intention of the emotion (acquire the user's motivation),
reason about it, and determine a proper emotional service that matches the user’s
emotional state. The emotional state is deemed a critical element of the user context as
to context awareness functionality of AmI systems. In addition to responsive services,
an AmI system should also be capable of delivering adaptive and proactive services
based on the other components or subsets of the user context. Commonly, the user
context involves personal context such as cognitive, emotional, and physiological
states; environmental context such as location and physical condition; task context
such as activity; sociocultural context such as proximity of others and social inter-
action and cultural conventions; and spatiotemporal context such as time and space.
Context is framed by Schmidt et al. (1999) as comprising two main components,
human factors and physical environment. Human factors related context encompasses
three categories: information on the user (knowledge of habits, emotional state,
bio-physiological conditions), the user’s tasks (activity, engaged tasks, general goals),
and the user’s social environment (social interaction, co-location of other, group
dynamics). Likewise, physical environment related context encompasses three cate-
gories: location (absolute position, relative position, co-location), infrastructure
(computational communication and information resources, task performance), and
physical conditions (light, temperature, pressure, noise). Typically, an AmI envi-
ronment is comprised of systems equipped with human-like intelligent interactive
capabilities, allowing users to interact in a natural way with computing devices. In
terms of the user’s emotional states, AmI environments (e.g., homes, offices, schools,
and hospitals) can facilitate emotional experiences by providing users with suitable,
responsive services instantaneously (Zhou and Kallio 2005), using affective context
awareness functionality. The need for emotional context-aware applications to pro-
duce, elicit, and invoke positive emotions and avoid negative ones is critical to the
success of AmI systems.
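As a rough sketch of how such a user-context structure might be represented in an AmI system, the following data model groups human factors and physical environment into nested records along the lines of the Schmidt et al. (1999) framing described above; all field names and example values are assumptions introduced for illustration.

    # A minimal, illustrative data model for the two-component context framing
    # (human factors and physical environment); field names are assumptions.

    from dataclasses import dataclass, field
    from typing import List, Optional

    @dataclass
    class HumanFactors:
        emotional_state: Optional[str] = None      # e.g., "frustrated"
        habits: List[str] = field(default_factory=list)
        biophysiological: dict = field(default_factory=dict)  # e.g., {"heart_rate": 72}
        tasks: List[str] = field(default_factory=list)        # engaged tasks, goals
        social_environment: List[str] = field(default_factory=list)  # co-located people

    @dataclass
    class PhysicalEnvironment:
        location: Optional[str] = None             # absolute/relative position
        infrastructure: List[str] = field(default_factory=list)  # available resources
        conditions: dict = field(default_factory=dict)  # light, temperature, noise...

    @dataclass
    class UserContext:
        human: HumanFactors
        physical: PhysicalEnvironment

    if __name__ == "__main__":
        ctx = UserContext(
            human=HumanFactors(emotional_state="stressed", tasks=["writing report"]),
            physical=PhysicalEnvironment(location="office", conditions={"noise": "low"}),
        )
        print(ctx.human.emotional_state, "@", ctx.physical.location)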

8.4.4 More Contributions of Affective Computing to AmI Computing

Given the variety of systems being investigated in the area of affective computing,
there should be a lot more to its integration with AmI than just enhancing affective
context-aware systems. Indeed, AmI systems are capable of meeting needs and responding
intelligently to spoken or gestured wishes and desires without conscious mediation,
which could even result in systems that are capable of engaging in intelligent
dialog (Punie 2003, p. 5). Hence, AmI systems should be able not only to auton-
omously adapt to the emotional state of the user, but also generate emotional
responses that elicit positive emotions by having an impact on the user’s emotions,
appear sensitive to the user, help the user to improve his/her emotional intelligence
skills, and even mingle socially with the user. Particularly, the simulation of
emotional intelligence and human verbal and nonverbal communication into
computers is aimed at helping users to enhance different abilities associated with
emotion and support social interaction processes. Conversational agents and emo-
tional intelligent systems are both of interest to and a primary focus of affective
computing. Indeed, affective computing scholars and scientists are studying, in
addition to emotionally intelligent systems, a wide variety of technologies for
improving the emotional abilities of the user such as the self-awareness of the
emotional states and how to communicate them to others in a selective way. They
are also working on the development of advanced conversational agents, systems
which can interpret the user’s emotional state from speech and facial expressions
and gestures and can register, model, understand, and influence human emotional
states as well as support affective interactions.

8.4.5 Emotional Intelligence in Affective Computing and Affective AmI

One of the significant challenges of affective AmI is to create systems equipped
with emotional intelligence capabilities. In the field of AI, affective computing
researchers aim to, among others, build emotionally intelligent systems or con-
versational systems endowed with emotional intelligence capabilities. Such systems
are also expected to be part of AmI environments where computers can mingle
socially or engage in intelligent dialogs with users, and thereby exhibit emotional
capabilities and emotional intelligence. As an application of a system proposed by
Ptaszynski et al. (2009, p. 1474), ‘a conversational agent can choose to either
sympathize with the user or to take precautions and help them manage their
emotions’. The simulation of emotions in conversational agents aims to enrich and
facilitate interactivity between human and computers (Calvo and D’Mello 2010).
Understanding and expressing emotions is one of the most important cognitive
behaviors in humans, often described as a vital part of human intelligence (Salovey
and Mayer 1990). As one of the pioneering computer scientists in AI, Marvin
Minsky relates emotions to the broader issues of machine intelligence, stating that
emotion is ‘not especially different from the processes that we call “thinking”’
(Heise 2004). Lehrer (2007) wrote, quoting Marvin Minsky, a professor at MIT:
‘Because we subscribed to this false ideal of rational, logical thought, we dimin-
ished the importance of everything else… Seeing our emotions as distinct from
thinking was really quite disastrous’. The latest scientific findings show that
emotions influence the very mechanisms of rational thinking as they play an
essential role in thought, perception, decision making, problem solving, and
learning. Emotion is fundamental to human experience. The scientific journals are
increasingly filled with research on the connections between emotion and cognition.
The new scientific appreciation of emotion is profoundly altering the field of
computing. New computing is about balancing between cognition and emotion.
One of the current issues in AI is to create methods for efficient interpretation,
processing of emotions, and effective responses (i.e., speech production with
graphical full facial animation). While some researchers focus on computer’s
emotional intelligence or emotionally intelligent computers (e.g., Picard et al. 2001;
Andre et al. 2004; Ptaszynski et al. 2009), others are working on how to help users
improve their emotional intelligence skills (e.g., Zhou et al. 2007). Human innate
emotional intelligence could be mediated by integrating AmI and advanced ICT
(Zhou et al. 2007). In all, there is a rising tendency both in AI and AmI research to
humanize computers, by equipping them with emotional intelligence capabilities.
That is to say, artificial intelligent agents are being built to have emotions and
related abilities. This new wave of computing emphasizes the role of emotions in
the development of the future generation of interactive systems.

8.4.6 Context in Affective Computing: Conversational and Emotional Intelligent Systems

Although context is a fundamental component of emotion, research has paid little
attention to context in the field of affective computing (see Cowie et al. 2005).
Understanding emotion meaning, which is determined by context, is important to be
able to appropriately respond to emotions. In a variety of ways, context is of central
relevance to the expression of and response to emotions. Context has a significant
influence on the selection of expressive emotional behavior and the interpretation of
emotional cues or stances, whether as spoken patterns or displayed nonverbal sig-
nals. Emotions are inherently multimodal in the sense that emotional signs may
appear in various channels, but not all kinds of emotional signs tend to be available
together, as context—which is inescapably linked to modality—can affect the cues that
are accessible or relevant (Cearreta et al. 2007). This is linked to conversational
agents, affective context-aware systems, and emotionally intelligent systems. Based
on the situation, speech, facial expression, or gestures or a combination of these can
be available or accessible channels that can provide affective information to any of
these systems in an implicit or explicit form. Disambiguation entails, in this context,
using the general user context to determine what was meant (intended) by the
expressed emotional behavior, which is assumed to indicate the emotional state, a
contextual element that is an integral part of the user context. Obviously, disam-
biguation of emotions is of import to affective AmI systems in general for they also
should have conversational and emotional intelligence capabilities. In all, without
context, neither conversational agents nor affective context-aware systems can determine
what was intended by an emotional expressive behavior or state, which is usually
conveyed in a multimodal form. A conversational intelligent agent can enrich and
facilitate various types of emotional interactivity between human users and com-
puters. This is likely to evolve smoothly as long as the intention and meaning of the
emotional state of the user is properly identified and accurately interpreted. In
conversational agents, in general, ‘when trying to disambiguate a signal…the
important part is to find the meaning behind the signals [including emotional cues]…
One has to know the context to know which functions [including emotion intent] are
appropriate at a certain point and this knowledge can be used to determine what was
intended with a detected signal. In this case, the context can act as a filter, making
certain interpretations unlikely.’ (ter Maat and Heylen 2009, p. 72). Choosing the
concrete conversational behavior to perform as an agent when a conversational
intent/function is provided cannot be done without any context (Samtani et al. 2008).
Communicative intent/function includes emotion in addition to speech act, discourse
structure, information structure, and so forth. Therefore, the general user context
plays an important role in the process of disambiguation, which is critical in
determining the most appropriate emotional response that an affective or AmI system
can act on. Disambiguating verbal and nonverbal emotional signals, which constitute
a critical part of our communication, is an important facet of building effective
affective context-aware systems and AmI systems more generally. While acoustic
parameters extracted from the speech waveform (related to pitch, speaking tempo,
voice quality, intonation, loudness and rhythm) can be useful in disambiguating
affective display, context still remains decisive in the process, especially if the
communication channel does not allow for the use of the textual component of the
linguistic message or is limited to transmission of lexical symbols that describe
emotional states.
Assessing the contextual appropriateness of emotions (whether transmitted by speech or
gestural means) is a crucial initial step in order for a system to
understand and interpret emotions and thus provide emotional intelligence services.
While Mayer and Salovey (1997) argue that the ability to discriminate between
appropriate and inappropriate expressions of emotion is the key ability for inter-
preting and analyzing emotional states, Ptaszynski et al. (2009) conclude that
computing contextual appropriateness of emotional states is a key step towards a
full implementation of emotional intelligence in computers. Besides, emotions
should be perceived as context-sensitive engagements with the world, as demon-
strated by recent discoveries in the field of emotional intelligence (Ptaszynski et al.
2009). However, most research focuses on the development of technologies for
affective systems in AI and AmI as well as the design of such systems, but few
studies on the contextual appropriateness of emotions and multimodal context-aware
affective interaction have been conducted. Furthermore, most of the behavioral
methods simply classify emotions into opposing pairs or focus only on simple
emotion recognition (Teixeira et al. 2008; Ptaszynski et al. 2009), ignoring the
complexity and the context reliance of emotions (Ptaszynski et al. 2009).
Nonetheless, there is a positive change in the tendency of analyzing affective states
as emotion specific rather than using methods that categorize emotions to simple
opposing pairs. This trend can be noticed in text mining and information extraction
approaches to emotion estimation (Tokuhisa et al. 2008). In all, understanding
users’ emotions requires accounting for context as a means to disambiguate and
interpret the meaning or intention of emotional states for a further affective com-
putational processing and relevant service delivery.
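A highly simplified sketch of the 'context as a filter' idea quoted above might re-weight a recognizer's candidate interpretations of an ambiguous emotional signal by their plausibility in the current situation; the labels, situations, and scores below are invented solely for illustration.

    # Illustrative sketch: context filters candidate interpretations of a detected
    # emotional signal. Labels, contexts, and scores are hypothetical.

    def filter_by_context(candidates: dict, context: str, plausibility: dict) -> dict:
        """Re-weight recognizer scores by how plausible each emotion is in context,
        then renormalize. Implausible interpretations are effectively filtered out."""
        weighted = {
            emotion: score * plausibility.get(context, {}).get(emotion, 0.0)
            for emotion, score in candidates.items()
        }
        total = sum(weighted.values())
        return {e: (s / total if total else 0.0) for e, s in weighted.items()}

    # Recognizer output for an ambiguous raised-voice signal (hypothetical scores).
    candidates = {"anger": 0.45, "excitement": 0.40, "fear": 0.15}

    # Hypothetical prior plausibility of each emotion in two situations.
    plausibility = {
        "watching_sports": {"anger": 0.3, "excitement": 0.9, "fear": 0.1},
        "customer_support_call": {"anger": 0.9, "excitement": 0.1, "fear": 0.3},
    }

    if __name__ == "__main__":
        print(filter_by_context(candidates, "watching_sports", plausibility))
        print(filter_by_context(candidates, "customer_support_call", plausibility))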

8.4.7 Emotions in AmI Research

As mentioned above, emotional states constitute one element of the user context that
a context-aware system should recognize in order to adapt its functionality to better
match user affective needs. A good context-aware system is one that can act in
response to the evaluation of the elements of the general context that are of central
concern to the user in an interrelated, dynamic fashion. With context gaining an
increased interest in affective computing, it becomes even more interesting to
integrate affective computing with context-aware computing. The role of affective
computing in context-aware computing is to equip context-aware applications with
the ability to understand and respond to the user’s needs according to the emotional
element of the user context. Speech, facial expressions, and corporal gestures have great
potential to provide a wealth of affective information as implicit input to
context-aware systems. However, research shows that the majority of the research
within context-aware computing pays little attention to emotions compared to
location. Indeed, emotion has so far been little investigated in the field of AmI.
Notwithstanding its acceptance by computer scientists, affective AmI remains rel-
atively unexplored territory. As supported by Zhou et al. (2007), the vision of AmI
rarely includes emotions and the majority of the research in this area ignores emo-
tions. Nevertheless, as a research area, affective context-aware computing and per-
ceptual, ambient user interfaces are now increasingly burgeoning as computer
scientists studying AmI have recently started to pay more attention to affective
computing. Underlining the links between AI and AmI, affective computing is
expected to augment AmI systems with elemental processes and aspects of natu-
ralness of interaction, emotion-aware HCI, including perceptual, multimodal user
interfaces. Currently, most of the research in affective computing is focused on
developing emotion models (ontologies), capture technologies, and recognition
techniques. However, the field of emotion computing technology—affective com-
puting and affective AmI—is still in its infancy and there is a vast unexplored zone to
tap into as to both the established scientific knowledge and unsettled issues in the
area of emotion within cognitive psychology, linguistics, nonverbal communication,
and social science.

8.5 Affective and Context-Aware Computing and Affective Display

Currently, there is a great variety of technologies that can be used for the design and
implementation of affective systems and affective context-aware systems. Here the
emphasis is on capture technologies and recognition techniques. Also, a classifi-
cation of studies on emotion detection and recognition is included. This is to
highlight the enabling role such technologies are playing in the implementation of
affective systems in terms of detecting or recognizing the emotional states of users
from their affective display. Externally expressed, affective display is considered as
a reliable emotional channel, and includes vocal cues, facial cues, physiological
cues, gestures, action cues, etc. These channels are carriers of affective information,
which can be captured by affective systems for further computational interpretation
and processing for the delivery of a range of adaptive and responsive services.

8.5.1 Context and Multimodal Recognition

Affect is displayed to others through facial expressions, hand gestures, postural
expressions, prosodic features, emotive function of speech, and other manifesta-
tions. As an indicator of affective states, affect display is a means used in affective
computing to understand and respond to emotions, and as a reliable source and
provider of affective information as a form of implicit input to context-aware
systems. Affective display provides a wealth of useful information deemed nec-
essary for analyzing, interpreting, and processing emotional information by affec-
tive systems so they can respond to users’ emotional states. The accuracy of
identifying and the effectiveness of interpreting users' emotional states depend on
the type of modality a system may utilize and the number of channels a system may
have access to in order to detect or recognize the emotional state of users, i.e., visual
or auditory modality or both along with the relevant, accessible channels. The
relevance and accessibility of channels are determined by the current context. For
example, some places might allow for some channels and not for others; in libraries,
for instance, voice may not be allowed. Or,
there might be limits to the distance at which speech is audible, and visible
behaviors such as gaze or facial expressions are accurately visually perceivable—
detectable. Ideally, a situation would allow a computer to combine visual and
auditory modalities and hence a wide range of the associated channels to recognize
emotions. It is important to note that the more channels are involved (or considered),
the more robust the estimation of users' emotional states. More than one
modality can be combined (multimodal recognition). This allows for using facial
expressions and speech prosody (e.g., Caridakis et al. 2006), or one modality but
various channels: facial expressions and hand gestures (e.g., Balomenos et al.
2004). The aim is to provide a more robust estimation of the user's emotional
state.
Research in affective computing is investigating how to combine modalities or
modes other than visual and auditory to accurately determine users' emotional
states. An interesting project called 'Machine Learning and Pattern Recognition
with Multiple Modalities' is being undertaken at MIT, which involves Hyungil
Ahn, Rosalind W. Picard, and Ashish Kapoor, and aims to develop new theory and
algorithms to enable computers to make rapid and accurate inferences from multiple
modes of data, determining a user’s emotional state using multiple sensors, such as
video, mouse behavior, chair pressure patterns, typed selections, or physiology
(MIT Media Lab 2014). Generally, the more robust the estimation of the user's emotional
states, the more effective the interpretation, the more efficient the subsequent
processing of the emotional states, and the more appropriate the provided
adaptive and responsive services. Therefore, it is important to consider multimodal
sources or multiple modes when it comes to capturing human emotions, in addition
to using mechanisms for efficient fusion and aggregation of the detected data as
well as methods for meaningful interpretation and techniques for efficient pro-
cessing of emotional states. Overall, the emotive and prosodic features of speech,
facial expressions, hand gestures, corporal movements, physiology, and actions are
considered to be reliable sources of emotional information that determine, as per-
ceived by affective systems or captured as implicit input by context-aware systems,
the computational understanding, processing, adaptation and responsiveness for a
better user emotion interactive experience.
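One simple way to realize the multimodal combination just described is late fusion: each modality produces its own probability estimate over emotion labels, and the estimates are merged with per-modality weights reflecting which channels are currently accessible or reliable. The sketch below is illustrative only; the labels, weights, and scores are assumptions rather than outputs of the cited systems.

    # Illustrative late-fusion sketch: per-modality emotion estimates are combined
    # with weights reflecting channel availability/reliability in the current context.

    def fuse(estimates: dict, weights: dict) -> dict:
        """Weighted average of per-modality probability distributions over labels."""
        labels = {label for dist in estimates.values() for label in dist}
        total_weight = sum(weights.get(m, 0.0) for m in estimates)
        fused = {}
        for label in labels:
            s = sum(weights.get(m, 0.0) * dist.get(label, 0.0)
                    for m, dist in estimates.items())
            fused[label] = s / total_weight if total_weight else 0.0
        return fused

    estimates = {
        "face":    {"happiness": 0.6, "surprise": 0.3, "anger": 0.1},
        "prosody": {"happiness": 0.5, "surprise": 0.2, "anger": 0.3},
        "gesture": {"happiness": 0.4, "surprise": 0.4, "anger": 0.2},
    }
    # In a library, for example, the voice channel might be unavailable: weight it zero.
    weights = {"face": 0.5, "prosody": 0.0, "gesture": 0.5}

    if __name__ == "__main__":
        fused = fuse(estimates, weights)
        print(max(fused, key=fused.get), fused)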

8.5.2 Recognizing Affect Display and Other Emotional Cues in Affective and Context-Aware HCI Applications

Affective and context-aware HCI applications involving the so-called naturalistic,
multimodal user interfaces—based on natural modalities—incorporate a wide
variety of miniature dense sensors used to detect users’ emotional, cognitive, and
(psycho) physiological states by reading multimodal sources. Such interfaces
involve a wide variety of specialized user interfaces embedded and used together to
detect the emotional states of users. Among the interfaces used for this purpose
include facial user interfaces, gesture interfaces, voice user interfaces, motion
tracking interfaces, conversational interface agents (human-like graphical embodi-
ment), and so on. It is worth noting that these user interfaces, along with the embedded
sensors, can be used not only for detecting emotions but also for detecting cognitive
states, as well as for receiving commands to perform tasks by using natural forms
of explicit inputs (see next chapter for more detail). However, in emulating how
humans sense and perceive multimodal emotional cues in others, affective and
context-aware systems use multiple sensors in order to detect emotions, for
example, a video camera to capture facial expressions and gestures, a microphone
to capture speech, and other sensors to detect emotional cues by directly measuring
psychophysiological data, such as skin temperature, galvanic resistance, heart rate,
and electroencephalographic response. Affective and context-aware systems are
increasingly being equipped with the so-called multisensory devices used for
multimodal detection of emotional cues or recognition of emotional states. Such
devices are based on sensor fusion technology. As an expansion on their work on
facial expression recognition, Wimmer et al. (2009) mentioned that they aim at
integrating multimodal feature sets and applying the so-called Early Sensor Fusion.
Further, recognizing emotional information requires the extraction of patterns or
cues from the gathered data, which is done by parsing the data through various
processes such as emotional speech processing, facial expression detection, gestures
detection, natural language processing—emotiveness, and so forth. Emotional
speech processing identifies the user’s emotional states by processing speech pat-
terns, using speech recognition and synthesis tools. Vocal parameters and prosody
features are analyzed through speech pattern recognition (Dellaert et al. 1996a, b;
Lee et al. 2001). In addition to acoustical prosodic features of speech, the affective
computing community has recently started to focus on emotiveness. It is gaining
increased attention among researchers in AI and AmI alike, particularly in relation
to emotional intelligence. Language-based approaches to emotions are being used
as a reliable means to detect user’s emotional states and contextual information with
respect to affective, emotion-aware and context-aware HCI applications. Zhou et al.
(2007, p. 5) state: ‘Conversation is a major channel for communicating emotion.
Extracting the emotion information in conversation enables computer systems to
detect emotions and capture emotional intention more accurately so as to mediate
human emotions by providing instant and proper services’.
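Schematically, the parsing stage described above can be thought of as routing each captured channel to its own recognizer before any higher-level interpretation takes place; in the sketch below the channel names and stub recognizers are placeholders standing in for real speech, facial-expression, and gesture processing components.

    # Schematic sketch of per-channel affect-cue extraction; the recognizers here
    # are stubs standing in for real speech, facial-expression, and gesture analysis.

    from typing import Any, Callable, Dict

    def speech_recognizer(audio: Any) -> dict:
        return {"channel": "speech", "cues": ["raised pitch", "fast tempo"]}

    def face_recognizer(frame: Any) -> dict:
        return {"channel": "face", "cues": ["brow lowered", "lips pressed"]}

    def gesture_recognizer(frames: Any) -> dict:
        return {"channel": "gesture", "cues": ["rapid hand movement"]}

    RECOGNIZERS: Dict[str, Callable[[Any], dict]] = {
        "microphone": speech_recognizer,
        "camera_face": face_recognizer,
        "camera_body": gesture_recognizer,
    }

    def extract_cues(sensor_data: Dict[str, Any]) -> list:
        """Dispatch each sensor stream to its channel-specific recognizer."""
        return [RECOGNIZERS[s](data) for s, data in sensor_data.items() if s in RECOGNIZERS]

    if __name__ == "__main__":
        print(extract_cues({"microphone": b"...", "camera_face": b"..."}))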

8.5.3 Studies on Emotion Recognition: Classification and Issues

There is a plethora of studies on emotion recognition in computer science. They can
be heuristically classified into two main categories: face-based recognition and non-
face-based recognition. The former category focuses on recognizing emotions from
facial expressions by image analysis and understanding (e.g., Wimmer et al. 2009;
Susskinda et al. 2007; Phillips et al. 2005; Schweiger et al. 2004; Cohen et al. 2003;
Michel and El Kaliouby 2003; Sebe et al. 2002; Pantic and Rothkrantz 2000; Tian
et al. 2001), and the latter deals with recognition and modeling of human behaviors,
such as hand gestures (Huang and Pavlovic 1995; Yin and Xie 2001), body
movement (Gavrila and Davis 1996; Gavrila 1999), and speech (Sebe et al. 2004;
Murray and Arnott 1993; Chiu et al. 1994; Dellaert et al. 1996a, b; Scherer 1996;
Sagisaka et al. 1997; Murray and Arnott 1996) or modeling the interaction between
speech and gesture (Cassell et al. 1994). Further, long ago, Lang (1979) suggested
that three systems exist that could serve as indicators to detect the emotion of the
user: (1) verbal information (reports about perceived emotions described by users);
(2) conductual (behavioral) information (facial and postural expressions and speech paralin-
guistic parameters); and (3) psychophysiological responses (heart rate, galvanic
skin response, and electroencephalographic response). Overall, in the most popular
methods, which are usually based on a behavioral approach, emotions are recog-
nized using facial expressions, voice, or biometric data (Hager et al. 2002; Kang
et al. 2000; Teixeira et al. 2008). Today, dedicated systems often ease the
challenge of emotion detection (Ikehara et al. 2003; Sheldon 2001; Vick and
Ikehara 2003); they derive the emotional state from different sources, such as blood
pressure, pulse, perspiration, brain waves, heart rate, and skin temperature.
Miniaturization of computing devices, thanks to nano- and micro-engineering, is
making possible the development of on-body sensors that can detect or register
parameters in an unobtrusive way. Researchers foresee that AmI environments will be
densely populated by systems with potentially powerful NBIC capabilities
(nano-bio-ICT) (Riva et al. 2005).
However, current research shows that most of the popular methods for emotion
recognition ignore the pragmatic and sociocultural context of emotions as well as
adopt simple valence classification of emotions. Designing affective systems for
deployment in different real-world environments, including cultural settings, is not
an easy task. Indeed, current emotion recognition methods lack usability in
real-world implementation with respect to affective and context-aware systems,
although they have yielded excellent results in laboratory settings. In terms of facial
expression recognition, Pantic and Rothkrantz (2003) point out that whilst there is
disagreement about the classification of emotions, varied research shows that
automated systems can recognize a range of emotions with 64–98 % accuracy as
compared to human experiments where recognition rates are 70–98 %. Arguably,
given the limitation of emotion recognition methods, whether concerning facial
expressions, emotional speech or gestures, such variations in recognition rates
would militate against their use in more sophisticated and critical applications, such
as conversational agents and affective systems, which rely heavily on the contextual
dimension of emotions. Such variations in recognition may rather be practical in
less critical applications that may use unimodal input such as facial expression, for
example, a video game that alters some aspects of its content in response to the
viewer's emotions, such as fear or anger, as inferred from their facial expression.

8.6 Areas of Affective Computing

8.6.1 Facial, Prosodic, and Gestural Approaches to Emotion

Apart from fulfilling conversational functions, nonverbal communication behaviors,
which constitute a large percentage of our communication, are used by humans as
external expressive behaviors to communicate or convey emotions. Nonverbal
behavior such as gestures, facial expressions, head nods, and postural expressions
have been argued to be important for the transmission of affective information (e.g.,
Short et al. 1976; Argyle 1990). Displayed in various forms ranging from the most
discrete of facial expressions to the most dramatic and prolific gestures, these affect
displays vary between and within cultures (Ekman 1993). In computing, affect
display behaviors (verbal and nonverbal, multimodal) are used as emotional
signals or cues by affective systems to identify users’ emotional states. In affective
computing, the detection of emotions tends to rely upon the assessment of multimodal
input, but cultural variations seem to be less accounted for. Hitherto, most of the
research in affective computing and, thus, affective AmI tends to focus on facial
displays—facial expressions and gestures. Many research projects are currently
active as to investigating how to detect the different facial movements and head
gestures of users while they are interacting naturally with computers. However,
gesture recognition is also important as to easing affective and context-aware HCI,
as gesture is deemed essential in conveying affective and contextual information to
the listener in a conversation, hence its relevance to affective applications. The
research is indeed active in gesture recognition as an approach to emotion recog-
nition, but not as intensive as for facial expression recognition. The same goes for
paralinguistic speech features or parameters; current research is investigating how
to extract acoustic (prosodic) parameters from the speech waveform (related to
pitch, voice quality, intonation, loudness and rhythm) to disambiguate the affect
display. In addition, there are research initiatives that are focusing on multimodal
recognition, exploring different combination of channels, mainly facial expressions
and speech prosody or facial expression and gesture. For more detail on the
affective and conversational functions of hand gestures and paralanguage as well as
eye movements, refer to the previous chapter.
As far as the recognition methods are concerned, the most common methods used
in the detection and processing of facial expressions are hidden Markov models and
neural network processing (see, e.g., Wimmer et al. 2009; Pantic and Rothkrantz
2000). See Chap. 4 for a detailed discussion on pattern recognition techniques
supported with illustrative examples relating to different types of context. Hand
gestures have been a common focus of body gesture detection methods (Pavlovic
et al. 1997). Body gesture refers to the position and movements of the body. There are
many proposed methods (Aggarwal and Cai 1999) for detecting body gestures. As an
illustrative example of affect display, facial expressions are addressed in a little more
detail in the next section as an affect display behavior and recognition method. But
before delving into this, it may be worth shedding light on the emotive function of
language—emotiveness—since this topic has not been covered so far, neither in relation
to conversational agents in the previous chapter nor in relation to affective systems.
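Before turning to emotiveness, the frame-based classification idea underlying the methods just cited can be illustrated, in a deliberately simplified form, with a toy nearest-centroid classifier over a facial feature vector; this stands in for the hidden Markov model and neural network approaches mentioned above, and the features and centroid values are invented for the example.

    # Toy nearest-centroid classifier over a facial feature vector; a stand-in for
    # the HMM/neural-network methods cited above. Feature values are invented.

    import math

    # Hypothetical per-expression centroids over features such as
    # (mouth_corner_raise, brow_lower, eye_opening), each normalized to 0..1.
    CENTROIDS = {
        "happiness": (0.9, 0.1, 0.5),
        "anger":     (0.2, 0.9, 0.4),
        "surprise":  (0.5, 0.1, 0.9),
    }

    def classify(features: tuple) -> str:
        """Return the expression whose centroid is closest in Euclidean distance."""
        def dist(centroid):
            return math.sqrt(sum((f - c) ** 2 for f, c in zip(features, centroid)))
        return min(CENTROIDS, key=lambda name: dist(CENTROIDS[name]))

    if __name__ == "__main__":
        print(classify((0.85, 0.15, 0.55)))  # -> happiness
        print(classify((0.25, 0.80, 0.35)))  # -> anger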

8.6.2 A Linguistic Approach to Emotion: Emotiveness

Stevenson and Stevenson (1963) describe emotiveness as the feature of language
that is made up of the elements of speech used to convey emotive meaning in a
sentence. According to Jakobson (1960), emotiveness is described by the emotive
function of language. Conversation is a major channel of communicating emotions.
Solomon (1993) argues that the semantic and pragmatic diversity of emotions is
best conveyed in spoken language. Realizing the emotive function of language
lexically is a common aspect of most languages. The lexicon of words describing
emotional states plays a key role in expressing emotions (Nakamura 1993). The
emotive function of language can be realized lexically through such parts of speech
as endearments and exclamations. For example, in Japanese, the emotive function
of language is realized lexically through such parts of speech as exclamations,
hypocoristics (endearments), vulgar language, and mimetic expressions (Ptaszynski
et al. 2009). Nakamura (1993, cited in Ptaszynski et al. 2009) proposes a classifi-
cation of emotions into 10 types as most appropriate for Japanese language on the
basis of a comprehensive study on emotions in this language: joy and delight;
anger; sorrow and sadness; fear; shame, shyness and bashfulness; liking and
fondness; dislike and detestation; excitement; relief; and surprise and amazement.
In English conversation, by contrast, Zhou et al. (2007) claim that the processing of
emotion has not been systematically explored. However, depending on the situation,
affective systems may at certain moments have access only to communication channels
limited to prosodic features of language; in this case, lexical symbols
may be useful to the analysis of the prosodic features of speech with which utterances
would otherwise be spoken, using a whole set of variations in the features of voice
dynamics. Similarly, in the communication channel limited to transmission of
lexical symbols, the analysis of some prosodic elements of speech such as tone of
voice or intonation must focus on its textual manifestations such as exclamation
marks or ellipsis (Ptaszynski et al. 2009). Acting on many levels, prosody facilitates
(or impedes) lexical processing (Karpinski 2009). Research shows that speech
recognition and synthesis used currently by affective and context-aware systems
may not, for various reasons, recognize with high confidence the full lexicon of
words describing emotional states, but their assessment of users' affect display,
mainly paralinguistic features and facial expression signals, is more likely to yield
a high-probability estimation of the user's current emotional state.

Table 8.1 Structure of emotion in English conversation
Lexical choice: The use of emotion lexical terms can be associated with emotion
  types in conversation (e.g., dislike, like, pleased, displeased, joy, distress)
Syntactic form: Word order variation can be associated with the display of a
  speaker's emotions (e.g., word order or question design)
Prosody: Prosody interacts with the verbal components with respect to a speaker's
  emotions (e.g., intonation, duration, and intensity)
Sequential positioning: The expression of an emotion is an interactional phenomenon
  that is associated with the organization of turns and sequences (e.g., repeat,
  repair, contingency, overlap)
Source Zhou et al. (2007)

However, speech remains the most precise tool for expressing complex
intentions (Ibid). In terms of affect recognition in speech, most research focusing on
building computational models for the automatic recognition of affective expression
in speech investigates how acoustic (prosody) parameters extracted from the speech
waveform (related to voice quality, intensity, intonation, loudness and rhythm) can
help disambiguate the affect display without knowledge of the textual component of
the linguistic message.
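A minimal sketch of the lexical side of this picture, looking up emotion-bearing words and treating textual marks such as exclamation points or ellipses as stand-ins for prosody, might look as follows; the tiny lexicon and the cues counted here are illustrative assumptions only.

    # Illustrative lexical emotiveness sketch: a tiny emotion lexicon plus textual
    # stand-ins for prosody (exclamation marks, ellipsis). Not a real lexicon.

    EMOTION_LEXICON = {
        "great": "joy", "love": "liking", "hate": "dislike",
        "scared": "fear", "sorry": "sorrow", "wow": "surprise",
    }

    def detect_emotiveness(utterance: str) -> dict:
        tokens = [t.strip(".,!?…").lower() for t in utterance.split()]
        lexical_hits = [EMOTION_LEXICON[t] for t in tokens if t in EMOTION_LEXICON]
        return {
            "lexical": lexical_hits,
            "exclamatory": utterance.count("!"),       # textual cue for emphasis
            "hesitation": utterance.count("...") + utterance.count("…"),
        }

    if __name__ == "__main__":
        print(detect_emotiveness("Wow, I love this!"))
        print(detect_emotiveness("I'm sorry... I was scared."))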
Regarding the investigation of the structure of emotion in conversations, there
are various features that have been studied in conversation analysis (Karkkainen
2006; Wu 2004; Gardner 2001). In order to explore the structure of emotions in
English conversation, Zhou et al. (2007) study four such features (see Table 8.1): lexical choice, syntactic form, prosody, and sequential positioning, facets that have been examined in conversation analysis by the above authors. Álvarez et al. (2006) provide feature subset selection based on evolutionary algorithms for automatic emotion recognition in spoken Spanish.

8.7 Facial Expressions and Computing

Compared to other components of affect display or nonverbal behavior, facial expression has been the most extensively investigated and is still the prime focus of research within affective computing. Facial expressions are the most commonly used emotional cues in emotionally intelligent systems, affective context-aware systems, socially intelligent systems, and conversational systems. These involve the detection or recognition of the emotional states of users, to varying degrees and in a variety of ways.

8.7.1 Facial Expressions: Theoretical Perspectives

There is a long tradition in emotion psychology of investigating facial expressions as an observable indicator of unobservable emotional processes underlying outward
emotional states. Several theorists argue for the universality of facial expressions—
communicating various emotions irrespective of cultural variations of people. Most
of the research of the so-called discrete emotion theorists concerning the univer-
sality of basic emotions is based on studies on facial expressions (see, e.g., Ekman
1994; Izard 1994). Ekman and Friesen (1972) find that humans demonstrate six universal facial displays: happiness, anger, disgust, sadness, fear, and surprise, and these are expressed and interpreted in a similar way by humans of any origin all over the world; they do not depend on culture or country of origin. Ekman (1982) and Ekman and Friesen (1975) show that people across a number of cultures are able to recognize seven distinct facial expressions from posed photographs, adding interest to the list. Some researchers in nonverbal communication claim that
facial movements may express at least eight emotions, adding contempt to the
above seven emotions (Ekman et al. 1972). According to Ekman, an interpretation
of facial expressions must rely on the postulated configurations and not on single
components (Ekman and Rosenberg 1997).
DeVito (2002, p. 139) maintains that facial expressions are called ‘primary affect
displays: They indicate relatively pure, single emotions. Other emotional states and
other facial displays are combinations of these various primary emotions and are
called affect blends. You communicate these blended feelings with different parts of
your face. Thus, for example, you may experience both fear and disgust at the same
time. Your eyes and eyelids may signal fear, and movements of your nose, cheek,
and mouth area may signal disgust.’ Moreover, research studies (e.g., Ekman 1994, 1999) show that certain facial areas (lower and upper) reveal our emotional state better than others. For example, the eyes tend to reveal happiness or sadness, and even surprise; anger can be revealed through the eyebrows and forehead; and the lower face can reveal happiness or surprise. The degree of pleasantness, friendliness, and sympathy felt can be communicated by facial movements alone, with the rest of the body providing no additional information (DeVito 2002). But for other emotional messages, such as the intensity with which an emotion is felt, both facial and bodily cues are used (Graham et al. 1975; Graham and Argyle 1975).
Furthermore, listeners vary in their ability to decode and speakers in their ability to
encode emotions (Scherer 1986). In view of that, there are some emotions that are
difficult to disambiguate or differentiate. In terms of emotion types, it is sometimes difficult to distinguish between fear and angst, or among frustration, irritation, and anger, while it is unlikely, in terms of emotion valence, to confuse disgust with sympathy or admiration with detestation. According to Solomon (1993), individuals sometimes misattribute the specific emotion types, but they rarely misattribute their valence. Unlike humans, computers may confuse valence when it comes to facial expression recognition. Wimmer et al. (2009) report findings indicating that the facial expressions of happiness and fear are confused most often due to the similar muscle activity around the mouth. This is also reflected in the Facial Action Coding System (FACS), which describes the muscle activities within a human face (Ekman 1999).

8.7.2 Recognizing Emotion from Facial Expressions: Humans and HCI Applications

As an explicit affect display, facial expressions are highly informative about the
affective or emotional states of people. The face is so visible that conversational
participants can interpret a great deal from the faces of each other. Facial expressions
can be important for both speakers and listeners: they allow listeners to infer speakers' emotional stance towards their utterances and speakers to gauge their listeners' reactions to what is being uttered or expressed. Facial cues can constitute communicative acts, comparable to 'speech acts', directed at one or more interaction partners (Bänninger-Huber 1992). Recognizing facial displays is one aspect of natural HCI and one of the challenges in augmenting computer systems with human–human (or human-like) interaction capabilities. Equipping systems with facial expression recognition abilities is an attempt to create HCI applications that aim to take the holistic nature of the human user into account—that is, to touch
humans in holistic and sensible ways, by considering human emotion, (expressive)
behavior and (cognitive) intention (for more detail on this dimension see next
chapter). This concerns the emerging and future affective systems in terms of
becoming more intuitive, aware, sensitive, adaptive, and responsive to the user.
Widespread applicability and the comprehensive benefit motivate research on the
topic of natural interaction, one important feature of which is facial expression
recognition. Perceiving or being aware of human emotions via facial expressions
plays a significant role in determining the success of next-generation interactive
systems intended for different applications, e.g., computer-assisted or e-learning
systems, conversational agents, emotionally intelligent systems, emotional context-
aware systems, and emotion-aware AmI systems. The quality, success rate, and acceptance of such applications, or combinations thereof, will rise significantly as the technologies for their implementation, especially recognition and capture techniques, evolve and advance. In a multidisciplinary work on automatic facial expression interpretation, Lisetti and Schiano (2000) integrate human interaction, AI, and cognitive science with an emphasis on pragmatics and cognition. Their work provides a comprehensive overview of applications in emotion recognition. Also,
interdisciplinary research (interactional knowledge) crossing multiple disciplines
(including cognitive psychology, cognitive science, computer science, behavioral
science, communication behavior, and culture studies) is necessary in order
to construct suitable, effective interaction methods and user interfaces and,
thus, successful and widely accepted (affective) interactive systems. Indeed, cul-
tural studies are very important when it comes to HCI design in all of its areas.

In terms of affective HCI, cultural variations are great as different cultures may
assign different meanings to different facial expressions. For example, a smile as a
facial expression can be considered a friendly gesture in one culture while it can
signal embarrassment in another culture. Hence, affective HCI, whether concerning affective and AmI systems or conversational systems, should account for cultural variations as a key criterion for building effective user interfaces. The implementation or instantiation of technological systems in real-world environments may run counter to what the evaluation conducted in the lab suggests about the performance of those technologies. In fact, what is technically feasible and risk-free within the lab may have implications in real life.

8.7.3 Research Endeavors in Facial Expression Recognition in HCI

Considerable research is being carried out on the topic of facial displays with focus
on the relationship between facial expressions and gestures and emotional states
within the field of affective, emotion-aware, and context-aware HCI. Automatic
recognition of human facial displays, expressions and gestures, has, particularly in recent years, gained significant ground in natural HCI: the naturalistic user interfaces used by affective, conversational, and AmI systems alike. The HCI community is extensively investigating the potential of facial displays as a form of implicit input for detecting the emotional states of users. To date, most research within computing tends to center on recognizing and categorizing facial expressions (see, e.g., Gunes and Piccardi 2005; Kapur et al. 2005). Recent research projects are exploring how to track and detect facial movements corresponding to both lower and upper facial features, with the hope of integrating state-of-the-art facial expression analysis modules with new miniaturized (multi)sensors to reliably recognize different emotions. A number of approaches to facial expression recognition have been
developed and applied to achieve real-time performance and provide robustness for
real-world applicability. Research is indeed focusing on building a real-time system
for facial expression recognition that robustly runs in real-world environments with respect to the implementation of conversational and affective systems. Most of the popular systems for facial expression recognition (e.g., Cohen et al. 2003; Sebe
et al. 2002; Tian et al. 2001; Pantic and Rothkrantz 2000; Edwards et al. 1998;
Cohn et al. 1999; Wimmer 2007; Wimmer et al. 2009) are built based on the
universal six facial expressions. Figure 8.2 illustrates one example of each of ‘the six universal facial expressions’ (Ekman 1972, 1982) as they occur in Kanade et al. (2000), a comprehensive database for facial expression analysis and automatic face recognition.
In Ekman (1999), the Facial Action Coding System (FACS) describes the
muscle activities within a human face. Facial expressions are generated by com-
binations of Action Units (AUs), which denote the motion of particular facial

Fig. 8.2 The six universal facial expressions (happiness, anger, disgust, sadness, fear, surprise). Source: Kanade et al. (2000)

regions and specify the facial muscles concerned. Based on principles of neurophysiology, anatomy, and biomechanics, motor neurons supplying groups of muscle fibers with their innervations form motor units, which are connected to the primary motor cortex of the brain via the pons, an area that conveys the ability to move muscles independently and perform fine movements. Theoretically, the fewer fibers there are in each motor unit, the finer the degree of facial movement control. On the other hand, extended systems such as emotional FACS (Friesen and Ekman 1982) denote
the relation between facial expressions and corresponding emotions. In an attempt
to expand his list of basic emotions, Ekman (1999) provides a range of positive and negative emotions, not all of which are encoded in facial muscles, including amusement, contempt, contentment, embarrassment, excitement, pride in achievement, relief, satisfaction, sensory pleasure, shame, and so on. Also, research shows that some facial expressions can have several meanings at the same time, for they normally have different functions and indicate different things. Investigation is active into developing new approaches to address related issues in the area of facial expression recognition in relation to affective HCI applications. ‘Given the
multi-functionality of facial behavior and the fact that facial indicators of emotional
processes are often very subtle and change very rapidly…, we need approaches to
measure facial expressions objectively—with no connotation of meaning—on a
micro-analytic level. The Facial Action Coding System (FACS; Ekman and Friesen
1978) lends itself to this purpose; it allows the reliable coding of any facial action in
terms of the smallest visible unit of muscular activity (Action Units), each referred
to by a numerical code. As a consequence, coding is independent of prior
assumptions about prototypical emotion expressions. Using FACS, we can test
different hypotheses about linking facial expression to emotions’ (Kaiser and
Wehrle 2001, pp. 287–288).
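To make the route from coded muscle activity to emotion labels more concrete, the sketch below matches a set of detected Action Units against simplified prototypical AU combinations in the spirit of EMFACS-style tables. The AU sets and the coverage score are assumptions for illustration, not the full coding system of Ekman and Friesen (1978).

```python
# Simplified, illustrative AU-combination table (assumption); a faithful
# implementation would follow the full FACS/EMFACS specifications.
EMOTION_AU_PROTOTYPES = {
    "happiness": {6, 12},           # cheek raiser + lip corner puller
    "sadness":   {1, 4, 15},        # inner brow raiser + brow lowerer + lip corner depressor
    "surprise":  {1, 2, 5, 26},     # brow raisers + upper lid raiser + jaw drop
    "anger":     {4, 5, 7, 23},     # brow lowerer + lid/lip tighteners
    "disgust":   {9, 15, 16},       # nose wrinkler + lip depressors
    "fear":      {1, 2, 4, 5, 20, 26},
}

def match_emotions(detected_aus: set[int]) -> list[tuple[str, float]]:
    """Rank emotions by how completely their prototypical AU set is present."""
    scores = []
    for emotion, prototype in EMOTION_AU_PROTOTYPES.items():
        coverage = len(prototype & detected_aus) / len(prototype)
        if coverage > 0:
            scores.append((emotion, round(coverage, 2)))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

if __name__ == "__main__":
    # e.g. AUs 6, 12, and 25 reported by an upstream facial-action detector
    print(match_emotions({6, 12, 25}))
```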

8.7.4 The Common Three-Phase Procedure of Facial Expression Recognition

Commonly, automatic facial expression recognition is considered to be a task of computer vision, which entails the ability of a computer to analyze visual input (in
this case emotional signs as part of facial cues). This computer application auto-
matically identifies or detects a person’s facial expression from a digital image or a
video frame from a video source, by comparing selected (localized) facial features
from the image and a facial expression database. It is important to extract mean-
ingful features in order to derive the facial expression visible from these features.
This task consists of various subtasks and involves a wide variety of techniques to
accomplish these subtasks, which generally include localizing facial features,
tracking them, and inferring the observable facial expressions. Several
state-of-the-art approaches to performing these subtasks can be found in the literature (e.g., Chibelushi and Bourel 2003; Wimmer et al. 2009), some of which will be referred to in this section. According to the survey of Pantic and Rothkrantz (2000), the computational procedure of facial expression recognition involves three phases: face detection, feature extraction, and facial expression classification (happiness, anger, disgust, sadness, fear, surprise).
Phase 1: As with all computer tasks, different methods exist for performing face detection as part of the overall procedure of facial expression recognition. The face detection task can be executed automatically, as in Michel and El Kaliouby (2003) and Cohn et al. (1999), or manually, by specifying the necessary information so as to focus on the interpretation task itself, as in Cohen et al. (2003), Schweiger et al. (2004), and Tian et al. (2001). However, according to Wimmer et al. (2009, p. 330), ‘more
elaborate approaches make use of a fine grain face model, which has to be fitted
precisely to the contours of the visible face. As an advantage, the model-based
approach provides information about the relative location of the different facial
components and their deformation, which turns out to be useful for the subsequent
phases’.
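As a hedged illustration of this first phase, the sketch below localizes a face with OpenCV's bundled Haar cascade detector. The cascade file and the input frame path are assumptions; a model-based approach of the kind Wimmer et al. (2009) describe would additionally fit a deformable face model to the detected region.

```python
import cv2

# Phase 1 (sketch): localize the face region with OpenCV's bundled Haar cascade.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)

def detect_face(frame):
    """Return the bounding box (x, y, w, h) of the largest detected face, or None."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    return max(faces, key=lambda box: box[2] * box[3])  # largest area wins

if __name__ == "__main__":
    image = cv2.imread("example_frame.png")  # hypothetical input frame
    if image is not None:
        print(detect_face(image))
```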
Phase 2: Feature extraction is mainly concerned with the muscle activity of facial expressions. Most approaches use the Facial Action Coding System (FACS). Numbering over twenty, the muscles of facial expression allow a wide variety of movements and convey a wide range of emotions (Gunes and Piccardi 2005; Kapur et al. 2005) or emotional states. Specifically, the muscle activity allows various facial actions depending on what expressive behavior is performed and conveys a wide range of emotions, which are characterized by a set of different shapes as they reach the peak expression. Facial expressions consist of two important aspects: ‘the muscle activity while the expression is performed and the shape of the peak expression’, and methods used in this phase tend ‘to extract features that represent one or both of these aspects’ (Wimmer et al. 2009). When it comes to feature extraction, approaches may differ slightly as to the number of feature points that are to be extracted from the face, which depends on what area of the face is mostly the
focus as well as on the approach adopted. Within the face, Michel and El Kaliouby
(2003) extract the location of 22 feature points that are predominantly located
around the eyes and around the mouth. In their approach, they focus on facial
motion by manually specifying those feature points and determine their motion
between an image showing the neutral state of the face and another representing a facial expression. In a similar approach, Cohn et al. (1998) use a hierarchical optical flow approach called feature point tracking in order to determine the motion of 30 feature points. Schweiger et al. (2004) manually specify the region of the visible face, while the approach of Wimmer et al. (2009) performs automatic localization via model-based image interpretation.
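The following sketch illustrates the feature-point-motion idea described above by tracking a handful of manually specified points from a neutral frame to a peak-expression frame with pyramidal Lucas-Kanade optical flow. The frame paths and point coordinates are assumptions; the cited systems use considerably more elaborate tracking and model fitting.

```python
import cv2
import numpy as np

def displacement_features(neutral_path, peak_path, points):
    """Track feature points from a neutral to a peak frame and return their (dx, dy) motion."""
    neutral = cv2.cvtColor(cv2.imread(neutral_path), cv2.COLOR_BGR2GRAY)
    peak = cv2.cvtColor(cv2.imread(peak_path), cv2.COLOR_BGR2GRAY)
    prev_pts = np.array(points, dtype=np.float32).reshape(-1, 1, 2)
    # Pyramidal Lucas-Kanade optical flow between the two frames
    next_pts, status, _err = cv2.calcOpticalFlowPyrLK(neutral, peak, prev_pts, None)
    motion = (next_pts - prev_pts).reshape(-1, 2)
    motion[status.ravel() == 0] = 0.0  # zero out points that could not be tracked
    return motion.flatten()

if __name__ == "__main__":
    # A few assumed pixel coordinates around the eyes and mouth
    pts = [(120, 180), (160, 180), (100, 90), (180, 90)]
    print(displacement_features("neutral.png", "peak.png", pts))
```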

Phase 3: The last phase is concerned with the classification of facial expressions.
That is, it determines which one of the six facial expressions is derived or inferred
from the extracted features. Likewise, there are different approaches to facial expression classification, clustered into supervised learning (e.g., HMMs, neural networks, decision trees, support vector machines) and unsupervised learning (e.g., graphical models, multiple eigenspaces, variants of HMMs, Bayesian networks). Michel and El Kaliouby (2003) train a support vector machine (SVM) to determine one of the six facial expressions within the video sequences of the comprehensive facial expression database developed by Kanade et al. (2000) for facial expression analysis. This database, known as the Cohn–Kanade Facial Expression database (CKFE-DB), contains 488 short image sequences of 97 different individuals performing the six universal facial expressions; each sequence shows a neutral face at the beginning and then builds up to the peak expression. To accomplish classification, Michel and El Kaliouby (2003) compare the first frame, with the neutral expression, to the last frame, with the peak expression. Basing their classification instead on supervised neural network learning, Schweiger et al. (2004) compute the optical flow within six predefined regions of a human face in order to extract the facial features. Other existing approaches follow the rules of Ekman and Friesen (1978) by first computing the visible Action Units (AUs) and then inferring the facial expression.
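A minimal sketch of the classification phase is given below: a support vector machine is trained on displacement feature vectors (such as those produced in the Phase 2 sketch) labeled with one of the six expressions. The training data here are random placeholders rather than CKFE-DB sequences, so the example only shows the mechanics of supervised classification.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

EXPRESSIONS = ["happiness", "anger", "disgust", "sadness", "fear", "surprise"]

# Phase 3 (sketch): placeholder training data standing in for labeled sequences.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(120, 8))                     # 120 sequences, 8 motion features
y_train = rng.integers(0, len(EXPRESSIONS), size=120)   # assumed expression labels

classifier = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
classifier.fit(X_train, y_train)

def classify(feature_vector):
    """Return the predicted expression label for one feature vector."""
    index = classifier.predict(np.asarray(feature_vector).reshape(1, -1))[0]
    return EXPRESSIONS[index]

if __name__ == "__main__":
    print(classify(rng.normal(size=8)))
```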

8.8 Approaches, Frameworks, and Applications

The AmI community increasingly understands that developing emotion-aware and affective context-sensitive applications that can adaptively and responsively serve the intentions behind the emotional states of users should be supported by adequate
emotion modeling solutions. Hence, the initiative of including affective computing
in AmI is increasingly gaining interest. This is manifest in the recent research
endeavors attempting to incorporate a wide range of applications from AI with
AmI. In addition to including affective aspects of users into context-aware com-
puting, there is a rising tendency in AmI research to integrate more affective
computing applications to enhance HCI by humanizing AmI systems. Examples of
applications that are of interest to the AmI community include: emotion-aware HCI,
multimodal context-aware affective HCI, (context-aware) emotionally intelligent
HCI, conversational HCI, and emotion-oriented HCI (e.g., HUMAINE). Numerous
technologies for the implementation of affective systems have been developed since the late 1990s, the period when computer scientists in affective computing started to focus
on developing computer devices that recognize and respond to human affect.
A variety of new projects have recently been launched, others are currently being
investigated and some are under evaluation for further improvements. While most
research projects are happening in the field of affective computing, there are some
joint research endeavors between AI and AmI. However, most of the systems and
applications that have been developed so far are far from real-world implementa-
tion. On the whole, the research in the field is still in its infancy.
In the following, an approach to the estimation of users’ affective states in HCI and two frameworks are presented and described, along with related example applications. It should be noted that both the approach and the frameworks are preliminary and the proposed applications are still at very early stages. The approach is a step towards the full implementation of the ability EIF. The first framework is a modeling
approach into multimodal context-aware affective interaction. It is a domain
ontology of context-aware emotions, which serves particularly as a guide for
flexible design of affective context-aware applications. The second framework is a
model for emotion-aware AmI, which aims to facilitate the development of appli-
cations that take their user’s emotion into account, by providing responsive services
that help users to enhance their emotional intelligence.

8.8.1 Towards Context-Aware Affective AmI Systems: Computing Contextual Appropriateness of Affective States

In this line of research, in Ptaszynski et al. (2009), the authors propose an approach
to the estimation of user’s affective states in HCI, a method for verifying (com-
puting) contextual appropriateness of affective states conveyed in conversations,
which is capable of specifying users’ affective states in a more sophisticated way
than simple valence classification. Indeed, they assert that this approach is novel, as
it attempts to go beyond the first basic step of EIF—emotion recognition, and
represents a step forward in the implementation of EIF. Their argument for this
method making a step towards practical implementation of EIF is that it provides
machine computable means for verifying whether an affective state conveyed in a
conversation is contextually appropriate. Apart from specifying what type of
emotion was expressed, the proposed approach determines whether the expressed
emotion is appropriate for the context it is expressed or appears in—the appro-
priateness of affective states is checked against their contexts. One more important
feature of this method is its contribution to the classification standardization of
emotions as it uses the most reliable one available today. The proposed method uses
affect analysis system on textual input to recognize users’ emotions—that is to
determine the specific emotion types as well as valence, and a Web mining tech-
nique to verify their contextual appropriateness.
This approach has demonstrated the difficulty in disambiguating emotion types and valence, since the accuracy of determining the contextual appropriateness of emotions was evaluated at 45 % for specific emotion types and at 50 % for valence. Accordingly, the authors state that the system is still not perfect and its components need improvement, but that it nevertheless defines a new set of goals for affective computing and for AI research in general. An example of an
application of their system is a scenario in which a conversational agent can choose either to sympathize with users or to help them manage their emotions; that is, it can be provided with hints about how to desirably plan its communication at any point.
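To give a flavor of what such contextual verification might look like, the sketch below checks a recognized emotion against a context-conditioned table of typically observed emotions and flags mismatches. The table, thresholds, and labels are stand-in assumptions and do not reproduce the Web mining technique of Ptaszynski et al. (2009).

```python
# Illustrative (assumed) table of emotions typically observed in a context;
# Ptaszynski et al. derive such associations by Web mining rather than a table.
CONTEXT_EMOTION_PRIOR = {
    "receiving_a_gift": {"joy": 0.7, "surprise": 0.2, "dislike": 0.1},
    "being_insulted": {"anger": 0.6, "sadness": 0.3, "joy": 0.1},
}

VALENCE = {"joy": "positive", "surprise": "positive",
           "anger": "negative", "sadness": "negative", "dislike": "negative"}

def appropriateness(context: str, recognized_emotion: str, threshold: float = 0.2):
    """Judge whether a recognized emotion fits its conversational context."""
    prior = CONTEXT_EMOTION_PRIOR.get(context, {})
    score = prior.get(recognized_emotion, 0.0)
    return {
        "emotion": recognized_emotion,
        "valence": VALENCE.get(recognized_emotion, "unknown"),
        "contextually_appropriate": score >= threshold,
        "confidence": score,
    }

if __name__ == "__main__":
    print(appropriateness("being_insulted", "joy"))
```

In line with the example above, an agent could respond sympathetically when the emotion is judged appropriate and offer emotion-management support when it is not.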

8.8.2 Multimodal Context-Aware Affective Interaction

In Cearreta et al. (2007), the authors propose a generic approach to modeling context-aware emotions, a domain ontology of context-aware emotions, taking different theoretical models of emotions into account. This ontology, defined on the basis of references found in the literature, introduces and describes important concepts
and mechanisms used in the affective computing domain to create models of
concrete emotions. The authors state that this application ontology contains all the
necessary concepts to model specific applications, e.g., affective recognizers in
speech, and enables description of emotions at different levels of abstraction while
serving as a guide for flexible design of multimodal affective devices or
context-aware applications, independently of the starting model and the final way of
implementation. This domain ontology of context-aware emotions collects information obtained from different emotion channels (e.g., facial expressions, postural expressions, speech paralinguistic parameters, psychophysiological responses), supporting the development of multimodal affective applications. The authors
maintain that this generic ontology can be useful for the description of emotions
based on the various systems of emotion expression and detection which are
components that constitute user context. See Chap. 5 for a detailed description of
the proposed ontology.
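As a rough illustration of how such an ontology might be instantiated in code, the sketch below represents a few of the concepts described above (emotion channels, observations from those channels, and the user context they occur in) as simple data classes. The class and field names are my assumptions, not the ontology of Cearreta et al. (2007).

```python
from dataclasses import dataclass, field
from enum import Enum

class Channel(Enum):
    """Emotion channels the ontology can collect information from."""
    FACIAL_EXPRESSION = "facial_expression"
    POSTURE = "postural_expression"
    SPEECH_PARALINGUISTIC = "speech_paralinguistic"
    PSYCHOPHYSIOLOGICAL = "psychophysiological"

@dataclass
class EmotionObservation:
    channel: Channel
    label: str            # a discrete category or dimensional value
    confidence: float

@dataclass
class ContextAwareEmotion:
    """A detected emotion together with the user context it was observed in."""
    observations: list[EmotionObservation] = field(default_factory=list)
    user_context: dict = field(default_factory=dict)  # e.g. location, activity

    def dominant_label(self) -> str:
        best = max(self.observations, key=lambda o: o.confidence)
        return best.label

if __name__ == "__main__":
    sample = ContextAwareEmotion(
        observations=[
            EmotionObservation(Channel.FACIAL_EXPRESSION, "joy", 0.8),
            EmotionObservation(Channel.SPEECH_PARALINGUISTIC, "surprise", 0.4),
        ],
        user_context={"activity": "conversation", "location": "home"},
    )
    print(sample.dominant_label())
```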

8.8.3 Emotion-Aware AmI

In an attempt to facilitate the development of applications that take their user’s emotions into account and participate in the emotion interaction, Zhou et al. (2007)
propose the AmE framework: a model for emotion-aware AmI. This (preliminary)
framework integrates AmI, affective computing, emotion ontology, service ontol-
ogy, service-oriented computing, and emotion-aware services. It espouses an
ontological approach to emotion modeling and emotion-aware service modeling.
Emotion modeling is, in this framework, responsible for two components: emotion
detection and emotion motivation acquisition. The former component identifies
positive and negative emotions that are represented by emotion actions through
facial expressions, hand gestures, body movements, and speech, whereas the latter identifies the intention of the emotion. Emotion-aware service
modeling is responsible for reacting to the identified emotion motivations by cre-
ating services, delivering services (supplying appropriate emotion services to the
users), and managing the delivery of emotional services. The service creation
involves emotion-aware service composition (assembling existing services) and emotion-aware service development (creating new services in response to identified emotion motivation).
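A minimal sketch of this detection-to-service flow is given below, assuming a simple rule table that maps an identified emotion and its motivation to a responsive service. The emotion labels, motivations, and services are illustrative assumptions rather than the AmE framework's actual emotion and service ontologies.

```python
# Illustrative (assumed) mapping from (emotion, motivation) to an emotion-aware
# service; the AmE framework models these with emotion and service ontologies.
SERVICE_RULES = {
    ("sadness", "seek_comfort"): "play_soothing_music",
    ("anger", "reduce_stress"): "suggest_breathing_exercise",
    ("boredom", "seek_stimulation"): "recommend_interactive_content",
    ("joy", "share_experience"): "offer_to_notify_friends",
}

def select_service(emotion: str, motivation: str) -> str:
    """Pick a responsive service for an identified emotion and its motivation."""
    return SERVICE_RULES.get((emotion, motivation), "ask_user_for_clarification")

def handle_observation(emotion: str, motivation: str) -> dict:
    # Service delivery and management would dispatch this to the environment.
    return {"emotion": emotion, "motivation": motivation,
            "service": select_service(emotion, motivation)}

if __name__ == "__main__":
    print(handle_observation("boredom", "seek_stimulation"))
```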
Affective computing and affective AmI have numerous potential applications in
HCI. One can think of any scenario that may involve the affective states of the user
when interacting with technology. Hence, examples in this regard are numerous.
Using behavioral user state based on eye gaze and head pose, e-learning applica-
tions can adjust the presentation style of a computerized tutor when a learner is
bored, interested, frustrated, or pleased (Asteriadis et al. 2009). Currently,
e-learning platforms use interactive learning media that provide dynamic feedback on learners’ behaviors and emotions in real time. These have many features that help learners participate actively and facilitate expressiveness, which helps stimulate learning in terms of handling tasks and acquiring new knowledge as well as interacting with implicit interfaces. The MIT affective computing team (Zhou et al.
2007) has investigated how to assess indirect frustration, stress, and mood through
natural interaction and conversation. As mentioned earlier, the system ERMIS can
interpret the users’ emotional states (e.g., boredom, anger, interest) from facial
expressions and gestures. In computer-assisted learning, a computer acts as the tutor
by explaining the contents of the lesson and questioning the user afterwards, using
facial expression recognition techniques. In terms of affective context-aware
application, a user interface (agent) can change its visualization by selecting rele-
vant colors, size and fonts in ways that adapt to the current user’s affective states.
Adding to the above examples, an emotion monitoring agent sends a warning as a
context-dependent action prior to users’ acts. An affective computer can improve
the user’s self-awareness of his/her emotional states. A computer selects music
tracks based on the user mood or emotional state. However, regardless of the
emotion application type or domain, affective and AmI systems must be evaluated
in realistic world environment. This is crucial because AmI entails a complex
interaction and most AmI systems are still immature. It is the real setting that better
tells about the utility of a system designed in and for a laboratory environment.
Experience has shown that many things that are technically feasible within the lab may have serious implications in the real-world setting. That is, systems that work according to the designer may not necessarily work according to the user in the real-world situation. The realization of affective and AmI systems in their operating environments is the primary means of testing the performance of such systems, especially when it comes to intelligent behaviors.

8.9 Socially Intelligent AmI Systems: Visual, Aesthetic, Affective, and Cognitive Aspects

One interesting and important aspect of AmI is the system feature of social intel-
ligence: the ability to understand, manage, and, to some extent, negotiate complex
social interactions and environments. AmI is envisioned to be an integral part of
people’s social life. AmI systems should support the social interactive processes of
humans and be competent social agents in social interactions (Markopoulos et al.
2005; Nijholt et al. 2004; Sampson 2005). Emotions are a key element of socially
intelligent behavior. Accordingly, for AmI systems to serve human users well, they
are required to adapt to their emotions and thus elicit positive feelings in them, not
to be disturbing or inconvenient. Socially intelligent features of a system lie in
invoking positive feelings in the user (Markopoulos et al. 2005). A system designed
with socially intelligent features is one that is able to select and fine-tune its
behavior according to the affective (or emotional) state and cognitive state (task) of
the user (see Bianchi-Berthouze and Mussio 2005). The aim of AmI is to design
applications and environments that elicit positive emotions (or trigger emotional
states) and pleasurable user experiences. To ensure satisfactoriness and pleasur-
ability and thus gain acceptance for AmI, applications need not only to function
properly and intelligently and be usable and efficient, but they also need to be
aesthetically pleasant and emotionally alluring. In fact, Aarts and de Ruyter (2009, p. 5) found that social intelligence, or elements of it, plays a central role in the realization of the AmI vision, in addition to cognitive intelligence and computing. They reaffirm that the notion of intelligence alluded to in AmI, that is, the behavior of AmI systems associated with context-aware, personalized, adaptive, and anticipatory services, needs to capture empathic, socialized, and conscious aspects of social
intelligence. AmI systems should demonstrate empathic awareness of users’ emo-
tions or emotional states and intentions by exhibiting human-like understanding and
supportive behavior; the way such systems communicate should emphasize com-
pliance with conventions; and the reasoning of such systems should be reliable,
transparent, and conscientious to the user so as to gain acceptance and ensure trust
and confidence. In relation to emotional awareness, the affective quality of AmI
artifacts and environments as well as the smoothness, intuitiveness, and richness of
interaction evoke positive feelings in users. Positive emotions can be induced by
both subjective, socioculturally situated interpretation of aesthetics as well as
subjective experiences of interactive processes. Therefore, AmI systems should be
equipped with user interfaces that merge hypermedia, visual, aesthetic, naturalistic,
multimodal, and context-aware tools—social user interfaces. These involve artifi-
cial and software intelligent agents that interact with humans, creating the sense of
real-world social interaction, thereby supporting users’ social interactive processes.
With its learning capabilities, a socially intelligent agent is capable of learning from repeated interactions with humans (social interactive processes) and behaving on the basis of the learned patterns while continuously improving the effectiveness of its
performance to become competent in social interactions. This is like other types of
learning machines, where the key and challenge to adding wit (intelligence) to the
environment lies in the way systems learn and keep up to date with the needs of the
user by themselves in light of the potential frequent changes of people, preferences,
and social dynamics in the environment. Social processes and social phenomena are
forms of social interaction. According to Smith and Conrey (2007), a social phe-
nomenon occurs as the result of repeated interactions between multiple individuals,
and these interactions can be viewed as a multi-agent system involving multiple
subagents interacting with each other and/or with their environments where the
outcomes of individual agents’ behaviors are interdependent in the sense that each
agent’s ability to achieve its goals depends on what other agents do apart from what
it does itself.
With its social intelligence features, AmI technology is heralding a radical
change in the interaction between human users and computers, giving rise to novel
interaction design that takes the holistic nature of the user into account. HCI
research is currently active in investigating how to devise computational tools that support social interactive processes and in addressing important questions and mechanisms underlying such processes. For instance, to address questions relating to the subjective perception of interactions and aesthetics, the focus in HCI research is on considering and developing new criteria when it comes to presenting
and customizing interactive tools to support affective processes. As mentioned in
Bianchi-Berthouze and Mussio (2005, p. 384), ‘Fogli and Piccinno suggest using
the metaphor of the working environment to reduce negative affective states in
end-users of computational tools and to improve their performance. Within the
working environment, they identify a key-role user that is an expert in a specific
domain (domain expert), but not in computer science, and that is also aware of the
needs of the user when using computational systems. The approach enables domain
experts to collaborate with software, and HCI engineers to design and implement
context- and emotion-aware interactive systems. The authors have developed an
interactive visual-environment…which enables the domain-expert user to define the
appearance, functionality and organization of the computational environment’.
As regards creating computational models that support social interactive processes, although much work still needs to be done, dynamic models have been developed for cognitive and emotional aspects of human functioning and implemented in AmI applications. Although these models have yielded good results and been implemented in laboratory settings, they still lack usability in real life. This applies, by extension, to the enabling technologies underlying the functioning of AmI systems. Put differently, the extant AmI systems are still associated with some shortcomings in accurately detecting, meaningfully interpreting, and efficiently reasoning about the cognitive and emotional states of human users, and therefore they are far from real-world implementation. In terms of social intelligence, ‘[t]he vision of intelligence in AmI designs is taken…to a new level of
complication in describing the conditions that could introduce…true intelligence.
Thus, AmI 2.0 applications demonstrate only a minute step in that direction, e.g., of
facilitating users with the means for intelligent interaction, affective experience, but
also control. The gap that still needs bridging…relates to the following design
problems: (1) how to access and control devices in an AmI environment; (2) how to
bridge the physical and virtual worlds with tangible interfaces; (3) What protocols
are needed for end-user programing of personalized functionality; (4) how to
capture and influence human emotion; (5) how to mediate social interaction for
social richness, immediacy and intimacy; (6) how devices can persuade and
motivate people in a trustful manner, say, to adopt healthier lifestyles, and; (7) how
to guarantee inclusion and ethically sound designs…. [E]xperience research holds
the key to eventually bridging this gap between the fiction and concrete realizations.
For example, understanding experience from a deep personality point of view will
unlock unlimited possibilities to develop intelligent applications.’ (Gunnarsdóttir
and Arribas-Ayllon 2012, p. 29). It is not an easy task for AmI systems to emulate the socially intelligent understanding and supportive behavior of humans, that is, in particular, to select and fine-tune actions according to the affective and cognitive state of users by analyzing and estimating what is going on in their minds and behavior, based on observed information about their states and actions over time, using sensor technologies and dynamic models of their cognitive and emotional processes, coupled with exploiting the huge potential of machine learning techniques. More effort is needed for further advancement of the mechanisms, tech-
niques, and approaches underlying the functioning of AmI as socially intelligent
entities. One important issue in this regard is that it is necessary for AmI systems
(intelligent social agents) to be designed in such a way as to learn in action and dynamically from the user’s emotional and cognitive patterns in social interactive processes, so as to be able to make educated or well-informed guesses/inferences about the user’s affective state and the context of the task and thereby determine the best behavior in a real-time manner. Any supporting behavior performed by systems
designed with socially intelligent features in terms of adaptation and responsiveness
should be based on a (dynamic) combination of real-time reasoning capabilities and
pre-programed heuristics. If with regard to humans, following the tenets of cogn-
itivism, the application of time saving heuristics always result in simplifications in
cognitive representations and schemata, pre-programed heuristics may well intro-
duce bias in computational processing—intelligent agent’s favoritism, which may
carry its effects over to application actions. This is predicated on the assumption
that heuristics are fallible and do not guarantee an accurate solution. In computer
science, a heuristic algorithm is able to produce an acceptable solution to a problem
in different scenarios, but for which there is no formal evidence of its correctness.
Besides, given the variety of users and interactions and the complexity inherent in
social interactive processes, it is not enough for intelligent agents to solely rely on
pre-programed heuristics in their functioning. The basic premise is that such heu-
ristics are more likely to affect reasoning efficiency and hence action appropriate-
ness, which might be disturbing or inconvenient to the user, thus failing to adapt to
users’ emotions. Among the main challenges in AmI pertaining to socially intel-
ligent systems are the performance of such systems given that they need to be
timely in acting; effective models of user interaction with such systems, including
their update and improvements over time; and enabling proactivity in such systems
through dynamic learning and real-time reasoning. There is a need for novel
approaches into integrating different learning techniques, modeling approaches, and
reasoning mechanisms to support social interactive processes.
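As a hedged sketch of the kind of combination argued for here, the code below falls back on a pre-programed heuristic only when a real-time (learned) estimate of the user's state is not confident enough. The states, actions, thresholds, and the stand-in inference function are illustrative assumptions.

```python
import random

# Illustrative pre-programed heuristic defaults (assumed), used when real-time
# inference about the user's affective/cognitive state is not confident enough.
HEURISTIC_ACTIONS = {
    "stressed": "reduce_notifications",
    "focused": "hold_interruptions",
    "relaxed": "offer_suggestions",
}

def infer_state(sensor_features: dict) -> tuple[str, float]:
    """Stand-in for a learned, real-time model; returns (state, confidence)."""
    state = random.choice(list(HEURISTIC_ACTIONS))
    return state, random.uniform(0.3, 0.95)

def choose_behavior(sensor_features: dict, default_state: str = "relaxed",
                    confidence_threshold: float = 0.6) -> str:
    state, confidence = infer_state(sensor_features)
    if confidence < confidence_threshold:
        # Fall back on the heuristic default rather than a shaky inference.
        state = default_state
    return HEURISTIC_ACTIONS[state]

if __name__ == "__main__":
    print(choose_behavior({"heart_rate": 72, "typing_speed": 310}))
```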
By demonstrating a novel interaction between human users and computing
technology, socially interactive systems determine an evolution in the culture of
computing. On this culture, Schneiderman (2002) claims that the new computing is
about what people can do, while the old computing is about what computers can do.
Advances in knowledge of affective and cognitive processes of humans and how
they influence social interactive processes, coupled with advancement in enabling technologies, are projected to bring radical change in the way new HCI applications
are designed and behave. Research within cognitive science, human communica-
tion, and computing technology is mushrooming towards revolutionizing HCI. The AmI community has realized the importance of the emotional and social aspects of users in determining the unfolding of interactive processes, hence the increasing interest among HCI researchers in the study of social intelligence in the area of AmI. However, building such systems creates new challenging problems for designers, modelers, and implementers, whose responsibility is to
ensure that implemented systems fulfill users’ expectations and needs and allow
users to properly exploit them—that is, based on insights into the way users aspire
to socially interact with computers as well as the effects such interaction can have
on their cognitive world. The basic assumption is that not all users may accept the socially intelligent features of AmI systems (e.g., aesthetics), and others may face difficulties in benefiting from or making use of them. Regardless, AmI is marking a
new turn of HCI. There is growing interest within HCI in amalgamating
context-aware, multimodal, naturalistic, perceptual, visual, and aesthetic features in
new applications. These systems offer alluring possibilities to users in their inter-
action in terms of aesthetic experience and visual thinking—which improve user
performance. Therefore, they are increasingly proliferating, spanning a wide range
of application areas, such as education, learning, computer-mediated human–human
communication, workspaces, and so on. It is evident that emotion is gaining more
attention and becoming increasingly determining in the design and use of AmI
systems as socially intelligent entities. The role and importance of emotions as a
key element of socially intelligent behavior has been demonstrated by many studies
in cognitive science (Damasio 1994). Given the scope of this chapter, the issue
pertaining to the relationship between affective processes supported by computa-
tional tools and how they influence cognitive processes in relation to task perfor-
mance as a facet of socially intelligent systems are addressed in more detail in the
next chapter.

8.10 Evaluation of AmI Systems in Real-World Settings: Emotions and User Experience

Research and design sensitive to the user’s emotions is required in order for AmI
systems to be socially intelligent—able to select and tune actions according to the
emotional and cognitive states of the user—and thus ensure acceptability, appro-
priateness, and pleasurability. Research in AmI must primarily design effective
methods to evaluate all types of computational artifacts in real-world setting,
including affective, emotion-aware, context-aware affective, cognitive context-
aware, conversational agents, and so on. But affective AmI applications are espe-
cially important in this regard, as they directly concern the emotions of users—their
expectations and motivations as to the perceived benefits of the use of affective artifacts. And compared to other artifacts, they remain relatively easy to assess if
relevant metrics and tools can be adopted. Affective artifacts should be imple-
mented and evaluated against appropriate assessment criteria of utility in real-world
environments. Potential shortcomings and failings must be identified and analyzed
and appropriate solutions must be suggested to improve affective artifacts with the
goal to enhance user experiences and emotions. The related outcomes are intended
to determine whether the expectations of users are being met, their motives grati-
fied, and their interaction experience heightened. The basic question is: how well do users interact with the system, and to what extent does this have positive effects on them? Recall that progress is achieved when the user experience and emotions are
enhanced.
Evaluation requires the development of metrics and the measurement of user
experience and emotions according to those metrics. Metrics define what the
evaluator is trying to accomplish with the affective artifact. Lack of metrics and
failure to measure the user experience and emotions according to the established
criteria may result in an inability to effectively judge research efforts put into
creating the affective artifacts that are intended to be socially intelligent, to react
adaptively and responsively to users’ emotional states, in particular. Doing such an
evaluation in real-world environment is a necessary empirical work whose aim is to
test the performance of an AmI artifact, a procedure that is in turn based on
predefined metrics that are developed based on the intended use of this artifact.
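By way of illustration, the sketch below computes two simple metrics of the kind such an evaluation might define: agreement between the system's predicted emotional valence and users' self-reported valence, and mean self-reported satisfaction. The records are placeholder data, and the metrics themselves are assumptions offered as examples, not an established evaluation protocol.

```python
from statistics import mean

# Assumed paired records of system-predicted valence, users' self-reported
# valence, and a 1-5 satisfaction rating from a post-interaction questionnaire.
records = [
    {"predicted": "positive", "reported": "positive", "satisfaction": 4},
    {"predicted": "negative", "reported": "positive", "satisfaction": 2},
    {"predicted": "positive", "reported": "positive", "satisfaction": 5},
    {"predicted": "negative", "reported": "negative", "satisfaction": 4},
]

def valence_agreement(data) -> float:
    """Share of interactions where predicted and self-reported valence agree."""
    return mean(1.0 if r["predicted"] == r["reported"] else 0.0 for r in data)

def mean_satisfaction(data) -> float:
    """Average self-reported satisfaction on a 1-5 scale."""
    return mean(r["satisfaction"] for r in data)

if __name__ == "__main__":
    print(f"valence agreement: {valence_agreement(records):.2f}")
    print(f"mean satisfaction: {mean_satisfaction(records):.2f}")
```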
However, while AmI aims to intelligently adapt in response to the user’s
emotional states, few rigorous studies on how to evaluate the performance of
affective AmI systems in real environments have been published in mainstream
AmI journals. This implies that current research on AmI seems to pay little attention
to the development of evaluation methods for emotions and user experience,
compared to the design, building, evaluation, and implementation of the compo-
nents underlying AmI systems, such as modeling formalisms, pattern recognition
algorithms, capture technologies, recognition approaches, and other technologies
used for the design and implementation of AmI systems, though in laboratory
environments. ‘Current research on AmI is active in terms of technologies, appli-
cations, and infrastructures, but less on suitable evaluation methods, in particular,
methods for evaluating emotions. AmI is a complex thing, and should be evaluated
as such. This is difficult because real-use AmI systems are still immature’ (Tähti and
Niemelä 2005, p. 1). Sharp and Rehman (2005) note that much research centers on
designing and building new applications or proof-of-concepts, and does not strive
for coherent understanding and knowledge of AmI. The adaptive and responsive
behavior of AmI systems should rather be evaluated for its value in real-life settings, through implementation in realistic environments. Especially in computer science, it is recognized that systems that work in the lab or ‘on paper’ may not necessarily work as expected in real-world contexts. On this account, implementation provides the real
proof, which is evident in AmI where achieving intelligent behavior in terms of
responding to users’ emotions is a research objective. Moreover, implementation of
affective AmI systems in real environments is the primary means of identifying
deficiencies in many components underlying the instantiation of these systems—e.g., emotion and context models, recognition techniques, and design methods. On
the adaptive behavior of AmI systems in response to the user’s emotional state,
Noldus (2003) notes that to examine or assess the performance of an AmI system it
is necessary to gauge the physiological state of the user before and during the
interaction with the system, especially if it is also desired to know if the user’s state
changes in the process. Overall, it is important to determine why and how an AmI
system works within its environment. This is the way forward to assess the complex
AmI interaction in its threefold dimension: users, applications, and the operating
environment. Drawing on March and Smith (1995), research frameworks in AmI
facilitate the generation of specific research hypotheses by posing interactions
among identified variables, which, in addition to providing innumerable research
questions, should recognize that AmI research is concerned with artificial phe-
nomena operating for a purpose within an environment as well as the adaptive
nature of such phenomena—that is, artifacts are adapted to their changing envi-
ronments. The growing variety of users and interactions adding to ‘the changeable
context of use of AmI systems set new challenges for research of user experience
and emotions, especially because many current methods are developed for labo-
ratory conditions… In field settings…it is difficult to observe users and gauge their
emotions without affecting the user and user experience. Video cameras and other
equipment may make users feel uncomfortable and restrict their natural movements
and actions’ as it ‘may draw the attention of other people, and so alter the situation.
Elimination of environmental factors is of course not desirable in real-use evalu-
ation, as real-use environment is a valid parameter that needs to be taken into
account when designing products’ (Tähti and Niemelä 2005, p. 66). A variety of
evaluation methods have been developed and are being applied for use in lab
conditions to evaluate the performance of early AmI systems in relation to emo-
tional aspects of humans. But there is a need for novel tools and processes beyond
the laboratory conditions to evaluate how well AmI systems perform in terms of
user experience and emotions in real AmI environments. Given the complex nature
of the interaction between the user and AmI system, the evaluation of this intricate
interaction in real-world setting is more likely to provide useful information about
the performance of AmI systems. A sound approach to do so could be to use what I
identify as context-aware evaluation methods (e.g., affect display as indicators of
emotional cues and psychophysiological responses) when interacting with AmI
systems in a real usage situation, thus allowing the presence of environmental
factors, a criterion that is desirable in real-use evaluation. As referred to by Kaiser
and Wehrle (2001), in relation to the componential approach followed by appraisal theorists, a number of researchers support the idea of analyzing facial expressions as indicators of appraisal processes as an alternative or addition to verbal report
measures (Frijda and Tcherkassof 1997; Kaiser and Scherer 1998; Pope and Smith
1994; Smith and Scott 1997). This approach can be very effective if taken into
account in designing evaluation methods for emotions to gauge the performance of
AmI systems. Or, this could be combined with verbal report (and a nonverbal user
feedback) for a more effective evaluation of users’ emotions. Indeed, as suggested
by Leventhal and Scherer (1987, cited in Kaiser and Wehrle 2001), the idea of
using facial expressions as indicators is motivated by the fact that emotion-
antecedent information processing can occur at different levels. ‘…appraisal pro-
cesses occurring on the sensory-motor or schematic level that are not or only with
great difficulty accessible through verbalization might be accessible via facial
expressions… Another reason for analyzing facial expressions in experimental
emotion research is that they are naturally accompanying an emotional episode,
whereas asking subjects about their feelings interrupts and changes the process’
(Kaiser and Wehrle 2001, p. 285). However, evaluation should capture not only
users’ emotions, but also other factors of relevance to the interaction experience.
Tähti and Niemelä (2005) argue that ‘to understand the multifaceted interaction
situation with complex AmI systems’, it is necessary to have ‘more profound
information than just the user’s feeling at the moment of interaction’, especially ‘to
understand the context, in which the feeling is evoked in the mind of the user.’
Furthermore, it is important to ensure that evaluation methods are dynamic, flexible,
and easy to use by evaluators. Over-complex formalization of evaluation methods
may interfere with collecting or capturing more relevant, precise information on the
user emotions sought by the evaluator as he/she may get captured by the adherence
to the appropriate application of the method and eventually fails to spot important
information when conducting the evaluation. The complexity of the interaction with
AmI systems calls for or justifies the simplicity of the evaluation method for
emotions. Failure to use or select a proper type of evaluation method has implications for the task of evaluation. New evaluation methods should be designed in a way that allows profound information on complex user experiences to be obtained in a
simplified way. There is a need for novel assessment tools that allow collecting rich
data of users’ emotions when they are interacting with applications in real life
situations. As Tähti and Niemelä (2005, p. 66) put it, ‘considering the complexity of
interaction between a user and an AmI system, an emotion assessment method
should be able to capture both the emotion and its context to explain what aspect of
interaction affected to the feelings of the user. The method should be applicable for
field tests and easy to use. Furthermore, the method should minimize the influence
of the researcher on the evaluation and possibly enable long-term studying’.
The study of emotion is increasingly gaining attention among researchers in
affective computing and AmI in relation to emotional intelligence, social intelligence,
and emotion communication, and therefore there is a need for evaluation methods for
emotions to address the challenges associated with emotion technology. Especially,
building this technology has proven to be one of the most daunting challenges in
computing. With the aim to address some issues relating to the evaluation of
emotions in AmI, Tähti and Niemelä (2005) developed a method for evaluating
emotions called Expressing Emotions and Experience (3E), which is a self-report
method that allows both pictorial and verbal reporting, combining verbal and non-
verbal user feedback of feelings and experience in a usage situation. It is validated by
comparing it to two emotion assessment methods, SAM and Emocards which are
also self-report instruments using pictograms for nonverbal assessment of emotions.
The development of 3E is described in detail in Tähti and Arhippainen (2004).

This method is a way to collect rich data on users’ feelings and the related context—mental, physical, and social—whilst using an application, without too much burden on the user. It moreover makes it possible to gauge users’ emotions by allowing them to
depict or express their emotions and experiences by drawing as well as writing, thus
providing information of their feelings and the motivations behind them based on
their preference, and without the simultaneous intervention of the researcher. The
authors point out that their method applies well to AmI use situations that occur in
real-world environments, does not necessarily require the researcher’s presence, and,
as a projective method, may facilitate expression of negative emotions towards the
evaluated system. However, the authors state that this method does not apply well to
evaluations in which the purpose is to evaluate detailed properties of an application.
For a detailed discussion of key properties of AmI applications and their evaluation,
the reader is directed to see Chap. 3.

8.11 Issues, Limitations, and Challenges

Affective and AmI applications are seen as the most sophisticated computing
systems ever, as they involve complex dimensions of emotions, such as context
dependence of emotions, multimodal context-aware emotions, context-aware
emotional intelligence—contextual appropriateness of emotion, culture-dependent
emotions, and so on.

8.11.1 Application of Ability EIF and the Issue of Complexity

Amalgamating affective computing and AmI definitely provides a fertile and interesting joint research area. The ensuing applications can be very promising with regard to augmenting computers with human-like interactive emotional capabilities.
To create a computer capable of communicating with a user on a human level, there is
a need to equip it with Ability EIF (Mayer and Salovey 1997). However, the
application of this framework in affective computing and AmI poses many chal-
lenges to the computing community due to the complexity inherent in the theo-
retical models of emotional intelligence. This is much more to emotion than just
emotion recognition of and response to emotional states. But research shows that
even emotion recognition, which is the first step of Ability EIF seems to be not an
easy task. Emotional intelligence involves an array of complex abilities and, as
Salovey and Mayer (1990) contend, recognizing emotions is only the first basic step
to acquire full scope of emotional intelligence. And the few practical attempts still
do not go beyond this first basic step of EIF (Picard et al. 2001), and implemen-
tation attempts of the EIF usually do not go beyond theory (Andre et al. 2004). To
effectively implement the first step it is necessary to ensure the contextual
appropriateness of emotions. In affective computing research, one of the current
issues in emotion recognition, whether related to emotiveness or affect display, is the
fittingness of emotions with respect to the situation in which they are expressed.
This is critical to the processing of the affective information concerning the user, as
it enables more robust identification of the user's emotional state and thus allows
decisions to be made based on proper assumptions about what actions to undertake—
responsive services. In other words, the outcome of subsequent reasoning, inference,
and service delivery processes is very much dependent on the contextual appro-
priateness of expressed emotions. Thus, it is necessary to apply contextual analysis
to emotion processing when it comes to emotionally intelligent computers or
affective systems more generally. The valence of emotions is determined by the
context in which they are expressed (Solomon 1993). In real situations, conversational
participants use the context defining the current interaction to understand and
disambiguate emotion valence, so that they can properly react to their interaction partner. This
should be accounted for in emotion processing research because 'positive emotions
are not always appropriate and negative inappropriate' (Ptaszynski et al. 2009).
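As a minimal sketch of how such a contextual-appropriateness check might sit between emotion recognition and subsequent reasoning, consider the following; the rule table, context labels, and function names are invented for illustration and do not represent an established model.

```python
# Hypothetical rule table indicating which emotion valences are contextually
# appropriate; the contexts and entries are invented for illustration, in the
# spirit of Ptaszynski et al. (2009).
APPROPRIATENESS = {
    ("funeral", "positive"): False,
    ("funeral", "negative"): True,
    ("celebration", "positive"): True,
    ("celebration", "negative"): False,
}

def assess(context: str, valence: str) -> str:
    """Coarse judgement passed on to later reasoning and service-delivery steps."""
    appropriate = APPROPRIATENESS.get((context, valence))
    if appropriate is None:
        return "unknown"  # no rule available: defer or fall back to neutral behavior
    return "appropriate" if appropriate else "inappropriate"

# A recognized positive emotion is not automatically a cue for a cheerful response:
print(assess("funeral", "positive"))      # inappropriate
print(assess("celebration", "positive"))  # appropriate
```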
Note that this is only the first step of the Ability EIF, which is considered less com-
plicated than the subsequent steps, as described above. How far the implementation
of the EIF can go beyond the theory, and thus the first step, is a question of whether it is
worth engaging in rigorous research endeavors to create a comprehensive com-
putational model of emotional intelligence. Especially, this area of research
in affective computing and AmI requires a lot of interdisciplinary work and thus an
immense collaboration effort, bringing researchers and scholars from the
human-directed sciences and computing together and pooling the knowledge of
their research projects in order to facilitate and speed up the process. It is about
sharing work between research groups. Otherwise, focusing on separate com-
ponents or tasks relating to building emotionally intelligent systems would just
reinforce the reinvention of the wheel. As a likely consequence, it may take far
more time and effort than expected to get such systems in their full form up and
running in terms of the scale, the complexity, the subtlety, and the different
application domains. In this regard, an interdisciplinary team would involve com-
puter scientists, AI experts, computational mathematicians, logicians, cognitive
scientists, cognitive psychologists, neuroscientists, psycholinguists, sociolinguists,
professional linguists, language experts, anthropologists, social scientists, ethicists,
and philosophers, to name but a few.

8.11.2 Debatable Issues of Emotions in Affective Computing and AmI

The complexity surrounding the phenomenon of emotion as a topic and area of
study in psychology has been demonstrated by many scientific studies. The
abundance of emotion conceptualizations and perspectives constitutes a source of
ongoing technical and philosophical debates, which demonstrate contradictions in
views and disagreements with regard to theoretical models of emotion or the
emotional complex. Obviously, unsettled issues in scientific research on emotion
have implications for the design, modeling, and implementation of affective and
AmI systems. Therefore, there is a plethora of challenges that affective computing is
facing and should address and overcome. First and foremost, there is still no
consensus among cognitive scientists on what constitutes emotion as a system, and
defining concepts is, arguably, a fundamental step in doing scientific research.
There is a cacophony of definitions of emotion in addition to a wide range of
theoretical models of emotion. This lack of consistency with respect to concepts and
theoretical perspectives of emotion must have implications for emotion modeling
and thus the performance of affective technologies. It has indeed led to a wide range
of conceptual approaches, ontological frameworks, and computer applications,
which are likely to be designed for the same purpose yet with different
outcomes of performance. Moreover, the fact that affective and AmI applications use different
scales of emotion concepts, rely on oversimplified operational definitions of rather
comprehensive emotion concepts, and emphasize some concepts over others
based on the model of emotion adopted is increasingly generating a cacophony,
leading to an exasperating confusion about how affective and AmI systems should be
designed and modeled in relation to emotions. This is evinced by the alphabet soup
of applications that commonly fall under affective technologies. These applications
are conceptually diversified; theoretical foundations and lineages seem, in
practice, to be disregarded; distinctions among applications are less significant;
and pragmatic concerns are more important. In fact, pragmatism tends to prevail in
technological development. Advancement is rapid but appears to happen in an
ad hoc manner as new technologies become available, e.g., sensor technologies, com-
puter vision techniques, and modeling and reasoning techniques, rather than being based
on a theoretically clear overall approach and direction. Moreover, it is important to
critically review the operationalization of emotion in affective systems and its impact
on how emotion is conceptualized. Simplifications in operationalizing emotion
influence the way emotion is modeled, which in turn has an effect on the performance
of affective systems in their operating (real-world) environments. One mani-
festation of a simplified operationalization of emotion is that a majority of current
approaches to emotion recognition categorize emotion strictly as positive or
negative, a simple valence classification. Most of the behavioral methods simply
classify emotions into opposing pairs (Teixeira et al. 2008), although recent dis-
coveries indicate that affective states should be analyzed as emotion specific (Lerner
and Keltner 2000). Although it has become evident that emotion classification
should go beyond the simple dichotomy to include other complex dimensions of
emotion such as context and motivation, there still seems to be a propensity towards
operationalizing concepts in a simplified way. Hence, it is not simply a matter of whether or
not new theories are developed or new discoveries in the relevant disciplines become available.
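The contrast between a simple valence dichotomy and emotion-specific analysis can be made concrete with a small sketch; the category set, contexts, and motives below are placeholders rather than a validated emotion model.

```python
# A simplified valence classifier collapses every emotion into an opposing pair.
def valence_only(emotion: str) -> str:
    negative = {"anger", "fear", "sadness", "disgust"}
    return "negative" if emotion in negative else "positive"

# An emotion-specific representation keeps the category plus context and motive,
# so that anger and fear (both merely "negative" above) can lead to different
# responses, in the spirit of Lerner and Keltner (2000).
def emotion_specific(emotion: str, context: str, motive: str) -> dict:
    return {"category": emotion, "context": context, "motive": motive}

print(valence_only("anger"))  # 'negative', indistinguishable from fear
print(valence_only("fear"))   # 'negative'
print(emotion_specific("anger", "delayed service", "wants the task completed"))
print(emotion_specific("fear", "unfamiliar environment", "seeks reassurance"))
```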
8.11.3 Interpretative and Cultural Aspects of Emotions

Fundamentally, human-to-human interaction and communication differ from AmI
and affective computing as advanced approaches to HCI. Computers can neither
fully understand the meanings humans assign to communication acts, including
emotions, nor respond to or predict their emotional states. It is difficult to precisely
detect and interpret why and how people react emotionally (expressively) to events,
objects, situations, and environments. Indeed, a number of subtasks for capturing
emotional cues or realizing emotions as implicit input reliably, such as recognition
and analysis of situations as well as general anticipation of user motives and
intentions, are still unsolved—and in fact appear at the current stage of research
close to impossible. Cognitivism postulates that cognitions are based on perceptions.
Thus, individuals differ in the way they experience the events, objects, and
situations at the focus of emotional states. It is the unique individual experience
coupled with the very complex perception of the context of the situation that
determines why and how a person’s cognitive system reacts interpretatively and
appraises any event, object or situation prior to an emotional response or reaction—
realized observable behavior or external expression of emotion. ‘An important
implication of appraisal theory is that each distinct emotion has a distinctive pattern
of appraisal, but there are few if any one-to-one relationships between a situation
and an emotional response. It is interpretation of the event or situation, rather than
the event itself, which causes the emotion’ (Hekkert 2004, p. 4). In other words,
appraisal theory is based on the premise that each situation is approached anew by
an individual and the strategies for coping with the environment—e.g., experi-
encing different sensations—employed are the direct result of the situation specific
cognitive appraisal process unique to that moment in time. Users may interpret
(appraise) situations (e.g., darkness, silence) differently in affective terms, and the same goes
for artifacts, events, places, and other stimuli. The perception of the whole of reality is
inherently subjective. And meaning evolves in time as we interact with the
environment, varying from one individual to another depending on a plethora of
factors that shape people's experience, including emotional, cognitive, intentional,
motivational, biological, intellectual, social, and cultural ones.
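As a toy illustration of this appraisal-theoretic point, the sketch below derives an emotion label from appraisal dimensions rather than from the event itself, so that the same event can yield different emotions for different users; the dimensions and rules are drastically simplified assumptions, not a faithful implementation of any particular appraisal theory.

```python
def appraise(goal_conducive: bool, caused_by_other: bool, coping_potential: str) -> str:
    """Toy appraisal rules: the emotion follows from the interpretation, not the event."""
    if goal_conducive:
        return "joy"
    if caused_by_other:
        return "anger"
    return "fear" if coping_potential == "low" else "frustration"

# The same event (say, sudden darkness) appraised differently by two users:
print(appraise(goal_conducive=True, caused_by_other=False, coping_potential="high"))  # joy
print(appraise(goal_conducive=False, caused_by_other=False, coping_potential="low"))  # fear
```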
From a conceptually different angle, ‘many of the facial patterns that occur
during an interaction are not ‘true’, spontaneous expressions of an internal emo-
tional experience…Thus, individuals often use expressive behaviors more or less
consciously in order to achieve a social goal, for example, to obtain attention or
support. Here, the subjective feeling of the person and his or her facial expression
may not necessarily correspond. A lack of correspondence between feeling and
expression can also be the result of expression-management processes serving
self-presentation…and expression-control processes demanded by sociocultural
norms, such as “display rules”…’ (Kaiser and Wehrle 2001, p. 287).
Furthermore, while affect has been found across cultures to comprise both
positive and negative dimensions, the normal range of expressed and recognized
affect (whether conveyed through verbal or nonverbal behavior) varies considerably
between different cultures and even within the same culture. In other words, there is
no universal way of expressing emotions. Emotions are expressed and interpreted
differently in different cultures. Therefore, affective applications should be designed
in a flexible way if they are to be used by a wider class of users. Also, it is
important for AmI systems—affective context-aware applications—to consider
adopting a hybrid approach to handling affective context-dependent actions (the
delivery of responsive services), that is, merging invisibility and visibility, as users
may have different motives behind their emotional states. Personalization is nec-
essary for more efficient interaction and better acceptance of AmI systems.
Therefore, both affective computing and AmI should focus on producing applica-
tions that can be easily personalized to each user and that can merge explicit and
implicit affective interaction. Each user may have a different intent behind an emotional
state, and hence there is a need to properly adjust parameters accordingly as well as
to allow users to accept or decline the so-called responsive service. However, the above
issues and challenges that affective computing and AmI research should address and
overcome in order to design widely accepted technologies are only the tip of the iceberg.
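One way to read this call for flexibility and personalization is as a per-user profile that parameterizes both interpretation (e.g., culture-dependent display rules) and response delivery (explicit acceptance or declining of responsive services). The structure below is a hypothetical sketch under those assumptions, not a prescription.

```python
from dataclasses import dataclass

@dataclass
class AffectiveProfile:
    user_id: str
    culture: str                # used to select culture-dependent display rules
    expressiveness: float       # 0..1, how strongly this user typically displays affect
    auto_respond: bool = False  # False: always ask before delivering a responsive service

def deliver_service(profile: AffectiveProfile, proposed_service: str, ask_user) -> bool:
    """Merge implicit and explicit interaction: act silently only if the user opted in."""
    if profile.auto_respond:
        return True
    return ask_user(f"Apply '{proposed_service}'?")  # the user may accept or decline

profile = AffectiveProfile("u42", culture="jp", expressiveness=0.3)
accepted = deliver_service(profile, "dim lights and play calm music", ask_user=lambda q: False)
print(accepted)  # False: the user declined, so the system does not act
```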

8.11.4 The Link Between Facial Expressions and Emotions: Controversies and Intricacies

Discrete emotion theory posits that there are only a limited number of fundamental
emotions and that there exists a prototypical and universal expression pattern for
each of them. Facial expressions have been discrete emotion theorists’ main evi-
dence for holistic emotion programs (Ellsworth 1991; Ortony and Turner 1990).
However, the notion of basic emotions seems to be a subject of an endless debate,
and there are a lot of unsettled issues in this regard. Many theorists have criticized
the concept of basic or discrete emotions. The overemphasis on the face as
expressing discrete and fundamental emotions has been a corollary of Tomkins’
(1962) notion of innate affect programs affecting the facial muscles (Scherer 1994).
For Scherer (1992) and Kaiser and Scherer (1998) the complexity and variability of
different emotional states can be explained without resorting to a notion of basic
emotions, and the current emotion labels of a large number of highly differentiated
emotional states capture only clusters of regularly recurring ones. Further, findings
of universal prototypical expression patterns for the six basic emotions do not
enable researchers to interpret them as unambiguous indicators
of emotions in spontaneous interactions (Kaiser and Wehrle 2001). ‘Given the
popularity of photographs displaying prototypical emotion expressions, we need to
remind ourselves that expression does not consist of a static configuration. Rather it
is characterized by constant change’ (Scherer 1994, p. 4). Studying the link between
facial expressions and emotions involves a variety of problems: 'the mechanisms
linking facial expressions to emotions are not known’, ‘the task of analyzing the
ongoing facial behavior in dynamically changing emotional episodes is obviously
more complex than linking a static emotional expression to a verbal label’, ‘the
powerful role of regulation and expression control through explicit and implicit
social norms and expectations renders the study of expressive behavior particularly
difficult’, and ‘facial expressions can serve multiple functions—they are not nec-
essarily an indicator of emotional involvement’ (Kaiser and Wehrle 2001, p. 286).
Adding to this, while classic psychological theory argues that facial expressions are
products of evolution and universally displayed and recognized, more recent work
argues that emotions cannot be so easily categorized and that the expression of
emotions is culturally dependent (Pantic and Rothkrantz 2003).
In addition, for a computer to use facial expressions as implicit input to capture
the user's emotional cues is not as straightforward a computational process as
designers of AmI or affective computers may assume, as some facial expressions may indicate
different things simultaneously. ‘To make things even more complicated, a facial
expression can have several meanings at the same time: e.g., a frown can indicate
that the listener does not understand what the speaker is talking about (cognitive
difficulty); at the same time this frown is a listener response (communicative),
indicating disagreement and signaling that the speaker has to explain his argument
more appropriately; finally, it can indicate that the listener is becoming more and
more angry about this difficulty in understanding him (emotional), about the con-
tent, or about the way this interaction develops’ (Kaiser and Wehrle 2001, p. 287).
Therefore, it is crucial to develop recognition approaches and related mechanisms
as well as robust and consistent ontological emotion models that can discriminate
between the multiple functions of facial expressions. It is a sensitive task to accurately gauge
the user's emotional state in order to be able to adapt appropriately in response to it.
Adaptation decisions are all made based on the evaluation of the emotional state.
Having the ability to differentiate between the functions of facial behavior 'is
a prerequisite for developing more adapted models for interpreting facial expres-
sions in spontaneous interactions, i.e., models that do not interpret each occurrence
of a frown in terms of anger, sadness, or fear' (Ibid.).
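A minimal sketch of what discriminating between the multiple functions of a frown on the basis of interaction context might look like is given below; the context features and rules are assumptions for illustration only, in the spirit of the Kaiser and Wehrle quotation.

```python
def interpret_frown(listener_role: bool, repeated_explanation: bool, task_failure: bool) -> str:
    """Assign one of several functions to a detected frown instead of defaulting to anger."""
    if task_failure and repeated_explanation:
        return "emotional: growing irritation"
    if listener_role and repeated_explanation:
        return "communicative: disagreement, request to rephrase"
    if listener_role:
        return "cognitive: difficulty understanding"
    return "unclear: defer interpretation"

print(interpret_frown(listener_role=True, repeated_explanation=False, task_failure=False))
print(interpret_frown(listener_role=True, repeated_explanation=True, task_failure=True))
```

In a deployed system such a rule set would need to be learned or tuned per application, but the point stands that the same facial signal maps to different functions depending on context.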

8.11.5 The Significance of the Identification of the Intention of Emotions

At the current stage of research, it is difficult for an affective or AmI system to
identify, let alone interpret or anticipate, users’ emotion intentions and how they
evolve with the context defining the current interaction with the environment and its
artifacts, including the system itself. One of the central issues and significant
challenges in context-aware computing is to interpret and anticipate the user
intention pertaining to both his/her emotional and cognitive states, which are
usually captured as contextual information from verbal and/or nonverbal behaviors.
‘Realizing implicit input reliably…appears at the current stage of research close to
impossible. A number of subtasks for realizing implicit input, such as recognition
and interpretation of situations as well as general anticipation of user intention, are
not solved yet’ (Schmidt 2005, p. 164). Scherer (1992, 1994) argues that it is
essential to study the ways in which facial expressions and vocal features express
the motivational and cognitive antecedents of emotion (appraisal results). It is
crucial to identify the motive behind the emotional expressive behaviors (whether
through facial expressions, prosody or emotiveness) of the user. This is of high
relevance to both affective computing and AmI. Acquiring the motivation of the
user’s emotional state is critical to the processing and decision making regarding
what actions to take, as it enables to make proper assumptions, infer the most
appropriate response and act on it and thereby deliver relevant responsive services.
In other words, the outcome of subsequent reasoning, inference, and service
delivery processes are very much dependent on the identification of the motivation
of the emotional state of the user. Accordingly, failure to identify or anticipate the
intention behind the user’s emotional states as captured from the affect display may
render the tasks of recognition, interpretation, processing of the emotional infor-
mation as well as the provision of responsive service simply irrelevant. With the
advanced dedicated sensor technologies and recognition techniques available today,
affective systems may easily detect the user’s emotional state from sensor data or
software equivalents, but to infer proper responses upon which the system should
act certainly necessitates more than the recognition of the user’s emotional state.
From a cognitive psychology perspective, the perception—recognition and
interpretation—of human emotional behavior or state is a mental capacity that
involves very complex, dynamic cognitive processing. This may explain the
daunting challenge associated with the computational modeling of the identification
of the intention of the user's emotional state. What complicates the matter further is
that even in human communication, while social agents as observers sensitive to
human emotions are capable of recognizing the emotions conveyed, they are not
always capable of understanding the meaning behind them. Thus, even human
agents may not draw precise inferences from others' emotions, and contextually
reacting to others' emotions may not always be relevant, as the emotions a social
agent displays may not authentically reflect his or her actual emotional state as
inferred from, for example, his or her facial expressions. In addition, understanding
emotions as motives that direct our behaviors within the boundaries of user-system
interaction is very important in the sense of enabling an affective system to gratify
the user’s motives, and thereby invoking positive emotions in the user. In this
account, cognitive theories provide useful insights into gaining an understanding of
the relationship between motivation and emotion. In all, there is much to learn from
cognitive psychology in terms of informing the computational modeling of emotion and
thus how affective computer systems should be designed to intelligently respond to
the user’s emotional states as part of natural HCI. Particularly, research in affective
computing and AmI should focus on emotion intention as an area of study, as it
holds a great potential to enhance responsive services of affective applications and
better acceptance of affective technologies.
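To make the role of intention concrete, the sketch below conditions the selection of a responsive service on an inferred motive as well as on the recognized emotion; the motive labels and response table are hypothetical and serve only to show why the emotion label alone is insufficient.

```python
from typing import Optional

# Hypothetical mapping from (emotion, inferred motive) to a responsive service;
# the same recognized emotion leads to different actions depending on the motive.
RESPONSES = {
    ("frustration", "task blocked"): "offer step-by-step assistance",
    ("frustration", "seeking attention"): "acknowledge and ask what is needed",
    ("sadness", "seeking support"): "suggest contacting a friend",
}

def select_response(emotion: str, motive: Optional[str]) -> str:
    if motive is None:
        # Without the motive, acting on the emotion alone risks an irrelevant service.
        return "withhold responsive service; gather more context"
    return RESPONSES.get((emotion, motive), "no suitable service; remain unobtrusive")

print(select_response("frustration", None))
print(select_response("frustration", "task blocked"))
```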
8.11.6 The Impact of Multimodality on Emotion Meaning and Interpretation

One key feature of AmI systems is multimodal interaction. This is of particular
significance to the capture of emotional and cognitive elements of the user context
for the purpose of delivering adaptive and responsive services that better match the
user’s needs. As to emotion, one of the central issues in affective computing and
AmI research is to develop technologies that allow multimodal recognition of
emotional cues in order to provide a more robust estimation of the user’s emotional
state. To enhance the inference of the user’s emotional states, current research is
active on investigating the use of multiple sensors or multisensory devices that
enable computers to make rapid and accurate inferences based on the fusion of
multiple modes of data. However, the issue is that the interpretation of emotions,
which are multimodal, may differ based on the modality or channel through which
they can be conveyed. As a part of multimodal communication behavior, affective
information may be interpreted differently by an affective system depending on
which modality or channel (e.g., facial expression, prosody) the latter uses (e.g.,
visual, auditory). The way modalities and channels are combined influences the
way emotion is interpreted, which affects its conveyance and perception. Karpinski
(2009, p. 167) demonstrates that ‘each modality may provide information on its
own that can be somehow interpreted in the absence of other modalities, and that
can influence the process of communication as well as the informational state of the
addressee’; moreover, ‘each modality and each channel has its particular properties
and they vary in the range of “meanings” they may convey and in the way they are
typically employed. For example, the modality of gestural expression is frequently
sufficient for…expressing certain attitudes. The facial expression is especially
frequently used for…emotional reactions. Speech is normally the most precise tool
for expressing complex intentions. Prosody…can express feelings and atti-
tudes…' Indeed, listeners and speakers tend to rely heavily on affective information
conveyed by facial expressions. The primacy of nonverbal affective information
conveyed by facial expressions is corroborated by studies indicating that when such
visual information is in conflict with verbal information, people tend to rely on
visual information (Short et al. 1976). For pragmatic purposes, at the current stage
of research in affective computing and AmI, affect displays as emotional channels
are seen as carriers of affective information, and very few applications consider
which communication channel is best for which meaning—the specific properties of
channels. Adding to this, the most common methods for emotion recognition
(e.g., facial expressions, speech), which are based on a behavioral approach, tend to
ignore the semantic and pragmatic context of emotions and hence lack usability in
real-world settings, although they achieve good results in lab settings.
In affective systems and AmI systems, emotional cues may appear in various
different emotion channels, but not all kinds of cues are available together, since
the context of the situation usually affects which cues are accessible; it is therefore
likely that the estimation of the user's emotional state will be incomplete, which
may have implications for the subsequent reasoning and inference
processes and thereby the appropriateness of responsive services. And even though
different communication channels of emotion might be available and accessible, it
can be challenging for an affective system to meaningfully interpret a user’s
emotional state in the sense of being able to join the contributions to the meaning of
emotions provided through various modalities in the analysis. ‘It is difficult to
separate the contributions to the meaning provided through various modalities or
channels and the final message is not their simple “sum.” The information conveyed
through one modality or channel may be contrary to what is conveyed through the
other; it may modify it or extend it in many ways. Accordingly, the meaning of a
multimodal utterance [e.g., emotion as multimodal expression] should be, in
principle, always regarded and analyzed as a whole, and not decomposed into the
meaning of speech, gestures, facial expressions and other possible components. For
example, a smile and words of appraisal or admiration may produce the impression
of being ironic in a certain context’ (Ibid). This applies to emotion as a multimodal
affective expression in the context of affective or AmI systems. A prosodic channel
(i.e., pitch, tempo, intonation) may modify or extend affective information that is
conveyed though facial expression or gesture channel.
Furthermore, as to visual and auditory modalities, affective information may be
degraded, i.e., noise may affect auditory sensors or distance may affect visual
modality. Therefore, the meaning of the user’s emotional state may change based
on whether the affective information is utterly or incompletely captured as implicit
input from the user’s affect displays as signals. The most significant challenge for
affective systems is to analyze and interpret the meaning of a multimodal emotional
expression as a whole and not decomposed into the meaning of separate verbal and
nonverbal signals. In addition, the auditory modality differs from visual modality in
several aspects. Visual modality offers better emotion recognition, which has an
impact on the quality of the estimation of the user’s emotional state. Auditory
modality is omnidirectional, transient, and always reserved (Oviatt 2002). Computer
systems tend to lack olfactory sensory modality, which is considered to be
important when it comes to communicating emotions among humans. In fact, touch
is typically linked to emotions. Touch communicates a wide variety of messages
(Jones and Yarbrough 1985). The olfactory modality, too, often complements the visual and
auditory modalities when conveying emotions. More than that, it significantly shapes the
patterns of communication and the informational state of the receiver.
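A rough sketch of confidence-weighted fusion over whichever channels happen to be available is given below, reflecting the points above that channels may be missing or degraded and that their joint meaning is not a simple sum; the channel names, confidence values, and discounting rule are assumptions for illustration, not a proposed fusion algorithm.

```python
from typing import Dict, Optional, Tuple

def fuse(channels: Dict[str, Optional[Tuple[str, float]]]) -> Tuple[str, float]:
    """Fuse per-channel (emotion_label, confidence) estimates.

    Missing channels (None) are skipped; degraded channels arrive with low
    confidence (e.g., noisy audio, a distant camera). The fused confidence is
    further discounted when channels disagree, as a crude stand-in for the fact
    that modalities can modify or contradict one another rather than simply add up.
    """
    available = {name: est for name, est in channels.items() if est is not None}
    if not available:
        return ("unknown", 0.0)
    scores: Dict[str, float] = {}
    for label, confidence in available.values():
        scores[label] = scores.get(label, 0.0) + confidence
    best = max(scores, key=scores.get)
    agreement = scores[best] / sum(scores.values())  # 1.0 when all channels agree
    fused_confidence = scores[best] / len(available) * agreement
    return (best, round(fused_confidence, 2))

print(fuse({
    "facial": ("joy", 0.8),
    "prosody": ("neutral", 0.3),  # degraded channel: noisy environment
    "gesture": None,              # channel unavailable in this situation
}))  # ('joy', 0.29): the label survives, but with reduced confidence
```

In practice such a step would rely on learned models rather than hand-set weights, but the handling of missing and degraded channels would remain a central design issue.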
In all, there is so much work left to be done in affective computing and AmI as to
interpreting more subtle shades of multimodal emotional behavior. Affective
computing should take into account a holistic perspective as to the conceptuali-
zation and modeling of emotion in relation to human communication. This
includes, among others, the synergic relationship between multimodality,
multi-channeling and the meaning of emotion in communication acts and the
non-intentionality and uncontrollability of communication behavior, including
facial expressions, paralanguage, and gesture, in relation to emotions. Based on
nonverbal communication studies, a number of unintentional, uncontrolled signals
are produced during the process of emotion communication.
References

Aarts E, de Ruyter B (2009) New research perspectives on Ambient Intelligence. J Ambient Intell
Smart Environ 1(1):5–14
ACM SIGCHI (2009) Curricula for human–computer interaction. Viewed 20 Dec 2009. http://old.
sigchi.org/cdg/cdg2.html#2_1
Aggarwal JK, Cai Q (1999) Human motion analysis: a review. Comput Vis Image Underst 73
(3):428–440
Álvarez A, Cearreta I, López JM, Arruti A, Lazkano E, Sierra B, Garay N (2006) Feature subset
selection based on evolutionary algorithms for automatic emotion recognition in spoken
Spanish and standard Basque languages. In: Sojka P, Kopecek I, Pala K (eds) Text, speech and
dialog. LNAI, vol 4188, Springer, Berlin, pp 565–572
Andre E, Rehm M, Minker W, Buhler D (2004) Endowing spoken language dialogue systems with
emotional intelligence. LNCS, vol 3068. Springer, Berlin, pp 178–187
Argyle M (1990) The psychology of interpersonal behavior. Penguin, Harmondsworth
Asteriadis S, Tzouveli P, Karpouzis K, Kollias S (2009) Estimation of behavioral user state based
on eye gaze and head pose—application in an e-learning environment. Multimedia Tools Appl
41(3):469–493
Balomenos T, Raouzaiou A, Ioannou S, Drosopoulos A, Karpouzis K, Kollias S (2004) Emotion
analysis in man–machine interaction systems. In: Bengio S, Bourlard, H (eds) Machine
learning for multimodal interaction. Lecture Notes in Computer Science, vol 3361. Springer,
Berlin, pp 318–328
Bänninger-Huber E (1992) Prototypical affective microsequences in psychotherapeutic interac-
tions. Psychother Res 2:291–306
Beijer F (2002) The syntax and pragmatics of exclamations and other expressive/emotional
utterances. Working papers in linguistics 2. The Department of English, Lund University, Lund
Bianchi-Berthouze N, Mussio P (2005) Introduction to the special issue on “context and emotion
aware visual computing”. J Vis Lang Comput 16:383–385
Boyatzis R, Goleman D, Rhee K (2000) Clustering competence in emotional intelligence: insights
from the emotional competence inventory (ECI). In: Bar-On R, Parker JDA (eds) Handbook of
emotional intelligence. Jossey-Bass, San Francisco, pp 343–362
Braisby NR, Gellatly ARH (2005) Cognitive psychology. Oxford University Press, New York
Brehm JW, Self EA (1989) The intensity of motivation. Annu Rev Psychol 40:109–131
Calvo RA, D’Mello SK (2010) Affect detection: an interdisciplinary review of models, methods,
and their applications. IEEE Trans Affect Comput 1(1):18–37
Campos J, Campos RG, Barrett K (1989) Emergent themes in the study of emotional development
and emotion regulation. Dev Psychol 25(3):394–402
Caridakis G, Malatesta L, Kessous L, Amir N, Raouzaiou A, Karpouzis K (2006) Modeling
naturalistic affective states via facial and vocal expressions recognition. In: International
conference on multimodal interfaces (ICMI’06), Banff, Alberta, Canada
Cassell J, Pelachaud C, Badler N, Steedman M, Achorn B, Becket T, Douville B, Prevost S,
Stone M (1994) Animated conversation: rule-based generation of facial expressions, gesture
and spoken intonation for multiple conversational agents. In: Proceedings of SIGGAPH, ACM
Special Interest Group on Graphics, pp 413–420
Cearreta I, López JM, Garay-Vitoria N (2007) Modelling multimodal context-aware affective
interaction. Laboratory of Human–Computer Interaction for Special Needs, University of the
Basque Country
Chibelushi CC, Bourel F (2003) Facial expression recognition: a brief tutorial overview. In:
Fisher R (ed) On-line compendium of computer vision, CVonline
Chiu C, Chang Y, Lai Y (1994) The analysis and recognition of human vocal emotions. Presented
at international computer symposium, pp 83–88
Cohen I, Sebe N, Chen L, Garg A, Huang T (2003) Facial expression recognition from video
sequences: temporal and static modeling. Comput Vis Image Underst 91(1–2):160–187
(special issue on face recognition)
Cohn J, Zlochower A, Lien JJJ, Kanade T (1998) Feature-point tracking by optical flow
discriminates subtle differences in facial expression. In: Proceedings of the 3rd IEEE
international conference on automatic face and gesture recognition, pp 396–401
Cohn J, Zlochower A, Lien JJJ, Kanade T (1999) Automated face analysis by feature point
tracking has high concurrent validity with manual face coding. Psychophysiology 36:35–43
Cornelius R (1996) The science of emotions. Prentice Hall, Upper Saddle River
Cowie R, Douglas-Cowie E, Cox C (2005) Beyond emotion archetypes: databases for emotion
modelling using neural networks. Neural Netw 18(4):371–388
Damasio AR (1989) Time-locked multiregional retroactivation: a systems level proposal for the
neural substrates of recall and recognition. Cognition 33(1–2):25–62
Damasio A (1994) Descartes’ error: emotion, reason, and the human Brain. Grosset/Putnam,
New York
Darwin C (1872) The expression of emotion in man and animals. IndyPublish, Virginia
Dellaert F, Polzin T, Waibel A (1996a) Recognizing emotion in speech. In: Proceedings of ICSLP
1996, Philadelphia, PA, pp 1970–1973
Dellaert F, Polzin T, Waibel A (1996b) Recognizing emotion in speech. In: International
conference on spoken language processing (ICSLP)
Desmet P (2002) Designing emotions. Doctoral dissertation, Delft University of Technology
DeVito J (2002) Human essentials of human communication. Allyn & Bacon, Boston
Edwards GJ, Cootes TF, Taylor CJ (1998) Face recognition using active appearance models. In:
Burkhardt H, Neumann B (eds) ECCV 1998. LNCS, vol 1407. Springer, Heidelberg, pp 581–595
Ekman P (1972) Universals and cultural differences in facial expressions of emotions. In: Cole J
(ed) Nebraska symposium on motivation. University of Nebraska Press, Lincoln, NB,
pp 207–282
Ekman P (1982) Emotions in the human face. Cambridge University Press, Cambridge
Ekman P (1984) Expression and nature of emotion. Erlbaum, Hillsdale
Ekman P (1993) Facial expression and emotion. Am Psychol 48(4):384–392
Ekman P (1994) All emotions are basic. In: Ekman P, Davidson RJ (eds) The nature of emotion:
fundamental questions. Oxford University Press, Oxford
Ekman P (1999) Facial expressions. In: Dalgleish T, Power MJ (eds) The handbook of cognition
and emotion. Wiley, New York, pp 301–320
Ekman P, Friesen WV (1972) Hand movements. J Commun 22:353–374
Ekman P, Friesen WV (1975) Unmasking the face: a guide to recognizing emotions from facial
clues. Prentice-Hall, Englewood Cliffs
Ekman P, Friesen WV (1978) The facial action coding system: a technique for the measurement of
facial movement. Consulting Psychologists Press, San Francisco
Ekman P, Rosenberg EL (eds) (1997) What the face reveals. Oxford University Press, Oxford
Ekman P, Friesen WV, Ellsworth F (1972) Emotion in the human face: guidelines for research and
an integration of findings. Pergamon Press, NY
Ellsworth PC (1991) Some implications of cognitive appraisal theories of emotion. In:
Strongman KT (ed) International review of studies on emotion, vol 1. Wiley, Chichester,
pp 143–161
Freud S (1975) Beyond the pleasure principle. Norton, New York
Friesen WV, Ekman P (1982) Emotional facial action coding system. Unpublished manuscript,
University of California at San Francisco
Frijda NH (1986) The emotions. Cambridge University Press, Cambridge
Frijda NH, Tcherkassof A (1997) Facial expressions as modes of action readiness. In: Russel JA,
Fernández-Dols JM (eds) The psychology of facial expression. Cambridge University Press,
Cambridge, pp 78–102
Galotti KM (2004) Cognitive psychology in and out of the laboratory. Wadsworth, Belmont
Gardner R (2001) When listeners talk: response tokens and listener stance. John Benjamins
Publishing Company, Amsterdam
Gardner RC, Lambert WE (1972) Attitudes and motivation in second language learning. Newbury
House, Rowley
Gavrila DM (1999) The visual analysis of human movement: a survey. Comput Vis Image Underst
73:82–98
Gavrila DM, Davis LS (1996) 3-D model-based tracking of humans in action: a multi-view
approach. In: Proceedings of IEEE conference on computer vision and pattern recognition,
IEEE Computer Society Press, pp 73–80
Goleman D (1995) Emotional intelligence. Bantam Books, New York
Graham JA, Argyle MA (1975) Cross-cultural study of the communication of extra-verbal
meaning by gestures. Int J Psychol 10:57–67
Graham JA, Ricci Bitti P, Argyle M (1975) Cross-cultural study of the communication of
emotions by facial and gestural cues. J Hum Mov 1:68–77
Gray JA (1991) Neural systems emotions, and personality. In: Madden J IV (ed) Neuro-biology of
learning, emotion and effect. Raven Press, New York, pp 273–306
Gunes H, Piccardi M (2005) Automatic visual recognition of face and body action units. In:
Proceedings of the 3rd international conference on information technology and applications,
Sydney, pp 668–673
Gunnarsdóttir K, Arribas-Ayllon M (2012) Ambient intelligence: a narrative in search of users.
Lancaster University and SOCSI, Cardiff University, Cesagen
Hager JC, Ekman P, Friesen WV (2002) Facial action coding system. A Human Face, Salt Lake
City, UT
Heise D (2004) Enculturating agents with expressive role behavior. In: Agent culture: human–
agent interaction in a mutlicultural world. Lawrence Erlbaum Associates, Hillsdale, pp 127–
142
Hekkert P (2004) Design aesthetics: principles of pleasure in design. Department of Industrial
Design, Delft University of Technology, Delft
Huang TS, Pavlovic VI (1995) Hand gesture modeling, analysis, and synthesis. In: Proceedings of
international workshop on automatic face and gesture recognition, Zurich, Switzerland
Ikehara CS, Chin DN, Crosby ME (2003) A model for integrating an adaptive information filter
utilizing biosensor data to assess cognitive load. In: Brusilovsky P, Corbett AT, de Rosis F
(eds) UM 2003. LNCS, vol 2702. Springer, Heidelberg, pp 208–212
Izard CE (1994) Innate and universal facial expressions: evidence from developmental and
cross-cultural research. Psychol Bull 115:288–299
Jakobson R (1960) Closing statement: linguistics and poetics. In: Sebeok TA (ed) Style in
language. The MIT Press, Cambridge, pp 350–377
James W (1884) Psychological essay: what is an Emotion? Mind 9:188–205
Jones SE, Yarbrough AE (1985) A naturalistic study of the meanings of touch. Commun Monogr
52:19–56
Kaiser S, Scherer KR (1998) Models of ‘normal’ emotions applied to facial and vocal expressions
in clinical disorders’. In: Flack WF, Laird JD (eds) Emotions in psychopathology. Oxford
University Press, New York, pp 81–98
Kaiser S, Wehrle T (2001) Facial expressions as indicators of appraisal processes. In: Scherer KR,
Schorr A, Johnstone T (eds) Appraisal processes in emotions: theory, methods, research.
Oxford University Press, New York, pp 285–300
Kanade T, Cohn JF, Tian Y (2000) Comprehensive database for facial expression analysis. In:
Proceedings of the 4th IEEE international conference on automatic face and gesture recognition
(FG’00), Grenoble, France, pp 46–53
Kang BS, Han CH, Lee ST, Youn DH, Lee C (2000) Speaker dependent emotion recognition
using speech signals. In: Proceedings of ICSLP, pp 383–386
Kapur A, Virji-Babul N, Tzanetakis G, Driessen PF (2005) Gesture-based affective computing on
motion capture data. In: Proceedings of the 1st international conference on affective computing
and intelligent interaction, Beijing, pp 1–7
Karkkainen E (2006) Stance taking in conversation: from subjectivity to intersubjectivity. Text Talk 26:699–731
Karpinski M (2009) From speech and gestures to dialogue acts. In: Esposito A, Hussain A,
Marinaro M, Martone R (eds) Multimodal signals: cognitive and algorithmic issues. LNAI, vol
5398. Springer, Berlin, pp 164–169
Keltner D, Haidt J (1999) Social functions of emotions at four levels of analysis. Cogn Emot 13
(5):505–521
Kleinginna PR, Kleinginna AM (1981) A categorized list of emotion definitions with suggestions
for a consensual definition. Motiv Emot 5:345–379
Kluemper DH (2008) Trait emotional intelligence: the impact of core-self evaluations and social
desirability. Pers Individ Differ 44(6):1402–1412
Koonce R (1996) Emotional IQ, a new secret of success? Training Dev 50(2):19
MIT Media Lab (2014) Affective computing: highlighted projects. http://affect.media.mit.edu/
projects.php
Lang PJ (1979) A bio-informational theory of emotional imagery. Psychophysiology 16:495–512
Lang PJ (1980) Behavioral treatment and bio-behavioral assessment: computer applications. In:
Sidowski JB, Johonson JH, Williams TA (eds) Technology in mental health care delivery
systems. Albex, Norwood, pp 119–139
Lazarus RS (1982) Thoughts on the relations between emotion and cognition. Am Psychol
37:1019–1024
Lazarus RS (1991) Emotion and adaptation. Oxford University Press, New York
Lazarus RS (2001) Relational meaning and discrete emotions. In: Scherer KR, Schorr A,
Johnstone T (eds) Appraisal processes in emotion: theory, methods, research. Oxford
University Press, New York, pp 37–67
Lee CM, Narayanan S, Pieraccini R (2001) Recognition of negative emotion in the human speech
signals. Workshop on automated speech recognition and understanding. In: Proceedings of
ASRU’01 IEEE Workshop, Madonna di Campiglio, Italy
Lehrer J (2007) Hearts & Minds. Viewed 20 June 2012. http://www.boston.com/news/education/
higher/articles/2007/04/29/hearts__minds/
Lerner JS, Keltner D (2000) Beyond valence: toward a model of emotion-specific influences on
judgment and choice. Cogn Emot 14(4):473–493
Leventhal H, Scherer KR (1987) The relationship of emotion to cognition: a functional approach
to a semantic controversy. Cogn Emot 1:3–28
Lisetti CL, Schiano DJ (2000) Automatic facial expression interpretation: where human–
computer-interaction, artificial intelligence and cognitive science intersect. Pragmatics Cogn 8
(1):185–235 (special issue on facial information processing: a multidisciplinary perspective)
Lutz C (1988) Unnatural emotions: everyday sentiments on a Micronesian atoll and their challenge
to Western theory. University of Chicago Press, Chicago
Lewis M, Haviland JM (1993) Handbook of emotions. The Guilford Press, New York
March ST, Smith GF (1995) Design and natural science research on information technology. Decis
Support Syst 15(4):251–266
Markopoulos P, de Ruyter B, Privender S, van Breemen A (2005) Case study: bringing social
intelligence into home dialogue systems. ACM Interact 12(4):37–43
Mayer JD, Salovey P (1997) What is emotional intelligence? In: Salovey P, Sluyter D
(eds) Emotional development and emotional intelligence: implications for educators. Basic
Books, New York, pp 3–31
Mehrabian A, Russell JA (1974) An approach to environmental psychology. MIT Press,
Cambridge
Michel P, El Kaliouby R (2003) Real time facial expression recognition in video using support
vector machines. In: 5th international conference on multimodal interfaces, Vancouver,
pp 258–264
Murray I, Arnott J (1993) Toward the simulation of emotion in synthetic speech: a review of the
literature of human vocal emotion. J Acoust Soc Am 93:1097–1108
Murray I, Arnott J (1996) Synthesizing emotions in speech: is it time to get excited? In:
Proceedings of the international conference on spoken language processing (ICSLP’96),
Philadelphia, PA, USA, pp 1816–1819
Myers DG (2004) Theories of emotion, psychology. Worth Publishers, New York
Nakamura A (1993) Kanjo hyogen jiten (Dictionary of emotive expressions) (in Japanese),
Tokyodo
Nijholt A, Rist T, Tuijnenbreijer K (2004) Lost in ambient intelligence? In: Proceedings of CHI
2004, Vienna, Austria, pp 1725–1726
Noldus L (2003) Homelab as a scientific measurement and analysis instrument. Philips Res
2003:27–29
Ortony A, Turner TJ (1990) What’s basic about basic emotions? Psychol Rev 97:315–331
Ortony A, Clore GL, Collins A (1988) The cognitive structure of emotions. Cambridge University
Press, Cambridge
Oviatt S (2002) Multimodal interfaces. In: Jacko JA, Sears A (eds) A handbook of human–
computer interaction. Lawrence Erlbaum, New Jersey
Pantic M, Rothkrantz LJM (2000) Automatic analysis of facial expressions: the state of the art.
IEEE Trans Pattern Anal Mach Intell 22(12):1424–1445
Pantic M, Rothkrantz LJM (2003) Toward an affect sensitive multimodal human-computer
interaction. Proc IEEE 91(9):1370–1390
Passer MW, Smith RE (2006) The science of mind and behavior. McGraw Hill, Boston
Pavlovic VI, Sharma R, Huang TS (1997) Visual interpretation of hand gestures for
human–computer interaction: a review. IEEE Trans Pattern Anal Mach Intell 19(7):677–695
Petrides KV, Furnham A (2000) On the dimensional structure of emotional intelligence. Pers
Individ Differ 29:313–320
Petrides KV, Pita R, Kokkinaki F (2007) The location of trait emotional intelligence in personality
factor space. Br J Psychol 98:273–289
Phillips PJ, Flynn PJ, Scruggs T, Bowyer KW, Chang J, Hoffman K, Marques J, Jaesik M,
Worek W (2005) Overview of the face recognition grand challenge. In: Proceeding of IEEE
computer society conference on computer vision and pattern recognition
Picard RW (1997) Affective computing. MIT Press, Cambridge
Picard RW (2000) Perceptual user interfaces: affective perception. Commun ACM 43(3):50–51
Picard RW (2010) Emotion research by the people, for the people. Emot Rev 2(3):250–254
Picard RW, Vyzas E, Healey J (2001) Toward machine emotional intelligence: analysis of
affective physiological state. IEEE Trans Pattern Anal Mach Intell 23(10):1175–1191
Plutchik R, Kellerman H (1980) Emotion: theory, research and experience. Academic Press, New
York
Pope LK, Smith CA (1994) On the distinct meanings of smiles and frowns. Cogn Emot 8:65–72
Ptaszynski M, Dybala P, Shi W, Rzepka R, Araki K (2009) Towards context aware emotional
intelligence in machines: computing contextual appropriateness of affective states. Graduate
School of Information Science and Technology, Hokkaido University, Hokkaido
Punie Y (2003) A social and technological view of ambient intelligence in everyday life: what
bends the trend? The European media and technology in everyday life network, 2000–2003.
Institute for Prospective Technological Studies Directorate General Joint Research Center
European Commission
Reeve J (2005) Understanding motivation and emotion. Wiley, New York
Riva G, Vatalaro F, Davide F, Alcañiz M (2005) Ambient intelligence: the evolution of
technology, communication and cognition towards the future of human–computer interaction.
IOS Press, Amsterdam
Roseman IJ (1984) Cognitive determinants of emotion: a structural theory. In: Shaver P
(ed) Review of personality and social psychology. Sage, Beverly Hills, pp 11–36
Russell JA (1980) A circumplex model of affect. J Pers Soc Psychol 39:1161–1178
Russell JA (2003) Core affect and the psychological construction of emotion. Psychol Rev 1:145–172
Sagisaka Y, Campbell N, Higuchi N (1997) Computing prosody. Springer, New York
Salovey P, Grewal D (2005) The science of emotional intelligence. Curr Dir Psychol Sci 14:281–285
Salovey P, Mayer JD (1990) Emotional intelligence. Imagination Cogn Pers 9:185–211
Sampson F (2005) Why do I want ambient intelligence? ACM Interact 12(2):9–10
Samtani P, Valente A, Johnson WL (2008) Applying the saiba framework to the tactical language
and culture training system. In: Parkes P, Parsons M (eds) The 7th international conference on
autonomous agents and multiagent systems (AAMAS 2008). Estoril, Portugal
Scherer KR (1984) On the nature and function of emotion: a component process approach. In:
Scherer KR, Ekman P (eds) Approaches to emotion. Erlbaum, Hillsdale, pp 293–318
Scherer KR (1986) Vocal affect expression: a review and a model for future research. Psychol Bull
99:143–165
Scherer KR (1992) What does facial expression express? In: Strongman K (ed) Int Rev Stud
Emotion 2:139–165
Scherer KR (1993) Neuroscience projections to current debates in emotion psychology. Cogn
Emot 7:1–41
Scherer KR (1994) Plato’s legacy: relationships between cognition, emotion, and motivation.
University of Geneva, Switzerland
Scherer KR (1996) Adding the affective dimension: a new look in speech analysis and synthesis.
In: Proceeding of international conference on spoken language processing (ICSLP 1996),
pp 1808–1811
Scherer KR (1999) Appraisal theory. In: Dalgleish T, Power MJ (eds) Handbook of cognition and
emotion. Wiley, New York, pp 637–663
Scherer KR, Schorr A, Johnstone T (eds) (2001) Appraisal processes in emotion: theory, methods,
research. Oxford University Press, New York
Schmidt A (2005) Interactive context-aware systems interacting with ambient intelligence. In:
Riva G, Vatalaro F, Davide F, Alcañiz M (eds) Ambient intelligence: the evolution of
technology, communication and cognition towards the future of human–computer interaction.
IOS Press, Amsterdam, pp 159–178
Schmidt A, Beigl M, Gellersen HW (1999) There is more to context than location. Comput
Graphics UK 23(6):893–901
Schröder M (2011) The SEMAINE API: a component integration framework for a naturally
interacting and emotionally competent embodied conversational agent. PhD thesis, Saarland
University
Schweiger R, Bayerl P, Neumann H (2004) Neural architecture for temporal emotion
classification. In: Andre E, Dybkjær L, Minker W, Heisterkamp P (eds) ADS 2004. LNCS
(LNAI), vol 3068. Springer, Heidelberg, pp 49–52
Sebe N, Lew MS, Cohen I, Garg A, Huang TS (2002) Emotion recognition using a cauchy naive
bayes classifier. In: Proceedings of the 16th international conference on pattern recognition, vol
1. IEEE Computer Society, Washington DC, pp 17–20
Sebe N, Cohen I, Gevers T, Huang TS (2004) Multimodal approaches for emotion recognition: a
survey. In: Proceedings of the SPIE: internet imaging, pp 5667
Sharp R, Rehman K (2005) The 2005 UbiApp workshop: what makes good application–led
research? IEEE Pervasive Comput 4(3):80–82
Sheldon EM (2001) Virtual agent interactions. PhD thesis, Major Professor–Linda Malone
Shneiderman B (2002) Leonardo’s laptop: human needs and the new computing technologies.
MIT Press, Cambridge
Short JA, Williams E, Christie B (1976) The social psychology of telecommunications. Wiley,
London
Smith R, Conrey FR (2007) Agent-based modeling: a new approach for theory building in social
psychology. Person Soc Psychol Rev 11:87–104
Smith CA, Scott HS (1997) A componential approach to the meaning of facial expression. In:
Russel JA, Fernández-Dols JM (eds) The psychology of facial expression. Cambridge
University Press, Cambridge, pp 229–254
Solomon RC (1993) The passions: emotions and the meaning of life. Hackett Publishing,
Indianapolis
Stevenson C, Stevenson L (1963) Facts and values—studies in ethical analysis. Yale University
Press, New Haven
Susskinda JM, Littlewortb G, Bartlettb MS, Movellanb J, Anderson AK (2007) Human and
computer recognition of facial expressions of emotion. Neuropsychologia 45:152–162
Tähti M, Arhippainen L (2004) A Proposal of collecting emotions and experiences. Interact
Exp HCI 2:195–198
Tähti M, Niemelä M (2005) 3e—expressing emotions and experiences. Medici Data oy, VTT
Technical Research Center of Finland, Finland
Tao J, Tieniu T (2005) Affective computing and intelligent interaction. LNCS, vol 3784. Springer,
Berlin, pp 981–995
Teixeira J, Vinhas V, Oliveira E, Reis L (2008) A new approach to emotion assessment based on
biometric data. In: Proceedings of WI–IAT’08, pp 459–500
ter Maat M, Heylen D (2009) Using context to disambiguate communicative signals. In:
Esposito A, Hussain A, Marinaro M, Martone R (eds) Multimodal signals. LNAI, vol 5398.
Springer, Berlin, pp 164–169
Tian YL, Kanade T, Cohn JF (2001) Recognizing action units for facial expression analysis. IEEE
Trans Pattern Anal Mach Intell 23(2):97–115
Tokuhisa R, Inui K, Matsumoto Y (2008) Emotion classification using massive examples extracted
from the Web. In: Proceedings of COLING 2008, pp 881–888
Tomkins SS (1962) Affect, imagery, consciousness: the positive affects. Springer, New York
Turk M, Robertson R (2000) Perceptual user interfaces. Commun ACM 43(3):33–44
Vick RM, Ikehara CS (2003) Methodological issues of real time data acquisition from multiple
sources of physiological data. In: Proceedings of the 36th annual Hawaii international
conference on system sciences. IEEE Computer Society, Washington DC, pp 1–156
Wimmer M (2007) Model-based image interpretation with application to facial expression
recognition. PhD thesis, Technische Universitat Munchen, Institute for Informatics
Wimmer M, Mayer C, Radig B (2009) Recognizing facial expressions using model-based image
interpretation. In: Esposito A, Hussain A, Marinaro M, Martone R (eds) Multimodal signals:
cognitive and algorithmic issues. Springer, Berlin, pp 328–339
Wu H (2004) Sensor data fusion for context-aware computing using Dempster-Shafer theory. PhD
thesis, Carnegie Mellon University
Yin X, Xie M (2001) Hand gesture segmentation, recognition and application. In: Proceedings of
IEEE international symposium on computational intelligence in robotics and automation,
Canada, pp 438–443
Zhang P (2008) Motivational affordances: reasons for ICT design and use. Commun ACM 51(11)
Zhou J, Kallio P (2005) Ambient emotion intelligence: from business awareness to emotion
awareness. In: Proceeding of 17th international conference on systems research, informatics
and cybernetics, Baden, Germany
Zhou J, Yu C, Riekki J, Kärkkäinen E (2007) AmE framework: a model for emotion-aware
ambient intelligence. University of Oulu, Department of Electrical and Information
Engineering, Faculty of Humanities, Department of English VTT Technical Research Center
of Finland
Chapter 9
The Cognitively Supporting Behavior
of AmI Systems: Context Awareness,
Explicit Natural (Touchless) Interaction,
Affective Factors and Aesthetics,
and Presence

9.1 Introduction

The supporting cognitive behavior of AmI systems involves different aspects and
thus application domains. One of the cornerstones of AmI is the adaptive behavior
of systems in response to the user’s cognitive state. The functionality of AmI
systems to act according to the user’s cognitive context is associated with cognitive
context-aware applications, which aim to reduce the cognitive burden involved in
performing tasks or carrying out activities, by helping users to cope with these tasks
in intuitive ways (and also freeing them from tedious ones). AmI aspires to create
technology that supports people’s cognitive needs, including decision making,
problem solving, visual perception, information searching, information retrieval,
and so on. These pertain to such cognitive activities as writing, reading, learning,
design, game playing, activity organizing, Internet surfing, and so forth. Hence,
AmI systems should be able to intelligently adapt to the user’s behaviors and
actions, by recognizing the cognitive dimension of context and modifying their
functionality accordingly. In addition, AmI systems should be able to utilize and
respond to speech and gestures (facial, hand, and eye gaze movements) as com-
mands (new forms of explicit inputs) to perform tasks more effectively and effi-
ciently on behalf of users. This design feature of AmI promises simplicity and
intuitiveness, and will enable the user to save considerable cognitive effort when,
for example, navigating between documents, surfing the Internet, scrolling, reading,
writing, and working. Most importantly, any reaction to cognitive behaviors or
explicit gestured or spoken commands must be performed in a way that is articu-
lated as appropriate and desirable.
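As a rough sketch of the two behaviors just described (treating gestures or speech as explicit commands, and adapting system behavior to the user's estimated cognitive state), consider the following; the gesture vocabulary, load measure, and adaptation rules are illustrative assumptions only, not a reference design.

```python
# Hypothetical mapping of touchless inputs (gestures, gaze, speech) to explicit commands.
GESTURE_COMMANDS = {
    "swipe_left": "next_document",
    "swipe_right": "previous_document",
    "gaze_dwell_on_link": "open_link",
    "say: scroll down": "scroll_down",
}

def handle_input(event: str) -> str:
    """Treat a recognized gesture or spoken phrase as an explicit command."""
    return GESTURE_COMMANDS.get(event, "ignore")

def adapt_to_cognitive_load(load: float) -> dict:
    """Adapt presentation to an estimated cognitive load in [0, 1]."""
    if load > 0.7:
        # High load: reduce the cognitive burden rather than add to it.
        return {"notifications": "suppress", "layout": "simplified", "suggestions": "off"}
    if load > 0.4:
        return {"notifications": "batch", "layout": "normal", "suggestions": "on_request"}
    return {"notifications": "immediate", "layout": "normal", "suggestions": "proactive"}

print(handle_input("swipe_left"))     # next_document
print(adapt_to_cognitive_load(0.85))  # simplified layout, suppressed notifications
```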
One important aspect of AmI is the system feature of social intelligence.
A system designed with socially intelligent features is able to select and fine-tune its
behavior according to the cognitive state and affective state of the user and thus
invoke positive feelings in users. The aim of AmI is to design applications and
environments that elicit positive emotions or induce positive emotional states and
pleasurable experiences in users. To ensure satisfactoriness and pleasurability and
thus gain acceptance for AmI, applications need not only to function properly and
intelligently and be usable and efficient, but they also need to be aesthetically
pleasant and emotionally alluring. The affective quality of AmI artifacts and
environments (pertaining to visual, hypermedia, and aesthetic features) as well as
the smoothness, intuitiveness, and richness of interaction elicit positive emotions
and invoke positive feelings in users, which help improve their performance. In
more detail, positive emotions induced by subjective, socioculturally situated
interpretation of aesthetics and subjective experiences of interaction in terms of
both processes and information content affect users' cognitive activities. The social
and emotional state of humans has a key role in determining and shaping the
unfolding of the interaction process. The role of emotions is a key element of
socially intelligent behavior of AmI systems. Hence, such systems should be
equipped with user interfaces that merge hypermedia, visual, aesthetic, naturalistic,
multimodal, and context-aware tools to better support users’ tasks and activities.
These systems provide new and alluring possibilities to users in their interactive
computing and visual, aesthetic, and ambient thinking. In particular, it has become a
well-known phenomenon that ‘attractive things work better’ (Norman 2002) and
‘what is beautiful is usable’ (Tractinsky et al. 2000). Human-product interaction has
been extensively researched in relation to both technological and non-technological
domains. Indeed, in addition to being a common theme in the literature on ICT
design, the interaction between human factors and design has become a new
challenge in the real-world working-out of human usage issues pertaining to the
design of AmI artifacts and environments. The affective (as well as social and
cultural) factors should support AmI designs that address the tacit nature of human
users’ perceptions as well as their emotional responses and aspirations and achieve
aesthetic experience—pleasurability to the senses, through interactions between
them and AmI artifacts and environments. This is giving rise to affective human
factors design as a budding research perspective within the field of AmI. Affective
human factors design in AmI, and thus in HCI, relates to the developing area of
affective computing within AI. It addresses the delivery of affective user interfaces
capable of eliciting emotional and pleasant experiences from users, in addition to
responding to the user's emotional states expressed through affect displays (gestured
and spoken indicators). Likewise, aesthetics and affect and the way they influence cog-
nition are increasingly gaining importance in the context of AmI design and use.
Advances in this understanding will have strong implications for the science of ICT
design and thus design of emerging AmI applications and environments. Regardless
of the ways in which emotions can be elicited, recent research in cognitive psy-
chology shows that they influence the very mechanisms of rational thinking,
playing an essential role in perception, problem solving, decision making and
judgment, memorization, learning, and creativity. The new scientific appreciation of
emotion is profoundly altering computing and determining an evolution in its
culture: what people can do, rather than what computers can do.
Research in presence technology is expected to play a key role in the development
of the full AmI paradigm. Given the scope of AmI applications and environments in
terms of the intelligent supporting behavior (i.e., adaptive,
responsive, immersive, and social), AmI involves various conceptualizations of
presence (the experience of projecting the mind through media to different entities
in various x-realities, with the perceptual illusion of non-mediation), including a
sense of realism, transportation, immersion, interactivity and control, and the
medium as a social actor.
The aim of this chapter is to explore and discuss the different features of the
cognitively supporting behavior of AmI systems, namely cognitive context
awareness, explicit natural (touchless) interaction, affective quality of aesthetic
computational artifacts and environments and their rich and intuitive interaction
with users, and presence in its different conceptualizations. Most of these features
are associated with subjective perceptions that are tacit and difficult to external-
ize and translate into a form intelligible to a computer system. Addressing this issue
entails important mechanisms underlying social and affective interactive processes
and computational tools that support such processes. In relation to presence, for
example, the potential of computational and visual artifacts lies in using emotional
responses and expressions to support both agent–human interaction and
computer-mediated human–human interaction.

9.2 The Usage of the Term ‘Cognition’ in Cognitive Psychology and Cognitive Science

Unlike affect and emotion, whose terminology remains, despite the major strides made
by the cognitive sciences and neurosciences in the past two decades, an issue of
technical debate, cognition seems to be, overall, a well-understood notion. Cognition has
been studied in various disciplines, such as cognitive psychology, social psychol-
ogy, cognitive science, computer science, socio-cognitive engineering, cognitive
anthropology, neuroscience, linguistics, cognitive linguistics, phenomenology,
analytic philosophy, and so on. Hence, it has been approached and analyzed from
different perspectives. In other words, it means different things to different people.
‘To a computer scientist, the mind might be something that can be simulated
through software or hardware… On the other hand, to a cognitive psychologist, the
mind is the key to understanding human or animal behavior. To a cognitive neu-
roscientist, the mind is about the brain and its neurological underpinnings… The list
goes on’ (Boring 2003). In social cognition, which is a branch of social psychology,
the term ‘cognition’ is used to explain attitudes and group dynamics. In this
chapter, the emphasis is on the definition of cognition as related to cognitive
psychology and thus cognitive science because of its relevance to computing—and
thus AmI. In this sense, as a scientific term cognition refers to an information
processing view of the mental processes of humans as intelligent entities. In cog-
nitive science, intelligent entities also include highly autonomous computers. In
cognitive science, the term ‘cognitive’ is used to describe any kind of mental
process that can be examined in precise terms (Lakoff and Johnson 1999).
A process is any activity that involves more than one operation. Therefore, cog-
nition is an information processing system to perceive and make sense of the world
—and hence an experience-based system. Perception interprets and assigns
meaning, and sense-making refers to the process by which people give meaning to
experience. Cognitivism emphasizes that cognitions are based on perceptions and
that there is no cognition without mental representations of real objects, people,
events, and processes occurring in the world.

9.3 Cognitive/Mental Processes

Cognition entails such diverse mental (cognitive) processes as sensation, percep-
tion, attention, memory, motivation, emotion, recognition, problem-solving, and
language processing. These mental information-manipulation processes are
involved in such high-level cognitive activities as creativity, imagination,
reasoning/thinking, language understanding and production, planning, and so on.
They may operate in the absence of relevant stimulation or between stimulus and
response. Sensation is consciousness that results from stimulus of a sense organ
(hearing, sight, smell, taste, touch) or from the recognition of an emotion (Galotti
2004), e.g., the sensation of delight or pleasure triggered by the affective quality of
an aesthetically pleasant artifact. Human senses are realized by different sensory
receptors: for visual, auditory, tactile, olfactory, and gustatory signals, which are
found in the eyes, ears, skin, nose, and tongue, respectively. Perception involves
perceptual analysis, recognition and meaningful classification of a stimulus (e.g.,
object) after being received by sense organs as sensory information coming from
sensory memory. In other words, it entails interpretation of a stimulus and its
recognition by comparing it to previous encounters and then categorizing it.
Recognition is a process of generating and comparing descriptions of objects
currently in view, which are retained in the working memory with descriptions of
objects seen previously that are stored in the long-term memory (Braisby and Gellatly
2005). It is about seeing something as familiar or experienced before. Attention
refers commonly to the ability of focusing mental effort—concentrating mental
resources at once on specific stimuli—limited number of things—whilst excluding
other stimuli from consideration. Attention is a strong emotional response caused
by a potential conflict and also by the pre-conscious recognition of an emerging
repressed conflict (Ibid). Memory refers to the capacity to encode and store
something, to recall something learned (stored in long-term memory) or to recog-
nize something previously experienced (Galotti 2004). Memory is used in several
cognitive processes, such as perception, recognition, problem solving, and emotion.
Problem solving entails proceeding from a beginning (the initial state) to the end (a
goal state) via a limited number of steps. It is an activity that draws together the
various different components of cognition presented in various ways inside the
brain (Passer and Smith 2006). This process uses memory to recover any prior
knowledge we might have that could be relevant to solving a new problem (Braisby
and Gellatly 2005). The desired outcome we expect is the goal that directs the
course of our thinking to overcome the existing situation by guiding retrieval of
goal-relevant information from long-term memory (Ibid). Problem solving is con-
sidered as a fundamental human cognitive process that serves to deal with a situ-
ation or solve issues encountered in daily life. Decision making is the process of
reaching a decision on an issue by selecting and rejecting available options, that is,
choosing between available alternatives. Therefore, it involves weighing the positives
and negatives of each alternative, considering all the alternatives, and determining
which alternative is the best for a given situation. In other words, it means mapping
the likely consequences of decisions, working out the importance of individual factors,
and choosing the best alternative. Most of the decisions we make or actions we take
relate to some kind of problem we are trying to solve. Research shows
that emotion has a great impact on problem solving and decision making as cog-
nitive processes. As far as emotion and motivation are concerned, they are dis-
cussed in more detail in the previous chapter.
Cognitive psychologists have proposed a range of theoretical models of cogni-
tion. Similarly, cognitive scientists have developed various models in relation to
computing, such as computational model, decisional model, analytical model,
learning model, and formal reasoning model. These models are inspired by human
mental processes, namely computation, decision making, problem solving, learning,
and reasoning. They are of high applicability in AmI systems as autonomous
entities inspired by human cognitive intelligence, including cognitive context-aware
applications, which aim to facilitate and enhance mental abilities associated with
cognitive intelligence, by using computational capabilities.

9.4 Cognitive Context-Aware Computing

9.4.1 Internal and External Context

Context-aware computing is the catchphrase nowadays. Computer systems are
becoming ubiquitous and might not be used by the same user, and interaction is
becoming user-centered. Hence, it becomes relevant and important to pursue
context awareness in the emerging computing paradigm. Application areas of
context awareness are numerous. Depending on the application domain,
context-aware applications use a set of contextual elements to infer the dimension
of context. In terms of context dichotomies, the emphasis here is on internal and
external context. Context-aware applications may use the external context to infer
the emotional, social, and situational dimension of context, the internal context to
infer the cognitive dimension of context, and a combination of both to infer the task
dimension of context. Examples of external context include location, time, light,
co-location, group dynamics, activity, and emotional state. Internal context includes
psychophysiological state, cognitive state, and personal event. The external context
is a physical environment, while the internal context is a psychological context that
does not appear externally (Giunchiglia and Bouquet 1988; Kintsch 1988). The
focus of this chapter is on the cognitive (task) dimension of human context, which
may appear externally or internally. In general, human factors related context
encompasses, according to Schmidt et al. (1999), three categories: information on the
user (knowledge of habits, emotional state, bio-physiological conditions), the user's
tasks (activity, engaged tasks, general goals), and the user's social environment
(social interaction, co-location of others, group dynamics). Regardless, a
context-aware application should be able to act in an interrelated, dynamic fashion
based on the interpretation of a set of atomic contextual elements that are of central
concern to the user, which can be transformed into a higher level abstraction of
context, prior to delivering relevant adaptive services.
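To make the distinction between internal and external context concrete, the following sketch shows one way an application might represent atomic contextual elements before they are fused into a higher-level abstraction of context. It is a minimal illustration, not drawn from any cited system; all names, fields, and values are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum

class Source(Enum):
    EXTERNAL = "external"   # observable: location, time, light, co-location, activity, ...
    INTERNAL = "internal"   # psychological: intention, cognitive state, personal event, ...

@dataclass
class AtomicContext:
    """A single low-level contextual element delivered by a sensor or a software probe."""
    name: str          # e.g. "location", "task_goal" (hypothetical labels)
    value: object      # raw reading or inferred label
    source: Source     # internal vs. external, per the dichotomy discussed above
    confidence: float  # reliability of the reading, between 0 and 1

# A hypothetical snapshot combining both kinds of atomic elements:
snapshot = [
    AtomicContext("location", "office", Source.EXTERNAL, 0.95),
    AtomicContext("time_of_day", "morning", Source.EXTERNAL, 1.0),
    AtomicContext("task_goal", "write_report", Source.INTERNAL, 0.6),
    AtomicContext("emotional_state", "calm", Source.EXTERNAL, 0.7),
]
```

A context-aware application would pass such a snapshot to an inference step, which abstracts it into, for example, the task or cognitive dimension of context before a service is selected.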

9.4.2 Cognitive Context Awareness

AmI technology holds a great potential for permeating everyday life and changing
the nature of almost every human activity. The basic idea of AmI as an emerging
computing paradigm is about what people can do with what computers can do, in
contrast to the old computing paradigm, which centers on what computers can do—i.e.,
technology should be intelligent enough to augment human cognitive intelligence in action
and not only be intelligent in executing complex tasks. In AmI, people should be empowered
through a smart computing environment that is aware of their cognitive context and
is adaptive and proactive in response to their cognitive needs, among others. In
other words, one feature of AmI is that the services delivered in AmI environments
should adaptively and proactively change according to the user's cognitive context
and be delivered before the user explicitly requests them. This feature emphasizes context awareness and
intelligence functionality of AmI systems, a technological feature which involves,
among others, augmenting interactive systems with cognitive capabilities that allow
them to better understand, support, and enhance those of users. Cognitive context is
one of the key elements of the context information amalgam necessary to guide
computational understanding of knowledge-based interaction particularly in rela-
tion to enhancing task and activity performance.
Underpinning AmI is the adaptive behavior of systems in response to the user’s
cognitive state. The computational functionality of AmI systems to act in accor-
dance to the cognitive dimension of context is associated with what is termed
cognitive context-aware applications. Such applications should be able to recognize
the user’s cognitive context in the state of performing a given task or carrying out a
given activity, by means of transforming atomic internal or external elements of
context into a high-level abstraction of context (i.e., sensor-based information is
converted into reusable semantic interpretation of low-level context information—a
process which is known as context inference), and adapt their behavior to best
match the inferred context, that is, meet the user’s cognitive need, by providing the
most relevant services in support of the user’s tasks or activities. The cognitive
dimension of context must be accurately detected, meaningfully interpreted, and effi-
ciently reasoned about in order to determine an appropriate response and act upon
it. AmI supports a wide variety of cognitive needs, including decision making,
information searching, information retrieval, problem solving, visual perception,
reasoning, and so on, which are associated with such cognitive activities as writing,
reading, learning, planning, design, game playing, activity organizing, Internet
surfing, and so forth. In light of this, the aim of cognitive context-aware applica-
tions is to reduce the cognitive burden involved in performing tasks or carrying out
everyday life activities. For example, in a Web-based information system, a cognitive
context awareness feature can help the user to work with the system conveniently and
enable an existing system to deliver AmI services (Kim et al. 2007). In this context,
the cognitive context, which is relevant to psychological state, can be inferred using
such internal context as user’s intention, work context, task goal, business process,
and personal event (Gwizdka 2000; Lieberman and Selker 2000). It is important to
note that the cognitive context may mean different psychological states at different
moments while performing a given task or carrying out a given activity and that one
task might involve one or more cognitive states, such as information retrieval and
problem solving. The range of scenarios for which cognitive context may be uti-
lized is potentially huge. AmI systems can anticipate and intelligently adapt to the
user’s actions, by recognizing the cognitive dimension of context and modifying
their behavior accordingly, e.g., adapt interfaces to ease visual perception, tailor the
set of application-relevant data, enhance decision-making accuracy, recommend
and execute services, enhance memorization, increase the precision of information
retrieval, reduce frustration and thus prevent users from making mistakes, stimulate cre-
ative thinking, facilitate problem solving, enhance learning, and so on.
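As a purely illustrative sketch of the inference and adaptation steps just described, the snippet below abstracts a few atomic context elements into a cognitive dimension of context and maps it to a supporting service. The rules, state labels, and service names are hypothetical and deliberately simplistic; real systems would use learned or ontological models rather than hand-written conditions.

```python
def infer_cognitive_context(snapshot: dict) -> str:
    """Abstract atomic context elements into a high-level cognitive context label."""
    # snapshot is a flat dict of atomic elements, e.g. produced by sensors or software probes
    if snapshot.get("application") == "web_browser" and snapshot.get("query_issued"):
        return "information_searching"
    if snapshot.get("gaze_pattern") == "long_fixations" and snapshot.get("frown_detected"):
        return "problem_solving"          # hard thinking / cognitive difficulty
    if snapshot.get("documents_open", 0) > 3 and snapshot.get("task_goal") == "write_report":
        return "information_retrieval"
    return "unknown"

def select_adaptive_service(cognitive_context: str) -> str:
    """Map the inferred cognitive context to a supporting service."""
    services = {
        "information_searching": "recommend_related_documents",
        "information_retrieval": "surface_recently_co_opened_documents",
        "problem_solving": "suppress_notifications_and_offer_hints",
    }
    return services.get(cognitive_context, "no_action")

context = infer_cognitive_context({"application": "web_browser", "query_issued": True})
print(context, "->", select_adaptive_service(context))
# information_searching -> recommend_related_documents
```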
In context-aware computing research, emotional states as dimensions of the
emotional context have increasingly gained attention, compared to cognitive states
as dimensions of the cognitive context. This is due to the recent joint research
endeavors integrating affective computing with context awareness computing. On
the other hand, research has been less active on the topic of cognitive context. The
lack of interest in this area is likely to be explained by, among other things, the
daunting challenges and subtle intricacies associated with capturing, modeling, and
inferring the cognitive states of users, especially as novel recognition and modeling
approaches based on nonverbal behavior are still evolving and hence have not
reached a mature stage yet. In other words, related research in HCI is still in its
infancy.
Advanced knowledge of human cognition, new discoveries in cognitive science,
and further advancement of AI are projected to have strong implications for AmI
system engineering, design, and modeling. One implication is that cognitive
context-aware systems will be able to recognize complex cues of the user's cog-
nitive behavior using miniature multisensory devices as well as dynamic learning of
stochastic cognitive contexts or activities models—cognitive states and behaviors
pertaining to information handling when reasoning, visually perceiving objects,
solving problems, making decisions, and so on. This will enable cognitive
context-aware systems to make more accurate inferences about cognitive contexts
and what kind of services should be delivered in support of cognitive needs.
Thereby, computational resources and competencies are harnessed and channeled
towards facilitating and enhancing cognitive abilities of users in a more intuitive
way. Cognitive context-aware systems have a great potential to heighten user
interaction experience, by reducing the cognitive burden associated with perform-
ing difficult tasks and activities—the ever-increasing complexity of, and the mas-
sive use of ICT in, everyday life.

9.4.3 Methods for Capturing Cognitive Context

For cognitive context-aware systems to function properly and be efficient and
usable, they should be equipped with ambient, multimodal user interfaces. Such
user interfaces entail a set of hidden intelligent interfaces—augmented with various
types of sensors or multisensory devices—that recognize the user’s cognitive
context by reading multiple sources, infer relevant real-time response, and act upon
it, by providing services to immediate cognitive needs in an unobtrusive way. Most
attempts to use context awareness within AmI environments have for long focused
on the physical elements of the environment, users, or devices. In more detail,
context-aware applications have tended to be aware of external context by using
stereo cameras, RFID, smart devices, and so on, and in this way, infer a user's
physical context to guide service delivery. These approaches are, however, asso-
ciated with shortcomings related to recognizing the user's intention in a static condition
and thus fail to support users’ cognitive tasks or activities, e.g., it is difficult to
recognize the user’s intention in the state of browsing web pages on the Internet
through physical devices. In other words, it is more computationally challenging to
recognize a user's cognitive context than the physical context. Therefore, the
importance of capturing the cognitive elements of a user’s context has been widely
acknowledged (Schilit et al. 1994).
Recent years have witnessed the emergence of methods for awareness and
inference of a user’s cognitive context based on software algorithms (e.g., Prekop
and Burnett 2003; Kim et al. 2007; Kwon et al. 2005). Systems using software
algorithms usually infer the user’s cognitive context based on the user’s intention
and use it along with minimal user input data to deliver or recommend adaptive
services. In this regard, a few methods for capturing, representing, and inferring
cognitive context have been developed and applied.
Currently, HCI researchers are investigating nonverbal cues, especially eye gaze
and facial expressions, as a finer indicator of, and a reliable source for capturing, the
user’s cognitive states and activities. Facial expressions indicate cognitive processes
(Scherer 1992, 1994b). Eye movements accurately reflect visual attention and
cognitive processes (Tobii Technology 2006). Nonverbal behaviors can provide
useful information as implicit input to cognitive context-aware systems. Thus, the
cognitive dimension of context can be inferred or deduced using facial cues or eye
movement as an external context—an atomic level of the context. To iterate, the
contribution of AI has been significant with regard to pattern recognition techniques,
ontological modeling techniques, naturalistic user interfaces, facial expression rec-
ognition, and computer vision.

9.4.4 Application Areas of Cognitive Context Awareness

Based on the literature, a few methods for capturing, representing, and inferring
cognitive context have been developed, and the few practical attempts to imple-
ment cognitive context awareness remain far from real-world deployment. In other words,
concrete applications using software algorithmic approaches to cognitive context
recognition have not been instantiated in real-world environments. It is also noticed
that frameworks for developing cognitive context-aware applications seem to be far
fewer than those for developing affective context-aware applications.

9.4.4.1 Context Categorization and Inference

In a study carried out by Kim et al. (2007), the authors propose the context
inference and service recommendation algorithms for the Web-based information
system (IS) domain. The context inference algorithm aims to recognize the user’s
intention as a cognitive context within the Web-based IS, while the service rec-
ommendation algorithm delivers user-adaptive or personalized services based on
the similarity measurement between the user preferences and the deliver-enabled
services. In addition, the authors demonstrate cognitive context awareness on the
Web-based IS through implementing the prototype deploying the two algorithms.
The aim of the proposed system deploying the context inference and service rec-
ommendation algorithm is to help the IS user to work with an information system
conveniently and enable an existing IS to deliver AmI services. For further detail on
the context inference and service recommendation framework see Chap. 5.
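Kim et al. (2007) do not publish their algorithm in the form shown below; the sketch is only a generic illustration of the kind of similarity measurement between user preferences and deliverable services that the passage describes, here using cosine similarity over hypothetical feature vectors and service names.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Hypothetical preference space: (needs_summaries, prefers_visual, time_critical)
user_preferences = [0.9, 0.2, 0.7]

deliverable_services = {
    "document_summary": [1.0, 0.1, 0.8],
    "image_gallery": [0.1, 1.0, 0.2],
    "quick_answer": [0.6, 0.0, 1.0],
}

# Recommend the services whose feature vectors are most similar to the preference profile
ranked = sorted(deliverable_services.items(),
                key=lambda kv: cosine_similarity(user_preferences, kv[1]),
                reverse=True)
for name, vec in ranked:
    print(f"{name}: {cosine_similarity(user_preferences, vec):.2f}")
```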
A few other studies have been done on the inference and adaptation to cognitive
context. Prekop and Burnett (2003) suggested a conceptual model of activity-
centric context, which focuses on creating context-aware applications that support
cognitive activities, but is far from a real-world implementation. Also, Kwon et al.
(2005) proposed a Need Aware Multi-Agent (NAMA) system, which attempts to
recognize both the cognitive context and the physical context, a research endeavor
that credits a contribution in the view of considering both contexts. The Kim et al.
(2007) system considers only cognitive context but adopts much the same
algorithmic approach to the inference of the cognitive context and the service
delivery or recommendation as NAMA; however, the inference algorithm used in the
latter to recognize the user's context is far from real-world application, and the
method for collecting the internal context is not accurate.
9.4.4.2 Metadata Capture for Information Retrieval

Information searching and information retrieval are among the most frequent sets of
tasks or activities performed or carried out by users. They constitute either separate
actions in themselves or part of other complex tasks or activities. In either case, they
can be inferred or classified as cognitive dimensions of context by the so-called
cognitive context-aware applications. In reference to the first case, these applica-
tions recognize the cognitive dimension of context from the task the user is doing—
in this case information searching or retrieval, by detecting one or more internal
contexts at an atomic level and transforming them into a high-level abstraction of
context, reason about it, and then deliver the relevant service—recommending a list
of potentially needed documents to be retrieved along with their sources in rele-
vance to the task and other contextual elements. One approach used by cognitive
context-aware applications to perform the application action is what is called
metadata, which documents are tagged with. Metadata involve the document name,
the time and date of creation, and additional information related to the context in
which the system is being used. There is no limit to metadata (Ulrich 2008). To fire
or execute the context-dependent action, cognitive context-aware applications use a
context query language (e.g., Reichle et al. 2008) to access context information
from context providers to respond to the user’s cognitive need. Table 9.1 illustrates
some examples of how context can be used to retrieve documents (Schmidt 2005).
Both basic and context-driven metadata are an important part of the data stored—
necessary to retrieve documents. The association between documents based on the
context used is critical as a criterion for information retrieval. For example, all
documents that have been open together with a given document—same time, same
location, and same project—can be retrieved as an adaptive service to be delivered
to the user. In all, metadata is of importance to cognitive context-aware applications,
as it reduces the cognitive burden that the user would otherwise incur to complete
the task at hand—information searching and retrieval. Applications that automati-
cally capture context are central to the idea of AmI (Abowd 1999) and iHCI
(Schmidt 2005).

Table 9.1 Using context metadata to retrieve documents

Context used                    Sample user query
People around                   Who was around when this document was created?
Social context                  Show all documents that were created while X was around
Location information            Where was this document created?
                                Show all documents that I have accessed while I was in London
Location and time information   Show all documents that have been open together with this document (same time and same location)
Environmental information       Show all documents that were created when it was cold

Source: Schmidt (2005)
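The kinds of queries listed in Table 9.1 can be grounded in a very small context-metadata store. The sketch below is a hypothetical illustration (the event record, field names, and time window are assumptions, not the implementation of any cited system) of how "all documents that have been open together with this document, same time and same location" might be answered.

```python
from dataclasses import dataclass

@dataclass
class DocumentEvent:
    """Context metadata recorded whenever a document is opened."""
    doc: str
    opened_at: float      # seconds since some epoch
    location: str         # e.g. "London", "office"
    people_around: tuple  # co-located people at the time

events = [
    DocumentEvent("report.doc", 1000.0, "London", ("Alice",)),
    DocumentEvent("budget.xls", 1030.0, "London", ("Alice",)),
    DocumentEvent("notes.txt", 9000.0, "Paris", ()),
]

def opened_together(target: str, window: float = 300.0):
    """Documents opened within `window` seconds of the target document, at the same location."""
    anchors = [e for e in events if e.doc == target]
    return {
        e.doc
        for a in anchors
        for e in events
        if e.doc != target
        and e.location == a.location
        and abs(e.opened_at - a.opened_at) <= window
    }

print(opened_together("report.doc"))   # {'budget.xls'}
```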
9.4.4.3 Adaptive User Interfaces

Having information on the current context (e.g., cognitive and physical dimension
of context), it becomes possible to build user interfaces that adapt to the user’s
cognitive and environmental context. For example, once the cognitive dimension of
context is recognized in the state of reading documents, physical changes in the
environment can be used as an external context (e.g., location, temperature, time,
lighting, etc.) by the system in order to adapt its functionality to the user’s cognitive
context, such as visual perception and visual attention. Having awareness of dif-
ferent, yet related, contextual dimensions, cognitive context-aware applications can
adjust (their interfaces) for use in different situations, which should occur without
conscious mediation. Where context is available in systems at runtime, it
becomes feasible to adjust the user interfaces at runtime; however, the requirements
for the user interfaces are dependent on, in addition to the user and the context, the
application and the user interface hardware available (Schmidt 2005). Visual fea-
tures of the display like colors, brightness, contrast, arrangement of icons, and so on
can be adjusted depending on where the user moves and is located (e.g., dim room,
living room, sunny space, in open air). Also, a display in a multi-display envi-
ronment may adapt in terms of the font and the size according to the type of the task
the user is engaged with (e.g., writing, reading, design, visual perception, chatting)
in a way that helps the user perform better and focus on the task at hand. However,
there is a variety of challenges associated with the topic of adaptive user interfaces,
including user interface adaptation for distributed settings and user
interface adaptation in a single display (Schmidt 2005). As to the former, ‘in
environments where there is a choice of input and output devices it becomes central
to find the right input and output devices for a specific application in a given
situation. In an experiment where web content, such as text, images, audio-clips,
and videos are distributed in a display rich environment…context is a key concept
for determining the appropriate configuration…In particular to implement a system
where the user is not surprised where the content will turn up is rather difficult’
(Ibid, p. 169). As to the latter, ‘adapting the details in a single user interface a
runtime is a further big challenge. Here in particular adaptation of visual and
acoustic properties according to a situation is a central issue…We carried out
experiments where fonts and the font size in a visual interface became dependent on
the situation. Mainly dependent on the user’s activity the size of the font was
changed. In a stationary setting the font was small whereas when the user was
walking the font was made larger to enhance readability’ (Ibid).
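The font-size experiment quoted above can be caricatured in a few lines of code. The sketch below is only an illustration with made-up thresholds and parameter names; it is not the system Schmidt describes.

```python
def adapt_display(activity: str, ambient_lux: float) -> dict:
    """Pick display parameters from the user's activity and the lighting context."""
    settings = {"font_pt": 11, "brightness": 0.6}
    if activity == "walking":          # moving user: larger font to enhance readability
        settings["font_pt"] = 16
    elif activity == "reading":        # stationary, focused task: modest font
        settings["font_pt"] = 12
    if ambient_lux > 10000:            # bright sunlight: raise brightness
        settings["brightness"] = 1.0
    elif ambient_lux < 50:             # dim room: lower brightness
        settings["brightness"] = 0.3
    return settings

print(adapt_display("walking", ambient_lux=12000))   # {'font_pt': 16, 'brightness': 1.0}
```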

9.4.4.4 Resources Management

The different computational and communication resources that may surround the
user in a particular location are normally discovered by the system as contextual
information that can be used to support the user’s task. In fact, using resources
dependent on the location and the context more generally was a main motivation in
the early attempts at using context (Schilit 1995) in ubiquitous computing envi-
ronments. In addition to finding appropriate resources, context-aware systems also
use context ‘to adjust the use of resource to match the requirements of the context’
(Schmidt 2005). In an AmI setting, computational resources refer to resources that
have certain functionality and are able to communicate with the rest of AmI
computing and network systems. In this setting, the user would need a highly
efficient means to access resources, as their number and types could be staggering
(Zhang 2009). Using computational resources that are in proximity of the user is
central to context-aware applications. The aim of detecting resources that are close
to the current whereabouts of the user is to reduce the physical and cognitive burden
for users as well as avoid distracting the user from focusing on the task at hand. As
noted by Kirsh (1995), ordering and accessing items based on the concept of
physical proximity and the use of physical space as criteria is a very natural concept
for humans. To better meet the requirements of the current situation, that is, to better
match the user's needs, selecting resources should be based on such contextual elements
as the nature and the requirement of the user activity and the user preferences as
well as on the status and condition of the resource and the network proximity, that
is, the context of the resource entity.
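One simple way to operationalize selecting resources on the basis of the activity's requirements, the user's preferences, and the status and proximity of the resource is a weighted score. The weights, attribute names, and example resources below are hypothetical; the sketch only illustrates the idea.

```python
def score_resource(resource: dict, activity_needs: set, user_prefs: set) -> float:
    """Higher is better: capability match, preference match, availability, and proximity."""
    capability_match = len(activity_needs & resource["capabilities"]) / max(len(activity_needs), 1)
    preference_match = len(user_prefs & resource["capabilities"]) / max(len(user_prefs), 1)
    availability = 1.0 if resource["status"] == "idle" else 0.2
    proximity = 1.0 / (1.0 + resource["distance_m"])     # nearer resources score higher
    return 0.4 * capability_match + 0.2 * preference_match + 0.2 * availability + 0.2 * proximity

resources = [
    {"name": "wall_display", "capabilities": {"video", "large_screen"}, "status": "idle", "distance_m": 2},
    {"name": "printer_2f", "capabilities": {"print"}, "status": "busy", "distance_m": 40},
]
best = max(resources, key=lambda r: score_resource(r, {"video"}, {"large_screen"}))
print(best["name"])   # wall_display
```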

9.4.5 Eye Gaze and Facial Expressions: Cognitive Context That Appears Externally

As mentioned previously, the cognitive dimension of context can be inferred by
means of either internal context such as the user's intention using software algorithms
or external context such as eye gaze and facial expressions using smart sensors. In
either case, cognitive context is relevant to psychological (or cognitive) states, such
as decision making, problem solving, and information searching. With the
advancement of multisensory technology and pattern recognition mechanisms, eye
gaze and facial expression can be used by cognitive context-aware systems to
capture a user’s cognitive context as a form of implicit input, in addition to allowing
new forms of explicit input (commands for executing tasks effectively). These
nonverbal cues are, as research shows, considered as cognitive channels, carriers of
cognitive information apart from their communication functions—in conversational
acts and emotional expressions. They can provide a wealth of data indicating the
individual’s activities and their cognitive states. The face and the eye are considered
to be a rich source of information for gathering context in our everyday lives.

9.4.5.1 Eye Movements: Implicit Input to Cognitive Context-Aware and Affective Systems

The potential of eye movement as a form of implicit and explicit input is under
rigorous investigation in the HCI community. Similarly, eye movement is gaining
increased attention among researchers within context-aware computing.
There is particularly an active investigation in the area of eye tracking and eye gaze
in relation to the development of ambient, naturalistic user interfaces as part of the
so-called cognitive context-aware systems. Eye movement is one of the multimodal
behaviors that indicate the user’s cognitive states. Therefore, cognitive context-
aware systems can use it as a source to detect or read cognitive cues to infer
particular cognitive states of the user and then adapt in response to the user's cognitive
needs. Eye movement holds great potential for solving the subtasks involved in reliably
capturing cognitive cues or realizing cognitive states as implicit input, such as
anticipating user intentions. In other words, the use of automatic recognition of
eye movement in context-aware computing is very promising, in particular in
relation to task performance. Hence, research on eye movement is significantly
important to the advancement and success of cognitive context-aware applications.
Eye movement is an effective means to display one’s psychological state, both
external emotional states and internal cognitive states. Indeed, it has been
researched in an attempt to derive finer indicators of such cognitive activities as
writing, information searching, reading, and exploring. Eye movements accurately
reflect visual attention and cognitive processes (Tobii Technology 2006). Also, eye
gaze is an important indicator of emotional stances with gaze patterns showing
specific distributions with few gazes lasting more than a second. People tend to
positively evaluate others by their patterns of gaze: people who look at their
interlocutor a lot of the time are ‘friendly’ and ‘sincere’ (Kleck and Nuessle 1968).
People tend to look more at conversants whom they like (Exline and Winters 1965).
Like facial expressions, eye gaze, as an explicit display, is highly informative about
the emotional states of people and visible enough for conversational participants to
interpret a great deal of affective information. Indeed, people tend to rely on visual
information if it happens to be in conflict with verbal information.
The eye is suspended in the bony orbital socket by six extraocular muscles that
control its movements, allowing for vertical, horizontal and rotational movements
of the eyeball. Having more than one action due to the angle they make with the
optical axis of the eye while inserting into the eyeball, the extraocular muscles
rotate the eyeball around vertical, horizontal and antero-posterior axes. In terms of
data collection and reliability, eye movement data is given a particular value in
various research areas, such as cognitive psychology, neurology, and computing.
Due to the high sampling rate possible (1–2 ms) and the nonintrusive nature of data
collection, eye movement data is deemed to be particularly useful in studies
(Salvucci and Anderson 2001). The high resolution characterizing the human visual
system is restricted to a small area, requiring the gaze to shift to each area of
interest, indicating changes in visual attention and reflecting the cognitive processes
of the individual (Ibid).
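Fixations, the brief pauses during which the gaze dwells on an area of interest, are the usual raw material for gaze-based estimates of visual attention. The sketch below implements a standard dispersion-threshold idea (commonly called I-DT) in a heavily simplified form; the threshold values and the sample format are illustrative assumptions, and commercial eye trackers such as the one cited above ship their own filtering algorithms.

```python
def detect_fixations(samples, max_dispersion=30.0, min_duration=0.1):
    """samples: time-ordered list of (t_seconds, x, y) gaze points.
    Returns fixations as (start_t, end_t, centroid_x, centroid_y)."""
    def dispersion(win):
        xs, ys = [p[1] for p in win], [p[2] for p in win]
        return (max(xs) - min(xs)) + (max(ys) - min(ys))

    fixations, i = [], 0
    while i < len(samples):
        # grow an initial window that spans at least the minimum duration
        j = i
        while j < len(samples) and samples[j][0] - samples[i][0] < min_duration:
            j += 1
        if j >= len(samples):
            break
        window = samples[i:j + 1]
        if dispersion(window) <= max_dispersion:
            # keep adding samples while the points stay tightly clustered
            while j + 1 < len(samples) and dispersion(samples[i:j + 2]) <= max_dispersion:
                j += 1
            window = samples[i:j + 1]
            cx = sum(p[1] for p in window) / len(window)
            cy = sum(p[2] for p in window) / len(window)
            fixations.append((window[0][0], window[-1][0], cx, cy))
            i = j + 1
        else:
            i += 1
    return fixations

# Long or frequent fixations on a region can then feed higher-level inferences,
# e.g., sustained visual attention on a paragraph while reading.
```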
Eye tracking tools, which can be embedded in ambient user interfaces, have a
great enabling potential to gauge the cognitive state of the user. If this can be
accurately done, it may be applied to tailoring educational programs to the learner,
just as a tutor would vary the delivery of instruction according to a
learner's progress or lack thereof (Alexander and Sarrafzadeh 2004). These adaptive
services are of great significance to learners in terms of facilitating comprehension,
encouraging participation, and boosting motivation for learning. This is especially relevant as
e-learning platforms are increasingly pervading the realm of education at all levels
and prevailing around the world. In the context of affective computing systems,
using behavioral user state based on eye gaze, among others, e-learning applications
can adjust the presentation style of a computerized tutor when a learner is bored,
interested, frustrated, or pleased (Asteriadis et al. 2009). Tracking of eye movements
is necessary to interpret pose and motion in many applications. Eye movement
tracking tools such as Tobii 1750 eye tracker (Tobii Technology 2006) can provide
the ability to estimate and visualize user’s emotional states. This is of particular
relevance and applicability in affective context-aware applications (see previous
chapter).

9.4.5.2 Facial Movements: Implicit Input to Cognitive Context-Aware Systems

As mentioned previously, facial expressions can indicate cognitive processes apart
from having conversational functions and indicating emotional processes. The
facial muscles are not only affective in nature, but also express thought (Scherer
1992, 1994b), that is, indicate cognitive processes or states. Kaiser and Wehrle
(2001) found that a frown as a facial expression indicates incomprehension as an
internal context. Frowning is thus relevant to problem solving as a cognitive context
and thus can indicate an individual’s cognitive state. Frowning often occurs when
an individual encounters a difficulty in a task or does some hard thinking while
concentrating on a problem (Ibid). However, for a computer to use
facial expressions as implicit input to capture a user’s cognitive states is not a trivial
computational process, but rather a very complex thing to model computationally—
whether using machine learning or ontological algorithms, as some facial expres-
sions may indicate different things simultaneously. ‘To make things even more
complicated, a facial expression can have several meanings at the same time: e.g., a
frown can indicate that the listener does not understand what the speaker is talking
about (cognitive difficulty); at the same time this frown is a listener response
(communicative), indicating disagreement and signaling that the speaker has to
explain his argument more appropriately; finally, it can indicate that the listener is
becoming more and more angry about this difficulty in understanding him (emo-
tional), about the content, or about the way this interaction develops’ (Kaiser and
Wehrle 2001, p. 287). Therefore, in terms of cognitive context-aware applications,
it is crucial to develop effective recognition approaches and sophisticated pattern
recognition algorithms or robust (hybrid) ontological models (representation and
reasoning mechanisms) that can discriminate between multiple functions of facial
movements. In other words, it is crucial to accurately gauge the user’s cognitive
state in order to be able to properly adapt in response to it. Adaptation decisions are
made based on the evaluation and inference of cognitive states as cognitive con-
textual information. Having the knowledge of differentiating between the functions
of facial behavior ‘is a prerequisite for developing more adapted models for
interpreting facial expressions in spontaneous interactions, i.e., models that do not
interpret each occurrence of a frown in terms of anger, sadness, or fear’ (Ibid).
There is a need for specialized research within the area of cognitive
context-aware computing with the goal to create novel and robust tools and tech-
niques for accurate measurement and detection of facial expressions as indicators of
cognitive cues or states. This area remains indeed under-researched, compared to
facial expressions for emotion recognition. ‘Given the multi-functionality of facial
behavior and the fact that facial indicators of emotional processes are often very
subtle and change very rapidly…, we need approaches to measure facial expres-
sions objectively—with no connotation of meaning—on a micro-analytic level. The
Facial Action Coding System (FACS)…lends itself to this purpose; it allows the
reliable coding of any facial action in terms of the smallest visible unit of muscular
activity (Action Units), each referred to by a numerical code. As a consequence,
coding is independent of prior assumptions about prototypical emotion expressions’
(Ibid, pp. 287–288).
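The multi-functionality problem that Kaiser and Wehrle describe can be made concrete with a toy mapping from FACS action units to candidate readings: a single action unit (for example AU4, the brow lowerer behind a frown) licenses several interpretations, and only additional context narrows them down. The mapping, scores, and context cues below are illustrative assumptions, not a validated coding scheme.

```python
# Candidate meanings of selected action units (AUs); deliberately simplified.
AU_READINGS = {
    4: ["cognitive_difficulty", "listener_disagreement", "anger"],   # brow lowerer (frown)
    12: ["pleasure", "politeness_smile"],                            # lip corner puller
}

def interpret(aus, conversational_role, task_difficulty_high):
    """Rank candidate readings of the observed AUs using extra context cues."""
    candidates = [r for au in aus for r in AU_READINGS.get(au, [])]
    candidates = list(dict.fromkeys(candidates))      # deduplicate, keep order
    def score(reading):
        s = 1.0
        if reading == "cognitive_difficulty" and task_difficulty_high:
            s += 1.0                       # a hard task makes the cognitive reading more likely
        if reading == "listener_disagreement" and conversational_role == "listener":
            s += 0.5
        return s
    return sorted(candidates, key=score, reverse=True)

print(interpret([4], conversational_role="none", task_difficulty_high=True))
# ['cognitive_difficulty', 'listener_disagreement', 'anger']
```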

9.4.6 Challenges and Limitations

Cognitive context awareness poses real conundrums. Capturing, modeling, under-
standing, and inferring the user's cognitive context—mental processes and states—
is associated with subtle intricacies and thus daunting challenges in the realm of
context-aware computing. Cognitive states and processes are tacit, and most of
them are internal as contextual features. In fact, they are difficult to externalize and
translate into a form understandable or intelligible to a system. This is likely to be
one of the reasons why research is not active in this area compared to affective
context-aware systems. Realizing cognitive implicit input reliably seems unfeasible
at the current stage of research due to the constraints of existing enabling tech-
nologies and computational processes. In particular, a number of subtasks for
realizing such an input, such as recognition, interpretation, processing, and infer-
ence of cognitive behavior and anticipation of user intention, have not yet been solved.
In view of that, realizing cognitive implicit output pertinently seems to be still far
away due to major unsolved technological challenges.
There is a large body of work within cognitive psychology and cognitive science
describing various cognitive processes and how they interrelate and can be iden-
tified, but only a few studies propose computational methods for capturing and modeling
cognitive states, especially in the state of performing tasks involved in the inter-
action with AmI systems. To create systems that can recognize and adapt to the user's
cognitive states seems to be no easy task, despite the advancement of computer
science—HCI and AI. Cognitive states are tacit, fuzzy, and difficult (even for the
user) to externalize or express and translate into a form intelligible to an AmI
system. To make things even more complicated, one task may involve several
cognitive states, which interact in a dynamic, and even sometimes synchronized,
fashion—reasoning, problem solving, and decision making. In relation to tasks as a
category of human factors related context, cognitive cues are very difficult to detect,
track down, and disambiguate. In all, it is difficult to recognize a user's cognitive
context (Kim et al. 2007).
Major strides have been made and unprecedented achievements have been
realized in cognitive science and AI with regard to simulating complex processes of
human cognition into computer systems—intelligence enhancement. However, the
creation of ‘thinking machines’ does not necessarily mean that these machines can
understand the human mind when interacting with human users. The mind is a very
special machine. Attempting to computationally disambiguate what is cognitively
intended by a user—regardless of the type of context that can be used for the
purpose—in the state of interacting with a computer system makes the human mind
even more mysterious and cryptic. The question to be raised here is whether it is
feasible at all to create computer systems that emulate the ability to understand and
react to something—intentional processes—which humans themselves still sometimes
find difficult to decipher when communicating with each other. Indeed,
human cognitive agents are sometimes not able to draw precise inferences about the
psychological states of each other—e.g., in the state of working together on some
tasks or activities. Therefore, it becomes sometimes difficult for people to help each
other in their tasks, as they cannot deduce what they are struggling with in order to
support each other’s cognitive needs. From a cognitive perspective, to understand
people’s intentions as a mental ability involves complex, dynamic information
processing, comprehensive knowledge base, and extensive experiences.
However, many scholars studying context awareness have acknowledged the
importance of capturing the cognitive elements of a user’s context. Anind Dey, a
scholar widely credited for his contribution to context-aware computing, conceives
of a context-aware system as one that uses context to provide relevant services to
the user, where relevancy depends on the user’s task (Dey 2000). It is crucial to
identify the user’s context in the state of carrying out tasks or activities in order to
be able to intelligently adapt in response to the user’s cognitive needs. In other
words, the outcomes of reasoning, inference, decision making, and application
actions as computational processes are determined by the accurate detection and
effective interpretation of the user's intention in order to be able to reduce the
cognitive burden associated with tasks performed or assisted by computer systems.
Failure to accurately detect and effectively interpret internal or external context at
an atomic level may render the provision of adaptive services irrelevant. This may be
disturbing, intrusive, and frustrating for users, which could subsequently have
implications for AmI technology acceptance. As an attempt to improve users’
performance, Fogli and Piccinno developed an interactive visual environment
called Software Shaping Software, which enables the user as a domain-expert to
define the functionality and configuration of the computational environment
(Bianchi-Berthouze and Mussio 2005). To achieve their objective, the authors
suggest using the metaphor of the working environment within which they identify
a key-role user that is an expert in a specific domain and also aware of the needs of
the user when using computational systems, an approach that enables domain
expert users to collaborate with HCI engineers to design and implement
context-aware interactive systems.
Context-aware computing is highly promising. Tremendous opportunities reside
in implementing cognitive context-aware systems at different scales and levels of intelli-
gence, ranging from mobile phones, e-learning platforms, bookshops, and libraries, to
web-based information systems and web browsers. We can envision a proliferation
of cognitive context-aware systems in different walks of life. Enabling computers to
accurately capture and infer the user's cognitive context by abstracting it from a
combination of internal and external contexts is a worthy endeavor towards
understanding and supporting human cognitive intelligence. In the AmI world the
relationship between humans and the technology around us should be no longer one
of a user towards a tool, but of a person towards an ‘object-became-subject’,
something that is capable of responding intelligently to cognitive indications of
needs, among others.
Therefore, there is a need for advancing research on cognitive context aware-
ness, as capturing the cognitive context can help AmI systems reason more intel-
ligently about and adapt intuitively in response to the user’s cognitive states. In
particular, more research should be dedicated to multimodal recognition of cognitive
behavior, combining cognitive cues read from multiple sources, such as eye gaze,
facial expression, gesture, intention, work context, and so on, for more robust
estimation of the cognitive state. In other words, novel sensing techniques and
pattern recognition algorithms as well as dedicated inference software algorithms
are required for advancing cognitive context-aware computing. Of equal impor-
tance as to research is how people cognitively operate and process information
when communicating with technological artifacts as context in themselves. Overall,
the topic of cognitive context awareness deserves more attention in the research
within cognitive science and AI, as it holds a great potential for enhancing user
acceptance of AmI technologies.

9.5 New Forms of Explicit Input and Challenges

9.5.1 Speech, Eye Gaze, Facial Expressions, and Gestures

AmI is capable of responding to spoken or gestured indications of desires as to
carrying out tasks. In addition to conveying a great deal of contextual information
to context-aware systems by indicating the individual’s activities and emotional and
cognitive states, to affective and emotionally intelligent systems by detecting
emotions, and to conversational agents by providing communicative functions, eye
gaze, facial expressions, and gestures can allow new forms of explicit input. This is
promising as to bringing simplicity, smoothness, and intuitiveness to user inter-
action with AmI technologies. The aim is to reduce the cognitive burden for the user
direct and manipulate interactive (mobile) applications. Eye gaze, head and mouth
motion of facial movements, and hand gestures are being investigated in AmI as to
how they can be used as a form of dynamic explicit input to control computer
systems in the sense of instructing them to execute tasks in a more effective and
efficient way. They can also be utilized to assist people with disabilities, with a wide
variety of impairments: visually impaired and hearing-impaired users rely on the
voice modality with some keyboard and the visual modality with some speech
input, respectively (see Vitense et al. 2002). Specifically, eye movement can be
used by the disabled who are unable to make normal use of explicit inputs such as the
keyboard, movements of the pointing device, and selections with the touch screen;
facial expressions by people with hand and speech disabilities; and gestures and
facial movements by people who suffer from blindness.
For regular users, eye movement may be more efficient than facial or gestural
movements in relation to a set of specific tasks. In other words, compared to HCI
designs using such movements as commands, eye gaze has greater potential to be
used as hands free method for many tasks associated with manipulating interactive
applications. For example, a gaze-based interface with eye gaze tracking capability,
a type of interface that is controlled completely by the eyes, can track the user’s eye
motion and translate it into a command to perform such tasks as opening documents
or scrolling, which the user would normally do by means of conventional explicit
inputs, using keystrokes with the keyboard, movements of the pointing device, and
selections with the touch screen. In more detail, in a system equipped with a
gaze-based interface, a user gazes at a given link, then blinks in order to click
through; gazes at a given file or folder then blinks to open it; moves his/her eye to
scroll down and up or move the cursor from right to left and around across icons to
search for a particular item; and so on. The movements of the eyeball, horizontal,
vertical, and rotational can be combined, depending on the nature of the task being
carried out at a certain moment. But, the use of eye gaze information is a natural
choice for enhancing scrolling techniques, given the fact that the act of scrolling is
tightly linked to the users’ ability to absorb information through the eye as a visual
channel (Kumar et al. 2007). Adjouadi et al. (2004) describe a system whereby eye
position coordinates were obtained using corneal reflections and then translated into
mouse-pointer coordinates. In a similar approach, Sibert and Jacob (2000) show a
significant speed advantage of eye gaze selection over mouse selection and consider
it as a natural, hands free method of input. While this is concerned with healthy
subjects, Adjouadi et al. (2004) propose remote eye gaze tracking system as an
interface for persons with severe motor disability.
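A gaze-controlled pointer of the kind described above can be reduced to two steps: map calibrated gaze coordinates onto screen coordinates, and trigger a "click" when the gaze dwells (or the user blinks) on a target. The sketch below is a hypothetical, heavily simplified illustration with assumed class, parameter, and threshold names; it is not the system of Adjouadi et al. or Sibert and Jacob.

```python
class GazePointer:
    """Turn a stream of gaze samples into pointer moves and dwell-based clicks."""
    def __init__(self, dwell_seconds=0.8, radius_px=40):
        self.dwell_seconds = dwell_seconds
        self.radius_px = radius_px
        self.anchor = None                     # (t, x, y) where the current dwell started

    def update(self, t, x, y):
        """Feed one gaze sample; returns ('move', x, y) or ('click', x, y)."""
        if self.anchor is None:
            self.anchor = (t, x, y)
        anchor_t, ax, ay = self.anchor
        if abs(x - ax) > self.radius_px or abs(y - ay) > self.radius_px:
            self.anchor = (t, x, y)            # gaze left the target: restart the dwell
        elif t - anchor_t >= self.dwell_seconds:
            self.anchor = (t, x, y)            # dwelled long enough: emit a click
            return ("click", x, y)
        return ("move", x, y)

pointer = GazePointer()
events = [pointer.update(t * 0.1, 500, 300) for t in range(10)]   # ten samples on one spot
print([e for e in events if e[0] == "click"])                     # [('click', 500, 300)]
```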
Similarly, facial movements allow a new form of explicit input as an
alternative to eye gaze movements. These two distinct movements can also be
combined depending on the task to be performed and how the user might prefer—or
need—to proceed. Indeed, as an alternative means of aiding people with hand and speech
disabilities, visual tracking of facial movements has been used to manipulate and
control mouse cursor movements, e.g., moving the head with an open mouth which
causes an object to be dragged (Pantic and Rothkrantz 2003). Likewise, de Silva
et al. (2004) describe a system that tracks mouth movements.
In addition, AmI systems are capable of using gestures and speech as commands to
assist the user in carrying out most routine tasks or activities. HCI design using
natural modalities as commands has a great potential to bring intuitiveness to the
interaction between AmI systems and users. Utilizing distance sensors, Ishikawa
et al. (2005) propose a touchless input system based on gesture commands. Abawajy
(2009) describes a (common) scenario where an application uses a natural modality
(gestures) to perform a task: the possible scenario is when a user refers ‘to a number
of open documents on a computer whilst typing a document. Presently one must
acquire the mouse, locate the target icon, move the cursor to the icon and click. If
the correct document is successfully opened one then has to use scroll bars or a
mouse wheel to move through the pages. An alternative could use gestures similar
to the movements used when leafing through physical documents. For example, by
moving two or three fingers towards or away from the palm the user could move to
the next document whilst moving one finger could move from page to page. The
user would face less interruption and save considerable cognitive effort when
navigating between and within documents’ (Ibid., p. 67). This can also be
accomplished using the speech modality. In this respect, the advantage of multiple
modalities is increased usability as well as accessibility. The limitation or infeasibility
of one modality can be counterbalanced by the strength or practicality of
another. On a mobile phone with a small keypad, a message may be cognitively
demanding to type but very easy to speak to the phone. Utilizing speech as a
command can also be extended to writing, a feature that can be used by regular users
and those with disabilities alike. Using speech commands, one can easily manipulate
the computer, e.g., send speech signals to the computer to switch it off, open an
application, log into a website, play a song, send an email, find an email address,
search for a document, or enter a password.
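A sketch of how recognized gestures or spoken phrases might be routed to such commands is given below; the gesture labels and the command vocabulary are illustrative assumptions inspired by the scenario above, and the recognizers themselves are assumed to exist upstream.

GESTURE_COMMANDS = {
    "three_fingers_toward_palm": "next_document",
    "three_fingers_away_from_palm": "previous_document",
    "one_finger_toward_palm": "next_page",
    "one_finger_away_from_palm": "previous_page",
}

SPEECH_COMMANDS = {
    "next page": "next_page",
    "open mail": "open_application:email",
    "switch off": "shutdown",
}

def dispatch(modality, label):
    """Map a recognized gesture or utterance to a navigation command."""
    table = GESTURE_COMMANDS if modality == "gesture" else SPEECH_COMMANDS
    return table.get(label, "ignore")   # unknown input is simply ignored

if __name__ == "__main__":
    print(dispatch("gesture", "one_finger_toward_palm"))   # -> next_page
    print(dispatch("speech", "open mail"))                  # -> open_application:email
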
Additionally, body movements may be used as an explicit input to control
interfaces. They can be used to manipulate what has come to be known as tangible
interfaces, which allow combining digital information with physical objects; a number
of products are illustrative of tangible interfaces, e.g., a small
keychain computer that clears the display when shaken (Fishkin 2004). The whole
idea of new forms of explicit input is to simplify the interaction with computer
systems and harness intuitive processes for their manipulation.
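The keychain example can be made concrete with a small shake-detection sketch; the acceleration threshold, the peak-counting rule, and the sensor interface are illustrative assumptions, not the actual device described by Fishkin (2004).

def is_shake(magnitudes_g, threshold_g=2.5, min_peaks=3):
    """Report a shake when enough acceleration peaks exceed the threshold."""
    peaks = sum(1 for m in magnitudes_g if m > threshold_g)
    return peaks >= min_peaks

def on_sensor_window(magnitudes_g, display):
    """Clear the (toy) display buffer when the device is shaken."""
    if is_shake(magnitudes_g):
        display.clear()

if __name__ == "__main__":
    display = ["note 1", "note 2"]
    on_sensor_window([1.0, 3.1, 2.9, 0.8, 3.4], display)   # a vigorous shake
    print(display)   # -> []
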
As regards the design of AmI systems using facial, gestural, and bodily move-
ments, it is important to account for physical factors such as aging and disabilities.
Nonverbal behaviors, which have gained increased attention in AmI research, are
typically linked to physical movements of the human body. In fact, aging and
disability have largely been ignored in HCI design and in ICT more generally, and new
technologies continue to be designed for a certain type of user. When interactive
applications are adapted to the needs of a particular target user group, they are less
likely to be appropriate for others. In other words, when targeting specific groups,
HCI design can quickly end up reinforcing existing stereotypes.
The assumption that users have similar physical properties is problematic.
Brandtzæg (2005) claims that 90 million EU citizens are fully or partly left out when it
comes to ICT use due to such permanent obstacles as age or disabilities. The aged
population is the most stereotyped group in society, and the variation of ability levels
within this group is the widest among the age groups (Hawthorn 1998).
In particular, aging factors need to be taken into consideration when designing
AmI systems that support new forms of explicit inputs and cognitive context
awareness with regard to tasks or activities—especially capture technologies, pat-
tern recognition techniques, and modeling cognitive context and human body
movement domains of knowledge. Aging is relevant to cognitive and physiological
elements of a user’s context, and is a more complex issue to tackle. As a physical
factor it has strong implications for the cognitive and physical performance of
individuals. An investigation conducted by Hawthorn (1998) reports on many aging
issues. Besides the obvious fact that aging results in decreased muscle
strength and endurance, it also slows motor response time, an effect accentuated
with increased task complexity, and reduces the ability to control and modify the forces
applied (Hawthorn 1998). Hence, aging factors are strongly linked to users' non-
verbal behavior and are thus crucial for designing properly functional and widely
accepted AmI technologies—providing simplicity of interaction. However, while
considerable work has been carried out on the common properties of human body
movements, only a few studies have been performed on differences in physical
behavior relating to aging (especially in the context of computing). Aging in AmI
research is thus an under-researched area and future research to investigate aging in
relation to nonverbal behavior is no doubt warranted to better inform the design and
development of naturalistic, multimodal user interfaces for the applications sup-
porting new forms of explicit interaction and cognitive context awareness in terms
of reducing the cognitive and physical burden required to manipulate the computer.
The underlying assumption is that how physically and, to a certain extent, cogni-
tively users perform or carry out tasks is affected by aging factors. An awareness of
the developmental considerations of younger users and the decline in the abilities of
the aged may provide useful insights into designing efficient user interfaces for all
age groups (Hawthorn 1998; Strommen 1993). The authors conclude that both
younger and aged groups perform better when complex movements are decomposed
into discrete movements, the former primarily due to a lack of cognitive development
and the latter primarily due to decline in physical abilities. As far as disability is
concerned, there have been some attempts to assist disabled users to manipulate
computer systems by offering different alternatives when it comes to using new
forms of explicit inputs: eye gaze, facial expressions, speech, and gestures. In all,
while there is hardly ever a ‘one-size-fits-all’ solution for designing user interfaces
due to the variety of users and interactions, considering such factors as aging and
disabilities will certainly enhance the social acceptance of AmI technologies and
thus encourage a wider adoption thereof—the simplest interactions and adaptable
user interfaces are required.
9.6 The Relationship Between Aesthetics, Affect, and Cognition in AmI

It is essential to design AmI technologies and environments that elicit positive
emotions and pleasant user experiences. The affective quality of user interfaces,
software applications, and computing devices, in addition to the smoothness,
simplicity, and richness of interaction all play a key role in eliciting positive
emotions in users, which has implications for improving their task performance.
Recent studies demonstrate the connection between aesthetics and affect and their
significance in ICT design (Norman 2002, 2004; Zhang and Li 2005; Zhang
2009), the relationship between affect and cognition in the context of ICT (Norman
2002), the relationship between design aesthetics and emotions (Hekkert 2004), and
a broader sense of aesthetics in art, design, and computer science (Fishwick 2006).
Overall, there is an increasing interest in aesthetics and affect and how they influence
cognition in the context of ICT design and use. Therefore, it is important to develop
and apply new approaches to AmI design based on the understanding of these
concepts and their relationships.

9.6.1 Affect and Related Concepts and Theories

In social science (psychology), affect is considered to be a critical building block
and thus affective processes to be the basis of feelings, emotions, and moods, in
both conscious and non-conscious states. Affect is thus an umbrella term for mood,
emotion, and feeling as different, yet related, concepts. Accordingly, it is often used
interchangeably with these concepts—e.g., in several studies on ICT design and
aesthetic and emotional computing. However, the distinction between those con-
cepts is a subject of a technical debate. In other words, terminology is still an issue
that is under discussion. According to Forgas and George (2001), affect has two
components, moods and emotions: moods are the more pervasive states with no
apparent set of causes and are thus less based on rational contingencies; as they are not
rooted in a specific cause, they are difficult to manipulate. As momentary outbursts,
emotions, by contrast, are more intense, less enduring, and usually triggered by an
identifiable circumstance or a specific, self-evident cause, and hence they are
easier to deal with. While emotion as an affective state tends to have a clear focus
and is amenable to action and a transient state of mind evoked by experiencing
different sensations, mood tends to be unfocused, diffused, and long-lasting. As
such mood is thus more difficult to cope with and can last from days to years, unlike
instant reactions that produce emotion and change with expectations of future
pleasure or pain (Schucman and Thetford 1975). According to Batson et al. (1992),
mood involves tone and intensity as well as a structured set of beliefs about
expectations of a future experience of pleasure or pain. Typically, the existence of
moods can be inferred from a variety of behavioral referents (Blechman 1990).
In addition, affect is assumed to be a factor in mood, and emotion is associated
with it. Mood constructs represent an individual's emotional state, which is shaped
by his/her personal life.
Furthermore, what characterizes affect has been a subject of debate among
scholars for decades. Although the cognitive and neurosciences have made major
strides in the past decade, affect and emotion are not well understood and thus
remain unclear concepts in psychology. Like emotion (see previous chapter for a
detailed discussion), there exist many theoretical perspectives on and unsettled
issues about affect. Nevertheless, there is agreement among psychologists that affect
or affective processes are considered to be the basis for emotions and feelings.
Affect produces emotions and emotions generate feelings. However, views tend to
differ as to whether initial affect or affective reactions produce thoughts or affect is
produced by thoughts—cognitive evaluations. Some views argue that affect is
pre-cognitive, based on sensation, likes, and dislikes, while others contend that it is
post-cognitive. Zajonc (1980) suggests that affective reactions can be made sooner
than cognitive judgments, occurring without extensive perceptual and cognitive
encoding. That is to say, the experience of emotion may occur before the typical
cognitive information processing over perception necessary for the formation of
emotion as complex chains of events triggered by certain stimuli. In this sense,
affect-based judgments and cognitive appraisal processing are two independent
systems. Conversely, Lazarus (1982) considers affect to be post-cognitive—that is,
it is elicited or occurs after a certain amount of cognitive processing of information
—judgments or thoughts. He argues that cognitive appraisal processes are deemed
crucial for the development and expression of emotions because emotions are a
result of an anticipated, experienced, or imagined outcome of the patterns of
adaptational transaction between the organism and the environment. In line with
this view, an affective reaction is based on a prior cognitive process in which
various content discriminations are made and features are identified, assessed, and
weighted for their contributions (Brewin 1989). However, the controversy on the
primacy of an ‘affective system’ versus a ‘cognitive system’ in emotion generation
(the Lazarus-Zajonc controversy) is a debate that is mostly around semantic issues
(Leventhal and Scherer 1987; Scherer 1994). On the other hand, Lerner and Keltner
(2000) argue that affect can be both pre- and post-cognitive, with thoughts being
produced by initial affective reactions, and affect being produced by the thoughts.
Zhang (2008, p. 147) states, ‘modern affect theories propose that human beings
have two synchronous systems that activate and regulate emotions. The primitive
biological system has the evolution root of human beings and is an innate, spon-
taneous, physiological system that reacts involuntarily to emotional stimuli. The
contemporary cognitive system is an experience-based system that reacts inter-
pretatively and socially. The two systems influence each other and combined they
provide a highly adaptive emotion mechanism’.
In psychology, feeling refers to the conscious subjective experience of emotion
(VandenBos 2006). In other words, it is an affective state of consciousness which
results from emotions. Feelings are thus recognizable. They are called core affect,
which has two dimensions (Seo et al. 2004). One is the degree of pleasure (i.e., state
of happiness) or discomfort (i.e., state of unhappiness) and the other is the level of
activation (i.e., state of alert) or deactivation (i.e., state of calmness) experienced by
the individual. These two dimensions are unrelated, as a strong sense of pleasure
can accompany low activation, and so can a strong sense of displeasure. Russell (2003)
defines core affect as a neurophysiological state that is consciously accessible as a
simple, non-reflective feeling. This definition is part of notable recent work in
theoretical development in psychology carried out by the author, which has contributed
significantly to the definition of a number of important affective concepts. Those
that relate to aesthetic and emotional computing are introduced here.
Affective quality is a stimulus's ability to cause a change in core affect (Ibid). Core
affect pertains to the individual, while affective quality pertains to the stimulus, such as
artifacts/objects, events, and places. Perception of affective quality refers to an
individual's perception of a stimulus's ability to change his/her core affect (Russell
2003). The perception of the stimulus leads to an appraisal through a thought process
(cognitive information processing), which is a perceptual process that assesses the
affective quality of the stimulus. Accordingly, an AmI artifact is, as a stimulus,
consciously sensed and perceived—recognized and affectively interpreted, leading
to an appraisal, which in turn leads to an emotional response. Perceived affective
quality of ICT artifacts has been studied by Zhang and Li (2004, 2005) as a concept
related to affect. As an elemental process, perception of affective quality has been
assigned other terms, such as evaluation, affective judgment, affective reaction, and
primitive emotion (Cacioppo et al. 1999; Russell 2003; Zajonc 1980). However, in
the context of HCI design, the affective quality of AmI artifacts may have an effect
on user’s affect, i.e., aesthetically beautiful and emotionally appealing AmI systems
are likely to elicit positive emotions in users. In this sense, perception of affective
quality (of an artifact) is a construct that makes such a relation (Zhang and Li 2004).
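Russell's two-dimensional characterization lends itself to a simple computational representation. The sketch below encodes core affect as a valence-activation pair and models perceived affective quality as a stimulus-induced shift of that pair; the numeric scales and field names are assumptions made purely for illustration.

from dataclasses import dataclass

@dataclass
class CoreAffect:
    valence: float      # -1.0 (displeasure) .. +1.0 (pleasure)
    activation: float   # -1.0 (deactivation) .. +1.0 (activation)

    def shifted_by(self, d_valence, d_activation):
        """Apply a stimulus's perceived affective quality, clamped to range."""
        def clamp(v):
            return max(-1.0, min(1.0, v))
        return CoreAffect(clamp(self.valence + d_valence),
                          clamp(self.activation + d_activation))

if __name__ == "__main__":
    before = CoreAffect(valence=0.1, activation=-0.2)   # calm, mildly pleasant
    after = before.shifted_by(0.4, 0.3)                 # effect of a pleasing AmI artifact
    print(before, after)
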

9.6.2 Aesthetics

Affect is related to, but different from, aesthetics. Zhang (2009, p. 6) notes:
‘…a simple way to differentiate and connect aesthetics and affect is to say that
aesthetics emphasizes the quality of an object or stimulus and the perception of such
quality in one’s environment, and affect emphasizes the innate feelings people have
that are induced by the object (such as emotions and affective evaluations)’. Affect
studies are concerned with individuals’ affective or emotional reactions to stimuli in
one’s environment, while aesthetics studies focus on objects and their effect on
people’s affect. Affect refers to user’s psychological response to the perceptual
design details of the artifact (Demirbilek and Sener 2003). ‘Aesthetics’ comes from
the Greek word aesthesis, meaning sensuous knowledge or sensory perception and
understanding. It is a branch of philosophy. The meaning of aesthetics is implied to
be a broader one, including any sensory perception, but sometimes the concept is
used to describe a sense of pleasure (Wasserman et al. 2000). It is difficult to pin
down the concept of aesthetics. Lindgaard et al. (2006) contend that the concept of
aesthetics is considered to be elusive and confusing. It was the philosopher
Baumgarten who, in the eighteenth century, changed the meaning of the concept
into sense gratification or sensuous delight (Goldman 2001). The term has since
become more related to the pleasure attained from sensory perception—sensuous
delight as an aesthetic experience. Aesthetics can thus be said to be about the
experience of the beauty and quality of artifacts (or work of art) as gratifying to our
senses. As artifacts are produced to gratify our senses, ‘the concept has…been
applied to any aspect of the experience of art, such as aesthetic judgment, aesthetic
attitude, aesthetic understanding, aesthetic emotion, and aesthetic value. These are
all considered part of the aesthetic experience’ (Hekkert 2004, p. 2). However,
aesthetics can be found in many aspects of human lives; nature as well as people
can be experienced aesthetically. Indeed, aesthetics has been studied as part of a
wide range of disciplines and has a long history as an object of study. Its historical
development is, however, beyond the scope of this chapter. In relation to com-
puting, the reader is directed to the book Aesthetic Computing, edited by
Fishwick (2006), which attempts to place aesthetics in its historical context,
and examines its broader sense in art and design, mathematics and computing, and
HCI—user interfaces—in the form of a set of collected articles and essays. This
book involves several scholars and practitioners from art, design, computer science,
and mathematics; they have contributed to laying the foundations for a new field
that applies the theory and practice of art to computing. In the context of computer
science, the contributors address aesthetics from a broader perspective, from
abstract qualities of symmetry to ideas of creative pleasure. Aesthetic computing
offers benefits to HCI in terms of enhancing usability through inducing positive
affective states in users. Pleasant, aesthetic design of artifacts enhances their
usability (Norman 2002).
As an affective experience, aesthetics is subjective because it is associated with
perception. That is, affective response (pleasure or displeasure from the sensory
perception of the affective quality of an artifact) is based on the way each individual
experiences, and reacts affectively to, an artifact, meaning that aesthetic response tends
to vary from one individual to another. Aesthetics is less about the aesthetic potential of
an artifact itself to invoke affective reactions than about its effect on the perceiver in
terms of the degree to which his/her senses can be gratified when experiencing the
artifact. In this sense, gratification of the senses, or sensuous delight, is closely
linked to such factors as the context, the situation, and the environment, as well as
the idiosyncratic and sociocultural dimensions of the perceiver. This implies that the same
aesthetic quality of an artifact may trigger different affective reactions in different
people. Aesthetic reactions differ in a lawful manner, 'just like the process
underlying our emotions is uniform, yet leading to individual differences as a result
of interpretation differences’; and ‘it is only in this way that beauty can be said to lie
in the “eyes of the beholder”’ (Hekkert 2004, p. 4). Moreover, several concepts
related to aesthetics have been developed to signify the explicit meanings of
subjectivity, including aesthetic perception, perceived visual aesthetics, perceived
aesthetic value, and aesthetic experience, in addition to other related concepts, such
as perceived visual appeal and hedonic quality (Zhang 2009). For example, in an
attempt to develop measures of perceived website aesthetics, Lavie and Tractinsky
identified a two-dimensional structure: classical aesthetics denotes orderliness in
design, including descriptions such as 'pleasant', 'symmetrical', and 'aesthetic', and
expressive aesthetics signifies designers' creativity and can be described by
'sophisticated', 'creative', and 'fascinating' (Ibid). However, subjectivity versus
objectivity is considered one of the most contentious issues in aesthetics studies
in general and in the ICT context in particular. According to Zhang (2009), the
objectivity view holds that aesthetics is a quality in an object, meaning that
aesthetics lies in the object in one's environment, and this object must have certain
features 'to reveal its inherent quality, or with aesthetic potential', attributes that
'exist regardless of whether they are perceived by people or agreed upon
among people'.

9.6.3 Artifact Experience Versus Aesthetic Experience

To experience an artifact entails, in an initial stage, an exposure to it, which involves
affective processes, and, in a subsequent stage, an interaction with it, which
involves, depending on its function and what this entails in terms of the nature of
interactivity, cognitive and emotional processes. A product or an artifact experience
can be defined as ‘the entire set of effects that is elicited by the interaction between
a user and a product, including the degree to which all our senses are gratified
(aesthetic experience), the meanings we attach to the product (experience of
meaning), and the feelings and emotions that are elicited (emotional experience).
With respect to the attachment of meaning, many cognitive processes play a role,
such as interpretation, retrieval from memory, and associations’ (Hekkert 2004,
p. 3). In the full experience of an AmI artifact, the affective-cognitive impact plays a
role in generating positive emotions during the interaction with the AmI system as a
result of a positive interpretation and evaluation of its affective quality as well as its
(autonomous) intelligent behavior. A typical experience of an artifact involves,
besides the aesthetic part, ‘understanding and an emotional episode’; ‘although
these three constituents of an experience are conceptually different, they are very
much intertwined and impossible to distinguish at a phenomenological level. We
experience the unity of sensuous delight, meaningful interpretation, and emotional
involvement, and only in this unity we can speak of an experience’ (Ibid).
As echoed by Zhang (2009), the two systems involved in inducing ‘intended
emotions via the affective system that is invoked by initial exposure to ICT’ (AmI
systems) and inducing ‘intended emotions via the cognitive system that is based on
intensive cognitive activities’ ‘influence each other and combined they provide a
highly adaptive emotion mechanism'. In view of that, however, the aesthetic
experience—what is pleasurable or gratifying to the senses in the artifact—is only part
of the full experience of the artifact. A recent 'model of aesthetic experience' (Leder
et al. 2004) illustrates that the aesthetic experience encompasses all the processes involved
in our interaction with an artifact as a work of art. In this model, an observer of
artwork ‘starts off with a perceptual analysis of the work, compares this to previous
encounters, classifies the work into a meaningful category, and subsequently
interprets and evaluates the work, resulting in an aesthetic judgment and an aes-
thetic emotion. Only the first two (or three) stages would be considered aesthetic…
In these, mostly automatic stages perception is at work and the degree to which our
perceptual system manages to detect structure and assesses the work’s
novelty/familiarity determines the affect that is generated. At these stages we talk
about sensuous delight (or displeasure), whereas at later stages cognitive and
emotional processes enter the experience. There is every reason to consider these
stages part of the experience of the work of art, but there is also a good reason not to
call these stages aesthetic' (Hekkert 2004, p. 3). Many authors (e.g., Wasserman
et al. 2000; Goldman 2001; Norman 2004; Zhang 2009; Hekkert 2004) consider,
explicitly or implicitly, aesthetic experience to constitute only part of the full
experience of an artifact, which normally involves sensuous perception, interpre-
tation and evaluation (cognitive processing), and the ‘subsequent’ emotional pro-
cesses, including action tendency, expressive behavior, and subjective feeling.
Drawing on Zhang (2008, p. 147), the key for applying emotional studies to AmI
design is two-fold: induce intended emotions via ‘an innate, spontaneous, physi-
ological system that reacts involuntarily to emotional stimuli’, which is invoked by
initial exposure to AmI technologies, and induce intended emotions via ‘an
experience-based system that reacts interpretatively and socially, which is based on
intensive cognitive activities’.
Further to the argument that only part of the full experience of artifacts should be
considered of aesthetic nature, ‘the rest of the experience deals with faculties of the
human mind, i.e., cognition and emotion…and they should thus be conceptually
separated. All three levels of the experience, the aesthetic, understanding, and
emotional level, have their own, albeit highly related, underlying processes. These
processes are not arbitrary, but lawful. Although this seems rather obvious for the
way we understand a product and emotionally respond to it, this also applies to our
aesthetic responses to products. This is something we have only recently come to
realize’ Hekkert (2004, p. 2). It is warranted to further investigate the patterns
underlying users’ aesthetic reactions in the context of AmI environments.
In particular, the physical disappearance of AmI technology from our environment means
that the whole environment surrounding the user has the potential to function
as a unified interface. AmI technology—miniature, distributed, and embedded
devices—will be hidden in aesthetically beautiful everyday objects. The technology dis-
appears into our daily surroundings until only the user interfaces remain perceivable
by users.
9.6.4 Appraisal Theory: Emotional Response to the External Environment

Following the tenets of cognitivism, cognitions are mental and social representa-
tions of real objects, processes, events, and situations that occur in the world.
Accordingly, they are based on perceptions, i.e., affected by subjective, socially
situated interpretation of these elements, and cognitive schemata facilitate percep-
tion of novel experiences. The cognitive system is seen as ‘an experience-based
system that reacts interpretatively and socially’ (Zhang 2008, p. 147). Although
they are abstractions, and thus often simplifications or alterations of the external
environment, they do constitute attempts to capture reality. Appraisal theory pro-
vides a descriptive framework for emotion based on perceptions, that is, the way
individuals experience objects, processes, events, and situations at the focus of the
emotional state (Scherer 1999). The process underlying the emotional response to
these elements can in fact most precisely be described by an appraisal model (e.g.,
Frijda 1986; Scherer 1992; Scherer et al. 2001; Roseman et al. 1994; Ortony and
Turner 1990; Ortony et al. 1988). These appraisal theorists posit that an emotion is
elicited by an appraisal of a situation, event, or object as potentially advantageous
or disadvantageous to a person’s concerns, e.g., on seeing a new smart mobile
phone a person may experience desire because he/she expects that possessing it will
fulfill his/her concern of being in the know of the latest technology. A key premise
of appraisal theory is that it is the interpretation of the situation, event, or object
rather than these themselves, which trigger the emotion. Appraisal theory postulates
that each emotional response of an individual has an idiosyncratic pattern of
appraisal, but there are few one-to-one relationships between an emotional response
and a situation, event, or object.
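A toy version of such an appraisal rule can be sketched as follows; the concern weights, the scoring, and the emotion labels are illustrative assumptions rather than a faithful rendering of any of the cited appraisal models.

def appraise(event_tags, concerns):
    """Score an event by how it bears on weighted concerns; the sign of that
    interpretation, not the event itself, selects the emotional response."""
    score = sum(weight for tag, weight in concerns.items() if tag in event_tags)
    if score > 0:
        return "positive emotion (e.g., desire, joy)"
    if score < 0:
        return "negative emotion (e.g., frustration, fear)"
    return "no marked emotional response"

if __name__ == "__main__":
    concerns = {"latest_technology": 0.8, "privacy_risk": -0.9}
    # Seeing a new smart phone bears on the 'being in the know' concern.
    print(appraise({"latest_technology"}, concerns))
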

9.6.5 Aesthetics and Affect in AmI Design and Use Context

Aesthetics is a crucial aspect of the design and use of technological artifacts,
especially in AmI environments, in which people will frequently be exposed to and
interact with various types of aesthetically beautiful artifacts. These are intended to
gratify various senses simultaneously and elicit positive emotions. It is argued that
models of system design that do not consider affect are essentially weakened (see
Helander and Tham 2003), as affect constitutes a basis for the formation of human
appraisal, judgment, and values. Affective AmI artifact design attempts to define the
subjective emotional relationships between users and artifacts, and to explore the
affective properties that artifacts intend to communicate through their visual, aes-
thetic, and context-aware attributes; in particular, to support affective design with
context-aware adaptive and responsive applications is one of the strengths of AmI
environments. Such design seeks to deliver artifacts capable of eliciting affective
and psychophysiological pleasure that users may obtain through all of their senses.
Given the intensity of the interaction between users and AmI artifacts, e.g., intel-
ligent functionality and visual and aesthetic tools, the interaction experience should
have a multidimensional effect, involving sense gratification resulting from aes-
thetically pleasant objects, pleasure and effectiveness of use resulting from the
interaction with the system at data and process levels, and fulfillment resulting from
achieving well-defined goals. In general, as regards sense gratification ‘following
thinking in evolutionary psychology, it is argued that we aesthetically prefer
environmental patterns and features that are beneficial for (the development of) the
senses’ functioning… If certain patterns in the environment contribute to the
functioning of our senses, it is reinforcing to expose ourselves to these patterns.
Hence, we have come to derive aesthetic pleasure from seeing, hearing, touching…
and thinking certain patterns that are beneficial to our primary sense’s functioning’
(Hekkert 2004, p. 1, 10). The aesthetic experience of AmI artifacts involves the
quality of their aesthetic features. These are associated with user interfaces and
encompass at the software level the visualizations of the content, menu and navi-
gation structure, fonts, color palette, graphical layouts, dynamic icons, animations,
images, musical sounds, and so on. At the hardware level, aesthetic features include
type of display, casing, size, shape, weight, temperature, material, color, buttons,
and so on. Both aesthetic features are connected to the attractiveness and beauty of
AmI artifacts as part of the full experience thereof. The other part of the experience
of AmI artifacts concerns the processes associated with the use of the artifact in
terms of performing tasks or actions, such as touching, scrolling, clicking, pushing,
navigating, and receiving reactions from the user interface or device, e.g., images
and musical sound or auditory feedback. Drawing on Dewey (1934), the experience
of the artifact is shaped by a continuous alternation of doing and undergoing.
A typical everyday experience with an AmI artifact would involve interactivity with
both aspects. It is an experience since it is demarcated by a beginning and an end to
make for a whole; this experience is shaped by a continuous alternation of doing
and undergoing (Dewey 1934). However, high affective quality of designed arti-
facts can profoundly influence people’s core affect through evoking positive
affective states, such as delight and satisfaction. Therefore, it is strongly favorable
to take into account aesthetics in the design of AmI systems. The sensory aspects of
humans should be accounted for in all forms of design (Loewy 1951). Aesthetics
satisfies basic ICT users' needs when they strive for a satisfying interactive experi-
ence that involves the senses, produces affective responses, and achieves certain
well-defined goals (Ben-Bassat et al. 2006; Tractinsky 2006), although it has been
difficult for users to articulate the different affective needs, and hence for HCI
designers to understand these needs. However, Norman (2004) contends that the
concept of aesthetic experience is implied to include emotional design. He makes
explicit, in a three-level processing for emotional design, the connection between
aesthetics and emotion: the visceral processing which requires visceral design that
leads to (pleasant) appearance, the behavioral processing which entails behavioral
design that is associated with the pleasure and effectiveness of use, and the
reflective processing which requires reflective design that is about personal satis-
faction, self-image, and memories. This three-level processing illustrates that
pleasure derivable from the appearance and satisfaction resulting from the func-
tioning of the artifact increase positive affect. Furthermore, aesthetics-based HCI
involves affective quality and rich-content information whose perception is affected
by a subjective, socioculturally situated interpretation of the AmI artifact and the
task. In relation to affective and emotional reactions (positive or negative), one’s
appropriation of the AmI artifact’s aesthetic quality and performance is based on a
set of intertwined factors involved in a particular use situation, e.g., colors and how
they can be combined and dynamically change in a user interface as affective
quality features of an AmI artifact. Psychological theory has it that colors invoke
emotions (pleasure or displeasure) and that people vary as to aesthetically judging
colors and affectively interpreting them based on cultural standards, in addition to
other factors such as personality, preferences, and gender. Colors as an aesthetic
element are culturally dependent. Besides, it is argued that there are no universally
agreed-upon principles for distinguishing what is aesthetically beautiful from what is not. The
pattern underlying our aesthetic responses or reactions, albeit uniform (i.e., visual
system, affect system), can differ from an individual to another, just like the process
underlying our emotions is unvarying (i.e., five organismic subsystems: cognitive
system (appraisal), autonomic nervous system (arousal), motor system (expression),
motivational system (action tendencies), and the monitor system (feeling)) (Scherer
1993, 1994b), yet leads to individual differences as a result of interpretation dif-
ferences. Accordingly, like a number of aspects of artifact design aesthetics, the
visual perception of colors tends to be subjective, varying from one individual to
another. A constructivistic worldview posits that reality is socially constructed, i.e.,
the constructions are not personal—the representation process involves other social
and cultural artifacts and therefore inevitably becomes social, although perception
necessarily is individual. Therefore, it is important to account for cultural variations
in interaction design aesthetics—social-cultural specificity of aesthetic representa-
tions. In terms of applying emotional studies to AmI design, inducing intended
emotions involves an experience-based system that reacts socially and interpreta-
tively, to draw on Zhang (2009). Visual conventions have proven not to be uni-
versal as perception of aesthetics is culturally situated. Implementing user interfaces
founded on assumptions that do not hold renders AmI design useless in the face
of cultural contingencies. Understanding how the process of interpretation occurs
when experiencing an artifact, aesthetically and during the interaction, holds a key
to designing for emotions through aesthetic means in computing. Fishwick (2006)
considers the importance of aesthetics and introduces aesthetic computing as a new
field of studies that ‘aims at adding qualitative representational aspects to visual
computing in order to support various cognitive processes… He argues that visual
programing is not only about technical issues but also about cultural and philo-
sophical assumptions on the notation used to represent computational structures.
His aim is not just to define the optimal aesthetic style, but also to support users to
explore new subjective perspectives’ (Bianchi-Berthouze and Mussio 2005, p. 384).
In ‘Aesthetic Computing’ (Fishwick 2006), the author explores aesthetic experience
beyond the representation of technological events.
9.6.6 The Evolving Affective-Ambient-Aesthetic Centric Paradigm

New studies in the field of AmI as a novel approach to HCI are marking new
milestones, including the emphases on intelligent functionalities and capabilities
(i.e., context awareness, natural interaction, affective computing) and aesthetic
design and affective design. There is growing interest in merging affective, ambient,
and aesthetic aspects in HCI design. Interactive systems combining context-aware,
multimodal, perceptual, visual, and aesthetic features are increasingly proliferating,
spanning a wide range of ICT application areas. These systems offer new and
appealing possibilities to user interaction—pleasurable experience, aesthetic
appreciation, and positive feeling. The new computing culture is about how people
aspire to interact with technology and the effect they expect this will have on their
own cognitive world—e.g., affect and emotion. This experience-driven way of
acting is a qualitative leap crystallized into a new paradigm shift in HCI, marking a
movement toward a more human-centered philosophy of design and concomitantly
heralding the end of the old computing paradigm, which is about what the computer can
do. Accordingly, among the things that many scholars have only recently come to
realize and make progress within is the relationship between aesthetics, affect, and
cognition. Aesthetics plays a key role in eliciting positive affective states, which in
turn influences cognitive processes associated with task performance. It can thus be
used to facilitate and stimulate cognitive abilities, either as an alternative to or a
combination with cognitive context-aware adaptive and responsive behavior,
depending on the nature of the task and the characteristics of the user's cognitive
behavior. In general, the discourse has moved on from the goal of merely attaining
system functionality and usability to aesthetics, a movement from a cognitive
paradigm to a more affective-centric paradigm and from an instrumental ori-
entation to an experiential orientation (Norman 2002, 2004; Zhang and Li 2004,
2005). Bosse et al. (2007, p. 45) point out that the human factors should ‘support
designs that address people’s emotional responses and aspirations, whereas
usability alone still demands a great deal of attention in both research and practice.
Consideration of these needs has generally fallen within the designer’s sphere of
activities, through the designer’s holistic contribution to the aesthetic and functional
dimensions of human-system interactions'. Accordingly, emotions are gaining
increased attention in AmI design research. AmI emphasizes the significance of
emotional states in determining the unfolding of the interaction process. In it,
positive emotions can be induced by the affective quality of AmI systems and the
smoothness, simplicity, and richness of interaction due to new technological fea-
tures of AmI. The aesthetic experience is said to have an effect on the users’
cognitive behavior associated with performing tasks using computational artifacts.
It aids user cognition during interaction (e.g., Norman 2002; Spillers 2004).
Similarly, the strength of AmI environments in supporting affective design with
context-aware adaptive applications has implications for improving user perfor-
mance, as they elicit pleasant user experiences. However, supporting affective design in AmI
environments involves technical challenges; the typical features to be realized in such
environments include the following (a schematic sketch of how they might fit together
follows the quoted list):
‘Embedded: Since many devices are plugged into the network, the resulting
system consists of multiple devices, computing equipment, and software systems
that must interact with one another. Some of the devices are simple sensors, while
others may be actuators owning a crunch of control activities within an ambient
intelligence environment… The strong heterogeneity makes difficult a uniformed
policy-based management among diverse user interactions and services.
Context-aware: A fundamental role of ambient intelligence is the capability of
context sensing. This central concept of context awareness represents the possibility
for the ambient intelligence system of biasing itself and its reactions to the envi-
ronment. This means knowledge of many statically and dynamically changing
parameters in relation to consciousness. In particular, affective design involves
intensive user-centered contextual data, which necessitates the exploitation of
relationships between the human concept of consciousness and the ambient intel-
ligence idea of context.
Personalized: An ambient intelligence environment is supposed to be designed
for people instead of generic users. This means that the system should be flexible
enough to tailor itself to meet individual human needs. This is because affective
design always involves highly customized products and personalized environments.
Adaptive: The affective design with ambient intelligence system, being sensible
to the user’s feedback, is capable to modify the corresponding actions have been or
will be performed. This is consistent with the mass customization situation,
where…[users] always want to make informed decisions of their own’ (Bosse et al.
2007, p. 56).
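A minimal sketch of how these embedded, context-aware, personalized, and adaptive requirements might be composed in software is given below; the context fields, the user profile, and the adaptation rules are assumptions for illustration, not the architecture described by Bosse et al. (2007).

from dataclasses import dataclass

@dataclass
class Context:
    noise_level_db: float      # sensed by an embedded microphone
    ambient_light_lux: float   # sensed by an embedded light sensor
    user_valence: float        # estimated affective state, -1.0 .. 1.0

@dataclass
class Profile:
    preferred_modality: str    # e.g., "speech" or "visual"
    large_fonts: bool

def adapt(context, profile, prior_feedback):
    """Derive interface settings from sensed context, the personal profile,
    and accumulated user feedback (positive minus negative reactions)."""
    return {
        # Context-aware: fall back to visual output in a noisy room.
        "output_modality": "visual" if context.noise_level_db > 65
        else profile.preferred_modality,
        # Personalized: respect the stored preference for large fonts.
        "font_scale": 1.4 if profile.large_fonts else 1.0,
        # Context-aware: soften the theme in dim light.
        "theme": "dark" if context.ambient_light_lux < 50 else "light",
        # Adaptive: withdraw proactive prompts after negative feedback or mood.
        "proactive_prompts": prior_feedback >= 0 and context.user_valence > -0.3,
    }

if __name__ == "__main__":
    ctx = Context(noise_level_db=70.0, ambient_light_lux=30.0, user_valence=0.2)
    prof = Profile(preferred_modality="speech", large_fonts=True)
    print(adapt(ctx, prof, prior_feedback=1))
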

9.6.7 Affect and Cognition in the AmI Use Context

HCI researchers in the AmI community have recently started to focus on aesthetics and
affect in AmI use. In addition to seeking to understand how the aesthetics of AmI
artifacts can trigger and mediate affect, there is a growing interest in exploring how
these processes can have an effect on user performance—i.e., aid user cognition during
interaction with AmI systems. Some formal investigations into the effects of
aesthetic and affective related constructs on human-ICT interaction factors such as
use and performance in various contexts are under way by scholars in the field of
HCI (Zhang 2009). Many studies demonstrate the significance of affect for (cog-
nitive) task performance. It has become important to apply new theoretical
models based on the understanding of the concepts of aesthetics, affect, and cog-
nition and their relationships. The underlying assumption of integrating aesthetics
in the design of AmI systems is that the aesthetic experience as part of the full
experience of an AmI artifact is more likely to influence the cognitive processes
involved in tasks and activities in various AmI use situations. This relates to the
system feature of social intelligence of AmI systems. A system designed with
socially intelligent features invokes positive emotions in the user (Markopoulos
et al. 2005) as a result of a satisfying user experience. It is worth mentioning that
ambient features in HCI also play a key role in invoking positive emotions in the
user. AmI allows users to benefit from a diverse range of intelligent services which
should also be supported by aesthetics such as affective quality, visual appeal, and
attractiveness. There is evidence that aesthetically pleasant and attractive artifacts
work better and produce a more harmonious result (Norman 2002). Emotion acts as
a critical component of artifact sense-making and determines how artifacts are
interpreted (Rafaeli and Vilnai-Yavetz 2004) and evaluated. Evaluating an artifact
occurs through attaching meaning or a symbolic significance to it, which leads to an
emotional response that gives the artifact its perceived meaning. Emotion also
influences user cognition during interaction with that artifact. Positive emotions
have a direct effect on cognitive performance. In Norman’s (2002, p. 5) words,
‘when we feel good, we overlook design faults. Use a pleasing design, one that
looks good and feels, well, sexy, and the behavior seems to go along more
smoothly, more easily, and better’. The key for understanding the interchange
between affective artifacts related to aesthetic experience and cognitive artifacts
related to use experience is to design for emotions and thus facilitate and enhance
the use of AmI systems. Understanding ‘how artifacts trigger and mediate affect and
how these processes aid user cognition during interaction’ is crucial ‘to better
understand the specific qualities of user experience impacting desirability and
pleasureability’ (Spillers 2004).

9.6.8 Relationship Between Affect, Mood, and Cognition

As demonstrated by various studies in psychology, cognition is influenced by
affect, emotion, or mood. Kahneman et al. (1982) have been influential in dem-
onstrating the way in which cognitive processing can be affected by non-cognitive
factors, including affect. Affect has a powerful influence on cognitive processes and
cognitive processes are rarely free from affect (Scherer 1994). According to Zajonc
(1980), affect and cognition are under the control of separate and partially inde-
pendent systems that can influence each other in various ways. Specifically, affect
and cognition as information processing systems are ‘with different functions and
operating parameters. The affective system is judgmental, assigning positive and
negative valence to the environment rapidly and efficiently. The cognitive system
interprets and makes sense of the world. Each system impacts the other: some
emotions—affective states—are driven by cognition, and cognition is impacted by
affect’ (Norman 2002, p. 38).
Building on more than a decade of mounting work, cognitive scientists, who had long
tended to reinforce the view that emotions interfere with cognition, have now
discovered that it is impossible to understand how we think without
understanding how we experience emotions. As a neuroscientist, Antonio Damasio
is recognized as one of the leading voices who have played a pivotal role in
establishing emotions as an important scientific subject, publishing his research
results in the early 1990s, when most cognitive scientists assumed that emotions
interfered with rational thought. Damasio’s (1994) basic postulation is that affect is
necessary to enable more rational modes of cognition. Based on a large body of
theorizing and empirical research (e.g., Clore et al. 1994; Forgas 1995; Schwarz and
Clore 1996), emotions and moods can profoundly influence various cognitive
processes, such as decision making, problem solving, judgment, and memory.
Individuals who are in positive emotional states generate a greater number of solu-
tions than those in the negative condition (Martin et al. 1993). Individuals are more
likely to recall information from memory that is congruent with their current
feelings (e.g., Bower 1981). Decision making may be influenced by the use of one’s
feelings as a basis of judgement through ‘influencing the accessibility and evalu-
ation of valenced features of the decision situation’ (Schwarz 2000). Luce et al.
(1997, p. 384) observe that ‘decision processing under increasing negative emotion
both becomes more extensive and proceeds more by focusing on one attribute at a
time’.
Similarly, moods affect how we interpret and use information and what we
consign to memory. Moods affect thought processes (Forgas and George 2001).
A happy mood induces a more personal, top-down, flexible, and creative thought
process. New information is welcome. A dejected mood comes with a tendency to
be guided by outside forces (i.e., an increased focus on external information) and
induces a bottom-up, systematic mode of thinking characterized by relatively low
reliance on preexisting knowledge. Affective states influence what strategy of
information processing individuals may adopt (Schwarz 2000)—the way cognitive
processes as activities of the brain handle information. This implies that there are
differences in cognition modes. These differences in information processing strat-
egy presumably reflect that our cognitive processes are tuned to meet the require-
ments of the current situation, which are partly signaled by our affective states
(Schwarz 1990) as dynamic processes which mediate the organism’s relation to a
continually changing social environment. A large body of experimental research
demonstrates that 'individuals who are in a happy mood are more likely to adopt a
heuristic processing strategy that is characterized by top-down processing, with
high reliance on pre-existing knowledge structures and relatively little attention to
the details at hand. In contrast, individuals who are in a sad mood are more likely to
adopt a systematic processing strategy that is characterized by bottom-up pro-
cessing, with little reliance on pre-existing knowledge structures and considerable
attention to the details at hand’ (Schwarz 2000, p. 434)—that is, stimulus-bound
style of information processing (Fiedler et al. 1991). There is evidence that people
in negative mood states are better at taking in the details of a stimulus event or
environment (e.g., Forgas 1998, 1999). However, these two information processing
strategies lead to different outcomes. ‘Negatively valenced affect narrows the
thought processes—hence depth-first processing and less susceptibility to inter-
ruption or distraction… Positively valenced affect broadens the thought processes—
hence enhanced creativity’ (Norman 2002, p. 39). Fredrickson (2001) also suggests
that positive affective states have the effect of broadening the thought action
repertoire and of building cognitive resources while negative emotions narrow the
individual’s thought repertoire. In view of that, it is the information processing
approach adopted by individuals, driven by their affective states, that shapes, to a
large extent, how the task is perceived and thus performed: negative affective states
can make some simple tasks difficult and positive ones can make some difficult
tasks easier—e.g., by helping generate creative patterns of problem solving. Indeed,
Norman (2002, pp. 4–5) maintains that affect ‘regulates how we solve problems and
perform tasks. Negative affect can make it harder to do even easy tasks: positive
affect can make it easier to do difficult tasks. This may seem strange, especially to
people who have been trained in the cognitive sciences: affect changes how well we
do cognitive tasks… Now consider tools meant for positive situations. Here, any
pleasure derivable from the appearance or functioning of the tool increases positive
affect, broadening the creativity and increasing the tolerance for minor difficulties
and blockages. Minor problems in the design are overlooked. The changes in
processing style released by positive affect aids in creative problem solving which is
apt to overcome both difficulties encountered in the activity as well as those created
by the interface design’.

9.6.9 Creativity and the Relationship Between Affect and Creative Cognition or Thought

Creativity is also influenced by emotions. The concept of creativity has gained
importance in recent years. Creativity is a topic of interest and focus in most
psychological research. Sternberg and Lubart (1999) present different lines in the
study of creativity in psychology. In particular creativity has attained a significant
place in cognitive psychology. Related studies try to understand and discover the
process of creative thinking. According to Sternberg (2006), there are five com-
monalities in the research of creativity around the world: (1) creativity ‘involves
thinking that aims at producing ideas or products that are relatively novel and that
are, in some respect, compelling’; (2) it has some domain-specific and
domain-general elements; (3) it is measurable, at least to some extent; (4) it can be
developed and promoted; and (5) it ‘is not highly rewarded in practice, as it is
supposed to be in theory’.
Creativity is defined as ‘the ability to produce work that is both novel (i.e.,
original, unexpected) and appropriate (i.e., useful concerning task constraints)'
(Sternberg and Lubart 1999, p. 3). Creativity involves the creation of something
new and useful (Mumford 2003; Andreasen 2005; Flaherty 2005). Based on these
definitions, there is certain consensus on some of the creativity characteristics in
terms of the production of something new with utility as well as on the fact that
everybody can be creative to some extent. Mayer (1999) maintains that there is a
need to clarify whether creativity is a property of people, products, or processes and that,
depending on this assumption, different approaches have been used to study cre-
ativity. Runco (2007) adds approaches to creativity related to place (Richards 1999;
Runco 2004) and potential (Runco 2003), emphasizing research on individuals that
have potential for creativity but are not realizing it. In terms of models of creativity,
Plsek (1997) proposes the ‘directed-creativity cycle’, composed of observation,
analysis, generation, harvesting, enhancement, evaluation, implementation, and
living with it—these are clustered within: (1) preparation, (2) imagination,
(3) development, and (4) action.
Creativity is usually attributed to special imaginative or inventive operation, and
therefore involves a typical use of cognition—mental (information-manipulation)
processes, internal structures, and representations. In other words, the process of
creative cognition entails distinct cognitive patterns, dynamic connections, associ-
ations, and manipulation of mental elements in order to generate creative ideas.
The cognitive approach to creativity aims to understand the mental processes and
representations underlying creative thought (Sternberg 1999). To generate a tangible
creative outcome requires various resources, including intellectual abilities,
knowledge, styles of thinking, personality, environment, flexibility, openness to
experience, sensitivity, playfulness, intrinsic motivation, wide interest and curiosity,
and so on (Sternberg and Lubart 1996, 1999; Runco 2007). Runco (2007) notes
that creative personality varies from domain to domain, and perhaps, even from
person to person: ‘there is no one creative personality’ (Runco 2007, p. 315).
In the context of this chapter, creativity is particularly associated with the
relationship between affect and creative cognition, more specifically, how positive
affective states influence creative thinking in task performance (e.g., see, Norman
2002; Kaufmann and Vosburg 1997). The premise is that positive affective states,
which can be elicited and increased by exposure to and interaction with AmI
systems as aesthetically beautiful, emotionally appealing, and intelligently behaving
artifacts, are likely to broaden the thought processes—hence enhanced creativity
when it comes to performing tasks or carrying out activities. In the study of how
emotions can affect creativity, three broad lines of research can be distinguished
(Baas et al. 2008): (1) the comparison between positive and neutral emotional states;
(2) the comparison between negative and neutral emotional states; and (3) the
comparison between positive and negative emotional states. In relation to the third
line of research and affect changing the operating parameters of cognition, ‘positive
affect enhances creative, breadth-first thinking and makes people more tolerant of
minor difficulties and more flexible and creative in finding solutions', while 'neg-
ative affect focuses cognition, enhancing depth-first processing and minimizing
distractions’ (Norman 2002, p. 36). Negative affect has no leverage effect on cre-
ative performance (Kaufmann and Vosburg 1997). In relation to AmI, positive
affect enables creative problem solving, which is apt to overcome difficulties
encountered in the task or activity as well as those created by the user interface
design and behavior. Furthermore, various studies suggest that positive affect
increases cognitive flexibility, leading to unusual associations (Isen et al. 1985).
Sternberg (1999) points out that creativity occurs in a mental state where thought is
associative and a large number of mental representations are simultaneously active.
Creativity consists of making new combinations of associative elements (Poincaré
1913). Creative productions consist of novel combinations of pre-existing mental
elements and producing even simple combinations could be labeled creative


(Stenberg 1999). On this note, Mendelsohn (1976) suggests that individuals differ
in creativity because of the focus of attention: ‘The greater the attentional capacity,
the more likely the combinational leap which is generally described as the hallmark
of creativity’. The above theoretical models provide useful insights into under-
standing how affect, cognition, and creativity are very much intertwined. And
advances in the understanding of this relationship will have implications for AmI
design and use. By integrating aesthetics with intelligent computational functioning
by context awareness and natural interaction capabilities, which reduce the cog-
nitive and physical burden required to manipulate and interact with computer
systems, through smoothness and intuitiveness of interaction with regard to com-
putational processes as well as richness pertaining to context information, AmI
systems can invoke positive affective states in users that can subsequently enhance
their task performance, through enabling rational modes of cognition and
influencing the mechanisms of rational thinking as well as enhancing creative,
breadth-first thinking through broadening the thought processes.
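To make the above line of reasoning more tangible, the following minimal Python sketch illustrates, purely hypothetically, how an estimate of user affect derived from sensed cues might steer an AmI system's support strategy in line with the breadth-first versus depth-first distinction discussed above. All names, weights, and thresholds (AffectEstimate, estimate_affect, adapt_support) are invented for illustration and are not drawn from the literature cited here; a real system would fuse multimodal sensor data with far richer models.

from dataclasses import dataclass

@dataclass
class AffectEstimate:
    valence: float  # -1.0 (negative) .. +1.0 (positive)
    arousal: float  # 0.0 (calm) .. 1.0 (excited)

def estimate_affect(cues: dict) -> AffectEstimate:
    """Naive, hypothetical fusion of sensed cues into a single affect estimate."""
    valence = 0.6 * cues.get("facial_valence", 0.0) + 0.4 * cues.get("speech_valence", 0.0)
    arousal = cues.get("skin_conductance", 0.0)
    return AffectEstimate(max(-1.0, min(1.0, valence)), max(0.0, min(1.0, arousal)))

def adapt_support(affect: AffectEstimate) -> dict:
    """Map affect to a support strategy: positive affect favors broad, exploratory
    support; negative affect favors focused, distraction-minimizing support."""
    if affect.valence > 0.3:
        return {"suggestions": "broad", "interruptions": "allowed", "detail": "low"}
    if affect.valence < -0.3:
        return {"suggestions": "narrow", "interruptions": "suppressed", "detail": "high"}
    return {"suggestions": "moderate", "interruptions": "batched", "detail": "medium"}

if __name__ == "__main__":
    cues = {"facial_valence": 0.7, "speech_valence": 0.4, "skin_conductance": 0.5}
    print(adapt_support(estimate_affect(cues)))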

9.6.10 The Effect of Aesthetics and Intelligent Behavior of AmI Systems on Mood and Immersion

A key function of AmI is to heighten user experience by invoking positive mood. This can be elicited by aesthetically and emotionally appealing features and patterns
in the surrounding environment in the form of beautiful AmI artifacts and intelligent
AmI environments—characterized by adaptive, responsive, and anticipatory
behaviors. In other words, the pleasure derivable from the appearance of AmI
artifacts and the desirability derived from the intelligent functioning of AmI environments create and increase positive mood (as a component of affect), rendering user experience pleasurable and desirable. Since mood is not something that is created entirely within us, aesthetic features of computational artifacts, coupled with the understanding and supporting computational behavior of smart environments, can serve as a means to invoke positive mood in users and thus induce a positive perception of the interaction experience. Individuals in a happy mood tend to
overestimate the likelihood of positive events (e.g., Johnson and Tversky 1983;
Nygren et al. 1996). Fundamentally, the environment in which people operate has a
significant influence on their experiences, and AmI systems hold a great potential to
shape people’s experiences in a variety of situations.
Moreover, aesthetic appearance, rich interaction, intelligent behavior, and the ensuing heightened user experience are more likely to increase people's immersion in AmI environments because of the intense attention and effortless action involved during interaction, leading to a feeling of presence (see below). Nechvatal (1999) provides an overview of salient aesthetic aspects of immersive experience.
Immersion as a sense of presence refers to the state of consciousness where an
immersant’s awareness of physical self is diminished or lost by being surrounded in
an engrossing total environment (Nechvatal 1999), a mental state that is, according
to Varney (2006), often accompanied by intense focus, special excess, a distorted sense of time, and effortless action. As to the latter in particular, AmI environments are more likely to induce immersion, as they provide applications that are flexible, adaptable, and capable of acting autonomously on behalf of users, in addition to aesthetically beautiful artifacts that trigger and mediate affect in ways that aid user cognition during interaction, which is crucial to the quality of user experience in terms of desirability and pleasurability—and hence positive mood and intense focus. Individuals in a positive mood state, which involves intensity (Batson
et al. 1992), have a broader focus of attention (Gasper and Clore 2000). In all,
‘total-immersion is implied complete presence…within the insinuated space of a
virtual surrounding where everything within that sphere relates necessarily to the
proposed “reality” of that world’s cyberspace and where the immersant is seem-
ingly altogether disconnected from exterior physical space’ (Nechvatal 2009, p. 14).
However, immersion can only be found in the ensemble of these ingredients, which requires a holistic design approach, hence the need to stimulate collaboration among people from such human-directed sciences as cognitive science, neuroscience, cognitive psychology, and the social sciences, or people working on cross-connections between presence technologies in AmI (computer science) and these disciplines, to combine their knowledge, capitalize on their strengths, and develop integral solutions not only for immersion-driven applications but also for different aspects of presence in relation to applications, services, and products. See below for further discussion.

9.7 Presence in Computing and AmI

Presence is a common feature of computing technology; it has been widely researched in relation to desktop, multiple, and AmI applications (e.g., Sheridan
1992; Lombard and Ditton 1997; Bracken and Lombard 2004; Nan et al. 2006;
Rheingold 1993; Turkle 1995; Weimann 2000; Riva et al. 2005).

9.7.1 Definitions of Presence

In the realm of desktop applications, presence as a theoretical concept describes the effect that people experience when they immerse themselves in virtual reality or
interact with a computer-mediated or -generated environment (Sheridan 1994). This
definition arose due to the advent of the internet, where the proliferation of computer-
mediated communication systems and Web-based applications became dependent
on the phenomenon to give people the sense of, as Sheridan called it, ‘being there’
(Ibid, p. 1). Sheridan (1992) extrapolated Minsky’s (1980) original definition of
tele-presence—a term from which 'presence' is derived and which refers to the manipulation of objects in the real world through remote access technology or the effect felt when controlling real-world objects remotely. Lombard and Ditton (1997)
describe presence abstractly as an illusion that a mediated experience is not med-
iated. In developing further the concept of presence, they enumerate six concep-
tualizations thereof:
1. Presence can be a sense of social richness, the feeling one gets from social
interaction.
2. Presence can be a sense of realism, i.e., computer-generated environments
looking or seeming real.
3. Presence can be a sense of transportation, which is a more complex concept than
the traditional feeling of one being there, including users feeling as though
something is ‘here’ with them or as though they are sharing a common space with another person.
4. Presence can be a sense of immersion, through the senses or the state of mind.
5. Presence can provide users with the sense they are social actors within the
medium where users are no longer passive viewers and, via presence, gain a
sense of interactivity and control.
6. Presence can be a sense of the medium as a social actor.
A study carried out by Bracken and Lombard (2004) illustrates this idea of the
medium as a social actor with the suggestion that people interact with computers
socially. With particular emphasis on children as a sample of study, the researchers
found that the children's confidence in their ability correlated with the positive encouragement they received from a computer. In a similar study conducted by Nan et al. (2006), it was found that the inclusion of AI-based anthropomorphic agents on a Web site positively impacted people's attitudes toward the site. Also, the studies done by the above researchers speak to the concept of presence as transportation, which in this case refers to the computer-generated identity, in the sense that users, through their interaction, perceive that these fabricated personalities are really 'there'. Communication media and web-based applications have been a
central pillar of presence since the term’s conception and a subject of different
studies (e.g., Rheingold 1993; Turkle 1995). Turkle focuses on the individual sense
of presence and Rheingold on the environmental sense of presence that commu-
nication provides.
However, Weimann (2000) argues that, based on the view of media scholars
who claim that virtual experiences are very similar to real-life ones, people can
confuse their own memories and have trouble remembering if those experiences
were mediated or not. This may apply to people, events, situations, and places.
Indeed, in terms of presence of objects, there is evidence that humans can cope well
with missing and even contrasting information and that they do not need a real-like
representation and full perceptual experience (Bianchi-Berthouze and Mussio
2005). This issue may be overcome or, at least, its effect be mitigated due to the
recent advances in presence technologies. Riva et al. (2005) point out that presence
research can include the bias and context of subjective experience, evolving from
the effort to generate reality with increasing realism—the ‘perceptual illusion of
non-mediation’.

9.7.2 Expanding and Reconfiguring the Concept of Presence in AmI

Heralding a paradigm break with the post-desktop paradigm, AmI computing has
broadened and reconfigured the conceptualization of many terms, including pres-
ence. Accordingly, AmI goes further than the early use of the term 'presence' (e.g., Minsky 1980; Sheridan 1994), since its applications and uses are both widened and
deepened. Riva et al. (2005) maintain: ‘Today man-machine interfaces have
evolved considerably, and the inherent capacity of presence technologies is to
support multiple users’ engagement and bidirectionality of exchange: the objectives
and communication approach are thus different to control theory’. Indeed, AmI is characterized by human-like computational capabilities, including context awareness, implicit and natural interaction, and autonomous intelligent behavior, and involves distinctive enabling technologies, including smart miniaturized sensors,
embedded systems, communication and networking technologies, intelligent user
interfaces/intelligent agents. These are to be exploited in addition to virtual reality,
mixed reality, augmented reality, embodied reality, hyper-reality, mediated reality,
and ubiquitous virtual reality for successful substitution to being there yourself.
Furthermore, the aforementioned conceptualizations of presence by Lombard and Ditton (1997) apply to AmI, given the scope of the understanding and supporting
behavior characterizing AmI systems and environments—AmI takes care of and is
sensitive to needs; is capable of anticipating and responding intelligently to spoken
or gestured indications (cognitive, emotional and physiological cues) of desires
without conscious mediation, reacting to explicit spoken and gestured commands
for executing tasks, and supporting the social processes of humans and even being a
competent social agent in group interactions; can even engage in intelligent dialog
or mingle socially with humans; and elicits pleasant user experiences and positive
emotions in users through affective quality of aesthetic artifacts and environments
and smoothness, intuitiveness, and richness of interaction. Appropriate technologies
of presence, the sense of being there: ‘the experience of projecting one’s mind
through media to other places, people and designed environments’, combine vari-
ous types of media to create a non-mediation illusion—‘the closest possible
approximation to a sense of physical presence, when physical presence there may
be none’ (Ibid). Presence entails an amalgam of cognition, affect, attention, emo-
tion, motivation, and belief associated with the experience of interacting with AmI
technologies in relation to different settings: home, work, social environments, and
on the move. Riva et al. (2005) emphasize ‘the link between the technology—
through the concepts of ubiquitous computing and intelligent interface—and the
human experience of interacting in the world—through a neuro-psychological
vision centered on the concept of ‘presence’.’
In particular, more advances in ambient, naturalistic, intelligent user interfaces
will radically change interaction between technology and humans—e.g., tremen-
dously ease and enrich user interaction experience. This has direct implications for
presence and believability as to the mediated experience of interacting with any
entity (e.g., objects, places, events, situations, people, designed environments, etc.)
in any x-reality (e.g., virtual reality, mixed reality, augmented reality, embodied
reality, hyper-reality, mediated reality, ubiquitous virtual reality, etc.) within AmI
spaces. In fact, computing spaces are much more about the believability than the
reality of these entities—in many AmI applications and scenarios. Addressing the
issue of presence and believability in relation to computational artifacts, Casati and
Pasquinelli (2005) argue that the important issue is believability not reality,
although the representation fidelity of, and perceptual interaction with, computa-
tional artifacts has been at the center of research within visual computing (visual
graphics) and virtual reality. The authors ‘provide evidence that humans do not
need a real-like representation and full perceptual experience but that they can cope
well with missing and even contrasting information. They argue that the processes
that play a role in the subjective feeling of presence are cognitive and perceptual
expectancies relative to the object to be perceived, and the sensory motor loop in
the experience of the perceived object. Insisting that a motor component plays a
prominent role in the perception of objects, they suggest an enactive interface, an
interface that enables users to act on an object and to see the consequences of their
actions, as a means to improve believability’ (Bianchi-Berthouze and Mussio 2005,
p. 384). Therefore, what is essential is the neuro-cognitive-perceptual processes
involved in the experience of the simulated settings. Missing and contrasting information about these settings is likely to be overcome with natural interaction, context awareness, and intelligence as computational capabilities. The
perception of computing environments as real and believable is increasingly
becoming achievable due to advances in many enabling technologies and thus
computational functionalities. AmI represents a computationally augmented envi-
ronment where human users interact and communicate with artificial devices, and
the latter explore their environment and learn from, and support, human users. This
entails that technologies become endowed with human-like cognitive, emotional,
communicative, and social competencies necessary to improve the naturalness of
interaction and the intelligence of services through AmI systems and environments
behaving so pertinently in real-time, i.e., the user having full access to a wide
variety of intelligent services from the augmented presence environments, that they
seem fully interactive, adaptive, and responsive—and hence can be perceived, felt,
and appear as real. Put differently, x-reality environments are more likely to become
indistinguishable from reality environments.

9.7.3 Interdisciplinary Research in Presence

The conceptualization of presence borrows from multiple fields, including computer science, communication, engineering, cognitive psychology, cognitive sci-
ence, neuroscience, philosophy, arts, aesthetics, and so on. The value of
interdisciplinary research lies in bringing about well-informed engineered, designed, and
modeled technologies, as this research approach seeks a holistic understanding of
AmI as a technological phenomenon. In doing so, it enhances the computational understanding of a variety of aspects of human functioning—e.g., the way per-
ception, emotion, intention, reasoning, and intelligent actions as human cognitive
and behavioral processes work, co-operate, and interrelate—to ultimately develop
effective and successful applications that deliver valuable services and can span a
whole range of potential domains. The combination of recent discoveries in cog-
nitive science and neuroscience and the breakthroughs in the enabling technologies
and computational processes, thanks to AI, enables creating novel systems, whether
related to or beyond presence technology. AmI computing is inherently a ‘crossover
approach’, strongly linked to many topics related to computer science,
human-directed scientific areas, social sciences, and humanities. Therefore, it is
necessary and fruitful to stimulate interdisciplinary research endeavors within the
field of AmI. In relation to presence, it is advantageous to get researchers together
from such disciplines as neuroscience, cognitive psychology, cognitive science,
human nonverbal communication behavior, visual computing, and cultural studies,
with the aim of creating new interactional knowledge and thus benefiting from
new perspectives and insights that can advance the understanding of the neuro-
logical, psychological, behavioral, and cultural dimensions of presence for a suc-
cessful implementation of AmI technology. The underlying assumption is that
presence needs to be further developed, beyond today’s state-of-the-art technolo-
gies (associated with visual simulated phenomena), which can be accomplished
through dedicating more efforts for collaborative research endeavors that bring
scholars and experts from those disciplines to pool the knowledge of their research
projects together to speed up the process of building presence-informed AmI
systems and environments. New scientific discoveries in human-directed sciences
and advanced knowledge in human communication should be directed at enhancing
AmI system engineering, design, and modeling in ways that enable technology
(functioning) to emulate human functioning at the interface of AmI systems and
environments. Concomitantly, a more thorough empirical investigation and theo-
rizing endeavor is necessary to solidify extant theoretical frameworks and models in
these sciences and disciplines to advance the understanding of human functioning
processes: neurological, cognitive, emotional, motivational, and communicative
and how they interrelate and affect one another at the focus of presence. Of equal
importance is to create robust metrics to measure and evaluate presence in relation
to neurological, mental, emotional, motivational, behavioral, and social states
among users within different interactive settings. Of these states, Riva et al. (2005)
contend: ‘Mental and motivational states are part of the study of human presence,
which adds controversy to complexity, since the methods or measurement metrics
for presence are not established and the validity of predicting effects is scant if
reliable or longitudinal data does not exist. A better understanding of human
cognitive and affective capacities is very likely to lead to new applications…,
spanning a whole range of potential industries. There are a number of application
areas where a good sense of presence is needed, and in the near future x-reality…
will be exploited for successful substitution to actually being there yourself.
Representative examples could include…new working environments…, biomedicine and neurosciences (assistive surgical operations, neurological rehabilitation,
human perceptual augmentation), education…, surveillance, real-time interactive
gaming and entertainment, archiving, new communication standards, and so on’.
All in all, intensified and in-depth studies are required in relation to multifarious
aspects of presence: realism, social richness, immersion, transportation, interactivity
and control, and beyond visual simulated phenomena. Riva et al. (2005) predict that x-reality applications—AmI environments—will shift from simulating visual phenomena towards simulating more natural phenomena and appealing to multi-sensory modalities, following a user-in-the-loop approach; ‘the user will have full
access to various services from the augmented presence environment. The services
will be delivering results in real-time and the environment will seem fully inter-
active and reactive, enabling…users…to perceive and (inter)act with applications
and services to gain…better results’.
On the whole, as summarized by Riva et al. (2005), the center of attention in
research should be on:
• Understanding different forms of presence, including aspects of cognition,
perception, emotion, affect, and interaction.
• Developing techniques for measuring presence based on insights from
human-directed (physio-neuro-cognitive, and social) sciences.
• Investigating ethical and potential long-term implications of using presence
technologies.
• Designing and developing, based on the understanding of human presence,
essential building blocks that capture the salient aspects of presence and inter-
action, which should exploit relevant leading-edge software and hardware
technologies (e.g., multi-sensor fusion, wearable and biosensor technology, real-time display and high fidelity rendering, 3D representation and
compression, real-time tracking and capture, haptic interfaces).
• Developing novel systems that can support different kinds and levels of pres-
ence and interaction in various situations, with research focusing on open sys-
tem architectures for integrating the aforementioned building blocks, using
relevant tools for programing complex aspects of presence and designing novel
interaction models.

9.7.4 Challenges to Presence in AmI

Presence research is active and evolving as AmI continues to advance. It is, though, facing great challenges as to the design, development, and testing of AmI systems
that match human cognitive, affective, and interactive capabilities. At the moment,
the main effort is towards re-creating the different experiences of presence and
interaction in various x-reality environments. This requires further advancement of
technologies and thus computational capabilities, which in fact pose many
challenges and open issues relating to system engineering, design, and modeling,
especially in relation to context awareness, natural interaction, and intelligence.
Emulating these human capabilities and processes to augment presence environments is by no means an easy task. The question is to what extent human capabilities and processes can be captured in formal abstractions (or models) such that an AmI system can understand and communicate with the user on a human-like level. Adding to this are the unsettled scientific issues relating to the human cognitive world and the complexity inherent in comprehending how human cognitive processes interrelate (e.g., the emotional complex, affect and cognition, motivation and emotion, etc.), function in a dynamic and unpredictable way, and relate to other factors such as different forms of energy and the brain and body's biochemistry. Another question to be raised is whether modelers and designers will ever be able to formally computerize these relationships and dynamic patterns so that an application or system can give users the sense of a medium acting as a competent social actor in interaction and provide users with the sense that they are social actors within the medium, where users via presence gain a sense of interactivity and control. The
underlying premise is that for the system to do so, it needs to display a human-like
cognitive, affective, and communicative behavior. AmI works in an unobtrusive and
invisible way (ISTAG 2001). The significant challenge lies in designing, modeling,
evaluating, and instantiating AmI systems and environments that coordinate with
human users’ cognitive, affective, and interactive patterns and behaviors, so that
they can be perceived as real subjects without missing or conflicting information,
that is, in harmony with human mental representations used in the perception and
making sense of real objects and subjects. In deconstructing subject-object relations
in AmI, Crutzen (2005, p. 224) states: ‘A necessary condition for the realization of
AmI environments is not only monitoring in circumambient ways the actions of
humans and the changes in their visible and invisible environment, AmI is also a
pattern of models of chains of interaction embedded in things. Objects in our daily
world—mostly inanimate—will be enriched by an intelligence that will make them
almost ‘subjects’, capable of responding to stimuli from the world around them and
even of anticipating the stimuli. In the AmI world the “relationship” between us and
the technology around us is no longer one of a user towards a machine or tool, but
of a person towards an “object-became-subject”, something that is capable of
reacting and of being educated’. Regardless of the conceptualization of presence, in augmented presence environments the perceptual experience remains subjective and idiosyncratic to a great extent. That is to say, it depends on how each user experiences AmI systems and environments, e.g., the perception of the dynamics of the interaction and the extent to which it naturally occurs. Riva et al. (2005) maintain that presence research is evolving to reproduce reality with ever increasing realism and to include the context and bias of subjective experience, and suggest design choices for novel computer-enriched environments that can enhance human capacity to adapt to new situations.
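As a purely hypothetical illustration of the kind of formal abstraction raised above, the following Python sketch encodes a drastically simplified user-state model together with a rule for how 'present' the system should make itself. The attributes, thresholds, and stances (UserState, classify_engagement, system_stance) are invented for exposition and do not represent any established model from the presence literature.

from dataclasses import dataclass
from enum import Enum

class Engagement(Enum):
    IDLE = "idle"
    FOCUSED = "focused"
    OVERLOADED = "overloaded"

@dataclass
class UserState:
    cognitive_load: float    # 0.0 .. 1.0, e.g., inferred from task and gaze data
    affect_valence: float    # -1.0 .. 1.0
    attention_on_system: bool

def classify_engagement(state: UserState) -> Engagement:
    # Coarse, hypothetical thresholds standing in for real inference.
    if state.cognitive_load > 0.75:
        return Engagement.OVERLOADED
    if state.attention_on_system and state.cognitive_load > 0.25:
        return Engagement.FOCUSED
    return Engagement.IDLE

def system_stance(state: UserState) -> str:
    """Decide how 'present' the system should make itself: proactive dialog,
    subtle ambient cues, or silent background operation."""
    engagement = classify_engagement(state)
    if engagement is Engagement.OVERLOADED:
        return "stay unobtrusive; defer non-critical output"
    if engagement is Engagement.FOCUSED and state.affect_valence >= 0.0:
        return "act as a visible social actor; offer dialog and feedback"
    return "provide subtle ambient cues only"

if __name__ == "__main__":
    state = UserState(cognitive_load=0.4, affect_valence=0.5, attention_on_system=True)
    print(system_stance(state))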
Realizing presence technology as a key aspect of human-centered design sym-
bolizing AmI remains a question of technology feasibility: what existing and future
technologies will permit in terms of engineering, design, development, evaluation,
and modeling of AmI systems—in other words, how far computer scientists and designers can go in simulating and implementing human cognition, perception, emotion, and interaction in next-generation technologies. It is argued that most computer system and application engineering, design, and modeling is technology-driven because little knowledge, and few methods, models, and tools, are available to incorporate user cognitive, affective, and interactive behavior as a parameter when designing computer systems. A strong effort must be made in the direction of human behavior modeling to achieve in human understanding the same level of confidence that exists in designing and modeling new technology. The real challenge may lie in taking into account a holistic view at the level of human functioning processes: neurological, cognitive, affective, motivational, and communicative, as well as the micro-context of users'
everyday lives. This would go in favor of more in-depth studies of users in real life
settings or so-called living labs. Technology designers seem to believe, however, that these techniques are too costly and too time-consuming to take on board; indeed, they require considerable investments on different scales, but the question should rather be whether the results would justify the effort. On an
optimistic note, Riva et al. (2005) mentioned a number of challenging scenarios that
are envisioned as tests of whether presence technologies can make a real difference,
while foreseeing other scenarios beyond the state of the art to emerge. The chal-
lenging ones include:
• ‘Persistent hybrid communities: constructing large-scale virtual/mixed com-
munities that respond in real-time and exhibit effects of memory and behavioral
persistence while evolving according to their intrinsic social dynamics.
• Presence for conflict resolution, allowing people to be immersed and experience
situations of conflict or co-operation. By fostering communication and mutual
understanding between different parties these presence environments should
ultimately be empathy-inducing.
• Mobile mixed reality presence environments: moving freely and interacting in
real/augmented populated surroundings through natural and/or augmented
mediated tools.
• Personalized learning and training environments, stimulating a combination of
imaginary and physical actions and emotions through appropriate sets of
embedded nonverbal and multisensory cues for skill acquisition and learning’.

References

Abawajy JH (2009) Human-computer interaction in ubiquitous computing environments. Int J
Pervasive Comput Commun 5(1):61–77
Abowd GD (1999) Classroom 2000: an experiment with the instrumentation of a living
educational environment. IBM Syst J Special issue Pervasive Comput 38(4):508–530
Adjouadi M, Sesin A, Ayala M, Cabrerizo M (2004) Remote eye gaze tracking system as a
computer interface for persons with severe motor disability. In: Proceedings of the 9th
international conference on computers helping people with special needs, Paris, pp 761–766
Alexander S, Sarrafzadeh A (2004) Interfaces that adapt like humans. In: Proceedings of 6th
computer human interaction 6th Asia Pacific conference (APCHI 2004), Rotorua, pp 641–645
Andreasen N (2005) The creating brain. Dana Press, New York
Asteriadis S, Tzouveli P, Karpouzis K, Kollias S (2009) Estimation of behavioral user state based
on eye gaze and head pose—application in an e-learning environment. Multimedia Tools and
Applications 41(3):469–493
Baas M, Carsten De Dreu KW, Nijstad BA (2008) A meta-analysis of 25 years of mood-creativity
research: hedonic tone, activation, or regulatory focus? Psychol Bull Am Psychol Assoc 134
(6):779–806
Batson CD, Shaw LL, Oleson KC (1992) Differentiating affect, mood and emotion: toward
functionally based conceptual distinctions. Sage, Newbury Park
Ben-Bassat T, Meyer J, Tractinsky N (2006) Economic and subjective measures of the perceived
value of aesthetics and usability. ACM Trans Comput Hum Interact 13(2):210–234
Bianchi-Berthouze N, Mussio P (2005) Introduction to the special issue on “context and emotion
aware visual computing”. J Vis Lang Comput 16:383–385
Blechman EA (1990) Moods, affect, and emotions. Lawrence Erlbaum Associates, Hillsdale, NJ
Boring RL (2003) Cognitive science: at the crossroads of the computers and the mind. Assoc
Comput Mach 10(2):2
Bosse T, Castelfranchi C, Neerincx M, Sadri F, Treur J (2007) First international workshop on
human aspects in ambient intelligence. In: Workshop at the European conference on ambient
intelligence, Darmstadt, Germany
Bower GH (1981) Mood and memory. Am Psychol 36:129–148
Bracken C, Lombard M (2004) Social presence and children: praise, intrinsic motivation, and
learning with computers. J Commun 54:22–37
Braisby NR, Gellatly ARH (2005) Cognitive psychology. Oxford University Press, New York
Brandtzæg PB (2005) Gender differences and the digital divide in Norway—Is there really a
gendered divide? In: Proceedings of the international childhoods conference: children and
youth in emerging and transforming societies, Oslo, Norway, pp 427–454
Brewin CR (1989) Cognitive change processes in psychotherapy. Psychol Rev 96:379–394
Cacioppo JT, Gardner WL, Berntson GG (1999) The affect system has parallel and integrative
processing components: form follows function. J Personal Soc Psychol 76:839–855
Casati R, Pasquinelli E (2005) Is the subjective feel of ‘presence’ an uninteresting goal? J Vis Lang
Comput 16(5):428–441
Clore GL, Schwarz N, Conway M (1994) Affective causes and consequences of social information
processing. In: Wyer RS, Srull TK (eds) Handbook of social cognition, vol 1. Erlbaum
Hillsdale, NJ, pp 323–418
Crutzen CKM (2005) Intelligent ambience between heaven and hell. Inf Commun Ethics Soc 3
(4):219–232
Damasio A (1994) Descartes’ error: emotion, reason, and the human Brain. Grosset/Putnam, New
York
Demirbilek O, Sener B (2003) Product design, semantics and emotional response. Ergonomics 46
(13–14):1346–1360
de Silva GC, Lyons MJ, Tetsutani N (2004) Vision based acquisition of mouth actions for
human-computer interaction. In: Proceedings of the 8th Pacific Rim international conference
on artificial intelligence, Auckland, pp 959–960
Dey AK (2000) Providing architectural support for building context-aware applications. PhD
thesis, College of Computing, Georgia Institute of Technology
Dewey J (1934) Art as experience. Berkley Publishing Group, New York
Exline R, Winters L (1965) Effects of cognitive difficulty and cognitive style on eye contact in
interviews. In: Proceedings of the Eastern Psychological Association, Atlantic City, NJ, pp 35–41
Fiedler K, Asbeck J, Nickel S (1991) Mood and constructive memory effects on social judgment.
Cognit Emot 5:363–378
Fishkin KP (2004) A taxonomy for and analysis of tangible interfaces. Personal Ubiquitous
Comput 8(5):347–358
Fishwick PA (ed) (2006) Aesthetic computing. MIT Press, Cambridge
Flaherty AW (2005) Frontotemporal and dopaminergic control of idea generation and creative
drive. J Comp Neurol 493:147–153
Forgas JP (1995) To appear in: hearts and minds: affective influences on social cognition and
behavior. Psychology Press, New York
Forgas JP (1998) Happy and mistaken? Mood effects on the fundamental attribution error.
J Personal Soc Psychol 75:318–331
Forgas JP (1999) On feeling good and being rude: affective influences on language use and
requests. J Personal Soc Psychol 76:928–939
Forgas JP, George JM (2001) Affective influences on judgments and behavior in organizations: an
information processing perspective. Organ Behav Hum Decis Process 86:3–34
Fredrickson BL (2001) The role of positive emotions in positive psychology: the
broaden-and-build theory of positive emotions. Am Psychol 56:218–226
Frijda NH (1986) The emotions. Cambridge University Press, Cambridge
Galotti KM (2004) Cognitive psychology in and out of the laboratory. Wadsworth, Boston
Gasper K, Clore GL (2000) Do you have to pay attention to your feelings in order to be influenced
by them? Personal Soc Psychol Bull 26:698–711
Giunchiglia F, Bouquet P (1988) Introduction to contextual reasoning: an artificial intelligence
perspective. Perspect Cognit Sci 3:138–159
Goldman A (2001) The Aesthetic. In: Gaut B, McIver Lopes D (eds) The Routledge companion to
aesthetics. Routledge, London, pp 181–192
Gwizdka J (2000) What’s in the context? In: Computer Human Interaction (CHI). Hague, The
Netherlands
Hawthorn D (1998) Psychophysical aging and human computer interface design. In: Proceedings
of the Australasian conference on computer human interaction, Adelaide, CA, pp 281–291
Hekkert P (2004) Design aesthetics: principles of pleasure in design. Department of Industrial,
Delft University of Technology, Delft
Helander MG, Tham MP (2003) Hedonomics - Affective human factors design. Ergonomics 46
(13/14):1269–1272
Isen AM, Johnson MM, Mertz E, Robinson G (1985) The influence of positive affect on the
unusualness of word associations. J Personal Soc Psychol 48(6):1413–1426
Ishikawa T, Horry Y, Hoshino T (2005) Touchless input device and gesture commands. In:
Proceedings of the international conference on consumer electronics, Las Vegas, NV, pp 205–206
ISTAG (2001) Scenarios for Ambient Intelligence in 2010. ftp://ftp.cordis.lu/pub/ist/docs/
istagscenarios2010.pdf. Viewed 29 Nov 2009.
Johnson EJ, Tversky A (1983) Affect, generalization and the perception of risk. J Personal Soc
Psychol 45:20–31
Kahneman D, Slovic P, Tversky A (1982) Judgment under uncertainty: heuristics and biases.
Cambridge University Press, New York
Kaiser S, Wehrle T (2001) Facial expressions as indicators of appraisal processes. In: Scherer KR,
Schorr A, Johnstone T (eds) Appraisal processes in emotions: theory, methods, research.
Oxford University Press, New York, pp 285–300
Kaufmann G, Vosburg SK (1997) Paradoxical effects of mood on creative problem solving. Cognit
Emot 11(2):151–170
Kim S, Suh E, Yoo K (2007) A study of context inference for Web-based information systems.
Electron Commer Res Appl 6:146–158
Kintsch W (1988) The role of knowledge in discourse comprehension: a construction-integration
model. Psychol Rev 95(2):163–182
Kirsh D (1995) The intelligent use of space. J Artif Intell 73(1–2):31–68
Kleck R, Nuessle W (1968) Congruence between the Indicative and communicative functions of
eye-contact in interpersonal relations. Br J Soc Clin Psychol 7:241–246
Kumar M, Paepcke A, Winograd T (2007) EyePoint: practical pointing and selection using gaze
and keyboard. In: Proceedings of the CHI: conference on human factors in computing systems,
San Jose, CA, pp 421–430
Kwon OB, Choi SC, Park GR (2005) NAMA: a context-aware multi-agent based web service
approach to proactive need identification for personalized reminder systems. Expert Syst Appl
29:17–32
Lakoff G, Johnson M (1999) Philosophy in the flesh: the embodied mind and its challenge to
Western thought. Basic Books, New York
Lazarus RS (1982) Thoughts on the relations between emotions and cognition. Am Psychol 37
(10):1019–1024
Leder H, Belke B, Oeberst A, Augustin D (2004) A model of aesthetic appreciation and aesthetic
judgments. Br J Psychol 95:489–508
Lerner JS, Keltner D (2000) Beyond valence: toward a model of emotion-specific influences on
judgment and choice. Cognit Emot 14(4):473–493
Leventhal H, Scherer K (1987) The relationship of emotion to cognition: a functional approach to
a semantic controversy. Cognit Emot 1:3–28
Lieberman H, Selker T (2000) Out of context: computer systems that adapt to, and learn from,
context. IBM Syst J 39:617–632
Lindgaard G, Fernandes G, Dudek C, Brown J (2006) Attention web designers: you have 50
milliseconds to make a good first impression! Behav Inf Technol 25:115–126
Loewy R (1951) Never leave well enough alone. Simon and Schuster, New York
Lombard M, Ditton T (1997) At the heart of it all: the concept of presence. J Comput Mediat
Commun 3(2)
Luce MF, Bettman JR, Payne JW (1997) Choice processing in emotionally difficult decisions.
J Exp Psychol Learn Mem Cognit 23:384–405
Markopoulos P, de Ruyter B, Privender S, van Breemen A (2005) Case study: bringing social
intelligence into home dialogue systems. ACM Interact 12(4):37–43
Martin RA, Kuiper NA, Olinger J, Dance KA (1993) Humor, coping with stress, selfconcept, and
psychological well-being. Humor 6:89–104
Mayer RE (1999) The promise of educational psychology: learning in the content areas. Prentice
Hall, Upper Saddle River, NJ
Mendelsohn GA (1976) Associative and attentional processes in creative performance. J Personal
44:341–369
Minsky M (1980) Telepresence. MIT Press Journals, Cambridge, pp 45–51
Mumford MD (2003) Where have we been, where are we going? Taking stock in creativity
research. Creat Res J 15:107–120
Nan X, Anghelcev G, Myers JR, Sar S, Faber RJ (2006) What if a website can talk? Exploring the
persuasive effects of web-based anthropomorphic agents. J Mass Commun Q 83(3):615–631
Nechvatal J (1999) Immersive Ideals/critical distances. PhD thesis, University of Wales
Nechvatal J (2009) Immersive ideals/critical distances. LAP Lambert Academic Publishing, Köln
Norman DA (2002) Emotion and design: attractive things work better. Interactions 4:36–42
Norman DA (2004) Emotional design: why we Love (or hate) everyday things. Basic Books,
Cambridge
Nygren TE, Isen AM, Taylor PJ, Dulin J (1996) The influence of positive affect on the decision
rule in risk situations: focus on outcome (and especially avoidance of loss) rather than
probability. Organ Behav Hum Decis Process 66:59–72
Ortony A, Clore GL, Collins A (1988) The cognitive structure of emotions. Cambridge University
Press, Cambridge, England
Ortony A, Turner TJ (1990) What’s basic about basic emotions? Psychol Rev 97:315–331
Pantic M, Rothkrantz LJM (2003) Toward an affect sensitive multimodal human-computer
interaction. Proc IEEE 91(9):1370–1390
Passer MW, Smith RE (2006) The science of mind and behavior. McGraw-Hill, Boston
Plsek PE (1997) Creativity, innovation and quality. ASQ Quality Press, Milwaukee
Poincaré H (1913) The foundations of science. Science Press, Lancaster
Prekop P, Burnett M (2003) Activities, context and ubiquitous computing. Comput Commun
26:1168–1176
Rafaeli A, Vilnai-Yavetz I (2004) Emotion as a connection of physical artifacts and organizations.
Organ Sci 15:671–686
Rheingold HR (1993) The virtual community: homesteading on the electronic frontier.
Addison-Wesley, New York
Reichle R, Wagner M, Khan MU, Geihs K, Valla M, Fra C, Paspallis N, Papadopoulos GA (2008)
A Context query language for pervasive computing environments. In: 6th annual IEEE
international conference on pervasive computing and communications, pp 434–440
Richards R (1999) The subtle attraction: beauty as the force in awareness, creativity, and survival.
In: Russ SW (ed) Affect, creative experience, and psychological adjustment. Brunner/Mazel,
Philadelphia, pp 195–219
Riva G, Vatalaro F, Davide F, Alcañiz M (2005) Ambient intelligence: the evolution of
technology, communication and cognition towards the future of human-computer interaction.
IOS Press, Amsterdam
Roseman IJ, Wiest C, Swartz TS (1994) Phenomenology, behaviors, and goals differentiate
discrete emotions. J Pers Soc Psychol 67:206–221
Runco MA (2003) Discretion is the better part of creativity: personal creativity and implications
for culture, inquiry: critical thinking across the disciplines. Inq Crit Think Discip 22:9–12
Runco MA (2004) Personal creativity and culture. In: Lau S, Hui ANN, Ng GYC (eds) Creativity
when East meets West. World Scientific, New Jersey, pp 9–22
Runco MA (2007) Creativity. Theories and themes: research, development and practice. Elsevier,
Amsterdam
Russell JA (2003) Core affect and the psychological construction of emotion. Psychol Rev 1:145–172
Salvucci DD, Anderson JR (2001) Automated eye movement protocol analysis. Hum Comput
Interact 16(1):38–49
Scherer KR (1992) What does facial expression express? In: Strongman K (ed) International
review of studies on emotion, vol 2, pp 139–165
Scherer KR (1993) Neuroscience projections to current debates in emotion psychology. Cognit
Emot 7:1–41
Scherer KR (1994) Plato’s legacy: relationships between cognition, emotion, and motivation.
University of Geneva, Switzerland
Scherer KR (1999) Appraisal theory. In: Dalgleish T, Power MJ (eds) Handbook of cognition and
emotion. Wiley, New York, pp 637–663
Scherer KR, Schorr A, Johnstone T (eds) (2001) Appraisal processes in emotion: theory, methods,
research. Oxford University Press, New York
Schilit WN (1995) A system for context-aware mobile computing. PhD thesis, Columbia
University
Schilit B, Adams N, Want R (1994) Context-aware computing applications. In: Proceedings of IEEE
workshop on mobile computing systems and applications, Santa Cruz, CA, USA, pp 85–90
Schmidt A (2005) Interactive context-aware systems interacting with ambient intelligence. In:
Riva G, Vatalaro F, Davide F, Alcañiz M (eds) Ambient intelligence: the evolution of
technology, communication and cognition towards the future of human-computer interaction.
IOS Press, Amsterdam, pp 159–178
Schmidt A, Beigl M, Gellersen HW (1999) There is more to context than location. In: Computers
and graphics, vol 23, no 6, pp 893–901
Schucman H, Thetford C (1975) A course in miracle. Viking Penguin, New York
Schwarz N (1990) Handbook of motivation and cognition foundations of social behavior. Guilford
Press, New York
Schwarz N, Clore GL (1996) Social psychology: handbook of basic principles. Guilford Press,
New York
Schwarz N (2000) Emotion, cognition and decision making. Cognit Emot 14(4):433–440
Seo MG, Feldman Barrett L, Bartunek JM (2004) The role of affective experience in work motivation.
Acad Manag Rev 29(3):423–439
Sheridan TB (1992) Musings on telepresence and virtual presence. Presence Teleoperators Virtual
Environ 1:120–126
Sheridan TB (1994) Further musings on the psychophysics of presence. Presence Teleoperators
Virtual Environ 5:241–246
Sibert LE, Jacob RJK (2000) Evaluation of eye gaze interaction. In: Proceedings of the ACM
Conference on human factors in computing systems, The Hague, pp 281–288
Spillers F (2004) Emotion as a cognitive artifact and the design implications for products that are
perceived as pleasurable, experience dynamics. https://www.experiencedynamics.com/sites/
default/files/publications/Emotion-in-Design%20.pdf. Viewed 19 July 2010
Sternberg RJ (1999) Handbook of creativity. Cambridge University Press, Cambridge
Sternberg RJ (2006) Introduction. In: Kaufman JC, Sternberg RJ (eds) The International handbook
of creativity. Cambridge University Press, London, pp 1–10
Sternberg RJ, Lubart TI (1996) Investing in creativity. Am Psychol 51:677–688
Sternberg RJ, Lubart TI (1999) The concept of creativity: prospects and paradigms. In:
Sternberg RJ (ed) The international handbook of creativity. Cambridge University Press,
London, pp 3–16
Strommen ES (1993) Is it easier to hop or walk? Development issues in interface design. Hum
Comput Interact 8:337–352
Tobii Technology AB (2006) Tobii 1750 eye tracker, Sweden. www.tobii.com. Viewed 15 Dec
2012
Tractinsky N (2006) Aesthetics in information technology: motivation and future research
directions. In: Zhang P, Galletta D, Sharpe ME (eds) Human-computer interaction and
management information systems: foundations, Armonk, NY, pp 330–347
Tractinsky N, Katz A, Ikar D (2000) What is beautiful is usable. Interact Comput 13(2):127–145
Turkle S (1995) Life on the screen: identity in the age of the Internet. Simon & Schuster, New
York
Ulrich W (2008) Information, context, and critique: context awareness of the third kind. In: The
31st information systems research seminar in Scandinavia, keynote talk presented to IRIS 31
VandenBos G (2006) APA dictionary of psychology. American Psychological Association,
Washington, DC
Varney A (2006) Immersion unexplained. The Escapist 57:20–23
Vitense HS, Jacko JA, Emery VK (2002) Multimodal feedback: establishing a performance
baseline for improved access by individuals with visual impairments. In: 5th annual ACM
conference on assistive technologies, pp 49–56
Wasserman V, Rafaeli A, Kluger AN (2000) Aesthetic symbols as emotional cues. In: Fineman S
(ed) Emotion in organizations. SAGE, London
Weimann G (2000) Communicating unreality: modern media and the reconstruction of reality.
Sage Publications, Thousand Oaks
Zajonc RB (1980) Feeling and thinking: preferences need no inferences. Am Psychol 35(2):151–175
Zhang P (2008) Motivational affordances: reasons for ICT design and use. Commun ACM 51
(11):145–147
Zhang P (2009) Theorizing the relationship between affect and aesthetics in the ICT design and use
context. In: Proceedings of the international conference on information resources management,
Dubai, United Arab Emirates, pp 1–15
Zhang P, Li N (2004) Love at first sight or sustained effect? The role of perceived affective quality
on users’ cognitive reactions to information technology. In: International conference on
information systems (ICIS’04), Washington, DC, pp 283–296
Zhang P, Li N (2005) The importance of affective quality. Commun ACM 48(9):105–108
Part III
Conclusion
Chapter 10
Concluding Remarks, Practical
and Research Implications, and Reflections

The principal aim of this book was to explore, review, and discuss the state-of-the-art
enabling technologies, computational processes and capabilities, and human-
inspired AmI applications (in which knowledge from the human-directed sciences
such as cognitive science, social sciences, and humanities is incorporated) and to
provide new insights and ideas on how these components could be further enhanced
and advanced. This book intended moreover to identify, document, and address the
main challenges and limitations associated with the engineering, design, modeling,
and implementation of AmI systems, and to put forward alternative research avenues
that provide a more holistic view of AmI and present important contributions for
bringing the vision of the integration of computer intelligence into people’s everyday
lives closer to realization and delivery with real impacts.
The significance of the research combining technological, human, and social
dimensions of AmI lies in its potential to enhance the enabling technologies and
computational processes and capabilities underlying the functioning of AmI tech-
nology, by gaining a better understanding of a variety of aspects of human func-
tioning based on advanced knowledge from human-directed sciences and
effectively amalgamating and applying this knowledge in the field of AmI, with the
primary aim to build well-informed human-inspired AmI applications that can have
a profound and positive impact on people as to enhancing the quality of their lives.

10.1 A Comprehensive Design Approach to AmI Systems

The primary intention with regard to the design of human-inspired applications and
their use in everyday life practices is to contribute to the understanding of existing
problem domains, to emphasize the need for making an effort to broaden the scope
of problem domains, and to encourage search for new ones. Adding to this intent is
to contribute to the appropriate and pertinent solutions to some of the real issues
involved in the realization and deployment of AmI smart spaces. In this regard, it is
crucial to rethink how various human-like intelligences in the form of cognitive and
behavioral processes should be conceived, combined, interrelated, and implemented
in the next generation of AmI systems.
The design of AmI systems should follow a three-dimensional framework for
research in AmI as a comprehensive approach: (1) research outputs, including
constructs, models, methods, and instantiations; (2) research activities, including
building, evaluating, theorizing, and justifying (e.g., March and Smith 1995); and
(3) interdisciplinary and transdisciplinary research undertakings (multiperspectival
and holistic analysis for achieving coherent knowledge and broad understanding of
AmI). Real, new problems (e.g., context-aware systems, affective/emotion-aware
systems, socially intelligent systems, conversational systems, etc.) must be properly
conceptualized and represented (using machine learning, ontological, logical, and
hybrid methods, as well as other novel approaches), appropriate techniques and
mechanisms (including sensors, intelligent components/information processing
units, actuators, and networks) for their solution must be constructed, and solutions
(various human-inspired AmI applications) must be implemented and evaluated in
their operating environments using appropriate metrics or criteria. Enabling tech-
nologies and processes involve a wide variety of sensors and actuators, data pro-
cessing approaches, machine learning methods, knowledge representation and
reasoning techniques, intelligent agents, and query languages necessary for the
design and implementation of AmI systems. Moreover, if significant progress is to
be made, AmI research must also develop an understanding of how and why
different systems work or fail, and identify during the evaluation and instantiation
phases which of the enabling technologies and processes are interfering with the
proper functioning of AmI systems in their variety. Such an understanding must
link together natural laws (from natural and formal science) governing AmI systems
with human and social rules (from human-directed sciences) governing the human
environments in which they operate.
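To ground this design framework in something concrete, the toy Python loop below shows one possible way, by no means a prescribed architecture, of wiring sensors, an inference step, and actuators together so that a context-aware behavior can be instantiated and then evaluated against such metrics. Every name, rule, and threshold here (rule_based_inference, run_cycle, the lighting rule) is invented purely for illustration.

from typing import Callable, Dict, List

Reading = Dict[str, float]

def rule_based_inference(reading: Reading) -> str:
    # Stand-in for richer machine learning, ontological, or logical reasoning.
    if reading.get("presence", 0.0) > 0.5 and reading.get("ambient_light", 1.0) < 0.2:
        return "turn_lights_on"
    return "no_action"

def run_cycle(sensors: List[Callable[[], Reading]],
              infer: Callable[[Reading], str],
              actuators: Dict[str, Callable[[], None]]) -> str:
    # Sense: merge readings from all sensors into one context snapshot.
    context: Reading = {}
    for sense in sensors:
        context.update(sense())
    # Reason: derive an action from the fused context.
    action = infer(context)
    # Act: dispatch to the matching actuator, if any.
    if action in actuators:
        actuators[action]()
    return action

if __name__ == "__main__":
    sensors = [lambda: {"presence": 0.9}, lambda: {"ambient_light": 0.1}]
    actuators = {"turn_lights_on": lambda: print("lights on")}
    print(run_cycle(sensors, rule_based_inference, actuators))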

10.2 The Need for Interdisciplinary Research

In relation to the aforementioned comprehensive approach to the design of AmI systems, it is crucial at the current stage of AmI research to stimulate more col-
laborative endeavors, by getting scientists, experts, and scholars together from
human-directed disciplines or working on cross connections of AmI with these
disciplines to pool their efforts and speed up the construction of a whole range of human-
inspired AmI applications, thereby advancing AmI approaches. Interdisciplinary
teams may encompass computer scientists, AI experts, computational mathemati-
cians, logicians, cognitive scientists, physicists, biologists, architects, cognitive
psycholinguists, neuroscientists, anthropologists, social scientists, specialists in the
application of computing to the processing of natural language, ethicists, and
philosophers, to name but a few. The emphasis should be on the use of knowledge
from human-directed sciences in human-inspired AmI applications, so as to support human users in their everyday life in psychological, conversational, and social respects. To achieve this goal, the effort should be directed towards engaging the
active researchers studying human-directed sciences in AmI research and sensi-
tizing them to the possibilities to pool their knowledge together and incorporate
their substantial research evidence in an attempt to enhance existing models of
human contexts and processes used in the development of AmI applications,
architectures, and environments. One possible way forward is to develop concrete
schemes and programs for encouraging modelers and researchers in the humanities
and psychological and social disciplines to develop interest and engage in the field
of AmI as a high prospective application domain for their models. Examples of
computational modeling areas that are of high topicality and relevance to AmI
include: human multimodal verbal and nonverbal communication behavior; cog-
nitive, emotional, psychophysiological, social, and cultural contexts; and emotional
and social processes. The necessity of innovative collaborative approaches to AmI
modeling research stems from AmI's inherent complexity and intricacy as a technology that is directed towards humans. This is useful to support common
understanding as well as constructive communication in cross-functional teams and
among the experts and scholars involved in the creation and promotion of the AmI
vision. Especially, there is a need to stimulate the cross-fertilization of ideas and
research between scholars from every discipline. The underlying premise is that
things are galvanized at the interfaces and in the action of several levels of reality.

10.3 Revisiting the AmI Vision—Rethinking the Notion of Intelligence—and Fresh Possibilities and Opportunities

AmI is an exciting and fertile area for investigation with many intriguing and
probing questions and extensive work awaiting future interdisciplinary scholarly
research and collaborative industry innovation. This presupposes the necessity and
motivation for the AmI vision to become open to further interrogations that are
indeed causing it to fundamentally reconfigure its present beliefs and knowledge
claims and, accordingly, abandon some of the currently prevailing assumptions,
especially those pertaining to the notion of intelligence which has been an integral
part of some of the most tantalizing (visionary) scenarios. Besides, philosophically,
it is important for AmI to recognize and accept its historically conditioned
knowledge, which postulates the acceptance—like all knowledge formations which
are infused with ways-of-seeing—of partial, local, and specific analyses of social
reality, as it is not shaped in significant ways by more majestic and general
structures. However, in the process of revisiting the AmI vision, moving behind its
foundational farsightedness, it is of great importance to ensure that the user implications are made more explicit by answering the main question about how the users are—and ought to be—configured in AmI; to surmount the inadequacy in, or make more explicit, the consideration for human values in the design choices that will influence AmI technology, as well as to use these values as parameters for reading everyday life patterns with regard to the innovation process of AmI; to work strategically towards becoming more driven by humanistic concerns than deterministic ones; and to accept, understand, and capitalize on the idea that the AmI innovation process is an interactive process between technological and societal change, where technology and society mutually shape and influence one another and both unfold within that process, thereby taking into account the user and social dynamics and undercurrents involved in and underlying the innovation process. The whole idea is that a
strong effort should be made in the direction of re-examining and reconfiguring the vision to achieve, in AmI progress from a human and social perspective, the same level of confidence and optimism that exists in advancing the technology—i.e., it should
inspire researchers and scholars into a quest for the tremendous possibilities created
by exploring new understandings and adopting alternative strategies for rethinking
the whole idea of intelligence as an essential part of the incorporation of machine
intelligence into people’s everyday lives. A holistic view is one that considers people and their behavioral patterns and everyday life scenarios and practices when looking at intelligence, and thus leverages these elements to generate situated forms of
intelligence. This is most likely to make people want and aspire to give technology a
place in their lives and thus allow the incorporation of computer intelligence in their
everyday lives. AmI holds great potential to frame the role of new technologies—
but only based on incorporating the user dimensions and the social dynamics in the
innovation process. The push philosophy of AmI alone remains inadequate to
generate successful and meaningful technological systems.
Moving beyond its foundational (visionary) vision can still be seen as a sign of progress towards delivery, after the vision has contributed significantly to establishing the field of AmI and thus accomplished its mission by inspiring a whole generation of innovators, scholars, and researchers into a quest for the tremendous opportunities that have been enabled and created by, and foreseen to come from, the incorporation
of computer intelligence into people’s everyday lives and environments to bring
about a radical and technology-driven social transformation (see, e.g., José et al.
2010; Aarts and Grotenhuis 2009; Gunnarsdóttir and Arribas-Ayllon 2012). The
conspicuous reality pertaining to the scattering of research areas, the magnitude of
challenges, the numerous open and unsolved issues, the unintended implications,
the significant risks, the bottlenecks or stumbling blocks, and the unfeasibility and
unattainability associated with the notion of intelligence and thus the realization of
the AmI vision all imply the high relevance and added sense of exigency as to
revisiting or reexamining the AmI vision. This should, though, whether concerning the notion of intelligence or other prevailing assumptions, not be seen as a failure or criticism of the blossoming field of AmI, so to speak, but rather as an integral part
of the research advancement in which a vision of the future technology should not
be considered as an end in itself or a set of specified requirements. Instead, it should
be conceived as a place that marks the beginning of a journey from which to depart,
while stimulating debates and depicting possible futures along the way, towards
making it a reality. The underlying assumption is that the AmI field anchored on its
10.3 Revisiting the AmI Vision—Rethinking the Notion of Intelligence … 517

substantial research effort by which AmI vision has in fact fulfilled its mission and
role can aim higher and thus dream realistically bigger, by capitalizing on the
proposed alternative research directions, grasping the meaning and implication of
what AmI vision epitomize for people, valuing holistic approaches, and embracing
emerging trends around the core notions of AmI. Indeed, there is a growing per-
ception that the centripetal movement of the recommended fresh ideas and new
avenues, coupled with human considerations in the future AmI innovation in light
of the emerging and growing body of research findings, enduring principles, per-
tinent solutions for many complex issues, unraveled intricacies, and addressed
challenges can have a significant impact on AmI-driven processes of social trans-
formation—what I identify as ‘the substantiated quintessence of AmI’. Hence, it is
time to direct the effort towards new ways of thinking and striving for coherent
knowledge and understanding of AmI, instead of continuing to devote huge energies to designing and building—very often reinventing the wheel—new technologies and their applications and services for enabling the visionary scenarios and making them real. Such scenarios were actually meant, when
conceived by technology creators 15 years ago, to highlight the potentials and
illustrate the merits of AmI technology. In particular, most of the visionary scenarios have proven to be unrealistic when compared with the reality they picture, or futuristic only insofar as they correspond to the inspiring and aspiring AmI vision they are intended to
instantiate. The whole idea is that AmI has long been driven by overblown research agendas concentrated primarily on the potential of technology and its technical features—perhaps to serve economic and political purposes. It is time to deliver on the promises and confront the expectations with reality in order to serve human and social purposes.

10.4 The Inconspicuous, Rapid Spreading of AmI Spaces

All the efforts being made towards a synergetic prosperity and fresh research
endeavors in AmI can be justified by the fact that by all accounts—projects and
reports, technology foresight studies, science and technology policies, research and
technology development, and design and development of new technologies—one can deduce that there is an unshakable belief in the development of technology towards AmI as an internet of things that think, with computer intelligence completely infiltrating the human environment, embedded everywhere, and minimal technical knowledge required to make use of computer technology as to functionality
and communication. Indeed, sensing and computing devices are already embedded
in many everyday objects and existing environments, and this trend will
undoubtedly continue to evolve. Especially, computing devices, which are able to
think and communicate, are becoming increasingly cheap, miniature, sophisticated,
powerful, smart, interconnected, and easy to use, thereby finding application in
virtually all aspects of people’s everyday lives. It is becoming increasingly evident
that AmI environments will be commonplace in the very near future to support
living, work, learning, infotainment, and social spaces through naturalistic multimodal interaction and context-aware, personalized, adaptive, and responsive service provision.
It has been widely acknowledged that the dramatic reduction in cost and high
performance of ICT makes it accessible and widespread. That is to say, these two
factors play a key role in determining or shaping ICT use and application in each
computing era, from mainframe computing (1960–1980), through personal com-
puting (1980–1990) and multiple computing (2000 onwards), to everywhere
computing (2010 onwards). In view of this, sensing and computing devices, ubiquitous computing infrastructures, and wireless communication networks becoming technically mature and financially affordable, coupled with the rise of the internet and the emergence of the Global Computing trend, are laying the foundations for a number of AmI applications of varied scale, distribution, and intelligence in terms of system support and new services pertaining to everyday life as well as societal spheres. This is increasingly shaping the magnitude and massiveness of the
uses of AmI. Thus, it is only a matter of the advance and prevalence of the enhanced enabling technologies and computational processes and capabilities underlying the functioning of AmI before the AmI vision materializes into a deployable computing paradigm, if not a societal paradigm.
The construction of the AmI space is progressing on a hard-to-imagine scale.
A countless number of sensors, actuators, and computing devices (where analysis,
modeling, and reasoning occur) as key AmI technologies are being networked, and
their numbers are set to increase exponentially, by orders of magnitude towards
forming gigantic computing and networking infrastructures spread across different
geographical locations and connected by middleware architectures and global
networks. Middleware serves to link up several kinds of distributed components and
enable them to interact seamlessly across dispersed infrastructures and disparate
networks, in the midst of a variety of heterogeneous hardware and software systems
(e.g., computers, networks, applications, and services) needed for enabling smart
environments.
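To make the middleware role more concrete, the following is a minimal, illustrative sketch in Python of the publish/subscribe glue that such middleware typically provides between sensing, reasoning, and actuating components. It is a toy, in-process stand-in rather than any specific AmI middleware; all class, topic, and function names are hypothetical, and a real deployment would add device discovery, security, and networked transport across the heterogeneous hardware and software systems mentioned above.

# Minimal, illustrative sketch of the publish/subscribe "glue" role that
# middleware plays between distributed AmI components. All names are
# hypothetical; this is an in-process toy, not a real distributed broker.
from collections import defaultdict
from typing import Callable, Dict, List


class MessageBus:
    """A toy in-process broker standing in for distributed middleware."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        for handler in self._subscribers[topic]:
            handler(event)


class PresenceSensor:
    """A sensing component that reports raw observations to the bus."""

    def __init__(self, bus: MessageBus, room: str) -> None:
        self._bus, self._room = bus, room

    def detect(self, occupied: bool) -> None:
        self._bus.publish("presence", {"room": self._room, "occupied": occupied})


class LightingActuator:
    """An actuating component that reacts to reasoned context, not raw data."""

    def __init__(self, bus: MessageBus) -> None:
        bus.subscribe("context", self.on_context)

    def on_context(self, event: dict) -> None:
        print(f"Lights in {event['room']} -> {'on' if event['occupied'] else 'off'}")


def reasoning_component(bus: MessageBus) -> None:
    """A (trivial) reasoning step that turns observations into context events."""
    bus.subscribe("presence", lambda e: bus.publish("context", e))


if __name__ == "__main__":
    bus = MessageBus()
    reasoning_component(bus)
    LightingActuator(bus)
    PresenceSensor(bus, room="living room").detect(occupied=True)

In a deployed AmI environment, the in-process bus above would be replaced by distributed messaging over the dispersed infrastructures and disparate networks described in this section, but the mediating role between heterogeneous components remains the same.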
At present, the environment of humans, the public and the private, is pervaded
by huge quantities of active devices of various types and forms, computerized
enough—e.g., equipped with artificially intelligent agents—to automate routine
decisions and act autonomously on behalf of human agents. The increasing mini-
aturization of computer technology is making possible the development of minia-
ture sensors that allow registering various human parameters without disturbing
human actors, thereby enabling the commonsensical infiltration of AmI into daily human
environments. The purpose of this pervasion is to model and monitor the way
people live, through employing remote and nearby recognition systems for body
tracking, behavior monitoring, facial expressions, hand gestures, eye movements,
and voices, thanks to biometrics technology. Today, RFID tags are attached to
many objects and are expected to be embedded in virtually all kinds of everyday
objects, with the advancement of the Internet of Things trend, handling address-
ability and traceability, monitoring and controlling devices, and automating process
controls and operative tools, and so on, on a hard-to-imagine scale. Likewise,
humans will be inundated by huge amounts of real-time responses based on interacting and networking RFID tags. Micro- and nano-scale RFID tags will, in the foreseeable future, be integrated into more and more everyday objects
as part of the Internet of Things, leading to the disappearance of input and output
media, thereby enabling people to communicate directly with all sorts of objects,
which in turn will communicate with each other and with other people’s objects. In
short, AmI is being pushed through largely unnoticed by the public at large—given
its nature in terms of pervasive and continuous presence—and spreading quite
rapidly into people’s everyday lives and existing environments.
In a nutshell, the idea that computer intelligence will permeate the most varied scenarios of people’s everyday lives, enabling a drastic technology-driven transformation of daily and social living, is increasingly unfolding. It is, however,
important to acknowledge that the nature of the applications, services, and envi-
ronments that will constitute AmI upon its deployment may not be realized com-
pletely as proposed or may turn out to be different from the way they were
envisioned, especially in relation to the notion of intelligence as alluded to in the
AmI vision. This implies that the AmI vision will materialize but with alterations to
its underlying assumptions and approaches and thus founding vision.

10.5 Future Avenues for AmI Technology Development: A General Perspective

Whether concerning human-inspired applications associated with living environments and workspaces or other societal applications, AmI as technological change
has to evolve in a mutual process with societal change. ISTAG (2012) has indeed
realized that investments and R&D resources in ICT and thus AmI will not be
justified by technological advancement and industry leadership as it has been
understood (and pushed through) by technologists and industry expert communi-
ties; rather, any technological development has to be linked with social develop-
ment, and thus a perspective of balanced progress requires, among other things, a
balanced concern across the various spheres of life, including society and individual
development; an agile, open and participatory process for formulating and nego-
tiating choices; an open approach towards disruptive innovations that may emerge
in networks of various stakeholders. However, it remains to be seen if, how, and to
what extent this manifesto will further shape and influence the evolution of AmI in
terms of research and development of AmI systems, not least in the medium term.
The main concern is that these guidelines for balanced progress may remain only at the level of discourse and that, if they are not translated into real-world actions, AmI may continue to unfold according to, or be driven by, the technical and industrial factors that originally shaped its landscape from the very inception of the vision. It would consequently not take into account people’s everyday lives and social dynamics in the real world. But it is important to
acknowledge that the available enabling technologies are ‘already enough to do
much more than what we have done so far, and therefore AmI can no longer be
about a vision of a new world for the future, and driven by distant and overblown
research agendas focused mainly on technological features. AmI has the obligation
to start delivering valuable services’ (José et al. 2010, pp. 1497–1498). Rather, a
genuine value shift is needed to guide the evolution of AmI innovation. Aarts and
Grotenhuis (2009) underscore the need for a value shift: ‘…we need a more bal-
anced approach in which technology should serve people instead of driving them to
the max’. This argument relates to social innovation in the sense of directing the
development of new technologies towards responding to the user and social needs
and creating enduring collaborations between various stakeholders. A value shift
entails the necessity of approaching AmI in terms of a balance between conflicting
individual and social needs and impacts, rather than merely in terms of techno-
logical progress (ISTAG 2012). The underlying assumption is that failing to connect with social development is likely to result in people rejecting new technologies and in societal actors misallocating or misdirecting resources (e.g., technical R&D).
One way to achieve the objective is to view AmI development as entry into the
networks of social working relationships, involving technology designers, diverse
classes of users, and other involved stakeholders and what they entail in terms of
codified, tacit, creative, and non-technological knowledge, that make AmI systems
possible and enable them to find their way to domestication and social acceptance
and subsequently thrive. In other words, all the stakeholders involved in the value chain of AmI technology should focus on, and work with, how AmI, with its diverse application domains, connects to broader systems of socio-material relationships—
thereby the need for insights from social research—in the form of cooperatives of
humans and nonhumans, through which various issues of concern can be dealt with.
Of particular significance in this regard is that, to reiterate, human values must constitute key drivers of AmI innovation and key parameters for reading everyday life patterns, an important trait of those innovators who judge the success of their innovations on the basis of the extent to which they primarily deliver real value to people and benefit them—first and foremost. Indeed, the key factors and
criteria for technology acceptance and appropriation are increasingly associated
with the way technology is aligned with human values (see José et al. 2010).
Human values form an important part of society, and guide people’s behavior in
many ways. Incorporating human values into, and bringing them to the forefront of,
the innovation process of AmI is about putting a strong emphasis on people and
their experience with technology as a view that is rather concerned with a much
broader set of issues than just ‘intelligent functionality’ and ‘intuitive usability’,
namely hedonism (pleasure, aesthetics, and sensuous gratification) as well as other
high-level values, such as self-direction (independent thought and action), crea-
tivity, ownership, freedom, privacy, and so on. Consequently, the necessity for AmI technological progress to be linked with human and social progress entails changes to the context in which AmI technology creators and producers operate and innovate. Besides, the ICT industry has to operate within the wider sociotechnical context, the networked ecosystem in which it is embedded, and thus consider the other stakeholders with their interests, meaning constructions, and notions of action.
In fact, various technologies and practices concurrently undergo change in the wider social context and necessitate the aligning of conflicting interests. Innovators do
not operate in isolation and are influenced by the evolving social patterns around
them and, in turn, influence those patterns (see Bibri 2014). It is about the interplay
between the ICT industry’s motivations for AmI innovation activities and the wider
social context within which the ICT industry operates. This includes the various
powers of users, consumers, citizens, regulatory agencies, and policymakers. The
wider sociotechnical landscape in which the whole innovation system is embedded and all types of innovators operate encompasses not only the economy and politics,
but also institutions and social norms and values (see Smith 2003; Bibri 2014).
In particular, what is needed is for AmI to deliver real value to people, valuable
applications and services for the disorderliness, richness, situatedness, unpredict-
ability, and diversity of the social environment of real settings. This means that it is necessary to find alternative ways to innovate and design for new, and all sorts of,
situations of use and to avoid neglecting, overlooking, or oversimplifying the
challenges of the present, by attempting to solve real issues so as to be able to scale
from prototypes and simulations to realistic systems and environments, thereby
going beyond the constrained conditions of the laboratories. This is a call for seeking
outside the technical R&D laboratories and for a quest for a more pertinent research
agenda (developed from fresh roadmaps indicating what areas of more social rele-
vance must be investigated in order to bring the vision of AmI into reality—closer to
delivery and real social impact) that addresses issues of more significance for, and associated with the messiness of, everyday life practices and the imperfections of the real-life world.

10.6 The Seminal Role of Social Innovation and Participative and Humanistic Design in the Sustainability of AmI Technology

The continued success of AmI as an ICT innovation will be based on the social
dimension of innovation and, thus, the participative and humanistic dimensions of
design—i.e., the ability and willingness of people to use or acclimatize to the
technological opportunities offered by AmI as well as their active involvement in
the design process, coupled with the consideration for human values in the fun-
damental design choices. This highlights the tremendous value of the emerging
approaches to, and trends around, technology design and innovation in addressing the complexity of the AmI context, enhancing related application and service devel-
opment, and even managing the unpredictable future as to emerging user behaviors
and needs in the context of AmI. Given its underpinnings—collective interlacing of
concerned people, participative and humanistic design processes, and needed
technological systems and applications, social innovation is a sound and powerful
way to mitigate the risk of unrealism associated with the AmI vision, and thus work
purposefully and strategically towards achieving the essence of the AmI vision,
a radical and technology-driven social transformation—i.e., the incorporation of computer intelligence into people’s everyday lives with positive and profound impacts. Otherwise, design and innovation processes grounded in unrealistic (user)
scenarios will ultimately lead to irrelevant or unrealistic applications and services
that no one will use, adopt, or benefit from. The premise is that most successful and
meaningful technological solutions for users’ needs, social problems, and existing
everyday practices can emerge from people and be found in communities. The
Social Shaping of Technology approach, which is an umbrella term for different
social constructivist visions on technology, advocates the active participatory role
of concerned people as agents of technological change. The implication of scenarios
in AmI being mostly conceived by technology creators, albeit to emphasize the
merits and illustrate the potential of AmI technology, is technological determinism,
a view which does not resonate with what recent social studies of new technologies
have shown with regard to the determining role of social innovation and what it
entails in the innovation equation, and thus the significance of the social dimension of
innovation for the success of new technology in terms of its acceptance, adoption,
or appropriation.
My hope is that this book will provide the grounding for further, more in-depth empirical qualitative studies on the human and social aspects of AmI as a new tech-
nology, the kinds of studies that take into account the micro-context or scenarios of
users’ everyday lives. Considering the nature of human-inspired AmI applications
and services, the most appropriate and effective way to develop successful appli-
cations and services is to experiment and engage more with people in their everyday life scenarios. Hence, the design of AmI technology needs to be supported by
ethnographic, in-depth studies of users (e.g., Crabtree and Rodden 2002) in real-life
settings, the environment in which they will be interacting with such technologies in
order to create well-informed AmI technological systems—that is, fully adapted to
users’ needs and expectations. Ethnographic, in-depth studies are about involving
users in a sociological sense, whereby they are able to accept, absorb, and find a
meaningful presence for new technologies in the myriad scenarios or situations of
their everyday lives (Hallnäs and Redström 2002). The reason for encouraging this type of qualitative study is that as the demand for practical ideas and holistic perspec-
tives on how to achieve a radical and technology-driven social transformation and
thus bring the vision of AmI to delivery with concrete impacts increases, future AmI
projects and initiatives—informed by holistic analyses and perspectives—are more
likely to get increasing attention from computer scientists, industry experts, and
technology and innovation policymakers. In particular, there is currently a persistent gap between the promises of the AmI vision, relating particularly to the notion of intelligence (the central anticipatory, adaptive, and personalized characteristics of AmI), and its real achievements, as well as a tendency to neglect the current challenges as a potentially
relevant risk. This is what Bell and Dourish (2007) label ‘the proximate future’,
a closer future but always postponed. However, as stated by ISTAG (2003, p. 13),
the vocal champion for the AmI vision, AmI ‘can only be fully developed by a
holistic approach, encompassing technical and societal research. In return, AmI
offers scientists a rich field of research at the boundaries between disciplines.
Although research aimed at improving and extending the knowledge in core sci-
entific and technology domains remains a necessity, it is at these interfaces between
scientific domains that exciting things happen…. The AmI vision should not be
‘oversold’ but neither ISTAG nor the IST research community should shrink from
highlighting the exciting possibilities that will be offered to individuals who will live
in the AmI space’ (Bold in the original). Further research should focus on providing
the knowledge that the involved societal actors will need to make informed decisions
about how to realize the AmI vision in its social context—predicated on the
assumption that it is high time for the AmI community to embrace new emerging
research trends around its core concepts and underlying assumptions.

References

Aarts E, Grotenhuis F (2009) Ambient intelligence 2.0: towards synergetic prosperity. In:
Tscheligi M, de Ruyter B, Markopoulos P, Wichert R, Mirlacher T, Meschtscherjakov A, Reitberger
W (eds) Proceedings of the European conference on ambient intelligence. Springer, Austria,
pp 1–13
Bell G, Dourish P (2007) Yesterday’s tomorrows: notes on ubiquitous computing’s dominant
vision. Pers Ubiquit Comput 11(2):133–143
Bibri SE (2014) The potential catalytic role of green entrepreneurship—technological
eco-innovations and ecopreneurs’ acts—in the structural transformation to a low-carbon or
green economy: a discursive investigation. Master Thesis, Department of Economics and
Management, Lund University
Crabtree A, Rodden T (2002) Technology and the home: supporting cooperative analysis of the
design space. In: CHI 2002, ACM Press
Gunnarsdóttir K, Arribas-Ayllon M (2012) Ambient intelligence: a narrative in search of users.
Lancaster University and SOCSI, Cardiff University, Cesagen
Hallnäs L, Redström J (2002) From use to presence: on the expressions and aesthetics of everyday
computational things. ACM Trans Comput Hum Interact 9(2):106–124
ISTAG (2003) Ambient Intelligence: from vision to reality (For participation—in society &
business), viewed 23 October 2009. http://www.ideo.co.uk/DTI/CatalIST/istag–ist2003_draft_
consolidated_report.pdf
ISTAG (2012) Towards horizon 2020—recommendations of ISTAG on FP7 ICT work program
2013, viewed 15 March 2012. http://cordis.europa.eu/fp7/ict/istag/reports_en.html
José R, Rodrigues H, Otero N (2010) Ambient intelligence: beyond the inspiring vision. J Univers
Comput Sci 16(12):1480–1499
March ST, Smith GF (1995) Design and natural science research on information technology. Decis
Support Syst 15:251–266
Smith A (2003) Transforming technological regimes for sustainable development: a role for
alternative technology niches? Sci Public Policy 30(2):127–135
