
Gottfried Vossen

Frank Schönthaler
Stuart Dillon

The Web at Graduation and Beyond
Business Impacts and Developments
Gottfried Vossen
Department of Information Systems, ERCIS
University of Münster
Münster, Germany

Stuart Dillon
Department of Management Systems
University of Waikato
Hamilton, New Zealand

Frank Schönthaler
PROMATIS Group
Ettlingen, Baden-Württemberg
Germany

ISBN 978-3-319-60160-1 ISBN 978-3-319-60161-8 (eBook)


DOI 10.1007/978-3-319-60161-8

Library of Congress Control Number: 2017943255

© Springer International Publishing AG 2017


This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part
of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission
or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are exempt from
the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this
book are believed to be true and accurate at the date of publication. Neither the publisher nor the
authors or the editors give a warranty, express or implied, with respect to the material contained herein or
for any errors or omissions that may have been made. The publisher remains neutral with regard to
jurisdictional claims in published maps and institutional affiliations.

Printed on acid-free paper

This Springer imprint is published by Springer Nature


The registered company is Springer International Publishing AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
I dedicate this book to my oldest daughter
Laura, who was seriously ill during a
considerable period while I was working on
this book, but for whom, just like for the Web
and what people can do with it, everybody
hopes that the future will look bright.
G.V.
Preface

At least two of the authors of this book have reached an age at which it is not
uncommon to witness, often in disbelief and sometimes stunned, the younger
generation that is commonly referred to as millennials, or Generation Y. We
encounter millennials in talent management, in the workplace, in associations and
clubs, but also in our families; in other words, we associate with them in all sectors
of society. Born in the period that spans the early 1980s to the beginning of the new
millennium, millennials constantly push boundaries in search of new meaning and
the perfect work-life balance. Their professional biographies are characterized by
internationalization, changing fields of activity, and phases of varying work
intensity, exhibiting a mixture of professional as well as social tasks. They demand
increased societal as well as environmental responsibility from their employers, and
they require room for their personal development. More importantly, they seek
praise and appreciation for their dreams, their visions, and their achievements.
It is of little value to examine and assess millennials based on their current
behaviors—they are way too dynamic for that. Instead, we must comprehend where
they come from, what has formed them, what influences them, and what they
consider important in their lives. To achieve this, we must look back at their early
years in life, their time at school, and their professional training which is when the
millennials often finish their formal education or studies—in other words, when
millennials reach graduation. This provides us with the basis on which we might
dare to look into their future.
Considerations like these have motivated us to write this book. After all, the
Web is just another (albeit non-human) millennial and unquestionably one of the
most important companions in the lives of human millennials. Its evolution has
taken it from a simple means to communicate as Web 1.0 to a Web of participation,
otherwise known as the Social Web or Web 2.0, to today where the Web now
infiltrates all aspects of our private and professional lives as a core driver of digitization. While the early Web had a focus on rationalizing work procedures and
providing a repository for information, the Social Web enabled improvements of
process quality and ‘flattened’ the pathway into the digital world. Facilitating this
was improved usability, enhanced interactivity, and ubiquitous access, in particular
via mobile platforms. Today Web-based technologies are the enabler for novel
forms of customer experience, for disruptive business models, and for modes of


collaboration never dreamt of before. Interestingly, it is not the technical limitations of the Web that hamper its continued growth today, but rather moral and ethical dilemmas. Several such issues are touched upon in this book; deeper discussions are beyond our scope. Yet we hope that we have been able to produce a solid technology- as well as business-focused, holistic view of the impact of the Web.
Life-long learning in the area of Information Technology is a big challenge
especially for managers and executives. New developments emerge at an
ever-increasing pace, and the areas in which this happens are diverse yet relevant for almost every business. They range from Web search, data mining, and business intelligence to social media, cloud computing, big data, and
mobile devices. More so than ever before it is important that those who own and
manage businesses, irrespective of size, are aware of these developments and the
impact they may have. In an increasingly globally connected, fast-paced operating
environment, many businesses can no longer choose whether to adopt technology;
the issue is when and how to adopt. This book is intended as a guide for people who
grew up with a background in business administration or a related area, but who
through their career paths have reached a position where IT-related decisions have
become their daily business. Our intention is to balance rules and approaches for
strategy development and decision making, with a certain technical understanding
of what happens “behind the curtain.”

Goals of the book:


• Explore the vast array of new technologies that are available to businesses
today.
• Familiarize the reader with ways to approach new technologies and best utilize
them.
• Raise awareness of key recent technological advances of the likes of cloud
computing, social media, mobile technologies, and big data technologies and
how they might be used in the modern enterprise.
• Establish an understanding, from a managerial perspective, of the value of a
range of contemporary technologies and an ability to articulate that to others in a
non-technical way.
• Identify the important managerial considerations associated with contemporary
technologies.
• Enable the reader to perform a SWOT analysis or even develop a strategy regarding the adoption of these technologies in a range of situations.
Sample of issues discussed in the book:
• Cloud adoption and security issues: What should businesses consider? How can
they decide whether or not to move to the cloud? What would be a reasonable
strategy? How to avoid cloud washing? What needs to be considered regarding
security? How can my cloud applications and data be protected?

• Big data analytics: How to exploit big data scenarios for the benefit of my
business? What is a reasonable Big Data architecture (beyond a data
warehouse)?
• Social media data: How to handle the big data produced in and by social media
today? How to distinguish relevant from irrelevant data? How to measure the
value of a social media presence?
• Business Intelligence: Which adaptations need to be made to our BI processes?
• IT Decision Making and Strategy: Bring Your Own Device (BYOD) versus
Company Owned, Personally Enabled (COPE).
We can see a variety of paths through the various chapters of this book, indicated
in the following picture:

[Figure: suggested reading paths through the chapters]

Obviously, the book can be read chronologically. A reader not so interested in the history of the Web might jump straight to Chap. 2. Chapters 3 and 4 may be
read in either order, depending on the reader’s interest. Progressing from Chap. 1
directly to Chap. 5 or 6, or even Chaps. 2–6 is also feasible. The book, in general, is
non-technical; however, some readers might find that Sects. 2.3, 3.4 and 3.6 present material using notation with which they are unfamiliar. The overall
essence of the book will not be lost if these sections are skipped.
A word about references: We use two ways of citing, the first of which occurs
directly in the text where we want to point the reader to original work right at the
position where we talk about a certain topic. Additionally, each chapter has a
“Further Reading” section at the end, where we point to other literature containing
useful information on what was discussed in the respective chapter. These sections
typically do not repeat the references that appear already in the running text.
A quick disclaimer: Many of our references are Web-based and so, given the
obvious dynamic nature of the Web, we cannot guarantee their accuracy beyond the date of publication.
We are grateful to various people who helped in the preparation of this book.
Dr. Ute Masermann read chapter drafts and pointed out important mistakes or

inconsistencies. Sabine Schwarz prepared the figures and ensured they were presented accurately and uniformly throughout the text. Lena Hertzel checked all the references
for us and corrected a number of citation errors.

Münster, Germany Gottfried Vossen


Ettlingen, Germany Frank Schönthaler
Hamilton, New Zealand Stuart Dillon
April 2017
Contents

1 The Web from Freshman to Senior in 20+ Years (that is, A Short
History of the Web). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Beginnings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1.1 Browsers: Mosaic and Netscape . . . . . . . . . . . . . . . . . . . . . 2
1.1.2 Client/Server and P2P . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.1.3 HTML and XML . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.1.4 Commerce on the Web . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.2 The Search Paradigm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.2.1 The Web as a Graph . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.2.2 Search Engines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.2.3 The Long Tail . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.2.4 Directories . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.3 Hardware Developments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.3.1 Moore’s Law: From Mainframes to Smartphones . . . . . . . . 19
1.3.2 IP Networking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
1.4 Mobile Technologies and Devices . . . . . . . . . . . . . . . . . . . . . . . . . 23
1.4.1 Mobile Infrastructure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
1.4.2 Mobile Devices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
1.5 From a Flat World to a Fast World that Keeps Accelerating . . . . . 31
1.6 Socialization. Comprehensive User Involvement . . . . . . . . . . . . . . 35
1.6.1 Blogs and Wikis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
1.6.2 Social Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
1.6.3 The Crowd as Your Next Community . . . . . . . . . . . . . . . . 43
1.7 The Web at Graduation? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
1.8 Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
2 Digital (Information) Technologies . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
2.1 Digitized Business Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
2.1.1 What Is the Problem? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
2.1.2 Business Process Modeling and the Horus Method . . . . . . . 55
2.1.3 Holistic Business Process Management . . . . . . . . . . . . . . . . 57
2.1.4 BPM Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60


2.2 Virtualization and Cloud Computing . . . . . . . . . . . . . . . . . . . . . . . 65


2.2.1 Cloud Service Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
2.2.2 Relevant Issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
2.2.3 Precursors of Cloud Computing . . . . . . . . . . . . . . . . . . . . . 71
2.2.4 What Defines Cloud Computing? . . . . . . . . . . . . . . . . . . . . 74
2.2.5 Classification of Cloud Services . . . . . . . . . . . . . . . . . . . . . 75
2.2.6 Types of Clouds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
2.2.7 Cloud Revenue Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
2.2.8 Cloud Benefits and Pitfalls . . . . . . . . . . . . . . . . . . . . . . . . . 80
2.3 Technology for the Management of (Big) Data . . . . . . . . . . . . . . . 81
2.3.1 Characterizing Big Data . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
2.3.2 Databases and Data Warehouses . . . . . . . . . . . . . . . . . . . . . 83
2.3.3 Distributed Files Systems . . . . . . . . . . . . . . . . . . . . . . . . . . 85
2.3.4 Map-Reduce and Hadoop . . . . . . . . . . . . . . . . . . . . . . . . . . 88
2.3.5 NoSQL and In-Memory Databases . . . . . . . . . . . . . . . . . . . 92
2.3.6 Big Data Analytics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
2.4 Integrated Systems and Appliances . . . . . . . . . . . . . . . . . . . . . . . . . 95
2.4.1 Integrated Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
2.4.2 Appliances . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
2.5 Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
3 IT and the Consumer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
3.1 Commercialization of the Web . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
3.1.1 Components of an E-Commerce System . . . . . . . . . . . . . . . 106
3.1.2 Types of Electronic Commerce . . . . . . . . . . . . . . . . . . . . . . 109
3.1.3 Recommendation, Advertising, Intermediaries . . . . . . . . . . . 111
3.1.4 Case Amazon . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
3.2 Big Data Analytics Application Areas . . . . . . . . . . . . . . . . . . . . . . 116
3.3 Mobile Commerce and Social Commerce . . . . . . . . . . . . . . . . . . . . 122
3.3.1 Applications of Mobile Commerce . . . . . . . . . . . . . . . . . . . 122
3.3.2 Attributes of Mobile Commerce . . . . . . . . . . . . . . . . . . . . . 123
3.3.3 User Barriers of Mobile Commerce . . . . . . . . . . . . . . . . . . 123
3.3.4 Social Commerce . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
3.3.5 Dimensions and Models of Social Commerce . . . . . . . . . . . 125
3.4 Social Media Technology and Marketing . . . . . . . . . . . . . . . . . . . . 127
3.4.1 Social Media and Business . . . . . . . . . . . . . . . . . . . . . . . . . 127
3.4.2 Social Networks as Graphs . . . . . . . . . . . . . . . . . . . . . . . . . 128
3.4.3 Processing Social Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . 130
3.5 Online Advertising . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
3.5.1 A Greedy Algorithm for Matching Ads and Queries . . . . . . 135
3.5.2 Search Advertising . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
3.6 Recommendation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
3.6.1 Content-Based Recommenders . . . . . . . . . . . . . . . . . . . . . . 145
3.6.2 Collaborative Filtering. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147

3.7 Electronic Government . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151


3.8 Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
4 IT and the Enterprise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
4.1 Cloud Sourcing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
4.1.1 Strategy Development . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
4.1.2 Cloud Strategy Development . . . . . . . . . . . . . . . . . . . . . . . . 159
4.1.3 Cloud Provider Evaluation and Monitoring . . . . . . . . . . . . . 162
4.1.4 Crowdsourcing for Enterprises . . . . . . . . . . . . . . . . . . . . . . 167
4.2 Business Intelligence and the Data Warehouse 2.0 . . . . . . . . . . . . . 169
4.2.1 Data Mining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
4.2.2 Strategy Development for Big Data Exploitation . . . . . . . . 177
4.2.3 From Big Data to Smart Data . . . . . . . . . . . . . . . . . . . . . . . 178
4.2.4 Next Up: Data Marketplaces and Ubiquitous Analytics . . . 186
4.3 IT Consumerization, BYOD and COPE . . . . . . . . . . . . . . . . . . . . . 188
4.3.1 Device Ownership . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190
4.3.2 Access Control Though Mobile Device Management . . . . . 191
4.3.3 Governance for Security and Privacy . . . . . . . . . . . . . . . . . 193
4.4 The Digital Workplace . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
4.4.1 Requirements of a Digital Workplace . . . . . . . . . . . . . . . . . 194
4.4.2 Key Technologies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
4.4.3 The Physical Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198
4.5 BPM and the CPO: Governance, Agility and Efficiency for the
Digital Economy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
4.5.1 CPO: The CIO’s New Role . . . . . . . . . . . . . . . . . . . . . . . . 200
4.5.2 Business-Driven Implementation of BPM . . . . . . . . . . . . . . 201
4.5.3 Governance, Risk, and Compliance . . . . . . . . . . . . . . . . . . . 206
4.5.4 Simultaneous Planning of the Business Architecture . . . . . . 208
4.5.5 Standardization and Harmonization: Company-Wide
and Beyond . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
4.5.6 Business Process Outsourcing (BPO) . . . . . . . . . . . . . . . . . 212
4.5.7 Social Innovation Management . . . . . . . . . . . . . . . . . . . . . . 217
4.5.8 Sustainability of BPM Strategies: The Business Process
Factory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218
4.6 Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220
5 Digitization and Disruptive Innovation . . . . . . . . . . . . . . . . . . . . . . . . 223
5.1 Innovation. Social Innovation Labs. . . . . . . . . . . . . . . . . . . . . . . . . 223
5.2 Digital Transformation. The Chief Digital Officer. . . . . . . . . . . . . . 232
5.3 Disruption . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235
5.4 The Price of Data. Publicity Versus Privacy . . . . . . . . . . . . . . . . . . 238
5.5 Towards Sharing and On-Demand Communities . . . . . . . . . . . . . . 240
5.6 Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 244

6 The Road Ahead: Living in a Digital World . . . . . . . . . . . . . . . . . . . . 249


6.1 Cyber-Physical Systems and the Internet of Things . . . . . . . . . . . . 250
6.2 The Smart Factory and Industry 4.0 . . . . . . . . . . . . . . . . . . . . . . . . 255
6.2.1 IoT-Enabled Value Chains . . . . . . . . . . . . . . . . . . . . . . . . . 256
6.2.2 Smart ERP Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259
6.2.3 IoT Software Platforms . . . . . . . . . . . . . . . . . . . . . . . . . . . . 264
6.2.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 267
6.3 Towards the E-Society . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 268
6.3.1 Future Customer Relationship Management . . . . . . . . . . . . 268
6.3.2 The Future of Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 269
6.3.3 Learning for the E-Society . . . . . . . . . . . . . . . . . . . . . . . . . 269
6.4 Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 275
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285
1 The Web from Freshman to Senior in 20+ Years (that is, A Short History of the Web)

Since its inception in the early 1990s, the World-Wide Web (the Web for short) has
revolutionized our personal and professional lives and indeed our society more than
many other technological developments in recent history. In this first chapter, we
will outline the evolution of the Web from the early days until today, having just
turned 25 in August 2016.1 This involves taking a brief tour of the history of the
Web, during which we touch upon some of its underlying technological develop-
ments, which have enabled its evolution (and continue to do so). This relates to
hardware, software as well as computer networks and their rapid evolution during
the past 2.5 decades. From a usage perspective, we look at what we have experi-
enced over the past 25 years, primarily viewing the Web as an ever-growing and
omnipresent library of information which we access through search engines and
portals, the Web as a media repository facilitating the hosting and sharing of
resources—often for free, and the Web as an enabler of do-it-yourself services as
well as of disruptive developments in many established industries. In later chapters
we will also discuss the Web as a commerce platform through which people and
companies increasingly conduct their business. We will also look at the compre-
hensive and unprecedented user involvement in the Web. Here, it will not come as a
surprise that the younger generation of ‘digital natives’ today interacts with the Web
in an entirely different way than people did at its inception. In all of this, an
important role is played by data, i.e., the vast amounts of data that are produced on
the Web at increasing speeds and volumes. For some time now, the buzzword here
has been “big data,” although size is only a small part of the story.
These aspects, their impacts, and their results will take us through an evolution
that went from “freshman Web 1.0” which was mainly usable as a one-way
information repository, to “junior Web 2.0,” also termed the “Read-Write Web,”
where end-users started contributing content to the Web, to the situation we are
facing today: constant interaction between people, business, even government

1 www.computerhistory.org/atchm/happy-25th-birthday-to-the-public-web/


agencies through hyper-connected, high-speed networks that can be accessed through an ever-increasing variety of devices of multiple shapes and sizes, and the
question is whether we have reached some point of “graduation” after these years.

1.1 Beginnings

Transport yourself back to 1993, when the World-Wide Web, the WWW, or the
Web as we have generally come to call it, had just arrived. Especially in academia,
where people had been using the Internet since the late 1970s and early 1980s in
various ways and for various purposes including file transfer and email, it quickly
became known that there was a new service available on the Internet. Using this
new service, one could request a file written in a language called HTML (the
Hypertext Markup Language, see below), and if one had a program called a
browser installed on a local computer, that program was able to display or render
the HTML file when it arrived. In simple terms, the Internet had been transformed
from a scientific tool requiring expertise to use and being available only to a small
number of expert users, to an information discovery tool requiring little expertise
and now available to the mass market. We start our tour through the history of the
Web by taking a brief look at browsers.

1.1.1 Browsers: Mosaic and Netscape

One of the first browsers was Mosaic, developed by the National Center for
Supercomputing Applications (NCSA) at the University of Illinois in
Urbana-Champaign in the US. There had been earlier browser developments (e.g.,
Silversmith), but Mosaic was the first graphical browser which could display more
than just plain ASCII text (which is what a text-based browser does). The first
version of Mosaic had limited capabilities: It could access documents and data
using the Web, the File Transfer Protocol (FTP), or several other Internet services;
it could display HTML files comprising text, links, images (in different formats),
and already supported several video formats as well as Postscript; it came with a
toolbar that had shortcut buttons; it maintained a local history as well as a hotlist,
and it allowed the user to set preferences for window size, fonts, etc.
The initial version of Mosaic was launched in March 1993 and its final version in November of the same year. Although it was far from modern browsers with all their plug-ins and extensions, users soon began to recognize that there was a new “beast” out with which one could easily access information that was stored in
remote places. A number of other browsers followed, in particular Netscape Nav-
igator (later renamed Communicator, then renamed back to just Netscape) in 1994,
Microsoft Internet Explorer in 1995, Opera in 1996, Apple’s Safari in 2003,
Mozilla Firefox in 2004, and Google Chrome in 2008. Netscape and Microsoft soon
got into what is now known as the browser war (see Quittner and Slatalla 1998),
which at the time was won by Microsoft, although as we will see later, in technology nothing is permanent. The playing field has changed significantly in recent
years with Google Chrome now dominating the market with close to 60% market
share as of November 2016. Microsoft’s Internet Explorer, now replaced by
Microsoft Edge, Mozilla Firefox and Apple’s Safari are a significant distance
behind (see: www.w3counter.com/trends).
In mid-1994, Silicon Graphics founder Jim Clark started to collaborate with
Marc Andreessen to found Mosaic Communications (later renamed to Netscape
Communications). Andreessen had just graduated from the University of Illinois,
where he had been the leader of the Mosaic project. They both saw the great
potential for Web browsing software, and from the beginning Netscape was a big
success (with more than 80% market share at times), in particular since the software
was free for non-commercial use and came with attractive licensing schemes for
other uses. Netscape’s success was also due to the fact that it introduced a number
of innovative features over the years, among them the on-the-fly displaying of Web
pages while they were still being loaded; in other words, text and images started
appearing on the screen as they were downloading. Earlier browsers did not display
a page until everything that was included had been loaded, which had the effect that
users might have to stare at an empty page for several minutes and which caused
people to speak of the “World-Wide Wait.” With Netscape, however, a user could
begin reading a page even before its entire contents was available, which greatly
enhanced the acceptance of this new medium. Netscape also introduced other new
features (including cookies, frames, and later JavaScript programming), some of
which eventually became open standards through bodies such as the W3C, the
World-Wide Web Consortium (w3.org), and ECMA, the European Computer
Manufacturers Association (now called Ecma International, see www.ecma-
international.org). An image of Version 4 of the Netscape homepage of April
1999 with its “Netcenter” site collection can be found, for example, at blogoscoped.
com/archive/2005-03-23-n30.html; the main features included then were the menu
bar, the navigation, address, and personal toolbars, the status bar, or the component
bar.
Although free as a product for private use, Netscape was successful enough to
encourage Clark and Andreessen to take Netscape Communications public in
August 1995. As Dan Gillmor wrote in August 2005 in his blog: “I remember the
day well. Everyone was agog at the way the stock price soared. I mean, this was a
company with scant revenues and no hint of profits. That became a familiar concept
as the decade progressed. The Netscape IPO was, for practical purposes, the Big
Bang of the Internet stock bubble—or, to use a different metaphor, the launching
pad for the outrages and excesses of the late 1990s and their fallout. … Netscape
exemplified everything about the era. It launched with hardly any revenue, though it
did start showing serious revenues and had genuine prospects … .” Netscape was
eventually retired in 2008.

1.1.2 Client/Server and P2P

Mosaic already had basic browser functionality and features that we have meanwhile gotten used to, and it worked in the way we still use browsers today: the
client/server principle applied to the Internet. This principle is based on a simple
idea, illustrated in Fig. 1.1: Interactions between software systems are broken down
into two roles: Clients request services, servers provide them. When a client needs a
service such as database access, an e-mail to be sent or a print function to be
executed on its behalf, it sends a corresponding request to the respective server. The
server will then process this request, i.e., execute the access, message sending, or
printing, and will eventually send a reply back to the client.
This simple scheme has been used widely, and it is this scheme that interactions
between a browser and a Web server are based upon. A common feature of this
principle is that it often operates in a synchronous fashion: While a server is
responding to the request of a client, the client will typically sit idle and wait for the
reply; only when the reply has arrived, the client will continue whatever it was
doing before sending off the request. This form of interaction is often necessary; for
example, if the client is executing a part of a workflow which needs data from a
remote database, this part cannot be completed before that data has arrived. This
dependency has also been common in the context of the Web until recently, when
asynchronous interaction became more dominant.
The basics that led to launching the Web as a service sitting atop the Internet
were two quickly emerging standards: HTML, the Hypertext Markup Language,
and HTTP, the Hypertext Transfer Protocol. The former is a language, developed
by Tim Berners-Lee at CERN, the European particle physics lab in Geneva,
Switzerland, for describing Web pages, i.e., documents a Web server will store and
a browser will render. The latter is a protocol for transferring a request for a page
from a client to a Web server and for transferring the requested page in a reply back
to the browser. Thus, the client/server principle is also fundamental for the inter-
actions happening on the Web between browsers and Web servers (see Fig. 1.2).
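To make this request-reply cycle concrete, the following minimal sketch requests a page by its URL and waits for the HTML reply before continuing, roughly what a browser does behind the scenes. It is written in TypeScript and assumes a runtime with the standard fetch API (a modern browser or Node.js 18+); the URL is only a placeholder.

```typescript
// Minimal sketch of the request/reply cycle of Figs. 1.1 and 1.2:
// the client asks a Web server for the resource identified by a URL and
// waits for the reply before continuing.
async function requestPage(url: string): Promise<string> {
  const response = await fetch(url);   // send the HTTP request
  if (!response.ok) {
    throw new Error(`Server replied with status ${response.status}`);
  }
  return response.text();              // the reply: an HTML document
}

// Placeholder URL; a browser would now render the returned HTML.
requestPage("https://example.org/index.html")
  .then(html => console.log(html.slice(0, 200)))
  .catch(err => console.error("Request failed:", err));
```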
Over the years, HTML has become extremely successful as a tool that can be
employed even without a deep understanding of programming languages to put

Fig. 1.1 The client/server principle

Fig. 1.2 Client-side versus server-side scripting

information on the Web. The reasons for this include the fact that HTML is a highly fault-tolerant language, where markup errors are simply ignored, and the
availability of numerous tools, from simple text editors to sophisticated WYSI-
WYG (What You See Is What You Get) environments, for producing HTML
documents.
The client/server principle has over time undergone a generalization, since it has
also had an impact on how users see the Web and the information it provides.
Initially, when the Web first appeared and HTML became available as a markup
language for Web pages, people composed their HTML code in a text editor, a way
that still works today. A few years later, tools became available for designing Web
pages and for setting up Web sites more and more easily. Some of these simply
allowed users to design HTML documents and to include links, graphics, maybe
even audio and video in a WYSIWYG fashion, others allowed for an easy man-
agement of entire Web sites comprised of multiple pages. The modern results of this development are content management systems (CMS), which underpin most major
Web sites today, in particular those maintained at an enterprise level (Barker 2016).
What is more important is the fact that over time more and more people started
setting up sites using these tools, and the obvious consequence was that the
information available on the Web grew exponentially. Once a site had been created,
the next important issue was to make it easy for users to locate the site, for which the emerging class of search engines provided registration mechanisms, sometimes for
free, although increasingly with a fee. This also led to the development of tricks
that, for example, faked high popularity of a site just in order to get a high (i.e.,
close to the top) placement within search results. Besides text documents, people
soon started to place other types of documents on the Web, in particular media such
as image, audio, and video files. Now every Web user is likely to have experienced
how easy it is to save (actually copy) an image found in an HTML document: just
right-click on the image and select the “save image as” option! Similarly, audio and
video files can easily be downloaded and copied to a local computer, as long as
access to these files is granted. Because obtaining information from the Web became so easy, and because of the sheer number of files available on it, the way was paved for a new attitude towards information and its consumption.
It soon turned out that the traditional client/server model behind the Web was
less than optimal for some interactions, including the download or streaming of
large files, e.g., a video file that contains a 90-minute movie. Video streaming is not
just a matter of bandwidth; it is also a matter of a single server being occupied with
a large request for quite some time. In response to this problem, peer-to-peer (P2P)
networks were devised, which bypassed the need for a central server to take care of
all incoming requests. Instead, a P2P network primarily relies on the computing
power and bandwidth of its participants and is typically used for connecting nodes
via mostly ad-hoc connections. A P2P network also does not distinguish between a
client and a server; any participant in the network can function as either a client or a
server to the other nodes of the network, as needed by the task at hand. In fact, a
P2P system comes with complete and decentralized self-management and resource
usage, and it enables two or more peers to collaborate spontaneously in a network
of equals (peers) by using appropriate information and communication systems.
As mentioned, one of the many uses for P2P networks is the sharing of large
files, which is today done on a large scale on the Internet. The different P2P systems
in use are based on distinct file-sharing architectures with different principles,
advantages, disadvantages and naming conventions. One of the consequences of
developments like P2P networks, communication protocols, and other tools just
described, has been that information and files started to become available on the
Web which previously had been relatively difficult and costly to acquire. We are not
delving into legal issues related to services like Napster, Kazaa or Mega Upload
here or into general issues related to copyright or intellectual property rights and their protection. However, the fact is that many users around the globe have started using the
Internet and the Web as a free source for almost everything. For example, once the
mp3 format had been invented as a digital audio encoding and compression format
by Fraunhofer’s Institut für Integrierte Schaltungen in Erlangen, Germany, music
got transformed into mp3 format en masse and then could be copied freely between
computers and other devices. As a result, users started illegally “ripping” music
CDs and exchanging their content over the Internet; others then took videos of
recently released movies with a camcorder in a cinema, compressed them into a
suitable video format, and put them on a file-sharing network for general copying.
Today, the most prominent site for online video is YouTube, from where music and
video can be streamed. As an aside, we direct the reader interested in how the music
business changed as a consequence of the above to www.digitalmusicnews.com/
2014/08/15/30-years-music-industry-change-30-seconds-less/.

1.1.3 HTML and XML

In parallel to hardware becoming a commodity over the last 40 years, software development has also become much less expensive. In particular, the open-source
world has brought along a considerable number of tools through which software
development is supported today. An early manifestation was the “LAMP” model of
a service stack with its four open-source components Linux operating system,
Apache HTTP server, MySQL database system, and PHP programming language.
Over time, one or more of these components were exchanged for others, and the current incarnation is represented by the “XAMPP” solution stack package, with XAMPP standing for Cross-Platform (X), Apache HTTP server (A), MariaDB database system (M), and interpreters for scripts written in either PHP (P) or Perl (P). It
was developed by Apache Friends.2
We have already mentioned the arrival, together with the Web, of HTML, the
predominant markup language for the creation of Web pages. HTML provides
means to structure text-based information in a document by denoting headings,
tables, paragraphs, or lists, and to supplement that text with forms, images, links,
and interaction. As mentioned, the language was originally developed by Tim
Berners-Lee in the context of his creation of the Web (Berners-Lee 2000), and it
became popular through the fact that it is easy to use. An HTML document can
quickly be set up using just a few structuring elements, called tags. Tags have
to follow some simple syntactical rules and are often used to describe both content
and presentation of a document. The separation of presentation and content became
an issue when Web pages started to be rendered on more and more devices,
including computer terminals, laptop screens, and smartphone or tablet displays,
since each device has its own capabilities, requirements, and restrictions. It has also
become important due to the fact that HTML is increasingly generated dynamically
by applications, rather than being stored as static files. For example, an online
database will rely on the assumption that no layout information needs to be stored
for its content, but that this information will be added once its content is being
accessed for display. In HTML, presentation can be specified within a document or
separately within a cascading style sheet (CSS) file.
HTML tags are all predefined, and although there are ways to include additional
tags (for example, through the embedding of scripting-language code), tags can
generally not be defined by the individual user. This is different in XML (Harold
and Means 2004; Key 2015), the Extensible Markup Language, a W3C recom-
mendation for a general-purpose markup language that supports a wide variety of
applications and that has no predefined tags at all. Markup languages based on the
XML standard are easy to design, and this has been done for such diverse fields as
astronomy, biochemistry, music, or mathematics and for such distinct applications
like voice or news. XML-based languages are also reasonably human-readable,
since the tags used can be chosen in such a way that they relate to the meaning of
the particular portion of the document that they enclose. XML is a simplified subset

2 www.apachefriends.org

of the Standard Generalized Markup Language (SGML) and is widely used in information integration and sharing applications, in particular as they arise on the
Internet. Any XML-based language should have an associated syntax specification,
which can take the form of a document type definition (DTD), or of an XML Schema
Definition (XSD), which specifies a schema roughly in the style and detail of
structure and type declarations found in programming languages or database
schema languages.
XML has had an impact on HTML in that it has brought along XHTML, a
version of HTML that follows the same strict syntax rules as XML. More impor-
tantly, XML has become a universal enabler for a number of applications on the
Web. For example, e-commerce sites use XML-based language intensively for
document exchange or integration. Examples include GS1 (www.gs1.org/), and
ebXML (Electronic Business using eXtensible Markup Language, see www.ebxml.
org), platforms which provide standardized XML documents for e-commerce items
such as orders, invoices, etc.
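As a small illustration of user-defined tags, the following sketch parses a hypothetical XML order document with the browser's standard DOMParser and reads its elements. The tag names (order, item, quantity) are invented for this example and are not taken from GS1 or ebXML; the code is TypeScript run in a browser context.

```typescript
// A small, invented XML order document with user-defined tags.
const orderXml = `
  <order id="4711">
    <customer>Jane Doe</customer>
    <item sku="CD-0815"><title>Greatest Hits</title><quantity>2</quantity></item>
    <item sku="CD-0816"><title>Live in Concert</title><quantity>1</quantity></item>
  </order>`;

// Parse the document with the browser's DOMParser and read the elements.
const orderDoc = new DOMParser().parseFromString(orderXml, "application/xml");
for (const item of Array.from(orderDoc.querySelectorAll("item"))) {
  const title = item.querySelector("title")?.textContent;
  const quantity = item.querySelector("quantity")?.textContent;
  console.log(`${quantity} x ${title} (sku ${item.getAttribute("sku")})`);
}
```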
As mentioned, an HTML document is allowed to have scripting code embedded.
This arose out of the necessity to make Web pages dynamic as well as interactive.
Indeed, often when users are asked for input, that input needs to be checked for
correctness and completeness, or it needs to be sent off to a server for verification
and most likely storage in a database. Moreover, the response a Web server creates
upon the arrival of user input may have to be generated dynamically, e.g., to
acknowledge or reject the input, in which case HTML needs to be created on the
fly. To this end, an important distinction refers to the question of whether scripting
occurs at the client side or at the server side, see Fig. 1.2. Client-side scripting, very
often seen in the form of JavaScript, makes use of the fact that a browser can not
only render HTML pages, but also execute programs. These programs, which have
to be written in a script language, will be interpreted just like HTML code in
general. Thus, some of the tasks arising in a Web site can be off-loaded onto the
client. On the other hand, certain things cannot be done at the client side, in
particular when access to a database on the Web is needed. With server-side
scripting using, for example, the PHP language, user requests are fulfilled by
running a script directly on the Web server to generate dynamic HTML pages; it
can be used to provide interactive Web sites that interface to databases or other data
stores as well as local or external sources, with the primary advantage being the
ability to customize the response based on a user’s requirements, access rights, or
query results returned by a database.
While PHP is primarily used on Web servers, there are other languages, origi-
nally used for other purposes, which have over time been extended to also support
server-side scripting functionality. For example, Java, the popular programming
language with its Enterprise Edition platform has constructs such as Servlets or
JavaServer Faces which allow HTML pages to be generated by Java applications
running on the server (for details, see www.oracle.com/technetwork/java/index.
html).
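For illustration only, the following sketch shows the basic idea of server-side generation of dynamic HTML: the server assembles the reply for each request instead of returning a static file. The chapter describes this with PHP and Java; the sketch below uses TypeScript on Node.js purely to keep all examples in one language, and the port and query parameter are arbitrary choices.

```typescript
import { createServer } from "node:http";

// Server-side sketch: the HTML reply is generated on the fly for each
// request rather than read from a static file (the chapter describes this
// with PHP; Node.js is used here only for illustration).
createServer((req, res) => {
  const name =
    new URL(req.url ?? "/", "http://localhost").searchParams.get("name") ?? "visitor";
  const html = `<html><body>
    <h1>Hello, ${name}!</h1>
    <p>This page was generated at ${new Date().toISOString()}.</p>
  </body></html>`;
  res.writeHead(200, { "Content-Type": "text/html" });
  res.end(html);
}).listen(8080);  // e.g., http://localhost:8080/?name=Alice
```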
Both client-side and server-side scripting are based on the client/server paradigm
and on the fact that any such interaction so far has been assumed to be synchronous.

In order to enhance Web programming even further, a recent idea has been to not
only allow HTML creation or modification on the fly (“dynamically”), but to be
able to provide direct feedback to the user via on-the-fly HTML generation on the
client. This, combined with asynchronous processing of data which allows sending
data directly to the server for processing and receiving responses from the server
without the need to reload an entire page, has led to a further separation of user
interface logic from business logic now known by the acronym Ajax (Asyn-
chronous JavaScript and XML). Ajax was one of the first Web development
techniques that allow developers to build rich Web applications that are similar in
functionality to classical desktop applications, yet they run in a Web browser. Its
main functionality stems from an exploitation of XMLHttpRequest, a JavaScript
class (with specific properties and methods) supported by most browsers which
allows HTTP requests to be sent from inside JavaScript code. Ajax calls are pro-
vided by popular JavaScript libraries such as jQuery or AngularJS.
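A minimal sketch of this Ajax pattern, using the XMLHttpRequest class mentioned above (TypeScript in a browser; the URL and element id are placeholders invented for the example), might look as follows: the request is sent asynchronously, the script continues immediately, and a callback inserts the server's reply into part of the page once it arrives.

```typescript
// Ajax-style partial page update: the data is requested asynchronously and
// inserted into one element of the page, without reloading the whole page.
function loadIntoPage(url: string, elementId: string): void {
  const xhr = new XMLHttpRequest();
  xhr.open("GET", url);               // asynchronous by default
  xhr.onload = () => {
    if (xhr.status === 200) {
      const target = document.getElementById(elementId);
      if (target) {
        target.innerHTML = xhr.responseText;  // update only this fragment
      }
    }
  };
  xhr.send();
  // Execution continues immediately; the callback runs when the reply arrives.
}

loadIntoPage("/fragments/news.html", "news-panel");  // placeholder URL and id
```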
Out of the numerous applications XML has seen to date, we just mention Web
services, which extend the client/server paradigm by the notion of a registry,
thereby solving the problem of locating a service in a way that is appropriate for the
Web. In principle, they work as follows: A service requestor (client) looking for a
service sends a corresponding query to a service registry. If the desired service is
found, the client can contact the service provider and use the service. The provider
has previously published his service(s) in the registry. Hence, Web services hide all
details concerning their implementation and the platforms they are based on; they
essentially come with a unique URI3 that points to their provider. Since Web
services are generally assumed to be interoperable, they can be combined with other
services to build new applications with more comprehensive functionality than any
single service involved. It has to be noted, however, that this appealing concept has
been obscured by the fact that vendors have often insisted on proprietary registries,
thereby hindering true interoperability.
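The publish-and-lookup idea behind such registries can be illustrated with a toy registry kept in memory; the service name and provider URI below are invented, and real registries are of course far more elaborate (descriptions, interfaces, quality-of-service data, and so on). A minimal TypeScript sketch:

```typescript
// Toy in-memory registry: providers publish a service description under a
// name, requestors look it up and then contact the provider directly.
interface ServiceDescription {
  name: string;
  providerUri: string;   // where the provider can be reached
}

const registry = new Map<string, ServiceDescription>();

function publish(service: ServiceDescription): void {
  registry.set(service.name, service);           // provider side
}

function lookup(name: string): ServiceDescription | undefined {
  return registry.get(name);                     // requestor side
}

publish({ name: "currency-conversion", providerUri: "https://provider.example/convert" });
const found = lookup("currency-conversion");
if (found) {
  console.log(`Contact the provider at ${found.providerUri}`);
}
```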

1.1.4 Commerce on the Web

Roughly during the mid-90s people started thinking about ways to monetize the
Web and discovering that there is also a commercial side to the Web. We have
already mentioned the Netscape IPO, but commercialization was and is not just
about buying (and eventually selling) Internet companies.
A first step towards commercialization has been to attract user attention and,
once obtained, to retain it. A popular approach has been to require registration in
exchange for access to additional features or services, or even to the site at all.
Without being historically precise about the order in which this has occurred,
examples include Amazon.com, which let users create their personal wish list after

3 A URI is a Uniform Resource Identifier or character string used to identify a resource. The most common form of a URI is the Uniform Resource Locator (URL).

logging in, as well as (former) Yahoo!,4 Google, or Facebook. Once you have
registered for an account at any of these or many other sites, you may be allowed to
use storage space, communicate with other people, or set your personal preferences.
Sites such as Dropbox or zipcloud allow registered users to upload files, invite other participants to access their files, or use a free email or messaging service, etc.
What you may have to accept as a kind of compensation is that advertisements will
be placed on the pages you look at, next to the results of searches you do on that
site, or be sent to your email account from time to time. As we will discuss in more
depth later, advertising on the Web has become one of the most prominent Internet
business models, and the idea of “free” sites just described turns out to be a highly
attractive advertising channel. Clearly, the more people register at a site, i.e., reveal
some of their personal data and maybe even a user profile of preferences and
hobbies, the more data the site owner will have available and the more he can do
with it. Experience also shows that people do not re-register for similar service
functionality from distinct providers too often. Thus, there is some form of cus-
tomer retention right away, and it is then often just a small step to start offering these customers a little extra service for which they then, however, have to pay.
Commercialization of the Web has in particular materialized in the form of
electronic commerce, commonly abbreviated e-commerce, which involves moving
a considerable amount of shopping and retail activity essentially from the street to
the Web or from the physical to a virtual world. More generally, e-commerce refers
to selling goods or services over the Internet or over other online systems, where
payments may be made online or otherwise. It was typically during the weeks before Christmas that the success as well as the growth of e-commerce could best be measured each year. In the beginning, customers were reluctant to do
electronic shopping, since it was uncommon, it was not considered an “experience”
as it may well be when strolling through physical shops, and it was often considered
unreliable and insecure; initially, paying by credit card over the Web was even a
“no-go.” Many companies entering this new form of electronic business were not
ready yet, unaware of the process modifications they would have to install in their
front- and back-offices, and unfamiliar with the various options they had from the
very beginning. Major obstacles in the beginning also were lacking security, in
particular when it came to payments over the Web, and lacking trust, in particular
when it came to the question of whether goods I had paid for would indeed be
delivered to me. The global nature of the Web also resulted in a range of geo-
graphical, legal, language, and taxation issues being uncovered. As a consequence,
e-commerce took off slowly in the very beginning. However, the obstacles were
soon overcome, for example by improvement in hardware and software (e.g.,
session handling), by appropriately encrypting payment information, by corre-
sponding measures from credit card companies, or by the introduction of trusted third parties, such as PayPal, for handling the payment aspects of sales transactions.
Regional issues were addressed by way of “localized” mirror sites. Then, towards
the end of the 20th century, e-commerce started flying, with companies such as

4 which was renamed Altaba in early 2017 after selling its Web business.

CDnow and Amazon.com, later also eBay, and sales figures soon went beyond the
billion-dollar threshold. Today, “brick-and-mortar” retail chains such as Walmart, Target, or Costco also make a considerable part, if not most, of their revenues online,
in addition to the numerous stores they run in a traditional way.
However, it was also discovered that e-commerce and selling over the Web was
not the only way of making money on or through the Web. Indeed, another was placing advertisements and, ultimately, introducing paid clicks. Besides all this is,
of course, the telecommunication industry, for which technological advances such
as the arrival of DSL or wireless networks brought entirely new business models for
both the professional and the private customer.
CDnow is a good example of how setting up a new type of business on the Web
took off. CDnow was created in August 1994 by brothers Jason and Matthew Olim,
roughly at the same time Jeff Bezos created Amazon.com. As they describe in Olim
et al. (1999), their personal account of the company, it was started in the basement
of their parents’ home; Jason became the president and CEO and Matthew the
Principal Software Engineer. The company was incorporated in Pennsylvania in
1994 and originally specialized in selling hard-to-find CDs. It went public in
February 1998, and after financial difficulties eventually merged with Bertelsmann,
the big German media company, in 2000. CDnow became famous for its unique
internal music rating and recommendation service, which was also often used by
those who had never actually purchased a product on the site. In late 2002,
Amazon.com began operating the CDnow web site, but discontinued CDnow’s
music-profiling section.
What the Olim brothers detected early on was that the Web offered a unique
chance to provide not only basic product information, but highly specialized
information that previously had required an enormous amount of research to come
by. This is what they provided for music on CDs, and they combined their infor-
mation and catalogue service with the possibility to buy CDs directly from them. At
some point and for a short period of time, CDnow was probably the best online
store for music, as it was able to integrate so much information on a CD, on an
artist, or on a group in one place and in so many distinct categories. Their selection
was enormous, and most of the time whatever they offered could be delivered
within days. They also ran into problems that nobody had foreseen in the very
beginning, for example that customs fees often had to be paid when a package of CDs was delivered to an addressee in a foreign country. In other words, legal issues
related to commerce over a network that does not really have physical boundaries,
came up in this context (as it did for any other shop that now started selling
internationally), and many of these issues remain unresolved today. We note that
this issue does not apply in most parts of the world, for non-physical goods, which
can be distributed electronically and for which these issues do not exist. Apple’s
iTunes is currently by far the most popular service for distributing music
electronically.
We will return to the topic of e-commerce and several of its distinctive features
in Chap. 3.

1.2 The Search Paradigm

The key to what made the Web so popular early on is the fact that a Web page or an
HTML document can contain hyperlinks, or links for short, which are references to
other pages (or other places in a current page). The origin of this is hypertext, an
approach to overcome the linearity of traditional text that was originally suggested
by Vannevar Bush in an essay entitled As We May Think which appeared in The
Atlantic Monthly in July 1945 (see www.theatlantic.com/magazine/archive/1945/
07/as-we-may-think/303881/). Selecting a link that appears in a given HTML
document causes the browser to send off a request for the page whose address is
included in the link (or, if the link points to another place in the current page, to go to that position); this page will then be displayed next.

1.2.1 The Web as a Graph

Figure 1.3 presents a simplistic, graphical portrayal of what the intuition just outlined can mean; the Web is a large collection of hyperlinked documents and can be
perceived, from a more technical point of view, as a directed graph in which the
individual pages or HTML documents are the nodes, and in which links leading
from one page to another (or back to the same page) are the (directed) edges.


Fig. 1.3 Navigation through the Web along hyperlinks



Figure 1.3 hence shows a very small and finite sample of nodes and links only (but
it can easily be extended in any direction and by any number of further nodes and
edges).
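As a toy illustration of this graph view, the following TypeScript sketch models a handful of pages (the names and links are invented) as an adjacency list and follows hyperlinks from a start page, collecting every page that can be reached, which is essentially what a navigating user, or a crawler, does.

```typescript
// Toy model of the Web as a directed graph: each page (node) maps to the
// pages its hyperlinks (directed edges) point to.
const webGraph = new Map<string, string[]>([
  ["a.example/index", ["a.example/about", "b.example/home"]],
  ["a.example/about", ["a.example/index"]],
  ["b.example/home", ["c.example/start"]],
  ["c.example/start", []],
]);

// Follow links breadth-first from a start page; visited pages are recorded
// so that cycles do not lead to endless navigation.
function reachablePages(start: string): string[] {
  const visited = new Set<string>([start]);
  const queue = [start];
  while (queue.length > 0) {
    const page = queue.shift()!;
    for (const linked of webGraph.get(page) ?? []) {
      if (!visited.has(linked)) {
        visited.add(linked);
        queue.push(linked);
      }
    }
  }
  return Array.from(visited);
}

console.log(reachablePages("a.example/index"));  // all pages reachable by links
```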
Links in HTML are technically anchors, which are typically composed of a name (that will show up in the document where the link is placed) and a URL, a Uniform Resource Locator or logical address of a Web page. When a user clicks
on the link, the browser will contact the Web server behind that URL (through
common network protocols which, among other things, will ensure name resolu-
tion, i.e., translate the URL into the physical IP address of the computer storing the
requested resource, through various steps of address translation) and request the
respective HTML document (cf. Fig. 1.2). Links allow a form of navigation
through the Web, the idea being that if something that a user is looking for is not
contained in the current page, the page might contain a link to be followed for
getting her or him to the next page, which may in turn be more relevant to the
subject in question, or may contain another link to be followed, and so on. Links,
however, need not necessarily point to other pages (external links), but can also be
used to jump back and forth within a single page (internal links) or they can link to
different types of content (e.g., images, videos).
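As a small illustration of how such anchors can be processed programmatically, the following sketch uses the html.parser module from Python's standard library to extract link targets and link texts from a made-up HTML fragment containing one external and one internal link:

from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    # Collects (URL, link text) pairs from <a href="..."> anchors.
    def __init__(self):
        super().__init__()
        self.links = []        # list of (href, text) tuples
        self._href = None      # href of the anchor currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")

    def handle_data(self, data):
        if self._href is not None:
            self.links.append((self._href, data.strip()))
            self._href = None

html = ('<p>See the <a href="https://www.w3.org/">W3C</a> for standards, '
        'or jump to the <a href="#references">references</a> below.</p>')

parser = LinkExtractor()
parser.feed(html)
print(parser.links)   # one external link (full URL) and one internal link (#fragment)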
From a conceptual point of view, it is fair to say that the Web is a very large and
dynamic graph in which both nodes and edges come and go. Moreover, parts of the
Web might be unreachable at times due to network problems, or Web designers
may add new pages with links and from time to time remove old ones. As a
consequence, looking up information on the Web typically relies on exploration,
i.e., a progression along paths or sequences of nodes without predetermined targets.
This is where the activity of search comes in. In the early days of the Web,
automated tools for exploration and search had not yet been developed; instead
these activities were often done manually by an information broker. While the
information broker as a job description has lost relevance over the years due to the
arrival of automated tools, an important aspect is still around today, that of price
comparisons. Indeed, comparing prices over the Web has become an important
activity, for both companies and individual users, and is a form of information
brokering still available today through companies or sites such as DealTime,
mySimon, BizRate, Pricewatch, or PriceGrabber, to name just a few.

1.2.2 Search Engines

Search engines are today’s most important tool for finding information on the Web,
and they emerged relatively soon after the Web was launched in 1993. Although “to
search” the Web is nowadays often identified with “to google” the Web (see
searchenginewatch.com for stats about which search engine gets how much traffic),
Google was a relative latecomer, and will most likely not be the last. The early days were dominated by the likes of Excite (launched in 1993) and Yahoo, Webcrawler, Lycos, and Altavista, all of which came into being in 1994. Google, however, has dominated the search field ever since its launch in the fall of 1998, and it has invented many tools and services now taken for granted. In fairness, we
mention that InfoSeek, AlltheWeb, Ask, Vivisimo, A9, Wisenut, Windows Live
Search, as well as others, have all provided search functions, at some point in time,
over the past 25 years.
Search has indeed become ubiquitous. Today people search from the interface of
a search engine, and then browse through an initial portion of the often thousands or
even millions of answers the engine returns. Search often even replaces entering a
precise URL into a browser. In fact, search has become so universal that Battelle
(2005) speaks of the Database of Intentions that exists on the Web: It is not a
materialized database stored on a particular server, but “the aggregate results of
every search ever entered, every result list ever tendered, and every path taken as a
result.” He continues to state that the Database of Intentions “represents a real-time
history of post-Web culture—a massive click stream database of desires, needs,
wants, and preferences that can be discovered, subpoenaed, archived, tracked, and
exploited for all sorts of ends.” Search not only happens explicitly, by referring to a
search engine; it also happens to a large extent inside other sites, for example within
a shopping or an auction site where the user is looking for a particular category or
product; also most newspaper sites provide a search function that can be used on
their archive. As a result, a major portion of the time presently spent on the Web is
actually spent searching, and Battelle keeps up to date with developments in this
field in his search blog on the topic (see battellemedia.com). Notice that the search
paradigm has meanwhile become popular even in the context of file system orga-
nization or e-mail accounts, where a search function often replaces a structured, yet
potentially huge and confusing way of organizing content (see also the end of this
section).
From a technical perspective, a search engine is typically based on techniques
from information retrieval (IR) and has three major components as indicated in
Fig. 1.4: A crawler, an indexer, and a runtime system. The crawler explores the
Web as indicated above and constantly copies pages from the Web and delivers
them to the search engine provider for analysis. Analysis is done by an indexer
which extracts terms from the page using IR techniques and inserts them into a
database (the actual index). Each term is associated with the document (and its
URL) from which it was extracted. Finally, there is the runtime system that answers
user queries. When a user initiates a search for a particular term, the indexer will
return a number of pages that may be relevant. These are ranked by the runtime
system, where the idea almost always is to show “most relevant” documents first,
whatever the definition of relevance is. The pages are then returned to the user in
that order.
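The following Python sketch mimics the indexer and the runtime system in a toy fashion: it builds an inverted index over a handful of invented pages and answers a single-term query, ranking pages simply by how often the term occurs (a real engine uses far more elaborate relevance measures):

from collections import defaultdict

# Toy document collection (URL -> page text), as a crawler might deliver it.
pages = {
    "a.example/cd":   "buy music cd online music store",
    "b.example/news": "music news and concert reviews",
    "c.example/shop": "online store for books",
}

# Indexer: build an inverted index mapping term -> {URL: term frequency}.
index = defaultdict(dict)
for url, text in pages.items():
    for term in text.split():
        index[term][url] = index[term].get(url, 0) + 1

# Runtime system: look the query term up and rank pages by term frequency.
def search(term):
    postings = index.get(term, {})
    return sorted(postings, key=postings.get, reverse=True)

print(search("music"))   # ['a.example/cd', 'b.example/news']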
A crawler commonly revisits Web pages from time to time, in order to keep its
associated index up-to-date. Thus, a search query will typically return the most
recent version of a Web page. If a user is interested in previous versions or wants to
see how a page has evolved over time (if at all), the place to look is the Wayback
Machine at the Internet Archive (see www.archive.org/web/web.php), which has
been crawling the Web on a daily basis ever since 1996.

Fig. 1.4 Anatomy of a search engine (crawler, indexer, index/database, and runtime system)

The popularity of Google grew out of the fact that they developed an entirely
new algorithmic approach to search. Before Google, the emphasis was on locating just any site whose content was related to or contained a given search term. To this end, search engine builders constructed indexes of Web pages and often just stored the respective URLs. As an answer to a query, a user would be returned a list of URLs which he or she then had to work through manually. Google co-founder
Larry Page came up with the idea that not all search results could be equally
relevant to a given query, but unlike the information broker, who can exploit his or
her expertise on a particular field, an automated search engine needs additional
ways to evaluate results. What Page suggested was to rank search results, and he
developed a particular algorithm for doing so; the result of that algorithm applied to
a given page is the PageRank, named after its inventor. The PageRank of a page is
calculated using a recursive formula (see infolab.stanford.edu/*backrub/google.
html for details), whose underlying idea is simple: Consider a doctor. The more
people that recommend the doctor, the better he or she is expected to be. It is similar
with ranking a Web page: The more pages linked to a page p, the higher the rank of
p will be. However, the quality of a doctor also depends on the quality of the
recommender. It makes a difference whether a medical colleague or a salesperson
for the pharmaceutical industry recommends her or him. If the doctor is recommended by another doctor, that recommendation might count as 100%; a recommendation from a nurse without comprehensive medical education might
count only as 60%, that from a patient 30%, and that from the salesperson, possibly
having an interest completely disjoint from that of the doctor, might count 0%. The
principle behind this, also found, for example, in classical scientific citations, is thus
based on the idea of looking at the links going into a page p in order to calculate the
rank of p, but to do so by recursively ranking all pages from which these incoming
links emerge. The idea was first explored while Google founders Sergey Brin and
Larry Page worked on a project called BackRub at Stanford University; over the
years, Google has added other criteria for constructing the order in which search
results are presented to the user besides PageRank.
Although we are not discussing the details of a PageRank calculation here, we
mention that it involves iterated matrix-vector multiplications in huge dimensions
(potentially in the millions or even billions), a fact that gave rise to a quest for
efficient ways to organize such computations on parallel hardware and ultimately to
the map-reduce paradigm that we will discuss in Chap. 2.
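To give a flavour of such a computation without any large matrices, the following sketch runs the basic power iteration behind PageRank on a tiny four-page graph; the damping factor of 0.85 and the fixed number of iterations are conventional textbook choices, not specifics of Google's production system:

# Minimal PageRank by power iteration on a toy graph (damping factor 0.85).
links = {            # page -> pages it links to
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}
pages = list(links)
n = len(pages)
rank = {p: 1.0 / n for p in pages}     # start from a uniform distribution
d = 0.85                               # damping factor

for _ in range(50):                    # iterate until (roughly) converged
    new_rank = {}
    for p in pages:
        # rank flowing into p from every page q that links to p
        incoming = sum(rank[q] / len(links[q]) for q in pages if p in links[q])
        new_rank[p] = (1 - d) / n + d * incoming
    rank = new_rank

print({p: round(r, 3) for p in rank})  # page C, with the most incoming links, ranks highest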
The comparison between an information broker, who can apply intuition, human
expertise and reasoning, as well as experience and domain knowledge to search
results in order to distinguish good and bad ones, and a search engine, which has to
do all of this based on some form of artificial intelligence, points to the fact that
even beyond Google there is room for improvement in search. Suppose you search
for the term “light”; are you looking for something not heavy or for something not
dark? A ranking algorithm may learn from the user as a search progresses. The
original PageRank approach has been extended and modified in many ways over
the years, with Edge-Weighted Personalized PageRank by Xie et al. (2015) being
among the latest developments. Besides keyword-based search, other ideas for
search have developed around personal histories, text search, topic interpretation,
word associations, or taxonomies. For the latest information, the reader should consult
searchenginewatch.com/. To take a wider view on search engine developments,
optimization, and marketing, we suggest looking at www.webmasterworld.com/.

1.2.3 The Long Tail

Anyone interested in statistics about searching should consult, for example, Goo-
gle’s Zeitgeist (at www.google.com/zeitgeist), which keeps rankings of the most popular search terms in the recent past. Other statistics may be obtained from places
like Nielsen//Netratings or the aforementioned SearchEngineWatch. What people
have observed by looking at these figures is, among other things, that a few queries
have a very high frequency, i.e., are asked by many people and pretty often, but the
large majority of queries have a considerably lower frequency; a possible appli-
cation of the 80/20 rule. When plotted as a curve, where the x-axis represents a list
of (a fixed number of) queries, while the y-axis indicates their frequency, the graph
will look like the “hockey stick” shown in Fig. 1.5. Graphs of this form follow a

Fig. 1.5 The long tail (of search queries): query frequency plotted against popularity, with the top 20% forming the head and the rest forming the long tail

power-law type of distribution: They exhibit a steep decline after an initial, say,
20%, followed by a massive tail into which the curve flattens out. Power laws can
be found in many fields: the aforementioned search term frequency, book sales, or
popularity of Web pages. Traditionally, when resources are limited, e.g., space on a
book shelf or time on a TV channel, the tail gets cut off at some point. The term
long tail is used to describe a situation where such a cutting off does not occur, but
the entire tail is preserved. For example, there is no need for search engine pro-
viders to disallow search queries that are only used very infrequently.
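A minimal way to see the shape of Fig. 1.5 for oneself is to generate a Zipf-like frequency distribution and compare the head with the tail; the numbers below are purely illustrative and not actual query statistics:

# Illustrative power-law (Zipf-like) query frequencies: frequency ~ 1 / rank.
num_queries = 1000
freq = [10000 / rank for rank in range(1, num_queries + 1)]

head = sum(freq[: num_queries // 5])    # the top 20% most popular queries
tail = sum(freq[num_queries // 5 :])    # the long tail: the remaining 80%
total = head + tail

print(f"head: {head / total:.0%} of all searches, tail: {tail / total:.0%}")
# The head dominates, but the tail still accounts for a sizeable share -
# which is exactly why it is worth serving rather than cutting off.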
As we will see as we go along, the long tail phenomenon can be observed in a
variety of contexts related to the Internet and the Web. For example, it applies to
electronic commerce, where the availability of cheap and easy-to-use technology
has enabled a number of companies to take advantage of the broader reach provided
by the Internet which otherwise would not have considered entering this arena.
Think, for example, of shops selling high-end watches or cameras. It also applies in the context of online advertising.

1.2.4 Directories

Yahoo! and AOL were among the first to recognize that the Web, following the explosion in the number of pages that occurred in the mid-90s, needed some form of organization, and they responded by creating directories containing categorizations of
Web site content and pathways to other content. These were hierarchically orga-
nized catalogues of other sites, and many of them were later developed into portals.
A portal can be seen as an entry point to the Web or a pathway to Web sources that
has a number of topical sections that are owned and managed by the main site and

Fig. 1.6 DMOZ categories: Arts, Business, Computers, Games, Health, Home, News, Recreation, Reference, Regional, Science, Shopping, Society, Sports, and Kids & Teens

that typically provides some personalization features (e.g., choice of language). As a typical directory page, the home page of the Open Directory Project, also known as Directory Mozilla (DMOZ), with categories as shown in Fig. 1.6, can be found at dmoz.org/. Each such category will typically contain a list of subtopics, possibly together with the current number of hits in each.
Directories and portals pre-date search engines, and their developers originally
did not accept the idea that search was a necessary tool for the Web. The reason
behind this is easily identified as being commercial: If a portal has sites included in
its categories and places banner ads on pages, it will be interested in many people
using the portal and its listings, so that the ads can drive home revenue. But this will
only work if traffic is not directed to other sites, which may not be listed in the
portal, by a search engine. In other words, directories and portals were originally
afraid that search engines would take away too much of the traffic that would otherwise
reach them. From today’s perspective, the original portal idea of providing a single
point of access to the Web still exists, yet usability has attained much greater
importance over the years and, as a result, the appearance of portals has changed
considerably.
Directories and portals can not only be seen as an alternative (or competition,
depending on the point of view) to search engines, where potential search results
have been categorized in advance; they are often also highly specialized, for
example towards a particular business branch or interest. Examples in the travel
industry include travel booking portals such as Travelocity, Hotels, Expedia, Trivago, or Booking.com; an example in a scientific area is dblp.org, the comprehensive electronic library for computer science. Thus, portals have actually lost little of their
popularity over the past 2.5 decades, and indeed new portals are still being launched
today.
We close this section by mentioning that search as a paradigm (and as a
“competition” to directory-based forms of organization) has been successful, way
beyond what was anticipated in the original application of search engines and the
Web. It is nowadays common, for example, to organize an e-mail mailbox no longer using folders, but to keep all mail in one place and apply search whenever a specific message is required. Personal notebooks frequently allow tags to be
attached to entries, which essentially are keywords that can be searched for, in case
the user needs to look at a particular note; the activity of placing tags is corre-
spondingly called tagging. The same paradigm is used in tools like Evernote,

Google Keep, or Ubernote, to name just a few. The possibility for users to attach
tags to documents originally marked the arrival of user input to the Web and
coincided with the opportunity for customers to place reviews or ratings on an
e-commerce site. For e-commerce, ratings and comments have been shown to have
a major impact on revenues a seller may be able to obtain, and that is no surprise: If
a seller is getting bad ratings repeatedly, why would anyone buy from them in the
future? This input is typically exploited in various ways, including the formation of
virtual communities. Such communities are characterized by the fact that their members might not know each other, but they all (or at least most of them) share
common interests. This phenomenon has been identified and studied by many
researchers in recent years, and it represents a major aspect of the socialization of
the Internet and the Web.

1.3 Hardware Developments

We have previously mentioned some of the key (hardware or software) technologies that have emerged in the context of the Web, that have helped to make the Web popular, or that serve as a foundation in general. We now consider more recent
technology developments, and touch upon the most relevant technological advances
and communication infrastructure. However, this is not intended as an in-depth
treatment of hardware or networking technology, so we again refer the reader
seeking further details to the relevant literature.

1.3.1 Moore’s Law: From Mainframes to Smartphones

In hardware, there is essentially one singular development that governs it all: the
fact that hardware is becoming smaller and smaller and will ultimately disappear
from visibility. Consider, for example, the personal computer (PC). While already
more than 10 years old when the Web was launched, it has shrunk (and become
cheaper) on a continual basis ever since, with laptops and tablets now being more
popular (demonstrated by their sales figures) than desktops. Moreover, with pro-
cessors embedded into other systems such as cars, smartphones, watches etc., we
can now carry computing power in our pockets that was unthinkable only a few
years ago (and that typically outperforms the computing power that was needed to
fly man to the Moon in the late 1960s by orders of magnitude). Just think of a
smartphone that is powered by a microprocessor, has some 128 GB of memory,
that can run a common operating system, and that can have a host of applications
(“apps”) installed (and can have many of them running simultaneously). Thus, in
many applications we do not see the computer anymore, and this trend, which has
been envisioned, for example, by Norman (1999), will continue in ways we cannot
even accurately predict today.

Another important aspect of hardware development has always been that prices
keep dropping, in spite of expectations that this cannot go on forever. As has been
noted, Moore’s Law is still valid after 50 years, and expected to remain valid for a
few more years (chips nowadays have such a high package density that the heat
produced will eventually bring it to an end). In this “law” Gordon Moore, one of the
founders of chipmaker Intel, predicted in 1965 in an article in the “Electronics”
journal that the number of transistors on a chip would double every 12–18 months; he
later corrected the time span to 24 months, but that does not change the underlying
tenet of his prediction. A plot of Moore’s Law can be found at en.wikipedia.org/
wiki/Moore’s_law or, for example, at pointsandfigures.com/2015/04/18/moores-
law/. It turns out that microprocessor packaging has been able to keep up with this
law (indeed, processor fabrication has moved from 90 nm5 structures in the
mid-2000s to 14 nm today and will most likely shrink even further6). Raymond
Kurzweil, one of the primary visionaries of artificial intelligence and father of
famous music synthesizers, and others, consider Moore’s Law a special case of a
more general law that applies to the technological evolution in general: If the
potential of a specific technology is exhausted, it is replaced by a new one.
Kurzweil does not use the “transistors-per-chip” measure, but prefers “computing
power per $1,000 machine.” Indeed, the evolution of computers from mechanical devices via tubes and transistors to present-day microprocessors exhibits a double-exponential growth in efficiency. The computing power per
$1,000 (mechanical) computer has doubled between 1910 and 1950 every three
years, between 1950 and 1966 roughly every two years, and presently doubles
almost annually.
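As a quick back-of-the-envelope illustration of what such doubling periods imply, the following lines compute the resulting growth factors; the 18- and 24-month periods are simply the parameters of the prediction quoted above, not measured chip data:

# Growth factor implied by "doubling every m months" over a span of years.
def growth_factor(years, doubling_months):
    return 2 ** (years * 12 / doubling_months)

for m in (18, 24):
    print(f"doubling every {m} months: "
          f"x{growth_factor(10, m):,.0f} in 10 years, "
          f"x{growth_factor(50, m):,.0f} in 50 years")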
As a result, hardware has become a commodity, cheap and ubiquitous, and—as
will be discussed in Chap. 2—is often provided today “in the cloud.” To be able to use hardware, be it processors or storage, it is no longer necessary to purchase it, since computing power and storage capacity can nowadays be rented as well as
scaled up and down on-demand. With many Internet providers (as well as other
companies, for example Amazon or Google), private or commercial customers can
choose the type of machine they need (with characteristics such as number of
processors, clock frequency, or main memory), the desired amount of storage, and
the rental period and then get charged, say, on a monthly basis (or based on other
measures, e.g., CPU cycles or hours). This leasing approach, one core aspect of cloud sourcing, has become an attractive alternative to purchasing: since hardware ages so fast, there is no longer any need to replace or dispose of out-of-date technology while it is still functioning as intended.
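A trivial sketch of the pay-per-use idea behind this reasoning is given below; the hourly rate and the purchase price are invented for illustration only and do not reflect any particular provider's offering:

# Hypothetical rent-vs-buy comparison for a single server (illustrative prices only).
hourly_rate = 0.20           # rented machine, charged per hour (assumed)
purchase_price = 4000.0      # comparable machine bought outright (assumed)
hours_per_month = 24 * 30

for months in (6, 12, 24, 36):
    rented = hourly_rate * hours_per_month * months
    print(f"{months:2d} months: rent {rented:7.0f} vs. buy {purchase_price:7.0f}")
# Renting wins for short or uncertain horizons; an outright purchase only pays
# off after years of heavy use - by which time the hardware may be outdated.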
One consequence of the hardware developments just described and the
increasing hardware complexity is the trend to develop hardware and software no

5 Nanometer.
6 7 nm is currently considered the end of the line by many people, since at 7 nm distance transistors sit so close to each other that an effect called quantum tunneling occurs, which means that the transistor cannot reliably be switched off and will mostly stay on. Graphene might then become a replacement for silicon in the production of microchips.

longer separately, but together; the result is engineered systems or “appliances” as
marketed nowadays by vendors such as Oracle or SAP and which will be discussed
in Chap. 2.

1.3.2 IP Networking

Interestingly, a similar trend of evolving from an expensive and rare technicality into a cheap and ubiquitous commodity can be observed in computer networking, at
least from an end-user perspective. This is especially true for networks that are
based on the TCP/IP (Transmission Control Protocol/Internet Protocol) protocol
stack and which are hence considered to be part of the Internet. Essentially, net-
works of this category break messages to be transmitted into packets which are
equipped with addressing information as well as protection against transmission
errors and which will then travel individually, possibly via different routes, between
the sender and the receiver; transmission control makes this reliable by assuring that
all packets will ultimately arrive at the receiver and that the latter will be able to
correctly order and reassemble them into the original message. A vivid explanation
of these basics can be found at www.warriorsofthe.net.
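The following sketch imitates this behaviour in a highly simplified way: a message is split into numbered packets, the packets arrive out of order, and the receiver reorders and reassembles them (real TCP/IP of course adds headers, checksums, acknowledgements, and retransmission):

import random

def to_packets(message, size=10):
    # Split a message into (sequence number, payload) packets.
    return [(i, message[i:i + size]) for i in range(0, len(message), size)]

def reassemble(packets):
    # Order packets by sequence number and glue the payloads back together.
    return "".join(payload for _, payload in sorted(packets))

message = "Hello, this message travels the Internet in small packets."
packets = to_packets(message)
random.shuffle(packets)                  # packets may take different routes
print(reassemble(packets) == message)    # True: the original message is restored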
The arrival and widespread use of wireless network technology has made it
possible to get connected to the Internet without cables, and many modern devices,
most notably laptop computers and tablets, are able to establish a connection to the
nearest hot spot by themselves (we will say more about mobile technology in the
next section). At the same time, cable-based networks, with fiber optics having
replaced copper wires to a large extent, have started moving into private homes as
the technology continues to decrease in price and convergence of technologies is
beginning to materialize. For instance, some providers nowadays let users establish
an Internet connection over a powerline from which electricity is otherwise
obtained; moreover, most providers have nowadays integrated Internet and tele-
phone communications. The latter has become known under the acronym Voice over IP (VoIP), originally made popular by Skype. Skype users make telephone and
video calls through their computer using Skype client software and an Internet
connection. Users may also communicate with landlines and mobile telephones,
although this requires setting up an account in which the caller has deposited funds.
Skype operates on a P2P (peer-to-peer) model rather than a client-server model. The
Skype user directory is entirely decentralized and distributed among the nodes in
the network, so that the network can easily scale to large sizes. www.go-rbcs.com/
articles/tech-growth-curves is a collection of growth curves which, among other things,
shows the growth in network capacity over the past 30 years.
The Internet has become a maturing, universal, and worldwide accessible net-
work that continues to grow and to advance technologically at a rapid pace. In the late 1990s,
it was impacted considerably by the aforementioned dot-com bubble and its
gigantic speculation in Internet stocks, which provided the money for establishing
high-bandwidth networks; this laid the foundation for broadband Internet applica-
tions and the integration of data, voice, and video services on the single

technological basis that we are used to today. As remarked by Friedman (2005), one
of the consequences of the burst of the dot-com bubble was an oversupply of
fiber-optic cable capacity especially in the US, of which many newly created ser-
vice providers were able to take advantage.
The mid-1990s also saw a growing need for administration of Internet issues,
one result of which was the creation of ICANN (www.icann.org), the Internet
Corporation for Assigned Names and Numbers. ICANN is a private non-profit
organization based in Marina Del Rey, California, whose basic task is the technical
administration of the Internet, in particular the assignment of domain names and IP
addresses as well as the introduction of new top-level domains. To this end, it is
worth mentioning that naming on the Internet follows a hierarchical pattern as
defined in the Domain Name System (DNS), which translates domain or computer
hostnames into IP addresses, thereby providing a worldwide keyword-based redi-
rection service. It also lists mail exchange servers accepting e-mail for each domain,
and it makes it possible for people to assign authoritative names without needing to
communicate with a central registrar each time. The mid-1990s moreover saw the
formation of organizations dealing with the development of standards related to
Web technology, most notably the World Wide Web Consortium (W3C), founded
by Web inventor Tim Berners-Lee, for Web standards in general (see http://www.
w3.org) and the Organization for the Advancement of Structured Information
Standards (OASIS) for standards related to electronic business and Web services
(see www.oasis-open.org).
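Returning to the name resolution performed by the DNS, it can be observed directly with a few lines of Python using the standard socket module; the lookup obviously requires a working network connection, and the address returned for a given hostname may vary:

import socket

# Resolve a hostname to an IP address via the operating system's DNS resolver.
hostname = "www.w3.org"
try:
    print(hostname, "->", socket.gethostbyname(hostname))
except socket.gaierror as error:         # unknown name or no network connection
    print("lookup failed:", error)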
With broadband and wireless technology available as an increasingly ubiquitous
commodity (with the only remaining major exception being developing nations),
we constantly see a host of new applications and services arise and be delivered over the Internet and the Web, with digital radio and television only being precursors of
what is still to come; moreover, we see a constant growth in mobile usage of the
Web (see next section). Broadband communication in particular allows for an easy
transfer of large files, so that, for example, it becomes possible to watch movies
over the Internet on a mobile device, since at some point it will be possible to
guarantee a constant transfer rate over a certain period of time. For example, the
broadband penetration of homes in the US has gone up considerably since the year
2000, and indeed the period 2000–2013 shows a dramatic reversal of the use of the
two networking options—broadband and dial-up; see www.pewinternet.org/2013/
08/26/home-broadband-2013/; however, also according to PewResearchCenter,
home broadband penetration plateaued in 2015, see www.pewinternet.org/2015/12/
21/home-broadband-2015/. A typical effect after getting broadband at home is that
people spend more time on the Internet. Moreover, with flat rates for Internet access
widely available today, many users do not explicitly switch their connection on and
off, but are essentially always connected. As an example, slide 45 of wearesocial.sg/
blog/2014/02/social-digital-mobile-europe-2014/ shows the Internet penetration in
Europe as of February 2014, with a European average of 68%, twice as high as the global average of 34% (for a 2015 update, showing, for example, the global average having already risen to 42%, see de.slideshare.net/wearesocialsg/digital-social-mobile-in-
2015).

To conclude this section, we mention that the future of hardware technology will
see a variety of other developments not discussed here, including nano-sensors, 3D
materials and microchips, or organs-on-chip; for more details we refer the reader to
www.weforum.org/agenda/2016/06/top-10-emerging-technologies-2016.

1.4 Mobile Technologies and Devices

Mobile technology is one example of rapid technology evolution that is changing the way in which people interact and businesses operate. As mentioned, we have
observed continuous advances in speed, capability and geographical penetration
over the past decades. Not only has this resulted in new ways of operating for
existing businesses, but it has been the enabler of a new “breed” of business where
use of mobile technology forms the basis of their business model. Well known
examples include Uber, Tinder, Foursquare and more recently, Nintendo’s Poke-
mon Go. This section provides a brief historical account of the key developments
culminating in a summary of the current mobile services available to organizations
today. It will also outline the most important technology currently under devel-
opment, Li-Fi, and what, if and when implemented, it will mean for wireless commu-
nication of the future. It will then present a summary of the vast array of mobile
devices at our disposal and the various challenges this variety presents to organizations needing to make sound mobile IT decisions.

1.4.1 Mobile Infrastructure

Cellular
Wireless technologies are often described in terms of the cellular generation by
which they were characterized. There are a number of attributes by which each
generation can be described, although it is the data bandwidth of each that provides
the greatest differentiator. Table 1.1 summarizes the generations that have occurred
to-date, along with a prediction of the next wave, that being 5G.
First generation wireless technology (1G) is the original analog, voice-only tele-
phone standard. It was developed in the early 1980s and was what the first cellular
phones were based upon. Various standards were adopted, with the Advanced Mobile Phone System (AMPS) used in North America and Australia, the Total Access
Communication System (TACS) employed in the United Kingdom, and Nordic
Mobile Telephone (NMT) in a variety of European countries. Speeds were limited
and users could only call within their own country. 2G or second generation wireless
cellular networks were first launched in Finland on the GSM (Global System for
Mobile Communication) standard. 2G permitted SMS (Short Messaging Service),
picture messages, and MMS (Multimedia Messaging Service) and provided greater
levels of efficiency and security for both the sender and receiver. Compared to 1G, 2G
calls were of higher quality, yet to achieve this, mobile devices needed to be nearer to
Table 1.1 Cellular technology generations

       Implemented  Service                                        Standards              Data bandwidth
1G     1984         Analog voice, synchronous data to 9.6 kbps     AMPS, TACS, NMT, etc.  1.9 kbps
2G     1991         Digital voice, SMS                             TDMA, CDMA, GSM, PDC   14.4 kbps
2.5G   1999         Higher capacity, packetized data               GPRS, EDGE, 1XRTT      384 kbps
3G     2002         Higher capacity, broadband data up to 2 Mbps   WCDMA, CDMA2000        2 Mbps
4G     2010         Higher capacity, completely IP-oriented,       Single standard        1000 Mbps
                    multimedia data
5G     2020 est.    High data rates, high mobility, wearable       Single standard        1 Gbps+
                    devices with AI capabilities

cellphone towers. Technologies were either Time Division Multiple Access (TDMA)
which separates signals into distinct time slots, or Code Division Multiple Access
(CDMA) in which the user is allocated a code that allows them to communicate over a multiplexed physical channel. GSM is the most well-known example of a TDMA standard; Personal Digital Cellular (PDC) is another. Before the introduction of
third-generation (3G), a revised and improved 2G, known as 2.5G, was introduced that implemented a packet-switched domain in addition to the circuit-switched domain. Viewed more as an enhancement of 2G than as a new generation in its own right, 2.5G provided enhanced data rates via these improved standards. 3G offered significant improvements in data speed and efficiency and added further services. These services are required to meet the speed threshold established by IMT-2000, namely 200 Kbps (0.2 Mbps). Many service providers exceeded this threshold significantly, with speeds of up to 2 Mbps not uncommon. Enhanced voice quality along
with video was now possible. We are currently in the era of 4G cellular wireless
standards, which offer theoretical speeds of up to 1 Gbps, although 100 Mbps is more
realistic. This is more than adequate for high quality mobile video delivery such as
on-demand television and video conferencing. Other benefits include seamless global
roaming, low cost and greater levels of reliability. The next generation of cellular
wireless, 5G, is tentatively projected to arrive in around 2020 and is forecast to be
extremely fast, extremely efficient, and has been described as being “fiber-wireless”.
Wi-Fi
Wi-Fi or WiFi is a wireless standard that lets mobile-enabled devices connect
wirelessly to a WLAN (Wireless Local Area Network). This is the universal
approach for both business and private wireless WLAN users. Wi-Fi first emerged
in the early 1990s and was released for consumer use in 1997 following the
establishment of the IEEE 802.11 committee which oversaw the IEEE 802.11 set of
WLAN standards. It was about two years later when Wi-Fi routers became avail-
able for home use that Wi-Fi really took off. A typical Wi-Fi setup involves one or
more Access Points (APs) and one or more clients. Clients connect to the AP or
hotspot upon receipt of the unique SSID (Service Set Identifier), commonly known
as the network name. Various forms of encryption standards are employed to secure
what is largely an insecure network structure. Common encryption standards
include Wired Equivalent Privacy (WEP), although this has been shown to be largely insecure. Wi-Fi Protected Access (WPA and WPA2) encryption is much more common today and significantly more secure than WEP. Wi-Fi has advanced significantly over the past two decades in terms of data transfer rates.
Table 1.2 below provides a historical account of this evolution. In just 15 years,
Wi-Fi maximum data speeds have increased in magnitude by over 700 times the
initial legacy standard (802.11). It is interesting to note that indoor and outdoor
Wi-Fi ranges have not changed significantly, primarily because range is a product
of the radio frequency at which it operates and only two frequencies are accessible
for Wi-Fi, 2.4 and 5 GHz.
It is also important to note that Wi-Fi is highly susceptible to interference, which
can also severely constrain its effectiveness. Such interference can come from

Table 1.2 Evolution of Wi-Fi standards

Protocol   Release date   Frequency (GHz)   Max speed (Mbps)   Indoor range (m)   Outdoor range (m)
802.11     1997           2.4               1.2                20                 100
802.11a    1999           5.8               54                 35                 120
802.11b    1999           2.4               11                 38                 140
802.11g    2003           2.4               54                 38                 140
802.11n    2009           2.4 & 5           150                70                 250
802.11ac   2012           5                 867                35                 120

competing Wi-Fi networks and other electrical equipment operating within the 2.4
or 5 GHz ranges such as cordless phones and baby monitors. The physical envi-
ronment can also cause major service degradation, with concrete walls being significant hurdles to overcome. There is little doubt that Wi-Fi will remain the benchmark WLAN standard for the foreseeable future, and many workplaces, campuses and
even cities are seeking to establish broad Wi-Fi networks through the installation of
overlapping network access points. The first university campus to achieve this was
Carnegie Mellon University (CMU) with their “Wireless Andrew” project that
began in 1993 and was completed in 1999. As of 2010, CMU operated over 1,000
access points across seventy-six buildings and some 370,000 square meters
(Lemstra et al. 2010). Many cities around the world either have city-wide Wi-Fi
coverage or have such projects underway. London, UK and Seoul, South Korea are
two such examples.
Bluetooth
Bluetooth is a short distance, low data volume, wireless standard commonly used
for direct data sharing between paired mobile devices. It is largely included by
default in smartphones, tablets and other mobile devices today. Like Wi-Fi,
Bluetooth operates on a wireless network utilizing a narrow band between 2.4 and
2.485 GHz. It was first developed by Ericsson in the early 1990s. The Bluetooth
standard is managed by an independent special interest group (SIG) which itself
doesn’t actually develop or market Bluetooth devices, but instead manages the
development of the specification, protects the trademarks, and ensures the reliability
of the standard (Bluetooth 2016). Like many other technologies, the Bluetooth
standard has changed and been refined and improved over the years. However, the
greatest and most radical change came with the replacement of Bluetooth 3.0 with
Bluetooth 4.0 (also known as Bluetooth Smart) in 2010. A subset of this, and what
is most widely known, is Bluetooth Low Energy or BLE.
Table 1.3 provides a summary of the key differences between the “classic”
Bluetooth standard and Bluetooth Low Energy.
The key difference with BLE, as the name suggests, is power consumption.
Although the peak consumption is comparable, the significantly reduced set-up
time means that BLE connections are actually very brief. It is thus not uncommon
for a BLE device to be able to operate for a number of years on a non-rechargeable

Table 1.3 Bluetooth versus Bluetooth Low Energy

                                 Bluetooth “Classic”   Bluetooth Low Energy
Frequency (GHz)                  2.4                   2.4
Range (m)                        30                    50
Bit rate                         1–3 Mbps              200 Kbps
Set-up time                      <6 s                  <3 ms
Peak current consumption (mA)    <30                   <15
Power consumption (relative)     1 (reference)         0.01–0.5

button battery. This opens up numerous potential real-world applications and is indeed the basis of Beacon technology, which is discussed in the next section.
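A rough, purely illustrative calculation shows why battery lifetimes of this order are plausible; every figure below is an assumption made for the sake of the example, not a specification of any particular device:

# Back-of-the-envelope BLE battery life estimate (all numbers are assumptions).
battery_mah = 220.0              # capacity of a typical button cell (assumed)
sleep_current_ma = 0.002         # current while idle between events (assumed)
active_current_ma = 8.0          # current during a brief BLE event (assumed)
active_ms_per_second = 1.0       # device is active about 1 ms per second (assumed)

active_fraction = active_ms_per_second / 1000.0
avg_current_ma = (active_current_ma * active_fraction
                  + sleep_current_ma * (1 - active_fraction))
hours = battery_mah / avg_current_ma
print(f"average draw {avg_current_ma:.4f} mA -> roughly {hours / 24 / 365:.1f} years")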
Li-Fi and the Future of Wireless Communications
What each of the above mobile technologies, and indeed others not discussed here such as Near Field Communication (NFC), have in common is that they all operate via radio waves within the electromagnetic spectrum, albeit at its lower end. To date, it
is the only component of the spectrum that has been successfully used to any real
degree for wireless communication and data transmission. However, a significant limitation exists in that this component of the electromagnetic spectrum is relatively small, and it also relies on costly infrastructure to maintain the scale necessary for the continued growth in use that we observe. As a possible solution to these and other radio wave constraints, University of Edinburgh physicist Harald Haas has developed an approach that utilizes the visible spectrum, or as we all
know it, light. His concept is known as Li-Fi and promises to deliver, at some point
in the future, data transfer speeds exponentially faster than what we are currently
used to. One of the key benefits offered by the visible component of the electro-
magnetic spectrum, is the breadth of it, indeed, as shown in Fig. 1.7, the visible
light spectrum is roughly 10,000 times the size of the radio wave spectrum.
Li-Fi works through the use of LEDs which momentarily turn on and off in a pattern that is transmitted as light, picked up by a receiver, and converted into a computer-readable form. While Haas is proposing that LED light be

Fig. 1.7 The electromagnetic wave spectrum by frequency (Hz), from radio waves and microwaves through infrared, visible light, and ultraviolet to X-rays and gamma rays; the visible band is roughly 10,000 times as wide as the radio band



the primary source of light and the data transfer source, there are indeed many
potential LED light sources that could be used. These include mobile phones and
televisions (Ganesan 2014). In terms of receivers, there is also the potential for
smart phone cameras to be used instead of specialized photo detectors and recei-
vers. A common question about Li-Fi is whether it works in the dark. The answer is
no; however, it can work with such a low amount of light that the light source is not visible to the human eye. Another characteristic of light is that it cannot penetrate walls, yet it can of course be reflected around corners. While this may reduce the range and
application of Li-Fi, it does provide the opportunity for physical security barriers.
Another key benefit of Li-Fi over Wi-Fi is that it can be used in dangerous envi-
ronments such as nuclear power plants where RF is not permitted. Overall Li-Fi
offers potentially the greatest advancement in wireless communications yet. While
still under development, and with some technical challenges yet to be overcome, it
seems likely that practical applications of Li-Fi will be seen within the next decade.
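The on/off signalling principle itself can be sketched in a few lines: a byte string is turned into a pattern of light pulses (1 = LED on, 0 = LED off) and decoded again on the receiving side; actual Li-Fi modulation schemes are of course far more sophisticated than this toy on-off keying:

# Toy on-off keying: encode bytes as a light pulse pattern and decode it again.
def encode(data: bytes) -> str:
    return "".join(format(byte, "08b") for byte in data)    # '1' = on, '0' = off

def decode(pulses: str) -> bytes:
    return bytes(int(pulses[i:i + 8], 2) for i in range(0, len(pulses), 8))

pulses = encode(b"Li-Fi")
print(pulses)           # the pulse pattern, e.g. '01001100...'
print(decode(pulses))   # b'Li-Fi' - the original data is recovered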

1.4.2 Mobile Devices

There exists a plethora of mobile-enabled devices today. Few would argue that the
devices we have at our disposal today are, in general, an amazing technical
achievement. They provide unquestionable performance, outstanding graphics,
high-quality multimedia, reliable connectivity, impressive broadband speeds and
long battery life (Qualcomm 2014). Just like traditional computing, mobile devices
are a combination of hardware, software and data processing. They comprise CPUs
and operating systems (OS). They have input and output mechanisms delivered by
mobile user interfaces. As in traditional computing, there exists considerable variability in all of these attributes, which leads to significant compatibility issues.
An obvious example of this occurs within mobile operating systems. The market is
dominated by Google’s Android, with only Apple’s iOS offering any real com-
petition. Others including Microsoft’s Windows Mobile and Blackberry have, at
times, been more prominent in the marketplace. Android is an operating system
based on the Linux kernel, and designed primarily for touchscreen mobile devices
such as smartphones and tablet computers. It was initially developed by Android, Inc., which Google backed financially and later bought in 2005; Android itself was unveiled in 2007. The first Android phone (HTC Dream) was sold in October 2008. Android is
open source and Google releases the source code under the Apache License. Google
Play has one million+ Android apps. The main competitor to Android is iOS (previously iPhone OS), a mobile operating system developed and distributed by Apple Inc. and originally unveiled in 2007 for the iPhone. It has been extended to
support other Apple devices such as the iPod Touch (September 2007), iPad
(January 2010), iPad Mini (November 2012) and second-generation Apple TV
(September 2010). Interface control elements consist of sliders, switches, and
buttons. Interaction with the OS includes gestures such as swipe, tap, pinch, and
reverse pinch, all of which have specific definitions within the context of the iOS
operating system and its multi-touch interface. Unlike Microsoft’s Windows Phone

and Google’s Android, Apple does not license iOS for installation on non-Apple
hardware. The market share of mobile OS clearly shows an increasing dominance
of Android, from about 2% in 2009 to more than 80% in 2015, while iOS has been
pretty stable at around 15% during the same period.
Smartphones
A smartphone is primarily a cellular communication device that, with the addition of a mobile operating system, multimedia, a large touch screen, and data connectivity, operates more as a small computer. It is widely reported that the average smartphone of today has significantly more computational power than the Apollo computers that took NASA to the Moon in the 1960s. Indeed, it has been predicted that in the very near future the smartphone will, for many, be their sole computing device, used as a replacement for, not a supplement to, a traditional computer (Bonnington 2015). One of the key features of smartphones is their relatively low cost. This has enabled significant Internet penetration within emerging nations where mobile infrastructure massively exceeds fixed network infrastructure. As a result, smartphone sales continue to grow, with 2016 sales expected to reach 1.5 billion units (Gartner 2016). Indeed, smartphone ownership in countries such as Turkey, Malaysia, Chile and Brazil has increased by over 25% in the past three years (Poushter 2016).
Tablets
Tablets are effectively large smartphones. In most cases, their primary use is not for
voice communication as with a smartphone, yet many do have cellular (3G) con-
nectivity and there is indeed a combination of a tablet and a smart phone, typically
around five or six inches in size, known as a “Phablet” (Phone crossed with a
Tablet). Tablets themselves are intended to be somewhat of a cross between a
smartphone and a laptop with some tablets offering a screen similar in size to a
standard laptop. Tablets have touch screens and are intended for more convenient
and mobile interaction than a laptop.
One of the most significant future developments of both smartphones and tablets
will be the eventual introduction of flexible screens. The likes of Samsung and LG
are actively working on expandable, bendable, twistable OLED screens that may
very soon allow your smartphone to be folded inside your wallet. This could be a “game-changer”, removing the need for multiple devices for different purposes: instead, you could simply roll out the screen size you desire.
Wearables
The most recent development in the mobile hardware space has been the wearable.
Indeed, 2014 was regarded by some commentators as the year of the wearable, reflecting the large number of devices that entered the market. As the name
would suggest, these are devices that are intended to be worn on the body, and in
most cases are worn on the wrist as a smart watch or fitness tracker. Key players in
this rapidly growing domain include the range of Fitbit (fitbit.com) fitness trackers,
the Samsung Gear range of smartwatches, and the first entrant into the smartwatch market, Pebble (pebble.com), which developed its initial smart watch through in excess of US$10 million in crowdfunding pledges. Many other smartwatches have

entered the market in recent times including the Apple Watch, the Motorola Moto
360, and the Huawei Smartwatch, to name just a few. The impact such devices will
have is still unclear. At this point in time, fitness trackers do just that, and serve a niche market, while smartwatches, with their broader functionality, are still seen as somewhat expensive for their novelty value. A range of business uses of wrist-based wearables
are predicted, most of which are intended to provide for flexible ways of com-
municating with others, managing schedules, accessing short documents or memos
when “on the run”, making wireless payments with tools such as Apple Pay, translating short strings of text, or use as a remote for presentations.
One domain where there exist significant opportunities is within the health and
well-being space, otherwise known as healthware (Patel et al. 2012). Smartwatches
and fitness trackers can be used to guide and monitor injury rehabilitation or illness
recovery. Collected data can be analyzed and monitored by medical practitioners
and, where appropriate, treatments revised. The key element that enables this is the various sensors contained within these smart devices. These
include, for example, heart-rate monitors, GPS receivers, thermometers,
accelerometers, altimeters, barometers, and compasses. Only time will tell whether
wrist-based smart watches and fitness trackers will have the disruptive effect that
many are predicting.
Other wearables such as smart glasses, made most famous by the now discon-
tinued Google Glass, smart rings, smart headphones, tags that attach to shoes and
various forms of wearable clothing are all under varying stages of development.
Smart glasses offer significant opportunities for carrying out activities where
hands-free operation is either desirable or necessary. They are able to carry out
basic functions such as reading and voice-to-text writing of e-mails or text messages and making notes; they can be used for basic navigation (The Verge 2013),
and they can be used for viewing video and images. Prototypical smart glass
applications have been developed for a variety of inspection-based activities where
hands-free operation is required. One such application includes the inspection of
high-voltage power pylons where the inspector needs to climb up the pylon and
look for possible faults or damage. Using smart glasses allows them to take a
photograph of what is observed, provide an audio commentary of what has been
photographed, record the specific location via GPS, and send all of the information
to the cloud via a cellular network; all without needing to release their hand from
what they are climbing on. Another possible future application is the analysis of the user's viewing patterns. This could be applied to customers viewing items either in a
store or online. This information would allow the store to provide a more
customer-focused experience and therefore increase profit (Rallapalli and Austin
2014). There are, however, a number of potentially limiting factors of smart glasses,
not least the potential for accidents occurring as a result of being distracted or the
possible health effects that might result from long-term use.
More recent developments with the likes of smart rings, e.g. the NFC Ring (nfcring.com) or the Kerv (kerv.com), may become viable in the future as tools for contactless payments or for unlocking doors. However, the technology is currently not particularly advanced. This also applies to what is termed the “hearable”: a smart device that sits inside the ear, much like a hearing aid. The most prominent of these right now is the Bragi Dash (bragi.com) which, at a very minimum, provides
wireless headphone functionality, but also purports to operate as a fitness tracker,
heart-rate monitor, wireless phone and much more.
Mobile devices are not without their challenges, however. Three such limitations
regularly receive attention. Safety and security is the primary issue for developers
and users alike. Users fear that their devices will be attacked by viruses, resulting in
the theft of personal data; people generally feel safer using their mobile devices at home, where the familiarity of the home setting gives the user a perception of safety and security. Users are also concerned about slow or unstable connections. They fear
they may be cut off in the middle of an e-commerce transaction and so developers
need to ensure e-commerce platforms account for this eventuality. This issue will diminish as we move to faster, more reliable networks. Finally, and this is the issue
which is most difficult to address, users, in general, don’t like the small screen size.
The primary complaint is centered on the use of mobile devices for shopping and the inability of users to get a good look at, and feel for, the products they are considering purchasing. In general, unless a buyer is familiar with a product or the product’s
appearance doesn’t matter, users are hesitant to buy an item on a smartphone.

1.5 From a Flat World to a Fast World that Keeps Accelerating

In order to understand some of the statements we are going to make in this book,
and to put some of the developments on which we report into perspective, it makes
sense to deviate a little from the core topic of this chapter, the development of the
Web and its accompanying hardware and software evolution. For a brief moment
we take the view of a journalist, Tom Friedman, foreign affairs columnist for the
New York Times and three-time Pulitzer Prize winner, on globalization and how
the world has changed over the last 25–30 years in light of the technological
developments that we have mentioned.
As described in Friedman (2005), shortly after the turn of the century the world
became a “flat” one in which people from opposite ends of the planet can all of a
sudden interact, play, do business with each other, and collaborate, and all of that
without knowing each other or having met, and where companies can pursue their
business in any part of the world depending on what suits their goals and intentions
best; they can also address an entire world of potential customers. A typical example of
what was enabled at the time thanks to the Web is described by Richard MacManus
in a February 2016 blog post: “From 2003–2012 I built up and ran a technology
blog7 called ReadWriteWeb. At its peak it had over twenty people working for it,
nearly all of them in the US. The fact I could manage this business virtually, from
New Zealand, showed the power of the Internet tools the blog evangelized.

7 Blogs are the subject of the next section.

The RWW team communicated via Skype (these days we’d do it on Slack, but
Skype did the job back then). We published on Movable Type. We managed
projects on Basecamp. We scheduled meetings with GoToMeeting. We kept track
of the editorial calendar using Google Calendar” (see augintel.com/2016/02/03/
bitcoin-online-payments/).
Thus, there are essentially no more significant limits to what anyone can
accomplish in the world these days, since the infrastructure we can rely upon and
the organizational frameworks within which we can move allow for so many
unconventional and innovative ways of communicating, working together, collaborating, and exchanging information. In total, Friedman (2005) identifies 10 flatteners, which are:

1. The fall of the Berlin wall on November 9, 1989, when Eastern Europe opened
up as a new market, as a huge resource for a cheap, yet generally well-educated
work force, and as an area with enormous demand for investments and reno-
vation. Globalization swept from the West across Eastern Europe and
extended deeply into Asia.
2. The Netscape IPO on August 9, 1995, when for the first time it was demon-
strated that one can make money through the Internet, in particular with a
company whose business model does not immediately imply major revenues.
3. Software with compatible interfaces and file formats as well as workflow
software which can connect people all over the world by chaining together what
they are doing into a comprehensive whole. New forms of collaboration and
distribution of work can be run over the Internet, and jobs become location- and
time-independent. A division of labour in specialized tasks has moved from a
regional or domestic to an international scale.
4. Open sourcing, or the idea of self-organizing collaborative communities which
in particular are capable of running large software projects. Prominent exam-
ples include the GNU/Linux operating system, the Mozilla Firefox browser
project, and the Moodle e-learning software system. In all cases, complex
software with numerous components has been developed in a huge common
effort, and is being maintained by a community of developers which respond to
bugs and failures with an efficiency unknown to (and vastly impossible for)
commercial software companies. The modern term for this is crowdsourcing.
5. Out-sourcing, where companies concentrate on their core business and leave
the rest to others who can do it cheaper and often more efficiently. Out-sourcing
occurs at a global scale, i.e., is not restricted to national boundaries or regional
constraints anymore.
6. Off-shoring, which means going way beyond outsourcing; indeed, the next step
is to take entire production lines to an area of the world where labour is cheaper.
This particularly refers to China, but also again to India or countries like Russia
and Brazil.
7. Supply-chaining, or the idea of streamlining supply and production processes on a world-wide basis, for example through the introduction of RFID (or, more recently, beacon) technology. Supply chains have become truly global today, with their various aspects of inbound or outbound logistics, supply chain integration, purchasing, capacity planning, inventory management, and just-in-time processes, often scaled to be handled internationally.
8. In-sourcing, which is the opposite of outsourcing. It sometimes makes sense to
bring specific functions into a company in order to have them executed more
efficiently.
9. In-forming thanks to search engines such as Google, Yahoo!, or Bing (as well
as the many others). In the flat world, knowledge and entertainment can be
obtained anytime and anywhere. Information is accessed through search
engines, emails are read on the move, and movies are downloaded on demand.
A 21st-century person thus no longer depends on printed newspapers, physical office space, or the local library.
10. Finally, the steroids, i.e., the technological developments which have made
everything digital, mobile, personal, virtual, such as high-speed cabling,
wireless computer access, modern personal digital assistants (PDAs), cell
phones, data servers as a commodity, cheap personal and laptop computers with
high computational capabilities, huge storage capacities, and excellent
input/output facilities.

Obviously, not all flatteners are related to the Internet and the Web, yet all of these
developments, which not only go together but influence each other, rely heavily on efficient communication networks and on tools such as the Web for
utilizing them. In the flat world, it became possible to access arbitrarily remote
information in an easy and vastly intuitive way (“the global village”), in particular
information of which it had not been known before that it existed. One of the
slogans now was to have “information at your fingertips,” and search engines were
one of the major support tools making this possible.
Friedman described these flatteners originally in 2004, a time when smartphones,
tablet computers or Facebook were not yet around, when, as Friedman himself has
put it in a speech, “Twitter was a sound, the cloud was in the sky, 4G was a parking
spot, LinkedIn was a prison, and Skype was a typo.” 10+ years later, many of the
flatteners still apply, but several have considerably advanced. So it does not come as
a surprise that Friedman and Mandelbaum (2011) took another look at globalization
and its “clash” with the IT revolution. They noticed that, due to the developments
we have sketched above, the world has indeed transitioned from “flat” to “fast”. As
Friedman describes it in a November 2014 NYT column, “The three biggest forces
on the planet—the market, Mother Nature and Moore’s Law—are all surging, really
fast, at the same time. The market, i.e., globalization, is tying economies more
tightly together than ever before, making our workers, investors and markets much
more interdependent and exposed to global trends, without walls to protect them.
Moore's Law … is, as Andrew McAfee and Erik Brynjolfsson posit in their book, "The Second Machine Age,"8 so relentlessly increasing the power of software, computers and robots that they're now replacing many more traditional white- and blue-collar jobs, while spinning off new ones—all of which require more skills.
And the rapid growth of carbon in our atmosphere and environmental degradation
and deforestation because of population growth on earth—the only home we have
—are destabilizing Mother Nature’s ecosystems faster” (www.nytimes.com/2014/
11/05/opinion/the-world-is-fast.html).

8 Winner of the German Handelsblatt "Wirtschaftsbuchpreis 2015".
As Friedman and Mandelbaum note, these developments have a profound
influence on how we live and how we work, how we educate students, and how we
conduct business. The world is no longer just connected, but “hyper-connected”
and hence ultra-fast in interactions, but also in changes and disruptive develop-
ments. High-speed networking is nowadays even available on the top of Mount
Everest, cheap labour and even cheap genius is always available from any corner of
the world, via the “crowd”, and rapid changes occur permanently and everywhere;
we will discuss the topic of disruption and disruptive innovation in more detail in
Chap. 5. In order to get along in a world like this, Friedman and Mandelbaum
suggest five behavioural patterns that everyone should adopt:

1. Think like a new immigrant, i.e., pursue opportunities more energetically, persistently and creatively than anybody else; and act with the you-only-live-once
attitude of a new immigrant while remembering that anything can be taken away
in a flash and nothing is owed to you.
2. Think like an artisan, i.e., do everything one-off, do your work every day with so
much pride and extra effort that you want to “carve your initials into it.”
3. Always be In Beta, i.e., always think of yourself as a work in progress: iterate,
polish, iterate, etc., but never finished.
4. No more 401K world: The hyper-connected world comes with fewer defined benefits and more defined contributions (like a US 401(k) plan), but also with more opportunities on offer. The big divide is not just the digital divide; the real one is the motivational divide.
5. Always think like a waitress (actually the one in his favourite pancake house in
his home town) and be relentlessly entrepreneurial.

In his most recent book, Friedman (2016) pins the developments he has repeatedly
reported on in essence to the year 2007. In his November 2016 blog post, coinciding with the publication of that book, he writes: "Steve Jobs and Apple released
the first iPhone in 2007, starting the smartphone revolution that is now putting an
internet-connected computer in the palm of everyone on the planet. In late 2006,
Facebook, which had been confined to universities and high schools, opened itself
to anyone with an email address and exploded globally. Twitter was created in
2006, but took off in 2007. In 2007, Hadoop, the most important software you’ve
never heard of, began expanding the ability of any company to store and analyze
enormous amounts of unstructured data. This helped enable both Big Data and
cloud computing. Indeed, “the cloud” really took off in 2007. In 2007, the Kindle
kicked off the e-book revolution and Google introduced Android. In 2007, IBM
started Watson—the world’s first cognitive computer that today can understand
virtually every paper ever written on cancer and suggest to doctors highly accurate
diagnoses and treatment options. Further, have you ever looked at a graph of the
cost of sequencing a human genome? It goes from $100 million in the early 2000s
and begins to fall dramatically starting around … 2007. The cost of making solar
panels began to decline sharply in 2007. Airbnb was conceived in 2007 and change.
org started in 2007. GitHub, now the world’s largest open-source software sharing
library, was opened in 2007. And in 2007 Intel for the first time introduced
non-Silicon materials into its microchip transistors, thus extending the duration of
Moore’s Law—the expectation that the power of microchips would double roughly
every two years. As a result, the exponential growth in computing power continues
to this day. Finally, in 2006, the internet crossed well over a billion users world-
wide” (see www.nytimes.com/2016/11/20/opinion/sunday/dancing-in-a-hurricane.
html). He continues to argue that three fundamental developments, namely in
computing, in globalization, and in climate change, are accelerating simultaneously
and are impacting each other, and that the ordinary person has increasing difficulty keeping up with them and tends to feel more and more uncomfortable. He compares
these accelerations to a hurricane in which we are asked to dance.
In later sections and chapters we will see a variety of examples in which these
accelerations manifest. We will also remind the reader of Friedman’s perception in
places where it is appropriate, yet it makes sense to keep that in mind already at this
point.

1.6 Socialization. Comprehensive User Involvement

In addition to Tom Friedman, several generations of users have changed their perception of and participation in the Web over the years, and have grown
accustomed to the Web as a communication medium, a socialization platform, as a
discussion forum, as a business platform, as a storage device for their diaries or
calendars, as a constantly growing and expanding dynamic encyclopedia, and even
as a dating platform. Few noticed the arrival of the Web in 1993; for the average
person it took a few years to recognize that the Web was around and what it could
do for her or him. The same people have meanwhile become comprehensively familiar with electronic mail, both in their business and private lives, with electronic banking and portfolio management, with searching for information online, and with many other things that can easily be done over the Web today.
An important development of the Web has seen it transformed from a medium
where approved "authors" were permitted to publish material for consumption by the majority of Web users or "readers", to one where everyone is an author. The
concept of user generated content (UGC) is quite possibly more significant than the
initial introduction of the Web itself and is indeed now described as being a vital
component of the online ecosystem (e.g. Rangwala and Jamali 2010). Blogs and
Wikis were among the first UGC tools, followed by a variety of social networking
services and crowd sourcing.
1.6.1 Blogs and Wikis

We have occasionally mentioned throughout this chapter that users have started to
use the Web as a medium in which they can easily and freely express themselves,
and by doing so online they can reach a large number of other people, most of whom they will not even know. Two forms of user-generated content that became popular
about 10–15 years ago are the following: Blogs (such as the ReadWriteWeb
mentioned earlier) are typically expressions of personal or professional opinion or
experience on which other people can at most comment; wikis are pages or systems of
pages describing content that other people can directly edit and hence extend,
update, modify, or delete. Both communication forms have contributed significantly
to the read/write nature of the Web and were indications of the transition from what has been coined "Web 1.0" to "Web 2.0," which started around 2004.
Blogs
One effect that has made the Web highly popular is that anybody can
write comments on products or sellers, on trips or special offers, on political
developments, and more generally on almost any topic, be it serious or not; people
can even write about themselves, or comment on any issue even without a particular
cause (such as a prior shopping experience). At the starting point of Web 2.0 were
blogs and a new form of activity called blogging. In essence, a blog is an online
diary or a journal that a person is keeping and updating on an ad-hoc or a regular
basis. The word itself is a shortened version of Web log and is meant to resemble
the logs kept by the captain of a ship as a written record of daily activities and
documentation describing a journey of the ship.
A blog on the Web is typically a sequence of texts in which entries appear in
reverse order of publication so that the most recent entry is always shown first. In its
most basic form, a blog consists of text only. Without any additional features and in
particular if separated from subscriptions, a blog is hence no more than a kind of
diary that may be kept by anybody, e.g., private persons, people in prominent
positions, politicians, movie stars, musicians, companies, or company CEOs.
However, most blogs go way beyond a simple functionality today.
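To make this structure concrete, the following small Python sketch (a hypothetical model, not the data model of any particular blogging platform) represents a blog as a list of entries and displays them in reverse chronological order, i.e., the most recent entry first:

from dataclasses import dataclass
from datetime import datetime

@dataclass
class Entry:
    title: str
    published: datetime
    body: str

def render(blog):
    # Entries appear in reverse order of publication: newest first.
    for entry in sorted(blog, key=lambda e: e.published, reverse=True):
        print(f"{entry.published:%Y-%m-%d}  {entry.title}")

blog = [
    Entry("First post", datetime(2016, 1, 10), "Hello, world."),
    Entry("A trip report", datetime(2016, 3, 2), "Back from New Zealand ..."),
]
render(blog)  # prints the trip report before the first post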
As a first example of a blog, consider Slashdot (www.slashdot.org). It was
started in 1997 by Rob Malda for publishing "news for nerds, stuff that matters." Malda ran the site until 2011, and it remains one of the liveliest sites for Linux kernel news, cartoons, open-source projects, Internet law, and many other issues,
categorized into areas such as Books, Developers, Games, Hardware, Interviews,
IT, Linux, Politics, or Science. Each entry is attached to a discussion forum where
comments can even be placed anonymously. A prominent example of an early
company blog was the FastLane blog by General Motors Vice Chairman Bob Lutz,
in which he and other GM people reported on new developments or events in any of
the GM companies, or responded to consumer enquiries. It was a good example of
how an enterprise can develop new strategies for its external (but also its internal)
communication. German GM subsidiary Opel even went so far as to maintain a car
blog while its Insignia model was still under development, with the effect that
40,000 units had been sold before the car even arrived at dealerships!
While blogging services are often for free, i.e., users can create and maintain a
blog of their own without any charges, they typically have to accept advertising
around their entries. The Dilbert blog, for example, is hosted by Typepad, which offers free blogs only for a trial period, but which lists a number of reasons why people actually blog, namely to broadcast personal news to the world, to share a passion or a hobby, to find a new job, or to write about their current one. Providers where a
blog can be set up (typically within a few minutes) include Blogger, Blogging, or
WordPress. UserLand was among the first to produce professional blogging soft-
ware called Radio (radio.userland.com). If a blog is set up with a provider, it will often be the case that the blog is immediately created in a format so that readers of the
blog will be informed about new entries.
The activity of blogging, typically enhanced with images, audio or video, was
the successor to bulletin boards and forums, which have existed on the Internet
roughly since the mid-90s. Their numbers peaked in the early 2000s, which is why
the advertising industry took a close look at them. And just as commenting on products on commerce sites or evaluating sellers on auction sites has done, blogging is influencing consumer behavior, since an individual user can
now express his or her opinion without someone else executing control over it. The
party hosting a blog also has an ethical responsibility and can block a blog or take it
off-line, yet people can basically post their opinions freely. Many blogs take this
issue seriously and follow some rules or code of ethics. On a related topic, ethical
implications of new technologies have been investigated by Rundle and Conley
(2007).
Studies are showing that trust in private opinion is generally high. Blogs may
also be moderated which typically applies to company blogs; see www.blog.wan-
ifra.org/2016/07/04/five-lessons-on-managing-online-comments for an account of
how to deal with online comments. Blogs are also indexed by search engines and
are visited by crawlers on a regular basis. Since blogs can contain links to other
blogs and other sites on the Web and links can be seen as a way for bloggers to refer to
and collaborate with each other, and since link analysis mechanisms such as
Google’s PageRank give higher preference to sites with more incoming links,
bloggers can obviously influence the ranking of sites at search engines. And
blogging has also opened the door for new forms of misuse. For example, blogs or
blog entries can be requested in the sense that people are asked to write nice things
about a product or an employer into a blog, and they might even get paid for this;
see www.nytimes.com/2011/08/20/technology/finding-fake-reviews-online.html
for one of many articles written about this subject, and see Ott et al. (2011) for
an algorithmic approach to discovering what the authors call “opinion spam.”
Conversely, a blog could be used for mobbing a person or a product, and can
become the target of an Internet troll. An easy way to avoid some of these misuses
is to require that blog writers have to identify themselves.
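Returning to the link-analysis point made above, the following minimal Python sketch computes PageRank iteratively on an invented toy graph of four sites; the damping factor of 0.85 is the value commonly quoted for the original algorithm, and an actual search engine naturally combines such link analysis with many other signals:

# Toy link graph: page -> pages it links to (all names invented).
links = {
    "blogA": ["blogB", "news"],
    "blogB": ["news"],
    "blogC": ["news", "blogA"],
    "news":  ["blogA"],
}

def pagerank(links, damping=0.85, iterations=50):
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            share = rank[page] / len(outgoing)   # each link passes on an equal share
            for target in outgoing:
                new_rank[target] += damping * share
        rank = new_rank
    return rank

# "news" has the most incoming links and ends up with the highest rank.
print(sorted(pagerank(links).items(), key=lambda kv: -kv[1]))

The site with the most incoming links accumulates the highest rank, which is exactly why links placed in many blogs can push a page up in the results.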
Wikis
A second prominent form of user participation on and contribution to the Web is
represented by wikis. A wiki is a Web page or a collection of pages that allows its
users to add, remove, and generally edit some of the available content, sometimes
without the need for prior registration if the wiki is a public one. Thus, a wiki is an
editable page or page collection that does not even require its users to know how to
write a document in HTML. The term “wiki” is derived from the Hawaiian word
“wikiwiki” which means “fast.” Thus, the name suggests having a fast medium for
collaborative publication of content on the Web. A distinction is commonly made
between a single “wiki page” and “the wiki” as an entire site of pages that are
connected through many links and which is in effect a simple, easy-to-use, and
user-maintained database for creating content.
The history of wikis started in March 1995, when Ward Cunningham, a software
designer from Portland, Oregon, was working on software design patterns and
wanted to create a database of patterns in which other designers could contribute by
refining existing patterns or by adding new ones. He extended his already existing
“Portland Pattern Repository” by a database for patterns which he called WikiWi-
kiWeb. The goal was to fill the database with content quickly, and in order to
achieve this, he implemented a simple idea which can still be seen at wiki.c2.com/?
SoftwareDesignPatterns: Each page had at its bottom a link entitled “EditText”
which could be used to edit the text in the core of the page directly in the browser.
Users could write within a bordered area, and they could save their edits after
entering a code number provided. The important point is that no HTML was needed
to edit the page; instead, the new or modified content was converted to HTML by
appropriate wiki software. Other wikis operate just like this one.
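As a simplified illustration of such a conversion (the two rules below are made up for this example and only loosely modeled on early wiki conventions, not on any particular engine), consider the following Python sketch, which turns text between double apostrophes into emphasized HTML and CamelCase words into links to wiki pages:

import re

def wiki_to_html(text):
    # Simplified, illustrative rules only (real wiki engines support far more):
    # ''emphasized text'' -> <em>...</em>
    html = re.sub(r"''(.+?)''", r"<em>\1</em>", text)
    # CamelCase words become links to wiki pages of the same name.
    html = re.sub(r"\b([A-Z][a-z]+(?:[A-Z][a-z]+)+)\b",
                  r'<a href="/wiki/\1">\1</a>', html)
    return "<p>" + html + "</p>"

print(wiki_to_html("A ''pattern'' is described on the SoftwareDesignPatterns page."))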
Often, there is no review before the modifications a user has made to a wiki page
are accepted and, commonly, edits can be made in real-time, and appear online
immediately. There are systems that allow or require a login which allows signed
edits, through which the author of a modification can be identified. In particular, a
log-in is often required for private wiki servers, and only after logging in is a user able to edit or read the contents. Most wiki systems have the ability to record changes so that an
edit can be undone and the respective page be brought back into any of its previous
states. They can also show most recent changes and support a history, and often
there is a “diff” function that helps readers to locate differences to a previous edit or
between two revisions of the same page. As with blogs, there is an obvious pos-
sibility to abuse a wiki system and input garbage.
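Such a "diff" between two revisions can be produced with standard tools; the following sketch uses Python's difflib module on two invented versions of a wiki page:

import difflib

old = ["A wiki is a collection of pages.",
       "Anyone can edit a page in the browser."]
new = ["A wiki is a collection of editable pages.",
       "Anyone can edit a page in the browser.",
       "All changes are recorded in a revision history."]

# Unified diff between revision 1 and revision 2 of the same page.
for line in difflib.unified_diff(old, new, fromfile="revision 1",
                                 tofile="revision 2", lineterm=""):
    print(line)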
There are numerous systems for creating and maintaining wikis, most of which
are open source (see c2.com/cgi/wiki?WikiEngines for a pretty up-to-date listing).
It is also no surprise that there are by now wikis for almost every
topic, a prominent example being The Hitchhiker’s Guide to the Galaxy (h2g2.com
). This wiki was created by The Digital Village, a company owned by author
Douglas Adams who also wrote the famous book by the same title, and was taken
over by the BBC after Adams’ untimely demise. Adams created it in an attempt to
realize his vision of an open encyclopedia authored solely by its users.
Wikipedia
One of the largest wikis today is the multi-lingual Wikipedia project which contains
more than 40 million pages in over 250 different languages and a large number of
images. Wikipedia started out as an experiment in 2001 to create the largest online
encyclopedia ever, but was soon growing faster than most other wikis. It soon got
mentioned in blogs and various print media including The New York Times.
Wikipedia has gone through several cycles of software development, and has
always strictly separated content from comments and from pages about Wikipedia
itself. Wikipedia has many distinct features that make its use transparent, among
them instructions on citations, which anybody who wants to refer to a Wikipedia
article elsewhere can easily download.
The Wikipedia administration has established strict rules regarding content or
users, in a similar spirit as blogs establish ethical rules. For example, a page to be
deleted must be entered into a “votes for deletion” page, where users can object to
its deletion; this reflects the Wikipedia idea of making decisions in a largely uniform way. By
a similar token, a user may only be instantly excluded from contributing to
Wikipedia in a case of vandalism, which has turned out to be rare. In general,
discussions that take place about the content of articles are most of the time highly
civilized, and Wikipedia has become prototypical for proving that the ease of
interaction and operation make wikis an effective tool for collaborative authoring.
The accuracy of Wikipedia has regularly been questioned. It was first empirically
tested in a study published by the British science journal Nature in December 2005. In
an article entitled Internet encyclopedias go head to head, Nature wrote that “Jimmy
Wales’ Wikipedia comes close to Britannica in terms of the accuracy of its science
entries, a Nature investigation finds”. For this study, Nature had chosen articles from
both Wikipedia and the Encyclopedia Britannica in a wide range of topics and had
sent them to experts for peer review (i.e., without indicating the source of an article).
The experts compared the articles one by one from each site on a given topic. 42 of the
returned reviews turned out to be usable, and Nature found just eight serious errors in
the articles, of which four came from each site. However, the reviewers discovered a
series of factual errors, omissions, or misleading statements; in total Wikipedia had
162 of them, while the Encyclopedia Britannica had 123. This averages to 2.92
mistakes per article for the latter and 3.86 for Wikipedia. While it may not be quite as accurate as Encyclopedia Britannica, it is much larger, with approximately 60 times as many words in the English Wikipedia as in the encyclopedia.
The reliability of Wikipedia, discussed at en.wikipedia.org/wiki/Reliability_of_
Wikipedia in Wikipedia itself, may in part be due to the fact that its community has
developed several methods for evaluating the quality of an article, including
stylistic recommendations, tips for writing good articles, a “cleanup” listing articles
that need improvement, or an arbitration committee for complex user conflicts,
where these methods can vary from one country to another; some articles show in
their header that they "need improvement." The bottom line is that Wikipedia is
one of the best examples for an online community that, in spite of permanent
warnings, works extremely well, that has many beneficiaries all over the world, that
is in wide use both online and off-line, and that enjoys a high degree of trust.
Wikipedia also is an excellent example of a platform that is social in the sense that
it gets better the more people use it, since more people can contribute more
knowledge, or can correct details in existing knowledge for which they are experts.
Wikipedia has meanwhile become part of Wikimedia, a "global movement whose mission is to bring free educational content to the world" that contains a number of other projects, including Wiktionary, Wikibooks, Wikinews, and Wikimedia Commons (see www.wikimedia.org/). Wikipedia's statistics continue to be highly
impressive and can be monitored at en.wikipedia.org/wiki/Wikipedia:Statistics or at
stats.wikimedia.org/.
Social software like wikis enables communication and relationships between individuals as well as between groups, and it supports the establishment, maintenance, and extension of social networks. We will see next that this
concept of social software can be found elsewhere as well.

1.6.2 Social Networks

According to Levene (2010), social networks bring another dimension to the Web
by going way beyond simple links between Web pages; they add links between
people as well as links between communities. In such a network, direct links will
typically point to our closest friends and colleagues, indirect links lead to friends of
a friend, and so on. In terms of Fig. 1.3, social networks can be seen as graphs
where nodes now represent individuals, and edges represent relationships between
them; or they can be seen as “topic maps” where the nodes represent topics that
people write about or hashtags that they attach to writings, and edges are references
between these topics or hashtags. As an example, wiki.digitalmethods.net/Dmi/
StartingPoints2 shows such a network of hashtags that connect or reference posts
related to an oil spill. In this section, we take a brief look at social networks and the
impact they are having on the Web today.
We note that the younger generation today has largely abandoned traditional
media such as newspapers. Indeed, investigations such as those done by the British
Office of Communications show that the “networked generation” is driving a radical
shift in media consumption. British broadband users already in 2006 spent on
average 12.7 hours per week online, growing to more than 31 hours in 2015. Much
of this time is spent on online social networking sites such as Facebook. Moreover,
the 16–24 year olds are spurning television, radio, and newspapers in favour of
online services. Increasingly households are turning to digital TV, TV or video on
demand and streaming services (e.g., Netflix) enabled by modern broadband con-
nections. Thus, while it initially took a while for average users and consumers to
accept the Web as a medium or platform, or indeed trust it, the generation of
“digital natives,” a term attributed to American writer Marc Prensky, is now growing
up with it and integrating it into everyday life much more comprehensively.
The information available on the Internet and the Web as well as the tools by
which this information has meanwhile become accessible has led to the establish-
ment of a number of distinct Internet communities, i.e., groups of people with
common interests who interact through Internet and the Web. Today, we can
identify at least the following types of communities:
• Communities of transactions which are characterized by the fact that they facilitate buying and selling as well as auctioning;
• Communities of interest which commonly center around a specific topic, e.g.,
movies, diecast cars, health food, dietary supplements;
• Communities of relations which are organized around life experiences, e.g.,
traveling in New Zealand on a budget, coping with breast cancer, or coming
from the same high school;
• Communities of fantasy which are based on imaginary environments and game
playing, e.g., Entropia Universe, World of Warcraft or Pokemon Go.

A prominent example, and at the same time one of the oldest communities of interest on the Internet, is the Internet Movie Database (IMDb), which emerged in 1990 from a newsgroup maintained by movie fans, grew into the biggest database of movies in cinemas and on DVD, and is nowadays owned by Amazon.
The considerable change in perception and usage has opened the door for the
present-day willingness of people to share all kinds of information, private or
otherwise, on the Web, for a new open culture as observable in blogs, wikis, and
social networks like Facebook, Snapchat, Twitter, or LinkedIn. The Internet offers
various ways to make contact with other people, including e-mail, chat rooms,
online dating sites, blogs and discussion boards (which, unfortunately, are also
heavily exploited by spammers, hackers, and other users with unethical intentions);
Fig. 1.8 summarizes the most important categories of Web tools for personal
communication and information management today and gives examples in each
category. However, while these services often just support ad-hoc interaction or
focused discussions on a particular topic (or less focused conversations on the
world at large), an online social network goes a step further and is typically the
result of employing some software that is intended to focus on building an online
community for a specific purpose.

Fig. 1.8 Tool types and sample tools for personal communication and information management. Categories and sample tools: communities; e-mail; forums and blogs; social network sites (Xing, LinkedIn, Facebook, Snapchat); instant messengers (Skype, WhatsApp, Google Talk, FB Messenger); massively multiplayer online games (MMOGs, e.g., World of Warcraft, Counter Strike); knowledge/data management (wikis, Picasa, Calendar); virtual worlds (Second Life, Habbo); microblogging (Twitter, Tumblr); Web tools (Maps, Google Groups, Adobe Connect); mobile apps, e.g., for Android or iOS

Many social networking services are also blog
hosting services where people can deliver longer statements than they would
otherwise, and the distinction between blogs and social networks is often blurred.
Social networks connect people with different interests, and these interests could
relate to a specific hobby, a medical problem, an interest in some specific art or
culture. Often, members initially connect to their friends whom they know, for
example, from school or college and later add new contacts, e.g., from their pro-
fessional life, often found through the Web.
A social network can also act as a means of connecting employees of distinct
expertise across departments and company branches and help them build profiles in
an easy way, and it can do so more cheaply and flexibly than traditional (knowledge
management) systems. Once a profile has been set up and published within the
network, others can search for people with particular knowledge and connect to
them. A typical example of a social network often used professionally and for
business purposes is LinkedIn, a network that connects people, but also businesses
by industry, functions, geography and areas of interest. Meetup.com is a social
events calendar in which registered users can post event entries and share them with
other users; it currently has more than 250,000 groups in more than 180 countries.
Since social networks can usually be set up free of charge (or for a low fee), they
are an attractive opportunity for a company to create and expand its internal contact base. However, a social network need not be restricted to a company's
internal operation; it may as well include the customers to which the company sells
goods or services (and hence be used for customer relationship management).
Notice that many of the tools mentioned in Fig. 1.8 are also in professional use
today.
An early social networking site that was most popular among members of the
networked generation around 2005 was MySpace, a Web site that facilitated an
interactive, user-supplied network of friends, personal profiles, blogs, groups,
photos, music, and videos. MySpace, along with others of its time, such as
bebo.com, were slowly abandoned when Facebook gained traction; in its own words, Facebook is "a social utility that connects people with friends and others
who work, study and live around them;” its story is well documented by Kirkpatrick
(2011). Similarly, Twitter has been successful as a social network and “mi-
croblogging” site, in which people can follow others and until recently could only
post entries of no more than 140 characters; for its evolution, see Bilton (2014).
We finally mention YouTube in this context, which can be seen as a mixture of a
video blogging site and a social networking site, but which exhibits an interesting
phenomenon: Due to the size of networks now existing on the Web, and due to the
enormous functionality which the sites mentioned above offer, IT and media
companies have discovered opportunities in the use of social networks. Like much
of the Internet industry, there is a rich history of acquisitions, takeovers and indeed
failure. Acquisitions include Google's purchase of YouTube, Rupert Murdoch's (News Corporation) purchase of MySpace (see the cover story in Wired magazine, July 2006), a purchase he later came to regret, Facebook's takeover of WhatsApp in 2014, and Microsoft's acquisition of LinkedIn in mid-2016.
Social networks on the Web have also triggered a renewed interest in socio-
logical questions regarding how the Internet and the Web are changing our social
behavior (including the question of why people send out spam emails, act as
Internet trolls, and try to hack into other people’s accounts), how communities are
formed, or how news spreads across the Web. Social network analysis investigates
metrics that measure the characteristics of the relationships between the participants
of a network. In particular, such an analysis looks for ties (between pairs of par-
ticipants) and their strength as well as their specialization into bridges, for triangles
(involving three people), and for issues such as the clustering coefficient of a
participant or the density of the network or the degree of separation. For a brief
introduction to the former, the reader is referred to Levene (2010), for a more
detailed one to Scott (2013) or Borgatti et al. (2013). Watts (2004) found that the
average path length for an e-mail message traveling the Web from sender to
receiver was around six; more recently, it was reported that Facebook found an
even smaller number: “The social media giant released a report on its blog
Thursday announcing 'each person in the world' is separated from every other by 'an average of three and a half other people'" (www.nytimes.com/2016/02/05/
technology/six-degrees-of-separation-facebook-finds-a-smaller-number.html).
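To make two of these notions concrete, the following sketch computes the clustering coefficient of a participant and the degree of separation (shortest path length) between two participants on a small invented friendship graph:

from collections import deque

# Invented, undirected friendship graph: person -> set of friends.
friends = {
    "Ann":  {"Bob", "Carl", "Dana"},
    "Bob":  {"Ann", "Carl"},
    "Carl": {"Ann", "Bob"},
    "Dana": {"Ann", "Eve"},
    "Eve":  {"Dana"},
}

def clustering_coefficient(person):
    # Fraction of pairs of a person's friends that are friends themselves.
    nbrs = list(friends[person])
    if len(nbrs) < 2:
        return 0.0
    possible = len(nbrs) * (len(nbrs) - 1) / 2
    actual = sum(1 for i, u in enumerate(nbrs) for v in nbrs[i + 1:]
                 if v in friends[u])
    return actual / possible

def degrees_of_separation(start, goal):
    # Breadth-first search for the shortest chain of acquaintances.
    queue, seen = deque([(start, 0)]), {start}
    while queue:
        person, dist = queue.popleft()
        if person == goal:
            return dist
        for nxt in friends[person] - seen:
            seen.add(nxt)
            queue.append((nxt, dist + 1))
    return None  # not connected at all

print(clustering_coefficient("Ann"))        # 1 of 3 friend pairs know each other
print(degrees_of_separation("Bob", "Eve"))  # Bob -> Ann -> Dana -> Eve: 3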
In conclusion, the current expectation for several years to come is that user
participation in and contribution to the Web, which has increased considerably in
recent years, will continue to grow, and that the social life of individuals, families,
and larger communities of people will increasingly be enriched by online appli-
cations from social Web networks. It remains to be seen whether this will indeed
have anything more than local effects in the sense that people may now take their
network of friends online, but do not necessarily include people they have never
met or that they most likely will never meet. On the other hand and as a result of
this development, the social Web has become increasingly relevant to companies as
a vehicle for marketing, advertising, and communication internally as well as with
their customers.

1.6.3 The Crowd as Your Next Community

The various developments we have described above, among them Friedman's open-sourcing flattener or the Wikipedia movement, have led to various novel forms of
collaboration among people world-wide and to new models of doing business
which we will briefly discuss next.
Crowdsourcing
Open sourcing refers to the concept of outsourcing a task or a project to a com-
munity or “crowd” of contributors. As we saw with Wikipedia, the concept has
meanwhile found a host of additional applications beyond the development of
open-source software, and the more recent denomination is crowdsourcing.
Crowdsourcing can be seen as an offspring of both blogging (a hierarchical orga-
nization, where one blogger talks to a crowd and may allow or reject comments)
and wikis, where a community works on a common task. The term crowdsourcing
was first coined in 2005 by Jeff Howe and Mark Robinson, editors of Wired
magazine (note: Wired also introduced the term “long tail” as previously discussed
in this chapter) and can be defined as "the act of a company or institution taking a
function once performed by employees and outsourcing it to an undefined (and
generally large) network of people in the form of an open call” (Howe 2009).
While a new(ish) term, the concept underlying crowdsourcing is indeed cen-
turies old and began with the development of the vacuum-sealed pocket watch, also
known as the ‘marine chronometer’. In 1714 the British Government offered a prize
of £20,000 for the first person who could develop a device to aid navigation and
help prevent the loss of sailors. John Harrison subsequently developed the marine
chronometer that was accurate in determining longitude by means of celestial
navigation. In the early part of the 20th century, Toyota ran a competition to
redesign its logo. The competition was won, out of 27,000 entries, by a design that
included the three Japanese katakana letters for “Toyoda” in a circle. This was later
revised to “Toyota”. In 1955, an architectural contest was run to design a landmark
building to be located in the harbor of Sydney, Australia. The now famous Sydney
Opera House was judged the best design out of the 233 entries. In terms of
crowdsourcing on the Web, Wikipedia, by way of crowd size, is an example of
knowledge crowdsourcing—even though it is more commonly referred to as a Wiki
and pre-dates the introduction of crowdsourcing as a concept.
Online crowdsourcing first became popular in the form of crowdjobbing. One of
the oldest such platforms on the Web is Amazon’s Mechanical Turk (AMT). AMT
is typically called upon for tasks that are easy for humans, yet difficult for a
computer. For example, analyzing a large body of photos for an occurrence of a
certain person can be done by a human at a glance, while a computer needs to
perform pattern matching on each photo and search for one of potentially several
patterns in which the person in question could occur in one of the pictures. For tasks
like these, AMT acts as an intermediary for the Human Intelligence Task (HIT),
which is specified by a requester who wants to outsource it and typically equipped
with a (generally small) reward. People interested in working on a HIT, so-called
workers (also known as Providers or Turks), can download it, solve it, and return
the results. This principle is shown at www.mturk.com/mturk/welcome.
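The following sketch models this requester/worker interplay in a few lines of Python; class and function names are invented for illustration and do not correspond to the actual Mechanical Turk API:

from dataclasses import dataclass, field

@dataclass
class HIT:                      # Human Intelligence Task, as posted by a requester
    description: str
    reward: float               # in US dollars, typically a few cents
    answers: list = field(default_factory=list)

class Marketplace:
    def __init__(self):
        self.hits = []

    def post_hit(self, description, reward):
        hit = HIT(description, reward)
        self.hits.append(hit)
        return hit

    def submit(self, hit, worker, answer):
        # A worker downloads the task, solves it, and returns the result.
        hit.answers.append((worker, answer))

market = Marketplace()
hit = market.post_hit("Does this photo show person X? (yes/no)", reward=0.05)
market.submit(hit, "worker_17", "yes")
market.submit(hit, "worker_42", "yes")
# The requester typically aggregates several answers, e.g. by majority vote.
answers = [a for _, a in hit.answers]
print(max(set(answers), key=answers.count))   # -> "yes"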
In general, crowdjobbing encapsulates a range of crowdsourcing tasks that are
concerned with the access to, and use of, labor, typically based on AMT. One of the
key characteristics of crowdjobbing is that in many cases, large, often complex
projects are broken down into much simpler tasks that, after completion by
anonymous crowd members, can be “re-assembled” to form the completed project;
see Kittur et al. (2013) for details. This approach gains the greatest benefit from a
large and disparate crowd that collectively may have the necessary skills, yet few
individual members may be able or willing to complete the full project. Depending
on their size and scale, tasks are often classified as either micro or macro tasks.
Micro tasks are HITs and can vary depending on the requester: writing descriptions, reviews, or articles; tagging and categorizing images; data entry; filling in questionnaires and surveys; looking up information for businesses; transcribing; rewriting; proofreading; and many other tasks. For each HIT listed there is an appropriate reward; workers can earn from $0.01 up to $100 per HIT, although the vast majority of HITs pay under $1. Macro work is a type of crowdjobbing where specialized skills, such as those related to education, infrastructure, or technology, are involved. Key differences from micro work are that macro tasks can be done independently, take a fixed amount of time, and require special skills.
Crowdjobbing is suitable for both physical and virtual tasks. Lebraty and Leb-
raty (2013) describe, as an example of a physical crowdjobbing task, the mobi-
lization of small businesses in a rural area. Businesses are sent an electronic file
which they are required to print and display in public areas. Virtual task applica-
tions are much more plentiful and ideally suited to the many big data problems that
exist. One such example might be data cleansing where on its own, the problem is
enormous, but by employing a large number of workers, each addressing just a very
small part of the cleansing project, it can be completed with a very high degree
of accuracy.
Crowdsourcing has also found entry into scientific applications. For example,
UC Berkeley has developed CrowdDB, a system that uses human input via
crowdsourcing to process queries that neither database systems nor search engines
can adequately answer.
Crowdfunding
A popular special case of crowdsourcing is crowdfunding. In simple terms,
crowdfunding is the use of a Web-based application to facilitative the generation of
funds, often in the form of small donations by a large number of “backers” for a
particular cause or initiative. This principle is shown in Fig. 1.9.

Fig. 1.9 Crowdfunding Principle
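As a simple illustration of this principle, the sketch below models a hypothetical all-or-nothing campaign (the model used by Kickstarter, among others): many backers pledge comparatively small amounts, and the pledges are only collected if the funding target is reached. All names and numbers are invented.

class Campaign:
    def __init__(self, title, target):
        self.title = title
        self.target = target          # funding goal in dollars
        self.pledges = {}             # backer -> pledged amount

    def pledge(self, backer, amount):
        self.pledges[backer] = self.pledges.get(backer, 0) + amount

    def close(self):
        raised = sum(self.pledges.values())
        funded = raised >= self.target
        # All-or-nothing: pledges are only charged if the target was reached.
        return {"raised": raised, "backers": len(self.pledges), "funded": funded}

campaign = Campaign("E-ink smart watch", target=100_000)
for i in range(1, 1201):                      # 1,200 backers pledging $99 each
    campaign.pledge(f"backer_{i}", 99)
print(campaign.close())   # raised=118800, backers=1200, funded=True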


It is common to see crowdfunding used for the funding of new product development, or alternatively for fundraising for humanitarian, social, or personal crisis causes
(e.g., funding for medical expenses). One of the most famous crowdfunding causes
in recent history was for the development of the Pebble e-ink smart watch in 2012.
As described, for example, at www.cnbc.com/2015/03/30/pebble-watch-funding-
hits-record-.html, the developers totally underestimated the scale of support for their
development (part of the incentive for donating was to be at the start of the queue
for the product when released) and ended up with more than 100 times their target.
The Pebble watch funding used the Kickstarter (kickstarter.com) platform, which is
the largest and most popular of the many such sites. Other well-used sites include
Indiegogo (www.indiegogo.com), Sparksters (www.sparksters.com), or crowdrise
(www.crowdrise.com/). Others include Givealittle (www.givealittle.co.nz) and
GoFundMe (www.gofundme.com) which focus on social and humanitarian
fundraising causes, or ArtistShare (www.artistshare.com) which operates as a music
label for “creative artists.” There is nowadays an almost limitless supply of
crowdfunding alternatives.
Other Forms of Utilizing a Crowd
In addition to crowdfunding, the basic structural tenet of crowdsourcing has been
applied to a broad range of other tasks resulting in a number of more specific types
of crowdsourcing. The most important of these is described here by way of
crowdsearching. Other, more narrowly focused types, but not discussed here,
include crowdvoting, crowd-auditing, crowdcuration, crowdcontrol, and crowdcare.
In crowdsearching the crowd is used to search for information that cannot be easily
sourced by computers, or to help locate missing physical items. An interesting
example of the latter is the online community “CrowdSearching” (www.hippih.
com/crowdsearching), which uses the power of social media to help people around
the world find their lost belongings. Tomnod (tomnod.com), which means “Big
Eye” in Mongolian, uses a crowd of “digital volunteers”, or “nodders” to identify
items of interest in satellite images, often for social or humanitarian causes. For
example, one cause focuses on the identification of buildings in rural Ethiopia from
satellite maps based on the assumption that buildings typically indicate human
habitation and, as such, acts as a de facto measure of population distribution. This
then can aid, for example, in the management of preventable diseases.
A key enabler of crowdsourcing is the platform facilitating the various crowd-
sourcing activities. In general, these application platforms are custom-designed for
specific types of crowdsourcing tasks. As mentioned, Amazon Mechanical Turk is a
micro-job crowdsourcing online marketplace where requesters use and pay for
human intelligence from workers to perform tasks that computers cannot do.
Delicious.com used to be a social bookmarking site that enabled the crowd to store,
share and discover bookmarks of Web documents. Utilizing “tags” these could be
stored in a variety of structures for ease of access, sorting and manipulation. There
also exists specialized crowdsourcing for idea generation (e.g., ideascale.com), data
sharing (e.g., deadcellzones.com), distributed innovation (e.g., ideasculture.com),
and content markets (e.g., threadless.com, istockphoto.com), to name but a few.
The crowdsourcing industry website (crowdsourcing.org) reports that as of January 2016, there existed over 3,000 verified crowdfunding sites alone. Considering all of the other crowdsourcing models, it is not unreasonable to suggest that the total number is in the tens of thousands. Only time will tell whether this is sustainable.
Crowdsourcing Pros
In general terms, irrespective of the specific type or application of crowdsourcing, a
number of key benefits characterize its value. The main reason for its use is the
significantly lower cost, compared to the cost of hiring dedicated professional
employees or contractors. For example, even when payment is involved, many
regions in the world operate within low wage markets, often significantly lower
than those within development economies. With the lower price comes the high
number of people who are ready and waiting to work for you anytime, avoiding the
extra cost overheads typically incurred when hiring professional staff. Another key
benefit is that the collective wisdom of the crowd is almost always going to exceed
that of internal staff, no matter how large the organization might be. The collective
knowledge of the crowd is superior because of the diversity and breadth of ideas
and knowledge it brings. Companies need to learn from those with different skills
and backgrounds—not from those confined to a department. It is also widely
accepted that crowdsourcing results in products or services having a quicker time to
market through accessing a critical mass of necessary technical talent who can
effectively work 24/7 or “follow-the-sun” across time zones where necessary,
alongside tasks that are undertaken in parallel. Many crowd workers are available
outside of normal working hours and during weekends, further enhancing the speed
of service delivery. Because many applications of crowdsourcing involve an ele-
ment of competition, creativity and innovation are encouraged, thus more creative
solutions can be explored. Overall, the openness of crowdsourcing allows for
greater collaboration, the competitive spirit, more ingenuity and innovation, and a
reduction in the need to rely on organizational knowledge hoarders.
Crowdsourcing Cons
Like all good things, however, crowdsourcing is not without its risks and down-
sides. Perhaps unsurprisingly, it is often the benefits, under certain conditions, that
can become downsides when those conditions change. The low-cost labor that
comes with the crowd along with the self-selection that goes with it, has the
potential to result in lower-quality and less credible products/services,
compared to development by professionals. For example, there are people making a
living out of AMT and other crowdjobbing platforms, and AMT HITs might be
subject to cheating. This raises the question of how to decide on the quality of a
delivered result and how to apply some form of quality control to the workers
involved. Both issues have been studied in recent research (see, for example, Das
Sarma et al. 2016) and are the reason why sites like CrowdSource (www.crowd-
source.com) and others have started to do training and testing for anybody who
wants to work for them. On the other hand, professionals are paid for their relia-
bility, expertise and experience; however, if you open up anything other than the
simplest of tasks, the opportunities exist for substandard work to be produced.
Another important issue to consider is the management of the crowd—how do you overcome the significant supervisory requirement? It may actually be easier (on
you) to simply hire a professional without the resource drain that might go with
remote, crowd-based workers. It is also important to recognize that the members of
the crowd are generally competing with one another and have no incentive to work effectively as a team for you. They can also choose to come and go as they
please. There are other risks that need to be mitigated wherever possible. This
includes ensuring that the right job is handed over to the crowd and does not
involve the release of company secrets. It is also important to ensure that the
relationship you have with the crowd is a positive one, otherwise the crowd can
easily turn against you and rapidly damage your brand or reputation. Overall, it is
crucially important that the right type of tasks are handed over to the crowd, and
that you are adequately resourced to deal with the vastness and diversity of the
contributions you can expect. Based on the various applications of crowdsourcing
discussed, along with the most important benefits and advantages identified, a series
of tips are presented below that are intended to help best select and manage a
crowdsourcing initiative:

• Be specific. Make sure to make a detailed list of what the person is supposed to
do. If you’re looking for feedback on a design, don’t ask “what do you think?”,
but be specific like, “Is the text readable? Do you see any layout errors? Does
the page load fast?"
• Don’t be too cheap. Crowdsourcing is cheap, but it follows the same formula of
other jobs: the more you ask from the people, the more you have to give in
return. If you ask the people to spend 10 minutes filling out a questionnaire, it’s
unreasonable to offer them 10 cents. Hardly anyone would bite on that deal, and
for those who do, the results won’t be useful to you.
• Have a way of verifying the results. When outsourcing to a large crowd of
non-professional workers, the results can vary greatly. Make sure to state in the
job description what skill or knowledge is required from the worker.
• Weigh your options. Instead of Amazon’s Mechanical Turk, consider using
dedicated crowdsourcing services for things like usability testing. While they
certainly cost more, they are a lot more likely to yield better results. (source:
hongkiat.com)

Similarly, Fitzgerald and Stol (2015) present a number of “Do’s” and “Don’ts” of
crowdsourcing software development that indeed apply to many uses of the crowd:

• Do build a relationship with the crowd to identify those who are invested. Get to
know the people who actually care about your product or service.
• Do provide clear documentation to the crowd.
• Do assign a special person to answer questions from the crowd.
• Don't stay anonymous.
• Don't reject submissions if they can easily be fixed.
• Don't underestimate the cost.
• Don't expect miracles.

1.7 The Web at Graduation?

During its relatively short lifespan of a little over 20 years so far, the Web has
undergone a tremendous development. It is nowadays truly ubiquitous, accessible
from almost any device in almost any place at almost any time. Indeed, wearesocial.
com/uk/special-reports/digital-in-2016 shows a global snapshot comparing the
world’s overall population to the number of Internet users as well as to the number
of mobile users, all figures as of January 2016:

• 3.419 Billion Internet users among a world population of 7.395 Billion people;
• 2.307 Billion active social media users;
• 3.790 Billion unique mobile users.

So the Web as the most prominent Internet service has indeed evolved from a
freshman in 1993 to a senior today, but is the Web already graduating?
We have discussed a variety of applications that can nowadays be used on the
Internet and the Web, and technology which has provided the underlying infras-
tructure for all of this with fast moving and comprehensive advances in networking
and hardware as well as software technology. We have also discussed the various
forms of user participation and contribution (which we might also call socializa-
tion) which has changed the way in which users, both private and professional,
perceive and utilize the Web, interact with it, contribute to it, publish their own or
their private information on it, or conduct business. So the Web has emerged from a
medium where a few people centrally determined what all others had to use to one
where almost half of the world’s population participate and jointly create and
publish content. An immediate consequence is that increasing amounts of data are
produced on the Web, which either get stored or are streamed. More data arises
from commercial sites, where each and every user or customer transaction leaves a
digital trace. The reader interested in what happens on the Web is recommended to
take a look at visual.ly/internet-real-time, which preserves an animation that orig-
inally appeared at pennystocks.la/internet-in-real-time/. Other such sources include
www.webpagefx.com/internet-real-time/ as well as www.betfy.co.uk/internet-
realtime/.
The term "graduation" is commonly used in connection with students who successfully finish a particular portion of their studies, such as a Bachelor or Master
program at a college or a university. Translated into Internet speak, the Web is far
from graduating, since the story has essentially just begun. We see a major reason
for this in the fact that technology developments occur in random jumps, not
linearly or continuously; in addition, as Friedman (2016) notes, these developments
are characterized by steady acceleration. A typical example is Apple’s iPhone,
which did not exist in 2006, and at that time nobody (maybe except for Steve Jobs)
was able to predict the vast changes that would occur after its introduction in 2007.
Today, a world without smartphones and other “smart” devices such as tablets is
unimaginable. The reader interested in what to expect might be interested in the
videos from the 2016 Annual Meeting of the World Economic Forum (WEF) that
can be found at www.weforum.org/agenda/2016/01/6-videos-that-will-help-you-
understand-davos-2016; more recent ones will be available in the future.
It is difficult to make reliable predictions for the next 5–10 years (something the
Gartner Hype Cycle is famous for), since we do not know when the next jump in
technology will occur and where it will take us. So the best we can do is to analyze
the situation in which we find ourselves today, try to understand the essentials of digital information technologies in wide use today, and provide help for the techno-savvy manager of a small or medium enterprise, who needs to be ade-
quately prepared and trained to make informed decisions about the best way in
which to adopt the host of new technologies and to react to modern developments.
Such technologies include cloud computing, social commerce, Big Data, the
Internet of Things, as well as many others. It is these developments, including their
technological foundations and business impacts that we will try to introduce our
reader to in the following chapters.

1.8 Further Reading

Berners-Lee (2000) is an account of the early design of the Web by the man who
created it; many other books deal with the history of the Web or with Berners-Lee
and his importance for the modern world. Berners-Lee submitted his proposal for
the World Wide Web in 1989 and launched the first website in 1991. He founded
the World Wide Web Consortium (W3C) in 1994, established the World Wide Web
Foundation in 2009, and is the recipient of the 2016 ACM Turing Award. A vivid
account in this context is the Internet History Program of the Computer History
Museum in Mountain View, California (see www.computerhistory.org/nethistory/).
We also refer the reader to the 2014 issue of CORE, the magazine of the Computer
History Museum, which contains various articles under “The Web at 25” umbrella
(see s3data.computerhistory.org/core/core-2014.pdf).

The client/server principle (cf. Fig. 1.1) has found wide use in computer systems
and is described in more detail, for example, by Tanenbaum and van Steen (2007).
Tanenbaum and Wetherall (2010) explain P2P networks in more detail. Musciano
and Kennedy (2006) is one of many sources on the HTML language and also
covers XHTML; for an introduction to the current version HTML5, available since
2014, see www.w3schools.com/html/.
The reader interested in search engines should consult Brin and Page (1998) for
the original research on Google, Vise (2005) for an account of the early history of
Google, and Miller (2009) for an in-depth presentation of its possibilities. Levene
(2010) describes how search engines work in general; Langville and Meyer (2012)
study Google’s as well as other ranking algorithms and give an in-depth exposition
of the mathematical and algorithmic aspects behind PageRank calculations. Information retrieval (IR) techniques are explained, for example, in Baeza-Yates and Ribeiro-Neto
(2011), Levene (2010), or Büttcher et al. (2010) and have had a big impact on how
a search engine works. The long tail concept is discussed in detail in Anderson
(2006) as well as on Anderson’s Web site (at www.thelongtail.com/).
Moore’s Law is discussed by Friedman (2016) and also by Thackray et al.
(2015); another account can be found in the 2015 issue of CORE under the title
“Moore’s Law @ 50” (see s3data.computerhistory.org/core/core-2015.pdf). Spec-
ulations that its end is near have been published repeatedly; see, for example, www.
technologyreview.com/s/601441/moores-law-is-dead-now-what/#/set/id/601453/.
On the other hand, leaps forward seem also possible, for example via Google’s
Tensor Processing Unit (www.pcworld.com/article/3072256/google-io/googles-
tensor-processing-unit-said-to-advance-moores-law-seven-years-into-the-future.
html). The future of semiconductors is outlined by Greengard (2017). For details of
TCP and IP, we refer the reader again to Tanenbaum and Wetherall (2010). A trend
in networking that has become popular around mid-2000s is software-defined
networking (SDN), an approach to computer networking addressing the fact that
traditional network architectures do not support the dynamic and scalable com-
puting and storage needs of more modern computing environments. SDN is
achieved by decoupling the network components that make routing decisions for
traffic (the control plane) from the components that forward traffic to selected
destinations (the data plane). An introduction to this area is provided by Goransson
and Black (2017).
Blogs have become a highly popular medium for people to express themselves;
readers interested in learning more about the topic are referred to Reardon and
Reardon (2015). Social networks have become popular outlets for self-portrayals
that often even exaggerate, a phenomenon that caused Time magazine to report on
“The Me Me Me Generation” already in its May 20, 2013 issue. Yet like other
social media, blogs and social networks have given rise to what is now called
“cyber-mobbing” or “cyber-bullying;” see, for example, Festl and Quandt (2016) or
Blöbaum (2016), an emerging research area within the field of communication
studies.
Howe (2009) is an early introduction to the area of crowdsourcing, Brabham
(2013) a more recent one. A recent study by Lechtenbörger et al. (2015) relates
success and activity on crowdfunding platforms during campaigns. The reader
interested in finding out how to utilize modern online social media in a business
context should consult an online bookstore and just browse its catalogue for “social
media.”
2 Digital (Information) Technologies

In this chapter we discuss a variety of digital technologies that have become rel-
evant over the years. We start by looking at digitized business processes as they
have transformed many areas of business. This topic is closely related to business
process modeling and management (BPM) and with the execution of business
processes, which is nowadays often done using engineered systems and appliances.
After that we present cloud computing and cloud sourcing (not to be confused with
crowdsourcing, which we introduced in the previous chapter), which are the main enablers of big data and analytics. Our goal is to present the technological basics of
these areas, emphasize why they are relevant today, and discuss what their impact
so far has been. We will not present these technologies in full technical detail;
instead we try to describe the core of what is needed to appreciate them and to see
them in perspective, namely in a customer perspective (Chap. 3) as well as in a
business perspective (Chap. 4).

2.1 Digitized Business Processes

In a globalized world, business processes increasingly form the crux of any orga-
nization. The reason for this is comparatively straightforward, if one considers that
any change in an organization will be accompanied by changes inseparable from its
business processes. Changes in the global market are part of the daily agenda:
Companies are increasingly forced to adapt to new customers, competitors, sup-
pliers, and business partners. Globalization is one of the “accelerators” that Tom
Friedman has observed (Friedman 2016). Competitive edges are increasingly achieved not by better products, but by more efficient and cost-effective processes.
In short: Business processes have developed into an additional factor of production.


2.1.1 What Is the Problem?

Given this background, it does not come as a surprise that all the professionals
and managers today, but also the IT staff, suppliers and sometimes even the cus-
tomers of a company—we collectively call these the business community—are
expected to have a good understanding of business processes. Collectively they
contribute to the design, analysis, documentation, execution, and evolution of many
different types of business processes. Of course, this only succeeds if an efficient
and effective communication is possible. It requires that the same language is used
within the entire business community and that time-consuming and error-prone
translation operations are avoided. However, what does this look like in practice?
Obscurities, contradictions, misunderstandings and omissions in communication are
common. Each interest group in the business community maintains its
group-specific perspective on a business process: The management focuses on
corporate goals and business performance indicators; business professionals have
their business applications and processes in mind, and IT professionals think in
terms of software- and hardware structures. It is clear that communication chal-
lenges are inevitable. In many organizations, one tries to remedy this situation with
modeling experts who “translate” the collected business requirements into process
requirements, summarizing these in vast and highly complex models. Such an
approach gives the appearance of professionalism and efficiency, and in fact, such
“model monsters” are often given the stamp of approval by the entire business
community, hence forming the basis of organizational change. However, many
community members only recognize the negative implications of these “model
monsters” at a point far too late when they are forced to live with the respective
organizational changes.
Figure 2.1 shows an alternative approach that is based on the use of a common
modeling language. This language is understood and (ideally) “spoken” fluently by
all members of the business community involved. This means that an explicit
translation of the communication processes will no longer be necessary, and the
abstraction of group-specific perspectives as well as structuring of the contents
conveyed can be carried out individually by everyone involved.
Which prerequisites are necessary for such a universal modeling language? First, it must be easy to learn, so that it can be mastered by inexperienced users quickly and reliably. The language must be able to express all technically relevant aspects of a business process in detail. All this is possible only if the language has a simple syntax, which requires a minimum number of language elements, and clearly defined semantics that distinctly govern the use and interpretation of the elements. Typical representatives of such languages are Petri nets, which have proved to be effective in business process modeling.
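To make the flavor of such a language concrete, the following minimal sketch (in Python, purely for illustration and not part of any Horus tooling; all names are invented) implements the two basic language elements of a place/transition Petri net together with its firing rule: a transition may fire when every input place holds a token, and firing consumes input tokens and produces output tokens.

# Minimal place/transition Petri net (illustrative sketch only).
class PetriNet:
    def __init__(self, places, transitions):
        # places: dict place -> initial token count
        # transitions: dict name -> (input places, output places)
        self.marking = dict(places)
        self.transitions = transitions

    def enabled(self, t):
        inputs, _ = self.transitions[t]
        return all(self.marking[p] > 0 for p in inputs)

    def fire(self, t):
        if not self.enabled(t):
            raise ValueError(f"transition {t} is not enabled")
        inputs, outputs = self.transitions[t]
        for p in inputs:
            self.marking[p] -= 1
        for p in outputs:
            self.marking[p] += 1

# A two-step order process: check an incoming order, then approve it.
net = PetriNet(
    places={"order received": 1, "order checked": 0, "order approved": 0},
    transitions={
        "check order":   (["order received"], ["order checked"]),
        "approve order": (["order checked"], ["order approved"]),
    },
)
net.fire("check order")
net.fire("approve order")
print(net.marking)  # {'order received': 0, 'order checked': 0, 'order approved': 1}

Even this toy version exhibits the properties asked for above: a handful of language elements and semantics that leave no room for interpretation.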
Simplicity of use combined with various fields of application: the strengths of Petri nets are best utilized when they are integrated into a proven modeling method in a particular way. Such a method can also regulate when and how analysis and simulation are to be employed, and which results can be obtained. Efficient work with Petri nets and the application of relevant methods is inconceivable without
appropriate software tool support. Tools ensure compliance with the syntactic rules, support methodical steps, and take over tasks of the administration, documentation, and use of the content created.

Fig. 2.1 Common language as a prerequisite for communication

2.1.2 Business Process Modeling and the Horus Method

Many approaches to business process modeling consider the modeling of process procedures in isolation. Other methods may take different aspects of a process into
account (procedure, organization structure, business rules, etc.), yet they still model
these isolated from one another. For the representation of an as-is process, this
might be acceptable—however, for an optimization and redesign of processes this is
not suitable. Here, an integrated analysis of all aspects relevant to the process is
indispensable. Thus, for example, even the most sophisticated procedure will only
deliver sub-optimal process results when not all necessary business objects are
available in a sufficient quality standard.
The Horus Method solves this problem by always looking at a process together
with its organizational environment or context. This applies to both the modeling
described in this section as well as to the optimization and further use of the resulting
models. In addition, this method motivates the user to describe all relevant aspects
with exactly those techniques that are best suited for them. This, at first, may sound
trivial, but is a problem frequently found in practice. The expert will immediately
recognize whether a model was created by a specialist in object modeling or a Petri
net specialist or by an organizational structure manager. The specialist will always
try to describe as many aspects as possible in the modeling language familiar to her
or him as in the case of an object modeler, for example. He or she will try to place
numerous procedural aspects concerning specializations and integrity constraints in
the object model. The Horus Method provides a solution by guiding a user with
clearly defined steps through the modeling process, giving instructions as to what
essential facts need to be modeled in which particular manner.
The Horus Method offers steps both for model expansion through additional
elements (activities, organization units, etc.) as well as for joining various modeling
elements (e.g., organization unit is responsible for an activity or executes it). Fig-
ure 2.2 provides an overview of the Horus Method. It subdivides business process
engineering into four phases. Phase 0 is the preparation of the engineering project.
Phase 1 is the strategy and architecture phase to study the strategic aspects and
definition of enterprise and system architecture. Phase 2 is the detailed business
process analysis. Finally, Phase 3 is the subsequent usage of the model. Beyond
these phases, modeling is accompanied by project management, measures for
quality assurance, and up-to-date documentation.

Fig. 2.2 The Horus Method (Phase 0: Preparation; Phase 1: Strategy and Architecture; Phase 2: Business Process Analysis; Phase 3: Application)
From a Mission to an Architecture Model
The primary goals of the Horus Method are the gathering and structuring of
business requirements as well as the creation of a comprehensive business process
model that considers all relevant aspects including the process background and
context. The focus therefore lies on the actual model development and not on
process improvement, the development of an information system or even the
enforcement of the process within the organization. Such tasks will be dealt with in
Phase 3. Nevertheless the Horus Method puts an analysis of a company’s strategy
as well as the modeling of the corporate structure and the architecture of a sup-
porting information system at the beginning of a business process analysis. The
reason is that it has turned out in practice that only in this way is it possible to
involve decision makers adequately in a modeling project, and to convince them of
ongoing cooperation and support.
Business Process Analysis
Business Process Analysis is carried out in Phase 2 of the Horus Method within the
frame that has been marked out as a result of the Strategy and Architecture phase.
That framework defines both the width and depth of the analysis area, including in
particular where the focus of the analysis lies. The models to be created in Phase 2
are in part further refinements of the Phase 1 models, displaying the facts in much
greater detail and from less of a strategic and more of a technical point of view.
Simulation
The Horus Method describes a consistent process for comprehensive modeling of
business processes within the framework of business process engineering. The
central point of reference is represented by XML nets that offer formally sound
opportunities for process simulation; such nets combine Petri nets with XML
documents for the specification of objects managed and manipulated by processes,
see Schönthaler et al. (2012). A look at the business practice in connection with the
simulation shows, however, a surprising ambivalence: Although the need for
simulation is undisputed—especially in digitalization endeavors—and every deci-
sion maker “lusts” for ways to test the consequences of his decision alternatives in
advance, simulation is often rejected based on the expected expenses for prepara-
tion and execution of simulation studies. The added benefit generated by a simu-
lation is considered too low, and simulation is often labeled as a cost driver. This is
different with the Horus Method: It comprehends simulation as a key to signifi-
cantly increasing the benefits arising from business process engineering. The use of
simulation-capable models in the Horus Method provides for a substantial reduction
in simulation efforts. Moreover, the seamless integration of modeling and simula-
tion—which is also reflected in the Horus tools—enables entirely new forms of
project communication, which are reflected directly in the quality of project work
and the results achieved.
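The following sketch hints at what a simulation run over such a model does (Python for illustration only; plain dictionaries stand in for the XML documents attached to tokens, and all activity names, business rules, and timings are invented): the process is replayed many times with varying business objects, and quantities such as the average cycle time are estimated from the runs.

# Illustrative simulation in the spirit of XML nets: tokens carry business
# objects, and repeated runs yield process metrics (all figures invented).
import random

def simulate_order(seed):
    random.seed(seed)
    order = {"id": seed, "amount": random.randint(100, 5000)}  # stands in for an XML document
    t = 0.0
    t += random.uniform(0.1, 0.5)            # activity: check order
    if order["amount"] > 4000:               # business rule evaluated on the document
        t += random.uniform(1.0, 2.0)        # activity: manual approval
    t += random.uniform(0.2, 0.4)            # activity: confirm order
    return t

runs = [simulate_order(i) for i in range(1000)]
print(f"average cycle time: {sum(runs) / len(runs):.2f} hours")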

2.1.3 Holistic Business Process Management

Business Process Management (BPM) has established itself since its beginnings in
the early 1990s as an independent discipline that bridges the gap between corporate
strategy and the information and communication technologies used by a company.
Based on the Gartner IT Glossary (see www.gartner.com/it-glossary), bpm.com
Forum and Rosing et al. (2015), BPM can be defined as follows: “BPM is a
discipline that uses different methods and software tools to discover business
processes. These are then captured in models, in order to analyze and simulate, to measure, and to improve and optimize them on the basis of predetermined criteria. BPM is aligned with the business strategy that is important to the company or the organization as a whole. Business processes coordinate the behavior
of people, systems, information and things (see IoT) within an enterprise or across
enterprise boundaries to achieve a beneficial result for the organization.” Schön-
thaler et al. (2012) refer to the people at large relevant to the process as the business
community. It includes a company’s employees, customers, suppliers, and other
external partners.
The execution of a process means that instances of a process model are created
and that the activities defined therein are carried out as required by the business
process model. The execution can be automated as a whole or in parts, but it can
also be carried out manually in its entirety, for instance by adopting organizational
instructions. To automate business processes or sub-processes, process-oriented
information and communication technologies are used, i.e., technologies that ade-
quately meet the requirements of the business process specifications and offer a
good user experience. Business processes can be structured and repeatable or
unstructured and unpredictable. Unlike structured business processes, unstructured processes do not focus on the process itself; the focus is on the information to be processed and the requisite knowledge, and the actual process only emerges at the time of execution. Following the suggestions of the Workflow
Management Coalition (WfMC; www.wfmc.org) we refer to this specific applica-
tion of business process management as Adaptive Case Management (ACM; see
Swenson 2010).
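A minimal sketch of what execution means in this sense is given below (illustrative Python; the process model, activity names, and worklist behavior are invented): an instance of a process model is created, and each activity defined in the model is either carried out automatically or handed to a person as a task.

# Toy "engine": creates an instance of a process model and walks through its
# activities, automating some and placing the rest in a user's worklist.
process_model = [
    ("register invoice", "automated"),
    ("approve invoice",  "manual"),
    ("post payment",     "automated"),
]

def run_instance(instance_id, model):
    for activity, mode in model:
        if mode == "automated":
            print(f"[{instance_id}] system executes '{activity}'")
        else:
            print(f"[{instance_id}] task '{activity}' placed in a user's worklist")

run_instance("INV-2017-042", process_model)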
Business process management is an important constituent of a lasting recipe for
success in many successful and future-oriented companies. The different aspects of
the business processes relevant for companies are fully described, documented, and
made transparent for all persons involved, so that these may adequately become a
part of the process. In addition to full documentation, the goal of business process
management is an improvement or optimization of processes. This may occur under
different objectives. In principle, there are numerous possible goals or several
continuous activities of business process management as represented in Fig. 2.3:

Fig. 2.3 Objectives of BPM (shaping: set-up, design, engineering; reshaping: optimization, reengineering)

• Design: Business process management is set up for the first time and introduced.
The processes of the respective companies are subjected to a design.
• Engineering: Designed processes are implemented and made available for
execution. An efficient use of resources is important, as is an adequate connection of process, object, and organization models.
• Monitoring: Existing and pre-established business processes are subject to
continuous monitoring to identify and remove bottlenecks in processes or
resource allocation. The management of the business processes is improved constantly; this can be done continuously or at specific time intervals. The goal
is a continuous optimization of the current operations of existing processes.
• Reengineering: An established process management is redesigned or optimized
partly or completely because of changed organizational conditions.

Business process design refers to the design and development of a process prior to
its implementation. This case is usually found in practice only where completely
new business fields must be integrated into an existing process landscape or when
new technical possibilities are introduced (e.g., a switch from a bricks-and-mortar
business to e-commerce). Business process engineering means the continuous
further development and optimization of processes. Proven processes are retained
and linked with improved or partially redesigned processes. The changes are not drastic; they take place gradually. On the one hand, this reduces the risk that a transformation always brings with it; on the other hand, it improves acceptance.
A prerequisite for this evolutionary form of business process development is a
permanent monitoring. Only then can weaknesses be identified and the impact of
the changes be displayed and analyzed. All these objectives require a precise
definition of the business processes as well as a consistent documentation, which
can be achieved by adequate modeling.
The original method introduced by Hammer and Champy in 1993 for business
process reengineering (BPR) proposes a radical redesign of the existing process
landscape. For best results, all processes will be newly developed from scratch.
In doing so, however, even proven processes remain unconsidered. Because of the serious
changes, which a radical change generally causes, the approach did not gain
acceptance in its “pure form” in operational practice. A redesign, on the other hand,
of distinct subareas of a company is much more feasible and palatable.
Generally accepted nowadays is holistic business process management, which on the one hand takes the aspects of procedure modeling, object modeling, and organization modeling indicated here into consideration. Further, it takes the business view (abstracting from processes) as well as the
service view (implementing processes and their constituents) into account; this is
indicated in Fig. 2.4.
Fig. 2.4 Integrated BPM (business view; process view with process models; service view with Web services, a rules engine, BPEL, an Enterprise Service Bus, human workflow, and legacy systems)

2.1.4 BPM Applications

Business processes are the focal point when it comes to changes in an enterprise, be
it the implementation of new business models and strategies, be it in the realization
of information systems, or be it related to quality improvements—the discussion
always involves business processes. A logical consequence is the necessity for a
realistic and easy-to-understand portrayal of the business processes that qualify as a
basis for effective communication, but also for analysis and simulation. Based on
current practical projects, we will show next how business process models can be
used in important applications as well as the resulting benefits derived therefrom.
Business Process Reengineering (BPR)
While many books have been written on the topic of business process reengi-
neering, genuine accounts of implementation are rare. Why is this the case?
Because the nature of business process reengineering is a “fundamental rethinking
and redesigning of all business processes of an enterprise or corporate division.”
And as an objective, “dramatic and sustained improvements in process performance
in terms of quality, cost and time” come into question. Enterprises quickly find
enthusiasm for this goal; however, they have trouble with fundamental rethinking
and, above all, with an implementation of completely redesigned processes. This is
particularly true when—as is typical in business process reengineering—a new
organizational structure is derived from the new process. In practice, business
process reengineering can only be successful in connection with effective organi-
zational change management. This shows that business process reengineering not
only requires organizational strength in implementation, but an “adequately
dimensioned” project budget as well. In this respect, business process reengineering
often comes too late for enterprises that already find themselves in an economic
crisis.
With BPR projects, an intensive use of conventional organizational materials,
techniques, and methods (flipcharts, pin boards, questionnaires, etc.) has proven
effective. These reach their limits, however, when it comes to a discussion of
specific business processes, business objects, or rules. Misunderstandings then
occur, and unrecognized inconsistencies and incomplete statements impede infor-
mal communications considerably, often leading to questionable results. Also with
respect to documentation, significant quality deficiencies can be observed. Working
with business process models can help here, especially when appropriate software
tools are available.
When working with business process models, solutions are produced that prove
themselves repeatedly, therefore making them candidates for reuse in future pro-
jects. However, such models must be generalized and quality-assured prior to reuse.
Renowned consulting firms achieve substantial competitive advantages with such
“best practice models” or “knowledge bases.” In many cases, the manufacturers of
business process tools also offer such models.
Best practice models are frequently used in BPR projects. However, they pose a
danger in that they obscure the focus on completely new solution variants. For this
reason, the use of reference models is recommended, especially in the preparation
of details, but not in the preparation of initial process ideas.
Reference models for BPR must ideally be oriented towards a specific industry
or adapted to a specific area of application, so that they create a common conceptual framework for all project participants. And the models, within their scope, must be sufficiently generalized so as not to constrain their use.
Business Process Management and SOA
Since the mid-1990s, business processes have regularly become the center of debate
when it comes to corporate strategy and organizational issues. Even with the
introduction of packaged business application software (such as SAP or Oracle
Applications), the use of business process models has long been state-of-the-art. In
the individual development of information systems, however, its significance has
always been underestimated. Many projects that failed due to imprecise or missing
process definitions prove the accuracy of this statement.
With the increasing use of the Service-Oriented Architecture (SOA) concept,
business processes have now gained an entirely new meaning. First, SOA appli-
cations have already shown that an efficient infrastructure for process-based man-
agement and execution of Web services and applications alone is no solution.
A consistent design of the business processes that make up the heart of an SOA, in
accordance with the requirements of the departments, is more important. Only when
the processes really meet the business requirements can their automation provide
the best results. Based on these considerations, holistic business process manage-
ment has long surpassed the topic of SOA in its significance. And it is undisputed
that a consideration of business processes now belongs at the center of each
strategy, organization and IT project. Business process models then act as a central
reference point for all technical specifications and build the bridge to the specific
implementation in the form of organizational and IT solutions.
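As a rough illustration of this idea (the endpoint URLs, payloads, and the use of the Python requests library are hypothetical; a production SOA would typically rely on BPEL and an Enterprise Service Bus as sketched in Fig. 2.4), a step defined in the business process model can be realized as an orchestrated sequence of Web service invocations:

# Sketch of a process step realized as Web service calls; the business process
# model dictates the order of invocations, the services implement the steps.
import requests

def handle_order(order):
    # step 1: invoke a (hypothetical) credit check service
    check = requests.post("https://services.example.com/credit-check",
                          json={"customer": order["customer"]}).json()
    if not check.get("approved"):
        return "rejected"
    # step 2: invoke the fulfilment service only after a positive check
    requests.post("https://services.example.com/fulfilment", json=order)
    return "fulfilled"

print(handle_order({"customer": "C-4711", "items": ["A-100"]}))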
Process-Oriented Introduction of Business Software
There are numerous reports of failed or at least severely delayed and costly software
projects. Many of these projects deal with the introduction of standard software,
although this at first appears easier than developing new custom software. Why
business software projects appear to be more difficult will be explained below, in
order to derive an improved introduction process from it. It is often discovered only
during the course of the implementation of business software that the business
processes of the standard software at hand do not correspond to the current pro-
cesses in the enterprise. The comprehensive functionality of a standard software and
the resulting complexity lead to a divergent understanding of the term “standard”
with respect to these processes. Heterogeneous IT system landscapes are another
reason for problems in the introduction of standard software. Complex interface
solutions for the integration of different systems require additional efforts. Modern
cloud solutions offer much hope with respect to addressing complexity, agility, and
cost. Nevertheless introduction projects will remain difficult and will lead to more
organizational change management efforts.
For business users, introducing new business software is always a challenge: firstly, entirely different skills are required than those needed in daily business, and secondly, the project work is often done in addition to the daily standard tasks. Getting acquainted with new business software is difficult because of the complexity of the software, and the documentation is correspondingly extensive. Furthermore, the true benefit of such solutions often becomes visible only through the interplay that the software realizes between several areas of an enterprise. This overlapping view of the solution
remains hidden for many users at first. The same problem is often found in the
system documentation, especially when it is structured purely by function and not by process. The points listed lead to long project execution times and often to budget overruns. Furthermore, functions not covered by the standard
software are often only identified during system testing.
Many of the problems described cannot be solved with traditional business
software adoption methods and therefore demand novel approaches, with BPM
being one such approach. For an enterprise software to be introduced, predefined
business process models—often also reference models—are the key to shorter
project terms as well as to high quality in the implementation and results of the
project. Figure 2.5 shows an example of such a reference business process model
for the enterprise software Oracle Fusion ERP Cloud.1

1 Oracle Fusion ERP Cloud is a product of Oracle Corp., Redwood Shores, CA, USA.

Fig. 2.5 Model of a reference business process from the financial area

Governance, Risk and Compliance (GRC)


The days where an enterprise could be led entirely in an autocratic “lord of the
manor” fashion are long over. Regulators react to the global escalation of economic, environmental, and computer crimes with increasingly complex regulations. The crux here is that for a globally active enterprise, it no longer
suffices to only consider the impact of national regulations on the enterprise level,
but rather all international laws and regulations must be considered in the context of
transnational business processes. A significant challenge is that these regulations are not even compatible with one another in many cases. In addition, investors and financial institutions
demand an effective risk management system, for example through the
establishment of early warning systems to identify risks and to create greater
transparency in financial processes. Furthermore, the increasingly short half-life of strategic decisions can only be handled with efficiently managed and secure business
processes. In short: Governance, risk and compliance issues (GRC for short) are at
the top of every executive’s agenda; these topics define one of the most important
application areas for business process engineering. The benefits of comprehensive
business process models are particularly numerous with GRC, because most
of the requirements relate to the quality of process control and transparency of
business operations. But first a short disambiguation:

• Governance is running a business on the basis of clearly understood and formulated business objectives and instructions. Important conditions are legal
compliance and completeness. Governance thus extends across all business
units and levels, which is why we speak of horizontal and vertical governance.
• Risk management is the collection of all measures for dealing with known and
unknown enterprise-internal as well as enterprise-external risks. These include
the establishment of early warning systems to identify risks as well as measures
to eliminate potential risks and for the treatment of incurred risks.
• Compliance denotes conforming to a rule, correspondence or conformity with a
specification, policy, standard or law with (ethical and moral) principles and
procedures, including standards (e.g., ISO) and clearly defined conventions.
Compliance fulfillment can be both enforced (e.g., by law) or voluntary (e.g.,
adherence to standards).

In corporate practice, it has proven reasonable, if not mandatory, to treat governance, risk and compliance management in a cross-thematic context. The
reasons are obvious: Many interdependencies exist; synergies arise during imple-
mentation that enhance the effectiveness of planned actions on the one hand and contribute to cost reductions on the other. By the way, companies that do not
view GRC as a burden, but above all as an opportunity to improve business pro-
cesses, achieve genuine cost savings and improve their competitive positions. GRC
is often structured into three packages of measures that reflect different views of
GRC, but which are closely related to one other: Finance and Audit GRC, Legal
and Process GRC, IT GRC.
The implementation of GRC projects has an intrinsic complexity, which results
from the large number of fields of action as well as from the diversity of business
requirements. This complexity is manageable only if easy-to-understand enterprise
models are used, and a systematic procedure for creating these models exists. The
models allow efficient forms of communication in the context of the GRC project.
They provide for a consistent documentation, supplying approaches through anal-
ysis and simulation for the assurance of quality and the optimization of examined
business processes. As for the diversity of the business requirements, Horus has the
advantage that the generated partial models can be formally linked with one another,
as shown in Fig. 2.6. Such an integrated enterprise model prevents the creation of new “information islands” through GRC that would lead to inefficiencies and hence would stand in the way of interesting optimization opportunities.

Fig. 2.6 Integrated Horus enterprise model (organization, object, procedure, risk, key figure, and rule models with their formal links)
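A highly simplified sketch of what such formally linked partial models can look like is given below (illustrative Python; the attribute names are invented and do not reflect the actual Horus metamodel): an activity in the procedure model carries explicit references into the organization, object, risk, and rule models.

# Toy data structure for an integrated enterprise model: one activity linked to
# the other partial models (all names and links are invented examples).
from dataclasses import dataclass, field

@dataclass
class Activity:
    name: str
    executed_by: str                              # link into the organization model
    objects: list = field(default_factory=list)   # links into the object model
    risks: list = field(default_factory=list)     # links into the risk model
    rules: list = field(default_factory=list)     # links into the rule model

approve = Activity(
    name="approve supplier invoice",
    executed_by="Accounts Payable",
    objects=["invoice", "purchase order"],
    risks=["duplicate payment"],
    rules=["four-eyes principle above EUR 10,000"],
)
print(approve)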

2.2 Virtualization and Cloud Computing

We mentioned in Chap. 1 that more and more applications and services that are
available on the Web or on the Internet do not require a local installation anymore;
instead, they nowadays reside “in the cloud.” Essentially this means that there are
computing and storage resources, databases, or other systems or resources that
(often exclusively) are accessible via the Web, typically via a browser, or via an app
that is installed on a mobile device. Typical examples include audio and video
streaming (e.g., via Spotify or Netflix), shared writing of documents via Google
Docs or Zoho Docs, or the various applications and sites that are commonly
involved when booking travel: airlines for flights, hotels for accommodation, rental
car or shuttle services for transportation. All of them can be researched on the Web,
can be accessed for time and price comparisons, and are available for booking,
paying, and finally evaluating the product or service, often without leaving a single
platform or portal. In this section we deviate from the topic of BPM, and take a look
at the infrastructure that is rapidly becoming ubiquitous.
From a technical perspective, the infrastructure behind such applications consists of compute or storage servers as well as the software running on them. We have no
idea where they are located, how they have been set up, or when maintenance is
needed. Lewis Cunningham of Oracle Corp. has characterized this by saying “cloud
computing is using the Internet to access someone else’s software running on
someone else’s hardware in someone else’s data center while paying only for what
you use.”2 In other words, cloud computing refers to the external provision of IT
infrastructure as well as applications via the Internet, conveying to the user the illusion of unlimited, on-demand resources.

2 it.toolbox.com/blogs/oracle-guide/cloud-computing-defined-28433
Nicholas Carr, in his 2008 book The Big Switch, has compared this situation with
the arrival of electrical current; according to Carr (2008), we were at the point
where the handling of electricity was roughly 120 years ago: In order to be able to
use electricity, you had to produce it yourself. Later people learned how to transport
electrical current over increasingly long distances. Today we are used to obtaining
electrical energy from an outlet in the nearest wall in the quantity and quality just
needed, without having to worry about the source, the path to the outlet or the
provider. Carr compares this to the development of how we utilize computing
resources: Until a few years ago, companies or individuals needed their own
machine(s) if they needed compute or storage capabilities. As time passed, thin
clients replaced workstations, and servers withdrew into the background. In the
future, the location from which computing resources originate will no longer be apparent, yet the resources will be readily available in whatever quantities are needed, and at a price proportional to usage.
Smartphone and tablet users have gotten used to reading and writing e-mails
using a cloud-based service like Google Mail, to storing itineraries with a service like TripCase or Tripit, files with Dropbox, or images with Instagram. The advantages
are obvious: There is no more need to store data on a local device, yet the data is
accessible (almost) anywhere and anytime. Protecting the data against loss or
misuse is left to the provider, and most of the time the provider will charge only
when advanced services (e.g., of a certain service quality) are requested. It is
exactly these aspects that make cloud computing relevant for companies: Resources
like compute power or storage space can be obtained in adequate amounts, as can
software services, and all without the need for local installations or maintenance.
We next take a closer look at several cloud applications and their providers that are
particularly relevant to enterprises.

2.2.1 Cloud Service Examples

As an example for a software service that is often being used by both individuals as
well as companies, we consider Gliffy (www.gliffy.com/), a drawing tool for
designing diagrams of various kinds, SWOT analysis results, Web page layouts,
network sketches and architectures, business process models, or technical drawings
(see www.gliffy.com/examples/), all of which can be done in a browser. Gliffy can
be used online or as a plugin to team collaboration software Confluence, and all
drawings or diagrams are stored at the Gliffy site, so that users can collaborate on


them online. Besides the fact that this versatile cloud service does not require local
installation, it is also beneficial that a user does not have to worry about new
versions or security patches, and that pricing comes in three versions: Free, Stan-
dard, and Business. These scale the price a user has to pay from zero to around US$
10 per user per month, which makes it attractive for people who need to draw quite
a bit. Users might, however, be unhappy with the fact that their work remains on
Gliffy servers, which brings up the question of how well these are protected against
unauthorized access (e.g., by the competition), espionage, or loss, for example in a
catastrophe. Questions of this type result in some enterprises being hesitant to move
their applications and data to the cloud.
Gliffy is representative of a comprehensive class of software that in recent years
has been established on the Internet as an alternative to traditional licensing. Private
as well as professional users have on-demand access to writing tools, spreadsheets,
presentation software, meeting planners, calendar tools, conferencing software,
project management, accounting, HR management, as well as many others. Of the
many examples out there today, we just mention Google Apps for Work (gsuite.
google.com/), ThinkFree Cloud Office (www.thinkfree.com/), Syncplicity (www.
syncplicity.com/), or Zoho (www.zoho.com/).
Our second example goes a step further and no longer just considers usage, but
also development of applications. Sites offering both development and immediate
deployment are called platform services. An example is Force.com (www.
salesforce.com/platform/products/force/), which is a service offered by Salesforce.
As they say themselves, “Force connects business users and IT with a full suite of
tools for building apps that automate business processes, faster than ever before.” In
other words, Force is a platform enabling both the development and provisioning or
usage of applications; thus, a programmer can focus on functionality, search
function, process support or reporting options when developing an app, without
even having to bother with provisioning, sales, or management of the necessary
infrastructure (programming environment, machines, etc.). Once developed, apps
can directly be deployed using the Force platform, which they call “App Cloud.”
The developer no longer needs to consider scaling for increasing or decreasing
traffic, and the App Cloud even performs automatic backups. Besides Force, the
Salesforce site offers a variety of additional tools supporting the development of
mobile apps or appealing user interfaces and even connects to a marketplace for
apps (see www.salesforce.com/platform/products/). Alternatives to Salesforce and
Force are provided by the Google App Engine (cloud.google.com/appengine),
Heroku (www.heroku.com/), Microsoft Azure (azure.microsoft.com), or the Oracle
Cloud (www.oracle.com/cloud/paas.html).
Our third example comprises an even more elementary category of services,
namely the provisioning of pure infrastructure services or basic resources such as
compute power or storage space. Such resources can be obtained, for example, from
Oracle Infrastructure as a Service (www.oracle.com/cloud/iaas.html) or from
Amazon Web Services (AWS, aws.amazon.com/), the same platform that provides
the Amazon Mechanical Turk services we mentioned in Chap. 1. AWS started as a
small side business in 2006 and meanwhile also comprises a number of platform
services as well as software service end-products. A typical example of an infras-
tructure service is the Amazon Simple Storage Service (S3), which offers a simple
Web interface for reading or writing data. Among other functionality, S3 allows
users to read, write or delete data objects between 1 byte and 5 TB in size, where the
number of objects that can be stored is unlimited. Each object is stored in a bucket
and can be accessed using an individual key. For storing buckets, AWS maintains
various regions, including US Standard, EU, US West or Asia-Pacific. Users can
choose a region for optimizing delay, minimizing costs, or being compliant with local regulations. Finally, AWS protects data against unauthorized access through
authentication mechanisms. Another sample infrastructure service is the Amazon
Elastic Compute Cloud (EC2), which enables access to computing power that easily
scales. EC2 provides a virtual server environment within which server instances
running distinct operating systems can be used; these instances can be loaded with
specific application environments which can then be executed. The important aspect
is that a user can easily scale compute power upward or downward; AWS will
charge for what has actually been used.
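To give an impression of how such an infrastructure service is used programmatically, the following sketch stores and retrieves an object in S3 via the boto3 library (the bucket name and object key are hypothetical, and AWS credentials and a region are assumed to be configured locally):

# Writing and reading an S3 object through boto3; bucket and key are examples.
import boto3

s3 = boto3.client("s3")
bucket = "example-company-archive"   # hypothetical bucket name

s3.put_object(Bucket=bucket, Key="reports/2017/q1.pdf", Body=b"...binary data...")
response = s3.get_object(Bucket=bucket, Key="reports/2017/q1.pdf")
print(len(response["Body"].read()), "bytes retrieved")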
AWS contains numerous other services at the infrastructure level, including both
simple and more complex ones such as the Amazon Relational Database Service
(RDS) which can manage database services such as Amazon Aurora, MySQL,
PostgreSQL, Oracle, SQL Server and MariaDB; beyond this, AWS comprises
platform services (e.g., Hadoop for handling Big Data or Amazon Redshift as a data
warehouse service) and even application-level services such as e-mail, search, or
workflow support. Numerous examples of applications that have been realized
using AWS can be found at aws.amazon.com/solutions and cover such diverse
areas as business applications, e-commerce, government, education, health,
gaming, scientific computing, as well as many others. (In Sect. 2.2.5 we will
introduce the common terms “Infrastructure/Platform/Software as a Service”,
respectively, for these types of offerings.) By early 2016, AWS had grown to be the
largest cloud infrastructure service, with a market share of more than 30% and
hence bigger than the “next three” (Microsoft, IBM, and Google) taken together.
A brief account of the AWS history so far can be found in a 2016 Techcrunch post.3

3 techcrunch.com/2016/07/02/andy-jassys-brief-history-of-the-genesis-of-aws/

2.2.2 Relevant Issues

With services like the ones just discussed, it is not difficult to imagine that the entire
IT functionality of an enterprise can be moved to the cloud; all that is needed within
the enterprise are access points (“thin clients”) to the various cloud-based infras-
tructure, development or runtime environments, or application services. Such a
transition can obviously be done in several steps and can have several forms, which
are summarized in Fig. 2.7.

Fig. 2.7 Development options for enterprise IT using cloud services (from left: Your Own IT, IaaS Cloud, PaaS Cloud, SaaS Cloud, each comprising user interface, application logic, development/runtime, infrastructure, and data layers)

From left to right, this figure shows an increasing level of outsourcing, which
ranges from none to a full cloud-based operation, where the latter may even include complete business processes on top of application logic. Special attention needs to
be given to the lowest layer consisting of data; this layer may be part of the cloud
infrastructure (as in Oracle’s Data-as-a-Service or DaaS approach, see www.oracle.
com/cloud/daas.html) or may be kept local, a decision that needs to consider the
organizational (and sometimes also the legal) context. Indeed, company-owned data
often plays a special role: If data is a core company asset, which it often is, and the
company fears that cloud storage of data may be subject to cyberattacks,
unauthorized access, or security breaches, it will typically be kept local. Legal
aspects might also be involved; for example, for a German company it is forbidden
to store company-related finance as well as tax data outside the European Economic Area (Europäischer Wirtschaftsraum, EWR). On the other hand, cloud-stored data
is typically protected against loss, for example, in a fire or a natural catastrophe,
since the cloud provider will typically keep off-site backups and often even backup
data centers.
A difference exists between small and medium-sized enterprises (SMEs) and
large companies with respect to the degree to which they outsource into the cloud:
While the former often hesitate to move too much into the cloud (if at all) and in
particular have no intention to set up and run their own data center, a large company
often establishes its own “private” cloud and takes required infrastructure, platform
or software services from there. We will later discuss reasons for this and how a
strategy could be developed for deciding in favor or against use of the cloud.
Nevertheless it is to be expected that every enterprise will utilize cloud computing
in one form or another within a few years, for a number of reasons:

• Thanks to the increasing dissemination of smartphones and tablets for personal
as well as professional usage, there is an increasing interest in applications (or
“apps”) that are mobile, location-based, and interactive, that allow real-time
response to messages or posts from users, customers, computers, or sensors, and
that are accessible anytime anywhere.
• The fact that cloud usage often exhibits a pay-per-use cost model makes it attractive to parallelize compute tasks more and more, so that they can be performed in a shorter time at (almost) the same price. The reason is simple: Suppose you need 100 CPU hours of compute time for a task; then you can either employ 100 virtual CPUs for one hour each, or one CPU for 100 h (where intermediate solutions are possible as well). The former will almost always be the preferred alternative (see the sketch after this list).
• As many Web sites generate amounts of data in the GB or TB range on a daily
basis, e.g., by logging every click a user is performing, and as many users often
produce large amounts of data themselves, e.g., by pressing Like buttons, evaluating products, or sending short text messages, site operators are increasingly interested
in analyzing this data, in order to better understand their customers, to send them
special offers, or to improve their products based on customer feedback. Cloud
services supporting such business intelligence (BI) applications are becoming
increasingly popular (see our discussion of AWS above).
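The following small calculation works through the CPU-hour example from the first bullet above (the hourly rate is an assumption): under pay-per-use pricing, the bill is the same whether one or one hundred machines are employed, but the elapsed time shrinks accordingly.

# Pay-per-use: same total cost, very different elapsed time (rate is invented).
rate_per_cpu_hour = 0.05     # assumed price in US$
total_cpu_hours = 100

for machines in (1, 10, 100):
    wall_clock = total_cpu_hours / machines
    cost = total_cpu_hours * rate_per_cpu_hour
    print(f"{machines:>3} machines: {wall_clock:>5.1f} h elapsed, US$ {cost:.2f}")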

Clearly, not every type of application is “made” for the cloud. Think, for example,
of systems that need to guarantee an extremely low latency, such as electronic stock
exchanges or the emergency switch-off of a power plant, which cannot be depen-
dent on whether or not a computer network is up or down.
The aspects that need to be considered in connection with cloud computing, in particular when it comes to a decision on whether or not to move totally or in part to
the cloud, are:

• Technical aspects, i.e., considerations regarding what performance is expected
from the cloud, data safety and security, recovery times after crashes, and
integration of cloud services into existing applications.
• Economical aspects, e.g., questions regarding migration costs, expected running
costs, the business and pricing model of the provider, provider reputation,
potential lock-in effects, and whether a move to the cloud can indeed bring cost
savings.
• Organizational aspects concern a structured approach to provider selection, documentation of cloud-based IT architectures, support efficiency, and options to
communicate with the cloud provider or with other users.
• Legal aspects consider the laws that are applicable in a given context and try to
answer questions such as whether all legal requirements are met, what options
does the customer have when the provider changes its pricing model, or what
are the options for leaving a particular cloud provider.

Some of these aspects will be considered in what follows, where the view we
pursue will always be that of the (professional) customer. Before we delve further
into cloud aspects, we take a brief look at its precursors next.

2.2.3 Precursors of Cloud Computing

The idea of solving large computational problems more efficiently by decomposing them into smaller pieces and dealing with these in parallel goes back to the 1960s,
when information technology was still in its infancy. One way to implement this
idea is to put it on a sufficiently large parallel computer; another is to solve the
problem at hand using a large collection of small machines that are appropriately
connected. Such a network, which is a form of distributed system, appears to the
outside world as a single, coherent and transparent system whose components are
nodes. Several classes of such systems exist including the cluster and the grid (see
below).
Common to all distributed systems is the notion of scalability, which essentially
describes the capability of a system to appropriately handle a growing or shrinking
workload, or its potential to be enlarged or downsized in order to accommodate
that. A system can be scalable regarding its size, i.e., resources can be added or
taken away without a significant performance reduction. A system can also be
scalable regarding its geographical distribution, meaning that resources may be far
apart without the system’s performance being affected. Finally, a system can be
scalable regarding its management, in which the system is run by multiple inde-
pendent organizations without overly complex management. In the first case of size,
there is another distinction between vertical and horizontal scalability, where
vertical means that a more powerful computer is employed, whereas horizontal
means that more nodes are added to the system.
One of the first types of distributed system was the cluster, essentially a con-
nection of many identical nodes via a high-speed network, thereby enabling dis-
tributed parallel computations. A large number of machines rendered it possible to
achieve an availability and performance previously unobtainable and at a consid-
erably lower cost than a large mainframe computer. Typical applications could (and still can) be found in scientific computations such as weather forecasts or the simulation of complex systems. Clusters are used in the modeling, simulation and
analyses of the so-called Grand Challenges for Computing Research (Hoare and
Milner 2004), including, for example, an understanding of the mechanisms of the
human brain, or security issues in software running in cars, homes, planes, or
rockets. Cluster computing has become the basis of modern supercomputers, and
even the blueprint for today’s data centers needed for Internet search or updating
social graphs, e.g., by Google or Facebook. The components comprising a cluster
are commercial off-the-shelf (COTS) components.
In the academic world, the idea of grid computing (Grandinetti 2006) emerged in
the late 1990s as an alternative to centralized computing, where applications can
request computational power in an on-demand fashion within an always available
grid of machines (and often use them for free), just like electric power is obtained
from an outlet, but is actually drawn from an electrical grid that does not need any
user interaction or adaptation, even at peak times. Grid computing has a number of
applications, for example in particle physics, meteorology, medical data processing
or in satellite image processing. It has become widely known through the
SETI@home project of the University of California, Berkeley, which uses com-
puter idle time to search for extraterrestrial intelligence (setiathome.berkeley.edu/).
Another prominent grid project is Stanford’s Folding@home (folding.stanford.edu/
), which used personal machines for protein folding simulation. Grid computing
was one of the forerunners of cloud computing.
Independent of technical developments, businesses have for many years con-
sidered outsourcing those parts of their value chain that are not their core competency to
third parties. One driver behind this has always been cost reduction, since the third
party can exploit economies of scale, but also a complexity reduction of internal
processes. Another is often skill or capacity that a partner can provide to a business,
or a reduction of cycle times. The amount of outsourcing can vary, from simple
tasks to comprehensive business processes and even entire departments. Since the
1990s, outsourcing has increasingly been adopted for IT services. Technical
developments have made it possible to distribute systems at an increasing rate, and
the exact location of a system node is no longer of crucial importance. This has
opened the door for the notion of an application service provider (ASP).
An ASP essentially provides a particular application on demand; the application
itself runs on dedicated hardware within the provider’s data center. The customer
gets exclusive access to this application, while the ASP takes care of all mainte-
nance and development work. For additional customers who want to use the same
application, a second instance of software and server is needed. License fees are
commonly translated into periodic usage fees. In spite of the obvious benefits the
ASP model may have, it has largely failed in its original form, since providers could
often not reach the intended economies of scale. Nevertheless this model can be
seen as an immediate precursor of cloud computing, which makes it still relevant
today.
As we saw in Chap. 1, computers, microprocessors, and the devices that utilize
them (e.g., laptops, smartphones) have become smaller and smaller over time, to
the degree that in many applications (e.g., cars, potentially the human body) they
may disappear from visibility entirely, either because they are so tiny or because
they are simply built-in. On the other hand, applications requiring massive com-
puting power have never stopped arising, be it in physics, medicine, microbiology,
other natural sciences, as well as many other areas including commercial ones, as
we will see below (in connection with Big Data). A key solution to meeting these
demands has always been virtualization, which according to Wikipedia refers to
“the act of creating a virtual (rather than actual) version of something,
including virtual computer hardware platforms, operating systems, storage devices,
and computer network resources.”
Thus, virtualization (in computer science) is essentially a mapping of logical
resources to physical ones in such a way that a user (of logical resources) will not
notice any difference (to the underlying physical ones). An early incarnation of this
concept in the 1970s was that of virtual memory, or a programmer’s illusion that
available storage or logical address space is essentially unlimited, and is mapped to
the limited available physical storage via appropriate memory management
techniques. Hence, for a user, virtualization is transparent, and virtualization
abstracts from the hardware actually present. This abstraction happens between the
hardware and the software layer of a system, as indicated in Fig. 2.8, which shows
two virtual machines mapped to the same hardware and encapsulated by individual
containers. Note that virtual machines (VMs) can run distinct operating systems atop
the same hardware.

Fig. 2.8 Virtualized infrastructure: two VM containers, each with its own operating system and
applications, run on a shared virtualization layer atop the physical hardware

Virtualization typically simplifies the administration of a system, and it can help
increase system security; a crash of a virtual machine has no impact on other virtual
machines. Technically a virtual machine is nothing but a file. Virtualization is
implemented using a Virtual Machine Monitor or Hypervisor which takes care of
resource mapping and management.
We finally mention another precursor to cloud computing, which can be observed
as a major paradigm shift in software development during the past 25 years: a
departure from large, monolithic software applications towards light-weight services
which can be composed and orchestrated into more powerful services that ultimately
carry entire application scenarios. Service-orientation, especially in the form of
service calls to an open application programming interface (API) that can be
contacted over the Web as long as the correct input parameters are delivered, has
not only become very popular, but is also exploited these days in numerous ways,
in particular to give users an increased level of functionality from a single source.
One benefit of the service approach to software development is that platform
development, especially on the Web, has received a high amount of attention in
recent years. It has also meant that services which a provider delivers behind a
well-defined interface can be enhanced, modified, and even permanently corrected
and updated without the user noticing, and it has triggered the development of the
SOA (Service-Oriented Architecture) concept that was mentioned in the previous
section.
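To give a concrete impression of what such a service call looks like in practice, here is a minimal sketch in Python using the widely available requests library; the endpoint URL, parameters, and response fields are purely illustrative assumptions, not any particular provider's API:

    import requests

    # Hypothetical REST endpoint; a real provider documents its own URL and parameters.
    SERVICE_URL = "https://api.example.com/v1/weather"

    def current_temperature(city):
        """Call the (hypothetical) Web service and pick one value out of its JSON reply."""
        response = requests.get(SERVICE_URL,
                                params={"city": city, "units": "metric"},
                                timeout=10)
        response.raise_for_status()      # surface errors reported by the provider
        payload = response.json()        # the service answers with a JSON document
        return payload["temperature"]    # assumed field name in the illustrative reply

    print(current_temperature("Hamilton"))

As long as the interface (URL, parameters, and reply format) stays stable, the provider can change or improve whatever happens behind it without the caller noticing, which is exactly the property described above.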

2.2.4 What Defines Cloud Computing?

Now that we have discussed various technological precursors to cloud computing,
we can state what this term actually means. To this end, we follow the U.S.
National Institute of Standards and Technology (NIST), to which multiple authors
have referred before (e.g., Mell and Grance 2011; Sitaram and Manjunath
2012). NIST defines cloud computing as follows4:
Cloud computing is a model for enabling ubiquitous, convenient, on-demand network
access to a shared pool of configurable computing resources (e.g., networks, servers,
storage, applications, and services) that can be rapidly provisioned and released with
minimal management effort or service provider interaction. This cloud model is composed
of five essential characteristics, three service models, and four deployment models.
Essential Characteristics:

• On-demand self-service. A consumer can unilaterally provision computing capabili-
ties, such as server time and network storage, as needed automatically without requiring
human interaction with each service provider.
• Broad network access. Capabilities are available over the network and accessed
through standard mechanisms that promote use by heterogeneous thin or thick client
platforms (e.g., mobile phones, tablets, laptops, and workstations).
• Resource pooling. The provider’s computing resources are pooled to serve multiple
consumers using a multi-tenant model, with different physical and virtual resources
dynamically assigned and reassigned according to consumer demand. There is a sense
of location independence in that the customer generally has no control or knowledge
over the exact location of the provided resources but may be able to specify location at a
higher level of abstraction (e.g., country, state, or datacenter). Examples of resources
include storage, processing, memory, and network bandwidth.
• Rapid elasticity. Capabilities can be elastically provisioned and released, in some cases
automatically, to scale rapidly outward and inward, depending on demand. To the
consumer, the capabilities available for provisioning often appear to be unlimited and
can be appropriated in any quantity at any time.
• Measured service. Cloud systems automatically control and optimize resource use by
leveraging a metering capability at some level of abstraction appropriate to the type of
service (e.g., storage, processing, bandwidth, and active user accounts). Resource usage
can be monitored, controlled, and reported, providing transparency for both the pro-
vider and consumer of the utilized service.

With these defining properties, cloud computing promises to realize the vision that
computational power will ultimately be obtainable like electricity from a wall
outlet, an idea that has been termed utility computing and was first mentioned by
John McCarthy during a talk at the MIT Centennial 1961, where he said: “If
computers of the kind I have advocated become the computers of the future, then
computing may someday be organized as a public utility just as the telephone
system is a public utility… The computer utility could become the basis of a new
and important industry.” Today the idea of utility computing is primarily mani-
fested in the pricing models of cloud services, which often base payment on actual
usage.

4 nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-145.pdf

The defining properties also set cloud computing apart from its various pre-
cursors, in particular cluster computing as well as grid computing, since the latter
were not based on virtual resources, rapid elasticity, or on-demand self-service;
only the grid typically has possibilities to measure service usage. Other than that,
cloud services and cloud computing are the more flexible option, which has made
them so attractive to the IT community.
Technically, cloud providers typically base their processing power on large
collections of commodity hardware, including conventional processors (“compute
nodes”) connected via Ethernet or inexpensive switches, which are arranged in
clusters and which are replicated within as well as across data centers. Replication
as a form of redundancy is the key to hardware reliability and fault-tolerant pro-
cessing, in just the same way as data is protected against loss via replication.
Besides fault-tolerance and availability, distribution can enhance parallel processing
of the given data, in particular when computing tasks can be executed indepen-
dently on distinct subsets of the data. In such a case, data is often partitioned over
several clusters or even data centers.

2.2.5 Classification of Cloud Services

The number of cloud services available these days is huge, and as we have briefly
discussed with our introductory examples, cloud services can essentially be cate-
gorized in three ways as shown in Fig. 2.9 (although more categories are sometimes
used, e.g., Database-as-a-Service or Business Process-as-a-Service), which also

Fig. 2.9 Common cloud service models: a layered stack with the hardware infrastructure and the
virtualization layer at the bottom, Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS),
and Software-as-a-Service (SaaS) above, and the user interface on top

represent the NIST service models5 (see also Mell and Grance 2011; Sitaram and
Manjunath 2012):

• Software as a Service (SaaS): The capability provided to the consumer is to use
the provider’s applications running on a cloud infrastructure. The applications
are accessible from various client devices through either a thin client interface,
such as a Web browser or an API. The consumer does not manage, control, or
even maintain the underlying infrastructure including network, servers, oper-
ating systems, storage, or individual application capabilities, with the possible
exception of limited user-specific application configuration settings.
• Platform as a Service (PaaS): The capability provided to the consumer is to
deploy onto the cloud infrastructure consumer-created or acquired applications
created using programming languages, libraries, services, and tools supported
by the provider. The consumer does not control the underlying infrastructure,
but has control over the deployed applications and possibly configuration set-
tings for the application-hosting environment.
• Infrastructure as a Service (IaaS): The capability provided to the consumer is
the provision of processing, storage, networks, and other fundamental com-
puting resources where the consumer is able to deploy and run arbitrary soft-
ware, which can include applications and even operating systems. The consumer
does not control the infrastructure itself, but has control over operating systems,
storage, and deployed applications; and possibly limited control of select net-
working components (e.g., host firewalls).

Thus, software in an SaaS model can be immediately employed by a user, yet the
provider is running the software and takes care of installation, administration,
maintenance, upgrades, or failure recovery. Under the PaaS model users can
develop and provision their own programs in the cloud; the infrastructure provider
may define certain general regulations regarding, for example, the programming
environment or the available libraries or interfaces. Under the IaaS model, which is
closest to the vision of utility computing, the cloud provider offers virtual hardware
or infrastructure services, including computing power, storage, or network
bandwidth.
We mention that specific IaaS services are nowadays available for storage as
well as for networking: Software-defined storage (SDS) is the idea of making data
storage independent of the underlying hardware by introducing a software layer
atop that is policy-based and provides management capabilities. Software-defined
storage typically includes a form of storage virtualization to separate the storage
hardware from the software that manages it. The software enabling such an envi-
ronment may also provide policy management for features such as data dedupli-
cation, replication, thin provisioning, snapshots and backup. It is essentially a
similar concept to software-defined networking which we already mentioned in

5 nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-145.pdf

Chap. 1. Both techniques build upon virtualization and, together with virtualized
servers, form the basis of a data center that is completely software-defined and
capable of offering its services broadly, in an automated fashion. See www.snia.org/
sites/default/files/SNIA%20Software%20Defined%20Storage%20White%20Paper-
%20v1.0k-DRAFT.pdf for an attempt to provide a generally accepted definition of
SDS.
We also mention that the notion of a platform is used in different ways: What we
have just described is a purely technical view of a platform, where it provides
development tools and ways to directly provision the software that has been
developed using these tools. A typical example here is WSO2 (wso2.com/), where a
variety of tools are readily available for establishing a Service-Oriented Architec-
ture in the cloud. Another one is Salesforce’s Force platform. This is not to be
confused with a different type of platform, often related to and mentioned in
connection with some form of disruption, which brings together supply and demand
as a “hotspot” of digital economy. Examples include Apple’s iTunes and AppStore
or Airbnb; artifacts we will say more about in Chap. 5.

2.2.6 Types of Clouds

Besides the types of services that a cloud can provide, a distinction can be made
regarding the types of clouds that are employed. Again it is common to follow the
NIST document mentioned above, which calls them deployment models and defines
the following, as illustrated in Fig. 2.10.

Fig. 2.10 Cloud deployment models: private clouds operated by individual companies, a
community cloud shared by several companies, public clouds reachable by anyone over the
Internet, and a hybrid cloud combining private and public parts

• Public cloud: The cloud infrastructure is provisioned for open use by the
general public. It may be owned, managed, and operated by a business, aca-
demic, or government organization, or some combination of them. It exists on
the premises of the cloud provider.
• Private cloud: The cloud infrastructure is provisioned for exclusive use by a
single organization comprising multiple consumers (e.g., business units). It may
be owned, managed, and operated by the organization, a third party, or some
combination of them, and it may exist on or off premises.
• Hybrid cloud: The cloud infrastructure is a composition of two or more distinct
cloud infrastructures (private, community, or public) that remain unique entities,
but are bound together by standardized or proprietary technology that enables data
and application portability (e.g., cloud bursting for load balancing between clouds).
• Community cloud: The cloud infrastructure is provisioned for exclusive use by
a specific community of consumers from organizations that have shared con-
cerns (e.g., mission, security requirements, policy, and compliance considera-
tions). It may be owned, managed, and operated by one or more of the
organizations in the community, a third party, or some combination of them, and
it may exist on or off premises.

We mention that companies like Oracle offer an interesting alternative to both
private and hybrid clouds, namely to bring the public cloud to your premises or
your own data center: An Oracle Cloud Machine is an infrastructure running
anything from the Oracle Public Cloud on premise and hence behind the customer’s
firewall. Users do not need any specific know-how, since the Cloud Machine is run
by Oracle, but can still provide the full spectrum of cloud services. For details, see
www.oracle.com/cloud/paas/cloud-machine/index.html.
We have already seen a typical example of a public cloud, namely the Amazon
cloud providing AWS. Anybody can use this cloud, provided payments are made as
requested. The provider needs to have a critical mass of infrastructure in order to be
able to realize economies of scale. Large organizations or large companies such as
banks that operate internationally, and sometimes even universities, often run a private
cloud, which gives the organization exclusive access to the available resources.
Multiple organizations in the same application domain (e.g., healthcare, cooperative
banks) often share a community cloud, which makes sense when all of them have
similar computing requirements and can agree to share the cost of establishing and
running “their” cloud. Finally, a hybrid cloud is often the combination of several
other clouds, some of which may be public and others private.
From a user’s perspective, it should be obvious that most personal usage occurs
in a public cloud (e.g., Dropbox, Apple’s iCloud), while large companies often
prefer private clouds and SMEs are inclined to utilize public or community
clouds. Some providers even offer a compromise in the form of a virtual private
cloud, i.e., an area within a public cloud that is logically separated from the rest;
this, however, cannot offer the same degree of isolation that a physically separated
private cloud can.

2.2.7 Cloud Revenue Models

We mentioned earlier that a major motivation for the high interest in the cloud in
recent years has been the fact that cloud services are often based on a pay-per-use
business model. In fact, there are several pricing schemes in effect which make the
cloud interesting for both private as well as professional users. The following
models are common:

1. Free: The service is essentially free to use, i.e., no immediate flow of money is
required, yet most often a user has to register in return for the service. This
scheme is typically applied when the service itself is sponsored through
advertising, and when the service provider is interested in collecting data about
its users, e.g., their e-mail addresses or—better yet—some kind of “user pro-
files” that allows them to customize the placement of advertisements. Clearly,
this model can help to attract users which in turn attracts suppliers of ads.
Examples include services like AroundMe, Yelp, or SeatGuru.
2. Pay-per-use: As mentioned, in this model there is no free usage, but no constant
payments either. A service user is only charged for the duration and/or amount
of service usage. Usage-based pricing corresponds to the intuition that each
additional unit of a commodity consumed raises the total amount to be paid.
Examples include AWS or Microsoft Azure.
3. Pay-per-unit: In this model, a payment is made once per product purchased,
independent of its usage intensity. For example, Amazon or Apple may charge
for the purchase of a single title of music or a movie, but they do not care how
often it is actually played thereafter. The same concept applies to buying apps in
an app store or to streaming services like Amazon Prime Video.
4. Package pricing: This model offers a user a certain amount of API calls to a
service for a fixed fee and is typically applied when the service is primarily
providing data. Note that companies in this category often sell quantities that are
not actually used by the customers; also note that depending on the package
size, API calls potentially allow for arbitrage. Examples include
OpenCalais by Thomson Reuters or Yahoo!Boss.
5. Flat fee: This kind of tariff is one of the simplest pricing models with minimal
transaction costs. It can be based on time or volume as the only parameter. This
pricing scheme is mainly used in regard to software licenses and software
hosting and resembles the subscription model traditionally applied in the
newspaper business. On the one hand, a flat fee tariff provides suppliers and
users with more safety in planning future activities. On the other, especially
from the user’s perspective, a flat fee tariff lacks flexibility. A supplier has to
bear in mind his service’s specific market structure and the user’s preferences.
To do so, suppliers could combine a flat fee tariff with flexibility by offering
short-term contracts. Examples include the Reserved Instances for Amazon Web
Services such as EC2 or RDS.

6. Two-part tariff pricing: This combines a flat fee with pay-per-use: users
pay a fixed basic fee and on top of that an additional ‘fee per unit consumed’.
Sometimes, the fixed part covers the fixed costs of the provider, whereas the
variable fee generates the profit. Another example is pricing schemes for soft-
ware licenses where prices are often calculated by taking a base fee and adding a
surcharge depending on the numbers of users who would use the system. This
pricing scheme is commonly used by telephone companies.
7. Freemium: This is another combination where the idea is to let users join and
use basic services for free and charge them for (premium) services that provide
additional value to customers. The payment model for additional services can
take any of the forms described above. An example is again AWS EC2, which
has a “free tier” for new customers, or OKCupid, which is a dating service
whose basic usage is free, but which requires a paid registration for using
advanced services.

Various other combinations besides freemium or two-part pricing are in use, for
example, in a “pay-per-unit + pay-per-use” model, an initial buying price is
charged as well as usage-based fees thereafter. The AWS pricing for “Reserved
Instances” mentioned above is an example of this model. All models have pros and
cons, especially depending on whether you consider them from a user’s or a pro-
vider’s perspective.
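To make the differences between these schemes tangible, the following small sketch compares what a month of compute usage would cost under pay-per-use, flat fee, and two-part tariff pricing; all prices are invented for the purpose of illustration and do not correspond to any actual provider's rates:

    # Illustrative monthly cost comparison for three pricing schemes (prices are made up).

    def pay_per_use(hours, price_per_hour=0.10):
        return hours * price_per_hour

    def flat_fee(hours, monthly_fee=60.0):
        return monthly_fee                      # independent of actual usage

    def two_part_tariff(hours, base_fee=20.0, price_per_hour=0.05):
        return base_fee + hours * price_per_hour

    for hours in (100, 500, 1000):
        print(hours, "h:",
              "pay-per-use", pay_per_use(hours),
              "| flat fee", flat_fee(hours),
              "| two-part", two_part_tariff(hours))

For light usage pay-per-use is cheapest, for heavy usage the flat fee wins, and the two-part tariff is most attractive in between, which is why the choice of scheme depends on the expected usage profile.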

2.2.8 Cloud Benefits and Pitfalls

We conclude this introduction to cloud computing by summarizing the benefits and
pitfalls this concept may have. The reader should keep in mind that these again
depend on whose side you are on; indeed, what might be a benefit for a provider
can at the same time be problematic from a customer’s perspective.
From a user’s point of view, two kinds of benefits are most relevant: reduction of
costs and increase in flexibility. The former is definitely a major argument in favor
of the cloud, since the cloud frees a user from high up-front investments into
information technology as well as from the majority of running costs that it will
incur. As a consequence, projects can be initiated without a big risk of investment;
personnel that previously was involved in IT management (e.g., for operation,
maintenance, backup and recovery, etc.) can now be deployed elsewhere.
Regarding flexibility, it should be clear that a move to the cloud typically
requires a revision of a company’s IT architecture, as applications that have so far
been running in-house will from now on be obtained as services from outside.
However, the benefit of this might be an overall simplification of that architecture, a
streamlining of processes (more on this later), the possibility to better respond to
changes in demand for computing resources, and even a participation in
technological progress, as the provider will generally make sure that the systems he
is running are up-to-date.
On the downside, there are also potential pitfalls to consider, and it is these
pitfalls that often result in hesitation to move to the cloud. For example, SMEs
whose core business or competency is not in the IT area often lack appropriately
trained personnel for dealing with service-oriented IT in general or the cloud in
particular. They also may have difficulties to do comparisons among the variety of
providers nowadays offering cloud services. Next is the issue of trust, or the
question whether the provider can be trusted with respect to data and applications
he is hosting for the company in question. Availability is an important aspect, as an
unexpected outage which makes services unavailable can easily endanger, some-
times even ruin a business that relies heavily on the cloud (see www.theregister.co.
uk/2015/09/20/aws_database_outage/ for an example). Finally, there might be the
problem of a lock-in effect; if a company decides to switch from one cloud service
provider to another, how easy or difficult will that be? A recent study by Dillon and
Vossen (2015) has shown that these pitfalls are perceived quite differently in var-
ious countries.
Independent of the benefits and pitfalls of the cloud, a recent survey by Morgan
Stanley in which 100 CIOs (75 in the U.S. and 25 in Europe) were interviewed
reveals that almost 30% of all applications were expected to be migrated to the
public cloud by the end of 2017, up from 14% in 2016. The survey also predicts that
Microsoft Azure will bypass Amazon AWS as an IaaS provider within the next three
years and will edge out AWS by 2019 for both IaaS and PaaS. For SaaS, an increase
in marketing applications is foreseen, while analytics and business intelligence will
remain largely stable.

2.3 Technology for the Management of (Big) Data

During the mid-1990s, companies started to notice that the then new Web was
creating increasing amounts of data. Originally, the reason was a pure technological
one: a Web server, which composes pages from database input, style files, and other
sources into an HTML file a browser can render, automatically records
everything it is doing. Similarly, a search engine records every query a user is
issuing (which was, for example, used by Google for a while to analyze the
“Zeitgeist” behind queries), and an e-commerce site even records every click. So
when users became more active on the Web, it was an obvious next step to start preserving
every single click they made and every word they wrote. A famous quote char-
acterizing the demand to do something meaningful with all that data is “there’s gold
in your data, but you can’t see it.”6 This triggered the rise of data warehouse
technology for online analytical processing (OLAP) as well as the emerging of data
mining tools. This development has continued ever since, and with the ongoing

6 www.siggraph.org/education/materials/HyperVis/applicat/data_mining/data_mining.html

increase in digitization, there is no end in sight. So making the most of large data
collections is of high interest today (think of Amazon’s recommendations or
Facebook ads, which become more and more user-specific) and will be even more
crucial in the future. Indeed, the development has gone from pure analysis to
prediction and to prescription, or as Friedman (2016) puts it, “guessing is over.”
The core buzzword here, which emerged in the course of 2013 (and was effectively
gone again by 2017), is “Big Data,” and in this section, which is partially based on
Vossen (2014), we will try to clarify what this means.

2.3.1 Characterizing Big Data

According to Wikipedia, big data “is a collection of datasets so large and complex
that it becomes difficult to process using on-hand database management tools.”
According to Bernard Marr at DataScienceCentral.com, “the basic idea behind the
phrase ‘Big Data’ is that everything we do is increasingly leaving a digital trace (or
data), which we (and others) can use and analyze. Big Data therefore refers to that
data being collected and our ability to make use of it.” In other perceptions, the
characterization of Big Data is done via properties that all start with a V:

• Volume: Big Data typically exceeds both an organization’s own data as well as
its storage or compute capacity for accurate and timely decision-making. In a
2013 report, Intel reported that in a single Internet minute 639,800 GB of global
IP data gets transferred over the Internet, which can be broken down into emails,
app downloads, e-commerce sales, music listening, video viewing, or social
network status updates; for an up-to-date impression of how much data is moved
on the Internet per second, the reader should refer back to Chap. 1 or take a look
at visual.ly/internet-real-time or www.webpagefx.com/internet-real-time/.
• Variety: Data nowadays comes in a variety of formats ranging from highly
structured (e.g., data from a relational database) to semi-structured (e.g., XML
data) to unstructured (e.g., arbitrary texts like the ones from Twitter tweets or
Facebook posts).
• Velocity: Data is often produced at high speeds and often comes as a data
stream instead of in a “discrete” format; streams are often so large (and fast) that
there is no way to store and evaluate or analyze them offline. Instead, whatever
should be extracted from a stream needs to be extracted instantly.
• Veracity: Data may be dirty, falsified, unsafe or simply unreliable; it may
represent fake news (“alternative facts”) or may simply contain typing and other
grammatical errors.
• Value: Data is (hopefully) of high value to those who analyze it.

We note that the first three of these V’s are attributed to analyst Doug Laney who
works for Gartner, and additional V’s can easily be identified.7 In essence, big data
refers to the situation that more and more aspects and artifacts of everyday life, be it
personal or professional, are available in digital form, e.g., personal or company
profiles, social-network and blog postings, buying histories, health records, to name
just a few. Increasingly more data is dynamically produced, especially on the
Internet and the Web, and nowadays the tools and techniques are available
for evaluating and analyzing all that data in various combinations and for deriving
conclusions that can be converted into some kind of benefit. Numerous
companies foresee the enormous business opportunities that analytical scenarios
based on big data can have, and the impacts that it already has or at least soon will
have on advertising, commerce, and business intelligence (BI). BI tools have
meanwhile reached maturity, and besides stored data it is now possible to process,
or to incorporate into processing, data streams which cannot or need not be stored.
We generally consider “big” data as a consequence of the Web 2.0 developments
(see Chap. 1), but warn the reader right away that “big” is indeed relative to the
point in time that you look at it (what we consider big today would have been
unimaginable 10 years ago, and in ten years from now we will be amused by the
“tiny” amounts of data we handled in 2017).
As we mentioned earlier in this chapter for cloud computing, big data can again
be viewed from a technical, an economic, an organizational, and a legal perspective.
This section will primarily discuss the technological dimension and hence the
technology available for handling big data, in particular technology that has made it
to the center of attention recently. We note that big data processing often goes
beyond the pure employment of technology, but also needs a host of techniques
from statistics as well as from the field of visualization. The former area contributes
tools for clustering, classification, and statistical modeling (using, e.g., regression,
or neural networks), while the latter helps in data exploration using tools like
bubble charts, 3D scatter plots, or network and tree visualizations, to mention just a
few.

2.3.2 Databases and Data Warehouses

In order to cope with big data, a variety of techniques, methods, and technologies
has been developed over the years. For a long time, database systems have been the
technology of choice when it comes to storing, retrieving, querying, or generally
managing data. This is still the case today, since database technology has indeed
kept up with the fact that datasets to be processed fast have become larger and
larger over the years, that the types of data to be handled have changed, and that
considerable processing power is needed for complex computations to be per-
formed on this data. We take a brief look at the kind of database technology next

7 e.g., www.datasciencecentral.com/profiles/blogs/top-10-list-the-v-s-of-big-data

that has dominated the field since the early 1980s, and also look at data warehouses
as a core Business Intelligence (BI) technology for many years.
Databases and database systems were originally invented in the 1960s in an
attempt to organize ever growing data collections, and to do so in a way that could
guarantee certain properties. Among them was data independence, or independence
of data from the programs that operated on the data, later also declarative query
languages like SQL and the ACID contract for transaction processing guaranteeing
atomicity, consistency, isolation, and durability for database transactions in a
multi-user environment. Database systems experienced a major breakthrough in the
early 1980s with the arrival of Codd’s relational data model, which suggested to
organize data in tables uniformly representing entities and their relationships. The
theory and also practice of relational systems that were built based on that model is
nowadays well-understood, has survived a number of potential competitors (e.g.,
object-oriented databases), and its query language SQL has reached popularity and
dissemination beyond pure databases.
In the mid-1990s it turned out that data collections especially in enterprises had
reached a volume which suggested that there should be other uses than just simple
CRUD operations (create, read, update, delete) or SQL querying and reporting. The
idea of data analysis was born, and it was soon suggested to perform analytical
applications on a separate “database” since it was expected that analysis could be
compute-intensive and should hence not impact the operational systems that pro-
duced and delivered the data and that supported daily business. This is what a data
warehouse is about; it enables online-analytical processing (OLAP) as well as data
mining on data collections that have been specifically prepared. Indeed, these
applications are typically run on data that has been through a staging area or through
an ETL process (short for extraction, transformation, and loading) during which
data is selected from available sources, cleaned, curated, transformed into a specific
schema form (star or snowflake schema), and finally loaded in the warehouse.
Enterprises would move transactional data to the warehouse at regular intervals.
A large warehouse is often partitioned into subject-oriented data marts, and these or
the warehouse itself are made available to a variety of evaluation tools (dashboards,
spreadsheets, reports, mining applications, etc.). This results in a classical archi-
tecture for a data warehouse as shown in Fig. 2.11.
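As a toy illustration of such an ETL step, the following sketch extracts records from an operational source, cleans and transforms them into a uniform shape, and loads them into a simple warehouse table; the data and field names are invented for the example:

    # Minimal ETL sketch: extract, transform, load (illustrative data only).

    raw_orders = [                                  # "extract": rows from an operational system
        {"id": 1, "amount": "199.90", "country": "de"},
        {"id": 2, "amount": None,     "country": "NZ"},
        {"id": 3, "amount": "49.50",  "country": "De"},
    ]

    warehouse = []                                  # the (toy) warehouse table

    def transform(record):
        """Cleanse and normalize one record; drop it if it cannot be repaired."""
        if record["amount"] is None:                # cleansing: skip incomplete records
            return None
        return {"order_id": record["id"],
                "amount": float(record["amount"]),      # type conversion
                "country": record["country"].upper()}   # normalization

    for rec in raw_orders:                          # "transform" and "load"
        cleaned = transform(rec)
        if cleaned is not None:
            warehouse.append(cleaned)

    print(warehouse)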
While a data warehouse typically comes—just like a database management
system—as a software package that is locally installed, Big Data and cloud com-
puting have triggered an evolution in this field that has radically changed the
picture. In particular, Big Data has—as a consequence of the Web 2.0 develop-
ments we have described in Chap. 1—brought along numerous new data sources
outside an enterprise that carry interesting or relevant data, and cloud computing
has made a host of tools available that no longer require local installation and can be
used on demand. We will have more to say about this later. We note that certain
external data sources have always been available, for example Web statistics and
tracking as done by SimilarWeb (www.similarweb.com/) which has been around
since 2007, but many sites (including the likes of Facebook and Twitter) nowadays
provide an API (application programming interface) through which users can
“grab” data from these sites themselves.

Fig. 2.11 Classical data warehouse architecture: internal and external data sources and operational
database systems feed a staging layer for data transformation, integration, cleansing, metadata
generation, loading/updating, and migration (ETL); the warehouse core with its basis data, data
marts, metadata, and archive systems is accessed through an OLAP server and a middleware
platform by dashboards, ad-hoc queries, standard reports, spreadsheets, and data mining tools

2.3.3 Distributed File Systems

We next look at more recent requirements especially from big data scenarios and
discuss how these are being handled. Two important developments in this area are
novel types of database systems, including NoSQL systems and in-memory data-
base systems, and computing environments that are based on distributed file sys-
tems; we start with the latter.
Even though relational database systems were a huge success and have been in
wide use for many years, there have always been situations or applications for
which a database was not an optimal solution. For example, such a system might
require an initial investment and additional hardware which is beyond what an
enterprise can afford. The system “overhead” for providing high-level querying,
concurrency control, recovery, and data integrity may be inappropriate for a given
application. Or the application at hand might be simple enough and not require
multi-user access so that a database system is unnecessary. If there are stringent
real-time requirements that a database system cannot meet, or if the database system
is not able to handle given data complexity due to modeling limitations (e.g., in
complex genome and protein databases), different forms of data management
support are needed.
Starting with Web 2.0, novel requirements have arrived that ask for novel data
management approaches; these requirements stem from large search engine
indexes, text processing capabilities, including search, analysis, and comparison,
and less transactional strictness, e.g., my tweets need no isolation from others since
they cannot violate consistency of any data. All of these applications need to handle
huge amounts of data, and need to do so at high speeds (often ideally in “real
time”). The solution here is two-fold: First, file systems are employed instead of
database systems, since reading and writing from and to them is usually much
simpler than it is for a database, and second, these file systems come in a distributed
version. Distributed file systems became attractive out of the observation that, when
data comes in such large quantities as it does today, “traditional” technology
focusing around a central database may no longer be apt, primarily due to missing
parallel processing capabilities. Indeed, many big data scenarios come with the
requirement that they need scalable, distributed and fault-tolerant data processing
capabilities, including temporary or even permanent storage; this can only be met
by parallel programming and processing paradigms suitable for handling large
collections of data.
On the other hand, it was also discovered that the data that needs to be processed
often comes with a certain regularity: Sorting or ranking Web pages according to
their importance—something a search engine needs to do for a single search query
—comes down to an iterated matrix-vector multiplication, with the tiny “hatch” that
matrix and vector dimensions are typically in the millions or even billions. Simi-
larly, the search for “friends” in a social network can be cast into graph operations,
except that the graph in question typically has hundreds of millions of nodes and
edges. So without parallelism and distributed processing, nobody would have a
chance to attack such problems!
In a modern distributed file system, individual files can easily have TB-range
size; they are often static, i.e., they rarely change, but new data gets appended to
existing data. Files are often broken down or partitioned into chunks of, say, size
64 MB, and chunks get replicated (i.e., kept in identical copies) throughout the file
system. Often replication is threefold, and then a master node keeps track of where
individual chunks are located. For example, Fig. 2.12, which follows en.wikipedia.
org/wiki/Google_File_System, illustrates the organization of the Google File Sys-
tem that underlies many Google applications; it exhibits the division into chunks
(and replicas) just described.
It is important to highlight that industrial-strength cloud applications today are
based on file systems of a different quality. For example, Amazon’s AWS cloud
infrastructure “is built around Regions and Availability Zones (‘AZs’). A Region is
a physical location in the world where we have multiple Availability Zones.
Availability Zones consist of one or more discrete data centers, each with redundant
power, networking and connectivity, housed in separate facilities. These Availability
Zones offer you the ability to operate production applications and databases
which are more highly available, fault tolerant and scalable than would be possible
from a single data center” (see aws.amazon.com/about-aws/global-infrastructure/).

Fig. 2.12 Google File System: an application contacts the master (backed by a shadow master),
which maintains the mappings from files to chunks; files are split into chunks (e.g., File 1 Chunk 1
and 2, File 2 Chunk 1 and 2) that are stored redundantly across several chunk servers

In January 2017, there were 42 AWS Availability Zones within 16 geographic
AWS Regions; for example, Region Europe has zones in Ireland (3), Frankfurt,
Germany (2), London, UK (2), and another planned in Paris, France.
If file systems get globally distributed as in AWS, so that we can speak of
global-scale data management, a host of new challenges arise, including fault
tolerance, proximity to users, and data center diversity. For example, wide-area
network (WAN) links connecting data centers typically have a large bandwidth
(e.g., 100 Tbps), yet they are shared by many applications, which makes WAN
bandwidth an expensive commodity. And although this form of data management
can bring data closer to users, coordination can take more time than in a regionally
restricted situation, since, for example, a Facebook post should show up identically
on all of my friends’ walls, independent of where they are in the world and to which
data center they are hence connected. If user requests can only be handled with
some non-negligible latency, the user’s Web experience is affected, and monitoring
shows that even a 100 ms delay may drive users or potential customers away to the
competition.

Global-scale data management and globally distributed file systems have given
rise to a number of technical developments that go beyond the scope of this book.
However, we will return to one of them, relaxing the ACID requirements, later in
this chapter.

2.3.4 Map-Reduce and Hadoop

In order to cope with the enormous amounts of data that modern applications
demand to handle efficiently, today’s computing environments often rely on mul-
tiple parallel compute clusters and distributed file systems. Cluster computing was
already mentioned earlier in this chapter as one of the precursors of cloud com-
puting, and modern clusters contain multiple machines that are connected within the
rack that is housing them; data centers then hold a large number of such racks
which are connected among each other through a network, and on top the data
centers of a provider are also connected, since, as we mentioned, data or files can
now be distributed globally. Since machines within a rack or entire racks may fail
(and typically will after a while), data processed by them is kept redundant, and
computations are broken down into various tasks that can individually be restarted
when necessary.
An approach to breaking a computation down into various tasks was developed
by Google when they were looking for a technique for indexing and searching large
data volumes and for ranking those indexed pages that qualify as search results. We
mentioned in Chap. 1 that computing the PageRank of a Web page, although
recursive by definition, can be accomplished by an iterative procedure which is
based on matrix-vector multiplication in very high dimensions. The good news is
that matrix-vector or matrix-matrix multiplication has long been known as a
problem for which efficient parallel algorithms exist. These require that both factors
of a multiplication task be suitably partitioned into pieces and that these pieces are
replicated over multiple compute devices, in order to be readily available for the
multiplication step that needs them.
While replication is generally a measure to enhance data availability, since if one
copy fails another might still be available, partitioning turns out to be the key to
tackling many large-data problems algorithmically. Partitioning essentially follows
the “old” principle of divide and conquer, which has a long tradition in computer
science and the design of algorithms. If data can be split into various independent
partitions, processing of that data can exploit parallelism, for example by keeping
multiple cores of a processor or multiple CPUs in a cluster busy at the same time.
The results obtained by these cores or CPUs may need to be combined in order to
form a final processing result. This is the basic idea of Google’s map-reduce (for
which Google obtained US Patent 7,650,331, granted in January 2010) which
employs higher-order functions (well-known from the functional programming
paradigm) for specifying distributed computations on massive amounts of data.
Fig. 2.13 Principle of a map-reduce computation: input data is partitioned into chunks, each
chunk is processed by a mapper, the intermediate key-value pairs are shuffled and grouped by key,
and reducers combine the grouped values into the final output

Map-reduce is a combination of two functions, map and reduce, which work on
key-value pairs. A map-reduce computation essentially works as shown in
Fig. 2.13: Input data is made available in a number of data chunks, which typically
come from a distributed file system. These chunks are fed into map tasks executed
by components called mappers. Mappers turn their given input chunk into a
sequence of key-value pairs; exactly how these key-value pairs are generated from
the input data depends on the particular computing task and is determined by the
code written by the user for the map function. Next, Mapper intermediate outputs
are collected by a master controller and grouped by their key values. The keys and
their associated value groups are then given to reduce tasks in such a way that all
key-value pairs with the same key end up at the same reducer component. Finally,
reducers work on one key at a time, and combine all the values associated with that
key in a task-dependent way again specified by the code written by the user for the
reduce function. So essentially, a map-reduce computation centers around two
functions that resemble SQL’s group-by followed by aggregation:

1. map: (K1, V1) → list(K2, V2)
2. reduce: (K2, list(V2)) → list(K3, V3)

To see how map-reduce can be used in matrix-vector multiplication, consider an
n × n matrix M with element m_ij in row i, column j, 1 ≤ i, j ≤ n. Also given is a
vector v of length n with element v_j in position j, 1 ≤ j ≤ n. Then the product of
M and v is a vector x of length n whose components are computed as

x_i = \sum_{j=1}^{n} m_{ij} v_j

Now suppose that n is of the order 10^12, and that vector v fits in main memory.
Then a map step processes a chunk of M: For each element m_ij, map produces a
key-value pair (i, m_ij v_j), i.e., all terms of the sum for one x_i get the same key
i. Reduce will add all values that belong to the same key i and yield a pair (i, x_i)
and hence the components of the result vector.

As another example, we consider the analysis of weather data coming as a long
string from weather stations; our interest is in an overview of the maximum tem-
peratures recorded during a year. Input data in this case might look like the sample
shown in Fig. 2.14. The weather station regularly sends long strings that have to be
interpreted appropriately; every string contains, among other information, the ID of
the station, the date of the measurement, longitude and latitude of the station’s
location, and the actual temperature.

Fig. 2.14 Sample weather input data: the string
00573321309999991990010103004--51317+028783FM-12+0171…-0128
encodes, among other fields, the weather station ID, the date of the measurement,
longitude/latitude of the station, and the temperature

Now suppose the following input is received; the parts relevant for determining the
maximum temperature are the year within the date field and the temperature value
at the end of each string (temperature values are rounded to integers):

00670119909999991990051507004+51317+028783FM-12+0171…+0000
00430119909999991990051512004+51317+028783FM-12+0171…+0022
00430119909999991990051518004+51317+028783FM-12+0171…-0011
00430119909999991989032412004+51317+028783FM-12+0171…+0111
00430119909999991989032418004+51317+028783FM-12+0171…+0078

If this is the chunk of data given to a mapper, then the latter will extract year (as
key) and temperature (as value) as desired:

(1990, 0), (1990, 22), (1990, -11), (1989, 111), (1989, 78)

Shuffling and grouping this by key values will result in

(1989, [111, 78]), (1990, [0, 22, -11]),

from which a reducer can determine maxima as

(1989, 111) and (1990, 22).



It should be obvious that a task like this, which will in reality be based on huge
amounts of weather-station data all of which can be processed independently, is a
perfect candidate for a map-reduce computation. Other such tasks include counting
the occurrences of words in a text collection (relevant to index creation and
maintenance for a search engine), or even operations from relational algebra. In
general, map-reduce is applicable to problems which are easy to parallelize, i.e.,
which can easily be partitioned into parts such that these parts are processed
independently and their results are then combined into an overall result.
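The following sketch simulates this computation in plain Python rather than on a real cluster; the record layout (year at a fixed offset, signed temperature at the end) is assumed to match the simplified weather strings shown above:

    from collections import defaultdict

    records = [
        "00670119909999991990051507004+51317+028783FM-12+0171…+0000",
        "00430119909999991990051512004+51317+028783FM-12+0171…+0022",
        "00430119909999991990051518004+51317+028783FM-12+0171…-0011",
        "00430119909999991989032412004+51317+028783FM-12+0171…+0111",
        "00430119909999991989032418004+51317+028783FM-12+0171…+0078",
    ]

    def map_phase(line):
        """Emit a (year, temperature) pair per record; field positions are assumed."""
        year = line[16:20]              # year inside the date portion of the string
        temperature = int(line[-5:])    # trailing signed temperature, e.g. -0011
        yield (year, temperature)

    def reduce_phase(year, temperatures):
        """Combine all values collected for one key into the maximum."""
        return (year, max(temperatures))

    # Shuffle/group step: collect all mapper outputs by key.
    groups = defaultdict(list)
    for record in records:
        for year, temperature in map_phase(record):
            groups[year].append(temperature)

    print(sorted(reduce_phase(year, temps) for year, temps in groups.items()))
    # expected output: [('1989', 111), ('1990', 22)]

On a real cluster, the map and reduce functions would look essentially the same, but the framework would take care of distributing the chunks, shuffling the intermediate pairs, and restarting failed tasks.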
Apache Hadoop and Spark
Clearly, several issues need to be addressed in order to make a map-reduce com-
putation work, including the question of how to decompose a given problem into
smaller chunks which can be processed in parallel, how to adequately assign tasks
to compute nodes (executing a mapper or a reducer), how to coordinate synchro-
nization between the different compute nodes involved in a computation, or how to
make such a scenario robust against failures. These questions have been answered
in recent years in various ways, the best known of which is the software library
Hadoop. Hadoop8 supports scalable computations across distributed clusters of
machines. Its core components are the MapReduce Engine, the Hadoop Distributed
File System (HDFS), and the YARN (Yet Another Resource Negotiator) config-
urable resource manager for clusters. The first of these is responsible for execution and
control of map-reduce jobs; HDFS is a distributed file system in which large
datasets can be stored, read, and output. User data is divided into blocks, which get
replicated across the local disks of cluster nodes. HDFS is based on a master-slave
architecture, where a namenode as master maintains the file namespace including
the file-to-block mapping and the location of blocks, and datanodes as slaves
manage the actual blocks. Besides these main components there are numerous
extensions of Hadoop by specific functionality such as data storage, processing,
access, and data management in general, which together are considered the Hadoop
“ecosystem,” a snapshot of which can be found at savvycomsoftware.com/what-
you-need-to-know-about-hadoop-and-its-ecosystem/.
A major Hadoop “competitor” nowadays is Spark, another Apache project that is
an improvement over Hadoop’s original MapReduce component. Spark is a fast
cluster computing system developed through contributions
of almost 250 developers from 50 companies in UC Berkeley’s AMP Lab. Spark
follows an execution model supporting in-memory computing and optimization of
arbitrary operator graphs, so that querying data becomes much faster when com-
pared to disk-based engines. While Hadoop can only be used in batch mode,
Spark is interactive. Spark powers a stack of libraries including SQL and Data-
Frames, MLlib for machine learning, GraphX, and Spark Streaming. You can
combine these libraries seamlessly in the same application. For details, we refer the
reader to spark.apache.org/. Another recent development in this category of big data
processing tools is Flink, see flink.apache.org/, which emerged from the German

8 hadoop.apache.org/

Stratosphere research project. Like Spark, Flink is an in-memory batch processing
engine, whose emphasis, however, is on machine learning and complex event
processing.
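To give a flavor of how such a computation is expressed in Spark, here is a minimal word-count sketch using Spark's Python API (PySpark); the HDFS paths are placeholders and a configured Spark installation is assumed:

    from pyspark import SparkContext

    sc = SparkContext(appName="WordCount")

    counts = (sc.textFile("hdfs:///data/posts.txt")       # placeholder input path
                .flatMap(lambda line: line.split())        # map: line -> words
                .map(lambda word: (word, 1))               # map: word -> (word, 1)
                .reduceByKey(lambda a, b: a + b))          # reduce: sum counts per word

    counts.saveAsTextFile("hdfs:///data/word_counts")      # placeholder output path
    sc.stop()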
The Hadoop ecosystem also brought along tools for data stream processing. The
primary difference between a database system and a data stream system is the aspect
of querying: A database query can be sent to a database system in an ad hoc manner
and each query will be processed and produce a result individually, due to the fact
that data is loaded and then permanently stored. In a data stream system, on the
other hand, the data is streamed to a query processor continuously and without the
option of being available for long periods of time; so the query processor can only
respond to queries that have previously been registered with it, and can produce
results for the data stream by looking at a portion of the stream available within a
certain window. The ability to process data that is only available as a stream (e.g.,
data from temperature or pressure sensors in a weather station as we saw earlier),
but occurs at high frequency obviously requires certain processing power. This
aspect of big data processing is not considered a major problem anymore, due to the
availability of multi-core processors, computing on graphical-processing units
(GPU computing for short), in-memory database systems as described below, and
the widespread provisioning of high-performance data centers.
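As a small illustration of window-based processing, the following sketch runs a standing query over a (simulated) stream of sensor readings and maintains a moving average over the most recent values; the readings are invented:

    from collections import deque

    def windowed_average(stream, window_size=3):
        """Standing query: emit the average of the last window_size readings."""
        window = deque(maxlen=window_size)   # older readings fall out automatically
        for reading in stream:
            window.append(reading)
            yield sum(window) / len(window)

    # Simulated sensor stream; in reality readings arrive continuously and are not stored.
    readings = [21.0, 21.5, 22.0, 23.5, 23.0, 22.5]
    for average in windowed_average(readings):
        print(round(average, 2))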

2.3.5 NoSQL and In-Memory Databases

As mentioned, two important developments in the area of database systems have
been NoSQL systems and in-memory database systems. NoSQL stands for “Not
only SQL” and refers to a type of database system built for coping with the
requirements of big data applications such as scalability, wide distribution, and
fault-tolerance. NoSQL systems are typically non-relational, mostly open source,
and do not necessarily require a fixed schema; they commonly relax one or more of
the ACID properties. Various kinds of NoSQL systems have been developed,
including key-value stores (e.g., Amazon’s SimpleDB or Dynamo, LinkedIn’s
Voldemort), column stores (e.g., Google’s BigTable, see Chang et al. 2008,
Apache’s HBase or Cassandra), document databases (e.g., MongoDB or Couch-
base), and more recently graph databases (e.g., Neo4J or Allegro). In addition,
“NewSQL” databases such as Clustrix, NuoDB, VoltDB, and Google’s Spanner
promise transactional guarantees in addition to NoSQL’s scalability. NoSQL sys-
tems are useful, for example to store tweets or posts from the Web for analysis, due
to their schema flexibility (data is often represented in JSON format) and their
scalability supporting distribution.
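To illustrate the schema flexibility just mentioned, the following sketch stores two JSON "documents" with different structures under their keys; plain Python dictionaries stand in here for any concrete key-value or document store, and the field names are invented:

    import json

    # A toy key-value/document store: keys map to JSON documents of varying structure.
    store = {}

    store["post:1"] = json.dumps(
        {"user": "alice", "text": "Graduation day!", "likes": 42})
    store["post:2"] = json.dumps(
        {"user": "bob", "text": "New job", "location": "Hamilton, NZ",
         "tags": ["career", "web"]})       # extra fields, no fixed schema required

    document = json.loads(store["post:2"])
    print(document["user"], document.get("tags", []))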
Data management in the cloud or in widely distributed systems has its specific
challenges when it comes to balancing consistency against availability and resi-
liency to partitioning failures, as elaborated upon by Agrawal et al. (2012). Keeping
a distributed data collection, file system or database consistent at all times such that
access to any fraction of it will never see inconsistent or invalid data is hard, in
particular since in a distributed system both hardware and software
failures are frequent. The good news is that not every application running in
the cloud permanently needs full consistency, so consistency can often be relaxed into
what is known as eventual consistency: When no updates occur for a long period of
time, eventually all updates will propagate through the system and all the nodes will
be consistent; for a given accepted update and a given node, eventually either the
update reaches the node or the node is removed from service. Eventual consistency
is used, for example, by Amazon within several of their data storage products. An
observation first made by Eric Brewer and later proved by Gilbert and Lynch (2002)
is that of consistency, availability, and partition tolerance, only two properties can
be achieved simultaneously; this result has become known as the “CAP Theorem.”
In other words, if an application wants to be immune against partition failures and
needs to guarantee high availability, it has to compromise consistency. Conversely,
an application that needs consistency along with high availability cannot expect to
be partition-tolerant and hence has to take measures for handling partition failures.
The NoSQL systems mentioned above have reacted and responded to the CAP
Theorem in various ways, most often by allowing for relaxed notions of consis-
tency, yet more recent developments (such as Google’s Spanner and F1) claim to be
able to go back to strict forms of consistency.
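The following toy simulation, written in plain Python with invented values, illustrates the idea of eventual consistency: an update is accepted immediately by one replica, the copies temporarily disagree, and once the pending update has been propagated all replicas converge:

# Toy, in-memory illustration of eventual consistency among three replicas.
replicas = [{"x": 0}, {"x": 0}, {"x": 0}]   # three copies of the same record
pending = []                                 # updates still travelling through the system

def write(replica_id, key, value):
    replicas[replica_id][key] = value        # accepted locally right away
    pending.append((key, value))             # shipped to the other replicas later

def propagate():
    while pending:
        key, value = pending.pop(0)
        for replica in replicas:
            replica[key] = value             # eventually every node sees the update

write(0, "x", 42)
print("before propagation:", replicas)       # replicas temporarily disagree
propagate()
print("after propagation: ", replicas)       # all replicas converge on x = 42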
In-memory (or main-memory) database systems, currently very popular, actually
represent the revival of an old idea that was first studied in the late 1980s and that
has finally become available in commercial products thanks to considerable tech-
nological advances during the last 30 years. In-memory database systems are pri-
marily characterized by enormous gains in processing speed, since the classical
bottleneck between (relatively slow) secondary memory and (very fast) primary
memory is essentially eliminated. Large quantities of data can now be kept in main
memory, and there is no need to swap data in and out because space in main
memory is so limited. Hence read access is now much faster since no I/O access to a
hard drive is required. In terms of write access, mechanisms are available which
provide data persistence and thus secure transactions. In-memory databases have
proven to be suitable for particular use cases, and big data analytics is one of them.
This use case goes way beyond traditional simple read and write operations, and
typically requires a database to provide functionality not commonly attributed to
database systems in the past. For example, SAP’s HANA system is capable of
conducting transactions and analytics in parallel and on the same database, so that a
division into operational database system and data warehouse system is no longer
necessary.
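The speed argument can be illustrated, in a very modest way, with SQLite's in-memory mode; this is of course only a toy stand-in for systems such as SAP HANA, but it shows a database that lives entirely in RAM and serves both transactional writes and an analytical query:

# Toy stand-in for an in-memory database: SQLite keeps the entire database in
# main memory (":memory:"), so no disk I/O is involved.
import sqlite3

db = sqlite3.connect(":memory:")             # database lives purely in RAM
db.execute("CREATE TABLE sales (region TEXT, amount REAL)")
db.executemany("INSERT INTO sales VALUES (?, ?)",
               [("north", 120.0), ("south", 80.5), ("north", 99.9)])

# Writes and analytical reads against the same in-memory store.
for region, total in db.execute("SELECT region, SUM(amount) FROM sales GROUP BY region"):
    print(region, total)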

2.3.6 Big Data Analytics

The general expectation today is that all these huge amounts of data we have at our
disposal help overcome guessing, speculating, and imprecise forecasting. The use
cases that we will discuss in Chap. 3 all apply some form of analytics to the data
collection at hand, be it sports, healthcare, entertainment, or any other area. Above,
the map-reduce examples have given an idea of how large amounts of data can

be broken into pieces and then be processed individually. But beyond pure storing
or streaming as well as processing, big data applications have an interest in ex-
ploiting the data, i.e., in deriving knowledge or meaning from the data that can
ultimately be turned into benefit or even profit.
Analytics for big data goes considerably beyond what has been done in the
traditional data warehouse. One reason is that the interest now is to take into
account data that cannot be found within a company, but vastly on the Web, e.g., in
public blogs or on social networks. Another reason is the size of big data, since that
often goes beyond the capabilities and capacities of an enterprise. This is where the
new technologies discussed in the previous subsection come in, and they do so in
various forms. One way is to augment a data warehouse by, say, a Hadoop envi-
ronment; another is to kiss the warehouse goodbye and implement a radically
different and new architecture, primarily based on the Hadoop stack.
The latter approach is often preferred when the application requires more than
just information technology, but also statistical computing or computational
statistics, which lies on the border of statistics and computer science. It is concerned
with the development of statistical programming languages such as R and with the
design and implementation of statistical algorithms (such as those available in
packages like SPSS, SAS, or STATA).
As mentioned, big data analytics also relies to a large extent on data mining
techniques, i.e., the process of discovering patterns (such as association rules or clusters)
in large datasets; data mining has, since the 1990s, become popular as a collection
of techniques to unleash previously unknown knowledge from raw data. A typical
data mining application is market basket analysis, which investigates, among other
questions, which items a supermarket customer typically buys together, i.e., in a
single transaction or single trip to the supermarket. The goal is to determine “re-
liable” association rules which can allow the supermarket to do special promotions
or item placements to improve sales. A well-known method here is the Apriori
algorithm, originally developed by Agrawal et al. (1993), which will be discussed
in Chap. 4.
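As a foretaste of that discussion, the following small sketch (with invented shopping baskets) shows the counting step behind such rules and computes the support and confidence of a candidate rule "bread => butter":

# Counting item pairs across baskets and deriving support and confidence of a
# candidate association rule; the baskets are invented for illustration.
from itertools import combinations
from collections import Counter

baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "cereal"},
    {"bread", "butter", "cereal"},
]

item_counts = Counter()
pair_counts = Counter()
for basket in baskets:
    item_counts.update(basket)
    pair_counts.update(combinations(sorted(basket), 2))

# Rule "bread => butter": support = P(bread and butter), confidence = P(butter | bread).
support = pair_counts[("bread", "butter")] / len(baskets)
confidence = pair_counts[("bread", "butter")] / item_counts["bread"]
print(f"support = {support:.2f}, confidence = {confidence:.2f}")  # 0.75 and 1.00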
Of particular interest in the analysis of big data are algorithms that in a sense can
adapt themselves to the fact that, over time, more and more data becomes available,
allowing for even more precise results than at the time when less data was available.
A typical example is a recommender system (see Chap. 3) which, for instance,
recommends movies to viewers. Based on searches or past viewing history, the
recommender will try to establish a “profile” for a viewer, possibly compare this
with the profiles of other viewers, and then recommend movies the viewer in
question has not yet seen, but which fit his or her profile. Clearly, recommendations
get better and better the more refined the profile is. However, the movie provider
will not be interested in rewriting the recommendation algorithms from time to
time, whenever a significant amount of new data has become available. Instead, it
would be preferable if the algorithm improves itself, or “learns.” This is the basic
idea behind the wide area of machine learning (ML), which is the science of
enabling computers to act without being explicitly programmed. ML has gained
considerable attention especially in connection with big data, and has impacted and is

impacting such diverse areas as self-driving cars, speech recognition, Web search,
or genome research.
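A minimal sketch of the profile idea behind such a recommender, with invented ratings, measures the similarity of two viewers by the cosine of their rating vectors and recommends to the first viewer what a similar viewer liked; note that nothing in the code needs to be rewritten when more ratings arrive, the recommendations simply improve:

# Tiny collaborative-filtering-style sketch: user profiles as rating vectors,
# cosine similarity between viewers, recommendation of unseen, well-liked movies.
from math import sqrt

ratings = {
    "viewer_a": {"Movie1": 5, "Movie2": 4, "Movie3": 1},
    "viewer_b": {"Movie1": 5, "Movie2": 5, "Movie4": 4},
}

def cosine(p, q):
    common = set(p) & set(q)
    num = sum(p[m] * q[m] for m in common)
    den = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return num / den if den else 0.0

sim = cosine(ratings["viewer_a"], ratings["viewer_b"])
# Recommend to viewer_a what the similar viewer_b liked but viewer_a has not seen.
candidates = [m for m, r in ratings["viewer_b"].items()
              if m not in ratings["viewer_a"] and r >= 4]
print(f"similarity = {sim:.2f}, recommended: {candidates}")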
A highly acclaimed ML example is IBM Watson, “a technology platform that
uses natural language processing and machine learning to reveal insights from large
amounts of unstructured data” (www.ibm.com/watson/). IBM Watson became
widely known in 2011 through its win against two of the greatest champions of the
American quiz show Jeopardy, essentially by evaluating huge numbers of possible
answers in parallel and at an extremely high speed. Since then, Watson technology
has been exploited in other areas, one of which is healthcare, as we noted earlier one
of the prominent use cases for big data. Watson Analytics is now available for
general use and can hence be integrated into a variety of other fields as well, and a
number of interesting discoveries can certainly be expected (see www.ibm.com/
analytics/watson-analytics/). We will come back to Watson in Chap. 6 when dis-
cussing the Watson IoT platform.

2.4 Integrated Systems and Appliances

“I think there is a world market for maybe five computers.” This market forecast by
IBM chairman Thomas Watson in 1943 proved to be just as wrong as the
assessment by DEC founder Ken Olsen in 1977: “There is no reason anyone would
want a computer in their home.” It is precisely digitization, which allows
information and communication technologies to reach all areas of our lives, that
shows how absurd these assessments were, granted, from today’s perspective.
even these legendary IT experts of their time could begin to imagine the perfor-
mance explosions possible in IT in such a short time. They were even less capable
of imagining the dramatic hardware price decline that accompanied these perfor-
mance explosions. Powerful information and communication technologies have
become affordable now to both business and private users.
Perhaps the IT veterans Watson and Olsen would have been right with their predictions
had the market continued to focus on special-purpose systems designed for specific
tasks and built from perfectly matched components. Instead, market
leaders such as IBM, Intel and Microsoft have emerged and created industry
standards with their products within a short time. The standardization allowed for
economies of scale, which led to a drop in prices for the components. Moreover,
driven by software innovations, the shift of system functionality away from hard-
ware to software enabled more flexibility in configuration and subsequent use of
the systems. The result was that general-purpose systems could be used in more and
more fields of application, creating additional economies of scale and thus a further
drop in prices. In fact, the price of this additional flexibility, namely reduced
performance and higher integration complexity, was more than offset by the rapidly
improving price-performance ratio of the systems.

In times of digitization, the conventional component-based systems now often
reach their limits. In many fields of application, they can no longer keep pace with
the new requirements. Bottlenecks are formed in terms of capacity, performance
and usability, but also with respect to energy consumption (keyword ‘Green IT’).
Integrated systems and appliances offer interesting solutions here and thus become
the enabler of the digital transformation.

2.4.1 Integrated Systems

While the market was increasingly dominated by component-based systems for
years, integrated systems nevertheless have rocked the boat again and again. In
specific application scenarios they allowed performance boosts or ergonomic
benefits that were unthinkable for the component-based competitors. A typical
example was the Apple Lisa (acronym for Local Integrated Software Architecture)
from 1983, one of the first personal computers with a mouse and a graphical user
interface. Due to its high price, however, Apple stopped production in 1984.
Its successor, the Macintosh, had more success and was a landmark particularly in terms of
ergonomics and networking. Today, virtually all Apple products, from MacBook to
iPad and iPhone on to the Apple Watch, come with features of integrated systems.
Integrated systems combine compute, storage, and networking components together
with a virtual machine or operating system in a single system. All com-
ponents are perfectly matched by the manufacturer and pre-integrated, making
significant performance gains possible. Typical representatives are Dell PowerEdge
FX2, EMC VCE Vblock, HP ConvergedSystem, Lenovo Converged Systems, and
Oracle SuperCluster. Integrated systems can be classified as converged and hyper-
converged systems. Hardware-focused converged systems are built with discrete
components that could be disconnected from the overall system and used inde-
pendently according to their purpose. In contrast, hyper-converged systems are
software-defined and can be used only in their entirety.
Published in 2015 by Gartner, the Magic Quadrant for Integrated Systems
defines Integrated Stack Systems (ISS) as a class of integrated systems in which
the server, storage and networking hardware and the associated system management
software are linked with application software components. Known representatives
of this class of integrated systems are IBM PureApplication System, Oracle Exadata
Database Machine, and Teradata. The pre-integration in these systems, extended to
the application software, enables additional performance gains and a further
reduction of complexity. However, these benefits are gained by limiting the
applications to the pre-integrated application software.
Integrated systems are characterized by the fact that the manufacturer is
responsible for selecting suitable components, their installation, configuration and
integration, and delivers them in product form. This reduces the complexity of the
IT infrastructure, and the deployment of the systems is considerably simplified. For
the customer, costly conception, integration, testing, certification, and system
management tasks are omitted in deployment and operation. In practice, integrated

systems are often used as a platform to consolidate heterogeneous system envi-
ronments and to host Enterprise Cloud services.
Practical Example: Oracle Engineered Systems
As early as 2008, the Californian Oracle Corp. launched a hyper-converged server
system with the Oracle Exadata Database Machine in cooperation with HP. After the
acquisition of Sun Microsystems, an extensive product family of integrated systems
and appliances (see Sect. 2.4.2) emerged under the label Oracle Engineered Sys-
tems. To show an example of an Oracle Engineered System, the Oracle Exalytics
In-Memory Machine is presented below. Exalytics combines Sun hardware with
Oracle’s business intelligence and in-memory database technology, creating a
high-performance platform for business intelligence (BI), enterprise performance
management (EPM), financial modeling, forecasting, and planning applications.
In practice, Exalytics is often used in conjunction with the aforementioned Oracle
Exadata Database Machine for data management and the Oracle Exalogic Elastic
Cloud for application server and storage functionality. Figure 2.15 outlines the
architecture of Exalytics as an example of an integrated system.

Fig. 2.15 System architecture of the Oracle Exalytics In-Memory Machine: Oracle applications optimized for Exalytics (vertical-specific pre-packaged in-memory analytic applications, Hyperion EPM) and Oracle Fusion Applications (analytic applications for ERP, SCM, CX, HCM) run on the Oracle Business Intelligence Foundation Suite, the Oracle TimesTen In-Memory Database, and Oracle Endeca Information Discovery, which in turn rest on systems management, operating system, virtualization (Oracle VM), networking, compute, and internal storage



Oracle Exalytics In-Memory Machine hardware is an optimally configured
server, engineered to enable in-memory analytics for large enterprise workloads. It
includes powerful compute capacity, abundant memory, fast storage and fast net-
working options, and also supports direct attached storage options. Exalytics fea-
tures an optimized Oracle Business Intelligence Foundation Suite and Oracle
TimesTen In-Memory Database for Exalytics. Business Intelligence Foundation
takes advantage of large memory, processors, concurrency, storage, networking,
operating system, kernel, and system configuration of Oracle Exalytics hardware.
This optimization results in better query responsiveness, higher user scalability and
lower total cost of ownership (TCO) compared to stand-alone software. Oracle
TimesTen In-Memory Database for Exalytics is an optimized in-memory analytic
database, with features exclusively available on the Oracle Exalytics platform.
In practice, Exalytics is frequently used to consolidate various BI and EPM
applications on a single platform. Specifically, this means that Exalytics is opti-
mized for both analytical processing of relational, multidimensional and hybrid
datasets as well as for in-memory computing and Oracle’s Endeca information
discovery in unstructured data. Today, the Oracle product portfolio and third-party
manufacturers offer many analytic applications that are optimized for Oracle Exa-
lytics. The spectrum ranges from Oracle Hyperion enterprise performance man-
agement applications to BI components for Oracle’s enterprise applications in the
areas of ERP (enterprise resource planning), SCM (supply chain management), CX
(customer experience) and HCM (human capital management), which are deployed
both on premise and in the cloud.

2.4.2 Appliances

An important feature of integrated systems is their product character. In contrast,
appliances are characterized by their solution character. They are based on the
experiences of the solution provider and represent their best practices in the inte-
gration and implementation of modular systems. In appliances, modules of different
types (hardware, firmware, system software, application software, development
tools, installation services, product support, consulting, training, etc.) are collected
and loosely coupled based on extensive practical experience and research results.
As with a converged system (see Sect. 2.4.1), the modules can usually also be used
beneficially independently from each other. Moreover, appliances are designed so
that they can be flexibly assembled and parameterized to meet specific industry and
user needs.
In practice, the appliance term is used in various forms: computing appliance,
database appliance, server appliance, storage appliance, network appliance, security
appliance, software appliance, virtual machine appliance, and many more. In all
cases, however, the appliance provides one or more services in a form that
significantly reduces the requirements in terms of the user’s skills and experience.
This property may also account for the origin of the appliance concept, namely the
easy-to-use home appliances.

Fig. 2.16 Solution architecture of the Oracle Database Appliance: database software, systems management, operating system, and virtualization (Oracle VM) on top of networking, compute, and storage, complemented by capacity-on-demand software licensing, single-vendor support, and the Oracle Appliance Manager

The Gartner IT Glossary defines a computing appliance as follows: “A com-
puting device that provides predefined services, and that has its underlying oper-
ating software (OS) hidden beneath an application-specific interface. Computing
appliances offer reduced complexity (e.g., installation, administration and mainte-
nance) and faster deployment by hiding the operating software and embedding the
application within the device.” Wikipedia defines a software appliance as a soft-
ware application “that might be combined with just enough operating system
(JeOS) for it to run optimally on industry standard hardware or in a virtual
machine”.
Practical Example: Oracle Database Appliance
As a concrete example of an appliance the Oracle Database Appliance is shown in
Fig. 2.16. It belongs to the family of Oracle Engineered Systems. It is preferably
used as a pre-packaged database platform for demanding business applications in
online transaction processing and data warehouses and data marts. The appliance is
less suitable for big data applications, for which the Oracle Corp. has designed a
special big data appliance.

Although the Oracle Database Appliance architecture is similar to the integrated
systems presented in the previous section, there are important differences. The base
of the appliance is made up of hardware for compute, storage and networking,
which are optimally assembled and pre-configured for the operation of Oracle
databases. These are, however, standard components, which are also suitable for
other application scenarios. In addition, the Oracle Database Appliance is available
in different performance classes as well as a high-availability system. Standardized
components for operating systems and virtualization are also installed and
pre-configured on the chosen hardware platform. The database itself is an integral
part of the appliance. However, the customer must license it separately with all the
options according to their requirements. System management software, update
licenses and product support for all included components are included. In addition
to the fast and inexpensive deployment, a unique “capacity-on-demand” software
licensing model, which enables to quickly scale utilized processor cores without
any hardware upgrades, characterizes the appliance. Customers can deploy the
system and license as few as two processor cores in the appliance, and incremen-
tally scale up to the maximum physical processor cores in each system. This
enables customers to deliver the performance and reliability that enterprise business
users demand, and align software spending with business growth.
Practical Example: PROMATIS BPM Appliance
As another example of an appliance, Fig. 2.17 shows the PROMATIS BPM
Appliance.9 Unlike the Oracle Database Appliance, which centers on the intercon-
nection of hardware and database software and assembles a solution around it,
the PROMATIS BPM Appliance places the focus on the BPM software, which is
combined with rapid implementation services. Either a virtual machine or suitable
hardware is included as a platform.
The PROMATIS BPM Appliance is based on best practices from the Oracle
Solution Partner PROMATIS, which are offered in the form of best practice
business process models, an appropriate pre-configuration and a proven procedure
for rapid deployment. The professional services included with the appliance cover a
feasibility study including a profitability calculation (ROI calculation), installation,
turnkey implementation of the solution with analysis, design, prototyping, config-
uration, transition, including user and administrator training and go-live. Hence, the
appliance addresses the problem that scares potential users away: fear of the risks and
costs of introducing a new software solution. The appliance provides an imple-
mentation with fixed scope, fixed schedule and fixed cost, thus reducing significant
risks of conventional implementation projects.

9 PROMATIS® BPM Appliance™ is a product of PROMATIS software GmbH, Ettlingen, Germany.

Fig. 2.17 Solution architecture of PROMATIS BPM Appliance: BPM software (Oracle BPM Suite, Horus Business Modeler) and best practice models on top of infrastructure software (systems management, Oracle VM) and optional hardware, combined with professional services as well as product update and support services into the PROMATIS solution

2.5 Further Reading

Regarding technologies for digital information, there is a host of literature, both in
the form of textbooks and research papers, of which we can recommend only a few
here. Petri nets were originally introduced by German mathematician Carl Adam
Petri in his dissertation, see Petri (1962). The reader interested in this modeling
language is referred to the Petri Nets World website (see www.informatik.uni-
hamburg.de/TGI/PetriNets/), whose goal “is to provide a variety of online services
for the international Petri Nets community. The services constitute, among other
things, information on the International Conferences on Application and Theory of
Petri Nets, mailing lists, bibliographies, tool databases, newsletters, and addresses.”
A well-known modeling language alternative is BPMN, the Business Process
Modeling Notation, for which various textbooks are available (see also www.bpmn.
org/).
The Horus Method is described in detail by Schönthaler et al. (2012) and
underlies the Horus Business Modeler tool (see www.horus.biz); its main feature is
the fact that it provides a method for enterprise modeling that is integrated with a

process modeling tool. Various other tools have also been developed, in an effort to
establish a kind of “regulation framework” for the modeling task at hand, but with
less rigor; an example is the Signavio Process Editor (see www.signavio.com/
products/process-editor/).
Boutros and Purdie (2014) deal with business process reengineering and process
improvement. Erl (2005, 2009) are introductions to the topic of Service-Oriented
Architecture (SOA). The importance of BPM is also discussed by vom Brocke and
Schmiedel (2015). Vom Brocke and Rosemann (2015a, b) are comprehensive
introductions to all touchpoints that BPM can have in an enterprise environment.
Introductions to cloud computing are provided by Ruparelia (2016), Erl et al.
(2013), or Kavis (2014). Haselmann et al. (2011, 2015) discuss specific ways to
organize and administer a cloud that is particularly appealing to SMEs. Venters and
Whitley (2012) compare the desires and the reality of present-day cloud computing
and develop a framework for evaluating cloud service providers. Juels and Oprea
(2013) compare approaches to security and availability for cloud data. Portnoy
(2016) is an introduction to virtualization; Tanenbaum and van Steen (2007) give a
detailed account of distributed systems.
The reader interested in the foundations of relational databases should go back to
Ted Codd’s (1970) original paper or read his Turing Award Lecture in Codd
(1982). Elmasri and Navathe (2016) provide a comprehensive introduction to all
aspects of database systems. Han et al. (2012) is a comprehensive introduction to
data mining, which also discusses data warehouses. Modern distributed file systems
are described by Leskovec et al. (2014); a classical introduction to distributed
(database) systems is Özsu and Valduriez (2011).
Various introductions to the area of Big Data have been published recently,
including by Mayer-Schönberger and Cukier (2013) or Marr (2016). Google’s
Map-Reduce was first described by Dean and Ghemawat (2008). Examples for an
application of the Map-Reduce paradigm can be found in Leskovec et al. (2014).
The reader interested in approaches to matrix multiplication on parallel hardware
should consult Dekel et al. (1981) and Akl (1989). Regarding modern hardware
solutions for processing big data, we refer the reader to Saecker and Markl (2013).
An introduction to Hadoop is given, for example, by White (2015). Map-reduce, its
Hadoop implementation, Spark, Flink and many other components of the Hadoop
ecosystem not only have spawned a host of development in recent years, which has
resulted in a variety of commercial offerings (e.g., MapR, Hortonworks, or
Cloudera), but also a lot of research, presented, for example, in the annual Inter-
national Workshop on MapReduce and its Applications started in 2010. For
NoSQL systems we refer the reader to Redmond and Wilson (2012) or Elmasri and
Navathe (2016). A representative of the NewSQL category, which combines
classical database properties with those of NoSQL systems, is Google’s Spanner,
see Corbett et al. (2012). For details of in-memory database systems we mention
Loos et al. (2011) or Plattner and Zeier (2015), who also discusses SAP HANA.

For an introduction to statistics, predictive data analytics, and machine learning,
we refer the reader to Shasha and Wilson (2010), Kelleher et al. (2015), Provost and
Fawcett (2013), or Alpaydin (2016).
3 IT and the Consumer

In this chapter we look at the implications for consumers in light of the develop-
ments (technological and otherwise) that we have presented in the previous two
chapters. We start with electronic commerce and look at various approaches that
businesses, both brick-and-mortar and mobile, now have at their disposal for getting
their message to the customer, including advertising, social media marketing, and
recommendation. We will also show how big data analytics can open up new
possibilities. Finally, the emerging area of e-government will be touched upon. As
will be seen, we here take a customer’s perspective and argue that not all of the
above is beneficial: there are pros and cons to the various ways customers are
nowadays addressed or approached, and there are also many things that they not
only can now, but indeed have to do themselves.

3.1 Commercialization of the Web

Electronic commerce or e-commerce for short is generally understood as the pro-
cess of buying, selling, transferring, or exchanging products, services, or infor-
mation via computer networks; in particular via the Web. It has become remarkably
popular over the past 25 years since it provides an approach for conducting busi-
ness that was highly innovative in the beginning due to its potential global reach,
something that is difficult for traditional brick-and-mortar businesses to imitate.
Furthermore it represents a reduction in the cost of transactions, it can provide
unique, customized products for even small customer bases, and it allows customer
access 24 hours a day, 7 days a week, 365 days a year (“24/7/365”). Moreover,
e-commerce between customers, or C2C, has been highly popularized through
auctioning sites like eBay or TradeMe. Often, a fundamental characteristic of an
e-commerce scenario is the absence of intermediaries, i.e., third parties offering
intermediation services to two trading parties. For example, a publishing company
can now sell directly to readers, without going through the intermediary of a book


store, authors can directly sell to readers without a publisher as intermediary,
travelers can circumvent travel agencies, or individuals can sell used cars without
the involvement of a car dealer.

3.1.1 Components of an E-Commerce System

A typical e-commerce system has the following major components:

• Product presentation component: An e-commerce system must provide var-
ious ways for customers (including businesses) to search, select, compare, and
identify products they want to purchase. This presentation component typically
needs to service several channels, including browsers of various makes and sizes
running on a variety of distinct devices or platforms; the presentation itself
nowadays needs what is called responsive design so that it can adapt itself to the
various screen sizes in use today.
• Inventory management and supply chain component: When a customer
orders, the e-commerce system needs to provide an immediate response on
whether the desired product, item, or service is currently available and, if not,
how long it might take to restock or rebuild. In addition, the system should be
capable of efficient warehouse management and be able to automate the inter-
action of a business with its suppliers to a great extent.
• Order entry and shopping basket component: After the customer has made a
selection, he or she needs to enter an order for the product into the electronic
commerce system. Order entry often allows the customer to add items to an
electronic shopping basket, which is a list of the products the customer has
selected to purchase. Before an item is added to that basket, the e-commerce
system should have the inventory control system check the product database to
see if there is adequate stock on hand or if the product needs to be ordered from
a manufacturer (a small sketch of this interplay is given after this list).
• Payment component: To allow customers to pay for the items purchased, an
e-commerce system needs to have electronic payment capabilities. Various
approaches are used for electronic payment, including payment by credit card,
third-party payment service (e.g., PayPal) or by electronic funds transfer. To
ensure the security of the card information sent over the Internet, special pro-
tocols such as HTTPS that provide data encryption are used. Interestingly, it has
become common to ask for prepayment from a customer (during checkout), a
habit that was less common prior to electronic commerce (and still is in many
classical business fields).
• Customer service and support component: At any time before, during, or
after purchasing an item, the customer may need advice or additional services or
support. For example, a customer may have a question about how a particular
item of clothing fits before purchasing it, or the customer may have a question
regarding delivery. After receiving an item, the customer may decide to
exchange or return the item. Sometime later, the customer may have a warranty

claim. Many of these customer service situations can be dealt with by providing
detailed information and answers to questions (in FAQs) electronically, and by
providing appropriate customer services. The general perception and goal today
is that of providing a smooth customer journey (or customer experience, CX, see
Chap. 4) that starts with producing awareness for a product or service and
continues through the phases of consideration, purchase, retention, and even
advocacy as shown in Fig. 3.1. This journey typically alternates between a
variety of physical and digital touchpoints and is nowadays supported by
comprehensive analytics of all the data traces a customer leaves behind.

Fig. 3.1 Phases of a customer journey: a need or awareness (triggered by online/offline ads, email, recommendations, forums, FAQs, promotions, social networks, and newsletters) leads via presales (reviews, blogs, media, direct contact, price comparison) to sales (store, e-commerce) and on to after-sales
• Recommendation component: Closely related to the previous point (and in
particular the awareness and consideration phases) is that of recommendation,
which has become popular in e-commerce, as people can now refer to other
people for obtaining advice on a product or service. As will be seen in
Sect. 3.6, an e-commerce site is typically interested in establishing a “profile”
for every customer containing, for example, all the products the customer has
ever bought or at least looked at. It gets interesting when the site is able to find
other customers whose profile is, in a sense that needs to be made precise,
“similar”, so the products the first customer has not yet acquired can be rec-
ommended to her or him. Known as collaborative filtering, this scheme applies
to goods, movies, music, and even people on a dating site.
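To make the interplay of the order entry, shopping basket, and inventory components above a bit more concrete, the following minimal sketch (with an invented in-memory product catalog, not a real e-commerce system) only adds an item to the basket after a stock check:

# Minimal sketch of a shopping basket with an inventory check; the product
# catalog is an invented in-memory dictionary, not a real inventory system.
inventory = {"shirt": 3, "shoes": 0}   # product -> units in stock
basket = {}                            # product -> units in the customer's basket

def add_to_basket(product, quantity=1):
    in_basket = basket.get(product, 0)
    if inventory.get(product, 0) >= in_basket + quantity:
        basket[product] = in_basket + quantity
        return f"{product} added to basket"
    return f"{product} is out of stock and must be reordered"

print(add_to_basket("shirt"))   # succeeds, stock is available
print(add_to_basket("shoes"))   # fails, triggers a reorder message
print(basket)                   # {'shirt': 1}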

Fig. 3.2 E-commerce components: content management and store locator, product catalog and search, order management, shopping cart and check-out, customer account management, and social media connection as core components, alongside payment processing and fraud detection, inventory management, fulfilment, customer service, and reporting; APIs to consumer touch points (Web, call center, etc.) sit above, interfaces to enterprise applications (financials, CRM, data warehouse, etc.) below

Figure 3.2 summarizes the various components an e-commerce system must
have today, where core components are shown in blue and interfaces to internal as
well as external applications in green. The customer (although not explicitly shown)
and customer service play a central role, as do social media.
Clearly, while these components are well-understood these days, it has taken
more than 20 years of development and (gathering of) experience to develop this
understanding. In the beginning, i.e., in the mid- to late-1990s, user acceptance of
e-commerce was low, due to limitations in Internet access, the limited number of
companies doing e-business at all, a general lack of trust, and missing
customer support. For further details on the issues of the early days and the
development until today, the interested reader should consult, for example, con-
secutive reports by Mary Meeker of Silicon Valley venture capital company
KPCB.1 Indeed, the initial perception was that a traditional retail shop would know
returning customers after a short while, whereas an e-commerce site, without fur-
ther measures, cannot distinguish an HTTP request by a customer today from one
by the same customer tomorrow. This situation has meanwhile changed consider-
ably, and it is one of many situations where big data has become prominent:
E-businesses today typically know their customers well, can classify them
according to a variety of criteria, and are often even able to predict what the next
transaction or the next step during the respective customer journey will be.
It took e-commerce companies some time to recognize that doing their business
electronically involves much more than setting up a Web site that comprises the
components listed above. Indeed, it requires a considerable amount of process
reengineering, as, for example, the Web shop front and the back office must be
connected in ways that did not exist before; this was frequently overlooked in the
early days of electronic commerce and was a typical cause of failure. Among the
fastest to realize this and to react appropriately have been banks and financial

1 see www.kpcb.com/blog/2016-internet-trends-report for the latest edition.

institutions, since their business has moved to the Web quite extensively; electronic
banking, stock trade as well as online fund and portfolio management are in wide
use today. A general theme (and challenge) in this context is the broad digitization
of business, a topic that we will discuss in Chap. 5.
The move from stationary commerce to electronic and later to mobile commerce
has also triggered the development of new branches of the software industry, which
not only provide shop front software, but also systems for click stream analysis,
payment encryption (including technologies such as Blockchain and parallel cur-
rencies such as Bitcoin), data mining, and customer relationship management
(CRM). As we mentioned in Chap. 2, data mining has become popular for ana-
lyzing the vast amounts of data that are aggregated by a typical e-commerce
installation; prominent data mining applications include association rule determi-
nation, clustering, and classification, some of which will be discussed in Chap. 4.
Click streams that have been created by users are also subject to intensive data
mining, and CRM comprises a set of tools and techniques for exploiting data
mining results in order to attract new customers or to retain existing ones.
An e-commerce platform typically serves both the buyer as well as the seller side
and sometimes even an intermediary. On the buyer side, there is a number of
suppliers from which the company obtains its raw materials or supplies, often
through some form of procurement process, and, thanks to the flattening we have
mentioned in Chap. 1, this process can be executed world-wide. Internally, there is
a supply-chain management (SCM) system at work that interacts with an enterprise
resource planning (ERP) system in various ways. On the seller side, there may be
several channels through which the company sells its goods or services also
world-wide, including individual consumers, businesses, and partners (see also
Fig. 3.1). For their various customers, some form of CRM-system will be in place
in order to take care of after-sales activities, customer contacts, complaints, war-
ranty claims, help desk inquiries, etc.

3.1.2 Types of Electronic Commerce

Electronic commerce has developed into various types that all have their specific
properties and requirements:

• Business-to-consumer or B2C e-commerce, which involves a business selling its
goods or services electronically to end-consumers.
• Business-to-business or B2B e-commerce, which involves a business selling its
goods or services electronically to other businesses.
• Consumer-to-consumer or C2C e-commerce, which involves a consumer selling
goods or a service electronically to other consumers.

B2C e-commerce is probably best known to the general public, although by
figures alone B2B is considerably higher in value of goods traded. B2C
e-commerce has become popular due to two aspects: the end of intermediaries and
better price transparency. Indeed, goods are now often sold directly by a business to
the end-consumer, instead of going through a third-party in between. For example,
software can now be downloaded from the producer directly (called electronic
software distribution), instead of having it delivered on DVD and sold through
stores. As a result, prices may be lower (an expectation often not valid), or the seller
will make a better profit (since nothing needs to be paid to the intermediary).
Second and as mentioned, it has become very popular on the Web to provide price
comparisons, through sites such as dealtime.com or guenstiger.de; as a conse-
quence, the consumer will nowadays often do extensive comparisons before
committing to a particular seller. We mention that B2C e-commerce has meanwhile
reached almost every kind of goods; what started with books, music, and movies
has nowadays reached even, for example, raw as well as processed or fresh food.
One of the enablers for the latter has been the fact that delivery has been perfected
over the years, and nowadays same-day delivery is an option in many places
around the globe. And since even a few hours might be too long for a
customer to wait for a product just ordered, Amazon is experimenting with
delivery by drones (“Prime Air”) that can bring delivery time—currently just in
selected areas—down to a few minutes from the receipt of an order.2
B2B e-commerce, in turn, comes in three major varieties: In a supplier-oriented
marketplace, a supplier provides e-commerce capabilities for other businesses to
order its products; the other businesses place orders electronically from the supplier,
much in the same way that consumers will place orders in B2C e-commerce. In a
buyer-oriented marketplace, the business that wants to purchase a product requests
quotations or bids from other companies electronically; each supplier that is
interested places a bid electronically and the buyer selects the winning supplier
from the submitted bids; this type is common in the car industry, where manu-
facturers request bids from, say, tire suppliers. Finally, in an intermediary-oriented
marketplace, a third-party business acts as an intermediary between the supplier and
the buyer; the intermediary provides e-commerce capabilities for both suppliers and
buyers in order to identify each other and to electronically transact business.
The third major type of e-commerce, C2C, is mostly manifested these days
through auctions such as eBay or TradeMe.co.nz. eBay (Cohen 2002) was founded
in September 1995 by computer programmer Pierre Omidyar under the name
AuctionWeb. One of the early items sold on eBay was Omidyar’s broken laser
pointer, for which, to his surprise, the winning bidder had a genuine interest.
The company officially changed the name of its service from
AuctionWeb to eBay in September 1997. Millions of collectibles, appliances,

2 see www.amazon.com/Amazon-Prime-Air/b?ie=UTF8&node=8037720011 for a December 2016 experiment in this direction.

computers, furniture, CDs, DVDs, musical instruments, diecast models, outdoor
equipment, cars, and other items are listed, bought, and sold daily on eBay. Some
items are rare and valuable, while many other items would have been discarded if
eBay with its thousands of bidders worldwide did not exist. Anything used or
new can be sold as long as it is not illegal or does not violate the eBay Prohibited
and Restricted Items policy. Interestingly, programmers can create applications that
integrate with eBay through the eBay application programming interface (API) by
joining the eBay Developers Program. This opens the door to “mashing up” eBay
with totally different applications, an aspect we will have to say more about later.
We note that other types of e-commerce have meanwhile emerged, including
G2C (government-to-citizen) or B2G (business-to-government), to name just two,
and each type can be combined with one or more business models in order to
actually generate revenue. All of these are characterized by two major features,
which are automation and self-service.
Automation has been a major goal for the establishment of electronic commerce,
as many of the interactions between the components that we saw in Fig. 3.2 are
performed algorithmically. For example, recommendations of products to cus-
tomers are determined based on the customer’s buying history as well as on that of
“related” or “similar” customers, where similarity can be based on various mea-
sures, all of which allow for automated computation.
Self-service, on the other hand, has always been promoted as a major conve-
nience feature of electronic commerce. The customer is now “enabled” to book
flights and hotels himself, to check whether a package has been shipped and is
hence out for delivery, or to manage everything related to his or her bank account in
a do-it-yourself fashion. While it has indeed become convenient to be able to
access, for example, all information relating to air travel (including ticket prices,
seat assignments, airline safety, or flight progress) by yourself, what has actually
happened is that, as noted by Tom Friedman 10 years ago, the customer has in fact
become an airline employee who works for free. This is even more apparent in
banks, which are not only eliminating one branch after another, but also make an
account holder pay if he or she needs manual help and wants a bank clerk to
perform a money transfer! We will come back to this DIY aspect of e-commerce
when we discuss disruption and other aspects in later chapters.

3.1.3 Recommendation, Advertising, Intermediaries

For anyone interested in setting up an e-commerce site or in embarking on e-commerce
in another way, two questions will prevail: What is an appropriate way to sell my
product, and how do I attract traffic to my site? Especially for physical goods, it has
become very helpful if advice on a product comes from other customers. On sites like
Amazon, eBay and others, this has become one of the main aspects people seek
when shopping for a product: what others have said about that product, how much

they like it, whether or not they would buy it again, and maybe how the seller
performed or how easy the overall experience has been. Once the importance of
other customers’ opinions, evaluations, and recommendations had been recognized,
many Web shop providers started to install facilities for commenting on a regular
and intensive basis. Amazon was among the first to take this even a step further and
go from pure reviews, which can be commented by others, to a comprehensive
recommendation system (“Customers who bought this item also bought …”),
whose goal is, as an Amazon employee once put it, “to make people buy stuff they
did not know they wanted.” Integration of social media established such retailers as
proponents of what is now known as social commerce.
The other aspect the owner of an e-commerce business will be interested in, the
attraction of traffic as already mentioned above in the context of portals, is closely
related to classical disciplines from business administration: advertising and mar-
keting. Traditionally, advertising is the art of drawing public attention to goods or
services by promoting a business, and is performed through a variety of media. It is
a mechanism of marketing, which is concerned with the alignment of corporations
with the needs of the business market (the term customer journey we mentioned
above is also a term commonly used in marketing). We will take a closer look at
online advertising below.
While we have mentioned that, from a seller’s perspective, doing business over
the Web may be attractive due to the absence of intermediaries which often do little
more than take a share of the profit, there are situations in which new intermediaries
enter the picture and indeed offer a highly valued service. This in particular refers to
facilitating payments, for which trusted third parties have come onto the scene. The
term is borrowed from cryptography, where it denotes an entity enabling interac-
tions between two parties who both trust the third party, so that they can utilize this
trust to secure their business interactions. A well-known example is PayPal, which
allows payments and money transfers to be made over the Web, actually to anybody
with an email address, and it performs payment processing for online vendors,
auction sites, and other corporate users, for which it charges a fee. Private users
need to register and set up a profile which, for example, includes a reference to a
bank account or to a credit card; PayPal will use that reference to collect money to
be paid by the account or card owner to someone else. When a customer reaches the
checkout phase during an e-commerce session, the shopping site he or she interacts
with might direct them to PayPal for payment processing. If the customer agrees to
pay through PayPal, PayPal will verify the payment through a sequence of
encrypted messages; if approved, the seller will receive a message stating that the
payment has been verified, so that the goods can finally be shipped to the customer.
Services such as PayPal or Escrow have popularized the notion of a micro-payment,
i.e., a tiny payment (of often just a few cents) which is feasible as a reimbursement
only if occurring sufficiently often. Figure 3.3 shows the checkout process when the
PayPal “Buy Now” button is employed.

Fig. 3.3 Checkout process utilizing the PayPal Buy Now button: once buyers are ready to purchase an item, they either (1a) enter their billing information to pay by credit card or (1b) log in to their PayPal account; they then confirm the transaction details before paying, and finally (3) view and print payment confirmations and receive payment authorization notices by email

In conclusion, it is fair to say that since the inception of the Web in 1993, a
considerable amount of the trading and retail business has moved to electronic
platforms and is now run over the Web. E-commerce continues to grow at a fast
pace; indeed, US e-commerce retail alone grew from $132 billion in 2008 to an
estimated $224 billion in 2014. More precise figures can be found, for example, at
www.emarketer.com/. This has created completely new industries, and it has led to
a number of new business models and side effects that do not exist, at least not at
this scale, in the physical world.

3.1.4 Case Amazon

Before we continue our elaboration of IT and its impact on the consumer, we pause
for a moment and take a look at the “role model” for electronic commerce:
Amazon.com. Amazon sold its first book online in July 1995 and, as mentioned in
Chap. 1, soon after started selling CDs, then DVDs, and then items in many other
categories. According to an article in USA Today,3 “now you can hop online from
your phone, download the e-book version, bid on a vintage couch on which to read
it, and hire someone to explain the concepts to you—all with one click.”

3 www.usatoday.com/story/news/nation-now/2015/07/14/working—amazon-disruptions-timeline/30083935/

In the 20 years since its inception, Amazon has introduced major innovations
that have had an impact on how people read, consume music and movies, and even
shop for typical consumer products such as clothes and groceries. Beyond this,
Amazon has invented (or at least experimented with) novel ways of delivering
goods (e.g., by drones), and has become one of the largest cloud providers
worldwide. As we have discussed in Chap. 2, Amazon is nowadays providing all of
infrastructure, platform, and application software as a service in the cloud. Not sur-
prisingly, the Internet traffic caused by AWS long ago (actually in January 2007)
surpassed the traffic caused by Amazon’s e-commerce business, as can be seen from
media.amazonwebservices.com/blog/2008/big_aws_bandwidth.gif.
More importantly, Amazon has become the role model of a “disruptor” in the
sense that, even without a single physical bookstore, Amazon has been able to
disrupt the physical book market and its value network, and replace it by a virtual
market that is cheaper, faster, and way more convenient than the classical market in
many respects. We will discuss disruption and disruptive innovation in more detail
in Chap. 5, but a brief summary of “Case Amazon” is presented here.
We follow the USA Today article mentioned above from July 14, 2015 on
Amazon’s lifestyle innovations, which summarizes the following key areas of
innovation:

• Bookselling: In the mid-1990s, Amazon was the first to try online bookselling,
and it succeeded widely and even put competitors (e.g., Borders) out of business.
• One-click purchasing: Buying something with one click only was introduced in
the fall of 1997; Amazon even holds a patent for this.
• The cloud: We discussed AWS in Chap. 2, an idea that started in 2006 and has
evolved into a major Amazon business.
• The cloud at home: In March 2015, the company expanded its professional
services marketplace, Amazon Local Services. Now you can hire anything from
a plumber to a goat herder, at your digital leisure.
• Members only: Amazon Prime was created in 2005, with users paying a flat
annual subscription fee for certain benefits, including one-day shipping prices.
Same-day delivery for Prime members launched in May 2015. Amazon
announced its goals to launch drone delivery called Amazon Prime Air in
December 2013, and said that the future delivery system is “designed to safely
get packages into customers’ hands in 30 minutes or less using small airborne
devices”.
• The rise of e-books and e-reading: Amazon’s Kindle was a game-changer for
e-reading. Launched in 2007, the Kindle connected book purchasing with book
platforms, leading to sensational headlines like “print is dead.” In mid-2010,
Amazon’s Kindle e-book sales outpaced hardcover book sales for the first time.
By a similar token, the Kindle Fire, launched in September 2011, was a cheap
version of the flashier Apple counterparts. Beyond this, Amazon purchased
Goodreads, the leading social network for book lovers, in March 2013.

• Publishing: If you’re selling, why not publish? In 2009, Amazon Publishing
took the publishing world by storm, slowly snatching up titles from publishers
such as Random House, Penguin, Macmillan, or Simon & Schuster.
• Audio: Amazon purchased audiobook mainstay Audible for $300 million in
2008, so there is a good chance it is powering all the audio books one will ever
download.
• Robots: For the 2014 holiday rush, Amazon hired 15,000 robots for its fulfill-
ment system.

Currently we even see the arrival of brick-and-mortar Amazon stores (“Amazon


Books”) in the US (see Sect. 5.5). Amazon has also been instrumental in the
development of search, data mining, and recommendation technology, much of
which is nowadays subsumed under “big data technology.” This technology
includes supply chain optimization (e.g., site selection for warehouses to minimize
distribution costs; selection of optimal routes, schedules, and products groupings, to
minimize delivery costs; minimization of time spent by drivers in traffic jams),
pricing and profit optimization, fraud detection for credit card transactions or
detection of criminal behavior on AWS (system intrusions, hacking attempts or
other malicious activity), fake reviews detection, search engine technology to help
users find what they want to buy quickly, customer segmentation, churn analysis,
inventory and sales forecasting, payments analytics (for authors, vendors, pub-
lishers), competitive analysis, i.e., automatic processing and analysis of billions of
comments posted by users on social networks about Amazon, its competitors, and
new trends and taking action based on the respective findings.
Analyzing reviews is a particularly tricky area, since the majority of customers
are either not willing to provide feedback or do not have the time to do so. As Suw
Charman-Anderson wrote in Forbes in 2012,4 “Amazon’s reviews system is fun-
damentally broken and whilst that might seem like an issue that troubles only those
of us in the industry who pay attention to these things, it isn’t. As book reviews
become more and more unreliable, so more and more buyers will start to get
frustrated that they aren’t getting what they were expected and will start looking for
reviews elsewhere. That will habituate them to looking outside Amazon for
information on books and bring Amazon’s position as the canonical reference for
books under threat.” Indeed, fake reviews have become a lucrative business, and
they often do not come from the manufacturer of a product as might be expected,
but from a company that has been hired to produce such a review. There is already a name for this business, astroturfing, and legal authorities are increasingly involved in fighting it, including New York Attorney General Eric T. Schneiderman, who made local businesses pay a total of $350,000 in fines for engaging in this illegal practice.5

4 www.forbes.com/sites/suwcharmananderson/2012/12/18/amazon-is-ripe-for-disruption/#55ad91947d4c
5 www.entrepreneur.com/article/228525

On the other hand, Amazon has been instrumental over the past 20 years in the
development of data mining techniques that take advantage of the massive amount
of digital data that is available on an e-commerce site, including customers’ searches, click paths, length of stay on a particular site, buying histories, wish lists, etc.
Every single click is recorded and eventually analyzed, which has led to features
such as “Customers Who Bought This Item Also Bought” or “Customers Who
Viewed This Item Also Viewed.” While content-based recommendations as well as
collaborative filtering are commonly employed, there remains room for improve-
ment in this space.
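To make the idea behind features such as “Customers Who Bought This Item Also Bought” concrete, the following minimal Python sketch counts item co-occurrences across purchase histories and ranks recommendations by frequency. The data and names are purely illustrative and not Amazon’s actual implementation, which combines far more signals.

from collections import defaultdict
from itertools import combinations

# Illustrative purchase histories: one set of items per customer.
purchases = [
    {"novel", "e-reader", "reading light"},
    {"novel", "e-reader"},
    {"e-reader", "reading light"},
    {"novel", "cookbook"},
]

# Count how often two items appear in the same basket.
co_counts = defaultdict(lambda: defaultdict(int))
for basket in purchases:
    for a, b in combinations(sorted(basket), 2):
        co_counts[a][b] += 1
        co_counts[b][a] += 1

def also_bought(item, top_n=3):
    """Return the items most frequently bought together with `item`."""
    ranked = sorted(co_counts[item].items(), key=lambda kv: -kv[1])
    return [other for other, _ in ranked[:top_n]]

print(also_bought("novel"))  # ['e-reader', 'reading light', 'cookbook']

Real recommender systems replace such raw counts with similarity measures (e.g., cosine similarity between item vectors) and blend them with content-based signals, which is where the room for improvement mentioned above lies.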

3.2 Big Data Analytics Application Areas

In Chap. 2 we discussed technology for the management of big data, and we gave
two brief examples (matrix-vector multiplication and weather data analysis) for an
application of the map-reduce paradigm. But what are some actual cases where big
data has resulted in a new insight and how does this technology benefit business in
general? In this section we will describe several application areas which demon-
strate how big data can indeed be considered a game changer, since it has already
led to developments that differ significantly from what we have seen in the past,
e.g., in the context of data warehouses. These areas exhibit great variety, so it is
important to note that our sample is not exhaustive and also represents only the
beginning of the development, generally termed “big data analytics.”
Sports
One of the oldest examples of what data can do, before it was called “big” data and popularized as such, comes from the area of sports and relates to the Oakland Athletics baseball team and their general manager Billy Beane, who was able to use statistics and player data to turn an unsuccessful team into a comparatively successful one within a relatively short time span. The story is well documented in the book by Lewis (2004) and in a 2011 movie based on that book (with Brad Pitt in the lead role).
Doug Laney, whom we already mentioned in Sect. 2.3, provided a more recent
example from sports on his blog, namely from the Indy 500 race happening in the U.
S. every year on Memorial Day weekend. According to Laney, a present-day Indy
500 race car is on the inside “smattered with nearly 200 sensors constantly mea-
suring the performance of the engine, clutch, gearbox, differential, fuel system, oil,
steering, tires, drag reduction system (DRS), and dozens of other components, as
well as the drivers’ health. These sensors spew about 1 GB of telemetry per race to
engineers pouring over them during the race and data scientists crunching them
between races. According to McLaren, its computers run a thousand simulations
during the race. After just a couple of laps they can predict the performance of each
subsystem with up to 90% accuracy. And since most of these subsystems can be
tuned during the race, engineers pit crews and drivers can proactively make minute
adjustments throughout the race as the car and conditions change.” Further details
can be found on Laney’s blog (see www.blogs.gartner.com/doug-laney/the-indy-500-big-race-bigger-data/), and it is obvious that the situation for Formula 1 cars or the NASCAR series is similar. Exploitation of sensor data from cars is, however, no longer limited to race cars, as will be seen shortly.
Smart Homes
An area that will finally experience wider dissemination with the application of big
data is home automation, a field that has been under development for more than
15 years now, but which so far has not taken off on a large scale. With big data
tools there now exists the technical knowhow to process data from air conditioning
and heating units, lighting, windows, doors, audio and video equipment, even
household appliances such as washers, dryers, and refrigerators in conjunction with
personal information from the people living in the house, in order to create living
conditions optimally adapted to the specific needs of the residents. Figure 3.4
indicates the many application areas for smart home concepts, and numerous companies are already active in one or more of these areas. The important
point is that data is produced everywhere and is ideally combined in a way that can
lead to useful automation and subsequently enhanced convenience for the residents.
Healthcare
In home automation the intention is to improve the living conditions of people in
their homes and a by-product of this is a greater likelihood of being able to live
independently for longer, even into old age. Creating better living conditions is also
part of the broad domain of healthcare, which is increasingly supported by or based
upon data gathered about a person’s medical condition, daily activity, nutrition as
well as other input, e.g., from drug manufacturers, and its appropriate processing.
A major trigger here has been the arrival of personal tracking devices or “wear-
ables” such as the Fitbit One, Flex or Alta, the Nike+ Fuelband, the Jawbone Up or
smart watches with health monitors, which typically communicate with a local
device such as a smartphone over Bluetooth and with a website over the Internet.

Fig. 3.4 Smart home application areas: Lighting; Appliances & Audio Devices; Energy & Utilities; Pet & Baby Monitor; Health & Wellness; Safety & Security; Home Robots; Device Controllers; Gardening

Commonly you need to sign up to one of the providers’ sites and can then inspect
your personal statistics. Activities are recorded on a daily basis; the user can set
goals, monitor whether they have been achieved, and even compare himself or
herself with friends who are using the same type of device. Such devices and
associated activities are supported by a range of gamification-like features including
rewards and leaderboards.
While this data may be beneficial for its users, for example to watch the progress of a personal diet or to find out how one’s personal fitness has improved (or deteriorated) over time, there is a host of other people and institutions interested in that data, first and foremost your doctor as well as your health insurance provider. While a personal doctor has so far been able to diagnose a patient only based on the data from a recent check-up, combined with the records which the doctor may keep about this patient or which may be available from previous consultations, he or she can now integrate this with fitness data which the patient provides himself or herself. It will thus become easier, for example, to relate a heart condition to a lack of exercise, and the patient will even be able to monitor his or her own recovery. On a larger scale, it has been
predicted by IBM Research (see http://www.research.ibm.com/cognitive-computing/
machine-learning-applications/targeted-cancer-therapy.shtml) that “in five years,
doctors will routinely use your DNA to keep you well.” The healthcare area in particular will boom soon due to the availability of personal sequence or genome data and an increasing understanding of which of its portions (i.e., genes) are responsible for which diseases or defects.
Similar to personal doctors, health insurance companies will likely ask for the
availability of such personal tracking data soon, and they will typically be able to
execute an even wider integration of data than a general practitioner, thanks to the
digitization of research and test results, insurance claims, or home monitors, all of
which deliver data in addition to what the general practitioner (GP) already can
acquire. It can also be expected that it won’t be long until the insurance premium a
person has to pay for such insurance will reflect the person’s willingness to do
health monitoring or to grant access to personal data.
Property Insurance
Developments analogous to healthcare can already be seen for car insurance, for
example in the USA or in the UK, where some companies already reduce the
premium a car owner has to pay if the latter is willing to plug a small device into the
on-board diagnostics (OBD) port of their car, through which the insurer can per-
manently monitor how the driver is behaving when on the road. The OBD port
allows access to sensor readings from a variety of devices that are built into a car.
Insurer Allstate is marketing its Drivewise device as follows6: “Drivewise is a way
for smart drivers to get rewarded for driving safely every day. Each time you take a
drive, it collects feedback on driving behaviors including hard braking, high speed
and when you’re behind the wheel. The safer you drive, the more you can earn!
Drivewise will never raise your rates. The focus of Drivewise is to give you

6 www.allstate.com/drive-wise.aspx
feedback that can only help your driving, and your rates.” Competitor Progressive is
marketing its Snapshot device as follows7: “The fair way to pay for car insurance. It
just makes sense—insurance should be based partly on how you actually drive,
rather than just on traditional factors like where you live and what kind of car you
have. That’s what Snapshot is all about. Your safe driving habits can help you save
on car insurance.”
Connected Cars
One important point in these developments, be it healthcare, homes, or cars, is that we
are observing more and more cases where devices talk directly to each other, instead
of just recording, say, a measurement on a website and having it ready for human or algorithmic inspection. For example, in the case of car insurance, the device plugged into a car is
not talking to the driver, but to a machine on the insurer’s site which can immediately
use the transmitted data for premium calculations. Ultimately, (small or big)
machines talk to other (small or big) machines, potentially through various stages or
intermediate machines, and ultimately come up with a decision on an issue that
impacts human beings. This is a typical example of the Internet of Things (IoT) and of
a cyber-physical system which we will say more about in Chap. 6.
Take connected or self-driving cars. The idea, around as a vision since the
1950s,8 is that a car be able to self-navigate along a roadmap and while doing so
observe exceptional conditions (such as construction sites) and communicate with
other cars as well as the road itself. Similar to race cars, autonomous cars carry
sensor and camera technology, so that they can monitor their distance from other
cars and their surroundings, adapt their speed appropriately, and recognize obstacles or oncoming traffic. Ultimately, they will even be able to predict technical malfunctions or breakdowns, and will then communicate with the nearest garage to arrange repair. When the car reaches the garage, alternative transportation will already be waiting to pick up the passengers; this was originally envisioned by HP Labs in California within their CoolTown project.9
On the downside, preliminary experience with self-driving cars as gathered by
companies like Tesla, Jeep, or Volvo shows that there is still a lot to be done before
the vision of totally self-driving cars will become a reality. Autonomous cars need
to be connected to the Internet in order to be able to communicate with a remote
server that can evaluate, for example, distance measures or images in real-time.
These connections could become subject to hacking, or the decision that the car
makes in response to a server communication may simply be wrong. Worse, the
problem of making a “correct” decision when there are two alternatives, both of
which imply casualties but in differing amounts, has been shown by Englert et al.
(2014) to be closely related to the famous halting problem for Turing machines and
is hence undecidable; see also Achenbach (2015). The halting problem states that
there is no algorithm which can decide, given a program and an arbitrary input to

7 www.progressive.com/auto/snapshot/
8 See www.youtube.com/watch?v=F2iRDYnzwtk
9 See www.youtube.com/watch?v=U2AkkuIVV-I
that program, whether the program will halt for that input. While originally stated for the formal computational model of Turing machines, it generalizes to programs written in arbitrary “Turing-complete” languages. Undecidability means that there is no algorithm, of whatever complexity, that can solve the problem at hand. Hence, connected cars have a built-in problem, which cannot be solved by any algorithm, let alone by an ethics committee!
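The flavor of the underlying argument can be illustrated with a short, purely hypothetical Python sketch; the function halts below cannot actually be implemented, which is precisely the point.

def halts(program, argument) -> bool:
    """Assume, for the sake of contradiction, that this returns True
    exactly when program(argument) would halt."""
    raise NotImplementedError("no such general decider can exist")

def paradox(program):
    # Do the opposite of whatever the supposed decider predicts for a
    # program applied to its own source.
    if halts(program, program):
        while True:   # loop forever if predicted to halt
            pass
    return            # halt immediately if predicted to loop forever

# Feeding paradox to itself leads to a contradiction: if halts(paradox, paradox)
# were True, paradox would loop forever; if it were False, paradox would halt.
# Either answer is wrong, so a general `halts` cannot exist -- and, by the
# reduction mentioned above, neither can a procedure that always makes the
# "correct" choice in the dilemma situations faced by autonomous cars.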
Smart Cities
While connected or autonomous cars are still in an early stage of development, other
developments regarding transportation (or more generally, “becoming smart”) are
more advanced. We mention two examples next that are representative of the positive effects (big) data can have for the customer or, more generally, the individual. The
first example deals with Milton Keynes, a town in Buckinghamshire, England. Milton
Keynes, or MK for short, has set out to become a role model for smart cities, taking a
holistic view of transportation, energy and water management, enterprises and citi-
zens. On its website www.mksmart.org/ the city originally stated: “Milton Keynes is
one of the fastest growing cities in the UK and a great economic success story.
However, the challenge of supporting sustainable growth without exceeding the
capacity of the infrastructure, and whilst meeting key carbon reduction targets, is a
major one. MK:Smart is a large collaborative initiative, partly funded by HEFCE (the
Higher Education Funding Council for England) and led by The Open University,
which is developing innovative solutions to support economic growth in Milton
Keynes. Central to the project is the creation of a state-of-the-art ‘MK Data Hub’
which supports the acquisition and management of vast amounts of data relevant to
city systems from a variety of data sources. These include data about energy and water
consumption, transport data, data acquired through satellite technology, social and
economic datasets, and crowdsourced data from social media or specialized apps.”
One of these apps concerns transportation, where the goal is to provide
cloud-enabled mobility (CEM) for everybody. The idea is “to connect users with
information and other cloud-based services (e.g., booking and billing systems) in
such a way as to reduce travel frustrations and congestion, and also allow users to
make spontaneous public transport decisions.”10 Central to this is the MotionMap
“that continuously describes the real-time movements of people and vehicles across
the city. It will include embedded timetables, car parking, bus and cycleway
information and estimates of congestion and crowd density in different parts of the
city.” Hence, users of the MotionMap can, for example, decide to switch from car to bus while on their way, or switch from one type of public transport to another, since they are enabled to “see” what is happening on their path.
surprise that all of this is based on intensive data analysis utilizing several of the
techniques and technologies we have mentioned, including recent hardware
developments, social media analytics, cloud computing, and recommendations.
The second example is Urban Engines, a Silicon Valley startup whose original
mission was to improve “urban mobility—saving you and everyone else time in

10 www.mksmart.org/transport/
transit—by using information from the Internet of Moving Things.” The latter
refers to transit systems like metros and buses, delivery services, or on-demand
fleets which move through a city, thereby generating huge amounts of data. Urban
Engines collected and analyzed that data in such a way that people (or companies)
could understand better how traffic flows change during the day; ideally, this
knowledge can be exploited to optimize a personal transportation schedule, for
example by learning that a bus will be late due to a traffic jam and suggesting that the user switch to the underground, and ultimately save time. The approach also
works for transportation services themselves, and Urban Engines’ software has
been deployed by Singapore’s Urban Redevelopment Authority (URA).11 The
important point here lies in the combination of data from a variety of sources, and in
analyzing this data jointly in order to identify, for example, commuter flows to and
from home and work locations. Urban Engines was acquired by Google in
September 2016 to become part of the Google Maps team.
Other Use Cases
Other areas that already make heavy use of the massive amounts of data that can be collected include market research (enabling the customer journey we mentioned above) and the entertainment industry. Disney Parks & Resorts has
developed the MyMagic+ system (see www.disneyworld.disney.go.com/faq/bands-
cards/understanding-magic-band/), which through the My Disney Experience web
site and the corresponding mobile app can deliver up-to-date information on current
offerings to prospective guests planning a trip to one of the Disney parks. Disney’s
MagicBand can be used by the guest as a room key, a ticket for the theme park,
access to FastPass+ selection, or to make a purchase. Participating visitors can skip
queues, reserve attractions in advance and later change them via their smartphone,
and they will be greeted by Disney characters by name. The system behind
MagicBand collects data about the visitor, his or her current location, purchase
history, and which attractions have been visited.
Finally, we mention that social media sites or search engines also intensively
analyze the data that they can get a hold of. Indeed, Twitter regularly analyzes the
tweets its users are generating, for example to identify and compare user groups, to
analyze user habits, or to perform sentiment analyses on the text of tweets. Simi-
larly, Facebook is interested in the number of “Likes” a page gets over time and
keeps a counter for recommended URLs, in order to make sure it takes less than
30 seconds from a click to an update of the respective counter. Google performs text clustering in Google News and tries to show similar news items next to each other;
moreover, they classify e-mails in Gmail, and perform various other analytics tasks,
e.g., in connection with their AdWords business. We will look “behind the curtain”
for some of these applications later, in order to give the reader an idea of the
techniques employed in these areas.

11 www.ura.gov.sg/uol/

3.3 Mobile Commerce and Social Commerce

In Chap. 1, we presented the key mobile technologies used by businesses today. We also introduced the concept of socialization and presented a number of appli-
cations of social media technologies. At the beginning of this chapter, we briefly
summarized the commercialization of the web and e-commerce. In this section, we
continue our discussion of e-commerce, but specifically look at how mobile tech-
nologies and social media are increasingly becoming the platforms of choice for
both customers and retailers wishing to buy and sell products and services on the
Web.
According to emarketer.com (2014), it is forecast that by the end of 2017 over
two billion mobile phone or tablet users will make some sort of mobile commerce
transaction. Further predictions suggest that by the end of 2018, some 27% of all
US retail e-commerce sales will be carried out using mobile devices with an
anticipated value of some $US 133 billion. It would seem that these predictions
may have indeed been a little conservative, as the 2015 Internet Retailer 2016
Mobile 500 study found that by 2015, US mobile sales had already exceeded 29% of e-commerce sales.
What these and many other similar studies clearly demonstrate is that mobile
commerce will eventually, and perhaps quite soon, overtake fixed-line e-commerce.

3.3.1 Applications of Mobile Commerce

Mobile commerce can be defined as any business activity conducted over a wireless
telecommunications network or from a mobile device. Specifically, in most cases it
can more simply be defined as the buying and selling of goods and services through
wireless handheld devices. Service-based mobile transactions can include those
involving the likes of entertainment such as online gaming, gambling, and content
consumption (e.g., from Netflix, Amazon Prime, or iTunes). They can also include
transactions that result from a user viewing advertisements on their mobile devices.
Another key source of mobile commerce revenue is from web-based mobile
communication. With increasing ubiquity of low-cost wireless networking (Wi-Fi
and Cellular), mobile users are moving away from more traditional, fixed-line forms
of electronic communication to greater use of mobile-only communication appli-
cations such as WhatsApp, Viber, or Snapchat, as well as mobile enabled appli-
cations such as Facebook Messenger, FaceTime, or Skype. All of these applications
are offered free to use, albeit with limited functionality, funded through the
ever-increasing presence of online advertising. WhatsApp, purchased by Facebook
in 2014 for an estimated $US19 billion, is an exception and, to date, has resisted the opportunity of advertising, instead recognizing the value of over one billion registered users and encouraging them to sign up to Facebook, where they can contribute to its extremely successful advertising-based funding model.

3.3.2 Attributes of Mobile Commerce

While we mention ubiquity as being a key enabler of mobile commerce, the rapid
growth of mobile commerce that has been observed around the world cannot be
attributed to any single factor. Indeed, there are a number of reasons behind what can only be described as a highly disruptive innovation. The following attributes
are collectively responsible for this growth.

• Ubiquity: widespread, uninterrupted network access enables easier information access in real-time.
• Convenience: We now have access to highly sophisticated “computers in our
pockets” that store data, provide access to the Internet and intranets, offer
intuitive touch screens, high definition video and sound, with increasingly acceptable battery life.
• Instant connectivity: The process we once had to follow in order to connect to the Internet via a wired connection seems a distant memory nowadays. Today we are provided with easy and quick connections to the Web, as well as to other mobile-enabled devices, through the likes of Bluetooth or NFC.
• Personalization: What we see on our devices has been specifically tailored to our
own specific needs and wants. This, based on highly sophisticated Big Data analytics, collaborative filtering, and recommender systems, provides a mobile service that appears to have been developed by somebody who knows us intimately.
• Localization of products and services: This more recent development taking
advantage of inbuilt GPS technologies and, in the future, beacon technologies, is
concerned with knowing where users are located at any given time and matching
services to them.

3.3.3 User Barriers of Mobile Commerce

While mobile commerce appears, on the face of it, to offer consumers a shopping/entertainment experience comparable to that of fixed-line e-commerce but
with the ability to engage from any location, there are some consumer challenges that
developers and service providers alike need to be aware of. The three main issues that
concern smartphone users and prevent them from using their devices to engage in
m-commerce revolve around safety and security, connectivity, and screen size.
In terms of safety and security, users fear that their devices will be attacked by
viruses, resulting in the theft of personal data. Indeed customers regularly report
that they feel safer, when engaging in web-based transactions, when they are at
home using fixed-wire communications. The home setting gives the buyer famil-
iarity and the software and technology sitting on a desk at home feels more secure
and protective. This perception, however, is not necessarily supported by the
mobile industry. For example, McCaskill (2015) quotes leading specialist security
company Kaspersky and top mobile payments provider Zapp as suggesting that
mobile security is actually probably safer than that of your average laptop. The
reason for this is still a little unclear, but what has been observed is that hackers are
still predominantly focused on fixed-line online transactions, not so much on mobile ones.
While mobile connectivity continues to improve, at least within urban areas
within developed nations, there continue to be concerns about the quality of mobile
connections, especially when financial transactions are involved. Users remain
concerned over slow or unstable connections and in particular worry that they may
be cut off in the middle of an e-commerce transaction. In the vast majority of cases
however, mobile application providers have technical solutions for such situations.
Increasing network speed, availability and reliability will continue to reduce the
likelihood of such occurrences.
One mobile commerce challenge that is less easy to solve relates to the small
screen size that inherently characterizes mobile devices. While on the one hand,
users demand the convenience that goes with small-sized devices, on the other hand
they want to be able to view in detail the products and services they are purchasing.
Unless a buyer is familiar with a product or service, or the product’s appearance
does not matter, users report being hesitant to buy an item on a smartphone. This
has led to many technical and design developments to try to maximize the space
available on a mobile screen and provide a user experience that is both intuitive and
maximizes the capabilities of the screen on which the user is viewing product and
service information. Further technological developments around foldable and
expandable screens will continue to reduce these user concerns.

3.3.4 Social Commerce

While mobile technologies are disrupting the “where” in terms of e-commerce transactions, social media is having a similar impact on “how” we are conducting
these transactions. Social commerce, sometimes abbreviated as “s-ecommerce”, is a
term often used to describe new online retail models or marketing strategies that
incorporate established social networks and/or peer-to-peer communication to drive
sales. Cohen (2011) more succinctly states that social commerce “is the evolution
and maturation of social media meets shopping.” More specifically, social com-
merce is, as represented in Fig. 3.5, the intersection between e-commerce and
social media.

Fig. 3.5 Defining social commerce: social commerce as the intersection of e-commerce and social media



When considering the role of social commerce, it is helpful to consider an analogy based around two central items: cash registers and water coolers. Putting water coolers next to cash registers represents helping people connect where they buy, by adding and linking social media tools and content to e-commerce sites (e.g., Amazon’s rate-and-review features). Putting cash registers next to water coolers is the analogy for helping people buy where they connect, by embedding social media stores and storefronts into popular social media platforms (e.g., Best Buy’s storefront in Facebook).
In the past, business use of social media has largely been restricted to using, typically, Facebook for marketing and promotional purposes. However, with greater func-
tionality being introduced in such applications, social media is increasingly being
used to improve customer retention and build brand loyalty, contributing to the
customer journey (see Fig. 3.1). To this end companies are learning to identify
opportunities to maintain customer engagement strategies after an initial purchase.
While social commerce is still a small percentage of online retailing, its growth rate
exceeds all others, with Internet Retailer’s Social Media 500 (2015) reporting year
on year growth between 2013 and 2014 of 26%. That compares with the roughly
16% growth for the overall e-commerce market in the US.
It is instructive to consider why such growth has occurred, and to fully
understand this, the business and customer perspectives need to be separately con-
sidered. From the business perspective, social commerce aids with marketing
monetization, i.e., helps marketers monetize and measure campaigns. It also con-
tributes to e-commerce sales optimization by improving conversion rates and
increasing average order value. Finally it is used by businesses for creating new
revenue streams by curating and extracting value from social media content. From
the consumer’s perspective, trust, utility, and fun characterize what appeals. Per-
ceived trust increases because social media content increases the “source credibility”
of sales and marketing messages, making them more believable, persuasive, and
trustworthy. In terms of utility, by putting social commerce tools at the disposal of customers, brands, businesses, and retailers can enhance the online customer expe-
rience. Finally, probably most importantly, social commerce brings back the fun in
online shopping. By contrast, early e-commerce was a solitary experience typified by
people interacting with software. Social commerce helps make commerce social
again and can enhance the entire customer journey as discussed in Sect. 3.1.

3.3.5 Dimensions and Models of Social Commerce

There are six dimensions of social commerce (Marsden 2009):

1. Social Shopping: allows customers to share their online shopping experience with others (synchronous shopping); adds emotion/feeling to the experience;
enables real-time recommendations.
2. Rating & Reviews: provision of independent third-party evaluation of a product
or service review, with user contributions encouraged.
126 3 IT and the Consumer

3. Recommendations & Referrals: provides a mechanism to promote recommendations and referrals within social networks, providing gamified rewards for
referrers; integrated in social shopping portals; use of syndication tools via
various social media platforms to share recommendations with friends, fans, and
followers.
4. Forums & Communities: used to connect customers and businesses to each
other in a moderated and curated environment.
5. SMO (Social Media Optimization): used to promote and publicize websites and
website content through social media.
6. Social Ads & Apps: branded content in social media in the form of paid
advertisements or social applications.

There exist a number of different social commerce business models. Not all utilize social media extensively; some instead involve elements of socialization within established retailing platforms. Social commerce business models, as
defined by Sagefrog (2013) and wpress4.me (2013) include:

• Peer-to-peer sales platforms (eBay, Etsy, Amazon Marketplace): community-based marketplaces, or bazaars, where individuals communicate and sell directly to other individuals.
• Social network-driven sales (Facebook, Pinterest, Twitter): sales driven by
referrals from established social networks, or take place on the networks
themselves (i.e., through a “shop” tab on Facebook).
• Group buying (Groupon, LivingSocial): products and services offered at a
reduced rate if enough buyers agree to make the purchase.
• Peer recommendations (Amazon, Yelp, JustBoughtIt): sites that aggregate
product or service reviews, recommend products based on others’ purchasing
history (i.e., “others who bought item x also bought item y,” as seen on
Amazon), and/or reward individuals for sharing products and purchases with
friends through social networks.
• User-curated shopping (The Fancy, Lyst, Svpply): shopping-focused sites where
users create and share lists of products and services for others to shop from.
• Participatory commerce (Threadless, Kickstarter, CutOnYourBias): Consumers
become involved directly in the production process through voting, funding and
collaboratively designing products.
• Social shopping (Motilo, Fashism, GoTryItOn): sites that attempt to replicate
shopping offline with friends by including chat and forum features for
exchanging advice and opinions.

Looking forward, yotpo.com identifies participatory commerce, social shopping, curated shopping, and peer recommendations as the business models that will flourish in the future.

3.4 Social Media Technology and Marketing

In Chap. 1 we discussed the impact that the Web has had on the growing importance of
social networks over the years. A particular result of this has been the wide emergence
of social networking sites as well as blogs (e.g., Wordpress), microblogging (e.g.,
Twitter), or wikis (e.g., Wikipedia). Readers interested in the impact these media
nowadays have in general should consult sites like http://www.fanpagelist.com/
category/top_users/, “the social media directory of official accounts of your favorite
brands, celebrities, movies, TV shows and sports teams,” or http://www.ebizmba.
com/articles/blogs, which shows the top 15 most popular blogs on a monthly basis.
We will focus here specifically on the business impact of social media, how
companies might use them, and how they could analyze the impact of their use.

3.4.1 Social Media and Business

Social media is no longer just confined to public usage by private people. Indeed,
many companies have discovered social media for their internal communication, and
nowadays use them intensively. Examples of tools available for this purpose include
Socialtext (see http://www.socialtext.com/, an “integrated suite of web-based social
software applications includes microblogging, user profile, directories, groups,
personal dashboards using OpenSocial widgets, shared spreadsheet, wiki, and
weblog collaboration tools, and mobile apps”), Atlassian Confluence (see http://
www.confluence.atlassian.com/, a type of team collaboration software), Asana
(asana.com, “the easiest way for teams to track their work—and get results”), Slack
(slack.com, “messaging for teams”), or Starmind (see http://www.starmind.com/).
The effects social media can have on an organization have been nicely summarized
by Kietzmann et al. (2011) in the “honeycomb” of social media, which distinguishes between social media functionality and implications of that functionality as follows:
Social Media Functionality:
• Presence: the extent to which users know if colleagues are available.
• Sharing: the extent to which users exchange, distribute and receive content.
• Relationships: the extent to which users relate to each other.
• Identity: the extent to which users reveal themselves.
• Conversations: the extent to which users communicate with each other.
• Reputation: the extent to which users know the social standing of others and
associated content.
• Groups: the extent to which users are ordered or form communities.

Implications of the Functionality:


• Presence: creating and managing the reality, intimacy and immediacy of the
context.
• Sharing: content management system and social graph.


• Relationships: managing the structural and flow properties in a network of
relationships.
• Identity: data privacy controls, and tools for user self-promotion.
• Conversations: conversation velocity, and the risks of starting and joining.
• Reputation: monitoring the strength, passion, sentiment, and reach of users and
brands.
• Groups: membership rules and protocols.

Clearly, not all the functionality mentioned here is commonly available in a single tool, and not everything is desirable or appropriate in every enterprise
context, but the important point is that organizations have started to recognize the
benefits achievable with social media and are now exploiting them widely.
An early study of social media impact on businesses was performed by Andriole
(2010), where he posed questions to managers and executives such as the fol-
lowing: What good is Web 2.0 technology to your company? What problems might
Web 2.0 technology solve? How can we use the technology to save or make
money? What are the best ways to exploit the technology without complicating
existing infrastructures and architectures? Andriole’s study produced a number of
important findings which have, since then, been supported by observed practice:

• Web 2.0 technologies can help improve collaboration and communication within most companies.
• These technologies should be assessed to determine real impact, and a number
of assessment techniques, including interviews, observations, and surveys, can
be used to measure impact over time across multiple business areas.
• These technologies can help improve collaboration and communication across
multiple vertical industries, though many companies are cautious about
deploying them.

Although this may change over time, it is interesting to note that many com-
panies have recognized the convenience and benefits these media can offer both
internally and in the interaction with their customers. Indeed, companies often allow
their customers today to approach them through a variety of channels, including
voice (telephone, VoIP), social networks, e-mail, classical mail, private messages,
or chat, and consequently employ what is called a “multi-channel strategy” in their
customer relationship management. On the other hand, the saying that “a fool with a
tool is still a fool” is still valid; people’s mindset must be such that they are willing
to engage in all this.

3.4.2 Social Networks as Graphs

As it is our goal in this chapter to give the reader an impression of how to approach
an analysis of social media, or how enterprises learn about their customers via
social media, we now take a look at a typical problem in the context of social
networking, that of determining communities. Intuitively, communities are groups
of people that are related by a common interest or purpose and that often interact
regarding this interest; they also develop a sense of togetherness. Communities,
once found, can often be addressed as a whole, in order to support their purpose or
simply as subjects for conducting business. Communities can form for a variety of
purposes and goals, e.g., people with the same disease, people forming a shopping
community, people with the same hobby, the alumni of a school or university, a
sports club, or the fans of a particular type of music; notice that communities often
overlap, i.e., individual members often belong to more than one community.
This last remark already hints at a technical challenge: Communities can
easily be visualized as graphs, where nodes represent individual members and
edges a relationship (e.g., “friend”) between two members. So one might expect
that classical graph algorithms are applicable, for example for determining weakly
or strongly connected components. The catch is that graph algorithms tend to
determine disjoint subsets of the set of nodes, so that no node can be in more than
one of them. This is counterintuitive when applied to overlapping communities,
which is why different approaches are needed, one of which is described next.
As a running example, we consider the graph shown in Fig. 3.6, which shows a
very small social network. The interpretation of this graph is that there is a col-
lection of participating entities, e.g., individuals which form the nodes of the graph
and which in our example are named A … G. Moreover, there is at least one
relationship between entities of the network, which could be absolute or with a
degree, i.e., there could be different kinds of relationships between individuals, but
these are ignored here. As a consequence, we can consider undirected graphs,
where every relationship is symmetric, i.e., if X is a “friend” of Y, then Y is also a
“friend” of X. Finally, there is an assumption of locality, i.e., relationships tend to
cluster, e.g., if X is related to Y and Z, then Y and Z are probably also related.
However, notice that relationships are not necessarily transitive: If X is related to Y
and Y is related to Z, then X is not necessarily related to Z as well.
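As a small illustration of these properties, the following Python sketch represents such an undirected “friend” graph as adjacency sets. Since the figure itself is not reproduced here, the concrete edge list is an assumption, chosen to be consistent with the communities and betweenness values discussed below.

# Assumed edge list for the example graph of Fig. 3.6 (an assumption, see above).
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"),
         ("D", "E"), ("D", "F"), ("D", "G"), ("E", "F"), ("F", "G")]

adjacency = {}
for x, y in edges:
    adjacency.setdefault(x, set()).add(y)  # symmetric: store both directions
    adjacency.setdefault(y, set()).add(x)

# Symmetry: X is a "friend" of Y exactly when Y is a "friend" of X.
assert all(x in adjacency[y] for x in adjacency for y in adjacency[x])

# Locality: many pairs of a node's friends are friends themselves (closed
# triangles), even though the relationship is not transitive in general.
closed_triangles = sum(1 for x in adjacency
                         for y in adjacency[x]
                         for z in adjacency[x]
                         if y < z and z in adjacency[y])
print(closed_triangles)  # each triangle is counted once per "apex" node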
Obviously, real-world social networks are considerably more complex than our
tiny example. Many examples in that regard can be found on the Web; for instance,

Fig. 3.6 Example of a (tiny) social network graph


Internet user John M. Baker shows his LinkedIn connections (as of 2011) at http://
www.etechsuccess2.blogspot.de/2011/01/my-social-network.html.
From a formal point of view, a social network is a graph G = (V, E) with a set V
of vertices (nodes) and a set E of edges that is a subset of V × V. Key problems in
social network analysis are the following:

• Centrality: To what degree is a given node “central” to the network, or how important is a node in the network?
• Link prediction: Which edges currently not in the network or graph are most
likely to be added at some point?
• Community detection: How can the nodes in the network be clustered into
“natural” or “useful” groups?
• Information diffusion: How does information spread or diffuse over the
network?

We will here just look at the problem of detecting communities. Informally, a community is a subset C of V such that there are many edges between the nodes in C (and considerably fewer between two different such subsets). When looking at
Fig. 3.6, we would intuitively say that there may be two communities, one com-
prising nodes A, B, and C, and one comprising the remaining four nodes. In other
words, there is an edge, (B, D), which is a kind of “bridge” between these com-
munities, or which is just “between” them. In the next section, we will demonstrate
how the determination of the “betweenness” of edges can be seen as a key to
community detection.

3.4.3 Processing Social Graphs

We will now describe an algorithmic approach to finding communities in a social network. Although we perceive such a network as a graph, traditional graph
algorithms, such as those for finding strongly or weakly connected components,
will not work, as we mentioned above. An additional complication, besides the fact
that classical algorithms avoid overlaps, is that they are typically based on a
measure for determining distances between nodes. There is, however, a catch in
social networks, namely the fact that in undirected graphs where the edges represent
“friend” relationships, there is no suitable way to define a distance measure. We
could say, for example, that nodes are close if there is an edge between them (and
distant if not): For x, y ∈ V, distance d(x, y) is 0 if (x, y) is in E and 1 otherwise. However, now consider the case that edges (x, y) and (y, z) are present, while (x, z) is not. Then the triangle inequality, whose validity is often a basic assumption in graph algorithms, would require that d(x, z) = 1 ≤ d(x, y) + d(y, z) = 0 + 0 = 0,
a contradiction! So the triangle inequality does not hold, and we need a different
algorithmic approach (we will later see, for example in connection with online
advertising, that also in other Web applications traditional algorithmic approaches are no longer usable).

Following the example shown in Fig. 3.6, we are now interested in finding edges
that are least likely to be inside a community. We will make this precise using the
notion of “betweenness” of an edge (x, y) ∈ E, defined as the number of pairs of nodes u, v such that (x, y) lies on the shortest path between u and v. For example, in Fig. 3.6 edge (B, D) has the highest betweenness of any edge in this graph, since it appears on every shortest path between any of the nodes A, B, C and any of the nodes D, E, F, G; all these 12 paths go through (B, D); hence betweenness(B, D) = 3 × 4 = 12.
The intuition behind the notion of (edge) betweenness is to look at the strengths
of weak ties, and the higher that strength, the weaker the tie. This is similar to
playing golf, where a high score is also bad; high betweenness of (a, b) suggests
that (a, b) runs between two different communities, yet a and b do not belong to the
same community.
We mention that there is a related centrality notion, node betweenness, which
considers individual nodes instead of edges. It indicates how “central” a node is in a
network and is again measured by the number of shortest paths from all vertices to
all others that pass through that node. A node with high (node) betweenness cen-
trality has a large “influence” on things (messages, opinions) that pass through the
network, under the assumption that passing is based on shortest paths. Both con-
cepts have many applications, including computer and communication sciences,
biology, transport and scientific cooperation.
We next present an algorithm originally proposed by Girvan and Newman (2002),
hereafter abbreviated as GNA. It focuses on edge betweenness and detects commu-
nities by progressively removing edges from the original network, in such a way that
the connected components of the remaining network are the communities (which may
have smaller communities nested inside). GNA focuses on edges that are most likely
“between” communities, and essentially proceeds in three steps as follows:

1. First, the betweenness of all existing edges in the given graph is calculated, by
considering each node X in turn, determining the number of shortest paths from
X to other nodes, and using that number for assigning a partial betweenness to
the edges adjacent to X. When all nodes have been considered, add the
betweenness values determined for each edge and divide by 2 (since for each
edge, both its endpoints have been considered).
2. The edge with the highest betweenness is removed (if multiple edges have the
same highest betweenness, remove them all). The graph may thus split into
several disjoint components; if so, we have found some communities already.
3. Now treat the next-highest betweenness as the highest and repeat Step 2, until the graph is broken into a suitable number of connected components, which form the communities.

In the example of Fig. 3.6, the calculation of edge betweenness will yield the
result shown in Fig. 3.7; as mentioned before, the edge with the highest
betweenness is (B, D). If we remove this edge from the graph, we obtain two
communities as expected, one with nodes A, B, C and one with nodes D, E, F, G.

Fig. 3.7 Graph from Fig. 3.6 after first betweenness calculation

We could then consider edges (A, B) and (B, C) in the first component as well as
edges (D, E) and (D, G) in the second, whose removal would yield smaller com-
munities; whether these are meaningful, however, would have to be answered from
an application point of view.
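A minimal sketch of these steps in Python, using the open-source networkx library and the same assumed edge list as before, might look as follows; the numbers match the values discussed above.

import networkx as nx

# Assumed edge list for the graph of Fig. 3.6 (not reproduced here).
G = nx.Graph([("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"),
              ("D", "E"), ("D", "F"), ("D", "G"), ("E", "F"), ("F", "G")])

# Step 1: compute the betweenness of every edge (unnormalized raw counts).
betweenness = nx.edge_betweenness_centrality(G, normalized=False)
top_edge = max(betweenness, key=betweenness.get)
print(top_edge, betweenness[top_edge])       # ('B', 'D') 12.0

# Step 2: remove the edge with the highest betweenness.
G.remove_edge(*top_edge)

# The connected components of the remaining graph are the communities.
print([sorted(c) for c in nx.connected_components(G)])
# [['A', 'B', 'C'], ['D', 'E', 'F', 'G']]

Newer versions of networkx also ship a ready-made girvan_newman() generator in their community module that repeats these steps automatically and yields successively finer community structures.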
We are not discussing GNA in further detail, but mention that the algorithm
essentially needs to consider all the shortest paths between all pairs of nodes, which
is computationally expensive and can hence become an obstacle when determining
communities in networks with thousands or millions of nodes. GNA avoids this by
using a breadth-first search approach, which by counting the number of shortest
paths from a node to every other node determines the “flow” values for each edge.
This method has a complexity that is proportional to n × e, where n is the number of nodes and e is the number of edges in the given graph, which essentially means that the problem can be solved reasonably efficiently. Details can be
found, for example, in Leskovec et al. (2014).
To conclude this section, we note that there are many other techniques for
analyzing (social) networks. For example, in a site where users can post pictures
and tag them with keywords, the interest may be in “similar” pictures, where
similarity could either be based on a use of similar tags, or on similar picture
content, or both.
More generally and according to the position2 blog,12 there are four different
types of tools that enterprises need for analyzing social networks:

1. “Listening Tools: A brand cannot afford to be ignorant about what’s being said
about it on any major social platform. Social media listening can be your digital
eyes and ears. They sift through all the chatter and analyze it for positive and
negative comments. Depending on their complexity and features, they can give
you alerts on where your brand is featured or direct your attention to negative
comments and potential trouble creators.
2. Reach Tools: Every brand wants to maximize its reach on social media. With
the variety of social media platforms today, this is becoming an increasingly

12 www.blogs.position2.com/four-types-social-media-analytical-tools-need
tough task. Each platform has different formats—Slideshare can carry a 100 MB
presentation but Twitter restricts at 140 character messages. The growth and
diversity of social media offerings makes manual posting across social media
platforms tougher with each passing day.
3. Depth Tools: Some products are high involvement products and choosing the
right product means a lot to the buyer. It could be a camera for an amateur
photographer or a financial software package for a small business owner. The
stakes are high for the buyer either emotionally, financially or in terms of effort
and impact.
4. Relationship Tools: Social relationship tools are useful for publishing content
on social sites. These tools offer scheduling capabilities, which ensure an
enduring online presence. This helps the brand stay in touch regularly with
consumers instead of making sporadic appearances.”

Important tools are hence those that perform sentiment analysis, cluster analysis (such as GNA), as well as other forms of analytics that ideally yield insights which not only allow us to study the relationship between a company and a customer as it has been in the past, but also to foresee how to improve (or reestablish) it in the future.

3.5 Online Advertising

We mentioned in Chap. 1 that advertising on the Web has become one of the most
prominent Internet business models. It began with simple banners that could be
placed on other Web sites. Since then it has emerged as one of the major ways to
make money on the Web, which according to Battelle (2005) is due to Bill Gross
and his invention of GoTo, a service that became famous for being among the first
to differentiate Web traffic. Indeed, what Gross quickly realized was that
non-targeted advertising on the Web was largely irrelevant and of little value to the
advertiser as long as the traffic passing by any placed ad was the “wrong” traffic,
i.e., from users not interested in what was being advertised. If users arrive at a site
due to a spammer who has led them there, due to a bad portal classification, or due
to a bad search result, they are unlikely to be interested in the products or services
offered at that site. He hence started investigating the question of how to get
qualified traffic to a site, i.e., traffic with a reasonable likelihood of responding to
the goods or services found at a site, and then started calculating what businesses
might be willing to pay for this. This gave birth to the idea that advertisements can be associated with the terms people search for, and to the pay-per-click tracking models we see today in this business.
Advertising has become a major business model on the Web since the arrival of
Google AdSense, according to them “a fast and easy way for website publishers of
all sizes to display relevant Google ads on their website’s content pages and earn
money” and Google AdWords, which allows businesses to “create ads and choose
134 3 IT and the Consumer

keywords, which are words and phrases related to [their] business. … When people
search on Google using one of [the] keywords, [the] ad may appear next to the
search results.” It is hence no surprise that users of stationary or mobile
devices are constantly flooded with ads today. It is also a major source of income
for social networks like Facebook, and hence is a connection to the topic of
community detection we discussed in the previous section.
Before we embark on a more detailed discussion of online advertising as a
business model in general and AdWords in particular, we note that advertising on
the Web represents another incarnation of the long tail curve of Web applications
we have seen in Chap. 1 (Fig. 3.7): Through Google AdWords and related
approaches (e.g., in social networks), it has become possible not only for large
companies (amounting to 20% of all companies) to place advertisements on the
Web, but now the same is possible even for a small company. Through a
cost-effective and highly scalable automated infrastructure provided by the index of
a search engine or of a social network, online sites can offer advertising even for
very limited budgets, such as those typically available to a small company. In other
words, small companies do not have to set up an infrastructure for advertising (even
in niche markets) themselves, they can simply rely on what others are providing and
searching for on the Web.
The 20+ years of history of online advertising has seen a variety of important
developments, including the following:

• Direct placement: Advertisers post their ads directly on a site, and do so for
free or for a fee or pay a commission. Examples include eBay, craigslist, and
many auto trading sites. The selection of an ad by a user can be based on
parameters (e.g., make, model, or year of a car), or it can be done relative to
query terms (e.g., “apartment Belmont”). Ranking of ads, i.e., the question of in
which order ads should be presented, is typically tricky under this approach and
may be based on such strategies like “most recent first” or similar. However,
users can be shown individual ad selections.
• Display ads: These are banners that are placed on many sites, sometimes at
fixed places (e.g., upper left corner), sometimes in the middle of text that rep-
resents a search result or an article spread over multiple Web pages, yet all users
get to see the same ads. Banner ads resemble advertising in traditional media
(e.g., magazines, TV); however, a big difference is that the advertiser now
typically pays per actual impression rather than a flat fee for placing the ad. An obvious benefit is
that the Web can exploit the information about its users, in order to determine
which ad they should be shown; this information can be gathered from a variety
of sources, including social media, email, bookmarks, time spent on a page, or
search queries issued. For example, if a search engine recognizes a user (via
cookies or when logged into an account with the respective provider) and can
record that the user has an interest, for example, in motorsports, there is a high
probability that an advertisement for car and car parts will be regularly presented
to that user. As mentioned earlier, banner ads were the initial form of advertising
on the Web and have brought along a typical foundation for calculating fees, the
CPM (cost per mille,13 i.e., per thousand impressions) rate, meaning that an advertiser pays a fixed amount per thousand impressions of the ad rather than for individual clicks. Banner ads typically show low click-through rates and, correspond-
ingly, a low return on investment or revenue for the respective advertiser.
• Search advertising: This form of advertising is based on the simple idea of
creating an association between what a user is searching for on the Web and the
ads shown to her or him in response to a search query. This idea was originally
developed by a company called Overture, which was acquired by Yahoo!
in 2003, and would place ads together with the results of a search
query; advertisers can now bid on certain keywords, and when someone sear-
ches for one of these keywords, the ad of the highest bidder is shown. Unlike with banner ads, the advertiser is charged only if the ad is actually clicked on. The
concept is (and has been) easily extended to e-mail, where a provider can
analyze e-mail content, e.g., search for “important” terms in e-mail, and then
select and show ads correspondingly.

Search advertising has become a primary advertising method on the Web since
Google adopted it around 2002. After a number of changes it was made available to
the public under the name AdWords. We will discuss these changes below, but in
particular focus on two issues that arise when ads are shown dynamically. These are:

• How to determine which ads should be shown together with a particular search
result?
• How to rank ads in case multiple ones link to a given search term?

Additional questions, not discussed here, concern how to attract views and clicks,
where to place an ad on a Web page, and generally how to do better than traditional
mass media (radio, TV, billboards), both from a provider’s and from an advertiser’s
point of view.

3.5.1 A Greedy Algorithm for Matching Ads and Queries

We next look at the question of which advertisement(s) should be displayed with
the results of a given query. So the situation we consider is that of a search engine
which can answer search queries and which has decided to monetize its results
through advertising. More specifically, when a user places a search, say, on “sports car,” the
search engine would come back with links to Web sites on sports cars, but also
show advertisements of sports car vendors and manufacturers, and vendors of
associated accessories. To make this happen, the advertisers have previously stated
in one way or another that they are interested in having their ads placed near search
results for “sports car.” The typical way they indicate this is through an online
auction, in which advertisers offer a certain premium they are willing to pay to the
search engine provider as soon as their ad is clicked. We assume that every
advertiser or bidder has a certain budget (e.g., for the month), and that budget
cannot be exceeded.
The following simple example indicates that letting the highest bidder win
might not always be the best solution. Suppose we are faced with the following
situation:

Advertiser   Bid on “Mustang” (m)   Bid on “Camaro” (c)   Budget
A            2                      1                      5
B            1                      2                      5

Thus, we only have two bidders and no one else; both still have all of their
budgets, and we only display one ad per query. Next assume we receive the
following sequence of search queries:

c m c m c m

The first query asks for “camaro”, the next for “mustang”, the next for “camaro”
again, and so on. If ads are placed as follows

B A B A

then the search engine will get stuck after answering four queries, since B, the
highest bidder on “c”, does not have enough budget anymore, and similarly, A, the
highest bidder on “m” does also not have enough budget anymore; thus, the overall
revenue will be 8, and no more ads will be shown after the first four searches.
So what we see here is that highest bids are not always a guarantee for having
ads placed, and indeed a search engine will typically keep track of how often the
ads of a particular advertiser are actually clicked, in order to get a more realistic
picture of potential revenues; we will come back to this point later.
We could do better than what we saw above if the search engine had an idea of
the future or, in other words, would know in advance which search queries to
expect. Indeed, if the entire sequence “c m c m c m” had been known in advance,
the search engine could have assigned ads as follows:

B A B A A B

Now the last two ads are not from the highest bidders anymore, but the overall
revenue would come to 10.

It is easy to see that this aspect of “knowing the future” can make a crucial
difference. Consider the following revised example, where budgets remain the
same as above, but bids go down:

Advertiser   Bid on “Mustang” (m)   Bid on “Camaro” (c)   Budget
A            1                      0                      5
B            1                      1                      5

Now assume that the search sequence is

m m m m m c c c c c

(i.e., five queries for “m” followed by five queries for “c”). Since both advertisers
bid on “m” and there is no difference in their bids, the search engine might assign as
follows:

B B B B B

But then B’s budget is exhausted, and A does not bid on “c”, so the revenue
obtained in this way is 5. Had the search engine known what to expect after the first
five queries, it could have placed ads as follows:

A A A A A B B B B B

and the total revenue would have been 10.


What we see from these simple examples is that a type of algorithm is needed
which does not wait for its input to be complete. Instead, there is a partial input to
start with, and the algorithm needs to make a decision or produce an output as soon
as the next piece of input is received. An algorithm of this kind is called an “online”
algorithm, as opposed to an “offline” algorithm that only starts its processing once
the input is completely available (which is the case, for example, for sorting
algorithms like Quicksort or Heapsort); note that “online” in this context does not
mean that processing has to be done on the Internet, it only indicates incomplete
input information. In the case of a search engine, an offline
algorithm would collect, say, a month of search queries, then look at the bids and
the budgets of its advertisers, and finally compute an assignment of ads to query
results that optimizes revenue as well as the number of impressions, but obviously
that search engine would not be of much use in a fast digital world.
Instead of waiting for more input, a search engine needs to employ an online
algorithm that can instantly select an ad to be shown with the query result when a
search query arrives, and the only information it can utilize is the advertisers’ bids
and budgets and information about the past, in this case how often the ad of a
particular advertiser has been clicked (the click-through rate or CTR). Our second
example above shows why the future could help, but in this scenario knowledge
about the future is not available.
The last example above also shows that typically an online algorithm will
achieve results that are not as good as what could be achieved by an offline
algorithm. Indeed, in the first example the online algorithm achieved only 50% of
the revenue achievable by an optimal offline algorithm. This discrepancy
is measured by a coefficient called the competitive ratio c of the online algorithm at
hand; in our case, the example shows that c ≤ 5/10 = ½, and it can be shown that ½
is also a lower bound, hence c = ½. In other words, the result that this “greedy” online algorithm can
achieve cannot be guaranteed to be any better than 50% of the optimal result.
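To make this concrete, the following Python sketch (our own illustration, not part of the original text) replays the first example: it always awards the slot to the overall highest bidder on the query term and shows nothing once that bidder’s budget is exhausted. The bids, budgets, and query stream are taken from the table above.

```python
# A minimal sketch of the naive "highest bidder wins" policy discussed above.
def highest_bidder_policy(bids, budgets, queries):
    """Always pick the overall highest bidder on the query term; show the ad
    only if that bidder can still pay its bid."""
    remaining = dict(budgets)
    shown, revenue = [], 0
    for q in queries:
        bid_value, winner = max((bid[q], adv) for adv, bid in bids.items() if q in bid)
        if remaining[winner] >= bid_value:
            shown.append(winner)
            remaining[winner] -= bid_value
            revenue += bid_value
        else:
            shown.append(None)            # the highest bidder is out of budget
    return shown, revenue

bids = {"A": {"m": 2, "c": 1}, "B": {"m": 1, "c": 2}}
budgets = {"A": 5, "B": 5}
print(highest_bidder_policy(bids, budgets, list("cmcmcm")))
# -> (['B', 'A', 'B', 'A', None, None], 8); the offline assignment B A B A A B earns 10.
```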
Having now established a basic understanding of what kind of algorithm is
needed for solving the problem of matching advertisements to search queries (or
their results), we now look at a simplified version of the matching problem itself.
The simplification is that we consider bipartite graphs, i.e., graphs whose set of
nodes can be divided into two disjoint subsets such that edges only connect nodes
that belong to distinct subsets. For example, Fig. 3.8 shows a bipartite graph with
four nodes in each node subset.
We interpret the edges in such a graph as preferences; if we consider ads to form
the left set and queries the right one, our interest is in finding matchings that are as
large as possible, where a matching is a subset of the edges such that no node is an
endpoint of two or more edges.
For example, {(1, c), (2, b), (3, d), (4, a)} is a matching for the graph shown (with
thick lines) in Fig. 3.8; it is even a “perfect” matching in the sense that every node
of the graph appears in the matching. A matching that has the largest possible number
of edges in a given graph is called a maximal matching. The matching just considered
is also maximal, since no other matching could have more edges.

Fig. 3.8 A bipartite graph



We mention that another, possibly more intuitive interpretation of the scenario
considered here is to consider the left set as boys and the right set as girls; the goal
would then be to match each girl to a boy whom she “likes” (or is connected to
through an edge). This interpretation also indicates the online character of the
scenario: Given the boys (ads), the girls (queries) arrive and need to be assigned a
boy (ad) based on preference (existing bid and remaining budget). Again, if the
entire bipartite graph is known in advance, the problem of finding a maximum
matching is well-studied and can roughly be solved in time quadratic in the number
of nodes of the graph (Hopcroft and Karp 1973).
If that is no longer the case, i.e., if we do not know the entire graph from the
outset, but only the left set (i.e., the boys or ads), we need an online algorithm or a
greedy matching; this works as follows: Given the set of boys, the girls arrive one
after the other. Upon each arrival of a girl, her preferences (edges) are revealed, and
the girl is paired with the next eligible boy. If there is none, the girl is not paired. In
the graph of Fig. 3.8, this will mean that nodes 1–4 are initially given. When node a
arrives, it reveals that (1, a) and (4, a) are possible choices; assume (1, a) is chosen.
Next node b arrives, and (2, b) and (3, b) are possible choices; assume (2, b) is
chosen. Next c arrives, and no choice is possible since node 1 is already taken.
Finally, node d arrives and is paired with node 3. So we arrive at a matching
consisting of three edges, which is not maximal, since we saw earlier that a
maximal matching would consist of four edges.
This result would lead us to suspect that the competitive ratio c equals ¾, but we
need to consider the worst performance of the algorithm over all possible inputs. If
we start again with (1, a) as above, but this is now followed by (3, b), we end up
with an even worse result consisting of only two edges; hence c ≤ ½, and again it
can be shown that ½ is also a lower bound, so that c = ½. This concludes our short
introduction into the essence of algorithms for matching ads to queries.
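For readers who prefer code over prose, the greedy matching procedure can be sketched in a few lines of Python. The edge list below is our reading of Fig. 3.8 (chosen to be consistent with the choices discussed in the text); treat it as an illustration rather than a faithful reproduction of the figure.

```python
# A minimal sketch of greedy online matching; the edge list is our reading of Fig. 3.8.
edges = {"a": [1, 4], "b": [2, 3], "c": [1], "d": [3]}   # girl/query -> eligible boys/ads

def greedy_matching(arrival_order, preferences):
    """Pair each arriving girl (query) with the first still-free boy (ad) on her list;
    girls for whom no free boy is left remain unmatched."""
    taken, matching = set(), []
    for girl in arrival_order:
        for boy in preferences[girl]:
            if boy not in taken:
                taken.add(boy)
                matching.append((boy, girl))
                break
    return matching

print(greedy_matching("abcd", edges))
# -> [(1, 'a'), (2, 'b'), (3, 'd')]: three edges, whereas the perfect matching
#    {(1, c), (2, b), (3, d), (4, a)} has four; choosing (1, a) and then (3, b) leaves only two.
```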

3.5.2 Search Advertising

We now consider search advertising, sometimes also called the “adwords problem”
after the Google AdWords system. Informally, the problem we are faced with is the
following: A stream of queries q1, q2, … arrives at a search engine, and several
advertisers bid on each query. When a query arrives, the search engine must pick a
subset of advertisers and show their ads. Not surprisingly, the goal is to maximize
the revenue for the search engine! In a slightly simplified form compared to what
Google actually does, the search advertising problem can be stated as follows. Given are:

1. A set of bids by advertisers for search queries.


2. A click-through rate (CTR) for each advertiser-query pair indicating the per-
centage of impressions which are actually clicked.
3. A budget for each advertiser (say, for 1 month).
4. A limit on the number of ads to be displayed with each search query.

The aim is to respond to each search query with a selection of advertisers such
that the following holds:

1. The size of the selection is no larger than the limit on the number of ads per
query.
2. Each advertiser indeed has a bid on the query.
3. Each advertiser has enough budget left to pay for the ad if it is clicked.

As an example, consider the following scenario of three advertisers having
different bids on the same search term, but exhibiting different click-through rates
and hence different expected revenues for the search engine provider:

Advertiser   Bid ($)   CTR (%)   Bid × CTR (¢)
A            1.00      1         1
B            0.75      2         1.5
C            0.50      2.5       1.25

Clearly, the search engine provider is interested in maximizing its revenue, or the
total value of the ads selected, where each expected value is calculated as bid × CTR.
It is therefore understandable that it will not necessarily display the ad of the highest
bidder, but those that promise the highest revenue. For example, A in the table
above has the highest bid, but a low CTR, B has the highest expected value, and C
has the highest CTR. In other words, if 1,000 queries occur for the search term in
question, A will most likely be clicked 10 times and yield a revenue of $10, B will be
clicked 20 times with a revenue of $15, and C will be clicked 25 times with a revenue
of $12.50. So the provider will obviously be more interested in C than in A.
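The ranking by expected revenue is easily reproduced; the following small Python sketch (our own illustration) uses the figures from the table above.

```python
# Rank advertisers by expected revenue per impression (bid * CTR), using the table above.
ads = {"A": (1.00, 0.01), "B": (0.75, 0.02), "C": (0.50, 0.025)}   # bid in $, CTR as a fraction

ranked = sorted(ads.items(), key=lambda kv: kv[1][0] * kv[1][1], reverse=True)
for adv, (bid, ctr) in ranked:
    clicks = 1000 * ctr                     # expected clicks per 1,000 queries
    print(adv, round(bid * ctr * 100, 2), "cents per impression,",
          int(clicks), "clicks and $%.2f per 1,000 queries" % (clicks * bid))
# B comes first ($15.00), then C ($12.50), then A ($10.00), although A is the highest bidder.
```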
To make the situation somewhat more complicated, each advertiser has a limited
(typically monthly) budget, which is divided by 30 to obtain a daily budget, and the
search engine makes sure that no one is charged more than their (daily) budget.
Moreover, the CTR of an ad is essentially unknown in advance and can only be
observed and monitored over time; so typically a search engine will need to start
with an assumption about the click probability of a new ad.
We next present the basic ideas underlying a greedy algorithm for search
advertising; to this end, we make several simplifying assumptions:

1. There is only one ad to be shown for each query.


2. All advertisers have the same budget B.
3. All ads are equally likely to be clicked (i.e., all CTRs are the same).
4. The value of each ad is the same (=1).

Then the algorithm simply says: For each query, pick any advertiser with a bid
for that query. As an example, consider two advertisers A and B such that A bids on
m, while B bids on m and c; both have a budget of 4. Similar to what we have seen
earlier, for query stream

m m m m c c c c

the worst greedy choice is

B B B B

with a revenue of 4, while

A A A A B B B B

would be optimal with a revenue of 8. Again the competitive ratio can be shown to
be ½, but with a simple improvement called the Balance algorithm it can be brought
up to 0.63 (more precisely, 1 − 1/e). This improvement picks, for each query, the
advertiser who bids on the query and has the largest unspent budget; if that applies
to more than one, it picks one of them arbitrarily.
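Under the simplifying assumptions listed above (one ad per query, unit-value ads, equal budgets), the Balance algorithm can be sketched as follows; the data reproduces the two-advertiser example, and the function name is our own.

```python
# A minimal sketch of the Balance algorithm; every placement spends one unit of budget.
def balance(bids, budgets, queries):
    remaining = dict(budgets)
    revenue = 0
    for q in queries:
        candidates = [adv for adv, keywords in bids.items()
                      if q in keywords and remaining[adv] >= 1]
        if not candidates:
            continue                                              # no eligible advertiser left
        winner = max(candidates, key=lambda adv: remaining[adv])  # largest unspent budget
        remaining[winner] -= 1
        revenue += 1
    return revenue

bids = {"A": {"m"}, "B": {"m", "c"}}
budgets = {"A": 4, "B": 4}
print(balance(bids, budgets, list("mmmmcccc")))
# -> 6: better than the worst greedy outcome of 4, though still below the optimum of 8.
```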
We conclude this section by briefly discussing the Google mechanism for
advertisers: It is based on an ongoing auction where advertisers can submit bids on
particular keywords; bids indicate the value a click would have for the advertiser
when the ad is shown. Google shows a limited number of ads only with each query.
Thus, while the original (Overture) idea was to simply order all ads for a given
keyword, Google now decides which ads to show, as well as the order in which to
show them, and as we have seen this decision is solely driven by expected revenue.
That is, the click-through rate is observed for each ad, based on the history of
displays of that ad. Users of the AdWords system specify a budget: the amount they
are willing to pay for all clicks on their ads in a month. As we have also shown,
these constraints make the problem of assigning ads to search queries significantly
more complex.
We also note that online advertising is nowadays a complex business due to the
number of parties involved. It is most often not just a business involving an advertiser
and a Web platform provider or (ad) publisher, but there are many intermediate steps
(involving intermediaries), each of which incurs additional fees. As
can be seen from Fig. 3.9, there is an entire ecosystem behind the online advertising
business today, consisting of advertising agencies, demand-side as well as supply-side
platforms, auction markets, and at its endpoints the advertiser and publisher. The three
big players distributing ads on the Web are Doubleclick by Google, Adtech, formerly
AOL, and smartadserver in Europe. It is safe to assume that each transition between
any two parties in Fig. 3.9 involves paying a fee, and statistics show that roughly only
50% of the amount an advertiser spends actually reaches the publisher. More infor-
mation on this can be found, for example in de.slideshare.net/andrewtweed1/
thomvest-advertising-technology-overview-sept-2014 or in http://www.de.slideshare.
net/ksanz15/understanding-the-online-advertising-technology-landscape.

Fig. 3.9 Partial advertising technology ecosystem: Advertiser → Advertising Agency → Demand-Side Platforms → Auction Markets (Ad Networks and Exchanges) → Supply-Side Platform → Publisher

We should not leave the topic of online advertising without a warning:


Advertising is often misused to distribute (and execute) malware and then turns into
“malvertising,” a short form for “malicious advertising.” The principle is simple:
The malicious piece of code is hidden behind an ad that shows up when the user
opens a website. When the ad is clicked, the user is not redirected to the site of the
advertiser, but to an exploit landing page from which the malicious code attacks the
user’s computer (or device) and installs malicious software (see, for example, www.
blog.malwarebytes.org/101/2015/02/what-is-malvertising/ for details).
What a user can do about this is install a so-called ad-blocker such as AdBlock,
Adblock Plus, or uBlock Origin. This is particularly relevant to browsers that are
used for any action on the Web, not just search. These malvertising networks place
ads within Web sites, on mobile phones, and into YouTube videos, typically always in
the same way, and clicking an ad creates revenue. As we saw in Fig. 3.9, there are
even marketplaces for ad space on the Web, and it is through these channels that
malware finds its way into the devices or computers of end-users. Even better
protection than just through an ad-blocker is to combine an ad-blocker with
anti-malware software.

3.6 Recommendation

We have previously touched on the topic of recommendation: in connection with
the intuition behind PageRank in Chap. 1, and in the context of big data analytics in
Chap. 2. Earlier in this chapter we mentioned that it has become common in
e-commerce that sites include a recommendation device. We now take a closer look
at what is behind this concept.

While traditional reviews often come from a professional source (such as the
publisher of a book or newspaper staff) or from private customers, online recom-
mendations are often generated by the data mining tools that work behind the
scenes; indeed, recommendations may come from other users, or they are generated
from user behavior (e.g., search history, time spent on particular Web pages while
browsing). Recommendation systems may look at transactional data that is col-
lected about each and every sales transaction, but also at previous user input (such
as ratings) or click paths. Ideally, it becomes possible to classify a customer’s
preferences and to build a profile; further recommendations can then be made on the
basis of some form of similarity of items or categories that have been identified in
or between consumers’ profiles. Clearly, recommendations point to other items,
where more customer reviews as well as further recommendations to more products
can be found (and hopefully end in purchases).
In the context of electronic commerce, recommendations have become very
popular. As an introductory example, let us briefly look at what Amazon does when
a user adds a product to his or her shopping cart. Amazon then creates a special
interim page where recommendations are the main pillar of the strategy, and where
a mix of several strategies occurs; these are:

• Cross-selling,
• other “related” or “similar” products,
• recommended promotions,
• more generic recommendations aimed at serendipity,
• recommended products in the Amazon shopping cart.

Amazon makes the most of recommendations in the page that is displayed when
you add a product to your shopping cart, seizing its opportunities to sell.
While in the context of e-commerce recommendation is typically about items or
products, it can, however, also be about a number of other things in other contexts.
For example, recommendation in electronic learning is about learning content, in
search and navigation about links and pages, in social networks about potential new
friends, or in online dating about potential dates. Irrespective of the context, the
goal is to help people (customers, users) make decisions on where to spend
attention, money, time, or any combination of these. In what follows, we use
“items” as a generic term for what is recommended.
Figure 3.10 shows the various components of a recommender system: It
incorporates two types of entity, items and users, and it takes as input ratings (if
available), content data, and possibly also demographic data. Ratings can be
implicit (obtained through observing user activity, including page views, purchases,
or mails) or explicit, and content can be structured, unstructured (e.g., a textual
evaluation), or somewhere in between. The output of a recommender system is
typically a recommendation and potentially even a prediction of what a user might
like next or what item could become interesting next.

Fig. 3.10 Components of a recommender system: entities (users and items); input (ratings, which can be explicit or implicit; content data, which can be structured, semi-structured, or unstructured; demographic data); output (recommendations and predictions)

Recommendations can come in a variety of forms, such as “Top 10,” “Most
Popular,” or “Recent Uploads.” These types of recommendations are easy to come
up with, since they are primarily based on counting activities independent of a
particular user. What is more relevant and, consequently, more difficult to produce
are recommendations that are tailored to an individual user (like in Amazon or
Netflix) and are then typically based on some form of “profile.” Recommendations
might also be editorial or hand-curated, and readers of recommendations should
always be aware that there can also be misuse involved in what is visible.
Recommendation that takes the user into account can easily be described in an
abstract way using a set S of items, a set X of customers or users, and a totally
ordered set R of ratings, such as “zero to five stars” or a number in the interval [0, 1].
Recommendation can then be described as a utility function u: X × S → R which
associates a rating with a customer-item combination. Function u can be seen as a
utility matrix U like in the following example, where the items are movies (MI for
Mission Impossible, JB for James Bond, FF for Fast & Furious), and there are four
customers A, …, D; ratings are between 1 (low) and 5 (high):

MI1 MI2 MI3 JB FF5 FF6 FF7


A 4 5 1
B 5 5 4
C 2 4 5
D 3 3

A utility matrix is commonly sparse, i.e., most entries are missing (or 0): they
are not known because most people have not rated most items. The goal of a
recommender is to predict values for these blanks; at the least, a recommender
should be able to fill in those blanks whose values are likely to be high. In reality, a
utility matrix U will have many columns (items) and many rows (users; both
numbers can be in the millions or even higher), and a recommender has access to a
number of attributes for both items and users in order to come up with an entry for
the matrix. Notice that adding a new user would add a line to U that is initially all
blank; adding a new item means adding a blank column. Even worse, a new system
would start with an empty matrix and then has a “cold-start problem.”
The key problems a recommender is faced with are to gather “known” ratings for
U, to extrapolate unknown (high) ratings from known ones, and to evaluate the
extrapolation methods employed. Clearly, the key interest when extrapolating is in
high unknown ratings, since the recommender is interested in what a user likes, not
what he or she dislikes. For gathering new ratings, there are again explicit as well as
implicit methods, both of which have pros and cons. Explicit ratings can be
obtained by asking users to rate items, which could bother people and hence lead to
unreliable responses. Implicit ratings mean learning from user actions, in particular
from purchases that lead to high ratings, but a problem here is how to treat pur-
chases that result in low ratings.
In the following, we will briefly discuss two major approaches to the design of
recommender systems (see Fig. 3.11): content-based recommenders and collabo-
rative filtering; hybrid recommenders as a combination of these two are also an
option.

3.6.1 Content-Based Recommenders

Content-based recommenders, often abbreviated CB systems, examine properties of
the items being recommended, i.e., they look at the content behind a potential
recommendation. For example, if a Netflix user has watched many movies
involving car chases, it makes sense to recommend movies from the “Action” genre
to this user or movies with the same actors; the principle is illustrated in Fig. 3.12.
For Web sites, blogs, or news entities it may make sense to recommend other sites
that carry “similar” content. So the important notion here is similarity, and the way
to decide on similarity goes through the creation of a profile; a profile is typically a
set or a vector of features or attributes.

Fig. 3.11 Types of recommender systems: content-based recommenders (CB), collaborative filtering (CF), and hybrid recommenders



Fig. 3.12 Principle of content-based recommendation: item profiles are built from the items a user likes and matched against the user’s profile in order to recommend further items

Consider movies, which can be characterized or profiled by title, genre, main
actor, director, producer, author, etc. If two movies have identical values for a
“sufficient” number of attributes, they will be considered similar; if the profile P of a
movie is similar to the profile P′ of another movie and user X liked P, then with
some probability X will also like P′.
While profiling appears not too difficult for items like movies or cars and such, it
is considerably more complicated for documents, e.g., news articles, Web pages,
blog entries, or Twitter tweets. Here the question is what the “important” words in the
document are. In order to determine this, techniques originally developed in the
field of Information Retrieval are commonly applied. Obviously, the most impor-
tant words in a document are not simply the ones that occur most often (i.e., words
like “the” or “and” etc., so-called stop words), so another measure is needed.
A common approach to measuring word importance is based on the TF-IDF
(term frequency—inverse document frequency) measure, which identifies words in
a collection of documents that are useful for determining the topic of each docu-
ment. Intuitively, a word has a high TF-IDF score in a document if it appears in
relatively few documents, but appears in this one, and when it appears in a docu-
ment it tends to appear many times.
Suppose we are given a collection of N documents. Let fij be the frequency (i.e.,
the number of occurrences) of term i in document j. Then the term frequency TFij is
defined as fij/maxk fkj (fij normalized by dividing it by the maximum number of
occurrences of any term in the document, in order to avoid a distortion of the result
in long documents). Next let term i appear in ni of the N given documents. Then the
inverse document frequency IDFi is defined as log2(N/ni) and measures the general
importance of a term for the given document collection. Finally, the TF-IDF score
for term i in document j (or its weight in j) is defined as TFij × IDFi. The terms with
the highest TF-IDF score are (often) the terms that best characterize the topic of a
document.

As an example, consider a document d containing 100 words wherein the word
Mustang appears 3 times (possibly after some preprocessing, see below). Then
f = 3, and the term frequency for Mustang in this single document is TF = 3/100
= 0.03. Now assume we have 2^20 = 1,048,576 documents and the word Mustang
appears in 2^10 = 1,024 of them. Then the inverse document frequency is calculated
as log2(2^20/2^10) = 10. Thus, the TF-IDF score for Mustang in document d is the
product of these quantities: 0.03 × 10 = 0.3.
With these preparations, profiling a given document by topic may proceed as
follows:

1. Preprocess the document by eliminating stop words (as well as other actions,
e.g., word stemming).
2. Compute the TF-IDF score for each remaining word in the document; the ones
with the highest scores are the words that characterize the document.
3. Take as the features of a document the n words with the highest TF-IDF scores.

For example, in a document describing muscle cars, words like “Mustang,”
“Camaro,” “Challenger,” or “V8” might turn up as having the highest TF-IDF
scores and would hence be considered the most important features of that
document.
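A compact Python sketch of this procedure (leaving out stop-word removal and stemming) could look as follows; the toy document collection is purely illustrative.

```python
# A minimal TF-IDF sketch following the definitions above; the documents are toy data.
import math
from collections import Counter

def tf_idf(docs):
    """docs: a list of token lists; returns one {term: TF-IDF score} dict per document."""
    N = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))   # n_i for each term i
    scores = []
    for doc in docs:
        counts = Counter(doc)
        max_count = max(counts.values())                            # max_k f_kj
        scores.append({term: (f / max_count) * math.log2(N / doc_freq[term])
                       for term, f in counts.items()})
    return scores

docs = [["mustang", "camaro", "v8", "mustang"],
        ["apartment", "belmont", "rent"],
        ["camaro", "v8", "exhaust", "v8"]]
print(tf_idf(docs)[0])   # 'mustang' scores highest; 'camaro' and 'v8' also occur elsewhere
```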
CB recommenders have the obvious advantage that in order to produce a rec-
ommendation for a user, no data on other users is needed, and they could even
provide an explanation of why a particular item was recommended (by listing the
content features that led to their decision). On the other hand, if content is not well
represented by keywords, which is the case, for example, for images or music,
recommendation based on the CB approach is more difficult. Also, CB recom-
menders cannot distinguish items represented by the same set and values of attri-
butes, and they have difficulties recommending items outside a user’s content
profile.

3.6.2 Collaborative Filtering

Recommenders based on collaborative filtering (CF) recommend items based on a
notion of similarity between users or items, i.e., the items recommended to a user
are those preferred by “similar” users or are simply “similar” items. It is this
category that often comes in the form of “customers who bought this also bought
…” Clearly, what is needed here is a similarity measure.
Before going into further detail, we briefly continue our discussion from the
previous subsection. While the TF-IDF measure is intended to figure out meaning,
or to classify documents by first finding the significant words they contain, and is
hence appropriate for content-based recommendation, sometimes simpler methods
for document comparison will do. Indeed, it is sometimes enough to look at
“character-based” similarity instead of similar meanings, for example when the
interest is in exact copies of a document (like in plagiarism or when looking for
mirror pages of a Web page) or in product recommendations on sites like Amazon.


To this end, a variety of options exist, including the Jaccard similarity (a set-based
measure) if we consider sets of terms, or the cosine similarity (a vector-space
measure) if we consider vectors of terms. Suppose we have two sets of words as
follows (representing, for example, distinct books that have often been bought by
the same customers):

X = {Mustang, Camaro, Challenger}
Y = {Mustang, Camaro, Challenger, Veyron, Regera, Miura, 911, Pantera}

Then the Jaccard similarity of these sets is determined by looking at the relative
size of their intersection, and hence given by

simJ(X, Y) = |X ∩ Y| / |X ∪ Y| = 3/8 = 0.375

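This calculation is easily reproduced; the following minimal Python sketch is our own illustration.

```python
# A minimal sketch of the Jaccard similarity computation shown above.
def jaccard(x, y):
    x, y = set(x), set(y)
    return len(x & y) / len(x | y)

X = {"Mustang", "Camaro", "Challenger"}
Y = {"Mustang", "Camaro", "Challenger", "Veyron", "Regera", "Miura", "911", "Pantera"}
print(jaccard(X, Y))   # 0.375
```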
Instead of using features or attributes of items to determine their similarity,
recommenders in the collaborative filtering category maintain a database of many
users’ ratings of (a variety of) items. For a given user, the goal is to find other
similar users whose ratings strongly correlate with the current user, and to rec-
ommend items rated highly by these similar users, but not rated by the current user.
Almost all existing commercial recommenders use this approach (e.g., Amazon).
Coming back to our model of the utility matrix U, where users are represented
by rows and items by columns, we can say that users are similar if their (rows or)
vectors are “close” according to some distance measure (which again could be the
cosine measure), and recommendation for user X is made by looking at those users
that are most similar to X and then recommending items that these users like.
Obviously, there is a “dual” version of CF recommenders: Items are also similar
if their representing (columns or) vectors are close. So if a rating for an item is
missing, a recommender could estimate it based on ratings for similar or close items,
and this can use the same similarity metrics and prediction functions as before.
Thus, CF recommenders distinguish user-to-user recommendation, made by
finding users with similar taste or profile, from item-to-item recommendation, made
by finding items that have similar appeal to many users. Going back to the sample
utility matrix U seen earlier, which has been populated further, we could conclude
the following:

MI1 MI2 MI3 JB FF5 FF6 FF7


A 4 5 5 1 5
B 5 5 4 2 4
C 5 2 4 5
D 5 5 3

Users B and C both liked movie MI1 as well as FF6 and disliked movie JB, so
they might have similar tastes; thus MI2 could be a good recommendation for C.
Conversely, users A and D liked both MI2 and FF6, so we may conclude that
people who like FF6 will also like MI2, and hence MI2 will be recommended to
user C.
Let us come back to the Jaccard similarity introduced above and consider
whether it is appropriate. To determine the similarity of users B and C, we consider
their associated vectors and ignore the missing entries. Thus, when B is considered
as a set, we get B = {2, 4, 4, 5, 5} (technically, we need to consider multisets here,
where duplicate entries are allowed, due to the fact that each number represents a
distinct valuation of some item). Similarly, C = {2, 4, 5, 5}. Hence we obtain:

B ∩ C = {2, 4, 5, 5}
B ∪ C = {2, 4, 4, 5, 5}

which implies

simJ(B, C) = 4/5 = 0.8

By the same type of calculation, we obtain the following, for example, for C
and D:

C ∩ D = {5, 5}
C ∪ D = {2, 3, 4, 5, 5}

which implies

simJ(C, D) = 2/5 = 0.4

While the second result (regarding C and D) is somewhat more intuitive than the
first (since C and D hardly have anything comparable), we can see that the Jaccard
measure is not appropriate in this case, since the information to which item a rating
value belongs is completely lost, and hence we are somehow comparing apples and
oranges. This would be even more striking if we had other users, say, E and F such
that, for example, E rated MI2 with 1 and FF7 with 5, while F rates exactly opposite
(and both so far rated nothing else): The Jaccard similarity of E and F would be 1,
i.e., these users would be identified as having the same taste, but that would be far
from valid.
A more appropriate measure in cases like these is the cosine measure described
next. Since we do have a utility matrix, we can look at the various vectors contained
in it. In particular, we now consider user preferences (i.e., rows in the utility matrix)
as vectors in a multi-dimensional space and will look at their pairwise cosine
distance, i.e., the angle between them. Since our vectors have positive integer
components only, we are technically looking at the discrete version of a Euclidean
space. Our intuition is that the smaller the angle between two vectors, the more they
point in the same direction and hence the more similar they are: Two vectors x and
y with the same orientation have a cosine similarity of 1; if x and y are at 90°, they
have a cosine similarity of 0.
A quick recap from any geometry book will reveal that the cosine of two vectors
x and y, denoted cos(x, y), is defined as follows:

cos(x, y) = (x · y) / (||x|| · ||y||)

In words, the cosine of vectors x and y is calculated as the dot product of x and y
divided by the L2 norms of x and y, i.e., their Euclidean distances from the origin.
The dot product of vectors x = [x1, …, xn] and y = [y1, …, yn] is defined as

x · y = x1 y1 + x2 y2 + … + xn yn

and the L2 norm of vector x is


||x|| = √(x1² + x2² + … + xn²)

(correspondingly for vector y).


We now apply this measure sim(x, y) = cos(x, y) to our original utility matrix:

MI1 MI2 MI3 JB FF5 FF6 FF7


A 4 5 1
B 5 5 4
C 2 4 5
D 3 3

Missing entries will no longer be ignored (as for the Jaccard measure), but will
now be treated as 0, and we first look at the cosine of the angle between users A
and C:

(5 · 2 + 1 · 4) / (√(4² + 5² + 1²) · √(2² + 4² + 5²)) ≈ 0.322

Thus, simcos(A, C) = 0.322. Similarly, we calculate cos(A, B) as

(4 · 5) / (√(4² + 5² + 1²) · √(5² + 5² + 4²)) ≈ 0.380

Since a larger (positive) cosine implies a smaller angle and therefore a smaller
distance, this measure tells us that A is slightly closer to B than to C, confirming our
intuition. By comparison, it is easily verified that, using Jaccard similarity, we
would have obtained simJ(A, C) = 0.5 and simJ(A, B) = 0.2.
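The cosine values can be reproduced with a short Python sketch. Note that the exact placement of the ratings within the seven-component vectors below is our own illustration; only the overlapping positions matter for the two values computed above.

```python
# A minimal sketch of the cosine similarity used above; missing ratings are treated as 0.
import math

def norm(v):
    return math.sqrt(sum(a * a for a in v))

def cosine(x, y):
    return sum(a * b for a, b in zip(x, y)) / (norm(x) * norm(y))

# Illustrative vectors for users A, B, and C (rating positions chosen for illustration only).
A = [4, 5, 0, 1, 0, 0, 0]
B = [5, 0, 0, 0, 5, 4, 0]
C = [0, 2, 0, 4, 0, 0, 5]
print(round(cosine(A, C), 3), round(cosine(A, B), 3))   # -> 0.322 0.38 (i.e., about 0.380)
```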
So far we have focused on user-to-user collaborative filtering, yet there is the
obvious alternate (“dual”) view of item-to-item collaborative filtering:

• For item i, find other similar items;


• Estimate a rating for item j based on ratings for similar items.

Clearly, this kind of filtering can use the same similarity metrics and prediction
functions as in the user-to-user model.
Collaborative filtering is thus a powerful and efficient method, which
works for any kind of item and can deliver very relevant recommendations. The bigger
the underlying database and the more past behavior is recorded (i.e., the larger
the utility matrix U and the more non-blank entries in U), the better the recom-
mendations it can produce.
On the downside, collaborative filtering might be expensive to implement and
resource- as well as time-consuming. A new item that has never been purchased
cannot be recommended, and a new customer who has never bought anything
cannot be compared to other customers and hence not be recommended any items.
In order to cope with the various drawbacks of both content-based and collab-
orative filtering, the approaches can be combined into “hybrid” recommenders,
which use a content-based approach to score some unrated items and then use
collaborative filtering for recommendations.
A final remark in this context concerns the size of the problems we are looking at
here. Given that sites like Amazon or Netflix have millions of users or customers,
and also may have thousands or even millions of products that could be subject to
recommendation, a typical utility matrix will obviously have a huge dimension. In
order to be able to produce recommendations in a timely fashion, ideally while a
user is navigating Amazon’s or Netflix’ pages, two directions of technical support
need to be explored: Relevant computations might be done using parallelization and
paradigms like map-reduce as discussed in Chap. 2. Alternatively, new algorithmic
paradigms need to be employed, which go beyond the scope of this book. Finally a
combination of the two is often the method of choice, since otherwise no timely
recommendations would be possible.

3.7 Electronic Government

Developments in information and communication technologies (ICT) have been an
enabler of enhanced, citizen-focused services by governments around the world.
Electronic government, or e-government, is widely regarded as a disruptor of tra-
ditional government service provision through greater citizen access, enhanced
democracy, improved information quality, and a range of governmental efficiencies.


A definition of e-government that succinctly captures its scope of technology use is,
“…a government’s uses of ICT; particularly Web-based Internet applications, to
enhance the access and delivery of government information and service to stake-
holders such as citizens, business partners, public sector employees, and other
governments, agencies and entities” (Shan et al. 2011).
It is important to note that e-government is much more than the provision of
government-to-citizen (G2C) services; it is also concerned with government-
to-government (G2G) interactions and government-to-business (G2B) transactions.
Each of these will be discussed next.
Government-to-citizen (G2C) is the e-government category that includes all the
interactions between a government and its citizens. For example, citizens can find all
the information they need on the Web, ask questions and receive answers, pay taxes
and bills, and receive payments and documents. Electronic benefits transfer (EBT) is a
well-documented G2C example. It is expected that G2C e-government will benefit
citizens in four ways:

• It will be easier for people to have their say.


• People will receive more integrated services because different government
organizations will be able to communicate more effectively with each other.
• People will get better services from government organizations.
• People will be better informed because they can get up-to-date and compre-
hensive information about government laws, regulations, policies and services.

A commonly discussed application of G2C e-government is e-democracy. This
is the use of electronic communications technologies such as the Internet in
enhancing democratic processes within a democratic republic or a representative
democracy. It is a political development still in its infancy, as well as the subject of
much debate and activity within government, civic-oriented groups and societies
around the world. E-democracy should seek to improve the democratic outcomes of
the policy process, engage citizens in meeting public challenges, increase
involvement in terms of numbers of participating citizens and, improve the quality
and effectiveness of the democratic process.
There are important social issues associated with the adoption of G2C
e-government. These include access for those who cannot afford Internet access, are
elderly, or are perhaps disabled. There are also issues, as with e-commerce in general,
associated with citizen privacy and security. While these cannot be completely
eliminated, as time progresses technical infrastructure will improve and citizens will
become increasingly accepting of e-government as the new “normal.”
Government-to-business (G2B) e-government is the category that includes
interactions between governments and businesses (government selling to businesses
and providing them with services, and also businesses selling products and services
to government). Many of the G2C services are also relevant to businesses. These
include paying taxes, receiving information and completing various types of online
forms. What is unique to this category is government e-procurement. Government
is a large consumer of technology, vehicles, office supplies, etc. Using the col-
lective size and purchasing power of government departments/agencies (often
working together as a single purchasing entity) greater value for public money can
be achieved than ever before.
The final category of e-government is government-to-government (G2G). This
includes activities within government units and also those between governments. In
this regard information sharing and process efficiency are the key benefits of G2G
e-government. In terms of information sharing, there are obvious situations where
timely and accurate data sharing could lead to much better governmental outcomes.
Such sharing might be seen between government departments such as Immigration
and Inland Revenue, along with Police and Justice.
Broadly, the vision of e-government is based around enhancing public partici-
pation and providing a progressive and reformist approach to bureaucracies
(Cumbie and Kar 2016). While what has transpired in a practical sense may not
fully align with what was initially envisioned, there was a clear expectation that
e-government was not about automating existing processes, but about offering
improved service delivery, integrated services, and market development (Grant and
Chau 2006).
Benchmarking studies tend to categorize e-government initiatives as being at one
of several distinct stages, or levels, of sophistication. The traditional view through
the late-1990s was that e-government developments would parallel those being
observed in the commercial world; essentially offering basic online information;
then citizen-requested information; followed by extra online service channels. It
was also expected that a step change in coverage might subsequently occur via
extensive online collaborations with a wide range of stakeholders on the way to
offering a full e-government service (after De Kare-Silver 1998). While even recent
models of e-government sophistication offer evidence of such progression (e.g.,
Norris and Reddick 2012), the advent of Web 2.0 technologies also requires that
e-government be viewed against the backdrop of (commercial) online social net-
working applications and services; in particular because a clear trend has emerged
of users expecting to contribute and shape Web content themselves (Wirtz and
Nitzsche 2013; Deakins et al. 2008). Recent research has started to uncover sig-
nificant use of various social media applications in local government (e.g., Oliveira
and Welch 2013) and these implementations provide transferable Web 2.0 migra-
tion paths for other government organizations to consider.
One model for e-government implementation that has stood the test of time, and
remains relevant even today, is Deloitte’s (2001) six-stage model. Taking a
citizen-centric approach and seeking to establish long-term relationships with
citizens, Deloitte’s six stages are as follows:

• Stage One: Information publishing/dissemination. This early stage is characterized by individual government departments setting up their own websites.
• Stage Two: “Official” two-way transactions with one department at a time. With secure websites, customers are able to submit personal information and conduct transactions online.
• Stage Three: Multipurpose portals. Customer-centric governments make breakthroughs in service delivery.
• Stage Four: Portal personalization. Customers can access a variety of services at a single website.
• Stage Five: Clustering of common services. Through the removal of duplication, this is where the real transformation of the government’s structure starts to take place.
• Stage Six: Full integration and enterprise transformation. Offers a full service center, personalized to each customer’s needs and preferences.

It is without question that governments around the world are facing unprece-
dented opportunities and challenges regarding management of their information and
interactions. While the commoditization of information and communication tech-
nologies coupled with Web 2.0 trends and technologies presents a plethora of
possible solutions, the pace of change is such that few governments are able to keep
up with it. Real-world e-government implementation has largely failed to match the
hype associated with early predictions of government transformation. These pre-
dictions are still sound; it is just that the rate of change in the public sector is
lagging that of for-profit industries.

3.8 Further Reading

Laudon and Traver (2015) is a comprehensive introduction to the area of electronic
commerce. Data mining is the topic of Han et al. (2012) or Witten et al. (2016).
Payne (2005) or Kostojohn et al. (2011) introduce the topic of customer relationship
management.
Healthcare and DNA research, microbiology, but also other natural sciences like
physics and chemistry have also always been among major producers of big data,
although, as Reed and Dongarra (2015) explain, these scientific areas need to keep
their eyes on both data and high-performance (“exascale”) computing. MacManus
(2015) is a comprehensive exposition of health tracking devices and applications.
As we mentioned in Chap. 2 already, Big Data analytics is often based on statistical
techniques, like those discussed by Ramachandran and Tsokos (2015), Shasha and
Wilson (2010), Kelleher et al. (2015), Provost and Fawcett (2013), or Alpaydin
(2016).
Social network analysis is the topic of Barabasi (2016), Borgatti et al. (2013),
Fouss et al. (2016), or Scott (2013). A wide field nowadays is that of opinion
mining from social network data or sentiment analysis; an introduction is Liu
(2015).
Online algorithms as used in online advertising go back to the work of Karp
(1992); our exposition in Sect. 3.5 follows Leskovec et al. (2014), who discuss the
topic in more detail. For details on search advertising, see also Leskovec et al.
(2014) or the original sources Kalyanasundaram and Pruhs (2000) as well as Mehta
et al. (2005).
An introduction to recommender systems is given by Aggarwal (2016) or
Agarwal and Chen (2016); the topic is also covered by Leskovec et al. (2014). We
also mention the recent research papers by Lu et al. (2015) and Jannach et al.
(2016).
4 IT and the Enterprise

In previous chapters, we have discussed what IT technologies we are confronted
with today and what their impact on the consumer will most likely be. In this
chapter, we adopt an enterprise perspective and discuss what a company has to do
in order to keep up with technology and, even better, exploit it appropriately for its
own business. The chapter will in particular try to answer the question of how to
guide middle management in any IT-related decisions. This involves cloud sourcing
(“should I or should I not?”), Business Intelligence (BI) and the future of data
warehouses in the presence of big data. It then carries over to models of IT con-
sumerization such as Bring Your Own Device (BYOD) and Corporate Owned
Personally Enabled (COPE), i.e., the question of whether a business should or has
to allow this. Finally, Business Process Management (BPM) is discussed, again
from the perspective of “why should I do this,” but especially from the perspective
of all the benefits proper BPM can drive home.
Clearly, some of the techniques and approaches we have discussed in Chap. 3
are also relevant here, so the manager interested in how recommendation or online
advertising works might want to read the relevant sections there.

4.1 Cloud Sourcing

As we have mentioned before, a recent Morgan Stanley study revealed that almost
30 percent of all applications will have migrated to the public cloud by the end of
2017. While this is a clear indication that cloud computing is an unstoppable trend
and will ultimately reach considerably higher percentages, many companies for
which IT is not their core business are confronted with a non-trivial and potentially
far-reaching decision when it comes to cloud computing: Is it the right alternative
for my organization? Can I really save money when I move to the cloud? What are
the legal implications of choosing a non-local cloud provider who maintains data
centers across the globe? Do I have a chance to negotiate the conditions under
which cloud provision would become acceptable for me? Is my data sufficiently
protected when hosted by a cloud provider?
These and other questions essentially touch on five different dimensions of cloud
sourcing, which can also be observed in many other areas and decision situations
involving technology adoption and are collectively called the TELOS dimensions:

• The technical dimension assesses the general efficiency and performance
capabilities of the cloud or of a particular cloud provider, the backup and
recovery options after a system crash, and how difficult it may be to integrate
cloud services with local applications.
• The economical dimension considers the cost savings that can be achieved by
moving an existing IT operation, entirely or in parts, to the cloud versus how
expensive the migration itself will be. It also looks at provider reputation, the
applicable pricing model(s), potential lock-in effects, and difficulties that may
arise during a provider switch.
• The legal dimension considers the domestic or international laws that become
applicable, as well as legal issues such as where data is stored, any local legal
restrictions the provider may have a problem with, options for leaving a cloud
provider, or what recourse may be possible should the provider change its
pricing structure.
• The organizational dimension is concerned with designing a structured
approach to provider selection or to document cloud architectures. It also
evaluates support services and mechanisms for communicating with the cloud
provider or with other users.
• The social dimension considers the impact a move to the cloud might have on
staff, and considers questions such as potential lay-offs, changes in job func-
tions, or the need for professional development.

Some of the TELOS dimensions have already been dealt with in previous chapters;
for others, in particular the legal dimension, specific knowledge regarding domestic
and international law is needed (which goes beyond the scope of this book). We
hence concentrate here on the organizational dimension.

4.1.1 Strategy Development

Strategy development in IT has a long tradition, ever since it was recognized that an
IT business is not something that can be set up and run in an ad hoc fashion, but
instead needs careful planning and, due to the rapid development of the field,
constant evaluation and evolution. Hence, our discussion here takes a quite generic
approach to devising an IT strategy, which is shown in Fig. 4.1. It consists of five
stages beginning with preparation and planning. A crucially important aspect is the
specification of goals and the definition of one or more business cases. This is
followed by the selection of a solution provider, which typically involves a com-
parison of several potential providers.

Fig. 4.1 Generic IT development strategy: (1) preparation and planning, with definition of goals and business case; (2) selection of a solution provider; (3) detailed planning and contract negotiation; (4) implementation and migration; (5) operation, maintenance, and evolution

Once a particular provider has been selected, detailed planning as well as con-
tract negotiations can start. These are followed by an implementation of the selected
solution and a migration of previous services to the new one(s). Finally, the new
setup becomes operational, which typically includes a maintenance schedule.
Ultimately, the new solution evolves over time, which may require some earlier
phases of the strategy to be revisited in order to consider the impact of evolutionary
changes from there. This can occur during intermediate phases as well.

4.1.2 Cloud Strategy Development

It is necessary to adapt the generic strategy shown in Fig. 4.1 to one specific to
cloud computing and cloud sourcing. The first and most important step here is to
develop the cloud strategy at the level of top management. The strategy needs to
define which processes or business areas will be moved to the cloud or shall be
supported by cloud services, and which, if any, will not. Management also needs to
identify possible risks and how they might be mitigated or their effects managed.
When these issues have been clarified, further steps towards cloud sourcing can be
taken.
If it is not at all clear whether or not a move to the cloud will be beneficial, it
may help to resort to traditional evaluation techniques such as SWOT analysis
(McGuire 2014) or balanced scorecard (Kaplan and Norton 1996). A SWOT
analysis, for example, will analyze the strengths, weaknesses, opportunities, and
threats for the organization in question and can suggest measures or remedies in
each case. Figure 4.2 shows a sample SWOT analysis that can accompany the
development of a cloud strategy (and is actually applicable to most forms of out-
sourcing). The value derived from carrying out a SWOT analysis includes the
concentration on core competencies, improved efficiency of the company’s IT
operation, a consolidation of the company-internal IT landscape, and ultimately cost
reduction. There are a number of challenges associated with cloud adoption, and
these need to be considered carefully.

Fig. 4.2 Sample SWOT analysis for developing a cloud strategy: internal factors comprise strengths (efficiency of the IT operation, cost cutting, IT consolidation, concentration on core competencies) and weaknesses (staff opposition, necessary preparatory measures, organization of data management, dependency on the Internet); external factors comprise opportunities (participation in price decline, participation in technology leadership, the chance to react fast to changing markets, being an early adopter) and threats (hackers, data leaks, contract termination by the provider)

For example, staff might be unhappy about
the changes brought about by a move to the cloud, in particular since this requires
serious planning and preparation; another challenge might be the intensified
dependency on the Internet and maybe even on a certain guaranteed bandwidth, and
the management of the company-owned data might require new forms of organi-
zation. On the other hand, the opportunities include customer participation in
declining prices and technological innovations, since the cloud provider will always
be interested in offering up-to-date equipment, and also an opportunity for the
customer to quickly react to changing markets, which may require an increase or a
decrease in IT support from time to time. Being an early adopter of new technology
may be an opportunity if the technology is robust and future-proofed, but it may
also be a threat if the technology “flops”; both situations have occurred with the
adoption of novel technologies in the history of computing. Other threats include
the danger of being hacked, data leaks, security breaches, or the simple fact that the
provider may prematurely abrogate the contract.
In general, a strategy should be a set of guidelines that is binding for the entire
company (or at least those organizational units that are affected by it), often with
little room for maneuver. A company typically has an enterprise strategy from
which other strategies can be derived, among them an IT strategy and potentially a
cloud strategy. The latter can be broken down into a cloud sourcing strategy and a
cloud providing strategy. The former describes which services will be procured
from the cloud under which conditions; the latter is needed in case a company

wants to act as a cloud service provider itself; this typically applies to software
companies interested in marketing their products “as-a-service.”
We focus here on cloud sourcing, for which the corresponding strategy should
be in line with the overall enterprise strategy; to this end, it needs to address issues
such as the following “W” questions:

• Why should services be procured from or moved to the cloud? If there are
primarily financial reasons, the expected gain should be specified.
• What should be moved to the cloud? If existing systems are affected, should
they be replaced or just moved? What about the data that these systems handle?
• Which risks can be tolerated from a management perspective? Often certain risks
need to be tolerated at the beginning of a cloud sourcing project, with the option
of eliminating them later.
• When should the move to the cloud occur? A timeline for migration is often
useful.
• Who is acceptable or unacceptable as a cloud service provider? A number of
criteria can be used to assess a particular provider, such as its past service
availability or how quickly it responds to glitches (see the sketch after this list).
• Where may the cloud provider be geographically located? This has to do with
the question where company-relevant data is kept, for which legal regulations
may apply. For example, if a requirement states that no company data or no
customer data may be kept outside the European Union, a cloud provider that
keeps redundant storage in Asia is not acceptable.
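The last two questions in particular lend themselves to a simple, automatable pre-screening of candidate providers. The following sketch is purely illustrative: all provider names, figures, criteria, and weights are hypothetical assumptions that a company would replace with its own requirements and with data gathered about real providers.

ALLOWED_REGIONS = {"EU"}   # derived from the "Where" question: no data outside the European Union

# Hypothetical figures; in practice taken from SLAs, status pages, and reference customers.
candidate_providers = [
    {"name": "Provider A", "availability": 0.9995, "avg_response_hours": 2, "data_regions": {"EU"}},
    {"name": "Provider B", "availability": 0.9990, "avg_response_hours": 12, "data_regions": {"EU", "Asia"}},
    {"name": "Provider C", "availability": 0.9900, "avg_response_hours": 4, "data_regions": {"US"}},
]

def eligible(provider):
    # Hard criterion from the "Where" question: every storage region must be allowed.
    return provider["data_regions"] <= ALLOWED_REGIONS

def score(provider):
    # Soft criteria from the "Who" question: past availability and responsiveness to glitches.
    availability_score = (provider["availability"] - 0.99) / 0.01       # maps 99%..100% to 0..1
    response_score = max(0.0, 1 - provider["avg_response_hours"] / 24)  # faster response, higher score
    return 0.7 * availability_score + 0.3 * response_score              # weights are assumptions

shortlist = sorted((p for p in candidate_providers if eligible(p)), key=score, reverse=True)
for provider in shortlist:
    print(provider["name"], round(score(provider), 2))

In this made-up example, only Provider A survives the hard "Where" criterion; the soft scores would then be discussed with the stakeholders rather than decide the matter on their own.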

When designing a cloud strategy, it is important to get all stakeholders onboard and
then follow a sequence of steps such as the following:

1. Strategic analysis: Identify the important enterprise goals and how they might
impact the cloud strategy. Try to keep an orientation towards future
developments.
2. Identification of strategic cloud goals: State the core goals, ideally supported by
quantifiable and measurable statements regarding goal achievement and time-
line. Typical goals include effectiveness, efficiency, quality, flexibility, pro-
ductivity, and security.
3. Identification and specification of planning objects: Potential planning objects
include staff, infrastructure, applications, integration, and services. For each
such object, partial goals need to be specified, which when reached deliver the
overall goal. The cloud strategy should now be ready.
4. Review of the cloud strategy: Check the strategy design, typically in a workshop
to make sure that there are no more contradictions, unclear statements, or
passages that require further discussion; if such issues are discovered, they
should be taken care of right away.
5. Publication and enactment of the cloud strategy: Make the strategy known
within the enterprise; stipulate organizational measures for its implementation.

1 Preparation & Planning
• Specification of the functionality to be externalized according to the cloud strategy
• Identification of security requirements, availability, response times, etc.

2 Provider Selection
• Development of criteria a provider has to meet
• Analysis and comparison of providers and their offerings; documentation of results
• Decision on a specific provider

3 Detailed Planning; Contract Negotiation
• Contract and service-level agreement (SLA) negotiation, respecting all non-functional requirements (e.g., security, confidentiality)
• Establishment of direct contacts between customer and provider

4 Implementation & Migration
• Migration planning in cooperation between CSP and company departments
• Outsourcing of services and/or data
• Implementation of regular checks for SLA conformance

5 Operation
• Regular security updates and SLA checks
• Observation of market situation for better offers

Fig. 4.3 Sample implementation of a cloud strategy

A cloud strategy that results from such a process is a reasonable basis for the
next steps towards cloud adoption. Figure 4.3 shows a sample realization of a cloud
strategy based on such preparations. Notice that it is directly derived from the
generic IT strategy that was shown in Fig. 4.1. The implementation requires that
necessary responsibilities have been clearly assigned, and that all parties involved
are part of the process from the outset.

4.1.3 Cloud Provider Evaluation and Monitoring

Once a move to the cloud has been decided on, there are still several open issues,
especially for a company whose core competency is in a field other than IT. This in
particular applies to small and medium-sized enterprises (SMEs), who often hesitate
the most or the longest before making a decision regarding cloud sourcing. There is
a simple reason for that: Large enterprises have often relied on a particular IT
company for an extended period of time, and when a move to the cloud is sug-
gested, the company normally relies on the advice of the IT company. Startup
companies, on the other hand, often rely on the (public) cloud entirely, since cloud
sourcing gives them an easy and comparatively cheap way of testing their idea or
product; if it “flies”, they can still decide whether to remain in the cloud or to invest
in IT resources themselves; if it does not, they can simply return the resources.

We therefore primarily target SMEs in the measures that are briefly described
next, and which indicate alternatives that are available when they need to make a
decision about cloud adoption. These include:

1. EVACS, an Economic Value Assessment method of Cloud Sourcing,


2. Cooperative community clouds,
3. Cloud intermediaries.

Before establishing a full-blown business case, as suggested above, for Step 1 of a


cloud strategy, it is often helpful to have easy-to-verify criteria that can identify
those projects that are likely to be good candidates for cloud sourcing. SMEs would
hence greatly benefit from a “filter” that can help them isolate unfit projects easily
and let them focus on promising cloud sourcing undertakings. Such a filter may
come in the form of a method called EVACS (short for Economic Value Assessment
of Cloud Sourcing). This method applies the well-known principle of stepwise
refinement: Project ideas are analyzed in three incremental steps so that most of the
eventually unattractive projects are eliminated with comparatively little effort in the
first two steps. EVACS comprises three steps as shown in Fig. 4.4:

1. Sanity check: Ensure that the cloud paradigm is in principle adequate for the
problem at hand and check that typical benefits of cloud sourcing are likely to be
leveraged. As outlined above, the first step for a company should be to look for
indicators that promise attractive results from a cloud sourcing and
contra-indicators signaling that a cloud sourcing is likely to be a suboptimal
choice.

Fig. 4.4 The 3-phase EVACS method: cloud sourcing candidates pass through (1) Sanity Check (make sure the cloud paradigm is basically adequate), (2) Preparatory Analysis (using rules of thumb, roughly estimate the business value of a cloud sourcing project), and (3) In-depth Analysis (building a detailed business case using the proposed guidelines on cost factors and benefits), leaving only the profitable projects



2. Preparatory analysis: When an analysis has shown that a cloud sourcing is in


principle viable, an SME can use more explicit rules of thumb to investigate its
attractiveness. These rules require more detailed input and, thus, usually require
a finer grain for all planning associated with the project. For example, it should
be clear whether an SME will source SaaS or IaaS because both models exhibit
specific features.
3. In-depth analysis: When a project proposal has passed the tests of the first two
steps, it is likely but still not certain that the project is economically viable.
Therefore, a detailed business case has to be developed that considers all cost
factors and possible benefits and compares them to the conceivable alternatives.
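The stepwise refinement behind EVACS can be pictured as a chain of increasingly expensive filters. The sketch below only illustrates this control flow; the individual checks are hypothetical placeholders, since EVACS itself does not prescribe concrete criteria or code.

def sanity_check(project):
    # Step 1: is the cloud paradigm adequate at all (indicators vs. contra-indicators)?
    return project["variable_load"] and not project["strict_onsite_data_requirement"]

def preparatory_analysis(project):
    # Step 2: rough rule of thumb for the business value (placeholder rule).
    return project["estimated_cloud_cost"] < 0.8 * project["estimated_inhouse_cost"]

def in_depth_analysis(project):
    # Step 3: detailed business case, here reduced to comparing net benefit (placeholder).
    return project["detailed_benefit"] - project["detailed_cost"] > 0

def evacs(candidates):
    # Cheap tests run first, so most unattractive projects are eliminated with little effort.
    for step in (sanity_check, preparatory_analysis, in_depth_analysis):
        candidates = [p for p in candidates if step(p)]
    return candidates   # the remaining, presumably profitable cloud sourcing projects

projects = [
    {"name": "CRM as SaaS", "variable_load": True, "strict_onsite_data_requirement": False,
     "estimated_cloud_cost": 60, "estimated_inhouse_cost": 100,
     "detailed_benefit": 120, "detailed_cost": 90},
    {"name": "Shop-floor control", "variable_load": False, "strict_onsite_data_requirement": True,
     "estimated_cloud_cost": 80, "estimated_inhouse_cost": 100,
     "detailed_benefit": 50, "detailed_cost": 70},
]
print([p["name"] for p in evacs(projects)])   # only "CRM as SaaS" survives all three filters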

A cooperative community cloud is based on the concept of a cooperative, i.e., a


business organization owned and operated by a group of individuals (or companies)
with a common goal and for their mutual benefit (MacPherson 1995). Traditionally,
cooperatives are found in industries such as agriculture, finance, and the real-estate
industry. Generally speaking, the cooperative paradigm allows for a more flexible and
diverse alignment of the supplied products with the customers’ demands. It implies a
common administration as well as sharing of relevant resources. Thus, it often leads to
lower costs per unit. SMEs forming a cooperative typically want to realize synergies
from a joint organization of elements on their value chains. This leads to economies of
scale, scope and skills (Williamson 2005). By concentrating their market power, the
SMEs can also compensate competitive disadvantages towards large companies
(Theurl and Meyer 2005). Cooperatives are characterized by clear rules facilitating the
handling of uncertainty and thus fostering credibility and trust among the members.
The strategic orientation of the cooperative is built on the notion of creating value only
for the members, which are simultaneously the owners of the cooperative. Thus,
anonymous vested interests are excluded from all strategic decisions, and the values of
the cooperative are grounded in the member value, a special type of shareholder value
that can be interpreted as the overall business value of the cooperative.
Applied to the cloud services domain, the cooperative paradigm attempts to
alleviate deficiencies in the individual IT infrastructures and to build collective
competence with regard to cloud services. This principle is illustrated in Fig. 4.5.
The cooperative community cloud focuses on the cooperation of SMEs from a
variety of industries. The enterprises organize parts of their IT systems and pro-
cesses jointly in order to become more flexible and cost-efficient. The service
portfolio of the cooperative can be fine-tuned to suit the needs of both the SMEs
and their customers. This reduces information asymmetries, creates transparency
and reduces the potential of exploitation. Cooperatives are distinguished by their
governance elements, i.e., structures for incentives, decisions, control and coordi-
nation. These elements provide stability and lead to mutual trust, as all cooperatives
are designed to be long-term undertakings. There are four elements of cooperative
governance that are particularly important for the cloud computing domain: the
notion of the member value, the concept of consistent incentives, the systemic trust
of a cooperative, and its size and locality.

Fig. 4.5 Principle of the cooperative community cloud: the organizational, technical, and legal governance of the cooperative provides monitoring and yields cost-efficiency, trust, and legal certainty for the cooperative community cloud

The third approach we mention as a remedy for the uncertainties an SME might
be confronted with when it comes to cloud sourcing decisions is to resort to the
notion of an intermediary. An intermediary is a third party that facilitates economic
transactions between two other parties. If an intermediary focuses on the cloud
domain, it is referred to as a cloud intermediary. In the sense of a traditional
intermediary, a cloud intermediary arbitrates between supply (CSPs) and demand
(potential cloud users) in order to facilitate cloud sourcing for SMEs by reducing the
upfront effort of identifying, comparing, and screening the CSPs and their services.
To do so, the intermediary propagates best practices and offers specific consultancy.
Intuitively, a cloud intermediary helps cloud users “clear the service jungle” so
that they can find a suitable match more easily and, thus, more cheaply. Evidently, a cloud
intermediary can be oriented towards the buy-side, towards the supply-side, or it
can be oriented towards both and hence be bilateral. A buy-side cloud intermediary
is formed by a joint venture of potential cloud users that would like to bundle their
cloud-related activities in order to realize synergies and scale effects. The cloud
intermediary is, hence, an agent of the cloud users and strongly biased to enforce
these users’ interests towards CSPs. The intermediary focuses on the users’ prob-
lems, such as service identification, provider selection or migration from one CSP
to another. If a group of CSPs that share a common goal form a joint venture, it is
called a supply-side cloud intermediary. The intermediary then acts as an agent of
the suppliers and is, thus, focused on enhancing their interests. The formation of
such a cloud intermediary is attractive mainly for small players in the cloud market
that offer complementary services. A bilateral cloud intermediary arbitrates
between cloud users and CSPs while being tied to both sides. This is particularly
relevant when there is a group of enterprises that plan a close cooperation based on
a mid- to long-term horizon. In this scenario, a bilateral cloud intermediary can
reduce the coordination overhead between the partners and allow for the realization
of economies of scale, scope and skill.

Fig. 4.6 A multi-perspective model of cloud intermediary value-creation: the intermediary mediates between SMEs on the buy-side and CSPs on the supply-side, offering both "pass-through" services and proper services ("atomic" and compound); its business model is described by a core value proposition, buy-side and supply-side value propositions, key activities, key resources, revenue streams, and the competition
A particularly attractive setting can be given by a cloud intermediary that is also
a community cloud provider. In this case, all benefits of a cloud intermediary are
combined with the possibility to offer custom services that are tailored towards the
needs of the members of the cloud community. These services can comprise both
completely self-provided functionality and compound services, created from a
combination of third-party cloud services (providing some kind of added value);
they are also referred to as higher-order services. A cloud intermediary that also acts
as a CSP is called a hybrid cloud intermediary. The model of this type of inter-
mediary is shown in Fig. 4.6.
We finally mention that there are numerous other tools available which can help
SMEs in their handling of cloud computing and cloud sourcing. For example,
CloudHarmony1 “provides objective, impartial and reliable performance analysis to
compare cloud services” and simplifies “the comparison of cloud services by
providing reliable and objective performance analysis, reports, commentary, met-
rics and tools.” So, if you are interested, for instance, in how AWS compute services

1 cloudharmony.com/

for Europe have performed over a 30-day period and what uptime individual services have had,
this would be a useful source. Similar types of analyses were previously available
from CloudSherpas, which has since been acquired by Accenture.
The effects of implementing a cloud strategy and of utilizing these various tools
for cloud-oriented decision-making are now becoming apparent: According to
a 2015 survey by the British Cloud Industry Forum, cloud adoption grew from 48%
in 2010 to 84% in 2015,2 and the top five benefits the surveyed companies reported
from cloud service deployment were:

1. More flexible access to technology,


2. Faster access to technology,
3. Cost saving over on-premise solutions,
4. Reduction in capital expenditure,
5. On-demand/predictable cost.

Companies have also recognized that by moving some or all of their IT resources
to the cloud, they can automate their business (in particular sales), potentially
increase agility, and perform predictive analytics to an extent that was unthinkable
before.
The cloud has now reached a state where considerable amounts of money
can be made or spent. In its development it thus resembles the development
of making money on the Internet in general (e.g., the IPO of Netscape Communications in 1995,
see Chap. 1) or the development of e-commerce from a niche activity to
a ubiquitous, location-independent 24/7 business in particular. The Internet has
more than once produced new business models and artefacts whose value initially
seemed low, but which at some point essentially exploded. The cloud seems to be
no exception.

4.1.4 Crowdsourcing for Enterprises

In Chap. 1 we discussed the “crowd as your next community” and described
crowdsourcing as a modern way of employing and exploiting a largely unknown
crowd of workers for specific tasks or jobs. We mentioned crowdjobbing as a
particular form of crowdsourcing, and crowdfunding as a popular way to collect
money for (often startup) projects. While it may seem that crowdsourcing is typi-
cally something startups or individuals turn to when they need some form of help,
there is a different side to it, which we describe next. Look at InnoCentive, whose
motto3 is “Our Challenge Driven Innovation methodology and software result in
fresh thinking and cost-effective problem solving, whether you want to crowd-
source solutions from external Solvers or better harness the intelligence of your
internal team.” Their LinkedIn presence says, “InnoCentive offers innovation

2 raconteur.net/technology/cloud-is-shaping-a-new-uk-digital-landscape
3 www.innocentive.com/

platforms to assist companies and government agencies with their innovation


needs.” They maintain a network of more than 350,000 so-called “Solvers” who
can be engaged to find a solution to a problem a company is not able to solve
alone. Their customers include organizations such as Booz Allen Hamilton, Eli
Lilly & Company, NASA, Procter & Gamble, Scientific American, and Thomson
Reuters, as well as several government agencies in the U.S. and Europe. Companies
can publish a “request-for-partners challenge” through InnoCentive, to which
Solvers who feel capable can respond.
A similar approach can be found at Crowdsource,4 “an integrated, simple-to-use
platform where businesses and freelancers can collaborate and establish the flexible
work environment of the future.” Crowdsource, which recently rebranded as
OneSpace,5 is yet another manifestation of a modern form of problem solving and
at the same time a new form of working. Indeed, we are more and more living in an
on-demand society where solutions need not be kept available all the time at
potentially high costs, but it suffices if they are available only when needed. An
example from everyday life is car sharing, about which we will have more to say in
Chap. 5. However, companies have started thinking along the same lines. Espe-
cially in countries with high social security (e.g., in Germany), where it is not easy
to fire an employee even if he or she is not performing as needed, crowdsourcing
work is an interesting option, not from the employee’s point of view, but from the
employer’s. On the other hand, many people are no longer happy with life-long
employment in the same company or type of job; for them turning to a crowd-
sourcing platform like the two mentioned here may be a viable option.
While crowdsourcing has so far only had a short history in the job market, and
while crowdfunding only became popular a few years ago, there is an older
form of crowdsourcing, especially in computer science, that goes back to the 1970s:
the development of the UNIX operating system. While originally an effort by a few
people, most notably Dennis Ritchie and Ken Thompson, the development of this
operating system gave rise to vendor alliances in the 1980s like the Open Software
Foundation, Unix International, and X/Open, to which Richard Stallman6 responded
in 1985 with his GNU Manifesto7 and ultimately the GNU/Linux operating
system, a free, multi-user, Unix-like operating system. It took a while to
understand the benefits of free software and collaborative software development
beyond company borders, but the Apache Software Foundation8 dates back to 1999
and has been propagating open-source software projects ever since. It consists of a
distributed community of developers who contribute for free, which can be considered
a large crowdsourcing effort. Note, however, that there is a difference between free
software and open-source software; see, for example, www.fsf.org/

4 www.crowdsource.com/
5 www.onespace.com/
6 www.stallman.org/
7 www.gnu.org/gnu/manifesto.en.html
8 www.apache.org/

Another open community is the Moodle community,9 which develops and


maintains a learning platform designed to provide educators, administrators and
learners with an integrated system to create personalized learning environments.
Moodle is used globally and can be downloaded to a local web server. This
community is based in Australia and financially supported by a number of partners
worldwide.
Apache and Moodle are prominent examples of collaboration projects in which
not all participants know each other, yet work towards a common goal or benefit.
This concept deviates slightly from crowdsourcing in the strict sense, where there is
a “requester” and “workers”; here the requester is the general public, and the
workers are the community of software developers.

4.2 Business Intelligence and the Data Warehouse 2.0

We noted in the previous section that the cloud has enabled companies
to perform data analytics. This is due to the fact, as detailed in Chap. 2, that
computing power can be obtained from the cloud on-demand and in particular, in
arbitrarily large amounts. Moreover, numerous analytical tools are nowadays pro-
vided via the cloud, either for free or for a fee, to which everybody with an Internet
connection has access.
We have mentioned in Sect. 2.3 already that data warehouses emerged in the
1990s as a way to exploit the increasing amounts of data that digital businesses as
well as customers produce. We also mentioned that a data warehouse can be viewed
as a separate database, distinct from the operational one(s), that collects and inte-
grates data from a variety of operational sources and makes that data available to
(typically compute-intensive) analytical tasks. Enterprises commonly move trans-
actional data to the warehouse, where it is sent through extraction, transformation,
and loading (ETL), and ultimately made available to online analytical processing
(OLAP) or data mining applications. We also saw in Sect. 2.3 the “classical” data
warehouse architecture, which, in a bottom-up fashion, distinguishes operational
database systems and a staging area for ETL from the actual data warehouse core,
which is topped by a layer of analytical tools. We now consider this as the first
generation and term it “data warehousing 1.0.” In this section, we indicate that the
architecture of a data warehouse is now significantly more flexible and is capable of
incorporating a variety of additional sources and services. We look at the organi-
zational dimension of big data and consider the situation where a company or
institution wants to make use of it. What does it take to do so, and what needs to
change if the company has previously set up a data warehouse for data analytics
purposes? In particular, we briefly look again at strategy development and then
present a modification of the “classical” data warehouse architecture that is intended
to accommodate big data requirements.

9 www.moodle.org/?lang=en_us

4.2.1 Data Mining

Before we delve into recent developments that are replacing the classical data
warehouse, we take a quick look at one of its major applications, data mining. We
will not discuss data mining in its full generality, but mainly focus on one particular
application, association rule mining, in order to provide a summary of what is
achievable and how it can be approached.
Data mining is concerned with the algorithmic extraction of knowledge from
large data collections that is interesting for or relevant to a company or its appli-
cations. The important characteristic is that from the outset, it is often not exactly
clear what is being looked for. The information to be extracted from given data can
consist of patterns, associations, or relationships (like in association rule mining),
rules, conditional statements, classifications, clusters, time-series developments,
and various other outcomes. Typical usages of mining results were originally just
seen in areas such as marketing or customer relationship management and have
since been extended to business applications such as advertising or recommendations
as well as to a variety of non-business areas. In CRM, the goal is often
to compose a customer profile or a customer 360-degree view, or, as a manager
from a large online retailer once put it: “Help people find stuff they didn’t know
they wanted.” Typical questions related to the already discussed customer journey
that trigger data mining include: Who buys what? Who is interested in what? To
whom can I recommend a new product, and which products can I recommend
following a recent purchase? Which products are typically bought following the
purchase of another product?
When dealing with a social network, the interest might be in which communities
exist and how to find and address them.
The Data Mining Process
The basic data mining process is shown in Fig. 4.7: It starts with given input data,
which may be internal to the respective company or stem from external sources.
From this data, a selection of relevant portions has to be made, and the selected data
may need to undergo preprocessing or preparation (e.g., cleansing, reduction,
curation). Then a mining function, to be applied to the resulting data, is chosen and
executed. The mining results obtained are subject to interpretation and usage, but
may also give rise to an iteration of the process with new or more data or simply
with a different selected data set.
This general process, which has been in use for quite some time, albeit with
variations, has since been refined into the CRISP-DM methodology. CRISP-
DM stands for Cross-Industry Standard Process for Data Mining10 and essentially
is a hierarchical process model consisting of sets of tasks described at four levels of
abstraction, which top to bottom describe various phases, then generic tasks of each
phase, then specialized tasks, and finally process instances. There are six phases
altogether, shown in Fig. 4.8, which go beyond what was shown in Fig. 4.7 in that
CRISP-DM considers the business context and reflects the fact that statistical

10 crisp-dm.eu/home/crisp-dm-methodology/

models often need to be developed as a step in data mining, in particular when it
comes to functionality such as clustering or classification.

Fig. 4.7 General data mining process: from the input data, relevant portions are selected and prepared; a mining function is selected and executed; the mining results undergo interpretation and evaluation, potentially leading to an iteration with new data
Association Rule Mining
We now look at association rule mining, where the goal is to analyze customer
transactions, find regularities or rules in them that hint at items being associated,
e.g., often bought together, and from that derive conclusions as to what else these
customers might be interested in. We use an example from after-market sales
for a particular type of sports car, where parts people purchase, once they have
driven their car for a while, might be taken from the following small “catalogue”:

Part-ID Description
S Spoiler
R Radiator cap
C Cold air intake
PC Performance camshaft
F Fuel rail cover
O Oil dipstick
PF Performance oil filter
St Strut tower cap
T Tuning programmer

Fig. 4.8 Phases of the CRISP-DM methodology: business understanding, data understanding, data preparation, modeling, evaluation, and deployment, all revolving around the data. Source www.crisp-dm.eu/reference-model/

Some parts are purchased in isolation, some together, and sometimes parts are
acquired after earlier purchases. We just look at the second case here. The parts
dealer records his customer transactions, which are as follows:

Transaction ID Parts purchased


1 R, O, PF, St
2 St, R
3 PC, T
4 C, T
5 PC, C, T
6 St, R, O, PF
7 S, St, R, O, C
8 S, O
9 C, PC, F
10 S, R, T, St
11 R, C, T

The basic idea of association rule mining is to discover rules for items that are
frequently purchased together. In our example, we see that the strut tower cap
(St) and radiator cap (R) appear together in several transactions, which might
indicate that the respective customers are interested in engine bay dress-ups. More
formally, a rule will be an expression of the form L → R, where L and R are sets of
items and the arrow indicates that if the items in L are bought, then there is a good
probability that the items in R are also bought in the same transaction. For example,
{St} → {R} means that with some probability, if someone buys a strut tower cap,
he or she will buy a radiator cap as well.
For ease of explanation, we consider a simple transaction database like the one
just shown which records a transaction ID and the set of items that were purchased
within that transaction. Note that we thus abstract from a number of further details,
such as the number of items of each type, the unit price for each item, the overall
price for the transaction (i.e., we cannot really determine the “customer value”), or
who was actually the customer. Such information can give rise to refined proce-
dures; we focus here on a simple version to convey the principle.
The algorithm we are going to describe is based on counting the frequency of
individual items within transactions, then pairs of items, then triples of items and so
on. This is done relative to a predetermined threshold, the (minimum) support of an
item set X, which is the (minimum) percentage of transactions that contain X. For
our example, we assume a required minimum support of 0.4, i.e., we are only
interested in combinations of items that occur in at least 40% of all transactions.
The algorithm proceeds in a stepwise fashion and considers larger sets of items
in each step. In each iteration, the support of all candidate item sets considered is
first determined; then those are eliminated which do not meet the required minimum
support, and the others are considered “frequent.” The remaining sets form the input
for the next iteration. For singleton sets, we obtain the following:

Part # of occurrences Support (%)


C 5 45
F 1 9
O 4 36
PC 3 27
PF 2 18
R 6 55
S 3 27
St 5 45
T 5 45

Given a required minimum support of 0.4, we can see that only parts C, R, St,
and T qualify for further consideration, since these are the only parts occurring in at
least 40% or 5 of the 11 transactions.

Since we are interested in rules of the form X → Y, we need to look for larger
sets and hence iterate the process with the following candidate sets containing two
items each:

Candidate Set # of occurrences Support (%)


C, R 2 18
C, St 1 9
C, T 3 27
R, St 5 45
R, T 2 18
St, T 1 9

Obviously, only the set {R, St} meets the required support of 40%, so we cannot
form larger item sets, and the only rules we can form in this case are R → St and
St → R. Now the question is: Which of these two is significant? To evaluate this,
we employ a second measure such as the confidence of a rule of the form X → Y,
which refers to the percentage of transactions that contain Y, provided they contain
X. For R → St, we find that 5 of the 6 transactions that contain R also contain St, so
we can calculate a confidence of 5/6 or 83%. For St → R, all transactions that
contain St also contain R, so for this rule the confidence is even 100%. If we were
given a required minimum confidence of, say, 90%, the first candidate rule would
be dropped, while the second rule would prevail.
The car parts shop could now try to exploit this result in a variety of ways.
Firstly they could raise the price of radiator caps, since with high probability, if
someone buys a strut tower cap, he or she will also purchase a radiator cap (and
probably care little for a higher price). They could place the parts near each other at
the shop, since people shopping for St will most likely also buy R. Alternatively
they could place them far apart, so that if someone shops for St, he or she will come
across a number of other parts when going for R, etc.
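The computation just carried out by hand can be reproduced with a few lines of code. The following is a deliberately simple, unoptimized sketch of the level-wise counting idea together with the support and confidence measures; it is meant to illustrate the principle, not to serve as production-quality mining code.

from itertools import combinations

# The eleven transactions of the car parts example.
transactions = [
    {"R", "O", "PF", "St"}, {"St", "R"}, {"PC", "T"}, {"C", "T"},
    {"PC", "C", "T"}, {"St", "R", "O", "PF"}, {"S", "St", "R", "O", "C"},
    {"S", "O"}, {"C", "PC", "F"}, {"S", "R", "T", "St"}, {"R", "C", "T"},
]

def support(itemset):
    # Fraction of transactions that contain every item of the given set.
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def frequent_itemsets(min_support):
    # Level-wise search: frequent singletons, then pairs, then triples, and so on.
    current = [frozenset([item]) for item in {i for t in transactions for i in t}]
    frequent, size = [], 1
    while current:
        current = [s for s in current if support(s) >= min_support]
        frequent.extend(current)
        size += 1
        # Next-level candidates are built only from items of the surviving sets
        # (a simplification of the full Apriori candidate generation).
        surviving_items = {item for s in current for item in s}
        current = [frozenset(c) for c in combinations(sorted(surviving_items), size)]
    return frequent

def rules(min_support, min_confidence):
    # Derive rules X -> Y from every frequent itemset and keep the confident ones.
    result = []
    for itemset in frequent_itemsets(min_support):
        for k in range(1, len(itemset)):
            for left in map(frozenset, combinations(itemset, k)):
                confidence = support(itemset) / support(left)
                if confidence >= min_confidence:
                    result.append((set(left), set(itemset - left), round(confidence, 2)))
    return result

print(rules(min_support=0.4, min_confidence=0.9))   # [({'St'}, {'R'}, 1.0)]

Re-running rules with min_support=0.25 and min_confidence=0.75 reproduces the additional rules derived next.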
Note that the choices of minimum support and minimum confidence are crucial
for what the algorithm we have just sketched will produce. Suppose, for example,
that we lower the required minimum support to 25%; then all singleton sets except
for F and PF are frequent, and we obtain more frequent item sets of size 2:

Frequent item set # of occurrences Support (%)


C, T 3 27
O, R 3 27
O, St 3 27
R, St 5 45

With these, we can now form candidate set {O, R, St} of size 3 and obtain the
following potential rules:

Rule candidate # of occurrences Confidence (%)


O, R → St 3 out of 3 100
O → R, St 3 out of 4 75
St → O, R 3 out of 5 60
R, St → O 3 out of 5 60
O, St → R 3 out of 3 100
R → O, St 3 out of 6 50

If the required minimum confidence is now, say, 75% instead of the previous
90%, we have found three new association rules involving 3 items instead of just 2,
which are obviously more interesting than the previous ones involving just 2 items
(although these are still valid, as are now the rules O → R, O → St, R → St, and
St → R).
The algorithm we have just described by way of an example is generally known
as the Apriori algorithm. It utilizes the “Apriori principle,” a monotonicity property
of frequent sets, which states that all subsets of a frequent set are frequent as well
(or, conversely, any superset of an infrequent set must also be infrequent). This
property allows the algorithm, which as we saw proceeds by determining frequent
sets of size 1, then of size 2, then of size 3 and so on until no more frequent sets can
be found, to prune candidate sets early, as we have seen in the example above. On
other hand, the basic algorithm gives rise to a number of improvements, which have
been suggested over the years:

1. For every new size of an item set, it makes a complete scan over the given
transactions, whereas a limited (and small) number of scans can be shown to
suffice.
2. The number of rules the algorithm produces as output can be very large, in
particular when the required minimum support value is low; in extreme cases it
can even exceed the number of transactions given as input. This can be com-
pensated, for example, by limiting the length of a rule (or of either of its sides)
or by considering “condensed” representations of frequent item sets that sum-
marize their essential characteristics.
3. The confidence measure can easily be shown to be not the best for evaluating
rules, which is why measures such as lift or correlation, as well as many others,
have been suggested.

We also note that the Apriori algorithm is based on the generation of candidates
(for frequent item sets as well as for rules), which can actually be avoided, for
instance by sampling techniques. For all of these considerations, we refer the reader
to textbooks listed in the “Further Reading” section for this chapter.
Classification and Clustering
Association rule mining has become very popular as a data mining functionality,
yet it is certainly not the only one. We briefly look at two others, classification and
clustering, which have wide applications. While association rule mining is generally
a descriptive technique, classification is a predictive one (and clustering is again
descriptive).
In brief, classification is the task of predicting a class label for a given data item
or data point. Typical applications for classification include credit approval, target
marketing, or medical diagnosis, and typical models employed are classification
rules, decision trees, or statistical approaches (e.g., Bayes classifier). Given a
database, D, of data items and a set of classes C = {C1, …, Cm}, the classification
problem is to define a mapping f: D → C, such that each element of D is assigned
to one class. For example, if the “items” are customers and the classes are A, B, and
C (depending on how much turnover each is generating), we get a classification of
the customer base into excellent (A), good (B) and reasonable (C). The approach to
obtain this classification for a large base is to start with a training sample (which
may be chosen using an expert’s knowledge) and then apply the model thereby
created to new data as it comes in.
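One possible minimal illustration of this approach, assuming the scikit-learn library is available, trains a decision tree on a small, invented sample of labeled customers and then applies it to new ones; the features, labels, and figures are hypothetical.

from sklearn.tree import DecisionTreeClassifier

# Invented training sample: [annual turnover in kEUR, orders per year] per customer,
# labeled by an expert with the classes A (excellent), B (good), and C (reasonable).
X_train = [[500, 40], [420, 35], [120, 12], [150, 18], [30, 3], [20, 5]]
y_train = ["A", "A", "B", "B", "C", "C"]

model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)

# Apply the learned mapping f: D -> C to new customers as their data comes in.
new_customers = [[300, 25], [25, 2]]
print(model.predict(new_customers))   # predicted class label for each new customer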
While in classification the target classes are predefined (which “supervises” the
actual process), in clustering the output clusters are created on the fly. Here the goal
is to start from a given data set, and group data items that are “close” or “similar”
into the same cluster in such a way that for each cluster there is a representative
point that summarizes the cluster. For example, the boss of the aftermarket auto
parts company we learnt about above has five sales reps, and now would like to
organize all customers of the company into five distinct groups so that each can be
assigned a different sales rep. Clearly, customers in each group are expected to have
similar interests when it comes to auto parts, and two customers with very different
interests or buying patterns should not be in the same group. The remedy here is to
use a clustering technique that is based on a suitable similarity measure or good
partitioning criteria; numerous options include well-accepted approaches like
k-means or DBSCAN.
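A minimal sketch of this idea, again assuming scikit-learn and using invented customer data, could look as follows; the cluster index assigned to each customer would correspond to one of the five sales reps.

import numpy as np
from sklearn.cluster import KMeans

# Invented data: 200 customers described by their spending in three part categories.
rng = np.random.default_rng(0)
customers = rng.random((200, 3)) * 1000

kmeans = KMeans(n_clusters=5, n_init=10, random_state=0)
labels = kmeans.fit_predict(customers)        # cluster index 0..4 per customer = assigned sales rep
print(np.bincount(labels))                    # how many customers each sales rep would get
print(kmeans.cluster_centers_.round(1))       # one representative point per cluster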
A challenge arising in clustering is outlier detection. Outliers are data objects
whose behavior deviates considerably from the general expectation, and their
detection has many applications, for example, in fraud detection, e.g., in medical
care, public safety, industry damage detection, credit card fraud, or intrusion
detection. For example, an outlier in a dataset containing personal data focusing on
age, is one for people more than 100 years old (an age that most people do not
reach). Again, there are many established techniques for detecting outliers.

Fig. 4.9 Data warehouse "2.0" architecture enhanced for big data processing: internal and external data sources (operational databases, static and dynamic sources, streams, logs, HDFS) feed a staging area (ETL: selection, extraction, cleansing, transformation, loading/updating) as well as a Hadoop-based warehouse extension (HDFS, map/reduce); the data warehouse core (data basis, metadata, OLAP, warehouse server) and data marts are accessed via web-based tools for planning, dashboards, standard reports, spreadsheets, ad hoc queries, and data mining, complemented by sentiment analysis, social network analysis, recommendation, and search
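Staying with the age example, one very simple such technique is a z-score criterion: flag every value that lies more than a chosen number of standard deviations away from the mean. The data below is made up for illustration.

import statistics

ages = [23, 31, 45, 38, 52, 61, 29, 47, 35, 44, 58, 33, 27, 118]

mean = statistics.mean(ages)
stdev = statistics.stdev(ages)

# Flag every age that deviates from the mean by more than two standard deviations.
outliers = [a for a in ages if abs(a - mean) / stdev > 2]
print(outliers)   # [118]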

4.2.2 Strategy Development for Big Data Exploitation

Data mining became one of the most prominent and widely used applications
of data warehouses in the late 1990s and early 2000s, in particular when a
warehouse had been configured into data marts that are selected for specific usages.
On the other hand, the software industry discovered long ago that data mining
“packages”11 can be used even without a data warehouse, for example by just
putting them atop an operational system such as a relational database. This
development has continued in recent years, with digitization becoming ubiquitous
and data volumes rising faster and faster, and many software developers do not
explicitly rely on a data warehouse anymore, some not even on a database.
It is generally straightforward to extend the basic data warehouse architecture,
which we have seen in Sect. 2.3 (cf. Fig. 2.11 in Chap. 2); the same architecture
can also be recognized from the right half of Fig. 4.9, yet the figure also indicates
how to extend a traditional data warehouse architecture for big data. Indeed, what is
new in this figure is a wider selection of external data sources than typically
considered and the extension by a map-reduce engine such as Hadoop (or other
components from the Hadoop “ecosystem”) on the left side. Various ways of
11 Like the ones listed at www.predictiveanalyticstoday.com/top-free-data-mining-software/

communication need to be made available between these old and new building
blocks, but in the end, the set-up might look as shown in the figure.
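To give an impression of what the map/reduce building block added on the left of Fig. 4.9 does with raw data such as web-server logs, the following sketch simulates the two phases locally in plain Python; in a real deployment the same map and reduce logic would run distributed over HDFS data (e.g., via Hadoop Streaming), and the log lines shown are invented.

from collections import defaultdict

# Invented web-server log lines: timestamp, HTTP method, URL, status code.
log_lines = [
    "2017-03-01 10:15:02 GET /products/spoiler 200",
    "2017-03-01 10:15:07 GET /products/radiator-cap 200",
    "2017-03-01 10:16:12 GET /products/spoiler 200",
]

def map_phase(line):
    # Emit a (key, value) pair per record; here: (requested URL, 1).
    url = line.split()[3]
    yield url, 1

def reduce_phase(pairs):
    # Aggregate all values per key; here: sum the hit counts.
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)

intermediate = [pair for line in log_lines for pair in map_phase(line)]
print(reduce_phase(intermediate))   # {'/products/spoiler': 2, '/products/radiator-cap': 1}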
When it comes to big data analytics, it is no longer necessary to “think” just in
terms of a data warehouse or an extension of it. We therefore depart from the
assumption that data analytics requires an underlying data warehouse and will
consider more general architectures next. To this end, we will first adapt our general
IT strategy we have seen earlier in this chapter (cf. Fig. 4.1) to the big data case and
assume that the typical CIO needs to decide whether to invest in additional tech-
nology for these new developments. Once again, a SWOT analysis can help, which
may be able to reveal the strengths, weaknesses, opportunities, and threats of the
big data project envisioned. Another tool that could be employed in
decision-making is context analysis, which looks at objectives, benefits, and the
general context and environment into which a project should fit.
We here adapt our generic IT strategy, which we have already used in cus-
tomized form for the cloud adoption case, to a big data decision scenario. Slightly
different from before, we assume that a company whose core business is not in the
IT domain has decided to step into the big data area, and consider the strategy
shown in Fig. 4.10. It starts with information gathering and planning, which could
involve either a SWOT analysis or a context analysis or both. If a decision is made
in favor of a big-data project or of a general adoption of big-data technology,
relevant data sources need to be selected. In an enterprise this could be a variety of
in-house sources, e.g., databases, but could also be a variety of external sources,
e.g., from the Web, which may provide relevant data for free or for a cost.
The second phase includes a selection of the technology to be employed, e.g.,
the selection of a specific Hadoop implementation. Then detailed planning and
implementation can take place. Finally, the system or project is in operation and
may need regular or ad hoc maintenance.
Our next goal is to indicate what a software selection (or its result) might look
like for specific big data use cases.

4.2.3 From Big Data to Smart Data

As previously mentioned, running business intelligence and analytics applications no


longer explicitly requires the existence of a data warehouse. Many tools are nowadays
available that can be operated as add-ons to the operational systems and databases that
an enterprise is already running. In this case, an explicit architecture design is not
required, and the same remarks apply to big data applications. However, experience
shows that strategy as well as architectural considerations, in particular when they are
well documented, can help an enterprise prevent project failures.
In the following subsection, we discuss several applications relevant to busi-
nesses and present either their potential sequence of steps or a tool setup or both.
These applications include text analysis, in particular sentiment analysis, as well as
several others.

1 Preparation & Planning
• Specification of analysis targets
• Identification of business goals and benefits
• Identification of technical and legal requirements
• Selection of data sources and team members

2 Software Selection
• Development of criteria a software solution has to meet
• Analysis and comparison of suitable software packages and their functionality
• Decision on a specific provider

3 Detailed Planning
• Detailed planning according to the various requirements
• Preparing organizational changes

4 Implementation
• Installation and solution implementation
• Execution of analyses
• Interpretation and usage of results

5 Operation
• Regular security updates & maintenance
• Observation of market situation for better offers
• Extension of the software as new requirements arise

Fig. 4.10 Big data strategy

Analysis of Text. Concepts and Tools for Sentiment Analysis


With the ubiquitous presence and usage of social networks and public blogs,
companies are very much interested in what people are posting about them or their
products. As we mentioned in the context of blogs in Chap. 1, blogs require a
moderator who is able to decide whether to publish new comments to a post, since
otherwise all kinds of negative or inappropriate comments may accumulate. While a
blog is often under the control of a particular enterprise, the same is no longer true
for public blogs (see, for example, www.inkybee.com/top-60-pr-blogs/ for a listing
of 60 top PR blogs in the world) and for social networks, where people may post
anything on their own wall and in particular outside a company’s representation.
Since posts are unstructured text, tools for text analysis are required in order to
determine their meaning, their topic, or their sentiment to find out about people’s
opinion. To this end, Fig. 4.11 shows a general methodology for text analysis that is
divided into three layers.
The input layer consists of a collection of text documents, e.g., blog entries that
have been collected from the Web, or single documents to be analyzed. Typically,
the input undergoes various forms of preprocessing. For example, stop words (and,
or, not, the, etc.) are eliminated, and words are reduced to their word stem. This can
be done through matching words in the document against a specially compiled stop
word list. The set of common stop words that is often used in this context goes back
to Lewis (1992) and includes prepositions, auxiliary words, etc. When preprocessing
is finished, a number of options are available for analyzing the result, which range
from collecting basic statistics about a given text or a collection to opinion mining,
as shown in Fig. 4.11.

Fig. 4.11 General text analysis procedure: the input (a text document collection or a single document) undergoes preprocessing (filtering, stop word removal, stemming, pruning, etc.); techniques of increasing complexity (basic statistical computations, information extraction, topic discovery, clustering, categorization, summarization, sentiment analysis) yield basic statistics, entities and relationships, main topics, document clusters, documents by category, summaries, and opinions
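The preprocessing step described before Fig. 4.11 can be illustrated with a small sketch; the stop word list and the crude suffix-stripping "stemmer" used here are deliberately naive stand-ins for a compiled stop word list and a proper stemming algorithm such as Porter's.

import re

STOP_WORDS = {"and", "or", "not", "the", "a", "an", "is", "are", "was", "to", "of", "it"}

def naive_stem(word):
    # Very crude suffix stripping, for illustration only.
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    tokens = re.findall(r"[a-z]+", text.lower())           # tokenize and normalize case
    tokens = [t for t in tokens if t not in STOP_WORDS]    # stop word removal
    return [naive_stem(t) for t in tokens]                 # stemming

print(preprocess("The spoiler was shipped quickly and the mounting instructions are excellent"))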
Depending on the input data, different methods within each processing technique
are applicable. The middle layer of Fig. 4.11 shows possible such techniques; their
respective outputs are shown at the bottom of the figure. In general, output data is
arranged here in the order of increasing complexity (from left to right), i.e., the
effort grows with more complex tasks. The figure also shows that results of pre-
vious techniques can be used in subsequent steps. For example, an analytics process
might start with gaining some basic statistics knowledge about the given text; the
process might proceed with extraction of entities from the text, which is about
identifying (and extracting) structured information from the generally unstructured
text document. The result is knowledge about the main entities occurring in the
document(s) and their relationships.
Next, the main topics of the text can be discovered by executing a topic mod-
eling algorithm, which is essentially a statistical approach aimed at discovering the
core topics in a document corpus. Topic modeling algorithms aim at automatically
revealing and annotating document collections with their topics; see, for example,
Blei (2012). In case of input data being a collection of documents, their structure
might be explored by clustering and categorization techniques. Finally, more
recently developed techniques of text summarization and sentiment analysis might
be applied in order to obtain a brief summary of the document(s) and see if there are
any sentiments present in the text.

The techniques presented in Fig. 4.11 can be applied independently of each


other or in combination, and also sequentially or in parallel. The figure shows
several possibilities for connections, such as going from topic discovery to cate-
gorization or from clustering to summarization. Indeed text summarization can use
methods from text clustering, categorization and information extraction techniques.
Text categorization and clustering methods can both directly be applied to the task
of text summarization. Both techniques can help build extractive summaries.
Text categorization is applied to construct extracts, by identifying the most
important sentences and by assigning importance to sentences. Text clustering
follows a different approach: It groups the most similar sentences and extracts one
sentence representing each group, combining them into a summary. Finally, sen-
timent analysis techniques engage entity recognition in order to extract certain
aspects of the entity about which an opinion is sought.
Sentiment classification, which is an important subtask of sentiment analysis,
can be viewed as a special case of text categorization. However, it is considered
harder because sentiments are less tangible than topics.
A more specialized architecture for opinion mining applications is shown in
Fig. 4.12. It is divided into three modules: collection, analysis, and presentation.
These modules are connected through a process flowing downward from the col-
lection module to the presentation module. The process flowing through the anal-
ysis module (i.e., the analysis process) is divided into three sub-processes:
preprocessing, extraction, and post processing.
The process is executed as follows: The data collection module retrieves doc-
uments from a set of sources. These sources are either predefined or discovered
through a crawling process. A data import module will most likely include a
crawling engine to fetch Web pages or a mechanism that utilizes provided APIs to
gather opinionated content. Documents fetched will be passed to the analysis
process, which in turn consists of a number of different processing steps. Extraction
steps common in opinion mining applications are opinion holder extraction, subject
and subject-feature extraction, metadata extraction, and sentiment orientation
extraction (sentiment classifier). Diversions from the process flow shown in
Fig. 4.12 are possible however: Some preprocessing steps may be dependent on
certain extracted data or classifier outputs, like the subjectivity filter or the subject
filter, creating an optional feedback into the preprocessing process.
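A heavily simplified stand-in for the analysis module, reduced to lexicon-based sentiment scoring and aggregation per subject, could look as follows; the posts and the tiny lexicon are invented, and a real application would use trained classifiers together with the extraction steps listed above.

import re
from collections import defaultdict

# Tiny illustrative sentiment lexicon (word -> polarity).
LEXICON = {"great": 1, "love": 1, "excellent": 1, "poor": -1, "terrible": -1, "broken": -1}

posts = [
    {"subject": "spoiler", "text": "Great fit and excellent finish, love it"},
    {"subject": "spoiler", "text": "Arrived broken, terrible packaging"},
    {"subject": "tuning programmer", "text": "Poor documentation"},
]

def sentiment(text):
    # Sum of lexicon scores over the words of a post (positive = favourable).
    return sum(LEXICON.get(word, 0) for word in re.findall(r"[a-z]+", text.lower()))

scores = defaultdict(list)
for post in posts:
    scores[post["subject"]].append(sentiment(post["text"]))

for subject, values in scores.items():
    print(subject, sum(values) / len(values))   # average sentiment per subject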
As we have indicated above, an application such as sentiment analysis no longer
requires a data warehouse as numerous components are nowadays available which
can serve the purpose at hand without serious overheads. These components come
either stand-alone or packaged into one of the major (according to market presence)
Hadoop distributions from Cloudera,12 Hortonworks,13 or MapR14; the latter in
2016 commonly included the following:

12 www.cloudera.com/
13 hortonworks.com/
14 www.mapr.com/

Fig. 4.12 Specialized architecture for opinion mining

• Data warehouse infrastructure: Apache Hive


• High-level data analysis: Apache Pig
• In-memory- and stream-processing: Apache Spark
• Machine learning: Apache Mahout
• Distributed database: Apache HBase
• Database integration: Apache Sqoop
• Stream data collection: Apache Flume
• Workflow scheduling: Apache Oozie
• Cluster coordination: Apache ZooKeeper
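To give an impression of how such components are used, the following is a hypothetical PySpark sketch (Apache Spark's Python API) that averages a lexicon-based sentiment score per product over social media posts stored in HDFS; the file path, the JSON schema with product and text fields, and the miniature lexicon are all assumptions for illustration.

from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import DoubleType

spark = SparkSession.builder.appName("social-sentiment").getOrCreate()

# Hypothetical input: JSON posts of the form {"product": "...", "text": "..."} in HDFS.
posts = spark.read.json("hdfs:///data/social/posts/")

lexicon = {"great": 1.0, "love": 1.0, "poor": -1.0, "terrible": -1.0}   # illustrative only
score_word = F.udf(lambda w: lexicon.get(w, 0.0), DoubleType())

words = posts.select("product", F.explode(F.split(F.lower(F.col("text")), r"\s+")).alias("word"))
sentiment = (words.withColumn("score", score_word("word"))
                  .groupBy("product")
                  .agg(F.avg("score").alias("avg_sentiment")))

sentiment.orderBy(F.desc("avg_sentiment")).show()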

A sample setup for sentiment analysis using the Hortonworks framework, which
builds upon a variety of open-source components from the Apache ecosystem
(www.apache.org/), including Hadoop technologies such as HDFS and YARN as
well as Apache Storm, Apache Solr, Oozie, and Zookeeper, can be found at
www.hortonworks.com/hadoop-tutorial/nlp-sentiment-analysis-retailers-using-hdp-itc-infotech-radar/.
The use case behind the figure shown there is a brick-and-mortar or online retailer
interested in tracking social sentiment for each product as well as competitive
pricing or promotions being offered in social media and on the Web, for any number
of products in their portfolio. Using this, retailers can create continuous re-pricing
campaigns that can be implemented in real time in their pricing systems.

Fig. 4.13 Social media analytics framework: data tracking/collection (selection of data sources, a tracking approach such as keyword-based, actor-based, or random/explorative, tracking methods such as APIs or RSS/HTML parsing, and preprocessing of structured and unstructured data) feeds data analysis (content analysis/text mining for topic-, issue-, or trend-related questions, opinion mining/sentiment analysis, and social network/structural analysis, possibly in combination), whose summarized results end up in a social media dashboard
A more general setup for social media analytics is shown in Fig. 4.13, which
abstracts from particular components and restricts the attention to the necessary
type of functionality. The framework provides a general guideline for the devel-
opment of toolsets aiming at collecting, storing, monitoring, analyzing, and sum-
marizing user-generated content from social media. Although originally developed
with political content in mind, the framework is obviously general enough to serve
other purposes as well.
The framework in Fig. 4.13 has two major parts, data tracking/collection and
data analysis. Data tracking first selects appropriate data sources, and then distin-
guishes different approaches for selecting a relevant subset of data to be tracked,
e.g., keyword-based, actor-based (only track data of certain persons/groups), or a
random/explorative approach. Suitable tracking methods for each data source also
need to be determined (e.g., the Twitter API as tracking method for Twitter). The
collected data may be structured or unstructured, and later analysis steps may
benefit from data pre-processing, such as converting natural language text into
uniform terms. The second part of the framework comprises the actual analysis and

distinguishes three approaches: topic-/issue-/trend-related analysis, opinion/


sentiment-related analysis, and social network structure analysis (or a combina-
tion of these). Depending on the chosen approach, a number of concrete methods
are available.
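The tracking side of the framework can be pictured with a very small sketch in which a stream of invented posts is reduced to a relevant subset either by keywords or by actors before any analysis takes place.

posts = [
    {"author": "@citycouncil", "text": "New budget proposal published today"},
    {"author": "@jane", "text": "Traffic on Main Street is terrible again"},
    {"author": "@mayor", "text": "Join the town hall meeting on the budget"},
]

KEYWORDS = {"budget"}                      # keyword-based tracking
ACTORS = {"@citycouncil", "@mayor"}        # actor-based tracking (certain persons/groups only)

def keyword_based(stream):
    return [p for p in stream if any(k in p["text"].lower() for k in KEYWORDS)]

def actor_based(stream):
    return [p for p in stream if p["author"] in ACTORS]

print(len(keyword_based(posts)), "posts selected by keyword")
print(len(actor_based(posts)), "posts selected by actor")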
Customer Experience (CX) Analytics
As we discussed above, big data analytics today enables a variety of findings, since
it not only takes the (internal) data generated by transactions and structured inter-
actions into account, but also (external) data that is entirely unstructured or auto-
matically generated or streamed from a continuously or periodically sending source.
We have discussed various examples of how companies can exploit all this data,
and we have indicated what toolsets can be employed. However, as discussed in
Sect. 3.1 in connection with e-commerce, companies are not just interested in analyzing their customers in order to find out what else can be sold to them.
Instead, most companies today are more interested in providing a satisfying cus-
tomer experience (CX) or customer journey, terms that refer to all aspects of the
interaction between the respective organization or enterprise and a customer over
the entire duration of their relationship. As we saw in Chap. 3, CX is particularly
interested in creating positive experiences for a customer in such a way that an
emotional bond between user and product results. A primary goal of CX is to turn
happy customers into loyal ones, and to turn loyal customers into “brand ambas-
sadors.” Thus, CX not only considers consumer acceptance, sales volume or
intensity of usage, but also indirect effects, like word of mouth, reviews, evalua-
tions, or recommendations.
CX has received new attention in recent years because many products can easily be substituted. Typical examples are telecom or Internet providers, the car industry, or banking, where there is a host of offerings, an essentially saturated market, and differentiation via the price of the respective product is no longer an option. In such a situation, the customer experience has become a core competitive factor.
As a continuation of the discussion we started around Fig. 3.1 of Chap. 3, the
customer journey is commonly comprised of several phases that begin with a
certain need or awareness on the side of the prospective customer. There is the
phase of initial consideration of a certain product, during which the customer is
open to advertisements of various kinds, which when leading towards a certain
selection is followed by research, e.g., into reviews or recommendations. Once a
selection has been made, the customer might be looking for a good price for the
product, and ultimately makes a purchase. The product bought is then delivered and
goes into usage. Finally, the customer may be willing to share his or her experi-
ences with the product and purchase with others.
The customer journey “cycle” can be broken down into three distinct phases: pre-sales, sales, and after-sales, and each of these phases involves a distinctly different kind of interaction with the customer. Such interactions today take place through a variety of channels, including Web sites, postal mailings, advertisements, billboards, e-mail, telephone, online chat, social media, and maybe even video conferencing.

Fig. 4.14 What contributes to a customer profile (reviews, recommendations, evaluations, blog posts, a (new) need or awareness, click paths, search history and duration, transactional data, and the customer profile kept in the CRM system)

During the pre-sales phase, a company will typically try to attract a
customer to its products, and during after-sales, it is crucial to keep the customer
satisfied, while the actual sales step is often done by “self-service” (e.g., airline
tickets, bank transactions, etc.). The important point here is that each phase and
indeed each channel used during the lifecycle creates data in abundance, and this
data can serve as the basis for CX. Indeed, if that data is properly integrated,
filtered, and processed in such a way that a customer profile is created and that each
individual interaction with the customer contributes to that profile, a company will be able to maintain good relationships with its customers.
Figure 4.14 summarizes what typical data sources a company has available to create a customer profile. Typically, there will be a customer relationship management (CRM) system for keeping track of all customer interactions, and generally, this will be where customer profiles are also kept. However, while traditionally only internal sources, such as the transactional database, could contribute to a profile, a company nowadays has a host of other options, in particular through the availability of external data that can be collected (or bought, see below) from the Web. Importantly, a customer profile can be made the basis for predictions of what the customer might be interested in or even need next, an approach that Amazon has pioneered through its “customers who bought this also bought …” feature. Amazon today generates massive additional revenue by precisely analyzing what users have been searching for, how long they stay on the site, where they click (e.g., on particular positive or negative reviews), what they put on their wish lists, how often they return to the site, and in response sends them seemingly individualized recommendations.
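As a rough illustration of how such heterogeneous sources might be merged, the following Python sketch builds a simple customer profile from (internal) CRM and transactional data and (external) clickstream and review data. The field names and the aggregation logic are invented and would differ in any real CRM system.

```python
# Illustrative merging of data sources from Fig. 4.14 into one profile record.
from collections import defaultdict

def build_profile(customer_id, crm, transactions, clickstream, reviews):
    profile = {"customer_id": customer_id, "interests": defaultdict(int)}
    profile.update(crm.get(customer_id, {}))                     # master data from the CRM system
    profile["total_spend"] = sum(t["amount"] for t in transactions)
    for click in clickstream:                                    # click paths / search history
        profile["interests"][click["category"]] += 1
    profile["avg_review_score"] = (sum(r["stars"] for r in reviews) / len(reviews)
                                   if reviews else None)         # external, user-generated data
    return profile

profile = build_profile(
    "c-42",
    crm={"c-42": {"name": "Jane Doe", "segment": "retail"}},
    transactions=[{"amount": 59.90}, {"amount": 120.00}],
    clickstream=[{"category": "laptops"}, {"category": "laptops"}, {"category": "headphones"}],
    reviews=[{"stars": 4}, {"stars": 5}],
)
print(profile["total_spend"], dict(profile["interests"]), profile["avg_review_score"])
```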
Predictive Analytics and Industrial Data Analytics
The concept of prediction is not restricted to electronic commerce. While it may not
be easy to predict the next product a user is interested in purchasing, there are other
areas where prediction is much more reliable and potentially even more useful.
A prominent (and old) example is given by the aircraft industry. According to
Hunter and Eng (1975), “the ability to detect an impending failure in an aircraft
engine mechanical power system, at an early stage, where expensive and possibly
catastrophic system failures can be prevented, will provide enhanced aircraft safety
by minimizing the possibility of a serious engine failure. It will also prevent the
pilot from needing to shut down an engine during flight with all the attendant
emergency ramifications that can arise. This will also improve the utilization of the
aircraft, by the scope to plan unscheduled engine removals to suit the aircraft
downtime, and it will reduce the turnaround time and costs to effect the necessary
repairs; particularly if the skill of detection can also pinpoint the area of distress
with accuracy.”
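As a toy illustration of the underlying idea, not of an aviation-grade system, the following Python sketch trains a logistic-regression classifier on synthetic vibration and temperature readings to estimate the probability of an impending failure. The data, thresholds, and features are entirely made up.

```python
# Toy predictive-maintenance example on synthetic sensor data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 1000
vibration = rng.normal(1.0, 0.2, n)
temperature = rng.normal(80, 5, n)
# Hypothetical ground truth: failures become likely when both readings drift upward.
failure = ((vibration > 1.2) & (temperature > 85)).astype(int)

X = np.column_stack([vibration, temperature])
X_train, X_test, y_train, y_test = train_test_split(X, failure, test_size=0.3, random_state=0)

model = LogisticRegression().fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))
print("failure probability for a hot, strongly vibrating engine:",
      model.predict_proba([[1.4, 90.0]])[0, 1])
```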
What all these sample applications tell us is essentially two things: Firstly, data
today is abundant, and there are tools available (many of them for free) which allow
us to maximize the value of that data. Secondly, data alone is not enough; indeed
the goal is to turn raw data into something smart, or big data into smart data. In the
context of a research project of the German government, the following “formula” has even been stated, which goes far beyond the few aspects we have discussed here: “Smart Data = Big Data + Benefit + Semantics + Data Quality + Safety + Data Privacy.” In other words, big data only delivers the raw material, which needs to be appropriately processed and refined in order to deliver its full economic potential.

4.2.4 Next Up: Data Marketplaces and Ubiquitous Analytics

In times like these, when, according to Friedman (2016), acceleration is in full


swing and leads to faster and faster innovation and renewal cycles, it is difficult to predict what lies ahead and what will happen next. This is particularly difficult because new developments are often the result of some technical innovation (like the iPhone in 2007) which nobody could have predicted. To forecast the future, the best we can do is look closely at what is already underway.
As has happened with other goods in the past, when data becomes a commodity, we observe the emergence of (virtual) marketplaces for data, just as marketplaces were once created, say, for company stocks or electricity. The stock
market is characterized by the fact that it not only sells shares in companies, but
offers a variety of other products that may or may not be derived from the basic
stock. In a similar way, a data marketplace will offer raw data, say, on a certain
topic, but will also offer a variety of ways in which this data can be processed prior
to being sold. Different from the stock market, however, a data marketplace may be open to anyone, i.e., users can act as sellers or buyers or both.

Fig. 4.15 Concept of a data marketplace
Figure 4.15 shows the general concept of a data marketplace for integrating public Web data with other data sources. As with a data warehouse architecture, the setup includes components for data extraction, transformation, and loading, as well as meta-data repositories describing data and algorithms. In addition, the data marketplace offers interfaces for uploading data and methods for optimizing data, e.g., by employing operators with user-defined functionality, as well as components for trading and billing the usage of these operators. In return, the provider of a user-defined function receives a monetary compensation (indicated by the Euro symbol) from buyers. Moreover, in the case of large data volumes from the Web, the marketplace relies on a scalable infrastructure for processing and indexing data.
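The trading and billing idea can be illustrated with a small sketch: data providers and providers of user-defined functions (UDFs) register their offerings, and each query is billed to the buyer while the providers are credited. All class, data set, and price names below are invented and do not describe an existing marketplace.

```python
# Sketch of the trading/billing idea behind Fig. 4.15 (illustrative names only).
class DataMarketplace:
    def __init__(self):
        self.datasets, self.udfs, self.revenue = {}, {}, {}

    def register_dataset(self, name, rows, provider, price_per_query):
        self.datasets[name] = {"rows": rows, "provider": provider, "price": price_per_query}

    def register_udf(self, name, fn, provider, price_per_call):
        self.udfs[name] = {"fn": fn, "provider": provider, "price": price_per_call}

    def query(self, dataset, udf, buyer):
        ds, op = self.datasets[dataset], self.udfs[udf]
        # Bill the buyer by crediting both the data provider and the UDF provider.
        for party, price in ((ds["provider"], ds["price"]), (op["provider"], op["price"])):
            self.revenue[party] = self.revenue.get(party, 0) + price
        return [op["fn"](row) for row in ds["rows"]]

market = DataMarketplace()
market.register_dataset("city_temperatures", [{"city": "Münster", "temp_c": 18}], "provider_a", 0.10)
market.register_udf("to_fahrenheit", lambda r: {**r, "temp_f": r["temp_c"] * 9 / 5 + 32}, "provider_b", 0.05)
print(market.query("city_temperatures", "to_fahrenheit", buyer="analyst_1"))
print(market.revenue)
```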
For completeness, we mention that data nowadays need not only be obtained
from data marketplaces. Indeed, there are numerous sources of data on the Web
today; for example, www.linkedin.com/pulse/ten-sources-free-big-data-internet-alan-brown lists ten of them. Another place on the Web for getting hold of data
collections is the Kaggle platform,15 which can be seen as an analog to InnoCentive
for data. According to Wikipedia, “In 2010, Kaggle was founded as a platform for
predictive modelling and analytics competitions on which companies and
researchers post their data and statisticians and data miners from all over the world
compete to produce the best models. This crowdsourcing approach relies on the fact
that there are countless strategies that can be applied to any predictive modelling

15 www.kaggle.com/

task and it is impossible to know at the outset which technique or analyst will be
most effective. Kaggle also hosts recruiting competitions in which data scientists
compete for a chance to interview at leading data science companies like Facebook,
Winton Capital, and Walmart.” Everybody can participate in a Kaggle competition,
and many individuals as well as companies have used Kaggle competitions to train
or improve their data analytics capabilities.
Data-processing and data-analytics technology today is capable of handling huge amounts of data efficiently, which implies that there are large, primarily economic opportunities for exploiting this data. The notion of business intelligence, “invented” in the context of (early) data mining to describe the fact that businesses can improve or enhance their “intelligence” regarding customers and revenues by analyzing and “massaging” their data to discover the unknown, will now enter the next level. Indeed, the fact that more and more data is made available in digital form not only allows businesses to gain new insights; it also renders new discoveries possible in areas such as physics or healthcare, where the primary target is not necessarily of a business nature. So not only regarding business, big data can indeed be seen as the new intelligence enabler, since the broadness of data available today (not just its sheer size!) and the available technology enable us to perform analytics, to see connections, and to make predictions that were unthinkable only a short while ago. It has been predicted that ten years from now, every individual will be in a position to exploit digital information to his or her advantage in ways unforeseeable today.
While security breaches and data misuse have always been a challenge in computer science, even this reaches a new level with big data. The Website io9 (see www.io9.gizmodo.com/ or www.gizmodo.com.au) lists a number of ways in which big data is creating the “science fiction future”, among them that dating sites can predict when you are lying, that surveillance will become Orwellian, or that scientists, doctors, and insurers can make sense of your genome. We should hence be aware that big data does not just require the right technology, but also needs appropriate governance and protection, a topic also championed by the Free Software Foundation16 and its protagonists.

4.3 IT Consumerization, BYOD and COPE

We now turn to a different topic that is relevant for the modern enterprise. With
increasing market penetration of (mobile and) smart devices such as smartphones,
tablets, or laptops and owing to the ubiquitous (“always-on”) nature of these
devices, private applications such as social networks and e-mail will more and more
reside on the same device as corporate documents or applications such as company
spreadsheets or (interfaces to) proprietary software. Most commonly, both types of
services are used interchangeably in both business and private environments, e.g.,

16 www.fsf.org/

employees usually have a private and a corporate e-mail address, but both are
accessed through a common interface or even using the same e-mail application.
This observation is supported by the BYOD (“bring your own device”) develop-
ment, also known as IT consumerization (Castro-Leon 2014), where companies are
allowing their employees to use their personal devices at work or for work-related
purposes. Of the many benefits BYOD offers (both to organizations and their
employees), an increase in flexibility and efficiency as well as the ability to work at
anytime from anywhere are considered key (Morrow 2012). The underlying phi-
losophy of BYOD is in line with Mark Zuckerberg’s philosophy, implemented in
Facebook, that every person has only one identity (as opposed to a private and a
professional one). Indeed, in an interview with David Kirkpatrick for his book,
“The Facebook Effect,” Zuckerberg is cited as saying “The days of you having a
different image for your work friends or co-workers and for the other people you
know are probably coming to an end pretty quickly. Having two identities for
yourself is an example of a lack of integrity.” (Zimmer 2010). Even though this
statement is controversial from a privacy point of view, the lack of integrity is
particularly pertinent, especially from a technological perspective.
An alternative to BYOD is often termed COPE (Corporate Owned, Personally
Enabled). This is effectively BYOD in reverse: an organization provides a device for its employees yet allows it to be used for personal purposes. The key
benefit of COPE is that the owner (the organization) can maintain control over the
setup and security of the device, thereby limiting potential security breaches that
may occur. BYOD is different from what is commonly termed CYOD (“Choose
Your Own Device”), where an employee can choose from a (typically limited)
variety of devices offered, yet the device chosen remains company property. CYOD
is in fact very similar to COPE and both COPE and CYOD may or may not come
with restrictions regarding the selection of devices. We will not distinguish CYOD
and COPE in the remainder of this section, and we will generally assume that for
the latter private usage is accepted. COPE is more commonly discussed, so we will
follow suit.
For BYOD or COPE to be effective, various issues need to be addressed,
including but not limited to:

• Who owns the device(s)? An employee might be permitted to use a private


device for business-related purposes (BYOD), or the employer might provide
the device (COPE).
• How are access rights managed on the various applications that reside on the
same device, but are “owned” by different parties?
• How is a change of jobs or the termination of an employment handled?

By way of an example which is a reality in many parts of the world today, consider
the daily routine of a typical knowledge worker (e.g., a bank employee). While
having breakfast she wants to check on her private and business e-mails. Today, it
is very likely that she will do this on her smartphone, tablet, or on the rather new
combination “phablet” (portmanteau from phone and tablet) using two different

Web services, each with individual login. Also, a third and fourth application will
be needed for calendar and (quality) news, all of which with potentially different
credentials. In her office, after plugging her laptop into a docking station, she will
access proprietary banking software, the same e-mail and calendar services, only
via a different interface. She is likely to store some files on a company cloud storage
solution. Heading for a customer presentation she grabs her laptop again which is
obviously able to access the aforementioned cloud storage. Many interesting sce-
narios emerge from this setting: During a meeting, relevant company performance
figures can be accessed; on the way home a presentation can be finalized on the
train; during her lunch break she might take a quick look at photos from a relative’s
vacation; during a free moment, the remainder of last night’s movie can be watched.
While this may still be viewed as the realm of science fiction by some, it is only the
beginning of what will soon be everyday manifestations of our 24/7
hyper-connected world in which the distinction between private and professional
life is vastly blurred (Schmidt and Cohen 2013). In other words, people will soon be
living in their “personal” cloud, a term that was first mentioned in a 2011 Forrester
report and also picked up by the blog readwriteweb.com around the same time.
A study reported in BITKOM (2013), the German Federal Association for
Information Economy, Telecommunication and New Media (Bundesverband
Informationswirtschaft, Telekommunikation und Neue Medien e.V., abbreviated
BITKOM) shows that 43% of German IT and telecommunication companies allow
their employees to connect their own devices to the company network. Almost 60%
of these companies have established specific rules for this, 81% expect an increase
in employee satisfaction, 74% expect increased productivity, while roughly 40%
believe they will be perceived as a modern employer. On the other hand, 53% of the companies interviewed do not allow the use of private devices in the workplace, mostly
due to increased maintenance and security costs in the presence of a large variety of
devices, differing operating systems, with all kinds of application software installed.
We expect these figures to be similar in other parts of the Western world. Lance and
Schweigert (2013) examine BYOD projects at IBM, Cisco, Citrix, and Intel to
determine when and whether an implementation of this concept is successful.
Issues relating to BYOD/COPE can be categorized into legal, economical,
organizational, and technical issues, several of which will be elaborated upon in this
section; for an introduction to cultural and organizational impact, see Lofaro (2014).
We look at current practice which reveals solutions already in use, and we point out
research issues that need further study.

4.3.1 Device Ownership

BYOD eliminates the need to carry and use separate devices for private and work
purposes. As shown in Fig. 4.16, both BYOD (1) and COPE (2) have in common
that the total cost of usage is carried by one entity (company in case of COPE,
employee in case of BYOD). In reality, usage costs are likely to be shared such that
the cost of work usage is paid for by the employer in case of BYOD or the cost of
private usage is paid for by the individual (cases 4 and 3, respectively).

Fig. 4.16 Different forms of device ownership

In contrast
to these forms is the traditional model of separate devices for professional and
private usage (5). Whilst this is the most secure solution from a company per-
spective as the work device is controlled by it, it is also the most costly approach—
at least if total cost is examined—because two devices have to be purchased and
maintained. This is reversed with COPE or BYOD, where less institutional control results in a greater security risk, while the total cost is lower.
In terms of organizational processes, for each of these ownership models, it
becomes necessary to specify what concrete actions will be taken when a new
device is deployed or enabled. Furthermore, processes have to be defined that
describe the actions to be taken in case a device is stolen or lost, when updates need
to be physically deployed, or an employee leaves the company, as well as other
eventualities.

4.3.2 Access Control Through Mobile Device Management

It is eminently obvious that an appropriate form of security management is needed


when company data ends up on personal devices, especially since this data will
most likely be saved somewhere in the cloud and might get replicated around the
world. The other obvious risk relates to potential theft or loss of the device. Since
the trends that have led to the COPE model are unlikely to be reversed, there is a

need to introduce ways to control them, instead of simply ignoring them or even forbidding the usage of personal devices for work purposes, since such prohibitions are likely to result in unauthorized, and indeed risky, practices anyway.
A software solution to this problem is provided by Mobile Device Management
(MDM) such as that described by Liu et al. (2010), which offers centralized
management of all mobile devices that are used within an organizational context.
MDM typically comes with features such as the following:

• Software distribution via an in-house app store, installation and maintenance via
push apps
• Remote access for reading and setting possible configurations, push messages,
switch-off (remote lock and wipe)
• Inventory management for keeping track of hardware, software, licenses, patch
management
• Optional encrypted backup and restore
• “Containerization,” i.e., strict differentiation between private and professional
data and applications
• Protection against unauthorized access to enterprise services, jailbreak detection
• In general, conceptual issues include interfaces, SSO (single sign-on), architecture, and standards.

To realize these functionalities, an Enforcement Agent (basically a client app) is


deployed to the mobile device. Some of the latest devices and operating systems
also offer an MDM-API as an alternative. From that point onward, the device
communicates with an MDM server, which can be operated either on premise or as
SaaS (Software as a Service). Its main tasks are: monitoring of access, data storage,
and configuration of the device.
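The following sketch illustrates, in a highly simplified way, the kind of policy check an MDM server might perform on the status reported by an enforcement agent. The policy fields and the resulting actions (remote wipe of the corporate container, blocking access) are illustrative and not tied to any vendor's actual API.

```python
# Highly simplified, vendor-neutral sketch of MDM-style policy enforcement.
POLICY = {"min_os_version": (10, 0), "require_encryption": True, "allow_jailbroken": False}

def check_compliance(device):
    """Return the list of policy violations for a device status report."""
    violations = []
    if device["os_version"] < POLICY["min_os_version"]:
        violations.append("outdated OS")
    if POLICY["require_encryption"] and not device["encrypted"]:
        violations.append("storage not encrypted")
    if device["jailbroken"] and not POLICY["allow_jailbroken"]:
        violations.append("jailbreak detected")
    return violations

def handle_device(device):
    violations = check_compliance(device)
    if "jailbreak detected" in violations:
        # Containerization: only the corporate container is wiped, private data stays untouched.
        return "remote wipe of the corporate container"
    if violations:
        return "block access to enterprise services until fixed: " + ", ".join(violations)
    return "access granted"

print(handle_device({"os_version": (9, 3), "encrypted": True, "jailbroken": False}))
print(handle_device({"os_version": (11, 2), "encrypted": True, "jailbroken": True}))
```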
Despite the fact that this type of software is only just emerging, a number of
vendors exist. Forbes estimates about 80 vendors and Solutions Review has iden-
tified 20 main suppliers of MDM software (Solutions Review n.d.), including, but
not limited to the following:

• MobileIron specializes in mobility management and positions its software suite as a pure MDM solution. MobileIron has meanwhile deployed a portal that, in its own words, “provides an easy and customizable end user device registration experience and self-service device management.” (It can be reached at www.byodportal.com/.)
• AirWatch offers its MDM solution primarily in SaaS form; however, on-premise
installation is also possible. They emphasize scalability enabling them to
manage large numbers of devices (they report on cases where more than 40,000
devices are connected).
• Good Technology has a wider portfolio and offers a dedicated MDM solution as
well as software for mobile collaboration and enterprise communication.
• Fiberlink, now owned by IBM, offers a cloud-based MDM service called
MaaS360 and has a large number of partners aiming for a global reach.

• Citrix XenMobile is an MDM tool that was originally created by Zenprise and
that particularly emphasizes protection against data loss, besides offering a
variety of functions pertaining to the mobile cloud.

We refer the reader to Schomm and Vossen (2013) for further details.

4.3.3 Governance for Security and Privacy

While MDM is a viable solution and some features such as an in-house app store and SSO are widely demanded, it is particularly labor-intensive and restricts users in the way they can actually use their devices. In this section, we suggest alternative
solutions and also look at the problem from a conceptual point of view; the reader is
referred to Morrow (2012) and Miller et al. (2012) for an overview of relevant
security (employer side) and privacy (employee side) challenges.
Essentially, two scenarios need to be considered:

1. The company owns the device (COPE): In this case, the device needs different
ways of being accessed: a personal domain, a company section, and a combined
section, each with an individual user-chosen access code.
2. The user owns the device (BYOD): In case of a split, i.e., the user leaves the
company and is no longer granted access to company data or services, the
company disables access to any company application and the user returns the
SIM card of the device in question.

Clearly, the measures that need to be taken in either case vary slightly for different
types of devices such as smartphones, tablets, or laptops and for cellular versus
non-cellular devices. We ignore such differences in what follows.
Both scenarios can utilize any of the security measures discussed in the fol-
lowing subsections (security tokens, hardware keys, smart containers). Before we
elaborate on these issues, we note that an additional complication can arise
depending on whether or not data is replicated on a user’s device. Data replication
necessitates identical copies existing at a central company-owned repository as well
as on all devices that request access to that data. As is well-known, replica control,
or making sure that all copies remain identical at all times, is a non-trivial problem
(Weikum and Vossen 2002), yet replication has the obvious advantage that data remains available in the presence of network failures. Conversely, if keeping replicas is not
an option, data access depends on network availability. We consider replication a
purely technical issue and therefore do not consider it any further here.
Chow et al. (2009) outline the main issues from a general cloud computing
perspective without presenting actual solutions: (1) Trusted Computing: “authenticate” hardware systems; allow computers to prove their compliance with certain requirements; make sure hardware is not tampered with. Question: How to adapt for
virtual hardware in clouds? (2) Working on encrypted data: New approaches, such
as searchable encryption, homomorphic encryption or private information retrieval

can allow some operations, such as range queries, to be performed on ciphertext so


that the Cloud Service Provider (CSP) never has to see the cleartext at all. Ques-
tions: How to make this easy-to-use for both CSP and cloud user? How to convince
CSP to implement it and take on the additional computational effort? (3) Proofs of
retrievability: CSP can prove that all data is stored as requested by the client; the
client can be sure that provider actually stores data (and, e.g., not just pretends to be
storing it); it works without re-transmitting the original data, so it’s feasible even for
large data sets. Tian et al. (2009) present a security requirements analysis as well as
a security framework for personal clouds.
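To give a flavor of the third point, the following toy sketch illustrates the sentinel idea behind proofs of retrievability: before uploading, the client hides a few random sentinel blocks among the data and later spot-checks them. Real schemes additionally encrypt the data and use error-correcting codes so that the provider cannot distinguish sentinels from data, but the basic challenge-response flow is similar. All names and parameters below are illustrative.

```python
# Toy illustration of the sentinel idea behind proofs of retrievability.
import secrets

def prepare_upload(blocks, num_sentinels=4):
    """Insert random sentinel blocks; the client keeps their positions and values."""
    sentinels, data = {}, list(blocks)
    for _ in range(num_sentinels):
        pos = secrets.randbelow(len(data) + 1)
        value = secrets.token_hex(8)
        data.insert(pos, value)
        # Shift previously recorded sentinel positions that come after the new one.
        sentinels = {p + 1 if p >= pos else p: v for p, v in sentinels.items()}
        sentinels[pos] = value
    return data, sentinels            # data goes to the CSP, sentinels stay with the client

def audit(stored_data, sentinels):
    """Challenge: the provider must return the correct values at the secret positions."""
    return all(stored_data[p] == v for p, v in sentinels.items())

data, sentinels = prepare_upload(["block-%d" % i for i in range(10)])
print("honest provider passes audit:", audit(data, sentinels))
tampered = ["lost" for _ in data]     # a provider that silently discarded the data
print("cheating provider passes audit:", audit(tampered, sentinels))
```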

4.4 The Digital Workplace

As technology has become more sophisticated and ubiquitous in recent times,


opportunities for changes to the working environment have resulted in the emer-
gence of the digital workplace. The confluence of mobile technologies, enhanced
networking, and greater globalization has provided opportunities for new ways of
working that permit greater creativity, flexibility, and fun. These features of a digital
workplace are largely achievable through the integration of four technologies—mobile, big data, cloud computing, and search-based technologies—all with a focus on developing the mobile environment (White 2012).

4.4.1 Requirements of a Digital Workplace

As a term, the Digital Workplace is not new and was most likely first proposed by
Jeffrey Bier in around 2000. Bier had previously worked for Lotus Corporation as
general manager of the spreadsheet division. In 1996 he founded Instinctive
Technologies, later renamed as eRoom Technologies, based upon the work he had
undertaken at Lotus on collaborative technologies. Bier established five key
requirements of a digital workplace—all of which still hold today. These are
described later in this section.
There is no agreed upon definition of a digital workplace, in part because the
concept is still evolving. Marshall (2014) describes a virtual workplace as simply
being the virtual equivalent of the physical workplace, although it is far from as
simple as this. Marshall does accept that this is an imprecise definition and indeed
the shape of a digital workplace will vary between organizations. Key character-
istics of any definition are:

1. People focus. People are the central element of any digital business and the
overarching goal of any initiative is to develop a working environment that
allows people to work how they want, when they want, and—to a degree—where they want, whilst also enhancing productivity and efficiency.

2. Technology. It goes without saying that a digital workplace is reliant on tech-


nology. Indeed a workplace without enabling technology is not a digital
workplace at all. It is technology developments that are driving the movement
towards the digital workplace.
3. Management and design. Simply putting positively orientated workers into an
environment replete with contemporary technologies does not make a digital
workplace; certainly not a successful one. The final enabler of a successful digital
workplace is the organization-wide management of processes that align with the
people that will implement them and the technology that will be applied to them.

Digital workplace expert and consultant Jane McConnell presents a framework,


shown in Fig. 4.17, that encapsulates the key intersecting elements of an opera-
tional digital workplace. In simplistic terms, she characterizes a digital workplace as
the intersection of people, organization and tools/technology. She further defines
the digital workplaces as comprising three perspectives: capabilities, enablers and
mindset. Capabilities can be measured at the individual, core business, or
cross-organizational levels. Enablers include structures, processes and reach.
Finally mindset includes the values, expectations and ways of thinking that deter-
mine how people and organizations act and can be decomposed to include lead-
ership, culture, and strategic assets (McConnell 2016). The goals of the framework
are to enable management and practitioners to understand how people and orga-
nizational characteristics shape the digital workplace as much or more than tech-
nology, to enable organizations to self-assess their digital workplace maturity from
the broad perspective of the connected organization, and to provide a framework for
building a roadmap toward a digital workplace that plays an essential role in the
organization, and is therefore a strategic asset.

Fig. 4.17 Components of a digital workplace (mindset, capabilities, and enablers at the intersection of people, organization, and tools)



Fig. 4.18 Digital workplace services

Clearbox Consulting is a leading UK-based workplace consultancy. Run by Sam


Marshall, it helps companies manage intranets, internal social media, enterprise
mobile strategies and real-time collaboration tools. Clearbox describes how a digital
workplace provides an organization with five key services as shown in Fig. 4.18:

1. Communication and employee engagement


2. Collaboration
3. Finding and sharing information and knowledge
4. Business applications (process specific tools and self-service)
5. Agile working—the ability to be productive at any time and at any place.

However for these to work effectively, there are five management activities that
need to be carried out in a way that facilitates the provision of the services above:

1. Strategic planning
2. Governance and operational management
3. Proactive support for adoption
4. High quality user experience
5. Robust, secure and flexible technology.

Jeffrey Bier established five key requirements of a digital workplace. Whilst now
over 15 years old, these are still highly relevant to the working environment of
today. The requirements he proposed are as follows.

1. It must be comprehensible and have a minimal learning curve. If people are


required to learn a new tool, they are unlikely to use it, especially those people
outside the firewall or on the periphery of the organization. The way in which
employees need to interact with the digital workplace needs to be intuitive and
straightforward.
2. It has to be contagious. By this Bier refers to the willingness of employees to
want to tell others about the positive experience to be derived from using the
digital workplace. To achieve this, it must have clear benefits to all parties
involved including those that work remotely, and other organizations that are
forced to interact with these new workplaces. The workplace also has to be a
trusted place, thus secure, both for the individual and the companies involved. In
summary, people have to want to use it.
3. It must be cross-enterprise. The digital workplace will not be used just by
employees of the firm. It must span company boundaries and geographic
boundaries. It will need to be used outside the corporate “firewall” with cus-
tomers, suppliers and other stakeholders, and require very little IT expertise and
investment, or it negatively impacts important external relationships.
4. The workplace has to be complete. Partial adoption of a digital workplace is rarely successful. All activities, including decision-making, need to be
incorporated.
5. The digital workplace must be connected. If not, it will not gain acceptance.

4.4.2 Key Technologies

The key enabler of the digital workplace has been the rapid development of highly
functional collaborative technologies. Such technologies include those that have
been available for some years, such as e-mail and Intranets, along with the most
recent developments like big data and cloud computing. None of these in isolation
offer the capability to transform a traditional workplace into a fully-functioning
digital workplace, but as a suite of technologies, they permit much more radical
change. Deloitte summarizes the array of technologies that form the basis of the
technology infrastructure of the digital workplace. Messaging technologies such as
email, instant messaging, microblogging and SMS messaging provide a fast way for
colleagues to communicate. Productivity and efficiency may be enhanced through
the use of traditional IT software tools such as word processors, spreadsheets,
presentation software and the like. Collaboration technologies are used by all
employees to work with each other as well as external partners. Web conferencing
is a common tool of this type, along with Wikis, virtual communities and social
collaboration tools like Asana, Slack and Google Docs. Communication tools are a
key element of a digital workplace enabling information sharing and internal
publishing and can include portals and intranets, blogs as well as personalization of
websites. Self-service applications provide flexibility in working arrangements and
include HR systems, ERP and CRM applications. In some situations,

crowdsourcing is desirable, especially to facilitate brainstorming and idea genera-


tion from employees that may not always be physically present or at least at
common times. Such applications also include polling, surveying and forum
functionality. Connectivity is a key technology allowing digital connections to
occur within and outside of the organization. This includes both wired and wireless
infrastructure along with the requisite security and firewall software and hardware.
Finally, a variety of technologies are often adopted to permit employee mobility.
These include a range of mobile/smart devices, both general and specialized.

4.4.3 The Physical Space

For many, a digital workplace is often most easily conceptualized in the physical
sense. Commonly held views are of work spaces that are highly contemporary,
often sterile, open-plan, loaded with physical technology, yet sparsely populated as
most employees are connected remotely while working in far-flung, yet unknown
locations. While this may be the case for some digital workplaces, the reality for most is often very different; indeed, little uniformity exists, as the culture, industry, and physical location of the organization greatly impact how each digital
workplace actually looks and operates. Because of this variability, it is neither
possible nor useful to try and characterize the physical space of a digital workplace.
However three distinctly different styles are often observed. These include those
that have been designed with the principal goal of facilitating collaboration, those
designed to enable creativity, and finally workplaces that are intended to provide an
underlying sense of fun.
Collaboration
Many of the technologies employed within a digital workplace are intended to
facilitate internal and external collaboration. The likes of email, social collaboration tools like Asana or Slack, and cloud-based file sharing platforms such as Google Docs all provide the basis for synchronous and asynchronous collaboration. However, the physical space also plays an important part. The organization needs to be cognizant of the nature of the collaboration that occurs and provide a physical environment that enables it. Open-plan, flexible work-spaces are commonly used in this regard, and different types of seating (e.g., couches and bean-bags) are also used in some situations. From a technology point of view, interactive
are also used in some situations). From a technology point of view, interactive
touch screens are a useful addition to the collaborative workspace.
Creativity
For those organizations operating within the creative industries such as design,
architecture, art, etc., the physical space is a crucially important element in pro-
viding an environment in which the creative juices can flow. This can involve the
use of light, color, warmth and entertainment. Again, the nature of the business will
be the key determinant in what the physical space needs to look like.

Fun
This is a challenging one. We all want work to be enjoyable. We want workplaces
that our staff want to go to. Making the workplace fun can assist with that, but at
what stage does this get out of hand? Google for example have workplaces that
portray a strong sense of fun. Employees move about the buildings on scooters; slides and fire poles are incorporated. There are games rooms and entertainment,
plus a range of other features that most would view as fun-focused. But in many
ways, this is the type of portrayal that Google strives for and uses such physical
spaces to instill this type of organizational culture in its business. This type of physical space is not uncommon in such businesses. However, it is not suited to
all industries. Would you expect your elected government representative to operate
in such a digital space? What about your tax department or your local police
station? Fun as a characteristic of a digital workplace has its “place”. However it is
not suited to every type of business and needs to align with the culture and external
perception the business wishes to portray.
While the concept of a digital workplace is not new, it is important to recognize that it is not simply achieved by renaming intranet technology, by adopting social
media, by introducing mobile technology, or by allowing workers to work from
remote locations and flexible times. For a business to provide a true digital
workplace, a combination of aligned tools/technologies, engaged employees, and
management processes needs to be present in equal measure.

4.5 BPM and the CPO: Governance, Agility and Efficiency


for the Digital Economy

Novel digital technologies such as the ones discussed earlier in this book or simply
unprecedented ways of using known technologies, or sometimes just immense
increases in performance of proven technologies provide the elements needed to
drive the digital economy. Here, motivated entrepreneurs with ingenious—or
sometimes downright strikingly simple—ideas will find the soil to thrive on. If
potent and ideally patient risk capital providers then come into play, a rising star of
the digital economy may be born. But how can such a star be prevented from burning up in the fire of scorched millions after a short time? How can a business
model’s efficiency and sustainable success be ensured?
This question arises not only for the entrepreneurs in the digital economy as we
discuss in the next section, but also for any company competing for market share.
Companies are required to test their business models and to screen them for
potential success by digitizing them. This must not be a one-time exercise, but has to be established as a process of continuous monitoring and continuous improvement of the respective business model. Agility is the order of the day, and the organization needs to exhibit a high level of willingness to change. It must also be able to quickly
draw conclusions from the emergence of disruptive technologies that threaten
existing business models, and also offer opportunities for developing new markets.

In addition, the demands in terms of governance and compliance are on the rise and
companies, especially those operating in a global context, are faced with new
challenges on a daily basis.

4.5.1 CPO: The CIO’s New Role

Within a company’s management it is usually the Chief Technology Officer (CTO)


and the Chief Information Officer (CIO) who are the first point of contact when it
comes to the development and implementation of digitization strategies through
new business models. However, the job descriptions of these management positions
no longer suit the actual challenges in the digital economy. In many organizations,
there is nowadays a new role at the level of management, the Chief Digital Officer
(CDO), who is in charge of future digitization projects. In addition, the role of a
CIO is increasingly developing into that of a Chief Process Officer (CPO) to
emphasize the rapidly growing importance of information and communication
technologies in business developments. On the other hand, the increasing avail-
ability of simpler deployment models (keyword: cloud) leads to a change of the
scope of the recent CIO, away from technological issues, towards business issues
and business processes. With the transformation of the CIO into a CPO, this role's leadership responsibility for the business's success increases, and so does its visibility in everyday business.
In the new role of the CPO, the importance of business processes is manifested
as a central component of the enterprise architecture. Business process excellence
becomes a critical success factor for the company in the digital economy. The
CPO’s most important tools are business process models that make up the con-
struction plans of the digital economy. They support the design and continuous
improvement of business processes and prove to be an indispensable tool for
communication and collaboration. The models are the basis for holistic business
process management, in which the complete life cycle of business processes is
supported by adequate methods and tools (see Sect. 2.1).
The scope of a CPO is broad and generally has company-specific peculiarities
arising from organizational factors, the market situation, the corporate culture, the
available knowledge, and quite often also from the company’s history. In summary,
the objective of a CPO is to build an architecture for business processes, infor-
mation and communication to ensure achievement and preservation of sustainable
business process excellence. The severity of this task is obvious in view of the
volatility in the digital economy.
From practice, important fields of action can be derived for the CPO:

• Participation in the development and maintenance of the corporate culture.



• Development and maintenance of the enterprise architecture, from the technical


IT architecture to the business processes and to the business architecture.
• Establishing an environment for continuous business process improvement.
• Establishing a culture of knowledge and building an effective knowledge
architecture.
• Reduction of complexity and improvement of efficiency in business and deci-
sion procedures.
• Enterprise-wide streamlining, harmonization, and standardization of business
processes with their associated services and master data.
• Business process transparency.
• Implementation of governance, risk, compliance (GRC), and security strategies.
• Providing tools for predictive analytics and simulation of business scenarios.
• Being a catalyst for the creation of a continuous stream of process and service
innovations.

Reflecting these fields of action against currently prevailing business practice, it


becomes clear that the CPO plays a crucial role in enabling a company's change processes and can therefore be primarily considered a change manager. Whenever changes occur in the company, the business processes form the linchpin of all activities; hence, the CPO is well advised to establish a powerful BPM
platform in the company, so as to strike a balance between agility on the one hand
and governance and efficiency on the other. How such a platform can be introduced
in the company is answered by introducing best practices as described in the
following sections.

4.5.2 Business-Driven Implementation of BPM

When considering the introduction of BPM, central anchoring of core business


processes in the enterprise architecture must be taken into account, as illustrated in
Fig. 4.19. A transformation from the company’s objectives and strategy portfolio
towards business excellence will only be achieved using on-demand business
processes. This requires a close integration of business process management with
the company’s strategic management and all business excellence initiatives taking
place in the company. A multitude of external factors must be considered, as must the fact that BPM always moves in a field of tension between the company's mission and culture, ethical principles, values and the nature of corporate knowledge, and externally and internally driven GRC requirements.
Particularly in the digital economy, special demands are made regarding the
construction plans, so the business process models are of particular importance. It is
obvious that they need to form a solid basis for the implementation of digital
processes. But it is also necessary that they are widely understood and formulated in
a way that is easy to communicate, but hold so much agility that they can adapt
quickly and resiliently to changing markets and environmental conditions at any
time. They must allow different views of the digital economy: A consistent strategic
202 4 IT and the Enterprise

Governance, Risk and


Compliance Management

External influencing
factors Business Process
Management
society Strategy
environment
markets
continuous
business Business Processes improvement
partners
job market
investors Business Excellence

Mission, Ethical Principles,


Corporate Culture Values, Knowledge

Fig. 4.19 Context of BPM implementation

corporate vision with business objectives, strategies, and an objective risk assessment, as well as a comprehensive definition of business processes with their process context, processes, business rules, business objects, competences and responsibilities, together with the relevant corporate structures. Construction plans of the digital economy must not be rigid, as rigid plans would fail. Rather, they must be agile, spanning a large space of “conceivable” business processes that themselves have an agile character. The mapping of business processes to semiformal models then allows for rapid implementation, in which the technical requirements captured in the business process models are realized in productive enterprise software systems, which are increasingly sourced from the cloud (Software as a Service; SaaS).
However, these are currently only the initial phases of the digital economy,
where this method has been proven. Further phases will follow, in which singular
business processes must be connected to enterprise-wide process chains and process
networks. Seamless integration and best usability for all users involved are required
just as much as an optimal exploitation of the potentials of the Internet of Things
(IoT). These aspects are briefly discussed in Chap. 6.
BPM Trends
Given the high volatility of business processes in the digital economy, it is
important that the underlying BPM platform is able to respond flexibly to future
trends or already takes currently foreseeable trends into account. Interesting in this
context is a study by Forrester Research (cf. Richardson et al. 2012), in which the
analysts identify five trends that will change BPM fundamentally in the future.

Based on these trends, we have defined the following change potentials for the
fields of action listed above:

• Customer Experience as a Driver for BPM.


Traditional applications of BPM in the internal business processes are supple-
mented by more and more applications in the area of customer experience
improvement. BPM leads to individualization and improves the quality of
customer experiences.
• BPM is going Mobile.
Driven by product and service innovations and the increasing importance of
mobility in everyday business and private life, numerous business process
reengineering initiatives arise with the aim of using BPM for mobile
applications.
• Automation and the “Human Touch”.
Bundling formerly isolated BPM activities in now enterprise-wide BPM pro-
grams, in which large amounts of processes are simplified, standardized and
integrated, calls for new ways to automate and deploy processes. At the same
time, the human factor is becoming increasingly important again, despite—or
perhaps as a result of?—advanced automation that makes personal customer
experience possible.
• Trust and Transaction Security.
Advancing digitization leads to a tighter integration of external partners into the business processes of an enterprise. In addition, comprehensive value chains involving any number of partners are created, which leads to highly distributed business processes. Against this background, it is no surprise that mutual trust between process partners becomes a critical success factor. This is reflected in continuously increasing requirements regarding the transaction security of BPM systems. An important building block in this context is the secure storage and trusted exchange of data between the various process partners involved. This is a central application area for distributed ledger technology, of which a well-known example is blockchain. A blockchain is a cryptographic ledger consisting of a sequence of blocks of information securely chained together using sophisticated hash techniques, which make it practically impossible to retroactively manipulate the information (a minimal illustration follows after this list).
• “Big Data–Big Process”.
Isolated business process improvements become more and more a story of the
past: Indeed, the trend goes towards BPM initiatives that take place in an
enterprise-wide fashion, often even beyond enterprise borders, and that aim at
sustainable transformation of all business processes. For this purpose, business
processes must be closely linked to the company data, and BPM must be able to
benefit from the potentials that novel processes provide with Big Data and Big
Data Analytics. In addition, lightweight BPM platforms will enable the rapid
implementation of structured and unstructured processes.
• Customer-Oriented Process Mining.

The combination of process indicators with indicators of customer experience,


productivity, quality and agility creates new process mining techniques. These
techniques make it possible, starting from the customer interface, to discover
new processes to ensure an optimal customer experience.
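The hash chaining mentioned under “Trust and Transaction Security” above can be illustrated with a few lines of Python. The sketch omits everything that makes a real distributed ledger distributed (consensus, signatures, replication across process partners) and only shows why retroactive manipulation of a block invalidates the chain; the example transactions are invented.

```python
# Minimal illustration of hash chaining in a blockchain-style ledger.
import hashlib, json

def make_block(data, prev_hash):
    block = {"data": data, "prev_hash": prev_hash}
    block["hash"] = hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()
    return block

def verify(chain):
    """A block is valid if its hash matches its contents and its predecessor's hash."""
    prev_hash = "0" * 64
    for block in chain:
        body = {"data": block["data"], "prev_hash": block["prev_hash"]}
        expected = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if block["prev_hash"] != prev_hash or block["hash"] != expected:
            return False
        prev_hash = block["hash"]
    return True

chain = []
for tx in [{"from": "supplier", "to": "oem", "qty": 100},
           {"from": "oem", "to": "dealer", "qty": 40}]:
    chain.append(make_block(tx, chain[-1]["hash"] if chain else "0" * 64))
print(verify(chain))                  # True: untouched chain
chain[0]["data"]["qty"] = 999         # retroactive manipulation ...
print(verify(chain))                  # ... is detected: False
```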

Classification of Business Processes


In practice it is seen too often that companies put significant effort into the
customer-specific implementation of business processes, which are actually of
minor importance for the company. As a result, they then lack the resources for a
realization of mission-critical business processes. To address this problem, the BPM platform has to allow for a classification of business processes, which ensures that BPM resources can be allocated in a truly benefit-oriented way.
The individual business processes are then basically rated based on three criteria:

• Competition:
Is the business process critical in terms of competitiveness? In other words, do
customers acknowledge quality and performance of the processes and do they
influence them directly or indirectly in making their purchasing decisions?
• Value:
In terms of value added, is it a core process of the company?
• Quality:
How does the business process affect the quality of products and services offered
by the company?

As a result of this brief review, business processes can be classified according to their value contribution and then be implemented with different priorities and using various methods. At a generic level, two classes can be distinguished:

• Business-critical (mission-critical) business processes:


Business-critical business processes play a prominent role in competition.
Usually they define the unique selling points of a company. Their influence on
product and service quality is significant. In addition, a high value is achieved
through them. In the following, we also make a distinction between critical
business processes, which to a large extent are designed specifically for a
company, and less specific processes with a more or less high degree of stan-
dardization, as for example common for industry-specific processes.
• “Commodity” business processes:
Business processes that achieve a low value and have little or no effect on
product and service quality or are of little or no importance in competition are
called “commodity” business processes.

It is clear that a classification of business processes into these categories greatly


depends on the industry; it may even vary from company to company within the
same sector. Some simple examples make this clear: For a forwarding agent, fleet

management is a mission-critical process to a great extent, while it will not be an


issue for a financial services provider. For a financial service provider, business is
all about customer relationship management, while a supplier in the automotive
industry will attach significantly less importance to marketing and sales.
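A minimal sketch of such a classification, using the three criteria named above with invented one-to-five scores and an arbitrary threshold, might look as follows; any real assessment would of course rest on a much richer evaluation.

```python
# Illustrative scoring along the criteria competition, value, and quality (1-5 each).
def classify(process):
    total = process["competition"] + process["value"] + process["quality"]
    return "mission-critical" if total >= 10 else "commodity"

portfolio = [
    {"name": "fleet management (forwarding agent)",     "competition": 5, "value": 5, "quality": 4},
    {"name": "fleet management (bank)",                 "competition": 1, "value": 1, "quality": 1},
    {"name": "customer relationship management (bank)", "competition": 5, "value": 4, "quality": 4},
]
for p in portfolio:
    print(f'{p["name"]}: {classify(p)}')
```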
BPM Implementation Strategies
Given the diversity of the fields of action for BPM as a means to achieve
sustainable, enabling business process excellence, it is evident that the value of
BPM depends to a large extent on the implementation strategy used. Various
aspects can be taken into account for the choice of an appropriate strategy for a
particular use case, which are detailed below.

Bottom-Up versus Top-Down Strategies


For an implementation of BPM, two generic types of strategies can be distinguished:
bottom-up and top-down. Bottom-up strategies are often used in practice when BPM
is considered primarily in the implementation-oriented form of workflow management.
Here, workflows in existing information systems are analyzed and subsequently
automated by means of BPM techniques. Unfortunately, in this scenario processes
that have long since become obsolete are often cemented in place. Workflow
improvement is omitted, so that the valuable potential of workflow automation is
squandered. Also typical for bottom-up approaches is that business requirements are
considered from a narrow, process-specific view, which usually leaves considerable
potential for improvement untapped. The results are largely isolated workflows that
are unsuited as a basis for enforcing BPM concepts at the architecture and business
model level. Thus, the goal of creating sustainable business process excellence
through BPM cannot be achieved with bottom-up approaches.
Based on these considerations, strategies with a strong top-down character seem
much more effective for implementing BPM. BPM introduction rarely happens in
the form of largely independent projects, but mostly in the context of large BPM
programs or initiatives, in which BPM projects are planned, integrated, and
coordinated across the various fields of action. Here, the decision makers of the
company should always be the clients. They define BPM as part of the business
model; they anchor it, to a certain extent, in the DNA of the company and ensure
compliance with the corporate culture. Following the top-down approach, BPM
finds its way into the business architecture and finally into a company’s information
systems. The design of business processes is then based on an enterprise-wide
overview of the technical requirements and on the company’s business objectives
and strategies. It is obvious that such a top-down approach is an ideal way to
achieve business process excellence. Furthermore, it provides the basis for
continuous improvement of business processes, with the aim of ensuring business
process excellence in the future as well.
The following presents important BPM implementation strategies, evaluated as
to their practical relevance.

4.5.3 Governance, Risk, and Compliance

For the implementation of BPM in companies, it is absolutely critical to success to
win over the executive management as a stakeholder. The implementation has to be
desired by the executive management and must be driven by it. In this respect, it is
worth taking a look at the management agenda in larger companies. There, the issue
of GRC (cf. Sect. 2.1.4), i.e., Governance, Risk, and Compliance Management, is at
the top. What could be more natural than to anchor BPM firmly in the GRC strategy?
Figure 4.20 shows the typical structure of a GRC approach. The number of
external requirements that corporate management must observe and comply with
seems overwhelming, in many cases even entailing personal liability. Management's
task is to formulate appropriate directives, to communicate them, and to monitor
compliance with them. Moreover, the directives should be complete, efficient, and
effective, and therefore consistent in themselves. It is also necessary to implement
mechanisms to monitor and control the execution of the directives. In addition,
reactive mechanisms must be provided to ensure that, in the case of an imminent or
actual violation of regulations, the enterprise immediately takes proper measures to
limit damage to its environment as well as to the enterprise itself. Responsible
enterprises pay the highest attention to preventive mechanisms. The avoidance of
risks and compliance violations holds the key to significant cost savings and prudent
market conduct, often resulting in interesting competitive advantages.

Fig. 4.20 Embedding BPM within the corporate GRC strategy: laws, norms and standards, values and ethical principles, and corporate objectives drive governance; instructions and directives, controls, compliance management, and risk management connect regulations and risks to the enterprise model, spanning prevention, execution, and reaction

Fig. 4.21 Embedding of GRC within the corporate context: Finance & Audit GRC, Legal & Process GRC, and IT GRC cut across strategy, business processes, application software, and the IT platform
These considerations clearly outline the potential contribution that BPM makes
in the definition and consistent implementation of a GRC strategy. BPM models
form the heart of any GRC definition. And by using BPM procedures for analysis
and simulation, as well as for process automation and monitoring, the GRC
definition becomes a living and powerful GRC system.
Prior to the introduction of GRC into an enterprise, it is often debated whether it
is a business or an IT project. “Both” is the correct answer here. Figure 4.21
illustrates this: GRC always lies in the responsibility of enterprise management and
therefore remains a strategic project. But GRC also digs deep into the enterprise,
penetrating it by implementing mechanisms that work at all levels, from the
strategy to business processes and application software right down to the IT platform.
Given the multitude of mechanisms involved, it is advisable to perceive GRC not
as a single project, but as a strategic program in which several time-coordinated
projects are implemented. It is also important to understand that GRC is not only a
“concern of the finance department”, as one often encounters within an enterprise.
GRC in fact encompasses all business processes and organizational units of an
enterprise, consciously including the collaboration with customers and business
partners of all kinds. The resulting complexity can only be mastered with business
models. GRC is often structured into three packages of measures that reflect
different views of GRC but are closely connected to one another:

• Finance and Audit GRC:
SOX, Basel III, country-specific principles of proper accounting, etc. are the
drivers in finance and audit GRC. Close cooperation between the auditors and
the financial and tax authorities is a requirement. Internal control systems are to
ensure the correct and compliant implementation of the processes in finance and
accounting. Of particular significance is the monitoring of business transactions
based on corporate performance indicators (Key Performance Indicators, KPIs)
and risks (Key Risk Indicators, KRIs); a small monitoring sketch is given after
this list.
• Legal and Process GRC:
Legal and process GRC places the operative business at the center of
consideration. Are all legal requirements fulfilled? Are customs and tax issues
resolved? Moreover, do the business processes guarantee the observance of
norms (e.g., ISO, DIN) and standards (including industry standards) across
national borders? Market risks and the endangerment through a business
interruption (e.g., force majeure) must also be taken into account.
• IT GRC:
Information technology not only plays an important role in GRC through the
implementation of the GRC mechanisms—think of the automation of business
processes and the monitoring of key indicators—but also presents significant
risks itself. In addition, it is subject to numerous regulations, for example
regarding data protection (e.g., the German Federal Data Protection Act). An
appropriate course of action guarantees a secure and legally compliant use of all
information resources. In addition, programmatic anomalies and violations of
existing regulations, e.g., regarding the segregation of duties (SoD), must be
avoided.
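The monitoring sketch referred to above (under Finance and Audit GRC) can be given, in deliberately simplified form, as a threshold check on KPIs and KRIs; the indicator names, limits, and alert texts are invented for illustration and do not reflect any specific GRC product:

# Illustrative thresholds: these KPIs should stay above, KRIs below their limit.
KPI_LIMITS = {"on_time_delivery_rate": 0.95, "first_pass_yield": 0.98}
KRI_LIMITS = {"open_sod_conflicts": 0, "overdue_account_reconciliations": 3}

def check_indicators(kpis: dict, kris: dict) -> list:
    """Return alerts for KPI shortfalls and KRI breaches (preventive/reactive GRC)."""
    alerts = []
    for name, measured in kpis.items():
        limit = KPI_LIMITS.get(name)
        if limit is not None and measured < limit:
            alerts.append(f"KPI below target: {name} = {measured} (target >= {limit})")
    for name, measured in kris.items():
        limit = KRI_LIMITS.get(name)
        if limit is not None and measured > limit:
            alerts.append(f"KRI breached: {name} = {measured} (limit <= {limit})")
    return alerts

print(check_indicators(
    {"on_time_delivery_rate": 0.97, "first_pass_yield": 0.91},
    {"open_sod_conflicts": 2, "overdue_account_reconciliations": 1},
))

In practice such checks would be driven by a business activity monitoring component rather than by hard-coded dictionaries, and the alerts would trigger the reactive mechanisms described above.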

4.5.4 Simultaneous Planning of the Business Architecture

A successful BPM implementation requires that BPM be fully integrated into a
company’s business architecture. This is all the more true when BPM acts as the
core of a corporate GRC strategy. Only this integration ensures that BPM takes
everything relevant to the core business processes into account. Figure 4.22
illustrates the integration of BPM with the most important components of a
business architecture. In this figure, the BPM symbol already indicates that a
complete business process model requires that the links to the various aspects of
the business architecture be visible. Modern business process modeling tools such
as Horus by Germany-based Horus software GmbH offer these options at all model
levels. The following statements demonstrate why a simultaneous planning of the
various architectural aspects is advisable and has proven itself in practice. In this
planning, BPM serves as the central point of reference:

• BPM and Human Capital Management (HCM):


The design of business processes is determined to a large extent by the quantity
and the qualifications of the personnel available. On the other hand, the given
business goals and strategies result in requirements for the business processes,
which in turn impose requirements on Human Capital Management.
• BPM and Master Data Management (MDM):
The business processes are defined on the basis of the corporate master data,
with the aim of achieving an equivalent level of quality, since ultimately the
quality of processes and the quality of master data entail each other. Through
simultaneous planning, the completeness of the business process and master data
structures is pursued, along with their quality.
Fig. 4.22 Simultaneous business architecture planning: BPM at the center, linked to Human Capital Management (HCM), Master Data Management (MDM), Business Rules Management (BRM), and Enterprise Performance Management (EPM), and involving systems, people, documents, decisions, and events
• BPM and Business Rules Management (BRM):
The business rules defined in a cross-process context provide a framework into
which the business processes must be fitted. On the other hand, new rules arise
within defined business processes. Based on these considerations, it is
recommended to align the rules defined at different levels with each other by
means of simultaneous planning.
• BPM and Enterprise Performance Management (EPM):
In EPM, performance measurement systems are designed and used to monitor
the company’s performance and therefore also the performance of business
processes. In this respect, EPM is an essential component in business process
governance and therefore simultaneous planning is also recommended here.

4.5.5 Standardization and Harmonization: Company-Wide and Beyond

Practice reports and surveys show that, across countries and industries, more and
more companies are threatening to choke on their own complexity. Often the
reasons for the complexity are quite comprehensible, especially when they stem
from the variety and complexity of the products or from the market's demand for
individual customer experiences. But even in these cases it is worth taking a look
at the company’s business processes. Often innovative technologies provide
entirely new ways to streamline or consolidate the processes and consequently to
reduce the complexity. Is there really a need for so many different process variants,
or are there starting points for an at least partial standardization of processes in
certain divisions or within the group as a whole? Can processes of different
companies within a group be unified, or at least harmonized so that the remaining
differences reflect only cultural characteristics? And do the special process features
demanded by the business really add value, or have they rather become "cherished
traditions"?
Undoubtedly, it is a big challenge to discuss such questions, which are generally
unpleasant for those affected, within the company. And the implementation of
standardization and harmonization requires effective and often cost-intensive
organizational change management measures. But the potential benefits are usually
much too large not to be exploited. BPM provides appropriate methods and tools to
support process standardization and harmonization: Clear and easy-to-understand
graphical models provide a basis for communication that promotes a common
understanding of the processes and facilitates the analysis of process variants.
Analysis and simulation tools allow qualitative statements to be substantiated with
appropriate figures and visual representations. The monitoring of the change
processes and their outcomes over time is done using BPM monitoring tools.
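One simple way to support such an analysis of process variants, sketched here purely for illustration, is to compare the activity sets of the variants run by different group companies; the companies, activities, and the use of Jaccard similarity are assumptions of this example, not a prescribed BPM technique:

# Activity sets of the "same" process as performed by three group companies
# (names and activities are invented for illustration).
variants = {
    "Company A": {"receive order", "credit check", "pick", "pack", "ship", "invoice"},
    "Company B": {"receive order", "credit check", "pick", "pack", "ship", "invoice", "fax confirmation"},
    "Company C": {"receive order", "pick", "ship", "invoice", "manual pricing"},
}

def jaccard(a: set, b: set) -> float:
    """Similarity of two activity sets (1.0 = identical)."""
    return len(a & b) / len(a | b)

# Pairs with high similarity are candidates for standardization; the activities
# outside the intersection are the "cherished traditions" to be questioned.
names = list(variants)
for i, x in enumerate(names):
    for y in names[i + 1:]:
        print(f"{x} vs {y}: {jaccard(variants[x], variants[y]):.2f}")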
Case Study: Globally Distributed Business Excellence Platform
The fact that standardization and harmonization must be taken into account for all
components of the business architecture and that this also requires a simultaneous
procedure will now be illustrated by an example from practice. Figure 4.23 shows
the architecture of the business excellence platform of a globally operating tech-
nology group. As a result of business process reengineering, the core business
processes have been harmonized across the group, and where possible, they were
also standardized. Due to different product lines, however, different standardization
classes had to be created, which did not affect harmonization. Indeed,
harmonization and standardization were also carried out simultaneously with
respect to the business rules applicable within the group and the master data.
In addition to the core processes, some support and management processes were
also standardized and implemented in the form of group-wide Shared Service
Centers. Company-specific business processes without group-wide relevance, or
processes that reflect local characteristics, were not standardized; consequently,
they were implemented in a local context in order to keep the complexity of the
group solution at a manageable level. This does not mean that process integration
had to be sacrificed: it is ensured by an integration platform that is based on a
service-oriented architecture (SOA), connects the processes implemented at the
group level with the local processes, and integrates them with the relevant business
applications. Via the integration platform, external partners are also involved in the
business processes. This means that sub-processes, relevant master data, and also a
part of the business rules not only apply within the group but are also binding for
business partners.
Fig. 4.23 Architecture of a globally distributed business process excellence platform: cross-company core business processes, corporate master data management (products, suppliers, customers, sites), a cross-company BPM/BRM repository, and a cross-company SOA integration platform, connected to company-specific processes, local BPM/BRM repositories, operative data, and external partners

Standardization with (Industry-Specific) Reference Models


In practice, it turns out that the standardization of business processes sometimes
requires very complex coordination processes. This is especially true when aiming
at standardization across company boundaries. The remedy here is a reference
model that reflects the best practices of a company. The best-practice character,
which should ideally be verifiable, facilitates standardization efforts considerably.
In cross-enterprise standardization, industry-specific reference models, as
proposed by industry associations, have proven to be efficient. They facilitate
communication between companies in the same industry and provide a robust basis
for cross-enterprise process integration and for the definition of processes in
business networks. Perhaps the most widely used industry reference models in
practice are eTOM®17 and SCOR®.18 These are generic reference models that
reflect the common processes and indicators of their industry in a very practical
way. Their generic character, however, ensures that corporate users are not limited
in their options for differentiating themselves from their competitors.
Figure 4.24 shows an example of an excerpt from a SCOR reference model. In
this representation, the hierarchical model structure yields a reduction of the model
complexity.

17 eTOM® (enhanced Telecom Operations Map) is a business process framework by the
TeleManagement Forum aimed at service providers of the telecommunications industry and their
partners (cf. Schönthaler et al. 2012).
18 SCOR® (Supply Chain Operations Reference Model) is the process reference model by the
Supply Chain Council, with the aim to discuss and improve supply chain management
procedures within a company and with business partners (cf. Schönthaler et al. 2012).
Fig. 4.24 Excerpt from a SCOR reference model

4.5.6 Business Process Outsourcing (BPO)

Traded as a future mega-trend in the business world at the beginning of this
millennium, virtualization strategies have meanwhile developed into common
instruments of corporate management. In a business context, virtualization is based
on the idea that enterprise business processes do not necessarily have to be
performed by the enterprise itself, but can also be performed by strategic partners.
However, the overall process is still guided by the virtualized enterprise, which
requires close cooperation with, and during execution presupposes close integration
of, the strategic partners.
The relocation of business processes as part of a virtualization strategy is referred
to as business process outsourcing (BPO). By means of BPO, the complete
execution of a business process is delegated to a BPO service provider, including all
underlying business services. Cost advantages are often the goal of BPO, especially
if personnel- or resource-intensive processes are involved. However, BPO should
not only be considered from a cost perspective. It is more important to look at the
BPO decision from the perspective of value creation. Advantages in opening up
new markets (e.g., by adding a local distribution partner in the target market) or
through the use of specialized knowledge (for example, special manufacturing
processes that strategic manufacturing partners bring into the value chain) must also
be taken into account. In addition, shorter process cycle times or the minimization
of process risks are valid considerations in a BPO decision. Some important
application areas for BPO in practice will be addressed next.

Typical Fields of Application


Personnel-intensive Transaction Processing
In many industries, the outsourcing of transaction-executing processes with high
personnel intensity has become common practice these days. It offers high savings
potential and sustained productivity improvement in the back office, as employees
can then focus more on planning and decision-making tasks. The enterprise's
customers perceive this in the form of competitive prices and improved service
quality. Service examples from different industries include:

• Financial Services:
Purchasing and accounts payable, order management and accounts receivable,
general accounting, travel and expense reports, account reconciliation.
• Trade:
Logistics, freight invoices, order management, returns, account reconciliation.
• Insurance:
Quotations, contract conclusion, premium statement, accounts receivable,
claims processing.
• Healthcare:
Social insurance accounting, processing of reimbursement claims, accounts
receivable, rejected reimbursement claims.

Knowledge-intensive Processes
While cost considerations have been paramount in the previously considered BPO
examples, the superior knowledge of a service provider plays an important role in
the following examples. In general, the scope of outsourced knowledge-intensive
processes, and in particular their complexity, is significantly higher than that of the
“simple” transaction processing mentioned above. For the end customer, the
advantages of outsourcing knowledge-intensive processes manifest themselves
mostly through improved product quality and the accelerated implementation of
innovations. A few examples will illustrate this:

• OEM (Original Equipment Manufacturing) Production:


OEM manufacturers have special—often low cost—resources or special
expertise in the manufacturing of “original equipment.” Characteristic of OEM
manufacturers is that they do not bring the goods produced into the market. In
many cases, engineering services or perhaps necessary tools are provided by the
client, or are developed jointly as part of a collaborative engineering process.
• Contract Logistics:
Contractually fixed package of logistics services (transportation, warehousing,
refinement, etc.) performed by a logistics service provider for purchasing or
sales organizations.

• Distribution Services:
The service provider takes over the distribution of certain products in a clearly
defined target market. Often, distribution is not limited to the sale of the
products, but also includes marketing, warehousing, and often end-customer service.
• Contact Center:
Qualified service providers operate entire contact centers on behalf of their
customers. Multi-modal and multilingual inbound and outbound services are
offered. Typical inbound services include customer service and care (loyalty,
commitment, satisfaction) or internal IT help desks for technical and user
support. Typical outbound services include campaign management and
telemarketing, telesales, sales support (appointments), tele-market research, or also
multi-modal dunning processes. Up- and cross-selling opportunities are put to
full use with a high level of expertise.
• Personnel Management and Payroll:
A very common application for BPO is personnel management (full
recruit-to-retire processes) and especially payroll, i.e., the settlement of wage and
salary statements. Although cost considerations indeed play a role in the
personnel area, the high level of expertise and the service provider’s ability to
keep its knowledge up to date stand in the foreground of BPO.

Exploitation of Location Advantages


Location advantages of a service provider are utilized in many applications of
knowledge-intensive processes. For example, outsourced assembly at the
customer's location can lead directly to cost advantages, shorter delivery times,
intensified customer loyalty, and hence to competitive advantages. The situation is
similar with contract logistics, where the operation of a warehouse near the
customer or the supplier is included. Alternatively, distribution services are
delivered into a local target market in which the commissioning sales organization
itself is not represented.
Basic Principle of Business Process Outsourcing
BPO follows a basic principle, which is shown in Fig. 4.25. A sample scenario is
illustrated in which three BPO service customers outsource a process to the BPO
service provider. Due to specific business requirements on the customers' side, the
service provider faces the problem that the outsourced processes are not identical.
This poses very significant challenges to its flexibility, without which it will not be
competitive in the market. In addition, the service provider must act very
cost-sensitively, focusing its capabilities on the best possible customer value.
A suitable solution is the Horus BPM architecture, which separates the
implementation-oriented business services from the customer-specific business
processes (cf. Schönthaler et al. 2012). Incidentally, the implementation of customer
business processes on a homogeneous service layer presents, in itself, a form
of virtualization.
Fig. 4.25 Basic principle of business process outsourcing (BPO): the customer processes of several service customers are mapped via a mediation layer onto the service provider's orchestrated business services

Process costs typically are at the center of economic BPO studies. They are
compared with the expected benefits in order to arrive at a sound basis for
decision-making. Keep in mind that BPO always raises technical issues. How does
the integration of the outsourced business processes with internal processes or other
outsourced processes take place? How are company-wide processes realized beyond
corporate boundaries, and how does the overall process control take place? Can
business activity monitoring be guaranteed across all processes? Who will ensure
compliance with business rules within the overall context of the virtual enterprise?
How will a uniform and consistent master data management be ensured? Are
collaborative planning processes, involving all participating business partners,
including customers and suppliers, provided for? The sheer quantity of these issues
shows that significant cost drivers can be found both in the initial technical
implementation of BPO and in ongoing operation.
Integration components in the form of a mediation layer are shown in Fig. 4.25.
We speak of a mediation layer because the service provider strives for economies
of scale and will therefore always try to handle all outsourced processes internally
in as uniform a manner as possible. This is reflected in largely standardized
procedures with appropriate control and data flows. On the other hand, since each
service customer expects its individual requirements to be implemented, a
customer-specific transformation of the control and data flows exchanged between
service provider and service customer must take place within the mediation layer.
BPO service customers also demand, not least because of GRC requirements, a
degree of transparency regarding the process results, but also regarding the process
execution itself. Accordingly, the service provider is required to offer customized
reporting, meaningful key indicators, and analyses.
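A drastically simplified sketch of the mediation idea is shown below: each customer's message format is mapped onto the provider's standardized internal representation; the customer names, field names, and mappings are invented for illustration and do not correspond to any particular integration product:

# Customer-specific field mappings onto the provider's standard order format.
# All names are illustrative assumptions.
CUSTOMER_MAPPINGS = {
    "customer_1": {"OrderNo": "order_id", "Qty": "quantity", "Mat": "material"},
    "customer_2": {"purchase_order": "order_id", "amount": "quantity", "item": "material"},
}

def mediate(customer: str, message: dict) -> dict:
    """Transform an inbound customer message into the standardized internal format."""
    mapping = CUSTOMER_MAPPINGS[customer]
    standard = {internal: message[external] for external, internal in mapping.items()}
    standard["source_customer"] = customer  # kept for customer-specific reporting
    return standard

print(mediate("customer_1", {"OrderNo": "4711", "Qty": 10, "Mat": "pump"}))
print(mediate("customer_2", {"purchase_order": "PO-99", "amount": 3, "item": "valve"}))

The reverse mapping, from the standardized result back into each customer's reporting format, would follow the same pattern.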
Model-Based Planning and Implementation of BPO Contracts
As the name suggests, BPO is all about business processes, so BPO is a
“natural” field of application for methods and tools like Horus, and this applies
across the entire life cycle of BPO contracts. In addition, it should be clear from the
foregoing statements that BPO requires a common understanding, between service
customer and service provider, of the technical business requirements arising from
the outsourced process to be implemented. This common understanding can be
specified in a formal way, for example, by means of Horus models. How Horus can
be used by the parties involved in BPO, and the resulting benefits, will be addressed
in the following.

Service Customer Point of View


For a service customer, the first task is to identify the process intended for
outsourcing and to cleanly delimit it. The starting points for this are usually a
clear definition of business strategies and objectives in conjunction with the results
of a SWOT analysis. Preferred outsourcing candidates are processes in whose
context weaknesses have been identified or whose outsourcing could open up
opportunities.
In the next step, separation of the process intended for outsourcing leads to a
procedure model in which the required resources are allocated, including the exact
personnel requirements. It is not surprising that in practice outsourcing decisions
are often highly controversial. In such cases, it has proven useful to model different
variants of the outsourcing candidate (for example, 100% outsourcing, partial
outsourcing, internal optimization of the current process, and finally the current
process itself) in simulation-ready form. By applying simulation, extensive figures
can then be obtained for different forecast scenarios in order to support a BPO
decision quantitatively.
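A minimal, purely illustrative sketch of such a quantitative comparison is given below: each outsourcing variant is described by assumed cost and cycle-time ranges, and a simple Monte Carlo simulation produces average figures per forecast scenario. All numbers are invented; a real study would rely on calibrated, simulation-ready process models rather than uniform ranges:

import random

random.seed(42)

# Assumed per-case cost (EUR) and cycle time (days) ranges per variant.
VARIANTS = {
    "current process":       {"cost": (95, 120), "cycle": (6, 10)},
    "internal optimization": {"cost": (80, 105), "cycle": (5, 8)},
    "partial outsourcing":   {"cost": (70, 100), "cycle": (4, 9)},
    "100% outsourcing":      {"cost": (55, 90),  "cycle": (3, 12)},
}

def simulate(variant: str, monthly_cases: int, runs: int = 200) -> dict:
    """Average monthly cost and mean cycle time for one demand scenario."""
    lo_c, hi_c = VARIANTS[variant]["cost"]
    lo_t, hi_t = VARIANTS[variant]["cycle"]
    costs, times = [], []
    for _ in range(runs):
        costs.append(sum(random.uniform(lo_c, hi_c) for _ in range(monthly_cases)))
        times.append(sum(random.uniform(lo_t, hi_t) for _ in range(monthly_cases)) / monthly_cases)
    return {"avg_monthly_cost": sum(costs) / runs, "avg_cycle_days": sum(times) / runs}

# Two forecast scenarios: moderate demand and strong growth of the process load.
for scenario, cases in [("moderate demand", 200), ("strong growth", 500)]:
    for variant in VARIANTS:
        print(scenario, variant, simulate(variant, cases))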
From simulation and internal feasibility studies, a model-based specification of
the business process to be outsourced will result. This specification should include
the expected characteristics of the process (costs, time, added value, quality) as well
as the relevant business objects and a key indicator system by which the
performance of the process is measured. The process specification then forms the
core of a specification book, based on which provider selection and a call for bids
for the BPO take place.

Service Provider Point of View


For the service provider, working with Horus models offers starting points in the
market-oriented design of the “process offer,” and subsequently in the sales process
and the implementation of acquired outsourcing contracts.
First, it is recommended that the service provider separate the reusable business
services it provides from the industry- and customer-specific business processes, as
presented in Sect. 2.1, where the Horus BPM architecture was introduced. This has
the advantage that, in customer-specific adaptations of the offered business process,
the business services can continue to be used, mostly unchanged. Based on this, the
service provider creates several industry-specific process templates to support its
marketing and sales. These templates are ideal for showing potential customers the
performance and benefits of BPO. As experience shows, such benefit analyses rest
on different forecasts of the expected process load. It is obvious that simulation can
be a valuable aid in such benefit analyses. In addition, an animation of simulation
runs as part of sales presentations has proven to be a convincing visualization tool.
Horus models are also used as part of the proposal and, above all, of the
specification preparation. They are used for the formal specification of the scope of
services and contain, in addition to the procedure models, resource, business object,
and key indicator definitions. Incidentally, so-called “value-based pricing” can be
achieved on this basis, i.e., pricing based on the actual benefit that is generated by
BPO.
The Importance of Industry Reference Models
The considerations for the use of models throughout the entire BPO contract cycle
(from a suitable offer on the market, through tendering and bid management, to the
implementation of the agreement) reveal a significant weakness. It arises from the
need to compare models that are created on the supplier and on the customer side,
and even to bring them into congruence if necessary. This can be an arbitrarily
complex and, in any case, time-consuming process. General reference models, as
offered by industry associations, for example, may solve this. Well-known examples
are the enhanced Telecom Operations Map® (short: eTOM®) of the TeleManagement
Forum and the SCOR® Supply Chain Operations Reference Model of the Supply
Chain Council (cf. Sect. 4.5.5). These reference models can then be used by both
parties as a common reference point, which obviously makes the mutual alignment
of the models much easier.

4.5.7 Social Innovation Management

In many industries, markets today show a volatility that, until a few years ago, we
had at most experienced in some segments of the financial industry. Business
models with a half-life of only a few years or even months are no longer
uncommon. This, too, is a characteristic of the digital economy. The drivers are in
many cases business process or service innovations that are made possible by new
digital technologies or by new uses of technology.

For innovations to be successful in the digital economy, speed and consistency
are, more than ever, crucial for competitiveness when asserting oneself in a market.
It is also important to use the knowledge of the entire business community for
innovation management. For this reason, more and more companies have started to
anchor their innovation management directly within the business community.
Harnessing the entire knowledge and creativity present in the community creates an
ideal basis for generating a continuous stream of innovative customer services and
business processes. This basis is all the more necessary as the social benefits of
innovation are increasingly being questioned in public debate (see Stiglitz and
Greenwald 2014).
For innovation management, we propose a method that is based on the
above-described Horus Method for Social BPM (cf. Schönthaler et al. 2012). Within
the business community, this forms social innovation networks in which domain
specialists, experts from different disciplines, ideally potential customers and
external partners in the value chain, as well as opinion leaders and idea providers
are connected to each other in a social network: the innovation community. The
community works using a Web-based collaboration platform on which, alongside
popular social media, intuitive software tools are used to graphically model
processes and services. Active participation in the community work contributes
each individual’s creativity and knowledge, which is then connected and amplified
in group-dynamic processes and leads to process and service innovations. The
quality of the innovation process depends on community members contributing
relevant knowledge, experience, creativity, and inspiration openly and
unconditionally, and on their being prepared to link these ingredients with those of
the other members in a way that benefits the community.
Figure 4.26 shows how services can be provided for the entire innovation
lifecycle around a social collaboration platform. In this way, not only can the actual
generation of innovations be promoted, but also the learning processes before and
during innovation generation as well as, as part of asserting the innovation in the
market, the accompanying research, development, and marketing of the innovation.
Collaboration thus becomes the lifeblood of innovation management and the driver
of agile business processes as a construction plan for the digital economy.

4.5.8 Sustainability of BPM Strategies: The Business Process Factory

Up until now this section has dealt with the important fields of action for the
implementation of a holistic BPM platform in a company. In practice, however, it
turns out time and time again that even after the consistent implementation of BPM
projects, the BPM commitment is gradually reduced (sometimes called "BPM
erosion"). The result is that only short-term benefits are achieved and significant
long-term potential remains untapped. Such situations can be avoided if, along with
the implementation of a BPM platform, organizational measures for the
establishment of a dedicated BPM organization are also defined and implemented.
In practice, a Business Process Factory concept has proven successful, as shown in
Fig. 4.27.
Fig. 4.26 Social innovation management in the innovation lifecycle: collaborative research, development, innovation, education, and marketing & sales services arranged around a social collaboration platform (on demand, SaaS/on premise), including social partner collaboration as well as education and consulting services

Fig. 4.27 Business Process Factory concept: the business community (internal and external) and corporate management (CPO) place process demands (business and integration focus) on the demand-driven, "breathing" factory, whose workers (factory manager, solution architects, process engineers, business analysts, administrators, developers) use BPM methods and tools and best-practice reference models, under the governance of a Process Governor, to deliver seamlessly integrated business processes; a services and integration manufacturing unit and enterprise applications teams with application-specific knowledge (Oracle Apps, SAP, MS Dynamics, …) support integrated workflows
The factory concept perceives the design, operation, monitoring, maintenance,
and continuous improvement of business processes as a factory in which workers of
different qualifications (the plant manager, solution architects, process engineers,
business analysts, administrators, and developers) carry out work sequences as
defined in their work schedules. It is important that this is a “breathing factory” that
can flexibly adapt its capacity to the actual utilization. The workers are provided
with powerful BPM methods and software tools as well as reusable best-practice
reference models. Internal and external members of the business community
approach the factory with needs for new processes or necessary process improve-
ments or changes. The Information Systems department also reports its needs,
although theirs are likely to be of a more technical nature, for example integration
requirements. The factory satisfies these requirements in the form of seamlessly
integrated business processes. The factory also has a separate department for the
development of services and integration capabilities. This department works closely
with the teams responsible for enterprise applications used in the processes. These
teams have specific knowledge in order to, for example, implement a business
service for customer administration in SAP or an interface to a cloud-based HCM
system.
From our experience, BPM governance is an important task of the factory.
A Process Governor or a governance team is responsible for compliance with the
regulations defined in a GRC concept, for high process quality and performance,
and generally for the orderly operation of the Business Process Factory. However,
agility must always be maintained in the applied work schedules.

4.6 Further Reading

McKeen and Smith (2014) give an introduction to IT strategy development. Mahon
(2015) or Ruparelia (2016) discuss cloud strategies or ways to migrate to the cloud,
as do Erl et al. (2013).
The EVACS method was developed by Haselmann and Vossen (2014), where
further details can be found. As mentioned in Sect. 4.1.3, the concept of a com-
munity is further explained by Williamson (2005) or Theurl and Meyer (2005). The
approach of a cooperative community cloud has originally been proposed by
Haselmann et al. (2011). The hybrid cloud intermediary shown in Fig. 4.6 is taken
from Haselmann et al. (2015) and is based on Rensmann (2012).
For an introduction to data mining, the reader should consult Han et al. (2012) or
Witten et al. (2016). The Apriori algorithm was originally proposed by Agrawal and
Srikant (1994); Agrawal et al. (1993) had previously introduced the problem of
association rule mining. The textbooks just mentioned also present revisions and
extensions of the original algorithm; for these, see also Leskovec et al. (2014) or
Zaki and Meira (2014). Sources like Han et al. (2012) or Witten et al. (2016) cover
other data mining topics as well, among them classification, clustering, and outlier
detection, which were mentioned above.
The transition from data warehousing 1.0 to what we have called the data
warehouse 2.0 in Sect. 4.2 has primarily been triggered by three factors: (1) the fact
that many analytical tools are nowadays available as stand-alone applications which
no longer require a data warehouse as their base, (2) the fact that many tools can
easily be plugged together today via their APIs, and (3) that many solutions have
become available as open-source. Several big data processing goals require specific
solutions; for example, looking for similarity of texts or documents can be based on
sophisticated techniques such as minhashing or locality-sensitive hashing as
described by Leskovec et al. (2014). Sentiment analysis is the topic of Liu (2015).
The process flow shown in Fig. 4.12 was originally described by Steffen (2013).
The social media analytics framework shown in Fig. 4.13 was originally proposed
by Stieglitz and Dang-Xuan (2013). The Stieglitz-Dang-Xuan framework has been
used, for example, by Ruland (2015) for doing an in-depth analysis of public
microblogging and parliament protocol data to find out how the German members
of the Bundestag as well as the German public perceives TTIP, the highly con-
troversial Transatlantic Trade and Investment Partnership under negotiation at the
time of this writing.
For details on the Smart Data Research Program we mentioned in the text, the
reader is referred to www.digitale-technologien.de/DT/Navigation/EN/Home/home.
html. Figure 4.15 on data marketplaces is originally from Muschalle et al. (2013);
the topic has intensively been studied by Schomm et al. (2013) or Stahl et al. (2014,
2016).
An introduction to distributed ledger technology can be found in ASTRI (2016),
to blockchain technology in Diedrich (2016) or Drescher (2017). These technolo-
gies have a number of applications, for example, in mortgage loan applications,
trade finance, digital identity management, regulatory compliance, or cryptocur-
rency. Most prominent in the latter area is Bitcoin, originally proposed by Naka-
moto (2008). Bitcoin is an example of an unpermissioned distributed ledger
platform, meaning that it is maintained by public nodes and is accessible to anyone.
Another platform type is the permissioned one, which involves authorized nodes
only and hence facilitates faster, more secure, and more cost-effective transactions;
an example is Corda (see www.corda.net/). The development of Corda is led by R3,
a fintech company that heads a consortium of over 70 of the world’s largest
financial institutions. Other such consortia are the Enterprise Ethereum Alliance
(entethalliance.org/), Ripple (ripple.com/), or Hyperledger (www.hyperledger.org/).
For up-to-date information on permissioned blockchains we refer the reader to the
respective Web page of IBM Fellow C. Mohan at bit.ly/CMbcDB.
5 Digitization and Disruptive Innovation

After having discussed technical developments over the last few decades and
presented strategies for IT-related decision making in specific areas, the message in
this chapter is that companies need to change the way they consider their customers,
as well as their internal business operations. One core keyword here is digitization,
another is disruption. Our goal is to discuss what innovation and disruption mean
and provide some typical examples of business disruption. The Christensen
theory states that traditional companies cannot be disruptive since they are busy
keeping their customers and fighting the competition; hence, they will ultimately
fail when disruptors take over. However, there might be ways even for a traditional
company to survive, which we will discuss.

5.1 Innovation. Social Innovation Labs

We begin with Wikipedia’s definition of innovation: “Innovation can be defined


simply as a ‘new idea, device, or method’. However, innovation is often also
viewed as the application of better solutions that meet new requirements, unartic-
ulated needs, or existing market needs. This is accomplished through more-effective
products, processes, services, technologies, or business models that are readily
available to markets, governments and society. The term ‘innovation’ can be
defined as something original and more effective and, as a consequence, new, that
‘breaks into’ the market or society. It is related to, but not the same as, invention.
Innovation is often manifested via the engineering process.”
If we look back at previous chapters, we have already discussed various
approaches to digitization and innovation: In Chap. 2 we presented examples of
digitized business processes; in Chap. 4 we outlined consumerization and the
digital economy. We have already stated our belief that in a globalized world,
business processes increasingly form the crux of any organization, since any change
in an organization will be accompanied by changes inseparable from its business

processes.

Fig. 5.1 Innovation as a cycle: new product or service ideas (from market research and R&D insights), opportunity evaluation and value proposition (SWOT analysis), pilot development and proof of concept (testing, verification and validation), development and preparation of market entry, and delivery to early adopters

Moreover, changes in the global marketplace are common on the daily
agenda of any business: Companies are increasingly forced to adapt to
(new) markets, customers, competitors and business partners, but also to new
requirements in terms of governance, risk, compliance, and security management
(GRC+, see Chap. 2). We also saw that, “thanks” to Big Data, numerous options
exist today to enhance a business process or the knowledge that a company has
about customers, services, and products, provided the respective enterprise is aware
of these options and is willing or even has a strategy to exploit them.
Most often, innovation is seen as a process, often one that is cyclical in nature,
like the one shown in Fig. 5.1. It typically starts with a development of ideas for
new services or products or from the recognition or discovery of novel customer
needs. This may or may not be based on market research,1 or just on brainstorming
or insights from a company’s research and development (R&D) department. Once
an innovative idea has been fixed, opportunity evaluation can start, as can value
proposition, the former possibly based on a SWOT analysis or even other tools
from the Horus method described in Chaps. 2 and 4. This phase considers the risk
of failure, the cost of development, the return on investment (ROI), and as well as
the competition.
If an evaluation indicates positive opportunities, a prototype/pilot of the product
or service can be developed, in order to conduct a proof of concept, possibly with
selected test customers or users. If the intended purpose can indeed be verified and
the considerations around a wider introduction be validated, development can start.
This phase will be accompanied by a preparation of the market entry (through
marketing and sales departments), and finally delivery can first go to early adopters
and then to a wider audience.

1 Former Apple CEO Steve Jobs never believed in market research.
Not surprisingly, these phases, sometimes abbreviated just as Definition,
Discovery, Development, and Delivery, will have touchpoints with existing business
processes and nowadays make extensive use of big data. As we have discussed,
data can be acquired from reviews, blogs, and social media, to find out what people
say about a (new) service or product, recommenders can be used to produce
awareness, and data mining techniques like classification can be employed to
identify pilot customers or those to whom the new product is offered first.
Occasionally, the cycle shown in Fig. 5.1 involves the development of a new
business model, typically at the very beginning when a new product or service is
being conceived. A typical example is Apple’s iTunes store. When the first iPhone
was released in 2007, which for the first time combined a mobile phone with a
PDA, a music player, and a Web-capable device, a new business model for
marketing and distributing music came with it, which soon expanded into a
distribution and sales channel for software (in the form of apps) as well.
We mentioned cloud revenue models in Chap. 2. Among them is the freemium
model, which combines free basic services with charging for an advanced one (or
often free access to services in exchange for commercial use of the customer’s
data), as well as the subscription model. Online advertising as a prominent Web
business model was discussed in Chap. 3. Other Internet business models that have
evolved over time (many of which are “digitized” versions of classical business
models) include the following:

• Auctions: the process of buying and selling goods or services by offering them
up for bid, taking bids, and then selling the item to one of the bidders; among
them

– English auction: items are sold to the highest bidder.
– Vickrey auction: a sealed-bid auction in which the item goes to the highest
bidder, who pays the price offered by the second-highest bidder.
– Dutch auction: the price of an auctioned item descends rather than increases.
Examples of English auctions in particular are eBay or TradeMe (the latter in New
Zealand). Dutch auctions are typically used for perishable products that need to
be sold and consumed within a short period (a small sketch of the different pricing
rules follows this list).

• Brokerage: brings together buyers and sellers and charges a fee per transaction
to one or another party. Examples are Charles Schwab or Bayleys. We also
described the concept of a cloud intermediary in Chap. 4, which is a form of
brokerage.
• Razor and Blades: where an item is sold at a low price (or even given away for
free), in order to increase sales of a complementary good, such as supplies
needed to use the item. An example of this is printers (inkjet or laser), which are
typically amortized via their consumable supplies. The concept, also known as
freebie marketing, is wrongly attributed to the founder of the Gillette Safety
Razor Company, who gave away razors for free and made people pay for the
blades they needed. The concept is still common in the mobile phone market,
where devices are often subsidized via the contract.
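The following sketch, referred to above, illustrates the different auction pricing rules; the bid values are invented, and the Dutch auction is reduced to the first bidder whose limit is reached by the descending price:

def english_auction(bids: dict) -> tuple:
    """Winner pays their own (highest) bid."""
    winner = max(bids, key=bids.get)
    return winner, bids[winner]

def vickrey_auction(bids: dict) -> tuple:
    """Winner is the highest bidder but pays the second-highest bid."""
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    winner, _ = ranked[0]
    return winner, ranked[1][1]

def dutch_auction(start_price: float, step: float, limits: dict) -> tuple:
    """Price descends until the first bidder's limit is reached."""
    price = start_price
    while price > 0:
        for bidder, limit in limits.items():
            if limit >= price:
                return bidder, price
        price -= step
    return None, 0.0

bids = {"alice": 120, "bob": 150, "carol": 135}
print(english_auction(bids))         # ('bob', 150)
print(vickrey_auction(bids))         # ('bob', 135)
print(dutch_auction(200, 10, bids))  # ('bob', 150)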

Essentially, a business model is characterized by the following four components:

• A value proposition, i.e., a statement of how the products or services offered can
create value for the customer.
• A revenue model, i.e., a description of the cash flows that will generate profit.
• A specification of the target customer or the market segment to which the
products or services are to be offered for the purpose of creating value and
revenue.
• Distribution channels, the actual ways in which the company plans to reach its
customers.

In the examples of Web companies we have considered before, these components
are easily verified. For example, Google’s value proposition is to “improve the
ways people connect with information.” Its revenue model is to make search free
(as is the case with many other services, but not all), yet make money through
online advertising. The target customer can be anybody, and the single distribution
channel is the Web. By the same token, Amazon’s value proposition is that people
can order any product from anywhere, anytime, and the revenue model consists of
commission taken from every purchase. For Netflix, the value proposition is to
watch all your favourite movies from the comfort of your home or anywhere you
wish, revenue is made through a subscription fee, and distribution channels include
the Web and bundling with others, e.g. Apple TV or Amazon Fire TV Stick.
The ingredients of a business model have been neatly compiled in the Business
Model Canvas, which was developed by Swiss business theorist Alexander
Osterwalder, founder of Strategyzer (strategyzer.com/), and colleagues, and which
is nowadays seen as a strategic management template for developing new, or
documenting existing, business models, as is shown in Fig. 5.2.

Fig. 5.2 The Business Model Canvas with its nine building blocks: Key Partners, Key Activities, Key Resources, Value Propositions, Customer Relationships, Channels, Customer Segments, Cost Structure, and Revenue Streams


Each box of this
visual chart poses different questions, and the idea is to collect all the information
relevant to a business model for a specific purpose or area. We now list the
questions contained in each box; notice that these questions are not strictly separate,
but also connect the various boxes or aspects:

• Key Partners:
Who are our key partners?
Who are our key suppliers?
Which key resources are we acquiring from partners?
Which key activities do partners perform?
• Key Activities:
What key activities do our value propositions require?
Our distribution channels?
Customer relationships?
Revenue streams?
• Key Resources:
What key resources do our value propositions require?
Our distribution channels?
Customer relationships?
Revenue streams?
• Value Propositions:
What value do we deliver to the customer?
Which of our customer’s problems are we helping to solve?
What bundles of products and services are we offering to each customer
segment?
Which customer needs are we satisfying?
• Customer Relationships:
What type of relationship does each of our customer segments expect us to
establish and maintain with them?
Which ones have we established?
How are they integrated with the rest of our business model?
How costly are they?
• Channels:
Through which channels do our customer segments want to be reached?
How are we reaching them now?
How are our channels integrated?
Which ones work best?
Which ones are most cost-efficient?
How are we integrating them with customer routines?
• Customer Segments:
For whom are we creating value?
Who are our most important customers?
• Cost Structure:
What are the most important costs inherent in our business model?
Which key resources are most expensive?
Which key activities are most expensive?
• Revenue Streams:
For what value are our customers really willing to pay?
For what do they currently pay?
How are they currently paying?
How would they prefer to pay?
How much does each revenue stream contribute to overall revenues?

In addition to these questions, each box lists further aspects or asks for various
details, e.g., motivations for partnerships, categories of key activities, types of
resources, characteristics of the value proposition or the cost structure, channel
phases, or types of revenue. The Business Model Canvas (strategyzer.com/canvas)
has received considerable support and can provide guidance for not only existing
companies who enter into new business areas, but also for startup companies who
need to better structure their product and business idea. As a sample application of
the canvas, consider Fig. 5.3, in which the Business Model Canvas has been
applied to Google as an example.
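As a purely illustrative aside, the nine building blocks of the canvas can also be captured as a simple data structure, which makes it easy to check which blocks of a draft model are still unanswered; the class below and the sample entries (taken from the Google example in Fig. 5.3) are our own illustration, not part of the canvas itself:

from dataclasses import dataclass, field, asdict

@dataclass
class BusinessModelCanvas:
    key_partners: list = field(default_factory=list)
    key_activities: list = field(default_factory=list)
    key_resources: list = field(default_factory=list)
    value_propositions: list = field(default_factory=list)
    customer_relationships: list = field(default_factory=list)
    channels: list = field(default_factory=list)
    customer_segments: list = field(default_factory=list)
    cost_structure: list = field(default_factory=list)
    revenue_streams: list = field(default_factory=list)

    def empty_blocks(self) -> list:
        """Blocks that still need answers to the questions listed above."""
        return [name for name, entries in asdict(self).items() if not entries]

google = BusinessModelCanvas(
    value_propositions=["Web search", "Gmail", "AdWords", "AdSense"],
    customer_segments=["Internet users", "advertisers", "developers"],
    revenue_streams=["ad revenues", "enterprise product sales"],
)
print(google.empty_blocks())  # the blocks that have not been filled in yet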
An important question for companies that have some history already, that have
been in their market for some time, and that have had a fair degree of success, is
how to produce innovation. How can they keep up with new technical develop-
ments, changing customer tastes and demands, or product or service innovations
made elsewhere by the competition? One way, which we will discuss in Sect. 5.3
below, is to disrupt an existing industry or business approach by a totally new and
unique way of doing things. However, as we will see, this might be difficult for an
established company that has to continue to pay its employees and that needs to
monitor market and competition in order to keep up.

Fig. 5.3 Business Model Canvas applied to Google: key partners (distribution partners, Open Handset Alliance, OEMs for Chrome OS devices), key activities (managing a massive IT infrastructure, R&D for new products), key resources (data centers, intellectual property, brand), value propositions (Web search, Gmail, Google+, AdWords, AdSense, Android and Chrome OS platforms, hosted Web-based apps), customer relationships (automation, dedicated sales for large accounts), channels (global sales and support teams, multi-product sales force), customer segments (Internet users, advertisers, ad agencies, mobile device owners, developers, enterprises), cost structure (traffic acquisition, data center operations, R&D personnel), and revenue streams (ad revenues, enterprise product sales)



In this situation, a different approach that has been made popular through a
variety of (successful) examples is to establish an innovation lab. Such a lab is a
physical or virtual space intended for the initiation, conception and testing of
innovative ideas. Based on some relevant infrastructure, an innovation lab enables
cooperation and collaboration among people from distinct backgrounds and disci-
plines, which may even include (future) customers. The main goal of such a lab is
the interdisciplinary exchange of ideas, information, and knowledge, and the
underlying approach is often based on design thinking. If an innovation lab is run
by an established company, it will ideally be integrated into the company’s inno-
vation process(es).
Innovation labs, which are often company-owned, but might also be run by a
collection of different entities, e.g., universities, business development agencies,
chambers of commerce, sponsors, etc., can be seen in a wider context as one form
of digital lab; other forms are:

• Company Builders are intended to found startups themselves and accompany their initial growth steps.
• Accelerators offer programs that last several months for which founders can
apply; if successful, they receive support in almost all business areas.
• Incubators take a share in startups, yet unlike accelerators take a long-term
perspective.

Another type of innovation lab is the co-working lab, where typically people
with vastly different backgrounds are working towards a common (innovation)
goal. Rather than discussing these types in detail, we look at a few examples. We
note, however, that these labs are often referred to as social innovation labs because
they benefit from the social interaction of their participants. Our first example is
from Deutsche Bahn (DB) AG, the main railway
operator in Germany. Although in recent times DB is not exactly known for
punctuality and flawless technology, it has launched a number of initiatives that aim
to improve current processes and procedures or take a look at future developments.
One such virtual activity is Bahn.de/ideenschmiede, where customers can submit
new ideas and contribute to the development of new products; the platform
regularly runs competitions in which anybody can participate, and people can also
comment on product suggestions that have been made. A recent (and probably not
too serious) proposal, for instance, suggested adding a sauna car to German trains
so that passengers can enjoy spa amenities during long-distance trips.
The Web site inside.bahn.de/innovation/ lists a number of other DB approaches to
innovation. Our second example is the Porsche Digital Lab, opened in Berlin,
Germany, in August 2016 with two primary goals: first, to identify and test
innovative information technologies relevant to motor vehicles, customers, and
staff; second, to serve as a platform for collaboration with other technology
companies, venture capitalists, startups, and the research community.

Another form of lab is what German air carrier Lufthansa is experimenting with.
A Flying Lab is an event that takes place on board a flight and resembles a mini
tech conference: distinguished speakers discuss future tech scenarios and how
human and machine are converging through digitization (or designers present their
latest fashion collection), and passengers can follow (and comment) via the
onboard WLAN network.
A virtual lab similar to the one just described was established by Starbucks under
mystarbucksidea.force.com/, where people can publish ideas regarding an
improvement of Starbucks’ business, other people can comment and essentially
“vote” on them, and top-rated ideas might be put into action by Starbucks. For
example, business people who work near a Starbucks outlet and regularly have their
morning coffee there suggested a way to skip the line and have their coffee ready as
soon as they enter the shop. As a result, in some countries (including the UK)
Starbucks now enables customers to order over a smartphone app so that the
product is waiting for them at a specific pickup time.
The basic idea of social innovation labs is that the members of an innovation
community collaborate in a social network, to exchange ideas about how to
overcome identified disharmonies, to define objectives, strategies, product struc-
tures and requirements, to prepare models of business processes and services or
even “just” to find a common understanding of a disharmony and solution
requirements.
Participants perceive the lab as a unique experience in which they accomplish
tasks together as a team, take on responsibility and contribute new ideas. Group
dynamic processes regularly produce surprising results that may help overcome
barriers, identify compromises, and strengthen the sense of community. The lab
acts as a catalyst for creativity and willingness to compromise and helps to form a
common understanding of disharmonies, products, processes, and services.
Finally, lab collaboration sometimes confronts participants with exceptional
situations in which they have to deal with uncommon or incorrect behavior of
collaboration partners, with poor quality of results, or with misleading instructions.
In such a lab, the entire innovation community should be represented, ideally
including employees across all relevant organizational units of all hierarchical
levels, customers across all target market segments, strategic business partners
including suppliers, and external advisors (consultants, spin doctors, researchers).
This does not necessarily mean that there has to be one member from every
community group involved—often it is sufficient if a community member repre-
sents the interests of an entire group. It is necessary, however, that the represen-
tative provides a distinct understanding of the needs, feelings, or goals of the group
she or he represents.
Various roles should be represented in a social innovation lab: The Moderator
establishes an initial structure of the innovation sphere, in which s/he forms groups
(teams) of innovation community members. An essential task is to guide commu-
nity members through the lab. The Lab participants are allocated to the teams
according to their competence and expertise, developing multi-site innovation
teams. In each team a Leader will be identified who will supervise integrative tasks

Social Innovation Lab activities at each stage of the innovation process:

• Sensing: community brainstorming, living the disharmony, product & service modeling, simulation and analysis
• Envisioning: living the innovation ideas, product & service modeling, product & service prototyping, simulation and animation, visualization
• Offering: simulation and animation, demonstration of prototypes, visualization, service prototyping
• Adopting: visualization, demonstration of pilot, education, piloting support, pilot improvement
• Sustaining: advanced education, problem solving, further improvements
Fig. 5.4 Social innovation labs supporting invention and adoption processes

and take responsibility for the team results. Depending on the size of the Lab and
the knowledge of the participants, Quality managers are appointed for technical and
substantive review of the resulting models. Experts On Demand are available as a
point of contact for questions on methodology, modeling and tool use.
Social innovation labs particularly support the innovator's skills for social
interaction, and these skills are of paramount importance especially in generative
innovation environments. There are therefore interesting applications for Social
Innovation Labs during all stages of innovation, as shown in Fig. 5.4. They range
from supporting brainstorming activities and the experience of disharmony through
to modeling products, services and processes. By using Horus tools in the context
of the Labs, extensive analyses and simulations are possible with reasonable effort.
For service and process innovations, Horus also supports the construction of
prototypes and the piloting of innovations. Of course, Social Innovation Labs also
contribute to the training of members of the innovation community.
We conclude this section by noting that innovation labs have role models in
Tech Shops, a concept not restricted to IT applications: “TechShop provides access
to instructional classes, events, and over $1 million worth of professional equip-
ment and software at each location. Each of our facilities includes laser cutters,
plastics and electronics labs, a machine shop, a woodshop, a metalworking shop, a
textiles department, welding stations, class and conference rooms, and much more.
Members have open access to design software, featuring the Autodesk Design
Suite. Huge project areas with large work tables are available for completing
projects and working with others” (www.techshop.ws/).

5.2 Digital Transformation. The Chief Digital Officer

The emerging business environment today is characterized by the fact that every-
thing is becoming digital: Airline tickets are not printed anymore; banking
transactions largely take place electronically (and banks even charge for manual
transactions); reading books or magazines has moved from physical prints to
digital versions on e-readers, iPads, or even smartphones; music and movies have
moved from storage devices that can physically be handled to streaming over the
Web. While these developments can be seen as one of the many consequences of
Moore's Law, which we discussed in Chap. 1, and of technological progress in
general, we are now in a stage where digital transformation is incurring change in
every aspect of business, and indeed almost every aspect of society. While initially
perceived as "going paperless" only (which has never happened to its full potential,
at least not until the time this book was written), it is nowadays seen as more than
just enhancing traditional methods of doing business (e.g., in travel agencies or
banks); it is seen as an enabler of innovation and new forms of creativity, no longer
restricted to a particular domain.
“Digital transformation is the profound and accelerating transformation of
business activities, processes, competencies and models to fully leverage the
changes and opportunities of digital technologies and their impact across society in
a strategic and prioritized way, with present and future shifts in mind. The devel-
opment of new competencies revolves around the capacities to be more agile,
people-oriented, innovative, customer-centric, aligned and efficient. The goal is an
ability to move faster from an increased awareness capability regarding changes to
decisions and innovation, keeping in mind those changes” (www.i-scoop.eu/digital-
transformation/). In Sect. 5.5 we will return to Amazon as a perfect example of
taking advantage of digitization and constant digital transformation.
In a blog post in the MIT Sloan Management Review in January 2014, George
Westerman, Didier Bonnet, and Andrew McAfee identify the nine elements of
digital transformation shown in Fig. 5.5 (see sloanreview.mit.edu/article/the-nine-
elements-of-digital-transformation/).
They consider digital transformation an opportunity “to radically improve per-
formance or reach of enterprises” and identify the three main building blocks shown
in Fig. 5.5. In Chaps. 3 and 4 we have already discussed the first aspect of trans-
forming the customer experience; indeed, we argued that companies are nowadays
interested in providing an absolutely smooth customer journey, and the digital
workplace is now the state-of-the-art. The second aspect of transforming opera-
tional processes was dealt with in Chaps. 2 and 4. The study performed by
Westerman et al. revealed that companies see automation as a way to free their
employees for more strategic tasks, and that the virtualization and digitization of
work enables them to separate actual work from the location where it is performed.
Moreover, we have already seen that Big Data analytics will enable decision makers
to make better decisions. The last aspect of transforming business models refers to a
transformation of how business is conducted, e.g., in the food or banking industry

• Transforming customer experience: customer understanding, top-line growth, customer touch points
• Transforming operational processes: process digitization, worker enablement, performance management
• Transforming business models: digitally modified businesses, new digital businesses, digital globalization
Fig. 5.5 The nine elements of digital transformation

or in retail, to the introduction of new digital products, e.g., digital tracking devices
that complement physical sports gear, and to an extension of the business from a
multinational to a truly global operation.
Let us continue with the examples just mentioned for a moment: Fintechs, a
modern shorthand for financial technology and used as a synonym for companies
using innovation and current technology for competing with traditional financial
institutions, originally tried to just reinvent payment applications, lending, and
money transfers. Meanwhile, they have expanded into more than thirty banking
areas, according to a January 2017 McKinsey report.2 These areas cover insurance,
investment banking, wealth management in addition to the ones already mentioned,
and they even cover issues beyond banking, such as virtual marketplaces or
couponing. Fintechs have also pioneered “robo advisors” like vaamo or Whitebox,
which are increasingly replacing human advisors in the banking business and have
gained popularity even with a number of traditional banks already. Typical
examples of innovative fintechs are Ripple (ripple.com/), Simple (www.simple.
com/), the Fidor Bank (www.fidor.de/), or solarisBank (www.solarisbank.de/). The
buzzword in all of these examples is "platform." Successful participants in digiti-
zation have created platforms that offer new ways of doing business and, further-
more, integrate a number of services previously unrelated to the business at hand.
In the food sector, stores and chains typically have an ordering problem when it
comes to fresh food: If they order too much, food is partially wasted; if they order
too little, they lose sales opportunities. Here, machine learning algorithms can help
retailers to determine their optimal stock levels, and we can expect that, once again,
Amazon will be at the forefront of new developments here when Amazon Machine
Learning gets applied to online grocery shopping service Amazon Fresh.
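
As a rough illustration of the kind of approach such a retailer might take, the following sketch forecasts next-day demand for a single perishable item from recent sales history and derives an order quantity. The sales data, the feature choice, and the safety margin are purely illustrative assumptions, not a description of Amazon's or any retailer's actual system.

```python
# A minimal demand-forecasting sketch for a fresh-food item (illustrative only)
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical daily sales of one perishable item (units sold per day)
sales = np.array([42, 38, 55, 61, 47, 33, 29, 44, 40, 58, 63, 49, 35, 30,
                  45, 41, 57, 65, 50, 36, 31])

# Features: day of week and sales on the previous day
X = np.array([[i % 7, sales[i - 1]] for i in range(1, len(sales))])
y = sales[1:]

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# Forecast tomorrow's demand and add a buffer to limit lost sales,
# while keeping over-ordering (and thus food waste) in check
tomorrow = np.array([[len(sales) % 7, sales[-1]]])
forecast = model.predict(tomorrow)[0]
order_quantity = int(np.ceil(forecast * 1.10))  # 10% safety margin (assumption)
print(f"forecast: {forecast:.1f} units, order: {order_quantity} units")
```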

2 www.mckinsey.com/business-functions/digital-mckinsey/our-insights/three-snapshots-of-digital-transformation

An important point in this context is the willingness to participate in digitization,
which has to include the top management of a company or agency. Several
examples illustrate this point. German steel giant Kloeckner is in the process
of moving its business into the digital age: Under www.kloeckner-i.com they
present various digital products for the steel industry, including a contract platform
for real-time information about contracts and direct ordering, a Web shop, a mill
certificate platform, and several notification services. They explain their goals as
follows: “Based on our digital solutions we are redesigning all supplier and par-
ticularly customer-related processes to be simpler and more efficient. And
kloeckner.i in Berlin plays a key role in making this happen." However, kloeckner.i
goes beyond pure digitization of the existing supply chain, e.g., via predictive sales;
it also supports startups that try to disrupt Kloeckner's own business.
Our second example is from urban development and relates to the smart city
movement that was already mentioned in Chap. 3. We described what the city of
Milton Keynes, UK is doing in that respect, in particular in the context of their MK:
Smart initiative. Another example going in a similar direction, but also showing that
top-level consent and willingness is needed, is the city of London in the UK with its
Smart London Plan,3 intended for “using the creative power of new technologies to
serve London and improve Londoners’ lives.” As it says in the Executive Summary
of the plan: “The Mayor’s view is clear. To support London’s future growth, we
must look to what new approaches innovation in digital technology can bring. This
Smart London Plan is for Londoners, businesses, researchers, investors and
everyone who has an interest in the capital’s future. The Plan sits within the
overarching framework of the Mayor’s Vision 2020.”
Among the things London is doing within this initiative4 is the London Data
Store,5 which “has been created by the Greater London Authority (GLA) as a first
step towards freeing London’s data. We want everyone to be able to access the data
that the GLA and other public sector organizations hold, and to use that data
however they see fit—for free. The GLA is committed to using its connections and
influence to request other public sector organizations into releasing their data here
too, and it’s an objective backed strongly by Sadiq Khan, Mayor of London.” The
city is calling upon startups to generate ideas of what to do with that data and how
to exploit it. The Datastore comprises data on arts and culture, business and the
economy, crime and community safety, demographics, education, employment and
skills, environment, health, housing, welfare, sport, or transport. For example, the
transport section contains London Underground Performance Reports, where
anybody can see the total number of lost customer hours as well as other perfor-
mance measures; if a startup now has an innovative idea on what to do with that
data or how to improve on what that data states, it can do so and get in touch with
the City of London.
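
To illustrate what working with such open data might look like, the following sketch loads a (hypothetical) download of the Underground performance reports and aggregates lost customer hours per line. The file name and column names are assumptions made for this example; the actual datasets and their schemas are listed on data.london.gov.uk.

```python
# A minimal sketch of exploring an open dataset from the London Datastore
import pandas as pd

# Assume a downloaded CSV with one row per line and reporting period
df = pd.read_csv("lu_performance.csv")  # hypothetical local file

# Total lost customer hours per line, worst lines first (column names assumed)
lost = (df.groupby("line")["lost_customer_hours"]
          .sum()
          .sort_values(ascending=False))
print(lost.head(10))
```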

3 www.london.gov.uk/sites/default/files/smart_london_plan.pdf
4 www.london.gov.uk/what-we-do/business-and-economy/science-and-technology/smart-london
5 www.data.london.gov.uk/

Just like other core competencies in a company have brought along a number of
positions in the “CxO” domain over the years (e.g., CEO, CFO, COO, CIO, or
CTO), there is room for a new one called the Chief Digital Officer (CDO): an
individual who helps an enterprise or a government agency, whether at a national,
regional, or even local level, to drive growth by converting traditional analog
businesses to digital ones using online technologies and data, at times overseeing
operations in rapidly changing sectors like mobile applications, social media and
related applications, virtual goods, as well as Web-based information management
and marketing.
As we said earlier, top management needs to be on board when it comes to
digital transformation. A CDO will typically report directly to the CEO of a
company, and possibly also to its board of directors and its shareholders. A CDO
will be able to monitor the main opportunities of digitization: improved process
efficiency, more effective governance, risk, and compliance (GRC) management,
better response to customer needs, cost savings, and the development of new
business models.

5.3 Disruption

We have already seen a number of ways in which the world is becoming
increasingly digital: printed airline tickets, manual bank transfers and signed
checks, as well as many other items of daily life, have largely disappeared from view.
We have also seen, for example, crowdfunding as a new way to finance startups or
projects. But what is the difference, for example, between (a) the development of an
app through which taxis can be ordered online, but otherwise offers a standard taxi
service, and (b) a taxi service, like Uber, that does not even own taxis, but is
expected to reduce car sales by 20% worldwide? The former is digitization (which
even occurs gradually in this example), while the latter is disruption. Similar
examples, in which we follow Tom Goodwin, senior vice president of strategy and
innovation at Havas Media as he wrote in a 2015 essay on TechCrunch.com,
include:

• Airbnb: the world's largest accommodation provider owns no real estate;
• Skype, WeChat: the largest phone companies own no telco infrastructure;
• Alibaba: the world’s most valuable retailer has no inventory;
• Facebook: the most popular media owner creates no content;
• SocietyOne (Australia): one of the fastest growing banks has no actual money
(similar comments may apply to other fintechs);
• Netflix, Amazon Prime Video: the world’s largest movie houses own no
cinemas.

According to Clayton Christensen, Kim B. Clark Professor of Business Administration at Harvard Business School, who coined the term around 1995, "disruptive
innovation describes a process by which a product or service takes root initially in

simple applications at the bottom of a market and then relentlessly moves up


market, eventually displacing established competitors.” Examples include cell
phones, which disrupted fixed-line telephony, and personal computers, which dis-
rupted mainframes and minicomputers. He argues that traditional companies
innovate faster than their customers’ needs evolve, but in doing so can only
maintain the higher tiers of their respective market in order to stay profitable. They
focus on an improvement of their products and services for their best and most
lucrative customers, thereby neglecting other customer segments. A typical
example is the German car manufacturers, who bet on Diesel engine technology for
ages yet largely overlooked developments like hybrid or electric power sources; by
the time they noticed, other manufacturers, most notably from Japan, already had
many years of experience in this field.
It is these other customer segments that disruptive companies attack first. For
traditional companies sustaining innovation in their area and maintaining prof-
itability in their lucrative segments, disruptors at the low end of their market often
go unnoticed. Christensen continues that “an innovation that is disruptive allows a
whole new population of consumers at the bottom of a market access to a product or
service that was historically only accessible to consumers with a lot of money or a
lot of skill.” The disruptors slowly work upwards and deliver what a large share of
the customers want; the latter gradually accept the new offers and thus make dis-
ruption happen. A disruptive innovation may even create an entirely new market
and value network, thereby disrupting an existing market and value network and
displacing established market leading firms, products and alliances.
A typical example comes from the photo industry: The world of photos was
dominated for more than a century by film development based on chemistry. Yet in
less than 20 years it was displaced by digital photography, companies like Kodak
lost their means of existence, and photography became readily available for the
masses. Google disrupted major portions of the advertising business in less than ten
years; in less than five years Apple first seized the business model of the music
industry, then that of mobile phone manufacturers.
Christensen distinguishes three forms of innovation: Efficiency innovation is about
making processes like production or sales more efficient, so that more can be
achieved with less effort. Incremental innovation is concerned with improving a
product that is already good, such as a better car, thereby replacing an existing
product with a new one (which will hardly result in growth). Finally, disruptive
innovation transforms a product or service that has so far been complex and
expensive into a simpler and cheaper one, so that more and new customers can
afford it. According to Christensen, this is the only form of innovation that results in
true growth.
Hence not all innovations can be called disruptive, even if they are revolu-
tionary. For example, Ryanair is not a disruptor of the airline industry, since the
airline market remained intact; flying only got more accessible due to radically
lower ticket prices (and the concept that the price for a seat in a plane might be the
cheapest part of an airline ticket). Similarly, the first automobiles in the late 19th
century were expensive luxury items unable to disrupt the market for horse-drawn

vehicles. Transportation was disrupted only later by the arrival of the Ford Model T
in 1908. Mass-production of cars was a disruptive innovation since, due to the new
production lines Ford installed, cars all of a sudden became affordable to many
people.
Christensen even has advice for companies on how to respond to disruption:
React to it when it happens, but do not overreact and don’t give up your established
business! Instead, strengthen your relationships with the most important customers
and invest in innovation. In addition, companies can create new business units
focusing on the opportunities of disruption, such as the social innovation labs
described earlier. For
some, the options to react are limited: The taxi business has no chance of competing
with the Uber model; the only action they could take to compete would be to buy
Uber, and since this will hardly happen, they can only continue to run their
established businesses for as long as possible. (Of course, there are other possible
actions, already being taken in some countries, namely trying lawsuits against Uber
or simply political lobbying.)
As an aside, we mention that Uber is not considered sustainable by a number of
people for various reasons. Among them is the city of Austin, Texas, which has
banned Uber and created its own service. “Unlike Uber, RideAustin is a non-profit.
It charges $2 off the top, and the driver keeps the entire fare—including tip. This
isn’t a model Uber can compete with, even though rides are competitively priced”.6
The following technologies are currently seen by many as the core technologies
for the immediate future upon which much of innovation and disruption will be
based:

• Mobile Internet
• Automation of knowledge work
• Internet of Things
• Self-driving cars
• Cloud technology
• Advanced robotics
• Genome research
• Energy storage
• 3D printing
• Unconventional oil and gas extraction
• Renewable energy

We are convinced that a particular role will be played by 3D printing technology,
which is already revolutionizing fields as diverse as various production areas,
logistics, jewelry, and dental prostheses.

6 www.thenextweb.com/apps/2017/03/15/sxsw-showed-us-the-future-of-ride-sharing-and-its-not-uber/

5.4 The Price of Data. Publicity Versus Privacy

Much has been written in previous chapters about making the customer transparent by
utilizing all the data he or she creates or simply leaves behind. We discussed how
retailers like Amazon use data mining techniques to analyze what customers might
like to buy, how streaming services like Netflix try to recommend movies a sub-
scriber might like to watch, or how social networks like Facebook deeply analyze
user activities in order to show them the “right” advertisements. They all follow the
slogan “the more you fill in, the better your experience will be.” Indeed, the
experience on the Web today is that “innocently clicking on a link results in ad
targeting that’s hard to shake and our purchases quickly reveal more information
than we intend, such as the infamous example of Target knowing a woman is
pregnant before she’s told her family—and before she’s purchased any baby
products.”7
We also discussed how health or car insurers already are or soon will be utilizing
customer data in order to predict their individual risk and base premiums on the
resulting ratings. Various questions come to mind in light of this situation, which is
close to, if not already beyond, Orwell's "1984" vision:

• Is the massive utilization and exploitation of data, of which we are currently
experiencing just the beginning, a cure or a curse?
• What actually is the value of an individual’s data?
• What would happen if a person drops out of this data collection madness?
• Is it at all possible to withdraw from it?

We can immediately answer "no" or "possibly not" to the last question: although
many sites allow users to configure their own settings, and users can simply stay
away from certain sites, a complete withdrawal would mean giving up Internet
usage as we know it. But if data collection and analysis continues as we
experience it today, and there is no reason to assume otherwise, some of the
implications will be the following:

• Dating sites will be able to predict when you are lying.
• Surveillance will soon get really Orwellian and make the 2054 vision of the
“Minority Report” movie, where people can get arrested for future crimes, a
reality. The Los Angeles Police Dept (LAPD) is already using predictive
policing software and has successfully prevented crimes.
• Recommendation engines will get much smarter since they will get to know
users much better than today.
• Scientists and doctors will be enabled to make sense of your genome and will
routinely use that information to predict or cure diseases. However, insurers will
also want to get hold of that data.

7 www.fastcoexist.com/3057514/your-data-footprint-is-affecting-your-life-in-ways-you-cant-even-imagine

• Intensely personal data gets crunched in order to attract customers (or bribe or
blackmail them); take a look at stalkscan.com to find out what Facebook already
knows about you today, which may be more than your mother does, according
to a 2015 study.8 If you install browser extension dataselfie (dataselfie.it), it can
show you your own data traces and reveal how machine learning algorithms use
your data to gain insights about your personality while you are on Facebook.
• Early detection can mitigate catastrophes. This will be particularly beneficial for
earthquake forecasting and prediction and for sites like www.quakeprediction.com,
which so far mostly explore statistics and evaluate computational models. By the
same token, weather forecasts might become significantly more accurate than
they are currently.

The value of data can be assessed in various ways and from various perspectives,
such as that of the shareholder, the company, or the individual user. Examples of the
first perspective are easy to find: When Facebook bought WhatsApp for US$ 19
billion in 2014, that amounted to roughly US$ 30 for each of the network's 600
million users at the time. Facebook paid the same amount per user when it acquired
Instagram in 2012. In 2016, however, when Microsoft bought LinkedIn for US$
26.2 billion, that price tag had already doubled (LinkedIn had about 433 million
users at the time, resulting in roughly US$ 60 per user). For Microsoft, this was still
highly beneficial, since it now got access to LinkedIn's social graph, users' location
and address information, as well as user interests and skills, none of which it had
before.
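
These per-user figures follow directly from dividing the purchase price by the user base at the time of acquisition; a small sketch using only the rounded figures quoted above makes the arithmetic explicit (the Instagram user count at acquisition is omitted here).

```python
# Implied price per user for the acquisitions mentioned above (rounded inputs)
deals = {
    "WhatsApp (Facebook, 2014)": (19e9, 600e6),
    "LinkedIn (Microsoft, 2016)": (26.2e9, 433e6),
}
for name, (price, users) in deals.items():
    print(f"{name}: about US$ {price / users:.0f} per user")
# prints roughly US$ 32 per WhatsApp user and US$ 61 per LinkedIn user
```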
Companies often go through a data broker when buying user data, and a long list
of such brokers can, for example, be found at www.privacyrights.org. An example
is Gild (people.gild.com/), which transforms talent acquisition and hiring processes.
Gild identifies candidates who fit a job opening and analyzes factors that can predict
their success. It “has built a database of tens of millions of professionals that
contains data purchased from third-party providers plus ‘anything and everything
that’s publicly available’.” A consequence is that job applicants are often surprised
at how much an interviewer knows about them ahead of time.
Several sites meanwhile help people find out about the value of their data
themselves. This has been triggered by past breaches or unauthorized dissemination
of customer or user data, even when customers have especially paid for their
information to be kept private.9 One such site is www.totallymoney.com/personal-
data/, showing how cheap user data can be purchased. Another had been established
by the Financial Times in 2013 and allowed a user to go through categories like
demographics, family and health, property, activities, and consumer behavior to
determine the value of his or her data. Marketplaces for personal data that even
offered customers the opportunity to exchange their data for money or benefits,
such as Enliken or Handshake, typically cannot be sustained for very long.

8 europe.newsweek.com/does-facebook-know-you-better-your-mother-or-roommate-299171
9 www.techcrunch.com/2015/10/13/whats-the-value-of-your-data/

This brings us to the issue of protecting privacy, which we will briefly discuss.
It is easy today to become a victim, e.g., of phishing or skimming, or to become the
target of a virus or a Trojan horse that encrypts the local hard disk and requests a
"fee" for giving it back, i.e., ransomware. Worse, there are even tools like the
browser extension Web of Trust (WOT) that pretend to protect the user from unsafe
browsing, but then collect your browsing data in the background and even sell it
to third parties.10 So let us at least clarify a few things.
Privacy protection is not concerned with the protection of data, but with the
protection of people, while data security deals with the protection of data against
attacks, unauthorized access, or unintended errors or incidents. Privacy protection
protects against a misusage of personal data and is often regulated in domestic laws
(or at least design principles, see Further Reading for this chapter). Yet reality is
different, and even CEOs of Internet or computer companies have more than once
made it clear that privacy is no longer an option in today’s world. As Bruce
Schneier wrote in his blog on security issues in 2010, “we’re not Google’s cus-
tomers; we’re Google’s product that they sell to their customers.”11
So the important point is to be aware of the fact that privacy may be at stake in
whatever we do on the Internet and on the Web, to be aware that privacy and
personal data are valuable goods today which companies and organizations can
make money from if we do not take relevant precautions, and that it makes sense to
regularly monitor and update our privacy settings wherever we are active online
(and maybe even read the terms and conditions section that pops up whenever a
new registration is entered).

5.5 Towards Sharing and On-Demand Communities

A recent announcement from US car manufacturer Cadillac, which came out in
early 2017, can be understood as the beginning of the end of car ownership:
Cadillac introduced a new subscription program that lets a user drive any Cadillac
car for a
monthly fee of US$ 1,500 which includes insurance and maintenance. Users can
even exchange their cars (e.g., a sedan against an SUV or vice versa) up to eighteen
times per year. The concept does not involve a long-term contract, and users can
unsubscribe at any time. When a user exchanges a car, the new one will be
delivered with all personal settings (e.g., radio programming and seat positions)
from the previous car already installed.
While the Cadillac model might not appeal to everybody at the given price tag, it
can be expected that other manufacturers will follow with similar business models.
While car sharing has been around for a few years already (through providers such

10 www.pcmag.com/news/349328/web-of-trust-browser-extension-cannot-be-trusted
11 www.schneier.com/blog/archives/2010/12/security_in_202.html

as Flinkster, Drivy, or Tamyca), especially in densely populated urban areas where
people do not want to own a car anymore, car manufacturers are responding to the
decreasing interest in car ownership, especially among younger people, by estab-
lishing new business models which put the emphasis on mobility, of which a car,
whether owned or leased, is just one mode. For example, German manufacturer
Mercedes-Benz offers car2go in European and North-American cities (see www.
car2go.com), while BMW has teamed with rental car provider Sixt and started the
DriveNow car sharing joint venture (de.drive-now.com/); Daimler and BMW are
also considering merging their services.
These examples show one of several major current trends: the share (or sharing)
economy, where goods or services are no longer owned by one party and used
(against a fee or for free) by another, but equally shared by a number of participants.
The share economy has roots in the open-source community, where the idea of
sharing programs and applications has been fundamental for a long time; today the
notion comprises considerably more areas than just IT, as our examples show. If
you want to share traffic information with other people in your area, post it on
Waze.com, the world's largest community-based traffic and navigation app.
A definition from www.thepeoplewhoshare.com/blog/what-is-the-sharing-
economy/ says: “The Sharing Economy is a socio-economic ecosystem built
around the sharing of human, physical and intellectual resources. It includes the
shared creation, production, distribution, trade and consumption of goods and
services by different people and organizations. Whilst the Sharing Economy is
currently in its infancy, known most notably as a series of services and start-ups
which enable P2P exchanges through technology, this is only the beginning: in its
entirety and potential it is a new and alternative socio-economic system which
embeds sharing and collaboration at its heart—across all aspects of social and
economic life.”
Sharing is not necessarily intended to mean “for free,” since often the use of or
access to shared physical or human resources or assets involves paying a fee. Also,
the term “sharing” is meant in a very broad sense, since it could consist of swap-
ping, shared ownership, co-operatives, trading of used goods, borrowing, lending,
subscription-based models, peer-to-peer, pay-as-you-use economy, crowdfunding,
crowdsourcing, as well as a number of other forms. Rifkin (2014) speaks of Col-
laborative Commons12 in this context, “a digitalized space where providers and
users share goods and services” which operate “at near zero marginal cost,
undercutting the higher fixed and marginal costs of conventional businesses.” In the
brick-and-mortar or analog world, the concept is an old one, for example used by
co-operative banks where customers are also shareholders and hence owners, or
used in agricultural applications where farmers collectively own and share equip-
ment. Collaborative Commons are characterized by the fact that many people need
to contribute to a common goal or platform, but also benefit from it.

12 www.huffingtonpost.com/jeremy-rifkin/uber-german-court_b_5758422.html

The concept of sharing can be taken from "joint usage" as in the
(business-to-customer) car sharing example to a customer-to-customer version,
where people rent out something they currently do not need or use to other people.
Numerous examples can be found in this category, including

• Airbnb, where travelers can rent a room or an apartment;
• DogVacay, where dog owners can leave their dog with someone who will take
care of it;
• RelayRides, where people can rent cars by the hour or day; a similar concept is
Getaround;
• Liquid applies the C2C renting concept to bikes;
• TaskRabbit or Zaarly, where people can hire other people for doing jobs and
tasks, be it delivery, handyman, or office help;
• Lyft, a ride sharing service for people to find rides from people who have a car;
unlike Uber, Lyft drivers only take “donations” since they are not a taxi service;
similar services include SideCar, Wingz, or Fasten;
• Lending Club or Lendico, peer-to-peer networks through which people can lend
money to other people;
• Fon, a network for sharing some of your home Wi-Fi network in exchange for
getting free Wi-Fi from any other member of the network;
• Blablacar for joint car rides and shared costs.

What these examples have in common is an important technical and organizational
underpinning that we have already mentioned, namely that they are all based on a
platform. As Keese (2016) explains, platforms are the “hotspots of the digital
economy." Much of today's value creation occurs on platforms, which bring together
some form of supply and demand. Platforms own the data from both sides: the
provider's supply data and all of the customer's contact, consumption and payment
data. In this way, platforms create an ecosystem in which different partners cooperate
and collaborate on the basis of commonly agreed rules; each partner has its
own goals, yet the distinct goals of platform participants have to be compatible. So
in a sense platforms are the modern marketplaces, which holds for all of our
examples above, yet they enjoy a number of advantages over traditional market-
places. Most notably, a traditional marketplace is crucially dependent on its sup-
pliers and consumers, both of which can easily move on to another marketplace if
they feel it will serve them better. Suppliers and consumers can put pressure on a
marketplace in various ways, in particular when it comes to interest margins.
Platforms, on the other hand, can put pressure on their community of users, both
suppliers and consumers, and their margins will constantly be on the rise.
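
To make this data asymmetry tangible, the following deliberately simplified sketch shows how a two-sided platform records the supply side, the demand side, and every transaction that matches them, so the platform operator ends up with the complete picture. All names and the matching rule are invented for illustration and do not describe any real platform's design.

```python
# A toy model of a two-sided platform and the data it accumulates
from dataclasses import dataclass, field
from typing import List

@dataclass
class Offer:              # supply-side data held by the platform
    provider: str
    item: str
    price: float

@dataclass
class Order:              # demand-side data held by the platform
    customer: str
    item: str

@dataclass
class Platform:
    offers: List[Offer] = field(default_factory=list)
    transactions: List[tuple] = field(default_factory=list)

    def match(self, order: Order):
        # naive matching rule: cheapest available offer for the requested item
        candidates = [o for o in self.offers if o.item == order.item]
        if not candidates:
            return None
        best = min(candidates, key=lambda o: o.price)
        # the platform, not the two parties, retains the full transaction record
        self.transactions.append((order.customer, best.provider, best.item, best.price))
        return best

p = Platform(offers=[Offer("Alice", "room", 80.0), Offer("Bob", "room", 65.0)])
p.match(Order("Carol", "room"))
print(p.transactions)   # [('Carol', 'Bob', 'room', 65.0)]
```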
We have come across the notion of a platform in a technical sense in Chap. 2
already, where we discussed platform-as-a-service (PaaS) as one of the categories
of cloud services. The effect is the same, whether a platform is mainly technical or
organizational; in both cases there is a provider who establishes the platform, and
who takes advantage of the fact that providers and consumers meet on the
platform and leave their digital traces, mostly in the form of data.

While many more examples for platforms in the digital age can be found, it is
worth mentioning that their success is also due to a phenomenon that is related to
the concept of sharing: the on-demand society. As could be read in The Atlantic in
2016, “when today’s consumers want to watch a TV show, they can watch it when
they want on Netflix. When they want to buy household goods, they can order them
from Amazon, even when the stores are all closed. And when they want a car, they
can just book a Zipcar or hail an Uber, without owning a car.”13 This concept is no
longer restricted to media or transportation, but has reached such qualified jobs as
lawyers (UpCounsel), programmers (Topcoder), consultants (Eden McCallum),
delivery personnel (Postmates), home butlers (Hello Alfred), or sales professionals
(Universal Avenue). The platform Upwork meanwhile connects 9.3 million
freelancers to 3.7 million companies worldwide.
At a more abstract level, “the On-Demand Economy is defined as the economic
activity created by technology companies that fulfill consumer demand via the
immediate provisioning of goods and services. Supply is driven via an efficient,
intuitive digital mesh layered on top of existing infrastructure networks. The
On-Demand Economy is revolutionizing commercial behavior in cities around the
world. The number of companies, the categories represented, and the growth of the
industry is expanding at an accelerating pace.”14
We conclude this section and chapter by returning to one of the examples of
modern enterprises that fit almost any aspect of Web shopping, cloud services,
digitization, technical aspects such as data mining or recommenders, and novel
business models: Amazon, already used as an example in more than one place in
this book, is also one of the core examples that serves the on-demand society.
Fastcompany has recently selected Amazon as the world’s most innovative com-
pany of 2017,15 and in their justification they give several reasons: First, Prime,
Amazon’s membership program that we already mentioned in Chap. 3, is con-
nected to almost all of Amazon’s recent innovations. It is used by an estimated
40–50 million people in the US alone, and besides preferred shipping of ordered
items or ad-free viewing of streaming video, “what Prime is selling most is time,”
according to Fastcompany: Whatever people want, they nowadays want it in the
shortest time window possible. Amazon meets this demand by same-day delivery or
Prime Air, but also by innovations like the Amazon Dash button, a gadget through
which Prime members can order their favorite products by pressing a button provided
by Amazon. If you cannot order via Dash, Amazon lets you do so via Alexa, "the
voice service that powers Echo, provides capabilities, or skills, that enable cus-
tomers to interact with devices in a more intuitive way using voice. Examples of
these skills include the ability to play music, answer general questions, set an alarm
or timer, and more. Alexa is built in the cloud, so it is always getting smarter. The
more customers use Alexa, the more she adapts to speech patterns, vocabulary, and

13 www.theatlantic.com/entertainment/archive/2016/06/the-on-demand-society/489257/
14 www.businessinsider.com/the-on-demand-economy-2014-7
15 www.fastcompany.com/most-innovative-companies/2017

personal preferences.”16 Alexa can also take orders, such as “Alexa, reorder
toothpaste,” and scan through all the data that Amazon has collected about the
respective customer via Prime; so the system will know the kind of toothpaste to be
ordered.
To round out the picture on Amazon’s innovative business ideas, the company is
experimenting with Amazon Go, a concept for convenience stores in which a
shopper can swipe a code on his or her mobile phone when entering the store, and
everything taken from the shelves will thereafter be added to a digital cart that is
automatically paid for from an existing customer account upon exit. Thus, a cus-
tomer can skip both the line and cash register when done shopping. Another
shopping concept under development resembles what we have reported earlier
about Starbucks regarding the pickup of coffee previously ordered via an app;
customers of Amazon Fresh will be enabled to fill their digital carts remotely, pay
online, and pick up their purchase within a certain time window.
Last but not least, Amazon is opening brick-and-mortar book stores that "solve one
of the biggest problems with online shopping: discoverability”.17 Amazon Books
will represent data-driven book stores where customers will be likely to “pick up a
book that you didn't know you wanted to read." There are various differences from
traditional bookstores, like the fact that all books face out so that their covers
can be seen. All of them have received an online rating between 4.6 and 5 stars, and
the display even shows customer reviews. Clearly, a setup like this allows fewer
books to be held in the store, but this is compensated for by the fact that Amazon
already knows its customers in the vicinity of the store from online shopping
patterns, so that a store can tailor its selection to the local crowd; moreover, a
customer can always order a book that is not in stock through a terminal available
in the store. The bottom line is that a company like Amazon has not only contributed
numerous inventions since the inception of the Web, but continues to do so when
it comes to bridging the physical and the virtual world.

5.6 Further Reading

Brynjolfsson and McAfee (2014) discuss digitization, innovation, and a variety of
related topics, and they predict in The Second Machine Age that a "tsunami" of
groundbreaking inventions is approaching and that digitization will reach every
sector of living. According to them, we are at a turning point from which onward
digital technologies will be as revolutionary for our economy and society as the
steam engine was during the First Machine Age.

16 www.developer.amazon.com/alexa
17 www.fastcodesign.com/3067020/with-amazon-books-jeff-bezos-is-solving-digital-retails-biggest-design-flaw

The components of a business model go back to Michael Rappa and the Web site
digitalenterprise.org/ he created. Ovans (2015) is a good discussion of what a
business model is. A 2015 report on the Business Model Canvas can be downloaded
from the Strategyzer blog at blog.strategyzer.com/, although a more authentic source
is Osterwalder and Pigneur (2010). Gassmann et al. (2014) apply it to derive 55
different business models for a variety of areas. Various videos in which one can hear
Osterwalder speak can be found on YouTube. Modern innovation labs have a
famous precursor in Bell Labs, which was founded in the 1920s; its success story
is described by Gertner (2012). Our discussion of social innovation labs follows
Schönthaler and Oberweis (2013). The business models of Airbnb, Uber, and others
and how they influence our world is the subject of Stone (2017).
Design Thinking is a discipline that has emerged from a variety of fields,
including such diverse ones as mechanical engineering, architecture, urban plan-
ning, organizational learning, or process improvement. It basically resembles an
innovation process or cycle in the style of Fig. 5.1, yet is more than just a creative
process. It is considered a “new way of seeing people in relation to work, of
imagining the concept of work and of posing questions about how we want to live,
learn, and work in the 21st century. The appeal of Design Thinking lies in its ability
to inspire new and surprising forms of creative teamwork” (hpi.de/en/school-of-
design-thinking/design-thinking.html). The approach has been made popular by
Stanford University’s d.school (dschool.stanford.edu/) and is described in books,
for example, by Lockwood (2009) or Yayiki (2016); for the latter see also www.
artbiztech.org. Peffers et al. (2006, 2007) have made the approach popular in
information systems research and break it down into the following phases (compare
these to Fig. 5.1, but also to what we have stated about social innovation labs):

1. Problem identification and motivation
2. Objectives of a solution
3. Design and development
4. Demonstration
5. Evaluation
6. Communication

Regarding the digital transformation, oecdinsights.org/2016/06/03/digital-innovation-what-does-it-really-mean/ is an interesting read written by Paul Chaf-
fey, State Secretary in the Norwegian Ministry of Local Government and
Modernization.
An interesting blog about innovation is innovationexcellence.com/, which has
been maintained since 2006. Another excellent introduction to innovation is given
by Denning and Dunham (2010), including most of the basic knowledge used here
to prepare our contributions about Social Innovation (Labs). The basic message of
Tina Seelig (2012, 2015) from Stanford University is that creativity, a driver behind
innovation, is something everybody can learn, and that everyone can be taught how
to make imaginative ideas a reality. Disruption is a concept that was coined by C.M.
Christensen, for example in Christensen (1997). His “Innovator’s Dilemma” shows

why outstanding companies that had their competitive antennae up, listened closely
to customers, and invested aggressively in new technologies still lost their market
dominance. Using examples like the hard-disk drive industry, he argues that good
business practices can even weaken a great company. In the absence of break-
through innovations, which may initially be rejected by potential customers, many
enterprises let their most important innovations languish and then face the inno-
vator’s dilemma: On the one hand, keeping close to customers is critical for success
and survival; on the other, long-term growth and profits depend upon a different
managerial approach. Docherty (2015) argues in a similar direction when he says:
“If you ask professionals, especially executives within large companies, what
images and thoughts come to mind with the word ‘disruption’, it’s usually not good.
Disruption is too often thought of as something that you didn’t see coming—
something that happens to you by outside forces, especially by startups. It doesn’t
have to be that way. Collective Disruption is about changing that paradigm and
learning to embrace disruption through collaboration.”18 Thiel, a successful Silicon
Valley investor, and Masters (2014) present another way of thinking about
innovation.
Canadian data protectionist Ann Cavoukian has created the concept of “Privacy
by Design” which essentially means “privacy protection by technology” and aims
to guarantee that privacy protection is already “built into” the development of
technical devices, as opposed to being added later when the first breaches have been
discovered. The concept is based on seven principles intended to promote privacy
and data protection compliance from the very beginning. Although not a law,
Privacy by Design is often recommended, in particular from official sides like the
British Information Commissioner's Office.19 As a starting point on the issues of privacy
protection and data security, the reader may consult Bazzell and Carroll (2016).
Lindner (2016) is an introduction to the European Data Protection Law. Schneier
(2016) uses concrete examples of bad behavior with data, in order to alert people of
what is happening behind their backs, but also indicates what they can do.
On the positive side, it is undeniable that Big Data has the potential to expand
our understanding of humanity. A recent project in this direction is the Kavli
Human Project (www.kavlifoundation.org/kavli-human-project), a massive scien-
tific undertaking to launch a “study of all of the factors that make humans…
human.” The project plans to recruit 10,000 New York City residents in approxi-
mately 2,500 households and monitor and measure them 10 years.
Rifkin (2014) discusses the sharing economy. An interesting read regarding the
on demand society is www.atelier.net/en/trends/articles/how-demand-economy-
remodelling-society_440900. Wrap-ups of the history of Amazon are Brandt (2012)
or Stone (2014). Rossman (2016) is an introduction to the corporate culture of the
world’s largest Internet retailer. Keen (2015) criticizes the on-demand economy as

18 innovationexcellence.com/blog/2015/01/26/collective-disruption/
19 ico.org.uk/for-organisations/guide-to-data-protection/privacy-by-design/

the “operating system of a new and increasingly unfair Silicon-Valley capitalism.”


Samit (2015) is a guide to what the individual can do to keep up with the “Era of
Endless Innovation.” Lanier (2014), the “father” of virtual reality, outlines “an
information economy that rewards ordinary people for what they do and share on
the web.”
6 The Road Ahead: Living in a Digital World

Worldwide, economic activities are now largely driven by information and com-
munication technologies. Indeed, few areas of society remain untouched by the
disruptive impacts of ICT, and there is little doubt: we are not only rapidly heading
towards the digital economy, but to an entirely digital world as well. So the
question is: how do we want to live, learn, and work in this world? Future-focused
answers must be found to these questions. This calls for a worldwide cultural and
transnational, interdisciplinary discourse based on shared values, following the
vision of global welfare and harmony. This can only succeed if, in particular, the
economically strong countries on this planet accept their global responsibility for
humanity, trust, security, and accountability.
Since the dawn of ERP systems, the sole role of machine-generated data has
been to enable the proper execution of a supply chain. With the ongoing adoption of
cyber-physical systems and the Internet of Things, machine-to-machine commu-
nication is enabling collaborative shop floor planning, and machine generated
(big) data offers unprecedented value. This paradigm shift calls for new smart ERP
systems that make use of big data in (predictive) planning throughout the entire
value chain and create the transparency for improved governance, risk, security, and
compliance management. And even if companies think they have adapted to the
situation today, future change is almost certain. Therefore, we have to accept, and
respond to, what we can foresee now, and this includes both small-scale (sensors,
beacons, etc.) technologies as well as those with broader impact (e.g., Internet of
Things, Industry 4.0). We could attempt to predict the future, but should be
careful with this for obvious reasons. We are better equipped (using the tech-
nologies that we have just discussed) than ever before to make such predictions, but
many uncertainties remain.
In this final chapter, we try to answer the question about what to expect from
living in a digital world, and we concede at the outset that without a crystal ball, we
cannot assure the reader that our prediction is any more accurate than any others
they read. We join a long line of “futurists” who have tried to predict the future of
technology. For example, Kahn and Wiener (1967) were among the most profound
of the earlier futurists and in their book got many predictions right, but apparently
underestimated the comprehensive influence of information technology and com-
puters. German tech giant Siemens had a study entitled “Horizons2020” conducted
by market researcher TNS Infratest, whose results were published in October 2004;
it outlined two possible scenarios of how life might look in 2020.
In the first scenario, they present a future where people are generally skeptical of
technology, and even explicitly create free space from what they term an “engi-
neered environment,” thereby accepting stagnation in the European economy.
A strong government guarantees education, security, and health for its citizens; both
genders are equally represented at all levels of leadership. However, many families need second and third jobs to make a living, yet society is generally open to technical innovation. For example, growing environmental con-
sciousness enables a breakthrough for fuel cells, quantum computers become a
reality, and automated translation systems guarantee the survival of exotic local
languages. Data protection agencies have popularized the view that it is “uneco-
nomic” to connect objects of everyday life.
In the second scenario, market and competition determine the rules and the speed
of life. If you want to reach a high standard of living, you cannot waste time on
anything and vice versa. Government is reduced to core duties and it otherwise
leaves its citizens to manage their own lives. Traditional moral ideas have been
replaced by a pursuit of what best serves the individual. A result of this is high
social tension; a significant number of people are living just above the breadline.
The retirement age (in Germany) has gone up to almost 70 years, but work life now
includes several interruptions for executive education or sabbaticals. Humanity is
unwilling to abandon nuclear power, and is happy to employ any technology
available for influencing human life even prior to birth. Ubiquitous Computing has
become a reality, and people are surrounded by a host of autonomous systems.
Looking at the results of this study now that it is almost 2020, it is clear that neither scenario has become a reality, although present-day life has quite a few commonalities with both. We want the reader to keep this in mind for the remainder
of this chapter, where we try to take a look into the near future and what it will bring
along in terms of the topics we have previously discussed.

6.1 Cyber-Physical Systems and the Internet of Things

In their bestseller “The Second Machine Age” Brynjolfsson and McAfee (2014)
vividly reveal their vision of the technological, social and economic changes that
we will need to adjust to. It comes as no surprise that such changes will not only
bring along winners, but also losers. It is therefore up to the politically and socially powerful to create a framework that opens future opportunities to the broad population and mitigates unavoidable risks. Protectionist measures, which restrict global trade flows and thereby reduce the global distribution of the value chain, and which look for the solution to structural problems in the industries of the 1960s or in an untamed financial industry, are certainly not suitable.

Fig. 6.1 The four industrial revolutions (so far): mechanization, water and steam power (1st); mass production, assembly line and electricity (2nd); computers and automation (3rd); cyber-physical systems (4th)
Industry now stands at the beginning of its 4th industrial revolution (see
Fig. 6.1). Via the evolution of the Internet, the real world and the virtual world are
increasingly converging, to form an Internet of Things (IoT). Turner (2016) defines
the IoT as a “network of networks of uniquely identifiable endpoints – or things –
that communicate without human interaction.” International Data Corporation
(IDC), a global provider of market research, predicts “that the worldwide installed
base of IoT endpoints, or connected devices, will reach nearly 30 billion by 2020,
representing a compound annual growth rate (CAGR) of 19.2%.”
Consider this current example: “The tomato has an iron deficiency” is what the app Plantix recognizes, and it immediately suggests a cure. Plantix is a disease diagnostic and monitoring tool developed by German startup PEAT (short for Progressive Environmental & Agricultural Technologies). Plantix analyzes photos; in the case of the tomato, the leaves were lacking color while their veins remained green, and the app recognized the net-like pattern that is typical of an iron deficiency. The
app has been trained on more than 1,500 images of plants with that deficiency, and
it is now familiar with more than 40 plants and more than 100 deficiencies. While
the program is currently used from smartphones, the vision of its developers is that
in the future solar-powered robots will plow through fields in order to discover pest
plants with their built-in cameras and destroy them.
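The details of PEAT’s model are not public; purely as an illustration of the underlying technique (supervised image classification), the following Python sketch fine-tunes a convolutional network, pre-trained on generic images, on a hypothetical folder of leaf photos sorted by deficiency label (folder name and labels are assumptions made for this example):

import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

# Assumed layout: leaf_photos/<deficiency_label>/<image>.jpg
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
train_data = datasets.ImageFolder("leaf_photos", transform=preprocess)
loader = torch.utils.data.DataLoader(train_data, batch_size=32, shuffle=True)

# Start from a pre-trained network and replace its last layer so that it
# predicts one of the deficiency classes (e.g., "iron_deficiency").
model = models.resnet18(weights="DEFAULT")   # torchvision >= 0.13
model.fc = nn.Linear(model.fc.in_features, len(train_data.classes))

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

model.train()
for epoch in range(3):                        # a few passes over the photos
    for images, labels in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        optimizer.step()

Trained in this way, the network returns, for a new photo, a probability for each known deficiency, which is essentially the kind of answer such an app presents to the farmer.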
Plantix is just one glimpse into a future that will (fortunately) look considerably
different from what science fiction movies have been promising us for ages. We are
experiencing the arrival of a generation of machines that are heavily based on
algorithms, and that are able to draw conclusions and think faster and more reliably than a human ever could. Computers can search patient files and doctors’ statements for rare diseases. They compute the creditworthiness of bank customers and decide on the
investments of rich people. They automatically maneuver cars into parking spots
and apply the brakes to avoid accidents.

Artificial intelligence (AI) as we are experiencing it today is heavily based on


statistics and statistical computations, which is what made AI finally successful
during the last 10 or so years. Previously, AI had relied on symbolic representations
of things and on the discovery of rules on how to manipulate these representations.
By switching to techniques such as statistical computations, machine learning,
genetic and deep-learning algorithms, or cognitive computing and being based on
sufficiently large training sets, AI can nowadays predict the most likely correct
solution to a given problem. For example, voice recognition associates a probably
correct letter with the sound of a voice, and the result has been platforms like IBM
Watson (see next section), Apple Siri, Amazon Alexa, Microsoft Cortana, Google
Assistant, and now Samsung Bixby, which have enabled us to enter into a new era
of voice-controlled applications (see www.businessinsider.de/siri-vs-google-
assistant-cortana-alexa-2016-11 for a comparison). All of this needs a combina-
tion of algorithms, computational power, as it is available today, and data. This has
led to a race for the best cloud infrastructure, the most intelligent AI applications,
and the most appropriate data sources. According to IDC market research, the
investments in AI across hardware, software, and services will rise from US$ 8 billion in 2017 to US$ 47 billion in 2020; whether this will, however, result in a
growth of the economy remains to be seen. A recent post on the KDNuggets blog
lists 50 companies which are considered current leaders in AI (see www.kdnuggets.
com/2017/03/50-companies-leading-ai-revolution-detailed.html).
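As a toy illustration of this statistical flavor of AI (and not the method of any of the platforms just mentioned), the following Python snippet trains a simple probabilistic classifier with the scikit-learn library on a handful of invented utterances and then predicts the most likely intent of a new one:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical training data: utterances labeled with the user's intent.
utterances = [
    "play some jazz music", "turn up the volume",
    "what is the weather tomorrow", "will it rain today",
    "set an alarm for seven", "remind me to call mom",
]
intents = ["music", "music", "weather", "weather", "reminder", "reminder"]

# Bag-of-words features feeding a naive Bayes classifier.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(utterances, intents)

# The model returns the statistically most likely intent for unseen input.
print(model.predict(["is it going to rain"]))        # expected: ['weather']
print(model.predict_proba(["is it going to rain"]))  # class probabilities

Scaled up from six sentences to millions of labeled samples, and from a bag of words to deep neural networks, this is, in essence, how the voice-controlled platforms mentioned above turn sounds into probable words and words into probable intents.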
We are currently observing an increasing emergence of cyber-physical systems,
i.e., systems that are essentially physical or mechanical systems yet are
computer-controlled, like autonomous automotive systems, medical monitoring, or
process control systems. Taking all of this together, it is no longer surprising that
Industry 4.0 will fundamentally comprise the “smart factory”: “Within the modular
structured smart factories, cyber-physical systems monitor physical processes,
create a virtual copy of the physical world and make decentralized decisions. Over
the Internet of Things, cyber-physical systems communicate and cooperate with
each other and with humans in real time, and via the Internet of Services, both
internal and cross-organizational services are offered and used by participants of the
value chain.”1 The essential ingredient will be data that is produced by the various
entities involved, that is communicated via the appropriate channels, that is ana-
lyzed in ways that allow for timely decisions, reactions, prescriptions, and pre-
dictions. These are then, ideally in real-time, communicated to wherever they are
needed with the goal of increasing productivity, quality, and flexibility, for example
within the manufacturing industry.
An interesting early example of a cyber-physical system was MIT’s Robotic
Garden, in which robots took care of tomato plants.2 More recent examples can be
found in the context of smart homes and ambient assisted living. Duravit produces
the toilet of the future, called BioTracer, which is capable of automatically ana-
lyzing a user’s urine and providing the results through a smartphone app; Japanese

1 www.en.wikipedia.org/wiki/Industry_4.0
2 www.youtube.com/watch?v=lJpnoRHba_Y

manufacturer Toto has developed similar technology. Kickstarter project Perseus3 is


building a smart mirror that is equipped with a camera, speakers, Wi-Fi and
Bluetooth connectivity, and a quad-core processor. It can be customized in various
ways, and distinguishes between family members via specific voice commands.
Since the mirror has an Internet connection, users may plan their day while
brushing their teeth in the morning; the mirror can also show traffic information,
access e-mail accounts, or show the TV schedule.
The confluence of lighting, air conditioning, heating, front-door security, and
window shades is well established, in order to support convenient and ambient
living that will shortly also include domestic robots; platforms like wibutler or
iExergy intend to act as integrators among the many smart home products now
available.
Beyond individual homes, at the city level Singapore “has established itself as a
test bed for data-driven, autonomous urban systems. On the National University of
Singapore campus, Airbus drones are delivering packages, part of a trial that will
eventually extend out to the port, and potentially even ships waiting off the coast.
Self-driving taxis are being tested in a 6.5-square kilometre part of the One-North
business district”.4 Or, as Dieter Zetsche, CEO of Daimler, announced at the 2017
South by Southwest (SXSW) conference in Austin, Texas, the car of the future is
intended to be a “steward” for its users, one that knows the driver’s preferences and
personality. The self-driving car may be able to get something from the bakery for
breakfast even before the driver has risen from his bed. Applications like these
require large amounts of data, which are partially collected by the car itself and
partially obtained from other companies. Among these others could be both friends
and enemies, or “frenemies,” depending on the application.
More of these developments will soon arrive at the intersection of AI algorithms
and machine learning, voice recognition, and virtual reality. They present enormous
opportunities for startups, but also for traditional companies which are open to the
progress enabled by digitization. As we will discuss shortly, there are, however,
threats hidden in these developments, of which the individual needs to be aware in
order not to be taken by surprise.
From a global perspective, and starting with cloud computing, which has enabled access to remote hardware and software even in manufacturing settings, developments have already gone beyond what was envisioned to include
cyber-physical systems as well as the Internet of Things. The latter, often abbre-
viated to IoT, refers to the development that more and more mobile as well as
stationary “things” or items will be equipped with intelligence and will be able to
communicate with each other. We have already mentioned examples to which this
pertains, namely connected cars or intelligent transportation systems in smart cities.
We also remind the reader of HP’s CoolTown project mentioned previously, which
had exactly this vision. IoT will, however, practice this on a considerably larger
scale, and will integrate applications from such diverse areas as environmental

3 www.perseusmirrors.com/
4 www.raconteur.net/current-affairs/singapore-the-robot-city

monitoring, infrastructure management, energy management, manufacturing,


medical systems, home automation, transportation, and consumer applications. IoT
will enable anytime, anyplace, and anything communications, the latter directly
between machines or things, but also between humans and machines or things in
any direction. In order to facilitate this, International Telecommunication Union’s
Study Group 20 is addressing the non-trivial, yet necessary standardization
requirements of IoT technologies, which involve devices from a host of distinct
manufacturers, with an initial focus on IoT applications in smart cities and com-
munities. SG20 is developing standards that shall enable the coordinated devel-
opment of IoT technologies, including machine-to-machine communications and
ubiquitous sensor networks.
We note that there are even search engines available today which are especially
tailored towards IoT. One is Shodan (www.shodan.io/), which lets you search the
Web and IoT in general, but more specifically also for buildings, refrigerators,
power plants, or webcams. In their own words, “Shodan is the world’s first search
engine for Internet-connected devices.” Another one is Censys (censys.io/), “a
search engine that allows computer scientists to ask questions about the devices and
networks that compose the Internet. Driven by Internet-wide scanning, Censys lets
researchers find specific hosts and create aggregate reports on how devices, web-
sites, and certificates are configured and deployed.”
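Shodan also exposes its index through a REST API with an official Python client, which researchers use for exactly the kind of aggregate questions mentioned above. A minimal sketch (it assumes a valid API key and should, of course, only be used for legitimate research):

import shodan  # official Shodan client: pip install shodan

API_KEY = "YOUR_API_KEY"          # placeholder; requires a Shodan account
api = shodan.Shodan(API_KEY)

try:
    # Query Shodan's index of Internet-connected devices.
    results = api.search("webcam")
    print("Devices matching 'webcam':", results["total"])

    # Each match describes one reachable device (IP, port, organization, ...).
    for match in results["matches"][:5]:
        print(match["ip_str"], match["port"], match.get("org", "n/a"))
except shodan.APIError as exc:
    print("Shodan API error:", exc)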
Clearly, search engines like the ones just mentioned can also be misused, for
example to identify a power plant which is subsequently attacked either digitally or
physically. As with autonomous cars, which may run into ethical decisions when
challenged to react to a non-standard situation, there are numerous problems
associated with IoT which are currently open from a research perspective. Misuse
and hacking are just two of them. Worse is the fact that even governments are
among the biggest collectors of big data through approved surveillance operations.
For example, the US National Security Agency (NSA) collects internet commu-
nications from a variety of major US Internet companies such as Google through
their PRISM program. The NSA operates on the basis of the FISA Amendments Act of 2008, an Act of Congress that amended the Foreign Intelligence Surveillance Act (FISA) of 1978 and which has
been used as the legal basis for mass surveillance programs. This was disclosed by
Edward Snowden in 2013. As the world learned from Snowden, the NSA collects
data on heads of government, US citizens and their phone calls or connection data,
embassies, banks, or the Organization of the Petroleum Exporting Countries
(OPEC), to name just a few. Besides PRISM, there were other programs in use,
including XKeyscore or Blarney, and the NSA did not operate in isolation, but in
the context of projects such as Fairview or Tempora it collaborated with interna-
tional companies or other security agencies. The interested reader is referred to
www.dailydot.com/layer8/nsa-spy-prgrams-prism-fairview-blarney/ for an over-
view of NSA spy programs.

5 www.itu.int/en/ITU-T/about/groups/Pages/sg20.aspx

6.2 The Smart Factory and Industry 4.0

The Internet of Things (IoT) and cyber-physical systems (CPS) are the techno-
logical basis for the digitalization of traditional industries such as manufacturing
and logistics. As discussed above, a CPS is a system of collaborating IT elements,
designed to control physical (mechanical, electronic) objects and processes. Com-
munication takes place via data infrastructure such as the Internet. Traditional
embedded systems can be considered as a special case of a stand-alone CPS. The
key characteristics of the industrial production of the future will include production
of extensively individualized products, within highly flexible production environ-
ments, early-stage integration of customers and business partners within design and
value-creation processes, and linking of production and high-quality services to
yield “hybrid products”. The New High-Tech Strategy: Innovations for Germany of
the German Federal Ministry of Education and Research6 compiles and describes
the following potentials of Industry 4.0:

• Meeting individual customer requirements: Industry 4.0 allows individual,


customer specific criteria to be included in the design, configuration, ordering,
planning, manufacture and operation phases and enables last minute changes to
be incorporated.
• Flexibility: Engineering processes can be made more agile, manufacturing
processes can be changed, temporary shortages can be compensated for, and huge increases in output can be achieved in a short space of time.
• Optimized decision-making: Industry 4.0 provides end-to-end transparency in
real time, allowing early verifications of design decisions in the sphere of
engineering and both more flexible responses to disruption and global
optimization.
• Resource productivity and efficiency: delivering the highest possible output of
products from a given volume of resources (resource productivity) and using the
lowest possible amount of resources to deliver a particular output (resource
efficiency). Moreover, rather than having to stop production, systems can be
continuously optimized during production.
• Creating value opportunities through new services: Industry 4.0 opens up new
ways of creating value and new forms of employment, for example through
downstream services.
• Responding to the demographic change in the workplace: In conjunction with
the organization of work as well as competency development initiatives, inter-
active collaboration between human beings and technological systems will
provide businesses with new ways of turning demographic change to their
advantage.

6 www.hightech-strategie.de/de/The-new-High-Tech-Strategy-390.php

• Work Life Balance: The more flexible work organization models of companies
that use CPS mean that they are well placed to meet the growing need of
employees to strike a better balance between their work and their private lives
and also between personal development and continuing professional
development.

Industry 4.0 connects people, machines, and objects in the Internet of Things and
paves the way to new production concepts and fully integrated, digital value chains.
Traditional ERP systems are now reaching their limits in terms of planning and
managing corporate resources. We now need Smart ERP systems, which are
available as software services from the cloud.
In an environment where digital transformation is becoming a reality in more
and more companies, a new dimension of digitization opens up: Now objects and
machines are digitally accessible at any time and from anywhere via sensors and
SIM cards. The use and networking of autonomously acting CPS tap new
potential for the automation of production and logistics processes. But above all,
they create the conditions for new processes and services. The keyword here is
social manufacturing and logistics. The result is vertically and horizontally integrated, highly digitized value chains with higher complexity and an increasing degree of decentralization and self-organization. The resulting demands are what modern Smart ERP systems have to meet.
Conventional ERP systems use machine-generated data solely to ensure a
smooth running supply chain. With the use of networked cyber-physical systems, a
collaborative, decentralized machine-level control is now possible, and
machine-generated data represents a value on its own as Big Data. Smart ERP
systems must be able to employ big data across the entire value chain for strategic,
tactical and operational planning. Also, they must create the transparency required
by today’s enterprise management, namely governance, risk, compliance, and
security management. Despite the required high efficiency, Smart ERP systems
must be easy to use, fast, high-performing, cost-effective and especially safe and
accessible at all work stations, even mobile ones, along the value chain. This
explicitly includes external partners such as customers, suppliers, original equip-
ment manufacturers (OEMs) and service partners. Manufacturers of Smart ERP
systems meet these requirements by offering their systems as a software service
(Software as a Service, SaaS) from either public, private, or hybrid clouds.

6.2.1 IoT-Enabled Value Chains

With its claim of providing a technological infrastructure in which all electronic devices can communicate with each other, the IoT is a new paradigm with applications in all areas
of life. Rifkin (2014) describes “how the Communication Internet is converging
with a nascent Energy Internet and Logistics Internet to create a new technology
platform that connects everything and everyone. Billions of sensors are being
attached to natural resources, production lines, the electricity grid, logistics

networks, recycling flows, and implanted in homes, offices, stores, vehicles, and
even human beings, feeding big data into an IoT global neural network. Prosumers
(producer + consumer) can connect to the network and use big data, analytics, and
algorithms to accelerate efficiency, dramatically increase productivity, and lower
the marginal cost of producing and sharing a wide range of products and services to
near zero, just like they now do with information goods.” And in view of the rapid
pace with which digitization penetrates into all areas of life, it becomes clear how
important not only technological questions are, but also considerations regarding
privacy and ethical aspects in terms of data sensing, storing and processing. This
begs the question: What are the limits for the application of artificial intelligence?
Or in other words: What degree of autonomy do we want machines to really have?
A current example is autonomous driving; current discussion is focused on partially
autonomous driving, although the majority of the technological challenges for full
autonomy have already been solved.
Undoubtedly the industrial application of the IoT is already the most advanced.
Interesting examples can be found in Gilchrist (2016). Based on the scenario shown
in Fig. 6.2, the following shows how IoT is designed to tap into value chains in
their entirety. In the figure, intelligent CPS communicate with each other and with
conventional IT systems. The figure shows IoT-based communication along the
value chain, i.e. that between supplier and carrier, then between shipper and pro-
ducer, and finally between the producer and his customer. Already by utilizing this
type of communication, a digital transformation of the value chain takes place
resulting in enormous potential for improvement. However, perhaps even more
important are the newly emerging communication channels such as between sup-
plier and customer, which give the supplier an insight into inventory of salable
products directly on the shelf at the point-of-sale, so that he can pro-actively respond to the producer’s anticipated need for his own pre-products.

Fig. 6.2 Scenario of an IoT-enabled value chain, connecting suppliers, inbound and outbound transportation, manufacturing and intralogistics, field sales, field service, customers, business partners, employees, and authorities

In this example, a new form of collaboration arises from digitization, which eventually
results in a transformation of the value chain itself. The improvement potential of
such a transformation is obvious. However, to exploit this potential, conditions
have to be created that are not of a technical, but mainly of a sociological nature. To
discuss them here is beyond the scope of this book; we only want to point out the
need for trust between business partners, which will have to gain a completely new
quality. In addition, issues such as governance, risk, security, and compliance (see
Schönthaler et al. (2012)) must be considered.
But when autonomous agents suddenly interact with each other and operate, will
the existing ERP systems be left out? Figure 6.2 appears to suggest this. And in
fact, ERP systems currently in use are not ready for the challenges they will
encounter in the IoT-enabled value chains of Industry 4.0. What is the significance
of centralized operational manufacturing resource planning (MRP II), when
autonomous agents on the shop floor agree on the production program in a
decentralized manner? Or how to deal with sophisticated scheduling in procurement
framework contracts, when the supplier can query the requirements directly in real
time via sensors at the point-of-sale? An endless number of such examples can be
found. And so it may seem surprising that the major manufacturers have so far
reacted mainly with simple cloud messages as well as with greater and more
flexible computing power (SAP HANA, Oracle Engineered Systems, see Chap. 2)
to the challenges that the 4th Industrial Revolution will bring.
With this example scenario in mind, we want to shed some light on the most fun-
damental changes along the entire value chain, which we will be facing, or that in
many cases are already taking place directly in front of us:

• Self-control: Things (e.g., CPS) will operate and interact autonomously.


• Self-organization: Agents will negotiate with each other on the global IoT. This
will lead to a decentralization of decisions.
• Less complex decentralized algorithms: Complex algorithms for centralized
supply chain planning have to be replaced by less complex decentralized
algorithms.
• Tight integration of customers, suppliers, and business partners along the value
chain.
• Responsiveness: Transparent decisions in decentral control cycles enable fast
reactions to changes and disruptions.

Here, intelligent production systems will create intelligent products that can be
identified at any time and that can be localized. They will know their current status
and they will be able to submit this information together with their provenance, i.e.,
all states that they have been through as part of their life cycle. Important for
agent-based self-organization is that they know their options for the path to com-
pletion, i.e., to a certain extent, they carry their own production plan within
themselves. And in addition to intelligent products, intelligent machines and tools,
intelligent storage and transport containers or also autonomous means of trans-
portation will assume an active role in the production process.
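Purely to make this idea of self-organization tangible, and without claiming that any vendor implements it this way, the following Python sketch mimics a contract-net-style negotiation: an intelligent product carries its remaining production plan and its provenance, asks machine agents to bid on the next step, and assigns the step to the best offer (all names and cost figures are invented).

from dataclasses import dataclass, field

@dataclass
class MachineAgent:
    name: str
    capabilities: set            # production steps this machine can perform
    queue_hours: float = 0.0     # current workload

    def bid(self, step):
        # Return an estimated completion time, or None if not capable.
        if step not in self.capabilities:
            return None
        return self.queue_hours + 1.0        # naive estimate: queue + one hour

@dataclass
class SmartProduct:
    product_id: str
    plan: list                                    # remaining production steps
    history: list = field(default_factory=list)   # provenance of executed steps

    def negotiate_next_step(self, machines):
        # Collect bids for the next step and award it to the cheapest offer.
        step = self.plan[0]
        offers = [(m.bid(step), m) for m in machines if m.bid(step) is not None]
        best_bid, winner = min(offers, key=lambda offer: offer[0])
        winner.queue_hours += 1.0
        self.history.append((step, winner.name))
        self.plan.pop(0)
        return winner

machines = [MachineAgent("mill-1", {"milling"}),
            MachineAgent("mill-2", {"milling"}, queue_hours=3.0),
            MachineAgent("paint-1", {"painting"})]
item = SmartProduct("P-0815", plan=["milling", "painting"])
while item.plan:
    chosen = item.negotiate_next_step(machines)
    print(item.product_id, "step assigned to", chosen.name)

No central planner decides anything here; the schedule emerges from local negotiations, which is the kind of behavior the Smart ERP system has to monitor and govern rather than dictate.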

6.2.2 Smart ERP Systems

From a consideration of the changes that lie ahead, it can be deduced that con-
ventional monolithic ERP systems offer little to Industry 4.0. In the future, appli-
cation functionality will not only be used within one’s own company, but will become decentralized to wherever decisions are made: by business partners, customers, and suppliers; in short, by all partners in the value chain. For this, the functionality must be provided in a much more fine-grained manner than existing ERP systems are capable of today. If we add compliance, security, risk, and governance requirements to this functional requirement, then it becomes obvious that Web services deployed in a cloud are the most logical solution. In addition, more and more functionality will be used on mobile devices, which, in a transformed working reality with a growing share of mobile work, can serve users more effectively than stationary devices.
Cyber-Physical Production Systems (CPPS)
A basic requirement in relation to Smart ERP systems is that they must be able to
communicate efficiently with physical processes, which process the material and
energy flows. Kagermann et al. (2013) define the structure of a cyber-physical
production system (CPPS) as shown in Fig. 6.3. A CPPS is the use of CPS in the
manufacturing industry. Basically, a CPPS consists of intelligent machines, storage
systems, and resources, which exchange information independently, trigger actions,
and control each other independently. It enables the continuous viewing of prod-
ucts, means of production and production systems while considering constantly
changing processes.

Fig. 6.3 Basic structure of a CPPS: the Smart ERP system exchanges information with several CPS; each CPS has an information-processing layer that meters sensors and sets actuators, which in turn act on the underlying physical processes and their flows of energy and substance



The key to the communication between the Smart ERP system and the physical
processes is the virtualization of these processes in the CPS. The CPS is equipped
with sensors and actuators for measuring and actuating operations, which impact the
respective physical process directly. This creates a virtual representation of the
physical process in the CPS, on which the communication between the CPS and
the Smart ERP system then takes place.
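As an illustration only (and not any particular vendor’s interface), this virtual representation can be thought of as a small state object inside the CPS that is continuously updated from sensor readings and that accepts set-points from the planning level, so that the Smart ERP system never has to talk to the raw hardware directly. A minimal Python sketch, with invented sensor and actuator names:

import time
from dataclasses import dataclass, field

@dataclass
class VirtualProcess:
    # Virtual representation (digital shadow) of one physical process.
    process_id: str
    readings: dict = field(default_factory=dict)    # latest sensor values
    setpoints: dict = field(default_factory=dict)   # values for the actuators
    updated_at: float = 0.0

    def on_sensor(self, name, value):
        # Called by the CPS whenever a sensor delivers a new measurement.
        self.readings[name] = value
        self.updated_at = time.time()

    def set(self, name, value):
        # Called by the Smart ERP layer; the CPS forwards this to the actuator.
        self.setpoints[name] = value

# The ERP/planning logic interacts only with the virtual representation:
furnace = VirtualProcess("furnace-07")
furnace.on_sensor("temperature_c", 812.5)     # sensor update from the shop floor
if furnace.readings["temperature_c"] > 800:
    furnace.set("gas_valve", 0.7)             # planning logic adjusts a set-point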
An obvious approach to the design of a Smart ERP system could fall back on a
centralized control of the entire system by means of the communication taking
place between the ERP system and all CPS’s. Such an approach can often be found
in today’s practice. However, in such application scenarios, we should not be
talking about a Smart ERP system. For one thing, it can be assumed that the sheer
number of CPS’s integrated with an overall system leads to such high complexity
that a central control of the system is no longer viable. Moreover, Fig. 6.3 shows
that the CPS’s do not only communicate with the superordinate ERP system, but
also with each other. This means that autonomous units are created, which make
decentralized decisions in collaboration with each other. In these decision pro-
cesses, the Smart ERP system often only serves for the transmission of information,
the adjustment of overall planning, and ensuring governance and compliance.
Fundamental Requirements for Smart ERP Systems
From looking at the Smart ERP system as part of a CPPS, a number of basic
requirements arise. In formulating these requirements we draw on design principles,
which have been collected for Industry 4.0 scenarios in Hermann et al. (2015).
Collectively, they define the essential requirements that have to be met by
Smart ERP systems:

• Interoperability: Basic principle of the IoT, in which all components (CPS,


information systems, etc.) communicate with each other in a standardized way.
• Virtualization: CPS monitor and control physical processes and create a virtual
image of the physical world.
• Decentralization: Central planning, monitoring, and controlling are replaced by
decentralized decisions by collaborating autonomous units.
• Real-time capability: Monitoring, controlling, and decision-making processes
must be carried out in real time.
• Service orientation: Services by companies, organization units, individuals,
information systems and CPS are offered internally or across company bound-
aries and can be used free of charge or for a fee. The orchestration of services is
user-specific.
• Modularity: Modular systems are able to adapt flexibly to new or changed
requirements. Ideally, appropriate modules are identified and used
automatically.

Fig. 6.4 System architecture of the CPPS kernel: a Smart ERP database and big data & analytics components are surrounded by collaborative engineering, strategic supply chain planning (with simulation), collaborative quality management, predictive maintenance, cost management, tactical supply chain and production planning (with simulation), and supply chain execution (procurement, order management, warehouse management, work in progress, inbound and outbound logistics); the CPS are accessed via IoT middleware, which also collects big data, enriched with public data

Architecture of a Smart ERP System


Starting from these requirements, the architecture of a Smart ERP system can be
defined as shown in Fig. 6.4 and essentially comprises the components described
above.
The physical processes that are virtualized in CPS are shown in the lower part of
the illustration. The CPS is accessed through IoT middleware, which is also
equipped with the technology to collect and manage big data. This data is used for
decision operations in different functional areas using powerful analytics features. If
necessary, the data is enriched with data from public databases. Supply chain
management planning and implementation are found within the CPPS kernel.
While much of the operational planning takes place through collaboration between
the CPS, corresponding functionalities needed for strategic and tactical planning, as well as for production planning that cannot be carried out decentrally, have to be provided in the Smart ERP system, which is also equipped with simulation features that take into
account the circumstances of the autonomy of the CPS in their decisions. Especially
in the tactical area, big data analysis is used as well. For the supply chain execution,
functionalities for procurement, receiving, warehouse management, order man-
agement, and shipping, as well as inbound and outbound logistics are considered. In
these operations, too, access to the CPS is provided via the IoT middleware. The
application examples are manifold: automatic repeat orders, tracking and tracing of
intra and extra logistics, automated call-off orders and many more.
The core of the architecture, next to the big data management, is a Smart ERP
database, which can manage both object-relational and multidimensional data. To
achieve high performance, parts of the database can be implemented on the basis
of in-memory technology. Vendors such as SAP rely on HANA for a consistent use

Fig. 6.5 Overall solution architecture: the CPPS kernel is embedded in a PaaS process and integration platform (social business process management, business rules management, integration framework) supporting continuous, enterprise-spanning business processes, and surrounded by SaaS components for governance, risk, compliance and security management, financial management, enterprise performance management, human capital management, social innovation management, collaborative product lifecycle management, collaborative project management, and social value chain partner relationship management

of in-memory technologies, while competitors such as Oracle use these technologies in a targeted, demand-oriented manner. The functional range of the CPPS kernel is com-
plemented with collaborative engineering, collaborative quality management,
predictive maintenance, and cost management. The analysis of big data is of great
importance in all of these functional areas. In the Smart ERP system, new and
innovative business algorithms need to be implemented that enable an optimal use
of big data in prognosis, planning, forecasting, monitoring, and analysis operations.
In engineering and quality management, it is important that value chain partners can
be integrated to be able to support new forms of collaboration efficiently. It goes
without saying that collaboration in these sensitive areas requires a significant
degree of confidence between the partners.
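To give a flavor of such algorithms with a deliberately simple, hypothetical example (real predictive-maintenance models are far more elaborate), one common pattern is to learn what “normal” sensor behavior looks like and to flag deviations early, for instance with an anomaly detector from the scikit-learn library:

import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)

# Hypothetical history: vibration (mm/s) and temperature (degrees C) readings
# recorded while the machine was known to be healthy.
healthy = np.column_stack([rng.normal(2.0, 0.3, 500),
                           rng.normal(60.0, 2.0, 500)])

# Learn the normal operating envelope from healthy data only.
detector = IsolationForest(contamination=0.01, random_state=0).fit(healthy)

# New readings streaming in from the CPS; the last one drifts away from normal.
new_readings = np.array([[2.1, 61.0], [2.3, 59.5], [6.8, 78.0]])
flags = detector.predict(new_readings)        # +1 = normal, -1 = anomalous

for reading, flag in zip(new_readings, flags):
    if flag == -1:
        print("Schedule a maintenance check, unusual reading:", reading)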
The CPPS kernel described is part of a comprehensive Smart ERP development
plan shown in Fig. 6.5. It is equipped with a powerful process and integration
platform, which is realized as a platform service in the cloud. Business processes
can be scheduled and automated using social business process management and
rules management components. It is important that it also supports human work-
flows and adaptive case management. Through an integration framework, appli-
cation components from the cloud and on-premise environments can be linked and
connected to integrated business processes across company boundaries.
The architecture presented contains the most important components, which are
used to supplement the CPPS kernel. Just like in conventional ERP systems, there
are financial management and enterprise performance management as well as
human capital management. Then there are components that take into account the
rapid technological progress with ongoing adaptation to changing requirements, as
well as the need for a close cooperation of partners along the value chain: Social
innovation management, collaborative product lifecycle management, collaborative
project management and social value chain partner relationship management are all

Fig. 6.6 Distributed value chains in virtual organizations: SaaS components, IoT middleware platforms, and CPS are distributed across several value chain partners

included. Governance, risk, compliance, and security management (GRC+) is a key


feature of the Smart ERP system and is indispensable. It has to meet particularly
high requirements in global value chains.
Figures 6.4 and 6.5 show the functions of a Smart ERP system, each as
self-contained modules. This visual impression may be misleading, because in
practice the modules of a Smart ERP system always have to be granulated more finely to take into account the increasing decentralization of the value chains; in particular, the functionalities in a functional area can be composed of services from different manufacturers, which can also be owned by various value chain partners.
Distributed Value Chains in Virtual Organizations
Figure 6.6 schematically shows how a value chain can be flexibly distributed across
a virtual organization. The virtual organization is created through the close col-
laboration between several value chain partners, which must be represented by the
Smart ERP system.
Figure 6.6 shows by means of different coloring how different cloud services are
used for the various processes along the value chain. In addition, various IoT
middleware platforms are used, so that in one case the entire IoT environment
remains with one partner, and in the other case, an internal IoT middleware controls
the access to external CPS. In all cases, the services are integrated with the process
and integration platform, which not only ensures consistent business processes but
also meets the significant requirements in terms of governance, risk, security, and
compliance.

Due to the ever-increasing integration of companies in collaborative value cre-


ation processes, science and practice are increasingly discussing use cases for
shared secure data rooms. One example is the initiative to create an “Industrial Data Space”,7 a “virtual data space which supports the secure exchange
and simple linking of data in business ecosystems on the basis of standards and by
using collaborative governance models. Data is only exchanged if it is requested
from trustworthy certified partners. The data owner—i.e. the company—determines
who is allowed to use the data in what way. As a result, the partners of one supply
chain have joint access to certain data by mutual consent so that they can start
something new, develop new business models, design their own processes more
efficiently or initiate additional added value processes elsewhere, either alone or
together.”

6.2.3 IoT Software Platforms

“Platforms beat products every time.” This often-used quote by MIT Professor Marshall van Alstyne may provide the reason why the IoT software platform market is so competitive. The best-selling “Platform Revolution” by Parker et al. (2016) describes a platform as a marketplace, which defines an ecosystem in which players assume different roles (producers, consumers, suppliers, owners) between which they can switch quickly over time. The marketplace offers a powerful
infrastructure and rules for relationships and transactions between the players.
Parker et al. interpret the IoT as “worldwide platform of platforms” and as a driver
of the platform revolution. This assessment is underscored by an IDC forecast that
assumes 29.5 billion devices will be connected to the IoT by 2020 (see
Turner 2016).
There is an abundance of IoT software platforms available on the market, but
also the variety of features that are provided by these platforms is considerable. This
is due to the different origins of the providers, as shown by a recent Forrester
study.8 Forrester analysts studied 11 providers of IoT software platforms in terms of
their current offerings, strategy and market presence. IBM (Watson IoT Platform),
PTC (ThingWorx), GE (Predix), and Microsoft (Azure IoT Suite) have been
identified as market leaders. Amazon Web Services (AWS IoT Platform), SAP
(SAP HANA Cloud Platform IoT Services) and Cisco (Cisco Jasper Control
Center) were considered “strong providers”. This study can be seen as a snapshot of
the rapid development of available platforms in conjunction with numerous cor-
porate acquisitions. Especially in Germany, it is becoming evident how
market-leading industrial companies are turning into IoT platform providers: Bosch
(Bosch IoT suite), Siemens (MindSphere IoT Operating System) and TRUMPF
(AXOOM IoT platform), just to name a few.

7 www.industrialdataspace.org/en/
8 www.forrester.com/report/The+Forrester+Wave+IoT+Software+Platforms+Q4+2016/-/E-RES136087

Fig. 6.7 IoT software platforms, integrating edge devices with enterprise systems (source: Turner
2016)

Functionality of IoT Software Platforms


According to Forrester, the core task of an IoT software platform is the “integration
of edge devices with enterprise systems by helping to simplify deploying,
managing, operating, and capturing insights from IoT-enabled connected devices”
(see Fig. 6.7). In assessing the platform providers, the Forrester analysts have
identified existing partner ecosystems, pre-built apps, and advanced analytics as key
differentiators. Overall, the analysts categorize the functionality of IoT software
platforms as follows:

• Connect: create and manage the link from the device to the Internet
• Secure: protect IoT devices, data, and identity from intrusion
• Manage: control the provisioning, maintenance, and operation of IoT devices
• Analyze: transform data into timely, relevant insight and action
• Build: create applications and integrate with enterprise systems
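To give a concrete impression of the “connect” and “analyze” functions, many platforms support open messaging protocols such as MQTT. The following Python sketch uses the Eclipse Paho MQTT client (1.x API); broker address, topic layout, and thresholds are invented for the example and do not refer to any specific product:

import json, time
import paho.mqtt.client as mqtt   # pip install paho-mqtt

BROKER = "broker.example.com"                 # hypothetical platform endpoint
TOPIC = "factory/line1/press-07/telemetry"    # hypothetical topic layout

# Edge side ("connect"): a device publishes sensor readings.
device = mqtt.Client(client_id="press-07")
device.connect(BROKER, 1883)
device.loop_start()
for _ in range(3):
    payload = {"ts": time.time(), "vibration_mm_s": 2.4, "temperature_c": 61.0}
    device.publish(TOPIC, json.dumps(payload), qos=1)
    time.sleep(1)

# Platform side ("analyze"): subscribe to all device topics and derive insight.
def on_message(client, userdata, msg):
    data = json.loads(msg.payload)
    if data["vibration_mm_s"] > 5.0:          # naive rule-based insight
        print("Maintenance alert for", msg.topic)

backend = mqtt.Client(client_id="analytics")
backend.on_message = on_message
backend.connect(BROKER, 1883)
backend.subscribe("factory/#", qos=1)
backend.loop_forever()                        # blocks; processes incoming events

The “secure”, “manage”, and “build” functions are what distinguish a full platform from such a bare sketch: authenticated connections, device provisioning and firmware management, and pre-built integrations with the enterprise systems discussed above.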

We consider two sample platforms next which have already achieved a significant
market presence and are continually developed by innovative and economically
strong manufacturers.
IBM Watson IoT Platform
The Watson IoT platform is an adaptable, scalable and open IoT platform, which is
deployed as a service from the IBM Bluemix® cloud infrastructure. It “convinces”
through its functional diversity, including capabilities such as augmented reality,
cognitive capabilities, Blockchain technology, edge analytics, analytics tooling, and
natural language processing, next to the usual basic functionalities. In addition,
IBM is expanding their IoT application range through corporate acquisitions,
including, for example, the weather forecast platform The Weather Company. The
Forrester analysts highlight IBM’s commitment to open source standards and their
extensive global partner ecosystem as strengths. Details of the IBM Watson IoT
solution architecture can be found at www.ibm.com/internet-of-things/platform/
watson-iot-platform/.

The IBM Alliance with Siemens AG is particularly interesting, as Siemens also


has a powerful IoT platform on the market, MindSphere, which has been developed
jointly with SAP on the SAP HANA Cloud Platform IoT. The plan is to integrate
the Watson Analytics Service and other analysis tools in MindSphere and the goal
is to give Siemens business customers access to visualization functionalities and
dashboards and to also provide app developers and data analysts with analytics
technologies via corresponding interfaces. Also, to bundle the competencies of the
partners in analysis technologies, automation and digitalization in the development
of apps (e.g. for predictive maintenance) for MindSphere. The cooperation between
Bosch Software Innovation GmbH (Bosch SI) and IBM has a similar goal: To make
the software-based services of the Bosch IoT suite available on the IBM Watson
IoT platform and thus enable faster networking of devices in the IoT context.
Just like with Siemens, this is not Bosch SI’s only alliance; the company also cooperates
with Cisco, SAP, Software AG, and GE (Predix) in the IoT market.
A look at the IBM IoT platform strategy makes the importance of collaborative
innovation with customers, suppliers and development partners clear. At the headquarters of its worldwide Watson IoT business in Munich, Germany, IBM operates “Collaboratories” on the subject of cognitive IoT. These are industry labs
where IBM experts and customers and partners develop innovative solutions for the
automotive, electronics, manufacturing, health, and insurance industries. Schaeffler
AG is one important partner and is named here as an example of a global auto-
motive and industrial supplier. Within the framework of their strategy “Mobility for
tomorrow”, they are working together with IBM on a digital platform for all
data-based services by Schaeffler. A pre-integrated IoT solution for the insurance
industry is being produced in cooperation with American smart home manufacturer
Wink.
Oracle IoT Cloud Service
With the Oracle IoT Cloud Service, Oracle Corp. is also offering IoT middleware.
Currently, the main target industries are automotive, healthcare, logistics, and
manufacturing. In contrast to many competitors, whose developments
mainly focus on the connectivity of IoT devices, Oracle focuses on an optimal
integration between the IoT devices and the enterprise applications, as well as the
business analytics. Here, Oracle is leveraging its considerable vertical integration,
ranging from server hardware to Java and big data management to applications.
“Connect, analyze, integrate,” that is how Oracle summarizes the functionality of
their in-house IoT middleware that is offered exclusively as a cloud service. Details
of the Oracle IoT Cloud Service solution architecture can be found at www.oracle.
com/solutions/internet-of-things/.
Using Java and other technologies, the Oracle IoT Cloud Service offers many
possibilities for a reliable and secure connection with IoT devices. Big data and
predictive analytics for data analysis are available for delivering insights into
streamed IoT data and events to identify new services or to improve customer
satisfaction through enriched enterprise data. Open interfaces and pre-built

integrations for Oracle’s PaaS and SaaS products can provide IoT data safely and
easily in enterprise applications.
Just like their competitors, Oracle is increasingly opting for strategic partner-
ships with the aim of developing special application solutions and vertical industry
packages. One such example is the cooperation with Bosch Rexroth AG, a pioneer
in the application and development of Industry 4.0 solutions. With the integration
of industrial automation technologies by Bosch Rexroth, the Oracle IoT platform
forms the bridge from the Smart Factory to the world of enterprise systems.

6.2.4 Summary

Industry 4.0 networks people, machines, and objects in the Internet of Things and
paves the way to new production concepts and fully integrated, digital value chains.
They extend across company boundaries and form the basis for the collaboration
between customers, suppliers, engineering, production, and service partners. Tra-
ditional ERP systems are now reaching their limits in terms of planning and
managing corporate resources. We now need Smart ERP systems, which are
available as software services from the cloud. In addition to the cloud, mobile and
big data analytics are the enabling technologies.
Smart ERP systems must be able to communicate efficiently with physical pro-
cesses that process material and energy flows. This communication takes place via
CPS, which are equipped with sensors and actuators for measuring and actuating operations that participate directly in the process. This creates a virtual representation of the physical process in the CPS, on which the communication between the CPS and the Smart ERP system takes place. A proposal for an archi-
tecture where the resulting core system of networked CPS is embedded into an overall
system meets the requirements of a modern Smart ERP system: interoperability,
virtualization, decentralization, real-time capability, service orientation, modularity
(properties generally required for any modern enterprise software system).
IoT platforms play an increasingly key role in the market. The IBM Watson IoT
platform and the Oracle IoT Cloud Service are two platforms that have already
achieved a significant market presence and are continually developed by innovative
and economically strong manufacturers. The capacity for collaborative innovation
involving customers, suppliers and development partners that paves the way for
horizontally and vertically integrated industry solutions defines the overall
competition.
Turner (2016) features interesting results from IDC studies from 2014 and 2015,
demonstrating that the enterprise sector is increasingly seen as the driver for IoT
investments, leaving behind the consumer sector. And nearly 60% of the companies
emphasize the strategic importance of the IoT for their company’s future. IoT is
increasingly becoming a business subject, and modern Smart ERP systems make a
valuable contribution in unlocking the potential of the IoT through improved
process automation, faster and better quality decision support, and a better customer
experience.

6.3 Towards the E-Society

With all we have said about digitization, the Internet and the Web, as well as their
implications for practically all areas of everyday private as well as professional
life, we finally look at several topics every one of us will be confronted with in the
years to come and that will make the road (of the Web) partially bumpy and
partially smooth.

6.3.1 Future Customer Relationship Management

In previous chapters we have discussed customer relationship management (CRM)


and the customer journey in some detail; these topics will remain of prime
importance in the future, and many companies today regard the customer experience
(CX) as their highest priority. As our previous discussion of smart homes shows, a
big challenge currently is integration of the various “standards” around, different
interfaces, languages “spoken” by devices etc. As long as the field remains
Babylonian, customers will not be happy and will stay away from it. It requires smooth
and seamless integration and handling for such a technology (actually a collection
of technologies) to be successful.
The same applies to future retail and e-commerce. According to a recent pre-
diction by Walmart CEO Doug McMillon,9 who speculated about shopping
10 years from now, three aspects will be instrumental for future CX:

1. Customer empowerment due to higher control over their shopping experience.


They want to explore, but with easy access to items regularly chosen, and to
save time while saving money. Everyday needs require a most convenient
combination of store, e-commerce, pick-up, and delivery that is AI-supported,
e.g., via virtual reality.
2. Customer desires can arise anywhere in the world. As McMillon puts it: “I’ve
seen what you have and I want it, too.” In a flat world that is moving fast,
customers can see what people in other countries have and may want it too,
ideally without delay.
3. Shared value: Businesses need to create shared and sustainable value that will
appeal to customers and at the same time benefit shareholders and society. This
is achievable only through collaboration between various parties, including
retailers and governments.

9 www.weforum.org/agenda/2017/01/3-predictions-for-the-future-of-retail-from-the-ceo-of-walmart/

6.3.2 The Future of Work

Much can regularly be found in the press about the disappearance of traditional
labor and about the fact that many of today’s jobs will soon vanish. Indeed, when
we look at an area like car manufacturing, we can immediately see that lots of
manual work is nowadays done by robots (a fact that also holds for other fields of
manufacturing, as implied by Fig. 6.1). As the Daily Mail already warned in 2014,
“50 per cent of occupations will be redundant in 11 years’ time.”10 The study on
which the article was based continued that “experts believe half of today’s jobs will
be completely redundant by 2025; Artificial intelligence will mean that many jobs
will be done by computers; customer work, process work and middle management
will ‘disappear’; … workspaces with rows of desks will no longer exist.”
We mentioned earlier the new generation of robot advisors that many banks are
putting forward, a typical example of a personal advisor being replaced by an
algorithm or a collection of algorithms hidden behind a Web interface. In the same
category falls Wipro HOLMES,11 an AI platform based on cognitive computing
that can serve a number of applications, including helpdesks, diagnosis, shopping,
insurance claims, or maintenance. Amelia by IPsoft is advertised as “your first
digital employee … a cognitive agent who can take on a wide variety of service
desk roles and transform customer experience,”12 and its inventors see businesses
such as insurance, banking, health, retail, or government as potential applications.
So while many consider these prospects a threat, correct predictions about the
future of work are difficult, and we refer the reader interested in more than just
articles on the buzzword “future of work” to the recordings from the 10th De Lange
Conference on “Humans, Machines, and The Future of Work” that took place at
Rice University in Houston, Texas in December 2016, which can be found at
delange.rice.edu/conference_X/videos.html.

6.3.3 Learning for the E-Society

When it comes to Industry 4.0, to the use of IoT across entire value chains, when
Smart Factories are established, or when modern Smart ERP systems are to be
introduced, one must not forget the human factor. An article by Constanze Kurz
states, “Industry 4.0 is understood as a socio-technical system, which does not only
need new technical but also new social infrastructures in order to be implemented
successfully” (see Botthoff and Hartmann 2014). The latter authors also refer to a
survey conducted by the German Fraunhofer Institute in 2013, where experts from
industry and practice were asked to make their 5-year forecasts on the importance
of human work (planning, controlling, execution, monitoring). The result was

10 www.dailymail.co.uk/news/article-2826463/CBRE-report-warns-50-cent-occupations-redundant-20-years-time.html
11 www.wipro.com/holmes/
12 www.ipsoft.com/amelia/

abundantly clear: 97% of the respondents thought that human work will remain
very important (60.2%) or important (36.6%). And even for the non-producing
parts of the value chain, human labor and skill will continue to play an important
role, especially when it comes to the issue of customer experience.
Technological Unemployment and New Skills
However, one cannot deny that the progressive digitization will destroy many jobs or
at least fundamentally transform them. This leads to fears in the population, which
are often ignored by the economic and political elites and act as door openers for
populists of all shapes and sizes. On this subject, a report in Issue 36 of 2016 of
German news magazine DER SPIEGEL identifies a number of high-risk work areas:

• Dematerialization: Through direct data acquisition and processing, many
  traditional office and back-office jobs will disappear.
• Drones, self-driving systems: Autonomous transporters will replace taxi and
  truck drivers, warehouse workers, postmen, and parcel carriers.
• Robotics: 3D printers are replacing highly skilled manual workers such as
  dental technicians or modelers; work and maintenance robots take over
  manufacturing steps in production lines.
• Gig economy: Freelancers look for jobs on virtual platforms, and companies
  contract work out globally.
• Deep learning: Systems linked to databases recognize patterns and replace
  analysis by highly specialized professionals (see the sketch after this list).
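
To make the last item a little more concrete, the following minimal sketch (in
Python with scikit-learn; the records, columns, and labeling rule are invented
solely for illustration) shows how a small neural network can learn a decision
pattern from database-style records that would otherwise be reviewed by
specialized analysts:

# A hypothetical example: a small neural network learns to flag "risky"
# transactions from historical records, standing in for analyst review.
# All data, columns, and the labeling rule below are invented.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(42)
# Synthetic database records: [amount, hour_of_day, customer_age]
X = rng.uniform([10, 0, 18], [5000, 24, 80], size=(1000, 3))
# Invented rule standing in for past analyst decisions: large nightly transfers
y = ((X[:, 0] > 3000) & (X[:, 1] < 6)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = MLPClassifier(hidden_layer_sizes=(16, 8), max_iter=2000, random_state=0)
model.fit(X_train, y_train)          # the "pattern" is learned, not hand-coded
print("held-out accuracy:", model.score(X_test, y_test))

Nothing in this sketch depends on the specific domain; the point is merely that
the decision logic is induced from data rather than written down by an expert.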

Brynjolfsson and McAfee (2014) also postulate a technological unemployment that
arises when jobs are replaced by intelligent machines. But digitization will also
create new jobs, and other jobs will change considerably. What all these changes
have in common is the need for a generally high level of education across all
affected walks of life. Moreover, digitization in the 4th Industrial Revolution
demands, just like all the revolutions before it, new skills that were previously
not necessarily promoted in educational systems and were perhaps even inhibited:

• advanced requirements for working independently and in a self-organized manner;
• the ability to collaborate intensively and effectively with external partners;
• a shift from routine business transaction processing to approval, monitoring,
  planning, and simulation tasks.

An important aspect of the implementation of Industry 4.0 is elaborated in
Kagermann et al. (2013). The authors point out that for a change to succeed and
to be valued positively by employees, suitable organizational models and flexible
work arrangements are crucial, in addition to comprehensive qualification and
further-education measures. They call for models that combine a high degree of
autonomy with decentralized forms of command and control.
Also, one must be aware that, with progressive digitization, lifelong learning is
probably more than ever the only guarantee for secure jobs. To this end,
learning-conducive activities have to be prioritized in future-oriented
organization models. T. Mühlbrandt lists the relevant characteristics of such
activities in Botthoff and Hartmann (2014):

• Independence
• Participation
• Variability
• Complexity
• Communication/cooperation
• Feedback and information
• Avoiding time pressure

These requirements are implicitly contained in the objectives published by the
German Integrata Foundation (www.integrata-stiftung.de) for the humane use of
information technology: "More quality of life through information technology."
This is also the motto of a current project of the foundation, in which a seal of
approval for the humane use of information technology in CPS within the IoT is
being developed. The seal takes into account the population's increasing need for
protection as a result of ongoing digitization. CPS are awarded the seal of
approval by the foundation once they have successfully completed a certification
process, during which it is tested whether the information technology contained
in them is humane. In addition to certification, the quality seal is also used in
benchmarking processes.
Important aspects of testing result from the enormous amounts of data that users
disclose, sometimes voluntarily and in many cases involuntarily. What risks arise
from the permanent availability of users in conjunction with knowledge of their
preferences and behaviors? Does the supposedly higher level of service actually
justify the resulting risks of abuse? Undoubtedly, an increase in the quality of
life results only if the individual remains master of the technologies involved.
The seal of approval comes with tests that assess what is learned, not what is
programmed. Because of their artificial intelligence, modern machines have become
so complicated that their inner workings are hard to see through. This is
amplified by the abilities of CPS, whose behavior is no longer explicitly
programmed; rather, their skills are acquired through learning, guided by
built-in algorithms. The only things that are programmed are how they learn and
how the learned actions are carried out. Testing serves to establish whether what
has been learned really serves the user, without unwanted side effects.
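
The following minimal sketch (again in Python with scikit-learn) illustrates this
idea of testing learned behavior rather than source code; the features, data, and
acceptance criteria are invented for illustration and are in no way the
foundation's actual certification tests:

# A hypothetical behavioral test suite for a learned decision component.
# The model's behavior comes from data, so the tests probe that behavior
# directly instead of inspecting the (trivial) training code.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
# Features: [usage_hours, error_rate, user_age]; label: "offer assistance"
X = rng.uniform([0, 0.0, 18], [24, 1.0, 90], size=(500, 3))
y = (X[:, 1] > 0.5).astype(int)                  # invented ground truth
model = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

def test_serves_the_user(model):
    """High error rates should reliably trigger assistance."""
    probes = rng.uniform([0, 0.9, 18], [24, 1.0, 90], size=(100, 3))
    return model.predict(probes).mean() > 0.95

def test_no_unwanted_side_effect(model):
    """The decision must not flip when only the user's age changes."""
    base = np.array([[5.0, 0.2, 30.0]])
    older = base.copy()
    older[0, 2] = 75.0
    return model.predict(base)[0] == model.predict(older)[0]

print("serves the user:", test_serves_the_user(model))
print("no age side effect:", test_no_unwanted_side_effect(model))

Certification in the sense described above would of course involve far richer
criteria, but the shape is the same: the object under test is the acquired
behavior, not the program text.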

6.4 Further Reading

In May 2014, the New York Times ran an interesting feature entitled “A Vision of
the Future from Those Likely to Invent It,” in which people like Marc Andreessen,
Peter Thiel, Susan Wojcicki, and others shared their visions of the future; see
www.nytimes.com/interactive/2014/05/02/upshot/FUTURE.html. The inventor of
the Web, Tim Berners-Lee, recently stated his worries about the future of the Web,
which in his view are mainly due to three trends13:

1. We've lost control of our personal data.
2. It's too easy for misinformation to spread on the Web.
3. Political advertising online needs transparency and understanding.

He continues to say that “these are complex problems, and the solutions will not be
simple. But a few broad paths to progress are already clear. We must work together
with web companies to strike a balance that puts a fair level of data control back in
the hands of people, including the development of new technology such as personal
‘data pods’ if needed and exploring alternative revenue models such as subscriptions
and micropayments. We must fight against government overreach in surveillance
laws, including through the courts if necessary. We must push back against misin-
formation by encouraging gatekeepers such as Google and Facebook to continue
their efforts to combat the problem, while avoiding the creation of any central bodies
to decide what is ‘true’ or not. We need more algorithmic transparency to understand
how important decisions that affect our lives are being made, and perhaps a set of
common principles to be followed.”
A lot has been written recently about the Internet of Things, its opportunities,
technologies, impacts, and threats, some of which we have already mentioned in
the text. Georgakopoulos and Jayaraman (2016) provide a research perspective.
Readers interested in a broader perspective should consult Greengard (2015), Chou
(2016), Buyya and Dastjerdi (2016), or Raj and Raman (2017). Obviously, since
IoT is still in an embryonic state, there is a lot of discussion not just about tech-
nology, but also about social and ethical aspects; for starters, we refer to Berman
and Cerf (2017).
Rifkin (2014) studies the implications of the Internet of Things, posits a
paradigm shift from market capitalism to collaborative commons, and predicts the
zero-marginal-cost society. The latter is on the horizon, for example, through 3D
printing, which makes production of almost anything a local business, or through
MOOCs (Massive Open Online Courses), online courses intended for worldwide
participation and open access via the Web, made popular through platforms like
Coursera, Educause, or edX. A personal vision of one of this book's authors is
that, in a not-too-distant future, we will buy cars as data sets comprising
digital representations of all the equipment (standard or additional) we are
willing to pay for, which then gets delivered either to the buyer or directly to
a print shop where the car gets printed; if delivered to the buyer, (copying and)
reselling needs to be regulated, while in both cases warranty claims or selling
data (i.e., cars) as "used" require new concepts.
A different look into the future is offered by Carr (2016), who opposes the idea
that the future is only about technology and data; in his view, technology has
not only enriched us but also imprisoned us, and he hence puts it into perspective.

13 www.theguardian.com/technology/2017/mar/11/tim-berners-lee-web-inventor-save-internet
Stiglitz and Greenwald (2014) present and study the modern learning society.
Raconteur ran a feature in April 2016 entitled “The Future Workplace,” see www.
raconteur.net/the-future-workplace. Samit (2015) presents stories from people like
Richard Branson, Steve Jobs, or Elon Musk as well as from companies like
YouTube, Cirque du Soleil, or Odor Eaters and shows how personal transformation
can reap entrepreneurial and other rewards. Kelley and Kelley (2013) show how to
“unleash the creativity within us” in order to keep up with the modern world and its
developments.
References

Achenbach, J. (2015): Driverless cars are colliding with the creepy Trolley Problem; The
Washington Post, December 29, 2015 (see https://www.washingtonpost.com/news/
innovations/wp/2015/12/29/will-self-driving-cars-ever-solve-the-famous-and-creepy-trolley-
problem/).
Agarwal, D.K., B.-C. Chen (2016): Statistical Methods for Recommender Systems. Cambridge
University Press, New York.
Aggarwal, C.C. (2016): Recommender Systems – The Textbook. Springer Cham, Heidelberg,
Germany.
Agrawal, D., S. Das, A. El Abbadi (2012): Data Management in the Cloud: Challenges and
Opportunities. Synthesis Lectures on Data Management, Morgan & Claypool Publishers, San
Francisco, CA.
Agrawal R., T. Imielinski, A. Swami (1993): Mining association rules between sets of items in
very large databases. Proc. ACM SIGMOD International Conference on Management of Data,
207–216.
Agrawal, R., R. Srikant (1994): Fast Algorithms for Mining Association Rules. Proc. 20th
International Conference on Very Large Data Bases, 487–499.
Akl, S.G. (1989): The Design and Analysis of Parallel Algorithms; Prentice-Hall, Inc.,
Englewood-Cliffs, NJ.
Alpaydin E. (2016): Machine Learning. The MIT Press. Cambridge, MA.
Anderson, Ch. (2006): The Long Tail. Why the Future of Business is Selling Less of More;
Hachette Book Group, Lebanon, Indiana. See also the following Web page of Wired Magazine
for a short article: www.wired.com/wired/archive/12.10/tail.html.
Andriole, S.J. (2010): Business Impact of Web 2.0 Technologies. Communications of the ACM 53
(12) 67–79.
ASTRI (2016). Whitepaper on Distributed Ledger Technology. Commissioned by the Fintech
Facilitation Office (FFO) of Hong Kong Monetary Authority (HKMA). https://www.astri.org/
tdprojects/whitepaper-on-distributed-ledger-technology/, retrieved 15 April, 2017.
Baeza-Yates, R., B. Ribeiro-Neto (2011): Modern Information Retrieval: The Concepts and
Technology behind Search, 2nd edition. Addison-Wesley, Reading, MA.
Barabasi, A.-L. (2016): Network Science. Cambridge University Press, Cambridge, UK.
Barker, D. (2016): Web Content Management: Systems, Features, and Best Practices. O’Reilly
Media, Sebastopol, CA.
Battelle, J. (2005): The Search – How Google and Its Rivals Rewrote the Rules of Business and
Transformed Our Culture. Portfolio (Penguin Group), New York.
Bazzell, M., J. Carroll (2016): The Complete Privacy & Security Desk Reference: Volume I:
Digital. CreateSpace Independent Publishing, Seattle.
Berman, F., V. Cerf (2017). Social and Ethical Behavior in the Internet of Things. Communications of the ACM 60 (2) 2017, 6–7.


Berners-Lee, T. (2000): Weaving the Web: The Original Design and Ultimate Destiny of the World
Wide Web. HarperCollins Publishers, New York.
Bilton, N. (2014): Hatching Twitter: A True Story of Money, Power, Friendship, and Betrayal.
Portfolio Penguin, New York.
Blei, D.M. (2012): Probabilistic topic models. Communications of the ACM 55 (4), 77–84.
Blöbaum, B. (ed.) (2016): Trust and Communication in a Digitized World: Models and Concepts
of Trust Research; Springer, Berlin.
Bonnington, C. (2015) http://www.wired.com/2015/02/smartphone-only-computer/, Retrieved 13
May, 2016.
Borgatti, S.P., M.G. Everett, J.C. Johnson (2013): Analyzing Social Networks. SAGE Publications
Ltd., Thousand Oaks, CA.
Botthoff, A., E.A. Hartmann, eds. (2014): Future of Work in Industry 4.0. Springer Vieweg Berlin
Heidelberg (in German).
Boutros, T., T. Purdie (2014). The Process Improvement Handbook: A Blueprint for Managing
Change and Increasing Organizational Performance. McGraw-Hill Education, New York.
Brabham, D.C. (2013): Crowdsourcing. The MIT Press, Boston, MA.
Brandt, R.L. (2012): One Click: Jeff Bezos and the Rise of Amazon.com. Portfolio Penguin,
London, UK.
Brin, S., L. Page (1998): The Anatomy of a Large-Scale Hypertextual Web Search Engine.
Computer Networks 30, pp. 107–117.
Brynjolfsson, E., A. McAfee (2014): The Second Machine Age – Work, Progress, and Prosperity
in a Time of Brilliant Technologies. W. W. Norton & Company, New York, London.
Büttcher, S., C.L.A. Clarke, G.V. Cormack (2010): Information Retrieval – Implementing and
Evaluating Search Engines. The MIT Press, Cambridge, MA.
Buyya, R., A.V. Dastjerdi, eds. (2016): Internet of Things – Principles and Paradigms. Morgan
Kaufmann, Cambridge, MA.
Carr, N. (2008): The Big Switch: Rewiring the World, from Edison to Google. W. W. Norton &
Company, New York.
Carr, N. (2016): Utopia Is Creepy and Other Provocations. W. W. Norton & Company, New
York.
Castro-Leon, E. (2014). Consumerization in the IT Service Ecosystem. IEEE IT Professional 16
(5), pp. 20–27.
Chang, F., J. Dean, S. Ghemawat et al. (2008): Bigtable: A Distributed Storage System for
Structured Data. ACM Transactions on Computer Systems (TOCS) 26 (2), Article No. 4.
Chou, T. (2016): Precision: Principles, Practices and Solutions for the Internet of Things.
Lulu.com Crowdstory.
Chow, R., P. Golle, M. Jakobsson et al. (2009). Controlling Data in the Cloud: Outsourcing
Computation without Outsourcing Control, In Proc. ACM Workshop on Cloud Computing
Security (CCSW), Chicago, IL.
Christensen, C.M. (1997): The Innovator’s Dilemma: When New Technologies Cause Great Firms
to Fail. Harvard Business School Publishing, Boston, MA.
Codd E.F. (1970). A relational model of data for large shared data banks. Communications of the
ACM, 13, pp. 377–387.
Codd E.F. (1982). Relational databases: a practical foundation for productivity. Communications
of the ACM, 25, pp. 109–117.
Cohen, A. (2002): The Perfect Store: Inside eBay. Little, Brown and Company, Boston, MA.
Cohen, H. (2011): What is Social Commerce? http://heidicohen.com/what-is-social-commerc/,
Retrieved 26 March, 2017.
Corbett, J.C., J. Dean, M. Epstein et al. (2012): Spanner: Google’s Globally-Distributed Database.
Proc. 10th USENIX Symposium on Operating System Design and Implementation (OSDI),
251–264.
Cumbie, B. A., B. Kar (2016): A Study of Local Government Website Inclusiveness: The Gap
Between E-Government Concept and Practice. Information Technology for Development 22
(1), 15–35.
Das Sarma, A., A. Parameswaran, J. Widom: (2016). Towards globally optimal crowdsourcing
quality management: The uniform worker setting. In Proceedings of the 2016 International
Conference on Management of Data, 47–62.
De Kare-Silver, M. (1998): E-Shock: The Electronic Shopping Revolution: Strategies for Retailers
and Manufacturers. New York: AMACOM.
Deakins, E., S. Dillon, H. Al Namani (2008): Local e-Government Development Philosophy in
China, New Zealand, Oman, and the United Kingdom. Proc. 6th International Conference on
E-Government: ICEG 2008. Academic Conferences Limited, 109.
Dean, J., S. Ghemawat (2008): MapReduce: simplified data processing on large clusters.
Communications of the ACM 51 (1), 107–113.
Dekel, E., D. Nassimi, S. Sahni (1981): Parallel Matrix and Graph Algorithms; SIAM Journal on
Computing 10, 657–675.
Deloitte (nd): The digital workplace: Think, share, do Transform your employee experience, http://
www2.deloitte.com/content/dam/Deloitte/mx/Documents/human-capital/The_digital_
workplace.pdf, Retrieved 8 May 2016.
Deloitte & Touche (2001): The Citizen As Customer. CMA Management 74 (10), 58.
Denning, P.J., R. Dunham (2010): The Innovator’s Way: Essential Practices for Successful
Innovation. The MIT Press, Cambridge, MA.
Diedrich H. (2016). Ethereum – Blockchains, Digital Assets, Smart Contracts, Decentralized
Autonomous Organizations. Wildfire Publishing.
Dillon, S., G. Vossen (2015): SaaS Cloud Computing in Small and Medium Enterprises: A
Comparison between Germany and New Zealand; International Journal of Information
Technology, Communications and Convergence 3 (2), 87–104.
Docherty, M. (2015): Collective Disruption: How Corporations & Startups Can Co-Create
Transformative New Businesses. Polarity Press, Boca Raton, FL.
Drescher, D. (2017). Blockchain Basics – A Non-Technical Introduction in 25 Steps. Apress
Springer Science + Business Media, New York.
Elmasri, R.A., S.B. Navathe (2016). Fundamentals of Database Systems, 7th ed. Pearson
Addison-Wesley, Boston, MA.
Emarketer.com (2014): Mobile Commerce Trends, https://www.emarketer.com/Webinar/Mobile-
Commerce-Trends/4000088, Retrieved 26 March, 2017.
Englert, M., S. Siebert, M. Ziegler (2014): Logical Limitations to Machine Ethics with
Consequences to Lethal Autonomous Weapons; Computing Research Repository (CoRR),
November 2014 (see http://arxiv.org/abs/1411.2842).
Erl, T. (2005). Service-Oriented Architecture (SOA): Concepts, Technology, and Design.
Prentice-Hall, Upper Saddle River, NJ, USA.
Erl, T. (2009). SOA Design Patterns. Prentice-Hall, Upper Saddle River, NJ, USA.
Erl, T., R. Puttini, Z. Mahmood (2013): Cloud Computing: Concepts, Technology & Architecture.
Prentice Hall, Upper Saddle River, NJ.
Festl R., T. Quandt (2016): The Role of Online Communication in Long-Term Cyberbullying
Involvement among Girls and Boys. Journal of Youth and Adolescence, 45 (9), 1931–1945.
Fitzgerald B., K. Stol (2015): The Dos and Don’ts of Crowdsourcing Software Development, Proc.
SOFSEM 2015, LNCS 8939, 58–64.
Fouss, F., M. Saerens, M. Shimbo (2016): Network Data and Link Analysis. Cambridge University
Press, Cambridge, UK.
Friedman, T.L. (2005): The World is Flat – A Brief History of the Twenty-First Century. Farrar,
Straus and Giroux, New York.
Friedman, T.L. (2016): Thank You for Being Late – An Optimist’s Guide to Thriving in the Age of
Accelerations. Farrar, Straus and Giroux, New York.
Friedman, T.L., M. Mandelbaum (2011): That Used to Be Us: How America Fell Behind in the
World It Invented and How We Can Come Back. Farrar, Straus and Giroux, New York.
Ganesan, R. (2014): Li-Fi Technology in Wireless Communication, Communications Engineering
Papers, Madras Institute of Technology, Anna University, http://www.yuvaengineers.com/li-fi-
technology-in-wireless-communication-revathi-ganesan/ Retrieved 10 December 2016.
Gartner (2016): Gartner Says Worldwide Smartphone Sales to Slow in 2016, http://www.gartner.
com/newsroom/id/3339019 Retrieved 8 August 2016.
Gassmann, O. K., Frankenberger, M. Csik (2014): The Business Model Navigator: 55 Models That
Will Revolutionise Your Business. Pearson Education Ltd., Harlow, UK.
Gertner, J. (2012): The Idea Factory: Bell Labs and the Great Age of American Innovation.
Penguin Group, New York.
Georgakopoulos, D., P.P. Jayaraman (2016): Internet of things: from internet scale sensing to
smart services. Computing 98, 1041–1058.
Gilbert, S., N. Lynch (2002): Brewer’s conjecture and the feasibility of consistent, available,
partition-tolerant web services. ACM SIGACT News 33 (2), 51–59.
Gilchrist, A. (2016): Industry 4.0: The Industrial Internet of Things. Apress Media (Springer
Nature).
Girvan M., M.E.J. Newman (2002). Community structure in social and biological networks. Proc.
National Academy of Science of the USA 99, 7821–7826.
Goransson, P., Ch. Black (2017): Software Defined Networks: A Comprehensive Approach, 2nd ed.
Morgan Kaufmann Publishers, Cambridge, MA.
Grandinetti, L. (2006): Grid Computing: The New Frontier of High Performance Computing.
Elsevier Science, Amsterdam, The Netherlands.
Grant, G., D. Chau (2006): Developing a generic framework for e-government. Advanced Topics
in Information Management 5, 72–94.
Greengard, S. (2015): The Internet of Things. The MIT Press, Cambridge, MA.
Greengard, S. (2017): The Future of Semiconductors. Communications of the ACM 60 (3) 2017,
18–20.
Hammer, M. and Champy, J.A. (1993, revised edition Dec. 2003). Reengineering the
Corporation: A Manifesto for Business Revolution. New York: Harper Collins Publishers.
Han, J., M. Kamber, J. Pei (2012): Data Mining: Concepts and Techniques, 3rd edition; Morgan
Kaufmann Publishers, San Francisco, CA.
Harold, E. R., W. S. Means (2004): XML in a Nutshell, 3rd edition. O’Reilly Media, Sebastopol,
CA.
Haselmann, T., G. Vossen (2014): EVACS: Economic Value Assessment of Cloud Sourcing by
Small and Medium-sized Enterprises; EMISA Forum 1/2014, 18–31 (available at http://www.
emisa.org/index.php/publikationen/forum/item/46-2014-1).
Haselmann, T., G. Vossen, S. Dillon (2015): Cooperative Hybrid Cloud Intermediaries — Making
Cloud Sourcing Feasible for Small and Medium-sized Enterprises; Open Journal of Cloud
Computing (OJCC) 2 (2) 2015, 4–20.
Haselmann, T., G. Vossen, St. Lipsky, Th. Theurl (2011): A Cooperative Community Cloud for
Small and Medium Enterprises; Proc. 1st International Conference on Cloud Computing and
Service Science (CLOSER) 2011, Noordwijkerhout, The Netherlands, SciTePress Science and
Technology Publications, 104–109.
Hermann, M., Pentek, T., Otto, B. (2015): Design Principles for Industrie 4.0 Scenarios: A
Literature Review. Working Paper No. 1/2015, TU Dortmund, Audi Stiftungslehrstuhl Supply
Net Order Management. (http://www.thiagobranquinho.com/wp-content/uploads/2016/11/
Design-Principles-for-Industrie-4_0-Scenarios.pdf).
Hoare, A., R. Milner, eds. (2004): Grand Challenges in Computing Research. The British
Computer Society, Swindon, Wiltshire, UK.
Hopcroft, J.E., R.M. Karp (1973): An n5/2 algorithm for maximum matchings in bipartite graphs.
SIAM Journal on Computing 2 (4), 225–231.
Howe, J. (2009): Crowdsourcing: Why the Power of the Crowd Is Driving the Future of Business.
Three Rivers Press, New York.
Hunter, R.C., C. Eng (1975): Engine Failure Prediction Techniques. Aircraft Engineering and
Aerospace Technology 47 (3), 4–14.
Internet Retailer 2016 Mobile 500 (2015): https://www.digitalcommerce360.com/2015/08/18/
mobile-commerce-now-30-all-us-e-commerce/, Retrieved 26 March, 2017.
Jannach, D., P. Resnick, A. Tuzhilin, M. Zanker (2016): Recommender Systems – Beyond Matrix
Completion. Communications of the ACM 59 (11), 94–102.
Juels, A., A. Oprea (2013): New approaches to security and availability for cloud data.
Communications of the ACM 56 (2), 64–73.
Kagermann, H., W. Wahlster, J. Helbig, eds. (2013): Recommendations for implementing the
strategic initiative Industrie 4.0: Final report of the Industrie 4.0 Working Group. (http://www.
acatech.de/fileadmin/user_upload/Baumstruktur_nach_Website/Acatech/root/de/Material_
fuer_Sonderseiten/Industrie_4.0/Final_report__Industrie_4.0_accessible.pdf).
Kahn, H., A.J. Wiener (1967): The year 2000: A framework for speculation on the next thirty-three
years. Macmillan.
Kalyanasundaram, B., K.R. Pruhs (2000): An optimal deterministic algorithm for b-matching.
Theoretical Computer Science 233 (1–2), 319–325.
Kaplan, R.S., D.P. Norton (1996): The Balanced Scorecard: Translating Strategy into Action.
Harvard Business Review Press, Boston, MA.
Karp, R. (1992): On-line algorithms versus off-line algorithms: How much is it worth to know the
future? Proc. 12th IFIP World Computer Congress, 416–429.
Kavis, M.J. (2014): Architecting the Cloud: Design Decisions for Cloud Computing Service
Models. John Wiley & Sons, Hoboken, NY.
Keen, A. (2015): The Internet Is Not the Answer. Grove Atlantic, Inc., New York.
Keese, C. (2016): The Silicon Valley Challenge: A Wake-Up Call for Europe. Penguin Books,
London, UK.
Kelleher, J.D., B. MacNamee, A. D’Arcy (2015): Fundamentals of Machine Learning for
Predictive Data Analytics. The MIT Press, Cambridge, MA.
Kelley, T., D. Kelley (2013): Creative Confidence – Unleashing the Creative Potential in All of
Us. Crown Business, New York.
Key, S. (2015): XML Programming Success in a Day, 2nd edition. CreateSpace Independent
Publishing Platform.
Kietzmann, J.H., K. Hermkens, I.P. McCarthy, B.S. Silvestre (2011): Social media? Get serious!
Understanding the functional building blocks of social media. Business Horizons 54,
241–251.
Kirkpatrick, D. (2011): The Facebook Effect: The Real Inside Story of Mark Zuckerberg and the
World’s Fastest Growing Company. Virgin Books, New York.
Kittur, A., J.V. Nickerson, M. Bernstein, et al. (2013): The future of crowd work. In Proceedings of
the 2013 conference on Computer supported cooperative work, 1301–1318.
Kostojohn, S., B. Paulen, M. Johnson (2011): CRM Fundamentals. Apress Springer Science +
Business Media, New York.
Lance, D., M.E. Schweigert (2013): BYOD: Moving toward a More Mobile and Productive
Workforce. Business & Information Technology. Paper 3; Montana Tech Library.
Langville, A.N., C.D. Meyer (2012): Google’s PageRank and Beyond – The Science of Search
Engine Rankings. Princeton University Press, Princeton, NJ.
Lanier, J. (2014): Who Owns the Future? Simon and Schuster, New York.
Laudon, K.C., C.G. Traver (2015): E-Commerce: Business, Technology, Society, 11th edition.
Prentice-Hall, Englewood-Cliffs, NJ.
Lebraty, J.-F., K. Lobre-Lebraty (2013): Forms of Crowdsourcing, in Crowdsourcing, John Wiley
& Sons, Inc., Hoboken, NJ USA. doi: 10.1002/9781118760765.ch4.
Lechtenbörger, J., F. Stahl, V. Volz, G. Vossen (2015): Analyzing Observable Success and
Activity Indicators on Crowdfunding Platforms; International Journal of Web Based
Communities (IJWBC) 11 (3–4), 264–289.
Lemstra, W., V. Hayes, J. Groenewegen (2010). The innovation journey of Wi-Fi: The road to
global success. Cambridge University Press.
Leskovec, J., A., Rajaraman, J.D. Ullman (2014): Mining of Massive Datasets, 2nd ed. Cambridge
University Press.
Levene, M. (2010): An Introduction to Search Engines and Web Navigation, 2nd edition. John
Wiley & Sons, New York.
Lewis, D.D. (1992): Representation and learning in information retrieval. PhD thesis, University
of Massachusetts, Amherst, MA.
Lewis, M. (2004): Moneyball: The Art of Winning an Unfair Game. W.W. Norton & Company,
New York.
Lindner, A., ed. (2016): European Data Protection Law: General Data Protection Regulation
2016. CreateSpace Independent Publishing, Seattle.
Liu, B. (2015): Sentiment Analysis – Mining Opinions, Sentiments, and Emotions. Cambridge
University Press, Boston, MA.
Lockwood, T. (2009): Design Thinking: Integrating Innovation, Customer Experience, and Brand
Value. Allworth Press, New York.
Lofaro, R. (2014). The Business Side of BYOD: Cultural and Organizational Impacts. Amazon
Media.
Loos, P., J. Lechtenbörger, G. Vossen, et al. (2011): In-memory Databases in Business
Information Systems; Business Information Systems Engineering 6, 389–395.
Lu, J., D. Wu, M. Mao, W. Wang, G. Zhang (2015): Recommender System Application
Development: A Survey. Decision Support Systems 74, 12–32.
MacManus, R. (2015): Health Trackers: How Technology is Helping Us Monitor and Improve
Our Health. Rowman & Littlefield, Lanham, MA.
MacPherson, I. (1995): Co-operative Principles for the 21st Century. International Co-operative
Alliance, Geneva.
Mahon, E. (2015): Transitioning the Enterprise to the Cloud: A Business Approach. Cloudworks
Publ. Co., Hudson, OH.
Marr, B. (2016): Big Data: Using SMART Big Data, Analytics and Metrics To Make Better
Decisions and Improve Performance. John Wiley & Sons Ltd, Chichester, UK.
Marsden, P. (2009): The 6 Dimensions of Social Commerce: Rated And Reviewed, Digital
Intelligence Today, http://digitalintelligencetoday.com/the-6-dimensions-of-social-commerce-
rated-and-reviewed/, Retrieved 26 March, 2017.
Marshall, S. (2014): What a Digital Workplace Is and What It Isn’t. CMS Wire, http://www.
cmswire.com/cms/social-business/what-a-digital-workplace-is-and-what-it-isnt-027421.php,
Retrieved 9 May 2016.
Mayer-Schönberger, V., K. Cukier (2013): Big Data: A Revolution That Will Transform How We
Live, Work, and Think. John Murray (Publishers), London, UK.
McCaskill, S. (2015): Zapp: Don’t Worry, Mobile Payments ‘Safer’ Than Shopping Online, 2017.
http://www.silicon.co.uk/e-marketing/zapp-mobile-payments-security-161630, Retrieved 26
March 2017.
McConnell, J. (2016): The Organization in the Digital Age, 10th Annual Report. http://www.
netjmc.com/digital-workplace-report/, Retrieved 9 May 2016.
Mcguire, K. (2014): SWOT analysis 34 Success Secrets. Emereo Publishing.
McKeen, J.D., H.A. Smith (2014): IT Strategy: Issues and Practices, 3rd ed. Pearson, Boston,
MA.
Mehta, A., A. Saberi, U. Vazirani, V. Vazirani (2005): Adwords and generalized on-line matching.
IEEE Symp. on Foundations of Computer Science, 264–273.
Mell, P., T. Grance (2011): The NIST Definition of Cloud Computing. Techn. Report SP800-145,
National Institute of Standards and Technology (NIST), Gaithersburg, MD. http://nvlpubs.nist.
gov/nistpubs/Legacy/SP/nistspecialpublication800-145.pdf (as of Jan 8, 2017).
Miller, M. (2009): Google.pedia – The Ultimate Google Resource, 3rd edition. Que Publishing,
Indianapolis, IN.
Miller, K., J. Voas, G. Hurlburt (2012). BYOD: Security and Privacy Considerations. IT
Professional, 14 (5), 53–55.
Morrow, B. (2012). BYOD security challenges: control and protect your most sensitive data,
Network Security 2012 (12), pp. 5–8.
Muschalle, A., F. Stahl, A. Löser, G. Vossen (2013): Pricing Approaches for Data Markets. in M.
Castellanos, U. Dayal, E. Rundensteiner (Eds.): BIRTE 2012 (Proc. 6th International Workshop
on Business Intelligence for the Real Time Enterprise 2012, Istanbul, Turkey), Springer LNBIP
154, 129–144.
Musciano, C., B. Kennedy (2006): HTML & XHTML: The Definitive Guide, 6th edition. O’Reilly
Media, Sebastopol, CA.
Nakamoto, S. (2008): Bitcoin: A Peer-to-Peer Electronic Cash System. Available online for
example from https://bitcoin.org/bitcoin.pdf. Retrieved 15 April, 2017.
Norman, D. A. (1999): The Invisible Computer: Why Good Products Can Fail, the Personal
Computer Is So Complex, and Information Appliances Are the Solution. The MIT Press,
Boston, MA.
Norris, D.F., C.G. Reddick (2013): Local E‐Government in the United States: Transformation or
Incremental Change? Public Administration Review 73 (1), 165–175.
Özsu, M.T., P. Valduriez (2011). Principles of Distributed Database Systems, 3rd ed. Springer, New
York.
Olim, J., M. Olim, P. Kent (1999): The CDnow Story: Rags to Riches on the Internet. Top Floor
Publishing, Lakewood, CO.
Oliveira, G.H.M., E.W. Welch (2013): Social media use in local government: Linkage of
technology, task, and organizational context. Government Information Quarterly 30 (4), 397–
405.
Osterwalder, A., Y. Pigneur (2010): Business Model Generation: A Handbook for Visionaries,
Game Changers, and Challengers. John Wiley & Sons, Hoboken, NJ.
Ovans, A. (2015): What is a Business Model? Harvard Business Review. https://hbr.org/2015/01/
what-is-a-business-model (last visited Feb 22, 2017).
Parker, G.G., M.W. Van Alstyne, S.P. Choudary (2016): Platform Revolution: How Networked
Markets are Transforming the Economy – and How to Make them Work for You. W. W. Norton
& Company, New York, London.
Patel, S., Park, H., Bonato, P., Chan, L., and Rodgers, M. (2012): A review of wearable sensors
and systems with application in rehabilitation. Journal of Neuroengineering and rehabilitation 9
(1), p. 1.
Payne, A. (2005): Handbook of CRM: Achieving Excellence in Customer Management.
Butterworth-Heinemann Elsevier, Amsterdam.
Peffers, K., T. Tuunanen, C.E. Gengler, M. Rossi, W. Hui, V. Virtanen, J. Bragge (2006): The
Design Science Research Process: A Model for Producing and Presenting Information Systems
Research. Proc. 1st Int. Conf. on Design Science Research in Information Systems and
Technology, 83–106.
Peffers, K., T. Tuunanen, M.A. Rothenberger, S. Chatterjee (2007): A Design Science Research
Methodology for Information Systems Research. Journal of Management Information Systems
24 (3), 45–77.
Petri, C.A. (1962): Communication with Automata. Dissertation, Schriften des
Rheinisch-Westfälischen Instituts für Instrumentelle Mathematik an der Universität Bonn,
Germany, Heft 2 (in German).
Plattner, H., A. Zeier (2015): In-Memory Data Management: Technology and Applications, 2nd ed.
Springer-Verlag, Berlin.
Portnoy, M. (2016): Virtualization Essentials, 2nd ed. Sybex, Indianapolis, IN.


Poushter, J. (2016): Smartphone Ownership and Internet Usage Continues to Climb in Emerging
Economies, http://www.pewglobal.org/2016/02/22/smartphone-ownership-and-internet-usage-
continues-to-climb-in-emerging-economies/, retrieved 8th August 2016.
Provost, F., T. Fawcett (2013): Data Science for Business. O’Reilly Media, Sebastopol, CA.
Qualcomm (2014): The Evolution of Mobile Technologies, https://www.qualcomm.com/media/
documents/files/the-evolution-of-mobile-technologies-1g-to-2g-to-3g-to-4g-lte.pdf, Retrieved
25 October 2016.
Quittner, J., M. Slatalla (1998): Speeding the Net: The Inside Story of Netscape and How It
Challenged Microsoft. Atlantic Monthly Press, New York.
Raj, P., A.C. Raman (2017): The Internet of Things: Enabling Technologies, Platforms, and Use
Cases. CRC Press, Boca Raton, FL.
Ramachandran, K.M., C.P. Tsokos (2015): Mathematical Statistics with Applications in R.
Academic Press, London, UK.
Rangwala, H., S. Jamali (2010): “Defining a coparticipation network using comments on Digg.”
IEEE Intelligent Systems, 25(4), 36–45.
Rallapalli, S., U.T. Austin (2014): “Enabling Physical Analytics in Retail Stores Using Smart
Glasses,” Mobicom pp. 115–126.
Reardon, D., J. Reardon (2015): A Practical Guide to Plan Your Blog. Amazon Digital Services,
Inc.
Redmond, E., J. Wilson (2012): Seven Databases in Seven Weeks: A Guide to Modern Databases
and the NoSQL Movement. The Pragmatic Programmers, LLC, Raleigh, NC.
Reed D.A., J. Dongarra (2015): Exascale computing and big data. Communications of the ACM
58 (7), 56–68.
Rensmann, B. (2012): A Multi-Perspective Analysis of Cybermediary Value Creation. Ph.D.
dissertation, University of Münster, Germany.
Richardson, C., C. Le Clair et al. (2012): Embrace Five Disruptive Trends That Will Reshape BPM
Excellence. Forrester Research, Inc. Report.
Rifkin, J. (2014): The Zero Marginal Cost Society: The Internet of Things, Collaborative
Commons, and the Eclipse of Capitalism. Palgrave Macmillan, New York.
Rosing von, M. et al. (2015). The Complete Business Process Handbook: Body of Knowledge from
Process Modeling to BPM – Vol. I. Amsterdam: Morgan Kaufmann Publishers.
Rossman, J. (2016): The Amazon Way: 14 Leadership Principles Behind the World’s Most
Disruptive Company, 2nd ed. Clyde Hill Publishing.
Ruland, B. (2015). Big Data analysis of public microblogging and protocol data using Apache
Spark: The TTIP case. Master thesis, University of Münster, Germany.
Rundle, M. Ch. Conley, eds. (2007): Ethical Implications of Emerging Technologies: A Survey.
UNESCO, Paris.
Ruparelia, N.B. (2016): Cloud Computing. The MIT Press, Boston, MA.
Saecker, M., V. Markl: (2013): Big Data Analytics on Modern Hardware Architectures: A
Technology Survey. Lecture Notes in Business Information Processing 138, Springer-Verlag,
Berlin, 125–149.
Sagefrog Marketing Group (2013): The Seven Types of Social Commerce, https://www.sagefrog.
com/blog/the-seven-types-of-social-commerce/, Retrieved 26 March 2017.
Samit, J. (2015): Disrupt You! Master Personal Transformation, Seize Opportunity, and Thrive in
the Era of Endless Innovation. Flatiron Books, New York.
Schmidt, E., & Cohen J. (2013). The New Digital Age – Rapidly Shaping the Future of People,
Nations and Business, John Murray Publ., London, U.K.
Schneier, B. (2016): Data and Goliath: The Hidden Battles to Collect Your Data and Control Your
World. W. W. Norton & Company, New York.
Schönthaler, F., A. Oberweis (2013): Social Innovation Labs: Generation and Implementation of
Innovations. In Proc. of the DOAG 2013 Applications Conference, Berlin, Germany.
Schönthaler, F., G. Vossen, A. Oberweis, T. Karle (2012): Business Processes for Business
Communities: Modeling Languages, Methods, Tools. Springer Heidelberg Dordrecht London
New York.
Schomm, F., F. Stahl, G. Vossen (2013): Marketplaces for Data: An Initial Survey; ACM
SIGMOD Record 42 (1), 15–26.
Scott J. (2013): Social Network Analysis, 3rd edition. SAGE Publications Ltd. Thousand Oaks, CA.
Seelig, T. (2012): inGenius: A Crash Course on Creativity. HarperCollins Publishers, New York.
Seelig, T. (2015): Insight Out: Get Ideas Out of Your Head and Into the World. HarperCollins
Publishers, New York.
Shan, S., L. Wang, J. Wang, Y. Hao, F. Hua (2011): Research on e-government evaluation model
based on the principal component analysis. Information Technology and Management 12 (2),
173–185.
Shasha, D., M. Wilson (2010): Statistics is Easy! 2nd ed. Synthesis Lectures on Mathematics and
Statistics, Morgan & Claypool Publishers, San Francisco, CA.
Sitaram, D., G. Manjunath (2012): Moving To The Cloud: Developing Apps in the New World of
Cloud Computing. Elsevier Syngress.
Stahl, F., F. Schomm, G. Vossen (2014): Data Marketplaces: An Emerging Species; H.-M. Haav,
A. Kalja, T. Robal, eds.: Databases and Information Systems VIII, Frontiers in Artificial
Intelligence and Applications Series, Vol. 270, IOS Press, Amsterdam, 145–158.
Stahl, F., F. Schomm, G. Vossen, L. Vomfell (2016): A Classification Framework for Data
Marketplaces; Vietnam Journal of Computer Science 3 (3), 137–143.
Steffen, D. (2013): Parallelized Analysis of Opinions and their Diffusion in Online Sources.
Master’s thesis, University of Münster, Germany, January 2013.
Stiglitz, J.E., B.C. Greenwald (2014). Creating a Learning Society. A New Approach to Growth,
Development, and Social Progress. Columbia University Press, New York.
Stieglitz, S., L. Dang-Xuan (2013). Social media and political communication: a social media
analytics framework. Social Network Analysis and Mining, 3 (4), 1277–1291.
Stone, B. (2014): The Everything Store: Jeff Bezos and the Age of Amazon. Little, Brown and Co.,
New York.
Stone, B. (2017): The Upstarts: How Uber, Airbnb, and the Killer Companies of the New Silicon
Valley Are Changing the World. Little, Brown and Company, New York.
Swenson, K. D. (2010). Mastering the Unpredictable: How Adaptive Case Management Will
Revolutionize the Way That Knowledge Workers Get Things Done. Tampa, FL: Meghan-Kiffer
Press.
Tanenbaum, A. S., D.J. Wetherall (2010): Computer Networks, 5th edition. Pearson Education Inc.,
Boston, MA.
Tanenbaum, A. S, M. van Steen (2007): Distributed Systems – Principles and Paradigms, 2nd
edition. Prentice-Hall, Upper Saddle River, NJ.
Thackray, A., D. Brock, R. Jones (2015): Moore's Law: The Life of Gordon Moore, Silicon
Valley's Quiet Revolutionary. Basic Books, New York.
The Verge. (2013): “Google Glass apps: everything you can do right now”, http://www.theverge.
com/2013/5/20/4339446/google-glass-apps-everything-you-can-do-right-now, last accessed:
2016/05/25.
Theurl, T., E.C. Meyer, eds. (2005): Strategies for Cooperation. Shaker-Verlag, Aachen, Germany.
Thiel, P., B. Masters (2014): Zero to One: Notes on Startups, or How to Build the Future. Crown
Business, New York.
Tian, Y., B. Song, E.N. Huh (2011). Towards the development of personal cloud computing for
mobile thin-clients. Proc. IEEE International Conference on Information Science and
Applications (ICISA), 1–5.
Turner, V. (2016): Reducing the Time to Value for Internet of Things Deployments. IDC,
Framingham, MA, USA.
Venters, W., E.A. Whitley (2012): A critical review of cloud computing: researching desires and
realities. Journal of Information Technology 27 (3), 179–197.
Vise, D.A. (2005): The Google Story – Inside the Hottest Business, Media and Technology Success
of Our Time. Macmillan, London.
Vom Brocke, J., M. Rosemann (2015a): Handbook on Business Process Management 1:
Introduction, Methods, and Information Systems, 2nd ed. Springer Heidelberg.
Vom Brocke, J., M. Rosemann (2015b): Handbook on Business Process Management 2: Strategic
Alignment, Governance, People and Culture, 2nd ed. Springer Heidelberg.
Vom Brocke, J., T. Schmiedel (2015): BPM – Driving Innovation in a Digital World.
Springer Berlin.
Vossen, G. (2014): Big Data as the New Enabler in Business and other Intelligence. Vietnam
Journal of Computer Science 1 (1), 1–12.
Watts, D.J. (2004): Six Degrees: The Science of a Connected Age. W.W. Norton & Company,
New York.
Weikum, G., G. Vossen (2002): Transactional Information Systems – Theory, Algorithms, and the
Practice of Concurrency Control and Recovery; Morgan Kaufmann Publishers, San Francisco,
CA.
White, M. (2012). Digital workplaces: Vision and reality. Business Information Review 29 (4),
205–214.
White, T. (2015): Hadoop: The Definitive Guide: Storage and Analysis at Internet Scale. O’Reilly
Media, Sebastopol, CA.
Williamson, O. (2005): Economics of Interfirm Networks, chapter Networks - Organizational
Solutions to future challenges. Ökonomik der Kooperation. Mohr Siebeck, Tübingen, 3–28.
Wirtz, B.W., P. Nitzsche (2013): Local level E-government in international comparison. Journal of
Public Administration and Governance 3 (3), 64–93.
Witten, I. H., E. Frank, M.A. Hall, C.J. Pal (2016): Data Mining – Practical Machine Learning
Tools and Techniques, 4th edition. Morgan Kaufmann Publishers, San Francisco, CA.
wpress4.me (2013): The 7 Species of Social Commerce, http://www.wpress4.me/the-7-species-of-
social-commerce/, Retrieved 26 March 2017.
Yayiki, E. (2016): Design Thinking Methodology Book. Design Management Institute,
ArtBizTech, Istanbul, Turkey.
Yotpo. (n.d.): The 4 Most Powerful Social Commerce Trends, https://www.yotpo.com/blog/the-4-
most-powerful-social-commerce-trends/, Retrieved 26 March 2017.
Zaki, M.J., W. Meira Jr. (2014): Data Mining and Analysis. Cambridge University Press, New
York.
Zimmer, M. (2010). Facebook’s Zuckerberg: “Having two identities for yourself is an example of
a lack of integrity”, http://www.michaelzimmer.org/2010/05/14/facebooks-zuckerberg-having-
two-identities-for-yourself-is-an-example-of-a-lack-of-integrity/, last accessed 2014/12/11.