The Web at Graduation and Beyond
Business Impacts and Developments

Gottfried Vossen • Frank Schönthaler • Stuart Dillon
Gottfried Vossen
Department of Information Systems, ERCIS
University of Münster
Münster, Germany

Frank Schönthaler
PROMATIS Group
Ettlingen, Baden-Württemberg, Germany

Stuart Dillon
Department of Management Systems
University of Waikato
Hamilton, New Zealand
At least two of the authors of this book have reached an age at which it is not
uncommon to witness, often in disbelief and sometimes stunned, the younger
generation commonly referred to as millennials, or Generation Y. We
encounter millennials in talent management, in the workplace, in associations and
clubs, but also in our families; in other words, we associate with them in all sectors
of society. Born in the period spanning the early 1980s to the beginning of the new
millennium, millennials constantly push boundaries in search of new meaning and
the perfect work-life balance. Their professional biographies are characterized by
internationalization, changing fields of activity, and phases of varying work
intensity, exhibiting a mixture of professional as well as social tasks. They demand
increased societal as well as environmental responsibility from their employers, and
they require room for their personal development. More importantly, they seek
praise and appreciation for their dreams, their visions, and their achievements.
It is of little value to examine and assess millennials based on their current
behaviors—they are way too dynamic for that. Instead, we must comprehend where
they come from, what has formed them, what influences them, and what they
consider important in their lives. To achieve this, we must look back at their early
years in life, their time at school, and their professional training, which is when
millennials often finish their formal education or studies—in other words, when
millennials reach graduation. This provides us with the basis on which we might
dare to look into their future.
Considerations like these have motivated us to write this book. After all, the
Web is just another (albeit non-human) millennial and unquestionably one of the
most important companions in the lives of human millennials. Its evolution has
taken it from a simple means to communicate as Web 1.0 to a Web of participation,
otherwise known as the Social Web or Web 2.0, to today where the Web now
infiltrates all aspects of our private and professional lives as a core driver of
digitization. While the early Web had a focus on rationalizing work procedures and
providing a repository for information, the Social Web enabled improvements of
process quality and ‘flattened’ the pathway into the digital world. Facilitating this
was improved usability, enhanced interactivity, and ubiquitous access, in particular
via mobile platforms. Today Web-based technologies are the enabler for novel
forms of customer experience, for disruptive business models, and for modes of
• Big data analytics: How to exploit big data scenarios for the benefit of my
business? What is a reasonable Big Data architecture (beyond a data
warehouse)?
• Social media data: How to handle the big data produced in and by social media
today? How to distinguish relevant from irrelevant data? How to measure the
value of a social media presence?
• Business Intelligence: Which adaptations need to be made to our BI processes?
• IT Decision Making and Strategy: Bring Your Own Device (BYOD) versus
Company Owned, Personally Enabled (COPE).
We can see a variety of paths through the various chapters of this book, indicated
in the following picture:
inconsistencies. Sabine Schwarz prepared the figures and ensured they were presented
accurately and uniformly throughout the text. Lena Hertzel checked all the references
for us and corrected a number of citation errors.
1 The Web from Freshman to Senior in 20+ Years (that is, A Short
History of the Web). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Beginnings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1.1 Browsers: Mosaic and Netscape . . . . . . . . . . . . . . . . . . . . . 2
1.1.2 Client/Server and P2P . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.1.3 HTML and XML . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.1.4 Commerce on the Web . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.2 The Search Paradigm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.2.1 The Web as a Graph . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.2.2 Search Engines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.2.3 The Long Tail . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.2.4 Directories . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.3 Hardware Developments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.3.1 Moore’s Law: From Mainframes to Smartphones . . . . . . . . 19
1.3.2 IP Networking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
1.4 Mobile Technologies and Devices . . . . . . . . . . . . . . . . . . . . . . . . . 23
1.4.1 Mobile Infrastructure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
1.4.2 Mobile Devices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
1.5 From a Flat World to a Fast World that Keeps Accelerating . . . . . 31
1.6 Socialization: Comprehensive User Involvement . . . . . . . . . . . . . . 35
1.6.1 Blogs and Wikis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
1.6.2 Social Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
1.6.3 The Crowd as Your Next Community . . . . . . . . . . . . . . . . 43
1.7 The Web at Graduation? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
1.8 Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
2 Digital (Information) Technologies . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
2.1 Digitized Business Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
2.1.1 What Is the Problem? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
2.1.2 Business Process Modeling and the Horus Method . . . . . . . 55
2.1.3 Holistic Business Process Management . . . . . . . . . . . . . . . . 57
2.1.4 BPM Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
Since its inception in the early 1990s, the World-Wide Web (the Web for short) has
revolutionized our personal and professional lives and indeed our society more than
many other technological developments in recent history. In this first chapter, we
will outline the evolution of the Web from the early days until today, having just
turned 25 in August 2016.1 This involves taking a brief tour of the history of the
Web, during which we touch upon some of its underlying technological
developments, which have enabled its evolution (and continue to do so). This relates to
hardware, software as well as computer networks and their rapid evolution during
the past 2.5 decades. From a usage perspective, we look at what we have experi-
enced over the past 25 years, primarily viewing the Web as an ever-growing and
omnipresent library of information which we access through search engines and
portals, the Web as a media repository facilitating the hosting and sharing of
resources—often for free, and the Web as an enabler of do-it-yourself services as
well as of disruptive developments in many established industries. In later chapters
we will also discuss the Web as a commerce platform through which people and
companies increasingly conduct their business. We will also look at the
comprehensive and unprecedented user involvement in the Web. Here, it will not come as a
surprise that the younger generation of ‘digital natives’ today interacts with the Web
in an entirely different way than people did at its inception. In all of this, an
important role is played by data, i.e., the vast amounts of data that are produced on
the Web at increasing speeds and volumes. For some time now, the buzzword here
has been “big data,” although size is only a small part of the story.
These aspects, their impacts, and their results will take us through an evolution
that went from “freshman Web 1.0” which was mainly usable as a one-way
information repository, to “junior Web 2.0,” also termed the “Read-Write Web,”
where end-users started contributing content to the Web, to the situation we are
facing today: constant interaction between people, business, even government
1 www.computerhistory.org/atchm/happy-25th-birthday-to-the-public-web/
1.1 Beginnings
Transport yourself back to 1993, when the World-Wide Web, the WWW, or the
Web as we have generally come to call it, had just arrived. Especially in academia,
where people had been using the Internet since the late 1970s and early 1980s in
various ways and for various purposes including file transfer and email, it quickly
became known that there was a new service available on the Internet. Using this
new service, one could request a file written in a language called HTML (the
Hypertext Markup Language, see below), and if one had a program called a
browser installed on a local computer, that program was able to display or render
the HTML file when it arrived. In simple terms, the Internet had been transformed
from a scientific tool requiring expertise to use and being available only to a small
number of expert users, to an information discovery tool requiring little expertise
and now available to the mass market. We start our tour through the history of the
Web by taking a brief look at browsers.
One of the first browsers was Mosaic, developed by the National Center for
Supercomputing Applications (NCSA) at the University of Illinois in
Urbana-Champaign in the US. There had been earlier browser developments (e.g.,
Silversmith), but Mosaic was the first graphical browser which could display more
than just plain ASCII text (which is what a text-based browser does). The first
version of Mosaic had limited capabilities: It could access documents and data
using the Web, the File Transfer Protocol (FTP), or several other Internet services;
it could display HTML files comprising text, links, images (in different formats),
and already supported several video formats as well as PostScript; it came with a
toolbar that had shortcut buttons; it maintained a local history as well as a hotlist,
and it allowed the user to set preferences for window size, fonts, etc.
The initial version of Mosaic was launched in March 1993, its final version in
November the same year, and although far from the functionality of modern browsers
with all their plug-ins and extensions, users soon began to recognize that there was
a new “beast” out there with which one could easily access information stored in
remote places. A number of other browsers followed, in particular Netscape
Navigator (later renamed Communicator, then renamed back to just Netscape) in 1994,
Microsoft Internet Explorer in 1995, Opera in 1996, Apple’s Safari in 2003,
Mozilla Firefox in 2004, and Google Chrome in 2008. Netscape and Microsoft soon
got into what is now known as the browser war (see Quittner and Slatalla 1998),
which at the time was won by Microsoft, although as we will see later, in
technology nothing is permanent. The playing field has changed significantly in recent
years with Google Chrome now dominating the market with close to 60% market
share as of November 2016. Microsoft’s Internet Explorer, now replaced by
Microsoft Edge, Mozilla Firefox and Apple’s Safari are a significant distance
behind (see: www.w3counter.com/trends).
In mid-1994, Silicon Graphics founder Jim Clark started to collaborate with
Marc Andreessen to found Mosaic Communications (later renamed to Netscape
Communications). Andreessen had just graduated from the University of Illinois,
where he had been the leader of the Mosaic project. They both saw the great
potential for Web browsing software, and from the beginning Netscape was a big
success (with more than 80% market share at times), in particular since the software
was free for non-commercial use and came with attractive licensing schemes for
other uses. Netscape’s success was also due to the fact that it introduced a number
of innovative features over the years, among them the on-the-fly displaying of Web
pages while they were still being loaded; in other words, text and images started
appearing on the screen as they were downloading. Earlier browsers did not display
a page until everything that was included had been loaded, which had the effect that
users might have to stare at an empty page for several minutes and which caused
people to speak of the “World-Wide Wait.” With Netscape, however, a user could
begin reading a page even before its entire contents was available, which greatly
enhanced the acceptance of this new medium. Netscape also introduced other new
features (including cookies, frames, and later JavaScript programming), some of
which eventually became open standards through bodies such as the W3C, the
World-Wide Web Consortium (w3.org), and ECMA, the European Computer
Manufacturers Association (now called Ecma International, see
www.ecma-international.org). An image of Version 4 of the Netscape homepage of April
1999 with its “Netcenter” site collection can be found, for example, at
blogoscoped.com/archive/2005-03-23-n30.html; the main features then included the menu
bar, the navigation, address, and personal toolbars, the status bar, and the component
bar.
Although free as a product for private use, Netscape’s success was big enough to
encourage Clark and Andreessen to take Netscape Communications public in
August 1995. As Dan Gillmor wrote in August 2005 in his blog: “I remember the
day well. Everyone was agog at the way the stock price soared. I mean, this was a
company with scant revenues and no hint of profits. That became a familiar concept
as the decade progressed. The Netscape IPO was, for practical purposes, the Big
Bang of the Internet stock bubble—or, to use a different metaphor, the launching
pad for the outrages and excesses of the late 1990s and their fallout. … Netscape
exemplified everything about the era. It launched with hardly any revenue, though it
did start showing serious revenues and had genuine prospects … .” Netscape was
eventually retired in 2008.
Mosaic already had basic browser functionality and features that we have since
gotten used to, and it worked in the way we still use browsers today: the
client/server principle applied to the Internet. This principle is based on a simple
idea, illustrated in Fig. 1.1: Interactions between software systems are broken down
into two roles: Clients request services, servers provide them. When a client needs a
service such as database access, an e-mail to be sent or a print function to be
executed on its behalf, it sends a corresponding request to the respective server. The
server will then process this request, i.e., execute the access, message sending, or
printing, and will eventually send a reply back to the client.
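This request/reply cycle can be sketched in a few lines of Python using only the standard library; the handler, the payload, and the choice of an ephemeral port are illustrative assumptions, not details from the text:

```python
# A minimal sketch of the client/server principle: the server processes
# each incoming request and sends a reply back to the client.
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

class ReplyHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"Hello from the server"          # the server's reply
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):                # silence request logging
        pass

server = HTTPServer(("127.0.0.1", 0), ReplyHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

# The client sends a request and waits for the reply.
with urlopen(f"http://127.0.0.1:{server.server_port}/") as reply:
    text = reply.read().decode()
server.shutdown()
print(text)  # -> Hello from the server
```

Note that the client here blocks on `urlopen` until the reply arrives, which is exactly the synchronous behavior discussed next.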
This simple scheme has been used widely, and it is this scheme that interactions
between a browser and a Web server are based upon. A common feature of this
principle is that it often operates in a synchronous fashion: While a server is
responding to the request of a client, the client will typically sit idle and wait for the
reply; only once the reply has arrived will the client continue whatever it was
doing before sending off the request. This form of interaction is often necessary; for
example, if the client is executing a part of a workflow which needs data from a
remote database, this part cannot be completed before that data has arrived. This
dependency has also been common in the context of the Web until recently, when
asynchronous interaction became more dominant.
The basics that led to launching the Web as a service sitting atop the Internet
were two quickly emerging standards: HTML, the Hypertext Markup Language,
and HTTP, the Hypertext Transfer Protocol. The former is a language, developed
by Tim Berners-Lee at CERN, the European particle physics lab in Geneva,
Switzerland, for describing Web pages, i.e., documents a Web server will store and
a browser will render. The latter is a protocol for transferring a request for a page
from a client to a Web server and for transferring the requested page in a reply back
to the browser. Thus, the client/server principle is also fundamental for the
interactions happening on the Web between browsers and Web servers (see Fig. 1.2).
Over the years, HTML has become extremely successful as a tool that can be
employed even without a deep understanding of programming languages to put
information on the Web. The reasons for this include the fact that HTML is a vastly
fault-tolerant language, where programming errors are simply ignored, and the
availability of numerous tools, from simple text editors to sophisticated
WYSIWYG (What You See Is What You Get) environments, for producing HTML
documents.
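HTML's fault tolerance can be observed even with Python's standard-library parser, which, much like a browser, simply carries on past markup errors; the sloppy example page below is invented for illustration:

```python
# A sketch of HTML's fault tolerance: the parser does not raise an
# error on broken markup, it just processes what it can recognize.
from html.parser import HTMLParser

class TextCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.text = []

    def handle_data(self, data):
        if data.strip():
            self.text.append(data.strip())

# The <b> tag is never closed and </html> is missing -- no error is raised.
sloppy_page = "<html><body><p>Hello <b>Web</p></body>"
p = TextCollector()
p.feed(sloppy_page)
print(p.text)  # -> ['Hello', 'Web']
```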
The client/server principle has over time undergone a generalization, since it has
also had an impact on how users see the Web and the information it provides.
Initially, when the Web first appeared and HTML became available as a markup
language for Web pages, people composed their HTML code in a text editor, a way
that still works today. A few years later, tools became available for designing Web
pages and for setting up Web sites more and more easily. Some of these simply
allowed users to design HTML documents and to include links, graphics, maybe
even audio and video in a WYSIWYG fashion; others allowed for easy
management of entire Web sites comprised of multiple pages. The modern result of this
development is the content management system (CMS), which underpins most major
Web sites today, in particular those maintained at an enterprise level (Barker 2016).
What is more important is the fact that over time more and more people started
setting up sites using these tools, and the obvious consequence was that the
information available on the Web grew exponentially. Once a site had been created,
the next important issue was to make it easy for users to locate, for which the
emerging class of search engines provided registration mechanisms, sometimes for
free, although increasingly with a fee. This also led to the development of tricks
that, for example, faked high popularity of a site just in order to get a high (i.e.,
close to the top) placement within search results. Besides text documents, people
soon started to place other types of documents on the Web, in particular media such
as image, audio, and video files. Now every Web user is likely to have experienced
how easy it is to save (actually copy) an image found in an HTML document: just
right-click on the image and select the “save image as” option! Similarly, audio and
video files can easily be downloaded and copied to a local computer, as long as
access to these files is granted. Because obtaining information from the Web
became so easy, and because of the sheer number of files available on the Web, the
way was paved for a new attitude towards information and its consumption.
It soon turned out that the traditional client/server model behind the Web was
less than optimal for some interactions, including the download or streaming of
large files, e.g., a video file that contains a 90-minute movie. Video streaming is not
just a matter of bandwidth; it is also a matter of a single server being occupied with
a large request for quite some time. In response to this problem, peer-to-peer (P2P)
networks were devised, which bypassed the need for a central server to take care of
all incoming requests. Instead, a P2P network primarily relies on the computing
power and bandwidth of its participants and is typically used for connecting nodes
via mostly ad-hoc connections. A P2P network also does not distinguish between a
client and a server; any participant in the network can function as either a client or a
server to the other nodes of the network, as needed by the task at hand. In fact, a
P2P system comes with complete and decentralized self-management and resource
usage, and it enables two or more peers to collaborate spontaneously in a network
of equals (peers) by using appropriate information and communication systems.
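The dual client/server role of a peer can be reduced to a short sketch; the `Peer` class and its in-memory "network" are invented for illustration and ignore real P2P concerns such as discovery, routing, and chunked transfer:

```python
# An in-memory sketch of the P2P idea: every node can act as both
# client and server, as required by the task at hand.
class Peer:
    def __init__(self, name, files):
        self.name = name
        self.files = dict(files)          # locally stored files

    def serve(self, filename):
        # server role: answer a request from another peer
        return self.files.get(filename)

    def fetch(self, filename, network):
        # client role: ask the other peers until one has the file
        for peer in network:
            if peer is not self:
                data = peer.serve(filename)
                if data is not None:
                    self.files[filename] = data  # can now serve it too
                    return data
        return None

alice = Peer("alice", {"song.mp3": b"audio bytes"})
bob = Peer("bob", {})
network = [alice, bob]
bob.fetch("song.mp3", network)
print(bob.serve("song.mp3") is not None)  # -> True: bob is now a server too
```

The last line shows the essential point: once a peer has fetched a file, it can in turn serve that file to others, so capacity grows with the number of participants.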
As mentioned, one of the many uses for P2P networks is the sharing of large
files, which is today done on a large scale on the Internet. The different P2P systems
in use are based on distinct file-sharing architectures with different principles,
advantages, disadvantages and naming conventions. One of the consequences of
developments like P2P networks, communication protocols, and other tools just
described, has been that information and files started to become available on the
Web which previously had been relatively difficult and costly to acquire. We are not
delving into legal issues related to services like Napster, Kazaa or Mega Upload
here or into general issues related to copyright or intellectual rights and their
protection. However, the fact is that many users around the globe started using the
Internet and the Web as a free source for almost everything. For example, once the
mp3 format had been invented as a digital audio encoding and compression format
by Fraunhofer’s Institut für Integrierte Schaltungen in Erlangen, Germany, music
got transformed into mp3 format en masse and then could be copied freely between
computers and other devices. As a result, users started illegally “ripping” music
CDs and exchanging their content over the Internet; others then took videos of
recently released movies with a camcorder in a cinema, compressed them into a
suitable video format, and put them on a file-sharing network for general copying.
Today, the most prominent site for online video is YouTube, from where music and
video can be streamed. As an aside, we direct the reader interested in how the music
business changed as a consequence of the above to www.digitalmusicnews.com/
2014/08/15/30-years-music-industry-change-30-seconds-less/.
2 www.apachefriends.org
In order to enhance Web programming even further, a recent idea has been to not
only allow HTML creation or modification on the fly (“dynamically”), but to be
able to provide direct feedback to the user via on-the-fly HTML generation on the
client. This, combined with asynchronous processing of data which allows sending
data directly to the server for processing and receiving responses from the server
without the need to reload an entire page, has led to a further separation of user
interface logic from business logic, now known by the acronym Ajax
(Asynchronous JavaScript and XML). Ajax was one of the first Web development
techniques that allowed developers to build rich Web applications similar in
functionality to classical desktop applications, yet running in a Web browser. Its
main functionality stems from an exploitation of XMLHttpRequest, a JavaScript
class (with specific properties and methods) supported by most browsers which
allows HTTP requests to be sent from inside JavaScript code. Ajax calls are
provided by popular JavaScript libraries such as jQuery or AngularJS.
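The asynchronous pattern that Ajax brought to the browser (fire off a request, keep working, handle the reply when it arrives) can be illustrated outside the browser with Python's asyncio; the resource name and delay below are invented stand-ins, not browser APIs:

```python
# Sketch of asynchronous request handling in the Ajax spirit: issue a
# request, continue with other work, then consume the reply when ready.
import asyncio

async def fetch(resource, delay):
    # stands in for a request that completes after `delay` seconds
    await asyncio.sleep(delay)
    return f"data from {resource}"

async def page():
    pending = asyncio.create_task(fetch("/ticker", 0.05))  # fire the request
    work_done = ["render header", "render sidebar"]        # meanwhile: other work
    update = await pending          # handle the reply once it arrives
    return work_done, update

work_done, update = asyncio.run(page())
print(update)  # -> data from /ticker
```

The point mirrored here is that the "page" is not frozen while the request is in flight, which is precisely what distinguishes Ajax applications from full-page reloads.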
Out of the numerous applications XML has seen to date, we just mention Web
services, which extend the client/server paradigm by the notion of a registry,
thereby solving the problem of locating a service in a way that is appropriate for the
Web. In principle, they work as follows: A service requestor (client) looking for a
service sends a corresponding query to a service registry. If the desired service is
found, the client can contact the service provider and use the service. The provider
has previously published his service(s) in the registry. Hence, Web services hide all
details concerning their implementation and the platforms they are based on; they
essentially come with a unique URI3 that points to their provider. Since Web
services are generally assumed to be interoperable, they can be combined with other
services to build new applications with more comprehensive functionality than any
single service involved. It has to be noted, however, that this appealing concept has
been obscured by the fact that vendors have often insisted on proprietary registries,
thereby hindering true interoperability.
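The publish/find/use pattern just described can be reduced to a toy sketch; the registry, the service name, and the conversion function are all invented for illustration and abstract away real-world details such as WSDL descriptions or network transport:

```python
# A toy sketch of the Web services pattern: a provider publishes a
# service in a registry; a requestor finds it there and then uses it.
registry = {}

def publish(name, provider):
    registry[name] = provider          # provider side: publish the service

def find(name):
    return registry.get(name)          # requestor side: query the registry

# provider publishes a (hypothetical) currency-conversion service
publish("currency-conversion", lambda amount, rate: round(amount * rate, 2))

# requestor locates the service, then invokes it without knowing
# anything about its implementation
service = find("currency-conversion")
print(service(100, 0.92))  # -> 92.0
```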
Roughly during the mid-1990s, people started thinking about ways to monetize the
Web, discovering that it also has a commercial side. We have
already mentioned the Netscape IPO, but commercialization was and is not just
about buying (and eventually selling) Internet companies.
A first step towards commercialization has been to attract user attention and,
once obtained, to retain it. A popular approach has been to require registration in
exchange for access to additional features or services, or even to the site at all.
Without being historically precise about the order in which this has occurred,
examples include Amazon.com, which let users create their personal wish list after
3 A URI is a Uniform Resource Identifier, a character string used to identify a resource. The most
common form of a URI is the Uniform Resource Locator (URL).
logging in, as well as (former) Yahoo!,4 Google, or Facebook. Once you have
registered for an account at any of these or many other sites, you may be allowed to
use storage space, communicate with other people, or set your personal preferences.
Sites such as Dropbox or ZipCloud allow registered users to upload files, invite other
participants to access those files, use a free email or messaging service, and so on.
What you may have to accept as a kind of compensation is that advertisements will
be placed on the pages you look at, next to the results of searches you do on that
site, or be sent to your email account from time to time. As we will discuss in more
depth later, advertising on the Web has become one of the most prominent Internet
business models, and the idea of “free” sites just described turns out to be a highly
attractive advertising channel. Clearly, the more people register at a site, i.e., reveal
some of their personal data and maybe even a user profile of preferences and
hobbies, the more data the site owner will have available and the more he can do
with it. Experience also shows that people do not re-register for similar service
functionality from distinct providers too often. Thus, there is some form of
customer retention right away, and it is then often just a small step to start offering
these customers a little extra service for which they, however, have to pay.
Commercialization of the Web has in particular materialized in the form of
electronic commerce, commonly abbreviated e-commerce, which involves moving
a considerable amount of shopping and retail activity essentially from the street to
the Web or from the physical to a virtual world. More generally, e-commerce refers
to selling goods or services over the Internet or over other online systems, where
payments may be made online or otherwise. It was typically during the weeks
before Christmas that the success as well as the growth of e-commerce could
best be measured each year. In the beginning, customers were reluctant to do
electronic shopping, since it was uncommon, it was not considered an “experience”
as it may well be when strolling through physical shops, and it was often considered
unreliable and insecure; initially, paying by credit card over the Web was even a
“no-go.” Many companies entering this new form of electronic business were not
ready yet, unaware of the process modifications they would have to install in their
front- and back-offices, and unfamiliar with the various options they had from the
very beginning. Major obstacles early on also included a lack of security, in
particular when it came to payments over the Web, and a lack of trust, in particular
regarding the question of whether goods that had been paid for would indeed be
delivered. The global nature of the Web also resulted in a range of
geographical, legal, language, and taxation issues being uncovered. As a consequence,
e-commerce took off slowly in the very beginning. However, the obstacles were
soon overcome, for example by improvement in hardware and software (e.g.,
session handling), by appropriately encrypting payment information, by
corresponding measures from credit card companies, and by the introduction of trusted
third parties such as PayPal for handling the payment side of sales transactions.
Regional issues were addressed by way of “localized” mirror sites. Then, towards
the end of the 20th century, e-commerce started flying, with companies such as
4 Yahoo! was renamed Altaba in early 2017 after selling its Web business.
CDnow and Amazon.com, later also eBay, and sales figures soon went beyond the
billion-dollar threshold. Today, even “brick-and-mortar” retail chains such as
Walmart, Target, or Costco make a considerable portion, if not most, of their revenues
online, in addition to the numerous stores they run in the traditional way.
However, it was also discovered that e-commerce and selling over the Web was
not the only way of making money on or through the Web. Indeed, another was
placing advertisements, and ultimately introducing paid clicks. Besides all this there is,
of course, the telecommunication industry, for which technological advances such
as the arrival of DSL or wireless networks brought entirely new business models for
both the professional and the private customer.
CDnow is a good example of how setting up a new type of business on the Web
took off. CDnow was created in August 1994 by brothers Jason and Matthew Olim,
roughly at the same time Jeff Bezos created Amazon.com. As they describe in Olim
et al. (1999), their personal account of the company, it was started in the basement
of their parents’ home; Jason became the president and CEO and Matthew the
Principal Software Engineer. The company was incorporated in Pennsylvania in
1994 and originally specialized in selling hard-to-find CDs. It went public in
February 1998, and after financial difficulties eventually merged with Bertelsmann,
the big German media company, in 2000. CDnow became famous for its unique
internal music rating and recommendation service, which was also often used by
those who had never actually purchased a product on the site. In late 2002,
Amazon.com began operating the CDnow web site, but discontinued CDnow’s
music-profiling section.
What the Olim brothers detected early on was that the Web offered a unique
chance to provide not only basic product information, but highly specialized
information that previously had required an enormous amount of research to come
by. This is what they provided for music on CDs, and they combined their
information and catalogue service with the possibility to buy CDs directly from them. At
some point and for a short period of time, CDnow was probably the best online
store for music, as it was able to integrate so much information on a CD, on an
artist, or on a group in one place and in so many distinct categories. Their selection
was enormous, and most of the time whatever they offered could be delivered
within days. They also ran into problems that nobody had foreseen at the very beginning, for example that customs fees often had to be paid when a package of CDs was delivered to an addressee in a foreign country. In other words, legal issues related to commerce over a network that does not really have physical boundaries came up in this context (as they did for any other shop that now started selling internationally), and many of these issues remain unresolved today. Note, however, that such issues do not arise for non-physical goods, which can be distributed electronically. Apple's iTunes is currently by far the most popular service for distributing music electronically.
We will return to the topic of e-commerce and several of its distinctive features
in Chap. 3.
12 1 The Web from Freshman to Senior in 20+ Years …
The key to what made the Web so popular early on is the fact that a Web page or an HTML document can contain hyperlinks, or links for short, which are references to other pages (or to other places in the current page). The origin of this is hypertext, an
approach to overcome the linearity of traditional text that was originally suggested
by Vannevar Bush in an essay entitled As We May Think which appeared in The
Atlantic Monthly in July 1945 (see www.theatlantic.com/magazine/archive/1945/
07/as-we-may-think/303881/). Selecting a link that appears in a given HTML document causes the browser to send off a request for the page whose address is included in the link (or, if the link points to another place in the current page, to go to that position); this page will then be displayed next.
Figure 1.3 presents a simple graphical portrayal of what the intuition just outlined means: the Web is a large collection of hyperlinked documents and can be
perceived, from a more technical point of view, as a directed graph in which the
individual pages or HTML documents are the nodes, and in which links leading
from one page to another (or back to the same page) are the (directed) edges.
Figure 1.3 hence shows a very small and finite sample of nodes and links only (but
it can easily be extended in any direction and by any number of further nodes and
edges).
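The graph view just outlined is easy to make concrete. The following sketch models a handful of invented pages as a directed graph, with links as edges; all page names are, of course, hypothetical.

```python
# A tiny model of the Web as a directed graph: pages are nodes,
# hyperlinks are directed edges. All page names are hypothetical.
web = {
    "home.html": ["news.html", "about.html"],
    "news.html": ["home.html", "sports.html"],
    "about.html": ["home.html"],
    "sports.html": [],  # a page with no outgoing links
}

def out_links(page):
    """Pages directly reachable from `page` (its outgoing edges)."""
    return web.get(page, [])

def in_links(page):
    """Pages that link to `page` (its incoming edges)."""
    return [p for p, targets in web.items() if page in targets]

print(out_links("news.html"))
print(in_links("home.html"))
```

An adjacency structure like this is all a crawler needs in order to decide which page to visit next.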
Links in HTML are technically anchors, which are typically composed of a name (that will show up in the document where the link is placed) and a URL, a Uniform Resource Locator or logical address of a Web page. When a user clicks
on the link, the browser will contact the Web server behind that URL (through
common network protocols which, among other things, will ensure name resolu-
tion, i.e., translate the URL into the physical IP address of the computer storing the
requested resource, through various steps of address translation) and request the
respective HTML document (cf. Fig. 1.2). Links allow a form of navigation
through the Web, the idea being that if something that a user is looking for is not
contained in the current page, the page might contain a link to be followed for
getting her or him to the next page, which may in turn be more relevant to the
subject in question, or may contain another link to be followed, and so on. Links,
however, need not necessarily point to other pages (external links), but can also be
used to jump back and forth within a single page (internal links) or they can link to
different types of content (e.g., images, videos).
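To make the notion of anchors concrete, the following sketch uses Python's standard html.parser module to pull the href of every anchor out of a small, invented HTML page containing one external link and one internal link.

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collects the href of every <a> anchor encountered."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

# A made-up page with one external link and one internal link.
page = """
<html><body>
  <a href="https://www.example.org/next.html">external link</a>
  <a href="#section2">internal link (same page)</a>
  <a name="section2"></a>
</body></html>
"""

extractor = LinkExtractor()
extractor.feed(page)
print(extractor.links)
```

Note that the third anchor defines a target name but no href, so it is not collected; it is the destination of the internal link.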
From a conceptual point of view, it is fair to say that the Web is a very large and
dynamic graph in which both nodes and edges come and go. Moreover, parts of the Web might be unreachable at any given time due to network problems, or Web designers may add new pages with links and from time to time remove old ones. As a
consequence, looking up information on the Web typically relies on exploration,
i.e., a progression along paths or sequences of nodes without predetermined targets.
This is where the activity of search comes in. In the early days of the Web,
automated tools for exploration and search had not yet been developed; instead
these activities were often done manually by an information broker. While the
information broker as a job description has lost relevance over the years due to the
arrival of automated tools, an important aspect is still around today, that of price
comparisons. Indeed, comparing prices over the Web has become an important
activity, for both companies and individual users, and is a form of information
brokering still available today through companies or sites such as DealTime,
mySimon, BizRate, Pricewatch, or PriceGrabber, to name just a few.
Search engines are today’s most important tool for finding information on the Web,
and they emerged relatively soon after the Web was launched in 1993. Although “to
search” the Web is nowadays often identified with “to google” the Web (see
searchenginewatch.com for stats about which search engine gets how much traffic),
Google was a relative latecomer, and will most likely not be the last. The early days were dominated by the likes of Excite (launched in 1993) as well as Yahoo, Webcrawler, Lycos, and Altavista, all of which came into being in 1994. Google, however, has dominated the search field ever since its launch in the fall of 1998, and it has
invented many tools and services now taken for granted. In fairness, we mention that InfoSeek, AlltheWeb, Ask, Vivisimo, A9, Wisenut, Windows Live Search, as well as others, have all provided search functions at some point over the past 25 years.
Search has indeed become ubiquitous. Today people search from the interface of
a search engine, and then browse through an initial portion of the often thousands or
even millions of answers the engine returns. Search often even replaces entering a
precise URL into a browser. In fact, search has become so universal that Battelle
(2005) speaks of the Database of Intentions that exists on the Web: It is not a
materialized database stored on a particular server, but “the aggregate results of
every search ever entered, every result list ever tendered, and every path taken as a
result.” He continues to state that the Database of Intentions “represents a real-time
history of post-Web culture—a massive click stream database of desires, needs,
wants, and preferences that can be discovered, subpoenaed, archived, tracked, and
exploited for all sorts of ends.” Search not only happens explicitly, by referring to a
search engine; it also happens to a large extent inside other sites, for example within
a shopping or an auction site where the user is looking for a particular category or
product; also most newspaper sites provide a search function that can be used on
their archive. As a result, a major portion of the time presently spent on the Web is
actually spent searching, and Battelle keeps up to date with developments in this
field in his search blog on the topic (see battellemedia.com). Notice that the search
paradigm has meanwhile become popular even in the context of file system orga-
nization or e-mail accounts, where a search function often replaces a structured, yet
eventually huge and confusing way of organizing content (see also the end of this
section).
From a technical perspective, a search engine is typically based on techniques
from information retrieval (IR) and has three major components as indicated in
Fig. 1.4: A crawler, an indexer, and a runtime system. The crawler explores the
Web as indicated above and constantly copies pages from the Web and delivers
them to the search engine provider for analysis. Analysis is done by an indexer
which extracts terms from the page using IR techniques and inserts them into a
database (the actual index). Each term is associated with the document (and its
URL) from which it was extracted. Finally, there is the runtime system that answers
user queries. When a user initiates a search for a particular term, a lookup in the index will return a number of pages that may be relevant. These are ranked by the runtime
system, where the idea almost always is to show “most relevant” documents first,
whatever the definition of relevance is. The pages are then returned to the user in
that order.
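The division of labor between indexer and runtime system can be sketched with a toy inverted index; the crawled pages and their URLs below are invented, and real engines of course add ranking, stemming, and much more on top of this core.

```python
# Toy search engine core: an indexer builds an inverted index
# (term -> set of document URLs), and a runtime component answers
# queries against it. Documents and URLs are made up.
crawled = {
    "http://example.org/a": "cheap flights to new zealand",
    "http://example.org/b": "new music releases on cd",
    "http://example.org/c": "flights and hotels, cheap deals",
}

def build_index(pages):
    """Indexer: extract terms and associate each with its documents."""
    index = {}
    for url, text in pages.items():
        for term in text.replace(",", " ").split():
            index.setdefault(term, set()).add(url)
    return index

def search(index, query):
    """Runtime: return URLs containing every query term (unranked)."""
    results = None
    for term in query.split():
        hits = index.get(term, set())
        results = hits if results is None else results & hits
    return sorted(results or [])

index = build_index(crawled)
print(search(index, "cheap flights"))
```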
A crawler commonly revisits Web pages from time to time, in order to keep its
associated index up-to-date. Thus, a search query will typically return the most
recent version of a Web page. If a user is interested in previous versions or wants to
see how a page has evolved over time (if at all), the place to look is the Wayback
Machine at the Internet Archive (see www.archive.org/web/web.php), which has
been crawling the Web on a daily basis ever since 1996.
1.2 The Search Paradigm 15
Fig. 1.4 The major components of a search engine: a crawler, an indexer with its index/database, and a runtime system
The popularity of Google grew out of the fact that they developed an entirely new algorithmic approach to search. Before Google, the aim was simply to locate any site whose content was related to or contained a given search term. To this end,
search engine builders constructed indexes of Web pages and often just stored the
respective URLs. As an answer to a query, a user would be returned a list of URLs
through which he or she then had to manually work through. Google co-founder
Larry Page came up with the idea that not all search results could be equally
relevant to a given query, but unlike the information broker, who can exploit his or
her expertise on a particular field, an automated search engine needs additional
ways to evaluate results. What Page suggested was to rank search results, and he
developed a particular algorithm for doing so; the result of that algorithm applied to
a given page is the PageRank, named after its inventor. The PageRank of a page is calculated using a recursive formula (see infolab.stanford.edu/~backrub/google.html for details), whose underlying idea is simple: Consider a doctor. The more
people that recommend the doctor, the better he or she is expected to be. It is similar
with ranking a Web page: The more pages linked to a page p, the higher the rank of
p will be. However, the quality of a doctor also depends on the quality of the
recommender. It makes a difference whether a medical colleague or a salesperson
for the pharmaceutical industry recommends her or him. If the doctor is recommended by someone who is highly regarded, the recommendation carries more weight; in the same way, a link from a highly ranked page contributes more to the rank of p than a link from an obscure one.
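The recursive idea sketched with the doctor analogy can be written down in a few lines. The following toy computation iterates the PageRank formula on an invented three-page link graph, using the damping factor of 0.85 from the original paper; everything else (page names, iteration count) is illustrative.

```python
# Iterative PageRank on a tiny link graph. Each page distributes its
# current rank evenly over its outgoing links; the damping factor d
# models a surfer who occasionally jumps to a random page.
graph = {
    "p": ["q", "r"],
    "q": ["p"],
    "r": ["p"],
}

def pagerank(graph, d=0.85, iterations=50):
    n = len(graph)
    rank = {page: 1.0 / n for page in graph}
    for _ in range(iterations):
        new_rank = {}
        for page in graph:
            # sum of contributions from pages linking to this one
            incoming = sum(rank[p] / len(links)
                           for p, links in graph.items() if page in links)
            new_rank[page] = (1 - d) / n + d * incoming
        rank = new_rank
    return rank

ranks = pagerank(graph)
# p is linked to by both q and r, so it ends up with the highest rank
print(max(ranks, key=ranks.get))
```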
Anyone interested in statistics about searching should consult, for example, Google's Zeitgeist (at www.google.com/zeitgeist), which keeps rankings of the most popular search terms in the recent past. Other statistics may be obtained from places like Nielsen//Netratings or the aforementioned SearchEngineWatch. What people
like Nielsen//Netratings or the aforementioned SearchEngineWatch. What people
have observed by looking at these figures is, among other things, that a few queries
have a very high frequency, i.e., are asked by many people and pretty often, but the
large majority of queries have a considerably lower frequency; a possible appli-
cation of the 80/20 rule. When plotted as a curve, where the x-axis represents a list
of (a fixed number of) queries, while the y-axis indicates their frequency, the graph
will look like the “hockey stick” shown in Fig. 1.5. Graphs of this form follow a
Fig. 1.5 A "hockey stick": query frequency (y-axis) plotted against query popularity (x-axis), with the top 20% of queries forming the steep head
power-law type of distribution: They exhibit a steep decline after an initial, say,
20%, followed by a massive tail into which the curve flattens out. Power laws can
be found in many fields: the aforementioned search term frequency, book sales, or
popularity of Web pages. Traditionally, when resources are limited, e.g., space on a
book shelf or time on a TV channel, the tail gets cut off at some point. The term
long tail is used to describe a situation where such a cutting off does not occur, but
the entire tail is preserved. For example, there is no need for search engine pro-
viders to disallow search queries that are only used very infrequently.
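A quick way to see the hockey stick numerically is to assume Zipf-like frequencies, where the r-th most popular query occurs with frequency proportional to 1/r; the number of distinct queries below is illustrative.

```python
# Zipf-style query frequencies: the r-th most popular query is
# assumed to occur with frequency proportional to 1/r. We check what
# share of all traffic the top 20% of queries account for.
N = 1000                                   # number of distinct queries
freq = [1.0 / r for r in range(1, N + 1)]  # frequency at rank r

total = sum(freq)
head = sum(freq[: N // 5])                 # top 20% of queries
head_share = head / total

print(f"top 20% of queries account for {head_share:.0%} of traffic")
```

Under this assumption the head of the curve indeed carries close to 80% of the traffic, in line with the 80/20 rule mentioned above, while the long tail of rare queries accounts for the rest.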
As we will see as we go along, the long tail phenomenon can be observed in a
variety of contexts related to the Internet and the Web. For example, it applies to
electronic commerce, where the availability of cheap and easy-to-use technology has enabled a number of companies that would otherwise not have considered entering this arena to take advantage of the broader reach provided by the Internet. Think, for example, of shops selling high-end watches or cameras. It also applies in the context of online advertising.
1.2.4 Directories
Yahoo! and AOL were among the first to recognize that the Web, following the explosion in the number of pages that occurred in the mid-90s, needed some form of organization, and they did so by creating directories containing categorizations of Web site content and pathways to other content. These were hierarchically organized catalogues of other sites, and many of them were later developed into portals.
A portal can be seen as an entry point to the Web or a pathway to Web sources that
has a number of topical sections that are owned and managed by the main site and
Google Keep, or Ubernote, to name just a few. The possibility for users to attach
tags to documents originally marked the arrival of user input to the Web and
coincided with the opportunity for customers to place reviews or ratings on an
e-commerce site. For e-commerce, ratings and comments have been shown to have
a major impact on revenues a seller may be able to obtain, and that is no surprise: If
a seller is getting bad ratings repeatedly, why would anyone buy from them in the
future? This input is typically exploited in various ways, including the formation of virtual communities. Such communities are characterized by the fact that their members might not know each other, but they all (or at least most of them) share common interests. This phenomenon has been identified and studied by many
researchers in recent years, and it represents a major aspect of the socialization of
the Internet and the Web.
In hardware, there is essentially one development that governs it all: the fact that hardware is becoming smaller and smaller and will ultimately disappear from visibility. Consider, for example, the personal computer (PC). While already
more than 10 years old when the Web was launched, it has shrunk (and become
cheaper) on a continual basis ever since, with laptops and tablets now being more
popular (demonstrated by their sales figures) than desktops. Moreover, with pro-
cessors embedded into other systems such as cars, smartphones, watches etc., we
can now carry computing power in our pockets that was unthinkable only a few
years ago (and that typically outperforms the computing power that was needed to
fly man to the Moon in the late 1960s by orders of magnitude). Just think of a
smartphone that is powered by a microprocessor, has some 128 GB of memory,
that can run a common operating system, and that can have a host of applications
(“apps”) installed (and can have many of them running simultaneously). Thus, in
many applications we do not see the computer anymore, and this trend, which has
been envisioned, for example, by Norman (1999), will continue in ways we cannot
even accurately predict today.
Another important aspect of hardware development has always been that prices
keep dropping, in spite of expectations that this cannot go on forever. As has been
noted, Moore’s Law is still valid after 50 years, and expected to remain valid for a
few more years (chips nowadays have such a high package density that the heat
produced will eventually bring it to an end). In this "law," Gordon Moore, one of the founders of chipmaker Intel, predicted in 1965, in an article in the journal "Electronics," that the number of transistors on a chip would double every 12–18 months; he later corrected the time span to 24 months, but that does not change the underlying
tenet of his prediction. A plot of Moore’s Law can be found at en.wikipedia.org/
wiki/Moore’s_law or, for example, at pointsandfigures.com/2015/04/18/moores-
law/. It turns out that microprocessor packaging has been able to keep up with this
law (indeed, processor fabrication has moved from 90 nm⁵ structures in the mid-2000s to 14 nm today and will most likely shrink even further⁶). Raymond
Kurzweil, one of the primary visionaries of artificial intelligence and father of famous music synthesizers, and others consider Moore's Law a special case of a more general law that applies to technological evolution in general: when the potential of a specific technology is exhausted, it is replaced by a new one.
Kurzweil does not use the “transistors-per-chip” measure, but prefers “computing
power per $1,000 machine." Indeed, the evolution of computers from mechanical devices via tubes and transistors to present-day microprocessors exhibits a double-exponential growth in efficiency: the computing power per $1,000 of computer doubled every three years between 1910 and 1950, roughly every two years between 1950 and 1966, and presently doubles almost annually.
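The scale of such doubling is easy to underestimate. A one-line calculation, using the corrected 24-month doubling period, shows what Moore's Law implies over the 50 years mentioned above:

```python
# Growth implied by Moore's Law: doubling every 24 months.
def transistor_growth(years, doubling_period_years=2):
    """Factor by which transistor counts grow over `years`."""
    return 2 ** (years / doubling_period_years)

# Over the 50 years since the 1965 prediction:
factor = transistor_growth(50)
print(f"growth factor over 50 years: about {factor:,.0f}x")
```

Fifty years at a 24-month doubling period means 25 doublings, a factor of more than 33 million.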
As a result, hardware has become a commodity, cheap and ubiquitous, and, as will be discussed in Chap. 2, it is often provided today "in the cloud." To use hardware, be it processors or storage, it is no longer necessary to purchase it, since computing power and storage capacity can nowadays be rented and scaled up and down on demand. With many Internet providers (as well as other
companies, for example Amazon or Google), private or commercial customers can
choose the type of machine they need (with characteristics such as number of
processors, clock frequency, or main memory), the desired amount of storage, and
the rental period and then get charged, say, on a monthly basis (or based on other
measures, e.g., CPU cycles or hours). This leasing approach, one core aspect of cloud sourcing, has become an attractive alternative to purchasing: since hardware ages so fast, there is no longer any need to replace or dispose of out-of-date technology while it is still functioning as intended.
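The pay-per-use model just described amounts to a simple cost function. The sketch below uses invented rates, not any real provider's price list, to illustrate how a monthly bill might be composed.

```python
# Hypothetical on-demand pricing: an hourly rate per machine type
# plus a monthly charge per GB of storage. All rates are made-up
# values for illustration, not any real provider's prices.
HOURLY_RATE = {"small": 0.05, "large": 0.40}   # $ per machine-hour
STORAGE_RATE = 0.02                            # $ per GB per month

def monthly_cost(machine_type, hours_used, storage_gb):
    compute = HOURLY_RATE[machine_type] * hours_used
    storage = STORAGE_RATE * storage_gb
    return round(compute + storage, 2)

# A "large" machine running 200 hours with 500 GB of storage:
print(monthly_cost("large", 200, 500))
```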
One consequence of the hardware developments just described and the
increasing hardware complexity is the trend to develop hardware and software no
longer separately, but together; the result is engineered systems, or "appliances," as marketed nowadays by vendors such as Oracle or SAP, which will be discussed in Chap. 2.
⁵ Nanometer.
⁶ 7 nm is currently considered the end of the line by many people, since at 7 nm transistors sit so close to each other that an effect called quantum tunneling occurs, which means the transistor cannot reliably be switched off and will mostly stay on. Graphene might then become a replacement for silicon in the production of microchips.
1.3 Hardware Developments 21
1.3.2 IP Networking
technological basis that we are used to today. As remarked by Friedman (2005), one
of the consequences of the burst of the dot-com bubble was an oversupply of
fiber-optic cable capacity especially in the US, of which many newly created ser-
vice providers were able to take advantage.
The mid-1990s also saw a growing need for administration of Internet issues,
one result of which was the creation of ICANN (www.icann.org), the Internet
Corporation for Assigned Names and Numbers. ICANN is a private non-profit
organization based in Marina Del Rey, California, whose basic task is the technical
administration of the Internet, in particular the assignment of domain names and IP
addresses as well as the introduction of new top-level domains. To this end, it is
worth mentioning that naming on the Internet follows a hierarchical pattern as
defined in the Domain Name System (DNS), which translates domain or computer
hostnames into IP addresses, thereby providing a worldwide keyword-based redi-
rection service. It also lists mail exchange servers accepting e-mail for each domain,
and it makes it possible for people to assign authoritative names without needing to
communicate with a central registrar each time. The mid-1990s moreover saw the
formation of organizations dealing with the development of standards related to
Web technology, most notably the World Wide Web Consortium (W3C), founded
by Web inventor Tim Berners-Lee, for Web standards in general (see http://www.
w3.org) and the Organization for the Advancement of Structured Information
Standards (OASIS) for standards related to electronic business and Web services
(see www.oasis-open.org).
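The hierarchical resolution performed by DNS can be illustrated with a toy resolver that walks a hostname label by label, right to left, through nested zones; the zone data and addresses here are invented stand-ins for the real distributed database.

```python
# Toy DNS: zones form a hierarchy keyed by name labels, read right
# to left (com -> example -> www). Names and addresses are invented.
root = {
    "com": {
        "example": {
            "www": "93.184.216.34",
            "mail": "93.184.216.35",
        }
    },
    "org": {"example": {"www": "93.184.216.40"}},
}

def resolve(hostname):
    """Translate a hostname into its (made-up) IP address."""
    node = root
    for label in reversed(hostname.split(".")):
        node = node[label]          # descend one zone per label
    return node

print(resolve("www.example.com"))
```

Real resolvers work the same way in spirit, but the zones live on different servers, delegation happens over the network, and answers are cached along the way.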
With broadband and wireless technology available as an increasingly ubiquitous
commodity (with the only remaining major exception being developing nations),
we constantly see a host of new applications and services arise and delivered over
the Internet and the Web, with digital radio and television only being precursors of
what is still to come; moreover, we see a constant growth in mobile usage of the
Web (see next section). Broadband communication in particular allows for an easy
transfer of large files, so that, for example, it becomes possible to watch movies
over the Internet on a mobile device, since at some point it will be possible to
guarantee a constant transfer rate over a certain period of time. For example, the
broadband penetration of homes in the US has gone up considerably since the year
2000, and indeed the period 2000–2013 shows a dramatic reversal of the use of the
two networking options—broadband and dial-up; see www.pewinternet.org/2013/
08/26/home-broadband-2013/; however, also according to PewResearchCenter,
home broadband penetration plateaued in 2015, see www.pewinternet.org/2015/12/
21/home-broadband-2015/. A typical effect after getting broadband at home is that
people spend more time on the Internet. Moreover, with flat rates for Internet access
widely available today, many users do not explicitly switch their connection on and
off, but are essentially always connected. As an example, slide 45 of wearesocial.sg/
blog/2014/02/social-digital-mobile-europe-2014/ shows the Internet penetration in
Europe as of February 2014, with a European average of 68%, twice as high as the global average of 34% (for a 2015 update showing, for example, the global average having already risen to 42%, see de.slideshare.net/wearesocialsg/digital-social-mobile-in-
2015).
To conclude this section, we mention that the future of hardware technology will
see a variety of other developments not discussed here, including nano-sensors, 3D
materials and microchips, or organs-on-chip; for more details we refer the reader to
www.weforum.org/agenda/2016/06/top-10-emerging-technologies-2016.
Cellular
Wireless technologies are often described in terms of the cellular generation by
which they were characterized. There are a number of attributes by which each
generation can be described, although it is the data bandwidth of each that provides
the greatest differentiator. Table 1.1 summarizes the generations that have occurred to date, along with a prediction of the next wave, 5G.
First generation wireless technology (1G) is the original analog, voice-only telephone standard. It was developed in the early 1980s and was what the first cellular phones were based upon. Various standards were adopted, with the Advanced Mobile Phone System (AMPS) adopted in North America and Australia, the Total Access Communication System (TACS) employed in the United Kingdom, and Nordic Mobile Telephone (NMT) in a variety of European countries. Speeds were limited
and users could only call within their own country. 2G or second generation wireless
cellular networks were first launched in Finland on the GSM (Global System for
Mobile Communication) standard. 2G permitted SMS (Short Messaging Service),
picture messages, and MMS (Multimedia Messaging Service) and provided greater
levels of efficiency and security for both the sender and receiver. Compared to 1G, 2G
calls were of higher quality, yet to achieve this, mobile devices needed to be nearer to
cellphone towers. Technologies were either Time Division Multiple Access (TDMA), which separates signals into distinct time slots, or Code Division Multiple Access (CDMA), in which each user is allocated a code that allows them to communicate over a multiplexed physical channel. GSM is the most well-known TDMA standard; Personal Digital Cellular (PDC) is another. Before the introduction of third generation (3G) technology, a revised and improved 2G known as 2.5G was introduced that implemented a packet-switched domain in addition to the circuit-switched domain. Viewed more as an enhancement of 2G than as a new generation in its own right, 2.5G provided enhanced data rates via these improved standards. 3G offered significant improvements in data speed and efficiency along with additional services. 3G services are required to meet the speed threshold established by IMT-2000, namely 200 Kbps (0.2 Mbps). Many service providers exceeded this threshold significantly, with speeds of up to 2 Mbps not uncommon. Enhanced voice quality along
with video was now possible. We are currently in the era of 4G cellular wireless
standards which offers theoretical speeds of up to 1 Gbps, although 100 Mbps is more
realistic. This is more than adequate for high quality mobile video delivery such as
on-demand television and video conferencing. Other benefits include seamless global
roaming, low cost, and greater levels of reliability. The next generation of cellular wireless, 5G, is tentatively projected to arrive around 2020 and is forecast to be extremely fast and extremely efficient; it has been described as "fiber-wireless".
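The practical meaning of these bandwidth figures is easy to quantify. The sketch below uses the speeds quoted above (the 0.2 Mbps IMT-2000 minimum for 3G and a realistic 100 Mbps for 4G) together with an illustrative file size:

```python
# Time to transfer a file at the nominal speeds mentioned above.
SPEED_MBPS = {"3G (IMT-2000 minimum)": 0.2, "4G (realistic)": 100}

def transfer_seconds(size_megabytes, speed_mbps):
    """Seconds to move `size_megabytes` at `speed_mbps` (megabits/s)."""
    size_megabits = size_megabytes * 8   # bytes are 8 bits each
    return size_megabits / speed_mbps

# A 700 MB movie, a plausible size for standard-definition video:
for gen, speed in SPEED_MBPS.items():
    print(f"{gen}: {transfer_seconds(700, speed):,.0f} s")
```

At the 3G minimum the movie takes the better part of a day; at a realistic 4G rate it takes about a minute, which is why on-demand mobile video only became practical in the 4G era.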
Wi-Fi
Wi-Fi or WiFi is a wireless standard that lets mobile-enabled devices connect
wirelessly to a WLAN (Wireless Local Area Network). This is the universal
approach for both business and private wireless WLAN users. Wi-Fi first emerged
in the early 1990s and was released for consumer use in 1997 following the
establishment of the IEEE 802.11 committee which oversaw the IEEE 802.11 set of
WLAN standards. It was about two years later when Wi-Fi routers became avail-
able for home use that Wi-Fi really took off. A typical Wi-Fi setup involves one or
more Access Points (APs) and one or more clients. Clients connect to the AP or
hotspot upon receipt of the unique SSID (Service Set Identifier), commonly known
as the network name. Various forms of encryption standards are employed to secure
what is largely an insecure network structure. Common encryption standards include Wired Equivalent Privacy (WEP), although this has been shown to be largely insecure. Wi-Fi Protected Access (WPA and WPA2) encryption is much more common today and significantly more secure than WEP. Wi-Fi has advanced significantly over the past two decades in terms of data transfer rates.
Table 1.2 below provides a historical account of this evolution. In just 15 years, maximum Wi-Fi data speeds have increased to over 700 times those of the initial legacy standard (802.11). It is interesting to note that indoor and outdoor Wi-Fi ranges have not changed significantly, primarily because range is a function of the radio frequency at which the network operates, and only two frequency bands are available for Wi-Fi: 2.4 and 5 GHz.
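The 700-fold figure can be reproduced from nominal maximum rates. Both numbers below are assumed round values, about 2 Mbps for the legacy 802.11 standard and roughly 1,733 Mbps for a recent 802.11ac configuration:

```python
# Rough check of the growth in nominal Wi-Fi data rates. Both figures
# are assumed round numbers: legacy 802.11 topped out at about 2 Mbps,
# while a recent 802.11ac configuration is often quoted at ~1,733 Mbps.
LEGACY_MBPS = 2
MODERN_MBPS = 1733

speedup = MODERN_MBPS / LEGACY_MBPS
print(f"nominal speedup: about {speedup:.0f}x")
```

Exact multiples depend on which amendment and channel configuration one compares, but any reasonable pairing lands well above the 700x mentioned in the text.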
It is also important to note that Wi-Fi is highly susceptible to interference, which
can also severely constrain its effectiveness. Such interference can come from
competing Wi-Fi networks and from other electrical equipment operating within the 2.4 or 5 GHz ranges, such as cordless phones and baby monitors. The physical environment can also cause major service degradation, with concrete walls being significant hurdles to overcome. There is little doubt that Wi-Fi will remain the benchmark WLAN standard for the foreseeable future, and many workplaces, campuses, and
even cities are seeking to establish broad Wi-Fi networks through the installation of
overlapping network access points. The first university campus to achieve this was
Carnegie Mellon University (CMU) with their “Wireless Andrew” project that
began in 1993 and was completed in 1999. As of 2010, CMU operated over 1,000
access points across seventy-six buildings and some 370,000 square meters
(Lemstra et al. 2010). Many cities around the world either have city-wide Wi-Fi
coverage or have such projects underway. London, UK and Seoul, South Korea are
two such examples.
Bluetooth
Bluetooth is a short-distance, low-data-volume wireless standard commonly used for direct data sharing between paired mobile devices. It is included by default in most smartphones, tablets, and other mobile devices today. Like Wi-Fi,
Bluetooth operates on a wireless network utilizing a narrow band between 2.4 and
2.485 GHz. It was first developed by Ericsson in the early 1990s. The Bluetooth
standard is managed by an independent special interest group (SIG) which itself
doesn’t actually develop or market Bluetooth devices, but instead manages the
development of the specification, protects the trademarks, and ensures the reliability
of the standard (Bluetooth 2016). Like many other technologies, the Bluetooth standard has changed and been refined and improved over the years. However, the greatest and most radical change came with the replacement of Bluetooth 3.0 by Bluetooth 4.0 (also known as Bluetooth Smart) in 2010. A subset of this, and what is most widely known, is Bluetooth Low Energy, or BLE.
Table 1.3 provides a summary of the key differences between the “classic”
Bluetooth standard and Bluetooth Low Energy.
The key difference with BLE, as the name suggests, is power consumption. Although peak consumption is comparable, the significantly reduced set-up time means that BLE connections are actually very brief. It is thus not uncommon
for a BLE device to be able to operate for a number of years on a non-rechargeable button battery. This provides numerous potential real-world applications and is indeed the basis of Beacon technology, which is discussed in the next section.
1.4 Mobile Technologies and Devices 27
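That multi-year battery life follows from simple arithmetic. The figures below are assumed typical values, a 220 mAh CR2032 button cell and an average draw on the order of 10 microamps, not measurements of any particular device:

```python
# Back-of-the-envelope BLE battery life: cell capacity divided by
# average current draw. Both numbers are assumed typical values.
CAPACITY_MAH = 220          # common CR2032 button cell
AVG_CURRENT_MA = 0.010      # ~10 microamps average draw

hours = CAPACITY_MAH / AVG_CURRENT_MA
years = hours / (24 * 365)
print(f"roughly {years:.1f} years on one cell")
```

The estimate ignores the cell's self-discharge and duty-cycle details, but it shows why a beacon that spends almost all of its time asleep can run for years between battery changes.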
Li-Fi and the Future of Wireless Communications
What each of the above mobile technologies, and indeed others not discussed here such as Near Field Communication (NFC), have in common is that they all operate via radio waves within the electromagnetic spectrum, albeit at its lower end. To date, this is the only part of the spectrum that has been successfully used to any real degree for wireless communication and data transmission. However, a significant limitation exists in that this part of the electromagnetic spectrum is relatively small, and it also relies on costly infrastructure to maintain the scale necessary for the continued growth in use we observe. As a possible solution to these and other radio wave constraints, University of Edinburgh physicist Harold
Haas has developed an approach that utilizes the visible spectrum, or as we all
know it, light. His concept is known as Li-Fi and promises to deliver, at some point in the future, data transfer speeds far beyond what we are currently used to. One of the key benefits offered by the visible component of the electromagnetic spectrum is its breadth; indeed, as shown in Fig. 1.7, the visible light spectrum is roughly 10,000 times the size of the radio wave spectrum.
Li-Fi works through the use of LEDs which momentarily turn on and off in a pattern that, when picked up by a receiver, is converted into a computer-readable form. While Haas is proposing that LED light be
the primary source of light and the data transfer source, there are indeed many
potential LED light sources that could be used. These include mobile phones and
televisions (Ganesan 2014). In terms of receivers, there is also the potential for smartphone cameras to be used instead of specialized photo detectors and receivers. A common question about Li-Fi is whether it works in the dark. The answer is
no, however it can work with such a low amount of light, that the light source is not
visible to the human eye. Another characteristic of light is that it cannot penetrate
walls, yet of course can go around corners. While this may reduce the range and
application of Li-Fi, it does provide the opportunity for physical security barriers.
Another key benefit of Li-Fi over Wi-Fi is that it can be used in dangerous envi-
ronments such as nuclear power plants where RF is not permitted. Overall Li-Fi
offers potentially the greatest advancement in wireless communications yet. While
still under development, and with some technical challenges yet to be overcome, it
seems likely that practical applications of Li-Fi will be seen within the next decade.
There exists a plethora of mobile-enabled devices today. Few would argue that the
devices we have at our disposal today are, in general, an amazing technical
achievement. They provide unquestionable performance, outstanding graphics,
high-quality multimedia, reliable connectivity, impressive broadband speeds and
long battery life (Qualcomm 2014). Just like traditional computing, mobile devices
are a combination of hardware, software and data processing. They comprise CPUs
and operating systems (OS). They have input and output mechanisms delivered by
mobile user interfaces. As in traditional computing, there is considerable
variability in all of these attributes, which leads to significant compatibility issues.
An obvious example of this occurs within mobile operating systems. The market is
dominated by Google’s Android, with only Apple’s iOS offering any real com-
petition. Others including Microsoft’s Windows Mobile and Blackberry have, at
times, been more prominent in the marketplace. Android is an operating system
based on the Linux kernel, designed primarily for touchscreen mobile devices
such as smartphones and tablet computers. Initially developed by Android, Inc.,
which Google backed financially and acquired in 2005, Android was unveiled in
2007; the first Android phone, the HTC Dream, was sold in October 2008. Android is
open source, and Google releases its source code under the Apache License. Google
Play hosts more than one million Android apps. The main competitor to Android is iOS
(previously iPhone OS), a mobile operating system developed and distributed
by Apple Inc. and originally unveiled in 2007 for the iPhone. It has since been extended to
support other Apple devices such as the iPod Touch (September 2007), iPad
(January 2010), iPad Mini (November 2012) and second-generation Apple TV
(September 2010). Interface control elements consist of sliders, switches, and
buttons. Interaction with the OS includes gestures such as swipe, tap, pinch, and
reverse pinch, all of which have specific definitions within the context of the iOS
operating system and its multi-touch interface. Unlike Microsoft's Windows Phone
1.4 Mobile Technologies and Devices 29
and Google’s Android, Apple does not license iOS for installation on non-Apple
hardware. Market-share figures for mobile operating systems clearly show the
increasing dominance of Android, which grew from about 2% in 2009 to more than
80% in 2015, while iOS remained relatively stable at around 15% over the same period.
Smartphones
A smartphone is primarily a cellular communication device that, with the addition of
a mobile operating system, multimedia, a large touch screen, and data connectivity,
operates more as a small computer. It is widely reported that the average
smartphone of today has significantly more computing power than the Apollo
computers that took NASA to the moon in the 1960s. Indeed, it has been predicted
that in the very near future your smartphone will, for many, be your sole computing
device, used as a replacement for, not a supplement to, a traditional computer
(Bonnington 2015). One of the key features of smartphones is their relatively low
cost. This has enabled significant Internet penetration within emerging nations
where mobile infrastructure massively exceeds fixed network infrastructure. As a
result, smartphone sales continue to grow with 2016 sales expected to reach 1.5
billion units (Gartner 2016). Indeed, smartphone ownership in countries such as
Turkey, Malaysia, Chile and Brazil has increased by over 25% in the past three
years (Poushter 2016).
Tablets
Tablets are effectively large smartphones. In most cases, their primary use is not
voice communication as with a smartphone, yet many do have cellular (3G) connectivity,
and there is indeed a hybrid of tablet and smartphone, typically
around five or six inches in size, known as a "phablet" (phone crossed with a
tablet). Tablets themselves are intended to be somewhat of a cross between a
smartphone and a laptop, with some tablets offering a screen similar in size to that of a
standard laptop. Tablets have touch screens and are intended for more convenient
and mobile interaction than a laptop.
One of the most significant future developments of both smartphones and tablets
will be the eventual introduction of flexible screens. The likes of Samsung and LG
are actively working on expandable, bendable, twistable OLED screens that may
very soon allow your smartphone to be folded inside your wallet. This could be a
"game-changer": instead of requiring multiple devices for different purposes, you
could simply roll out a screen of whatever size you desire.
Wearables
The most recent development in the mobile hardware space has been the wearable.
Indeed, 2014 was regarded by some commentators as the year of the wearable,
reflecting the large number of devices that entered the market. As the name
would suggest, these are devices intended to be worn on the body, in
most cases on the wrist as a smartwatch or fitness tracker. Key players in
this rapidly growing domain include the range of Fitbit (fitbit.com) fitness trackers,
the Samsung Gear range of smartwatches, and the first entrant into the smartwatch
market, Pebble (pebble.com), which funded its initial smartwatch through in
excess of US$10 million in crowdfunding pledges. Many other smartwatches have
30 1 The Web from Freshman to Senior in 20+ Years …
entered the market in recent times including the Apple Watch, the Motorola Moto
360, and the Huawei Smartwatch, to name just a few. The impact such devices will
have is still unclear. At this point in time, fitness trackers do just that and serve a
niche market, while smartwatches, with their broader functionality, are still seen as
somewhat expensive for their novelty value. A range of business uses of wrist-based
wearables has been predicted, most of which are intended to provide flexible ways of
communicating with others, managing schedules, accessing short documents or memos
when "on the run", making wireless payments with tools such as Apple Pay, translating
short strings of text, or use as a remote for presentations.
One domain where there exist significant opportunities is within the health and
well-being space, otherwise known as healthware (Patel et al. 2012). Smartwatches
and fitness trackers can be used to guide and monitor injury rehabilitation or illness
recovery. Collected data can be analyzed and monitored by medical practitioners
and, where appropriate, treatments revised. The key element that enables this is
the array of sensors contained within these smart devices. These
include, for example, heart-rate monitors, GPS receivers, thermometers,
accelerometers, altimeters, barometers, and compasses. Only time will tell whether
wrist-based smart watches and fitness trackers will have the disruptive effect that
many are predicting.
Other wearables such as smart glasses, made most famous by the now discon-
tinued Google Glass, smart rings, smart headphones, tags that attach to shoes and
various forms of wearable clothing are all at varying stages of development.
Smart glasses offer significant opportunities for carrying out activities where
hands-free operation is either desirable or necessary. They are able to carry out
basic functions such as reading and voice-to-text writing of e-mails or text
messages, making notes, etc.; they can be used for basic navigation (The Verge 2013),
and they can be used for viewing video and images. Prototypical smart glass
applications have been developed for a variety of inspection-based activities where
hands-free operation is required. One such application is the inspection of
high-voltage power pylons, where the inspector needs to climb up the pylon and
look for possible faults or damage. Using smart glasses allows them to take a
photograph of what is observed, provide an audio commentary of what has been
photographed, record the specific location via GPS, and send all of the information
to the cloud via a cellular network; all without needing to release their hand from
what they are climbing on. Another possible future application is the analysis of a
user's viewing patterns. This could be applied to customers viewing items either in a
store or online. This information would allow the store to provide a more
customer-focused experience and therefore increase profit (Rallapalli and Austin
2014). There are, however, a number of potentially limiting factors of smart glasses,
not least the potential for accidents occurring as a result of being distracted or the
possible health effects that might result from long-term use.
More recent developments with the likes of smart rings, e.g. NFC ring (nfcring.
com) or the Kerv (kerv.com) may become viable in the future as tools for
contactless payments or to unlock doors. However, the technology is currently not
particularly advanced. This also applies to what is termed the "hearable": a smart
device that sits inside the ear, much like a hearing aid. The most prominent of these
right now is the Bragi Dash (bragi.com), which at a very minimum provides
wireless headphone functionality, but also purports to operate as a fitness tracker,
heart-rate monitor, wireless phone and much more.
Mobile devices are not without their challenges, however. Three such limitations
regularly receive attention. Safety and security is the primary issue for developers
and users alike. Users fear that their devices will be attacked by viruses, resulting in
the theft of personal data; people generally feel safer using their mobile devices at
home, where the familiarity of the setting gives the user a perception of safety
and security. Users are also concerned about slow or unstable connections. They fear
they may be cut off in the middle of an e-commerce transaction, and so developers
need to ensure e-commerce platforms account for this eventuality. This issue will
diminish as we move to faster, more reliable networks. Finally, and this is the issue
that is most difficult to address, users in general do not like the small screen size.
The primary complaint centers on the use of mobile devices for shopping and the
inability of users to get a good look and feel for the products they are considering
purchasing. In general, unless a buyer is familiar with a product or the product's
appearance does not matter, users are hesitant to buy an item on a smartphone.
In order to understand some of the statements we are going to make in this book,
and to put some of the developments on which we report into perspective, it makes
sense to deviate a little from the core topic of this chapter, the development of the
Web and its accompanying hardware and software evolution. For a brief moment
we take the view of a journalist, Tom Friedman, foreign affairs columnist for the
New York Times and three-time Pulitzer Prize winner, on globalization and how
the world has changed over the last 25–30 years in light of the technological
developments that we have mentioned.
As described in Friedman (2005), shortly after the turn of the century the world
became a “flat” one in which people from opposite ends of the planet can all of a
sudden interact, play, do business with each other, and collaborate, and all of that
without knowing each other or having met, and where companies can pursue their
business in any part of the world depending on what suits their goals and intentions
best; they can also address a worldwide customer base. A typical example of
what was enabled at the time thanks to the Web is described by Richard MacManus
in a February 2016 blog post: “From 2003–2012 I built up and ran a technology
blog7 called ReadWriteWeb. At its peak it had over twenty people working for it,
nearly all of them in the US. The fact I could manage this business virtually, from
New Zealand, showed the power of the Internet tools the blog evangelized.
7 Blogs are the subject of the next section.
The RWW team communicated via Skype (these days we’d do it on Slack, but
Skype did the job back then). We published on Movable Type. We managed
projects on Basecamp. We scheduled meetings with GoToMeeting. We kept track
of the editorial calendar using Google Calendar” (see augintel.com/2016/02/03/
bitcoin-online-payments/).
Thus, there are essentially no significant limits anymore to what anyone can
accomplish in the world these days, since the infrastructure we can rely upon and
the organizational frameworks within which we can move allow for so many
unconventional and innovative ways of communicating, working together,
collaborating, and exchanging information. In total, Friedman (2005) identifies ten
flatteners, which are:
1. The fall of the Berlin wall on November 9, 1989, when Eastern Europe opened
up as a new market, as a huge resource for a cheap, yet generally well-educated
work force, and as an area with enormous demand for investment and renovation.
Globalization swept from the West across Eastern Europe and
extended deeply into Asia.
2. The Netscape IPO on August 9, 1995, when for the first time it was demon-
strated that one can make money through the Internet, in particular with a
company whose business model does not immediately imply major revenues.
3. Software with compatible interfaces and file formats as well as workflow
software which can connect people all over the world by chaining together what
they are doing into a comprehensive whole. New forms of collaboration and
distribution of work can be run over the Internet, and jobs become location- and
time-independent. A division of labour in specialized tasks has moved from a
regional or domestic to an international scale.
4. Open sourcing, or the idea of self-organizing collaborative communities which
in particular are capable of running large software projects. Prominent exam-
ples include the GNU/Linux operating system, the Mozilla Firefox browser
project, and the Moodle e-learning software system. In all cases, complex
software with numerous components has been developed in a huge common
effort, and is being maintained by a community of developers which respond to
bugs and failures with an efficiency unknown to (and vastly impossible for)
commercial software companies. The modern term for this is crowdsourcing.
5. Out-sourcing, where companies concentrate on their core business and leave
the rest to others who can do it cheaper and often more efficiently. Out-sourcing
occurs at a global scale, i.e., is not restricted to national boundaries or regional
constraints anymore.
6. Off-shoring, which means going way beyond outsourcing; indeed, the next step
is to take entire production lines to an area of the world where labour is cheaper.
This particularly refers to China, but also again to India or countries like Russia
and Brazil.
7. Supply-chaining or the idea of streamlining supply and production processes on
a world-wide basis, for example through the introduction of RFID tag (or, more
recently, Beacon) technology. Supply chains have become truly global today,
1.5 From a Flat World to a Fast World that Keeps Accelerating 33
Obviously, not all flatteners are related to the Internet and the Web, yet all of these
developments, which not only go together but influence each other, rely heavily
on efficient communication networks and on tools such as the Web for
utilizing them. In the flat world, it became possible to access arbitrarily remote
information in an easy and vastly intuitive way ("the global village"), in particular
information whose existence had not even been known before. One of the
slogans now was to have "information at your fingertips," and search engines were
one of the major support tools making this possible.
Friedman described these flatteners originally in 2004, a time when smartphones,
tablet computers or Facebook were not yet around, when, as Friedman himself has
put it in a speech, “Twitter was a sound, the cloud was in the sky, 4G was a parking
spot, LinkedIn was a prison, and Skype was a typo.” 10+ years later, many of the
flatteners still apply, but several have considerably advanced. So it does not come as
a surprise that Friedman and Mandelbaum (2011) took another look at globalization
and its “clash” with the IT revolution. They noticed that, due to the developments
we have sketched above, the world has indeed transitioned from “flat” to “fast”. As
Friedman describes it in a November 2014 NYT column, “The three biggest forces
on the planet—the market, Mother Nature and Moore’s Law—are all surging, really
fast, at the same time. The market, i.e., globalization, is tying economies more
tightly together than ever before, making our workers, investors and markets much
more interdependent and exposed to global trends, without walls to protect them.
Moore's Law … is, as Andrew McAfee and Erik Brynjolfsson posit in their book,
"The Second Machine Age",8 so relentlessly increasing the power of software,
computers and robots that they’re now replacing many more traditional white- and
8 Winner of the German Handelsblatt "Wirtschaftsbuchpreis 2015".
blue-collar jobs, while spinning off new ones—all of which require more skills.
And the rapid growth of carbon in our atmosphere and environmental degradation
and deforestation because of population growth on earth—the only home we have
—are destabilizing Mother Nature’s ecosystems faster” (www.nytimes.com/2014/
11/05/opinion/the-world-is-fast.html).
As Friedman and Mandelbaum note, these developments have a profound
influence on how we live and how we work, how we educate students, and how we
conduct business. The world is no longer just connected, but “hyper-connected”
and hence ultra-fast in interactions, but also in changes and disruptive develop-
ments. High-speed networking is nowadays even available on the top of Mount
Everest, cheap labour and even cheap genius is always available from any corner of
the world, via the “crowd”, and rapid changes occur permanently and everywhere;
we will discuss the topic of disruption and disruptive innovation in more detail in
Chap. 5. In order to get along in a world like this, Friedman and Mandelbaum
suggest five behavioural patterns that everyone should adopt:
In his most recent book, Friedman (2016) pins the developments he has repeatedly
reported on in essence to the year 2007. In a November 2016 blog post, coinciding
with the publication of that book, he writes: "Steve Jobs and Apple released
the first iPhone in 2007, starting the smartphone revolution that is now putting an
internet-connected computer in the palm of everyone on the planet. In late 2006,
Facebook, which had been confined to universities and high schools, opened itself
to anyone with an email address and exploded globally. Twitter was created in
2006, but took off in 2007. In 2007, Hadoop, the most important software you’ve
never heard of, began expanding the ability of any company to store and analyze
enormous amounts of unstructured data. This helped enable both Big Data and
cloud computing. Indeed, “the cloud” really took off in 2007. In 2007, the Kindle
kicked off the e-book revolution and Google introduced Android. In 2007, IBM
started Watson—the world’s first cognitive computer that today can understand
virtually every paper ever written on cancer and suggest to doctors highly accurate
diagnoses and treatment options. Further, have you ever looked at a graph of the
cost of sequencing a human genome? It goes from $100 million in the early 2000s
and begins to fall dramatically starting around … 2007. The cost of making solar
panels began to decline sharply in 2007. Airbnb was conceived in 2007 and
change.org started in 2007. GitHub, now the world's largest open-source software sharing
library, was opened in 2007. And in 2007 Intel for the first time introduced
non-Silicon materials into its microchip transistors, thus extending the duration of
Moore’s Law—the expectation that the power of microchips would double roughly
every two years. As a result, the exponential growth in computing power continues
to this day. Finally, in 2006, the internet crossed well over a billion users world-
wide” (see www.nytimes.com/2016/11/20/opinion/sunday/dancing-in-a-hurricane.
html). He continues to argue that three fundamental developments, namely in
computing, in globalization, and in climate change, are accelerating simultaneously
and are impacting each other, and that the ordinary person has increasing difficulty
keeping up with them and tends to feel more and more uncomfortable. He compares
these accelerations to a hurricane in which we are asked to dance.
In later sections and chapters we will see a variety of examples in which these
accelerations manifest. We will also remind the reader of Friedman’s perception in
places where it is appropriate, yet it makes sense to keep that in mind already at this
point.
We have occasionally mentioned throughout this chapter that users have started to
use the Web as a medium in which they can easily and freely express themselves,
and by doing so online they can reach a large number of other people, most of whom
they will not even know. Two forms of user-generated content that became popular
about 10–15 years ago are the following: blogs (such as the ReadWriteWeb
mentioned earlier) are typically expressions of personal or professional opinion or
experience, on which other people can at most comment; wikis are pages or systems of
pages describing content that other people can directly edit and hence extend,
update, modify, or delete. Both communication forms have contributed significantly
to the read/write nature of the Web and were indications of the transition from what
has been coined "Web 1.0" to "Web 2.0", which started happening around 2004.
Blogs
One effect the Web has seen, and one that has made it highly popular, is that anybody can
write comments on products or sellers, on trips or special offers, on political
developments, and more generally on almost any topic, be it serious or not; people
can even write about themselves, or comment on any issue even without a particular
cause (such as a prior shopping experience). At the starting point of Web 2.0 were
blogs and a new form of activity called blogging. In essence, a blog is an online
diary or a journal that a person is keeping and updating on an ad-hoc or a regular
basis. The word itself is a shortened version of Web log and is meant to resemble
the logs kept by the captain of a ship as a written record of daily activities and
documentation describing a journey of the ship.
A blog on the Web is typically a sequence of texts in which entries appear in
reverse order of publication so that the most recent entry is always shown first. In its
most basic form, a blog consists of text only. Without any additional features and in
particular if separated from subscriptions, a blog is hence no more than a kind of
diary that may be kept by anybody, e.g., private persons, people in prominent
positions, politicians, movie stars, musicians, companies, or company CEOs.
However, most blogs today go way beyond such simple functionality.
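The reverse-chronological listing described above amounts to a simple sort over publication dates; a minimal sketch, with invented entry data:

```python
# Minimal sketch of a blog's reverse-chronological listing: entries are
# sorted by publication date, most recent first. The entry data is
# hypothetical and for illustration only.
from datetime import date

entries = [
    {"title": "First post", "published": date(2016, 1, 5)},
    {"title": "On Li-Fi", "published": date(2016, 3, 1)},
    {"title": "Wearables", "published": date(2016, 2, 14)},
]

# Sort so that the most recent entry is always shown first.
latest_first = sorted(entries, key=lambda e: e["published"], reverse=True)
assert latest_first[0]["title"] == "On Li-Fi"
```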
As a first example of a blog, consider Slashdot (www.slashdot.org). It was
started in 1997 by Rob Malda for publishing “news for nerds, stuff that matters.” He
still maintains the blog today and has created one of the most lively sites for Linux
kernel news, cartoons, open-source projects, Internet law, and many other issues,
categorized into areas such as Books, Developers, Games, Hardware, Interviews,
IT, Linux, Politics, or Science. Each entry is attached to a discussion forum where
comments can even be placed anonymously. A prominent example of an early
company blog was the FastLane blog by General Motors Vice Chairman Bob Lutz,
in which he and other GM people reported on new developments or events in any of
the GM companies, or responded to consumer enquiries. It was a good example of
how an enterprise can develop new strategies for its external (but also its internal)
communication. German GM subsidiary Opel even went so far as to maintain a car
blog while its Insignia model was still under development, with the effect that
40,000 units had been sold before the car even arrived at dealerships!
1.6 Socialization. Comprehensive User Involvement 37
While blogging services are often free, i.e., users can create and maintain a
blog of their own without any charges, they typically have to accept advertising
around their entries. The Dilbert blog is maintained by Typepad, which hosts free
blogs only for a trial period, but which explains a number of reasons why people
actually do blog, namely to broadcast personal news to the world, to share a passion
or a hobby, to find a new job, or to write about their current one. Providers where a
blog can be set up (typically within a few minutes) include Blogger, Blogging, or
WordPress. UserLand was among the first to produce professional blogging software,
called Radio (radio.userland.com). If a blog is set up with a provider, it will often
be the case that the blog is immediately created in a format through which readers
of the blog will be informed about new entries.
The activity of blogging, typically enhanced with images, audio or video, was
the successor to bulletin boards and forums, which have existed on the Internet
roughly since the mid-90s. Their numbers peaked in the early 2000s, which is why
the advertising industry took a close look at them. And in a similar way as
commenting on products on commerce sites or evaluating sellers on auction sites have
done, blogging is influencing consumer behavior, since an individual user can
now express his or her opinion without someone else exercising control over it. The
party hosting a blog also has an ethical responsibility and can block a blog or take it
off-line, yet people can basically post their opinions freely. Many blogs take this
issue seriously and follow some rules or code of ethics. On a related topic, ethical
implications of new technologies have been investigated by Rundle and Conley
(2007).
Studies show that trust in private opinion is generally high. Blogs may
also be moderated, which typically applies to company blogs; see www.blog.wan-
ifra.org/2016/07/04/five-lessons-on-managing-online-comments for an account of
how to deal with online comments. Blogs are also indexed by search engines and
are visited by crawlers on a regular basis. Since blogs can contain links to other
blogs and other sites on the Web, and since links can be seen as a way for bloggers
to refer to and collaborate with each other, and since link-analysis mechanisms such as
Google's PageRank give higher preference to sites with more incoming links,
bloggers can obviously influence the ranking of sites at search engines. And
blogging has also opened the door for new forms of misuse. For example, blogs or
blog entries can be requested in the sense that people are asked to write nice things
about a product or an employer into a blog, and they might even get paid for this;
see www.nytimes.com/2011/08/20/technology/finding-fake-reviews-online.html
for one of many articles written about this subject, and see Ott et al. (2011) for
an algorithmic approach to discovering what the authors call “opinion spam.”
Conversely, a blog could be used for attacking a person or a product, and can
become the target of an Internet troll. An easy way to avoid some of these misuses
is to require that blog writers identify themselves.
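The link-counting intuition behind PageRank, mentioned above, can be sketched in a few lines. This is a simplified illustration over an invented three-blog link graph; the damping factor and iteration count are common textbook defaults, and Google's production algorithm is far more elaborate.

```python
# Simplified PageRank sketch over a toy link graph of blogs. Pages that
# receive more incoming links accumulate a higher rank.

def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping each page to the list of pages it links to."""
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        # Every page gets a small base share, plus shares from its in-links.
        new_rank = {p: (1.0 - damping) / len(pages) for p in pages}
        for page, outgoing in links.items():
            if outgoing:
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:  # dangling page: spread its rank evenly over all pages
                for target in pages:
                    new_rank[target] += damping * rank[page] / len(pages)
        rank = new_rank
    return rank

# Blogs A and B both link to C, so C ends up with the highest rank.
ranks = pagerank({"A": ["C"], "B": ["C"], "C": ["A"]})
assert ranks["C"] > ranks["A"] and ranks["C"] > ranks["B"]
```

This is exactly why incoming links from many blogs can lift a site's position: each extra in-link adds to the target's share on every iteration.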
Wikis
A second prominent form of user participation on and contribution to the Web is
represented by wikis. A wiki is a Web page or a collection of pages that allows its
users to add, remove, and generally edit some of the available content, sometimes
without the need for prior registration if the wiki is a public one. Thus, a wiki is an
editable page or page collection that does not even require its users to know how to
write a document in HTML. The term “wiki” is derived from the Hawaiian word
“wikiwiki” which means “fast.” Thus, the name suggests having a fast medium for
collaborative publication of content on the Web. A distinction is commonly made
between a single “wiki page” and “the wiki” as an entire site of pages that are
connected through many links and which is in effect a simple, easy-to-use, and
user-maintained database for creating content.
The history of wikis started in March 1995, when Ward Cunningham, a software
designer from Portland, Oregon, was working on software design patterns and
wanted to create a database of patterns in which other designers could contribute by
refining existing patterns or by adding new ones. He extended his already existing
"Portland Pattern Repository" with a database for patterns which he called
WikiWikiWeb. The goal was to fill the database with content quickly, and in order to
achieve this, he implemented a simple idea which can still be seen at wiki.c2.com/?
SoftwareDesignPatterns: Each page had at its bottom a link entitled “EditText”
which could be used to edit the text in the core of the page directly in the browser.
Users could write within a bordered area, and they could save their edits after
entering a code number provided. The important point is that no HTML was needed
to edit the page; instead, the new or modified content was converted to HTML by
appropriate wiki software. Other wikis operate just like this one.
Often, there is no review before the modifications a user has made to a wiki page
are accepted and, commonly, edits can be made in real-time, and appear online
immediately. There are systems that allow or require a login, enabling signed
edits through which the author of a modification can be identified. In particular,
private wiki servers often require a login before a user is able to edit
or even read the contents. Most wiki systems have the ability to record changes so that an
edit can be undone and the respective page be brought back into any of its previous
states. They can also show most recent changes and support a history, and often
there is a “diff” function that helps readers to locate differences to a previous edit or
between two revisions of the same page. As with blogs, there is an obvious pos-
sibility to abuse a wiki system and input garbage.
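The "diff" function mentioned above can be illustrated with Python's standard difflib module; the two revision texts below are invented for the example.

```python
# Sketch of a wiki-style "diff" between two revisions of a page, using
# Python's standard difflib. Wiki engines use similar line-based comparison
# to show readers what changed between revisions.
import difflib

old = ["Li-Fi uses radio waves.", "It was invented in 2011."]
new = ["Li-Fi uses visible light.", "It was invented in 2011."]

diff = list(difflib.unified_diff(old, new,
                                 fromfile="rev1", tofile="rev2",
                                 lineterm=""))
for line in diff:
    print(line)
```

Lines prefixed with `-` were removed and lines prefixed with `+` were added; unchanged lines appear as context, which is precisely what a revision-history view highlights.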
There are numerous systems for creating and maintaining wikis, most of which
are open source (see c2.com/cgi/wiki?WikiEngines for a pretty up-to-date listing).
It is also no surprise that there are by now wikis for almost every
topic, a prominent example being The Hitchhiker's Guide to the Galaxy
(h2g2.com). This wiki was created by The Digital Village, a company owned by author
Douglas Adams who also wrote the famous book by the same title, and was taken
over by the BBC after Adams’ untimely demise. Adams created it in an attempt to
realize his vision of an open encyclopedia authored solely by its users.
Wikipedia
One of the largest wikis today is the multi-lingual Wikipedia project which contains
more than 40 million pages in over 250 different languages and a large number of
1.6 Socialization. Comprehensive User Involvement 39
images. Wikipedia started out as an experiment in 2001 to create the largest online
encyclopedia ever, but was soon growing faster than most other wikis. It was quickly
mentioned in blogs and various print media, including The New York Times.
Wikipedia has gone through several cycles of software development, and has
always strictly separated content from comments and from pages about Wikipedia
itself. Wikipedia has many distinct features that make its use transparent, among
them instructions on citations, which anybody who wants to refer to a Wikipedia
article elsewhere can easily download.
The Wikipedia administration has established strict rules regarding content or
users, in a similar spirit as blogs establish ethical rules. For example, a page to be
deleted must be entered into a “votes for deletion” page, where users can object to
its deletion; this reflects the Wikipedia idea of making decisions by broad consensus. By
the same token, a user may only be instantly excluded from contributing to
Wikipedia in a case of vandalism, which has turned out to be rare. In general,
discussions that take place about the content of articles are most of the time highly
civilized, and Wikipedia has become prototypical for proving that the ease of
interaction and operation make wikis an effective tool for collaborative authoring.
The accuracy of Wikipedia has regularly been questioned. It was first empirically
tested in a study published by the British science journal Nature in December 2005. In
an article entitled Internet encyclopedias go head to head, Nature wrote that “Jimmy
Wales’ Wikipedia comes close to Britannica in terms of the accuracy of its science
entries, a Nature investigation finds”. For this study, Nature had chosen articles from
both Wikipedia and the Encyclopedia Britannica in a wide range of topics and had
sent them to experts for peer review (i.e., without indicating the source of an article).
The experts compared the articles one by one from each site on a given topic. 42 of the
returned reviews turned out to be usable, and Nature found just eight serious errors in
the articles, four from each source. However, the reviewers also discovered a
series of factual errors, omissions, or misleading statements; in total Wikipedia had
162 of them, while the Encyclopedia Britannica had 123. This averages to 2.92
mistakes per article for the latter and 3.86 for Wikipedia. While it may not be quite
as accurate as the Encyclopedia Britannica, Wikipedia is much larger, with approximately 60 times
as many words in its English site as in the encyclopedia.
The reliability of Wikipedia, discussed at en.wikipedia.org/wiki/Reliability_of_
Wikipedia in Wikipedia itself, may in part be due to the fact that its community has
developed several methods for evaluating the quality of an article, including
stylistic recommendations, tips for writing good articles, a “cleanup” list of articles
that need improvement, and an arbitration committee for complex user conflicts;
these methods can vary from one country to another, and some articles show in
their header that they “need improvement.” The bottom line is that Wikipedia is
one of the best examples for an online community that, in spite of permanent
warnings, works extremely well, that has many beneficiaries all over the world, that
is in wide use both online and off-line, and that enjoys a high degree of trust.
Wikipedia also is an excellent example of a platform that is social in the sense that
it gets better the more people use it, since more people can contribute more
knowledge, or can correct details in existing knowledge for which they are experts.
40 1 The Web from Freshman to Senior in 20+ Years …
According to Levene (2010), social networks bring another dimension to the Web
by going way beyond simple links between Web pages; they add links between
people as well as links between communities. In such a network, direct links will
typically point to our closest friends and colleagues, indirect links lead to friends of
a friend, and so on. In terms of Fig. 1.3, social networks can be seen as graphs
where nodes now represent individuals, and edges represent relationships between
them; or they can be seen as “topic maps” where the nodes represent topics that
people write about or hashtags that they attach to writings, and edges are references
between these topics or hashtags. As an example, wiki.digitalmethods.net/Dmi/
StartingPoints2 shows such a network of hashtags that connect or reference posts
related to oil spill. In this section, we take a brief look at social networks and the
impact they are having on the Web today.
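The graph view of a social network sketched above can be made concrete with a small adjacency-list example; all names and connections are invented for illustration.

```python
# A tiny social network as a graph: nodes are people, edges are relationships.
friends = {
    "Ann": {"Ben", "Cara"},
    "Ben": {"Ann", "Dave"},
    "Cara": {"Ann"},
    "Dave": {"Ben"},
}

# Direct links point to one's closest contacts.
print(sorted(friends["Ann"]))  # ['Ben', 'Cara']

# Indirect links lead to friends of a friend not yet known directly.
fof = set().union(*(friends[f] for f in friends["Ann"])) - friends["Ann"] - {"Ann"}
print(fof)  # {'Dave'}
```

The same structure serves for the "topic map" reading: nodes would then be topics or hashtags, and edges references between them.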
We note that the younger generation today has largely abandoned traditional
media such as newspapers. Indeed, investigations such as those done by the British
Office of Communications show that the “networked generation” is driving a radical
shift in media consumption. British broadband users already in 2006 spent on
average 12.7 hours per week online, growing to more than 31 hours in 2015. Much
of this time is spent on online social networking sites such as Facebook. Moreover,
the 16–24 year olds are spurning television, radio, and newspapers in favour of
online services. Increasingly households are turning to digital TV, TV or video on
demand and streaming services (e.g., Netflix) enabled by modern broadband con-
nections. Thus, while it initially took a while for average users and consumers to
accept, or indeed trust, the Web as a medium or platform, the generation of
“digital natives,” a term attributed to American writer Marc Prensky, is now growing
up with it and integrating it into everyday life much more comprehensively.
The information available on the Internet and the Web, as well as the tools by
which this information has meanwhile become accessible, have led to the establish-
ment of a number of distinct Internet communities, i.e., groups of people with
common interests who interact through the Internet and the Web. Today, we can
identify at least the following types of communities:
A prominent example and at the same time one of the oldest communities of
interests on the Internet is the Internet Movie Database (IMDb), which evolved in
1990 from a newsgroup maintained by movie fans into the biggest database on
movies in cinemas and on DVD, and which is nowadays owned by Amazon.
The considerable change in perception and usage has opened the door for the
present-day willingness of people to share all kinds of information, private or
otherwise, on the Web, for a new open culture as observable in blogs, wikis, and
social networks like Facebook, Snapchat, Twitter, or LinkedIn. The Internet offers
various ways to make contact with other people, including e-mail, chat rooms,
online dating sites, blogs and discussion boards (which, unfortunately, are also
heavily exploited by spammers, hackers, and other users with unethical intentions);
Fig. 1.8 summarizes the most important categories of Web tools for personal
communication and information management today and gives examples in each
category. However, while these services often just support ad-hoc interaction or
focused discussions on a particular topic (or less focused conversations on the
world at large), an online social network goes a step further and is typically the
result of employing some software that is intended to focus on building an online
community for a specific purpose. Many social networking services are also blog
Fig. 1.8 Tool types and sample tools for personal communication and information management: e-mail; fora and blogs; communities; Web tools (e.g., Maps, Google Groups, Adobe Connect); mobile apps, e.g., for Android or iOS
hosting services where people can deliver longer statements than they would
otherwise, and the distinction between blogs and social networks is often blurred.
Social networks connect people with different interests, and these interests could
relate to a specific hobby, a medical problem, an interest in some specific art or
culture. Often, members initially connect to friends whom they know, for
example, from school or college and later add new contacts, e.g., from their pro-
fessional life, often found through the Web.
A social network can also act as a means of connecting employees of distinct
expertise across departments and company branches and help them build profiles in
an easy way, and it can do so more cheaply and flexibly than traditional (knowledge
management) systems. Once a profile has been set up and published within the
network, others can search for people with particular knowledge and connect to
them. A typical example of a social network often used professionally and for
business purposes is LinkedIn, a network that connects people, but also businesses
by industry, functions, geography and areas of interest. Meetup.com is a social
events calendar in which registered users can post event entries and share them with
other users; it currently has more than 250,000 groups in more than 180 countries.
Since social networks can usually be set up free of charge (or for a low fee), they
are an attractive opportunity for a company to create and expand their internal
contact base. However, a social network need not be restricted to a company’s
internal operation; it may as well include the customers to which the company sells
goods or services (and hence be used for customer relationship management).
Notice that many of the tools mentioned in Fig. 1.8 are also in professional use
today.
An early social networking site that was most popular among members of the
networked generation around 2005 was MySpace, a Web site that facilitated an
interactive, user-supplied network of friends, personal profiles, blogs, groups,
photos, music, and videos. MySpace, along with others of its time such as
bebo.com, was slowly abandoned when Facebook gained traction; in their own
words, Facebook is “a social utility that connects people with friends and others
who work, study and live around them;” its story is well documented by Kirkpatrick
(2011). Similarly, Twitter has been successful as a social network and “mi-
croblogging” site, in which people can follow others and until recently could only
post entries of no more than 140 characters; for its evolution, see Bilton (2014).
We finally mention YouTube in this context, which can be seen as a mixture of a
video blogging site and a social networking site, but which exhibits an interesting
phenomenon: Due to the size of networks now existing on the Web, and due to the
enormous functionality which the sites mentioned above offer, IT and media
companies have discovered opportunities in the use of social networks. Like much
of the Internet industry, there is a rich history of acquisitions, takeovers and indeed
failure. Acquisitions include Google’s purchase of YouTube, while Rupert Mur-
doch of News Corporation bought MySpace (see the cover story in Wired maga-
zine, July 2006), a purchase he definitely regretted later, Facebook’s takeover of
WhatsApp in 2014, and Microsoft’s acquisition of LinkedIn in mid-2016.
Social networks on the Web have also triggered a renewed interest in socio-
logical questions regarding how the Internet and the Web are changing our social
behavior (including the question of why people send out spam emails, act as
Internet trolls, and try to hack into other people’s accounts), how communities are
formed, or how news spreads across the Web. Social network analysis investigates
metrics that measure the characteristics of the relationships between the participants
of a network. In particular, such an analysis looks for ties (between pairs of par-
ticipants) and their strength as well as their specialization into bridges, for triangles
(involving three people), and for issues such as the clustering coefficient of a
participant or the density of the network or the degree of separation. For a brief
introduction to the former, the reader is referred to Levene (2010), for a more
detailed one to Scott (2013) or Borgatti et al. (2013). Watts (2004) found that the
average path length for an e-mail message traveling the Web from sender to
receiver was around six; more recently, it was reported that Facebook found an
even smaller number: “The social media giant released a report on its blog
Thursday announcing ‘each person in the world’ is separated from every other by
‘an average of three and a half other people’” (www.nytimes.com/2016/02/05/
technology/six-degrees-of-separation-facebook-finds-a-smaller-number.html).
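The metrics just mentioned, degrees of separation and the clustering coefficient, are straightforward to compute on a small graph; the toy network below is invented, and real social network analysis of course operates on far larger data.

```python
from collections import deque
from itertools import combinations

# An invented toy network; an edge means the two people are connected.
graph = {
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B"},
    "D": {"B", "E"},
    "E": {"D"},
}

def distance(g, src, dst):
    """Degrees of separation: shortest path length via breadth-first search."""
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, d = queue.popleft()
        if node == dst:
            return d
        for nxt in g[node] - seen:
            seen.add(nxt)
            queue.append((nxt, d + 1))
    return None  # dst not reachable from src

def clustering(g, node):
    """Clustering coefficient: fraction of neighbour pairs that are linked."""
    nbrs = g[node]
    if len(nbrs) < 2:
        return 0.0
    linked = sum(1 for u, v in combinations(nbrs, 2) if v in g[u])
    return linked / (len(nbrs) * (len(nbrs) - 1) / 2)

print(distance(graph, "A", "E"))  # 3 (A-B-D-E)
print(clustering(graph, "B"))     # 1/3: of neighbours A, C, D only A and C are linked
```

A triangle such as A-B-C is exactly what drives a high clustering coefficient, and the average of `distance` over all pairs is the quantity behind the "six degrees" (or three and a half) findings cited above.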
In conclusion, the current expectation for several years to come is that user
participation in and contribution to the Web, which has increased considerably in
recent years, will continue to grow, and that the social life of individuals, families,
and larger communities of people will increasingly be enriched by online appli-
cations from social Web networks. It remains to be seen whether this will indeed
have anything more than local effects in the sense that people may now take their
network of friends online, but do not necessarily include people they have never
met or that they most likely will never meet. On the other hand and as a result of
this development, the social Web has become increasingly relevant to companies as
a vehicle for marketing, advertising, and communication internally as well as with
their customers.
The various developments we have described above, among them Friedman’s
open-sourcing flattener and the Wikipedia movement, have led to various novel forms of
collaboration among people world-wide and to new models of doing business
which we will briefly discuss next.
Crowdsourcing
Open sourcing refers to the concept of outsourcing a task or a project to a com-
munity or “crowd” of contributors. As we saw with Wikipedia, the concept has
meanwhile found a host of additional applications beyond the development of
open-source software, and the more recent denomination is crowdsourcing.
Crowdsourcing can be seen as an offspring of both blogging (a hierarchical orga-
nization, where one blogger talks to a crowd and may allow or reject comments)
and wikis, where a community works on a common task. The term crowdsourcing
was first coined in 2005 by Jeff Howe and Mark Robinson, editors of Wired
magazine (note: Wired also introduced the term “long tail” as previously discussed
in this chapter) and can be defined as “the act of a company or institution taking a
function once performed by employees and outsourcing it to an undefined (and
generally large) network of people in the form of an open call” (Howe 2009).
While a new(ish) term, the concept underlying crowdsourcing is indeed cen-
turies old and began with the development of the vacuum-sealed pocket watch, also
known as the ‘marine chronometer’. In 1714 the British Government offered a prize
of £20,000 for the first person who could develop a device to aid navigation and
help prevent the loss of sailors. John Harrison subsequently developed the marine
chronometer that was accurate in determining longitude by means of celestial
navigation. In the early part of the 20th century, Toyota ran a competition to
redesign its logo. The competition was won, out of 27,000 entries, by a design that
included the three Japanese katakana letters for “Toyoda” in a circle. This was later
revised to “Toyota”. In 1955, an architectural contest was run to design a landmark
building to be located in the harbor of Sydney, Australia. The now famous Sydney
Opera House was judged the best design out of the 233 entries. In terms of
crowdsourcing on the Web, Wikipedia, by way of crowd size, is an example of
knowledge crowdsourcing—even though it is more commonly referred to as a Wiki
and pre-dates the introduction of crowdsourcing as a concept.
Online crowdsourcing first became popular in the form of crowdjobbing. One of
the oldest such platforms on the Web is Amazon’s Mechanical Turk (AMT). AMT
is typically called upon for tasks that are easy for humans, yet difficult for a
computer. For example, analyzing a large body of photos for an occurrence of a
certain person can be done by a human in a glance, while a computer needs to
perform pattern matching on each photo and search for one of potentially several
patterns in which the person in question could occur in one of the pictures. For tasks
like these, AMT acts as an intermediary for the Human Intelligence Task (HIT),
which is specified by a requester who wants to outsource it and typically equipped
with a (generally small) reward. People interested in working on a HIT, so-called
workers (also known as Providers or Turks), can download it, solve it, and return
the results. This principle is shown at www.mturk.com/mturk/welcome.
In general, crowdjobbing encapsulates a range of crowdsourcing tasks that are
concerned with the access to, and use of, labor, typically based on AMT. One of the
key characteristics of crowdjobbing is that in many cases, large, often complex
projects are broken down into much simpler tasks that, after completion by
anonymous crowd members, can be “re-assembled” to form the completed project;
see Kittur et al. (2013) for details. This approach gains the greatest benefit from a
large and disparate crowd that collectively may have the necessary skills, yet few
individual members may be able or willing to complete the full project. Depending
on their size and scale, tasks are often classified as either micro or macro tasks.
Micro tasks are HITs and can vary depending on the requester. For example, workers
can write descriptions, reviews, or articles; tag and categorize images; enter data; fill in
questionnaires and surveys; look up information for businesses; transcribe,
rewrite, or proofread text; and perform many other tasks. For each HIT listed, there is
an appropriate reward, ranging from $0.01 up to $100 per HIT, though the vast
majority of the HITs pay under $1. Macro work is a type of crowdjobbing where
specialized skills such as those related to education or infrastructure or technology
etc. are involved. Key differences from micro work are that macro tasks can be done
independently, take a fixed amount of time, and require special skills.
Crowdjobbing is suitable for both physical and virtual tasks. Lebraty and Leb-
raty (2013) describe, as an example of a physical crowdjobbing task, the mobi-
lization of small businesses in a rural area. Businesses are sent an electronic file
which they are required to print and display in public areas. Virtual task applica-
tions are much more plentiful and ideally suited to the many big data problems that
exist. One such example might be data cleansing where on its own, the problem is
enormous, but by employing a large number of workers, each addressing just a very
small part of the cleansing project, it can be completed with a very high degree
of accuracy.
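The split-distribute-reassemble pattern behind such crowd projects can be sketched as follows; the cleansing task, the chunk size, and the single-process "workers" are purely illustrative stand-ins for real, anonymous crowd members.

```python
# An invented data-cleansing job: raw records with inconsistent whitespace/case.
records = [" Alice ", "BOB", "  carol", "DAVE  ", "eve"]

def split(data, size):
    """Break the large project into chunks small enough for a single worker."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def worker(chunk):
    """One (simulated) crowd member cleans one chunk: trim and normalize case."""
    return [r.strip().title() for r in chunk]

# Each chunk could go to a different anonymous worker; here we simply loop,
# then re-assemble the completed pieces into the finished project.
cleaned = []
for chunk in split(records, 2):
    cleaned.extend(worker(chunk))

print(cleaned)  # ['Alice', 'Bob', 'Carol', 'Dave', 'Eve']
```

The crucial property is that no single worker needs to see, or be able to handle, the whole project; correctness of the re-assembled result depends only on each chunk being processed consistently.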
Crowdsourcing has also found entry into scientific applications. For example,
UC Berkeley has developed CrowdDB, a system that uses human input via
crowdsourcing to process queries that neither database systems nor search engines
can adequately answer.
Crowdfunding
A popular special case of crowdsourcing is crowdfunding. In simple terms,
crowdfunding is the use of a Web-based application to facilitate the generation of
funds, often in the form of small donations by a large number of “backers” for a
particular cause or initiative. This principle is shown in Fig. 1.9.
It is common to see crowdfunding used for the funding of new product development,
or alternatively for fundraising for humanitarian, social, or personal crisis causes
(e.g., funding for medical expenses). One of the most famous crowdfunding causes
in recent history was for the development of the Pebble e-ink smart watch in 2012.
As described, for example, at www.cnbc.com/2015/03/30/pebble-watch-funding-
hits-record-.html, the developers totally underestimated the scale of support for their
development (part of the incentive for donating was to be at the start of the queue
for the product when released) and ended up with more than 100 times their target.
The Pebble watch funding used the Kickstarter (kickstarter.com) platform, which is
the largest and most popular of the many such sites; other well-used sites include
Indiegogo (www.indiegogo.com), Sparksters (www.sparksters.com), and CrowdRise
(www.crowdrise.com/). Others include Givealittle (www.givealittle.co.nz) and
GoFundMe (www.gofundme.com) which focus on social and humanitarian
fundraising causes, or ArtistShare (www.artistshare.com) which operates as a music
label for “creative artists.” There is nowadays an almost limitless supply of
crowdfunding alternatives.
Other Forms of Utilizing a Crowd
In addition to crowdfunding, the basic structural tenet of crowdsourcing has been
applied to a broad range of other tasks resulting in a number of more specific types
of crowdsourcing. The most important of these, crowdsearching, is described here.
Other, more narrowly focused types not discussed here
include crowdvoting, crowd-auditing, crowdcuration, crowdcontrol, and crowdcare.
In crowdsearching the crowd is used to search for information that cannot be easily
sourced by computers, or to help locate missing physical items. An interesting
example of the latter is the online community “CrowdSearching” (www.hippih.
com/crowdsearching), which uses the power of social media to help people around
the world find their lost belongings. Tomnod (tomnod.com), which means “Big
Eye” in Mongolian, uses a crowd of “digital volunteers”, or “nodders” to identify
items of interest in satellite images, often for social or humanitarian causes. For
example, one cause focuses on the identification of buildings in rural Ethiopia from
satellite maps based on the assumption that buildings typically indicate human
habitation and as such act as a de-facto measure of population distribution. This
then can aid, for example, in the management of preventable diseases.
A key enabler of crowdsourcing is the platform facilitating the various crowd-
sourcing activities. In general, these application platforms are custom-designed for
specific types of crowdsourcing tasks. As mentioned, Amazon Mechanical Turk is a
micro-job crowdsourcing online marketplace where requesters use and pay for
human intelligence from workers to perform tasks that computers cannot do.
Delicious.com used to be a social bookmarking site that enabled the crowd to store,
share and discover bookmarks of Web documents. Utilizing “tags,” these could be
stored in a variety of structures for ease of access, sorting and manipulation. There
also exists specialized crowdsourcing for idea generation (e.g., ideascale.com), data
sharing (e.g., deadcellzones.com), distributed innovation (e.g., ideasculture.com),
and content markets (e.g., threadless.com, istockphoto.com), to name but a few.
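The tag-based organization that social bookmarking popularized boils down to an inverted index from tags to bookmarks; the following sketch uses invented URLs and tags.

```python
# Invented bookmarks, each carrying a set of user-supplied tags.
bookmarks = {
    "https://example.org/patterns": {"design", "software"},
    "https://example.org/wiki": {"software", "collaboration"},
}

# Invert the mapping into a tag index so content can be discovered by tag.
by_tag = {}
for url, tags in bookmarks.items():
    for tag in tags:
        by_tag.setdefault(tag, set()).add(url)

print(sorted(by_tag["software"]))
```

Because the index is built from the crowd's own tags rather than a fixed taxonomy, it improves as more users contribute bookmarks, the same "gets better the more people use it" property noted earlier for Wikipedia.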
• Be specific. Make sure to make a detailed list of what the person is supposed to
do. If you’re looking for feedback on a design, don’t ask “what do you think?”,
but be specific like, “Is the text readable? Do you see any layout errors? Does
the page load fast?”
• Don’t be too cheap. Crowdsourcing is cheap, but it follows the same formula of
other jobs: the more you ask from the people, the more you have to give in
return. If you ask the people to spend 10 minutes filling out a questionnaire, it’s
unreasonable to offer them 10 cents. Hardly anyone would bite on that deal, and
for those who do, the results won’t be useful to you.
• Have a way of verifying the results. When outsourcing to a large crowd of
non-professional workers, the results can vary greatly. Make sure to state in the
job description what skill or knowledge is required from the user.
• Weigh your options. Instead of Amazon’s Mechanical Turk, consider using
dedicated crowdsourcing services for things like usability testing. While they
certainly cost more, they are a lot more likely to yield better results. (source:
hongkiat.com)
Similarly, Fitzgerald and Stol (2015) present a number of “Do’s” and “Don’ts” of
crowdsourcing software development that indeed apply to many uses of the crowd:
• Do build a relationship with the crowd to identify those who are invested. Get to
know the people who actually care about your product or service.
• Do provide clear documentation to the crowd.
• Do assign a special person to answer questions from the crowd.
During its relatively short lifespan of a little over 20 years so far, the Web has
undergone a tremendous development. It is nowadays truly ubiquitous, accessible
from almost any device in almost any place at almost any time. Indeed, wearesocial.
com/uk/special-reports/digital-in-2016 shows a global snapshot comparing the
world’s overall population to the number of Internet users as well as to the number
of mobile users, all figures as of January 2016:
• 3.419 Billion Internet users among a world population of 7.395 Billion people;
• 2.307 Billion active social media users;
• 3.790 Billion unique mobile users.
So the Web as the most prominent Internet service has indeed evolved from a
freshman in 1993 to a senior today, but is the Web already graduating?
We have discussed a variety of applications that can nowadays be used on the
Internet and the Web, and technology which has provided the underlying infras-
tructure for all of this with fast moving and comprehensive advances in networking
and hardware as well as software technology. We have also discussed the various
forms of user participation and contribution (which we might also call socializa-
tion), which have changed the way in which users, both private and professional,
perceive and utilize the Web, interact with it, contribute to it, publish their own or
their private information on it, or conduct business. So the Web has emerged from a
medium where a few people centrally determined what all others had to use to one
where almost half of the world’s population participate and jointly create and
publish content. An immediate consequence is that increasing amounts of data are
produced on the Web, which either get stored or are streamed. More data arises
from commercial sites, where each and every user or customer transaction leaves a
digital trace. The reader interested in what happens on the Web is recommended to
take a look at visual.ly/internet-real-time, which preserves an animation that orig-
inally appeared at pennystocks.la/internet-in-real-time/. Other such sources include
www.webpagefx.com/internet-real-time/ as well as www.betfy.co.uk/internet-
realtime/.
Berners-Lee (2000) is an account of the early design of the Web by the man who
created it; many other books deal with the history of the Web or with Berners-Lee
and his importance for the modern world. Berners-Lee submitted his proposal for
the World Wide Web in 1989 and launched the first website in 1991. He founded
the World Wide Web Consortium (W3C) in 1994, established the World Wide Web
Foundation in 2009, and is the recipient of the 2016 ACM Turing Award. A vivid
account in this context is the Internet History Program of the Computer History
Museum in Mountain View, California (see www.computerhistory.org/nethistory/).
We also refer the reader to the 2014 issue of CORE, the magazine of the Computer
History Museum, which contains various articles under “The Web at 25” umbrella
(see s3data.computerhistory.org/core/core-2014.pdf).
1.8 Further Reading 51
The client/server principle (cf. Fig. 1.1) has found wide use in computer systems
and is described in more detail, for example, by Tanenbaum and van Steen (2007).
Tanenbaum and Wetherall (2010) explain P2P networks in more detail. Musciano
and Kennedy (2006) is one of many sources on the HTML language and also
covers XHTML; for an introduction to the current version HTML5, available since
2014, see www.w3schools.com/html/.
The reader interested in search engines should consult Brin and Page (1998) for
the original research on Google, Vise (2005) for an account of the early history of
Google, and Miller (2009) for an in-depth presentation of its possibilities. Levene
(2010) describes how search engines work in general; Langville and Meyer (2012)
study Google’s as well as other ranking algorithms and give an in-depth exposition
of the mathematical and algorithmic aspects behind PageRank calculations. Infor-
mation retrieval (IR) is explained, for example, in Baeza-Yates and Ribeiro-Neto
(2011), Levene (2010), or Büttcher et al. (2010) and has had a big impact on how
a search engine works. The long tail concept is discussed in detail in Anderson
(2006) as well as on Anderson’s Web site (at www.thelongtail.com/).
Moore’s Law is discussed by Friedman (2016) and also by Thackray et al.
(2015); another account can be found in the 2015 issue of CORE under the title
“Moore’s Law @ 50” (see s3data.computerhistory.org/core/core-2015.pdf). Spec-
ulations that its end is near have been published repeatedly; see, for example, www.
technologyreview.com/s/601441/moores-law-is-dead-now-what/#/set/id/601453/.
On the other hand, leaps forward seem also possible, for example via Google’s
Tensor Processing Unit (www.pcworld.com/article/3072256/google-io/googles-
tensor-processing-unit-said-to-advance-moores-law-seven-years-into-the-future.
html). The future of semiconductors is outlined by Greengard (2017). For details of
TCP and IP, we refer the reader again to Tanenbaum and Wetherall (2010). A trend
in networking that has become popular around the mid-2000s is software-defined
networking (SDN), an approach to computer networking addressing the fact that
traditional network architectures do not support the dynamic and scalable com-
puting and storage needs of more modern computing environments. SDN is
achieved by decoupling the network components that make routing decisions for
traffic (the control plane) from the components that forward traffic to selected
destinations (the data plane). An introduction to this area is provided by Goransson
and Black (2017).
Blogs have become a highly popular medium for people to express themselves;
readers interested in learning more about the topic are referred to Reardon and
Reardon (2015). Social networks have become popular outlets for self-portrayals
that often even exaggerate, a phenomenon that caused Time magazine to report on
“The Me Me Me Generation” already in its May 20, 2013 issue. Yet like other
social media, blogs and social networks have given rise to what is now called
“cyber-mobbing” or “cyber-bullying;” see, for example, Festl and Quandt (2016) or
Blöbaum (2016), an emerging research area within the field of communication
studies.
In this chapter we discuss a variety of digital technologies that have become rel-
evant over the years. We start by looking at digitized business processes as they
have transformed many areas of business. This topic is closely related to business
process modeling and management (BPM) and to the execution of business
processes, which is nowadays often done using engineered systems and appliances.
After that we present cloud computing and cloud sourcing (not to be confused with
crowdsourcing, which we introduced in the previous chapter), which are the main
enablers for big data and analytics. Our goal is to present the technological basics of
these areas, emphasize why they are relevant today, and discuss what their impact
so far has been. We will not present these technologies in full technical detail;
instead we try to describe the core of what is needed to appreciate them and to see them in perspective, namely from a customer perspective (Chap. 3) as well as from a business perspective (Chap. 4).
In a globalized world, business processes increasingly form the crux of any orga-
nization. The reason for this is comparatively straightforward, if one considers that
any change in an organization will be accompanied by changes inseparable from its
business processes. Changes in the global market are part of the daily agenda: Companies are increasingly forced to adapt to new customers, competitors, suppliers, and business partners. Globalization is one of the “accelerators” that Tom Friedman has observed (Friedman 2016). Competitive edges are increasingly achieved not by better products, but by more efficient and cost-effective processes.
In short: Business processes have developed into an additional factor of production.
Given this background, it does not come as a surprise that professionals and managers, but also the IT staff, suppliers, and sometimes even the customers of a company—we collectively call these the business community—are expected to have a good understanding of business processes. Collectively they
contribute to the design, analysis, documentation, execution, and evolution of many
different types of business processes. Of course, this only succeeds if efficient and effective communication is possible. This requires that the same language be used within the entire business community and that time-consuming and error-prone translation steps be avoided. However, what does this look like in practice?
Obscurities, contradictions, misunderstandings and omissions in communication are
common. Each interest group in the business community maintains its
group-specific perspective on a business process: The management focuses on
corporate goals and business performance indicators; business professionals have
their business applications and processes in mind; and IT professionals think in terms of software and hardware structures. It is clear that communication challenges are inevitable. In many organizations, one tries to remedy this situation with
modeling experts who “translate” the collected business requirements into process
requirements, summarizing these in vast and highly complex models. Such an
approach gives the appearance of professionalism and efficiency, and in fact, such
“model monsters” are often given the stamp of approval by the entire business
community, hence forming the basis of organizational change. However, many community members recognize the negative implications of these “model monsters” far too late, namely when they are forced to live with the resulting organizational changes.
Figure 2.1 shows an alternative approach that is based on the use of a common
modeling language. This language is understood and (ideally) “spoken” fluently by
all members of the business community involved. This means that an explicit
translation of the communication processes will no longer be necessary, and the
abstraction of group-specific perspectives as well as structuring of the contents
conveyed can be carried out individually by everyone involved.
Which prerequisites are necessary for such a universal modeling language? First, it must be easy to learn, so that it can be mastered by inexperienced users quickly and reliably. Second, the language must be able to express all technically relevant aspects of a business process in detail. All this is possible only if the language has a simple syntax, which requires a minimum number of language elements, and clearly defined semantics that distinctly govern the use and interpretation of these elements. Typical representatives of such languages are Petri nets, which have proved to be effective in business process modeling.
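The firing rule that gives Petri nets their clearly defined semantics can be sketched in a few lines of Python. The class below is a toy, not any modeling tool's API, and the order-handling place and transition names are our own invention for illustration: places hold tokens, and a transition may fire only when all of its input places are marked.

```python
# Toy Petri net: a marking maps places to token counts; a transition
# consumes one token from each input place and produces one token on
# each output place. Arc weights are fixed at 1 for simplicity.

class PetriNet:
    def __init__(self, marking):
        self.marking = dict(marking)        # place -> token count
        self.transitions = {}               # name -> (inputs, outputs)

    def add_transition(self, name, inputs, outputs):
        self.transitions[name] = (list(inputs), list(outputs))

    def enabled(self, name):
        inputs, _ = self.transitions[name]
        return all(self.marking.get(p, 0) >= 1 for p in inputs)

    def fire(self, name):
        if not self.enabled(name):
            raise ValueError(f"transition {name!r} is not enabled")
        inputs, outputs = self.transitions[name]
        for p in inputs:
            self.marking[p] -= 1
        for p in outputs:
            self.marking[p] = self.marking.get(p, 0) + 1

# A hypothetical order process: receive -> check -> ship
net = PetriNet({"order received": 1})
net.add_transition("check", ["order received"], ["order checked"])
net.add_transition("ship", ["order checked"], ["order shipped"])
net.fire("check")
net.fire("ship")
print(net.marking["order shipped"])  # 1
```

The simple syntax (two kinds of nodes, one firing rule) is precisely what makes the notation learnable by inexperienced users while remaining amenable to analysis and simulation.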
The strengths of Petri nets—simplicity of use combined with a variety of application fields—are best utilized when they are integrated into a proven modeling method. Such a method can also regulate when and how analysis and simulation are to be employed, and which results can be obtained. Efficient work with Petri nets and the application of relevant methods is inconceivable without
2.1 Digitized Business Processes 55
appropriate software tool support. Tools ensure compliance with the syntactic rules, support the methodical steps, and take over the administration, documentation, and reuse of the content created.
Business Process Management (BPM) has established itself since its beginnings in
the early 1990s as an independent discipline that bridges the gap between corporate
strategy and the information and communication technologies used by a company.
Based on the Gartner IT Glossary (see www.gartner.com/it-glossary), bpm.com
Forum and Rosing et al. (2015), BPM can be defined as follows: “BPM is a discipline that uses various methods and software tools to discover business processes, capture them in models, analyze and simulate them, measure them, and improve and optimize them on the basis of predetermined criteria. BPM is aligned with the business strategy of the company or the organization as a whole. Business processes coordinate the behavior of people, systems, information, and things (see IoT) within an enterprise or across enterprise boundaries.”
58 2 Digital (Information) Technologies
• Design: Business process management is set up and introduced for the first time, and the processes of the company are designed.
• Engineering: Designed processes are implemented and made available for execution. Efficient resource use is important, as is an adequate connection of process, object, and organization models.
• Monitoring: Existing and pre-established business processes are subject to continuous monitoring in order to identify and remove bottlenecks in processes or resource allocation. The management of the business processes is improved constantly, either continuously or at specific time intervals. The goal is an ongoing optimization of the current operations of existing processes.
• Reengineering: An established process management is redesigned or optimized
partly or completely because of changed organizational conditions.
Business process design refers to the design and development of a process prior to
its implementation. This case is usually found in practice only where completely
new business fields must be integrated into an existing process landscape or when
new technical possibilities are introduced (e.g., a switch from a bricks-and-mortar
business to e-commerce). Business process engineering means the continuous
further development and optimization of processes. Proven processes are retained
and linked with improved or partially redesigned processes. The changes are not drastic; they take place gradually. On the one hand, this reduces the risk that any transformation brings with it; on the other hand, it improves acceptance.
A prerequisite for this evolutionary form of business process development is permanent monitoring. Only then can weaknesses be identified and the impact of the changes be displayed and analyzed. All these objectives require a precise definition of the business processes as well as consistent documentation, which can be achieved by adequate modeling.
The original method introduced by Hammer and Champy in 1993 for business
process reengineering (BPR) proposes a radical redesign of the existing process
landscape. For best results, all processes will be newly developed from scratch.
However, this means that in reality proven processes remain unconsidered. Because of the serious changes that a radical redesign generally causes, the approach did not gain acceptance in its “pure form” in operational practice. A redesign of distinct subareas of a company, on the other hand, is much more feasible and palatable.
Generally accepted nowadays is holistic business process management, which on the one hand takes the aspects of procedure modeling, object modeling, and organization modeling indicated above into consideration. On the other hand, it takes the business view (abstracting from processes) as well as the service view (implementing processes and their constituents) into account; this is
indicated in Fig. 2.4.
Business processes are the focal point when it comes to changes in an enterprise, be
it the implementation of new business models and strategies, be it in the realization
of information systems, or be it related to quality improvements—the discussion
always involves business processes. A logical consequence is the necessity for a
realistic and easy-to-understand portrayal of the business processes that qualify as a
basis for effective communication, but also for analysis and simulation. Based on current practical projects, we will show next how business process models can be used in important applications, as well as the benefits derived therefrom.
Business Process Reengineering (BPR)
While many books have been written on the topic of business process reengi-
neering, genuine accounts of implementation are rare. Why is this the case?
Because the nature of business process reengineering is a “fundamental rethinking
and redesigning of all business processes of an enterprise or corporate division.”
Its objective is “dramatic and sustained improvements in process performance in terms of quality, cost and time.” Enterprises quickly find
enthusiasm for this goal; however, they have trouble with fundamental rethinking
and, above all, with an implementation of completely redesigned processes. This is
particularly true when—as is typical in business process reengineering—a new
organizational structure is derived from the new process. In practice, business
the best results. Based on these considerations, holistic business process manage-
ment has long surpassed the topic of SOA in its significance. And it is undisputed
that a consideration of business processes now belongs at the center of each
strategy, organization and IT project. Business process models then act as a central
reference point for all technical specifications and build the bridge to the specific
implementation in the form of organizational and IT solutions.
Process-Oriented Introduction of Business Software
There are numerous reports of failed or at least severely delayed and costly software
projects. Many of these projects deal with the introduction of standard software,
although this at first appears easier than developing new custom software. Below we explain why business software projects turn out to be more difficult and derive an improved introduction process from this. It is often discovered only
during the course of the implementation of business software that the business
processes of the standard software at hand do not correspond to the current processes in the enterprise. The comprehensive functionality of standard software and the resulting complexity lead to a divergent understanding of the term “standard” with respect to these processes. Heterogeneous IT system landscapes are another
reason for problems in the introduction of standard software. Complex interface
solutions for the integration of different systems require additional efforts. Modern
cloud solutions offer much hope with respect to addressing complexity, agility, and
cost. Nevertheless, introduction projects will remain difficult and will continue to require substantial organizational change management efforts.
For business users, introducing new business software is always a challenge: first, entirely different skills are required than those needed in daily business; second, the project work is often done in addition to the daily standard tasks. Induction into new business software is difficult because of the complexity of the software, and the documentation is correspondingly voluminous. Furthermore, the true benefit of such solutions often becomes visible only through the interplay the software realizes across several areas of an enterprise. This overarching view of the solution remains hidden from many users at first. The same problem is often found in the system documentation, especially when it is geared purely toward functions rather than processes. The points listed lead to long project execution times and often to budget overruns. Furthermore, functions not covered by the standard software are often only identified during system testing.
Many of the problems described cannot be solved with traditional business
software adoption methods and therefore demand novel approaches, with BPM
being one such approach. When enterprise software is introduced, predefined business process models—often reference models—are the key to shorter project durations as well as to high quality in the implementation and results of the
project. Figure 2.5 shows an example of such a reference business process model
for the enterprise software Oracle Fusion ERP Cloud.1
1 Oracle Fusion ERP Cloud is a product of Oracle Corp., Redwood Shores, CA, USA.
Fig. 2.5 Model of a reference business process from the financial area
as shown in Fig. 2.6. Such an integrated enterprise model prevents the creation of
new “information islands” through GRC that would lead to inefficiencies and hence
would stand in the way of interesting optimization opportunities.
We mentioned in Chap. 1 that more and more applications and services that are available on the Web or on the Internet do not require a local installation anymore; instead, they nowadays reside “in the cloud.” Essentially this means that there are
computing and storage resources, databases, or other systems or resources that
(often exclusively) are accessible via the Web, typically via a browser, or via an app
that is installed on a mobile device. Typical examples include audio and video
streaming (e.g., via Spotify or Netflix), shared writing of documents via Google
Docs or Zoho Docs, or the various applications and sites that are commonly
involved when booking travel: airlines for flights, hotels for accommodation, rental
car or shuttle services for transportation. All of them can be researched on the Web,
can be accessed for time and price comparisons, and are available for booking,
paying, and finally evaluating the product or service, often without leaving a single
platform or portal. In this section we deviate from the topic of BPM, and take a look
at the infrastructure that is rapidly becoming ubiquitous.
From a technical perspective, the infrastructure behind such applications consists of compute and storage servers as well as the software running on them. We have no
idea where they are located, how they have been set up, or when maintenance is
needed. Lewis Cunningham of Oracle Corp. has characterized this by saying “cloud
computing is using the Internet to access someone else’s software running on
someone else’s hardware in someone else’s data center while paying only for what
you use.”2 In other words, cloud computing refers to the external provision of IT infrastructure as well as applications via the Internet, conveying to the user the illusion of unlimited, on-demand resources.
Nicholas Carr, in his 2008 book The Big Switch, compared this situation with the arrival of electricity; according to Carr (2008), we are at the point where the handling of electricity was roughly 120 years ago: In order to be able to
use electricity, you had to produce it yourself. Later people learned how to transport
electrical current over increasingly long distances. Today we are used to obtaining
electrical energy from an outlet in the nearest wall in the quantity and quality just
needed, without having to worry about the source, the path to the outlet or the
provider. Carr compares this to the development of how we utilize computing
resources: Until a few years ago, companies or individuals needed their own
machine(s) if they needed compute or storage capabilities. As time passed, thin
clients replaced workstations, and servers withdrew into the background. In the future, the location at which computing resources originate will no longer be visible at all, yet the resources will be readily available in the quantities needed and at a price proportional to usage.
Smartphone and tablet users have gotten used to reading and writing e-mails with a cloud-based service like Google Mail, to storing itineraries with a service like TripCase or Tripit, files with Dropbox, or images with Instagram. The advantages
are obvious: There is no more need to store data on a local device, yet the data is
accessible (almost) anywhere and anytime. Protecting the data against loss or
misuse is left to the provider, and most of the time the provider will charge only
when advanced services (e.g., of a certain service quality) are requested. It is
exactly these aspects that make cloud computing relevant for companies: Resources
like compute power or storage space can be obtained in adequate amounts, as can
software services, and all without the need for local installations or maintenance.
We next take a closer look at several cloud applications and their providers that are
particularly relevant to enterprises.
As an example of a software service that is often used by individuals as well as companies, we consider Gliffy (www.gliffy.com/), a drawing tool for designing diagrams of various kinds, SWOT analysis results, Web page layouts, network sketches and architectures, business process models, or technical drawings (see www.gliffy.com/examples/), all of which can be done in a browser. Gliffy can be used online or as a plugin to the team collaboration software Confluence, and all
drawings or diagrams are stored at the Gliffy site, so that users can collaborate on
2 it.toolbox.com/blogs/oracle-guide/cloud-computing-defined-28433
them online. Besides the fact that this versatile cloud service does not require local
installation, it is also beneficial that a user does not have to worry about new
versions or security patches, and that pricing comes in three versions: Free, Stan-
dard, and Business. These scale the price a user has to pay from zero to around US$
10 per user per month, which makes it attractive for people who need to draw quite
a bit. Users might, however, be unhappy with the fact that their work remains on
Gliffy servers, which brings up the question of how well these are protected against
unauthorized access (e.g., by the competition), espionage, or loss, for example in a
catastrophe. Questions of this type result in some enterprises being hesitant to move
their applications and data to the cloud.
Gliffy is representative of a comprehensive class of software that in recent years has established itself on the Internet as an alternative to traditional licensing. Private
as well as professional users have on-demand access to writing tools, spreadsheets,
presentation software, meeting planners, calendar tools, conferencing software,
project management, accounting, HR management, as well as many others. Of the
many examples out there today, we just mention Google Apps for Work (gsuite.
google.com/), ThinkFree Cloud Office (www.thinkfree.com/), Syncplicity (www.
syncplicity.com/), or Zoho (www.zoho.com/).
Our second example goes a step further and no longer just considers usage, but
also development of applications. Sites offering both development and immediate
deployment are called platform services. An example is Force.com (www.
salesforce.com/platform/products/force/), which is a service offered by Salesforce.
As they say themselves, “Force connects business users and IT with a full suite of
tools for building apps that automate business processes, faster than ever before.” In
other words, Force is a platform enabling both the development and provisioning or
usage of applications; thus, a programmer can focus on functionality, search
function, process support or reporting options when developing an app, without
even having to bother with provisioning, sales, or management of the necessary
infrastructure (programming environment, machines, etc.). Once developed, apps
can directly be deployed using the Force platform, which they call “App Cloud.”
The developer no longer needs to consider scaling for increasing or decreasing
traffic, and the App Cloud even performs automatic backups. Besides Force, the
Salesforce site offers a variety of additional tools supporting the development of
mobile apps or appealing user interfaces and even connects to a marketplace for
apps (see www.salesforce.com/platform/products/). Alternatives to Salesforce and
Force are provided by the Google App Engine (cloud.google.com/appengine),
Heroku (www.heroku.com/), Microsoft Azure (azure.microsoft.com), or the Oracle
Cloud (www.oracle.com/cloud/paas.html).
Our third example comprises an even more elementary category of services,
namely the provisioning of pure infrastructure services or basic resources such as
compute power or storage space. Such resources can be obtained, for example, from
Oracle Infrastructure as a Service (www.oracle.com/cloud/iaas.html) or from
Amazon Web Services (AWS, aws.amazon.com/), the same platform that provides
the Amazon Mechanical Turk services we mentioned in Chap. 1. AWS started as a small side business in 2006 and meanwhile also comprises a number of platform services.
With services like the ones just discussed, it is not difficult to imagine that the entire
IT functionality of an enterprise can be moved to the cloud; all that is needed within
the enterprise are access points (“thin clients”) to the various cloud-based infras-
tructure, development or runtime environments, or application services. Such a
transition can obviously be done in several steps and can have several forms, which
are summarized in Fig. 2.7.
From left to right, this figure shows an increasing level of outsourcing, which
ranges from none to a full cloud-based operation, where the latter may even include
3 techcrunch.com/2016/07/02/andy-jassys-brief-history-of-the-genesis-of-aws/
• The fact that cloud usage often exhibits a pay-per-use cost model suggests parallelizing compute tasks as much as possible, so that they can be performed in a shorter time at (almost) the same price. The reason is simple: Suppose you need 100 CPU hours of compute time for a task; then you can either employ 100 virtual CPUs for one hour each, or one CPU for 100 h (intermediate solutions are possible as well). The former will almost always be the preferred alternative.
• As many Web sites generate data in the GB or TB range on a daily basis, e.g., by logging every click a user performs, and as many users produce large amounts of data themselves, e.g., by pressing Like buttons, evaluating products, or sending short text messages, site operators are increasingly interested in analyzing this data in order to better understand their customers, to send them special offers, or to improve their products based on customer feedback. Cloud services supporting such business intelligence (BI) applications are becoming increasingly popular (see our discussion of AWS above).
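The pay-per-use arithmetic in the first bullet above can be written out explicitly; the hourly rate below is an assumed figure, not any provider's actual price:

```python
# Pay-per-use: cost depends only on total CPU-hours consumed, while
# wall-clock time shrinks with the number of CPUs used in parallel.

RATE_CENTS_PER_CPU_HOUR = 10  # assumed price of US$0.10 per CPU-hour

def cost_and_hours(total_cpu_hours, n_cpus):
    cost_cents = total_cpu_hours * RATE_CENTS_PER_CPU_HOUR
    wall_hours = total_cpu_hours / n_cpus
    return cost_cents, wall_hours

print(cost_and_hours(100, 1))    # (1000, 100.0): 100 h of waiting
print(cost_and_hours(100, 100))  # (1000, 1.0): same cost, one hour
```

Real providers add wrinkles (per-hour rounding, instance start-up overhead), but the basic incentive to parallelize remains.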
Clearly, not every type of application is “made” for the cloud. Think, for example, of systems that need to guarantee extremely low latency, such as electronic stock
exchanges or the emergency switch-off of a power plant, which cannot be depen-
dent on whether or not a computer network is up or down.
The aspects that need to be considered in connection with cloud computing, in particular when it comes to a decision on whether or not to move totally or in part to the cloud, are manifold.
Some of these aspects will be considered in what follows, where the view we
pursue will always be that of the (professional) customer. Before we delve further
into cloud aspects, we take a brief look at its precursors next.
Virtualization abstracts from the hardware actually present. This abstraction happens between the hardware and the software layer of a system, as indicated in Fig. 2.8, which shows two virtual machines mapped to the same hardware and encapsulated by individual containers. Note that virtual machines (VMs) can run distinct operating systems atop
the same hardware.
Virtualization typically simplifies the administration of a system, and it can help
increase system security; a crash of a virtual machine has no impact on other virtual
machines. Technically a virtual machine is nothing but a file. Virtualization is
implemented using a Virtual Machine Monitor or Hypervisor which takes care of
resource mapping and management.
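As a toy illustration of this resource mapping (the class, VM names, and memory sizes below are our own invention, not any real hypervisor's API), one can picture virtual machines being admitted only while the host still has memory to map:

```python
# Toy hypervisor: tracks how much host memory each VM is mapped to
# and refuses to start a VM that would overcommit the host.

class Hypervisor:
    def __init__(self, host_memory_mb):
        self.free_mb = host_memory_mb
        self.vms = {}                 # vm name -> mapped memory (MB)

    def start_vm(self, name, memory_mb):
        if memory_mb > self.free_mb:
            return False              # admission control: no overcommit
        self.free_mb -= memory_mb
        self.vms[name] = memory_mb
        return True

    def stop_vm(self, name):
        self.free_mb += self.vms.pop(name)

hv = Hypervisor(host_memory_mb=8192)
print(hv.start_vm("vm1", 4096))  # True
print(hv.start_vm("vm2", 4096))  # True
print(hv.start_vm("vm3", 1024))  # False: host memory exhausted
```

Real hypervisors are far more sophisticated (they schedule CPU time, may overcommit memory, and isolate faults), but the principle of mapping virtual resources onto physical ones is the same.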
We finally mention another precursor to cloud computing, which can be
observed during the past 25 years as a major paradigm shift in software develop-
ment, namely a departure from large and monolithic software applications to
light-weight services which ultimately can be composed and orchestrated into more
powerful services that finally carry entire application scenarios. Service orientation, especially in the form of service calls to an open application programming interface (API) that can be contacted over the Web as long as the correct input parameters are delivered, has not only become very popular, but is also exploited these days in numerous ways, in particular to give users an increased level of functionality from a single source. A benefit of the service approach to software
development has been that platform development, especially on the Web, has received a great deal of attention in recent years. It has also meant that services which a provider delivers behind the scenes of some well-defined interface can be enhanced, modified, and even permanently corrected and updated without the user noticing, and it has triggered the
development of the SOA (Service-Oriented Architecture) concept that was men-
tioned in the previous section.
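Such a well-defined interface can be sketched as follows. The endpoint, parameter names, and JSON shape are hypothetical; only the standard-library URL construction and JSON parsing are real:

```python
# Sketch of a Web API call: the caller supplies the correct input
# parameters, the provider is free to change everything behind the
# interface as long as the contract stays stable.

import json
from urllib.parse import urlencode

BASE_URL = "https://api.example.com/v1/orders"  # hypothetical endpoint

def build_request_url(customer_id, status):
    # the correct input parameters are all the caller has to supply
    return BASE_URL + "?" + urlencode({"customer": customer_id,
                                       "status": status})

def parse_response(body):
    # the JSON contract: {"orders": [{"id": ...}, ...]}
    return [order["id"] for order in json.loads(body)["orders"]]

url = build_request_url("c-42", "open")
sample = '{"orders": [{"id": 1}, {"id": 7}]}'
print(url)
print(parse_response(sample))  # [1, 7]
```

In a real deployment the body would come from an HTTP GET on the constructed URL; the point here is only that the interface, not the implementation behind it, is what the user depends on.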
With these defining properties, cloud computing promises to realize the vision that
computational power will ultimately be obtainable like electricity from a wall
outlet, an idea that has been termed utility computing and was first mentioned by John McCarthy in a talk at the MIT Centennial in 1961, where he said: “If
computers of the kind I have advocated become the computers of the future, then
computing may someday be organized as a public utility just as the telephone
system is a public utility… The computer utility could become the basis of a new
and important industry.” Today the idea of utility computing is primarily mani-
fested in the pricing models of cloud services, which often base payment on actual
usage.
4 nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-145.pdf
The defining properties also set cloud computing apart from its various pre-
cursors, in particular cluster computing as well as grid computing, since the latter
were not based on virtual resources, rapid elasticity, or on-demand self-service;
only the grid typically has means to measure service usage. Other than that, cloud services and cloud computing are the more flexible option, which has made them so attractive to the IT community.
Technically, cloud providers typically base their processing power on large
collections of commodity hardware, including conventional processors (“compute
nodes”) connected via Ethernet or inexpensive switches, which are arranged in
clusters and which are replicated within as well as across data centers. Replication
as a form of redundancy is the key to hardware reliability and fault-tolerant processing; in just the same way, data is protected against loss via replication.
Besides fault-tolerance and availability, distribution can enhance parallel processing
of the given data, in particular when computing tasks can be executed indepen-
dently on distinct subsets of the data. In such a case, data is often partitioned over
several clusters or even data centers.
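The partitioning and replication scheme just described can be sketched with a simple hash-based placement rule; the cluster count and replication factor below are invented figures, and real systems use far more elaborate schemes such as consistent hashing:

```python
# Hash partitioning with replication: each data item is assigned a
# primary cluster by hashing its key, and copied to the next
# cluster(s) so that a single failure loses nothing.

import hashlib

N_CLUSTERS = 4
REPLICATION_FACTOR = 2

def placement(key):
    h = int(hashlib.sha256(key.encode()).hexdigest(), 16)
    primary = h % N_CLUSTERS
    return [(primary + i) % N_CLUSTERS for i in range(REPLICATION_FACTOR)]

# every item lives on REPLICATION_FACTOR distinct clusters,
# and independent tasks can run on each partition in parallel
print(placement("click-log-2017-05-20"))
```

Because placement is deterministic, any node can compute where a given item lives without consulting a central directory, which is exactly what makes independent, parallel processing of the partitions possible.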
The number of cloud services available these days is huge, and as we have briefly
discussed with our introductory examples, cloud services can essentially be grouped into three categories, as shown in Fig. 2.9 (although more categories are sometimes used, e.g., Database-as-a-Service or Business Process-as-a-Service), which also
represent the NIST service models5 (see also Mell and Grance 2011; Sitaram and Manjunath 2012).
Thus, software in an SaaS model can be immediately employed by a user, yet the
provider is running the software and takes care of installation, administration,
maintenance, upgrades, or failure recovery. Under the PaaS model users can
develop and provision their own programs in the cloud; the infrastructure provider
may define certain general regulations regarding, for example, the programming
environment or the available libraries or interfaces. Under the IaaS model, which is
closest to the vision of utility computing, the cloud provider offers virtual hardware
or infrastructure services, including computing power, storage, or network
bandwidth.
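The division of responsibilities just described can be condensed into a small lookup table; the layer names below are our own shorthand rather than NIST terminology:

```python
# Who manages what under the three service models: the provider's
# responsibility grows from IaaS up to SaaS, the customer's shrinks.

LAYERS = ["application", "runtime/platform", "virtual infrastructure"]

MANAGED_BY_PROVIDER = {
    "SaaS": {"application", "runtime/platform", "virtual infrastructure"},
    "PaaS": {"runtime/platform", "virtual infrastructure"},
    "IaaS": {"virtual infrastructure"},
}

def customer_managed(model):
    return [l for l in LAYERS if l not in MANAGED_BY_PROVIDER[model]]

print(customer_managed("SaaS"))  # []
print(customer_managed("PaaS"))  # ['application']
print(customer_managed("IaaS"))  # ['application', 'runtime/platform']
```

The table makes the trade-off visible at a glance: SaaS users give up control in exchange for the provider handling everything, while IaaS users keep control of everything above the virtualized hardware.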
We mention that specific IaaS services are nowadays available for storage as
well as for networking: Software-defined storage (SDS) is the idea of making data
storage independent of the underlying hardware by introducing a software layer
atop that is policy-based and provides management capabilities. Software-defined
storage typically includes a form of storage virtualization to separate the storage
hardware from the software that manages it. The software enabling such an envi-
ronment may also provide policy management for features such as data dedupli-
cation, replication, thin provisioning, snapshots and backup. It is essentially a
similar concept to software-defined networking which we already mentioned in
5 nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-145.pdf
Chap. 1. Both techniques build upon virtualization and, together with virtualized servers, form the basis of a data center that is completely software-defined and capable of offering its services broadly, in an automated fashion. See www.snia.org/
sites/default/files/SNIA%20Software%20Defined%20Storage%20White%20Paper-
%20v1.0k-DRAFT.pdf for an attempt to provide a generally accepted definition of
SDS.
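The data deduplication feature mentioned above can be sketched as content-addressed storage: blocks are keyed by a hash of their content, so identical blocks occupy space only once. The class below is a toy, not any vendor's API:

```python
# Toy deduplicating block store: putting the same bytes twice
# stores them once and returns the same content-hash key.

import hashlib

class DedupStore:
    def __init__(self):
        self.blocks = {}   # content hash -> block bytes

    def put(self, data):
        key = hashlib.sha256(data).hexdigest()
        self.blocks.setdefault(key, data)   # duplicates are not re-stored
        return key

    def get(self, key):
        return self.blocks[key]

store = DedupStore()
a = store.put(b"invoice 2017")
b = store.put(b"invoice 2017")   # duplicate content
print(a == b, len(store.blocks))  # True 1
```

In a software-defined storage layer, a policy would decide per volume whether features like this (or replication, thin provisioning, snapshots) are switched on, independently of the hardware underneath.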
We also mention that the notion of a platform is used in different ways: What we
have just described is a purely technical view of a platform, where it provides
development tools and ways to directly provision the software that has been
developed using these tools. A typical example here is WSO2 (wso2.com/), where a
variety of tools are readily available for establishing a Service-Oriented Architec-
ture in the cloud. Another one is Salesforce’s Force platform. This is not to be
confused with a different type of platform, often mentioned in connection with some form of disruption, which brings together supply and demand as a “hotspot” of the digital economy. Examples include Apple’s iTunes and App Store or Airbnb, artifacts we will say more about in Chap. 5.
Besides the types of services that a cloud can provide, a distinction can be made
regarding the types of clouds that are employed. Again it is common to follow the NIST
document mentioned above, which calls them deployment models and defines the
following four, illustrated in Fig. 2.10.
[Fig. 2.10: Cloud deployment models. Individual companies operate private clouds,
several companies share a community cloud, the general public reaches public clouds
via the Internet, and a hybrid cloud combines private and public clouds.]
• Public cloud: The cloud infrastructure is provisioned for open use by the
general public. It may be owned, managed, and operated by a business, aca-
demic, or government organization, or some combination of them. It exists on
the premises of the cloud provider.
• Private cloud: The cloud infrastructure is provisioned for exclusive use by a
single organization comprising multiple consumers (e.g., business units). It may
be owned, managed, and operated by the organization, a third party, or some
combination of them, and it may exist on or off premises.
• Hybrid cloud: The cloud infrastructure is a composition of two or more distinct
cloud infrastructures (private, community, or public) that remain unique entities,
but are bound together by standardized or proprietary technology that enables data
and application portability (e.g., cloud bursting for load balancing between clouds).
• Community cloud: The cloud infrastructure is provisioned for exclusive use by
a specific community of consumers from organizations that have shared con-
cerns (e.g., mission, security requirements, policy, and compliance considera-
tions). It may be owned, managed, and operated by one or more of the
organizations in the community, a third party, or some combination of them, and
it may exist on or off premises.
We mentioned earlier that a major motivation for the high interest in the cloud in
recent years has been the fact that cloud services are often based on a pay-per-use
business model. In fact, there are several pricing schemes in effect which make the
cloud interesting for both private as well as professional users. The following
models are common:
1. Free: The service is essentially free to use, i.e., no immediate flow of money is
required, yet most often a user has to register in return for the service. This
scheme is typically applied when the service itself is sponsored through
advertising, and when the service provider is interested in collecting data about
its users, e.g., their e-mail addresses or, better yet, some kind of “user profiles”
that allow them to customize the placement of advertisements. Clearly,
this model can help to attract users, which in turn attracts suppliers of ads.
Examples include services like AroundMe, Yelp, or SeatGuru.
2. Pay-per-use: As mentioned, in this model there is no free usage, but no constant
payments either. A service user is only charged for the duration and/or amount
of service usage. Usage-based pricing matches the intuition that every single
unit of a commodity consumed adds to the total amount to be paid.
Examples include AWS or Microsoft Azure.
3. Pay-per-unit: In this model, a payment is made once per product purchased,
independent of its usage intensity. For example, Amazon or Apple may charge
for the purchase of a single title of music or a movie, but they do not care how
often it is actually played thereafter. The same concept applies to buying apps in
an app store or to streaming services like Amazon Prime Video.
4. Package pricing: This model offers a user a certain number of API calls to a
service for a fixed fee and is typically applied when the service primarily
provides data. Note that companies in this category often sell quantities that are
not actually used up by the customers; also note that, depending on the package
size, API calls potentially allow for arbitrage. Examples include
OpenCalais by Thomson Reuters or Yahoo!Boss.
5. Flat fee: This kind of tariff is one of the simplest pricing models, with minimal
transaction costs. It can be based on time or volume as the only parameter. This
pricing scheme is mainly used for software licenses and software
hosting and resembles the subscription model traditionally applied in the
newspaper business. On the one hand, a flat-fee tariff provides suppliers and
users with more certainty in planning future activities. On the other, especially
from the user’s perspective, a flat-fee tariff lacks flexibility. A supplier has to
bear in mind the specific market structure of his service and the users’ preferences;
to do so, suppliers could combine a flat-fee tariff with flexibility by offering
short-term contracts. Examples include the Reserved Instances for Amazon Web
Services such as EC2 or RDS.
80 2 Digital (Information) Technologies
6. Two-part tariff pricing: This is a combination of flat-fee and pay-per-use pricing,
where users pay a fixed basic fee and, on top of that, an additional fee per unit consumed.
Sometimes the fixed part covers the fixed costs of the provider, whereas the
variable fee generates the profit. Another example is pricing for software
licenses, where prices are often calculated by taking a base fee and adding a
surcharge depending on the number of users who will use the system. This
pricing scheme is commonly used by telephone companies.
7. Freemium: This is another combination where the idea is to let users join and
use basic services for free and charge them for (premium) services that provide
additional value to customers. The payment model for additional services can
take any of the forms described above. Examples are again AWS EC2, which
has a “free tier” for new customers, and OKCupid, a dating service
whose basic usage is free but which requires a paid registration for using
advanced services.
Various other combinations besides freemium or two-part pricing are in use, for
example, in a “pay-per-unit + pay-per-use” model, an initial buying price is
charged as well as usage-based fees thereafter. The AWS pricing for “Reserved
Instances” mentioned above is an example of this model. All models have pros and
cons, especially depending on whether you consider them from a user’s or a pro-
vider’s perspective.
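To make the arithmetic behind these schemes concrete, the following small Python sketch compares what a user would pay under three of them. All prices are invented for illustration and do not correspond to any actual provider's tariff.

```python
# Illustrative only: the prices below are invented, not actual provider tariffs.

def pay_per_use(units: float, price_per_unit: float) -> float:
    """Model 2: charge strictly for what is consumed."""
    return units * price_per_unit

def flat_fee(fee: float) -> float:
    """Model 5: one fixed fee, regardless of consumption."""
    return fee

def two_part_tariff(units: float, base_fee: float, price_per_unit: float) -> float:
    """Model 6: fixed basic fee plus a fee per unit consumed."""
    return base_fee + units * price_per_unit

# A light user (10 units) vs. a heavy user (1000 units), hypothetical prices:
for units in (10, 1000):
    print(units,
          pay_per_use(units, 0.05),            # 0.05 per unit
          flat_fee(30.0),                      # 30.00 flat
          two_part_tariff(units, 10.0, 0.03))  # 10.00 base + 0.03 per unit
```

With these made-up numbers the light user fares best under pay-per-use and the heavy user under the flat fee, which is exactly the trade-off that makes the choice of tariff depend on expected usage patterns.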
Moreover, cloud users benefit from technological progress, as the provider will generally make sure that the systems he
is running are up-to-date.
On the downside, there are also potential pitfalls to consider, and it is these
pitfalls that often result in hesitation to move to the cloud. For example, SMEs
whose core business or competency is not in the IT area often lack appropriately
trained personnel for dealing with service-oriented IT in general or the cloud in
particular. They also may have difficulties to do comparisons among the variety of
providers nowadays offering cloud services. Next is the issue of trust, or the
question whether the provider can be trusted with respect to data and applications
he is hosting for the company in question. Availability is an important aspect, as an
unexpected outage that makes services unavailable can easily endanger, sometimes
even ruin, a business that relies heavily on the cloud (see www.theregister.co.
uk/2015/09/20/aws_database_outage/ for an example). Finally, there might be the
problem of a lock-in effect; if a company decides to switch from one cloud service
provider to another, how easy or difficult will that be? A recent study by Dillon and
Vossen (2015) has shown that these pitfalls are perceived quite differently in var-
ious countries.
Independent of the benefits and pitfalls of the cloud, a recent survey by Morgan
Stanley, in which 100 CIOs (75 in the U.S. and 25 in Europe) were interviewed,
reveals that almost 30% of all applications were expected to be migrated to the
public cloud by the end of 2017, up from 14% in 2016. The survey also predicts that
Microsoft Azure will surpass Amazon AWS as an IaaS provider within the next
three years and will edge out AWS by 2019 for both IaaS and PaaS. For SaaS, an
increase in marketing applications is foreseen, while analytics and business
intelligence are expected to remain largely stable.
During the mid-1990s, companies started to notice that the then-new Web was
creating increasing amounts of data. Originally, the reason was a purely technological
one: a Web server, which composes pages from database input, style files, and other
sources into an HTML file that a browser can render, automatically records
everything it is doing. Similarly, a search engine records every query a user
issues (which was, for example, used by Google for a while to analyze the
“Zeitgeist” behind queries), and an e-commerce site even records every click. So
when users became more active on the Web, it was an obvious step to start conserving
every single click they made and every word they wrote. A famous quote
characterizing the demand to do something meaningful with all that data is “there’s gold
in your data, but you can’t see it.”6 This triggered the rise of data warehouse
technology for online analytical processing (OLAP) as well as the emergence of data
mining tools. This development has continued ever since, and with the ongoing
6 www.siggraph.org/education/materials/HyperVis/applicat/data_mining/data_mining.html
increase in digitization, there is no end in sight. So making the most of large data
collections is of high interest today (think of Amazon’s recommendations or
Facebook ads, which become more and more user-specific) and will be even more
crucial in the future. Indeed, the development has gone from pure analysis to
prediction and to prescription, or as Friedman (2016) puts it, “guessing is over.”
The core buzzword here, which emerged in the course of 2013 (and is effectively
gone by 2017), is “Big Data,” and in this section, which is partially based on
Vossen (2014), we will try to clarify what this means.
According to Wikipedia, big data “is a collection of datasets so large and complex
that it becomes difficult to process using on-hand database management tools.”
According to Bernard Marr at DataScienceCentral.com, “the basic idea behind the
phrase ‘Big Data’ is that everything we do is increasingly leaving a digital trace (or
data), which we (and others) can use and analyze. Big Data therefore refers to that
data being collected and our ability to make use of it.” In other perceptions, the
characterization of Big Data is done via properties that all start with a V:
• Volume: Big Data typically exceeds both an organization’s own data as well as
its storage or compute capacity for accurate and timely decision-making. In a
2013 report, Intel reported that in a single Internet minute 639,800 GB of global
IP data gets transferred over the Internet, which can be broken down into emails,
app downloads, e-commerce sales, music listening, video viewing, or social
network status updates; for an up-to-date impression of how much data is moved
on the Internet per second, the reader should refer back to Chap. 1 or take a look
at visual.ly/internet-real-time or www.webpagefx.com/internet-real-time/.
• Variety: Data nowadays comes in a variety of formats, ranging from highly
structured (e.g., data from a relational database) to semi-structured (e.g., XML
data) to unstructured (e.g., arbitrary texts like Twitter tweets or
Facebook posts).
• Velocity: Data is often produced at high speed and often comes as a data
stream instead of in a “discrete” format; streams are often so large (and fast) that
there is no way to store them and evaluate or analyze them offline. Instead, whatever
should be extracted from a stream needs to be extracted instantly.
• Veracity: Data may be dirty, falsified, unsafe or simply unreliable; it may
represent fake news (“alternative facts”) or may simply contain typing and other
grammatical errors.
• Value: Data is (hopefully) of high value to those who analyze it.
2.3 Technology for the Management of (Big) Data 83
We note that the first three of these V’s are attributed to analyst Doug Laney who
works for Gartner, and additional V’s can easily be identified.7 In essence, big data
refers to the situation that more and more aspects and artifacts of everyday life, be it
personal or professional, are available in digital form, e.g., personal or company
profiles, social-network and blog postings, buying histories, health records, to name
just a few. Increasingly more data is dynamically produced, especially on the
Internet and on the Web, and nowadays the tools and techniques are available
for evaluating and analyzing all that data in various combinations and for deriving
conclusions from it that can be converted into some kind of benefit. Numerous
companies foresee the enormous business opportunities that analytical scenarios
based on big data can have, and the impacts that it already has or at least soon will
have on advertising, commerce, and business intelligence (BI). BI tools have
meanwhile reached maturity, and besides stored data it is now possible to process,
or to incorporate into processing, data streams which cannot or need not be stored.
We generally consider “big” data as a consequence of the Web 2.0 developments
(see Chap. 1), but warn the reader right away that “big” is indeed relative to the
point in time that you look at it (what we consider big today would have been
unimaginable 10 years ago, and in ten years from now we will be amused by the
“tiny” amounts of data we handled in 2017).
As we mentioned earlier in this chapter for cloud computing, big data can again
be viewed from a technical, an economic, an organizational, and a legal perspective.
This section will primarily discuss the technological dimension and hence the
technology available for handling big data, in particular technology that has made it
to the center of attention recently. We note that big data processing often goes
beyond the pure employment of technology, but also needs a host of techniques
from statistics as well as from the field of visualization. The former area contributes
tools for clustering, classification, and statistical modeling (using, e.g., regression,
or neural networks), while the latter helps in data exploration using tools like
bubble charts, 3D scatter plots, or network and tree visualizations, to mention just a
few.
In order to cope with big data, a variety of techniques, methods, and technologies
has been developed over the years. For a long time, database systems have been the
technology of choice when it comes to storing, retrieving, querying, or generally
managing data. This is still the case today, since database technology has indeed
kept up with the fact that datasets to be processed fast have become larger and
larger over the years, that the types of data to be handled have changed, and that
considerable processing power is needed for complex computations to be per-
formed on this data. We take a brief look at the kind of database technology next
7 e.g., www.datasciencecentral.com/profiles/blogs/top-10-list-the-v-s-of-big-data
that has dominated the field since the early 1980s, and also look at data warehouses
as a core Business Intelligence (BI) technology for many years.
Databases and database systems were originally invented in the 1960s in an
attempt to organize ever growing data collections, and to do so in a way that could
guarantee certain properties. Among them was data independence, i.e., the independence
of data from the programs operating on it; later came declarative query
languages like SQL and the ACID contract for transaction processing, guaranteeing
atomicity, consistency, isolation, and durability for database transactions in a
multi-user environment. Database systems experienced a major breakthrough in the
early 1980s with the arrival of Codd’s relational data model, which suggested
organizing data in tables uniformly representing entities and their relationships. The
theory and also the practice of relational systems built on that model are
nowadays well understood; such systems have survived a number of potential competitors (e.g.,
object-oriented databases), and their query language SQL has reached popularity and
dissemination beyond pure databases.
In the mid-1990s it turned out that data collections especially in enterprises had
reached a volume which suggested that there should be other uses than just simple
CRUD operations (create, read, update, delete) or SQL querying and reporting. The
idea of data analysis was born, and it was soon suggested to perform analytical
applications on a separate “database” since it was expected that analysis could be
compute-intensive and should hence not impact the operational systems that pro-
duced and delivered the data and that supported daily business. This is what a data
warehouse is about; it enables online-analytical processing (OLAP) as well as data
mining on data collections that have been specifically prepared. Indeed, these
applications are typically run on data that has been through a staging area or through
an ETL process (short for extraction, transformation, and loading) during which
data is selected from available sources, cleaned, curated, transformed into a specific
schema form (star or snowflake schema), and finally loaded into the warehouse.
Enterprises would move transactional data to the warehouse at regular intervals.
A large warehouse is often partitioned into subject-oriented data marts, and these or
the warehouse itself are made available to a variety of evaluation tools (dashboards,
spreadsheets, reports, mining applications, etc.). This results in a classical archi-
tecture for a data warehouse as shown in Fig. 2.11.
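The ETL steps just described can be sketched in a few lines of Python; the source rows, the cleansing rule, and the target layout below are invented purely for illustration.

```python
# A toy ETL pipeline: extract -> transform (cleanse, normalize) -> load.
# The source records and the warehouse-style target are invented examples.

raw_sales = [                      # "extract": rows as delivered by a source system
    {"date": "2017-03-01", "product": " Widget ", "amount": "19.99"},
    {"date": "2017-03-01", "product": "widget",   "amount": "n/a"},   # dirty row
    {"date": "2017-03-02", "product": "Gadget",   "amount": "5.00"},
]

def transform(row):
    """Cleanse and normalize one row; return None if it cannot be repaired."""
    try:
        amount = float(row["amount"])
    except ValueError:
        return None                                     # drop unparseable amounts
    return {"date": row["date"],
            "product": row["product"].strip().lower(),  # normalize the dimension value
            "amount": amount}

# "load": only cleansed rows reach the warehouse-side table.
warehouse = [t for t in (transform(r) for r in raw_sales) if t]
print(warehouse)
```

In a real setting each stage would of course read from and write to external systems at regular intervals, but the extract/cleanse/load division is the same.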
While a data warehouse typically comes—just like a database management
system—as a software package that is locally installed, Big Data and cloud com-
puting have triggered an evolution in this field that has radically changed the
picture. In particular, Big Data has—as a consequence of the Web 2.0 develop-
ments we have described in Chap. 1—brought along numerous new data sources
outside an enterprise that carry interesting or relevant data, and cloud computing
has made a host of tools available that no longer require local installation and can be
used on demand. We will have more to say about this later. We note that certain
external data sources have always been available, for example Web statistics and
tracking as done by SimilarWeb (www.similarweb.com/) which has been around
since 2007, but many sites (including the likes of Facebook and Twitter) nowadays
[Fig. 2.11: Classical data warehouse architecture. Internal and external data sources
feed operational database systems; a transformation, integration, migration, cleansing,
and loading/updating (ETL) layer with metadata generation moves data into the
warehouse basis with its warehouse core and data marts; an OLAP server, metadata,
and archive systems sit on a common middleware platform.]
We next look at more recent requirements especially from big data scenarios and
discuss how these are being handled. Two important developments in this area are
novel types of database systems, including NoSQL systems and in-memory data-
base systems, and computing environments that are based on distributed file sys-
tems; we start with the latter.
Even though relational database systems were a huge success and have been in
wide use for many years, there have always been situations or applications for
which a database was not an optimal solution. For example, such a system might
require an initial investment and additional hardware which is beyond what an
enterprise can afford. The system “overhead” for providing high-level querying,
concurrency control, recovery, and data integrity may be inappropriate for a given
application. Or the application at hand might be simple enough not to require the full functionality of a database system.
[Figure: A distributed file system in the style of the Google File System. Files are
split into chunks that are stored redundantly across several chunk servers; a master
holds the chunk mappings for applications, and a shadow master stands by as a backup.]
Global-scale data management and globally distributed file systems have given
rise to a number of technical developments that go beyond the scope of this book.
However, we will return to one of them, relaxing the ACID requirements, later in
this chapter.
In order to cope with the enormous amounts of data that modern applications
must handle efficiently, today’s computing environments often rely on multiple
parallel compute clusters and distributed file systems. Cluster computing was
already mentioned earlier in this chapter as one of the precursors of cloud com-
puting, and modern clusters contain multiple machines that are connected within the
rack that is housing them; data centers then hold a large number of such racks
which are connected among each other through a network, and on top the data
centers of a provider are also connected, since, as we mentioned, data or files can
now be distributed globally. Since machines within a rack or entire racks may fail
(and typically will after a while), data processed by them is kept redundant, and
computations are broken down into various tasks that can individually be restarted
when necessary.
An approach to breaking a computation down into various tasks was developed
by Google when they were looking for a technique for indexing and searching large
data volumes and for ranking those indexed pages that qualify as search results. We
mentioned in Chap. 1 that computing the PageRank of a Web page, although
recursive by definition, can be accomplished by an iterative procedure which is
based on matrix-vector multiplication in very high dimensions. The good news is
that matrix-vector or matrix-matrix multiplication has long been known as a
problem for which efficient parallel algorithms exist. These require that both factors
of a multiplication task be suitably partitioned into pieces and that these pieces are
replicated over multiple compute devices, in order to be readily available for the
multiplication step that needs them.
While replication is generally a measure to enhance data availability, since if one
copy fails another might still be available, partitioning turns out to be the key to
tackling many large-data problems algorithmically. Partitioning essentially follows
the “old” principle of divide and conquer, which has a long tradition in computer
science and the design of algorithms. If data can be split into various independent
partitions, processing of that data can exploit parallelism, for example by keeping
multiple cores of a processor or multiple CPUs in a cluster busy at the same time.
The results obtained by these cores or CPUs may need to be combined in order to
form a final processing result. This is the basic idea of Google’s map-reduce (for
which Google obtained US Patent 7,650,331, granted in January 2010) which
employs higher-order functions (well-known from the functional programming
paradigm) for specifying distributed computations on massive amounts of data.
[Figure: The map-reduce data flow. Input data is partitioned into various chunks;
mappers process the chunks in parallel; intermediate results are shuffled and grouped
by key; reducers combine the groups into the final output.]
As an example, consider the multiplication of an n×n matrix M with entries m_ij by a
vector v of length n; the i-th component of the result vector x is given by

x_i = Σ_{j=1}^{n} m_ij · v_j

Now suppose that n is of the order of 10^12 and that vector v fits in main memory.
Then a map step processes a chunk of M: for each element m_ij, map produces the
key-value pair (i, m_ij · v_j), i.e., all terms of the sum for one x_i get the same key
i. Reduce will add all values that belong to the same key i and yield a pair (i, x_i), and
hence the components of the result vector.

[Fig. 2.14: A sample weather-station record:
00573321309999991990010103004--51317+028783FM-12+0171…-0128,
with the date and temperature fields embedded in the string.]
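The computation just described is easy to simulate in plain Python; the following is a single-process sketch, not a cluster implementation, and the tiny 2×2 matrix is an invented stand-in for the huge matrices discussed above.

```python
from collections import defaultdict

# Matrix-vector multiplication in map-reduce style (single-process sketch).
M = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 4.0}   # matrix as {(i, j): m_ij}
v = [10.0, 100.0]

# Map: every element m_ij yields the key-value pair (i, m_ij * v_j).
mapped = [(i, m_ij * v[j]) for (i, j), m_ij in M.items()]

# Shuffle: group all values by their key i.
groups = defaultdict(list)
for i, term in mapped:
    groups[i].append(term)

# Reduce: sum the terms per key, giving the components x_i of x = M v.
x = {i: sum(terms) for i, terms in groups.items()}
print(x)   # {0: 210.0, 1: 430.0}
```

On a real cluster, the mapped pairs would be produced on many machines in parallel and the shuffle would move them across the network, but the logic per key is exactly this.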
As another example, we consider the analysis of weather data coming as a long
string from weather stations; our interest is in an overview of the maximum tem-
peratures recorded during a year. Input data in this case might look like the sample
shown in Fig. 2.14. The weather station regularly sends long strings that have to be
interpreted appropriately; every string contains, among other information, the ID of
the station, the date of the measurement, longitude and latitude of the station’s
location, and the actual temperature.
Now suppose a chunk of such records is received, with temperature values
rounded to integers. Given this chunk of data, a mapper will extract year (as
key) and temperature (as value) as desired:

(1990, 0), (1990, 22), (1990, 11), (1989, 111), (1989, 78)
It should be obvious that a task like this, which will in reality be based on huge
amounts of weather-station data all of which can be processed independently, is a
perfect candidate for a map-reduce computation. Other such tasks include counting
the occurrences of words in a text collection (relevant to index creation and
maintenance for a search engine), or even operations from relational algebra. In
general, map-reduce is applicable to problems which are easy to parallelize, i.e.,
which can easily be partitioned into parts such that these parts are processed
independently and their results are then combined into an overall result.
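Sketched in the same single-process style as before, and using (year, temperature) pairs like those extracted above, the maximum-temperature task simply replaces summation by taking the maximum in the reduce step.

```python
from collections import defaultdict

# Map output as in the text: (year, temperature) pairs extracted from raw records.
pairs = [(1990, 0), (1990, 22), (1990, 11), (1989, 111), (1989, 78)]

# Shuffle: group the values by key (the year) ...
by_year = defaultdict(list)
for year, temp in pairs:
    by_year[year].append(temp)

# ... and reduce with max instead of sum: one maximum temperature per year.
max_temp = {year: max(temps) for year, temps in by_year.items()}
print(max_temp)   # {1990: 22, 1989: 111}
```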
Apache Hadoop and Spark
Clearly, several issues need to be addressed in order to make a map-reduce com-
putation work, including the question of how to decompose a given problem into
smaller chunks which can be processed in parallel, how to adequately assign tasks
to compute nodes (executing a mapper or a reducer), how to coordinate synchro-
nization between the different compute nodes involved in a computation, or how to
make such a scenario robust against failures. These questions have been answered
in recent years in various ways, the best known of which is the software library
Hadoop. Hadoop8 supports scalable computations across distributed clusters of
machines. Its core components are the MapReduce Engine, the Hadoop Distributed
File System (HDFS), and the YARN (Yet Another Resource Negotiator) configurable
resource manager for clusters. The MapReduce Engine is responsible for execution and
control of map-reduce jobs; HDFS is a distributed file system in which large
datasets can be stored, read, and output. User data is divided into blocks, which get
replicated across the local disks of cluster nodes. HDFS is based on a master-slave
architecture, where a namenode as master maintains the file namespace including
the file-to-block mapping and the location of blocks, and datanodes as slaves
manage the actual blocks. Besides these main components there are numerous
extensions of Hadoop by specific functionality such as data storage, processing,
access, and data management in general, which together are considered the Hadoop
“ecosystem,” a snapshot of which can be found at savvycomsoftware.com/what-
you-need-to-know-about-hadoop-and-its-ecosystem/.
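The namenode/datanode bookkeeping can be illustrated with a toy model; this is not the actual HDFS API, and the block size, replication factor, and node names are invented for the sketch.

```python
from itertools import cycle

# Toy model of HDFS bookkeeping: the "namenode" maps blocks to datanodes,
# while the "datanodes" hold the actual block contents. All sizes are invented.
BLOCK_SIZE, REPLICATION = 8, 2
datanodes = {"dn1": {}, "dn2": {}, "dn3": {}}
namespace = {}          # namenode state: filename -> [(block_id, [datanode, ...])]

def put(filename, data):
    """Split data into blocks and replicate each block across datanodes."""
    targets = cycle(datanodes)
    blocks = []
    for n, off in enumerate(range(0, len(data), BLOCK_SIZE)):
        block_id = f"{filename}#{n}"
        replicas = [next(targets) for _ in range(REPLICATION)]
        for dn in replicas:
            datanodes[dn][block_id] = data[off:off + BLOCK_SIZE]
        blocks.append((block_id, replicas))
    namespace[filename] = blocks

def get(filename):
    """Read a file back by asking one replica for each of its blocks."""
    return "".join(datanodes[replicas[0]][bid]
                   for bid, replicas in namespace[filename])

put("log.txt", "0123456789abcdef0123")
print(namespace["log.txt"])   # three blocks, each replicated on two datanodes
print(get("log.txt"))
```

The point of the sketch is the division of labor: the namenode never touches block contents, it only knows where every block lives, which is what allows reads to survive the failure of a single datanode.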
A major Hadoop “competitor” nowadays is Spark, another Apache project, which is
an improvement over Hadoop’s original MapReduce component.
Spark is a fast cluster-computing system developed through contributions
of almost 250 developers from 50 companies in UC Berkeley’s AMP Lab. Spark
follows an execution model supporting in-memory computing and optimization of
arbitrary operator graphs, so that querying data becomes much faster when
compared to disk-based engines. While Hadoop can only be used in batch mode,
Spark is interactive. Spark powers a stack of libraries including SQL and
DataFrames, MLlib for machine learning, GraphX, and Spark Streaming. You can
combine these libraries seamlessly in the same application. For details, we refer the
reader to spark.apache.org/. Another recent development in this category of big data
processing tools is Flink (see flink.apache.org/), which emerged from the German
Stratosphere research project.
8 hadoop.apache.org/
failures are frequent. The good news is that not every application running in
the cloud permanently needs full consistency, so consistency can often be relaxed into
what is known as eventual consistency: When no updates occur for a long period of
time, eventually all updates will propagate through the system and all the nodes will
be consistent; for a given accepted update and a given node, eventually either the
update reaches the node or the node is removed from service. Eventual consistency
is used, for example, by Amazon within several of their data storage products. An
observation first made by Eric Brewer and later proved by Gilbert and Lynch (2002)
is that of consistency, availability, and partition tolerance, only two properties can
be achieved simultaneously; this result has become known as the “CAP Theorem.”
In other words, if an application wants to be immune against partition failures and
needs to guarantee high availability, it has to compromise consistency. Conversely,
an application that needs consistency along with high availability cannot expect to
be partition-tolerant and hence has to take measures for handling partition failures.
The NoSQL systems mentioned above have reacted and responded to the CAP
Theorem in various ways, most often by allowing for relaxed notions of consis-
tency, yet more recent developments (such as Google’s Spanner and F1) claim to be
able to go back to strict forms of consistency.
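A minimal sketch of eventual consistency, assuming a simple last-writer-wins rule; the timestamps and the propagation loop are invented for illustration and do not model any particular product.

```python
# Eventual consistency in miniature: replicas apply updates asynchronously,
# using a (timestamp, value) last-writer-wins rule. Purely illustrative.

class Replica:
    def __init__(self):
        self.state = (0, None)          # (timestamp, value)

    def apply(self, update):
        if update[0] > self.state[0]:   # keep only the newest write
            self.state = update

replicas = [Replica() for _ in range(3)]
update = (1, "x=42")

replicas[0].apply(update)                        # the write lands on one replica first...
assert replicas[1].state != replicas[0].state    # ...so reads may briefly diverge

for r in replicas[1:]:                  # propagation, once the network allows it
    r.apply(update)

# Once no new updates occur, all replicas converge to the same state.
assert all(r.state == (1, "x=42") for r in replicas)
print("converged:", replicas[0].state)
```

The brief window in which the two asserts bracket the propagation loop is exactly the inconsistency that a CAP-style trade-off accepts in exchange for availability and partition tolerance.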
In-memory (or main-memory) database systems, currently enjoying great popularity,
represent the revival of an old idea that was first studied in the late 1980s and that
has finally become available in commercial products thanks to considerable tech-
nological advances during the last 30 years. In-memory database systems are pri-
marily characterized by enormous gains in processing speed, since the classical
bottleneck between (relatively slow) secondary memory and (very fast) primary
memory is essentially eliminated. Large quantities of data can now be kept in main
memory, so there is no longer a need to swap data in and out of a limited main
memory. Hence read access is much faster, since no I/O access to a
hard drive is required. In terms of write access, mechanisms are available which
provide data persistence and thus secure transactions. In-memory databases have
proven to be suitable for particular use cases, and big data analytics is one of them.
This use case goes way beyond traditional simple read and write operations, and
typically requires a database to provide functionality not commonly attributed to
database systems in the past. For example, SAP’s HANA system is capable of
conducting transactions and analytics in parallel and on the same database, so that a
division into operational database system and data warehouse system is no longer
necessary.
The general expectation today is that all these huge amounts of data we have at our
disposal help overcome guessing, speculating, and imprecise forecasting. The use
cases that we will discuss in Chap. 3 all apply some form of analytics to the data
collection at hand, be it sports, healthcare, entertainment, or any other area. Above,
we have given an idea, in the map-reduce examples, of how large amounts of data can
be broken into pieces and then processed individually. But beyond pure storing
or streaming as well as processing, big data applications have an interest in
exploiting the data, i.e., in deriving knowledge or meaning from the data that can
ultimately be turned into benefit or even profit.
Analytics for big data goes considerably beyond what has been done in the
traditional data warehouse. One reason is that the interest now is to take into
account data that cannot be found within a company, but rather on the Web, e.g., in
public blogs or on social networks. Another reason is the size of big data, which
often goes beyond the capabilities and capacities of an enterprise. This is where the
new technologies discussed in the previous subsection come in, and they do so in
various forms. One way is to augment a data warehouse by, say, a Hadoop envi-
ronment; another is to kiss the warehouse goodbye and implement a radically
different and new architecture, primarily based on the Hadoop stack.
The latter approach is often preferred when the application requires not just
information technology, but also statistical computing or computational
statistics, which lies at the border of statistics and computer science. It is concerned
with the development of statistical programming languages such as R and with the
design and implementation of statistical algorithms (such as those available in
packages like SPSS, SAS, or Stata).
As mentioned, big data analytics also refers to a large extent to data mining
techniques, i.e., the process of discovering patterns (such as association rules or clusters)
in large datasets; since the 1990s, data mining has become popular as a collection
of techniques for unleashing previously unknown knowledge from raw data. A typical
data mining application is market basket analysis, which investigates, among other
questions, which items a supermarket customer typically buys together, i.e., in a
single transaction or single trip to the supermarket. The goal is to determine
“reliable” association rules which allow the supermarket to run special promotions
or item placements to improve sales. A well-known method here is the Apriori
algorithm, originally developed by Agrawal et al. (1993), which will be discussed
in Chap. 4.
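The core of the Apriori idea can be sketched in a few lines: an itemset can only be frequent if all of its subsets are frequent, which lets the algorithm prune candidates level by level. The following Python sketch is a simplified toy illustration of that principle (the basket data and all names are ours, not from Agrawal et al.); production implementations are considerably more efficient.

```python
from itertools import combinations

def apriori(transactions, min_support):
    # Level-wise frequent-itemset mining: start with single items.
    items = {i for t in transactions for i in t}
    frequent, k = {}, 1
    current = [frozenset([i]) for i in sorted(items)]
    while current:
        # Count the support of each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        survivors = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(survivors)
        # Join surviving k-itemsets into (k+1)-candidates; prune any
        # candidate that has an infrequent k-subset (the Apriori property).
        current = []
        for a, b in combinations(list(survivors), 2):
            union = a | b
            if len(union) == k + 1 and all(
                frozenset(s) in survivors for s in combinations(union, k)
            ) and union not in current:
                current.append(union)
        k += 1
    return frequent

baskets = [frozenset(t) for t in
           [{"bread", "butter"}, {"bread", "butter", "milk"},
            {"bread", "milk"}, {"butter", "milk"}]]
freq = apriori(baskets, min_support=2)
# e.g. {"bread", "butter"} occurs in 2 baskets and is reported as frequent,
# while {"bread", "butter", "milk"} occurs only once and is pruned.
```

From the frequent itemsets, association rules such as "customers who buy bread also buy butter" can then be derived by comparing supports.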
Of particular interest in the analysis of big data are algorithms that in a sense can
adapt themselves to the fact that, over time, more and more data becomes available,
allowing for even more precise results than at the time when less data was available.
A typical example is a recommender system (see Chap. 3) which, for instance,
recommends movies to viewers. Based on searches or past viewing history, the
recommender will try to establish a “profile” for a viewer, possibly compare this
with the profiles of other viewers, and then recommend movies the viewer in
question has not yet seen, but which fit his or her profile. Clearly, recommendations
get better and better the more refined the profile is. However, the movie provider
will not be interested in rewriting the recommendation algorithms from time to
time, whenever a significant amount of new data has become available. Instead, it
would be preferable if the algorithm improves itself, or “learns.” This is the basic
idea behind the wide area of machine learning (ML), which is the science of
enabling computers to act without being explicitly programmed. ML has gained
high attention especially in connection with big data, and has impacted and is
impacting such diverse areas as self-driving cars, speech recognition, Web search,
or genome research.
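As a minimal illustration of this “self-improving” idea (our own sketch, not a method described in the book), consider a predictor that refines its estimate incrementally with every new observation, rather than being rewritten or re-run from scratch once new data has accumulated. Real ML models are far more elaborate, but the principle of updating in place is the same.

```python
class RunningRating:
    """Keeps an incrementally updated average rating per movie."""

    def __init__(self):
        self.count = {}
        self.mean = {}

    def observe(self, movie, rating):
        # Incremental mean update: no need to store or re-scan old data.
        n = self.count.get(movie, 0) + 1
        m = self.mean.get(movie, 0.0)
        self.count[movie] = n
        self.mean[movie] = m + (rating - m) / n

    def predict(self, movie):
        # Returns None for movies never observed.
        return self.mean.get(movie)

model = RunningRating()
for rating in (4.0, 5.0, 3.0):
    model.observe("Movie A", rating)
# model.predict("Movie A") == 4.0, and the estimate keeps improving
# automatically as further ratings arrive.
```

The incremental update m + (rating - m) / n is algebraically identical to recomputing the mean over all observations, which is exactly the property that lets the model “learn” as data grows.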
A highly acclaimed ML example is IBM Watson, “a technology platform that
uses natural language processing and machine learning to reveal insights from large
amounts of unstructured data” (www.ibm.com/watson/). IBM Watson became
widely known in 2011 through its win against two of the greatest champions of the
American quiz show Jeopardy!, essentially by evaluating huge numbers of possible
answers in parallel and at extremely high speed. Since then, Watson technology
has been exploited in other areas, one of which is healthcare, as we noted earlier one
of the prominent use cases for big data. Watson Analytics is now available for
general use and can hence be integrated into a variety of other fields as well, and a
number of interesting discoveries can certainly be expected (see www.ibm.com/
analytics/watson-analytics/). We will come back to Watson in Chap. 6 when
discussing the Watson IoT platform.
“I think there is a world market for maybe five computers.” This market forecast by
IBM chairman Thomas Watson in 1943 proved to be just as wrong as the
assessment by DEC founder Ken Olson in 1977: “There is no reason anyone would
want a computer in their home.” It is precisely digitization, which allows
information and communication technologies to reach all areas of our lives, that
shows how absurd these assessments were, granted, from today’s perspective. Not
even these legendary IT experts of their time could begin to imagine the
performance explosions possible in IT in such a short time. They were even less capable
of imagining the dramatic hardware price decline that accompanied these
performance explosions. Powerful information and communication technologies have
now become affordable to both business and private users.
Perhaps the IT veterans Watson and Olson would have been right with their predictions
had the market continued to focus on special-purpose systems developed from
perfectly matched components. Instead, market leaders such as IBM, Intel, and
Microsoft emerged and created industry standards with their products within a
short time. This standardization allowed for economies of scale, which led to a
drop in prices for the components. Moreover, driven by software innovations, the
shift of system functionality away from hardware to software enabled more
flexibility in the configuration and subsequent use of the systems. The result was that
general-purpose systems could be used in more and more fields of application,
creating additional economies of scale and thus a further drop in prices. In fact,
the price paid for the additional flexibility, namely reduced performance and higher
integration complexity, was more than offset by the rapidly improving
price-performance ratio of the systems.
[Figure: layered stack of an in-memory analytics appliance: vertical-specific prepackaged in-memory analytic applications (Hyperion EPM; analytic applications for ERP, SCM, CX, HCM) running on operating system, virtualization (Oracle VM), networking, compute, and internal storage.]
2.4.2 Appliances
[Figure: layered stack of a database appliance, combining capacity-on-demand software licensing and single-vendor support: database software, systems management (Oracle Appliance Manager), operating system, virtualization (Oracle VM), networking, compute, and storage.]
9 PROMATIS® BPM Appliance™ is a product of PROMATIS software GmbH, Ettlingen, Germany.
[Figure: stack of the PROMATIS BPM Appliance: BPM software (Oracle BPM Suite), Horus Business Modeler, systems management, infrastructure software, professional services, virtualization (Oracle VM), and optional hardware.]
process modeling tool. Various other tools have also been developed, in an effort to
establish a kind of “regulation framework” for the modeling task at hand, but with
less rigor; an example is the Signavio Process Editor (see www.signavio.com/
products/process-editor/).
Boutros and Purdie (2014) deal with business process reengineering and process
improvement. Erl (2005, 2009) are introductions to the topic of Service-Oriented
Architecture (SOA). The importance of BPM is also discussed by vom Brocke and
Schmiedel (2015). Vom Brocke and Rosemann (2015a, b) are comprehensive
introductions to all touchpoints that BPM can have in an enterprise environment.
Introductions to cloud computing are provided by Ruparelia (2016), Erl et al.
(2013), or Kavis (2014). Haselmann et al. (2011, 2015) discuss specific ways to
organize and administer a cloud that is particularly appealing to SMEs. Venters and
Whitley (2012) compare the desires and the reality of present-day cloud computing
and develop a framework for evaluating cloud service providers. Juels and Oprea
(2013) compare approaches to security and availability for cloud data. Portnoy
(2016) is an introduction to virtualization; Tanenbaum and van Steen (2007) give a
detailed account of distributed systems.
The reader interested in the foundations of relational databases should go back to
Ted Codd’s (1970) original paper or read his Turing Award Lecture in Codd
(1982). Elmasri and Navathe (2016) provide a comprehensive introduction to all
aspects of database systems. Han et al. (2012) is a comprehensive introduction to
data mining, which also discusses data warehouses. Modern distributed file systems
are described by Leskovec et al. (2014); a classical introduction to distributed
(database) systems is Özsu and Valduriez (2011).
Various introductions to the area of Big Data have been published recently,
including by Mayer-Schönberger and Cukier (2013) or Marr (2016). Google’s
Map-Reduce was first described by Dean and Ghemawat (2008). Examples for an
application of the Map-Reduce paradigm can be found in Leskovec et al. (2014).
The reader interested in approaches to matrix multiplication on parallel hardware
should consult Dekel et al. (1981) and Akl (1989). Regarding modern hardware
solutions for processing big data, we refer the reader to Saecker and Markl (2013).
An introduction to Hadoop is given, for example, by White (2015). Map-reduce, its
Hadoop implementation, Spark, Flink, and many other components of the Hadoop
ecosystem have not only spawned a host of development in recent years, which has
resulted in a variety of commercial offerings (e.g., MapR, Hortonworks, or
Cloudera), but also a lot of research, presented, for example, in the annual
International Workshop on MapReduce and its Applications, started in 2010. For
NoSQL systems we refer the reader to Redmond and Wilson (2012) or Elmasri and
Navathe (2016). A representative of the NewSQL category, which combines
classical database properties with those of NoSQL systems, is Google’s Spanner,
see Corbett et al. (2012). For details of in-memory database systems we mention
Loos et al. (2011) or Plattner and Zeier (2015), who also discuss SAP HANA.
In this chapter we look at the implications for consumers in light of the
developments (technological and otherwise) that we have presented in the previous two
chapters. We start with electronic commerce and look at the various approaches that
businesses, both brick-and-mortar and mobile, now have at their disposal for getting
their message to the customer, including advertising, social media marketing, and
recommendation. We will also show how big data analytics can open up new
possibilities. Finally, the emerging area of e-government will be touched upon. As
will be seen, we here take a customer’s perspective and argue that not all of the
above is beneficial: there are pros and cons to the various ways customers are
nowadays addressed or approached, and there are also many things that they not
only can, but indeed have to do themselves.
[Fig. 3.1: The customer journey as a cycle through the phases need or awareness, presales, sales, and after-sales, with touchpoints including online/offline ads, email, newsletters, social networks, recommendations, forums, FAQs, and promotions (awareness); reviews, blogs, media, and direct contact (presales); and store, e-commerce, and price comparison (sales).]
claim. Many of these customer service situations can be dealt with by providing
detailed information and answers to questions (in FAQs) electronically, and by
providing appropriate customer services. The general perception and goal today
is that of providing a smooth customer journey (or customer experience, CX, see
Chap. 4) that starts with producing awareness for a product or service and
continues through the phases of consideration, purchase, retention, and even
advocacy as shown in Fig. 3.1. This journey typically alternates between a
variety of physical and digital touchpoints and is nowadays supported by
comprehensive analytics of all the data traces a customer leaves behind.
• Recommendation component: Closely related to the previous point (and in
particular the awareness and consideration phases) is that of recommendation,
which has become popular in e-commerce, as people can now refer to other
people for obtaining advice on a product or service. As will be seen in
Sect. 3.6, an e-commerce site is typically interested in establishing a “profile”
for every customer containing, for example, all the products the customer has
ever bought or at least looked at. It gets interesting when the site is able to find
other customers whose profile is, in a sense that needs to be made precise,
“similar”, so that the products the first customer has not yet acquired can be
recommended to her or him. Known as collaborative filtering, this scheme applies
to goods, movies, music, and even people on a dating site.
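A toy sketch of this collaborative-filtering scheme may make the profile comparison concrete. The sketch below is purely illustrative (the customer names, items, and the choice of Jaccard similarity are our own assumptions; real e-commerce recommenders are far more elaborate): each customer is represented as the set of items bought, the most similar other customer is found, and whatever that neighbour bought but our customer has not is recommended.

```python
def jaccard(a, b):
    # Similarity of two purchase profiles: |intersection| / |union|.
    return len(a & b) / len(a | b)

def recommend(profiles, customer):
    mine = profiles[customer]
    # Find the other customer with the most similar profile...
    best = max((c for c in profiles if c != customer),
               key=lambda c: jaccard(mine, profiles[c]))
    # ...and recommend what they bought that our customer has not.
    return sorted(profiles[best] - mine)

profiles = {
    "alice": {"book", "dvd", "cd"},
    "bob":   {"book", "dvd", "game"},
    "carol": {"tv"},
}
# recommend(profiles, "alice") -> ["game"]
```

The "sense that needs to be made precise" from the text is exactly the choice of similarity measure; Jaccard similarity over purchase sets is one of the simplest options, with cosine similarity over rating vectors being a common alternative.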
[Figure: components of an e-commerce platform: content management, social media connection, customer account management, product catalog, shopping cart and checkout, store locator, search, payment processing, inventory management, fulfilment, customer service, reporting, and fraud detection, with external applications shown in green.]
The customer (although not explicitly shown) and customer service play a central role, as do social media.
Clearly, while these components are well-understood these days, it has taken
more than 20 years of development and (gathering of) experience to develop this
understanding. In the beginning, i.e., in the mid- to late-1990s, user acceptance of
e-commerce was low, due to limitations in Internet access, the limited number of
companies doing e-business at all, a general lack of trust, and missing
customer support. For further details on the issues of the early days and the
development until today, the interested reader should consult, for example, the
consecutive reports by Mary Meeker of Silicon Valley venture capital company
KPCB.1 Indeed, the initial perception was that a traditional retail shop will know
returning customers after a short while, whereas an e-commerce site, without fur-
ther measures, cannot distinguish an HTTP request by a customer today from one
by the same customer tomorrow. This situation has meanwhile changed consider-
ably, and it is one of many situations where big data has become prominent:
E-businesses today typically know their customers well, can classify them
according to a variety of criteria, and are often even able to predict what the next
transaction or the next step during the respective customer journey will be.
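The standard measure that closed this gap is the session cookie: the shop issues a random identifier on the first visit and recognizes it on every later request. The following Python sketch is a hypothetical illustration of the mechanism (the `visitor_id` name and the in-memory session store are our own assumptions; real shops add expiry, signing, and persistent storage).

```python
from http import cookies
import uuid

def handle_request(cookie_header, sessions):
    # Parse whatever Cookie header the browser sent (may be absent).
    jar = cookies.SimpleCookie(cookie_header or "")
    if "visitor_id" in jar and jar["visitor_id"].value in sessions:
        vid = jar["visitor_id"].value
        sessions[vid] += 1            # known, returning visitor
        return vid, None
    # First visit: mint a new identifier and send it back as a cookie.
    vid = uuid.uuid4().hex
    sessions[vid] = 1
    response = cookies.SimpleCookie()
    response["visitor_id"] = vid
    return vid, response.output()     # e.g. "Set-Cookie: visitor_id=..."

sessions = {}
vid, set_cookie = handle_request(None, sessions)             # new visitor
vid2, again = handle_request(f"visitor_id={vid}", sessions)  # same visitor
# vid2 == vid: the shop can now tell today's request from tomorrow's
# apart only because the browser replays the identifier.
```

Today this basic mechanism is the foundation on which the far richer customer profiles described above are built.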
It took e-commerce companies some time to recognize that doing their business
electronically involves much more than setting up a Web site that comprises the
components listed above. Indeed, it requires a considerable amount of process
reengineering, as, for example, the Web shop front and the back office must be
connected in ways that did not exist before; this was frequently overlooked in the
early days of electronic commerce and was a typical cause of failure. Among the
fastest to realize this and to react appropriately have been banks and financial
1 See www.kpcb.com/blog/2016-internet-trends-report for the latest edition.
institutions, since their business has moved to the Web quite extensively; electronic
banking, stock trade as well as online fund and portfolio management are in wide
use today. A general theme (and challenge) in this context is the broad digitization
of business, a topic that we will discuss in Chap. 5.
The move from stationary commerce to electronic and later to mobile commerce
has also triggered the development of new branches of the software industry, which
not only provide shop front software, but also systems for click stream analysis,
payment encryption (including technologies such as blockchain and parallel
currencies such as Bitcoin), data mining, and customer relationship management
(CRM). As we mentioned in Chap. 2, data mining has become popular for
analyzing the vast amounts of data that are aggregated by a typical e-commerce
installation; prominent data mining applications include association rule
determination, clustering, and classification, some of which will be discussed in Chap. 4.
Click streams that have been created by users are also subject to intensive data
mining, and CRM comprises a set of tools and techniques for exploiting data
mining results in order to attract new customers or to retain existing ones.
An e-commerce platform typically serves both the buyer and the seller side,
and sometimes even an intermediary. On the buyer side, there are a number of
suppliers from which the company obtains its raw materials or supplies, often
through some form of procurement process, and, thanks to the flattening we have
mentioned in Chap. 1, this process can be executed world-wide. Internally, there is
a supply-chain management (SCM) system at work that interacts with an enterprise
resource planning (ERP) system in various ways. On the seller side, there may be
several channels through which the company sells its goods or services, also
world-wide, including individual consumers, businesses, and partners (see also
Fig. 3.1). For their various customers, some form of CRM system will be in place
in order to take care of after-sales activities, customer contacts, complaints,
warranty claims, help desk inquiries, etc.
Electronic commerce has developed into various types that all have their specific
properties and requirements:
2 See www.amazon.com/Amazon-Prime-Air/b?ie=UTF8&node=8037720011 for a December 2016 experiment in this direction.
they like it, whether or not they would buy it again, and maybe how the seller
performed or how easy the overall experience has been. Once the importance of
other customers’ opinions, evaluations, and recommendations had been recognized,
many Web shop providers started to install facilities for commenting on a regular
and intensive basis. Amazon was among the first to take this even a step further and
go from pure reviews, which can be commented on by others, to a comprehensive
recommendation system (“Customers who bought this item also bought …”),
whose goal is, as an Amazon employee once put it, “to make people buy stuff they
did not know they wanted.” Integration of social media established such retailers as
proponents of what is now known as social commerce.
The other aspect the owner of an e-commerce business will be interested in, the
attraction of traffic as already mentioned above in the context of portals, is closely
related to classical disciplines from business administration: advertising and mar-
keting. Traditionally, advertising is the art of drawing public attention to goods or
services by promoting a business, and is performed through a variety of media. It is
a mechanism of marketing, which is concerned with the alignment of corporations
with the needs of the business market (the term customer journey we mentioned
above is also a term commonly used in marketing). We will take a closer look at
online advertising below.
While we have mentioned that, from a seller’s perspective, doing business over
the Web may be attractive due to the absence of intermediaries, which often do little
more than take a share of the profit, there are situations in which new intermediaries
enter the picture and indeed offer a highly valued service. This in particular refers to
facilitating payments, for which trusted third parties have come onto the scene. The
term is borrowed from cryptography, where it denotes an entity enabling
interactions between two parties who both trust the third party, so that they can utilize this
trust to secure their business interactions. A well-known example is PayPal, which
allows payments and money transfers to be made over the Web, actually to anybody
with an email address, and it performs payment processing for online vendors,
auction sites, and other corporate users, for which it charges a fee. Private users
need to register and set up a profile which, for example, includes a reference to a
bank account or to a credit card; PayPal will use that reference to collect money to
be paid by the account or card owner to someone else. When a customer reaches the
checkout phase during an e-commerce session, the shopping site he or she interacts
with might direct them to PayPal for payment processing. If the customer agrees to
pay through PayPal, PayPal will verify the payment through a sequence of
encrypted messages; if approved, the seller will receive a message stating that the
payment has been verified, so that the goods can finally be shipped to the customer.
Services such as PayPal or Escrow have invented the notion of a micro-payment,
i.e., a tiny payment (of often just a few cents) which is feasible as a reimbursement
only if occurring sufficiently often. Figure 3.3 shows the checkout process when the
PayPal “Buy Now” button is employed.
[Fig. 3.3: Checkout with the PayPal “Buy Now” button: buyers ready to purchase either (1a) enter their billing information to pay by credit card or (1b) log in to their PayPal account, and then confirm the transaction details before paying.]
In conclusion, it is fair to say that since the inception of the Web in 1993, a
considerable amount of the trading and retail business has moved to electronic
platforms and is now run over the Web. E-commerce continues to grow at a fast
pace; indeed, US e-commerce retail alone grew from $132 billion in 2008 to an
estimated $224 billion in 2014. More precise figures can be found, for example, at
www.emarketer.com/. This has created completely new industries, and it has led to
a number of new business models and side effects that do not exist, at least not at
this scale, in the physical world.
Before we continue our elaboration of IT and its impact on the consumer, we pause
for a moment and take a look at the “role model” for electronic commerce:
Amazon.com. Amazon sold its first book online in July 1995 and, as mentioned in
Chap. 1, soon after started selling CDs, then DVDs, and then items in many other
categories. According to an article in USA Today,3 “now you can hop online from
your phone, download the e-book version, bid on a vintage couch on which to read
it, and hire someone to explain the concepts to you—all with one click.”
3 www.usatoday.com/story/news/nation-now/2015/07/14/working—amazon-disruptions-timeline/30083935/
In the 20 years since its inception, Amazon has introduced major innovations
that have had an impact on how people read, consume music and movies, and even
shop for typical consumer products such as clothes and groceries. Beyond this,
Amazon has invented (or at least experimented with) novel ways of delivering
goods (e.g., by drones), and has become one of the largest cloud providers
worldwide. As we have discussed in Chap. 2, Amazon is nowadays providing all of
infrastructure, platform, and application software as service in the cloud. Not sur-
prisingly, the Internet traffic caused by AWS has long (actually in January 2007)
bypassed the traffic caused by Amazon’s e-commerce business, as can be seen from
media.amazonwebservices.com/blog/2008/big_aws_bandwidth.gif.
More importantly, Amazon has become the role model of a “disruptor” in the
sense that, even without a single physical bookstore, Amazon has been able to
disrupt the physical book market and its value network, and replace it by a virtual
market that is cheaper, faster, and way more convenient than the classical market in
many respects. We will discuss disruption and disruptive innovation in more detail
in Chap. 5, but a brief summary of “Case Amazon” is presented here.
We follow the USA Today article mentioned above from July 14, 2015 on
Amazon’s lifestyle innovations, which summarizes the following key areas of
innovation:
• Bookselling: In the mid-1990s, Amazon was the first to try online bookselling,
and it succeeded widely, even putting competitors (e.g., Borders) out of business.
• One-click purchasing: Buying something with one click only was introduced in
the fall of 1997; Amazon even holds a patent for this.
• The cloud: We discussed AWS in Chap. 2, an idea that started in 2006 and has
evolved into a major Amazon business.
• The cloud at home: In March 2015, the company expanded its professional
services marketplace, Amazon Local Services. Now you can hire anything from
a plumber to a goat herder, at your digital leisure.
• Members only: Amazon Prime was created in 2005, with users paying a flat
annual subscription fee for certain benefits, including one-day shipping prices.
Same-day delivery for Prime members launched in May 2015. Amazon
announced its goals to launch drone delivery called Amazon Prime Air in
December 2013, and said that the future delivery system is “designed to safely
get packages into customers’ hands in 30 minutes or less using small airborne
devices”.
• The rise of e-books and e-reading: Amazon’s Kindle was a game-changer for
e-reading. Launched in 2007, the Kindle connected book purchasing with book
platforms, leading to sensational headlines like “print is dead.” In mid-2010,
Amazon’s Kindle e-book sales outpaced hardcover book sales for the first time.
By a similar token, the Kindle Fire, launched in September 2011, was a cheap
version of the flashier Apple counterparts. Beyond this, Amazon purchased
Goodreads, the leading social network for book lovers, in March 2013.
4 www.forbes.com/sites/suwcharmananderson/2012/12/18/amazon-is-ripe-for-disruption/#55ad91947d4c
5 www.entrepreneur.com/article/228525
On the other hand, Amazon has been instrumental over the past 20 years in the
development of data mining techniques that take advantage of the massive amount
of digital data that is available on an e-commerce site, including customers’
searches, click paths, length of stay on a particular site, buying histories, wish lists, etc.
Every single click is recorded and eventually analyzed, which has led to features
such as “Customers Who Bought This Item Also Bought” or “Customers Who
Viewed This Item Also Viewed.” While content-based recommendations as well as
collaborative filtering are commonly employed, there remains room for
improvement in this space.
In Chap. 2 we discussed technology for the management of big data, and we gave
two brief examples (matrix-vector multiplication and weather data analysis) for an
application of the map-reduce paradigm. But what are some actual cases where big
data has resulted in a new insight and how does this technology benefit business in
general? In this section we will describe several application areas which demon-
strate how big data can indeed be considered a game changer, since it has already
led to developments that differ significantly from what we have seen in the past,
e.g., in the context of data warehouses. These areas exhibit great variety, so it is
important to note that our sample is not exhaustive and also represents only the
beginning of the development, generally termed “big data analytics.”
Sports
One of the oldest examples of what data can do, before it was called “big” data and
became popularized as such, comes from the area of sports and relates to the Oakland
Athletics baseball team and their general manager Billy Beane, who was able to use
statistics and player data to revamp the team from an unsuccessful one into a
comparatively successful one within a relatively short time span. The story is well
documented in the book by Lewis (2004) and in a 2011 movie based on that book
(starring Brad Pitt).
Doug Laney, whom we already mentioned in Sect. 2.3, provided a more recent
example from sports on his blog, namely from the Indy 500 race held in the U.S.
every year on Memorial Day weekend. According to Laney, a present-day Indy
500 race car is on the inside “smattered with nearly 200 sensors constantly
measuring the performance of the engine, clutch, gearbox, differential, fuel system, oil,
steering, tires, drag reduction system (DRS), and dozens of other components, as
well as the drivers’ health. These sensors spew about 1 GB of telemetry per race to
engineers pouring over them during the race and data scientists crunching them
between races. According to McLaren, its computers run a thousand simulations
during the race. After just a couple of laps they can predict the performance of each
subsystem with up to 90% accuracy. And since most of these subsystems can be
tuned during the race, engineers, pit crews, and drivers can proactively make minute
adjustments throughout the race as the car and conditions change.” Further details
Commonly, you need to sign up to one of the providers’ sites and can then inspect
your personal statistics. Activities are recorded on a daily basis; the user can set
goals, monitor whether they have been achieved, and even compare himself or
herself with friends who are using the same type of device. Such devices and
associated activities are supported by a range of gamification-like features, including
rewards and leaderboards.
While this data may be beneficial for its users, for example to watch the progress
of a personal diet or to find out how one’s personal shape has improved (or
deteriorated) over time, there is a host of other people and institutions interested in that
data, first and foremost your doctor as well as your health insurance provider. While
a personal doctor could so far diagnose a patient only based on the data from a recent
check-up, combined with the records which the doctor may keep about this patient
or which may be available from previous consultations, he or she can now integrate
this with fitness data which the patient provides himself. It will thus become easier,
for example, to relate a heart condition to a lack of exercise, and the patient will
even be enabled to monitor recovery him/herself. On a larger scale, it has been
predicted by IBM Research (see www.research.ibm.com/cognitive-computing/
machine-learning-applications/targeted-cancer-therapy.shtml) that “in five years,
doctors will routinely use your DNA to keep you well.” The healthcare area will
particularly boom soon due to the availability of personal sequence or genome data
and an increasing understanding of which of its portions (i.e., genes) are responsible
for which disease or defect.
Similar to personal doctors, health insurance companies will likely ask for the
availability of such personal tracking data soon, and they will typically be able to
execute an even wider integration of data than a general practitioner, thanks to the
digitization of research and test results, insurance claims, or home monitors, all of
which deliver data in addition to what the general practitioner (GP) already can
acquire. It can also be expected that it won’t be long until the premium a
person has to pay for such insurance will reflect the person’s willingness to do
health monitoring or to grant access to personal data.
Property Insurance
Developments analogous to healthcare can already be seen for car insurance, for
example in the USA or in the UK, where some companies already reduce the
premium a car owner has to pay if the latter is willing to plug a small device into the
on-board diagnostics (OBD) port of their car, through which the insurer can
permanently monitor how the driver is behaving when on the road. The OBD port
allows access to sensor readings from a variety of devices that are built into a car.
Insurer Allstate is marketing its Drivewise device as follows6: “Drivewise is a way
for smart drivers to get rewarded for driving safely every day. Each time you take a
drive, it collects feedback on driving behaviors including hard braking, high speed
and when you’re behind the wheel. The safer you drive, the more you can earn!
Drivewise will never raise your rates. The focus of Drivewise is to give you
6 www.allstate.com/drive-wise.aspx
feedback that can only help your driving, and your rates.” Competitor Progressive is
marketing its Snapshot device as follows7: “The fair way to pay for car insurance. It
just makes sense—insurance should be based partly on how you actually drive,
rather than just on traditional factors like where you live and what kind of car you
have. That’s what Snapshot is all about. Your safe driving habits can help you save
on car insurance.”
Connected Cars
One important point in these developments, be it healthcare, homes, or cars, is that we
are observing more and more cases where devices talk directly to each other, instead
of just recording, say, a measurement on a website and having it ready for human or
algorithmic inspection. In the car insurance example, the device in a car is
not talking to the driver, but to a machine on the insurer’s side, which can immediately
use the transmitted data for premium calculations. Ultimately, (small or big)
machines talk to other (small or big) machines, potentially through various stages or
intermediate machines, and ultimately come up with a decision on an issue that
impacts human beings. This is a typical example of the Internet of Things (IoT) and of
a cyber-physical system which we will say more about in Chap. 6.
Take connected or self-driving cars. The idea, which has been around as a vision
since the 1950s,8 is that a car be able to self-navigate along a road map and, while
doing so, observe exceptional conditions (such as construction sites) and communicate with
other cars as well as the road itself. Similar to race cars, autonomous cars carry
sensor and camera technology, so that they can monitor their distance from other
cars and their surroundings, adapt their speed appropriately, and recognize
obstacles or oncoming traffic. Ultimately, they will even be able to predict technical
malfunctions or breakdowns, and will then communicate with the nearest garage
to arrange repair. When the car reaches the garage, alternative transportation will
already be ready to take over the passengers; this was originally envisioned by HP
Labs in California within their CoolTown project.9
On the downside, preliminary experience with self-driving cars as gathered by
companies like Tesla, Jeep, or Volvo shows that there is still a lot to be done before
the vision of totally self-driving cars will become a reality. Autonomous cars need
to be connected to the Internet in order to be able to communicate with a remote
server that can evaluate, for example, distance measures or images in real-time.
These connections could become subject to hacking, or the decision that the car
makes in response to a server communication may simply be wrong. Worse, the
problem of making a “correct” decision when there are two alternatives, both of
which imply casualties but in differing amounts, has been shown by Englert et al.
(2014) to be closely related to the famous halting problem for Turing machines and
is hence undecidable; see also Achenbach (2015). The halting problem states that
there is no algorithm which can decide, given a program and an arbitrary input to
that program, whether the program will halt for that input.
7 www.progressive.com/auto/snapshot/
8 See www.youtube.com/watch?v=F2iRDYnzwtk
9 See www.youtube.com/watch?v=U2AkkuIVV-I
120 3 IT and the Consumer
While originally stated
for the formal computational model of Turing machines, it generalizes to programs
written in arbitrary, yet “Turing-complete” languages. Undecidability means that
there is no algorithm, of whatever complexity, that can solve the problem at hand.
Hence, connected cars have a built-in problem that cannot be solved by any
algorithm, let alone by an ethics committee!
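The undecidability claim rests on the classic diagonalization argument, which can be sketched in a few lines of Python. The oracle below is a hypothetical stand-in; no real `halts` function can exist, which is exactly the point:

```python
def trouble(halts):
    """Given a claimed halting oracle, build a program g it must misjudge."""
    def g():
        if halts(g):       # oracle claims g halts ...
            while True:    # ... so g loops forever, refuting the claim
                pass
        # oracle claims g loops forever -> g returns at once, refuting it
    return g

# Whatever fixed verdict a purported oracle gives, it is wrong about g.
# Here the "oracle" answers "loops forever", so g in fact halts immediately:
g = trouble(lambda f: False)
g()
```

Any candidate oracle can be fed to `trouble` in the same way; its answer about `g` is always the opposite of what `g` actually does.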
Smart Cities
While connected or autonomous cars are still in an early stage of development, other
developments regarding transportation (or more generally, “becoming smart”) are
more advanced. We mention two examples next that are representative of the positive
effects (big) data can have for the customer or, more generally, the individual. The
first example deals with Milton Keynes, a town in Buckinghamshire, England. Milton
Keynes, or MK for short, has set out to become a role model for smart cities, taking a
holistic view of transportation, energy and water management, enterprises and citi-
zens. On its website www.mksmart.org/ the city originally stated: “Milton Keynes is
one of the fastest growing cities in the UK and a great economic success story.
However, the challenge of supporting sustainable growth without exceeding the
capacity of the infrastructure, and whilst meeting key carbon reduction targets, is a
major one. MK:Smart is a large collaborative initiative, partly funded by HEFCE (the
Higher Education Funding Council for England) and led by The Open University,
which is developing innovative solutions to support economic growth in Milton
Keynes. Central to the project is the creation of a state-of-the-art ‘MK Data Hub’
which supports the acquisition and management of vast amounts of data relevant to
city systems from a variety of data sources. These include data about energy and water
consumption, transport data, data acquired through satellite technology, social and
economic datasets, and crowdsourced data from social media or specialized apps.”
One of these apps concerns transportation, where the goal is to provide
cloud-enabled mobility (CEM) for everybody. The idea is “to connect users with
information and other cloud-based services (e.g., booking and billing systems) in
such a way as to reduce travel frustrations and congestion, and also allow users to
make spontaneous public transport decisions.”10 Central to this is the MotionMap
“that continuously describes the real-time movements of people and vehicles across
the city. It will include embedded timetables, car parking, bus and cycleway
information and estimates of congestion and crowd density in different parts of the
city.” Hence, users of MotionMap can, for example, decide to switch from car to bus
while on their way, or switch from one type of public transport to another, since
they are enabled to “see” what is happening along their route. It should not come as a
surprise that all of this is based on intensive data analysis utilizing several of the
techniques and technologies we have mentioned, including recent hardware
developments, social media analytics, cloud computing, and recommendations.
The second example is Urban Engines, a Silicon Valley startup whose original
mission was to improve “urban mobility—saving you and everyone else time in
10 www.mksmart.org/transport/
transit—by using information from the Internet of Moving Things.” The latter
refers to transit systems like metros and buses, delivery services, or on-demand
fleets which move through a city, thereby generating huge amounts of data. Urban
Engines collected and analyzed that data in such a way that people (or companies)
could understand better how traffic flows change during the day; ideally, this
knowledge can be exploited to optimize a personal transportation schedule, for
example by learning that a bus will be late due to a traffic jam and suggesting that the
user switch to the underground, ultimately saving time. The approach also
works for transportation services themselves, and Urban Engines’ software has
been deployed by Singapore’s Urban Redevelopment Authority (URA).11 The
important point here lies in the combination of data from a variety of sources, and in
analyzing this data jointly in order to identify, for example, commuter flows to and
from home and work locations. Urban Engines was acquired by Google in
September 2016 to become part of the Google Maps team.
Other Use Cases
Other areas that are already big on an exploitation of the massive amounts of data
that can be collected include market research (enabling the customer journey we
mentioned above) or the entertainment industry. Disney Parks & Resorts has
developed the MyMagic+ system (see www.disneyworld.disney.go.com/faq/bands-
cards/understanding-magic-band/), which through the My Disney Experience web
site and the corresponding mobile app can deliver up-to-date information on current
offerings to prospective guests planning a trip to one of the Disney parks. Disney’s
MagicBand can be used by the guest as a room key, a ticket for the theme park,
access to FastPass+ selection, or to make a purchase. Participating visitors can skip
queues, reserve attractions in advance and later change them via their smartphone,
and they will be greeted by Disney characters by their name. The system behind
MagicBand collects data about the visitor, his or her current location, purchase
history, and which attractions have been visited.
Finally, we mention that social media sites or search engines also intensively
analyze the data that they can get a hold of. Indeed, Twitter regularly analyzes the
tweets its users are generating, for example to identify and compare user groups, to
analyze user habits, or to perform sentiment analyses on the text of tweets. Simi-
larly, Facebook is interested in the number of “Likes” a page gets over time and
keeps a counter for recommended URLs, in order to make sure it takes less than
30 seconds from a click to an update of the respective counter. Google performs text
clustering in Google News and tries to show similar news items next to each other;
moreover, they classify e-mails in Gmail, and perform various other analytics tasks,
e.g., in connection with their AdWords business. We will look “behind the curtain”
for some of these applications later, in order to give the reader an idea of the
techniques employed in these areas.
11 www.ura.gov.sg/uol/
Mobile commerce can be defined as any business activity conducted over a wireless
telecommunications network or from a mobile device. In most cases, it can more
simply be defined as the buying and selling of goods and services through
wireless handheld devices. Service-based mobile transactions can include those
involving the likes of entertainment such as online gaming, gambling, and content
consumption (e.g., from Netflix, Amazon Prime, or iTunes). They can also include
transactions that result from a user viewing advertisements on their mobile devices.
Another key source of mobile commerce revenue is web-based mobile
communication. With the increasing ubiquity of low-cost wireless networking (Wi-Fi
and Cellular), mobile users are moving away from more traditional, fixed-line forms
of electronic communication to greater use of mobile-only communication appli-
cations such as WhatsApp, Viber, or Snapchat, as well as mobile enabled appli-
cations such as Facebook Messenger, FaceTime, or Skype. All of these applications
are offered free to use, albeit with limited functionality, funded through the
ever-increasing presence of online advertising. WhatsApp, purchased by Facebook
in 2014 for an estimated US$19 billion, is an exception: to date, it has resisted the
opportunity of advertising, instead recognizing the value of its over one billion
registered users and encouraging them to sign up to Facebook, where they can
contribute to its extremely successful advertising-based funding model.
3.3 Mobile Commerce and Social Commerce 123
While we mention ubiquity as being a key enabler of mobile commerce, the rapid
growth of mobile commerce that has been observed around the world cannot be
attributed to any single factor. Indeed, there are a number of reasons behind what
can only be described as a highly disruptive innovation. The following attributes
are collectively responsible for this growth.
Mobile security is arguably safer than that of the average laptop. The
reason for this is still somewhat unclear, but what has been observed is that hackers
remain predominantly focused on fixed-line online transactions rather than mobile ones.
While mobile connectivity continues to improve, at least within urban areas
of developed nations, there continue to be concerns about the quality of mobile
connections, especially when financial transactions are involved. Users remain
concerned about slow or unstable connections and in particular worry that they may
be cut off in the middle of an e-commerce transaction. In the vast majority of cases
however, mobile application providers have technical solutions for such situations.
Increasing network speed, availability and reliability will continue to reduce the
likelihood of such occurrences.
One mobile commerce challenge that is less easy to solve relates to the small
screen size that inherently characterizes mobile devices. While on the one hand,
users demand the convenience that goes with small-sized devices, on the other hand
they want to be able to view in detail the products and services they are purchasing.
Unless they are familiar with a product or service, or the product’s appearance
does not matter, users report being hesitant to buy an item on a smartphone. This
has led to many technical and design developments that try to maximize the space
available on a mobile screen and provide a user experience that is both intuitive and
exploits the capabilities of the screen on which the user is viewing product and
service information. Further technological developments around foldable and
expandable screens will continue to reduce these user concerns.
(Figure: social commerce as the intersection of e-commerce and social media)
There exist a number of different social commerce business models. Not all
utilize social media extensively; some instead involve elements of socialization
within existing retailing platforms. Social commerce business models, as
defined by Sagefrog (2013) and wpress4.me (2013) include:
In Chap. 1 we discussed the impact that the Web has had on the growing importance of
social networks over the years. A particular result of this has been the wide emergence
of social networking sites as well as blogs (e.g., Wordpress), microblogging (e.g.,
Twitter), or wikis (e.g., Wikipedia). Readers interested in the impact these media
nowadays have in general should consult sites like http://www.fanpagelist.com/
category/top_users/, “the social media directory of official accounts of your favorite
brands, celebrities, movies, TV shows and sports teams,” or http://www.ebizmba.
com/articles/blogs, which shows the top 15 most popular blogs on a monthly basis.
We will focus here specifically on the business impact of social media, how
companies might use them, and how they could analyze the impact of their use.
Social media is no longer just confined to public usage by private people. Indeed,
many companies have discovered social media for their internal communication, and
nowadays use them intensively. Examples of tools available for this purpose include
Socialtext (see http://www.socialtext.com/, an “integrated suite of web-based social
software applications includes microblogging, user profile, directories, groups,
personal dashboards using OpenSocial widgets, shared spreadsheet, wiki, and
weblog collaboration tools, and mobile apps”), Atlassian Confluence (see http://
www.confluence.atlassian.com/, a type of team collaboration software), Asana
(asana.com, “the easiest way for teams to track their work—and get results”), Slack
(slack.com, “messaging for teams”), or Starmind (see http://www.starmind.com/).
The effects social media can have on an organization have been nicely summarized
by Kietzmann et al. (2011) in the “honeycomb” of social media, which distinguishes
between social media functionality and the implications of that functionality as follows:
Social Media Functionality:
• Presence: the extent to which users know if colleagues are available.
• Sharing: the extent to which users exchange, distribute and receive content.
• Relationships: the extent to which users relate to each other.
• Identity: the extent to which users reveal themselves.
• Conversations: the extent to which users communicate with each other.
• Reputation: the extent to which users know the social standing of others and
associated content.
• Groups: the extent to which users are ordered or form communities.
Although this may change over time, it is interesting to note that many com-
panies have recognized the convenience and benefits these media can offer both
internally and in the interaction with their customers. Indeed, companies often allow
their customers today to approach them through a variety of channels, including
voice (telephone, VoIP), social networks, e-mail, classical mail, private messages,
or chat, and consequently employ what is called a “multi-channel strategy” in their
customer relationship management. On the other hand, the saying that “a fool with a
tool is still a fool” is still valid; people’s mindset must be such that they are willing
to engage in all this.
As it is our goal in this chapter to give the reader an impression of how to approach
an analysis of social media, or how enterprises learn about their customers via
3.4 Social Media Technology and Marketing 129
social media, we now take a look at a typical problem in the context of social
networking, that of determining communities. Intuitively, communities are groups
of people that are related by a common interest or purpose and that often interact
regarding this interest; they also develop a sense of togetherness. Communities,
once found, can often be addressed as a whole, in order to support their purpose or
simply as subjects for conducting business. Communities can form for a variety of
purposes and goals, e.g., people with the same disease, people forming a shopping
community, people with the same hobby, the alumni of a school or university, a
sports club, or the fans of a particular type of music; notice that communities often
overlap, i.e., individual members often belong to more than one community.
This last remark already hints at a technical challenge: Communities can
easily be visualized as graphs, where nodes represent individual members and
edges a relationship (e.g., “friend”) between two members. So one might expect
that classical graph algorithms are applicable, for example for determining weakly
or strongly connected components. The catch is that graph algorithms tend to
determine disjoint subsets of the set of nodes, so that no node can be in more than
one of them. This is counterintuitive when applied to overlapping communities,
which is why different approaches are needed, one of which is described next.
As a running example, we consider the graph shown in Fig. 3.6, which shows a
very small social network. The interpretation of this graph is that there is a
collection of participating entities, e.g., individuals, which form the nodes of the
graph and which in our example are named A … G. Moreover, there is at least one
relationship between entities of the network, which could be absolute or with a
degree, i.e., there could be different kinds of relationships between individuals, but
these are ignored here. As a consequence, we can consider undirected graphs,
where every relationship is symmetric, i.e., if X is a “friend” of Y, then Y is also a
“friend” of X. Finally, there is an assumption of locality, i.e., relationships tend to
cluster, e.g., if X is related to Y and Z, then Y and Z are probably also related.
However, notice that relationships are not necessarily transitive: If X is related to Y
and Y is related to Z, then X is not necessarily related to Z as well.
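These assumptions are easy to mirror in code. The sketch below (with made-up member names, not taken from the figure) stores an undirected network so that the symmetry of the “friend” relationship is maintained automatically, while transitivity, as just noted, need not hold:

```python
from collections import defaultdict

friends = defaultdict(set)  # adjacency sets of an undirected graph

def befriend(x, y):
    """Relationships are symmetric: X befriends Y implies Y befriends X."""
    friends[x].add(y)
    friends[y].add(x)

befriend("X", "Y")
befriend("Y", "Z")
print("X" in friends["Y"])   # True: symmetry holds
print("Z" in friends["X"])   # False: the relation is not transitive
```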
Obviously, real-world social networks are considerably more complex than our
tiny example. Many examples in that regard can be found on the Web; for instance,
Internet user John M. Baker shows his LinkedIn connections (as of 2011) at http://
www.etechsuccess2.blogspot.de/2011/01/my-social-network.html.
From a formal point of view, a social network is a graph G = (V, E) with a set V
of vertices (nodes) and a set E of edges, where E ⊆ V × V. A key problem in
social network analysis is the detection of communities, which we consider next.
Following the example shown in Fig. 3.6, we are now interested in finding edges
that are least likely to be inside a community. We will make this precise using the
notion of “betweenness” of an edge (x, y) ∈ E, defined as the number of pairs of
nodes u, v such that (x, y) lies on the shortest path between u and v. For example, in
Fig. 3.6 edge (B, D) has the highest betweenness of any edge in this graph, since it
appears on every shortest path between any of the nodes A, B, C and any of the nodes
D, E, F, G; all these 12 paths go through (B, D); hence betweenness(B, D) = 3 × 4 = 12.
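This count can be reproduced by brute force. The edge list below is our assumed reconstruction of Fig. 3.6 (clusters {A, B, C} and {D, E, F, G} joined by the bridge (B, D)), not a verbatim copy of the figure; when a pair of nodes has several equally short paths, those paths share the credit:

```python
from collections import deque
from itertools import combinations

# Assumed reconstruction of Fig. 3.6: clusters {A, B, C} and {D, E, F, G}
# joined by the bridge edge (B, D).
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"),
         ("D", "E"), ("D", "G"), ("E", "F"), ("F", "G")]

def adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def shortest_paths(adj, src, dst):
    """All shortest paths from src to dst, by breadth-first search."""
    paths, best, queue = [], None, deque([[src]])
    while queue:
        path = queue.popleft()
        if best is not None and len(path) > best:
            break                      # longer than a shortest path found
        if path[-1] == dst:
            best = len(path)
            paths.append(path)
            continue
        for nxt in adj[path[-1]]:
            if nxt not in path:        # no revisits on a shortest path
                queue.append(path + [nxt])
    return paths

def edge_betweenness(edges):
    adj = adjacency(edges)
    score = {frozenset(e): 0.0 for e in edges}
    for u, v in combinations(adj, 2):
        paths = shortest_paths(adj, u, v)
        for path in paths:             # equally short paths share credit
            for a, b in zip(path, path[1:]):
                score[frozenset((a, b))] += 1 / len(paths)
    return score

scores = edge_betweenness(EDGES)
print(scores[frozenset(("B", "D"))])   # 12.0 -- the bridge edge dominates
```

Enumerating all shortest paths this way is fine for a toy graph; as discussed below, GNA uses a more efficient breadth-first scheme for large networks.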
The intuition behind the notion of (edge) betweenness is that it measures the
strength of weak ties: the higher the betweenness, the weaker the tie. This is similar
to playing golf, where a high score is also bad; high betweenness of an edge (a, b)
suggests that (a, b) runs between two different communities, i.e., that a and b do not
belong to the same community.
We mention that there is a related centrality notion, node betweenness, which
considers individual nodes instead of edges. It indicates how “central” a node is in a
network and is again measured by the number of shortest paths from all vertices to
all others that pass through that node. A node with high (node) betweenness cen-
trality has a large “influence” on things (messages, opinions) that pass through the
network, under the assumption that passing is based on shortest paths. Both con-
cepts have many applications, including computer and communication sciences,
biology, transport and scientific cooperation.
We next present an algorithm originally proposed by Girvan and Newman (2002),
hereafter abbreviated as GNA. It focuses on edge betweenness and detects
communities by progressively removing edges from the original network, in such a way that
the connected components of the remaining network are the communities (which may
have smaller communities nested inside). GNA focuses on edges that are most likely
“between” communities, and essentially proceeds in three steps as follows:
1. First, the betweenness of all existing edges in the given graph is calculated, by
considering each node X in turn, determining the number of shortest paths from
X to other nodes, and using that number for assigning a partial betweenness to
the edges adjacent to X. When all nodes have been considered, add the
betweenness values determined for each edge and divide by 2 (since for each
edge, both its endpoints have been considered).
2. The edge with the highest betweenness is removed (if multiple edges have the
same highest betweenness, remove them all). The graph may thus split into
several disjoint components; if so, we have found some communities already.
3. The second highest betweenness now becomes the highest; repeat Step 2 until
the graph is broken into a suitable number of connected components,
which form the communities.
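These steps can be sketched compactly with the networkx library; the graph built below is again our assumed reconstruction of Fig. 3.6, not a verbatim copy of the figure:

```python
import networkx as nx

# Assumed reconstruction of Fig. 3.6: clusters {A, B, C} and {D, E, F, G}
# joined by the bridge edge (B, D).
G = nx.Graph([("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"),
              ("D", "E"), ("D", "G"), ("E", "F"), ("F", "G")])

def gna_step(graph):
    """One GNA round: remove the highest-betweenness edge(s), return components."""
    bw = nx.edge_betweenness_centrality(graph, normalized=False)
    top = max(bw.values())
    for edge, score in list(bw.items()):
        if score == top:               # Step 2: ties are all removed at once
            graph.remove_edge(*edge)
    return [sorted(c) for c in nx.connected_components(graph)]

print(gna_step(G))   # removing (B, D) splits the graph into the two clusters
```

Repeating `gna_step` on the resulting components would split them further, mirroring Step 3.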
In the example of Fig. 3.6, the calculation of edge betweenness will yield the
result shown in Fig. 3.7; as mentioned before, the edge with the highest
betweenness is (B, D). If we remove this edge from the graph, we obtain two
communities as expected, one with nodes A, B, C and one with nodes D, E, F, G.
Fig. 3.7 Graph from Fig. 3.6 after first betweenness calculation
We could then consider edges (A, B) and (B, C) in the first component as well as
edges (D, E) and (D, G) in the second, whose removal would yield smaller com-
munities; whether these are meaningful, however, would have to be answered from
an application point of view.
We are not discussing GNA in further detail, but mention that the algorithm
essentially needs to consider all the shortest paths between all pairs of nodes, which
is computationally expensive and can hence become an obstacle when determining
communities in networks with thousands or millions of nodes. GNA avoids this by
using a breadth-first search approach, which by counting the number of shortest
paths from a node to every other node determines the “flow” values for each edge.
This method has a complexity that is proportional to n · e, where n is the number of
nodes and e is the number of edges in the given graph, which essentially means that
the problem can be solved reasonably efficiently. Details can be
found, for example, in Leskovec et al. (2014).
To conclude this section, we note that there are many other techniques for
analyzing (social) networks. For example, in a site where users can post pictures
and tag them with keywords, the interest may be in “similar” pictures, where
similarity could either be based on a use of similar tags, or on similar picture
content, or both.
More generally and according to the position2 blog,12 there are four different
types of tools that enterprises need for analyzing social networks:
1. “Listening Tools: A brand cannot afford to be ignorant about what’s being said
about it on any major social platform. Social media listening can be your digital
eyes and ears. They sift through all the chatter and analyze it for positive and
negative comments. Depending on their complexity and features, they can give
you alerts on where your brand is featured or direct your attention to negative
comments and potential trouble creators.
2. Reach Tools: Every brand wants to maximize its reach on social media. With
the variety of social media platforms today, this is becoming an increasingly
12 www.blogs.position2.com/four-types-social-media-analytical-tools-need
tough task. Each platform has different formats—Slideshare can carry a 100 MB
presentation but Twitter restricts at 140 character messages. The growth and
diversity of social media offerings makes manual posting across social media
platforms tougher with each passing day.
3. Depth Tools: Some products are high involvement products and choosing the
right product means a lot to the buyer. It could be a camera for an amateur
photographer or a financial software package for a small business owner. The
stakes are high for the buyer either emotionally, financially or in terms of effort
and impact.
4. Relationship Tools: Social relationship tools are useful for publishing content
on social sites. These tools offer scheduling capabilities, which ensure an
enduring online presence. This helps the brand stay in touch regularly with
consumers instead of making sporadic appearances.”
Important tools are hence those that perform sentiment analysis, cluster
analysis (such as GNA), as well as other forms of analytics that ideally yield
insights that not only allow us to study the relationship between a company and
a customer as it has been in the past, but also to foresee how to
improve (or reestablish) it in the future.
We mentioned in Chap. 1 that advertising on the Web has become one of the most
prominent Internet business models. It began with simple banners that could be
placed on other Web sites. Since then, it has emerged as one of the major ways to
make money on the Web, which according to Battelle (2005) is due to Bill Gross
and his invention of GoTo, a service that became famous for being among the first
to differentiate Web traffic. Indeed, what Gross quickly realized was that
non-targeted advertising on the Web was largely irrelevant and of little value to the
advertiser as long as the traffic passing by any placed ad was the “wrong” traffic,
i.e., from users not interested in what was being advertised. If users arrive at a site
due to a spammer who has led them there, due to a bad portal classification, or due
to a bad search result, they are unlikely to be interested in the products or services
offered at that site. He hence started investigating the question of how to get
qualified traffic to a site, i.e., traffic with a reasonable likelihood of responding to
the goods or services found at a site, and then started calculating what businesses
might be willing to pay for this. This gave birth to the idea that advertisement can
be associated with the terms people search for and the pay-per-click tracking
models we see today in this business.
Advertising has become a major business model on the Web since the arrival of
Google AdSense, according to them “a fast and easy way for website publishers of
all sizes to display relevant Google ads on their website’s content pages and earn
money” and Google AdWords, which allows businesses to “create ads and choose
keywords, which are words and phrases related to [their] business. … When people
search on Google using one of [the] keywords, [the] ad may appear next to the
search results.” It is hence no surprise that consumers of stationary or mobile
devices are constantly flooded with ads today. It is also a major source of income
for social networks like Facebook, and hence is a connection to the topic of
community detection we discussed in the previous section.
Before we embark on a more detailed discussion of online advertising as a
business model in general and AdWords in particular, we note that advertising on
the Web represents another incarnation of the long tail curve of Web applications
we have seen in Chap. 1 (Fig. 3.7): Through Google AdWords and related
approaches (e.g., in social networks), it has become possible not only for large
companies (amounting to 20% of all companies) to place advertisements on the
Web, but now the same is possible even for a small company. Through the
cost-effective and highly scalable automated infrastructure provided by the index of
a search engine or of a social network, online sites can offer advertising even for
very limited budgets, such as those available to a small company. In other
words, small companies do not have to set up an advertising infrastructure (even
in niche markets) themselves; they can simply rely on what others are providing and
on what users are searching for on the Web.
The 20+ years of online advertising history have seen a variety of important
developments, including the following:
• Direct placement: Advertisers post their ads directly on a site, either for
free, for a fee, or for a commission. Examples include eBay, craigslist, and
many auto trading sites. The selection of an ad by a user can be based on
parameters (e.g., make, model, or year of a car), or it can be done relative to
query terms (e.g., “apartment Belmont”). Ranking of ads, i.e., the question of the
order in which ads should be presented, is typically tricky under this approach
and may be based on strategies such as “most recent first.” However,
users can be shown individual ad selections.
• Display ads: These are banners that are placed on many sites, sometimes at
fixed places (e.g., upper left corner), sometimes in the middle of text that rep-
resents a search result or an article spread over multiple Web pages, yet all users
get to see the same ads. Banner ads resemble advertising in traditional media
(e.g., magazines, TV); however, a big difference is that the advertiser now
typically pays for impressions, i.e., for the number of times the ad is shown, not
just for placing it. An obvious benefit is
that the Web can exploit the information about its users, in order to determine
which ad they should be shown; this information can be gathered from a variety
of sources, including social media, email, bookmarks, time spent on a page, or
search queries issued. For example, if a search engine recognizes a user (via
cookies or when logged into an account with the respective provider) and can
record that the user has an interest, for example, in motorsports, there is a high
probability that advertisements for cars and car parts will be regularly presented
to that user.
3.5 Online Advertising 135
As mentioned earlier, banner ads were the initial form of advertising
on the Web and have brought along a typical foundation for calculating fees, the
CPM (cost per mille,13 i.e., per thousand impressions) rate, meaning that an
advertiser pays per thousand impressions of the ad rather than per individual click.
Banner ads typically show low click-through rates and, correspondingly, a low
return on investment or revenue for the respective advertiser.
• Search advertising: This form of advertising is based on the simple idea of
creating an association between what a user is searching for on the Web and the
ads shown to her or him in response to a search query. This idea was originally
developed by a company called Overture (acquired by Yahoo! in 2003), which
would place ads together with the results of a search
query; advertisers can now bid on certain keywords, and when someone sear-
ches for one of these keywords, the ad of the highest bidder is shown. Like with
banners, the advertiser is charged only if the ad is actually clicked on. The
concept is (and has been) easily extended to e-mail, where a provider can
analyze e-mail content, e.g., search for “important” terms in e-mail, and then
select and show ads correspondingly.
Search advertising has become a primary advertising method on the Web since
Google adopted it around 2002. After a number of changes it was made available to
the public under the name AdWords. We will discuss these changes below, but in
particular focus on two issues that arise when ads are shown dynamically. These are:
• How to determine which ads should be shown together with a particular search
result?
• How to rank ads in case multiple ones link to a given search term?
Additional questions, not discussed here, concern how to attract views and
clicks, where to place an ad on a Web page, and generally how to do better than
traditional mass media (radio, TV, billboards), both from a provider’s
and from an advertiser’s point of view.
13 Latin for thousand.
Advertisers indicate in one way or another that they are interested in having their ads placed near search
results for “sports car.” The typical way they indicate this is through an online
auction, in which advertisers offer a certain premium they are willing to pay to the
search engine provider as soon as their ad is clicked. We assume that every
advertiser or bidder has a certain budget (e.g., for the month), and that budget
cannot be exceeded.
The following simple example indicates that letting the highest bidder win
might not always be the best solution. Suppose we are faced with the following
situation: advertiser A bids 2 on the keyword “mustang” (m) and 1 on “camaro”
(c), while advertiser B bids 2 on “camaro” and 1 on “mustang”, and each has a
budget of 5. Thus, we only have two bidders and no one else; both still have all of their
budgets, and we only display one ad per query. Next assume we receive the
following sequence of search queries:

c m c m c m

The first query asks for “camaro”, the next for “mustang”, the next for “camaro”
again, and so on. If ads are always given to the highest bidder, they are placed as follows:
B A B A
The search engine will then get stuck after answering four queries: B, the
highest bidder on “c”, no longer has enough budget for its bid and, similarly, neither
does A, the highest bidder on “m”; thus, the overall
revenue will be 8, and no more ads will be shown after the first four searches.
So what we see here is that highest bids are not always a guarantee for having
ads placed, and indeed a search engine will typically keep track of how often the
ads of a particular advertiser are actually clicked, in order to get a more realistic
picture of potential revenues; we will come back to this point later.
We could do better than what we saw above if the search engine had an idea of
the future or, in other words, would know in advance which search queries to
expect. Indeed, if the entire sequence “c m c m c m” had been known in advance,
the search engine could have assigned ads as follows:
B A B A A B
Now the last two ads are not from the highest bidders anymore, but the overall
revenue would come to 10.
It is easy to see that this aspect of “knowing the future” can make a crucial
difference. Consider the following revised example, where budgets remain the
same as above, but bids go down: both advertisers now bid 1 on “m”, only B bids
1 on “c”, and A does not bid on “c” at all. Assume the query stream is

m m m m m c c c c c

(i.e., five queries for “m” followed by five queries for “c”). Since both advertisers
bid on “m” and there is no difference in their bids, the search engine might assign as
follows:
B B B B B
But then B’s budget is exhausted, and A does not bid on “c”, so the revenue
obtained in this way is 5. Had the search engine known what to expect after the first
five queries, it could have placed ads as follows:
A A A A A B B B B B

obtaining a revenue of 10.

An algorithm that must decide about each query as it arrives, without knowing
which queries are still to come, is called an online algorithm.14 The
example above shows why knowledge of the future could help, but in this scenario knowledge
about the future is not available.

14 Notice that “online” does not mean in this context that it has to be done on the Internet; it only
indicates incomplete input information.
The last example above also shows that typically an online algorithm will
achieve results that are not as good as what could be achieved by an offline
algorithm. Indeed, the result of the online algorithm indicated could only be 50% of
the revenue otherwise achievable, i.e., by an optimal algorithm A. This discrepancy
is measured by a coefficient called the competitive ratio c of the online algorithm at
hand; in our case, c < 5/10 = ½, and it can be shown that the converse also holds
and hence c = ½. In other words, the result that this “greedy” online algorithm can
achieve cannot be guaranteed to be any better than 50% of the optimal result.
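The arithmetic of this last example can be replayed in a short sketch. The unit bids and budgets of 5 below are assumptions reconstructed from the revenues quoted in the text (5 versus 10); the two `choose` lambdas merely force the worst-case online assignment and the offline optimum, respectively.

```python
def serve(stream, bids, budgets, choose):
    """Serve a query stream, charging each shown advertiser its bid."""
    remaining = dict(budgets)
    revenue = 0
    for q in stream:
        # advertisers that bid on q and can still afford their bid
        eligible = [a for a in bids if q in bids[a] and remaining[a] >= bids[a][q]]
        if eligible:
            a = choose(q, eligible)
            remaining[a] -= bids[a][q]
            revenue += bids[a][q]
    return revenue

bids = {"A": {"m": 1}, "B": {"m": 1, "c": 1}}  # unit bids, as assumed above
budgets = {"A": 5, "B": 5}
stream = "mmmmmccccc"

# Worst case: always prefer B, exhausting B's budget on the "m" queries.
greedy = serve(stream, bids, budgets, lambda q, el: max(el))
# Offline optimum: serve "m" from A, keeping B's budget for the "c" queries.
optimal = serve(stream, bids, budgets, lambda q, el: min(el))
print(greedy, optimal)  # 5 10 -> competitive ratio 1/2
```

The tie-breaking choices are deterministic only to make the worst and best cases reproducible; a real online algorithm cannot know which choice will turn out best.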
Having now established a basic understanding of what kind of algorithm is
needed for solving the problem of matching advertisements to search queries (or
their results), we now look at a simplified version of the matching problem itself.
The simplification is that we consider bipartite graphs, i.e., graphs whose set of
nodes can be divided into two disjoint subsets such that edges only connect nodes
that belong to distinct subsets. For example, Fig. 3.8 shows a bipartite graph with
four nodes in each node subset.
We interpret the edges in such a graph as preferences; if we consider ads to form
the left set and queries the right one, our interest is in finding matchings that are as
large as possible, i.e., subsets of the edges such that no node is an endpoint of two or more edges.
For example, {(1, c), (2, b), (3, d), (4, a)} is a matching for the graph shown (with
thick lines) in Fig. 3.8; it is even a “perfect” matching in the sense that every node
of the graph appears in the matching. A matching with the largest possible number of edges
in a given graph is called a maximal matching. The matching just considered
is also maximal, since no other matching could have more edges.
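A greedy construction of a matching can be sketched as follows; since the edge set of Fig. 3.8 is not reproduced here, the edges below are a hypothetical example that contains the perfect matching mentioned above.

```python
def greedy_matching(edges):
    """Scan edges in the given order; keep an edge if both endpoints are free."""
    left_used, right_used, matching = set(), set(), []
    for u, v in edges:
        if u not in left_used and v not in right_used:
            matching.append((u, v))
            left_used.add(u)
            right_used.add(v)
    return matching

# hypothetical edge set containing the perfect matching from the text
edges = [(1, "c"), (2, "b"), (3, "d"), (4, "a"), (1, "a"), (3, "b")]
print(greedy_matching(edges))  # [(1, 'c'), (2, 'b'), (3, 'd'), (4, 'a')]
```

Note that the result depends on the order in which edges are scanned: processed in a different order (e.g., with (1, "a") first), the greedy scan can end with a matching to which no further edge can be added, yet which is smaller than the best matching possible.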
We now consider search advertising, sometimes also called the “adwords problem”
after the Google AdWords system. Informally, the problem we are faced with is the
following: A stream of queries q1, q2, … arrives at a search engine, and several
advertisers bid on each query. When a query arrives, the search engine must pick a
subset of advertisers and show their ads. Not surprisingly, the goal is to maximize
the revenue for the search engine! Stated slightly more simply than what Google actually
does, the search advertising problem is the following. Given are:

• a set of bids by advertisers on search queries,
• a click-through rate (CTR) for each advertiser-query pair,
• a budget for each advertiser (say, for a month), and
• a limit on the number of ads to be displayed with each query.

The aim is to respond to each search query with a selection of advertisers such
that the following holds:
1. The size of the selection is no larger than the limit on the number of ads per
query.
2. Each advertiser indeed has a bid on the query.
3. Each advertiser has enough budget left to pay for the ad if it is clicked.
Clearly, the search engine provider is interested in maximizing its revenue, or the
total value of the ads selected, where each value is calculated as bid * CTR. It is
therefore understandable that it will not necessarily display the ad of the highest
bidder, but those that promise the highest revenue. For example, A in the table
above has the highest bid, but a low CTR, B has the highest value, and C has the
highest CTR. In other words, if 1,000 queries occur for the search term in question,
A will most likely be clicked 10 times and yield a value of 10 c, B will be clicked
20 times with a value of 30 c, and C will be clicked 25 times with a value of 28.13
c. So the provider will obviously be more interested in C than in A.
To make the situation somewhat more complicated, each advertiser has a limited
(typically monthly) budget, which is divided by 30 to obtain a daily budget, and the
search engine makes sure that no one is charged more than their (daily) budget.
Moreover, the CTR of an ad is essentially unknown in advance and can only be
observed and monitored over time; so typically a search engine will need to start
with an assumption about the click probability of a new ad.
We next present the basic ideas underlying a greedy algorithm for search
advertising; to this end, we make several simplifying assumptions: only one ad is
shown with each query, all bids are either 0 or 1, all advertisers have the same
budget, and all click-through rates are equal.
Then the algorithm simply says: For each query, pick any advertiser with a bid
for that query. As an example, consider two advertisers A and B such that A bids on
m, while B bids on m and c; both have a budget of 4. Similar to what we have seen
earlier, for the query stream

m m m m c c c c

the greedy algorithm might assign the first four queries as

B B B B

thereby exhausting B’s budget, so that none of the four queries for “c” can be
served and the revenue is 4; in contrast,

A A A A B B B B

would be optimal with a revenue of 8. Again the competitive ratio can be shown to
be ½, but with a simple improvement called the Balance algorithm it can be brought
up to 1 − 1/e ≈ 0.63 in the general case. This improvement picks, for each query, the advertiser who bids on the
query and has the largest unspent budget; if that applies to more than one advertiser, it picks
one of them arbitrarily.
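Under the simplifying assumptions above (unit bids, one ad per query), the Balance algorithm can be sketched as follows; the tie-breaking rule is arbitrary and is fixed here only for reproducibility.

```python
def balance(stream, bidders, budget):
    """Balance algorithm: give each query to the eligible bidder
    with the largest unspent budget (unit bids assumed)."""
    remaining = {a: budget for a in bidders}
    revenue = 0
    for q in stream:
        eligible = [a for a in bidders if q in bidders[a] and remaining[a] >= 1]
        if eligible:
            # largest unspent budget wins; ties broken deterministically by name
            a = max(eligible, key=lambda x: (remaining[x], x))
            remaining[a] -= 1
            revenue += 1
    return revenue

bidders = {"A": {"m"}, "B": {"m", "c"}}  # A bids on m; B bids on m and c
print(balance("mmmmcccc", bidders, budget=4))  # 6, i.e., 3/4 of the optimal 8
```

For two advertisers with equal budgets, Balance in fact guarantees ¾ of the optimal revenue; the 1 − 1/e ≈ 0.63 bound applies in the general case of many advertisers.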
We conclude this section by briefly discussing the Google mechanism for
advertisers: It is based on an ongoing auction where advertisers can submit bids on
particular keywords; bids indicate the value a click would have for the advertiser
when the ad is shown. Google shows only a limited number of ads with each query.
Thus, while the original (Overture) idea was to simply order all ads for a given
keyword, Google now decides which ads to show, as well as the order in which to
show them, and as we have seen this decision is solely driven by expected revenue.
That is, the click-through rate is observed for each ad, based on the history of
displays of that ad. Users of the AdWords system specify a budget: the amount they
are willing to pay for all clicks on their ads in a month. As we have also shown,
these constraints make the problem of assigning ads to search queries significantly
more complex.
We also note that online advertising is nowadays a complex business due to the
number of parties involved. It is most often not just a business between an advertiser
and a Web platform provider or (ad) publisher; rather, there are many intermediate steps
(involving intermediaries), each of which incurs additional fees. As
can be seen from Fig. 3.9, there is an entire ecosystem behind the online advertising
business today, consisting of advertising agencies, demand-side as well as supply-side
platforms, auction markets, and, at its endpoints, the advertiser and the publisher. Three
big players distributing ads on the Web are DoubleClick by Google, Adtech (formerly
of AOL), and smartadserver in Europe. It is safe to assume that each transition between
any two parties in Fig. 3.9 involves paying a fee, and statistics show that roughly only
50% of the amount an advertiser spends actually reaches the publisher. More infor-
mation on this can be found, for example in de.slideshare.net/andrewtweed1/
thomvest-advertising-technology-overview-sept-2014 or in http://www.de.slideshare.
net/ksanz15/understanding-the-online-advertising-technology-landscape.
(Fig. 3.9: the online advertising ecosystem, involving advertising agencies, demand-side platforms, supply-side platforms, and publishers.)
3.6 Recommendation
While traditional reviews often come from a professional source (such as the
publisher of a book or newspaper staff) or from private customers, online recom-
mendations are often generated by the data mining tools that work behind the
scenes; indeed, recommendations may come from other users, or they are generated
from user behavior (e.g., search history, time spent on particular Web pages while
browsing). Recommendation systems may look at transactional data that is col-
lected about each and every sales transaction, but also at previous user input (such
as ratings) or click paths. Ideally, it becomes possible to classify a customer’s
preferences and to build a profile; further recommendations can then be made on the
basis of some form of similarity of items or categories that have been identified in
or between consumers’ profiles. Clearly, recommendations point to other items,
where more customer reviews as well as further recommendations to more products
can be found (and hopefully end in purchases).
In the context of electronic commerce, recommendations have become very
popular. As an introductory example, let us briefly look at what Amazon does when
a user adds a product to his or her shopping cart. Amazon then creates a special
interim page where recommendations are the main pillar of the strategy, and where
a mix of several strategies occurs; these are:
• Cross-selling,
• other “related” or “similar” products,
• recommended promotions,
• more generic recommendations aimed at serendipity,
• recommended products in the Amazon shopping cart.
Amazon thus makes the most of the page displayed when a product is added to the
shopping cart, seizing every opportunity to sell.
While in the context of e-commerce recommendation is typically about items or
products, it can, however, also be about a number of other things in other contexts.
For example, recommendation in electronic learning is about learning content, in
search and navigation about links and pages, in social networks about potential new
friends, or in online dating about potential dates. Irrespective of the context, the
goal is to help people (customers, users) decide where to spend
attention, money, time, or any combination of these. In what follows, we use
“items” as a generic term for what is recommended.
Figure 3.10 shows the various components of a recommender system: It
incorporates two types of entity, items and users, and it takes as input ratings (if
available), content data, and possibly also demographic data. Ratings can be
implicit (obtained through observing user activity, including page views, purchases,
or mails) or explicit, and content can be structured, unstructured (e.g., a textual
evaluation), or somewhere in between. The output of a recommender system is
typically a recommendation and potentially even a prediction of what a user might
like next or what item could become interesting next.
(Fig. 3.10: components of a recommender system: ratings may be explicit or implicit; content may be structured, semi-structured, or unstructured.)
A utility matrix is commonly sparse, i.e., most entries are missing (or 0) simply
because most people have not rated most items; the goal of a
recommender is to predict values for these blanks or, at the least, to fill in those
blanks whose values are likely to be high. In reality, a
utility matrix U will have many columns (items) and many rows (users; both
numbers can be in the millions or even higher), and a recommender has access to a
number of attributes for both items and users in order to come up with an entry for
the matrix. Notice that adding a new user would add a line to U that is initially all
blank; adding a new item means adding a blank column. Even worse, a new system
would start with an empty matrix and then has a “cold-start problem.”
The key problems a recommender is faced with are to gather “known” ratings for
U, to extrapolate unknown (high) ratings from known ones, and to evaluate the
extrapolation methods employed. Clearly, the key interest when extrapolating is in
high unknown ratings, since the recommender is interested in what a user likes, not
what he or she dislikes. For gathering new ratings, there are again explicit as well as
implicit methods, both of which have pros and cons. Explicit ratings can be
obtained by asking users to rate items, which could bother people and hence lead to
unreliable responses. Implicit ratings are learned from user actions, in particular
from purchases, which are typically interpreted as high ratings; a problem here is
how to treat purchases that would actually deserve low ratings.
In the following, we will briefly discuss two major approaches to the design of
recommender systems (see Fig. 3.11): content-based recommenders and collabo-
rative filtering; hybrid recommenders as a combination of these two are also an
option.
(Fig. 3.11: approaches to recommender systems; in content-based recommendation, a user profile is built from the features of items the user has liked, and new items are matched against this profile and recommended accordingly.)
For text documents, for example, features can be determined as follows:

1. Preprocess the document by eliminating stop words (as well as other actions,
e.g., word stemming).
2. Compute the TF-IDF score for each remaining word in the document; the ones
with the highest scores are the words that characterize the document.
3. Take as the features of a document the n words with the highest TF-IDF scores.
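These steps can be sketched as follows; the tiny corpus, the stop-word list, the TF normalization (by the document’s maximum term count), and the base-2 IDF are all illustrative assumptions.

```python
import math

STOP_WORDS = {"the", "a", "of", "and", "is"}  # illustrative stop-word list

def document_features(docs, n=3):
    # Step 1: preprocess (stop-word removal; stemming omitted for brevity)
    tokens = [[w for w in d.lower().split() if w not in STOP_WORDS] for d in docs]
    # document frequency: number of documents containing each word
    df = {}
    for toks in tokens:
        for w in set(toks):
            df[w] = df.get(w, 0) + 1
    N = len(docs)
    features = []
    for toks in tokens:
        # Step 2: TF-IDF score for each remaining word
        counts = {w: toks.count(w) for w in set(toks)}
        max_count = max(counts.values())
        scores = {w: (c / max_count) * math.log2(N / df[w]) for w, c in counts.items()}
        # Step 3: the n highest-scoring words become the document's features
        features.append(set(sorted(scores, key=scores.get, reverse=True)[:n]))
    return features

docs = ["the red sports car is fast",
        "the blue sedan is slow",
        "the red coupe is fast"]
print(document_features(docs))
```

Words that occur in only one document (here, e.g., “sports” or “sedan”) receive the highest scores, while words appearing in most documents score low.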
Then the Jaccard similarity of two such sets S and T is determined by looking at the relative
size of their intersection, and hence given by

simJ(S, T) = |S ∩ T| / |S ∪ T|
Users B and C both liked movie MI1 as well as FF6 and disliked movie JB, so
they might have similar tastes; thus MI2 could be a good recommendation for C.
Moreover, users A and D liked both MI2 and FF6, so we may conclude that
people who like FF6 will also like MI2, and hence MI2 will be recommended to
user C.
Let us come back to the Jaccard similarity introduced above and consider
whether it is appropriate. To determine the similarity of users B and C, we consider
their associated vectors and ignore the missing entries. Thus, when B is considered
as a set, we get B = {2, 4, 4, 5, 5} (technically, we need to consider multisets here,
where duplicate entries are allowed, due to the fact that each number represents a
distinct valuation of some item). Similarly, C = {2, 4, 5, 5}. Hence we obtain:
B ∩ C = {2, 4, 5, 5}
B ∪ C = {2, 4, 4, 5, 5}
which implies simJ(B, C) = 4/5 = 0.8.
By the same type of calculation, we obtain the following, for example, for C
and D:
C ∩ D = {5, 5}
C ∪ D = {2, 3, 4, 5, 5}
which implies simJ(C, D) = 2/5 = 0.4.
While the second result (regarding C and D) is somewhat more intuitive than the
first (since C and D hardly have anything comparable), we can see that the Jaccard
measure is not appropriate in this case, since the information to which item a rating
value belongs is completely lost, and hence we are somehow comparing apples and
oranges. This would be even more striking if we had other users, say, E and F such
that, for example, E rated MI2 with 1 and FF7 with 5, while F rates exactly opposite
(and both so far rated nothing else): The Jaccard similarity of E and F would be 1,
i.e., these users would be identified as having the same taste, but that would be far
from valid.
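The multiset variant of the Jaccard similarity used above is easily computed with Python’s `Counter`, which supports intersection (minimum of multiplicities) and union (maximum of multiplicities) directly; the rating lists for B and C are those given in the text, while D’s list is reconstructed from the intersection and union shown above.

```python
from collections import Counter

def jaccard_multiset(s, t):
    s, t = Counter(s), Counter(t)
    # & takes the minimum, | the maximum of the multiplicities
    return sum((s & t).values()) / sum((s | t).values())

B = [2, 4, 4, 5, 5]   # B's ratings, viewed as a multiset
C = [2, 4, 5, 5]
D = [3, 5, 5]         # reconstructed from the computation in the text
print(jaccard_multiset(B, C))          # 0.8
print(jaccard_multiset(C, D))          # 0.4
print(jaccard_multiset([1, 5], [5, 1]))  # 1.0: E and F look identical
```

The last line reproduces the E/F anomaly: the measure cannot distinguish which item a rating belongs to.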
A more appropriate measure in cases like these is the cosine measure described
next. Since we do have a utility matrix, we can look at the various vectors contained
in it. In particular, we now consider user preferences (i.e., rows in the utility matrix)
as vectors in a multi-dimensional space and will look at their pairwise cosine
distance, i.e., the angle between them. Since our vectors have non-negative integer
components only, we are technically looking at the discrete version of a Euclidean
space. Our intuition is that the smaller the angle between two vectors, the more they
point in the same direction and hence the more similar they are: Two vectors x and
y with the same orientation have a cosine similarity of 1; if x and y are at 90°, they
have a cosine similarity of 0.
A quick recap from any geometry book will reveal that the cosine of two vectors
x and y, denoted cos(x, y), is defined as follows:

cos(x, y) = (x · y) / (||x|| ||y||)

In words, the cosine of vectors x and y is calculated as the dot product of x and y
divided by the L2 norms of x and y, i.e., their Euclidean distances from the origin.
The dot product of vectors x = [x1, …, xn] and y = [y1, …, yn] is defined as

x · y = Σi=1..n xi yi
Missing entries will no longer be ignored (as for the Jaccard measure) but will
now be treated as 0. We first look at the cosine of the angle between users A
and C:

cos(A, C) = (5·2 + 1·4) / (√(4² + 5² + 1²) · √(2² + 4² + 5²)) ≈ 0.322

and then at the cosine of the angle between users A and B:

cos(A, B) = (4·5) / (√(4² + 5² + 1²) · √(5² + 5² + 4²)) ≈ 0.380
Since a larger (positive) cosine implies a smaller angle and therefore a smaller
distance, this measure tells us that A is slightly closer to B than to C, confirming our
intuition. By comparison, it is easily verified that, using Jaccard similarity, we
would have obtained simJ(A, C) = 0.5 and simJ(A, B) = 0.2.
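The two cosines can be verified numerically; since the utility matrix itself is not reproduced here, the vectors below are assumptions chosen to be consistent with the computations shown (A’s known ratings are 4, 5, 1; B’s are 5, 5, 4; C’s are 2, 4, 5, with overlaps implied by the dot products).

```python
import math

def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    norm = lambda v: math.sqrt(sum(a * a for a in v))
    return dot / (norm(x) * norm(y))

# hypothetical 7-item utility matrix rows; 0 stands for a missing rating
A = [4, 0, 0, 5, 1, 0, 0]
B = [5, 5, 4, 0, 0, 0, 0]
C = [0, 0, 0, 2, 4, 5, 0]

print(round(cosine(A, C), 3))  # 0.322
print(round(cosine(A, B), 3))  # 0.38
```

The larger cosine for A and B confirms the smaller angle and hence the conclusion that A is slightly closer to B than to C.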
So far we have focused on user-to-user collaborative filtering, yet there is the
obvious alternate (“dual”) view of item-to-item collaborative filtering: instead of
comparing users by the ratings they have given, items are compared by the ratings
they have received, and a user is recommended items similar to those he or she has
already rated highly. Clearly, this kind of filtering can use the same similarity
metrics and prediction functions as the user-to-user model.
Collaborative filtering is thus a powerful and efficient method, which
works for any kind of item and can deliver very relevant recommendations. The bigger the
underlying database and the more past behavior is recorded (i.e., the larger
the utility matrix U and the more non-blank entries in U), the better the
recommendations that can be produced.
On the downside, collaborative filtering might be expensive to implement and
both resource- and time-consuming. A new item that has never been purchased
cannot be recommended, and a new customer who has never bought anything
cannot be compared to other customers and hence not be recommended any items.
In order to cope with the various drawbacks of both content-based and collab-
orative filtering, the approaches can be combined into “hybrid” recommenders,
which use a content-based approach to score some unrated items and then use
collaborative filtering for recommendations.
A final remark in this context concerns the size of the problems we are looking at
here. Given that sites like Amazon or Netflix have millions of users or customers,
and also may have thousands or even millions of products that could be subject to
recommendation, a typical utility matrix will obviously have a huge dimension. In
order to be able to produce recommendations in a timely fashion, ideally while a
user is navigating Amazon’s or Netflix’ pages, two directions of technical support
need to be explored: Relevant computations might be done using parallelization and
paradigms like map-reduce as discussed in Chap. 2. Alternatively, new algorithmic
paradigms need to be employed, which go beyond the scope of this book. Finally, a
combination of the two is often the method of choice, since otherwise no timely
recommendations would be possible.
Government is a large consumer of technology, vehicles, office supplies, etc. Using the
collective size and purchasing power of government departments/agencies (often
working together as a single purchasing entity), greater value for public money can
be achieved than ever before.
The final category of e-government is government-to-government (G2G). This
includes activities within government units and also those between governments. In
this regard information sharing and process efficiency are the key benefits of G2G
e-government. In terms of information sharing, there are obvious situations where
timely and accurate data sharing could lead to much better governmental outcomes.
Such sharing might be seen between government departments such as Immigration
and Inland Revenue, along with Police and Justice.
Broadly, the vision of e-government is based around enhancing public partici-
pation and by providing a progressive and reformist approach to bureaucracies
(Cumbie and Kar 2016). While what has transpired in practice may not have
fully aligned with what was initially envisioned, there was a clear expectation that
e-government was not about automating existing processes, but in offering
improved service delivery, integrated services, and market development (Grant and
Chau 2006).
Benchmarking studies tend to categorize e-government initiatives as being at one
of several distinct stages, or levels, of sophistication. The traditional view through
the late-1990s was that e-government developments would parallel those being
observed in the commercial world; essentially offering basic online information;
then citizen-requested information; followed by extra online service channels. It
was also expected that a step change in coverage might subsequently occur via
extensive online collaborations with a wide range of stakeholders on the way to
offering a full e-government service (after De Kare-Silver 1998). While even recent
models of e-government sophistication offer evidence of such progression (e.g.,
Norris and Reddick 2012), the advent of Web 2.0 technologies also requires that
e-government be viewed against the backdrop of (commercial) online social net-
working applications and services; in particular because a clear trend has emerged
of users expecting to contribute and shape Web content themselves (Wirtz and
Nitzsche 2013; Deakins et al. 2008). Recent research has started to uncover sig-
nificant use of various social media applications in local government (e.g., Oliveira
and Welch 2013) and these implementations provide transferable Web 2.0 migra-
tion paths for other government organizations to consider.
One model for e-government implementation that has stood the test of time, and
remains relevant even today, is Deloitte’s (2001) six-stage model. Taking a
citizen-centric approach and seeking to establish long-term relationships with
citizens, Deloitte’s six stages are as follows: (1) information publishing and
dissemination; (2) “official” two-way transactions; (3) multi-purpose portals;
(4) portal personalization; (5) clustering of common services; and (6) full
integration and enterprise transformation.
It is without question that governments around the world are facing unprece-
dented opportunities and challenges regarding management of their information and
interactions. While the commoditization of information and communication tech-
nologies coupled with Web 2.0 trends and technologies presents a plethora of
possible solutions, the pace of change is such that few governments are able to keep
up with it. Real-world e-government implementation has largely failed to match the
hype associated with early predictions of government transformation. These pre-
dictions are still sound, it is just that the rate of change in the public sector is
lagging that of for-profit industries.
(2014) or the original sources Kalyanasundaram and Pruhs (2000) as well as Mehta
et al. (2005).
An introduction to recommender systems is given by Aggarwal (2016) or
Agarwal and Chen (2016); the topic is also covered by Leskovec et al. (2014). We
also mention the recent research papers by Lu et al. (2015) and Jannach et al.
(2016).
4 IT and the Enterprise
As we have mentioned before, a recent Morgan Stanley study revealed that almost
30 percent of all applications will have migrated to the public cloud by the end of
2017. While this is a clear indication that cloud computing is an unstoppable trend
and will ultimately reach considerably higher percentages, many companies for
which IT is not their core business are confronted with a non-trivial and potentially
far-reaching decision when it comes to cloud computing: Is it the right alternative
for my organization? Can I really save money when I move to the cloud? What are
the legal implications of choosing a non-local cloud provider who maintains data
centers across the globe? Do I have a chance to negotiate the conditions under
which cloud provision would become acceptable for me? Is my data sufficiently
protected when hosted by a cloud provider?
These and other questions essentially touch on five different dimensions of cloud
sourcing, which can also be observed in many other areas and decision situations
involving technology adoption and are collectively called the TELOS dimensions:
Some of the TELOS dimensions have already been dealt with in previous chapters;
for others, in particular the legal dimension, specific knowledge regarding domestic
and international law is needed (which goes beyond the scope of this book). We
hence concentrate here on the organizational dimension.
Strategy development in IT has a long tradition, ever since it was recognized that an
IT business is not something that can be set up and run in an ad hoc fashion, but
instead needs careful planning and, due to the rapid development of the field,
constant evaluation and evolution. Hence, our discussion here takes a quite generic
approach to devising an IT strategy, which is shown in Fig. 4.1. It consists of five
stages beginning with preparation and planning. A crucially important aspect is the
specification of goals and the definition of one or more business cases. This is
followed by the selection of a solution provider, which typically involves a com-
parison of several potential providers.
(Fig. 4.1: a generic five-stage IT strategy; stage 3 comprises detailed planning and contract negotiation.)
Once a particular provider has been selected, detailed planning as well as con-
tract negotiations can start. These are followed by an implementation of the selected
solution and a migration of previous services to the new one(s). Finally, the new
setup becomes operational, which typically includes a maintenance schedule.
Ultimately, the new solution evolves over time, which may require some earlier
phases of the strategy to be revisited in order to consider the impact of evolutionary
changes from there. This can occur during intermediate phases as well.
It is necessary to adapt the generic strategy shown in Fig. 4.1 to one specific to
cloud computing and cloud sourcing. The first and most important step here is to
develop the cloud strategy at the level of top management. The strategy needs to
define which processes or business areas will be moved to the cloud or shall be
supported by cloud services, and which, if any, will not. Management also needs to
identify possible risks and how they might be mitigated or their effects managed.
When these issues have been clarified, further steps towards cloud sourcing can be
taken.
If it is not at all clear whether or not a move to the cloud will be beneficial, it
may help to retreat to traditional evaluation techniques such as SWOT analysis
(McGuire 2014) or balanced scorecard (Kaplan and Norton 1996). A SWOT
analysis, for example, will analyze the strengths, weaknesses, opportunities, and
threats for the organization in question and can suggest measures or remedies in
each case. Figure 4.2 shows a sample SWOT analysis that can accompany the
development of a cloud strategy (and is actually applicable to most forms of
outsourcing). The value derived from such a move includes a concentration on core
competencies, improved efficiency of the company’s IT operation, a consolidation
of the company-internal IT landscape, and ultimately cost reduction.

(Fig. 4.2: sample SWOT analysis for cloud sourcing. Strengths (internal): cost cutting, IT consolidation, efficiency of the IT operation, concentration on core competencies. Opportunities (external): participation in price decline, participation in technology leadership, chance to react fast to changing markets, early adopter. Weaknesses (internal): staff opposition, necessary preparatory measures, organization of data management, dependency on the Internet. Threats (external): hackers, data leaks, contract termination by the provider.)

There are a number of challenges associated with cloud adoption, and these need
to be considered carefully. For example, staff might be unhappy about
the changes brought about by a move to the cloud, in particular since this requires
serious planning and preparation; another challenge might be the intensified
dependency on the Internet and maybe even on a certain guaranteed bandwidth, and
the management of the company-owned data might require new forms of
organization. On the other hand, the opportunities include customer participation in
declining prices and in technological innovation, since the cloud provider will always
be interested in offering up-to-date equipment, and also the opportunity for the
customer to quickly react to changing markets, which may require an increase or a
decrease in IT support from time to time. Being an early adopter of new technology
may be an opportunity if the technology is robust and future-proofed, but it may
also be a threat if the technology “flops”; both situations have occurred with the
adoption of novel technologies in the history of computing. Other threats include
the danger of being hacked, data leaks, security breaches, or the simple fact that the
provider may prematurely abrogate the contract.
In general, a strategy should be a set of guidelines that is binding for the entire
company (or at least those organizational units that are affected by it), often with
little room for maneuver. A company typically has an enterprise strategy from
which other strategies can be derived, among them an IT strategy and potentially a
cloud strategy. The latter can be broken down into a cloud sourcing strategy and a
cloud providing strategy. The former describes which services will be procured
from the cloud under which conditions; the latter is needed in case a company
4.1 Cloud Sourcing 161
wants to act as a cloud service provider itself; this typically applies to software
companies interested in marketing their products “as-a-service.”
We focus here on cloud sourcing, for which the corresponding strategy should
be in line with the overall enterprise strategy; to this end, it needs to address issues
such as the following “W” questions:
• Why should services be procured from or moved to the cloud? If there are
primarily financial reasons, the expected gain should be specified.
• What should be moved to the cloud? If existing systems are affected, should
they be replaced or just moved? What about the data that these systems handle?
• Which risks can be tolerated from a management perspective? Often certain risks need to be tolerated at the beginning of a cloud sourcing project, with the option of eliminating them later.
• When should the move to the cloud occur? A timeline for migration is often
useful.
• Who is acceptable or unacceptable as a cloud service provider? A number of
criteria can be used to assess a particular provider, such as its service availability in the past or how quickly it responds to glitches.
• Where may the cloud provider be geographically located? This has to do with
the question where company-relevant data is kept, for which legal regulations
may apply. For example, if a requirement states that no company data or no
customer data may be kept outside the European Union, a cloud provider that
keeps redundant storage in Asia is not acceptable.
When designing a cloud strategy, it is important to get all stakeholders onboard and
then follow a sequence of steps such as the following:
1. Strategic analysis: Identify the important enterprise goals and how they might
impact the cloud strategy. Try to keep an orientation towards future
developments.
2. Identification of strategic cloud goals: State the core goals, ideally supported by
quantifiable and measurable statements regarding goal achievement and time-
line. Typical goals include effectiveness, efficiency, quality, flexibility, pro-
ductivity, and security.
3. Identification and specification of planning objects: Potential planning objects
include staff, infrastructure, applications, integration, and services. For each
such object, partial goals need to be specified, which when reached deliver the
overall goal. The cloud strategy should now be ready.
4. Review of the cloud strategy: Check the strategy design, typically in a workshop
to make sure that there are no more contradictions, unclear statements, or
passages that require further discussion; if such issues are discovered, they
should be taken care of right away.
5. Publication and enactment of the cloud strategy: Make the strategy known
within the enterprise; stipulate organizational measures for its implementation.
A cloud strategy that results from such a process is a reasonable basis for the next steps towards cloud adoption. Figure 4.3 shows a sample realization of a cloud
strategy based on such preparations. Notice that it is directly derived from the
generic IT strategy that was shown in Fig. 4.1. The implementation requires that
necessary responsibilities have been clearly assigned, and that all parties involved
are part of the process from the outset.
Once a move to the cloud has been decided on, there are still several open issues,
especially for a company whose core competency is in a field other than IT. This in
particular applies to small and medium-sized enterprises (SMEs), who often hesitate
the most or the longest before making a decision regarding cloud sourcing. There is
a simple reason for that: Large enterprises have often relied on a particular IT
company for an extended period of time, and when a move to the cloud is sug-
gested, the company normally relies on the advice of the IT company. Startup
companies, on the other hand, often rely on the (public) cloud entirely, since cloud
sourcing gives them an easy and comparatively cheap way of testing their idea or
product; if it “flies”, they can still decide whether to remain in the cloud or to invest
in IT resources themselves; if it does not, they can simply return the resources.
We therefore primarily target SMEs in the measures that are briefly described
next, and which indicate alternatives that are available when they need to make a
decision about cloud adoption. These include:
1. Sanity check: Ensure that the cloud paradigm is in principle adequate for the
problem at hand and check that typical benefits of cloud sourcing are likely to be
leveraged. As outlined above, the first step for a company should be to look for indicators that promise attractive results from cloud sourcing and for contra-indicators signaling that cloud sourcing is likely to be a suboptimal choice.
[Fig. 4.6: Business model of a hybrid cloud intermediary — supply-side (CSPs) and buy-side (SMEs) value propositions; "atomic", "pass-through", and compound services; key activities, key resources, revenue streams, and competition; framed by cooperative community cloud aspects such as trust, cost-efficiency, profitable projects, legal certainty, and cooperative monitoring.]
relevant when there is a group of enterprises that plan a close cooperation based on
a mid- to long-term horizon. In this scenario, a bilateral cloud intermediary can
reduce the coordination overhead between the partners and allow for the realization
of economies of scale, scope and skill.
A particularly attractive setting can be given by a cloud intermediary that is also
a community cloud provider. In this case, all benefits of a cloud intermediary are
combined with the possibility to offer custom services that are tailored towards the
needs of the members of the cloud community. These services can comprise both
completely self-provided functionality and compound services, created from a
combination of third-party cloud services (providing some kind of added value);
they are also referred to as higher-order services. A cloud intermediary that also acts
as a CSP is called a hybrid cloud intermediary. The model of this type of inter-
mediary is shown in Fig. 4.6.
We finally mention that there are numerous other tools available which can help
SMEs in their handling of cloud computing and cloud sourcing. For example,
CloudHarmony1 “provides objective, impartial and reliable performance analysis to
compare cloud services” and simplifies “the comparison of cloud services by
providing reliable and objective performance analysis, reports, commentary, metrics and tools." So if you are interested, for instance, in how AWS compute services
1 cloudharmony.com/
for Europe have performed over a 30-day period, or what availability individual services have had, this would be a useful source. Similar types of analyses were previously available
from CloudSherpas, which has meanwhile been acquired by Accenture.
The effects of implementing a cloud strategy and of utilizing these various tools
for cloud-oriented decision-making are meanwhile becoming known: According to
a 2015 survey by the British Cloud Industry Forum, cloud adoption grew from 48% in 2010 to 84% in 2015,2 and the surveyed companies reported a number of benefits achieved through cloud service deployment.
Companies have also recognized that by moving some or all of their IT resources to the cloud, they can automate their business (in particular, sales), potentially increase agility, and perform predictive analytics to an extent that was unthinkable before.
The cloud has meanwhile reached a state where considerable amounts of money
can be made or spent. Its development thus resembles the development of making money on the Internet (e.g., the IPO of Netscape Communications in 1995,
see Chap. 1) in general or the development of e-commerce from a niche activity to
a ubiquitous, location-independent 24/7 business in particular. The Internet has
more than once produced new business models and artefacts, whose value initially
seemed low, but which at some point essentially exploded. The cloud seems to be
no exception.
2 raconteur.net/technology/cloud-is-shaping-a-new-uk-digital-landscape
We noted in the previous section that the cloud has enabled companies to perform data analytics. This is because, as detailed in Chap. 2, computing power can be obtained from the cloud on demand and, in particular, in arbitrarily large amounts. Moreover, numerous analytical tools are nowadays provided via the cloud, either for free or for a fee, to which everybody with an Internet
connection has access.
We have mentioned in Sect. 2.3 already that data warehouses emerged in the
1990s as a way to exploit the increasing amounts of data that digital businesses as
well as customers produce. We also mentioned that a data warehouse can be viewed
as a separate database, distinct from the operational one(s), that collects and inte-
grates data from a variety of operational sources and makes that data available to
(typically compute-intensive) analytical tasks. Enterprises commonly move trans-
actional data to the warehouse, where it is sent through extraction, transformation,
and loading (ETL), and ultimately made available to online analytical processing
(OLAP) or data mining applications. We also saw in Sect. 2.3 the "classical" data warehouse architecture, which, in a bottom-up fashion, distinguishes operational database systems and a staging area for ETL from the actual data warehouse core, which is topped by a layer of analytical tools. We now consider this as the first
generation and term it “data warehousing 1.0.” In this section, we indicate that the
architecture of a data warehouse is now significantly more flexible and is capable of
incorporating a variety of additional sources and services. We look at the organi-
zational dimension of big data and consider the situation where a company or
institution wants to make use of it. What does it take to do so, and what needs to
change if the company has previously set up a data warehouse for data analytics
purposes? In particular, we briefly look again at strategy development and then
present a modification of the “classical” data warehouse architecture that is intended
to accommodate big data requirements.
Before we delve into recent developments that are replacing the classical data
warehouse, we take a quick look at one of its major applications, data mining. We
will not discuss data mining in its full generality, but mainly focus on one particular
application, association rule mining, in order to provide a summary of what is
achievable and how it can be approached.
Data mining is concerned with the algorithmic extraction of knowledge from
large data collections that is interesting for or relevant to a company or its appli-
cations. The important characteristic is that from the outset, it is often not exactly
clear what is being looked for. The information to be extracted from given data can
consist of patterns, associations, or relationships (like in association rule mining),
rules, conditional statements, classifications, clusters, time-series developments,
and various other outcomes. Typical usages of mining results were originally just
seen in areas such as marketing or customer relationship management and have
meanwhile been expanded to business applications such as advertising or recom-
mendations as well as to a variety of non-business areas. In CRM, the goal is often
to compose a customer profile or a customer 360-degree view, or, as a manager
from a large online retailer once put it: “Help people find stuff they didn’t know
they wanted." Typical questions related to the already discussed customer journey that trigger data mining include: Who buys what? Who is interested in what? To whom can I recommend a new product, and which products can I recommend following a recent purchase? Which products are typically bought as a follow-up to the purchase of another product?
When dealing with a social network, the interest might be in which communities
exist and how to find and address them.
The Data Mining Process
The basic data mining process is shown in Fig. 4.7: It starts with given input data,
which may be internal to the respective company or stem from external sources.
From this data, a selection of relevant portions has to be made, and the selected data
may need to undergo preprocessing or preparation (e.g., cleansing, reduction,
curation). Then a mining function, to be applied to the resulting data, is chosen and
executed. The mining results obtained are subject to interpretation and usage, but
may also give rise to an iteration of the process with new or more data or simply
with a different selected data set.
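The loop just described can be sketched programmatically. Everything below — the plug-in functions and the toy numeric readings — is hypothetical and serves only to illustrate the select–preprocess–mine–interpret–iterate structure, not any particular mining tool:

```python
def mining_process(raw_data, select, preprocess, mine, acceptable, max_rounds=5):
    """Run the generic select -> preprocess -> mine loop, iterating with
    the full data set if the first results are judged insufficient."""
    data = select(raw_data)
    for _ in range(max_rounds):
        results = mine(preprocess(data))
        if acceptable(results):          # "interpretation and usage" step
            return results
        data = raw_data                  # iterate, here simply with more data
    return results

# Hypothetical plug-ins: "mining" is mere frequency counting over readings.
raw = [3, 3, 7, 7, 7, 42]
results = mining_process(
    raw_data=raw,
    select=lambda d: d[:4],                          # relevant portion first
    preprocess=lambda d: [x for x in d if x < 100],  # a cleansing step
    mine=lambda d: {x: d.count(x) for x in set(d)},
    acceptable=lambda r: max(r.values()) >= 3,
)
```

The first round works only on the selected portion; since no value occurs three times there, the process iterates once with the full data set, as the figure's feedback arrow suggests.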
This general process, which has been in use for quite some time, albeit with
variations, has meanwhile been refined into the CRISP-DM methodology. CRISP-
DM stands for Cross-Industry Standard Process for Data Mining10 and essentially
is a hierarchical process model consisting of sets of tasks described at four levels of
abstraction, which top to bottom describe various phases, then generic tasks of each
phase, then specialized tasks, and finally process instances. There are six phases
altogether, shown in Fig. 4.8, which go beyond what was shown in Fig. 4.7 in that CRISP-DM considers the business context and reflects the fact that often statistical
10 crisp-dm.eu/home/crisp-dm-methodology/
4.2 Business Intelligence and the Data Warehouse 2.0 171
[Figure 4.7: The basic data mining process — input data, selection of a mining function, mining results, interpretation and evaluation, and iteration, potentially with new data.]
Part-ID  Description
S        Spoiler
R        Radiator cap
C        Cold air intake
PC       Performance camshaft
F        Fuel rail cover
O        Oil dipstick
PF       Performance oil filter
St       Strut tower cap
T        Tuning programmer
[Figure 4.8: The six CRISP-DM phases — Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, and Deployment.]
Some parts are purchased in isolation, some together, and sometimes parts are
acquired after earlier purchases. We just look at the second case here. The parts
dealer records his customer transactions, which are as follows:
The basic idea of association rule mining is to discover rules for items that are
frequently purchased together. In our example, we see that the strut tower cap
(St) and radiator cap (R) appear together in several transactions, which might
indicate that the respective customers are interested in engine bay dress-ups. More
formally, a rule will be an expression of the form L → R, where L and R are sets of items and the arrow indicates that if the items in L are bought, then there is a good probability that the items in R are also bought in the same transaction. For example, {St} → {R} means that with some probability, if someone buys a strut tower cap, he or she will buy a radiator cap as well.
For ease of explanation, we consider a simple transaction database like the one
just shown which records a transaction ID and the set of items that were purchased
within that transaction. Note that we thus abstract from a number of further details,
such as the number of items of each type, the unit price for each item, the overall
price for the transaction (i.e., we cannot really determine the “customer value”), or
who was actually the customer. Such information can give rise to refined proce-
dures; we focus here on a simple version to convey the principle.
The algorithm we are going to describe is based on counting the frequency of
individual items within transactions, then pairs of items, then triples of items and so
on. This is done relative to a predetermined threshold, the (minimum) support of an
item set X, which is the (minimum) percentage of transactions that contain X. For
our example, we assume a required minimum support of 0.4, i.e., we are only
interested in combinations of items that occur in at least 40% of all transactions.
The algorithm proceeds in a stepwise fashion and considers larger sets of items
in each step. In each iteration, the support of all candidate item sets considered is
first determined; then those are eliminated which do not meet the required minimum
support, and the others are considered “frequent.” The remaining sets form the input
for the next iteration. For singleton sets, we obtain the following:
Given the required minimum support of 0.4, we can see that only parts C, R, St, and T qualify for further consideration, since these are the only parts occurring in at least 40%, i.e., in at least 5, of the 11 transactions.
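The singleton-counting step can be sketched in a few lines of Python. The toy transactions below are hypothetical (the dealer's actual transaction table is not reproduced here); only the item codes follow the parts table from the text:

```python
from collections import Counter

def singleton_supports(transactions):
    """For each item, compute the fraction of transactions containing it."""
    n = len(transactions)
    counts = Counter(item for t in transactions for item in set(t))
    return {item: c / n for item, c in counts.items()}

# Hypothetical toy transactions (R = radiator cap, St = strut tower cap,
# C = cold air intake, T = tuning programmer, S = spoiler).
transactions = [
    {"R", "St"}, {"R", "St", "C"}, {"C", "T"}, {"R", "T"}, {"S"},
]
supports = singleton_supports(transactions)
min_support = 0.4
frequent = {item for item, s in supports.items() if s >= min_support}
```

With this data, only the spoiler falls below the 40% threshold; the surviving singletons would then seed the next iteration over item pairs.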
Since we are interested in rules of the form X → Y, we need to look for larger
sets and hence iterate the process with the following candidate sets containing two
items each:
Obviously, only the set {R, St} meets the required support of 40%, so we cannot form larger item sets, and the only rules we can form in this case are R → St and St → R. Now the question is: Which of these two is significant? To evaluate this, we employ a second measure, the confidence of a rule of the form X → Y, which refers to the percentage of transactions that contain Y, provided they contain X. For R → St, we find that 5 of the 6 transactions that contain R also contain St, so we can calculate a confidence of 5/6 or 83%. For St → R, all transactions that contain St also contain R, so for this rule the confidence is a full 100%. If we were given a required minimum confidence of, say, 90%, the first candidate rule would be dropped, while the second rule would prevail.
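The two measures can be computed directly from a transaction list. As before, the tiny transaction set below is hypothetical and does not reproduce the dealer's database; it merely shows how support and confidence relate:

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs, transactions):
    """Of the transactions containing `lhs`, the fraction also containing `rhs`."""
    lhs, rhs = set(lhs), set(rhs)
    covered = [t for t in transactions if lhs <= t]
    return sum(rhs <= t for t in covered) / len(covered)

# Hypothetical toy data: R occurs four times, three times together with St.
transactions = [{"R", "St"}, {"R", "St"}, {"R"}, {"St", "R"}, {"C"}]
print(confidence({"R"}, {"St"}, transactions))  # 0.75
print(confidence({"St"}, {"R"}, transactions))  # 1.0
```

Note the asymmetry: here, as in the text's example, every St-transaction contains R, but not vice versa, so the two rules score differently.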
The car parts shop could now try to exploit this result in a variety of ways.
Firstly, they could raise the price of radiator caps, since with high probability, if
someone buys a strut tower cap, he or she will also purchase a radiator cap (and
probably care little for a higher price). They could place the parts near each other at
the shop, since people shopping for St will most likely also buy R. Alternatively
they could place them far apart, so that if someone shops for St, he or she will come
across a number of other parts when going for R, etc.
Note that the choices of minimum support and minimum confidence are crucial for what the algorithm we have just sketched will produce. Suppose, for example, that we lower the required minimum support to 25%; then all singleton sets except for F and PF are frequent, and we obtain more frequent item sets of size 2:
With these, we can now form the candidate set {O, R, St} of size 3 and obtain the following potential rules:
If the required minimum confidence is now, say, 75% instead of the previous 90%, we have found three new association rules involving 3 items instead of just 2, which are obviously more interesting than the previous ones involving just 2 items (although these are still valid, as are now the rules O → R, O → St, R → St, and St → R).
The algorithm we have just described by way of an example is generally known as the Apriori algorithm. It utilizes the "Apriori principle," a monotonicity property of frequent sets, which states that all subsets of a frequent set are frequent as well (or, conversely, any superset of an infrequent set must also be infrequent). This property allows the algorithm, which as we saw proceeds by determining frequent sets of size 1, then of size 2, then of size 3, and so on until no more frequent sets can be found, to prune candidates along the way, as we have seen in the example above. On the
other hand, the basic algorithm gives rise to a number of improvements, which have
been suggested over the years:
1. For every new size of an item set, it makes a complete scan over the given transactions, while a limited (and small) number of scans can be shown to suffice.
2. The number of rules the algorithm produces as output can be very large, in
particular when the required minimum support value is low; in extreme cases it
can even exceed the number of transactions given as input. This can be com-
pensated, for example, by limiting the length of a rule (or of either of its sides)
or by considering “condensed” representations of frequent item sets that sum-
marize their essential characteristics.
3. The confidence measure can easily be shown to be suboptimal for evaluating rules, which is why measures such as the lift or the correlation, as well as many others, have been suggested.
We also note that the Apriori algorithm is based on the generation of candidates
(for frequent item sets as well as for rules), which can actually be avoided, for
instance by sampling techniques. For all of these considerations, we refer the reader
to textbooks listed in the “Further Reading” section for this chapter.
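Putting the pieces together, the level-wise Apriori procedure with its monotonicity-based pruning step can be sketched as follows. This is a deliberately unoptimized version (it still performs one scan per level, as noted in improvement 1 above), and the transactions used at the end are hypothetical:

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Level-wise frequent-itemset mining. The pruning step applies the
    'Apriori principle': a k-candidate is kept only if all of its
    (k-1)-subsets are themselves frequent."""
    n = len(transactions)

    def freq(candidates):
        return {c for c in candidates
                if sum(c <= t for t in transactions) / n >= min_support}

    # Level 1: frequent single items.
    items = {i for t in transactions for i in t}
    level = freq({frozenset([i]) for i in items})
    all_frequent = set(level)
    k = 1
    while level:
        k += 1
        # Join step: build k-candidates from frequent (k-1)-sets ...
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # ... and prune those with an infrequent (k-1)-subset (monotonicity).
        candidates = {c for c in candidates
                      if all(frozenset(s) in level
                             for s in combinations(c, k - 1))}
        level = freq(candidates)
        all_frequent |= level
    return all_frequent

# Hypothetical toy transactions; with min_support = 0.6, {St, O} is infrequent,
# so the triple {R, St, O} is pruned without ever being counted.
transactions = [{"R", "St", "O"}, {"R", "St"}, {"R", "O"}, {"St", "O", "R"}, {"C"}]
frequent = apriori(transactions, min_support=0.6)
```

Rule generation (forming X → Y from each frequent set and filtering by confidence) would then run as a separate post-processing step over `frequent`.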
Classification and Clustering
Association rule mining has become very popular as a data mining functionality,
yet it is certainly not the only one. We briefly look at two others, classification and
clustering, which have wide applications. While association rule mining is generally a descriptive technique, classification is a predictive one (and clustering is again descriptive).
In brief, classification is the task of predicting a class label for a given data item
or data point. Typical applications for classification include credit approval, target
marketing, or medical diagnosis, and typical models employed are classification
rules, decision trees, or statistical approaches (e.g., Bayes classifier). Given a
database, D, of data items and a set of classes C = {C1, …, Cm}, the classification problem is to define a mapping f: D → C, such that each element of D is assigned
to one class. For example, if the “items” are customers and the classes are A, B, and
C (depending on how much turnover each is generating), we get a classification of
the customer base into excellent (A), good (B) and reasonable (C). The approach to
obtain this classification for a large base is to start with a training sample (which
may be chosen using an expert’s knowledge) and then apply the model thereby
created to new data as it comes in.
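The customer example can be sketched as a simple class-assignment function f: D → C. The turnover thresholds and customer names below are hypothetical stand-ins for what would normally be derived from a training sample or an expert's knowledge:

```python
def classify_customer(turnover):
    """Map a customer's annual turnover to class A, B, or C.
    The thresholds are hypothetical; in practice they would be
    learned from a training sample or set by a domain expert."""
    if turnover >= 100_000:
        return "A"   # excellent
    if turnover >= 20_000:
        return "B"   # good
    return "C"       # reasonable

customers = {"Huber": 250_000, "Meier": 35_000, "Schmidt": 4_500}
classes = {name: classify_customer(t) for name, t in customers.items()}
print(classes)  # {'Huber': 'A', 'Meier': 'B', 'Schmidt': 'C'}
```

A trained model (decision tree, Bayes classifier, etc.) would replace the hard-coded thresholds, but the resulting mapping from items to predefined classes has exactly this shape.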
While in classification the target classes are predefined (which “supervises” the
actual process), in clustering the output clusters are created on the fly. Here the goal
is to start from a given data set, and group data items that are “close” or “similar”
into the same cluster in such a way that for each cluster there is a representative
point that summarizes the cluster. For example, the boss of the aftermarket auto
parts company we learnt about above has five sales reps, and now would like to
organize all customers of the company into five distinct groups so that each can be
assigned a different sales rep. Clearly, customers in each group are expected to have
similar interests when it comes to auto parts, and two customers with very different
interests or buying patterns should not be in the same group. The remedy here is to use a clustering technique based on a suitable similarity measure or good partitioning criteria; among the numerous options are well-accepted approaches such as k-means or DBSCAN.
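A minimal k-means sketch for such a grouping task might look as follows. This is plain Python over 2-D points with fixed initial centers (real implementations use random or k-means++ initialization and a convergence test); the "customer" coordinates are made up:

```python
def kmeans(points, centers, iterations=10):
    """A minimal k-means sketch for 2-D points: repeatedly assign each
    point to its nearest center, then move each center to the mean of
    its cluster."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: (p[0] - centers[i][0]) ** 2
                                      + (p[1] - centers[i][1]) ** 2)
            clusters[nearest].append(p)
        centers = [
            (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c))
            if c else centers[i]          # keep an empty cluster's old center
            for i, c in enumerate(clusters)
        ]
    return centers, clusters

# Two obvious groups of "customers" in a hypothetical 2-D feature space.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centers, clusters = kmeans(points, centers=[(0, 0), (10, 10)])
```

Each final center acts as the representative point summarizing its cluster, which is exactly what the sales-rep assignment needs.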
A challenge arising in clustering is outlier detection. Outliers are data objects
whose behavior deviates considerably from the general expectation, and their detection has many applications, for example in fraud detection (e.g., in medical care or with credit cards), public safety, industrial damage detection, or intrusion detection. For example, an outlier in a dataset containing personal data focusing on
[Fig. 4.9: Data warehouse "2.0" architecture enhanced for big data processing — a Hadoop extension (map/reduce engine, HDFS, streams, and logs) next to the classical warehouse basis and core (staging area, metadata, OLAP server), fed by internal and external, static and dynamic data sources.]
age is one for people more than 100 years old (an age that most people do not
reach). Again, there are many established techniques for detecting outliers.
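One simple, classical outlier criterion is the z-score rule: flag values that lie far from the mean, measured in standard deviations. The sketch below uses made-up age data; it is one of many established techniques, not a universal detector:

```python
def zscore_outliers(values, threshold=3.0):
    """Flag values more than `threshold` standard deviations from the
    mean -- one simple, classical outlier criterion among many."""
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [v for v in values if abs(v - mean) > threshold * std]

ages = [23, 35, 41, 29, 52, 44, 38, 260]  # 260 is clearly a recording error
print(zscore_outliers(ages, threshold=2.0))  # [260]
```

More robust variants use the median and median absolute deviation instead of mean and standard deviation, since extreme values distort both of the latter.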
Data mining became one of the most prominent and widely used applications of data warehouses in the late 1990s and early 2000s, in particular when a warehouse had been configured into data marts that are selected for specific usages.
On the other hand, the software industry discovered long ago that data mining "packages"11 can be used even without a data warehouse, for example by simply putting them atop an operational system such as a relational database. This
development has continued in recent years, with digitization becoming ubiquitous and data volumes rising faster and faster; many software developers no longer explicitly rely on a data warehouse, some not even on a database.
It is generally straightforward to extend the basic data warehouse architecture,
which we have seen in Sect. 2.3 (cf. Fig. 2.11 in Chap. 2); the same architecture
can also be recognized from the right half of Fig. 4.9, yet the figure also indicates
how to extend a traditional data warehouse architecture for big data. Indeed, what is
new in this figure is a wider selection of external data sources than typically
considered and the extension by a map-reduce engine such as Hadoop (or other
components from the Hadoop “ecosystem”) on the left side. Various ways of
11 Like the ones listed at www.predictiveanalyticstoday.com/top-free-data-mining-software/
communication need to be made available between these old and new building
blocks, but in the end, the set-up might look as shown in the figure.
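The kind of batch processing the Hadoop side of such an architecture contributes can be illustrated in miniature by the classic map-reduce word count, here simulated in plain Python rather than on an actual cluster (on real Hadoop, the framework would shuffle the mapper output to distributed reducers):

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    """Mapper: emit a (word, 1) pair for every word in one document."""
    return [(word, 1) for word in document.lower().split()]

def reduce_phase(pairs):
    """Reducer: sum the counts emitted for each distinct word."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)

documents = ["big data needs big tools", "data tools"]
pairs = chain.from_iterable(map_phase(d) for d in documents)
counts = reduce_phase(pairs)
```

The appeal of the model is that both phases parallelize trivially over document and key partitions, which is what makes it suitable for the data volumes discussed here.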
When it comes to big data analytics, it is no longer necessary to “think” just in
terms of a data warehouse or an extension of it. We therefore drop the assumption that data analytics requires an underlying data warehouse and will consider more general architectures next. To this end, we will first adapt our general
IT strategy we have seen earlier in this chapter (cf. Fig. 4.1) to the big data case and
assume that the typical CIO needs to decide whether to invest in additional tech-
nology for these new developments. Once again, a SWOT analysis can help, which
may be able to reveal the strengths, weaknesses, opportunities, and threats of the
big data project envisioned. Another tool that could be employed in
decision-making is context analysis, which looks at objectives, benefits, and the
general context and environment into which a project should fit.
We here adapt our generic IT strategy, which we have already used in cus-
tomized form for the cloud adoption case, to a big data decision scenario. Slightly
different from before, we assume that a company whose core business is not in the
IT domain has decided to step into the big data area, and consider the strategy
shown in Fig. 4.10. It starts with information gathering and planning, which could
involve either a SWOT analysis or a context analysis or both. If a decision is made
in favor of a big-data project or of a general adoption of big-data technology,
relevant data sources need to be selected. In an enterprise this could be a variety of
in-house sources, e.g., databases, but could also be a variety of external sources,
e.g., from the Web, which may provide relevant data for free or for a fee.
The second phase includes a selection of the technology to be employed, e.g.,
the selection of a specific Hadoop implementation. Then detailed planning and
implementation can take place. Finally, the system or project is in operation and
may need regular or ad hoc maintenance.
Our next goal is to indicate what a software selection (or its result) might look
like for specific big data use cases.
[Figure 4.11: Text analytics techniques — a single document (or a collection) is first preprocessed (filtering, stop word removal, stemming, pruning, etc.); techniques of increasing complexity then build on one another: basic statistical computations, information extraction, topic discovery, clustering and categorization, summarization, and sentiment analysis.]
from collecting basic statistics about a given text or a collection to opinion mining as
shown in Fig. 4.11.
Depending on the input data, different methods within each processing technique
are applicable. The middle layer of Fig. 4.11 shows possible such techniques; their
respective outputs are shown at the bottom of the figure. In general, output data is
arranged here in the order of increasing complexity (from left to right), i.e., the
effort grows with more complex tasks. The figure also shows that results of pre-
vious techniques can be used in subsequent steps. For example, an analytics process
might start with gaining some basic statistics knowledge about the given text; the
process might proceed with extraction of entities from the text, which is about
identifying (and extracting) structured information from the generally unstructured
text document. The result is knowledge about the main entities occurring in the
document(s) and their relationships.
Next, the main topics of the text can be discovered by executing a topic mod-
eling algorithm, which is essentially a statistical approach aimed at discovering the
core topics in a document corpus. Topic modeling algorithms aim at automatically
revealing and annotating document collections with their topics; see, for example,
Blei (2012). If the input data is a collection of documents, its structure might be explored by clustering and categorization techniques. Finally, more
recently developed techniques of text summarization and sentiment analysis might
be applied in order to obtain a brief summary of the document(s) and see if there are
any sentiments present in the text.
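The early pipeline stages — preprocessing followed by basic statistical computations — can be sketched as follows. The stop-word list is a tiny illustrative stand-in for a real one, and stemming and pruning are omitted:

```python
import re
from collections import Counter

STOP_WORDS = {"the", "a", "of", "and", "to", "in", "is"}  # tiny illustrative list

def preprocess(text):
    """Filtering and stop-word removal; a real pipeline would add
    stemming and pruning as further steps."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

text = "The cloud is the basis of big data analytics, and analytics is big."
tokens = preprocess(text)
stats = Counter(tokens)  # "basic statistical computations": term frequencies
print(stats.most_common(2))
```

These term-frequency statistics are exactly the kind of intermediate result that later stages — information extraction, topic discovery, clustering — consume as input.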
A sample setup for sentiment analysis using the Hortonworks framework, which builds upon a variety of open-source components from the Apache ecosystem (www.apache.org/), including Hadoop technologies such as HDFS and YARN as well as Apache Storm, Apache Solr, Oozie, and ZooKeeper, can be found at www.hortonworks.com/hadoop-tutorial/nlp-sentiment-analysis-retailers-using-hdp-itc-infotech-radar/. The use case behind the figure shown there is a brick-and-mortar or online retailer interested in tracking social sentiment for each product, as well as competitive pricing or promotions being offered in social media and on the Web, for any number of products in their portfolio. Using this, retailers can create continuous re-pricing campaigns that can be implemented in real time in their pricing systems.
[Figure 4.13: A general framework for social media analytics — data tracking and collection (a database of structured and unstructured data; tracking approach: keyword-based, actor-based, or random/explorative; tracking methods such as APIs and RSS/HTML parsing; preprocessing) feeding data analysis (content analysis/text mining, trend analysis, opinion mining/sentiment analysis, social network/structural analysis, and combinations of methods), with results summarized in a social media dashboard.]
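To illustrate the principle (not the Hortonworks pipeline itself), a minimal lexicon-based sentiment scorer might look as follows; the lexicon entries and example posts are made up, and production systems use large curated lexicons or trained models instead:

```python
# Tiny, hypothetical sentiment lexicon; real systems use large, curated
# lexicons or trained classifiers rather than a hand-written dictionary.
LEXICON = {"great": 1, "love": 1, "cheap": 1, "bad": -1, "broken": -1, "slow": -1}

def sentiment_score(post):
    """Sum the lexicon values of the words in one social media post."""
    return sum(LEXICON.get(w.strip(".,!?").lower(), 0) for w in post.split())

posts = ["Love this tuner, great price!", "Arrived broken and support is slow."]
scores = [sentiment_score(p) for p in posts]
print(scores)  # [2, -2]
```

Aggregating such per-post scores per product over time is what would feed a re-pricing or promotion dashboard of the kind described above.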
A more general setup for social media analytics is shown in Fig. 4.13, which
abstracts from particular components and restricts the attention to the necessary
type of functionality. The framework provides a general guideline for the devel-
opment of toolsets aiming at collecting, storing, monitoring, analyzing, and sum-
marizing user-generated content from social media. Although originally developed
with political content in mind, the framework is obviously general enough to serve
other purposes as well.
The framework in Fig. 4.13 has two major parts, data tracking/collection and
data analysis. Data tracking first selects appropriate data sources, and then distin-
guishes different approaches for selecting a relevant subset of data to be tracked,
e.g., keyword-based, actor-based (only track data of certain persons/groups), or a
random/explorative approach. Suitable tracking methods for each data source also
need to be determined (e.g., the Twitter API as tracking method for Twitter). The
collected data may be structured or unstructured, and later analysis steps may
benefit from data pre-processing, such as converting natural language text into
uniform terms. The second part of the framework comprises the actual analysis and
[Fig. 4.14 Typical data sources for a customer profile kept in a CRM system:
reviews, recommendations, evaluations, blog posts, a (new) need or awareness,
click paths, search history, search duration, and transactional data.]
conferencing. During the pre-sales phase, a company will typically try to attract a
customer to its products, and during after-sales, it is crucial to keep the customer
satisfied, while the actual sales step is often done by “self-service” (e.g., airline
tickets, bank transactions, etc.). The important point here is that each phase and
indeed each channel used during the lifecycle creates data in abundance, and this
data can serve as the basis for CX. Indeed, if that data is properly integrated,
filtered, and processed in such a way that a customer profile is created and that each
individual interaction with the customer contributes to that profile, a company will
be well positioned to maintain good relationships with its customers.
Figure 4.14 summarizes the typical data sources a company has available to
create a customer profile. Typically, there will be a customer relationship
management (CRM) system for keeping track of all customer interactions, and
generally, this will be where customer profiles are also kept. However, while traditionally
only internal sources, such as the transactional database, could contribute to a
profile, a company nowadays has a host of other options, in particular through the
availability of external data that can be collected (or bought, see below) from the
Web. Importantly, a customer profile can be made the basis for predictions of what
the customer might be interested in or even need next, an approach that Amazon has
pioneered through its “customers who bought this also bought …” feature.
Amazon today generates massive additional revenue by precisely analyzing what
users have been searching for, how long they stay on the site, where they click (e.g.,
on a particular positive or negative review), what they put on their wish list, and how
often they return to the site, and by sending them seemingly individualized
recommendations in response.
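Amazon's actual recommender is proprietary; the following minimal sketch only illustrates the item-to-item co-occurrence idea behind "customers who bought this also bought" (all baskets are made-up data):

```python
from collections import defaultdict
from itertools import combinations

def also_bought(baskets):
    """Count, for every pair of products, how often they occur in the same
    purchase basket (a crude item-to-item co-occurrence model)."""
    counts = defaultdict(lambda: defaultdict(int))
    for basket in baskets:
        for a, b in combinations(sorted(set(basket)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts

def recommend(counts, product, k=2):
    """Return the k products most often bought together with `product`,
    breaking ties alphabetically."""
    ranked = sorted(counts[product].items(), key=lambda kv: (-kv[1], kv[0]))
    return [p for p, _ in ranked[:k]]
```

Industrial-scale variants normalize the counts (e.g., by item popularity) and precompute the similarity lists offline so that lookups stay cheap at serving time.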
Predictive Analytics and Industrial Data Analytics
The concept of prediction is not restricted to electronic commerce. While it may not
be easy to predict the next product a user is interested in purchasing, there are other
areas where prediction is much more reliable and potentially even more useful.
A prominent (and old) example is given by the aircraft industry. According to
Hunter and Eng (1975), “the ability to detect an impending failure in an aircraft
engine mechanical power system, at an early stage, where expensive and possibly
catastrophic system failures can be prevented, will provide enhanced aircraft safety
by minimizing the possibility of a serious engine failure. It will also prevent the
pilot from needing to shut down an engine during flight with all the attendant
emergency ramifications that can arise. This will also improve the utilization of the
aircraft, by the scope to plan unscheduled engine removals to suit the aircraft
downtime, and it will reduce the turnaround time and costs to effect the necessary
repairs; particularly if the skill of detection can also pinpoint the area of distress
with accuracy.”
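Modern engine-health monitoring uses far richer statistical models, but the core idea of the quote, detecting an impending failure early, can be illustrated by fitting a linear trend to a hypothetical sensor reading (say, vibration amplitude) and extrapolating when it will cross an alarm threshold; all numbers here are illustrative:

```python
def predict_threshold_crossing(readings, threshold):
    """Fit a least-squares line y = a*t + b to equally spaced sensor readings
    and estimate at which future time index the alarm threshold is crossed.
    Returns None if the trend is flat or improving (nothing to predict)."""
    n = len(readings)
    ts = range(n)
    mean_t = sum(ts) / n
    mean_y = sum(readings) / n
    cov = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, readings))
    var = sum((t - mean_t) ** 2 for t in ts)
    slope = cov / var
    intercept = mean_y - slope * mean_t
    if slope <= 0:
        return None  # no worsening trend
    return (threshold - intercept) / slope
```

Scheduling an engine removal before the predicted crossing time is exactly the kind of planned, rather than emergency, maintenance the quote describes.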
What all these sample applications tell us is essentially two things: Firstly, data
today is abundant, and there are tools available (many of them for free) which allow
us to maximize the value of that data. Secondly, data alone is not enough; indeed
the goal is to turn raw data into something smart, or big data into smart data. In the
context of a research project of the German government, the following “formula”
has even been stated, which goes far beyond the few aspects we have discussed
here: “Smart Data = Big Data + Benefit + Semantics + Data Quality + Safety +
Data Privacy.” In other words, big data only delivers the raw material, which needs
to be appropriately processed and refined in order to deliver its full economic
potential.
[Fig. 4.15 General concept of a data marketplace: data from the public Web and
other sources is crawled, bought, aggregated, matched/mined, and observed (OLAP);
a processing infrastructure executes processing flows and user-defined functions,
supported by storage, cleansing, and transformation components, metadata on
lineage, relevancy, and trust, consultancy and integration services, and
certificate/licensing-based billing (indicated by the Euro symbol).]
to being sold. Different from the stock market, however, a data marketplace may be
open to anyone, i.e., users can act as sellers or buyers or both.
Figure 4.15 shows the general concept of a data marketplace for integrating
public Web data with other data sources. As with a data warehouse architecture,
the schema includes components for data extraction, transformation, and loading, as
well as metadata repositories describing data and algorithms. In addition, the data
marketplace offers interfaces for uploading data and methods for optimizing data,
e.g., by employing operators with user-defined functionality, as well as components
for trading and billing the usage of these operators. In return, the provider of a
user-defined function receives monetary compensation (indicated by the Euro
symbol) from buyers. Moreover, in the case of large data volumes from the Web,
the marketplace relies on a scalable infrastructure for processing and indexing data.
For completeness, we mention that data nowadays need not be obtained from
data marketplaces alone. Indeed, there are numerous sources of data on the Web
today; for example, www.linkedin.com/pulse/ten-sources-free-big-data-internet-
alan-brown lists ten of them. Another place on the Web for getting hold of data
collections is the Kaggle platform,15 which can be seen as an analog to InnoCentive
for data. According to Wikipedia, “In 2010, Kaggle was founded as a platform for
predictive modelling and analytics competitions on which companies and
researchers post their data and statisticians and data miners from all over the world
compete to produce the best models. This crowdsourcing approach relies on the fact
that there are countless strategies that can be applied to any predictive modelling
task and it is impossible to know at the outset which technique or analyst will be
most effective. Kaggle also hosts recruiting competitions in which data scientists
compete for a chance to interview at leading data science companies like Facebook,
Winton Capital, and Walmart.” Everybody can participate in a Kaggle competition,
and many individuals as well as companies have used Kaggle competitions to train
or improve their data analytics capabilities.

15 www.kaggle.com/
Data-processing and data-analytics technology is today capable of handling huge
amounts of data efficiently, which implies that there are large and primarily economic
opportunities for exploiting this data. The notion of business intelligence, which was
“invented” in the context of (early) data mining to describe the fact that businesses
can enhance their “intelligence” regarding customers and revenues by analyzing and
“massaging” their data to discover the unknown, will now reach the next level.
Indeed, the fact that more and more data is made available in digital form not only
allows businesses to gain new insights; it also renders new discoveries possible in
areas such as physics or healthcare, where the primary target is not necessarily a
business one. So not only with regard to business can big data be seen as the new
intelligence enabler: the broadness of the data available today (not just its sheer
size!) and the available technology enable us to perform analytics, to see
connections, and to make predictions that were unthinkable only a short while ago.
It has been predicted that ten years from now, every individual will be in a position
to exploit digital information to his or her advantage in ways unforeseeable today.
While security breaches and data misuse have always been a challenge in
computer science, even this reaches a new level with big data. The website io9 (see
www.io9.gizmodo.com/ or www.gizmodo.com.au) lists a number of ways in which
big data is creating the “science fiction future,” among them that dating sites can
predict when you are lying, that surveillance will become Orwellian, and that
scientists, doctors, and insurers can make sense of your genome. We should hence
be aware that big data does not just require the right technology, but also
appropriate governance and protection, a topic also championed by the Free
Software Foundation16 and its protagonists.
16 www.fsf.org/

4.3 IT Consumerization, BYOD and COPE

We now turn to a different topic that is relevant for the modern enterprise. With
increasing market penetration of (mobile and) smart devices such as smartphones,
tablets, or laptops, and owing to the ubiquitous (“always-on”) nature of these
devices, private applications such as social networks and e-mail will more and more
reside on the same device as corporate documents or applications such as company
spreadsheets or (interfaces to) proprietary software. Most commonly, both types of
services are used interchangeably in both business and private environments; e.g.,
employees usually have a private and a corporate e-mail address, but both are
accessed through a common interface or even using the same e-mail application.
This observation is supported by the BYOD (“bring your own device”) develop-
ment, also known as IT consumerization (Castro-Leon 2014), where companies are
allowing their employees to use their personal devices at work or for work-related
purposes. Of the many benefits BYOD offers (both to organizations and their
employees), an increase in flexibility and efficiency as well as the ability to work at
anytime from anywhere are considered key (Morrow 2012). The underlying phi-
losophy of BYOD is in line with Mark Zuckerberg’s philosophy, implemented in
Facebook, that every person has only one identity (as opposed to a private and a
professional one). Indeed, in an interview with David Kirkpatrick for his book,
“The Facebook Effect.,” Zuckerberg is cited as saying “The days of you having a
different image for your work friends or co-workers and for the other people you
know are probably coming to an end pretty quickly. Having two identities for
yourself is an example of a lack of integrity.” (Zimmer 2010). Even though this
statement is controversial from a privacy point of view, the lack of integrity is
particularly pertinent, especially from a technological perspective.
An alternative to BYOD is often termed COPE (Corporate Owned, Personally
Enabled). This is effectively BYOD in reverse: an organization provides a
device for its employee yet allows the device to be used for personal purposes. The key
benefit of COPE is that the owner (the organization) can maintain control over the
setup and security of the device, thereby limiting potential security breaches that
may occur. BYOD is different from what is commonly termed CYOD (“Choose
Your Own Device”), where an employee can choose from a (typically limited)
variety of devices offered, yet the device chosen remains company property. CYOD
is in fact very similar to COPE and both COPE and CYOD may or may not come
with restrictions regarding the selection of devices. We will not distinguish CYOD
and COPE in the remainder of this section, and we will generally assume that for
the latter private usage is accepted. COPE is more commonly discussed, so we will
follow suit.
For BYOD or COPE to be effective, various issues need to be addressed,
including but not limited to:
By way of an example which is a reality in many parts of the world today, consider
the daily routine of a typical knowledge worker (e.g., a bank employee). While
having breakfast she wants to check on her private and business e-mails. Today, it
is very likely that she will do this on her smartphone, tablet, or on the rather new
combination “phablet” (a portmanteau of phone and tablet), using two different
Web services, each with an individual login. Also, a third and a fourth application
will be needed for calendar and (quality) news, all of them with potentially different
credentials. In her office, after plugging her laptop into a docking station, she will
access proprietary banking software, the same e-mail and calendar services, only
via a different interface. She is likely to store some files on a company cloud storage
solution. Heading for a customer presentation she grabs her laptop again which is
obviously able to access the aforementioned cloud storage. Many interesting sce-
narios emerge from this setting: During a meeting, relevant company performance
figures can be accessed; on the way home a presentation can be finalized on the
train; during her lunch break she might take a quick look at photos from a relative’s
vacation; during a free moment, the remainder of last night’s movie can be watched.
While this may still be viewed as the realm of science fiction by some, it is only the
beginning of what will soon be everyday manifestations of our 24/7
hyper-connected world in which the distinction between private and professional
life is vastly blurred (Schmidt and Cohen 2013). In other words, people will soon be
living in their “personal” cloud, a term that was first mentioned in a 2011 Forrester
report and also picked up by the blog readwriteweb.com around the same time.
A study reported in BITKOM (2013) by the German Federal Association for
Information Economy, Telecommunication and New Media (Bundesverband
Informationswirtschaft, Telekommunikation und Neue Medien e.V., abbreviated
BITKOM) shows that 43% of German IT and telecommunication companies allow
of these companies have established specific rules for this, 81% expect an increase
in employee satisfaction, 74% expect increased productivity, while roughly 40%
believe they will be perceived as a modern employer. On the other hand, 53% of the
companies interviewed declined the use of private devices in the workplace, mostly
due to increased maintenance and security costs in the presence of a large variety of
devices and differing operating systems with all kinds of application software installed.
We expect these figures to be similar in other parts of the Western world. Lance and
Schweigert (2013) examine BYOD projects at IBM, Cisco, Citrix, and Intel to
determine when and whether an implementation of this concept is successful.
Issues relating to BYOD/COPE can be categorized into legal, economic,
organizational, and technical ones, several of which will be elaborated upon in this
section; for an introduction to their cultural and organizational impact, see Lofaro (2014).
We look at current practice which reveals solutions already in use, and we point out
research issues that need further study.
BYOD eliminates the need to carry and use separate devices for private and work
purposes. As shown in Fig. 4.16, both BYOD (1) and COPE (2) have in common
that the total cost of usage is carried by one entity (company in case of COPE,
employee in case of BYOD). In reality, usage costs are likely to be shared such that
the cost of work usage is paid for by the employer in case of BYOD or the cost of
private usage is paid for by the individual (cases 4 and 3, respectively). In contrast
to these forms is the traditional model of separate devices for professional and
private usage (5). Whilst this is the most secure solution from a company
perspective, as the work device is controlled by it, it is also the most costly approach—
at least if total cost is examined—because two devices have to be purchased and
maintained. This is reversed with COPE or BYOD, where less institutional control
results in a greater security risk, while the total costs are lower.

[Fig. 4.16 Device ownership and payment models: a single device whose total cost
is carried by one entity (1: BYOD, 2: COPE), a single device with shared usage
payments (3, 4: hybrid), and separate personal and work devices with separate
payments (5), arranged along the dimensions of ownership (work vs. personal),
usage and device cost, and security risk (low to high).]
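The cost argument can be made concrete with a back-of-the-envelope calculation; the device price and plan fee below are purely illustrative assumptions:

```python
def total_cost(device_price, monthly_plan, months, devices=1):
    """Total cost of ownership over a period: purchase price plus plan fees,
    multiplied by the number of devices that must be bought and maintained."""
    return devices * (device_price + monthly_plan * months)

# Illustrative figures only: a 600-euro device with a 30-euro monthly plan
# over a 24-month period.
single = total_cost(600, 30, 24)               # one shared device (BYOD/COPE)
separate = total_cost(600, 30, 24, devices=2)  # separate work and private devices

assert single == 1320
assert separate == 2 * single  # the traditional model roughly doubles the cost
```

Real comparisons would also include support, licensing, and security overheads, which is precisely where the shared-device models save money at the price of higher risk.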
In terms of organizational processes, for each of these ownership models, it
becomes necessary to specify what concrete actions will be taken when a new
device is deployed or enabled. Furthermore, processes have to be defined that
describe the actions to be taken in case a device is stolen or lost, when updates need
to be physically deployed, or an employee leaves the company, as well as other
eventualities.
need to introduce ways to control them, instead of simply ignoring them or even
forbidding the usage of personal devices for work purposes, when such actions are
likely to result in unauthorized, and indeed risky, practices anyway.
A software solution to this problem is provided by Mobile Device Management
(MDM) such as that described by Liu et al. (2010), which offers centralized
management of all mobile devices that are used within an organizational context.
MDM typically comes with features such as the following:
• Software distribution via an in-house app store, installation and maintenance via
push apps
• Remote access for reading and setting possible configurations, push messages,
switch-off (remote lock and wipe)
• Inventory management for keeping track of hardware, software, licenses, patch
management
• Optional encrypted backup and restore
• “Containerization,” i.e., strict differentiation between private and professional
data and applications
• Protection against unauthorized access to enterprise services, jailbreak detection
In general, conceptual issues include interfaces, SSO (single sign-on), architecture,
and standards. As an example, Citrix XenMobile is an MDM tool that was
originally created by Zenprise and that particularly emphasizes protection against
data loss, besides offering a variety of functions pertaining to the mobile cloud.
We refer the reader to Schomm and Vossen (2013) for further details.
While MDM is a viable solution, and some features such as an in-house app store
and SSO are widely demanded, it is particularly labor-intensive and restricts users
in the way they can actually use their device. In this section, we suggest alternative
solutions and also look at the problem from a conceptual point of view; the reader is
referred to Morrow (2012) and Miller et al. (2012) for an overview of relevant
security (employer side) and privacy (employee side) challenges.
Essentially, two scenarios need to be considered:
1. The company owns the device (COPE): In this case, the device needs to offer
different access domains: a personal section, a company section, and a combined
section, each with an individual, user-chosen access code.
2. The user owns the device (BYOD): In case of a split, i.e., the user leaves the
company and is no longer granted access to company data or services, the
company disables access to any company application and the user returns the
SIM card of the device in question.
Clearly, the measures that need to be taken in either case vary slightly for different
types of devices such as smartphones, tablets, or laptops and for cellular versus
non-cellular devices. We ignore such differences in what follows.
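The two scenarios, together with the "containerization" feature of MDM mentioned earlier, can be illustrated with a toy model; the class and its methods are illustrative assumptions, not an actual MDM API:

```python
class DualUseDevice:
    """Toy model of a device with separate personal, company, and combined
    access domains (COPE scenario), plus company-side revocation with a
    selective wipe of company data (BYOD split scenario)."""

    def __init__(self, codes):
        self.codes = dict(codes)              # domain -> user-chosen access code
        self.data = {d: {} for d in codes}    # per-domain data containers
        self.revoked = set()                  # domains disabled by the company

    def unlock(self, domain, code):
        """A domain opens only with its own code and only while not revoked."""
        return domain not in self.revoked and self.codes.get(domain) == code

    def revoke(self, domain):
        """Called by the company when the employee leaves: disable access
        to the domain and wipe its data selectively, leaving the rest intact."""
        self.revoked.add(domain)
        self.data[domain].clear()
```

The selective wipe is the crucial property: company data disappears while private photos and documents survive, which is what distinguishes containerization from a full device reset.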
Both scenarios can utilize any of the security measures discussed in the fol-
lowing subsections (security tokens, hardware keys, smart containers). Before we
elaborate on these issues, we note that an additional complication can arise
depending on whether or not data is replicated on a user’s device. Data replication
necessitates identical copies existing at a central company-owned repository as well
as on all devices that request access to that data. As is well known, replica control,
i.e., making sure that all copies remain identical at all times, is a non-trivial problem
(Weikum and Vossen 2002); yet replication has the obvious advantage that data
remains available in the presence of network failures. Conversely, if keeping replicas is not
an option, data access depends on network availability. We consider replication a
purely technical issue and therefore do not consider it any further here.
Chow et al. (2009) outline the main issues from a general cloud computing
perspective without presenting actual solutions: (1) Trusted Computing:
“authenticate” hardware systems; allow computers to prove their compliance with
certain requirements; make sure hardware is not tampered with. Question: how can
this be adapted for
virtual hardware in clouds? (2) Working on encrypted data: New approaches, such
as searchable encryption, homomorphic encryption or private information retrieval
4.4 The Digital Workplace

As a term, the Digital Workplace is not new and was most likely first proposed by
Jeffrey Bier in around 2000. Bier had previously worked for Lotus Corporation as
general manager of the spreadsheet division. In 1996 he founded Instinctive
Technologies, later renamed eRoom Technologies, based upon the work he had
undertaken at Lotus on collaborative technologies. Bier established five key
requirements of a digital workplace—all of which still hold today. These are
described later in this section.
There is no agreed-upon definition of a digital workplace, in part because the
concept is still evolving. Marshall (2014) describes a digital workplace as simply
being the virtual equivalent of the physical workplace, although it is far from as
simple as this. Marshall does accept that this is an imprecise definition, and indeed
the shape of a digital workplace will vary between organizations. Key
characteristics of any definition are:
1. People focus. People are the central element of any digital business, and the
overarching goal of any initiative is to develop a working environment that
allows people to work how they want, when they want, and—to a degree—where
they want, whilst also enhancing productivity and efficiency.
[Figure: A digital workplace framework connecting people, organization, and tools:
Mindset (values, expectations, and ways of thinking that determine how people and
organizations act), Capabilities (where people and tools come together, serving the
purposes of individuals, business, and the enterprise), Enablers (where the
organization and tools come together in structures and processes facilitating
change), and Technology Adoption.]
However, for these to work effectively, there are five management activities that
need to be carried out in a way that facilitates the provision of the services above:
1. Strategic planning
2. Governance and operational management
3. Proactive support for adoption
4. High quality user experience
5. Robust, secure and flexible technology.
Jeffrey Bier established five key requirements of a digital workplace. Whilst now
over 15 years old, these are still highly relevant to the working environment of
today. The requirements he proposed are as follows.
The key enabler of the digital workplace has been the rapid development of highly
functional collaborative technologies. Such technologies include those that have
been available for some years, such as e-mail and Intranets, along with the most
recent developments like big data and cloud computing. None of these in isolation
offer the capability to transform a traditional workplace into a fully-functioning
digital workplace, but as a suite of technologies, they permit much more radical
change. Deloitte summarizes the array of technologies that form the basis of the
technology infrastructure of the digital workplace. Messaging technologies such as
email, instant messaging, microblogging and SMS messaging provide a fast way for
colleagues to communicate. Productivity and efficiency may be enhanced through
the use of traditional IT software tools such as word processors, spreadsheets,
presentation software and the like. Collaboration technologies are used by all
employees to work with each other as well as external partners. Web conferencing
is a common tool of this type, along with Wikis, virtual communities and social
collaboration tools like Asana, Slack and Google Docs. Communication tools are a
key element of a digital workplace enabling information sharing and internal
publishing and can include portals and intranets, blogs as well as personalization of
websites. Self-service applications provide flexibility in working arrangements and
include HR systems, ERP and CRM applications. In some situations,
For many, a digital workplace is often most easily conceptualized in the physical
sense. Commonly held views are of work spaces that are highly contemporary,
often sterile, open-plan, loaded with physical technology, yet sparsely populated as
most employees are connected remotely while working in far-flung, yet unknown
locations. While this may be the case for some digital workplaces, the reality for
most is often very different; indeed, little uniformity exists, as the culture, industry,
and physical location of the organization greatly impact how each digital
workplace actually looks and operates. Because of this variability, it is neither
possible nor useful to try to characterize the physical space of a digital workplace.
However, three distinctly different styles are often observed. These include those
that have been designed with the principal goal of facilitating collaboration, those
designed to enable creativity, and finally workplaces that are intended to provide an
underlying sense of fun.
Collaboration
Many of the technologies employed within a digital workplace are intended to
facilitate internal and external collaboration. The likes of email, social
collaboration tools like Asana or Slack, and cloud-based file-sharing platforms such
as Google Docs all provide the basis for synchronous and asynchronous
collaboration. However, the physical space also plays an important part. The
organization needs to be cognizant of the nature of the collaboration that occurs and
provide a physical environment that enables it. Open-plan, flexible workspaces are
commonly used in this regard. Different types of seating (e.g., couches and bean
bags) are also used in some situations. From a technology point of view, interactive
touch screens are a useful addition to the collaborative workspace.
Creativity
For those organizations operating within the creative industries such as design,
architecture, art, etc., the physical space is a crucially important element in pro-
viding an environment in which the creative juices can flow. This can involve the
use of light, color, warmth and entertainment. Again, the nature of the business will
be the key determinant in what the physical space needs to look like.
Fun
This is a challenging one. We all want work to be enjoyable. We want workplaces
that our staff want to go to. Making the workplace fun can assist with that, but at
what stage does this get out of hand? Google, for example, has workplaces that
portray a strong sense of fun: employees move about the buildings on scooters, and
slides and fire poles are incorporated. There are games rooms and entertainment,
plus a range of other features that most would view as fun-focused. But in many
ways, this is the type of portrayal that Google strives for, and it uses such physical
spaces to instill this type of organizational culture in its business. This type of
physical space is not uncommon in this type of business. However it is not suited to
all industries. Would you expect your elected government representative to operate
in such a digital space? What about your tax department or your local police
station? Fun as a characteristic of a digital workplace has its “place”. However it is
not suited to every type of business and needs to align with the culture and external
perception the business wishes to portray.
While the concept of a digital workplace is not new, it is important to recognize
that it is not simply achieved by renaming intranet technology, by adopting social
media, by introducing mobile technology, or by allowing workers to work from
remote locations at flexible times. For a business to provide a true digital
workplace, a combination of aligned tools/technologies, engaged employees, and
management processes needs to be present in equal measure.
Novel digital technologies such as the ones discussed earlier in this book, simply
unprecedented ways of using known technologies, or sometimes just immense
increases in the performance of proven technologies provide the elements needed to
drive the digital economy. Here, motivated entrepreneurs with ingenious—or
sometimes downright strikingly simple—ideas will find fertile soil on which to
thrive. If potent and ideally patient risk-capital providers then come into play, a
rising star of the digital economy may be born. But how can such a star be
prevented from burning up in the fire of scorched millions after a short time? How
can a business model’s efficiency and sustainable success be ensured?
This question arises not only for the entrepreneurs in the digital economy as we
discuss in the next section, but also for any company competing for market share.
Companies are required to test their business models and to screen them for
potential success by digitizing them. This must not be a one-time exercise, but has to
be established as a process of continuous monitoring and continuous improvement
of the respective business model. Agility is the order of the day, and the organization
needs to exhibit a high level of willingness to change. It must also be able to quickly
draw conclusions from the emergence of disruptive technologies, which threaten
existing business models but also offer opportunities for developing new markets.
In addition, the demands in terms of governance and compliance are on the rise and
companies, especially those operating in a global context, are faced with new
challenges on a daily basis.
[Figure: Business process management as the link between external influencing
factors (society, environment, markets, business partners, job market, investors)
and the company: strategy drives business processes, which are subject to
continuous improvement toward business excellence.]
corporate vision with business objectives, strategies and an objective risk assess-
ment as well as a comprehensive definition of business processes with process
context, processes, business rules, business objects, competences, and
responsibilities, as well as the relevant corporate structures. Construction plans of
the digital economy must not be rigid, as rigid plans would fail. Rather, they must
be agile, spanning a multitude of “conceivable” business processes that themselves
have an agile character. Mapping business processes into semiformal models then
allows for rapid implementation, in which the technical requirements of the
business process models are realized in productive enterprise software systems,
which are increasingly sourced from the cloud (Software as a Service, SaaS).
However, this method has so far proven itself only in the initial phases of the
digital economy. Further phases will follow, in which individual business processes
must be connected into enterprise-wide process chains and process networks.
Seamless integration and the best possible usability for all users involved are required
just as much as optimal exploitation of the potential of the Internet of Things
(IoT). These aspects are briefly discussed in Chap. 6.
BPM Trends
Given the high volatility of business processes in the digital economy, it is
important that the underlying BPM platform is able to respond flexibly to future
trends or already considers the currently foreseeable trends. Interesting in this
context is a study by Forrester Research (cf. Richardson et al. 2012), in which the
analysts identify five trends that will change BPM fundamentally in the future.
4.5 BPM and the CPO: Governance, Agility and Efficiency for the Digital Economy 203
Based on these trends, we have defined the following change potentials for the
fields of action listed above:
• Competition:
Is the business process critical in terms of competitiveness? In other words, do
customers notice the quality and performance of the process, and does it directly
or indirectly influence their purchasing decisions?
• Value:
In terms of value added, is it a core process of the company?
• Quality:
How does the business process affect the quality of products and services offered
by the company?
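As a rough illustration of how such a screening might be operationalized, the following sketch scores a process on the three questions above; the class names, weights, and threshold are invented for this example and are not part of the Horus method.

```python
from dataclasses import dataclass

@dataclass
class ProcessAssessment:
    name: str
    competition_critical: bool  # do customers notice quality/performance?
    core_value_process: bool    # core process in terms of value added?
    quality_impact: int         # 0 (none) .. 3 (strong) effect on product/service quality

def classify(p: ProcessAssessment) -> str:
    """Coarse two-class split based on the three screening questions."""
    score = (2 if p.competition_critical else 0) \
          + (2 if p.core_value_process else 0) \
          + p.quality_impact
    return "high priority" if score >= 4 else "standard priority"

print(classify(ProcessAssessment("order-to-cash", True, True, 3)))      # -> high priority
print(classify(ProcessAssessment("travel expenses", False, False, 1)))  # -> standard priority
```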
As a result of this brief review, business processes can be classified and then
implemented with different priorities and using various methods. At a
generic level, two classes can be distinguished:
[Figure: Governance, risk and compliance (GRC): laws, regulations, norms and standards, values and ethical principles, and corporate objectives feed compliance management and risk management on the basis of an enterprise model; legal and process GRC and IT GRC span the application software and the IT platform]
• Legal and Process GRC:
Legal and process GRC places the operative business at the center of consideration.
Are all legal requirements fulfilled? Are customs and tax issues resolved?
Moreover, do the business processes guarantee the observance of
norms (e.g., ISO, DIN) and standards (including industry standards) across
national borders? Market risks and the threat of business interruption
(e.g., force majeure) must also be taken into account.
• IT GRC:
Information technology not only plays an important role in GRC through the
implementation of the GRC mechanisms (think of the automation of business
processes and the monitoring of key indicators), but also presents significant risks
itself. In addition, information resources are subject to numerous regulations, for
example regarding data protection (e.g., the German Federal Data Protection Act).
An appropriate course of action guarantees a secure and legally compliant use of
all information resources. Furthermore, programmatic anomalies and violations of
existing regulations must be avoided, e.g., regarding the segregation of duties (SoD).
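An automated SoD check of the kind alluded to here can be sketched in a few lines; the conflict matrix, role names, and duty identifiers below are invented examples, not the rule set of any actual GRC product.

```python
# Pairs of duties that one and the same person must not hold (example data).
CONFLICTING_DUTIES = {
    ("create_vendor", "approve_payment"),
    ("enter_invoice", "approve_payment"),
}

def sod_violations(user_duties: dict) -> list:
    """Return (user, duty_a, duty_b) for every conflicting pair a user holds."""
    violations = []
    for user, duties in sorted(user_duties.items()):
        for a, b in sorted(CONFLICTING_DUTIES):
            if a in duties and b in duties:
                violations.append((user, a, b))
    return violations

assignments = {
    "alice": {"create_vendor", "approve_payment"},  # conflicting pair
    "bob": {"enter_invoice"},                       # fine on its own
}
print(sod_violations(assignments))  # -> [('alice', 'create_vendor', 'approve_payment')]
```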
[Figure: Interplay of Business Process Management (BPM), Business Rules Management (BRM), and Enterprise Performance Management (EPM), connecting systems, people, documents, decisions, and events]
the completeness of the business process and master data structures are pursued,
along with the quality.
• BPM and Business Rules Management (BRM):
The business rules defined in a cross-process context provide a framework into
which the business processes must be fitted. Conversely, new rules arise within
defined business processes. For this reason, it is recommended to align the rules
defined at the different levels with each other through simultaneous planning.
• BPM and Enterprise Performance Management (EPM):
In EPM, performance measurement systems are designed and used to monitor
the company’s performance and therefore also the performance of business
processes. In this respect, EPM is an essential component in business process
governance and therefore simultaneous planning is also recommended here.
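As a minimal illustration of such a process performance indicator, the following sketch computes the average cycle time from a toy event log; the log format and figures are assumptions made for this example.

```python
from datetime import datetime

# Toy event log: (case id, process start, process end).
log = [
    ("case-1", datetime(2017, 5, 1, 9, 0), datetime(2017, 5, 1, 17, 0)),  # 8 h
    ("case-2", datetime(2017, 5, 2, 9, 0), datetime(2017, 5, 3, 9, 0)),   # 24 h
]

def avg_cycle_time_hours(events) -> float:
    """Average wall-clock duration per case, in hours."""
    durations = [(end - start).total_seconds() / 3600 for _, start, end in events]
    return sum(durations) / len(durations)

print(avg_cycle_time_hours(log))  # -> 16.0
```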
Practice reports and surveys show that, across countries and industries, more and
more companies are in danger of choking on their own complexity. Often the reasons
for the complexity are quite understandable, especially when they stem from the
variety and complexity of the products, or from the market's demand for
individual customer experiences. But even in these cases it is worth taking a look
at the company's business processes. Often innovative technologies provide
[Figure: Cross-company core business processes supported by a corporate BPM/BRM repository, an SOA integration platform, and cross-company master data management (products, suppliers, customers, sites), alongside company-specific processes]
17 eTOM® (enhanced Telecom Operations Map) is a business process framework by the
TeleManagement Forum aimed at service providers of the telecommunications industry and their
partners (cf. Schönthaler et al. 2012).
18 SCOR® (Supply Chain Operations Reference Model) is a process reference model by
the Supply Chain Council with the aim of discussing and improving supply chain management
procedures within a company and with business partners (cf. Schönthaler et al. 2012).
• Financial Services:
Purchasing and accounts payable, order management and accounts receivable,
general accounting, travel and expense reports, account reconciliation.
• Trade:
Logistics, freight invoices, order management, returns, account reconciliation.
• Insurance:
Quotations, contract conclusion, premium statement, accounts receivable,
claims processing.
• Healthcare:
Social insurance accounting, processing of reimbursement claims, accounts
receivable, rejected reimbursement claims.
Knowledge-intensive Processes
While cost considerations have been paramount in the BPO examples considered
so far, the superior knowledge of the service provider plays the decisive role in
the following examples. In general, the scope of outsourced knowledge-intensive
processes, and in particular their complexity, is significantly higher than in the
"simple" transaction processing mentioned above. For the end customer, the
advantages of outsourcing knowledge-intensive processes manifest mostly in
improved product quality and the accelerated implementation of innovations.
A few examples will illustrate this:
• Distribution Services:
The service provider takes over the distribution of certain products in a clearly
defined target market. Often this is not limited to selling the products, but also
includes marketing, warehousing, and end-customer service.
• Contact Center:
Qualified service providers operate entire contact centers on behalf of their
customers. Multi-modal and multilingual inbound and outbound services are
offered. Typical inbound services include customer service and care (loyalty,
commitment, satisfaction) or internal IT help desks for technical and user
support. Typical outbound services include campaign management and tele-
marketing, telesales, sales support (appointments), tele-market research or also
multi-modal dunning processes. Up and cross-selling opportunities are put to
full use with a high level of expertise.
• Personnel Management and Payroll:
A very common application for BPO is personnel management (full
recruit-to-retire processes) and especially payroll, i.e., the settlement of wage and
salary statements. Although cost considerations certainly play a role in the
personnel area, the service provider's high level of expertise and its ability to
keep that expertise current are the primary motivations for BPO here.
[Figure: BPO orchestration: the customer processes of several service customers are mapped onto the business services of the provider]
Process costs are typically at the center of economic BPO studies. They are
compared with the expected benefits in order to arrive at a sound basis for
decision-making. Keep in mind that BPO always raises technical issues. How does the
integration of the outsourced business processes with internal processes or other
outsourced processes take place? How are corporation-wide processes realized
beyond corporate boundaries, and how does the overall process control take place?
Can business activity monitoring be guaranteed across all processes? Who will
ensure compliance with business rules within the overall context of the virtual
enterprise? How will a uniform and consistent master data management be ensured?
Are collaborative planning processes provided for, involving all participating
business partners, including customers and suppliers? The sheer quantity of these
issues shows that significant cost drivers can be found both in the initial technical
implementation of BPO and in ongoing operation.
Integration components in the form of a mediation layer are shown in Fig. 4.25.
We speak of a mediation layer because the service provider strives for economies
of scale and will therefore always try to handle all outsourced processes internally
as consistently as possible. This is reflected in largely standardized procedures with
appropriate control and data flows. On the other hand, since the service customer
expects its individual requirements to be implemented, a customer-specific
transformation of the control and data flows exchanged between service provider and
service customer must take place within the mediation layer. BPO service
customers also demand, not least because of GRC requirements, a degree of
transparency regarding the process results, but also regarding the process execution
itself. Accordingly, the service provider is required to offer customized reporting,
meaningful key indicators, and analyses.
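The transformation role of such a mediation layer can be sketched as follows; the message formats, field names, and customer identifiers are invented for the example.

```python
# The provider's standardized internal message format.
STANDARD_KEYS = {"order_id", "amount"}

def transform_customer_a(msg: dict) -> dict:
    """Customer A sends 'OrderNo'/'Total'; map them to the standard format."""
    return {"order_id": msg["OrderNo"], "amount": msg["Total"]}

# One customer-specific transformation per service customer.
TRANSFORMS = {"customer_a": transform_customer_a}

def mediate(customer: str, msg: dict) -> dict:
    """Apply the customer-specific transformation at the provider boundary."""
    standard = TRANSFORMS[customer](msg)
    if set(standard) != STANDARD_KEYS:
        raise ValueError("transformation did not yield the standard format")
    return standard

print(mediate("customer_a", {"OrderNo": "4711", "Total": 99.5}))
```

Internally, the provider then processes every customer's messages in the same standardized form, which is exactly where the economies of scale come from.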
Model-Based Planning and Implementation of BPO Contracts
As the name suggests, BPO is all about business processes, so BPO is a
"natural" field of application for methods and tools like Horus, and this applies
across the entire life cycle of BPO contracts. In addition, it should be clear from
the foregoing that BPO requires a common understanding, between service
customer and service provider, of the technical business requirements of the
outsourced processes to be implemented. This common understanding can be
specified in a formal way, for example by means of Horus models. How Horus can
be used by the parties involved in BPO, and the resulting benefits, will be
addressed in the following.
In many industries, markets today exhibit a volatility that, until a few years ago,
we had experienced at most in some segments of the financial industry.
Business models with a half-life of only a few years or even months are no longer
uncommon. This, too, is a characteristic of the digital economy. The drivers are in
many cases business process or service innovations that are made possible by new
digital technologies or by new uses of existing technology.
For innovations to be successful in the digital economy, speed and consistency
are more crucial than ever for competitiveness when establishing a position in a
market. It is also important to use the knowledge of the entire business community
for innovation management. For this reason, more and more companies have started
to anchor their innovation management directly within the business community.
Harnessing the entire knowledge and creativity present in the community creates an
ideal basis for generating a continuous stream of innovative customer services and
business processes. This basis is all the more necessary as the social benefits of
innovation are increasingly being questioned in public debate (see Stiglitz and
Greenwald 2014).
For innovation management, we propose a method that is based on the Horus
Method for Social BPM described above (cf. Schönthaler et al. 2012). In the
business community, this gives rise to social innovation networks in which domain
specialists, experts from different disciplines, ideally potential customers and external
partners in the value chain, as well as opinion leaders and idea contributors are
connected to each other in a social network: the innovation community. The
community works on a web-based collaboration platform on which, alongside
popular social media, intuitive software tools are used to graphically model
processes and services. Active participation in the community work contributes
each individual's creativity and knowledge, which is then connected and amplified
in group-dynamic processes and leads to process and service innovations. The
quality of the innovation process depends on community members contributing
relevant knowledge, experience, creativity, and inspiration openly and
unconditionally, and on their willingness to link these ingredients with those of the
other members in a way that benefits the community.
Figure 4.26 shows how services can be provided around a social collaboration
platform for the entire innovation lifecycle. In this way, not only the actual
generation of innovations can be supported, but also the learning processes that
lead up to or accompany innovation, as well as the accompanying research,
development, and marketing as part of establishing the innovation in the market.
Collaboration then becomes the lifeblood of innovation management and the
driver for agile business processes as a construction plan of the digital economy.
Up until now this section has dealt with the important fields of action for the
implementation of a holistic BPM platform in a company. In practice, however, it
turns out time and time again that even after the consistent implementation of BPM
projects, the BPM commitment gradually declines (sometimes called
"BPM erosion"). As a result, only short-term benefits are achieved, while
significant long-term potential goes unrealized. Such situations can be avoided if,
along with the implementation of a BPM platform, organizational measures for
the establishment of a dedicated BPM
[Figure 4.26: Services around a social collaboration platform: collaborative marketing & sales, collaborative education, collaborative consulting, collaborative development, collaborative innovation, and collaborative research, offered on demand (SaaS or on premise), together with partner and education services]
[Figure: A business process factory: roles such as process governor, factory manager, solution architect, business analyst, IS engineer, developers, and administrator work with BPM methods & tools, best-practice reference process models, and a services & integration layer to deliver seamlessly integrated business processes on top of enterprise applications (Oracle Apps, SAP, MS Dynamics, …)]
no longer require a data warehouse as their base, (2) the fact that many tools can
easily be plugged together today via their APIs, and (3) that many solutions have
become available as open-source. Several big data processing goals require specific
solutions; for example, looking for similarity of texts or documents can be based on
sophisticated techniques such as minhashing or locality-sensitive hashing as
described by Leskovec et al. (2014). Sentiment analysis is the topic of Liu (2015).
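To give a flavor of the minhashing technique mentioned above, here is a deliberately simplified sketch; real implementations, as described by Leskovec et al. (2014), use proper families of hash functions rather than salting Python's built-in hash, and combine signatures with locality-sensitive hashing for scale.

```python
import random

def shingles(text: str, k: int = 3) -> set:
    """The set of all k-character substrings of the text."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}

def minhash_signature(s: set, num_hashes: int = 64, seed: int = 42) -> list:
    """One minimum hash value per salted hash function (simplified)."""
    rng = random.Random(seed)
    salts = [rng.getrandbits(32) for _ in range(num_hashes)]
    return [min(hash((salt, sh)) for sh in s) for salt in salts]

def estimated_jaccard(sig_a: list, sig_b: list) -> float:
    """The fraction of agreeing signature positions estimates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

a = minhash_signature(shingles("the quick brown fox"))
b = minhash_signature(shingles("the quick brown cat"))
print(estimated_jaccard(a, b))  # a value between 0 and 1, high for similar texts
```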
The process flow shown in Fig. 4.12 was originally described by Steffen (2013).
The social media analytics framework shown in Fig. 4.13 was originally proposed
by Stieglitz and Dang-Xuan (2013). The Stieglitz-Dang-Xuan framework has been
used, for example, by Ruland (2015) for an in-depth analysis of public
microblogging and parliament protocol data to find out how the German members
of the Bundestag as well as the German public perceive TTIP, the highly
controversial Transatlantic Trade and Investment Partnership under negotiation at the
time of this writing.
For details on the Smart Data Research Program we mentioned in the text, the
reader is referred to www.digitale-technologien.de/DT/Navigation/EN/Home/home.
html. Figure 4.15 on data marketplaces is originally from Muschalle et al. (2013);
the topic has been studied intensively by Schomm et al. (2013) and Stahl et al. (2014,
2016).
An introduction to distributed ledger technology can be found in ASTRI (2016),
to blockchain technology in Diedrich (2016) or Drescher (2017). These technolo-
gies have a number of applications, for example, in mortgage loan applications,
trade finance, digital identity management, regulatory compliance, or cryptocur-
rency. Most prominent in the latter area is Bitcoin, originally proposed by Naka-
moto (2008). Bitcoin is an example of an unpermissioned distributed ledger
platform, meaning that it is maintained by public nodes and is accessible to anyone.
Another platform type is the permissioned one, which involves authorized nodes
only and hence facilitates faster, more secure, and more cost-effective transactions;
an example is Corda (see www.corda.net/). The development of Corda is led by R3,
a fintech company that heads a consortium of over 70 of the world’s largest
financial institutions. Other such consortia are the Enterprise Ethereum Alliance
(entethalliance.org/), Ripple (ripple.com/), or Hyperledger (www.hyperledger.org/).
For up-to-date information on permissioned blockchains we refer the reader to the
respective Web page of IBM Fellow C. Mohan at bit.ly/CMbcDB.
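The core idea behind these ledgers, each block cryptographically committing to its predecessor, can be illustrated with a toy hash chain. This is a didactic sketch only: it deliberately ignores consensus, signatures, and everything else that makes Bitcoin or Corda actual ledger platforms.

```python
import hashlib
import json

def make_block(prev_hash: str, payload: dict) -> dict:
    """A block commits to its payload and to the hash of its predecessor."""
    body = {"prev": prev_hash, "payload": payload}
    digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    return {**body, "hash": digest}

def chain_is_valid(chain: list) -> bool:
    """Recompute every hash and check each block's link to its predecessor."""
    for i, b in enumerate(chain):
        body = {"prev": b["prev"], "payload": b["payload"]}
        expected = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if b["hash"] != expected:
            return False
        if i > 0 and b["prev"] != chain[i - 1]["hash"]:
            return False
    return True

genesis = make_block("0" * 64, {"tx": "genesis"})
chain = [genesis, make_block(genesis["hash"], {"tx": "transfer 1 unit"})]
print(chain_is_valid(chain))  # -> True
```

Tampering with any payload invalidates that block's hash and, transitively, every later link, which is what makes such a ledger tamper-evident.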
5 Digitization and Disruptive Innovation
After having discussed technical developments over the last few decades and
presented strategies for IT-related decision making in specific areas, the message of
this chapter is that companies need to change the way they consider their customers,
as well as their internal business operations. One core keyword here is digitization;
another is disruption. Our goal is to discuss what innovation and disruption mean
and to provide some typical examples of business disruption. Christensen's
theory states that traditional companies cannot be disruptive since they are busy
keeping their customers and fighting the competition; hence, they will ultimately
fail when disruptors take over. However, there might be ways even for a traditional
company to survive, which we will discuss.
[Figure 5.1: The innovation cycle: market research and R&D insights, opportunity evaluation (e.g., SWOT analysis), value proposition, pilot development and testing (proof of concept), development, preparation of market entry, and delivery to early adopters]
processes. Moreover, changes in the global marketplace are part of the daily
agenda of any business: companies are increasingly forced to adapt to
(new) markets, customers, competitors, and business partners, but also to new
requirements in terms of governance, risk, compliance, and security management
(GRC+, see Chap. 2). We also saw that, "thanks" to Big Data, numerous options
exist today to enhance a business process or the knowledge that a company has
about customers, services, and products, provided the respective enterprise is aware
of these options and is willing, or even has a strategy, to exploit them.
Most often, innovation is seen as a process, often one that is cyclical in nature,
like the one shown in Fig. 5.1. It typically starts with the development of ideas for
new services or products, or with the recognition or discovery of novel customer
needs. This may or may not be based on market research,1 or just on brainstorming
or insights from a company's research and development (R&D) department. Once
an innovative idea has been settled upon, opportunity evaluation can start, as can
the formulation of a value proposition; the former may be based on a SWOT
analysis or other tools from the Horus method described in Chaps. 2 and 4. This
phase considers the risk of failure, the cost of development, and the return on
investment (ROI), as well as the competition.
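The quantitative side of this phase can be illustrated with a toy expected-ROI calculation that discounts the return by the risk of failure; all figures below are invented.

```python
def expected_roi(dev_cost: float, expected_return: float, p_failure: float) -> float:
    """Expected return on investment, discounting the return by the failure risk."""
    expected_profit = (1 - p_failure) * expected_return - dev_cost
    return expected_profit / dev_cost

# Hypothetical: 100k development cost, 300k return if successful, 40% failure risk.
print(round(expected_roi(dev_cost=100_000, expected_return=300_000, p_failure=0.4), 2))  # -> 0.8
```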
If an evaluation indicates positive opportunities, a prototype/pilot of the product
or service can be developed, in order to conduct a proof of concept, possibly with
selected test customers or users. If the intended purpose can indeed be verified and
the considerations around a wider introduction be validated, development can start.
This phase will be accompanied by a preparation of the market entry (through
1 Former Apple CEO Steve Jobs never believed in market research.
5.1 Innovation. Social Innovation Labs 225
marketing and sales departments), and finally delivery can first go to early adopters
and then to a wider audience.
Not surprisingly, these phases, sometimes abbreviated as Definition, Discovery,
Development, and Delivery, have touchpoints with existing business
processes and nowadays make extensive use of big data. As we have discussed,
data can be acquired from reviews, blogs, and social media to find out what people
say about a (new) service or product; recommenders can be used to create
awareness; and data mining techniques like classification can be employed to
identify pilot customers or those to whom the new product should be offered first.
Occasionally, the cycle shown in Fig. 5.1 involves the development of a new
business model, typically at the very beginning, when a new product or service is
being conceived. A typical example is Apple's iTunes store. When the first iPhone
was released in 2007, combining for the first time a mobile phone with a
PDA, a music player, and a Web-capable device, a new business model for
marketing and distributing music came with it, which soon expanded into a
distribution and sales channel for software (in the form of apps) as well.
We mentioned cloud revenue models in Chap. 2. Among them are the freemium
model, which combines free basic services with charges for advanced ones (or,
often, free access to services in exchange for commercial use of the customer's
data), and the subscription model. Online advertising as a prominent Web
business model was discussed in Chap. 3. Other Internet business models that have
evolved over time (many of which are "digitized" versions of classical business
models) include the following:
• Auctions: the process of buying and selling goods or services by offering them
up for bid, taking bids, and then selling the item to one of the bidders.
• Brokerage: brings together buyers and sellers and charges a fee per transaction
to one or another party. Examples are Charles Schwab or Bayleys. We also
described the concept of a cloud intermediary in Chap. 4, which is a form of
brokerage.
• Razor and Blades: an item is sold at a low price (or even given away for
free) in order to increase sales of a complementary good, such as the supplies
needed to use the item. An example of this is printers (inkjet or laser), which are
typically amortized via their consumable supplies. The concept, also known as
freebie marketing, is wrongly attributed to the founder of the Gillette Safety
Razor Company, who gave away razors for free and made people pay for the
blades they needed. The concept is still common in the mobile phone market,
where devices are often subsidized via the contract.
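The arithmetic of the razor-and-blades model is simple; with invented figures, the break-even point in consumables looks like this:

```python
from math import ceil

def cartridges_to_break_even(device_subsidy: float, margin_per_cartridge: float) -> int:
    """Number of consumable sales needed to recover the loss on the device."""
    return ceil(device_subsidy / margin_per_cartridge)

# Hypothetical printer sold 30 below cost, with a 12 margin per ink cartridge:
print(cartridges_to_break_even(30.0, 12.0))  # -> 3
```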
• A value proposition, i.e., a statement of how the products or services offered can
create value for the customer.
• A revenue model, i.e., a description of the cash flows that will generate profit.
• A specification of the target customer or the market segment to which the
products or services are to be offered for the purpose of creating value and
revenue.
• Distribution channels, the actual ways in which the company plans to reach its
customers.
[Figure 5.2: The Business Model Canvas: key partners, key activities, key resources, value propositions, customer relationships, channels, customer segments, cost structure, and revenue streams]
documenting existing, business models, as shown in Fig. 5.2. Each box of this
visual chart poses different questions, and the idea is to collect all the information
relevant to a business model for a specific purpose or area. We now list the
questions contained in each box; notice that these questions are not strictly separate,
but also connect the various boxes and aspects:
• Key Partners:
Who are our key partners?
Who are our key suppliers?
Which key resources are we acquiring from partners?
Which key activities do partners perform?
• Key Activities:
What key activities do our value propositions require?
Our distribution channels?
Customer relationships?
Revenue streams?
• Key Resources:
What key resources do our value propositions require?
Our distribution channels?
Customer relationships?
Revenue streams?
• Value Propositions:
What value do we deliver to the customer?
Which of our customer’s problems are we helping to solve?
What bundles of products and services are we offering to each customer
segment?
Which customer needs are we satisfying?
• Customer Relationships:
What type of relationship does each of our customer segments expect us to
establish and maintain with them?
Which ones have we established?
How are they integrated with the rest of our business model?
How costly are they?
• Channels:
Through which channels do our customer segments want to be reached?
How are we reaching them now?
How are our channels integrated?
Which ones work best?
Which ones are most cost-efficient?
How are we integrating them with customer routines?
• Customer Segments:
For whom are we creating value?
Who are our most important customers?
• Cost Structure:
What are the most important costs inherent in our business model?
Which key resources are most expensive?
Which key activities are most expensive?
• Revenue Streams:
For what value are our customers really willing to pay?
For what do they currently pay?
How are they currently paying?
How would they prefer to pay?
How much does each revenue stream contribute to overall revenues?
In addition to these questions, each box lists further aspects or asks for various
details, e.g., motivations for partnerships, categories of key activities, types of
resources, characteristics of the value proposition or the cost structure, channel
phases, or types of revenue. The Business Model Canvas (strategyzer.com/canvas)
has received considerable support and can provide guidance not only for existing
companies entering new business areas, but also for startup companies that
need to better structure their product and business idea. As a sample application of
the canvas, consider Fig. 5.3, in which the Business Model Canvas has been
applied to Google as an example.
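The nine boxes of the canvas can also be captured as a simple data structure, e.g., for tooling that collects the answers to the questions above. This is a hypothetical structure, not an official Strategyzer format, and the Google-style sample entries are our own simplification.

```python
from dataclasses import dataclass, field

@dataclass
class BusinessModelCanvas:
    key_partners: list = field(default_factory=list)
    key_activities: list = field(default_factory=list)
    key_resources: list = field(default_factory=list)
    value_propositions: list = field(default_factory=list)
    customer_relationships: list = field(default_factory=list)
    channels: list = field(default_factory=list)
    customer_segments: list = field(default_factory=list)
    cost_structure: list = field(default_factory=list)
    revenue_streams: list = field(default_factory=list)

canvas = BusinessModelCanvas(
    value_propositions=["fast, relevant web search"],
    customer_segments=["web users", "advertisers"],
    revenue_streams=["online advertising"],
)
print(canvas.revenue_streams)  # -> ['online advertising']
```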
An important question for companies that have some history already, that have
been in their market for some time, and that have had a fair degree of success, is
how to produce innovation. How can they keep up with new technical develop-
ments, changing customer tastes and demands, or product or service innovations
made elsewhere by the competition? One way, which we will discuss in Sect. 5.3
below, is to disrupt an existing industry or business approach by a totally new and
unique way of doing things. However, as we will see, this might be difficult for an
established company that has to continue to pay its employees and that needs to
monitor the market and the competition in order to keep up.
In this situation, a different approach that has been made popular through a
variety of (successful) examples is to establish an innovation lab. Such a lab is a
physical or virtual space intended for the initiation, conception and testing of
innovative ideas. Based on some relevant infrastructure, an innovation lab enables
cooperation and collaboration among people from distinct backgrounds and disci-
plines, which may even include (future) customers. The main goal of such a lab is
the interdisciplinary exchange of ideas, information, and knowledge, and the
underlying approach is often based on design thinking. If an innovation lab is run
by an established company, it will ideally be integrated into the company’s inno-
vation process(es).
Innovation labs, which are often company-owned, but might also be run by a
collection of different entities, e.g., universities, business development agencies,
chambers of commerce, sponsors, etc., can be seen in a wider context as one form
of digital lab; other forms are:
Another type of innovation lab is the co-working lab, where typically people
with vastly different backgrounds work towards a common (innovation) goal.
Instead of discussing these types in detail, we look at a few examples. We note,
however, that these labs are often referred to as social innovation labs because
they benefit from the social interaction of their participants. Our first example is
from Deutsche Bahn (DB) AG, the main railway
operator in Germany. Although in recent times DB is not exactly known for
punctuality and flawless technology, it has launched a number of initiatives that aim
to improve current processes and procedures or take a look at future developments.
One such virtual activity is Bahn.de/ideenschmiede, where customers can submit
new ideas and contribute to the development of new products; the
platform regularly runs competitions in which anybody can participate, and people
can also comment on product suggestions that have been made. A recent (and
probably not too serious) proposal, for instance, suggested adding a sauna car to
German trains in which people could enjoy spa amenities during long-distance trips.
The Web site inside.bahn.de/innovation/ lists a number of other DB approaches to
innovation. Our second example is the Porsche Digital Lab that was opened in
Berlin, Germany, in August 2016 with two primary goals: first, to identify and test
innovative information technologies relevant to motor vehicles, customers, and
staff; second, to serve as a platform for collaboration with other technology
companies, venture capitalists, startups, and science.
Another form of lab is what the German air carrier Lufthansa is experimenting with.
A Flying Lab is an event that takes place on board a flight: a kind of mini
tech conference where distinguished speakers discuss future tech scenarios and how
human and machine are fusing through digitization (or designers present their latest
fashion collection), and passengers can follow and comment via the onboard
WLAN network.
A virtual lab similar to the one just described was established by Starbucks under
mystarbucksidea.force.com/, where people can publish ideas regarding an
improvement of Starbucks’ business, other people can comment and essentially
“vote” on them, and top-rated ideas might be put into action by Starbucks. For
example, business people who work near a Starbucks outlet and regularly have their
morning coffee there suggested a way to skip the line and have their coffee ready as
soon as they enter the shop. As a result, in some countries (including the UK)
Starbucks now enables customers to order over a smartphone app so that the
product is waiting for them at a specific pickup time.
The basic idea of social innovation labs is that the members of an innovation
community collaborate in a social network, to exchange ideas about how to
overcome identified disharmonies, to define objectives, strategies, product struc-
tures and requirements, to prepare models of business processes and services or
even “just” to find a common understanding of a disharmony and solution
requirements.
Participants perceive the lab as a unique experience in which they accomplish
tasks together as a team, take on responsibility, and contribute new ideas. Experience
shows that group-dynamic processes produce quite surprising results that may
help overcome barriers, identify compromises, and strengthen the sense of
community. The lab acts as a catalyst for creativity and willingness to compromise and
helps to form a common understanding of disharmonies, products, processes, and
services. Finally, lab collaboration sometimes leads to exceptional situations in
which the participants have to deal with uncommon or incorrect behavior of
collaboration partners, with poor quality of results, with misleading instructions, etc.
In such a lab, the entire innovation community should be represented, ideally
including employees across all relevant organizational units of all hierarchical
levels, customers across all target market segments, strategic business partners
including suppliers, and external advisors (consultants, spin doctors, researchers).
This does not necessarily mean that there has to be one member from every
community group involved—often it is sufficient if a community member
represents the interests of an entire group. It is necessary, however, that the
representative has a distinct understanding of the needs, feelings, and goals
of the group she or he represents.
Various roles should be represented in a social innovation lab: The Moderator
establishes an initial structure of the innovation sphere, in which s/he forms groups
(teams) of innovation community members. An essential task is to guide
community members through the lab. The Lab participants are allocated to the
teams according to their competence and expertise, forming multi-site innovation
teams. In each team a Leader will be identified who will supervise integrative
tasks and take responsibility for the team results.
5.1 Innovation. Social Innovation Labs 231
Fig. 5.4 Social innovation labs supporting invention and adoption processes
Depending on the size of the Lab and
the knowledge of the participants, Quality managers are appointed for technical and
substantive review of the resulting models. Experts On Demand are available as a
point of contact for questions on methodology, modeling and tool use.
Social innovation labs particularly support the innovator's skills for social
interaction. And these skills are of paramount importance especially in generative
innovation environments. Therefore, there are interesting applications for Social
Innovation Labs during all stages of innovation, as shown in Fig. 5.4. They range
from supporting brainstorming activities to the experience of disharmony through
to modeling products, services and processes. By using Horus tools in the context
of the Labs, extensive analyses and simulations are possible with reasonable effort.
For service and process innovations, Horus also supports the construction of
prototypes and the piloting of innovations. Of course, Social Innovation Labs
also contribute to the training of members of the innovation community.
We conclude this section by noting that innovation labs have a role model in
TechShop, a concept not restricted to IT applications: "TechShop provides access
to instructional classes, events, and over $1 million worth of professional equip-
ment and software at each location. Each of our facilities includes laser cutters,
plastics and electronics labs, a machine shop, a woodshop, a metalworking shop, a
textiles department, welding stations, class and conference rooms, and much more.
Members have open access to design software, featuring the Autodesk Design
Suite. Huge project areas with large work tables are available for completing
projects and working with others” (www.techshop.ws/).
232 5 Digitization and Disruptive Innovation
The emerging business environment today is characterized by the fact that
everything is becoming digital: Airline tickets are no longer printed; banking
transactions take place almost exclusively electronically (and banks even
charge for manual transactions); reading books or magazines has moved from
physical prints to digital versions on e-readers, iPads, or even smartphones;
music and movies have moved from storage devices that can physically be handled
to streaming over the Web. While these developments can be seen as among the
many consequences of Moore's Law, as we discussed in Chap. 1, and of
technological progress in general, we are now at a stage where digital
transformation is effecting change in every aspect of business, and indeed
almost every aspect of society. While initially
perceived as "going paperless" only (which has never happened to its full potential, at
least not until the time this book was written), it is nowadays seen as more than just
enhancing traditional methods of doing business (e.g., in travel agencies or banks);
it is seen as an enabler of innovation and new forms of creativity, no longer
restricted to a particular domain.
“Digital transformation is the profound and accelerating transformation of
business activities, processes, competencies and models to fully leverage the
changes and opportunities of digital technologies and their impact across society in
a strategic and prioritized way, with present and future shifts in mind. The devel-
opment of new competencies revolves around the capacities to be more agile,
people-oriented, innovative, customer-centric, aligned and efficient. The goal is an
ability to move faster from an increased awareness capability regarding changes to
decisions and innovation, keeping in mind those changes” (www.i-scoop.eu/digital-
transformation/). In Sect. 5.5 we will return to Amazon as a perfect example of
taking advantage of digitization and constant digital transformation.
In a blog post in the MIT Sloan Management Review in January 2014, George
Westerman, Didier Bonnet, and Andrew McAfee identify the nine elements of
digital transformation shown in Fig. 5.5 (see sloanreview.mit.edu/article/the-nine-
elements-of-digital-transformation/).
They consider digital transformation an opportunity “to radically improve per-
formance or reach of enterprises” and identify the three main building blocks shown
in Fig. 5.5. In Chaps. 3 and 4 we have already discussed the first aspect of trans-
forming the customer experience; indeed, we argued that companies are nowadays
interested in providing an absolutely smooth customer journey, and the digital
workplace is now the state-of-the-art. The second aspect of transforming opera-
tional processes was dealt with in Chaps. 2 and 4. The study performed by
Westerman et al. revealed that companies see automation as a way to free their
employees for more strategic tasks, and that the virtualization and digitization of
work enables them to separate actual work from the location where it is performed.
Moreover, we have already seen that Big Data analytics will enable decision makers
to make better decisions. The last aspect of transforming business models refers to a
transformation of how business is conducted, e.g., in the food or banking industry
or in retail, to the introduction of new digital products, e.g., digital tracking devices
that complement physical sports gear, and to an extension of the business from a
multinational to a truly global operation.
5.2 Digital Transformation. The Chief Digital Officer 233
Fig. 5.5 The nine elements of digital transformation: transforming customer
experience (customer understanding, top-line growth, customer touch points),
transforming operational processes (process digitization, worker enablement,
performance management), and transforming business models
Let us stay with the examples just mentioned for a moment: "Fintech" is a
modern shorthand for financial technology, used as a synonym for companies that
employ innovation and current technology to compete with traditional financial
institutions. Fintechs originally tried to reinvent just payment applications,
lending, and money transfers. Meanwhile, they have expanded into more than thirty banking
areas, according to a January 2017 McKinsey report.2 These areas cover insurance,
investment banking, and wealth management, in addition to the ones already
mentioned, and they even extend beyond banking to issues such as virtual
marketplaces or couponing. Fintechs have also pioneered "robo advisors" like vaamo or Whitebox,
which are increasingly replacing human advisors in the banking business and have
gained popularity even with a number of traditional banks already. Typical
examples of innovative fintechs are Ripple (ripple.com/), Simple
(www.simple.com/), Fidor Bank (www.fidor.de/), and solarisBank
(www.solarisbank.de/). The buzzword in all of these examples is "platform."
Successful participants in digitization have created platforms that not only
offer new ways of doing business but also integrate a number of services
previously unrelated to the business at hand.
In the food sector, stores and chains typically have an ordering problem when it
comes to fresh food: If they order too much, food is partially wasted; if they order
too little, they lose sales opportunities. Here, machine learning algorithms can help
retailers determine their optimal stock levels, and we can expect that, once again,
Amazon will be at the forefront of new developments here once Amazon Machine
Learning is applied to its online grocery shopping service, Amazon Fresh.
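The stock-level trade-off just described is essentially the classic newsvendor problem from operations research. The sketch below shows one way a forecast-plus-quantile rule could look; the numbers, the normal-demand assumption, and the function itself are our own illustrative choices, not any retailer's actual method:

```python
import statistics

def order_quantity(daily_sales, unit_cost, unit_price, salvage_value):
    """Suggest tomorrow's stock level for a perishable item.

    Forecast demand from recent sales, then apply the newsvendor
    rule: stock up to the demand quantile given by the critical
    ratio of the cost of under-stocking to total mismatch cost.
    """
    mean = statistics.mean(daily_sales)        # demand forecast
    sd = statistics.stdev(daily_sales)         # demand uncertainty
    underage = unit_price - unit_cost          # profit lost per missed sale
    overage = unit_cost - salvage_value        # loss per unsold, wasted unit
    critical_ratio = underage / (underage + overage)
    # normal-quantile approximation of the optimal service level
    z = statistics.NormalDist().inv_cdf(critical_ratio)
    return round(mean + z * sd)

# A high-margin fresh item: the rule stocks slightly above mean demand
print(order_quantity([96, 104, 99, 101, 100],
                     unit_cost=1.0, unit_price=3.0, salvage_value=0.0))
```

With a high margin the critical ratio exceeds 0.5, so the rule deliberately over-stocks relative to the mean forecast; for a low-margin, highly perishable item it would do the opposite.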
2 www.mckinsey.com/business-functions/digital-mckinsey/our-insights/three-snapshots-of-digital-transformation
3 www.london.gov.uk/sites/default/files/smart_london_plan.pdf
4 www.london.gov.uk/what-we-do/business-and-economy/science-and-technology/smart-london
5 www.data.london.gov.uk/
Just as other core competencies in a company have, over the years, given rise
to a number of positions in the "CxO" domain (e.g., CEO, CFO, COO, CIO, or
CTO), there is room for a new one: the Chief Digital Officer (CDO), an
individual who helps an enterprise or a government agency at the national,
regional, or even local level. CDOs drive growth by converting traditional
analog businesses to digital ones using online technologies and data, at times
overseeing operations in rapidly changing sectors such as mobile applications,
social media, virtual goods, and Web-based information management and
marketing.
As we said earlier, top management needs to be on board when it comes to
digital transformation. A CDO will typically report directly to the CEO of a
company, and possibly also to its board of directors and its shareholders. A CDO
will be able to monitor the main opportunities of digitization: improved process
efficiency, more effective governance, risk, and compliance (GRC) management,
better response to customer needs, cost savings, and the development of new
business models.
5.3 Disruption
vehicles. Transportation was disrupted only later by the arrival of the Ford Model T
in 1908. Mass-production of cars was a disruptive innovation since, due to the new
production lines Ford installed, cars all of a sudden became affordable to many
people.
Christensen even has advice for companies on how to respond to disruption:
React to it when it happens, but do not overreact, and do not give up your
established business! Instead, strengthen your relationships with your most
important customers and invest in innovation. In addition, companies can create
new business units focusing on the opportunities of disruption, such as the
social innovation labs described earlier. For some, the options to react are
limited: The taxi business has no chance of competing with the Uber model; the
only action it could take to compete would be to buy Uber, and since this will
hardly happen, taxi companies can only continue to run their established
businesses for as long as possible. (Of course, there are other possible
actions, already being taken in some countries, namely filing lawsuits against
Uber or simply political lobbying.)
As an aside, we mention that Uber is not considered sustainable by a number of
people, for various reasons. A case in point is the city of Austin, Texas, which has
banned Uber and created its own service: "Unlike Uber, RideAustin is a non-profit.
It charges $2 off the top, and the driver keeps the entire fare—including tip. This
isn’t a model Uber can compete with, even though rides are competitively priced”.6
The following technologies are currently seen by many as the core technologies
for the immediate future upon which much of innovation and disruption will be
based:
• Mobile Internet
• Automation of knowledge work
• Internet of Things
• Self-driving cars
• Cloud technology
• Advanced robotics
• Genome research
• Energy storage
• 3D printing
• Unconventional oil and gas extraction
• Renewable energy
6 www.thenextweb.com/apps/2017/03/15/sxsw-showed-us-the-future-of-ride-sharing-and-its-not-uber/
Much has been written in previous chapters about making the customer transparent by
utilizing all the data he or she creates or simply leaves behind. We discussed how
retailers like Amazon use data mining techniques to analyze what customers might
like to buy, how streaming services like Netflix try to recommend movies a sub-
scriber might like to watch, or how social networks like Facebook deeply analyze
user activities in order to show them the “right” advertisements. They all follow the
slogan “the more you fill in, the better your experience will be.” Indeed, the
experience on the Web today is that “innocently clicking on a link results in ad
targeting that’s hard to shake and our purchases quickly reveal more information
than we intend, such as the infamous example of Target knowing a woman is
pregnant before she’s told her family—and before she’s purchased any baby
products.”7
We also discussed how health or car insurers already are, or soon will be, utilizing
customer data in order to predict customers' individual risk and base premiums on
the resulting ratings. Various questions come to mind in light of this
situation, which is close to, if not already beyond, Orwell's "1984" vision:
We can immediately answer "no" or "possibly not" to the last question: although
many sites allow users to configure their own settings, and one can simply stay
away from certain sites, a complete withdrawal would mean giving up Internet
usage as we know it. But if data collection and analysis continue as we
experience them today, and there is no reason to assume otherwise, some of the
implications will be the following:
7 www.fastcoexist.com/3057514/your-data-footprint-is-affecting-your-life-in-ways-you-cant-even-imagine
5.4 The Price of Data. Publicity Versus Privacy 239
• Intensely personal data gets crunched in order to attract customers (or bribe or
blackmail them); take a look at stalkscan.com to find out what Facebook already
knows about you today, which may be more than your mother does, according
to a 2015 study.8 If you install the browser extension Data Selfie (dataselfie.it),
it can show you your own data traces and reveal how machine learning algorithms
use your data to gain insights about your personality while you are on Facebook.
• Early detection can mitigate catastrophes. This will be particularly beneficial for
earthquake forecasting and prediction and sites like www.quakeprediction.com,
which so far mostly explore statistics and evaluate computational models. By
the same token, weather forecasts might become significantly more accurate than
they are currently.
The value of data can be assessed in various ways and from various perspectives,
such as that of the shareholder, the company, or the individual user. Examples of
the first are easy to find: When Facebook bought WhatsApp for US$ 19 billion in
2014, that amounted to roughly US$ 30 for each of the network's 600 million users
at the time. Facebook paid about the same amount per user when it acquired
Instagram in 2012. In 2016, however, when Microsoft bought LinkedIn for US$
26.2 billion, that price tag had already doubled (LinkedIn had about 433 million
users at the time, resulting in roughly US$ 60 per user). For Microsoft, this was still
highly beneficial, since it now got access to LinkedIn's social graph, users' location
and address information, as well as user interests and skills, all of which it did
not have before.
Companies often go through a data broker when buying user data, and a long list
of such brokers can, for example, be found at www.privacyrights.org. An example
is Gild (people.gild.com/), which transforms talent acquisition and hiring processes.
Gild identifies candidates who fit a job opening and analyzes factors that can predict
their success. It “has built a database of tens of millions of professionals that
contains data purchased from third-party providers plus ‘anything and everything
that’s publicly available’.” A consequence is that job applicants are often surprised
at how much an interviewer knows about them ahead of time.
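Gild does not disclose how its predictions actually work. Purely as an illustration of the general idea of scoring candidate fit from purchased and publicly available data, one might imagine something like the following (all field names, weights, and signals here are invented):

```python
def score_candidate(candidate_skills, required_skills, public_signals):
    """Toy candidate-fit score: fraction of required skills the
    candidate has, boosted by publicly observable signals
    (e.g., open-source contributions, Q&A reputation)."""
    if not required_skills:
        return 0.0
    overlap = len(set(candidate_skills) & set(required_skills))
    fit = overlap / len(required_skills)
    # cap the public-signal boost at 1.0
    boost = min(sum(public_signals.values()) / 100.0, 1.0)
    return round(0.7 * fit + 0.3 * boost, 2)

print(score_candidate(
    ["python", "sql", "spark"],
    ["python", "sql"],
    {"github_commits": 40, "stackoverflow_rep": 30},
))  # 0.91
```

A real broker would of course combine far more features and a trained model rather than fixed weights; the point is only that public traces feed directly into such scores.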
Meanwhile, several sites help people find out the value of their data for
themselves. This has been triggered by past breaches or unauthorized dissemination
of customer or user data, even in cases where customers had explicitly paid for
their information to be kept private.9 One such site is www.totallymoney.com/personal-data/,
which shows how cheaply user data can be purchased. Another had been established
by the Financial Times in 2013 and allowed a user to go through categories like
demographics, family and health, property, activities, and consumer behavior to
determine the value of his or her data. Marketplaces for personal data that even
offered customers the opportunity to exchange their data for money or benefits,
such as Enliken or Handshake, have typically not been sustained for very long.
8 europe.newsweek.com/does-facebook-know-you-better-your-mother-or-roommate-299171
9 www.techcrunch.com/2015/10/13/whats-the-value-of-your-data/
This brings us to the issue of protecting privacy, which we will briefly discuss.
It is easy today to become a victim, e.g., of phishing or skimming, or to become
the target of a virus or a Trojan horse that encrypts the local hard disk and
demands a "fee" for its release, i.e., ransomware. Worse, there are even tools,
like the browser extension Web of Trust (WOT), that pretend to protect the user
from unsafe browsing but then collect the user's browsing data in the background
and even sell it to third parties.10 So let us at least clarify a few things.
Privacy protection is not concerned with the protection of data, but with the
protection of people, while data security deals with the protection of data against
attacks, unauthorized access, or unintended errors or incidents. Privacy protection
guards against the misuse of personal data and is often regulated in domestic laws
(or at least design principles; see Further Reading for this chapter). Yet reality is
different, and even CEOs of Internet or computer companies have more than once
made it clear that privacy is no longer an option in today’s world. As Bruce
Schneier wrote in his blog on security issues in 2010, “we’re not Google’s cus-
tomers; we’re Google’s product that they sell to their customers.”11
So the important points are these: to be aware that privacy may be at stake in
whatever we do on the Internet and on the Web; to be aware that privacy and
personal data are valuable goods today, from which companies and organizations
can make money if we do not take relevant precautions; and to regularly monitor
and update our privacy settings wherever we are active online (and maybe even
to read the terms and conditions that pop up whenever a new registration is
entered).
10 www.pcmag.com/news/349328/web-of-trust-browser-extension-cannot-be-trusted
11 www.schneier.com/blog/archives/2010/12/security_in_202.html
5.5 Towards Sharing and On-Demand Communities 241
12 www.huffingtonpost.com/jeremy-rifkin/uber-german-court_b_5758422.html
While many more examples for platforms in the digital age can be found, it is
worth mentioning that their success is also due to a phenomenon that is related to
the concept of sharing: the on-demand society. As could be read in The Atlantic in
2016, “when today’s consumers want to watch a TV show, they can watch it when
they want on Netflix. When they want to buy household goods, they can order them
from Amazon, even when the stores are all closed. And when they want a car, they
can just book a Zipcar or hail an Uber, without owning a car."13 This concept is no
longer restricted to media or transportation, but has reached even highly
qualified professions such as lawyers (UpCounsel), programmers (Topcoder),
consultants (Eden McCallum), delivery personnel (Postmates), home butlers
(Hello Alfred), and sales professionals (Universal Avenue). The platform Upwork
meanwhile connects 9.3 million freelancers with 3.7 million companies worldwide.
At a more abstract level, “the On-Demand Economy is defined as the economic
activity created by technology companies that fulfill consumer demand via the
immediate provisioning of goods and services. Supply is driven via an efficient,
intuitive digital mesh layered on top of existing infrastructure networks. The
On-Demand Economy is revolutionizing commercial behavior in cities around the
world. The number of companies, the categories represented, and the growth of the
industry is expanding at an accelerating pace.”14
We conclude this section and chapter by returning to one of our examples of a
modern enterprise that fits almost any aspect of Web shopping, cloud services,
digitization, technical aspects such as data mining or recommenders, and novel
business models: Amazon, already used as an example in more than one place in
this book, also serves as one of the core examples of the on-demand society.
Fast Company recently selected Amazon as the world's most innovative
company of 2017,15 and in its justification it gives several reasons: First, Prime,
Amazon’s membership program that we already mentioned in Chap. 3, is con-
nected to almost all of Amazon’s recent innovations. It is used by an estimated
40–50 million people in the US alone, and besides preferred shipping of ordered
items or ad-free viewing of streaming video, “what Prime is selling most is time,”
according to Fastcompany: Whatever people want, they nowadays want it in the
shortest time window possible. Amazon meets this demand by same-day delivery or
Prime Air, but also by innovations like the Amazon Dash button, a gadget through
which Prime members can order their favorite products at the press of a button
provided by Amazon. If you cannot order via Dash, Amazon offers to do so via Alexa, "the
voice service that powers Echo, provides capabilities, or skills, that enable cus-
tomers to interact with devices in a more intuitive way using voice. Examples of
these skills include the ability to play music, answer general questions, set an alarm
or timer, and more. Alexa is built in the cloud, so it is always getting smarter. The
more customers use Alexa, the more she adapts to speech patterns, vocabulary, and
13 www.theatlantic.com/entertainment/archive/2016/06/the-on-demand-society/489257/
14 www.businessinsider.com/the-on-demand-economy-2014-7
15 www.fastcompany.com/most-innovative-companies/2017
personal preferences."16 Alexa can also take orders, such as "Alexa, reorder
toothpaste," and scan through all the data that Amazon has collected about the
respective customer via Prime, so that the system knows which kind of toothpaste
to order.
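A reorder request of this kind can in principle be resolved from purchase history alone. A minimal sketch (the data layout and matching logic are our own illustrative assumptions, not Amazon's actual implementation):

```python
from collections import Counter

def resolve_reorder(order_history, category):
    """Pick the product a customer most likely means by
    'reorder <category>': the item in that category they have
    bought most often, breaking ties in favor of recency."""
    matches = [item for item in order_history if item["category"] == category]
    if not matches:
        return None
    counts = Counter(item["product"] for item in matches)
    top = max(counts.values())
    candidates = {p for p, c in counts.items() if c == top}
    # among the most frequently bought products, prefer the latest order
    for item in reversed(matches):          # history is oldest-first
        if item["product"] in candidates:
            return item["product"]

history = [
    {"product": "SuperWhite", "category": "toothpaste"},
    {"product": "MintFresh",  "category": "toothpaste"},
    {"product": "SuperWhite", "category": "toothpaste"},
    {"product": "EspressoPods", "category": "coffee"},
]
print(resolve_reorder(history, "toothpaste"))   # SuperWhite
```

In practice the hard part is the voice front end, i.e., mapping the spoken word "toothpaste" to a catalog category in the first place; the lookup itself is the easy step.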
To round out the picture on Amazon’s innovative business ideas, the company is
experimenting with Amazon Go, a concept for convenience stores in which a
shopper can swipe a code on his or her mobile phone when entering the store, and
everything taken from the shelves will thereafter be added to a digital cart that is
automatically paid for from an existing customer account upon exit. Thus, a
customer can skip both the line and the cash register when done shopping. Another
shopping concept under development resembles what we have reported earlier
about Starbucks regarding the pickup of coffee previously ordered via an app;
customers of Amazon Fresh will be able to fill their digital carts remotely, pay
online, and pick up their purchases within a certain time window.
Last, but not least, Amazon is opening brick-and-mortar book stores that "solve one
of the biggest problems with online shopping: discoverability".17 Amazon Books
represents data-driven book stores where customers will be likely to "pick up a
book that you didn't know you wanted to read." There are various differences
from traditional bookstores, like the fact that all books are facing out so that
their covers can be seen. All of them have received an online rating between 4.6
and 5 stars, and the display even shows customer reviews. Clearly, a setup like
this allows fewer books to be held in the store, but this is compensated for by
the fact that Amazon already knows its customers in the vicinity of the store
from online shopping patterns, so that a store can tailor its selection to the
local crowd; moreover, a customer can always order a book that is not on the
shelves through a terminal available in the store. The bottom line is that a
company like Amazon has not only contributed numerous inventions since the
inception of the Web, but continues to do so when it comes to bridging the
physical and the virtual worlds.
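The curation rule just described (only titles rated 4.6 stars or better, tailored to what nearby customers already buy online) can be sketched as follows; the catalog fields and the popularity weighting are illustrative assumptions on our part:

```python
def curate_store_selection(catalog, local_purchases, shelf_space):
    """Pick titles for a local store: keep only books rated
    4.6 stars or better, then rank by how often customers near
    the store have bought each title online."""
    eligible = [b for b in catalog if b["rating"] >= 4.6]
    ranked = sorted(eligible,
                    key=lambda b: local_purchases.get(b["title"], 0),
                    reverse=True)
    return [b["title"] for b in ranked[:shelf_space]]

catalog = [
    {"title": "A", "rating": 4.8},
    {"title": "B", "rating": 4.2},   # filtered out: below 4.6 stars
    {"title": "C", "rating": 4.9},
    {"title": "D", "rating": 4.7},
]
local_purchases = {"C": 120, "A": 80, "D": 15}
print(curate_store_selection(catalog, local_purchases, shelf_space=2))
# ['C', 'A']
```

The rating threshold guarantees a floor on quality, while the local-purchase ranking is what lets each store differ from the next.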
16 www.developer.amazon.com/alexa
17 www.fastcodesign.com/3067020/with-amazon-books-jeff-bezos-is-solving-digital-retails-biggest-design-flaw
5.6 Further Reading 245
The components of a business model go back to Michael Rappa and the Web site
digitalenterprise.org/ he created. Ovans (2015) is a good discussion of what a
business model is. A 2015 report on the Business Model Canvas can be downloaded
from the Strategyzer blog at blog.strategyzer.com/, although a more authentic source
is Osterwalder and Pigneur (2010). Gassmann et al. (2014) apply it to derive 55
different business models for a variety of areas. Various videos in which one can hear
Osterwalder speak can be found on YouTube. Modern innovation labs have a
famous precursor in Bell Labs which were founded in the 1920s; their success story
is described by Gertner (2012). Our discussion of social innovation labs follows
Schönthaler and Oberweis (2013). The business models of Airbnb, Uber, and others
and how they influence our world is the subject of Stone (2017).
Design Thinking is a discipline that has emerged from a variety of fields,
including such diverse ones as mechanical engineering, architecture, urban plan-
ning, organizational learning, or process improvement. It basically resembles an
innovation process or cycle in the style of Fig. 5.1, yet is more than just a creative
process. It is considered a “new way of seeing people in relation to work, of
imagining the concept of work and of posing questions about how we want to live,
learn, and work in the 21st century. The appeal of Design Thinking lies in its ability
to inspire new and surprising forms of creative teamwork” (hpi.de/en/school-of-
design-thinking/design-thinking.html). The approach has been made popular by
Stanford University’s d.school (dschool.stanford.edu/) and is described in books,
for example, by Lockwood (2009) or Yayiki (2016); for the latter see also www.
artbiztech.org. Peffers et al. (2006, 2007) have made the approach popular in
information systems research and break it down into the following phases (compare
these to Fig. 5.1, but also to what we have stated about social innovation labs):
why outstanding companies that had their competitive antennae up, listened closely
to customers, and invested aggressively in new technologies still lost their market
dominance. Using examples like the hard-disk drive industry, he argues that good
business practices can even weaken a great company. In the absence of break-
through innovations, which may initially be rejected by potential customers, many
enterprises let their most important innovations languish and then face the
innovator's dilemma: On the one hand, keeping close to customers is critical for
success and survival; on the other, long-term growth and profits depend upon a
different managerial approach. Docherty (2015) argues in a similar direction
when he says:
“If you ask professionals, especially executives within large companies, what
images and thoughts come to mind with the word ‘disruption’, it’s usually not good.
Disruption is too often thought of as something that you didn’t see coming—
something that happens to you by outside forces, especially by startups. It doesn’t
have to be that way. Collective Disruption is about changing that paradigm and
learning to embrace disruption through collaboration.”18 Thiel, a successful Silicon
Valley investor, and Masters (2014) present another way of thinking about
innovation.
Canadian privacy advocate Ann Cavoukian has created the concept of "Privacy
by Design," which essentially means "privacy protection by technology" and aims
to guarantee that privacy protection is already “built into” the development of
technical devices, as opposed to being added later when the first breaches have been
discovered. The concept is based on seven principles intended to promote privacy
and data protection compliance from the very beginning. Although not a law,
Privacy by Design is often recommended, in particular from official sides like the
British Information Commissioner's Office.19 As a starting point on the issues of
privacy protection and data security, the reader may consult Bazzell and Carroll (2016).
Lindner (2016) is an introduction to European data protection law. Schneier
(2016) uses concrete examples of bad behavior with data in order to alert people
to what is happening behind their backs, but also indicates what they can do.
On the positive side, it is undeniable that Big Data has the potential to expand
our understanding of humanity. A recent project in this direction is the Kavli
Human Project (www.kavlifoundation.org/kavli-human-project), a massive scien-
tific undertaking to launch a “study of all of the factors that make humans…
human.” The project plans to recruit 10,000 New York City residents in approxi-
mately 2,500 households and monitor and measure them 10 years.
Rifkin (2014) discusses the sharing economy. An interesting read regarding the
on-demand society is www.atelier.net/en/trends/articles/how-demand-economy-remodelling-society_440900.
Wrap-ups of the history of Amazon are Brandt (2012)
or Stone (2014). Rossman (2016) is an introduction to the corporate culture of the
world’s largest Internet retailer. Keen (2015) criticizes the on-demand economy as
18 innovationexcellence.com/blog/2015/01/26/collective-disruption/
19 ico.org.uk/for-organisations/guide-to-data-protection/privacy-by-design/
6 The Road Ahead: Living in a Digital World
Worldwide, economic activities are now largely driven by information and com-
munication technologies. Indeed, few areas of society remain untouched by the
disruptive impacts of ICT, and there is little doubt: we are not only rapidly heading
towards the digital economy, but towards an entirely digital world as well. So the
question is: how do we want to live, learn, and work in this world? Future-focused
answers must be found to this question. This calls for a worldwide, transnational,
and interdisciplinary cultural discourse based on shared values, following the
vision of global welfare and harmony. This can only succeed if, in particular, the
economically strong countries on this planet accept their global responsibility for
humanity, trust, security, and accountability.
Since the dawn of ERP systems, the sole role of machine-generated data has
been to enable the proper execution of a supply chain. With the ongoing adoption of
cyber-physical systems and the Internet of Things, machine-to-machine
communication is enabling collaborative shop-floor planning, and machine-generated
(big) data offers unprecedented value. This paradigm shift calls for new smart ERP
systems that make use of big data in (predictive) planning throughout the entire
value chain and create the transparency for improved governance, risk, security, and
compliance management. And even if companies think they have adapted to the
situation today, future change is almost certain. Therefore, we have to accept, and
respond to, what we can foresee now, and this includes both small-scale
technologies (sensors, beacons, etc.) and those with broader impact (e.g., the
Internet of Things, Industry 4.0). We could attempt to predict the future, but
should be careful with this for obvious reasons. We are better equipped than
ever before (using the technologies that we have just discussed) to make such
predictions, but many uncertainties remain.
In this final chapter, we try to answer the question about what to expect from
living in a digital world, and we concede at the outset that without a crystal ball, we
cannot assure the reader that our prediction is any more accurate than any others
they read. We join a long line of “futurists” who have tried to predict the future of
technology. For example, Kahn and Wiener (1967) were among the most profound of the earlier futurists; in their book they got many predictions right, but apparently underestimated the comprehensive influence of information technology and computers. German tech giant Siemens had a study entitled "Horizons2020" conducted by market researcher TNS Infratest, whose results were published in October 2004; it outlined two possible scenarios of how life might look in 2020.

© Springer International Publishing AG 2017
G. Vossen et al., The Web at Graduation and Beyond, DOI 10.1007/978-3-319-60161-8_6
6 The Road Ahead: Living in a Digital World
In the first scenario, they present a future where people are generally skeptical of
technology, and even explicitly create free space from what they term an “engi-
neered environment,” thereby accepting stagnation in the European economy.
A strong government guarantees education, security, and health for its citizens; both
genders are equally represented at all levels of leadership. However, many families need second and third jobs to finance their living; society is nevertheless generally open to technical innovation. For example, growing environmental consciousness enables a breakthrough for fuel cells, quantum computers become a
reality, and automated translation systems guarantee the survival of exotic local
languages. Data protection agencies have popularized the view that it is “uneco-
nomic” to connect objects of everyday life.
In the second scenario, market and competition determine the rules and the speed
of life. If you want to reach a high standard of living, you cannot waste time on
anything and vice versa. Government is reduced to core duties and it otherwise
leaves its citizens to manage their own lives. Traditional moral ideas have been
replaced by a pursuit of what best serves the individual. A result of this is high social tension; a considerable number of people are living just above the breadline.
The retirement age (in Germany) has gone up to almost 70 years, but work life now
includes several interruptions for executive education or sabbaticals. Humanity is
unwilling to abandon nuclear power, and is happy to employ any technology
available for influencing human life even prior to birth. Ubiquitous Computing has
become a reality, and people are surrounded by a host of autonomous systems.
Looking at the results of this study now that it is almost 2020, it is clear that neither scenario has become a reality, although present-day life has quite a few commonalities with both. We want the reader to keep this in mind for the remainder
of this chapter, where we try to take a look into the near future and what it will bring
along in terms of the topics we have previously discussed.
In their bestseller “The Second Machine Age” Brynjolfsson and McAfee (2014)
vividly reveal their vision of the technological, social and economic changes that
we will need to adjust to. It comes as no surprise that such changes will not only
bring along winners, but also losers. It is therefore up to the politically and socially powerful elites to create a framework that opens future opportunities to the broad population and mitigates unavoidable risks. Protectionist measures, which restrict global trade flows and thereby reduce the global distribution of the value chain, and which seek the solution to structural problems in industries dating from the 1960s as well as in an untamed financial industry, are certainly not suitable.

6.1 Cyber-Physical Systems and the Internet of Things

[Fig. 6.1: The four industrial revolutions. 1st: mechanization, water and steam power; 2nd: mass production, assembly line, electricity; 3rd: computers and automation; 4th: cyber-physical systems.]

Industry now stands at the beginning of its 4th industrial revolution (see Fig. 6.1). Via the evolution of the Internet, the real world and the virtual world are increasingly converging to form an Internet of Things (IoT). Turner (2016) defines
the IoT as a “network of networks of uniquely identifiable endpoints – or things –
that communicate without human interaction.” International Data Corporation
(IDC), a global provider of market research, predicts “that the worldwide installed
base of IoT endpoints, or connected devices, will reach nearly 30 billion by 2020,
representing a compound annual growth rate (CAGR) of 19.2%.”
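IDC's forecast can be sanity-checked with simple compound-growth arithmetic. The following sketch takes 2014 as the base year with an installed base of roughly 10.5 billion devices; both values are our illustrative assumptions, not IDC figures, chosen so that the quoted 19.2% CAGR indeed lands near 30 billion by 2020:

```python
# Back-of-the-envelope check of the quoted IDC forecast: an installed
# base compounding at 19.2% per year over six years roughly triples.

def project_installed_base(base_billions: float, cagr: float, years: int) -> float:
    """Compound the installed base forward by `years` at growth rate `cagr`."""
    return base_billions * (1 + cagr) ** years

# Assumed 2014 base of 10.5 billion devices, projected to 2020:
projected_2020 = project_installed_base(10.5, 0.192, 6)
print(f"{projected_2020:.1f} billion endpoints")  # roughly 30 billion
```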
Consider this current example. "The tomato has an iron deficiency" is what the app Plantix recognizes, immediately suggesting a cure. Plantix is a disease diagnostic and monitoring tool developed by German startup PEAT (short for Progressive Environmental & Agricultural Technologies). Plantix analyzes photos; in the case of the tomato, the leaves were lacking color while the veins stood out in green, and the app discovered the net pattern that is typical of an iron deficiency. The app has been trained on more than 1,500 images of plants with that deficiency, and it is now familiar with more than 40 plants and more than 100 deficiencies. While
the program is currently used from smartphones, the vision of its developers is that, in the future, solar-powered robots will plow through fields to discover pest plants with their built-in cameras and destroy them.
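PEAT has not published its implementation, and real systems of this kind use deep neural networks on raw images. Still, the diagnostic principle, mapping image-derived features to the closest known deficiency pattern, can be sketched with a toy nearest-neighbor classifier; the two features (leaf greenness, vein contrast) and all numeric values below are purely hypothetical:

```python
import math

# Toy nearest-neighbor diagnosis: each training example pairs a
# hand-crafted feature vector (leaf_greenness, vein_contrast) with a
# label. An iron deficiency shows pale leaves with veins that stay green,
# i.e. low greenness but high vein contrast.
TRAINING_SET = [
    ((0.35, 0.80), "iron deficiency"),      # pale leaf, green veins stand out
    ((0.30, 0.30), "nitrogen deficiency"),  # uniformly pale leaf
    ((0.75, 0.40), "healthy"),              # dark green leaf
]

def diagnose(features: tuple) -> str:
    """Return the label of the closest training example (1-nearest neighbor)."""
    _, label = min(TRAINING_SET, key=lambda ex: math.dist(ex[0], features))
    return label

print(diagnose((0.38, 0.75)))  # iron deficiency
```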
Plantix is just one glimpse into a future that will (fortunately) look considerably
different from what science fiction movies have been promising us for ages. We are
experiencing the arrival of a generation of machines that are heavily based on algorithms, and that are able to draw conclusions and think faster and more reliably than a human ever could. Computers can search patient files and doctors' statements for rare diseases. They compute the creditworthiness of bank customers and decide on the investments of wealthy clients. They automatically maneuver cars into parking spots and apply the brakes to avoid accidents.
1 www.en.wikipedia.org/wiki/Industry_4.0
2 www.youtube.com/watch?v=lJpnoRHba_Y
3 www.perseusmirrors.com/
4 www.raconteur.net/current-affairs/singapore-the-robot-city
5 www.itu.int/en/ITU-T/about/groups/Pages/sg20.aspx

6.2 The Smart Factory and Industry 4.0
The Internet of Things (IoT) and cyber-physical systems (CPS) are the techno-
logical basis for the digitalization of traditional industries such as manufacturing
and logistics. As discussed above, a CPS is a system of collaborating IT elements,
designed to control physical (mechanical, electronic) objects and processes. Com-
munication takes place via data infrastructure such as the Internet. Traditional
embedded systems can be considered as a special case of a stand-alone CPS. The
key characteristics of the industrial production of the future will include production
of extensively individualized products, within highly flexible production environ-
ments, early-stage integration of customers and business partners within design and
value-creation processes, and linking of production and high-quality services to
yield “hybrid products”. The New High-Tech Strategy: Innovations for Germany of
the German Federal Ministry of Education and Research6 compiles and describes the following potentials of Industry 4.0:
6 www.hightech-strategie.de/de/The-new-High-Tech-Strategy-390.php
• Work Life Balance: The more flexible work organization models of companies
that use CPS mean that they are well placed to meet the growing need of
employees to strike a better balance between their work and their private lives
and also between personal development and continuing professional
development.
Industry 4.0 connects people, machines, and objects in the Internet of Things and
paves the way to new production concepts and fully integrated, digital value chains.
Traditional ERP systems are now reaching their limits in terms of planning and
managing corporate resources. We now need Smart ERP systems, which are
available as software services from the cloud.
In an environment where digital transformation is becoming a reality in more
and more companies, a new dimension of digitization opens up: Now objects and
machines are digitally accessible at any time and from anywhere via sensors and
SIM cards. The use and networking of autonomously acting CPSs tap new potential for the automation of production and logistics processes. Above all, however, they create the conditions for new processes and services; the keyword here is social manufacturing and logistics. The result is vertically and horizontally integrated, highly digitized value chains with higher complexity as well as an increasing degree of decentralization and self-organization. These are the demands that modern Smart ERP systems have to meet.
Conventional ERP systems use machine-generated data solely to ensure a
smooth running supply chain. With the use of networked cyber-physical systems, a
collaborative, decentralized machine-level control is now possible, and
machine-generated data represents a value on its own as Big Data. Smart ERP
systems must be able to employ big data across the entire value chain for strategic,
tactical and operational planning. Also, they must create the transparency required
by today's enterprise management, namely governance, risk, compliance, and security management. Despite these demanding efficiency requirements, Smart ERP systems must be easy to use, fast, high-performing, cost-effective, and above all secure, and they must be accessible at all work stations along the value chain, even mobile ones. This
explicitly includes external partners such as customers, suppliers, original equip-
ment manufacturers (OEMs) and service partners. Manufacturers of Smart ERP
systems meet these requirements by offering their systems as a software service
(Software as a Service, SaaS) from either public, private, or hybrid clouds.
With its promise of a technological infrastructure in which all electronic devices can communicate with each other, the IoT is a new paradigm with applications in all areas
of life. Rifkin (2014) describes “how the Communication Internet is converging
with a nascent Energy Internet and Logistics Internet to create a new technology
platform that connects everything and everyone. Billions of sensors are being
attached to natural resources, production lines, the electricity grid, logistics
networks, recycling flows, and implanted in homes, offices, stores, vehicles, and
even human beings, feeding big data into an IoT global neural network. Prosumers
(producer + consumer) can connect to the network and use big data, analytics, and
algorithms to accelerate efficiency, dramatically increase productivity, and lower
the marginal cost of producing and sharing a wide range of products and services to
near zero, just like they now do with information goods." In view of the rapid pace at which digitization penetrates all areas of life, it becomes clear that not only technological questions are important, but also considerations regarding privacy and ethics in data sensing, storage, and processing. This begs the question: what are the limits for the application of artificial intelligence? Or, in other words: what degree of autonomy do we really want machines to have?
A current example is autonomous driving; current discussion is focused on partially
autonomous driving, although the majority of the technological challenges for full
autonomy have already been solved.
Undoubtedly the industrial application of the IoT is already the most advanced.
Interesting examples can be found in Gilchrist (2016). Based on the scenario shown
in Fig. 6.2, the following shows how IoT is designed to tap into value chains in
their entirety. In the figure, intelligent CPS communicate with each other and with
conventional IT systems. The figure shows IoT-based communication along the
value chain, i.e. that between supplier and carrier, then between shipper and pro-
ducer, and finally between the producer and his customer. Already by utilizing this
type of communication, a digital transformation of the value chain takes place
resulting in enormous potential for improvement. However, perhaps even more
important are the newly emerging communication channels such as between sup-
plier and customer, which give the supplier an insight into inventory of salable
products directly on the shelf at the point-of-sale, so that he can pro-actively
[Fig. 6.2: IoT-based communication along the value chain, linking suppliers, inbound and outbound transportation, manufacturing & intralogistics, field sales, field service, customers, business partners, employees, and authorities.]
Here, intelligent production systems will create intelligent products that can be
identified at any time and that can be localized. They will know their current status
and they will be able to submit this information together with their provenance, i.e.,
all states that they have been through as part of their life cycle. Important for
agent-based self-organization is that they know their options for the path to com-
pletion, i.e., to a certain extent, they carry their own production plan within
themselves. And in addition to intelligent products, intelligent machines and tools,
From a consideration of the changes that lie ahead, it can be deduced that conventional monolithic ERP systems offer little to Industry 4.0. In the future, application functionality will not only be used within one's own company, but will become decentralized, with decisions made by business partners, customers, and suppliers; in short, by all partners in the value chain. For this, functionality must be provided at a much finer granularity than existing ERP systems are capable of today. If we add compliance, security, risk, and governance requirements to these functional requirements, then it becomes obvious that Web services deployed in a cloud are the most logical solution. In addition, more and more functionality will be used on mobile devices, which, given the increase in mobile work, serve the transformed reality of work more effectively than stationary devices can.
Cyber-Physical Production Systems (CPPS)
A basic requirement in relation to Smart ERP systems is that they must be able to
communicate efficiently with physical processes, which process the material and
energy flows. Kagermann et al. (2013) define the structure of a cyber-physical
production system (CPPS) as shown in Fig. 6.3. A CPPS is the use of CPS in the
manufacturing industry. Basically, a CPPS consists of intelligent machines, storage
systems, and resources, which exchange information independently, trigger actions,
and control each other independently. It enables the continuous viewing of prod-
ucts, means of production and production systems while considering constantly
changing processes.
[Fig. 6.3: Structure of a cyber-physical production system (after Kagermann et al. 2013): each of several CPSs couples an information-processing layer to its physical process via "set" and "meter" signals, while the physical processes exchange energy and substance flows with each other.]
The key to the communication between the Smart ERP system and the physical processes is the virtualization of these processes in the CPS. Each CPS is equipped with sensors and actuators for measurement and actuation, which impact the respective physical process directly. This creates a virtual representation of the physical process in the CPS, through which the communication between the CPS and the Smart ERP system then takes place.
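The virtualization idea described above can be sketched in a few lines: a CPS maintains a virtual mirror of its physical process, updated by "meter" (sensor) signals and driven by "set" (actuator) commands, and the ERP layer reads only this mirror. All class and method names here are illustrative assumptions, not a real CPS or ERP API:

```python
# Minimal sketch: the ERP never touches the physical process directly;
# it works against the CPS's virtual representation of that process.

class CyberPhysicalSystem:
    def __init__(self, name: str):
        self.name = name
        self.virtual_state = {}  # virtual mirror of the physical process

    def ingest_sensor_reading(self, quantity: str, value: float) -> None:
        """Sensor side: a 'meter' signal updates the virtual representation."""
        self.virtual_state[quantity] = value

    def command(self, quantity: str, setpoint: float) -> str:
        """Actuator side: a 'set' signal that would drive the physical process."""
        return f"{self.name}: drive {quantity} towards {setpoint}"

class SmartERP:
    """The ERP plans against virtual states only."""
    def read(self, cps: CyberPhysicalSystem, quantity: str) -> float:
        return cps.virtual_state[quantity]

press = CyberPhysicalSystem("hydraulic-press-1")
press.ingest_sensor_reading("temperature_c", 74.5)
print(SmartERP().read(press, "temperature_c"))  # 74.5
```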
An obvious approach to the design of a Smart ERP system would fall back on centralized control of the entire system by means of communication between the ERP system and all CPSs. Such an approach can often be found in today's practice. However, in such application scenarios we should not be talking about a Smart ERP system. For one thing, the sheer number of CPSs integrated into an overall system leads to such high complexity that central control of the system is no longer viable. Moreover, Fig. 6.3 shows that the CPSs do not only communicate with the superordinate ERP system, but also with each other. This means that autonomous units are created, which make decentralized decisions in collaboration with each other. In these decision processes, the Smart ERP system often only serves for the transmission of information, the adjustment of overall planning, and the assurance of governance and compliance.
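The decentralized decision-making just described can be illustrated with a contract-net-style sketch: machine CPSs bid for a production job based on their current load, and the job is awarded peer-to-peer to the best bid, without a central scheduler. The machine names and the load-based bidding rule are our illustrative assumptions, not part of any cited architecture:

```python
from dataclasses import dataclass

# Contract-net-style negotiation among autonomous machine CPSs:
# each machine offers a bid, the lowest bid wins the job.

@dataclass
class MachineCPS:
    name: str
    queued_jobs: int

    def bid(self, job: str) -> float:
        """Lower bid = better; here simply the current queue length."""
        return float(self.queued_jobs)

def negotiate(job: str, machines: list) -> MachineCPS:
    """Peer-to-peer award: the machine with the best (lowest) bid takes the job."""
    winner = min(machines, key=lambda m: m.bid(job))
    winner.queued_jobs += 1
    return winner

fleet = [MachineCPS("mill-A", 3), MachineCPS("mill-B", 1), MachineCPS("mill-C", 2)]
print(negotiate("order-4711", fleet).name)  # mill-B
```

In this pattern the ERP's role reduces to what the text describes: relaying information and adjusting overall plans, while the award decision itself stays with the machines.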
Fundamental Requirements for Smart ERP Systems
From looking at the Smart ERP system as part of a CPPS, a number of basic
requirements arise. In formulating these requirements we draw on design principles,
which have been collected for Industry 4.0 scenarios in Hermann et al. (2015).
Collectively, they define the essential requirements that have to be met by
Smart ERP systems:
• Interoperability
• Virtualization
• Decentralization
• Real-time capability
• Service orientation
• Modularity

[Figure: architecture sketch in which networked CPSs connect through IoT middleware to SaaS services and public data sources.]
"Platforms beat products every time." This often-used quote by MIT professor Marshall van Alstyne may explain why the IoT software platform market is so competitive. The best-selling "Platform Revolution" by Parker et al. (2016) describes a platform as a marketplace that defines an ecosystem in which players assume different roles (producers, consumers, suppliers, owners) between which they can switch quickly over time. The marketplace offers a powerful infrastructure and rules for relationships and transactions between the players.
Parker et al. interpret the IoT as a "worldwide platform of platforms" and as a driver of the platform revolution. This assessment is underscored by an IDC forecast according to which 29.5 billion devices will be connected to the IoT by 2020 (see Turner 2016).
There is an abundance of IoT software platforms available on the market, and the variety of features these platforms provide is considerable. This is due to the different origins of the providers, as shown by a recent Forrester study.8 Forrester analysts studied 11 providers of IoT software platforms in terms of
their current offerings, strategy and market presence. IBM (Watson IoT Platform),
PTC (ThingWorx), GE (Predix), and Microsoft (Azure IoT Suite) have been
identified as market leaders. Amazon Web Services (AWS IoT Platform), SAP
(SAP HANA Cloud Platform IoT Services) and Cisco (Cisco Jasper Control
Center) were considered “strong providers”. This study can be seen as a snapshot of
the rapid development of available platforms in conjunction with numerous cor-
porate acquisitions. Especially in Germany, it is becoming evident how
market-leading industrial companies are turning into IoT platform providers: Bosch
(Bosch IoT suite), Siemens (MindSphere IoT Operating System) and TRUMPF
(AXOOM IoT platform), just to name a few.
7 www.industrialdataspace.org/en/
8 www.forrester.com/report/The+Forrester+Wave+IoT+Software+Platforms+Q4+2016/-/E-RES136087
Fig. 6.7 IoT software platforms, integrating edge devices with enterprise systems (source: Turner
2016)
• Connect: create and manage the link from the device to the Internet
• Secure: protect IoT devices, data, and identity from intrusion
• Manage: control the provisioning, maintenance, and operation of IoT devices
• Analyze: transform data into timely, relevant insight and action
• Build: create applications and integrate with enterprise systems
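The "Analyze" capability from the list above, transforming data into timely insight and action, can be sketched as a minimal threshold rule engine. Real IoT platforms offer far richer streaming analytics; the rule format, device quantities, and thresholds below are illustrative assumptions only:

```python
# Sketch of the "Analyze" capability: raw device telemetry is matched
# against simple threshold rules, each of which triggers an action.

# A rule is (quantity, threshold, action to trigger if exceeded).
RULES = [
    ("vibration_mm_s", 7.0, "schedule maintenance"),
    ("temperature_c", 90.0, "throttle spindle"),
]

def analyze(telemetry: dict, rules: list) -> list:
    """Return the actions triggered by the current telemetry snapshot."""
    return [action for quantity, threshold, action in rules
            if telemetry.get(quantity, 0.0) > threshold]

actions = analyze({"vibration_mm_s": 8.2, "temperature_c": 71.0}, RULES)
print(actions)  # ['schedule maintenance']
```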
We consider two sample platforms next which have already achieved a significant
market presence and are continually developed by innovative and economically
strong manufacturers.
IBM Watson IoT Platform
The Watson IoT platform is an adaptable, scalable, and open IoT platform, deployed as a service from the IBM Bluemix® cloud infrastructure. It stands out through its functional diversity, including capabilities such as augmented reality, cognitive capabilities, Blockchain technology, edge analytics, analytics tooling, and natural language processing, in addition to the usual basic functionalities. Furthermore, IBM is expanding its IoT application range through corporate acquisitions, including, for example, the weather forecast platform The Weather Company. The Forrester analysts highlight IBM's commitment to open source standards and its extensive global partner ecosystem as strengths. Details of the IBM Watson IoT
solution architecture can be found at www.ibm.com/internet-of-things/platform/
watson-iot-platform/.
integrations for Oracle’s PaaS and SaaS products can provide IoT data safely and
easily in enterprise applications.
Just like their competitors, Oracle is increasingly opting for strategic partner-
ships with the aim of developing special application solutions and vertical industry
packages. One such example is the cooperation with Bosch Rexroth AG, a pioneer
in the application and development of Industry 4.0 solutions. With the integration
of industrial automation technologies by Bosch Rexroth, the Oracle IoT platform
forms the bridge from the Smart Factory to the world of enterprise systems.
6.2.4 Summary
Industry 4.0 networks people, machines, and objects in the Internet of Things and
paves the way to new production concepts and fully integrated, digital value chains.
They extend across company boundaries and form the basis for the collaboration
between customers, suppliers, engineering, production, and service partners. Tra-
ditional ERP systems are now reaching their limits in terms of planning and
managing corporate resources. We now need Smart ERP systems, which are
available as software services from the cloud. In addition to the cloud, mobile and
big data analytics are the enabling technologies.
Smart ERP systems must be able to communicate efficiently with physical processes that handle material and energy flows. This communication takes place via CPSs, which are equipped with sensors and actuators for measurement and actuation and which participate directly in the process. This creates a virtual representation of the physical process in the CPS, through which the communication between the CPS and the Smart ERP system takes place. A proposal for an architecture, in which the resulting core system of networked CPSs is embedded into an overall system, meets the requirements of a modern Smart ERP system: interoperability, virtualization, decentralization, real-time capability, service orientation, and modularity (properties generally required of any modern enterprise software system).
IoT platforms play an increasingly key role in the market. The IBM Watson IoT
platform and the Oracle IoT Cloud Service are two platforms that have already
achieved a significant market presence and are continually developed by innovative
and economically strong manufacturers. Overall competition is defined by the capacity for collaborative innovation, involving customers, suppliers, and development partners, that paves the way for horizontally and vertically integrated industry solutions.
Turner (2016) features interesting results from IDC studies from 2014 and 2015,
demonstrating that the enterprise sector is increasingly seen as the driver for IoT
investments, leaving behind the consumer sector. And nearly 60% of the companies
emphasize the strategic importance of the IoT for their company’s future. IoT is
increasingly becoming a business subject, and modern Smart ERP systems make a
valuable contribution in unlocking the potential of the IoT through improved
process automation, faster and better quality decision support, and a better customer
experience.
With all we have said about digitization, the Internet and the Web, as well as their
implications for practically all areas for everyday private as well as professional
life, we finally look at several topics every one of us will be confronted with in the
years to come and that will make the road (of the Web) partially bumpy and
partially smooth.
9 www.weforum.org/agenda/2017/01/3-predictions-for-the-future-of-retail-from-the-ceo-of-walmart/
6.3 Towards the E-Society
Much can regularly be found in the press about the disappearance of traditional
labor and about the fact that many of today’s jobs will soon vanish. Indeed, when
we look at an area like car manufacturing, we can immediately see that lots of
manual work is nowadays done by robots (a fact that also holds for other fields of
manufacturing, as implied by Fig. 6.1). As the Daily Mail already warned in 2014,
“50 per cent of occupations will be redundant in 11 years’ time.”10 The study on
which the article was based continued that “experts believe half of today’s jobs will
be completely redundant by 2025; Artificial intelligence will mean that many jobs
will be done by computers; customer work, process work and middle management
will ‘disappear’; … workspaces with rows of desks will no longer exist.”
We mentioned earlier the new generation of robot advisors that many banks are
putting forward, a typical example of a personal advisor being replaced by an
algorithm or a collection of algorithms hidden behind a Web interface. In the same
category falls Wipro HOLMES,11 an AI platform based on cognitive computing
that can serve a number of applications, including helpdesks, diagnosis, shopping,
insurance claims, or maintenance. Amelia by IPsoft is advertised as “your first
digital employee … a cognitive agent who can take on a wide variety of service
desk roles and transform customer experience,”12 and its inventors see businesses
such as insurance, banking, health, retail, or government as potential applications.
So while many consider these prospects a threat, correct predictions about the
future of work are difficult, and we refer the reader interested in more than just
articles on the buzzword “future of work” to the recordings from the 10th De Lange
Conference on “Humans, Machines, and The Future of Work” that took place at
Rice University in Houston, Texas in December 2016, which can be found at
delange.rice.edu/conference_X/videos.html.
When it comes to Industry 4.0, to the use of IoT across entire value chains, when
Smart Factories are established, or when modern Smart ERP systems are to be
introduced, one must not forget the human factor. An article by Constanze Kurz
states, “Industry 4.0 is understood as a socio-technical system, which does not only
need new technical but also new social infrastructures in order to be implemented
successfully” (see Botthoff and Hartmann 2014). The latter authors also refer to a
survey conducted by the German Fraunhofer Institute in 2013, in which experts from industry and practice were asked to make 5-year forecasts on the importance
of human work (planning, controlling, execution, monitoring). The result was abundantly clear: 97% of the respondents thought that human work will remain very important (60.2%) or important (36.6%). And even for the non-producing parts of the value chain, human labor and skill will continue to play an important role, especially when it comes to customer experience.
10 www.dailymail.co.uk/news/article-2826463/CBRE-report-warns-50-cent-occupations-redundant-20-years-time.html
11 www.wipro.com/holmes/
12 www.ipsoft.com/amelia/
Technological Unemployment and New Skills
However, one cannot deny that progressive digitization will destroy many jobs or at least fundamentally transform them. This leads to fears in the population, fears that are often ignored by the economic and political elites and that act as door openers for populists of all shapes and sizes. On this subject, a report in Issue 36 of 2016 of the German news magazine DER SPIEGEL identifies a number of high-risk work areas:
• Independence
• Participation
• Variability
• Complexity
• Communication/cooperation
• Feedback and information
• Avoiding time pressure
In May 2014, the New York Times ran an interesting feature entitled "A Vision of the Future from Those Likely to Invent It," in which people like Marc Andreessen, Peter Thiel, Susan Wojcicki, and others shared their visions of the future; see www.nytimes.com/interactive/2014/05/02/upshot/FUTURE.html. The inventor of
the Web, Tim Berners-Lee, recently stated his worries about the future of the Web,
which in his view are mainly due to three trends13:
He continues to say that “these are complex problems, and the solutions will not be
simple. But a few broad paths to progress are already clear. We must work together
with web companies to strike a balance that puts a fair level of data control back in
the hands of people, including the development of new technology such as personal
‘data pods’ if needed and exploring alternative revenue models such as subscriptions
and micropayments. We must fight against government overreach in surveillance
laws, including through the courts if necessary. We must push back against misin-
formation by encouraging gatekeepers such as Google and Facebook to continue
their efforts to combat the problem, while avoiding the creation of any central bodies
to decide what is ‘true’ or not. We need more algorithmic transparency to understand
how important decisions that affect our lives are being made, and perhaps a set of
common principles to be followed.”
A lot has been written recently about the Internet of Things, its opportunities, technologies, impacts, and threats, some of which we have already mentioned in the text. Georgakopoulos and Jayaraman (2016) provide a research perspective. Readers interested in a broader perspective should consult Greengard (2015), Chou (2016), Buyya and Dastjerdi (2016), or Raj and Raman (2017). Obviously, since
IoT is still in an embryonic state, there is a lot of discussion not just about tech-
nology, but also about social and ethical aspects; for starters, we refer to Berman
and Cerf (2017).
Rifkin (2014) studies the implications of the Internet of Things, posits a paradigm shift from market capitalism to collaborative commons, and predicts the zero-marginal-cost society. The latter is on the horizon, for example, through 3D printing, which makes the production of anything a local business, or through MOOCs (Massive Open Online Courses), online courses intended for worldwide participation and open access via the Web, made popular through platforms like Coursera, Educause, or edX. The personal vision of one of this book's authors is that, in a not-too-distant future, we will buy cars as data sets comprising digital representations of all the equipment (standard or additional) we are willing to pay for, which then get delivered either to the buyer or directly to a print shop where the car gets printed; if delivered to the buyer, (copying and) reselling needs to be regulated, while in both cases warranty claims or selling data (i.e., cars) as "used" need new concepts.
A different look into the future is offered by Carr (2016), who opposes the idea that the future is only about technology and data; in his view, technology has not only enriched but also imprisoned us, and he hence puts it in perspective.
13 www.theguardian.com/technology/2017/mar/11/tim-berners-lee-web-inventor-save-internet

6.4 Further Reading
Stieglitz and Greenwald (2014) present and study the modern learning society.
Raconteur ran a feature in April 2016 entitled "The Future Workplace"; see
www.raconteur.net/the-future-workplace. Samit (2015) presents stories from people like
Richard Branson, Steve Jobs, or Elon Musk as well as from companies like
YouTube, Cirque du Soleil, or Odor Eaters and shows how personal transformation
can reap entrepreneurial and other rewards. Kelley and Kelley (2013) show how to
“unleash the creativity within us” in order to keep up with the modern world and its
developments.
References
Achenbach, J. (2015): Driverless cars are colliding with the creepy Trolley Problem; The
Washington Post, December 29, 2015 (see https://www.washingtonpost.com/news/
innovations/wp/2015/12/29/will-self-driving-cars-ever-solve-the-famous-and-creepy-trolley-
problem/).
Agarwal, D.K., B.-C. Chen (2016): Statistical Methods for Recommender Systems. Cambridge
University Press, New York.
Aggarwal, C.C. (2016): Recommender Systems – The Textbook. Springer, Cham,
Switzerland.
Agrawal, D., S. Das, A. El Abbadi (2012): Data Management in the Cloud: Challenges and
Opportunities. Synthesis Lectures on Data Management, Morgan & Claypool Publishers, San
Francisco, CA.
Agrawal, R., T. Imielinski, A. Swami (1993): Mining association rules between sets of items in
very large databases. Proc. ACM SIGMOD International Conference on Management of Data,
207–216.
Agrawal, R., R. Srikant (1994): Fast Algorithms for Mining Association Rules. Proc. 20th
International Conference on Very Large Data Bases, 487–499.
Akl, S.G. (1989): The Design and Analysis of Parallel Algorithms; Prentice-Hall, Inc.,
Englewood-Cliffs, NJ.
Alpaydin, E. (2016): Machine Learning. The MIT Press, Cambridge, MA.
Anderson, Ch. (2006): The Long Tail. Why the Future of Business is Selling Less of More;
Hachette Book Group, Lebanon, Indiana. See also the following Web page of Wired Magazine
for a short article: www.wired.com/wired/archive/12.10/tail.html.
Andriole, S.J. (2010): Business Impact of Web 2.0 Technologies. Communications of the ACM 53
(12) 67–79.
ASTRI (2016). Whitepaper on Distributed Ledger Technology. Commissioned by the Fintech
Facilitation Office (FFO) of Hong Kong Monetary Authority (HKMA). https://www.astri.org/
tdprojects/whitepaper-on-distributed-ledger-technology/, retrieved 15 April, 2017.
Baeza-Yates, R., B. Ribeiro-Neto (2011): Modern Information Retrieval: The Concepts and
Technology behind Search, 2nd edition. Addison-Wesley, Reading, MA.
Barabasi, A.-L. (2016): Network Science. Cambridge University Press, Cambridge, UK.
Barker, D. (2016): Web Content Management: Systems, Features, and Best Practices. O’Reilly
Media, Sebastopol, CA.
Battelle, J. (2005): The Search – How Google and Its Rivals Rewrote the Rules of Business and
Transformed Our Culture. Portfolio (Penguin Group), New York.
Bazzell, M., J. Carroll (2016): The Complete Privacy & Security Desk Reference: Volume I:
Digital. CreateSpace Independent Publishing, Seattle.
Berman, F., V. Cerf (2017): Social and Ethical Behavior in the Internet of Things.
Communications of the ACM 60 (2), 6–7.
Berners-Lee, T. (2000): Weaving the Web: The Original Design and Ultimate Destiny of the World
Wide Web. HarperCollins Publishers, New York.
Bilton, N. (2014): Hatching Twitter: A True Story of Money, Power, Friendship, and Betrayal.
Portfolio Penguin, New York.
Blei, D.M. (2012): Probabilistic topic models. Communications of the ACM 55 (4), 77–84.
Blöbaum, B. (ed.) (2016): Trust and Communication in a Digitized World: Models and Concepts
of Trust Research; Springer, Berlin.
Bonnington, C. (2015) http://www.wired.com/2015/02/smartphone-only-computer/, Retrieved 13
May, 2016.
Borgatti, S.P., M.G. Everett, J.C. Johnson (2013): Analyzing Social Networks. SAGE Publications
Ltd., Thousand Oaks, CA.
Botthoff, A., E.A. Hartmann, eds. (2014): Future of Work in Industry 4.0. Springer Vieweg Berlin
Heidelberg (in German).
Boutros, T., T. Purdie (2014). The Process Improvement Handbook: A Blueprint for Managing
Change and Increasing Organizational Performance. McGraw-Hill Education, New York.
Brabham, D.C. (2013): Crowdsourcing. The MIT Press, Boston, MA.
Brandt, R.L. (2012): One Click: Jeff Bezos and the Rise of Amazon.com. Portfolio Penguin,
London, UK.
Brin, S., L. Page (1998): The Anatomy of a Large-Scale Hypertextual Web Search Engine.
Computer Networks 30, pp. 107–117.
Brynjolfsson, E., A. McAfee (2014): The Second Machine Age – Work, Progress, and Prosperity
in a Time of Brilliant Technologies. W. W. Norton & Company, New York, London.
Büttcher, S., C.L.A. Clarke, G.V. Cormack (2010): Information Retrieval – Implementing and
Evaluating Search Engines. The MIT Press, Cambridge, MA.
Buyya, R., A.V. Dastjerdi, eds. (2016): Internet of Things – Principles and Paradigms. Morgan
Kaufmann, Cambridge, MA.
Carr, N. (2008): The Big Switch: Rewiring the World, from Edison to Google. W. W. Norton &
Company, New York.
Carr, N. (2016): Utopia Is Creepy and Other Provocations. W. W. Norton & Company, New
York.
Castro-Leon, E. (2014). Consumerization in the IT Service Ecosystem. IEEE IT Professional 16
(5), pp. 20–27.
Chang, F., J. Dean, S. Ghemawat et al. (2008): Bigtable: A Distributed Storage System for
Structured Data. ACM Transactions on Computer Systems (TOCS) 26 (2), Article No. 4.
Chou, T. (2016): Precision: Principles, Practices and Solutions for the Internet of Things.
Lulu.com Crowdstory.
Chow, R., P. Golle, M. Jakobsson et al. (2009). Controlling Data in the Cloud: Outsourcing
Computation without Outsourcing Control, In Proc. ACM Workshop on Cloud Computing
Security (CCSW), Chicago, IL.
Christensen, C.M. (1997): The Innovator’s Dilemma: When New Technologies Cause Great Firms
to Fail. Harvard Business School Publishing, Boston, MA.
Codd, E.F. (1970): A relational model of data for large shared data banks. Communications of
the ACM 13 (6), 377–387.
Codd, E.F. (1982): Relational databases: a practical foundation for productivity. Communications
of the ACM 25 (2), 109–117.
Cohen, A. (2002): The Perfect Store: Inside eBay. Little, Brown and Company, Boston, MA.
Cohen, H. (2011): What is Social Commerce? http://heidicohen.com/what-is-social-commerc/,
Retrieved 26 March, 2017.
Corbett, J.C., J. Dean, M. Epstein et al. (2012): Spanner: Google’s Globally-Distributed Database.
Proc. 10th USENIX Symposium on Operating System Design and Implementation (OSDI),
251–264.
Cumbie, B. A., B. Kar (2016): A Study of Local Government Website Inclusiveness: The Gap
Between E-Government Concept and Practice. Information Technology for Development 22
(1), 15–35.
Das Sarma, A., A. Parameswaran, J. Widom: (2016). Towards globally optimal crowdsourcing
quality management: The uniform worker setting. In Proceedings of the 2016 International
Conference on Management of Data, 47–62.
De Kare-Silver, M. (1998): E-Shock: The Electronic Shopping Revolution: Strategies for Retailers
and Manufacturers. AMACOM, New York.
Deakins, E., S. Dillon, H. Al Namani (2008): Local e-Government Development Philosophy in
China, New Zealand, Oman, and the United Kingdom. Proc. 6th International Conference on
E-Government: ICEG 2008. Academic Conferences Limited, 109.
Dean, J., S. Ghemawat (2008): MapReduce: simplified data processing on large clusters.
Communications of the ACM 51 (1), 107–113.
Dekel, E., D. Nassimi, S. Sahni (1981): Parallel Matrix and Graph Algorithms; SIAM Journal on
Computing 10, 657–675.
Deloitte (n.d.): The digital workplace: Think, share, do. Transform your employee experience, http://
www2.deloitte.com/content/dam/Deloitte/mx/Documents/human-capital/The_digital_
workplace.pdf, Retrieved 8 May 2016.
Deloitte & Touche (2001): The Citizen As Customer. CMA Management 74 (10), 58.
Denning, P.J., R. Dunham (2010): The Innovator’s Way: Essential Practices for Successful
Innovation. The MIT Press, Cambridge, MA.
Diedrich, H. (2016): Ethereum – Blockchains, Digital Assets, Smart Contracts, Decentralized
Autonomous Organizations. Wildfire Publishing.
Dillon, S., G. Vossen (2015): SaaS Cloud Computing in Small and Medium Enterprises: A
Comparison between Germany and New Zealand; International Journal of Information
Technology, Communications and Convergence 3 (2), 87–104.
Docherty, M. (2015): Collective Disruption: How Corporations & Startups Can Co-Create
Transformative New Businesses. Polarity Press, Boca Raton, FL.
Drescher, D. (2017). Blockchain Basics – A Non-Technical Introduction in 25 Steps. Apress
Springer Science + Business Media, New York.
Elmasri, R.A., S.B. Navathe (2016): Fundamentals of Database Systems, 7th ed. Pearson
Addison-Wesley, Boston, MA.
Emarketer.com (2014): Mobile Commerce Trends, https://www.emarketer.com/Webinar/Mobile-
Commerce-Trends/4000088, Retrieved 26 March, 2017.
Englert, M., S. Siebert, M. Ziegler (2014): Logical Limitations to Machine Ethics with
Consequences to Lethal Autonomous Weapons; Computing Research Repository (CoRR),
November 2014 (see http://arxiv.org/abs/1411.2842).
Erl, T. (2005). Service-Oriented Architecture (SOA): Concepts, Technology, and Design.
Prentice-Hall, Upper Saddle River, NJ, USA.
Erl, T. (2009). SOA Design Patterns. Prentice-Hall, Upper Saddle River, NJ, USA.
Erl, T., R. Puttini, Z. Mahmood (2013): Cloud Computing: Concepts, Technology & Architecture.
Prentice Hall, Upper Saddle River, NJ.
Festl, R., T. Quandt (2016): The Role of Online Communication in Long-Term Cyberbullying
Involvement among Girls and Boys. Journal of Youth and Adolescence, 45 (9), 1931–1945.
Fitzgerald, B., K. Stol (2015): The Dos and Don’ts of Crowdsourcing Software Development. Proc.
SOFSEM 2015, LNCS 8939, 58–64.
Fouss, F., M. Saerens, M. Shimbo (2016): Network Data and Link Analysis. Cambridge University
Press, Cambridge, UK.
Friedman, T.L. (2005): The World is Flat – A Brief History of the Twenty-First Century. Farrar,
Straus and Giroux, New York.
Friedman, T.L. (2016): Thank You for Being Late – An Optimist’s Guide to Thriving in the Age of
Accelerations. Farrar, Straus and Giroux, New York.
Friedman, T.L., M. Mandelbaum (2011): That Used to Be Us: How America Fell Behind in the
World It Invented and How We Can Come Back. Farrar, Straus and Giroux, New York.
Ganesan, R. (2014): Li-Fi Technology in Wireless Communication, Communications Engineering
Papers, Madras Institute of Technology, Anna University, http://www.yuvaengineers.com/li-fi-
technology-in-wireless-communication-revathi-ganesan/ Retrieved 10 December 2016.
Gartner (2016): Gartner Says Worldwide Smartphone Sales to Slow in 2016, http://www.gartner.
com/newsroom/id/3339019 Retrieved 8 August 2016.
Gassmann, O., K. Frankenberger, M. Csik (2014): The Business Model Navigator: 55 Models That
Will Revolutionise Your Business. Pearson Education Ltd., Harlow, UK.
Georgakopoulos, D., P.P. Jayaraman (2016): Internet of things: from internet scale sensing to
smart services. Computing 98, 1041–1058.
Gertner, J. (2012): The Idea Factory: Bell Labs and the Great Age of American Innovation.
Penguin Group, New York.
Gilbert, S., N. Lynch (2002): Brewer’s conjecture and the feasibility of consistent, available,
partition-tolerant web services. ACM SIGACT News 33 (2), 51–59.
Gilchrist, A. (2016): Industry 4.0: The Industrial Internet of Things. Apress Media (Springer
Nature).
Girvan, M., M.E.J. Newman (2002): Community structure in social and biological networks. Proc.
National Academy of Science of the USA 99, 7821–7826.
Goransson, P., Ch. Black (2017): Software Defined Networks: A Comprehensive Approach, 2nd ed.
Morgan Kaufmann Publishers, Cambridge, MA.
Grandinetti, L. (2006): Grid Computing: The New Frontier of High Performance Computing.
Elsevier Science, Amsterdam, The Netherlands.
Grant, G., D. Chau (2006): Developing a generic framework for e-government. Advanced Topics
in Information Management 5, 72–94.
Greengard, S. (2015): The Internet of Things. The MIT Press, Cambridge, MA.
Greengard, S. (2017): The Future of Semiconductors. Communications of the ACM 60 (3) 2017,
18–20.
Hammer, M., J.A. Champy (1993; revised edition 2003): Reengineering the Corporation: A
Manifesto for Business Revolution. HarperCollins Publishers, New York.
Han, J., M. Kamber, J. Pei (2012): Data Mining: Concepts and Techniques, 3rd edition; Morgan
Kaufmann Publishers, San Francisco, CA.
Harold, E. R., W. S. Means (2004): XML in a Nutshell, 3rd edition. O’Reilly Media, Sebastopol,
CA.
Haselmann, T., G. Vossen (2014): EVACS: Economic Value Assessment of Cloud Sourcing by
Small and Medium-sized Enterprises; EMISA Forum 1/2014, 18–31 (available at http://www.
emisa.org/index.php/publikationen/forum/item/46-2014-1).
Haselmann, T., G. Vossen, S. Dillon (2015): Cooperative Hybrid Cloud Intermediaries — Making
Cloud Sourcing Feasible for Small and Medium-sized Enterprises; Open Journal of Cloud
Computing (OJCC) 2 (2) 2015, 4–20.
Haselmann, T., G. Vossen, St. Lipsky, Th. Theurl (2011): A Cooperative Community Cloud for
Small and Medium Enterprises; Proc. 1st International Conference on Cloud Computing and
Service Science (CLOSER) 2011, Noordwijkerhout, The Netherlands, SciTePress Science and
Technology Publications, 104–109.
Hermann, M., T. Pentek, B. Otto (2015): Design Principles for Industrie 4.0 Scenarios: A
Literature Review. Working Paper No. 1/2015, TU Dortmund, Audi Stiftungslehrstuhl Supply
Net Order Management. (http://www.thiagobranquinho.com/wp-content/uploads/2016/11/
Design-Principles-for-Industrie-4_0-Scenarios.pdf).
Hoare, A., R. Milner, eds. (2004): Grand Challenges in Computing Research. The British
Computer Society, Swindon, Wiltshire, UK.
Hopcroft, J.E., R.M. Karp (1973): An n^(5/2) algorithm for maximum matchings in bipartite graphs.
SIAM Journal on Computing 2 (4), 225–231.
Howe, J. (2009): Crowdsourcing: Why the Power of the Crowd Is Driving the Future of Business.
Three Rivers Press, New York.
Hunter, R.C., C. Eng (1975): Engine Failure Prediction Techniques. Aircraft Engineering and
Aerospace Technology 47 (3), 4–14.
Internet Retailer 2016 Mobile 500 (2015): https://www.digitalcommerce360.com/2015/08/18/
mobile-commerce-now-30-all-us-e-commerce/, Retrieved 26 March, 2017.
Jannach, D., P. Resnick, A. Tuzhilin, M. Zanker (2016): Recommender Systems – Beyond Matrix
Completion. Communications of the ACM 59 (11), 94–102.
Juels, A., A. Oprea (2013): New approaches to security and availability for cloud data.
Communications of the ACM 56 (2), 64–73.
Kagermann, H., W. Wahlster, J. Helbig, eds. (2013): Recommendations for implementing the
strategic initiative Industrie 4.0: Final report of the Industrie 4.0 Working Group. (http://www.
acatech.de/fileadmin/user_upload/Baumstruktur_nach_Website/Acatech/root/de/Material_
fuer_Sonderseiten/Industrie_4.0/Final_report__Industrie_4.0_accessible.pdf).
Kahn, H., A.J. Wiener (1967): The year 2000: A framework for speculation on the next thirty-three
years. Macmillan.
Kalyanasundaram, B., K.R. Pruhs (2000): An optimal deterministic algorithm for b-matching.
Theoretical Computer Science 233 (1–2), 319–325.
Kaplan, R.S., D.P. Norton (1996): The Balanced Scorecard: Translating Strategy into Action.
Harvard Business Review Press, Boston, MA.
Karp, R. (1992): On-line algorithms versus off-line algorithms: How much is it worth to know the
future? Proc. 12th IFIP World Computer Congress, 416–429.
Kavis, M.J. (2014): Architecting the Cloud: Design Decisions for Cloud Computing Service
Models. John Wiley & Sons, Hoboken, NJ.
Keen, A. (2015): The Internet Is Not the Answer. Grove Atlantic, Inc., New York.
Keese, C. (2016): The Silicon Valley Challenge: A Wake-Up Call for Europe. Penguin Books,
London, UK.
Kelleher, J.D., B. MacNamee, A. D’Arcy (2015): Fundamentals of Machine Learning for
Predictive Data Analytics. The MIT Press, Cambridge, MA.
Kelley, T., D. Kelley (2013): Creative Confidence – Unleashing the Creative Potential in All of
Us. Crown Business, New York.
Key, S. (2015): XML Programming Success in a Day, 2nd edition. CreateSpace Independent
Publishing Platform.
Kietzmann, J.H., K. Hermkens, I.P. McCarthy, B.S. Silvestre (2011): Social media? Get serious!
Understanding the functional building blocks of social media. Business Horizons 54,
241–251.
Kirkpatrick, D. (2011): The Facebook Effect: The Real Inside Story of Mark Zuckerberg and the
World’s Fastest Growing Company. Virgin Books, New York.
Kittur, A., J.V. Nickerson, M. Bernstein, et al. (2013): The future of crowd work. In Proceedings of
the 2013 conference on Computer supported cooperative work, 1301–1318.
Kostojohn, S., B. Paulen, M. Johnson (2011): CRM Fundamentals. Apress Springer Science +
Business Media, New York.
Lance, D., M.E. Schweigert (2013): BYOD: Moving toward a More Mobile and Productive
Workforce. Business & Information Technology. Paper 3; Montana Tech Library.
Langville, A.N., C.D. Meyer (2012): Google’s PageRank and Beyond – The Science of Search
Engine Rankings. Princeton University Press, Princeton, NJ.
Lanier, J. (2014): Who Owns the Future? Simon and Schuster, New York.
Laudon, K.C., C.G. Traver (2015): E-Commerce: Business, Technology, Society, 11th edition.
Prentice-Hall, Englewood-Cliffs, NJ.
Lebraty, J.-F., K. Lobre-Lebraty (2013): Forms of Crowdsourcing, in Crowdsourcing, John Wiley
& Sons, Inc., Hoboken, NJ USA. doi: 10.1002/9781118760765.ch4.
Lechtenbörger, J., F. Stahl, V. Volz, G. Vossen (2015): Analyzing Observable Success and
Activity Indicators on Crowdfunding Platforms; International Journal of Web Based
Communities (IJWBC) 11 (3–4), 264–289.
Lemstra, W., V. Hayes, J. Groenewegen (2010). The innovation journey of Wi-Fi: The road to
global success. Cambridge University Press.
Leskovec, J., A. Rajaraman, J.D. Ullman (2014): Mining of Massive Datasets, 2nd ed. Cambridge
University Press.
Levene, M. (2010): An Introduction to Search Engines and Web Navigation, 2nd edition. John
Wiley & Sons, New York.
Lewis, D.D. (1992): Representation and learning in information retrieval. PhD thesis, University
of Massachusetts, Amherst, MA.
Lewis, M. (2004): Moneyball: The Art of Winning an Unfair Game. W.W. Norton & Company,
New York.
Lindner, A., ed. (2016): European Data Protection Law: General Data Protection Regulation
2016. CreateSpace Independent Publishing, Seattle.
Liu, B. (2015): Sentiment Analysis – Mining Opinions, Sentiments, and Emotions. Cambridge
University Press, Boston, MA.
Lockwood, T. (2009): Design Thinking: Integrating Innovation, Customer Experience, and Brand
Value. Allworth Press, New York.
Lofaro, R. (2014). The Business Side of BYOD: Cultural and Organizational Impacts. Amazon
Media.
Loos, P., J. Lechtenbörger, G. Vossen, et al. (2011): In-memory Databases in Business
Information Systems; Business Information Systems Engineering 6, 389–395.
Lu, J., D. Wu, M. Mao, W. Wang, G. Zhang (2015): Recommender System Application
Development: A Survey. Decision Support Systems 74, 12–32.
MacManus, R. (2015): Health Trackers: How Technology is Helping Us Monitor and Improve
Our Health. Rowman & Littlefield, Lanham, MD.
MacPherson, I. (1995): Co-operative Principles for the 21st Century. International Co-operative
Alliance, Geneva.
Mahon, E. (2015): Transitioning the Enterprise to the Cloud: A Business Approach. Cloudworks
Publ. Co., Hudson, OH.
Marr, B. (2016): Big Data: Using SMART Big Data, Analytics and Metrics To Make Better
Decisions and Improve Performance. John Wiley & Sons Ltd, Chichester, UK.
Marsden, P. (2009): The 6 Dimensions of Social Commerce: Rated And Reviewed, Digital
Intelligence Today, http://digitalintelligencetoday.com/the-6-dimensions-of-social-commerce-
rated-and-reviewed/, Retrieved 26 March, 2017.
Marshall, S. (2014): What a Digital Workplace Is and What It Isn’t. CMS Wire, http://www.
cmswire.com/cms/social-business/what-a-digital-workplace-is-and-what-it-isnt-027421.php,
Retrieved 9 May 2016.
Mayer-Schönberger, V., K. Cukier (2013): Big Data: A Revolution That Will Transform How We
Live, Work, and Think. John Murray (Publishers), London, UK.
McCaskill, S. (2015): Zapp: Don’t Worry, Mobile Payments ‘Safer’ Than Shopping Online, 2017.
http://www.silicon.co.uk/e-marketing/zapp-mobile-payments-security-161630, Retrieved 26
March 2017.
McConnell, J. (2016): The Organization in the Digital Age, 10th Annual Report. http://www.
netjmc.com/digital-workplace-report/, Retrieved 9 May 2016.
Mcguire, K. (2014): SWOT analysis 34 Success Secrets. Emereo Publishing.
McKeen, J.D., H.A. Smith (2014): IT Strategy: Issues and Practices, 3rd ed. Pearson, Boston,
MA.
Mehta, A., A. Saberi, U. Vazirani, V. Vazirani (2005): Adwords and generalized on-line matching.
IEEE Symp. on Foundations of Computer Science, 264–273.
Mell, P., T. Grance (2011): The NIST Definition of Cloud Computing. Techn. Report SP800-145,
National Institute of Standards and Technology (NIST), Gaithersburg, MD. http://nvlpubs.nist.
gov/nistpubs/Legacy/SP/nistspecialpublication800-145.pdf (as of Jan 8, 2017).
Miller, M. (2009): Google.pedia – The Ultimate Google Resource, 3rd edition. Que Publishing,
Indianapolis, IN.
Miller, K., J. Voas, G. Hurlburt (2012). BYOD: Security and Privacy Considerations. IT
Professional, 14 (5), 53–55.
Morrow, B. (2012). BYOD security challenges: control and protect your most sensitive data,
Network Security 2012 (12), pp. 5–8.
Muschalle, A., F. Stahl, A. Löser, G. Vossen (2013): Pricing Approaches for Data Markets. In: M.
Castellanos, U. Dayal, E. Rundensteiner (Eds.): BIRTE 2012 (Proc. 6th International Workshop
on Business Intelligence for the Real Time Enterprise 2012, Istanbul, Turkey), Springer LNBIP
154, 129–144.
Musciano, C., B. Kennedy (2006): HTML & XHTML: The Definitive Guide, 6th edition. O’Reilly
Media, Sebastopol, CA.
Nakamoto, S. (2008): Bitcoin: A Peer-to-Peer Electronic Cash System. Available online for
example from https://bitcoin.org/bitcoin.pdf. Retrieved 15 April, 2017.
Norman, D. A. (1999): The Invisible Computer: Why Good Products Can Fail, the Personal
Computer Is So Complex, and Information Appliances Are the Solution. The MIT Press,
Boston, MA.
Norris, D.F., C.G. Reddick (2013): Local E‐Government in the United States: Transformation or
Incremental Change? Public Administration Review 73 (1), 165–175.
Olim, J., M. Olim, P. Kent (1999): The CDnow Story: Rags to Riches on the Internet. Top Floor
Publishing, Lakewood, CO.
Özsu, M.T., P. Valduriez (2011): Principles of Distributed Database Systems, 3rd ed. Springer, New
York.
Oliveira, G.H.M., E.W. Welch (2013): Social media use in local government: Linkage of
technology, task, and organizational context. Government Information Quarterly 30 (4), 397–
405.
Osterwalder, A., Y. Pigneur (2010): Business Model Generation: A Handbook for Visionaries,
Game Changers, and Challengers. John Wiley & Sons, Hoboken, NJ.
Ovans, A. (2015): What is a Business Model? Harvard Business Review. https://hbr.org/2015/01/
what-is-a-business-model (last visited Feb 22, 2017).
Parker, G.G., M.W. Van Alstyne, S.P. Choudary (2016): Platform Revolution: How Networked
Markets are Transforming the Economy – and How to Make them Work for You. W. W. Norton
& Company, New York, London.
Patel, S., H. Park, P. Bonato, L. Chan, M. Rodgers (2012): A review of wearable sensors
and systems with application in rehabilitation. Journal of NeuroEngineering and Rehabilitation
9 (1), 1.
Payne, A. (2005): Handbook of CRM: Achieving Excellence in Customer Management.
Butterworth-Heinemann Elsevier, Amsterdam.
Peffers, K., T. Tuunanen, C.E. Gengler, M. Rossi, W. Hui, V. Virtanen, J. Bragge (2006): The
Design Science Research Process: A Model for Producing and Presenting Information Systems
Research. Proc. 1st Int. Conf. on Design Science Research in Information Systems and
Technology, 83–106.
Peffers, K., T. Tuunanen, M.A. Rothenberger, S. Chatterjee (2007): A Design Science Research
Methodology for Information Systems Research. Journal of Management Information Systems
24 (3), 45–77.
Petri, C.A. (1962): Communication with Automata. Dissertation, Schriften des
Rheinisch-Westfälischen Instituts für Instrumentelle Mathematik an der Universität Bonn,
Germany, Heft 2 (in German).
Plattner, H., A. Zeier (2015): In-Memory Data Management: Technology and Applications, 2nd ed.
Springer-Verlag, Berlin.
Schomm, F., F. Stahl, G. Vossen (2013): Marketplaces for Data: An Initial Survey. ACM
SIGMOD Record 42 (1), 15–26.
Schönthaler, F., G. Vossen, A. Oberweis, T. Karle (2012): Business Processes for Business
Communities: Modeling Languages, Methods, Tools. Springer, Heidelberg.
Scott, J. (2013): Social Network Analysis, 3rd edition. SAGE Publications Ltd., Thousand Oaks, CA.
Seelig, T. (2012): inGenius: A Crash Course on Creativity. HarperCollins Publishers, New York.
Seelig, T. (2015): Insight Out: Get Ideas Out of Your Head and Into the World. HarperCollins
Publishers, New York.
Shan, S., L. Wang, J. Wang, Y. Hao, F. Hua (2011): Research on e-government evaluation model
based on the principal component analysis. Information Technology and Management 12 (2),
173–185.
Shasha, D., M. Wilson (2010): Statistics is Easy! 2nd ed. Synthesis Lectures on Mathematics and
Statistics, Morgan & Claypool Publishers, San Francisco, CA.
Sitaram, D., G. Manjunath (2012): Moving To The Cloud: Developing Apps in the New World of
Cloud Computing. Elsevier Syngress.
Stahl, F., F. Schomm, G. Vossen (2014): Data Marketplaces: An Emerging Species; H.-M. Haav,
A. Kalja, T. Robal, eds.: Databases and Information Systems VIII, Frontiers in Artificial
Intelligence and Applications Series, Vol. 270, IOS Press, Amsterdam, 145–158.
Stahl, F., F. Schomm, G. Vossen, L. Vomfell (2016): A Classification Framework for Data
Marketplaces; Vietnam Journal of Computer Science 3 (3), 137–143.
Steffen, D. (2013): Parallelized Analysis of Opinions and their Diffusion in Online Sources.
Master’s thesis, University of Münster, Germany, January 2013.
Stieglitz, S., L. Dang-Xuan (2013): Social media and political communication: a social media
analytics framework. Social Network Analysis and Mining 3 (4), 1277–1291.
Stiglitz, J.E., B.C. Greenwald (2014): Creating a Learning Society: A New Approach to Growth,
Development, and Social Progress. Columbia University Press, New York.
Stone, B. (2014): The Everything Store: Jeff Bezos and the Age of Amazon. Little, Brown and Co.,
New York.
Stone, B. (2017): The Upstarts: How Uber, Airbnb, and the Killer Companies of the New Silicon
Valley Are Changing the World. Little, Brown and Company, New York.
Swenson, K. D. (2010). Mastering the Unpredictable: How Adaptive Case Management Will
Revolutionize the Way That Knowledge Workers Get Things Done. Tampa, FL: Meghan-Kiffer
Press.
Tanenbaum, A. S., D.J. Wetherall (2010): Computer Networks, 5th edition. Pearson Education Inc.,
Boston, MA.
Tanenbaum, A.S., M. van Steen (2007): Distributed Systems – Principles and Paradigms, 2nd
edition. Prentice-Hall, Upper Saddle River, NJ.
Thackray, A., D. Brock, R. Jones (2015): Moore's Law: The Life of Gordon Moore, Silicon
Valley's Quiet Revolutionary. Basic Books, New York.
The Verge. (2013): “Google Glass apps: everything you can do right now”, http://www.theverge.
com/2013/5/20/4339446/google-glass-apps-everything-you-can-do-right-now, last accessed:
2016/05/25.
Theurl, T., E.C. Meyer, eds. (2005): Strategies for Cooperation. Shaker-Verlag, Aachen, Germany.
Thiel, P., B. Masters (2014): Zero to One: Notes on Startups, or How to Build the Future. Crown
Business, New York.
Tian, Y., B. Song, E.N. Huh (2011). Towards the development of personal cloud computing for
mobile thin-clients. Proc. IEEE International Conference on Information Science and
Applications (ICISA), 1–5.
Turner, V. (2016): Reducing the Time to Value for Internet of Things Deployments. IDC,
Framingham, MA, USA.
Venters, W., E.A. Whitley (2012): A critical review of cloud computing: researching desires and
realities. Journal of Information Technology 27 (3), 179–197.
Vise, D.A. (2005): The Google Story – Inside the Hottest Business, Media and Technology Success
of Our Time. Macmillan, London.
Vom Brocke, J., M. Rosemann (2015a): Handbook on Business Process Management 1:
Introduction, Methods, and Information Systems, 2nd ed. Springer Heidelberg.
Vom Brocke, J., M. Rosemann (2015b): Handbook on Business Process Management 2: Strategic
Alignment, Governance, People and Culture, 2nd ed. Springer Heidelberg.
Vom Brocke, J., Th. Schmiedel (2015): BPM – Driving Innovation in a Digital World.
Springer, Berlin.
Vossen, G. (2014): Big Data as the New Enabler in Business and other Intelligence. Vietnam
Journal of Computer Science 1 (1), 1–12.
Watts, D.J. (2004): Six Degrees: The Science of a Connected Age. W.W. Norton & Company,
New York.
Weikum, G., G. Vossen (2002): Transactional Information Systems – Theory, Algorithms, and the
Practice of Concurrency Control and Recovery; Morgan Kaufmann Publishers, San Francisco,
CA.
White, M. (2012): Digital Workplaces: Vision and Reality. Business Information Review 29 (4),
205–214.
White, T. (2015): Hadoop: The Definitive Guide: Storage and Analysis at Internet Scale. O’Reilly
Media, Sebastopol, CA.
Williamson, O. (2005): Networks – Organizational Solutions to Future Challenges. In: Economics
of Interfirm Networks (Ökonomik der Kooperation). Mohr Siebeck, Tübingen, 3–28.
Wirtz, B.W., P. Nitzsche (2013): Local level E-government in international comparison. Journal of
Public Administration and Governance 3 (3), 64–93.
Witten, I. H., E. Frank, M.A. Hall, C.J. Pal (2016): Data Mining – Practical Machine Learning
Tools and Techniques, 4th edition. Morgan Kaufmann Publishers, San Francisco, CA.
wpress4.me (2013): The 7 Species of Social Commerce, http://www.wpress4.me/the-7-species-of-
social-commerce/, Retrieved 26 March 2017.
Yayiki, E. (2016): Design Thinking Methodology Book. Design Management Institute,
ArtBizTech, Istanbul, Turkey.
Yotpo. (n.d.): The 4 Most Powerful Social Commerce Trends, https://www.yotpo.com/blog/the-4-
most-powerful-social-commerce-trends/, Retrieved 26 March 2017.
Zaki, M.J., W. Meira Jr. (2014): Data Mining and Analysis. Cambridge University Press, New
York.
Zimmer, M. (2010). Facebook’s Zuckerberg: “Having two identities for yourself is an example of
a lack of integrity”, http://www.michaelzimmer.org/2010/05/14/facebooks-zuckerberg-having-
two-identities-for-yourself-is-an-example-of-a-lack-of-integrity/, last accessed 2014/12/11.
Index
A
Accelerator, 229
ACID, 84
Adams, Douglas, 38
Adaptive case management (ACM), 58
Ad-blocker, 142
Adtech, 141
Advertising, 112
AdWords, 133, 135
Airbnb, 35, 77, 235, 242, 245, 283
Allegro, 92
Allstate Drivewise, 118
Amazon, 92, 113, 126, 143, 185, 226, 243
  Alexa, 243, 252
  Aurora, 68
  AWS, 67, 68, 78, 80, 81, 86, 87, 114, 115, 168, 264
  .com, 11
  Elastic Compute Cloud (EC2), 68
  Prime video, 79
  Redshift, 68
  Simple Storage Service (S3), 68
  Mechanical Turk (AMT), 44, 47
  Web Services (AWS), 67, 114
Andreessen, Marc, 3, 271
Android, 28, 34
AngularJS, 9
AOL, 17
Apache, 7, 28, 169, 182
  Hadoop, 91
  Software Foundation, 168
Apple, 2, 30, 50, 77, 79, 96, 114, 224, 225, 236
  iCloud, 78
  iOS, 28
  iPad, 28
  iPhone, 28, 34
  iPod, 28
  iTunes, 225
  Siri, 252
  Watch, 29
Appliance, 98
Application programming interface (API), 73
Application service provider (ASP), 72
Apriori algorithm, 94, 175
Artificial intelligence (AI), 252, 257, 269, 271
ArtistShare, 46
Asana, 127, 197
Association rule mining, 170, 171, 173
Astroturfing, 115
Asynchronous JavaScript and XML (Ajax), 9
Atlassian confluence, 127
Auction, 141, 225

B
BackRub, 16
Balance algorithm, 141
Balanced scorecard, 159
Banner ads, 134
Beane, Billy, 116
Berners-Lee, Tim, 4, 7, 22, 50, 272
Betweenness, 131
Bezos, Jeff, 11
Big data, 34, 82, 84, 154, 256, 257, 261
BigTable, 92
Bilateral cloud intermediary, 165
Bing, 33
Bipartite graph, 138
Bitcoin, 109
Blockchain, 109, 203
Blog, 36, 127, 179
Blogging, 36
Bluetooth, 26
Bluetooth low energy (BLE), 26
Bosch IoT suite, 266
Bragi Dash, 31
Branson, Richard, 273
Brewer, Eric, 93
Brin, Sergey, 16

NuoDB, 92

O
Oakland Athletics, 116
OKCupid, 80
Olim brothers, 11
Olson, Ken, 95
Omidyar, Pierre, 110
On-board diagnostics (OBD), 118
On-demand society, 243
OneSpace, 168
Online advertising, 134
Online algorithm, 137, 138
Online Analytical Processing (OLAP), 81, 84
Open-source, 168
Opera, 2
Oracle, 21, 262
  Applications, 61
  Cloud, 67
  Cloud machine, 78
  Database appliance, 99
  Engineered system, 97, 258
  Exadata database machine, 96
  Exalogic elastic cloud, 97
  Exalytics, 97, 98
  Fusion ERP cloud, 62
  IoT cloud service, 266
  TimesTen, 98
Organization for the Advancement of Structured Information Standards (OASIS), 22
Original Equipment Manufacturing (OEM), 213
Osterwalder, Alexander, 226
Outsourcing, 72
Overture, 135, 141

P
Packet, 21
Page, Larry, 15, 16
PageRank, 15, 37, 88
Partitioning, 88
PayPal, 10, 106, 112
Pay-per-click, 133
Pay-per-unit, 79
Pay-per-use, 79
Pebble watch, 46
Peer-to-peer (P2P), 6, 21
Personal computer (PC), 19
Personal digital assistants (PDA), 33
Personal digital cellular (PDM), 25
Personalization, 123
Petri, Carl Adam, 101
PewResearchCenter, 22
Phablet, 29
PHP, 8
Pinterest, 126
Pitt, Brad, 116
Plantix, 251
Platform, 46, 67, 77, 242, 262, 263
Platform as a service (PaaS), 76
Pokemon go, 23
Porsche digital lab, 229
Portal, 17
position2, 132
PostgreSQL, 68
Predictive analytics, 186
Prensky, Marc, 40
PRISM, 254
Privacy, 240
Private cloud, 78
Process mining, 203
Profile, 143–145
Progressive Snapshot, 119
PROMATIS BPM Appliance, 100
Prosumer, 257
Public cloud, 78

R
R, 94
Raconteur, 273
Rating, 143
Razor and blades, 225
ReadWriteWeb, 1, 31
Recommendation, 107, 112, 142
Recommender system, 94
Reducer, 89
Relational database service (RDS), 68
Relational data model, 84
Replication, 86, 88
Responsive design, 106
RFID tags, 32
Risk management, 64
Ritchie, Dennis, 168
Robo advisor, 233

S
Safari, 2
Salesforce, 67
Samsung Bixby, 252
SAP, 21, 93
  HANA, 93, 102, 258, 261, 266
SAS, 94
Scalability, 71
SCOR, 211
Schneiderman, Eric T., 115
Petri net, 54, 55, 57, 101 Search, 13
Index 291
Undirected graph, 129 What you see Is what you get (WYSIWYG), 5
Uniform resource identifier (URI), 9 Wi-Fi, 25
Universal resource locator (URL), 13 Wi-Fi protected access (WPA), 25
UNIX, 168 Wiki, 37, 127
Urban Engines, 120 Wikimedia, 40
User generated content (UGC), 35 Wikipedia, 38
UserLand, 37 WikiWikiWeb, 38
User-to-user recommendation, 148 Wired, 42, 44
Utility computing, 74 Wired equivalent privacy (WEP), 25
Utility function, 144 Wireless local area network (WLAN), 25
Utility matrix, 144 Wojcicki, Susan, 271
WordPress, 37
V Workflow Management Coalition (WfMC), 58
Value chain, 257 Work life balance, 256
Vector, 145 World Economic Forum (WEF), 50
Viber, 122 World-Wide Web (WWW), 1, 2
Virtualization, 72, 260 World-Wide Web Consortium (W3C), 3, 22
Virtual lab, 230 WPA2, 25
Virtual machine monitor, 73
Virtual machine (VM), 73 X
Virtual memory, 72 XAMPP, 7
Virtual organization, 263 XHTML, 8
Virtual private cloud, 78 XMLHttpRequest, 9
Voice-controlled application, 252 XML net, 57
Voice over IP (VoIP), 21 XML Schema Definition (XSD), 8
Voldemort, 92
VoltDB, 92 Y
Yahoo!, 10, 17, 33, 135
W Yelp, 126
Wales, Jimmy, 39 Yet another resource negotiator (YARN), 91,
Watson, 34, 95, 265 182
Watson, Thomas, 95 YouTube, 6, 42
Wayback machine, 14
Wearable, 29 Z
Web, 1 Zapp, 123
2.0, 1 Zeitgeist, 16
conferencing, 197 Zetsche, Dieter, 253
of trust (WOT), 240 Zipcloud, 10
service, 9 Zoho, 65, 67
WhatsApp, 122, 239