COURSE DESIGN COMMITTEE
Author: Shashank Mishra
Copyright:
2017 Publisher
ISBN:
978-93-86052-15-5
Address:
4435/7, Ansari Road, Daryaganj, New Delhi–110002
Only for NMIMS Global Access – School for Continuing Education
School Address: V. L. Mehta Road, Vile Parle (W), Mumbai – 400 056, India.
4 Resource Considerations to Support Business Analytics
curriculum
Business Transformation with Big Data: What is Big Data; Structured vs. Unstructured Data; Big Data Skills and Sources of Big Data; Big Data Adoption; Characteristics of Big Data – The Seven Vs; Understanding Big Data with Examples; Key Aspects of a Big Data Platform; Governance for Big Data; Text Analytics and Streams; Business Applications of Big Data; Technology Infrastructure Required to Store, Handle and Manage Big Data
Technologies for Handling Big Data: Distributed and Parallel Computing for Big Data; Introduction to Big Data Technologies (Hadoop, Python, R, etc.); Cloud Computing and Big Data; In-Memory Technology for Big Data; Big Data Techniques (Massive Parallelism; Data Distribution; High-Performance Computing; Task and Thread Management; Data Mining and Analytics; Data Retrieval; Machine Learning; Data Visualization)
Introduction to Business Analytics: What is Business Analytics (BA)?; Types of BA; Business Analytics Model; Importance of Business Analytics now; What is Business Intelligence (BI)?; Relation between BI and BA; Emerging Trends in BI and BA
Resource considerations to support Business Analytics: What is Data, Information and Knowledge;
Business Analytics Personnel and their roles, required competencies for an analyst; Business Analytics
Data; Ensuring Data Quality; Technology for Business Analytics; Managing Change
Descriptive Analytics: What is Descriptive Analytics; Visualizing and Exploring Data; Descriptive Statistics; Sampling and Estimation; Introduction to Probability Distributions
Predictive Analytics: What is Predictive Analytics; Introduction to Predictive Modeling: Logic-driven
CONTENTS
1.1 Introduction
1.2 Evolution of Big Data
Self Assessment Questions
Activity
1.3 Structured v/s Unstructured data
Self Assessment Questions
Activity
1.4 Big Data Skills and Sources
1.13 Summary
1.14 Descriptive Questions
1.15 Answers and Hints
1.16 Suggested Readings & References
Introductory Caselet
notes
alone is of immense value to the corporation and serves as a driving factor for most of its business decisions, its way-forward strategy, trend analysis and internal quality-control policy formulations.
However, at the same time, it also accounts for an immense mag-
nitude of never-ending unstructured data, like videos, images,
documents, server configurations, customer set-ups, infrastruc-
ture details, and so on. To unleash the actual BI potential lying
underneath those information mountains, the corporation decided to implement Hadoop – an open source framework for distributed computing.
learning objectives
>> Discuss about text analytics
>> Describe business applications of Big Data
>> Explain technology infrastructure requirement
1.1 INTRODUCTION
The 21st century is characterised by the rapid advancement in the
field of information technology. IT has become an integral part of daily
life as well as various other industries, be it health, education or entertainment.
This chapter first discusses the evolution of Big Data. Next, it describes the differences between structured and unstructured data. Further, it explains Big Data skills and sources, followed by the adoption of Big Data. The chapter also discusses the characteristics of Big Data and Big Data analytics, the key aspects of a Big Data platform, and text analytics. Towards the end, it covers the business applications of Big Data and the technology infrastructure requirements.
displaced after the great railroad program into random habitats or places far away from their original ones. The authorities felt the need for an efficient system that could hold the data of such population dynamics.
In 1890, the Hollerith Tabulating System was utilised for census – it
was a mechanical device and worked with punch cards that could hold
80 different variables or attributes. It revolutionised the way census
was conducted and reduced the time taken for compilation of census
data from almost seven years to six weeks.
Some years later in 1919, IBM took up the agricultural census with
over 5000 federal employees deployed across Washington and over
90,000 enumerators by using more than 100 million IBM punch cards
and other processing equipment. After that successful program, Big
Data took yet another leap forward with the development of The Man-
hattan Project – the atomic bomb developed by the US in World War
II, and later in US space programs from the 1950s. Later, a synoptic data collection model was adopted, which relied heavily on the allocation of large data sets. This shift in data-collecting techniques, analysis and subsequent collaboration helped redefine how bigger scientific projects were planned and accomplished. One such ambitious project was the International Biological Program, which studied environmental changes on the species and flora-fauna of a particular place. The program led to an exponential increase in the amount of data gathered and combined the latest analysis technologies. Although it met with difficulties related to research structures and methodologies, and ultimately ended in 1974, it transformed the ways data was collected, organised and shared, and redefined how the existing technology could use data science more efficiently.
The lessons gained from the arrival of Big Data science laid the way for further contemporary Big Data projects, like weather prediction, supercollider data analytics and other physics-based research, astronomical sciences and data collection like planetary image detection, medical research and many others. Big Data has become such a dynamic force that it no longer applies only to the sciences; many businesses have hooked their critical data-based services onto its methodologies, techniques and objectives too, which has allowed them to unleash data value that might have gone unnoticed earlier.
1. The path towards modern Big Data was actually laid during
__________.
2. In 1890, the Hollerith Tabulating System was utilised for
census. (True/False)
Activity
Where else, beyond the existing industries and domains, do you think Big Data can play a crucial role in improving overall operational and organisational efficiency? Make a list of the domains with reasons to back them up.
uration file) with mails from the last two years for a company executive who receives over 100 emails per day. If you open it raw, by means of reverse-engineering, all you will see is a sea of randomly occurring datasets that point to nothing, with hard-to-decipher meanings, number codes and occasional sightings of familiar words. But if you open it in the program it is made for, you will see the structure and arrangement in which it is supposed to be presented.
No. Actually, a Word file may not fit in a database where only text files are supposed to be kept. A Word file may have an internal structure, with all sorts of indentation, grammar, alignment and margins thoroughly worked upon; but in a database with different definitions for the data, the database designer expects a text or Excel file, as a Word file is considered unstructured.
The joys of having structurally sound data are many: it can be seamlessly added to a relational database and is easily searchable by the simplest of search-engine operations or algorithms. Unstructured data is basically the reverse of the above definition: it is a nightmare for designers to connect the random strands of data with the existing meaningful ones and present them as a structure. Structured data is closer to machine language than unstructured data. So, the battle of finding a fine balance between keeping the machine happy and the user happier is what leads to the ever-refining Big Data sciences and their affiliated technologies.
considered a structurally sound entity. (True/False)
4. Structured data is closer to ______ language.
Activity
In your day-to-day life, write down all the structured data patterns you have observed for a week and compare them with the unstructured patterns around you. Now, think of ways in which the connection between them could be laid, if required. Please note: relate only logically cohesive things, i.e. things that can co-exist.
Exhibit
Semi-structured data
Semi-Structured Data

S. No.  Name                                  E-mail
1.      Sam Jacobs                            smj@xyz.com
2.      First Name: David; Last Name: Brown   davidb@xyz.com
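The exhibit above is semi-structured because every record carries the same fields, yet the name value does not follow one fixed layout. A small sketch (field names are illustrative) of normalising such records:

```python
# Semi-structured records: same fields, but the "name" value varies in layout.
records = [
    {"name": "Sam Jacobs", "email": "smj@xyz.com"},
    {"name": {"first": "David", "last": "Brown"}, "email": "davidb@xyz.com"},
]

def normalise(rec):
    """Flatten the inconsistent name layouts into one plain string."""
    name = rec["name"]
    if isinstance(name, dict):
        name = f"{name['first']} {name['last']}"
    return {"name": name, "email": rec["email"]}

clean = [normalise(r) for r in records]
print(clean[1]["name"])  # David Brown
```

Once normalised, the records can be loaded into a relational table like fully structured data.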
1.4 Big Data Skills and Sources
Now that we know theoretically what Big Data means, its evolution into the data sciences, the dramatic turnaround that made several industries latch onto it, and what kinds of data exist in the known data sphere – armed with that level of knowledge comes the next stage of reining in the data science, where we will look at the tools of the trade that are frequently used and the skills you need to possess to tame the dataset bulls that may seem intimidating at first.
further to its correct place. A keen statistical and data-mining mind will always take less time in finding patterns and studying the data. Hence, it is necessary to be hands-on with statistics and to have good mathematical skills – needless to say, you don't need to be a genius.
Over the next five years, demand for Big Data staff is forecast to increase at an average rate of between 13% p.a. (low growth) and 23% p.a. (high growth). A mid-point average of these two rates gives an expected growth rate of 18% p.a. This would be a favourable situation and should equate to the creation of approx. 28,000 job opportunities p.a. by 2017.
That was an overview of Big Data technologies and methodologies, with a brief look at the job prospects for a potential Big Data candidate. Let's now take a brief look at the sources of the datasets that define Big Data as a science and complete it as a method.
1.4.1 The Sources of Big Data
The philosophy around Big Data sciences and collection has often been defined around the 3 Vs – the volume, velocity and variety of data flowing into a system. For many years this used to be enough, but as companies moved more towards online processes, the description has been stretched to take in variability as well – which denotes the increase in the range of values of a large data set – and value, which addresses the evaluation of typical enterprise data.
The chunk of Big Data comes from three primary sources: machine data, social data and transactional data. Besides, companies need to distinguish between internally generated data, like data residing behind a corporation's firewall, and externally generated data.
Despite the immense variety of existing data, these datasets and types
alone are almost meaningless, and most organisations struggle to
make sense of the data that they are generating and how it can be put
to effective use.
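As a rough sketch of this taxonomy (the source categories come from the text; the class and field names are illustrative), an organisation might tag each incoming dataset by its primary source and by whether it originated behind the firewall:

```python
from dataclasses import dataclass

# The three primary sources named in the text, plus the internal/external split.
SOURCES = {"machine", "social", "transactional"}

@dataclass
class Dataset:
    name: str
    source: str       # one of SOURCES
    internal: bool    # True if generated behind the corporate firewall

    def __post_init__(self):
        if self.source not in SOURCES:
            raise ValueError(f"unknown source: {self.source}")

logs = Dataset("server-logs", "machine", internal=True)
tweets = Dataset("brand-mentions", "social", internal=False)
print(logs.internal, tweets.source)  # True social
```

Tagging datasets this way at ingestion is one simple step towards making sense of them later.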
5. Data that comes from door-to-door surveys falls in _______
category.
6. ________ data is the data created from/by sensors installed in machinery and industrial equipment, and even logs that track typical user behaviour.
Activity
Can there be specific data types that are the most reliable and authentic, while others are more prone to errors? Consider metrics such as references, quotes and sources while creating the visualisation.
Given the nature of Big Data and its analytical prowess, there are many issues that require consideration and planning at the very start. For example, with the adoption of any new technology, it becomes equally important to secure it in a way that conforms to current corporate standards. Tracking issues related to the source of a dataset, from its discovery to its consumption, is considered a new requirement for organisations. Managing the privacy of entities whose data or identity is handled by analytical processes must also be planned ahead.
Big Data frameworks are not push-button answers. For data analytics to offer value, corporations ought to have data management and governance frameworks for Big Data. Well-defined processes and ample skill sets for those responsible for customising, implementing, populating and using Big Data solutions are also necessary. Additionally, the data quality aimed for in Big Data-powered processing needs to be evaluated as well.
1.5.1 Use of Big Data in Social Networking
Big Data also powers friend requests, activity suggestions and pages to follow – all of these are Big Data at work behind the scenes, the chief driving force enabling you to reconnect with a long-lost friend and to customise your account as per your likings and interests. And not only on Facebook: the interconnection of several other social media platforms has opened the potential of a new social media world order, brewing with several hidden features whose exploitation can prove beneficial for all.
The clerk for a U.S. council received an e-mail from her senior, who was out of the country on vacation, requesting a funds transfer for a time-bound acquisition that needed to be closed by the end of the day. The senior said that a lawyer would contact her to provide further details.
“It was not uncommon for me to get official e-mails seeking funds transfers,” the clerk said. Later, the lawyer contacted her via e-mail with the appropriate authorisation – including her senior's signature and the company's seal – and she simply followed the directions to transfer more than $880,000 to a bank in China.
where Big Data offers a potential answer, as it allows institutions and corporations to tackle fraud differently and get results accordingly.
also inform the actual cardholders instantly and can prohibit the transaction. Big Data is simplifying the detection of unusual transactions: if two transactions take place from a single credit card in different cities within a short period, the bank gets alerted.
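The rule just described – two transactions on one card in different cities within a short period – can be sketched as a simple check (the time window and record layout are illustrative assumptions; a real banking system would be far more sophisticated):

```python
from datetime import datetime, timedelta

# Each transaction: (card id, city, timestamp).
transactions = [
    ("card-1", "Mumbai", datetime(2017, 4, 22, 10, 0)),
    ("card-1", "Delhi",  datetime(2017, 4, 22, 10, 45)),
]

def suspicious(txns, window=timedelta(hours=2)):
    """Flag card ids seen in different cities within the given time window."""
    flagged = set()
    for card_a, city_a, t_a in txns:
        for card_b, city_b, t_b in txns:
            if card_a == card_b and city_a != city_b and abs(t_a - t_b) < window:
                flagged.add(card_a)
    return flagged

print(suspicious(transactions))  # {'card-1'}
```

At Big Data scale the same rule would be evaluated over streams of millions of transactions rather than a small list.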
Leverage data to detect suspicious activities: Banks access large amounts of customer data from various sources, such as social media, logs and call-centre conversations, and that data can be very helpful in determining abnormal activities. For example, suppose a credit card holder is currently travelling on an airplane and has posted his present status on Facebook. Any transaction on the user's credit card during that period is considered suspicious and can be blocked at the bank's discretion.
data-centric and solve problems that call for bigger data sets. A cultural change needs to happen for Big Data solutions to become the universal norm across the industry, including solutions that don't work or take you to a dead end but invariably end up educating you.
1.5.3 Use of Big Data in Retail Industry
Big Data has brought in some remarkable results for retailers across the industries, as evident from their testimonials.
However, as with any other great bargain, plenty of obstacles and cynicism still remain around using Big Data as the key retail transition expert. Big Data is creating a lot of interest, as confirmed by many senior executives, but most of them struggle with common challenges – like aligning Big Data with use cases, identifying new (usually unstructured) types of data, and figuring out how to utilise Big Data for faster and more efficient decision-making.
Cases of some clever usages of Big Data in the retail industry are true examples of the creative thinking of solution architects. Consider the following Big Data example, in which a hotel used Big Data to increase reservations.
such unpredictable weather are not good. However, Café Inn turned this adversity to its advantage. It observed that the travellers of a cancelled flight end up in an urgent situation, in need of an overnight stay. The company used weather and flight-cancellation information that was readily and freely available, coupled with hotel and airport information, and developed an algorithm that took into account factors like travel conditions, weather severity, time of the day and rates of cancellation by airlines, among other variables. With Big Data insights, and pattern recognition of travellers using mobiles for this use case, the company effectively used Pay Per Click (PPC) and mobile search campaigns to send specific mobile ads to stranded travellers, making it easier for them to book a nearby hotel and increasing overall hotel revenue manifold even in the most unexpected of times.
There are several such case studies and stories where Big Data’s effec-
tive utilisation resulted in a great deal of turnaround for corporations.
self assessment Questions
Activity
Big Data for retail industries can be a hit and miss affair. Discuss
with your friends.
processes into something easily comprehensible and actionable. For human purposes, the best methods are conversion into graphical formats like charts, graphs, diagrams, etc.
Value: Big Data offers excellent value to those who can actually play with it and tame it at its scale and unlock the true knowledge. It also offers newer and more effective methods of putting new products to their true value, even in formerly unknown markets and demands.
While Velocity, Volume and Variety are inherent to Big Data, the
Activity
Are seven Vs enough, or too many, for Big Data classification? Critically examine both cases with examples.
Big Data analytics helps corporations utilise their data to identify new opportunities, which further leads to more efficient operations, smarter and well-calculated business moves, happier clients and higher revenues. Companies are actively looking to find workable insights in their data. Many Big Data projects are initiated by the need to answer key business requirements and questions. With the selection of a correct Big Data platform, an enterprise can increase efficiency and sales, improve operations, and be better at managing risks and servicing customers.
Cost reduction: Big Data technologies like Hadoop bring substantial cost advantages when it comes to storing large data, and help in recognising more efficient ways of doing business.
Better and faster decision making: With evolving new-age technologies and in-memory analytics, coupled with the ability to analyse new data sources, corporations are now able to analyse information immediately – and make decisions based on the learnings they derive.
New services and products: The clarity to read customers' needs and ensure analytical satisfaction gives companies the power to offer consumers what they want – even to the level of tailor-making the solution to the requirements of each individual customer.
Activity
Analyse a real-life situation around you that could use Big Data analytics to increase overall operational and functional efficiencies.
generated within the organisation. Being able to analyse all this data in a meaningful way can be an intimidating task without the proper infrastructure and ways to process data from diverse sources effectively. And once you have managed that, it is another fight to make the data meaningful to the people who need to understand it. So, for organisations building the correct Big Data policy, here are the five crucial components to consider:
A universal data model: Ensure your entire data is centralised and unified in a common data model to provide a single accurate view of the business. The conventions for the common data model, such as naming, field relationships and attributes, are created by the data model itself in a way that everything is aligned across transactional and other related systems.
Exploit the power of external data: Capturing the true meaning of the data means successfully integrating data from internal sources with external data from diverse environments (like social media, vendor data and demographics). The platform should be flexible enough to accommodate information in multiple ways from multiple structured or unstructured distributed databases.
Focus on open standards and scalability: Organisations can utilise existing systems efficiently by using a platform with scalable standards, simultaneously gaining flexibility and reducing IT-related costs. Open industry-standard-compliant systems are readily available and preferred for many reasons, one being their effortless integration with existing systems from multiple other vendors, legacy systems and future add-on solutions.
Platform-independent model: In today's age, information is
Activity
How would you unify different data sets if you were given an opportunity to design and develop such an architecture?
1.9 Governance for Big Data
Big Data governance is a crucial factor in the management of diverse datasets, because such data often poses risks, like unplanned costs and incorrect or misleading input data.
Since Big Data is a new model, ever changing with the dynamics of industries, data governance is at a nascent stage and not many know about it. With policies and procedures yet to be developed, many gov-
Data usually has a lifecycle, beyond which it either becomes obsolete or simply becomes a liability to be looked after. Overlooking such aspects is a common error organisations commit. Hence, a single standard schedule is never recommended for all data types, as they may have different retention stages. Data archival is recommended to enhance the overall performance of your applications.
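A per-type retention schedule, rather than one standard schedule for all data types, can be sketched as follows (the data types and retention periods are illustrative assumptions):

```python
from datetime import date, timedelta

# Different data types get different retention periods, per the advice above.
RETENTION = {
    "transaction-logs": timedelta(days=365),
    "server-metrics":   timedelta(days=90),
    "archived-reports": timedelta(days=7 * 365),
}

def should_archive(data_type, created, today):
    """True once a dataset has outlived its type-specific retention period."""
    return today - created > RETENTION[data_type]

print(should_archive("server-metrics", date(2017, 1, 1), date(2017, 6, 1)))  # True
```

Running such a check on a schedule keeps obsolete data from silently becoming a liability.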
self assessment Questions
Which governance model, other than Big Data governance, can you think of for managing low-traffic data centres?
[Figure: flow diagram showing the components of text analytics – text identification, text mining, text summarisation, categorisation, visualisation, sentiment analysis, text clustering, link analysis, search access, and entity/relation modeling.]
Figure 1.1: Displaying the Text Analytics Process Flow
Source: https://s-media-cache-ak0.pinimg.com/originals/05/3d/e0/053de0478bb02ab7dfb73222059fe182.jpg
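Two of the components shown in Figure 1.1 – text mining and sentiment analysis – can be illustrated with a deliberately naive sketch (the word lists are illustrative; real systems rely on statistical, linguistic and machine-learning models):

```python
from collections import Counter

# Tiny illustrative sentiment lexicons.
POSITIVE = {"great", "good", "happy"}
NEGATIVE = {"bad", "poor", "angry"}

def analyse(text):
    """Minimal text-mining pass: term frequencies plus a crude sentiment score."""
    words = text.lower().split()
    freq = Counter(words)
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return freq, score

freq, score = analyse("great service but poor billing great app")
print(freq["great"], score)  # 2 1
```

The other boxes in the figure (clustering, entity modeling, link analysis) build on exactly this kind of tokenised, counted representation of raw text.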
Activity
organisations or even smaller hotel chains, is no uncommon achievement for a model that is meant to failsafe you against the worst of conditions. But Big Data is not limited only to that; it comes with much deeper and broader applications.
dealership can predict when the next car is going to be sold; Walmart can predict the best-selling item at each point of time in a month, in a year or around any holiday season.
Big Data is now seeping into areas that were earlier prone to miscalculations and mispredictions, such as the stock inventory model, where a retailer could not decide whether or not to stock up for upcoming seasonal sales based on the surrounding factors. Now, the same retailer can optimise their stock using Web search trends, social media data and weather forecast predictions.
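Such a decision can be sketched as a simple combination of external signals (the signals, weights and threshold here are purely illustrative assumptions, not a real retail model):

```python
# Illustrative signals, each scaled to [0, 1]:
# search interest, social media buzz, and favourable-weather score.
def stock_up(search_trend, social_buzz, weather_score, threshold=0.6):
    """Weighted combination of external signals into a stock-up decision."""
    score = 0.5 * search_trend + 0.3 * social_buzz + 0.2 * weather_score
    return score >= threshold

print(stock_up(0.9, 0.7, 0.4))  # True  (0.45 + 0.21 + 0.08 = 0.74)
```

A real system would learn the weights from historical sales rather than fixing them by hand.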
Collider nuclear physics lab, the world's most powerful and largest particle accelerator, is currently experimenting on the genesis of the universe in search of the elusive God particle. The datacentre responsible for managing CERN's datasets has 66,000 processors to analyse the around 30 petabytes of data produced. It uses the distributed computing power of thousands of systems located across 140 datacentres around the world. Such computing power can be utilised to change the way many other areas of science and research function and deliver their results.
Activity
1.12 Technology Infrastructure Requirement
Big Data is simply a large data repository with the following charac-
teristics:
Has distributed redundant data storage
So, the infrastructure that is going to host Big Data as the prime driver of an organisation must be robust, scalable, elastic and fail-safe against unplanned situations. But how do we arrive at such a robust scale of infrastructure? Will merely having super-expensive, high-spec systems and networking gear be enough, or does Big Data require something more than these usual factors?
1.12.1 Storing of Big Data
Once gathered from your sources, the data is stored in sophisticated but accessible systems: a traditional data warehouse, a distributed/cloud-based storage system, a data lake, company servers, or even a simple computer's hard disk, depending on the magnitude of the data received. For not-so-large amounts of data, one can consider using clustered network storage as the data-storing option, given it is well designed and has failsafe measures to withstand unpredictable storage issues. However, for larger data inflows, where
Apache Hadoop is a framework that allows the distributed processing of large data sets across computer clusters using simple programming models. Instead of depending on hardware to provide uptime, the library comes with inbuilt features at the application layer to detect and handle breakdowns, providing a reliable, always-available service on top of a cluster of computers, each of which may be prone to failures.
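The MapReduce model at the heart of Hadoop can be sketched in miniature (this toy version runs on a single machine; Hadoop distributes the same map and reduce phases across a cluster):

```python
from collections import defaultdict

def map_phase(chunk):
    """Map: emit (word, 1) pairs from one chunk of input."""
    return [(word, 1) for word in chunk.split()]

def reduce_phase(pairs):
    """Reduce: sum the counts for each word across all mapped pairs."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)

chunks = ["big data big", "data big"]          # stand-ins for file splits
mapped = [pair for c in chunks for pair in map_phase(c)]
print(reduce_phase(mapped))  # {'big': 3, 'data': 2}
```

In Hadoop, each chunk would be a block of a file in HDFS and the map calls would run on different nodes in parallel.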
OS level and File system abstractions
A MapReduce or YARN (Yet Another Resource Negotiator) engine
Hadoop Distributed File System (HDFS)
Java ARchive (JAR) files
Scripts needed to start Hadoop, documentation and source code,
and a contribution section
A lot has been discussed and written about Big Data's functioning, its associated workflows, the technologies used and the traits they need to share in order to perform efficiently. The following key points should be
Information security architecture: A typical examination of multiple Big Data implementations illustrates that security features are often considered secondary to the other pressing requirements of a demanding system, and aftermarket security solutions are not tailor-made for these clusters. Such deployments often turn out to be insecure and rely solely on perimeter and network security support.
Activity
1.13 Summary
The Big Data sciences use concepts of statistics and relational database programming extensively.
Normally, while dealing with enormous number of datasets, you
need to have a good sense of observing the patterns, frequency of
data occurrences and other features that help in narrowing down
a data further to its correct place.
The chunk of Big Data created comes from three primary sources:
machine data, social data and transactional data.
The adoption of a contemporary technology like Big Data can enable game-altering innovation that brings a transition in the structure of a business, whether in its services, products or organisation.
key words
Social data: It is the data that is generated on the world’s most popular social media platforms.
Transactional data: It is the data that is generated from online
and offline transactions occurring daily.
Unstructured data: It is the data that is not well organised.
Technology Infrastructure Requirement    14. Google
ANSWERS FOR DESCRIPTIVE QUESTIONS
1. The earliest need for managing large datasets of information originated back in the late nineteenth century, around 1880. Refer to Section 1.2 Evolution of Big Data.
2. Anything that has a well-defined arrangement, easy-to-
since unstructured data does not have a definite data model and,
hence, requires more resources to make sense out of it. Refer to
Section 1.4 Big Data Skills and Sources.
4. The seven Vs of Big Data define the true Big Data attributes and sum it up as an effective yet extremely straightforward solution for those datasets that involve dealing with incredibly plumped-up information. Refer to Section 1.6 Characteristics of Big Data – The Seven Vs.
5. Big Data analytics is a set of advanced analytic techniques used against very large, miscellaneous data sets that include unstructured/structured and batch/streaming data, with sizes ranging from terabytes to zettabytes. Refer to Section 1.7 Big Data Analytics.
6. Text analysis requires multiple statistical, linguistic and machine-learning techniques, and involves retrieving information from unstructured data and restructuring the input text to create patterns and trends, and to evaluate and interpret the data output. Refer to Section 1.10 Text Analytics.
SUGGESTED READINGS
Mayer-Schönberger, V., & Cukier, K. (2014). Big data: a revolution
that will transform how we live, work, and think. Boston: Mariner
Books, Houghton Mifflin Harcourt.
Erl, T., Khattak, W., & Buhler, P. (2016). Big data fundamentals:
concepts, drivers & techniques. Boston: Prentice Hall.
E-REFERENCES
What is Big Data and why it matters. (n.d.). Retrieved April 22,
2017, from https://www.sas.com/en_us/insights/big-data/what-is-
big-data.html
Big Data. (2017, March 17). Retrieved April 22, 2017, from https://www.ibm.com/big-data/us/en/
CONTENTS
2.1 Introduction
2.2 Distributed and Parallel Computing for Big Data
Self Assessment Questions
Activity
2.3 Introduction to Big Data Technologies
2.3.1 Hadoop
2.3.2 Python
2.3.3 R
Activity
2.5 In-Memory Technology for Big Data
Self Assessment Questions
Activity
2.6 Big Data Techniques
2.6.1 Massive Parallelism
2.6.2 Data Distribution
2.6.3 High-Performance Computing
2.6.4 Task and Thread Management
2.6.5 Data Mining and Analytics
2.6.6 Data Retrieval
2.6.7 Machine Learning
2.6.8 Data Visualisation
Self Assessment Questions
Activity
2.7 Summary
Introductory Caselet
ogies for implementing the Cisco UCS Common Platform Architecture (CPA) for Big Data. MapR Technologies had suggested the Apache Hadoop solution, which provides a completely new way of handling Big Data. Unlike traditional databases that store structured data only, Hadoop allows Solutionary to distribute and analyse both types of data, structured and unstructured, smoothly on a single data infrastructure.
He also declares, “MapR and Cisco UCS have many of the same values: high performance, efficient management, and ease of use. Using both solutions together enables us to scale our security analysis services while keeping complexity and cost under control.”
learning objectives
2.1 Introduction
The market is flooded with corporations offering custom-made tools and frameworks for implementing Big Data and analytics. However, behind the branding and beneath the platform, the basic features are common to all. Given below is a list of methods and practices that are usually followed in a typical Big Data implementation:
NoSQL database: It provides for the storage and retrieval of data modelled in ways other than the tabular relations of typical relational databases, to cater efficiently to real-time situations.
Data incorporation: Data management tools available as solutions, like Amazon Elastic MapReduce (EMR), that run underneath
In this chapter, you will first learn about distributed and parallel computing for Big Data. Next, you will learn the basics of Big Data technologies. Further, you will study cloud computing in reference to Big Data. Next, you will learn about in-memory technology for Big Data. Towards the end, you will learn about various Big Data techniques.
2.2 Distributed and Parallel Computing for Big Data
In Big Data, terminologies related to computing have meanings similar to those they have in other fields, although with a different scope of applicability. Let's have a look at what they mean and what they stand for:
Distributed computing: It works on the rules of the divide-and-conquer approach, performing modules of a parent task on multiple machines and then combining the results. It is basically multi-
n o t e s
Data, such parallel systems are the ones that execute from multiple dataset throughput points and run in parallel, connected to a master system. Parallel computing is a close-coupled system and is used in solving the following:
Compute-intensive problems
Bigger problems in the same time
Similar-sized problems in the same time with higher precision
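The divide and conquer pattern behind both models can be sketched in a few lines of Python. This is an illustrative example, not from the text: worker threads stand in for the multiple machines of a real distributed setup, and the helper names are our own.

```python
# A minimal divide-and-conquer sketch: the parent task is split into
# modules, each handled by a separate worker, and the partial results
# are then combined at the end.
from concurrent.futures import ThreadPoolExecutor

def partial_sum(chunk):
    # Each worker computes its module of the parent task independently.
    return sum(chunk)

def parallel_sum(data, workers=4):
    # Divide: split the dataset into roughly equal chunks.
    size = max(1, len(data) // workers)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    # Conquer: run the modules in parallel, then combine the results.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_sum, chunks))

print(parallel_sum(list(range(1, 101))))  # 5050
```

In a genuinely distributed system, the chunks would travel over the network to separate machines, but the split/process/combine shape stays the same.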
[Figure: Distributed computing – a control server dispatching tasks to grid nodes; Parallel computing – compute nodes and network disk storage connected to a server over a 10/100 MB/s Ethernet switch and the Internet]
Distributed computing: Loose coupling of computers connected in a network that provides access to data and remotely located resources.
Parallel computing: Tight coupling of processing resources that are used for solving a single, complex problem.
Besides these computing models, a commonly occurring model that lies somewhere between the two is called the concurrent computing model. Concurrency of a system is simply the operation of multiple threads executing on single or multiple processors. Concurrency refers to the sharing of multiple resources in real time.
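Concurrency, and the care it demands when threads share a resource, can be sketched as follows. This is an illustrative example of our own, not from the text: several threads update one shared counter, and a lock serialises access so no update is lost.

```python
# A minimal concurrency sketch: multiple threads share one resource
# (a counter) in real time; a lock guards the shared state so that
# concurrent increments are not lost.
import threading

counter = 0
lock = threading.Lock()

def worker(increments):
    global counter
    for _ in range(increments):
        with lock:           # only one thread touches the counter at a time
            counter += 1

threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000
```

Without the lock, interleaved read-modify-write cycles could silently drop increments, which is exactly the kind of hazard concurrent computing has to manage.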
Activity
A typical Big Data system consists of a setup that adheres to these seven Vs and provides an infrastructure that can withstand the influx of huge datasets arriving at high velocity, while providing an effective mechanism to process the datasets by cleansing, shaping, filtering and sorting them into meaningful information, with the aim of making the data both user- and machine-friendly. Beneath the complex system of architecture, sophisticated hardware and methodologies working in conjunction with each other lie the interfaces that are responsible for communicating with the hardware and the user simultaneously – the programmable applications or tools that are the prime drivers of the efficiency of a typical Big Data system setup. A few such contemporary interface development programs are described in the next few sections along with their applications.
2.3.1 Hadoop
umes of data.
reducing data loss. Figure 2.2 shows the Hadoop multinode cluster architecture:

Figure 2.2: Hadoop Multinode Cluster Architecture
Suppose that a DataNode in the cluster goes down while processing is going on; the NameNode should know that some DataNode is down in the cluster, otherwise it cannot continue processing. Each DataNode sends a “Heartbeat” signal to the NameNode at a regular interval (as per the default time set) to make the NameNode aware of the active/inactive status of DataNodes. This system is called the Heartbeat mechanism.
of server failure. Usually, three copies of data are maintained, so the default replication factor in Hadoop is 3.
Hadoop also manages hardware failure and smoothens data handling. The following inbuilt components of Hadoop make it a great platform for performing operations on larger datasets:
Hive: A data warehouse tool created by Facebook on top of Hadoop that converts queries into MapReduce jobs. It deals with the storage, analysis and querying of large sets of data. HQL (Hive Query
2.3.2 Python
The most acknowledged fact in favour of Python as a language is that it is widely used by developers, analysts and even finance/
colossal groups of graphics servers to compile the imagery for chartbuster movies. Python consistently ranks higher than JavaScript, Ruby and Perl in popularity ratings.
Just like Hadoop, Python has a custom implementation of Apache's Spark framework (PySpark), which is used to handle, manage and analyse large chunks of datasets. Apache Spark is a large-scale data processing framework that is fast and can be customised according to the platform it is implemented upon.
However, a key point to note here would be that Python is not being im-
spaces (such as databases or HDF5) and applications to make it
easier to mathematically operate or manipulate the data or simply
analyse the data that is otherwise too big for the memory.
There are some limitations to Python in the context of a Big Data implementation. In benchmarking performance, Python fares less than or equal to Java. It is not slow by any measure, but there still remains a lot of optimisation to be done. Let's move on to another statistical and analytical language called R, study it and summarise the differences between the two languages.
2.3.3 R
where sampling is a must, it can still lead to substantial models, especially if the sample is:
Still big in total numbers
Not too small proportionally to the size of the entire dataset and not biased
Bigger hardware: Since R retains and keeps all objects in dynamic memory, it can pose a serious problem if the dataset gets exponentially larger. However, given the memory costs, it is easier
data science.
Table 2.2 lists the pros and cons of using R and Python for Big Data:

R: …front-line packages and great community support. All R packages are available at R documentation. R is meant for statisticians; they can interconnect…
Python: …oriented programming language that is easy and intuitive. It comes with virtually no learning curve for those having prior programming experience, and it increases the speed at which you can create a program. You need lesser time to code while you have more time to test it. The Python testing framework is an in-built testing framework that…
Activity
2.4 CLOUD COMPUTING AND BIG DATA
One of the vital issues that organisations face with the storage and management of Big Data is the huge amount of investment needed to get the required hardware setup and software packages. Some of these resources may be overutilised or underutilised as requirements vary over time. We can overcome these challenges by providing a set of computing resources that can be shared through cloud computing. These shared resources comprise applications, storage solutions, com-
[Figure: Client devices such as laptops, desktops and mobiles or PDAs connect over the Internet to a cloud provider offering SaaS, PaaS and IaaS]
to cater to the requirements of businesses. Both cloud computing and
Big Data analytics use the distributed computing model in a similar
manner and hence, are complementary to each other.
FEATURES OF CLOUD COMPUTING
The following are some features of cloud computing that can be used
to handle Big Data:
Scalability: Scalability means the addition of new resources to an
existing infrastructure. An increase in the amount of data being
support to the software that used to run properly on the earlier set
of hardware. We can solve such issues by using cloud services that
employ the distributed computing technique to provide scalability
to the architecture.
Elasticity: Elasticity in cloud means hiring certain resources, as
and when required, and paying for resources that have been used.
No extra payment is required for acquiring specific cloud services.
For example, a business expecting the use of more data during
in-store promotion could hire more resources to provide high pro-
cessing power. Moreover, a cloud does not require customers to
declare their resource requirements in advance.
Resource pooling: Resource pooling is an important aspect of
cloud services for Big Data analytics. In resource pooling, multiple
organisations, which use similar kinds of resources to carry out
computing practices, have no need to individually hire all resourc-
es. The sharing of resources is allowed in a cloud, which facilitates
cost cutting through resource pooling.
Fault tolerance: Cloud computing provides fault tolerance by of-
fering uninterrupted services to customers, especially in cases of
component failure. The responsibility of handling the workload is
shifted to other components of the cloud.
CLOUD DEPLOYMENT MODELS
[Figure: Companies X, Y and Z sharing public cloud services (IaaS/PaaS/SaaS)]
a private cloud. In other words, in this cloud, the cloud computing infrastructure is solely designed for a single organisation and cannot be accessed by other organisations. However, the organisation may allow this cloud to be used by its employees, partners and customers. The primary feature of a private cloud is that an organisation installs the cloud for its own requirements. These requirements are customary to the organisation that plans and manages the resources and their use. A private cloud integrates all processes, systems, rules, policies, compliance checks, etc. of the organisation at one place. In a private cloud, you can automate
[Figure: A private cloud offering cloud services (IaaS/PaaS/SaaS) to a single organisation]
service and can be made available on or off premises. To make the concept of community cloud clear and to explain when community clouds can be designed, let's take an example. In any state or country, say England, a community cloud can be provided so that almost all government organisations of that state can share the resources available on the cloud. Because of this sharing of cloud resources on the community cloud, the data of all citizens of that state can be easily managed by government organisations.
Figure 2.7 shows the use of community clouds:
[Figure 2.7: Community clouds for Level A and Level B]
[Figure: A hybrid cloud – an application migrated between an organisation's private cloud and the public cloud, with cloud services (IaaS/PaaS/SaaS) shared between Organisation X and Organisation Y]
Software as a Service (SaaS): SaaS is one of the most popular
cloud-based models and comprises applications provided by the
service provider.
Exhibit
The cloud is a broad concept and it covers just about every possible sort of online service, but when businesses refer to cloud pro-
Software as a Service
Platform as a Service
they need them, scaling as demand grows, rather than investing in
hardware with redundant resources. Examples of PaaS providers
include Heroku, Google App Engine and Red Hat’s OpenShift.
Infrastructure as a Service
on-demand.
IaaS providers offer these cloud servers and their associated re-
sources via dashboard and/or API. IaaS clients have direct access
to their servers and storage, just as they would with traditional
IaaS is the most flexible cloud computing model and allows for automated deployment of servers, processing power, storage and networking. IaaS clients have more control over their infrastructure than users of PaaS or SaaS services. The main uses of IaaS include the actual development and deployment of PaaS, SaaS and Web-scale applications.
Source: https://www.computenext.com/blog/when-to-use-saas-paas-and-iaas/
Big Data cloud providers have been gearing up to bring the most ad-
vanced technologies at competitive prices in the market. Some pro-
viders are established, whereas some of them are relatively new to the
resources elastically in a way that the hiring of resources is possible on an hourly basis.
Elastic MapReduce: It is a Web service that uses Amazon EC2 computation and Amazon S3 storage for storing and processing large amounts of data so that the cost of processing and storage is reduced significantly.
DynamoDB: It is a NoSQL database system in which data storage is done on Solid State Drives (SSDs). DynamoDB allows data replication for high availability and durability.
Hadoop is used as a cloud service in the Windows Azure PaaS with the help of HDInsight. HDFS and MapReduce related frameworks are thus offered economically, and in a simpler way, by the integration of Hadoop in this PaaS. The efficient management and storage of data are important features of HDInsight, which also uses the Sqoop connector for importing Windows Azure SQL data into HDFS or exporting data from HDFS to a Windows Azure SQL database.
a. private cloud
b. public cloud
c. hybrid cloud
d. community cloud
7. The SaaS model of cloud service allows its users to deploy and
use applications on run-time environment platforms, which
are provided on the Internet and supported by the provider.
(True/False)
Activity
million members supporting over 3 million concurrent visitors watching and chatting about games from over 2 million broadcasters, where the capacity of a chat room often goes beyond 500,000 in a single chat room. Besides, it also offers target-based advertising, a potential revenue driver, based on chat history. This is one such example where hardware obstructions, limitations and memory lags have to be sidelined and streamlined with something faster, like a cache memory or dynamic access memory, so that the data is readily available for disposal. To deliver such services and capabilities, businesses require the skill to integrate both abrupt dynamics with histori-
The cost variations for such setups have now narrowed. Figure 2.9 shows the cost of various storage technologies available for a sample 1 GB of memory, along with their respective read/write performance:
[Figure 2.9: Cost per 1 GB and read/write latency of storage technologies – DRAM ($9), NV-DIMM/PM, NVMe SSD and SATA SSD ($0.40) – with latencies ranging from 0.10 µsec to 500 µsec]
It takes $9 for 1 GB of RAM, $0.40 for SSDs and $1 for PCI-compatible memory cards. The choice of a specific memory technology is subject to its raw performance figures in a real-time scenario, rather than benchmarking figures, for a given use case. As memory evolution goes on, new dynamic memory substitutes are shortening performance gaps by and large. Database-related technologies are adapting to this evolution, which has struck a goldmine for corporations by giving them the capability to fuse newer and older setups in tandem while delivering radical performance-to-cost ratios.
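The payoff of keeping hot data in fast memory can be sketched with a toy cache. This is an illustrative example of our own, not a description of any specific in-memory product: the cache stands in for DRAM, and the counted function call stands in for a slow disk or database access.

```python
# A minimal sketch of the in-memory idea: once a result has been computed,
# it is kept in memory (here, a cache) so repeated requests are served
# without touching the slower computation/storage path again.
from functools import lru_cache

CALLS = {"count": 0}

@lru_cache(maxsize=None)
def expensive_lookup(key):
    CALLS["count"] += 1          # stands in for a slow disk/database access
    return key * key

print(expensive_lookup(12))      # computed once: 144
print(expensive_lookup(12))      # served from memory: 144
print(CALLS["count"])            # 1 -- the slow path ran only once
```

Real in-memory platforms apply the same trade-off at cluster scale, paying the higher per-gigabyte cost of DRAM to avoid the microseconds-to-milliseconds latency of SSDs and disks.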
9. Twitch is a social media gaming platform community.(True/
False)
Activity
section, we will study some of the techniques that are used to tackle datasets and bring them to a conclusive end. However, this list is not exhaustive, since newer methodologies and techniques keep evolving from time to time.
2.6.1 MASSIVE PARALLELISM
2.6.2 DATA DISTRIBUTION
multiple transfers are requested in parallel, the server will drop the connections due to numerous virtual machines seeking blocks of data, leading to a flash crowd effect.
Semi-centralised approach: Given the flash crowd effect in the
sistence, where multiple database technologies are used together to store multiple datasets in a single system.
2.6.3 HIGH-PERFORMANCE COMPUTING
High-performance computing is the simultaneous use of supercomputers and parallel processing techniques for solving intricate computation problems. It emphasises making parallel processing systems and algorithms by joining both parallel and administrative computational methods. The words ‘supercomputing’ and ‘high-performance
tions, they also assist Big Data in the bioinformatics domain as they do
for sequencing and alignment.
environments and to deal with such concurrency related issues in Big Data, we deal with two types of parallelism: Task and Data.
Task parallelism refers to the execution of computer programs across multiple processors on different or the same machines. It emphasises performing diverse operations in parallel to best utilise the accessible computing resources, like memory and processors. One example of such parallelism would be an application creating multiple threads for doing parallel processing, with every thread re-
uted data.
It is often handled in normal programming languages through the syntax of synchronous and asynchronous programming techniques, which are similarly implemented in Hadoop with the use of Java.
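The synchronous/asynchronous distinction can be sketched with Python's asyncio. This is an illustrative example of our own (the function names and delays are assumptions): three I/O-like waits that would take 0.3 s back to back complete in roughly 0.1 s when scheduled concurrently.

```python
# An illustrative sketch of asynchronous task handling: asyncio.gather()
# schedules several coroutines concurrently, so their waiting periods
# overlap instead of queuing one after another.
import asyncio
import time

async def fetch(name, delay=0.1):
    await asyncio.sleep(delay)      # stands in for a network or disk wait
    return f"{name}:done"

async def main():
    # gather() runs the coroutines concurrently and collects their results.
    return await asyncio.gather(fetch("a"), fetch("b"), fetch("c"))

start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start

print(results)                  # ['a:done', 'b:done', 'c:done']
print(elapsed < 0.3)            # the waits overlapped instead of queuing
```

A synchronous version would simply call the three waits in sequence; the asynchronous form frees the single thread to make progress on other tasks while each wait is pending.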
2.6.6 DATA RETRIEVAL
Big Data refers to the large amounts of multi-structural data that con-
tinuously flows around and within the organisations, and includes
text, video, transactional records and sensor logs. Big Data systems
utilise the Hadoop and the HDFS architecture to retrieve the data us-
ing MapReduce - a distributed processing framework.
It helps programmers in solving parallel data problems where the dataset can be divided into small chunks and handled autonomously. MapReduce is an important step, as it allows ordinary developers to utilise parallel programming concepts without dealing with cluster communication details, failure handling and task monitoring.
MapReduce simplifies all that by splitting the input dataset into multiple portions, each assigned a map task to process the data in parallel. Each map task takes an input (key, value) pair and creates a transformed (key, value) output.
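The map/shuffle/reduce flow just described can be imitated in plain Python. This is a toy word-count sketch, not Hadoop itself; the helper names and sample input are our own.

```python
# A minimal word-count sketch of the MapReduce flow: the input is split
# into portions, each map task emits (key, value) pairs, the pairs are
# shuffled (grouped) by key, and reduce tasks combine the values.
from collections import defaultdict

def map_task(portion):
    # Each map task transforms its portion into (word, 1) pairs.
    return [(word, 1) for word in portion.split()]

def shuffle(mapped_pairs):
    # Group every value under its key, as the framework does between phases.
    groups = defaultdict(list)
    for pairs in mapped_pairs:
        for key, value in pairs:
            groups[key].append(value)
    return groups

def reduce_task(key, values):
    return (key, sum(values))

portions = ["big data big", "data systems"]      # the split input dataset
mapped = [map_task(p) for p in portions]         # map phase (parallelisable)
counts = dict(reduce_task(k, v) for k, v in shuffle(mapped).items())
print(counts)  # {'big': 2, 'data': 2, 'systems': 1}
```

In Hadoop, the map tasks run on the DataNodes holding each portion and the shuffle moves data across the cluster, but the logical phases match this sketch.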
MapReduce uses TaskTracker and JobTracker mechanisms for task
2.6.7 MACHINE LEARNING
Machine learning formally focuses on the performance, theory and
properties of learning algorithms and systems. Machine learning is
considered to be an ideal research field for taking advantage of the
opportunities available in Big Data.
It delivers on the potential of mining value from huge and diverse data sources with less dependence on human instruction. It is data-driven, runs at machine scale and is well suited to the complication of dealing with different data sources and the enormous range of variables and quantities of data involved. In contrast to conventional analysis, machine learning blooms on expanding datasets: the more data a machine learning system gets, the more it learns, applying the results to yield higher-quality insights.
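To make the idea of a learning algorithm concrete, here is a minimal logistic-regression sketch trained by gradient descent. The toy dataset, learning rate and iteration count are illustrative choices of our own, not from the text.

```python
# A minimal, self-contained logistic regression trained by gradient
# descent on a toy one-feature dataset whose label flips around x = 2.5.
import math

xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0,   0,   0,   1,   1,   1]

w, b, lr = 0.0, 0.0, 0.5

def predict(x):
    # Sigmoid of a linear score: probability that the label is 1.
    return 1.0 / (1.0 + math.exp(-(w * x + b)))

for _ in range(2000):                   # gradient-descent training loop
    grad_w = sum((predict(x) - y) * x for x, y in zip(xs, ys)) / len(xs)
    grad_b = sum((predict(x) - y) for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad_w
    b -= lr * grad_b

print([round(predict(x)) for x in xs])  # [0, 0, 0, 1, 1, 1]
```

The same data-driven loop, fed with ever larger datasets and run across a cluster, is what makes machine learning a natural fit for Big Data platforms.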
Machine learning methods such as logistic regression, linear regression, autoencoders and neural networks are vital for solving Big Data problems. The methods discussed below do not focus on the algorithm logic only, but rather on the idea of learning:
Representation learning: Datasets with multi-dimensional features are becoming gradually more common nowadays, which challenges current learning algorithms to extract and manage the discriminating information from the datasets. Representation learning aims to achieve a reasonably sized learned representation that can capture many likely input configurations, and can provide improvements in both statistical efficiency and computational efficiency.
Deep learning: Unlike most learning techniques that use shallowly designed learning styles, the deep learning technique uses supervised and/or unsupervised strategies in deep structures to learn hierarchical representations automatically. Deep architectures capture hierarchically organised statistical and complicated input patterns, achieving adaptiveness to newer areas beyond outdated learning methods, and frequently beat state-of-the-art techniques.
using as few labelled instances as possible, thus curtailing the cost of obtaining labelled data.
All of the above forms of learning find supportive library functions in Hadoop and the HDFS file structure. Textual analysis and analytical tools end up deploying a few of the above learning techniques implicitly during regular operations, which are further evaluated and later studied to figure out the valuable insights offered by the automated learning. It is a clear case of artificial intelligence coupled with Big Data and associated technologies, and several developments in this field have only sup-
2.6.8 DATA VISUALISATION
query filters and tight coupling). There are also a few standard problems for Big Data visualisation:
Visual noise: Most dataset objects are too tightly coupled to each other, making it tougher for users to separate them as distinct objects on the screen.
Information loss: Reducing the visible datasets often leads to information loss.
High image change rate: Users simply observe the data and cannot react to the data change or its intensity in real time on display.
High performance requirements: Good data visualisation requires a highly efficient setup backed by scalable and robust machines that are ready to churn out visualisations in a high-performance environment.
10. Each system acts as a client and a server, and to access virtual
machines, the data centre offers firewall-free, low-latency ISP
traffic. Such an approach is called ____________.
11. Parallelism is the execution of multiple threads concurrently
to complete a task in the shortest possible time. (True/False)
Activity
Research on different machine learning methods and find out
which methods and their algorithms are vital for solving Big Data
problems.
2.7 SUMMARY
Distributed computing works on the rules of the divide and conquer approach, performing modules of the parent tasks on multiple machines and then combining the results.
Cloud computing makes it possible for organisations to dynami-
cally regulate the use of computing resources and access them as
per the need while paying only for those resources that are used.
The in-memory Big Data computing tool supports the processing
of high velocity data in real time and also faster processing of the
stationary data.
Massive parallelism refers to a parallel system where multiple systems are interconnected with each other, pose as a single mighty conjoint processor and carry out the tasks received from the datasets in parallel.
Distribution of data is a highly critical step in a typical Big Data setup.
key words
Pig: Pig is a high-level modular programming tool developed by Yahoo for streamlining huge datasets with the use of Hadoop and MapReduce.
Python: It is a popular interpreted, general-purpose, high-level dynamic programming language that aims to improve code readability and overall ease of use, with expression in fewer statements than other competitive languages such as C++ or Java.
R: It is an open source interpreted programming language and an application environment for statistical computing with
Big Data Techniques 10. P2P
11. False
ANSWERS FOR DESCRIPTIVE QUESTIONS
1. Distributed computing is basically multiple processors interconnected by communication links, as opposed to parallel computing models, which usually (but not always) work on shared memory. Refer to Section 2.2 Distributed and Parallel Computing for Big Data.
SUGGESTED READINGS
Wadkar, S., Siddalingaiah, M., & Venner, J. (2014). Pro Apache Hadoop. Berkeley, CA: Apress.
White, T. (2011). Hadoop: the definitive guide. Sebastopol, CA:
O’Reilly.
E-REFERENCES
Welcome to Apache™ Hadoop®! (n.d.). Retrieved April 22, 2017,
from http://hadoop.apache.org/
What is Hadoop? (n.d.). Retrieved April 22, 2017, from https://www.
sas.com/en_us/insights/big-data/hadoop.html
Hadoop & Big Data. (n.d.). Retrieved April 22, 2017, from https://mapr.com/products/apache-hadoop/
CONTENTS
3.1 Introduction
3.2 Introduction to Business Analytics
Self Assessment Questions
Activity
3.3 Types of BA
Self Assessment Questions
Activity
3.4 Business Analytics Model
Introductory Caselet
AMNESTY INTERNATIONAL
THE CHALLENGE
Around four years back, with the help of its in-house fundraising consultants, Amnesty International started seeking analytics software to work parallel to its existing CRM systems.
The fund-raising consultants are responsible for gathering funds
and managing various kinds of donors. They are also required to
measure the donors’ sentiments and interests based on multiple
inputs, such as various parameters and participatory ratios. For
THE SOLUTION
The analytical tool was integrated with the CRM. Thus, using the contemporary analytics software with the CRM database became easier, making the reporting features much more robust. Of course, as a human rights organisation, Amnesty International performs all data analytics in compliance with privacy rules and with protective integrity.
learning objectives
3.1 INTRODUCTION
The word ‘Analytics’ has multiple meanings and is open to interpretation for business and marketing professionals. The term is used differently by experts and consultants, though in an almost similar fashion. Analytics, as per the definition of the business dictionary, is anything that involves measurement: a quantifiable amount of data that signifies a cause and warrants an analysis culminating in a resolution.
This chapter discusses Business Analytics and its types. Next, the chapter discusses the Business Analytics (BA) model. The chapter further discusses the importance of Business Analytics, the concept of Business Intelligence (BI) and its relation with Business Analytics. In the end, the chapter discusses emerging trends in BI and BA.
3.2 INTRODUCTION TO BUSINESS ANALYTICS
Business Analytics is a group of techniques and applications for storing, analysing and making data accessible to help users make better strategic decisions. Business Analytics is a subset of Business Intelligence, which creates capabilities for companies to compete in the market efficiently and is likely to become one of the main functional areas in most companies (more on BI later in this chapter).
Activity
3.3 TYPES OF BA
Going purely by the linguistic definition, there may be multiple elucidations of the term BA. However, in practical terms, there are four types of BA that help an organisation in gauging customer sentiments and then taking respective actions:
Descriptive analysis: It refers to “What is happening?” or “What
facts to derive the scenarios about what happened and why it happened. The result of this analysis is often a pre-defined reporting structure, such as a root cause analysis (RCA) report. For example, a root cause analysis may help in finding out the factors which the above coffee shop owners failed to read and comprehend.
Predictive analysis: It refers to the analysis of probabilities. Predictive analysis tries to forecast on the basis of previous data and scenarios. For example, a hotel chain owner might ramp down promotional offers during a restive season of rains in a coastal area. This is based on the prediction that there are going to be fewer footfalls due to heavy rain.
Prescriptive analysis: This analysis type tells you about the actions you should take. This is the most essential analysis type and typically forms the standards and recommendations for the next phase. For example, a doctor prescribes medicines to a patient after researching, studying, evaluating and diagnosing the cause of the patient's pain or irritation. Similarly, organisations too, after drawing out the statements, results, conclusions and other factors, will take steps to ensure that the factors affecting the growth charts positively continue to exist, whereas the damaging factors stay out of their future prospects.
Activity
Is there any other analysis type you can think of other than the above four models? What would it be?
businesses that may end up exploiting its weaknesses and may turn its strengths into weaknesses. Figure 3.1 shows the SWOT diagram:

Figure 3.1: The SWOT Diagram
Source: https://s-media-cache-ak0.pinimg.com/736x/88/b0/1a/88b01aa805648a304c0a3bbd954c1a5e.jpg
On the other hand, new starters should include SWOT in their planning process. SWOT is not necessarily a pan-organisation process; rather, each of the organisation's departments can have its own dedicated SWOT, such as a Marketing SWOT, Operational SWOT, Sales SWOT, etc.
Source: https://www.smartdraw.com/pest-analysis/
Activity
1. Do an honest SWOT of Big Data so far.
2. Can a strength identified in SWOT be a political challenge in PEST? Support your answer with an example.
The need for analytics arises from our basic day-to-day life. An average person has to analyse the time factor, from getting up from bed to getting ready to leave for office, so as to reach on time in a relaxed manner. Not only that, it also includes analysing the best possible route to avoid traffic and save more time, in order to have an extra cup of coffee for the day! As is evident, even a ballpark analysis of daily life often yields results that reassure us that analytics is actually an efficient way of measuring and tracking your results periodically.
Significance of BA:
To get insights about customer behaviour: The prime advantage of financing some BI software and experts is the fact that it increases your skill to examine present customer-purchasing trends. Once you know what your customers are ordering, this information can be used to create products matching the present consumption trends and thus improve your cost-effectiveness, since you can now attract more valued consumers.
is an increase in the efficacy of the organisation, leading to increased productivity. BI helps in sharing information across multiple channels in the organisation, saving time on reporting analytics and processes. This ease of sharing information reduces redundancy of duties or roles within the organisation and improves the precision and practicality of the data produced by different divisions.
business running in brick and mortar stores and who use their web-
site only for marketing purposes.
already expect some turmoil in one of your business sections, you can do a SWOT of the section and impact the overall outcome positively. Here, BA not only helps you in retaining a section full of customers but also helps you in avoiding a future conflict of a similar nature. BA arms you with a situational arsenal: you get a machine gun in the form of viral marketing campaigns when you are targeting a mass audience for a given product, whereas in the case of customer withdrawal or ramp-up, you can have your sniper ready to specifically target them.
Activity
costs.
BI-based solutions are most apt for industries with huge customer
base, higher competition levels and massive data volumes. Some of
the exclusive BI functions include the following:
Examining sales trends
10. _________ based solutions are most apt for industries with
huge customer base, higher competition levels and massive
data volumes.
Activity
How can an election campaign benefit from BI? Make a case study
on it.
The difference between BI and BA is that BI equips you with information, whereas BA gives you knowledge.
With the help of BA, you get to know the pain points of your business: your product's standing in the market, the strengths related to your business that put you ahead of the competition and the opportunities which you are yet to explore. BA helps you in knowing your business thoroughly. BI helps in bridging the gap between ground reality and the management perspective on a pan-organisational basis.
BI: Uses current and past data to optimise the current-age performance for success.
BA: Utilises the past data and separately analyses the current data, with past data as reference, to prepare the businesses for the future.

BI: Informs about what happened.
BA: Tells why it happened.

BI: Tells you the sales numbers for the first quarter of a fiscal year or the total number of new users signed up on our platform.
BA: Tells you why your sales numbers tanked in the first quarter or about the effectiveness of the newly launched user campaign for making users refer other users to our platform.

BI: Quantifiable in nature; it can help you in measuring your business through visualisations, charting and other data representation techniques.
BA: More subjective, open to interpretation and prone to changes due to ripples in the organisational or strategic structure.

BI: Studies the past of a company and ponders over what could’ve been done better in order to have more control over the outcomes.
BA: Predicts the future based on the learning gained from the past, present and projected business models for a given term in the near future.
All this boils down to the interchangeable usage of the terms “business intelligence” and “business analytics” and their importance in managing the relationship between business managers and data. Owners and managers now, as a result of such accessibility, need to be more familiar with what data is capable of doing and how they need to actively produce data to create lucrative future returns. The significance of the data hasn't changed; its availability has.
originate outside the organisation, from multiple sensor devices and servers, e.g. a space satellite or an offshore oil rig.
Artificial Intelligence (AI): This is a top trend as per multiple studies, with scientists aiming to build machines that replicate complex human reflexes and intelligence. Analytical work on such programmes is growing exponentially, with AI and machine learning transforming the way we relate to analytics and data management.
BI Centre of Excellence (CoE): Moving to a simpler, more secure and effective BI strategy isn’t entirely the onus of IT. The difficulty of data management in huge companies is astounding, and the need to strengthen it is becoming important. A growing number of organisations are opting for BI and analytics CoEs to support the implementation of self-service analytics. These CoEs will have a great role in applying an information-driven culture and getting the maximum advantage from a BI solution. Through mediums like virtual forums and training, the CoEs will empower even laymen to include data in their decision-making strategy. It is quite an efficient way of getting skilled people, processes and technology aligned in a structured manner at one place.
Predictive analytics and impact on data discovery: By gather-
ing more information, organisations will have the capacity to build
more detailed visual models that will help them to act in more ac-
curate ways. For instance, having better information models shows
Activity
What trend you think can be emerging the next in BI and BA field?
Discuss.
3.9 Summary
Business Analytics is a group of techniques and applications for
storing, analysing and making data accessible to help users make
better strategic decisions.
The analytics certainly influences the business by acquiring knowl-
edge that can be helpful to make enhancements or bring changes.
key words
examples.
4. Discuss the importance of BA with suitable examples.
5. Describe the importance of BI.
6. Discuss the evolution and relation between BA and BI.
2. Analytical
Types of BA 3. Diagnostic
4. Predictive
Business Analytics Model 5. Strengths, Weaknesses,
Opportunities, Threats
6. True
Importance of Business 7. Logical
Analytics
8. True
What is Business 9. False
Intelligence (BI)?
10. Business Intelligence (BI)
Relation between BI and BA 11. Root
12. True
Emerging Trends in BI and 13. b. Centre of Excellence
BA
14. Digitisation
processes and make it likely to recognise any parts requiring a
fix or improvement. Refer to Section 3.5 Importance of Business
Analytics.
5. Business Intelligence (BI) is the set of applications, technologies and best practices for the collection, integration, analysis and presentation of business information. Refer to Section 3.6 What is Business Intelligence (BI)?
6. BA and BI can be two of the most interchangeably used terms
but rarely explained in a way that doesn’t put the end-user in a
Suggested Readings
Liebowitz,J. (2013). Big data and business analytics. Boca Raton
(FL): CRC Press.
Laursen, G. H., & Thorlund, J. (2017). Business analytics for man-
agers: taking business intelligence beyond reporting. Hoboken,
NJ: John Wiley & Sons, Inc.
E-References
What is big data analytics? – Definition from WhatIs.com. (n.d.).
Retrieved April 25, 2017, from http://searchbusinessanalytics.
techtarget.com/definition/big-data-analytics
What is business analytics (BA)? – Definition from WhatIs.com.
(n.d.). Retrieved April 25, 2017, from http://searchbusinessanalyt-
ics.techtarget.com/definition/business-analytics-BA
Monnappa, A. (2017, March 24). Data Science vs. Big Data vs. Data
Analytics. Retrieved April 25, 2017, from https://www.simplilearn.
com/data-science-vs-big-data-vs-data-analytics-article
CONTENTS
4.1 Introduction
4.2 What is Data, Information and Knowledge?
Self Assessment Questions
Activity
4.3 Business Analytics Personnel and their Roles
Self Assessment Questions
Activity
4.4 Required Competencies for an Analyst
Self Assessment Questions
Activity
4.5 Business Analytics Data
Self Assessment Questions
Activity
4.6 Ensuring Data Quality
Self Assessment Questions
Activity
4.7 Technology for Business Analytics
Self Assessment Questions
Activity
4.8 Managing Change
Self Assessment Questions
Activity
4.9 Summary
4.10 Descriptive Questions
4.11 Answers and Hints
4.12 Suggested Readings & References
Introductory Caselet
XYZ Inc. provides its consumers a private, tailor-made cloud infrastructure to execute important applications with the help of the latest cutting-edge tools, which help the company look after customer needs while reducing management and system complications.
Along with a zero-tolerance policy on downtime, maximum data security is another core focus area for the company, which runs two network-connected data centres in metro cities working in tandem, with one deputed as a backup/failover site for the other, to create a secure and reliable disaster recovery solution.
learning objectives
4.1 Introduction
Business analytics is a process to filter and analyse sets of data, which might be small bits of data, a file containing data or a large collection of data generally known as a database. With the growth in data, a need arises to store it at some appropriate location from where it can be easily accessed and modified irrespective of geographical location. Unlike small datasets, which are useful only to individual organisations, Big Data is useful to various organisations. To store Big Data, companies use cloud technology, data warehousing, etc. This data is then retrieved from its storage and analytics is applied on it to derive useful information. The analytics involves the use of various statistical methods such as measures of central tendency, graphs, etc.
In this chapter, you will first study about data, information and knowl-
edge. Next, the chapter discusses business analytics personnel and their
roles. Further, the chapter discusses the required competencies for an
analyst. Next, the chapter details upon business analytics data and the
importance of ensuring data quality. Towards the end, the chapter dis-
cusses technology for business analytics and change management.
Examples of data
2,4,6,8
Mercury, Jupiter, Pluto
The above data alone does not represent the true picture. Maybe the first sequence is simply the table of two, or a sequence with a difference of two between numbers. The names may just be the names of conference rooms in an organisation rather than planet names. Unless you give it a logic and define the reasoning for its existence, data has no standalone existence by itself.
Information is the result that we achieve after the raw data is pro-
cessed. This is where the data takes the shape as per the need and
starts making sense. Standalone data has no meaning. It only assumes
meaning and transitions into information upon being interpreted. In
IT terms, characters, symbols, numbers or images are data. These are
joint inputs which a system running a technical environment needs to
process in order to produce a meaningful interpretation.
Information can offer answers to questions like which, who, why,
when, what and how. Information put into an equation should look
like:
Examples of Information
The second type is termed tacit knowledge, referring to the type of knowledge that is complex and intricate. It is not gained simply by being passed on by others and requires elevated and advanced skills in order to
in the following order, as shown in Figure 4.1:
Data → Information → Knowledge
Figure 4.1: Data becomes Information, which becomes Knowledge
For example, if humidity levels are high and the temperature dips considerably, the atmosphere is unlikely to hold the moisture, hence it rains. The pattern is reached by comparing valid points emanating from data and information, resulting in knowledge, sometimes also referred to as wisdom. Wisdom exemplifies the understanding of essential values personified within the knowledge that are the foundation for the knowledge in its current form. Wisdom is systematic and includes an understanding of all the interactions that happen between rain, temperature gradients, evaporation, air currents and other changes.
Activity
Suppose you have to explain to a school-going kid the difference between data, information and knowledge. Describe the method and technique you will use.
tionality – since all they represent is the business their organisation is
offering to customers.
Key Roles and Responsibilities of a Business Analyst
The figure shows the key roles a business analyst works with: Planner, Business System Analyst, Project Manager, Organisation Analyst, Financial Analyst, Data Analyst, Subject Area Expert, Technology Architect, Application Architect, Application Designer and Process Analyst.
Organising requirements: Requirements often come from multiple sources that may sometimes conflict with one another. A business analyst must segregate requirements into associated categories to communicate and manage them efficiently. Requirements are organised into types as per their source and application. An ideal organisation prevents project requirements from being overlooked, and thus leads to an optimum use of budgets and time.
Translating requirements: A business analyst must be skilled at
interpreting and converting the business requirements effectively
cycle, the business analyst protects the user’s and business needs
by confirming the functionality, precision and inclusiveness of
the requirements developed so far compared to the requirements
gathered in the initial documents. Such protection reduces the risk
and saves considerable time by certifying that the requirements
are being fulfilled before devoting further time in development.
Simplifying requirements: The main role of a business analyst is to simplify tasks and maintain easier functionality. Completing the business objective is the aim of every project; a business analyst recognises and evades unimportant activities that are not helpful in resolving the problem or achieving the objective.
Verifying requirements: A business analyst is the most informed person in a project about the use cases; hence, they frequently validate the requirements and discard implementations that do not help carry the business objective to culmination. Requirement verification is completed through test, analysis, inspection and demonstration.
Managing requirements: Usually, an official requirements pre-
sentation is followed by the review and approval session, where
ing the system to regulate when replacement or deactivation may
be required.
ment and resources from diverse backgrounds will collaborate on a single platform to discuss, debate and finalise the requirements, which would incidentally be captured by you. You need that level of comprehension, along with the eloquence to deliver your ideas or clear any doubts you have. You should be able to make your point evidently and explicitly. Communicating data and information at the appropriate level is important, as some stakeholders require more detailed information than others owing to varying levels of understanding.
Manage stakeholder meetings: While email also acting as an au-
the times, you end up discovering more about your project from a
physical presence of all stakeholder where all collaborators tend to
be open about debating circumstances.
A good listener: You are better off listening more than you speak and jotting down notes and takeaways from meetings. Good listening skills require the patience to hear the stakeholder out, which gives them a feeling of being heard rather than being overlooked or overpowered by a dominating analyst; projects that lack this often end up in a mess sooner than they should. Your listening and information-absorbing skills are important to making you an effective analyst. Do not just listen, but understand the situation, and question only where you think stakeholders are passing off unnecessary, off-business requirements while ignoring the actual requirements that can help in making an efficient system. You can attend personality development training to gain control over voice modulation, dialect and pitch, along with an effective body language and business presentation skills.
the development team. A business analyst should prioritise activities, separating critical ones from those that can wait, and focus on them.
Literary and documenting skills: Requirements documents, spec-
ifications, reports, analysis and plans. Being a business analyst, you
are supposed to deliver numerous types of documentations that
will go on to become project and legal documents later on. So, you
need to ensure that your documents are created concisely, and at
a comprehensible level for the stakeholders. Avoid specific jargons
head start over the others since it will lead to unambiguous require-
ments documentation.
Stakeholder management: It is important that you know how to deal with stakeholders and how much power and impact they have on your project. Stakeholders can either be your best friends/supporters or your greatest critics. An accomplished business analyst will have the skill to investigate the degree of management every stakeholder needs and how each ought to be dealt with individually.
Develop your modelling skills: As the expression goes, a picture paints a thousand words. Models (such as process models) are compelling tools to pass on a lot of data without depending on text. A visual portrayal enables you to get an outline of the issue or project so that you can see what functions well and where the loopholes lie.
Activity
standards and the knowledge.
4.5 Business Analytics Data
Any approach to analytics must adjust to changes in the way people work inside their business settings, particularly with the growing size of data volumes. Organising data in a way that makes sense for every business customer requires infusing content with context before augmenting the value of relevant filtering and representation, and also persistently retraining the machine, slicing and dicing this data in view of individual needs and conveying it in the way that is most useful depending on a person’s perspective (area, time, device and so on). Some data analytics challenges are as follows:
Content variety and quality: Information sources are no longer entirely structured. Business people depend on a pool of information objects that mix customarily structured information, for example transactional system databases, with various other types of artefacts such as Web-based social networking channels (Facebook, Twitter, LinkedIn, Web journals, wikis, etc.), each of which must be surveyed for logical importance and incorporated within different data models.
For quality, the bits of information that can be mined from an information source like a database or an online networking Web page may have distinct levels of relevance for various sorts of data consumers in different parts of an organisation. For example, in information gathered for reporting item launches to senior officials, a rolled-up view of positive or negative sentiment might
through Twitter. That poses two difficulties: the first is linking information artefacts to various business domains, while the second involves deriving dynamic linkages, connections and relevance beyond fixed ordered models. The latter challenge likewise implies striving to advance an understanding of how data sets are utilised by various people and adjusting analytical models accordingly.
Personalisation challenges: More important than sifting through substantial volumes of data resources taken from a variety of sources is that a wide range of channels must be set up to recognise different filters of business value relying on who cus-
ly manage all data types. The data quality management framework comprises three mechanisms: control, monitor and improve.
Control
The best approach to managing the quality of information in a data framework is to guarantee that only information meeting the standard models is permitted to enter the framework. This can be accomplished by setting up solid controls at the front end of every data inflow system, or by putting validation rules in the integration layer.
Monitor
data to ensure that the data quality matches the desired levels. Ad-
ditionally, information captured from one system to another compels
the company to monitor the data frequently to confirm consistency
across multiple systems. Data quality monitoring enables the organi-
sation to actively discover issues before they affect the decision-mak-
ing process.
Correctness: measures the degree of data accuracy
Completeness: measures the degree to which all required data is present
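As a sketch, both measures can be computed as simple ratios; the `age` field and its 0–120 validity range are illustrative assumptions:

```python
# Monitor data quality on a batch of records:
# completeness = share of required values that are present,
# correctness  = share of present values that satisfy a validity rule.
records = [{"age": 34}, {"age": -2}, {"age": None}, {"age": 51}]

present = [r["age"] for r in records if r["age"] is not None]
completeness = len(present) / len(records)                        # 3 of 4 present
correctness = sum(0 <= a <= 120 for a in present) / len(present)  # -2 fails the rule

print(f"completeness={completeness:.2f}, correctness={correctness:.2f}")
```

Tracking these ratios over time is what lets the organisation spot a quality decline before it reaches decision makers.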
Improve
When the data quality checks report a decline in quality, a few corrective measures can be deployed. As described above, training, adjusting processes and system enhancements involve both people and technology. Usually, an improvement plan implemented right after the first instance of a quality dip comprises data cleansing, which can be completed via automation or manually by business users. If the business can define the rules to improve data, then data purging programs can easily be created to mechanise the data enhancement method. The next step, business validation, makes sure that the data regains its required quality levels. Habitually, organisations end the data quality enhancement programme after a single round of positive validation, which is a wrong step. An important step that is missed is improving the data quality controls, by doing a full root cause analysis (RCA) of the issues, to ensure that the same issues do not recur. Applying these steps is more critical when a project consists of master data or
data parameters and controlling the overall aspects of data to
ensure ________________ in quality.
10. In a closed environment, data quality is achievable and can be achieved without adhering to data metrics. (True/False)
Keeping the human factor in mind, the difference between reactive and proactive decision making is defined by the level of complexity spanning BI and advanced analytics. Summary reports, statistics and queries, and low-latency dashboards are built on historical information. There is a middle ground of simple analytics, e.g. algebraic or trending predictions that give estimated answers about expectations in terms of sales, production, etc. Advanced analytics are much more refined and support techniques such as statistical analysis, forecasting, prediction and correlation, whereas trend analysis simply extrapolates the existing data to project the next quarter. A refined predictive model takes seasonality, correlations between strong and weak quarters, and historical sales patterns into account.
Let’s take a look at decision making from another point of view. Say
we want to examine our brain while taking a decision. From a logi-
cal viewpoint, when our brain encounters a task it has no idea about,
it attempts to create rational assumptions guessing the input, likely
outcomes vs. actions to be taken, and attempts to find the best an-
swer. When the brain encounters the same level of problem again, it
the world utilise applications like MS Word, Excel, Visio, PowerPoint and Project, and many such tools, in order to put their best foot forward. These tools are effective and clear in presenting information closest to the depiction wanted by the analyst, and hence elevate the overall level of analytical and operational standards.
Figure 4.4 (Change Management Phases) shows a cycle of Monitoring, Analysis and Control around change management. Its activities include: register and study corporate data; follow the budget; implement a balanced indicator system; compare planned and actual indicators; evaluate the efficiency of achieved targets; and make exact decisions.

Figure 4.4: Change Management Phases
There should be multiple phase auditors to ensure that the roles and
responsibilities of one phase assigned to a business analyst do not seep
into the other phases, affecting the overall outcomes and messing up
the overall project execution.
ical investors to reviewing a solution that was put in place a bit too early and is now facing strong resistance. To get your job done efficiently in such circumstances, you need to comprehend how well a change is received by the susceptible individuals and how to lead people through the change. Let us discuss a few of the topics related to change management that a business analyst should abide by.
When most people consider helping individuals adjust to a change, the two most commonly used methods are training and communication. Both are important tools that help individuals work through the change process and address the awareness and ability/knowledge areas. Nonetheless, they are not adequate to fully back the implementation of a change.
concentrate on next trends and envision the restatement of business
data needs.
Project and change management are distinct and complementary activities that use different skill sets. Project management drives the technical side of a project, concentrating on guaranteeing that the solution is appropriately designed and works as required. Change management is centred on the people side, preparing users for the change and attempting to ensure that the new procedures are adopted and usable. According to a study carried out by an individual research group, emerging best practices are intended for change
4.9 Summary
Data, put simply, is the raw material that does not make any definite sense unless you process it to a meaningful end.
Information is the result which we achieve after the raw data is
processed.
Standalone data has no meaning rather it only assumes meaning
and transitions into information upon being interpreted.
Knowledge is something that is inferred from the data and infor-
mation.
A business analyst is anyone who has the key domain experience
and knowledge related to the paradigms being followed.
Business analysts need not necessarily be from the IT background
key words
What is Data, Information 1. Data
and Knowledge?
2. True
Business Analytics Person- 3. Failure
nel and their Roles
4. Segregate
Required Competencies 5. True
for an Analyst
6. False
10. False
Technology for Business 11. a. Associative Query Logic
Analytics
Managing Change 12. Change
SUGGESTED READINGS
Laursen, G. H., & Thorlund, J. (2017). Business analytics for managers: taking business intelligence beyond reporting. Hoboken, NJ: Wiley.
Isson, J. P. (2013). Win with advanced business analytics: creating business value from your data. Hoboken, NJ: John Wiley & Sons.
E-REFERENCES
Risk, S. (n.d.). Business Analytics less Data Quality equals Bad
Decisions. Retrieved April 26, 2017, from https://www.blue-gran-
ite.com/blog/business-analytics-less-data-quality-equals-bad-de-
cisions
Data Quality for Business Analytics by David Loshin - BeyeNET-
WORK. (n.d.). Retrieved April 26, 2017, from http://www.b-eye-net-
work.com/view/15539
Descriptive Analytics
CONTENTS
5.1 Introduction
5.2 Visualising and Exploring Data
5.2.1 Dashboards
5.2.2 Column and Bar Charts
5.2.3 Data Labels and Data Tables Chart Options
5.2.4 Line Charts
5.2.5 Pie Charts
5.2.6 Scatter Chart
Activity
5.3 Descriptive Statistics
5.3.1 Central Tendency (Mean, Median and Mode)
5.3.2 Variability
5.3.3 Standard Deviation
Self-Assessment Questions
Activity
5.4 Sampling and Estimation
5.4.1 Sampling Methods
5.4.2 Estimation Methods
Self-Assessment Questions
Activity
5.5 Introduction to Probability Distributions
Self-Assessment Questions
Activity
CONTENTS
5.6 Summary
5.7 Descriptive Questions
5.8 Answers and Hints
5.9 Suggested Readings & References
Introductory Caselet
The cab company is on a strict marketing and advertising budget and needs the analytics to stay true to its potential. A misfired campaign may result in a detrimental image as well as revenue loss for the company. The statistics and analysis of the consultancy firm need to be spot on in order to create a niche in a market that already has several players. They need to make sure that customers are taken into confidence alongside the existing players and retained for a long time. The consultancy will study the current market and stats around the area
learning objectives
5.1 Introduction
Descriptive analytics is the most essential type of analytics and establishes the framework for more advanced types of analytics. This sort of analysis answers “What has occurred in the corporation?” and “What is going on now?” Let us consider the case of Facebook. Facebook users produce content through comments, posts and picture uploads. This information is unstructured and is produced at an extensive rate. Facebook stats reveal that 2.4 million posts, equivalent to around 500 TB of information, are produced every minute. These jaw-dropping figures have popularised another term, which we know as Big Data.
There are three crucial approaches to abridge and describe the raw
data:
Dashboards and MIS reporting: This technique gives condensed data answering “What has happened?”, “What’s been going on?” and “How does it stand against the plan?”
Ad hoc reporting: This technique supplements the previous one by helping the administration extract information as required.
Drill-down reporting: This is the most complex piece of descriptive analysis and gives the capacity to delve further into any report to comprehend the information better.
Data visualisation provides a way of collaborating on data at all business levels and can disclose surprising relationships and patterns.
5.2.1 DASHBOARDS
5.2.2 COLUMN AND BAR CHARTS
MS Excel refers to the vertical bar charts as column and horizontal bar
charts as bar charts. Column and bar charts are valuable for equating
categorical or series specific data, for demonstrating differences be-
tween value sets, and for displaying percentages or proportions of a
whole.
5.2.3 DATA LABELS AND DATA TABLES CHART OPTIONS
MS Excel provides options for including the numerical data on which
charts are based within the charts. Data labels can be added to chart
elements to show the actual value of bars. Data tables can also be
added; these are usually better than data labels, which can get quite
messy. Both can be added from the Add Chart Element Button in the
Chart Tools Design tab, or also from the Quick Layout button, which
provides standard design options. Figure 5.2 shows data labels and
5.2.4 LINE CHARTS
Line charts are a useful way of displaying data over a given period. You may enter multiple series of data in line charts; however, they can become difficult to interpret if the sizes of the data values differ exponentially.
Figure 5.3: Line Charts
Source: http://www.advsofteng.com/gallery_line.html
5.2.5 PIE CHARTS
New age 3D pie charts can get confusing at times because of their nar-
row representation in case of huge data variables. This is because the
third dimension also represents something especially on a coordinate
graph. Hence, pie charts are preferred only in two dimensional form
for effective and simpler data representation. Figure 5.4 displays a pie
chart:
Organic (36%)
Facebook (6%)
Twitter (7%)
Pinterest (7%) Referrals (3%)
Figure 5.5: Scatter Chart
Source: https://www.zingchart.com/docs/chart-types/scatter-plots/
5.2.9 PARETO ANALYSIS
Pareto analysis is named after Vilfredo Pareto, an Italian economist. In 1906, he realised that a large portion of the total wealth in Italy was held by a comparatively small number of people. The Pareto principle is often seen in many business situations. For example, a high percentage of sales may usually come from a small percentage of customers, a high percentage of defects may originate from relatively few batches of the product, or a high percentage of stock value may belong to a small percentage of selective items. As a result, the Pareto principle is also often called the “80–20 rule”, referring to these generic situations.
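The 80–20 pattern is easy to check with a short computation; the per-customer sales figures below are made up for illustration:

```python
# What share of total sales comes from the top 20% of customers?
sales = [500, 320, 90, 60, 40, 30, 25, 20, 10, 5]  # one value per customer
sales.sort(reverse=True)

top_n = max(1, len(sales) // 5)              # top 20% of the customer base
top_share = sum(sales[:top_n]) / sum(sales)  # roughly 75% here
print(f"top 20% of customers account for {top_share:.0%} of sales")
```

Sorting the values from largest to smallest and comparing cumulative shares in this way is the basis of a Pareto chart.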
of Google. A company like Netflix keeps extensive records on its cus-
tomers, making it easy to retrieve data about the entire population of
customers. However, it would probably be impossible to identify all
individuals who do not own cell phones.
it is a sample. Most populations, even the finite ones, are usually too
large to practically or effectively deal with. For example, it would be
unreasonable as well as costly to survey the TV viewers’ population of
the United States. Sampling is also necessary when data must be ob-
tained from destructive testing or from a continuous production pro-
The common measures of central tendency (location) are:
Mean
Median
Mode
Midrange
The mathematical average is called the mean (or the arithmetic mean), which is the sum of the observations divided by the total number of observations. The mean of a population is denoted by \mu, and the sample mean by \bar{x}. If the population contains N observations x_1, x_2, …, x_N, the population mean is calculated as

\mu = \frac{\sum_{i=1}^{N} x_i}{N}

and, for a sample of n observations, the sample mean is

\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}
Note that the calculations for the mean are the same whether we are
dealing with a population or a sample; only the notation differs. We
may also calculate the mean in Excel using the function AVERAGE
(data range).
One property of the mean is that the sum of the deviations of each
observation from the mean is zero:
\sum_{i} (x_i - \bar{x}) = 0
This simply means that the sum of the deviations above the mean is
the same as the sum of the deviations below the mean. Thus, the mean
“balances” the values on either side of it. However, it does not suggest
that half the data lie above or below the mean.
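This balancing property is easy to verify with Python’s standard `statistics` module (the small data set is illustrative):

```python
from statistics import mean

data = [1, 3, 5, 7]
m = mean(data)                      # (1 + 3 + 5 + 7) / 4 = 4
deviations = [x - m for x in data]  # [-3, -1, 1, 3]
print(m, sum(deviations))           # the deviations sum to zero
```

The `mean` function plays the same role as Excel’s AVERAGE (data range).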
Median
The measure of location that specifies the middle value when the data are arranged from least to greatest is the median. If the number of observations is odd, the median is the exact middle of the sorted numbers; for example, with 7 observations it is the 4th. If the number of observations is even, say 8, the median is the mean of the two middle numbers, i.e. the mean of the 4th and 5th observations. We can use the Sort option of MS Excel to order the data and then find the median; the Excel function MEDIAN (data range) can also be used. The median is meaningful for ratio, interval and ordinal data. As opposed to the mean, the median is not affected by outliers.
Mode
The mode is the observation (number or series) that occurs the maximum number of times. The mode is valuable for datasets containing a small number of unique values. You can easily identify the mode from a frequency distribution by identifying the value having the largest frequency, or from a histogram by identifying the highest bar. You may also use the Excel function MODE.SNGL (data range). For frequency distributions or grouped data, the modal group is the group with the greatest frequency.
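The Excel functions mentioned above have direct counterparts in Python’s standard `statistics` module; the data set below is illustrative:

```python
from statistics import median, mode

data = [2, 3, 3, 5, 7, 100]  # 100 is an outlier
print(median(data))          # mean of the two middle values, 3 and 5 -> 4.0
print(mode(data))            # 3 occurs most often
```

Note that the outlier 100 leaves the median and mode unchanged, whereas it would pull the mean upward.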
Midrange
The midrange is the average of the largest and smallest observations in the data set. It is simple to compute but highly sensitive to outliers.
5.3.2 Variability
The larger the variance, the greater the spread of the observations from the mean. This indicates more variability in the observations.
The formula used for calculating the variance is different for popula-
tions and samples.
σ² = Σᵢ₌₁ᴺ (xᵢ − µ)² / N

where xi is the value of the ith item, N is the number of items in the population, and µ is the population mean. Similarly, the sample variance is computed as

s² = Σᵢ₌₁ⁿ (xᵢ − x̄)² / (n − 1)

where n is the number of items in the sample and x̄ is the sample mean.
For example, consider a population consisting of the four values 1, 3, 5 and 7. First, compute the population mean:

μ = (1 + 3 + 5 + 7) / 4 = 4

Insert all known values into the formula for the variance, as shown below:

σ² = Σ(xᵢ − μ)² / N
σ² = [(1 − 4)² + (3 − 4)² + (5 − 4)² + (7 − 4)²] / 4
σ² = [(−3)² + (−1)² + (1)² + (3)²] / 4
σ² = [9 + 1 + 1 + 9] / 4 = 20 / 4 = 5
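The same arithmetic can be checked programmatically; a short sketch reproducing the worked example:

```python
# Population variance: average squared deviation from the population mean.
def population_variance(values):
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)

print(population_variance([1, 3, 5, 7]))  # 5.0
```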
The square root of the variance is the standard deviation. For a popu-
lation, the standard deviation is computed as:
σ = √[ Σᵢ₌₁ᴺ (xᵢ − µ)² / N ]

For a sample, the standard deviation is computed as:

s = √[ Σᵢ₌₁ⁿ (xᵢ − x̄)² / (n − 1) ]
For example, treating the four values 1, 3, 5 and 7 as a sample, the sample mean is:

x̄ = (1 + 3 + 5 + 7) / 4 = 4

Then, we insert all the known values into the formula for calculating the SD of a sample, as shown below:

s = √[((1 − 4)² + (3 − 4)² + (5 − 4)² + (7 − 4)²) / (4 − 1)] = √(20/3) ≈ 2.58
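The sample standard deviation calculation can be sketched in a few lines (note the n − 1 denominator):

```python
import math

def sample_std_dev(values):
    n = len(values)
    xbar = sum(values) / n
    # Sample variance divides by n - 1, not n
    s2 = sum((x - xbar) ** 2 for x in values) / (n - 1)
    return math.sqrt(s2)

print(round(sample_std_dev([1, 3, 5, 7]), 3))  # 2.582
```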
Standardised Values
We subtract the sample mean from the ith observation, xi, and divide the result by the sample standard deviation, s; the resulting value is called a z-score.
Thus, a z-score of 1.0 means that the observation is one standard de-
viation to the right of the mean; a z-score of -1.5 means that the ob-
servation is 1.5 standard deviations to the left of the mean. Thus, even
though two data sets may have different means and standard devia-
tions, the same z-score means that the observations have the same
relative distance from their respective means.
Z-scores can be computed easily on a spreadsheet; however, Excel has a function that calculates them directly, STANDARDIZE (x, mean, standard_dev).
zᵢ = (xᵢ − x̄) / s
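The standardisation formula translates directly into code; a minimal sketch with illustrative values:

```python
def z_score(x, mean, std_dev):
    # z measures how many standard deviations x lies from the mean
    return (x - mean) / std_dev

print(z_score(30, 20, 10))  # 1.0  (one SD to the right of the mean)
print(z_score(5, 20, 10))   # -1.5 (1.5 SDs to the left of the mean)
```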
Coefficient of Variation
CV = Standard Deviation/Mean
That is, if the objective is to maximise return, a higher return-to-risk
ratio is often considered better. A related measure in finance is the
Sharpe ratio, which is the ratio of a fund’s excess returns (annualised
total returns minus Treasury bill returns) to its standard deviation. If
several investment opportunities have the same mean but different
variances, a rational (risk-averse) investor will select the one that has
the smallest variance. This approach to formalising risk is the basis for
modern portfolio theory, which seeks to construct minimum-variance
portfolios.
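The coefficient of variation can be sketched for two hypothetical funds (the return figures below are illustrative assumptions, not data from the text):

```python
import statistics

def coefficient_of_variation(values):
    # CV = standard deviation / mean: relative risk per unit of return
    return statistics.pstdev(values) / statistics.mean(values)

fund_a = [0.08, 0.10, 0.12]  # steadier returns
fund_b = [0.00, 0.10, 0.20]  # same mean, more spread

# Fund A has the lower CV, so less relative risk for the same mean return
print(coefficient_of_variation(fund_a) < coefficient_of_variation(fund_b))  # True
```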
Activity
The objective of a sampling study might be to estimate the proportion
of golfers who would likely subscribe to this programme. The target
population might be all golfers over 25 years old. However, identify-
ing all golfers in America might be impossible. A practical population
frame might be a list of golfers who have purchased equipment from
national golf or sporting goods companies through which the discount
card will be sold. The operational procedures for collecting the data
might be an e-mail link to a survey site or direct-mail questionnaire.
The data might be stored in an Excel database; statistical tools such as PivotTables and simple descriptive statistics would be used to segment and summarise the data.
and then every 2,000th name can be selected. This approach can be used for telephone sampling supported by an automated dialler used to dial numbers in an orderly manner. However, systematic sampling differs from simple random sampling in that, for any given sample, not every possible sample of a given size from the population has an equal chance of being selected. In some situations, this method can introduce significant bias if the population has some underlying pattern. For example, sampling the orders received on each Sunday may not produce a representative sample if consumers tend to order more or less on other days.
Stratified sampling: It applies to populations divided into natu-
ral subsets (strata). For example, a large city may be divided into
political districts called wards. Each ward has a different number
of citizens. A stratified sample would choose a sample of individu-
als in each ward proportionate to its size. This approach ensures
that each stratum is weighted by its size relative to the population
and can provide better results than simple random sampling if the items in each stratum are relatively homogeneous. However, issues of
cost or significance of certain strata might make a disproportion-
ate sample more useful. For example, the ethnic or racial mix of
each ward might be significantly different, making it difficult for a
stratified sample to obtain the desired information.
Cluster sampling: It refers to dividing a population into clusters (subgroups), sampling a set of clusters, and conducting a complete census within each sampled cluster.
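The probabilistic schemes above can each be sketched in a few lines; the population here is a hypothetical list of numbered units:

```python
import random

population = list(range(1, 10001))  # hypothetical population of 10,000 units
random.seed(42)                     # fixed seed so the sketch is repeatable

# Simple random sampling: every unit has an equal chance of selection.
simple = random.sample(population, 100)

# Systematic sampling: pick a random start, then every k-th unit.
k = len(population) // 100
start = random.randrange(k)
systematic = population[start::k]

# Stratified sampling: sample each stratum in proportion to its size.
strata = {"ward_a": population[:6000], "ward_b": population[6000:]}
stratified = []
for name, members in strata.items():
    share = round(100 * len(members) / len(population))
    stratified.extend(random.sample(members, share))

print(len(simple), len(systematic), len(stratified))  # 100 100 100
```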
5.4.2 Estimation Methods
Sample data provide the basis for many useful analyses to support decision making. Estimation involves assessing the value of an unknown population parameter—such as a population proportion, population mean, or population variance—using sample data. Estimators are the measures used to estimate population parameters; for example, we use the sample mean x̄ to estimate the population mean µ.
Unbiased Estimators
It seems quite intuitive that the sample mean should provide a good
point estimate for the population mean. However, it may not be clear
why the formula for the sample variance, which we saw previously, has a denominator of n − 1, particularly because it differs from the formula for the population variance. In these formulas, the population
variance is computed by
σ² = Σᵢ₌₁ᴺ (xᵢ − µ)² / N

and the sample variance is computed by

s² = Σᵢ₌₁ⁿ (xᵢ − x̄)² / (n − 1)
One of the drawbacks of using point estimates is that they do not provide any indication of the magnitude of the potential error in the estimate. A newspaper reported that college professors were the best-paid workers in the area, with an average pay of $150,004. However, it was found that the average salaries at two local universities were less than $70,000. How did this happen? It was revealed that the sample taken was very small and included a large number of highly-paid individuals.
Sometimes a sample does not represent the population effectively. This is generally a result of poor sample design, such
as using a convenience sample when a simple random sample would
have been more appropriate or choosing the wrong population frame.
To draw good conclusions from samples, analysts need to eliminate
non-sampling error and understand the nature of sampling error.
Sampling error depends on the size of the sample relative to the population. Thus, determining the sample size to be taken is essentially a statistical issue based on the precision of the estimates required to draw a useful conclusion. From a practical point of view, one should also consider the cost of sampling and make a trade-off between cost and the information obtained.
Sampling Distributions
We can quantify the sampling error in estimating the mean for any
unknown population. To do this, we need to characterise the sampling
distribution of the mean.
The means of all possible samples of a fixed size n from some popu-
lation will form a distribution that we call the sampling distribution
of the mean. The histograms are approximations to the sampling distributions of the mean based on 25 samples. Statisticians have shown two key results about the sampling distribution of the mean: first, the expected value of the sample mean equals the population mean; second, the standard deviation of the sample mean (called the standard error of the mean) equals σ/√n.
Confidence Intervals
A confidence interval is an interval estimate constructed around a point estimate. It is a range of values within which the true (unknown) population parameter is believed to lie, together with an associated probability. This probability is called the level of confidence, denoted by 1 − α, where α is a number between 0 and 1.
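A confidence interval for the mean with a known population standard deviation can be sketched as x̄ ± z·σ/√n; a 95% interval uses z ≈ 1.96, and the numbers below are illustrative:

```python
import math

def confidence_interval(xbar, sigma, n, z=1.96):
    # x̄ ± z * σ/√n: the margin of error shrinks as n grows
    margin = z * sigma / math.sqrt(n)
    return (xbar - margin, xbar + margin)

low, high = confidence_interval(xbar=50, sigma=8, n=64)
print(round(low, 2), round(high, 2))  # 48.04 51.96
```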
Prediction Intervals
Note that this interval is wider than the confidence interval by the
additional value of 1 under the square root. This is because, in addi-
tion to estimating the population mean, we must also account for the
variability of the new observation around the mean.
b. The population frame
c. The method of sampling
d. All of these
8. The most common probabilistic sampling approach is simple
____ sampling.
9. A _____ sample would choose a sample of individuals in each
ward proportionate to its size.
Activity
5.5 Introduction to Probability Distributions
The concept of probability is prevalent everywhere, from stock mar-
ket predictions and market research to weather forecasts. In a busi-
ness, managers need to know the likelihood that a new product will
be profitable or the chances that a project will be completed on time.
Probability quantifies the uncertainty that we encounter all around us
and is an important building block for business analytics applications.
Probability is the likelihood that an outcome occurs. Probabilities are
expressed as values between 0 and 1, although many people convert
them to percentages. The statement that there is a 10% chance that oil
prices will rise next quarter is another way of stating that the proba-
bility of a rise in oil prices is 0.1.
The closer the probability is to 1, the more likely it is that the outcome
will occur. Before we discuss probability, let’s get familiarised with its
terminology.
First, if the process that generates the outcomes is known, prob-
abilities can be deduced from theoretical arguments; this is the
classical definition of probability.
The second approach to probability, called the relative frequency
definition, is based on empirical data. The probability that an out-
come will occur is simply the relative frequency associated with
that outcome.
Finally, the subjective definition of probability is based on judg-
ment and experience, as financial analysts might use in predicting
a 75% chance that the DJIA will increase 10% over the next year,
or as sports experts might predict, at the start of the football sea-
son, a 1-in-5 chance (0.20 probability) of a certain team making it
to the final.
The definition to use depends on the specific application and the avail-
able information. We will see various examples that draw upon each of
these perspectives.
Let O1, O2, …, On represent a set of possible outcomes, and let P(Oi) be the probability associated with the outcome Oi.
The union of A and B is the event {2, 3, 7, 11, 12}. The probability that
some outcome in either A or B (i.e., the union of A and B) occurs is
denoted as P(A or B). Finding this probability depends on whether the
events are mutually exclusive or not. Two events are mutually exclu-
sive if they have no outcomes in common. The events A and B in this
example are mutually exclusive. When events are mutually exclusive,
the following rule applies:
If events A and B are mutually exclusive, then P(A or B) = P(A) + P(B).
If two events A and B are not mutually exclusive, then P(A or B) = P(A) + P(B) − P(A and B). Here, (A and B) represents the intersection of the two events.
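The rule for mutually exclusive events can be checked on outcome sets like those above. Treating the outcomes as sums of two fair dice and assuming the union {2, 3, 7, 11, 12} splits into A = {2, 3, 12} and B = {7, 11} (an assumed split for illustration):

```python
from fractions import Fraction
from itertools import product

# Probability of each possible sum of two fair dice
counts = {}
for d1, d2 in product(range(1, 7), repeat=2):
    counts[d1 + d2] = counts.get(d1 + d2, 0) + 1
P = {s: Fraction(c, 36) for s, c in counts.items()}

A = {2, 3, 12}   # assumed split of the union {2, 3, 7, 11, 12}
B = {7, 11}
p_a = sum(P[s] for s in A)
p_b = sum(P[s] for s in B)

# A and B share no outcomes, so P(A or B) = P(A) + P(B)
print(p_a + p_b)  # 1/3
```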
Conditional Probability
A continuous random variable is defined over one or more intervals of real numbers.
Bernoulli Distribution
Binomial Distribution
The binomial distribution models n independent replications of a Ber-
noulli experiment, each with a probability p of success. The random
variable X represents the number of successes in these n experiments.
Let us consider a telemarketing example: suppose we call n = 10 customers, each of whom has a probability p = 0.2 of making a purchase.
Then the probability distribution of the number of positive responses
obtained from 10 customers is binomial. Using the binomial distribu-
tion, we can calculate the probability that exactly x customers out of
the 10 will make a purchase. The value of x will always be between 0
and 10. A binomial distribution might also be used to model the results
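The telemarketing probabilities can be computed directly from the binomial formula with n = 10 and p = 0.2, as in the text:

```python
from math import comb

def binomial_pmf(x, n, p):
    # P(X = x) = C(n, x) * p^x * (1 - p)^(n - x)
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Probability that exactly 2 of the 10 called customers make a purchase
print(round(binomial_pmf(2, 10, 0.2), 4))  # 0.302

# The probabilities over all outcomes 0..10 sum to 1
print(round(sum(binomial_pmf(x, 10, 0.2) for x in range(11)), 6))  # 1.0
```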
Poisson Distribution
Uniform Distribution
Normal Distribution
As the standard deviation is decreased or increased, the distribution becomes narrower or wider, respectively.
Using sample data may limit our ability to predict uncertain events
that may occur because potential values outside the range of the sample data are not included. A better method is to identify the probability distribution that best fits the data.
Summary statistics can also provide clues about the nature of a distri-
bution. The mean, median, standard deviation and coefficient of vari-
ation often provide information about the nature of the distribution.
For instance, normally distributed data tend to have a fairly low coef-
ficient of variation (however, this may not be true if the mean is small).
For normally distributed data, we would also expect the median and
mean to be approximately the same. For exponentially distributed
data, however, the median will be less than the mean. Also, we would
expect the mean to be about equal to the standard deviation, or, equiv-
alently, the coefficient of variation would be close to 1. We could also
look at the skewness index. Normal data are not skewed, whereas
lognormal and exponential data are positively skewed. The following example of Analysing Airline Passenger Data will help in better understanding normally distributed data.
at a high price. The histogram shows a relatively symmetric distri-
bution. The mean, median, and mode are all similar, although there
is some degree of positive skewness. It is important to recognise that
this is a relatively small sample that can exhibit a lot of variability
compared with the population from which it is drawn. Thus, based on
these characteristics, it would not be unreasonable to assume a nor-
mal distribution for developing a predictive or prescriptive analytics
model.
Activity
5.6 Summary
Descriptive analytics is the most essential type of analytics and establishes the framework for more advanced types of analytics.
Data visualisation is the method of showing data in a graphical manner to provide insights that help in making better decisions.
Conditional probability is the probability of occurrence of one
event A, given that another event B is known to be true or has
already occurred.
key words
measures.
Line chart: A type of chart that is used to display data pertain-
ing to a given period.
9. Stratified
Introduction to Probability Distributions
10. False
11. Conditional
12. Random
Suggested Readings
Sheikh, N. M. (2013). Implementing analytics: a blueprint for de-
sign, development, and adoption. Amsterdam: Elsevier.
Atzmüller, M., & Roth-Berghofer, T. R. (2016). Enterprise big data engineering, analytics, and management. Hershey: IGI Global.
E-References
Descriptive, Predictive, and Prescriptive Analytics Explained. (2016, August 05). Retrieved May 01, 2017, from https://halobi.com/2016/07/descriptive-predictive-and-prescriptive-analytics-explained/
Big Data Analytics: Descriptive Vs. Predictive Vs. Prescriptive. (n.d.). Retrieved May 01, 2017, from http://www.informationweek.com/big-data/big-data-analytics/big-data-analytics-descriptive-vs-predictive-vs-prescriptive/d/d-id/1113279
What is descriptive analytics? - Definition from WhatIs.com. (n.d.). Retrieved May 01, 2017, from http://whatis.techtarget.com/definition/descriptive-analytics
Predictive Analytics
CONTENTS
6.1 Introduction
6.2 Predictive Modelling
6.2.1 Logic Driven Models
6.2.2 Data Driven Models
Self Assessment Questions
Activity
6.3 Introduction to Data Mining
Self Assessment Questions
Activity
6.4 Data Mining Methodologies
6.4.1 Classification
6.4.2 Regression
Introductory Caselet
fleeing, that led the company to ultimately recall all the phones it had sold and shut down the Note 7 project forever – a total loss of $18 billion.
However, rather than beating around the bush and pinning the blame on quality control, vendors and everyone else, Samsung took it in a positive stride. They figured out the real issue with the battery, fixed the gaps and worked with the existing market sentiment cleverly by openly acknowledging their battery issues and the steps they took to fix that goof-up.
learning objectives
6.1 INTRODUCTION
In the previous chapter, you learned that descriptive analytics analyses a database to provide information on the trends of past or current business events that can help managers, planners, leaders, etc., to develop a road map for future actions. Descriptive analytics
performs an in-depth analysis of data to reveal details such as fre-
quency of events, operation costs, and the underlying reason for fail-
ures. It helps in identifying the root cause of the problem. On the other
hand, Predictive analytics is about understanding and predicting the
future and answers the question ‘What could happen?’ by using statis-
tical models and different forecast techniques. It predicts the near fu-
ture probabilities and trends and helps in what-if analysis. In predic-
tive analytics, we use statistics, data mining techniques, and machine
learning to analyse the future. Figure 6.1 shows the steps involved in
predictive analytics:
In this chapter, you will first learn about predictive modelling. Further, the chapter discusses the concept of data mining. Towards the end, the chapter discusses different data mining methodologies such as classification, regression, clustering (K-means) and artificial neural networks.
Predictive analysis and models are typically used to forecast future probabilities. In a business context, predictive models are used to analyse historical facts and current data to better understand customers, partners and products, and to identify possible risks and opportunities for a company. Predictive analysis employs many techniques, including statistical modelling, data mining and machine learning, to help analysts make better predictions about future business outcomes.
Predictive modelling is at the heart of business decision making. Building decision models is more an art than a science; it calls for domain knowledge, research and a logical skillset.
It is always recommended to start simple and keep on adding to
the models as required.
nection between instances of “issues with the item”, for example, and an increase in customer service calls.
able variables. This method allows for collecting data and preparing a statistical model, to which extra data can be added as and when available.
Accumulating higher data volumes improves a predictive model, since larger data sets produce more dependable forecasts. Moreover, using actual data to power predictive analytics models results in better accuracy of the forecasting process.
Logic driven models are created on the basis of inferences and assumptions that the sample space and existing conditions provide. Creating logical models requires a solid understanding of business functional areas.
30% of the customers do not return each year, while 70% do return to
provide more business to the restaurant.
profit for a typical customer turns out to be 12,000 × 3.33 = ₹39,960
Armed with all the above details, we can logically arrive at a conclusion and derive the following model for the above problem statement:
where,
M = Profit margin
So, as you can see, logic driven predictive models can be derived for a number of situations, conditions and problem statements, where predictive analytical models provide a forward-looking view on the basis of validation, testing and evaluation to estimate the likelihood of an outcome for a given set of input data.
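The restaurant reasoning above can be sketched as a small logic driven model. The 30% non-return rate and the ₹12,000 × 3.33 lifetime calculation come from the text; the function name and the margin parameter are illustrative assumptions:

```python
def customer_lifetime_value(annual_amount, non_return_rate, margin=1.0):
    # Average customer lifetime in years is 1 / (fraction lost per year):
    # a 30% annual loss implies an average lifetime of about 3.33 years.
    lifetime_years = 1 / non_return_rate
    return annual_amount * lifetime_years * margin

# 30% of customers do not return each year; ₹12,000 per customer per year
clv = customer_lifetime_value(12000, 0.30)
print(round(clv))  # 40000 (the text rounds the lifetime to 3.33, giving 39,960)
```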
comes based on the data. Refer to the caselet in this chapter for data
driven modelling – Samsung’s case with their product and their en-
suing actions as a good example of data driven predictive modelling.
Activity
Similar data where the classification is known are used to create rules, which are then applied to the data with the unknown classification. We will study classification in more detail further in the chapter.
Prediction: Prediction resembles classification, except that we are attempting to predict the value of a numerical variable (e.g., amount of purchase) rather than a class (e.g., buyer or non-buyer). In classification, we are attempting to predict a class, whereas the term prediction in this book refers to the prediction of the value of a continuous variable. Sometimes, in the data
7. Predictive analysis deals with data mining in the same way
business analytics deals with raw data. (True/False)
8. The third stage in data mining is ________.
9. Data mining is solely predictive analytical strategy since
descriptive and prescriptive analytics deal with data only
after receiving it and predictive analysis forecasts the data
outcomes. (True/False)
Activity
between metrics that drive business performance—for instance,
profitability, customer satisfaction, or employee satisfaction. Un-
derstanding the drivers of performance can lead to better deci-
IM
sions to improve performance. For example, the controls group
of CGL Inc. evaluated the relationship between contract-renewal rates and overall satisfaction. They concluded that 91% of contract renewals came from customers who were either very satisfied or satisfied, and that the defection rate was higher for dissatisfied customers. Their model predicted that a one-percentage-point increase in the overall satisfaction score was worth $12 million in yearly contract renewals.
6.4.1 Classification
Classification is the process of analysing data to predict how to classify
a new data element. An example of classification is spam filtering in
an e-mail client. By examining textual characteristics of a message
(subject header, key words, and so on), the message is classified as
junk or not. Classification methods can help predict whether a credit-card charge may be fraudulent, assess the risk profile of a loan applicant, or anticipate whether a consumer will respond to an advertisement.
Classification works by learning from a training set in which the outcome is known. The algorithm attempts to determine the relationships between the attributes that make it feasible to forecast the outcome. Next, an unseen data set, called the prediction set, is given to the algorithm; it contains the same set of attributes, excluding the prediction attribute. The algorithm examines the input and yields a prediction. The accuracy of the prediction indicates the effectiveness of the algorithm. For example, the training set in a medical database would contain relevant patient information captured earlier, in which the prediction attribute is whether the patient has a heart problem.
Figure 6.3 demonstrates the prediction sets and training of such a da-
tabase:
Figure 6.3: Showing Training set and Prediction Set for Medical
Database
Among the known types of data representation, classification normally uses prediction rules to express the learned knowledge. Prediction rules are expressed as IF-THEN rules, where the antecedent (IF part) comprises a conjunction of conditions and the consequent (THEN part) predicts a specific value of the prediction attribute for an item that satisfies the antecedent. Using the above example, a rule covering the first row in the training set might be represented as:
IF (Age=65 AND Heart rate>70) OR (Age>60 AND Blood pres-
sure>140/70) THEN Heart problem=yes
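The rule above translates directly into code; a sketch with hypothetical patient values (reading "Blood pressure > 140/70" as systolic pressure above 140, which is an assumption):

```python
def predict_heart_problem(age, heart_rate, systolic_bp):
    # Direct translation of the IF-THEN prediction rule from the text
    if (age == 65 and heart_rate > 70) or (age > 60 and systolic_bp > 140):
        return "yes"
    return "no"

print(predict_heart_problem(65, 75, 130))  # yes
print(predict_heart_problem(40, 60, 120))  # no
```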
6.4.2 Regression
relationship may be linear or nonlinear, or there may be no relation-
ship at all. Because we are focusing our discussion on linear regression
models, the first thing to do is to verify that the relationship is linear.
We would not expect to see the data line up perfectly along a straight
line; we simply want to verify that the general relationship is linear. If
the relationship is clearly nonlinear, then alternative approaches must
be used, and if no relationship is evident, then it is pointless to even
consider developing a linear regression model.
recommend that you create a scatter chart that can display the rela-
tionship between variables visually as shown in Figure 6.4:
Y = β0 + β1X + β2X² + e
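A minimal least-squares fit for the simple linear part of such a model (Y = β0 + β1X) can be sketched on illustrative data:

```python
def fit_simple_linear(xs, ys):
    # Ordinary least squares estimates for Y = b0 + b1*X
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b1 = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
          / sum((x - mx) ** 2 for x in xs))
    b0 = my - b1 * mx
    return b0, b1

xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]  # exactly linear: y = 1 + 2x
b0, b1 = fit_simple_linear(xs, ys)
print(round(b0, 6), round(b1, 6))  # 1.0 2.0
```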
ters will stay dissimilar. Cluster analysis reduces data overhead, since it can take a large number of observations, such as questionnaires or customer surveys, and reduce the information into smaller, easier-to-interpret groups of similar items. The segmentation of customers into smaller groups, for example, can be used to customise advertising or promotions. As opposed to many other data-mining techniques, cluster
analysis is primarily descriptive, and we cannot draw statistical infer-
ences about a sample using it. In addition, the clusters identified are
not unique and depend on the specific procedure used; therefore, it
does not result in a definitive answer but only provides new ways of
malise each data point's distance from its assigned cluster centroid. K-means clustering minimises the objective

J = Σᵢ₌₁ᴷ Σ (x ∈ cᵢ) ‖x − μᵢ‖²

where cᵢ = set of points belonging to cluster i and μᵢ is the centroid of cluster i.
The goal of this algorithm is to find groups in the data, with the number of groups represented by the variable K. The algorithm works iteratively to assign each data point to one of the K groups based on the features provided.
Business Uses
which have not been explicitly labelled in the data. This can be used to confirm business assumptions about which types of groups exist or to identify unknown groups in complex data sets. Once the algorithm has been run and the groups are defined, any new data can be easily assigned to the correct group.
This is a flexible algorithm that can be used for many grouping tasks. A few types of use cases are:
Behavioral segmentation:
Segment purchase history and activities on application, web-
site, or platform
Define interests based roles
Profiling based on activity monitoring
Inventory categorisation:
Group inventory by sales activity and manufacturing metrics
Sorting sensor measurements:
Detect activity in motion sensors
Group images and separate audio
Identify health monitoring groups
Detecting bots or irregularities:
Separating valid activity groups from bots
Grouping valid activity to clean up outlier detection
Subject A B
1 1.0 1.0
2 1.5 2.0
3 3.0 4.0
4 5.0 7.0
5 3.5 5.0
6 4.5 5.0
7 3.5 4.5
This data set is to be clustered into two groups. Let the A and B values of the two individuals farthest apart (using the Euclidean distance calculation) define the initial cluster means:
           Individual   Mean Vector (Centroid)
Group 1    1            (1.0, 1.0)
Group 2    4            (5.0, 7.0)
Step   Cluster 1 Individual   Cluster 1 Mean Vector (Centroid)   Cluster 2 Individual   Cluster 2 Mean Vector (Centroid)
1      1                      (1.0, 1.0)                         4                      (5.0, 7.0)
Now the initial partition has changed, and the two clusters currently have the following characteristics:
Individual Mean Vector (centroid)
Cluster 1 1, 2, 3 (1.8, 2.3)
Cluster 2 4, 5, 6, 7 (4.1, 5.4)
But we cannot yet be sure that each individual has been assigned to the right cluster. So, we compare each individual's distance to its own cluster mean with its distance to the opposite cluster's mean:
Individual   Distance to mean (centroid) of Cluster 1   Distance to mean (centroid) of Cluster 2
1 1.5 5.4
2 0.4 4.3
3 2.1 1.8
4 5.7 1.8
5 3.2 0.7
6 3.8 0.6
7 2.8 1.1
           Individual       Mean Vector (centroid)
Cluster 1  1, 2             (1.3, 1.5)
Cluster 2  3, 4, 5, 6, 7    (3.9, 5.1)
The iterative relocation would continue from this new partition until no more relocations occur. However, in this example, each individual is now nearer to its own cluster mean than to the other cluster's, so the iteration stops and the latest partitioning is chosen as the final cluster solution.
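The worked example can be replayed with a short K-means sketch on the same seven subjects, starting from the same two initial centroids (individuals 1 and 4); ties in distance go to the first cluster, as in the hand calculation:

```python
def kmeans(points, centroids, iters=10):
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            dists = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        # Update step: each centroid moves to the mean of its cluster
        centroids = [(sum(x for x, _ in cl) / len(cl),
                      sum(y for _, y in cl) / len(cl)) for cl in clusters]
    return clusters, centroids

subjects = [(1.0, 1.0), (1.5, 2.0), (3.0, 4.0), (5.0, 7.0),
            (3.5, 5.0), (4.5, 5.0), (3.5, 4.5)]
clusters, centroids = kmeans(subjects, [(1.0, 1.0), (5.0, 7.0)])
print(clusters[0])  # [(1.0, 1.0), (1.5, 2.0)]
print(centroids)    # final centroids near (1.25, 1.5) and (3.9, 5.1)
```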
relationships hidden in the data – in a manner closely resembling that used by the human brain.
The input layer feeds past data values into the next (hidden) layer. The black circles denote the nodes of the neural network. The hidden layer contains complex functions that form predictors; those functions are hidden from the user. The set of nodes (black circles) in the hidden layer represents mathematical functions called neurons, which transform the input data. The output layer gathers the predictions from the hidden layer and delivers the outcome.
the output. Every neuron takes a group of input values; each is associated with a weight (more about that in a minute), and the neuron has a numerical value called a bias. The output of every neuron is a function of the weighted sum of its inputs plus the bias.
The sigmoid (logistic) activation commonly used is f(x) = 1 / (1 + e⁻ˣ).
Here f refers to the activation function which activates the neuron, and e denotes a mathematical constant with an approximate value of 2.718. Sigmoid functions are used in neurons because they have positive derivatives and are easy to compute. Moreover, they are continuous and bounded, and can act as a type of smoothing function. This combination of characteristics of sigmoid functions is important to the workings of a neural network algorithm — mainly when a derivative calculation (such as the weight-related
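A single neuron with the sigmoid activation can be sketched in a few lines (the weights and bias here are arbitrary illustrative values):

```python
import math

def sigmoid(x):
    # f(x) = 1 / (1 + e^(-x)): continuous, bounded between 0 and 1
    return 1 / (1 + math.exp(-x))

def neuron(inputs, weights, bias):
    # Output is the activation of the weighted sum of inputs plus the bias
    total = sum(i * w for i, w in zip(inputs, weights)) + bias
    return sigmoid(total)

print(round(sigmoid(0), 2))                             # 0.5
print(round(neuron([1.0, 2.0], [0.5, -0.25], 0.0), 2))  # 0.5
```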
Activity
Figure out the model that would best depict the above data in the
least number of steps.
6.5 SUMMARY
Predictive modelling is the method of creating, testing and validating a model to best predict the likelihood of an outcome.
Predictive analysis and models are characteristically used to pre-
dict future probabilities.
Predictive models are representations of the relationship between
how a member of a sample performs and some of the known char-
acteristics of the sample.
key words
6.6 DESCRIPTIVE QUESTIONS
1. Explain the concept of predictive modelling.
2. What are logic driven models? Discuss with appropriate
examples.
3. Describe the concept of data mining. Enlist its four stages.
4. Discuss the differences between classification and prediction.
5. Explain some approaches in data mining.
foresee the estimation of a numerical variable. Refer to Section
6.3 Introduction to Data Mining.
5. Some common approaches in data mining include data exploration and reduction, association, and cause-and-effect modelling. Refer to Section 6.4 Data Mining Methodologies.
6. Regression analysis is a tool for building statistical and mathematical models that define the relationship between a dependent variable (which should be a ratio variable, not categorical) and one or more explanatory or independent variables (ratio or categorical).
SUGGESTED READINGS
Bari, A., Chaouchi, M., & Jung, T. (2014). Predictive Analytics for Dummies. Hoboken, NJ: John Wiley & Sons, Inc.
Finlay, S. (2014). Predictive Analytics, Data Mining and Big Data:
Myths, Misconceptions and Methods. Basingstoke: Palgrave Mac-
millan.
Larose, D. T., & Larose, C. D. (2015). Data Mining and Predictive
Analytics. Wiley.
E-REFERENCES
Predictive analytics. (2017, May 09). Retrieved May 16, 2017, from https://en.wikipedia.org/wiki/Predictive_analytics
What is predictive analytics? - Definition from WhatIs.com. (n.d.). Retrieved May 16, 2017, from http://searchbusinessanalytics.techtarget.com/definition/predictive-analytics
Impact, I. P., & World, P. A. (n.d.). Predictive Analytics World. Retrieved May 16, 2017, from http://www.predictiveanalyticsworld.com/predictive_analytics.php
PRESCRIPTIVE ANALYTICS
CONTENTS
7.1 Introduction
7.2 Overview of Prescriptive Analytics
7.2.1 Prescriptive Analytics brings a lot of Input into the Mix
7.2.2 Prescriptive Analytics Comes of Age
7.2.3 How Prescriptive Analytics Functions
7.2.4 Commercial Operations and Viability
7.2.5 Research and Innovation
7.2.6 Business Development
Introductory Caselet
Many weeks later, Bill's cell phone received a notification while he was driving. Upon opening it, Bill was surprised to find an alert from the fast food vendor telling him that the area he was currently travelling through had a restaurant where he could use his coupon. Though initially surprised, Bill had always heard about the existence of this cutting-edge technology, but had not known that one day he might benefit from it. This technology is a
sheer example of what retailers can do in the future by combining the geo-location capability of the phone with other information they have acquired from their customers.
Bill was excited to be rewarded for sharing all of his data with the credit card company. Although a bit uncomfortable initially with the possibility of sharing all his details with the credit card company, Bill is now very pleased that the company is using innovative methods like prescriptive analytics to serve its customers better.
In this caselet, you can see how the credit card company uses prescriptive analytics to link customers with their requirements. After all the information about an individual is shared with the company, it can use mathematical modeling and statistical methods to find actionable insights, which in turn can be used to help the customer get better results.
learning objectives
7.1 INTRODUCTION
After studying the predictive and descriptive analytics steps of the business analytics process in the previous chapters, one should be in a good position to take the final step, i.e., prescriptive analytics. This analysis recommends what a business should do about the trends that descriptive and predictive analyses reveal.
By the end of this chapter, readers will understand how the other classes of analytics, predictive and descriptive, can lead to prescriptive analysis. This chapter first discusses the meaning of prescriptive analytics. Next, the chapter discusses prescriptive modeling. In the end, the chapter discusses non-linear optimisation.
7.2 OVERVIEW OF PRESCRIPTIVE ANALYTICS
Prescriptive analysis answers 'What should we do?' on the basis of complex data obtained from descriptive and predictive analyses. By using optimisation techniques, prescriptive analytics determines the best alternative to minimise or maximise some objective in finance, marketing and many other areas. For example, if we have to find the best way of shipping goods from a factory to a destination so as to minimise costs, we will use prescriptive analytics. Figure 7.1 shows a diagrammatic representation of the stages involved in prescriptive analytics:
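The shipping example above can be sketched as a tiny optimisation. The following is a minimal brute-force sketch in Python; the factory supplies, store demands and unit shipping costs are all invented for illustration.

```python
# A minimal brute-force sketch of the shipping example: two factories
# supply two stores, and we search for the cheapest feasible shipping
# plan. All supplies, demands and unit costs are invented for illustration.
supply = {"F1": 60, "F2": 40}
demand = {"S1": 50, "S2": 50}
cost = {("F1", "S1"): 4, ("F1", "S2"): 6,
        ("F2", "S1"): 5, ("F2", "S2"): 3}

best = None
# Because total supply equals total demand, fixing how much F1 sends to S1
# determines the whole plan; enumerate that single degree of freedom.
for f1_s1 in range(min(supply["F1"], demand["S1"]) + 1):
    plan = {("F1", "S1"): f1_s1,
            ("F1", "S2"): supply["F1"] - f1_s1,
            ("F2", "S1"): demand["S1"] - f1_s1,
            ("F2", "S2"): demand["S2"] - (supply["F1"] - f1_s1)}
    if any(qty < 0 for qty in plan.values()):
        continue  # negative shipment: infeasible split
    total = sum(qty * cost[route] for route, qty in plan.items())
    if best is None or total < best[0]:
        best = (total, plan)

print(best[0])  # minimum total shipping cost
```

For larger networks, the same problem is normally solved with linear programming rather than enumeration; the sketch only illustrates the idea of choosing the cost-minimising alternative.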
ysed successfully, it can become the answer to one of the most important questions: how can businesses acquire more customers and gain business insight? The key to this problem lies in being able to source, link, understand and analyse data.
Since a lot has been written on Big Data, we will focus on analytics, which will help companies transform the finance function by offering forward-looking insights, help them devise a solution appropriate for the optimal course of action, and improve their ability to communicate and collaborate with other companies at a lower cost of ownership.
profit healthcare strategic planning. By utilising data analytics, one can harness operational information, which includes demographic patterns, financial information and population health patterns, to plan more precisely and invest future capital in areas such as equipment usage and new facilities.
sion-making process. The approach analyses potential choices, the interactions among those choices and the influences on these decisions. The approach then uses this data to help chart the best activities or choices. This has become feasible in view of advancements in processing speed and the subsequent development of complex mathematical calculations applied to varied data sets (big data).
Predicting and proactively managing market events
Providing significant data for territory analysis, customer sales and medical data
A one-size-fits-all business model is no longer reasonable; the future of a competitive sales model is focused on customised messaging.
markets are demanding. Key areas for prescriptive analytics include:
Identifying and making choices about emerging areas of unmet need
Predicting the potential benefit
Proactively following industry trends and implementing techniques to gain an advantage
Exploiting data analytics to distinguish particular buyer populations and regions that ought to be focused on
Leveraging data analytics to distinguish key advancements for product improvement that will produce the biggest return on investment
Identifying likely purchasers to cut business development costs significantly; what-if scenarios for products, markets and purchasers could be a clear differentiator for growing organisations
7.2.7 CONSUMER EXCELLENCE
Prescriptive analytics can likewise provide supply chain functions with an upper hand through the capacity to predict and make decisions in a few basic areas, including:
Forecasting future demand and pricing (e.g., supplies, material, fuel and other components affecting cost, to guarantee proper supply)
Utilising prescriptive analytics to optimise stock levels, schedule plants, route trucks and other components in the supply chain cycle
Mitigating supplier risk by mining unstructured information alongside transactional information
Better understanding historical demand patterns and product
4. Prescriptive analytics can help organisations remain compliant by anticipating upcoming dangers and making the proper mitigation choices. (True/False)
Activity
7.3 INTRODUCTION TO PRESCRIPTIVE MODELING
Prescriptive analytics methods do not just concentrate on Why, How, When and What; they also prescribe the best course of action for taking advantage of the situation. Prescriptive analytics has time and again proved itself a benchmark for an organisation's analytics maturity. Segments of prescriptive analytics are:
a. Evaluate and choose better ways to approach work
b. Target business goals and conform to all restrictions
It avoids overlapping of phases.
This model works for small projects, as the prerequisites are understood extremely well.
This model is favoured for those projects where quality is more important than the cost of the venture.
The issues with this model are not uncovered until the product testing stage.
The amount of risk is quite high.
7.3.3 Rapid Application Development (RAD) Model
following questions:
What information is generated?
Who generated the information?
b. It avoids overlapping of each phase.
c. This model works for small projects.
d. It is a poor model for long projects.
7. RAD is a Rapid ________ Development model.
8. The waterfall model is also called the _______.
Activity
to be spent on a specific medium, and past that point any further spend may still increase income but at a diminishing rate. Hence, it is essential to discover the point of diminishing returns for each of the advertising mediums. Figure 7.2 demonstrates the income generated against the cost incurred for a TV commercial; both the cost and the income are stated in present-day dollar value, in thousands:
The curve that best fits the plotted revenue and cost of TV promotion is cubic and is plotted in Figure 7.2. The R-square achieved through the cubic equation is an astounding 98.7%. The first and second order derivatives of the cubic equation are computed as follows:
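The derivative computation can be illustrated with a short sketch. Note that the cubic coefficients below are invented for illustration, not the coefficients actually fitted in Figure 7.2.

```python
# Hypothetical cubic revenue model R(c) = a*c^3 + b*c^2 + d*c + e, where c
# is the advertising cost; the coefficients are made up for illustration.
a, b, d, e = -0.2, 6.0, 40.0, 100.0

def marginal_revenue(c):   # first derivative R'(c)
    return 3 * a * c**2 + 2 * b * c + d

def curvature(c):          # second derivative R''(c)
    return 6 * a * c + 2 * b

# Returns begin to diminish faster once the curvature turns negative,
# i.e. past the inflection point where R''(c) = 0, at c = -b / (3a).
inflection = -b / (3 * a)
print(inflection)
```

With these made-up coefficients, the second derivative changes sign at a spend of 10 (thousand dollars), which would be the point of diminishing returns for this medium.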
discover the best-fitting line by minimising the distance of the line from the data points.
This is accomplished by setting the first-order partial derivatives with respect to the intercept and the slope equal to zero. The second-order partial derivative is utilised in an optimisation problem to determine whether a given critical point is a relative maximum, a relative minimum, or a saddle point.
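This procedure can be sketched for simple linear regression: solving the two first-order conditions yields the familiar closed-form slope and intercept. The data points below are invented for illustration.

```python
# A minimal ordinary-least-squares sketch: the closed-form solution comes
# from setting the partial derivatives of the squared error (with respect
# to intercept and slope) equal to zero. Data points are made up.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 4.0, 6.2, 7.9, 10.1]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# slope = cov(x, y) / var(x); the intercept then follows from the means.
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
intercept = mean_y - slope * mean_x

print(round(slope, 3), round(intercept, 3))
```

For these points the fitted line is roughly y = 1.99x + 0.09, i.e. the line that minimises the sum of squared vertical distances to the data.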
Activity
Form some teams in your class, each having four students, and go to the nearest truck dealer. Use the non-linear optimisation method to calculate how to minimise the cost of transport as the dealer's trucks ship goods to a large network of markets or stores.
7.5 SUMMARY
By using optimisation techniques, prescriptive analytics determines the best alternative to minimise or maximise some objective in finance, marketing and many other areas.
Data, which is available in abundance, can be streamlined for
growth and expansion in technology as well as business.
In real life, prescriptive analytics can automatically and continu-
ously process new data to improve forecast accuracy and offer bet-
ter decision options.
Prescriptive analytics is an absolute necessity for any company that wants to execute key marketing strategies.
Corporate finance functions can immensely use prescriptive analytics to improve their capacity to make choices that help drive internal excellence and external strategy.
key words
Answers for Self Assessment Questions
SUGGESTED READINGS
Liebowitz, J. (2014). Business Analytics: An Introduction. Boca Raton: CRC Press.
Williams, S. (2016). Business Intelligence Strategy and Big Data Analytics: A General Management Perspective. Cambridge, MA: Morgan Kaufmann.
Bruce, P. C. (2015). Introductory Statistics and Analytics: A Resampling Perspective. Hoboken, NJ: Wiley.
E-REFERENCES
Chattopadhyay, T. (2016, August 17). Application of Derivatives to Nonlinear Programming for Prescriptive Analytics. Retrieved May 02, 2017, from https://www.blueoceanmi.com/blueblog/application-derivatives-nonlinear-programming-prescriptive-analytics/
Beginning Prescriptive Analytics with Optimization Modeling by
CONTENTS
8.1 Introduction
8.2 Social Media Analytics
Self Assessment Questions
Activity
8.3 Key Elements of Social Media
Self Assessment Questions
Activity
8.4 Overview of Text Mining
8.13 Answers and Hints
8.14 Suggested Readings & References
Introductory Caselet
The company took the help of Wipro to reduce the response time in resolving customer issues by 65% by tracking customer experience through social media analytics. These analytics, using sentiment analysis, uncover business insights into the client's marketing strategies and customer relationships. Sentiment analysis also helps in improving promotional activities and engaging customers' attention with improved services.
which can also handle data containing noise. The SMA solution also allows reports containing the data based on generated insights to be sent to the clients weekly or fortnightly.
expansion on the basis of customers' sentiments. Moreover, the SMA solution also helped in identifying key influencers on social media by performing Social Node Network analysis.
learning objectives
>> Explain how to perform mobile analytics
>> Describe the challenges of mobile analytics
8.1 Introduction
In a world where information is readily available via the Internet at the click of a button, organisations need to remain abreast of ongoing events and the latest happenings in order to gain a competitive edge in business markets. Apart from that, organisations also need to interact with their consumers more effectively in order to gain an insight into ongoing business trends and the market position of particular products. Social media provides an opportunity for business organisations and individuals to connect and interact with each other worldwide. With the evolution of social media as a tool to connect with
This chapter discusses the role of social media and the importance of conducting social media analytics in business organisations. These analyses help organisations to evaluate feedback from consumers and gauge their current and future position in the market. Further, you will learn about text mining and sentiment analysis. The chapter ends with a presentation on how to perform social media analytics and opinion mining on tweets.
by which public relations were developed. The new approach encourages active participation in the development and distribution of information by merging innovative technologies and sociology. Social media provides a collaborative environment which can be employed for:
Building relationships
Distributing content
Rating products and services
Engaging target audience
Apart from the listed ones, social media may include websites that showcase reviews and ratings, such as Yelp; forums and discussion boards, such as Yahoo!; and websites that host virtual social worlds where people can interact, such as SecondLife. Figure 8.1 depicts the forms of conversations possible via social media:
example of _____ site.
Activity
Search and prepare a report on Social Media Analytics Cycle.
Engage: The basic idea behind social media is to engage existing and prospective customers. The tools and routines of social media and the regular practice of listening, curation, and sharing help executives and sales personnel of an organisation in engaging more and more customers, stakeholders, prospective customers, journalists, and industry influencers. Tools such as Salesforce help to connect people from different categories. Apart from that, various mobile apps also help in expanding the realm of reaching more and more people.
various resources?
a. Collect b. Curate
c. Create d. Share
4. The Feedly and Hootsuite tools help in _______ information
and content over the social media.
Activity
The insight obtained from such reviews can help organisations to identify their key areas of improvement and enhance their performance. However, certain tools and methodologies are required to read, interpret, and analyse the large number of reviews received on a daily basis. This is accomplished by text mining.
It is pretty difficult for any database administrator, marketing professional, or researcher to explore and extract the desired information from the huge amount of data and information generated and exchanged online on a daily basis. The problem is multiplied manifold by the text-based social networking communications and documents exchanged during business operations. Although keyword searching
Text mining employs concepts obtained from various fields, ranging from linguistics and statistics to Information and Communication Technologies (ICT). Statistical pattern learning is applied to create
patterns from the extracted text, which are further examined to ob-
tain valuable information. The overall process of text mining compris-
es retrieval of information, lexical analysis, creation and recognition
of patterns, tagging, extraction of information, application of data
mining techniques, and predictive analytics. This can be summarised
as follows:
note
volves collection and identification of information from a set of textual material. The information can come from various sources, such as websites, databases, documents, or a content management system. The textual information is processed by parsers and other linguistic analysis tools to examine and recognise textual features, such as people, organisations, names of places, stock ticker symbols, and abbreviations.
[Figure: The text mining process — Collect Data → Parse → Apply Text Mining Algorithms → Repository → Optimise → View Results, with unrestricted exploratory freedom throughout]
Statistical analysis tools, such as R and word count, aid in the assess-
ment of the overall review. Further, positive and negative relationships
can be explored using various plotting techniques, such as scatter
plot. Apart from the listed application areas, text mining techniques
can be further applied for analysis of demographics, financial status,
and buying tendencies of customers.
tions need to know not only the key players in the industry but also the strengths and weaknesses of their competitors. Text mining provides factual data to organisations that can be applied for strategic decision making.
Community leveraging: Text mining facilitates the identification and extraction of the information embedded in community interaction. This information can be applied for amending marketing strategies.
Law enforcement: Text mining can be applied in the domain of
Text preprocessing: This involves the identification of all the unique words in a document. Non-informative words, such as the, and, or, and when, are filtered out from the document text before applying word stemming. Word stemming refers to the process of reducing inflected or derived words to their stem base. For example, words such as cat, cats, catlike, and catty will all be mapped to the same stem base 'cat'. The terms stemmer and stemming algorithm are used interchangeably in stemming programs. Affix stemmers trim both suffixes and prefixes, such as ed, ly, and ing, from a given word. Popular stemmers include the Brute Force algorithm and the Suffix Stripping algorithm.
Document representation: A document is basically represented in words and terms.
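The preprocessing steps described above, stop-word filtering followed by stemming, can be sketched as follows. The stop-word and suffix lists here are illustrative toys, not a real stemmer such as Porter's.

```python
# A minimal text-preprocessing sketch: stop-word filtering followed by a
# naive suffix-stripping stemmer. Real stemmers use far more elaborate
# rules; the word and suffix lists below are purely illustrative.
STOP_WORDS = {"the", "and", "or", "when", "a", "is"}
SUFFIXES = ("like", "ing", "ed", "ly", "s")  # longest first

def stem(word):
    # Strip the first matching suffix, keeping at least a 3-letter stem.
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    tokens = [t for t in text.lower().split() if t not in STOP_WORDS]
    return [stem(t) for t in tokens]

print(preprocess("The cats and the catlike kitten purred"))
```

Here "cats" and "catlike" both reduce to the stem "cat", exactly the mapping the text describes.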
Figure 8.3: Stages of Document Processing in Text Mining
SAS Text Miner: Provides a rich suite of text processing and anal-
ysis tools
Textalyzer: Used for online text analysis
Apart from these, some other text mining tools include AeroText, Angoss, Autonomy, Clarabridge, IBM LanguageWare, IBM SPSS, WordStat, and Lexalytics.
8.4.2 Sentiment Analysis
Sentiment analysis is one of the most important components of text mining. Also termed opinion mining, it involves careful analysis of people's opinions, sentiments, attitudes, appraisals, and evaluations. This is accomplished by examining large amounts of unstructured data obtained from the Internet and classifying it by the positive, negative, or neutral view of the end user. Sentiment analysis involves the analysis of the following types of sentences:
Facts: Product A is better than product B.
Opinions: I don’t like A. I think B is better in terms of durability.
Similar to Web analysis, specific queries are applied in sentiment
analysis to retrieve and rank relevant content. However, sentiment
analysis also differs from Web analysis in certain factors. It is possi-
perspective about the noun 'room'. At this stage, the emotional factor in the phrase is also examined and analysed. After that, an average sentiment orientation of all the phrases is computed and analysed to conclude whether a product is recommended by a user.
The following parameters may be applied to classify the given text in the process of sentiment analysis:
Polarity, which can be positive, negative, or neutral
Emotional states, which can be sad, angry, or happy
Subjectivity or objectivity
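A toy polarity classifier along these lines simply counts positive and negative words and labels the text by the sign of the difference. The word lists below are invented for illustration.

```python
# A toy polarity scorer: count positive and negative words and classify
# the text by the sign of the difference. Word lists are illustrative only;
# real lexicons contain thousands of entries.
POSITIVE = {"good", "great", "durable", "better", "love"}
NEGATIVE = {"bad", "poor", "broken", "worse", "hate"}

def polarity(text):
    words = text.lower().replace(".", "").split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(polarity("I think B is better in terms of durability"))
```

Applied to the opinion sentence quoted earlier, the scorer returns "positive"; a sentence containing no lexicon words falls through to "neutral".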
Apart from these, some other sentiment analysis tools include Social Mention, AlertRank Sentiment Analysis, and Twitter Sentiment Analysis. Business organisations can apply specific tools as per their requirements and sentiment analysis needs.
(True/False)
6. Text mining tools are often based on the principles of _____
and ______ processes.
Activity
In this section, you will practice deriving useful information from the
data obtained from social networking sites.
and analysing text data using the R tool. A mobile phone manufacturing company hires a data analyst to review the opinions given by people on its products. This information will help the company to know about current market trends and further enhance the quality of its products based on the insights. The data analyst decides to collect data from people's tweets and then examine it under three categories: positive, negative, and neutral. Here, you are going to help him download tweets and analyse them to derive valuable information.
Before performing the social media analytics, you need to load some library utilities into the current R environment and verify the Twitter authentication information to work with the tweets.
install.packages("twitteR")
install.packages("bitops")
install.packages("digest")
install.packages("RCurl")
# If there is any error while installing RCurl, run the
# following command in a terminal:
# sudo apt-get install libcurl4-openssl-dev
install.packages("ROAuth")
install.packages("tm")
install.packages("stringr")
install.packages("plyr")
library(twitteR)
library(ROAuth)
library(RCurl)
library(plyr)
library(stringr)
library(tm)
If you are working on the Windows operating system, you may face Secure Sockets Layer (SSL) certificate issues.
After loading the required R utilities and providing the SSL certificate
authentication information, load the Twitter authentication informa-
tion. This information will be used to download tweets later.
load("/Datasets/twitter_cred.RData")
registerTwitterOAuth(cred)
Figure 8.4 shows the use of Twitter credentials for Twitter authenti-
cation in R:
Figure 8.4: Using Twitter Credentials for Twitter Authentication in R
input_tweets=searchTwitter("nokia", n=1000,lang="en")
input_tweets[1:3]
Some tweets containing the search word may be insignificant for our analysis. Therefore, we need to extract only the tweets with relevant text. Enter the following command to extract a specific set of words as a text string:
tweet=sapply(input_tweets, function(x) x$getText())
The strings can be viewed as vectors by entering the following command:
tweet[1:4]
The next task is to segregate tweets on the nature of the feedback they
provide. The feedback would be positive, negative, or neutral. In our
case, we are using only positive and negative words. The function for
sentiment analysis is given as follows:
score.sentiment = function(sentences, pos.words, neg.words, .progress='none')
{ scores = laply(sentences, function(sentence, pos.words, neg.words)
{ sentence = gsub("[[:punct:]]", "", sentence)
sentence = gsub("[[:cntrl:]]", "", sentence)
After writing the preceding function, the files containing positive and
negative words are loaded to run the sentiment function. Enter the
following commands to load the data file containing positive and neg-
ative words, respectively:
pos=readLines("/Datasets/positive-words.txt") # find file positive-words.txt
neg=readLines("/Datasets/negative-words.txt") # find file negative-words.txt
scores$very.neg = as.numeric(scores$score < 0)
scores$very.neu = as.numeric(scores$score == 0)
Enter the following commands to find out the number of positive, neg-
ative, and neutral tweets:
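The counting logic itself is simple; the following sketch shows it in Python with an invented list of sentiment scores (the chapter's own workflow does this in R on the scores data frame).

```python
# Toy sentiment scores, one per tweet (invented for illustration);
# categorise each score by its sign and count the categories.
scores = [2, 0, -1, 3, 0, -2, 1]

positive = sum(1 for s in scores if s > 0)
negative = sum(1 for s in scores if s < 0)
neutral = sum(1 for s in scores if s == 0)

print(positive, negative, neutral)
```

A score above zero means the tweet contained more positive than negative lexicon words, below zero the reverse, and exactly zero is treated as neutral.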
After the sentiments are categorised and the number of positive, neg-
ative, and neutral tweets is found out, plot the results by using the
following command:
The Pie chart for the analysed sentiment score is shown in Figure 8.7:
Figure 8.7: Pie Chart for the Sentiment Score
Activity
Figure 8.8: Showing the Social Mention Website
2. Type the name of the product about which you need to gather information in the Search box and press the Search button, as shown in Figure 8.9:
A Web page appears with the sentiment score for the product
(Sony, in our case) being displayed on the left-hand side, as
shown in Figure 8.10:
Figure 8.11: Showing the Sentiment Score for the Product and
Related Information
You can get more information on the feedback of the product by
Figure 8.12 shows the Web page of the Sentiment140 text analysis
tool:
Figure 8.13: Showing the Online Analysis for Toshiba Products Using
the Sentiment140 Tool
self assessment Questions
Activity
Search for and find some tools that are used by organisations to analyse their social media competitors.
Figure 8.14: Evolution of Mobile Technologies
Source: 3GPP Alliance, UMTS forums, Informa telecoms, Motorola, ZTI.
note
The full forms of the terms used in Figure 8.14 are as follows:
GSM: Global System for Mobile Communications
“Forget what we have taken for granted on how consumers use the Internet,” said Karsten Weide, research vice president, Media and Entertainment. “Soon, more users will access the Web using mobile devices than using PCs, and it’s going to make the Internet a very different place.”
service. Users are identified by unique device IDs. The growth and popularity of a service greatly depend on the number of new users it is able to attract.
Active users: These are users who use mobile services at least once in a specified period. If the period is one day, for example, an active user will have used the service during that day. The number of active users in any specific period of time shows the popularity of a service during that period.
Percentage of new users: This is the percentage of new users over the total active users of a mobile service. This figure is always less than 100%, but a very low value means that the particular service or app is not doing very well.
Sessions: When a user opens an app, it is counted as one session. In other words, the session starts with the launching of the app and finishes with the app's termination. Note that a session is not related to how long the app has been used.
Average usage duration: This is the average duration for which a mobile user uses the service.
Accumulated users: This refers to the total number of users (old as well as new) who have used an app before a specific time.
Bounce rate: The bounce rate is calculated as a percentage (%). It can be calculated as follows:
Bounce rate = (number of sessions terminated on a specific page of an app / total number of sessions of the app) × 100
The bounce rate can be used by service providers to help them monitor and improve their service so that customers remain satisfied and do not leave the service.
User retention: After a certain period of time, the total number
of new users still using any app is known as the user retention of
that app.
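The bounce-rate formula above can be sketched over a toy session log; the user IDs and page names are invented for illustration.

```python
# Hypothetical session log: (user_id, exit_page) pairs for one app, where
# exit_page is the page on which each session terminated.
sessions = [
    ("u1", "home"), ("u1", "checkout"),
    ("u2", "home"), ("u3", "product"),
    ("u4", "home"), ("u5", "checkout"),
]

def bounce_rate(page):
    # Sessions that terminated on this page, over all sessions, times 100.
    ended_here = sum(1 for _, exit_page in sessions if exit_page == page)
    return ended_here / len(sessions) * 100

print(bounce_rate("home"))
```

With this invented log, three of the six sessions ended on the "home" page, giving it a bounce rate of 50%.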
likes and dislikes, mobile analytics offers different products and services to them. The purpose of this exercise is to convert a visitor into a buyer.
Analyse m-commerce activities of visitors: Mobile analytics can analyse the m-commerce activities of visitors and find out a lot of useful information, such as a user's frequency of making a purchase and the amount he is willing to spend. Mobile commerce (or m-commerce) refers to the delivery of electronic commerce capabilities directly into the consumer's hand, anywhere, via wireless technology.
Track Web links that users visit on their mobile phones: Mobile analytics can be used to analyse the Web links users visit and learn their preferences.
Mobile analytics has several similarities with Web and social analytics,
such as both can analyse the behavior of the user with regard to an
application and send this information to the service provider. Howev-
er, there are also several important differences between Web analytics
and mobile analytics.
Some of the main differences between Web analytics and mobile ana-
lytics are as follows:
Analytics segmentation: Mobile analytics works on the basis of
location of the mobile devices. For example, suppose a company is
offering cab service in a city like New York. In this case, the compa-
ny can use mobile analytics to identify the target people travelling
in New York. Mobile analytics works for location-based segments,
while Web analytics works globally.
8.7.3 Types of Results from Mobile Analytics
ers. It also offers deep insight into what makes people buy a product or service and what makes them quit a service.
Mobile analytics can easily and effectively collect data from various data sources and manipulate it into useful information. Mobile analytics keeps track of the following information:
Total time spent: This information shows the total time spent by the user with an application.
Visitors' location: This information shows the location of the user using any particular application.
Number of total visitors: This is the total number of users using any particular application, useful in knowing the application's popularity.
Click paths of the visitors: Mobile analytics keeps track of the activities of a user visiting the pages of any application.
content fits a particular device screen is done through mobile analytics.
Performance of advertising campaigns: Mobile analytics is used to keep track of the performance of advertising campaigns and other activities by analysing the number of visitors and the time spent by them, as well as through other methods.
There are two types of applications made for mobile analytics. They
are:
Mobile Web analytics
Mobile application analytics
Mobile Web refers to the use of mobile phones or other devices, such as tablets, to view online content via a lightweight browser. The name of a mobile-specific site can take the form m.example.com. The mobile Web sometimes depends on the size of the screen of the device. For example, if you design an application for a small screen, its images would appear blurred on a big screen; similarly, if you make your site for the big screen, it can be heavy for a small-screen device. Some organisations are starting to build sites specifically for tablets because they have found that neither their mobile-specific site nor their main website ideally serves the tablet segment. To solve this problem, mobile Web should have a responsive design. In other words, it should
have the property to adapt the content to the screen size of the user’s
device.
Figure 8.15 shows the difference between a website, a mobile site, and
a responsive-design site:
Figure 8.15: Difference among a Website, Mobile Site, and Responsive-design Site
In Figure 8.15, you can see that a website can be opened on both computers and mobile phones, while a mobile site can be opened only
The term mobile app is short for mobile application software. It is an application program designed to run on smartphones and other mobile devices.
Table 8.1 lists the main differences between mobile app analytics and
mobile Web analytics:
Session time: Mobile app analytics uses shorter session timeouts (around 30 seconds), while mobile Web analytics uses longer session timeouts; in general, a session will end after 30 minutes of inactivity for websites.
Online/Offline: Depending on how it was developed, mobile app analytics may not require a connection to a mobile network, whereas mobile Web analytics requires an Internet connection and can run online only.
Updates: App owners provide frequent updates; mobile Web updates are not frequent.
Exhibit
Activity
The selection of analytics tools is not an easy process because these tools are new and undergo rapid enhancements compared with traditional Web analytics tools. Companies frequently upgrade their existing analytical tools as well as launch new tools with new features. Mobile analytics tools have some technical limitations; not all mobile analytics tools perform all the services, so you must find out which tool would be beneficial for you. The following are some points to be considered while selecting mobile analytics tools:
Now, the question is, ‘how is the information presented to you?’ You
must choose an analytical tool that best suits your requirements.
lows:
1. Localytics: This is a big marketing and analytics platform for mobile and Web apps. Its developer is Localytics, based in Boston. It supports cross-platform and Web-based applications. For more details, you can check out their website at www.localytics.com. Localytics supports push messaging, business analytics, and acquisition campaign management. Localytics has a list of big customers, such as Microsoft, New York Times, ESPN, Soundcloud, and eBay.
Placed: Placed provides a 'ratings service' by measuring various types of information, such as the place visited and the duration of the visit. This is an efficient tool for explaining offline consumer behavior.
8.8.2 Real-Time Analytics Tools
“The 6.8 billion subscribers are approaching the 7.1 billion world popu-
lation” (ITU). This is illustrated in Figure 8.16.
Figure 8.16 shows the growth of mobile phone users (in billions) with
respect to years:
With the increase in the popularity of mobile phones and mobile Web,
business organisations want to know more about the behavior of the
user. Real-time analytics tools refer to software tools that analyse and
report data in real time.
Figure 8.17: Geckoboard Application
Mixpanel: Mixpanel is a Web-based, cross-platform business analytics service that tracks user interactions with mobile Web applications. It offers services to users on the basis of user behavior, and it performs all its activities in real time. Suhail Doshi and Tim Trefren founded Mixpanel in 2009 in San Francisco, California. Figure 8.18 shows the Mixpanel application:
A user behavior tracking tool is a software tool that tracks user behavior within a particular mobile application.
The following are some popular behavior-tracking tools:
TestFlight: Let's suppose a company has many testers, spread over different countries (or locations), and a new app that it wants to test. How is it going to test the app? The solution is TestFlight. TestFlight is a free software platform. Using TestFlight, a team of developers can distribute beta and internal iOS applications and manage the testing and feedback using the TestFlight dashboard. The TestFlight SDK has a wide range of useful APIs to test applications from various dimensions. Figure 8.19 shows the TestFlight application:
Figure 8.20: Mobile App Tracking Application
Activity
8.9.1 Data Collection through Mobile Device
Data for analysis is collected on mobile devices and sent back to the server for further manipulation. This collection process may be done online as well as offline: some data collection processes require an Internet connection to send the collected data to the server, whereas several applications collect data into spreadsheets and do not require an Internet connection. Collected data can be stored in various formats.
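The online/offline collection described above can be sketched as a small buffer-and-flush loop. This is an illustrative sketch only; the `Collector` class and the `send_to_server` stand-in are hypothetical, not part of any real mobile SDK.

```python
# Sketch of offline-capable collection: records are appended to a local
# buffer and flushed to the server only when a connection is available.
# `send_to_server` is a stand-in for a real network call.

class Collector:
    def __init__(self, send_to_server):
        self.buffer = []                 # local storage while offline
        self.send_to_server = send_to_server

    def record(self, event):
        self.buffer.append(event)        # collection works offline

    def flush(self, online):
        """Upload buffered events when an Internet connection exists."""
        if not online:
            return 0                     # keep data locally until online
        sent = len(self.buffer)
        for event in self.buffer:
            self.send_to_server(event)
        self.buffer.clear()
        return sent

uploaded = []
c = Collector(uploaded.append)
c.record({"screen": "home"})
c.record({"screen": "cart"})
c.flush(online=False)   # nothing sent; data stays on the device
c.flush(online=True)    # both events reach the server
```

While offline, `flush` leaves the buffer intact; once online, every buffered event reaches the server and the local buffer is emptied.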
Figure 8.21: GUI of Numbers
HanDBase is a relational database management system. It was initially designed to run on Palm PDAs but can run on almost any handheld platform. HanDBase is not as full-featured as Oracle, Sybase, or DB2, but it still has various features that make it important for computing: it is simple, supports multiple handheld platforms, and provides high security. The company offers a few apps to download free from its website.
8.9.2 Data Collection on Server
As we have studied in the previous section, data collected by a mobile device is ultimately transferred to a server for analysis. The server stores the received data, performs analysis on it, and creates reports. The following are some popular applications that collect data on the server:
DataWinners is a data collection service designed for experts.
Figure 8.25: Home Page of COMMANDmobile
Till now, you have studied various fundamental concepts related to mobile analytics. Now, let's do a practical activity with mobile analytics on a mobile device. You must have an Android mobile phone and an Internet connection to do this practical work.
Perform the following steps to download and install the Graph Trial
app, and then create a graph to present the results of data analysis:
1. Open the Google Play Store by tapping the Play Store icon on the screen of any Android phone or tablet. A window appears, showing the contents of the Play Store.
2. Type Graph Trial in the search box, and tap the button to start
the search operation, as shown in Figure 8.26:
Figure 8.26: Showing the Google Play Store Window
A window appears, containing the list of available apps for the
particular search item.
3. Select the first app, named Graph trial, from the window by
tapping it, as shown in Figure 8.27:
Figure 8.27: Selecting the Graph Trial App from the List
The next window appears, asking for permission to install the
app.
A new window appears with the INSTALL button.
5. Tap the INSTALL button to install the app, as shown in Figure 8.29:
the Installed App Icon
6. Tap the Graph trial icon to get to the home screen, as shown in
Figure 8.32:
8. Select the type of graph you want to create by tapping on its icon.
In our case, we have selected simple graph. The Create simple
graph window appears.
9. Select the simple type of graph from the Graph type tab. In our
case, we have selected the Bar graph, as shown in Figure 8.34:
Figure 8.34: Showing the Create Simple Graph Window
10. Input the details in the Y axis title, Min, and Max fields, as shown in Figure 8.35:
12. Tap the Save button to get the Barchart of list items icon, as
shown in Figure 8.37:
Figure 8.37: Showing the Window with the Barchart of list items Icon
13. Long tap the Barchart of list items icon to see the graph, as
shown in Figure 8.38:
14. Select the Pie tab by tapping it to get a pie chart, as shown in
Figure 8.39:
Figure 8.39: Showing a Pie Chart for the Given Dataset
15. Select the Line tab by tapping it to get a line chart, as shown in Figure 8.40:
18. Select the particular file name to get a graph for the dataset.
19. Select the size of the graph by tapping an option from the Select image size window, as shown in Figure 8.43:
Figure 8.43: Selecting the Graph Size
20. Tap the OK button to save the graph in the form of an image.
21. Tap the Share option to open the Share graph image window, as
shown in Figure 8.45:
Figure 8.45: Showing the Share graph image Window
22. Tap the option through which you want to share the image.
23. Enter the details in the required fields, as shown in Figure 8.47:
Figure 8.47: Entering Details
24. Tap the Share button and exit the application by pressing
the OK button on the pop-up box that appears, as shown in
Figure 8.48:
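The bar graph built in the activity above can also be produced programmatically. As a rough analogue, the following sketch renders a text-mode bar chart; the `bar_chart` function and the dataset are hypothetical and are not part of the Graph Trial app.

```python
# Minimal text renderer: scale each value in a {label: value} mapping to
# a row of '#' characters, so the largest value fills the full width.

def bar_chart(data, width=20):
    """Render {label: value} as text bars scaled to `width` characters."""
    top = max(data.values())
    lines = []
    for label, value in data.items():
        bar = "#" * round(width * value / top)
        lines.append(f"{label:>8} | {bar} {value}")
    return "\n".join(lines)

sales = {"Q1": 10, "Q2": 25, "Q3": 15}
print(bar_chart(sales))
```

Each row's bar length is proportional to its value, which is the same idea a graphical bar chart (or the app's Bar graph option) applies with drawn rectangles.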
Exhibit
Premier Inn launched a mobile app in January 2011 to make bookings online. Through this mobile app, Premier Inn was able to generate revenues of over £1m in just three months of launching the app. Since then, the app has achieved more than two million downloads, and around 77% of the total bookings are made through the mobile app.
How was the hotel able to achieve such big revenues? The magic behind the success of the Premier Inn mobile app was mobile data analytics provided by Grapple, a mobile-innovation agency, which collected data from the 300 branded applications of its clients.
Activity
such as the location of user, etc.
Redirect: Some mobile devices do not support redirects. The term 'redirect' describes the process in which the system automatically opens another page.
Special characters in the URL: Some mobile devices do not support certain special characters in the URL.
Interrupted connections: The mobile connection with the tower is not always dedicated. It can be interrupted when the user is moving from one tower to another, and such interruptions can affect data collection.
The following are some challenges of mobile analytics marketing:
Limited understanding of the network operators: Network operators are unable to understand the business processes happening outside the carrier's firewall.
True real-time analysis: True real-time data analysis is not always possible with mobile analytics due to various reasons, such as signal interruption, variation in the technology used in mobiles, random change in subscriber ID, etc.
Security issues: Mobile technology has various important features, but some of these features, such as GPS, cookies, Wi-Fi, and beacons, can disclose important information about the user. Information such as credit card details, bank accounts, medical history, or other personal content can be easily misused. Some techniques, such as Deep Packet Inspection (DPI), Deep Packet Capture (DPC), and application logs, can increase security threats.
Activity
Determine the ways to overcome the challenges in the field of mo-
bile marketing and mobile advertising.
8.11 SUMMARY
Social media refers to a computer-mediated, interactive, and Internet-based platform that allows people to create, distribute, and share a wide range of content and information, such as text and images.
key words
change of information and data in various formats, such as text,
videos, and photos.
Text mining tools: The tools used to identify themes, patterns, and insights hidden in structured as well as unstructured data.
example.
2. Enlist and explain the key elements of social media analytics.
3. What do you understand by text mining? Discuss the key steps
for any text mining process.
14. Real-time dashboard
Performing mobile analytics 15. Server
16. True
17. d. Temporary Mobile Subscriber Identity
Challenges of mobile analytics 18. Redirect
SUGGESTED READINGS
Ganis, M., & Kohirkar, A. (2016). Social media analytics: techniques and insights for extracting business value out of social media. New York: IBM Press.
Rowles, D. (2017). Mobile marketing: how mobile technology is revolutionizing marketing, communications and advertising. London: Kogan Page.
E-REFERENCES
Top 25 social media analytics tools for marketers - Keyhole. (2017, March 09). Retrieved April 28, 2017, from http://keyhole.co/blog/list-of-the-top-25-social-media-analytics-tools/
Social media analytics. (2017, April 13). Retrieved April 28, 2017,
from https://en.wikipedia.org/wiki/Social_media_analytics
What is social media analytics? - Definition from WhatIs.com. (n.d.). Retrieved April 28, 2017, from http://searchbusinessanalytics.techtarget.com/definition/social-media-analytics
Mobile Analytics Key Benefits | Mobile Marketing. (n.d.). Retrieved April 28, 2017, from https://www.webtrends.com/products-solutions/digital-analytics/mobile-analytics-use-cases/
Data Visualisation
CONTENTS
9.1 Introduction
9.2 What is Visualisation?
9.2.1 Ways of Representing Visual Data
9.2.2 Techniques Used for Visual Data Representation
9.2.3 Types of Data Visualisation
9.2.4 Applications of Data Visualisation
Self Assessment Questions
Activity
Activity
9.4 Tools Used in Data Visualisation
9.4.1 Open-Source Data Visualisation Tools
9.4.2 Analytical Techniques Used in Big Data Visualisation
Self Assessment Questions
Activity
9.5 Summary
9.6 Descriptive Questions
9.7 Answers and Hints
9.8 Suggested Readings & References
Introductory Caselet
The company had implemented Enterprise Resource Planning (ERP) using a variety of different data architectures and source systems for data-driven decision making. Main business stakeholders were making their decisions on the basis of manually assembled reports, which lacked measurable consistency, data reliability, and metric transparency.
As a result, the company realised that it required a way to visualise key performance areas across the organisation. It needed real-time dashboards with a consistent user interface across Sales, Finance, and Operations. Knowledgent provided an Enterprise Data Warehouse and data
learning objectives
9.1 INTRODUCTION
In the previous chapter, you learned about prescriptive analytics, the final phase of Business Analytics, which uses fundamentals of mathematical and computational sciences to provide different decision options that take advantage of the results of descriptive and predictive analytics.
This chapter explains the concept of visualisation and the need to visualise data in Big Data analytics. You also learn about different types of data visualisations. Next, you learn about various types of tools using which data or information can be presented in a visual format.
The data is first analysed, and then the result of that analysis is visualised. There are two ways to visualise data: infographics and data visualisation. Infographics are visual representations of information or data. The use of colorful graphics in drawing charts and graphs helps in making the information easier to grasp.
Data can be classified on the basis of the following three criteria, irrespective of whether it is presented as data visualisation or infographics:
Method of creation: It refers to the type of content used while creating any graphical representation.
Quantity of data displayed: It refers to the amount of data that is represented.
Degree of creativity applied: It refers to the extent to which the data is created graphically, and whether it is designed in a colorful way or in black-and-white diagrams.
field description of the data flow. Figure 9.5 shows a set of streamlines:
Figure 9.7: Parallel Coordinate Plot
Venn Diagram: It is used to represent logical relations between finite collections of sets. Figure 9.8 shows a Venn diagram illustrating A∩B, A∪B, and A–B for sets A and B:
the hyperbolic geometry. Figure 9.11 shows a hyperbolic tree:
9.2.3 TYPES OF DATA VISUALISATION
You already know that data can be visualised in many ways, such as in the form of 1D, 2D, or 3D structures. Table 9.1 briefly describes the different types of data visualisation:
Table 9.1: Data Visualisation Types
1D/Linear: A list of items organised in a predefined manner. Tools: generally, no tool is used for 1D visualisation.
3D/Volumetric: Surface rendering, volume rendering, and computer simulations.
Temporal: Timeline, time series, Gantt chart, sankey diagram, alluvial diagram, and connected scatter plot. Tools: TimeFlow, Timeline JS, Excel, Timeplot, TimeSearcher, Google Charts, Tableau Public, and Google Fusion Tables.
Multidimensional: Pie chart, histogram, tag cloud, bubble cloud, bar chart, scatter plot, heat map, etc. Tools: Many Eyes, Google Charts, Tableau Public, and Google Fusion Tables.
Tree/Hierarchical: Dendrogram, radial tree, hyperbolic tree, and wedge stack graph. Tools: d3, Google Charts, and Network Workbench/Sci2.
Network: Matrix, node-link diagram, hive plot, and tube map. Tools: Pajek, Gephi, NodeXL, VOSviewer, UCINET, GUESS, Network Workbench/Sci2, sigma.js, d3/Protovis, Many Eyes, and Google Fusion Tables.
3D (Volumetric) data visualisation: In this method, data presentation involves exactly three dimensions to show simulations, surface and volume rendering, etc. Generally, it is used in scientific studies. Today, many organisations use 3D computer modelling and volume rendering in advertisements to give users a better feel of their products. To create 3D visualisations, we use visualisation tools such as AC3D, AutoQ3D, and TrueSpace.
Temporal data visualisation: Sometimes, visualisations are time dependent. To visualise the dependence of analyses on time, temporal representations such as timelines and time series are used.
communication, and automobile industries extensively use 3D advertisements to provide a better look and feel to their products.
Science: Every field of science, including fluid dynamics, astrophysics, and medicine, uses visual representation of information. Isosurfaces and direct volume rendering are typically used to explain scientific concepts.
Systems visualisation: Systems visualisation is a relatively new concept that integrates visual techniques to better describe complex systems.
Activity
Visual analysis of data is not a new thing. For years, statisticians and analysts have been using visualisation tools and techniques to interpret and present the outcomes of their analyses.
The most common notation used for Big Data is the 3Vs: volume, velocity, and variety. But the most exciting feature is the way in which value is filtered from the haystack of data. Big Data generated through social media sites is a valuable source of information to understand consumer sentiments and demographics. Almost every company nowadays is working with Big Data and facing the following challenges:
Most data is in unstructured form
Data is not analysed in real time
The amount of data generated is huge
There is a lack of efficient tools and techniques
These challenges have driven research and development of robust algorithms, software, and tools to analyse the data that is scattered across the Internet. Tools such as Hadoop provide state-of-the-art technology to store and process Big Data. Analytical tools are now able to produce interpretations on smartphones and tablets. This is possible because of advanced visual analytics, which enables business owners and researchers to explore data to find trends and patterns.
The most exciting part of any analytical study is to find useful information from a plethora of data. Visualisation facilitates the identification of patterns in the form of graphs or charts, which in turn helps to derive useful information. Data reduction and abstraction are generally used for this purpose.
Visual data mining works on the same principle as simple data mining; however, it involves the integration of information visualisation and human-computer interaction. Visualisation of data can produce cluttered images, which are filtered with the help of clutter-reduction techniques. Uniform sampling and dimension reduction are two commonly used clutter-reduction techniques.
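The two clutter-reduction techniques just named can be sketched as follows. This is an illustrative sketch on a made-up dataset; the axis-dropping projection is a deliberately simple stand-in for real dimension-reduction methods such as PCA.

```python
# Two toy clutter-reduction steps for a scatter plot:
#   - uniform sampling: keep every k-th point to thin the plot
#   - dimension reduction: project 3-D points to 2-D by discarding the
#     axis with the lowest variance (a crude stand-in for PCA)

def uniform_sample(points, k):
    """Keep every k-th point, reducing clutter by a factor of k."""
    return points[::k]

def drop_least_varying_axis(points):
    """Project n-D points to (n-1)-D by removing the lowest-variance axis."""
    dims = len(points[0])

    def variance(axis):
        vals = [p[axis] for p in points]
        mean = sum(vals) / len(vals)
        return sum((v - mean) ** 2 for v in vals) / len(vals)

    drop = min(range(dims), key=variance)
    return [tuple(v for i, v in enumerate(p) if i != drop) for p in points]

points = [(x, x * 2, 5) for x in range(100)]   # third axis is constant
thinned = uniform_sample(points, 10)            # 100 points -> 10 points
reduced = drop_least_varying_axis(points)       # 3-D -> 2-D
```

Here the constant third axis carries no information, so dropping it declutters the plot without losing the visible pattern; sampling every tenth point thins it further.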
smartphones and tablets. (True/False)
10. Big Data generated through _________ sites is a valuable
source of information to understand consumer sentiments
and demographics.
11. Which of the following is/are the challenges with Big Data?
a. Most data is in unstructured form.
b. Data is not analysed in real time.
c. The amount of data generated is huge.
d. All of these
12. _______ and _________ are two commonly used clutter-
reduction techniques.
Activity
Figure 9.16: Digg Arc
Larger stories have more diggs, as shown in Figure 9.16: the arc becomes thicker with the number of times users digg the story.
Google Charts API: This tool allows a user to create dynamic charts to be embedded in a Web page. A chart is obtained from the data and formatting parameters supplied in a HyperText Transfer Protocol (HTTP) request.
Figure 9.18: TwittEarth
Source: http://cybergyaan.com/2010/01/10-supercool-ways-to-visualise-internet.html
of your choice and it will find the picture. The central (core) star
contains all the images directly relating to the initial tag and the
revolving planets consist of similar or corresponding tags. Click
on a planet and additional sub-categories will appear. Click on the
central star and Flickr images gather and land on a gigantic 3D
Figure 9.20: Some Visuals Obtained from D3
Source: http://d3js.org/
Open-source tools are easy to use, consistent, and reusable. They deliver high-quality performance and are compliant with Web as well as mobile Web security. In addition, they provide multichannel analytics for modelling as well as customised business solutions that can be altered with changing business demands.
Activity
Collect information about the pivot table used in Excel for repre-
senting data.
9.5 SUMMARY
Visualisation is a pictorial or visual representation technique. Anything which is represented in pictorial or graphical form, with the help of diagrams, charts, pictures, flowcharts, etc., is known as visualisation.
Data presented in the form of graphics can be analysed better than data presented in words.
key words
9.6 DESCRIPTIVE QUESTIONS
1. What do you understand by data visualisation? List the different
ways of data visualisation.
2. Describe the different techniques used for visual data
representation.
3. Discuss the types and applications of data visualisation.
16. True
17. Flickr
HINTS FOR DESCRIPTIVE QUESTIONS
1. Visualisation is a pictorial or visual representation technique.
Anything which is represented in pictorial or graphical form, with
the help of diagrams, charts, pictures, flowcharts, etc. is known
as visualisation. Refer to Section 9.2 What is Visualisation?
SUGGESTED READINGS
Kirk, A. (2016). Data visualisation: a handbook for data driven design. Los Angeles: Sage Publications.
Evergreen, S. (2017). Effective data visualization: the right chart for the right data. Los Angeles: Sage.
Kirk, A. (2012). Data visualization: a successful design process. S.l.: Packt Publishing.
E-REFERENCES
Data visualization. (2017, April 26). Retrieved May 02, 2017, from https://en.wikipedia.org/wiki/Data_visualization
Suda, B., & Hampton-Smith, S. (2017, February 07). The 38 best tools for data visualization. Retrieved May 02, 2017, from http://www.creativebloq.com/design-tools/data-visualization-712402
50 Great Examples of Data Visualization. (2009, June 01). Retrieved May 02, 2017, from https://www.webdesignerdepot.com/2009/06/50-great-examples-of-data-visualization/
CONTENTS
10.1 Introduction
10.2 Financial and Fraud Analytics
Self Assessment Questions
Activity
10.3 HR Analytics
Self Assessment Questions
Activity
10.4 Marketing Analytics
Activity
10.6 Supply Chain Analytics
Self Assessment Questions
Activity
10.7 Web Analytics
Self Assessment Questions
Activity
10.8 Sports Analytics
Self Assessment Questions
Activity
10.9 Analytics for Government and NGO’s
Self Assessment Questions
Activity
10.10 Summary
10.11 Descriptive Questions
10.12 Answers and Hints
10.13 Suggested Readings & References
Introductory Caselet
in comparison to amateur sports. Miami baseball wanted to be updated and competitive in this sport with the use of analytics. The team wanted to use analytics for analysing pitching, including the type, speed, and location of each pitch.
After searching the various tools available in the market for sports analytics, the team decided to use Vizion360 Impact Analytics. This analytics solution uses the visualisation tool Microsoft Power BI at the front end. The tool provides deep insight into the collected data, with a detailed summary of the performance of the team and of each player at the individual level. The summary also includes statistics related to pitching and batting. After studying this data, the team worked on improving its performance in intrinsic situations of the game.
learning objectives
10.1 INTRODUCTION
Business analytics has emerged as a growth driver for most new-era organisations. Gone are the days when managers used to make decisions on the basis of their own gut feeling or on large-scale financial indicators and their likely effect on individual organisations. Decisions made without data and information have turned out to be unfortunate for many organisations. With the advent of information technology and the increased data handling ability of computers, managers are utilising numerous methods to anticipate the future of business and enhance performance.
This chapter first discusses financial and fraud analytics. Next, the
chapter explains HR analytics, marketing analytics and healthcare
analytics. The chapter also explains supply chain analytics and Web
analytics. Towards the end, the chapter discusses sports analytics and
how analytics is used by the government and NGOs for providing var-
ious beneficial services to people.
novations that are currently accessible.
ing privacy and security of data while doing business with them or
offering them various services which require their personal data to be
utilised.
be analysed across all pertinent business frameworks and applications. Breaking down business transactions at the source level provides auditors with better knowledge and a more complete view regarding the probability of fraud. Analysis involves investigating suspicious activities and helps control weaknesses that could be exploited by fraudsters.
losses happen.
3. It is essential for an organisation to have successful fraud
management or a fraud analytics program to defend its
reputation against fraud. (True/False)
Activity
10.3 HR ANALYTICS
Human Resource (HR) analytics, also called talent analytics, is the application of sophisticated data mining and business analytics (BA) techniques to HR data. HR analytics is an area in the field of analytics that refers to applying analytic processes to the human resource department of a company in the expectation of enhancing employee performance.
of that information.
5. HR analytics help managers in gaining deeper details from
information at hand, then make important decisions and take
proper actions. (True/False)
6. ____________ analytics helps in identifying how operationally efficient people are in business.
Activity
You need to follow the three steps below to get the benefits of marketing analytics:
1. Practice a balanced collection of analytic methods
In order to get the best benefits from marketing analytics, you need an analytic evaluation that is balanced – that is, one that merges methods for:
Covering the past: Utilising marketing analytics to research the past, you can answer queries such as: which campaign component generated the most income last quarter?
Exploring the present: Marketing analytics enables you to decide how your marketing activities are performing at this moment by asking questions such as: How are clients doing? Which channels do clients use to gain maximum benefits? What is the reaction of different networking media personnel to the company's image?
Predicting and influencing what's to come: Marketing analytics can be used to deliver data-driven expectations to shape the future by asking a few questions such as: How can we transform short-term wins into dedication and continuous engagement? How many more sales representatives should we include to meet expectations? Which urban communities should we focus on next?
your efforts and investments. It can lead to better management which
helps in generating more revenue and greater profitability.
d. None of these
8. Marketing analytics enables you to decide how your marketing
activities are acting at this moment. (True/False)
Activity
ical analysis, fraud analysis, supply chain analysis and HR analysis.
Basically, healthcare analytics is based on the verification of patterns in healthcare data for determining how clinical care can be enhanced while minimising excessive cost.
volume-to-value based healthcare. Now more than ever, analytics is crucial for clinicians and health service providers so that they can distinguish and address gaps in care, quality, and hazards, and use it to bolster changes in clinical and quality results and financial performance.
a. Electrical Medical Records
b. Electronic Medical Records
c. Electronic Mediclaim Records
d. None of these
11. Healthcare analytics is based on the verification of patterns
in healthcare data for determining how clinical care can be
enhanced while minimising excessive cost. (True/False)
12. _______ analytics is capable of continuous reporting that
Activity
for their products as soon as the products are announced in the market. Most Apple products are manufactured in China; therefore, Apple needs to have a highly efficient supply chain to ship items from China to different countries in the world.
Almost every economy is getting globalised today, and the companies
are competing to increase their presence in the global market. The
operations performed by global manufacturing and logistics teams are getting more intricate and challenging. Delays in shipments, ineffective planning, and inconsistent supplies can lead to an increase in the supply chain cost of the company. Some issues faced by supply chain organisations are as follows:
Visibility of global supply chain and various processes in logistics
more precisely to forecast demand and describe and monitor policies related to supply and replenishment. It is also used for planning the inventory flow of goods and services.
Reducing cost by optimising sourcing and logistics activities: The cost involved in the supply chain is a major portion of a company's overall cost. Supply chain costs significantly impact various financial metrics, such as the cost of goods sold, working capital, and cash flow, so there is a constant requirement to improve an organisation's financial performance.
Activity
turing industry.
10.7 WEB ANALYTICS
Web analytics refers to measuring, collecting, analysing, and reporting Web data to understand and optimise the usage of the Web. However, Web analytics is not only restricted to the measurement of Web traffic but can also be utilised as a method of performing research in business and market.
views, and in gauging Web traffic and popularity patterns which are
useful in market research. The four basic steps of Web analytics are
as follows:
Collection of information: This stage involves gathering of basic
or elementary data. This data involves counting of things.
Processing of data into information: The purpose of this stage is
to process the collected data and derive information from it.
Developing KPI: This stage focuses on using the derived informa-
tion with business methodologies, referred to as Key Performance
Indicators (KPI).
Formulating online strategy: This stage emphasises setting online goals, objectives, and standards for the organisation or business. It also lays emphasis on making and saving money and increasing market share.
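The four steps above can be sketched end to end on a toy hit log. The pages, the conversion-rate KPI, and the target value below are all hypothetical choices made for illustration.

```python
# Step 1: collected elementary data (raw hits; counting of things)
hits = [
    {"page": "/home"}, {"page": "/product"},
    {"page": "/checkout"}, {"page": "/home"},
]

# Step 2: process the data into information (page-view counts)
views = {}
for h in hits:
    views[h["page"]] = views.get(h["page"], 0) + 1

# Step 3: develop a KPI, e.g. checkout views as a share of all views
kpi_conversion = views.get("/checkout", 0) / len(hits)

# Step 4: an online strategy would compare the KPI against a target
target = 0.20
meets_target = kpi_conversion >= target
```

On this toy log, one of four views reaches the checkout page, giving a KPI of 0.25, which meets the hypothetical 20% target.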
There are two categories of Web analytics: off-site Web analytics and on-site Web analytics. Off-site Web analytics allows Web measurement and analysis irrespective of whether you own or maintain a website.
There are mainly two methods of gathering the data technically. The first method lays emphasis on server log file analysis, in which the log files written by the Web server to record file requests sent by browsers are read and analysed. The second method, known as page tagging, uses JavaScript embedded in the Web page to track it. Both methods can gather data that can be processed to generate reports of Web traffic. The second method provides more accurate results than the first method.
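The first method, server log file analysis, can be sketched as parsing one line of a log in the widely used "combined" format. The log line below is made up, and the pattern is a simplified sketch rather than a complete parser.

```python
import re

# Each line of a combined-format access log records one file request.
# The named groups pull out the fields a report would aggregate over.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\d+)'
)

def parse_log_line(line):
    """Return a dict of fields for one log line, or None if it malforms."""
    m = LOG_PATTERN.match(line)
    return m.groupdict() if m else None

line = ('203.0.113.9 - - [28/Apr/2017:10:00:00 +0000] '
        '"GET /index.html HTTP/1.1" 200 5120')
request = parse_log_line(line)
```

Aggregating such parsed records by path, status, or IP yields the Web-traffic reports the text describes; page tagging would instead deliver similar fields directly from the browser.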
could mean visitors were unable to find what they were searching
for in the site.
Identify exit pages: An exit is the point at which a visitor, after visiting various pages on a site, leaves that site. A few pages on a site may legitimately have a high exit rate, such as the thank-you page of an e-commerce website shown after a purchase is completed successfully. Otherwise, a high exit rate on a particular page demonstrates that the page has some issue and should be investigated quickly. Such pages should be examined to determine whether visitors are failing to get the intended information for which they visited the website. Web analytics tools help in finding such pages quickly and rectifying their problems.
Identify target market: It is essential for advertisers to understand their visitors and deliver information according to their requirements. The findings of analytics services uncover present market demands, which generally change with geographic area. By utilising Web analytics, marketers can track the volume and geographical information of visitors and can offer things according to the interest of visitors.
16. There are two categories of Web analytics which are _______
Web analytics and ________ Web analytics.
Activity
Visit a Web hosting company and try to learn how Web analytics can
help the company to monitor the activity on the hosted websites of
the server.
game simulators, etc. Fitness trackers are smart devices that provide
data about the fitness of players, on the basis of which coaches can
decide whether or not to include particular players in the team. Game
simulators help in practising the game before the actual sporting
event takes place.
Sports analytics not only modifies the way a game is played but also
changes the way player performance is recorded. National Basketball
Association (NBA) teams are now using player-tracking technology,
which can evaluate a team's efficiency by analysing the movement of
its players. As per the information provided on the SportVU software
website, NBA teams have installed six cameras for tracking the
movements of each player on the court, and of the basketball, 25
times per second. The data collected using the cameras provides a
significant amount of innovative statistics based on speed, player
separation and ball possession: for example, how fast a player moved,
how much distance he covered during the game, how many times he
passed the ball, and much more. On the basis of the data collected,
strategies are created to win the game or to improve performance in
the game.
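As a hedged illustration of the kind of statistic such tracking data yields, distance covered and average speed can be derived from position samples. The sampling rate matches the 25 samples per second mentioned above; the positions themselves are made up.

```python
import math

def distance_and_speed(positions, hz=25):
    """positions: list of (x, y) court coordinates in metres,
    sampled at `hz` samples per second. Returns (distance, avg speed)."""
    # Sum straight-line distances between consecutive samples.
    dist = sum(
        math.hypot(x2 - x1, y2 - y1)
        for (x1, y1), (x2, y2) in zip(positions, positions[1:])
    )
    duration = (len(positions) - 1) / hz      # seconds of play covered
    return dist, dist / duration if duration else 0.0

# One second of samples: the player moves 0.2 m along x every 1/25 s.
track = [(0.2 * i, 0.0) for i in range(26)]
dist, speed = distance_and_speed(track)
print(round(dist, 2), round(speed, 2))  # 5.0 5.0  (5 metres at 5 m/s)
```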
Sports analytics has also found application in the field of sports
gambling. The availability of more accurate information about teams
and players on websites has taken sports gambling to new levels.
Analytics information helps gamblers make better decisions and attain
accuracy in predicting the outcomes of games or the performance of
players.
17. Fitness trackers are _____devices that provide data about the
fitness of players.
18. Sports analytics does not contribute in the field of sports
gambling. (True/False)
Activity
Discuss with your friends how analytics can be used in the field of
sports to enhance the energy of players while protecting them from
injuries.
Big data analytics is used in almost every part of the world for
deriving useful information from huge sets of data. Not only private
organisations and industries but also many government enterprises are
adopting data analytics for taking smart decisions for the benefit of
their citizens. A lot of data is generated in the government sector,
and processing and analysing this data helps the government improve
its policies and services for citizens. Some benefits of data
analytics in the government sector are as follows:
With the rise of national threats and criminal activities these days,
it is important for any government to ensure the safety and security
of its citizens. With the help of data analytics, intelligence
organisations can detect crime-prone areas and be prepared to prevent
or stop any kind of criminal activity.
Analytics also helps in detecting the possibility of cyber attacks.
Data analytics likewise helps the government provide better
healthcare facilities to its citizens. It can also be used for
tracking disease patterns. The government can launch proper
healthcare facilities in advance in areas prone to diseases. It also
helps in arranging and managing free medicines, vaccinations, etc. in
order to save people's lives.
Real-time analysis and sensors help government departments in
managing water in the city. Officials can detect issues in the flow
of water and the pollution level in water, predict water scarcity on
the basis of usage, detect areas of leakage, etc. Government
departments can take proper action on these issues to ensure the
supply of clean water in the city.
Government organisations also use analytics to detect tax frauds and
predict revenue. The government can take necessary steps to prevent
tax fraud and increase revenue.
The government can also use analytics in the field of agriculture, to
know the appropriate time for cultivating crops, the fertilisers
required for crops, etc. Moreover, the government can take prior
action to prevent damage to crops in the face of various
environmental challenges.
One such organisation is the Akshaya Patra Foundation, which supplies
food to government schools in Bangalore. The foundation was finding
it difficult to supply food to government schools because of the high
cost involved. Therefore, it looked for a cost-effective solution to
deliver food to schools without any interruption.
routes by five.
Besides Akshaya Patra, several other large NGOs, such as the Bill and
Melinda Gates Foundation India, Save the Children India, and Child
Rights and You (CRY), are also utilising data to raise their
efficiency in obtaining and allocating funds, predicting trends and
planning campaigns.
These NGOs often face difficulties with data collection because they
use traditional methods. To overcome these challenges, NGOs have
issued mobile phones equipped with apps so that real-time collection
and recording of data can take place. Data recorded in this manner is
accurate and gives more precise information, on the basis of which
further decisions or action plans can be made.
Activity
Visit a nearby NGO and try to learn how analytics has helped it
improve its services and focus more on the overall development of the
people or the area.
10.10 SUMMARY
Business analytics has expanded consistently over the previous
decade, as evidenced by the constantly growing business analytics
software market.
today’s dangers.
Organisations generally move to HR analytics and data-led solutions
when there exist problems that cannot be resolved with current
management practices.
Marketing analytics helps in providing deeper insights into customer
preferences and trends. Despite its various benefits, a majority of
organisations fail to realise the benefits of marketing analytics.
Healthcare organisations are also implementing approaches such as
Lean and Six Sigma to adopt a more patient-driven focus, lessen
errors and waste, and increase patient flow, with the objective of
enhancing quality.
Organisations that operate in a highly competitive global environment
need to have a highly effective supply chain management system in
place.
key words
performance of an individual employee.
Fraud analytics: It is used to detect whether a financial activity is
fraudulent, in order to prevent any kind of financial loss.
Marketing analytics: It helps in providing deep insights into
customer preferences and trends.
Marketing Analytics
7. a. Search Engine Optimisation
8. True
9. False
Healthcare Analytics
10. b. Electronic Medical Records
11. True
12. Real-time
Supply Chain Analytics
13. True
14. Advanced
Web Analytics
15. Web
16. Off-site, On-site
historical information in the field of sports mainly to perform
better than any other team or individual. Refer to Section
10.8 Sports Analytics.
8. Data analytics is also playing its role in the government sector.
Not only is it important for the government, it is equally beneficial
for non-governmental organisations as well. NGOs are also often
called non-profit organisations. Data analytics is used by these
organisations to get deeper insight into their data. Refer to Section
10.9 Analytics for Government and NGOs.
SUGGESTED READINGS
E-REFERENCES
Data analysis techniques for fraud detection. (2017, April 26).
Retrieved May 03, 2017, from
https://en.wikipedia.org/wiki/Data_analysis_techniques_for_fraud_detection
CASE STUDIES
CONTENTS
Case Study 1 How Cisco IT Uses Big Data Platform to Transform Data
Management
Case Study 2 USDA Used Data Mining to Identify Patterns of Loan Defaulters
Case Study 3 Cincinnati Zoo Used Business Analytics for Improving Performance
Case Study 4 Application of Business Analytics in Resource Management
Case Study 5 Role of Descriptive Analytics in the Healthcare Sector
Case Study 6 An Application of Predictive Analytics in Underwriting
Case Study 7 UniCredit Bank Applies Prescriptive Analytics for Risk Management
Players
Case Study 11 Fraud Analytics Solution Helped in Saving the Wealth of Companies
Case Study 12 Big Data Analytics Allowing Users to Visualise the Future of Free
Online Classifieds
Case study 1
Background
Cisco is one of the world's leading networking organisations and has
transformed the way people connect, communicate and collaborate.
Cisco IT has 38 global data centres that together comprise 334,000
square feet of space.
Challenge
The company had to manage large datasets of information about
customers, products and network activities, which together constitute
the company's business intelligence. In addition, there were
terabytes of unstructured data in the form of Web logs, videos,
emails, documents and images. To handle such a huge amount of data,
the company decided to adopt Hadoop, an open-source software
framework that supports distributed storage and processing of big
datasets.
Solution
Cisco IT developed a Hadoop platform using Cisco® UCS Common
Platform Architecture (CPA) for Big Data.
According to Jag Kahlon, a Cisco IT architect, “Cisco UCS CPA for
Big Data provides the capabilities we need to use big data analytics
for business advantage, including high-performance, scalability,
and ease of management.”
For computation, the building blocks of the Cisco IT Hadoop Platform
are the Cisco UCS C240 M3 Rack Servers, which are powered by Intel
Xeon E5-2600 series processors, 256 GB of RAM, and 24 TB of local
storage.
Virendra Singh, a Cisco IT architect, says, “Cisco UCS C-Series
Servers provide high performance access to local storage, the biggest
factor in Hadoop performance.”
The present architecture contains four racks of servers, with each
rack having 16 server nodes and providing 384 TB of raw storage per
rack. Kahlon says, "This configuration can scale to 160
on each job compared to Oozie because reducing the number of
programming steps means less time needed for debugging.” Another
benefit of using Cisco TES is that it operates on mobile devices, so
that the end-users of the company can manage big data jobs from
anywhere.
Results
The main result of transforming the business using Big Data by
Cisco IT is that the company has introduced multiple big data
analytics programs, which are based on the Cisco® UCS Common
Lessons Learned
Cisco IT has come up with the following observations, which it has
shared with other organisations:
Hive is good for structured data processing, but provides limited SQL
support.
Sqoop easily moves a large amount of data to Hadoop.
Network File System (NFS) saves time and effort to manage a
large amount of data.
Cisco TES simplifies the job-scheduling and orchestration process.
A library of user-defined functions (UDFs) provided by Hive
and Pig increases developer productivity.
Internal users' knowledge is enhanced, as they can now analyse
unstructured data from email, webpages, documents, etc., besides data
stored in databases.
questions
Case study 2
$216 billion for providing economic opportunities to the rural
communities of the nation.
The Rural Housing Service of USDA runs various programmes to create
and improve housing and other important community facilities in rural
areas. USDA also provides loans, grants and loan guarantees for
single- and multi-family housing, fire and police stations, child
care centres, hospitals, nursing homes, libraries, schools, etc. The
main aim of USDA and its partners working together is to make sure
that rural America is a better place to live, work and raise a
family.
questions
Case study 3
Background
Opened in 1875, Cincinnati Zoo & Botanical Garden is a world-
famous zoo that is located in Cincinnati, Ohio, US. It has more
than 1.3 million visitors every year.
Challenge
In late 2007, the management of the zoo had begun a strategic
planning process to increase the number of visitors by enhancing
their experience with an aim to generate more revenues. For
this, the management decided to increase the sales of food items
and retail outlets in the zoo by improving their marketing and
promotional strategies.
According to John Lucas, the Director of Operations at Cincinnati
Zoo & Botanical Garden, “Almost immediately, we realised we had
a story being told to us in the form of internal and customer data, but
we didn’t have a lens through which to view it in a way that would
onto it.”
They evaluated various providers but initially did not include IBM,
under the false assumption that they could not afford it. Then
someone pointed out that it was completely free to talk to IBM. They
found that IBM not only suggested a solution that could fit their
budget, but that it was the most appropriate solution for what they
were looking for.
Solution
IBM provided a business analytics solution to the zoo's executive
committee that offers the facility of analysing data related to
customers' membership, admissions, food, etc. in order to gain a
better understanding of visitors' behaviour.
This solution also provides a facility of analysing the geographic
and demographic information that could help in customer
segmentation and marketing.
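A minimal sketch of the geographic and demographic segmentation described above, using invented visitor records (not the zoo's actual data): group visitors by home ZIP code and membership status, then report visitor counts and average spend per segment.

```python
from collections import defaultdict

def segment(visitors, keys):
    """Group visitor records by the given attribute keys;
    return {segment: (visitor count, average spend)}."""
    groups = defaultdict(list)
    for v in visitors:
        groups[tuple(v[k] for k in keys)].append(v["spend"])
    return {g: (len(s), sum(s) / len(s)) for g, s in groups.items()}

# Hypothetical visitor records, purely for illustration.
visitors = [
    {"zip": "45220", "member": True,  "spend": 42.0},
    {"zip": "45220", "member": False, "spend": 15.0},
    {"zip": "45202", "member": True,  "spend": 30.0},
    {"zip": "45220", "member": True,  "spend": 38.0},
]
by_zip_member = segment(visitors, ["zip", "member"])
# ('45220', True) -> (2 visitors, average spend 40.0)
```

Segments like these are what marketing can then target with differentiated promotions.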
The zoo’s executive committee wanted a platform, which would
be capable of delivering the desired goals by combining and
Output
The result of implementing IBM's business analytics solution is that
the zoo's return on investment (ROI) has increased. Lucas admits,
"Over the 10 years we'd been running that promotion, we lost just
under $1 million in revenue because we had no visibility into where
the visitors using it were coming from."
The new business analytics solution has helped the zoo save costs;
for example, there was a saving of $40,000 in marketing in the first
year, visitor numbers increased by 50,000 in 2011, food sales
increased by at least 25%, and retail sales increased by at least
7.5%.
By adopting new operational management strategies of the
questions
Case study 4
This case study discusses how a real estate company uses business
analytics for resource management. It is with respect to Chapter 4.
Analytics can leverage cross-domain expertise. This case study
presents an instance in which a real estate company, through the use
of data devices, assisted a law firm in choosing whether or not to
relocate to a different office space. This was done based on feedback
from the law firm's employees collected by the internal analytics
team of the real estate company. The feedback helped the real estate
company come up with an employee lean-management programme for the
law firm.
In a one-of-a-kind example, the law firm wished to bring in and keep
the most suitable employees, so the first factor to be evaluated was
personnel retention. The firm had received great ratings for its
brilliant services and consistent focus on improving the customer
service experience. Being a firm with services of this range, it
certainly faced some of the challenges usual to any other
resource-critical organisation. To deal with space-related issues,
the firm roped in the real estate company, which went on not only to
suggest the office space but also to streamline resource operations.
Method
The company conducted a few surveys and questionnaires among the
group and came up with a solution to streamline and lean-manage the
teams within the law firm. For the office space, the real estate
company used the firm's resources to map out where the employees
were most often. The real estate company assisted the law firm by
utilising different location-aware mechanisms to keep track of the
whereabouts of the firm's personnel, with data accumulated based on
employee preferences and activities. The end result was that the law
firm decided to relocate from the high-rise office to a more
affordable space based on the location habits of its personnel. The
new location was so convenient for employees that it resulted in
increased employee retention, thereby saving the firm costs.
Apart from the above actions, the following questionnaires were
circulated across various departments:
Questions for Management:
What evaluation methods should be employed to assess the
yearly performance of employees?
questions
1. What were the initial challenges faced by the law firm?
(Hint: Office space relocation indecisiveness, employee
retention impact and resourcing issues)
2. What are the lessons learned from this case study?
(Hint: You can cite examples of cross-functionality
deployed by the real estate team to denote excellent all-
round services provided by the real estate company.)
Case study 5
Many standards and measurable attributes can be used for
defining performance and quality in the healthcare industry.
Some attributes are effectiveness, timeliness, safety, efficiency,
accessibility and availability. In addition to this, healthcare
organisations also consider patient and social preferences in
order to assess and assure quality in the healthcare sector.
The major challenge that lies in the healthcare sector across
the world is crowding of emergency rooms which may lead to
serious consequences and complications. Overcrowding and poor
healthcare organisations.
Descriptive analytics also helps in studying various decisions in
healthcare and their impact on service performance and clinical
results. Descriptive analytics is an easy and simple approach to
apply, and the data is usually represented in graphs and tables,
which display hospital occupancy rates, average length of stay,
indicators related to healthcare services, etc.
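A brief sketch of such descriptive statistics, using invented records (the figures are hypothetical, not from any hospital): average and median length of stay, plus a simple bed occupancy rate.

```python
from statistics import mean, median

# Hypothetical lengths of stay (in days) for eight discharged patients.
stays_days = [2, 3, 1, 5, 4, 2, 7, 3]

# Hypothetical occupancy inputs over a one-week reporting period.
beds, occupied_bed_days, period_days = 50, 310, 7

avg_stay = mean(stays_days)                            # average length of stay
med_stay = median(stays_days)                          # median length of stay
occupancy_rate = occupied_bed_days / (beds * period_days)  # share of bed capacity used

print(avg_stay, med_stay, round(occupancy_rate, 3))  # 3.375 3.0 0.886
```

Tables and charts of exactly these quantities are what a descriptive-analytics report in a hospital would typically present.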
Moreover, descriptive analytics provides data visualisation, which
helps in answering specific queries or determining the patterns
questions
Case study 6
Freedom Specialty Insurance Company made the industry its top
priority. Using external predictive analytics data to calculate risk,
D&O claims could be foreseen from class-action lawsuit data. An
exclusive, multimillion-dollar underwriting model was created, the
disbursements of which have proven profitable to Freedom to the
amount of $300 million in annual direct written premiums. Losses have
been kept to a minimum, with a 2012 loss rate below 49%, the
industry's average loss percentage. The model has proven successful
in all areas, with satisfied and assured employees at all levels of
the company, as well as the
into place and the process needed to be carried out manually,
the process took weeks to finish. Now, with the modernised
methods and information cataloguing devices, everything can
be completed within hours.
Back testing: This is one of the most important processes for
determining the potential risk upon receiving a claim. The system
runs the claim through the predictive model and analyses the
selection criteria, altering tolerances as required. After being used
numerous times, the positive feedback loop polishes the system.
Predictive model: Information is consolidated and run through a
model, which defines the wisest range of appraisal and limits through
the use of multivariate analysis. Algo-
Scrubbing, back-testing and classification were all discovered and
learned by the people themselves and were originally carried out by
hand. However, they have been increasingly mechanised since they were
first conceived. Also, there is an ever-growing quantity of external
sources. Freedom is currently undergoing processes to assess the
implementation of cyber security and intellectual property lawsuits,
with the predictive model continuously being enhanced and improved.
The D&O industry has adopted many processes related to the
questions
2. What changes did the implementation of an advanced
predictive model bring in for the company?
(Hint: Integrated processes, easier claim tracking, etc.)
Case study 7
technology framework.
Recently, the bank implemented FICO software, which works as a
decision engine to manage data related to credit cards, personal
loans and other small business loans. According to Ivan Cavinato,
head of credit risk methodologies for the Italian bank, "The
predictive analytics and decision management software will analyse
big data to improve customer lending decisions and capital
optimization."
The FICO software adopts UniCredit's strategy on data and
questions
1. What was the challenge faced by UniCredit?
(Hint: UniCredit required right information in order to
handle their risk management projects.)
2. How has UniCredit achieved its goal?
(Hint: By adopting Fico software that uses prescriptive
analytics to enhance customer relationships and credit
risk management.)
Case study 8
This case study discusses how MediaCom has taken the assistance of
Sysomos for planning and measuring data related to advertising
campaigns for its clients. It is with respect to Chapter 8.
MediaCom is one of the leading media agencies of the world, helping
its clients plan and measure their advertising strategies across all
media channels. The company depends greatly on Sysomos in planning
and measuring the performance of its clients' campaigns.
The main goal of the MediaCom agency was to improve the business
while gaining insight into data on the audience's response to its
clients' brands and issues.
Alejandro De Luna, Social Strategy Manager at MediaCom, says
“The value Sysomos provides for us is very clear. We need to have a
bedrock of insights to justify how to approach content solutions for
different audiences and different platforms, and Sysomos helps us
to sell in our strategies by giving us a much clearer understanding
of how audiences feel about specific brands and issues.”
Sysomos has enabled MediaCom to analyse online conversations, without
any limitation on keywords or results, in a database of over 550
billion social media posts. Now, MediaCom is able to use
questions
Case study 9
office applications, support systems, etc. in order to fulfil the
requirements of municipalities and residents of Portugal. In addition
to fulfilling the requirements of the Portuguese government, Medidata
has its own pool of customers who use its ERP solutions and services
to improve document management and enhance workflow.
Medidata started receiving demands from its clients for software that
could help them analyse and interact with the data generated by the
ERP software. Medidata felt it necessary to
actions.
It helped clients in identifying and understanding their key
performance indicators (KPIs).
It provided dashboards to clients that include KPIs, such as workflow
performance, in addition to the ratio of outstanding workflow tasks
grouped by department.
It helped clients by making information available quickly; clients'
decision-makers became capable of regulating resources in real time
and task execution time in different scenarios.
questions
Case study 10
This case study discusses how real-time analytics from IBM have been
utilised by Team USA for measuring and improving their athletes'
performance. It is with respect to Chapter 10 of the book.
A US-based cycling organisation, dedicated to the betterment of
advanced US cycling teams in the Olympics and other international
events, was looking for ways to get an edge over its well-funded
competitor organisations in events like the Women's Team Pursuit. In
the team pursuit event, there are four cyclists, with one in the lead
and the other three riding behind. The challenge appears when riders
change places, which causes disruption and slows down the group. A
delay of a fraction of a second can cost the race in this extremely
competitive sport.
USA Cycling depends entirely on private donations, unlike national
teams that are fully supported by government bodies. Coaches at USA
Cycling felt the need for analytics to analyse rider performance
while managing the organisation's budget efficiently. The challenge
facing USA Cycling was to quantify performance in Team Pursuit track
cycling events in real time; these events are organised indoors in
velodromes. It was
their useful key metrics. As the cycling analytics data is produced
in real time and shown on a mobile dashboard, both coaches and
cyclists can access their performance data during the ongoing
training session with proper feedback.
"The ability to get hold of the data immediately after the training
session has finished has completely changed my relationship with the
team," says Neal Henderson, a high-performance consultant with USA
Cycling.
USA Cycling can now view data promptly; it gets much
questions
Case study 11
withdrawals from ATMs up to a certain amount, or purchases made using
credit cards outside the cardholder's country. These traditional
methods helped reduce the number of fraudulent cases, but not all of
them. The research team at IBM decided to take the fraud detection
system to the next level, so that a larger number of fraudulent
financial transactions could be detected and prevented. At IBM, the
team created a virtual data detective solution using machine learning
and stream computing to prevent fraudulent transactions, saving
industries and individuals from financial losses.
manager, Machine Learning Technologies group.
These machine-learning technologies are presently used to detect and
prevent fraud in financial transactions, including transactions
related to credit cards, ATMs and e-payments. The system is embedded
in the client's infrastructure, and a machine-learning model is
developed using the client's existing data to combat fraudulent
transactions before they take place.
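As a toy illustration of the approach (this is not IBM's actual system; the thresholds, profile fields and amounts are assumptions), a per-cardholder model learned from transaction history can flag payments that deviate sharply from past spending or occur outside the home country:

```python
from statistics import mean, stdev

def build_profile(history_amounts, home_country):
    """Learn a simple spending profile from a cardholder's past transactions."""
    return {"mu": mean(history_amounts),
            "sigma": stdev(history_amounts),
            "home": home_country}

def is_suspicious(txn, profile):
    """Flag a transaction if the amount falls outside mean +/- 3 standard
    deviations of past spending, or if it occurs abroad."""
    out_of_band = abs(txn["amount"] - profile["mu"]) > 3 * profile["sigma"]
    abroad = txn["country"] != profile["home"]
    return out_of_band or abroad

# Hypothetical spending history for one cardholder based in India.
profile = build_profile([20, 35, 25, 30, 40, 22, 28], "IN")

assert not is_suspicious({"amount": 33, "country": "IN"}, profile)   # normal
assert is_suspicious({"amount": 900, "country": "IN"}, profile)      # amount outlier
assert is_suspicious({"amount": 30, "country": "US"}, profile)       # foreign use
```

A production system would of course use far richer features and streaming infrastructure, but the shape is the same: fit a model on past behaviour, then score each new transaction against it before it completes.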
“By identifying legal transactions that have a high probability of
questions
Case study 12
Background
OLX is a popular, fast-growing online classified advertising website.
It is active in around 105 countries and supports over 40 languages.
The website has more than 125 million unique visitors per month
across the world and generates approximately one billion page hits
per month. OLX allows its users to design and personalise their
advertisements and add them to their social networking profiles, so
its data calls for big data analytics.
Challenges
The main challenge for the OLX website was to find new ways to use
business analytics to handle the vast data of its customers. The
business users of OLX required numerous metrics to track their
customer data. To achieve this aim, they needed to gain good control
over their data warehouse. OLX took the help of Datalytics, Pentaho's
partner vendor, in finding solutions for extracting, transforming and
loading data from around the world and then creating an improved data
warehouse. After creating such a warehouse, OLX wanted to allow its
customers to visualise the stored data in real time without facing
any technical error or barrier. OLX knew that this would be difficult
for people without previous Business Intelligence (BI) knowledge, so
it was essential to use a visualisation tool for this purpose.
According to Francisco Achaval, Business Intelligence Manager at OLX,
"While it may be easy for a BI analyst to understand what's happening
in the numbers, to explain this to business users who are not versed
in BI or OLAP (On-line Analytical Processing), you need
visualisations."
Solutions
OLX approached Pentaho, a business intelligence software company that
provides open-source products and services to its customers, such as
data integration, OLAP services, reporting and information
dashboards. Pentaho has a partnership with Datalytics, a consulting
firm based in Argentina. Datalytics provides data integration,
business intelligence and data mining solutions to Pentaho's
worldwide clients.
Results
OLX has realised that Datalytics' expertise and Pentaho's platform
enabled it to deploy its new analytics solution in less than a month.
It has seen the following changes with the new solution:
Pentaho Business Analytics enables OLX's users to create easy and
creative reports about key business metrics.
Instead of buying an expensive enterprise solution or investing time
in building a new data warehouse internally, OLX was able to save
time by focussing on data integration with analytics capabilities.
Pentaho Business Analytics provides end-user satisfaction.
Pentaho Business Analytics provides a scalable solution to OLX, as it
can integrate any type of data from any data source and can grow with
the business. In addition, Datalytics' assistance gives OLX an
opportunity to experiment with big data.
questions