Solution Manual for Analytics, Data Science, & Artificial Intelligence: Systems for Decision Support, 11th Edition, by Sharda


Chapter 9:
Big Data, Cloud Computing,
and Location Analytics: Concepts and
Tools

Learning Objectives for Chapter 9

1. Learn what Big Data is and how it is changing the world of analytics
2. Understand the motivation for and business drivers of Big Data analytics
3. Become familiar with the wide range of enabling technologies for Big Data
analytics
4. Learn about Hadoop, MapReduce, and NoSQL as they relate to Big Data
analytics
5. Compare and contrast the complementary uses of data warehousing and Big Data
technologies
6. Become familiar with in-memory analytics and Spark applications
7. Become familiar with select Big Data platforms and services
8. Understand the need for and appreciate the capabilities of stream analytics
9. Learn about the applications of stream analytics
10. Describe the current and future use of cloud computing in business analytics
11. Describe how geospatial and location-based analytics are assisting organizations

CHAPTER OVERVIEW               
Big Data, which means many things to many people, is not a new technological
fad. It has become a business priority that has the potential to profoundly change the
competitive landscape in today’s globally integrated economy. In addition to providing
innovative solutions to enduring business challenges, Big Data and analytics instigate
new ways to transform processes, organizations, entire industries, and even society
altogether. Yet extensive media coverage makes it hard to distinguish hype from reality.
This chapter aims to provide a comprehensive coverage of Big Data, its enabling
technologies, and related analytics concepts to help understand the capabilities and
limitations of this emerging technology. The chapter starts with a definition and related
concepts of Big Data followed by the technical details of the enabling technologies,
including Hadoop, MapReduce, and NoSQL. We provide a comparative analysis between
data warehousing and Big Data analytics. The last part of the chapter is dedicated to
stream analytics, which is one of the most promising value propositions of Big Data
analytics. This chapter contains the following sections:

Copyright © 2019 Pearson Education, Inc.


CHAPTER OUTLINE

9.1 Opening Vignette: Analyzing Customer Churn in a Telecom Company Using Big
Data Methods

9.2 Definition of Big Data

9.3 Fundamentals of Big Data Analytics

9.4 Big Data Technologies

9.5 Big Data and Data Warehousing

9.6 In-Memory Analytics and Apache Spark™

9.7 Big Data and Stream Analytics

9.8 Big Data Vendors and Platforms

9.9 Cloud Computing and Business Analytics

9.10 Location-Based Analytics for Organizations

ANSWERS TO END OF SECTION REVIEW QUESTIONS      


Section 9.1 Review Questions

1. What problem did customer service cancellation pose to AT’s business survival?

The company was churning through customers too quickly, and there were concerns about
the financial impact on its survival.

2. Identify and explain the technical hurdles presented by the nature and characteristics of
AT’s data.

The company allowed cancellations at many different contact points and wanted to
record information about why customers were canceling at each of these points.
Unfortunately, this resulted in a multitude of different types of data being collected.
Additionally, the large number of cancellations meant that a large amount of data was
being generated.

3. What is sessionizing? Why was it necessary for AT to sessionize its data?

While not discussed in the case, sessionizing is ordering customer interactions into a flow
so that the sequence of actions can be visualized and analyzed. This is what is seen in
Figure 9.2. It was important for the company to sessionize its data so that it could
visualize the path customers took to cancellation.
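The idea can be sketched in a few lines of Python. The 30-minute session gap and the event names below are illustrative assumptions, not details from the case:

```python
from datetime import datetime, timedelta

def sessionize(events, gap_minutes=30):
    """Group a customer's time-ordered interactions into sessions.

    A new session starts whenever the gap between consecutive
    events exceeds `gap_minutes`.
    """
    sessions = []
    current = []
    last_time = None
    for timestamp, action in sorted(events):
        if last_time is not None and timestamp - last_time > timedelta(minutes=gap_minutes):
            sessions.append(current)
            current = []
        current.append(action)
        last_time = timestamp
    if current:
        sessions.append(current)
    return sessions

events = [
    (datetime(2019, 1, 5, 9, 0), "view_bill"),
    (datetime(2019, 1, 5, 9, 10), "chat_support"),
    (datetime(2019, 1, 7, 14, 0), "call_center"),
    (datetime(2019, 1, 7, 14, 5), "cancel_service"),
]
print(sessionize(events))
# two sessions: the Jan 5 billing visit and the Jan 7 cancellation path
```

Once interactions are grouped this way, the paths ending in cancellation can be counted and visualized, as in Figure 9.2.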

4. Research other studies where customer churn models have been employed. What types
of variables were used in those studies? How is this vignette different?

Student research and results will vary.

5. Besides Teradata Vantage, identify other popular Big Data analytics platforms that
could handle the analysis described in the preceding case. (Hint: see Section 9.8.)

Student research and results will vary.

Section 9.2 Review Questions

1. Why is Big Data important? What has changed to put it in the center of the
analytics world?

As more and more data becomes available in various forms and fashions, timely
processing of the data with traditional means becomes impractical. The
exponential growth, availability, and use of information, both structured and
unstructured, bring Big Data to the center of the analytics world. Pushing the
boundaries of data analytics uncovers new insights and opportunities for the use
of Big Data.

2. How do you define Big Data? Why is it difficult to define?

Big Data means different things to people with different backgrounds and
interests, which is one reason it is hard to define. Traditionally, the term “Big
Data” has been used to describe the massive volumes of data analyzed by huge
organizations such as Google or research science projects at NASA. Big Data
includes both structured and unstructured data, and it comes from everywhere:
data sources include Web logs, RFID, GPS systems, sensor networks, social
networks, Internet-based text documents, Internet search indexes, detailed call
records, to name just a few. Big Data is not just about volume; it is also about
variety, velocity, veracity, and value proposition.

3. Out of the Vs that are used to define Big Data, in your opinion, which one is the
most important? Why?

Although all of the Vs are important characteristics, value proposition is probably
the most important for decision makers. "Big" data differs from "small" data in
that it contains (or has a greater potential to contain) more patterns and
interesting anomalies. Thus, by analyzing large and feature-rich data,
organizations can gain business value that they might not otherwise obtain. While
users can detect the patterns in small data sets using simple statistical and
machine-learning methods or ad hoc query and reporting tools, Big Data means
"big" analytics. Big analytics means greater insight and better decisions,
something that every organization needs nowadays. (Different students may have
different answers.)

4. What do you think the future of Big Data will be like? Will it lose its popularity to
something else? If so, what will it be?

Big Data could evolve at a rapid pace. The buzzword “Big Data” might change to
something else, but the trend toward increased computing capabilities, analytics
methodologies, and data management of high volume heterogeneous information
will continue. (Different students may have different answers.)

Section 9.3 Review Questions

1. What is Big Data analytics? How does it differ from regular analytics?

Big Data analytics is analytics applied to Big Data architectures. This is a new
paradigm; in order to keep up with the computational needs of Big Data, a
number of new and innovative analytics computational techniques and platforms
have been developed. These techniques are collectively called high-performance
computing, and include in-memory analytics, in-database analytics, grid
computing, and appliances. They differ from regular analytics, which tends to
focus on relational database technologies.

2. What are the critical success factors for Big Data analytics?

Critical factors include a clear business need, strong and committed sponsorship,
alignment between the business and IT strategies, a fact-based decision culture, a
strong data infrastructure, the right analytics tools, and personnel with advanced
analytic skills.

3. What are the big challenges that one should be mindful of when considering
implementation of Big Data analytics?

Traditional ways of capturing, storing, and analyzing data are not sufficient for
Big Data. Major challenges are the vast amount of data volume, the need for data
integration to combine data of different structures in a cost-effective manner, the
need to process data quickly, data governance issues, skill availability, and
solution costs.

4. What are the common business problems addressed by Big Data analytics?

Here is a list of problems that can be addressed using Big Data analytics:

• Process efficiency and cost reduction
• Brand management
• Revenue maximization, cross-selling, and up-selling
• Enhanced customer experience
• Churn identification, customer recruiting
• Improved customer service
• Identifying new products and market opportunities
• Risk management
• Regulatory compliance
• Enhanced security capabilities

Section 9.4 Review Questions

1. What are the common characteristics of emerging Big Data technologies?

They take advantage of commodity hardware to enable scale-out, parallel
processing techniques; employ nonrelational data storage capabilities in order to
process unstructured and semistructured data; and apply advanced analytics and
data visualization technology to Big Data to convey insights to end users.

2. What is MapReduce? What does it do? How does it do it?

MapReduce is a programming model that allows the processing of large-scale
data analysis problems to be distributed and parallelized. The MapReduce
technique, popularized by Google, distributes the processing of very large multi-
structured data files across a large cluster of machines. High performance is
achieved by breaking the processing into small units of work that can be run in
parallel across the hundreds, potentially thousands, of nodes in the cluster. The
map function in MapReduce breaks a problem into sub-problems, which can each
be processed by single nodes in parallel. The reduce function merges (sorts,
organizes, aggregates) the results from each of these nodes into the final result.
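A minimal single-machine sketch of the map, shuffle, and reduce phases, using the classic word-count example (a real MapReduce framework distributes these phases across cluster nodes; here all three run in one process):

```python
from collections import defaultdict
from itertools import chain

# map: each "node" turns its chunk of text into (word, 1) pairs
def map_phase(chunk):
    return [(word, 1) for word in chunk.split()]

# shuffle: group the intermediate pairs by key
def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

# reduce: aggregate the values for each key into the final result
def reduce_phase(groups):
    return {key: sum(values) for key, values in groups.items()}

chunks = ["big data big analytics", "big data"]  # one chunk per node
pairs = chain.from_iterable(map_phase(c) for c in chunks)
print(reduce_phase(shuffle(pairs)))
# {'big': 3, 'data': 2, 'analytics': 1}
```

The same three-phase structure applies whether the "nodes" are list comprehensions in one process or thousands of machines in a Hadoop cluster.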

3. What is Hadoop? How does it work?

Hadoop is an open source framework for processing, storing, and analyzing
massive amounts of distributed, unstructured data. It is designed to handle
petabytes and exabytes of data distributed over multiple nodes in parallel,
typically commodity machines connected via the Internet. It utilizes the
MapReduce framework to implement distributed parallelism. The file
organization is implemented in the Hadoop Distributed File System (HDFS),
which is adept at storing large volumes of unstructured and semistructured data.
This is an alternative to the traditional tables/rows/columns structure of a
relational database. Data is replicated across multiple nodes, allowing for fault
tolerance in the system.

4. What are the main Hadoop components? What functions do they perform?

Major components of Hadoop are the HDFS, a Job Tracker operating on the
master node, Name Nodes, Secondary Nodes, and Slave Nodes. The HDFS is the
default storage layer in any given Hadoop cluster. A Name Node is a node in a
Hadoop cluster that provides the client information on where in the cluster
particular data is stored and if any nodes fail. Secondary nodes are backup name
nodes. The Job Tracker is the node of a Hadoop cluster that initiates and
coordinates MapReduce jobs or the processing of the data. Slave nodes store data
and take direction to process it from the Job Tracker.

Querying for data in the distributed system is accomplished via MapReduce. The
client query is handled in a Map job, which is submitted to the Job Tracker. The
Job Tracker refers to the Name Node to determine which data it needs to access to
complete the job and where in the cluster that data is located, then submits the
query to the relevant nodes which operate in parallel. A Name Node acts as
facilitator, communicating back to the client information such as which nodes are
available, where in the cluster certain data resides, and which nodes have failed.
When each node completes its task, it stores its result. The client submits a
Reduce job to the Job Tracker, which then collects and aggregates the results from
each of the nodes.

5. What is NoSQL? How does it fit into the Big Data analytics picture?

NoSQL, also known as “Not Only SQL,” is a new style of database for processing
large volumes of multi-structured data. Whereas Hadoop is adept at supporting
large-scale, batch-style historical analysis, NoSQL databases are mostly aimed at
serving up discrete data stored among large volumes of multi-structured data to
end-user and automated Big Data applications. NoSQL databases trade ACID
(atomicity, consistency, isolation, durability) compliance for performance and
scalability.
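A toy illustration of the key-value/document style, using a plain Python dict as a stand-in for a NoSQL store (the keys and fields are hypothetical). The point is that records need not share one schema, and reads are simple key lookups:

```python
import json

# Each "document" carries its own fields; no fixed table schema is imposed.
store = {}  # key -> document (a dict standing in for a NoSQL store)

store["user:1001"] = {"name": "Ana", "clicks": ["home", "cart"]}
store["user:1002"] = {"name": "Ben", "tweets": 42,
                      "location": {"lat": 40.7, "lon": -74.0}}

# Serving a discrete record is a direct key-based read: this simplicity is
# what buys NoSQL systems their speed and horizontal scalability, at the
# cost of the ACID guarantees a relational database would enforce.
doc = store["user:1002"]
print(json.dumps(doc["location"]))
```

A relational design would force both users into one table with NULLs for missing columns; the document style simply omits fields that do not apply.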

Section 9.5 Review Questions

1. What are the challenges facing data warehousing and Big Data? Are we
witnessing the end of the data warehousing era? Why or why not?

What has changed the landscape in recent years is the variety and complexity of
data, which made data warehouses incapable of keeping up. It is not the volume
of the structured data but the variety and the velocity that forced the world of IT
to develop a new paradigm, which we now call “Big Data.” But this does not
mean the end of data warehousing. Data warehousing and RDBMS still bring
many strengths that make them relevant for BI and that Big Data techniques do
not currently provide.

2. What are the use cases for Big Data and Hadoop?

In terms of its use cases, Hadoop is differentiated in two ways: first, as the
repository and refinery of raw data, and second, as an active archive of historical
data. Hadoop, with its distributed file system and flexibility of data formats
(allowing both structured and unstructured data), is advantageous when working
with information commonly found on the Web, including social media,
multimedia, and text. Also, because it can handle such huge volumes of data (and
because storage costs are minimized due to the distributed nature of the file
system), historical (archive) data can be managed easily with this approach.

3. What are the use cases for data warehousing and RDBMS?

Three main use cases for data warehousing are performance, integration, and the
availability of a wide variety of BI tools. The relational data warehouse approach
is quite mature, and database vendors are constantly adding new index types,
partitioning, statistics, and optimizer features. This enables complex queries to be
done quickly, a must for any BI application. Data warehousing, and the ETL
process, provide a robust mechanism for collecting, cleaning, and integrating data.
And, it is increasingly easy for end users to create reports, graphs, and
visualizations of the data.

4. In what scenarios can Hadoop and RDBMS coexist?

There are several possible scenarios under which using a combination of Hadoop
and relational DBMS-based data warehousing technologies makes sense. For
example, you can use Hadoop for storing and archiving multi-structured data,
with a connector to a relational DBMS that extracts required data from Hadoop
for analysis by the relational DBMS. Hadoop can also be used to filter and
transform multi-structured data for transport to a data warehouse, and to
analyze multi-structured data for publishing into the data warehouse
environment. Combining SQL and MapReduce query functions enables data
scientists to analyze both structured and unstructured data. Also, front-end query
tools are available for both platforms.

Section 9.6 Review Questions

1. What are some of the unique features of Spark as compared to Hadoop?

Hadoop utilizes a batch processing framework and lacks real-time processing
capabilities, whereas Spark is a unified analytics engine that can process both batch and
streaming data.

2. Give examples of companies that have adopted Apache Spark. Find new examples
online.

Examples include Uber, Pinterest, Netflix, Yahoo, and eBay. Uber uses Apache Spark™
to detect fraudulent trips at scale. Pinterest measures user engagement in real time using
Apache Spark™. The recommendation engine of Netflix also utilizes the capabilities of
Apache Spark™. Yahoo, one of the early adopters of Apache Spark™, has used it for
creating business intelligence applications. Finally, eBay has used Apache Spark™ for
data management and stream processing.

Results of updated searches will vary based on the date of the search.

3. Run the exercise as described in this section. What do you learn from this exercise?

Student outcomes and results will vary.

Section 9.7 Review Questions

1. What is a stream (in the Big Data world)?

A stream can be thought of as an unbounded flow or sequence of data elements, arriving
continuously at high velocity. Streams often cannot be efficiently or effectively stored for
subsequent processing; thus Big Data concerns about Velocity (one of the six Vs) are
especially prevalent when dealing with streams. Examples of data streams include sensor
data, computer network traffic, phone conversations, ATM transactions, web searches,
and financial data.

2. What are the motivations for stream analytics?

In situations where data streams in rapidly and continuously, traditional analytics
approaches that work with previously accumulated data (i.e., data at rest) often either
arrive at the wrong decisions because of using too much out-of-context data, or arrive at
the correct decisions but too late to be of any use to the organization. It is no longer
feasible to "store everything." Therefore, it is critical for a number of business situations
to analyze the data soon after it is created and/or as soon as it is streamed into the
analytics system.

3. What is stream analytics? How does it differ from regular analytics?

Stream analytics is the process of extracting actionable information from continuously
flowing/streaming data. It is also sometimes called "data in-motion analytics" or
"real-time data analytics." It differs from regular analytics in that it deals with
high-velocity (and transient) data streams instead of more permanent data stores like
databases, files, or web pages.

4. What is critical event processing? How does it relate to stream analytics?

Critical event processing is a method of capturing, tracking, and analyzing streams of
data to detect events (out of normal happenings) of certain types that are worthy of the
effort. It involves combining data from multiple sources to infer events or patterns of
interest. An event may also be defined generically as a "change of state," which may be
detected as a measurement exceeding a predefined threshold of time, temperature, or
some other value. This applies to stream analytics because the events are happening in
real time.
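A minimal sketch of threshold-based event detection on a stream. The sensor readings and the 90-degree alert threshold are invented for illustration:

```python
def detect_events(readings, threshold):
    """Flag a 'change of state': any reading that exceeds the threshold."""
    events = []
    for timestamp, value in readings:
        if value > threshold:
            events.append((timestamp, value))
    return events

# e.g., a temperature sensor stream with a 90-degree alert threshold
stream = [(1, 72.0), (2, 88.5), (3, 93.2), (4, 95.1), (5, 71.0)]
print(detect_events(stream, threshold=90.0))
# [(3, 93.2), (4, 95.1)]
```

A production event processor would evaluate each reading as it arrives and combine conditions across multiple sources, but the core test per element is the same simple predicate.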

5. Define data stream mining. What additional challenges are posed by data stream
mining?

Data stream mining is the process of extracting novel patterns and knowledge structures
from continuous, rapid data records. Processing data streams, as opposed to more
permanent data stores, is a challenge. Traditional data mining techniques can process
data recursively and repetitively because the data is permanent. By contrast, a data stream
is a continuous, ordered sequence of instances that can be read only once and must
be processed immediately as it arrives.
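The "read once, process immediately" constraint can be illustrated with an incremental (one-pass) running mean, which updates its statistics per instance and never stores past values:

```python
def running_stats():
    """Maintain count and mean over a stream in O(1) memory."""
    count, mean = 0, 0.0

    def update(x):
        nonlocal count, mean
        count += 1
        mean += (x - mean) / count  # incremental mean: no past values kept
        return count, mean

    return update

update = running_stats()
for x in iter([4.0, 8.0, 6.0]):  # each instance is seen exactly once
    count, mean = update(x)
print(count, mean)
# 3 6.0
```

Stream-mining algorithms generalize this idea: models are updated incrementally per instance, since re-reading the stream is impossible.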

6. What are the most fruitful industries for stream analytics?

Many industries can benefit from stream analytics. Some prominent examples include e-
commerce, telecommunications, law enforcement, cyber security, the power industry,
health sciences, and the government.

7. How can stream analytics be used in e-commerce?

Companies such as Amazon and eBay use stream analytics to analyze customer behavior
in real time. Every page visit, every product looked at, every search conducted, and every
click made is recorded and analyzed to maximize the value gained from a user’s visit.
Behind the scenes, advanced analytics crunch the real-time data coming from our
clicks, and the clicks of thousands of others, to "understand" what it is that we are
interested in (in some cases, before even we know it) and make the most of that
information through creative offerings.

8. In addition to what is listed in this section, can you think of other industries and/or
application areas where stream analytics can be used?

Stream analytics could be of great benefit to any industry that faces an influx of relevant
real-time data and needs to make quick decisions. One example is the news industry. By
rapidly sifting through data streaming in, a news organization can recognize
“newsworthy” themes (i.e., critical events). Another benefit would be for weather
tracking in order to better predict tornados or other natural disasters. (Different students
will have different answers.)

9. Compared to regular analytics, do you think stream analytics will have more (or less)
use cases in the era of Big Data analytics? Why?

Stream analytics can be thought of as a subset of analytics in general, just like “regular”
analytics. The question is, what does “regular” mean? Regular analytics may refer to
traditional data warehousing approaches, which does constrain the types of data sources
and hence the use cases. Or, “regular” may mean analytics on any type of permanent
stored architecture (as opposed to transient streams). In this case, you have more use
cases for “regular” (including Big Data) than in the previous definition. In either case,
there will probably be plenty of times when “regular” use cases will continue to play a
role, even in the era of Big Data analytics. (Different students will have different
answers.)

Section 9.8 Review Questions

1. Identify some of the key Big Data technology vendors whose key focus is on-premise
Hadoop platforms.

Student search results will vary, but may include Qubole, Yash, and Hortonworks.

2. What is special about the Big Data vendor landscape? Who are the big players?

The Big Data vendor landscape is developing very rapidly. It is in a special period of
evolution where entrepreneurial startup firms bring innovative solutions to the
marketplace. Cloudera is a market leader in the Hadoop space. MapR and Hortonworks
are two other Hadoop startups. DataStax is an example of a NoSQL vendor. Informatica,
Pervasive Software, Syncsort, and MicroStrategy are also players. Most of the growth in
the industry is with Hadoop and NoSQL distributors and analytics providers. There is still
very little in terms of Big Data application vendors. Meanwhile, the next-generation data
warehouse market has experienced significant consolidation. Four leading vendors in this
space—Netezza, Greenplum, Vertica, and Aster Data—were acquired by IBM, EMC,
HP, and Teradata, respectively. Mega-vendors Oracle and IBM also play in the Big Data
space, connecting and consolidating their products with Hadoop and NoSQL engines.

3. Search and identify the key similarity and differences between cloud providers’
analytics offerings and analytics providers’ presence on specific cloud platforms.

The results of student research will vary based on the date of the research and the
provider selected.

4. What are some of the features of a platform such as Teradata Vantage?

Results will vary based on the date of the search, but current features listed at
https://www.teradata.com/Products/Software/Vantage include:

• best analytics engine

• tool integration

• scalability

Section 9.9 Review Questions

1. Define cloud computing. How does it relate to PaaS, SaaS, and IaaS?

Cloud computing offers the possibility of using software, hardware, platform, and
infrastructure, all on a service-subscription basis. Cloud computing enables a more
scalable investment on the part of a user. Like PaaS, etc., cloud computing offers
organizations the latest technologies without significant upfront investment.

In some ways, cloud computing is a new name for many previous related trends:
utility computing, application service providers, grid computing, on-demand
computing, software as a service (SaaS), and even older centralized computing with
dumb terminals. But the term cloud computing originates from a reference to the
Internet as a "cloud" and represents an evolution of all previous shared/centralized
computing trends.

2. Give examples of companies offering cloud services.

3. How does cloud computing affect BI?

Companies offering such services include 1010data, LogiXML, and Lucid Era. These
companies offer extract, transform, and load capabilities as well as advanced
data analysis tools. Other companies, such as Elastra and Rightscale, offer dashboard
and data management tools that follow the SaaS and DaaS models.

4. How does DaaS change the way data is handled?

In the DaaS model, the actual platform on which the data resides doesn’t matter. Data
can reside in a local computer or in a server at a server farm inside a cloud-computing
environment. With DaaS, any business process can access data wherever it resides.
Customers can move quickly thanks to the simplicity of the data access and the fact
that they don’t need extensive knowledge of the underlying data.

5. What are the different types of cloud platforms?

The three different types of cloud platforms are IaaS, PaaS and SaaS.

6. Why is AaaS cost-effective?

AaaS in the cloud offers economies of scale and scope by providing many virtual
analytical applications with better scalability and higher cost savings. The capabilities
that a service orientation (along with cloud computing, pooled resources, and parallel
processing) brings to the analytics world enable cost-effective data/text mining,
large-scale optimization, highly complex multi-criteria decision problems, and
distributed simulation models.

7. Name at least three major cloud service providers.

Examples include Amazon, IBM Cloud, Microsoft Azure, Google App Engine, and
OpenShift.

8. Give at least three examples of analytics-as-a-service providers.

Examples include IBM Cloud, MineMyText.com, SAS Viya, Tableau and Snowflake.

Section 9.10 Review Questions

1. How does traditional analytics make use of location-based data?

Traditional analytics produces visual maps that are geographically mapped and based
on traditional location data, usually grouped by postal code. The use of postal codes
to represent the data is a somewhat static approach to achieving a higher-level view
of things.

2. How can geocoded locations assist in better decision making?

They help users understand "true location-based" impacts, and allow them to view
data at a higher granularity than that offered by traditional postal-code aggregations.
Adding location components based on latitudinal and longitudinal attributes to
traditional analytical techniques enables organizations to add a new dimension of
"where" to their traditional business analyses, which currently answer questions of
"who," "what," "when," and "how much." By integrating information about the
location with other critical business data, organizations are now creating location
intelligence (LI).
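The "where" dimension typically starts with distance between two geocoded points. A standard haversine (great-circle distance) sketch follows; the coordinates are arbitrary examples, not data from the text:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in kilometers."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))  # 6371 km = mean Earth radius

# e.g., distance from a store to a nearby competitor (example coordinates)
print(round(haversine_km(40.7128, -74.0060, 40.7484, -73.9857), 1))
```

Computations like this, applied across customer and competitor locations, are what let a retailer relate sales to proximity rather than to a coarse postal-code bucket.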

3. What is the value provided by geospatial analytics?

Geospatial analysis gives organizations a broader perspective and aids in decision
making. Location intelligence (LI) is enabling organizations to gain critical insights
and make better decisions by optimizing important processes and applications. By
incorporating demographic details into locations, retailers can determine how sales
vary by population level and proximity to other competitors; they can assess the
demand and efficiency of supply chain operations. Consumer product companies can
identify the specific needs of the customers and customer complaint locations, and
easily trace them back to the products. Sales reps can better target their prospects by
analyzing their geography.

4. Explore the use of geospatial analytics further by investigating its use across various
sectors like government census tracking, consumer marketing, and so forth.

Students’ answers will vary.

5. Search online for other applications of consumer-oriented analytical applications.

Students’ answers will vary.

6. How can location-based analytics help individual consumers?

If a user enters data on a smartphone, the phone's location sensors can help
find others in that location who are facing similar circumstances, as well as local
companies providing services and products that the consumer desires. The user
can thus see what others in their location are choosing, and the opportunities for
meeting his or her needs. Conversely, the user's behaviors and choices can then
contribute information to other consumers in the same location.

7. Explore more transportation applications that may employ location-based analytics.

Students’ answers will vary.

8. What other applications can you imagine if you were able to access cell phone location
data?

Students’ answers will vary.

ANSWERS TO APPLICATION CASE QUESTIONS FOR DISCUSSION  
Application Case 9.1: Big Data Analytics Helps Luxottica Improve Its Marketing
Effectiveness

1. What does Big Data mean to Luxottica?

For Luxottica, Big Data includes everything they can find about their customer
interactions (in the form of transactions, click streams, product reviews, and social
media postings). They see this as constituting a massive source of business
intelligence for potential product, marketing, and sales opportunities.

2. What were their main challenges?

Because Luxottica outsourced both data storage and promotional campaign
development and management, there was a disconnect between data analytics and
marketing execution. Their competitive posture and strategic growth initiatives
were compromised for lack of an individualized view of their customers and an
inability to act decisively and consistently on the different types of information
generated by each retail channel.

3. What was the proposed solution, and the obtained results?

Luxottica deployed the Customer Intelligence Appliance (CIA) from IBM
Business Partner Aginity LLC. This product is built on IBM PureData System for
Analytics. This solution helps Luxottica highly segment customer behavior and
provide a platform and smart database for marketing execution systems, such as
campaign management, e-mail services, and other forms of direct marketing.
Anticipated benefits include a 10% improvement in marketing effectiveness,
identifying the highest-valued customers, and the ability to target customers based
on preference and history.

Application Case 9.2: Overstock.com Combines Multiple Datasets to Understand
Customer Journeys
1. What are some of the different marketing campaigns a company might run to woo
customers? What format might data about these campaigns take?

There are a wide variety of marketing campaigns, but those mentioned within the case
include targeted online and direct mail campaigns, advertising through various channels,
and growing the loyalty program by providing different customer incentives.

2. By visualizing the most common customer paths to sales, how would you use that
information to make decisions on the future marketing campaigns?

By visualizing these paths you would be able to determine the effectiveness or necessity
of each of the different types of marketing campaigns and determine if there was
important interplay between them to generate purchase events.

3. What other applications of such path analysis techniques can you think of?

Student opinions and ideas will vary, but many retail businesses use a similar marketing
mix with a variety of different advertising methods.

Application Case 9.3: eBay’s Big Data Solution

1. Why is Big Data a big deal for eBay?

eBay is the world’s largest online marketplace, and its success requires the ability
to turn the enormous volumes of data it generates into useful insights for
customers. Big Data is essential for this effort.

2. What were the challenges, the proposed solution, and the obtained results?

eBay was experiencing explosive data growth and needed a solution that did not
have the typical bottlenecks, scalability issues, and transactional constraints
associated with common relational database approaches. The company also
needed to perform rapid analysis on a broad assortment of the structured and
unstructured data it captured. eBay’s solution includes NoSQL via Apache
Cassandra and DataStax Enterprise. It also uses integrated Apache Hadoop
analytics that come with DataStax. The solution incorporates a scale-out
architecture that enables eBay to deploy multiple DataStax Enterprise clusters
across several different data centers using commodity hardware. Now, eBay can
more cost effectively process massive amounts of data at very high speeds. The
new architecture serves a wide variety of new use cases, and its reliability and
fault tolerance have been greatly enhanced.

3. Can you think of other e-commerce businesses that may have Big Data challenges
comparable to that of eBay?

Any large company with a significant online presence will face challenges similar
to eBay's. Amazon and Walmart are two of the largest.

Application Case 9.4: Understanding Quality and Reliability of Healthcare Support
Information on Twitter
1. What was the data scientists’ main concern regarding health information that is
disseminated on the Twitter platform?

The primary concern was the quality and accuracy of the information that was being
distributed.

2. How did the data scientists ensure that nonexpert information disseminated on social
media could indeed contain valuable health information?

The scientists evaluated the type of information that was distributed and the size of the
following of the user for both influential and non-influential users. They found that
influential users typically provided higher quality information with more discussion of
facts.

3. Does it make sense that influential users would share more objective information
whereas less influential users could focus more on subjective information? Why?

Student opinions will vary, but it may be that users gain influence by providing higher
quality information and other users are able to identify this information themselves.

Application Case 9.5: Using Natural Language Processing to Analyze Customer
Feedback in TripAdvisor Reviews
1. How did the predictive modelling help TripAdvisor?

The company was able to use historical information on user reviews to predict the type of
review they would give a particular restaurant.

2. Why was Spark used?

Spark was used because of its support for parallel processing and in-memory processing.
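
Spark expresses computations as chained transformations (map, filter, reduce) over datasets that can be cached in memory between steps. The following is a toy pure-Python analogue of that style, not actual PySpark code; the review snippets and sentiment word lists are invented for illustration:

```python
from functools import reduce

# Hypothetical review snippets standing in for TripAdvisor data.
reviews = [
    "great food and friendly staff",
    "terrible service but great view",
    "awful food, slow and rude staff",
]
POSITIVE = {"great", "friendly", "tasty"}
NEGATIVE = {"terrible", "awful", "slow", "rude"}

def score(review):
    """Map step: turn one review into a (positive, negative) word count."""
    words = review.replace(",", "").split()
    return (sum(w in POSITIVE for w in words),
            sum(w in NEGATIVE for w in words))

# Map each review, then reduce the per-review counts into corpus totals.
# In Spark this would be something like rdd.map(score).reduce(add_pairs),
# with the mapped dataset cached in memory for reuse across jobs.
scores = list(map(score, reviews))  # the "cached" intermediate result
totals = reduce(lambda a, b: (a[0] + b[0], a[1] + b[1]), scores)
print(totals)
```

Because each review is scored independently in the map step, Spark can distribute that work across a cluster and keep intermediate results in memory rather than writing them to disk between stages.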

Application Case 9.6: Salesforce Is Using Streaming Data to Enhance Customer
Value
1. Are there areas in any industry where streaming data is irrelevant?

Student opinions will vary, but it would be difficult to find an industry.

2. Besides customer retention, what are other benefits of using predictive analytics?

Companies are able to make predictions and decisions about their consumers more
rapidly. This ensures that businesses target, attract, and retain the right customers and
maximize their value.

Application Case 9.7: Using Social Media for Nowcasting Flu Activity

1. Why would social media be able to serve as an early predictor of flu outbreaks?

Social media can be used because individuals will mention when they become sick, and this
information, along with the user's geographic location, can be analyzed with models that
have historically been used for disease propagation.
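
The core of such a nowcasting pipeline is simple in outline: scan posts for illness-related terms, group the hits by the poster's location, and feed the resulting counts to a propagation model. A minimal sketch, with invented posts, regions, and keyword list:

```python
from collections import Counter

FLU_TERMS = {"flu", "fever", "chills", "sick"}

# Hypothetical geotagged posts (text, region) standing in for a Twitter feed.
posts = [
    ("home with the flu and a fever", "Dallas"),
    ("great concert tonight", "Dallas"),
    ("everyone at work is sick", "Austin"),
    ("fever and chills all day", "Austin"),
]

def flu_mentions_by_region(posts):
    """Count posts per region that contain at least one flu-related term."""
    counts = Counter()
    for text, region in posts:
        if FLU_TERMS & set(text.split()):  # any keyword present?
            counts[region] += 1
    return counts

counts = flu_mentions_by_region(posts)
print(dict(counts))  # these counts would feed an epidemiological model
```

A production system would of course need far more careful text processing (misspellings, negation, sarcasm), which is where the Big Data tooling discussed in this chapter comes in.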

2. What other variables might help in predicting such outbreaks?

Student opinions will vary, but some examples might include the density level of
individuals in metropolitan areas and the percentage of individuals active on the
particular social media platform.

3. Why would this problem be a good problem to solve using Big Data technologies
mentioned in this chapter?

This is a very complex problem and process since data needs to be extracted for a
secondary use from a social media platform, geographic information tagged and then an
analysis performed.

Application Case 9.8: Analyzing Disease Patterns from an Electronic Medical
Records Data Warehouse

1. What could be the reasons behind the health disparities across gender?

Students' opinions on these topics will vary. One possible answer is that individuals have
different lifestyles related to their gender that affect these results.

2. What are the main components of a network?

A network is comprised of a defined set of items called nodes, which are linked to each
other through edges.
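
As a concrete illustration of nodes and edges, a comorbidity network can be held in a plain adjacency structure, where each node is a diagnosis and an edge links two diagnoses that co-occur in patient records. The diagnoses and links below are invented for illustration:

```python
from collections import defaultdict

# Undirected network: nodes are diagnoses, edges are co-occurrences.
edges = [
    ("diabetes", "hypertension"),
    ("diabetes", "obesity"),
    ("hypertension", "heart disease"),
]

adjacency = defaultdict(set)
for a, b in edges:
    adjacency[a].add(b)
    adjacency[b].add(a)

# Degree = number of edges touching a node; in a comorbidity network,
# high-degree nodes are diagnoses that co-occur with many others.
degree = {node: len(neighbors) for node, neighbors in adjacency.items()}
print(degree)
```

Real network analyses (as in this application case) would also weight edges by co-occurrence frequency and compare the resulting structures across patient subgroups.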

3. What type of analytics was applied in this application?

To compare the comorbidities, a network analysis approach was applied.

Application Case 9.9: Major West Coast Utility Uses Cloud-Mobile Technology to
Provide Real-Time Incident Reporting

1. How does cloud technology impact enterprise software for small and mid-size
businesses?

It has made deployment of software easier, without issues related to hardware,
installation, and backups. Because all of the business logic and configurations are
stored in the cloud, the solution itself can act as a stand-alone system for customers
who have no backend systems.

2. What are some of the areas where businesses can use mobile technology?

Mobile technology is ideal when employees themselves are mobile either due to the
nature of their work or because they are away from the office.

3. What types of businesses are likely to be the forerunners in adopting cloud-mobile
technology?

In addition to businesses that have compelling needs to communicate with employees that
are separate from an office location, businesses that have needs for distributed
applications without large budgets or IT staffs may benefit.

4. What are the advantages of cloud-based enterprise software instead of the traditional
on-premise model?

There are a number of benefits, including the ability to use familiar hardware that does
not need to be standardized, the ability to easily perform software updates, and the ability
to store data in the cloud (for both ease of access and backup reasons).

5. What are the likely risks of cloud versus traditional on-premise applications?

Student opinions will vary, but there will always be a concern about data security and
privacy.

Application Case 9.10: Great Clips Employs Spatial Analytics to Shave Time in
Location Decisions

1. How is geospatial analytics employed at Great Clips?

Great Clips depends on a growth strategy that is driven by rapidly opening new stores in
the right locations and markets. They use geospatial analysis to help analyze the locations
based on the requirements for a potential customer base, demographic trends, and sales
impact on existing franchises in the target location. They use their Alteryx-based solution
to evaluate each new location based on demographics and consumer behavior data,
aligning with existing Great Clips customer profiles and the potential revenue impact of
the new site on the existing sites.

2. What criteria should a company consider in evaluating sites for future locations?

Major criteria include potential customer base, demographic trends, and sales
impact on existing franchises in the target location.

3. Can you think of other applications where such geospatial data might be useful?

Geospatial data can be used to help customers find the right location (for example, the
closest Great Clips location). It is certainly relevant for other companies in a variety of
industries. Analyzing customer profiles and applying these to geographic information can
assist with many retail firms. Another possibility is utilizing geospatial analysis to find
locations for manufacturing facilities; in this case you would be looking for supplier and
raw materials’ locations more than customer locations. In the consumer market,
geospatial analysis can help users in a variety of applications; for example finding the
best locations for restaurants or stores catering to the customer’s desires. (Student
answers will vary.)

Application Case 9.11: Starbucks Exploits GIS and Analytics to Grow Worldwide
1. What type of demographics and GIS information would be relevant for deciding on a
store location?

Information such as trade areas, retail clusters and generators, traffic, and demographics
is important in deciding the next store’s location.

2. It has been mentioned that Starbucks encourages its customers to use its mobile app.
What type of information might the company gather from the app to help it better plan
operations?

The company will be able to determine where the user is ordering from, and the distance
to the nearest Starbucks from that location.
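
Finding the distance from a mobile order to the nearest store is a standard great-circle (haversine) calculation. The store coordinates below are placeholders for illustration, not real Starbucks locations:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometers."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * asin(sqrt(a))  # mean Earth radius ~6371 km

# Placeholder store coordinates and a user's current position.
stores = {
    "store_a": (47.6062, -122.3321),
    "store_b": (47.6205, -122.3493),
}
user = (47.6097, -122.3331)

# Pick the store with the smallest great-circle distance to the user.
nearest = min(stores, key=lambda s: haversine_km(*user, *stores[s]))
print(nearest)
```

Aggregated over many orders, such distances can reveal underserved areas where a new store would shorten customers' trips.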

3. Will the availability of free Wi-Fi at Starbucks stores provide any information to
Starbucks for better analytics?

Student ideas and responses will vary.

ANSWERS TO END OF CHAPTER QUESTIONS FOR DISCUSSION   


1. What is Big Data? Why is it important? Where does Big Data come from?

Traditionally, the term “Big Data” has been used to describe the massive volumes
of data analyzed by huge organizations like Google or research science projects at
NASA. But for most businesses, it’s a relative term: “Big” depends on an
organization’s size. In general, Big Data exceeds the reach of commonly used
hardware environments and/or capabilities of software tools to capture, manage,
and process it within a tolerable time span for its user population. Big Data has
become a popular term to describe the exponential growth, availability, and use of
information, both structured and unstructured. Big data is important because it is
there and because it is rich in information and insight that, if effectively tapped,
can lead to better business decisions and improved company performance. Much
has been written on the Big Data trend and how it can serve as the basis for
innovation, differentiation, and growth. Big Data includes both structured and
unstructured data, and it comes from everywhere: data sources include Web logs,
RFID, GPS systems, sensor networks, social networks, Internet-based text
documents, Internet search indexes, and detailed call records, to name just a few.
Big Data is not just about volume, but also variety, velocity, veracity, and value
proposition.

2. What do you think the future of Big Data will be? Will it lose its popularity to
something else? If so, what will it be?

Big Data could evolve at a rapid pace. The buzzword “Big Data” might change to
something else, but the trend toward increased computing capabilities, analytics
methodologies, and data management of high volume heterogeneous information
will continue. (Different students may have different answers.)

3. What is Big Data analytics? How does it differ from regular analytics?

Big Data analytics is analytics applied to Big Data architectures. This is a new
paradigm; in order to keep up with the computational needs of Big Data, a
number of new and innovative analytics computational techniques and platforms
have been developed. These techniques are collectively called high-performance
computing, and include in-memory analytics, in-database analytics, grid
computing, and appliances. They differ from regular analytics, which tends to focus
on relational database technologies.

4. What are the critical success factors for Big Data analytics?

Critical factors include a clear business need, strong and committed sponsorship,
alignment between the business and IT strategies, a fact-based decision culture, a
strong data infrastructure, the right analytics tools, and personnel with advanced
analytic skills.

5. What are the big challenges that one should be mindful of when considering
implementation of Big Data analytics?

Traditional ways of capturing, storing, and analyzing data are not sufficient for
Big Data. Major challenges are the vast amount of data volume, the need for data
integration to combine data of different structures in a cost-effective manner, the
need to process data quickly, data governance issues, skill availability, and
solution costs.

6. What are the common business problems addressed by Big Data analytics?

Here is a list of problems that can be addressed using Big Data analytics:

• Process efficiency and cost reduction
• Brand management
• Revenue maximization, cross-selling, and up-selling
• Enhanced customer experience
• Churn identification, customer recruiting
• Improved customer service
• Identifying new products and market opportunities
• Risk management
• Regulatory compliance
• Enhanced security capabilities

7. In the era of Big Data, are we about to witness the end of data warehousing?
Why?

What has changed the landscape in recent years is the variety and complexity of
data, which made data warehouses incapable of keeping up. It is not the volume
of the structured data but the variety and the velocity that forced the world of IT
to develop a new paradigm, which we now call “Big Data.” But this does not
mean the end of data warehousing. Data warehousing and RDBMS still bring
many strengths that make them relevant for BI and that Big Data techniques do
not currently provide.

8. What are the use cases for Big Data/Hadoop and data warehousing/RDBMS?

In terms of its use cases, Hadoop is differentiated two ways: first, as the
repository and refinery of raw data, and second, as an active archive of historical
data. Hadoop, with its distributed file system and flexibility of data formats
(allowing both structured and unstructured data), is advantageous when working
with information commonly found on the Web, including social media,
multimedia, and text. Also, because it can handle such huge volumes of data (and
because storage costs are minimized due to the distributed nature of the file
system), historical (archive) data can be managed easily with this approach.
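
Hadoop's programming model, MapReduce, splits work into a map phase that emits key–value pairs and a reduce phase that aggregates them, with a shuffle in between that groups values by key. The classic word count, sketched here in plain Python as a single-machine illustration of the model (not Hadoop itself):

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in the input split."""
    return [(word, 1) for word in document.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework
    does between the map and reduce phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: aggregate the grouped values for each key."""
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["big data big insights", "big data tools"]
pairs = [pair for doc in docs for pair in map_phase(doc)]
counts = reduce_phase(shuffle(pairs))
print(counts)
```

On an actual cluster, each document split is mapped on the node that stores it, so computation moves to the data rather than the reverse, which is what makes the approach economical at scale.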

Three main use cases for data warehousing are performance, integration, and the
availability of a wide variety of BI tools. The relational data warehouse approach
is quite mature, and database vendors are constantly adding new index types,
partitioning, statistics, and optimizer features. This enables complex queries to be
done quickly, a must for any BI application. Data warehousing, and the ETL
process, provide a robust mechanism for collecting, cleaning, and integrating data.
And, it is increasingly easy for end users to create reports, graphs, and
visualizations of the data.

9. Is cloud computing “just an old wine in a new bottle?” How is it similar to other
initiatives? How is it different?

In some ways, cloud computing is a new name for many previous related trends:
utility computing, application service providers, grid computing, on-demand
computing, software as a service (SaaS), and even older centralized computing
with dumb terminals. But the term cloud computing originates from a reference to
the Internet as a “cloud” and represents an evolution of all previous
shared/centralized computing trends.

Cloud computing offers the possibility of using software, hardware, platform, and
infrastructure, all on a service-subscription basis. Cloud computing enables a
more scalable investment on the part of a user. Like PaaS and similar service models,
cloud computing offers organizations the latest technologies without significant upfront investment.

10. What is stream analytics? How does it differ from regular analytics?

Stream analytics is the process of extracting actionable information from
continuously flowing/streaming data. It is also sometimes called “data-in-motion
analytics” or “real-time data analytics.” It differs from regular analytics in that it
deals with high-velocity (and transient) data streams instead of more permanent
data stores (like databases, files, or web pages).
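
A simple way to see the difference is a sliding-window computation: each arriving event updates a statistic over only the most recent readings, and older data is discarded rather than stored. A minimal sketch with a made-up sensor stream:

```python
from collections import deque

def rolling_averages(stream, window=3):
    """Yield the average of the last `window` readings as each one arrives."""
    recent = deque(maxlen=window)  # old readings fall off automatically
    for reading in stream:
        recent.append(reading)
        yield sum(recent) / len(recent)

# Made-up sensor readings arriving one at a time.
readings = [10, 20, 30, 40]
print(list(rolling_averages(readings)))
```

Production stream-processing engines apply the same windowing idea at scale, partitioning streams across machines and handling out-of-order and late-arriving events.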

11. What are the most fruitful industries for stream analytics? What is common to
those industries?

Many industries can benefit from stream analytics. Some prominent examples
include e-commerce, telecommunications, law enforcement, cyber security, the
power industry, health sciences, and the government.

12. Compared to regular analytics, do you think stream analytics will have more (or
fewer) use cases in the era of Big Data analytics? Why?

Stream analytics can be thought of as a subset of analytics in general, just like
“regular” analytics. The question is, what does “regular” mean? Regular analytics
may refer to traditional data warehousing approaches, which does constrain the
types of data sources and hence the use cases. Or, “regular” may mean analytics
on any type of permanent stored architecture (as opposed to transient streams). In
this case, you have more use cases for “regular” (including Big Data) than in the
previous definition. In either case, there will probably be plenty of times when
“regular” use cases will continue to play a role, even in the era of Big Data
analytics. (Different students will have different answers.)

13. What are the potential benefits of using geospatial data in analytics? Give examples.

Geospatial analytics gives organizations a broader perspective and aids in decision
making. Geospatial data helps companies with managing operations, targeting
customers, and deciding on promotions. It also helps consumers directly, making use

of integrated sensor technologies and global positioning systems installed in their
smartphones. Using geospatial data, companies can identify the specific needs of the
customers and customer complaint locations, and easily trace them back to the
products. Another example is in the telecommunications industry, where geospatial
analysis can enable communication companies to capture daily transactions from a
network to identify the geographic areas experiencing a large number of failed
connection attempts of voice, data, text, or Internet.

14. What types of new applications can emerge from knowing locations of users in real
time? What if you also knew what they have in their shopping cart, for example?

One prominent application is in the emerging area of reality mining, which uses
location-enabled devices for finding nearby services, locating friends and family,
navigating, tracking of assets and pets, dispatching, and engaging in sports, games,
and hobbies. Adding shopping cart knowledge will enhance the application’s ability
to provide targeted information to a customer; for example, the app could find prices
for similar products in nearby stores.

15. How can consumers benefit from using analytics, especially based on location
information?

Consumer-oriented analytics based on location information fit into two major
categories: (a) GPS navigation and data analysis, and (b) historic and current
location demand analysis. Consumers benefit from analytics-based applications in
many areas, including fun and health, as well as enhanced personal productivity.

16. “Location-tracking–based profiling is powerful but also poses privacy threats.”
Comment.

Students’ answers will differ. Privacy threats relate to user-profiling, intrusive use of
personal information, and not being able to control what is being collected.

17. Is cloud computing “just an old wine in a new bottle?” How is it similar to other
initiatives? How is it different?

In some ways, cloud computing is a new name for many previous related trends:
utility computing, application service providers, grid computing, on-demand
computing, software as a service (SaaS), and even older centralized computing with dumb
terminals. But the term cloud computing originates from a reference to the Internet as a
“cloud” and represents an evolution of all previous shared/centralized computing
trends.

Cloud computing offers the possibility of using software, hardware, platform, and
infrastructure, all on a service-subscription basis. Cloud computing enables a more
scalable investment on the part of a user. Like PaaS and similar service models, cloud
computing offers organizations the latest technologies without significant upfront investment.

18. Discuss the relationship between mobile devices and social networking.

Mobile social networking enables social networking where members converse and
connect with one another using cell phones or other mobile devices.

ANSWERS TO END OF CHAPTER EXERCISES


Teradata University Network (TUN) and Other Hands-on Exercises

1. Go to teradatauniversitynetwork.com, and search for case studies. Read cases and
white papers that talk about Big Data analytics. What is the common theme in those case
studies?

Student research and reports will vary.

2. At teradatauniversitynetwork.com, find the SAS Visual Analytics white papers, case
studies, and hands-on exercises. Carry out the visual analytics exercises on large data sets
and prepare a report to discuss your findings.

Student performance of tasks will vary as will their analysis.

3. At teradatauniversitynetwork.com, go to the Sports Analytics page. Find applications
of Big Data in sports. Summarize your findings.

Student research and reports will differ.

4. Go to teradatauniversitynetwork.com, and search for BSI Videos that talk about Big
Data. Review these BSI videos, and answer the case questions related to them.

Students' answers to the case questions will vary.

5. Go to the teradata.com and/or asterdata.com Web sites. Find at least three customer
case studies on Big Data, and write a report where you discuss the commonalities and
differences of these cases.

Student selection of case studies and the resulting reports will differ.

6. Go to IBM.com. Find at least three customer case studies on Big Data, and write a
report where you discuss the commonalities and differences of these cases.

Student selection of cases will cause differences in the reports.

7. Go to cloudera.com. Find at least three customer case studies on Hadoop
implementation, and write a report where you discuss the commonalities and differences
of these cases.

Students will select different cases.

8. Go to mapr.com. Find at least three customer case studies on Hadoop implementation,
and write a report where you discuss the commonalities and differences of these cases.

Students will select different cases.

9. Go to hortonworks.com. Find at least three customer case studies on Hadoop
implementation, and write a report in which you discuss the commonalities and
differences of these cases.

Student selection and analysis will vary.

10. Go to marklogic.com. Find at least three customer case studies on Hadoop
implementation, and write a report where you discuss the commonalities and differences
of these cases.

Student selection and analysis will vary.

11. Go to youtube.com. Search for videos on Big Data computing. Watch at least two.
Summarize your findings.

Students will select different videos and have different reactions.

12. Go to google.com/scholar, and search for articles on stream analytics. Find at least
three related articles. Read and summarize your findings.

Student summaries will vary based on the data search and the articles found.

13. Enter google.com/scholar, and search for articles on data stream mining. Find at least
three related articles. Read and summarize your findings.

Selection of articles will vary.

14. Enter google.com/scholar, and search for articles that talk about Big Data versus data
warehousing. Find at least five articles. Read and summarize your findings.

Student research will differ.

15. Location-tracking–based clustering provides the potential for personalized services
but challenges for privacy. Divide the class into two parts to argue for and against such
applications.

Student opinions and debate positions will be different.

16. Enter YouTube.com. Search for videos on cloud computing, and watch at least two.
Summarize your findings.

Students will find and select different videos.


17. Enter Pandora.com. Find out how you can create and share music with friends.
Explore how the site analyzes user preferences.

Students will have different impressions of the service.

18. Enter Humanyze.com. Review various case studies and summarize one interesting
application of sensors in understanding social exchanges in organizations.

Students will select different case studies and have different reactions.

19. The objective of the exercise is to familiarize you with the capabilities of
smartphones to identify human activity. The data set is available at
archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones. It
contains accelerometer and gyroscope readings on 30 subjects who had the smartphone
on their waist. The data is available in a raw format and involves some data preparation
efforts. Your objective is to identify and classify these readings into activities like
walking, running, climbing, and such. More information on the data set is available on
the download page. You may use clustering for initial exploration and to gain an
understanding of the data. You may use tools like R to prepare and analyze this data.

Student analysis and subsequent reports will be different.

