
BIG DATA AS A SERVICE (BDAAS) AND ARCHITECTURE
SESSION 2
ONLY FOR ACADEMIC PURPOSE (PREPARED BY DR. PREETI KHANNA)
DISCUSSION QUESTIONS
Question 1. Data partnerships, and data sharing more broadly, are creating important
new sources of business value but at the same time generating new and as yet unanswered
questions regarding data ownership, data accessibility, and data rights. To the extent that
companies rely on data sharing for business value, who, in your view, should manage the
risks that accompany such dependence? And how can these risks be mitigated?

Question 2. List at least 3 case scenarios to show how the analytics revolution is currently
transforming organizations as well as the economies and societies in which they operate.

Question 3. What is the role of leaders in transforming a company into a data-driven organization?

Question 4. To become more competitive and more efficient, companies need to look at
the broader set of related risks, incorporate more data sources, use better tools that allow
them to move to real-time or near-real-time analysis, and increase data volumes. List the
key questions that could be used to assess an organization's readiness to truly start
benefiting from big data.
DATA ECOSYSTEM: FUTURE TRAJECTORY OF VALUE
Data Generation and Collection: Sources and platforms where data is
initially captured.
Data Aggregation: Processes and platforms for combining data from
multiple sources.
Data Analysis: Deriving insights from data that can be acted upon.

Most literature on big data analytics focuses on two questions:

• How can it be used to enhance tactical organizational capabilities?
• How can it create strategic value for the organization?



GARTNER 2017 HYPE CYCLE ABOUT ANALYTICS AND BI



ANALYTICS AS A SERVICE (AAAS)
Analytics as a service (AaaS) is an innovative way to create business value;
the idea is essentially the same as in platform-as-a-service (PaaS) models.

AaaS can support the different kinds of data sources that companies need. Offerings include:
Sensor Event as a Service,
Video Surveillance as a Service,
Big Data Analytics as a Service, and
Data as a Service.

As with other cloud services, revenue may be based on the hours customers use
the service and the storage capacity they consume.
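The usage-based pricing idea above can be made concrete with a small calculation. The following is a minimal sketch only: the rates, the `UsageRecord` type, and the `monthly_charge` function are hypothetical names and values chosen for illustration, not figures from any real provider.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    compute_hours: float      # hours the customer ran the analytics service
    storage_gb_months: float  # average storage held during the month, in GB


# Hypothetical prices, assumed purely for illustration.
HOURLY_RATE = 0.50    # currency units per hour of service use
STORAGE_RATE = 0.02   # currency units per GB stored per month


def monthly_charge(usage: UsageRecord) -> float:
    """Return a pay-per-use charge: time used plus storage consumed."""
    total = usage.compute_hours * HOURLY_RATE + usage.storage_gb_months * STORAGE_RATE
    return round(total, 2)


charge = monthly_charge(UsageRecord(compute_hours=120, storage_gb_months=500))
print(charge)
```

Under these assumed rates, a customer who uses the service more hours or stores more data pays proportionally more, which is the revenue model the slide describes.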



BIG DATA AS A SERVICE (BDAAS)
The BDaaS service stack is composed of layers that group technology types according to
the function they perform.
Each of the six layers has a specific use and abstracts the complexity of distributed big
data processing from end users.
The lower layers of the BDaaS service stack are closer to the IaaS platform of cloud
computing.
The upper layers of the BDaaS stack include a presentation layer through which users
access the services.
The ingredients necessary for BDaaS include:
Service-Oriented Architecture.
Cloud Virtualization Capabilities.
Complex Event-Driven Processing.
Business Intelligence Tools.



BIG DATA AS A SERVICE



TO DO ACTIVITY
| Layer | Benefits | Concerns | End Users |
|---|---|---|---|
| Data Analytics | | | |
| Data Management | | | |
| Computing | | | |
| Data Storage | | | |
| Cloud Infrastructure | | | |
| Infrastructure | | | |



BDAAS: ADVANTAGES AND DISADVANTAGES OF EACH LAYER FOR USERS
| Layer | Benefits | Concerns | End Users |
|---|---|---|---|
| Data Analytics | Users can readily access analytics services without the hassle of data or infrastructure management | No direct access to data; analytics limited to the data exposed in the layer | Business users; data scientists; business analysts |
| Data Management | Direct access to data; ability to analyze or modify complex data sets | Requires programming knowledge to operate | Database administrators; programmers |
| Computing | Most flexibility, as programmers can write programs to manipulate data | Requires programming knowledge to operate | Programmers |
| Data Storage | Access to raw data in the distributed storage | Requires programming knowledge to operate | Programmers |



BDAAS: ADVANTAGES AND DISADVANTAGES OF EACH LAYER FOR USERS (CONTINUED)
| Layer | Benefits | Concerns | End Users |
|---|---|---|---|
| Cloud Infrastructure | Can be used to instantiate IT infrastructure; ability to determine the capability of the overlying infrastructure | Requires infrastructure knowledge to operate | Infrastructure engineers; programmers |
| Infrastructure | Basic hardware on which computing infrastructure is based | | Infrastructure engineers |
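The layer-to-user mapping in the two tables above can be held in a small data structure and queried. This is only an illustrative sketch: the names `BDAAS_LAYERS` and `layers_for` are assumptions made here, and the layer order (top of the stack first) follows the order used on the slides.

```python
# Each entry pairs a BDaaS layer with the end users listed for it in the tables.
BDAAS_LAYERS = [
    ("Data Analytics", ["Business users", "Data scientists", "Business analysts"]),
    ("Data Management", ["Database administrators", "Programmers"]),
    ("Computing", ["Programmers"]),
    ("Data Storage", ["Programmers"]),
    ("Cloud Infrastructure", ["Infrastructure engineers", "Programmers"]),
    ("Infrastructure", ["Infrastructure engineers"]),
]


def layers_for(user_type):
    """Return the layers, top of the stack first, that list the given end user."""
    return [name for name, users in BDAAS_LAYERS if user_type in users]


print(layers_for("Programmers"))
print(layers_for("Business users"))
```

The lookup makes the pattern in the tables visible: business-facing users appear only at the top layer, while programmers span the middle of the stack.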



BIG DATA ARCHITECTURE:
1. IBM WATSON
WATSON:
Watson can “think” or “reason” in a way very similar to the human brain. It
processes information, draws conclusions, and learns from its experiences.
Watson does not use predefined rules and structured queries to uncover
answers; instead, it generates hypotheses based on a wide variety of
potentially relevant information and connections.
Answers are expressed as recommendations along with confidence rankings.

IBM Watson is an ecosystem of cognitive computing capabilities.

IBM Watson uses Spark to manage incoming data streams.
It also uses Spark’s machine learning library to analyze data.
WHAT IS DEEPQA?
QA: Question Answering
Systems designed to answer questions posed in natural language.

Goal: create a system capable of playing Jeopardy! at human championship level,
in real time.

IBM’s follow-up project to Deep Blue.

Generates many hypotheses, collects a wide range of evidence, analyzes the
evidence along different dimensions, and balances the combined confidences.
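The idea of scoring hypotheses along several evidence dimensions and ranking them by a combined confidence can be illustrated with a toy example. This is a sketch under stated assumptions, not IBM's implementation: the evidence dimensions, weights, bias, and candidate scores below are all made up for illustration, and the logistic combination is simply one common way to turn several scores into a single confidence.

```python
import math

# Hypothetical weights for three evidence dimensions; a real system
# would learn such weights from training data.
WEIGHTS = {"text_match": 2.0, "type_match": 1.5, "source_reliability": 1.0}
BIAS = -2.0


def confidence(evidence):
    """Combine per-dimension evidence scores (each in 0..1) into one confidence."""
    z = BIAS + sum(weight * evidence.get(name, 0.0) for name, weight in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))  # logistic function maps z into (0, 1)


def rank(hypotheses):
    """Return (answer, confidence) pairs sorted from most to least confident."""
    scored = [(answer, confidence(evidence)) for answer, evidence in hypotheses.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up candidate answers with made-up evidence scores.
candidates = {
    "candidate_a": {"text_match": 0.4, "type_match": 0.1, "source_reliability": 0.5},
    "candidate_b": {"text_match": 0.8, "type_match": 0.9, "source_reliability": 0.7},
}
ranked = rank(candidates)
for answer, score in ranked:
    print(f"{answer}: {score:.2f}")
```

The output is a ranked list of answers with confidences, which mirrors how the slides describe answers being expressed as recommendations with confidence rankings.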
DEEPQA ARCHITECTURE
Watson Explorer Platform: the core expandable cognitive indexing and
natural language search framework



BIG DATA ARCHITECTURE:
2. E-BAY
eBay is using big data and machine learning solutions to address use cases such as
personalization and merchandising to improve the user's experience.

eBay has one of the most mature enterprise data platforms in the industry, with over
200 PB of data stored in its Hadoop and Teradata warehouses.

On average, 30 TB of transactional and behavioral data is extracted daily, and
thousands of metrics are computed, analyzed, and monitored for decision making and
anomaly detection.

eBay is currently working with several tools, including Apache Spark, Storm, Kafka, and
Hortonworks HDF.

eBay models personalization on structured data (e.g. purchases) and unstructured data
(e.g. behavioral activity synopsis).
BIG DATA ARCHITECTURE:
3. ORACLE REFERENCE
Reference Architecture: High-Level Logical View by Oracle



At the base of the reference architecture is the Shared Infrastructure Layer. This layer
includes the hardware and platforms on which the Big Data and Analytics components run.

The Information Layer includes all information management components, i.e.
Data stores, and
Components to capture, move, integrate, process, and virtualize data.

The Services Layer includes components that provide or perform commonly used services:
Presentation Services,
Information Services,
Business Activity Monitoring,
Business Rules, and
Event Handling.
The Process Layer represents components that perform higher-level processing activities, such as
• Analytical processes,
• Intelligence gathering, and
• Performance management processes.

The Interaction Layer comprises components used to support interaction with end
users. Common artifacts for this layer include
dashboards,
reports, charts, graphs, and spreadsheets.

The results of analysis can be delivered via many different channels. The architecture calls
out common IP network based channels such as desktops and laptops, common mobile
network channels such as mobile phones and tablets, and other channels such as email,
SMS, and hardcopy.

The architecture is supported by monitoring, management, security, and governance.


REAL TIME ANALYTICS COMPONENTS OF THE LOGICAL ARCHITECTURE



INTELLIGENT PROCESS COMPONENT IN THE LOGICAL ARCHITECTURE



BIG DATA ARCHITECTURE:
4. MICROSOFT AZURE REFERENCE
BIG DATA ARCHITECTURE STYLE



Data sources: Examples include
Application data stores, such as relational databases.
Static files produced by applications, such as web server log files.
Real-time data sources, such as IoT devices.

Data storage: Data for batch processing operations is typically stored in a distributed
file store that can hold high volumes of large files in various formats.

Batch processing: Because the data sets are so large, a big data solution must often
process data files using long-running batch jobs to filter, aggregate, and otherwise
prepare the data for analysis.
Options include
using Hive or Pig in a Hadoop cluster,
or using Java or Python programs in a Spark cluster.
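The filter-then-aggregate pattern of a batch job can be sketched in a few lines. This is a plain-Python illustration that runs on a tiny in-memory sample; a real solution would run the equivalent logic in Hive, Pig, or Spark over files in a distributed store. The log format, the sample lines, and the function names are assumptions made for this example.

```python
from collections import Counter

# Made-up web server log lines in an assumed "date method path status" format.
LOG_LINES = [
    "2017-01-01 GET /home 200",
    "2017-01-01 GET /cart 500",
    "2017-01-02 GET /home 200",
    "2017-01-02 GET /home 404",
    "malformed line",
]


def parse(line):
    """Turn one log line into a record, or return None if it is malformed."""
    parts = line.split()
    if len(parts) != 4:
        return None
    date, _method, path, status = parts
    return {"date": date, "path": path, "status": int(status)}


def successful_hits_per_path(lines):
    """Filter out bad and failed requests, then count successful hits per path."""
    records = (parse(line) for line in lines)
    ok = (r for r in records if r is not None and r["status"] == 200)
    return Counter(r["path"] for r in ok)


hits = successful_hits_per_path(LOG_LINES)
print(dict(hits))
```

The same three stages (parse, filter, aggregate) are what a batch job expresses at scale, whichever engine executes it.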



Real-time message ingestion: If the solution includes real-time sources, the architecture must include a way to
capture and store real-time messages for stream processing.
Options include Azure Event Hubs, Azure IoT Hub, and Kafka.

Stream processing: After capturing real-time messages, the solution must process them by filtering,
aggregating, and otherwise preparing the data for analysis.
Options include Apache streaming technologies such as Storm and Spark Streaming in an HDInsight cluster.
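A common form of stream aggregation is the tumbling window, where events are grouped into fixed, non-overlapping time intervals. The sketch below is plain Python over a small list of made-up sensor events; it only illustrates the windowing logic and is not how Storm or Spark Streaming is programmed. The event layout and the function name are assumptions made for this example.

```python
from collections import defaultdict


def tumbling_window_averages(events, window_seconds):
    """Average values per (window start, device) for fixed-size time windows.

    events: iterable of (timestamp_seconds, device_id, value) tuples,
    consumed one at a time as they would arrive from a message stream.
    """
    sums = defaultdict(lambda: [0.0, 0])  # key -> [running total, count]
    for timestamp, device, value in events:
        window_start = int(timestamp // window_seconds) * window_seconds
        accumulator = sums[(window_start, device)]
        accumulator[0] += value
        accumulator[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


# Made-up readings: (seconds since start, device id, temperature).
events = [
    (0, "sensor-1", 20.0),
    (30, "sensor-1", 22.0),
    (65, "sensor-1", 30.0),
    (70, "sensor-2", 10.0),
]
averages = tumbling_window_averages(events, window_seconds=60)
print(averages)
```

A production stream processor would also emit each window's result as soon as the window closes and handle late or out-of-order events, which this sketch leaves out.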

Analytical data store: Processed data is served in a structured format that analytical tools can query. The
store used to serve these queries can be a Kimball-style relational data warehouse, as seen in most
traditional business intelligence (BI) solutions.
Other options include interactive Hive, HBase, and Spark SQL.

Analysis and reporting: To empower users to analyze the data, the architecture may include a data
modeling layer, such as a multidimensional OLAP cube or tabular data model.
Tools include Microsoft Power BI, Microsoft Excel, and R Server, either standalone or with Spark.

Orchestration: Big data solutions consist of repeated data processing operations that transform source
data, move data between multiple sources and sinks, load the processed data into an analytical data store,
or push the results straight to a report or dashboard.
To automate these workflows, you can use an orchestration technology such as Azure Data Factory.
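The core job of an orchestrator is to run processing steps in a defined order and pass data from one to the next. The following is a toy, in-process sketch of that idea; it does not use or imitate the Azure Data Factory API, and the step functions, field names, and sample rows are hypothetical.

```python
def transform(rows):
    """Normalize source rows: upper-case the region and make sales numeric."""
    return [{"region": r["region"].upper(), "sales": float(r["sales"])} for r in rows]


def load(rows, store):
    """Load transformed rows into an in-memory 'analytical store' keyed by region."""
    for row in rows:
        store[row["region"]] = store.get(row["region"], 0.0) + row["sales"]
    return store


def run_workflow(source_rows):
    """Run the steps in order, recording which steps completed."""
    store = {}
    steps = [
        ("transform", transform),
        ("load", lambda rows: load(rows, store)),
    ]
    data, completed = source_rows, []
    for name, step in steps:
        data = step(data)
        completed.append(name)
    return store, completed


# Made-up source data.
source = [
    {"region": "east", "sales": "10"},
    {"region": "east", "sales": "5"},
    {"region": "west", "sales": "7.5"},
]
store, completed = run_workflow(source)
print(store, completed)
```

A real orchestration service adds what this sketch omits, such as scheduling, retries on failure, and moving data between separate systems.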

THINGS TO MANAGE IN BIG DATA PLATFORM



To derive real business value from big data, you need the
right tools to capture and organize a wide variety of
data types from different sources, and the ability to analyze
each type within the context of your enterprise data,
all while keeping the data secure.



Source: McKinsey Global Institute report, The Age of Analytics, December 2016
TOOLS TO HANDLE BIG DATA
BIG DATA TOOLS OVER THE YEARS



For businesses that need commercial support with Hadoop and their
big data requirements, more than 10 companies offer such services.

