ARCHITECTURE
SESSION 2
ONLY FOR ACADEMIC PURPOSE (PREPARED BY DR. PREETI KHANNA)
DISCUSSION QUESTIONS
Question 1. Data partnerships, and data sharing more broadly, are creating important
new sources of business value, but at the same time they raise new and as yet unanswered
questions about data ownership, data accessibility, and data rights. To the extent that
companies rely on data sharing for business value, who, in your view, should manage the
risks that accompany such dependence? And how can these risks be mitigated?
Question 2. List at least three scenarios showing how the analytics revolution is currently
transforming organizations, as well as the economies and societies in which they operate.
Question 4. To become more competitive and more efficient, companies need to look at
the broader set of related risks, incorporate more data sources, use better tools that allow
them to move to real-time or near-real-time analysis, and increase data volumes. List the
key questions that could be used to assess an organization's readiness to truly start
benefiting from big data.
DATA ECOSYSTEM: FUTURE TRAJECTORY OF VALUE
Data Generation and Collection: Sources and platforms where data are
initially captured.
Data Aggregation: Processes and platforms for combining data from
multiple sources.
Data Analysis: Deriving actionable insights from data.
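To make the three stages concrete, here is a minimal Python sketch that takes records from two hypothetical sources (a web clickstream and a store sales feed), aggregates them into one profile per customer, and then derives a simple actionable finding. The source data, field names, and the helper functions `combine_sources` and `browsers_without_purchases` are illustrative assumptions, not part of any particular platform.

```python
from collections import defaultdict

# Hypothetical records captured at the generation/collection stage.
web_events = [
    {"customer_id": 1, "page_views": 12},
    {"customer_id": 2, "page_views": 3},
]
store_sales = [
    {"customer_id": 1, "amount": 40.0},
    {"customer_id": 1, "amount": 10.0},
    {"customer_id": 3, "amount": 25.0},
]


def combine_sources(web, sales):
    """Aggregation stage: merge both sources into one profile per customer."""
    profiles = defaultdict(lambda: {"page_views": 0, "spend": 0.0})
    for event in web:
        profiles[event["customer_id"]]["page_views"] += event["page_views"]
    for sale in sales:
        profiles[sale["customer_id"]]["spend"] += sale["amount"]
    return dict(profiles)


def browsers_without_purchases(profiles):
    """Analysis stage: customers who browse online but have not bought anything."""
    return sorted(
        cid for cid, p in profiles.items()
        if p["page_views"] > 0 and p["spend"] == 0.0
    )


profiles = combine_sources(web_events, store_sales)
print(browsers_without_purchases(profiles))  # [2]
```

The finding (customer 2 browses but never buys) is only visible once the two sources are combined, which is the value the aggregation stage adds.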
Cloud-based services can support the different kinds of data sources that companies need. Examples include:
Sensor Event as a Service,
Video Surveillance as a Service,
Big Data Analytics as a Service, and
Data as a Service.
Because these capabilities are delivered as services, revenue may be based on the number of hours
customers use the service and the amount of storage capacity they consume.
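The usage-based revenue model described above reduces to simple arithmetic: charge = hours used × hourly rate + storage consumed × storage rate. The Python sketch below assumes hypothetical rates (`RATE_PER_HOUR`, `RATE_PER_GB_MONTH`) and a hypothetical `UsageRecord` type; real providers publish their own price lists and often add tiers or discounts.

```python
from dataclasses import dataclass


@dataclass
class UsageRecord:
    hours_used: float          # service hours consumed in the billing period
    storage_gb_months: float   # average storage held, in GB-months


# Hypothetical example rates, chosen only for illustration.
RATE_PER_HOUR = 0.50
RATE_PER_GB_MONTH = 0.02


def monthly_charge(usage: UsageRecord) -> float:
    """Usage-based charge: hours x hourly rate + storage x storage rate."""
    compute = usage.hours_used * RATE_PER_HOUR
    storage = usage.storage_gb_months * RATE_PER_GB_MONTH
    return round(compute + storage, 2)


print(monthly_charge(UsageRecord(hours_used=120, storage_gb_months=500)))  # 70.0
```

Here 120 hours contribute 60.00 and 500 GB-months contribute 10.00, so the customer pays only for what was actually consumed.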
[Figure: data ecosystem layers, including Data Storage and Cloud Infrastructure]
eBay has one of the most mature enterprise data platforms in the industry, with over
200 PB of data stored in its Hadoop and Teradata warehouses.
On average, 30 TB of transactional and behavioral data are extracted daily, and
thousands of metrics are computed, analyzed, and monitored to support decision making and
detect anomalies.
eBay is currently working with several tools, including Apache Spark, Apache Storm, Apache Kafka, and
Hortonworks HDF.
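The source does not say how eBay detects anomalies in its metrics. As one common, simple technique, the Python sketch below flags a daily value that lies more than a chosen number of standard deviations from the mean of the preceding days (a trailing-window z-score rule). The metric values, `window`, and `threshold` are hypothetical, and production systems typically use more robust methods.

```python
import statistics


def flag_anomalies(values, window=7, threshold=3.0):
    """Return indices whose value deviates from the trailing-window mean
    by more than `threshold` standard deviations (a z-score rule)."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mean = statistics.mean(history)
        stdev = statistics.stdev(history)
        if stdev == 0:
            # Perfectly flat history: any change at all is treated as anomalous.
            if values[i] != mean:
                anomalies.append(i)
            continue
        if abs(values[i] - mean) / stdev > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical daily order counts (thousands); day 8 shows a sudden drop.
daily_orders = [100, 102, 98, 101, 99, 103, 100, 101, 60, 100]
print(flag_anomalies(daily_orders))  # [8]
```

Running the same function over each of thousands of metrics is what turns monitoring into automated anomaly detection; the threshold trades missed anomalies against false alarms.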
The Services Layer includes components that provide commonly used services, such as:
Presentation Services,
Information Services,
Business Activity Monitoring,
Business Rules, and
Event Handling.
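To illustrate how the Business Rules and Event Handling services can work together, the Python sketch below routes incoming events to rules registered for that event type and runs each rule's action when its condition holds. The `Event`, `Rule`, and `EventHandler` classes and the "large order" rule are hypothetical; real platforms use dedicated rules engines and message brokers.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Event:
    kind: str
    payload: dict


@dataclass
class Rule:
    """A business rule: a condition plus the action taken when it holds."""
    name: str
    applies_to: str                     # event kind the rule listens for
    condition: Callable[[dict], bool]
    action: Callable[[dict], str]


class EventHandler:
    """Routes each incoming event to every rule registered for its kind."""

    def __init__(self) -> None:
        self.rules: dict[str, list[Rule]] = {}

    def register(self, rule: Rule) -> None:
        self.rules.setdefault(rule.applies_to, []).append(rule)

    def handle(self, event: Event) -> list[str]:
        results = []
        for rule in self.rules.get(event.kind, []):
            if rule.condition(event.payload):
                results.append(rule.action(event.payload))
        return results


handler = EventHandler()
handler.register(Rule(
    name="large-order-alert",
    applies_to="order_placed",
    condition=lambda p: p["amount"] > 1000,
    action=lambda p: f"alert: order {p['order_id']} exceeds limit",
))

print(handler.handle(Event("order_placed", {"order_id": 7, "amount": 1500})))
# ['alert: order 7 exceeds limit']
```

Keeping rules separate from the event-routing code lets business users change a threshold without touching the handler, which is the usual motivation for a distinct Business Rules service.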
The Process Layer represents components that perform higher-level processing activities,
such as
Analytical,
Intelligence gathering, and
Performance management processes.
The Interaction Layer consists of components that support interaction with end
users. Common artifacts for this layer include
dashboards,
reports,
charts,
graphs, and
spreadsheets.
The results of analysis can be delivered through many different channels. The architecture identifies
common IP-network-based channels such as desktops and laptops, common mobile
network channels such as mobile phones and tablets, and other channels such as email,
SMS, and hardcopy.
Data storage: Data for batch processing operations is typically stored in a distributed
file store that can hold high volumes of large files in various formats.
Batch processing: Because the data sets are so large, a big data solution must often
process data files using long-running batch jobs to filter, aggregate, and otherwise
prepare the data for analysis.
Options include
running Hive or Pig jobs in a Hadoop cluster,
or using Java or Python programs in a Spark cluster.
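The filter-then-aggregate pattern of a batch job can be sketched in plain Python. On a real cluster the same logic would be expressed as a Hive or Pig script or a Spark program running over many files in parallel; here an in-memory CSV with a hypothetical schema stands in for one large file in the distributed store, and `batch_job` is a hypothetical name.

```python
import csv
import io
from collections import Counter

# Stand-in for one large file in the distributed file store (hypothetical schema).
RAW_FILE = io.StringIO(
    "date,region,status,amount\n"
    "2024-01-01,north,completed,120\n"
    "2024-01-01,south,cancelled,80\n"
    "2024-01-01,north,completed,30\n"
    "2024-01-02,south,completed,200\n"
)


def batch_job(file_obj):
    """Filter out non-completed orders, then aggregate revenue per region."""
    revenue = Counter()
    for row in csv.DictReader(file_obj):
        if row["status"] != "completed":              # filter step
            continue
        revenue[row["region"]] += int(row["amount"])  # aggregate step
    return dict(revenue)


result = batch_job(RAW_FILE)
print(result)  # {'north': 150, 'south': 200}
```

Because each row is processed independently before the per-region sums are combined, this job splits naturally across cluster nodes, which is why frameworks such as Hadoop and Spark can run it at scale.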
Stream processing: After capturing real-time messages, the solution must process them by filtering,
aggregating, and otherwise preparing the data for analysis.
Options include Apache streaming technologies such as Storm and Spark Streaming running in an HDInsight cluster.
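A frequent stream-processing step is aggregating messages over fixed, non-overlapping time windows (tumbling windows). The Python sketch below counts events per window and key over a finite list; engines such as Storm or Spark Streaming apply the same idea continuously and emit each window's result as it closes. The event format `(epoch_seconds, key)`, the sample readings, and `tumbling_window_counts` are hypothetical.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds=60):
    """Count events per (window start, key) in fixed, non-overlapping windows.

    `events` is an iterable of (epoch_seconds, key) tuples, such as messages
    read from a broker like Apache Kafka.
    """
    counts = defaultdict(int)
    for timestamp, key in events:
        window_start = timestamp - (timestamp % window_seconds)
        counts[(window_start, key)] += 1
    return dict(counts)


# Hypothetical sensor readings: (timestamp, sensor_id)
stream = [(0, "s1"), (15, "s1"), (59, "s2"), (60, "s1"), (125, "s2")]
print(tumbling_window_counts(stream))
# {(0, 's1'): 2, (0, 's2'): 1, (60, 's1'): 1, (120, 's2'): 1}
```

Each event falls into exactly one window, so per-window results can be reported as soon as the window's end time passes rather than waiting for the whole data set.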
Analytical data store: The analytical data store that serves analytical queries can be a Kimball-style relational
data warehouse, as seen in most traditional business intelligence (BI) solutions.
Other options include interactive Hive, HBase, and Spark SQL.
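A Kimball-style warehouse is typically organized as a star schema: a central fact table of numeric measures linked by keys to descriptive dimension tables. The Python sketch below builds a tiny star schema in an in-memory SQLite database (standing in for a real warehouse) and runs a typical BI query that joins facts to a dimension and aggregates a measure. The table names, columns, and data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Dimension tables describe the "what" and "when" of each measurement.
cur.execute("CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT)")
cur.execute("CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, month TEXT)")
# The fact table holds numeric measures plus foreign keys to the dimensions.
cur.execute("""
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        date_id    INTEGER REFERENCES dim_date(date_id),
        revenue    REAL
    )
""")

cur.executemany("INSERT INTO dim_product VALUES (?, ?)",
                [(1, "electronics"), (2, "books")])
cur.executemany("INSERT INTO dim_date VALUES (?, ?)",
                [(1, "2024-01"), (2, "2024-02")])
cur.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                [(1, 1, 500.0), (2, 1, 40.0), (1, 2, 300.0), (2, 2, 60.0)])

# A typical BI query: join facts to a dimension and aggregate a measure.
rows = cur.execute("""
    SELECT p.category, SUM(f.revenue)
    FROM fact_sales f
    JOIN dim_product p ON f.product_id = p.product_id
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
print(rows)  # [('books', 100.0), ('electronics', 800.0)]
```

Swapping `dim_product` for `dim_date` in the join answers "revenue by month" with the same fact table, which is the flexibility the star layout is designed to provide.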
Analysis and reporting: To empower users to analyze the data, the architecture may include a data
modeling layer, such as a multidimensional OLAP cube or tabular data model.
Tools include Microsoft Power BI, Microsoft Excel, and R Server, either standalone or with Spark.
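A multidimensional OLAP cube precomputes a measure for every combination of dimensions, so users can drill down to detail or roll up to subtotals and a grand total. The Python sketch below performs this CUBE-style aggregation in memory; it illustrates the idea only and is not how Power BI or any OLAP server stores cubes. The `cube` function, the `ALL` marker, and the sample sales data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

ALL = "*"  # marks a dimension that has been rolled up into a subtotal


def cube(records, dimensions, measure):
    """Sum `measure` for every combination of `dimensions`, like SQL's CUBE.

    Each result key has one entry per dimension; ALL marks a rolled-up dimension.
    """
    totals = defaultdict(float)
    for rec in records:
        for r in range(len(dimensions) + 1):
            for kept in combinations(dimensions, r):
                key = tuple(rec[d] if d in kept else ALL for d in dimensions)
                totals[key] += rec[measure]
    return dict(totals)


sales = [
    {"region": "north", "product": "tv", "revenue": 100.0},
    {"region": "north", "product": "radio", "revenue": 20.0},
    {"region": "south", "product": "tv", "revenue": 50.0},
]
result = cube(sales, ["region", "product"], "revenue")
print(result[("north", ALL)])  # 120.0 (north subtotal)
print(result[(ALL, "tv")])     # 150.0 (tv subtotal)
print(result[(ALL, ALL)])      # 170.0 (grand total)
```

With two dimensions there are four groupings per record, so every slice a user might ask for is already computed; the cost is storage that grows with the number of dimension combinations.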
Orchestration: Big data workflows transform source data, move data between multiple sources and sinks, load the processed
data into an analytical data store, or push the results straight to a report or dashboard.
To automate these workflows, you can use an orchestration technology such as Azure Data Factory.
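At its core, an orchestrator runs workflow steps in an order that respects their dependencies. The Python sketch below models a hypothetical extract, transform, load, and publish workflow as a dependency graph and orders it with the standard library's `graphlib.TopologicalSorter` (Python 3.9+). It illustrates the dependency-ordering idea only; it is not the Azure Data Factory API, and the step functions are placeholders.

```python
from graphlib import TopologicalSorter  # Python 3.9+


# Hypothetical workflow steps; real steps would move and transform data.
def extract():
    return "raw data pulled from sources"


def transform():
    return "data cleaned and aggregated"


def load():
    return "data loaded into analytical store"


def publish():
    return "dashboard refreshed"


STEPS = {"extract": extract, "transform": transform,
         "load": load, "publish": publish}

# Each step maps to the set of steps that must finish before it runs.
DEPENDENCIES = {
    "transform": {"extract"},
    "load": {"transform"},
    "publish": {"load"},
}


def run_workflow(steps, dependencies):
    """Run every step in an order that respects all dependencies."""
    sorter = TopologicalSorter()
    for name in steps:
        sorter.add(name, *dependencies.get(name, ()))
    order = list(sorter.static_order())
    log = [f"{name}: {steps[name]()}" for name in order]
    return order, log


order, log = run_workflow(STEPS, DEPENDENCIES)
print(order)  # ['extract', 'transform', 'load', 'publish']
```

Declaring dependencies rather than a fixed sequence lets an orchestrator run independent steps in parallel and rerun only the failed part of a workflow.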