[Speaker Name]
Two approaches to information management for
analytics: Top-down and bottom-up
Top-down (deductive): Theory -> Hypothesis -> Observation -> Confirmation
Bottom-up (inductive): Observation -> Pattern -> Hypothesis -> Theory
[Figure: analytics types plotted by VALUE against DIFFICULTY, moving from INFORMATION to OPTIMIZATION]
Descriptive analytics: What happened?
Diagnostic analytics: Why did it happen?
Predictive analytics: What will happen?
Prescriptive analytics: How can we make it happen?
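As a rough illustration of the four analytics questions on this slide, the sketch below answers each one on a tiny, made-up sales table. The data, the function names, and the choice of a least-squares line for the forecast are all assumptions for illustration; the slide does not prescribe any method.

```python
from statistics import mean

# Hypothetical monthly sales per region (illustrative data only).
SALES = {"north": [10.0, 12.0, 14.0, 16.0], "south": [9.0, 9.0, 8.0, 8.0]}


def descriptive(sales):
    """What happened? Total sales per region."""
    return {region: sum(values) for region, values in sales.items()}


def diagnostic(sales):
    """Why did it happen? Change per region between the first and last month."""
    return {region: values[-1] - values[0] for region, values in sales.items()}


def predictive(values):
    """What will happen? Extrapolate a least-squares line one month ahead."""
    xs = range(len(values))
    x_bar, y_bar = mean(xs), mean(values)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return y_bar + slope * (len(values) - x_bar)


def prescriptive(sales):
    """How can we make it happen? Here: pick the region with the best forecast."""
    return max(sales, key=lambda region: predictive(sales[region]))


totals = descriptive(SALES)
changes = diagnostic(SALES)
forecasts = {region: predictive(values) for region, values in SALES.items()}
best_region = prescriptive(SALES)
print(totals, changes, forecasts, best_region)
```

Each step builds on the one before it, which is one way to read the slide's point that value and difficulty rise together.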
Data warehousing uses a top-down approach
Process: Understand corporate strategy -> Gather requirements -> Implement data warehouse
Gather requirements: Business requirements, Technical requirements
Reporting and analytics: Reporting and analytics design, Reporting and analytics development
Data warehouse: Dimension modeling, ETL design, ETL development, Physical design
Infrastructure: Set up infrastructure, Install and tune
[Figure: Data sources (OLTP, ERP, CRM, LOB) -> ETL -> Data warehouse -> BI and analytics (Dashboards, Reporting)]
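The top-down flow (model the dimensions first, then load through ETL, then report) can be sketched with Python's built-in sqlite3 module. This is a minimal sketch under stated assumptions: the table names, the sample rows, and the helper functions are hypothetical and stand in for a real warehouse and its source systems.

```python
import sqlite3

# Hypothetical source rows, standing in for an OLTP or CRM extract.
SOURCE_ROWS = [
    {"order_id": 1, "customer": "Contoso", "product": "Widget", "amount": "19.99"},
    {"order_id": 2, "customer": "Fabrikam", "product": "Widget", "amount": "5.00"},
    {"order_id": 3, "customer": "Contoso", "product": "Gadget", "amount": "42.50"},
]


def build_warehouse(conn):
    """Schema-on-write: dimension and fact tables exist before any data is loaded."""
    conn.executescript(
        """
        CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
        CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
        CREATE TABLE fact_sales (
            order_id INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES dim_customer(customer_id),
            product_id INTEGER NOT NULL REFERENCES dim_product(product_id),
            amount REAL NOT NULL
        );
        """
    )


def dimension_key(conn, table, key_column, name):
    """Return the surrogate key of a dimension member, inserting it if it is new."""
    conn.execute(f"INSERT OR IGNORE INTO {table} (name) VALUES (?)", (name,))
    row = conn.execute(
        f"SELECT {key_column} FROM {table} WHERE name = ?", (name,)
    ).fetchone()
    return row[0]


def etl(conn, rows):
    """Extract source rows, transform them to the modeled types, load the fact table."""
    for row in rows:
        customer_id = dimension_key(conn, "dim_customer", "customer_id", row["customer"])
        product_id = dimension_key(conn, "dim_product", "product_id", row["product"])
        conn.execute(
            "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
            (row["order_id"], customer_id, product_id, float(row["amount"])),
        )
    conn.commit()


def sales_by_customer(conn):
    """Reporting query that relies on the schema defined up front."""
    query = """
        SELECT c.name, ROUND(SUM(f.amount), 2)
        FROM fact_sales AS f JOIN dim_customer AS c USING (customer_id)
        GROUP BY c.name ORDER BY c.name
    """
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
build_warehouse(conn)
etl(conn, SOURCE_ROWS)
report = sales_by_customer(conn)
print(report)
```

Note that any source field not anticipated in the model has nowhere to go, which is the trade-off the bottom-up approach addresses.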
The data lake uses a bottom-up approach
Ingest all data regardless of requirements
Store all data in native format without schema definition
Do analysis using analytic engines like Hadoop
[Figure: data sources: Devices, Social, LOB applications, Video, Sensors, Web, Relational, Clickstream]
[Figure: analysis: Batch queries, Interactive queries, Real-time analytics, Machine Learning, Data warehouse]
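The bottom-up flow (ingest everything, store it in native format, apply structure only at analysis time) can be sketched with the standard library alone. This is an illustrative sketch, not how any particular data lake product works: the file paths, payloads, and function names are hypothetical, and a local temporary directory stands in for the lake.

```python
import csv
import json
import tempfile
from pathlib import Path

# Hypothetical raw payloads in two different native formats.
RAW_FILES = {
    "clickstream/2016-01-01.jsonl": (
        '{"user": "a", "page": "/home"}\n'
        '{"user": "b", "page": "/buy", "ref": "ad"}\n'
    ),
    "sensors/room1.csv": "ts,temp_c\n1,20.5\n2,21.0\n",
}


def ingest(lake_root, files):
    """Store every payload as-is; no schema is declared at write time."""
    for relative_path, payload in files.items():
        target = lake_root / relative_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(payload, encoding="utf-8")


def read_clicks(lake_root):
    """Schema-on-read: the query picks the fields it needs and supplies defaults."""
    for path in sorted((lake_root / "clickstream").glob("*.jsonl")):
        for line in path.read_text(encoding="utf-8").splitlines():
            record = json.loads(line)
            yield {
                "user": record["user"],
                "page": record["page"],
                "ref": record.get("ref", "direct"),
            }


def average_temperature(lake_root):
    """A second analysis over a different format, typed only when it is read."""
    readings = []
    for path in sorted((lake_root / "sensors").glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            readings.extend(float(row["temp_c"]) for row in csv.DictReader(handle))
    return sum(readings) / len(readings)


with tempfile.TemporaryDirectory() as tmp:
    lake = Path(tmp)
    ingest(lake, RAW_FILES)
    clicks = list(read_clicks(lake))
    mean_temp = average_temperature(lake)

print(clicks)
print(mean_temp)
```

The second click record carries a field the first one lacks, and ingestion does not care; only the reader has to decide what to do about it.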
Challenges involved in implementing a data lake
Data silos: Operations, Purchasing, Marketing, Sales
Inefficiency in colocation
Variety of analytics tools
[Figure labels: Analytics, Storage, YARN, WebHDFS, ADL Store]
Azure Data Lake
As a part of Cortana Analytics Suite
Information management: Azure Data Factory, Azure Data Catalog, Azure Event Hub
Big data stores: Azure SQL Data Warehouse, Azure Data Lake Store
Machine Learning and Analytics: Azure Machine Learning, Azure Stream Analytics, Azure HDInsight (Hadoop), Azure Data Lake Analytics
Dashboards and visualizations: Power BI
Personal digital assistant: Cortana
Perceptual intelligence: Face, vision; Speech, text
Business scenarios: Recommendations, customer churn, forecasting
[Figure: Business apps, Custom apps, and Sensors and devices on one side of the suite; People and Automated systems on the other]
ADL Analytics: U-SQL job reads and writes data in ADL Store and Azure Storage blobs
[Figure: ADL Store with data sources (Clickstream, Web, Social, Sensors, Relational, LOB applications) and engines (ADL Analytics, HDInsight, R, Spark, Machine Learning)]
Introducing Azure Data Lake Store
A hyper-scale repository for big data analytics workloads
Store any data in its native format
Hadoop Distributed File System (HDFS) for the cloud
Enterprise grade
No limits to scale
[Figure: Structured and Semi-structured data from Social, Sensors, Relational, and LOB applications]
HDFS for the cloud
HDInsight
Durable and highly available
Automatically replicates your data
Three copies within a single region
Highly available
Unlimited storage
Unlimited account sizes
Individual file sizes from gigabytes to petabytes
No limits to scale
Optimized for analytics workload performance
Built for running large analytics systems that require massive throughput
Optimized for parallel computation over petabytes of data
Automatically optimizes for any throughput
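"Parallel computation" over a large store usually means splitting the data into partitions, processing each partition independently, and merging the partial results. The sketch below shows that pattern in miniature. It is an assumption-laden illustration: the partitions and function names are made up, and a thread pool on one machine stands in for a distributed engine that would spread partitions across many nodes.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical partitions of a larger dataset; each list models one file or block.
PARTITIONS = [
    ["web", "sensor", "web"],
    ["social", "web"],
    ["sensor", "sensor", "social"],
]


def count_partition(records):
    """Map step: summarize one partition without looking at any other."""
    return Counter(records)


def parallel_count(partitions, max_workers=3):
    """Run the map step on all partitions concurrently, then merge the partials."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(count_partition, partitions))
    total = Counter()
    for partial in partials:
        total.update(partial)  # Reduce step: add partial counts together.
    return total


totals = parallel_count(PARTITIONS)
print(dict(totals))
```

Because no partition depends on another, adding workers or partitions does not change the logic, only the throughput.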
Azure Data Lake Analytics
Azure Data Lake Analytics service: a new distributed analytics service
Built on Apache YARN
Scales dynamically with the turn of a dial
Pay by the query
Supports Azure Active Directory for access control, roles, and integration with on-premises identity systems
© 2016 Microsoft Corporation. All rights reserved. Microsoft, Windows, Microsoft Azure, and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The
information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market
conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT
MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION