
Microsoft C+E Technology Training

Data Platform and Analytics
Foundational Training
Solution Area: Data Analytics
Solution: Big Data
Technology: Data Lake

[Speaker Name]

Two approaches to information management for analytics: Top-down and bottom-up

Top-down (deductive): Theory -> Hypothesis -> Observation -> Confirmation
Bottom-up (inductive): Observation -> Pattern -> Hypothesis -> Theory

Types of analytics, in order of increasing value and difficulty (from information toward optimization):
Descriptive analytics: What happened?
Diagnostic analytics: Why did it happen?
Predictive analytics: What will happen?
Prescriptive analytics: How can we make it happen?

Data warehousing uses a top-down approach

Process:
1. Understand corporate strategy
2. Gather requirements: business requirements and technical requirements
3. Implement the data warehouse:
   Reporting and analytics design -> Reporting and analytics development
   Dimension modeling -> Physical design
   ETL design -> ETL development
   Set up infrastructure -> Install and tune

Architecture, top to bottom:
BI and analytics (dashboards, reporting)
Data warehouse
ETL
Data sources: OLTP, ERP, CRM, LOB
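
The top-down flow can be illustrated with a minimal sketch in which the dimensional model is defined first and an ETL step then loads data into it. Everything here is an assumption made for illustration: the table names, the sample rows, and the use of an in-memory SQLite database as a stand-in for a data warehouse.

```python
import sqlite3

# Hypothetical rows as they might arrive from an OLTP source system.
SOURCE_ORDERS = [
    {"order_id": 1, "customer": "Contoso", "amount": "120.50"},
    {"order_id": 2, "customer": "Fabrikam", "amount": "80.00"},
    {"order_id": 3, "customer": "Contoso", "amount": "19.50"},
]


def run_etl(rows):
    """Load source rows into a tiny star schema and return the connection."""
    conn = sqlite3.connect(":memory:")
    # Schema-on-write: the dimensional model exists before any data lands.
    conn.execute(
        "CREATE TABLE dim_customer ("
        "customer_key INTEGER PRIMARY KEY, name TEXT UNIQUE)"
    )
    conn.execute(
        "CREATE TABLE fact_order ("
        "order_id INTEGER PRIMARY KEY, "
        "customer_key INTEGER REFERENCES dim_customer(customer_key), "
        "amount REAL)"
    )
    for row in rows:
        # Transform: map the customer to a dimension row and cast the amount.
        conn.execute(
            "INSERT OR IGNORE INTO dim_customer (name) VALUES (?)",
            (row["customer"],),
        )
        (customer_key,) = conn.execute(
            "SELECT customer_key FROM dim_customer WHERE name = ?",
            (row["customer"],),
        ).fetchone()
        conn.execute(
            "INSERT INTO fact_order VALUES (?, ?, ?)",
            (row["order_id"], customer_key, float(row["amount"])),
        )
    conn.commit()
    return conn


def revenue_by_customer(conn):
    """A reporting query of the kind the predefined schema is designed for."""
    return conn.execute(
        "SELECT c.name, SUM(f.amount) FROM fact_order f "
        "JOIN dim_customer c USING (customer_key) "
        "GROUP BY c.name ORDER BY c.name"
    ).fetchall()
```

Calling `revenue_by_customer(run_etl(SOURCE_ORDERS))` returns one total per customer. The point of the sketch is the ordering: requirements determine the schema, and data that does not fit the schema has no place to land.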

The data lake uses a bottom-up approach

1. Ingest all data, regardless of requirements
2. Store all data in native format, without schema definition
3. Do analysis using analytic engines like Hadoop

Data sources: devices, social, LOB applications, video, sensors, web, relational, clickstream
Analysis: batch queries, interactive queries, real-time analytics, machine learning, data warehouse
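
Storing data without a schema definition means the schema is applied when the data is read. The sketch below shows one way this could look, assuming raw records kept as JSON lines; the sample records, the schema dictionaries, and the `read_with_schema` helper are all hypothetical.

```python
import json

# Raw events kept exactly as they arrived (hypothetical sample data).
RAW_LINES = [
    '{"device": "sensor-1", "temp_c": "21.5", "ts": "2016-01-01T00:00:00Z"}',
    '{"device": "sensor-2", "temp_c": "19.0"}',
    '{"user": "alice", "page": "/home"}',
]

# Each analysis brings its own schema: field name -> type conversion.
TEMPERATURE_SCHEMA = {"device": str, "temp_c": float}
CLICK_SCHEMA = {"user": str, "page": str}


def read_with_schema(lines, schema):
    """Apply a schema at read time; skip records missing a required field."""
    for line in lines:
        record = json.loads(line)
        if all(field in record for field in schema):
            yield {field: cast(record[field]) for field, cast in schema.items()}
```

The same stored lines serve both `list(read_with_schema(RAW_LINES, TEMPERATURE_SCHEMA))` and `list(read_with_schema(RAW_LINES, CLICK_SCHEMA))`. Nothing had to be decided about either analysis at ingest time, which is the trade-off against the top-down approach.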

Challenges involved in implementing a data lake

Data silos: data spans sources (operations, purchasing, marketing, sales); inefficiency in colocation
Analytics: open interfaces to data; variety of analytics tools
Performance and scale: storage bottlenecks; IoT sources – small writes; price-performance; data grows independently
Security: compliance challenges; effectively control access; corporate policies

Introducing
Azure Data Lake

Azure Data Lake (ADL)
Analytics: Azure Data Lake Analytics
Storage: Azure Data Lake Store

Built on open source

Layers, top to bottom:
Analytics: ADL Analytics (U-SQL) and HDInsight (Hive)
YARN
WebHDFS
Storage: ADL Store

Azure Data Lake
As a part of Cortana Analytics Suite

Data: business apps, custom apps, sensors and devices
Information management: Azure Data Factory, Azure Data Catalog, Azure Event Hub
Big data stores: Azure SQL Data Warehouse, Azure Data Lake Store
Machine Learning and Analytics: Azure Machine Learning, Azure Stream Analytics, Azure HDInsight (Hadoop), Azure Data Lake Analytics
Dashboards and visualizations: Power BI
Personal digital assistant: Cortana
Perceptual intelligence: face, vision, speech, text
Business scenarios: recommendations, customer churn, forecasting
Action: people, automated systems

Flow: DATA -> INTELLIGENCE -> ACTION


Introducing Azure Data Lake
Big data made easy

Analytics on any data, any size
All users productive on day one
Ready for your enterprise

How do you start using ADL?

1. Log in to the Azure portal
2. Create an ADL Analytics account (90 seconds, free)
3. Write a U-SQL script and submit it to the ADL Analytics account
4. The U-SQL job reads and writes data in ADL Store, Azure Storage blobs, and so on

Azure Data Lake Store

What is Azure Data Lake (ADL) Store?
A highly scalable, distributed, parallel file system in the cloud, specifically designed to work with multiple analytic frameworks

Data sources: devices, video, clickstream, web, social, sensors, relational, LOB applications
Analytic frameworks on ADL Store: ADL Analytics, HDInsight, R, Spark, Machine Learning

Introducing Azure Data Lake Store
A hyper-scale repository for big data analytics workloads

HADOOP FILE SYSTEM (HDFS) for the cloud
Store ANY DATA in its native format
ENTERPRISE GRADE
No limits to SCALE
Optimized for analytic workload PERFORMANCE

Any data: unstructured, semi-structured, structured
Data sources: devices, video, clickstream, web, social, sensors, relational, LOB applications

HDFS for the cloud

Built from the ground up as a Hadoop file system
Support for file/folder objects and operations
Integration with HDInsight, Hortonworks, and Cloudera
Accessible to all HDFS-compliant projects: Spark, Storm, Flume, Sqoop, Kafka, R, and more
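
Because the store is exposed through WebHDFS, file and folder operations map onto REST calls of the form `/webhdfs/v1/<path>?op=<OPERATION>`. The sketch below only builds such URLs and sends nothing. The host pattern `<account>.azuredatalakestore.net`, the account name, and the `webhdfs_url` helper are assumptions for illustration; authentication is omitted entirely.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(account, path, op, **params):
    """Build a WebHDFS-style REST URL for a file or folder operation."""
    # Assumed host pattern for an ADL Store account; verify for your setup.
    host = f"{account}.azuredatalakestore.net"
    # Standard WebHDFS operations include LISTSTATUS, OPEN, and MKDIRS.
    query = urlencode({"op": op, **params})
    # quote() keeps "/" separators and escapes characters such as spaces.
    return f"https://{host}/webhdfs/v1/{quote(path.lstrip('/'))}?{query}"
```

For example, `webhdfs_url("myaccount", "/raw/clicks", "LISTSTATUS")` yields a folder-listing URL, and passing `offset` and `length` with `op="OPEN"` requests a byte range of a file. Any client that speaks this interface can work against the store, which is what lets HDFS-compliant projects use it without a custom connector.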

Durable and highly available

Automatically replicates your data
Three copies within a single region

Unlimited storage

Unlimited account sizes
Individual file sizes from gigabytes to petabytes
No limits to scale

Optimized for analytics workload performance

Built for running large analytics systems that require massive throughput
Optimized for parallel computation over petabytes of data
Automatically optimizes for any throughput
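
Parallel computation over large data generally follows one pattern: split the data into partitions, compute a partial result for each independently, then merge the partials. The sketch below shows that pattern on a toy scale. The partitions and function names are hypothetical, and a thread pool stands in for a distributed engine, so it illustrates the structure of the computation only and says nothing about actual throughput.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical partitions, standing in for pieces of a large file.
PARTITIONS = [
    [3, 5, 8],
    [13, 21],
    [34, 55, 89, 144],
]


def summarize(partition):
    """Per-partition work: return (count, total) so results can be merged."""
    return len(partition), sum(partition)


def parallel_mean(partitions, workers=4):
    """Compute a global mean by merging independent partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(summarize, partitions))
    count = sum(c for c, _ in partials)
    total = sum(t for _, t in partials)
    return total / count
```

Each partition is summarized without reference to any other, so adding workers or partitions does not change the result. Returning `(count, total)` rather than a per-partition mean is the design choice that makes the merge step correct for partitions of unequal size.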

Azure Data Lake Analytics

Azure Data Lake Analytics service
A new distributed analytics service

Built on Apache YARN
Scales dynamically with the turn of a dial
Pay by the query
Supports Azure Active Directory for access control, roles, and integration with on-premises identity systems
Built with U-SQL to unify the benefits of SQL with the power of C#
Processes data across Azure


Develop big data applications

Author, debug, and optimize big data applications in Visual Studio
Multiple languages: U-SQL (Hive support coming)
Seamlessly integrate U-SQL with your existing .NET code

Work across all your cloud data

Compute on data anywhere and join data from multiple cloud sources
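
Joining data from multiple sources comes down to matching rows on a shared key, wherever each side is stored. The sketch below is a plain in-memory hash join over two hypothetical row sets, one standing in for files in a data lake and one for a relational table. It is not how ADL Analytics executes a join; the names and sample data are assumptions.

```python
# Hypothetical rows, standing in for data read from two different stores.
CLICKS_FROM_LAKE = [
    {"customer_id": 1, "page": "/home"},
    {"customer_id": 2, "page": "/pricing"},
    {"customer_id": 3, "page": "/docs"},
]
CUSTOMERS_FROM_SQL = [
    {"customer_id": 1, "segment": "enterprise"},
    {"customer_id": 2, "segment": "startup"},
]


def hash_join(left, right, key):
    """Inner-join two lists of dicts on a shared key using a hash index."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    # Left rows with no match on the right are dropped (inner join).
    return [
        {**l_row, **r_row}
        for l_row in left
        for r_row in index.get(l_row[key], [])
    ]
```

`hash_join(CLICKS_FROM_LAKE, CUSTOMERS_FROM_SQL, "customer_id")` returns the two clicks that have a matching customer, each enriched with its segment. The click for customer 3 has no match and is left out.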
© 2016 Microsoft Corporation. All rights reserved. Microsoft, Windows, Microsoft Azure, and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The
information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market
conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT
MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
