2021 05 Training Day Data Fundamentals Original
FOR USE ONLY AS PART OF MICROSOFT VIRTUAL TRAINING DAYS PROGRAM. THESE MATERIALS ARE NOT AUTHORIZED
FOR DISTRIBUTION, REPRODUCTION OR OTHER USE BY NON-MICROSOFT PARTIES.
Microsoft Azure Virtual Training Day: Data Fundamentals
Agenda
Table
Transactional vs analytical data stores
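Diagram: a Customer table related to Orders and to Transfers, with an example balance table listing CustomerID 5558 (balances 1000 and 500) and CustomerID 6023 (balances 1500 and 2000)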
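The balance figures on this slide (1000 and 500 for customer 5558, 1500 and 2000 for customer 6023) are consistent with a transfer of 500 from one customer to the other, which is a typical transactional workload. The sketch below is one way to illustrate that idea with Python's built-in `sqlite3` module; the `Accounts` table, the `transfer` helper, and the non-negative balance rule are assumptions made for the example and do not come from the deck.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; the CHECK constraint forbids negative balances.
conn.execute(
    "CREATE TABLE Accounts ("
    "CustomerID INTEGER PRIMARY KEY, "
    "Balance INTEGER NOT NULL CHECK (Balance >= 0))"
)
conn.executemany("INSERT INTO Accounts VALUES (?, ?)", [(5558, 1000), (6023, 1500)])
conn.commit()


def transfer(conn, source_id, target_id, amount):
    """Apply the credit and the debit as one transaction: both or neither."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute(
            "UPDATE Accounts SET Balance = Balance + ? WHERE CustomerID = ?",
            (amount, target_id),
        )
        conn.execute(
            "UPDATE Accounts SET Balance = Balance - ? WHERE CustomerID = ?",
            (amount, source_id),
        )


def balances(conn):
    """Return the current balances as {CustomerID: Balance}."""
    return dict(conn.execute("SELECT CustomerID, Balance FROM Accounts"))


transfer(conn, 5558, 6023, 500)
after_transfer = balances(conn)

# An overdraft violates the CHECK constraint, so the whole transfer,
# including the credit that was already applied, is rolled back.
try:
    transfer(conn, 5558, 6023, 10_000)
except sqlite3.IntegrityError:
    pass
after_failed_transfer = balances(conn)

print(after_transfer)
print(after_failed_transfer)
```

The credit is deliberately applied before the debit so that the failed second transfer shows the rollback undoing a change that had already been made.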
Cloud data
SaaS data
Batch Data / Streaming Data
Lesson 2: Explore roles and responsibilities in the world of data
Lesson 2 objectives
• Explore data job roles
• Explore common tasks and tools for data job roles
Roles in Data
Azure Data Studio
• Graphical interface for managing on-premises and cloud-based data services
• Runs on Windows, macOS, Linux
SQL Server Management Studio
• Graphical interface for managing on-premises and cloud-based data services
• Runs on Windows
• Comprehensive Database Administration tool
Azure Portal / CLI
• Tools for management and provisioning of Azure Data Services
• Manual and automation of scripts using Azure Resource Manager or Command Line Interface scripting
Common Tools – Data Engineering
Azure Synapse Studio
• Azure Portal integrated to manage Azure Synapse
• Data Ingestion (Azure Data Factory)
• Management of Azure Synapse assets (SQL Pools / Spark Pool)
SQL Server Management Studio
• Graphical interface for managing on-premises and cloud-based data services
• Runs on Windows
• Comprehensive Database Administration tool
Azure Portal / CLI
• Tools for management and provisioning of Azure resources
• Manual and automation of scripts using Azure Resource Manager or Command Line Interface scripting
Common Tools – Data Analyst
• Data Visualization tool
• Model and Visualize Data
• Management of Azure Synapse assets (SQL Pools / Spark Pool)

• Authoring and management of Power BI reports
• Authoring of Power BI dashboards
• Share Reports / Datasets

• Data Visualization tool for paginated reports
• Model and Visualize paginated reports
Lesson 3: Describe concepts of relational data
Explore the characteristics of relational data
Customers
CustomerID  CustomerName     CustomerPhone
100         Muisto Linna     XXX-XXX-XXXX
101         Noam Maoz        XXX-XXX-XXXX
102         Vanja Matkovic   XXX-XXX-XXXX
103         Qamar Mounir     XXX-XXX-XXXX
104         Zhenis Omar      XXX-XXX-XXXX
105         Claude Paulet    XXX-XXX-XXXX
106         Alex Pettersen   XXX-XXX-XXXX
107         Francis Ribeiro  XXX-XXX-XXXX
Normalization
Customers
CustomerID  CustomerName     CustomerPhone
100         Muisto Linna     XXX-XXX-XXXX
101         Noam Maoz        XXX-XXX-XXXX
Orders
OrderID     CustomerName     CustomerPhone
MK106       Muisto Linna     XXX-XXX-XXXX
Relations
Customers
CustomerID  CustomerName     CustomerPhone
100         Muisto Linna     XXX-XXX-XXXX
101         Noam Maoz        XXX-XXX-XXXX
102         Vanja Matkovic   XXX-XXX-XXXX
103         Qamar Mounir     XXX-XXX-XXXX
104         Zhenis Omar      XXX-XXX-XXXX
105         Claude Paulet    XXX-XXX-XXXX
106         Alex Pettersen   XXX-XXX-XXXX
Orders
OrderID     CustomerID       SalesPersonID
AD100       101              200
AD101       101              200
AD102       101              200
AX103       103              201
AS104       103              201
AR105       105              200
MK106       105              201
DB205       100              205
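The relationship on this slide, where each row in Orders points at a row in Customers through CustomerID, can be sketched with Python's built-in `sqlite3` module. This is an illustrative sketch only: the column types, the foreign key declaration, and the omission of the masked phone column are assumptions, and SQLite stands in for whichever relational engine is actually used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE Customers (
    CustomerID   INTEGER PRIMARY KEY,
    CustomerName TEXT NOT NULL
);
CREATE TABLE Orders (
    OrderID       TEXT PRIMARY KEY,
    CustomerID    INTEGER NOT NULL REFERENCES Customers (CustomerID),
    SalesPersonID INTEGER NOT NULL
);
""")
conn.executemany("INSERT INTO Customers VALUES (?, ?)", [
    (100, "Muisto Linna"), (101, "Noam Maoz"), (102, "Vanja Matkovic"),
    (103, "Qamar Mounir"), (104, "Zhenis Omar"), (105, "Claude Paulet"),
    (106, "Alex Pettersen"),
])
conn.executemany("INSERT INTO Orders VALUES (?, ?, ?)", [
    ("AD100", 101, 200), ("AD101", 101, 200), ("AD102", 101, 200),
    ("AX103", 103, 201), ("AS104", 103, 201), ("AR105", 105, 200),
    ("MK106", 105, 201), ("DB205", 100, 205),
])

# Follow the relationship: count the orders placed by each customer.
orders_per_customer = conn.execute("""
    SELECT c.CustomerName, COUNT(o.OrderID)
    FROM Customers AS c
    JOIN Orders AS o ON o.CustomerID = c.CustomerID
    GROUP BY c.CustomerID, c.CustomerName
    ORDER BY c.CustomerID
""").fetchall()

# An order for a customer that does not exist is rejected by the foreign key.
try:
    conn.execute("INSERT INTO Orders VALUES ('ZZ999', 999, 200)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(orders_per_customer)
print(orphan_rejected)
```

Customers with no orders (102, 104, 106) do not appear in the result because an inner join keeps only matching rows.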
Indexes
Customers
CustomerID  CustomerName     CustomerPhone
100         Muisto Linna     XXX-XXX-XXXX
101         Noam Maoz        XXX-XXX-XXXX
102         Vanja Matkovic   XXX-XXX-XXXX
103         Qamar Mounir     XXX-XXX-XXXX
104         Zhenis Omar      XXX-XXX-XXXX
105         Claude Paulet    XXX-XXX-XXXX
106         Alex Pettersen   XXX-XXX-XXXX
IDX-CustomerRegion
CustomerID  Region
100         France
101         Brazil
102         Croatia
103         Jordan
104         Spain
105         France
106         USA
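An index such as IDX-CustomerRegion lets the database find customers by region without scanning the whole table. The sketch below uses Python's built-in `sqlite3` module and assumes that Customers carries a `Region` column, which the slide implies but does not show; the identifier `IDX_CustomerRegion` is also an adaptation, since a hyphen is not valid in an unquoted SQL name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema: the Region column is inferred from the index on the slide.
conn.execute(
    "CREATE TABLE Customers ("
    "CustomerID INTEGER PRIMARY KEY, CustomerName TEXT, Region TEXT)"
)
conn.executemany("INSERT INTO Customers VALUES (?, ?, ?)", [
    (100, "Muisto Linna", "France"), (101, "Noam Maoz", "Brazil"),
    (102, "Vanja Matkovic", "Croatia"), (103, "Qamar Mounir", "Jordan"),
    (104, "Zhenis Omar", "Spain"), (105, "Claude Paulet", "France"),
    (106, "Alex Pettersen", "USA"),
])

# The index stores Region values in sorted order alongside the row key.
conn.execute("CREATE INDEX IDX_CustomerRegion ON Customers (Region)")

france_ids = [
    row[0]
    for row in conn.execute(
        "SELECT CustomerID FROM Customers WHERE Region = ? ORDER BY CustomerID",
        ("France",),
    )
]
index_names = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'index'")
]

# Ask SQLite how it intends to run the query; the wording of the plan
# depends on the SQLite version, so it is printed rather than parsed.
for step in conn.execute(
    "EXPLAIN QUERY PLAN SELECT CustomerID FROM Customers WHERE Region = 'France'"
):
    print(step)

print(france_ids, index_names)
```

The query returns the same rows with or without the index; the index only changes how the engine locates them.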
View
Customers
CustomerID  CustomerName     CustomerPhone
100         Muisto Linna     XXX-XXX-XXXX
101         Noam Maoz        XXX-XXX-XXXX
102         Vanja Matkovic   XXX-XXX-XXXX
103         Qamar Mounir     XXX-XXX-XXXX
104         Zhenis Omar      XXX-XXX-XXXX
105         Claude Paulet    XXX-XXX-XXXX
106         Alex Pettersen   XXX-XXX-XXXX
Orders
OrderID     CustomerID       SalesPersonID
AD100       101              200
AD101       101              200
AD102       101              200
AX103       103              201
AS104       103              201
AR105       105              200
MK106       105              201
DB205       100              205
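A view is a stored query that can be read like a table. As a rough illustration using Python's built-in `sqlite3` module, the sketch below defines a view over the Customers and Orders data from this slide; the view name `CustomerOrders` and its column selection are hypothetical, since the deck does not show the view definition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT);
CREATE TABLE Orders (OrderID TEXT PRIMARY KEY, CustomerID INTEGER, SalesPersonID INTEGER);

-- Hypothetical view: one row per order, with the customer's name attached.
CREATE VIEW CustomerOrders AS
    SELECT o.OrderID, c.CustomerName, o.SalesPersonID
    FROM Orders AS o
    JOIN Customers AS c ON c.CustomerID = o.CustomerID;
""")
conn.executemany("INSERT INTO Customers VALUES (?, ?)", [
    (100, "Muisto Linna"), (101, "Noam Maoz"), (102, "Vanja Matkovic"),
    (103, "Qamar Mounir"), (104, "Zhenis Omar"), (105, "Claude Paulet"),
    (106, "Alex Pettersen"),
])
conn.executemany("INSERT INTO Orders VALUES (?, ?, ?)", [
    ("AD100", 101, 200), ("AD101", 101, 200), ("AD102", 101, 200),
    ("AX103", 103, 201), ("AS104", 103, 201), ("AR105", 105, 200),
    ("MK106", 105, 201), ("DB205", 100, 205),
])

# The view is queried like a table; the join runs each time it is read.
orders_for_201 = conn.execute(
    "SELECT OrderID, CustomerName FROM CustomerOrders "
    "WHERE SalesPersonID = ? ORDER BY OrderID",
    (201,),
).fetchall()
print(orders_for_201)
```

Because the view stores only the query, rows inserted into Orders after the view is created show up the next time the view is read.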
Lesson 4: Explore concepts of non-relational data
Explore the characteristics of non-relational data
## Customer 1
ID: 1
Name: Mark Hanson
Telephone: [ Home: 1-999-9999999, Business: 1-888-8888888, Cell: 1-777-7777777 ]
Address: [ Home: 121 Main Street, Some City, NY, 10110,
Business: 87 Big Building, Some City, NY, 10111 ]
## Customer 2
ID: 2
Title: Mr
Name: Jeff Hay
Telephone: [ Home: 0044-1999-333333, Mobile: 0044-17545-444444 ]
Address: [ UK: 86 High Street, Some Town, A County, GL8888, UK,
US: 777 7th Street, Another City, CA, 90111 ]
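The two customer records above do not share a fixed set of fields: only the second has a Title, and the telephone and address labels differ. One way to picture this is as JSON-style documents, sketched here with Python dictionaries; the choice of JSON and the helper name `field_names` are assumptions for illustration, not something the deck prescribes.

```python
import json

# Each customer is a self-describing document; fields may differ per record.
customers = [
    {
        "ID": 1,
        "Name": "Mark Hanson",
        "Telephone": {
            "Home": "1-999-9999999",
            "Business": "1-888-8888888",
            "Cell": "1-777-7777777",
        },
        "Address": {
            "Home": "121 Main Street, Some City, NY, 10110",
            "Business": "87 Big Building, Some City, NY, 10111",
        },
    },
    {
        "ID": 2,
        "Title": "Mr",
        "Name": "Jeff Hay",
        "Telephone": {"Home": "0044-1999-333333", "Mobile": "0044-17545-444444"},
        "Address": {
            "UK": "86 High Street, Some Town, A County, GL8888, UK",
            "US": "777 7th Street, Another City, CA, 90111",
        },
    },
]


def field_names(document):
    """Return the top-level field names of one document, sorted."""
    return sorted(document)


fields_per_customer = [field_names(doc) for doc in customers]

# Documents serialize to JSON text and load back unchanged.
round_tripped = json.loads(json.dumps(customers))

print(fields_per_customer)
```

In a relational table, the optional Title and the varying phone labels would need extra nullable columns or extra tables; here each document simply carries the fields it has.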
Identify non-relational database use cases
IoT and Telematics
Often require ingesting large amounts of data in frequent bursts of activity; the data is either semi-structured or structured and often requires real-time processing
Gaming
In-game stats, social media integration, leaderboards, low-latency applications
Business requirements:
• OLTP apps with highly correlated data.
• Easy updates to single or many objects.
• Flexible data modelling.
• Data requirements that evolve.
• Hierarchical data structures.
Lesson 5: Explore concepts of data analytics
Learn about data ingestion and processing
Prescriptive Cognitive
Agenda
• Best for re-hosting and apps requiring OS-level access and control: automated manageability features and OS-level access
• Best for modernizing existing apps: offers high compatibility with SQL Server and native VNET support
• Best for building new apps in the cloud: pre-provisioned or serverless compute and Hyperscale storage to meet demanding workload requirements
Sizes and Storage Performance
• Memory or Storage optimized sizes for best performance
• Tempdb on local SSD
• Data and log on Premium Storage Managed Disks
• Ultra disks for extremely low latency needs
• Azure Blob Read Caching for data disks
HADR
• Azure VM built-in HA
• Failover Cluster Instance with Azure Premium File Share
• Azure Storage built-in DR
• Always On Availability Groups with Cloud Witness
• Azure Backup and Automated backups to Azure Blob Storage
• Hybrid Availability Group Secondary replicas
• File-Snapshot Backups
• HADR on RedHat Linux with Pacemaker and fencing
IaaS vs PaaS
vCore model
Independent scalability
• Fully managed community database: Take advantage of a fully managed service while still using the tools and languages you're familiar with
• Built-in high availability for lowest TCO: Ensure your data is always available without the need for additional costs
• Intelligent performance and scale: Improve performance with built-in intelligence and up to 16TB storage and 20K IOPS
• Industry-leading security and compliance: Protect your data with enhanced security features including Advanced Threat Protection
• Integration with the Azure ecosystem: Build apps faster with Azure services and safeguard your innovation with Azure IP Advantage
Azure Database for PostgreSQL
• Fully managed and secure
• Intelligent performance optimization
• Flexible and open
• High performance scale-out with Hyperscale
Single Server
Hyperscale
Lesson 2: Explore provisioning and deploying relational database offerings in Azure
Provision relational data services
Diagram: client connections to mysqldbsrv.database.windows.net through gateways (GW) at westus1-a.control.database.windows.net (104.42.238.205, port 1433), showing proxy and redirect (redirect-find-db) connection paths, and a private endpoint mysqldbsrv.privatelink.database.windows.net at 10.0.0.5
Read Replica #1 Read Replica #2 Read Replica #3 Read Replica #4 Read Replica #5
• Used to query and manipulate data: SELECT, INSERT, UPDATE, DELETE
• Used to define database objects: CREATE, ALTER, DROP, RENAME
• Used to manage security permissions: GRANT, REVOKE, DENY
Use DML statements
Statement Description
SELECT Select/read from a table
INSERT Insert new rows in a table
UPDATE Edit/Update existing rows in a table
DELETE Delete existing rows in a table
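The four DML statements in the table above can be tried end to end with Python's built-in `sqlite3` module. This is a minimal sketch: the `Customers` table reuses names from the earlier slides, the replacement name `"Renamed Customer"` is a placeholder invented for the example, and SQLite stands in for an Azure relational service.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT NOT NULL)"
)

# INSERT: add new rows.
conn.execute("INSERT INTO Customers (CustomerID, CustomerName) VALUES (?, ?)",
             (100, "Muisto Linna"))
conn.execute("INSERT INTO Customers (CustomerID, CustomerName) VALUES (?, ?)",
             (101, "Noam Maoz"))

# SELECT: read rows back.
after_insert = conn.execute(
    "SELECT CustomerID, CustomerName FROM Customers ORDER BY CustomerID"
).fetchall()

# UPDATE: change an existing row (placeholder value, for illustration only).
conn.execute("UPDATE Customers SET CustomerName = ? WHERE CustomerID = ?",
             ("Renamed Customer", 101))

# DELETE: remove an existing row.
conn.execute("DELETE FROM Customers WHERE CustomerID = ?", (100,))

after_changes = conn.execute(
    "SELECT CustomerID, CustomerName FROM Customers ORDER BY CustomerID"
).fetchall()

print(after_insert)
print(after_changes)
```

The `?` placeholders pass values separately from the SQL text, which avoids building statements by string concatenation.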
Elements of the SELECT Statement
Clause Expression
SELECT <select list>
FROM <table or view>
WHERE <search condition>
GROUP BY <group by list>
ORDER BY <order by list>
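To see the five clauses working together, the sketch below runs a single SELECT against the Orders data from the earlier slides using Python's built-in `sqlite3` module. The particular filter and grouping are chosen only for illustration and are not taken from the deck.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Orders (OrderID TEXT PRIMARY KEY, CustomerID INTEGER, SalesPersonID INTEGER)"
)
conn.executemany("INSERT INTO Orders VALUES (?, ?, ?)", [
    ("AD100", 101, 200), ("AD101", 101, 200), ("AD102", 101, 200),
    ("AX103", 103, 201), ("AS104", 103, 201), ("AR105", 105, 200),
    ("MK106", 105, 201), ("DB205", 100, 205),
])

orders_per_salesperson = conn.execute("""
    SELECT SalesPersonID, COUNT(*) AS OrderCount   -- <select list>
    FROM Orders                                    -- <table or view>
    WHERE CustomerID <> 100                        -- <search condition>
    GROUP BY SalesPersonID                         -- <group by list>
    ORDER BY SalesPersonID                         -- <order by list>
""").fetchall()

print(orders_per_salesperson)
```

The WHERE clause filters rows before grouping, so the single order for customer 100 (and with it salesperson 205) is excluded from the counts.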
Example of SELECT statement
• Table and row constructors add multirow capability to INSERT ... VALUES
VALUES
(10256,39,18,2,0.05),
(10258,39,18,5,0.10);
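The slide shows only the VALUES part of a multirow INSERT. A complete statement might look like the sketch below, run through Python's built-in `sqlite3` module; the table name `OrderDetails` and the column names are hypothetical, because the original INSERT line is not present in the deck.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and column names chosen to fit the five values per row.
conn.execute(
    "CREATE TABLE OrderDetails ("
    "orderid INTEGER, productid INTEGER, unitprice REAL, qty INTEGER, discount REAL)"
)

# One INSERT statement, two row constructors.
conn.execute("""
    INSERT INTO OrderDetails (orderid, productid, unitprice, qty, discount)
    VALUES
        (10256, 39, 18, 2, 0.05),
        (10258, 39, 18, 5, 0.10)
""")

row_count = conn.execute("SELECT COUNT(*) FROM OrderDetails").fetchone()[0]
total_qty = conn.execute("SELECT SUM(qty) FROM OrderDetails").fetchone()[0]
order_ids = [
    row[0] for row in conn.execute("SELECT orderid FROM OrderDetails ORDER BY orderid")
]

print(row_count, total_qty, order_ids)
```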
Use DDL statements
Statement Description
CREATE Create a new object in the database, such as a table or a view.
ALTER Modify the structure of an object. For instance, altering a table to add a new column.
DROP Remove an object from the database.
RENAME Rename an existing object.
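The DDL statements above can be sketched with Python's built-in `sqlite3` module. Note the assumptions: this uses SQLite's dialect, where renaming is written as `ALTER TABLE ... RENAME TO`, and other engines spell it differently (SQL Server, for example, uses the `sp_rename` procedure). The new table name `Clients` is a placeholder invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def table_names(conn):
    """List the user tables currently defined in the database."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [row[0] for row in rows]


# CREATE: define a new table.
conn.execute("CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT)")

# ALTER: change the structure by adding a column.
conn.execute("ALTER TABLE Customers ADD COLUMN CustomerPhone TEXT")

# RENAME: SQLite expresses this through ALTER TABLE (placeholder new name).
conn.execute("ALTER TABLE Customers RENAME TO Clients")

tables_after_rename = table_names(conn)
columns = [row[1] for row in conn.execute("PRAGMA table_info(Clients)")]

# DROP: remove the table and its data.
conn.execute("DROP TABLE Clients")
tables_after_drop = table_names(conn)

print(tables_after_rename, columns, tables_after_drop)
```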
Example of CREATE statement
psql --host=<server-name>.postgres.database.azure.com --username=<admin-user>@<server-name> --dbname=postgres
Query relational data in Azure Database for MySQL
Use MySQL Workbench to query a database
Agenda
Block blob
• Has a maximum size of 4.7TB
• Best for storing large, discrete, binary objects that change infrequently
• Each individual block can store up to 100MB of data
• A block blob can contain up to 50000 blocks
Page blob
• Can hold up to 8TB of data
• Is organized as a collection of fixed-size 512-byte pages
• Used to implement virtual disk storage for virtual machines
Append blob
• The maximum size is just over 195GB
• Is a block blob that is used to optimize append operations
• Each individual block can store up to 4MB of data
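The maximum sizes quoted above follow from the block counts and block sizes. The arithmetic below is a rough check, assuming binary units (1MB = 1024 × 1024 bytes) and assuming the 50000-block limit stated for block blobs also applies to append blobs; the slide itself only states that limit for block blobs.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3
TIB = 1024 ** 4

MAX_BLOCKS = 50_000  # stated on the slide for block blobs

# Block blob: 50000 blocks of up to 100MB each.
block_blob_max_bytes = MAX_BLOCKS * 100 * MIB

# Append blob: assumed to share the 50000-block limit, with 4MB blocks.
append_blob_max_bytes = MAX_BLOCKS * 4 * MIB

# Page blob: how many 512-byte pages fit in 8TB.
page_blob_pages = (8 * TIB) // 512

print(f"block blob:  {block_blob_max_bytes / TIB:.2f} TiB")
print(f"append blob: {append_blob_max_bytes / GIB:.2f} GiB")
print(f"page blob:   {page_blob_pages:,} pages of 512 bytes")
```

Under these assumptions the block blob limit comes out slightly above the 4.7TB figure on the slide, and the append blob limit lands just over 195GB, matching the slide's wording.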
Explore Azure File Storage
Explore Azure Cosmos DB
Web and retail
Using Azure Cosmos DB's multi-master replication model along with Microsoft's performance commitments, Data Engineers can implement a data architecture to support web and mobile applications that achieve a response time of less than 10 ms anywhere in the world.
Gaming
The database tier is a crucial component of gaming applications. Modern games perform graphical processing on mobile/console clients but rely on the cloud to deliver customized and personalized content like in-game stats, social media integration, and high-score leaderboards.
IoT scenarios
Hundreds of thousands of devices, known as Internet of Things (IoT) devices, have been designed and sold to generate sensor data. Using technologies like Azure IoT Hub, Data Engineers can easily design a data solution architecture that captures real-time data. Cosmos DB can accept and store this information very quickly.
Lesson 2: Explore provisioning and deploying non-relational data services in Azure
Provision non-relational data services
You can use the Data Migration tool to import data to Azure
Cosmos DB from a variety of sources, including:
• JSON files
• MongoDB
• SQL Server
• CSV files
• Azure Table storage
• Amazon DynamoDB
• HBase
• Azure Cosmos containers
Configure consistency
Query Azure Cosmos DB
• Up to 14x faster and costs 94% less than other cloud providers
• Leader in the Magic Quadrant for Business Intelligence and Analytics Platforms*
• Azure Databricks (Data prep): up to 10x faster than vanilla Spark
• Azure Data Lake Storage (Store): high performance data lake available in all 54 Azure regions
• Files (unstructured)
• Business/custom apps (structured)
Explore Azure data services for modern data warehousing
What is Azure Data Factory
• Simplifies the provisioning and collaboration of Apache Spark-based analytical solutions
• Utilizes the security capabilities of Azure
• Can integrate with a variety of Azure data platform services and Power BI
What is Azure Synapse Analytics?
What is Azure Analysis Services?
What is Azure HDInsight?
Lesson 2: Explore data ingestion in Azure
Describe data ingestion in Azure
Diagram: Azure Data Factory components: Linked Service, Dataset, Pipeline, Activities, Control Flow (CF), Triggers, Parameters (@), and Integration Runtime (IR), shown with Data Lake Store and Azure Databricks
Demo: Load data into Azure Synapse Analytics
Lesson 3: Explore data storage and processing in Azure
Lesson 3 objectives
• Describe data processing options for performing analytics in Azure
• Explore Azure Synapse Analytics
Data processing options for performing analytics in Azure
• Azure Synapse Analytics
• Azure Databricks
• Azure HDInsight
• Azure Data Factory
• Data Lake Store
Explore Azure Synapse Analytics
Lesson 4: Get started building with Power BI
Learn how Power BI services and applications work together