
DP-203T00:

Ingest and load data into the data warehouse
Agenda

Lesson 01: Use data loading best practices in Azure Synapse Analytics

Lesson 02: Petabyte-scale ingestion with Azure Data Factory


Lesson 01: Use data loading best practices in Azure Synapse Analytics
Dedicated SQL Pool architecture revision

[Architecture diagram: a Control node receives queries and distributes the work across multiple Compute nodes, each operating on its slice of the distributed data]
Understand data load design goals

Where is my data coming from?

Is the data net new, or do you receive changes to existing datasets?

How often is the data being refreshed, added to or replaced?

In which formats does the data arrive?

Is the data ingestible as-is, or are transformation and cleansing tasks required?

Which takes priority, loading or querying/analysis?


Manage singleton updates

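Dedicated SQL pools are optimized for set-based work, so avoid loading or correcting data one row at a time. Where singleton inserts or updates are unavoidable, batch them: land the changes in a staging table and apply them in a single statement. A minimal T-SQL sketch, assuming hypothetical dbo.Stage_Sale and dbo.Sale tables keyed on SaleKey:

-- Apply all staged changes to the production table in one set-based UPDATE
-- (implicit join form) instead of issuing one UPDATE per changed row.
UPDATE dbo.Sale
SET    Quantity = stg.Quantity,
       Amount   = stg.Amount
FROM   dbo.Stage_Sale AS stg
WHERE  dbo.Sale.SaleKey = stg.SaleKey;

-- Insert staged rows that do not yet exist in the production table.
INSERT INTO dbo.Sale (SaleKey, Quantity, Amount)
SELECT stg.SaleKey, stg.Quantity, stg.Amount
FROM   dbo.Stage_Sale AS stg
LEFT JOIN dbo.Sale AS s ON s.SaleKey = stg.SaleKey
WHERE  s.SaleKey IS NULL;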
Set up dedicated data loading accounts

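A minimal T-SQL sketch of a dedicated loading account; the login name, password placeholder, and staging table are illustrative. The CREATE LOGIN runs against the master database and the remaining statements against the dedicated SQL pool:

-- In master: a login reserved for data loads (name and password are placeholders).
CREATE LOGIN loader WITH PASSWORD = '<strong password here>';

-- In the dedicated SQL pool: map the user and grant the permissions loads need.
CREATE USER loader FOR LOGIN loader;
GRANT ADMINISTER DATABASE BULK OPERATIONS TO loader;  -- required by COPY
GRANT CREATE TABLE TO loader;                         -- staging tables / CTAS
GRANT ALTER ON SCHEMA::dbo TO loader;
GRANT INSERT ON dbo.StageSale TO loader;              -- illustrative staging table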
Manage concurrent access to Azure Synapse Analytics

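Concurrent access is easier to manage when you can see which requests are running and which are waiting. One way to check, as a sketch against the request DMV in a dedicated SQL pool:

-- Requests that are executing or waiting for resources right now.
SELECT request_id, session_id, status, submit_time, start_time,
       resource_class, command
FROM   sys.dm_pdw_exec_requests
WHERE  status NOT IN ('Completed', 'Failed', 'Cancelled')
ORDER BY submit_time;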
Implement Workload Management

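Workload management lets you isolate and prioritize loads so they neither starve, nor are starved by, analytical queries. A minimal sketch, reusing the illustrative loader account from earlier; the group name, classifier name, and percentages are placeholders to tune for your own workload:

-- Reserve a share of system resources for loads and cap what they can consume.
CREATE WORKLOAD GROUP DataLoads
WITH ( MIN_PERCENTAGE_RESOURCE = 25
     , CAP_PERCENTAGE_RESOURCE = 50
     , REQUEST_MIN_RESOURCE_GRANT_PERCENT = 25 );

-- Route requests submitted by the loading account into that group.
CREATE WORKLOAD CLASSIFIER LoadClassifier
WITH ( WORKLOAD_GROUP = 'DataLoads'
     , MEMBERNAME = 'loader'
     , IMPORTANCE = ABOVE_NORMAL );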
Use PolyBase, the COPY command, or the Copy activity

[Diagram: PolyBase, the COPY command, and the Copy data activity load data into staging tables, which are then loaded into production tables]
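A minimal COPY sketch loading Parquet files from a data lake into a staging table; the storage account, path, and table name are placeholders. Unlike PolyBase, COPY needs no external data source, file format, or external table objects:

COPY INTO dbo.StageSale
FROM 'https://<storageaccount>.dfs.core.windows.net/data/sale/year=2020/*.parquet'
WITH (
    FILE_TYPE  = 'PARQUET',
    CREDENTIAL = (IDENTITY = 'Managed Identity')
);

From the staging table, move the data into the production table with CTAS or INSERT...SELECT.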
Lesson 02: Petabyte-scale ingestion with Azure Data Factory
Azure Data Factory/Synapse pipeline revision

[Diagram: pipeline components – linked services, datasets, activities, control flow, data flows, parameters, triggers, and integration runtimes, connecting stores such as Azure Data Lake Store and compute such as Azure Databricks]
Petabyte-scale ingestion with Azure Data Factory
Understanding integration runtimes

[Diagram: the Azure integration runtime alongside the self-hosted integration runtime, which reaches on-premises sources such as SQL Server]
Review questions

Q01 – Which data loading feature limits the number of resources a group of
requests can consume in Azure Synapse Analytics?
A01 – Workload management

Q02 – Why should you split one large file into smaller files when loading data into a dedicated SQL pool in Azure Synapse Analytics?
A02 – To take advantage of the Massively Parallel Processing (MPP) architecture

Q03 – In which section of the Data Factory designer canvas would you find the Copy activity?
A03 – Move and Transform
Lab: Ingest and load data into the data warehouse
Lab overview
This lab teaches students how to ingest data into the data warehouse through T-SQL scripts and
Synapse Analytics integration pipelines. The student will learn how to load data into Synapse dedicated
SQL pools with PolyBase and COPY using T-SQL. The student will also learn how to use workload
management along with a Copy activity in an Azure Synapse pipeline for petabyte-scale data ingestion.

Lab objectives
After completing this lab, you will be able to:

Use data loading best practices in Azure Synapse Analytics

Perform petabyte-scale ingestion with Azure Data Factory


Lab review

Question 1 – What is the appropriate distribution to use in a staging table?

Question 2 – When creating an External Data Source, what should the Type property be set to in order to access data in a data lake?

Question 3 – When setting file format properties for a data source in the
Copy Activity, you can import the schema from a connection/store and what
other option?

Question 4 – You want to truncate a staging table before loading it with data in a Copy activity. Which sink property would you set to do this?
Module summary
In this module, you have learned about:

Using data loading best practices in Azure Synapse Analytics

Petabyte-scale ingestion with Azure Data Factory

Next steps
After the course, consider reading [Best practices for loading data using dedicated SQL pools in Azure Synapse Analytics] for more guidance on data ingestion and loading.
© Copyright Microsoft Corporation. All rights reserved.
