
DP-203T00:

Ingest and load data into the data warehouse
Agenda

Lesson 01: Use data loading best practices in Azure Synapse Analytics

Lesson 02: Petabyte-scale ingestion with Azure Data Factory


Lesson 01: Use data loading best practices in Azure Synapse Analytics
Dedicated SQL Pool architecture revision

[Architecture diagram: a Control node receives queries and distributes the work across multiple Compute nodes, each operating on its slice of the distributed data]
Understand data load design goals

Where is my data coming from?

Is the data net new, or do you receive changes to existing datasets?

How often is the data being refreshed, added to or replaced?

In which formats does the data arrive?

Is the data ingestible as-is, or are transformation and cleansing tasks required?

Which takes priority, loading or querying/analysis?


Manage singleton updates

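Dedicated SQL pools are optimized for set-based work, so avoid loading or correcting data one row at a time. Where singleton inserts or updates are unavoidable, batch them: land the changes in a staging table and apply them in a single statement. A minimal T-SQL sketch, assuming hypothetical dbo.Stage_Sale and dbo.Sale tables keyed on SaleKey:

-- Apply all staged changes to the production table in one set-based UPDATE
-- (implicit join form) instead of issuing one UPDATE per changed row.
UPDATE dbo.Sale
SET    Quantity = stg.Quantity,
       Amount   = stg.Amount
FROM   dbo.Stage_Sale AS stg
WHERE  dbo.Sale.SaleKey = stg.SaleKey;

-- Insert staged rows that do not yet exist in the production table.
INSERT INTO dbo.Sale (SaleKey, Quantity, Amount)
SELECT stg.SaleKey, stg.Quantity, stg.Amount
FROM   dbo.Stage_Sale AS stg
LEFT JOIN dbo.Sale AS s ON s.SaleKey = stg.SaleKey
WHERE  s.SaleKey IS NULL;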
Set up dedicated data loading accounts

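A minimal T-SQL sketch of a dedicated loading account; the login name, password placeholder, and staging table are illustrative. The CREATE LOGIN runs against the master database and the remaining statements against the dedicated SQL pool:

-- In master: a login reserved for data loads (name and password are placeholders).
CREATE LOGIN loader WITH PASSWORD = '<strong password here>';

-- In the dedicated SQL pool: map the user and grant the permissions loads need.
CREATE USER loader FOR LOGIN loader;
GRANT ADMINISTER DATABASE BULK OPERATIONS TO loader;  -- required by COPY
GRANT CREATE TABLE TO loader;                         -- staging tables / CTAS
GRANT ALTER ON SCHEMA::dbo TO loader;
GRANT INSERT ON dbo.StageSale TO loader;              -- illustrative staging table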
Manage concurrent access to Azure Synapse Analytics

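Concurrent access is easier to manage when you can see which requests are running and which are waiting. One way to check, as a sketch against the request DMV in a dedicated SQL pool:

-- Requests that are executing or waiting for resources right now.
SELECT request_id, session_id, status, submit_time, start_time,
       resource_class, command
FROM   sys.dm_pdw_exec_requests
WHERE  status NOT IN ('Completed', 'Failed', 'Cancelled')
ORDER BY submit_time;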
Implement Workload Management

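Workload management lets you isolate and prioritize loads so they neither starve, nor are starved by, analytical queries. A minimal sketch, reusing the illustrative loader account from earlier; the group name, classifier name, and percentages are placeholders to tune for your own workload:

-- Reserve a share of system resources for loads and cap what they can consume.
CREATE WORKLOAD GROUP DataLoads
WITH ( MIN_PERCENTAGE_RESOURCE = 25
     , CAP_PERCENTAGE_RESOURCE = 50
     , REQUEST_MIN_RESOURCE_GRANT_PERCENT = 25 );

-- Route requests submitted by the loading account into that group.
CREATE WORKLOAD CLASSIFIER LoadClassifier
WITH ( WORKLOAD_GROUP = 'DataLoads'
     , MEMBERNAME = 'loader'
     , IMPORTANCE = ABOVE_NORMAL );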
Use PolyBase, the COPY command, or the Copy activity

[Diagram: PolyBase, the COPY command, and the Copy data activity load data into staging tables, which are then loaded into production tables]
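A minimal COPY sketch loading Parquet files from a data lake into a staging table; the storage account, path, and table name are placeholders. Unlike PolyBase, COPY needs no external data source, file format, or external table objects:

COPY INTO dbo.StageSale
FROM 'https://<storageaccount>.dfs.core.windows.net/data/sale/year=2020/*.parquet'
WITH (
    FILE_TYPE  = 'PARQUET',
    CREDENTIAL = (IDENTITY = 'Managed Identity')
);

From the staging table, move the data into the production table with CTAS or INSERT...SELECT.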
Lesson 02: Petabyte-scale ingestion with Azure Data Factory
Azure Data Factory/Synapse pipeline revision

[Diagram: pipeline components – linked services, datasets, activities, control flow, data flows, parameters, triggers, and integration runtimes, connecting stores such as Azure Data Lake Store and compute such as Azure Databricks]
Petabyte-scale ingestion with Azure Data Factory
Understanding integration runtimes

[Diagram: the Azure integration runtime alongside the self-hosted integration runtime, which reaches on-premises sources such as SQL Server]
Review questions

Q01 – Which data loading feature limits the number of resources a group of
requests can consume in Azure Synapse Analytics?
A01 – Workload management

Q02 – Why should you split one large file into smaller files when loading data into a dedicated SQL pool in Azure Synapse Analytics?
A02 – To take advantage of the Massively Parallel Processing (MPP) architecture

Q03 – In which section of the Data Factory designer canvas would you find the Copy activity?
A03 – Move and Transform
Lab: Ingest and load data into the data warehouse
Lab overview
This lab teaches students how to ingest data into the data warehouse through T-SQL scripts and
Synapse Analytics integration pipelines. The student will learn how to load data into Synapse dedicated
SQL pools with PolyBase and COPY using T-SQL. The student will also learn how to use workload
management along with a Copy activity in an Azure Synapse pipeline for petabyte-scale data ingestion.

Lab objectives
After completing this lab, you will be able to:

Use data loading best practices in Azure Synapse Analytics

Perform petabyte-scale ingestion with Azure Data Factory


Lab review

Question 1 – What is the appropriate distribution to use in a staging table?

Question 2 – When creating an External Data Source, what should the Type property be set to in order to access data in a data lake?

Question 3 – When setting file format properties for a data source in the
Copy Activity, you can import the schema from a connection/store and what
other option?

Question 4 – You want to truncate a staging table before loading it with data in a Copy activity. Which sink property would you set to do this?
Module summary
In this module, you have learned about:

Using data loading best practices in Azure Synapse Analytics

Petabyte-scale ingestion with Azure Data Factory

Next steps
After the course, consider reading [Best practices for loading data using dedicated SQL pools in Azure Synapse Analytics] for more guidance on data ingestion and loading.
© Copyright Microsoft Corporation. All rights reserved.
