Professional Documents
Culture Documents
A Complete Introduction
V1 V2
2016 2020
What is Azure Data Factory?
https://azure.microsoft.com/en-gb/services/data-factory/
What is Azure Data Factory?
1 Linked Services – How and what to connect to. Like the SSIS connection manager.
SQLDBLinkedService
ConnectionString: Server=MyServer;Database=myDataBase
UserName: “MrPaulAndrew”
Password: ***************
Data Factory Components
Copy Data Transform
1 Linked Services – How and what to connect to. Like the SSI
SQLDBLinkedService
ConnectionString: Server=MyServer;Database=myDataBase
UserName: “MrPaulAndrew”
Password: ***************
Data Factory Components
Copy Data Transform
1 Linked Services
Data Factory Components
Copy Data Transform
1 Linked Services
2 Data Sets – Where is my data? What format? What file path/table do I need?
dbo.DimCustomer
/RAW/Orders/2018/01/01/Orders.csv
Data Factory Components
Copy Data Transform
1 Linked Services
2 Data Sets
Databricks Notebook Activity
3 Activities – What do we notebookPath: /Playground/Playing
want to happen? baseParameters: Testing
With what conditions? libraries[ jar]: dbfs:/lib1.jar
linkedServiceName: BricksOfData01
Data Factory Components
Copy Data Transform
1 Linked Services
2 Data Sets
3 Activities
1 Linked Services
2 Data Sets
3 Activities
1 Linked Services
1 Linked Services
1 Linked Services
5 Triggers
Data Factory Components
Copy Data Transform
1 Linked Services
1 Linked Services
5 Triggers
Data Factory Components
Copy Data Transform
1 Linked Services
1 Linked Services
2 Data Sets
3 Activities
4 Pipelines
5 Triggers
Data Factory Control Flow Components
Copy Data Transform
1 Linked Services
2 Data Sets
3 Activities
4 Pipelines
5 Triggers
Data Factory Control Flow Components
Copy Data Transform
1 ✓Linked Services
2 ✓Data Sets
3 ✓Activities
4 ✓Pipelines
5 Triggers
Data Factory Control Flow Components
Copy Data Transform
1 Linked Services
Expression Builder
2 Data Sets Parameters
@{………}
System Variables
3 Activities
• Collection
4 Pipelines • Conversation
• Date
5 Triggers • Logical
• Math
• String
Integration Runtimes
Integration Runtimes
Flexible Region
Data Movements
Azure
1 Integration Runtime Activity
Orchestration
Specified Region
Virtual Machine
Gateway Access
Self Hosted
3 Integration Runtime Activity
Orchestration
Integration Runtimes
Flexible Region
Movement Hours
Azure
1 Integration Runtime Activity
Orchestration
Specified Region
Virtual Machine
Gateway Access
Self Hosted
3 Integration Runtime Activity
Orchestration
Running SSIS Packages in Azure Data Factory
Parent/Child Packages
SSIS Package
SSIS
Activity
Package
80x
Run Package
8x Packages per
Node
10x Nodes
Logical or Managed
Instance
Data Factory
SSISDB
Visual Studio – SQL Server Data Tools (SSDT) SSIS IR
Integration Runtimes
Flexible Region
Movement Hours
Azure
1 Integration Runtime Activity
Orchestration
Specified Region
Virtual Machine
Gateway Access
Self Hosted
3 Integration Runtime Activity
Orchestration
Integration Runtimes
Flexible Region
Movement Hours
Azure
1 Integration Runtime Activity
Orchestration
Specified Region
Virtual Machine
Gateway Access
Self Hosted
3 Integration Runtime Activity
Orchestration
Single Hosted IR
On Premises Azure
Multiple Hosted IR’s (Failover & Load Balancing)
On Premises Azure
Hosted IR Linked to Multiple Data Factory’s
On Premises Azure
Using a Hosted IR with Express Route
On Premises Azure
Using a Hosted IR with Express Route
On Premises Azure
Recap: Data Factory
1 Linked Services Azure
1 Integration Runtime
2 Data Sets
SSIS
3 Activities 2 Integration Runtime
4 Pipelines
Self Hosted
5 Triggers
3 Integration Runtime
Data Factory Key Activities
Lookup Activity
Get Metadata to Support Other Control Flow Activities
Single Value
Or
Many Values
Dataset [array]
Lookup
https://docs.microsoft.com/en-us/azure/data-factory/control-flow-lookup-activity
ForEach Activity IsSequential:
true
[array]
Scaling Out Control Flow Activities [0]
[1]
[i]
ForEach
[array]
Lookup
https://docs.microsoft.com/en-us/azure/data-factory/control-flow-for-each-activity
Custom Activity Extend Data Factory with Custom Code
References Objects
Datasets: []
Linked Services: []
Custom
Linked Services
Azure Batch ???
Azure Blob Storage
https://docs.microsoft.com/en-us/azure/data-factory/transform-data-using-dotnet-custom-activity
Building a Custom Activity
A .Net Console App Executed Using Azure Batch Service.
Data Factory
Pipeline &
Custom Activity
Custom
Azure AD
Batch
Service
Solution &
Projects
Data Lake
Blob Store
C# Classes
Compute Pool Clean Data
https://mrpaulandrew.com/2018/11/12/creating-an-azure-data-factory-v2-custom-activity/ Key Vault
Recap: Data Factory – Useful Activities
Lookup
1 Get value(s) from lots of places to support other activities.
ForEach
2 Iteration of other activities, sequentially or in parallel.
Custom
3 Extensibility - code executed by Azure Batch compute pools.
Simple Dynamic Copy Pipeline
Demo Architecture
SQLDB
Data Lake
Store Gen2
Hosted IR
C:\
Wrangling Mapping
Data Flow Data Flow
1 Linked Services
2 Data Sets
3 Activities
4 Pipelines
5 Triggers
Mapping Data Flows
What is a Mapping Data Flow?
Mapping Data Azure
Flow Databricks
Data Factory
https://mrpaulandrew.com/2018/10/22/what-is-azure-data-factory-data-flow/
Mapping Data Flows
Mapping Data Flows – Settings & Concepts
Mapping Data Flows – Transformations
Multicast
Merge Join
Script Component
Mapping Data Flows – Expression Builder
Mapping Data Flows – Debug Mode
Power BI Desktop
Data Factory
What is a Wrangling Data Flow?
Wrangling Data Azure
Flow Databricks
Data Factory
Data Flow - Cluster Configuration
Azure
Databricks
• General Purpose
• Memory Optimised
• Compute Optimised
Recap: Data Factory – Data Flows
Mapping
1 Similar to SSIS Data Flows in appearance.
Wrangling
2 Similar to Power BI Power Query.
Git
master*
adf_publish Deployed
Developer Service
Save
Debug Dev
“ARMTemplateForFactory.json”
save
merge
master publish
Deployed
Developer Service
Dev Prod
Option 2 – ARM Templates for Multiple Data Factory Services
Debug
Test Service Live Service
Git branch Service
save
merge
master ARM
Template
Developer
Dev Test Prod
Recap: Data Factory – DevOps
Development and Debugging
1 Via the Portal UI
Source Code
2 3x choices of JSON files!
Deployment
3 Several options depending on requirements.
Summary
What is Azure Data Factory?
Copy Data
Custom
Do Stuff
Blog: mrpaulandrew.com
Email: paul@mrpaulandrew.com
Twitter: @mrpaulandrew
LinkedIn: In/mrpaulandrew
Slides:
GitHub: github.com/mrpaulandrew Community Events
Repository