AZURE DATA FACTORY
Contents
AZURE DATA FACTORY
Objective
Tools & Environment
For Creating a Data Factory & Exploring the UI
For Creating Linked Services
Azure Data Factory (ADF) – Conceptual Overview
Key Components
ADF UI Overview
Azure Blob Storage
Azure SQL Database
Data Flow in This Project
Task
Value Creator
Azure Data Factory (ADF)
Tasks
a) Create a data factory and explore the UI (Author, Monitor, Manage tabs)
ADF UI Overview – Three Main Tabs
b) Create Linked Services: Azure Blob Storage & Azure SQL Server
Step 1
Connector configuration details
c) Create datasets for CSV files (Blob) and SQL tables
d) Build a simple pipeline that copies data from Blob Storage to SQL Server
e) Trigger a pipeline manually and monitor the run
Objective:
This task supports my Performance Improvement Plan (PIP) by
enhancing my skills in cloud-based data integration using Azure Data
Factory and Azure SQL Database. The goal is to design a simple ETL
pipeline that moves data from Azure Blob Storage to a relational database
and apply T-SQL queries for data manipulation and analysis.
Tools & Environment:
For Creating a Data Factory & Exploring the UI
Azure Portal – Web-based interface to create and manage Azure
services
Azure Data Factory (ADF) – Cloud-based data integration service
For Creating Linked Services
Azure Data Factory Studio – UI to create linked services, datasets,
and pipelines
Azure Blob Storage – Used as a source or destination for files
Azure SQL Database – Cloud-based relational database
Authentication Methods – Account Key for Blob Storage, SQL
Authentication for Azure SQL
Azure Data Factory (ADF) – Conceptual Overview:
Azure Data Factory is a cloud-based data integration service that allows
you to create, schedule, and manage data pipelines. It provides a code-free
and code-friendly environment for building complex data workflows.
Key Components:
1. Pipelines: Logical containers that define the data flow process.
2. Activities: Tasks performed in a pipeline, such as copying or
transforming data.
3. Datasets: Represent data structures (e.g., files, tables) used by
activities.
4. Linked Services: Define connections to external data sources or
sinks.
5. Integration Runtime (IR): The compute environment for executing
activities.
ADF UI Overview
1. Author Tab
o Used to create and manage pipelines, datasets, and data flows.
o Includes a visual designer and code view.
o Supports drag-and-drop activities for building workflows.
2. Monitor Tab
o Used to track pipeline executions and monitor performance.
o Displays success/failure status, execution time, and error details.
3. Manage Tab
o Used to configure:
Linked services (e.g., Azure Blob, SQL DB)
Integration Runtimes
Triggers (schedules)
Global parameters
Azure Blob Storage
Azure Blob Storage is a scalable, cloud-based object storage system. In
this project, it serves as the source or landing zone for raw or semi-
structured data files.
Uses in this project:
Stores source files (e.g., CSV or JSON) that will be loaded into Azure
SQL.
Configured in ADF via a Linked Service using Storage Account keys or
SAS tokens.
Accessed using binary, delimited, or JSON dataset formats in ADF.
Azure SQL Database
Azure SQL Database is a fully managed relational database in the cloud.
It is used in this project as the destination (sink) for structured data.
Uses in this project:
1. Acts as a central storage point for cleaned and structured data.
2. Receives data loaded via ADF’s Copy Data activity.
3. Can be queried using T-SQL in tools like Azure Data Studio or SSMS.
4. Supports features like indexes, views, joins, constraints, and stored
procedures.
Data Flow in This Project
1. Source: File in Azure Blob Storage (e.g., Employee data in CSV).
2. ADF Pipeline:
o Connects to Blob using a Linked Service
o Defines a dataset pointing to the file
o Uses a Copy Activity to move data
o Connects to Azure SQL using another Linked Service
o Defines a target SQL dataset
3. Destination: Azure SQL Database table (e.g., Employee)
4. Post-load: Data is verified and queried using T-SQL
Task:
Value Creator
Azure Data Factory (ADF)
Tasks
a) Create a data factory and explore the UI (Author, Monitor,
Manage tabs).
Installations:
Azure Portal – Web-based interface to create and manage Azure
services
Azure Data Factory (ADF) – Cloud-based data integration service
Step 1: Sign in to Azure Portal
Open your browser and go to: https://portal.azure.com
Sign in with your Azure credentials.
Step 2: Create a Data Factory
1. In the search bar at the top, type “Data Factory” and select it.
2. Click “+ Create”.
3. Fill in the required fields:
o Subscription: Choose your subscription.
o Resource Group: Select existing or create a new one.
o Region: Select a region (e.g., East US).
o Name: Provide a unique name for the Data Factory.
4. Select Version V2.
5. Click Review + Create, then Create.
6. Wait for deployment to complete, then click Go to resource.
Step 3: Exploring the Azure Data Factory UI
Once the deployment is complete:
1. Go to the Data Factory instance you created.
2. Click on "Launch Studio" – this opens the ADF UI.
ADF UI Overview – Three Main Tabs
1. Author Tab
This is where you design and build your data pipelines.
Sections:
o Pipelines: Create ETL/ELT workflows.
o Datasets: Define data structures (input/output).
o Linked services: Connections to data sources like Azure Blob,
SQL, etc.
o Data flows: For visually transforming data (mapping data flows).
o Triggers: Schedule or event-based pipeline executions.
Actions:
o Click + (Add resource) to create a pipeline, dataset, data flow,
etc.
o Use the drag-and-drop canvas to build pipeline workflows.
2. Monitor Tab
Used to track pipeline execution and debug issues.
Sections:
o Pipeline runs: View history of pipeline executions (status,
duration, etc.).
o Trigger runs: Monitor trigger-based executions.
o Integration runtimes: View status of your compute
environment.
Actions:
o Click on a failed pipeline to view activity details and
troubleshoot.
o Filter logs by status, date, or name.
3. Manage Tab
Used for configuration and administration.
Sections:
o Linked services: Add or manage connections to external
systems.
o Integration runtimes: Manage self-hosted or Azure-hosted
compute.
o Triggers: Create/edit triggers.
o Git configuration: Integrate with Git for source control.
Actions:
o Set up self-hosted IR for on-premises connectivity.
o Configure Git repository to track changes in your pipelines.
What is a Linked Service?
A Linked Service in ADF is like a connection string. It defines the connection
information needed for ADF to connect to external resources (e.g.,
databases, storage).
b) Create Linked Services: Azure Blob Storage & Azure SQL Server
Step 1: Launch Azure Data Factory Studio
1. Go to https://portal.azure.com
2. Open your Data Factory resource.
3. Click "Launch Studio".
Part 1: Create Linked Service for Azure Blob Storage
Steps:
1. Go to the Manage tab (gear icon on the left).
2. Under Connections, click Linked services.
3. Click + New.
4. In the New linked service pane:
o Search and select Azure Blob Storage.
5. Click Continue.
Configuration options:
Name: e.g., AzureBlobStorage1
Authentication method: Choose from:
o Account key (simplest for testing)
o Managed Identity (recommended for production)
o SAS token or Service Principal
Storage account name: Select your Blob Storage account.
Connector configuration details
The following sections provide details about properties that are used to
define Data Factory and Synapse pipeline entities specific to Blob storage.
Linked service properties
This Blob storage connector supports the following authentication types. See
the corresponding sections for details.
1. Anonymous authentication
2. Account key authentication
3. Shared access signature authentication
4. Service principal authentication
5. System-assigned managed identity authentication
6. User-assigned managed identity authentication
1. Anonymous authentication
The following properties are supported for anonymous authentication in
Azure Data Factory or Synapse pipelines:
type – The type property must be set to AzureBlobStorage (suggested) or
AzureStorage (see the following notes). (Required: Yes)
containerUri – Specify the URI of the Azure Blob container that has
anonymous read access enabled, in the format
https://<AccountName>.blob.core.windows.net/<ContainerName>, and configure
anonymous public read access for containers and blobs. (Required: Yes)
connectVia – The integration runtime to be used to connect to the data
store. You can use the Azure integration runtime or the self-hosted
integration runtime (if your data store is in a private network). If this
property isn't specified, the service uses the default Azure integration
runtime. (Required: No)
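A representative linked service definition for anonymous access might look
like the following; this is only a sketch based on the properties above,
and the linked service name, account name, and container name are
placeholders:
{
    "name": "AzureBlobStorageAnonymous",
    "properties": {
        "type": "AzureBlobStorage",
        "typeProperties": {
            "containerUri": "https://<AccountName>.blob.core.windows.net/<ContainerName>"
        },
        "connectVia": {
            "referenceName": "<name of Integration Runtime>",
            "type": "IntegrationRuntimeReference"
        }
    }
}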
2. Account key authentication
The following properties are supported for storage account key
authentication in Azure Data Factory or Synapse pipelines:
type – The type property must be set to AzureBlobStorage (suggested) or
AzureStorage (see the following notes). (Required: Yes)
connectionString – Specify the information needed to connect to Storage
for the connectionString property. You can also put the account key in
Azure Key Vault and pull the accountKey configuration out of the
connection string. For more information, see the following samples and the
Store credentials in Azure Key Vault article. (Required: Yes)
connectVia – The integration runtime to be used to connect to the data
store. You can use the Azure integration runtime or the self-hosted
integration runtime (if your data store is in a private network). If this
property isn't specified, the service uses the default Azure integration
runtime. (Required: No)
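JSON CODE:
A representative linked service definition using account key
authentication, based on the property table above; the linked service name
and the connection string values are illustrative placeholders:
{
    "name": "AzureBlobStorageLinkedService",
    "properties": {
        "type": "AzureBlobStorage",
        "typeProperties": {
            "connectionString": "DefaultEndpointsProtocol=https;AccountName=<accountname>;AccountKey=<accountkey>"
        },
        "connectVia": {
            "referenceName": "<name of Integration Runtime>",
            "type": "IntegrationRuntimeReference"
        }
    }
}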
Using Account key authentication:
Click “Test connection” (to verify access).
Click Create.
Part 2: Create Linked Service for Azure SQL Server
Steps:
1. In the Linked services page, click + New again.
2. Search and select Azure SQL Database or SQL Server (based on
your setup).
3. Click Continue.
Configuration options:
Name: e.g., AzureSQLDBLS
Server name: e.g., yourserver.database.windows.net
Database name: Enter your DB name.
Authentication type: Choose from:
o SQL Authentication (username/password)
o Managed Identity
Username: SQL admin username.
Password: SQL password (stored securely).
Encrypted connection: Usually enabled.
4. Click “Test connection”, then create once it succeeds.
SQL authentication
To use SQL authentication, in addition to the generic properties that are
described in the preceding section, specify the following properties:
userName – The user name used to connect to the server. (Required: Yes)
password – The password for the user name. Mark this field as SecureString
to store it securely. (Required: Yes)
JSON CODE
"name": "AzureSqlDbLinkedService",
"properties": {
"type": "AzureSqlDatabase",
"typeProperties": {
"server": "<name or network address of the SQL server instance>",
"database": "<database name>",
"encrypt": "<encrypt>",
"trustServerCertificate": false,
"authenticationType": "SQL",
"userName": "<user name>",
"password": {
"type": "SecureString",
"value": "<password>"
},
"connectVia": {
"referenceName": "<name of Integration Runtime>",
"type": "IntegrationRuntimeReference"
}
c) Create datasets for CSV files (Blob) and SQL tables.
First, the linked services for Blob Storage and SQL Server were created
(see section b). After that:
Step 1: Create Dataset for CSV File (Blob Storage)
1. Go to the Author tab (pencil icon).
2. Expand Datasets > Click + > Add dataset.
3. Select Azure Blob Storage > DelimitedText > Click Continue.
4. Provide:
o Name: DS_CSVInput
o Linked Service: LS_BlobStorage
o File path: e.g., container-name/folder/filename.csv
5. Set:
o Column delimiter: Comma (,), or other if needed.
o First row as header: Check this box if applicable.
o Import schema or leave as none (you can define schema
manually).
6. Click OK or Create.
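Behind the UI, the dataset is stored as JSON. A minimal sketch of the CSV
dataset described above; the container, folder, and file names are
placeholders, and the delimiter and header settings mirror the options
chosen in step 5:
{
    "name": "DS_CSVInput",
    "properties": {
        "type": "DelimitedText",
        "linkedServiceName": {
            "referenceName": "LS_BlobStorage",
            "type": "LinkedServiceReference"
        },
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "container": "<container-name>",
                "folderPath": "<folder>",
                "fileName": "<filename>.csv"
            },
            "columnDelimiter": ",",
            "firstRowAsHeader": true
        },
        "schema": []
    }
}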
Create Dataset for SQL Table
1. Again, click + > Add dataset.
2. Select Azure SQL Database (or SQL Server) > Click Continue.
3. Provide:
o Name: DS_SQLTarget
o Linked Service: LS_SQLServer
o Table name: Browse or type (e.g., dbo.Employees)
4. Import schema or define manually.
5. Click OK.
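Similarly, a minimal sketch of the SQL table dataset as JSON; the schema
and table names assume the dbo.Employees example above:
{
    "name": "DS_SQLTarget",
    "properties": {
        "type": "AzureSqlTable",
        "linkedServiceName": {
            "referenceName": "LS_SQLServer",
            "type": "LinkedServiceReference"
        },
        "typeProperties": {
            "schema": "dbo",
            "table": "Employees"
        }
    }
}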
d) Build a simple pipeline that copies data from Blob Storage to SQL
Server.
Create the Pipeline
Go to Author > Pipelines
Click New Pipeline
Drag a Copy Data activity onto the canvas
Configure:
o Source: Select Blob dataset
o Sink: Select SQL Server dataset
o Optionally: Configure mappings, pre/post SQL, etc.
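The published pipeline is also stored as JSON. A minimal sketch of a
pipeline with a single Copy activity wired to the two datasets created
earlier; the pipeline and activity names are illustrative, and the
source/sink settings shown are typical defaults rather than a definitive
configuration:
{
    "name": "CopyBlobToSqlPipeline",
    "properties": {
        "activities": [
            {
                "name": "CopyEmployeeCsvToSql",
                "type": "Copy",
                "inputs": [
                    { "referenceName": "DS_CSVInput", "type": "DatasetReference" }
                ],
                "outputs": [
                    { "referenceName": "DS_SQLTarget", "type": "DatasetReference" }
                ],
                "typeProperties": {
                    "source": {
                        "type": "DelimitedTextSource",
                        "storeSettings": { "type": "AzureBlobStorageReadSettings" },
                        "formatSettings": { "type": "DelimitedTextReadSettings" }
                    },
                    "sink": { "type": "AzureSqlSink" }
                }
            }
        ]
    }
}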
Run & Monitor the Pipeline
Publish the pipeline
Trigger the pipeline manually or on a schedule
Go to Monitor to check execution status and logs
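If the pipeline should run on a schedule rather than manually, a trigger
can be defined from the Manage tab. A minimal sketch of a daily schedule
trigger; the trigger name, start time, and pipeline reference are
illustrative placeholders:
{
    "name": "DailyTrigger",
    "properties": {
        "type": "ScheduleTrigger",
        "typeProperties": {
            "recurrence": {
                "frequency": "Day",
                "interval": 1,
                "startTime": "2025-01-01T00:00:00Z",
                "timeZone": "UTC"
            }
        },
        "pipelines": [
            {
                "pipelineReference": {
                    "referenceName": "CopyBlobToSqlPipeline",
                    "type": "PipelineReference"
                }
            }
        ]
    }
}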
Open Your Pipeline
Go to the Author tab (pencil icon on the left)
Find your pipeline under Pipelines
Click to open the pipeline
Trigger the Pipeline Manually
With the pipeline open, click "Add Trigger" (top menu)
Select "Trigger Now"
If your pipeline has parameters, a dialog will pop up—fill in required
values
Click "OK" to trigger the pipeline run
e) Trigger a pipeline manually and monitor the run.
Monitor the Pipeline Run
Switch to the Monitor tab (clock icon on the left)
You’ll see a list of pipeline runs
Find your pipeline by name and click on the latest Run ID
This shows details such as:
o Status: Succeeded, Failed, In Progress
o Start/End Time
o Activities: Individual steps with status
o Output & Logs
View Activity-Level Details
Click on the Copy Data activity in the pipeline run
You'll see:
o Input/output datasets
o Number of rows read/written
o Any error messages if the run failed
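For illustration, the Copy Data activity output shown in the Monitor tab is
JSON. An abbreviated, hypothetical example of the kind of fields it
reports; the exact fields and values vary by source, sink, and run:
{
    "dataRead": 10240,
    "dataWritten": 10240,
    "rowsRead": 100,
    "rowsCopied": 100,
    "copyDuration": 12,
    "errors": []
}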