Hands On Exercises
Table of Contents
Unit 1: Create Azure Data Factory Workspace...............................................................................4
Exercise 1: Creating Data Factory.............................................................................................4
Unit 2: Understanding Key Components.........................................................................................7
Exercise 2: How to create Azure Integration Runtime................................................................7
Exercise 3: How to create Linked Self-Hosted Integration Runtime........................................14
Exercise 4: How to create Linked Services...............................................................................18
4.1: Linked Service with SAP Tables.........................................................................................19
4.2: Linked Service with SQL Database....................................................................................22
4.3: Linked Service with Azure Data Lake Storage Gen2.........................................................24
Exercise 5: How to create Datasets...........................................................................................27
5.1: Dataset with SAP Table......................................................................................................28
5.2: Dataset with SQL Database...............................................................................................30
5.3: Dataset with Data Lake Storage Gen2...............................................................................32
Exercise 6: Create a pipeline....................................................................................................35
Unit 3: Activities............................................................................................................................38
Exercise 7: How to create Lookup Activities............................................................................38
Exercise 8: How to create Stored Procedure Activity...............................................................39
Exercise 9: How to create Copy Data Activity..........................................................................41
Exercise 10: How to create Azure Function Activities..............................................................44
Exercise 11: How to create Iteration & Conditional Activities................................................46
Unit 4: Triggers, publishing and monitoring.................................................................................48
Exercise 12: How to create Schedule Trigger...........................................................................48
Exercise 13: How to create Tumbling Window Trigger............................................................50
Exercise 14: How to create Event Trigger................................................................................52
Exercise 15: Publishing of the Activity......................................................................................56
Exercise 16: Debugging & Monitoring a pipeline....................................................................58
Unit 5: Copy multiple tables in bulk.............................................................................................59
Exercise 1: End-to-end workflow..............................................................................................60
Exercise 2: Prerequisites...........................................................................................................60
Exercise 3: Prepare SQL Database and Azure Synapse Analytics (formerly SQL DW)..........60
Exercise 4: Azure services to access SQL server......................................................................61
Exercise 5: Create a data factory..............................................................................................61
Exercise 6: Create the source Azure SQL Database linked service..........................................63
Exercise 7: Create the sink Azure Synapse Analytics (formerly SQL DW) linked service.......64
Exercise 8: Create the staging Azure Storage linked service...................................................65
Exercise 9: Create a dataset for source SQL Database............................................................66
Exercise 10: Create a dataset for sink Azure Synapse Analytics (formerly SQL DW).............66
Exercise 11: Create pipelines....................................................................................................69
Exercise 12: Create the pipeline IterateAndCopySQLTables...................................................69
Exercise 13: Create the pipeline GetTableListAndTriggerCopyData......................................74
Exercise 14: Trigger a pipeline run..........................................................................................77
Exercise 15: Monitor the pipeline run......................................................................................77
Unit 6: Incrementally load data from Azure SQL Database to Azure Blob storage.....................79
Exercise 1: Overview.................................................................................................................79
Exercise 2: Prerequisites...........................................................................................................80
Exercise 3: Create a data source table in your SQL database.................................................81
Exercise 4: Create another table in your SQL database to store the high watermark value. . .81
Exercise 5: Create a stored procedure in your SQL database..................................................82
Exercise 6: Create a data factory..............................................................................................83
Exercise 7: Create a pipeline....................................................................................................85
Exercise 8: Monitor the pipeline run........................................................................................94
Exercise 9: Review the results...................................................................................................94
Exercise 10: Trigger another pipeline run................................................................................95
Exercise 11: Monitor the second pipeline run..........................................................................95
Exercise 12: Verify the second output.......................................................................................96
Unit 1: Create Azure Data Factory Workspace
Exercise 1: Creating Data Factory
2. Under the Create Data Factory page, provide the values to create a Data Factory workspace. After filling in all the required details, click on Review + create.
After a few seconds, you will see the message 'Your deployment is complete'. Click on 'Go to resource'. The screen below will appear:
3. Click on Launch Studio to get to the Data Factory homepage. On the left panel you will see options such as Author, Monitor, and Manage, which we will use in further exercises.
Unit 2: Understanding Key Components
Exercise 2: How to create Azure Integration Runtime
1. On the Azure Data Factory Homepage, select the Manage tab on the left pane.
2. Under Connections, select Integration runtimes and click the 'New' button to create an integration runtime.
3. Select the Azure, Self-Hosted option and click the 'Continue' button on the screen below.
4. Select the Azure option and click the 'Continue' button on the screen below.
5. Set the value of Name (give a description as well) and Region as below. Leave the other options as they are:
Name: ir-hgstft2023-<username>
Region: "North Central US"
6. Click on Create. After creation, the new Integration Runtime will appear like this:
Exercise 3: How to create a Linked Self-Hosted Integration Runtime
1. Under Connections, select Integration runtimes and click the 'New' button to create an integration runtime.
2. Select the Azure, Self-Hosted option and click the 'Continue' button on the screen below.
3. Select the Linked Self-Hosted option and click the 'Continue' button on the screen below.
Need to add more details
Exercise 4: How to create Linked Services
4.1: Linked Service with SAP Tables
1. Under Linked Service, provide the values to create a Linked Service to link SAP Tables to the Data Factory.
2. Under New linked service, Select SAP HANA and then select ‘Continue’.
Need to add more details
4.2: Linked Service with SQL Database
1. Under Linked Service, provide the values to create a Linked Service to link Azure SQL Database to the Data Factory.
2. Under New linked service, Select Azure SQL Database and then select ‘Continue’.
Need to add more details
4.3: Linked Service with Azure Data Lake Storage Gen2
1. Under Linked Service, provide the values to create a Linked Service to link Azure Data Lake Storage Gen2 to the Data Factory.
2. Under New linked service, Select Azure Data Lake Storage Gen2 and then select ‘Continue’.
3. Under the New Linked Service pane, fill in the following fields to connect with Azure Data Lake Storage Gen2.
Name: ls_adls_hgstft2023_<username>
Azure subscription: Microsoft Azure Sponsorship - 2021_22
Storage Account Name: sahgstft2023
Then click on ‘Create’.
4. Once you click Create, the new Linked Service will be created. We can view all the linked services created under Manage -> Linked services:
Exercise 5: How to create Datasets
1. Under the Author tab, in Factory Resources, select Dataset to create a new dataset.
5.1: Dataset with SAP Table
3. After selecting SAP HANA, provide values for the following fields: Name, Linked Service & Table.
5.2: Dataset with SQL Database
4. Under New Dataset pane, select the Azure SQL Database and click on ‘Continue’.
5. Under Properties Panel, set the properties for Azure SQL Database.
5.3: Dataset with Data Lake Storage Gen2
6. Under the New Dataset pane, select Azure Data Lake Storage Gen2 and click on 'Continue'.
7. On the Select Format page, choose the format type of the data, and then select Continue. In this case, select DelimitedText to copy the files as-is without parsing the content.
8. Under Set Properties Panel, set the properties for Azure Data Lake Gen2.
Name: ds_adls_hgstft2023_<username>
Linked Service: ls_adls_hgstft2023_<username>
File Path: adf-training
9. Once you click 'OK', the new Dataset will be created. We can view all the datasets created under Factory Resources -> Datasets:
Exercise 6: Create a pipeline
1. Under Factory Resources, select Pipeline. A new tab will appear where you can drag and drop various activities onto the pane; you can also edit the pipeline name.
Unit 3: Activities
Exercise 7: How to create Lookup Activities
The Lookup activity can read data stored in a database or file system and pass it to subsequent copy or
transformation activities.
1. Under the Activities pane, choose the Lookup activity from the General tab.
2. In the Settings tab of the Lookup activity, define the values for Source Dataset and File path type, and check the box for First row only.
Name: Lookup Activity
Source dataset: ds_adls_hgstft2023_<username>
File path type: Wildcard file path
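As a conceptual aid, the sketch below imitates in plain Python (not ADF itself) how a Lookup activity's output, with First row only checked, is consumed by a later activity through the @activity('Lookup Activity').output.firstRow expression. The row values are made up for illustration.

```python
# Conceptual sketch only (plain Python, not ADF): how a Lookup
# activity's output is referenced by later activities. The row
# values here are made up for illustration.
lookup_output = {
    # "firstRow" is returned when "First row only" is checked
    "firstRow": {"PersonID": 1, "Name": "Alice"},
}

# In a pipeline expression you would write:
#   @activity('Lookup Activity').output.firstRow
activities = {"Lookup Activity": {"output": lookup_output}}

def activity_output(name):
    """Mimic the @activity('<name>').output lookup."""
    return activities[name]["output"]

first_row = activity_output("Lookup Activity")["firstRow"]
print(first_row["Name"])
```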
Exercise 8: How to create Stored Procedure Activity
1. Under the Activities pane, choose the Stored Procedure activity from the General tab and assign the Name.
Name: Stored procedure
2. Under the Stored Procedure Activity, provide the values for the given fields, Linked Service & Stored procedure name.
Linked Service: ls_sql_demo1
Stored procedure name: [dbo].[CreateNewVisit]
Exercise 9: How to create Copy Data Activity
1. Under the Pipeline pane, in the Activities toolbox, expand Move & Transform. Drag the Copy Data activity onto the pipeline designer surface.
Name: Copy Activity
2. Under the Pipeline Designer Surface, provide the values for both Source & Sink, and click Validate to validate the pipeline settings.
Source Connection:
Source dataset: ds_saptable_demo
Row Count: 5
Sink Connection:
Sink dataset: ds_dls_demo
Copy behavior: None
Exercise 10: How to create Azure Function Activities
The Azure Function activity allows you to run Azure Functions in a Data Factory pipeline. To run an Azure Function, we need to create a linked service connection and an activity that specifies the Azure Function we plan to execute.
1. First, link the Azure Function with an Azure Function linked service and provide the specific values for the given fields.
Under the Azure Function activity, provide the specific values for the given fields: name, type, linked service, function name, method, header & body.

Property          Description
Name              Name of the activity in the pipeline
Type              Type of activity is "AzureFunctionActivity"
Linked service    The Azure Function linked service for the corresponding Azure Function App
Function name     Name of the function in the Azure Function App that this activity calls
Method            REST API method for the function call
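The property table above maps directly onto the JSON definition of the activity. The sketch below builds such a definition in Python; the activity name, linked service name, function name, and body are hypothetical placeholders, not values from this lab.

```python
# Sketch of an Azure Function activity definition as JSON. The
# name, linkedServiceName, functionName, and body are hypothetical.
import json

azure_function_activity = {
    "name": "RunMyFunction",                       # Name of the activity
    "type": "AzureFunctionActivity",               # Type of the activity
    "linkedServiceName": {
        "referenceName": "ls_azurefunction_demo",  # hypothetical linked service
        "type": "LinkedServiceReference",
    },
    "typeProperties": {
        "functionName": "HttpTriggerDemo",         # hypothetical function name
        "method": "POST",                          # REST API method
        "body": {"message": "hello"},
    },
}
print(json.dumps(azure_function_activity, indent=2))
```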
Exercise 11: How to create Iteration & Conditional Activities
The ForEach activity defines a repeating control flow in your pipeline. It is used to iterate over a collection and execute specified activities in a loop; the loop implementation is similar to the foreach looping structure in programming languages.
1. Under the Data Factory activity page, choose from the various iteration and conditional activities, Filter, ForEach, If Condition, Switch & Until, to implement conditions over the data.
2. Under the ForEach activity's Settings tab, set the Sequential, Batch count, & Items properties.
Sequential: unchecked
Batch Count: 16
Items: @activity('Get Metadata1').output.childItems
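The settings above can be sketched in plain Python (not ADF): with Sequential unchecked, up to Batch count items are processed in parallel; with it checked, items run one after another. The file names are made up for illustration.

```python
# Conceptual sketch of ForEach semantics in plain Python (not ADF):
# with Sequential unchecked, up to Batch count items run in parallel.
from concurrent.futures import ThreadPoolExecutor

items = ["file1.csv", "file2.csv", "file3.csv"]  # e.g. Get Metadata childItems

def inner_activity(item):
    # stands in for the activities placed inside the ForEach
    return "copied " + item

def run_foreach(items, sequential=False, batch_count=16):
    if sequential:
        return [inner_activity(i) for i in items]
    with ThreadPoolExecutor(max_workers=batch_count) as pool:
        # pool.map preserves the input order of items
        return list(pool.map(inner_activity, items))

print(run_foreach(items))
# ['copied file1.csv', 'copied file2.csv', 'copied file3.csv']
```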
Exercise 12: Transformation using Dataflow
In this exercise we create a Dataflow, perform aggregate and join transformations in it, and create a pipeline with a Dataflow Activity.
Prerequisite:
If the files are not present in the specified location, upload the files given below.
Employee data
Department data
Creating datasets for employee data, department data and sink data
The employee, department, and sink datasets all point to Azure Data Lake Storage Gen2, so the steps below create these three datasets.
2. In the New Dataset window, select Azure Data Lake Storage Gen2, and then
click Continue.
adf-training/dataflow-input/employee.csv
adf-training/dataflow-input/department.csv
adf-training/dataflow-output
Set Properties for EmployeeDataset
Set Properties for DataflowSinkDataset
Creating Dataflow
2. In the General panel under Properties, specify Dataflow_demo for Name. Then
collapse the panel by clicking the Properties icon in the top-right corner.
5. Next to add Aggregate transformation, click on the +(plus) in Employee stream
and select Aggregate under Schema modifier section.
6. Switch to Aggregate settings
d. Click on Aggregates, select column as empid
e. Open expression builder by clicking the Expression box and give the
Column name as TotalEmpCount and Expression as
count(empid)
7. To see a data preview, turn on Data flow debug and switch to the Data Preview tab.
Here we get the department-wise employee count. It becomes more understandable if we show the department name instead of the department ID. For that we use a Join transformation, which joins with the department data and pulls in the corresponding department name.
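The two transformations can be sketched in plain Python (not ADF Data Flow): first aggregate employee rows by department, then join in the department name. The sample rows below are made up for illustration.

```python
# Plain-Python sketch (not ADF Data Flow) of the two transformations:
# aggregate employee rows by department, then join in the department
# name. Sample rows are made up for illustration.
from collections import Counter

employees = [
    {"empid": 1, "deptid": 10},
    {"empid": 2, "deptid": 10},
    {"empid": 3, "deptid": 20},
]
departments = [
    {"deptid": 10, "deptname": "Sales"},
    {"deptid": 20, "deptname": "HR"},
]

# Aggregate: group by deptid, TotalEmpCount = count(empid)
counts = Counter(row["deptid"] for row in employees)

# Join: pull in the corresponding department name
dept_names = {d["deptid"]: d["deptname"] for d in departments}
result = [{"deptname": dept_names[deptid], "TotalEmpCount": n}
          for deptid, n in sorted(counts.items())]
print(result)
# [{'deptname': 'Sales', 'TotalEmpCount': 2}, {'deptname': 'HR', 'TotalEmpCount': 1}]
```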
9. Click on + (plus) in the AggregateOnDept stream and select the Join transformation under the Multiple Inputs/Outputs section.
a. Give Output stream name as JoinOnDep
a. Output stream name as Sink
Clear the Auto mapping option, delete the unwanted columns using the delete button, keep only the columns below, and drag to adjust the column ordering.
15. Switch to Optimize tab, select Single partition
17. To validate the dataflow, click Validate on the toolbar. Confirm that there are no
validation errors.
Create Pipeline
1. In the left pane, click + (plus), and click Pipeline.
3. In the Activities toolbox, expand Move & Transform, and drag-drop
the Dataflow activity to the pipeline design surface. You can also search for
activities in the Activities toolbox.
b. Switch to the Settings tab, and select the dataflow Dataflow_demo from the drop-down.
4. To validate the pipeline, click Validate on the toolbar. Confirm that there are no
validation errors.
5. To publish entities (dataflow, datasets, pipelines, etc.) to the Data Factory service,
click Publish all on top of the window. Wait until the publishing succeeds.
6. Click Debug on the toolbar and confirm the pipeline is running fine.
Unit 4: Triggers, publishing and monitoring
Exercise 12: How to create Schedule Trigger
2. Under New Trigger pane, we can choose the Type of trigger as Schedule from the given options
Schedule, Tumbling window, Storage events, or Custom events.
Name: Schedule_trigger_demo
Type: Schedule
3. Other than this, we can choose the start date, recurrence, end date, and whether or not to activate the trigger immediately after you create it (Start trigger on creation). The recurrence offers various options such as Minutes, Hours, Days, Weeks, etc.
4. After providing the details, click on 'OK'. The newly created trigger can be seen in the Manage > Triggers pane. The trigger shows as stopped because we did not select the 'Start trigger on creation' option. We can activate the trigger whenever we want.
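The recurrence settings above can be sketched in plain Python (not ADF): a schedule trigger's start time and interval expand into a series of run times. The start time and interval here are examples.

```python
# Conceptual sketch: a schedule trigger's recurrence expands into a
# series of run times. Start time and interval here are examples.
from datetime import datetime, timedelta

start = datetime(2023, 1, 1, 9, 0)
interval = timedelta(minutes=15)   # recurrence: every 15 minutes

runs = [start + i * interval for i in range(4)]
print([r.strftime("%H:%M") for r in runs])
# ['09:00', '09:15', '09:30', '09:45']
```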
Exercise 13: How to create Tumbling Window Trigger
2. Under the New Trigger pane, we can choose the Type of trigger as Tumbling window from the given options Schedule, Tumbling window, Storage events, or Custom events.
Name: tumbling_trigger_demo
Type: Tumbling Window
3. Other than this, we can choose the start date, recurrence, end date, and whether or not to activate the trigger immediately after you create it (Start trigger on creation). The recurrence setting is different: you can only choose Minutes, Hours, or Months in the pane.
4. After providing all the details, click on 'OK'. The newly created trigger can be seen in the Manage > Triggers pane.
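Unlike a simple schedule, a tumbling window trigger fires over fixed-size, contiguous, non-overlapping time windows. A plain-Python sketch (not ADF) of how the windows are derived from the start time and window size:

```python
# Conceptual sketch: a tumbling window trigger fires over fixed-size,
# contiguous, non-overlapping windows from its start time.
from datetime import datetime, timedelta

start = datetime(2023, 1, 1, 0, 0)
size = timedelta(hours=1)   # window size (recurrence)

windows = [(start + i * size, start + (i + 1) * size) for i in range(3)]
for ws, we in windows:
    print(ws.strftime("%H:%M"), "->", we.strftime("%H:%M"))
# 00:00 -> 01:00
# 01:00 -> 02:00
# 02:00 -> 03:00
```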
Exercise 14: How to create Event Trigger
2. Under the New Trigger pane, we can choose the Type of trigger as Storage events from the given options Schedule, Tumbling window, Storage events, or Custom events.
3. We can choose the start date, end date, whether or not to activate the trigger immediately
after you create it (Start trigger on creation) & the main settings are Storage account name,
container, and blob path. We can also provide the Blob path begins with and Blob path ends
with conditions.
Name: storage_event_trigger_demo
Type: Event
Azure Subscription name: Microsoft Azure Sponsorship - 2021_22
Storage Account Name: sahgstft2023
Container name: adf-training
Blob Path ends with: csv
Event: Blob Created
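The path conditions above can be sketched in plain Python (not ADF): a blob fires the trigger only if its path satisfies both the "begins with" and "ends with" conditions. The blob paths below are made-up examples.

```python
# Conceptual sketch: how the "Blob path begins with" / "Blob path
# ends with" conditions select blobs. Paths are made-up examples.
def matches(blob_path, begins_with="", ends_with=""):
    return blob_path.startswith(begins_with) and blob_path.endswith(ends_with)

blobs = ["input/orders.csv", "input/orders.json", "archive/old.csv"]
selected = [b for b in blobs if matches(b, ends_with="csv")]
print(selected)
# ['input/orders.csv', 'archive/old.csv']
```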
4. Click on 'Continue'; the 'Data Preview' pane will open, showing a preview of the blobs that satisfy the conditions.
Click 'OK', and a new trigger will be created. We can see the newly created trigger in the Manage > Triggers pane.
1. Once the creation of the trigger is done, open the pipeline that you want to trigger, then click on 'Add trigger', where you can see two options: Trigger now or New/Edit.
2. In the Add Triggers, we can either create a new trigger or choose a trigger from the existing
triggers.
Exercise 15: Publishing of the Activity
1. Before triggering a pipeline, you must publish entities to the Data Factory.
To publish, select Publish all on the top.
2. Under Publish All pane, you will see all the pending changes and then click Publish.
3. After publishing is done, you will see a pop-up saying 'Publishing is completed'.
Exercise 16: Debugging & Monitoring a pipeline
2. To monitor the pipeline, switch to the Monitor tab and use the Refresh button to refresh the list to see the status of a particular activity run.
Unit 5: Copy multiple tables in bulk
This tutorial demonstrates copying a number of tables from Azure SQL Database to
Azure Synapse Analytics. You can apply the same pattern in other copy scenarios as
well. For example, copying tables from SQL Server/Oracle to Azure SQL Database/Azure
Synapse Analytics (formerly SQL DW)/Azure Blob, copying different paths from Blob to
Azure SQL Database tables.
End-to-end workflow
Prerequisites
Azure Storage account. The Azure Storage account is used as staging blob
storage in the bulk copy operation.
Azure SQL Database. This database contains the source data.
Azure Synapse Analytics. This data warehouse holds the data copied over from
the SQL Database.
Prepare SQL Database and Azure Synapse Analytics
1. If you don't have an Azure Synapse Analytics (formerly SQL DW), see the Create a
SQL Data Warehouse article for steps to create one.
For both SQL Database and Azure Synapse Analytics, allow Azure services to access SQL
server.
Ensure that Allow Azure services and resources to access this server setting is
turned ON for your server. This setting allows the Data Factory service to read data from
your Azure SQL Database and write data to your Azure Synapse Analytics.
To verify and turn on this setting, go to your server > Security > Networking > set
the Allow Azure services and resources to access this server to ON.
Note: The above-mentioned step is for your reference; participants do not need to do anything for this step.
Create the source Azure SQL Database linked service.
In this step, you create a linked service to link your database in Azure SQL Database to
the data factory.
1. On the Linked services page, select + New to create a new linked service.
2. In the New Linked Service window, select Azure SQL Database, and click Continue.
3. In the New Linked Service (Azure SQL Database) window, do the following steps:

Create the sink Azure Synapse Analytics (formerly SQL DW) linked service
1. In the New Linked Service window, select Azure Synapse Analytics, and click Continue.
2. In the New Linked Service (Azure Synapse Analytics) window, fill in the required fields, and then click Create.
Create the staging Azure Storage linked service
In this tutorial, you use Azure Blob storage as an interim staging area to enable PolyBase for better copy performance.
2. In the New Linked Service window, select Azure Blob Storage, and
click Continue.
3. In the New Linked Service (Azure Blob Storage) window, do the following steps:
c. Click Create.
Create a dataset for source SQL Database
2. In the New Dataset window, select Azure SQL Database, and then click Continue.
4. Switch to the Connection tab and select any table for Table. This table is a dummy table; you specify a query on the source dataset when creating a pipeline, and that query is used to extract data from your database. Alternatively, you can select the Edit checkbox and enter dbo.dummyName as the table name.
Create a dataset for sink Azure Synapse Analytics (formerly SQL DW)
2. In the New Dataset window, select Azure Synapse Analytics, and then click Continue.
4. Switch to the Parameters tab, click + New, and enter DWTableName for the
parameter name. Click + New again and enter DWSchema for the parameter
name. If you copy/paste this name from the page, ensure that there's no trailing
space character at the end of DWTableName and DWSchema.
a. For Table, check the Edit option. Select into the first input box and click
the Add dynamic content link below. In the Add Dynamic Content page,
click the DWSchema under Parameters, which will automatically populate
the top expression text box @dataset().DWSchema, and then click Finish.
1. Select into the second input box and click the Add dynamic content link below. In the Add Dynamic Content page, click DWTableName under Parameters, which will automatically populate the top expression text box with @dataset().DWTableName, and then click Finish.
2. The tableName property of the dataset is set to the values that are passed as
arguments for the DWSchema and DWTableName parameters. The ForEach activity
iterates through a list of tables and passes one by one to the Copy activity.
Create pipelines
In this tutorial, you create two pipelines. The pipeline GetTableListAndTriggerCopyData:
Looks up the Azure SQL Database system table to get the list of tables to be copied.
Triggers the pipeline IterateAndCopySQLTables to do the actual data copy.
a. Click + New.
b. Enter tableList for the parameter Name.
21. In the Activities toolbox, expand Iteration & Conditions, and drag-drop
the ForEach activity to the pipeline design surface. You can also search for
activities in the Activities toolbox.
b. Switch to the Settings tab, click the input box for Items, then click the Add
dynamic content link below.
c. In the Add Dynamic Content page, click tableList under Parameters, which will automatically populate the top expression text box as @pipeline().parameters.tableList. Then click Ok.
1. Switch to Activities tab, click the pencil icon to add a child activity to
the ForEach activity.
2. In the Activities toolbox, expand Move & Transform, and drag-drop the Copy data activity onto the pipeline designer surface. Notice the breadcrumb menu at the top: IterateAndCopySQLTables is the pipeline name and IterateSQLTables is the ForEach activity name. The designer is in the activity scope. To switch back to the pipeline editor from the ForEach editor, click the link in the breadcrumb menu.
Page 75
3. Switch to the Source tab, and do the following steps:
3. Click the Query input box -> select the Add dynamic content below -> enter
the following expression for Query -> select Finish.
SQLCopy
SELECT * FROM [@{item().TABLE_SCHEMA}].[@{item().TABLE_NAME}]
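The expression above can be sketched in plain Python (not ADF): for each ForEach item, @{item().TABLE_SCHEMA} and @{item().TABLE_NAME} are substituted to produce one query per table. The table names below are examples.

```python
# Conceptual sketch: how ADF expands @{item().TABLE_SCHEMA} and
# @{item().TABLE_NAME} into one query per ForEach item.
table_list = [
    {"TABLE_SCHEMA": "SalesLT", "TABLE_NAME": "Customer"},
    {"TABLE_SCHEMA": "SalesLT", "TABLE_NAME": "Product"},
]

def expand_query(item):
    return "SELECT * FROM [{0}].[{1}]".format(
        item["TABLE_SCHEMA"], item["TABLE_NAME"])

for item in table_list:
    print(expand_query(item))
# SELECT * FROM [SalesLT].[Customer]
# SELECT * FROM [SalesLT].[Product]
```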
2. Click the input box for the VALUE of the DWTableName parameter -> select the Add dynamic content below -> enter the @item().TABLE_NAME expression -> select Finish.
3. Click the input box for the VALUE of the DWSchema parameter -> select the Add dynamic content below -> enter the @item().TABLE_SCHEMA expression -> select Finish.
6. Click the Pre-copy Script input box -> select the Add dynamic content below
-> enter the following expression as script -> select Ok.
SQL:
TRUNCATE TABLE [@{item().TABLE_SCHEMA}].[@{item().TABLE_NAME}]
6. To validate the pipeline settings, click Validate on the top pipeline tool bar. Make
sure that there's no validation error. To close the Pipeline Validation Report, click
Close.
Create the pipeline GetTableListAndTriggerCopyData
This pipeline does the following:
Looks up the Azure SQL Database system table to get the list of tables to be copied.
Triggers the pipeline IterateAndCopySQLTables to do the actual data copy.
2. In the General panel under Properties, change the name of the pipeline
to GetTableListAndTriggerCopyData.
3. In the Activities toolbox, expand General, and drag-drop Lookup activity to the
pipeline designer surface, and do the following steps:
SQL:
SELECT TABLE_SCHEMA, TABLE_NAME FROM information_schema.TABLES WHERE
TABLE_TYPE = 'BASE TABLE' and TABLE_SCHEMA = 'SalesLT' and TABLE_NAME <>
'ProductModel'
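The shape of the data passed between the two pipelines can be sketched in plain Python (not ADF): the Lookup activity returns the matching rows as output.value, and the Execute Pipeline activity forwards that list as the tableList parameter. The rows below are examples.

```python
# Conceptual sketch: the Lookup activity returns the matching rows as
# output.value; the Execute Pipeline activity forwards that list as
# the tableList parameter (@activity('LookupTableList').output.value).
lookup_output = {
    "value": [
        {"TABLE_SCHEMA": "SalesLT", "TABLE_NAME": "Customer"},
        {"TABLE_SCHEMA": "SalesLT", "TABLE_NAME": "Product"},
    ]
}
pipeline_parameters = {"tableList": lookup_output["value"]}
print(len(pipeline_parameters["tableList"]))
# 2
```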
5. Drag-drop Execute Pipeline activity from the Activities toolbox to the pipeline
designer surface, and set the name to TriggerCopy.
6. To connect the Lookup activity to the Execute Pipeline activity, drag the green box attached to the Lookup activity to the left of the Execute Pipeline activity.
7. Switch to the Settings tab of Execute Pipeline activity, and do the following steps:
3. In the Parameters section, click the input box under VALUE -> select the Add
dynamic content below -> enter @activity('LookupTableList').output.value as
table name value -> select Finish. You're setting the result list from the Lookup
activity as an input to the second pipeline. The result list contains the list of
tables whose data needs to be copied to the destination.
8. To validate the pipeline, click Validate on the toolbar. Confirm that there are no
validation errors. To close the Pipeline Validation Report, click >>.
2. Confirm the run on the Pipeline run page, and then select Ok.
1. Switch to the Monitor tab. Click Refresh until you see runs for both the pipelines
in your solution. Continue refreshing the list until you see the Succeeded status.
Unit 6: Incrementally load data from Azure SQL Database to Azure
Blob storage
In this tutorial, you create an Azure data factory with a pipeline that loads delta data
from a table in Azure SQL Database to Azure Blob storage.
Overview
1. Select the watermark column. Select one column in the source data store, which
can be used to slice the new or updated records for every run. Normally, the data
in this selected column (for example, last_modify_time or ID) keeps increasing
when rows are created or updated. The maximum value in this column is used as a
watermark.
2. Prepare a data store to store the watermark value. In this tutorial, you store the
watermark value in a SQL database.
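The two steps above amount to the high-watermark pattern, sketched below in plain Python (not ADF). The rows and timestamps are made-up sample data.

```python
# Plain-Python sketch of the high-watermark pattern the pipeline
# implements. Timestamps are made-up sample data.
rows = [
    {"PersonID": 1, "LastModifytime": "2017-09-01 00:56:00"},
    {"PersonID": 2, "LastModifytime": "2017-09-02 05:23:00"},
    {"PersonID": 3, "LastModifytime": "2017-09-05 08:06:00"},
]

old_watermark = "2017-09-01 09:01:00"     # value stored in watermarktable
# New watermark = MAX(LastModifytime) in the source table.
new_watermark = max(r["LastModifytime"] for r in rows)

# Copy only rows modified after the old watermark, up to the new one;
# afterwards the stored watermark is updated to new_watermark.
delta = [r for r in rows
         if old_watermark < r["LastModifytime"] <= new_watermark]
print([r["PersonID"] for r in delta])
# [2, 3]
```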
Prerequisites
Azure SQL Database. Use the Azure SQL Database used in the previous tutorial.
Azure Storage. Use the storage account sahgstft2023 and the container adf-training.
Create a data source table in your SQL database
1. Open SQL Server Management Studio. In Server Explorer, right-click the database,
and choose New Query.
2. Run the following SQL command against your SQL database to create a table
named data_source_table as the data source store:
SQL:
create table data_source_table
(
PersonID int,
Name varchar(255),
LastModifytime datetime
);
Create another table in your SQL database to store the high watermark
value
1. Run the following SQL command against your SQL database to create a table
named watermarktable to store the watermark value:
SQL:
create table watermarktable
(
TableName varchar(255),
WatermarkValue datetime
);
2. Set the default value of the high watermark with the table name of source data
store. In this tutorial, the table name is data_source_table.
SQL:
INSERT INTO watermarktable
VALUES ('data_source_table','1/1/2010 12:00:00 AM')
Output:
TableName | WatermarkValue
---------- | --------------
data_source_table | 2010-01-01 00:00:00.000
Create a stored procedure in your SQL database
Run the following SQL command against your SQL database to create a stored procedure that updates the watermark value after each pipeline run:

SQL:
CREATE PROCEDURE usp_write_watermark @LastModifiedtime datetime, @TableName varchar(50)
AS
BEGIN
UPDATE watermarktable
SET [WatermarkValue] = @LastModifiedtime
WHERE [TableName] = @TableName
END
Open Data Factory
Create a pipeline
In this tutorial, you create a pipeline with two Lookup activities, one Copy activity, and one Stored Procedure activity chained in one pipeline.
2. In the left pane, click + (plus), and click Pipeline.
3. In the General panel under Properties, specify a name for the pipeline. Then collapse the panel by clicking the Properties icon in the top-right corner.
4. Let's add the first Lookup activity to get the old watermark value. In the Activities toolbox, expand General, drag-drop the Lookup activity onto the pipeline designer surface, and change its name to LookupOldWaterMarkActivity.
5. Switch to the Settings tab, and click + New for Source Dataset. In this step, you create a dataset to represent the data in the watermarktable. This table contains the old watermark value that was used in the previous copy operation.
6. In the New Dataset window, select Azure SQL Database, and click Continue.
7. In the Set properties window for the dataset, enter WatermarkDataset for Name.
11. In the Connection tab, select [dbo].[watermarktable] for Table. If you want to preview data in the table, click Preview data.
12. Switch to the pipeline editor by clicking the pipeline tab at the top or by clicking
the name of the pipeline in the tree view on the left. In the properties window for
the Lookup activity, confirm that WatermarkDataset is selected for the Source
Dataset field.
13. In the Activities toolbox, expand General, and drag-drop another Lookup activity onto the pipeline designer surface, and change its name to LookupNewWaterMarkActivity. This Lookup activity gets the new watermark value from the table with the source data, so it can be copied to the destination.
14. In the properties window for the second Lookup activity, switch to the Settings tab, and click New. You create a dataset to point to the source table that contains the new watermark value (the maximum of LastModifytime).
15. In the New Dataset window, select Azure SQL Database, and click Continue.
16. In the Set properties window, enter SourceDataset for Name. Select your Azure SQL Database linked service.
17. Select [dbo].[data_source_table] for Table. You specify a query on this dataset later in the tutorial. The query takes precedence over the table you specify in this step.
19. Switch to the pipeline editor by clicking the pipeline tab at the top or by clicking
the name of the pipeline in the tree view on the left. In the properties window for
the Lookup activity, confirm that SourceDataset is selected for the Source
Dataset field.
20. Select Query for the Use Query field, and enter a query that selects only the maximum last-modified time in the source table; this value is the new watermark. Please make sure you have also checked First row only.
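The query itself is not reproduced above. A sketch of its shape, assuming the table and column names used elsewhere in this tutorial (data_source_table, LastModifiedtime) and an output column name matching what the stored-procedure step later reads:

```sql
-- Returns a single row whose NewWatermarkvalue column is the
-- newest modification time in the source table.
SELECT MAX(LastModifiedtime) AS NewWatermarkvalue
FROM data_source_table
```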
21. In the Activities toolbox, expand Move & Transform, and drag-drop the Copy activity to the pipeline designer surface. Set its name to IncrementalCopyActivity.
22. Connect both Lookup activities to the Copy activity by dragging the green button attached to the Lookup activities to the Copy activity. Release the mouse button when you see the border color of the Copy activity change to blue.
23. Select the Copy activity and confirm that you see the properties for the activity in the Properties window.
24. Switch to the Source tab in the Properties window, and do the following steps:
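The source-side settings are not reproduced above. Conceptually, you select SourceDataset, choose Query, and supply a query that copies only the rows modified after the old watermark and up to the new one. A sketch, with the activity, table, and column names assumed from the rest of this tutorial:

```sql
-- Copies only rows changed since the last run. The two @{activity(...)}
-- expressions are Data Factory dynamic content, resolved at run time from
-- the outputs of the two Lookup activities.
SELECT * FROM data_source_table
WHERE LastModifiedtime > '@{activity('LookupOldWaterMarkActivity').output.firstRow.WatermarkValue}'
  AND LastModifiedtime <= '@{activity('LookupNewWaterMarkActivity').output.firstRow.NewWatermarkvalue}'
```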
25. Switch to the Sink tab and click + New for the Sink Dataset field.
26. In this tutorial, the sink data store is of type Azure Blob Storage. Therefore, select Azure Blob Storage, and click Continue in the New Dataset window.
27. In the Select Format window, select the format type of your data, and click
Continue.
28. In the Set Properties window, enter SinkDataset for Name. For Linked Service, select your Azure Blob Storage linked service, and enter incrementalcopy as the folder name. You can also use the Browse button for the File path.
29. For the File part of the File path field, select Add dynamic content, and enter @CONCAT('Incremental-', pipeline().RunId, '.txt') in the opened window. Then click Ok. The file name is dynamically generated by the expression: each pipeline run has a unique ID, and that run ID becomes part of the file name.
30. Switch to the pipeline editor by clicking the pipeline tab at the top or by clicking the name of the pipeline in the tree view on the left.
31. In the Activities toolbox, expand General, and drag-drop the Stored Procedure
activity from the Activities toolbox to the pipeline designer surface. Connect the
green (Success) output of the Copy activity to the Stored Procedure activity.
32. Select Stored Procedure Activity in the pipeline designer, change its name to
StoredProceduretoWriteWatermarkActivity.
33. Switch to the Settings tab and select AzureSqlDatabaseLinkedService for Linked service.
34. To specify values for the stored procedure parameters, click Import parameter, and enter these values (the expressions read the outputs of the two Lookup activities):
LastModifiedtime: @{activity('LookupNewWaterMarkActivity').output.firstRow.NewWatermarkvalue}
TableName: @{activity('LookupOldWaterMarkActivity').output.firstRow.TableName}
35. To validate the pipeline settings, click Validate on the toolbar. Confirm that there
are no validation errors. To close the Pipeline Validation Report window, click
Close.
36. Publish entities (linked services, datasets, and pipelines) to the Azure Data Factory
service by selecting the Publish All button. Wait until you see a message that the
publishing succeeded.
2. Confirm the run on the Pipeline run page, and then select OK.
Monitor the pipeline run
1. Switch to the Monitor tab on the left. You see the status of the pipeline run
triggered by a manual trigger. You can use links under the PIPELINE NAME
column to view run details and to rerun the pipeline.
2. To see activity runs associated with the pipeline run, select the link under the
PIPELINE NAME column. For details about the activity runs, select the Details link
(eyeglasses icon) under the ACTIVITY NAME column. Select All pipeline runs at
the top to go back to the Pipeline Runs view. To refresh the view, select Refresh.
Exercise 9: Review the results
1. Connect to your Azure Storage Account by using tools such as Azure Storage
Explorer. Verify that an output file is created in the incrementalcopy folder of the
adftutorial container.
2. Open the output file and notice that all the data is copied from the
data_source_table to the blob file.
3. Check the latest value from watermarktable. You see that the watermark value was
updated.
Select * from watermarktable
TableName WatermarkValue
data_source_table 2017-09-05 8:06:00.000
1. Switch to the Author tab. Click the pipeline in the tree view if it's not opened in
the designer.
1. Switch to the Monitor tab on the left. You see the status of the pipeline run
triggered by a manual trigger. You can use links under the PIPELINE NAME
column to view activity details and to rerun the pipeline.
2. To see activity runs associated with the pipeline run, select the link under the
PIPELINE NAME column. For details about the activity runs, select the Details link
(eyeglasses icon) under the ACTIVITY NAME column. Select All pipeline runs at
the top to go back to the Pipeline Runs view. To refresh the view, select Refresh.
Verify the second output
1. In the blob storage, you see that another file was created. In this tutorial, the new
file name is Incremental-<GUID>.txt. Open that file, and you see two rows of
records in it.
2. Check the latest value from watermarktable. You see that the watermark value was
updated again.
Select * from watermarktable
Sample output:
TableName WatermarkValue
data_source_table 2017-09-07 09:01:00.000