
My name is Arun.

I have 4 years of work experience in the IT industry.


I have been working on SQL Server, Azure Data Engineering, and Power BI.
As part of Azure data engineering, we use different services inside the Azure
cloud environment,

such as Azure SQL Database,
Azure Blob Storage,
Azure Data Lake Gen2,
Azure Synapse data warehousing (DW), and
Azure Data Factory.
I have knowledge of Azure Databricks as well,
and I also have 1 year of experience with the Power BI reporting tool.
We work in the Agile methodology,
conducting one sprint every 2 weeks.
We use the Jira tool for assigning user stories,
and we use an Azure DevOps Git repository
for deploying Azure Data Factory pipelines into different environments using the
CI/CD methodology.

-------------------------My roles and responsibilities in projects are--------------------------------------
Taking user stories and performing analysis, development, code reviews, and unit
testing.
Attending daily status calls to discuss the progress of work and any blockers to
tasks.
I am completely involved in this project, with about 70% of my effort on Azure Data
Factory pipeline development.
The rest goes into SQL-related work and Power BI report design.

------------------------About Project Architecture:--------------------------------------------------

Currently I am working for the client Varocc Group.


We take data from an on-premises Oracle DB and some CSV files from an FTP site,
and load the data into the staging area of Azure Data Lake Gen2 using the Azure Data
Factory ETL tool
with an incremental loading mechanism.
Once the new and latest modified records are loaded into the staging area of Data Lake
Gen2 (driven by a control table),
we create some external tables using PolyBase in Synapse data warehousing (DW).
Using these external tables, we implement the slowly changing dimension (SCD) Type 1
method to load data into the dimension and fact tables available in the Synapse data
warehouse.
Once the data is available in the Synapse data warehouse, we connect from Power BI and
consume the data as dashboards and reports.
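A minimal sketch of the SCD Type 1 step (update existing dimension rows, insert new ones), written as T-SQL run from Python with pyodbc. The connection details and the object names (ext.Customer, dbo.DimCustomer and their columns) are hypothetical placeholders, not the project's actual tables:

    import pyodbc

    # Connect to the Synapse dedicated SQL pool (server, database, and credentials are placeholders).
    conn = pyodbc.connect(
        "Driver={ODBC Driver 18 for SQL Server};"
        "Server=<workspace>.sql.azuresynapse.net;Database=<dw>;Uid=<user>;Pwd=<password>;Encrypt=yes;"
    )
    cur = conn.cursor()

    # SCD Type 1, step 1: overwrite attributes of rows that already exist in the dimension.
    # (Implicit join in the FROM/WHERE clause, the UPDATE form dedicated SQL pools accept.)
    cur.execute("""
        UPDATE dbo.DimCustomer
        SET    CustomerName = s.CustomerName,
               City         = s.City
        FROM   ext.Customer AS s
        WHERE  dbo.DimCustomer.CustomerId = s.CustomerId;
    """)

    # SCD Type 1, step 2: insert records from the PolyBase external table not yet in the dimension.
    cur.execute("""
        INSERT INTO dbo.DimCustomer (CustomerId, CustomerName, City)
        SELECT s.CustomerId, s.CustomerName, s.City
        FROM   ext.Customer AS s
        WHERE  NOT EXISTS (SELECT 1 FROM dbo.DimCustomer AS d WHERE d.CustomerId = s.CustomerId);
    """)

    conn.commit()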

--------------------------------------What activities are used inside ADF:------------------------------------

1. Get Metadata activity
2. Copy activity
3. Lookup activity
4. ForEach activity
5. Stored Procedure activity
6. Web activity
7. Data Flows
------------------------------------------Incremental loading of data in the pipeline:-----
We maintain one control table inside an Azure SQL database.
This control table contains all pipeline names and the last run time of each pipeline.
Based on the last run time of a pipeline, we pick the new and modified records
whose modified date is greater than the last run time of the pipeline
from the Oracle source tables and place the data into Data Lake Gen2 as CSV files.
Then, using a PolyBase external table,
we compare the target tables with these source external tables and identify
which records are new and which records are modified,
then insert the new records and update the existing records in Synapse data
warehousing.
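In the pipeline this watermark pattern is built from a Lookup activity (read the last run time), a Copy activity (filter the source query by that time), and a Stored Procedure activity (advance the watermark). The Python/pyodbc sketch below only illustrates the same logic; the control table (etl.ControlTable), the pipeline name, and the Oracle query are hypothetical placeholders:

    import pyodbc
    import oracledb  # python-oracledb driver for the on-premises Oracle source
    from datetime import datetime, timezone

    # 1) Read the watermark (last run time) for this pipeline from the control table in Azure SQL.
    sql_conn = pyodbc.connect(
        "Driver={ODBC Driver 18 for SQL Server};Server=<azure-sql>;Database=<db>;Uid=<user>;Pwd=<password>;Encrypt=yes;"
    )
    cur = sql_conn.cursor()
    cur.execute("SELECT LastRunTime FROM etl.ControlTable WHERE PipelineName = ?", "pl_load_orders")
    last_run_time = cur.fetchone()[0]

    # 2) Pull only the rows created or modified after the watermark from the Oracle source.
    ora_conn = oracledb.connect(user="<user>", password="<password>", dsn="<onprem-host>/<service>")
    ora_cur = ora_conn.cursor()
    ora_cur.execute(
        "SELECT order_id, amount, modified_date FROM orders WHERE modified_date > :1",
        [last_run_time],
    )
    changed_rows = ora_cur.fetchall()  # in ADF these rows land as CSV files in Data Lake Gen2

    # 3) After a successful load, move the watermark forward so the next run starts from here.
    cur.execute(
        "UPDATE etl.ControlTable SET LastRunTime = ? WHERE PipelineName = ?",
        datetime.now(timezone.utc), "pl_load_orders",
    )
    sql_conn.commit()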

-------------------------------------------Integration runtimes in ADF:---------------------------------------

Integration Runtime (IR) is the compute infrastructure used by Azure Data Factory
to provide
data integration capabilities across different network environments.
There are 3 different kinds of integration runtimes:

1. Azure (AutoResolve) integration runtime---this is used for data movement between
cloud services.
2. Self-hosted integration runtime---this is used for data movement between
on-premises/private networks and the cloud.
In my project, we use this integration runtime to connect to the on-premises Oracle
database.
3. SSIS integration runtime---used for lifting and shifting SSIS packages into Azure
Data Factory in the cloud.

-----------------------------------------------Triggers:--------------------------
To run Azure Data Factory pipelines, we can use any one of the following triggers.
There are 3 different kinds of triggers in ADF:

Schedule triggers----a normal schedule trigger; as per the scheduled timing, it
executes pipelines recursively.
Event-based triggers---this type of trigger executes pipelines based on a specified
event, such as a file being uploaded to or deleted from blob storage.
Tumbling window triggers----execute pipelines at a periodic time interval from a
specified start time while retaining state.
Tumbling windows are a series of fixed-sized, non-overlapping,
and contiguous time intervals.
Whenever we need to execute pipelines in a historical (backfill) manner, we can use
tumbling window triggers.
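For reference, a schedule trigger can also be created programmatically with the azure-mgmt-datafactory management SDK. This is only a sketch: the subscription, resource group, factory, and pipeline names are placeholders, and exact model or method names can differ between SDK versions:

    from datetime import datetime, timezone
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.datafactory import DataFactoryManagementClient
    from azure.mgmt.datafactory.models import (
        TriggerResource, ScheduleTrigger, ScheduleTriggerRecurrence,
        TriggerPipelineReference, PipelineReference,
    )

    adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

    # Fire the pipeline every 2 hours from the given UTC start time.
    recurrence = ScheduleTriggerRecurrence(
        frequency="Hour", interval=2,
        start_time=datetime(2024, 1, 1, tzinfo=timezone.utc), time_zone="UTC",
    )
    trigger = ScheduleTrigger(
        recurrence=recurrence,
        pipelines=[TriggerPipelineReference(
            pipeline_reference=PipelineReference(reference_name="<pipeline-name>"))],
    )
    adf_client.triggers.create_or_update(
        "<resource-group>", "<factory-name>", "TwoHourlyTrigger",
        TriggerResource(properties=trigger),
    )
    # Triggers are created in a stopped state, so start it to begin firing.
    adf_client.triggers.begin_start("<resource-group>", "<factory-name>", "TwoHourlyTrigger").result()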

-----------------------------------------------How to run a pipeline from the point of failure----------------------------------
Simply navigate to the ‘Monitor’ section in the Data Factory user experience,
select your pipeline run, click ‘View activity runs’ under the ‘Actions’ column,
select the failed activity, and click Rerun to rerun the pipeline from the activity
that failed last time.

---------------------------------How to run SSIS packages from ADF------------------------------------------------
By lifting and shifting SSIS packages from on-premises into Azure Data Factory
(onto the Azure-SSIS integration runtime), we can call the packages from an ADF
pipeline using the Execute SSIS Package activity.

----------------------------------How to establish a connection to an on-premises data source using SSIS in ADF------------
To establish a connection to an on-premises data source used in an SSIS package, we
have to perform one of the following procedures:
1. Join the Azure-SSIS integration runtime to a virtual network.
2. Configure a self-hosted IR as a proxy for the Azure-SSIS IR in Azure Data Factory.

---------------------------------------------Limitations-------------------------------------------------------
you cannot place a For Each activity inside of another For Each activity
You cannot put a For Each activity or Switch activity inside of an If activity
You cannot use a Set Variable activity inside a For Each activity that runs in
parallel
You cannot nest If activities
You cannot nest Switch activities
You cannot put a For Each or If activity inside a Switch activity
You cannot use an expression to populate the pipeline in an Execute Pipeline
activity
The Lookup activity has a maximum of 5,000 rows and a maximum size of 4 MB
A maximum of 40 activities per pipeline
100 queued runs per pipeline and 1,000 concurrent pipeline activity runs per
subscription per Azure Integration Runtime region

-------------------------------------------Performance Tuning--------------------------------------------------
For Copy activities, increase the DIUs (Data Integration Units) for Azure integration
runtimes.
If it is a self-hosted integration runtime, increase the number of nodes (machines).
Set the parallel copy option on the Copy activity.
Use Azure Blob Storage or Azure Data Lake Storage Gen2 as a staging store when
loading from on-premises to the cloud.

------------------------------------------Copying data dynamically---------------------------------------------

1. Create one Get Metadata activity

1. What is meant by Run ID?


A pipeline run in Azure Data Factory and Azure Synapse defines an instance of a
pipeline execution.
For example, say you have a pipeline that executes at 8:00 AM, 9:00 AM, and 10:00
AM.
In this case, there are three separate runs of the pipeline or pipeline runs. Each
pipeline run has a unique pipeline run ID.
A run ID is a GUID that uniquely defines that particular pipeline run.
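A short sketch with the azure-mgmt-datafactory SDK showing where the run ID comes from (the subscription, resource group, factory, and pipeline names are placeholders):

    from azure.identity import DefaultAzureCredential
    from azure.mgmt.datafactory import DataFactoryManagementClient

    adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

    # Starting a pipeline returns a response whose run_id is the GUID for this particular run.
    run = adf_client.pipelines.create_run(
        "<resource-group>", "<factory-name>", "<pipeline-name>", parameters={}
    )
    print(run.run_id)

    # The same run ID is later used to look up the status of exactly that run.
    pipeline_run = adf_client.pipeline_runs.get("<resource-group>", "<factory-name>", run.run_id)
    print(pipeline_run.status)  # e.g. InProgress, Succeeded, Failed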

2. What are the Web and Webhook activities?


A webhook activity can control the execution of pipelines through custom code.
With the webhook activity, code can call an endpoint and pass it a callback URL.
The pipeline run waits for the callback invocation before it proceeds to the next
activity.
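To make the callback concrete: the Webhook activity POSTs a JSON body that includes a callBackUri property, and the pipeline stays on the activity until that URI receives a POST. Below is a hedged Flask sketch of the receiving service; the endpoint path and the work it kicks off are illustrative assumptions (in real use the callback is normally sent only after the long-running work has finished):

    import requests
    from flask import Flask, request

    app = Flask(__name__)

    @app.route("/start-job", methods=["POST"])
    def start_job():
        body = request.get_json()
        callback_uri = body["callBackUri"]  # supplied automatically by the Webhook activity

        # ... kick off the custom long-running work here (illustrative) ...

        # POSTing to the callback URI tells ADF the work is done, so the pipeline moves on.
        requests.post(callback_uri, json={"status": "succeeded"})
        return "", 202

    if __name__ == "__main__":
        app.run(port=8080)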
Create a Webhook activity with UI
To use a Webhook activity in a pipeline, complete the following steps:

Search for Webhook in the pipeline Activities pane, and drag a Webhook activity to
the pipeline canvas.

Select the new Webhook activity on the canvas if it is not already selected, and its
Settings tab, to edit its details.


Specify a URL for the webhook, which can be a literal URL string, or any
combination of dynamic expressions, functions, system variables, or outputs from
other activities. Provide other details to be submitted with the request.

Use the output from the activity as the input to any other activity, and reference
the output anywhere dynamic content is supported in the destination activity.

Web activity
Web Activity can be used to call a custom REST endpoint from an Azure Data Factory
or Synapse pipeline.
You can pass datasets and linked services to be consumed and accessed by the
activity.

Create a Web activity with UI


To use a Web activity in a pipeline, complete the following steps:

Search for Web in the pipeline Activities pane, and drag a Web activity to the
pipeline canvas.

Select the new Web activity on the canvas if it is not already selected, and its
Settings tab, to edit its details.


Specify a URL, which can be a literal URL string, or any combination of dynamic
expressions, functions, system variables, or outputs from other activities. Provide
other details to be submitted with the request.

Use the output from the activity as the input to any other activity, and reference
the output anywhere dynamic content is supported in the destination activity.

3. What is the Lookup activity?


Lookup activity can retrieve a dataset from any of the data sources supported by
data factory and Synapse pipelines.
You can use it to dynamically determine which objects to operate on in a subsequent
activity, instead of hard coding the object name.
Some object examples are files and tables.
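For example, in my project a Lookup on the control table (with 'First row only' enabled) can be referenced in a later activity with a dynamic expression such as @activity('LookupLastRunTime').output.firstRow.LastRunTime; the activity and column names here are illustrative placeholders.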

Create a Lookup activity with UI


To use a Lookup activity in a pipeline, complete the following steps:

Search for Lookup in the pipeline Activities pane, and drag a Lookup activity to
the pipeline canvas.

Select the new Lookup activity on the canvas if it is not already selected, and its
Settings tab, to edit its details.

Choose an existing source dataset or select the New button to create a new one.

The options for identifying rows to include from the source dataset will vary based
on the dataset type; for example, the configuration options differ for a delimited
text dataset, an Azure SQL table dataset, and an OData dataset.

Supported capabilities
Note the following:

The Lookup activity can return up to 5000 rows; if the result set contains more
records, the first 5000 rows will be returned.
The Lookup activity output supports up to 4 MB in size; the activity will fail if the
size exceeds the limit.
The longest duration for the Lookup activity before timeout is 24 hours.
When you use a query or stored procedure to look up data, make sure to return one and
exactly one result set; otherwise, the Lookup activity fails.

4. What is the Get Metadata activity?


You can use the Get Metadata activity to retrieve the metadata of any data in Azure
Data Factory or a Synapse pipeline.
You can use the output from the Get Metadata activity in conditional expressions to
perform validation, or consume the metadata in subsequent activities.
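For example, with the 'Child items' field selected on a folder dataset, the file list can be referenced as @activity('Get Metadata1').output.childItems and handed to a ForEach activity; the activity name here is a placeholder.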

Create a Get Metadata activity with UI


To use a Get Metadata activity in a pipeline, complete the following steps:

Search for Get Metadata in the pipeline Activities pane, and drag a Get Metadata
activity to the pipeline canvas.

Select the new Get Metadata activity on the canvas if it is not already selected,
and its Dataset tab, to edit its details.

Choose a dataset, or create a new one with the New button. Then you can specify
filter options and add columns from the available metadata for the dataset.


Use the output of the activity as an input to another activity, like a Switch
activity in this example. You can reference the output of the Metadata Activity
anywhere dynamic content is supported in the other activity.


In the dynamic content editor, select the Get Metadata activity output to reference
it in the other activity.

5. What is the Until activity?
It executes a set of activities in a loop until the condition associated with the
activity evaluates to true.

Create an Until activity with UI


To use an Until activity in a pipeline, complete the following steps:

Search for Until in the pipeline Activities pane, and drag an Until activity to the
pipeline canvas.

Select the Until activity on the canvas if it is not already selected, and its
Settings tab, to edit its details.


Enter an expression that will be evaluated after all child activities defined in
the Until activity are executed.
If the expression evaluates to false, the Until activity will execute all its child
activities again.
When it evaluates to true, the Until activity will complete.
The expression can be a literal string expression, or any combination of dynamic
expressions, functions, system variables, or outputs from other activities. The
example below checks the value of a previously defined pipeline array variable
called TestVariable to see if it evaluates to ['done'].
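One possible expression for such a check (my own illustration, not necessarily the exact one from the original example) is @contains(variables('TestVariable'), 'done'), which returns true once 'done' has been added to the array variable.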


Define activities that the Until activity will execute by selecting the Edit
Activities button on the Until activity directly, or by selecting the Activities
tab to select it there. A new activities editor pane will be displayed where you
can add any activities for the Until activity to execute. In this example, a Set
Variable activity simply sets the value of the variable referenced in the
expression above to ['done'], so the Until activity's expression will be true the
first time it is executed, and the Until activity will stop. In your real-world
use, you can check any conditions required and the Until activity will continue to
execute its child activities each time the expression is evaluated, until the
conditions are met.


6. What is the ForEach activity?


The ForEach Activity defines a repeating control flow in an Azure Data Factory or
Synapse pipeline.
This activity is used to iterate over a collection and executes specified
activities in a loop.
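Inside the loop, the current element is read with @item(); for example, when iterating over the childItems output of a Get Metadata activity, @item().name returns the current file name. The names here are placeholders.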

Create a ForEach activity with UI


To use a ForEach activity in a pipeline, complete the following steps:

You can use any array type variable or outputs from other activities as the input
for your ForEach activity. To create an array variable, select the background of
the pipeline canvas and then select the Variables tab to add an array type variable
as shown below.


Search for ForEach in the pipeline Activities pane, and drag a ForEach activity to
the pipeline canvas.

Select the new ForEach activity on the canvas if it is not already selected, and
its Settings tab, to edit its details.


Select the Items field and then select the Add dynamic content link to open the
dynamic content editor pane.


Select your input array in the dynamic content editor. In this
example, we select the variable created in the first step.


Select the Activities editor on the ForEach activity to add one or more activities
to be executed for each item in the input Items array.


In any activities you create within the ForEach activity, you can reference the
current item the ForEach activity is iterating through from the Items list. You can
reference the current item anywhere you can use a dynamic expression to specify a
property value. In the dynamic content editor, select the ForEach iterator to
return the current item.


7. What are linked services?


Linked services are much like connection strings, which define the connection
information needed for the service to connect to external resources.
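A minimal sketch of creating a blob storage linked service with the azure-mgmt-datafactory SDK; the names and the connection string are placeholders, and model names can vary slightly between SDK versions:

    from azure.identity import DefaultAzureCredential
    from azure.mgmt.datafactory import DataFactoryManagementClient
    from azure.mgmt.datafactory.models import (
        LinkedServiceResource, AzureBlobStorageLinkedService, SecureString,
    )

    adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

    # The linked service holds the connection information, much like a connection string.
    ls = LinkedServiceResource(
        properties=AzureBlobStorageLinkedService(
            connection_string=SecureString(
                value="DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>"
            )
        )
    )
    adf_client.linked_services.create_or_update(
        "<resource-group>", "<factory-name>", "ls_blob_storage", ls
    )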

8. What is incremental loading in ADF?


Incremental loading runs periodically after the initial load of the data, and it loads
only the new and updated records.

9. Azure Key Vault?


We can store the credentials of data stores and computes in Azure Key Vault.
Azure Data Factory retrieves the credentials when executing an activity that uses
the data store.
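In ADF this is configured through an Azure Key Vault linked service and secret references, but the same retrieval can be sketched in Python with azure-keyvault-secrets (the vault URL and secret name are placeholders):

    from azure.identity import DefaultAzureCredential
    from azure.keyvault.secrets import SecretClient

    # Authenticate (for ADF this would be the factory's managed identity) and read the secret.
    client = SecretClient(
        vault_url="https://<vault-name>.vault.azure.net",
        credential=DefaultAzureCredential(),
    )
    sql_password = client.get_secret("sql-admin-password").value  # secret name is a placeholder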

10. What is CI/CD?


In Azure Data Factory, CI/CD means continuous integration and continuous delivery: it
moves the Data Factory pipelines and related artifacts from one environment to another
environment (for example, from development to test to production).
