
Introduction to Azure Data Factory:

Azure Data Factory is a cloud-based data-integration service that allows us to create
data-driven workflows for orchestrating and automating data movement and data
transformation. It is well suited as an ETL tool in the cloud, designed to deliver extraction,
transformation, and loading processes end to end. The ETL process in Data Factory generally
involves four steps:

1. Connect & Collect: We can use the Copy Activity in a data pipeline to move data from both on-
premises and cloud source data stores to a centralized data store in the cloud for further processing.

2. Transform: Once the data is present in a centralized data store in the cloud, process or
transform the collected data by using compute services such as HDInsight Hadoop, Spark, Data
Lake Analytics, and Machine Learning.

3. Publish: After the raw data is refined into a business-ready consumable form, the data is loaded
into destinations such as Azure Synapse Analytics (formerly Azure SQL Data Warehouse), Azure SQL
Database, or Azure Cosmos DB.

4. Monitor: Azure Data Factory has built-in support for pipeline monitoring via Azure Monitor,
API, PowerShell, Log Analytics, and health panels on the Azure portal.
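
For illustration, the following is a minimal sketch of triggering and monitoring a pipeline run with
the Azure Data Factory management SDK for Python (assuming recent versions of the
azure-mgmt-datafactory and azure-identity packages); all resource names are placeholders, and the
pipeline is assumed to already exist:

# Minimal sketch: start a pipeline run and poll its status.
# Subscription, resource group, factory, and pipeline names are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

subscription_id = "<subscription-id>"
resource_group = "<resource-group>"
factory_name = "<data-factory-name>"

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)

# Start a run of an existing pipeline.
run = adf_client.pipelines.create_run(resource_group, factory_name, "CopyPipeline", parameters={})

# Check the run status (Queued, InProgress, Succeeded, Failed, ...).
pipeline_run = adf_client.pipeline_runs.get(resource_group, factory_name, run.run_id)
print(pipeline_run.status)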

Components in Azure Data Factory:

1. Pipeline
2. Activities
3. Datasets
4. Linked Services
5. Triggers
6. Integration Runtimes
7. Data Flows
8. Data Flow Debug

1. Pipeline: A pipeline in Azure Data Factory is a logical grouping of activities that together
perform a task. It represents a workflow that defines the sequence and dependencies of the
activities to execute (a minimal sketch follows the Activities item below).

2. Activities: Activities are the processing steps within a pipeline. There are different types of
activities available in Azure Data Factory, such as data movement activities (e.g., Copy Activity),
data transformation activities (e.g., Azure Databricks activity, HDInsight activity), control activities
(e.g., Execute Pipeline activity, If Condition activity), and more.
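
Continuing the earlier sketch (same adf_client, resource_group, and factory_name; all names remain
placeholders), a pipeline containing a single Copy Activity that moves data between two Blob Storage
datasets could be defined as follows:

# Minimal sketch: a pipeline whose only activity is a Copy Activity.
# The input/output datasets are assumed to exist (see the Datasets sketch below).
from azure.mgmt.datafactory.models import (
    PipelineResource, CopyActivity, DatasetReference, BlobSource, BlobSink,
)

copy_activity = CopyActivity(
    name="CopyBlobToBlob",
    inputs=[DatasetReference(type="DatasetReference", reference_name="InputBlobDataset")],
    outputs=[DatasetReference(type="DatasetReference", reference_name="OutputBlobDataset")],
    source=BlobSource(),  # data movement: read from Blob Storage
    sink=BlobSink(),      # data movement: write to Blob Storage
)

pipeline = PipelineResource(activities=[copy_activity])
adf_client.pipelines.create_or_update(resource_group, factory_name, "CopyPipeline", pipeline)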

3. Datasets: Datasets represent data structures within the data stores that Azure Data Factory
works with. They define the schema and location of the data to be used in activities as inputs
and outputs.
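
For example, a dataset that points at a file in Blob Storage might be sketched as follows (placeholder
names; the referenced linked service is sketched under Linked Services below):

# Minimal sketch: a dataset describing a file in Blob Storage.
from azure.mgmt.datafactory.models import (
    DatasetResource, AzureBlobDataset, LinkedServiceReference,
)

blob_dataset = DatasetResource(
    properties=AzureBlobDataset(
        linked_service_name=LinkedServiceReference(
            type="LinkedServiceReference", reference_name="AzureStorageLinkedService"
        ),
        folder_path="input-container/raw",  # location of the data
        file_name="sales.csv",              # optional: a single file
    )
)
adf_client.datasets.create_or_update(resource_group, factory_name, "InputBlobDataset", blob_dataset)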

4. Linked Services: Linked Services define the connection information required to connect Azure
Data Factory to external data sources and destinations. These connections can be to various types
of data stores, such as Azure SQL Database, Azure Blob Storage, Azure Data Lake Storage,
on-premises databases, and many others.
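
A minimal sketch of a linked service for an Azure Storage account follows (the connection string is a
placeholder and would normally be kept in a secret store such as Azure Key Vault):

# Minimal sketch: a linked service holding the connection information
# for an Azure Storage account. The connection string is a placeholder.
from azure.mgmt.datafactory.models import (
    LinkedServiceResource, AzureStorageLinkedService, SecureString,
)

storage_ls = LinkedServiceResource(
    properties=AzureStorageLinkedService(
        connection_string=SecureString(
            value="DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>"
        )
    )
)
adf_client.linked_services.create_or_update(
    resource_group, factory_name, "AzureStorageLinkedService", storage_ls
)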

5. Triggers: Triggers define when a pipeline should be automatically executed. Azure Data
Factory supports different types of triggers, including schedule-based triggers, event-based
triggers, and tumbling window triggers.
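
For example, a schedule trigger that starts the pipeline from the earlier sketch once per hour could
look like this (names are placeholders; begin_start is the long-running-operation form used by recent
SDK versions):

# Minimal sketch: an hourly schedule trigger attached to an existing pipeline.
from datetime import datetime, timedelta
from azure.mgmt.datafactory.models import (
    TriggerResource, ScheduleTrigger, ScheduleTriggerRecurrence,
    TriggerPipelineReference, PipelineReference,
)

recurrence = ScheduleTriggerRecurrence(
    frequency="Hour",
    interval=1,
    start_time=datetime.utcnow(),
    end_time=datetime.utcnow() + timedelta(days=1),
    time_zone="UTC",
)
trigger = TriggerResource(
    properties=ScheduleTrigger(
        recurrence=recurrence,
        pipelines=[
            TriggerPipelineReference(
                pipeline_reference=PipelineReference(
                    type="PipelineReference", reference_name="CopyPipeline"
                )
            )
        ],
    )
)
adf_client.triggers.create_or_update(resource_group, factory_name, "HourlyTrigger", trigger)

# Triggers are created in a stopped state and must be started explicitly.
adf_client.triggers.begin_start(resource_group, factory_name, "HourlyTrigger").result()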

6. Integration Runtimes: Integration Runtimes define the compute infrastructure used to execute
data integration tasks within Azure Data Factory. They can be Azure Integration Runtimes (fully
managed by Azure Data Factory; the default is the AutoResolve Integration Runtime) or Self-Hosted
Integration Runtimes (installed on your own infrastructure to reach on-premises or network-isolated
data stores).
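
A minimal, hedged sketch of registering a Self-Hosted Integration Runtime entry in the factory
follows (the runtime software itself must still be installed on your own machine and registered with
the authentication key that Data Factory generates):

# Minimal sketch: register a self-hosted integration runtime in the factory.
# The runtime name is a placeholder; the node software is installed separately.
from azure.mgmt.datafactory.models import (
    IntegrationRuntimeResource, SelfHostedIntegrationRuntime,
)

ir = IntegrationRuntimeResource(
    properties=SelfHostedIntegrationRuntime(description="IR for an on-premises SQL Server")
)
adf_client.integration_runtimes.create_or_update(resource_group, factory_name, "OnPremIR", ir)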

7. Data Flows: Data Flows provide a visually designed data transformation experience within
Azure Data Factory. They allow users to design and execute data transformation logic using a
graphical interface without writing code.

8. Data Flow Debug: Data Flow Debug helps in debugging Data Flows by allowing users to preview
and monitor the data as it moves through each transformation step, before the pipeline is published.

Replication of storage:

Storage replication determines how blobs are copied so that their contents remain safe in case of
hardware failure. When we create a storage account, one of the options we have to select is the
replication type.
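
As an aside, a minimal sketch of how the replication type is chosen (as the storage account SKU) when
creating a storage account with the azure-mgmt-storage and azure-identity Python packages; all names
are placeholders:

# Minimal sketch: create a storage account and pick its replication type
# through the SKU name (e.g. Standard_LRS, Standard_GRS). Names are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient
from azure.mgmt.storage.models import StorageAccountCreateParameters, Sku

storage_client = StorageManagementClient(DefaultAzureCredential(), "<subscription-id>")

poller = storage_client.storage_accounts.begin_create(
    "<resource-group>",
    "<storageaccountname>",
    StorageAccountCreateParameters(
        sku=Sku(name="Standard_GRS"),  # replication type: geo-redundant storage
        kind="StorageV2",
        location="eastus",
    ),
)
account = poller.result()
print(account.sku.name)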
