
Microsoft Azure

Microsoft Azure, often referred to simply as Azure, is a cloud computing platform
operated by Microsoft that provides access to, management of, and development of
applications and services through globally distributed data centers.

AZURE DATA FACTORY


Azure Data Factory (ADF) is a cloud-based ETL and data integration
service provided by Azure.
What is ETL?

ETL stands for “Extract, Transform, and Load” and is the process of extracting data
from one or more sources and moving it to a staging environment. There, the data is
cleaned and transformed before being loaded into a data warehouse for storage and
analysis.

Three ETL stages:

Extract
1. Data is extracted from source(s) and moved to a staging area. Common data-
source formats include relational databases, XML, JSON, and flat files, but sources can
also include non-relational databases such as IBM's Information Management System (IMS).
2. The data is validated upon extraction to ensure its accuracy. Data that fails
validation rules will be rejected, then discarded or (ideally) returned to its source for
further diagnostics.

Transform
3. The validated data is cleansed in the staging area. This crucial part of the data
transformation process involves identifying corrupt, duplicate, irrelevant, noisy, or
misrepresentative data and then replacing, modifying, or deleting it.
4. Other transformations occur so the data can be stored in a useful form. Common
transformations include sorting and filtering, merging data from multiple sources,
combining or splitting rows and columns, translating coded values, and performing basic
calculations. Sensitive data is also scrubbed, encrypted, redacted, and protected before
it is exposed to business users.

Load
5. The data is loaded to its end target for storage. For ETL, the end target is usually a
data warehouse, but it can be any data store. The process for loading the data varies
wildly according to organizational needs. Those that don’t rely on historical data may
overwrite old data with the new information, while others may wish to create a history by
loading the data in historical form at regular intervals.
6. Constraints defined within the database may also trigger upon load, further
filtering the data. The database may filter out duplicates that already exist in the
database, reject data that’s missing mandatory fields, or perform other actions based on
the parameters set by the organization.
7. The stored data is now ready for further analysis. Popular data analytics tools
include Tableau, Microsoft Power BI, and Qlik Sense.
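The seven steps above can be sketched end to end in a few lines of Python. This is a minimal, illustrative pipeline, not ADF code: the sample rows and table names are made up, a plain list stands in for the staging area, and an in-memory SQLite database stands in for the warehouse, whose UNIQUE primary-key constraint filters duplicates on load (step 6).

```python
import sqlite3

# --- Extract: pull rows from a source and validate them (steps 1-2) ---
source_rows = [
    {"id": 1, "name": "  Alice ", "amount": "100.50"},
    {"id": 2, "name": "Bob", "amount": "not-a-number"},   # fails validation
    {"id": 3, "name": "carol", "amount": "75.00"},
    {"id": 3, "name": "carol", "amount": "75.00"},        # duplicate row
]

def validate(row):
    """Reject rows whose amount is not numeric."""
    try:
        float(row["amount"])
        return True
    except ValueError:
        return False

staging = [r for r in source_rows if validate(r)]
rejected = [r for r in source_rows if not validate(r)]   # ideally sent back to source

# --- Transform: cleanse and reshape in the staging area (steps 3-4) ---
transformed = [
    {"id": r["id"], "name": r["name"].strip().title(), "amount": float(r["amount"])}
    for r in staging
]

# --- Load: write to the warehouse; the PRIMARY KEY constraint drops
# --- duplicates that already exist in the database (steps 5-6) ---
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, name TEXT, amount REAL)")
for row in transformed:
    conn.execute("INSERT OR IGNORE INTO sales VALUES (:id, :name, :amount)", row)
conn.commit()

print(rejected)                                        # one row failed validation
print(conn.execute("SELECT * FROM sales").fetchall())  # clean, deduplicated data
```

The loaded table is now ready for step 7: analysis with a tool such as Power BI, pointed at the warehouse instead of the raw sources.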

ETL tools are traditionally geared toward on-premises databases.

What is ELT?

Extract
1. Data is extracted from source(s) and moved to a staging area. Unlike ETL, the data
doesn’t undergo a validation process at this stage.
Load
2. The data is immediately loaded in its raw format to the data lake, where it will be
warehoused. Popular cloud storage solutions include Amazon Web
Services, Cloudera, Google Cloud, and Microsoft Azure.

Transform
3. The data is transformed on an as-needed basis. This saves time in the long run
because people won’t be applying transformations to data they don’t need.
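The contrast with ETL can be sketched the same way. In this minimal, hypothetical example, a plain Python list stands in for the data lake: raw rows are loaded as-is with no validation, and cleaning happens lazily, only when a query actually reads the data.

```python
# --- Extract & Load: raw rows go straight into the "lake", unvalidated ---
data_lake = [
    {"id": 1, "name": "  Alice ", "amount": "100.50"},
    {"id": 2, "name": "Bob", "amount": "not-a-number"},  # kept as-is, raw
]

# --- Transform: applied on read, only when an analyst asks for the data ---
def transform_on_read(rows):
    """Parse and clean rows on demand; rows that fail stay in the lake untouched."""
    for r in rows:
        try:
            yield {"id": r["id"], "name": r["name"].strip().title(),
                   "amount": float(r["amount"])}
        except ValueError:
            continue  # no upfront rejection; the raw row remains available

print(list(transform_on_read(data_lake)))  # only the cleaned rows a query needs
```

Because the raw data is never discarded, a later query can apply a different transformation to the same lake without re-extracting from the sources.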

For further reference, see “ETL vs. ELT: What’s the Difference Between
These 2 Processes?”: https://www.snaplogic.com/blog/etl-vs-elt-whats-the-difference

ELT is designed around cloud-scale storage and generally does not work with on-premises systems.

Linked Service
An Azure linked service is a connection mechanism for reaching external sources
outside Azure Data Factory. It works like a connection string, holding the
authentication information needed for the connection. Linked services let you
establish connections with your data stores; they act as connectors you use while
working with assets in those stores.
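As an illustration of that connection-string role, the sketch below builds the JSON shape of a linked service definition as a Python dict, using an Azure Blob Storage store as the example. The service name and connection string here are hypothetical placeholders, not real credentials or ADF SDK code.

```python
# Illustrative shape of an ADF linked service definition (Azure Blob Storage).
# The name and connection string are placeholders, not real values.
linked_service = {
    "name": "MyBlobStorageLinkedService",  # hypothetical name
    "properties": {
        "type": "AzureBlobStorage",
        "typeProperties": {
            # In practice this secret should live in Azure Key Vault,
            # not in a plain-text definition.
            "connectionString": "<your-storage-connection-string>",
        },
    },
}

print(linked_service["properties"]["type"])
```

The `type` field tells ADF which external store the connector targets, and `typeProperties` carries the authentication details the section above describes.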

Dataset
A dataset is an explicitly defined set of data that ADF can operate on: a named
view that points to or references the data you want to use as inputs and outputs
in your activities.
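A dataset definition references a linked service to say where its data lives. The sketch below shows that JSON shape as a Python dict for a delimited-text file in blob storage; all names (dataset, linked service, container, file) are hypothetical placeholders.

```python
# Illustrative shape of an ADF dataset definition pointing at a CSV file
# through a linked service; every name below is a made-up placeholder.
dataset = {
    "name": "MySalesDataset",
    "properties": {
        "type": "DelimitedText",
        "linkedServiceName": {
            "referenceName": "MyBlobStorageLinkedService",
            "type": "LinkedServiceReference",
        },
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "container": "sales-data",
                "fileName": "sales.csv",
            },
        },
    },
}

print(dataset["properties"]["linkedServiceName"]["referenceName"])
```

Note how the dataset itself holds no credentials: it names the data (container and file), while the referenced linked service supplies the connection and authentication.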
