ETL stands for extract, transform, load: three operations you perform to move raw data from wherever it lives, whether a cloud application or an on-premises database, into a data warehouse, where you can run business intelligence and analytics applications against it. Think of it as a data pipeline, along with all the plumbing required to connect it to the resources at both ends.

There are two broad categories of ETL tools. The tools that first hit the market 10 to 20
years ago, of which Informatica is the most well-known, run on-premises. They were tied
to data warehouse software that ran on expensive high-end hardware, because the
analytics they supported required a lot of processing power. You'd have to make sure
that your box could handle peak demand, which meant it would be underutilized most
of the time. If you needed to add capacity later, it could take weeks to get approval for a
new capital expenditure and to have the new hardware shipped and installed. In that
environment, it made sense to do as much prep work as possible (i.e. transformation)
prior to loading data into them, to avoid consuming cycles that the analysts needed.

Today, however, cloud data warehouses like Amazon Redshift, Google BigQuery, and
Snowflake have nearly infinitely scalable computing power, so you can skip the preload
transformations and dump all of your raw data into your data warehouse. You can then
define transformations in SQL and run them in the data warehouse at query time. ETL
has, in effect, become ELT.
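
As a minimal sketch of what that looks like in practice (the table and column names here are hypothetical, not drawn from any particular product), the transformation can be defined as a SQL view over the raw data, entirely inside the warehouse:

-- raw_events holds data loaded straight from the source, untouched (the "EL" steps).
-- The "T" lives in the warehouse as a view, evaluated when analysts query it.
CREATE VIEW daily_signups AS
SELECT
    CAST(event_timestamp AS DATE) AS signup_date,
    COUNT(DISTINCT LOWER(TRIM(email))) AS unique_signups  -- cleanup happens at query time
FROM raw_events
WHERE event_type = 'signup'
GROUP BY CAST(event_timestamp AS DATE);

Because the transformation is just SQL, changing it is a matter of redefining the view; nothing has to be re-extracted or re-loaded.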
