Overview of the ETL Process Steps

The ETL process extracts data from various source systems, transforms it for analytical use, and loads it into a data warehouse. It requires inputs from developers, analysts, and executives. ETL must be agile and well-documented to adapt to business changes and provide decision-makers with valuable insights. The process extracts raw data, cleanses it by deduplicating and validating, and transforms it by filtering, calculating, and formatting before loading it into the data warehouse.

ETL Process

Dr. S. Rasheeduddin,
Assoc. Prof.,
CSE (AI & ML) Department.
ETL (Extract, Transform, and Load) Process

• What is ETL?
• The mechanism of extracting information from source systems and bringing it into the data warehouse is commonly called ETL, which stands for Extraction, Transformation, and Loading.
• The ETL process requires active input from various stakeholders, including developers, analysts, testers, and top executives.
• To remain valuable to decision-makers, a data warehouse must evolve as the business changes. ETL is a recurring process in a data warehouse system (run daily, weekly, or monthly), so it needs to be agile, automated, and well documented.
Why is ETL important?

• Organizations today have both structured and unstructured data from various sources, including:
  • Customer data from online payment and customer relationship management (CRM) systems.
  • Inventory and operations data from vendor systems.
  • Sensor data from Internet of Things (IoT) devices.
  • Marketing data from social media and customer feedback.
  • Employee data from internal human resources systems.
• By applying the extract, transform, and load (ETL) process, individual raw datasets can be prepared in a format and structure that is easier to consume for analytics, resulting in more meaningful insights.
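As a rough illustration of preparing raw datasets in a common structure, the Python sketch below maps customer records from two hypothetical sources (a CRM export and a payment system, with made-up field names such as `CustID` and `amount_cents`) onto one assumed schema and joins them into a single record per customer. The schema and all field names are assumptions for illustration only.

```python
# Hypothetical raw extracts: the two sources name and format the same facts differently.
crm_rows = [{"CustID": "C1", "FullName": " Ada Lovelace ", "Email": "ADA@EXAMPLE.COM"}]
payment_rows = [
    {"customer_id": "C1", "amount_cents": 1999},
    {"customer_id": "C1", "amount_cents": 500},
]


def crm_to_common(row):
    """Map a CRM row onto the assumed common customer schema."""
    return {
        "customer_id": row["CustID"],
        "name": row["FullName"].strip(),
        "email": row["Email"].lower(),
    }


def build_customer_view(crm, payments):
    """Join both sources on customer_id into one analytics-ready record per customer."""
    customers = {c["customer_id"]: c for c in map(crm_to_common, crm)}
    totals_cents = {cid: 0 for cid in customers}
    for p in payments:
        if p["customer_id"] in totals_cents:  # skip payments for unknown customers
            # Sum in integer cents to avoid floating-point drift, convert once at the end.
            totals_cents[p["customer_id"]] += p["amount_cents"]
    return [{**c, "total_paid": totals_cents[cid] / 100} for cid, c in customers.items()]


print(build_customer_view(crm_rows, payment_rows))
```

Summing money in integer cents and converting once at the end is a common way to keep totals exact.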
How Does ETL Work?
• ETL consists of three separate phases: extraction, transformation (which includes data cleansing), and loading.
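To make the three phases concrete, here is a minimal Python sketch that chains them. It uses an in-memory list as the source and an in-memory SQLite database (the standard-library `sqlite3` module) as a stand-in warehouse. The `sales` table, its columns, and the cleansing rules are assumptions for illustration, not a production design.

```python
import sqlite3


def extract(source_rows):
    """Extraction: read raw records from a source (here, an in-memory list)."""
    return list(source_rows)


def transform(rows):
    """Transformation, including cleansing: drop incomplete rows, normalize text, cast types."""
    cleaned = []
    for row in rows:
        if not row.get("product") or row.get("qty") is None:
            continue  # validation: skip incomplete records
        cleaned.append((row["product"].strip().title(), int(row["qty"])))
    return cleaned


def load(conn, rows):
    """Loading: write transformed rows into the (hypothetical) warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (product TEXT, qty INTEGER)")
    conn.executemany("INSERT INTO sales (product, qty) VALUES (?, ?)", rows)
    conn.commit()


def run_etl(conn, source_rows):
    """One ETL run: extract, then transform, then load."""
    load(conn, transform(extract(source_rows)))


if __name__ == "__main__":
    warehouse = sqlite3.connect(":memory:")
    raw = [{"product": " widget ", "qty": "3"}, {"product": "", "qty": "1"}, {"product": "gadget", "qty": 5}]
    run_etl(warehouse, raw)
    print(warehouse.execute("SELECT product, qty FROM sales ORDER BY product").fetchall())
    # [('Gadget', 5), ('Widget', 3)]
```

Keeping each phase in its own function makes the job straightforward to automate and rerun on a schedule, which matters because ETL is a recurring process.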
Extraction

• Extraction is the operation of pulling information from a source system for further use in a data warehouse environment. It is the first stage of the ETL process.
• Extraction is often one of the most time-consuming tasks in ETL.
• Source systems may be complex and poorly documented, which can make it difficult to determine which data needs to be extracted.
• Data must be extracted repeatedly, on a periodic schedule, so that all changed data reaches the warehouse and keeps it up to date.
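One common way to supply only changed data on each periodic run is a timestamp "watermark". The Python sketch below assumes a hypothetical source table `customers` with an `updated_at` column holding ISO-8601 strings in a single fixed format, so that string comparison matches time order. Each run extracts rows newer than the stored watermark and returns the new watermark for the next run.

```python
import sqlite3


def extract_changes(source, last_watermark):
    """Return rows changed since last_watermark, plus the watermark to store for the next run.

    Assumes a hypothetical `customers` table whose `updated_at` column holds ISO-8601
    strings in one fixed format, so string comparison matches time order.
    """
    rows = source.execute(
        "SELECT id, name, updated_at FROM customers WHERE updated_at > ? ORDER BY updated_at",
        (last_watermark,),
    ).fetchall()
    new_watermark = rows[-1][2] if rows else last_watermark
    return rows, new_watermark


if __name__ == "__main__":
    src = sqlite3.connect(":memory:")
    src.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, updated_at TEXT)")
    src.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [(1, "Ada", "2024-01-01T09:00:00"), (2, "Grace", "2024-01-02T10:00:00")],
    )
    first, mark = extract_changes(src, "")  # first run: every row counts as new
    src.execute("UPDATE customers SET name = 'Ada L.', updated_at = '2024-01-03T08:00:00' WHERE id = 1")
    second, mark = extract_changes(src, mark)  # later run: only the changed row
    print(len(first), second)  # 2 [(1, 'Ada L.', '2024-01-03T08:00:00')]
```

A timestamp watermark is only one way to detect changes. It misses rows that arrive later with a timestamp equal to the stored watermark, and it cannot see deleted rows. This is why log-based change data capture is also widely used.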
Cleansing

• The cleansing stage is crucial in a data warehouse because its purpose is to improve data quality.
• For example, if an enterprise wishes to contact its customers or suppliers, a complete, accurate, and up-to-date list of their contact details (addresses, email addresses, and telephone numbers) must be available.
• If a client or supplier calls, the staff responding should be able to find that person in the enterprise database quickly, which requires that the caller's name or company name be listed in the database.
• If a customer appears in the databases under two or more slightly different names or account numbers, it becomes difficult to update that customer's information.
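The duplicate-customer problem can be illustrated with a deliberately simple Python sketch. Records whose names match after normalizing case, punctuation, and spacing are merged, and empty fields are filled in from the duplicates. Real cleansing tools typically use fuzzier matching (for example, edit distance or phonetic keys) on more fields than the name. The field names here are hypothetical.

```python
import re

# Hypothetical customer records; the first two describe the same person.
records = [
    {"name": "Ada Lovelace", "email": "ada@example.com", "phone": ""},
    {"name": "ada  lovelace.", "email": "", "phone": "555-0100"},
    {"name": "Grace Hopper", "email": "grace@example.com", "phone": "555-0199"},
]


def match_key(name):
    """Normalize a name for matching: lowercase, drop punctuation, collapse whitespace."""
    name = re.sub(r"[^\w\s]", "", name.lower())
    return " ".join(name.split())


def deduplicate(rows):
    """Merge rows whose names normalize to the same key; the first non-empty value wins."""
    merged = {}
    for rec in rows:
        key = match_key(rec["name"])
        if key not in merged:
            merged[key] = dict(rec)  # copy so the input records are not modified
        else:
            for field, value in rec.items():
                if not merged[key].get(field):
                    merged[key][field] = value  # fill gaps from the duplicate
    return list(merged.values())


print(deduplicate(records))
```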
Transform
• In the staging area, the raw data is processed: it is transformed and consolidated for its intended analytical use case. This phase can involve the following tasks:
  • Filtering, cleansing, de-duplicating, validating, and authenticating the data.
  • Performing calculations, translations, or summarizations based on the raw data. This can include changing row and column headers for consistency, converting currencies or other units of measurement, editing text strings, and more.
  • Conducting audits to ensure data quality and compliance.
  • Removing, encrypting, or otherwise protecting data governed by industry or government regulations.
  • Formatting the data into tables or joined tables to match the schema of the target data warehouse.
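Several of the transformation tasks listed above can be sketched in a few lines of Python: consistent column headers, currency conversion, protecting regulated data, and shaping rows for the target schema. The column mapping, the fixed exchange rates, the secret key, and the `(customer_token, amount_usd)` target shape are all hypothetical. A keyed hash (HMAC) pseudonymizes the email rather than encrypting it, so the original value cannot be recovered from the warehouse.

```python
import hashlib
import hmac

# Hypothetical mapping from inconsistent source headers to warehouse column names.
COLUMN_MAP = {"Cust Email": "email", "Order Amt": "amount", "Curr": "currency"}
# Hypothetical fixed rates; a real pipeline would look up the rate for each transaction date.
RATES_TO_USD = {"USD": 1.0, "EUR": 1.10}
# Hypothetical key; in practice, load it from a secrets manager, not from source code.
SECRET_KEY = b"replace-with-a-managed-secret"


def transform_row(raw):
    """Rename columns, convert the amount to USD, and replace the email with a keyed hash."""
    row = {COLUMN_MAP[k]: v for k, v in raw.items() if k in COLUMN_MAP}  # consistent headers, unmapped columns dropped
    amount_usd = round(float(row["amount"]) * RATES_TO_USD[row["currency"]], 2)  # unit conversion
    email = row["email"].strip().lower().encode()
    token = hmac.new(SECRET_KEY, email, hashlib.sha256).hexdigest()  # pseudonymize PII
    return (token, amount_usd)  # tuple ordered to match the assumed target table


print(transform_row({"Cust Email": " Ada@Example.com ", "Order Amt": "100", "Curr": "EUR", "Notes": "x"}))
```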
Load
• In this last step, the transformed data is moved from the staging area into a target data warehouse. Typically, this involves an initial loading of all data, followed by periodic loading of incremental data changes and, less often, full refreshes to erase and replace data in the warehouse.
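The three loading patterns can be sketched with Python's built-in `sqlite3` module standing in for the warehouse: an initial load, periodic incremental loads, and occasional full refreshes. The `dim_customer` table is hypothetical. `INSERT OR REPLACE` is SQLite's way of replacing an existing row that has the same primary key. Other warehouses provide their own upsert or `MERGE` statements.

```python
import sqlite3


def ensure_table(wh):
    """Create the hypothetical target table if it does not exist yet."""
    wh.execute("CREATE TABLE IF NOT EXISTS dim_customer (id INTEGER PRIMARY KEY, name TEXT)")


def full_load(wh, rows):
    """Initial load or full refresh: erase the table's contents and replace them."""
    ensure_table(wh)
    with wh:  # one transaction: commits on success, rolls back to the old contents on error
        wh.execute("DELETE FROM dim_customer")
        wh.executemany("INSERT INTO dim_customer (id, name) VALUES (?, ?)", rows)


def incremental_load(wh, changed_rows):
    """Periodic load of changes: insert new ids and replace rows whose id already exists."""
    ensure_table(wh)
    with wh:
        wh.executemany("INSERT OR REPLACE INTO dim_customer (id, name) VALUES (?, ?)", changed_rows)


if __name__ == "__main__":
    wh = sqlite3.connect(":memory:")
    full_load(wh, [(1, "Ada"), (2, "Grace")])  # initial load
    incremental_load(wh, [(2, "Grace Hopper"), (3, "Alan")])  # periodic changes
    print(wh.execute("SELECT id, name FROM dim_customer ORDER BY id").fetchall())
    # [(1, 'Ada'), (2, 'Grace Hopper'), (3, 'Alan')]
```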
