ETL Process Training
Introduction to Datawarehousing Concepts
Building a Datawarehouse
[Figure: data warehouse architecture. Labelled components: External Systems, Legacy System, Data Stores, Replication Services, Extraction/Transformation Server, Staging Area (to Warehouse/Datamart), Metadata, Enterprise Data Warehouse, Finance, Sales and Mkting Datamarts (labelled Dependent or Independent), LAN Clients, Light Clients, Web Server. Design/Mgmt tools: Scrubbing Tool, Mapping Tool, Extraction Mgmt Tool, Transformation Tool, Migration Mgmt Tool.]
Building a Datawarehouse
Steps Involved in Building a Datawarehouse
Extraction Phase
Transformation Phase
Transporting Phase
Row-by-row insert statements create logs, so a bulk loader is advisable.
Truncate target tables before a full refresh.
Index management: drop indexes before loading and reindex afterwards.
Refresh Phase
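The transporting steps above (truncate, drop indexes, bulk load, reindex) can be sketched as follows. This is a minimal illustration using Python's built-in sqlite3 module, not a production loader: the table name `sales_fact` and index name `idx_sales_customer` are hypothetical, `DELETE FROM` stands in for truncation because SQLite has no TRUNCATE statement, and `executemany` stands in for a real bulk-load utility.

```python
import sqlite3


def full_refresh(conn, rows):
    """Reload the hypothetical sales_fact table from scratch."""
    cur = conn.cursor()
    # Drop the index first so it is not maintained row by row during the load.
    cur.execute("DROP INDEX IF EXISTS idx_sales_customer")
    # SQLite has no TRUNCATE; DELETE without WHERE empties the table.
    cur.execute("DELETE FROM sales_fact")
    # Batched insert, standing in for a bulk loader.
    cur.executemany(
        "INSERT INTO sales_fact (customer_id, amount) VALUES (?, ?)", rows
    )
    # Rebuild the index once, after all rows are in place.
    cur.execute("CREATE INDEX idx_sales_customer ON sales_fact (customer_id)")
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_fact (customer_id INTEGER, amount REAL)")
conn.execute("CREATE INDEX idx_sales_customer ON sales_fact (customer_id)")
conn.execute("INSERT INTO sales_fact VALUES (99, 1.0)")  # stale row from an earlier load

full_refresh(conn, [(1, 10.0), (2, 25.5), (1, 7.25)])
loaded = conn.execute("SELECT COUNT(*) FROM sales_fact").fetchone()[0]
```

After the call, the stale row is gone and only the three freshly loaded rows remain.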
Extracting Data
Extraction Process in Detail
Extraction is the process of getting data from a legacy system or any
other data source. After extraction, the data is put in a staging area
where it can be scrubbed and cleaned.
The data may come from a single source or from multiple sources. If it
comes from multiple sources, a connector tool is required to connect
them.
If the data comes from a single source, that source can be an OLTP
system or a flat file.
The extraction process can be done either by hand-coded methods or by
using tools.
Advantages and disadvantages of custom-programmed extraction (PL/SQL
scripts) versus tool-based extraction.
Extraction Techniques
Extraction Methods.
Bulk Extraction.
The entire data warehouse is refreshed
periodically by extractions from the source
systems. All applicable data are extracted from the
source systems for loading into the warehouse.
This approach makes heavy use of the network
connection for loading data from source to target
databases, but the mechanism is easy to set up
and maintain.
Change-Based Replication
Only data that have been newly inserted or
updated in the source systems are extracted and
loaded into the warehouse. This approach uses
less network bandwidth because a smaller volume of data
is transported. The mechanism involves
complex programming to determine when a new
warehouse record must be inserted and when an
existing warehouse record must be updated.
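The insert-or-update decision described above can be sketched as follows. This is one possible approach under stated assumptions, not the only way to detect changes: it assumes the source table carries a last-modified timestamp stored as an ISO date string, and the table names `customers` and `dim_customer` and the column `updated_at` are hypothetical.

```python
import sqlite3


def extract_changes(source, warehouse, last_run):
    """Copy rows modified after last_run and return the new high-water mark."""
    rows = source.execute(
        "SELECT id, name, updated_at FROM customers "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_run,),
    ).fetchall()
    for row_id, name, updated_at in rows:
        exists = warehouse.execute(
            "SELECT 1 FROM dim_customer WHERE id = ?", (row_id,)
        ).fetchone()
        if exists:
            # Existing warehouse record: update it in place.
            warehouse.execute(
                "UPDATE dim_customer SET name = ?, updated_at = ? WHERE id = ?",
                (name, updated_at, row_id),
            )
        else:
            # New warehouse record: insert it.
            warehouse.execute(
                "INSERT INTO dim_customer (id, name, updated_at) VALUES (?, ?, ?)",
                (row_id, name, updated_at),
            )
    warehouse.commit()
    return rows[-1][2] if rows else last_run


ddl = "CREATE TABLE {} (id INTEGER PRIMARY KEY, name TEXT, updated_at TEXT)"
source = sqlite3.connect(":memory:")
source.execute(ddl.format("customers"))
source.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Ann", "1998-08-01"), (2, "Bob", "1998-08-05"), (3, "Cy", "1998-08-08")],
)
warehouse = sqlite3.connect(":memory:")
warehouse.execute(ddl.format("dim_customer"))
warehouse.executemany(
    "INSERT INTO dim_customer VALUES (?, ?, ?)",
    [(1, "Ann", "1998-08-01"), (2, "Bobby", "1998-08-02")],
)

watermark = extract_changes(source, warehouse, "1998-08-02")
```

Only the two rows changed after the previous run cross the network: customer 2 is updated and customer 3 is inserted, while customer 1 is never read.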
Extraction Techniques
Hand Coding Development practices
Criteria for Identifying Extraction Tool.
Extraction Tools
Extraction Tools include
Apertus Carleton. Passport
Evolutionary Technologies. ETL Extract.
Platinum. InfoPump
TRANSFORMING DATA
IMPORTANCE OF QUALITY DATA.
TRANSFORMATION
TRANSFORMING DATA: PROBLEMS AND SOLUTIONS
TRANSFORMATION TECHNIQUES
TRANSFORMATION TOOLS
Transformation :
Transformation is the process by which extracted data are
transformed into an appropriate format. The extracted data
are put into the staging area, where cleaning and scrubbing
take place, and are stored so that the clean data can then
be transformed. Data for the transformation phase
can also come from a cleansing tool. After transformation,
the data go to the transportation stage.
Manual Examination
A sampling methodology can be selected and the sampled
data examined manually.
Process Validation
Scripts can be generated that identify erroneous
records and segregate them.
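A process-validation script of the kind described above might look like the following sketch. The two checks (a non-empty `customer_id` and a non-negative numeric `amount`) and the field names are assumptions chosen for illustration; real rules would come from the warehouse's own data quality requirements.

```python
def validate(records):
    """Split records into valid ones and rejected ones with their error reasons."""
    valid, rejected = [], []
    for rec in records:
        errors = []
        if not rec.get("customer_id"):
            errors.append("missing customer_id")
        amount = rec.get("amount")
        if not isinstance(amount, (int, float)) or amount < 0:
            errors.append("invalid amount")
        if errors:
            # Segregate the erroneous record together with the reasons.
            rejected.append((rec, errors))
        else:
            valid.append(rec)
    return valid, rejected


staged = [
    {"customer_id": "C1", "amount": 10.0},
    {"customer_id": "", "amount": 5.0},
    {"customer_id": "C3", "amount": -2},
]
valid, rejected = validate(staged)
```

Keeping the rejection reasons alongside each record lets the erroneous rows be reviewed or corrected later instead of being silently dropped.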
Transformation Techniques
Address field (source value):
# 123 ABC Street,
DEF City,
Republic of GH

Split into separate fields:
No      : 123
Street  : ABC STREET
City    : DEF
Country : GH
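The address example above can be sketched as a small parser. This is an illustration only: it assumes every address follows the same three-part, comma-separated layout as the sample, and the function name `split_address` is hypothetical.

```python
import re


def split_address(raw):
    """Split a three-part address like the sample into No, Street, City, Country."""
    parts = [p.strip() for p in raw.split(",") if p.strip()]
    if len(parts) != 3:
        raise ValueError(f"expected 3 comma-separated parts, got {len(parts)}")
    # First part: optional '#', a house number, then the street name.
    match = re.match(r"#?\s*(\d+)\s+(.+)", parts[0])
    if match is None:
        raise ValueError(f"cannot find a house number in {parts[0]!r}")
    # Strip the generic words so only the names remain.
    city = re.sub(r"\s+City$", "", parts[1], flags=re.IGNORECASE)
    country = re.sub(r"^Republic of\s+", "", parts[2], flags=re.IGNORECASE)
    return {
        "No": match.group(1),
        "Street": match.group(2).upper(),
        "City": city.upper(),
        "Country": country.upper(),
    }


fields = split_address("# 123 ABC Street,\nDEF City,\nRepublic of GH")
```

Addresses that do not match the assumed layout raise an error rather than producing partial fields, so they can be routed to manual examination.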
Standardization : Standards and conventions for
abbreviations are applied to individual data items to
improve uniformity in both source and target objects.
Before standardization:
System A    Order Date    05 August 1998
System B    Order Date    08-08-98

After standardization:
System A    Order Date    August 05 1998
System B    Order Date    August 08 1998
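The date standardization shown above can be sketched with Python's datetime module. The per-system format strings are assumptions inferred from the two sample values (in particular, that System B writes day-month-year), and the sketch assumes the default English month names.

```python
from datetime import datetime

# Assumed source formats, inferred from the sample values for each system.
SOURCE_FORMATS = {
    "System A": "%d %B %Y",  # e.g. 05 August 1998
    "System B": "%d-%m-%y",  # e.g. 08-08-98 (assumed day-month-year)
}
TARGET_FORMAT = "%B %d %Y"   # e.g. August 05 1998


def standardize_date(system, value):
    """Parse a date using its source system's format and emit the target format."""
    parsed = datetime.strptime(value, SOURCE_FORMATS[system])
    return parsed.strftime(TARGET_FORMAT)


date_a = standardize_date("System A", "05 August 1998")
date_b = standardize_date("System B", "08-08-98")
```

Keeping the formats in one lookup table means a new source system only needs a new entry, not new parsing code.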
Deduplication : Rules are defined to identify duplicate
records of customers or products. When two or
more records refer to the same entity, they are merged
to form one warehouse record.

Before deduplication:
System A    Customer Name : John W Istin
System B    Customer Name : John William Istin

After deduplication:
Warehouse   Customer Name : John William Istin
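One possible deduplication rule for the customer-name example can be sketched as follows. The matching rule is an assumption made for illustration: two names are treated as the same customer when the first and last names match and each middle name is either equal to or an initial of the other. Real rules usually compare more attributes than the name alone.

```python
def same_person(a, b):
    """Assumed rule: same first/last name and compatible middle names or initials."""
    pa, pb = a.lower().split(), b.lower().split()
    if len(pa) < 2 or len(pb) < 2:
        return pa == pb
    if pa[0] != pb[0] or pa[-1] != pb[-1]:
        return False
    middle_a, middle_b = pa[1:-1], pb[1:-1]
    if len(middle_a) != len(middle_b):
        return False
    # 'w' matches 'william' because one is a prefix of the other.
    return all(x.startswith(y) or y.startswith(x) for x, y in zip(middle_a, middle_b))


def deduplicate(names):
    """Merge matching names, keeping the most complete (longest) spelling."""
    merged = []
    for name in names:
        for i, kept in enumerate(merged):
            if same_person(name, kept):
                if len(name) > len(kept):
                    merged[i] = name
                break
        else:
            merged.append(name)
    return merged


customers = deduplicate(["John W Istin", "John William Istin", "Jane Doe"])
```

The merged record keeps the fuller spelling, matching the outcome in the example above.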
Transformation Tools
Some of the transformation tools include
Apertus Carleton. Enterprise/Integrator.
Data Mirror. Transformation Server.
Informatica. PowerMart Designer.
TRANSPORTATION
[Figure: transportation flow. Labelled stages: Source Data, Staging Area, Load, Warehouse Schema.]
Datawarehouse Building
[Figure: operational sources (Source A part A, Source B part B, Source C part C) pass through Extraction, Transformation, and Categorization of transaction data to produce the analytical Users View (A, B, C).]
ETVL Tools
The following are popular ETVL tools:
Informatica.
Sagent.
ETVL Tools
Oracle Warehouse Builder - Key Features
DEFINING WAREHOUSE METADATA
Metadata
What is Metadata?
- Traditionally defined as data about data
- A form of abstraction that describes the structure and contents of the data warehouse
Metadata
Importance of Metadata
Metadata establish the context of the
Warehouse data
Metadata help warehouse administrators and users
locate and understand data items, both in the source
systems and in the warehouse data structures.
E.g.: The date 02/05/98 could mean either May 2, 1998
or February 5, 1998 depending on the date
convention used. Metadata describing the format of
this date field could help determine the definite and
unambiguous meaning of the data item.
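The date example above can be made concrete with a short sketch in which the format is read from metadata rather than guessed. The metadata layout (a dictionary with a `date_format` entry per field) and the field name `order_date` are hypothetical.

```python
from datetime import datetime


def parse_with_metadata(field, value, metadata):
    """Parse a date value using the format recorded for that field in the metadata."""
    return datetime.strptime(value, metadata[field]["date_format"]).date()


# Two hypothetical metadata entries for the same field, one per date convention.
day_first = {"order_date": {"date_format": "%d/%m/%y"}}
month_first = {"order_date": {"date_format": "%m/%d/%y"}}

as_day_first = parse_with_metadata("order_date", "02/05/98", day_first)
as_month_first = parse_with_metadata("order_date", "02/05/98", month_first)
```

The same string yields May 2, 1998 under the day-first convention and February 5, 1998 under the month-first convention, which is why the format has to be documented.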
Metadata are a form of Audit Trail for Data
Transformation
Metadata document the transformation of source data
into warehouse data. Hence warehouse metadata must
be capable of explaining how a particular piece of
warehouse data was derived from the operational
systems.
All business rules governing the transformation of data
to new values or new formats are also documented as
metadata.
This kind of audit trail is required:
- to build the users' confidence regarding the
veracity and quality of warehouse data
- to know where the data came from so that the user
has a good understanding of warehouse data
- by some warehousing products that use this type
of metadata to generate extraction and
transformation scripts for use in the warehouse
back-end
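An audit trail of this kind can be sketched as a small lineage registry that records, for each warehouse field, where it came from and which rule produced it. The structure and all field names below are hypothetical; the sample rules reuse the order-date formats from the standardization example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineageEntry:
    target: str         # warehouse field
    source_system: str  # operational system the value came from
    source_field: str   # field name in that system
    rule: str           # business rule applied during transformation


LINEAGE = [
    LineageEntry("warehouse.orders.order_date", "System A", "Order Date",
                 "reformat 'DD Month YYYY' to 'Month DD YYYY'"),
    LineageEntry("warehouse.orders.order_date", "System B", "Order Date",
                 "reformat 'DD-MM-YY' to 'Month DD YYYY'"),
]


def explain(target):
    """Describe how a warehouse field was derived from the operational systems."""
    return [
        f"{e.target} <- {e.source_system}.{e.source_field}: {e.rule}"
        for e in LINEAGE
        if e.target == target
    ]


trail = explain("warehouse.orders.order_date")
```

Because the rules are stored as data, the same registry could in principle drive script generation as well as answer user questions about where a value came from.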
Metadata Improve or Maintain Data Quality
Metadata can improve or maintain warehouse data quality
through the definition of valid values for individual warehouse
data items. Using a data quality tool prior to actual loading
into the warehouse, the warehouse load images can be
reviewed to check for compliance with valid values for key
data items. Data errors are quickly highlighted for correction.
Metadata can be used as the basis for any error-correction
processing that should be done if a data error is found. Error-correction
rules are documented in the metadata repository
and executed by program code on an as-needed basis.
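The valid-value check and error-correction rules described above can be sketched as follows. The metadata layout, the field names `gender` and `region`, and the sample valid values and corrections are all assumptions for illustration.

```python
# Hypothetical metadata: valid values and error-correction rules per data item.
METADATA = {
    "gender": {"valid_values": {"M", "F"},
               "corrections": {"MALE": "M", "FEMALE": "F"}},
    "region": {"valid_values": {"NORTH", "SOUTH"},
               "corrections": {}},
}


def check_load_image(rows, metadata):
    """Return (clean rows, error rows) after applying metadata-driven checks."""
    clean, errors = [], []
    for row in rows:
        fixed, bad_fields = dict(row), []
        for field, rule in metadata.items():
            value = fixed.get(field)
            if value in rule["valid_values"]:
                continue
            # Try a documented error-correction rule before rejecting the value.
            corrected = rule["corrections"].get(str(value).upper())
            if corrected is not None:
                fixed[field] = corrected
            else:
                bad_fields.append(field)
        if bad_fields:
            errors.append((row, bad_fields))
        else:
            clean.append(fixed)
    return clean, errors


load_image = [
    {"gender": "M", "region": "NORTH"},
    {"gender": "female", "region": "SOUTH"},
    {"gender": "X", "region": "EAST"},
]
clean, errors = check_load_image(load_image, METADATA)
```

The second row is repaired by a correction rule, while the third is highlighted with the names of the offending fields so it can be fixed before loading.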
DEVELOPING A METADATA STRATEGY
METADATA STRATEGY
EXAMINING TYPES OF METADATA
METADATA TYPES
ADMINISTRATIVE METADATA
END-USER METADATA
OPTIMIZATION METADATA
ADMINISTRATIVE METADATA
END-USER METADATA
End-user metadata help users create their queries and
interpret the results, and also contain:
Warehouse Contents : Metadata must describe the data
structure and contents of the data warehouse in user-friendly
terms. Aliases, rules, summaries and
precomputed totals are to be documented.
Predefined Queries & Reports : Queries and reports
that have been predefined are documented to avoid
duplication of effort.
Business Rules & Policies : All business rules and
changes to these rules over time should be documented.
Hierarchy Definitions : Hierarchy definitions are
important to support drilling up and down warehouse
dimensions.
Status Information : Status information is required
to inform warehouse users of the warehouse status
at any point in time.
Data Quality : Known data quality problems in the
warehouse should be clearly documented. This will
prompt users to make careful use of warehouse
data.
Warehouse Load History : A history of data errors,
data volumes, and load schedules should be available.
Warehouse Purging Rules : The rules that
determine when data is removed from the warehouse
should be known to end users.
OPTIMIZATION METADATA
METADATA MANAGEMENT TOOLS
COMMON WAREHOUSE METADATA
[Figure: CWM packages by layer]
Management : Warehouse Operation
Analysis   : Transformation, OLAP, Data Mining, Information Visualization, Business Nomenclature
Resource   : Object (UML), Relational, Record, Multi Dimensional, XML
Foundation : Business Information, Data Types, Keys Index, Type Mapping, Software Deployment, Expressions
Base       : UML 1.3 (Foundation, Behavioral_Elements, Model_Management)

Counts    Classes    Associations
CWM       157        115
CWMX      130        77
Total     287        192
[Figure: metadata layers, shown between MIDDLEWARE and APPLICATION]
Meta-metamodel Layer (M3)
Metamodel Layer (M2)
Metadata/Model Layer (M1)
User Data/Object Layer (M0), e.g. <Stock name=IBM price=112/>