
Evaluating ETL and Data Integration Platforms
Wayne Eckerson and Colin White
REPORT SERIES
Research Sponsors
Business Objects
DataMirror Corporation
Hummingbird Ltd
Informatica Corporation
Acknowledgements
TDWI would like to thank many people who contributed to this report. First, we appreciate the many users who responded to our survey, especially those who responded to our requests for phone interviews. Second, we'd like to thank Steve Tracy and Pieter Mimno, who reviewed draft manuscripts and provided feedback, as well as our report sponsors, who reviewed outlines, survey questions, and report drafts. Finally, we would like to recognize TDWI's production team: Denelle Hanlon, Theresa Johnston, Marie McFarland, and Donna Padian.
About TDWI
The Data Warehousing Institute (TDWI), a division of 101communications LLC, is the premier provider of in-depth, high-quality education and training in the business intelligence and data warehousing industry. TDWI supports a worldwide membership program, quarterly educational conferences, regional seminars, onsite courses, leadership awards programs, numerous print and online publications, and a public and private (Members-only) Web site.
This special report is the property of The Data Warehousing Institute (TDWI) and is made available to a restricted number of clients only upon these terms and conditions. TDWI reserves all rights herein. Reproduction or disclosure in whole or in part to parties other than the TDWI client, who is the original subscriber to this report, is permitted only with the written permission and express consent of TDWI. This report shall be treated at all times as a confidential and proprietary document for internal use only. The information contained in the report is believed to be reliable but cannot be guaranteed to be correct or complete.
For more information about this report or its sponsors, and to view the archived report Webinar, please visit: www.dw-institute.com/etlreport/.
© 2003 by 101communications LLC. All rights reserved. Printed in the United States. The Data Warehousing Institute is a trademark of 101communications LLC. Other product and company names mentioned herein may be trademarks and/or registered trademarks of their respective companies. The Data Warehousing Institute is a division of 101communications LLC based in Chatsworth, CA.
Table of Contents
Scope, Methodology, and Demographics . . . . . . . . . . .3
Executive Summary: The Role of ETL in BI . . . . . . . . . .4
ETL in Flux . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4
The Evolution of ETL . . . . . . . . . . . . . . . . . . . . . . . . . . . .5
Framework and Components . . . . . . . . . . . . . . . . . . .5
A Quick History . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6
Code Generation Tools . . . . . . . . . . . . . . . . . .6
Engine-Based Tools . . . . . . . . . . . . . . . . . . . . . . .6
Code Generators versus Engines . . . . . . . . . . .7
Database-Centric ETL . . . . . . . . . . . . . . . . . . . . .8
Data Integration Platforms . . . . . . . . . . . . . . . . . . . . .9
ETL versus EAI . . . . . . . . . . . . . . . . . . . . . . . . . .9
ETL Trends and Requirements . . . . . . . . . . . . . . . . . . .10
Large Volumes of Data . . . . . . . . . . . . . . . . . . .10
Diverse Data Sources . . . . . . . . . . . . . . . . . .10
Shrinking Batch Windows . . . . . . . . . . . . . . . .11
Operational Decision Making . . . . . . . . . . . . .11
Data Quality Add-Ons . . . . . . . . . . . . . . . . . . . .12
Meta Data Management . . . . . . . . . . . . . . . . . . . . . . . .13
Packaged Solutions . . . . . . . . . . . . . . . . . . . . . . . . .14
Enterprise Infrastructure . . . . . . . . . . . . . . . . .14
SUMMARY . . . . . . . . . . . . . . . . . . . . . . . . . . . . .15
Build or Buy? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .15
Why Buy? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .16
Maintaining Custom Code . . . . . . . . . . . . . . . .16
Why Build? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .17
Build and Buy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .17
User Satisfaction with ETL Tools . . . . . . . . . . . . . . .18
Challenges in Deploying ETL . . . . . . . . . . . . . . . . . .19
Pricing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .21
Recommendations . . . . . . . . . . . . . . . . . . . . . . . . . .21
Buy If You . . . . . . . . . . . . . . . . . . . . . . . . . . .21
Build If You . . . . . . . . . . . . . . . . . . . . . . . . .22
Data Integration Platforms . . . . . . . . . . . . . . . . . . . . . .22
Platform Characteristics . . . . . . . . . . . . . . . . . .23
Data Integration Characteristics . . . . . . . . . . .23
High Performance and Scalability . . . . . . . . . . . . . .23
Built-In Data Cleansing and Profiling . . . . . . . . . . . .23
Complex, Reusable Transformations . . . . . . . . . . . . .24
Reliable Operations and Robust Administration . . . .25
Diverse Source and Target Systems . . . . . . . . . . . . .25
Update and Capture Utilities . . . . . . . . . . . . . . . . . . .26
Global Meta Data Management . . . . . . . . . . . . . . . .27
SUMMARY . . . . . . . . . . . . . . . . . . . . . . . . . . . . .28
ETL Evaluation Criteria . . . . . . . . . . . . . . . . . . . . . . . . .28
Available Resources . . . . . . . . . . . . . . . . . . . . .29
Vendor Attributes . . . . . . . . . . . . . . . . . . . . . . . . . . .29
Overall Product Considerations . . . . . . . . . . . . . . . .30
Design Features . . . . . . . . . . . . . . . . . . . . . . . . . . . .30
Meta Data Management Features . . . . . . . . . . . . . .31
Transformation Features . . . . . . . . . . . . . . . . . . . . .31
Data Quality Features . . . . . . . . . . . . . . . . . . . . . . . .32
Performance Features . . . . . . . . . . . . . . . . . . . . . . .32
Extract and Capture Features . . . . . . . . . . . . . . . . . .33
Load and Update Features . . . . . . . . . . . . . . . . . . . .33
Operate and Administer Component . . . . . . . . . . . . .34
Integrated Product Suites . . . . . . . . . . . . . . . . . . . .34
Company Services and Pricing . . . . . . . . . . . . . . . . .35
Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .36
WAYNE ECKERSON is director of research for The D ata W arehousing Institute (TD W I), the
leading provider of high-quality, in-depth education and research services to data w arehousing
and business intelligence professionals w orldw ide. Eckerson oversees TD W Is M em ber publi-
cations and research services.
Eckerson has w ritten and spoken on data w arehousing and business intelligence since 1994.
H e has published in-depth reports and articles about data quality, data w arehousing, custom er
relationship m anagem ent, online analytical processing (O LAP), W eb-based analytical tools,
analytic applications, and portals, am ong other topics. In addition, Eckerson has delivered pre-
sentations at industry conferences, user group m eetings, and vendor sem inars. H e has also
consulted w ith m any vendor and user firm s.
Prior to joining TD W I, Eckerson w as a senior consultant at the Patricia Seybold G roup, and direc-
tor of the G roups Business Intelligence & D ata W arehouse Service, w hich he launched in 1996.
COLIN WHITE is president of Intelligent Business Strategies. W hite is w ell know n for his in-
depth know ledge of leading-edge business intelligence, enterprise portal, database and W eb
technologies, and how they can be integrated into an IT infrastructure for building and sup-
porting the intelligent business.
W ith m ore than 32 years of IT experience, W hite has consulted for dozens of com panies
throughout the w orld and is a frequent speaker at leading IT events. H e is a faculty m em ber
at TD W I and currently serves on the advisory board of TD W Is Business Intelligence Strategies
program . H e also chairs a portals and W eb Services conference.
W hite has co-authored several books, and has w ritten num erous articles on business intelli-
gence, portals, database, and W eb technology for leading IT trade journals. H e w rites a regular
colum n for DM Review m agazine entitled Intelligent Business Strategies.Prior to becom ing
an analyst and consultant, W hite w orked at IBM and Am dahl.
About the Authors
TDWI Definitions

ETL - Extract, transform, and load (ETL) tools play a critical part in creating data warehouses, which form the bedrock of business intelligence (see below). ETL tools sit at the intersection of myriad source and target systems and act as a funnel to pull together and blend heterogeneous data into a consistent format and meaning and populate data warehouses.

Business Intelligence - TDWI uses the term "business intelligence" or BI as an umbrella term that encompasses ETL, data warehouses, reporting and analysis tools, and analytic applications. BI projects turn data into information, knowledge, and plans that drive profitable business decisions.
About the TDWI Report Series

The TDWI Report Series is designed to educate technical and business professionals about critical issues in BI. TDWI's in-depth reports offer objective, vendor-neutral research consisting of interviews with industry experts and a survey of BI professionals worldwide. TDWI in-depth reports are sponsored by vendors who collectively wish to evangelize a discipline within BI or an emerging approach or technology.
Scope, Methodology, and Demographics

Demographics

Position: Corporate IT professional (63%), systems integrator or external consultant (26%), business sponsor or business user (5%), other (6%).

Company Revenues: Less than $10 million (15%), $10 million to $100 million (18%), $100 million to $1 billion (25%), $1 billion to $10 billion (28%), $10 billion+ (15%).

Country: USA (62%), Canada (6%), Scandinavian countries (5%), India (4%), United Kingdom (3%), Mexico (2%), other (18%).

Industry: Respondents work in a wide range of industries, including financial services, manufacturing (non-computer), retail/wholesale/distribution, consulting/professional services, software/Internet, telecommunications, insurance, healthcare, education, federal government, state/local government, and other industries. (Bar chart shows the percentage of respondents by industry.)

Organization Structure: Corporate (51%), business unit/division (21%), department/functional group (21%), does not apply (6%).
Scope and Methodology

Report Scope. This report examines the current and future state of ETL tools. It describes how business requirements are fueling the creation of a new generation of products called data integration platforms. It examines the pros and cons of building versus buying ETL functionality and assesses the challenges and success factors involved with implementing ETL tools. It finishes by providing a list of evaluation criteria to assist organizations in selecting ETL products or validating current ones.

Methodology. The research for this report was conducted by interviewing industry experts, including consultants, industry analysts, and IT professionals, who have implemented ETL tools. The research is also based on a survey of 1,000+ business intelligence professionals that TDWI conducted in November 2002.

Survey Methodology. TDWI received 1,051 responses to the survey. Of these, TDWI qualified 741 respondents who had both deployed ETL functionality and were either IT professionals or consultants at end-user organizations. Branching logic accounts for the variation in the number of respondents to each question. For example, respondents who built ETL programs were asked a few different questions from those who bought vendor ETL tools. Multi-choice questions and rounding techniques account for totals that don't equal 100 percent.

Survey Demographics. Most respondents were corporate IT professionals who work at large U.S. companies in a range of industries. (See the Demographics breakouts above.)
Executive Summary: The Role of ETL in Business Intelligence
ETL is the heart and soul of business intelligence (BI). ETL processes bring together and combine data from multiple source systems into a data warehouse, enabling all users to work off a single, integrated set of data, a "single version of the truth." The result is an organization that no longer spins its wheels collecting data or arguing about whose data is correct, but one that uses information as a key process enabler and competitive weapon.

In these organizations, BI systems are no longer nice to have, but essential to success. These systems are no longer stand-alone and separate from operational processing; they are integrated with overall business processes. As a result, an effective BI environment based on integrated data enables users to make strategic, tactical, and operational decisions that drive the business on a daily basis.

Why ETL is Hard. According to most practitioners, ETL design and development work consumes 60 to 80 percent of an entire BI project. With such an inordinate amount of resources tied up in ETL work, it behooves BI teams to optimize this layer of their BI environment.

ETL is so time consuming because it involves the unenviable task of re-integrating the enterprise's data from scratch. Over the span of many years, organizations have allowed their business processes to "dis-integrate" into dozens or hundreds of local processes, each managed by a single fiefdom (e.g., departments, business units, divisions) with its own systems, data, and view of the world.

With the goal of achieving a single version of truth, business executives are appointing BI teams to "re-integrate" what has taken years or decades to undo. Equipped with ETL and modeling tools, BI teams are now expected to swoop in like conquering heroes and rescue the organization from information chaos. Obviously, the challenges and risks are daunting.

Mitigating Risk. ETL tools are perhaps the most critical instruments in a BI team's toolbox. Whether built or bought, a good ETL tool in the hands of an experienced ETL developer can speed deployment, minimize the impact of systems changes and new user requirements, and mitigate project risk. A weak ETL tool in the hands of an untrained developer can wreak havoc on BI project schedules and budgets.
ETL in Flux
Given the demands placed on ETL and the more prominent role that BI is playing in corporations, it is no wonder that this technology is now in a state of flux.

More Complete Solutions. Organizations are now pushing ETL vendors to deliver more complete BI "solutions." Primarily, this means handling additional back-end data management and processing responsibilities, such as providing data profiling, data cleansing, and enterprise meta data management utilities. A growing number of users also want BI vendors to deliver a complete solution that spans both back-end data management functions and front-end reporting and analysis applications.

Better Throughput and Scalability. They also want ETL tools to increase throughput and performance to handle exploding volumes of data and shrinking batch windows. Rather than refresh the entire data warehouse from scratch, they want ETL tools to capture and update changes that have occurred in source systems since the last load.

ETL: The Heart and Soul of BI
ETL Work Consumes 60 to 80 Percent of all BI Projects
ETL Tools Are the Most Important in a BI Toolbox
Users Seek Integrated Toolsets
ETL Tools Must Process More Data in Less Time
More Sources, Greater Complexity, Better Administration. ETL tools also need to handle a wider variety of source system data, including Web, XML, and packaged applications. To integrate these diverse data sets, ETL tools must also handle more complex mappings and transformations, and offer enhanced administration to improve reliability and speed deployments.

Near-Real-Time Data. Finally, ETL tools need to feed data warehouses more quickly with more up-to-date information. This is because batch processing windows are shrinking and business users want integrated data delivered on a timelier basis (i.e., the previous day, hour, or minute) so they can make critical operational decisions without delay.

Clearly, the market for ETL tools is changing and expanding. In response to user requirements, ETL vendors are transforming their products from single-purpose ETL products into multi-purpose data integration platforms. BI professionals need help understanding these market and technology changes as well as how to leverage the convergence of ETL with new technologies to optimize their BI architectures and ensure a healthy return on their ETL investments.
The Evolution of ETL
Framework and Components
Before we examine the future of ETL, we will define the major components in an ETL framework.

ETL stands for extract, transform, and load. That is, ETL programs periodically extract data from source systems, transform the data into a common format, and then load the data into the target data store, usually a data warehouse.

As an acronym, however, ETL only tells part of the story. ETL tools also commonly move or transport data between sources and targets, document how data elements change as they move between source and target (i.e., meta data), exchange this meta data with other applications as needed, and administer all run-time processes and operations (e.g., scheduling, error management, audit logs, and statistics). A more accurate acronym might be EMTDLEA!

Users Want Timelier Data
Do We Need a New Acronym?

The diagram in Illustration S1 depicts the major components involved in ETL processing. The following bullets describe each component in more detail; a brief code sketch after the list shows how the pieces fit together:
Illustration S1. The diagram shows the core components of an ETL product. Courtesy of Intelligent Business Strategies.
[Diagram: ETL Processing Framework, comprising source adapters and extract services (reading databases, files, and legacy applications), transform and load services, target adapters (writing to databases and files), transport services, a design manager, meta data import/export, runtime meta data services, a meta data repository, and administration and operations services.]
Design manager: Provides a graphical mapping environment that lets developers define source-to-target mappings, transformations, process flows, and jobs. The designs are stored in a meta data repository.
Meta data management: Provides a repository to define, document, and manage information (i.e., meta data) about the ETL design and runtime processes. The repository makes meta data available to the ETL engine at run time and to other applications.
Extract: Extracts source data using adapters, such as ODBC, native SQL formats, or flat file extractors. These adapters consult meta data to determine which data to extract and how.
Transform: ETL tools provide a library of transformation objects that let developers transform source data into target data structures and create summary tables to improve performance.
Load: ETL tools use target data adapters, such as SQL or native bulk loaders, to insert or modify data in target databases or files.
Transport services: ETL tools use network and file protocols (e.g., FTP) to move data between source and target systems and in-memory protocols (e.g., data caches) to move data between ETL run-time components.
Administration and operation: ETL utilities let administrators schedule, run, and monitor ETL jobs as well as log all events, manage errors, recover from failures, and reconcile outputs with source systems.
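To make the framework concrete, here is a minimal, hypothetical sketch of these stages in Python. It is not drawn from any particular vendor tool; the file names, target table, and transformation rules are assumptions invented purely for illustration.

```python
import csv
import sqlite3
from datetime import date

def extract(path):
    """Extract: read raw order records from a flat file (a stand-in for a source adapter)."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def transform(rows):
    """Transform: standardize formats and derive fields so data matches the target model."""
    out = []
    for r in rows:
        out.append({
            "order_id": int(r["order_id"]),
            "customer": r["customer"].strip().upper(),   # consistent formatting
            "amount_usd": round(float(r["amount"]), 2),   # common precision
            "load_date": date.today().isoformat(),        # audit column
        })
    return out

def load(rows, conn):
    """Load: insert transformed rows into the target table (a stand-in for a bulk loader)."""
    conn.execute("CREATE TABLE IF NOT EXISTS dw_orders "
                 "(order_id INT, customer TEXT, amount_usd REAL, load_date TEXT)")
    conn.executemany("INSERT INTO dw_orders VALUES "
                     "(:order_id, :customer, :amount_usd, :load_date)", rows)
    conn.commit()

if __name__ == "__main__":
    target = sqlite3.connect("warehouse.db")   # target data store (illustrative)
    rows = extract("orders.csv")               # source file path is an assumption
    load(transform(rows), target)
    print(f"Loaded {len(rows)} rows")          # minimal run-time logging/administration
```

Even in this toy form, the extract, transform, and load steps, the audit column, and the closing log line hint at why real products wrap these stages in meta data management and administration services.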
The components above come with most vendor-supplied ETL tools, but they can also be built by data warehouse developers. A majority of companies have a mix of packaged and homegrown ETL applications. In some cases, they use different tools in different projects, or they use custom code to augment the functionality of an ETL product. The pros and cons of building and buying ETL components are discussed later in the report. (See "Build or Buy?" below.)

A Quick History

Code Generation Tools

In the early 1990s, most organizations developed custom code to extract and transform data from operational systems and load it into data warehouses. In the mid-1990s, vendors recognized an opportunity and began shipping ETL tools designed to reduce or eliminate the labor-intensive process of writing custom ETL programs.

Early vendor ETL tools provided a graphical design environment that generated third-generation language (3GL) programs, such as COBOL. Although these early code-generation tools simplified ETL development work, they did little to automate the runtime environment or lessen code maintenance and change control work. Often, administrators had to manually distribute and manage compiled code, schedule and run jobs, or copy and transport files.
Engine-Based Tools
To automate more of the ETL process, vendors began delivering "engine-based" products in the mid to late 1990s that employed proprietary scripting languages running within an ETL or DBMS server. These ETL engines use language interpreters to process ETL workflows at runtime. The ETL workflows defined by developers in the graphical environment are stored in a meta data repository, which the engine reads at runtime to determine how to process incoming data.

ETL Engines Interpret Design Rules at Runtime

Although this interpretive approach more tightly unifies design and execution environments, it doesn't necessarily eliminate all custom coding and maintenance. To handle complex or unique requirements, developers often resort to coding custom routines and exits that the tool accesses at runtime. These user-developed routines and exits increase complexity and maintenance, and therefore should be used sparingly.
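As a rough, hypothetical illustration of this interpretive approach, the Python sketch below shows an "engine" reading a workflow definition (standing in for rules held in a meta data repository) and applying the named transformations at runtime. The rule names and record fields are invented for the example.

```python
# Hypothetical sketch: an ETL "engine" interprets a workflow definition at runtime.
workflow = {                       # stand-in for rules stored in a meta data repository
    "source_field_map": {"cust_nm": "customer_name", "amt": "amount"},
    "rules": ["trim", "uppercase_customer"],
}

RULES = {
    "trim": lambda rec: {k: v.strip() if isinstance(v, str) else v for k, v in rec.items()},
    "uppercase_customer": lambda rec: {**rec, "customer_name": rec["customer_name"].upper()},
}

def run_engine(records, wf):
    """Interpret the workflow: rename fields per the map, then apply each rule in order."""
    for rec in records:
        rec = {wf["source_field_map"].get(k, k): v for k, v in rec.items()}
        for rule_name in wf["rules"]:
            rec = RULES[rule_name](rec)        # look up and apply the rule at runtime
        yield rec

incoming = [{"cust_nm": "  acme corp ", "amt": 125.0}]
print(list(run_engine(incoming, workflow)))
# [{'customer_name': 'ACME CORP', 'amount': 125.0}]
```

Anything the rule library cannot express would have to be handled by a custom routine or exit, which is exactly the maintenance burden the text cautions against.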
All Processing Occurs in the Engine. Another significant characteristic of an engine-based approach is that all processing takes place in the engine, not on source systems (although hybrid architectures exist). The engine typically runs on a Windows or UNIX machine and establishes direct connections to source systems. If the source system is non-relational, administrators use a third-party gateway to establish a direct connection or create a flat file to feed the ETL engine.

Some engines support parallel processing, which enables them to process ETL workflows in parallel across multiple hardware processors. Others require administrators to manually define a parallel processing framework in advance (using partitions, for example), a much less dynamic environment.
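The following hypothetical Python sketch gives a loose sense of partition-based parallelism: a batch is split into fixed partitions and each partition is transformed on a separate worker process. The partitioning key and transformation are assumptions for illustration; real engines manage parallelism far more dynamically.

```python
from multiprocessing import Pool

def transform_partition(records):
    """Transform one partition of records; each worker process handles one partition."""
    return [{**r, "amount": round(r["amount"] * 1.07, 2)} for r in records]  # e.g., add 7% tax

def partition(records, n_partitions):
    """Statically assign records to partitions by hashing a key (here, customer id)."""
    parts = [[] for _ in range(n_partitions)]
    for r in records:
        parts[hash(r["customer_id"]) % n_partitions].append(r)
    return parts

if __name__ == "__main__":
    batch = [{"customer_id": i, "amount": 100.0 + i} for i in range(1000)]
    with Pool(processes=4) as pool:                       # 4 partitions, 4 workers
        results = pool.map(transform_partition, partition(batch, 4))
    transformed = [rec for part in results for rec in part]
    print(len(transformed), "records transformed in parallel")
```

Because the partition count is fixed up front, skewed data can leave some workers idle, which is the drawback of manually defined parallelism noted above.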
Code Generators versus Engines
There is still a debate about whether ETL engines or code generators, which have improved since their debut in the mid-1990s, offer the best functionality and performance.

Benefits of Code Generators. "The benefit of code generators is that they can handle more complex processing than their engine-based counterparts," says Steve Tracy, assistant director of information delivery and application strategy at Hartford Life Insurance in Simsbury, CT. "Many engine-based tools typically want to read a record, then write a record, which limits their ability to efficiently handle certain types of transformations."

Consequently, code generators also eliminate the need for developers to maintain user-written routines and exits to handle complex transformations in ETL workflows. This avoids creating "blind spots" in the tools and their meta data.

In addition, code generators produce compiled code to run on various platforms. Compiled code is not only fast, it also enables organizations to distribute processing across multiple platforms to optimize performance.
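To give a feel for what code generation means, the deliberately tiny, hypothetical Python sketch below emits a standalone transformation program from a declarative mapping. Real code generators target 3GLs such as COBOL or C and handle vastly more; the mapping spec and template here are illustrative assumptions only.

```python
# Hypothetical sketch: generate a standalone transform script from a mapping spec.
mapping = {
    "source": "orders.csv",
    "target": "orders_out.csv",
    "columns": {"order_id": "order_id", "cust_nm": "customer_name"},  # source -> target
}

TEMPLATE = '''import csv

with open("{source}", newline="") as src, open("{target}", "w", newline="") as dst:
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames={target_cols})
    writer.writeheader()
    for row in reader:
        writer.writerow({{ {assigns} }})
'''

def generate(spec):
    """Emit source code for a program that applies the column mapping."""
    assigns = ", ".join(f'"{t}": row["{s}"]' for s, t in spec["columns"].items())
    return TEMPLATE.format(source=spec["source"], target=spec["target"],
                           target_cols=list(spec["columns"].values()), assigns=assigns)

# The emitted program can be saved, deployed, and run on whichever platform is convenient.
print(generate(mapping))
```

The generated program carries no dependency on the design tool, which is why organizations can spread such code across many machines instead of sizing one large engine server.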
"Because the [code generation] product ran on multiple platforms, we did not have to buy a huge server," says Kevin Light, managing consultant of business intelligence solutions at EDS Canada. "Consequently, our clients' overall capital outlay [for a code generation product] was less than half what it would cost them to deploy a comparable engine-based product."

Benefits of Engines. Engines, on the other hand, concentrate all processing in a single server. Although this can become a bottleneck, administrators can optimally configure the engine platform to deliver high performance. Also, since all processing occurs on one machine, administrators can more easily monitor performance and quickly upgrade capacity if needed to meet service level agreements.

This approach also removes political obstacles that arise when ETL administrators try to distribute transformation code to run on various source systems. By offloading transformation processing from source systems, ETL engines interfere less with critical runtime operations.

Enhanced Graphical Development. In addition, most engine-based ETL products offer visual development environments that make them easier to use. These graphical workspaces enable developers to create ETL workflows comprised of multiple ETL objects for defining data mappings and transformations. (See Illustration S2.) Although code generators also offer visual development environments, they often aren't as easy to use as engine-based tools, lengthening overall development times, according to many practitioners.
ETL Engines Centralize Processing in a High-Performance Server
Code Generators Eliminate the Need to Maintain Custom Routines and Exits
Visual Development Environments Help Manage Complex Jobs
Database-Centric ETL
Today, several DBMS vendors embed ETL capabilities in their DBMS products (as well as OLAP and data mining capabilities). Since these vendors offer ETL capabilities at little or no extra charge, organizations are seriously exploring this option because it promises to reduce costs and simplify their BI environments.

In short, users are now asking, "Why should we purchase a third-party ETL tool when we can get ETL capabilities for free from our database vendor of choice? What are the additional benefits of buying a third-party ETL tool?"

To address these questions, organizations first need to understand the level and type of ETL processing that database vendors support. Today, there are three basic groups:

Cooperative ETL. Here, third-party ETL tools can leverage common database functionality to perform certain types of ETL processing. For example, third-party ETL tools can leverage stored procedures and enhanced SQL to perform transformations and aggregations in the database where appropriate. This enables third-party ETL tools to optimize performance by exploiting the optimization, parallel processing, and scalability features of the DBMS. It also improves recoverability since stored procedures are maintained in a common recoverable data store.
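A minimal sketch of this kind of pushdown appears below: rather than pulling raw rows into the ETL engine, the tool issues SQL so the aggregation runs inside the database. The schema and table are assumptions invented for the example, and SQLite is used only to keep the snippet self-contained.

```python
import sqlite3

# Illustrative source table; in practice this would live in the source or staging DBMS.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales (region TEXT, amount REAL)")
db.executemany("INSERT INTO sales VALUES (?, ?)",
               [("EAST", 100.0), ("EAST", 250.0), ("WEST", 75.0)])

# Pushdown: the GROUP BY aggregation executes inside the database engine,
# so the ETL tool only retrieves the already-summarized result set.
summary = db.execute(
    "SELECT region, SUM(amount) AS total_amount, COUNT(*) AS order_count "
    "FROM sales GROUP BY region"
).fetchall()

for region, total, count in summary:
    print(region, total, count)   # e.g., EAST 350.0 2 / WEST 75.0 1
```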
Complementary ETL. Some database vendors now offer ETL functions that mirror features offered by independent ETL vendors. For example, materialized views create summary tables that can be automatically updated when detailed data in one or more associated tables change. Summary tables can speed up query performance by several orders of magnitude. In addition, some DBMS vendors can issue SQL statements that interact with Web Services-based applications or messaging queues, which are useful when building and maintaining near-real-time data warehouses.
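The hypothetical sketch below imitates what a materialized view automates. SQLite (used only to keep the example self-contained) has no materialized views, so a trigger hand-maintains a summary table whenever detail rows arrive; the schema is invented for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE sales_detail (region TEXT, amount REAL);
CREATE TABLE sales_summary (region TEXT PRIMARY KEY, total REAL);

-- Imitate a materialized view: keep the summary in step with the detail table.
CREATE TRIGGER maintain_summary AFTER INSERT ON sales_detail
BEGIN
    INSERT OR IGNORE INTO sales_summary (region, total) VALUES (NEW.region, 0);
    UPDATE sales_summary SET total = total + NEW.amount WHERE region = NEW.region;
END;
""")

db.executemany("INSERT INTO sales_detail VALUES (?, ?)",
               [("EAST", 100.0), ("EAST", 250.0), ("WEST", 75.0)])

# Queries read the small summary table instead of scanning all detail rows.
print(db.execute("SELECT * FROM sales_summary ORDER BY region").fetchall())
# [('EAST', 350.0), ('WEST', 75.0)]
```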
Illustration S2. Many ETL tools provide a graphical interface for mapping sources to targets and managing complex workflows. [Screenshot: ETL visual mapping interface.]

Why Purchase an ETL Tool When Our Database Bundles One?

Competitive ETL. Most database vendors offer graphical development tools that exploit the ETL capabilities of their database products. These tools provide many of the features offered by third-party ETL tools, but at zero cost, or for a license fee that is a fraction of the price of independent tools. At present, however, database-centric ETL solutions vary considerably in quality and functionality. Third-party ETL tools still offer important advantages, such as the range of data sources supported, transformation power, and administration. Today, database-centric ETL products are useful for building departmental data marts, but this will change over time as DBMS vendors enhance their ETL capabilities.

In summary, database vendors currently offer ETL capabilities that both enhance and compete with independent ETL tools. We expect database vendors to continue to enhance their ETL capabilities and compete aggressively to increase their share of the ETL market.
Data Integration Platforms
During the past several years, business requirements for BI projects have expanded dramatically, placing new demands on ETL tools. These requirements are outlined in the following section (see "ETL Trends and Requirements"). Consequently, ETL vendors have begun to dramatically change their products to meet these requirements. The result is a new-generation ETL tool that TDWI calls a data integration platform.

As a platform, this emerging generation of ETL tools provides greater performance, throughput, and scalability to process larger volumes of data at higher speeds. To deal with shrinking batch windows, these platforms also load data warehouses more quickly and reliably, using change data capture techniques, continuous processing, and improved runtime operations. The platforms also provide expanded functionality, especially in the areas of data quality, transformation power, and administration.

As a data integration hub, these products connect to a broader array of databases, systems, and applications as well as other integration hubs. They capture and process data in batch or real time using either a hub-and-spoke or peer-to-peer information delivery architecture. They coordinate and exchange meta data among heterogeneous systems to deliver a highly integrated environment that is easy to use and adapts well to change.

Solution Sets. Ultimately, data integration platforms are designed to meet a larger share of an organization's BI requirements. To do this, some ETL vendors are extending their product lines horizontally, adding data quality tools and near-real-time data capture facilities to provide a complete data management solution. Others are extending vertically, adding analytical tools and applications to provide a complete BI solution.
ETL versus EAI
To be clear, ETL tools are just one of several technologies vying to become the preeminent enterprise data integration platform. One of the most important technologies is enterprise application integration (EAI) software, such as BEA Systems, Inc.'s WebLogic Platform and Tibco Software Inc.'s ActiveEnterprise.

EAI software enables developers to create real-time, event-driven interfaces among disparate transaction systems. For example, during the Internet boom, companies flocked to EAI tools to connect e-commerce systems with back-end inventory and shipping systems to ensure that Web sites accurately reflected product availability and delivery times.

Although EAI software is event-driven and supports transaction-style processing, it does not generally have the same transformation power as an ETL tool. On the other hand, most ETL tools do not support real-time data processing. To overcome these limitations, some ETL and EAI vendors are now partnering to provide the best of both approaches. EAI software captures data and application events in real time and passes them to the ETL tool, which transforms the data and loads it into the BI environment.

Database Vendors Both Enhance and Compete with ETL Vendors
Higher Performance, Complete Solution
ETL Vendors Are Expanding Vertically and Horizontally
Some ETL and EAI Vendors Are Partnering...
Already, some ETL vendors have begun incorporating EAI functionality into their core engines. These tools contain adapters to operational applications and operate in an "always awake" mode so they can receive and process data and events generated by the applications in real time.
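The hypothetical Python sketch below gives a rough picture of that "always awake" pattern: a long-running listener consumes application events as they arrive, transforms them, and emits warehouse-ready rows. The in-process queue, event shape, and transformation are stand-ins for illustration; real products rely on EAI adapters and message brokers rather than a local queue.

```python
import queue
import threading
import time

events = queue.Queue()          # stand-in for an EAI message bus / application adapter

def etl_listener(stop):
    """Run continuously ("always awake"): pull events as they arrive, transform, and load."""
    while not stop.is_set():
        try:
            evt = events.get(timeout=0.5)
        except queue.Empty:
            continue                                  # nothing yet; keep listening
        row = {"order_id": evt["id"], "amount_usd": round(evt["amt"], 2)}   # transform
        print("loaded into warehouse feed:", row)                           # load (illustrative)

stop = threading.Event()
listener = threading.Thread(target=etl_listener, args=(stop,), daemon=True)
listener.start()

# An operational application emits events in real time.
for i in range(3):
    events.put({"id": i, "amt": 19.99 * (i + 1)})
    time.sleep(0.2)

time.sleep(1)
stop.set()
```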
Future Reality. It won't be long before all ETL and EAI vendors follow suit, integrating both ETL and EAI capabilities within a single product. The winners in this race will have a substantial advantage in vying for delivery of a complete, robust data integration platform.
ETL Trends and Requirements
As mentioned earlier, business requirements are fueling the evolution of ETL into data integration platforms. Business users are demanding access to more timely, detailed, and relevant information to inform critical business decisions and drive fast-moving business processes.

Large Volumes of Data

More business users today want access to more granular data (e.g., transactions) and more years of history across more subject areas than ever before. Not surprisingly, data warehouses are exploding in size and scope. Terabyte data warehouses, a rarity several years ago, are now fairly commonplace. The largest data warehouses exceed 50 terabytes of raw data, and the size of these data warehouses is putting enormous pressure on ETL administrators to speed up ETL processing.

Our survey indicates that the percentage of organizations loading more than 500 gigabytes will triple in 18 months, while those loading less than 1 gigabyte will decrease by a third. (See Illustration 1.)

Diverse Data Sources

One reason for increased data volumes is that business users want the data warehouse to cull data from a wider variety of systems. According to our survey, on average, organizations now extract data from 12 distinct data sources. This average will inexorably increase over time as organizations expand their data warehouses to support more subject areas and groups in the organization.

Although almost all companies use ETL to extract data from relational databases, flat files, and legacy systems, a significant percentage now want to extract data from application packages, such as SAP R/3 (39 percent), XML files (15 percent), Web-based data sources (15 percent), and EAI software (12 percent). (See Illustration 2.)
Spreadmarts. In addition, many respondents also wrote in the "other" category that they want ETL tools to extract data from Microsoft Excel and Access files. In many companies, critical data is locked up in these personal data stores, which are controlled by executives or managers running different divisions or departments. Each group populates these data stores from different sources at different times using different rules and definitions. As a result, these spreadsheets become surrogate independent data marts, or "spreadmarts" as TDWI calls them. The proliferation of spreadmarts precipitates organizational feuds (dueling spreadsheets) that can paralyze an organization and drive a CEO to the brink of insanity.

Illustration 1. The average data load will increase significantly in the next 18 months. Based on 756 respondents.

Eighteen-Month Plans for Data Load Volumes
                       TODAY    18 MONTHS
Less than 1GB           59%       40%
1GB to 500GB            38%       50%
500GB+                   3%       10%

ETL and EAI Will Converge
More History, More Detail, More Subject Areas
On Average, Data Warehouses Cull Data from 12 Distinct Sources
Excel Spreadsheets Are a Critical Data Source
Shrinking Batch Windows
With the expansion in data volumes, many organizations are finding it virtually impossible to load a data warehouse in a single batch window. There are no longer enough hours at night or on the weekend to finish loading data warehouses containing hundreds of gigabytes or terabytes of data.

"24x7" Requirements. Compounding the problem, batch windows are shrinking or becoming non-existent. This is especially true in large organizations whose data warehouses and operational systems span regions or continents and must be available around the clock. There is no longer any systems "downtime" to execute time consuming extracts and loads.

Variable Update Cycles. Finally, the growing number and diversity of sources that feed the data warehouse make it difficult to coordinate a single batch load. Each source system operates on a different schedule and each contains data with different levels of "freshness" or relevancy for different groups of users. For example, retail purchasing managers find little value in sales data if they can't view it within one day, if not immediately. Accountants, however, may want to examine general ledger data at the end of the month.

Operational Decision Making

Another reason that ETL tools must support continuous extract/load processing is that users want timelier data. Data warehouses traditionally support strategic or tactical decision making based on historical data compiled over several months or years. Typical strategic or tactical questions are:

What are our year-over-year trends?
How close are we to meeting this month's revenue goals?
If we reduce prices, what impact will it have on our sales and inventory for the next three months?
Illustration 2. Types of data sources that ETL programs process. Multi-choice question, based on 755 respondents.
[Bar chart, "Data Sources": relational databases, flat files, and mainframe/legacy systems top the list, followed by packaged applications (39 percent), XML files (15 percent), Web-based sources (15 percent), and EAI/messaging software (12 percent); replication or change data capture utilities and other sources make up the remainder.]
More Data, Less Time
Data Has Different Expiration Dates
Users Want Timelier Data
Near-Real-Time Loads Will Triple in 18 Months
Users Want ETL Vendors to Offer Data Quality Tools

Operational BI. However, business users increasingly want to make operational decisions based on yesterday's or today's data. They are now asking:

Based on yesterday's sales, which products should we ship to which stores at what prices to maximize today's revenues?
What is the most profitable way to reroute packages off a truck that broke down this morning?
What products and discounts should I offer to the customer that I'm currently speaking with on the phone?
According to our survey, more than two-thirds of organizations already load their data warehouses on a nightly basis. (See Illustration 3.) However, in the next 18 months, the number of organizations that will load their data warehouses multiple times a day will double, and the number that will load data warehouses in near real time will triple!

To meet operational decision-making requirements, ETL processing will need to support two types of processing: (1) large batch ETL jobs that involve extracting, transforming, and loading a significant amount of data, and (2) continuous processing that captures data and application events from source systems and loads them into data warehouses in rapid intervals or near real time.

Most organizations will need to merge these two types of processing, which is complicated because of the numerous interdependencies that administrators must manage to ensure consistency of information in the data warehouse and reports.
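To make the two processing styles concrete, the hypothetical Python sketch below contrasts a full batch reload with a simple incremental pass that picks up only rows changed since the last run, the basic idea behind change data capture. The tables, timestamp column, and watermark logic are assumptions for illustration; production change data capture typically reads database logs rather than timestamps.

```python
import sqlite3

src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE orders (id INT, amount REAL, updated_at TEXT)")
src.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(1, 100.0, "2024-01-01T00:00"), (2, 200.0, "2024-01-02T12:00")])

dw = sqlite3.connect(":memory:")
dw.execute("CREATE TABLE dw_orders (id INT, amount REAL, updated_at TEXT)")

def full_batch_load():
    """(1) Large batch job: reload everything from the source."""
    rows = src.execute("SELECT * FROM orders").fetchall()
    dw.execute("DELETE FROM dw_orders")
    dw.executemany("INSERT INTO dw_orders VALUES (?, ?, ?)", rows)

def incremental_load(last_watermark):
    """(2) Continuous/near-real-time style: load only rows changed since the last run."""
    rows = src.execute(
        "SELECT * FROM orders WHERE updated_at > ?", (last_watermark,)).fetchall()
    dw.executemany("INSERT INTO dw_orders VALUES (?, ?, ?)", rows)
    return max((r[2] for r in rows), default=last_watermark)   # new watermark

full_batch_load()
watermark = "2024-01-02T12:00"
src.execute("INSERT INTO orders VALUES (3, 50.0, '2024-01-03T08:00')")   # new activity
watermark = incremental_load(watermark)
print(dw.execute("SELECT COUNT(*) FROM dw_orders").fetchone()[0], "rows;",
      "watermark =", watermark)   # 3 rows; watermark = 2024-01-03T08:00
```

Keeping the batch and incremental paths consistent (same rules, no double-loading, correct watermarks) is precisely the interdependency-management problem the text describes.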
Data Quality Add-Ons
Many organizations want to purchase complete solutions, not technology or tools for building infrastructure and applications.

Add-on Products. When asked which add-on products from ETL vendors are "very important," users expressed significant interest in data cleansing and profiling tools. (See Illustration 4.) This is not surprising since many BI projects fail because the IT department or (more than likely) an outside consultancy underestimates how dirty the source data is.

Code, Load, and Explode. A common scenario is the "code, load, and explode" phenomenon. This is where ETL developers code the extracts and transformations, then start processing only to discover an unacceptably large number of errors due to unanticipated values in the source data files. They fix the errors, rerun the ETL process, only to find more errors, and so on. This ugly scenario repeats itself until the project deadlines and budgets become imperiled and angry business sponsors halt the project.
Illustration 3. Based on 754 respondents.

Data Warehouse Load Frequency
                           TODAY    IN 18 MONTHS
Monthly                     32%         27%
Weekly                      34%         29%
Daily/nightly               69%         65%
Multiple times per day      15%         30%
Near real time               6%         19%

"In my experience, 10 percent [of the time and cost associated with ETL] is learning the ETL tool the first time out and making it conform to your will," said John Murphy, an IT professional in a newsgroup posting. "Ninety percent is the interminable iterative process of learning just how little you know about the source data, how dirty that data really is, how difficult it is to establish a cleansing policy that results in certifiably correct data on the target that matches the source, and how long it takes to re-work all the assumptions you made at the outset while you were still an innocent."

The problem with current ETL tools is that they assume the data they receive is clean and consistent. They can't assess the consistency or accuracy of source data and they can't handle specialized cleansing routines, such as name and address scrubbing. To meet these needs, ETL vendors are partnering with or acquiring data quality vendors to integrate specialized data cleansing routines within ETL workflows.
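Data profiling, one of the add-ons users ask for, is straightforward to sketch: scan a column and report nulls, distinct values, and out-of-pattern entries before any ETL code is written. The minimal, hypothetical Python example below profiles a customer state column; the field name and pattern rule are assumptions made up for illustration.

```python
import re
from collections import Counter

# Sample source records (illustrative); a real profiler would scan the actual source table.
records = [
    {"customer_id": 1, "state": "CA"},
    {"customer_id": 2, "state": "ca"},
    {"customer_id": 3, "state": None},
    {"customer_id": 4, "state": "Calif."},
    {"customer_id": 5, "state": "NY"},
]

def profile_column(rows, field, pattern=r"^[A-Z]{2}$"):
    """Report null count, distinct values, and values that violate the expected pattern."""
    values = [r[field] for r in rows]
    nulls = sum(v is None for v in values)
    non_null = [v for v in values if v is not None]
    bad = [v for v in non_null if not re.match(pattern, v)]
    return {
        "rows": len(values),
        "nulls": nulls,
        "distinct": len(set(non_null)),
        "top_values": Counter(non_null).most_common(3),
        "pattern_violations": bad,        # e.g., 'ca', 'Calif.' need cleansing rules
    }

print(profile_column(records, "state"))
```

Profiling results like these are what turn "code, load, and explode" into known cleansing requirements before the first load runs.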
Meta Data Management
As the number and variety of source and target systems proliferate, BI managers are seeking automated methods to document and manage the interdependencies among data elements in the disparate systems. In short, they want a global meta data management solution.

As the hub of a BI project, an ETL tool is well positioned to document, manage, and coordinate information about data within data modeling tools, source systems, data warehouses, data marts, analytical tools, portals, and even other ETL tools and repositories. Many users would like to see ETL tools play a more pivotal role in managing global meta data, although others think ETL tools will always be inherently unfit to play the role of global traffic cop.
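One reason this meta data matters is impact analysis: given lineage meta data describing how warehouse elements derive from source fields, a tool can answer "what breaks if this source field changes?" The toy Python sketch below walks such a lineage graph; the element names and mappings are invented for the example.

```python
# Hypothetical lineage meta data: each target element lists the elements it derives from.
lineage = {
    "staging.orders.amount":    ["erp.order_line.amt"],
    "dw.fact_sales.amount_usd": ["staging.orders.amount", "ref.fx_rates.rate"],
    "report.revenue_by_region": ["dw.fact_sales.amount_usd", "dw.dim_store.region"],
}

def impact_of(source_element):
    """Return every downstream element affected if the given element changes."""
    affected, frontier = set(), {source_element}
    while frontier:
        nxt = {tgt for tgt, srcs in lineage.items() if frontier & set(srcs)}
        nxt -= affected
        affected |= nxt
        frontier = nxt
    return sorted(affected)

print(impact_of("erp.order_line.amt"))
# ['dw.fact_sales.amount_usd', 'report.revenue_by_region', 'staging.orders.amount']
```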
"Certainly, ETL vendors should store meta data for their own domain, but if they start trying to become a global meta data repository for all BI tools, they will have trouble," says Tracy from Hartford Life. "I'd prefer they focus on improving their performance, manageability, impact analysis reports, and ability to support complex logic."

Enforce Data Consistency. Nevertheless, large companies want a global meta data repository to manage and distribute "standard" data definitions, rules, and other elements within and throughout a network of interconnected BI environments. This global repository could help ensure data consistency and a "single version of the truth" throughout an organization.

Although ETL vendors often pitch their global meta data management capabilities, most tools today fall short of user expectations. (And even single-vendor all-in-one BI suites fall short.) When asked in an open-ended question, "What would make your ETL tool better?" many survey respondents wrote "better meta data management" or "enhanced meta data integration."
Illustration 4. A significant percentage of users want ETL tools to offer data cleansing and profiling capabilities. Based on 740 respondents.
[Bar chart, "Rating the Importance of Add-On Products": data cleansing and data profiling tools rank as the most important add-ons, well ahead of analytic tools, packaged data models, and packaged analytic reports.]

Lack of Knowledge about Source Data Cripples Projects
ETL Tools Assume the Data Is Clean
Users Want ETL Tools to Play a Pivotal Role in Meta Data
Most ETL Tools Fall Short of Expectations
Packaged Solutions Can Speed Deployment

"I don't know if ETL tools should be used for a [BI] meta data repository, but no one else is stepping up to the plate," says EDS Canada's Light. Light says the gap between expectation and reality sometimes puts him as a consultant in a tough position. "Many clients think the ETL tools can do it all, and we [as consultants] end up as the meat in a sandwich trying to force a tool to do something it can't."
Packaged Solutions
Besides add-on products, an increasing number of organizations are looking to buy comprehensive, off-the-shelf packages that provide near instant insight to their data. For instance, more than a third of respondents (39 percent) said they are more attracted to vendors that offer ETL as part of a complete application or analytic suite. (See Illustration 5.)

Kerr-McGee Corporation, an energy and chemicals firm in Oklahoma City, purchased a packaged solution to support its Oracle Financials application. "We had valid numbers within a week, which is incredible," says Brian Morris, a data warehousing specialist at the firm. "We don't have a lot of time so anything that is pre-built is helpful."

A packaged approach excels, as in Kerr-McGee's case, when there is a single source, which is a packaged application itself with a documented data model and API. The package Kerr-McGee used also supplied analytic reports, which the firm did not use since it already had reports defined in another reporting tool.

Some Skepticism. However, there is skepticism among some users about ETL vendors who want to deliver end-to-end BI packages. "We want ETL vendors to focus on their core competency, not their brand," says Cynthia Connolly, vice president of application development at Alliance Capital Management. "We want them to enhance their ETL products, not branch off into new areas."
Enterprise Infrastructure
Multi-Faceted Usage. Finally, organizations also want to broaden their use of ETL to support additional applications besides BI. (See Illustration 6.) In essence, they want to standardize on a single infrastructure product to support all their enterprise data integration requirements.

A majority of organizations already use ETL tools for BI, data migration, and conversion projects. But a growing number will use ETL tools to support application integration and master reference data projects in the next 18 months.

Illustration 5. Analytic suites. Based on 747 respondents.
Analytic Suites: Does a vendor's ability to offer ETL as part of a complete analytic suite make the vendor and its products more attractive to your team? Yes (39%), No (43%), Not sure (18%)

Organizations Want a Standard Data Integration Infrastructure
SUMMARY
As data warehouses amass more data from more sources, BI teams need ETL tools to process data more quickly, efficiently, and reliably. They also need ETL tools to assume a larger role in coordinating the flow of data among source and target systems and managing data consistency and integrity. Finally, many want ETL vendors to provide more complete solutions, either to meet an organization's enterprise infrastructure needs or deliver an end-to-end BI solution.
Build or Buy?
First Decision. As BI teams examine their data integration needs, the first decision they need to make is whether to build or buy an ETL tool. Despite the ever-improving functionality offered by vendor ETL tools, there are still many organizations that believe it is better to hand write ETL programs than use off-the-shelf ETL software. These companies point to the high cost of many ETL tools and the abundance of programmers on their staff to justify their decision.

Strong Market to "Buy." Nonetheless, the vast majority of BI teams (82 percent) have purchased a vendor ETL tool, according to our survey. Overall, 45 percent use vendor tools exclusively, while 37 percent use a combination of tools and custom ETL code. Only 18 percent exclusively write ETL programs from scratch. (See Illustration 7.)
Illustration 6. Besides BI, organizations want to use ETL products for a variety of data integration needs. Based on 740 responses.
[Bar chart, "Current and Planned Uses of ETL," comparing usage today and in 18 months: data warehousing/business intelligence and data migration/conversion dominate today, while application integration and reference data integration (e.g., customer hubs) grow substantially over the next 18 months.]

Illustration 7. The vast majority of organizations building data warehouses have purchased an ETL tool from a vendor. Based on 761 responses.
Build or Buy? Buy only (45%), Build & buy (37%), Build only (18%)

Most Companies Buy ETL Tools
Why Buy?
Maintaining Custom Code

The primary reason that organizations purchase vendor ETL tools is to minimize the time and cost of developing and maintaining proprietary code.

"We need to expand the scale and scope of our data warehouse and we can't do it by hand-cranking code," says Richard Bowles, data warehouse developer at Safeway Stores plc in London. Safeway is currently evaluating vendor ETL tools in preparation for the expansion of its data warehouse.

The problem with Safeway's custom COBOL programs, according to Bowles, is that it takes too much time to test each program whenever a change is made to a source system or target data warehouse. First, you need to analyze which programs are affected by a change, then check business rules, rewrite the code, update the COBOL copybooks, and finally test and debug the revised programs, he says.

Custom code equals high maintenance. "We want our team focused on building new routines to expand the data warehouse, not preoccupied with maintaining existing ones," Bowles says.

Tools that Speed Deployment Save Money. Many BI teams are confident that vendor ETL tools can better help them deliver projects on time and within budget. These managers ranked "speed of deployment" as the number one reason they decided to purchase an ETL tool, according to our survey. (See Table 1.)

Ken Kirchner, a data warehousing manager at Werner Enterprises, Inc., in Omaha, NE, is using an ETL tool to replace a team of outside consultants that botched the firm's first attempt at building a data warehouse.

"Our consultants radically underestimated what needed to be done. It took them 15 months to code 130 stored procedures," says Kirchner. "A [vendor ETL] tool will reduce the number of people, time, and money we need to deliver our data warehouse."

EDS Canada's Light says in many situations, clients have older ETL environments where the business rules are encapsulated in COBOL or other 3GLs, making it impossible to do impact analysis. Moving to an engine-based tool allows the business rules to be maintained in a single location, which facilitates maintenance.
Table 1. Ranking of reasons for buying an ETL tool.

Reasons for Buying an ETL Tool - Ranking (by ranking score)
1. To deploy ETL more quickly
2. It had all the functionality we need
3. Help us manage meta data
4. Build a data integration infrastructure

Custom Code Is Laborious to Modify
Why Build?
Cheaper and Faster. However, some organizations believe it is cheaper and quicker to code ETL programs than use a vendor ETL tool. (See Table 2.)

"Custom software is what saves us time and money, even in maintenance, because it's geared to our business," says Mike Galbraith, director of Global Business Systems Development at Tyco Electronics in Harrisburg, PA.

The key for ETL programs, or any software development project, is to write good code, Galbraith says. "We use software engineering techniques. We write our code from specifications and a meta data model. Everything is self documenting and that has saved us a lot."

Hubert Goodman, director of business intelligence at Cummins, Inc., a leading maker of diesel engines, says the cost to maintain his group's custom ETL code is less than the annual maintenance fee and training costs that another department is paying to use a vendor ETL tool.

Goodman's team uses object-oriented techniques to write efficient, maintainable PL/SQL code. To keep costs down and speed turnaround times, he outsources ETL code development overseas to programmers in India. (Some practitioners say it is impractical to outsource GUI-based ETL development because of the interactive nature of such tools.)

The Hard Job of ETL. Several BI managers also said vendor ETL tools don't address the really challenging aspects of migrating source data into a data warehouse, specifically how to identify and clean dirty data, build interfaces to legacy systems, and deliver near-real-time data. For these managers, the math doesn't add up: why spend $200,000 or more to purchase an ETL tool that only handles the "easy" work of mapping and transforming data?
Build and Buy
Slow Migration to ETL Packages. Despite these arguments, there is a slow and steady migration away from writing custom ETL programs. More than a quarter (26 percent) of organizations that use custom code for ETL will soon replace it with a vendor ETL tool. Another 23 percent are planning to augment their custom code with a vendor ETL tool. (See Illustration 8.)

When asked why they are switching, many BI managers said that vendor ETL tools "didn't exist" when they first implemented their data warehouse. Others said they initially couldn't afford a vendor ETL tool or didn't have time to investigate new tools or train team members.
Typically, these teams leave existing hand code in place and use the ETL package to support a new project or source system. Over time, they displace custom code with the vendor ETL tool as their team gains experience with the tool and the tool proves it can deliver adequate performance and functionality.

Table 2. Ranking of reasons for coding ETL programs. Based on respondents who code ETL exclusively.

Reasons for Coding ETL Programs - Ranking (by ranking score)
1. Less expensive
2. Our culture encourages in-house development
3. Build a data integration infrastructure
4. Deploy more quickly

Cummins Outsources ETL Coding Overseas to Speed Development
Firms Are Gradually Replacing Home Grown Code
Mix and Match. Som e team s, how ever, dont plan to abandon custom code. Their strategy is
to use w hatever approach m akes sense in each situation. These team s often use custom code
to handle processes that vendor ETL tools dont readily support out of the box.
For exam ple, Preetam Basil, a Priceline.com data w arehousing architect, points out that
Priceline.com uses a vendor ETL tool for the bulk of its ETL processing, but it has developed
U N IX and SQ L code to extract and scrub data from its volum inous W eb logs.
User Satisfaction with ETL Tools
Overall, most BI teams are generally pleased with their ETL tools. According to our survey, 28 percent of data warehousing teams are "very satisfied" with their vendor ETL tool, and 51 percent are "mostly satisfied." Vendor ETL tools get a slightly higher satisfaction rating than hand-coded ETL programs. (See Illustration 9.)
A majority of organizations (52 percent) plan to upgrade to the next version of their ETL tool within 18 months, a sign that the team is committed to the product. Another 22 percent will maintain the product "as is" during the period. A small fraction plan to replace their vendor ETL tool with another vendor ETL tool (7 percent) or custom code (2 percent). (See Illustration 10.)
Illustration 8. Eighteen-month plans for custom ETL code: continue to enhance the custom code (43 percent), replace it with a vendor ETL tool (26 percent), augment it with a vendor ETL tool (23 percent), other (7 percent), or outsource it to a third party (2 percent). Almost half (49 percent) of respondents plan to replace or augment their custom code with a vendor ETL tool. Based on 415 respondents.
Illustration 9. Satisfaction rating: vendor tools versus custom code. Vendor ETL tools have a higher satisfaction rating than custom ETL programs for organizations that use either vendor ETL tools or custom code exclusively. Based on 341 and 134 responses, respectively. (The chart compares the share of each group that is very satisfied, mostly satisfied, somewhat satisfied, and not very satisfied.)
Very few organizations have abandoned vendor ETL tools. Most have purchased one tool and continue to use it in a production environment. (See Illustrations 11a and 11b.) In other words, there is very little ETL "shelfware."
Challenges in Deploying ETL
Despite this rosy picture, BI teams encounter numerous challenges when deploying vendor ETL tools. The top two challenges are "ensuring adequate data quality" and "understanding source data," according to our survey. (See Table 3.) Although these problems are not necessarily a function of the ETL tool or code, they do represent an opportunity for ETL vendors to expand the capabilities of their toolset to meet these needs.
Complex Transformations. The next most significant challenge is designing complex transformations and mappings. "The goal of [an ETL] tool is to minimize the amount of code you have to write outside of the tool to handle transformations," says Alliance Capital Management's Connolly.
Unfortunately, most ETL tools fall short in this area. "We wish there was more flexibility in writing transformations," says Brian Koscinski, ETL project lead at American Eagle Outfitters in Warrendale, PA. "We often need to drop a file into a temporary area and use our own code to sort the data the way we want it before we can load it into our data warehouse."
Illustration 11a. ETL products in production: none (6 percent), one (60 percent), two (25 percent), three (6 percent), four or more (3 percent). Most BI teams use only one ETL tool. Based on 617 responses.
Illustration 11b. ETL shelfware ("How many vendor ETL tools has your team purchased that it does not use?"): none (81 percent), one (15 percent), two (4 percent), three (1 percent). Very few ETL tools go unused. Based on 617 responses.
Illustration 10. Eighteen-month plans for your vendor ETL tool: upgrade to the next or latest version (52 percent), maintain as is (22 percent), augment it with custom code or functions (11 percent), replace it with a tool from another vendor (7 percent), other (4 percent), augment it with a tool from another vendor (3 percent), replace it with hand-written code (2 percent), or outsource it to a third party (0 percent). Most teams are committed to their current vendor ETL tool. Based on 622 responses.
According to our survey, only a small percentage of teams (11 percent) are currently extending vendor ETL tools with custom code. The rest are using a "reasonable amount" of custom code (or none at all) to augment their vendor ETL tools.
Skilled ETL Developers. Another significant challenge, articulated by many survey respondents and users interviewed for this report, is finding and training skilled ETL programmers. Many say the skills of ETL developers and their familiarity with the tools can seriously affect the outcome of a project.
Priceline.com's Basil agrees. "Initially, we didn't have ETL developers who were knowledgeable about data warehousing, database design, or ETL processes. As a result, they built inefficient processes that took forever to load."
EDS Canada's Light says it's imperative to hire experienced ETL developers at the outset of a project, no matter what the cost, to establish development standards. "Otherwise, the problems encountered down the line will be much larger and harder to grapple with." Others add that it's wise to hire at least two ETL developers in case one leaves in midstream.
Most BI managers are frustrated by the time it takes developers to learn a new tool (about three months) and then become proficient with it (12 or more months). Although the skill and experience of a developer counts heavily here, a good ETL development environment, along with high-quality training and support offered by the vendor, can accelerate the learning curve.
The Temptation to Code. Developers often complain that it takes much longer to learn the ETL tool and manipulate transformation objects than to simply code the transformations from scratch. And many give in to temptation. But perseverance pays off. Developers are five to six times more productive once they learn an ETL tool compared to using a command-line interface, according to Pieter Mimno.
Table 3. Ranking of ETL challenges (by ranking score). Data quality issues are the top two challenges facing ETL developers. Challenges ranked by respondents include ensuring adequate data quality; understanding source data; creating complex mappings; creating complex transformations; ensuring adequate performance; providing access to meta data; finding skilled ETL programmers; collecting and maintaining meta data; ensuring adequate scalability; loading data in near real time; ensuring adequate reliability; integrating with third-party tools or applications; creating and integrating user-defined functions; integrating with load utilities; and debugging programs.
Pricing
Sticker Shock. Perhaps the biggest hurdle with vendor ETL tools is their price tag. Many data warehousing teams experience sticker shock when vendor ETL salespeople quote license fees. Many teams weigh the perceived value of a tool against its list price, which often exceeds $200,000, and walk away from the deal.
"For $200,000 you can buy at least 4,000 man hours of development, and the tool's ongoing maintenance fee pays for one-half of a head count," says data warehousing consultant Darrell Piatt.
The price of ETL tools is a difficult pill to swallow for companies that don't realize that a data warehouse is different from a transaction system, says Bryan LaPlante, senior consultant with Pragmatek, a Minneapolis consultancy that works with mid-market companies. These companies don't realize that an ETL tool makes it easier to keep up with the continuous flow of changes that are part and parcel of every data warehousing program, he says.
But even companies that understand the importance of ETL tools find it difficult to open their checkbooks. "Although ETL tools are much more functional than they were five years ago, most are still not worth buying at list price," says Piatt.
Spread the Wealth. To justify the steep costs, many teams sell the tool internally as an infrastructure component that can support multiple projects. "The initial purchase is expensive, but it is easier to justify when you use it for multiple projects," says Heath Hatchett, group manager of business intelligence engineering at Intuit in Mountain View, CA. Others negotiate with vendors until they get the price they want. "It's a buyers' market right now," says Safeway's Bowles.
Given the current economy and the increasing maturity of the ETL market, the list prices for ETL products are starting to fall. The downward pressure on prices will continue for the next few years as software giants bundle robust ETL programs into core data management products, sometimes at no extra cost, and BI vendors package ETL tools inside of comprehensive BI suites. In addition, a few start-ups now offer small-footprint products at extremely affordable prices.
Recommendations
Every BI project team has different levels and types of resources to meet different business requirements. Thus, there is no right or wrong decision about whether to build or buy ETL capabilities. But here are some general recommendations, based on our research, to guide your decision.
Buy If You
Want to Minimize Hand Coding. ETL tools enable developers to create ETL workflows and objects using a graphical workbench that minimizes hand coding and the problems it can create.
Want to Speed Project Deployment. In the hands of experienced ETL programmers, an ETL tool can boost productivity and shorten deployment time significantly. (However, be careful: these productivity gains don't happen until ETL programmers have worked with a tool for at least a year.)
Want to Maintain ETL Rules Transparently. Rather than bury ETL rules within code, ETL tools store target-to-source mappings and transformations in a meta data repository accessible via the tool's GUI and/or an API. This reduces downstream maintenance costs and insulates the team when key ETL programmers leave.
Have Unique Requirements the Tool Supports. If the tool supports your unique requirements, such as providing adapters to a packaged application your company just purchased, it may be worthwhile to invest in a vendor ETL tool.
Need to Standardize Your BI Architecture. Purchasing a scalable ETL tool with sufficient functionality is a good way to enable your organization to establish a standard infrastructure for BI and other data management projects. Reusing the ETL tool in multiple projects helps to justify purchasing the tool.
Build If You
Have Skilled Programmers Available. Building ETL programs makes sense when your IT department has skilled programmers available who understand BI and are not bogged down delivering other projects.
Practice Sound Software Engineering Techniques. The key to coding ETL is writing efficient code that is designed from a common meta model and rigorously documented.
Have a Solid Relationship with a Consulting Firm. Consultancies that you've worked with successfully in the past and that understand data warehousing can efficiently write and document ETL programs. (Just make sure they teach you the code before they leave!)
Have Unique Functionality. If you have a significant number of complex data sources or transformations that ETL tools can't support out of the box, then coding ETL programs is a good option.
Have Existing Code. If you can reuse existing, high-quality code in a new project, then it may be wiser to continue to write code.
Data Integration Platforms
As mentioned earlier, many vendors are extending their ETL products to meet new user requirements. The result is a new generation of ETL products that TDWI calls data integration platforms. These products extend ETL tools with a variety of new capabilities, including data cleansing, data profiling, advanced data capture, incremental updates, and a host of new source and target systems. (See Illustration S3.)
We can divide the main features of a data integration platform into platform characteristics and data integration characteristics. No ETL product today delivers all the features outlined below.
Illustration S3. Data integration platforms extend the capabilities of ETL tools. (The diagram shows source adapters and target adapters, plus an adapter development kit, for databases and files, legacy applications, application packages, application integration software, Web and XML data, and event data; core extract, transform, clean, profile, capture, update, and load functions; a design manager and a global meta data repository; and runtime meta data services, meta data import/export, transport services, and administration and operations services.)
However, most ETL vendors are moving quickly to deliver full-fledged data integration platforms.

Platform Characteristics:
High performance and scalability
Built-in data cleansing and profiling
Complex, reusable transformations
Reliable operations and robust administration

Data Integration Characteristics:
Diverse source and target systems
Update and capture facilities
Near-real-time processing
Global meta data management
High Performance and Scalability
Parallelization. Data integration platforms offer exceedingly high throughput and scalability due to parallel processing facilities that exploit high-performance computing platforms.
The parallel engine processes workflow streams in parallel using multiple operating system threads, multi-pass SQL, and an in-memory data cache that holds temporary data without storing it to disk. Where dependencies exist among or within streams, the engine uses pipelining to optimize throughput.
The high-performance computing platform supports clusters of servers running multiple CPUs and load-balancing software to optimize performance under large, sustained loads.
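The sketch below illustrates the pipelining idea in miniature: extract, transform, and load stages run as separate threads connected by queues, so rows flow downstream without being staged to disk. It is a hypothetical Python illustration of the concept, not any vendor's engine design; the stage names and the simple markup transform are assumptions.

```python
# Hypothetical sketch of pipelined, parallel stream processing as described above.
import threading
import queue

SENTINEL = object()  # marks the end of a stream

def extract(out_q, rows):
    """Producer stage: push source rows downstream as soon as they are read."""
    for row in rows:
        out_q.put(row)
    out_q.put(SENTINEL)

def transform(in_q, out_q):
    """Middle stage: transform rows in memory without landing them to disk."""
    while (row := in_q.get()) is not SENTINEL:
        out_q.put({**row, "amount": round(row["amount"] * 1.07, 2)})
    out_q.put(SENTINEL)

def load(in_q, target):
    """Consumer stage: apply transformed rows to the target."""
    while (row := in_q.get()) is not SENTINEL:
        target.append(row)

source_rows = [{"id": i, "amount": 10.0 * i} for i in range(1, 6)]
q1, q2, warehouse = queue.Queue(), queue.Queue(), []

stages = [
    threading.Thread(target=extract, args=(q1, source_rows)),
    threading.Thread(target=transform, args=(q1, q2)),
    threading.Thread(target=load, args=(q2, warehouse)),
]
for t in stages:
    t.start()
for t in stages:
    t.join()

print(warehouse)  # all three stages overlap in time; rows stream through the pipeline
```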
Benchmarks. In the mid-1990s, ETL engines achieved a maximum throughput of 10 to 15 gigabytes of data per hour, which was sufficient to meet the needs of most BI projects at the time. Today, however, ETL engines boast throughput of 100 to 150 gigabytes an hour or more, thanks to steady improvements in memory management, parallelization, distributed processing, and high-performance servers.¹

¹ There are many variables that affect throughput: the complexity of transformations, the number of sources that must be integrated, the complexity of target data models, the capabilities of the target database, and so on. Your throughput rate may vary considerably from these rates.
Built-In Data Cleansing and Profiling
To avoid the "code, load, and explode" syndrome mentioned earlier, good data integration platforms provide built-in support for data cleansing and profiling functionality.
Data Profiling Tools. Data profiling tools provide an exhaustive inventory of source data. (Although many companies use SQL to sample source data values, this is akin to sighting only the tip of an iceberg.) Data profiling tools identify the range of values and formats of every field as well as all column, row, and table dependencies. The tool spits out reports that serve as the Rosetta Stone of your source files, helping you translate between hopelessly out-of-date copy books and what's really in your source systems.
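The following sketch shows, in simplified form, the kind of inventory a profiling tool builds: value ranges, null counts, distinct counts, and format patterns per field. It is a hypothetical Python example; real profiling tools go much further, inferring column, row, and table dependencies.

```python
# A minimal, hypothetical column-profiling sketch: inventories value ranges,
# null counts, distinct counts, and observed format patterns for each field.
from collections import Counter

def profile(rows):
    """Return per-column statistics for a list of dict records."""
    stats = {}
    for row in rows:
        for col, val in row.items():
            s = stats.setdefault(col, {"nulls": 0, "values": set(), "formats": Counter()})
            if val in (None, ""):
                s["nulls"] += 1
                continue
            s["values"].add(val)
            # crude format signature: digits become 9, letters become A
            s["formats"]["".join("9" if c.isdigit() else "A" if c.isalpha() else c
                                 for c in str(val))] += 1
    return {
        col: {
            "nulls": s["nulls"],
            "distinct": len(s["values"]),
            "min": min(s["values"], default=None),
            "max": max(s["values"], default=None),
            "formats": dict(s["formats"]),
        }
        for col, s in stats.items()
    }

sample = [
    {"cust_id": "1001", "zip": "02139", "state": "MA"},
    {"cust_id": "1002", "zip": "2139",  "state": None},   # short zip, missing state
]
print(profile(sample))  # the odd zip format and the null state surface immediately
```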
Data Cleansing Tools. Data cleansing tools validate source data against business rules and correct it. Most data cleansing tools apply specialized functions for scrubbing name and address data: (1) parsing and standardizing the format of names and addresses, (2) verifying those names and addresses against a third-party database, (3) matching names to remove duplicates, and (4) householding names to consolidate mailings to a single address.
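A toy example of the standardization and householding steps appears below. It is a hypothetical Python sketch; the abbreviation table and matching rule are illustrative assumptions, and commercial tools use far richer reference data and fuzzy matching.

```python
# Hypothetical sketch of two of the scrubbing steps listed above:
# standardize address formats, then household records that share an address.
import re

SUFFIXES = {"STREET": "ST", "AVENUE": "AVE", "ROAD": "RD"}  # assumed abbreviation table

def standardize(addr: str) -> str:
    """Parse and standardize an address line into a common format."""
    words = re.sub(r"[.,]", "", addr.upper()).split()
    return " ".join(SUFFIXES.get(w, w) for w in words)

def household(records):
    """Group records that share a standardized address (one mailing per household)."""
    groups = {}
    for rec in records:
        groups.setdefault(standardize(rec["address"]), []).append(rec["name"])
    return groups

customers = [
    {"name": "J. Smith",   "address": "12 Main Street"},
    {"name": "Jane Smith", "address": "12 Main St."},
    {"name": "Bob Jones",  "address": "4 Oak Avenue"},
]
print(household(customers))
# {'12 MAIN ST': ['J. Smith', 'Jane Smith'], '4 OAK AVE': ['Bob Jones']}
```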
Some data cleansing tools can now also apply many of the above functions to specific types of non-name-and-address data, such as products.
Data integration platforms integrate data cleansing routines within a visual ETL workflow. (See Illustration S4.) At runtime, the ETL tool issues calls to the data cleansing tool to perform its validation, scrubbing, or matching routines.
Ideally, data integration platforms exchange meta data with profiling and cleansing tools. For instance, a data profiling tool can share errors and formats with an ETL tool so developers can design mappings and transformations to correct the dirty data. Also, a data cleansing tool can share information about matched or householded customer records so the ETL tool can combine them into a single record before loading the data warehouse.
Complex, Reusable Transformations
Extensible, Reusable Objects. Data integration platforms provide a robust set of transformation objects that developers can extend and combine to create new objects and reuse them in various ETL workflows. Ideally, developers should be able to deploy a custom object that dynamically adapts to its environment, minimizing the time developers spend updating multiple instances of the object.
"Most ETL tools today limit reuse because they are context sensitive," says Steve Tracy at Hartford Life Insurance in Simsbury, CT. "They let you copy and paste an object to support 37 instances of a source system, but you have to configure each object separately. This becomes painful if there is a change in the source system and you have to reconfigure all 37 instances of the object."
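The sketch below illustrates what a context-independent object looks like in ordinary code: the transformation logic is written once, and everything instance-specific is supplied through a context parameter, so a source change is made in one place rather than in 37 copies. It is a hypothetical Python illustration; the field and rate names are assumptions.

```python
# Hypothetical sketch of a context-independent, reusable transformation object:
# one definition parameterized by its runtime environment.
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class SourceContext:
    """Everything instance-specific lives here, not in the transform itself."""
    amount_field: str
    currency: str
    fx_rates: Dict[str, float]

def to_usd(row: dict, ctx: SourceContext) -> dict:
    """Reusable transform: convert the configured amount field to US dollars."""
    out = dict(row)
    out["amount_usd"] = round(row[ctx.amount_field] * ctx.fx_rates[ctx.currency], 2)
    return out

def run(rows: List[dict], transform: Callable, ctx: SourceContext) -> List[dict]:
    return [transform(r, ctx) for r in rows]

# The same object serves two differently shaped source systems.
eu_ctx = SourceContext(amount_field="betrag", currency="EUR", fx_rates={"EUR": 1.1})
uk_ctx = SourceContext(amount_field="amount", currency="GBP", fx_rates={"GBP": 1.3})
print(run([{"betrag": 100.0}], to_usd, eu_ctx))
print(run([{"amount": 200.0}], to_usd, uk_ctx))
```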
External Code. Although ETL tools have gained a raft of built-in and extensible transformation objects, developers always run into situations where they need to write custom code to support complex or unique transformations.
To address this pain point, data integration platforms provide internal scripting languages that enable developers to write custom code without leaving the tool. But if a developer is more comfortable coding in C, C++, or another language, the tool should also be able to call external routines from within an ETL workflow and manage those routines as if they were internal objects. In the same way, the tool should be able to call third-party schedulers, security systems, meta data repositories, and analytical tools.
Illustration S4. Integrating cleansing and ETL: data integration platforms integrate data cleansing routines into an ETL graphical workflow.
Reliable Operations and Robust Administration
No matter how much horsepower a system possesses, if its runtime environment is unstable, overall performance suffers. In our survey, 86 percent of respondents rated "reliability" as a "very important" ETL feature. (See Illustration 13.) This was the highest rating of any ETL feature respondents were asked to evaluate.
Robust Schedulers. Data integration platforms provide much more robust operational and administrative features than the current generation of ETL tools. For example, they provide robust schedulers that orchestrate the flow of interdependent jobs (both ETL and non-ETL) and run in a lights-out environment. Or they interface with a third-party scheduling tool so the organization can manage both ETL and non-ETL jobs within the same environment in a highly granular fashion.
Conditional Execution. Job processing is more reliable because data integration platforms are smart enough to perform conditional execution based on thresholds, content, or inter-process dependencies. This reduces the amount of hand coding needed to prevent job failures, especially as BI environments become more complex.
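A simplified example of threshold-based conditional execution follows. It is a hypothetical Python sketch; the thresholds, step names, and checks are assumptions, and real platforms express such rules declaratively in the workflow rather than in code.

```python
# Hypothetical sketch of conditional execution: gate the load step on simple
# content and threshold checks instead of hand-coded guards in every job.
MIN_ROWS = 1_000          # assumed minimum plausible extract size
MAX_REJECT_RATE = 0.02    # assumed tolerable share of rejected records

def run_job(extracted_rows: int, rejected_rows: int, load_step, notify) -> bool:
    """Run the load only if the extract looks complete and clean enough."""
    if extracted_rows < MIN_ROWS:
        notify(f"Load skipped: only {extracted_rows} rows extracted "
               f"(expected at least {MIN_ROWS}); source feed may be incomplete.")
        return False
    reject_rate = rejected_rows / extracted_rows
    if reject_rate > MAX_REJECT_RATE:
        notify(f"Load skipped: reject rate {reject_rate:.1%} exceeds "
               f"{MAX_REJECT_RATE:.0%}; routing rejects for review.")
        return False
    load_step()
    return True

# Example wiring: a no-op load step and print-based notification.
run_job(extracted_rows=500, rejected_rows=3, load_step=lambda: None, notify=print)
```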
Debugging and Error Recovery. In addition, data integration platforms provide robust debugging and error recovery capabilities that minimize how much code developers have to write to recover from errors in the design or runtime environments. These tools also deliver reports and diagnostics that are easy to understand and actionable. Instead of logging what happened and when, users want the tools to say why it happened and recommend fixes.
Enterprise Consoles. Surprisingly, only 18 percent of companies said a single ETL operations console and graphical monitoring was "very important." (See Illustration 15.) This indicates that most organizations do not yet need to manage multiple ETL servers. This will change as ETL usage grows and organizations decide to standardize the scheduling and management of ETL processes across the enterprise.
We suspect most organizations will use in-house enterprise systems management tools, such as Computer Associates' Unicenter, to manage distributed ETL systems.
Diverse Source and Target Systems
Intelligent Adapters. The mark of a good data integration platform is the number and diversity of source and target systems it supports. Besides supporting relational databases and flat files, which most ETL tools support today, data integration platforms provide adapters to intelligently connect to a variety of complex and unique data stores.
Application Interfaces and Data Types. For example, operational application packages, such as SAP's R/3, contain unique data types, such as pooled and clustered tables, as well as several different application and data interfaces. Data integration platforms offer native support for these interfaces and can intelligently handle unique data types.
Web Services and XML. Data integration platforms also provide ample support for the Web world. We expect XML and Web Services to constitute an ever greater portion of the processing that ETL tools perform. XML is becoming a standard means of interchanging both data and meta data. Web Services (which use XML) are becoming the standard method for enabling heterogeneous applications to interact over the Internet. Not surprisingly, almost half (47 percent) of survey respondents said Web Services were an "important" or "fairly important" feature for ETL tools to possess. It's likely that Web Services and XML will become a de facto method for
streaming XML data from operational systems to ETL tools and from data warehouses to reporting and analytical tools and applications.
Desktop Data Stores. Also, data integration platforms support pervasive desktop applications, such as Microsoft Excel and Access. By connecting to these sources via XML and Web Services, data integration platforms enable organizations to halt the growth of spreadmarts and the organizational infighting they create.
Adapter Development Kits. Finally, data integration platforms provide adapter development kits that let developers create custom adapters to link to unique or uncommon data sources or applications.
Update and Capture Utilities
To meet business requirements for more timely data and negotiate shrinking batch windows, data integration platforms support a variety of data capture and loading techniques.
Incremental Updates. First, data integration platforms update data warehouses incrementally instead of rebuilding or "refreshing" them (i.e., dimensional tables) from scratch every time. More than one-third of our survey respondents (34 percent) said incremental update is a "very important" feature in ETL tools. (See Illustration 14.)
Change Data Capture. To incrementally update a data warehouse, ETL tools need to detect and download only the changes that occurred in the source system since the last load, a process called "change data capture." (Sometimes the source system has a facility that identifies changes since the last extract and feeds them to the ETL tool.) The ETL tool then uses SQL update, insert, and delete commands to update the target file with the changed data.
This process is more efficient than bulk refreshes, where the ETL tool completely rebuilds all dimension tables from scratch and appends new transactions to the fact table in a data warehouse. More than a third (38 percent) of our respondents said that change data capture was a very important feature in ETL tools. (See Illustration 14.)
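The sketch below walks through applying a captured change set with insert, update, and delete statements rather than rebuilding the table. It is a hypothetical Python example that uses SQLite only to keep it self-contained; the table, columns, and change-record format are assumptions.

```python
# Hypothetical sketch of applying captured changes with SQL insert/update/delete
# commands, instead of rebuilding the dimension table from scratch.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dim_customer (cust_id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO dim_customer VALUES (?, ?)",
                 [(1, "Acme"), (2, "Bolt Co"), (3, "Cogs Inc")])

# Changes detected in the source since the last load (the change data capture feed).
changes = [
    {"op": "I", "cust_id": 4, "name": "Delta LLC"},   # new customer
    {"op": "U", "cust_id": 2, "name": "Bolt Corp"},   # renamed at the source
    {"op": "D", "cust_id": 3, "name": None},          # deleted at the source
]

for c in changes:
    if c["op"] == "I":
        conn.execute("INSERT INTO dim_customer VALUES (?, ?)", (c["cust_id"], c["name"]))
    elif c["op"] == "U":
        conn.execute("UPDATE dim_customer SET name = ? WHERE cust_id = ?",
                     (c["name"], c["cust_id"]))
    elif c["op"] == "D":
        conn.execute("DELETE FROM dim_customer WHERE cust_id = ?", (c["cust_id"],))

print(conn.execute("SELECT * FROM dim_customer ORDER BY cust_id").fetchall())
# [(1, 'Acme'), (2, 'Bolt Corp'), (4, 'Delta LLC')] -- only the deltas were touched
```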
More Frequent Loads. Another way ETL tools can minimize load times is to extract and load data at more frequent intervals (every hour or less), executing many small batch runs throughout the day instead of a single large one at night or on the weekends.
For example, Intuit Inc. uses a vendor ETL tool to continuously extract data every couple of minutes from an Oracle 11i source system and load it into the data warehouse, according to Intuit's Hatchett. However, for its large nightly loads, it uses hardware clusters on its ETL server and load balancing to meet its ever-shrinking batch windows.
One way to accommodate continuous loads without interfering with user queries is for administrators to create mirror images of tables or data partitions. The ETL tool then loads the mirror image in the background while end users query the "active" table or partition in the foreground. Once the load is completed, administrators switch pointers from the original partition to the newly updated one.²

² Administrators must make sure that the ETL tool or database does not update summary tables in the target database until a quiet period (usually at night), and users must be educated about the data's volatility during the period when it is being continuously updated.
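The following sketch shows the mirror-and-switch idea in miniature: the inactive copy is loaded in the background, and a pointer is flipped once the load completes. It is a hypothetical Python/SQLite illustration; the pointer table and naming convention are assumptions, and production databases typically accomplish this with partition switching or view repointing.

```python
# Hypothetical sketch of loading a mirror copy in the background and then
# switching the pointer that tells queries which copy is "active."
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales_a (id INTEGER, amount REAL);      -- currently active copy
    CREATE TABLE sales_b (id INTEGER, amount REAL);      -- mirror being loaded
    CREATE TABLE active_pointer (table_name TEXT);       -- which copy users query
    INSERT INTO active_pointer VALUES ('sales_a');
    INSERT INTO sales_a VALUES (1, 100.0);
""")

def active_table(c):
    return c.execute("SELECT table_name FROM active_pointer").fetchone()[0]

def load_and_switch(c, rows):
    """Load the inactive mirror, then switch the pointer once the load completes."""
    mirror = "sales_b" if active_table(c) == "sales_a" else "sales_a"
    c.execute(f"DELETE FROM {mirror}")
    c.executemany(f"INSERT INTO {mirror} VALUES (?, ?)", rows)
    c.execute("UPDATE active_pointer SET table_name = ?", (mirror,))

# Users keep querying sales_a while the new data lands in sales_b.
load_and_switch(conn, [(1, 100.0), (2, 250.0)])
print(active_table(conn))  # 'sales_b' now serves queries
```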
Bulk Loading. Data integration platforms leverage native bulk load utilities to "block slam" data into target databases rather than simply using SQL inserts and updates. Data integration platforms can update target tables by selectively applying SQL within a bulk load operation, something that today's tools cannot support.
Near-real-time Data Capture. Finally, organizations seek ETL tools that capture and load data into the data warehouse in near real time. This "trickle feed" process is needed because business users want integrated data delivered on a timelier basis (i.e., for the previous day, hour, or minute) so they can make critical tactical and operational decisions without delay.
Priceline.com, for example, uses middleware to capture source data changes into a staging area and then uses a vendor ETL tool to load them into the data warehouse. It has built a custom program to feed multiple sessions in parallel to its vendor ETL tool, but it plans to upgrade soon to a new version that supports parallel processing.
EAI Software. As mentioned earlier in this report, most ETL tools today partner with EAI tools to deliver near-real-time capabilities. The EAI tools typically use application-level interfaces to peel off new transactions or events as soon as they are generated by the source application. They then deliver these events to any application that needs them either immediately (near real time), at predefined intervals (scheduled), or when the target application asks for them (publish and subscribe).
Bi-directional, Peer-to-Peer Interfaces. Although EAI tools are associated with delivering real-time data, they support a variety of delivery methods. Also, in contrast to ETL tools, their application interfaces are typically bidirectional and peer-to-peer in nature, instead of unidirectional and hub-and-spoke. EAI software flows data and events back and forth among applications, each alternately serving as a source or a target.
Unlike most of today's ETL tools, data integration platforms will blend the best of both ETL and EAI capabilities in a single toolset. Few organizations want to support dual data integration platforms and would prefer to source ETL and EAI capabilities from a single vendor.
The upshot is that data integration platforms will process data in batch or near real time, depending on business requirements. They will also support unidirectional and bidirectional flows of data and events among disparate applications and data stores using a hub-and-spoke or peer-to-peer architecture.
Global Meta Data Management
To optimize and coordinate the flow of data among applications and data stores, data integration platforms offer global meta data management services. The tools automatically document, exchange, and synchronize meta data among the various applications and data stores in a BI environment.
Meta Data Repository and Interchange. To do this, the tools maintain a global repository of meta data. The repository stores meta data about ETL mappings and transformations. It may also contain relevant meta data from upstream or downstream systems, such as data modeling and analytical tools.
Common Warehouse Metamodel. More importantly, the tools exchange and synchronize meta data with other tools via a robust meta data interchange interface, such as the Object Management Group's Common Warehouse Metamodel (CWM), which is based on XML. According to our survey, 48 percent of respondents rated "openness" and 41 percent rated "meta data interface" as "very important" design features. (See Illustration 12.) Although some ETL vendors recently announced support for CWM, and although the specifications are considered robust, it's still too early to tell whether these standards will gain enough momentum to facilitate widespread meta data integration.
Impact Analysis Reports. Data integration platforms also generate impact analysis reports that analyze the dependencies among elements used by all systems in a BI environment. The reports identify objects that may be affected by a change, such as adding a column, revising a calculation, or changing a data type or attribute.
"Such reports reduce the amount of time and effort required to [manage changes] and reduce the need for custom work," wrote one survey respondent. Moreover, when appropriate, the tools automatically update each other with such changes without IT intervention. (See Illustration S5 for a sample impact analysis report.) Thirty-three percent of respondents said impact analysis reports are a "very important" feature.
Data Lineage. In addition, data integration platforms offer "data lineage" reports that describe or depict the origins of a data element or report and the business rules and transformations that were applied to it as it moved from a source system to the data warehouse or a data mart. Data lineage reports help business users understand the nature of the data they are analyzing.
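To make the two report types concrete, the sketch below derives both from the same dependency graph: impact analysis walks downstream from a changed element, and lineage walks upstream from a report. It is a hypothetical Python illustration; the element names and the dependency mapping are assumptions.

```python
# Hypothetical sketch of impact analysis and lineage over a dependency graph.
DEPENDS_ON = {  # downstream element -> upstream elements it is derived from
    "mart.revenue_report": ["dw.fact_sales.net_amount"],
    "dw.fact_sales.net_amount": ["src.orders.amount", "src.orders.discount"],
}

def impact(element):
    """Everything downstream that a change to `element` may affect."""
    hits = set()
    for downstream, upstream in DEPENDS_ON.items():
        if element in upstream:
            hits |= {downstream} | impact(downstream)
    return hits

def lineage(element):
    """Everything upstream that `element` was derived from."""
    sources = set()
    for upstream in DEPENDS_ON.get(element, []):
        sources |= {upstream} | lineage(upstream)
    return sources

print(impact("src.orders.discount"))   # what may break if this source column changes
print(lineage("mart.revenue_report"))  # where this report's numbers come from
```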
SUMMARY
Data integration platforms represent the next generation of ETL tools. These tools extend current ETL capabilities with enhanced performance, transformation power, reliability, administration, and meta data integration. The platforms also support a wider array of source and target systems and provide utilities to capture and load data more quickly and efficiently, including support for near-real-time data capture.
ETL Evaluation Criteria
Guidelines for Selecting ETL Products. It may take several years for major ETL tools to evolve into data integration platforms and support all the features described above. In the meantime, you may need to select an ETL tool or reevaluate the one you are using. If so, there are many things to consider. The following list is a detailed, but by no means comprehensive, set of evaluation criteria.
Illustration S5. Impact analysis report. ETL tools should quickly generate reports that clearly outline dependencies among data elements in a BI environment and highlight the impact of changes in source systems or downstream applications.
It's important to remember that not all the criteria below may be applicable or important to your environment. The key is to first understand your needs and requirements and then build criteria from there. The list that follows can help trigger ideas for important features that you may need.
We've tried to provide descriptive attributes to qualify the requirements so you can understand the extent to which a tool meets the criteria. If asked, all vendors will say their tool can perform certain functions. It's better to phrase questions in a way that lets you ascertain their true level of support.
Available Resources
ETL Matrix. The evaluation list below is a summary version of a matrix document that TDWI Members can access at the following URL: www.dw-institute.com/etlmatrix/. Another set of criteria is available at: www.clickstreamdatawarehousing.com/ClickstreamETLCriteria.htm.
ETL Product Guide. If you would like to view a complete list of ETL products, go to TDWI's Marketplace Online (www.dw-institute.com/marketplace), click on the "Data Integration" category, and then select the "DW Mapping and Transformation" subcategory and related subcategories.
Courses on ETL. Finally, TDWI also offers several courses on ETL, including TDWI Data Acquisition, TDWI Data Cleansing, and Evaluating and Selecting ETL and Data Cleansing Tools. See our latest conference and seminar catalogs for more details (www.dw-institute.com/education/).
Vendor Attributes
Before examining tools, you need to evaluate the company selling the ETL tool. You want to find out (1) whether the vendor is financially stable and won't disappear, and (2) how committed the vendor is to its ETL toolset and the ETL market. Asking these questions can quickly shorten your list of vendors and the time spent analyzing products!

For more details, ask questions in the following areas:

Mission and Focus
o Primary focus of the company
o Percent of revenues and profits from ETL
o ETL market and product strategy

Maturity and Scale
o Years in business. Public or private?
o Number of employees? Salespeople? Distributors? VARs?

Financial Health
o Year-over-year trends in revenues, profits, stock price
o Cash reserves; debt

Financial Indicators
o License-to-service revenues
o Percent of revenues spent on R&D

Relationships
o Number of distributors, OEMs, VARs
Overall Product Considerations
Before diving in and evaluating an ETL tool's features and functions, make sure you have the right mindset. The first principle of tool selection is to question and verify everything. "It's never as easy as the vendor says," writes a BI professional from a major electronics retailer. "Verify everything they say down to the last turn of the screw before the sale is complete and the software license agreement is signed."
Platforms. After adopting a skeptical attitude, step back and see if the product meets your overall needs in key areas. Topping the list is whether the product runs on platforms that you have in house and will work with your existing tools and utilities (e.g., security, schedulers, analytical tools, data modeling tools, source systems, etc.).
Full Lifecycle Support. Second, find out whether the product supports all ETL processes and provides full lifecycle support for the development, testing, and production of ETL programs. You don't want a partial solution. Also, make sure it supports the preferred development style of your ETL developers, whether that's procedural, object-oriented, or GUI-driven.
Requisite Performance and Functionality. Finally, make sure the tool provides adequate performance and supports the transformations you need. The only way to validate this is to have vendors perform a proof of concept (POC) with your data. Although a POC may take several days to set up and run, it is the only way to really determine whether the tool will meet your business and technical requirements and is worth buying. You should also never pay the vendor to perform a POC, nor agree in advance to purchase the product if they complete the POC successfully.
"It's really important to take these tools for a test drive with your data," says EDS Canada's Light. Mimno adds, "A proof of concept exposes how different the tools really are; this is something you'll never ascertain from evaluations on paper only."
Design Features
Ease of Use. Most BI teams would like vendors to make ETL tools easier to learn and use. A graphical development environment can enhance the usability of an ETL product and help flatten its steep learning curve. Almost all survey respondents (84 percent) said a visual mapping interface, for example, was either a "very important" or "fairly important" design feature.
Illustration 12. Ratings of importance of design, transformation, and meta data features. Ease of use is rated as a very important design feature. Based on 746 respondents. (Features rated include visual mapping interface, single logical meta data store, meta data interfaces, ease of use, openness, debugging, meta data reports such as impact analysis, and transformation language and power.)
Good ETL tools save developers time by letting them combine and reuse objects, tasks, and processes in various workflows. Workflows are visual and interactive, letting developers focus on one step at a time. Wizards and cut-and-paste features also speed development, as do interactive debuggers that don't require developers to write code.

More specifically, ask whether the ETL tool provides:

An integrated, graphical development environment
Interactive workflow diagrams to depict complex flows of jobs, processes, tasks, and data
A graphical drag-and-drop interface for mapping source and target data
The ability to combine multiple jobs or tasks into a "container" object that can be visually depicted and opened up in a workflow diagram
Reuse of containers and custom objects in multiple workflows and projects
A procedural or object-oriented development environment
A scripting language to write custom transformation objects and routines
Exits to call external code or routines on the same or different platforms
Version control with check-in and check-out
An audit trail that tracks changes to every process, job, task, and object
Meta Data Management Features
As the hub of a data warehousing or data integration effort, an ETL tool is well positioned to capture and manage meta data from a multiplicity of tools. Today, most ETL tools automatically document information about their development and runtime processes in a meta data repository (e.g., relational tables or a proprietary engine), which can be accessed via query tools, a documented API, or a Web browser.
The best ETL tools import and export technical and business meta data with targeted upstream and downstream systems in the BI environment. They also generate impact analysis and data lineage reports across these tools. Most should be working on ways to automate the exchange of meta data using CWM and Web Services standards and keep heterogeneous BI systems in synch.

For more details on meta data, ask whether an ETL product provides:

Self-documenting meta data for all design and runtime processes
Support for technical, business, and operational meta data
A rich repository for storing meta data
A hierarchical repository that can manage and synchronize multiple local or globally distributed meta data repositories
The ability to reverse-engineer source meta data, including flat files and spreadsheets
A robust interface to interchange meta data with third-party tools
Robust impact analysis reports
Data lineage reports that show the origin and derivation of an object
Transformation Features
Extensible, Reusable, Context-Independent Objects. The best ETL tools provide a rich library of transformation objects that can be extended and combined to create new, context-independent, reusable objects. The tools also provide an internal scripting language and transparent support for calls to external routines.
To learn about transformation features, ask whether the tool provides:

A rich library of base transformation objects
The ability to extend base objects
Context-independent objects or containers
Record- and set-level processing logic and transforms
Generation of surrogate keys in memory in a single pass
Support for multiple types of transformation and mapping functions
The ability to create temporary files, if needed, for complex transformations
The ability to perform recursive processing or loops
Data Quality Features
The best ETL tools provide data profiling facilities and can spawn data cleansing routines from within ETL workflows. To get details on data profiling and cleansing facilities, ask whether the ETL product:

Provides internal or third-party data profiling and cleansing tools
Integrates cleansing tasks into visual workflows and diagrams
Enables profiling, cleansing, and ETL tools to exchange data and meta data
Automatically generates rules for ETL tools to build mappings that detect and fix data defects
Profiles formats, dependencies, and values of source data files
Parses and standardizes records and fields to a common format
Verifies record and field contents against external reference data
Matches and consolidates file records
Performance Features
Performance features received some of the highest ratings in our survey. (See Illustration 13.) As the volume of data increases, batch windows shrink, and users require more timely data, teams are looking to ETL tools to make up the difference.
The best ETL tools offer a parallel processing engine that runs on a high-performance computing server. It leverages threads, clusters, load balancing, and other facilities to speed throughput and ensure adequate scalability.
Illustration 13. Important performance features. Reliability (86 percent) and performance/throughput (81 percent) were rated as "very important" features by more than 80 percent of 750 survey respondents; parallel processing, availability, scalability, incremental update, and real-time data capture received lower ratings.
To learn more about performance features, ask whether the product provides:

A flexible, high-performance way to process and transform source data
Intelligent management of aggregate/summary tables
o Bundled with the product or an extra-priced option
High-speed processing using multiple concurrent workflows, multiple hardware processors, system threads, and clustered servers
In-memory cache to avoid creating intermediate temporary files
Linear performance improvement when adding processors and memory
Extract and Capture Features
Leading ETL tools provide native support for a wide range of source databases, a top requirement among our survey respondents. (See Illustration 14.) The lowest common denominator is support for ODBC/JDBC, but make sure the tool can directly access the sources you support, preferably at no additional cost and without the use of a third-party gateway.
Users also emphasized the importance of extracting and integrating data from multiple source systems. "The ability to use an ETL tool to extract data from three sources and combine them together is worth the effort compared to writing the routine in something like PL/SQL," says American Eagle's Koscinski.

To understand extract capabilities, ask whether the tool provides:

The ability to schedule extracts by time, interval, or event
Robust adapters that extract data from various sources
A development framework for creating custom data adapters
A robust set of rules for selecting source data
Selection, mapping, and merging of records from multiple source files
A facility to capture only changes in source files

Illustration 14. Extract and load features: user ratings. Users want an ETL tool to support a wide range of sources; breadth of sources was the top-rated item. Based on 745 respondents who rated the items as "very important." (Other items rated include breadth of targets supported, change data capture, incremental update, real-time data capture, support for native load utilities, and Web services support.)
Load and Update Features
To understand load and update features, ask whether the product can:

Load data into multiple types of target systems
Load data into heterogeneous target systems in parallel
Support data partitions on the target
Both refresh and append data in the target database
Incrementally update target data
Support high-speed load utilities via an API
Turn off referential integrity and drop indexes, if desired
Take a snapshot of data after a load for recovery purposes, if desired
Automatically generate DDL to create tables
Operations and Administration Features
The ultimate test of an ETL product is how well it runs the programs that ETL developers design, and how well it recovers from errors. Thus, monitoring and managing the ETL runtime environment is a critical requirement. Our survey respondents rated "error reporting and recovery" especially high (79 percent), with "scheduling" not far behind (66 percent). (See Illustration 15.)
For detailed information on administrative features, ask whether the product provides:

A visual console for managing and monitoring ETL processes
A robust, graphical scheduler to define job schedules and dependencies
The ability to validate jobs before running them
A command line or application interface to control and run jobs
Robust graphical monitoring that displays ETL processes in near real time
The ability to restart from checkpoints without losing data
MAPI-compliant notification of errors and job statistics
Built-in logic for defining and managing rejected records
A robust set of easy-to-read, useful administrative reports
A detailed job or audit log that records all activity and events in the job
Rigorous security controls, including links to LDAP and other directories

Illustration 15. Rating of administration features. Error reporting and recovery leads all administration features (79 percent), followed by scheduling (66 percent); a single operations console and graphical monitoring were each rated "very important" by only 18 percent. Based on 745 respondents who rated the items as very important. (Debugging and a single logical meta data store were also rated.)
Integrated Product Suites
Many vendors today bundle ETL tools within a business intelligence or data management suite. Their goal is to ensure that a suite (a complete, integrated solution) provides more value and is more attractive to more organizations.
To drill into add-on products, ask:

Is the package geared to a specific source application, or is it more generic?
How integrated is the ETL tool with other tools in the suite?
o Do they have a common look and feel?
o Do they share meta data in an automated, synchronized fashion?
o Do they share a common install and management system?
Company Services and Pricing
As mentioned earlier in this report, many user organizations find the price of ETL tools too high. Almost three-quarters of survey respondents (71 percent) consider pricing among the top three factors affecting their decision to buy a vendor ETL tool. (See Illustration 16.)
Bundles versus Components. Users pay close attention to vendor packaging options. Vendors that offer all-in-one packages at high list prices can often price themselves out of contention. Vendors that break their packages into components that enable users to buy only what they need are more attractive to users who want to minimize their initial cost outlays. However, runtime charges can ruin the equation if a vendor charges the customer each time a component runs on a different platform.
More specifically, ask about:

Pricing for a four-CPU system on Windows and UNIX
Annual maintenance fees
Level of support, training, consulting, and documentation provided
Pricing for add-on software and interfaces required to run the product
Annual user conferences
Cost of moving from a single server to multiple servers
User group availability
Online discussion groups
Illustration 16. How does pricing affect your ETL purchasing decision? A lot, it's a top factor (10 percent); significantly, among the top three factors (61 percent); somewhat, an important but not critical factor (26 percent); not much, it is a requirement for doing business (4 percent). Pricing is a significant factor affecting purchasing decisions. Based on 621 respondents.
Conclusion
ETL tools are the traffic cop for business intelligence applications. They control the flow of data between myriad source systems and BI applications. As BI environments expand and grow more complex, ETL tools need to change to keep pace.
Next Generation Functionality. Today, organizations need to move more data from more sources more quickly into a variety of distributed BI applications. To manage this data flow, ETL tools need to evolve from batch-oriented, single-threaded processing that extracts and loads data in bulk to continuous, parallel processes that capture data in near real time. They also need to provide enhanced administration and ensure reliable operations and high availability.
As the traffic cop of the BI environment, they need to connect to a wider variety of systems and data stores (notably XML, Web Services-based data sources, and spreadsheets) and coordinate the exchange of information among these systems via a global meta data management system. To simplify and speed the development of complex designs, the tools need a visual work environment that allows developers to create and reuse custom transformation objects.
Finally, ETL tools need to deliver a larger part of the BI solution by integrating data cleansing and profiling capabilities, EAI functionality, toolkits for building custom adapters, and possibly reporting and analysis tools and analytic applications.
This is a huge list of new functionality. The sum total represents a next-generation ETL product: a data integration platform. It will take ETL vendors several years to extend their products to support these areas, but many are well on their way.
Focus on Your Business Requirements. The key is to understand your current and future ETL processing needs and then identify the product features and functions that support those needs. With this knowledge, you can then identify one or more products that can adequately meet your ETL and BI needs.
SPONSORS

Business Objects
3030 Orchard Parkway
San Jose, CA 95134
408.953.6000
Web: www.businessobjects.com

Business Objects is the world's leading provider of business intelligence (BI) solutions. Business intelligence lets organizations access, analyze, and share information internally with employees and externally with customers, suppliers, and partners. It helps organizations improve operational efficiency, build profitable customer relationships, and develop differentiated product offerings.
The company's products include data integration tools, the industry's leading integrated business intelligence platform, and a suite of enterprise analytic applications. Business Objects is the first to offer a complete BI solution that is composed of best-of-breed components, giving organizations the means to deploy end-to-end BI to the enterprise, from data extraction to analytic applications.
Business Objects has more than 17,000 customers in over 80 countries. The company's stock is publicly traded under the ticker symbols NASDAQ: BOBJ and Euronext Paris (Euroclear code 12074). It is included in the SBF 120 and IT CAC 50 French stock market indexes.
DataMirror
3100 Steeles Avenue East, Suite 1100
Markham, Ontario L3R 8T3
Canada
905.415.0310
Fax: 905.415.0340
Email: info@datamirror.com
Web: www.datamirror.com

DataMirror (Nasdaq: DMCX; TSX: DMC), a leading provider of enterprise application integration and resiliency solutions, gives companies the power to manage, monitor, and protect their corporate data in real time. DataMirror's comprehensive family of LiveBusiness solutions enables customers to easily and cost-effectively capture, transform, and flow data throughout the enterprise. DataMirror unlocks the experience of now by providing the instant data access, integration, and availability companies require today across all computers in their business.
1,700 companies have gone live with DataMirror software, including Debenhams, Energis, GMAC Commercial Mortgage, the London Stock Exchange, OshKosh B'Gosh, Priority Health, Tiffany & Co., and Union Pacific Railroad.
Hummingbird Ltd.
1 Sparks Avenue
Toronto, Ontario M2H 2W1
Canada
Email: getinfo@hummingbird.com
Web: www.hummingbird.com

Headquartered in Toronto, Canada, Hummingbird Ltd. (NASDAQ: HUMC, TSE: HUM) is a global enterprise software company employing 1,300 people in nearly 40 offices around the world. Hummingbird's revolutionary Hummingbird Enterprise, an integrated information and knowledge management solution suite, manages the entire lifecycle of information and knowledge assets. Hummingbird Enterprise creates a 360-degree view of all enterprise content, both structured and unstructured data, with a portfolio of products that are both modular and interoperable. Today, five million users rely on Hummingbird to connect, manage, access, publish, and search their enterprise content. For more information, please visit: http://www.hummingbird.com.
Informatica Corporation
Headquarters
2100 Seaport Boulevard
Redwood City, CA 94063
650.385.5000
Toll-free U.S.: 800.653.3871
Fax: 650.385.5500

Informatica offers the industry's only integrated business analytics suite, including the industry-leading enterprise data integration platform, a suite of packaged and domain-specific data warehouses, the only Internet-based business intelligence solution, and market-leading analytic applications. Its market-leading data integration solution comprises the Informatica PowerCenter, PowerMart, and Informatica PowerConnect(tm) products; this data integration platform helps companies fully leverage and move data from virtually any corporate system into data warehouses, operational data stores, staging areas, or other analytical environments. Offering real-time performance, scalability, and extensibility, the Informatica data integration platform can handle the challenging and unique analytic requirements of even the largest enterprises.
The Data Warehousing Institute
5200 Southcenter Blvd., Suite 250
Seattle, WA 98188
Local: 206.246.5059
Fax: 206.246.5952
Email: info@dw-institute.com
Web: www.dw-institute.com
Membership
As the data warehousing and business intelligence field continues to evolve and develop,
it is necessary for information technology professionals to connect and interact with one
another. TDWI provides these professionals with the opportunity to learn from each
other, network, share ideas, and respond as a collective whole to the challenges and
opportunities in the data warehousing and BI industry.
Through Membership with TDWI, these professionals make positive contributions to the
industry and advance their professional development. TDWI Members benefit through
increased knowledge of all the hottest trends in data warehousing and BI, which makes
TDWI Members some of the most valuable professionals in the industry. TDWI Members
are able to avoid common pitfalls, quickly learn data warehousing and BI fundamentals,
and network with peers and industry experts to give their projects and companies a
competitive edge in deploying data warehousing and BI solutions.
TDWI Membership includes more than 4,000 Members who are data warehousing
and information technology (IT) professionals from Fortune 1000 corporations,
consulting organizations, and governments in 45 countries. Benefits to Members
from TDWI include:
• Quarterly Business Intelligence Journal
• Biweekly TDWI FlashPoint electronic bulletin
• Quarterly TDWI Member Newsletter
• Annual Data Warehousing Salary, Roles, and Responsibilities Report
• Quarterly Ten Mistakes to Avoid series
• TDWI Best Practices Awards summaries
• Semiannual What Works: Best Practices in Business Intelligence and Data Warehousing corporate case study compendium
• TDWI's Marketplace Online comprehensive product and service guide
• Annual technology poster
• Periodic research report summaries
• Special discounts on all conferences and seminars
• Fifteen-percent discount on all publications and merchandise
Membership with TDWI is available to all data warehousing, BI, and IT professionals
for an annual fee of $245 ($295 outside the U.S.). TDWI also offers a Corporate
Membership for organizations that register 5 or more individuals as TDWI Members.