© 2002 Giga Information Group, Inc.

May 3, 2002

Criteria for Selection: ETL Technology, Part 1


Lou Agosta

Giga Position
Important evaluation criteria to consider when choosing extraction, transformation and loading (ETL)
products include the following:

• Usability
• Transformations
• Metadata integration
• Performance (proven scalability)
• Interoperability with other tools (especially data quality)
• Diversity of execution platforms
• Diversity of data sources accessed
• Price
• Vendor service and support

In spite of a convergence of functionality, vendor implementations of these features continue to be diverse
and variable, and a careful analysis of requirements can provide the basis for a decision about which product
to acquire. This Planning Assumption discusses these evaluation criteria and provides guidance for selection
of an ETL product, specifically products from Informatica, Ascential and SAS. The basic features of the
market for ETL technology were reviewed in detail in previous Giga research (see Planning Assumption,
Market Overview Update: ETL, Lou Agosta).

Proof/Notes
This research is based on conversations with and written survey responses from vendors and the users of ETL
tools (developers and managers) conducted during the fourth quarter of 2001 and the first quarter of 2002.
(All the quotations are direct end-user comments, although proper names have been edited out to render the
statements anonymous.) The criteria of interoperability, diversity of execution platforms, diversity of data
sources accessed and pricing details are summarized in the table, Four Criteria for ETL Tools: Ascential,
Informatica and SAS WA, at the end of this Planning Assumption.

Usability
Informatica transformations are presented graphically to the user via the Designer client tool.
Transformations are selectable from a toolbar or through menus. Implementation wizards and/or context-
sensitive help are available for each transformation. One Informatica PowerCenter 5.1 manager told Giga,
“PowerCenter’s overall process and environment is excellent. It is really easy to read in a COBOL layout to
create a data source. … PowerCenter’s design is excellent and is very easy for non-programmers to
understand and use.” The manager praised the tool’s graphic interface, which allows click-and-drag
movement from source column to target column, as well as the special transformations, filters and
aggregators, which can be dropped into the data flow very easily. “Once the map has been created and
validated in the designer component, it can be run from the server manager. The server manager is where the
actual data source/target names are added when a session is created,” the manager said. The manager’s one
criticism centered on the tool’s inability to write to different target types during a single session, e.g., it is
impossible to target a table in a database and a file or table in different databases concurrently. Informatica
states it will be supporting heterogeneous targets in the next release of PowerCenter.

Ascential DataStage maintains a single, top-down design paradigm and a single GUI regardless of where the
resulting processing will actually occur. Data transformations are presented to the developer/user through the
DataStage Designer Canvas and can also be viewed through the metadata management component.
DataStage concentrates on the ETL viewpoint of the data integration process, while the metadata
management functionality shows cross-tool information, including schema design tools, ETL tools and
reporting/query tools. Comments from DataStage developers were generally positive about the usability of
the design workstation.

SAS Warehouse Administrator (WA) guides the developer through the interface to set up data sources, targets
and transformation processes, and provides a single point of control. One SAS WA user applauded the tool’s
support for multiplatform source-to-target conversion, its code generation, which cuts development time,
and the ease with which its source-to-target diagram shows the true complexity of the existing environment
to non-technical users. The weaknesses cited, however, included the time it takes
to find and correct coding or logic errors across steps where the coding is repeated in more than one step. “It
is faster to use a search utility in our non-warehouse environment to do this task today, and this creates
frustration for the developers who use the warehouse,” the user said.

Transformations
Informatica PowerCenter comes bundled with a set of 80 transformation functions that are built into the
product. In an apparent knock at Informatica and Oracle Warehouse Builder (OWB; see Planning
Assumption, Criteria for Selection: ETL Technology, Part 2, Lou Agosta), data is not staged in a database
of any kind or for any purpose (e.g., running transformation scripts, hashing, lookups, aggregates), eliminating
processing overhead such as that imposed by hashing algorithms. While Informatica PowerCenter is not a code
generator, it does have capabilities for defining business rules as part of its metadata management
capabilities. Those rules can be used to manage sessions, transformations, mappings, etc. Finally,
Informatica publishes an open metadata exchange format, the MX2 API, which allows users to import metadata
from data modeling and business intelligence (BI) tools.

The Ascential DataStage Basic Data Transformation language (similar to Visual Basic) allows highly
complex transforms to be developed if a suitable one cannot be found among the 300-plus transforms that
ship as standard with the product or among those already developed by other developers, consultants
or users. The entire data integration process can be completed within the DataStage Designer
environment, ranging from simple to highly complex transformations and including every step of the
development process.

When SAS’ use of its own Base SAS statistics functions is included, SAS has 11,000 different
transformations from which to choose. SAS WA generates SAS procedural code “under the covers” and is
generally classified as a “code-generating tool” as are OWB and DataStage XE/390 (DataStage XE is an
engine). According to one SAS user with whom Giga has spoken, SAS suffers from an abundance of
functionality: “SAS is easy to get ‘up and running’ on quickly. However, there is so much functionality that it
can take many years to feel that one has mastered it completely.” Thus, finding the best solution is sometimes
difficult because SAS provides so many ways to tackle a problem, the user said. Another respondent faulted
SAS’ rigidity when it comes to making changes dynamically. “Every time we need to modify the column or
columns, we need to create two tables on the original and one on the new one. This requires twice the
storage,” the user said.

Planning Assumption ♦ RPA-052002-00004 ♦ www.gigaweb.com

Interoperability
See the table below for details on the interoperability of Informatica, Ascential and SAS WA with other tools.

Ascential acquired data quality vendor Vality in April 2002 (see IdeaByte, Ascential Validates Data Quality
With Vality Acquisition, Lou Agosta).

SAS is using DataFlux technology in SAS software and has purchased the company; it now provides an
interface to it through BlueFusion, the data quality software development kit (SDK) (see IdeaByte, DataFlux
Agrees to Be Acquired by SAS Institute, Lou Agosta).

Metadata
Informatica has developed an object-based metamodel of its repository using the Unified Modeling Language
(UML) standard. This metamodel was developed in cooperation with the Open Information Model (OIM)
standard from Microsoft and the Meta Data Coalition (MDC), which is a participant in the Common
Warehouse Metamodel (CWM) of the Object Management Group (OMG). Informatica is now a member of
the OMG and is focusing on utilizing the Extensible Markup Language (XML) Metadata Interchange (XMI)
standard for exchanging metadata with external applications. Informatica is also planning to support user-
defined extensions for the metadata objects in its repository to provide a more open and extensible
architecture for metadata integration and management within and across enterprises. Informatica metadata is
stored in a relational database with the user’s choice of Oracle, Sybase, Informix, MS SQL Server or DB2
UDB. Informatica reportedly has patented technology to allow the creation of a hub and spoke data
warehouse deployment so that metadata can be distributed globally to any spoke of the hub and spoke
distributed data warehouse. Technical and business metadata are stored in various tables for various
repository objects, such as source, target, transformation, mapping, etc. Informatica has developed a complete
set of XML document type definition (DTD) rules for validating and exchanging metadata in its repository.
The DTD can be used by the Informatica client tools as well as the Informatica Metadata Exchange API to
import and export metadata in XML files.

Ascential DataStage XE metadata management supports a full complement of business and technical
metadata across the information asset management spectrum, including, but not limited to, data modeling,
ETL design, ETL processing, BI and online analytical processing (OLAP). When ETL and data
transformation metadata is imported to DataStage XE, it is displayed in impact analysis and data lineage
diagrams that connect metadata from modeling tools and BI tools to the ETL metadata. DataStage XE
provides end-to-end metadata management for the enterprise. Metadata integration is provided through
semantic metadata integration via a logical integration architecture, bidirectional metadata translation among
tools through DataStage XE MetaBrokers, and integration of design and event metadata for ETL data.
Metadata analysis supports cross-tool impact analysis to manage change across the whole environment (not
just for ETL), data lineage analysis to determine when and how data assets populate warehouses and marts,
and built-in or customizable metadata queries and reports. Metadata sharing and reuse are provided via a
publish-and-subscribe model: metadata can be defined once and reused throughout the suite of integrated
tools, with automatic notification to subscribers when changes occur. Metadata delivery
occurs via online documentation of any collection of metadata in .html, .xml, .rtf, .txt and .csv formats and
automatic metadata propagation to DataStage XE Portal Edition. Ascential MetaBrokers provide metadata
import/export to DataStage XE for data modeling tools, BI/OLAP tools and other tools on a customized
basis. Current MetaBrokers include: ERwin, PowerDesigner, Oracle Designer 2.1.2 and 6i, ER/Studio,
DataStage, Cognos Impromptu, Business Objects, Brio, MicroStrategy, Hyperion Essbase and a
MetaBroker View for the CWM model.
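
The cross-tool impact analysis described above can be pictured as a walk over a lineage graph. The following is a minimal sketch only, assuming a hypothetical set of metadata objects and "feeds" relationships; the object names and the impacted_by function are invented for illustration and do not represent the DataStage XE repository or any MetaBroker interface.

```python
from collections import deque

# Hypothetical lineage edges (assumption): each key feeds the objects listed as its value.
LINEAGE = {
    "model.customer": ["etl.load_customer"],
    "etl.load_customer": ["mart.customer_dim"],
    "mart.customer_dim": ["report.churn", "report.revenue"],
    "etl.load_orders": ["mart.order_fact"],
    "mart.order_fact": ["report.revenue"],
}


def impacted_by(start, lineage):
    """Return every downstream object reachable from `start`, sorted by name."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in lineage.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return sorted(seen)


# Which ETL jobs, marts and reports would a change to the customer model touch?
print(impacted_by("model.customer", LINEAGE))
```

Reversing the direction of the edges would give the complementary data lineage question: which upstream sources populate a given mart or report.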

SAS WA captures the business and technical metadata from the ETL process. The existing metadata
repository supports the use of other databases, and many instances can be cited of sites using another
RDBMS, e.g., Oracle. SAS is a participant in the CWM. In addition, SAS partners with Meta Integration
Technologies Inc. to bridge WA with some 50 metadata products. Version 3.0, planned for the third
quarter of 2002, includes a complete rewrite of WA. A Java-based thin client will support large-scale
development (multi-user development, better version control and change management capabilities and
significant GUI enhancements). Both technical and business rules are supported by the product. Business
rules are typically SQL based, and technical process metadata is stored hierarchically. Release 2.2 of
SAS/Warehouse Administrator offers a process automation capability that uses metadata to navigate and
query warehouse objects. This new feature also uses metadata to automatically enable other ETL processes
and to drive reporting applications, such as SAS’ Enterprise Reporter software.

Performance (Proven Scalability)
The Informatica PowerCenter Server engines are multi-threaded and exploit a pipelined architecture. They
take advantage of the added performance of symmetrical multiprocessing (SMP) platforms to scale to
enterprise demands. The multi-threaded engine allows multiple jobs to run concurrently, while the pipelined
architecture allows the reading, transformation and writing processes of each job to execute concurrently. The
engine is tunable with features such as processor/memory optimization, one-pass source linkage, high target-
driver (non-SQL) performance, caching lookup tables in memory and ability to optimize select statements.
Giga spoke with one user of PowerCenter who reported that aggregators, joiners and lookups are memory
intensive, consume system resources and slow down sessions. However, other users have reported inserting
hundreds of rows per second into Oracle databases on Sun hardware. Another PowerCenter user told Giga
that the amount of data the engine can transform per hour depends greatly on the amount of work
required, the length of each row and how busy the server is. Some straightforward mappings process in
excess of 3,000 rows per second, the user said; others that require decimal precision and many
transformations on each row can run as slowly as 50 rows per second. “When we started, we only ran
PowerCenter on one server and processed only a few hundred megabytes of data daily. We’ve been able to
increase our processing power, and we’ve seen PowerCenter take advantage of those resources quite
successfully,” the user said.
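
The pipelined idea described above, in which the reading, transformation and writing of a job overlap rather than run one after another, can be illustrated with a small sketch. This is not PowerCenter code; it assumes three hypothetical stages (reader, transformer, writer) connected by bounded in-memory queues, with a stand-in transformation that upper-cases a name field.

```python
import queue
import threading

SENTINEL = object()  # marks the end of the row stream


def reader(rows, out_q):
    """Read stage: push source rows downstream, then signal completion."""
    for row in rows:
        out_q.put(row)
    out_q.put(SENTINEL)


def transformer(in_q, out_q):
    """Transform stage: a stand-in rule (assumption) that upper-cases the name."""
    while True:
        row = in_q.get()
        if row is SENTINEL:
            out_q.put(SENTINEL)
            return
        out_q.put({**row, "name": row["name"].upper()})


def writer(in_q, target):
    """Write stage: append transformed rows to an in-memory target."""
    while True:
        row = in_q.get()
        if row is SENTINEL:
            return
        target.append(row)


def run_pipeline(rows):
    """Run the three stages concurrently, connected by bounded queues."""
    q1, q2, target = queue.Queue(maxsize=100), queue.Queue(maxsize=100), []
    stages = [
        threading.Thread(target=reader, args=(rows, q1)),
        threading.Thread(target=transformer, args=(q1, q2)),
        threading.Thread(target=writer, args=(q2, target)),
    ]
    for stage in stages:
        stage.start()
    for stage in stages:
        stage.join()
    return target


source = [{"id": 1, "name": "acme"}, {"id": 2, "name": "globex"}]
print(run_pipeline(source))
```

Because each stage hands rows to the next as soon as they are ready, the writer can begin loading before the reader has finished, which is the property the pipelined architecture is meant to exploit. The bounded queues keep a fast stage from running arbitrarily far ahead of a slow one.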

Ascential acquired Torrent Systems in November 2001, providing DataStage with the performance options of
extended parallelism. Leveraging the Torrent parallel processing technology for very high data volume and/or
short batch processing windows on SMP, cluster SMP and massively parallel processing (MPP) platforms
extends DataStage XE’s ability to scale into very large data integration projects (see IdeaByte, The Case for
an ETL Benchmark, Lou Agosta).

According to one operator of SAS ETL, “(Our) initial experience was disappointing, but when this
application was embedded in SAS as data quality — cleanse solution — the performance has been much
better.” Because SAS is not a multithreaded application, MPConnect and SPDS are needed to truly exploit
the operating environment, the user reported. “We have increased the throughput from 10 records per second
to about 500 records per second. The solution is highly scalable and can handle big files (even up to
300GB per file),” the user said. SAS states it has partnered with Platform Computing to leverage its
distributed resource management (DRM) capability. DRM is incorporated into Platform’s JobScheduler to
effectively assign resources for job management. It identifies and allocates resources to manage performance.
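
To put the reported throughput figures in terms of a batch window, the arithmetic can be sketched as follows. The rates of 10 and 500 records per second come from the user quoted above; the batch size of one million records is a hypothetical value chosen only for illustration, and the function name is invented.

```python
def hours_to_load(total_records, records_per_second):
    """Elapsed hours needed to process total_records at a steady rate."""
    return total_records / records_per_second / 3600


TOTAL = 1_000_000  # hypothetical batch size (assumption), not from the survey

before = hours_to_load(TOTAL, 10)   # rate the user reported initially
after = hours_to_load(TOTAL, 500)   # rate the user reported after the change
print(f"before: {before:.1f} h, after: {after:.2f} h, speedup: {before / after:.0f}x")
```

Under these assumptions the same batch drops from roughly 27.8 hours to roughly 0.56 hours, a 50-fold improvement, which is the difference between missing and comfortably fitting a nightly window.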

Vendor Service and Support
Informatica owners and users expressed satisfaction with the level of service, though some caveats showed
up. A typical report complimented Informatica on dramatic improvements in PowerCenter support in the past
few years. “We are very satisfied with the help we receive from them. They’ve also added significant training
classes and enhanced their documentation to make getting the most out of the product easier,” one user said.
However, users also pointed out that Informatica could benefit from improved communication with
customers, e.g., keeping customers consistently “in the loop.” In one example known to Giga, an early
version of CA Advantage Data Transformer beat out Informatica because neither tool had an out-of-the-box
transformation needed to perform round robin aggregation, and the CA consultants stopped arguing about the
need to use a predefined transformation and won the account by coding the solution as a reusable transform
in the proprietary CA ADT scripting language.

Ascential received the following responses from owners and users of DataStage. One typical comment
praised the company’s responsiveness to its customers: “We chose DataStage through an RFP process in 2000 and
have experienced mixed results. None of the tools are really there yet, and with our requirement to support
bilingual data, we could not look at certain prominent tools that simply did not meet that mandatory
requirement. Ascential appears to be listening to their customers and is incorporating suggested changes to
their tool, and it is maturing well.” Another client commented, “I have worked with 30 software vendors over
seven years; and the service we have from Ascential is the best I have ever received.” The conclusion is that
Ascential has some very satisfied clients.

SAS received the highest compliments of all the ETL vendors about which Giga has had conversations with
clients. One client said, “SAS has one of the best tech support systems. Most of the issues are resolved in less
than 24 hours. SAS has significantly subsidized its cost for taking SAS classes at its training facilities,
encouraging companies to send their people for highly adaptive classroom instructions.” Another client
enthused, “SAS’ product support is, in my opinion, unparalleled. Most problems are resolved on the phone
immediately, some within the same day. Rarely, it has taken two to three days. … Training is the best I have
ever seen, bar none.” The client praised the training course manuals that allowed the company to save
training budget money by sending just one person to a class and using the manual to train the rest of the staff.
“We feel that this gets us at least 90 percent of taking the course in person,” the client said. Another client
lauded the course materials, and SAS’ “outstanding” technical support and professional services for
facilitating the transition in using SAS Warehouse Administrator.

Alternative View
The ETL market has been described as a mature market. Even if that is so, it is a mature market about to
experience two discontinuities. The first of those discontinuities is the dawning appreciation at the high-
volume end that a hub and spoke (“data hub”) architecture is orders of magnitude more efficient and
manageable than point-to-point solutions as provided by individual ETL tools. In spite of a certain weakness
in metadata support, the enterprise application integration (EAI) vendors (e.g., SeeBeyond, NEON, TIBCO,
MQSeries) will seize the high ground and make real-time data warehousing the new paradigm. Significant
consulting services will be combined with powerful data hub architecture to provide efficient and flexible
many-to-many data integration solutions. The second discontinuity, at the medium- and low-volume end of the
market, is the realization that Microsoft DTS will succeed in transforming the market into a commodity one. This is
especially true when combined with improved scalability, reliability and availability of NT and the successful
proliferation of the CWM standard among allied tool vendors in the design, development and operational
processes markets. The net result will be a solution to the problem of system interoperability with
significantly reduced implementation time and measurably improved total economic impact.
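
The efficiency argument for a hub and spoke architecture over point-to-point integration rests largely on how the number of interfaces grows with the number of systems. The sketch below is a standard counting argument rather than a Giga calculation; it assumes the worst case in which every system must exchange data with every other system and counts each bidirectional interface once. The function names are invented for illustration.

```python
def point_to_point_links(n):
    """Interfaces needed when each of n systems connects directly to every other one."""
    return n * (n - 1) // 2


def hub_links(n):
    """Interfaces needed when each of n systems connects only to a central hub."""
    return n


# Compare the two topologies for a small, medium and large number of systems.
for n in (5, 20, 100):
    print(n, point_to_point_links(n), hub_links(n))
```

For 20 systems this gives 190 point-to-point interfaces against 20 hub connections, and the gap widens quadratically as systems are added. Real environments are rarely fully meshed, so the practical advantage depends on how many systems actually need to exchange data.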

Findings
Informatica PowerCenter has acquired its best-of-breed reputation by providing comprehensive integration to
virtually all data sources. In conjunction with Informatica PowerConnect products, Informatica PowerCenter
integrates enterprise, relational and open data sources, such as enterprise resource planning (ERP), customer
relationship management (CRM), procurement, XML, real-time messaging, clickstream, mainframe, AS/400
and legacy systems (see Planning Assumption, Market Overview Update: ETL, Lou Agosta).

Although Ascential’s market share slipped in 2001 due to management distractions with the Informix
divestiture, DataStage XE still deserves its appellation as a “best-of-breed ETL tool.” Ascential’s acquisitions
of Torrent (parallel performance), Vality (data quality) and MetaRecon (data profiling) put it in a strong
position to leapfrog the competition via software integration, a competence in which Ascential has
demonstrated results. During the next six to 12 months, these technologies have the potential to further
differentiate the capabilities of these two approaches as Ascential further integrates data quality and
performance into the core of DataStage. Ascential was also profitable in 2001, whereas Informatica was not,
with analytic applications reportedly accounting for 8 percent of revenues and 30 percent of costs.

SAS has a market penetration that is second to none among the ETL vendors discussed in this series. SAS
products are used at more than 38,000 sites — including 99 of the top 100 businesses on the Fortune 500 —
to analyze and leverage relationships with customers and suppliers, to substitute information for inventory
and to enable end-to-end business intelligence applications. SAS has a very strong end-to-end solution that
integrates leading data warehousing, analytics and traditional BI applications to create intelligence from
massive amounts of data. The SAS Warehouse Administrator product will receive new life from a refresh
scheduled in the second quarter of 2002, though additional time will be useful to determine the success of
this timely new release. When questioned about the level of service provided to users of SAS WA, clients
consistently praise the vendor for superior responsiveness and service, a reply that provides the basis for
understanding the SAS loyalty effect.

SAS WA offers the richest set of transformation options of the three contenders. Indeed, the options are so
varied and complex that one end user found the sheer possibilities to be a liability. However, this is likely
an issue that proper training and on-the-job experience can address. Limitations also
exist in terms of SAS’ proprietary data format. So, a trade-off definitely exists between rich functionality and
the relative lack of openness. Many power users will choose the former to empower the deep analysis needed
for their complex analytic applications.

Recommendations
Ascential and Informatica are the two top best-of-breed contenders in the market. Until recently their market
share was neck and neck, though Informatica has now pulled ahead. In regard to the technology, the
competition is close and intense. The vendor drama should not distract users from the solid capabilities of
each of these choices. If a client wants to obtain data profiling, data quality and parallel processing
technology from the same source, then Ascential’s recent acquisitions arguably provide it with an edge.
Clients should use the intense competition between these two leading best-of-breed contenders to bargain for
concessions such as additional training, premium support, price discounts or additional functionality.

SAS Warehouse Administrator belongs on the short list of installations with significant SAS expertise. SAS
has a vast installed base and its clients that operate an end-to-end SAS solution, including the proprietary data
server, are among the most satisfied of its clients, though not all the components are best-of-breed. While the
current version SAS WA 2.2 is due for a refresh, the good news is that one is shipping in the second quarter
of 2002, though some lead time will be needed to see how the market judges the result. SAS is second to
none in terms of service and customer support and is clearly differentiated by its ability to get to know its
clients and build long-term win-win relations with them.

References
Related Giga Research
Planning Assumptions
Criteria for Selection: ETL Technology, Part 2, Lou Agosta
Market Overview Update: ETL, Lou Agosta
Market Overview: ETL in Transition, Lou Agosta
Emerging Internet Data Integration Solutions, Mike Gilpin

IdeaBytes
Ascential Validates Data Quality With Vality Acquisition, Lou Agosta
Oracle Warehouse Builder Offers Study in Constraints and Value, Lou Agosta
Sunopsis: Another Extract, Transform and Load Product With Enterprise Application Integration Aspirations,
Henry Peyret

Four Criteria for ETL Tools: Ascential, Informatica and SAS WA

Vendor: Ascential
Interoperability:
  Data quality: Vality (ASC owned), Trillium, FirstLogic
  Design: ERwin, PowerDesigner, Oracle Designer 2.1.2 and 6i, ER/Studio
  BI: Cognos Impromptu, Business Objects, Brio, MicroStrategy, Hyperion Essbase and others
  Parallel processing: Torrent Orchestrate (ASC owned)
  MetaBroker View for the CWM model
Data Sources: IBM DB2 UDB, DB2/OS390, DB2/AS400, Oracle 8i, Oracle 8, Oracle 7, Oracle Express, Informix
  XPS, Sybase, MS OLE-DB, SQL Server, PeopleSoft, Siebel, SAP, XML, Universe, Unidata, text files (fixed,
  delimited, etc.), complex flat files, RedBrick, NCR/TeraData, MQSeries, POP3 Web Logs, EDA, Adabas,
  change data capture
Execution Platforms:
  Windows NT, 2000
  Unix: Sun Solaris, IBM AIX, HP-UX, Tru 64 and Linux
  Mainframe: OS/390
Pricing: DataStage XE 5.0 $180,000. Additional components are priced and packaged separately.

Vendor: Informatica
Interoperability:
  Data Quality: Trillium, Evoke and First Logic
  BI: Business Objects, Brio and MicroStrategy
Data Sources: IBM DB2, Informix, MS SQL Server, NCR Teradata, Oracle, Sybase, Flat Files, IMS, VSAM,
  MS Access, ODBC
  PowerConnect: SAP, PeopleSoft, Siebel, IBM MQSeries, mainframe and AS/400, XML, real-time messaging,
  clickstream
  PowerBridge: Hyperion Essbase
  PowerPlug: ERP application metadata
Execution Platforms: IBM AIX, HP-UX, Sun Solaris, Compaq Tru64, MS Windows NT Server
Pricing: PowerCenter starts at $93,500. Informatica PowerMart starts at $60,500 on NT and $88,000 on Unix.

Vendor: SAS WA
Interoperability:
  Data Quality: DataFlux (SAS owned)
  Design: ERwin, Oracle Designer and PowerDesigner
  BI: SAS Enterprise Miner, SAS Enterprise Guide, SAS Internet Reporting
  Partners with Meta Integration Technologies Inc. to bridge WA with some 50 metadata products
Data Sources: 50 different Access engines. DB2 under OS/390, DB2 under VM, DB2 under Unix or PC,
  CA-OpenIngres, Informix, ODBC, OLE DB, Sybase, MS SQL Server, Teradata, Oracle, Oracle Rdb, Adabas,
  CA-Datacom, CA-IDMS, IMS-DL/1, PC File formats, System 2000, Baan, PeopleSoft, SAP R/3, SAP BW*
Execution Platforms: HP-UX, Sun-Solaris, IBM-AIX, Tru64, Windows 98, NT, 2000, and Linux
Pricing:
  $43,300
  Base SAS: approx. $3,000-$45,000
  Access engines: $15,000-$30,000 each
  SAP $55,000

Source: Giga Information Group

* There are so many options and combinations that SAS offers a guide on its Web page to assess your desired
combination. This application provides information about the relationship between your operating
system, your DBMS and your SAS release. For example: I have SAS v8.1 and Oracle 8.0.4 on HP-UX. Will
SAS support an Oracle upgrade to 8.1.6? Go to www.sas.com/service/techsup/access/searchPage.hsql and
plug in the options. The answer is yes. Note that extra license fees apply to each Access engine.
