
Snapshot review of data architecture tools

The following short report is an incomplete survey of the marketplace for data loading, data analysis and data cleaning tools. This
brief review is being published early so that others may use it to inform their own research. This is in line with the IDMAPS team's
strategy of publishing information as soon as we think it might have value to the community. The IDMAPS project will be selecting a
toolkit in this space and will describe our reasons once a final decision has been made.
The data architecture field seems to fall between three marketing labels, with vendors deliberately blurring the boundaries so that
their product fits whichever term has value with the customer. Tools in Data Warehousing, Extract Transform Load (ETL), and
Business Intelligence (BI) all appear to have some relevance to data architecture. While Business Intelligence revolves around data
reporting, it tends to be cross-marketed with data loading and transformation tools. Overall, Extract Transform Load is the term
most often used in this space, and the marketplace for these tools seems relatively mature. Below is a brief summary of our research
so far. Products are listed in current order of preference, though this order can and most likely will change as we look at the
tools in more depth. We have deliberately excluded tools that require their own bespoke hardware, as they are likely to be expensive
and overkill for this situation.

ETL toolkits: brief survey of the marketplace


Toolkit: Pentaho Data Integration (Kettle)
URL: http://www.pentaho.com/
Comments: Has good demos and a variety of screencasts explaining how it works. Pentaho is reviewed well in a comparison of ETL
tools at http://www.pentaho.com/docs/informatica_pentaho_etl_tools_comparison.pdf
A quick two-hour test drive was somewhat disappointing: we couldn't find an easy way of getting it to sanity check the flat file it
was importing (the file had blank lines at the start).

Toolkit: Talend
URL: http://uk.talend.com/index.php
Comments: Talend has a data synchronizer tool and claims to have direct access to SAP. It appears to be a good match for our current
data model (flat files) and the demo looks good. It is well reviewed by InfoWorld.

Toolkit: Clover ETL
URL: http://www.cloveretl.com/
Comments: Looks viable: the ETL tool is free, but a server to run it "with higher performance" is paid for (price on asking) and a
GUI designer tool is also paid for (450 USD per head).

Toolkit: Apatar
URL: http://www.apatar.com/
Comments: Looks a viable tool. The demos on the website are not closely related to our area, and it seems to focus on integration
with packages like Salesforce.

Toolkit: Datacleaner
URL: http://datacleaner.eobjects.org/
Comments: Mainly focuses on cleaning data and loading it into a database; has good screencasts on its website.

Toolkit: Red Hat (Metamatrix)
URL: http://www.redhat.com/metamatrix/
Comments: SOA for building data web services; may not be that good at supporting our legacy of file-based bulk transfer.

Toolkit: SAGA.M31 Galaxy
URL: http://galaxy.sagadc.com/
Comments: Terms and conditions are only in German and have yet to be translated, hence ruled out. It also looked web service heavy.

Toolkit: ChainBuilder
URL: http://www.chainforge.net/community
Comments: ChainBuilder ESB is a Java Business Integration (JBI) compliant product which consists of a set of Eclipse GUI plug-ins,
runtime server components and a web-based admin console. Its primary uses are in Service Oriented Architecture (SOA) environments
and Enterprise Application Integration (EAI).

Toolkit: Jitterbit
URL: http://www.jitterbit.com/Product/index.php
Comments: Jitterbit is a data and application integration suite available as open source. It was developed to give business users
a quick, cost-effective and simple way to configure, test, deploy and manage integration solutions. However, it has a limited number
of connectors and none for messaging.
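
The flat-file problem noted in the Pentaho test drive (blank lines at the start of the import file) is the kind of check that could
be run before a file reaches any ETL tool. The following is a minimal Python sketch and is not taken from any of the tools surveyed.
The function names strip_leading_blank_lines and check_flat_file, the comma delimiter, and the assumption that the first non-blank
line is a header row are all illustrative assumptions rather than details of our actual data feeds.

```python
import csv
from typing import Iterable, Iterator, List


def strip_leading_blank_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield lines, dropping blank lines that come before the first non-blank one."""
    started = False
    for line in lines:
        if not started and not line.strip():
            continue  # still inside the leading blank region
        started = True
        yield line


def check_flat_file(lines: Iterable[str], delimiter: str = ",") -> List[str]:
    """Return a list of problems found; an empty list means the file looks sane.

    Assumption: the first non-blank line is a header, and every later row
    should have the same number of fields as that header. Row numbers count
    from the header line (row 1), after leading blank lines are dropped.
    """
    problems: List[str] = []
    reader = csv.reader(strip_leading_blank_lines(lines), delimiter=delimiter)
    header = next(reader, None)
    if header is None:
        return ["file contains no data"]
    for row_number, row in enumerate(reader, start=2):
        if not row:
            problems.append(f"row {row_number}: blank line")
        elif len(row) != len(header):
            problems.append(
                f"row {row_number}: expected {len(header)} fields, found {len(row)}"
            )
    return problems
```

Because check_flat_file accepts any iterable of strings, it can be passed an open file object (opened with newline="" as the csv
module recommends) or an in-memory list of lines. A file would only be handed to the loading tool when the returned list is empty.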

Many big commercial vendors also have offerings in this space:

SAP NetWeaver Process Integration https://www.sdn.sap.com/irj/sdn/nw-pi71 is a toolkit which supports a broad set of ETL and
SOA techniques; however, its licensing model may rule it out as a toolkit for data loading within our institute.

Tibco http://www.tibco.com/ also has a well-regarded toolkit in this space, but again licensing may be an issue.

IBM WebSphere http://www-01.ibm.com/software/websphere/ is also relevant, particularly WebSphere Transformation Extender.

Oracle has its data warehousing product http://www.oracle.com/technology/products/warehouse/index.html

There are many more vendors with their own offerings in this space.
