several products (e.g., Rational Data Architect (RDA),Rational Application Development
, and WebSphere In-tegration Developer
). IBM also supports at least twoETL tools: IBM WebSphere DataStage, and another inDB2 Warehouse Enterprise Edition.In this paper we describe Orchid, a system origi-nally designed to convert declarative Clio schema map-pings  into IBM WebSphere DataStage ETL jobs andvice versa. Orchid provides an abstract operator modelthat serves as a common model for both transformationparadigms; both mappings and ETL jobs are transformedinto instances of this common model. As an additionalbeneﬁt, instances of this common model can be opti-mized and deployed into multiple target environments.For example, instead of converting an ETL job into amapping, Orchid can rewrite the job and deploy it back as a sequence of combined SQL queries and ETL jobs.This rewrite and deployment of ETL jobs occurs auto-matically and reduces the workload of (highly expensive)ETL programmers.Mapping and ETL tools are aimed at different setsof users. In general, mapping tools are aimed at datamodelers and analysts that want to express, at a high-level, the main components of a data transformationor integration job. In this kind of scenario, declarativespeciﬁcations and simple GUIs based on lines work well. ETL tools are aimed at developers interested in theefﬁcient implementation of the data exchange/integrationtask. Since designers and developers work as a teamwhen implementing a task, collaboration is facilitatedif the tools used can interoperate. However, mappingand ETL tools do not directly interoperate, and usersoften require manual processes to support the followingfeatures:
Starting from declarative mappings, generate ETL jobs reﬂecting the mapping semantics, which canthen be further reﬁned by an ETL programmer.
Starting from an ETL job, extract a declarative map-ping that represents the logical aspects of the ETLoperations as a source-to-target schema mapping.
Support “round-tripping” for the different datatransformation representations, allowing incremen-tal changes in one representation to propagate intothe other.To illustrate the use of the above features, let us takea look at how Orchid’s capabilities are utilized in anindustrial product. Orchid’s technology is now part of
, a component of IBM Information Server
.IBM Information Server is a new software platformproviding a host of tools for enterprise information inte-gration, including IBM WebSphere DataStage. FastTrack uses IBM Information Server’s metadata repository tofacilitate collaboration between designers of a data inte-gration or data exchange application. For instance, toolsare provided for system analysts to enter relationshipsbetween data sources in a declarative way (e.g., as cor-respondences between two schema elements, as businessrules, etc.). These declarative speciﬁcations are capturedand stored as schema mappings. The mappings are oftenincomplete, only partially capturing the transformationsemantics of the application. In fact, some of the rulescan be entered in a natural language such as English.FastTrack converts these mappings into IBM Web-Sphere DataStage (ETL) job skeletons that contain someunresolved place-holder stages that are not completelyspeciﬁed. For example, an analyst might not know howto join two or more input tables, but FastTrack, neverthe-less, detects that the mapping requires a join and createsan empty join operation (no join predicate is created) inthe ETL job. Similarly, business rules entered in Englishare passed as annotations to the appropriate ETL stage.These annotations guide ETL programmers writing theactual transformation functions.Once the programmers are done reﬁning the generatedETL job, they communicate the reﬁnements back to theanalysts for review. The programmers can regenerate themappings based on the reﬁned ETL jobs; unless theusers radically modify the ETL jobs, the regeneratedmappings will match the original mappings but willcontain the extra implementation details just entered bythe programmers. In our example, the analyst will nowsee the join condition used for those input tables. In analternative scenario, users can convert existing ETL jobsinto a ﬂow of mappings and send them to analysts forreview.Converting between mappings and ETL systems raisesa number of challenges. The ﬁrst is the different levelsof abstraction between mappings and ETL jobs. Becausemappings are declarative speciﬁcations, they do notcapture (by design) the exact method by which theirtransformation semantics are to be implemented. Forexample, a mapping from two relational tables intoa target table is often speciﬁed with a join operationbetween the two tables. As with SQL and other declar-ative languages, mappings do not capture how this join