Abstract—This paper first analyzes the problems of existing ETL tools and proposes an ETL service model based on metadata, and then summarizes the types of metadata and their application scope. Based on this ETL service model, a concrete ETL service framework is put forward, and many of its important services are discussed, such as the metadata management service, metadata definition service, ETL transformation rules service, process definition service, SQL code generation and optimization service, and process control service. Finally, the definition method and related algorithms of ETL rules are designed and analyzed. Practice has shown that the model and framework proposed in this paper can improve ETL efficiency to a large extent.

Keywords-ETL Services Framework; Metadata; ETL Rules

I. Introduction
ETL (Extract-Transform-Load) is the process of extracting data from a variety of heterogeneous data sources, transforming the extracted data into the required format, and then loading the data into the DW (Data Warehouse) [1]. ETL is not only the cornerstone and soul of building a data warehouse but also a necessary step in establishing the DW, so it plays an important role in the process of building a data warehouse. Under normal circumstances, ETL development accounts for 60% to 80% of the workload of developing the entire data warehouse system [DEMA97].

Besides hand-coding ETL, users can also implement the ETL process with existing ETL tools, such as IBM Visual Warehousing, Microsoft DTS, and Oracle Warehouse Builder. But these tools are difficult to operate, and mastering their rules and languages is very time-consuming. ETL designers are required to be familiar with data structures, ETL rules and operational processes. In addition, designers need not only to understand the overall ETL process but also to know the detailed definition of each concrete step. It is therefore very difficult to improve the efficiency of ETL process development. What is more, designers must redesign the ETL process when business rules change or when the data source or data destination is altered. For the majority of ETL tools, then, it is difficult to design ETL processes rapidly and to reuse rules.

In fact, there have been a large number of studies on how to implement ETL. For example, one view holds that workflow should be used to describe the ETL process [2], because workflow can separate the ETL process from view maintenance. The advantage of this approach is that it describes the ETL process better. Its shortcomings are that users cannot take advantage of the existing research results on view maintenance in the relational model, and that the graphical way of working is not suitable for ETL processes with a large number of rules. For instance, establishing the mapping rules between source data and destination data that contain many fields easily produces a complex mapping diagram. Another representative view, raised by Christof Bornhoved, uses a metadata-driven integration model to integrate Internet resources [3]. Its aim is to fully utilize the data warehouse metadata for data integration, but its drawback is that ETL processes and the interoperability of rule-based business are difficult to implement.

This paper combines the advantages of the above two ideas and puts forward an ETL service model based on metadata. That is, to implement the entire ETL process, designers only need to analyze the ETL process and describe the ETL metadata involved; SQL scripts are then generated automatically from this metadata. Eventually, all ETL processes can be released in a unified service format. The main advantage of this idea is that it makes maximum use of the research results of the mature relational model and maximizes the reuse of ETL processes, so that the ETL process is more flexible and has higher performance. Therefore, the focus of this study is the versatility and efficiency of the ETL engine, which can effectively extend and flexibly control the ETL process by using metadata. In order to take full advantage of the mature relational model, this study focuses mainly on the relational model.

II. Design of ETL Service Model
Metadata is data that describes other data. In different environments, metadata represents different types of data. The metadata in this study includes not only data warehouse metadata but also the metadata of data sources, transformation rules, extraction rules and workflow rules. Each kind of metadata in the ETL service model covers a different scope.
and then the rule definition service retrieves the metadata from
the metadata database. This metadata is displayed as
graphic elements, and the client can then use it to
define ETL rules and save them into the metadata database. The detailed
algorithm of rule definition is shown in Fig. 4.
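
The rule definition flow described above (read metadata, define a rule against it, save the rule, then generate SQL from it) can be illustrated with a minimal Python sketch. It is not the algorithm of Fig. 4, which is not reproduced here. The class and function names (MetadataStore, EtlRule, FieldMapping, generate_sql), the in-memory store standing in for the metadata database, and the sample table names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FieldMapping:
    """Maps one target column to a SQL expression over source columns."""
    target_column: str
    source_expression: str


@dataclass
class EtlRule:
    """A single-source transformation rule (hypothetical structure)."""
    name: str
    source_table: str
    target_table: str
    mappings: List[FieldMapping] = field(default_factory=list)
    filter_condition: Optional[str] = None


class MetadataStore:
    """In-memory stand-in for the metadata database (an assumption)."""

    def __init__(self, tables: Dict[str, List[str]]) -> None:
        self._tables = {name: list(cols) for name, cols in tables.items()}
        self._rules: Dict[str, EtlRule] = {}

    def columns(self, table: str) -> List[str]:
        """Return the column names recorded for a table."""
        return list(self._tables[table])

    def save_rule(self, rule: EtlRule) -> None:
        """Validate a rule against the stored table metadata, then save it."""
        if rule.source_table not in self._tables:
            raise ValueError(f"unknown source table: {rule.source_table}")
        if rule.target_table not in self._tables:
            raise ValueError(f"unknown target table: {rule.target_table}")
        if not rule.mappings:
            raise ValueError("a rule needs at least one field mapping")
        target_columns = set(self._tables[rule.target_table])
        for mapping in rule.mappings:
            if mapping.target_column not in target_columns:
                raise ValueError(f"unknown target column: {mapping.target_column}")
        self._rules[rule.name] = rule

    def load_rule(self, name: str) -> EtlRule:
        return self._rules[name]


def generate_sql(rule: EtlRule) -> str:
    """Build an INSERT ... SELECT statement from a saved rule.

    Expressions are interpolated as-is, so this sketch assumes the
    rule metadata comes from a trusted designer, not from end users.
    """
    columns = ", ".join(m.target_column for m in rule.mappings)
    expressions = ", ".join(m.source_expression for m in rule.mappings)
    sql = (
        f"INSERT INTO {rule.target_table} ({columns}) "
        f"SELECT {expressions} FROM {rule.source_table}"
    )
    if rule.filter_condition:
        sql += f" WHERE {rule.filter_condition}"
    return sql


if __name__ == "__main__":
    store = MetadataStore({
        "src_orders": ["id", "amount", "status"],
        "dw_orders": ["order_id", "amount_usd"],
    })
    rule = EtlRule(
        name="load_orders",
        source_table="src_orders",
        target_table="dw_orders",
        mappings=[
            FieldMapping("order_id", "id"),
            FieldMapping("amount_usd", "ROUND(amount, 2)"),
        ],
        filter_condition="status = 'PAID'",
    )
    store.save_rule(rule)
    print(generate_sql(store.load_rule("load_orders")))
```

Keeping the rule as plain data, separate from the SQL generator, mirrors the idea that designers only describe metadata and the script is derived from it; a different generator could target another SQL dialect without changing the saved rules.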
V. Conclusion
Based on the ETL service framework proposed in this paper,
an ETL prototype system has been developed and its
performance has been tested. In this prototype system,
Oracle 10g was used as the data warehouse and metadata