Professional Documents
Culture Documents
- .j
References [Artifact!ReferentAr(ifact!Rcfe . . . . Type[ S ..... ]
DataType].~.¢, I File ]LineNo]
Mapping Engine The mapping engine supports [MHH00]. For example, when value correspondences
the creation, evolution and maintenance of mappings are added, deleted or modified within the correspon-
between pairs of schemas. A mapping is a set of dence engine, the mapping engine uses the new cor-
queries from a source schema to a target schema that respondences or modification to update the mapping.
will translate source data into the form of the target Similarly, in order to verify a particular mapping, the
schema. Clio produces a mapping (or set of alterna- mapping engine may invoke the schema engine to ver-
tive mappings) that are consistent with the available ify whether a specific constraint holds in a schema.
correspondences a n d schema information. The map-
ping engine is therefore using information gathered
by both the schema engine and the correspondence 4 A Data Warehouse Example
engine. As with the correspondences and schemas
To illustrate our approach, we present an example
constructs suggested by Clio, mappings are verified
based on a proposed software engineering warehouse
using the data view to help users understand Mter-
for storing and exchanging information extracted
native mappings. Users see example data from se-
from computer programs [BGH99]. Such warehouses
lected source tables and the contents of the target as
have been proposed both to enable new program
they would appear under the current mapping. Ex-
analysis applications, including data mining appli-
amples are carefully chosen to both illustrate a given
cations [MG99], and to promote data exchange be-
mapping (and the correspondences it uses) and to
tween research groups using different tools and soft-
illustrate the perhaps subtle differences between al-
ware artifacts for experimentation [HMPP~97]. Fig-
ternative mappings [YMHF01]. For example, in some
ure 2 depicts a portion of a warehouse schema for
cases, changing a join from an inner join to an outer
this information. This schema has been designed to
join may dramatically change the data produced by
represent data about a diverse collection of software
the mapping. In other cases, the same change may
artifacts that have been extracted using different soft-
have no effect due to constraints that hold on the
ware analysis tools. The warehouse schema was de-
schemas.
signed to be flexible and uses a very generic represen-
To permit scalability and incremental invocation tation of software data as labeled graphs. Conceptu-
of the tool, we also permit (partial) mappings to be ally, software artifacts (for example, functions, data.
read and modified. Such mappings may be created by types, macros, etc.) form the nodes of the graph. As-
a former session with Clio or by another integration sociations or references between artifacts (for exam-
tool. For example, a user may have used Clio to map ple, function calls or data references) form the edges.
a source and target schema. At a later time, after Two of the main tables for artifacts and references
the source schema has evolved, the user may again are depicted in the figure.
invoke Clio to create a mapping from the modified As new software analysis tools are deveioped, the
source to the target. The old mappings may be read data from these tools must be mapped into this inte-
in and used as a starting point for the mapping pro- grated schema. In Figure 2, we also give a relational
cess. Modification is done using operations on data representation of an example source schema. This
examples, in the data view [YMHF01]. schema was imported using a wrapper built on top of
The mapping creation process is inherently inter- output files produced by a program analysis tool. The
active and incremental. Clio stores the current map- wrapper produces a flat schema with no constraints.
ping within its knowledge base and, through an in- Clio's schema engine is used to suggest a set of keys
cremental mapping discovery algorithm, allows users and foreign keys that hold on the data.. Foreign keys
to extend and refine mappings one step at a time are depicted by dashed lines. Key attributes are un-
Figure 3: Value correspondences used to m a p between the source schema the warehouse schema