A Database Infrastructure
for Schema Manipulation
Philip A. Bernstein
Microsoft Research
[Figure: diagram labels: Order, Product, Scheduled Delivery, Salesperson; Business Process; Business Rules (Emp.Sal < Emp.Mgr.Sal); Update Marketing, Authorize Credit, Order Entry, Bill Customer, Schedule Delivery, Inventory; Table Defns]
Goals
Generic solutions
“Set”-at-a-time programming
A mapping is a model that represents a transformation
[Figure: element labels Dept#, Name, First, Last]
Sept. 15, 2003 © 2003 Microsoft Corporation 7
Models and Mappings
A model is a rooted directed graph, which
represents a complex information structure.
[Figure: map1 relates a Relational Schema and an XSD. Relational labels: Emp, E#, Dept#, Name. XSD labels: Emp, E#, Dept#, Name, First, Last.]
Or it could be a binary table (a morphism)
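The two definitions above (a model as a rooted directed graph, a mapping as a binary table) can be sketched in a few lines. This is only an illustration: the `Model` class, its method names, the nesting of First and Last under Name, and the pairs in `map1` are assumptions made for the example, not taken from the talk.

```python
from dataclasses import dataclass, field


@dataclass
class Model:
    """A model as a rooted directed graph (hypothetical representation)."""
    root: str
    edges: dict[str, set[str]] = field(default_factory=dict)

    def add_edge(self, parent: str, child: str) -> None:
        # Register the edge and make sure the child has an entry of its own.
        self.edges.setdefault(parent, set()).add(child)
        self.edges.setdefault(child, set())

    def elements(self) -> set[str]:
        """Return every element reachable from the root."""
        seen, stack = set(), [self.root]
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(self.edges.get(node, ()))
        return seen


# Relational side: table Emp with three columns.
relational = Model(root="Emp")
for column in ("E#", "Dept#", "Name"):
    relational.add_edge("Emp", column)

# XSD side: assumed here to nest First and Last under Name.
xsd = Model(root="Emp")
for child in ("E#", "Dept#", "Name"):
    xsd.add_edge("Emp", child)
xsd.add_edge("Name", "First")
xsd.add_edge("Name", "Last")

# A morphism: a binary table pairing elements of the two models (assumed pairs).
map1 = {("Emp", "Emp"), ("E#", "E#"), ("Dept#", "Dept#"),
        ("Name", "First"), ("Name", "Last")}
```

A richer mapping (a model in its own right) would carry expressions on its elements; the binary table keeps only which elements are related.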
Model Mgmt Algebra
map = Match(M1, M2)
<M3, map13, map23> = Merge(M1, M2, map)
map3 = Compose(map1, map2)
<M2, map12> = Diff(M1, map)
<M2, map12> = ModelGen(M1, metamodel2)
M2 = Copy(M1)
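To make the "set-at-a-time" flavor of these operators concrete, here is a minimal sketch of Match and Compose over morphisms represented as sets of pairs. The function names and the name-equality heuristic in `match_by_name` are assumptions for illustration; a real Match uses far richer evidence than element names.

```python
Morphism = set[tuple[str, str]]


def match_by_name(m1: set[str], m2: set[str]) -> Morphism:
    """Toy Match: pair elements whose lower-cased names are equal."""
    return {(a, b) for a in m1 for b in m2 if a.lower() == b.lower()}


def compose(map1: Morphism, map2: Morphism) -> Morphism:
    """Compose: relate a to c whenever map1 has (a, b) and map2 has (b, c)."""
    return {(a, c) for a, b in map1 for b2, c in map2 if b == b2}


# Hypothetical element sets for three models.
m1 = {"Emp", "E#", "Name"}
m2 = {"emp", "e#", "name", "dept#"}
m3 = {"Employee", "EmployeeID", "FullName"}

map12 = match_by_name(m1, m2)
map23 = {("emp", "Employee"), ("e#", "EmployeeID"), ("name", "FullName")}
map13 = compose(map12, map23)
```

Each call consumes and produces whole models or mappings, which is the point of the algebra: scripts are written over models, not over individual elements.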
Outline
Introduction to Model Management
Using MM to solve meta data problems
Matching anatomy ontologies
Model merging
Wrap-up
M1 map12 M2
Data translation
XML message translation for e-commerce
Integrate custom apps with commercial apps
Data warehouse loading (clean & transform)
[Figure: M1 and M2 related by map12; M3 related to M1 by map13 and to M2 by map23]
View integration
Data integration
M1 map12 M2
Design tools (ER → SQL)
Wrapper generation (SQL → OO or XML)
M1 map12 M2
[Figure: script over xsd2, xsd2′, rdb2, rdb3 and rdb4. Edge labels give the step that produces each mapping: 3. map4 (xsd2, rdb3); 4. map5; 5. map6 (xsd2′, rdb4); 6. map7; 7. map8 and map9 (rdb2). Earlier steps are only partly legible.]
Legible steps:
… ModelGen(xsd2′, SQL)
6. map7 = map4 • map5 • map6
7. <rdb2, map8, map9> = Merge(rdb3, rdb4, map7)
Complete Script in Rondo
Operator Definition: PropagateChanges(s1, d1, s1_d1, s2, c, s2_c)
1. s1_s2 = Match(s1, s2);
2. 〈d1′, d1′_d1〉 = Delete(d1, Traverse(All(s1) − Domain(s1_s2), s1_d1));
3. 〈c′, c′_c〉 = Extract(c, Traverse(All(s2) − Range(s1_s2), s2_c));
4. c′_d1′ = c′_c ∗ Invert(s2_c) ∗ Invert(s1_s2) ∗ s1_d1 ∗ Invert(d1′_d1);
5. 〈d2, c′_d2, d1′_d2〉 = Merge(c′, d1′, c′_d1′);
6. s2_d2 = s2_c ∗ Invert(c′_c) ∗ c′_d2 +
   Invert(s1_s2) ∗ s1_d1 ∗ Invert(d1′_d1) ∗ d1′_d2;
7. return 〈d2, s2_d2〉;
Operator Use:
SQLXSD: PropagateChanges(s1, d1, s1_d1, s2, ModelGen(s2, XSD));
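The selector expressions in steps 2 and 3 of the script are easier to read with a small worked example. The sketch below treats morphisms as sets of pairs and implements Domain, Range, Invert and Traverse under that assumption; the element names (Emp.Fax, FaxElem and so on) are hypothetical, and the functions are a reading of the operators as used in the script, not Rondo's actual code.

```python
Morphism = set[tuple[str, str]]


def domain(m: Morphism) -> set[str]:
    """Elements that appear on the left side of the morphism."""
    return {a for a, _ in m}


def range_(m: Morphism) -> set[str]:
    """Elements that appear on the right side of the morphism."""
    return {b for _, b in m}


def invert(m: Morphism) -> Morphism:
    """Swap the two sides of every pair."""
    return {(b, a) for a, b in m}


def traverse(selection: set[str], m: Morphism) -> set[str]:
    """Elements reached from the selection by following the morphism."""
    return {b for a, b in m if a in selection}


# Hypothetical data: s1 evolved into s2 (Fax dropped, Email added).
s1 = {"Emp", "Emp.Name", "Emp.Fax"}
s2 = {"Emp", "Emp.Name", "Emp.Email"}
s1_s2 = {("Emp", "Emp"), ("Emp.Name", "Emp.Name")}
s1_d1 = {("Emp", "EmpType"), ("Emp.Name", "NameElem"), ("Emp.Fax", "FaxElem")}
s2_c = {("Emp", "EmpType2"), ("Emp.Name", "NameElem2"), ("Emp.Email", "EmailElem2")}

# Step 2 selector: parts of d1 that came from elements deleted from s1.
stale_in_d1 = traverse(s1 - domain(s1_s2), s1_d1)

# Step 3 selector: parts of c that come from elements new in s2.
new_in_c = traverse(s2 - range_(s1_s2), s2_c)

# Following the inverted mapping leads back to the deleted source element.
back_in_s1 = traverse(stale_in_d1, invert(s1_d1))
```

In this toy case, Delete in step 2 would drop FaxElem from d1 and Extract in step 3 would keep EmailElem2 from c, which is what the later Merge stitches together.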
[Figure: CRM example. Heart hasStructuralComponent (h-S-C) ValveInHeart, shown as a reified relationship, compared with Heart and Cardiac valve; similarity annotation S=1.]
Anatomy Matching Algorithm
1. Lexical Match
• Normalize string, UMLS dictionary lookup,
convert to concept-ID from thesaurus
2. Structure Match
• Similarity(reified nodes)
= Average(neighbors)
• Back-propagate to neighbors
3. Align Super-classes
• Super-class similarity = average similarity of
children, grandchildren, great-grandchildren
• Adds 213 matches (to 3567)
[Figure: Heart and Cardiac valve compared with Heart h-S-C ValveInHeart; similarity annotations S=1 and S=2/3]
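The two averaging rules in steps 2 and 3 can be sketched as follows. This is an illustration under assumptions: the helper names, the tiny class hierarchy and all scores are invented for the example, back-propagation is left out because the slides do not say how it is done, and the 2/3 below merely echoes the value shown in the figure.

```python
def average(scores):
    """Mean of the scores, or 0.0 for an empty collection."""
    scores = list(scores)
    return sum(scores) / len(scores) if scores else 0.0


def descendants(children, node, max_depth=3):
    """Nodes 1..max_depth levels below node: children, grandchildren, ..."""
    found, frontier = [], [node]
    for _ in range(max_depth):
        frontier = [c for parent in frontier for c in children.get(parent, [])]
        found.extend(frontier)
    return found


# Step 2: a reified relationship node takes the average of its neighbors.
# Assumed neighbor similarities: two matched neighbors, one unmatched.
reified_score = average([1.0, 1.0, 0.0])

# Step 3: a super-class takes the average over up to three levels below it.
# Hypothetical hierarchy and per-class similarity scores.
children = {"Organ": ["Heart", "Lung"], "Heart": ["Cardiac valve"]}
score = {"Heart": 1.0, "Lung": 0.5, "Cardiac valve": 1.0}
below_organ = descendants(children, "Organ")
superclass_score = average(score[c] for c in below_organ)
```

Capping the depth at three levels keeps the super-class score local, so a match deep in the hierarchy does not influence classes far above it.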
Sept. 15, 2003 © 2003 Microsoft Corporation 27
Some Lessons
A common encoding of models is hard
and involves compromises
Different styles of reifying relationships
CRM stores transitive relationships
Match needs to invent generalizations
In FMA, arterial supply, venous drainage,
nerve supply, lymphatic drainage
In CRM, these all map to isServedBy
On big models, Match is expensive
Some steps required days to execute
Cross-product filled 80 GB (< 1GB input).
Outline
Introduction to Model Management
Using MM to solve meta data problems
Matching anatomy ontologies
Model merging
Wrap-up
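[Figure: two Emp models related by map are merged (⇒) into a single Emp; small graphs over nodes X, Y, Z and W with edges labeled a show alternative results]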
Successive fixups lead to different results
Batch them at the end, to get a unique minimal result
Now enrich the meta-model (containment, complex
mappings, …) & merge semantics (conflicts, deletes)
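Before the enrichments listed above, the simplest form of Merge can be sketched as collapsing elements that the mapping declares equal and keeping everything else. The code below is a toy under strong assumptions: models are plain sets of element names, the mapping is only a set of equalities (so it does not show the first-class mappings, relationships, conflicts or deletes that the talk goes on to discuss), and the `M1.`/`M2.` naming of merged elements is invented for the example.

```python
def merge(m1, m2, mapping):
    """Toy Merge: returns (m3, map13, map23).

    m1, m2: sets of element names; mapping: pairs (e1, e2) asserted equal.
    """
    # Union-find over elements tagged with the model they come from.
    parent = {("M1", e): ("M1", e) for e in m1}
    parent.update({("M2", e): ("M2", e) for e in m2})

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for e1, e2 in mapping:
        r1, r2 = find(("M1", e1)), find(("M2", e2))
        if r1 != r2:
            # Keep the smaller tuple as root, so M1 elements are preferred.
            parent[max(r1, r2)] = min(r1, r2)

    def label(x):
        model, name = find(x)
        return f"{model}.{name}"

    map13 = {(e, label(("M1", e))) for e in m1}
    map23 = {(e, label(("M2", e))) for e in m2}
    m3 = {merged for _, merged in map13 | map23}
    return m3, map13, map23


# Hypothetical input resembling the Emp/Employee example.
emp = {"Emp", "Emp#", "Name"}
employee = {"Employee", "EmployeeID", "FirstName", "LastName"}
map_ee = {("Emp", "Employee"), ("Emp#", "EmployeeID")}

m3, map13, map23 = merge(emp, employee, map_ee)
```

Because all equalities are applied before the result is built, the outcome does not depend on the order in which pairs are processed, which is in the spirit of batching fixups at the end.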
Resolving Merge Conflicts
[Figure: Emp (Emp#, Name) and Employee (EmployeeID, FirstName, LastName) related by map_ee, with elements numbered 1 to 11. Three kinds of conflict are labeled: Meta Meta Model Conflict, Model Conflict, and Meta Model Conflict. Result labels include Emp, Emp#, Name and FirstNameLastName.]
Contributions to Merge
[Pottinger & Bernstein, VLDB 03]
Generic correctness criteria for Merge
Use of first-class input mapping (not just
correspondences)
Taxonomy of conflicts & resolution strategies
Characterize when Merge can be automatic
A merge algorithm for an EER representation
Experimental evaluation