Systems
Baladevi C
28 Feb. 2017
Baladevi C 2017
OUTLINE
Top IT Spending Priorities
Real World Applications
Business
Science
Government
The Web: hundreds of millions of high-quality tables on the web.
Pretty much everywhere
Integration in data management: Evolution
Problems in integrating DBs
Integration in data management: Evolution
Data Integration System
Examples of Heterogeneity
Formal framework for data integration
Definition
A data integration system I is a triple ⟨G, S, M⟩, where
G is the global (mediated) schema,
S is the source schema,
M is the mapping between S and G.
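One way to picture the triple is as a small record holding the two schemas and the mapping between them. The Python sketch below is only an illustration under assumptions: the class name DataIntegrationSystem, its methods, and the particular mapping from Reviews to DB1 and DB3 are hypothetical; the relation and attribute names are borrowed from the movie example used elsewhere in these slides.

```python
from dataclasses import dataclass, field


@dataclass
class DataIntegrationSystem:
    """A data integration system as a triple (G, S, M)."""
    global_schema: dict   # G: relation name -> attribute names
    source_schema: dict   # S: relation name -> attribute names
    mappings: list = field(default_factory=list)  # M: (global relation, [source relations])

    def add_mapping(self, global_rel, source_rels):
        # A mapping may only refer to relations declared in G and S.
        if global_rel not in self.global_schema:
            raise KeyError(f"unknown global relation: {global_rel}")
        for rel in source_rels:
            if rel not in self.source_schema:
                raise KeyError(f"unknown source relation: {rel}")
        self.mappings.append((global_rel, list(source_rels)))

    def sources_for(self, global_rel):
        # All source relations that some mapping associates with global_rel.
        return sorted({s for g, rels in self.mappings if g == global_rel for s in rels})


# Hypothetical instance: the mapping itself is an assumption, not from the slides.
system = DataIntegrationSystem(
    global_schema={"Reviews": ("title", "rating", "description")},
    source_schema={"DB1": ("id", "title", "actor", "year"), "DB3": ("id", "review")},
)
system.add_mapping("Reviews", ["DB1", "DB3"])
```

Calling system.sources_for("Reviews") then lists the sources a query over Reviews would need to touch.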
Data integration Architecture[2]
A simple Example
Mediated Schema
Movie: title, director, year, genre
Actors: title, actor
Plays: movie, location, startTime
Reviews: title, rating, description
Challenges in DIS
Mapping[2]
Global As View
Local as View
Query Answering/Rewriting in GAV
Query
Find reviews for movies starring Bob
Query over Mediated Schema
q(title, review) :- MovieActor(title, 'Bob'), MovieReview(title, review).
Reformulated Query
q(title, review) :- DB1(id, title, 'Bob', year), DB3(id, review)
q(title, review) :- DB1(id, title, 'Bob', year), DB2(id, 'Bob', year), DB3(id, review)
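To make the first reformulated query concrete, here is a minimal Python sketch that evaluates it as a join of DB1 and DB3 on id. The rows are invented for illustration; the column order (id, title, actor, year) for DB1 and (id, review) for DB3 is read off the query atoms, and the function name reviews_for_actor is hypothetical.

```python
# Hypothetical source contents; the slides give no actual data.
DB1 = [  # (id, title, actor, year)
    (1, "Movie A", "Bob", 2001),
    (2, "Movie B", "Alice", 2005),
    (3, "Movie C", "Bob", 2010),
]
DB3 = [  # (id, review)
    (1, "great"),
    (2, "dull"),
]


def reviews_for_actor(actor):
    """Evaluate q(title, review) :- DB1(id, title, actor, year), DB3(id, review)."""
    # Index DB3 by id so the join on id is a dictionary lookup.
    reviews_by_id = {}
    for movie_id, review in DB3:
        reviews_by_id.setdefault(movie_id, []).append(review)

    answers = []
    for movie_id, title, row_actor, _year in DB1:
        if row_actor != actor:
            continue  # selection: actor must equal the query constant
        for review in reviews_by_id.get(movie_id, []):
            answers.append((title, review))
    return answers
```

With this toy data, "Movie C" stars Bob but has no row in DB3, so it contributes no answer; the query only returns titles that join with a review.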
Bucket Algorithm[2]
An Example for Bucket algorithm[2]
Mediated Schema:
Enrolled(student, dept)
Registered(student, course, year)
Course(course, number)
Views of Data Sources:
V1(student, number, year) :- Registered(student, course, year), Course(course, number), number ≥ 500, year ≥ 1992.
V2(student, dept, course) :- Registered(student, course, year), Enrolled(student, dept).
V3(student, course) :- Registered(student, course, year), year ≤ 1990.
V4(student, course, number) :- Registered(student, course, year), Course(course, number), Enrolled(student, dept), number ≤ 100.
S → student, D → dept, C → course, Y → year, N → number
Query is:
q(S, D) :- Enrolled(S, D), Registered(S, C, Y), Course(C, N), N ≥ 300, Y ≥ 1995.
Bucket Formed:
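As an illustration of how buckets for this query could be computed, the Python sketch below implements a simplified bucket-creation step. It rests on assumptions that are not stated in the slides: a view enters the bucket of a query subgoal if it has a subgoal over the same relation, every query head variable in that subgoal maps to a head variable of the view, and the numeric bounds of the view and the query on the mapped variables overlap. View variables that are not mapped are shown with a prime. All function and constant names are hypothetical.

```python
INF = float("inf")

# Each view: (head variables, body subgoals, closed numeric bounds per variable).
VIEWS = {
    "V1": (("student", "number", "year"),
           [("Registered", ("student", "course", "year")),
            ("Course", ("course", "number"))],
           {"number": (500, INF), "year": (1992, INF)}),
    "V2": (("student", "dept", "course"),
           [("Registered", ("student", "course", "year")),
            ("Enrolled", ("student", "dept"))],
           {}),
    "V3": (("student", "course"),
           [("Registered", ("student", "course", "year"))],
           {"year": (-INF, 1990)}),
    "V4": (("student", "course", "number"),
           [("Registered", ("student", "course", "year")),
            ("Course", ("course", "number")),
            ("Enrolled", ("student", "dept"))],
           {"number": (-INF, 100)}),
}

QUERY_HEAD = ("S", "D")
QUERY_BODY = [("Enrolled", ("S", "D")),
              ("Registered", ("S", "C", "Y")),
              ("Course", ("C", "N"))]
QUERY_BOUNDS = {"N": (300, INF), "Y": (1995, INF)}


def bucket_entry(view_name, query_args, view_args):
    """Return the view atom for the bucket, or None if the view is rejected."""
    head, _body, view_bounds = VIEWS[view_name]
    mapping = dict(zip(view_args, query_args))  # view variable -> query variable
    for view_var, query_var in mapping.items():
        # A head variable of the query must be exported by the view.
        if query_var in QUERY_HEAD and view_var not in head:
            return None
        # The view's and the query's bounds on this variable must overlap.
        v_lo, v_hi = view_bounds.get(view_var, (-INF, INF))
        q_lo, q_hi = QUERY_BOUNDS.get(query_var, (-INF, INF))
        if max(v_lo, q_lo) > min(v_hi, q_hi):
            return None
    args = ", ".join(mapping.get(v, v + "'") for v in head)
    return f"{view_name}({args})"


def build_buckets():
    buckets = {}
    for relation, query_args in QUERY_BODY:
        key = f"{relation}({', '.join(query_args)})"
        buckets[key] = []
        for view_name, (_head, body, _bounds) in VIEWS.items():
            for view_relation, view_args in body:
                if view_relation == relation:
                    entry = bucket_entry(view_name, query_args, view_args)
                    if entry is not None:
                        buckets[key].append(entry)
    return buckets
```

Under these rules V3 is rejected for Registered because year ≤ 1990 cannot hold together with Y ≥ 1995, and V4 is rejected for Course because number ≤ 100 conflicts with N ≥ 300. A second step, not sketched here, would combine one entry from each bucket and check the resulting candidate rewritings.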
QR Decomposition[4]
A Simple Example
Frequency based Coverage Statistics Mining[3]
Problem Definition
Our objective is to form an approximate view of all the data sources
at the global level, in order to reduce the storage required at the
global level and to make data retrieval efficient.
A first Approximation
References
Xin Dong, Alon Y. Halevy, and Cong Yu. Data integration with
uncertainty. In Proceedings of the 33rd International Conference on Very
Large Data Bases, VLDB '07, pages 687-698. VLDB Endowment, 2007.