Deductive Database Xujia

School of Computing, CS2102S Presentation
Deductive Database
1. The limitations of SQL-92 based RDBMS

An RDBMS is powerful, but it still has limitations: can SQL-92 support recursive queries?
Example 1: Assembly(part, subpart, qty)
Can we find all components of trike?
Can: if we know the height of the tree, although the process may be troublesome.
Cannot: if we do not know the height in advance; this is a limitation of relational algebra.
To solve this problem, we introduce a new relational query language: Datalog.
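The "can, if we know the height" case can be sketched with one self-join per level. This is only an illustration: the Assembly instance below is an assumption (the handout does not list one), and the helper name trike_components is hypothetical. It uses Python's sqlite3 module purely as a convenient SQL engine; the generated queries use only SQL-92 style joins.

```python
import sqlite3

# Assumed Assembly instance (part, subpart, qty); not given in the handout.
ROWS = [
    ("trike", "wheel", 3), ("trike", "frame", 1),
    ("frame", "seat", 1), ("frame", "pedal", 1),
    ("wheel", "spoke", 2), ("wheel", "tire", 1),
    ("tire", "rim", 1), ("tire", "tube", 1),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Assembly (part TEXT, subpart TEXT, qty INTEGER)")
conn.executemany("INSERT INTO Assembly VALUES (?, ?, ?)", ROWS)


def trike_components(levels):
    """Components of trike found with one join query per level, 1..levels."""
    found = set()
    for depth in range(1, levels + 1):
        # depth - 1 self-joins walk exactly `depth` levels down the tree.
        joins = "".join(
            f" JOIN Assembly a{i} ON a{i - 1}.subpart = a{i}.part"
            for i in range(2, depth + 1)
        )
        sql = (
            f"SELECT a{depth}.subpart FROM Assembly a1{joins} "
            "WHERE a1.part = 'trike'"
        )
        found |= {row[0] for row in conn.execute(sql)}
    return found


print(sorted(trike_components(2)))  # misses rim and tube, which sit at level 3
print(sorted(trike_components(3)))  # all eight components of trike
```

The number of joins has to be fixed when the query is written, so a deeper tree than expected silently yields an incomplete answer.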
2. Deductive Database Management System with Datalog

Deductive database: http://dict.die.net/deductive%20database/
A combination of a conventional database containing facts, a knowledge base containing rules, and an inference engine which allows the derivation of information implied by the facts and rules. Commonly, the knowledge base is expressed in a subset of first-order logic and either a SLDNF or Datalog inference engine is used.
Rule
Head (involving output relation) :- Body (involving input relation [, output relation])
([ ] means optional; :- denotes logical implication)

Terminology:
Body: the right side of :- is called the body of the rule.
Head: the left side of :- is called the head of the rule.
Output relation: the relation defined by the rule.
Input relation: a relation that already exists before the rule is applied.

Understanding/Interpretation
If the tuples mentioned in the body exist in the database, then the tuples mentioned in the head of the rule must also be in the database. In short, if the body is true, then the head is true.
We can view the Datalog rules as a function f that maps an instance of the input relation to an instance of the output relation: f: input instance -> output instance

Example 2:
Components(Part, Subpart) :- Assembly(Part, Subpart, Qty).
Components(Part, Subpart) :- Assembly(Part, Part2, Qty), Components(Part2, Subpart).
These two Datalog rules recursively define a relation named Components.

Understanding/Interpretation
Rule 1: For all values of Part, Subpart, Qty, if there is a tuple <Part, Subpart, Qty> in Assembly, then there is a tuple <Part, Subpart> in Components.
Rule 2: For all values of Part, Part2, Subpart, Qty, if there is a tuple <Part, Part2, Qty> in Assembly and a tuple <Part2, Subpart> in Components, then there is a tuple <Part, Subpart> in Components.
Here Components is the output relation and Assembly is the input relation.
Each rule can be used to infer, or deduce, new tuples for the output relation, so database systems that support Datalog rules are often called deductive database management systems.
3. Theoretical foundation: Least Model & Least Fixpoint
The meaning of a Datalog program is usually defined in two different ways, both of which essentially describe the relational instances of the output relations.
Least model semantics: declarative; a way of understanding the program without thinking about how the program is executed.
Model: a collection of relational instances, one instance for each relation in the program, that satisfies the rules of the program.
Least model: a model M of a program such that, for any other model M2 of the same program and for each relation R in the program, the instance for R in M is contained in the instance for R in M2.
Example 3:
<contents of example>
Least fixpoint semantics
Fixpoint of a real-valued function f (R -> R): a value x such that f(x) = x.
sin(0) = 0
cos(0.9998477415310881129598107686798) = 0.9998477415310881129598107686798 (with the argument in degrees; in radians the fixpoint of cos is about 0.739085)
Function from set to set, f (set -> set): f(A) = B, where B = {f(x) | x in A}.
double: multiply every element of the input set by 2. double{1,2,3} = {2,4,6}
double+: double, union the input set. double+{1,2,3} = {1,2,3,4,6}
double+{1,2,4,6,8,...} = {1,2,4,6,8,...} = A
double+{2,4,6,8,...} = {2,4,6,8,...} = B
double+{4,6,8,...} = {4,6,8,...} = C
A, B and C are all fixpoints of the function double+ (note that they are infinite sets), so a fixpoint need not be unique; C is a subset of B, which in turn is a subset of A.
Least fixpoint of a function: a fixpoint that is smaller than (a subset of) every other fixpoint of the function.
Does a function which has a fixpoint always have a least fixpoint? What is the least fixpoint of the function double+ defined above?
A relation is just a set of tuples.
$$\mathrm{Components} = \pi_{\mathrm{Part},\,\mathrm{Subpart}}(\mathrm{Assembly}) \cup \pi_{\mathrm{Assembly.Part},\,\mathrm{Components.Subpart}}\bigl(\sigma_{\mathrm{Assembly.Subpart} = \mathrm{Components.Part}}(\mathrm{Assembly} \times \mathrm{Components})\bigr)$$
Or Components = f(Components): a recursive definition (because the input relation Assembly is given).
Compute the fixpoint by repeated application: f(f(f(...f(a)...))) = x = f(x), e.g. sin(sin(sin(...sin(1)...))) -> 0 = sin(0)
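As a rough numeric illustration of "compute the fixpoint by applying f again and again", the sketch below iterates cos until the value stops changing, once with the argument read in degrees and once in radians, and then checks a few double+ claims on finite sets. The helper names iterate, cos_degrees, double and double_plus are hypothetical, and the stopping tolerance is an assumption.

```python
import math


def iterate(f, x, tol=1e-12, max_steps=10_000):
    """Apply f repeatedly until two successive values differ by less than tol."""
    for _ in range(max_steps):
        nxt = f(x)
        if abs(nxt - x) < tol:
            return nxt
        x = nxt
    raise RuntimeError("did not converge")


def cos_degrees(x):
    """cos with the argument interpreted in degrees."""
    return math.cos(math.radians(x))


fix_deg = iterate(cos_degrees, 1.0)  # close to 0.99984774...
fix_rad = iterate(math.cos, 1.0)     # close to 0.7390851...


def double(s):
    """Multiply every element of the input set by 2."""
    return {2 * x for x in s}


def double_plus(s):
    """double, union the input set."""
    return double(s) | s


print(fix_deg, fix_rad)
print(double_plus({1, 2, 3}))         # {1, 2, 3, 4, 6}
print(double_plus(set()) == set())    # the empty set is a fixpoint of double+
print(double_plus({1, 2, 4, 6, 8}))   # a finite non-empty set keeps growing
```

The last line shows why the fixpoints A, B and C have to be read as infinite sets: applied to the finite set {1,2,4,6,8}, double+ adds 12 and 16.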
Example 4: Compute the fixpoint for Components = f(Components).
We use Components_n to denote the instance of Components after the n-th application of the function, that is, of the Datalog rules.
Step 1: Initialize Components to get Components_0.
Step 2: Apply the first rule; we get Components_1, with [how many] new tuples added.
Step 3: Apply the second rule; we get Components_2, with [how many] new tuples added.
Step 4: Apply the second rule; we get Components_3, with [how many] new tuples added.
Step 5: Apply the second rule; we get Components_4. Note: this time no new tuples are generated, so we stop.
Finally, Components_4 is a fixpoint; if we set Components_0 to the empty relation, then Components_4 is the least fixpoint.
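The steps above can be traced with a short sketch. The Assembly instance is an assumption (the handout does not list one), so the tuple counts printed here belong to this assumed instance only; rule1, rule2 and trace_fixpoint are hypothetical names for the two Components rules and the iteration.

```python
# Assumed Assembly instance (part, subpart, qty); not given in the handout.
ASSEMBLY = {
    ("trike", "wheel", 3), ("trike", "frame", 1),
    ("frame", "seat", 1), ("frame", "pedal", 1),
    ("wheel", "spoke", 2), ("wheel", "tire", 1),
    ("tire", "rim", 1), ("tire", "tube", 1),
}


def rule1(assembly):
    """Components(Part, Subpart) :- Assembly(Part, Subpart, Qty)."""
    return {(p, s) for (p, s, _q) in assembly}


def rule2(assembly, components):
    """Components(Part, Subpart) :- Assembly(Part, Part2, Qty),
                                    Components(Part2, Subpart)."""
    return {
        (p, s)
        for (p, p2, _q) in assembly
        for (c, s) in components
        if c == p2
    }


def trace_fixpoint(assembly):
    """Return [Components_0, Components_1, ...] up to the first repeat."""
    history = [set()]                # Components_0: the empty relation
    history.append(rule1(assembly))  # Components_1: first rule
    while True:
        current = history[-1]
        nxt = current | rule2(assembly, current)  # second rule
        history.append(nxt)
        if nxt == current:           # no new tuples: fixpoint reached
            return history


history = trace_fixpoint(ASSEMBLY)
for n, instance in enumerate(history):
    print(f"Components_{n}: {len(instance)} tuples")
```

For the assumed instance the sizes are 0, 8, 14, 16, 16, so Components_4 equals Components_3 and the iteration stops at Step 5, as in the example.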
Least Model and Least Fixpoint

Every function defined in terms of relational algebra expressions that do not contain set difference is guaranteed to have a least fixpoint.
The least fixpoint can be computed by repeatedly applying the rules of the program to the given instances of the input relations.
Further, every Datalog program is guaranteed to have a least model.
Least Model = Least Fixpoint
Thanks to the least model, we can understand a program in terms of "if the body is true, then the head is true".
Thanks to the least fixpoint, the system can compute the answer by repeatedly applying the rules.
4. Recursive queries with Negation

Example 5: Divide parts into two classes, Big and Small.
Big(Part) :- Assembly(Part, Subpart, Qty), Qty > 2, not Small(Part).
Small(Part) :- Assembly(Part, Subpart, Qty), not Big(Part).
This program has two fixpoints, depending on the order in which the two rules are executed:
Fixpoint 1: Big = {trike}; Small = {frame, wheel, tire}
Fixpoint 2: Big = {}; Small = {trike, frame, wheel, tire}
Which is the least fixpoint? Neither: neither fixpoint is contained in the other. This program is not safe.
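The order dependence can be checked directly. The sketch below assumes an Assembly instance (not listed in the handout) in which only trike has a subpart with Qty > 2; apply_big and apply_small are hypothetical names for one application of each rule.

```python
# Assumed Assembly instance (part, subpart, qty); not given in the handout.
ASSEMBLY = {
    ("trike", "wheel", 3), ("trike", "frame", 1),
    ("frame", "seat", 1), ("frame", "pedal", 1),
    ("wheel", "spoke", 2), ("wheel", "tire", 1),
    ("tire", "rim", 1), ("tire", "tube", 1),
}


def apply_big(assembly, small):
    """Big(Part) :- Assembly(Part, Subpart, Qty), Qty > 2, not Small(Part)."""
    return {p for (p, _s, q) in assembly if q > 2 and p not in small}


def apply_small(assembly, big):
    """Small(Part) :- Assembly(Part, Subpart, Qty), not Big(Part)."""
    return {p for (p, _s, _q) in assembly if p not in big}


# Order 1: the Big rule first (Small is still empty), then the Small rule.
big1 = apply_big(ASSEMBLY, set())
small1 = apply_small(ASSEMBLY, big1)

# Order 2: the Small rule first (Big is still empty), then the Big rule.
small2 = apply_small(ASSEMBLY, set())
big2 = apply_big(ASSEMBLY, small2)

print(big1, small1)  # Big = {trike}, Small = {frame, wheel, tire}
print(big2, small2)  # Big = {},      Small = {trike, frame, wheel, tire}

# Both results are stable: applying the rules again changes nothing.
print(apply_big(ASSEMBLY, small1) == big1, apply_small(ASSEMBLY, big1) == small1)
print(apply_big(ASSEMBLY, small2) == big2, apply_small(ASSEMBLY, big2) == small2)
```

Big is larger in the first result and Small is larger in the second, so neither result is contained in the other.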
5. Safety of a Datalog program

Example 6: Price_Parts(Part, Price) :- Assembly(Part, Subpart, Qty).
What is the output relation? It is infinite: Price can be any positive number. This program is also not safe.
If the least model of a program is not finite for even one instance of its input relations, then we say the program is unsafe.
If a relation appears in the body of a rule preceded by not, we call this a negated occurrence; otherwise we call it a positive occurrence.
A range-restricted program is one in which every variable in the head of a rule appears in some positive relation occurrence in the body of that rule.
Every range-restricted program has a finite least model if the input relation instances are finite.
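The range-restriction condition is a purely syntactic check on each rule, which a minimal sketch can make concrete. The rule encoding (a tuple of head variables plus one tuple of variables per positive body atom) and the name is_range_restricted are assumptions made for this illustration.

```python
def is_range_restricted(head_vars, positive_atoms):
    """True if every head variable occurs in some positive body atom.

    head_vars: the variables in the rule head.
    positive_atoms: one tuple of variables per non-negated body atom.
    """
    bound = {v for atom in positive_atoms for v in atom}
    return set(head_vars) <= bound


# Example 6: Price_Parts(Part, Price) :- Assembly(Part, Subpart, Qty).
print(is_range_restricted(("Part", "Price"), [("Part", "Subpart", "Qty")]))  # False

# Second Components rule:
# Components(Part, Subpart) :- Assembly(Part, Part2, Qty), Components(Part2, Subpart).
print(is_range_restricted(
    ("Part", "Subpart"),
    [("Part", "Part2", "Qty"), ("Part2", "Subpart")],
))  # True
```

Price never occurs in the body of the Price_Parts rule, so nothing restricts its values and the check fails.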
6. Stratification: a solution to the problem caused by negation

Depends on: a table T depends on a table S if some rule with T in the head contains S, or (recursively) contains a predicate that depends on S, in the body.
Depends negatively on: a table T depends negatively on a table S if some rule with T in the head contains a negated occurrence of S, or (recursively) contains a predicate that depends negatively on S, in the body.
To solve the problem caused by negation, we should specify a reasonable order of execution.
We classify the tables in a program into strata as follows:
The tables that do not depend on any other tables are in stratum 0.
The tables that do not appear in lower strata, depend only on tables in stratum n+1 or lower strata, and depend negatively only on tables in stratum n or lower strata are in stratum n+1.
Stratified program: a program whose tables can be classified into strata according to the above algorithm.
A stratified program is evaluated stratum by stratum, starting with stratum 0.
The Big/Small program in Example [] is not stratified. How can we change it into a stratified, or say safe, program? Just remove one not.
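The classification into strata can be sketched as a small algorithm. The encoding of a rule as (head, tables used positively, tables used under not) and the function name stratify are assumptions for this illustration, and the "fixed" program below drops the not from the Big rule, which is one assumed way of removing a not.

```python
def stratify(rules):
    """Return the strata as a list of sets, or None if not stratified.

    rules: list of (head, positive_body_tables, negated_body_tables).
    """
    tables, pos, neg = set(), {}, {}
    for head, positive, negated in rules:
        tables |= {head, *positive, *negated}
        pos.setdefault(head, set()).update(positive)
        neg.setdefault(head, set()).update(negated)

    # Stratum 0: tables that depend on no other table.
    strata = [{t for t in tables if not pos.get(t) and not neg.get(t)}]
    placed = set(strata[0])
    remaining = tables - placed
    while remaining:
        # Largest set of unplaced tables that depend only on this layer or
        # lower strata, and depend negatively only on lower strata.
        layer = set(remaining)
        changed = True
        while changed:
            changed = False
            for t in list(layer):
                if not (pos[t] <= placed | layer and neg[t] <= placed):
                    layer.discard(t)
                    changed = True
        if not layer:
            return None  # negation inside a recursive cycle
        strata.append(layer)
        placed |= layer
        remaining -= layer
    return strata


big_small = [("Big", {"Assembly"}, {"Small"}), ("Small", {"Assembly"}, {"Big"})]
fixed = [("Big", {"Assembly"}, set()), ("Small", {"Assembly"}, {"Big"})]

print(stratify(big_small))  # None: Big and Small depend negatively on each other
print(stratify(fixed))      # [{'Assembly'}, {'Big'}, {'Small'}]
```

With the not removed from the Big rule, Big can be computed completely before Small is evaluated, which is exactly the stratum-by-stratum order described above.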
7. Relational Algebra VS Stratified Datalog

Every relational algebra query can be expressed as a range-restricted and stratified Datalog program. Is the converse statement true?
Translation:
Selection: Result(y) :- R(x,y), x=c.
Projection: Result(x) :- R(x,y).
Cross product: Result(x,y,u,v) :- R(x,y), S(u,v).
Union: Result(x,y) :- R(x,y). Result(u,v) :- S(u,v).
Set difference: Result(x,y) :- R(x,y), not S(x,y).
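To make the correspondence concrete, each of the rules above can be read as a set comprehension over binary relations. The sample relations R and S and the constant c below are made up for this illustration.

```python
# Made-up sample instances of the binary relations R and S.
R = {(1, "a"), (2, "b"), (2, "c")}
S = {(2, "b"), (3, "d")}
c = 2

# Selection:      Result(y) :- R(x,y), x=c.
selection = {y for (x, y) in R if x == c}
# Projection:     Result(x) :- R(x,y).
projection = {x for (x, y) in R}
# Cross product:  Result(x,y,u,v) :- R(x,y), S(u,v).
cross = {(x, y, u, v) for (x, y) in R for (u, v) in S}
# Union:          Result(x,y) :- R(x,y).  Result(u,v) :- S(u,v).
union = {(x, y) for (x, y) in R} | {(u, v) for (u, v) in S}
# Set difference: Result(x,y) :- R(x,y), not S(x,y).
difference = {(x, y) for (x, y) in R if (x, y) not in S}

print(selection, projection, len(cross), union, difference)
```

Each body atom becomes a loop over a relation, each comparison or negated atom becomes a filter, and two rules with the same head become a union.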
8. Datalog with Aggregation
Datalog can be extended with SQL-style grouping and aggregation operations such as SUM and COUNT:
NumParts(Part, SUM(<Qty>)) :- Assembly(Part, Subpart, Qty).
The equivalent SQL query is:
SELECT Part, SUM(Qty) FROM Assembly GROUP BY Part
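As a quick check that the NumParts rule and the SQL query describe the same grouping, the sketch below runs the query through Python's sqlite3 module and compares it with a direct reading of the rule. The Assembly instance is an assumption (the handout does not list one).

```python
import sqlite3

# Assumed Assembly instance (Part, Subpart, Qty); not given in the handout.
ROWS = [
    ("trike", "wheel", 3), ("trike", "frame", 1),
    ("frame", "seat", 1), ("frame", "pedal", 1),
    ("wheel", "spoke", 2), ("wheel", "tire", 1),
    ("tire", "rim", 1), ("tire", "tube", 1),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Assembly (Part TEXT, Subpart TEXT, Qty INTEGER)")
conn.executemany("INSERT INTO Assembly VALUES (?, ?, ?)", ROWS)

# The SQL form: one output row per Part, summing Qty within each group.
from_sql = dict(
    conn.execute("SELECT Part, SUM(Qty) FROM Assembly GROUP BY Part").fetchall()
)

# The rule NumParts(Part, SUM(<Qty>)) :- Assembly(Part, Subpart, Qty),
# read as: group the Assembly tuples by Part and add up their Qty values.
num_parts = {}
for part, _subpart, qty in ROWS:
    num_parts[part] = num_parts.get(part, 0) + qty

print(from_sql)
print(num_parts == from_sql)
```

For the assumed instance both give trike 4, wheel 3, frame 2 and tire 2.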
9. Efficient evaluation of recursive queries

Evaluating recursive queries by repeatedly applying the rules as described before, which is called naive fixpoint evaluation, is not efficient:
Repeated inferences: the same tuple may be inferred repeatedly in the same way, that is, using the same rule and the same tuples for the tables in the body of the rule.
Unnecessary inferences: to find the components of wheel, do we need to compute the components of the root trike?
One application of all rules of a program is called an iteration.
How do we solve these two problems?
1. During every iteration, how can we get new tuples? If all the body tuples used in an inference were already generated before the last iteration, then the inference only reproduces an existing tuple, because exactly the same inference took place during the last iteration. An inference can create a new tuple only if at least one of its body tuples was generated for the first time during the last iteration. So we keep track of the tuples generated for the first time during the last iteration to avoid repeated inferences. This is called seminaive fixpoint evaluation.
2. Example 7: find the parts at the same level as spoke.
SameLevel(S1, S2) :- Assembly(P1, S1, Q1), Assembly(P1, S2, Q2).
SameLevel(S1, S2) :- Assembly(P1, S1, Q1), Assembly(P2, S2, Q2), SameLevel(P1, P2).
Result(S2) :- SameLevel(S1, S2), S1 = spoke.
Only a small part of SameLevel is required to answer our question. For which values of the first field can a SameLevel tuple contribute to the answer to the above query?
Do we need to compute the parts at the same level as the subparts of spoke? Do we need to compute the parts at the same level as the nodes on the path from spoke to the root trike?
Rewrite the program:
Magic_SameLevel(P) :- Magic_SameLevel(S), Assembly(P, S, Q).
Magic_SameLevel(spoke) :- . (the body is empty)
SameLevel(S1, S2) :- Assembly(P1, S1, Q1), Assembly(P1, S2, Q2), Magic_SameLevel(S1).
SameLevel(S1, S2) :- Assembly(P1, S1, Q1), Assembly(P2, S2, Q2), SameLevel(P1, P2), Magic_SameLevel(S1).
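The effect of the rewriting can be sketched by evaluating both versions of the program on a small instance and comparing how many SameLevel tuples each one derives. The Assembly instance is an assumption (the handout does not list one), both versions are evaluated with plain naive iteration, and the names magic_same_level and same_level are hypothetical.

```python
# Assumed Assembly instance (part, subpart, qty); not given in the handout.
ASSEMBLY = {
    ("trike", "wheel", 3), ("trike", "frame", 1),
    ("frame", "seat", 1), ("frame", "pedal", 1),
    ("wheel", "spoke", 2), ("wheel", "tire", 1),
    ("tire", "rim", 1), ("tire", "tube", 1),
}


def magic_same_level(assembly, seed):
    """Magic_SameLevel: the seed plus everything on its path to the root."""
    magic = {seed}
    while True:
        new = {p for (p, s, _q) in assembly if s in magic}
        if new <= magic:
            return magic
        magic |= new


def same_level(assembly, magic=None):
    """Naive fixpoint of the SameLevel rules.

    If `magic` is given, only tuples whose first field is in `magic`
    are derived (the rewritten program); otherwise all tuples are.
    """
    links = {(p, s) for (p, s, _q) in assembly}
    result = set()
    while True:
        new = {
            (s1, s2)
            for (p1, s1) in links
            if magic is None or s1 in magic
            for (p2, s2) in links
            # rule 1: same parent; rule 2: parents already at the same level
            if p1 == p2 or (p1, p2) in result
        }
        if new <= result:
            return result
        result |= new


full = same_level(ASSEMBLY)
magic = magic_same_level(ASSEMBLY, "spoke")
restricted = same_level(ASSEMBLY, magic)

answer_full = {s2 for (s1, s2) in full if s1 == "spoke"}
answer_magic = {s2 for (s1, s2) in restricted if s1 == "spoke"}
print(len(full), len(restricted))   # 24 tuples versus 6 tuples
print(answer_full == answer_magic)  # the Result relation is unchanged
```

On this instance the magic set is {spoke, wheel, trike}, and the rewritten program derives a quarter of the SameLevel tuples while producing the same Result.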
Q&A:
1. Consider the schema: Parent(parent, child)
Which of the following statements is not true?
A. We can find all grandsons and granddaughters of some person with SQL-92, although it does not support recursive queries.
B. We cannot find all descendants of a person with relational algebra, because it does not support recursive queries.
C. Define a Datalog program:
Cousin(x, y) :- Parent(p, y), Parent(p, x).
Cousin(y, x) :- Parent(p1, x), Parent(p2, y), Cousin(p1, p2).
To find all siblings and cousins (not only cousins-german) John has, we can avoid repeated inferences by computing only tuples of Cousin whose first field is John.
D. The Parent relation is not recursively defined, but the Cousin relation defined in (C) is.
E. Relational algebra and Datalog are not equivalent, although every relational algebra query can be expressed by a range-restricted and stratified Datalog program.
3. Short question: Summarize the algorithms for naive and seminaive evaluation of recursive queries.
------------------------------------------ END--------------------------------------------------