
DATA BASES 2 – JUNE 21ST, 2019 – DURATION: 2H

PROF. SARA COMAI, PROF. DANIELE M. BRAGA


A. Active Databases (9 p.)
CITY (Name, Description)
POINTOFINTEREST (POI, Name, City, Description)
DIRECTCONNECTION (FromPOI1, ToPOI2, Distance)
The above relational schema stores graphs of Points of Interest (POI) of several cities. Connections are directed: if it is
possible to move from POI1 to POI2 and vice versa, then two tuples independently represent the two connections (and
of course the distance may not be the same). Write a set of triggers to implement the following behaviors: (a) upon
deletion of a POI the incoming/outgoing connections to/from that POI must be deleted; (b) whenever a city is deleted,
all the data related to that city are also deleted; (c) it is forbidden to insert a new DirectConnection from POI1 to POI2
if the distance of such a connection is more than 3 times the distance from POI1 to POI2 passing through another POI; (d)
if the last POI of a city is deleted, then the city is also deleted.
Finally, build the triggering graph and discuss termination of the designed rule set.

B. Distributed Deadlock Detection (6 p.)


The nodes A, B, C of a distributed transactional system are aware of the following remote and local waiting conditions:
A: EBt3 ECt2 t1EC t3t5 t5t1 t5t2
B: ECt2 t3EA t2t3
C: t2EA t2EB t1t4 t4t2
Execute Obermarck’s algorithm twice, with different conventions:
• once sending messages of the form EX → ti → tj → EY forward (toward node Y) and only if i > j, and
• once with the “opposite” conventions, i.e., backward (toward node X) and only if i < j.
Discuss the outcome, and explain it, taking into account the properties of the algorithm and the initial conditions.

C. XML… and a bit of Concurrency Control (9 p.)


<!ELEMENT Collection ( Schedule* )>
<!ELEMENT Schedule ( Operation+ )>
<!ATTLIST Schedule Identifier ID #REQUIRED>
<!ELEMENT Operation ( TransactionId, Resource )>
<!ATTLIST Operation type ( r | w ) #REQUIRED>
The DTD above describes a collection of schedules executed on a transactional system. In each schedule the operations
are stored in the order in which they were executed. For simplicity, assume that transactions cannot perform the same
type of operation on the same resource twice in the same schedule (i.e., there cannot be two wi(x) or two ri(x) in the
same schedule). Unspecified elements only contain PCData. Express in XQuery:
(4 p.) 1. A query that computes the number of schedules in the collection that contain an update pattern [ri(x) … wi(x)].
(5 p.) 2. The definition of a function local:conflict-eq( … ) as xs:boolean that checks if two schedules are conflict equivalent.

D. Physical Databases (6 p.)


A table POINTOFINTEREST (POI, Name, City, Description) stores 30K tuples on 1.2K blocks in a primary entry-sequenced
storage. A table CONNECTION (FromPOI1, ToPOI2, Distance) stores 1.5M tuples on 12K blocks in a primary hash built on
the primary key with negligible overflow chains. We know that val(City)=300 and that only 25% of the connections have
a distance below 1000 meters. Consider the query below (couples of “close” attractions in Lisbon).

select P1.POI, P2.POI
from ( POINTOFINTEREST P1 join CONNECTION on P1.POI = FromPOI1 )
     join POINTOFINTEREST P2 on P2.POI = ToPOI2
where P1.City = "Lisbon" and P2.City = "Lisbon" and Distance < 1000

Describe briefly (but precisely) a reasonable query plan and estimate its execution cost in the following scenarios. Cost
estimations provided without a clear description of the associated plan will not be considered.
(a) There are no secondary indexes;
(b) There is also a B+(City) index for POINTOFINTEREST (F=35, 3 levels, 600 leaf nodes).
A.
CREATE TRIGGER Ta
AFTER DELETE ON PointOfInterest
FOR EACH ROW
BEGIN
DELETE FROM DirectConnection WHERE old.POI = FromPOI1 or old.POI = ToPOI2;
END;

----------------------------------------------------------------------------------

CREATE TRIGGER Tb
AFTER DELETE ON City
FOR EACH ROW
BEGIN
DELETE FROM PointOfInterest WHERE City=old.Name;
END;

Of course the adjacent connections are deleted by activations of trigger Ta

----------------------------------------------------------------------------------

CREATE TRIGGER Tc
BEFORE INSERT ON DirectConnection
FOR EACH ROW
WHEN new.Distance > ANY ( SELECT 3 * ( C1.Distance + C2.Distance )
                          FROM DirectConnection C1, DirectConnection C2
                          WHERE C1.FromPOI1 = new.FromPOI1 and C2.ToPOI2 = new.ToPOI2
                                and C1.ToPOI2 = C2.FromPOI1 )
RAISE_EXCEPTION(…);

----------------------------------------------------------------------------------

CREATE TRIGGER Td
AFTER DELETE ON PointOfInterest
FOR EACH ROW
WHEN NOT EXISTS (SELECT * FROM PointOfInterest WHERE City=old.City)
DELETE FROM City where Name=old.City;

----------------------------------------------------------------------------------

Triggering graph: Tb → Ta, Tb → Td, Td → Tb (Ta and Tc have no outgoing edges, since no trigger reacts to changes on DirectConnection).

The triggering graph is cyclic but there is no risk of nontermination, because all the actions involved in the cycle are
deletions, and – at worst – the database is emptied in a finite number of iterations.
B.
The two executions return different outcomes (one finds a deadlock, the other does not). This sounds absurd, and
makes us doubt the properties of the algorithm… but we must have faith in the Masters!
The reason for this oddity is that the initial conditions are inconsistent: node A invoked a sub-transaction on node
C, but node C is unaware that t1 is a sub-transaction invoked by node A! In simpler words, there is a missing
initial condition on node C: EA → t1

And of course on corrupted data even the best algorithm may produce untrustworthy results.
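
To make the point concrete, here is a minimal Python sketch (not part of the required answer; the "E:X" encoding and the helper name are just a convenient notation) that checks the precondition of Obermarck's algorithm: every remote wait t → EY recorded at node X must be mirrored by a condition EX → t at node Y. Run on the conditions above, it reports exactly the missing EA → t1 at node C.

# Wait-for conditions as listed in the exercise; "E:B" stands for the external node EB.
# A pair (u, v) means "u waits for v".
conditions = {
    "A": [("E:B", "t3"), ("E:C", "t2"), ("t1", "E:C"),
          ("t3", "t5"), ("t5", "t1"), ("t5", "t2")],
    "B": [("E:C", "t2"), ("t3", "E:A"), ("t2", "t3")],
    "C": [("t2", "E:A"), ("t2", "E:B"), ("t1", "t4"), ("t4", "t2")],
}

def missing_counterparts(conds):
    """For every remote wait t -> E:Y stored at node X, the protocol assumes
    node Y also stores E:X -> t; return the expected pairs that are missing."""
    missing = []
    for node, edges in conds.items():
        for u, v in edges:
            if v.startswith("E:"):                 # u waits for its sub-transaction at a remote node
                remote = v.split(":")[1]
                expected = ("E:" + node, u)        # the remote node should know E:X -> u
                if expected not in conds.get(remote, []):
                    missing.append((remote, expected))
    return missing

print(missing_counterparts(conditions))            # [('C', ('E:A', 't1'))]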

C.
1. We just count the schedules having any read followed by a write on the same resource by the same transaction.
count( for $s in //Schedule
       where some $o1 in $s/Operation[@type = "r"],
                  $o2 in $s/Operation[@type = "w"]
             satisfies ( $o1 << $o2 and $o1/Resource = $o2/Resource and $o1/TransactionId = $o2/TransactionId )
       return <foo/> )

2. Equivalence holds if the schedules are permutations of one another (same operations) and have the same conflicts

declare function local:conflict_eq( $s1 as element(), $s2 as element() ) as xs:boolean {
  deep-equal( local:sorted_ops($s1), local:sorted_ops($s2) ) and
  deep-equal( local:sorted_conflicts($s1), local:sorted_conflicts($s2) )
};

declare function local:sorted_ops( $s as element() ) as element()* {
  for $op in $s/Operation
  order by $op/TransactionId, $op/Resource, $op/@type
  return $op
};

declare function local:sorted_conflicts( $s as element() ) as element()* {
  for $op1 in $s/Operation,
      $op2 in $s/Operation[ . >> $op1 ]
  where $op1/TransactionId != $op2/TransactionId   (: different transactions :)
    and $op1/Resource = $op2/Resource              (: same resource :)
    and ( $op1/@type, $op2/@type ) = "w"           (: at least one is a write :)
  order by $op1/TransactionId, $op2/TransactionId, $op1/Resource, $op2/Resource, $op1/@type, $op2/@type
  return <conflict> {$op1, $op2} </conflict>
};

N.B.: this formulation aims at maximizing readability and maintainability, not efficiency
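
As an aside (not required by the exam), the same conflict-equivalence test can be cross-checked with a few lines of Python, assuming each schedule is encoded as a list of (transaction, type, resource) triples in execution order; the exercise's "no repeated operation" assumption makes this representation unambiguous.

def conflicts(schedule):
    """Ordered pairs of operations of different transactions on the same
    resource, with at least one write, taken in schedule order."""
    return {(schedule[i], schedule[j])
            for i in range(len(schedule))
            for j in range(i + 1, len(schedule))
            if schedule[i][0] != schedule[j][0]           # different transactions
            and schedule[i][2] == schedule[j][2]          # same resource
            and "w" in (schedule[i][1], schedule[j][1])}  # at least one is a write

def conflict_eq(s1, s2):
    """Same operations (as multisets) and the same ordered conflict pairs."""
    return sorted(s1) == sorted(s2) and conflicts(s1) == conflicts(s2)

# Swapping two non-conflicting operations preserves conflict equivalence:
s1 = [("t1", "r", "x"), ("t2", "r", "y"), ("t1", "w", "x")]
s2 = [("t2", "r", "y"), ("t1", "r", "x"), ("t1", "w", "x")]
print(conflict_eq(s1, s2))   # True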

D.
The peculiarity of this query is that any two POI in Lisbon must be joined with one another (a sort of Lisbon-self-join)
and checked against the connections to see if they are close enough.
Two main strategies are possible:
str1 : scan the connections, and if they are “short” (25%) then check if the connected POIs are both in Lisbon
str2 : build the “Cartesian product” of the Lisbon POIs and look up via the hash to check if they are connected and close

(a)
str1 is ineffective here as there is no way to look up a POI without a full scan of the table:
12K + 25% ∙ 1.5M ∙ 2 ∙ 1.2K = BOOOOM!
A similar idea is to perform a sort of nested loop that, for every block of CONNECTION, scans the whole POI table and joins:
12K + 12K ∙ 1.2K = 14.4 M (still very costly – I don’t even try to estimate the number of blocks of Connection with no arcs shorter than 1000, if any…)
str2 needs to scan POI many times… with a self-nested-loop… and perform (30K/300) ∙ (30K/300 - 1) = 100 ∙ 99 = 9,900 lookups
1.2K + 1.2K ∙ 1.2K + (100 ∙ 99 ∙ 1 ) = 1.45 M
…unless we allow caching of the 100 POI ids (it only takes one or two extra pages in main memory):
1.2K + (100 ∙ 99 ∙ 1 ) = 11.1 K
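
For reference, the arithmetic above can be reproduced in a few lines of Python (the variable names are just mnemonics; the formulas are exactly those discussed above):

# Catalog parameters given in the exercise.
poi_blocks, poi_tuples = 1_200, 30_000
conn_blocks, conn_tuples = 12_000, 1_500_000
val_city, short_fraction = 300, 0.25
lisbon_pois = poi_tuples // val_city                  # 30K / 300 = 100 POIs in Lisbon

# Scenario (a): no secondary indexes.
str1_lookups  = conn_blocks + short_fraction * conn_tuples * 2 * poi_blocks             # ~900M accesses: hopeless
str1_nested   = conn_blocks + conn_blocks * poi_blocks                                  # 12K + 14.4M
str2_no_cache = poi_blocks + poi_blocks * poi_blocks + lisbon_pois * (lisbon_pois - 1)  # ~1.45M
str2_cached   = poi_blocks + lisbon_pois * (lisbon_pois - 1)                            # 1.2K + 9.9K = 11.1K
print(str1_lookups, str1_nested, str2_no_cache, str2_cached)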

(b)
str1 : The B+ helps to quickly extract (and possibly cache) the 100 POI ids without scanning the table, at a cost of:
2 (interm. nodes) + 3 (leaf nodes) + 100 (pointers) = 105
We could then scan Connection and immediately identify the wanted arcs: 12K + 105 = 12.1 K
str2 : As the B+ makes it faster to cache the 100 POI ids, we then perform the lookups based on the cached POIs:
2 (interm. nodes) + 3 (leaf nodes) + 100 (pointers) + 9.9K (lookups) = 10 K
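
The same quick check for scenario (b), assuming the B+(City) index is traversed once to fetch (and cache) the 100 Lisbon POI ids:

# Scenario (b): B+(City) on POINTOFINTEREST (3 levels, 600 leaf nodes).
lisbon_pois = 30_000 // 300                               # 100 POIs in Lisbon
btree_cost = 2 + 3 + lisbon_pois                          # intermediate nodes + leaves + data pointers = 105
str1_cost = btree_cost + 12_000                           # then scan CONNECTION once: ~12.1K
str2_cost = btree_cost + lisbon_pois * (lisbon_pois - 1)  # 105 + 9,900 hash lookups ~ 10K
print(btree_cost, str1_cost, str2_cost)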
