
DEPARTMENT OF COMPUTER ENGINEERING

COLLEGE OF TECHNOLOGY, PANTNAGAR

Advanced Database Management System


(Assignment)

Submitted To:                                Submitted By:
Prof. Sunita Jalal                           Purushottam Das
Assistant Professor                          ID. No. 34946
Deptt. of Computer Engineering               M.Tech (CSE)
                                             Deptt. of Computer Engineering

INDEX

S.NO. TITLE PAGE NO. SIGNATURE

1 Assignment 1 3
2 Assignment 2 9
3 Assignment 3 11

Assignment-1
Q1. When a transaction is rolled back under timestamp ordering, it is assigned a new
timestamp. Why can it not simply keep its old timestamp?

Ans: With each transaction Ti in the system, we associate a unique fixed timestamp, denoted by
TS(Ti). This timestamp is assigned by the database system before the transaction Ti starts
execution. If a transaction Ti has been assigned timestamp TS(Ti), and a new transaction Tj
enters the system, then TS(Ti) < TS(Tj). There are two simple methods for implementing this
scheme: use the value of the system clock as the timestamp, or use a logical counter that is
incremented after each new timestamp is assigned. The timestamps of the transactions determine
the serializability order: if TS(Ti) < TS(Tj), then the system must ensure that the produced
schedule is equivalent to a serial schedule in which transaction Ti appears before transaction Tj.
The timestamp-ordering protocol ensures that any conflicting read and write operations are
executed in timestamp order.

If a transaction Ti is rolled back by the concurrency-control scheme as a result of issuing either
a read or a write operation, the system assigns it a new timestamp and restarts it. Ti cannot
simply keep its old timestamp, because it was rolled back precisely because that timestamp was
too old: some younger transaction had already read or written the data item Ti tried to access.
If Ti were restarted with the same timestamp, it would encounter exactly the same conflict and
be rolled back again, possibly forever. Assigning a new, larger timestamp places the restarted
transaction later in the serialization order and allows it to make progress.
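The restart rule can be sketched in a few lines of Python (the class name, the counter-based clock, and the single write-timestamp table are illustrative assumptions, not from the text):

```python
import itertools

class TimestampScheduler:
    """Minimal sketch of timestamp-ordered reads with restart-on-conflict."""
    def __init__(self):
        self._clock = itertools.count(1)   # monotonically increasing counter
        self.w_ts = {}                     # data item -> largest writer timestamp

    def start(self):
        return next(self._clock)           # TS(Ti) assigned before Ti starts

    def read(self, ts, item):
        # Ti may read Q only if no younger transaction has already written Q.
        if ts < self.w_ts.get(item, 0):
            return self.restart()          # rolled back: must take a NEW timestamp
        return ts

    def restart(self):
        # Keeping the old timestamp would make the same conflict recur forever;
        # a fresh, larger timestamp moves the transaction later in the order.
        return next(self._clock)

s = TimestampScheduler()
t1 = s.start()             # TS(T1) = 1
t2 = s.start()             # TS(T2) = 2
s.w_ts["Q"] = t2           # T2 writes Q
t1_new = s.read(t1, "Q")   # T1's read arrives too late: restart
assert t1_new > t2         # the new timestamp is larger than T2's
```

With its new timestamp 3, the restarted transaction is serialized after T2 and its read of Q now succeeds.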

Q2. In multiple-granularity locking, what is the difference between implicit and explicit
locking?
Ans: Consider the tree of Fig.1, which consists of four levels of nodes. The highest level
represents the entire database. Below it are nodes of type area; the database consists of exactly
these areas. Each area in turn has nodes of type file as its children. Each area contains exactly
those files that are its child nodes. No file is in more than one area. Finally, each file has nodes of
type record. As before, the file consists of exactly those records that are its child nodes, and no
record can be present in more than one file.
Each node in the tree can be locked individually. As in the two-phase locking protocol, we
shall use shared and exclusive lock modes. When a transaction locks a node explicitly, in either
shared or exclusive mode, it has also implicitly locked all the descendants of that node in the
same lock mode. For example, if transaction Ti obtains an explicit lock on file Fc of Fig.1 in
exclusive mode, then it holds an implicit exclusive lock on all the records belonging to that
file; it does not need to lock the individual records of Fc explicitly. Thus an explicit lock is one
the transaction actually requests on a node, while an implicit lock is one the transaction
acquires automatically on every descendant of an explicitly locked node.
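The implicit/explicit distinction can be illustrated with a toy sketch (the parent map mirrors the four-level tree of Fig.1; node names and the single-transaction simplification are illustrative assumptions):

```python
# parent map for the four-level tree: database -> area -> file -> record
parent = {"A1": "DB", "Fa": "A1", "Fc": "A1", "ra1": "Fa", "rc1": "Fc"}
explicit = {}  # node -> lock mode the transaction actually requested

def lock(node, mode):
    explicit[node] = mode          # only this node is locked explicitly

def held_mode(node):
    """Explicit lock on the node itself, else implicit via an ancestor."""
    if node in explicit:
        return explicit[node]
    p = parent.get(node)
    return held_mode(p) if p else None

lock("Fc", "X")                    # Ti explicitly locks file Fc exclusively
assert held_mode("rc1") == "X"     # record under Fc: implicitly locked X
assert held_mode("ra1") is None    # record under Fa: not locked at all
assert "rc1" not in explicit       # no explicit lock was ever taken on rc1
```

The record rc1 is never entered in the explicit lock table, yet `held_mode` reports it as exclusively locked: that is exactly the implicit lock the text describes.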

Q3. Under what conditions is it less expensive to avoid deadlock than to allow deadlocks to
occur and then to detect them?
Ans: Deadlock avoidance allows the three necessary conditions for deadlock but makes
judicious choices to assure that the deadlock point is never reached. As such, avoidance allows
more concurrency than prevention. With deadlock avoidance, a decision is made dynamically
whether the current resource allocation request will, if granted, potentially lead to a deadlock.
Deadlock avoidance thus requires knowledge of future process resource requests. Deadlock
avoidance involves the analysis of each new resource request to determine if it could lead to
deadlock, and granting it only if deadlock is not possible.

Fig.1 Multiple Granularity

Deadlock avoidance has the advantage that it is not necessary to preempt and roll back
processes, as in deadlock detection, and it is less restrictive than deadlock prevention; it is
therefore less expensive than detect-and-recover when deadlocks would otherwise occur
frequently and rolling back processes is costly. However, it does have a number of restrictions
on its use:
• The maximum resource requirement for each process must be stated in advance.
• The processes under consideration must be independent; that is, the order in which they execute
must be unconstrained by any synchronization requirements.
• There must be a fixed number of resources to allocate.
• No process may exit while holding resources.
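A well-known avoidance scheme satisfying these restrictions is the banker's algorithm. A minimal sketch of its safety check might look like the following (the resource matrices are an illustrative example, not from the text):

```python
def is_safe(available, allocation, maximum):
    """Banker's-style safety check: a state is safe only if some order exists
    in which every process can still run to completion."""
    need = [[m - a for m, a in zip(mx, al)] for mx, al in zip(maximum, allocation)]
    work = list(available)
    finished = [False] * len(allocation)
    progress = True
    while progress:
        progress = False
        for i, done in enumerate(finished):
            if not done and all(n <= w for n, w in zip(need[i], work)):
                # process i can finish and return everything it holds
                work = [w + a for w, a in zip(work, allocation[i])]
                finished[i] = True
                progress = True
    return all(finished)

# Three resource types, five processes with stated maximum demands.
alloc = [[0, 1, 0], [2, 0, 0], [3, 0, 2], [2, 1, 1], [0, 0, 2]]
maxim = [[7, 5, 3], [3, 2, 2], [9, 0, 2], [2, 2, 2], [4, 3, 3]]
safe = is_safe([3, 3, 2], alloc, maxim)     # a safe completion order exists
unsafe = is_safe([0, 0, 0], alloc, maxim)   # no process can finish at all
```

A resource request is granted only if the state that would result still passes this check; this is why the maximum demand of every process must be declared in advance.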

Q4. Deadlock is avoided by deadlock avoidance schemes. Is starvation still possible?
Explain your answer.

Ans: Yes, starvation is still possible. Deadlock avoidance only guarantees that the system never
reaches a deadlock state; it says nothing about which pending request is granted first. The
scheme tends to satisfy first those processes whose remaining needs can currently be fulfilled.
A process requesting a large number of resources may therefore wait indefinitely while other
processes keep acquiring and releasing smaller amounts, because granting the large request
could lead to an unsafe state.

Q5. Explain the phantom phenomenon. Why may this phenomenon lead to an incorrect
concurrent execution despite the use of the two-phase locking protocol?

Ans:

Suppose transaction T1 reads information about what tuples are in a relation (for example, it
scans all tuples satisfying some predicate), while transaction T2 inserts a tuple satisfying that
predicate. T1 and T2 do not access any tuple in common, yet they conflict with each other; in
effect, T1 and T2 conflict on a phantom tuple. If concurrency control is performed at the tuple
granularity, this conflict goes undetected, because two-phase locking can only lock tuples that
actually exist when they are accessed. This problem is called the phantom phenomenon.
Clearly, it is not sufficient merely to lock the tuples that are accessed; the information used to
find the tuples accessed by the transaction must also be locked.

The simplest solution to this problem is to associate a data item with the relation itself; this
data item represents the information used to find the tuples in the relation. The major
disadvantage of locking a data item corresponding to the entire relation is the low degree of
concurrency: two transactions that insert different tuples into a relation are prevented from
executing concurrently. A better solution is the index-locking technique. Any transaction that
inserts a tuple into a relation must insert information into every index maintained on the
relation, so the phantom phenomenon can be eliminated by imposing a locking protocol on the
indices.
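A toy sketch of the index-locking idea follows. It is deliberately simplified: locks are exclusive-only and taken on whole search-key buckets, whereas a real protocol would use shared/exclusive modes on index leaf nodes; transaction and key names are invented:

```python
# Both the reader that scans dept = "CS" and the writer that inserts a "CS"
# tuple must lock the same index bucket, so the conflict on the phantom
# tuple becomes visible even though no existing tuple is shared.
index_locks = {}   # index bucket (search-key value) -> set of lock holders

def lock_bucket(txn, key):
    holders = index_locks.setdefault(key, set())
    if holders - {txn}:
        return False     # conflict detected via the index, not via any tuple
    holders.add(txn)
    return True

assert lock_bucket("T1", "CS")        # T1 scans all CS tuples
assert not lock_bucket("T2", "CS")    # T2's insert of a CS tuple is blocked
assert lock_bucket("T2", "EE")        # inserts into other buckets still proceed
```

Compared with locking one data item for the whole relation, bucket-level index locks let T2's insert into "EE" run concurrently with T1's scan of "CS".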

Q6. Explain B+-trees and their usage in DBMS.


Ans: B-trees and B+-trees are special cases of the well-known tree data structure.
B-tree: A B-tree of order p, when used as an access structure on a key field to search for records
in a data file, can be defined as follows:

1. Each internal node in the B-tree (Fig.2) is of the form
   <P1, <K1, Pr1>, P2, <K2, Pr2>, ..., <Kq-1, Prq-1>, Pq>, where q <= p. Each Pi is a tree
   pointer: a pointer to another node in the B-tree. Each Pri is a data pointer: a pointer to the
   record whose search key field value is equal to Ki.
2. Within each node, K1 < K2 < ... < Kq-1.
3. For all search key field values X in the subtree pointed at by Pi (the ith subtree, see
   Fig.2), we have: Ki-1 < X < Ki for 1 < i < q; X < K1 for i = 1; and Kq-1 < X for i = q.
4. Each node has at most p tree pointers.
5. Each node, except the root and leaf nodes, has at least ⌈p/2⌉ tree pointers. The
   root node has at least two tree pointers unless it is the only node in the tree.
6. A node with q tree pointers, q <= p, has q - 1 search key field values (and hence
   has q - 1 data pointers).
7. All leaf nodes are at the same level. Leaf nodes have the same structure as internal
   nodes except that all of their tree pointers Pi are null.

A B-tree starts with a single root node (which is also a leaf node) at level 0 (zero). Once the root
node is full with p - 1 search key values and we attempt to insert another entry in the tree, the
root node splits into two nodes at level 1. Only the middle value is kept in the root node, and the
rest of the values are split evenly between the other two nodes. When a non-root node is full and
a new entry is inserted into it, that node is split into two nodes at the same level, and the middle
entry is moved to the parent node along with two pointers to the new split nodes. If the parent
node is full, it is also split. Splitting can propagate all the way to the root node, creating a new
level if the root is split.

Fig.2: B-tree structures. (a) A node in a B-tree with q - 1 search values. (b) A B-tree of
order p = 3. The values were inserted in the order 8, 5, 1, 7, 3, 12, 9, 6.

B-trees are sometimes used as primary file organizations. In this case, whole records are stored
within the B-tree nodes rather than just the <search key, record pointer> entries. This works well
for files with a relatively small number of records, and a small record size. B-trees provide a
multilevel access structure that is a balanced tree structure in which each node is at least half full.
Each node in a B-tree of order p can have at most p-1 search values.
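Searching such a node structure can be sketched as follows; the tree built below is the one obtained by inserting 8, 5, 1, 7, 3, 12, 9, 6 into an order-3 B-tree, matching the sequence given for Fig.2(b) (data pointers are omitted; the sketch only reports whether a key exists):

```python
from bisect import bisect_left

class Node:
    """A B-tree node: sorted keys K1 < ... < Kq-1, and for internal nodes
    the q tree pointers interleaved with them."""
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children   # list of q child Nodes, or None for a leaf

def search(node, key):
    i = bisect_left(node.keys, key)        # first position where keys[i] >= key
    if i < len(node.keys) and node.keys[i] == key:
        return True                        # found: here we would follow Pr_i
    if node.children is None:
        return False                       # leaf reached without a match
    return search(node.children[i], key)   # descend into the ith subtree

# Order p = 3: at most 2 keys per node. Inserting 8,5,1,7,3,12,9,6 yields:
root = Node([5, 8], [Node([1, 3]), Node([6, 7]), Node([9, 12])])
assert search(root, 7) and search(root, 5)
assert not search(root, 4)
```

Note that in a B-tree a search can terminate at an internal node (key 5 is found in the root), unlike in a B+-tree, where every search proceeds to a leaf.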

B+-Trees

Most implementations of a dynamic multilevel index use a variation of the B-tree data structure
called a B+-tree. In a B+-tree, data pointers are stored only at the leaf nodes of the tree; hence,
the structure of leaf nodes differs from the structure of internal nodes. The leaf nodes have an
entry for every value of the search field, along with a data pointer to the record (or to the block
that contains this record) if the search field is a key field. For a non-key search field, the pointer
points to a block containing pointers to the data file records, creating an extra level of
indirection. The leaf nodes of the B+-tree are usually linked together to provide ordered access
on the search field to the records. These leaf nodes are similar to the first (base) level of an
index. Internal nodes of the B+-tree correspond to the other levels of a multilevel index. Some
search field values from the leaf nodes are repeated in the internal nodes of the B+-tree to guide
the search. The structure of the internal nodes of a B+-tree of order p is as follows:
1. Each internal node is of the form
   <P1, K1, P2, K2, ..., Pq-1, Kq-1, Pq>
   where q <= p and each Pi is a tree pointer.
2. Within each internal node, K1 < K2 < ... < Kq-1.
3. For all search field values X in the subtree pointed at by Pi, we have Ki-1 < X <= Ki
   for 1 < i < q; X <= K1 for i = 1; and Kq-1 < X for i = q (see Fig.3).
4. Each internal node has at most p tree pointers.
5. Each internal node, except the root, has at least ⌈p/2⌉ tree pointers. The root
   node has at least two tree pointers if it is an internal node.
6. An internal node with q pointers, q <= p, has q - 1 search field values.
Fig.3: The nodes of a B+-tree. (a) Internal node of a B+-tree with q - 1 search values. (b) Leaf
node of a B+-tree with q - 1 search values and q - 1 data pointers.

The structure of the leaf nodes of a B+-tree of order p (Fig.3) is as follows:
1. Each leaf node is of the form
   <<K1, Pr1>, <K2, Pr2>, ..., <Kq-1, Prq-1>, Pnext>
   where q <= p, each Pri is a data pointer, and Pnext points to the next leaf node of the
   B+-tree.
2. Within each leaf node, K1 < K2 < ... < Kq-1, q <= p.
3. Each Pri is a data pointer that points to the record whose search field value is Ki, or
   to a file block containing the record (or to a block of record pointers that point to
   records whose search field value is Ki, if the search field is not a key).
4. Each leaf node has at least ⌈p/2⌉ values.
5. All leaf nodes are at the same level.
Because entries in the internal nodes of a B+-tree include search values and tree pointers
without any data pointers, more entries can be packed into an internal node of a B+-tree than
into a similar B-tree node. Thus, for the same block (node) size, the order p will be larger for
the B+-tree than for the B-tree. This can lead to fewer B+-tree levels, improving search time.
Because the structures of internal and leaf nodes of a B+-tree differ, their orders can differ as
well: we use p to denote the order for internal nodes and Pleaf to denote the order for leaf
nodes, defined as the maximum number of data pointers in a leaf node. Both B-trees and
B+-trees are used to store data in databases.
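The Pnext chain between leaves is what makes range queries cheap in a B+-tree. A minimal sketch (the leaf contents are invented, and a real scan would first locate the starting leaf by a root-to-leaf search rather than starting at the leftmost leaf):

```python
class Leaf:
    """A B+-tree leaf: sorted keys plus the Pnext pointer to the next leaf."""
    def __init__(self, keys):
        self.keys = keys
        self.next = None              # the Pnext pointer

def range_scan(first_leaf, lo, hi):
    out, leaf = [], first_leaf
    while leaf is not None:
        out += [k for k in leaf.keys if lo <= k <= hi]
        if leaf.keys and leaf.keys[-1] > hi:
            break                     # already past the upper bound: stop early
        leaf = leaf.next              # follow Pnext to the next leaf
    return out

l1, l2, l3 = Leaf([1, 3]), Leaf([5, 7]), Leaf([9, 12])
l1.next, l2.next = l2, l3
assert range_scan(l1, 3, 9) == [3, 5, 7, 9]
```

No internal node is touched after the scan begins; ordered access falls out of walking the leaf chain, exactly as the text describes.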

Assignment-2
Q1. Discuss the relative advantages of centralized and distributed databases.

Answer:

1. A distributed database allows a user convenient and transparent access to data which is not
stored at the site, while allowing each site control over its own local data. A distributed database
can be made more reliable than a centralized system because if one site fails, the database can
continue functioning, but if the centralized system fails, the database can no longer continue with
its normal operation. Also, a distributed database allows parallel execution of queries and
possibly splitting one query into many parts to increase throughput.
2. A centralized system is easier to design and implement. A centralized system is cheaper to
operate because messages do not have to be sent.

Q2. Explain how the following differ: fragmentation transparency, replication
transparency, and location transparency.

Answer:

a. With fragmentation transparency, the user of the system is unaware of any fragmentation the
system has implemented. A user may formulate queries against global relations and the system
will perform the necessary transformation to generate correct output.

b. With replication transparency, the user is unaware of any replicated data. The system must
prevent inconsistent operations on the data. This requires more complex concurrency control
algorithms.

c. Location transparency means the user is unaware of where data are stored. The system must
route data requests to the appropriate sites.

Q3. How might a distributed database designed for a local-area network differ from one
designed for a wide-area network?

Answer:
Data transfer on a local-area network (LAN) is much faster than on a wide-area network
(WAN). Thus, in a LAN design, replication and fragmentation will not speed up throughput as
much as they do in a WAN design, but they still increase reliability and availability.

Q4. Consider a distributed system with two sites, A and B. Consider whether site A can
distinguish among the following:
a. B goes down.
b. The link between A and B goes down.
c. B is extremely overloaded and its response time is 100 times longer than normal.

What implications does your answer have for recovery in distributed systems?

Answer:
One technique would be for B to periodically send an I-am-up message to A, indicating that it
is still alive. If A does not receive an I-am-up message, it can assume that either B or the
network link is down. Note that the absence of an I-am-up message does not allow A to
distinguish between these types of failure.

A technique that better allows A to determine whether the network is down is to send an
Are-you-up message to B using an alternate route. If A receives a reply, it can determine that
the direct network link is down and that B is up.

If we assume that A knows B is up and reachable (via the I-am-up mechanism) and that A has
some value N that indicates a normal response time, A can monitor the response time from B
and compare it to N, allowing A to determine whether B is overloaded. The implication of
these techniques is that A can choose another host, say C, in the system if B is down,
unreachable, or overloaded.
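The combined heartbeat-plus-response-time check can be sketched as a single classification function (the timeout, the normal baseline N, and the 100x overload factor follow the question; all numeric values are illustrative):

```python
def classify(last_heartbeat_age, response_time, timeout=5.0, normal=1.0):
    """A's view of B, given seconds since the last I-am-up message and
    B's latest observed response time."""
    if last_heartbeat_age > timeout:
        # No heartbeat: B or the link is down, and A cannot tell which.
        return "down-or-unreachable"
    if response_time > 100 * normal:
        return "overloaded"           # alive, but 100x slower than N
    return "up"

assert classify(12.0, 1.0) == "down-or-unreachable"
assert classify(1.0, 150.0) == "overloaded"
assert classify(1.0, 0.8) == "up"
```

In all three non-"up" outcomes the recovery decision is the same: route the work to another host such as C, which is why distinguishing the failure causes precisely is less important than detecting that B should be avoided.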

Q5. Suggest an alternative scheme for implementing persistent messaging based on
sequence numbers instead of timestamps.

Answer:

Web Services Reliability (WS-Reliability) provides SOAP-based (simple object access protocol)
web services with the ability to exchange messages asynchronously with guaranteed delivery,
without duplicates, and with message ordering. It is a SOAP standard for managing message
aggregation and sequencing, and provides a standard tactic for implementing the Guaranteed
Delivery and Resequencer patterns, among others. WS-Reliability leverages the SOAP Header
mechanism to add the header elements Message Header, Reliable Message, Message Order, and
RM Response to SOAP messages. These elements denote message identifiers such as group ids
and sequence numbers, timestamps, time to live values, message type values, sender and receiver
information, and acknowledgment callback information. WS-Reliability is produced by a number
of vendors, including Sun, Oracle, and Sonic, and it has been submitted to OASIS. It is heavily
influenced by the functionality of the ebXML Message Service.
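Independently of WS-Reliability, the core sequence-number idea can be sketched directly: the receiver tracks, per sender, the next sequence number it expects, so retransmitted duplicates are filtered out and gaps trigger redelivery. This toy receiver, its sender ids, and its resend policy are illustrative assumptions:

```python
class Receiver:
    """Persistent-messaging receiver keyed on per-sender sequence numbers."""
    def __init__(self):
        self.expected = {}        # sender id -> next sequence number expected
        self.delivered = []

    def receive(self, sender, seq, payload):
        nxt = self.expected.get(sender, 0)
        if seq < nxt:
            return "duplicate"    # already delivered: ack again, do not redeliver
        if seq > nxt:
            return "gap"          # an earlier message was lost: request a resend
        self.expected[sender] = nxt + 1
        self.delivered.append(payload)
        return "ack"

r = Receiver()
assert r.receive("S", 0, "a") == "ack"
assert r.receive("S", 0, "a") == "duplicate"   # retransmission is filtered out
assert r.receive("S", 2, "c") == "gap"         # seq 1 is missing
assert r.receive("S", 1, "b") == "ack"
assert r.delivered == ["a", "b"]
```

Unlike timestamps, sequence numbers need no synchronized clocks; the per-sender counters also give the in-order, exactly-once delivery that WS-Reliability's group ids and sequence numbers provide.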

Assignment-3

ARIES
ARIES, like many other algorithms, is based on the write-ahead logging (WAL) protocol,
which ensures recoverability of a database in the presence of a crash. All updates to all pages
are logged. ARIES uses a log sequence number (LSN) stored on each page to correlate the state
of the page with the logged updates of that page. By examining the LSN of a page (called the
PageLSN), it can easily be determined which logged updates are reflected in the page. Being
able to determine the state of a page with respect to logged updates is critical while repeating
history, since it is essential that any update be applied to a page once and only once. Failure to
respect this requirement will in most cases result in a violation of data consistency.
Updates performed during forward processing of transactions are described by Update Log
Records (ULRs). However, logging is not restricted to forward processing. ARIES also logs,
using Compensation Log Records (CLRs), updates (i.e. compensations of updates of aborted /
incomplete transactions) performed during partial or total rollbacks of transactions. By
appropriate chaining of CLR records to log records written during forward processing, a bounded
amount of logging is ensured during rollbacks, even in the face of repeated failures during crash
recovery. This chaining is achieved by (1) assigning LSNs in ascending sequence, and
(2) adding to each log record a pointer (called the PrevLSN) to the most recent preceding log
record written by the same transaction.
When the undo of a log record causes a CLR record to be written, a pointer (called the
UndoNextLSN) to the predecessor of the log record being undone is added to the CLR record.
The UndoNextLSN keeps track of the progress of a rollback. It tells the system from where to
continue the rollback of the transaction, if a system failure were to interrupt the completion of
the rollback.
Periodically during normal processing, ARIES takes fuzzy checkpoints in order to avoid
quiescing the database while checkpoint data is written to disk. Checkpoints are taken to make
crash recovery more efficient.

When performing crash recovery, ARIES makes three passes (i.e. Analysis, Redo and Undo)
over the log.

During Analysis, ARIES scans the log from the most recent checkpoint to the end of the log. It
determines
1) the starting point of the Redo phase by keeping track of dirty pages; and
2) the list of transactions to be rolled back in the Undo phase by monitoring the state of
transactions.
During Redo, ARIES repeats history. It is ensured that updates of all transactions have been
executed once and only once. Thus, the database is returned to the state it was in immediately
before the crash.

Finally, Undo rolls back all updates of transactions that have been identified as active at the
time the crash occurred.

ARIES
1. Uses log sequence number (LSN) to identify log records. Stores LSNs in pages to identify
what updates have already been applied to a database page
2. Physiological redo
3. Dirty page table to avoid unnecessary redos during recovery
4. Fuzzy checkpointing that only records information about dirty pages, and does not require
dirty pages to be written out at checkpoint time

1. The Log and Log Sequence Number (LSN)


a) A unique LSN is associated with every log record.
 LSN increases monotonically and indicates the disk address of the log record it is
associated with.
 In addition, each data page stores the LSN of the latest log record corresponding to a
change for that page.
b) A log record stores:
 the previous LSN of that transaction: it links the log records of each transaction, acting
as a back pointer to the previous record of the same transaction
 the transaction ID
 the type of log record
d) For a write operation the following additional information is logged:
 Page ID for the page that includes the item
 Length of the updated item
 Its offset from the beginning of the page
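The fields listed above might be collected in a record structure like this (the field names and example values are illustrative, not ARIES's actual identifiers):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LogRecord:
    lsn: int                     # unique, assigned in ascending sequence
    prev_lsn: Optional[int]      # PrevLSN: back pointer within the transaction
    txn_id: int
    rec_type: str                # e.g. "update", "CLR", "commit"
    # additional information logged for a write operation:
    page_id: Optional[int] = None   # page that includes the item
    length: Optional[int] = None    # length of the updated item
    offset: Optional[int] = None    # offset from the beginning of the page

r1 = LogRecord(lsn=10, prev_lsn=None, txn_id=1, rec_type="update",
               page_id=4, length=8, offset=96)
r2 = LogRecord(lsn=11, prev_lsn=10, txn_id=1, rec_type="update",
               page_id=4, length=8, offset=104)
assert r2.prev_lsn == r1.lsn    # T1's records are chained via PrevLSN
```

Following the `prev_lsn` chain from a transaction's latest record back to `None` visits exactly its log records, which is how rollback finds the updates to compensate.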

2. Physiological redo
The affected page is physically identified, but the action within the page can be logical.
1. Physiological redo is used to reduce logging overheads,
e.g. when a record is deleted and all other records have to be moved to fill the hole:
 Physiological redo can log just the record deletion
 Physical redo would require logging of old and new values for much of the page
 It requires the page to be output to disk atomically
2. Atomic page output is easy to achieve with hardware RAID, and is also supported by some
disk systems.
3. Incomplete page output can be detected by checksum techniques,
 but extra actions are required for recovery
 it is treated as a media failure

3. The Transaction table and the Dirty Page table


For efficient recovery, the following tables are also stored in the log during checkpointing:

a. Transaction table: Contains an entry for each active transaction, with information such as
transaction ID, transaction status and the LSN of the most recent log record for the transaction.
b. Dirty Page table: Contains an entry for each dirty page in the buffer, which includes the
page ID and the LSN corresponding to the earliest update to that page.

4. Checkpointing
Checkpointing does the following:
 Writes a begin_checkpoint record in the log
 Writes an end_checkpoint record in the log. With this record the contents of transaction
table and dirty page table are appended to the end of the log.
 Writes the LSN of the begin_checkpoint record to a special file. This special file is
accessed during recovery to locate the last checkpoint information.
 To reduce the cost of checkpointing and allow the system to continue to execute
transactions, ARIES uses “fuzzy checkpointing”.

The following steps are performed for recovery by ARIES:-

a. Analysis phase: Starts at the begin_checkpoint record and proceeds to the end_checkpoint
record, accessing the transaction table and dirty page table that were appended to the log. Note
that during this phase some other log records may be written to the log and the transaction
table may be modified. The analysis phase compiles the set of redo and undo operations to be
performed, and then ends.

b. Redo phase: Starts from the point in the log up to which all dirty pages had been flushed,
and moves forward to the end of the log. Any logged change to a page in the dirty page table
is redone.

c. Undo phase: Starts from the end of the log and proceeds backward while performing the
appropriate undo operations. For each undo, it writes a compensation log record (CLR) in the
log. Recovery completes at the end of the undo phase.
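The three passes can be illustrated with a toy in-memory log; the log contents, commit set, and on-disk PageLSN values are invented for the example, and the PageLSN comparison is exactly the "once and only once" rule from the text:

```python
log = [  # (lsn, txn, page)
    (1, "T1", "P1"), (2, "T2", "P2"), (3, "T1", "P1"),
]
committed = {"T2"}                  # T1 was still active at the crash
page_lsn = {"P1": 1, "P2": 0}       # state of each page as found on disk

# Analysis: determine the loser transactions to roll back (real ARIES also
# rebuilds the dirty page table here to find the redo starting point).
losers = {txn for _, txn, _ in log if txn not in committed}

# Redo: repeat history. A record is applied only if the page's LSN shows
# the update is not yet reflected, so each update is applied exactly once.
for lsn, txn, page in log:
    if page_lsn.get(page, 0) < lsn:
        page_lsn[page] = lsn        # apply the update, stamp the PageLSN

# Undo: roll back losers from the end of the log, writing a CLR per undo.
clrs = [(txn, lsn) for lsn, txn, page in reversed(log) if txn in losers]

assert page_lsn == {"P1": 3, "P2": 2}
assert losers == {"T1"} and clrs == [("T1", 3), ("T1", 1)]
```

Note that record 1 is skipped during redo (P1's PageLSN already shows it applied) while records 2 and 3 are redone, and that T1's two updates are then compensated in reverse LSN order.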

Some other Features of ARIES are:


1. Recovery Independence:
 Pages can be recovered independently of others.
E.g. if some disk pages fail they can be recovered from a backup while other pages are
being used
2. Savepoints:
 Transactions can record savepoints and roll back to a savepoint
 Useful for complex transactions
 Also used to rollback just enough to release locks on deadlock
3. Fine-grained locking:
 Index concurrency algorithms that permit tuple-level locking on indices can be used
 These require logical undo, rather than physical undo, as in the advanced recovery
algorithm

4. Recovery optimizations: For example:
 Dirty page table can be used to prefetch pages during redo
 Out-of-order redo is possible:
 redo can be postponed on a page being fetched from disk, and performed when
the page is fetched
 meanwhile, other log records can continue to be processed
