
Subject: Database Management System (NCS-502)
Mukesh Kumar, Assistant Professor (CSE Deptt.)


UNIT-4
Syllabus:
Transaction Processing Concept: Transaction system, Testing of serializability, serializability of
schedules, conflict & view serializable schedule, recoverability, Recovery from transaction failures, log
based recovery, checkpoints, deadlock handling.
Distributed Database: distributed data storage, concurrency control, directory system.

Transaction: A transaction is a unit of program execution that accesses and possibly updates various
data items. Usually, a transaction is initiated by a user program written in a high-level data-manipulation
language or programming language (for example, SQL, COBOL, C, C++, or Java), where it is delimited
by statements (or function calls) of the form begin transaction and end transaction. The transaction
consists of all operations executed between the begin transaction and end transaction.

ACID Properties:
To ensure integrity of the data, we require that the database system maintain the following properties of
the transactions known as ACID properties.
 Atomicity. Either all operations of the transaction are executed properly, or none are. The database must never be left in a state where a transaction is partially completed; its state is well defined either before the transaction begins or after the transaction commits, aborts, or fails.
 Consistency. The database must remain in a consistent state after any transaction. No transaction
should have any adverse effect on the data residing in the database. If the database was in a
consistent state before the execution of a transaction, it must remain consistent after the execution of
the transaction as well.
 Isolation. Even though multiple transactions may execute concurrently, the system guarantees that,
for every pair of transactions Ti and Tj , it appears to Ti that either Tj finished execution before Ti
started, or Tj started execution after Ti finished. Thus, each transaction is unaware of other
transactions executing concurrently in the system.
 Durability. After a transaction completes successfully, the changes it has made to the database
persist, even if there are system failures.

Operation on Transactions:
Transactions access data using two operations:
 read(X), which transfers the data item X from the database to a local buffer belonging to the
transaction that executed the read operation.
 write(X), which transfers the data item X from the local buffer of the transaction that executed
the write back to the database.

Let Ti be a transaction that transfers $50 from account A to account B. This transaction can be defined
as:
Ti: read(A);
A := A − 50;
write(A);
read(B);
B := B + 50;
write(B).
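As a concrete illustration, the read/write model and the transfer transaction Ti can be sketched in Python. The in-memory `database` dict, the local `buffer`, and the helper names are hypothetical, not part of any real DBMS API:

```python
# Illustrative sketch of transaction Ti under the read/write model.
database = {"A": 100, "B": 200}   # persistent data items
buffer = {}                       # transaction-local buffer

def read(x):
    """read(X): copy data item X from the database into the local buffer."""
    buffer[x] = database[x]

def write(x):
    """write(X): copy X from the local buffer back to the database."""
    database[x] = buffer[x]

# Ti: transfer $50 from account A to account B
read("A")
buffer["A"] -= 50
write("A")
read("B")
buffer["B"] += 50
write("B")

print(database)   # {'A': 50, 'B': 250}
```

Note that between read(A) and write(A) the transaction computes only on its local copy; the database itself changes only at each write.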

I.T.S Engineering College, Greater Noida


Transaction State: A transaction must be in one of the following states:
• Active, the initial state; the transaction stays in this state while it is executing
• Partially committed, after the final statement has been executed
• Failed, after the discovery that normal execution can no longer proceed
• Aborted, after the transaction has been rolled back and the database has been restored to its state
prior to the start of the transaction
• Committed, after successful completion.

1. A transaction starts in the active state.


2. When it finishes its final statement, it enters the partially committed state. At this point, the
transaction has completed its execution, but it is still possible that it may have to be aborted,
since the actual output may still be temporarily residing in main memory, and thus a hardware
failure may preclude its successful completion.
3. A transaction enters the failed state after the system determines that the transaction can no longer
proceed with its normal execution (for example, because of hardware or logical errors).
4. A failed transaction must be rolled back. Then, it enters the aborted state. At this point, the system
has two options:
• It can restart the transaction, but only if the transaction was aborted as a result of some
hardware or software error that was not created through the internal logic of the transaction.
A restarted transaction is considered to be a new transaction.
• It can kill the transaction. It usually does so because of some internal logical error that can be
corrected only by rewriting the application program, or because the input was bad, or because
the desired data were not found in the database.
5. If a transaction executes all its operations successfully, it is said to be committed. All its effects
are now permanently established on the database system.

Implementation of Atomicity and Durability:

Shadow Copy: Shadow copy is a simple, but extremely inefficient, scheme to maintain the atomicity and
durability of transactions. This scheme, which is based on making copies of the database, called shadow
copies, assumes that only one transaction is active at a time. The scheme also assumes that the database
is simply a file on disk. A pointer called db-pointer is maintained on disk; it points to the current copy of
the database.
In the shadow-copy scheme, a transaction that wants to update the database first creates a complete copy
of the database. All updates are done on the new database copy, leaving the original copy, the shadow
copy, untouched. If at any point the transaction has to be aborted, the system merely deletes the new
copy. The old copy of the database has not been affected.




If the transaction completes, it is committed as follows.


1. The operating system is asked to make sure that all pages of the new copy of the database have
been written out to disk.
2. Once the operating system has written all the pages to disk, the database system updates the
pointer db-pointer to point to the new copy of the database.
3. The new copy then becomes the current copy of the database, and the old copy of the database is
then deleted.
The transaction is said to have been committed at the point where the updated db-pointer is written to
disk.
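The three commit steps can be sketched in Python, assuming the database is a single file and db-pointer is a small file naming the current copy. All file names and helpers here are illustrative; `os.replace` stands in for the atomic on-disk update of db-pointer:

```python
import os
import shutil
import tempfile

def commit_shadow_copy(db_pointer_path, new_copy_path):
    """Commit a transaction under the shadow-copy scheme (illustrative)."""
    # 1. Force all pages of the new copy out to disk.
    fd = os.open(new_copy_path, os.O_RDONLY)
    os.fsync(fd)
    os.close(fd)
    # 2. Durably update db-pointer to name the new copy. os.replace is
    #    atomic, so db-pointer always names one complete copy.
    tmp = db_pointer_path + ".tmp"
    with open(tmp, "w") as f:
        f.write(new_copy_path)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, db_pointer_path)   # the commit point

with tempfile.TemporaryDirectory() as d:
    old_copy = os.path.join(d, "db.v1")
    new_copy = os.path.join(d, "db.v2")
    db_pointer = os.path.join(d, "db-pointer")
    with open(old_copy, "w") as f:        # the current database
        f.write("balance=100")
    with open(db_pointer, "w") as f:      # db-pointer -> current copy
        f.write(old_copy)
    # Transaction: copy the database, update the copy, then commit.
    shutil.copyfile(old_copy, new_copy)
    with open(new_copy, "w") as f:
        f.write("balance=50")
    commit_shadow_copy(db_pointer, new_copy)
    with open(db_pointer) as f:
        current_copy = f.read()
    with open(current_copy) as f:
        committed_value = f.read()

print(committed_value)   # balance=50
```

If the transaction aborts before `commit_shadow_copy` runs, deleting the new copy suffices: db-pointer still names the untouched shadow of the old database.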

Concurrent Executions
Transaction-processing systems usually allow multiple transactions to run concurrently. Allowing
multiple transactions to update data concurrently causes several complications with consistency.
Ensuring consistency in spite of concurrent execution of transactions requires extra work; it is far easier
to insist that transactions run serially—that is, one at a time, each starting only after the previous one has
completed.
There are two good reasons for allowing concurrency:
 Improved throughput and resource utilization: Concurrent execution increases the
throughput of the system—that is, the number of transactions executed in a given amount of
time.
 Reduced waiting time: Concurrent execution reduces the average response time: the average
time for a transaction to be completed after it has been submitted.

Schedules: The execution sequences that describe chronological order in which instructions are
executed in the system are called schedules.
Let T1 and T2 be two transactions that transfer funds from one account to another. Transaction T1
transfers $50 from account A to account B; transaction T2 transfers 10% of the balance from account A
to account B. They are defined as:

T1: read(A);              T2: read(A);
    A := A − 50;              temp := A * 0.1;
    write(A);                 A := A − temp;
    read(B);                  write(A);
    B := B + 50;              read(B);
    write(B).                 B := B + temp;
                              write(B).
A schedule for a set of transactions must consist of all instructions of those transactions, and must
preserve the order in which the instructions appear in each individual transaction.

Serial Schedule: Serial schedule consists of a sequence of instructions from various transactions, where
the instructions belonging to one single transaction appear together in that schedule.
Call the execution sequence in which T1 is followed by T2 schedule 1, and the execution sequence in
which T2 is followed by T1 schedule 2.

[Figure: schedule 1 (T1 followed by T2) and schedule 2 (T2 followed by T1), shown as tables]

Serial schedule must be in following order:


Serial schedule 1: r1(A), w1(A), r1(B), w1(B), r2(A), w2(A), r2(B), w2(B)
Serial schedule 2: r2(A), w2(A), r2(B), w2(B), r1(A), w1(A), r1(B), w1(B)

Concurrent Schedule: A concurrent schedule is also known as a non-serial schedule. A non-serial schedule is
a schedule where the operations of a group of concurrent transactions are interleaved (e.g., schedule 3).

[Figure: schedule 3 and schedule 4, showing interleaved executions of T1 and T2]



Serializability: Serializability is the classical concurrency scheme. It ensures that a schedule for
executing concurrent transactions is equivalent to one that executes the transactions serially in some
order. It assumes that all accesses to the database are done using read and write operations.

To ensure serializability, we consider only two operations: read and write. Between a read(Q) and a
write(Q) instruction on a data item Q, a transaction may perform an arbitrary sequence of operations on
the copy of Q that is residing in the local buffer of the transaction.

Equivalence of Schedules
Schedule equivalence can be of the following types −

1. Result Equivalence
If two schedules produce the same result after execution, they are said to be result equivalent. They
may yield the same result for one set of values and different results for another set of values. That is
why this equivalence is not generally considered significant.

2. View Equivalence
Two schedules S1 and S2 are view equivalent if the transactions in both schedules perform similar
actions in a similar manner. For example −
 If a transaction Ti reads the initial value of a data item in S1, then it also reads the initial value of
that item in S2.
 If Ti reads a value written by Tj in S1, then it also reads the value written by Tj in S2.
 If Ti performs the final write on a data item in S1, then it also performs the final write on that
item in S2.

3. Conflict Equivalence
Two operations in a schedule are said to conflict if they have the following properties −
 They belong to different transactions.
 They access the same data item.
 At least one of them is a write operation.

Two schedules S1 and S2 having multiple transactions with conflicting operations are said to be conflict
equivalent if and only if −
 Both schedules contain the same set of transactions.
 The order of every pair of conflicting operations is the same in both schedules.



Conflict Serializability: Let us consider a schedule S in which there are two consecutive instructions Ii
and Ij, of transactions Ti and Tj , respectively (i ≠ j).
 If Ii and Ij refer to different data items, then we can swap Ii and Ij without affecting the results of any
instruction in the schedule.
 If Ii and Ij refer to the same data item Q, then the order of the two steps may matter. Since we are
dealing with only read and write instructions, there are four cases that we need to consider:

1. Ii = read(Q), Ij = read(Q). The order of Ii and Ij does not matter, since the same value of Q is
read by Ti and Tj , regardless of the order.

2. Ii = read(Q), Ij = write(Q). If Ii comes before Ij, then Ti does not read the value of Q that is
written by Tj in instruction Ij. If Ij comes before Ii, then Ti reads the value of Q that is written by
Tj. Thus, the order of Ii and Ij matters.

3. Ii = write(Q), Ij = read(Q). The order of Ii and Ij matters for reasons similar to those of the
previous case.

4. Ii = write(Q), Ij = write(Q). Since both instructions are write operations, the order of these
instructions does not affect either Ti or Tj. However, the value obtained by the next read(Q)
instruction of S is affected, since the result of only the latter of the two write instructions is
preserved in the database. If there is no other write(Q) instruction after Ii and Ij in S, then the
order of Ii and Ij directly affects the final value of Q in the database state that results from
schedule S.

Thus, only in the case 1 where both Ii and Ij are read instructions does the relative order of their
execution not matter.

[Figure: schedule 1 showing only the read and write instructions, and schedule 2]

Let Ii and Ij be consecutive instructions of a schedule S. If Ii and Ij are instructions of different
transactions and Ii and Ij do not conflict, then we can swap the order of Ii and Ij to produce a new
schedule S′. We expect S to be equivalent to S′, since all instructions appear in the same order in both
schedules except for Ii and Ij, whose order does not matter.
For schedule 1 above, we continue to swap non-conflicting instructions:
1. Swap the read(B) instruction of T1 with the read(A) instruction of T2.
2. Swap the write(B) instruction of T1 with the write(A) instruction of T2.
3. Swap the write(B) instruction of T1 with the read(A) instruction of T2.



If a schedule S can be transformed into a schedule S′ by a series of swaps of non-conflicting instructions,
we say that S and S′ are conflict equivalent. For schedule 1 above, the final result of these swaps is a
serial schedule. The concept of conflict equivalence leads to the concept of conflict serializability: we say
that a schedule S is conflict serializable if it is conflict equivalent to a serial schedule.
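A standard way to test conflict serializability is the precedence-graph method (a sketch of the usual technique, not described in these notes): add an edge Ti → Tj for each pair of conflicting operations in which Ti's operation comes first, then check the graph for a cycle. The schedule is conflict serializable iff the graph is acyclic. The function name and schedule encoding below are illustrative:

```python
def is_conflict_serializable(schedule):
    """schedule: list of (txn, op, item) triples, e.g. ('T1', 'r', 'A')."""
    txns = {t for t, _, _ in schedule}
    edges = {t: set() for t in txns}
    # Edge Ti -> Tj whenever an op of Ti conflicts with a LATER op of Tj.
    for i, (ti, opi, x) in enumerate(schedule):
        for tj, opj, y in schedule[i + 1:]:
            if ti != tj and x == y and 'w' in (opi, opj):
                edges[ti].add(tj)
    # Cycle detection by depth-first search (gray = on the current path).
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {t: WHITE for t in txns}
    def dfs(t):
        color[t] = GRAY
        for u in edges[t]:
            if color[u] == GRAY or (color[u] == WHITE and dfs(u)):
                return True        # found a cycle
        color[t] = BLACK
        return False
    return not any(color[t] == WHITE and dfs(t) for t in txns)

# A schedule-3-style interleaving (T1 precedes T2 on every item) is serializable:
s3 = [('T1','r','A'), ('T1','w','A'), ('T2','r','A'), ('T2','w','A'),
      ('T1','r','B'), ('T1','w','B'), ('T2','r','B'), ('T2','w','B')]
# An interleaving with conflict edges in both directions is not:
s4 = [('T1','r','A'), ('T2','w','A'), ('T2','r','B'), ('T1','w','B')]
print(is_conflict_serializable(s3), is_conflict_serializable(s4))  # True False
```

In s4, T1's read(A) precedes T2's write(A) (edge T1 → T2) while T2's read(B) precedes T1's write(B) (edge T2 → T1), so the graph has a cycle.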

Recoverability:

1. Recoverable Schedules
A recoverable schedule is one where, for each pair of transactions Ti and Tj such that Tj reads a data
item previously written by Ti, the commit operation of Ti appears before the commit operation of Tj.

Consider a schedule in which T9 is a transaction that performs only one instruction: read(A), reading a
value written by T8. Suppose that the system allows T9 to commit immediately after executing the
read(A) instruction; thus, T9 commits before T8 does. Now suppose that T8 fails before it commits.
Since T9 has read the value of data item A written by T8, we must abort T9 to ensure transaction
atomicity. However, T9 has already committed and cannot be aborted. Thus, we have a situation where it
is impossible to recover correctly from the failure of T8.

2. Cascadeless Schedules:
If a schedule is recoverable, to recover correctly from the failure of a transaction Ti, we may have to roll
back several transactions. Such situations occur if transactions have read data written by Ti.
A cascadeless schedule is one where, for each pair of transactions Ti and Tj such that Tj reads a data
item previously written by Ti, the commit operation of Ti appears before the read operation of Tj . It is
easy to verify that every cascadeless schedule is also recoverable.
Consider the partial schedule Transaction T10 writes a value of
A that is read by transaction T11. Transaction T11 writes a
value of A that is read by transaction T12. Suppose that, at this
point, T10 fails. T10 must be rolled back. Since T11 is
dependent on T10, T11 must be rolled back. Since T12 is
dependent on T11, T12 must be rolled back. This phenomenon,
in which a single transaction failure leads to a series of
transaction rollbacks, is called cascading rollback.
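Using the reads-from definitions above, recoverability and cascadelessness can be checked mechanically. The sketch below is illustrative (the `classify` helper and the operation encoding are assumptions, not a standard API):

```python
def classify(schedule):
    """schedule: list of (txn, op, item); op is 'r', 'w', or 'c' (commit).
    Returns (recoverable, cascadeless)."""
    committed_at = {t: pos for pos, (t, op, _) in enumerate(schedule)
                    if op == 'c'}
    last_writer = {}                  # item -> (txn, position of write)
    recoverable = cascadeless = True
    INF = float('inf')                # an uncommitted writer never commits
    for pos, (t, op, x) in enumerate(schedule):
        if op == 'w':
            last_writer[x] = (t, pos)
        elif op == 'r' and x in last_writer:
            wt, _ = last_writer[x]
            if wt != t:               # t reads a value written by wt
                if committed_at.get(wt, INF) > pos:
                    cascadeless = False    # read before writer's commit
                if committed_at.get(wt, INF) > committed_at.get(t, INF):
                    recoverable = False    # reader commits first

    return recoverable, cascadeless

# The T8/T9 situation above: T9 commits before T8 after reading T8's write.
non_recoverable = [('T8','w','A'), ('T9','r','A'), ('T9','c',None),
                   ('T8','c',None)]
# A cascadeless variant: T8 commits before T9 reads.
cascadeless_ok = [('T8','w','A'), ('T8','c',None), ('T9','r','A'),
                  ('T9','c',None)]
print(classify(non_recoverable), classify(cascadeless_ok))
```

As the notes state, every cascadeless schedule is also recoverable, which this check reflects: a read that happens after the writer's commit also forces the commit order.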

Recovery: Recovery scheme is an integral part of a database system that can restore the database to the
consistent state that existed before the failure. The recovery scheme must also provide high availability;
that is, it must minimize the time for which the database is not usable after a crash.

Failure Classification:
There are various types of failure that may occur in a system, each of which needs to be dealt with in a
different manner. The simplest type of failure is one that does not result in the loss of information in the
system. The failures that are more difficult to deal with are those that result in loss of information. We
shall consider only the following types of failure:



1. Transaction failure. There are two types of errors that may cause a transaction to fail:
a. Logical error. The transaction can no longer continue with its normal execution because of
some internal condition, such as bad input, data not found, overflow, or resource limit
exceeded.
b. System error. The system has entered an undesirable state (for example, deadlock), as a
result of which a transaction cannot continue with its normal execution. The transaction,
however, can be re-executed at a later time.
2. System crash. There is a hardware malfunction, or a bug in the database software or the operating
system, that causes the loss of the content of volatile storage, and brings transaction processing to a
halt. The content of nonvolatile storage remains intact, and is not corrupted. The assumption that
hardware errors and bugs in the software bring the system to a halt, but do not corrupt the
nonvolatile storage contents, is known as the fail-stop assumption. Well-designed systems have
numerous internal checks, at the hardware and software levels, that bring the system to a halt
when there is an error. Hence, the fail-stop assumption is a reasonable one.
3. Disk failure. A disk block loses its content as a result of either a head crash or failure during a data
transfer operation. Copies of the data on other disks, or archival backups on tertiary media, such as
tapes, are used to recover from the failure.

Storage Structure
We have already described the storage system. In brief, the storage structure can be divided into two
categories –
 Volatile storage − As the name suggests, volatile storage cannot survive system crashes.
Volatile storage devices are placed very close to the CPU; normally they are embedded in the
chipset itself. Main memory and cache memory are examples of volatile storage. They are fast
but can store only a small amount of information.
 Non-volatile storage − These memories are made to survive system crashes. They are huge in
data storage capacity, but slower in access. Examples include hard disks, magnetic tapes, flash
memory, and non-volatile (battery-backed) RAM.

Recovery and Atomicity


When a system crashes, it may have several transactions being executed and various files opened for
them to modify the data items. Transactions are made of various operations, which are atomic in nature.
But according to ACID properties of DBMS, atomicity of transactions as a whole must be maintained,
that is, either all the operations are executed or none.

When a DBMS recovers from a crash, it should do the following −


 It should check the states of all the transactions, which were being executed.
 A transaction may be in the middle of some operation; the DBMS must ensure the atomicity of
the transaction in this case.
 It should check whether the transaction can be completed now or it needs to be rolled back.
 No transactions would be allowed to leave the DBMS in an inconsistent state.
There are two types of techniques that can help a DBMS recover while maintaining the atomicity of a
transaction −
 Maintaining the logs of each transaction, and writing them onto some stable storage before
actually modifying the database.
 Maintaining shadow paging, where the changes are done on a volatile memory, and later, the
actual database is updated.



Log-based Recovery
The log is a sequence of records that maintains a record of the actions performed by transactions. It is
important that the logs are written prior to the actual modification and stored on stable storage,
which is failsafe. An update log record describes a single database write. It has these fields:
 Transaction identifier is the unique identifier of the transaction that performed the write
operation.
 Data-item identifier is the unique identifier of the data item written. Typically, it is the location
on disk of the data item.
 Old value is the value of the data item prior to the write.
 New value is the value that the data item will have after the write.

Other special log records exist to record significant events during transaction processing, such as the
start of a transaction and the commit or abort of a transaction. We denote the various types of log
records as:
 <Ti start>. Transaction Ti has started.
 <Ti, Xj, V1, V2>. Transaction Ti has performed a write on data item Xj . Xj had value V1 before the
write, and will have value V2 after the write.
 <Ti commit>. Transaction Ti has committed.
 <Ti abort>. Transaction Ti has aborted.

Whenever a transaction performs a write, it is essential that the log record for that write be created
before the database is modified. Once a log record exists, we can output the modification to the database
if that is desirable. The database can be modified using two approaches
 Deferred database modification − All logs are written on to the stable storage and the database is
updated when a transaction commits.
 Immediate database modification − Each database modification is preceded by its log record; the
database is modified immediately after every operation, once the corresponding log record has been
written.

Using the log, the system can handle any failure that does not result in the loss of information in
nonvolatile storage. The recovery scheme uses two recovery procedures:
• undo(Ti) restores the value of all data items updated by transaction Ti to the old values.
• redo(Ti) sets the value of all data items updated by transaction Ti to the new values.

After a failure has occurred, the recovery scheme consults the log to determine which transactions need
to be redone, and which need to be undone:
 Transaction Ti needs to be undone if the log contains the record <Ti start>, but does not contain the
record <Ti commit>.
 Transaction Ti needs to be redone if the log contains both the record <Ti start> and the record <Ti
commit>.

Checkpoints
There are two major difficulties with this Log Based Recovery and redo /undo operation.
1. The search process is time consuming.
2. Most of the transactions that, according to our algorithm, need to be redone have already written
their updates into the database. Although redoing them will cause no harm, it will nevertheless
cause recovery to take longer.



Moreover, keeping and maintaining logs in real time in a real environment may fill up all the storage
space available in the system. As time passes, the log file may grow too big to be handled at all.

To reduce these overheads, we introduce checkpoints. A checkpoint is a mechanism whereby all the
previous logs are removed from the system and stored permanently on disk. A checkpoint
declares a point before which the DBMS was in a consistent state and all transactions were
committed. The system periodically performs checkpoints, which require the following sequence of
actions to take place:
1. Output onto stable storage all log records currently residing in main memory.
2. Output to the disk all modified buffer blocks.
3. Output onto stable storage a log record <checkpoint>.

Recovery
The exact recovery operations to be performed depend on the modification technique being used. For the
immediate-modification technique, the recovery operations are:
 For all transactions Tk that have no <Tk commit> record in the log, execute undo(Tk).
 For all transactions Tk for which the record <Tk commit> appears in the log, execute redo(Tk).

When a system with concurrent transactions crashes and recovers, it behaves in the following manner −
 The recovery system reads the logs backwards from the end to the last checkpoint.
 It maintains two lists, an undo-list and a redo-list.
 If the recovery system sees a log with <Tn, Start> and <Tn, Commit>, or just <Tn, Commit>, it
puts the transaction in the redo-list.
 If the recovery system sees a log with <Tn, Start> but no commit or abort record, it puts the
transaction in the undo-list.
All the transactions in the undo-list are then undone and their logs are removed. All the transactions in
the redo-list are redone using their log records.
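The backward scan and the two lists can be sketched as follows. The log record shapes and helper names are illustrative, and the sketch relies on the simplification above that every transaction before the last checkpoint has already committed:

```python
def build_lists(log):
    """Scan the log backwards to the last checkpoint, returning
    (undo_list, redo_list). Records look like ('start', 'T1'),
    ('write', 'T1', item, old, new), ('commit', 'T1'), ('checkpoint',)."""
    undo, redo = [], []
    seen_commit = set()
    for rec in reversed(log):
        if rec[0] == 'commit':
            seen_commit.add(rec[1])
        elif rec[0] == 'start':
            (redo if rec[1] in seen_commit else undo).append(rec[1])
        elif rec[0] == 'checkpoint':
            break                      # everything earlier is committed
    return undo, redo

def recover(db, log):
    undo, redo = build_lists(log)
    # undo: restore old values, scanning the log backwards
    for rec in reversed(log):
        if rec[0] == 'write' and rec[1] in undo:
            _, _, item, old, _ = rec
            db[item] = old
    # redo: re-apply new values, scanning the log forwards
    for rec in log:
        if rec[0] == 'write' and rec[1] in redo:
            _, _, item, _, new = rec
            db[item] = new
    return db

db = {'A': 950, 'B': 2050}   # state on disk at crash time
log = [('checkpoint',),
       ('start', 'T1'), ('write', 'T1', 'A', 1000, 950), ('commit', 'T1'),
       ('start', 'T2'), ('write', 'T2', 'B', 2000, 2050)]   # T2 uncommitted
print(recover(db, log))   # {'A': 950, 'B': 2000} -- T1 redone, T2 undone
```

T1 appears with both start and commit records, so it lands in the redo-list; T2 has a start record but no commit, so it is undone back to B's old value.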

Deadlock
A system is in a deadlock state if there exists a set of transactions such that every transaction in the set is
waiting for another transaction in the set. More precisely, there exists a set of waiting transactions {T0,
T1, . . ., Tn} such that T0 is waiting for a data item that T1 holds, T1 is waiting for a data item that
T2 holds, . . ., Tn−1 is waiting for a data item that Tn holds, and Tn is waiting for a data item
that T0 holds. None of the transactions can make progress in such a situation.

Deadlock Handling
There are two principal methods for dealing with the deadlock problem.
1. Deadlock prevention protocol
2. Deadlock detection and deadlock recovery

1. Deadlock Prevention:
Two different deadlock prevention schemes using timestamps have been proposed:



(a). The wait–die scheme is a nonpreemptive technique. When transaction Ti requests a data item
currently held by Tj , Ti is allowed to wait only if it has a timestamp smaller than that of Tj (that
is, Ti is older than Tj ). Otherwise, Ti is rolled back (dies).
For example, suppose that transactions T1, T2, and T3 have timestamps 5, 10, and 15,
respectively. If T1 requests a data item held by T3, then T1 will wait. If T3 requests a data item
held by T2, then T3 will be rolled back.
(b). The wound–wait scheme is a preemptive technique. It is a counterpart to the wait–die scheme.
When transaction Ti requests a data item currently held by Tj , Ti is allowed to wait only if it has
a timestamp larger than that of Tj (that is, Ti is younger than Tj ). Otherwise, Tj is rolled back
(Tj is wounded by Ti).
For example, again with transactions T1, T2, and T3 having timestamps 5, 10, and 15, if T1
requests a data item held by T2, then the data item will be preempted from T2, and T2 will be
rolled back. If T3 requests a data item held by T2, then T3 will wait.
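The two rules can be written down directly. The function names below are illustrative; a smaller timestamp means an older transaction:

```python
def wait_die(ts_requester, ts_holder):
    """Wait-die (nonpreemptive): Ti requests an item held by Tj.
    Older requester waits; younger requester is rolled back (dies)."""
    return "wait" if ts_requester < ts_holder else "die (roll back Ti)"

def wound_wait(ts_requester, ts_holder):
    """Wound-wait (preemptive): Ti requests an item held by Tj.
    Older requester wounds (rolls back) Tj; younger requester waits."""
    return "wound (roll back Tj)" if ts_requester < ts_holder else "wait"

# T1, T2, T3 with timestamps 5, 10, 15, as in the examples above:
print(wait_die(5, 15))     # T1 requests item held by T3 -> wait
print(wait_die(15, 10))    # T3 requests item held by T2 -> die (roll back Ti)
print(wound_wait(5, 10))   # T1 requests item held by T2 -> wound (roll back Tj)
print(wound_wait(15, 10))  # T3 requests item held by T2 -> wait
```

In both schemes the older transaction is never rolled back, which is what guarantees freedom from both deadlock and starvation.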

2. Deadlock Detection and Recovery:


If a system does not employ some protocol that ensures deadlock freedom, then a detection and recovery
scheme must be used. An algorithm that examines the state of the system is invoked periodically to
determine whether a deadlock has occurred.
Deadlock Detection:
A deadlock exists in the system if and only if the wait-for graph contains a cycle. Each transaction
involved in the cycle is said to be deadlocked. To detect deadlocks, the system needs to maintain the
wait-for graph, and periodically to invoke an algorithm that searches for a cycle in the graph.

[Figure: a wait-for graph without a cycle, and a wait-for graph with a cycle]

Wait-for-Graph
Deadlocks can be described precisely in terms of a directed graph called a wait-for graph. This graph
consists of a pair G = (V, E), where V is a set of vertices and E is a set of edges. The set of vertices
consists of all the transactions in the system. Each element in the set E of edges is an ordered pair Ti →
Tj. If Ti → Tj is in E, then there is a directed edge from transaction Ti to Tj , implying that transaction
Ti is waiting for transaction Tj to release a data item that it needs.

When transaction Ti requests a data item currently being held by transaction Tj , then the edge Ti → Tj
is inserted in the wait-for graph. This edge is removed only when transaction Tj is no longer holding a
data item needed by transaction Ti.
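A minimal wait-for graph with DFS-based cycle detection might look like this (an illustrative structure, not any specific DBMS's implementation):

```python
class WaitForGraph:
    def __init__(self):
        self.edges = {}                    # Ti -> set of Tj that Ti waits for

    def add_wait(self, ti, tj):
        """Ti requests an item currently held by Tj: insert edge Ti -> Tj."""
        self.edges.setdefault(ti, set()).add(tj)

    def remove_wait(self, ti, tj):
        """Tj no longer holds an item Ti needs: remove the edge."""
        self.edges.get(ti, set()).discard(tj)

    def has_deadlock(self):
        """Deadlock iff the graph contains a cycle (simple DFS)."""
        visited, on_path = set(), set()
        def dfs(t):
            visited.add(t)
            on_path.add(t)
            for u in self.edges.get(t, ()):
                if u in on_path or (u not in visited and dfs(u)):
                    return True
            on_path.discard(t)
            return False
        return any(t not in visited and dfs(t) for t in list(self.edges))

g = WaitForGraph()
g.add_wait('T1', 'T2')       # T1 waits for T2
g.add_wait('T2', 'T3')       # T2 waits for T3
print(g.has_deadlock())      # False
g.add_wait('T3', 'T1')       # closes the cycle T1 -> T2 -> T3 -> T1
print(g.has_deadlock())      # True
```

In practice the system would run `has_deadlock` periodically, since maintaining the graph and searching it on every request is expensive.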

Recovery from Deadlock


When a detection algorithm determines that a deadlock exists, the system must recover from the
deadlock. The most common solution is to roll back one or more transactions to break the deadlock.
Three actions need to be taken:



1. Selection of a victim. Given a set of deadlocked transactions, we must determine which transaction
(or transactions) to roll back to break the deadlock. We should roll back those transactions that will
incur the minimum cost. Many factors may determine the cost of a rollback, including
a. How long the transaction has computed, and how much longer the transaction will compute
before it completes its designated task.
b. How many data items the transaction has used.
c. How many more data items the transaction needs for it to complete.
d. How many transactions will be involved in the rollback.

2. Rollback: Once we have decided that a particular transaction must be rolled back, we must
determine how far this transaction should be rolled back.
(a). Total rollback: Abort the transaction and then restart it. However, it is more effective to roll
back the transaction only as far as necessary to break the deadlock.
(b). Partial rollback requires the system to maintain additional information about the state of all the
running transactions. Specifically, the sequence of lock requests/grants and updates performed
by the transaction needs to be recorded.

3. Starvation. In a system where the selection of victims is based primarily on cost factors, it may
happen that the same transaction is always picked as a victim. As a result, this transaction never
completes its designated task, and starvation occurs. We must ensure that a transaction can be picked
as a victim only a (small) finite number of times. The most common solution is to include the
number of rollbacks in the cost factor.

Distributed Database: A distributed database is a collection of multiple interconnected databases,


which are spread physically across various locations that communicate via a computer network. A
distributed DBMS manages the distributed database in a manner so that it appears as one single database
to users.
Features of a Distributed Database System:
 Databases are logically interrelated and interconnected with each other. Often they represent a single
logical database.
 Data is physically stored across multiple sites. Data in each site can be managed by a DBMS
independent of the other sites.
 The processors in the sites are connected via a network. They do not have any multiprocessor
configuration.
 A distributed database is not a loosely connected file system.
 A distributed database incorporates transaction processing, but it is not synonymous with a
transaction processing system.

Distributed Database Management System


A distributed database management system (DDBMS) is a centralized software system that manages a
distributed database in a manner as if it were all stored in a single location.
Features
 It is used to create, retrieve, update and delete distributed databases.
 It synchronizes the database periodically and provides access mechanisms by the virtue of which the
distribution becomes transparent to the users.
 It ensures that the data modified at any site is universally updated.

I.T.S Engineering College, Greater Noida


 It is used in application areas where large volumes of data are processed and accessed by numerous
users simultaneously.
 It is designed for heterogeneous database platforms.
 It maintains confidentiality and data integrity of the databases.

Advantages of Distributed Databases


Following are the advantages of distributed databases over centralized databases.

1. Modular Development − If the system needs to be expanded to new locations or new units, in
centralized database systems the action requires substantial effort and disrupts existing
functioning. In distributed databases, the work simply requires adding new computers and local
data at the new site and connecting them to the distributed system, with no interruption to
current functions.
2. More Reliable − In case of database failures, the total system of centralized databases comes to a
halt. In distributed systems, however, when a component fails, the system continues to function,
possibly at reduced performance. Hence a DDBMS is more reliable.
3. Better Response − If data is distributed in an efficient manner, then user requests can be met from
local data itself, thus providing faster response. On the other hand, in centralized systems, all queries
have to pass through the central computer for processing, which increases the response time.
4. Lower Communication Cost − In distributed database systems, if data is located locally where it is
mostly used, then the communication costs for data manipulation can be minimized. This is not
feasible in centralized systems.

Types of Distributed Databases


Distributed databases can be broadly classified into homogeneous and heterogeneous distributed
database environments, each with further sub-divisions.

Homogeneous Distributed Databases


In a homogeneous distributed database, all the sites use identical DBMS and operating systems. Its
properties are −

 The sites use very similar software.

 The sites use identical DBMS or DBMS from the same vendor.
 Each site is aware of all other sites and cooperates with other sites to process user requests.
 The database is accessed through a single interface as if it is a single database.

Types of Homogeneous Distributed Database


There are two types of homogeneous distributed database −
 Autonomous − Each database is independent and functions on its own. The databases are integrated
by a controlling application and use message passing to share data updates.
 Non-autonomous − Data is distributed across the homogeneous nodes and a central or master
DBMS co-ordinates data updates across the sites.

Heterogeneous Distributed Databases


In a heterogeneous distributed database, different sites have different operating systems, DBMS
products and data models. Its properties are −
 Different sites use dissimilar schemas and software.
 The system may be composed of a variety of DBMSs like relational, network, hierarchical or
object oriented.
 Query processing is complex due to dissimilar schemas.
 Transaction processing is complex due to dissimilar software.
 A site may not be aware of other sites and so there is limited co-operation in processing user
requests.

Types of Heterogeneous Distributed Databases


 Federated − The heterogeneous database systems are independent in nature and integrated
together so that they function as a single database system.
 Un-federated − The database systems employ a central coordinating module through which the
databases are accessed.

Distributed DBMS Architectures


DDBMS architectures are generally developed depending on three parameters −
 Distribution − It states the physical distribution of data across the different sites.
 Autonomy − It indicates the distribution of control of the database system and the degree to
which each constituent DBMS can operate independently.
 Heterogeneity − It refers to the uniformity or dissimilarity of the data models, system
components and databases.

Architectural Models
1. Client-Server Architecture for DDBMS
This is a two-level architecture where the functionality is divided into servers and clients. The
server functions primarily encompass data management, query processing, optimization and
transaction management. Client functions mainly include the user interface, though clients also
perform some functions such as consistency checking and transaction management.
The two client-server architectures are −
 Single Server, Multiple Clients
 Multiple Servers, Multiple Clients


Peer-to-Peer Architecture for DDBMS


In these systems, each peer acts both as a client and a server for imparting database services. The peers
share their resources with other peers and co-ordinate their activities.
This architecture generally has four levels of schemas −
 Global Conceptual Schema − Depicts the global logical view of data.
 Local Conceptual Schema − Depicts logical data organization at each site.
 Local Internal Schema − Depicts physical data organization at each site.
 External Schema − Depicts user view of data.

Distributed Data Storage
Consider a relation r that is to be stored in the database. There are two approaches to storing this relation
in the distributed database:
 Replication. The system maintains several identical replicas (copies) of the relation, and stores each
replica at a different site. The alternative to replication is to store only one copy of relation r.
 Fragmentation. The system partitions the relation into several fragments, and stores each fragment
at a different site.
Fragmentation and replication can be combined: A relation can be partitioned into several fragments and
there may be several replicas of each fragment. In the following subsections, we elaborate on each of
these techniques.

Data Replication:
If relation r is replicated, a copy of relation r is stored in two or more sites. In the most extreme case, we
have full replication, in which a copy is stored in every site in the system.
There are a number of advantages and disadvantages to replication.
 Availability. If one of the sites containing relation r fails, then the relation r can be found in another
site. Thus, the system can continue to process queries involving r, despite the failure of one site.
 Increased parallelism. When the majority of accesses to relation r only read the relation, several
sites can process queries involving r in parallel. The more replicas of r there are, the greater the
chance that the needed data will be found at the site where the transaction is executing. Hence,
data replication minimizes movement of data between sites.
 Increased overhead on update. The system must ensure that all replicas of a relation r are
consistent; otherwise, erroneous computations may result. Thus, whenever r is updated, the update
must be propagated to all sites containing replicas. The result is increased overhead.

In general, replication enhances the performance of read operations and increases the availability of data
to read-only transactions. However, update transactions incur greater overhead. Controlling concurrent
updates by several transactions to replicated data is more complex than in centralized systems.
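This trade-off can be sketched with a toy model (the site names and the dict-based storage are invented for illustration; a real system would also handle site failures and concurrency):

```python
# Toy sketch of eager replica maintenance: every update of relation r
# is pushed to all sites holding a replica, which is the "increased
# overhead on update" described above. Site names are illustrative.

replicas = {                      # site -> local copy of relation r
    "site_A": {1: "Alice", 2: "Bob"},
    "site_B": {1: "Alice", 2: "Bob"},
    "site_C": {1: "Alice", 2: "Bob"},
}

def update_all(key, value):
    # The write completes only after every replica has applied it.
    for copy in replicas.values():
        copy[key] = value

def read_local(site, key):
    # Reads are served from the local replica, enabling parallelism.
    return replicas[site][key]

update_all(2, "Beth")             # propagated to all three sites
```

A read at any site now sees the new value, but the write touched every site, illustrating why replication speeds up reads while slowing down updates.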

Data Fragmentation:
If relation r is fragmented, r is divided into a number of fragments r1, r2, . . . , rn. These fragments
contain sufficient information to allow reconstruction of the original relation r. There are two different
schemes for fragmenting a relation:
 Horizontal fragmentation
 Vertical fragmentation.
Horizontal fragmentation:
1. Horizontal fragmentation splits the relation by assigning each tuple of r to one or more
fragments.
2. A relation r is partitioned into a number of subsets, r1, r2, . . . , rn. Each tuple of relation r must
belong to at least one of the fragments, so that the original relation can be reconstructed, if needed.
3. Horizontal fragmentation is usually used to keep tuples at the sites where they are used the most, to
minimize data transfer.
4. A horizontal fragment can be defined as a selection on the global relation r. That is, we use a
predicate Pi to construct fragment ri: ri = σPi (r)
5. We reconstruct the relation r by taking the union of all fragments; that is:
r = r1 ∪ r2 ∪ · · · ∪ rn
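The selection/union scheme in points 4 and 5 can be illustrated with a toy relation (the tuples and site predicates are invented for this example):

```python
# Toy illustration of horizontal fragmentation: each fragment ri is a
# selection sigma_Pi(r), and r is rebuilt as the union of all fragments.
# The relation and the site predicates are invented for this example.

r = {(1, "Delhi"), (2, "Noida"), (3, "Delhi"), (4, "Noida")}

def fragment(relation, predicate):
    # ri = sigma_Pi(r): keep only the tuples satisfying Pi
    return {t for t in relation if predicate(t)}

r1 = fragment(r, lambda t: t[1] == "Delhi")   # stored at the Delhi site
r2 = fragment(r, lambda t: t[1] == "Noida")   # stored at the Noida site

assert r1 | r2 == r                            # r = r1 U r2
```

Note that the predicates here are disjoint and cover all tuples; if they did not cover r, reconstruction by union would lose tuples.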

Vertical fragmentation:
1. Vertical fragmentation splits the relation by decomposing the scheme R of relation r.
2. Vertical fragmentation of r(R) involves the definition of several subsets of attributes R1, R2,
………, Rn of the schema R so that:
R = R1 ∪ R2 ∪ · · · ∪ Rn
3. Each fragment ri of r is defined by:
ri = ∏Ri (r )
4. The fragmentation should be done in such a way that we can reconstruct relation r from the
fragments by taking the natural join:
r = r1 ⋈ r2 ⋈ r3 ⋈ · · · ⋈ rn
5. One way to ensure that relation r can be reconstructed is to include the primary-key attributes of
R in each Ri. More generally, any superkey can be used. It is often convenient to add a special
attribute, called a tuple-id, to the schema R.
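The projection/join scheme above can be illustrated with a toy relation (attribute names and values are invented for this example):

```python
# Toy illustration of vertical fragmentation: each fragment keeps the
# primary key ("id") plus a subset of attributes, so the original
# relation can be rebuilt by a natural join on the key.
# Attribute names and values are invented for this example.

r = [
    {"id": 1, "name": "Alice", "salary": 900},
    {"id": 2, "name": "Bob",   "salary": 750},
]

def project(relation, attrs):
    # ri = Pi_Ri(r); note the key is included in every Ri
    return [{a: t[a] for a in attrs} for t in relation]

r1 = project(r, ["id", "name"])     # e.g. stored at the HR site
r2 = project(r, ["id", "salary"])   # e.g. stored at the payroll site

def natural_join(left, right, key="id"):
    # rebuild r = r1 |x| r2 by matching tuples on the shared key
    right_by_key = {t[key]: t for t in right}
    return [{**l, **right_by_key[l[key]]} for l in left]

assert natural_join(r1, r2) == r
```

If the key were omitted from one fragment, the join would have no attribute to match on and r could not be reconstructed, which is exactly why each Ri must contain a superkey (or a tuple-id).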

Transparency in Distributed Database:


The user of a distributed database system should not be required to know where the data are physically
located nor how the data can be accessed at the specific local site. This characteristic, called data
transparency, can take several forms:
• Fragmentation transparency. Users are not required to know how a relation has been fragmented.
• Replication transparency. Users view each data object as logically unique. The distributed system
may replicate an object to increase either system performance or data availability. Users do not have
to be concerned with what data objects have been replicated, or where replicas have been placed.
• Location transparency. Users are not required to know the physical location of the data. The
distributed database system should be able to find any data as long as the data identifier is supplied
by the user transaction.

Commit Protocols: To ensure atomicity, all the sites in which a transaction T executed must agree on
the final outcome of the execution. T must either commit at all sites, or it must abort at all sites. To
ensure this property, the transaction coordinator of T must execute a commit protocol.

Among the simplest and most widely used commit protocols is the two-phase commit protocol (2PC).
An alternative is the three-phase commit protocol (3PC), which avoids certain disadvantages of 2PC
but adds complexity and overhead.

Two Phase Commit


The steps performed in the two phases are as follows:
Phase 1: Prepare Phase
 After each slave has locally completed its transaction, it sends a "DONE" message to the controlling
site. When the controlling site has received a "DONE" message from all slaves, it sends a "Prepare"
message to the slaves.
 The slaves vote on whether they still want to commit or not. If a slave wants to commit, it sends a
"Ready" message.
 A slave that does not want to commit sends a "Not Ready" message. This may happen when the
slave has conflicting concurrent transactions or there is a time-out.

Phase 2: Commit/Abort Phase


 After the controlling site has received a "Ready" message from all the slaves:
o The controlling site sends a "Global Commit" message to the slaves.
o The slaves apply the transaction and send a "Commit ACK" message to the controlling site.
o When the controlling site receives a "Commit ACK" message from all the slaves, it considers the
transaction as committed.
 After the controlling site has received the first "Not Ready" message from any slave:
o The controlling site sends a "Global Abort" message to the slaves.
o The slaves abort the transaction and send an "Abort ACK" message to the controlling site.
o When the controlling site receives an "Abort ACK" message from all the slaves, it considers the
transaction as aborted.
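The decision rule of the two phases can be sketched as follows (a toy model: slaves are represented as functions returning their vote, and the logging, acknowledgements and time-outs of real 2PC are omitted):

```python
# Toy sketch of the two-phase commit decision logic described above.
# Slaves are modelled as functions returning their vote; real 2PC also
# logs every step and handles time-outs, which are omitted here.

def two_phase_commit(slaves):
    # Phase 1 (prepare): collect a vote from every slave.
    votes = [slave() for slave in slaves]
    # Phase 2 (commit/abort): commit only on a unanimous "Ready";
    # a single "Not Ready" vote aborts the transaction everywhere.
    if all(vote == "Ready" for vote in votes):
        return "Global Commit"
    return "Global Abort"

print(two_phase_commit([lambda: "Ready", lambda: "Ready"]))      # Global Commit
print(two_phase_commit([lambda: "Ready", lambda: "Not Ready"]))  # Global Abort
```

The unanimity requirement is what makes the outcome atomic: the transaction commits at all sites or aborts at all sites, never a mixture.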

Distributed Three-phase Commit


 Phase 1: Prepare Phase
o The steps are the same as in distributed two-phase commit.
 Phase 2: Prepare to Commit Phase
o The controlling site issues an "Enter Prepared State" broadcast message.
o The slave sites vote "OK" in response.
 Phase 3: Commit / Abort Phase
o The steps are the same as in two-phase commit, except that the "Commit ACK"/"Abort ACK"
message is not required.

