UNIT-TWO
TRANSACTION PROCESSING

What is a transaction?
A transaction is a mechanism for applying desired modifications/operations to a database. After a successful manipulation of the content of the database, the resulting database instance is the most up-to-date copy of the database.
A transaction is an action, or series of actions, carried out by a single user or application program, which accesses or changes the contents of the database (i.e., a logical unit of work on the database).

1 Advanced Database Systems Lecture Note



A transaction could be a whole program, a part/module of a program, or a single command.
Changes made in real time to a database are called transactions. Examples include ATM transactions, credit card approvals, flight reservations, hotel check-ins, phone calls, supermarket scanning, academic registration and billing.
A transaction could be composed of one or more database and non-database operations.
A transaction transforms the database from one consistent state to another, although consistency may be violated during the transaction.
A database transaction is a unit of interaction with a database management system or similar system that is treated in a coherent and reliable way independent of other transactions.
Transaction processing system
 A system that manages transactions and controls their access to a DBMS is called a TP monitor.
 A transaction processing system (TPS) generally consists of a TP monitor, one or more DBMSs, and a set of application programs containing transactions.
 In the database field, a transaction is a group of logical operations that must all succeed or fail as a group. Systems dedicated to supporting such operations are known as transaction processing systems.

In comparison with a database transaction, an application program is a series of transactions with non-database processing in between.

Distinction between business transactions and online transactions:


 A business transaction is an interaction in the real world, usually between an
enterprise and a person, where something is exchanged.
 An online transaction is the execution of a program that performs an administrative
or real-time function, often by accessing shared data sources, usually on behalf of an
online user (although some transactions are run offline in batch).

What we are interested about is the online transaction, which is the interaction between the users
of a database system and the shared data stored in the database. This transaction program
contains the steps involved in the business transaction.

Transactions can be started, attempted, then committed or aborted via data manipulation
commands of SQL.

Any transaction can have one of two outcomes:


 Success - the transaction commits and the database reaches a new consistent state.
 A committed transaction cannot be aborted or rolled back; there is no way to discard a committed transaction.
 Failure - the transaction aborts, and the database must be restored to the consistent state it was in before the transaction started.
 Such a transaction is rolled back or undone.
 An aborted transaction that has been rolled back can be restarted later.
In many data manipulation languages, there are keywords used to indicate the different states of a transaction.

A single transaction might require several queries, each reading and/or writing information in the database. When this happens it is usually important to be sure that the database is not left with only some of the queries carried out. For example, when doing a money transfer, if the money was debited from one account, it is important that it also be credited to the receiving account. Also, transactions should not interfere with each other.
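The money-transfer example above can be sketched with Python's built-in sqlite3 module. This is an illustrative sketch only: the accounts table and the transfer helper are invented for the example, not part of these notes.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from account `src` to `dst` as one atomic unit of work."""
    try:
        cur = conn.cursor()
        cur.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                    (amount, src))
        cur.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                    (amount, dst))
        conn.commit()      # success: both updates become durable together
    except Exception:
        conn.rollback()    # failure: neither update survives
        raise

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
transfer(conn, 1, 2, 30)
print(list(conn.execute("SELECT id, balance FROM accounts ORDER BY id")))
# [(1, 70), (2, 80)]
```

If either UPDATE fails, the rollback leaves both balances untouched, which is exactly the all-or-nothing behavior described above.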

A transaction is expected to exhibit some basic features or properties to be considered a valid transaction. These features are:

A: Atomicity
C: Consistency
I: Isolation
D: Durability


These are referred to as the ACID properties of a transaction. Without the ACID properties, the integrity of the database cannot be guaranteed.

Atomicity
The "all or none" property.

Every transaction should be considered an atomic process which cannot be subdivided into smaller tasks. Due to this property, just like an atom which either exists or does not exist, a transaction has only two states:

Done or Never Started.

Done - a transaction must complete successfully and its effect should be visible in the
database.

Never Started - if a transaction fails during execution, then all its modifications must be undone to bring the database back to the last consistent state, i.e., to remove the effect of the failed transaction.

There is no state between Done and Never Started.

Consistency
If the transaction code is correct then a transaction, at the end of its execution, must leave the database
consistent. A transaction should transform a database from one previous consistent state to another
consistent state.

Isolation
A transaction must execute without interference from other concurrent transactions and its intermediate or
partial modifications to data must not be visible to other transactions.

Durability
The effect of a completed transaction must persist in the database, i.e., its updates must be available to other transactions immediately after the end of its execution, and they should not be affected by failures that occur after the completion of the transaction.

In practice, these properties are often relaxed somewhat to provide better performance.

State of a Transaction
A transaction is an atomic operation from the users' perspective. But it consists of a collection of operations, and it can pass through a number of states during its execution.

A transaction can end in three possible ways.

1. Successful Termination:


- When the transaction completes the execution of all its operations and reaches the COMMIT
command.
2. Suicidal Termination:
- When the transaction detects an error during its processing and decides to abort itself before the
end of the transaction, performing a ROLLBACK.
3. Murderous Termination:
- When the DBMS or the system forces the execution to abort for any reason.

[Figure: Transaction state diagram. Start → Modify. If no error occurs: Ok to Commit → Commit → End of Transaction, with the database modified. If an error is detected by the transaction (transaction-initiated abort) or by the system (system-initiated abort): Abort → Rollback → End of Transaction, with the database unmodified and left in its previous consistent state.]

Most SQL statements seem to be very short and easy to execute. But the reverse is true if you consider each one as a one-command transaction. A database system actually interprets a transaction not as an application program but as a logical sequence of low-level read and write operations (referred to as primitives).

Ways of Transaction Execution


In a database system many transactions are executed. Basically there are two ways of executing a set of
transactions:

(a) Serially (Serial Execution): transactions are executed strictly serially. Transaction Ti
completes and writes its results to the database, and only then is the next transaction Tj
scheduled for execution. This means that at any one time only one transaction is being executed
in the system, and data is not shared between transactions at any specific time.


In serial transaction execution, one transaction being executed does not interfere with the execution of any other transaction.

Good things about serial execution:

 Correct execution, i.e., if the input is correct then the output will be correct.
 Fast execution, since all the resources are available to the active transaction.
The worst thing about serial execution is very inefficient resource utilization.

Example of serial execution: suppose data items X = 10, Y = 6, and N = 1, and T1 and T2 are transactions.

T1: read (X); X := X + N; write (X); read (Y); Y := Y + N; write (Y)

T2: read (X); X := X + N; write (X)

We execute this transaction serially as follows:

Time   T1                     T2

t1     read (X)  {X = 10}
t2     X := X+N  {X = 11}
t3     write (X) {X = 11}
t4     read (Y)  {Y = 6}
t5     Y := Y+N  {Y = 7}
t6     write (Y) {Y = 7}
t7                            read (X)  {X = 11}
t8                            X := X+N  {X = 12}
t9                            write (X) {X = 12}

Final values of X and Y at the end of T1 and T2: X = 12 and Y = 7.


Thus we can see that in serial execution of transactions, if we have two transactions Ti and Ti+1, then
Ti+1 will only be executed after the completion of Ti.

(b) Concurrently: the reverse of serial execution; in this scheme the individual
operations of transactions, i.e., reads and writes, are interleaved in some order.

Time   T1                     T2

t1     read (X)  {X = 10}
t2                            read (X)  {X = 10}
t3     X := X+N  {X = 11}
t4                            X := X+N  {X = 11}
t5     write (X) {X = 11}
t6                            write (X) {X = 11}
t7     read (Y)  {Y = 6}
t8     Y := Y+N  {Y = 7}
t9     write (Y) {Y = 7}

Final values at the end of T1 and T2: X = 11 and Y = 7. Concurrent execution improves resource
utilization but, unfortunately, gives an incorrect result here.
The correct value of X is 12, but in this concurrent execution X = 11. The
reason for this error is the incorrect sharing of X by T1 and T2.


In serial execution T2 read the value of X written by T1 (i.e., 11), but in concurrent
execution T2 read the same value of X (i.e., 10) as T1 did, and the update made by T1 was
overwritten by T2's update.
This is why the final value of X is one less than what is produced by serial execution.
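The two schedules above can be replayed in a small Python sketch. The serial and interleaved function names are invented for illustration; the point is only that in the interleaved schedule both transactions read the same stale X, so T1's update is lost.

```python
N = 1  # increment applied by each transaction

def serial():
    """T1 runs to completion, then T2: the correct result."""
    db = {"X": 10, "Y": 6}
    x = db["X"]; db["X"] = x + N      # T1: read, increment, write X
    y = db["Y"]; db["Y"] = y + N      # T1: read, increment, write Y
    x = db["X"]; db["X"] = x + N      # T2: reads the value T1 wrote (11)
    return db

def interleaved():
    """Reads and writes interleaved: T2's write overwrites T1's update."""
    db = {"X": 10, "Y": 6}
    x1 = db["X"]                      # T1 reads X = 10
    x2 = db["X"]                      # T2 reads the same stale X = 10
    db["X"] = x1 + N                  # T1 writes X = 11
    db["X"] = x2 + N                  # T2 overwrites with X = 11 (update lost)
    y1 = db["Y"]; db["Y"] = y1 + N    # T1 finishes Y
    return db

print(serial())       # {'X': 12, 'Y': 7}
print(interleaved())  # {'X': 11, 'Y': 7}
```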


Problems Associated with Concurrent Transaction Processing


Although two transactions may be correct in themselves, interleaving of their operations may
produce an incorrect result, which makes control over data access necessary.
With concurrent transaction processing, one can enhance the throughput of the
system. Since reading and writing are performed from and on secondary storage, the system
will not be idle during these operations if there is concurrent processing.
Each transaction may be correct by itself, but this does not guarantee that the
interleaving of these transactions will produce a correct result.
The three potential problems caused by concurrency are:
 Lost Update Problem
 Uncommitted Dependency Problem
 Inconsistent Analysis Problem.
Lost Update Problem
A successfully completed update on a data item by one transaction is overridden by another
transaction/user.
E.g. Account with balance A=100.
 T1 reads the account A
 T1 withdraws 10 from A
 T1 makes the update in the Database
 T2 reads the account A
 T2 adds 100 on A
 T2 makes the update in the Database
 In the above case, if done one after the other (serially) then we have no problem.
 If the execution is T1 followed by T2 then A=190
 If the execution is T2 followed by T1 then A=190
 But if they start at the same time in the following sequence:
 T1 reads the account A=100
 T1 withdraws 10 making the balance A=90
 T2 reads the account A=100
 T2 adds 100 making A=200
 T1 makes the update in the Database A=90
 T2 makes the update in the Database A=200


 After the successful completion of the operations in this schedule, the final value of A
will be 200, which overrides the update made by the first transaction that changed the
value from 100 to 90.

Uncommitted Dependency Problem


Occurs when one transaction can see the intermediate results of another transaction before it
is committed.
E.g.
 T2 increases the balance by 100, making it 200, but then aborts before
committing. T1 reads 200, subtracts 10, making it 190. But the actual balance
should be 90.

Inconsistent Analysis Problem


Occurs when a transaction reads several values, but a second transaction updates some of
them during the execution and before the completion of the first.
E.g.
 T2 would like to add the values A=10, B=20 and C=30. After the values are
read by T2 and before its completion, T1 updates the value of B to 50. At the
end of the execution of the two transactions, T2 will come up with a sum of
60, while it should be 90 since B was updated to 50.

As discussed above, the objective of a concurrency control protocol is to schedule
transactions in such a way as to avoid any interference between them. This demands a
new principle in transaction processing: serializability of the schedule of
execution of multiple transactions.

UNIT-THREE


Concurrency Control Techniques


Concurrency control is the process of managing simultaneous operations on the database
without having them interfere with one another. It prevents interference when two or more
users are accessing the database simultaneously and at least one is updating data. Although
two transactions may be correct in themselves, interleaving of their operations may produce an
incorrect result.
Three basic concurrency control techniques:

1. Locking methods
2. Time stamping
3. Optimistic

Locking and time stamping are pessimistic (conservative) approaches, since they delay
transactions in case they conflict with other transactions.
Optimistic methods assume conflict is rare: they allow transactions to proceed without delay
and only check for conflicts at the end, when a transaction commits.

Locking Method
A LOCK is a mechanism for enforcing limits on access to a resource in an environment
where there are many threads of execution. Locks are one way of enforcing concurrency
control policies. A transaction uses locks to deny access to other transactions and so prevent
incorrect updates.

A lock prevents another transaction from modifying an item, or even reading it in the
case of a write lock.
Lock (X): if a transaction T1 applies a lock on data item X, then X is locked and is not
available to any other transaction.
Unlock (X): T1 unlocks X; X is available to other transactions.


Types of Locks
Shared lock: a read operation does not change the value of a data item. Hence a data
item can be read by two different transactions simultaneously under shared
lock mode. So, only to read a data item, T1 will do: Shared lock (X), then Read
(X), and finally Unlock (X).
Exclusive lock: a write operation changes the value of the data item. Hence two write
operations from two different transactions, or a write from T1 and a read
from T2, are not allowed. A data item can be modified only under an exclusive
lock. To modify a data item, T1 will do: Exclusive lock (X), then Write (X), and
finally Unlock (X).
When these locks are applied, a transaction must behave in a special way. This
special behavior of a transaction is referred to as well-formed.

Well-formed: a transaction is well-formed if it does not lock a locked data item and it
does not try to unlock an unlocked data item.

Locking - Basic Rules


 If transaction has shared lock on item, can read but not update item.
 If transaction has exclusive lock on item, can both read and update item.
 Reads cannot conflict, so more than one transaction can hold shared locks
simultaneously on same item.
 Exclusive lock gives transaction exclusive access to that item.
 Some systems allow transaction to upgrade a shared lock to an exclusive lock, or
vice-versa.
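The basic rules above can be sketched as a toy lock table in Python. This is a single-threaded illustration only: the LockManager class and its "S"/"X" mode strings are assumptions made for the example, and a real scheduler would block waiting transactions rather than just refusing.

```python
class LockManager:
    """Toy single-threaded lock table: 'S' locks are shared, 'X' exclusive."""
    def __init__(self):
        self.locks = {}   # item -> (mode, set of holding transaction ids)

    def acquire(self, txn, item, mode):
        held = self.locks.get(item)
        if held is None:
            self.locks[item] = (mode, {txn})
            return True
        held_mode, holders = held
        if mode == "S" and held_mode == "S":   # reads never conflict
            holders.add(txn)
            return True
        if holders == {txn}:                   # sole holder may upgrade
            self.locks[item] = ("X" if mode == "X" else held_mode, holders)
            return True
        return False                           # conflict: caller must wait

    def release(self, txn, item):
        mode, holders = self.locks[item]
        holders.discard(txn)
        if not holders:
            del self.locks[item]

lm = LockManager()
print(lm.acquire("T1", "A", "S"))   # True  - first shared lock
print(lm.acquire("T2", "A", "S"))   # True  - shared locks coexist
print(lm.acquire("T3", "A", "X"))   # False - exclusive conflicts with readers
```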

Examples: T1 and T2 are two transactions. They are executed under locking as follows. T1
locks A in exclusive mode. When T2 wants to lock A, it finds it locked by T1, so T2 waits
for T1 to unlock A. When A is released, T2 locks A and begins execution.
Suppose a lock on a data item is applied, the data item is processed, and it is unlocked
immediately after reading/writing is completed, as follows.
Initial values of A = 10 and B = 20.


Serial execution of T1 and then T2:

T1: Lock (A); read (A) {A = 10}; A := A + 100; write (A) {A = 110}; Unlock (A)
T1: Lock (B); read (B) {B = 20}; B := B + 10; write (B) {B = 30}; Unlock (B)
T2: Lock (B); read (B) {B = 30}; B := B * 5; write (B) {B = 150}; Unlock (B)
T2: Lock (A); read (A) {A = 110}; A := A + 20; write (A) {A = 130}; Unlock (A)

Final result: A = 130, B = 150

Concurrent execution of T1 and T2:

T1: Lock (A); read (A) {A = 10}; A := A + 100; write (A) {A = 110}; Unlock (A)
T2: Lock (B); read (B) {B = 20}; B := B * 5; write (B) {B = 100}; Unlock (B)
T1: Lock (B); read (B) {B = 100}; B := B + 10; write (B) {B = 110}; Unlock (B)
T2: Lock (A); read (A) {A = 110}; A := A + 20; write (A) {A = 130}; Unlock (A)

Final result: A = 130, B = 110



The final result of the two transactions using the two types of transaction execution (serial
and concurrent) is not the same. This indicates that the above method of locking and
unlocking is not correct. Thus, to preserve consistency we have to use another approach to
locking: the two-phase locking (2PL) scheme.
Two-Phase Locking (2PL)
A transaction follows 2PL protocol if all locking operations precede the first unlock
operation in the transaction.
The 2PL protocol demands locking and unlocking of a transaction to have two phases.

 Growing phase - acquires all locks but cannot release any locks.

 Shrinking phase - releases locks but cannot acquire any new locks.
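The two phases can be sketched as a small guard in Python. This is illustrative only: the TwoPhaseTransaction class is an invented name, and a real scheduler would additionally block on conflicting locks held by other transactions.

```python
class TwoPhaseTransaction:
    """Sketch of the 2PL discipline: once any lock is released, no new lock
    may be acquired (the growing phase ends at the first unlock)."""
    def __init__(self, name):
        self.name = name
        self.held = set()
        self.shrinking = False

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: lock after first unlock")
        self.held.add(item)

    def unlock(self, item):
        self.shrinking = True     # entering the shrinking phase
        self.held.discard(item)

t = TwoPhaseTransaction("T1")
t.lock("A"); t.lock("B")      # growing phase
t.unlock("A")                 # shrinking phase begins
try:
    t.lock("C")               # illegal under 2PL
except RuntimeError as e:
    print(e)                  # 2PL violation: lock after first unlock
```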

Locking methods: problems


Deadlock: may result when two (or more) transactions are each
waiting for locks held by the other to be released.
Deadlock - possible solutions
Only one way to break deadlock: abort one or more of the transactions in the deadlock.
Deadlock should be transparent to user, so DBMS should restart transaction(s).
Two general techniques for handling deadlock:
 Deadlock prevention.
 Deadlock detection and recovery.

Timeout
Deadlock detection can be done using the TIMEOUT technique. Every transaction
is given a period of time to wait in case of deadlock. If a transaction waits for the predefined
period of time in idle mode, the DBMS assumes that a deadlock has occurred and aborts
and restarts the transaction.
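Besides timeouts, the detection half of "deadlock detection and recovery" is often implemented by looking for a cycle in a wait-for graph, where an edge Ti → Tj means Ti waits for a lock held by Tj. This is a minimal sketch with invented names; the wait-for graph itself is not described in these notes.

```python
def has_deadlock(wait_for):
    """Detect a cycle in a wait-for graph {txn: set of txns it waits on}."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {t: WHITE for t in wait_for}

    def visit(t):
        color[t] = GREY                       # t is on the current DFS path
        for u in wait_for.get(t, ()):
            if color.get(u, WHITE) == GREY:   # back edge: cycle found
                return True
            if color.get(u, WHITE) == WHITE and visit(u):
                return True
        color[t] = BLACK                      # fully explored, no cycle via t
        return False

    return any(color[t] == WHITE and visit(t) for t in wait_for)

# T1 waits for T2 and T2 waits for T1: the classic deadlock
print(has_deadlock({"T1": {"T2"}, "T2": {"T1"}}))   # True
print(has_deadlock({"T1": {"T2"}, "T2": set()}))    # False
```

On detecting a cycle, the DBMS would pick a victim transaction in the cycle, abort it, and restart it later, as described above.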



Time-stamping Method
Timestamp: a unique identifier created by DBMS that indicates relative starting time of a
transaction.
Can be generated by:
 using system clock at time transaction started, or
 Incrementing a logical counter every time a new transaction starts.
Time-stamping A concurrency control protocol that orders transactions in such a way
that older transactions, transactions with smaller time stamps, get priority in the event of
conflict.
 Transactions ordered globally based on their timestamp so that older transactions,
transactions with earlier timestamps, get priority in the event of conflict.
 Conflict is resolved by rolling back and restarting transaction.
 Since there is no need to use lock there will be No Deadlock.
In timestamp ordering, the schedule is equivalent to the particular serial order that
corresponds to the order of the transaction timestamps.

To implement this scheme, every transaction will be given a timestamp which is a


unique identifier of a transaction.
If Ti came to processing prior to Tj then TS of Tj will be larger than TS of Ti.
Again each data item will have a timestamp for Read and Write.
 WTS(A) which denotes the largest timestamp of any transaction that
successfully executed Write(A)
 RTS(A) which denotes the largest timestamp of any transaction that
successfully executed Read(A)
These timestamps are updated whenever a new Read(A) or Write(A) instruction
is executed.
Read/write proceeds only if last update on that data item was carried out by an older
transaction. Otherwise, transaction requesting read/write is restarted and given a new
timestamp.

The timestamp ordering protocol ensures that any conflicting read and write
operations are executed in the timestamp order.



The protocol manages concurrent execution such that the time-stamps determine the
serializability order.
Rules for permitting execution of operations in Time-stamping Method
Suppose that Transaction Ti issues Read(A)
 If TS(Ti) < WTS(A): this implies that Ti needs to read a value of A which was
already overwritten. Hence the read operation must be rejected and Ti is rolled
back.
 If TS(Ti) >= WTS(A): then the read is executed and RTS(A) is set to the
maximum of RTS(A) and TS(Ti).
Suppose that Transaction Ti issues Write(A)
 If TS(Ti) < RTS(A): this implies that the value of A that Ti is producing
was previously needed, and it was assumed that it would never be produced.
Hence, the Write operation must be rejected and Ti is rolled back.
 If TS(Ti) < WTS(A): this implies that Ti is attempting to write an obsolete
value of A. Hence, this Write operation can be ignored.
 Otherwise, the Write operation is executed and WTS(A) is set to the maximum
of WTS(A) and TS(Ti).
A transaction that is rolled back due to conflict will be restarted and be given a new
timestamp.
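The read and write rules above can be transcribed almost directly into Python. This is a sketch: representing a data item as a dictionary with RTS/WTS keys and returning the strings "ok", "rollback" and "ignore" are choices made for the example, not part of the notes.

```python
def read(ts, item):
    """TO rule for Read(A): reject if a younger transaction already wrote A."""
    if ts < item["WTS"]:
        return "rollback"            # the value needed was already overwritten
    item["RTS"] = max(item["RTS"], ts)
    return "ok"

def write(ts, item):
    """TO rule for Write(A), including the ignore-obsolete-write case."""
    if ts < item["RTS"]:
        return "rollback"            # a younger txn already read the old value
    if ts < item["WTS"]:
        return "ignore"              # obsolete write: skip it
    item["WTS"] = ts
    return "ok"

A = {"RTS": 0, "WTS": 0}
print(read(5, A))    # ok       (RTS becomes 5)
print(write(3, A))   # rollback (txn 3 would invalidate txn 5's read)
print(write(7, A))   # ok       (WTS becomes 7)
print(read(6, A))    # rollback (txn 6 must not see txn 7's write)
```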

Problem with timestamp-ordering protocol:


 Suppose Ti aborts, but Tj has read a data item written by Ti.
 Then Tj must abort; if Tj had been allowed to commit earlier, the schedule
would not be recoverable.
 Further, any transaction that has read a data item written by Tj must also
abort.
 This can lead to cascading rollback.

Cascading Rollback
Whenever some transaction T tries to issue a Read_Item(X) or a Write_Item(X) operation,
the basic timestamp ordering algorithm compares the timestamp of T with the read
timestamp and the write timestamp of X to ensure that the timestamp order of execution of



the transactions is not violated. If the timestamp order is violated by the operation, then
transaction T will violate the equivalent serial schedule, so T is aborted. Then T is
resubmitted to the system as a new transaction with new timestamp. If T is aborted and
rolled back, any transaction Ti that may have used a value written by T must also be rolled
back. Similarly, any transaction Tj that may have used a value written by Ti must also be
rolled back, and so on. This effect is known as cascading rollback.

Optimistic Technique
 Locking and assigning and checking timestamp values may be unnecessary for
some transactions.
 Based on the assumption that conflict is rare and that it is more efficient to let
transactions proceed without delays, checking serializability only later.
 When a transaction reaches commit, a check is performed to determine whether
a conflict has occurred. If there is a conflict, the transaction is rolled back and
restarted.
 Potentially allows greater concurrency than traditional protocols.
 Three phases:
1. Read
2. Validation
3. Write

1. Optimistic Techniques - Read Phase


 Extends from start until immediately before commit.
 Transaction reads values from database and stores them in local variables.
Updates are applied to a local copy of the data.

2. Optimistic Techniques - Validation Phase


 Follows the read phase.
 For read-only transaction, checks that data read are still current values. If no
interference, transaction is committed, else aborted and restarted.
 For an update transaction, checks that the transaction leaves the database in a consistent state, with
serializability maintained.



3. Optimistic Techniques - Write Phase
 Follows successful validation phase for update transactions.
 Updates made to local copy are applied to the database.
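The three phases can be sketched in Python using version numbers for validation. The version-number mechanism and the OptimisticTxn class are assumptions made for the example; the notes do not prescribe how interference is detected.

```python
class OptimisticTxn:
    """Sketch: the read phase copies values into a local workspace; validation
    at commit checks the items read are unchanged before writing them back."""
    def __init__(self, db):
        self.db = db              # shared store: item -> (value, version)
        self.read_versions = {}   # item -> version seen during read phase
        self.local = {}           # item -> new value (local workspace)

    def read(self, item):
        value, version = self.db[item]
        self.read_versions[item] = version
        return value

    def write(self, item, value):
        self.local[item] = value  # read/write phase: update only the local copy

    def commit(self):
        # Validation phase: abort if anything we read has since changed.
        for item, version in self.read_versions.items():
            if self.db[item][1] != version:
                return False
        # Write phase: install local updates, bumping versions.
        for item, value in self.local.items():
            self.db[item] = (value, self.db[item][1] + 1)
        return True

db = {"A": (100, 0)}
t1 = OptimisticTxn(db)
t1.write("A", t1.read("A") + 50)
print(t1.commit())        # True: no interference, update installed
print(db["A"])            # (150, 1)

t2 = OptimisticTxn(db)
v = t2.read("A")
db["A"] = (999, 2)        # another transaction changes A meanwhile
t2.write("A", v + 1)
print(t2.commit())        # False: validation fails, t2 must restart
```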

Granularity of data items


Granularity is the size of the data items chosen as the unit of protection by a concurrency
control protocol.
It could be:
 The entire database
 A file
 A page (a section of physical disk in which relations are stored)
 A record
 A field value of a record

The granularity affects the performance of the system. Since locking prevents access
to the data, the size of the data item locked determines how many other transactions are
prevented from having access. If the entire database is locked, consistency is easily
maintained but system performance suffers. If a single data item is locked,
consistency may be at risk, but concurrent processing and performance will be enhanced.
Thus, as one goes from the entire database down to a single value, performance and concurrent
processing are enhanced, but consistency is at risk and needs a good concurrency
control mechanism and strategy.

Transaction Subsystem
 Transaction manager
 Scheduler
 Recovery manager
 Buffer manager



 Transaction Manager: in the DBMS, coordinates transactions on behalf of application
programs.
 Communicates with the scheduler (lock manager), which is responsible for
implementing a strategy for concurrency control.

 Scheduler: in the DBMS, ensures that the individual steps of different
transactions preserve consistency.

 Recovery Manager: ensures that the database is restored to the original state in case
a failure occurs.

 Buffer Manager: responsible for the transfer of data between disk storage and main
memory.

UNIT-FOUR

Database Recovery

Database recovery is the process of restoring the database to a correct state in the event of a
failure; that is, the process of eliminating the effects of a failure from the
database.
Recovery, in database systems terminology, means restoring the last consistent state of
the data items.
Types of failures
A failure is a state where data inconsistency is visible to transactions if they are
scheduled for execution.
The kind of failure could be:
 System crashes, resulting in loss of main memory.
 Media failures, resulting in loss of parts of secondary storage.
 Application software errors.



 Natural physical disasters.
 Carelessness or unintentional destruction of data or facilities.
 Sabotage.
In databases, a failure can generally be categorized as one of the following major
groups:
1. Transaction failure: a transaction cannot continue with its execution; therefore, it is
aborted and, if desired, may be restarted at some other time.
Reasons: deadlock, timeout, protection violation, or system error.
2. System failure: the database system is unable to process any transactions.
Some common reasons for system failure are: wrong data input, register overflow,
addressing error, power failure, memory failure, etc.
3. Media failure: failure of non-volatile storage media (mainly disk).
Some common reasons are: head crash, dust on the recording surfaces, fire, etc.

To keep the database secure, one should formulate a "plan of attack" in advance. The
plan will be used in case of database insecurity, which may range from minor inconsistency to
total loss of the data due to hazardous events.
The basic steps in performing a recovery are:

1. Isolating the database from other users. Occasionally, you may need to drop and
re-create the database to continue the recovery.

2. Restoring the database from the most recent usable dump.

3. Applying transaction log dumps, in the correct sequence, to the database to make the
data as current as possible.



It is a good idea to test your backup and recovery plans periodically by loading the
backups and transaction logs into a test database and verifying that your procedure really
works.
One can recover databases after three basic types of problems:
User error,
Software failure, and
Hardware failure.
Each type of failure requires a recovery mechanism. In a transaction recovery, the effect of
failed transaction is removed from the database, if any. In a system failure, the effects of
failed transactions have to be removed from the database and the effects of completed
transactions have to be installed in the database.
The database recovery manager is responsible for guaranteeing the atomicity and durability
properties of ACID.
The execution of a transaction T is correct if, given a consistent database, T leaves the
database in the next consistent state. Let the initial consistent state of the database be S1; T
changes the consistent state from S1 to S2. During this transition the database may go into an
inconsistent state, which is visible only to T. If the system or T fails, the database will not
be able to reach S2, so the visible state would be an inconsistent one. From this
state we can go either forward (to state S2) or backward (to state S1).

Example:
The initial value of A=100, B=200 and C=300
The Required state after the execution of T1 is A=500, B=800 and C=700
Thus S1= (100,200,300) AND S2= (500,800,700)
Transaction (T1)

Time   Operation
1      A = A + 200
2      B = B - 100
3      C = C - 200
       --- Failure ---
4      A = A + 200
5      B = B + 700
6      C = C + 600

Force writing: the explicit writing of the buffers to secondary storage.



Transactions and Recovery
 Transactions represent basic unit of recovery.
 Recovery manager responsible for atomicity and durability.
 If failure occurs between commit and database buffers being flushed to secondary
storage then, to ensure durability, recovery manager has to redo (rollforward)
transaction's updates.
 If transaction had not committed at failure time, recovery manager has to undo
(rollback) any effects of that transaction for atomicity.
 Partial undo - only one transaction has to be undone.
 Global undo - all transactions have to be undone.
Transaction Log: Execution history of concurrent transactions.

 DBMS starts at time t0, but fails at time tf. Assume data for transactions T2 and T3 have been
written to secondary storage.
 T1 and T6 have to be undone. In absence of any other information, recovery manager has to redo
T2, T3, T4, and T5.
 tc is the checkpoint time by the DBMS

Recovery Facilities
 DBMS should provide following facilities to assist with recovery:
 Backup mechanism: that makes periodic backup copies of database.
 Logging facility: that keeps track of current state of transactions and
database changes.
 Checkpoint facility: that enables updates to database in progress to be
made permanent.
 Recovery manager: which allows the DBMS to restore the database to a
consistent state following a failure.



Restoring the database means transforming the state of the database to the immediate good
state before the failure. To do this, the change made on the database should be preserved.
Such kind of information is stored in a system log or transaction log file.
Log File
 Contains information about all updates to database:
 Transaction records.
 Checkpoint records.
 Often used for other purposes (for example, auditing).
 Transaction records contain:
 Transaction identifier.
 Type of log record (transaction start, insert, update, delete, abort,
commit).
 Identifier of data item affected by database action (insert, delete, and
update operations).
 Before-image of data item.
 After-image of data item.
 Log management information.
 Log file sometimes split into two separate random-access files.
 Potential bottleneck; critical in determining overall performance.



Checkpointing
Checkpoint: is a point of synchronization between database and log file.
All buffers are force-written to secondary storage.
 Checkpoint record is created containing identifiers of all active transactions.
 When failure occurs, redo all transactions that committed since the checkpoint
and undo all transactions active at time of crash.
 In previous example, with checkpoint at time tc, changes made by T2 and T3 have
been written to secondary storage.
 Thus:
 only redo T4 and T5,
 Undo transactions T1 and T6.
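The redo/undo rule at a checkpoint can be sketched in Python over a simplified log of (transaction, item, before-image, after-image) records. The log layout, the function name and the parameter names are invented for illustration; a real log also records transaction start/commit markers and the checkpoint record itself.

```python
def recover(log, active_at_crash, committed_after_checkpoint):
    """Sketch of checkpoint-based recovery: redo transactions that committed
    after the checkpoint, undo those still active at the time of the crash."""
    redo = set(committed_after_checkpoint)
    undo = set(active_at_crash)
    db = {}
    # Redo pass (forward): reapply after-images of committed transactions.
    for txn, item, before, after in log:
        if txn in redo:
            db[item] = after
    # Undo pass (backward): restore before-images of uncommitted transactions.
    for txn, item, before, after in reversed(log):
        if txn in undo:
            db[item] = before
    return db

log = [("T2", "X", 1, 2),    # T2 committed after the checkpoint -> redo
       ("T1", "Y", 5, 9)]    # T1 still active at the crash      -> undo
print(recover(log, active_at_crash={"T1"}, committed_after_checkpoint={"T2"}))
# {'X': 2, 'Y': 5}
```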

Recovery Techniques
Damage to the database could be either physical, resulting in the loss of the stored data, or
just an inconsistency of the database state after the failure. For each case we have a
recovery mechanism:
1. If database has been damaged:
 Need to restore last backup copy of database and re apply updates of
committed transactions using log file.
 Extensive damage/catastrophic failure: physical media failure; is restored
by using the backup copy and by re executing the committed transactions
from the log up to the time of failure.

2. If the database is only inconsistent:

    • No physical damage/only inconsistency: restoring is done by reversing
      the changes made to the database by consulting the transaction log.
    • Need to undo the changes that caused the inconsistency. May also need to redo
      some transactions to ensure their updates reach secondary storage.
    • Do not need the backup, but can restore the database using the before- and after-
      images in the log file.



Recovery Techniques for Inconsistent Database State

Recovery is required only if the database has been updated. The kind of recovery also
depends on the kind of update made on the database.
Database update: a transaction's updates to the database can be applied in three
ways, giving three main recovery techniques:
1. Deferred Update
2. Immediate Update
3. Shadow Paging

Deferred Update
 • Updates are not written to the database until after a transaction has reached its
   commit point.
 • If a transaction fails before commit, it will not have modified the database and so no
   undoing of changes is required.
 • May be necessary to redo the updates of committed transactions, as their effect may not
   have reached the database.
 • A transaction first modifies all its data items and then writes all its updates to the
   final copy of the database. No change is recorded on the database before
   commit; the changes are made only in the local transaction workspace. The update
   to the actual database is made after commit and after the change is recorded on the
   log. Since there is no need to perform an undo operation, it is also called the NO-
   UNDO/REDO Algorithm.
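The deferred update (NO-UNDO/REDO) scheme can be sketched as follows (a minimal in-memory illustration, with a dict standing in for the database):

```python
class DeferredUpdate:
    """NO-UNDO/REDO sketch: writes are buffered in a local transaction
    workspace and applied to the database only after the commit point."""
    def __init__(self, db):
        self.db = db
        self.workspace = {}   # local transaction workspace
        self.log = []

    def write(self, item, value):
        self.workspace[item] = value      # no change to the database yet

    def commit(self):
        # Record the after-images in the log first, then apply to the database.
        for item, value in self.workspace.items():
            self.log.append(("after-image", item, value))
            self.db[item] = value

    def abort(self):
        self.workspace.clear()            # nothing to undo in the database

db = {"x": 1}
t = DeferredUpdate(db)
t.write("x", 2)
before_commit = db["x"]      # still 1: the database is untouched before commit
t.commit()
after_commit = db["x"]       # now 2: updates reach the database after commit
```

Because the database is never touched before commit, failure before commit needs no undo; only redo of committed transactions may be needed.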

Immediate Update/Update-In-Place

 • Updates are applied to the database as they occur.
 • Need to redo the updates of committed transactions following a failure.
 • May need to undo the effects of transactions that had not committed at the time of failure.
 • Essential that log records are written before the write to the database: the write-ahead
   log protocol.



 • If there is no "transaction commit" record in the log, then that transaction was active
   at the failure and must be undone.
 • Undo operations are performed in the reverse of the order in which they were written
   to the log.

As soon as a transaction updates a data item, it updates the final copy of the database on
the database disk. While making the update, the change is recorded in the transaction
log to permit a rollback operation in case of failure. UNDO and REDO are required to
make the transaction consistent; thus it is called the UNDO/REDO Algorithm. This
algorithm will undo all updates made in place before commit. The redo is required
because some operations which are completed but not committed should go to the database.
If we do not have the second scenario, then the other variation of this algorithm is called
the UNDO/NO-REDO Algorithm.
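The immediate update (UNDO/REDO) scheme with the write-ahead log protocol can be sketched as follows (a minimal in-memory illustration, with a dict standing in for the database):

```python
class ImmediateUpdate:
    """UNDO/REDO sketch: each write goes straight to the database, but the
    log records the before-image first (write-ahead logging) so the change
    can be rolled back."""
    def __init__(self, db):
        self.db = db
        self.log = []

    def write(self, item, value):
        # Write-ahead log protocol: log before- and after-image, then update.
        self.log.append((item, self.db.get(item), value))
        self.db[item] = value

    def rollback(self):
        # Undo operations are applied in the reverse order of logging.
        for item, before, _after in reversed(self.log):
            self.db[item] = before
        self.log.clear()

db = {"x": 1}
t = ImmediateUpdate(db)
t.write("x", 2)
after_write = db["x"]   # 2: the database changes immediately
t.rollback()            # the before-image allows the change to be undone
```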

Shadow Paging
 • Maintain two page tables during the life of a transaction:
   the current page table and the shadow page table.
 • When the transaction starts, the two page tables are the same.
 • The shadow page table is never changed thereafter and is used to restore the database
   in the event of failure.
 • During the transaction, the current page table records all updates to the database.
 • When the transaction completes, the current page table becomes the shadow page table.
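Shadow paging can be sketched at the page-table level (a simplified illustration; a real implementation works with disk pages and a persistent page-table root pointer):

```python
def shadow_paging_commit(shadow_table, updates):
    """Sketch of shadow paging: the shadow page table is never modified;
    updates go to a copy (the current page table), which replaces the
    shadow table only when the transaction completes."""
    current_table = dict(shadow_table)   # both tables identical at start
    current_table.update(updates)        # the transaction updates only the current table
    return current_table                 # on commit, current becomes the new shadow

shadow = {"page1": "blockA", "page2": "blockB"}
new_shadow = shadow_paging_commit(shadow, {"page2": "blockC"})
# On failure we simply keep `shadow` (no undo needed);
# on commit `new_shadow` replaces it.
```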



UNIT - FIVE
Distributed Database Systems
o Before the development of database technology, every application used to
  have its own data in the application logic.
o Database development facilitates the integration of data available in an
  organization from a number of applications and enforces security on data
  access at a single local site.
  But it is not always the case that organizational data reside at one central site.
  This demands that databases at different sites be integrated and synchronized,
  with all the facilities of the database approach.
o This is made possible by computer networks and data communication,
  optimized by internet, mobile and wireless computing and intelligent devices.

o This leads to Distributed Database Systems.


o The decentralization approach to databases mirrors the natural organizational
  structure of companies, which are logically distributed into divisions,
  departments, projects and so on, and physically distributed into offices, plants,
  and factories, each of which maintains its own operational data.

o Distributed Database is not a centralized database.

[Diagram: Centralized DB vs. Distributed DB]



Data Distribution Strategies
 • Distributed DB - stores logically related shared data and metadata at several
   physically independent sites connected via a network.
 • Distributed DBMS - is the software system that permits the management of a
   distributed DB and makes the distribution transparent to the user.
 • Data allocation - is the process of deciding where to allocate/store a particular
   data item.
 • There are 3 data allocation strategies:

1. Centralized: the entire DB is located at a single site and
   computers access it through networks; known as
   distributed processing.
2. Partitioned: the DB is split into several disjoint parts (called
   partitions, segments or fragments) and stored
   at several sites.
3. Replicated: copies of one or more partitions are stored at
   several sites. Replication may be:
   1. Selective: combines fragmentation (locality of reference for
      those items which are less updated), replication and
      centralization as appropriate for the data.
   2. Complete: a database copy is made available at each site.
      A snapshot is one method used here.
 • In a distributed database system, the database is logically stored as a single database
   but physically fragmented over several computers.
 • The computers in a distributed system communicate with each other through various
   communication media, such as high-speed buses or telephone lines.
 • A distributed database system has the following components:
o Local Data Base Management System (LDBMS)
o Distributed Data Base Management System(DDBMS)
o Global System Catalog(GSC)
o Data communication (DC)



 • A distributed database system consists of a collection of sites, each of which maintains a
   local database system (local DBMS), but each local DBMS also participates in at least one
   global transaction where different databases are integrated together.

   o Local Transaction: a transaction that accesses data only at that single site.
   o Global Transaction: a transaction that accesses data at several sites.

 • Parallel DBMS: a DBMS running across multiple processors and disks that is designed
   to execute operations in parallel, whenever possible, in order to improve performance.
   Three architectures for parallel DBMS:

   • Shared Memory - fast data access for a limited number of processors.
   • Shared Disk - for applications that are inherently centralized.
   • Shared Nothing - massively parallel.
 • What makes a DDBMS different is that
   o the various sites are aware of each other, and
   o each site provides a facility for executing both local and global transactions.
 • The different sites can be connected physically in different topologies:
   o Fully Connected (networked),
   o Partially Connected,
   o Tree Network,
   o Star Network and
   o Ring Network
 • The differences between these topologies are based on:
   o Installation Cost: the cost of linking the sites physically.
   o Communication Cost: the cost to send and receive messages and data.
   o Reliability: resistance to failure.
   o Availability: the degree to which data can be accessed despite failures.
 • The distribution of the database sites could be over a:
   1. Large Geographical Area: Long-Haul Network
      • relatively slow
      • less reliable
      • uses telephone lines, microwave, satellite
   2. Small Geographical Area: Local Area Network
      • higher speed
      • lower rate of error
      • uses twisted pair, baseband coaxial, broadband coaxial, fiber optics
 • Even though integration of data implies centralized storage and control, in distributed
   database systems the intention is different: data is stored in different database systems
   in a decentralized manner but acts as if centralized, through computer networks.
 • A distributed database system consists of loosely coupled sites that share no physical
   component; the database systems that run at each site are independent of each other.



 • DBMSs which share physical components are known as Parallel DBMSs.
 • Transactions may access data at one or more sites.
 • An organization may implement its database system on a number of separate computer
   systems rather than a single, centralized mainframe. Computer systems may be located
   at each local branch office.

Functions of a DDBMS

 • A DDBMS has the following functionality:

   • Extended Communication Services - to provide access to remote sites.
   • Extended Data Dictionary - to store data distribution details; a need for a global
     system catalog.
   • Distributed Query Processing - optimization of queries with remote data access.
   • Extended Security - access control to distributed data.
   • Extended Concurrency Control - maintains consistency of replicated data.
   • Extended Recovery Services - handles failures of individual sites and of the
     communication lines.

Issues in DDBMS
How is data stored in DDBMS?
There are several ways of storing a single relation in distributed database systems.

1. Replication:
   o The system maintains multiple copies of similar data (identical data),
   o stored at different sites, for faster retrieval and fault tolerance.
   o Duplicate copies of the tables can be kept on each system (replicated). With this
     option, updates to the tables can become complicated (of course the copies of the
     tables can be read-only).
   o Advantage: availability, increased parallelism (if only reading)
   o Disadvantage: increased overhead of updates
2. Fragmentation:
   o The relation is partitioned into several fragments stored at distinct sites.
     • The partitioning could be vertical, horizontal or both.
   o Horizontal Fragmentation
     • Systems can share the responsibility of storing information from a
       single table, with individual systems storing groups of rows.



     • Performed by the Selection operation.
     • The whole content of the relation is reconstructed using the
       UNION operation.

   o Vertical Fragmentation
     • Systems can share the responsibility of storing particular attributes of a
       table.
     • Needs an attribute with the tuple number (the primary key value must be repeated).
     • Performed by the Projection operation.
     • The whole content of the relation is reconstructed using the Natural
       JOIN operation on the attribute with the tuple number (the primary key
       values).



   o Both (hybrid fragmentation)
     • A system can share the responsibility of storing particular attributes of a
       subset of records in a given relation.
     • Performed by projection then selection, or selection then projection, relational
       algebra operators.
     • Reconstruction is made by the combined effect of the Union and Natural Join
       operators.
Fragmentation is correct if it fulfils the following:
1. Completeness: a data item must appear in at least one fragment of a given
   relation R (R1, R2, ..., Rn).
2. Reconstruction: it must be possible to reconstruct the relation from its
   fragments.
3. Disjointness: a data item should only be found in a single fragment, except
   for vertical fragmentation (where the primary key is repeated for reconstruction).
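The selection/UNION and projection/join reconstructions described above can be sketched with standard SQL (a minimal illustration using an in-memory SQLite database; the EMPLOYEE table and its rows are assumptions made up for the example):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employee (empId INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
con.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                [(1, "Abebe", "Sales"), (2, "Sara", "HR"), (3, "Kebede", "Sales")])

# Horizontal fragmentation: Selection splits the rows across sites.
con.execute("CREATE TABLE frag_sales AS SELECT * FROM employee WHERE dept = 'Sales'")
con.execute("CREATE TABLE frag_other AS SELECT * FROM employee WHERE dept <> 'Sales'")
# Reconstruction by UNION.
rows = con.execute(
    "SELECT * FROM frag_sales UNION SELECT * FROM frag_other ORDER BY empId").fetchall()

# Vertical fragmentation: Projection splits the columns; the primary key
# is repeated in each fragment so a join can rebuild the relation.
con.execute("CREATE TABLE frag_names AS SELECT empId, name FROM employee")
con.execute("CREATE TABLE frag_depts AS SELECT empId, dept FROM employee")
rebuilt = con.execute(
    "SELECT n.empId, n.name, d.dept FROM frag_names n JOIN frag_depts d "
    "ON n.empId = d.empId ORDER BY n.empId").fetchall()
```

Here the fragments live in one database for simplicity; in a real DDBMS each fragment table would reside at a different site.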
3. Data transparency:
The degree to which system users may remain unaware of the details of how and where
the data items are stored in a distributed system.

1. Distribution transparency - even though there are many systems, they appear
   as one; seen as a single, logical entity.
2. Replication transparency - copies of data floating around everywhere also
   seem like just one copy to developers and users.



3. Fragmentation transparency - a table that is actually stored in parts
   across sites may seem like just a single table in a single location.
4. Location transparency - the user does not need to know where a data item is
   physically located.

How does it work?


Distributed computing can be difficult to implement, particularly for replicated data that can be
updated from many systems. In order to operate, a distributed database system has to take care of:
 • Distributed Query Processing
 • Distributed Transaction Management
 • Replicated Data Management - if you are going to have copies of data on many machines,
   how often does the data get updated if it is changed on another system? Who is in charge
   of propagating the update to the data?
 • Distributed Database Recovery - if one machine goes down, how does that affect the others?
 • Security - just like any computer network, a distributed system needs to have a common
   way to validate users entering from any computer in the network of servers.
 • Common Data Dictionary - your schema now has to be distributed and work in
   connection with schemas created on many systems.

Homogeneous and Heterogeneous Distributed Databases


 • In a homogeneous distributed database:
   • All sites have identical software (DBMS).
   • Sites are aware of each other and agree to cooperate in processing user requests.
   • Each site surrenders part of its autonomy in terms of the right to change schemas
     or software.
   • Appears to the user as a single system.
 • In a heterogeneous distributed database:
   • Different sites may use different schemas and software (DBMS).
   • Difference in schema is a major problem for query processing.
   • Difference in software is a major problem for transaction processing.
   • Sites may not be aware of each other and may provide only limited facilities for
     cooperation in transaction processing.
   • May need gateways to interface with one another.

Why DDBMS/Advantages
1. Many existing systems



 • Maybe you have no choice.
 • Possibly there are many different existing systems, of possibly different kinds
   (Oracle, Informix, others), that need to be used together.
2. Data sharing and distributed control:
 • A user at one site may be able to access data that is available at another site.
 • Each site can retain some degree of control over local data.
 • We will have local as well as global database administrators.
3. Reliability and availability of data
 • If one site fails, the rest can continue operation as long as a transaction does not
   demand data from the failed site, or the data it needs is replicated at other sites.
4. Speedup of query processing
 • If a query involves data from several sites, it may be possible to split the query into
   sub-queries that can be executed at several sites, which is parallel processing.
 • A query can be sent to the least heavily loaded site.
5. Expansion (Scalability)
 • In a distributed environment you can easily expand by adding more machines to
   the network.
Disadvantages of DDBMS
1. Software Development Cost
 • It is difficult to install, and thus costly.
2. Greater Potential for Bugs
 • Parallel processing may endanger the correctness of algorithms.
3. Increased Processing Overhead
 • Exchange of messages between sites - high communication latency.
 • Overhead due to communication protocols.
4. Communication problems
5. Increased Complexity and Data Inconsistency Problems
 • Since clients can read and modify closely related data stored in different database
   instances concurrently.
6. Security Problems: network and replicated data security.

Query Processing and Transaction Management in DDBMS


Query Processing

There are different strategies to process a specific query, which in turn increase the performance of
the system by minimizing processing time and cost. In addition to the cost estimates we have for a
centralized database (disk access, relation size, etc), we have to consider the following in
distributed query processing:



o Cost of data transmission over the network
o Gain from parallel processing of a single query

For the case of replicated data allocation, even though parallel processing is used to increase
performance, updates have a great impact, since all the sites containing the data item must be
updated. For the case of fragmentation, an update works more like in a centralized database, but
reconstruction of the whole relation requires accessing data from all sites containing part of the
relation.

Let the distributed database have three sites (S1, S2, and S3), and two relations, EMPLOYEE
and DEPARTMENT, located at S1 and S2 respectively without any fragmentation. A query
is initiated from S3 to retrieve employees' First Name (15 bytes long), Last Name (15 bytes long)
and Department Name (10 bytes long), a total of 40 bytes, together with the department they are
working in.

Let:
For EMPLOYEE we have the following information:
  • 10,000 records
  • each record is 100 bytes long
For DEPARTMENT we have the following information:
  • 100 records
  • each record is 35 bytes long
There are three ways of executing this query:
1. Transfer DEPARTMENT and EMPLOYEE to S3 and perform the join there: needs a transfer
   of 10,000*100 + 100*35 = 1,003,500 bytes.
2. Transfer EMPLOYEE to S2, perform the join there (the result will have 40*10,000 = 400,000
   bytes) and transfer the result to S3: we need 1,000,000 + 400,000 = 1,400,000 bytes to be
   transferred.
3. Transfer DEPARTMENT to S1, perform the join there (the result will have 40*10,000 =
   400,000 bytes) and transfer the result to S3: we need 3,500 + 400,000 = 403,500 bytes to be
   transferred.

Then one can select the strategy that will reduce the data transfer cost for this specific query. Other
steps of optimization may also be included to make the processing more efficient by reducing the
size of the relations using projection.
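The three transfer costs above can be checked with a few lines of arithmetic (a sketch; the strategy labels are just descriptive):

```python
# Sizes from the example above.
EMP_BYTES = 10_000 * 100      # EMPLOYEE: 10,000 records of 100 bytes
DEPT_BYTES = 100 * 35         # DEPARTMENT: 100 records of 35 bytes
RESULT_BYTES = 10_000 * 40    # join result: 10,000 records of 40 bytes

strategies = {
    "1. ship both relations to S3": EMP_BYTES + DEPT_BYTES,
    "2. ship EMPLOYEE to S2, result to S3": EMP_BYTES + RESULT_BYTES,
    "3. ship DEPARTMENT to S1, result to S3": DEPT_BYTES + RESULT_BYTES,
}
# Pick the strategy with the least data transfer.
best = min(strategies, key=strategies.get)
# strategy 3 wins, transferring only 403,500 bytes
```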

Transaction Management

Transaction is a logical unit of work constituted by one or more operations executed by a single
user. A transaction begins with the user's first executable query statement and ends when it is
committed or rolled back.



A Distributed Transaction is a transaction that includes one or more statements that,
individually or as a group, update data on two or more distinct nodes of a distributed database.

Representation of Query in Distributed Database

SQL Statement                                  Object   Database   Domain

SELECT * FROM dept@sales.midroc.telecom.et;    dept     sales      midroc.telecom.et

There are two types of transaction in DDBMS to access data from other sites:

1. Remote Transaction: contains only statements that access a single remote node. Thus, a
   remote query statement is a query that selects information from one or more remote tables,
   all of which reside at the same remote node or site.

o For example, the following query accesses data from the dept table in the Addis schema
(the site) of the remote sales database:

o SELECT * FROM Addis.dept@sales.midroc.telecom.et;

o A remote update statement is an update that modifies data in one or more tables, all
  of which are collocated at the same remote node. For example, the following query
  updates the dept table in the Addis schema of the remote sales database:

UPDATE Addis.dept@sales.midroc.telecom.et

SET loc = 'Arada'

WHERE BranchNo = 5;

2. Distributed Transaction: contains statements that access more than one node.

o A distributed query statement retrieves information from two or more nodes.


o If all statements of a transaction reference only a single remote node, the transaction
is remote, not distributed.
o A database must guarantee that all statements in a transaction, distributed or non-
distributed, either commit or roll back as a unit. The effects of an ongoing transaction
should be invisible to all other transactions at all nodes; this transparency should be
true for transactions that include any type of operation, including queries, updates,
or remote procedure calls.
o For example, the following query accesses data from the local database as well as the
remote sales database:



SELECT ename, dname

FROM Dessie.emp DS, Addis.dept@sales.midroc.telecom.et AD

WHERE DS.deptno = AD.deptno;

{Employee data is stored in Dessie and Sales data is stored in Addis, there is an employee
responsible for each sale}

DIFFERENT KINDS OF SQL STATEMENTS

 • Remote query

select client_nm
from clients@accounts.motorola.com;

 • Distributed query

select project_name, student_nm
from intership@accounts.motorola.com i, student s
where s.stu_id = i.stu_id;

 • Remote Update

update intership@accounts.motorola.com
set stu_id = '242'
where stu_id = '200';

 • Distributed Update

update intership@accounts.motorola.com
set stu_id = '242'
where stu_id = '200';

update student
set stu_id = '242'
where stu_id = '200';

commit;

Concurrency Control
o There are various techniques used for concurrency control in centralized database systems. The
  techniques in distributed database systems are similar to the centralized approach, with additional
  implementation requirements or modifications.
o The main difference or the change that should be incorporated is the way the lock manager is
implemented and how it functions.
o There are different schemes for concurrency control in DDBS

1. Non-Replicated Scheme
   o No data is replicated in the system.
   o All sites maintain a local lock manager (local lock and unlock).
   o If site Si needs a lock on a data item at site Sj, it sends a message to the lock manager
     of site Sj, and the locking will be handled by site Sj.
   o All the locking and unlocking principles are handled by the local lock
     manager at which the data object resides.
   o Simple to implement.
   o Needs three message transfers:
     o to request a lock
     o to notify the grant of a lock
     o to request an unlock
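A local lock manager of the kind described above can be sketched as follows (a simplified, single-threaded illustration; a real lock manager would also queue waiting transactions and distinguish shared from exclusive locks):

```python
class LocalLockManager:
    """Each site runs one of these for the data items stored at that site."""
    def __init__(self, site):
        self.site = site
        self.locks = {}   # data item -> transaction currently holding the lock

    def request_lock(self, tid, item):
        # Message 1: a (possibly remote) transaction requests a lock.
        if item not in self.locks:
            self.locks[item] = tid
            return "granted"      # message 2: notify the grant of the lock
        return "wait"             # item already locked by another transaction

    def unlock(self, tid, item):
        # Message 3: request to release the lock.
        if self.locks.get(item) == tid:
            del self.locks[item]

s_j = LocalLockManager("Sj")
# A transaction running at site Si locks an item stored at site Sj.
reply = s_j.request_lock("T1", "x")   # "granted"
blocked = s_j.request_lock("T2", "x") # "wait": T1 still holds the lock
s_j.unlock("T1", "x")
```

In the single coordinator approach described next, one such manager at site Si would handle requests for items at every site.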

2. Single Coordinator Approach

   o The system chooses one single lock manager that resides at one of the sites (Si).
   o All lock and unlock requests are made at site Si where the lock manager resides.
   o Simple to implement.
   o Needs two message transfers:
     o to request a lock
     o to request an unlock
   o Simple deadlock handling.
   o Could be a bottleneck, since all requests are handled at one site.
   o Is vulnerable/at risk if the site with the lock manager fails.



UNIT - SIX

Introduction to Object-Oriented Database Systems


Object Orientation
• Set of design and development principles
• Based on autonomous computer structures known as objects
• OO Contribution areas
• Programming Languages
• Graphical User Interfaces
• Databases
• Design
• Operating Systems
Evolution of OO Concepts
• Concepts stem from object-oriented programming languages (OOPLs)
• Ada, ALGOL, LISP, SIMULA
• OOPLs goals
• Easy-to-use development environment
• Powerful modeling tools for development
• Decrease in development time
• Make reusable code
• OO Attributes
• Data set not passive



• Data and procedures bound together
• An object can act on itself

Advanced Database Applications


Database technology is being applied in complex applications that demand a
different way of modeling and design. Some of these applications
are:
• Computer Aided Design (CAD)
• Computer Aided Manufacturing
• Computer Aided Software Engineering
• Network Management System
• Multimedia System
• Digital Publication
• Geographic Information System
• Interactive and Dynamic Web Sites

Drawbacks/Weaknesses of Relational DBMS


In addition to the emergence of many advanced database application areas, there were
some drawbacks of the relational database management system.
1. Poor representation of 'real world' entities
    • Relations do not correspond to real-world objects.
2. Semantic overloading (semantically overloaded)
    • Representation of complex relationships is difficult.
    • E.g. an M:N relationship is represented by adding one additional relation
      (making the relationship an entity).
    • One cannot distinguish a relation from a relationship.
    • Difficult to distinguish different types of relationships.
3. Poor support for integrity and enterprise constraints
    • Many commercial DBMSs do not support all integrity constraints.
    • The relational model does not support enterprise constraints; they have to be
      enforced by the DBMS or the application.
4. Homogeneous data structure
    • Has vertical and horizontal homogeneity.
      • Horizontal Homogeneity: each tuple of a relation has the same
        number and types of attributes.


      • Vertical Homogeneity: the values of an attribute should come from the
        same domain.
    • Field values should be atomic.
5. Limited operations
    • The relational model has a fixed set of operations on the data.
    • Does not allow additional/new operations.
6. Difficulty in handling recursive queries
    • Direct and indirect recursive queries on a single relation cannot be
      represented.
    • A query to extract information from a recursive relationship between tuples of
      the same entity is difficult to represent.
7. Impedance mismatch
    • Mixing of different programming paradigms.
    • Mismatch between the languages used.
8. Poor navigational access
    • Access is based on navigating individual records from different relations.
OO Concepts
• An object is a uniquely identifiable entity that contains both the attributes that describe the
  state of the 'real world' object and the actions that are associated with it.
• OODBMS can manage complex, highly interrelated information.
• Abstract representation of a real-world entity
• Unique identity
• Embedded properties
• Ability to interact with other objects and self
OID (Object Identity)
• Each object is unique in OO systems:
  • Unique to the object.
  • Not a primary key (a PK is unique only within a relation; a PK is selected from
    the attributes, making it dependent on the state).
  • Is invariant (will not change).
  • Independent of the values of attributes (two objects can have the same state but
    will have different OIDs).
  • Is invisible to users.
  • Entity integrity is enforced.
• Relationship: embedding the OID of one object into the other (e.g. embed the OID of
  a branch into an employee object).
• Advantage of using OID:
• Are efficient



• Are fast
• Cannot be modified by users (system generated)
• Independent of content
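The state-independence of object identity can be illustrated in Python, with the built-in `id()` standing in for a system-generated OID (an analogy only; `id()` is not a persistent OID):

```python
class Branch:
    def __init__(self, city):
        self.city = city

b1 = Branch("Addis")
b2 = Branch("Addis")
# Same state...
same_state = (b1.city == b2.city)          # True
# ...but distinct, system-generated identities, invisible to the user
# and independent of the attribute values.
distinct_identity = (id(b1) != id(b2))     # True
```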

Attributes
• Called instance variables
• Domain
Object state
• Object values at any given time
• Values of attributes at any given point in time.

Methods
• Code/function that performs operation on object’s data
• Has name and body



• Methods define the behavior of object
• Can be used to change the state of the object by modifying attribute values

Messages
• Means by which objects communicate
• Request from one object to the other to activate one of its methods
• Invokes method/calls method to be applied
• Sent to object from real world or other object
• Notation: Object.Method
• Eg: StaffObject.updateSalary(salary)

Classes
• Blueprint for defining similar objects
• Objects with similar attributes that respond to the same messages are grouped
  together
• Defined only once and used by many objects
• Collection of similar objects
• Shares attributes and structure



• Objects in a class are called instances of the class
Eg: Class Definition : defining the class BRANCH
BRANCH
Attributes
branchNo
street
city
postcode
Methods
Print()
getPostCode()
numberofStaff()

Objects of BRANCH class


branchNo=B005 branchNo=B007 branchNo=B003
street=st1 street=st2 street=st2
city=Addis city=Dire city=Bahirdar
postcode=1425 postcode=452 postcode=85
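The BRANCH class definition above can be written out directly (a sketch; `numberOfStaff` is stubbed because the staff association is not modelled here):

```python
class Branch:
    """Blueprint shared by all BRANCH objects (instances of the class)."""
    def __init__(self, branchNo, street, city, postcode):
        self.branchNo = branchNo
        self.street = street
        self.city = city
        self.postcode = postcode

    # Methods define the behavior of the object.
    def print(self):
        return f"{self.branchNo}: {self.street}, {self.city} {self.postcode}"

    def getPostCode(self):
        return self.postcode

    def numberOfStaff(self):
        return 0   # stub: the staff association is not modelled in this sketch

# The three instances shown above.
b5 = Branch("B005", "st1", "Addis", "1425")
b7 = Branch("B007", "st2", "Dire", "452")
b3 = Branch("B003", "st2", "Bahirdar", "85")
```

Sending the message `b5.getPostCode()` invokes the corresponding method on that instance.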



Object Characteristics
Class Hierarchy

• Superclass
• Subclass
Inheritance

• Ability of object to inherit the data structure and behavior of classes above it
• Single inheritance – class has one immediate superclass
• Multiple – class has more than one immediate superclass
Method Overriding

• Method redefined at subclass


Polymorphism

• Allows different objects to respond to same message in different ways
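Class hierarchy, inheritance, method overriding and polymorphism can be illustrated together (a minimal sketch; the Staff/Manager classes and the figures are assumptions made up for the example):

```python
class Staff:
    def __init__(self, name, salary):
        self.name = name
        self.salary = salary

    def pay(self):               # behavior defined once in the superclass
        return self.salary

class Manager(Staff):            # single inheritance: one immediate superclass
    def __init__(self, name, salary, bonus):
        super().__init__(name, salary)   # inherit Staff's data structure
        self.bonus = bonus

    def pay(self):               # method overriding: redefined at the subclass
        return self.salary + self.bonus

# Polymorphism: the same message produces different responses
# depending on the receiving object's class.
payroll = [Staff("Sara", 100), Manager("Abebe", 100, 50)]
amounts = [p.pay() for p in payroll]   # [100, 150]
```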

Object Classification
• Simple
 Only single-valued attributes
 No attributes refer to other objects
• Composite
 At least one multi-valued attribute
 No attributes refer to other object
• Compound
 At least one attribute that references other object



• Hybrid
   • Repeating group of attributes
   • At least one attribute refers to another object

Characteristics of OO Data Model


• Supports complex objects
• Must be extensible
• Supports encapsulation
• Exhibit inheritance
• Supports object identity

OO vs. E-R Model Components

OO Data Model          E-R Model

Type Entity definition

Object Entity Instance

Class Entity set

Instance variable Attribute



N/A Primary key

OID N/A

Class hierarchy E-R diagram

OODBMS
• Object-oriented database technology is a marriage of object-oriented programming
and database technologies.
• Database management system integrates benefits of typical database systems with
OODM characteristics
• Handles a mix of data types (since OODBMS permits new data definition)
• Follows OO rules
• Follows DBMS rules
OO and Database Design
• Provides data identification and the procedures for data manipulation
• Data and procedures self-contained entity
• Iterative and incremental
• DBA does more programming
• Lack of standards
OODBMS Advantages
• More semantic information
• Support for complex objects
• Extensibility of data types (user defined data types)
• May improve performance with efficient caching
• Versioning
• Polymorphism: one operation shared by many objects and each acting differently
• Reusability
• Inheritance speeds development and application: defining new objects in terms of
  previously defined objects (incremental definition)
• Potential to integrate DBMSs into single environment
• Relationship between objects is represented explicitly supporting both navigational
and associative access to information.

OODBMS Disadvantages
• Strong opposition from the established RDBMSs
• Lack of theoretical foundation



• No standard
• No single data model
• Throwback to old pointer systems
• Lack of standard ad hoc query language
• Lack of business data design and management tools
• Steep learning curve
• Low market presence
• Lack of compatibility between different OODBMSs

UNIT – SEVEN

DATABASE ADMINISTRATION AND SECURITY


In today's society, information is a critical resource not only in the fields of industry,
commerce, education and medicine, but also in the fields of the military, diplomacy and
government. Some information is so important that it has to be protected. For
example, data corruption or fabrication in a hospital database could result in patients
receiving the wrong medication. Disclosure or modification of military information could
endanger national security.

 • Privacy - ethical and legal rights that individuals have with regard to control
   over the dissemination and use of their personal information

 • Database security - protection of information contained in the database against
   unauthorized access, modification or destruction

 • Database integrity - mechanisms that are applied to ensure that the data in the
   database is correct and consistent
A good database security management system has the following characteristics:
- data independence,
- shared access,
- minimal redundancy,



- data consistency,
- privacy, integrity, and availability.

 • Privacy - signifies that an unauthorized user cannot disclose data.
 • Integrity - ensures that an unauthorized user cannot modify data.
 • Availability - ensures that data is made available to the authorized user unfailingly.
 • Copyright - ensures the native rights of individuals as creators of information.
 • Validity - ensures activities are accountable by law.

With strong enforcement and management of these properties, a database system can
effectively prevent accidental security and integrity threats arising from system errors,
improper authorization, and concurrent-usage anomalies. In addition, an effective system
must guard against malicious or intentional security and integrity threats, such as a
computer system operator bypassing security, or programmers acting as hackers.

There are certain security policy issues that we should recognize: we should
consider administrative control policies, decide which security features offered by the
DBMS to use when implementing the system, decide whether security administration is
left with the DBA and whether it is centralized or decentralized, and decide on the
ownership of shared data.
When we talk about the levels of security protection, it may start from
 organization & administrative security,
 physical & personnel security,
 communication security, and
 information systems security
Database security and integrity is about protecting the database from being inconsistent and
being disrupted. We can also call it database misuse.
Database misuse could be Intentional or Accidental, where accidental misuse is easier to
cope with than intentional misuse.
Accidental inconsistency could occur due to:
 System crash during transaction processing
 Anomalies due to concurrent access
 Anomalies due to redundancy
 Logical errors

Likewise, various threats fall under intentional misuse, which could be:
 Unauthorized reading of data,
 Unauthorized modification of data, or
 Unauthorized destruction of data.
Most systems implement good Database Integrity to protect the system from accidental
misuse while there are many computer based measures to protect the system from
intentional misuse, which is termed as Database Security measures.

Levels of Security Measures


Security measures can be implemented at several levels and for different components
of the system. These levels are:

1. Physical Level: concerned with securing the site containing the computer system. The
backup systems should also be physically protected from access except for authorized
users. In other words, the site or sites containing the computer systems must be
physically secured against armed or sneaky entry by intruders.
2. Human Level: concerned with authorization of database users for accessing the
content at different levels and with different privileges.
3. Operating System: concerned with the weaknesses and strengths of operating system
security on data files. No matter how secure the database system is, weakness in
operating system security may serve as a means of unauthorized access to the
database. This also includes protection of data in primary and secondary memory
from unauthorized access.
4. Database System: concerned with data access limits enforced by the database system,
such as passwords and isolated transactions. Some database system users may be
authorized to access only a limited portion of the database; other users may be
allowed to issue queries but forbidden to modify the data. It is the responsibility
of the database system to ensure that these authorization restrictions are not
violated.
5. Application Level: since almost all database systems allow remote access through
terminals or networks, software-level security within the network software is as
important as physical security, both on the Internet and on networks private to an
enterprise.

Even though we can have different levels of security and authorization on data objects and
users, deciding who accesses which data is a policy matter rather than a technical one.

These policies
 should be known by the system, i.e. encoded in the system
 should be remembered, i.e. saved somewhere (the catalogue)

 Database Integrity - constraints contribute to maintaining a secure database
system by preventing data from becoming invalid and hence giving misleading
or incorrect results
 Domain Integrity - means that each column in any table will have a set of
allowed values and cannot assume any value other than those specified in
the domain.
 Entity Integrity - means that in each table the primary key (which may be
composite) satisfies both of the following conditions:
1. the primary key is unique within the table, and
2. the primary key column(s) contains no null values.
 Referential Integrity means that in the database as a whole, things are set up
in such a way that if a column exists in two or more tables in the database
(typically as a primary key in one table and as a foreign key in one or more
other tables), then any change to a value in that column in any one table will
be reflected in corresponding changes to that value where it occurs in other
tables. This means that the RDBMS must be set up so as to take appropriate
action to propagate a change made in one table to the other tables where the
change must also occur.
The effect of the existence and maintenance of referential integrity is, in short,
that if a column exists in two or more tables in the database, every occurrence
of the column will contain only values that are consistent across the database.

 Key constraints- in a relational database, there should be some collection of


attributes with a special feature used to maintain the integrity of the database.
These attributes will be named as Primary Key, Candidate Key, Foreign Key,
and etc. These Key(s) should obey some rules set by the relational data model.
 Enterprise Constraint - means some business rules set by the enterprise on
how to use, manage and control the database

 Database Security - the mechanisms that protect the database against intentional or
accidental threats. Database security encompasses hardware, software, people and data.

Database Management Systems supporting multi-user database system must provide a
database security and authorization subsystem to enforce limits on individual and group
access rights and privileges.

SECURITY ISSUES AND GENERAL CONSIDERATIONS


 Legal, ethical and social issues regarding the right to access information
 Physical control issues regarding how to keep the database physically secured.
 Policy issues regarding privacy of individual level at enterprise and national level
 Operational consideration on the techniques used (password, etc) to access and
manipulate the database
 System level security including operating system and hardware control
 Security levels and security policies in enterprise level

The designer and the administrator of a database should first identify the possible threats
that the system might face in order to take countermeasures.

Threat may be any situation or event, whether intentional or accidental, that may adversely
affect a system and consequently the organization

 A threat may be caused by a situation or event involving a person, action, or
circumstance that is likely to bring harm to an organization
 The harm to an organization may be tangible or intangible
Tangible – loss of hardware, software, or data
Intangible – loss of credibility or client confidence

Examples of threats:
 Using another person’s means of access
 Unauthorized amendment/modification or copying of data
 Program alteration
 Inadequate policies and procedures that allow a mix of confidential and normal
output
 Wire-tapping
 Illegal entry by hacker
 Blackmail
 Theft of data, programs, and equipment
 Failure of security mechanisms, giving greater access than normal
 Staff shortages or strikes

 Inadequate staff training
 Viewing and disclosing unauthorized data
 Electronic interference and radiation
 Data corruption owing to power loss or surge
 Fire (electrical fault, lightning strike, arson), flood, bomb
 Physical damage to equipment
 Breaking cables or disconnection of cables
 Introduction of viruses
 An organization deploying a database system needs to identify the types of threat it
may be subjected to and initiate appropriate plans and countermeasures, bearing in
mind the costs of implementing each.

COUNTERMEASURES: COMPUTER BASED CONTROLS


 The types of countermeasure to threats on computer systems range from physical controls to
administrative procedures
 Despite the range of computer-based controls that are available, it is worth noting that,
generally, the security of a DBMS is only as good as that of the operating system, owing to
their close association
 The following are computer-based security controls for a multi-user environment:

 Authorization
 The granting of a right or privilege that enables a subject to have legitimate
access to a system or a system’s object
 Authorization controls can be built into the software, and govern not only
what system or object a specified user can access, but also what the user may
do with it
 Authorization controls are sometimes referred to as access controls
 The process of authorization involves authentication of subjects (i.e. a user or
program) requesting access to objects (i.e. a database table, view, procedure,
trigger, or any other object that can be created within the system)
 Views
 A view is the dynamic result of one or more relational operations on
the base relations to produce another relation
 A view is a virtual relation that does not actually exist in the database, but is
produced upon request by a particular user
 The view mechanism provides a powerful and flexible security mechanism by
hiding parts of the database from certain users

 Using a view is more restrictive than simply having certain privileges granted
to a user on the base relation(s)
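As a concrete sketch of the view mechanism, the following uses SQLite from Python (the table, column names, and data are hypothetical) to show how a view can hide a sensitive column from users who are granted access only to the view:

```python
import sqlite3

# Hypothetical Employee relation with a sensitive Salary column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (EmpID INTEGER PRIMARY KEY,"
             " Name TEXT, Dept TEXT, Salary REAL)")
conn.execute("INSERT INTO Employee VALUES (1, 'Abebe', 'Finance', 9000.0)")
conn.execute("INSERT INTO Employee VALUES (2, 'Sara', 'IT', 11000.0)")

# The view is a virtual relation: it stores no data of its own and is
# produced upon request from the base relation, hiding Salary.
conn.execute("CREATE VIEW EmployeePublic AS"
             " SELECT EmpID, Name, Dept FROM Employee")

rows = conn.execute("SELECT * FROM EmployeePublic").fetchall()
print(rows)   # Salary is not visible through the view
```

A user granted privileges only on `EmployeePublic` can query employees without ever seeing salaries, which is more restrictive than granting privileges on the base relation itself.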

 Backup and recovery


 Backup is the process of periodically taking a copy of the database and log file
(and possibly programs) on to offline storage media
 A DBMS should provide backup facilities to assist with the recovery of a
database following failure
 Database recovery is the process of restoring the database to a correct state in
the event of a failure
 Journaling is the process of keeping and maintaining a log file (or journal) of
all changes made to the database to enable recovery to be undertaken
effectively in the event of a failure
 The advantage of journaling is that, in the event of a failure, the database can
be recovered to its last known consistent state using a backup copy of the
database and the information contained in the log file
 If no journaling is enabled on a failed system, the only means of recovery is to
restore the database using the latest backup version of the database
 However, without a log file, any changes made after the last backup to the
database will be lost

 Integrity
 Integrity constraints contribute to maintaining a secure database system by
preventing data from becoming invalid and hence giving misleading or
incorrect results
 Domain Integrity: setting the allowed set of values
 Entity integrity: demanding Primary key values not to assume a NULL value
 Referential integrity: enforcing Foreign Key values to have a value that already
exist in the corresponding Candidate Key attribute(s) or be NULL.
 Key constraints: the rules the Relational Data Model has on different kinds of
Key.
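A minimal SQLite sketch (hypothetical schema) showing these constraints being enforced: a CHECK clause for domain integrity, PRIMARY KEY for entity integrity and key constraints, and FOREIGN KEY for referential integrity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces FKs only when enabled
conn.execute("""CREATE TABLE Dept (
    DeptID INTEGER PRIMARY KEY,
    Name   TEXT NOT NULL)""")
conn.execute("""CREATE TABLE Emp (
    EmpID  INTEGER PRIMARY KEY,                  -- entity integrity
    Age    INTEGER CHECK (Age BETWEEN 18 AND 65),-- domain integrity
    DeptID INTEGER REFERENCES Dept(DeptID))""")  # referential integrity
conn.execute("INSERT INTO Dept VALUES (1, 'Finance')")
conn.execute("INSERT INTO Emp VALUES (1, 30, 1)")   # satisfies all constraints

try:
    # No department 99 exists: referential integrity rejects the row.
    conn.execute("INSERT INTO Emp VALUES (2, 30, 99)")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```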

Encryption

Authorization may not be sufficient to protect data in database systems, especially when there is a
situation where data should be moved from one location to the other using network facilities.

Encryption is used to protect information stored at a particular site or transmitted between sites
from being accessed by unauthorized users.

Encryption is the encoding of the data by a special algorithm that renders the data unreadable by
any program without the decryption key.
It is not possible for encrypted data to be read unless the reader knows how to decipher/decrypt
the encrypted data.

 If a database system holds particularly sensitive data, it may be deemed necessary to encode
it as a precaution against possible external threats or attempts to access it
 The DBMS can access data after decoding it, although there is a degradation in performance
because of the time taken to decode it
 Encryption also protects data transmitted over communication lines
 To transmit data securely over insecure networks requires the use of a Cryptosystem, which
includes:
1. An encryption key to encrypt the data (plaintext)
2. An encryption algorithm that, with the encryption key, transforms the plaintext
into ciphertext
3. A decryption key to decrypt the ciphertext
4. A decryption algorithm that, with the decryption key, transforms the ciphertext
back into plaintext

 Data encryption standard (DES) is an approach which does both a substitution of characters
and a rearrangement of their order based on an encryption key.
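To make the cryptosystem components concrete, here is a deliberately insecure toy symmetric cipher (plain XOR, not DES and not suitable for real use): the same key and the same algorithm serve for both encryption and decryption.

```python
from itertools import cycle

def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Encryption/decryption algorithm: XOR each byte with the repeating key.
    Applying it twice with the same key recovers the plaintext."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))

key = b"secret-key"                        # the (shared) encryption/decryption key
plaintext = b"transfer 100 birr to acct"
ciphertext = xor_cipher(plaintext, key)    # unreadable without the key
recovered = xor_cipher(ciphertext, key)    # same key + algorithm decrypts

assert recovered == plaintext
assert ciphertext != plaintext
```

Because one key does both jobs, this is a symmetric scheme; exchanging that key safely is exactly the problem the next section's asymmetric systems address.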

Types of Cryptosystems

 Cryptosystems can be categorized into two


1. Symmetric encryption – uses the same key for both encryption and decryption
and relies on safe communication lines for exchanging the key.
2. Asymmetric encryption – uses different keys for encryption and decryption e.g.
RSA
 Generally, symmetric algorithms are much faster to execute on a computer than
asymmetric ones. On the other hand, asymmetric algorithms are more secure
than symmetric algorithms.

 RAID technology (Redundant Array of Independent Disks)

 The hardware that the DBMS is running on must be fault-tolerant, meaning that the
DBMS should continue to operate even if one of the hardware components fails. This
suggests having redundant components that can be seamlessly integrated into the
working system whenever there is one or more component failures. The main
hardware components that should be fault-tolerant include disk drives, disk
controllers, CPU, power supplies, and cooling fans. Disk drives are the most
vulnerable components with the shortest times between failures of any of the
hardware components.
 RAID works by having a large disk array comprising an arrangement of several
independent disks that are organized to improve reliability and at the same time
increase performance
 Performance is increased through data striping
 Data striping – the data is segmented into equal size partitions (the striping unit)
which are transparently distributed across multiple disks.
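Data striping can be sketched as follows: a simplified model with in-memory "disks", no parity or redundancy, in which equal-size striping units are dealt round-robin across the array.

```python
def stripe(data: bytes, n_disks: int, unit: int):
    """Cut data into unit-byte striping units and deal them across the disks."""
    disks = [bytearray() for _ in range(n_disks)]
    chunks = [data[i:i + unit] for i in range(0, len(data), unit)]
    for i, chunk in enumerate(chunks):
        disks[i % n_disks].extend(chunk)   # round-robin distribution
    return [bytes(d) for d in disks]

def unstripe(disks, unit: int) -> bytes:
    """Re-interleave the striping units to recover the original data."""
    out = bytearray()
    pos = 0
    while any(pos < len(d) for d in disks):
        for d in disks:
            out.extend(d[pos:pos + unit])
        pos += unit
    return bytes(out)

data = b"ABCDEFGHIJKL"
disks = stripe(data, 3, 2)     # [b'ABGH', b'CDIJ', b'EFKL']
assert unstripe(disks, 2) == data
```

Because consecutive units sit on different disks, a large read or write touches all disks in parallel, which is where the performance gain comes from.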

Security at different Levels of Data


Almost all RDBMSs provide security at different levels and formats of data. This includes:
1. Relation Level: permission to have access to a specific relation.
2. View Level: permission to access data included in the view but not in the named
base relations.
3. Hybrid (Relation/View): the case where only part of a single relation is made
available to users through a view.

Any database access request will have the following three major components:

1. Requested Operation: what kind of operation is requested by a specific query?
2. Requested Object: on which resource or data of the database is the operation sought
to be applied?
3. Requesting User: who is the user requesting the operation on the specified object?

The database should be able to check for all the three components before processing
any request. The checking is performed by the security subsystem of the DBMS.
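A minimal sketch of such a check, with hypothetical users and objects: each request is the triple (requesting user, requested operation, requested object), and the security subsystem allows it only if a matching privilege has been granted.

```python
# Granted privileges as (user, operation, object) triples (hypothetical data).
privileges = {
    ("alice", "read",   "Employee"),
    ("alice", "update", "Employee"),
    ("bob",   "read",   "Employee"),
}

def is_authorized(user: str, operation: str, obj: str) -> bool:
    """Check all three components of the request against granted privileges."""
    return (user, operation, obj) in privileges

assert is_authorized("alice", "update", "Employee")
assert not is_authorized("bob", "update", "Employee")   # bob may only read
```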
AUTHENTICATION

 All users of the database will have different access levels and permissions for
different data objects.
 Authentication is the process of checking that users are who they say they are,
i.e. that the user requesting a given access level actually holds the
corresponding privilege.
 Each user is given a unique identifier, which is used by the operating system to
determine who they are.
 Associated with each identifier is a password, chosen by the user and known to the
operating system, which must be supplied to enable the operating system to
authenticate who the user claims to be.
 Thus the system checks whether the user with a specific username and password is
entitled to use the resource.
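A sketch of how a system can store and check passwords without keeping them in the clear: it keeps only a salted hash, and authenticates by re-hashing the supplied password with the stored salt. The iteration count and salt size here are illustrative choices, not a prescription.

```python
import hashlib
import hmac
import secrets

def register(password: str):
    """Return (salt, digest) — what the system keeps on file, never the password."""
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt, digest

def authenticate(password: str, salt: bytes, digest: bytes) -> bool:
    """Re-hash the candidate password and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return hmac.compare_digest(candidate, digest)

salt, digest = register("s3cret")
assert authenticate("s3cret", salt, digest)
assert not authenticate("wrong", salt, digest)
```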

AUTHORIZATION/PRIVILEGE
Authorization refers to the process that determines the mode in which a particular (previously
authenticated) client is allowed to access a specific resource controlled by a server.
Most of the time, authorization is implemented using views.
 Views are virtual relations containing part of one or more base relations, creating a
customized/personalized view for different users.
 Views are used to hide data that a user does not need to see.

FORMS OF USER AUTHORIZATION


There are different forms of user authorization on the resource of the database. These forms
are privileges on what operations are allowed on a specific data object.

User authorization on the data/extension

1. Read Authorization: the user with this privilege is allowed only to read the content of
the data object.
2. Insert Authorization: the user with this privilege is allowed only to insert new
records or items to the data object.
3. Update Authorization: users with this privilege are allowed to modify content of
attributes but are not authorized to delete the records.
4. Delete Authorization: users with this privilege are allowed only to delete records
and nothing else.

 Different users, depending on the power of the user, can have one or the combination of
the above forms of authorization on different data objects.
User authorization on the database schema
1. Index Authorization: deals with permission to create as well as delete an index on a
relation.
2. Resource Authorization: deals with permission to add/create a new relation in the
database.
3. Alteration Authorization: deals with permission to add as well as delete attributes.
4. Drop Authorization: deals with permission to delete an existing relation.

Role of DBA in Database Security

The database administrator is responsible for making the database as secure as possible.
For this, the DBA has more powerful privileges than any other user, and provides
capabilities for database users while they access the content of the database.

The major responsibilities of DBA in relation to authorization of users are:


1. Account Creation:
involves creating different accounts for different USERS as well as USER GROUPS.

2. Security Level Assignment:
involves assigning different users to different categories of access levels.

3. Privilege Grant:
involves giving different levels of privileges for different users and user groups.

4. Privilege Revocation:
involves denying or canceling previously granted privileges for users due to various
reasons.
5. Account Deletion:
involves deleting an existing account of users or user groups. This is similar to
denying all privileges of the users on the database.
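These responsibilities can be sketched with a toy in-memory account-and-privilege store. The function names and data are hypothetical, not a real DBMS API:

```python
accounts = {}   # user -> set of granted privileges

def create_account(user):          # 1. account creation
    accounts[user] = set()

def grant(user, privilege):        # 3. privilege grant
    accounts[user].add(privilege)

def revoke(user, privilege):       # 4. privilege revocation
    accounts[user].discard(privilege)

def delete_account(user):          # 5. account deletion: all privileges denied
    accounts.pop(user, None)

create_account("clerk")
grant("clerk", "read")
grant("clerk", "insert")
revoke("clerk", "insert")          # cancel the previously granted privilege
print(accounts["clerk"])           # {'read'}
```

In SQL-based systems these steps correspond to statements such as CREATE USER, GRANT, and REVOKE, issued by the DBA.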

Approaches to Database Security

There are two broad approaches to security. The two types of database security
mechanisms are:
1. Discretionary security mechanisms
 Grant different privileges to different users and user groups on various
data objects
 The privilege is to access different data objects
 The mode of the privilege could be
 Read,
 Insert,
 Delete,
 Update files, records or fields.
 Is more flexible
 One user can have A but not B and another user can have B but not A
2. Mandatory security mechanisms
 Enforce multilevel security
 Classifying data and users into various security classes (or levels)
and implementing the appropriate security policy of the
organization.
 Each data object will have certain classification level
 Each user is given certain clearance level
 Only users who can pass the clearance level can access the data object
 Is comparatively inflexible/rigid
 If one user can have A but not B, then B is accessible only to users with higher
privilege, and we cannot have a user with B but not A
The ability to classify users into a hierarchy of groups provides a powerful tool for
administering large systems with thousands of users and objects.
A database system can support one or both of the security mechanisms to protect the data.
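The mandatory mechanism can be sketched as a dominance check between a user's clearance and an object's classification. This is a toy model of the "no read up" rule; the level names are illustrative:

```python
# Security classes, ordered from least to most sensitive (illustrative names).
LEVELS = {"unclassified": 0, "confidential": 1, "secret": 2, "top-secret": 3}

def can_read(user_clearance: str, object_classification: str) -> bool:
    """'No read up': the user's clearance must dominate the object's level."""
    return LEVELS[user_clearance] >= LEVELS[object_classification]

assert can_read("secret", "confidential")       # reading down: allowed
assert not can_read("confidential", "secret")   # reading up: denied
```

Note the rigidity described above: the levels form a single ordering, so there is no way to give one user A-but-not-B and another user B-but-not-A.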

In most systems it is better to specify explicitly what is allowed rather than to
enumerate what is not allowed: any object that is not explicitly authorized is then
constrained by default.

Statistical Database Security

 Statistical databases contain information about individuals whose individual records may
not be permitted to be viewed by others.

 Such databases may contain information about various populations.

 Example: Medical Records, Personal Data like address, salary, etc


 Such kinds of databases should have special security mechanisms so that confidential
information about people will not be disclosed to many users.

 Thus statistical databases should have additional security techniques which protect
against the retrieval of individual records.

 Only queries with statistical aggregate functions like average, sum, min, max,
standard deviation, mid, count, etc. should be executed.

 Queries retrieving confidential attributes should be prohibited.

 To prevent the user from making inferences on the retrieved data, one can also implement
a constraint on the minimum number of records or tuples in the resulting relation by
setting a threshold.
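The threshold idea can be sketched as follows (toy data; the threshold value of 5 is illustrative): only aggregate queries are answered, and only when the selected set contains at least the threshold number of tuples.

```python
THRESHOLD = 5   # minimum number of tuples a statistical query may aggregate over

def statistical_query(rows, predicate, aggregate):
    """Answer only aggregates, and only over result sets of threshold size."""
    selected = [r for r in rows if predicate(r)]
    if len(selected) < THRESHOLD:
        raise PermissionError("result set too small: possible inference attack")
    return aggregate(selected)

# Hypothetical (name, salary) records.
salaries = [("A", 900), ("B", 950), ("C", 1000),
            ("D", 1100), ("E", 1200), ("F", 400)]

# Five tuples qualify, so the aggregate is answered.
avg = statistical_query(salaries,
                        lambda r: r[1] > 500,
                        lambda rs: sum(s for _, s in rs) / len(rs))
print(avg)   # 1030.0
```

A query whose predicate singles out one individual (e.g. salary < 500 here) is refused, since an attacker could otherwise read that person's value straight out of the "aggregate".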
