
UNIT V

DATA STORAGE & INDEXES, TRANSACTION PROCESSING & ERROR RECOVERY
1
a) What is meant by File Organization? [L1][CO6][4M]
b) Briefly discuss different types of file organization. [L6][CO6][8M]

File Organization

o A file is a collection of records. Using the primary key, we can access the records.
The type and frequency of access are determined by the type of file organization
used for a given set of records.
o File organization is a logical relationship among various records. This method defines
how file records are mapped onto disk blocks.
o File organization describes the way in which records are stored in terms of blocks,
and how the blocks are placed on the storage medium.
o The first approach to mapping the database to files is to use several files and store
records of only one fixed length in any given file. An alternative approach is to
structure files so that they can accommodate records of multiple lengths.
o Files of fixed-length records are easier to implement than files of variable-length
records.

Types of file organization:

File organization comprises various methods, each with pros and cons based on how records
are accessed or selected. The programmer decides the best-suited file organization method
according to the application's requirements.

Types of file organization are as follows:

o Sequential file organization
o Heap file organization
o Hash file organization
o B+ tree file organization
o Indexed sequential access method (ISAM)
o Cluster file organization

2
a) Write about Index file organization. [L1][CO6][2M]
b) List various index structures. [L4][CO6][10M]
Indexing is a data structure technique that allows you to quickly retrieve records from a
database file. An index is a small table having only two columns. The first column comprises
a copy of the primary or candidate key of a table. The second column contains a set
of pointers holding the address of the disk block where that specific key value is stored.

An index:

 Takes a search key as input.
 Efficiently returns a collection of matching records.

Primary Index in DBMS


A primary index is an ordered file of fixed-length records with two fields. The first field is
the same as the primary key, and the second field points to the corresponding data block. In a
primary index, there is always a one-to-one relationship between the entries in the index
table and the records in the data file.

Primary indexing in DBMS is further divided into two types:

 Dense Index
 Sparse Index

Dense Index
In a dense index, an index record is created for every search-key value in the database. This
helps you to search faster but needs more space to store the index records. In this method,
each index record contains the search-key value and a pointer to the actual record on the disk.

Sparse Index
It is an index in which index records appear for only some of the values in the file. A sparse
index helps you to resolve the issues of dense indexing in DBMS. In this indexing technique,
a range of key values shares the same data block address, and when data needs to be
retrieved, that block address is fetched.

However, since a sparse index stores index records for only some search-key values, it needs
less space and less maintenance overhead for insertions and deletions, but it is slower than a
dense index for locating records.
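To make the contrast concrete, here is a minimal Python sketch of both lookup styles; the keys, the two-records-per-block layout, and the dictionary-based indexes are illustrative assumptions rather than part of the original example.

```python
# A minimal sketch of dense vs. sparse index lookup (illustrative layout:
# a sorted file of three blocks, two records per block).
blocks = [[10, 20], [30, 40], [50, 60]]

# dense index: one entry per search-key value
dense = {k: b for b, blk in enumerate(blocks) for k in blk}
# sparse index: one entry per block (its first/anchor key)
sparse = {blk[0]: b for b, blk in enumerate(blocks)}

def sparse_lookup(key):
    # find the largest anchor key <= search key, then scan that block
    anchor = max(k for k in sparse if k <= key)
    return key in blocks[sparse[anchor]]

print(dense.get(40))        # dense: direct hit, block 1
print(sparse_lookup(40))    # sparse: enter at anchor 30, scan block 1 -> True
```

Note that the sparse index holds one entry per block rather than one per key, which is where its space saving comes from.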

[Figure: example of a sparse index]

Secondary Index in DBMS

A secondary index in DBMS can be generated by a field which has a unique value for each
record; such a field should be a candidate key. It is also known as a non-clustering index.
This two-level database indexing technique is used to reduce the mapping size of the first
level. For the first level, a large range of numbers is selected; because of this, the mapping
size always remains small.

Secondary Index Example


Let’s understand secondary indexing with a database index example:

In a bank account database, data is stored sequentially by acc_no; you may want to find all
accounts of a specific branch of ABC bank.

Here, you can have a secondary index in DBMS for every search-key value. Each index
record points to a bucket that contains pointers to all the records with that specific search-key
value.

Clustering Index in DBMS
In a clustered index, the records themselves are stored in the index, rather than pointers.
Sometimes the index is created on non-primary key columns, which might not be unique for
each record. In such a situation, you can group two or more columns to get unique values and
create an index, which is called a clustered index. This also helps you to identify records faster.

Example:

Let's assume that a company recruited many employees in various departments. In this case,
a clustering index in DBMS should be created for all employees who belong to the same
department.

They are treated as a single cluster, and the index points to the cluster as a whole. Here,
Department_no is a non-unique key.

Multilevel Index
Index records comprise search-key values and data pointers. A multilevel index is stored on
the disk along with the actual database files. As the size of the database grows, so does the
size of the indices. There is an immense need to keep the index records in main memory so as
to speed up search operations. If a single-level index is used, then a large index cannot be
kept in memory, which leads to multiple disk accesses.

A multi-level index helps by breaking down the index into several smaller indices, making
the outermost level so small that it can be saved in a single disk block, which can easily be
accommodated anywhere in the main memory.
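As an illustration of the idea, the following Python sketch performs a two-level lookup with binary search at each level; the outer/inner entries, page size of two, and block names are hypothetical.

```python
# A minimal sketch of a two-level (multilevel) index lookup; keys and
# block ids are hypothetical.
import bisect

# inner index: (first key in data block, block id), sorted by key
inner = [(10, "blk0"), (40, "blk1"), (70, "blk2"), (100, "blk3")]
# outer index: (first key covered by each inner-index page, page offset);
# here each outer entry covers a "page" of two inner entries
outer = [(10, 0), (70, 2)]

def lookup(key):
    # search the small outer level first (it fits in main memory)
    i = bisect.bisect_right([k for k, _ in outer], key) - 1
    start = outer[i][1]
    # then search only the relevant page of the inner level
    page = inner[start:start + 2]
    j = bisect.bisect_right([k for k, _ in page], key) - 1
    return page[j][1]       # data block that may hold the record

print(lookup(85))           # -> blk2
```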
B+ Tree
A B+ tree is a balanced search tree (multiway, not binary) that follows a multi-level index
format. The leaf nodes of a B+ tree hold the actual data pointers. A B+ tree ensures that all
leaf nodes remain at the same height, and is thus balanced. Additionally, the leaf nodes are
linked in a linked list; therefore, a B+ tree can support random access as well as sequential
access.
Structure of B+ Tree
Every leaf node is at an equal distance from the root node. A B+ tree is of order n, where n is
fixed for every B+ tree.
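A minimal sketch of searching a prebuilt B+ tree of order 3 follows; the hardcoded tree, its keys, and the use of key values as stand-ins for data pointers are illustrative assumptions.

```python
# A minimal sketch of B+ tree search and range scan over a hardcoded
# one-level tree of order 3; keys stand in for real data pointers.
class Leaf:
    def __init__(self, keys):
        self.keys, self.next = keys, None

l1, l2, l3 = Leaf([5, 10]), Leaf([20, 30]), Leaf([40, 50])
l1.next, l2.next = l2, l3              # leaves form a linked list
root = {"keys": [20, 40], "children": [l1, l2, l3]}

def search(key):
    # random access: descend from the root to the one relevant leaf
    i = sum(key >= k for k in root["keys"])
    return key in root["children"][i].keys

def range_scan(lo, hi):
    # sequential access: find the first leaf, then follow the leaf links
    leaf = root["children"][sum(lo >= k for k in root["keys"])]
    found = []
    while leaf:
        found += [k for k in leaf.keys if lo <= k <= hi]
        leaf = leaf.next
    return found

print(search(30), range_scan(10, 40))  # True [10, 20, 30, 40]
```

The linked leaves are what let the same structure serve both point queries (descend from the root) and range queries (scan along the leaf chain).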

3 Discuss about Hashing in detail with merits and demerits.
Hashing

In a huge database structure, it is very inefficient to search all the index values to reach the
desired data. The hashing technique is used to calculate the direct location of a data record on
the disk without using an index structure.

In this technique, data is stored at data blocks whose addresses are generated by using a hash
function. The memory locations where these records are stored are known as data buckets or
data blocks.

Here, a hash function can use any column value to generate the address. Most of the time, the
hash function uses the primary key to generate the address of the data block. A hash function
can be anything from a simple mathematical function to a complex one. We can even consider
the primary key itself as the address of the data block, meaning each row is stored in the data
block whose address is the same as its primary key.

In that arrangement, the data block addresses are the same as the primary key values. The
hash function can also be a simple mathematical function like mod, exponential, sin, cos, etc.
Suppose we have a mod(5) hash function to determine the address of the data block. In this
case, applying mod(5) to the primary keys generates 3, 3, 1, 4 and 2 respectively, and the
records are stored at those data block addresses.
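A short Python sketch of this placement, assuming hypothetical primary keys chosen so that mod(5) yields the bucket addresses 3, 3, 1, 4 and 2 mentioned above:

```python
# A minimal sketch of mod-5 hash placement; the primary keys are
# hypothetical values picked to produce buckets 3, 3, 1, 4 and 2.
def hash_fn(key):
    return key % 5                      # bucket address = key mod 5

for key in [103, 108, 106, 109, 107]:
    print(key, "-> bucket", hash_fn(key))
# 103 -> bucket 3, 108 -> bucket 3, 106 -> bucket 1,
# 109 -> bucket 4, 107 -> bucket 2
```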

Types of Hashing:

Static Hashing

In static hashing, the resultant data bucket address will always be the same. That means if we
generate an address for EMP_ID = 103 using the hash function mod(5), it will always result
in the same bucket address, 3. Here, there will be no change in the bucket address.

Hence in this static hashing, the number of data buckets in memory remains constant
throughout. In this example, we will have five data buckets in the memory used to store the
data.

Operations of Static Hashing
o Searching a record

When a record needs to be searched, then the same hash function retrieves the address of the
bucket where the data is stored.

o Insert a Record

When a new record is inserted into the table, we generate an address for the new record
based on the hash key, and the record is stored at that location.

o Delete a Record

To delete a record, we first fetch the record which is supposed to be deleted. Then we
delete the record at that address in memory.

o Update a Record

To update a record, we will first search it using a hash function, and then the data record is
updated.
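The four operations can be sketched in a few lines of Python over a fixed array of five buckets; the record contents and the one-record-per-bucket simplification are assumptions for illustration.

```python
# A minimal sketch of static hashing operations over 5 fixed buckets
# (one record per bucket for simplicity; no overflow handling here).
buckets = [None] * 5                    # bucket count never changes

def addr(key):
    return key % 5                      # the static hash function

def insert(key, rec):  buckets[addr(key)] = (key, rec)
def search(key):       return buckets[addr(key)]
def update(key, rec):  buckets[addr(key)] = (key, rec)
def delete(key):       buckets[addr(key)] = None

insert(103, "rec-A")
print(search(103))                      # (103, 'rec-A') found in bucket 3
update(103, "rec-B")
delete(103)
```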

Suppose we want to insert a new record into the file, but the data bucket address generated by
the hash function is not empty, i.e., data already exists at that address. This situation in static
hashing is known as bucket overflow, and it is a critical problem for this method.

To overcome this situation, there are various methods. Some commonly used methods are as
follows:

1. Open Hashing

When the hash function generates an address at which data is already stored, the next
available bucket is allocated to the new record. This mechanism is called linear probing.

For example: suppose R3 is a new record which needs to be inserted, and the hash function
generates address 112 for it. But that address is already full, so the system searches for the
next available data bucket, 113, and assigns R3 to it.

2. Close Hashing

When buckets are full, then a new data bucket is allocated for the same hash result and is
linked after the previous one. This mechanism is known as Overflow chaining.

For example: suppose R3 is a new record which needs to be inserted into the table, and the
hash function generates address 110 for it. But this bucket is too full to store the new data. In
this case, a new bucket is appended after bucket 110 and linked to it.
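The two overflow strategies can be contrasted in a small Python sketch; the five-bucket table and the colliding keys are illustrative.

```python
# A minimal sketch of the two overflow strategies: linear probing
# (open hashing, as named above) vs. overflow chaining (close hashing).
def probe_insert(table, key, size=5):
    i = key % size
    while i in table:                  # bucket full: try the next one
        i = (i + 1) % size
    table[i] = key

def chain_insert(table, key, size=5):
    table.setdefault(key % size, []).append(key)   # link after full bucket

probed, chained = {}, {}
for key in [103, 108, 113]:            # all three hash to bucket 3
    probe_insert(probed, key)
    chain_insert(chained, key)
print(probed)                          # {3: 103, 4: 108, 0: 113}
print(chained)                         # {3: [103, 108, 113]}
```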

Dynamic Hashing

o The dynamic hashing method is used to overcome the problems of static hashing like
bucket overflow.
o In this method, data buckets grow or shrink as the number of records increases or
decreases. This method is also known as the extendable hashing method.
o This method makes hashing dynamic, i.e., it allows insertion or deletion without
resulting in poor performance.

How to search a key


o First, calculate the hash address of the key.
o Check how many bits are used in the directory; call this number of bits i.
o Take the least significant i bits of the hash address. This gives an index into the
directory.
o Now, using that index, go to the directory and find the bucket address where the
record might be.

How to insert a new record


o Firstly, you have to follow the same procedure for retrieval, ending up in some
bucket.
o If there is still space in that bucket, then place the record in it.
o If the bucket is full, then we will split the bucket and redistribute the records.

For example:

Consider the following grouping of keys into buckets, depending on the low-order bits of
their hash addresses:

The last two bits of 2 and 4 are 00, so they go into bucket B0. The last two bits of 5 and 6
are 01, so they go into bucket B1. The last two bits of 1 and 3 are 10, so they go into
bucket B2. The last two bits of 7 are 11, so it goes into B3.

Insert key 9 with hash address 10001 into the above structure:
o Since key 9 has hash address 10001, it must go into bucket B1 (last two bits 01). But
bucket B1 is full, so it will get split.
o The split separates 5 and 9 from 6: the last three bits of the hash addresses of 5 and 9
are 001, so they go into bucket B1, while the last three bits of 6 are 101, so it goes
into bucket B5.
o Keys 2 and 4 are still in B0. The records in B0 are pointed to by the 000 and 100
directory entries, because the last two bits of both entries are 00.
o Keys 1 and 3 are still in B2. The records in B2 are pointed to by the 010 and 110
directory entries, because the last two bits of both entries are 10.
o Key 7 is still in B3. The records in B3 are pointed to by the 111 and 011 directory
entries, because the last two bits of both entries are 11.
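The following Python sketch implements the directory-doubling mechanics described above in simplified form; the bucket capacity of 2 and the use of each key as its own hash address are assumptions for illustration (the worked example above assigns separate hash addresses to its keys).

```python
# A minimal sketch of extendible (dynamic) hashing with directory doubling.
# Bucket capacity is 2 and each key serves as its own hash address.
BUCKET_SIZE = 2

class Bucket:
    def __init__(self, local_depth):
        self.local_depth, self.keys = local_depth, []

class ExtendibleHash:
    def __init__(self):
        self.global_depth = 1
        self.directory = [Bucket(1), Bucket(1)]   # indexed by low-order bits

    def _index(self, key):
        # take the least significant global_depth bits of the hash address
        return key & ((1 << self.global_depth) - 1)

    def insert(self, key):
        bucket = self.directory[self._index(key)]
        if len(bucket.keys) < BUCKET_SIZE:
            bucket.keys.append(key)
            return
        # bucket full: double the directory if the bucket is at max depth
        if bucket.local_depth == self.global_depth:
            self.directory += self.directory
            self.global_depth += 1
        # split the bucket and repoint half of its directory entries
        bucket.local_depth += 1
        new_bucket = Bucket(bucket.local_depth)
        high_bit = 1 << (bucket.local_depth - 1)
        for i, b in enumerate(self.directory):
            if b is bucket and (i & high_bit):
                self.directory[i] = new_bucket
        # redistribute the old keys plus the new one
        pending, bucket.keys = bucket.keys + [key], []
        for k in pending:
            self.insert(k)

h = ExtendibleHash()
for k in [2, 4, 5, 6, 1, 3, 7, 9]:
    h.insert(k)
print(h.global_depth, len(h.directory))   # directory grew as buckets split
```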

Advantages of dynamic hashing
o In this method, performance does not decrease as the data in the system grows; the
structure simply increases the memory used to accommodate the data.
o In this method, memory is well utilized, as it grows and shrinks with the data. There
will not be any memory lying unused.
o This method is good for the dynamic database where data grows and shrinks
frequently.

Disadvantages of dynamic hashing


o In this method, as the data size increases, the number of buckets also increases. The
bucket addresses must be maintained in a bucket address table, because data
addresses keep changing as buckets grow and shrink. If there is a huge increase in
data, maintaining the bucket address table becomes tedious.
o The bucket overflow situation can still occur here, but it takes longer to reach than in
static hashing.

4
a) What is a transaction in database system? [L1][CO6] [2M]
b) List and explain the ACID Properties with neat diagram. [L4][CO6] [10M]
A transaction is a single logical unit of work that accesses and possibly modifies the
contents of a database. Transactions access data using read and write operations.  
In order to maintain consistency in a database, before and after the transaction, certain
properties are followed. These are called ACID properties.

Atomicity:
By this, we mean that either the entire transaction takes place at once or doesn’t happen at
all. There is no midway i.e. transactions do not occur partially. Each transaction is
considered as one unit and either runs to completion or is not executed at all. It involves the
following two operations. 
—Abort: If a transaction aborts, changes made to the database are not visible.  
—Commit: If a transaction commits, changes made are visible. 
Atomicity is also known as the ‘All or nothing rule’.
Consider the following transaction T consisting of T1 and T2: Transfer of 100 from
account X to account Y.

If the transaction fails after completion of T1 but before completion of T2 (say, after
write(X) but before write(Y)), then the amount has been deducted from X but not added
to Y. This results in an inconsistent database state. Therefore, the transaction must be
executed in its entirety in order to ensure the correctness of the database state.
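As a minimal sketch of this all-or-nothing behavior, the following Python uses the standard library's sqlite3 module; the account table and its contents are illustrative, matching the balances used in the consistency example below.

```python
# A minimal sketch of an atomic transfer of 100 from X to Y using sqlite3.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (acc_no TEXT PRIMARY KEY, balance INT)")
conn.executemany("INSERT INTO account VALUES (?, ?)",
                 [("X", 500), ("Y", 200)])
conn.commit()

try:
    # T1: deduct 100 from X; T2: add 100 to Y -- one atomic unit
    conn.execute("UPDATE account SET balance = balance - 100 "
                 "WHERE acc_no = 'X'")
    conn.execute("UPDATE account SET balance = balance + 100 "
                 "WHERE acc_no = 'Y'")
    conn.commit()        # both writes become visible together
except sqlite3.Error:
    conn.rollback()      # on any failure, neither write is visible

print(dict(conn.execute("SELECT acc_no, balance FROM account")))
# {'X': 400, 'Y': 300}
```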

Consistency:

This means that integrity constraints must be maintained so that the database is consistent
before and after the transaction. It refers to the correctness of a database. Referring to the
example above, 
The total amount before and after the transaction must be maintained. 
Total before T occurs = 500 + 200 = 700. 
Total after T occurs = 400 + 300 = 700. 
Therefore, the database is consistent. Inconsistency occurs in case T1 completes
but T2 fails. As a result, T is incomplete.

Isolation:

This property ensures that multiple transactions can occur concurrently without leading to
the inconsistency of the database state. Transactions occur independently without
interference. Changes occurring in a particular transaction will not be visible to any other
transaction until the change has been written to memory or has been committed. This
property ensures that concurrently executing transactions result in a state equivalent to one
achieved had they been executed serially in some order.
Let X= 500, Y = 500. 
Consider two transactions T and T”.

Suppose T transfers 50 from X to Y, and T has been executed till Read(Y), i.e., it has
already written the new value of X. Then T'' starts. As a result, interleaving of operations
takes place, due to which T'' reads the updated value of X but the old value of Y, and the
sum computed by
T'': (X + Y = 450 + 500 = 950)
is thus not consistent with the sum at the end of the transaction:
T: (X + Y = 450 + 550 = 1000).
This results in database inconsistency, due to an apparent loss of 50 units. Hence,
transactions must take place in isolation, and changes should be visible only after they have
been written to main memory.

Durability:

This property ensures that once the transaction has completed execution, the updates and
modifications to the database are stored in and written to disk and they persist even if a
system failure occurs. These updates now become permanent and are stored in non-volatile
memory. The effects of the transaction, thus, are never lost.
Some important points:

Property        Responsibility for maintaining the property
Atomicity       Transaction Manager
Consistency     Application programmer
Isolation       Concurrency Control Manager
Durability      Recovery Manager
The ACID properties, in totality, provide a mechanism to ensure the correctness and
consistency of a database in a way such that each transaction is a group of operations that
acts as a single unit, produces consistent results, acts in isolation from other operations, and
updates that it makes are durably stored.

5 List and explain different concurrency control mechanisms. [L4][CO6] [12M]

Concurrency control is provided in a database to:


(i) enforce isolation among transactions.
(ii) preserve database consistency through consistency preserving execution of
transactions.
(iii) resolve read-write and write-read conflicts.

Concurrency control techniques are:


1. Two-phase locking Protocol
2. Time stamp ordering Protocol

1. Two-Phase Locking Protocol: Locking is an operation which secures either permission to
read or permission to write a data item. Two-phase locking is a process used to gain
ownership of shared resources without creating the possibility of deadlock. The three
activities taking place in the two-phase update algorithm are:
(i). Lock Acquisition
(ii). Modification of Data
(iii). Release Lock
Two phase locking prevents deadlock from occurring in distributed systems by releasing all
the resources it has acquired, if it is not possible to acquire all the resources required
without waiting for another process to finish using a lock. This means that no process is
ever in a state where it is holding some shared resources, and waiting for another process to
release a shared resource which it requires. This means that deadlock cannot occur due to
resource contention. A transaction in the Two Phase Locking Protocol can assume one of
the 2 phases:
 (i) Growing Phase: In this phase a transaction can only acquire locks but cannot release
any lock. The point when a transaction acquires all the locks it needs is called the Lock
Point.
 (ii) Shrinking Phase: In this phase a transaction can only release locks but cannot
acquire any.
2. Time Stamp Ordering Protocol : A timestamp is a tag that can be attached to any
transaction or any data item, which denotes a specific time on which the transaction or the
data item was last used. A timestamp can be implemented in 2 ways. One is to
directly assign the current value of the clock to the transaction or data item. The other is to
attach the value of a logical counter that is incremented as new timestamps are required.
The timestamp of a data item can be of 2 types:
 (i) W-timestamp(X): This means the latest time when the data item X has been written
into.
 (ii) R-timestamp(X): This means the latest time when the data item X has been read
from. These 2 timestamps are updated each time a successful read/write operation is
performed on the data item X.
6 Explain lock-based concurrency control mechanisms with diagram in detail.
[L5][CO6][12M]

A lock-based protocol in DBMS is a mechanism in which a transaction cannot read or
write data until it acquires an appropriate lock. Lock-based protocols help to eliminate the
concurrency problem in DBMS for simultaneous transactions by locking or isolating a
particular transaction to a single user.

A lock is a data variable which is associated with a data item. This lock signifies which
operations can be performed on the data item. Locks in DBMS help synchronize access
to the database items by concurrent transactions.

All lock requests are made to the concurrency-control manager. Transactions proceed only
once the lock request is granted.

Binary Locks: A binary lock on a data item can be in either a locked or an unlocked state.
Shared/exclusive: This type of locking mechanism separates locks in DBMS based on
their use. If a lock is acquired on a data item to perform a write operation, it is called an
exclusive lock.

16
1. Shared Lock (S):

A shared lock is also called a read-only lock. With a shared lock, the data item can be
shared between transactions, because a shared lock never grants permission to update the
data item.

For example, consider a case where two transactions are reading the account balance of a
person. The database lets them read by placing a shared lock. However, if another
transaction wants to update that account's balance, the shared lock prevents it until the
reading process is over.

2. Exclusive Lock (X):

With an exclusive lock, a data item can be read as well as written. An exclusive lock
can't be held concurrently with any other lock on the same data item. An X-lock is
requested using the lock-X instruction. Transactions may unlock the data item after
finishing the 'write' operation.

For example, when a transaction needs to update the account balance of a person, the DBMS
allows this transaction by placing an X lock on the data item. Therefore, when a second
transaction wants to read or write it, the exclusive lock prevents that operation.
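The compatibility rule between the two lock modes can be stated in a few lines of Python; the function and its table are an illustrative sketch, not a real DBMS interface.

```python
# A minimal sketch of shared/exclusive lock compatibility.
def compatible(held, requested):
    # S is compatible only with S; X is compatible with nothing
    return held == "S" and requested == "S"

print(compatible("S", "S"))   # True:  many readers can share the item
print(compatible("S", "X"))   # False: a writer waits for the readers
print(compatible("X", "S"))   # False: readers wait for the writer
```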

3. Simplistic Lock Protocol

This type of lock-based protocol allows transactions to obtain a lock on every object before
beginning the operation. Transactions may unlock the data item after finishing the 'write'
operation.

4. Pre-claiming Locking

The pre-claiming lock protocol evaluates operations and creates a list of the data items
required to initiate an execution process. When all locks are granted, the transaction
executes. All locks are released when all of its operations are over.

Starvation

Starvation is the situation when a transaction needs to wait for an indefinite period to acquire
a lock.

Following are the reasons for Starvation:

 When the waiting scheme for locked items is not properly managed
 In the case of a resource leak
 The same transaction is selected as a victim repeatedly

Deadlock

Deadlock refers to a specific situation where two or more processes are waiting for each other
to release a resource or more than two processes are waiting for the resource in a circular
chain.

Two Phase Locking Protocol
The Two-Phase Locking Protocol, also known as the 2PL protocol, is a method of concurrency
control in DBMS that ensures serializability by applying locks to transaction data, which
blocks other transactions from accessing the same data simultaneously. The Two-Phase
Locking protocol helps to eliminate the concurrency problem in DBMS.
This locking protocol divides the execution phase of a transaction into three different parts.

 In the first phase, when the transaction begins to execute, it requires permission for
the locks it needs.
 The second part is where the transaction obtains all the locks. When a transaction
releases its first lock, the third phase starts.
 In this third phase, the transaction cannot demand any new locks. Instead, it only
releases the acquired locks.

The Two-Phase Locking protocol allows each transaction to make a lock or unlock request in
two steps:

 Growing Phase: In this phase, a transaction may obtain locks but may not release any
locks.
 Shrinking Phase: In this phase, a transaction may release locks but may not obtain any
new locks.
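A minimal Python sketch of the two-phase discipline follows; the TwoPhaseTransaction class is illustrative and ignores blocking, queueing, and lock modes.

```python
# A minimal sketch of the 2PL discipline for a single transaction.
class TwoPhaseTransaction:
    def __init__(self):
        self.locks = set()
        self.shrinking = False        # True once the first lock is released

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violated: no new locks after an unlock")
        self.locks.add(item)          # growing phase

    def unlock(self, item):
        self.shrinking = True         # lock point passed: shrinking phase
        self.locks.discard(item)

t = TwoPhaseTransaction()
t.lock("X"); t.lock("Y")              # growing phase (lock point after Y)
t.unlock("X")                         # shrinking phase begins
try:
    t.lock("Z")                       # not allowed under 2PL
except RuntimeError as e:
    print(e)
```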

7 Explain about concurrency control based on time-stamp ordering. [L2][CO6][12M]


Timestamp Ordering Protocol

o The Timestamp Ordering Protocol is used to order transactions based on their
timestamps. The order of the transactions is simply the ascending order of their
creation times.
o An older transaction has higher priority, which is why it executes first. To determine
the timestamp of a transaction, this protocol uses system time or a logical counter.
o The lock-based protocol manages the order between conflicting pairs of transactions
at execution time, whereas timestamp-based protocols start working as soon as a
transaction is created.
o Let's assume there are two transactions, T1 and T2. Suppose transaction T1 entered
the system at time 007 and transaction T2 entered the system at time 009. T1 has the
higher priority, so it executes first, as it entered the system first.
o The timestamp ordering protocol also maintains the timestamps of the last 'read' and
'write' operations on each data item.

Basic Timestamp ordering protocol works as follows:

1. Check the following condition whenever a transaction Ti issues a Read(X) operation:

o If TS(Ti) >= W_TS(X), then the operation is executed. Otherwise, it is rejected.

2. Check the following condition whenever a transaction Ti issues a Write(X) operation:

o If TS(Ti) > R_TS(X) and TS(Ti) > W_TS(X), then the operation is executed.
Otherwise, it is rejected.

Where,

TS(Ti) denotes the timestamp of the transaction Ti.

R_TS(X) denotes the Read time-stamp of data-item X.

W_TS(X) denotes the Write time-stamp of data-item X.

o The Timestamp Ordering Protocol ensures serializability, since the precedence graph
has edges only from older transactions to newer ones and therefore contains no cycles.

o The Timestamp Ordering Protocol ensures freedom from deadlock, since no
transaction ever waits.
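The two checks above can be sketched directly in Python; the DataItem bookkeeping and the integer timestamps are illustrative, and a False return stands in for rejecting (rolling back) the transaction.

```python
# A minimal sketch of the basic timestamp-ordering checks.
class DataItem:
    def __init__(self):
        self.r_ts = 0    # R_TS(X): latest read timestamp
        self.w_ts = 0    # W_TS(X): latest write timestamp

def read(ts, item):
    if ts >= item.w_ts:                        # TS(Ti) >= W_TS(X)
        item.r_ts = max(item.r_ts, ts)
        return True
    return False                               # rejected: roll Ti back

def write(ts, item):
    if ts > item.r_ts and ts > item.w_ts:      # TS(Ti) > both timestamps
        item.w_ts = ts
        return True
    return False                               # rejected: roll Ti back

x = DataItem()
print(write(5, x), read(3, x), read(7, x))     # True False True
```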

8 Explain the following concepts: i) Transaction States ii) Concurrent Executions [L2][CO5][12M]
States of Transaction

In a database, the transaction can be in one of the following states -

Active state
o The active state is the first state of every transaction. In this state, the transaction is
being executed.
o For example: Insertion or deletion or updating a record is done here. But all the
records are still not saved to the database.

Partially committed
o In the partially committed state, a transaction has executed its final operation, but the
data is still not saved to the database.
o In a total-marks calculation example, the final display of the total marks is executed
in this state.

Committed

A transaction is said to be in a committed state if it executes all its operations successfully.
In this state, all the effects are now permanently saved on the database system.

Failed state
o If any of the checks made by the database recovery system fails, then the transaction
is said to be in the failed state.
o In the example of total mark calculation, if the database is not able to fire a query to
fetch the marks, then the transaction will fail to execute.

Aborted
o If any of the checks fail and the transaction has reached a failed state, then the
database recovery system will make sure that the database is restored to its previous
consistent state. If it is not, it will abort or roll back the transaction to bring the
database into a consistent state.
o If the transaction fails in the middle of execution, then all of its executed operations
are rolled back to restore the consistent state that existed before the transaction
started.
o After aborting the transaction, the database recovery module will select one of two
operations:
1. Re-start the transaction
2. Kill the transaction

DBMS Concurrency Control

Concurrency Control is the management procedure that is required for controlling concurrent
execution of the operations that take place on a database.

But before knowing about concurrency control, we should know about concurrent execution.

Concurrent Execution in DBMS


o In a multi-user system, multiple users can access and use the same database at one
time, which is known as concurrent execution of the database. It means that the
same database is accessed simultaneously on a multi-user system by different users.
o While working on the database transactions, there occurs the requirement of using the
database by multiple users for performing different operations, and in that case,
concurrent execution of the database is performed.
o The thing is that the simultaneous execution that is performed should be done in an
interleaved manner, and no operation should affect the other executing operations,
thus maintaining the consistency of the database.

9 Explain the following in transaction failures: i) Failure Classification ii) Undo and Redo [L2][CO5][12M]
Failure Classification

To find where the problem has occurred, we generalize failures into the following
categories:

1. Transaction failure
2. System crash
3. Disk failure

1. Transaction failure

A transaction failure occurs when a transaction fails to execute or reaches a point from
which it cannot proceed any further. If a transaction or process is damaged partway through,
this is called a transaction failure.

Reasons for a transaction failure could be -

1. Logical errors: If a transaction cannot complete due to a code error or an internal
error condition, a logical error has occurred.
2. System errors: These occur when the DBMS itself terminates an active transaction
because the database system is unable to execute it. For example, the system aborts
an active transaction in case of deadlock or resource unavailability.

2. System Crash
o System failure can occur due to power failure or other hardware or software
failure. Example: operating system error.

Fail-stop assumption: in a system crash, non-volatile storage is assumed not to be
corrupted.

3. Disk Failure
o It occurs when hard-disk drives or storage drives fail. This was a common
problem in the early days of technology evolution.
o Disk failure occurs due to the formation of bad sectors, a disk head crash,
unreachability of the disk, or any other failure which destroys all or part of the
disk storage.

Checkpoint

o A checkpoint is a mechanism whereby all the previous logs are removed from the
system and permanently stored on the storage disk.
o The checkpoint is like a bookmark. During the execution of transactions, such
checkpoints are marked, and as the transactions execute, log files are created for
their steps.
o When a checkpoint is reached, all the transactions logged up to that point are applied
to the database, and the log file up to that point is removed. The log file is then
updated with the steps of subsequent transactions until the next checkpoint, and so on.
o The checkpoint is used to declare a point before which the DBMS was in a
consistent state and all transactions were committed.

Recovery using Checkpoint

A recovery system recovers the database from a failure in the following manner:

o The recovery system reads the log files from the end to the start, i.e., from T4 back
to T1.
o The recovery system maintains two lists: a redo-list and an undo-list.
o A transaction is put into the redo-list if the recovery system sees a log containing
both <Tn, Start> and <Tn, Commit>, or just <Tn, Commit>. All the transactions in
the redo-list are redone from their log records.
o For example: in the log file, transactions T2 and T3 will have both <Tn, Start> and
<Tn, Commit>. Transaction T1 will have only <Tn, Commit> in the log file, because
it started before the checkpoint and committed after the checkpoint was crossed.
Hence T1, T2 and T3 are put into the redo-list.
o A transaction is put into the undo-list if the recovery system sees a log with
<Tn, Start> but no commit or abort log record. All the transactions in the undo-list
are undone, and their logs are removed.
o For example: transaction T4 will have only <Tn, Start>. So T4 is put into the
undo-list, since this transaction is not yet complete and failed in the middle.

10. Explain log-Based Recovery in detail. [L5][CO6] [12M]

Log-Based Recovery

o The log is a sequence of records. Log of each transaction is maintained in some stable
storage so that if any failure occurs, then it can be recovered from there.
o If any operation is performed on the database, then it will be recorded in the log.
o But the process of storing the logs should be done before the actual transaction is
applied in the database.

Let's assume there is a transaction to modify the City of a student. The following logs are
written for this transaction.

o When the transaction is initiated, it writes a 'start' log:
<Tn, Start>
o When the transaction modifies the City from 'Noida' to 'Bangalore', another log
record is written to the file:
<Tn, City, 'Noida', 'Bangalore'>
o When the transaction is finished, it writes another log record to indicate the end of
the transaction:
<Tn, Commit>

There are two approaches to modify the database:

1. Deferred database modification:


o The deferred modification technique occurs if the transaction does not modify the
database until it has committed.
o In this method, all the logs are created and stored in the stable storage, and the
database is updated when a transaction commits.

2. Immediate database modification:


o The immediate modification technique occurs if database modification happens while
the transaction is still active.
o In this technique, the database is modified immediately after every operation; each
write is applied directly to the database as it executes.

Recovery using Log records

When the system crashes, it consults the log to find which transactions need to be undone
and which need to be redone.

1. If the log contains the records <Ti, Start> and <Ti, Commit>, or just <Ti, Commit>,
then transaction Ti needs to be redone.
2. If the log contains the record <Ti, Start> but contains neither <Ti, Commit> nor
<Ti, Abort>, then transaction Ti needs to be undone.
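Both the checkpoint discussion above and these two rules reduce to a single scan of the log; here is a minimal Python sketch, where the example log with transactions T1..T4 around a checkpoint record is hypothetical.

```python
# A minimal sketch of building redo/undo lists from a log; the log
# records here (T1..T4 around a checkpoint) are hypothetical.
log = [
    ("T1", "Start"), ("Checkpoint",), ("T2", "Start"),
    ("T1", "Commit"), ("T2", "Commit"), ("T3", "Start"),
    ("T3", "Commit"), ("T4", "Start"),          # crash happens here
]

started   = {r[0] for r in log if r[-1] == "Start"}
committed = {r[0] for r in log if r[-1] == "Commit"}
aborted   = {r[0] for r in log if r[-1] == "Abort"}

redo = sorted(committed)                       # rule 1: <Ti, Commit> present
undo = sorted(started - committed - aborted)   # rule 2: Start, no Commit/Abort

print("redo:", redo)                           # redo: ['T1', 'T2', 'T3']
print("undo:", undo)                           # undo: ['T4']
```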
