B-tree is a special type of self-balancing search tree in which each node can contain more than
one key and can have more than two children. It is a generalized form of the binary search tree.
It is also known as a height-balanced m-way tree.
Why do you need a B-tree data structure?
The need for B-trees arose from the need to reduce the time spent accessing physical storage
media such as hard disks. Secondary storage devices are slower but have larger capacity, so
data structures were needed that minimize the number of disk accesses.
Other data structures such as a binary search tree, AVL tree, red-black tree, etc., can store only one
key in one node. If you have to store a large number of keys, then the height of such trees
becomes very large, and the access time increases.
However, B-tree can store many keys in a single node and can have multiple child nodes. This
decreases the height significantly allowing faster disk accesses.
B-tree Properties
1. For each node x, the keys are stored in increasing order.
2. In each node, there is a boolean value x.leaf which is true if x is a leaf.
3. If n is the order of the tree, each internal node can contain at most n - 1 keys along with a pointer
to each child.
4. Each node except root can have at most n children and at least n/2 children.
5. All leaves have the same depth (i.e., the height h of the tree).
6. The root has at least 2 children and contains a minimum of 1 key.
7. If n ≥ 1, then for any n-key B-tree of height h and minimum degree t ≥ 2, h ≤ logt (n+1)/2.
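The properties above can be captured in a minimal node structure (a sketch; the class name BTreeNode and its field names are illustrative, not from a particular library):

```python
class BTreeNode:
    """A node of a B-tree of order n: at most n - 1 keys and at most n children."""
    def __init__(self, leaf=True):
        self.keys = []       # keys stored in increasing order (property 1)
        self.children = []   # child pointers; empty when leaf is True
        self.leaf = leaf     # boolean flag from property 2

# Example: a node of an order-3 tree holds at most 2 keys and 3 children.
root = BTreeNode(leaf=False)
root.keys = [16, 18]
root.children = [BTreeNode(), BTreeNode(), BTreeNode()]
```

Note that a node with k keys always has k + 1 children, one subtree for each gap between (and around) the keys.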
Operations on a B-tree
Searching an element in a B-tree
Searching for an element in a B-tree is the generalized form of searching an element in a Binary
Search Tree. The following steps are followed.
1. Starting from the root node, compare k with the first key of the node.
If k = the first key of the node, return the node and the index.
2. If x.leaf = true (the current node x is a leaf and k was not found), return NULL (i.e. not found).
3. If k < the first key of the root node, search the left child of this key recursively.
4. If there is more than one key in the current node and k > the first key, compare k with the next
key in the node.
If k < next key, search the left child of this key (i.e., k lies in between the first and the second
keys).
Else, search the right child of the key.
5. Repeat steps 1 to 4 until the leaf is reached.
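The steps above can be sketched in Python (a sketch; the Node class and btree_search are illustrative names, with a node layout matching the properties described earlier):

```python
class Node:
    def __init__(self, keys, children=None):
        self.keys = keys                     # keys in increasing order
        self.children = children or []       # empty for a leaf
        self.leaf = not self.children

def btree_search(x, k):
    """Return (node, index) of key k starting at node x, or None if absent."""
    i = 0
    # Steps 1, 3, 4: move right past every key smaller than k.
    while i < len(x.keys) and k > x.keys[i]:
        i += 1
    if i < len(x.keys) and k == x.keys[i]:
        return (x, i)                        # step 1: key found in this node
    if x.leaf:
        return None                          # step 2: reached a leaf, not found
    return btree_search(x.children[i], k)    # step 5: recurse into the child

# A degree-3 tree similar to the search example below: root keys 16 and 18.
root = Node([16, 18], [Node([15]), Node([17]), Node([19, 20])])
```

Searching for 17 descends into the middle child (right of 16, left of 18) and finds the key there.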
Searching Example
1. Let us search for key k = 17 in the tree below of degree 3.
2. k is not found in the root, so compare it with the root keys.
3. Since k < 18, k lies between 16 and 18, so search in the right child of 16 (i.e., the left
child of 18).
4. k is found in that child.
Hashing
Hashing is a technique or process of mapping keys and values into a hash table by using a
hash function. It is done for faster access to elements. The efficiency of the mapping depends on
the efficiency of the hash function used.
Let a hash function H(x) map the value x to index x % 10 in an array. For example, if the list
of values is [11, 12, 13, 14, 15], they will be stored at positions {1, 2, 3, 4, 5} in the array (hash
table) respectively.
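The H(x) = x % 10 example can be written out directly (a sketch; collisions are ignored here, as in the example above):

```python
def h(x):
    """Hash function H(x): map key x to index x % 10 in a 10-slot table."""
    return x % 10

table = [None] * 10
for value in [11, 12, 13, 14, 15]:
    table[h(value)] = value   # 11 -> slot 1, 12 -> slot 2, ..., 15 -> slot 5
```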
Transaction processing:
What is a Transaction?
A transaction is a set of logically related operations executed as a single unit of work. For
example, a debit of Rs. 1000 from an account with balance Rs. 5000 consists of three operations:
(1) read the balance, (2) subtract 1000, and (3) write the new balance (4000) back to the database.
Properties of a transaction:
Atomicity: As a transaction is a set of logically related operations, either all of them should be
executed or none. A debit transaction discussed above should either execute all three operations
or none. If the debit transaction fails after executing operations 1 and 2 then its new value of
4000 will not be updated in the database which leads to inconsistency.
Consistency: If operations of debit and credit transactions on the same account are executed
concurrently, it may leave the database in an inconsistent state.
● For Example, with T1 (debit of Rs. 1000 from A) and T2 (credit of 500 to A) executing
concurrently, the database reaches an inconsistent state.
● Let us assume the account balance of A is Rs. 5000. T1 reads A (5000) and stores the value
in its local buffer space. Then T2 reads A (5000) and also stores the value in its local buffer
space.
● T1 performs A=A-1000 (5000-1000=4000) and 4000 is stored in T1's buffer space. Then T2
performs A=A+500 (5000+500=5500) and 5500 is stored in T2's buffer space. T1 writes
the value from its buffer back to the database.
● A’s value is updated to 4000 in the database and then T2 writes the value from its buffer back
to the database. A’s value is updated to 5500 which shows that the effect of the debit
transaction is lost and the database has become inconsistent.
● To maintain consistency of the database, we need concurrency control protocols which will
be discussed in the next article. The operations of T1 and T2 with their buffers and database
have been shown in Table 1.
Table 1: Operations of T1 and T2 with their buffers and the database
T1 (buffer)           T2 (buffer)           Database
R(A); A=5000                                A=5000
                      R(A); A=5000          A=5000
A=A-1000; A=4000                            A=5000
                      A=A+500; A=5500       A=5000
W(A); A=4000                                A=4000
                      W(A); A=5500          A=5500
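The lost-update interleaving described above can be replayed step by step (a sketch; the database is modeled as a plain dict and each transaction's buffer as a local variable):

```python
db = {"A": 5000}

t1_buf = db["A"]     # T1 reads A (5000) into its buffer
t2_buf = db["A"]     # T2 reads A (5000) into its buffer
t1_buf -= 1000       # T1 computes the debit locally: 4000
t2_buf += 500        # T2 computes the credit locally: 5500
db["A"] = t1_buf     # T1 writes back: A = 4000
db["A"] = t2_buf     # T2 writes back: A = 5500, overwriting T1's debit

# A serial execution (T1 then T2) would have left A = 4500.
```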
Isolation: The result of a transaction should not be visible to others before the transaction is
committed. For example, let us assume that A’s balance is Rs. 5000 and T1 debits Rs. 1000 from
A. A’s new balance will be 4000. If T2 credits Rs. 500 to A’s new balance, A will become 4500,
and after this T1 fails. Then we have to roll back T2 as well because it is using the value
produced by T1. So transaction results are not made visible to other transactions before it
commits.
Durability: Once the database has committed a transaction, the changes made by the transaction
should be permanent. e.g.; If a person has credited $500000 to his account, the bank can’t say
that the update has been lost. To avoid this problem, multiple copies of the database are stored at
different locations.
Advantages of Concurrency:
In general, concurrency means that more than one transaction can work on a system at the same time.
The advantages of a concurrent system are:
● Waiting Time: The time a process spends in the ready state before it gets the system to
execute is called waiting time. So, concurrency leads to less waiting time.
● Response Time: The time taken to get a response from the CPU for the first time is
called response time. So, concurrency leads to less response time.
● Resource Utilization: The fraction of the available resources that a system actually uses is
called resource utilization. Multiple transactions can run in parallel in a system. So, concurrency
leads to more resource utilization.
● Efficiency: The amount of output produced in comparison to the given input is called efficiency.
So, concurrency leads to more efficiency, i.e., more transactions completed per unit time.
What is Serializability?
Serializability of schedules ensures that a non-serial schedule is equivalent to some serial
schedule. It allows transactions to execute concurrently, with interleaved operations, while still
leaving the database in a consistent state. In simple words, serializability is a way to check
whether the execution of two or more concurrent transactions maintains database consistency.
Schedules and Serializable Schedules in DBMS
Schedules in DBMS are sequences of operations from one or more transactions, listed in the order in which they execute.
R(X) means Reading the value: X; and W(X) means Writing the value: X.
1. Serial Schedule - A schedule in which only one transaction is executed at a time, i.e.,
one transaction is executed completely before starting another transaction.
Example:
Transaction-1    Transaction-2
R(a)
W(a)
R(b)
W(b)
                 R(b)
                 W(b)
                 R(a)
                 W(a)
Here, we can see that Transaction-2 starts its execution after the completion of Transaction-1.
2. Non-Serial Schedule - A schedule in which the operations of one transaction are
interleaved with the operations of another transaction.
Example:
Transaction-1    Transaction-2
R(a)
W(a)
                 R(b)
                 W(b)
R(b)
                 R(a)
W(b)
                 W(a)
We can see that Transaction-2 starts its execution before the completion of Transaction-1, and
they are interchangeably working on the same data, i.e., "a" and "b".
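A standard way to make this check concrete is a precedence graph over conflicting operations: if the graph has no cycle, the schedule is conflict-serializable (a sketch; conflict_serializable and the (txn, op, item) encoding are illustrative, and this tests conflict serializability specifically):

```python
def conflict_serializable(schedule):
    """schedule: list of (txn, op, item) with op in {'R', 'W'}, in execution order.
    Adds an edge Ti -> Tj for each earlier operation of Ti that conflicts with a
    later operation of Tj, then reports whether the precedence graph is acyclic."""
    edges = set()
    for i, (ti, op_i, item_i) in enumerate(schedule):
        for tj, op_j, item_j in schedule[i + 1:]:
            # Two ops conflict if they are from different txns, touch the same
            # item, and at least one of them is a write.
            if ti != tj and item_i == item_j and 'W' in (op_i, op_j):
                edges.add((ti, tj))

    def has_cycle(node, visiting, done):
        visiting.add(node)
        for a, b in edges:
            if a == node:
                if b in visiting:
                    return True
                if b not in done and has_cycle(b, visiting, done):
                    return True
        visiting.discard(node)
        done.add(node)
        return False

    nodes = {t for t, _, _ in schedule}
    return not any(has_cycle(n, set(), set()) for n in nodes)

# The non-serial schedule above: T1 and T2 interleave on items 'a' and 'b'.
interleaved = [("T1", "R", "a"), ("T1", "W", "a"),
               ("T2", "R", "b"), ("T2", "W", "b"),
               ("T1", "R", "b"), ("T2", "R", "a"),
               ("T1", "W", "b"), ("T2", "W", "a")]
```

On this schedule the graph contains both T1 → T2 (on 'a') and T2 → T1 (on 'b'), so it has a cycle and the schedule is not conflict-serializable.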
Two-Phase Locking (2PL)
Two-phase locking has two phases: a growing phase, in which the transaction acquires all the
locks it needs, and a shrinking phase, in which the locks held by the transaction are released.
To claim an exclusive (write) lock, a transaction must first acquire a shared (read) lock and then
upgrade it to an exclusive lock.
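The two phases can be enforced mechanically: once a transaction releases any lock, further acquisitions are rejected (a sketch; TwoPhaseTxn is an illustrative name, and lock conflicts between concurrent transactions are not modeled here):

```python
class TwoPhaseTxn:
    """Enforces the two-phase rule: all acquires precede all releases."""
    def __init__(self):
        self.locks = set()
        self.shrinking = False        # flips when the first lock is released

    def acquire(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violated: acquire after a release")
        self.locks.add(item)          # growing phase

    def release(self, item):
        self.shrinking = True         # shrinking phase begins
        self.locks.discard(item)

t = TwoPhaseTxn()
t.acquire("A")
t.acquire("B")    # growing phase: both locks held
t.release("A")    # shrinking phase: no new locks may be taken now
```

Any further `t.acquire(...)` call after this point raises, which is exactly the constraint 2PL imposes.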
Strict Two-Phase Locking
The first phase of Strict-2PL is the same as in 2PL. After acquiring all the locks in the first
phase, the transaction continues to execute normally. But in contrast to 2PL, Strict-2PL does not
release a lock right after using it: it holds all the locks until the commit point and releases them
all at once.
Unlike basic 2PL, Strict-2PL does not suffer from cascading aborts.
Timestamp-based Protocols
The most commonly used concurrency protocol is the timestamp-based protocol. This protocol
uses either the system time or a logical counter as a timestamp.
Lock-based protocols manage the order between the conflicting pairs among transactions at the
time of execution, whereas timestamp-based protocols start working as soon as a transaction is
created.
Every transaction has a timestamp associated with it, and the ordering is determined by the age
of the transaction. A transaction created at 0002 clock time would be older than all other
transactions that come after it. For example, any transaction 'y' entering the system at 0004 is two
seconds younger and the priority would be given to the older one.
In addition, every data item is given the latest read and write-timestamp. This lets the system
know when the last ‘read and write’ operation was performed on the data item.
Timestamp Ordering Protocol
The timestamp-ordering protocol ensures serializability among transactions in their conflicting
read and write operations. It is the responsibility of the protocol that conflicting pairs of
operations are executed according to the timestamp values of the transactions.
● The timestamp of transaction Ti is denoted as TS(Ti).
● Read time-stamp of data-item X is denoted by R-timestamp(X).
● Write time-stamp of data-item X is denoted by W-timestamp(X).
Timestamp ordering protocol works as follows −
● If a transaction Ti issues a read(X) operation −
o If TS(Ti) < W-timestamp(X)
▪ Operation rejected.
o If TS(Ti) >= W-timestamp(X)
▪ Operation executed.
o R-timestamp(X) is updated to max(R-timestamp(X), TS(Ti)).
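The read(X) rule above can be sketched directly (a sketch; the dict keys r_ts and w_ts stand in for R-timestamp(X) and W-timestamp(X)):

```python
def read_item(ts_ti, item):
    """Timestamp-ordering rule for transaction Ti issuing read(X)."""
    if ts_ti < item["w_ts"]:
        return "rejected"    # Ti is older than the last writer of X
    # Operation executed; record the latest read of X.
    item["r_ts"] = max(item["r_ts"], ts_ti)
    return "executed"

x = {"r_ts": 0, "w_ts": 5}   # X last written by a transaction with timestamp 5
```

A transaction with timestamp 3 is rejected when reading x, while one with timestamp 7 succeeds and raises x's read timestamp to 7.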