B+ Tree

Table of Contents
Abstract
Introduction
Algorithm
Search Algorithm
Insertion Algorithm
Deletion Algorithm
Rules
Uses
Application
Implementation
Conclusion
References
Abstract
Most queries can be executed more quickly if the values are stored in order. But it is not practical to
store all the rows of a table one after another in sorted order, because this would require rewriting the
entire table with each insertion or deletion of a row. This leads us to instead imagine storing our rows in a
tree structure. Our first instinct would be a balanced binary search tree such as a red-black tree, but this
makes little sense for a database, since the data is stored on disk. Disks read and write whole blocks of
data at once, typically 512 bytes or four kilobytes. A node of a binary search tree uses a small fraction of
that, so it makes sense to look for a structure that fits more neatly into a disk block. Hence the B+-tree, in
which each node stores up to d references to children and up to d − 1 keys. Each reference is considered
“between” two of the node's keys; it references the root of a subtree for which all values are between
these two keys.
Introduction
B-trees and B+ trees are generally employed to implement dynamic multilevel indexing. The drawback of
using a B-tree for indexing, however, is that it stores the data pointer (a pointer to the disk file block
containing the key value) corresponding to a particular key value along with that key value in the node of
the B-tree. This technique greatly reduces the number of entries that can be packed into a node of a
B-tree, thereby increasing the number of levels in the B-tree and hence the search time of a record. The
B+ tree eliminates this drawback by storing data pointers only at the leaf nodes of the tree. Thus, the
structure of the leaf nodes of a B+ tree is quite different from the structure of its internal nodes. Since
data pointers are present only at the leaf nodes, the leaf nodes must store all the key values along with
their corresponding data pointers to the disk file blocks. Moreover, the leaf nodes are linked to provide
ordered access to the records. The leaf nodes therefore form the first level of the index, with the internal
nodes forming the other levels of a multilevel index. Some of the key values of the leaf nodes also appear
in the internal nodes, simply to guide the search for a record. From the above discussion it is apparent
that a B+ tree, unlike a B-tree, has two orders, ‘a’ and ‘b’, one for the internal nodes and the other for the
external (or leaf) nodes.
B+ tree
A B+ tree is an N-ary tree with a variable but often large number of children per node. A B+ tree consists
of a root, internal nodes and leaves.[1] The root may be either a leaf or a node with two or more
children.[2]

A B+ tree can be viewed as a B-tree in which each node contains only keys (not key–value pairs), and to
which an additional level is added at the bottom with linked leaves. The primary value of a B+ tree is in
storing data for efficient retrieval in a block-oriented storage context, in particular filesystems. This is
primarily because, unlike binary search trees, B+ trees have very high fanout (the number of pointers to
child nodes in a node,[1] typically on the order of 100 or more), which reduces the number of I/O
operations required to find an element in the tree.

A B+-tree requires that each leaf be the same distance from the root. In a three-level example tree holding
11 values (all listed on the bottom level), searching for any value involves loading three nodes from the
disk (the root block, a second-level block, and a leaf).

In practice, d will be larger: as large, in fact, as it takes to fill a disk block. Suppose a block is 4 KB, our
keys are 4-byte integers, and each reference is a 6-byte file offset. Then we would choose d to be the
largest value such that 4(d − 1) + 6d ≤ 4096; solving this inequality for d gives d ≤ 410, so we would use
d = 410. As you can see, d can be large.
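The arithmetic above can be checked with a few lines of Python. This is only a sketch: the function name `max_order` and its parameter names are illustrative and do not come from the text.

```python
def max_order(block_size: int, key_size: int, ref_size: int) -> int:
    """Largest d such that key_size*(d - 1) + ref_size*d <= block_size."""
    # Rearranging the inequality gives d*(key_size + ref_size) <= block_size + key_size.
    return (block_size + key_size) // (key_size + ref_size)


d = max_order(block_size=4096, key_size=4, ref_size=6)
print(d)  # 410
```

With d = 410 the node occupies exactly 4(409) + 6(410) = 4096 bytes; d = 411 would need 4106 bytes and no longer fit.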
A B+-tree maintains the following invariants:
• Every node has one more reference than it has keys.
• All leaves are at the same distance from the root.
• For every non-leaf node N with k being the number of keys in N: all keys in the first child's
subtree are less than N's first key; all keys in the ith child's subtree (2 ≤ i ≤ k) are between the
(i − 1)th key of N and the ith key of N; and all keys in the last child's subtree are at least N's kth key.
• The root has at least two children (unless it is itself a leaf).
• Every non-leaf, non-root node has at least floor(d / 2) children.
• Each leaf contains at least floor(d / 2) keys.
• Every key from the table appears in a leaf, in left-to-right sorted order.
In our examples, we use d = 4. Looking at our invariants, this requires each leaf to have at least two
keys, and each internal node to have at least two children (and thus at least one key).
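The invariants can be turned into a small checker. The sketch below assumes a hypothetical in-memory `Node` class (a key list plus a child list that is empty for leaves). It treats a key equal to a separator as belonging to the right-hand subtree, and it applies the "one more reference than keys" rule to internal nodes only. None of these names come from the text.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list
    children: list = field(default_factory=list)  # empty for a leaf


def check(root: Node, d: int) -> bool:
    """Return True if the tree satisfies the invariants for order d."""
    leaf_depths, leaf_keys = set(), []

    def visit(node, depth, lo, hi, is_root):
        # Keys are sorted, fit in the node, and lie in the range [lo, hi).
        if node.keys != sorted(node.keys) or len(node.keys) > d - 1:
            return False
        if any((lo is not None and k < lo) or (hi is not None and k >= hi)
               for k in node.keys):
            return False
        if not node.children:  # leaf
            leaf_depths.add(depth)
            leaf_keys.extend(node.keys)
            return is_root or len(node.keys) >= d // 2
        if len(node.children) != len(node.keys) + 1:
            return False
        if len(node.children) < (2 if is_root else d // 2):
            return False
        bounds = [lo] + node.keys + [hi]
        return all(visit(c, depth + 1, bounds[i], bounds[i + 1], False)
                   for i, c in enumerate(node.children))

    ok = visit(root, 0, None, None, True)
    # All leaves at one depth, and leaf keys sorted from left to right.
    return ok and len(leaf_depths) == 1 and leaf_keys == sorted(leaf_keys)


good = Node([5], [Node([1, 3]), Node([5, 7, 9])])
bad = Node([5], [Node([1]), Node([5, 7, 9])])  # left leaf has too few keys
print(check(good, 4), check(bad, 4))  # True False
```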
Algorithm
Basic operations associated with B+ Tree:
Search Algorithm
Searching is one of the simplest operations on a B+ tree. Starting at the root, the following search
algorithm is applied:
• Perform a binary search on the keys in the current node.
• If the current node is an internal node, follow the proper branch (left of the first key greater than
the search key, so that equal keys go right) and repeat the process.
• If the current node is a leaf and it holds the search key, return the corresponding record.
• If the current node is a leaf and the key is not found, report an unsuccessful search.
Output:
The record matching the search key is returned to the user; otherwise, a "not found" message is shown.
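The search procedure can be sketched in Python as follows. The `Node` class here is a hypothetical in-memory stand-in for a disk block (internal nodes carry `children`, leaves carry `values`), and the standard `bisect` module does the binary search inside each node.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list
    children: list = field(default_factory=list)  # internal nodes only
    values: list = field(default_factory=list)    # leaves only


def search(root: Node, key):
    """Return the record stored under key, or None if it is absent."""
    node = root
    while node.children:
        # bisect_right counts the keys <= key, so an equal key goes right.
        node = node.children[bisect_right(node.keys, key)]
    i = bisect_left(node.keys, key)
    if i < len(node.keys) and node.keys[i] == key:
        return node.values[i]
    return None  # unsuccessful search


left = Node([1, 3], values=["a", "c"])
right = Node([5, 7, 9], values=["e", "g", "i"])
root = Node([5], children=[left, right])
print(search(root, 7), search(root, 4))  # g None
```

Every lookup walks from the root to a leaf, which is why the number of blocks read equals the height of the tree whether or not the key is present.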
Insertion Algorithm
The following algorithm is applicable for the insert operation:
• Find the leaf in which the key belongs and insert it there in sorted order. If the leaf still has room,
stop.
• If the leaf is overfull, allocate a new leaf and move half of the elements to it.
• Insert the new leaf's smallest key and address into the parent.
• If the parent is full, split it too, and add its middle key to the node above.
• Repeat until a parent is found that need not split.
• If the root splits, create a new root which has one key and two pointers. (That is, the key that gets
pushed up to the new root is removed from the original node.)
Output:
The key is inserted into the appropriate leaf node, and all leaves remain at the same depth.
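The steps above can be sketched as a compact in-memory implementation. It is an illustration under stated assumptions, not a production design: the class and attribute names are invented here, a repeated key overwrites its old value, and a leaf holds at most d − 1 keys while an internal node holds at most d children.

```python
from bisect import bisect_left, bisect_right


class Node:
    def __init__(self, leaf):
        self.leaf = leaf
        self.keys = []
        self.children = []  # child nodes (internal nodes only)
        self.values = []    # records (leaves only)
        self.next = None    # right sibling (leaves only)


class BPlusTree:
    def __init__(self, d=4):
        self.d = d
        self.root = Node(leaf=True)

    def insert(self, key, value):
        split = self._insert(self.root, key, value)
        if split:  # the root split: grow the tree by one level
            sep, right = split
            new_root = Node(leaf=False)
            new_root.keys = [sep]
            new_root.children = [self.root, right]
            self.root = new_root

    def _insert(self, node, key, value):
        """Insert below node; return (separator, new_node) if node split."""
        if node.leaf:
            i = bisect_left(node.keys, key)
            if i < len(node.keys) and node.keys[i] == key:
                node.values[i] = value  # assumption: overwrite duplicates
                return None
            node.keys.insert(i, key)
            node.values.insert(i, value)
            if len(node.keys) <= self.d - 1:
                return None
            mid = len(node.keys) // 2  # move the upper half to a new leaf
            right = Node(leaf=True)
            right.keys, node.keys = node.keys[mid:], node.keys[:mid]
            right.values, node.values = node.values[mid:], node.values[:mid]
            right.next, node.next = node.next, right
            return right.keys[0], right  # smallest key is copied up
        i = bisect_right(node.keys, key)
        split = self._insert(node.children[i], key, value)
        if split is None:
            return None
        sep, child = split
        node.keys.insert(i, sep)
        node.children.insert(i + 1, child)
        if len(node.children) <= self.d:
            return None
        mid = len(node.keys) // 2
        up = node.keys[mid]  # middle key is moved up, not copied
        right = Node(leaf=False)
        right.keys, node.keys = node.keys[mid + 1:], node.keys[:mid]
        right.children, node.children = (node.children[mid + 1:],
                                         node.children[:mid + 1])
        return up, right


tree = BPlusTree(d=4)
for k in [10, 20, 5, 6, 12, 30, 7, 17, 3, 1, 25, 8]:
    tree.insert(k, str(k))
```

Note the asymmetry between the two kinds of split: a leaf split copies its separator into the parent (the key must stay in a leaf), whereas an internal split moves the middle key up.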
Deletion Algorithm
Deletion of a key from a B+ tree proceeds as follows:
• Descend to the leaf where the key exists.
• Remove the required key and its associated reference from the node.
• If the node still has enough keys and references to satisfy the invariants, stop.
• If the node has too few keys to satisfy the invariants, but its adjacent left or right sibling at the
same level has more than the minimum, distribute the keys between this node and that sibling. Repair the
keys in the level above to reflect that these nodes now have a different “split point” between them; this
involves simply changing a key in the level above, without deletion or insertion.
• If the node has too few keys to satisfy the invariants, and the adjacent sibling is at the minimum,
then merge the node with its sibling; if the node is a non-leaf, the “split key” from the parent must be
incorporated into the merged node.
• After a merge, whether of leaves or of non-leaves, repeat the removal algorithm on the parent node
to remove the “split key” that previously separated the merged nodes, unless the parent is the root and
we are removing its final key, in which case the merged node becomes the new root (and the tree
becomes one level shorter than before).
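The choice between stopping, redistributing, and merging can be sketched for the leaf level. The function below is a hypothetical helper, not a full deletion routine: it takes two adjacent sibling leaves as sorted key lists, and leaves the parent update (changing or removing the split key) to the caller.

```python
def rebalance_leaves(left, right, min_keys):
    """Rebalance two adjacent sibling leaves after a deletion.

    Returns (action, left, right, new_split_key). new_split_key is None
    when the parent's split key is unchanged ("ok") or must be removed
    from the parent ("merge").
    """
    if len(left) >= min_keys and len(right) >= min_keys:
        return "ok", left, right, None
    if len(left) + len(right) >= 2 * min_keys:
        # The sibling has keys to spare: share them evenly.
        merged = left + right
        mid = len(merged) // 2
        left, right = merged[:mid], merged[mid:]
        return "redistribute", left, right, right[0]
    # Sibling is at the minimum: merge, and drop the parent's split key.
    return "merge", left + right, [], None


print(rebalance_leaves([1], [5, 7, 9], min_keys=2))
# ('redistribute', [1, 5], [7, 9], 7)
print(rebalance_leaves([1], [5, 7], min_keys=2))
# ('merge', [1, 5, 7], [], None)
```

For non-leaf nodes the same three cases apply, but a merge also pulls the parent's split key down into the merged node, as described above.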
Complexity
• Worst case search time complexity: Θ(log n)
• Average case search time complexity: Θ(log n)
• Best case search time complexity: Θ(log n)
• Worst case insertion time complexity: Θ(log n)
• Worst case deletion time complexity: Θ(log n)
• Average case space complexity: Θ(n)
• Worst case space complexity: Θ(n)
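To see what the logarithm means in practice, the sketch below estimates how many levels a tree needs for a given number of keys. It is a rough estimate that assumes every node is completely full; the function name is illustrative.

```python
def levels_needed(n_keys: int, fanout: int) -> int:
    """Smallest h such that fanout**h >= n_keys (integer arithmetic only)."""
    h, capacity = 1, fanout
    while capacity < n_keys:
        capacity *= fanout
        h += 1
    return h


print(levels_needed(10**9, 410))  # 4
print(levels_needed(10**9, 2))    # 30
```

Under this assumption a billion keys need about four block reads with a fanout of 410, against about thirty node visits in a balanced binary tree.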
Rules for B+ Tree
Here are the essential rules for a B+ tree:
• Leaves are used to store data records.
• Keys are stored in the internal nodes of the tree.
• If a target key value is less than a key in an internal node, then the pointer just to its left is
followed.
• If a target key value is greater than or equal to a key in an internal node, then the pointer just to its
right is followed.
• The root has a minimum of two children (unless it is a leaf).
Why use B+ Tree
Here are reasons for using a B+ tree:
• Keys are primarily used to aid the search by directing it to the proper leaf.
• A B+ tree uses a "fill factor" to manage the growth and shrinkage of the tree.
• Many keys fit on one page of memory because interior nodes carry no associated data, so a search
reaches the data in the leaf nodes quickly.
• A full scan of all the elements in a tree needs just one linear pass, because all the leaf nodes of a
B+ tree are linked to each other.
• The ReiserFS, NSS, XFS, JFS, ReFS, and BFS filesystems all use this type of tree for metadata
indexing.
• BFS also uses B+ trees for storing directories.
• NTFS uses B+ trees for directory and security-related metadata indexing.
• EXT4 uses extent trees (a modified B+ tree data structure) for file extent indexing.
• Relational database management systems such as IBM DB2, Informix, Microsoft SQL Server,
Oracle 8, Sybase ASE,[4] and SQLite support this type of tree for table indices. Key–value database
management systems such as CouchDB and Tokyo Cabinet support this type of tree for data access.
Applications
Importance of B+ Tree:
• A B+ tree supports range retrieval and partial retrieval. Traversing the linked leaves makes this easier
and quicker.
• As the number of records increases or decreases, the B+ tree structure grows or shrinks. There is no
restriction on B+ tree size, as there is in ISAM.
• Since all the data is stored in the leaf nodes, internal nodes can branch more widely, which makes the
tree shorter. This reduces disk I/O, so the structure works well on secondary storage devices.
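Range retrieval over the linked leaves can be sketched as follows. The `Leaf` class is a hypothetical stand-in for a leaf block, and the scan is assumed to start at the leaf that a search for the lower bound would reach.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Leaf:
    keys: list
    next: Optional["Leaf"] = None  # link to the right sibling


def range_scan(start_leaf, lo, hi):
    """Yield every key k with lo <= k <= hi by walking the leaf chain."""
    leaf = start_leaf
    while leaf is not None:
        for k in leaf.keys:
            if k > hi:
                return  # keys are sorted, so nothing further can match
            if k >= lo:
                yield k
        leaf = leaf.next


c = Leaf([9, 11])
b = Leaf([5, 7], c)
a = Leaf([1, 3], b)
print(list(range_scan(a, 3, 9)))  # [3, 5, 7, 9]
```

Once the first leaf is located, the scan never revisits an internal node; it reads only the leaves that overlap the requested range.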
Characteristics
1. Data records are stored only in the leaves.
2. Internal nodes store just keys.
3. All data is stored at the leaf nodes (leaf pages); all other nodes (index pages) store only keys.
4. All the leaf nodes are linked to one another for faster access.
5. Keys are used for directing a search to the proper leaf.
6. If a target key is less than a key in an internal node, then the pointer just to its left is followed.
7. If a target key is greater than or equal to the key in an internal node, then the pointer to its right is
followed.
8. The B+ tree combines features of ISAM (indexed sequential access method) and B-trees.
Implementation
The leaves (the bottom-most index blocks) of the B+ tree are often linked to one another in a linked list;
this makes range queries or an (ordered) iteration through the blocks simpler and more efficient (though
the same asymptotic upper bound can be achieved even without this addition). This does not
substantially increase space consumption or maintenance on the tree. This illustrates one of the
significant advantages of a B+ tree over a B-tree: in a B-tree, since not all keys are present in the leaves,
such an ordered linked list cannot be constructed. A B+ tree is thus particularly useful as a database
system index, where the data typically resides on disk, as it allows the B+ tree to provide an efficient
structure for housing the data itself (an arrangement described elsewhere as index structure
"Alternative 1").

If a storage system has a block size of B bytes, and the keys to be stored have a size of k, arguably the
most efficient B+ tree is one where b = (B / k) − 1, with b denoting the order of the tree. Although
theoretically the one-off is unnecessary, in practice there is often a little extra space taken up by the
index blocks (for example, the linked list references in the leaf blocks). Having an index block which is
slightly larger than the storage system's actual block represents a significant performance decrease;
therefore erring on the side of caution is preferable.

If nodes of the B+ tree are organized as arrays of elements, then it may take a considerable time to insert
or delete an element, as half of the array will need to be shifted on average. To overcome this problem,
elements inside a node can be organized in a binary tree or a B+ tree instead of an array.

B+ trees can also be used for data stored in RAM. In this case a reasonable choice for block size would
be the size of the processor's cache line.

Space efficiency of B+ trees can be improved by using some compression techniques. One possibility is
to use delta encoding to compress the keys stored in each block. For internal blocks, space saving can be
achieved by compressing either keys or pointers. For string keys, space can be saved by using the
following technique: normally the i-th entry of an internal block contains the first key of block i + 1.
Instead of storing the full key, we could store the shortest prefix of the first key of block i + 1 that is
strictly greater (in lexicographic order) than the last key of block i. There is also a simple way to
compress pointers: if we suppose that some consecutive blocks are stored contiguously, then it will
suffice to store only a pointer to the first block and the count of consecutive blocks.

All the above compression techniques have some drawbacks. First, a full block must be decompressed
to extract a single element. One technique to overcome this problem is to divide each block into
sub-blocks and compress them separately. In this case searching or inserting an element will only need
to decompress or compress a sub-block instead of a full block. Another drawback of compression
techniques is that the number of stored elements may vary considerably from one block to another,
depending on how well the elements are compressed inside each block.
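Two of the compression ideas above can be sketched in a few lines. The function names are illustrative, integer keys are assumed for delta encoding, and `shortest_separator` assumes its first argument sorts strictly before its second.

```python
from itertools import accumulate


def delta_encode(keys):
    """Keep the first key, then each key's difference from its predecessor."""
    if not keys:
        return []
    return [keys[0]] + [b - a for a, b in zip(keys, keys[1:])]


def delta_decode(deltas):
    """Invert delta_encode with a running sum."""
    return list(accumulate(deltas))


def shortest_separator(last_left: str, first_right: str) -> str:
    """Shortest prefix of first_right that is strictly greater than last_left."""
    for n in range(1, len(first_right) + 1):
        prefix = first_right[:n]
        if prefix > last_left:
            return prefix
    return first_right


print(delta_encode([100, 103, 104, 110]))     # [100, 3, 1, 6]
print(shortest_separator("smith", "smythe"))  # smy
```

The small deltas need fewer bits than the full keys, and the separator "smy" still routes every key of the left block to the left and every key of the right block to the right.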
Conclusion
• The B+ tree is a self-balancing data structure that supports fast searching, insertion and deletion
of data.
• Complete or partial data can be retrieved easily, because traversing the linked leaves is efficient.
• The B+ tree structure grows and shrinks as the number of stored records increases or decreases.
• Storing data only in the leaf nodes lets internal nodes branch widely, which shortens the tree and
so reduces the number of disk input and output operations needed per lookup.
References
