B+ Tree
Table of Contents
Abstract
Introduction
Algorithm
Search Algorithm
Insertion Algorithm
Deletion Algorithm
Rules
Uses
Application
Implementation
Conclusion
References
Abstract
Most queries can be executed more quickly if the values are stored in order. But it is not practical to
store all the rows of a table one after another in sorted order, because this would require rewriting the
entire table with each insertion or deletion of a row. This leads us instead to imagine storing our rows in a
tree structure. Our first instinct would be a balanced binary search tree such as a red-black tree, but this
does not make much sense for a database, since it is stored on disk. Disks work by reading and
writing whole blocks of data at once, typically 512 bytes or four kilobytes. A node of a binary search
tree uses a small fraction of that, so it makes sense to look for a structure that fits more neatly into a disk
block. Hence the B+-tree, in which each node stores up to d references to children and up to d − 1 keys.
Each reference is considered “between” two of the node's keys; it references the root of a subtree for
which all values are between these two keys.
Introduction
B-trees and B+ trees are generally employed to implement dynamic multilevel indexing. The
drawback of a B-tree used for indexing, however, is that it stores the data pointer (a pointer to the disk file
block containing the key value) corresponding to a particular key value along with that key value in the
node of the B-tree. This technique greatly reduces the number of entries that can be packed into a node,
thereby increasing the number of levels in the B-tree and hence the search time of a record.
The B+ tree eliminates this drawback by storing data pointers only at the leaf
nodes of the tree. Thus, the structure of the leaf nodes of a B+ tree is quite different from the structure of
its internal nodes. Since data pointers are present only at the leaf
nodes, the leaf nodes must store all the key values along with their corresponding data
pointers to the disk file block in order to access them. Moreover, the leaf nodes are linked to provide
ordered access to the records. The leaf nodes therefore form the first level of the index, with the internal
nodes forming the other levels of a multilevel index. Some of the key values of the leaf nodes also appear
in the internal nodes, simply to guide the search for a record. From the above
discussion it is apparent that a B+ tree, unlike a B-tree, has two orders, ‘a’ and ‘b’, one for the internal
nodes and the other for the external (or leaf) nodes.
B+ tree
A B+ tree is an N-ary tree with a variable but often large number of children per node. A B+ tree consists
of a root, internal nodes and leaves.[1] The root may be either a leaf or a node with two or more children.[2]
A B+ tree can be viewed as a B-tree in which each node contains only keys (not key–value pairs), and to
which an additional level is added at the bottom with linked leaves.
The primary value of a B+ tree is in storing data for efficient retrieval in a block-oriented storage
context, in particular filesystems. This is primarily because, unlike binary search trees, B+ trees have
very high fanout (the number of pointers to child nodes in a node,[1] typically on the order of 100 or more),
which reduces the number of I/O operations required to find an element in the tree.
A B+-tree requires that each leaf be the same distance from the root, as in this picture, where searching
for any of the 11 values (all listed on the bottom level) will involve loading three nodes from the disk
(the root block, a second-level block, and a leaf).
In practice, d will be larger: as large, in fact, as it takes to fill a disk block. Suppose a block is 4 KB, our
keys are 4-byte integers, and each reference is a 6-byte file offset. Then we'd choose d to be the largest
value so that 4(d − 1) + 6d ≤ 4096. Expanding gives 10d − 4 ≤ 4096, so d ≤ 410, and we'd use 410
for d. As you can see, d can be large.
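The block-size arithmetic above can be checked with a few lines of Python. This is only a sketch; the function name max_order and its parameter names are made up for illustration and do not come from any library.

```python
def max_order(block_size: int, key_size: int, ref_size: int) -> int:
    """Largest d such that key_size*(d - 1) + ref_size*d <= block_size.

    Rearranged: d*(key_size + ref_size) <= block_size + key_size.
    """
    return (block_size + key_size) // (key_size + ref_size)


# The figures used in the text: 4 KB block, 4-byte keys, 6-byte references.
d = max_order(block_size=4096, key_size=4, ref_size=6)
used = 4 * (d - 1) + 6 * d  # bytes actually occupied by keys and references
print(d, used)
```

With the numbers from the text this yields d = 410, which fills the 4096-byte block exactly.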
In our examples, we'll continue to use 4 for d. Looking at our invariants, this requires that each leaf have
at least two keys and that each internal node have at least two children (and thus at least one key).
Algorithm
Search Algorithm
In a B+ tree, search is one of the simplest operations, and it is fast because only one node per level is read.
Start at the root. In each internal node, run a binary search on the keys to choose the child whose
range contains the search key, and descend until a leaf is reached.
In the leaf, run a binary search on the stored keys.
If a record with the search key is found, then return that record.
If the current node is a leaf node and the key is not found, then report an unsuccessful search.
Output:
The record matching the exact key is returned to the user; otherwise, a "not found" message
is shown to the user.
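The search steps can be sketched in Python as follows. The Node class, its field names and the small hand-built tree are illustrative assumptions, not part of the original text. The sketch also assumes that each separator key in an internal node equals the smallest key of the subtree to its right, so a key equal to a separator is routed to the right.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class Node:
    keys: List[int]
    children: List["Node"] = field(default_factory=list)  # internal nodes only
    values: List[Any] = field(default_factory=list)       # leaves only

    @property
    def is_leaf(self) -> bool:
        return not self.children


def search(root: Node, key: int) -> Optional[Any]:
    """Return the record stored under key, or None if it is absent."""
    node = root
    while not node.is_leaf:
        # Binary search among the separators picks the child to descend into.
        node = node.children[bisect_right(node.keys, key)]
    i = bisect_left(node.keys, key)  # binary search inside the leaf
    if i < len(node.keys) and node.keys[i] == key:
        return node.values[i]
    return None  # unsuccessful search


# A hand-built two-level tree with d = 4 (hypothetical data).
leaves = [
    Node(keys=[1, 4], values=["row-1", "row-4"]),
    Node(keys=[9, 16], values=["row-9", "row-16"]),
    Node(keys=[25, 36], values=["row-25", "row-36"]),
]
root = Node(keys=[9, 25], children=leaves)

print(search(root, 16))  # found in the middle leaf
print(search(root, 5))   # not present
```

Each loop iteration corresponds to one block read, so the number of disk accesses equals the height of the tree.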
Insertion Algorithm
Find the leaf that should hold the new key and insert the key there in sorted order.
If the leaf overflows, 50 percent of its elements are moved to a new leaf for storage.
The new leaf is linked into the parent using its minimum key value and its location
in the tree.
If the parent becomes full in turn, split the parent node as well.
When an internal node splits, its center key moves up to the node above it.
Repeat these steps up the tree until a node absorbs the new key without splitting or the root is reached.
Insertion of a node in a B+ tree:
o Allocate a new leaf and move half of the bucket's elements to the new bucket.
o Insert the new leaf's smallest key and address into the parent.
o If the root splits, create a new root which has one key and two pointers. (That is, the value
that gets pushed to the new root is removed from the original node.)
Output:
The algorithm locates the required leaf node and inserts the element into it.
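The insertion steps can be sketched in Python as below. This is a minimal in-memory illustration under stated assumptions, not a disk-backed implementation. The names Leaf, Internal, ORDER, insert and leaf_keys are hypothetical. The sketch assumes d = 4 as in the examples, so a leaf holds at most d − 1 keys and an internal node at most d children.

```python
from bisect import bisect_left, bisect_right

ORDER = 4  # d: at most d children per internal node, d - 1 keys per leaf


class Leaf:
    def __init__(self, keys=None, values=None):
        self.keys = keys or []
        self.values = values or []
        self.next = None  # link to the next leaf, for ordered scans


class Internal:
    def __init__(self, keys, children):
        self.keys = keys
        self.children = children


def _insert(node, key, value):
    """Insert below node; return (separator, new_right_node) if node split."""
    if isinstance(node, Leaf):
        i = bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            node.values[i] = value  # key already present: overwrite
            return None
        node.keys.insert(i, key)
        node.values.insert(i, value)
        if len(node.keys) < ORDER:
            return None
        mid = len(node.keys) // 2  # move the upper half to a new leaf
        right = Leaf(node.keys[mid:], node.values[mid:])
        node.keys, node.values = node.keys[:mid], node.values[:mid]
        right.next, node.next = node.next, right
        return right.keys[0], right  # smallest key is copied up
    i = bisect_right(node.keys, key)
    split = _insert(node.children[i], key, value)
    if split is None:
        return None
    sep, right = split
    node.keys.insert(i, sep)
    node.children.insert(i + 1, right)
    if len(node.children) <= ORDER:
        return None
    mid = len(node.keys) // 2
    up = node.keys[mid]  # center key is pushed up and leaves this node
    new_right = Internal(node.keys[mid + 1:], node.children[mid + 1:])
    node.keys, node.children = node.keys[:mid], node.children[:mid + 1]
    return up, new_right


def insert(root, key, value):
    """Insert and return the (possibly new) root."""
    split = _insert(root, key, value)
    if split is None:
        return root
    sep, right = split
    return Internal([sep], [root, right])  # new root: one key, two pointers


def leaf_keys(root):
    """Walk the leaf chain from the leftmost leaf."""
    node = root
    while isinstance(node, Internal):
        node = node.children[0]
    out = []
    while node is not None:
        out.append(list(node.keys))
        node = node.next
    return out


root = Leaf()
for k in [1, 4, 9, 16, 25, 36, 49]:
    root = insert(root, k, f"row-{k}")
print(root.keys, leaf_keys(root))
```

Note the difference between the two kinds of split: a leaf split copies its separator up and keeps it in the new leaf, while an internal split moves the center key up and removes it from the node.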
Deletion Algorithm
Remove the required key and associated reference from the node.
If the node still has enough keys and references to satisfy the invariants, stop.
If the node has too few keys to satisfy the invariants, but its next oldest or next
youngest sibling at the same level has more than necessary, distribute the keys between this node and the
neighbor. Repair the keys in the level above to represent that these nodes now have a different “split point”
between them; this involves simply changing a key in the level above, without deletion or insertion.
If the node has too few keys to satisfy the invariants, and the next oldest or next
youngest sibling is at the minimum for the invariants, then merge the node with its sibling; if the node is a
non-leaf, we will need to incorporate the “split key” from the parent into our merging.
In either case, we will need to repeat the removal algorithm on the parent node to
remove the “split key” that previously separated these merged nodes, unless the parent is the root and we are
removing the final key from the root, in which case the merged node becomes the new root (and the tree has
become one level shorter than before).
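The leaf-level cases of the removal steps can be sketched in Python as below. To stay short, the sketch models a single parent whose children are leaves, each leaf being a plain sorted list of keys. The names delete_from_leaf, seps, leaves and MIN_KEYS are assumptions made for illustration. MIN_KEYS = 2 follows the d = 4 example, and "next oldest" and "next youngest" sibling are read as the left and right neighbors.

```python
from bisect import bisect_right

MIN_KEYS = 2  # with d = 4, each leaf must keep at least two keys


def delete_from_leaf(seps, leaves, key):
    """Delete key below one parent; return a string naming the case taken.

    seps[i] separates leaves[i] from leaves[i + 1]: every key in
    leaves[i + 1] is >= seps[i]. Both lists are modified in place.
    """
    i = bisect_right(seps, key)
    leaf = leaves[i]
    if key not in leaf:
        return "not found"
    leaf.remove(key)
    if len(leaf) >= MIN_KEYS:
        return "done"  # invariants still hold, stop
    if len(leaves) < 2:
        return "underflow"  # no sibling to help (only possible at the root)
    # Case 1: a neighbor has more than necessary, so redistribute.
    if i > 0 and len(leaves[i - 1]) > MIN_KEYS:
        leaf.insert(0, leaves[i - 1].pop())
        seps[i - 1] = leaf[0]  # repair the split point in the parent
        return "borrowed from left"
    if i + 1 < len(leaves) and len(leaves[i + 1]) > MIN_KEYS:
        leaf.append(leaves[i + 1].pop(0))
        seps[i] = leaves[i + 1][0]
        return "borrowed from right"
    # Case 2: neighbors are at the minimum, so merge and drop one split key.
    if i > 0:
        leaves[i - 1].extend(leaf)
        del leaves[i]
        del seps[i - 1]
    else:
        leaf.extend(leaves[1])
        del leaves[1]
        del seps[0]
    return "merged"  # the caller must now re-check the parent


seps = [9, 25]
leaves = [[1, 4], [9, 16], [25, 36, 49]]
print(delete_from_leaf(seps, leaves, 16), seps, leaves)
print(delete_from_leaf(seps, leaves, 9), seps, leaves)
```

After a merge the parent has lost a split key, so a full implementation would repeat the same check one level up. That recursion, and the non-leaf merge that pulls the split key down, are left out here.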
Complexity
Keys are primarily used to aid the search by directing it to the proper leaf.
A B+ tree uses a "fill factor" to manage the growth and shrinkage of the tree.
In B+ trees, many keys can be placed on one page of memory because the interior nodes do not
carry the associated data. Therefore, tree data on the leaf nodes can be
reached quickly.
A full scan of all the elements in a tree needs just one linear pass, because all the leaf
nodes of a B+ tree are linked with each other.
The ReiserFS, NSS, XFS, JFS, ReFS, and BFS filesystems all use this type of tree for metadata
indexing.
BFS also uses B+ trees for storing directories.
NTFS uses B+ trees for directory and security-related metadata indexing.
EXT4 uses extent trees (a modified B+ tree data structure) for file extent indexing.
Relational database management systems such as IBM DB2, Informix, Microsoft SQL
Server, Oracle 8, Sybase ASE,[4] and SQLite support this type of tree for table indices. Key–value
database management systems such as CouchDB and Tokyo Cabinet support this type of tree for
data access.
Applications
Importance of B+ Tree:
A B+ tree supports range retrieval and partial retrieval. Traversing the linked tree structure
makes this easier and quicker.
As the number of records increases or decreases, the B+ tree structure grows or shrinks. There is no
restriction on B+ tree size, as there is in ISAM.
All the data is stored in the leaf nodes, and the high branching of the internal nodes keeps the height
of the tree short. This reduces disk I/O, so the structure works well on secondary storage devices.
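Range retrieval relies on the links between leaves. The following Python sketch shows the idea under simple assumptions: the Leaf class and the scan function are hypothetical names, and the leaf chain is built by hand rather than by a real tree.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass
class Leaf:
    keys: List[int]
    next: Optional["Leaf"] = None  # link to the next leaf in key order


def scan(leaf: Optional[Leaf], lo: Optional[int] = None,
         hi: Optional[int] = None) -> Iterator[int]:
    """Yield keys in [lo, hi] by following leaf links from the given leaf."""
    while leaf is not None:
        for k in leaf.keys:
            if hi is not None and k > hi:
                return  # keys are sorted, so nothing further can match
            if lo is None or k >= lo:
                yield k
        leaf = leaf.next


# Hand-built leaf chain: [1, 4] -> [9, 16] -> [25, 36]
third = Leaf([25, 36])
second = Leaf([9, 16], third)
first = Leaf([1, 4], second)

print(list(scan(first)))         # full scan in one linear pass
print(list(scan(first, 5, 25)))  # range retrieval
```

In a real tree the scan would begin at the leaf found by searching for lo instead of at the leftmost leaf, so only the leaves that overlap the range are read.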
characteristics