Dynamic Multilevel Indexes using B-Trees and B+-Trees

Submitted by, MANJU MOHANDAS

Search Tree 

- A search tree is a special type of tree that is used to guide the search for a record, given the value of its search key.
- A multilevel index (MLI) is a variation of the search tree.

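Figure: A node in a search tree with pointers to subtrees below it.

Node layout: <P1, K1, ..., Ki-1, Pi, Ki, ..., Kq-1, Pq>
- Subtree under P1 (i=1): X < K1
- Subtree under Pi (1<i<q): Ki-1 < X < Ki
- Subtree under Pq (i=q): Kq-1 < X

Figure: A search tree of order p = 3.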

Search Tree
- Each key value in the tree is associated with a pointer to the record in the data file having that value.
- The pointer could instead be to the disk block containing the record.
- The search tree itself can be stored on disk by assigning each tree node to a disk block.

Search Tree
- Constraints:
  - Search keys within a node are ordered (increasing from left to right): within each node, K1 < K2 < ... < Kq-1.
  - For all values X in the subtree pointed to by Pi, we have Ki-1 < X < Ki for 1<i<q, X < Ki for i=1, and Ki-1 < X for i=q.
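The ordering constraints are what make a lookup possible: at each node, key comparisons select exactly one subtree to follow. The following is a minimal Python sketch, assuming a hypothetical in-memory `Node` class (not from the slides) in which `children[i]` plays the role of the pointer to the left of `keys[i]`.

```python
from bisect import bisect_left

class Node:
    """Hypothetical search-tree node <P1, K1, ..., Kq-1, Pq>; a missing subtree is None."""
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or [None] * (len(keys) + 1)

def search(node, x):
    """Return True if x is stored somewhere in the tree rooted at node."""
    while node is not None:
        i = bisect_left(node.keys, x)        # first position with keys[i] >= x
        if i < len(node.keys) and node.keys[i] == x:
            return True
        node = node.children[i]              # only subtree whose range can contain x
    return False

# Small illustrative tree: root key 5, left subtree {3}, right subtree {6, 9}.
tree = Node([5], [Node([3]), Node([6, 9])])
```

With this tree, `search(tree, 9)` follows the right pointer of the root and finds the key, while `search(tree, 4)` ends on a null pointer.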

Search Tree
- Algorithms for inserts and deletes do not guarantee that a search tree is balanced.
- Keeping a search tree balanced helps!
- A balanced search tree yields a uniform search speed regardless of the value of the search key.
- Deletions may lead to nearly empty nodes, thus wasting space and increasing the number of levels.

B-Tree
- A B-tree has additional constraints that ensure that the tree is always balanced and that the space wasted by deletion is never excessive.
- Algorithms for inserts and deletes are more complex in order to maintain these additional constraints.
- They are mostly simple.
- They become complicated only when inserts and deletes lead to splitting and merging of nodes, respectively.

B-Tree: Characteristics
- Automatically maintains as many levels of index as is appropriate for the size of the file being indexed.
- Manages space on the blocks it uses so that every block is between half full and completely full.
- Each node corresponds to a disk block.

Structure of B-Trees
- Balanced tree: all paths from the root to a leaf have the same length.
- Three layers in a B-tree:
  - Root
  - Intermediate layer
  - Leaves
- A parameter p is associated with each B-tree.
- Each node has room for p search keys and p+1 pointers.
- Pick p to be as large as will allow p+1 pointers and p keys to fit in one block.

Example
- Block size = 4096 bytes
- Search key: 4-byte integer
- Pointer: 8 bytes
- Assume no header information is kept in the block.
- We choose the largest p such that 4p + 8(p+1) <= 4096, which gives p = 340.
- A block can hold 340 keys and 341 pointers.
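The arithmetic can be checked with a few lines of Python. This is only a sketch; the function name `max_keys_per_block` is made up for illustration, and it assumes, as the example does, that the block holds nothing but p keys and p+1 pointers.

```python
def max_keys_per_block(block_size, key_size, pointer_size):
    """Largest p with key_size*p + pointer_size*(p+1) <= block_size."""
    return (block_size - pointer_size) // (key_size + pointer_size)

p = max_keys_per_block(4096, 4, 8)
print(p, "keys and", p + 1, "pointers")   # 340 keys and 341 pointers
```

With p = 340 the block uses 4*340 + 8*341 = 4088 bytes; p = 341 would need 4100 bytes and no longer fits.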

Figure: B-tree structures. (a) A node in a B-tree with q-1 search values. (b) A B-tree of order p = 3. The values were inserted in the order 8, 5, 1, 7, 3, 12, 9, 6.

Constraints for B-trees

A B-tree of order p, when used as an access structure on a key field to search for records in a data file, can be defined as follows.

1. Each internal node in the B-tree is of the form <P1, <K1, Pr1>, P2, <K2, Pr2>, ..., <Kq-1, Prq-1>, Pq> where q ≤ p. Each Pi is a tree pointer: a pointer to another node in the B-tree. Each Pri is a data pointer: a pointer to the record whose search key field value is equal to Ki.
2. Within each node, K1 < K2 < ... < Kq-1.
3. For all search key field values X in the subtree pointed at by Pi, we have Ki-1 < X < Ki for 1<i<q, X < Ki for i=1, and Ki-1 < X for i=q.
4. Each node has at most p tree pointers.
5. Each node, except the root and leaf nodes, has at least ⌈p/2⌉ tree pointers. The root node has at least two tree pointers unless it is the only node in the tree.
6. A node with q tree pointers, q ≤ p, has q-1 search key field values.
7. All leaf nodes are at the same level. Leaf nodes have the same structure as internal nodes except that all of their tree pointers Pi are null.
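Because every node stores <Ki, Pri> pairs, a B-tree search can stop at whichever level it finds the key. The sketch below illustrates this in Python under stated assumptions: `BTreeNode`, `btree_search`, and the string "record pointers" are hypothetical names for illustration, and the example tree is simply one small tree that satisfies the order p = 3 constraints.

```python
from bisect import bisect_left

class BTreeNode:
    """Hypothetical B-tree node: sorted (key, data_pointer) pairs plus tree pointers."""
    def __init__(self, entries, children=None):
        self.entries = entries      # [(K1, Pr1), ..., (Kq-1, Prq-1)]
        self.children = children    # [P1, ..., Pq]; None for a leaf (all Pi null)

def btree_search(node, x):
    """Return the data pointer stored with key x, or None if x is absent."""
    while node is not None:
        keys = [k for k, _ in node.entries]
        i = bisect_left(keys, x)
        if i < len(keys) and keys[i] == x:
            return node.entries[i][1]            # found at this level
        node = node.children[i] if node.children else None
    return None

def rec(k):
    return (k, f"rec{k}")                        # stand-in for a record pointer

root = BTreeNode([rec(5), rec(8)],
                 [BTreeNode([rec(1), rec(3)]),
                  BTreeNode([rec(6), rec(7)]),
                  BTreeNode([rec(9), rec(12)])])
```

Here `btree_search(root, 8)` returns at the root without touching a leaf, whereas `btree_search(root, 6)` descends one level.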

B-Trees & B+-Trees
- An insertion into a node that is not full is quite efficient; if a node is full, the insertion causes a split into two nodes.
- Splitting may propagate to other tree levels.
- A deletion is quite efficient if a node does not become less than half full.
- If a deletion causes a node to become less than half full, it must be merged with neighboring nodes.

B+-Tree Structure
- A B+-tree is in the form of a balanced tree in which every path from the root of the tree to a leaf of the tree is the same length.
- It is a variation of the B-tree data structure.
- Data pointers are stored only at leaf nodes.
- Each nonleaf node in the tree has between ⌈n/2⌉ and n children, where n is fixed.
- The leaf nodes of a B+-tree are linked together to provide ordered access on the search field of the records.
- B+-trees are good for searches, but cause some overhead in wasted space.

FIGURE 14.11 The nodes of a B+-tree. (a) Internal node of a B+-tree with q-1 search values. (b) Leaf node of a B+-tree with q-1 search values and q-1 data pointers.

The structure of the internal nodes of a B+-tree of order p

1. Each internal node is of the form <P1, K1, P2, K2, ..., Pq-1, Kq-1, Pq> where q ≤ p and each Pi is a tree pointer.
2. Within each internal node, K1 < K2 < ... < Kq-1.
3. For all search key field values X in the subtree pointed at by Pi, we have Ki-1 < X ≤ Ki for 1<i<q, X ≤ Ki for i=1, and Ki-1 < X for i=q.
4. Each internal node has at most p tree pointers.
5. Each internal node, except the root, has at least ⌈p/2⌉ tree pointers. The root node has at least two tree pointers if it is an internal node.
6. An internal node with q tree pointers, q ≤ p, has q-1 search key field values.

The structure of the leaf nodes of a B+-tree of order p

1. Each leaf node is of the form <<K1, Pr1>, <K2, Pr2>, ..., <Kq-1, Prq-1>, Pnext> where q ≤ p and each Pri is a data pointer.
2. Within each leaf node, K1 < K2 < ... < Kq-1, q ≤ p.
3. Each Pri is a data pointer: a pointer to the record whose search key field value is equal to Ki, or to a file block containing the record.
4. Each leaf node has at least ⌈p/2⌉ values.
5. All leaf nodes are at the same level.

- The pointers in the internal nodes are tree pointers to blocks that are tree nodes.
- Pointers in leaf nodes are data pointers to the data file records or blocks, except for the Pnext pointer.
- The Pnext pointer is a tree pointer to the next leaf node.
- We can traverse leaf nodes as a linked list using the Pnext pointers.
- We can also include Pprevious pointers.

B+-Tree Updates
- Insertion: If the new record has a search key that already exists in a leaf node, the new record is added to the file and a pointer to it is added to the bucket of pointers for that key. If the search key is different from all others, it is inserted in order.
- Deletion: The search key value is removed from the node.

- For a B+-tree on a nonkey field, an extra level of indirection is needed: the Pr pointers are block pointers that point to blocks containing a set of record pointers to the actual records in the data file.

Example: B+ tree with order of 1
- Each node must hold at least 1 entry, and at most 2 entries.

Root:   [40]
Index:  [20 33] [51 63]
Leaves: [10* 15*] [20* 27*] [33* 37*] [40* 46*] [51* 55*] [63* 97*]

Example: Search in a B+ tree of order 2
- Search: how do we find the records with a given search key value?
- Begin at the root, and use key comparisons to go to a leaf.
- Examples: search for 5*, 16*, all data entries >= 24* ...
- The last one is a range search: we need to do a sequential scan, starting from the first leaf containing a value >= 24.

Root:   [13 17 24 30]
Leaves: [2* 3* 5* 7*] [14* 15*] [19* 20* 22*] [24* 27* 29*] [33* 34* 38* 39*]
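The three searches in this example can be traced with a short Python sketch. The class and function names (`Leaf`, `Index`, `find_leaf`, `range_from`) are hypothetical, and the sketch follows the convention of the example figure, where an entry equal to an index key (such as 24*) sits in the subtree to the right of that key.

```python
from bisect import bisect_right

class Leaf:
    def __init__(self, keys):
        self.keys = keys       # data entries, shown as k* in the figure
        self.next = None       # Pnext: link to the next leaf

class Index:
    def __init__(self, keys, children):
        self.keys = keys
        self.children = children

def find_leaf(node, x):
    """Descend from the root to the leaf that would contain x."""
    while isinstance(node, Index):
        node = node.children[bisect_right(node.keys, x)]
    return node

def range_from(root, low):
    """All data entries >= low: locate the first leaf, then scan the leaf chain."""
    leaf, out = find_leaf(root, low), []
    while leaf is not None:
        out.extend(k for k in leaf.keys if k >= low)
        leaf = leaf.next
    return out

leaves = [Leaf(ks) for ks in ([2, 3, 5, 7], [14, 15], [19, 20, 22],
                              [24, 27, 29], [33, 34, 38, 39])]
for a, b in zip(leaves, leaves[1:]):
    a.next = b
root = Index([13, 17, 24, 30], leaves)

print(5 in find_leaf(root, 5).keys)     # True
print(16 in find_leaf(root, 16).keys)   # False
print(range_from(root, 24))             # [24, 27, 29, 33, 34, 38, 39]
```

The range search touches the index only once; everything after the first leaf is reached through the leaf links.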

How to Insert a Data Entry into a B+ Tree?
- Let's look at several examples first.

Inserting 16*, 8* into Example B+ tree

Root:   [13 17 24 30]
Leaves: [2* 3* 5* 7* 8*] (overflow) [14* 15* 16*] ...

- The leaf [2* 3* 5* 7* 8*] overflows.
- One new child (leaf node) is generated; we must add one more pointer to its parent, and thus one more key value as well.

Inserting 8* (cont.)
- Copy up the middle value (leaf split): [2* 3* 5* 7* 8*] becomes [2* 3*] and [5* 7* 8*].
- Entry to be inserted in parent node: 5. (Note that 5 is copied up and continues to appear in the leaf.)
- The parent would then hold [5 13 17 24 30]: it overflows too!

Insertion into B+ tree (cont.)
- Understand the difference between copy-up and push-up.
- Observe how minimum occupancy is guaranteed in both leaf and index page splits.
- We split the node [5 13 17 24 30], redistribute entries evenly, and push up the middle key: the node becomes [5 13] and [24 30].
- Entry to be inserted in parent node: 17. (Note that 17 is pushed up and only appears once in the index. Contrast this with a leaf split.)

Example B+ Tree After Inserting 8*

Root:   [17]
Index:  [5 13] [24 30]
Leaves: [2* 3*] [5* 7* 8*] [14* 15*] [19* 20* 22*] [24* 27* 29*] [33* 34* 38* 39*]

- Notice that the root was split, leading to an increase in height.

Inserting a Data Entry into a B+ Tree: Summary
- Find the correct leaf L.
- Put the data entry onto L.
  - If L has enough space, done!
  - Else, must split L (into L and a new node L2).
    - Redistribute entries evenly, put the middle key in L2, and copy up the middle key.
    - Insert an index entry pointing to L2 into the parent of L.
- This can happen recursively.
  - To split an index node, redistribute entries evenly, but push up the middle key. (Contrast with leaf splits.)
- Splits "grow" the tree; a root split increases height.
  - Tree growth: gets wider or one level taller at the top.
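The summary can be sketched in Python as follows. This is an illustrative, in-memory sketch and not a definitive implementation: the `Node` class and the `insert` / `_insert` helpers are hypothetical names, nodes are assumed to hold at most 2*d keys for a tree of order d, and duplicate keys are not handled. A leaf split copies the middle key up, while an index split pushes it up.

```python
from bisect import bisect_right, insort

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys            # sorted keys (data entries in a leaf)
        self.children = children    # child nodes; None marks a leaf
        self.next = None            # link to the next leaf

def _insert(node, key, d):
    """Insert key below node; return None or (separator, new_right_node) on a split."""
    if node.children is None:                        # leaf L
        insort(node.keys, key)
        if len(node.keys) <= 2 * d:
            return None                              # enough space, done
        right = Node(node.keys[d:])                  # L2 gets the upper entries
        node.keys = node.keys[:d]
        right.next, node.next = node.next, right
        return right.keys[0], right                  # copy up: key stays in L2
    i = bisect_right(node.keys, key)
    split = _insert(node.children[i], key, d)
    if split is None:
        return None
    sep, right = split
    node.keys.insert(i, sep)                         # index entry pointing to L2
    node.children.insert(i + 1, right)
    if len(node.keys) <= 2 * d:
        return None
    mid = node.keys[d]                               # push up: key leaves this level
    new_right = Node(node.keys[d + 1:], node.children[d + 1:])
    node.keys, node.children = node.keys[:d], node.children[:d + 1]
    return mid, new_right

def insert(root, key, d):
    """Insert key and return the (possibly new) root."""
    split = _insert(root, key, d)
    if split is None:
        return root
    sep, right = split
    return Node([sep], [root, right])                # root split: one level taller

# Example tree from the slides (order d = 2), then insert 8*.
leaves = [Node(ks) for ks in ([2, 3, 5, 7], [14, 15], [19, 20, 22],
                              [24, 27, 29], [33, 34, 38, 39])]
for a, b in zip(leaves, leaves[1:]):
    a.next = b
root = insert(Node([13, 17, 24, 30], leaves), 8, d=2)
print(root.keys, [c.keys for c in root.children])    # [17] [[5, 13], [24, 30]]
```

In the result, 5 appears both in the index node [5 13] and in the leaf [5* 7* 8*] (copy-up), while 17 appears only in the new root (push-up).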

Deleting a Data Entry from a B+ Tree
- Examine examples first ...

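Delete 19* and 20*

Root:   [17]
Index:  [5 13] [24 30]
Leaves: [2* 3*] [5* 7* 8*] [14* 16*] [19* 20* 22*] [24* 27* 29*] [33* 34* 38* 39*]

- After deleting 19* and 20*, the leaf holds only [22*].
- Redistributing with the right sibling gives [22* 24*] and [27* 29*].
- Have we still forgotten something?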

Deleting 19* and 20* (cont.)

Root:   [17]
Index:  [5 13] [27 30]
Leaves: [2* 3*] [5* 7* 8*] [14* 16*] [22* 24*] [27* 29*] [33* 34* 38* 39*]

- Notice how 27 is copied up.
- But can we move it up?
- Now we want to delete 24*.
- Underflow again! But can we redistribute this time?

Deleting 24*
- Observe that the two leaf nodes are merged, and 27 is discarded from their parent, but ...
- Observe the "pull down" of the index entry (below).

After the leaf merge:
Index:  [30]
Leaves: [22* 27* 29*] [33* 34* 38* 39*]

After the pull down:
New root: [5 13 17 30]
Leaves:   [2* 3*] [5* 7* 8*] [14* 16*] [22* 27* 29*] [33* 34* 38* 39*]

Deleting a Data Entry from a B+ Tree: Summary
- Start at the root, find the leaf L where the entry belongs.
- Remove the entry.
  - If L is at least half-full, done!
  - If L has only d-1 entries (d being the order, the minimum number of entries per node),
    - Try to re-distribute, borrowing from a sibling (adjacent node with the same parent as L).
    - If re-distribution fails, merge L and the sibling.
- If a merge occurred, must delete the entry (pointing to L or sibling) from the parent of L.
- Merge could propagate to the root, decreasing height.
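The leaf-level part of this procedure can be sketched in Python. The sketch is deliberately limited and makes several assumptions: `delete_entry` is a hypothetical helper, it works on a single parent node represented as a list of separator keys over a list of leaves, it prefers the right sibling, and it does not propagate underflow in the parent itself (the "pull down" step from the example is left out).

```python
from bisect import bisect_right

def delete_entry(keys, leaves, key, d):
    """Delete key from the leaf level; keys[k] separates leaves[k] and leaves[k+1].
    Both lists are modified in place. Leaves must keep at least d entries."""
    i = bisect_right(keys, key)              # leaf L where the entry belongs
    leaf = leaves[i]
    leaf.remove(key)                         # raises ValueError if key is absent
    if len(leaf) >= d or len(leaves) == 1:   # still at least half full: done
        return
    j = i + 1 if i + 1 < len(leaves) else i - 1   # adjacent sibling, same parent
    sib = leaves[j]
    if len(sib) > d:                         # re-distribute: borrow one entry
        if j > i:
            leaf.append(sib.pop(0))
            keys[i] = sib[0]                 # copy up the sibling's new first key
        else:
            leaf.insert(0, sib.pop())
            keys[j] = leaf[0]
        return
    left, right = min(i, j), max(i, j)       # re-distribution fails: merge
    leaves[left].extend(leaves[right])
    del leaves[right]
    del keys[left]                           # drop the entry pointing to the merged leaf

# Right-hand index node of the example tree (order d = 2).
keys = [24, 30]
leaves = [[19, 20, 22], [24, 27, 29], [33, 34, 38, 39]]
delete_entry(keys, leaves, 19, 2)            # no underflow
delete_entry(keys, leaves, 20, 2)            # borrows 24*, separator becomes 27
state_after_20 = (list(keys), [list(l) for l in leaves])
delete_entry(keys, leaves, 24, 2)            # sibling has only d entries: merge
print(keys, leaves)   # [30] [[22, 27, 29], [33, 34, 38, 39]]
```

After the last call the index node is left with a single key, which is the underflow that the example resolves by pulling 17 down from the root.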

Difference between B-tree and B+-tree
- In a B-tree, pointers to data records exist at all levels of the tree.
- In a B+-tree, all pointers to data records exist at the leaf-level nodes.
- A B+-tree can have fewer levels (or a higher capacity of search values) than the corresponding B-tree.
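The capacity claim follows from node layout: a B+-tree internal node stores no data pointers, so more tree pointers fit in a block. The rough Python calculation below reuses the sizes from the earlier block example (4096-byte block, 4-byte key, 8-byte pointer) and additionally assumes that data pointers are also 8 bytes and that blocks carry no header; the function names are made up for illustration.

```python
def btree_order(block, key, tree_ptr, data_ptr):
    """Largest p such that p tree pointers and p-1 <key, data pointer> pairs fit."""
    return (block + key + data_ptr) // (tree_ptr + key + data_ptr)

def bplus_internal_order(block, key, tree_ptr):
    """Largest p such that p tree pointers and p-1 keys fit."""
    return (block + key) // (tree_ptr + key)

print(btree_order(4096, 4, 8, 8))         # 205
print(bplus_internal_order(4096, 4, 8))   # 341
```

Under these assumptions each B+-tree internal node fans out to 341 children against 205 for the B-tree, so the same number of levels reaches more search values.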

Conclusion
- A search tree is a special type of tree that is used to guide the search for a record, given the search key.
- A B-tree is in the form of a balanced tree in which every path from the root of the tree to a leaf of the tree is the same length.
- A B+-tree is a variation of the B-tree.
- B-trees and B+-trees are dynamic: they adjust gracefully under inserts and deletes.
