
Dynamic Multilevel Indexes using B-Trees and B+-Trees

Submitted by, MANJU MOHANDAS

Search Tree

- A search tree is a special type of tree that is used to guide the search for a record, given the value of a search key.
- A multilevel index (MLI) is a variation of the search tree.

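[Figure: A node in a search tree, with pointers to subtrees below it. The node has the form <P1, K1, ..., K(i-1), Pi, Ki, ..., K(q-1), Pq>. Every value X in the subtree under Pi satisfies X < Ki for i = 1, K(i-1) < X < Ki for 1 < i < q, and K(i-1) < X for i = q.]

[Figure: A search tree of order p = 3.]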



- Each key value in the tree is associated with a pointer to the record in the data file that has that value.
- Alternatively, the pointer can be to the disk block containing the record.
- The search tree itself can be stored on disk by assigning each tree node to a disk block.

Constraints:
- The search keys within a node are ordered (increasing from left to right): within each node, K1 < K2 < ... < K(q-1).
- For all values X in the subtree pointed to by Pi, we have X < Ki for i = 1, K(i-1) < X < Ki for 1 < i < q, and K(i-1) < X for i = q.
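The subtree constraint decides which pointer to follow when searching for a value X. The following is a minimal Python sketch, assuming the keys of a node are held in a sorted list and the pointers are numbered P1 to Pq; the function name `subtree_index` is illustrative and does not come from the slides.

```python
from bisect import bisect_left

def subtree_index(keys, x):
    """Return i (1-based) such that x belongs in the subtree under Pi.

    keys is the sorted list [K1, ..., K(q-1)] of one node.
    Returns None when x equals a key stored in this node.
    """
    pos = bisect_left(keys, x)          # number of keys strictly smaller than x
    if pos < len(keys) and keys[pos] == x:
        return None                     # x is found in this node itself
    return pos + 1                      # i = 1 if x < K1, i = q if x > K(q-1)

print(subtree_index([5, 9], 3))   # 1
print(subtree_index([5, 9], 7))   # 2
print(subtree_index([5, 9], 12))  # 3
```

A binary search is used here because the keys within a node are ordered; a linear scan would give the same answer.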

  

- The algorithms for inserts and deletes do not guarantee that a search tree is balanced.
- Keeping a search tree balanced helps: it yields a uniform search speed regardless of the value of the search key.
- Deletions may lead to nearly empty nodes, which wastes space and increases the number of levels.

B-Tree
 

 

- A B-tree has additional constraints that ensure that the tree is always balanced and that the space wasted by deletion is never excessive.
- The algorithms for inserts and deletes are more complex in order to maintain these additional constraints.
- They are mostly simple, and become complicated only when an insert leads to splitting a node or a delete leads to merging nodes.

B-Tree: Characteristics


- A B-tree automatically maintains as many levels of index as is appropriate for the size of the file being indexed.
- It manages the space on the blocks it uses so that every block is between half full and completely full.
- Each node corresponds to a disk block.

Structure of B-Trees
  

  

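- A B-tree is a balanced tree: all paths from the root to a leaf have the same length.
- There are three layers in a B-tree:
  - Root
  - Intermediate layer
  - Leaves
- A parameter p is associated with each B-tree.
- Each node has room for p search keys and p+1 pointers (here p counts keys; the formal order-p definition under "Constraints for B-trees" counts tree pointers instead).
- Pick p to be as large as will allow p+1 pointers and p keys to fit in one block.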

Example
    

- Block size = 4096 bytes
- Search key: 4-byte integer
- Pointer: 8 bytes
- Assume no header information is kept in the block.
- We choose the largest p such that

4p + 8(p+1) <= 4096

- This gives 12p <= 4088, so p = 340.
- A block can hold 340 keys and 341 pointers.
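The same arithmetic can be checked with a few lines of Python. This is only a sketch of the calculation above; the function name `max_keys_per_block` is made up for illustration.

```python
def max_keys_per_block(block_size, key_size, pointer_size):
    """Largest p with key_size*p + pointer_size*(p+1) <= block_size."""
    # Rearranged: p * (key_size + pointer_size) <= block_size - pointer_size
    return (block_size - pointer_size) // (key_size + pointer_size)

p = max_keys_per_block(4096, 4, 8)
print(p, p + 1)   # 340 341
```

With p = 340 the block uses 4*340 + 8*341 = 4088 bytes; p = 341 would need 4100 bytes, which no longer fits.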

[Figure: B-tree structures. (a) A node in a B-tree with q - 1 search values. (b) A B-tree of order p = 3. The values were inserted in the order 8, 5, 1, 7, 3, 12, 9, 6.]

Constraints for B-trees


A B-tree of order p, when used as an access structure on a key field to search for records in a data file, can be defined as follows:
1. Each internal node in the B-tree is of the form <P1, <K1, Pr1>, P2, <K2, Pr2>, ..., <K(q-1), Pr(q-1)>, Pq>, where q <= p. Each Pi is a tree pointer: a pointer to another node in the B-tree. Each Pri is a data pointer: a pointer to the record whose search key field value is equal to Ki.
2. Within each node, K1 < K2 < ... < K(q-1).
3. For all search key field values X in the subtree pointed at by Pi, we have K(i-1) < X < Ki for 1 < i < q, X < Ki for i = 1, and K(i-1) < X for i = q.
4. Each node has at most p tree pointers.
5. Each node, except the root and leaf nodes, has at least ceil(p/2) tree pointers. The root node has at least two tree pointers unless it is the only node in the tree.
6. A node with q tree pointers, q <= p, has q - 1 search key field values.
7. All leaf nodes are at the same level. Leaf nodes have the same structure as internal nodes, except that all of their tree pointers Pi are null.
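Because every key in a B-tree carries its own data pointer, a search can stop at an internal node. The following is a minimal Python sketch of such a search, not a full implementation. The class `BTreeNode`, the function `btree_search`, and the `rec...` strings standing in for data pointers are all illustrative assumptions. The sample tree is the order-3 tree one would expect from inserting 8, 5, 1, 7, 3, 12, 9, 6 when an overflowing node is split at its median key.

```python
from bisect import bisect_left

class BTreeNode:
    def __init__(self, keys, children=None):
        self.keys = keys                             # K1 < K2 < ... < K(q-1)
        self.data_ptrs = [f"rec{k}" for k in keys]   # stand-ins for Pr1 ... Pr(q-1)
        self.children = children                     # P1 ... Pq, or None in a leaf

def btree_search(node, x):
    """Return the data pointer stored with key x, or None if x is absent."""
    while node is not None:
        pos = bisect_left(node.keys, x)
        if pos < len(node.keys) and node.keys[pos] == x:
            return node.data_ptrs[pos]   # found, possibly in an internal node
        # Otherwise follow tree pointer P(pos+1); leaves have no children.
        node = node.children[pos] if node.children else None
    return None

root = BTreeNode([5, 8], [BTreeNode([1, 3]), BTreeNode([6, 7]), BTreeNode([9, 12])])
print(btree_search(root, 8))   # rec8 (found in the root)
print(btree_search(root, 7))   # rec7 (found in a leaf)
print(btree_search(root, 4))   # None
```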

B-Trees & B+-Trees


- An insertion into a node that is not full is quite efficient; if a node is full, the insertion causes a split into two nodes.
- Splitting may propagate to other tree levels.
- A deletion is quite efficient if a node does not become less than half full.
- If a deletion causes a node to become less than half full, it must be merged with neighboring nodes.

B+-Tree Structure
- A B+-tree is in the form of a balanced tree in which every path from the root of the tree to a leaf of the tree is the same length.
- Each nonleaf node in the tree has between ceil(n/2) and n children, where n is fixed.
- B+-trees are good for searches, but cause some overhead in wasted space.
- The B+-tree is a variation of the B-tree data structure.
- Data pointers are stored only at leaf nodes.
- The leaf nodes of a B+-tree are linked together to provide ordered access on the search field of the records.

[Figure 14.11: The nodes of a B+-tree. (a) Internal node of a B+-tree with q - 1 search values. (b) Leaf node of a B+-tree with q - 1 search values and q - 1 data pointers.]

The structure of internal node of B+-tree of order p


1. Each internal node is of the form <P1, K1, P2, K2, ..., P(q-1), K(q-1), Pq>, where q <= p and each Pi is a tree pointer.
2. Within each internal node, K1 < K2 < ... < K(q-1).
3. For all search key field values X in the subtree pointed at by Pi, we have K(i-1) < X <= Ki for 1 < i < q, X <= Ki for i = 1, and K(i-1) < X for i = q.
4. Each internal node has at most p tree pointers.
5. Each internal node, except the root, has at least ceil(p/2) tree pointers. The root node has at least two tree pointers if it is an internal node.
6. An internal node with q tree pointers, q <= p, has q - 1 search key field values.

The structure of the leaf node of a B+-tree of order p


1. Each leaf node is of the form <<K1, Pr1>, <K2, Pr2>, ..., <K(q-1), Pr(q-1)>, Pnext>, where q <= p and each Pri is a data pointer.
2. Within each node, K1 < K2 < ... < K(q-1), q <= p.
3. Each Pri is a data pointer: a pointer to the record whose search key field value is equal to Ki, or to a file block containing the record.
4. Each leaf node has at least p/2 values, so it is at least half full.
5. All leaf nodes are at the same level.

- The pointers in the internal nodes are tree pointers to blocks that are tree nodes.
- The pointers in leaf nodes are data pointers to the data file records or blocks, except for the Pnext pointer.
- The Pnext pointer is a tree pointer to the next leaf node.
- We can traverse the leaf nodes as a linked list using the Pnext pointers.
- We can also include Pprevious pointers.

B+-Tree Updates
- Insertion: if the new record has a search key value that already exists in a leaf node, the record is added to the file and a pointer to it is added to the bucket of pointers for that value. If the search key value is different from all others, it is inserted into the leaf in key order.
- Deletion: the search key value is removed from the leaf node.
- For a B+-tree on a nonkey field, an extra level of indirection is needed: the Pr pointers are block pointers that point to blocks containing a set of record pointers to the actual records in the data file.

Example: B+ tree of order 1

- Each node must hold at least 1 entry and at most 2 entries.

[Figure: B+ tree of order 1.
Root: <40>
  Index node <20, 33>: leaves <10*, 15*>, <20*, 27*>, <33*, 37*>
  Index node <51, 63>: leaves <40*, 46*>, <51*, 55*>, <63*, 97*>]

Example: Search in a B+ tree of order 2

- Search: how do we find the records with a given search key value?
  - Begin at the root, and use key comparisons to go to a leaf.
- Examples: search for 5*, 16*, all data entries >= 24*, ...
  - The last one is a range search: we do a sequential scan, starting from the first leaf containing a value >= 24.


[Figure: Example B+ tree.
Root: <13, 17, 24, 30>
Leaves: <2*, 3*, 5*, 7*>, <14*, 15*>, <19*, 20*, 22*>, <24*, 27*, 29*>, <33*, 34*, 38*, 39*>]
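The point search and the range search can be sketched in Python on the example tree. This is a minimal sketch under stated assumptions: the classes `Leaf` and `Index` and the functions `find_leaf`, `search`, and `range_from` are illustrative names, data entries are represented by their keys only, and `next` plays the role of the Pnext pointer. The figure places an entry equal to an index key in the subtree to its right (24* sits to the right of key 24), so `bisect_right` is used to choose the child.

```python
from bisect import bisect_right

class Leaf:
    def __init__(self, keys):
        self.keys = keys      # sorted data entries (keys only in this sketch)
        self.next = None      # link to the next leaf, like Pnext

class Index:
    def __init__(self, keys, children):
        self.keys = keys
        self.children = children

def find_leaf(node, x):
    """Begin at the root and use key comparisons to go down to a leaf."""
    while isinstance(node, Index):
        node = node.children[bisect_right(node.keys, x)]
    return node

def search(root, x):
    return x in find_leaf(root, x).keys

def range_from(root, low):
    """All entries >= low: locate the first leaf, then scan the leaf chain."""
    leaf = find_leaf(root, low)
    out = []
    while leaf is not None:
        out.extend(k for k in leaf.keys if k >= low)
        leaf = leaf.next
    return out

leaves = [Leaf(ks) for ks in
          ([2, 3, 5, 7], [14, 15], [19, 20, 22], [24, 27, 29], [33, 34, 38, 39])]
for a, b in zip(leaves, leaves[1:]):
    a.next = b
root = Index([13, 17, 24, 30], leaves)

print(search(root, 5))        # True
print(range_from(root, 24))   # [24, 27, 29, 33, 34, 38, 39]
```

The range search touches the index only once; after that it follows the leaf links, which is the reason the leaves are chained together.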

How to Insert a Data Entry into a B+ Tree?

- Let's look at several examples first.

Inserting 16*, 8* into Example B+ tree


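[Figure: The example tree while inserting 16* and 8*.
Root: <13, 17, 24, 30>
16* fits into the leaf <14*, 15*>, giving <14*, 15*, 16*>.
8* belongs in the leaf <2*, 3*, 5*, 7*>, which is already full, so <2*, 3*, 5*, 7*, 8*> overflows.]

One new child (leaf node) is generated; we must add one more pointer to its parent, and thus one more key value as well.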

Inserting 8* (cont.)
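- Copy up the middle value (leaf split).

[Figure: The leaf is split into <2*, 3*> and <5*, 7*, 8*>, and 5 is the entry to be inserted in the parent node. Note that 5 is copied up and continues to appear in the leaf.]

[Figure: The parent node <13, 17, 24, 30> is already full, so it overflows when 5 is inserted.]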

Insertion into B+ tree (cont.)


- Understand the difference between copy-up and push-up.
- Observe how minimum occupancy is guaranteed in both leaf and index page splits.

[Figure: The overflowing index node <5, 13, 17, 24, 30>. We split this node, redistribute entries evenly, and push up the middle key, giving <5, 13> and <24, 30>. 17 is the entry to be inserted in the parent node. Note that 17 is pushed up and only appears once in the index. Contrast this with a leaf split.]

Example B+ Tree After Inserting 8*


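[Figure: Example B+ tree after inserting 8*.
Root: <17>
  Index node <5, 13>: leaves <2*, 3*>, <5*, 7*, 8*>, <14*, 15*>
  Index node <24, 30>: leaves <19*, 20*, 22*>, <24*, 27*, 29*>, <33*, 34*, 38*, 39*>]

Notice that the root was split, leading to an increase in height.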

Inserting a Data Entry into a B+ Tree: Summary


- Find the correct leaf L.
- Put the data entry onto L.
  - If L has enough space, done!
  - Else, we must split L (into L and a new node L2):
    - Redistribute entries evenly, putting the middle key in L2, and copy up the middle key.
    - Insert an index entry pointing to L2 into the parent of L.
- This can happen recursively.
  - To split an index node, redistribute entries evenly, but push up the middle key. (Contrast with leaf splits.)
- Splits grow the tree; a root split increases its height.
  - Tree growth: the tree gets wider or one level taller at the top.
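The summary can be sketched as a recursive insert in Python. This is a minimal sketch under stated assumptions, not a complete B+ tree: `D`, `Leaf`, `Index`, `insert`, and `insert_root` are illustrative names, each node is assumed to hold at most 2d entries with d = 2, data entries are represented by their keys only, duplicate keys are not handled, and the links between leaves are left out. It replays the insertion of 8* into the example tree.

```python
from bisect import bisect_right, insort

D = 2  # assumed order d: a node may hold at most 2d entries

class Leaf:
    def __init__(self, keys):
        self.keys = keys

class Index:
    def __init__(self, keys, children):
        self.keys, self.children = keys, children

def insert(node, key):
    """Insert key below node; return (separator, new_node) if node split, else None."""
    if isinstance(node, Leaf):
        insort(node.keys, key)
        if len(node.keys) <= 2 * D:
            return None                        # L has enough space: done
        right = Leaf(node.keys[D:])            # new node L2 takes the upper entries
        node.keys = node.keys[:D]
        return right.keys[0], right            # copy up: the key stays in the leaf
    i = bisect_right(node.keys, key)
    split = insert(node.children[i], key)
    if split is None:
        return None
    sep, new_child = split
    node.keys.insert(i, sep)                   # index entry pointing to the new child
    node.children.insert(i + 1, new_child)
    if len(node.keys) <= 2 * D:
        return None
    mid = node.keys[D]                         # push up: the key leaves this level
    right = Index(node.keys[D + 1:], node.children[D + 1:])
    node.keys, node.children = node.keys[:D], node.children[:D + 1]
    return mid, right

def insert_root(root, key):
    split = insert(root, key)
    if split is None:
        return root
    sep, right = split
    return Index([sep], [root, right])         # root split: one level taller

old_root = Index([13, 17, 24, 30], [Leaf(ks) for ks in
    ([2, 3, 5, 7], [14, 15], [19, 20, 22], [24, 27, 29], [33, 34, 38, 39])])
new_root = insert_root(old_root, 8)
print(new_root.keys, [c.keys for c in new_root.children])
# [17] [[5, 13], [24, 30]]
```

The leaf split returns a key that still appears in the new leaf (copy-up), while the index split returns a key that is removed from both halves (push-up).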

Deleting a Data Entry from a B+ Tree


- Examine examples first.

Delete 19* and 20*


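[Figure: The tree before the deletions.
Root: <17>
  Index node <5, 13>: leaves <2*, 3*>, <5*, 7*, 8*>, <14*, 16*>
  Index node <24, 30>: leaves <19*, 20*, 22*>, <24*, 27*, 29*>, <33*, 34*, 38*, 39*>
After 19* and 20* are deleted, the leaf holds only <22*>. Borrowing 24* from its sibling gives the leaves <22*, 24*> and <27*, 29*>.]

Have we still forgotten something?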

Deleting 19* and 20* (cont.)


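[Figure: The tree after deleting 19* and 20*.
Root: <17>
  Index node <5, 13>: leaves <2*, 3*>, <5*, 7*, 8*>, <14*, 16*>
  Index node <27, 30>: leaves <22*, 24*>, <27*, 29*>, <33*, 34*, 38*, 39*>]

- Notice how 27 is copied up.
- But can we move it up?
- Now we want to delete 24*.
- Underflow again! But can we redistribute this time?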

Deleting 24*
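- Observe that the two leaf nodes are merged, and 27 is discarded from their parent.
- Observe the "pull down" of the index entry (below).

[Figure: After the merge, the index node holds only <30>, with leaves <22*, 27*, 29*> and <33*, 34*, 38*, 39*>.]

[Figure: The tree after deleting 24*.
New root: <5, 13, 17, 30>
Leaves: <2*, 3*>, <5*, 7*, 8*>, <14*, 16*>, <22*, 27*, 29*>, <33*, 34*, 38*, 39*>]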

Deleting a Data Entry from a B+ Tree: Summary


- Start at the root, and find the leaf L where the entry belongs.
- Remove the entry.
  - If L is at least half full, done!
  - If L has only d-1 entries (one fewer than the minimum d):
    - Try to redistribute, borrowing from a sibling (an adjacent node with the same parent as L).
    - If redistribution fails, merge L and the sibling.
- If a merge occurred, we must delete the entry (pointing to L or the sibling) from the parent of L.
- A merge could propagate to the root, decreasing the height.
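The leaf-level part of these steps can be sketched in Python. This is a deliberately partial sketch under stated assumptions: it handles one index node and its child leaves only, it borrows a single entry rather than redistributing evenly, it does not propagate a merge upward, and it expects the key to be present. The names `D` and `delete_from_leaf` are illustrative. The sample data is the right-hand subtree of the example, with separator keys <24, 30>.

```python
from bisect import bisect_right

D = 2  # assumed order d: a leaf must keep at least d entries

def delete_from_leaf(keys, leaves, x):
    """Delete x from the leaf below one index node.

    keys   -- separator keys of the parent index node
    leaves -- one sorted list of entries per child leaf
    Both lists are changed in place; x must be present in its leaf.
    """
    i = bisect_right(keys, x)
    leaf = leaves[i]
    leaf.remove(x)
    if len(leaf) >= D:
        return "done"                          # still at least half full
    # Try to borrow from an adjacent sibling that has entries to spare.
    if i + 1 < len(leaves) and len(leaves[i + 1]) > D:
        leaf.append(leaves[i + 1].pop(0))
        keys[i] = leaves[i + 1][0]             # copy up the sibling's new first key
        return "redistributed"
    if i > 0 and len(leaves[i - 1]) > D:
        leaf.insert(0, leaves[i - 1].pop())
        keys[i - 1] = leaf[0]
        return "redistributed"
    # Redistribution failed: merge with a sibling and drop their separator.
    j = i if i + 1 < len(leaves) else i - 1    # merge leaves[j] and leaves[j + 1]
    leaves[j].extend(leaves.pop(j + 1))
    keys.pop(j)
    return "merged"

keys = [24, 30]
leaves = [[19, 20, 22], [24, 27, 29], [33, 34, 38, 39]]
for x in (19, 20, 24):
    print(x, delete_from_leaf(keys, leaves, x), keys)
```

Run on this data, the three calls report `done`, `redistributed` (the separator becomes 27), and `merged` (only the separator 30 remains). After the merge the index node holds a single key, so in a full implementation the underflow would have to be handled one level up, which this sketch does not attempt.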

Difference between B-tree and B+-tree


- In a B-tree, pointers to data records exist at all levels of the tree.
- In a B+-tree, all pointers to data records exist at the leaf-level nodes.
- A B+-tree can have fewer levels (or a higher capacity of search values) than the corresponding B-tree.
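The capacity difference can be made concrete with a small calculation. The sketch below reuses the block size, key size, and pointer size from the earlier example and additionally assumes that a data pointer also takes 8 bytes; that size and the function names `btree_order` and `bplus_internal_order` are assumptions for illustration. It computes the largest number of tree pointers p that fits in one internal node.

```python
def btree_order(block, key, tree_ptr, data_ptr):
    """Largest p with p tree pointers and p-1 <key, data pointer> pairs per block."""
    # p*tree_ptr + (p-1)*(key+data_ptr) <= block
    return (block + key + data_ptr) // (tree_ptr + key + data_ptr)

def bplus_internal_order(block, key, tree_ptr):
    """Largest p with p tree pointers and p-1 keys per block (no data pointers)."""
    # p*tree_ptr + (p-1)*key <= block
    return (block + key) // (tree_ptr + key)

print(btree_order(4096, 4, 8, 8))         # 205
print(bplus_internal_order(4096, 4, 8))   # 341
```

Under these assumptions an internal B+-tree node has a fan-out of 341 against 205 for a B-tree node, because it stores no data pointers; a larger fan-out means more search values can be reached with the same number of levels.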

Conclusion
- A search tree is a special type of tree that is used to guide the search for a record, given the value of a search key.
- A B-tree is a balanced tree in which every path from the root of the tree to a leaf of the tree is the same length.
- A B+-tree is a variation of the B-tree.
- B-trees and B+-trees are dynamic: they adjust gracefully under inserts and deletes.

