You are on page 1of 48

2-3 TREE

Center for Distributed Computing, Jadavpur University


2-3 Tree Definition

All leaf nodes are at the same level.


All non leaf nodes are either binary or ternary.
Binary nodes are like BST nodes with single
key k.
Ternary nodes have two keys k1 and k2 and
three children pointers p1, p2, p3.
All keys < k1 reside in sub-tree pointed to by
p1.. All keys > k2 reside in sub-tree pointed to
by p3..Others reside in the sub-tree pointed to
by p2. All keys are distinct .

Center for Distributed Computing, Jadavpur University


2-3 Tree node

typedef struct nt {
T k1, k2;
int binary;
struct nt * p1, * p2, * p3;
} 23treenode;

binary p1 k1 p2 k2 p3

Center for Distributed Computing, Jadavpur University


Operations

 If h is the height of a 2-3 tree having n nodes,

then 2h-1 <= n <= 3h-1

Thus, log3(n+1) <= h <= log2(n+1)

Operations:
Insert node
Delete node
Search node

Center for Distributed Computing, Jadavpur University


INSERTION EXAMPLES

10
14 8 10
14 8 14

8 14

2 10 8 10 8 10
9 12

2 8 14 2 9 14 2 9 12 14

Center for Distributed Computing, Jadavpur University


INSERTION EXAMPLES …

15 10 5 10

8 14 8 14

2 9 12 15 2 5 9 12 15

10 11 10
20

14 8 14
8

9 15 20 2 5 9 11 12 15 20
2 5 12
Center for Distributed Computing, Jadavpur University
INSERTION EXAMPLES …

1 10
17 10
2 8 14
2 8 14 17
1 5 9 11 12 15 20
5 9 11 12 15 20
1

21 10

2 8 14 17

5 9 11 12 15 20 21
1
Center for Distributed Computing, Jadavpur University
DELETION EXAMPLES

10

2 8 14 17

5 9 11 12 15 20 21
1

21 10

2 8 14 17

5 9 11 12 15 20
1
Center for Distributed Computing, Jadavpur University
DELETION EXAMPLES …

10 9

2 14 17

1 5 8 11 12 20
15

1 9

5 14 17

2 11 12 15 20
8
Center for Distributed Computing, Jadavpur University
DELETION EXAMPLES …

8 14

9 17

11 12 15
2 5 20

15 9 14

2 5 11 12 17 20

Center for Distributed Computing, Jadavpur University


B TREE

Center for Distributed Computing, Jadavpur University


B-Tree Definition

A B-Tree of order d is a tree with following properties:


 Each node ( except possibly the root node ) contains at
most d records and at least d/2 records.
 The root node can have at most d records and as few as
one record.
 An internal node containing n records (1 <= m <= d)
with key values k1 to km, have pointers to m+1
subintervals of keys stored in sub-trees.
 A leaf node has empty sub-tree below it. All leaf nodes
are at the same level.

Center for Distributed Computing, Jadavpur University


B-TREE Node

typedef struct btnode {


int k;
T datalist[maxkeyno];
struct btnode * ptr[maxkeyno + 1];
}

Height of a B-Tree of order d containing n nodes is given


by  log  d/2 + 1 (n+1) >= h >= log d + 1(n+1)

2-3 tree is a B-Tree of order 2

Center for Distributed Computing, Jadavpur University


Insertion Examples on a B-Tree of order 4

20, 40, 10, 30

10 20 30 40

20
15

10 15 30 40

35, 7, 26,18
20

7 10 15 18 26 30 35 40
Center for Distributed Computing, Jadavpur University
Insertion Examples …

20 30
22

7 10 15 18 22 26 35 40

10 20 30
5

5 7 15 18 22 26 35 40

Center for Distributed Computing, Jadavpur University


Insertion Examples …

10 20 30
42 13 46 27 8

5 7 8 13 15 18 22 26 27 35 40 42 46

10 20 30 40
32

32 35 42 46
5 7 8 13 15 18 22 26 27

Center for Distributed Computing, Jadavpur University


Insertion Examples …

10 20 30 40
38 24 45

5 7 8 13 15 18 22 24 26 27 32 35 38 42 45 46

25

25
10 20 30 40

5 7 8 13 15 18 22 24

26 27 32 35 38 42 45 46
Center for Distributed Computing, Jadavpur University
Deletion examples

24

25 45
10 18
30 40

5 7 8
13 15 20 22
26 27 32 35 38 42 46

Center for Distributed Computing, Jadavpur University


Deletion examples …

24 22 22

10 18 10

5 7 8 13 15 20 5 7 8 13 15 18 20

10 22 30 40

27 32 35 38 42 46
5 7 8 13 15 18 20 26
Center for Distributed Computing, Jadavpur University
Deletion examples …

38 32 10 22 30

5 7 8 13 15 18 20 26 27 35 40 42 46

8 27 10 22 35

5 7 13 15 18 20 26 30 40 42 46
Center for Distributed Computing, Jadavpur University
Deletion examples …

46 13 42 10 22

5 7 15 18 20 26 30 35 40

5 15 22

7 10 18 20 26 30 35 40

Center for Distributed Computing, Jadavpur University


Deletion examples …

22 15 30

7 10 18 20 26 35 40

18 15 30

7 10 20 26 35 40

Delete 26 , 7 , 35 , 15
Center for Distributed Computing, Jadavpur University
B+ Tree

Center for Distributed Computing, Jadavpur University


What is a B+ Tree?

– A variation of B tree in which


– internal nodes contain only search keys (no data)
– Leaf nodes contain keys with pointers to data records
– Data records are in sorted order by the search key
– All leaves are at the same depth
– Internal nodes and Leaf nodes may have different
orders

Center for Distributed Computing, Jadavpur University


Definition of a B+Tree
A B+ tree is a balanced tree in which every
path from the root of the tree to a leaf is of
the same length, and each non-leaf node of
the tree has between M/2 and M children,
where M is fixed for internal nodes of a
particular tree.

Center for Distributed Computing, Jadavpur University


B+ Tree Nodes

 Internal node
 (NodePointer, Key)*(M-1) |NodePointer in each node
 First i keys are currently in use

k0 k1 … ki-1 __ … __

0 1 i-1 M-2

 Leaf node
– (Key, DataPointer)* L in each node


– first j Keys currently in use

k0 k1 kj-1 __ … __

0 1 L-1
Data for k0 Data for k2 Data for kj-1University
Center for Distributed Computing, Jadavpur
Example

B+ Tree with M = 4 (3 keys and 4 pointers)


Leaf nodes (having L <= 4 keys)linked together
10 40

3 15 20 30 50

1 2 10 11 12 20 25 26 40 42
3 5 6 9 15 17 30 32 33 36 50 60 70

Center for Distributed Computing, Jadavpur University


Advantages of B+ tree usage for databases

• keeps keys in sorted order for sequential traversing


• uses a hierarchical index to minimize the number of
disk reads
• uses partially full blocks to speed insertions and
deletions
• keeps the index balanced with a recursive algorithm
• In addition, a B+ tree minimizes waste by making
sure the interior nodes are at least half full. A B+ tree
can handle an arbitrary number of insertions and
deletions.

Center for Distributed Computing, Jadavpur University


Searching

 Just compare the key value with the data in the tree,
then return the result.
For example: find the value 45, and 15 in below tree.

Center for Distributed Computing, Jadavpur University


Searching

 Result:
1. For the value of 45, not found.
2. For the value of 15, return the position where the
pointer located.

Center for Distributed Computing, Jadavpur University


Insertion

 inserting a value into a B+ tree may unbalance the


tree, so rearrange the tree if needed.
 Example #1: insert 28 into the following tree.

25 28 30
Fits inside the
leaf
Center for Distributed Computing, Jadavpur University
Insertion

 Result:

Center for Distributed Computing, Jadavpur University


Insertion

 Example #2: insert 70 into the following tree

Center for Distributed Computing, Jadavpur University


Insertion

 Process: split the leaf and propagate middle key


up the tree

50 55 60 65 70
Does not
fit inside
50 55 60 65 70 the leaf

Center for Distributed Computing, Jadavpur University


Insertion

 Result: chose the middle key 60, and place it in the


index page between 50 and 75.

Center for Distributed Computing, Jadavpur University


Insertion

The insert algorithm for B+ Tree


Leaf Index Node Action
Node Full Full
NO NO Place the record in sorted position in the appropriate leaf page

YES NO 1. Split the leaf node


2. Place Middle Key in the index node in sorted order.
3. Left leaf node contains records with keys below the middle key.
4. Right leaf node contains records with keys equal to or greater than the
middle key.
YES YES 1. Split the leaf node.
2. Records with keys < middle key go to the left leaf node.
3. Records with keys >= middle key go to the right leaf node.
Split the index node.
4. Keys < middle key go to the left index node.
5. Keys > middle key go to the right index node.
6. The middle key goes to the next (higher level) index node.

IF the next level index node is full, continue splitting the index nodes.
Center for Distributed Computing, Jadavpur University
Insertion

 Exercise: add a key value 95 to the following tree.

75 80 85 90 95
Leaf node
full, split 75 80 85 90 95 25 50 60 75 85
the leaf.
Center for Distributed Computing, Jadavpur University
Insertion

 Result: again put the middle key 60 to the index


page and rearrange the tree.

Center for Distributed Computing, Jadavpur University


Deletion
 Same as insertion, the tree has to be rearranged if the deletion
result violate the rule of B+ tree.
 Example #1: delete 70 from the tree

OK. Node
>=50% full 60 65
Center for Distributed Computing, Jadavpur University
Deletion

 Result:

Center for Distributed Computing, Jadavpur University


Deletion
Example #2: delete 25 from the following tree, but 25 appears in the
index page.

But…

28 30
This is
OK. Center for Distributed Computing, Jadavpur University
Deletion

 Result: replace 25 with 28 in the index page.

Add 28

Center for Distributed Computing, Jadavpur University


Deletion
 Example #3: delete 60 from the following tree

65
Less than
50 55 65 50% full

Center for Distributed Computing, Jadavpur University


Deletion

 Result: delete 60 from the index page and combine


the rest of index pages.

Center for Distributed Computing, Jadavpur University


Deletion

 Delete algorithm for B+ trees

Data Page Below Fill Index Page Below Fill Action


Factor Factor
NO NO Delete the record from the leaf page. Arrange
keys in ascending order to fill void. If the key of
the deleted record appears in the index page,
use the next key to replace it.
YES NO Combine the leaf page and its sibling. Change
the index page to reflect the change.

YES YES 1. Combine the leaf page and its sibling.


2. Adjust the index page to reflect the
change.
3. Combine the index page with its sibling.

Continue combining index pages until you


reach a page with the correct fill factor or
you reach the root page.

Center for Distributed Computing, Jadavpur University


Conclusion

• For a B+ Tree:
• It is “easy” to maintain its balance
– Insertion/Deletion complexity O(logM/2)
• The disk based searching time is shorter than most
of other types of trees because branching factor is
high

Center for Distributed Computing, Jadavpur University


B+Trees and DBMS
– Used to index primary keys
– Can access records in O(logM/2) traversals (height
of the tree)
– Interior nodes contain Keys only
– Set node sizes so that the M-1 keys and M pointers fit
inside a single block on disk
– E.g., block size 4096B, keys 10B, pointers 8 bytes
– (8+ (10+8)*M-1) = 4096
– M = 228; 2.7 billion nodes in 4 levels
– One block read per node visited

Center for Distributed Computing, Jadavpur University


Applications

• B+ trees are used by


– NTFS, ReiserFS, NSS, XFS, JFS, ReFS, and BFS file systems
for metadata indexing
– BFS for storing directories.
– IBM DB2, Informix, Microsoft SQL Server, Oracle, Sybase
ASE, and SQLite for table indexes, MySQL, Postgres,
MongoDB, …

Center for Distributed Computing, Jadavpur University

You might also like