
Part II: Sorting and Searching

Lecture 10: B-Trees

Lecture 10: B-Trees Part II: Sorting and Searching 1 / 19


Motivation

The AVL tree is an excellent data structure for search, insertion and deletion

Each operation on an n-node AVL tree takes O(log n) time

It works well as long as the entire data structure fits into main memory

When the data is too large and has to reside on disk, the performance of an AVL tree may deteriorate rapidly



A Practical Example

For a typical machine:

Main memory: 100 nanoseconds per access (a nanosecond is $10^{-9}$ second)
Hard disk: 0.01 seconds per access
Note that the hard disk is 5 orders of magnitude slower than main memory

A database with $10^9$ items (assume it doesn't fit in memory)

A successful search in an AVL tree needs about $\log_2 10^9 \approx 30$ disk accesses, which take around 0.3 second.
This is way too slow!

We want to reduce the number of disk accesses to a very small constant



From Binary to M-ary

Idea: allow the nodes to have many children

Fewer disk accesses = smaller tree height = more branching

As branching increases, the height decreases

An m-ary tree allows m-way branching
Each internal node has at most m children

A complete m-ary tree has height roughly $\log_m n$ instead of $\log_2 n$
If m = 100, then $\log_{100} 10^9 = 4.5 < 5$
Thus, we can speed up the search significantly
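A minimal Python sketch of the arithmetic above, assuming one disk access per tree level and the slide's 0.01 s per access (the name `tree_levels` is illustrative only):

```python
import math

# Sketch: tree_levels is an illustrative name, not from the lecture.
DISK_ACCESS_SECONDS = 0.01   # the slide's figure for one hard-disk access

def tree_levels(n: int, m: int) -> int:
    """Levels of a complete m-ary tree holding n items, rounded up.

    Uses a floating-point logarithm, which is fine for a rough estimate.
    """
    return math.ceil(math.log(n, m))

for m in (2, 100):
    levels = tree_levels(10**9, m)
    # a root-to-leaf search reads one node (one disk block) per level
    print(f"m = {m}: {levels} disk accesses, about {levels * DISK_ACCESS_SECONDS:.2f} s")
```

For $n = 10^9$ this gives 30 levels for m = 2 and 5 levels for m = 100.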



B-Trees
A B-tree of minimum degree t ≥ 2 is a 2t-ary tree with the
following properties:
1 Every node x has the following fields:
a. n[x], the number of keys currently stored in node x
b. the n[x] keys themselves, $key_1[x] \le key_2[x] \le \cdots \le key_{n[x]}[x]$, stored in nondecreasing order
c. n[x] + 1 pointers $c_1[x], c_2[x], \ldots, c_{n[x]+1}[x]$ to its children
(Leaf nodes have no children, so their $c_i$ fields are undefined)
2 The keys $key_i[x]$ separate the ranges of keys stored in each subtree: if $k_i$ is any key stored in the subtree with root $c_i[x]$, then
$k_1 \le key_1[x] \le k_2 \le key_2[x] \le \cdots \le key_{n[x]}[x] \le k_{n[x]+1}$
3 All leaves appear at the same level
4 Every node other than the root has at least t − 1 keys. Every internal node other than the root thus has at least t children.
If the tree is not empty, the root has at least one key
5 Every node has at most 2t − 1 keys. Every internal node thus has at most 2t children
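The five properties can be turned into a checker. This is a minimal Python sketch, assuming a node is a dataclass holding Python lists `keys` and `children` (empty for a leaf) and that indices are 0-based rather than the slide's 1-based; `Node`, `check_btree` and `build` are illustrative names. Because it relies on `assert`, it is disabled when Python runs with `-O`.

```python
from dataclasses import dataclass, field

# Sketch: Node, check_btree and build are illustrative names, not from the lecture.
@dataclass
class Node:
    keys: list = field(default_factory=list)      # key_1[x] .. key_n[x]
    children: list = field(default_factory=list)  # c_1[x] .. c_{n+1}[x]; empty for a leaf

    @property
    def leaf(self) -> bool:
        return not self.children

def check_btree(root: Node, t: int) -> int:
    """Assert properties 1-5 for minimum degree t; return the height (root at depth 0)."""
    assert t >= 2
    leaf_depths = set()

    def walk(x: Node, depth: int, lo, hi, is_root: bool) -> None:
        n = len(x.keys)
        assert all(a <= b for a, b in zip(x.keys, x.keys[1:]))      # 1b: sorted keys
        assert all((lo is None or lo <= k) and (hi is None or k <= hi)
                   for k in x.keys)                                 # 2: key ranges
        assert (1 if is_root else t - 1) <= n <= 2 * t - 1         # 4 and 5: key counts
        if x.leaf:
            leaf_depths.add(depth)
            return
        assert len(x.children) == n + 1                             # 1c: n[x] + 1 children
        bounds = [lo] + x.keys + [hi]   # child i holds keys between bounds[i] and bounds[i+1]
        for i, c in enumerate(x.children):
            walk(c, depth + 1, bounds[i], bounds[i + 1], False)

    if root.keys:                       # an empty tree has nothing to check
        walk(root, 0, None, None, True)
    assert len(leaf_depths) <= 1        # 3: all leaves at the same depth
    return leaf_depths.pop() if leaf_depths else 0

def build(spec) -> Node:
    """Illustrative helper: 'AB' is a leaf, ('CGM', [child specs]) an internal node."""
    if isinstance(spec, str):
        return Node(list(spec))
    keys, kids = spec
    return Node(list(keys), [build(k) for k in kids])

# the t = 2 example tree shown on the following slide
example = build(("P", [("CGM", ["AB", "DEF", "JKL", "NO"]),
                       ("TX", ["QRS", "UW", "YZ"])]))
height = check_btree(example, t=2)
```

For the example tree the checker passes and returns height 2 (root at depth 0, leaves at depth 2).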
B-Tree Example

[P]
[C G M] [T X]
[A B] [D E F] [J K L] [N O] [Q R S] [U W] [Y Z]

t = 2: the simplest B-tree


Every node has at least one key. Every internal node thus has
at least 2 children
Every node has at most 3 keys. Every internal node thus has
at most 4 children

A node is full if it contains exactly 2t − 1 keys (e.g., [C G M], [D E F], [J K L] and [Q R S] in the example above)



Height of B-Tree
Consider the worst case
the root contains one key
all other nodes contain t − 1 keys
which implies,
1 node at depth 0; 2 nodes at depth 1; 2t nodes at depth 2;
$2t^2$ nodes at depth 3; . . .; $2t^{h-1}$ nodes at depth h
Thus, for any n-key B-tree of minimum degree t ≥ 2 and height h,
$$n \ge 1 + (t-1)\sum_{i=1}^{h} 2t^{i-1} = 1 + 2(t-1)\cdot\frac{t^h - 1}{t - 1} = 2t^h - 1.$$
Therefore, $h \le \log_t \frac{n+1}{2}$.

Compared with AVL trees, a factor of about $\log_2 t$ is saved in the number of nodes examined for most tree operations.
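The worst-case count and the resulting height bound can be checked numerically. A minimal Python sketch (function names are illustrative), using integer arithmetic instead of a floating-point logarithm:

```python
# Sketch: min_keys and max_height are illustrative names, not from the lecture.
def min_keys(t: int, h: int) -> int:
    """Fewest keys a B-tree of minimum degree t and height h can hold."""
    # worst case from the slide: the root holds 1 key, and depth i (1 <= i <= h)
    # has 2 * t**(i - 1) nodes holding t - 1 keys each
    return 1 + sum(2 * t ** (i - 1) * (t - 1) for i in range(1, h + 1))

def max_height(n: int, t: int) -> int:
    """Largest possible height of an n-key tree (n >= 1): floor(log_t((n + 1) / 2))."""
    h = 0
    while min_keys(t, h + 1) <= n:   # a tree of height h + 1 needs at least this many keys
        h += 1
    return h
```

As an illustrative case (t = 1000 is an assumption, not a figure from the slides), `max_height(10**9, 1000)` is 2, so a search in a billion-key tree examines at most three nodes.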
Insertion

Basically follows the insertion strategy of a binary search tree

However, the new key is inserted into an existing leaf node

[P]
[C G M] [T X]
[A B] [D E F] [J K L] [N O] [Q R S] [U W] [Y Z]

Insert V:
[P]
[C G M] [T X]
[A B] [D E F] [J K L] [N O] [Q R S] [U V W] [Y Z]
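A minimal Python sketch of this simple case, assuming the same list-based node layout as before (`Node`, `find_leaf` and `insert_into_leaf` are illustrative names). It descends as in a binary search tree and inserts into the leaf only when that leaf is not full:

```python
import bisect
from dataclasses import dataclass, field

# Sketch: Node, find_leaf and insert_into_leaf are illustrative names.
@dataclass
class Node:
    keys: list = field(default_factory=list)
    children: list = field(default_factory=list)   # empty for a leaf

def find_leaf(root: Node, k) -> Node:
    """Follow the binary-search-tree-style path for k down to a leaf."""
    x = root
    while x.children:
        # the number of keys <= k is the index of the child whose range holds k
        x = x.children[bisect.bisect_right(x.keys, k)]
    return x

def insert_into_leaf(root: Node, k, t: int) -> None:
    """Insert k into its leaf; this sketch only handles a leaf that is not full."""
    leaf = find_leaf(root, k)
    if len(leaf.keys) == 2 * t - 1:
        raise ValueError("full leaf: it has to be split first")
    bisect.insort(leaf.keys, k)        # keep the leaf's keys sorted

uw = Node(["U", "W"])
root = Node(["P"], [
    Node(["C", "G", "M"], [Node(["A", "B"]), Node(["D", "E", "F"]),
                           Node(["J", "K", "L"]), Node(["N", "O"])]),
    Node(["T", "X"], [Node(["Q", "R", "S"]), uw, Node(["Y", "Z"])]),
])
insert_into_leaf(root, "V", t=2)       # lands in [U W], giving [U V W]
```

Inserting H instead would reach the full leaf [J K L] and raise, which is the situation handled by splitting.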



Insertion: How to deal with a full node?
To deal with a full node, split it first!
Given a nonfull internal node x, an index i, and a node y such
that $y = c_i[x]$ is a full child of x (example with t = 2):

Before: x = [G M] (G = $key_{i-1}[x]$, M = $key_i[x]$); y = $c_i[x]$ = [J K L] with subtrees T1 T2 T3 T4
After: x = [G K M] (K = $key_i[x]$, M = $key_{i+1}[x]$); y = $c_i[x]$ = [J] with subtrees T1 T2; z = $c_{i+1}[x]$ = [L] with subtrees T3 T4

1 split the full node y (having 2t − 1 keys) around its median key $key_t[y]$ into two nodes having t − 1 keys each
2 move $key_t[y]$ up into y's parent x to separate the two nodes
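The two steps can be sketched in Python as follows, assuming list-based nodes and a 0-based index `i` (so the slide's $key_t[y]$ is `y.keys[t - 1]`); `Node` and `split_child` are illustrative names:

```python
from dataclasses import dataclass, field

# Sketch: Node and split_child are illustrative names, not from the lecture.
@dataclass
class Node:
    keys: list = field(default_factory=list)
    children: list = field(default_factory=list)   # empty for a leaf

def split_child(x: Node, i: int, t: int) -> None:
    """Split the full child y = x.children[i] (0-based i) around its median key."""
    y = x.children[i]
    assert len(y.keys) == 2 * t - 1 and len(x.keys) < 2 * t - 1
    median = y.keys[t - 1]                        # key_t[y] in the slide's 1-based notation
    z = Node(y.keys[t:], y.children[t:])          # right half: t - 1 keys, t children
    y.keys, y.children = y.keys[:t - 1], y.children[:t]   # left half: t - 1 keys
    x.keys.insert(i, median)                      # median moves up into x
    x.children.insert(i + 1, z)                   # z becomes the child right after y

# the slide's picture with t = 2: x = [G M], y = [J K L] with subtrees T1..T4
subtrees = [Node([f"T{j}"]) for j in range(1, 5)]   # placeholder labels, not real keys
y = Node(["J", "K", "L"], subtrees)
x = Node(["G", "M"], [Node(["D"]), y, Node(["N"])])
split_child(x, 1, t=2)
```

After the call, x holds [G K M], y keeps [J] with T1 T2, and the new node z = [L] gets T3 T4, matching the picture. For a leaf y the child slices are simply empty.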
Insertion Strategy

Question
How can we assure that the parent of a full node is not full?

Answer
Split each full node along the path from the root to the leaf where
the new key will be inserted

A key can be inserted into a B-tree in a single pass down the tree from the root to a leaf

Splitting the root is the only way to increase the height of a B-tree
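Putting the pieces together, a minimal Python sketch of single-pass insertion with preemptive splitting. Assumptions: list-based nodes, 0-based indices, no special handling of duplicate keys; `BTree`, `build`, `shape` and `inorder` are illustrative names.

```python
import bisect
from dataclasses import dataclass, field

# Sketch: Node, BTree, build, shape and inorder are illustrative names.
@dataclass
class Node:
    keys: list = field(default_factory=list)
    children: list = field(default_factory=list)   # empty for a leaf

class BTree:
    def __init__(self, t: int):
        assert t >= 2
        self.t = t
        self.root = Node()

    def _full(self, x: Node) -> bool:
        return len(x.keys) == 2 * self.t - 1

    def _split_child(self, x: Node, i: int) -> None:
        """Split the full child x.children[i] around its median (0-based i)."""
        t, y = self.t, x.children[i]
        z = Node(y.keys[t:], y.children[t:])
        x.keys.insert(i, y.keys[t - 1])          # median moves up into x
        x.children.insert(i + 1, z)
        y.keys, y.children = y.keys[:t - 1], y.children[:t]

    def insert(self, k) -> None:
        if self._full(self.root):
            # splitting the root is the only way the tree gets taller
            old = self.root
            self.root = Node([], [old])
            self._split_child(self.root, 0)
        x = self.root
        while x.children:                        # one pass from the root down to a leaf
            i = bisect.bisect_right(x.keys, k)
            if self._full(x.children[i]):
                self._split_child(x, i)          # x is nonfull, so it can take the median
                if k > x.keys[i]:                # k belongs to the right half
                    i += 1
            x = x.children[i]
        bisect.insort(x.keys, k)                 # x is a nonfull leaf

def build(spec) -> Node:
    """'AB' is a leaf, ('CGM', [child specs]) an internal node."""
    if isinstance(spec, str):
        return Node(list(spec))
    return Node(list(spec[0]), [build(c) for c in spec[1]])

def shape(x: Node):
    """Inverse of build, for comparing trees."""
    keys = "".join(x.keys)
    return keys if not x.children else (keys, [shape(c) for c in x.children])

def inorder(x: Node):
    """Yield all keys in sorted order."""
    for i, k in enumerate(x.keys):
        if x.children:
            yield from inorder(x.children[i])
        yield k
    if x.children:
        yield from inorder(x.children[-1])

# the slides' example: insert H into the initial tree (a)
slides = BTree(t=2)
slides.root = build(("P", [("CGM", ["AB", "DEF", "JKL", "NO"]),
                           ("TX", ["QRS", "UW", "YZ"])]))
slides.insert("H")

# a fresh tree filled with distinct keys in scrambled order
fresh = BTree(t=3)
for j in range(100):
    fresh.insert(j * 37 % 101)
```

Replaying the insertion of H reproduces tree (d) of the worked example: [C G M] is split on the way down, then [J K L], and H finally lands in leaf [J], giving [H J].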



Insertion: Example

(a) initial tree
[P]
[C G M] [T X]
[A B] [D E F] [J K L] [N O] [Q R S] [U W] [Y Z]

(b) insert H: split the encountered full node [C G M]
[G P]
[C] [M] [T X]
[A B] [D E F] [J K L] [N O] [Q R S] [U W] [Y Z]



Insertion: Example

(c) insert H: split the encountered full node [J K L]
[G P]
[C] [K M] [T X]
[A B] [D E F] [J] [L] [N O] [Q R S] [U W] [Y Z]

(d) insert H: insert into an existing nonfull leaf node
[G P]
[C] [K M] [T X]
[A B] [D E F] [H J] [L] [N O] [Q R S] [U W] [Y Z]



Deletion
Basically follows the deletion strategy of a binary search tree
Trivial case: the leaf that contains the deleted key is not small
(i.e., before deletion, the number of its keys is at least t)

[P]
[C G M] [T X]
[A B] [D E F] [J K L] [N O] [Q R S] [U W] [Y Z]

Delete E:
[P]
[C G M] [T X]
[A B] [D F] [J K L] [N O] [Q R S] [U W] [Y Z]



Deletion Strategy

Question
How can we delete a key in one downward pass without having to back up?

Answer
Invariant: whenever the deletion descends into a node x other than the root, x has at least t keys (one more than the minimum t − 1).

Under the above invariant, if the key is in a leaf, we can delete this key without any worry
But we still have several cases to consider
Case 1 the key is in the internal node x
Case 2 the key is not in the internal node x
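To make the case analysis concrete, here is a hypothetical Python helper that only classifies which case applies at a node x; it changes nothing. It assumes list-based nodes and 0-based indices; the labels "1a" to "2b" follow the case slides, while "leaf" and "descend" are labels added for this sketch.

```python
import bisect
from dataclasses import dataclass, field

# Sketch: Node, deletion_case and node are illustrative names, not from the lecture.
@dataclass
class Node:
    keys: list = field(default_factory=list)
    children: list = field(default_factory=list)   # empty for a leaf

def deletion_case(x: Node, k, t: int) -> str:
    """Return which case applies when deleting k at node x."""
    if not x.children:
        return "leaf"                      # remove k here if present
    if k in x.keys:                        # Case 1: k is in the internal node x
        i = x.keys.index(k)
        y, z = x.children[i], x.children[i + 1]   # children preceding / following k
        if len(y.keys) >= t:
            return "1a"
        if len(z.keys) >= t:
            return "1b"
        return "1c"
    i = bisect.bisect_right(x.keys, k)     # Case 2: c_i[x] is the subtree that must hold k
    if len(x.children[i].keys) >= t:
        return "descend"                   # invariant already holds for c_i[x]
    neighbours = [x.children[j] for j in (i - 1, i + 1) if 0 <= j < len(x.children)]
    return "2a" if any(len(s.keys) >= t for s in neighbours) else "2b"

def node(keys: str, *leaves: str) -> Node:
    """Illustrative helper: an internal node whose children are all leaves."""
    return Node(list(keys), [Node(list(s)) for s in leaves])
```

Applied to the nodes from the examples that follow (with t = 2), it picks 1a for deleting G from [C G M] over [A B] [D F] [J K L] [N O], and 2b for deleting C from [D M] over [C] [K] [N O].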



Deletion: Case 1a
Case 1: the key k is in the internal node x
a. If the child y that precedes k in node x has at least t keys
1 find the predecessor k′ of k in the subtree rooted at y
2 recursively delete k′
3 replace k by k′ in x

[P]
[C G M] [T X]
[A B] [D F] [J K L] [N O] [Q R S] [U W] [Y Z]

Delete G:
[P]
[C F M] [T X]
[A B] [D] [J K L] [N O] [Q R S] [U W] [Y Z]

the predecessor F of G is moved up to take G's position
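A minimal Python sketch of case 1a on this example, assuming list-based nodes (`Node` and `predecessor` are illustrative names). The predecessor is the last key reached by following rightmost children; because y is a leaf with at least t keys here, the recursive delete of k′ reduces to the trivial leaf case:

```python
from dataclasses import dataclass, field

# Sketch: Node and predecessor are illustrative names, not from the lecture.
@dataclass
class Node:
    keys: list = field(default_factory=list)
    children: list = field(default_factory=list)   # empty for a leaf

def predecessor(y: Node):
    """Largest key in the subtree rooted at y: keep taking the last child."""
    while y.children:
        y = y.children[-1]
    return y.keys[-1]

t = 2
x = Node(list("CGM"), [Node(list(s)) for s in ("AB", "DF", "JKL", "NO")])
i = x.keys.index("G")
y = x.children[i]                  # child preceding G
assert len(y.keys) >= t            # condition for case 1a
k_pred = predecessor(y)            # step 1: k' = F
y.keys.remove(k_pred)              # step 2: trivial leaf delete in this example
x.keys[i] = k_pred                 # step 3: F takes G's position
```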


Deletion: Case 1b
Case 1: the key k is in the internal node x
b. If the child z that follows k in node x has at least t keys
1 find the successor k′ of k in the subtree rooted at z
2 recursively delete k′
3 replace k by k′ in x

[P]
[C F M] [T X]
[A B] [D] [J K L] [N O] [Q R S] [U W] [Y Z]

Delete F:
[P]
[C J M] [T X]
[A B] [D] [K L] [N O] [Q R S] [U W] [Y Z]

the successor J of F is moved up to take F's position
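The mirror image for case 1b, again a minimal Python sketch with illustrative names: the successor is the first key reached by following leftmost children, and in this example z is a leaf with at least t keys, so deleting k′ is the trivial case:

```python
from dataclasses import dataclass, field

# Sketch: Node and successor are illustrative names, not from the lecture.
@dataclass
class Node:
    keys: list = field(default_factory=list)
    children: list = field(default_factory=list)   # empty for a leaf

def successor(z: Node):
    """Smallest key in the subtree rooted at z: keep taking the first child."""
    while z.children:
        z = z.children[0]
    return z.keys[0]

t = 2
x = Node(list("CFM"), [Node(list(s)) for s in ("AB", "D", "JKL", "NO")])
i = x.keys.index("F")
y, z = x.children[i], x.children[i + 1]   # children preceding and following F
assert len(y.keys) < t <= len(z.keys)     # case 1a does not apply, case 1b does
k_succ = successor(z)                     # step 1: k' = J
z.keys.remove(k_succ)                     # step 2: trivial leaf delete in this example
x.keys[i] = k_succ                        # step 3: J takes F's position
```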


Deletion: Case 1c
Case 1: the key k is in the internal node x
c. If both y and z have only t − 1 keys
1 merge k and all of z into y, so that x loses both k and its pointer to z (y now contains 2t − 1 keys)
2 recursively delete k from y

After L is deleted (trivial case):
[P]
[C J M] [T X]
[A B] [D] [K] [N O] [Q R S] [U W] [Y Z]

Delete J:
[P]
[C M] [T X]
[A B] [D K] [N O] [Q R S] [U W] [Y Z]

J is pushed down to make node [D J K], from where J is deleted
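The merge step can be sketched in Python as follows, assuming list-based nodes and a 0-based index (`Node` and `merge_children` are illustrative names). The same merge is reused in case 2b:

```python
from dataclasses import dataclass, field

# Sketch: Node and merge_children are illustrative names, not from the lecture.
@dataclass
class Node:
    keys: list = field(default_factory=list)
    children: list = field(default_factory=list)   # empty for a leaf

def merge_children(x: Node, i: int) -> Node:
    """Merge key_i[x] and all of z = c_{i+1}[x] into y = c_i[x] (0-based i)."""
    y, z = x.children[i], x.children[i + 1]
    y.keys += [x.keys.pop(i)] + z.keys    # (t-1) + 1 + (t-1) = 2t - 1 keys
    y.children += z.children              # z's subtrees follow y's
    del x.children[i + 1]                 # x loses its pointer to z
    return y

# slide example after L was deleted: x = [C J M] with leaves [A B] [D] [K] [N O]
x = Node(list("CJM"), [Node(list(s)) for s in ("AB", "D", "K", "NO")])
y = merge_children(x, x.keys.index("J"))  # y = [D J K]
y.keys.remove("J")                        # recursive delete: trivial leaf case here
```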


Deletion: Case 2a
Case 2: the key k is not in the internal node x. Determine the root $c_i[x]$ of the subtree that must contain k (if k is in the tree at all). If $c_i[x]$ has only t − 1 keys:
a. If $c_i[x]$ has an immediate sibling with at least t keys
1 give $c_i[x]$ an extra key by moving a key from x down into $c_i[x]$
2 move a key from $c_i[x]$'s immediate left or right sibling up into x
3 move the appropriate child pointer from the sibling into $c_i[x]$
4 recursively delete k from the appropriate child of x

After A is deleted (trivial case):
[P]
[C M] [T X]
[B] [D K] [N O] [Q R S] [U W] [Y Z]

Delete B:
[P]
[D M] [T X]
[C] [K] [N O] [Q R S] [U W] [Y Z]

C is moved down to fill B's position, and D is moved up to fill C's
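Steps 1 to 3 amount to a rotation through x. A minimal Python sketch of both directions, assuming list-based nodes and 0-based indices (`Node`, `borrow_from_right` and `borrow_from_left` are illustrative names; the second demo node is a made-up example):

```python
from dataclasses import dataclass, field

# Sketch: Node, borrow_from_right and borrow_from_left are illustrative names.
@dataclass
class Node:
    keys: list = field(default_factory=list)
    children: list = field(default_factory=list)   # empty for a leaf

def borrow_from_right(x: Node, i: int) -> None:
    """Give c_i[x] an extra key through x from its right sibling (0-based i)."""
    c, s = x.children[i], x.children[i + 1]
    c.keys.append(x.keys[i])               # step 1: separating key moves down
    x.keys[i] = s.keys.pop(0)              # step 2: sibling's first key moves up
    if s.children:                         # step 3: sibling's first child moves over
        c.children.append(s.children.pop(0))

def borrow_from_left(x: Node, i: int) -> None:
    """Mirror image: take the extra key from the left sibling c_{i-1}[x]."""
    c, s = x.children[i], x.children[i - 1]
    c.keys.insert(0, x.keys[i - 1])
    x.keys[i - 1] = s.keys.pop()
    if s.children:
        c.children.insert(0, s.children.pop())

# slide example: x = [C M] with children [B], [D K], [N O]; delete B
x = Node(list("CM"), [Node(list(s)) for s in ("B", "DK", "NO")])
borrow_from_right(x, 0)                    # [B] -> [B C], x -> [D M], sibling -> [K]
x.children[0].keys.remove("B")             # step 4: trivial leaf delete here

# mirror case on a small hypothetical node: [N] borrows from its left sibling [A B]
w = Node(["M"], [Node(["A", "B"]), Node(["N"])])
borrow_from_left(w, 1)
```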


Deletion: Case 2b
Case 2: the key k is not in the internal node x. Determine the root $c_i[x]$ of the subtree that must contain k (if k is in the tree at all). If $c_i[x]$ has only t − 1 keys:
b. If $c_i[x]$ and its immediate sibling(s) all have t − 1 keys
1 merge $c_i[x]$ with one sibling, moving the key of x that separates them down to become the median of the merged node
2 recursively delete k from the appropriate child of x

[P]
[D M] [T X]
[C] [K] [N O] [Q R S] [U W] [Y Z]

Delete C:
[P]
[M] [T X]
[D K] [N O] [Q R S] [U W] [Y Z]

D is pushed down to get node [C D K], from where C is deleted
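Combining all the cases gives a one-pass deletion. This is a minimal Python sketch under the same assumptions (list-based nodes, 0-based indices, distinct keys; all names illustrative). It tries the left sibling first in case 2a, a choice the slides leave open, and it also shrinks the tree when a merge empties the root, a situation the slides do not show.

```python
import bisect
from dataclasses import dataclass, field

# Sketch: Node, build, shape, merge and delete are illustrative names, not from the lecture.
@dataclass
class Node:
    keys: list = field(default_factory=list)
    children: list = field(default_factory=list)   # empty for a leaf

def build(spec) -> Node:
    """'AB' is a leaf, ('CGM', [child specs]) an internal node."""
    if isinstance(spec, str):
        return Node(list(spec))
    return Node(list(spec[0]), [build(c) for c in spec[1]])

def shape(x: Node):
    """Inverse of build, for comparing trees."""
    keys = "".join(x.keys)
    return keys if not x.children else (keys, [shape(c) for c in x.children])

def merge(x: Node, i: int) -> Node:
    """Cases 1c/2b: pull key_i[x] and all of c_{i+1}[x] into c_i[x] (0-based i)."""
    y, z = x.children[i], x.children.pop(i + 1)
    y.keys += [x.keys.pop(i)] + z.keys
    y.children += z.children
    return y

def _delete(x: Node, k, t: int) -> None:
    if not x.children:                                # leaf (trivial case)
        if k in x.keys:
            x.keys.remove(k)
        return
    if k in x.keys:                                   # Case 1: k in internal node x
        i = x.keys.index(k)
        y, z = x.children[i], x.children[i + 1]
        if len(y.keys) >= t:                          # 1a: use the predecessor
            p = y
            while p.children:
                p = p.children[-1]
            k2 = p.keys[-1]
            _delete(y, k2, t)
            x.keys[i] = k2
        elif len(z.keys) >= t:                        # 1b: use the successor
            s = z
            while s.children:
                s = s.children[0]
            k2 = s.keys[0]
            _delete(z, k2, t)
            x.keys[i] = k2
        else:                                         # 1c: merge, then recurse
            _delete(merge(x, i), k, t)
        return
    i = bisect.bisect_right(x.keys, k)                # Case 2: c_i[x] must hold k
    c = x.children[i]
    if len(c.keys) == t - 1:
        left = x.children[i - 1] if i > 0 else None
        right = x.children[i + 1] if i + 1 < len(x.children) else None
        if left is not None and len(left.keys) >= t:      # 2a via the left sibling
            c.keys.insert(0, x.keys[i - 1])
            x.keys[i - 1] = left.keys.pop()
            if left.children:
                c.children.insert(0, left.children.pop())
        elif right is not None and len(right.keys) >= t:  # 2a via the right sibling
            c.keys.append(x.keys[i])
            x.keys[i] = right.keys.pop(0)
            if right.children:
                c.children.append(right.children.pop(0))
        elif right is not None:                           # 2b: merge with the right sibling
            c = merge(x, i)
        else:                                             # 2b: merge with the left sibling
            c = merge(x, i - 1)
    _delete(c, k, t)

def delete(root: Node, k, t: int) -> Node:
    """Delete k in one downward pass and return the (possibly new) root."""
    _delete(root, k, t)
    if not root.keys and root.children:   # a merge emptied the root: height shrinks
        return root.children[0]
    return root

# replay the deletions walked through on the slides, with t = 2
root = build(("P", [("CGM", ["AB", "DEF", "JKL", "NO"]),
                    ("TX", ["QRS", "UW", "YZ"])]))
for key in "EGFLJABC":
    root = delete(root, key, 2)
```

Replaying E, G, F, L, J, A, B and C in order reproduces the final tree of this slide: [P] over [M] and [T X], with leaves [D K] [N O] [Q R S] [U W] [Y Z].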

