Lec10 Handout

Part II: Sorting and Searching
Lecture 10: B-Trees

Motivation
An AVL tree is an excellent data structure for search, insertion and deletion
Each operation on an n-node AVL tree takes O(log n) time
It works well as long as the entire data structure fits in main memory
When the data is too large and has to reside on disk, the performance of an AVL tree may deteriorate rapidly

A Practical Example
For a typical machine
Main memory: 100 nanoseconds per access (a nanosecond is 10^-9 second)
Hard disk: 0.01 seconds per access
Note that the hard disk is 5 orders of magnitude slower than main memory
A database with 10^9 items (assume it doesn’t fit in memory)
A successful search in an AVL tree needs about log_2 10^9 ≈ 30 disk accesses, which take around 0.3 second.
This is way too slow!!
We want to reduce the number of disk accesses to a very small constant

From Binary to M-ary
Idea: allow the nodes to have many children
Fewer disk accesses = smaller tree height = more branching
As branching increases, the height decreases
An m-ary tree allows m-way branching
Each internal node has at most m children
A complete m-ary tree has height roughly log_m n instead of log_2 n
If m = 100, then log_100 10^9 = 4.5 < 5
Thus, we can speed up the search significantly
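The arithmetic above can be checked with a few lines of Python. This is only a rough sketch: it assumes one disk access per tree level and reuses the 0.01 s access time and 10^9 items from the example; the names ITEMS, DISK_ACCESS_S and tree_height are illustrative, not from the lecture.

```python
import math

ITEMS = 10**9            # database size from the example
DISK_ACCESS_S = 0.01     # seconds per disk access from the example

def tree_height(n, branching):
    """Approximate height of a complete tree with the given branching factor."""
    return math.log(n) / math.log(branching)

binary = tree_height(ITEMS, 2)     # close to 30 levels
wide = tree_height(ITEMS, 100)     # 4.5 levels

print(f"binary:  {binary:.1f} levels, about {binary * DISK_ACCESS_S:.2f} s per search")
print(f"100-ary: {wide:.1f} levels, about {wide * DISK_ACCESS_S:.3f} s per search")
```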

B-Trees
A B-tree of minimum degree t ≥ 2 is a 2t-ary tree with the following properties:
1 Every node x has the following fields:
a. n[x], the number of keys currently stored in node x
b. the n[x] keys themselves, stored in nondecreasing order
c. n[x] + 1 pointers c_1[x], c_2[x], . . . , c_{n[x]+1}[x] to its children (leaf nodes have no children, so their c_i fields are undefined)
2 The keys key_i[x] separate the ranges of keys stored in each subtree: if k_i is any key stored in the subtree with root c_i[x], then k_1 ≤ key_1[x] ≤ k_2 ≤ key_2[x] ≤ · · · ≤ key_{n[x]}[x] ≤ k_{n[x]+1}
3 All leaves appear at the same level
4 Every node other than the root has at least t − 1 keys. Every internal node other than the root thus has at least t children. If the tree is not empty, the root has at least one key
5 Every node has at most 2t − 1 keys. Every internal node thus has at most 2t children
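One way to make properties 1 to 5 concrete is a small checker. The sketch below is an illustration only: the Node class and the check function are hypothetical names, keys are single characters, and n[x] is simply len(node.keys). It returns the height of the tree if every property holds and raises AssertionError otherwise.

```python
class Node:
    """A B-tree node: sorted keys plus len(keys) + 1 children (none for a leaf)."""
    def __init__(self, keys, children=None):
        self.keys = list(keys)
        self.children = list(children or [])

def check(node, t, lo=None, hi=None, is_root=True):
    """Return the subtree height if properties 1-5 hold, else raise AssertionError."""
    n = len(node.keys)
    assert n <= 2 * t - 1, "too many keys (property 5)"
    assert n >= (1 if is_root else t - 1), "too few keys (property 4)"
    assert node.keys == sorted(node.keys), "keys out of order (property 1b)"
    assert all(lo is None or lo <= k for k in node.keys), "range violated (property 2)"
    assert all(hi is None or k <= hi for k in node.keys), "range violated (property 2)"
    if not node.children:
        return 0
    assert len(node.children) == n + 1, "wrong child count (property 1c)"
    bounds = [lo] + node.keys + [hi]
    heights = {check(c, t, bounds[i], bounds[i + 1], False)
               for i, c in enumerate(node.children)}
    assert len(heights) == 1, "leaves at different levels (property 3)"
    return heights.pop() + 1

# A tree with t = 2: root P, internal nodes CGM and TX, seven leaves.
root = Node("P", [
    Node("CGM", [Node("AB"), Node("DEF"), Node("JKL"), Node("NO")]),
    Node("TX", [Node("QRS"), Node("UW"), Node("YZ")]),
])
print(check(root, t=2))
```

The checker relies on assert statements, so it should not be run with Python's -O flag, which strips them.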
B-Tree Example
[Figure: a B-tree with t = 2, listed level by level]
P
C G M | T X
A B | D E F | J K L | N O | Q R S | U W | Y Z
t = 2: the simplest B-tree
Every node has at least one key. Every internal node thus has at least 2 children
Every node has at most 3 keys. Every internal node thus has at most 4 children
A node is full if it contains exactly 2t − 1 keys (e.g., the nodes C G M, D E F, J K L and Q R S in the above example)

Height of B-Tree
Consider the worst case
the root contains one key
all other nodes contain t − 1 keys
which implies,
1 node at depth 0; 2 nodes at depth 1; 2t nodes at depth 2; $2t^2$ nodes at depth 3; . . .; $2t^{h-1}$ nodes at depth h
Thus, for any n-key B-tree of minimum degree t ≥ 2 and height h,

$$n \ge 1 + (t-1)\sum_{i=1}^{h} 2t^{i-1} = 1 + 2(t-1)\,\frac{t^{h}-1}{t-1} = 2t^{h} - 1.$$

Therefore, $h \le \log_t \frac{n+1}{2}$.
Compared with AVL trees, a factor of about log t is saved in the number of nodes examined for most tree operations.
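The bound can be evaluated numerically. The following sketch uses two hypothetical helper names, min_keys and max_height, and works in exact integer arithmetic rather than calling a floating-point logarithm.

```python
def min_keys(t, h):
    """Fewest keys in a B-tree of minimum degree t and height h: 2*t**h - 1."""
    return 2 * t**h - 1

def max_height(n, t):
    """Largest h with 2*t**h - 1 <= n, i.e. floor(log_t((n + 1) / 2))."""
    h = 0
    while min_keys(t, h + 1) <= n:
        h += 1
    return h

print(min_keys(2, 3))            # sparsest tree of height 3 with t = 2
print(max_height(10**9, 100))    # height bound for 10^9 keys with t = 100
```

With t = 100 (an assumed value, chosen to match the earlier 100-way branching example), a tree holding 10^9 keys has height at most 4.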
Insertion
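Basically follows the insertion strategy of a binary search tree; however, the new key is inserted into an existing leaf node
[Figure: Insert V. The tree after insertion, with V placed in the existing leaf U W]
P
C G M | T X
A B | D E F | J K L | N O | Q R S | U V W | Y Z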

Insertion: How to deal with a full node?
To deal with a full node, split it first!
Suppose we are given a nonfull internal node x, an index i, and a node y such that y = c_i[x] is a full child of x.
[Figure: splitting a full child, t = 2]
Before: x = G M; y = c_i[x] = J K L, with subtrees T1 T2 T3 T4
After: x = G K M; y = c_i[x] = J, with subtrees T1 T2; z = c_{i+1}[x] = L, with subtrees T3 T4
1 split the full node y (having 2t − 1 keys) around its median key key_t[y] into two nodes having t − 1 keys each
2 move key_t[y] up into y's parent x to separate the two nodes
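The two steps above can be sketched as follows. This is a minimal illustration, assuming a hypothetical Node class with Python lists for keys and children and 0-based indices (so key_t[y] is y.keys[t - 1]); the leaves A B and N O in the usage example are made up to give x a full set of children.

```python
class Node:
    def __init__(self, keys, children=None):
        self.keys = list(keys)
        self.children = list(children or [])   # empty list for a leaf

def split_child(x, i, t):
    """Split the full child y = x.children[i] around its median key (0-based i)."""
    y = x.children[i]
    assert len(y.keys) == 2 * t - 1, "y must be full"
    assert len(x.keys) < 2 * t - 1, "x must be nonfull"
    median = y.keys[t - 1]                       # key_t[y] in the slide notation
    z = Node(y.keys[t:], y.children[t:])         # upper t - 1 keys and t children
    y.keys, y.children = y.keys[:t - 1], y.children[:t]
    x.keys.insert(i, median)                     # median now separates y and z
    x.children.insert(i + 1, z)

x = Node("GM", [Node("AB"), Node("JKL"), Node("NO")])
split_child(x, 1, t=2)
print(x.keys, [c.keys for c in x.children])
```

After the call, x holds G K M and its second and third children hold J and L, matching the figure.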
Insertion Strategy
Question
How can we ensure that the parent of a full node is not full?
Answer
Split each full node encountered along the path from the root to the leaf where the new key will be inserted
A key can be inserted into a B-tree in a single pass down the tree from the root to a leaf
Splitting the root is the only way to increase the height of a B-tree
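A single-pass insertion along these lines might look like the sketch below. It is not the lecture's own code: Node, split_child and insert are hypothetical names, keys are assumed to be distinct, and the usage example rebuilds the t = 2 example tree from the earlier slides before inserting H.

```python
import bisect

class Node:
    def __init__(self, keys=(), children=()):
        self.keys = list(keys)
        self.children = list(children)   # empty for a leaf

def split_child(x, i, t):
    """Split the full child x.children[i]; its median key moves up into x."""
    y = x.children[i]
    z = Node(y.keys[t:], y.children[t:])
    x.keys.insert(i, y.keys[t - 1])
    x.children.insert(i + 1, z)
    y.keys, y.children = y.keys[:t - 1], y.children[:t]

def insert(root, k, t):
    """Insert k in one downward pass; return the (possibly new) root."""
    if len(root.keys) == 2 * t - 1:          # full root: the tree grows in height
        root = Node([], [root])
        split_child(root, 0, t)
    x = root
    while x.children:                        # invariant: x is nonfull
        i = sum(key < k for key in x.keys)   # child whose range contains k
        if len(x.children[i].keys) == 2 * t - 1:
            split_child(x, i, t)
            if k > x.keys[i]:                # k belongs to the new right half
                i += 1
        x = x.children[i]
    bisect.insort(x.keys, k)                 # x is a nonfull leaf
    return root

leaves = [Node(s) for s in ("AB", "DEF", "JKL", "NO", "QRS", "UW", "YZ")]
root = Node("P", [Node("CGM", leaves[:4]), Node("TX", leaves[4:])])
root = insert(root, "H", t=2)
print(root.keys, [c.keys for c in root.children])
```

Because every full node on the path is split before the descent continues, the parent of any node being split is always nonfull, so no step ever has to go back up the tree.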

Insertion: Example
(a) initial tree
P
C G M | T X
(b) insert H: split the encountered full node C G M
G P
C | M | T X
(c) insert H: split the encountered full node J K L
G P
C | K M | T X
A B | D E F | J | L | N O | Q R S | U W | Y Z
(d) insert H: insert into an existing nonfull leaf node
G P
C | K M | T X
A B | D E F | H J | L | N O | Q R S | U W | Y Z

Deletion
Basically follows the deletion strategy of a binary search tree
Trivial case: the leaf that contains the key to be deleted is not small (i.e., before deletion, the number of its keys is at least t)
[Figure: Delete E. The tree after deletion, with E removed from the leaf D E F]
P
C G M | T X
A B | D F | J K L | N O | Q R S | U W | Y Z

Deletion Strategy
Question
How can we delete a key in one downward pass without backing up?
Answer
Invariant: whenever a key is deleted from the subtree rooted at a node x other than the root, the number of keys in x is at least the minimum degree t.
Under the above invariant, if the key is in a leaf, we can delete this key without any worry
But we still have several cases to consider
Case 1 the key is in the internal node x
Case 2 the key is not in the internal node x

Deletion: Case 1a
Case 1: the key k is in the internal node x
a. If the child y that precedes k in node x has at least t keys
1 find the predecessor k′ of k in the subtree rooted at y
2 recursively delete k′
3 replace k by k′ in x
[Figure: Delete G]
Before:
P
C G M | T X
A B | D F | J K L | N O | Q R S | U W | Y Z
After:
P
C F M | T X
A B | D | J K L | N O | Q R S | U W | Y Z
the predecessor F of G is moved up to take G's position

Deletion: Case 1b
b. If the child z that follows k in node x has at least t keys
1 find the successor k′ of k in the subtree rooted at z
2 recursively delete k′
3 replace k by k′ in x
[Figure: Delete F]
Before:
P
C F M | T X
A B | D | J K L | N O | Q R S | U W | Y Z
After:
P
C J M | T X
A B | D | K L | N O | Q R S | U W | Y Z
the successor J of F is moved up to take F's position

Deletion: Case 1c
c. If both y and z have only t − 1 keys
1 merge k and z into y (y now contains 2t − 1 keys)
2 recursively delete k from y
[Figure: Delete J]
Before (L already deleted, trivial case):
P
C J M | T X
A B | D | K | N O | Q R S | U W | Y Z
After:
P
C M | T X
A B | D K | N O | Q R S | U W | Y Z
J is pushed down to make node D J K, from where J is deleted

Deletion: Case 2a
Case 2: the key k is not in the internal node x. Determine the root c_i[x] of the subtree that contains k. If c_i[x] has only t − 1 keys:
a. If c_i[x] has an immediate sibling with at least t keys
1 give c_i[x] an extra key by moving a key from x down into c_i[x]
2 move a key from c_i[x]'s immediate left or right sibling up into x
3 move the appropriate child pointer from the sibling into c_i[x]
4 recursively delete k from the appropriate child of x
[Figure: Delete B]
Before (A already deleted, trivial case):
P
C M | T X
B | D K | N O | Q R S | U W | Y Z
After:
P
D M | T X
C | K | N O | Q R S | U W | Y Z
C is moved down to fill B's position, and D is moved up to fill C's

Deletion: Case 2b
Case 2: the key k is not in the internal node x. Determine the root c_i[x] of the subtree that contains k. If c_i[x] has only t − 1 keys:
b. If c_i[x] and both of its immediate siblings have t − 1 keys
1 merge c_i[x] with one sibling, moving a key from x down into the merged node as its median key
2 recursively delete k from the appropriate child of x
[Figure: Delete C]
Before:
P
D M | T X
C | K | N O | Q R S | U W | Y Z
After:
P
M | T X
D K | N O | Q R S | U W | Y Z
D is pushed down to get node C D K, from where C is deleted
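The deletion cases can be combined into one downward pass, sketched below. This is an illustrative sketch rather than the lecture's code: Node, merge, delete and delete_key are hypothetical names, keys are assumed distinct, and case 1a/1b overwrites k with k′ before deleting k′ from the child, which has the same effect as the order on the slides. The usage example replays the deletions G, F, L, J, A, B, C from the slides on the tree obtained after deleting E.

```python
class Node:
    def __init__(self, keys=(), children=()):
        self.keys = list(keys)
        self.children = list(children)   # empty for a leaf

def merge(x, i):
    """Merge child i, key i of x and child i + 1 into one node; return it."""
    y, z = x.children[i], x.children.pop(i + 1)
    y.keys += [x.keys.pop(i)] + z.keys
    y.children += z.children
    return y

def delete(x, k, t):
    """Delete k from the subtree rooted at x (x is the root or has >= t keys)."""
    i = sum(key < k for key in x.keys)
    if i < len(x.keys) and x.keys[i] == k:
        if not x.children:                       # key in a leaf: just remove it
            x.keys.pop(i)
            return
        y, z = x.children[i], x.children[i + 1]
        if len(y.keys) >= t:                     # case 1a: use the predecessor
            node = y
            while node.children:
                node = node.children[-1]
            x.keys[i] = node.keys[-1]
            delete(y, x.keys[i], t)
        elif len(z.keys) >= t:                   # case 1b: use the successor
            node = z
            while node.children:
                node = node.children[0]
            x.keys[i] = node.keys[0]
            delete(z, x.keys[i], t)
        else:                                    # case 1c: merge, delete from y
            delete(merge(x, i), k, t)
        return
    if not x.children:
        return                                   # k is not in the tree
    c = x.children[i]
    if len(c.keys) == t - 1:                     # case 2: give c an extra key
        left = x.children[i - 1] if i > 0 else None
        right = x.children[i + 1] if i + 1 < len(x.children) else None
        if left is not None and len(left.keys) >= t:       # 2a, left sibling
            c.keys.insert(0, x.keys[i - 1])
            x.keys[i - 1] = left.keys.pop()
            if left.children:
                c.children.insert(0, left.children.pop())
        elif right is not None and len(right.keys) >= t:   # 2a, right sibling
            c.keys.append(x.keys[i])
            x.keys[i] = right.keys.pop(0)
            if right.children:
                c.children.append(right.children.pop(0))
        else:                                              # 2b: merge
            c = merge(x, i) if right is not None else merge(x, i - 1)
    delete(c, k, t)

def delete_key(root, k, t):
    """Delete k and return the root (a merge may empty the old root)."""
    delete(root, k, t)
    if not root.keys and root.children:
        root = root.children[0]                  # the height shrinks by one
    return root

leaves = [Node(s) for s in ("AB", "DF", "JKL", "NO", "QRS", "UW", "YZ")]
root = Node("P", [Node("CGM", leaves[:4]), Node("TX", leaves[4:])])
for key in "GFLJABC":
    root = delete_key(root, key, t=2)
print(root.children[0].keys, [c.keys for c in root.children[0].children])
```

Topping the child up to t keys before descending is what keeps the invariant true, so the recursion never needs to return to a parent to repair it.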

