Multi-level Indexes Using B-Trees and B+-Trees
Husam A. Halim
2020
Hussam A. Halim, Computer Science Dept., 2020
Outline
• B+-Tree Motivation
• Definition of B+-Trees
• Insertion into a B+-Tree
• Deletion from a B+-Tree
• B+-Tree vs. B-Tree
• Determining the Size of a B+-Tree
Objectives
• Learn about B+-Trees.
• Discover how to insert and delete items in a
B+-Tree.
• Explore the differences between B+-Trees and
B-Trees.
• Learn how to organize data in a B+-Tree.
• Learn how to compute the size of a B+-Tree.
B+-Tree Motivation
• The B+ Tree index structure is the most widely
used of several index structures that maintain
efficiency despite insertion and deletion of
data.
• B+-Trees use the idea of a balanced tree, in which every
path from the root of the tree to a leaf has
the same length.
• All leaf nodes are at the same level.
Definition of B+-Trees
• The structure of an internal node: <P1, K1, P2, K2, …, Pq−1, Kq−1, Pq>
– q ≤ P.
– Each Pi is a pointer to another node (a tree pointer).
• The structure of a leaf node: <<K1, Pr1>, <K2, Pr2>, …, <Kq−1, Prq−1>, Pnext>
– Pnext is a pointer to the next leaf node.
– Pri is a data pointer to the record whose key value is equal to Ki (or to the data file block containing that record).
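B+ Tree node Structure
For a node holding keys A and B (A < B):
• the pointer to the left of A leads to values <= A,
• the pointer between A and B leads to values > A and <= B,
• the pointer to the right of B leads to values > B.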
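The routing rule above can be sketched in Python. This is a minimal illustration, not the lecture's own code. The classes `Leaf` and `Internal`, the functions `find_leaf`, `search` and `range_scan`, and the small order-4 tree are all hypothetical, and record pointers are omitted so each leaf stores only its keys. `bisect_left` returns the first position whose key is >= the search value. That sends values <= K_i down the pointer to the left of K_i, which matches the convention in the node figure.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Leaf:
    keys: List[int]                    # K1 < K2 < ... (record pointers omitted)
    next: Optional["Leaf"] = None      # Pnext: the next leaf to the right


@dataclass
class Internal:
    keys: List[int]
    children: list = field(default_factory=list)   # P1 .. Pq


def find_leaf(node, key):
    """Descend from the root: values <= K_i follow the pointer left of K_i,
    values greater than every key follow the last pointer."""
    while isinstance(node, Internal):
        node = node.children[bisect_left(node.keys, key)]
    return node


def search(root, key):
    return key in find_leaf(root, key).keys


def range_scan(root, lo, hi):
    """Find the leaf for lo, then follow Pnext links until keys exceed hi."""
    leaf, out = find_leaf(root, lo), []
    while leaf is not None:
        for k in leaf.keys:
            if k > hi:
                return out
            if k >= lo:
                out.append(k)
        leaf = leaf.next
    return out


# A hypothetical two-level tree of order P = 4 (leaves hold at most 3 keys).
l3 = Leaf([9, 12, 15])
l2 = Leaf([6, 7, 8], l3)
l1 = Leaf([1, 3, 5], l2)
root = Internal([5, 8], [l1, l2, l3])

print(search(root, 7), search(root, 4))   # True False
print(range_scan(root, 4, 9))             # [5, 6, 7, 8, 9]
```

The range scan uses the index only once, to locate the first leaf. After that it follows the Pnext chain, which is the ordered access that the leaf links provide.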
Definition of B+-Trees
• Each node is kept between half-full and completely full.
• A B+-tree of order P:
– Root has between 2 and P pointers (unless it’s a leaf).
– Internal nodes have between ⌊(P-1)/2⌋ and P-1 keys, and
#pointers = #keys + 1.
– Leaf nodes have between ⎡(P-1)/2⎤ and P-1 keys.
• Search-key values are kept in sorted order.
B+-Tree: An example
A B+-Tree of order P = 4
• Internal nodes : ⌊(P-1)/2⌋ ≤ #keys ≤ P-1 → 1 ≤ #keys ≤ 3
• Leaf nodes : ⌈(P-1)/2⌉ ≤ #keys ≤ P-1 → 2 ≤ #keys ≤ 3
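The occupancy bounds can be checked mechanically. Below is a small sketch that assumes the floor and ceiling formulas from the definition above. `key_bounds` and `node_ok` are illustrative names, not part of any library.

```python
import math


def key_bounds(P):
    """(min, max) number of keys per node for a B+-tree of order P."""
    return {
        "internal": ((P - 1) // 2, P - 1),        # floor((P-1)/2) .. P-1
        "leaf": (math.ceil((P - 1) / 2), P - 1),  # ceil((P-1)/2) .. P-1
    }


def node_ok(num_keys, P, leaf, is_root=False):
    """Check the occupancy rules; the root is exempt from the minimum."""
    lo, hi = key_bounds(P)["leaf" if leaf else "internal"]
    if is_root:
        # A non-leaf root needs at least 2 pointers, i.e. at least 1 key.
        lo = 0 if leaf else 1
    return lo <= num_keys <= hi


print(key_bounds(4))   # {'internal': (1, 3), 'leaf': (2, 3)}
print(key_bounds(3))   # {'internal': (1, 2), 'leaf': (1, 2)}
```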
B+-Tree : Insertion
• Find the correct leaf L and place the key in L in sorted order.
Case 1 : Leaf not full -> done!
Case 2 : Leaf overflow
• Split L into L and a new leaf L2:
Left leaf L : records with keys <= middle key.
Right leaf L2 : records with keys > middle key.
• Copy up the middle key, and insert a pointer to L2 into the parent of L.
• If the parent node overflows, go to Case 3.
Case 3 : Non-Leaf overflow
• As in Case 2, split the node evenly, but push up (move, not copy) the middle key.
• If the parent node overflows, repeat Case 3. If the root splits, a new root is created and the tree grows by one level.
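Below is a compact, insertion-only sketch of these three cases in Python. It assumes integer keys, no duplicates, and the convention that keys <= K_i go to child i. `BPlusTree`, `Node`, `levels` and `leaves` are hypothetical names, and record pointers are omitted. Replaying the example that follows (order P = 3, inserting 8 5 1 7 3 12 9 6 15) should reproduce the final tree shown at the end of that example.

```python
from bisect import bisect_left, insort


class Node:
    def __init__(self, leaf):
        self.leaf = leaf
        self.keys = []
        self.children = []   # child nodes (internal nodes only)
        self.next = None     # Pnext (leaf nodes only)


class BPlusTree:
    """Insertion-only B+-tree of order P; keys <= K_i go to child i."""

    def __init__(self, P):
        self.P = P
        self.root = Node(leaf=True)

    def insert(self, key):
        split = self._insert(self.root, key)
        if split is not None:              # the root split: add a new level
            sep, right = split
            new_root = Node(leaf=False)
            new_root.keys = [sep]
            new_root.children = [self.root, right]
            self.root = new_root

    def _insert(self, node, key):
        """Insert below node; return (separator, new right node) if node split."""
        if node.leaf:
            insort(node.keys, key)
            if len(node.keys) <= self.P - 1:       # Case 1: leaf not full
                return None
            # Case 2: leaf overflow. L keeps keys <= middle key, L2 the rest,
            # and the middle key is COPIED up (it also stays in L).
            mid = (len(node.keys) - 1) // 2
            right = Node(leaf=True)
            right.keys, node.keys = node.keys[mid + 1:], node.keys[:mid + 1]
            right.next, node.next = node.next, right
            return node.keys[-1], right
        i = bisect_left(node.keys, key)
        split = self._insert(node.children[i], key)
        if split is None:
            return None
        sep, new_child = split
        node.keys.insert(i, sep)
        node.children.insert(i + 1, new_child)
        if len(node.keys) <= self.P - 1:
            return None
        # Case 3: non-leaf overflow. Split evenly and PUSH up the middle key
        # (it moves to the parent and is removed from this level).
        mid = len(node.keys) // 2
        up = node.keys[mid]
        right = Node(leaf=False)
        right.keys, right.children = node.keys[mid + 1:], node.children[mid + 1:]
        node.keys, node.children = node.keys[:mid], node.children[:mid + 1]
        return up, right

    def levels(self):
        """Key lists level by level, root first."""
        out, level = [], [self.root]
        while level:
            out.append([n.keys for n in level])
            level = [c for n in level for c in n.children]
        return out

    def leaves(self):
        """Walk the leaf chain via Pnext links."""
        node = self.root
        while not node.leaf:
            node = node.children[0]
        out = []
        while node is not None:
            out.append(node.keys)
            node = node.next
        return out


tree = BPlusTree(P=3)
for k in [8, 5, 1, 7, 3, 12, 9, 6, 15]:
    tree.insert(k)
for level in tree.levels():
    print(level)
# [[5, 8]]
# [[3], [7], [12]]
# [[1, 3], [5], [6, 7], [8], [9, 12], [15]]
```

The copy-up versus push-up difference shows in the output. Leaf separators such as 3 and 7 still appear in the leaves. Key 5 was pushed up from an internal node and now appears only in the root.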
Example : A B+ Tree of order P = 3. Insert : 8 5 1 7 3 12 9 6 15
=> Max # of keys in a node = P – 1 = 2
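Insert 8, 5 (Case 1 : Leaf not full): [5 8]
Insert 1 (Case 2 : Leaf overflow, new level): [1 5 8] splits into [1 5] and [8]; 5 is copied up into a new root.
  Root: [5]
  Leaves: [1 5] [8]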
Insert 7 (Case 1 : Leaf not full): [8] becomes [7 8].
Insert 3 (Case 2 : Leaf overflow, split): [1 3 5] splits into [1 3] and [5]; 3 is copied up.
  Root: [3 5]
  Leaves: [1 3] [5] [7 8]
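Insert 12 (Case 2 : Leaf overflow): [7 8 12] splits into [7 8] and [12]; 8 is copied up, so the root becomes [3 5 8].
Case 3 (Non-Leaf overflow, new level): [3 5 8] splits into [3] and [8]; 5 is pushed up into a new root.
  Root: [5]
  Level 1: [3] [8]
  Leaves: [1 3] [5] [7 8] [12]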
Insert 9 (Case 1 : Leaf not full): [12] becomes [9 12].
Insert 6 (Case 2 : Leaf overflow): 6 belongs in [7 8], giving [6 7 8], which must be split.
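Insert 6 (split): [6 7 8] splits into [6 7] and [8]; 7 is copied up, so the parent becomes [7 8].
  Root: [5]
  Level 1: [3] [7 8]
  Leaves: [1 3] [5] [6 7] [8] [9 12]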
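Insert 15 (Case 2 : Leaf overflow): [9 12 15] splits into [9 12] and [15]; 12 is copied up, so the parent becomes [7 8 12].
Case 3 (Non-Leaf overflow): [7 8 12] splits into [7] and [12]; 8 is pushed up into the root.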
Insert 15 (result):
  Root: [5 8]
  Level 1: [3] [7] [12]
  Leaves: [1 3] [5] [6 7] [8] [9 12] [15]
B+-Tree : Deletion
• Find the correct leaf L and delete the entry with key k.
Case 1 : L is still at least half-full
• Update the parent (if needed). Done!
Case 2 : L underflows and has a sibling with enough keys to lend
• Re-distribute: borrow a key from the sibling and update the parent.
Case 3 : L underflows and no sibling can lend a key
• Merge L with its sibling and delete the entry pointing to L from the parent of L.
Case 4 : If a non-leaf node underflows, try to re-distribute; if that fails, merge it with its sibling.
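The leaf-level part of Cases 2 and 3 can be sketched as one helper that works on two adjacent sibling leaves. This illustration rests on several assumptions. `fix_leaf_underflow` is a hypothetical name, and plain keys stand in for full entries. Re-distribution here splits the combined keys evenly, which is one common choice. Borrowing exactly one key, as in the example below, gives the same result when the sibling has exactly one key to spare.

```python
import math


def leaf_min_keys(P):
    return math.ceil((P - 1) / 2)


def fix_leaf_underflow(left, right, P):
    """left, right: sorted keys of two adjacent sibling leaves (same parent),
    one of which has underflowed after a deletion.

    Returns (new_left, new_right, new_separator). new_right is None when the
    leaves were merged; the caller must then delete the parent entry that
    pointed to the removed leaf (which may trigger Case 4 in the parent)."""
    keys = left + right
    if len(keys) >= 2 * leaf_min_keys(P):
        # Case 2: the sibling has a key to spare -> re-distribute evenly.
        half = (len(keys) + 1) // 2
        return keys[:half], keys[half:], keys[half - 1]
    # Case 3: no sibling can lend -> merge into one leaf.
    return keys, None, None


# Delete 12 (P = 3): [12] -> [] next to [8 9]: borrow 9, separator becomes 8.
print(fix_leaf_underflow([8, 9], [], 3))   # ([8], [9], 8)
# Delete 9 (P = 3): [9] -> [] next to [8]: merge.
print(fix_leaf_underflow([8], [], 3))      # ([8], None, None)
```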
Example : A B+ Tree of order P = 3. Delete : 6 12 9 8
=> Min # of keys in a leaf = ⌈(P-1)/2⌉ = 1
=> Min # of keys in a non-leaf = ⌊(P-1)/2⌋ = 1
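Delete 6 (Case 1 : not underflow)
  Starting tree: Root [7]; Level 1: [1 6] [9]; Leaves: [1] [5 6] [7] [8 9] [12]
  [5 6] becomes [5], which is still at least half-full; the parent key 6 is updated to 5.
  Root: [7]
  Level 1: [1 5] [9]
  Leaves: [1] [5] [7] [8 9] [12]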
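Delete 12 (Case 2 : underflow, re-distribute): [12] becomes empty; its sibling [8 9] lends 9, giving leaves [8] and [9], and the parent key changes from 9 to 8.
  Root: [7]
  Level 1: [1 5] [8]
  Leaves: [1] [5] [7] [8] [9]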
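Delete 9 (Case 3 : underflow, merge): [9] becomes empty and its sibling [8] cannot lend a key, so the two leaves are merged into [8] and their parent loses its only key.
Case 4 (non-leaf is underflow, re-distribute): the empty internal node borrows from its sibling [1 5] through the root. 7 moves down from the root into the empty node, 5 moves up into the root, and the leaf [7] is transferred to the right internal node.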
Delete 9 (result):
  Root: [5]
  Level 1: [1] [7]
  Leaves: [1] [5] [7] [8]
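Delete 8 (Case 3 : underflow, merge): [8] becomes empty and its sibling [7] cannot lend a key, so the leaves are merged into [7] and their parent [7] loses its only key.
Case 4 (non-leaf is underflow, merge): the sibling [1] cannot lend either, so the two internal nodes are merged with the root key 5 pulled down, giving [1 5]. The root is left empty and is removed, so the tree loses a level.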
Delete 8 (result):
  Root: [1 5]
  Leaves: [1] [5] [7]
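Case 4 can be sketched the same way for internal nodes. The parent's separator is pulled down between the two siblings. The combined node is then either split again (re-distribution, with a new separator pushed back up) or kept whole (merge). `fix_internal_underflow` is a hypothetical helper. Children are shown as the leaves' key lists only so that the output can be compared with the Delete 9 and Delete 8 steps above. The assumed minimum of ⌈P/2⌉ children per internal node is the same as the ⌊(P-1)/2⌋ minimum keys in the definition.

```python
import math


def fix_internal_underflow(left, sep, right, P):
    """left/right: (keys, children) of two adjacent internal siblings;
    sep: the parent key between them. One sibling has underflowed.

    Returns (new_left, new_sep, new_right); new_sep and new_right are None
    after a merge, in which case the parent loses sep."""
    keys = left[0] + [sep] + right[0]          # pull the separator down
    children = left[1] + right[1]
    if len(children) >= 2 * math.ceil(P / 2):  # both halves stay valid
        h = (len(children) + 1) // 2           # children kept on the left
        return (keys[:h - 1], children[:h]), keys[h - 1], (keys[h:], children[h:])
    return (keys, children), None, None        # merge


# Delete 9 (P = 3): internal [1 5] over [1] [5] [7], root key 7, and an
# empty internal node left with only the merged leaf [8].
print(fix_internal_underflow(([1, 5], [[1], [5], [7]]), 7, ([], [[8]]), 3))
# (([1], [[1], [5]]), 5, ([7], [[7], [8]]))

# Delete 8 (P = 3): internal [1] over [1] [5], root key 5, and an empty
# internal node left with only the merged leaf [7].
print(fix_internal_underflow(([1], [[1], [5]]), 5, ([], [[7]]), 3))
# (([1, 5], [[1], [5], [7]]), None, None)
```

In the second call the root held only the key 5. After the merge the root is empty, so the merged node [1 5] becomes the new root.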
B+-Trees vs. B-Trees
• In B+-Trees, data pointers are stored only at
the leaf nodes, whereas in B-Trees, data pointers
are stored in all nodes.
• In B+-Trees, the leaf nodes are linked together to
provide ordered access to the records on the
search field.
• B+-Trees have fewer levels: internal nodes hold
only keys and tree pointers, without data pointers,
so more entries fit in each internal node (a higher fan-out).
A B+-Tree of order P = 4 :
Determining the size of B+-tree
• The order P of a non-leaf node is the largest P for which P child (block)
pointers and P − 1 keys fit in one block:
(P-1) * key size + (P * block pointer size) ≤ block size
• The order Pleaf of a leaf node is the largest Pleaf for which Pleaf keys,
Pleaf record pointers, and one block pointer (Pnext) fit in one block:
Pleaf * key size + Pleaf * record pointer size + block pointer size ≤ block size
• For a tree with branching factor m, the number of index levels above the
leaf level is no more than:
⌈logm(# of leaf pages)⌉
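These inequalities can be solved directly with integer division. Below is a minimal sketch that assumes all sizes are in bytes. The function names are hypothetical. `levels_above_leaves` repeats ceiling division instead of calling a floating-point logarithm, which gives ⌈logm(n)⌉ without rounding surprises.

```python
def internal_order(block, key, block_ptr):
    """Largest P with P*block_ptr + (P-1)*key <= block."""
    return (block + key) // (block_ptr + key)


def leaf_order(block, key, record_ptr, block_ptr):
    """Largest Pleaf with Pleaf*(key + record_ptr) + block_ptr <= block."""
    return (block - block_ptr) // (key + record_ptr)


def levels_above_leaves(num_leaves, m):
    """Index levels needed above the leaves when every node has m children;
    integer ceiling division avoids floating-point error in ceil(log_m(n))."""
    levels = 0
    while num_leaves > 1:
        num_leaves = -(-num_leaves // m)   # ceil(num_leaves / m)
        levels += 1
    return levels


# Values from the example that follows: V = 9, B = 512, Pr = 7, Pb = 6.
print(internal_order(512, 9, 6), leaf_order(512, 9, 7, 6))   # 34 31
```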
Example
• Suppose we have a B+-tree index of order P where :
Key field V = 9 bytes Block size B = 512 bytes
Record pointer Pr = 7 bytes Block pointer Pb = 6 bytes.
– To find the max #keys an internal node can hold:
(P*Pb) + ((P – 1) * V) ≤ B
(P* 6) + ((P − 1) * 9) ≤ 512
(15 * P) ≤ 521
P ≤ 521 / 15 ≈ 34.7, so a block can hold up to P = 34 pointers (and 33 key values).
– We can also calculate the max #keys that a leaf can hold :
(Pleaf * (Pr + V)) + Pb ≤ B
(Pleaf * (7 + 9)) + 6 ≤ 512
(16 * Pleaf ) ≤ 506
Pleaf ≤ 506 / 16 ≈ 31.6, so each leaf node can hold up to Pleaf = 31 key values.
Example
• Suppose we have a B-tree index of order P where :
Key field V = 9 bytes Block size B = 512 bytes
Record pointer Pr = 7 bytes Block pointer Pb = 6 bytes.
– To find the max #keys a node can hold:
(P * Pb) + ((P – 1) * (V + Pr)) ≤ B
(P * 6) + ((P − 1) * (9 + 7)) ≤ 512
(22 * P) ≤ 528
So a block can hold up to P = 24 pointers. This is less than the
value of 34 for the B+-tree, resulting in a larger branching
factor and more entries in each internal node of a B+-tree
than in the corresponding B-tree.
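The same integer-division approach covers the B-tree bound, where every key in every node carries a record pointer. `btree_order` and `bplus_internal_order` are illustrative names. The comparison reproduces the 24 versus 34 figures above.

```python
def btree_order(block, key, record_ptr, block_ptr):
    """Largest P with P*block_ptr + (P-1)*(key + record_ptr) <= block:
    every B-tree node entry carries a record pointer next to its key."""
    entry = key + record_ptr
    return (block + entry) // (block_ptr + entry)


def bplus_internal_order(block, key, block_ptr):
    """Same bound for a B+-tree internal node, which stores no record pointers."""
    return (block + key) // (block_ptr + key)


print(btree_order(512, 9, 7, 6), bplus_internal_order(512, 9, 6))   # 24 34
```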
Example
• Suppose that we construct a B+-tree on the field in the
previous example. To calculate the approx. number of entries
in the B+-tree, we assume that each node is 69% full.
– On the avg., each internal node will have P * 0.69 = 34 * 0.69 or
approx. 23 pointers (and 22 key values).
– Each leaf node, on the average, will hold Pleaf * 0.69 = 0.69 * 31 or
approx. 21 data record pointers.
– We can start at the root and see how many values and pointers
can exist :
Root : 1 node 22 key entries 23 pointers
Level 1 : 23 nodes 506 key entries 529 pointers
Level 2 : 529 nodes 11,638 key entries 12,167 pointers
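The level-by-level counts follow from repeated multiplication by the average fan-out. The sketch below uses a hypothetical helper, `bplus_capacity`, and rounds the averages down as the text does. It reproduces the table and extends it one step to the leaf level, assuming each leaf holds the 21 data record pointers estimated above.

```python
import math


def bplus_capacity(P, P_leaf, fill, internal_levels):
    """Per-level (nodes, keys, pointers) for internal levels, plus
    (leaf nodes, record pointers) for the leaf level, at `fill` occupancy."""
    fanout = math.floor(P * fill)         # 34 * 0.69 -> 23 pointers
    per_leaf = math.floor(P_leaf * fill)  # 31 * 0.69 -> 21 record pointers
    rows, nodes = [], 1
    for _ in range(internal_levels):
        rows.append((nodes, nodes * (fanout - 1), nodes * fanout))
        nodes *= fanout                   # every pointer leads to one node below
    return rows, (nodes, nodes * per_leaf)


rows, leaf = bplus_capacity(34, 31, 0.69, internal_levels=3)
for name, (n, k, p) in zip(["Root", "Level 1", "Level 2"], rows):
    print(f"{name}: {n:,} nodes, {k:,} key entries, {p:,} pointers")
print(f"Leaf level: {leaf[0]:,} nodes, {leaf[1]:,} record pointers")
# Leaf level: 12,167 nodes, 255,507 record pointers
```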
Example
• Again, suppose that we construct a B-tree on the same field as in
the previous example, and assume that each node is 69% full to
calculate the approx. number of entries in the B-tree.
– On the avg., each node will have P * 0.69 = 24 * 0.69 or approx.
16 pointers (and 15 key values).
– Leaf nodes of a B-tree have the same format as internal nodes, so
each leaf also holds approx. 15 key values, each with its data record pointer.
– The number of nodes, keys, and pointers :
Root : 1 node 15 key entries 16 pointers
Level 1 : 16 nodes 240 key entries 256 pointers
Level 2 : 256 nodes 3,840 key entries 4,096 pointers
Level 3 : 4,096 nodes 61,440 key entries
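In a B-tree, keys at every level carry record pointers, so the total number of entries is the sum over all levels. Below is a small sketch with the hypothetical helper `btree_capacity`, which rounds 24 * 0.69 down to 16 pointers as the text does.

```python
import math


def btree_capacity(P, fill, levels):
    """Per-level (nodes, keys) for a B-tree at `fill` occupancy; every key
    has its own record pointer, so keys at all levels count as entries."""
    fanout = math.floor(P * fill)       # 24 * 0.69 -> 16 pointers
    rows, nodes = [], 1
    for _ in range(levels):
        rows.append((nodes, nodes * (fanout - 1)))
        nodes *= fanout
    return rows, sum(keys for _, keys in rows)


rows, total = btree_capacity(24, 0.69, levels=4)
print(rows)    # [(1, 15), (16, 240), (256, 3840), (4096, 61440)]
print(total)   # 65535
```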
Example
• Suppose we have a data file with following parameters :
– Number of records = 2,000,000
– Record (sizes in bytes) = emp(SSN(40), Name(12), Dept(5), Age(5))
– Block size = 1000 bytes
– block pointer = 10 bytes
We want to construct a B+-Tree index on the field SSN. How large
would it be? (Links between leaves are not taken into account.)
– Index entries per leaf: Pleaf = ⌊block_size / (key_size + block_pointer_size)⌋
= ⌊1000 / (40 + 10)⌋ = 20
– # of leaf blocks = #records / Pleaf = ⎡2,000,000 / 20 ⎤ = 100,000
Example (cont.)
– Branching factor (P) :
(P*pointer_size) + ((P – 1) * key_size) ≤ block_size
(P* 10) + ((P − 1) * 40) ≤ 1000
(50 * P) ≤ 1040
P = ⌊ 1040 / 50⌋ = 20
– #blocks in upper level = #blocks_in_lower_level/P = ⎡ 100,000 / 20 ⎤ = 5,000
– #blocks in upper level = ⎡ 5,000 / 20 ⎤ = 250
– #blocks in upper level = ⎡ 250 / 20 ⎤ = 13
– #blocks in upper level = ⎡ 13 / 20 ⎤ = 1
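– # of index levels above the leaves = ⌈log20(100,000)⌉ = 4 (5 levels including the leaf level), so the index occupies 100,000 + 5,000 + 250 + 13 + 1 = 105,264 blocks.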
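The whole sizing procedure for this example fits in a few lines. This sketch follows the slide's assumptions: a 40-byte SSN key, one 10-byte pointer per entry, and leaf links ignored. `bplus_index_size` and `ceil_div` are hypothetical names. The sketch lists the blocks per level from the leaves up, and their sum is the total number of index blocks.

```python
def ceil_div(a, b):
    return -(-a // b)


def bplus_index_size(num_records, block, key, ptr):
    """Blocks per level of a B+-tree index, leaf level first."""
    p_leaf = block // (key + ptr)             # entries per leaf
    P = (block + key) // (ptr + key)          # largest P: P*ptr + (P-1)*key <= block
    sizes = [ceil_div(num_records, p_leaf)]   # leaf blocks
    while sizes[-1] > 1:                      # add levels until one root block
        sizes.append(ceil_div(sizes[-1], P))
    return p_leaf, P, sizes


p_leaf, P, sizes = bplus_index_size(2_000_000, block=1000, key=40, ptr=10)
print(p_leaf, P)            # 20 20
print(sizes)                # [100000, 5000, 250, 13, 1]
print(len(sizes) - 1)       # 4 index levels above the leaves
print(sum(sizes))           # 105264
```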
Thanks