Professional Documents
Culture Documents
Tree-Based Indexing
Tree-Based Indexing
§ Basic Concepts
§ Tree Based Indexing
§ Hashing (next week)
Database System Concepts - 7th Edition 14.2 ©Silberschatz, Korth and Sudarshan
Motivation
Database System Concepts - 7th Edition 14.3 ©Silberschatz, Korth and Sudarshan
Basic Concepts
Database System Concepts - 7th Edition 14.4 ©Silberschatz, Korth and Sudarshan
Example
Database System Concepts - 7th Edition 14.5 ©Silberschatz, Korth and Sudarshan
Example (cont’d)
Database System Concepts - 7th Edition 14.6 ©Silberschatz, Korth and Sudarshan
Example (cont’d)
Database System Concepts - 7th Edition 14.7 ©Silberschatz, Korth and Sudarshan
Index Evaluation Metrics
§ Access time: The time to find a particular data item, or set of items, using
the technique in question.
Database System Concepts - 7th Edition 14.8 ©Silberschatz, Korth and Sudarshan
Ordered Indices
§ In an ordered index, index entries are stored sorted on the search key
value.
Database System Concepts - 7th Edition 14.9 ©Silberschatz, Korth and Sudarshan
Dense Index Files
§ Dense index — Index record appears for every search-key value in the
file.
§ E.g. index on ID attribute of instructor relation
Database System Concepts - 7th Edition 14.10 ©Silberschatz, Korth and Sudarshan
Dense Index Files (Cont.)
Database System Concepts - 7th Edition 14.11 ©Silberschatz, Korth and Sudarshan
Sparse Index Files
§ Sparse Index: an index entry appears for only some of the search- key values.
• Applicable when records are sequentially ordered on search-key
Database System Concepts - 7th Edition 14.12 ©Silberschatz, Korth and Sudarshan
Sparse Index Files (Cont.)
§ Good tradeoff:
• for clustered index: sparse index with an index entry for every block in file,
corresponding to least search-key value in the block.
• For unclustered index: sparse index on top of dense index (multilevel index)
Database System Concepts - 7th Edition 14.13 ©Silberschatz, Korth and Sudarshan
Multilevel Index
§ If even outer index is too large to fit in main memory, yet another level of
index can be created, and so on.
§ Indices at all levels must be updated on insertion or deletion from the file.
Database System Concepts - 7th Edition 14.14 ©Silberschatz, Korth and Sudarshan
Multilevel Index (Cont.)
Database System Concepts - 7th Edition 14.15 ©Silberschatz, Korth and Sudarshan
Secondary Indices Example
§ Index record points to a bucket that contains pointers to all the actual
records with that particular search-key value.
§ Secondary indices have to be dense
Database System Concepts - 7th Edition 14.16 ©Silberschatz, Korth and Sudarshan
Clustering vs Nonclustering Indices
Database System Concepts - 7th Edition 14.17 ©Silberschatz, Korth and Sudarshan
Index Update: Deletion
10101 10101 Srinivasan Comp. Sci. 65000
32343 12121 Wu Finance 90000
76766 15151 Mozart Music 40000
22222 Einstein Physics 95000
§ If deleted record was the 32343 El Said History 60000
only record in the file with 33456 Gold Physics 87000
45565 Katz Comp. Sci. 75000
its particular search-key 58583 Califieri History 62000
value, the search-key is 76543 Singh Finance 80000
deleted from the index 76766 Crick Biology 72000
also. 83821 Brandt Comp. Sci. 92000
98345 Kim Elec. Eng. 80000
Database System Concepts - 7th Edition 14.18 ©Silberschatz, Korth and Sudarshan
Index Update: Insertion
• Dense indices – if the search-key value does not appear in the index,
insert it
§ Indices are maintained as sequential files
§ Need to create space for new entry, overflow blocks may be required
• Sparse indices – if index stores an entry for each block of the file, no
change needs to be made to the index unless a new block is created.
§ If a new block is created, the first search-key value appearing in the
new block is inserted into the index.
Database System Concepts - 7th Edition 14.19 ©Silberschatz, Korth and Sudarshan
Indices on Multiple Keys
Database System Concepts - 7th Edition 14.20 ©Silberschatz, Korth and Sudarshan
B+-Tree Index Files
Database System Concepts - 7th Edition 14.21 ©Silberschatz, Korth and Sudarshan
Example of B+-Tree
Leaf nodes
Brandt Califieri Crick Einstein El Said Gold Katz Kim Mozart Singh Srinivasan Wu
Database System Concepts - 7th Edition 14.22 ©Silberschatz, Korth and Sudarshan
B+-Tree Index Files (Cont.)
§ Each node that is not a root or a leaf has between én/2ù and n children.
§ Special cases:
• If the root is not a leaf, it has at least 2 children.
• If the root is a leaf (that is, there are no other nodes in the tree), it can
have between 0 and (n–1) values.
Database System Concepts - 7th Edition 14.23 ©Silberschatz, Korth and Sudarshan
B+-Tree Node Structure
§ Typical node
P1 K1 P2 … Pn-1 Kn-1 Pn
Database System Concepts - 7th Edition 14.24 ©Silberschatz, Korth and Sudarshan
Leaf Nodes in B+-Trees
Database System Concepts - 7th Edition 14.25 ©Silberschatz, Korth and Sudarshan
Non-Leaf Nodes in B+-Trees
§ Non leaf nodes form a multi-level sparse index on the leaf nodes. For a
non-leaf node with m pointers:
• All the search-keys in the subtree to which P1 points are less than K1
• For 2 £ i £ n – 1, all the search-keys in the subtree to which Pi points
have values greater than or equal to Ki–1 and less than Ki
• All the search-keys in the subtree to which Pn points have values
greater than or equal to Kn–1
• General structure
P1 K1 P2 … Pn-1 Kn-1 Pn
Database System Concepts - 7th Edition 14.26 ©Silberschatz, Korth and Sudarshan
Example of B+-tree
El Said Mozart
Brandt Califieri Crick Einstein El Said Gold Katz Kim Mozart Singh Srinivasan Wu
Database System Concepts - 7th Edition 14.27 ©Silberschatz, Korth and Sudarshan
Observations about B+-trees
§ Insertions and deletions to the main file can be handled efficiently, as the
index can be restructured in logarithmic time.
Database System Concepts - 7th Edition 14.28 ©Silberschatz, Korth and Sudarshan
Queries on B+-Trees
function find(v)
1. C=root
2. while (C is not a leaf node)
1. Let i be least number s.t. V £ Ki.
2. if there is no such number i then
3. Set C = last non-null pointer in C
4. else if (v = C.Ki ) Set C = Pi +1
5. else set C = C.Pi
3. if for some i, Ki = V then return C.Pi
4. else return null /* no record with search-key value v exists. */
Mozart
Adams Brandt Califieri Crick Einstein El Said Gold Katz Kim Mozart Singh Srinivasan Wu
Database System Concepts - 7th Edition 14.29 ©Silberschatz, Korth and Sudarshan
Queries on B+-Trees (Cont.)
§ Range queries find all records with search key values in a given range
Mozart
Adams Brandt Califieri Crick Einstein El Said Gold Katz Kim Mozart Singh Srinivasan Wu
Database System Concepts - 7th Edition 14.30 ©Silberschatz, Korth and Sudarshan
Queries on B+-Trees (Cont.)
§ If there are K search-key values in the file, the height of the tree is no
more than élogén/2ù(K)ù.
Database System Concepts - 7th Edition 14.31 ©Silberschatz, Korth and Sudarshan
Updates on B+-Trees: Insertion
§ Find the leaf node in which the search-key value would appear
• If there is room in the leaf node, insert (v, pr) pair in the leaf node
• Otherwise, split the node (along with the new (v, pr) entry) as
discussed in the next slide, and propagate updates to parent nodes.
Database System Concepts - 7th Edition 14.32 ©Silberschatz, Korth and Sudarshan
Updates on B+-Trees: Insertion (Cont.)
Result of splitting node containing Brandt, Califieri and Crick on inserting Adams
Next step: insert entry with (Califieri, pointer-to-new-node) into parent
Database System Concepts - 7th Edition 14.33 ©Silberschatz, Korth and Sudarshan
B+-Tree Insertion
Mozart Root node
Leaf nodes
Brandt Califieri Crick Einstein El Said Gold Katz Kim Mozart Singh Srinivasan Wu
Mozart
Affected nodes
Adams Brandt Califieri Crick Einstein El Said Gold Katz Kim Mozart Singh Srinivasan Wu
Database System Concepts - 7th Edition 14.34 ©Silberschatz, Korth and Sudarshan
B+-Tree Insertion
Mozart
Adams Brandt Califieri Crick Einstein El Said Gold Katz Kim Mozart Singh Srinivasan Wu
Adams Brandt Califieri Crick Einstein El Said Gold Katz Kim Lamport Mozart Singh Srinivasan Wu
Affected nodes
Database System Concepts - 7th Edition 14.35 ©Silberschatz, Korth and Sudarshan
Insertion in B+-Trees (Cont.)
§ Splitting a non-leaf node: when inserting (k,p) into an already full internal
node N
• Copy N to an in-memory area M with space for n+1 pointers and n
keys
• Insert (k,p) into M
• Copy P1,K1, …, K én/2ù-1,P én/2ù from M back into node N
• Copy Pén/2ù+1,K én/2ù+1,…,Kn,Pn+1 from M into newly allocated node N'
• Insert (K én/2ù,N') into parent N
§ Example
Database System Concepts - 7th Edition 14.36 ©Silberschatz, Korth and Sudarshan
Examples of B+-Tree Deletion
Mozart
Adams Brandt Califieri Crick Einstein El Said Gold Katz Kim Mozart Singh Srinivasan Wu
Affected nodes
Adams Brandt Califieri Crick Einstein El Said Gold Katz Kim Mozart Singh Wu
Database System Concepts - 7th Edition 14.37 ©Silberschatz, Korth and Sudarshan
Examples of B+-Tree Deletion (Cont.)
Gold
Adams Brandt Califieri Crick Einstein El Said Gold Katz Kim Mozart Singh Wu
Affected nodes
Adams Brandt Califieri Crick Einstein El Said Gold Katz Kim Mozart
Adams Brandt Califieri Crick Einstein El Said Gold Katz Kim Mozart
§ Node with Gold and Katz became underfull, and was merged with its sibling
§ Parent node becomes underfull, and is merged with its sibling
•Value separating two nodes (at the parent) is pulled down when merging
§ Root node then has only one child, and is deleted
Database System Concepts - 7th Edition 14.39 ©Silberschatz, Korth and Sudarshan
Updates on B+-Trees: Deletion
Assume record already deleted from file. Let V be the search key value of the
record, and Pr be the pointer to the record.
§ If the node has too few entries due to the removal, and the entries in the
node and a sibling fit into a single node, then merge siblings:
• Insert all the search-key values in the two nodes into a single node
(the one on the left), and delete the other node.
• Delete the pair (Ki–1, Pi), where Pi is the pointer to the deleted node,
from its parent, recursively using the above procedure.
Database System Concepts - 7th Edition 14.40 ©Silberschatz, Korth and Sudarshan
Updates on B+-Trees: Deletion
§ Otherwise, if the node has too few entries due to the removal, but the
entries in the node and a sibling do not fit into a single node, then
redistribute pointers:
• Redistribute the pointers between the node and a sibling such that
both have more than the minimum number of entries.
• Update the corresponding search-key value in the parent of the node.
§ The node deletions may cascade upwards till a node which has én/2ù or
more pointers is found.
§ If the root node has only one pointer after deletion, it is deleted and the
sole child becomes the root.
Database System Concepts - 7th Edition 14.41 ©Silberschatz, Korth and Sudarshan
Complexity of Updates
Database System Concepts - 7th Edition 14.42 ©Silberschatz, Korth and Sudarshan
B+-Tree File Organization
§ Insertion and deletion are handled in the same way as insertion and deletion
of entries in a B+-tree index.
Database System Concepts - 7th Edition 14.43 ©Silberschatz, Korth and Sudarshan
B+-Tree File Organization (Cont.)
§ Good space utilization important since records use more space than pointers.
Database System Concepts - 7th Edition 14.44 ©Silberschatz, Korth and Sudarshan
Other Issues in Indexing
Database System Concepts - 7th Edition 14.45 ©Silberschatz, Korth and Sudarshan
Bulk Loading and Bottom-Up Build
§ Efficient alternative 1:
• sort entries first (using efficient external-memory sort algorithms that will
be discussed later)
• insert in sorted order
§ insertion will go to existing page (or cause a split)
§ much improved IO performance, but most leaf nodes half full
Database System Concepts - 7th Edition 14.46 ©Silberschatz, Korth and Sudarshan
B-Tree Index Files
§ Similar to B+-tree, but B-tree allows search-key values to appear only once;
eliminates redundant storage of search keys.
§ Search keys in nonleaf nodes appear nowhere else in the B-tree; an additional
pointer field for each search key in a nonleaf node must be included.
§ Generalized B-tree leaf node (a)
P1 K1 P2 … Pn-1 Kn-1 Pn
(a)
(b)
§ Nonleaf node (b) – pointers Bi are the bucket or file record pointers.
§ There are n − 1 keys in the leaf node, but there are m − 1 keys in the
nonleaf node.
Database System Concepts - 7th Edition 14.47 ©Silberschatz, Korth and Sudarshan
B-Tree Index File Example
Brandt Califieri
... and soon for other records...
record record
Leaf nodes
Brandt Califieri Crick Einstein El Said Gold Katz Kim Mozart Singh Srinivasan Wu
Database System Concepts - 7th Edition 14.48 ©Silberschatz, Korth and Sudarshan
B-Tree Index Files (Cont.)
Database System Concepts - 7th Edition 14.49 ©Silberschatz, Korth and Sudarshan
Indexing in Main Memory
§ B+- trees with small nodes that fit in cache line are preferable to reduce
cache misses
§ Key idea: use large node size to optimize disk access, but structure data
within a node using a tree with small node size, instead of using an array.
Database System Concepts - 7th Edition 14.50 ©Silberschatz, Korth and Sudarshan