Professional Documents
Culture Documents
Chittaranjan Pradhan
Single-Level Indexes
Multilevel Indexing
Search Trees
B-Trees
B
+ -Trees
B-Trees vs. B
+ -Trees
Chittaranjan Pradhan
School of Computer Engineering,
KIIT University
24.1
Index
Index
Chittaranjan Pradhan
Index
Single-Level Indexes
Primary Indexes
Index Clustering Indexes
Secondary Indexes
An index is a data structure that organizes data records on disk Dense/Sparse Indexes
24.2
Index
Single-Level Indexes
Chittaranjan Pradhan
Single-Level Indexes
Index is usually defined on a single attribute or field of a file, Index
Single-Level Indexes
called indexing field or indexing attribute Primary Indexes
Clustering Indexes
Secondary Indexes
Generally, the index stores each value of the index field along Dense/Sparse Indexes
with a list of pointers to all disk blocks that contain records with Multilevel Indexing
Search Trees
that field value B-Trees
+ -Trees
B
B-Trees vs. B
+ -Trees
The values in the index are ordered so that binary search can
be performed on the index. The index file is much smaller than
the data file, so searching the index using a binary search is
highly efficient
Primary Indexes
A primary index is an ordered file whose records are of fixed Index
Single-Level Indexes
length with two fields. The first field is of the same data type as Primary Indexes
the ordering key field of the data file, called the primary key Clustering Indexes
Secondary Indexes
and the second field is a pointer to a disk block Dense/Sparse Indexes
Multilevel Indexing
Search Trees
There is one index entry in the index file for each block in the B-Trees
B
+ -Trees
data file. Each index entry has the value of the primary key B-Trees vs. B
+ -Trees
field for the first record in a block and a pointer to that block as
its two field values
24.4
Index
Primary Indexes...
Chittaranjan Pradhan
Single-Level Indexes
The index file for a primary index needs substantially fewer Primary Indexes
blocks than does the data file, for two reasons. First, there are Clustering Indexes
Secondary Indexes
fewer index entries than there are records in the data file. Dense/Sparse Indexes
Second,each index entry is typically smaller in size than a data Multilevel Indexing
Search Trees
record because it has only two fields. A binary search on the B-Trees
+ -Trees
B
index file hence requires fewer block accesses than a binary B-Trees vs. B
+ -Trees
24.5
Index
Clustering Indexes
Chittaranjan Pradhan
Clustering Indexes
Index
If records of a file are physically ordered on a non-key field, Single-Level Indexes
that field is called the clustering field. Thus, clustering index is Primary Indexes
Clustering Indexes
the index created on the clustering field to speed up retrieval of Secondary Indexes
Dense/Sparse Indexes
records that have the same value for the clustering field
Multilevel Indexing
Search Trees
This differs from a primary index, which requires that the B-Trees
B
+ -Trees
+ -Trees
ordering field of the data file have a distinct value for each B-Trees vs. B
record
24.6
Index
Clustering Indexes...
Chittaranjan Pradhan
Index
Single-Level Indexes
Primary Indexes
24.7
Index
Secondary Indexes
Chittaranjan Pradhan
Secondary Indexes
A secondary index is also an ordered file with two fields. The Index
Single-Level Indexes
first field is of the same data type as some nonordering field of Primary Indexes
the data file that is an indexing field. The second field is either Clustering Indexes
Secondary Indexes
a block pointer or a record pointer. There can be many Dense/Sparse Indexes
file, which contains the value of the secondary key for the
record and a pointer either to the block in which the record is
stored or to the record itself
24.8
Index
Secondary Indexes...
Chittaranjan Pradhan
Index
Single-Level Indexes
Primary Indexes
24.9
Index
Dense/Sparse Indexes
Chittaranjan Pradhan
Index
Single-Level Indexes
Primary Indexes
Dense/Sparse Indexes Clustering Indexes
Secondary Indexes
Multilevel Indexing
• A dense index has an index entry for every search key Search Trees
24.10
Index
Multilevel Indexing
Chittaranjan Pradhan
Index
Multilevel Indexing Single-Level Indexes
Primary Indexes
A multilevel indexing can contain any number of levels, each of Clustering Indexes
Secondary Indexes
which acts as a non-dense index to the level below. The top Dense/Sparse Indexes
24.11
Index
Multilevel Indexing...
Chittaranjan Pradhan
Single-Level Indexes
Because of the insertion and deletion problem, most multilevel Primary Indexes
indexes use B-tree or B + -tree data structures, which leave Clustering Indexes
Secondary Indexes
space in each tree node (disk block) to allow for new index Dense/Sparse Indexes
24.12
Index
Search Trees
Chittaranjan Pradhan
Search Trees
A search tree is a special type of tree that is used to search a Index
record, given the value of one of the record’s fields Single-Level Indexes
Primary Indexes
Clustering Indexes
Secondary Indexes
A search tree of order ’p’ is a tree such that each node Dense/Sparse Indexes
contains at most ’p-1’ search values and ’p’ pointers in the Multilevel Indexing
order < P1 , K1 , P2 , K2 , ... Pq−1 , Kq−1 , Pq >, where q ≤ p; each Search Trees
B-Trees
+ -Trees
Pi is a pointer to a child node and each Ki is a search value B
B-Trees vs. B
+ -Trees
24.13
Index
Search Trees...
Chittaranjan Pradhan
Search Trees...
Two constraints must hold at all times on the search tree: Index
Single-Level Indexes
• Within each node, K1 < K2 < ... < Kq−1 Primary Indexes
Clustering Indexes
A search tree is balanced when all of its leaf nodes are at the
same level 24.14
Index
B-Trees
Chittaranjan Pradhan
B-Trees
B-tree is a specialized multi-way tree designed especially for Index
use on disk. A B-tree of order ’p’ can be defined as follows: Single-Level Indexes
Primary Indexes
• Each internal node is of the form <P1 , <K1 >, P2 , <K2 > ... Clustering Indexes
Secondary Indexes
<Kq−1 >, Pq > , where q ≤ p. Each Pi is a tree pointer, Dense/Sparse Indexes
with a single root node at level 0. The rules for the insertion to Single-Level Indexes
Primary Indexes
B-tree are: Clustering Indexes
Secondary Indexes
• It attempts to insert the new key into a leaf. If this would Dense/Sparse Indexes
result in that leaf becoming too big, split the leaf into two, Multilevel Indexing
Search Trees
promoting the middle key to the leaf’s parent B-Trees
B
+ -Trees
• If the insertion would result in the parent becoming too big, B-Trees vs. B
+ -Trees
split the parent into two, promoting the middle key. This
strategy might have to be repeated all the way to the top
• If necessary, the root is split in two and the middle key is
promoted to a new root, making the tree one level higher
Ex: Draw the B-tree of the order 5 for the data items 10, 50, 30,
70, 90, 25, 40, 45, 48
24.16
Index
B-Trees...
Chittaranjan Pradhan
Deletion
At deletion, removal should be done from a leaf: Index
Single-Level Indexes
• If the key is already in a leaf node, and removing it doesn’t Primary Indexes
cause that leaf node to have too few keys, then simply Clustering Indexes
Secondary Indexes
remove the key to be deleted Dense/Sparse Indexes
Multilevel Indexing
• If the key is not in a leaf, then it is guaranteed that its Search Trees
Deletion of 40 Index
Single-Level Indexes
Primary Indexes
Clustering Indexes
Secondary Indexes
Dense/Sparse Indexes
Multilevel Indexing
Search Trees
B-Trees
B
+ -Trees
B-Trees vs. B
+ -Trees
Deletion of 10
B + -Trees
The leaf nodes of the B + -tree are usually linked together to Index
provide ordered access on the search field to the records. Single-Level Indexes
Primary Indexes
These leaf nodes are similar to the first level of an index Clustering Indexes
Secondary Indexes
Dense/Sparse Indexes
Internal nodes of the B + -tree correspond to the other levels of Multilevel Indexing
Search Trees
a multilevel index. The structure of the internal nodes of a B-Trees
• Each internal node is of the form <P1 , <K1 >, P2 , <K2 > ...
<Kq−1 >, Pq > , where q ≤ p and each Pi is a tree pointer
• Within each internal node, K1 < K2 < ... <Kq−1
• Each internal node has at most ’p’ tree pointers
• For all search field values X in the subtree pointed by Pi ,
the rule is X ≤ K1 ; Ki−1 < X ≤ Ki for 1 < i < q; and Ki−1 < X
for i = q
• Each internal node has at least d p / 2 e children. The root
has at least two children if it is an internal node
• An internal node with q tree pointers, q ≤ p, has q-1
search key field values 24.19
Index
B + -Trees...
Chittaranjan Pradhan
Index
+ Single-Level Indexes
B -Trees... Primary Indexes
Clustering Indexes
The structure of the leaf nodes of a B + -tree of order p is as Secondary Indexes
Dense/Sparse Indexes
follows:
Multilevel Indexing
• Each leaf node is of the form «K1 , Pr1 >, <K2 , Pr2 > ... Search Trees
B-Trees
<Kq−1 , Prq−1 >, Pnext >, where q ≤ p, each Pri is a data B
+ -Trees
+ -Trees
B-Trees vs. B
pointer and Pnext is the pointer to the next leaf node
• Each leaf node has at least d p / 2 e values
• All leaf nodes are at the same level
• Within each leaf node, K1 ≤ K2 ≤ ... ≤ Kq−1
The pointers in internal nodes are the tree pointers, which point
to blocks that are tree nodes, whereas the pointers in leaf
nodes are the data pointers to the data file records
24.20
Index
B + -Trees...
Chittaranjan Pradhan
Index
Insertion Single-Level Indexes
The rules for the insertion of a data item to the B + -tree are: Primary Indexes
Clustering Indexes
Secondary Indexes
• Find correct leaf L Dense/Sparse Indexes
• Put data entry onto L. There are two options for this entry: Multilevel Indexing
Search Trees
• If L has enough space, done! B-Trees
B
+ -Trees
• Else, split L into L and a new node L2. Redistribute entries B-Trees vs. B
+ -Trees
evenly and copy up the middle key. Also, insert index entry
pointing to L2 into parent of L
• For each insertion, this process will be repeated
24.21
Index
B + -Trees...
Chittaranjan Pradhan
Deletion
The rules for deleting a data item from the B + -tree are: Index
Single-Level Indexes
• Find the correct leaf L Primary Indexes
Clustering Indexes
• Remove the entry. There may be two possible cases for Secondary Indexes
Multilevel Indexing
• If L is at least half-full, done! Search Trees
• Else, try to re-distribute the entries by borrowing from the B-Trees
B
+ -Trees
adjacent node or sibling. If re-distribution fails, merge L and B-Trees vs. B
+ -Trees
Deletion of 48
24.22
Index
B + -Trees...
Chittaranjan Pradhan
Deletion of 45 followed by 40
Index
Single-Level Indexes
Primary Indexes
Clustering Indexes
Secondary Indexes
Dense/Sparse Indexes
Multilevel Indexing
Search Trees
B-Trees
B
+ -Trees
B-Trees vs. B
+ -Trees
Index
Single-Level Indexes
Primary Indexes
Clustering Indexes
+ Secondary Indexes
B-Trees vs. B -Trees Dense/Sparse Indexes
Multilevel Indexing
• B + -tree searching is faster than the B-tree searching; Search Trees
24.24