
Indexing

 An index is a data structure that organizes data
records on disk to optimize certain kinds of retrieval
operations.
 An index on a file speeds up selections on the search
key fields for the index.
 Any subset of the fields of a relation can be the search key
for an index on the relation (e.g., age or colour).
 Search key is not the same as key (minimal set of fields that
uniquely identify a record in a relation).
 A data entry with search key value k, denoted as k*,
contains enough information to locate (one or more)
data records with search key value k.
 An index contains a collection of data entries, and
supports efficient retrieval of all data entries k* with
a given key value k.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1
Alternatives for Data Entry k* in Index
 Three alternatives:
1. A data entry k* is an actual data record (with
search key value k).
2. A data entry is a <k, rid> pair, where rid is the
record id of a data record with search key
value k.
3. A data entry is a <k, rid-list> pair, where rid-list
is a list of record ids of data records with
search key value k.
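
The three alternatives can be sketched in Python as follows. This is only an illustration under stated assumptions: the records (shape, colour, holes) are borrowed from the examples on the following slides, the rid of a record is assumed to be its 1-based position in the file, and the names records, search_key, alt1, alt2 and alt3 are made up for the sketch.

```python
from collections import defaultdict

# Data records; the rid of a record is assumed to be its 1-based position.
records = [
    ("round", "red", 2),
    ("square", "red", 4),
    ("rectangle", "red", 8),
    ("round", "blue", 2),
    ("square", "blue", 4),
    ("rectangle", "blue", 8),
]

def search_key(record):
    """Search key for this index: the colour field."""
    return record[1]

# Alternative 1: each data entry k* is the data record itself.
alt1 = sorted(records, key=search_key)

# Alternative 2: each data entry is a <k, rid> pair.
alt2 = sorted((search_key(r), rid) for rid, r in enumerate(records, start=1))

# Alternative 3: each data entry is a <k, rid-list> pair.
rid_lists = defaultdict(list)
for rid, r in enumerate(records, start=1):
    rid_lists[search_key(r)].append(rid)
alt3 = sorted(rid_lists.items())

print(len(alt1), len(alt2), len(alt3))  # 6 6 2
```

Alternatives 1 and 2 produce one data entry per record, while Alternative 3 produces one data entry per distinct search key value.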



Alternatives for Data Entries (Contd.)
 Alternative 1: Covering Index
 Index structure is a file organization for data
records (instead of a Heap file or sorted file).
 At most one index on a given collection of data
records can use Alternative 1.
 If data records are very large, # of pages
containing data entries is high.
 Implies size of auxiliary information in the index is also
large, typically.



Example of Alternative 1
Covering Index

shape      colour  holes
round      red     2
square     red     4
rectangle  red     8
round      blue    2
square     blue    4
rectangle  blue    8



Example of Alternative 2
File with data records:

shape      colour  holes
round      red     2
square     red     4
rectangle  red     8
round      blue    2
square     blue    4
rectangle  blue    8

Index file with 6 data entries:

colour  location
red     1
red     3
red     2
blue    6
blue    4
blue    5



Example of Alternative 3
File with data records:

shape      colour  holes
round      red     2
square     red     4
rectangle  red     8
round      blue    2
square     blue    4
rectangle  blue    8

Index file with 2 data entries:

colour  locations
red     1,2,3
blue    4,5,6



Alternatives for Data Entries (Contd.)

 Alternatives 2 and 3:
 Data entries typically much smaller than
data records.
• So, better than Alternative 1 with large
data records, especially if search keys are
small.
 Alternative 3 more compact than Alternative
2.
• But leads to variable sized data entries
even if search keys are of fixed length.
Index Classification
 Primary vs. secondary: If search key contains primary
key, then called primary index; other indexes are
secondary.
 Unique index: Search key contains candidate key that uniquely
identifies record.
 An index that uses Alternative 1 is a primary index.
 An index that uses Alternative 2 or 3 is a secondary index.
 Clustered vs. unclustered: If order of data records is the
same as, or close to, order of data entries, then called
clustered index; otherwise it is unclustered index.
 Alternative 1 implies clustered; in practice, clustered also
implies Alternative 1 (since sorted files are rare).
 A file can be clustered on at most one search key.
 Cost of retrieving data records through index varies greatly
based on whether index is clustered or not!
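
The cost difference can be illustrated with a small simulation. This is a rough sketch, not a cost model from the slides: the page capacity, the file size, the shuffled placement of records and the one-page buffer are all assumptions, and page_fetches is a hypothetical helper.

```python
import random

RECORDS_PER_PAGE = 4  # assumed page capacity

def page_fetches(rids_in_entry_order):
    """Count data-page reads when following rids in data-entry order,
    assuming only the most recently read page stays in memory."""
    fetches, current = 0, None
    for rid in rids_in_entry_order:
        page = rid // RECORDS_PER_PAGE
        if page != current:
            fetches += 1
            current = page
    return fetches

n = 40
keys = list(range(n))  # search key values 0..39

# Clustered: the record with key k is stored at position k (file sorted on key).
clustered_rids = {k: k for k in keys}

# Unclustered: records are stored in an arbitrary (here, shuffled) order.
positions = list(range(n))
random.Random(0).shuffle(positions)
unclustered_rids = dict(zip(keys, positions))

# Range selection 8 <= key < 24, scanning the data entries in key order.
wanted = range(8, 24)
clustered_cost = page_fetches(clustered_rids[k] for k in wanted)
unclustered_cost = page_fetches(unclustered_rids[k] for k in wanted)
print(clustered_cost, unclustered_cost)
```

In the clustered case the 16 matching records sit on 4 consecutive pages, so 4 page reads suffice. In the unclustered case the count depends on the placement and can approach one page read per matching record.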
1. Dense Index
In a dense index, there is an index entry for every record in the
database. If more than one record has the same search key, the
dense index points to the first record in the database that has
that search key. The index is called dense because every record
in the database has a corresponding entry in the index file.

2. Sparse Index
In a sparse index, entries are maintained for only a few of the
data items: at most one index entry per block of data items.
Sparse indexing requires the data file to be sorted on the search key.
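
The difference between the two can be sketched as follows. This is a minimal illustration under assumptions: the sorted data file is modelled as a list of blocks of three keys each, keys are unique, and the names blocks, dense, sparse_keys, dense_lookup and sparse_lookup are hypothetical.

```python
from bisect import bisect_right

# Sorted data file, split into blocks; a block size of 3 is an assumption.
blocks = [[2, 3, 5], [7, 8, 14], [16, 22, 24], [27, 29, 33]]

# Dense index: one entry per record, mapping key -> (block number, slot).
dense = {key: (b, s) for b, blk in enumerate(blocks) for s, key in enumerate(blk)}

# Sparse index: one entry per block, holding the first key in that block.
sparse_keys = [blk[0] for blk in blocks]

def dense_lookup(key):
    """Return (block, slot) of the record, or None if the key is absent."""
    return dense.get(key)

def sparse_lookup(key):
    """Find the last block whose first key is <= key, then scan that block."""
    b = bisect_right(sparse_keys, key) - 1
    if b < 0:
        return None
    blk = blocks[b]
    return (b, blk.index(key)) if key in blk else None

print(len(dense), len(sparse_keys))          # 12 4
print(dense_lookup(22), sparse_lookup(22))   # (2, 1) (2, 1)
```

The sparse index is much smaller, but it only works because the file is sorted: the block to scan is determined by comparing the key with the first key of each block.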



Clustered vs. Unclustered Index
 Suppose that Alternative (2) is used for data entries,
and that the data records are stored in a Heap file.
 To build clustered index, first sort the Heap file (with
some free space on each page for future inserts).
 Overflow pages may be needed for inserts.

[Figure: CLUSTERED vs. UNCLUSTERED. Index entries direct the search for data entries; data entries (index file) refer to data records (data file).]


2 basic approaches to organize data entries
 Hash-based indexing: hash data entries on
search key.
 Tree-based indexing: build a tree-like data
structure that directs a search for data entries.



Hash-Based Indexes
 Good for equality selections.
 Index is a collection of buckets.
 Bucket = primary page plus zero or more overflow pages.
 Hashing function h: h(r) = bucket in which record r
belongs.
 h looks at the search key fields of r.
 If Alternative (1) is used, the buckets contain the data
records.
 With (2,3) they contain <key, rid> or <key, rid-list> pairs
 On inserts, the record is inserted into the appropriate
bucket, with 'overflow' pages allocated as necessary.
 To search for a record with a given search key value, we
apply the hash function to identify the bucket to which
such records belong and look at all pages in that bucket.
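
The insert and search steps above can be sketched as follows. This is a minimal illustration assuming Alternative 2 data entries, integer search keys, a fixed number of buckets and a tiny page capacity; the hash function h(key) = key mod 4 and the names buckets, insert and search are assumptions made for the sketch.

```python
PAGE_CAPACITY = 2   # data entries per page (assumed)
NUM_BUCKETS = 4     # fixed number of buckets (assumed)

def h(key):
    """Hypothetical hash function: maps a search key value to a bucket number."""
    return key % NUM_BUCKETS

# Each bucket is a list of pages: the primary page, then any overflow pages.
buckets = [[[]] for _ in range(NUM_BUCKETS)]

def insert(key, rid):
    """Add a <key, rid> data entry to the bucket chosen by h."""
    pages = buckets[h(key)]
    if len(pages[-1]) == PAGE_CAPACITY:
        pages.append([])  # allocate an overflow page
    pages[-1].append((key, rid))

def search(key):
    """Equality search: look at all pages of the single bucket h(key)."""
    return [rid for page in buckets[h(key)] for k, rid in page if k == key]

for rid, age in enumerate([30, 22, 41, 26, 30, 19], start=1):
    insert(age, rid)

print(search(30))        # [1, 5]
print(len(buckets[2]))   # 2 (primary page plus one overflow page)
```

Only one bucket is examined per equality search, which is why hashing suits equality selections; a range selection gets no help from this structure because nearby key values land in unrelated buckets.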



Tree-Based Indexes
 The data entries are arranged in sorted order by search key value,
and a hierarchical search data structure is maintained that directs
searches to the correct page of data entries.
 The lowest level of the tree, called the leaf level, contains the data
entries.
 All searches begin at the topmost node, called the root, and the
contents of pages in non-leaf levels direct searches to the correct
leaf page.
 Non-leaf pages contain node pointers separated by search key
values. The node pointer to the left of a key value k points to a
subtree that contains only data entries less than k. The node
pointer to the right of a key value k points to a subtree that
contains only data entries greater than or equal to k.

 The B+ tree is an index structure that ensures that all paths from
the root to a leaf in a given tree are of the same length, that is, the
structure is always balanced in height.
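
The routing rule for a non-leaf page can be written in a few lines. This is a sketch under the assumption that a page holds sorted separator keys K1..Km and pointers P0..Pm; child_to_follow is a hypothetical helper name.

```python
from bisect import bisect_right

def child_to_follow(keys, key):
    """Return i such that pointer P_i should be followed for `key`,
    given the sorted separator keys K_1..K_m of a non-leaf page."""
    # bisect_right counts the separators <= key, so a key equal to K_i
    # is routed to the pointer on the right of K_i, as the rule requires.
    return bisect_right(keys, key)

print(child_to_follow([5, 13], 4))    # 0: left of 5
print(child_to_follow([5, 13], 5))    # 1: right of 5, left of 13
print(child_to_follow([5, 13], 20))   # 2: right of 13
```

Using bisect_left instead would send a key equal to a separator into the left subtree, which contradicts the "greater than or equal to k" rule stated above.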
B+ Tree Indexes

[Figure: B+ tree with non-leaf pages above a bottom level of leaf pages.]

 Leaf pages contain data entries, and are chained (prev & next)
 Non-leaf pages contain index entries; they direct searches:

Index entry layout in a non-leaf page:

P0 | K1 | P1 | K2 | P2 | ... | Km | Pm



Example B+ Tree
                            Root
                             17
        Entries < 17                   Entries >= 17
          5    13                        27    30

[2* 3*] [5* 7* 8*] [14* 16*]    [22* 24*] [27* 29*] [33* 34* 38* 39*]

 Find 28*? 29*? All > 17* and < 30*


 Insert/delete: Find data entry in leaf, then
change it. Need to adjust parent sometimes.
 And change sometimes bubbles up the tree
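
The three searches asked above can be traced on the example tree with a short sketch. The grouping of data entries into leaf pages is inferred from the separator keys 5, 13, 27 and 30, and the representation (tuples for non-leaf pages, a list position standing in for the leaf chain) is an assumption made for illustration. Insertion and deletion are not covered.

```python
from bisect import bisect_right

# Leaf pages of the example tree in key order; list order models the chain.
leaves = [[2, 3], [5, 7, 8], [14, 16], [22, 24], [27, 29], [33, 34, 38, 39]]

# Non-leaf pages as (keys, children); an int child is a leaf number.
left = ([5, 13], [0, 1, 2])
right = ([27, 30], [3, 4, 5])
root = ([17], [left, right])

def find_leaf(key):
    """Descend from the root to the number of the leaf that may hold key."""
    node = root
    while not isinstance(node, int):
        keys, children = node
        node = children[bisect_right(keys, key)]
    return node

def find(key):
    """True if data entry key* is present in the tree."""
    return key in leaves[find_leaf(key)]

def range_scan(low, high):
    """All data entries k with low < k < high, following the leaf chain."""
    result = []
    for leaf in leaves[find_leaf(low):]:
        for k in leaf:
            if k >= high:
                return result
            if k > low:
                result.append(k)
    return result

print(find(28), find(29))   # False True
print(range_scan(17, 30))   # [22, 24, 27, 29]
```

Both point lookups touch one page per level. The range query descends once to the leaf for the lower bound and then walks right along the leaf level until it passes the upper bound.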
