
Module 6

Physical Database Design


Contents
• Indexing
• Single level indexing
• Multi-level indexing
What is indexing?
• A data structure used to locate and access records in a database quickly
• Optimizes query performance
• Fewer disk accesses required
• An index has two columns. The first column is the Search key, which contains a copy of the primary key or
candidate key of the table. These values are stored in sorted order so that the
corresponding data can be accessed quickly.
Note: The data may or may not be stored in sorted order.
• The second column is the Data Reference or Pointer which contains a set of
pointers holding the address of the disk block where that particular key value can
be found.
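As a rough illustration of the two-column structure described above, the sketch below models an index as a sorted list of (search key, block address) pairs and finds a key with binary search. The names `index` and `lookup` and the block numbers are made up for illustration; a real DBMS stores this structure in disk blocks.

```python
from bisect import bisect_left

# Hypothetical index: (search key, block address) pairs kept in sorted key order.
index = [(101, 7), (205, 3), (310, 9), (412, 1)]
keys = [k for k, _ in index]

def lookup(key):
    """Return the block address stored for `key`, or None if the key is absent."""
    i = bisect_left(keys, key)          # binary search over the sorted search keys
    if i < len(keys) and keys[i] == key:
        return index[i][1]              # follow the data reference
    return None
```

Because the search keys are sorted, the lookup takes a logarithmic number of comparisons even though the data blocks themselves (7, 3, 9, 1) are in no particular order.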
Classification of indexing
1) Dense index
2) Sparse index
Dense index
• For every search key value in the data file, there is an index record.
• This record contains the search key and also a reference to the first
data record with that search key value.
Sparse index
An index record appears only for some of the items in the data file. Each index record points to a block.
To locate a record, we find the index record with the largest search key value less than or equal to the one we
are looking for. We start at the record pointed to by that index record and proceed along the pointers in the file
(that is, sequentially) until we find the desired record.
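The sparse lookup just described can be sketched as follows. This is a simplified model, assuming one index entry per block (the first key of the block) and a scan that stays inside that block. The names `blocks`, `sparse_index`, and `sparse_lookup` are hypothetical.

```python
from bisect import bisect_right

# Hypothetical ordered data file: each block holds sorted search key values.
blocks = [[10, 15, 20], [27, 33, 37], [40, 46, 51]]
# Sparse index: one entry per block, holding the first key of that block.
sparse_index = [blk[0] for blk in blocks]    # [10, 27, 40]

def sparse_lookup(key):
    """Return (block number, position) of `key`, or None if it is not in the file."""
    b = bisect_right(sparse_index, key) - 1  # last index entry with key <= target
    if b < 0:
        return None                          # target is smaller than every key
    for pos, k in enumerate(blocks[b]):      # sequential scan inside the block
        if k == key:
            return (b, pos)
    return None
```

A dense index would instead hold an entry for every search key value, trading a larger index for a lookup that needs no sequential scan.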
Types of index
• Single level index
– Primary index: used on an ordered data file with key
fields
– Secondary index: generated using a candidate key or
a non-key with duplicate values
– Clustering index: defined on an ordered data file on a
non-key field
• Multilevel index
– B-Tree
– B+ Tree
Primary Index: It is defined mainly on the primary key of the data file, where the data file
is already ordered by the primary key.
Clustering Index: The index entries are similar, but there is one for each distinct value of the
clustering field, and its block pointer points to the first of perhaps several blocks that have
records with that value of the clustering field. Note that this is still a sparse index, since there
may be (and often will be) multiple records with any value of the clustering field, but only one
index entry.
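A minimal sketch of how such a clustering index could be built, assuming the data file is modeled as a list of blocks ordered by the clustering field. The data and the name `build_clustering_index` are invented for illustration.

```python
# Hypothetical ordered data file: each block lists the clustering-field
# values of the records it stores (values repeat because the field is non-key).
blocks = [[1, 1, 2], [2, 2, 3], [3, 4, 4]]

def build_clustering_index(blocks):
    """Map each distinct clustering value to the first block that contains it."""
    index = {}
    for block_no, block in enumerate(blocks):
        for value in block:
            index.setdefault(value, block_no)  # keep only the first block seen
    return index

clustering_index = build_clustering_index(blocks)
```

Value 2 appears in blocks 0 and 1, but the index keeps a single entry pointing at block 0, which is why the index stays sparse.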
Secondary Index:
Similar to the other indexes, but the data file is not ordered (or not ordered by this index field).
Multi level index
• Multilevel index is stored on the disk along
with the actual database files
• Useful when index size is large
• Breaks down the index into several smaller
indices in order to make the outermost level
small enough to be saved in a single disk
block, which can easily be accommodated
anywhere in the main memory.
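To make the "several smaller indices" idea concrete, the sketch below counts how many index levels are needed before the top level fits in one block. It assumes every level packs the same number of entries per block; `fan_out` and `index_levels` are names chosen here and do not come from the slides.

```python
from math import ceil

def index_levels(num_entries, fan_out):
    """Count index levels needed until the top level fits in a single block.

    num_entries: number of entries in the first-level index
    fan_out:     index entries that fit in one disk block (assumed constant)
    """
    levels = 1
    blocks = ceil(num_entries / fan_out)   # blocks occupied by the first level
    while blocks > 1:
        blocks = ceil(blocks / fan_out)    # next level: one entry per block below
        levels += 1
    return levels
```

For instance, with 30,000 first-level entries and 68 entries per block, the first level needs 442 blocks, the second 7, and the third 1, so `index_levels(30000, 68)` gives 3.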
B Trees
• B-tree is one of the most important data structures
in computer science.
• B-tree is a multiway search tree.
• Several versions of B-trees have been proposed, but
only B+ trees have been widely used with large files.
• A B+ tree is a B-tree variant in which data records are in leaf
nodes, so faster sequential access is possible.

B+ tree: Internal/root node structure

P0 K1 P1 K2 ……………… Pn-1 Kn Pn

Each Pi is a pointer to a child node; each Ki is a search key value


# of search key values = n, # of pointers = n+1
▪ Requirements:
▪ K1 < K2 < … < Kn
▪ For any search key value K in the subtree pointed to by Pi,
If Pi = P0, we require K < K1
If Pi = Pn, Kn ≤ K
If Pi = P1, …, Pn-1, Ki ≤ K < Ki+1
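The requirements above determine which pointer to follow for a given key. A small sketch, assuming that a key equal to Ki belongs to the subtree to the right of Ki (as in the example trees in this module). `child_index` is a hypothetical helper name.

```python
from bisect import bisect_right

def child_index(keys, k):
    """Return i such that pointer Pi leads to the subtree that may contain k."""
    # bisect_right counts the keys that are <= k, which is exactly the pointer
    # index: 0 when k < K1, n when Kn <= k, and i when Ki <= k < Ki+1.
    return bisect_right(keys, k)
```

With keys `[13, 17, 24, 30]`, key 5 follows P0, key 16 follows P1, key 24 follows P3, and key 40 follows P4.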

B+ tree: leaf node structure
L K1 r1 K2 ……………… Kn rn R

▪ Pointer L points to the left neighbor; R points to
the right neighbor
▪ K1 < K2 < … < Kn
▪ v ≤ n ≤ 2v (v is the order of this B+ tree)
▪ We will use Ki* for the pair <Ki, ri> and omit L and
R for simplicity

Example: B+ tree of order 1
• Each node must hold at least 1 entry, and at most
2 entries
Root: [40]
Index nodes: [20 33] [51 63]
Leaves: [10* 15*] [20* 27*] [33* 37*] [40* 46*] [51* 55*] [63* 97*]

Example: Search in a B+ tree of order 2
• Search: how to find the records with a given search key value?
– Begin at root, and use key comparisons to go to leaf
• Examples: search for 5*, 16*, all data entries >= 24* ...
– The last one is a range search: we do a sequential scan over the leaves, starting from
the first leaf containing a value >= 24.
Root: [13 17 24 30]
Leaves: [2* 3* 5* 7*] [14* 15*] [19* 20* 22*] [24* 27* 29*] [33* 34* 38* 39*]
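The search procedure on this example tree can be sketched as below. This is an illustrative in-memory model, not a disk-based implementation: the classes `Leaf` and `Internal` and the functions `find_leaf`, `search`, and `range_from` are names invented here, and data entries are reduced to their keys.

```python
from bisect import bisect_right

class Leaf:
    def __init__(self, keys):
        self.keys = keys      # sorted data entries (keys only, for brevity)
        self.next = None      # pointer to the right neighbor

class Internal:
    def __init__(self, keys, children):
        self.keys = keys
        self.children = children

def find_leaf(node, key):
    """Begin at the root and use key comparisons to descend to a leaf."""
    while isinstance(node, Internal):
        node = node.children[bisect_right(node.keys, key)]
    return node

def search(root, key):
    """Equality search: is `key` stored in the tree?"""
    return key in find_leaf(root, key).keys

def range_from(root, low):
    """Range search: all entries >= low, via a sequential scan of the leaves."""
    leaf, out = find_leaf(root, low), []
    while leaf is not None:
        out.extend(k for k in leaf.keys if k >= low)
        leaf = leaf.next
    return out

# The example tree of order 2 from this slide.
leaves = [Leaf(ks) for ks in ([2, 3, 5, 7], [14, 15], [19, 20, 22],
                              [24, 27, 29], [33, 34, 38, 39])]
for left, right in zip(leaves, leaves[1:]):
    left.next = right
root = Internal([13, 17, 24, 30], leaves)
```

Searching for 5 ends in the first leaf and succeeds; searching for 16 ends in the leaf [14 15] and fails; `range_from(root, 24)` starts at the leaf [24 27 29] and follows the neighbor pointers to the end.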

How to Insert a Data Entry into a B+ Tree?

• Let’s look at several examples first.

Inserting 16*, 8* into Example B+ tree
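Root: [13 17 24 30]
Leaves (first two shown): [2* 3* 5* 7* 8*] [14* 15* 16*]
The first leaf now holds 5 entries, more than the 2v = 4 allowed for order 2: overflow!
Split the overflowing leaf: [2* 3*] [5* 7* 8*]
One new child (leaf node) is generated; we must add one more
pointer to its parent, and thus one more key value as well.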
Inserting 8* (cont.)
• Copy up the middle value (leaf split).
Parent: [13 17 24 30]
Split leaf: [2* 3*] [5* 7* 8*]
Entry to be inserted in parent node: 5 (Note that 5 is copied up and
continues to appear in the leaf.)
Parent after inserting 5: [5 13 17 24 30], which overflows!

Insertion into B+ tree (cont.)
• Understand the difference between copy-up and push-up.
• Observe how minimum occupancy is guaranteed in both leaf and index page splits.
Overflowing index node: [5 13 17 24 30]
We split this node, redistribute entries evenly, and push up the middle key.
Entry to be inserted in parent node: 17 (Note that 17 is pushed up and only
appears once in the index. Contrast this with a leaf split.)
After the split: [5 13] [24 30]

Example B+ Tree After Inserting 8*

Root: [17]
Index nodes: [5 13] [24 30]
Leaves: [2* 3*] [5* 7* 8*] [14* 15*] [19* 20* 22*] [24* 27* 29*] [33* 34* 38* 39*]

Notice that the root was split, leading to an increase in height.

Inserting a Data Entry into a B+ Tree: Summary
• Find correct leaf L.
• Put data entry onto L.
  – If L has enough space, done!
  – Else, must split L (into L and a new node L2)
    • Redistribute entries evenly (the middle key goes into L2) and copy up the middle key.
    • Insert index entry pointing to L2 into parent of L.
• This can happen recursively.
  – To split an index node, redistribute entries evenly, but push
    up the middle key. (Contrast with leaf splits.)
• Splits “grow” the tree; a root split increases height.
  – Tree growth: gets wider or one level taller at top.
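The insertion steps summarized above can be sketched in a few lines. This is a simplified in-memory model under stated assumptions: order 2, keys only, no duplicate keys, and no leaf neighbor pointers. `Node`, `insert`, and `insert_root` are hypothetical names, not taken from the slides.

```python
from bisect import bisect_right, insort

ORDER = 2  # each node holds at most 2 * ORDER entries

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children          # None marks a leaf

    @property
    def is_leaf(self):
        return self.children is None

def insert(node, key):
    """Insert key below node; return (separator, new right node) if node split."""
    if node.is_leaf:
        insort(node.keys, key)            # put data entry onto the leaf
    else:
        i = bisect_right(node.keys, key)
        split = insert(node.children[i], key)
        if split is None:
            return None
        sep, right = split                # child split: add one key, one pointer
        node.keys.insert(i, sep)
        node.children.insert(i + 1, right)
    if len(node.keys) <= 2 * ORDER:
        return None                       # enough space, done
    mid = len(node.keys) // 2
    if node.is_leaf:
        right = Node(node.keys[mid:])
        sep = right.keys[0]               # copy up: key stays in the leaf
        node.keys = node.keys[:mid]
    else:
        sep = node.keys[mid]              # push up: key leaves this node
        right = Node(node.keys[mid + 1:], node.children[mid + 1:])
        node.keys = node.keys[:mid]
        node.children = node.children[:mid + 1]
    return sep, right

def insert_root(root, key):
    """Insert into the tree and return the (possibly new) root."""
    split = insert(root, key)
    if split is None:
        return root
    sep, right = split
    return Node([sep], [root, right])     # root split: tree grows one level

# Inserting 8* into the example tree of order 2.
leaves = [Node(ks) for ks in ([2, 3, 5, 7], [14, 15], [19, 20, 22],
                              [24, 27, 29], [33, 34, 38, 39])]
root = insert_root(Node([13, 17, 24, 30], leaves), 8)
```

After the call, the root holds the single key 17 and its two children hold [5 13] and [24 30], matching the walk-through: 5 was copied up from the leaf split, and 17 was pushed up from the index split.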
Deleting a Data Entry from a B+ Tree

• Examine examples first …

Delete 19* and 20*
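Root: [17]
Index nodes: [5 13] [24 30]
Leaves: [2* 3*] [5* 7* 8*] [14* 16*] [19* 20* 22*] [24* 27* 29*] [33* 34* 38* 39*]
After deleting 19* and 20*, the leaf holds only [22*]: underflow.
Redistribute with the right sibling: [22* 24*] [27* 29*]

Have we forgotten something?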


Deleting 19* and 20* (cont.)

Root: [17]
Index nodes: [5 13] [27 30]
Leaves: [2* 3*] [5* 7* 8*] [14* 16*] [22* 24*] [27* 29*] [33* 34* 38* 39*]

• Notice how 27 is copied up.
• But can we move it up?
• Now we want to delete 24*.
• Underflow again! But can we redistribute this time?
Deleting 24*
• Observe that the two leaf nodes are merged, and 27 is discarded from their parent, but …
Right index node after the merge: [30], with leaves [22* 27* 29*] [33* 34* 38* 39*]
• Observe the "pull down" of the index entry (below).
New root: [5 13 17 30]
Leaves: [2* 3*] [5* 7* 8*] [14* 16*] [22* 27* 29*] [33* 34* 38* 39*]

Deleting a Data Entry from a B+ Tree: Summary

• Start at root, find leaf L where entry belongs.
• Remove the entry.
  – If L is at least half-full, done!
  – If L has only d-1 entries (d is the order of the tree, written v above),
    • Try to redistribute, borrowing from a sibling (adjacent node
      with same parent as L).
    • If redistribution fails, merge L and sibling.
• If merge occurred, must delete entry (pointing to L or
  sibling) from parent of L.
• Merge could propagate to root, decreasing height.
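The leaf-level part of this summary can be sketched as follows. It is deliberately partial: it assumes order 2, models leaves as plain sorted lists, only borrows from the right sibling, and leaves the parent update and any propagation to index nodes to the caller. `delete_from_leaf` is a hypothetical helper name.

```python
ORDER = 2  # d: minimum number of entries in a non-root node

def delete_from_leaf(leaf, sibling, key):
    """Delete `key` from `leaf`; `sibling` is its right neighbor under the same parent.

    Returns (action, leaf, sibling):
      "ok"           - leaf is still at least half-full, nothing else to do
      "redistribute" - one entry borrowed; the parent separator becomes sibling[0]
      "merge"        - leaves merged; the parent entry for the sibling must be deleted
    """
    leaf = [k for k in leaf if k != key]       # remove the entry
    if len(leaf) >= ORDER:
        return "ok", leaf, sibling
    if len(sibling) > ORDER:                   # sibling can spare an entry
        return "redistribute", leaf + sibling[:1], sibling[1:]
    return "merge", leaf + sibling, []         # redistribution fails: merge
```

On the example tree, deleting 20* from [20* 22*] next to [24* 27* 29*] redistributes into [22* 24*] and [27* 29*], so 27 becomes the new separator; deleting 24* afterwards forces a merge into [22* 27* 29*].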
