
Advanced Database Management System

Module 1: Indexing and Hashing Techniques


Problems

Example 1: for Primary Index


Suppose that we have an ordered file with r = 30,000 records stored on a disk with block size
B = 1024 bytes. File records are of fixed size and are unspanned, with record length R = 100
bytes. Suppose that the ordering key field of the file is V = 9 bytes long, a block pointer is P =
6 bytes long, and we have constructed a primary index for the file. Find the I/O cost, in disk
block accesses, of searching the data file with and without the index.
Solution:
Data File without Index
Given
r=30000
B=1024 bytes
R= 100 bytes
Search key V=9 bytes

Index block pointer P=6 bytes


Blocking factor for the file: bfr = ⌊B/R⌋ = ⌊1024/100⌋ = 10 records per block.
Number of blocks needed for the file: b = ⌈r/bfr⌉ = ⌈30000/10⌉ = 3000 blocks.
Binary search on the data file would need approximately ⌈log₂ b⌉ = ⌈log₂ 3000⌉ = 12 block
accesses.
Data file with index
V = 9 bytes
P = 6 bytes
Size of each index entry: Ri = V + P = 9 + 6 = 15 bytes.
Blocking factor for the index: bfri = ⌊B/Ri⌋ = ⌊1024/15⌋ = 68 entries per block.
A primary index has one entry per data block, so the total number of index entries ri equals
the number of blocks in the data file, which is 3000.
Number of index blocks: bi = ⌈ri/bfri⌉ = ⌈3000/68⌉ = 45 blocks.
Binary search on the index file would need ⌈log₂ bi⌉ = ⌈log₂ 45⌉ = 6 block accesses.


To search for a record using the index, we need one additional block access to the data file,
for a total of 6 + 1 = 7 block accesses, compared with 12 without the index.
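
The arithmetic of Example 1 can be checked with a short Python sketch. The function and variable names below are illustrative assumptions, not part of the module; the formulas are the ones used in the solution above.

```python
import math

def ceil_div(a, b):
    """Integer ceiling of a / b (avoids floating-point rounding)."""
    return -(-a // b)

def primary_index_cost(r, B, R, V, P):
    """Return (block accesses without index, block accesses with primary index).

    Hypothetical helper: assumes fixed-length, unspanned records and
    binary search on both the data file and the index file.
    """
    bfr = B // R                              # records per data block
    b = ceil_div(r, bfr)                      # data blocks
    cost_without = math.ceil(math.log2(b))    # binary search on the data file

    bfri = B // (V + P)                       # index entries per block
    bi = ceil_div(b, bfri)                    # one index entry per data block
    cost_with = math.ceil(math.log2(bi)) + 1  # index search + one data block
    return cost_without, cost_with

print(primary_index_cost(r=30000, B=1024, R=100, V=9, P=6))  # (12, 7)
```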
Example 2: for Secondary Index
Consider the file with r = 30,000 fixed-length records of size R = 100 bytes stored on a disk
with block size B = 1024 bytes. Suppose that a secondary index on a non-ordering key
field of the file is available. Find the I/O cost of searching on the non-ordering key field using
the secondary index.
Given
r=30000
B=1024 bytes
R= 100 bytes
Blocking factor for the file: bfr = ⌊B/R⌋ = ⌊1024/100⌋ = 10 records per block.
Number of blocks needed for the file: b = ⌈r/bfr⌉ = ⌈30000/10⌉ = 3000 blocks.
V = 9 bytes
The block pointer of the index file is P = 6 bytes long, and we have constructed a secondary
index for the file.
So each index entry is Ri = V + P = 9 + 6 = 15 bytes.
Blocking factor for the index: bfri = ⌊B/Ri⌋ = ⌊1024/15⌋ = 68 entries per block.
Since it is a dense secondary index, the total number of index entries ri equals the number of
records in the data file, which is 30,000.
Number of blocks needed for the index: bi = ⌈ri/bfri⌉ = ⌈30000/68⌉ = 442 blocks.
A binary search on this secondary index needs ⌈log₂ bi⌉ = ⌈log₂ 442⌉ = 9 block accesses.

To search for a record using the index, we need an additional block access to the data file, for
a total of 9 + 1 = 10 block accesses.
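
As a quick check of Example 2, the following Python sketch recomputes the index size and search cost. The function name is a hypothetical helper; it assumes a dense index (one entry per record) searched by binary search, as in the solution above.

```python
import math

def dense_secondary_index_cost(r, B, V, P):
    """Return (index blocks, total block accesses) for a dense secondary index.

    Hypothetical helper: one index entry per data record, each entry
    holding a key of V bytes and a block pointer of P bytes.
    """
    bfri = B // (V + P)                   # index entries per block
    bi = -(-r // bfri)                    # ceiling of r / bfri
    cost = math.ceil(math.log2(bi)) + 1   # binary search + one data block
    return bi, cost

print(dense_secondary_index_cost(r=30000, B=1024, V=9, P=6))  # (442, 10)
```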

Example 3: for multi-level index


Points to remember about multi-level index
● The value bfri is called the fan-out of the multilevel index with symbol fo.
● The first (or base) level of a multilevel index is an ordered file with a distinct value
for each K(i). Therefore, we can create a primary index for the first level called the
second level of the multilevel index. Because the second level is a primary index, we
can use block anchors so that the second level has one entry for each block of the
first level. The blocking factor bfri for the second level—and for all subsequent
levels—is the same as that for the first-level index because all index entries are the
same size; each has one field value and one block address.
● If the first level has r1 entries, and the blocking factor (which is also the fan-out) for
the index is bfri = fo, then the first level needs ⌈r1/fo⌉ blocks, which is therefore the
number of entries r2 needed at the second level of the index.
● We can repeat this process for the second level. The third level, which is a primary
index for the second level, has an entry for each second-level block, so the number of
third-level entries is r3 = ⌈r2/fo⌉.
● We can repeat the preceding process until all the entries of some index level t fit in a
single block. This block at the tth level is called the top index level.
Example 3.
Suppose that the dense secondary index of Example 2 is converted into a multilevel index.
Find out the I/O cost of the data file with multi-level index.
Solution:

We calculated the index blocking factor bfri = 68 index entries per block, which is also the
fan-out fo for the multilevel index.
Number of first-level blocks b1 = 442 blocks was also calculated.
Number of second-level blocks will be b2 = ⌈b1/fo⌉ = ⌈442/68⌉ = 7 blocks.
Number of third-level blocks will be b3 = ⌈b2/fo⌉ = ⌈7/68⌉ = 1 block.
Hence, the third level is the top level of the index, and t = 3.

To access a record by searching the multilevel index, we must access one block at each level
plus one block from the data file, so we need t + 1 = 3 + 1 = 4 block accesses.
Compare this to Example 2, where 10 block accesses were needed when a single-level index
and binary search were used.
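
The level-by-level calculation can be sketched in Python as a loop that keeps dividing by the fan-out until a level fits in one block. The function name is an illustrative assumption; the inputs are the 30,000 first-level entries and fan-out 68 from Example 2.

```python
def multilevel_index_levels(first_level_entries, fan_out):
    """Return the number of blocks at each index level, base level first.

    Hypothetical helper: every level above the first has one entry per
    block of the level below, and all levels share the same fan-out.
    """
    levels = []
    entries = first_level_entries
    while True:
        blocks = -(-entries // fan_out)   # ceiling of entries / fan_out
        levels.append(blocks)
        if blocks == 1:                   # top level fits in a single block
            break
        entries = blocks                  # one entry per block of this level
    return levels

levels = multilevel_index_levels(30000, 68)
print(levels)           # [442, 7, 1]
print(len(levels) + 1)  # t + 1 = 4 block accesses
```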

Example 4. Calculate the order of a leaf and a non-leaf node of a B+-tree


Suppose that the search key field is V = 9 bytes long, the block size is B = 512 bytes, a record
pointer is Pr = 7 bytes, and a block pointer is P = 6 bytes. Calculate the order p of a B+-tree.
Solution: to find the order of internal node
An internal node of the B+-tree can have up to p tree pointers and p – 1 search field values;
these must fit into a single block.
Structure of an internal node in a B+tree is BP K BP K BP K…
Let p be the order of the non-leaf node and P the size of a block pointer.
Hence, we have: (p * P) + ((p – 1) * V) ≤ B (if p is the order of the tree, then at most p – 1
search keys are stored in any node)
(p * 6) + ((p – 1) * 9) ≤ 512
(15 * p) ≤ 521
We can choose p to be the largest value satisfying the above inequality, which gives p = 34.
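
The inequality can be solved directly, as in this small Python sketch. The function name is a hypothetical helper; it rearranges (p * P) + ((p – 1) * V) ≤ B into p ≤ (B + V) / (P + V) and takes the floor.

```python
def internal_node_order(B, V, P):
    """Largest p with p*P + (p - 1)*V <= B (hypothetical helper)."""
    return (B + V) // (P + V)

p = internal_node_order(B=512, V=9, P=6)
print(p)  # 34
# Sanity check: p fits in the block, p + 1 does not.
assert p * 6 + (p - 1) * 9 <= 512 < (p + 1) * 6 + p * 9
```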
Solution: to find the order of leaf node
The leaf nodes of the B+-tree will have the same number of values and pointers, except that
the pointers are data pointers and a next pointer.
Structure of a leaf node in a B+tree is K DP K DP K DP …..BP
Hence, the order pleaf for the leaf nodes can be calculated as follows:
Search key field is V = 9 bytes long, the block size is B = 512 bytes, a record pointer is Pr = 7
bytes, and a block pointer is P = 6 bytes.
(pleaf * (Pr + V)) + P ≤ B
(pleaf * (7 + 9)) + 6 ≤ 512
(16 * pleaf) ≤ 506
It follows that each leaf node can hold up to pleaf = 31 key value/data pointer combinations,
assuming that the data pointers are record pointers.
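
The leaf-node order follows the same pattern, as this Python sketch shows. The function name is a hypothetical helper; it reserves P bytes for the next-leaf pointer and divides the remaining space among key/record-pointer pairs.

```python
def leaf_node_order(B, V, Pr, P):
    """Largest pleaf with pleaf*(Pr + V) + P <= B (hypothetical helper)."""
    return (B - P) // (Pr + V)

print(leaf_node_order(B=512, V=9, Pr=7, P=6))  # 31
```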

Solve
1. Construct a B+ tree of order p=4 with the following key elements 10, 12, 16, 18, 22,
26, 63, 92, 110. Assume that the tree is initially empty and values are added in
ascending order.
2. Delete the key 200 from the B+tree given below.

3. Suppose that we are using extendable hashing on a file that contains records with the
following search-key values: 2, 3, 5, 7, 11, 17, 19, 23, 29, 31. Show the extendable hash
structure for this file if the hash function is h(x) = x mod 8 and buckets can hold three
records.
