Chapter 13: Disk Storage, Basic File Structures, and Hashing
the file, using linear search, if the file blocks are not stored on consecutive disk
blocks.
(g) Assume that the records are ordered via some key field. Calculate the average
number of block accesses and the average time needed to search for an arbitrary
record in the file, using binary search.
Answer:
(a) R = (9 + 20 + 20 + 1 + 10 + 35 + 12 + 9 + 4 + 4) + 1 = 125 bytes
bfr = floor(B / R) = floor(2400 / 125) = 19 records per block
b = ceiling(r / bfr) = ceiling(30000 / 19) = 1579 blocks
(b) Wasted space per block = B - (R * bfr) = 2400 - (125 * 19) = 25 bytes
(c) Transfer rate tr = B/btt = 2400 / 1 = 2400 bytes/msec
Bulk transfer rate btr = tr * (B/(B+G))
= 2400 * (2400/(2400+600)) = 1920 bytes/msec
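The numbers in (a)-(c) can be checked with a short Python sketch (illustrative, not from the text); it assumes the parameters used above, B = 2400 bytes, G = 600 bytes (interblock gap), btt = 1 msec, and r = 30000 records, and assumes the extra byte in R is a deletion marker:

import math

field_sizes = [9, 20, 20, 1, 10, 35, 12, 9, 4, 4]  # the ten fixed-length fields
R = sum(field_sizes) + 1      # +1 byte, assumed to be the deletion marker
B, G, btt, r = 2400, 600, 1, 30000

bfr = B // R                  # blocking factor: floor(B / R)
b = math.ceil(r / bfr)        # number of blocks needed for the file
waste = B - R * bfr           # unused bytes per block
tr = B / btt                  # transfer rate, bytes/msec
btr = tr * (B / (B + G))      # bulk transfer rate, accounting for gaps

print(R, bfr, b, waste, tr, btr)   # 125 19 1579 25 2400.0 1920.0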
(d) For linear search we have the following cases:
i. search on key field:
if record is found, half the file blocks are searched on average: b/2= 1579/2 blocks
if record is not found, all file blocks are searched: b = 1579 blocks
ii. search on non-key field:
all file blocks must be searched: b = 1579 blocks
(e) If the blocks are stored consecutively, and double buffering is used, the time to read
n consecutive blocks= s+rd+(n*(B/btr))
i. if n=b/2: time = 20+10+((1579/2)*(2400/1920))= 1016.9 msec = 1.017 sec
(a less accurate estimate is = s+rd+(n*btt)= 20+10+(1579/2)*1= 819.5 msec)
ii. if n=b: time = 20+10+(1579*(2400/1920))= 2003.75 msec = 2.004 sec
(a less accurate estimate is = s+rd+(n*btt)= 20+10+1579*1= 1609 msec)
(f) If the blocks are scattered over the disk, a seek is needed for each block, so the time
to search n blocks is: n * (s + rd + btt)
i. if n=b/2: time = (1579/2)*(20+10+1)= 24474.5 msec = 24.475 sec
ii. if n=b: time = 1579*(20+10+1)= 48949 msec = 48.949 sec
(g) For binary search, the time to search for a record is estimated as:
ceiling(log2(b)) * (s + rd + btt)
= ceiling(log2(1579)) * (20+10+1) = 11 * 31 = 341 msec = 0.341 sec
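A companion sketch (again illustrative) reproduces the timing estimates in (d)-(g), assuming s = 20 msec and rd = 10 msec as used in the arithmetic above:

import math

s, rd, btt = 20, 10, 1        # seek, rotational delay, block transfer (msec)
b, B, btr = 1579, 2400, 1920

def consecutive(n):
    # (e) double buffering: one seek + one rotational delay, then n blocks
    # transferred at the bulk transfer rate
    return s + rd + n * (B / btr)

def scattered(n):
    # (f) each block needs its own seek, rotational delay, and transfer
    return n * (s + rd + btt)

binary = math.ceil(math.log2(b)) * (s + rd + btt)   # (g) binary search

print(consecutive(b / 2), consecutive(b))   # 1016.875 2003.75
print(scattered(b / 2), scattered(b))       # 24474.5 48949
print(binary)                               # 341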
13.27 A PARTS file with Part# as hash key includes records with the following Part#
values: 2369, 3760, 4692, 4871, 5659, 1821, 1074, 7115, 1620, 2428,
3943, 4750, 6975, 4981, 9208. The file uses 8 buckets, numbered 0 to 7. Each
bucket is one disk block and holds two records. Load these records into the file in
the given order using the hash function h(K)=K mod 8. Calculate the average
number of block accesses for a random retrieval on Part#.
Answer:
The records will hash to the following buckets:
K h(K) (bucket number)
2369 1
3760 0
4692 4
4871 7
5659 3
1821 5
1074 2
7115 3
1620 4
2428 4 overflow
3943 7
4750 6
6975 7 overflow
4981 5
9208 0
Two records out of 15 are in overflow, which will require an additional block access. The
other records require only one block access. Hence, the average time to retrieve a random
record is:
(1 * (13/15)) + (2 * (2/15)) = 0.867 + 0.267 = 1.133 block accesses
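A few lines of Python (illustrative) confirm which records overflow and the average cost:

keys = [2369, 3760, 4692, 4871, 5659, 1821, 1074, 7115,
        1620, 2428, 3943, 4750, 6975, 4981, 9208]

buckets = {i: [] for i in range(8)}   # 8 buckets, capacity 2 records each
overflow = []
for k in keys:
    h = k % 8
    if len(buckets[h]) < 2:
        buckets[h].append(k)
    else:
        overflow.append(k)            # costs a second block access

accesses = 1 * (len(keys) - len(overflow)) + 2 * len(overflow)
print(overflow)                # [2428, 6975]
print(accesses / len(keys))    # 1.1333...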
13.28 Load the records of Exercise 13.27 into an expandable hash file based on extendible
hashing. Show the structure of the directory at each step, and the global and local depths.
Use the hash function h(K) = K mod 32.
Answer:
Hashing the records with h(K) = K mod 32 gives the following result:
K     h(K)  5-bit binary
2369  1     00001
3760  16    10000
4692  20    10100
4871  7     00111
5659  27    11011
1821  29    11101
1074  18    10010
7115  11    01011
1620  20    10100
2428  28    11100
3943  7     00111
4750  14    01110
6975  31    11111
4981  21    10101
9208  24    11000
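These values can be generated with a short sketch (illustrative); in the chapter's extendible-hashing scheme the directory is indexed by the leading bits of the 5-bit hash value:

keys = [2369, 3760, 4692, 4871, 5659, 1821, 1074, 7115,
        1620, 2428, 3943, 4750, 6975, 4981, 9208]

for k in keys:
    h = k % 32
    print(k, h, format(h, '05b'))   # key, hash value, 5-bit binary form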
13.29 Load the records of Exercise 13.27 into an expandable hash file based on linear
hashing. Start with a single disk block, using the hash function h = K mod 2, and show how
the file grows and how the hash functions change as the records are inserted. Assume that
blocks are split whenever an overflow occurs, and show the value of n at each stage.
Answer:
Note: It is more common to specify a certain load factor for the file for triggering the
splitting of buckets (rather than triggering the splitting whenever a new record
being inserted is placed in overflow). The load factor lf could be defined as:
lf = r / (b * bfr)
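The split-on-overflow variant the exercise asks for can be simulated with the following sketch (illustrative, not the manual's worked figure); it assumes bucket capacity 2 (the bfr of Exercise 13.27) and the usual convention that level j uses h_j(K) = K mod 2^j, with buckets below the split pointer n addressed by h_(j+1):

BFR = 2   # records per bucket, as in Exercise 13.27

class LinearHashFile:
    def __init__(self):
        self.j = 0            # level: h_j(K) = K mod 2**j
        self.n = 0            # next bucket to be split
        self.buckets = [[]]   # start with a single disk block
        self.overflow = {}    # bucket index -> overflow records

    def addr(self, key):
        a = key % (2 ** self.j)
        if a < self.n:                       # bucket already split this level
            a = key % (2 ** (self.j + 1))
        return a

    def insert(self, key):
        a = self.addr(key)
        if len(self.buckets[a]) < BFR:
            self.buckets[a].append(key)
        else:
            self.overflow.setdefault(a, []).append(key)
            self.split()                     # an overflow triggers a split

    def split(self):
        # Redistribute bucket n (plus its overflow) between bucket n and
        # the new bucket n + 2**j, using h_(j+1)(K) = K mod 2**(j+1).
        old = self.buckets[self.n] + self.overflow.pop(self.n, [])
        self.buckets[self.n] = []
        self.buckets.append([])
        for key in old:
            a = key % (2 ** (self.j + 1))
            if len(self.buckets[a]) < BFR:
                self.buckets[a].append(key)
            else:
                self.overflow.setdefault(a, []).append(key)
        self.n += 1
        if self.n == 2 ** self.j:            # all level-j buckets split
            self.j += 1
            self.n = 0

f = LinearHashFile()
for k in [2369, 3760, 4692, 4871, 5659, 1821, 1074, 7115,
          1620, 2428, 3943, 4750, 6975, 4981, 9208]:
    f.insert(k)
print(f.j, f.n)                # final level and split pointer
print(f.buckets, f.overflow)   # final bucket contents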
subsequent search costs are less than the search cost before reorganization. Assume s = 16 ms,
rd = 8.3 ms, btt = 1 ms.
Answer:
Primary Area = 600 buckets
Secondary Area (Overflow) = 600 buckets
Total reorganization cost = buckets read + buckets written = (600 + 600) +
1200 = 2400 bucket accesses = 2400 * (1 ms) = 2400 ms
Let X = number of random fetches from the file.
Average search time per fetch = time to access (1 + 1/2) buckets, since 50% of the
time we need to access the overflow bucket.
Access time for one bucket access = (s + rd + btt)
= 16 + 8.3 + 1
= 25.3 ms
Time with reorganization for the X fetches
= 2400 + X * (25.3) ms
Time without reorganization for X fetches = X * (25.3) * (1 + 1/2) ms
= 1.5 * X * (25.3) ms.
Hence, 2400 + X * (25.3) < (25.3) * (1.5 * X)
2400 / 12.65 < X
Hence, 189.72 < X
If we make at least 190 fetches, the reorganization is worthwhile.
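The break-even point can be double-checked with a quick calculation (illustrative):

s, rd, btt = 16, 8.3, 1
reorg = 2400 * btt             # read 1200 + write 1200 buckets
access = s + rd + btt          # 25.3 ms per bucket access
# reorg + X*access < 1.5*X*access  =>  X > reorg / (0.5*access)
print(reorg / (0.5 * access))  # 189.72..., so at least 190 fetches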
13.41 Suppose we want to create a linear hash file with a file load factor of 0.7 and a
blocking factor of 20 records per bucket, which is to contain 112000 records initially.
(a) How many buckets should we allocate in the primary area?
(b) What should be the number of bits used for bucket addresses?
Answer:
(a) No. of buckets in primary area
= 112000 / (20 * 0.7) = 8000.
(b) K: the number of bits used for bucket addresses
2^K <= 8000 <= 2^(K+1)
2^12 = 4096
2^13 = 8192
K = 12
Boundary value = 8000 - 2^12
= 8000 - 4096
= 3904
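A final sketch (illustrative) reproduces these figures:

import math

r, bfr, lf = 112000, 20, 0.7
buckets = r / (bfr * lf)       # primary-area buckets: 8000.0
K = int(math.log2(buckets))    # largest K with 2**K <= buckets
boundary = buckets - 2 ** K    # buckets already split at level K
print(buckets, K, boundary)    # 8000.0 12 3904.0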