large: the data does not fit in main memory, so secondary storage
must be used
persistent: data written to a file should persist after it has been used,
so that it can be used again
reliable: the data should survive hardware and software failures, and the
system should be able to recover from these failures
sharable: the data can be shared by multiple users
Cache
main memory
flash memory
magnetic-disk storage
optical storage
magnetic-tape storage
Magnetic Disks
seek time
rotational delay
block transfer time
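The three components above add up to the time needed to read one block. The following is a minimal sketch of that sum, assuming the usual convention that the average rotational delay is half of one rotation. The function names and the sample figures in the usage note are illustrative assumptions, not values from the slides.

```python
def avg_rotational_delay_ms(rpm: float) -> float:
    """Average rotational delay in ms: half of one full rotation (assumed convention)."""
    return 0.5 * 60_000.0 / rpm


def block_transfer_time_ms(block_bytes: int, transfer_rate_mb_per_s: float) -> float:
    """Time in ms to transfer one block at the given sustained rate (1 MB = 10**6 bytes)."""
    bytes_per_ms = transfer_rate_mb_per_s * 1_000_000 / 1000
    return block_bytes / bytes_per_ms


def block_access_time_ms(seek_ms: float, rpm: float,
                         block_bytes: int, transfer_rate_mb_per_s: float) -> float:
    """Seek time + average rotational delay + block transfer time, all in ms."""
    return (seek_ms
            + avg_rotational_delay_ms(rpm)
            + block_transfer_time_ms(block_bytes, transfer_rate_mb_per_s))
```

For a hypothetical disk with an 8 ms seek time spinning at 6000 rpm, `block_access_time_ms(8, 6000, 1000, 1.0)` adds 8 ms, 5 ms and 1 ms.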
File Operations
Block Allocation
File Organization
Ordered
hashing
Access Methods
Primary Key Access Methods
Hashing
Primary Key Indexing
Multilevel Indexing
B-Trees
B+-Trees
Internal Hashing
Key (example: Name) -> apply hash function -> physical address
h(K) = K mod m
m = 70 - 90% of the expected number of records
[Figure: hash table with fields Name, Department, Salary. Slot 3 holds James Adams, slot 15 holds Mary Jones, slot 17 holds Henry Truman; three entries in the figure are -1.]
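The idea of internal hashing, h(K) = K mod m applied to a key to obtain a slot in an in-memory table, can be sketched as follows. Two things here are assumptions and not taken from the slides: a name is turned into an integer by summing its character codes, and collisions are resolved by linear probing. The class and function names are hypothetical.

```python
def h(key: int, m: int) -> int:
    """Hash function from the slide: h(K) = K mod m."""
    return key % m


def name_to_int(name: str) -> int:
    # Assumption: convert a non-numeric key to an integer by summing character codes.
    return sum(ord(c) for c in name)


class InternalHashTable:
    def __init__(self, m: int):
        self.m = m
        self.slots = [None] * m  # each slot holds (name, record) or None

    def insert(self, name: str, record: dict) -> int:
        """Store the record and return the slot number that was used."""
        start = h(name_to_int(name), self.m)
        for i in range(self.m):
            pos = (start + i) % self.m  # assumption: linear probing on collision
            if self.slots[pos] is None or self.slots[pos][0] == name:
                self.slots[pos] = (name, record)
                return pos
        raise RuntimeError("hash table is full")

    def lookup(self, name: str):
        """Return the record stored under name, or None if it is absent."""
        start = h(name_to_int(name), self.m)
        for i in range(self.m):
            entry = self.slots[(start + i) % self.m]
            if entry is None:
                return None
            if entry[0] == name:
                return entry[1]
        return None
```

The slot numbers this sketch produces will not match the figure, because the slides do not say how a name is mapped to an integer.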
External Hashing
- number of disk accesses is never more than 2 but will usually be 1
- the file has 2 levels, the directory (bucket address table) and buckets
- the bucket contains actual records
- the key is to choose a good hash function h such that no more than n records
have the same hash value, where n is the number of records that can be stored in a
bucket
- if there is a collision, overflow buckets may be used
Part Number   Hash Function
2369          20 mod 8 = 4
3760          16 mod 8 = 0
4692          21 mod 8 = 5
4871          20 mod 8 = 4
5659          25 mod 8 = 1
7115          14 mod 8 = 6
1620           9 mod 8 = 1
2428          16 mod 8 = 0

Bucket   Contents
0        3760, 2428
1        5659, 1620
2        null
3        null
4        2369, 4871
5        4692, null
6        7115, null
7        null
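The part-number example can be reproduced with a short sketch. It assumes that the number fed into "mod 8" is the digit sum of the part number (2 + 3 + 6 + 9 = 20 for 2369, which matches every value in the figure) and that a bucket holds two records. The overflow handling and all names are hypothetical.

```python
NUM_BUCKETS = 8
BUCKET_CAPACITY = 2  # assumption: two records per bucket, as the figure suggests


def h(part_number: int) -> int:
    # Assumption: hash the digit sum of the part number, e.g. 2369 -> 20 mod 8 = 4.
    return sum(int(d) for d in str(part_number)) % NUM_BUCKETS


def new_directory():
    # directory[i] is a chain of buckets: the primary bucket, then overflow buckets.
    return [[] for _ in range(NUM_BUCKETS)]


def insert(directory, part_number: int) -> None:
    chain = directory[h(part_number)]
    if not chain or len(chain[-1]) == BUCKET_CAPACITY:
        chain.append([])  # allocate the primary bucket or a new overflow bucket
    chain[-1].append(part_number)


def lookup(directory, part_number: int):
    """Return (found, buckets_read); the directory access itself is not counted."""
    chain = directory[h(part_number)]
    for reads, bucket in enumerate(chain, start=1):
        if part_number in bucket:
            return True, reads
    return False, len(chain)


directory = new_directory()
for part in [2369, 3760, 4692, 4871, 5659, 7115, 1620, 2428]:
    insert(directory, part)
```

With these eight part numbers no bucket overflows, so every lookup reads a single bucket. A third part number hashing to bucket 0, 1 or 4 would start an overflow bucket.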
Primary Indexing
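Example: EMPLOYEE (EMP #, NAME, DEPT, SALARY), ordered on EMP #

Index
EMP #   Block Pointer
107     block 1
201     block 2
371     block 3
624     block 4

Data file
Block   EMP #   SALARY
1       107     10k
1       110     12k
1       112     20k
1       115     15k
2       201     25k
2       236     10k
2       307     30k
2       366     35k
3       371     12k
3       395     15k
3       524     33k
3       608     25k
4       624     20k
4       630     30k
4       724     30k
4       798     35k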
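A lookup through the primary index in this example can be sketched as follows: binary-search the index for the last entry whose EMP # is not greater than the search key, then read only the block it points to. The grouping into four blocks of four records is inferred from the index entries 107, 201, 371 and 624; the function names are hypothetical and salaries are given in thousands.

```python
from bisect import bisect_right

# Data file ordered on EMP #: each block is a list of (emp_no, salary_in_k).
blocks = [
    [(107, 10), (110, 12), (112, 20), (115, 15)],
    [(201, 25), (236, 10), (307, 30), (366, 35)],
    [(371, 12), (395, 15), (524, 33), (608, 25)],
    [(624, 20), (630, 30), (724, 30), (798, 35)],
]

# Primary index: one entry per block, holding the first EMP # in that block.
index_keys = [block[0][0] for block in blocks]


def find_block(emp_no: int):
    """Return the number of the only block that can hold emp_no, or None."""
    i = bisect_right(index_keys, emp_no) - 1
    return i if i >= 0 else None


def lookup(emp_no: int):
    """Return the salary for emp_no, or None if no such record exists."""
    i = find_block(emp_no)
    if i is None:
        return None
    for key, salary in blocks[i]:
        if key == emp_no:
            return salary
    return None
```

The index has one entry per block, not one per record, which is why it stays much smaller than the data file.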
Clustering Indexing
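EMPLOYEE (EMP #, NAME, DEPT, SALARY), ordered on SALARY

Index entries (Salary, Block Pointer): 10k, 12k, 15k, 20k, 25k, 30k, 33k, 35k

Data file
EMP #   SALARY
107     10k
236     10k
110     12k
371     12k
115     15k
395     15k
112     20k
624     20k
201     25k
608     25k
307     30k
630     30k
724     30k
524     33k
366     35k
798     35k

[Figure: three block pointers in the data file are shown as null.]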
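The clustering index in this example has one entry per distinct salary, pointing to the first block that contains that salary. Below is a minimal sketch of building and using such an index. The block size of four records is a hypothetical choice, since the figure's block boundaries are not recoverable; the function names are also assumptions, and salaries are in thousands.

```python
# Records ordered on the non-key field SALARY: (emp_no, salary_in_k).
records = [
    (107, 10), (236, 10), (110, 12), (371, 12),
    (115, 15), (395, 15), (112, 20), (624, 20),
    (201, 25), (608, 25), (307, 30), (630, 30),
    (724, 30), (524, 33), (366, 35), (798, 35),
]

BLOCK_SIZE = 4  # hypothetical: four records per block
blocks = [records[i:i + BLOCK_SIZE] for i in range(0, len(records), BLOCK_SIZE)]


def build_clustering_index(blocks):
    """Map each distinct salary to the first block in which it appears."""
    index = {}
    for block_no, block in enumerate(blocks):
        for _, salary in block:
            index.setdefault(salary, block_no)
    return index


def find_by_salary(index, blocks, salary):
    """Return all EMP # values with the given salary, scanning forward from the indexed block."""
    if salary not in index:
        return []
    result = []
    for block in blocks[index[salary]:]:
        for emp_no, value in block:
            if value == salary:
                result.append(emp_no)
            elif value > salary:
                return result  # file is ordered, so no later record can match
    return result


index = build_clustering_index(blocks)
```

Because the file is ordered on SALARY, all records with the same salary are adjacent, and the scan can stop at the first larger value.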
Secondary Indexing
EMPLOYEE (EMP #, NAME, DEPT, SALARY), not ordered on EMP #

Data file
EMP #   SALARY
201     25k
110     12k
366     35k
107     10k
115     15k
236     10k
307     30k
112     20k
798     35k
395     15k
524     33k
724     30k
624     20k
630     30k
608     25k
371     12k

Secondary Index on EMP # (entries visible in the figure): 307, 366, 371, 395, 524, 608, 624, 630, 724, 798
Example:
Number of records, r = 30,000
Block size, B = 1024 bytes
Record length, R = 100 bytes
Blocking factor, f = floor(B/R) = floor(1024/100) = 10 records/block
Number of blocks needed, b = 30,000/10 = 3000 blocks
Key field, V = 9 bytes
Block pointer, P = 6 bytes
Blocking factor for index entries = floor(1024/(9 + 6)) = floor(1024/15) = 68 entries/block
Number of blocks needed to store index entries = ceil(30,000/68) = 442 blocks
Number of block accesses needed = ceil(log2 442) + 1 = 9 + 1 = 10
(binary search of the index, plus one access to the data block)
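The arithmetic of this example can be checked with a few lines of code. This is only a sketch of the calculation: the function name and the keys of the returned dictionary are hypothetical, and it assumes a dense index (one entry per record) searched by binary search.

```python
import math


def secondary_index_cost(r: int, B: int, R: int, V: int, P: int) -> dict:
    """Reproduce the slide's calculation for a dense secondary index."""
    f = B // R                                # blocking factor of the data file
    b = math.ceil(r / f)                      # blocks in the data file
    index_f = B // (V + P)                    # index entries per block
    index_blocks = math.ceil(r / index_f)     # one index entry per record
    # Binary search over the index blocks, plus one access for the data block.
    accesses = math.ceil(math.log2(index_blocks)) + 1
    return {
        "blocking_factor": f,
        "data_blocks": b,
        "index_blocking_factor": index_f,
        "index_blocks": index_blocks,
        "block_accesses": accesses,
    }


cost = secondary_index_cost(r=30000, B=1024, R=100, V=9, P=6)
```

Integer division (`//`) plays the role of floor, since a record or index entry cannot span a block boundary in this model.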
occupies more space than a primary index, since it has one entry per record
requires maintenance, hence expensive
can be created on a non-key field
Multilevel Indexing
When the index file itself is large, we can construct an index on the index
The index on the index is always a primary index, because the first-level index is an ordered file
[Figure: multilevel index with Block Pointer entries; blocks labelled 1 to 9.]
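How many levels a multilevel index needs can be sketched by repeatedly indexing the level below until a level fits in one block. The example reuses the figures from the secondary-index calculation (30,000 index entries, 68 entries per block); applying them here is an assumption made for illustration, and the function name is hypothetical.

```python
import math


def index_levels(num_entries: int, fan_out: int) -> list:
    """Return the number of blocks at each index level, first level first."""
    if num_entries < 1 or fan_out < 2:
        raise ValueError("need at least one entry and a fan-out of at least 2")
    levels = []
    entries = num_entries
    while True:
        blocks = math.ceil(entries / fan_out)
        levels.append(blocks)
        if blocks == 1:       # the top level fits in a single block
            return levels
        entries = blocks      # the next level has one entry per block of this level


levels = index_levels(30000, 68)
block_accesses = len(levels) + 1  # one block per index level, plus the data block
```

Under these assumptions the index has three levels of 442, 7 and 1 blocks, so a lookup reads 4 blocks where binary search on the single-level index needs 10.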
B+-Trees
All the keys and their associated data pointers (pointers to the records)
reside in the leaf nodes
Example of a B-Tree
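The B+-tree property stated above, that data pointers live only in the leaves, can be illustrated with a small hand-built tree. This is a sketch of search only, with no insertion or node splitting. The node layout, the rule that a key equal to a separator goes to the right child, the linked leaves, and the pointer strings are all assumptions made for the example.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Optional


@dataclass
class Leaf:
    keys: list
    pointers: list                  # one data pointer per key
    next: Optional["Leaf"] = None   # link to the next leaf, for ordered scans


@dataclass
class Internal:
    keys: list                      # routing keys only, no data pointers
    children: list                  # len(children) == len(keys) + 1


def search(node, key):
    """Descend to a leaf and return the data pointer for key, or None."""
    while isinstance(node, Internal):
        node = node.children[bisect_right(node.keys, key)]
    for k, pointer in zip(node.keys, node.pointers):
        if k == key:
            return pointer
    return None


def scan_all(leaf):
    """Walk the leaf chain and return every key in order."""
    keys = []
    while leaf is not None:
        keys.extend(leaf.keys)
        leaf = leaf.next
    return keys


# Hypothetical tree over a few EMP # values; pointer strings stand in for record addresses.
leaf3 = Leaf([371, 395], ["rec371", "rec395"])
leaf2 = Leaf([201, 236], ["rec201", "rec236"], next=leaf3)
leaf1 = Leaf([107, 110], ["rec107", "rec110"], next=leaf2)
root = Internal(keys=[201, 371], children=[leaf1, leaf2, leaf3])
```

The root's keys 201 and 371 also appear in the leaves: the internal copies only route the search, and the record is always reached through a leaf entry.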