Chapter 03 - 1
(PART 01)
FILE ORGANIZATION
WAAM Wanniarachchi
References
• Fundamentals of Database Systems, Seventh Edition
Elmasri & Navathe
File Organization
• File organization refers to the organization of the data of a
file into records, blocks, and access structures; this includes
the way records and blocks are placed on the storage
medium and interlinked.
• In other words, file organization refers to the logical
arrangement of data in a file system.
• A file is a collection of similar records that have the same fields
but different values in each record.
• There are various ways of organizing records in a file:
• Heap file organization
Example records in a heap file (stored in insertion order, not sorted by any field):
12 Barak 87900
10 Roshel 89000
• Inserting a new record is very efficient. The last disk block of the file
is copied into a buffer, the new record is added, and the block is
then rewritten back to disk
• However, searching for a record using any search condition involves
a linear search through the file block by block—an expensive
procedure.
• If only one record satisfies the search condition then, on
average, a program will read into memory and search half the file
blocks before it finds the record. For a file of b blocks,
this requires searching (b/2) blocks on average.
• If no records or several records satisfy the search condition, the
program must read and search all b blocks in the file.
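The insert and search behaviour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `HeapFile`, the tiny `BLOCK_CAPACITY`, and the two filler records are hypothetical, and in-memory lists stand in for disk blocks.

```python
BLOCK_CAPACITY = 2  # records per block; hypothetical and tiny, for illustration only


class HeapFile:
    """Toy heap file: records are kept in insertion order, grouped into blocks."""

    def __init__(self):
        self.blocks = []  # each block is a list of records

    def insert(self, record):
        # Only the last block is touched: start a new one if it is full,
        # then append the record at the end of the file.
        if not self.blocks or len(self.blocks[-1]) >= BLOCK_CAPACITY:
            self.blocks.append([])
        self.blocks[-1].append(record)

    def search(self, predicate):
        # Linear search, block by block. Returns (record, blocks_read).
        for blocks_read, block in enumerate(self.blocks, start=1):
            for record in block:
                if predicate(record):
                    return record, blocks_read
        # No match: every block had to be read.
        return None, len(self.blocks)


heap = HeapFile()
for rec in [(12, "Barak", 87900), (10, "Roshel", 89000),
            (15, "filler-a", 0), (7, "filler-b", 0)]:  # last two are made up
    heap.insert(rec)

found, cost = heap.search(lambda r: r[0] == 7)
missing, full_cost = heap.search(lambda r: r[0] == 99)
```

Here the record with key 7 sits in the last block, so the search reads every block before finding it, and the unsuccessful search for key 99 also reads all of them.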
PROS/CONS OF HEAP STRUCTURE
Pros
• Well suited to bulk-loading data into the relation
• Insertion is efficient
• Best if file scans are common or insertions are frequent
Cons
• Retrieval requires a linear search, which is inefficient
• Deletion can leave unused space, so the file needs periodic reorganization
Serial File Organization
• A serial file contains records organized by the order in which they were
entered.
• The order of the records is fixed by the order of entry.
• The file is unordered with respect to any key field.
• Serial files are primarily used as transaction files in which the transactions
are recorded in the order that they occur
• In general, it is only used on a serial medium such as magnetic tape.
• To retrieve a single record, the whole file needs to be read from
beginning to end.
SEQUENTIAL FILE ORGANIZATION
• The records are written consecutively when the file is created and must be accessed
consecutively when the file is used
• These are serial files whose records are sorted and stored in ascending or
descending order of a particular key field
• Every file record contains a data field (attribute) to uniquely identify that record.
This key is usually the primary key, though secondary keys may be used as well.
• The key difference between a sequential file and a serial file is that a sequential
file is ordered in a logical sequence based on a key field.
PROS/CONS OF SEQUENTIAL FILES
Pros
• Simple file design
• Efficient when most of the records must be processed (batch processing, e.g.
printing paychecks for all employees)
• Very efficient if the data has a natural order
• Can be stored on magnetic tapes also
Cons
• The entire file must be processed even if only a single record is to be searched
• Overall processing is slow
Indexed sequential file organization
• Each record of a file has a key field which uniquely identifies that record
• An index consists of keys and addresses
• An index is an auxiliary structure for a file that consists of an ordered sequence
of value-pointer pairs
• A value-pointer pair, for example, might be a customer-ID value and a disk-
address pointer that points to the block that includes the customer-ID value. A
sequence of these pairs ordered by customer ID would be an index.
• Can access the records either sequentially or randomly using the index
• Index is stored in a file and read into memory when the file is opened
• Multiple keys are possible (primary and alternate keys)
• This type of file organization is suitable for both batch processing and online
processing.
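The value-pointer idea above can be sketched as follows. This is a minimal sketch, not the layout of any particular system: it assumes a sparse index holding one entry per block (the first key in the block), uses list positions as stand-ins for disk addresses, and the customer IDs and names are hypothetical.

```python
from bisect import bisect_right

# Data blocks, sorted by customer ID (hypothetical sample data).
blocks = [
    [(101, "A"), (105, "B")],
    [(110, "C"), (118, "D")],
    [(120, "E"), (131, "F")],
]

# Index: an ordered sequence of (key value, block pointer) pairs.
# Assumption: one entry per block, holding the first key stored in it.
index = [(block[0][0], block_no) for block_no, block in enumerate(blocks)]
index_keys = [key for key, _ in index]


def lookup(customer_id):
    """Random access: use the index to pick one block, then search only it."""
    pos = bisect_right(index_keys, customer_id) - 1
    if pos < 0:
        return None  # smaller than every key in the file
    _, block_no = index[pos]
    for key, value in blocks[block_no]:
        if key == customer_id:
            return (key, value)
    return None


def scan():
    """Sequential access: read the blocks in key order, ignoring the index."""
    for block in blocks:
        yield from block
```

`lookup(118)` consults the index and reads only the second block, while `scan()` walks every record in key order, which is the combination of random and sequential access the bullets describe.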
Pros/Cons of Indexed Sequential Files
• Pros
• Multiple keys are possible
• Both sequential and random access is possible
• Access to records is fast if the index table is properly organized
• Cons
• More storage space is required because the index must be stored as well
• Storage space is used less efficiently
HASH FILE ORGANIZATION
• Hashing is an effective technique to calculate the direct location of a data
record on the disk without using an index structure.
• Hashing uses hash functions with search keys as parameters to generate the
address of a data record.
• The field on which the hash function is calculated is called the hash field, and if
that field acts as the key of the relation it is called the hash key.
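A minimal sketch of the idea, under stated assumptions: the hash function is a simple modulo over a hypothetical `NUM_BUCKETS`, bucket numbers stand in for disk addresses, and records that hash to the same bucket are simply kept together in a list. The names `hash_address`, `insert`, `find`, and `update_hash_field` are illustrative, not taken from the reference text.

```python
NUM_BUCKETS = 4  # hypothetical number of bucket addresses


def hash_address(key):
    # Hash function: maps the hash field value straight to a bucket address.
    return key % NUM_BUCKETS


buckets = [[] for _ in range(NUM_BUCKETS)]


def insert(record):
    # record[0] is the hash field.
    buckets[hash_address(record[0])].append(record)


def find(key):
    # Exact-match lookup: compute the address, then look in that one bucket.
    for record in buckets[hash_address(key)]:
        if record[0] == key:
            return record
    return None


def update_hash_field(old_key, new_key):
    # Changing the hash field means deleting the record and re-inserting it,
    # possibly at a different address.
    record = find(old_key)
    if record is None:
        return False
    buckets[hash_address(old_key)].remove(record)
    insert((new_key,) + record[1:])
    return True


insert((12, "Barak", 87900))   # 12 % 4 == 0, stored in bucket 0
insert((10, "Roshel", 89000))  # 10 % 4 == 2, stored in bucket 2
```

An exact-match `find(10)` touches a single bucket. A range query such as "keys between 10 and 12" gets no help from the hash function, because neighbouring keys land in unrelated buckets.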
Pros/Cons of Hash file organization
Pros
• Efficient when tuples are retrieved based on an exact match on the hash field value,
particularly if the access order is random.
Cons
• Inefficient when tuples are retrieved based on a range of values for the hash field.
• Inefficient when the hash field is frequently updated. When a hash field is updated,
the DBMS must delete the entire tuple and possibly relocate it to a new
address (if the hash function results in a new address). Thus, frequent updating
of the hash field hurts performance.
Spanned Vs. Un-spanned Records