
CHAPTER 03

(PART 01)
FILE ORGANIZATION

WAAM Wanniarachchi
References
• Fundamentals of Database Systems, Seventh Edition
Elmasri & Navathe
File Organization
• File organization refers to the organization of the data of a
file into records, blocks, and access structures; this includes
the way records and blocks are placed on the storage
medium and interlinked.
• In other words, file organization refers to the logical
arrangement of data in a file system.
• A file is a collection of similar records that have the same fields
but different values in each record.
• There are various ways of organizing records in a file:
  • Heap file organization
  • Serial file organization
  • Sequential file organization
  • Indexed sequential file organization
  • Hash file organization

HEAP/PILE FILES ORGANIZATION
• Any record can be placed anywhere in the file where there
is space for the record.
• There is no ordering of records.
• Generally there is a single file for each relation (table).
• The file consists of randomly ordered records.
• Add – New records can be inserted in any empty space that
can accommodate them.
• Delete – When old records are deleted, the freed space becomes
available for new records.
• Update – If updated records grow, they may need to be relocated
to a new empty space. This requires keeping a list of empty
space.
Example heap file (records in no particular order):

Block 1
EID  Name     Salary
2    Jhonny   34000
1    English  78000
3    Moore    67000
5    Barak    87900
4    Roshel   89000

Block 2
EID  Name     Salary
8    Jhonny   34000
9    English  78000
12   Barak    87900
10   Roshel   89000

• Inserting a new record is very efficient. The last disk block of the file
is copied into a buffer, the new record is added, and the block is
then rewritten back to disk.
• However, searching for a record using any search condition involves
a linear search through the file block by block, an expensive
procedure.
• If only one record satisfies the search condition, then, on
average, a program will read into memory and search half the file
blocks before it finds the record. For a file of b blocks,
this requires searching b/2 blocks on average. If no records or
several records satisfy the search condition, the program must read
and search all b blocks in the file.
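
The insert and linear-search behaviour described above can be sketched in a few lines of Python. This is only an illustrative in-memory model, not how a DBMS stores blocks on disk: the class name HeapFile, the capacity parameter, and the choice of five records per block are assumptions made for the example. The records are the ones from the Block 1 / Block 2 example.

```python
class HeapFile:
    """Toy heap file: a list of blocks, each holding up to `capacity` records."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = [[]]

    def insert(self, record):
        # Append to the last block; start a new block if the last one is full.
        if len(self.blocks[-1]) >= self.capacity:
            self.blocks.append([])
        self.blocks[-1].append(record)

    def search(self, eid):
        # Linear search, block by block. Returns (record, blocks_read).
        for blocks_read, block in enumerate(self.blocks, start=1):
            for record in block:
                if record[0] == eid:
                    return record, blocks_read
        return None, len(self.blocks)


heap = HeapFile(capacity=5)  # assumed: five records fit in one block
for rec in [(2, "Jhonny", 34000), (1, "English", 78000), (3, "Moore", 67000),
            (5, "Barak", 87900), (4, "Roshel", 89000), (8, "Jhonny", 34000),
            (9, "English", 78000), (12, "Barak", 87900), (10, "Roshel", 89000)]:
    heap.insert(rec)

found, cost = heap.search(12)         # stored in the second block
missing, full_scan = heap.search(99)  # absent, so every block is read
```

Searching for a key that is not in the file reads every block, which is the worst case described in the last bullet.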
PROS/CONS OF HEAP STRUCTURE

Pros
• Well suited when data is being bulk-loaded into the relation
• Insertion is efficient
• Best if file scans are common or insertions are frequent

Cons
• Retrieval requires a linear search, which is inefficient
• Deletion can leave unused space, which requires periodic reorganization
Serial File Organization

• A serial file contains records organized in the order in which they were
entered.
• The order of the records is fixed.
• The file is unordered: records are not sorted on any key field.
• Serial files are primarily used as transaction files in which the transactions
are recorded in the order that they occur.
• In general, a serial file is only used on a serial medium such as magnetic tape.
• To retrieve a single record, the whole file needs to be read from the
beginning to the end.
SEQUENTIAL FILE ORGANIZATION
• The records are written consecutively when the file is created and must be accessed
consecutively when the file is used
• These are serial files whose records are sorted and stored in ascending or
descending order on a particular key field.
• Every file record contains a data field (attribute) that uniquely identifies that record.
This key is usually the primary key, though secondary keys may be used as well.
• The key difference between a sequential file and a serial file is that a sequential
file is ordered in a logical sequence based on a key field.
PROS/CONS OF SEQUENTIAL FILES
Pros
• Simple file design
• Efficient when most of the records must be processed (batch processing, e.g.
printing paychecks for all employees)
• Very efficient if the data has a natural order
• Can also be stored on magnetic tape

Cons
• The entire file must be processed even if only a single record is to be searched
• Overall processing is slow
Indexed sequential file organization
• Each record of a file has a key field which uniquely identifies that record
• An index consists of keys and addresses.
• An index is an auxiliary structure for a file that consists of an ordered sequence
of value-pointer pairs.
• A value-pointer pair, for example, might be a customer-ID value and a
disk-address pointer that points to the block that includes the customer-ID value. A
sequence of these pairs ordered by customer ID would be an index.
• Records can be accessed either sequentially or randomly using the index.
• The index is stored in a file and read into memory when the file is opened.
• Multiple keys are possible (primary and alternate keys).
• This type of file organization is suitable for both batch processing and online
processing.
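
As a rough illustration of value-pointer pairs, the sketch below builds an index of (customer ID, block number) pairs and uses it for both random and sequential access. The customer IDs, names, block numbers, and the functions lookup and scan are all hypothetical, and block numbers stand in for disk addresses. Binary search over the in-memory index is one possible way to search it; the slides do not prescribe a method.

```python
from bisect import bisect_left

# Data blocks: block number -> records (customer_id, name), sorted by customer_id.
blocks = {
    0: [(101, "Ann"), (105, "Bob")],
    1: [(110, "Cid"), (118, "Dee")],
    2: [(123, "Eve"), (130, "Fay")],
}

# Index: ordered (customer_id, block_number) value-pointer pairs, one per record.
index = sorted((cid, blk) for blk, recs in blocks.items() for cid, _ in recs)
keys = [cid for cid, _ in index]


def lookup(customer_id):
    """Random access: binary-search the index, then read only one block."""
    i = bisect_left(keys, customer_id)
    if i == len(keys) or keys[i] != customer_id:
        return None
    block_no = index[i][1]
    for record in blocks[block_no]:
        if record[0] == customer_id:
            return record
    return None


def scan():
    """Sequential access: visit the records in key order via the index."""
    return [lookup(cid) for cid in keys]
```

Here lookup(118) touches only block 1, while scan() returns all six records in customer-ID order.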
PROS/CONS OF INDEXED SEQUENTIAL FILES
Pros
• Multiple keys are possible
• Both sequential and random access are possible
• Access to records is fast if the index table is properly organized

Cons
• More storage space is required because of the index
• Storage space is used less efficiently
HASH FILE ORGANIZATION
• Hashing is an effective technique for calculating the direct location of a data
record on the disk without using an index structure.
• Hashing uses hash functions with search keys as parameters to generate the
address of a data record.
• The field on which the hash function is calculated is called the hash field; if
that field is also the key of the relation, it is called the hash key.
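
A minimal sketch of the idea, assuming a simple modulo hash function and a fixed number of buckets that stand in for disk addresses. Neither choice comes from the slides. Records that hash to the same bucket are simply kept together in that bucket, which is one common way to handle collisions.

```python
NUM_BUCKETS = 7  # assumed number of buckets (stand-ins for disk addresses)


def hash_address(eid):
    """Hash function: maps the hash-field value (EID) to a bucket number."""
    return eid % NUM_BUCKETS


buckets = [[] for _ in range(NUM_BUCKETS)]


def insert(record):
    # The record's address is computed directly from its hash-field value.
    buckets[hash_address(record[0])].append(record)


def find(eid):
    """Exact-match lookup: only the one computed bucket is examined."""
    for record in buckets[hash_address(eid)]:
        if record[0] == eid:
            return record
    return None


for rec in [(2, "Jhonny", 34000), (9, "English", 78000), (12, "Barak", 87900)]:
    insert(rec)
```

EIDs 2 and 9 both map to bucket 2 in this example, so find(9) scans a bucket holding two records. A range query such as "EID between 2 and 12" gains nothing from the hash function and would have to check every bucket.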
Pros/Cons of Hash file organization
Pros
• Well suited when tuples are retrieved based on an exact match on the hash field value,
particularly if the access order is random.

Cons
• Poorly suited when tuples are retrieved based on a range of values for the hash field.
• Poorly suited when the hash field is frequently updated. When a hash field is updated,
the DBMS must delete the entire tuple and possibly relocate it to a new
address (if the hash function results in a new address). Thus, frequent updating
of the hash field impacts performance.
Spanned Vs. Un-spanned Records

• The records of a file must be allocated to disk blocks because a
block is the unit of data transfer between disk and memory. When
the block size is larger than the record size, each block will contain
numerous records.
• Suppose that the block size is B bytes. For a file of fixed-length
records of size R bytes, with B ≥ R, we can fit bfr = ⌊B/R⌋ records
per block, where bfr is the blocking factor of the file.
• In general, R may not divide B exactly, so we have some unused
space in each block equal to B − (bfr × R) bytes.
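
A small worked example of the two formulas above. The sizes B = 512 and R = 100 bytes are assumed values chosen only for illustration, and the function names are hypothetical.

```python
def blocking_factor(block_size, record_size):
    """bfr = floor(B / R) for fixed-length records with B >= R."""
    if record_size > block_size:
        raise ValueError("record does not fit in a single block")
    return block_size // record_size


def unused_bytes(block_size, record_size):
    """Unused space per block: B - (bfr * R)."""
    return block_size - blocking_factor(block_size, record_size) * record_size


B, R = 512, 100               # assumed example sizes in bytes
bfr = blocking_factor(B, R)   # 5 records per block
waste = unused_bytes(B, R)    # 512 - 5 * 100 = 12 bytes unused per block
```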
• To utilize this unused space, we can store part of a record on one
block and the rest on another.
• A pointer at the end of the first block points to the block containing
the remainder of the record in case it is not the next consecutive
block on disk. This organization is called spanned because records
can span more than one block.
• Whenever a record is larger than a block, we must use a spanned
organization. If records are not allowed to cross block boundaries,
the organization is called unspanned.
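
To compare the two organizations, the sketch below estimates how many blocks a file of fixed-length records needs under each one. It is a simplification: the spanned calculation ignores the space taken by the pointers between blocks, and the function names and sample sizes are assumptions.

```python
import math


def blocks_unspanned(num_records, block_size, record_size):
    """Records may not cross block boundaries: ceil(records / bfr) blocks."""
    bfr = block_size // record_size
    if bfr == 0:
        raise ValueError("record larger than a block requires a spanned organization")
    return math.ceil(num_records / bfr)


def blocks_spanned(num_records, block_size, record_size):
    """Records may cross block boundaries: bytes are packed end to end."""
    return math.ceil(num_records * record_size / block_size)


# Assumed example: 1000 records of 100 bytes in 512-byte blocks.
unspanned = blocks_unspanned(1000, 512, 100)  # 200 blocks (5 records per block)
spanned = blocks_spanned(1000, 512, 100)      # 196 blocks (no per-block waste)
```

With 600-byte records and 512-byte blocks, blocks_unspanned raises an error, matching the rule that a record larger than a block must use a spanned organization.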
File Operations

• Update operations – change the data values by insertion, deletion, or update.
• Retrieval operations – retrieve records after optional conditional filtering.
• Open – the file pointer points to the beginning of the file. There are options where the user
can tell the operating system where to locate the file pointer at the time of opening a file.
  • Read mode – data can only be read, and the file can be shared
  • Write mode – data can be both read and written
• Locate – Every file has a file pointer, which tells the current position where the data is to be
read or written.
• Close
  • removes all the locks (if in shared mode),
  • saves the data (if altered) to the secondary storage media, and
  • releases all the buffers and file handlers associated with the file.
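
The open, locate, read, and close operations map closely onto ordinary file I/O calls. The sketch below uses Python's built-in file objects on a scratch file. The file name, the 4-byte record size, and the record contents are made up for the example, and file locking is not shown.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.dat")  # hypothetical scratch file

# Open in write mode, insert data, then close so the data is saved to disk.
f = open(path, "wb")
f.write(b"AAAABBBBCCCC")   # three fixed-length 4-byte records
f.close()

# Open in read mode: the file pointer starts at the beginning of the file.
f = open(path, "rb")
start = f.tell()           # current position of the file pointer (0)
f.seek(4)                  # locate: move the pointer to the second record
second = f.read(4)         # retrieval: read one record at the pointer
after = f.tell()           # the pointer has advanced past the record just read
f.close()                  # release the file handle and its buffers
```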
END
