File organization refers to the logical relationships among the records that make up a file, particularly the means of identifying and accessing any given record. Simply put, file organization is the way records are stored in a file in a specific order.
A file is a collection of records, and individual records are reached through the primary key. The file organization chosen for a given set of records determines the type and frequency of access that is possible.
One way to map a database to files is to use many files, each containing records of only one fixed length. Another option is to structure the files so that they can store records of varying lengths. Fixed-length record files are easier to implement than variable-length record files.
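The advantage of fixed-length records can be illustrated with a small Python sketch. The record layout, field names, and sizes here are illustrative assumptions, not part of the text: because every record occupies the same number of bytes, the i-th record can be located by simple arithmetic instead of scanning.

```python
import struct

# Hypothetical fixed-length employee record: 4-byte int id + 16-byte name.
RECORD_FMT = "<i16s"
RECORD_SIZE = struct.calcsize(RECORD_FMT)  # 20 bytes per record

def pack_record(emp_id, name):
    return struct.pack(RECORD_FMT, emp_id, name.encode())

def unpack_record(buf):
    emp_id, raw = struct.unpack(RECORD_FMT, buf)
    return emp_id, raw.rstrip(b"\x00").decode()

file_bytes = b"".join(pack_record(i, f"emp{i}") for i in (101, 102, 103))

# Direct access: record i starts at byte i * RECORD_SIZE, no scan needed.
offset = 1 * RECORD_SIZE
print(unpack_record(file_bytes[offset:offset + RECORD_SIZE]))  # -> (102, 'emp102')
```

With variable-length records this offset arithmetic is impossible, which is why variable-length files need extra bookkeeping such as per-record length prefixes or slot directories.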
An effective file organization method should have the following features:
• Optimal record selection, meaning records can be selected as quickly as feasible.
• Insert, delete, and update operations on records should be simple and fast.
• Inserting, updating, or deleting records should not create duplicates.
• Records should be stored efficiently to minimize storage cost.
There are a variety of ways to organize files, and each method has advantages and disadvantages in terms of access and selection. Programmers choose the file organization strategy that best fits their needs.
Sequential File Organization
This is a straightforward method. Records are stored in sequential order, one after another, and appear in the file in the same order in which they were inserted into the tables.
When a record is updated or deleted, the memory blocks are searched for it. Once found, it is marked for deletion, and a new record is added in its place.
Assume we have four records in sequence: R1, R3, R9, and R8. A record here is nothing more than a table row. If we wish to add a new record R2 to the sequence, we have to place it at the end of the file.
In the sorted variant of this approach, the new record is still added at the end of the file, but the sequence is then sorted in ascending or descending order. Records are sorted on the primary key or some other key.
When a record is modified, the record is updated first, the file is then re-sorted, and the revised record ends up in its correct position.
Assume a pre-existing sorted sequence of four records: R1, R3, R8, and R9. If a new record R2 needs to be added, it is appended at the end of the file, and the sequence is then re-sorted.
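The append-then-sort behaviour described above can be sketched as follows. The tuple-based records and the key function are illustrative assumptions; a real DBMS sorts disk pages, not Python lists.

```python
def insert_sorted(records, new_record, key=lambda r: r[0]):
    """Sorted sequential insert: append at the end, then re-sort on the key."""
    records.append(new_record)          # step 1: new record goes at the end of the file
    records.sort(key=key)               # step 2: the whole sequence is re-sorted
    return records

# Pre-existing sorted run of four records, keyed by primary key.
file = [(1, "R1"), (3, "R3"), (8, "R8"), (9, "R9")]
insert_sorted(file, (2, "R2"))
print([name for _, name in file])       # -> ['R1', 'R2', 'R3', 'R8', 'R9']
```

The cost of the full re-sort on every insert is the main drawback of the sorted file method; real systems amortize it with overflow areas or periodic reorganization.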
Heap File Organization
When a data block is full, the new record is stored in some other block. This new data block need not be the next block in memory; the new entry can be placed in any data block with free space. An unordered file is therefore the same as a heap file. Every record in the file has a unique id, and every page in the file is the same size. The DBMS is in charge of storing and managing the new records.
Let’s say we have five records in a heap, R1, R3, R6, R4, and R5, and we wish to add a new record R2. If data block 3 is full, the DBMS inserts the record into whichever data block it chooses, such as data block 1.
In a heap file organization, if we wish to search, update, or remove a record, we must traverse the data from the beginning of the file until we find the desired record.
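A minimal sketch of a heap file, assuming a toy block capacity of two records per block (the capacity and tuple layout are assumptions for illustration). Insertion goes to any block with free space; search has no ordering to exploit, so it scans from the start.

```python
BLOCK_CAPACITY = 2                       # assumed toy capacity per data block

def heap_insert(blocks, record):
    """Place the record in any block with room; allocate a new block if all are full."""
    for block in blocks:
        if len(block) < BLOCK_CAPACITY:
            block.append(record)
            return
    blocks.append([record])

def heap_search(blocks, key):
    """Unordered file: linear scan of every block from the beginning."""
    for block in blocks:
        for rec in block:
            if rec[0] == key:
                return rec
    return None

blocks = [[(1, "R1"), (3, "R3")], [(6, "R6"), (4, "R4")], [(5, "R5")]]
heap_insert(blocks, (2, "R2"))           # blocks 1 and 2 are full; R2 lands in block 3
print(heap_search(blocks, 2))            # -> (2, 'R2')
```

The linear scan is why heap files are cheap for inserts but expensive for search, update, and delete.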
Hashing
Types of Hashing in DBMS
There are two primary hashing techniques:
1. Static Hashing
In static hashing, the hash function always generates the same bucket address for a given key. For example, suppose we have a data record for employee_id = 106 and the hash function is mod-5, i.e. H(x) = x mod 5, where x is the id. Then the operation takes place like this:
H(106) = 106 mod 5 = 1.
This indicates that the data record should be placed in, or searched for in, bucket 1 (hash index 1) of the hash table.
(Figure: the static hashing technique.)
The primary key is used as the input to the hash function, which generates the hash index (the bucket’s address); the bucket contains the address of the actual data record on the disk block.
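As a rough sketch of the idea (the bucket layout is an assumption; a real DBMS stores disk-block addresses in the buckets, not Python tuples), the fixed mod-5 function maps each primary key to one of five buckets:

```python
NUM_BUCKETS = 5                          # fixed at creation time in static hashing

def h(key):
    return key % NUM_BUCKETS             # the mod-5 hash function from the text

buckets = [[] for _ in range(NUM_BUCKETS)]

def put(key, record):
    buckets[h(key)].append((key, record))

def get(key):
    for k, rec in buckets[h(key)]:       # only one bucket needs to be examined
        if k == key:
            return rec
    return None

put(106, "employee #106")
print(h(106))                            # -> 1, so the record lives in bucket 1
print(get(106))                          # -> 'employee #106'
```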
Static hashing has the following properties:
Data buckets: The number of buckets in memory remains constant. The size of the hash table is decided up front. Chaining may be added to handle some collisions, but this is only a slight optimization and may not prove worthwhile if the database size keeps fluctuating.
Hash function: A simple hash function maps the data records to their appropriate buckets; it is generally a modulo hash function.
Efficient for known data size: Static hashing is very efficient when we know the size of the data and its distribution in the database.
It is inefficient and inaccurate when the data size varies dynamically, because the space is limited and the hash function always generates the same value for a given input. When the data size fluctuates often, static hashing is not useful at all, because collisions keep happening and lead to problems such as bucket skew and insufficient buckets.
To resolve the problem of bucket overflow, techniques such as chaining and open addressing are used. Here is a brief description of both:
1. Chaining
Chaining is a mechanism in which the hash table is implemented as an array of nodes, where each bucket heads a linked-list chain that stores the data records. Even if the hash function generates the same value for several data records, they can all be stored in the same bucket by appending new nodes.
However, this gives rise to the problem of bucket skew: if the hash function keeps generating the same value again and again, hashing becomes inefficient because the remaining buckets stay unoccupied or hold minimal data.
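A minimal chaining sketch, assuming a five-bucket table and a simple Node class (both are illustrative, not from the text). Colliding keys simply extend the chain in their bucket:

```python
class Node:
    """Linked-list node holding one data record."""
    def __init__(self, key, record, nxt=None):
        self.key, self.record, self.next = key, record, nxt

NUM_BUCKETS = 5
table = [None] * NUM_BUCKETS             # each slot heads a chain of Nodes

def insert(key, record):
    i = key % NUM_BUCKETS
    table[i] = Node(key, record, table[i])   # collision: prepend to the chain

def find(key):
    node = table[key % NUM_BUCKETS]
    while node is not None:              # walk the chain in this one bucket
        if node.key == key:
            return node.record
        node = node.next
    return None

insert(106, "A")
insert(111, "B")                         # 111 % 5 == 1 too: chained into bucket 1
print(find(111))                         # -> 'B'
print(find(106))                         # -> 'A'
```

If every key hashed to bucket 1, this chain would grow long while the other four buckets stayed empty, which is exactly the bucket-skew problem described above.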
2. Open Addressing/Closed Hashing
This technique, also called closed hashing, aims to resolve collisions by looking for the next empty slot that can store the data. It uses techniques such as linear probing, quadratic probing, and double hashing.
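Linear probing, the simplest of these, can be sketched as follows (the five-slot table is an assumption for illustration; on a collision the search simply moves to the next slot):

```python
NUM_SLOTS = 5
slots = [None] * NUM_SLOTS               # closed hashing: at most one record per slot

def insert(key, record):
    i = key % NUM_SLOTS
    for _ in range(NUM_SLOTS):           # linear probing: step to the next slot on collision
        if slots[i] is None:
            slots[i] = (key, record)
            return i                     # slot where the record actually landed
        i = (i + 1) % NUM_SLOTS
    raise RuntimeError("hash table is full")

print(insert(106, "A"))                  # -> 1 (106 % 5)
print(insert(111, "B"))                  # -> 2 (slot 1 is taken, probe moves on)
```

Quadratic probing and double hashing differ only in how the next slot is chosen; the store-in-the-next-free-slot idea is the same.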
2. Dynamic Hashing
Dynamic hashing, also known as extendible hashing, is used to handle databases whose data sets change frequently. This method offers a way to add and remove data buckets on demand. As the number of data records varies, the buckets grow and shrink whenever a change is made.
Properties of Dynamic Hashing
The buckets vary in size dynamically as changes are made, offering more flexibility.
Dynamic Hashing aids in improving overall performance by minimizing or completely
preventing collisions.
It has three major components: data buckets, a flexible hash function, and directories.
A flexible hash function generates more dynamic values and keeps changing periodically according to the requirements of the database.
Directories are containers that store pointers to buckets. If bucket overflow or bucket-skew problems occur, buckets are split to maintain efficient retrieval of data records. Each directory has a directory id.
Global depth: the number of bits in each directory id. The more records there are, the more bits are used.
Working of Dynamic Hashing
Example: if the global depth is k = 2, keys are mapped to hash indexes using the k bits starting from the LSB. That leaves four possible directory ids: 00, 01, 10, and 11.
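The LSB mapping for k = 2 can be sketched in a few lines (the sample keys are arbitrary; the bitmask is the standard way to keep the k least-significant bits):

```python
GLOBAL_DEPTH = 2                         # k = 2 bits of each key are used

def directory_index(key):
    """Take the k least-significant bits of the key as the directory id."""
    return key & ((1 << GLOBAL_DEPTH) - 1)

for key in (12, 13, 14, 15):
    print(key, format(key, "04b"), "->", format(directory_index(key), "02b"))
# 12 (1100) -> 00, 13 (1101) -> 01, 14 (1110) -> 10, 15 (1111) -> 11
```

If a bucket overflows, extendible hashing increases the depth, doubling the directory so keys are distinguished by one more bit.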
B+ File Organization
The B+ tree file organization is an advanced form of the indexed sequential access mechanism. Records are stored in a tree-like structure. It employs the familiar key-index idea, in which the primary key is used to sort the records: an index value is generated for each primary key and mapped to the record.
Unlike a binary search tree (BST), a B+ tree node can have more than two children. All records are stored solely at the leaf nodes in this method. The intermediate nodes hold no records; they only contain pointers down toward the leaf nodes.
• The tree has a single root node, here holding the value 25.
• There is an intermediate layer of nodes. They do not keep the original records; they hold only pointers to the leaf nodes.
• Nodes to the left of the root store values that precede the root’s value, while nodes to the right store values that follow it, i.e. 15 and 30.
• Only the leaf nodes contain actual values, namely 10, 12, 17, 20, 24, 27, and 29.
• Because all the leaf nodes are at the same depth, the tree is balanced and finding any record is much easier.
• Any record can be reached by following a single root-to-leaf path, so access is fast.
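The single root-to-leaf search path can be sketched for the example tree above. The Node class and the exact layout are assumptions reconstructed from the description (root 25, intermediates 15 and 30, records only in the leaves):

```python
class Node:
    def __init__(self, keys, children=None, records=None):
        self.keys = keys                 # separator keys (internal) or record keys (leaf)
        self.children = children         # child pointers; None marks a leaf
        self.records = records           # actual records; only leaves hold them

def search(node, key):
    """Follow one root-to-leaf path, then look the key up in the leaf."""
    while node.children is not None:
        i = 0
        while i < len(node.keys) and key >= node.keys[i]:
            i += 1                       # pick the child subtree for this key range
        node = node.children[i]
    if key in node.keys:
        return node.records[node.keys.index(key)]
    return None

# Tree from the example: root 25; internals 15 and 30; all records in the leaves.
l1 = Node([10, 12], records=["r10", "r12"])
l2 = Node([17, 20, 24], records=["r17", "r20", "r24"])
l3 = Node([27, 29], records=["r27", "r29"])
left  = Node([15], children=[l1, l2])
right = Node([30], children=[l3, Node([], records=[])])  # empty leaf for keys >= 30
root  = Node([25], children=[left, right])

print(search(root, 20))                  # -> 'r20' (path: 25 -> 15 -> leaf [17, 20, 24])
print(search(root, 29))                  # -> 'r29' (path: 25 -> 30 -> leaf [27, 29])
```

Note that every lookup visits exactly one node per level, which is what makes B+ tree access time predictable.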
If a record has to be obtained based on its index value, the data block’s address is retrieved,
and the record is retrieved from memory.
Cluster File Organization
Clusters are created when two or more records are stored in the same file. In these files, two or more tables reside in the very same data block, and the key attributes used to link these tables together are stored only once.