
Unit VI

File Management

Marks:10
• Many different types of information may be stored in
a file—source or executable programs, numeric or
text data, photos, music, video, and so on.
• A file has a certain defined structure, which depends
on its type.
• For example,
– A text file is a sequence of characters organized
into lines and pages.
– A source file is a sequence of functions, each of
which is further organized as declarations followed
by executable statements.
– An executable file is a series of code sections that
the loader can bring into memory and execute.
6.1 File: Concepts, Attributes, Operations, Types, Structure

Concepts:
• A file is a named collection of related information
that is recorded on secondary storage.
• A file provides contiguous logical address space.
• Files represent programs (both source and object
forms) and data.
• Data files may be numeric, alphabetic, alphanumeric,
or binary.
• In general, a file is a sequence of bits, bytes, lines, or
records.
• The information in a file is defined by its creator.
 Attributes of a File
• A file’s attributes vary from one OS to another but typically
consist of these:
– Name – the symbolic file name; it is the only information kept in
human-readable form.
– Identifier – a unique tag, usually a number, that identifies the file within the
file system; it is the non-human-readable name for the file.
– Type – needed for systems that support different types.
– Location – pointer to file location on device.
– Size – current file size (in bytes, words, or blocks).
– Protection – controls who can do reading, writing,
executing.
– Time, date, and user identification – these data can be
useful for protection, security, and usage monitoring.
• Information about files is kept in the directory structure,
which is maintained on the disk.
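As an illustration, most of these attributes can be inspected from a program. The sketch below uses Python's standard os.stat call; the helper name describe_file and the temporary demo file are assumptions made for the example, and the exact fields available vary by operating system.

```python
import os
import stat
import tempfile
import time


def describe_file(path):
    """Return a few of the attributes the OS keeps for the file at `path`."""
    info = os.stat(path)
    return {
        "name": os.path.basename(path),              # human-readable name
        "identifier": info.st_ino,                   # number used inside the file system
        "size_bytes": info.st_size,                  # current file size
        "protection": stat.filemode(info.st_mode),   # e.g. '-rw-------'
        "modified": time.ctime(info.st_mtime),       # time of last modification
    }


# Usage: create a throwaway file (hypothetical demo data) and inspect it.
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(b"hello")
    demo_path = tmp.name

attributes = describe_file(demo_path)
os.remove(demo_path)
```

Note that the location attribute (the pointer to the file's blocks on the device) is normally hidden from user programs, so it does not appear here.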
 File Operations
• A file is an abstract data type.
• The operating system can provide system calls to
create, write, read, reposition, delete, and truncate
files.
• To define a file properly, we need to consider the
operations that can be performed on files:
– Create
– Write
– Read
– Reposition within file – file seek
– Delete
– Truncate
• Creating a file:
– Two steps are necessary to create a file.
• Space in the file system must be allocated for the file.
• An entry for the new file must be made in the directory.
• Writing a file:
– Requires the name of the file and the information to be
written to the file.
– The system searches the directory to find the file's location.
– The system keeps a write pointer to the location in the file where the
next write is to take place.
– The write pointer is updated after each write.
• Reading a file:
– Requires the name of the file and the memory location where the
data that is read should be placed.
– The system searches the directory to find the file's location.
– The system keeps a read pointer to the location in the file where the
next read is to take place.
– The read pointer is updated after each read.
• Repositioning within a file:
– The directory is searched for the appropriate entry.
– The current-file-position pointer is repositioned to a given
value.
– Repositioning within a file need not involve any actual I/O.
– This file operation is also known as a file seek.
• Deleting a file:
– Search the directory for the named file.
– Having found the associated directory entry:
• Release all file space so that it can be reused by other files.
• Remove the directory entry.
• Truncating a file:
– The user may want to erase the contents of a file but keep
its attributes.
– Rather than forcing the user to delete the file and then
recreate it, this function allows all attributes to remain
unchanged—except for file length.
– That is, the file is reset to length zero and its file space is
released.
• Open(Fi):
– The open() operation takes a file name, searches the
directory, and moves the content of the entry Fi into memory.
• Close(Fi):
– Moves the content of entry Fi in memory back to the directory
structure on disk.
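The operations above map closely onto the low-level file calls exposed by most operating systems. The following is a minimal sketch using Python's os module; the scratch directory and the file name demo.dat are assumptions made for the example.

```python
import os
import tempfile

# Hypothetical scratch location for the demonstration.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "demo.dat")

# Create + open: a directory entry is made and a descriptor is returned.
fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)

# Write: the current-file-position pointer advances by the bytes written.
os.write(fd, b"0123456789")

# Reposition (seek): only the pointer changes, no data is transferred.
os.lseek(fd, 4, os.SEEK_SET)

# Read: returns data starting at the pointer and advances the pointer.
chunk = os.read(fd, 3)

# Truncate: the length is reset to zero, but the file still exists.
os.ftruncate(fd, 0)
size_after_truncate = os.fstat(fd).st_size

# Close, then delete: the directory entry is removed.
os.close(fd)
os.remove(path)
exists_after_delete = os.path.exists(path)
os.rmdir(workdir)
```

Here a single position pointer is shared by reads and writes, which is how most current systems implement the separate read and write pointers described above.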
 File types
 File Structure
• File types also can be used to indicate the internal structure of
the file.
• For ex: source and object files have structures that match the
expectations of the programs that read them.
• An OS may require a file to have a specific structure so that the
OS will provide special operations for those files.
• A file can have different structures, determined by OS or
Program.
• Three common structures:
1. Unstructured/Byte Sequence
• The file is an unstructured sequence of bytes or words.
• Read and write operations work on bytes.
• Both UNIX and Windows use this approach.
2. Structured/Record structure
• The file is a sequence of fixed-length records, each with some
internal structure.
• A read operation returns one record and a write operation
overwrites or appends one record.
• e.g., database
3. Complex/Tree structures
• The file consists of a tree of records.
• Records may be of the same length or of variable length.
• Each record contains a key field in a fixed position in the
record.
• Because disk space is always allocated in blocks, all
file systems suffer from internal fragmentation; the
larger the block size, the greater the internal
fragmentation.
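The effect of block size on internal fragmentation can be checked with a little arithmetic. The sketch below is illustrative only: the function name and the 1,949-byte file size are assumptions chosen for the example.

```python
def internal_fragmentation(file_size, block_size):
    """Bytes left unused in the last block allocated to a file."""
    if file_size == 0:
        return 0
    blocks_needed = -(-file_size // block_size)  # ceiling division
    return blocks_needed * block_size - file_size


# A 1,949-byte file (illustrative size) stored with two block sizes.
wasted_small = internal_fragmentation(1949, 512)    # 4 blocks of 512 bytes
wasted_large = internal_fragmentation(1949, 4096)   # 1 block of 4096 bytes
```

With 512-byte blocks the file wastes 99 bytes, while with 4096-byte blocks it wastes 2,147 bytes, which matches the statement that larger blocks mean more internal fragmentation.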
6.2 Access Methods
• Files store information. When it is used, this
information must be accessed and read into
computer memory.
• File access mechanism refers to the manner in which
the records of a file may be accessed. There are
two common ways to access files:
1. Sequential access
2. Direct/Random access
1. Sequential access
• simplest access method
• In Sequential access, information in the file is
processed in order, one record after the other.
• For example, editors and compilers usually access
files in this fashion.
• Sequential access is based on a tape model of a file.
• It works as well on sequential-access devices as on random-access ones.
• A read operation reads the next portion of the file
and automatically updates a file pointer.
• A write operation appends to the end of the file and
updates the pointer to the end of the newly written
data.
• Advantages of sequential access
– It is simple to program and easy to design.
• Disadvantages of sequential access
– Sequential access is time-consuming, because every record before the desired one must be passed over first.
– Sequential access increases interaction cost
– Random searching is not possible.
2. Direct access/Random Access
• The direct access method is also known as the relative access
method.
• A file is made up of fixed-length logical records that allow
programs to read and write records rapidly in no
particular order.
• There are no restrictions on the order of reading or
writing for a direct-access file.
• The direct access is based on the disk model of a file,
since disks allow random access to any file block.
• Thus, we may read block 14, then read block 53, and then
write block 7.
• Direct-access files are used for immediate access to large amounts of
information. Databases are often of this type.
• File operations must be modified to include the block
number as a parameter: read(n) and write(n), where
n is the block number.
• The block number provided by the user to the OS is a
relative block number.
• Relative block number is an index relative to the
beginning of the file.
• Thus, the first relative block of the file is 0, the next is
1, and so on
• Advantages:
– Direct-access files support online transaction processing
(OLTP) systems, such as an online railway reservation system.
– Sorting of the records is not required.
– The desired records are accessed immediately.
– Several files can be updated quickly.
– Better control over record allocation.

• Disadvantages:
– Direct access file does not provide backup facility.
– It is expensive.
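The difference between the two access methods can be sketched in a few lines. The example below is a minimal illustration, assuming a tiny 4-byte block size and an in-memory file; the helper names read_next, read_block and write_block are hypothetical stand-ins for the read next, read(n) and write(n) operations.

```python
import io

BLOCK_SIZE = 4  # assumed, kept tiny so the example is easy to follow


def read_next(f):
    """Sequential access: read from wherever the file pointer currently is."""
    return f.read(BLOCK_SIZE)


def read_block(f, n):
    """Direct access: jump to relative block n, then read it."""
    f.seek(n * BLOCK_SIZE)
    return f.read(BLOCK_SIZE)


def write_block(f, n, data):
    """Direct access: overwrite relative block n."""
    if len(data) != BLOCK_SIZE:
        raise ValueError("data must be exactly one block long")
    f.seek(n * BLOCK_SIZE)
    f.write(data)


f = io.BytesIO(b"AAAABBBBCCCCDDDD")  # four blocks, numbered 0 to 3

first = read_next(f)             # block 0
second = read_next(f)            # block 1, the pointer advanced automatically
third_block = read_block(f, 2)   # jump straight to relative block 2
write_block(f, 0, b"ZZZZ")       # write block 0 without touching the others
rewritten = read_block(f, 0)
```

The relative block number is simply multiplied by the block size to obtain the byte offset from the beginning of the file.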
Swapping
• Swapping is a memory management scheme.
• Swapping is a mechanism in which a process can be
swapped out of main memory to secondary storage, making
that memory available to other processes. After some
time the process is swapped back into memory for continued
execution.
• There are two important concepts of swapping:
– Swap In: moving a process from backing store to main
memory
– Swap Out: moving a process from main memory to backing
store
• Swapping involves moving processes between main
memory and a backing store.
• The backing store is commonly a fast disk. It must be large
enough and must provide direct access.
• Whenever the CPU scheduler decides to execute a
process, it calls the dispatcher. The dispatcher checks
to see whether the next process in the queue is in
memory. If it is not, and if there is no free memory
region, the dispatcher swaps out a process currently
in memory and swaps in the desired process.
• Advantages:
– Helps in achieving the goal of Maximum CPU Utilization.
– Ensures proper memory availability for every process that
needs to be executed.
– Degree of Multiprogramming will be increased.
– used to improve main memory utilization
– helps to create and use virtual memory.
• Disadvantages:
– If the computer system loses power, the user might lose all
the information related to the program.
– If the swapping algorithm is not good, the overall method
can increase the number of page faults and degrade the
overall processing performance.
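The dispatcher behaviour described in this section can be simulated in a few lines. This is only a sketch under stated assumptions: memory is modelled as a fixed number of equal slots, and the victim to swap out is the longest-resident process (FIFO), a policy chosen for the example and not specified above. The class name Dispatcher is hypothetical.

```python
from collections import deque


class Dispatcher:
    def __init__(self, memory_slots):
        self.memory_slots = memory_slots
        self.in_memory = deque()     # resident processes, oldest first
        self.backing_store = set()   # processes currently swapped out

    def run(self, pid):
        """Make sure `pid` is resident, swapping if required; return the events."""
        events = []
        if pid in self.in_memory:
            return events                      # already in memory, nothing to do
        if len(self.in_memory) >= self.memory_slots:
            victim = self.in_memory.popleft()  # assumed FIFO choice of victim
            self.backing_store.add(victim)
            events.append(("swap out", victim))
        self.backing_store.discard(pid)
        self.in_memory.append(pid)
        events.append(("swap in", pid))
        return events


dispatcher = Dispatcher(memory_slots=2)
log = []
for pid in ["P1", "P2", "P3", "P1"]:
    log.extend(dispatcher.run(pid))
```

With two memory slots, scheduling P3 forces P1 out, and scheduling P1 again forces P2 out.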
File Allocation Methods
• Allocation method provides a way in which the disk
will be utilized and the files will be accessed
• There are various methods which can be used to
allocate disk space to the file.
1. Contiguous Allocation
2. Linked Allocation
3. Indexed Allocation
1. Contiguous Allocation
• Contiguous allocation requires that each file occupy a set of
contiguous blocks on the disk.
• Disk addresses define a linear ordering on the disk.
• For example, if a file requires n blocks and is given a block b as
the starting location, then the blocks assigned to the file will be:
b, b+1, b+2, ..., b+n-1.
• Contiguous allocation of a file is defined by the disk address of
the first block and length (in block units).
• The directory entry for a file with contiguous allocation contains
1. Address of starting block
2. Length of the allocated portion.
• Both sequential and direct access can be supported by
contiguous allocation.
• The number of disk seeks required for accessing contiguously
allocated files is minimal.
The file ‘mail’ in the following figure starts at block 19 with
length = 6 blocks. Therefore, it occupies blocks 19, 20, 21, 22, 23 and 24.
Contiguous Allocation
• Advantages:
- Easy to implement
- Both sequential and direct access can be supported by
contiguous allocation.
- Faster, since the number of disk seeks required for accessing
contiguously allocated files is minimal.
• Disadvantages:
– Finding space for a new file is difficult.
– This method suffers from both internal and external
fragmentation, which makes it inefficient in terms of
disk space utilization.
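Because a contiguous file is fully described by its starting block and length, address translation is a single addition. The sketch below illustrates this with the 'mail' file (start 19, length 6); the function names are assumptions made for the example.

```python
def contiguous_blocks(start, length):
    """All disk blocks occupied by a file starting at `start`."""
    return list(range(start, start + length))


def physical_block(start, length, logical):
    """Translate logical block number `logical` (0-based) to a disk block."""
    if not 0 <= logical < length:
        raise IndexError("logical block lies outside the file")
    return start + logical


mail_blocks = contiguous_blocks(19, 6)
mail_fourth = physical_block(19, 6, 3)   # direct access to logical block 3
```

Direct access needs no disk reads to locate a block, which is why both sequential and direct access are cheap under this scheme.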
2. Linked Allocation
• With linked allocation, each file is a linked list of disk blocks.
• The disk blocks may be scattered anywhere on the disk.
• The directory contains a pointer to the first and last blocks of
the file.
• Each block contains a pointer to the next block occupied by the
file.
• These pointers are not made available to the user
• Thus, if each block is 512 bytes in size, and a disk address (the
pointer) requires 4 bytes, then the user sees blocks of 508
bytes.
• There is no external fragmentation with linked allocation.
• Supports sequential-access files.
• File size does not have to be specified.
• A file can continue to grow as long as free blocks are available.
The file ‘jeep’ in the following figure shows how the blocks can be scattered
across the disk. This file starts at block 9 and continues at block 16, then
block 1, then block 10, and finally block 25. The last block (25)
contains -1, indicating a null pointer; it does not point to any other
block.
Linked Allocation
• Advantages:
- Creating a new file is very easy.
- There is no external fragmentation with linked allocation.
- A file can continue to grow as long as free blocks are
available.
- File size does not have to be specified.
• Disadvantages:
- It can be used effectively only for sequential-access files; it
is inefficient to support a direct-access capability for
linked-allocation files.
- Additional space is required for the pointers.
- A large number of disk seeks may be necessary, since
blocks are scattered everywhere.
- Reliability: what if a pointer is lost or damaged?
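The chain for the 'jeep' file can be modelled with a dictionary that maps each block to the pointer stored in it. This is a sketch only: the dictionary stands in for the disk, and the function names are assumptions made for the example.

```python
END_OF_FILE = -1

# next_block[b] is the pointer stored in disk block b (the 'jeep' file).
next_block = {9: 16, 16: 1, 1: 10, 10: 25, 25: END_OF_FILE}

# With 512-byte blocks and 4-byte pointers, the user sees 508 bytes per block.
usable_bytes = 512 - 4


def file_blocks(first_block):
    """Follow the chain from the first block to the end-of-file marker."""
    blocks = []
    current = first_block
    while current != END_OF_FILE:
        blocks.append(current)
        current = next_block[current]
    return blocks


def nth_block(first_block, n):
    """Reach logical block n (0-based); this costs n pointer hops."""
    current = first_block
    for _ in range(n):
        current = next_block[current]
        if current == END_OF_FILE:
            raise IndexError("logical block lies outside the file")
    return current


jeep_blocks = file_blocks(9)
jeep_fourth = nth_block(9, 3)
```

Reaching logical block n requires reading every block before it, which is why direct access is inefficient with linked allocation.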
3. Indexed Allocation
• Indexed allocation brings all the pointers together into one
location: the index block.
• The index block contains the pointers to all the blocks occupied by a
file.
• The index block does not hold the file data.
• Each file has its own index block, which is an array of disk-block
addresses.
• The ith entry in the index block points to the ith block of the file.
• The directory contains the address of the index block.
• Provides solutions to problems of contiguous and linked allocation.
• Supports Direct access. Hence provides fast access to the file
blocks.
• Indexed allocation supports direct access, without suffering from
external fragmentation.
• Some space is wasted by the index block itself.
In the following figure, the file ‘jeep’ has index block 19. This index block
contains pointers to all the blocks allocated to the file.
Indexed Allocation
• Advantages:
- Creating a new file is very easy.
- There is no external fragmentation.
- A file can continue to grow as long as free blocks are
available.
- File size does not have to be specified.
- Supports Direct access. Hence provides fast access to the
file blocks.
• Disadvantages:
- Additional space is required for the index block. The pointer
overhead of indexed allocation is generally greater than that of linked
allocation.
- A faulty index block could result in the loss of the entire
file.
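With indexed allocation, finding the ith block is an array lookup. In the sketch below the directory entry for 'jeep' points to index block 19, as in the figure; the contents of the index block (9, 16, 1, 10, 25) are an assumption made for the example, since the figure's values are not reproduced here.

```python
# Directory: file name -> address of the file's index block.
directory = {"jeep": 19}

# Disk: index block address -> array of data-block addresses.
# The entries below are assumed values for illustration.
disk_index_blocks = {19: [9, 16, 1, 10, 25]}


def lookup(name, i):
    """Return the disk block holding logical block i (0-based) of `name`."""
    index_block = disk_index_blocks[directory[name]]
    if not 0 <= i < len(index_block):
        raise IndexError("logical block lies outside the file")
    return index_block[i]


jeep_fourth = lookup("jeep", 3)
```

One read of the index block is enough to locate any data block, so direct access does not require walking a chain.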
6.3 Directory Structure
• The file system consists of two distinct parts:
1. a collection of files, each storing related data
2. directory structure, which organizes and provides
information about all the files in the system.
• The information about all files is kept in the directory
structure, which also resides on secondary storage.
• A directory is a container that is used to hold folders
and files. It is a logical structure.
• Logical structure of the directory can be defined in 3
ways:
1. Single-Level Directory
2. Two-Level Directory
3. Tree-Structured Directories
1. Single-Level Directory
• simplest directory structure
• There is only one directory in a single-level directory, and
that directory is called a root directory.
• All files are contained in the same directory, which is easy to
support and understand. This is shown in fig.

• The users are not allowed to create subdirectories under the
root directory.
• Since all files are in the same directory, they must have
unique names.
• A single-level directory has significant limitations when the
number of files increases or when the system has more than one
user.

• Advantages:
– Implementation of a single-level directory is the simplest.
– File creation, searching, deletion is very simple since we have
only one directory.

• Disadvantages:
– Different users cannot have the same file name.
– Protection cannot be implemented for multiple users.
– Choosing a unique file name becomes harder as the number of
files grows.
– Similar types of files cannot be grouped together.
2. Two-Level Directory
• In the two-level directory structure, each user has his
own user file directory (UFD).
• Each UFD lists only the files of a single user.
• When a user job starts or a user logs in, the system’s
master file directory (MFD) is searched.
• The MFD is indexed by user name or account number,
and each entry points to the UFD for that user.
• The user directories themselves must be created and
deleted as necessary.
• A special system program creates a new UFD and adds an entry for
it to the MFD.
• This structure effectively isolates one user from
another.
• A two-level directory can be thought of as a tree of
height 2. The root of the tree is the MFD. Its direct
descendants are the UFDs. The descendants of the
UFDs are the files themselves. The files are the leaves
of the tree.
• A user name and a file name define a path name.
Every file in the system has a path name.
• Advantages:
– Different users can have the same file name.
– Searching becomes more efficient as only one user's list
needs to be traversed.

• Disadvantages:
– One user cannot share a file with another user.
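A two-level directory can be modelled as a dictionary of dictionaries: the MFD maps user names to UFDs, and each UFD maps file names to file locations. This is a minimal sketch; the user names, file names and block numbers are hypothetical.

```python
# Master file directory: user name -> that user's UFD.
mfd = {}


def create_user(user):
    """Create an empty UFD and add an entry for it to the MFD."""
    mfd.setdefault(user, {})


def create_file(user, name, location):
    """Add a file to one user's UFD; names need be unique only within it."""
    ufd = mfd[user]
    if name in ufd:
        raise FileExistsError(name)
    ufd[name] = location


def resolve(user, name):
    """A path name is the pair (user name, file name)."""
    return mfd[user][name]


create_user("alice")
create_user("bob")
create_file("alice", "test", 120)   # hypothetical starting block numbers
create_file("bob", "test", 305)

alice_test = resolve("alice", "test")
bob_test = resolve("bob", "test")
```

Both users own a file called test without any conflict, and a lookup searches only one user's UFD.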
3. Tree Structured Directory
• The directory is structured in the form of a tree.
• A two-level directory can be thought of as a tree of height
2.
• Tree Structured Directory allows users to create their own
subdirectories and to organize their files accordingly.
• The tree has a root directory, and every file in the system
has a unique path name.
• A directory (or subdirectory) contains a set of files or
subdirectories.
• One bit in each directory entry defines the entry as a file
(0) or as a subdirectory (1).
• Each user has their own directory and cannot enter
another user's directory. However, a user has
permission to read the root's data but cannot write or
modify it.
• Searching is more efficient in this directory structure. The
concept of current working directory is used. A file can be
accessed by two types of path, either relative or absolute.
• Absolute path is the path of the file with respect to the root
directory.
• A relative path name defines a path from the current directory.
• Advantages:
– Very general, since a full path name can be given.
– Very scalable.
– The probability of name collision is low.
– Searching becomes very easy; we can use both absolute
and relative paths.

• Disadvantages:
– Files cannot be shared between directories.
– It can be inefficient, because accessing a file may require
traversing multiple directories.
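The relationship between relative and absolute path names can be shown with Python's posixpath module, which implements UNIX-style path rules. The directory and file names below are hypothetical, and to_absolute is a helper written for this example.

```python
import posixpath


def to_absolute(path, current_directory):
    """Resolve `path` against the current working directory if it is relative."""
    if posixpath.isabs(path):
        return posixpath.normpath(path)
    return posixpath.normpath(posixpath.join(current_directory, path))


cwd = "/home/alice/src"   # assumed current working directory

same_dir = to_absolute("prog.c", cwd)
sibling = to_absolute("../docs/notes.txt", cwd)
already_absolute = to_absolute("/etc/hosts", cwd)
```

A relative path is interpreted starting from the current directory, while an absolute path always starts at the root and ignores it.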
Disk Organization
• Disk organization refers to the manner in which the disk drive is
organized for storage of data.
• A hard disk contains many platters.
• The platters of a hard disk have two sides for recording the data.
• Every surface of the platter has invisible concentric circles on it.
These circles are called tracks. All information stored on a hard
disk is recorded in tracks.
• Each track is further divided into sectors. A sector is the smallest
unit that can be accessed on a hard disk.
• A cylinder comprises the same track number on each platter.
• The platters rotate at a very high speed (5400 RPM to 10,000
RPM)
• The read-write (R-W) head moves over the rotating platter. It is this
read-write head that performs all the read and write operations
on the disk.
• The logical structure depends on the type of operating
system and file system used.
• File System is a method and data structure that the
operating system uses to control how data is stored
and retrieved.
• Most common file systems used by OS are:
– Windows OS:
• FAT( File Allocation Table)
• FAT32
• NTFS (New Technology File System)
• ExFAT (Extensible File Allocation Table)
– MacOS:
• APFS (Apple File System )
• Mac OS Extended
– Linux:
• EXT (Extended File System)
RAID Structure of Disk
• RAID (redundant array of independent disks) is a way
of storing the same data in different places on
multiple hard disks or solid-state drives (SSDs) to
protect data in the case of a drive failure.
• RAID can create redundancy, improve performance,
or do both.
• RAID employs the techniques of disk mirroring or
disk striping.
• Mirroring copies identical data onto more than
one drive. Striping spreads data over multiple
disk drives.
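The two techniques can be contrasted with a short sketch. It assumes blocks are striped round-robin across the disks, which is the usual arrangement but is an assumption here; the function names and block labels are hypothetical.

```python
def stripe(blocks, n_disks):
    """Striping: spread consecutive blocks across the disks in turn."""
    disks = [[] for _ in range(n_disks)]
    for i, block in enumerate(blocks):
        disks[i % n_disks].append(block)   # assumed round-robin placement
    return disks


def mirror(blocks, n_copies=2):
    """Mirroring: every disk holds a full copy of the data."""
    return [list(blocks) for _ in range(n_copies)]


striped = stripe(["B0", "B1", "B2", "B3", "B4"], n_disks=2)
mirrored = mirror(["B0", "B1"])
```

Striping uses the full capacity of every disk but keeps only one copy of each block, while mirroring stores everything twice and so survives the loss of one disk.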
RAID Levels
• Six RAID levels are covered here: 0 through 5.
• RAID 0:
– RAID 0 (disk striping) is the process of dividing a body of
data into blocks and spreading the data blocks across multiple disks.
– In the event of a disk failure, data is lost.
– Cost-efficient and straightforward to implement.
– Increased read and write performance.
– No overhead (total capacity use).
– Doesn't provide fault tolerance
– Doesn't provide redundancy.
• RAID 1:
– RAID 1 implements disk mirroring, where a copy of
the same data is recorded onto two drives.
– By keeping two copies of data on separate disks, data
is protected against a disk failure.
– Increased read performance.
– Provides redundancy and fault tolerance.
– Uses only half of the storage capacity.
– More expensive
• RAID 2:
– It combines bit-level striping with error checking and
correction.
– Instead of data blocks, RAID 2 stripes data at the bit level
across multiple disks. Additionally, it uses a Hamming
error-correcting code (ECC) and stores this information on
the redundancy disks.
– The ability to correct stored information.
• RAID 3:
– Bit-Level Striping with Dedicated Parity
– It utilizes bit-level striping and a dedicated parity disk.
– it requires at least three drives, where two are used for
storing data strips, and one is used for parity.
– Good throughput when transferring large amounts of data.
– High efficiency with sequential operations.
• RAID 4:
– Block-Level Striping with Dedicated Parity
– It utilizes block-level striping and a dedicated parity disk.
– It consists of block-level data striping across two or more
independent disks and a dedicated parity disk.
– The implementation requires at least three disks – two for
storing data strips and one dedicated for storing parity and
providing redundancy.
– Fast read operations.
– Slow write operations.
• RAID 5:
– Striping with Parity
– It combines striping and parity to provide a fast and
reliable setup.
– gives the user storage usability as with RAID 1 and the
performance efficiency of RAID 0.
– Data is divided into data strips and distributed across
different disks in the array.
– Parity bits are distributed evenly on all disks after each
sequence of data has been saved.
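Parity lets an array rebuild the contents of one failed disk from the surviving ones. The sketch below assumes parity is computed as a bytewise XOR of the data strips, which is the usual choice but is not stated above; the strip contents are made-up sample bytes.

```python
def xor_parity(strips):
    """Bytewise XOR of equally sized strips."""
    parity = bytearray(len(strips[0]))
    for strip in strips:
        for i, byte in enumerate(strip):
            parity[i] ^= byte
    return bytes(parity)


# Three data strips of one stripe (sample values) and their parity strip.
data = [b"\x0f\xf0", b"\x33\x33", b"\x55\xaa"]
parity = xor_parity(data)

# Suppose the disk holding data[1] fails: XOR of everything that is left
# (the other data strips plus the parity strip) reproduces the lost strip.
rebuilt = xor_parity([data[0], data[2], parity])
```

The same calculation works whichever single strip is lost, which is why one parity strip per stripe is enough to tolerate one disk failure.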
