
Chapter five

File system

Introduction
In computing, a file system (or filesystem) is the method and data structure that the operating
system uses to control how data is stored and retrieved. Without a file system, data placed
in a storage medium would be one large body of data with no way to tell where one piece of
data stops and the next begins. By separating the data into pieces and giving each piece a
name, the data is easily isolated and identified. Taking its name from the way paper-based
data management systems are organized, each group of data is called a file. The structure and logic
rules used to manage the groups of data and their names are called a file system.

5.2 Files: data, metadata, operations, organization, buffering, sequential, non-sequential
Metadata

Metadata is data about data. It refers not to the data itself, but to any information
that describes some aspect of the data. Everything from the title, information on how data
fits together (e.g., which page goes before which other page), when and by whom the data
was created, and lists of web pages visited by people, can be classified as metadata. Metadata
can be stored in a variety of places. Where the metadata relates to databases, it is often
stored in tables and fields within the database. Sometimes the metadata exists in a specialist
document or database designed to store such data, called a data dictionary or metadata
repository. Some specialist data file formats include both the raw data and
the metadata (e.g., the SPSS .sav and .mdd data files, and Triple-S .sss). More generally,
metadata can be stored anywhere (e.g., in emails, questionnaires, data collection
instructions, or spreadsheets).
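File systems are one place where metadata lives: besides a file's contents, the file system records facts about the file such as its size, type, permissions and modification time (on UNIX-like systems, in the file's inode). As a minimal sketch, assuming Python's standard `os` and `stat` modules, the hypothetical helper `describe()` reads some of this metadata without reading the file's data:

```python
import os
import stat
import tempfile
import time

def describe(path):
    """Hypothetical helper: return a few metadata fields the file system keeps for `path`."""
    st = os.stat(path)                              # asks the file system; the data is not read
    return {
        "size_bytes": st.st_size,                   # how long the data is
        "is_directory": stat.S_ISDIR(st.st_mode),   # what kind of object this is
        "permissions": stat.filemode(st.st_mode),   # e.g. '-rw-r--r--' for a regular file
        "modified": time.ctime(st.st_mtime),        # when the data last changed
    }

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "notes.txt")
    with open(path, "w") as f:
        f.write("hello")                            # 5 bytes of data
    info = describe(path)

print(info)
```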

5.3 Directories: contents and structure


Directory
A directory is a container that holds files and other directories (folders). It organizes files and folders
in a hierarchical manner. The directory can be viewed as a symbol table that translates file
names into their directory entries. If we take such a view, we see that the directory itself can
be organized in many ways. The organization must allow us to insert entries, to delete
entries, to search for a named entry, and to list all the entries in the directory.
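Viewing the directory as a symbol table, the required operations map directly onto a lookup structure. The sketch below is a hypothetical `Directory` class that uses a Python `dict` (a hash table) as the symbol table; a real file system may instead keep a linear list, a hash table or a balanced tree on disk, and the entry contents here are placeholders.

```python
import fnmatch

class Directory:
    """Hypothetical directory: a symbol table mapping file names to directory entries."""

    def __init__(self):
        self._entries = {}                       # name -> entry (placeholder attributes)

    def insert(self, name, entry):
        if name in self._entries:
            raise FileExistsError(name)          # names must be unique within one directory
        self._entries[name] = entry

    def delete(self, name):
        del self._entries[name]                  # raises KeyError if the name is absent

    def search(self, pattern):
        """Return the names matching a shell-style pattern such as '*.txt'."""
        return sorted(n for n in self._entries if fnmatch.fnmatchcase(n, pattern))

    def list_entries(self):
        return sorted(self._entries.items())     # (name, entry) pairs in name order

    def rename(self, old, new):
        if new in self._entries:
            raise FileExistsError(new)
        self._entries[new] = self._entries.pop(old)

d = Directory()
d.insert("report.txt", {"size": 120})
d.insert("data.csv", {"size": 4096})
d.rename("report.txt", "final.txt")
print(d.search("*.txt"))
print(d.list_entries())
```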
In this section, we examine several schemes for defining the logical structure of the directory
system. When considering a particular directory structure, we need to keep in mind the
operations that are to be performed on a directory:

• Search for a file: We need to be able to search a directory structure to find the entry for a
particular file. Since files have symbolic names, and similar names may indicate a
relationship among files, we may want to be able to find all files whose names match a
particular pattern.
• Create a file: New files need to be created and added to the directory.
• Delete a file: When a file is no longer needed, we want to be able to remove it from the
directory.
• List a directory: We need to be able to list the files in a directory and the contents of the
directory entry for each file in the list.
• Rename a file: Because the name of a file represents its contents to its users, we must be
able to change the name when the contents or use of the file changes. Renaming a file may
also allow its position within the directory structure to be changed.
• Traverse the file system: We may wish to access every directory and every file within a
directory structure. For reliability, it is a good idea to save the contents and structure of the
entire file system at regular intervals. Often, we do this by copying all files to magnetic tape.
This technique provides a backup copy in case of system failure. In addition, if a file is no
longer in use, the file can be copied to tape and the disk space of that file released for reuse
by another file.

In the following sections, we describe the most common schemes for
defining the logical structure of a directory.
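The traversal operation in the list above can be sketched with Python's standard `os.walk`, which visits every directory below a starting point. The hypothetical `traverse()` helper collects each file's relative path and size, the kind of inventory a backup program would build before copying files; the small directory tree is created only for the demonstration.

```python
import os
import tempfile

def traverse(root):
    """Visit every directory and file under `root`; return (relative path, size) pairs."""
    visited = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()                              # descend into subdirectories in name order
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            visited.append((os.path.relpath(full, root), os.path.getsize(full)))
    return visited

with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "docs", "old"))
    for rel, text in [("a.txt", "abc"), ("docs/b.txt", "hello"), ("docs/old/c.txt", "")]:
        with open(os.path.join(root, rel), "w") as f:
            f.write(text)
    files = traverse(root)

print(files)
```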

Single-Level Directory
The simplest directory structure is the single-level directory. All files are contained in the
same directory, which is easy to support and understand. A single-level directory has
significant limitations, however, when the number of files increases or when the system has
more than one user. Since all files are in the same directory, they must have unique names.
If two users call their data file test.txt, then the unique-name rule is violated. Even a single
user on a single-level directory may find it difficult to remember the names of all the files as
the number of files increases. It is not uncommon for a user to have hundreds of files on one
computer system and an equal number of additional files on another system. Keeping track
of so many files is a daunting task.

Two-Level Directory

As we have seen, a single-level directory often leads to confusion of file names among
different users. The standard solution is to create a separate directory for each user. In the
two-level directory structure, each user has his own user file directory (UFD). The UFDs have
similar structures, but each lists only the files of a single user. When a user job starts or a user
logs in, the system’s master file directory (MFD) is searched. The MFD is indexed by user
name or account number, and each entry points to the UFD for that user (Figure below).
When a user refers to a particular file, only his own UFD is searched. Thus, different users
may have files with the same name, as long as all the file names within each UFD are unique.
To create a file for a user, the operating system searches only that user’s UFD to ascertain
whether another file of that name exists. To delete a file, the operating system confines its
search to the local UFD; thus, it cannot accidentally delete another user’s file that has the
same name.
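To see how the MFD/UFD arrangement removes the name-collision problem of the single-level directory, here is a minimal sketch with a hypothetical `TwoLevelDirectory` class: the MFD is a dictionary keyed by user name, and each UFD is a dictionary keyed by file name. File contents are stored inline purely for illustration; in a single shared table, the second `test.txt` below would have been rejected.

```python
class TwoLevelDirectory:
    """Hypothetical two-level directory: MFD (user -> UFD), UFD (file name -> contents)."""

    def __init__(self):
        self.mfd = {}                            # master file directory, indexed by user name

    def add_user(self, user):
        self.mfd.setdefault(user, {})            # give the user an empty UFD

    def create(self, user, name, data=b""):
        ufd = self.mfd[user]                     # only this user's UFD is searched
        if name in ufd:
            raise FileExistsError(f"{user}/{name}")
        ufd[name] = data

    def delete(self, user, name):
        del self.mfd[user][name]                 # another user's file of the same name is untouched

    def lookup(self, user, name):
        return self.mfd[user][name]

fs = TwoLevelDirectory()
for u in ("alice", "bob"):
    fs.add_user(u)
fs.create("alice", "test.txt", b"alice's data")
fs.create("bob", "test.txt", b"bob's data")     # same name, different UFD: allowed
fs.delete("alice", "test.txt")
print(fs.lookup("bob", "test.txt"))
```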

Although the two-level directory structure solves the name-collision problem, it still has
disadvantages. This structure effectively isolates one user from another. Isolation is an
advantage when the users are completely independent but is a disadvantage when the users
want to cooperate on some tasks and to access one another’s files. Some systems simply do
not allow local user files to be accessed by other users.

Tree-Structured Directories

Once we have seen how to view a two-level directory as a two-level tree, the natural
generalization is to extend the directory structure to a tree of arbitrary height (fig). This
generalization allows users to create their own subdirectories and to organize their files
accordingly. A tree is the most common directory structure. The tree has a root directory,
and every file in the system has a unique path name. A directory (or subdirectory) contains
a set of files or subdirectories. A directory is simply another file, but it is treated in a special
way. All directories have the same internal format.

One bit in each directory entry defines the entry as a file (0) or as a subdirectory (1). Special
system calls are used to create and delete directories. In normal use, each process has a
current directory. The current directory should contain most of the files that are of current
interest to the process. When reference is made to a file, the current directory is searched. If
a file is needed that is not in the current directory, then the user usually must either specify
a path name or change the current directory to be the directory holding that file. To change
directories, a system call is provided that takes a directory name as a parameter and uses it
to redefine the current directory. Thus, the user can change her current directory whenever
she wants. From one change_directory() system call to the next, all open() system calls search
the current directory for the specified file. Note that the search path may or may not contain
a special entry that stands for “the current directory.”
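Path-name resolution against a tree with a current directory can be sketched as follows. The hypothetical `TreeFS` class assumes UNIX-style conventions that the text does not specify: `/` separates components, a leading `/` means the root, and `.` and `..` name the current and parent directories. The `is_dir` flag models the one-bit file/subdirectory marker described above.

```python
class Node:
    """One directory entry. `is_dir` plays the role of the file (0) / subdirectory (1) bit."""

    def __init__(self, name, is_dir, parent=None):
        self.name = name
        self.is_dir = is_dir
        self.parent = parent
        self.children = {} if is_dir else None      # only directories have entries

class TreeFS:
    """Hypothetical tree-structured directory with a current directory."""

    def __init__(self):
        self.root = Node("/", True)
        self.root.parent = self.root                # '..' at the root stays at the root
        self.cwd = self.root

    def resolve(self, path):
        """Absolute paths start at the root; relative paths start at the current directory."""
        node = self.root if path.startswith("/") else self.cwd
        for part in path.split("/"):
            if part in ("", "."):
                continue
            if part == "..":
                node = node.parent
            elif node.is_dir and part in node.children:
                node = node.children[part]
            else:
                raise FileNotFoundError(path)
        return node

    def create(self, dir_path, name, is_dir=False):
        parent = self.resolve(dir_path)
        if not parent.is_dir:
            raise NotADirectoryError(dir_path)
        if name in parent.children:
            raise FileExistsError(name)
        parent.children[name] = Node(name, is_dir, parent)
        return parent.children[name]

    def chdir(self, path):
        """Like the change-directory system call: redefine the current directory."""
        node = self.resolve(path)
        if not node.is_dir:
            raise NotADirectoryError(path)
        self.cwd = node

fs = TreeFS()
fs.create("/", "home", is_dir=True)
fs.create("/home", "alice", is_dir=True)
fs.create("/home/alice", "notes.txt")
fs.chdir("/home/alice")
print(fs.resolve("notes.txt").name)                 # found through the current directory
print(fs.resolve("../alice/notes.txt") is fs.resolve("/home/alice/notes.txt"))
```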

5.4 File systems: partitioning, mount/unmount, and virtual file systems
Partition

A partition is a logical division of a hard disk that is treated as a separate unit by operating
systems (OSes) and file systems. The OSes and file systems can manage information on each
partition as if it were a distinct hard drive. This allows the drive to operate as several smaller
sections to improve efficiency, although it reduces usable space on the hard disk because
each partition carries its own file-system overhead.
A disk partition manager allows system administrators to create, resize, delete and
manipulate partitions, while a partition table records the location and size of each partition. Each
partition appears to the OS as a distinct logical disk, and the OS reads the partition table
before any other part of the disk. Once a partition is created, it is formatted with a file system
such as:

• NTFS on Windows drives;
• FAT32 and exFAT for removable drives;
• HFS Plus (HFS+) on Mac computers; or
• Ext4 on Linux.

Data and files are then written to the file system on the partition. When users boot the OS on
a computer, a critical part of the process is giving control to the first sector on the hard disk.
This sector contains the partition table, which defines how many partitions the
hard disk has, the size of each partition and the address where each disk partition begins. The
sector also contains a program that reads the boot sector of the OS and gives it control so
that the rest of the OS can be loaded into random access memory. A key aspect of partitioning
is the active or bootable partition, which is the designated partition on the hard drive that
contains the OS. Only the partition on each drive that contains the boot loader for the OS can
be designated as the active partition. The active partition also holds the boot sector and must
be marked as active. A recovery partition holds what is needed to restore the computer to its original shipping
condition. In enterprise storage, partitioning helps enable short stroking, a practice of
formatting a hard drive to speed performance through data placement.

Partitioning and its types

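There are three types of partitions: primary partitions, extended partitions and logical
drives. On a disk that uses the classic MBR partition table, the disk may contain up to four primary partitions (only one of which can be active), or
three primary partitions and one extended partition; the extended partition can in turn be subdivided into logical drives.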
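The first sector described earlier is, on classic PC disks, the Master Boot Record (MBR): 446 bytes of boot code, then a partition table of four 16-byte entries starting at byte offset 446, then the two-byte signature 0x55 0xAA at offsets 510 and 511. The four table slots are why such a disk holds at most four primary partitions (or three plus one extended). The sketch below builds a synthetic sector in memory and decodes its table with Python's `struct` module; the `parse_mbr()` helper and the sample values are made up, and it ignores the legacy CHS fields and the logical drives stored inside an extended partition.

```python
import struct

SECTOR_SIZE = 512
TABLE_OFFSET = 446                       # the partition table follows 446 bytes of boot code
# status, CHS of first sector, type, CHS of last sector, first LBA, sector count (little-endian)
ENTRY = struct.Struct("<B3sB3sII")       # 16 bytes per entry
EXTENDED_TYPES = {0x05, 0x0F}            # common type codes for an extended partition

def parse_mbr(sector):
    """Decode the four primary-partition slots of a 512-byte MBR sector."""
    if len(sector) != SECTOR_SIZE or sector[510:512] != b"\x55\xAA":
        raise ValueError("not a valid MBR sector")
    partitions = []
    for i in range(4):
        status, _, ptype, _, first_lba, count = ENTRY.unpack_from(sector, TABLE_OFFSET + i * ENTRY.size)
        if ptype == 0:                   # type 0 marks an unused slot
            continue
        partitions.append({
            "slot": i + 1,
            "active": status == 0x80,    # 0x80 marks the active (bootable) partition
            "type": hex(ptype),
            "extended": ptype in EXTENDED_TYPES,
            "start_lba": first_lba,
            "sectors": count,
        })
    return partitions

# Synthetic MBR: an active Linux (type 0x83) primary partition followed by an extended partition.
mbr = bytearray(SECTOR_SIZE)
ENTRY.pack_into(mbr, TABLE_OFFSET, 0x80, b"\0" * 3, 0x83, b"\0" * 3, 2048, 204800)
ENTRY.pack_into(mbr, TABLE_OFFSET + ENTRY.size, 0x00, b"\0" * 3, 0x0F, b"\0" * 3, 206848, 409600)
mbr[510:512] = b"\x55\xAA"

for p in parse_mbr(bytes(mbr)):
    print(p)
```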

Disk partitioning
Disk partitioning or disk slicing is the creation of one or more regions on secondary
storage, so that each region can be managed separately. These regions are called partitions.
It is typically the first step of preparing a newly installed disk, before any file system is
created. The disk stores the information about the partitions’ locations and sizes in an area
known as the partition table, which the operating system reads before any other part of the disk. Each partition then
appears to the operating system as a distinct “logical” disk that uses part of the actual disk.
System administrators use a program called a partition editor to create, resize, delete, and
manipulate the partitions. Partitioning allows different file systems to be installed
for different kinds of files. Separating user data from system data can prevent the system
partition from becoming full and rendering the system unusable. Partitioning can also
make backing up easier. A disadvantage is that it can be difficult to size partitions properly,
resulting in one partition with too much free space while another is nearly full.

Mounting and Unmounting File Systems

Before you can access the files on a file system, you need to mount the file system. When you
mount a file system, you attach that file system to a directory (mount point) and make it
available to the system. The root (/) file system is always mounted. Any other file system can
be connected or disconnected from the root (/) file system. When you mount a file system,
any files or directories in the underlying mount point directory are unavailable as long as the
file system is mounted. These files are not permanently affected by the mounting process,
and they become available again when the file system is unmounted. However, mount
directories are typically empty, because you usually do not want to obscure existing files.
For example, the following figure shows a local file system, starting with a root (/) file system
and the sbin, etc, and opt subdirectories.
Figure: Sample root (/) File System
To access a local file system from the /opt file system that contains a set of unbundled
products, you must do the following:

• First, you must create a directory to use as a mount point for the file system you want to
mount, for example, /opt/unbundled.
• Once the mount point is created, you can mount the file system (by using the mount
command), which makes all of the files and directories in /opt/unbundled available, as
shown in the following figure.
Figure 35: Mounting a File System
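The effect of mounting can be modelled as a mount table that maps mount-point directories to file systems: a path is served by the file system mounted at the longest matching mount point, which is also why the original contents of a mount-point directory are hidden while something is mounted on it. The `MountTable` class and the file-system labels are hypothetical, and a real kernel resolves paths component by component rather than by string prefix.

```python
import posixpath

class MountTable:
    """Hypothetical mount table: mount-point directory -> mounted file system (a label here)."""

    def __init__(self, root_fs):
        self.mounts = {"/": root_fs}                 # the root file system is always mounted

    def mount(self, fs, mount_point):
        self.mounts[posixpath.normpath(mount_point)] = fs

    def unmount(self, mount_point):
        mount_point = posixpath.normpath(mount_point)
        if mount_point == "/":
            raise ValueError("the root file system stays mounted")
        del self.mounts[mount_point]

    def locate(self, path):
        """Return (file system, path inside it), choosing the longest matching mount point."""
        path = posixpath.normpath(path)
        best = max((mp for mp in self.mounts
                    if path == mp or path.startswith(mp.rstrip("/") + "/")), key=len)
        return self.mounts[best], "/" + path[len(best):].lstrip("/")

table = MountTable("root-fs")
print(table.locate("/opt/unbundled/bin/tool"))   # ('root-fs', '/opt/unbundled/bin/tool')
table.mount("unbundled-fs", "/opt/unbundled")
print(table.locate("/opt/unbundled/bin/tool"))   # ('unbundled-fs', '/bin/tool')
table.unmount("/opt/unbundled")
print(table.locate("/opt/unbundled/bin/tool"))   # ('root-fs', '/opt/unbundled/bin/tool')
```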

Virtual File System

An operating system can have multiple file systems in it. Virtual File Systems are used to
integrate multiple file systems into an orderly structure. The key idea is to abstract out that
part of the file system that is common to all file systems and put that code in a separate layer
that calls the underlying concrete file system to actually manage the data. The structure of the
virtual file system in a UNIX system is shown in the following figure:
Figure: Virtual File System
VFS has two distinct interfaces: the upper one to the user processes and the lower one to the
concrete file systems. The lower interface, labeled VFS interface in the figure, consists of several
dozen function calls that the VFS can make to each file system to get work done. VFS also
supports remote file systems using the NFS (Network File System) protocol.
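The two-interface layering can be sketched in Python with an abstract base class standing in for the lower VFS interface (kernels typically use tables of function pointers instead). The class names are hypothetical, the several dozen operations are cut down to `read()` and `write()`, and routing uses only the first path component, which is a simplification.

```python
from abc import ABC, abstractmethod

class ConcreteFS(ABC):
    """The lower ('VFS') interface: operations every concrete file system must provide."""

    @abstractmethod
    def read(self, path): ...

    @abstractmethod
    def write(self, path, data): ...

class RamFS(ConcreteFS):
    """Toy in-memory file system."""

    def __init__(self):
        self.files = {}

    def read(self, path):
        return self.files[path]

    def write(self, path, data):
        self.files[path] = data

class ReadOnlyFS(ConcreteFS):
    """Toy read-only file system, standing in for something like a CD-ROM."""

    def __init__(self, files):
        self.files = dict(files)

    def read(self, path):
        return self.files[path]

    def write(self, path, data):
        raise PermissionError("read-only file system")

class VFS:
    """The upper interface seen by processes; it forwards each call to a concrete file system."""

    def __init__(self):
        self.mounts = {}                                   # top-level mount point -> ConcreteFS

    def mount(self, mount_point, fs):
        self.mounts[mount_point] = fs

    def _route(self, path):
        top = "/" + path.lstrip("/").split("/", 1)[0]      # e.g. '/cdrom' for '/cdrom/readme.txt'
        if top in self.mounts:
            return self.mounts[top], path[len(top):] or "/"
        return self.mounts["/"], path                     # everything else lives on the root fs

    def read(self, path):
        fs, inner = self._route(path)
        return fs.read(inner)

    def write(self, path, data):
        fs, inner = self._route(path)
        fs.write(inner, data)

vfs = VFS()
vfs.mount("/", RamFS())
vfs.mount("/cdrom", ReadOnlyFS({"/readme.txt": b"disc contents"}))
vfs.write("/home/notes.txt", b"hi")          # handled by RamFS
print(vfs.read("/home/notes.txt"))
print(vfs.read("/cdrom/readme.txt"))         # the same call, handled by ReadOnlyFS
```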

5.6 Memory-mapped files


A memory-mapped file contains the contents of a file in virtual memory. This mapping
between a file and memory space enables an application, including multiple processes, to
modify the file by reading and writing directly to the memory. In .NET, you can use managed code to
access memory-mapped files in the same way that native Windows functions access
memory-mapped files, as described in the Windows documentation topic Managing Memory-Mapped Files. There are two types
of memory-mapped files:

Persisted memory-mapped files: Persisted files are memory-mapped files that are
associated with a source file on a disk. When the last process has finished working with the
file, the data is saved to the source file on the disk. These memory-mapped files are suitable
for working with extremely large source files.
Non-persisted memory-mapped files: Non-persisted files are memory-mapped files that
are not associated with a file on a disk. When the last process has finished working with the
file, the data is lost and the file is reclaimed by garbage collection. These files are suitable for
creating shared memory for inter-process communications (IPC).
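The description above follows the .NET `MemoryMappedFile` API. Python's standard `mmap` module, used here instead, offers the same two kinds of mapping under different names: a mapping backed by a file on disk (persisted) and an anonymous mapping created with file descriptor `-1` (non-persisted). This minimal sketch writes through memory in each case; sharing an anonymous mapping between processes is not shown.

```python
import mmap
import os
import tempfile

# Persisted: the mapping is backed by a source file on disk.
fd, path = tempfile.mkstemp()
os.write(fd, b"hello, world")                 # the file must be non-empty before mapping it
with mmap.mmap(fd, 0) as view:                # length 0 maps the whole file
    view[0:5] = b"HELLO"                      # an ordinary memory write...
    view.flush()                              # ...pushed back to the file on disk
os.close(fd)
with open(path, "rb") as f:
    persisted = f.read()
os.remove(path)

# Non-persisted: an anonymous mapping (file descriptor -1) with no file behind it.
with mmap.mmap(-1, 16) as scratch:
    scratch[:4] = b"temp"
    scratch_contents = scratch[:4]
# After close, the anonymous region and its contents are gone.

print(persisted, scratch_contents)
```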
Processes, Views, and Managing Memory: Memory-mapped files can be shared across
multiple processes. Processes can map to the same memory-mapped file by using a common
name that is assigned by the process that created the file. To work with a memory-mapped
file, you must create a view of the entire memory-mapped file or a part of it. You can also
create multiple views to the same part of the memory-mapped file, thereby creating
concurrent memory. For two views to remain concurrent, they have to be created from the
same memory-mapped file.

Multiple views may also be necessary if the file is greater than the size of the application’s
logical memory space available for memory mapping (2 GB on a 32-bit computer). There are
two types of views: stream access view and random access view. Use stream access views
for sequential access to a file; this is recommended for non-persisted files and IPC. Random
access views are preferred for working with persisted files. Memory-mapped files are
accessed through the operating system’s memory manager, so the file is automatically
partitioned into a number of pages and accessed as needed. You do not have to handle the
memory management yourself. The following illustration shows how multiple processes can
have multiple and overlapping views to the same memory-mapped file at the same time.
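A view in the .NET API corresponds roughly to one mapped region in Python's `mmap` module, which does not distinguish stream and random-access views. The sketch below maps the same file twice, once whole and once starting at an offset, so the two views overlap; a write through either view is visible through the other because both map the same file pages. View offsets must be multiples of `mmap.ALLOCATIONGRANULARITY`.

```python
import mmap
import os
import tempfile

G = mmap.ALLOCATIONGRANULARITY                 # view offsets must be multiples of this value

fd, path = tempfile.mkstemp()
os.ftruncate(fd, 2 * G)                        # a zero-filled file two granules long

with mmap.mmap(fd, 2 * G) as whole, mmap.mmap(fd, G, offset=G) as tail:
    # `whole` covers file bytes [0, 2G); `tail` covers [G, 2G), so the views overlap.
    tail[0:6] = b"shared"                      # write through the second view...
    seen_in_whole = whole[G:G + 6]             # ...and read it back through the first
    whole[G + 6:G + 9] = b"!!!"                # the reverse direction works too
    seen_in_tail = tail[6:9]

os.close(fd)
os.remove(path)
print(seen_in_whole, seen_in_tail)
```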

5.7 Special Purpose File System


FAT file systems are commonly found on floppy disks, flash memory cards, digital cameras,
and many other portable devices because of their relative simplicity. FAT performs poorly
compared with most other file systems because its data structures are overly simplistic:
file operations are time-consuming, and disk space is used poorly when many small files
are present. ISO 9660 and Universal Disk Format (UDF) are two common
formats that target Compact Discs and DVDs. Mount Rainier is a newer extension to UDF,
supported since the Linux 2.6 series and Windows Vista, that facilitates rewriting to DVDs in the
same fashion as has been possible with floppy disks.
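The simplistic data structure in FAT is the file allocation table itself: one entry per cluster, each holding the number of the next cluster in the same file, so every file is a linked list threaded through the table. A minimal sketch with a hypothetical 10-cluster volume shows why reaching a position deep inside a file means following the chain link by link. (Space is also handed out in whole clusters, so even a tiny file consumes an entire cluster, one reason many small files waste space.)

```python
FREE = 0                 # a zero FAT entry marks a free cluster
END_OF_CHAIN = -1        # simplified end marker; real FAT variants use reserved high values

# Hypothetical 10-cluster volume (clusters 0 and 1 are reserved, as in real FAT).
fat = [FREE] * 10
directory = {"a.txt": 2, "b.txt": 3}            # each directory entry records the first cluster
fat[2], fat[5], fat[7] = 5, 7, END_OF_CHAIN     # a.txt: 2 -> 5 -> 7
fat[3], fat[4] = 4, END_OF_CHAIN                # b.txt: 3 -> 4

def clusters_of(name):
    """Follow a file's chain through the table, one lookup per cluster."""
    chain, cluster = [], directory[name]
    while cluster != END_OF_CHAIN:
        chain.append(cluster)
        cluster = fat[cluster]
    return chain

def cluster_for_offset(name, offset, cluster_size=4096):
    """Reaching byte `offset` means following the chain from the first cluster, step by step."""
    cluster = directory[name]
    for _ in range(offset // cluster_size):
        cluster = fat[cluster]
        if cluster == END_OF_CHAIN:
            raise ValueError("offset beyond end of file")
    return cluster

print(clusters_of("a.txt"))
print(cluster_for_offset("a.txt", 9000))
```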

5.8 Naming, searching, access


File Access Methods

Files store information. When it is used, this information must be accessed and read into
computer memory. The information in the file can be accessed in several ways. Some systems
provide only one access method for files, while others support many access methods, and
choosing the right one for a particular application is a major design problem.

1. Sequential File Access

The simplest access method is sequential access. Information in the file is processed in order,
one record after the other. This mode of access is by far the most common; for example,
editors and compilers usually access files in this fashion. Reads and writes make up the bulk
of the operations on a file. A read operation, read_next(), reads the next portion of the file
and automatically advances a file pointer, which tracks the I/O location. Similarly, the write
operation, write_next(), appends to the end of the file and advances to the end of the newly
written material (the new end of file). Such a file can be reset to the beginning, and on some
systems, a program may be able to skip forward or backward n records for some integer n,
perhaps only for n = 1. Sequential access, which is depicted in the figure below, is based on a tape
model of a file and works as well on sequential-access devices as it does on random-access
ones.

Figure: Sequential file Access
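The tape model can be sketched as a list of records plus a file pointer. The method names `read_next()` and `write_next()` follow the text; the `SequentialFile` class and its `reset()` and `skip()` methods are hypothetical, and records are held in memory rather than on a device.

```python
class SequentialFile:
    """Hypothetical tape-model file: a list of records plus a file pointer (the I/O position)."""

    def __init__(self, records=None):
        self.records = list(records or [])
        self.pos = 0

    def read_next(self):
        if self.pos >= len(self.records):
            raise EOFError("end of file")
        record = self.records[self.pos]
        self.pos += 1                       # reading advances the pointer automatically
        return record

    def write_next(self, record):
        self.records.append(record)         # writes go at the end of the file...
        self.pos = len(self.records)        # ...and the pointer moves to the new end

    def reset(self):
        self.pos = 0                        # rewind to the beginning

    def skip(self, n):
        """Move forward (n > 0) or backward (n < 0) by n records, clamped to the file."""
        self.pos = max(0, min(len(self.records), self.pos + n))

f = SequentialFile(["r0", "r1", "r2"])
first = f.read_next()
f.skip(1)                                   # skip over r1
third = f.read_next()
f.write_next("r3")
f.reset()
again = f.read_next()
print(first, third, again, f.records)
```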


2. Direct Access

Another method is direct access (or relative access). Here, a file is made up of fixed-length
logical records that allow programs to read and write records rapidly in no particular order.
The direct-access method is based on a disk model of a file, since disks allow random access
to any file block. For direct access, the file is viewed as a numbered sequence of blocks or
records. Thus, we may read block 14, then read block 53, and then write block 7. There are
no restrictions on the order of reading or writing for a direct-access file.
Direct-access files are of great use for immediate access to large amounts of information.
Databases are often of this type. When a query concerning a particular subject arrives, we
compute which block contains the answer and then read that block directly to provide the
desired information.

Figure: Direct file Access
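With fixed-length records, direct access reduces to arithmetic: record n begins at byte n × record size, so the program seeks straight to it. A minimal sketch, assuming 16-byte records padded with NUL bytes (an arbitrary choice for illustration), performs the same out-of-order sequence as the text: read block 14, read block 53, write block 7. The helper names are hypothetical.

```python
import os
import tempfile

RECORD_SIZE = 16                                   # assumed fixed record length, in bytes

def write_record(f, n, data):
    """Write record number n; its position is computed, not searched for."""
    if len(data) > RECORD_SIZE:
        raise ValueError("record too long")
    f.seek(n * RECORD_SIZE)                        # record n starts at byte n * RECORD_SIZE
    f.write(data.ljust(RECORD_SIZE, b"\0"))        # pad short records to the fixed length

def read_record(f, n):
    f.seek(n * RECORD_SIZE)
    return f.read(RECORD_SIZE).rstrip(b"\0")       # assumes records never end in NUL bytes

with tempfile.TemporaryFile() as f:                # binary read/write ('w+b') by default
    write_record(f, 53, b"fifty-three")
    write_record(f, 14, b"fourteen")
    r14 = read_record(f, 14)                       # read block 14,
    r53 = read_record(f, 53)                       # then block 53,
    write_record(f, 7, b"seven")                   # then write block 7: any order works
    r7 = read_record(f, 7)
    f.flush()
    size = os.fstat(f.fileno()).st_size            # 54 records' worth: blocks 0..53

print(r14, r53, r7, size)
```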

3. Indexed File Access

Indexed file access is a method that combines the benefits of sequential and
direct file access. It involves creating an index that maps logical keys or data
elements to their corresponding physical addresses within the file. The system
stores the index separately from the data file, so the index can be searched quickly to locate the desired
data. Indexed file access is best suited for applications that require fast access to particular
data elements within a large file. However, it requires additional storage space for the index, which
can increase the cost and complexity of the system.
If a file is sorted on one of its fields, an index entry can be assigned to each group of records.
A particular record can then be accessed through its index. The index is essentially the address of
a record in the file. With indexed access, searching a large database becomes quick and easy,
but extra space is needed in memory to store the index values.
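A minimal sketch of indexed access, assuming an in-memory `io.BytesIO` buffer in place of a data file and a Python dictionary as the separately stored index: each key maps to the record's byte offset and length, so a lookup costs one index probe plus one seek, while iterating over the sorted keys gives sequential processing in key order. Function names and sample records are hypothetical.

```python
import io

def build_indexed_file(records):
    """Write variable-length records to a data buffer and build a separate index.

    The index maps each key to (byte offset, length) within the data."""
    data = io.BytesIO()
    index = {}
    for key, payload in records:
        index[key] = (data.tell(), len(payload))
        data.write(payload)
    return data, index

def lookup(data, index, key):
    offset, length = index[key]         # one probe of the index...
    data.seek(offset)                   # ...then a direct seek into the data
    return data.read(length)

def scan(data, index):
    for key in sorted(index):           # sequential processing in key order
        yield key, lookup(data, index, key)

records = [("pear", b"green, 180 g"), ("apple", b"red, 150 g"), ("kiwi", b"brown, 75 g")]
data, index = build_indexed_file(records)
print(lookup(data, index, "kiwi"))
print([key for key, _ in scan(data, index)])
```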

Searching

As the number of files in your folders increases, browsing through folders becomes a cumbersome
way of looking for files. Instead, you can find the file you need from among thousands of photos,
texts and other files by using the search function of your operating system. The search function
allows you to look for files and folders based on file properties such as the file name, save date or
size.
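Search by file properties can be sketched with the standard `os.walk`, `fnmatch` and `os.stat` facilities. The hypothetical `search()` helper filters on a case-insensitive name pattern, a minimum size and a modification time; desktop tools such as Windows Search and Spotlight normally consult a prebuilt index instead of walking the disk on every query.

```python
import fnmatch
import os
import tempfile
import time

def search(root, name_pattern="*", min_size=0, modified_after=0.0):
    """Return relative paths under `root` whose name, size and modification time match."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not fnmatch.fnmatch(name.lower(), name_pattern.lower()):   # case-insensitive name
                continue
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            if st.st_size >= min_size and st.st_mtime > modified_after:
                hits.append(os.path.relpath(full, root))
    return sorted(hits)

with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "photos"))
    sizes = {"photos/beach.JPG": 2048, "photos/notes.txt": 10, "report.docx": 500}
    for rel, size in sizes.items():
        with open(os.path.join(root, rel), "wb") as f:
            f.write(b"x" * size)
    big_photos = search(root, "*.jpg", min_size=1024)
    recent = search(root, modified_after=time.time() - 3600)   # changed in the last hour

print(big_photos)
print(recent)
```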

In Windows, you can search for files quickly by clicking the Start button at the bottom left of the
screen. Then, simply type the full or partial name of the file, program or folder. The search begins
as soon as you start typing, and the results will appear above the search field. If the file or program
you are looking for does not appear right away, wait a moment, as the search can take a while.
Also note that the search results are grouped by file type. If you are unable to find the file you are
looking for by using the quick search, you can narrow the search results by file type by clicking
the icons above the search term. You can narrow down your search results to: apps, settings,
documents, folders, photos, videos and music.

Let’s say you recently downloaded a few photos that were attached to an email message, but now
you’re not sure where these files are on your computer. If you’re struggling to find a file, you can
always search for it; searching allows you to look for any file on your computer. On a Mac, click
the Spotlight icon in the top-right corner of the screen, then type the file name or keywords in the
search box. The search results appear as you type. Simply click a file or folder to open it. Whichever
search tool you use, try different terms in your search.

For example, if you’re looking for a certain Microsoft Word document, try searching for a few
different file names you might have used when saving the document.
