Unit 3

Filing and filing structure
G.S.Gupta
PCMS, Chitawan
Storage and file structure

So far we have worked with higher-level models of a database. For example, at the
conceptual or logical level, we viewed the database, in the relational model, as a
collection of tables. Indeed, the logical model of the database is the correct level for
database users to focus on. This is because the goal of a database system is to
simplify and facilitate access to data; use of the system should not be
burdened unnecessarily with the physical details of the implementation of the
system. In this unit we look below the logical level, at how data are physically
stored. We start with characteristics of the underlying storage media, such as disk
and tape systems. We then define various data structures that will allow fast access
to data. We consider several alternative structures, each best suited to a different
kind of access to data. The final choice of data structure needs to be made on the
basis of the expected use of the system and of the physical characteristics of the
specific machine.
Overview of physical storage media
Several types of data storage exist in most computer systems. These storage media
are classified by the speed with which data can be accessed, by the cost per unit of
data to buy the medium, and by the medium's reliability. Among the media typically
available are these:
1) Cache. Cache is the memory that sits between the CPU and main memory.
The cache is the fastest and most costly form of storage. Cache memory is
small; its use is managed by the computer system hardware. We shall not be
concerned about managing cache storage in the database system.
2) Main memory. The storage medium used for data that are available to be
operated on is main memory. The general-purpose machine instructions
operate on main memory. Although main memory may contain megabytes of
data or even gigabytes of data in large server systems, it is generally too
small (or too expensive) for storing the entire database. The contents of main
memory are usually lost if a power failure or system crash occurs.
3) Flash memory. Flash memory, a form of electrically erasable programmable
read-only memory (EEPROM), differs from main memory in that data survive
power failure. Reading data from flash memory takes less than 100 nanoseconds
(a nanosecond is 1/1000 of a microsecond), which is roughly as fast as reading
data from main memory. However, writing data to flash memory is more
complicated: data can be written once, which takes about 4 to 10 microseconds,
but cannot be overwritten directly. To overwrite memory that has been written
already, we have to erase an entire bank of memory at once; it is then ready to
be written again. A drawback of flash memory is that it can support only a
limited number of erase cycles, ranging from 10,000 to 1 million. Flash memory
has found popularity as a replacement for magnetic disks for storing small
volumes of data (5 to 10 megabytes) in low-cost computer systems, such as
computer systems that are embedded in other devices, in hand-held computers,
and in other digital electronic devices such as digital cameras.
4) Magnetic-disk storage. The primary medium for the long-term on-line
storage of data is the magnetic disk. The system must move the data from disk
to main memory so that they can be accessed. After the system has performed the
designated operations, the data that have been modified must be written to
disk.
The size of magnetic disks currently ranges from a few gigabytes to 80
gigabytes. Both the lower and upper ends of this range have been growing at about
50 percent per year, and we can expect much larger capacity disks every year.
Disk storage survives power failures and system crashes. Disk-storage devices
themselves may sometimes fail and thus destroy data, but such failures usually
occur much less frequently than do system crashes.
5) Optical storage. The most popular forms of optical storage are the compact
disk (CD), which can hold about 640 megabytes of data, and the digital
video disk (DVD), which can hold 4.7 or 8.5 gigabytes of data per side of the
disk (or up to 17 gigabytes on a two-sided disk). Data are stored optically on
a disk, and are read by a laser. The optical disks used in read-only compact
disks (CD-ROM) or read-only digital video disks (DVD-ROM) cannot be written,
but are supplied with data prerecorded.
There are record-once versions of compact disk (called CD-R) and
digital video disk (called DVD-R), which can be written only once; such disks are
also called write-once, read-many (WORM) disks. There are also multiple-write
versions of compact disk (called CD-RW) and digital video disk (DVD-RW
and DVD-RAM), which can be written multiple times. Recordable compact disks
are magnetic-optical storage devices that use optical means to read magnetically
encoded data. Such disks are useful for archival storage of data as well as
distribution of data.
6) Jukebox systems. Jukebox systems contain a few drives and numerous disks
that can be loaded into one of the drives automatically (by a robot arm) on demand.
7) Tape storage. Tape storage is used primarily for backup and archival data.
Although magnetic tape is much cheaper than disks, access to data is much
slower, because the tape must be accessed sequentially from the beginning.
For this reason, tape storage is referred to as sequential-access storage. In
contrast, disk storage is referred to as direct-access storage because it is
possible to read data from any location on disk.
Tapes have a high capacity (40-gigabyte to 300-gigabyte tapes are
currently available), and can be removed from the tape drive, so they are well
suited to cheap archival storage. Tape jukeboxes are used to hold exceptionally
large collections of data, such as remote-sensing data from satellites, which could
include as much as hundreds of terabytes (1 terabyte = 10^12 bytes), or even a
petabyte (1 petabyte = 10^15 bytes) of data.
The various storage media can be organized in a hierarchy according to their
speed and their cost. The higher levels are expensive, but are fast. As we move
down the hierarchy, the cost per bit decreases, whereas the access time
increases. This trade-off is reasonable; if a given storage system were both faster
and less expensive than another (other properties being the same), then there
would be no reason to use the slower, more expensive memory. In fact, many
early storage devices, including paper tape and core memories, are relegated to
museums now that magnetic tape and semiconductor memory have become
faster and cheaper. Magnetic tapes themselves were used to store active data
back when disks were expensive and had low storage capacity.
File Organization
A file is organized as a sequence of records. These records are mapped onto disk
blocks. Files are provided as a basic construct in operating systems, so we shall
assume the existence of an underlying file system. We need to consider ways of
representing logical data models in terms of files.
To reduce block-access time, we can organize blocks on disk in a way that
corresponds closely to the way we expect data to be accessed. For example, if we
expect a file to be accessed sequentially, then we should ideally keep all the blocks
of the file sequentially on adjacent cylinders. Older operating systems, such as
the IBM mainframe operating systems, provided programmers fine control over the
placement of files, allowing a programmer to reserve a set of cylinders for storing
a file.
Subsequent operating systems, such as Unix and personal-computer
operating systems, hide the disk organization from users, and manage the
allocation internally. However, over time, a sequential file may become
fragmented; that is, its blocks become scattered all over the disk. To reduce
fragmentation, the system can make a backup copy of the data on disk and
restore the entire disk.
Although blocks are of a fixed size determined by the physical properties of
the disk and by the operating system, record sizes vary.
Variable-Length Records
Variable-length records arise in database systems in several ways:
1) Storage of multiple record types in a file
2) Record types that allow variable lengths for one or more fields
3) Record types that allow repeating fields
Byte-String Representation
A simple method for implementing variable-length records is to attach a
special end-of-record (⊥) symbol to the end of each record.
The byte-string representation as described in Figure 11.10 has some
disadvantages:
1) It is not easy to reuse space occupied formerly by a deleted record.
Although techniques exist to manage insertion and deletion, they lead to a
large number of small fragments of disk storage that are wasted.
2) There is no space, in general, for records to grow longer. If a
variable-length record becomes longer, it must be moved; movement is costly if
pointers to the record are stored elsewhere in the database (e.g., in
indices, or in other records), since the pointers must be located and
updated.
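The byte-string idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the delimiter byte, the function names (pack, unpack, offsets) and the sample records are hypothetical and do not come from the figure. It shows that when the first record grows, every later record starts at a new offset, so any stored pointer to it would have to be found and updated.

```python
# Assumed end-of-record marker; it must never occur inside a record.
END_OF_RECORD = b"\x00"

def pack(records):
    """Store records back to back, each followed by the end-of-record marker."""
    return b"".join(r + END_OF_RECORD for r in records)

def unpack(data):
    """Split the byte string back into records (drops the empty tail piece)."""
    return data.split(END_OF_RECORD)[:-1]

def offsets(data):
    """Return the starting byte offset of every record in the byte string."""
    result, start = [], 0
    for record in unpack(data):
        result.append(start)
        start += len(record) + len(END_OF_RECORD)
    return result

before = pack([b"Perryridge", b"Round Hill"])
after = pack([b"Perryridge A-102", b"Round Hill"])  # first record has grown

print(offsets(before))  # second record starts at offset 11
print(offsets(after))   # second record now starts at offset 17
```

The second record has not changed at all, yet its address has; that is the cost the second disadvantage describes.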
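A modified form of the byte-string representation, called the slotted-page
structure, is commonly used for organizing records within a single block. The
slotted-page structure appears in Figure 11.11. There is a header at the beginning
of each block, containing the following information: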
1) The number of record entries in the header.
2) The end of free space in the block.
3) An array whose entries contain the location and size of each record.
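A minimal sketch of such a block in Python may help. It is a simplification, not the layout of Figure 11.11: the class name SlottedPage and its methods are hypothetical, and the header is kept in ordinary Python fields instead of being encoded in the first bytes of the block. Records are placed from the end of the block toward the front, and the slot array records where each one lives.

```python
class SlottedPage:
    """Toy slotted page: header fields plus a fixed-size byte block."""

    def __init__(self, block_size=4096):
        self.block = bytearray(block_size)
        self.free_end = block_size   # header item 2: end of free space
        self.slots = []              # header item 3: (offset, size) per record

    @property
    def num_entries(self):
        """Header item 1: the number of record entries."""
        return len(self.slots)

    def insert(self, record: bytes) -> int:
        """Place the record at the end of free space and return its slot number."""
        if len(record) > self.free_end:
            raise ValueError("not enough free space in this block")
        self.free_end -= len(record)
        self.block[self.free_end:self.free_end + len(record)] = record
        self.slots.append((self.free_end, len(record)))
        return len(self.slots) - 1

    def read(self, slot: int) -> bytes:
        """Fetch a record through its slot entry, not through a raw address."""
        offset, size = self.slots[slot]
        return bytes(self.block[offset:offset + size])

page = SlottedPage(block_size=32)
first = page.insert(b"A-102")
second = page.insert(b"Perryridge")
print(page.read(second), page.num_entries, page.free_end)
```

Because outside pointers refer to a slot number and not to a byte offset, a record can be moved inside the block by changing only its slot entry.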
Fixed-Length Representation
Another way to implement variable-length records efficiently in a file system
is to use one or more fixed-length records to represent one variable-length
record. There are two ways of doing this:
1) Reserved space. If there is a maximum record length that is never
exceeded, we can use fixed-length records of that length. Unused space
(for records shorter than the maximum space) is filled with a special null,
or end-of-record, symbol.
2) List representation. We can represent variable-length records by lists of
fixed-length records, chained together by pointers.
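The reserved-space method can be sketched as follows. The maximum length of 16 bytes, the null padding byte and the function names are assumptions made for the example only. Every record occupies exactly MAX_LEN bytes, so record i can be found by simple arithmetic, at the price of wasted space in short records. The list representation is not shown here.

```python
MAX_LEN = 16      # assumed maximum record length
PAD = b"\x00"     # assumed null symbol used to fill unused space

def to_fixed(record: bytes) -> bytes:
    """Pad a variable-length record up to the reserved length."""
    if len(record) > MAX_LEN:
        raise ValueError("record exceeds the reserved maximum length")
    return record.ljust(MAX_LEN, PAD)

def read_record(file_bytes: bytes, i: int) -> bytes:
    """Record i always starts at byte i * MAX_LEN; strip the padding."""
    slot = file_bytes[i * MAX_LEN:(i + 1) * MAX_LEN]
    return slot.rstrip(PAD)

file_bytes = b"".join(to_fixed(r) for r in [b"Downtown", b"Round Hill"])
print(len(file_bytes))             # 2 records * 16 bytes = 32
print(read_record(file_bytes, 1))  # b'Round Hill'
```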
Organization of Records in Files
An instance of a relation is a set of records. Given a set of records, the next
question is how to organize them in a file. Several of the possible ways of
organizing records in files are:
1) Heap file organization. Any record can be placed anywhere in the file
where there is space for the record. There is no ordering of records.
Typically, there is a single file for each relation.
2) Sequential file organization. Records are stored in sequential order,
according to the value of a search key of each record. Section 11.7.1
describes this organization.
3) Hashing file organization. A hash function is computed on some
attribute of each record. The result of the hash function specifies in which
block of the file the record should be placed. Chapter 12 describes this
organization; it is closely related to the indexing structures described in
that chapter.
Sequential File organization
A sequential file is designed for efficient processing of records in sorted
order based on some search key. A search key is any attribute or set of
attributes; it need not be the primary key, or even a superkey. To permit fast
retrieval of records in search-key order, we chain together records by
pointers. The pointer in each record points to the next record in search-key
order.
Consider a sequential file of account records taken from our banking
example. In that example, the records are stored in search-key order, using
branch name as the search key.
The sequential file organization allows records to be read in sorted
order; that can be useful for display purposes, as well as for certain
query-processing algorithms that we shall study later. To insert a record, we
apply the following rules:
1) Locate the record in the file that comes before the record to be inserted in
search-key order.
2) If there is a free record (that is, space left after a deletion) within the same
block as this record, insert the new record there. Otherwise, insert the
new record in an overflow block. In either case, adjust the pointers so as
to chain together the records in search-key order.
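The two insertion rules can be sketched in Python as below. This is an illustrative model only: the class SequentialFile, the way locations are encoded, and the inserted names "Chand" and "Amrit" are all hypothetical. Blocks are small lists in which None marks a free slot left by a deletion, and a dictionary plays the role of the pointer chain. A new smallest key has no predecessor, so this sketch simply sends it to the overflow block.

```python
class SequentialFile:
    """Toy sequential file: main blocks, one overflow block, a pointer chain."""

    def __init__(self, blocks):
        # Assumes the initial records are already stored in search-key order.
        self.blocks = [list(b) for b in blocks]
        self.overflow = []
        # A location is ("main", block_no, slot_no) or ("overflow", index, 0).
        locs = [("main", b, s)
                for b, blk in enumerate(self.blocks)
                for s, key in enumerate(blk) if key is not None]
        self.head = locs[0] if locs else None
        self.next = dict(zip(locs, locs[1:]))
        if locs:
            self.next[locs[-1]] = None

    def key_at(self, loc):
        area, i, j = loc
        return self.blocks[i][j] if area == "main" else self.overflow[i]

    def insert(self, key):
        # Rule 1: find the record that precedes the new one in key order.
        pred, cur = None, self.head
        while cur is not None and self.key_at(cur) <= key:
            pred, cur = cur, self.next[cur]
        # Rule 2: use a free slot in the predecessor's block if there is one,
        # otherwise fall back to the overflow block.
        new = None
        if pred is not None and pred[0] == "main":
            block = self.blocks[pred[1]]
            if None in block:
                slot = block.index(None)
                block[slot] = key
                new = ("main", pred[1], slot)
        if new is None:
            self.overflow.append(key)
            new = ("overflow", len(self.overflow) - 1, 0)
        # In either case, relink the chain so it stays in search-key order.
        self.next[new] = cur
        if pred is None:
            self.head = new
        else:
            self.next[pred] = new

    def scan(self):
        """Follow the pointer chain and return the keys in search-key order."""
        keys, cur = [], self.head
        while cur is not None:
            keys.append(self.key_at(cur))
            cur = self.next[cur]
        return keys

f = SequentialFile([["Amit", "Anamika"], ["Bab", None], ["Devils", "Miners"]])
f.insert("Chand")   # fits into the free slot next to "Bab"
f.insert("Amrit")   # block of "Amit" is full, so it goes to overflow
print(f.scan())
```

A scan still returns the records in sorted order, but the record in the overflow block is no longer physically next to its neighbours, which is why such files are reorganized from time to time.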
[Figure: sequential file holding the records Amit, Anamika, Bab, Devils, Miners in search-key order]
Data-Dictionary storage
A relational-database system needs to maintain data about the relations, such
as the schema of the relations. This information is called the data dictionary,
or system catalog. Among the types of information that the system must
store are these:
1) Names of the relations
2) Names of the attributes of each relation
3) Domains and lengths of attributes
4) Names of views defined on the database, and definitions of those views
5) Integrity constraints (for example, key constraints)
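As a rough illustration, the items above could be held in a structure like the following. The account relation, its attribute names, domains and lengths, and the view definition are assumed sample values, not the catalog format of any particular system. In practice a database usually stores this information in relations of its own.

```python
# Hypothetical catalog contents for one relation and one view.
catalog = {
    "relations": {
        "account": {
            "attributes": [
                {"name": "account_number", "domain": "char", "length": 10},
                {"name": "branch_name", "domain": "char", "length": 20},
                {"name": "balance", "domain": "numeric", "length": 8},
            ],
            "constraints": [
                {"type": "key", "attributes": ["account_number"]},
            ],
        },
    },
    "views": {
        "big_accounts": "select * from account where balance > 1000",
    },
}

def attribute_names(relation):
    """Look up the attribute names of a relation in the catalog."""
    return [a["name"] for a in catalog["relations"][relation]["attributes"]]

def record_length(relation):
    """Total stored length of one record, derived from the attribute lengths."""
    return sum(a["length"] for a in catalog["relations"][relation]["attributes"])

print(attribute_names("account"))
print(record_length("account"))  # 10 + 20 + 8 = 38
```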
Indexed File Organization
In an indexed file organization, the records are stored either sequentially or
nonsequentially, and an index is created that allows the application software to
locate individual records. Like a card catalog in a library, an index is a table that is
used to determine the location of rows in a file that satisfy some condition. Each
index entry matches a key value with one or more records. An index can point to
unique records or to potentially more than one record. An index that allows each
entry to point to more than one record is called a secondary key index.
Secondary key indexes are important for supporting many reporting requirements
and for providing rapid ad hoc data retrieval. An example would be an index on
the Finish field of a product record. Four types of index can be distinguished:
1) Unique primary index (UPI), which is an index on a unique field, possibly the
primary key of the table, and which not only is used to find table rows based
on this field value but also is used by the DBMS to determine where to store
a row based on the primary index field value.
2) Nonunique primary index (NUPI), which is an index on a nonunique field and
which not only is used to find table rows based on this field value but also is
used by the DBMS to determine where to store a row based on the primary
index field value.
3) Unique secondary index (USI), which is an index on a unique field and which
is used only to find table rows based on this field value.
4) Nonunique secondary index (NUSI), which is an index on a nonunique field
and which is used only to find table rows based on this field value.
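The difference between a unique index and a nonunique secondary index can be sketched with two Python dictionaries. The product rows and the field names product_id and finish are assumed sample data; only the idea of an index on the Finish field comes from the text. Both indexes here are secondary in the sense that they do not decide where a row is stored.

```python
from collections import defaultdict

# Rows in storage order; the list position stands in for a record address.
products = [
    {"product_id": 1, "finish": "Oak"},
    {"product_id": 2, "finish": "Cherry"},
    {"product_id": 3, "finish": "Oak"},
]

# Unique index: each key value points to exactly one record.
id_index = {row["product_id"]: pos for pos, row in enumerate(products)}

# Nonunique secondary index: each key value may point to several records.
finish_index = defaultdict(list)
for pos, row in enumerate(products):
    finish_index[row["finish"]].append(pos)

print(products[id_index[2]])                         # the single row with id 2
print([products[p] for p in finish_index["Oak"]])    # every row with an Oak finish
```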
[Figure: indexed file organization; index entries B, F, H, Z; records Amit, Anamika, Bab, Devils, Miners]
Hashed File Organization

In a hashed file organization, the address of each record is determined using a
hashing algorithm. A hashing algorithm is a routine that converts a primary key
value into a record address. Although there are several variations of hashed files, in
most cases the records are located nonsequentially as dictated by the hashing
algorithm. Thus, sequential data processing is impractical.
A typical hashing algorithm uses the technique of dividing each primary key
value by a suitable prime number and then using the remainder of the division as the
relative storage location.
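The division-remainder technique is short enough to show directly. The prime 997 and the sample key values are arbitrary choices for the example; the sketch assumes integer primary keys.

```python
PRIME = 997  # assumed number of storage locations; chosen to be prime

def hash_address(primary_key: int) -> int:
    """Relative storage location = remainder of key divided by the prime."""
    return primary_key % PRIME

print(hash_address(12396))  # 12396 = 12 * 997 + 432, so the address is 432
print(hash_address(1429))   # 1429 = 1 * 997 + 432, so it collides at 432
```

Two different keys can produce the same address, as the second call shows, so a real hashed file also needs a rule for handling such collisions.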
Because record locations are dictated by the hashing algorithm, only one key
can be used for hashing-based storage and retrieval. Hashing and indexing can be
combined into what is called a hash index table to overcome this limitation. A
hash index table uses hashing to map a key into a location in an index, where there
is a pointer to the actual data record matching the hash key. The index is the target
of the hashing algorithm, but the actual data are stored separately from the
addresses generated by hashing.
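A hash index table can be sketched as follows. The index size of 7, the function names insert and lookup, and the sample keys are assumptions for illustration. The hash function picks a slot in the index, the slot holds pointers, and the records themselves stay in a separate file in arrival order.

```python
INDEX_SIZE = 7  # assumed number of index slots (a small prime)

data_file = []                                 # records, stored in arrival order
hash_index = [[] for _ in range(INDEX_SIZE)]   # each slot: (key, position) pairs

def insert(key: int, record: str) -> None:
    """Append the record to the data file and index its position by hash."""
    data_file.append(record)
    hash_index[key % INDEX_SIZE].append((key, len(data_file) - 1))

def lookup(key: int):
    """Hash to an index slot, then follow the pointer to the data record."""
    for stored_key, position in hash_index[key % INDEX_SIZE]:
        if stored_key == key:
            return data_file[position]
    return None

insert(15, "Amit")
insert(22, "Bab")   # 15 and 22 both hash to slot 1; the slot keeps both pointers
print(lookup(22))
print(data_file)    # data order is independent of the hash addresses
```

Since the data file is not arranged by the hash function, a second hash index on a different key could be built over the same records.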