
Database Management


Chapter Six:
Data Storage and Querying
Storage and File Structure
Indexing and Hashing
Query Processing and Optimization

Storage and File Structure
A database system is designed to hold large volumes of
data that need to be kept physically (permanently) on a
storage medium. The storage media in a computer
can be categorized as:
Primary Storage
Secondary Storage
Tertiary Storage

Primary Storage
Storage media that the CPU can access directly:
the main memory and the cache.
Cache is the fastest level in the memory hierarchy
and is built inside the microprocessor chip.
Typical response time is in nanoseconds.
The main memory is the next level in the
hierarchy and provides the main working
environment in which the CPU keeps programs
and data.

Secondary Storage
Storage media for permanent storage, such as
magnetic disk and optical disk. Larger in
capacity but significantly slower than
primary storage. Typical response time is in
milliseconds. Secondary storage is used for
virtual memory, disk storage, and file systems.

Tertiary Storage
Storage media that are used for archive and
backup data, such as magnetic tape.
Typical response time is a few seconds or even
minutes.
Virtual memory is storage on the disk that
is often addressed by a 32-bit address space;
hence 2^32 bytes (about 4 GB) of data can be managed.

Secondary Storage Device (Disk)
The disk drive consists of two moving assemblies:
Disk Assembly
Head Assembly
The disk assembly consists of circular platters that
rotate around the spindle.
Each platter surface is covered with a thin
layer of magnetic material.
Magnetic Hard Disk Mechanism

NOTE: Diagram is schematic, and simplifies the structure of actual disk drives
The platters may be double-sided (dual
surface, both upper and lower) or single-sided.
The surfaces are organized into tracks,
which are concentric circles of distinct diameter
on each platter.
The corresponding tracks across the platters of the
disk pack form cylinders.
The tracks are further divided into sectors,
which are segments of the circle separated by
gaps.
The head assembly places the disk
heads of each surface close to the desired track, and the
disk assembly rotates the disk to locate the first
sector to be read or written. The movement of the
disk assembly and the head assembly for data
read/write is managed by a processor known as the disk
controller.
While sectors are the physical units of disk for bit
storage, blocks are logical units that are set by the
operating system during disk formatting. Typical block
sizes range from 512 to 4096 bytes.
A disk has 8 double-sided platters. Each surface
is divided into 2^14 = 16,384 tracks with 128 sectors per track.
Each sector holds 4,096 bytes. Determine the
size of the disk.
Bytes per sector = 4,096 bytes
Bytes per track = 128 * 4,096 = 524,288 bytes
Bytes per surface = 16,384 * 524,288 = 8,589,934,592 bytes
Bytes in disk = 16 * 8,589,934,592 = 137,438,953,472 bytes
Disk Size = 128 GB (taking 1 GB = 2^30 bytes)
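
The capacity calculation above can be checked with a short script. This is only a sketch: the function name and parameter names are illustrative and do not come from the slides.

```python
def disk_capacity_bytes(platters, surfaces_per_platter, tracks_per_surface,
                        sectors_per_track, bytes_per_sector):
    """Capacity is the product of the geometry parameters."""
    surfaces = platters * surfaces_per_platter
    bytes_per_track = sectors_per_track * bytes_per_sector
    bytes_per_surface = tracks_per_surface * bytes_per_track
    return surfaces * bytes_per_surface


# Geometry from the example: 8 double-sided platters, 2^14 tracks per
# surface, 128 sectors per track, 4096 bytes per sector.
capacity = disk_capacity_bytes(8, 2, 2**14, 128, 4096)
print(capacity)            # 137438953472
print(capacity // 2**30)   # 128 (in units of 2^30 bytes)
```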

An HDD (hard disk drive) is labeled with parameters
given below. Determine the permissible sector size.
16383 Cylinders
16 Heads
224 Sectors per Track.

Access time: the time it takes from when a read or write
request is issued to when data transfer begins.
1. Head movement from the current position to the desired cylinder:
   Seek time (0 to tens of ms)
2. Disk rotation until the desired sector arrives under the head:
   Rotational latency (0 to tens of ms)
3. Disk rotation until the sector has passed under the head:
   Data transfer time (< 1 ms)

The three components of disk access time. Disks that spin
faster have a shorter average and worst-case access time.
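
To see why faster spinning disks have shorter access times, the components can be estimated from the rotation speed. The sketch below assumes that, on average, the desired sector is half a revolution away; the seek time of 8 ms, the rotation speeds, and the 128 sectors per track are illustrative values, not figures from the slides.

```python
def rotation_time_ms(rpm):
    """Time for one full revolution, in milliseconds."""
    return 60_000 / rpm


def average_rotational_latency_ms(rpm):
    # Assumption: on average the sector is half a revolution away.
    return rotation_time_ms(rpm) / 2


def sector_transfer_time_ms(rpm, sectors_per_track):
    # One sector passes under the head in 1/sectors_per_track of a revolution.
    return rotation_time_ms(rpm) / sectors_per_track


def average_access_time_ms(avg_seek_ms, rpm):
    # Access time as defined above: seek time plus rotational latency.
    return avg_seek_ms + average_rotational_latency_ms(rpm)


for rpm in (5400, 7200, 15000):
    print(rpm,
          round(average_access_time_ms(8.0, rpm), 2),
          round(sector_transfer_time_ms(rpm, 128), 3))
```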
Data Representation
Data is stored in the form of records, each consisting of a collection
of related data items. The data items or values form a sequence
of bytes that corresponds to particular fields.
Data type representation:
FLOAT 4 or 8 Bytes
CHAR(n) n Bytes; a pad character is used to fill the unused
character bytes.
VARCHAR(n) maximum of n+1 Bytes
Enumerated types are represented by integer codes with the required bytes.

Fixed Length Record
Example: Consider the Employees table:
Employees(EmpId, Name, BDate, Address, Salary)
EmpId     INTEGER       4 Bytes
Name      CHAR(30)     30 Bytes
BDate     DATETIME      8 Bytes
Address   VARCHAR(50)  51 Bytes
Salary    FLOAT         4 Bytes

Thus the record is represented as:
EmpId (bytes 0-3) | Name (4-33) | BDate (34-41) | Address (42-92) | Salary (93-96)

The record takes 97 Bytes.
The byte position at which a field begins is said
to be the offset of the field.
Thus the offset of EmpId is 0, Name is 4, BDate is 34,
Address is 42, and Salary is 93.
On some machines the offset is required to be a
multiple of 4.
Example: The Employees record with offsets that are
multiples of 4 is as follows:
EmpId at 0 | Name at 4 | BDate at 36 | Address at 44 | Salary at 96

The record takes 100 Bytes.
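
The offsets and record sizes above can be reproduced with a small helper. This is a sketch: `field_offsets` and its `alignment` parameter are names chosen for illustration, and the record size is rounded up to the alignment as well.

```python
FIELDS = [("EmpId", 4), ("Name", 30), ("BDate", 8), ("Address", 51), ("Salary", 4)]


def field_offsets(fields, alignment=1):
    """Return ({field: offset}, record_size) for fields laid out in order."""
    offsets = {}
    position = 0
    for name, size in fields:
        # Round the position up to the next multiple of the alignment.
        position = -(-position // alignment) * alignment
        offsets[name] = position
        position += size
    record_size = -(-position // alignment) * alignment
    return offsets, record_size


print(field_offsets(FIELDS))               # packed layout, 97 bytes
print(field_offsets(FIELDS, alignment=4))  # offsets in multiples of 4, 100 bytes
```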

Record Header
A record representation may include information
that describes the record, in the form of a record
header. A record header may consist of:
The record schema: a pointer to the schema definition.
The length of the record.
Time stamp for the record.

The record size is 112 Bytes.

Records are packed into blocks, which have a block header
as well.

The block header may consist of:

Link to one or more other blocks.
Information about the block.
Information about the relation.
Directory for the offset of each record.
Block ID.
Time stamp for the block.
Variable Length Record
Variable-length records arise in database systems in several ways:
Storage of multiple record types in a file.
Record types that allow variable lengths for one or more fields, such
as strings (varchar).
Record types that allow repeating fields (used in some older data
models).
Attributes are stored in order
Variable length attributes represented by fixed size (offset, length), with
actual data stored after all fixed length attributes
Null values represented by null-value bitmap
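
One way to picture this representation is the sketch below. The field widths are assumptions made for the example: each variable-length attribute gets a 2-byte offset and a 2-byte length, there is a single fixed-length attribute (an 8-byte salary), the null bitmap is one byte, and the variable-length data follows. The function names are hypothetical.

```python
import struct


def encode_record(var_values, salary):
    """var_values: list of str or None; salary: float or None."""
    values = list(var_values) + [salary]
    bitmap = 0
    for i, value in enumerate(values):
        if value is None:
            bitmap |= 1 << i          # bit i set means attribute i is null
    # (offset, length) pairs + fixed-length salary + 1-byte null bitmap
    header_size = 4 * len(var_values) + 8 + 1
    slots = b""
    data = b""
    for value in var_values:
        raw = b"" if value is None else value.encode("utf-8")
        slots += struct.pack("<HH", header_size + len(data), len(raw))
        data += raw
    fixed = struct.pack("<d", 0.0 if salary is None else salary)
    return slots + fixed + bytes([bitmap]) + data


def decode_record(record, n_var):
    fixed_at = 4 * n_var
    (salary,) = struct.unpack_from("<d", record, fixed_at)
    bitmap = record[fixed_at + 8]
    values = []
    for i in range(n_var):
        offset, length = struct.unpack_from("<HH", record, 4 * i)
        if (bitmap >> i) & 1:
            values.append(None)
        else:
            values.append(record[offset:offset + length].decode("utf-8"))
    if (bitmap >> n_var) & 1:
        salary = None
    return values, salary


record = encode_record(["Kim", None, "Physics"], 6500.0)
print(len(record), decode_record(record, 3))
```

Because every attribute is reached through a fixed-size slot at the front of the record, a field can be located without scanning the fields before it.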

Variable-Length Records:
Slotted Page Structure

Slotted page header contains:

number of record entries
end of free space in the block
location and size of each record
Records can be moved around within a page to keep them
contiguous with no empty space between them; the entry in the header
must then be updated.
Pointers should not point directly to a record; instead they should
point to the entry for the record in the header.
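
A minimal sketch of a slotted page follows. It assumes records are placed from the end of the block towards the front, and it keeps the header entries in a Python list rather than inside the block bytes, so header space is not counted. The class and method names are illustrative.

```python
class SlottedPage:
    def __init__(self, size=4096):
        self.data = bytearray(size)
        self.slots = []        # header: one [offset, length] entry per record
        self.free_end = size   # end of free space in the block

    def insert(self, record):
        if len(record) > self.free_end:
            raise ValueError("not enough free space in the page")
        self.free_end -= len(record)
        self.data[self.free_end:self.free_end + len(record)] = record
        self.slots.append([self.free_end, len(record)])
        return len(self.slots) - 1   # external pointers hold this slot number

    def read(self, slot):
        offset, length = self.slots[slot]
        return bytes(self.data[offset:offset + length])

    def delete(self, slot):
        self.slots[slot] = None
        # Move the surviving records so they stay contiguous, and update
        # their header entries; slot numbers do not change.
        live = [(i, self.read(i)) for i, s in enumerate(self.slots) if s is not None]
        self.free_end = len(self.data)
        for i, record in live:
            self.free_end -= len(record)
            self.data[self.free_end:self.free_end + len(record)] = record
            self.slots[i] = [self.free_end, len(record)]


page = SlottedPage(64)
a = page.insert(b"alpha")
b = page.insert(b"bravo!")
c = page.insert(b"charlie")
page.delete(b)
print(page.read(a), page.read(c), page.free_end)
```

Since callers only hold slot numbers, the records for `a` and `c` can be moved during compaction without invalidating any pointer.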
File Organization
File organization refers to the method of arranging the data of a
file into records on external storage. Records organized on
storage media can be physically located with the use of a record id.
However, the user mainly expects to apply a search condition
based on a certain field in the record. Hence the designer of the
file organization needs to look for a structure that can locate
the records easily.
One possible way is the use of indexes: data structures that
make it possible to find the record ids of records with given values in
the index search key fields.
There are also alternatives, each ideal for some situations, and
not so good in others:
Heap (Random Order) files
Heap Files, also known as Pile Files, are suitable when the
typical access is a file scan retrieving all records. Records
are placed in the file in the order in which they are
inserted, wherever there is space.
Insertion is very efficient: the last disk block of the file
is copied into memory, the new record is added, and the block is
written back to the disk.
Searching is expensive: the only search possible is a linear
(exhaustive) search, block by block.

Deletion requires periodic reorganization: the record to
be deleted is located and its block is fetched into memory;
the record is then deleted and the block is rewritten to the
disk. A deletion mark may be used to mark a deleted record,
with a different mark for a valid record. The file is
reorganized by accessing each block, packing the records,
and removing deleted records to reclaim unused space.
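
The heap file operations described above can be sketched in memory as follows. Blocks are modelled as Python lists and each entry carries a deletion mark; the class name, the `records_per_block` parameter, and the block-read counter are assumptions made for illustration.

```python
class HeapFile:
    def __init__(self, records_per_block):
        self.records_per_block = records_per_block
        self.blocks = [[]]   # each entry is [deleted_mark, record]

    def insert(self, record):
        # Only the last block is touched on insertion.
        if len(self.blocks[-1]) == self.records_per_block:
            self.blocks.append([])
        self.blocks[-1].append([False, record])

    def search(self, predicate):
        """Linear search; returns (record or None, number of blocks read)."""
        blocks_read = 0
        for block in self.blocks:
            blocks_read += 1
            for deleted, record in block:
                if not deleted and predicate(record):
                    return record, blocks_read
        return None, blocks_read

    def delete(self, predicate):
        for block in self.blocks:
            for entry in block:
                if not entry[0] and predicate(entry[1]):
                    entry[0] = True      # set the deletion mark only
                    return True
        return False

    def reorganize(self):
        # Pack the valid records and drop the marked ones.
        live = [rec for block in self.blocks for deleted, rec in block if not deleted]
        n = self.records_per_block
        self.blocks = [[[False, r] for r in live[i:i + n]]
                       for i in range(0, len(live), n)] or [[]]


heap = HeapFile(records_per_block=2)
for value in (1, 2, 3, 4, 5):
    heap.insert(value)
print(heap.search(lambda r: r == 5))   # found in the third block
heap.delete(lambda r: r == 2)
heap.reorganize()
print(len(heap.blocks))
```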

Sorted Files
Sorted Files, also known as Sequential Files, are best if
records must be retrieved in some order, or if only a range
of records is needed. The records are physically ordered
based on the value of the desired field.
Insertion is expensive: the proper location for the
incoming record needs to be located and space has to be
created (which may require data movement); only then can the
record be added. To minimize time and improve insertion
efficiency, space may be interleaved between blocks as
overflow (transaction) blocks.
Searching is efficient: binary search is applicable on the
ordering key. But searching on other criteria is
similar to the heap file organization.
Deletion is expensive: similar to the insertion operation,
deletion may also involve large data movement.
Update: may require data reorganization if the updated
field is the ordering key; otherwise update is a
simple operation that requires reading the block, modifying
the record, and rewriting the block back to disk.
Indexes are data structures to organize records via trees
or hashing. Like sorted files, they speed up searches for a
subset of records, based on values in certain search key
fields. Updates are much faster than in sorted files.

A file organization based on hashing provides fast access
to records on a certain search condition. In hashing, a hash
function h(), also known as a randomizing function, is
applied to the hash field value to yield the address of the
disk block in which the record is to be stored.
The B-Tree data structure can be used as the primary
organization of the records. B-Tree is also used in

Indexing and Hashing
Basic Concepts
Indexes are auxiliary access structures that are used to speed
up the retrieval of records in response to a certain search
condition.
Search Key - an attribute or set of attributes used to look up
records in a file.
An index file consists of records (called index entries) of the
form (search key, pointer).
Index files are typically much smaller than the original file
There are two basic kinds of indexes:
Ordered Indexes: based on a sorted ordering of the values in a key field.
Hash Indexes: based on a uniform distribution of values across a range of
buckets, determined by a hash function.
Ordered Indexes
A file with a record structure having several fields
(or attributes) is often accessed through an index
structure defined on a single field of the file called
search key or indexing field.
A single file may have several index structures on
various search keys.
If the file is physically organized sequentially on the
search key, then the index is said to be a Primary
Index or Clustering Index.
However, if the search key specifies an order
different from the sequential order of the file, the index is
called a Secondary Index or Non-clustering Index.
Primary Index
The index is a separate file from the data file. Each
index record (or index entry) consists of a search key
value and pointers to one or more records. The index file
is ordered on the search key, and each
pointer identifies a disk block and an offset within
the block to locate the record.
The primary index is mostly built on the primary
key, and in this context the records have distinct values
in the search key.
There are two types of ordered indexes, namely
dense index and sparse index.
Dense Index
A dense index has an index record for every search key value
in the data file. When the search key is a key of the file, the
number of entries in a dense index is equal
to the number of records in the data file.
Example: index on ID attribute of instructor

Dense index on dept_name, with instructor file
sorted on dept_name

Sparse (Non-dense) Index
A sparse index has an index entry only for the first record in each
block, known as the anchor record of the block. The number
of entries in the index file is equal to the number of
blocks in the data file.
To locate a record with a given search key value
using a sparse index, find the index entry with the
largest search key value that is less than or equal to
the searched value, and read the block it points to.
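
The sparse index lookup rule can be sketched with the standard `bisect` module. The data file is modelled here as a list of blocks, each a sorted list of (key, value) pairs; this in-memory layout and the function names are assumptions for illustration.

```python
from bisect import bisect_right


def build_sparse_index(blocks):
    """One entry per block: the search key of the block's anchor record."""
    return [block[0][0] for block in blocks]


def lookup(blocks, index, key):
    # Position of the largest index entry that is <= key.
    pos = bisect_right(index, key) - 1
    if pos < 0:
        return None               # key is smaller than every anchor
    for k, value in blocks[pos]:  # scan only the one block pointed to
        if k == key:
            return value
    return None


blocks = [[(10, "a"), (20, "b")], [(30, "c"), (40, "d")], [(50, "e")]]
index = build_sparse_index(blocks)
print(index)                      # [10, 30, 50]
print(lookup(blocks, index, 40))  # d
print(lookup(blocks, index, 35))  # None
```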


NOTE: A single data file can have only one primary
or clustering index.
Secondary Index
Secondary indexes provide a secondary way of
accessing the data file. Since the data file is not
ordered on the search key of the secondary index, a
block anchor cannot be used to build a sparse
index. Hence a secondary index is
necessarily a dense index.
Secondary indexes enhance the performance of
queries that use keys other than the search key of
the primary index.

Example: Secondary index on salary field of

Multilevel Indexes
The main reason for having an index file is to allow a
better search algorithm, such as binary search, that
reduces response time considerably. Binary
search requires about log2(bi) block accesses for an index file
having bi blocks.
For a large data file the index file also grows in
size and may not fit in memory, hence it requires
several disk block reads.

Consider a data file having 10,000,000 records, where
a block holds 10 records for data and 100 entries for the
index (the block factor). Determine the maximum
number of block accesses for a data search using:
a. A sequential search on the dense primary index
search key.
b. A sequential search on the sparse primary index
search key.
c. A binary search on a dense primary index search
key, and
d. A binary search on a sparse primary index
search key.
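a. 10,000,000 + 1 = 10,000,001 block accesses
b. 10,000,000/10 + 1 = 1,000,001 block accesses
c. ⌈log2(10,000,000)⌉ + 1 = 24 + 1 = 25 block accesses
d. ⌈log2(1,000,000)⌉ + 1 = 20 + 1 = 21 block accesses
In each case the index entries are counted as one access each,
and the final + 1 is the access to the data block.
REMARK: Block Factor (bfr) is the ratio of the
block size to the record size, either for the data file or
the index file. That is, the number of data or index records
that can fit in a block.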

To deal with this problem, the primary index file (this is also
applicable to a secondary index) is treated as the data
file and a sparse index is built on top of it. The idea
behind this logic is a multilevel index that reduces
block accesses for reading the index file as well. The
index file that is used for creating the other
index is referred to as the first-level index of the
multilevel index, the index on the first-level index is
called the second-level index of the multilevel index,
and so on.

For the previous example, consider a two-level
index, then
a. A binary search for the dense primary index will result in:
⌈log2(10,000,000/100)⌉ + 1 + 1 = 17 + 2 = 19 block accesses
b. A binary search for the sparse primary index will result in:
⌈log2(1,000,000/100)⌉ + 1 + 1 = 14 + 2 = 16 block accesses
For a three-level sparse primary index the binary
search will have:
⌈log2(10,000/100)⌉ + 1 + 1 = 7 + 2 = 9 block accesses
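
The binary search figures in these examples can be reproduced with a few lines. The sketch simply mirrors the arithmetic used above (the logarithm rounded up, plus the extra accesses); the helper name and its `extra` parameter are illustrative.

```python
from math import ceil, log2


def binary_search_accesses(entries, extra=1):
    """Rounded-up log2 of the searched entries, plus the extra accesses."""
    return ceil(log2(entries)) + extra


# Single-level index
print(binary_search_accesses(10_000_000))                # dense:  25
print(binary_search_accesses(1_000_000))                 # sparse: 21

# Two-level index (block factor 100 for the index)
print(binary_search_accesses(10_000_000 // 100, extra=2))  # dense:  19
print(binary_search_accesses(1_000_000 // 100, extra=2))   # sparse: 16

# Three-level sparse index
print(binary_search_accesses(10_000 // 100, extra=2))      # 9
```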

B-Tree Index Files
B+-Tree Index Structure
The B+-tree index structure is a form of balanced tree
in which every path from the root of the tree to a
leaf of the tree has equal length. Each non-leaf node
in the tree has between ⌈n/2⌉ and n children, where n
is fixed for a particular tree.
The B+-tree index structure imposes performance
overhead on insertion and deletion and adds space
overhead too; however it alleviates degradation on
performance as the file grows, both for index lookup
and for sequential data scan.
A typical node of a B+-tree index structure is as
follows:

P1 | K1 | P2 | K2 | ... | Pn-1 | Kn-1 | Pn

K1, K2, ..., Kn-1 are the search key values and
P1, P2, ..., Pn are pointers that point either to a record
in the file if the node is a leaf, or to a next-level node in the tree
structure otherwise.
The search keys in a node are ordered:
K1 < K2 < K3 < . . . < Kn-1
(Initially assume no duplicate keys, address duplicates later)
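
A lookup over nodes of this shape can be sketched as below. Two conventions are assumed here, since the slides do not spell them out: in a non-leaf node, keys smaller than K1 are reached through P1 and keys greater than or equal to Ki through Pi+1; in a leaf, pointer Pi refers to the record with key Ki (the last pointer, which would chain to the next leaf, is left out). The `Node` class and `find` function are illustrative names.

```python
from bisect import bisect_right


class Node:
    def __init__(self, keys, pointers, leaf):
        self.keys = keys          # K1 < K2 < ... in sorted order
        self.pointers = pointers  # children (non-leaf) or records (leaf)
        self.leaf = leaf


def find(root, key):
    node = root
    while not node.leaf:
        # Number of keys <= search key gives the child to follow.
        node = node.pointers[bisect_right(node.keys, key)]
    for k, record in zip(node.keys, node.pointers):
        if k == key:
            return record
    return None


leaf1 = Node([5, 10], ["r5", "r10"], leaf=True)
leaf2 = Node([20, 30], ["r20", "r30"], leaf=True)
leaf3 = Node([40, 50], ["r40", "r50"], leaf=True)
root = Node([20, 40], [leaf1, leaf2, leaf3], leaf=False)

print(find(root, 20), find(root, 50), find(root, 25))
```

Every lookup visits one node per level, which is why the equal path length from root to leaf keeps the cost uniform.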

B-Tree Index Structure
The B-Tree index structure is a B+-tree index structure
that does not allow the repetition of search key values.
Typical nodes of a B-tree index structure are as follows:

(a) Leaf Node (b) Nonleaf Node


(Read on Insertion and Deletion of a node from a
B+-Tree/B-Tree index structure.)

Hash Index
The hashing technique of file organization avoids
the need to access an index structure, which may
require more disk accesses (I/O operations).
In a hash file organization the block of a record
is determined by computing a hash function on the
search key.
A unit of storage that can store one or more records having
the same hash function result is referred to as a bucket.

The hash function takes the search keys and
distributes the records uniformly and randomly over the buckets.
Uniform distribution: the hash function assigns each
bucket the same number of search key values from the set
of all possible search key values.
Random distribution: the hash value will not be
correlated to any externally visible ordering on the search
key values; the hash function will appear to be random.
Example: A hash function that computes the sum of the
binary representations of the characters in the search
key value and takes it modulo the number of buckets.
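
The example hash function can be written in a few lines. The sketch assumes the characters are encoded as UTF-8 bytes and that there are 8 buckets; the function name and the sample keys are illustrative.

```python
def bucket_for(key, n_buckets):
    """Sum of the byte values of the key, modulo the number of buckets."""
    return sum(key.encode("utf-8")) % n_buckets


# "Music" sums to 77 + 117 + 115 + 105 + 99 = 513, and 513 mod 8 = 1.
print(bucket_for("Music", 8))   # 1

# Keys with the same characters in a different order collide,
# since the sum ignores character positions.
print(bucket_for("ab", 8) == bucket_for("ba", 8))   # True
```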
Bucket Overflow
The main reasons for bucket overflow are:
Insufficient Buckets: the number of buckets assigned may
not be sufficient for the current data size. The number of
buckets (nB) must be chosen in such a way that it is greater
than the number of records (nT) divided by the number of
records that can fit in a bucket (fT). That is, nB > nT/fT.
Skew: some buckets may hold more records than others,
and they may overflow while the others still have
space. The major reasons for skew are:
Multiple records with the same search key,
Non-uniform distribution of search keys by the hash
function.
The best solution for bucket overflow is the use
of dynamic hashing (example: extendable hashing),
which can be modified dynamically to accommodate
the growth or the shrinkage of the database. But if
static hashing is to be used, then to avoid the
consequences of overflow one can choose one of
the following options:
Choose a hash function based on the current size of the file, or
Choose a hash function based on the anticipated size of
the file, or
Periodically reorganize the hash structure in response to file
growth or shrinkage.
Query Processing and Optimization
Query Processing refers to the range of activities
involved in extracting data from a database. The
basic steps in query processing are:
Parsing and Translation
Optimization
Evaluation

Steps in Query Processing

Parser and Translator: The parser
identifies the
language tokens, such as SQL keywords, attribute
names, and relation names, in the text of the query and
checks the query syntax. An internal
representation of the query is created as a tree data
structure known as a query tree. The translator then
translates the query blocks from the query data
structure into relational algebra expressions.

Optimizer: The optimizer phase of the query
processing optimizes the relational algebra
expression using various algorithms for the query
blocks and produces an evaluation plan for execution.
The optimizer evaluates the cost of operations to
select the optimized evaluation plan.
Evaluation Engine: The evaluation engine, also
known as the Query Execution Engine, takes a query
evaluation plan from the optimizer, executes the
plan, and returns the answer to the query.
-End of chapter six-