Chapter 1 - Introduction To File Structures

1 File Structures (17IS62)
File Structure CO’s

C362.1: Describe the creation of efficient file Structures and perform
different operations on files to organize them properly on secondary
1
storage devices. [K2]

Introduction to File Structures
C362.2: Explain different methods to organize record structures in

files. [K2]
C362.3: Apply concept of matching and merging on different models
of large files. [K3]
C362.4: Perform operations on files using the concepts of B-trees, B+
trees & apply indexing and multilevel indexing on files. [K2]
C362.5: Apply hashing and extendible hashing on files for efficient
retrieval of records. [K3]
File Structures
1 Introduction to File Structures
Introduction
Introduction to
to File
File Organization
Organization
Data processing from a computer science perspective:
1
–Storage of data
–Organization of data
–Access to data
This will be built on your knowledge of
Data Structures
File Structures
Introduction
Introduction to
to File
File Organization
Organization
Data processing from a computer science perspective:
1
–Storage of data
–Organization of data
–Access to data
This will be built on your knowledge of
Data Structures
File Structures
Computer
ComputerArchitecture
Architecture
1
CPU Differences
Registers
—Fast
Cache —Small
—Expensive
—Volatile
RAM Main Memory
(Semiconductor) —Slow
—Large
Disk, Tape, Second Storage —Cheap
DVD-R —Stable
File Structures
Memory
Memory Hierarchy
Hierarchy
1
File Structures
Memory
Memory Hierarchy
Hierarchy
On systems with 32-bit addressing, only 232 bytes can be

1
directly referenced in main memory.

The number of data objects may exceed this number!

Data must be maintained across program executions. This
requires storage devices that retain information when the
computer is restarted.
– We call such storage nonvolatile.
– Primary storage is usually volatile, whereas secondary and tertiary
storage are nonvolatile.
File Structures
Definition
Definition
1
File Structures is the Organization of Data in Secondary

Storage Device in such a way that minimize the access
time and the storage space.

A File Structure is a combination of representations for
data in files and of operations for accessing the data.
A File Structure allows applications to read, write and
modify data. It might also support finding the data that
matches some search criteria or reading through the data
in some particular order.
File Structures
Why
WhyStudy
StudyFile
FileStructure
StructureDesign?
Design?
I.I.Data
DataStorage
Storage
1
Computer Data can be stored in three kinds of

locations:
– Primary Storage ==> Memory
[Computer
Memory]
Our – Secondary Storage [Online Disk/ Tape/ CDRom
Focus that can be accessed by the computer]
– Tertiary Storage ==> Archival Data [Offline
Disk/Tape/ CDRom not directly available to the
computer.]
File Structures
II.
II.Memory
Memoryversus
versusSecondary
SecondaryStorage
Storage
1
Secondary storage such as disks can pack thousands of

megabytes in a small physical location.
Computer Memory (RAM) is limited.

However, relative to Memory, access to secondary
storage is extremely slow [E.g., getting information
from slow RAM takes 120. 10-9 seconds (= 120
nanoseconds) while getting information from Disk
takes 30. 10-3 seconds (= 30 milliseconds)]
File Structures
III.
III.How
HowCan
CanSecondary
SecondaryStorage
StorageAccess
AccessTime
Timebe
beImproved?
Improved?
1
By improving the File Structure.

Since the details of the representation of the data and the

implementation of the operations determine the efficiency of
the file structure for particular applications, improving these
details can help improve secondary storage access time.
File Structures
Overview
Overviewof ofFile
FileStructure
StructureDesign
Design
I.I.General
GeneralGoals
Goals
1
Get the information we need with one access to the disk.

If that’s not possible, then get the information with as few accesses
as possible.
Group information so that we are likely to get everything we need
with only one trip to the disk.
File Structures
II.
II.Fixed
Fixedversus
versusDynamic
DynamicFiles
Files
1
It is relatively easy to come up with file structure designs

that meet the general goals when the files never change.
When files grow or shrink when information is added and

deleted, it is much more difficult.
File Structures
II.
II.The
Theemergence
emergenceof
ofDisks
Disksand
andIndexes
Indexes
As files grew very large, unaided sequential access was not a good
1
solution.
Disks allowed for direct access.
Indexes made it possible to keep a list of keys and pointers in a

small file that could be searched very quickly.
With the key and pointer, the user had direct access to the large,
primary file.
File Structures
How
How Fast?
Fast?
Typical times for getting info
1
120 10 9
– Main memory: ~120 nanoseconds =
30 10 6
– Magnetic Disks: ~30 milliseconds =
An analogy keeping same time proportion as above
– Looking at the index of a book: 20 seconds
versus
– Going to the library: 58 days
File Structures
Comparison
Comparison
Main Memory
1
– Fast (since electronic)

– Small (since expensive)

– Volatile (information is lost when power failure occurs)
Secondary Storage
– Slow (since electronic and mechanical)
– Large (since cheap)
– Stable, persistent (information is preserved longer)
File Structures
Goal
Goal of
of the
the Course
Course
Minimize number of trips to the disk in order to get desired
1
information. Ideally get what we need in one disk access

or get it with as few disk access as possible.

Grouping related information so that we are likely to get
everything we need with only one trip to the disk (e.g.
name, address, phone number, account balance).
Locality of Reference in Time and Space
File Structures
Good
Good File
File Structure
Structure Design
Design
Fast access to great capacity

1
Reduce the number of disk accesses

By collecting data into buffers, blocks or buckets

Manage growth by splitting these collections
File Structures
History
History of
of File
File Structure
Structure Design
Design
1. In the beginning… it was the tape
1
– Sequential access
– Access cost proportional to size of file
[Analogy to sequential access to array data structure]

2. Disks became more common
– Direct access
[Analogy to access to position in array]
– Indexes were invented
•list of keys and points stored in small file
•allows direct access to a large primary file Great if
index fits into main memory.
As file grows we have the same problem we had with a large
primary file
File Structures
History
History of
of File
File Structure
Structure Design
Design
3. Tree structures emerged for main memory (1960`s)
1
– Binary search trees (BST`s)

– Balanced, self adjusting BST`s: e.g. AVL trees (1963)
4. A tree structure suitable for files was invented:

B trees (1979) and B+ trees
good for accessing millions of records with 3 or 4 disk accesses.
5. What about getting info with a single request?
– Hashing Tables (Theory developed over 60’s and 70’s but still
a research topic)
good when files do not change too much in time.
– Expandable, dynamic hashing (late 70’s and 80’s)
one or two disk accesses even if file grows dramatically
File Structures

Chapter 1 - Introduction To File Structures

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Chapter 1 - Introduction To File Structures

Uploaded by

Copyright:

Available Formats

1 File Structures (17IS62)

File Structure CO’s

storage devices. [K2]

C362.2: Explain different methods to organize record structures in

On systems with 32-bit addressing, only 232 bytes can be

directly referenced in main memory.

The number of data objects may exceed this number!

File Structures is the Organization of Data in Secondary

time and the storage space.

Computer Data can be stored in three kinds of

Secondary storage such as disks can pack thousands of

Computer Memory (RAM) is limited.

By improving the File Structure.

Since the details of the representation of the data and the

Get the information we need with one access to the disk.

It is relatively easy to come up with file structure designs

When files grow or shrink when information is added and

Indexes made it possible to keep a list of keys and pointers in a

– Fast (since electronic)

– Small (since expensive)

information. Ideally get what we need in one disk access

or get it with as few disk access as possible.

Locality of Reference in Time and Space

Fast access to great capacity

Reduce the number of disk accesses

By collecting data into buffers, blocks or buckets

[Analogy to sequential access to array data structure]

– Binary search trees (BST`s)

4. A tree structure suitable for files was invented:

You might also like