You are on page 1of 21

1 File Structures (17IS62)

File Structure CO’s


C362.1: Describe the creation of efficient file Structures and perform
different operations on files to organize them properly on secondary
1

storage devices. [K2]


Introduction to File Structures

C362.2: Explain different methods to organize record structures in


files. [K2]
C362.3: Apply concept of matching and merging on different models
of large files. [K3]
C362.4: Perform operations on files using the concepts of B-trees, B+
trees & apply indexing and multilevel indexing on files. [K2]
C362.5: Apply hashing and extendible hashing on files for efficient
retrieval of records. [K3]
File Structures
1 Introduction to File Structures
Introduction
Introduction to
to File
File Organization
Organization
Data processing from a computer science perspective:
1

–Storage of data
Introduction to File Structures

–Organization of data
–Access to data
This will be built on your knowledge of
Data Structures

File Structures
Introduction
Introduction to
to File
File Organization
Organization
Data processing from a computer science perspective:
1

–Storage of data
Introduction to File Structures

–Organization of data
–Access to data
This will be built on your knowledge of
Data Structures

File Structures
Computer
ComputerArchitecture
Architecture
1

CPU Differences
Registers
Introduction to File Structures

—Fast
Cache —Small
—Expensive
—Volatile
RAM Main Memory
(Semiconductor) —Slow
—Large
Disk, Tape, Second Storage —Cheap
DVD-R —Stable

File Structures
Memory
Memory Hierarchy
Hierarchy
1
Introduction to File Structures

File Structures
Memory
Memory Hierarchy
Hierarchy

On systems with 32-bit addressing, only 232 bytes can be


1

directly referenced in main memory.


Introduction to File Structures

The number of data objects may exceed this number!


Data must be maintained across program executions. This
requires storage devices that retain information when the
computer is restarted.
– We call such storage nonvolatile.
– Primary storage is usually volatile, whereas secondary and tertiary
storage are nonvolatile.

File Structures
Definition
Definition
1

File Structures is the Organization of Data in Secondary


Storage Device in such a way that minimize the access
Introduction to File Structures

time and the storage space.


A File Structure is a combination of representations for
data in files and of operations for accessing the data.
A File Structure allows applications to read, write and
modify data. It might also support finding the data that
matches some search criteria or reading through the data
in some particular order.

File Structures
Why
WhyStudy
StudyFile
FileStructure
StructureDesign?
Design?
I.I.Data
DataStorage
Storage
1

Computer Data can be stored in three kinds of


locations:
– Primary Storage ==> Memory
Introduction to File Structures

[Computer
Memory]
Our – Secondary Storage [Online Disk/ Tape/ CDRom
Focus that can be accessed by the computer]
– Tertiary Storage ==> Archival Data [Offline
Disk/Tape/ CDRom not directly available to the
computer.]

File Structures
II.
II.Memory
Memoryversus
versusSecondary
SecondaryStorage
Storage
1

Secondary storage such as disks can pack thousands of


megabytes in a small physical location.
Introduction to File Structures

Computer Memory (RAM) is limited.


However, relative to Memory, access to secondary
storage is extremely slow [E.g., getting information
from slow RAM takes 120. 10-9 seconds (= 120
nanoseconds) while getting information from Disk
takes 30. 10-3 seconds (= 30 milliseconds)]

File Structures
III.
III.How
HowCan
CanSecondary
SecondaryStorage
StorageAccess
AccessTime
Timebe
beImproved?
Improved?
1

By improving the File Structure.


Introduction to File Structures

Since the details of the representation of the data and the


implementation of the operations determine the efficiency of
the file structure for particular applications, improving these
details can help improve secondary storage access time.

File Structures
Overview
Overviewof ofFile
FileStructure
StructureDesign
Design
I.I.General
GeneralGoals
Goals
1
Introduction to File Structures

Get the information we need with one access to the disk.


If that’s not possible, then get the information with as few accesses
as possible.
Group information so that we are likely to get everything we need
with only one trip to the disk.

File Structures
II.
II.Fixed
Fixedversus
versusDynamic
DynamicFiles
Files
1

It is relatively easy to come up with file structure designs


that meet the general goals when the files never change.
Introduction to File Structures

When files grow or shrink when information is added and


deleted, it is much more difficult.

File Structures
II.
II.The
Theemergence
emergenceof
ofDisks
Disksand
andIndexes
Indexes

As files grew very large, unaided sequential access was not a good
1

solution.
Disks allowed for direct access.
Introduction to File Structures

Indexes made it possible to keep a list of keys and pointers in a


small file that could be searched very quickly.
With the key and pointer, the user had direct access to the large,
primary file.

File Structures
How
How Fast?
Fast?
Typical times for getting info
1

120 10 9
– Main memory: ~120 nanoseconds =
Introduction to File Structures

30 10 6
– Magnetic Disks: ~30 milliseconds =
An analogy keeping same time proportion as above
– Looking at the index of a book: 20 seconds
versus
– Going to the library: 58 days

File Structures
Comparison
Comparison
Main Memory
1

– Fast (since electronic)


Introduction to File Structures

– Small (since expensive)


– Volatile (information is lost when power failure occurs)
Secondary Storage
– Slow (since electronic and mechanical)
– Large (since cheap)
– Stable, persistent (information is preserved longer)

File Structures
Goal
Goal of
of the
the Course
Course
Minimize number of trips to the disk in order to get desired
1

information. Ideally get what we need in one disk access


Introduction to File Structures

or get it with as few disk access as possible.


Grouping related information so that we are likely to get
everything we need with only one trip to the disk (e.g.
name, address, phone number, account balance).

Locality of Reference in Time and Space

File Structures
Good
Good File
File Structure
Structure Design
Design

Fast access to great capacity


1

Reduce the number of disk accesses


Introduction to File Structures

By collecting data into buffers, blocks or buckets


Manage growth by splitting these collections

File Structures
History
History of
of File
File Structure
Structure Design
Design
1. In the beginning… it was the tape
1

– Sequential access
– Access cost proportional to size of file
Introduction to File Structures

[Analogy to sequential access to array data structure]


2. Disks became more common
– Direct access
[Analogy to access to position in array]
– Indexes were invented
•list of keys and points stored in small file
•allows direct access to a large primary file Great if
index fits into main memory.
As file grows we have the same problem we had with a large
primary file
File Structures
History
History of
of File
File Structure
Structure Design
Design
3. Tree structures emerged for main memory (1960`s)
1

– Binary search trees (BST`s)


– Balanced, self adjusting BST`s: e.g. AVL trees (1963)
Introduction to File Structures

4. A tree structure suitable for files was invented:


B trees (1979) and B+ trees
good for accessing millions of records with 3 or 4 disk accesses.
5. What about getting info with a single request?
– Hashing Tables (Theory developed over 60’s and 70’s but still
a research topic)
good when files do not change too much in time.
– Expandable, dynamic hashing (late 70’s and 80’s)
one or two disk accesses even if file grows dramatically

File Structures

You might also like