Module 1 Part 2

Module 1: Chapters 4 & 5
Fundamental File Structure Concepts & Managing Files of Records
Outline I: Fundamental File Structure Concepts
• Stream Files
• Field Structures
• Reading a Stream of Fields
• Record Structures
• Record Structures that Use a Length Indicator
Outline II: Managing Files of Records
• Record Access
• More About Record Structures
• File Access and File Organization
• More Complex File Organization and Access
• Portability and Standardization
Field and Record Organization: Overview
• When we deal with file structures:
– Data must be persistent,
– i.e., data written to a file by one program must be read back unchanged by another program.
• The basic logical unit of data is the field, which contains a single data value.
• Fields are organized into aggregates, either as many copies of a single field (an array) or as a list of different fields (a record).
Field and Record Organization: Overview
• When a record is stored in memory, we refer to it as an object and refer to its fields as members.
• Here we will study the ways that objects can be represented as records in files.
Stream Files
• Here we deal with how data is handled in streams.
• For example:
Stream Files
• If our input is as follows:
– Input 1: Mary Ames, 123 Maple, Stillwater, OK 74075
– Input 2: Alan Mason, 90 Eastgate, Ada, OK 74820
Stream Files
• In stream files, the information is written as a stream of bytes containing no added information, as follows:
AmesMary123 MapleStillwaterOK74075MasonAlan90 EastgateAdaOK74820
• Problem: There is no way to get the information back in the organized record format.
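To make the problem concrete, here is a minimal C++ sketch (not from the original slides) that writes the two sample records as a pure stream. The `Person` struct and `writeStream` function are hypothetical names; each field is written with no separator, so the output is exactly the string above and nothing marks where one field ends and the next begins.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record type holding the six fields of one address record.
struct Person {
    std::string last, first, address, city, state, zip;
};

// Writes each field back to back: no lengths, no delimiters.
std::ostream& operator<<(std::ostream& out, const Person& p) {
    return out << p.last << p.first << p.address << p.city << p.state << p.zip;
}

// Writes all records as one undifferentiated stream of bytes.
std::string writeStream(const std::vector<Person>& people) {
    std::ostringstream out;
    for (const Person& p : people) out << p;
    return out.str();
}
```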
Field Structures
• Because of this problem, we need to add some structure to the file.
• There are many ways of adding structure to files to maintain the identity of fields:
– Force the field into a predictable length
– Begin each field with a length indicator
– Place a delimiter at the end of each field to separate it from the next field
– Use a “keyword = value” expression to identify each field and its content
Field Structures
• Method 1: Force the field into a predictable length
[Figure: class with fixed-length fields; the last byte of each field is used for ‘\0’]
• Each field has the fixed length specified in the class/structure.
• In this class, one record => 10+10+15+15+2+9 => 61 bytes of data (not counting the ‘\0’ bytes).
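A minimal sketch of Method 1, assuming the field widths from the slide (10, 10, 15, 15, 2, 9) and assuming that unused bytes are padded with blanks when the record is written to the file. `packFixed` and `unpackFixed` are hypothetical helpers, not the textbook's code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

// Field widths from the slide: 10+10+15+15+2+9 = 61 bytes per record.
const std::array<std::size_t, 6> kWidths = {10, 10, 15, 15, 2, 9};

// Packs six fields into one fixed-size record, padding each with blanks.
// Values longer than their field are truncated (a drawback of this method).
std::string packFixed(const std::array<std::string, 6>& fields) {
    std::string record;
    for (std::size_t i = 0; i < fields.size(); ++i) {
        std::string f = fields[i].substr(0, kWidths[i]);
        f.resize(kWidths[i], ' ');  // pad with blanks up to the field width
        record += f;
    }
    return record;
}

// Recovers the fields by position, stripping the trailing padding blanks.
std::array<std::string, 6> unpackFixed(const std::string& record) {
    std::array<std::string, 6> fields;
    std::size_t pos = 0;
    for (std::size_t i = 0; i < kWidths.size(); ++i) {
        std::string f = record.substr(pos, kWidths[i]);
        f.erase(f.find_last_not_of(' ') + 1);  // all-blank field becomes ""
        fields[i] = f;
        pos += kWidths[i];
    }
    return fields;
}
```

Because every field starts at a known position, unpacking needs no markers; the cost is the padding and truncation discussed on the next slide.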
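Field Structures
Method 1 Contd…
• The result looks as follows: [Figure: the records written with fixed-length fields]
• Problems:
– Wastage of space
• Ames requires 4 bytes, but we use 10 bytes.
– A value that requires more space than allotted does not fit.
• The second problem can be solved by making the field lengths larger, but this increases the wasted space.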
Field Structures
Method 2: Begin each field with a length indicator
• Begin each field with the length of that field value.
• If a field value can be long, more space is required for the length indicator itself.
• The result looks as follows: [Figure: fields preceded by length indicators]
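A sketch of Method 2, assuming each length indicator is two ASCII digits, so a field can hold at most 99 bytes (a longer field would need a wider indicator). The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Writes each field as a two-digit ASCII length followed by the value.
std::string packWithLengths(const std::vector<std::string>& fields) {
    std::string out;
    for (const std::string& f : fields) {
        if (f.size() > 99) throw std::length_error("field too long for a 2-digit indicator");
        out += static_cast<char>('0' + f.size() / 10);  // tens digit
        out += static_cast<char>('0' + f.size() % 10);  // units digit
        out += f;
    }
    return out;
}

// Reads a two-digit length, then exactly that many bytes, until input runs out.
std::vector<std::string> unpackWithLengths(const std::string& data) {
    std::vector<std::string> fields;
    std::size_t pos = 0;
    while (pos + 2 <= data.size()) {
        std::size_t len = std::stoul(data.substr(pos, 2));
        fields.push_back(data.substr(pos + 2, len));
        pos += 2 + len;
    }
    return fields;
}
```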
Field Structures
Method 3: Place a delimiter at the end of each field to separate it from the next field.
• Each field is separated by a delimiter.
• White-space characters such as blank, newline, or tab could serve as delimiters.
• However, these characters can also occur within values; e.g., a blank is used in an address (123 Maple).
• Hence we use the vertical bar character (|).
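A sketch of Method 3 using '|' as the delimiter; it assumes no field value contains '|'. `std::getline` with a delimiter argument reads up to, and discards, each '|'.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Writes each field followed by the delimiter.
std::string packDelimited(const std::vector<std::string>& fields, char delim = '|') {
    std::string out;
    for (const std::string& f : fields) {
        out += f;
        out += delim;
    }
    return out;
}

// Splits the data back into fields, one std::getline call per field.
std::vector<std::string> unpackDelimited(const std::string& data, char delim = '|') {
    std::vector<std::string> fields;
    std::istringstream in(data);
    std::string field;
    while (std::getline(in, field, delim)) fields.push_back(field);
    return fields;
}
```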
Field Structures
Method 4: Use a “keyword = value” expression to identify each field and its content.
• This type of method is self-describing:
– A person unfamiliar with the file can still understand its contents.
– It is useful for identifying missing values.
• It adds overhead for applications that do not need this much information.
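A sketch of Method 4. The keyword names (`last`, `first`, `city`, ...) and the use of '|' to end each keyword=value pair are assumptions for illustration. A field that was never written simply does not appear in the parsed map, which is how missing values show up.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Writes each field as keyword=value followed by '|'.
std::string packKeywords(const std::vector<std::pair<std::string, std::string>>& fields) {
    std::string out;
    for (const auto& kv : fields) out += kv.first + "=" + kv.second + "|";
    return out;
}

// Parses keyword=value pairs into a map keyed by keyword.
std::map<std::string, std::string> unpackKeywords(const std::string& data) {
    std::map<std::string, std::string> fields;
    std::istringstream in(data);
    std::string item;
    while (std::getline(in, item, '|')) {
        std::size_t eq = item.find('=');
        if (eq != std::string::npos) fields[item.substr(0, eq)] = item.substr(eq + 1);
    }
    return fields;
}
```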
Reading a Stream of Fields
• A program can easily read a stream of fields and output them:
[Figure: program output]
Reading a Stream of Fields
• This time, we do preserve the notion of fields, but something is missing:
– Rather than one stream of fields,
– the data should form two records.
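The following hypothetical reader illustrates both slides: it reads '|'-delimited fields and prints them one per line (the "Field # n:" output format is an assumption). For the two sample records it finds 12 fields, but nothing in the data says that fields 1 to 6 and 7 to 12 belong to two separate records.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Reads '|'-delimited fields from any input stream and prints them one per
// line. No record boundaries are detected: the result is one flat list.
std::vector<std::string> readFieldStream(std::istream& in, std::ostream& out) {
    std::vector<std::string> fields;
    std::string field;
    while (std::getline(in, field, '|')) {
        fields.push_back(field);
        out << "Field # " << fields.size() << ": " << field << '\n';
    }
    return fields;
}
```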
Record Structures I
• A record can be defined as a set of fields that belong together when the file is viewed in terms of a higher level of organization.
• Like the notion of a field, a record is another conceptual tool, which need not exist in the file in any physical sense.
• Yet it is an important logical notion included in the file’s structure.
Record Structures II
• Methods for organizing the records of a file include:
– Requiring that the records be a predictable number of bytes in length.
– Requiring that the records be a predictable number of fields in length.
– Beginning each record with a length indicator consisting of a count of the number of bytes that the record contains.
– Using a second file to keep track of the beginning byte address for each record.
– Placing a delimiter at the end of each record to separate it from the next record.
Method 1: Requiring that the records be a predictable number of bytes in length (here the fixed length applies to the record, not to the individual fields).
Method 2: Requiring that the records be a predictable number of fields in length.
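A sketch of Method 2, assuming every record has exactly six '|'-delimited fields; record boundaries are recovered purely by counting fields, so the stream from the previous slides now splits into two records.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Assumption: every record has exactly this many fields.
const std::size_t kFieldsPerRecord = 6;

// Groups '|'-delimited fields into records by counting them.
std::vector<std::vector<std::string>> readByFieldCount(const std::string& data) {
    std::vector<std::vector<std::string>> records;
    std::vector<std::string> current;
    std::istringstream in(data);
    std::string field;
    while (std::getline(in, field, '|')) {
        current.push_back(field);
        if (current.size() == kFieldsPerRecord) {  // record is complete
            records.push_back(current);
            current.clear();
        }
    }
    return records;  // a trailing partial record, if any, is ignored
}
```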
Method 3: Beginning each record with a length indicator consisting of a count of the number of bytes that the record contains.
Method 4: Using a second file to keep track of the beginning byte address for each record.
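A sketch of Method 4 using two in-memory streams in place of two files (an assumption made for brevity). The second stream stores the starting byte address of each record, taken from `tellp()`; reading record k then means seeking to its start and reading up to the next record's start. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <ios>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Writes records back to back into `data` and the starting byte address of
// each record, one per line, into the second stream `index`.
void writeWithIndex(const std::vector<std::string>& records,
                    std::ostream& data, std::ostream& index) {
    for (const std::string& r : records) {
        std::streamoff start = data.tellp();  // where this record begins
        index << start << '\n';
        data << r;
    }
}

// Loads the index, seeks to record k, and reads up to the next record's start.
std::string readRecord(std::istream& data, std::istream& index, std::size_t k) {
    std::vector<std::streamoff> starts;
    std::streamoff pos;
    while (index >> pos) starts.push_back(pos);
    data.seekg(0, std::ios::end);
    std::streamoff end = data.tellg();
    std::streamoff begin = starts.at(k);
    std::streamoff stop = (k + 1 < starts.size()) ? starts[k + 1] : end;
    std::string rec(static_cast<std::size_t>(stop - begin), '\0');
    data.seekg(begin);
    data.read(&rec[0], stop - begin);
    return rec;
}
```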
Method 5: Placing a delimiter at the end of each record to separate it from the next record.
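A sketch of Method 5, assuming '#' marks the end of each record and never appears inside one, while '|' still separates the fields within a record.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Writes each record's fields with '|' and ends the record with '#'.
std::string writeRecords(const std::vector<std::vector<std::string>>& records) {
    std::string out;
    for (const auto& rec : records) {
        for (const std::string& f : rec) out += f + "|";
        out += '#';  // end-of-record marker
    }
    return out;
}

// Splits first on '#' (records), then on '|' (fields within each record).
std::vector<std::vector<std::string>> readRecords(const std::string& data) {
    std::vector<std::vector<std::string>> records;
    std::istringstream in(data);
    std::string rec;
    while (std::getline(in, rec, '#')) {
        std::vector<std::string> fields;
        std::istringstream fin(rec);
        std::string f;
        while (std::getline(fin, f, '|')) fields.push_back(f);
        records.push_back(fields);
    }
    return records;
}
```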
Record Structures that Use a Length Indicator
• To see how record structures are handled, we consider the length indicator method.
• Implementation:
– Writing the variable-length records to the file
– Representing the record length
– Reading the variable-length records from the file
Length Indicator
Writing the variable-length records to the file:
– We want to write the length of a record at its beginning.
– To do so, we need to know the length of the record before writing it.
– Hence we first pack the fields into a buffer, then find the length using the strlen function.
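A sketch of these steps, assuming a fixed 200-byte C-string buffer (a hypothetical limit) and the '|' field delimiter: the fields are packed with `strcat`, the record length is found with `strlen`, and only then are the length and the record written.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Packs the fields into a buffer, measures it, then writes length + record.
void writeLengthRecord(std::ostream& out, const std::vector<std::string>& fields) {
    char buffer[200] = "";                       // assumed maximum record size
    for (const std::string& f : fields) {
        // Room needed: current data + field + '|' + terminating '\0'.
        if (std::strlen(buffer) + f.size() + 2 > sizeof buffer)
            throw std::length_error("record does not fit in the buffer");
        std::strcat(buffer, f.c_str());          // append the field value
        std::strcat(buffer, "|");                // and its delimiter
    }
    std::size_t length = std::strlen(buffer);    // record length, known only now
    out << length << ' ';                        // length indicator + blank
    out.write(buffer, static_cast<std::streamsize>(length));
}
```

For the two sample records this writes `40 Ames|Mary|123 Maple|Stillwater|OK|74075|36 Mason|Alan|90 Eastgate|Ada|OK|74820|`.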
Length Indicator
Representing the record length:
• As a 2-byte binary integer, or
• Converted into a character string:
fprintf(file, "%d ", length); // C stream
stream << length << ' '; // C++ stream
The above 2 statements insert the length followed by a space as a delimiter.
Length Indicator
Reading the variable-length record from the file:
– First read the record length.
– Then read the record from the file into a buffer.
– The value from the buffer is read into the character string strbuff.
– From strbuff, the fields are unpacked into the object p.
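A sketch of the reading side for records written as an ASCII length, a blank, and the record bytes (the format from the previous slide). The name `strbuff` follows the slide; making it a `std::istringstream` over the buffer is an assumption about how the fields are extracted.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads one record written as "<length> <bytes>"; returns false at end of file.
bool readLengthRecord(std::istream& in, std::vector<std::string>& fields) {
    std::size_t length;
    if (!(in >> length)) return false;          // read the length indicator
    in.get();                                   // skip the blank after it
    std::string buffer(length, '\0');
    if (!in.read(&buffer[0], static_cast<std::streamsize>(length))) return false;
    fields.clear();
    std::istringstream strbuff(buffer);         // unpack fields from the buffer
    std::string f;
    while (std::getline(strbuff, f, '|')) fields.push_back(f);
    return true;
}
```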
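Mixing numbers & characters:
Use a file dump Contd…
• The actual length represented in a file as a character string is as follows:
[Figure: file contents with ASCII length indicators]
• If the length needs to be represented as a 2-byte integer:
[Figure: file contents with 2-byte binary length indicators]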
Use a file dump Contd…
• Finally, with the 2-byte representation, the data will be viewed in a file as follows:
[Figure: file dump]
Use a file dump
• On a UNIX platform, the data can be dumped with the od (octal dump) command, as shown:
[Figure: od output]
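A small sketch of what a dump reveals, assuming a record length of 40: stored as characters it occupies the bytes 0x34 0x30 ('4', '0'), while as a 2-byte binary integer it is 0x28 0x00 when written low byte first. The byte order is an assumption (it varies by machine), and the hex format here only approximates what `od` prints.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Length stored as ASCII digits: 40 becomes the bytes '4' and '0'.
std::vector<unsigned char> asciiLength(unsigned n) {
    std::string s = std::to_string(n);
    return std::vector<unsigned char>(s.begin(), s.end());
}

// Length stored as a 2-byte binary integer, low byte first (assumed order).
std::vector<unsigned char> binaryLength(std::uint16_t n) {
    return {static_cast<unsigned char>(n & 0xFF), static_cast<unsigned char>(n >> 8)};
}

// Formats bytes in hex, one "xx " group per byte, like a simple dump tool.
std::string hexDump(const std::vector<unsigned char>& bytes) {
    std::string out;
    char buf[4];
    for (unsigned char b : bytes) {
        std::snprintf(buf, sizeof buf, "%02x ", static_cast<unsigned>(b));
        out += buf;
    }
    return out;
}
```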
Using Classes to Manipulate Buffers
• The design of a buffer class mainly depends on whether the records are:
– Fixed length
– Variable length
• It also depends on:
– The delimiter used
Buffers-I
• Class with delimiter:
[Figure: delimited buffer class]
Buffers-I
• Pack function of the delimited buffer:
• In practice, the data is packed as follows:
Buffers-I
• Unpack function (fields):
– NextByte marks the next field to be read, hence NextByte must be initialized before unpacking starts.
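A simplified, hypothetical version of a delimited-field buffer class, showing how pack appends a field plus delimiter and how unpack uses a `nextByte_` position (the role NextByte plays above). It is not the textbook's exact class.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical simplified delimited-field buffer.
class DelimBuffer {
public:
    explicit DelimBuffer(char delim = '|') : delim_(delim) {}

    // Empties the buffer and resets the read position.
    void clear() { buffer_.clear(); nextByte_ = 0; }

    // Pack: append the field value followed by the delimiter.
    void pack(const std::string& field) {
        buffer_ += field;
        buffer_ += delim_;
    }

    // Unpack: copy bytes from nextByte_ up to the next delimiter, then move
    // nextByte_ past it. Returns false when no complete field remains.
    bool unpack(std::string& field) {
        if (nextByte_ >= buffer_.size()) return false;
        std::size_t end = buffer_.find(delim_, nextByte_);
        if (end == std::string::npos) return false;
        field = buffer_.substr(nextByte_, end - nextByte_);
        nextByte_ = end + 1;
        return true;
    }

    const std::string& contents() const { return buffer_; }

private:
    char delim_;
    std::string buffer_;
    std::size_t nextByte_ = 0;  // start of the next field to be unpacked
};
```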
Buffers-II
• For fixed-length buffers:
Buffers-II
• There is an initialize function that initializes the fields (their number and sizes) of the buffer.
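A hypothetical fixed-length field buffer: `init` records the number and sizes of the fields, `pack` pads or truncates each value to its field size, and `unpack` locates a field purely by its position. The class and method names are illustrative assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

class FixedFieldBuffer {
public:
    // Initializes the field definitions and empties the buffer.
    void init(const std::vector<std::size_t>& fieldSizes) {
        sizes_ = fieldSizes;
        buffer_.clear();
        nextField_ = 0;
    }

    // Packs the next field, truncated or blank-padded to its fixed size.
    void pack(const std::string& value) {
        if (nextField_ >= sizes_.size()) throw std::out_of_range("all fields packed");
        std::string f = value.substr(0, sizes_[nextField_]);
        f.resize(sizes_[nextField_], ' ');
        buffer_ += f;
        ++nextField_;
    }

    // Unpacks field i by its fixed position, stripping trailing blanks.
    std::string unpack(std::size_t i) const {
        std::size_t pos = 0;
        for (std::size_t k = 0; k < i; ++k) pos += sizes_.at(k);
        std::string f = buffer_.substr(pos, sizes_.at(i));
        f.erase(f.find_last_not_of(' ') + 1);
        return f;
    }

    const std::string& contents() const { return buffer_; }

private:
    std::vector<std::size_t> sizes_;
    std::string buffer_;
    std::size_t nextField_ = 0;
};
```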
Using Inheritance for Record Buffer Classes
• Here we use inheritance to remove duplicated code when the same procedures are used by several classes.
• We have seen the classes:
– fstream, istream, ostream
– fstream inherits its input/output operations from the parent class iostream,
– which in turn inherits from both istream and ostream.
Buffer Classes
• Multiple inheritance is used: a class has more than one base class.
• Virtual inheritance ensures that the class ios is included only once in the hierarchy.
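A minimal sketch of why `virtual` matters in such a diamond-shaped hierarchy. The class names are hypothetical stand-ins for ios, istream, ostream, and iostream: with virtual inheritance the shared base is constructed once; without it, the most derived class contains two copies of the base.

```cpp
#include <cassert>

// Counts how many times the shared base is constructed.
int baseConstructions = 0;

struct Base {                       // plays the role of ios
    Base() { ++baseConstructions; }
    int state = 0;
};
struct In : virtual Base {};        // like istream
struct Out : virtual Base {};       // like ostream
struct InOut : In, Out {};          // like iostream: two base classes

// Same shape without virtual inheritance: each path brings its own Base.
struct In2 : Base {};
struct Out2 : Base {};
struct InOut2 : In2, Out2 {};
```

With the virtual version, `InOut::state` is unambiguous because only one `Base` subobject exists; in `InOut2` the same access would be ambiguous.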
Buffer Classes
• 2 main classes:
– iostream (basic stream operations)
– fstreambase (to access the OS file operations)
Buffer Classes
• Class hierarchy for record buffer objects
Buffer Classes
• IOBuffer is the base class.
• Protected members can be used only by the class itself and by its inherited (derived) classes.
Buffer Classes
• All methods are declared virtual: this allows each subclass to provide its own implementation.
• = 0 (pure virtual function):
– IOBuffer does not include an implementation of such a method.
– IOBuffer is therefore an abstract class: no objects of it can be created.
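A minimal sketch of a pure virtual base class in the spirit of IOBuffer (all names here are hypothetical): `= 0` makes the base abstract, a subclass supplies the implementations, and a call made through a base-class reference runs the subclass version.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Abstract base: "= 0" declares pure virtual functions.
class IOBufferSketch {
public:
    virtual ~IOBufferSketch() = default;
    virtual void pack(const std::string& field) = 0;
    virtual std::string name() const = 0;
protected:
    std::string buffer_;  // accessible to derived classes only
};

// Concrete subclass: implements every pure virtual function.
class DelimFieldBufferSketch : public IOBufferSketch {
public:
    void pack(const std::string& field) override { buffer_ += field + "|"; }
    std::string name() const override { return "DelimFieldBuffer"; }
    const std::string& contents() const { return buffer_; }
};

// Called through a base-class reference: the derived version runs.
std::string packVia(IOBufferSketch& buf, const std::string& field) {
    buf.pack(field);
    return buf.name();
}

static_assert(std::is_abstract<IOBufferSketch>::value,
              "no objects of an abstract class can be created");
```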
Buffer Classes
• Write function of the variable-length buffer class:
• tellp(): returns the current position in the output sequence.
• Write returns the address at which the record was written.
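A hypothetical sketch of such a Write function: it saves the position returned by `tellp()` before writing a length-prefixed record, and returns that address (or -1 if the write failed) so the caller can later seek directly to the record.

```cpp
#include <cassert>
#include <ios>
#include <ostream>
#include <sstream>
#include <string>

// Writes "<length> <record>" and returns the byte address where it starts.
std::streamoff writeRecord(std::ostream& out, const std::string& record) {
    std::streamoff addr = out.tellp();  // position before writing
    out << record.size() << ' ';
    out.write(record.data(), static_cast<std::streamsize>(record.size()));
    return out ? addr : -1;
}
```

The first record starts at address 0; since `10 Ames|Mary|` is 13 bytes long, the next record starts at address 13.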
Buffer Classes
• Here we are checking which function is called.
• Because the function is virtual, the DelimFieldBuffer version is called.
Assignment-1
• Explain with a program how data is packed and unpacked with fixed-length records.
• Explain with a program how data is packed and unpacked with variable-length records.
Record Access: Keys
• When looking for an individual record, it is convenient to identify the record with a key based on the record’s content (e.g., the Ames record).
• To retrieve a record using a key, the key should satisfy the following constraints:
– Canonical form (rules to define a key)
– It uniquely identifies a record
Record Access: Keys
• Rules:
– E.g., if key = AMES,
• then the data can be entered as Ames / AMES / ames.
• We should design a rule so that, whatever is input:
– It is converted to all capitals.
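A minimal sketch of the canonical-form rule described above, assuming the only rule is conversion to capitals.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Converts any spelling of a key to its canonical all-capitals form.
std::string canonicalKey(std::string key) {
    std::transform(key.begin(), key.end(), key.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
    return key;
}
```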
Record Access: Keys
• Unique key:
– A problem arises if there are many records with the same key, e.g., AMES.
• To prevent this:
– Define a primary key,
– which is unique to a record.
• We can also create a secondary key in support of the primary key.
Record Access: Keys
• When we choose a primary key, we should be careful if it contains real data:
• The key should be unchangeable, but real data may change.
• To avoid this problem, we should not choose the data of a record as the key (discussed later).
Record Access:
Using Sequential Search
• Evaluating Performance of Sequential Search
• Improving Sequential Search Performance with Record Blocking
• When is Sequential Search Useful?
Record Access:
Evaluating Performance of Sequential Search:
– Best case: 1 record read
– Average case: n/2 records read
– Worst case: n records read
Sequential search steps:
– One read call for each record
– Each read may require a seek to reach the record.
– E.g., 10 records => 10 read calls => 10 seeks
– Seeking takes much more time than reading.
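A sketch that counts the records read by a sequential search, matching the best case (1), the worst case (n), and an unsuccessful search (also n). The key list in the usage is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Examines records one at a time from the start. Returns the number of
// records read; `found` receives the RRN of the match, or -1 if absent.
int sequentialSearch(const std::vector<std::string>& keys, const std::string& key, int& found) {
    int reads = 0;
    found = -1;
    for (std::size_t i = 0; i < keys.size(); ++i) {
        ++reads;  // one read (and possibly one seek) per record
        if (keys[i] == key) {
            found = static_cast<int>(i);
            break;
        }
    }
    return reads;
}
```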
Record Access:
Improving Sequential Search Performance with Record Blocking:
• If we have 100 records => 100 read calls
• Hence group the records into blocks:
– E.g., 1 block => 10 records
– Then 100 records => 10 blocks => 10 read calls
– The block size is usually chosen to match the sector size.
– E.g., if 1 sector => 512 bytes => 10 records
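A sketch of the arithmetic and of a blocked scan, assuming b records per block: the number of read calls drops to ⌈n/b⌉, while the number of key comparisons stays the same. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Read calls needed to scan n records when each call transfers one block of
// b records (the blocking factor); b = 1 means no blocking.
long readCalls(long n, long b) { return (n + b - 1) / b; }

// Scans records block by block, counting block reads and key comparisons.
void blockedSearch(const std::vector<std::string>& recs, const std::string& key,
                   std::size_t b, long& blockReads, long& comparisons) {
    blockReads = comparisons = 0;
    for (std::size_t start = 0; start < recs.size(); start += b) {
        ++blockReads;  // one read call brings in a whole block
        for (std::size_t i = start; i < start + b && i < recs.size(); ++i) {
            ++comparisons;  // still one comparison per record
            if (recs[i] == key) return;
        }
    }
}
```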
Record Access:
Points of record blocking:
– Searching is still O(n), as the number of records is the same.
– Seek time is reduced.
– The amount of data transferred is larger,
• even if we need to access only the first record of a block.
– Transferring a whole block in that case can be too expensive.
Record Access:
When is Sequential Search Good?
– It is extremely easy to program.
– It requires only simple file structures.
Its performance mainly depends on:
• Processor speed
Mainly used for:
• Tapes
• Files with a small number of records
Record Access:
UNIX tools for sequential processing
File structure in UNIX:
• ASCII file: newline character => record delimiter; white space => field delimiter
• UNIX provides a rich set of tools, which process files sequentially:
• cat myfile: displays the contents of myfile
• wc (word count): counts the number of lines, words, and characters
– e.g., for myfile: 2 12 76
Record Access:
UNIX tools for sequential processing
• grep (global regular expression print): searches for a pattern
– grep Ada myfile: displays the lines that contain Ada
– grep Ada myfile | wc
• 1 6 36
Direct Access
• How do we know where the beginning of the required record is?
– It may be in an index (discussed in a different unit)
– We know the relative record number (RRN)
– RRN: position of a record relative to the beginning of the file
– E.g., first record => RRN 0, next record => RRN 1, and so on
Direct Access
• RRNs are not useful when working with variable-length records: the access is still sequential.
• In order to work with RRNs, we need to work with fixed-length records.
– If records are of fixed length:
• Using the RRN we can calculate the byte offset:
• ByteOffset = r * n, where r => RRN of the record, n => no. of bytes per record
– If the fixed length is 512 bytes and RRN = 500, what is the byte offset?
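A sketch of direct access by RRN, assuming fixed-length records held in a string stream that stands in for the file. The offset is RRN × record length; for the slide's example, 500 × 512 = 256,000 bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Byte offset of a fixed-length record: RRN r times record length n.
long long byteOffset(long long rrn, long long recordLength) { return rrn * recordLength; }

// Reads the record with the given RRN directly, using seekg instead of a scan.
std::string readByRRN(std::istream& file, long long rrn, long long recordLength) {
    std::string rec(static_cast<std::size_t>(recordLength), '\0');
    file.clear();                                   // reset any earlier EOF state
    file.seekg(byteOffset(rrn, recordLength));
    file.read(&rec[0], recordLength);
    return file ? rec : std::string();
}
```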
Record Structure
• Choosing a Record Structure and Record Length
• Header Records
• Adding Headers to C++ Buffer Classes
Record Structure
Choosing a Record Structure and Record Length:
• To use RRNs for direct access:
– First we should fix the record length.
• Two ways to do this:
– Fixed-length fields (the size of each field is fixed)
– Fixed record length (only the size of the whole record is fixed)
Record Structure
1. Fixed-length field approach:
• Simplicity
2. Fixed record length:
• More efficient: fields can vary in length, and any unused space is left at the end of the record.
In both methods, one thing must be identified:
– How to differentiate between real data and unused space in the record.
– This can be done using:
• A record length indicator
• A delimiter
• A count of the fields
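A sketch of one way to tell real data from unused space in a fixed-length record, assuming 32-byte records whose first two bytes hold the data length as ASCII digits; this layout is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Assumed layout: every record occupies exactly kRecordSize bytes.
const std::size_t kRecordSize = 32;

// Stores a 2-digit length, the data, then unused space up to kRecordSize.
std::string packFixedRecord(const std::string& data) {
    if (data.size() > kRecordSize - 2) throw std::length_error("record too long");
    std::string rec;
    rec += static_cast<char>('0' + data.size() / 10);
    rec += static_cast<char>('0' + data.size() % 10);
    rec += data;
    rec.resize(kRecordSize, '\0');  // unused space at the end of the record
    return rec;
}

// Uses the length indicator to separate real data from unused space.
std::string unpackFixedRecord(const std::string& rec) {
    std::size_t len = static_cast<std::size_t>((rec[0] - '0') * 10 + (rec[1] - '0'));
    return rec.substr(2, len);
}
```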
Record Structure
Header records:
• A header record holds general information about a file.
• It is placed at the beginning of the file.
• Information in the header record:
– Count of the number of records
– Length of the data records
– Date and time the file was last updated
– Name of the file
Record Structure
• The header record makes the file a self-describing object.
• Any program that accesses the file will know about:
– The file structures used in the file
– This helps in accessing a record.
– E.g., a header record:
Record Structure
• An example of a header record:
[Figure: example header record]
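A sketch of writing and reading a simple header record, assuming a hypothetical text layout `HDR <record count> <record length>` followed by a newline at the very start of the file.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical header contents.
struct Header {
    int recordCount = 0;
    int recordLength = 0;
};

// Written once, before any data records.
void writeHeader(std::ostream& out, const Header& h) {
    out << "HDR " << h.recordCount << ' ' << h.recordLength << '\n';
}

// Returns false if the file does not start with a valid header.
bool readHeader(std::istream& in, Header& h) {
    std::string tag;
    if (!(in >> tag >> h.recordCount >> h.recordLength) || tag != "HDR") return false;
    in.get();  // consume the newline that ends the header
    return true;
}
```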
Encapsulating Record I/O Operations in a Single Class
• Till now, we have done a read/write operation in two steps:
– Read/write the object to/from a buffer
– Then transfer the buffer to/from the file
• Here we will use a class that hides the buffer.
• It looks as though we read/write records directly with a file.
• RecordFile is a class that inherits from BufferFile.
• BufferFile contains the functions to read/write a buffer from/to a file.
• RecordFile uses only these functions.
• This shows how the read/write functions of BufferFile are used to perform our task of reading/writing records.
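A hypothetical sketch of the idea: a `RecordFile` template whose buffer is private, so callers write and read objects while the class performs the object-to-buffer and buffer-to-file steps internally. It does not reproduce the textbook's RecordFile/BufferFile interfaces; the `Person` type and the length-prefixed record format are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <ios>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical record type that packs/unpacks itself to a '|'-delimited buffer.
struct Person {
    std::string last, first;
    void pack(std::string& buf) const { buf = last + "|" + first + "|"; }
    void unpack(const std::string& buf) {
        std::istringstream in(buf);
        std::getline(in, last, '|');
        std::getline(in, first, '|');
    }
};

template <class RecType>
class RecordFile {
public:
    explicit RecordFile(std::iostream& file) : file_(file) {}

    // Object -> buffer -> file; returns the record's starting address.
    std::streamoff write(const RecType& rec) {
        rec.pack(buffer_);
        std::streamoff addr = file_.tellp();
        file_ << buffer_.size() << ' ' << buffer_;
        return addr;
    }

    // File -> buffer -> object; returns false at end of file.
    bool read(RecType& rec) {
        std::size_t len;
        if (!(file_ >> len)) return false;
        file_.get();  // skip the blank after the length
        buffer_.assign(len, '\0');
        if (!file_.read(&buffer_[0], static_cast<std::streamsize>(len))) return false;
        rec.unpack(buffer_);
        return true;
    }

private:
    std::iostream& file_;
    std::string buffer_;  // hidden from the caller
};
```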
File Access and File Organization: A Summary
• File organization depends on:
– What use you want to make of the file
• Since using a file implies:
– File access
– File organization
– Both are linked.
File Access and File Organization: A Summary
• Example:
– Fixed-length records make direct access easier.
– If the documents have variable lengths, fixed-length records are not a good solution.
– The application determines our choice of both access and organization.
– Hence we need to determine both the access and the organization of a file together.
