
UNIT-IV: File I/O and File Organizations

File I/O: Introduction, C++ streams, C++ streams classes, File Stream classes, file operations,
finding end of file, File opening modes, File Organization, working with Text Files and Binary
Files, Random Access of a Record in a file.

File Organizations: Sequential, indexed sequential, direct, inverted, multi-list, directory systems,
Indexing using B-tree and B+ tree

C++ STREAMS

 In C++, input and output are handled using streams.
 A stream is a sequence of bytes flowing into or out of a program.
 An input stream is a stream in which data bytes flow from an input
device (such as a keyboard, file, network or another program) into the program.
 An output stream is a stream in which data bytes flow from the program
to an output device (such as a console, file, network or another program).
 Streams act as intermediaries between the program and the actual I/O devices.
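Because a stream hides the actual device, the same code can write to the console, a file or an in-memory string. The following is a minimal sketch of that idea; the function names writeGreeting and readWord are hypothetical and chosen only for illustration.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Writes one line of text to any output stream (console, file or string).
void writeGreeting(std::ostream& out, const std::string& name)
{
    out << "Hello, " << name << '\n';
}

// Reads one whitespace-delimited word from any input stream.
std::string readWord(std::istream& in)
{
    std::string word;
    in >> word;
    return word;
}
```

Calling writeGreeting(std::cout, "rani") prints to the console, while passing an std::ofstream or std::ostringstream object sends the same bytes to a file or a string.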

C++ STREAMS CLASSES

 The C++ I/O system contains a hierarchy of classes that are used to define streams to
deal with both the console and disk files.
 The console stream classes are made available through the <iostream> header
(iostream.h in pre-standard compilers such as Turbo C++).
 Stream classes are classified into two categories
1. Console Stream Classes
2. File Stream Classes
CONSOLE STREAM CLASSES
 They are used to handle data from console.
 They are declared in the <iostream> header file (iostream.h in pre-standard compilers).
 Types of Console Stream Classes:
1. ios
2. istream
3. ostream
4. iostream
5. streambuf

ios class
 This class is the base class for all stream classes.
 The streams can be input or output streams.
 This class defines members that are independent of how the templates of the class are
defined.
istream class
 The istream class handles the input stream in the C++ programming language.
 These input stream objects are used to read and interpret the input as a sequence of
characters.
 cin is an object of class istream that handles the standard input stream.
ostream class
 The ostream class handles the output stream in the C++ programming language.
 These output stream objects are used to write data as a sequence of characters on the
screen.
 cout is an object of class ostream that handles the standard output stream.
iostream class
 The iostream class handles both input and output streams in the C++ programming language.
 The iostream class inherits the properties of the istream and ostream classes.
 These input/output stream objects are used to write data as a sequence of characters on
the screen or to read and interpret the input as a sequence of characters.

streambuf class 
 The streambuf class is used to create a stream buffer.
 Stream buffer is an object in charge of performing the reading and writing operations of
the stream object it is associated with.
 The stream delegates all such operations to its associated stream buffer object, which is
an intermediary between the stream and its controlled input and output sequences.
 The streambuf class is an elaborated base class designed to provide a uniform public
interface for all derived classes

FILE STREAM CLASSES

 A file is a storage medium for storing data or information.
 Files store data (text or binary) permanently. The data is read or written through
input/output operations that transfer bytes of data.
 File stream classes are used to handle data from files.
 They are declared in the <fstream> header file (fstream.h in pre-standard compilers).
 Types of File Stream Classes:
1. ifstream
2. ofstream
3. fstream
4. filebuf
ifstream class
 ifstream class is used to read data from files.
 ifstream class objects are used to open a file for reading.
 A file can be opened using an ifstream object only if the file exists. Otherwise the open
fails and the stream is put into an error state, which can be tested with if(!fin).
 It represents Input File Stream and is used for reading from files.
ofstream class
 ofstream class is used to create and write data into the files.
 ofstream class objects are used to open a file for writing.
 A file is opened using ofstream objects.If the file exists the file is opened for writing
otherwise a new file is created and opened for writing.
 It represents Output File Stream and this is used for writing data into files.

fstream class
 fstream class is used to create files and to read and write data in files.
 fstream class is derived from the iostream class, so it supports both input and output.
 It represents both File Output Stream and File Input Stream. So it can read from files and
write to files.
filebuf class
 The filebuf class is used to create a file stream buffer object.
 A file stream buffer is used to read and write to files.
 These objects are associated to a file by calling member open. Once open, all input/output
operations performed on the object are reflected in the associated file.
 Objects of this class may internally maintain an intermediate input buffer and/or
an intermediate output buffer, where individual characters are read or written by i/o
operations.

FILE OPERATIONS

Operations performed on a file are


 Creating a file object
 Opening a file
 Reading data from a file
 Writing data into a file
 Checking the end of the file
 Closing a file
Creating a file object
 file objects are used to perform any operations on files
 File objects are created using the following Syntax
filestreamclass fileobjectname;
example:-1
ifstream fin;
example:-2
ofstream fout;
example:-3
fstream finout;
Note
 fin is a file object used for opening a file for reading only
 fout is a file object used for opening a file for writing
 finout is file object used for opening a file for reading and writing

Opening a file
 In order to perform any operation on a file we need to open the file.
 A file can be opened in two ways
 using open() function
 during the creation of file object
 using open() function
syntax:-
filestreamobject.open(filename, filemode);

example:-1(text file)
fin.open("file1.txt", ios::in);
example:-2(Binary file)
fin.open("file1.txt", ios::in | ios::binary);

 during the creation of file object


syntax:-
filestreamclass fileobjectname(filename, filemode);
example:-1(text file)
ifstream fin("file1.txt", ios::in);
example:-2(binary file)
ifstream fin("file1.txt", ios::in | ios::binary);
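The two ways of opening a file can be sketched as follows. This is a minimal illustration, not a complete program: the function names createFile and canOpenForReading are hypothetical, and the file name is whatever the caller passes in. is_open() reports whether the open succeeded.

```cpp
#include <fstream>
#include <string>

// Creates (or overwrites) a file by opening it during object creation.
bool createFile(const std::string& filename)
{
    std::ofstream fout(filename, std::ios::out);
    if (!fout.is_open())
        return false;
    fout << "sample line\n";
    return true;                        // fout is closed by its destructor
}

// Opens a file for reading by calling open() after object creation.
bool canOpenForReading(const std::string& filename)
{
    std::ifstream fin;
    fin.open(filename, std::ios::in);
    return fin.is_open();               // false if the file does not exist
}
```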
Reading data from a file
 A file can be read in three ways
 Using getline() function
 Using read() function
 Using filestreamobject
 Using getline() function
 syntax:-
getline(filestreamobject, variable);
example:-
getline(fin, strData);
 Using read() function
syntax:-
filestreamobject.read((char*)& variable, sizeof(variable));
example:-
fin.read((char*)& x, sizeof(x));
 Using filestreamobject
syntax:-
filestreamobject>>variable;
example:-
fin>>x;
Writing data into a file
 A file can be written in three ways
 Using put() function
 Using write() function
 Using filestreamobject
 Using put() function
 syntax:-
filestreamobject.put(character);
example:-
fout.put(ch);
 Using write() function
syntax:-
filestreamobject.write((char*)& variable, sizeof(variable));
example:-
fout.write((char*)& x, sizeof(x));
 Using filestreamobject
syntax:-
filestreamobject<<variable;
example:-
fout<<x;
Closing a file
 A file can be explicitly closed using close() function
syntax:-
filestreamobject.close();
example:-
fin.close();
Checking the end of the file
 A file can be checked for end of file using eof() function
syntax:-
filestreamobject.eof()
example:-
fin.eof()
 Note : The eof() function returns true only after a read operation has tried to read past
the end of the file, so it should be tested after a read, not before it.
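A common way to read until the end of a file is to test the stream itself after each read instead of calling eof() first. The sketch below assumes a text file with one item per line; readAllLines is a hypothetical helper name.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Reads every line of a text file. The loop condition tests the stream
// state after each getline(), so the loop stops as soon as a read fails.
std::vector<std::string> readAllLines(const std::string& filename)
{
    std::vector<std::string> lines;
    std::ifstream fin(filename);
    std::string line;
    while (std::getline(fin, line))
        lines.push_back(line);
    // If the file was opened, fin.eof() is true at this point because
    // the last getline() tried to read past the end of the file.
    return lines;
}
```

If the file cannot be opened, the first getline() fails immediately and the function returns an empty list.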

FILE OPENING MODES

 The file mode tells the stream how the file is to be opened.
 It supplies information regarding the operations that can be performed on the file.
 A file can be opened in different file modes.
Syntax:-
filestreamobject.open(filename,filemode);
example:-1
fin.open("input.txt",ios::in);
example:-2
fout.open("output.txt",ios::out);
example:-3
fout.open("output.txt",ios::binary);
example:-4
fout.open("output.txt",ios::app); // ios::nocreate is available only in pre-standard compilers
 A file can be opened using different file modes simultaneously
example:-5
fout.open("output.txt",ios::out | ios::trunc);
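The effect of combining modes can be seen by writing to the same file twice. The sketch below is only an illustration: writeWithMode and readWholeFile are hypothetical helper names. With ios::app the second write is added at the end, while with ios::trunc the old content is discarded first.

```cpp
#include <fstream>
#include <string>

// Writes text to a file that is opened with the given mode.
void writeWithMode(const std::string& filename, const std::string& text,
                   std::ios::openmode mode)
{
    std::ofstream fout(filename, mode);
    fout << text;
}

// Returns the whole content of a text file as one string.
std::string readWholeFile(const std::string& filename)
{
    std::ifstream fin(filename);
    std::string content;
    char ch;
    while (fin.get(ch))
        content += ch;
    return content;
}
```

For example, writing "one\n" with ios::out | ios::trunc and then "two\n" with ios::out | ios::app leaves both lines in the file.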
WORKING WITH TEXT FILES AND BINARY FILES

Text Files
 Text files are a special subset of binary files that store human-readable
characters, either as a plain text document or as a rich text document.
 Text files store data as sequential bytes, and the bits in a text file represent characters.
 Text files are less prone to corruption, because any undesired change simply shows up
when the file is opened and can then easily be removed.
 Text files can be classified as plain text files and rich text files.
 Because of their simple and standard format, text files are one of the most used
file formats for storing textual data and are supported by many applications.
Binary File
 Binary files can store multiple types of data (images, audio, text, etc.) in a
single file.
 Binary files store data as a sequence of bytes, typically handled in groups of
eight bits or sometimes sixteen bits.
 When data is stored in a file in binary format, reading and writing is faster
because no time is lost in converting the data from one format to another.
 A small change in a binary file can corrupt the file and make it unreadable to the
supporting application.
 Common examples of binary files are image files such as .PNG or .JPG.
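The difference between the two formats can be illustrated by storing the same integer both ways. This is a sketch under assumptions: the helper names saveAsText, saveAsBinary and fileSize are hypothetical, and the size of the binary file depends on sizeof(int) on the machine used.

```cpp
#include <fstream>
#include <string>

// Stores the number as human-readable digits, e.g. "1234567890".
void saveAsText(const std::string& filename, int value)
{
    std::ofstream fout(filename);
    fout << value;
}

// Stores the raw bytes of the int exactly as they are held in memory.
void saveAsBinary(const std::string& filename, int value)
{
    std::ofstream fout(filename, std::ios::out | std::ios::binary);
    fout.write(reinterpret_cast<const char*>(&value), sizeof(value));
}

// Returns the size of a file in bytes, or -1 if it cannot be opened.
long fileSize(const std::string& filename)
{
    std::ifstream fin(filename, std::ios::in | std::ios::binary | std::ios::ate);
    if (!fin)
        return -1;
    return static_cast<long>(fin.tellg());
}
```

The text file needs one byte per digit, whereas the binary file always needs sizeof(int) bytes, whatever the value is.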

Program to read and write data from a text file

#include <iostream>
#include <fstream>
#include <cstdlib>
using namespace std;
//class employee declaration
class Employee
{
public :
int empID;
char empName[100];
char designation[100];
int ddj,mmj,yyj;
int ddb,mmb,yyb;

//function to read employee details from the keyboard
void readEmployee()
{
cout<<"EMPLOYEE DETAILS"<<endl;
cout<<"ENTER EMPLOYEE ID : " ;
cin>>empID;
cout<<"ENTER NAME OF THE EMPLOYEE : ";
cin>>empName;
cout<<"ENTER DESIGNATION : ";
cin>>designation;
}
//function to display employee details
void displayEmployee()
{
cout<<"EMPLOYEE ID: "<<empID<<endl;
cout<<"EMPLOYEE NAME: "<<empName<<endl;
cout<<"DESIGNATION: "<<designation<<endl;
}
};

int main()
{
Employee emp;
emp.readEmployee();
fstream file;
//open the file for writing
file.open("empnew1.txt",ios::out);
if(!file)
{
cout<<"Error in creating file...\n";
exit(1);
}
//write each field on its own line
file<<emp.empID<<endl;
file<<emp.empName<<endl;
file<<emp.designation<<endl;
file.close();
cout<<"Data saved into the file.\n";
//reopen the same file for reading
file.open("empnew1.txt",ios::in);
if(!file)
{
cout<<"Error in opening file...\n";
exit(1);
}

//test the extraction itself, because eof() becomes true only after a read fails
if(file>>emp.empID>>emp.empName>>emp.designation)
{
cout<<endl<<endl;
cout<<"Data extracted from file..\n";
//print the object
emp.displayEmployee();
}
else
{
cout<<"Error in reading data from file...\n";
exit(1);
}

file.close();
return 0;
}

Output

EMPLOYEE DETAILS
ENTER EMPLOYEE ID : 90
ENTER NAME OF THE EMPLOYEE : rani
ENTER DESIGNATION : nurse
Data saved into the file.


Data extracted from file..
EMPLOYEE ID: 90
EMPLOYEE NAME: rani
DESIGNATION: nurse

Program to read and write data from a binary file

#include <iostream>
#include <fstream>
#include <cstdlib>
using namespace std;
//class employee declaration
class Employee
{
private :
int empID;
char empName[100];
char designation[100];
int ddj,mmj,yyj;
int ddb,mmb,yyb;
public :
//function to read employee details from the keyboard
void readEmployee()
{
cout<<"EMPLOYEE DETAILS"<<endl;
cout<<"ENTER EMPLOYEE ID : " ;
cin>>empID;
cout<<"ENTER NAME OF THE EMPLOYEE : ";
cin>>empName;
cout<<"ENTER DESIGNATION : ";
cin>>designation;
}
//function to display employee details
void displayEmployee()
{
cout<<"EMPLOYEE ID: "<<empID<<endl;
cout<<"EMPLOYEE NAME: "<<empName<<endl;
cout<<"DESIGNATION: "<<designation<<endl;
}
};

int main()
{
//value-initialise the object so that the unused fields are zero
Employee emp = Employee();
emp.readEmployee();
fstream file;
//open the file for writing in binary mode
file.open("emp.dat",ios::out|ios::binary);
if(!file)
{
cout<<"Error in creating file...\n";
exit(1);
}

//write the whole object as one block of bytes
file.write((char*)&emp,sizeof(emp));
file.close();
cout<<"Data saved into the file.\n";
//reopen the same file for reading in binary mode
file.open("emp.dat",ios::in|ios::binary);
if(!file)
{
cout<<"Error in opening file...\n";
exit(1);
}

//read the whole object back as one block of bytes
if(file.read((char*)&emp,sizeof(emp)))
{
cout<<endl<<endl;
cout<<"Data extracted from file..\n";
//print the object
emp.displayEmployee();
}
else
{
cout<<"Error in reading data from file...\n";
return -1;
}

file.close();
return 0;
}

Output
EMPLOYEE DETAILS
ENTER EMPLOYEE ID : 90
ENTER NAME OF THE EMPLOYEE : rani
ENTER DESIGNATION : nurse
Data saved into the file.


Data extracted from file..
EMPLOYEE ID: 90
EMPLOYEE NAME: rani
DESIGNATION: nurse

RANDOM ACCESS OF A RECORD IN A FILE.

File pointer
 Each file stream class contains a file pointer that is used to keep track of the current
read/write position within the file.
 By default, when opening a file for reading or writing, the file pointer is set to the
beginning of the file.
 In random file access the file pointer can be set to any desired position and the data can
be read from that position
 Random file access is performed using seekg() , seekp(), tellg() and tellp() functions

tellg()
 The tellg() function is used with input streams and returns the current “get” position of
the pointer in the stream.
 It has no parameters and returns a value of the member type pos_type, which represents
the current position of the get pointer and can be converted to an integer.
 Syntax:-
tellg();
 Returns: the current position of the get pointer on success, -1 on failure.

tellp()
 The tellp() function is used with output streams and returns the current “put” position of
the pointer in the stream.
 It has no parameters and returns a value of the member type pos_type, which represents
the current position of the put pointer and can be converted to an integer.
 Syntax:-
tellp();
 Returns: the current position of the put pointer on success, -1 on failure.

seekg()
 seekg() is a member function of the istream class (part of the standard library) that allows
you to seek to an arbitrary position in a file.
 It sets the position of the next character to be extracted from the input stream of a
given file.
 Syntax: there are two forms of seekg() in file handling:
istream& seekg(streampos position);
istream& seekg(streamoff offset, ios_base::seekdir dir);

Description:

position : the new absolute position in the stream buffer.
offset : an integer value of type streamoff representing the offset in the
stream’s buffer. It is relative to the dir parameter.
dir : the seeking direction. It is a value of type ios_base::seekdir that
can take any of the following constant values.

There are 3 directions from which the offset can be measured:

ios_base::beg (offset from the beginning of the stream’s buffer).
ios_base::cur (offset from the current position in the stream’s buffer).
ios_base::end (offset from the end of the stream’s buffer).
Examples:-
fin.seekg(14, ios::cur); // move forward 14 bytes

fin.seekg(-18, ios::cur); // move backwards 18 bytes

seekp()
 The seekp() method of ostream sets the position of the put pointer in the output
sequence.
 In its first form it takes the new absolute position. In its second form it takes an offset
and a seeking direction, as in seekg(). Both forms return the ostream itself.
 Syntax:
ostream& seekp(streampos pos);
ostream& seekp(streamoff offset, ios_base::seekdir dir);
 Parameters: the new position, or an offset together with the direction it is measured from.
 Return Value: this ostream instance with the put position set to the new position.

Examples:-
fout.seekp(14, ios::cur); // move forward 14 bytes
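When every record has the same size, record number n starts at byte n * sizeof(record), so seekg() and seekp() can jump straight to it. The following is a minimal sketch of that idea. It assumes a hypothetical fixed-size Record structure and a binary file opened for both reading and writing; the names Record, offsetOf, writeRecord and readRecord are illustrative only.

```cpp
#include <fstream>
#include <ios>

// Fixed-size record, so record n starts at byte n * sizeof(Record).
struct Record
{
    int  id;
    char name[20];
};

// Byte offset of the record with the given number (0-based).
std::streamoff offsetOf(int index)
{
    return static_cast<std::streamoff>(index) *
           static_cast<std::streamoff>(sizeof(Record));
}

// Writes a record at position `index` using the put pointer.
bool writeRecord(std::fstream& file, int index, const Record& rec)
{
    file.clear();                                   // reset earlier eof/fail flags
    file.seekp(offsetOf(index), std::ios::beg);
    file.write(reinterpret_cast<const char*>(&rec), sizeof(Record));
    return file.good();
}

// Reads the record at position `index` using the get pointer.
bool readRecord(std::fstream& file, int index, Record& rec)
{
    file.clear();
    file.seekg(offsetOf(index), std::ios::beg);
    file.read(reinterpret_cast<char*>(&rec), sizeof(Record));
    return file.good();
}
```

The file would be opened with ios::in | ios::out | ios::binary (adding ios::trunc to create it), after which readRecord(file, 1, rec) fetches the second record without reading the first.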

Program to random access of data from a file

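#include <iostream>
#include <fstream>
#include <string>
#include <cstdlib>
using namespace std;
int main()
{
// binary mode is used so that seek offsets count the bytes exactly as stored
ifstream fin("sample.txt", ios::in | ios::binary);
// If we couldn't open the input file stream for reading
if (!fin)
{
// Print an error and exit
cout << "Uh oh, sample.txt could not be opened for reading!" << endl;
exit(1);
}
string strData;
fin.seekg(0);              // move to the beginning of the file
getline(fin, strData);     // reads the first line
cout << strData << endl;
fin.seekg(0, ios::cur);    // stay at the current position
getline(fin, strData);     // reads the second line
cout << strData << endl;
fin.seekg(-10, ios::end);  // move 10 bytes back from the end of the file
getline(fin, strData);     // reads the line that starts at that position
cout << strData << endl;
return 0;
}
sample.txt (every line, including the last, ends with a single newline character)
kiran
jyothi
ravi
raju
mani

Output

kiran
jyothi
raju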

TYPES OF FILE ORGANIZATION

 A file is a group of records. A file contains records, records contain fields, fields contain
data items, and data items contain characters (alphabets, digits, special characters, etc.).
Each character occupies one byte of storage.
 The technique used to represent and store the records in a file is known as file organization.
 A record in a file is searched for using a single key or multiple keys.
 File organizations are classified into different types based on whether they use a single key
or multiple keys.
Different types of file organizations are
1. Sequential File Organization
2. Indexed Sequential Access File Organization
3. Direct Access or Random Access File Organization
4. Multilist File Organization
5. Inverted List File Organization

1. Sequential access file organization


 Storing records one after another in contiguous blocks within a file on tape or disk is called
sequential access file organization.
 In sequential access file organization, all records are stored in sequential order. The records
are arranged in ascending or descending order of a key field.
 A search in a sequential file starts from the beginning of the file, and new records can be
added at the end of the file.
 In a sequential file, it is not possible to add a record in the middle of the file without
rewriting the file.
 Processing a sequential file is time consuming, because reaching a record requires reading
all the records before it.
 Random searching is not possible.

2. Indexed sequential access file organization


 Indexed sequential access file organization combines sequential file and direct access file
organization.
 In an indexed sequential access file, records are stored in order of a primary key on a direct
access device such as a magnetic disk.
 The file can have multiple keys, which may be alphanumeric. The key by which the records are
ordered is called the primary key.
 The data can be accessed either sequentially or randomly using the index. The index is stored
in a file and read into memory when the file is opened.
 Records are accessed very fast if the index table is properly organized.
 Records can be inserted in the middle of the file.
 It requires more storage space.
 It is expensive because it requires special software.

3. Direct access file organization


 Direct access file organization is also known as random access or relative file organization.
 In a direct access file, all records are stored on a direct access storage device (DASD), such
as a hard disk. The records are placed at scattered locations throughout the file.
 The records do not need to be in sequence because they are updated directly and rewritten
back in the same location.
 This file organization is useful for immediate access to large amounts of information. It is
used in accessing large databases.
 In direct addressing with equal-size records, the available disk space is divided into nodes
large enough to hold a record. The numeric value of the primary key is used to determine the
node in which a particular record is stored.
 Computing the storage location from the key in this way is also called hashing. The available
file space is divided into buckets and slots.
 Direct access files are used in online transaction processing (OLTP) systems such as an online
railway reservation system.
 The desired records are accessed immediately.
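The division of file space into buckets and slots can be sketched in memory as follows. This is only an illustration under assumptions: the class name DirectFile, the hash function key % bucketCount and the choice to reject a record when its bucket is full are all hypothetical, and a real direct access file would keep the slots on disk.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One slot holds one record; `used` marks whether the slot is occupied.
struct Slot
{
    bool used;
    int  key;
    int  value;
};

class DirectFile
{
public:
    DirectFile(int bucketCount, int slotsPerBucket)
        : bucketCount_(bucketCount),
          slotsPerBucket_(slotsPerBucket),
          slots_(static_cast<std::size_t>(bucketCount) * slotsPerBucket)
    {
        assert(bucketCount > 0 && slotsPerBucket > 0);
    }

    // Hash function: the numeric primary key decides the bucket.
    int bucketOf(int key) const
    {
        assert(key >= 0);
        return key % bucketCount_;
    }

    // Stores the record in the first free slot of its bucket.
    // Returns false on bucket overflow, which this sketch does not handle.
    bool insert(int key, int value)
    {
        int first = bucketOf(key) * slotsPerBucket_;
        for (int i = first; i < first + slotsPerBucket_; ++i)
        {
            if (!slots_[i].used)
            {
                Slot filled = {true, key, value};
                slots_[i] = filled;
                return true;
            }
        }
        return false;
    }

    // Looks only inside the bucket computed from the key.
    bool find(int key, int& value) const
    {
        int first = bucketOf(key) * slotsPerBucket_;
        for (int i = first; i < first + slotsPerBucket_; ++i)
        {
            if (slots_[i].used && slots_[i].key == key)
            {
                value = slots_[i].value;
                return true;
            }
        }
        return false;
    }

private:
    int bucketCount_;
    int slotsPerBucket_;
    std::vector<Slot> slots_;
};
```

A lookup examines at most slotsPerBucket slots, whatever the total number of records, which is why access is described as immediate.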
4. Multilist File Organisation
 The basic approach to providing the linkage between an index and the file of data records is
called multilist organisation.
 A multilist file maintains an index for each secondary key.
 The index for secondary key contains, instead of a list of primary keys related to that
secondary key, only one primary key value related to that secondary key.
 That record will be linked to other records containing the same secondary key in the data file.
 Linking records together in order of increasing primary key value makes insertion and
deletion easy once the place at which the insertion or deletion is to be made is known.
 Searching for a record with a given primary key value is difficult when no index is available,
since the only search possible is a sequential search.
 To facilitate searching on the primary key as well as on secondary keys, it is customary to
maintain several indexes, one for each key.
 Using an index in this way reduces the length of the lists and thus the search time.
 This idea is very easily generalised to allow for easy secondary key retrieval.
 We just set up indexes for each key and allow records to be in more than one list. This leads
to the multilist structure for file representation.
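A multilist can be sketched with an index that stores only the first record for each secondary key value, while every record carries a link to the next record with the same value. The sketch below is kept in memory and uses hypothetical names (EmployeeRecord, MultilistFile). For brevity it links each new record at the head of its list rather than in order of increasing primary key.

```cpp
#include <map>
#include <string>
#include <vector>

struct EmployeeRecord
{
    int         empID;                // primary key
    std::string designation;          // secondary key
    int         nextSameDesignation;  // next record with the same designation, -1 = end
};

class MultilistFile
{
public:
    // Adds a record and links it at the head of the list for its designation.
    void add(int empID, const std::string& designation)
    {
        int recordNo = static_cast<int>(records_.size());
        auto it = index_.find(designation);
        int oldHead = (it == index_.end()) ? -1 : it->second;
        records_.push_back({empID, designation, oldHead});
        index_[designation] = recordNo;
    }

    // Follows the links stored in the records themselves.
    std::vector<int> findByDesignation(const std::string& designation) const
    {
        std::vector<int> ids;
        auto it = index_.find(designation);
        int recordNo = (it == index_.end()) ? -1 : it->second;
        while (recordNo != -1)
        {
            ids.push_back(records_[recordNo].empID);
            recordNo = records_[recordNo].nextSameDesignation;
        }
        return ids;
    }

private:
    std::vector<EmployeeRecord> records_;   // the data file
    std::map<std::string, int>  index_;     // key value -> first record number
};
```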
5. Inverted files
 In inverted file organisation, a linkage is provided between an index and the file of data
records.
 A key’s inverted index contains all of the values that the key presently has in the records of
the data file.
 Each key-value entry in the inverted index points to all of the data records that have the
corresponding value.
 Inverted files represent one extreme of file organisation in which only the index structures
are important. The records themselves may be stored in any way (sequentially ordered by
primary key, random, linked ordered by primary key etc.).
 Inverted files may also result in space saving compared with other file structures when record
retrieval does not require retrieval of key fields.
 Inverted files are similar to multilists. The difference is that in multilists, records with
the same key value are linked together with the link information kept in the individual
records, whereas in inverted files this link information is kept in the index itself.
 The number of disk accesses needed is equal to the number of records being retrieved plus
the number needed to process the indexes.
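An inverted index can be sketched as a map from each key value to the list of record numbers that have that value. The class name InvertedIndex and the helper intersect are hypothetical. Queries on several keys are answered by combining the lists before any data record is read.

```cpp
#include <algorithm>
#include <iterator>
#include <map>
#include <string>
#include <vector>

class InvertedIndex
{
public:
    // Records that record number `recordNo` has the given key value.
    void add(const std::string& keyValue, int recordNo)
    {
        postings_[keyValue].push_back(recordNo);
    }

    // Returns all record numbers that have the given key value.
    std::vector<int> lookup(const std::string& keyValue) const
    {
        auto it = postings_.find(keyValue);
        return (it == postings_.end()) ? std::vector<int>() : it->second;
    }

private:
    std::map<std::string, std::vector<int>> postings_;
};

// Record numbers present in both lists (an "AND" query on two keys).
std::vector<int> intersect(std::vector<int> a, std::vector<int> b)
{
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());
    std::vector<int> common;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::back_inserter(common));
    return common;
}
```

Only the records left after the intersection have to be fetched from the data file, which matches the access count described above.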
DIRECTORY STRUCTURE(SYSTEMS)

Directory: A collection of nodes containing information about all files is called a directory.
There are 5 types of directory structure in an operating system.
1)Single Level Directory
2)Two Level Directory
3)Tree Structured Directory
4)Acyclic Graph Directory
5)General Graph Directory

1)Single Level Directory


 In single level directory all files are in the same directory.
 Since all files are in the same directory they should have unique names.
 File names are limited in length.
 Even a single user finds it difficult to remember the names of all the files as the number of
files increases.

2)Two Level Directory:


 Each user has its own User File Directory (UFD).
 When a user job starts or a user logs in, the system's Master File Directory (MFD) is searched.
The MFD is indexed by user name or account number.
 When a user refers to a particular file, only that user's own UFD is searched. Thus different
users may have files with the same name.
 The root of the tree is the Master File Directory (MFD). The structure effectively isolates
one user from another.
 Sharing of files between users is not possible.
3)Tree Structured Directory
 A directory(or Subdirectory) contains a set of files or subdirectories.
 All directories have the same internal format.
 Each process has a current directory. The current directory should contain most of the files
that are of current interest to the process.
 When a reference is made to a file, the current directory is searched.
 The user can change his current directory whenever he desires.
 If a file is not available in the current directory then the user usually must either specify a
pathname or change the current directory.

4)Acyclic Graph Directory
 An acyclic graph is a graph with no cycles.
 It allows directories to share subdirectories and files.
 With a shared file, only one actual file exists, so any changes made by one person are
immediately visible to the other.
 A problem with the acyclic graph is: how do we guarantee there are no cycles?
5)General Graph Directory:
 A general graph allows cycles.
 To avoid cycles, links can be allowed only to files and not to subdirectories.
 Alternatively, every time a new link is added, a cycle detection algorithm is used to
determine whether the link is acceptable.
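The cycle check made when a link is added can be sketched as a reachability test: a link from a parent to a child would close a cycle exactly when the parent can already be reached from the child. The class DirectoryGraph below is a hypothetical in-memory model in which directories are identified by name only.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

class DirectoryGraph
{
public:
    // Adds a link parent -> child unless it would create a cycle.
    bool addLink(const std::string& parent, const std::string& child)
    {
        if (parent == child || reaches(child, parent))
            return false;
        links_[parent].push_back(child);
        return true;
    }

private:
    // Depth-first search: can `target` be reached by following links from `from`?
    bool reaches(const std::string& from, const std::string& target) const
    {
        std::vector<std::string> pending(1, from);
        std::set<std::string> visited;
        while (!pending.empty())
        {
            std::string current = pending.back();
            pending.pop_back();
            if (current == target)
                return true;
            if (!visited.insert(current).second)
                continue;                       // already explored
            auto it = links_.find(current);
            if (it != links_.end())
                pending.insert(pending.end(), it->second.begin(), it->second.end());
        }
        return false;
    }

    std::map<std::string, std::vector<std::string>> links_;
};
```

Sharing is still possible: a second link to an existing subdirectory is accepted as long as it does not lead back to one of its own ancestors.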

INDEXING

 Indexing is a data structure technique which allows you to quickly retrieve records from a
database file.
 Indexing is a way to optimize the performance of a database by minimizing the number of
disk accesses required when a query is processed.
 The index is a type of data structure. It is used to locate and access the data in a
database table quickly.
 It is defined based on the indexing attribute or field or key.
 Indexing techniques are implemented using B Trees and B+ Trees.
B TREE
 B Tree is a specialized m-way tree that is widely used for disk access.
 Each node of a B-Tree of order m can have at most m-1 keys and m children.
 One of the main reasons for using a B tree is its ability to store a large number of keys in a
single node, which keeps the height of the tree relatively small even for large key sets.
 A B tree of order m has all the properties of an m-way tree. In addition, it has
the following properties.
 Every node in a B-Tree contains at most m children.
 Every node in a B-Tree, except the root node and the leaf nodes, contains at least ⌈m/2⌉
children.
 The root node must have at least 2 children, unless it is a leaf.
 All leaf nodes must be at the same level.
 It is not necessary that all the nodes contain the same number of children, but each internal
node other than the root must have at least ⌈m/2⌉ children.
 Insertions are done at the leaf node level. The following algorithm is followed
in order to insert an item into a B Tree.
1. Traverse the B Tree in order to find the appropriate leaf node at which the key can
be inserted.
2. If the leaf node contains fewer than m-1 keys, insert the element in increasing
order.
3. Else, if the leaf node contains m-1 keys, follow these steps.
a. Insert the new element in the increasing order of elements.
b. Split the node into two nodes at the median.
c. Push the median element up to its parent node.
d. If the parent node also contains m-1 keys, split it too by
following the same steps.
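Step 3 of the algorithm above can be sketched for a single full leaf. The function insertAndSplit below is a hypothetical helper, not a complete B Tree: it inserts the new key in order, splits at the median and returns the median that would be pushed up to the parent. When the number of keys is even, this sketch picks the lower of the two middle keys, which is an assumption, since textbooks differ on that choice.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct SplitResult
{
    std::vector<int> left;    // keys smaller than the median
    int median;               // key pushed up to the parent
    std::vector<int> right;   // keys larger than the median
};

// `keys` holds the m-1 sorted keys of a full node.
SplitResult insertAndSplit(std::vector<int> keys, int newKey)
{
    assert(std::is_sorted(keys.begin(), keys.end()));
    // Step a: insert the new element in increasing order.
    keys.insert(std::upper_bound(keys.begin(), keys.end(), newKey), newKey);
    // Step b: split the node into two nodes at the median.
    std::size_t mid = (keys.size() - 1) / 2;
    SplitResult result;
    result.left.assign(keys.begin(), keys.begin() + mid);
    // Step c: the median is the key that moves up to the parent.
    result.median = keys[mid];
    result.right.assign(keys.begin() + mid + 1, keys.end());
    return result;
}
```

For a node of order 5 holding 10, 20, 30, 40, inserting 25 gives the left node 10, 20, the median 25 and the right node 30, 40.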
B+ TREES

 B+ tree is an M-way tree with a variable but often large number of children per node.
 B+ tree is used in storing data for efficient retrieval in a block-oriented storage in  file
systems.
 A B+ tree consists of a root, internal nodes and leaves. The root may be either a leaf or
a node with two or more children.
 All leaves are at the same distance from the root.
 The leaf nodes have an entry for every value of the search field, along with a data
pointer to the record .
 The leaf nodes of the B+ tree are linked together to provide ordered access on the
search field to the records.
 Internal nodes of a B+ tree are used to guide the search. Some search field values from
the leaf nodes are repeated in the internal nodes of the B+ tree.
Structure of Internal node
 Each internal node is of the form < P1, K1, P2, K2 . . . Pn-1, Kn-1, Pn > where Ki is a
key and Pi is a tree pointer. Within each internal node, K1 < K2 < . . . < Kn-1.
 For every search field value x in the subtree pointed at by Pi, we have Ki-1 < x <= Ki
(for P1 only x <= K1 applies, and for Pn only Kn-1 < x applies).
 Each internal node has at most p tree pointers.
 Each internal node, except the root, has at least ⌈p/2⌉ tree pointers.
Structure of a leaf node

 Each leaf node is of the form <<K1, P1>, <K2, P2> . . . <Kn-1, Pn-1>, Pnext>. Within
each leaf node, K1 < K2 < . . . < Kn-1.
 Pi is a data pointer that points to the record whose search field value is Ki, and Pnext
points to the next leaf node.
 Each leaf node has at least ⌈p/2⌉ values. All leaf nodes are at the same level.

Inserting in B+ tree

 Perform a search operation in the B+ tree to find the leaf node where the new key
should go.

 If the node is not full (adding the key does not violate the B+ tree property), add the key
to this node.

 Otherwise insert the new key, split the node into two nodes and push the middle key up to
the parent node. When a leaf is split, the middle key also remains in the leaf level.

 Repeat the above steps at the parent node if it becomes full as well.
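The leaf split can be sketched as follows. Unlike a B Tree split, the separator key is copied to the parent and also stays in a leaf, because all data pointers live in the leaves. The helper insertAndSplitLeaf is hypothetical. It assumes the convention used above for internal nodes (keys less than or equal to Ki go to the left), so the separator is the largest key of the left leaf.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct LeafSplit
{
    std::vector<int> left;     // first half of the keys, including the separator
    std::vector<int> right;    // remaining keys, linked after `left`
    int separator;             // key copied up to the parent
};

// `keys` holds the sorted keys of a full leaf node.
LeafSplit insertAndSplitLeaf(std::vector<int> keys, int newKey)
{
    assert(std::is_sorted(keys.begin(), keys.end()));
    keys.insert(std::upper_bound(keys.begin(), keys.end(), newKey), newKey);
    std::size_t leftCount = (keys.size() + 1) / 2;   // left leaf gets the larger half
    LeafSplit result;
    result.left.assign(keys.begin(), keys.begin() + leftCount);
    result.right.assign(keys.begin() + leftCount, keys.end());
    result.separator = result.left.back();           // copied, not moved
    return result;
}
```

For a full leaf holding 10, 20, 30, 40, inserting 25 gives the leaves 10, 20, 25 and 30, 40, with 25 copied up as the separator.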

Differences between BTree and B+ Tree

B Tree | B+ Tree
------ | -------
Search keys cannot be repeatedly stored. | Redundant search keys can be present.
Data can be stored in leaf nodes as well as internal nodes. | Data can only be stored on the leaf nodes.
Searching for some data is a slower process since data can be found on internal nodes as well as on the leaf nodes. | Searching is comparatively faster as data can only be found on the leaf nodes.
Deletion of internal nodes is complicated and time consuming. | Deletion is never a complex process since an element is always deleted from the leaf nodes.
Leaf nodes cannot be linked together. | Leaf nodes are linked together to make the search operations more efficient.
