Chapter 4.

Fundamental File Structure Concepts
Ki-Joon Han
School of Computer Science & Engineering, Konkuk University

kjhan@db.konkuk.ac.kr

Chapter Outline
4.1 Field and Record Organization
4.2 Using Classes to Manipulate Buffers
4.3 Using Inheritance for Record Buffer Classes
4.4 Managing Fixed-Length, Fixed-Field Buffers
4.5 An Object-Oriented Class for Record Files

File Processing (4)

Konkuk University (DB Lab.)


4.1 Field and Record Organization
Field
the smallest logically meaningful unit of information in a file

Aggregates of fields
(1) Array: many copies of a single field
(2) Record: a list of different fields

Object (vs. record)
data residing in memory

Members (vs. fields)
object’s fields


4.1.1 A Stream File
writestr.cpp (operator <<) => Appendix D
write the fields of a Person to a file as a stream of bytes containing no added information
=> no way to get the fields apart again
=> need to keep the information divided into fields

    ostream & operator << (ostream & outputFile, Person & p)
    { // insert (write) fields into stream
        outputFile << p.LastName << p.FirstName << p.Address
                   << p.City << p.State << p.ZipCode;
        return outputFile;
    }

Input:
    Mary Ames, 123 Maple, Stillwater, OK 74075
    Alan Mason, 90 Eastgate, Ada, OK 74820
Output:
    AmesMary123 MapleStillwaterOK74075MasonAlan90 EastgateAdaOK74820

4.1.2 Field Structures (1/5)
Structure of fixed-length fields
(1) In C:

    struct Person {
        char last [11];
        char first [11];
        char address [16];
        char city [16];
        char state [3];
        char zip [10];
    };

(2) In C++:

    class Person { public:
        char last [11];
        char first [11];
        char address [16];
        char city [16];
        char state [3];
        char zip [10];
    };

4.1.2 Field Structures (2/5)
Methods for structuring fields
1. Fix the length of fields : Fig. 4.3(a)
  force the fields into a predictable length (i.e., fixed-length fields)
  disadv.
    adding all the padding required to bring the fields up to a fixed length makes the file much larger
    data can be too long to fit into the allocated amount of space
  inappropriate
    a large amount of variability in the length of fields
  appropriate
    every field is already fixed in length, or very little variation in field lengths

4.1.2 Field Structures (3/5)
2. Begin each field with a length indicator : Fig. 4.3(b)
  store the field length just ahead of the field
  one byte is enough for fields shorter than 256 bytes
3. Separate the fields with delimiters : Fig. 4.3(c)
  use some special character or sequence of characters that will not appear within a field as a delimiter
  delimiter character
    white-space characters (e.g., blank, new line, tab) or special characters (e.g., vertical bar character "|")

4.1.2 Field Structures (4/5)
4. Use a "Keyword=Value" expression to identify fields : Fig. 4.3(d)
  adv.
    self-describing structure (i.e., a field provides information about itself)
    good format for dealing with missing fields
  disadv.
    waste a lot of space for the keywords

4.1.2 Field Structures (5/5)
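
The four field structures above can be compared side by side in a short sketch. This is an illustration only, not code from Appendix D: the helper names (toFixedField, packWithLengths, packWithDelimiter, packKeywordValue) are hypothetical, and std::string stands in for the raw character arrays used on the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using Fields = std::vector<std::string>;

// Method 1: force a value into exactly `width` bytes.
// Long values are truncated, short values are padded with blanks.
std::string toFixedField(const std::string& value, std::size_t width) {
    std::string field = value.substr(0, width);
    field.resize(width, ' ');
    return field;
}

// Method 2: store a one-byte length just ahead of each field.
std::string packWithLengths(const Fields& fields) {
    std::string out;
    for (const std::string& f : fields) {
        assert(f.size() < 256);  // the length must fit in one byte
        out.push_back(static_cast<char>(f.size()));
        out += f;
    }
    return out;
}

// Method 3: end each field with a delimiter that never occurs in the data.
std::string packWithDelimiter(const Fields& fields, char delim = '|') {
    std::string out;
    for (const std::string& f : fields) {
        assert(f.find(delim) == std::string::npos);
        out += f;
        out.push_back(delim);
    }
    return out;
}

// Method 4: write each field as keyword=value, followed by a delimiter.
std::string packKeywordValue(
        const std::vector<std::pair<std::string, std::string>>& fields,
        char delim = '|') {
    std::string out;
    for (const auto& kv : fields) {
        out += kv.first + "=" + kv.second;
        out.push_back(delim);
    }
    return out;
}
```

Packing the same two fields with each helper makes the space trade-off visible: method 1 always uses the full width, methods 2 and 3 add one byte per field, and method 4 adds the keyword and "=" to every field.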

4.1.3 Reading a Stream of Fields
Extraction operator (operator >>) => Appendix D
  reads the stream of bytes, breaking the stream into fields and storing it as a Person object

    istream & operator >> (istream & stream, Person & p)
    { // read delimited fields from file
        char delim;
        stream.getline (p.LastName, 30, '|');
        if (strlen (p.LastName) == 0) return stream;
        stream.getline (p.FirstName, 30, '|');
        stream.getline (p.Address, 30, '|');
        stream.getline (p.City, 30, '|');
        stream.getline (p.State, 15, '|');
        stream.getline (p.ZipCode, 10, '|');
        return stream;
    }

Output:
    Last Name: 'Ames'       Last Name: 'Mason'
    First Name: 'Mary'      First Name: 'Alan'
    Address: '123 Maple'    Address: '90 Eastgate'
    City: 'Stillwater'      City: 'Ada'
    State: 'OK'             State: 'OK'
    Zip Code: '74075'       Zip Code: '74820'
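
The same reading logic can be sketched with std::string fields, which removes the fixed 30/15/10 character limits. The struct PersonRec and the function readDelimitedPerson are hypothetical names used only for this illustration; they are not the Person class of Appendix D.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical stand-in for the slide's Person class.
struct PersonRec {
    std::string last, first, address, city, state, zip;
};

// Read six '|'-terminated fields from the stream.
// Returns false when no further record can be read.
bool readDelimitedPerson(std::istream& in, PersonRec& p) {
    if (!std::getline(in, p.last, '|') || p.last.empty()) return false;
    std::getline(in, p.first, '|');
    std::getline(in, p.address, '|');
    std::getline(in, p.city, '|');
    std::getline(in, p.state, '|');
    // If any earlier read failed, this one fails too and we report false.
    return static_cast<bool>(std::getline(in, p.zip, '|'));
}
```

Calling readDelimitedPerson in a loop until it returns false walks through every record in a delimited stream such as "Ames|Mary|123 Maple|Stillwater|OK|74075|Mason|Alan|90 Eastgate|Ada|OK|74820|".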

4.1.4 Record Structures (1/5)
Record
  a set of fields that belong together when the file is viewed in terms of a higher level of organization
Methods for structuring records
1. Make records a predictable number of bytes (fixed-length records) : Fig. 4.5(a)
  each record contains the same number of bytes
  the most commonly used method
  does not imply that the size or number of fields in the record must be fixed
  frequently used to hold variable numbers of variable-length fields : Fig. 4.5(b)

4.1.4 Record Structures (2/5)
2. Make records a predictable number of fields : Fig. 4.5(c)
  each record contains a fixed number of fields
3. Begin each record with a length indicator : Fig. 4.6(a)
  each record contains a field indicating how many bytes there are in the record
  commonly used for handling variable-length records
4. Use an index to keep track of addresses : Fig. 4.6(b)
  use an index to keep a byte offset for each record in the original file
  two-file mechanism

4.1.4 Record Structures (3/5)
5. Place a delimiter (e.g., "#") at the end of each record : Fig. 4.6(c)
  use end-of-line character (e.g., \n on UNIX or CR/LF on other O.S.) as a record delimiter

4.1.4 Record Structures (4/5)

4.1.4 Record Structures (5/5)
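
Method 4 (an index of byte offsets) can be sketched as follows. To keep the example self-contained, an in-memory std::string stands in for the data file and a std::vector stands in for the separate index file; both choices, and the names appendRecord and fetchRecord, are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Append a record to the data "file" and remember the byte offset
// at which it starts.
void appendRecord(std::string& dataFile, std::vector<std::size_t>& index,
                  const std::string& record) {
    index.push_back(dataFile.size());
    dataFile += record;
}

// Fetch record i through the index: it runs from its own offset up to the
// next record's offset, or to the end of the data for the last record.
std::string fetchRecord(const std::string& dataFile,
                        const std::vector<std::size_t>& index,
                        std::size_t i) {
    std::size_t begin = index.at(i);
    std::size_t end = (i + 1 < index.size()) ? index[i + 1] : dataFile.size();
    return dataFile.substr(begin, end - begin);
}
```

Because the index supplies both the start and the end of each record, the data itself needs neither a length field nor a record delimiter.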

4.1.5 A Record Structure That Uses a Length Indicator (1/4)
Method for selecting record organization
  the nature of data
  what you need to do with it
Writing the variable-length records to the file
  a record-length field at the beginning of the record
  => the sum of the lengths of the fields in each record
  form of the record-length field
  => binary integer or ASCII characters

4.1.5 A Record Structure That Uses a Length Indicator (2/4)
WritePerson function
  write a variable-length, delimited buffer to a file
  buffer : character array for fields and field delimiters

    const int MaxBufferSize = 200;
    int WritePerson (ostream & stream, Person & p)
    {
        char buffer [MaxBufferSize];    // create buffer of fixed size
        strcpy (buffer, p.LastName);    strcat (buffer, "|");
        strcat (buffer, p.FirstName);   strcat (buffer, "|");
        strcat (buffer, p.Address);     strcat (buffer, "|");
        strcat (buffer, p.City);        strcat (buffer, "|");
        strcat (buffer, p.State);       strcat (buffer, "|");
        strcat (buffer, p.ZipCode);     strcat (buffer, "|");
        short length = strlen (buffer);
        stream.write ((char *) &length, sizeof(length));   // write length
        stream.write (buffer, length);                     // write buffer
        return stream.good ();
    }

4.1.5 A Record Structure That Uses a Length Indicator (3/4)
Representing the record length
  write the length in the form of a 2-byte binary integer in C
  convert the length into a character (decimal) string using formatted output

    fprintf (file, "%d ", length);    // with C stream
    stream << length << ' ';          // with C++ stream classes

readvar.cpp => Appendix D
  implement record structure using binary field for length

    40 Ames|Mary|123 Maple|Stillwater|OK|74075|36 Mason|Alan|90 Eastgate|Ada|OK|74820|

4.1.5 A Record Structure That Uses a Length Indicator (4/4)
ReadVariablePerson function
  (i) read the length of a record
  (ii) move the characters of the record into a buffer
  (iii) break the record into fields

    int ReadVariablePerson (istream & stream, Person & p)
    { // read a variable sized record from stream and store it in p
        short length;
        stream.read ((char *) &length, sizeof(length));
        if (stream.fail ()) return 0;           // no more records
        char * buffer = new char[length+1];     // create buffer space
        stream.read (buffer, length);
        buffer[length] = 0;                     // terminate buffer with null
        istrstream strbuff (buffer);            // create a string stream
        strbuff >> p;       // use the istream extraction operator (See Fig. 4.4)
        delete [] buffer;                       // release buffer space
        return 1;
    }
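
A round trip through a length-prefixed record can be sketched with the standard stream classes. This is a simplified illustration, not readvar.cpp: the function names writeRecord and readRecord are hypothetical, the record is passed as an already delimited std::string, and the 2-byte length is written in the machine's native byte order.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Write a 2-byte binary length followed by the record bytes.
bool writeRecord(std::ostream& out, const std::string& record) {
    if (record.size() > UINT16_MAX) return false;  // length must fit in 2 bytes
    std::uint16_t length = static_cast<std::uint16_t>(record.size());
    out.write(reinterpret_cast<const char*>(&length), sizeof length);
    out.write(record.data(), length);
    return out.good();
}

// Read the 2-byte length, then exactly that many bytes into `record`.
// Returns false at end of file or when the record is cut short.
bool readRecord(std::istream& in, std::string& record) {
    std::uint16_t length = 0;
    if (!in.read(reinterpret_cast<char*>(&length), sizeof length)) return false;
    record.resize(length);
    in.read(&record[0], length);
    return in.gcount() == length;
}
```

After readRecord returns, the record string can be split into fields with any delimited-field reader, which corresponds to step (iii) on the slide.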

4.1.6 Mixing Numbers and Characters : Use of a File Dump (1/4)
Number 40, 36

                                     Decimal value   Hex value stored   ASCII
                                     of number       in bytes           character form
    (a) stored as ASCII characters:  40              34 30              '4' '0'
                                     36              33 36              '3' '6'
    (b) stored as a 2-byte integer:  40              00 28              '\0' '('
                                     36              00 24              '\0' '$'

(a) lengths stored as ASCII characters:
    40Ames|John|123 Maple|Stillwater|OK|74075|36Mason|Alan|90 Eastgate ……
(b) lengths stored as 2-byte integers:
    (Ames|John|123 Maple|Stillwater|OK|74075| $Mason|Alan|90 Eastgate ……
    length bytes: 0x28 0x00 for the first record, 0x24 0x00 for the second
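
The two representations in the table can be reproduced with a few lines of code. The function names are hypothetical, and the binary form is built high-order byte first only because that is the order in which the table lists the bytes (00 28); a program that writes a short directly would use the machine's own byte order.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// (a) The length as ASCII decimal digits: 40 becomes '4' '0' (0x34 0x30).
std::string lengthAsAscii(std::uint16_t n) {
    return std::to_string(n);
}

// (b) The length as a 2-byte integer, high-order byte first:
// 40 becomes 0x00 0x28, which prints as '\0' '('.
std::string lengthAsBinary(std::uint16_t n) {
    std::string bytes;
    bytes.push_back(static_cast<char>(n >> 8));
    bytes.push_back(static_cast<char>(n & 0xFF));
    return bytes;
}
```

Both forms take two bytes for these values, but only form (a) is readable in a text editor, which is why a file dump is needed to inspect form (b).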

4.1.6 Mixing Numbers and Characters : Use of a File Dump (2/4)

4.1.6 Mixing Numbers and Characters : Use of a File Dump (3/4)
File dumps
  to look inside a file at the actual bytes
  especially for non-printable ASCII bytes
  od -xc filename : offset, value (ASCII, Hex)

    Offset    Values
    0000000   \0   (   A   m   e   s   |   J   o   h   n   |   1   2   3
                0028    416d    6573    7c4a    6f68    6e7c    3132    3320
    0000020    M   a   p   l   e   |   S   t   i   l   l   w   a   t   e   r
                4d61    706c    657c    5374    696c    6c77    6174    6572
    0000040    |   O   K   |   7   4   0   7   5   |  \0   $   M   a   s   o
                7c4f    4b7c    3734    3037    357c    0024    4d61    736f
    0000060    n   |   A   l   a   n   |   9   0       E   a   s   t   g   a
                6e7c    416c    616e    7c39    3020    4561    7374    6761
    0000100    t   e   |   A   d   a   |   O   K   |   7   4   8   2   0   |
                7465    7c41    6461    7c4f    4b7c    3734    3832    307c
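
A dump of this kind is simple to produce. The sketch below is a hypothetical helper, hexDump, that prints an octal offset followed by 16 bytes per line in hexadecimal. Unlike od -x it prints each byte separately in file order rather than grouping bytes into 2-byte words, and it omits the character row.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Format `data` as lines of "octal-offset hex hex ...", 16 bytes per line.
std::string hexDump(const std::string& data) {
    std::string out;
    char text[32];
    for (std::size_t i = 0; i < data.size(); ++i) {
        if (i % 16 == 0) {
            // Start a new line with a 7-digit octal offset, as od does.
            std::snprintf(text, sizeof text, "%s%07zo", i ? "\n" : "", i);
            out += text;
        }
        std::snprintf(text, sizeof text, " %02x",
                      static_cast<unsigned>(static_cast<unsigned char>(data[i])));
        out += text;
    }
    return out;
}
```

Applied to the first six bytes of the file above (the 2-byte length followed by "Ames"), the helper yields the line "0000000 00 28 41 6d 65 73".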

4.1.6 Mixing Numbers and Characters : Use of a File Dump (4/4)
Common file structure
  each record has both binary and ASCII data (mixing data type), and consists of a fixed-length field (byte count) and several delimited, variable-length fields

4.2 Using Classes to Manipulate Buffers
Buffer class
  to encapsulate the pack, unpack, read, and write operations of buffer objects
1. For output
  (i) start with an empty buffer object
  (ii) pack field values into the object one by one
  (iii) write the buffer contents to an output stream
2. For input
  (i) initialize a buffer object by reading a record from an input stream
  (ii) extract the object’s field values
=> no direct access nor mixed pack & unpack operations

4.2.1 Buffer Class for Delimited Text Fields (1/2)
DelimTextBuffer class => Appendix E
  support variable-length buffers whose fields are represented as delimited text
  full class in deltext.h and deltext.cpp

    class DelimTextBuffer
    {public:
        DelimTextBuffer (char Delim = '|', int maxBytes = 1000);
        int Read (istream & file);
        int Write (ostream & file) const;
        int Pack (const char * str, int size = -1);
        int Unpack (char * str);
      private:
        char Delim;        // delimiter character
        char * Buffer;     // character array to hold field values
        int BufferSize;    // current size of packed fields
        int MaxBytes;      // max # of characters in buffer
        int NextByte;      // packing, unpacking position in buffer
    };

4.2.1 Buffer Class for Delimited Text Fields (2/2)

    Person MaryAmes;                   // declare objects of class Person
    DelimTextBuffer buffer;            // declare objects of class DelimTextBuffer
    buffer.Pack (MaryAmes.LastName);   // pack the person into the buffer
    buffer.Pack (MaryAmes.FirstName);  // copy the characters to the buffer
    ….                                 // and add the delimiter
    buffer.Pack (MaryAmes.ZipCode);
    buffer.Write (stream);             // write the buffer to a file

    int DelimTextBuffer :: Read (istream & stream)   // Read method
    {
        Clear();                                     // clear the current buffer contents
        stream.read((char *)&BufferSize, sizeof(BufferSize));   // read the record size
        if (stream.fail()) return FALSE;
        if (BufferSize > MaxBytes) return FALSE;     // buffer overflow
        stream.read(Buffer, BufferSize);
        return stream.good();
    }
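
The pack and unpack side of such a buffer can be sketched in a few lines. The class below, SimpleDelimBuffer, is a hypothetical simplification and not the DelimTextBuffer of Appendix E: it stores the packed fields in a std::string rather than a fixed character array, and it leaves out Read and Write.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical, simplified delimited-text buffer.
class SimpleDelimBuffer {
public:
    explicit SimpleDelimBuffer(char delim = '|') : delim_(delim), next_(0) {}

    // Empty the buffer and reset the unpacking position.
    void Clear() { data_.clear(); next_ = 0; }

    // Append one field plus the delimiter.
    // A field that contains the delimiter is rejected.
    bool Pack(const std::string& field) {
        if (field.find(delim_) != std::string::npos) return false;
        data_ += field;
        data_.push_back(delim_);
        return true;
    }

    // Extract the next field; returns false when no complete field remains.
    bool Unpack(std::string& field) {
        std::size_t end = data_.find(delim_, next_);
        if (end == std::string::npos) return false;
        field = data_.substr(next_, end - next_);
        next_ = end + 1;
        return true;
    }

    const std::string& Contents() const { return data_; }

private:
    char delim_;         // delimiter character
    std::string data_;   // packed fields
    std::size_t next_;   // unpacking position in data_
};
```

Fields come back out in the order they were packed, which is why the class that owns the data has to fix one packing order and use the same order for unpacking.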

4.2.2 Extending Class Person with Buffer Operations
Buffer for an object (e.g., Person)
  specify the order in which the members of the object are packed and unpacked

    int Person::Pack (DelimTextBuffer & Buffer) const
    { // pack the fields into a DelimTextBuffer
        int result;
        result = Buffer.Pack (LastName);
        result = result && Buffer.Pack (FirstName);
        result = result && Buffer.Pack (Address);
        result = result && Buffer.Pack (City);
        result = result && Buffer.Pack (State);
        result = result && Buffer.Pack (ZipCode);
        return result;
    }

4.2.3 Buffer Classes for Length-Based and Fixed-length Fields (1/3)
LengthTextBuffer class => Appendix E
  full class in lentext.h and lentext.cpp

    class LengthTextBuffer
    {public:
        LengthTextBuffer (int maxBytes = 1000);
        int Read (istream & file);
        int Write (ostream & file) const;
        int Pack (const char * field, int size = -1);
        int Unpack (char * field);
      private:
        char * buffer;     // character array to hold field values
        int BufferSize;    // current size of packed fields
        int MaxBytes;      // max # of characters in buffer
        int NextByte;      // packing, unpacking position in buffer
    };

4.2.3 Buffer Classes for Length-Based and Fixed-length Fields (2/3)
FixedTextBuffer class
  full class in fixtext.h and fixtext.cpp

    class FixedTextBuffer
    {public:
        FixedTextBuffer (int maxBytes = 1000);
        int AddField (int fieldSize);
        int Read (istream & file);
        int Write (ostream & file) const;
        int Pack (const char * field, int size = -1);
        int Unpack (char * field);
      private:
        char * buffer;     // character array to hold field values
        int BufferSize;    // current size of packed fields
        int MaxBytes;      // max # of characters in buffer
        int NextByte;      // packing, unpacking position in buffer
        int * FieldSizes;  // array of field sizes
    };

4.2.3 Buffer Classes for Length-Based and Fixed-length Fields (3/3)
FixedTextBuffer class (cont’d)
  use a fixed collection of fixed-length fields
  use fixed-length records
  AddField : support the specification of the fields and their size

    int Person::InitBuffer (FixedTextBuffer & buffer)
    {
        buffer.Init (6, 61);      // 6 fields, 61 bytes total
        buffer.AddField (10);     // LastName[11];
        buffer.AddField (10);     // FirstName[11];
        . . .
        buffer.AddField (9);      // ZipCode[10];
        return 1;
    }

4.3 Using Inheritance for Record Buffer Classes
cpp files for the three classes
  a large percentage of the code is duplicated
  => use of the inheritance to eliminate almost all of the duplication

4.3.1 Inheritance in the C++ Stream Classes (1/3)
Multiple inheritance in C++
  (class hierarchy figure; classes shown: ios, istream, fstreambase, strstreambase, ostream, ifstream, iostream, ofstream, fstream)

4.3.1 Inheritance in the C++ Stream Classes (2/3)

    class istream:  virtual public ios { . . .
    class ostream:  virtual public ios { . . .
    class iostream: public istream, public ostream { . . .
    class ifstream: public fstreambase, public istream { . . .
    class ofstream: public fstreambase, public ostream { . . .
    class fstream:  public fstreambase, public iostream { . . .

Class istream
  define the read operations, the extraction operators
Class ostream
  define the write operations
Class ios
  define basic stream operations
Class fstreambase
  for access to operating system file operations

4.3.1 Inheritance in the C++ Stream Classes (3/3)

    int ReadVariablePerson (istream & stream, Person & p)
    { . . .
        istrstream strbuff (buffer);   // create a string stream
        strbuff >> p;                  // use the istream extraction operator
        return 1;
    }

Use an istrstream object strbuff to contain a string buffer
  istrstream is derived from istream
  strbuff >> p : manipulated by istream operation

4.3.2 A Class Hierarchy for Record Buffer Objects (1/5)
Buffer class hierarchy => Appendix F

4.3.2 A Class Hierarchy for Record Buffer Objects (2/5)

    class IOBuffer
    {public:
        IOBuffer (int maxBytes = 1000);
        virtual int Read (istream &) = 0;            // read a buffer
        virtual int Write (ostream &) const = 0;     // write a buffer
        virtual int Pack (const void * field, int size = -1) = 0;
        virtual int Unpack (void * field, int maxbytes = -1) = 0;
      protected:
        char * Buffer;     // character array to hold field values
        int BufferSize;    // sum of the sizes of packed fields
        int MaxBytes;      // max # of char in the buffer
    };

⇒ Define Read, Write, Pack, Unpack as virtual to allow each subclass to define its own implementation
⇒ Use pure virtual function (= 0)

4.3.2 A Class Hierarchy for Record Buffer Objects (3/5)
VariableLengthBuffer and FixedLengthBuffer classes
  support the read and write operations for different types of records

    class VariableLengthBuffer: public IOBuffer
    { public:
        VariableLengthBuffer (int MaxBytes = 1000);
        int Read (istream &);
        int Write (ostream &) const;
        int SizeOfBuffer () const;   // return current size of buffer
    };

4.3.2 A Class Hierarchy for Record Buffer Objects (4/5)
LengthFieldBuffer, DelimFieldBuffer, FixedFieldBuffer classes
  have the pack and unpack methods for the specific field representation

    class DelimFieldBuffer: public VariableLengthBuffer
    { public:
        DelimFieldBuffer (char Delim = -1, int maxBytes = 1000);
        int Pack (const void *, int size = -1);
        int Unpack (void * field, int maxBytes = -1);
      protected:
        char Delim;
    };

4.3.2 A Class Hierarchy for Record Buffer Objects (5/5)
Persistent objects
  move objects from memory to files
  ensure that fields are packed and unpacked

    int Person::Unpack (IOBuffer & Buffer)
    {
        Clear();
        int numBytes;
        numBytes = Buffer.Unpack (LastName);   // which Unpack method? not decided at compile time
        if (numBytes == -1) return FALSE;
        LastName[numBytes] = 0;
        numBytes = Buffer.Unpack (FirstName);
        if (numBytes == -1) return FALSE;
        . . .                                  // unpack the other fields
        return TRUE;
    }

    Person MaryAmes;
    DelimFieldBuffer Buffer;
    MaryAmes.Unpack (Buffer);   // use the method DelimFieldBuffer::Unpack
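
The run-time choice of Unpack method can be demonstrated with a reduced hierarchy. Everything below is a hypothetical stand-in for the Appendix F classes: FieldBuffer plays the role of IOBuffer, DelimBuffer and LengthBuffer play the roles of DelimFieldBuffer and LengthFieldBuffer, and the buffers are filled from a std::string instead of being read from a file.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Abstract base: Unpack is pure virtual, so each subclass supplies its own.
class FieldBuffer {
public:
    virtual ~FieldBuffer() {}
    virtual bool Unpack(std::string& field) = 0;
protected:
    explicit FieldBuffer(const std::string& data) : data_(data) {}
    std::string data_;       // packed record
    std::size_t next_ = 0;   // unpacking position
};

// Fields end with a delimiter character.
class DelimBuffer : public FieldBuffer {
public:
    explicit DelimBuffer(const std::string& data, char delim = '|')
        : FieldBuffer(data), delim_(delim) {}
    bool Unpack(std::string& field) override {
        std::size_t end = data_.find(delim_, next_);
        if (end == std::string::npos) return false;
        field = data_.substr(next_, end - next_);
        next_ = end + 1;
        return true;
    }
private:
    char delim_;
};

// Fields begin with a one-byte length.
class LengthBuffer : public FieldBuffer {
public:
    explicit LengthBuffer(const std::string& data) : FieldBuffer(data) {}
    bool Unpack(std::string& field) override {
        if (next_ >= data_.size()) return false;
        std::size_t len = static_cast<unsigned char>(data_[next_]);
        if (next_ + 1 + len > data_.size()) return false;
        field = data_.substr(next_ + 1, len);
        next_ += 1 + len;
        return true;
    }
};

// Written once against the base class; the buffer's dynamic type decides
// which Unpack runs, just as in Person::Unpack(IOBuffer &).
bool unpackName(FieldBuffer& buffer, std::string& last, std::string& first) {
    return buffer.Unpack(last) && buffer.Unpack(first);
}
```

The function unpackName never names a concrete buffer class, so a new field representation can be added without touching it.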

4.4 Managing Fixed-Length, Fixed-Field Buffers (1/2)
Read and write of fixed-length records
  Write method : write the fixed-size record
  Read method : must know the record size
  ⇒ Use protected field for record size of FixedLengthBuffer object

    class FixedFieldBuffer: public FixedLengthBuffer
    {public:
        . . .
        int AddField (int fieldSize);    // define the next field
        int Pack ( );
        int Unpack ( );
        int NumberOfFields () const;     // return # of defined fields
      protected:
        int * FieldSize;   // array to hold field sizes
        int MaxFields;     // max # of fields
        int NumFields;     // actual # of defined fields
    };

4.4 Managing Fixed-Length, Fixed-Field Buffers (2/2)
Initialization of FixedFieldBuffer

    int Person::InitBuffer (FixedFieldBuffer & Buffer)
    {
        int result;
        result = Buffer.AddField(10);             // LastName [11];
        result = result && Buffer.AddField(10);   // FirstName[11];
        . . .
        return result;
    }

⇒ Add the fields one at a time, each with its own size
Preparation of a buffer for reading and writing objects

    FixedFieldBuffer Buffer(6, 61);   // 6 fields, 61 bytes total
    MaryAmes.InitBuffer (Buffer);
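
The interplay of AddField and Pack can be sketched with a small class. SimpleFixedBuffer is a hypothetical simplification, not the FixedFieldBuffer of Appendix F: field sizes live in a std::vector instead of a preallocated array, and values are blank-padded or truncated to the declared size.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified fixed-field buffer.
class SimpleFixedBuffer {
public:
    // Define the next field by giving its size in bytes.
    void AddField(std::size_t size) { sizes_.push_back(size); }

    // Total record size: the sum of all declared field sizes.
    std::size_t RecordSize() const {
        std::size_t total = 0;
        for (std::size_t s : sizes_) total += s;
        return total;
    }

    // Pack the next field, padded or truncated to its declared size.
    // Returns false once every declared field has been packed.
    bool Pack(const std::string& value) {
        if (nextField_ >= sizes_.size()) return false;
        std::string field = value.substr(0, sizes_[nextField_]);
        field.resize(sizes_[nextField_], ' ');
        data_ += field;
        ++nextField_;
        return true;
    }

    const std::string& Contents() const { return data_; }

private:
    std::vector<std::size_t> sizes_;   // declared field sizes
    std::string data_;                 // packed record
    std::size_t nextField_ = 0;        // index of the next field to pack
};
```

With the six sizes 10, 10, 15, 15, 2, and 9, RecordSize returns 61, the total that the slide passes to the FixedFieldBuffer constructor, so every packed record has the same length and Read knows how many bytes to fetch.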

4.5 An Object-Oriented Class for Record Files (1/3)
Have to know how to transfer objects to and from files
BufferFile class
  support manipulation of files that are tied to specific buffer types
  support the creation of an object BufferFile from a specific buffer object to open and create files and to read and write records
  full class in buffile.h and buffile.cpp of Appendix F

4.5 An Object-Oriented Class for Record Files (2/3)

    class BufferFile
    {public:
        BufferFile (IOBuffer &);                   // create with a buffer
        int Open (char * filename, int MODE);      // open an existing file
        int Create (char * filename, int MODE);    // create a new file
        int Close ();
        int Rewind ();                             // reset to the first record
        int Read (int recaddr = -1);               // read a record into a buffer
        int Write (int recaddr = -1);              // write the buffer contents
        int Append ();                             // write the current buffer at the end of file
      protected:
        IOBuffer & Buffer;   // reference to the file’s buffer
        fstream File;        // the C++ stream of the file
        int HeaderSize;      // size of header
        int ReadHeader ();
        int WriteHeader ();
    };

4.5 An Object-Oriented Class for Record Files (3/3)

    DelimFieldBuffer buffer;     // a buffer is created
    BufferFile file (buffer);    // BufferFile object file is attached
    file.Open (myfile);
    file.Read ( );               // buffer contains the packed record
    buffer.Unpack (myobject);    // put the record into myobject

⇒ BufferFile is combined with a fixed-length buffer
  5.2 “More about Record Structures”
  Put a header record on the beginning of each file
  BufferFile::Open reads the record size from the file header
