
CS 2604 Project 3

Spring 2002

BST Index
For this project you will implement a simple database program that supports search and modify operations on a file containing simple records of the following form: a unique key value K (an unsigned short integer, 2 bytes), followed by a string field length SLen (an unsigned short integer, 2 bytes), followed by a checksum (an integer, 4 bytes), followed by a string field S consisting of SLen arbitrary characters and possible padding.

The string field will contain at most 24 characters; if the actual string is shorter than that, the record will contain padding characters to fill the record out to a total of 32 bytes. The actual padding character is unspecified and should be of no concern to your implementation. The database file will consist of a sequence of these 32-byte records, stored in binary format. There is no stated limit on the number of records. It is guaranteed that no two records will contain the same key value.

So far this is similar to the previous project. The primary difference is that the system will include a record index to support finding a specific record given its key value. The index will store (key, offset) pairs, where the offset is the byte number at which the corresponding record begins in the database file. When performing a search, the system will pass the index the desired key value, and the index will return the file offset for the matching record, if it exists.

On program startup, the system will read the database file, block by block via the buffer pool, and build the index structure. There are many structures that could be used for the index. You will use a simple binary search tree (BST) as described in the course notes. This will not be a self-balancing tree, such as a splay or AVL tree, so there is the possibility that the index will provide suboptimal performance. That deficiency may be addressed in a later project.

As before, the project assumes that the file of records may be too large to store all the records in primary memory at once. Therefore, you will also implement a general-purpose buffer pool that will mediate the disk operations and ideally reduce the number of individual disk accesses that must be performed. Note that if you implemented the buffer pool properly in the previous project, you can simply reuse it here.
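Every record is exactly 32 bytes and the index stores byte offsets, so finding the block that holds a record is simple arithmetic. The sketch below assumes blocks start at multiples of the buffer size, and the helper names are hypothetical. For example, with 64-byte buffers the record at offset 352 (the twelfth record) lies 32 bytes into the block that starts at offset 320.

```cpp
#include <cassert>
#include <cstdint>

// Every record in the dB file occupies exactly this many bytes.
const std::uint32_t kRecordSize = 32;

// Byte offset of the record at 0-based position 'index' in the file.
std::uint32_t recordOffset(std::uint32_t index) {
    return index * kRecordSize;
}

// Offset of the first byte of the block that contains 'offset',
// assuming blocks start at multiples of blockSize.
std::uint32_t blockStart(std::uint32_t offset, std::uint32_t blockSize) {
    assert(blockSize > 0);
    return offset - offset % blockSize;
}

// Position of 'offset' within its block.
std::uint32_t offsetInBlock(std::uint32_t offset, std::uint32_t blockSize) {
    assert(blockSize > 0);
    return offset % blockSize;
}
```

If the buffer size is not a multiple of 32, a record can straddle two blocks; this sketch does not handle that case.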

Program Invocation:
Your program must take the names of the input and output files from the command line; failure to do this will irritate the person for whom you will demo your project. The program will be invoked as:

    BinaryBP <dB file name> <command file name> <log file name>

If either of the specified input files does not exist, the program should print an appropriate error message and either exit or prompt the user for a correction.
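Here is a minimal sketch of the startup check. It assumes the controller performs the check before it opens the log file, and `checkInputs` is a hypothetical name. The function only tests whether each input file can be opened for reading. It reports the problem and leaves the caller to exit, rather than prompting for a correction.

```cpp
#include <cassert>
#include <fstream>
#include <iostream>
#include <string>

// Returns true if exactly three file names were given and both input files
// (dB file and command file) can be opened; otherwise writes a message to err.
bool checkInputs(int argc, const char* const argv[], std::ostream& err) {
    if (argc != 4) {
        err << "Usage: BinaryBP <dB file name> <command file name> <log file name>\n";
        return false;
    }
    for (int i = 1; i <= 2; ++i) {
        std::ifstream in(argv[i], std::ios::binary);
        if (!in) {
            err << "Error: cannot open input file \"" << argv[i] << "\"\n";
            return false;
        }
    }
    return true;
}
```

In `main`, a call such as `if (!checkInputs(argc, argv, std::cerr)) return 1;` would come before anything else.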

Data Structures:
The primary data structure in this project is a binary search tree (BST). Your implementation must meet the following specific requirements: the BST must be encapsulated as a C++ template; the underlying structure must be linked, and you must use a C++ template for the tree nodes; and the behavior of the BST must conform to the description given in class. Note that in this project, we do not allow entries with duplicate key values. For testing, your BST should be able to display itself to a specified output stream, as described in the notes. If you expect to receive help, be sure that your display function conforms to the formatting described in the course notes, but you may reverse the sides if you prefer.
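Here is a minimal sketch of a linked BST template that meets these requirements. It assumes the key type supports `operator<` and both key and value (here a file offset) support `operator<<`. The names `BSTNode`, `BST`, `insert`, `find`, and `display` are illustrative. The display indents an inorder traversal by depth, which is an assumption and not the format from the course notes.

```cpp
#include <cassert>
#include <ostream>

// Linked tree node, itself a template as the specification requires.
template <typename K, typename V>
struct BSTNode {
    K key;
    V value;
    BSTNode* left;
    BSTNode* right;
    BSTNode(const K& k, const V& v) : key(k), value(v), left(nullptr), right(nullptr) {}
};

template <typename K, typename V>
class BST {
public:
    BST() : root(nullptr), count(0) {}
    ~BST() { clear(root); }
    BST(const BST&) = delete;              // copying is not needed here
    BST& operator=(const BST&) = delete;

    // Inserts (key, value); returns false and leaves the tree unchanged
    // if the key is already present (duplicates are not allowed).
    bool insert(const K& key, const V& value) {
        BSTNode<K, V>** link = &root;
        while (*link != nullptr) {
            if (key < (*link)->key)      link = &(*link)->left;
            else if ((*link)->key < key) link = &(*link)->right;
            else return false;
        }
        *link = new BSTNode<K, V>(key, value);
        ++count;
        return true;
    }

    // Returns a pointer to the stored value, or nullptr if key is absent.
    const V* find(const K& key) const {
        const BSTNode<K, V>* node = root;
        while (node != nullptr) {
            if (key < node->key)      node = node->left;
            else if (node->key < key) node = node->right;
            else return &node->value;
        }
        return nullptr;
    }

    unsigned size() const { return count; }

    // Inorder display, one "key:value" per line, indented by depth.
    void display(std::ostream& out) const { display(out, root, 0); }

private:
    BSTNode<K, V>* root;
    unsigned count;

    static void clear(BSTNode<K, V>* node) {
        if (node == nullptr) return;
        clear(node->left);
        clear(node->right);
        delete node;
    }

    static void display(std::ostream& out, const BSTNode<K, V>* node, int depth) {
        if (node == nullptr) return;
        display(out, node->left, depth + 1);
        for (int i = 0; i < depth; ++i) out << "  ";
        out << node->key << ':' << node->value << '\n';
        display(out, node->right, depth + 1);
    }
};
```

Building the index means inserting each key with offset 32 * i in file order. Doing that with the sample file's keys appears to produce the same search paths that the sample log traces.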


The system will also use a buffer pool to mediate transactions with the disk file, just as in the last project. All the requirements that were given there for the buffer pool still apply. Your design must make appropriate use of classes. The specification may imply the existence of additional classes besides those involved in the implementation of the BST. Aside from nodes and buffer objects used only within an encapsulating class, data members of classes must be private. If an error occurs during the parsing of the input file, there's an error in your code. However, your program should still attempt to recover by flushing the current input line and proceeding to the next input line.
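The buffer pool requirements come from the previous project and are not restated here, so this is only a minimal sketch under stated assumptions:

- least-recently-used (LRU) replacement;
- blocks aligned to multiples of the buffer size;
- no request straddling two blocks.

The class name `BufferPool` and its methods are hypothetical. The pool works on any `std::iostream`, so either a binary `std::fstream` or a `std::stringstream` can act as the dB file. Hits, misses, and writebacks are counted for the summary statistics, and dirty blocks are written back on eviction and on destruction.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <iostream>
#include <list>
#include <sstream>   // std::stringstream can stand in for the dB file in tests
#include <utility>
#include <vector>

// Sketch of a buffer pool over a binary stream, with LRU replacement.
// Assumes every request lies inside one block (buffer size a multiple of 32).
class BufferPool {
public:
    BufferPool(std::iostream& file, std::size_t blockSize, std::size_t slots)
        : file_(file), blockSize_(blockSize), slots_(slots) {
        assert(blockSize > 0 && slots > 0);
    }
    ~BufferPool() { flush(); }
    BufferPool(const BufferPool&) = delete;
    BufferPool& operator=(const BufferPool&) = delete;

    // Copies len bytes starting at file offset 'offset' into dest.
    void read(std::size_t offset, char* dest, std::size_t len) {
        Buffer& b = fetch(offset, len);
        std::memcpy(dest, b.data.data() + offset % blockSize_, len);
    }

    // Overwrites len bytes at 'offset' in memory and marks the block dirty;
    // the file itself changes only when the block is written back.
    void write(std::size_t offset, const char* src, std::size_t len) {
        Buffer& b = fetch(offset, len);
        std::size_t pos = offset % blockSize_;
        std::memcpy(b.data.data() + pos, src, len);
        if (pos + len > b.used) b.used = pos + len;
        b.dirty = true;
    }

    // Writes every dirty block back to the file.
    void flush() {
        for (Buffer& b : buffers_) writeBack(b);
        file_.flush();
    }

    unsigned hits() const { return hits_; }
    unsigned misses() const { return misses_; }
    unsigned writebacks() const { return writebacks_; }

private:
    struct Buffer {
        std::size_t start;        // file offset where the block begins
        std::size_t used;         // bytes of the block that belong to the file
        bool dirty;               // modified since it was loaded?
        std::vector<char> data;   // blockSize_ bytes
    };

    // Returns the buffer holding the block for 'offset', loading it on a miss.
    // buffers_ is kept in LRU order: most recently used first.
    Buffer& fetch(std::size_t offset, std::size_t len) {
        assert(offset % blockSize_ + len <= blockSize_);
        std::size_t start = offset - offset % blockSize_;
        for (auto it = buffers_.begin(); it != buffers_.end(); ++it) {
            if (it->start == start) {
                ++hits_;
                buffers_.splice(buffers_.begin(), buffers_, it);
                return buffers_.front();
            }
        }
        ++misses_;
        if (buffers_.size() == slots_) {          // evict least recently used
            writeBack(buffers_.back());
            buffers_.pop_back();
        }
        Buffer b{start, 0, false, std::vector<char>(blockSize_, 0)};
        file_.clear();
        file_.seekg(static_cast<std::streamoff>(start));
        file_.read(b.data.data(), static_cast<std::streamsize>(blockSize_));
        b.used = static_cast<std::size_t>(file_.gcount());
        file_.clear();                            // a short last block sets eof
        buffers_.push_front(std::move(b));
        return buffers_.front();
    }

    void writeBack(Buffer& b) {
        if (!b.dirty) return;
        file_.clear();
        file_.seekp(static_cast<std::streamoff>(b.start));
        file_.write(b.data.data(), static_cast<std::streamsize>(b.used));
        b.dirty = false;
        ++writebacks_;
    }

    std::iostream& file_;
    std::size_t blockSize_;
    std::size_t slots_;
    std::list<Buffer> buffers_;
    unsigned hits_ = 0;
    unsigned misses_ = 0;
    unsigned writebacks_ = 0;
};
```

Tracking `used` per block keeps a writeback from padding the file with bytes past its real end, which matters when a new record is appended.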

Other System Elements:


- There must be a controller that receives data and commands from the command file and triggers the appropriate responses in the other system elements. The controller may be purely procedural. The controller may take responsibility for verifying the existence of the input files, and opening and closing the various file streams. The controller should log the initial system configuration (described below). The controller should create the data manager and buffer pool objects, and trigger the creation of the BST index.
- There must be a data manager class, separate from the controller, which is responsible for managing the execution of the show and update commands described below. The data manager will deal with record objects, not with raw data. The data manager should log results from processing show and update commands.
- The data records should be encapsulated as objects when being processed by the data manager.
- There should also be a file manager object that handles reading the command file, and stripping out comments.
- It may be useful to have a translator class that handles the data conversions that must take place when the data manager and buffer pool communicate, but this is not required. It may also be useful for each buffer to be an object.
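Here is a minimal sketch of the file manager's comment stripping. It assumes the command file is read through a `std::istream`, and the class name `CommandFileManager` is hypothetical. Two behaviours are defensive extras, not requirements: skipping blank lines, and removing a trailing carriage return from files saved with DOS line endings.

```cpp
#include <cassert>
#include <istream>
#include <sstream>   // std::istringstream is convenient for testing
#include <string>

// Sketch of a file manager that hands the controller one command line at a
// time, skipping comment lines (those starting with ';').
class CommandFileManager {
public:
    explicit CommandFileManager(std::istream& in) : in_(in) {}

    // Stores the next non-comment, non-blank line in 'line' and returns true,
    // or returns false at end of file. A trailing '\r' is removed.
    bool nextCommand(std::string& line) {
        while (std::getline(in_, line)) {
            if (!line.empty() && line[line.size() - 1] == '\r')
                line.erase(line.size() - 1);
            if (!line.empty() && line[0] != ';')
                return true;
        }
        return false;
    }

private:
    std::istream& in_;
};
```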

The Binary Database File:


The database (dB) file used in this project will be in binary format. The dB file will consist of a sequence of records, as described above. Each record will consist of the four sections described above, in that order. There will not be any header data at the beginning of the file, or any extra data at the end. A hex dump of a sample binary dB file is included later in this document. Here is an annotated hex display of one record:

    00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
    -----------------------------------------------
    82 13 0B 00 29 1C 00 00 64 78 66 7A 74 72 6C 64
    6F 6D 6A 20 20 20 20 20 20 20 20 20 20 20 20 20

    Bytes 0x00-0x01   Key: 4994 (0x1382, stored low byte first)
    Bytes 0x02-0x03   Length: 11
    Bytes 0x04-0x07   Checksum: 7209
    Byte  0x08        First character: d
    Byte  0x12        Last character: j
    Bytes 0x13-0x1F   Padding
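A translator that turns raw record bytes into a record object could look roughly like the sketch below. It assumes the layout of the annotated record: 2-byte key, 2-byte length, 4-byte checksum, all stored low byte first, then the string field. The names `Record`, `readLE`, and `decodeRecord` are hypothetical. The bytes are assembled explicitly instead of copied into a struct with `memcpy`, so compiler padding and host byte order cannot affect the result.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// One dB record after decoding.
struct Record {
    std::uint16_t key;
    std::uint16_t length;
    std::int32_t checksum;
    std::string text;
};

const std::size_t kRecordSize = 32;
const std::size_t kMaxText = 24;

// Reads an n-byte unsigned value stored least significant byte first.
static std::uint32_t readLE(const unsigned char* p, int n) {
    std::uint32_t v = 0;
    for (int i = n - 1; i >= 0; --i) v = (v << 8) | p[i];
    return v;
}

// Decodes the raw bytes of one record; any padding after 'length'
// characters is ignored, whatever its value.
Record decodeRecord(const unsigned char raw[kRecordSize]) {
    Record r;
    r.key = static_cast<std::uint16_t>(readLE(raw, 2));
    r.length = static_cast<std::uint16_t>(readLE(raw + 2, 2));
    r.checksum = static_cast<std::int32_t>(readLE(raw + 4, 4));
    assert(r.length <= kMaxText);
    r.text.assign(reinterpret_cast<const char*>(raw + 8), r.length);
    return r;
}
```

Applied to the annotated bytes, this yields key 4994, length 11, checksum 7209, and the string "dxfztrldomj".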

A complete binary data file is available on the course website. Logically the dB file can be viewed as a sequence of records, and each record can be located by moving the file pointer to the correct offset within the file. Note that you are absolutely forbidden to simply read and store the entire file in memory. Your implementation must make use of a buffer pool, as described above.
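The file has no header and every record is 32 bytes. The record count for the log's initialization section can therefore be taken from the file size without reading any record data. This is a minimal sketch, and `countRecords` is a hypothetical name. Counting BST nodes while the index is built should give the same number.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>   // std::istringstream is convenient for testing
#include <string>

// Number of 32-byte records in a binary dB stream, computed from the stream
// size without reading any record data. Leaves the read position at 0.
std::size_t countRecords(std::istream& db) {
    db.clear();
    db.seekg(0, std::ios::end);
    std::streamoff size = db.tellg();
    db.seekg(0, std::ios::beg);
    assert(size >= 0 && size % 32 == 0);   // no header and no trailing data
    return static_cast<std::size_t>(size / 32);
}
```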


Command File:
The execution of the program will be driven by a script file, as in the first project. As before, lines beginning with a semicolon character (';') are comments and should be ignored. The command file will start with a header that may include comments and will definitely include a line specifying the buffer pool "geometry":

    buffer<tab><buffer size in bytes><tab><# of buffer slots>

After that header, each non-comment line of the command file will specify one of the commands described below. Each line consists of a sequence of tokens separated by single tab characters. A newline character will immediately follow the final token on each line. The command file is guaranteed to conform to this specification, so you don't need to worry about error-checking when reading it. The following commands must be supported:

show<tab><Key>
    Log the values of all of the data fields of the indicated record. These should be interpreted, not written in raw binary form. If the given key does not correspond to an existing record, log an error message. This command will result in transactions with the BST index, to obtain the file offset, and with the buffer pool, which must determine whether the record is in memory, load the appropriate file block if necessary, and then return the record data for display.

update<tab><Key><tab><Length><tab><Checksum><tab><Length characters>
    Replace the data for the indicated record with the given data, and log a message confirming the update. If the given key corresponds to a non-existent record, add that record to the database. Note that this will require adding an entry to the index and writing a record to the dB file (through the buffer pool, of course). Added records should be written to the end of the dB file. When performing the update, you will usually have to pad the record out to 32 bytes; you must use the asterisk character '*' for padding. This command will also result in transactions with the BST index and the buffer pool. If the targeted record is not in memory, the buffer pool must load the appropriate file block. Once the targeted record is in memory, the buffer pool must overwrite its data (in memory) with the supplied data (in binary format).

debug buffers<tab>
    Log the current contents of the buffer pool. The display should be neatly formatted. For each file block stored in the buffer pool, log the file offset at which the block begins and the bytes stored for that block, formatted as pairs of hex digits. (Code that can be adapted for this purpose will be posted along with this specification.)

debug index<tab>
    Log the current contents of the BST index. The display should be neatly formatted, and reflect a (possibly modified) inorder traversal, as shown in the course notes. For each tree node you should display the key value and file offset stored there.

exit<tab>
    Terminate program execution. The buffer pool should perform any necessary writebacks before the dB file is closed. Summary statistics, described below, should be logged. All dynamic memory should be properly deallocated.

A sample command script is included later in this document.
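Here is a minimal sketch of turning an update line into the 32-byte record image handed to the buffer pool. It assumes the string data contains no tab characters, since tokens are split on tabs, and that multi-byte fields are stored low byte first, as in the annotated record. A zero-length string field is not handled. The names `splitTabs`, `putLE`, and `encodeUpdate` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// Splits a command line on tab characters.
std::vector<std::string> splitTabs(const std::string& line) {
    std::vector<std::string> tokens;
    std::istringstream in(line);
    std::string token;
    while (std::getline(in, token, '\t')) tokens.push_back(token);
    return tokens;
}

// Stores the low 'n' bytes of 'value' at p, least significant byte first.
void putLE(unsigned char* p, std::uint32_t value, int n) {
    for (int i = 0; i < n; ++i) {
        p[i] = static_cast<unsigned char>(value & 0xFF);
        value >>= 8;
    }
}

// Builds the 32-byte image for "update<tab>Key<tab>Length<tab>Checksum<tab>text",
// padding the string field with '*'. Returns false if the line is malformed.
bool encodeUpdate(const std::string& line, unsigned char out[32]) {
    std::vector<std::string> t = splitTabs(line);
    if (t.size() != 5 || t[0] != "update") return false;
    unsigned long key = std::strtoul(t[1].c_str(), nullptr, 10);
    unsigned long len = std::strtoul(t[2].c_str(), nullptr, 10);
    long checksum = std::strtol(t[3].c_str(), nullptr, 10);
    if (len > 24 || len != t[4].size()) return false;
    putLE(out, static_cast<std::uint32_t>(key), 2);
    putLE(out + 2, static_cast<std::uint32_t>(len), 2);
    putLE(out + 4, static_cast<std::uint32_t>(checksum), 4);
    for (std::size_t i = 0; i < 24; ++i)
        out[8 + i] = static_cast<unsigned char>(i < len ? t[4][i] : '*');
    return true;
}
```

The resulting bytes would be written at the offset the index returns for the key, or at the end of the file when the key is new.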


Log File Description:


Since this assignment will be graded by TAs, rather than the Curator, the format of the output is left up to you. Of course, your output should be clear, concise, well labeled, and correct. The first two lines should contain your name, section specification (e.g., CS 2604 11:15 MWF), and project title. The next section of the log file should contain some initialization information:

- the names of the dB, command, and log files
- the buffer pool configuration, including the number of slots and the size of each buffer
- the number of records stored in the dB file (same as the number of nodes in the BST)

The remainder of the log file output should come directly from your processing of the command file. You are required to echo each command that you process to the log file so that it's easy to determine which command each section of your output corresponds to. Each command should be numbered, starting with 1, and the output from each command should be well formatted and delimited from the output resulting from processing other commands. A complete sample log is included later in this document.
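Here is a minimal sketch of the per-command framing, loosely following the layout of the sample log: a separator line, then the command number and the echoed command with tabs shown as spaces. The class name `CommandLog` and the 82-character separator are assumptions for illustration.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>   // std::ostringstream is convenient for testing
#include <string>

// Writes a separator, then the next command number (starting at 1) and the
// echoed command text, to the log stream.
class CommandLog {
public:
    explicit CommandLog(std::ostream& out) : out_(out), count_(0) {}

    // Echoes one command (tabs shown as spaces) and returns its number.
    int begin(std::string command) {
        for (char& c : command)
            if (c == '\t') c = ' ';
        ++count_;
        out_ << std::string(82, '+') << '\n'
             << count_ << " Command: " << command << '\n';
        return count_;
    }

private:
    std::ostream& out_;
    int count_;
};
```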

Submitting Your Program:


You will submit a gzipped tar file containing your project to the Curator System (read the Student Guide), and it will be archived until you demo it for one of the GTAs. Instructions for submitting are contained in the Student Guide. You will find a list of the required contents for the gzipped tar file on the course website. Follow the instructions there carefully; it is very common for students to suffer a loss of points (often major) because they failed to include the specified items. Be very careful to include all the necessary source code files. It is amazingly common for students to omit required header or cpp files, or to submit the wrong version of their program. In such a case, it is obviously impossible to perform a test of the submitted program unless the student is allowed to supply the missing files. When that happens, to be fair to other students, we must assess the late penalty that would apply at the time of the demo. To avoid such problems, once you've prepared your gzipped tar file for upload, copy it to a new location, unarchive it, build an executable, and test that executable. If you do that, you can at least be sure you're not submitting an old, incomplete version. You will be allowed up to five submissions for this assignment, in case you need to correct mistakes. Test your program thoroughly before submitting it. If you discover an error, you may fix it and make another submission. Your last submission will be graded, so fixing an error after the due date will result in a late penalty. The submission client can be found at: http://eags.cs.vt.edu:8080/curator/

Programming Standards:
The GTAs will be carefully evaluating your source code on this assignment for programming style, so you should observe good practice. See the Programming Standards page on the course website for specific requirements that should be observed in this course. As always, you should practice good object-centered design and implementation.

Evaluation:
You will schedule a demo with your assigned GTA. At the demo, the TA will supply your submitted project, and you will perform a build and run your program on the supplied test data. The GTA will evaluate the correctness of your results. In addition, the GTA will evaluate your project for good internal documentation and software engineering practice.


Pledge:
Each of your program submissions must be pledged to conform to the Honor Code requirements for this course. Specifically, you must include the pledge statement provided with the earlier project specifications in the header comment for your main source code file.

Sample command script:


    ; Script file for P3
    ;
    ; Buffer pool configuration:
    buffer 64 5
    ;
    ;
    debug index
    ;
    show 467
    show 246
    show 1402
    show 1577
    show 539
    debug buffers
    ;
    show 999
    debug buffers
    ;
    update 467 3 42 abc
    debug buffers
    ;
    ; Quit:
    exit
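The script above opens with the buffer pool geometry line. Here is a minimal sketch of reading it, assuming the line has already been separated from the comments; `parseGeometry` is a hypothetical name. Because `operator>>` skips any whitespace, tab separators need no special handling.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Parses "buffer<tab><buffer size in bytes><tab><# of buffer slots>".
// Returns false if the line is not a well-formed buffer line.
bool parseGeometry(const std::string& line, unsigned& blockSize, unsigned& slots) {
    std::istringstream in(line);
    std::string word;
    if (!(in >> word >> blockSize >> slots) || word != "buffer") return false;
    return blockSize > 0 && slots > 0;
}
```

With the geometry in this script (64-byte buffers, 5 slots), each buffer holds two 32-byte records, so at most ten records are in memory at once.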


Sample log output:


    Programmer: Bill McQuain
    CS 2604 Buffer Pool and Binary I/O

    Database file:  Data.bin
    Command file:   Script.txt
    Log file:       Log.txt

    Number of buffers:     5
    Buffer size in bytes:  64

    ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
    1 Command: debug index
    175:128
    176:96
    246:160
    303:64
    387:224
    467:192
    484:256
    529:32
    539:352
    609:320
    619:384
    670:288
    742:448
    842:416
    879:480
    905:0
    994:608
    1007:576
    1059:640
    1136:544
    1144:704
    1175:672
    1262:736
    1270:512
    1361:832
    1402:800
    1439:864
    1531:768
    1577:928
    1593:896
    1676:960
    ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
    2 Command: show 467
    going to root 905:0
    going left to 529:32
    going left to 303:64
    going right to 467:192
    Record found.
    Record should be at offset 192
    Buffer pool adding new block in buffer #0
    Desired record found in buffer #0
    467 13 10168 vinyguyudmvzb
    ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
    3 Command: show 246
    going to root 905:0
    going left to 529:32
    going left to 303:64
    going left to 176:96
    going right to 246:160
    Record found.
    Record should be at offset 160
    Buffer pool adding new block in buffer #1
    Desired record found in buffer #1
    246 7 3160 fksxdpy
    ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


    4 Command: show 1402
    going to root 905:0
    going right to 1270:512
    going right to 1531:768
    going left to 1402:800
    Record found.
    Record should be at offset 800
    Buffer pool adding new block in buffer #2
    Desired record found in buffer #2
    1402 5 1605 vlnra
    ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
    5 Command: show 1577
    going to root 905:0
    going right to 1270:512
    going right to 1531:768
    going right to 1593:896
    going left to 1577:928
    Record found.
    Record should be at offset 928
    Buffer pool adding new block in buffer #3
    Desired record found in buffer #3
    1577 9 4824 prusempaj
    ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
    6 Command: show 539
    going to root 905:0
    going left to 529:32
    going right to 670:288
    going left to 609:320
    going left to 539:352
    Record found.
    Record should be at offset 352
    Buffer pool adding new block in buffer #4
    Desired record found in buffer #4
    539 14 11648 pqhbsiujwdawzq
    ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
    7 Command: debug buffers
    Buffer  Offset  Bytes
    ------------------------------------------------------------------------------------
    0       192     D3 01 0D 00 B8 27 00 00 76 69 6E 79 67 75 79 75
                    64 6D 76 7A 62 20 20 20 20 20 20 20 20 20 20 20
                    83 01 05 00 4C 06 00 00 68 78 76 6A 62 20 20 20
                    20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
    1       128     AF 00 17 00 9F 7A 00 00 76 64 69 6B 74 73 78 72
                    6C 62 6F 79 75 77 6C 6E 73 70 79 76 6F 70 75 20
                    F6 00 07 00 58 0C 00 00 66 6B 73 78 64 70 79 20
                    20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
    2       768     FB 05 02 00 41 01 00 00 61 70 20 20 20 20 20 20
                    20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
                    7A 05 05 00 45 06 00 00 76 6C 6E 72 61 20 20 20
                    20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
    3       896     39 06 15 00 FA 61 00 00 61 65 79 70 6C 6D 69 6E
                    63 6B 64 6D 66 79 71 71 72 71 69 6A 68 20 20 20
                    29 06 09 00 D8 12 00 00 70 72 75 73 65 6D 70 61
                    6A 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
    4       320     61 02 16 00 47 6D 00 00 79 75 78 67 70 6E 74 69
                    64 78 66 7A 74 72 6C 64 6F 6D 6A 71 6B 76 20 20
                    1B 02 0E 00 80 2D 00 00 70 71 68 62 73 69 75 6A
                    77 64 61 77 7A 71 20 20 20 20 20 20 20 20 20 20
    ------------------------------------------------------------------------------------
    ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
    8 Command: show 999
    going to root 905:0
    going right to 1270:512
    going left to 1136:544


    going left to 1007:576
    going left to 994:608
    going right to empty subtree.
    No record with key value 999
    ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
    9 Command: debug buffers
    Buffer  Offset  Bytes
    ------------------------------------------------------------------------------------
    0       192     D3 01 0D 00 B8 27 00 00 76 69 6E 79 67 75 79 75
                    64 6D 76 7A 62 20 20 20 20 20 20 20 20 20 20 20
                    83 01 05 00 4C 06 00 00 68 78 76 6A 62 20 20 20
                    20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
    1       128     AF 00 17 00 9F 7A 00 00 76 64 69 6B 74 73 78 72
                    6C 62 6F 79 75 77 6C 6E 73 70 79 76 6F 70 75 20
                    F6 00 07 00 58 0C 00 00 66 6B 73 78 64 70 79 20
                    20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
    2       768     FB 05 02 00 41 01 00 00 61 70 20 20 20 20 20 20
                    20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
                    7A 05 05 00 45 06 00 00 76 6C 6E 72 61 20 20 20
                    20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
    3       896     39 06 15 00 FA 61 00 00 61 65 79 70 6C 6D 69 6E
                    63 6B 64 6D 66 79 71 71 72 71 69 6A 68 20 20 20
                    29 06 09 00 D8 12 00 00 70 72 75 73 65 6D 70 61
                    6A 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
    4       320     61 02 16 00 47 6D 00 00 79 75 78 67 70 6E 74 69
                    64 78 66 7A 74 72 6C 64 6F 6D 6A 71 6B 76 20 20
                    1B 02 0E 00 80 2D 00 00 70 71 68 62 73 69 75 6A
                    77 64 61 77 7A 71 20 20 20 20 20 20 20 20 20 20
    ------------------------------------------------------------------------------------
    ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
    10 Command: update 467 3 42 abc
    Putting: 192 3 42 abc
    Writing data in buffer 0
    Record updated.
    ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
    11 Command: debug buffers
    Buffer  Offset  Bytes
    ------------------------------------------------------------------------------------
    0       192     C0 00 03 00 2A 00 00 00 61 62 63 20 20 20 20 20
                    20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
                    83 01 05 00 4C 06 00 00 68 78 76 6A 62 20 20 20
                    20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
    1       128     AF 00 17 00 9F 7A 00 00 76 64 69 6B 74 73 78 72
                    6C 62 6F 79 75 77 6C 6E 73 70 79 76 6F 70 75 20
                    F6 00 07 00 58 0C 00 00 66 6B 73 78 64 70 79 20
                    20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
    2       768     FB 05 02 00 41 01 00 00 61 70 20 20 20 20 20 20
                    20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
                    7A 05 05 00 45 06 00 00 76 6C 6E 72 61 20 20 20
                    20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
    3       896     39 06 15 00 FA 61 00 00 61 65 79 70 6C 6D 69 6E
                    63 6B 64 6D 66 79 71 71 72 71 69 6A 68 20 20 20
                    29 06 09 00 D8 12 00 00 70 72 75 73 65 6D 70 61
                    6A 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
    4       320     61 02 16 00 47 6D 00 00 79 75 78 67 70 6E 74 69
                    64 78 66 7A 74 72 6C 64 6F 6D 6A 71 6B 76 20 20
                    1B 02 0E 00 80 2D 00 00 70 71 68 62 73 69 75 6A
                    77 64 61 77 7A 71 20 20 20 20 20 20 20 20 20 20
    ------------------------------------------------------------------------------------


    ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
    12 Command: exit
    exit command
    ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
    Buffer pool cleaning up:
    Buffer pool writing block in buffer #0 to disk
    Hits:        1
    Misses:      5
    Writebacks:  1

Hex dump of sample binary data file:


    00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
    -----------------------------------------------
    89 03 16 00 E4 6C 00 00 63 7A 63 7A 62 6D 64 64
    6E 78 72 6F 6D 74 76 74 70 75 67 70 64 6B 20 20
    11 02 10 00 73 39 00 00 6B 74 71 64 74 67 66 75
    6A 63 75 74 70 66 66 6D 20 20 20 20 20 20 20 20
    2F 01 0D 00 DB 24 00 00 6C 64 72 64 66 74 6F 64
    65 62 6C 65 65 20 20 20 20 20 20 20 20 20 20 20
    B0 00 0F 00 9D 32 00 00 6C 63 61 70 6F 76 6D 64
    6E 71 6C 77 67 6D 63 20 20 20 20 20 20 20 20 20
    AF 00 17 00 9F 7A 00 00 76 64 69 6B 74 73 78 72
    6C 62 6F 79 75 77 6C 6E 73 70 79 76 6F 70 75 20
    F6 00 07 00 58 0C 00 00 66 6B 73 78 64 70 79 20
    20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
    . . .
    7A 05 05 00 45 06 00 00 76 6C 6E 72 61 20 20 20
    20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
    51 05 12 00 47 49 00 00 6B 68 72 70 75 76 74 6C
    76 66 66 62 6B 6A 69 73 6B 7A 20 20 20 20 20 20
    9F 05 06 00 F0 08 00 00 65 70 77 68 70 69 20 20
    20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
    39 06 15 00 FA 61 00 00 61 65 79 70 6C 6D 69 6E
    63 6B 64 6D 66 79 71 71 72 71 69 6A 68 20 20 20
    29 06 09 00 D8 12 00 00 70 72 75 73 65 6D 70 61
    6A 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
    8C 06 0E 00 4B 2E 00 00 7A 68 71 77 6A 6D 75 74
    70 74 61 6A 78 7A 20 20 20 20 20 20 20 20 20 20
    -----------------------------------------------
    00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
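The dump above, like the debug buffers output, shows bytes as pairs of hex digits, 16 per row. The course posts its own code for this purpose. What follows is only an independent minimal sketch using `std::snprintf`, and the function name `hexRows` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Formats 'len' bytes as two-digit uppercase hex pairs separated by spaces,
// 16 pairs per line, each line ending in '\n'.
std::string hexRows(const unsigned char* data, std::size_t len) {
    std::string out;
    char pair[3];
    for (std::size_t i = 0; i < len; ++i) {
        std::snprintf(pair, sizeof pair, "%02X", static_cast<unsigned>(data[i]));
        out += pair;
        out += (i % 16 == 15 || i + 1 == len) ? '\n' : ' ';
    }
    return out;
}
```

For a debug buffers display, each line could be prefixed with the buffer number and block offset before printing.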
