3/29/2014

CSC190 Project 3 - Engineering Science Class of 1T7 Wiki

CSC190 Project 3
Congratulations on having all earned 100% on Project 1 and Project 2. Unfortunately, those projects were worth 0% of your overall CSC190 grade. Project 3, however, is worth 26%.

Contents
1 Logistics
  1.1 Part A
  1.2 Part B
2 Compiling and Testing
3 Additional Resources
4 Part A
  4.1 Overview
  4.2 Data Types
  4.3 API
  4.4 Starter code
  4.5 Student-developed tools
5 Part B
  5.1 Public Data Types
  5.2 New API Functions
    5.2.1 GetHashTableInfo()
    5.2.2 SetResizeBehaviour()
  5.3 API Changes
    5.3.1 CreateHashTable()
    5.3.2 InsertEntry()
    5.3.3 DeleteEntry()
    5.3.4 GetLoadFactor()
  5.4 Implementation Suggestions

Logistics
Part A
- Work is to be done independently and submitted individually.
- Project 3 Part A is worth 16% of your total CSC190 grade.
- Due on Tuesday, March 25 at 2400 (aka Tuesday midnight).
- The git repository for Project 3 Part A can be cloned using:
https://design.engsci.utoronto.ca/wikis/1t7/index.php/CSC190_Project_3#Part_B_2 1/9

    git clone gitolite@design.engsci.utoronto.ca:projects/p3a/utorid

Note that you must replace utorid with your own UTORid. All submissions must be received via a git push from your clone. Further instructions on using git are available elsewhere on this Wiki.

You must submit the following files:
- HashTable.h
- HashTable.c

HashTable.h should only include the declarations for the functions listed under API, HashTableObject, and HashTablePTR; it should not include the declarations of any additional functions or data structures. Those should be declared in HashTable.c so that they remain "private" to your implementation.

Note that your submission will be compared with the submissions of other students to determine whether code has been shared inappropriately. As per the University of Toronto Academic Integrity Policy (http://www.utoronto.ca/academicintegrity/Academic_integrity.pdf), should there be an academic integrity concern on an assignment worth more than 10%, the process involves the Dean of Engineering.

Part B
- Project 3 Part B is worth 10% of your total CSC190 grade.
- Due on Sunday, April 06 at 2400.
- The git repository for Project 3 Part B can be cloned using:
    git clone gitolite@design.engsci.utoronto.ca:projects/p3b/utorid

Note that you must replace utorid with your own UTORid. All submissions must be received via a git push from your clone. Further instructions on using git are available elsewhere on this Wiki.

You must submit the following files:
- HashTable.h
- HashTable.c

HashTable.h must only include the declarations for the types listed under "Public Data Types" and for the functions that comprise the API. No other declarations are permitted. All definitions should be contained in HashTable.c.

Note that your submission will be compared with the submissions of other students to determine whether code has been shared inappropriately. As per the University of Toronto Academic Integrity Policy (http://www.utoronto.ca/academicintegrity/Academic_integrity.pdf), should there be an academic integrity concern on an assignment worth 10% or less, the process involves the Chair of the Division.

Compiling and Testing

Project 3 follows from the Labs in expecting you to program defensively. As such, your code should compile cleanly (definitely using clang, and ideally using both clang and gcc) with the following command line parameters:
    -std=c99 -Wall -Werror -Wconversion -pedantic -g

When linking you can assume the following command line parameters:
    -lm

Your code should run cleanly under Valgrind using the following command line parameters:
    --quiet --leak-check=full --track-origins=yes

Note that you are not responsible for Valgrind errors related to any 3rd party libraries (such as GNU readline). That said, you should not be relying on such libraries for your core code - but for a tester they're perfectly reasonable.

Additional Resources
Note that these have been sourced by instructors and students.
- http://en.wikipedia.org/wiki/Algorithms_%2B_Data_Structures_%3D_Programs - has a link to a PDF of Wirth's book on data structures

Part A
In Part A of Project 3 you will be implementing a data structure that maps keys to values. Specifically, you will be implementing a hash table (http://en.wikipedia.org/wiki/Hash_table). Implementing this data structure will involve integrating:
- the concept of an API (http://en.wikipedia.org/wiki/Application_programming_interface), as introduced in Lab 06
- the concept of implementing a set of functions defined in a header (and then shipping a compiled object), as introduced in Labs 04 and 06
- the concept of hashing and hash functions (http://en.wikipedia.org/wiki/Hash_function), as introduced in Midterm 02 Question 1

Overview
A hash table is like a Python dictionary in that it allows you to store and retrieve arbitrary data based on a key. You must implement an open hash table with separate, linear chaining (http://en.wikipedia.org/wiki/Hash_table#Separate_chaining).

Specifically you are to use the hash function once to determine the correct "bucket" in which to insert, find, or delete the entry. If there is a different entry in that bucket, then you must create a linked list of elements with the same hash value. In the degenerate case where you have one bucket, you will effectively be creating a linked list.
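The "hash once, then pick a bucket" step can be sketched in C as follows. This is a minimal sketch under assumptions. The project does not prescribe a hash function, so the djb2 string hash and the helper names HashString() and BucketIndex() are hypothetical choices made for illustration. With a single bucket, every key maps to index 0, which is the degenerate linked-list case.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hash function: djb2 (hash = hash * 33 + c).
 * The project does not mandate this; any deterministic string hash works. */
unsigned long HashString(const char *key)
{
    unsigned long hash = 5381UL;
    const unsigned char *p = (const unsigned char *)key;

    while (*p != '\0')
    {
        hash = hash * 33UL + (unsigned long)*p;
        p++;
    }
    return hash;
}

/* Hypothetical helper: hash the key once, then reduce it to [0, bucketCount). */
unsigned int BucketIndex(const char *key, unsigned int bucketCount)
{
    assert(bucketCount > 0);
    return (unsigned int)(HashString(key) % bucketCount);
}
```

The same key always lands in the same bucket. Because of that, InsertEntry(), FindEntry(), and DeleteEntry() could all share one helper of this kind.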

Data Types
    typedef struct
    {
        int sentinel;
        // whatever else you think is important to store in the HashTable struct
    } HashTableObject;

    typedef HashTableObject *HashTablePTR;

Note that HashTableObject.sentinel is included to help you validate that a structure is in fact a HashTableObject. Read the description of CreateHashTable() and the return codes for the other API functions for additional information. Note also that you will need to design and implement additional data structure(s) to implement your hash table.

API
    int CreateHashTable( HashTablePTR *hashTableHandle, unsigned int initialSize );

- initialSize is the initial (and, in Project 3A, final) number of "buckets" in the hash table
- expects that initialSize is > 0
- creates (that is, allocates and initializes) a hash table object
- allocates sufficient space to initialize the hash table
- sets the value of sentinel to 0xDEADBEEF
- sets the value of *hashTableHandle to the address of the newly created hash table object
- returns
  - 0 if the allocation and initialization was successful
  - -1 if there was insufficient memory to allocate the hash table
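Here is a minimal sketch of CreateHashTable(), under assumptions. The spec only requires the sentinel field, so the HashNode type and the bucketCount, entryCount, and buckets fields are hypothetical choices for a separately chained table.

The sketch handles two details explicitly:
- **The sentinel conversion.** 0xDEADBEEF is larger than INT_MAX wherever int is 32 bits, so storing it in the int sentinel relies on an implementation-defined conversion. The explicit cast makes that conversion visible and should keep -Wconversion quiet.
- **Null buckets.** The buckets are set to NULL in a loop rather than with calloc(), because ISO C does not guarantee that all-bits-zero is a null pointer.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical private node type for one entry in a bucket's chain. */
typedef struct HashNode
{
    char *key;              /* the API's own copy of the key string */
    void *data;             /* caller-owned data pointer */
    struct HashNode *next;  /* next entry in the same bucket */
} HashNode;

/* The spec requires `sentinel`; the other fields are assumptions. */
typedef struct
{
    int sentinel;
    unsigned int bucketCount;
    unsigned int entryCount;
    HashNode **buckets;
} HashTableObject;

typedef HashTableObject *HashTablePTR;

int CreateHashTable(HashTablePTR *hashTableHandle, unsigned int initialSize)
{
    HashTablePTR table;

    assert(hashTableHandle != NULL);
    assert(initialSize > 0); /* the spec "expects" this */

    table = malloc(sizeof *table);
    if (table == NULL)
    {
        return -1;
    }
    table->buckets = malloc(initialSize * sizeof *table->buckets);
    if (table->buckets == NULL)
    {
        free(table); /* do not leak the half-built object */
        return -1;
    }
    for (unsigned int i = 0; i < initialSize; i++)
    {
        table->buckets[i] = NULL; /* every bucket starts with an empty chain */
    }
    /* Implementation-defined conversion: 0xDEADBEEF does not fit in a 32-bit int. */
    table->sentinel = (int)0xDEADBEEF;
    table->bucketCount = initialSize;
    table->entryCount = 0;
    *hashTableHandle = table;
    return 0;
}
```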
    int DestroyHashTable( HashTablePTR *hashTableHandle );

- expects that hashTableHandle points to a valid HashTablePTR
- frees all of the memory allocated by the hash table API (which will likely include the "bucket" array, linked list nodes, copies of the keys, etc.)
- sets the value of *hashTableHandle to NULL
- does not free any of the memory allocated outside of the API
- returns
  - 0 if the destruction was successful
  - -1 if the address pointed to by hashTableHandle does not point to a HashTableObject created by CreateHashTable()
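DestroyHashTable() has to walk every chain. The sketch below assumes a hypothetical layout in which HashNode chains hang off a buckets array, and uses a hypothetical helper, IsHashTable(), for the sentinel check. It frees the key copies, the nodes, and the bucket array that the API allocated. It deliberately leaves node->data alone, because that memory was allocated outside the API.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct HashNode
{
    char *key;
    void *data;
    struct HashNode *next;
} HashNode;

/* Hypothetical private layout; only `sentinel` comes from the spec. */
typedef struct
{
    int sentinel;
    unsigned int bucketCount;
    unsigned int entryCount;
    HashNode **buckets;
} HashTableObject;

typedef HashTableObject *HashTablePTR;

/* Hypothetical helper: the sentinel check shared by every API function. */
int IsHashTable(const HashTableObject *table)
{
    return table != NULL && table->sentinel == (int)0xDEADBEEF;
}

int DestroyHashTable(HashTablePTR *hashTableHandle)
{
    HashTablePTR table;

    if (hashTableHandle == NULL || !IsHashTable(*hashTableHandle))
    {
        return -1;
    }
    table = *hashTableHandle;
    for (unsigned int i = 0; i < table->bucketCount; i++)
    {
        HashNode *node = table->buckets[i];
        while (node != NULL)
        {
            HashNode *next = node->next; /* read before freeing the node */
            free(node->key);             /* the API's copy of the key */
            free(node);                  /* node->data is caller-owned: not freed */
            node = next;
        }
    }
    free(table->buckets);
    free(table);
    *hashTableHandle = NULL;
    return 0;
}
```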

    int InsertEntry( HashTablePTR hashTable, char *key, void *data, void **previousDataHandle );

- expects that hashTable points to a valid HashTableObject
- expects that key is a valid C string of length > 0
- stores a copy of the key string (not of the address) to ensure that the key cannot be changed
- stores a copy of the address stored in the data pointer
- if an entry already exists for the key, *previousDataHandle is set to the address of the existing data and data is stored in the hash table
- in the event of a hash collision, InsertEntry chains to a linked list (specifically, it does not probe further into the hash table)
- returns
  - 0 if the insertion was successful
  - 2 if there was an existing entry with the same key (and, by implication, *previousDataHandle now holds the address of the previous data)
  - 1 if there was a hash collision and the insertion was successful
  - -1 if hashTable does not point to a HashTableObject created by CreateHashTable()
  - -2 if there was insufficient memory to complete the insert
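A sketch of InsertEntry() for Part A, under assumptions. It uses a hypothetical HashNode/HashTableObject layout and a djb2-based BucketIndex() helper, both defined in the block.

The sketch works in four steps:
1. It walks the chain first, so that an existing key yields return code 2.
2. It copies the key with malloc() and memcpy(). It avoids strdup() because that function is POSIX rather than C99, so it may not be declared under -std=c99 -pedantic.
3. It pushes new nodes onto the front of the chain.
4. It reports a collision (return code 1) whenever the target bucket was already non-empty. This is an interpretation of the spec, not something the spec states.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct HashNode { char *key; void *data; struct HashNode *next; } HashNode;
typedef struct
{
    int sentinel;              /* required by the spec */
    unsigned int bucketCount;  /* hypothetical fields from here on */
    unsigned int entryCount;
    HashNode **buckets;
} HashTableObject;
typedef HashTableObject *HashTablePTR;

/* Hypothetical djb2 hash reduced to a bucket index. */
static unsigned int BucketIndex(const char *key, unsigned int bucketCount)
{
    unsigned long hash = 5381UL;
    for (const unsigned char *p = (const unsigned char *)key; *p != '\0'; p++)
    {
        hash = hash * 33UL + (unsigned long)*p;
    }
    return (unsigned int)(hash % bucketCount);
}

int InsertEntry(HashTablePTR hashTable, char *key, void *data, void **previousDataHandle)
{
    unsigned int index;
    HashNode *node;
    size_t keyLength;

    if (hashTable == NULL || hashTable->sentinel != (int)0xDEADBEEF)
    {
        return -1;
    }
    assert(key != NULL && key[0] != '\0' && previousDataHandle != NULL);

    index = BucketIndex(key, hashTable->bucketCount);
    for (node = hashTable->buckets[index]; node != NULL; node = node->next)
    {
        if (strcmp(node->key, key) == 0)
        {
            *previousDataHandle = node->data; /* hand back the old data */
            node->data = data;                /* and store the new data */
            return 2;
        }
    }

    node = malloc(sizeof *node);
    if (node == NULL)
    {
        return -2;
    }
    keyLength = strlen(key) + 1;
    node->key = malloc(keyLength);
    if (node->key == NULL)
    {
        free(node);
        return -2;
    }
    memcpy(node->key, key, keyLength);      /* copy the string, not the address */
    node->data = data;
    node->next = hashTable->buckets[index]; /* push onto the front of the chain */
    hashTable->buckets[index] = node;
    hashTable->entryCount++;
    return (node->next != NULL) ? 1 : 0;    /* bucket was non-empty: collision */
}
```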
    int DeleteEntry( HashTablePTR hashTable, char *key, void **dataHandle );

- deletes an entry from the hash table and sets the value of *dataHandle to the address of the data passed to InsertEntry() along with the specified key
- returns
  - 0 if the entry was deleted successfully
  - -1 if hashTable does not point to a HashTableObject created by CreateHashTable()
  - -2 if the key was not found
    int FindEntry( HashTablePTR hashTable, char *key, void **dataHandle );

- sets the value of *dataHandle to the address of the data passed to InsertEntry() along with the specified key
- returns
  - 0 if the entry was found
  - -1 if hashTable does not point to a HashTableObject created by CreateHashTable()
  - -2 if the key was not found
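FindEntry() and DeleteEntry() both reduce to searching one bucket's chain. The hypothetical helpers ChainFind() and ChainDelete() sketch that step on a single linked list. The sentinel check and bucket selection would wrap around them.

ChainDelete() walks a pointer to each "next" link rather than to each node. That way, unlinking the head node needs no special case. The caller's data is handed back through dataHandle and never freed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct HashNode { char *key; void *data; struct HashNode *next; } HashNode;

/* Hypothetical helper behind FindEntry(): 0 if found, -2 if not. */
int ChainFind(HashNode *head, const char *key, void **dataHandle)
{
    for (; head != NULL; head = head->next)
    {
        if (strcmp(head->key, key) == 0)
        {
            *dataHandle = head->data;
            return 0;
        }
    }
    return -2;
}

/* Hypothetical helper behind DeleteEntry(): 0 if deleted, -2 if not found.
 * `link` starts as the address of the bucket's head pointer. */
int ChainDelete(HashNode **link, const char *key, void **dataHandle)
{
    for (; *link != NULL; link = &(*link)->next)
    {
        if (strcmp((*link)->key, key) == 0)
        {
            HashNode *victim = *link;
            *dataHandle = victim->data; /* return the caller's data pointer */
            *link = victim->next;       /* splice the node out of the chain */
            free(victim->key);
            free(victim);
            return 0;
        }
    }
    return -2;
}
```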

    int GetKeys( HashTablePTR hashTable, char ***keysArrayHandle, unsigned int *keyCount );

Note that the previous declaration was: int GetKeys( HashTablePTR hashTable, char *(*keysArrayHandle)[]

- allocates space for an array of strings (aka character pointers) large enough to hold all of the keys
- creates a copy (by both allocating and setting) of each key
- sets the value of *keyCount to the number of keys in the array
- note that the caller is responsible for freeing the keysArray and all associated strings
- returns
  - 0 if the keys were returned successfully
  - -1 if hashTable does not point to a HashTableObject created by CreateHashTable()
  - -2 if there was insufficient memory to allocate space for all of the keys
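One way GetKeys() could be written, assuming a hypothetical chained layout whose entryCount field is kept accurate by InsertEntry() and DeleteEntry().

The sketch allocates the array first, then a separate copy of every key. If any allocation fails, it frees everything allocated so far and returns -2, so the caller never receives a half-built array. It always requests at least one slot, which sidesteps malloc(0): that call may legitimately return NULL.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct HashNode { char *key; void *data; struct HashNode *next; } HashNode;
typedef struct
{
    int sentinel;              /* required by the spec */
    unsigned int bucketCount;  /* hypothetical fields from here on */
    unsigned int entryCount;
    HashNode **buckets;
} HashTableObject;
typedef HashTableObject *HashTablePTR;

int GetKeys(HashTablePTR hashTable, char ***keysArrayHandle, unsigned int *keyCount)
{
    char **keys;
    unsigned int count = 0;

    if (hashTable == NULL || hashTable->sentinel != (int)0xDEADBEEF)
    {
        return -1;
    }
    keys = malloc((hashTable->entryCount > 0 ? hashTable->entryCount : 1) * sizeof *keys);
    if (keys == NULL)
    {
        return -2;
    }
    for (unsigned int i = 0; i < hashTable->bucketCount; i++)
    {
        for (HashNode *node = hashTable->buckets[i]; node != NULL; node = node->next)
        {
            size_t length = strlen(node->key) + 1;
            keys[count] = malloc(length);
            if (keys[count] == NULL)
            {
                while (count > 0) /* undo the partial copy */
                {
                    free(keys[--count]);
                }
                free(keys);
                return -2;
            }
            memcpy(keys[count], node->key, length); /* a copy, not the internal key */
            count++;
        }
    }
    *keysArrayHandle = keys;
    *keyCount = count;
    return 0;
}
```

On success, the caller owns the result: it frees each keys[i] and then keys itself.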
    int GetLoadFactor( HashTablePTR hashTable, float *loadFactor );

- sets the value of *loadFactor to the current load factor (http://en.wikipedia.org/wiki/Hash_table#Key_statistics)
- returns
  - 0 if the load factor was determined and returned successfully
  - -1 if hashTable does not point to a HashTableObject created by CreateHashTable()

Starter code
A copy-paste of the above code snippets in header format: https://gist.github.com/zhuowei/9533905

To use, run:
    wget https://gist.githubusercontent.com/zhuowei/9533905/raw/hashtable.h

Note: This will not work without modification. Modifications are left as an exercise to the reader. ;) (Since this is just the above code in a more usable format, the assumption is that this does not count as submitting someone else's code; feel free to remove this if said assumption is wrong)

Student-developed tools
Note that these tools have been developed by your classmates and have not been "vetted" by the CSC190 instructors. Use at your own risk.
- http://cdecl.org/ <- to translate crazy C declarations (like the aforementioned char * (*keysArrayHandle)[]) into something human readable
- https://github.com/zhuowei/csc190/ <- an unofficial tester for Project 3 (updated Sunday, March 23 with a key order fix; please re-test). To download:
    git clone https://github.com/zhuowei/csc190.git outsidetester

Then copy your hashTable.c and hashTable.h into the new outsidetester folder. To run:
    make
    valgrind --quiet --leak-check=full --track-origins=yes ./hashtableTester < q1input.txt > q1output.txt
    diff q1output.txt q1_zz_output.txt

- https://www.dropbox.com/sh/xyps7isauyfdi66/gaAYv5n_hf <- another, more up-to-date tester. You may have to remove the part that tests the Hash function if your hash function is named differently. Run ./test or ./run for automatic or manual testing, respectively. My hashTable.h is included if you need to find the right names. (Here's a modified version of the above with the Hash function removed, commands modified to match the first tester, and a print function added to the tester proper: https://www.dropbox.com/s/gkls8lkwaxjja7u/moddedtester.zip)

Part B
In Part B of Project 3 you will primarily be improving the performance characteristics of the hash table you developed in Part A. The requirements specified in Part A can result in the following issues:
- an excessive number of collisions (if the number of hash buckets is too small)
- O(n) behaviour when searching through the "chains" of nodes with the same hash value (because you are using a linked list to store the nodes involved in a collision)

To address these issues you will:
- dynamically expand or contract the number of buckets in your hash table depending on the load factor
- replace the linked list "chains" with binary trees (which will be discussed in class next week and are the subject of Lab 09)

Note that Part B focuses on time performance, not on space performance. In other words, you can assume that there is sufficient memory available that you don't have to perform your operations "in place" and can instead temporarily make use of additional memory.
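Expanding or contracting means redistributing every entry over a new number of buckets. The sketch below shows one technique for a Part A style linked-list table. It allocates the new bucket array (the temporary extra memory that Part B permits) and moves each existing node into it, rehashing each key once. No keys or nodes are copied.

This differs from the API-only "copy" approach recommended under Implementation Suggestions. The names HashNode, BucketIndex(), and RehashBuckets(), and the djb2 hash, are all hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct HashNode { char *key; void *data; struct HashNode *next; } HashNode;

/* Hypothetical djb2 hash reduced to a bucket index. */
static unsigned int BucketIndex(const char *key, unsigned int bucketCount)
{
    unsigned long hash = 5381UL;
    for (const unsigned char *p = (const unsigned char *)key; *p != '\0'; p++)
    {
        hash = hash * 33UL + (unsigned long)*p;
    }
    return (unsigned int)(hash % bucketCount);
}

/* Move every node from oldBuckets into a new array of newCount buckets.
 * Returns the new array (and frees the old one), or NULL on allocation
 * failure, in which case oldBuckets is left untouched. */
HashNode **RehashBuckets(HashNode **oldBuckets, unsigned int oldCount, unsigned int newCount)
{
    HashNode **newBuckets;

    assert(newCount > 0);
    newBuckets = malloc(newCount * sizeof *newBuckets);
    if (newBuckets == NULL)
    {
        return NULL;
    }
    for (unsigned int i = 0; i < newCount; i++)
    {
        newBuckets[i] = NULL;
    }
    for (unsigned int i = 0; i < oldCount; i++)
    {
        HashNode *node = oldBuckets[i];
        while (node != NULL)
        {
            HashNode *next = node->next;                      /* save before relinking */
            unsigned int index = BucketIndex(node->key, newCount);
            node->next = newBuckets[index];                   /* push onto new chain */
            newBuckets[index] = node;
            node = next;
        }
        oldBuckets[i] = NULL;
    }
    free(oldBuckets);
    return newBuckets;
}
```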

Public Data Types
Note that this is different from Project 3A. Previously, your HashTableObject was both declared and defined in your public header. In Project 3B, HashTableObject is only declared publicly; HashTableObject is defined in your implementation file.

    typedef struct HashTableObjectTag HashTableObject;
    typedef HashTableObject *HashTablePTR;

    typedef struct HashTableInfoTag
    {
        unsigned int bucketCount;       // current number of buckets
        float loadFactor;               // ( number of entries / number of buckets )
        float useFactor;                // ( number of buckets with one or more entries / number of buckets )
        unsigned int largestBucketSize; // number of entries in the bucket containing the most entries
        int dynamicBehaviour;           // whether or not the Hash Table will resize dynamically
        float expandUseFactor;          // the value of useFactor that will trigger an expansion of the number of buckets
        float contractUseFactor;        // the value of useFactor that will trigger a contraction in the number of buckets
    } HashTableInfo;
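To make the two ratios concrete, consider 4 buckets holding 2, 0, 1, and 0 entries:
- loadFactor is 3/4 = 0.75
- useFactor is 2/4 = 0.5, because two buckets are non-empty
- largestBucketSize is 2

loadFactor can exceed 1, but useFactor never can. The hypothetical helper ComputeBucketStatistics() below derives those fields from an array of per-bucket entry counts. A real GetHashTableInfo() would obtain the counts by walking each chain or tree.

```c
#include <assert.h>

typedef struct HashTableInfoTag
{
    unsigned int bucketCount;
    float loadFactor;
    float useFactor;
    unsigned int largestBucketSize;
    int dynamicBehaviour;
    float expandUseFactor;
    float contractUseFactor;
} HashTableInfo;

/* Hypothetical helper: fill the statistics fields of *info from
 * bucketSizes[0..bucketCount-1]; the behaviour fields are left alone. */
void ComputeBucketStatistics(const unsigned int *bucketSizes, unsigned int bucketCount, HashTableInfo *info)
{
    unsigned int entries = 0, usedBuckets = 0, largest = 0;

    assert(bucketCount > 0);
    for (unsigned int i = 0; i < bucketCount; i++)
    {
        entries += bucketSizes[i];
        if (bucketSizes[i] > 0)
        {
            usedBuckets++; /* counts toward useFactor */
        }
        if (bucketSizes[i] > largest)
        {
            largest = bucketSizes[i];
        }
    }
    info->bucketCount = bucketCount;
    info->loadFactor = (float)entries / (float)bucketCount;
    info->useFactor = (float)usedBuckets / (float)bucketCount;
    info->largestBucketSize = largest;
}
```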

To be blunt, HashTable.h should contain only the declarations above and the function signatures for the API. Pedantically it should provide only the types, not the names, of the function parameters.

New API Functions
GetHashTableInfo()

    int GetHashTableInfo( HashTablePTR hashTable, HashTableInfo *pHashTableInfo );

- fills in the contents of *pHashTableInfo with the values appropriate for hashTable
- assumes that pHashTableInfo points to a HashTableInfo that has been properly allocated by the caller (in other words, GetHashTableInfo() does not allocate any memory)
- returns
  - 0 if *pHashTableInfo was filled in appropriately
  - -1 if hashTable does not point to a HashTableObject created by CreateHashTable()
SetResizeBehaviour()

    int SetResizeBehaviour( HashTablePTR hashTable, int dynamicBehaviour, float expandUseFactor, float contr

- sets the resize behaviour for hashTable
- dynamicBehaviour is either true (nonzero) or false (0)
- expandUseFactor must be larger than contractUseFactor
- the caller is responsible for setting useful values of expandUseFactor and contractUseFactor
- returns
  - 0 if the behaviours were set successfully
  - 1 if contractUseFactor is greater than or equal to expandUseFactor
  - -1 if hashTable does not point to a HashTableObject created by CreateHashTable()
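A sketch of SetResizeBehaviour(), under assumptions. Defining struct HashTableObjectTag privately is how Part B keeps the type opaque to callers; the definition below contains only hypothetical fields needed for this example. The fourth parameter is named contractUseFactor, after the parameter described above.

The spec does not say what happens to the existing settings when the function returns 1. This sketch leaves them unchanged.

```c
#include <assert.h>
#include <stddef.h>

/* Private definition (HashTable.c only); the fields are hypothetical. */
struct HashTableObjectTag
{
    int sentinel;
    int dynamicBehaviour;
    float expandUseFactor;
    float contractUseFactor;
};
typedef struct HashTableObjectTag HashTableObject;
typedef HashTableObject *HashTablePTR;

int SetResizeBehaviour(HashTablePTR hashTable, int dynamicBehaviour,
                       float expandUseFactor, float contractUseFactor)
{
    if (hashTable == NULL || hashTable->sentinel != (int)0xDEADBEEF)
    {
        return -1;
    }
    if (contractUseFactor >= expandUseFactor)
    {
        return 1; /* reject: previous settings are kept */
    }
    hashTable->dynamicBehaviour = dynamicBehaviour != 0; /* normalise to 0 or 1 */
    hashTable->expandUseFactor = expandUseFactor;
    hashTable->contractUseFactor = contractUseFactor;
    return 0;
}
```

When CreateHashTable() applies its defaults, it can pass float literals such as 0.7f and 0.2f. This avoids a double-to-float conversion that -Wconversion can flag.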

API Changes
CreateHashTable()

on initial creation a hash table should be configured as though the following call was made:
    SetResizeBehaviour( hashTable, 1, 0.7, 0.2 )

InsertEntry()

- if dynamicBehaviour is true, then inserting an entry cannot cause useFactor to exceed expandUseFactor
- in the event of a hash collision, InsertEntry chains to a binary search tree
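Replacing the Part A linked-list chain with a binary search tree could look like the sketch below. TreeNode, TreeInsert(), and TreeFree() are hypothetical names, and Lab 09's actual API may differ. Keys are ordered with strcmp().

TreeInsert() reports only 0, 2, or -2. Detecting a collision (return code 1) is left to the caller, which can check whether the bucket's root was non-NULL beforehand. This tree is unbalanced, so a bucket whose keys happen to arrive in sorted order still degrades into a linear chain.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical tree node replacing the Part A linked-list node. */
typedef struct TreeNode
{
    char *key;
    void *data;
    struct TreeNode *left;  /* keys with strcmp() < 0 */
    struct TreeNode *right; /* keys with strcmp() > 0 */
} TreeNode;

/* Insert or replace key in the tree whose root pointer is *link.
 * Returns 0 for a new key, 2 if an existing key's data was replaced
 * (old data stored in *previousDataHandle), or -2 if out of memory. */
int TreeInsert(TreeNode **link, const char *key, void *data, void **previousDataHandle)
{
    while (*link != NULL)
    {
        int cmp = strcmp(key, (*link)->key);
        if (cmp == 0)
        {
            *previousDataHandle = (*link)->data;
            (*link)->data = data;
            return 2;
        }
        link = (cmp < 0) ? &(*link)->left : &(*link)->right;
    }

    TreeNode *node = malloc(sizeof *node);
    if (node == NULL)
    {
        return -2;
    }
    size_t length = strlen(key) + 1;
    node->key = malloc(length);
    if (node->key == NULL)
    {
        free(node);
        return -2;
    }
    memcpy(node->key, key, length);
    node->data = data;
    node->left = node->right = NULL;
    *link = node; /* attach at the empty slot the search ended on */
    return 0;
}

/* Post-order free of the tree's nodes and key copies (not the data). */
void TreeFree(TreeNode *root)
{
    if (root != NULL)
    {
        TreeFree(root->left);
        TreeFree(root->right);
        free(root->key);
        free(root);
    }
}
```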

DeleteEntry()

- if dynamicBehaviour is true, then deleting an entry cannot cause useFactor to become smaller than contractUseFactor
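The insert and delete rules for dynamicBehaviour can be enforced with a check after every insert or delete, assuming the table keeps a count of non-empty buckets (usedBuckets). The helpers NeedsExpansion(), NeedsContraction(), and ResizedBucketCount() are hypothetical, as is the doubling/halving policy; the spec fixes only the thresholds.

Rehashing changes which buckets are occupied, so the new useFactor is not known in advance. A caller would therefore recompute it after resizing and resize again if it is still out of range.

```c
#include <assert.h>

/* Hypothetical check after an insert: nonzero if useFactor > expandUseFactor. */
int NeedsExpansion(unsigned int usedBuckets, unsigned int bucketCount, float expandUseFactor)
{
    assert(bucketCount > 0);
    return (float)usedBuckets / (float)bucketCount > expandUseFactor;
}

/* Hypothetical check after a delete: nonzero if useFactor < contractUseFactor.
 * A one-bucket table cannot shrink any further. */
int NeedsContraction(unsigned int usedBuckets, unsigned int bucketCount, float contractUseFactor)
{
    assert(bucketCount > 0);
    return bucketCount > 1u && (float)usedBuckets / (float)bucketCount < contractUseFactor;
}

/* Hypothetical sizing policy: double to grow, halve (never below 1) to shrink.
 * Overflow when doubling is ignored for brevity. */
unsigned int ResizedBucketCount(unsigned int bucketCount, int grow)
{
    if (grow)
    {
        return bucketCount * 2u;
    }
    return bucketCount > 1u ? bucketCount / 2u : 1u;
}
```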
GetLoadFactor()

the functionality of GetLoadFactor() has been subsumed by GetHashTableInfo() and as such GetLoadFactor() has been deprecated

Implementation Suggestions
- start by implementing GetHashTableInfo(), even if you're returning "garbage" values
  - ... and add printing out the results of that function call to your tester
- then implement SetResizeBehaviour(), even if you're not actually resizing
  - ... and add checking whether the behaviour has been set using GetHashTableInfo()
- check on both of these implementations, since they're how you'll be interacting with the new behaviours
- next, get expansion working
  - first, work out how to "copy" a hash table using only your existing API calls, code up a "proof of concept", and make sure that it's Valgrind clean
  - then handle determining whether or not expansion is necessary
    - ... just output something along the lines of "now would be a good time to expand" to stderr
  - and finally "tie" everything together
- wash, rinse, repeat for contraction
  - ... at this point you're in the 5-6 out of 10 range
- ... "refactor" your linked-list based hash table implementation so that you are using the API described in Lab 08. Seriously. This is probably the most important step if you want to make your life easier.
- complete Lab 09, which has an API that bears a striking resemblance to the API in Lab 08
- "drop in" Lab 09 in place of Lab 08 within your hash table implementation
  - ... and now you're done!

Retrieved from "https://design.engsci.utoronto.ca/wikis/1t7/index.php?title=CSC190_Project_3&oldid=1225"
This page was last modified on 29 March 2014, at 19:06. This page has been accessed 3,250 times.
