
CS107 Winter 2014

Handout 05

CS107 Midterm Examination

February 19th, 2014

This is a closed book, closed note, closed computer exam. You have 120 minutes to complete all problems. You don't need to #include any libraries, and you needn't use assert to guard against any errors. Understand that the majority of points are awarded for concepts taught in CS107, and not prior classes. You don't get many points for for-loop syntax, but you certainly get points for proper use of &, *, and the low-level C functions introduced in the course. If you're taking the exam remotely and have questions, you can telephone Jerry at 415-205-2242.

Good luck!

SUNet ID:   _____________________
Last Name:  _____________________
First Name: _____________________

I accept the letter and spirit of the honor code. I've neither given nor received aid on this exam. I pledge to write more neatly than I ever have in my entire life.

[signed] __________________________________________________________

                             Score          Grader
1. The accumulate generic    [13] ______    ______
2. CLexicon, Take II         [15] ______    ______
3. Short Answers             [17] ______    ______
Total                        [45] ______    ______

CVector Functions

typedef int (*CompareFn)(const void *addr1, const void *addr2);
typedef void (*CleanupElemFn)(void *addr);
typedef struct CVectorImplementation CVector;

CVector *cvec_create(int elemsz, int capacity_hint, CleanupElemFn fn);
void cvec_dispose(CVector *cv);
int cvec_count(const CVector *cv);
void *cvec_nth(const CVector *cv, int index);
void cvec_insert(CVector *cv, const void *addr, int index);
void cvec_append(CVector *cv, const void *addr);
void cvec_replace(CVector *cv, const void *addr, int index);
void cvec_remove(CVector *cv, int index);
int cvec_search(const CVector *cv, const void *key, CompareFn cmp, int start, bool sorted);
void cvec_sort(CVector *cv, CompareFn cmp);
void *cvec_first(CVector *cv);
void *cvec_next(CVector *cv, void *prev);

Other Functions
void *memcpy(void *dest, const void *src, size_t n);
void *memmove(void *dest, const void *src, size_t n);
void *malloc(size_t size);
void *realloc(void *ptr, size_t size);
void free(void *ptr);

size_t strlen(const char *s);
char *strcpy(char *dest, const char *src);
char *strncpy(char *dest, const char *src, size_t n);
char *strdup(const char *s);
char *strcat(char *dest, const char *src);

Problem 1: The accumulate generic [13 points]

Consider the following two functions, each of which should be easy to understand:

int int_array_product(const int array[], size_t n) {
    int result = 1;
    for (size_t i = 0; i < n; i++) {
        result = result * array[i];
    }
    return result;
}

double double_array_sum(const double array[], size_t n) {
    double result = 0;
    for (size_t i = 0; i < n; i++) {
        result = result + array[i];
    }
    return result;
}

There's no denying the two routines are algorithmically trivial, but it's still worth pointing out their similarities:

• Each one operates on an array of values, and in fact the implementation pledges to never change anything within the array.
• Each starts with an initial value (1 for int_array_product, 0.0 for double_array_sum) and ultimately produces a return value whose data type exactly matches the base type of the supplied array.
• The return value of interest is generated by n applications of a binary function, where the first argument is the partial result of what's accumulated via prior iterations, and the second argument is the next element in the array.

The above observations are explicitly spelled out so that you can implement a generic accumulate function that not only takes the base address of an array and its effective length, but also takes the size of the array element, the binary function that should be repeatedly applied, the address of the default value for empty arrays, and the address where the overall result should be placed. The generalization of a binary function can be captured by the following type definition:
typedef void (*BinaryFunc)(void *partial, const void *next, void *result);

Any function that can interpret the data at the first two addresses, combine them, and place the result at the location identified by the third address falls into the BinaryFunc function class. (The const appears with the second address because it's expected that the array elements, which can't be modified, are passed by address through the next parameter.)
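As an illustration of the BinaryFunc shape only, here is a minimal sketch of a callback that adds two doubles. The name add_two_doubles is hypothetical and isn't one of the callbacks the problem asks for; the sketch assumes all three addresses point to valid double storage.

```c
typedef void (*BinaryFunc)(void *partial, const void *next, void *result);

/* Hypothetical callback (not part of the exam): interprets the first two
 * addresses as doubles, adds them, and writes the sum to the third.
 * Both operands are read before the write, so the callback still works
 * when result aliases partial. */
void add_two_doubles(void *partial, const void *next, void *result) {
    const double *lhs = partial;
    const double *rhs = next;
    double sum = *lhs + *rhs;
    *(double *)result = sum;
}
```

A caller that holds a BinaryFunc never needs to know the element type; it only forwards addresses, which is what lets one generic routine serve ints, doubles, and strings alike.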

a) [3 points] First, implement the generic accumulate routine, which accepts an array of elements, the array's size, and the size of each array element, and repeatedly applies the supplied binary function as necessary to produce a result and places that result at the supplied address. We've provided some of the more obvious code that should contribute to your implementation. You should fill in the three arguments needed so that memcpy can set the space addressed by result to be identical to the space addressed by init, and then go on to fill in the body of the for loop.
void accumulate(const void *base, size_t n, size_t elem_size,
                BinaryFunc fn, const void *init, void *result) {
    memcpy(________________, ________________, ________________);
    for (size_t i = 0; i < n; i++) {

    }
}

b) [4 points] Now reimplement the int_array_product function from the previous page to leverage the accumulate function you just implemented for part a). Assume the callback function passed to accumulate (which you must implement) is called multiply_two_numbers.
static void multiply_two_numbers(void *partial, const void *next, void *result) {

int int_array_product(const int array[], size_t n) {

c) [6 points] Finally, implement the concatenate_strings function to leverage your accumulate function and a custom callback function to return a dynamically allocated string that is the ordered concatenation of all strings in the supplied array of the given length. For instance, the following code snippet:

const char *words[] = {"the", "re", "abo", "ut"};
char *concatenation = concatenate_strings(words, 4);
printf("The concatenation is the string \"%s\".", concatenation);
free(concatenation);

would print:
The concatenation is the string "thereabout".

Your implementation should only call accumulate once and should be careful to not leak any memory. Assume the name of your callback function (which you must also implement) is concatenate_two_strings, which needs to be passed to your one call to accumulate. concatenate_two_strings should dynamically allocate just enough space to store the concatenation of the two '\0'-terminated C strings associated with partial and next, and drop that '\0'-terminated concatenation in the space associated with result.
static void concatenate_two_strings(void *partial, const void *next, void *result) {

char *concatenate_strings(const char *array[], size_t n) {

Problem 2: The CLexicon, Take II [15 points]

The lexicon is one of those data structures that operates much like a traditional dictionary, except it stores just the words but not the definitions. Lexicons are typically used in applications that need to verify that tokens are meaningful strings of text: perhaps an English word, or a medical term, or a stock symbol.

The most obvious way to implement a lexicon is to store a sorted array or CVector of dynamically allocated strings, as it's simple and allows search functionality to run fairly quickly. Another approach might use a CMap (mapping all keys to a dummy value that's ignored, using the NULL versus non-NULL value returned by cmap_get to decide if a key is present) so that lookup can run in average O(1) time.

All of the above approaches share a similar problem: all of the strings are allocated individually, thereby fragmenting the heap into lots and lots of little nodes that make dynamic memory allocation in other parts of the program more work.

One approach that avoids fragmentation: lay down all of the '\0'-terminated C strings side by side in a single stream of bytes, and use a CVector of offsets to index into that stream to enable fast lookup. To simplify the problem, we'll require that the stream of bytes storing the strings always be sized to be exactly what's needed, and to simplify the functionality used to add a word, we'll always just append new strings to the end of the stream. However, the numbers within the offsets CVector are ordered such that they identify all of the words in the CLexicon in alphabetical order.

Here's the reduced interface for the CLexicon data type:
typedef struct CLexiconImpl CLexicon;
struct CLexiconImpl {
    CVector *offsets;  // CVector of offsets into the characters array
    char *characters;  // the single block of memory that stores all strings
    int length;        // the number of bytes addressed by the characters field
};

CLexicon *clex_create();
bool clex_contains(CLexicon *lex, const char *word);
void clex_add(CLexicon *lex, const char *word);
void clex_dispose(CLexicon *lex);

If a lexicon is constructed using clex_create, and then clex_add is used to insert "be", "dabble", "cusp", and "apes" in that order, the parts of the CLexicon most relevant to the problem would look as follows:

length:      20
characters:  b e \0 d a b b l e \0 c u s p \0 a p e s \0
offsets:     15 0 10 3
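To make the illustrated layout concrete, here is a small sketch that writes the same data as plain C arrays and reads the words back in alphabetical order. It's an assumption-laden stand-in: a plain int array takes the place of the CVector of offsets, and the names lexicon_characters, lexicon_offsets, word_at_rank, and offsets_are_sorted are hypothetical, not part of the CLexicon interface.

```c
#include <string.h>

/* The 20 bytes from the illustration: 19 written out, plus the '\0'
 * the compiler appends to the string literal. */
const char lexicon_characters[20] = "be\0dabble\0cusp\0apes";

/* Assumption: a plain int array stands in for the CVector of offsets. */
const int lexicon_offsets[4] = {15, 0, 10, 3};

/* Returns the word at the given alphabetical rank by indexing into the
 * byte stream; no copying and no per-word allocation is involved. */
const char *word_at_rank(const char *characters, const int offsets[], int rank) {
    return characters + offsets[rank];
}

/* Returns 1 if the offsets list the words in strictly increasing
 * alphabetical order, and 0 otherwise. */
int offsets_are_sorted(const char *characters, const int offsets[], int count) {
    for (int i = 1; i < count; i++) {
        if (strcmp(characters + offsets[i - 1], characters + offsets[i]) >= 0) {
            return 0;
        }
    }
    return 1;
}
```

Ranks 0 through 3 yield "apes", "be", "cusp", and "dabble", even though the bytes themselves sit in insertion order.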

Here's a description of the illustration of the four-word lexicon: The four strings are laid down side by side in the array of bytes addressed by the characters field, and they're laid down in the order they were inserted. offsets stores 15, 0, 10, and 3, because they each identify the index where one of the four words begins. And because "apes" (starting at offset 15) is alphabetically less than the other three words, the 15 occupies position 0. "be" comes next alphabetically, so the offset of the 'b' (0 in this case) occupies position 1. And so on.

a) [5 points] Present implementations for clex_create and clex_dispose. clex_create dynamically allocates and returns an empty CLexicon, whereas clex_dispose releases any resources held by the supplied CLexicon.
typedef struct CLexiconImpl CLexicon;
struct CLexiconImpl {
    CVector *offsets;  // CVector of offsets into the characters array
    char *characters;  // the single block of memory that stores all strings
    int length;        // the number of bytes addressed by the characters field
};

CLexicon *clex_create() {

void clex_dispose(CLexicon *lex) {

b) [5 points] Now implement clex_contains, which uses cvec_search to determine whether or not word is included in the CLexicon attached to lex. Your implementation:

• should run in O(lg n) time, where n is the number of strings stored in the lexicon,
• should use cvec_search instead of rewriting a binary search routine,
• should make use of an auxiliary struct auxdata (which you'll define yourself) to pass necessary information to the helper comparison function (which we'll assume is called int_cmp), and
• may assume the key passed to cvec_search is always passed as the first argument to the helper comparison function.

struct auxdata {

int int_cmp(const void *addr1, const void *addr2) {

bool clex_contains(CLexicon *lex, const char *word) {

c) [5 points] Finally, implement the clex_add function, which inserts word into the CLexicon attached to lex if the word isn't already present. If word is already present, then return without modifying anything. If the word needs to be inserted, then the characters array must be resized to perfectly accommodate all of the previously inserted strings and the one being inserted. The strings within the characters array shouldn't be sorted, but the offsets within offsets should be ordered such that clex_contains can run in logarithmic time. You should avoid cvec_search, opting instead to do a linear search for the insertion index within offsets. (Just use an exposed for loop for the linear search; don't use lfind or lsearch or anything like that.)
void clex_add(CLexicon *lex, const char *word) {

Problem 3: Short Answers [17 points]

[6 points] For the following question, you're to print the decimal address the pointer arithmetic expression would produce (or say that the pointer arithmetic expression is illegal if it's indeed illegal when compiled as we've compiled all of our CS107 assignments). You should assume that base evaluates to 1000, that chars are one byte in size, shorts are two bytes in size, doubles are eight bytes in size, and all pointers are four bytes in size. What address is produced by

a) (char *) base + 10 ?
b) (char **) base + 10 ?
c) (void *) base + 10 ?
d) (double **) base + 10 ?
e) (double ****) base + 10 ?
f) (void **)((short *) base + 10) + 10 ?
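As a reminder of the general scaling rule only (and not the answers to the expressions listed), the sketch below measures how many bytes a pointer moves when an integer is added to it. The helper names are hypothetical, and the sketch uses sizeof rather than the exam's stated sizes, since the machine compiling it may well have eight-byte pointers.

```c
#include <stddef.h>

/* Hypothetical helpers: each reports the byte distance between p and
 * p + k once p is viewed as a pointer to a particular type. The caller
 * must pass a suitably aligned address with at least k elements of
 * room behind it. */
ptrdiff_t bytes_advanced_as_char(void *p, int k) {
    return ((char *)p + k) - (char *)p;
}

ptrdiff_t bytes_advanced_as_int(void *p, int k) {
    return (char *)((int *)p + k) - (char *)p;
}

ptrdiff_t bytes_advanced_as_int_ptr(void *p, int k) {
    return (char *)((int **)p + k) - (char *)p;
}
```

In each case the distance is k times the size of the pointed-to type, so what matters is the type one level below the outermost *.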

[5 points] For the next question, step through the code very carefully and list the output produced on a little endian system. Recall that on a little endian system, the least significant byte of a multi-byte figure is placed at the lowest address. (%x is the printf placeholder to print ints, shorts, and chars as hexadecimal integers without any leading zeroes.)
int a = 0x12345678;
int b = a & 0xF;
char c = a;
char d = *(char *)&a;
short e = *(short *)((char *)&a + 1);
char f = *(char *)((short *)&a + 1);
printf("%x\n", b);
printf("%x\n", c);
printf("%x\n", d);
printf("%x\n", e);
printf("%x\n", f);
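To see the byte-order rule in isolation, here is a minimal sketch that uses a different value from the one in the question. The helper names lowest_address_byte and is_little_endian are hypothetical, and the sketch assumes a four-byte unsigned int.

```c
#include <string.h>

/* Copies out the byte stored at the lowest address of value. memcpy
 * sidesteps any aliasing or alignment concerns. */
unsigned char lowest_address_byte(unsigned int value) {
    unsigned char byte;
    memcpy(&byte, &value, 1);
    return byte;
}

/* On a little endian system, the least significant byte of 1 sits at
 * the lowest address, so this returns 1 there and 0 on big endian. */
int is_little_endian(void) {
    return lowest_address_byte(1u) == 1;
}
```

On a little endian machine, lowest_address_byte(0xAABBCCDD) yields 0xDD; on a big endian machine with four-byte ints it yields 0xAA.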

[6 points] Even though they're very rare, some UTF8-encoded characters require five (five!) bytes of memory. When five bytes are needed, the leading byte of the five-byte figure is constrained to begin with the bit pattern 111110, and each of the remaining four bytes is constrained to begin with the bit pattern 10. That means the general format of any five-byte character (when encoded using UTF8) would look like this:

111110ab 10cdefgh 10ijklmn 10opqrst 10uvwxyz

The code point (which is the UTF8 equivalent of an ASCII number) is the unsigned int represented when bits a through z are packed side by side. You're to implement a function called compute_code_point that accepts a const unsigned char *, known to address the leading byte of a five-byte UTF8-encoded character, and constructs the unsigned int whose bit pattern is:

000000ab cdefghij klmnopqr stuvwxyz

You should use bitmask operations like <<, >>, &, |, ~, and ^ to synthesize the code point within a four-byte unsigned int and then return it. (Note that buf is not '\0'-terminated, since it's understood to be of length five.)
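For comparison only, here is a sketch of the same mask-and-shift idea applied to the much more common two-byte form, 110abcde 10fghijk, whose code point is 00000abc defghijk. The function name two_byte_code_point is hypothetical, and the sketch assumes buf addresses a well-formed two-byte sequence.

```c
/* Hypothetical helper for the two-byte UTF8 form 110abcde 10fghijk.
 * Assumes buf addresses a valid two-byte sequence; nothing is validated. */
unsigned int two_byte_code_point(const unsigned char buf[]) {
    unsigned int high = buf[0] & 0x1F;  /* strip the 110 prefix, keep abcde */
    unsigned int low = buf[1] & 0x3F;   /* strip the 10 prefix, keep fghijk */
    return (high << 6) | low;           /* make room for six low bits, then merge */
}
```

For example, the bytes 0xC3 0xA9 decode to 0xE9. The five-byte case follows the same pattern with a narrower mask on the leading byte and more continuation bytes to fold in.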
unsigned int compute_code_point(const unsigned char buf[]) {