91.102: Honors Computing II Data Structures

Willie Boag
Spring 2013 May 10, 2013

Table of Contents

Kruskal’s MST
Abstract, Description, Algorithm, Challenges, Testing, C Implementation

Simple Line Editor
Abstract, Description, Drawbacks, Reflection, C Implementation

Topological Sort
Abstract, Description, Algorithm, Testing, Reflection, C Implementation

Bloom Filters
Abstract, Description, Benefits, Drawbacks, Implementation Choices, Motivation, Results, C Implementation

Fast Fourier Transform
Abstract, Fourier Transform, Original Algorithm, Fast Fourier Transform, A Taste of Recursion, Why It Matters, Conclusion, C Implementation

Appendices
A: Kruskal’s MST Testing Code
B: Topological Sort Testing Code

Kruskal’s MST
Abstract: A minimum spanning tree (MST) of a graph is a subgraph that connects all of the vertices of the original graph in such a way that the sum of its edge weights is as small as possible. Many algorithms have been developed that efficiently find an MST of a graph. Kruskal’s algorithm is one of them, and it is an example of a greedy algorithm.

Description: A graph is a set of vertices and the edges that connect them. The edges of the graph can be either one-directional (directed) or two-directional (undirected). Often, every edge also has an associated weight. For instance, if you represent a road map as a graph, then the cities would be vertices, roads would be edges, and the time it takes to drive from one city to another would be represented by the weight of the edge that joins them. It is often useful to find the minimum spanning tree of a given graph. An MST contains no cycles, which means that it leaves out some edges of the original graph. One necessary condition for an MST to exist is that the graph must be connected. Since an MST of a graph can only be formed by removing edges, it would be impossible to connect every vertex if the graph starts out disconnected. This issue had to be addressed when I was generating test cases for my program.

Algorithm: Kruskal’s algorithm is greedy. A greedy algorithm is one that decides what action to take based on the best immediate choice. This MST algorithm is greedy because it decides which edge to consider next by selecting the remaining edge of lowest weight. At any point in the process, the graph that stores the answer is a forest (a forest is just a collection of unconnected trees). At each iteration of the algorithm, the next edge of minimum weight is selected, and a decision is made as to whether to add that edge to the forest: if the edge combines two trees into one larger tree, then add it, but if the edge would create a cycle in a current tree, then discard it. Once the forest has been unified into one large tree, the MST has been found.
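The greedy loop described above can be sketched in a few lines of C. This is only an illustration under stated assumptions, not the implementation listed later in this report: it swaps the array of sets for a simple parent array (a disjoint-set forest) and the heap for `qsort()`, and the names `kedge`, `find_root`, and `kruskal_total` are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical edge record for this sketch (not the report's edge type). */
typedef struct { int from, to, weight; } kedge;

/* Order edges by increasing weight for qsort(). */
static int cmp_weight(const void *a, const void *b) {
    const kedge *x = a, *y = b;
    return (x->weight > y->weight) - (x->weight < y->weight);
}

/* Follow parent links to the representative of v's tree. */
static int find_root(int *parent, int v) {
    while (parent[v] != v) {
        parent[v] = parent[parent[v]];  /* shorten the path as we go */
        v = parent[v];
    }
    return v;
}

/*
 * Sort the edges by weight, then keep each edge that joins two
 * different trees of the forest and discard each edge that would
 * close a cycle. Returns the total weight of the kept edges, or -1
 * if the graph is disconnected or memory runs out.
 * Note: edges[] is reordered by the sort. Assumes n >= 1.
 */
long kruskal_total(int n, kedge *edges, int m) {
    int *parent = malloc(n * sizeof *parent);
    long total = 0;
    int i, used = 0;

    if (parent == NULL)
        return -1;
    for (i = 0; i < n; i++)
        parent[i] = i;              /* every vertex starts as its own tree */

    qsort(edges, m, sizeof *edges, cmp_weight);

    for (i = 0; i < m && used < n - 1; i++) {
        int r1 = find_root(parent, edges[i].from);
        int r2 = find_root(parent, edges[i].to);
        if (r1 != r2) {             /* edge merges two trees: keep it */
            parent[r1] = r2;
            total += edges[i].weight;
            used++;
        }
    }
    free(parent);
    return (used == n - 1) ? total : -1;
}
```

Returning -1 for a disconnected graph is a design choice of this sketch; it makes the "graph must be connected" precondition visible to the caller instead of looping forever on an empty edge list.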

Challenges: While writing this program, the two most challenging functions to write were KruskalMST() and set_union(). Because we were given partially completed code to start from, some of the implementation choices were made for us. As a result, I found that Collection (the array of sets) was a little clumsy to work with. I think I would have preferred a set of sets, because removing an element was less intuitive when using the array. Replacing two sets with their union created holes in the array, which seemed more awkward than it needed to be. The biggest problem that I had during this assignment was freeing all of my allocated space. I probably spent three times as much time trying to find all of my un-freed space as I did actually writing the program. I knew that I had un-freed space because I ran the command valgrind --leak-check=full --track-origins=yes -v, which monitored my allocated memory and returned a summary of how many allocations were still unfreed when my program terminated. This command was shown to me last semester, although I do not know much at all about how it actually works. That being said, it was very helpful! After a few days of searching for my memory leaks, I finally fixed them all. The most subtle allocation bug that I saw came from set_union(), where any element that was in both sets S1 and S2 would have one copy of its data stored in the union while the other copy was forgotten about. I could have fixed the problem by freeing the extra copy inside set_union(), but I did not want to modify the arguments. Ultimately, I decided to copy all of the data into the new set rather than just passing pointers. This made freeing the data much more straightforward.
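The ownership rule that resolved the leak (the union owns fresh copies, so each input set can be freed independently) can be shown with a small sketch. It uses a hypothetical `int_set` type rather than the generic `set` from the listing, and the helper names are assumptions made for illustration only.

```c
#include <stdlib.h>

/* Hypothetical minimal set: an array of pointers to heap-allocated ints. */
typedef struct { int **items; int count; } int_set;

static int contains(const int_set *s, int v) {
    int i;
    for (i = 0; i < s->count; i++)
        if (*s->items[i] == v)
            return 1;
    return 0;
}

/* Store a freshly allocated copy of v. Returns 0 on success. */
static int add_copy(int_set *s, int v) {
    int *p = malloc(sizeof *p);
    if (p == NULL)
        return -1;
    *p = v;
    s->items[s->count++] = p;
    return 0;
}

/* Build a set from an array of values, skipping duplicates. */
int make_set(int_set *s, const int *vals, int n) {
    int i;
    s->count = 0;
    s->items = malloc((n > 0 ? n : 1) * sizeof *s->items);
    if (s->items == NULL)
        return -1;
    for (i = 0; i < n; i++)
        if (!contains(s, vals[i]) && add_copy(s, vals[i]) != 0)
            return -1;
    return 0;
}

/*
 * out = a U b. Every element of out is a new copy, so a and b are
 * never modified and no pointer is shared between the three sets.
 */
int union_copy(const int_set *a, const int_set *b, int_set *out) {
    int i, cap = a->count + b->count;
    out->count = 0;
    out->items = malloc((cap > 0 ? cap : 1) * sizeof *out->items);
    if (out->items == NULL)
        return -1;
    for (i = 0; i < a->count; i++)
        if (add_copy(out, *a->items[i]) != 0)
            return -1;
    for (i = 0; i < b->count; i++)
        if (!contains(out, *b->items[i]) && add_copy(out, *b->items[i]) != 0)
            return -1;
    return 0;
}

/* Free every element, then the array itself. */
void free_set(int_set *s) {
    int i;
    for (i = 0; i < s->count; i++)
        free(s->items[i]);
    free(s->items);
    s->items = NULL;
    s->count = 0;
}
```

With this rule, a caller can call `free_set()` on both inputs right after `union_copy()` and the union stays valid, which is the property that made the cleanup in KruskalMST() straightforward.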

Testing: To test my MST program, I needed lots of test graphs, so I wrote a program that generated random graphs. Unfortunately, my first attempt produced graphs that were not always connected. Since an MST requires a connected graph, that approach failed. The solution, however, was very simple. After generating a random graph, I added an edge with a very large weight from vertex 0 to every other vertex. This allowed the algorithm to function normally, with the back-up edges connecting everything to vertex 0 if need be. This ensured that my graphs were always connected.
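A minimal sketch of that generator idea follows. It is not the testing code from Appendix A: the function name, the `tedge` type, and both weight constants are assumptions chosen for illustration (the back-up weight only needs to be larger than any ordinary weight and smaller than UNUSED_WEIGHT).

```c
#include <stdlib.h>

#define BACKUP_WEIGHT     10000  /* assumed "very large" weight */
#define MAX_RANDOM_WEIGHT 100    /* assumed cap for ordinary edges */

/* Hypothetical edge record for this sketch. */
typedef struct { int from, to, weight; } tedge;

/*
 * Fill out[] with up to `extra` random edges on vertices 0..n-1,
 * followed by one back-up edge from vertex 0 to every other vertex.
 * out[] must have room for extra + n - 1 edges.
 * Returns the number of edges written.
 */
int make_test_graph(int n, int extra, unsigned seed, tedge *out) {
    int count = 0, i, v;

    srand(seed);
    for (i = 0; i < extra && n > 1; i++) {
        int a = rand() % n;
        int b = rand() % n;
        if (a == b)
            continue;                   /* skip self-loops */
        out[count].from = a;
        out[count].to = b;
        out[count].weight = 1 + rand() % MAX_RANDOM_WEIGHT;
        count++;
    }

    /* Back-up edges: these alone already connect the graph. */
    for (v = 1; v < n; v++) {
        out[count].from = 0;
        out[count].to = v;
        out[count].weight = BACKUP_WEIGHT;
        count++;
    }
    return count;
}
```

Because every back-up edge is heavier than every random edge, a greedy MST algorithm only falls back on one when the random edges leave a vertex unreachable.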

C Implementation:

Makefile: Makefile
Header Files: globals.h, graph.h, heap.h, queue.h, queue_interface.h, set.h, setinterface.h
Source Files: main.c, globals.c, graph.c, heap.c, queue_interface.c, set.c, setinterface.c

#
# Programmer: Willie Boag
#
# Makefile for Kruskal’s Minimum Spanning Tree
#

mst: main.o globals.o graph.o heap.o queue_interface.o set.o setinterface.o
	gcc -g -o mst main.o globals.o graph.o heap.o queue_interface.o set.o setinterface.o

main.o: main.c graph.h globals.h
	gcc -g -ansi -pedantic -Wall -c main.c

globals.o: globals.c globals.h
	gcc -g -ansi -pedantic -Wall -c globals.c

graph.o: graph.c globals.h graph.h queue.h queue_interface.h set.h setinterface.h
	gcc -g -ansi -pedantic -Wall -c graph.c

heap.o: heap.c globals.h heap.h
	gcc -g -ansi -pedantic -Wall -c heap.c

queue_interface.o: queue_interface.c queue_interface.h queue.h graph.h globals.h
	gcc -g -ansi -pedantic -Wall -c queue_interface.c

set.o: set.c set.h globals.h
	gcc -g -ansi -pedantic -Wall -c set.c

setinterface.o: setinterface.c globals.h set.h setinterface.h
	gcc -g -ansi -pedantic -Wall -c setinterface.c

clean:
	rm -f *.o

/********************************************************************/
/* Programmer: Willie Boag                                          */
/*                                                                  */
/* globals.h (Kruskal’s MST)                                        */
/********************************************************************/
#ifndef _globals
#define _globals

#define DATA( L ) ( ( L ) -> datapointer )
#define NEXT( L ) ( ( L ) -> next )

typedef enum { OK, ERROR } status ;
typedef enum { FALSE=0 , TRUE=1 } bool ;
typedef void *generic_ptr ;

extern int compare_vertex( generic_ptr *a, generic_ptr *b ) ;

#endif

/********************************************************************/
/* Programmer: Willie Boag                                          */
/*                                                                  */
/* graph.h (Kruskal’s MST)                                          */
/********************************************************************/
#ifndef _graph
#define _graph

#include "globals.h"

typedef int vertex ;

typedef struct {
    int weight ;
    vertex vertex_number ;
    int from ;
} edge ;

typedef enum { directed, undirected } graph_type ;

typedef struct {
    graph_type type ;
    int number_of_vertices ;
    edge **matrix ;
} graph_header, *graph ;

#define UNUSED_WEIGHT (32767)
#define WEIGHT(p_e) ((p_e) -> weight )
#define VERTEX(p_e) ((p_e) -> vertex_number )

extern status init_graph( graph *p_G, int vertex_cnt, graph_type type ) ;
extern void   destroy_graph( graph *p_G ) ;
extern status add_edge( graph G, vertex vertex1, vertex vertex2, int weight ) ;
extern void   graph_size( graph G, int *p_vertex_cnt, int *p_edge_cnt ) ;
extern status write_graph( graph G ) ;
extern status KruskalMST( graph G, graph *T ) ;
extern int    min_weight( graph G ) ;

#endif

/***************************************************************/
/* Programmer: Willie Boag                                     */
/*                                                             */
/* heap.h (Kruskal’s MST)                                      */
/***************************************************************/
#ifndef _heap
#define _heap

#define HEAPINCREMENT 128

#include "globals.h"

typedef struct {
    generic_ptr *base ;
    int nextelement ;
    int heapsize ;
} heap ;

extern status init_heap( heap *p_H ) ;
extern bool   empty_heap( heap *p_H ) ;
extern status heap_insert( heap *p_H , generic_ptr data , int (*p_cmp_f) () ) ;
extern status heap_delete( heap *p_H, int element, generic_ptr *p_data, int (*p_cmp_f)() ) ;

#endif

/*******************************************************************/
/* Programmer: Willie Boag                                         */
/*                                                                 */
/* queue.h (Kruskal’s MST)                                         */
/*******************************************************************/
#ifndef _queue
#define _queue

#include "heap.h"

typedef heap queue ;

#define init_queue( p_Q )              init_heap( (heap *) p_Q)
#define empty_queue( p_Q)              empty_heap( (heap *) p_Q)
#define qadd(p_Q, data, p_cmp_f )      heap_insert( (heap *) p_Q, data, p_cmp_f)
#define qremove(p_Q, p_data, p_cmp_f)  heap_delete( (heap *) p_Q, 0, p_data, p_cmp_f)

#endif

/************************************************************/
/* Programmer: Willie Boag                                  */
/*                                                          */
/* queue_interface.h (Kruskal’s MST)                        */
/************************************************************/
#ifndef _queueinterface
#define _queueinterface

#include "queue.h"

extern status qadd_edge(queue *p_Q , int from, int to, int weight, int (*p_cmp_func)() ) ;
extern status qremove_edge(queue *p_Q, int *from, int *to, int *weight, int (*p_cmp_func)() ) ;

#endif

/************************************************************/
/* Programmer: Willie Boag                                  */
/*                                                          */
/* set.h (Kruskal’s MST)                                    */
/************************************************************/
#ifndef _set
#define _set

#include "globals.h"

typedef struct {
    generic_ptr *base ;
    generic_ptr *free ;
    int universe_size ;
} set ;

extern status init_set( set *p_S, int size ) ;
extern status set_insert( set *p_S, generic_ptr element, int (*p_cmp_f)() ) ;
extern bool   set_member( set *p_S, generic_ptr element, int (*p_cmp_f)() ) ;
extern status set_write( set *p_S, status (*p_write_f)() ) ;
extern status set_union( set *p_S1, set *p_S2, set *p_S3, int (*p_func_cmp)(), int sizeofelement ) ;

#endif

/*****************************************************************/
/* Programmer: Willie Boag                                       */
/*                                                               */
/* setinterface.h (Kruskal’s MST)                                */
/*****************************************************************/
#ifndef _setinterface
#define _setinterface

#include "globals.h"
#include "set.h"

extern status vertex_set_insert( set *p_S , int v ) ;

#endif

/******************************************************************/
/* Programmer: Willie Boag                                        */
/*                                                                */
/* main.c (Kruskal’s MST)                                         */
/******************************************************************/
#include <stdio.h>
#include <stdlib.h>
#include "graph.h"
#include "globals.h"

int main( int argc, char *argv[] ){

    FILE *fileptr ;
    int weight, from, to, numberofvertices ;
    graph G, T ;

    /* The first number in the file is the vertex count. */
    fileptr = fopen( argv[1], "r");
    fscanf( fileptr, "%d", &numberofvertices );

    init_graph( &G, numberofvertices, undirected );
    init_graph( &T, numberofvertices, undirected );

    /* Every remaining line is "from to weight". */
    while ( fscanf( fileptr, "%d %d %d", &from, &to, &weight ) != EOF )
        add_edge( G, from, to, weight ) ;

    fclose( fileptr ) ;

    printf("\nThe edges of the original graph are: \n" ) ;
    write_graph( G ) ;

    KruskalMST( G , &T ) ;

    printf("\nThe edges of the MST are: \n" ) ;
    write_graph( T ) ;

    printf("\n The minimum total weight is %d.\n ", min_weight( T ) ) ;

    destroy_graph( &G ) ;
    destroy_graph( &T ) ;

    return 0 ;
}

/************************************************************/
/* Programmer: Willie Boag                                  */
/*                                                          */
/* globals.c (Kruskal’s MST)                                */
/************************************************************/
#include "globals.h"

extern int compare_vertex( generic_ptr *a, generic_ptr *b ) {
    return *(int *)a - *(int *)b ;
}

/*******************************************************************/
/* Programmer: Willie Boag                                         */
/*                                                                 */
/* graph.c (Kruskal’s MST)                                         */
/*******************************************************************/
#include "globals.h"
#include "graph.h"
#include "queue.h"
#include "queue_interface.h"
#include "set.h"
#include "setinterface.h"
#include <stdlib.h>
#include <stdio.h>

extern status init_graph( graph *p_G, int vertex_cnt, graph_type type ) {

    graph G ;
    int i, j ;

    G = (graph) malloc ( sizeof(graph_header)) ;
    if ( G == NULL )
        return ERROR ;

    G -> number_of_vertices = vertex_cnt ;
    G -> type = type ;

    G -> matrix = (edge **) malloc ( vertex_cnt * sizeof(edge *));
    if ( G -> matrix == NULL ) {
        free(G) ;
        return ERROR ;
    }

    /* One contiguous block holds the whole adjacency matrix. */
    G -> matrix[0] = (edge *) malloc(vertex_cnt*vertex_cnt*sizeof(edge)) ;
    if (G -> matrix[0] == NULL ) {
        free ( G -> matrix );
        free( G );
        return ERROR ;
    }

    for ( i = 1 ; i < vertex_cnt ; i++ )
        G -> matrix[i] = G -> matrix[0] + vertex_cnt * i ;

    for ( i = 0 ; i < vertex_cnt; i++ )
        for ( j = 0 ; j < vertex_cnt ; j++ ) {
            G -> matrix[i][j].weight = UNUSED_WEIGHT ;
            G -> matrix[i][j].vertex_number = j ;
            G -> matrix[i][j].from = i ;
        }

    *p_G = G ;
    return OK ;
}

extern void destroy_graph( graph *p_G ) {
    free((*p_G) -> matrix[0] ) ;
    free((*p_G) -> matrix ) ;
    free(*p_G) ;
    *p_G = NULL ;   /* was "p_G = NULL", which only changed the local copy */
}

extern status add_edge ( graph G, vertex vertex1, vertex vertex2, int weight ) {

    /* Valid vertices are 0 .. number_of_vertices - 1 (was ">"). */
    if ( vertex1 < 0 || vertex1 >= G -> number_of_vertices )
        return ERROR ;
    if ( vertex2 < 0 || vertex2 >= G -> number_of_vertices )
        return ERROR ;
    if ( weight <= 0 || weight >= UNUSED_WEIGHT )
        return ERROR ;

    G -> matrix[vertex1][vertex2].weight = weight ;
    if( G -> type == undirected )
        G -> matrix[vertex2][vertex1].weight = weight ;
    return OK ;
}

extern void graph_size ( graph G, int *p_vertex_cnt , int *p_edge_cnt ) {

    int i, j, edges ;

    *p_vertex_cnt = G -> number_of_vertices ;
    edges = 0 ;
    for ( i = 0 ; i < G -> number_of_vertices ; i++ )
        for ( j = 0 ; j < G -> number_of_vertices ; j++ )
            if ( G -> matrix[i][j].weight != UNUSED_WEIGHT )
                edges++ ;

    /* An undirected edge is stored in both directions. */
    if ( G -> type == undirected )
        edges /= 2 ;
    *p_edge_cnt = edges ;
}

extern edge *edge_iterator( graph G, vertex vertex_number, edge *p_last_return ) {

    vertex other_vertex ;

    if (vertex_number < 0 || vertex_number >= G -> number_of_vertices)
        return NULL ;

    if (p_last_return == NULL)
        other_vertex = 0 ;
    else
        other_vertex = VERTEX(p_last_return) + 1 ;

    for ( ; other_vertex < G-> number_of_vertices ; other_vertex++) {
        if (G -> matrix[vertex_number][other_vertex].weight != UNUSED_WEIGHT)
            return &G -> matrix[vertex_number][other_vertex] ;
    }
    return NULL ;
}

static int What_Set_Am_I_In( set *Collection, int v, int n ) {

    int i ;

    for ( i = 0 ; i < n ; i++ ) {
        if ( Collection[i].base == NULL )
            continue ;
        if ( set_member( &Collection[i], (generic_ptr) &v, compare_vertex ) == TRUE )
            break ;
    }
    if ( i == n )
        return -1 ;
    return i ;
}

static int collection_size( set *Collection, int numberofvertices ) {

    int i, size = 0 ;

    for( i = 0 ; i < numberofvertices ; i++ )
        if ( Collection[i].base != NULL )
            size++ ;
    return size ;
}

static int compare_weight( generic_ptr a, generic_ptr b ) {
    return WEIGHT((edge *) a) - WEIGHT((edge *) b) ;
}

extern status KruskalMST( graph G, graph *T ) {

    int i, numberofvertices, numberofedges ;
    int from, to, weight ;
    int S1, S2 ;
    set *Collection, S3 ;
    queue Q ;
    edge *p_edge ;
    generic_ptr *item ;

    graph_size( G, &numberofvertices, &numberofedges ) ;

    /* Special case: graph has only one vertex. */
    if (numberofvertices == 1)
        return OK ;

    /* Construct a priority queue Q containing all edges. */
    init_queue( &Q ) ;
    for (i = 0 ; i < numberofvertices ; i++) {
        p_edge = NULL ;
        while ( (p_edge = edge_iterator( G, i, p_edge)) != NULL)
            qadd_edge( &Q, p_edge->from, VERTEX(p_edge), WEIGHT(p_edge), compare_weight ) ;
    }

    /* Create an array of sets. */
    Collection = (set *) malloc( sizeof(set) * numberofvertices ) ;
    for (i = 0 ; i < numberofvertices ; i++) {
        init_set( &Collection[i], 1 ) ;
        vertex_set_insert( &Collection[i], i ) ;
    }

    /* While the tree is not fully formed. */
    while ( collection_size(Collection, numberofvertices) > 1 ) {

        qremove_edge( &Q, &from, &to, &weight, compare_weight ) ;

        S1 = What_Set_Am_I_In( Collection, from, numberofvertices ) ;
        S2 = What_Set_Am_I_In( Collection, to, numberofvertices ) ;

        if ( S1 != S2 ) {
            init_set( &S3, 1 ) ;
            set_union( &Collection[S1], &Collection[S2], &S3, compare_vertex, sizeof(vertex) ) ;

            /* Free data of S1 and S2. */
            for (item = Collection[S1].base ; item < Collection[S1].free ; item++)
                free( *item ) ;
            for (item = Collection[S2].base ; item < Collection[S2].free ; item++)
                free( *item ) ;
            free( Collection[S1].base ) ;
            free( Collection[S2].base ) ;

            Collection[S1] = S3 ;
            Collection[S2].base = NULL ;

            add_edge( *T, from, to, weight ) ;
        }
    }

    /* Free all reserved space. */
    while ( empty_queue( &Q ) == FALSE )
        qremove_edge( &Q, &from, &to, &weight, compare_weight ) ;
    free( Q.base ) ;

    for ( i = 0 ; i < numberofvertices ; i++)
        free( Collection[S1].base[i] ) ;
    free( Collection[S1].base ) ;
    free( Collection ) ;

    return OK ;
}

extern status write_graph( graph G ) {

    int i, j, numberofvertices, numberofedges ;

    graph_size( G, &numberofvertices, &numberofedges );

    for ( i = 0 ; i < numberofvertices ; i++ ) {
        for ( j = 0 ; j < numberofvertices ; j++ ) {
            if( G -> matrix[i][j].weight != UNUSED_WEIGHT ) {
                printf( "\n") ;
                printf( "%d ", G -> matrix[i][j].from ) ;
                printf( "%d ", G -> matrix[i][j].vertex_number ) ;
                printf( "%d ", G -> matrix[i][j].weight ) ;
                printf( "\n" ) ;
            }
        }
    }
    return OK ;
}

extern int min_weight( graph G ) {

    int sum = 0, i, j, numberofvertices, numberofedges ;

    graph_size( G, &numberofvertices, &numberofedges );

    for ( i = 0 ; i < numberofvertices ; i++ ) {
        for ( j = 0 ; j < numberofvertices ; j++ ) {
            if( G -> matrix[i][j].weight != UNUSED_WEIGHT )
                sum = sum + G -> matrix[i][j].weight ;
        }
    }
    /* Each undirected edge was counted twice. */
    return sum/2 ;
}

/***************************************************************/
/* Programmer: Willie Boag                                     */
/*                                                             */
/* heap.c (Kruskal’s MST)                                      */
/***************************************************************/
#include "globals.h"
#include "heap.h"
#include <stdlib.h>

static void siftdown( heap *p_H, int parent, int (*p_cmp_f)() );
static void siftup( heap *p_H, int element, int (*p_cmp_f)() ) ;

extern status init_heap( heap *p_H) {

    p_H -> base = (generic_ptr *) malloc(HEAPINCREMENT*sizeof(generic_ptr)) ;
    if ( p_H -> base == NULL )
        return ERROR ;
    p_H -> nextelement = 0 ;
    p_H -> heapsize = HEAPINCREMENT ;
    return OK ;
}

extern bool empty_heap ( heap *p_H ) {
    return (p_H -> nextelement == 0) ? TRUE : FALSE ;
}

extern status heap_insert( heap *p_H, generic_ptr data, int (*p_cmp_f)() ) {
    /*
     * Insert data into p_H. p_cmp_f() is a comparison function that
     * returns a value less than 0 if its first argument is less than
     * its second, 0 if the arguments are equal. Otherwise, p_cmp_f()
     * returns a value greater than 0.
     *
     * The data is inserted in the heap by placing it at the end and
     * using siftup() to find its proper position.
     */
    generic_ptr *newbase ;

    if (p_H -> nextelement == p_H -> heapsize) {
        /*
         * Not enough space in the array, so more must be allocated.
         */
        newbase = (generic_ptr *) realloc( p_H -> base,
                      (p_H -> heapsize + HEAPINCREMENT) * sizeof(generic_ptr)) ;
        if (newbase == NULL)
            return ERROR ;
        p_H -> base = newbase ;
        p_H -> heapsize += HEAPINCREMENT ;
    }

    p_H -> base[p_H -> nextelement] = data ;
    siftup( p_H, p_H -> nextelement, p_cmp_f ) ;
    p_H -> nextelement ++ ;
    return OK;
}

static void siftup( heap *p_H, int element, int (*p_cmp_f)() ) {

    int parent ;
    int cmp_result ;
    generic_ptr tmpvalue ;

    if ( element == 0 )
        return ;

    parent = (element - 1)/2;
    cmp_result = (*p_cmp_f)(p_H -> base[element], p_H -> base[parent] );
    if (cmp_result >= 0 )
        return ;

    tmpvalue = p_H -> base[element] ;
    p_H -> base[element] = p_H -> base[parent] ;
    p_H -> base[parent] = tmpvalue ;
    siftup(p_H, parent, p_cmp_f );
    return;
}

extern status heap_delete( heap *p_H, int element, generic_ptr *p_data, int (*p_cmp_f)() ){

    if ( element >= p_H -> nextelement )
        return ERROR ;

    *p_data = p_H -> base[element] ;
    p_H -> nextelement-- ;
    if ( element != p_H -> nextelement ) {
        p_H -> base[element] = p_H -> base[p_H -> nextelement ] ;
        siftdown(p_H, element, p_cmp_f );
    }
    return OK ;
}

static void siftdown ( heap *p_H, int parent, int (*p_cmp_f)() ) {
    /*
     * p_H is a heap except for parent. Find the correct place for parent
     * by swapping it with the smaller of its children. If a swap is
     * made, p_H is a heap except for the child’s position, so call
     * siftdown() recursively.
     */
    int leftchild, rightchild, swapelement ;
    int leftcmp, rightcmp, leftrightcmp ;
    generic_ptr tmpvalue ;

    leftchild = 2 * parent + 1 ;
    rightchild = leftchild + 1 ;

    if (leftchild >= p_H -> nextelement)
        /*
         * No children.
         */
        return ;

    leftcmp = (*p_cmp_f)(p_H->base[parent], p_H->base[leftchild] ) ;
    if (rightchild >= p_H -> nextelement) {
        /*
         * No right child.
         */
        if (leftcmp > 0) {
            tmpvalue = p_H->base[parent] ;
            p_H->base[parent] = p_H->base[leftchild] ;
            p_H->base[leftchild] = tmpvalue ;
        }
        return ;
    }

    rightcmp = (*p_cmp_f)( p_H->base[parent], p_H->base[rightchild] ) ;
    if (leftcmp > 0 || rightcmp > 0) {
        /*
         * Two children. Swap with smaller child.
         */
        leftrightcmp = (*p_cmp_f)( p_H->base[leftchild], p_H->base[rightchild] ) ;
        swapelement = (leftrightcmp < 0) ? leftchild : rightchild ;
        tmpvalue = p_H->base[parent] ;
        p_H->base[parent] = p_H->base[swapelement] ;
        p_H->base[swapelement] = tmpvalue ;
        siftdown( p_H, swapelement, p_cmp_f ) ;
    }
    return ;
}

/****************************************************************/
/* Programmer: Willie Boag                                      */
/*                                                              */
/* queue_interface.c (Kruskal’s MST)                            */
/****************************************************************/
#include "queue_interface.h"
#include "queue.h"
#include "graph.h"
#include "globals.h"
#include <stdlib.h>

extern status qadd_edge( queue *p_Q , int from, int to, int weight, int (*p_cmp_func)( ) ){

    edge *p_edge = ( edge * ) malloc(sizeof( edge ) ) ;
    if ( p_edge == NULL )
        return ERROR ;

    p_edge -> from = from ;
    p_edge -> vertex_number = to ;
    p_edge -> weight = weight ;

    if (qadd(p_Q, (generic_ptr) p_edge, p_cmp_func) == ERROR) {
        free(p_edge) ;
        return ERROR ;
    }
    return OK ;
}

extern status qremove_edge( queue *p_Q, int *from, int *to, int *weight, int (*p_cmp_func)() ) {

    edge *p_edge ;

    if ( qremove( p_Q, (generic_ptr *) &p_edge, p_cmp_func ) == ERROR )
        return ERROR ;

    *from   = p_edge -> from ;
    *to     = p_edge -> vertex_number ;
    *weight = p_edge -> weight ;

    free( p_edge ) ;
    return OK ;
}

/******************************************************************/
/* Programmer: Willie Boag                                        */
/*                                                                */
/* set.c (Kruskal’s MST)                                          */
/******************************************************************/
#include <stdio.h>
#include <stdlib.h>
#include "set.h"
#include "globals.h"

#define MAX(a , b) ( ( (a) > (b) ) ? (a) : (b) )
#define MINIMUM_INCREMENT 100
#define member_count(p_S) ( (int) ((p_S) -> free - (p_S) -> base) )

typedef char byte ;

static status memcopy( byte *to, byte *from, int count ) {

    while ( count-- > 0 )
        *to++ = *from++ ;
    return OK ;
}

extern status init_set( set *p_S, int size ) {
    /*
     * Initialize a set of size elements. This set implementation
     * uses a dynamic array. The dynamic array should
     * grow if needed.
     */
    p_S -> universe_size = MAX(size, MINIMUM_INCREMENT) ;
    p_S -> base = (generic_ptr *) malloc( p_S->universe_size * sizeof(generic_ptr)) ;
    if (p_S->base == NULL)
        return ERROR ;
    p_S->free = p_S-> base ;
    return OK ;
}

extern status set_insert( set *p_S , generic_ptr element, int (*p_cmp_f)() ) {
    /*
     * Insert element into the set.
     */
    generic_ptr *newset ;

    if ( set_member( p_S, element, p_cmp_f ) == TRUE )
        return OK ;

    if ( p_S -> universe_size == member_count(p_S) ) {
        newset = (generic_ptr *) realloc( p_S -> base,
                     (p_S->universe_size + MINIMUM_INCREMENT) * sizeof(generic_ptr) ) ;
        if (newset == NULL)
            return ERROR ;
        p_S -> base = newset ;
        p_S -> free = p_S -> base + p_S -> universe_size ;
        p_S -> universe_size += MINIMUM_INCREMENT ;
    }

    *p_S->free = element ;
    p_S->free++ ;
    return OK ;
}

bool set_member( set *p_S, generic_ptr element, int (*p_cmp_f)() ) {
    /*
     * Determine if element is in the set (using the passed comparison
     * function p_cmp_f()). Search the set sequentially.
     */
    generic_ptr *item ;

    for (item = p_S->base ; item < p_S->free ; item++)
        if ( (*p_cmp_f)(*item, element) == 0)
            return TRUE ;
    return FALSE ;
}

extern status set_write( set *p_S, status (*p_write_f)( ) ) {

    generic_ptr *item ;

    for ( item = p_S -> base ; item < p_S -> free; item++ )
        (*p_write_f)(*item) ;
    return OK ;
}

extern status set_union( set *p_S1, set *p_S2, set *p_S3, int (*p_cmp_f)(), int sizeofelement ) {
    /*
     * Store the union of sets *p_S1 and *p_S2 into the set *p_S3.
     */
    generic_ptr *item, tmp ;

    for (item = p_S1->base ; item < p_S1->free ; item++) {
        tmp = malloc( sizeofelement ) ;
        memcopy( (byte *) tmp, (byte *) *item, sizeofelement ) ;
        set_insert( p_S3, tmp, p_cmp_f) ;
    }

    for (item = p_S2->base ; item < p_S2->free ; item++) {
        if (set_member( p_S3, *item, p_cmp_f) == TRUE)
            continue ;
        tmp = malloc( sizeofelement ) ;
        memcopy( (byte *) tmp, (byte *) *item, sizeofelement ) ;
        set_insert( p_S3, tmp, p_cmp_f) ;
    }
    return OK ;
}

/*****************************************************************/
/* Programmer: Willie Boag                                       */
/*                                                               */
/* setinterface.c (Kruskal’s MST)                                */
/*****************************************************************/
#include "globals.h"
#include "set.h"
#include "setinterface.h"
#include <stdlib.h>

extern status vertex_set_insert( set *p_S , int v ) {

    int *p_int = ( int * ) malloc( sizeof( int ) ) ;
    if ( p_int == NULL )
        return ERROR ;

    *p_int = v ;
    if ( set_insert( p_S, (generic_ptr) p_int, compare_vertex ) == ERROR ) {
        free( p_int ) ;
        return ERROR ;
    }
    return OK ;
}

Simple Line Editor

Abstract: This is a program that can edit text files. Rather than being a full screen editor, it interacts with the data one line at a time. The operations that it can accomplish include insert, delete, print, cut & paste, save, and quit.

Description: The Simple Line Editor is a case study in Data Structures: An Advanced Approach Using C by Esakov and Weiss. It utilizes Doubly-Linked Lists.

Drawbacks: This program cannot add lines to the end of a file. In addition, if you try to cut & paste lines of text to the front of the file, the lines are lost/deleted. Although it is not really a bug, the interface for the program is not user-friendly. The driver cannot process commands that have spaces, which makes the interface messier to deal with.

Reflection: Honestly, I did not like this application at all. I first started it in January (when I was going through all of Esakov & Weiss on my own). Since I had been going through the whole book in about two weeks, I began to dislike the Simple Line Editor. It was the fourth application that relied on some form of a linked list, and it had a third set of primitives to copy (ordinary, circular, double). I found the program to be very boring, especially since I had just finished working on the LISP interpreter the day before. I felt as though it was neither a learning experience (unlike LISP, which was) nor was it an actually useful application. As a result, I decided to stop working my way through Esakov & Weiss and start working on other schoolwork. I revisited the code for this program in the first few days of May, and I still did not like it. Finishing the code was not an exciting process. Fortunately, I did manage to have some fun with the code. There were many primitive functions not in the book, which I needed to write myself. They all dealt with traversing their way through the list. I decided that I could make the program more fun by writing all of these functions recursively. That was my favorite part of the assignment.
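To give a flavor of what a recursive traversal primitive can look like, here is a small sketch. It is not the dlists.c code from this report: the `dnode` type and the names `length_rec` and `nth_rec` are hypothetical, and the sketch uses plain 1-based positions rather than the conventions of the book's primitives.

```c
#include <stddef.h>

/* Hypothetical doubly-linked node for this sketch. */
typedef struct dnode {
    struct dnode *previous;
    struct dnode *next;
    const char *text;
} dnode;

/* Length of the list starting at L: an empty list has length 0,
 * otherwise it is one node plus the length of the rest. */
int length_rec(const dnode *L) {
    if (L == NULL)
        return 0;
    return 1 + length_rec(L->next);
}

/* The n-th node (1-based) starting at L, or NULL if there is none. */
const dnode *nth_rec(const dnode *L, int n) {
    if (L == NULL)
        return NULL;
    if (n == 1)
        return L;
    return nth_rec(L->next, n - 1);
}
```

Each function handles the empty list as its base case and hands the rest of the list to itself, so no loop variable is needed.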

C Implementation:

Makefile: Makefile
Header Files: globals.h, dlists.h, interface.h, user.h
Source Files: main.c, dlists.c, interface.c, user.c

#
# Programmer: Willie Boag
#
# Makefile for Simple Line Editor
#

sle: main.o dlists.o interface.o user.o
	gcc -ansi -pedantic -Wall -o sle main.o dlists.o interface.o user.o

main.o: main.c dlists.h user.h
	gcc -ansi -pedantic -Wall -c -g main.c

dlists.o: dlists.c dlists.h globals.h
	gcc -ansi -pedantic -Wall -c -g dlists.c

interface.o: interface.c interface.h dlists.h
	gcc -ansi -pedantic -Wall -c -g interface.c

user.o: user.c user.h dlists.h interface.h
	gcc -ansi -pedantic -Wall -c -g user.c

clean:
	rm -f *.o

/************************************************************/
/* Programmer: Willie Boag                                  */
/*                                                          */
/* globals.h (Simple Line Editor)                           */
/************************************************************/
#ifndef _globals
#define _globals

#define DATA( L ) ( ( L ) -> datapointer )
#define NEXT( L ) ( ( L ) -> next )
#define PREV( L ) ( ( L ) -> previous )

typedef enum { OK, ERROR } status ;
typedef enum { FALSE=0, TRUE=1 } bool ;
typedef void *generic_ptr ;

#define E_IO     1
#define E_SPACE  2
#define E_LINES  3
#define E_BADCMD 4
#define E_DELETE 5
#define E_MOVE   6
#define MAXERROR 7

#define BUFSIZE 80

#endif

/*****************************************************************/
/* Programmer: Willie Boag                                       */
/*                                                               */
/* dlists.h (Simple Line Editor)                                 */
/*****************************************************************/
#ifndef _dlists
#define _dlists

#include "globals.h"

typedef struct double_node double_node, *double_list ;
struct double_node {
    generic_ptr datapointer ;
    double_list previous ;
    double_list next ;
} ;

extern status allocate_double_node( double_list *p_L, generic_ptr data ) ;
extern void   free_double_node( double_list *p_L ) ;
extern status init_double_list( double_list *p_L ) ;
extern bool   empty_double_list( double_list L ) ;
extern status double_insert( double_list *p_L, generic_ptr data ) ;
extern status double_append( double_list *p_L, generic_ptr data) ;
extern status double_delete( double_list *p_L, generic_ptr *p_data ) ;
extern status double_delete_node( double_list *p_L, double_list node ) ;
extern void   cut_list( double_list *p_L, double_list *p_start, double_list *p_end ) ;
extern void   paste_list( double_list *p_target, double_list *p_source ) ;
extern status double_traverse( double_list L, status (*p_func_f)() ) ;
extern void   destroy_double_list( double_list *p_L, void (*p_func_f)() ) ;
extern int    double_length( double_list L ) ;
extern int    double_node_number( double_list L ) ;
extern double_list nth_double_node( double_list L, int n ) ;
extern double_list nth_relative_double_node( double_list L, int n ) ;

#endif

/**********************************************************/
/* Programmer: Willie Boag                                */
/*                                                        */
/* interface.h (Simple Line Editor)                       */
/**********************************************************/
#ifndef _interface
#define _interface

#include "dlists.h"

extern status string_double_append( double_list *p_L, char *buffer ) ;

#endif

/****************************************************/
/* Programmer: Willie Boag                          */
/*                                                  */
/* user.h (Simple Line Editor)                      */
/****************************************************/
#ifndef _user
#define _user

#include "globals.h"
#include "dlists.h"

extern int readfile( char *filename, double_list *p_L );
extern int writefile( char *filename, double_list *p_L );

extern int insertlines( char *linespec, double_list *p_head, double_list *p_current ) ;
extern int deletelines(char *linespec, double_list *p_head, double_list *p_current ) ;
extern int movelines( char *linespec, double_list *p_head, double_list *p_current ) ;
extern int printlines( char *linespec, double_list *p_head, double_list *p_current ) ;

#endif

/********************************************************************/
/* Programmer: Willie Boag                                          */
/*                                                                  */
/* main.c (Simple Line Editor)                                      */
/********************************************************************/
#include "dlists.h"
#include "user.h"
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <ctype.h>

void printerror( int errnum ) ;

int main( int argc, char *argv[] ) {
    /*
     * A simple text editor.
     */
    char filename[BUFSIZ];
    char buffer[BUFSIZ];
    int rc;
    bool file_edited, exit_flag;
    double_list linelist, currentline;

    init_double_list(&linelist);
    printf("Enter the name of the file to edit: ");
    gets(filename);
    if ((rc = readfile(filename, &linelist)) != 0) {
        printerror(rc);
        exit(1);
    }
    printf("%d lines read.\n", double_length(linelist));
    currentline = nth_double_node(linelist, -1);
    file_edited = FALSE;
    exit_flag = FALSE;

    while (exit_flag == FALSE) {
        printf("cmd: ");
        gets(buffer);

        /*
         * Implement the following commands:
         *   p - print
         *   d - delete
         *   i - insert
         *   m - move
         *   w - write
         *   q - quit
         */
        switch (toupper(buffer[0])) {
        case '\0':
            break;
        case 'P':
            rc = printlines(&buffer[1], &linelist, &currentline);
            if (rc) printerror(rc);
            break;
        case 'D':
            file_edited = TRUE;
            rc = deletelines(&buffer[1], &linelist, &currentline);
            if (rc) printerror(rc);
            break;
        case 'I':
            file_edited = TRUE;
            rc = insertlines(&buffer[1], &linelist, &currentline);
            if (rc) printerror(rc);
            break;
        case 'M':
            file_edited = TRUE;
            rc = movelines(&buffer[1], &linelist, &currentline);
            if (rc) printerror(rc);
            break;
        case 'W':
            if (buffer[1] != '\0')
                strcpy(filename, &buffer[1]);
            rc = writefile(filename, &linelist);
            if (rc != 0)
                printerror(rc);
            else
                printf("%d lines written\n", double_length(linelist));
            file_edited = FALSE;
            break;
        case 'Q':
            /*
             * If text has been modified, can’t quit without writing
             * unless you enter q two times in a row.
             */
            if (file_edited == TRUE) {
                printf("File modified. Enter W to save, Q to discard.\n");
                file_edited = FALSE;
            } else
                exit_flag = TRUE;
            break;
        default:
            printerror(E_BADCMD);
            break;
        }
    }
    return 0;
}

void printerror( int errnum ) {
    /*
     * Print error message to standard output.
     */
    static char *errmsg[] = {
        "io error",
        "out of memory space",
        "invalid line specification",
        "invalid command",
        "error deleting lines",
        "error moving lines"   /* added so that E_MOVE (6) has an entry */
    };

    /* Valid error numbers are 1 .. MAXERROR-1 (the check was "< 0"). */
    if (errnum < 1 || errnum >= MAXERROR) {
        printf("System Error. Invalid error number: %d\n", errnum);
        return;
    }
    printf("%s\n", errmsg[errnum-1]);
    return;
}

/********************************************************/ /* Programmer: Willie Boag */ /* */ /* dlists. return OK. } extern bool empty_double_list( double_list L ) { /* Return TRUE if L is an empty list. } extern void free_double_node( double_list *p_L ) { free(p_L). DATA(L) = data. */ *p_L = NULL.h> <stdlib. NEXT(L) = NULL.h" extern status allocate_double_node( double_list *p_L. } else { . L = (double_list) malloc(sizeof(double_node)). if (allocate_double_node(&L. *p_L = L. if (empty_double_list(*p_L) == TRUE) { PREV(L) = NEXT(L) = NULL. return. *p_L = NULL. if (L == NULL) return ERROR. } extern status double_insert( double_list *p_L. */ double_list L.c (Simple Line Editor) */ /********************************************************/ #include #include #include #include <stdio. } extern status init_double_list( double_list *p_L ) { /* * Initialize *p_L by setting the list pointer to NULL. return OK. PREV(L) = NULL. generic_ptr data ) { double_list L . */ return (L == NULL) ? TRUE : FALSE.h> "dlists. * Always return OK (a different implementation * may allow errors to occur).h" "globals. FALSE otherwise. generic_ptr data ) { /* Insert a new node containing data as the first item in *p_L. data) == ERROR) return ERROR.

) temp = NEXT(temp). if (node == *p_L) { if (next != NULL) *p_L = next. *p_data = DATA(*p_L). if (empty_double_list(*p_L) == TRUE) return ERROR. temp. */ double_list L. *p_L). generic_ptr data) { /* Append a node to the end of a double_list. PREV(L) = temp. if (next != NULL) PREV(next) = prev. PREV(*p_L) = L. NEXT(temp) = L. prev = PREV(node). } extern status double_delete_node( double_list *p_L.NEXT(L) = *p_L. NEXT(temp) != NULL . */ if (empty_double_list(*p_L) == TRUE) return ERROR. } else { for ( temp = *p_L . double_list node ) { /* * Delete node from *p_L. */ double_list prev. if (prev != NULL) NEXT(prev) = next. } extern status double_delete( double_list *p_L. else *p_L = prev. if (*p_L == NULL) { *p_L = L. } . data) == ERROR) return ERROR. next. PREV(L) = PREV(*p_L). next = NEXT(node). } return OK. } *p_L = L. if (PREV(L) != NULL) NEXT(PREV(L)) = L. return OK. return double_delete_node(p_L. if (allocate_double_node(&L. generic_ptr *p_data ) { /* * Delete the first node in *p_L and return the DATA in p_data. } extern status double_append( double_list *p_L.

if (*p_L == start) *p_L = NEXT(end) . lastnode . . */ return . } extern int double_length( double_list L ) { if (L == NULL) return 0 . NEXT(lastnode) = NEXT(target) . */ double_list target. else { source = *p_source . end . NEXT(target) = source . } *p_source = NULL . double_list *p_source ) { /* * Take *p_source and put it after *p_target. Assumes * *p_source is the first node in the list. */ double_list start. end = *p_end . if (NEXT(end)) PREV(NEXT(end)) = PREV(start) . −1) . if (PREV(start)) NEXT(PREV(start)) = NEXT(end) . if (empty_double_list(*p_target) == TRUE) *p_target = *p_source . if (NEXT(target) != NULL) PREV(NEXT(target)) = lastnode . return double_length( NEXT(L) ) + 1 . target = *p_target . } extern void paste_list( double_list *p_target.free_double_node(p_L). PREV(source) = target . source. if (empty_double_list(*p_source) == TRUE) /* * Nothing to do. double_list *p_end ) { /* *Extract the range of nodes *p_start −− *p_end from *p_L. PREV(start) = NEXT(end) = NULL . start = *p_start . double_list *p_start. } extern void cut_list( double_list *p_L. return OK. lastnode = nth_double_node(source.

return L . } if (n == 1) return L . free_double_node( p_L ) . p_func_f ) . } extern void destroy_double_list( double_list *p_L.} extern double_list nth_double_node( double_list L. if (n == −1) return PREV(L) . status (*p_func_f)() ) { if (L == NULL) return OK . (*p_func_f)( DATA(*p_L) ) . if (n == −1) { for ( . int n ) { if (L == NULL) return NULL . } . n − 1 ) . if (n > 0) return nth_relative_double_node( NEXT(L). return double_traverse( NEXT(L). NEXT(L) != NULL . } extern status double_traverse( double_list L. } extern int double_node_number( double_list L ) { if (L == NULL) return 0 . void (*p_func_f)() ) { if (empty_double_list(*p_L) == TRUE) return . *p_L = NULL . return nth_relative_double_node( PREV(L). destroy_double_list( &NEXT(*p_L). } extern double_list nth_relative_double_node( double_list L. if ((*p_func_f)(DATA(L)) == ERROR) return ERROR . p_func_f ) . n − 1 ) . return nth_double_node( NEXT(L). return double_node_number( PREV(L) ) + 1 . L = NEXT(L) ) . n + 1 ) . int n ) { if (n == 0) return L .

/**********************************************************/
/* Programmer: Willie Boag                                */
/*                                                        */
/* interface.c (Simple Line Editor)                       */
/**********************************************************/
#include <string.h>
#include <stdlib.h>
#include "globals.h"
#include "dlists.h"

extern status string_double_append( double_list *p_L, char *buffer ) {
    /* Append a heap-allocated copy of buffer to the end of *p_L. */
    char *str ;

    str = (char *) malloc( sizeof(char) * (strlen((char *)buffer) + 1) ) ;
    if (str == NULL) return ERROR ;
    strcpy(str, (char *) buffer) ;
    if (double_append(p_L, (generic_ptr) str) == ERROR) {
        free(str) ;
        return ERROR ;
    }
    return OK ;
}

double_list head. double_list *p_end ) . double_list current. double_list *p_L ) { /* * Read data from filename and put in the linked list *p_L. buffer) == ERROR) { fclose(fd) . static status writeline( char *s ) . writeline). rc = double_traverse(*p_L. double_list head. BUFSIZ. return 0.h" "user. return E_SPACE.h" <stdio. fd) != NULL) { if (string_double_append(p_L.h> <string. "w")) == NULL) return E_IO.h> static FILE *outputfd. double_list current.h" "dlists.h> <stdlib. return (rc == ERROR) ? E_IO : 0. static int parse_linespec( char *linespec./*********************************************************************/ /* Programmer: Willie Boag */ /* */ /* user. double_lis t *p_start.h" "interface. FILE *fd. * Use the static global variable outputfd to store the output * file descriptor so that it can be used by writeline(). extern int readfile( char *filename.h> <ctype. } extern int writefile( char *filename. static int parse_number( char *numberspec. fclose(outputfd). double_list *p_L ) { /* * Output the data in *p_L to the output file. "r")) == NULL) return 0. while (fgets(buffer. double_lis t *p_node ) .c (Simple Line Editor) */ /*********************************************************************/ #include #include #include #include #include #include #include #include "globals. filename. if ((outputfd = fopen(filename. } } fclose(fd). */ char buffer[BUFSIZ]. if ((fd = fopen(filename. */ status rc. } static status writeline( char *s ) { .

*/ double_list newdata. stdin). int cmp. */ if (fputs(s. return OK. &endnode). /* * If the list is empty. Outputfd * must point to a file previously pened with fopen (as * is done in writefile(). if (cmp != 0) { rc = string_double_append(&newdata. buffer). *p_head. parseerror. } extern int insertlines( char *linespec. char buffer[BUFSIZ]. } } while (cmp != 0). startnode = endnode = NULL. *p_current = nth_double_node(newdata. do { printf("insert>"). double_list *p_head. lastnode. double_list *p_current ) { /* * Insert new lines before the current line. outputfd) == EOF) return ERROR. status rc. startnode. ". Then "paste" the list before * startnode. if (parseerror) return parseerror. if (startnode != endnode) return E_LINES. */ init_double_list(&newdata). if (startnode == NULL) { /* * Empty list */ *p_head = newdata. it better be a single line number */ parseerror = parse_linespec(linespec. endnode. } /* * Collect the new lines in newdata. *p_current. fgets(buffer. &startnode. no linespec is allowed. −1)./* * Write a single line of output to outputfd. if (rc == ERROR) return E_SPACE. */ if (empty_double_list(*p_head) == TRUE) { if (strlen(linespec) != 0) return E_LINES. if ( empty_double_list(newdata) == TRUE) return 0. } else if (PREV(startnode) == NULL) { /* * Insert before the first line. cmp = strcmp(buffer.\n"). BUFSIZ. } else { /* * If a linespec is given. */ .

*p_head. −1). endnode = tmplist. currentnumber. *p_current = startnode. if (startnumber > endnumber) { tmplist = startnode. double_list *p_current ) { /* * Delete some lines (according to linespec from p_head. endnode. *p_current = startnode. &endnode ). if (rc) return rc. &startnode. int startnumber. int tmp. double_list tmpnode. if (rc) return rc. * Update p_current to be after last line deleted. &startnode. &endnode). double_list *p_head. */ paste_list(&PREV(startnode). } return 0. endnumber. } else { /* * Insertin the middle of the list. */ double_list startnode. &startnode. rc = parse_linespec(linespec. return 0. 1). */ double_list startnode. *p_current. endnumber = double_node_number(endnode). free). endnode. destroy_double_list(&startnode. double_list *p_head. endnumber = double_node_number(endnode). double_list new_current. } extern int movelines( char *linespec. &endnode). *p_head = newdata. startnumber = double_node_number(startnode). &newdata). int rc. *p_current. paste_list(&lastnode. startnumber = double_node_number(startnode). p_head). tmplist.lastnode = nth_double_node(newdata. −1). . *p_current = new_current. rc = parse_linespec(linespec. make p_current be before first line. *p_head. } extern int deletelines(char *linespec. if (new_current == NULL) new_current = nth_relative_double_node(startnode. Make sure the lines moved * do not include p_current. cut_list(p_head. int startnumber. } new_current = nth_relative_double_node(endnode. int rc. * If the last line is deleted. endnumber. double_list *p_current ) { /* * Move lines to after p_current. startnode = endnode.

* Set p_start to the starting line and p_end to the ending line. count. while ( count−− ) { printf("%d %s". */ if (currentnumber >= startnumber && currentnumber <= endnumber) return E_LINES. } static int parse_linespec( char *linespec. &startnode. double_list *p_head. } extern int printlines( char *linespec. */ double_list startnode. rc = parse_linespec( linespec. endnumber = double_node_number( endnode ) . *p_head. (char *) DATA(startnode) ) . startnode = nth_relative_double_node(startnode. . Direction indicates whether going forward or * backward. endnumber = tmp.currentnumber = double_node_number(*p_current). */ int rc . direction) . endnode = tmpnode. double_list head. startnumber += direction . &startnode). cut_list(p_head. &endnode ) . &endnode). direction . double_list current. count = (endnumber − startnumber) * direction + 1 . double_lis t *p_start. int startnumber. /* * Make sure start < end. return 0. startnumber = endnumber. endnode . startnumber = double_node_number( startnode ) . *p_current. } /* * Do not include the current line in the ones being moved. paste_list(&PREV(*p_current). */ if (startnumber > endnumber) { tmp = startnumber. return 0 . tmpnode = startnode. int rc . &startnode. if (rc) return rc . startnode = endnode. endnumber. startnumber. direction = (startnumber < endnumber) ? 1 : −1 . } *p_current = endnode . double_list *p_end ) { /* * Parse linespec (consisting of numberspec.numberspec). double_list *p_current ) { /* * Print out lines.

*p_num = ’\0’ . else { rc = parse_number(linespec. } else if (*numberspec == ’$’) { /* * Start with the last line. *p_num . return 0 . head. } else return E_LINES . */ *p_node = current . while (isdigit(*numberspec)) *p_num++ = *numberspec++ . } if (*p_start == NULL || *p_end == NULL) return E_LINES . } numberspec++ . if (rc) return rc . if (*p_node == NULL){ return E_LINES . */ *p_node = nth_double_node(head. double_lis t *p_node ) { /* * Parse a single numberspec. ’. } nextnumber = strchr(linespec. *p_node = nth_double_node(head. if (*p_node == NULL) return E_LINES . head. if (nextnumber == NULL) *p_end = *p_start . current. nodenumber = atoi( numberbuffer ) . */ p_num = numberbuffer . else { rc = parse_number(nextnumber + 1. . −1) . if (rc) return rc . int direction . double_list head. int nodenumber .char *nextnumber . } static int parse_number( char *numberspec. */ char numberbuffer[BUFSIZ]. numberspec++ . double_list current. current. if (*numberspec == ’. p_start) . nodenumber) . p_end) .’) { /* * Start with the current line. } else if (isdigit(*numberspec)) { /* * Have a line number. if (*linespec == ’\0’) *p_start = current .’) .

else return E_LINES . while ( isdigit(*numberspec)) *p_num++ = *numberspec++ . *p_node = nth_relative_double_node(*p_node.’)) return 0 . } else direction = 0 . numberspec++ . /* * If a digit and previously saw a plus or minus. if (p_node == NULL) return E_LINES . nodenumber) . then everything is ok. direction = 0 . figure * offset from p_node. } else if (*numberspec == ’−’) { direction = −1 . } . numberspec++ . } /* * If direction is 0 (meaning no offset or offset was parsed ok) * and at end of this numberspec. */ if (direction == 0 && (*numberspec == ’\0’ || *numberspec == ’./* * Any plusses or minuses? */ if (*numberspec == ’+’) { direction = 1 . *p_num = ’\0’ . */ if (isdigit(*numberspec) && direction != 0) { p_num = numberbuffer . nodenumber = atoi( numberbuffer) * direction .

Topological Sort
Abstract: A topological sort of a partial ordering is an arrangement of the elements of a set in such a way that they satisfy the rules imposed by a given comparison function. Such an arrangement comes up in many fields of study.

Description: In Mathematics, the combination of a set and a comparison function forms a partial ordering if the comparison function is transitive, reflexive, and anti-symmetric. One example of a partial ordering is a set of courses and their prerequisites. The prerequisite rules impose restrictions on the order in which classes can be chosen. A topological sort of that partial ordering would simply be a list of classes to take so that you always take the prerequisites of a class first. A topological sort of a partial ordering is a permutation of the elements of the set such that there are no conflicts with the order established by the comparison function. As mentioned above, two necessary conditions for a partial ordering to exist are that the comparison function is anti-symmetric and transitive. In terms of what that means on a graph, the graph cannot have cycles. If a cycle were to exist, then it would be the equivalent of a never-ending circle of prerequisites. As a result, only Directed Acyclic Graphs (DAGs) can be topologically sorted.
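For reference, the three properties of a partial ordering can be written out as below. The symbol $\preceq$ for the comparison function and $S$ for the set are notation chosen here for illustration; they do not come from the report itself. Each line holds for all $a, b, c \in S$.

```latex
\begin{align*}
\text{Reflexive:}      \quad & a \preceq a \\
\text{Anti-symmetric:} \quad & a \preceq b \ \text{and}\ b \preceq a \implies a = b \\
\text{Transitive:}     \quad & a \preceq b \ \text{and}\ b \preceq c \implies a \preceq c
\end{align*}
```

This also shows why a cycle is ruled out. If distinct elements satisfied $a \preceq b$, $b \preceq c$, and $c \preceq a$, transitivity would give $a \preceq c$, and anti-symmetry together with $c \preceq a$ would then force $a = c$, contradicting the assumption that they are distinct.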

Algorithm: The general idea behind my chosen topological sorting algorithm is as follows. Because there are no cycles in the graph, I know that there is at least one minimum element. I loop through the vertices of the graph until I find that minimum element. I print that element, and remove it from consideration. This brings me back to the situation where I can find a new minimum element. This process repeats until I eventually visit every node in my graph exactly once. At that point, a valid topological sort has been found.

Testing: In order to verify that my program produced a valid topological sort, I wrote an automated program that generated a random DAG, topologically sorted it using my program, and then compared the sort to the ordering imposed by the edges of the graph. This whole process was run in a loop that executed however many times I chose. Once it passed 1000/1000 cases, I accepted that it was (likely) a correct algorithm. The hardest part of my testing program was generating the DAG. My naïve attempt involved generating a graph of random numbers in such a way that any two vertices had a chance of being connected. Unfortunately, this resulted in cyclic graphs (and topological sorts do not exist for cyclic graphs). My next attempt involved trying to build the graph so that I would never insert edges that caused cycles. This, too, proved to be very difficult to manage, and I was forced to find another way. Eventually, I got the idea to generate a completely random graph (as I first did) and then run Kruskal’s Minimum Spanning Tree (MST) algorithm in order to eliminate cycles. This was pretty much what I ended up using, except that an MST only exists for connected graphs. On my randomly generated graphs, connectivity was not necessarily implied. As a result, I had to modify the algorithm to find a minimum spanning forest rather than a tree.

Reflection: I really enjoyed this problem. I think that it is really interesting trying to sort a partial ordering. When we first learn the sorting problem, we sort integers, which have a total ordering. Although it seems like it would be simpler to sort a less-constrained partial ordering (and maybe it is), it seems to me that it is harder, because we do not have the luxury of every two elements being comparable. The current algorithm that I used for this topological sort (in which I find the minimum element, remove it, and repeat) is the partial ordering analog of Selection Sort. I have tried thinking of how other sorting algorithms would be implemented on partial orderings, but that is where I find issues. For instance, quicksort works by partitioning an array into two sub-arrays: one array is full of elements less than the pivot and one array is full of elements greater than the pivot. So how could we apply this method to a partial ordering? We cannot partition the array into the two sub-arrays, because some elements may be neither less than nor greater than the pivot. Surprisingly (surprising to me, at least), our problem has become harder because of our less-constraining comparison function.

C Implementation:

Makefile
    Makefile

Header Files
    globals.h
    graph.h
    list.h
    queue.h

Source Files
    main.c
    graph.c
    list.c
    queue.c

#
# Programmer: Willie Boag
#
# Makefile for Topological Sort
#
tsort: main.o graph.o list.o queue.o
	gcc -o tsort main.o graph.o list.o queue.o

main.o: main.c graph.h globals.h
	gcc -ansi -pedantic -Wall -c -g main.c

graph.o: graph.c graph.h queue.h list.h globals.h
	gcc -ansi -pedantic -Wall -c -g graph.c

queue.o: queue.c queue.h list.h globals.h
	gcc -ansi -pedantic -Wall -c -g queue.c

list.o: list.c list.h globals.h
	gcc -ansi -pedantic -Wall -c list.c

clean:
	rm -f *.o

/*************************************************************/
/* Programmer: Willie Boag                                   */
/*                                                           */
/* globals.h (Topological Sort)                              */
/*************************************************************/
#ifndef _globals
#define _globals

#define DATA( L ) ( ( L ) -> datapointer )
#define NEXT( L ) ( ( L ) -> next )
#define RIGHT(T) ( (T) -> right )
#define LEFT(T) ( (T) -> left )

typedef enum { OK, ERROR } status ;
typedef enum { FALSE=0 , TRUE=1 } bool ;
typedef void *generic_ptr ;

#endif

. vertex vertex2. extern extern extern extern extern extern extern extern #endif status traverse_graph( graph G. *graph . int *p_vertex_cnt. void graph_size( graph G. edge *p_last_return ) . undirected } graph_type ./********************************************************/ /* Programmer: Willie Boag */ /* */ /* graph. bool isadjacent( graph G. void destroy_graph( graph *p_G ) . } graph_header. vertex vertex_number. } edge . edge *edge_iterator( graph G. TOPOLOGICAL} searchorder . typedef struct { graph_type type . vertex vertex1. int weight ) . status (*p_func_f)() ) . vertex vertex2 ) . edge **matrix .h (Topological Sort) */ /********************************************************/ #ifndef _graph #define _graph #include "globals. status init_graph( graph *p_G. typedef enum {DEPTH_FIRST. #define UNUSED_WEIGHT (32767) #define WEIGHT(p_e) ((p_e) −> weight) #define VERTEX(p_e) ((p_e) −> vertex_number) typedef enum {directed. typedef struct { int weight. vertex vertex_number. vertex vertex1. status delete_edge( graph G. graph_type type ) . int vertex_cnt. searchorder order. vertex vertex2 ) .h" typedef int vertex . vertex vertex1. status add_edge( graph G. int number_of_vertices . BREADTH_FIRST. int *p_edge_cnt ) .

/**************************************************************/
/* Programmer: Willie Boag                                    */
/*                                                            */
/* list.h (Topological Sort)                                  */
/**************************************************************/
#ifndef _list
#define _list

#include "globals.h"

typedef struct node node, *list ;
struct node {
    generic_ptr datapointer ;
    list next ;
} ;

extern status allocate_node( list *p_L, generic_ptr data ) ;
extern void free_node( list *p_L ) ;

extern status init_list( list *p_L ) ;
extern bool empty_list( list L ) ;
extern status insert( list *p_L, generic_ptr data ) ;
extern status append( list *p_L, generic_ptr data ) ;
extern status delete( list *p_L, generic_ptr *p_data ) ;
extern status delete_node( list *p_L, list node ) ;

extern status traverse( list L, status (*p_func_f)() ) ;
extern status find_key( list L, generic_ptr key, int (*p_cmp_f)(), list *p_keynode ) ;
extern list list_iterator( list L, list lastreturn ) ;
extern void destroy( list *p_L, void (*p_func_f)() ) ;

#endif

/***************************************************/
/* Programmer: Willie Boag                         */
/*                                                 */
/* queue.h (Topological Sort)                      */
/***************************************************/
#ifndef _queue
#define _queue

#include "globals.h"
#include "list.h"

typedef struct {
    node *front ;
    node *rear ;
} queue ;

#define FRONT(Q) ((Q) -> front)
#define REAR(Q) ((Q) -> rear)

status init_queue( queue *p_Q ) ;
bool empty_queue( queue *p_Q ) ;
status qadd( queue *p_Q, generic_ptr data ) ;
status qremove( queue *p_Q , generic_ptr *p_data ) ;
void qprint( queue Q, status (*p_func_f)() ) ;

#endif

/******************************************************************/
/* Programmer: Willie Boag                                        */
/*                                                                */
/* main.c (Topological Sort)                                      */
/******************************************************************/
#include "graph.h"
#include "globals.h"
#include <stdio.h>

status write_vertex( int a ) {
    printf( " %d ", a ) ;
    return OK ;
}

int main( int argc, char *argv[] ){
    FILE *fileptr ;
    graph G ;
    int from ;
    int to ;
    int weight ;
    int numberofvertices ;

    /* The input file name must be given on the command line. */
    if (argc < 2 || (fileptr = fopen( argv[1], "r" )) == NULL) {
        printf("usage: tsort <graph file>\n") ;
        return 1 ;
    }

    /* First value: vertex count. Then one "from to weight" triple per edge. */
    if (fscanf(fileptr, "%d", &numberofvertices ) != 1 ||
        init_graph( &G, numberofvertices, directed ) == ERROR) {
        fclose(fileptr) ;
        return 1 ;
    }

    /* Stop on end of file or on a malformed triple. */
    while (fscanf( fileptr, "%d %d %d", &from, &to, &weight) == 3)
        add_edge( G, from, to, weight ) ;
    fclose(fileptr) ;

    printf("\n Topological Traversal: ") ;
    traverse_graph( G, TOPOLOGICAL, write_vertex ) ;
    printf("\n\n") ;

    destroy_graph( &G ) ;
    return 0 ;
}

/********************************************************/
/* Programmer: Willie Boag                              */
/*                                                      */
/* graph.c (Topological Sort)                           */
/********************************************************/
#include <stdlib.h>
#include <stdio.h>
#include "globals.h"
#include "graph.h"
#include "queue.h"

extern status init_graph( graph *p_G, int vertex_cnt, graph_type type ) {
    graph G ;
    int i, j ;

    /* matrix[0] is dereferenced below, so an empty graph is rejected. */
    if (vertex_cnt <= 0) return ERROR ;

    G = (graph) malloc(sizeof(graph_header)) ;
    if (G == NULL) return ERROR ;
    G -> number_of_vertices = vertex_cnt ;
    G -> type = type ;
    G -> matrix = (edge **) malloc(vertex_cnt * sizeof(edge *)) ;
    if (G -> matrix == NULL) {
        free(G) ;
        return ERROR ;
    }
    /* One contiguous block holds every row of the adjacency matrix. */
    G -> matrix[0] = (edge *) malloc(vertex_cnt * vertex_cnt * sizeof(edge)) ;
    if (G -> matrix[0] == NULL) {
        free(G -> matrix) ;
        free(G) ;
        return ERROR ;
    }
    for (i = 1 ; i < vertex_cnt ; i++)
        G -> matrix[i] = G -> matrix[0] + vertex_cnt * i ;
    for (i = 0 ; i < vertex_cnt ; i++) {
        for (j = 0 ; j < vertex_cnt ; j++) {
            G -> matrix[i][j].weight = UNUSED_WEIGHT ;
            G -> matrix[i][j].vertex_number = j ;
        }
    }
    *p_G = G ;
    return OK ;
}

extern void destroy_graph( graph *p_G ) {
    free((*p_G) -> matrix[0] ) ;
    free((*p_G) -> matrix ) ;
    free(*p_G) ;
    *p_G = NULL ;   /* clear the caller's pointer, not the local copy */
}

extern status add_edge( graph G, vertex vertex1, vertex vertex2, int weight ) {
    /* Valid vertices are 0 .. number_of_vertices - 1. */
    if (vertex1 < 0 || vertex1 >= G -> number_of_vertices) return ERROR ;
    if (vertex2 < 0 || vertex2 >= G -> number_of_vertices) return ERROR ;
    if (weight <= 0 || weight >= UNUSED_WEIGHT) return ERROR ;
    G -> matrix[vertex1][vertex2].weight = weight ;
    if (G -> type == undirected)
        G -> matrix[vertex2][vertex1].weight = weight ;
    return OK ;
}

extern status delete_edge( graph G, vertex vertex1, vertex vertex2 ) {
    if (vertex1 < 0 || vertex1 >= G -> number_of_vertices) return ERROR ;
    if (vertex2 < 0 || vertex2 >= G -> number_of_vertices) return ERROR ;
    G -> matrix[vertex1][vertex2].weight = UNUSED_WEIGHT ;
    if (G -> type == undirected)
        G -> matrix[vertex2][vertex1].weight = UNUSED_WEIGHT ;
    return OK ;
}

extern bool isadjacent( graph G, vertex vertex1, vertex vertex2 ) {
    if (vertex1 < 0 || vertex1 >= G -> number_of_vertices) return FALSE ;
    if (vertex2 < 0 || vertex2 >= G -> number_of_vertices) return FALSE ;
    return (G -> matrix[vertex1][vertex2].weight == UNUSED_WEIGHT) ? FALSE : TRUE ;
}

extern void graph_size( graph G, int *p_vertex_cnt, int *p_edge_cnt ) {
    int i, j, edges ;

    edges = 0 ;
    for (i = 0 ; i < G -> number_of_vertices ; i++)
        /*
         * An undirected edge is stored twice, so only the upper triangle
         * is counted. A directed edge can sit anywhere in the matrix.
         */
        for (j = (G -> type == undirected) ? i + 1 : 0 ; j < G -> number_of_vertices ; j++)
            if (G -> matrix[i][j].weight != UNUSED_WEIGHT)
                edges++ ;
    *p_vertex_cnt = G -> number_of_vertices ;
    *p_edge_cnt = edges ;
    return ;
}

extern edge *edge_iterator( graph G, vertex vertex_number, edge *p_last_return ) {
    vertex other_vertex ;

    if (vertex_number < 0 || vertex_number >= G -> number_of_vertices)
        return NULL ;
    if (p_last_return == NULL)
        other_vertex = 0 ;
    else
        other_vertex = VERTEX(p_last_return) + 1 ;
    for ( ; other_vertex < G -> number_of_vertices ; other_vertex++) {
        if (G -> matrix[vertex_number][other_vertex].weight != UNUSED_WEIGHT)
            return &G -> matrix[vertex_number][other_vertex] ;
    }
    return NULL ;
}

static status breadth_first_search( graph G, vertex vertex_number, bool visited[], status (*p_func_f)() ) {
    edge *tmp, *p_edge ;
    queue Q ;

    visited[vertex_number] = TRUE ;
    if ((*p_func_f)(vertex_number) == ERROR) return ERROR ;

    init_queue(&Q) ;
    p_edge = NULL ;
    while ( (p_edge = edge_iterator(G, vertex_number, p_edge)) != NULL)
        qadd( &Q, (generic_ptr) p_edge ) ;

    while ( empty_queue(&Q) == FALSE ) {
        qremove( &Q, (generic_ptr *) &tmp) ;
        if (visited[VERTEX(tmp)] == FALSE) {
            visited[VERTEX(tmp)] = TRUE ;
            if ((*p_func_f)(VERTEX(tmp)) == ERROR) return ERROR ;
            p_edge = NULL ;
            while ( (p_edge = edge_iterator(G, VERTEX(tmp), p_edge)) != NULL)
                qadd( &Q, (generic_ptr) p_edge ) ;
        }
    }
    return OK ;
}

static status depth_first_search( graph G, vertex vertex_number, bool visited[], status (*p_func_f)() ) {
    edge *p_edge ;
    status rc ;

    visited[vertex_number] = TRUE ;
    if ((*p_func_f)(vertex_number) == ERROR) return ERROR ;

    p_edge = NULL ;
    while ( (p_edge = edge_iterator(G, vertex_number, p_edge)) != NULL)
        if (visited[VERTEX(p_edge)] == FALSE) {
            rc = depth_first_search(G, VERTEX(p_edge), visited, p_func_f) ;
            if (rc == ERROR) return ERROR ;
        }
    return OK ;
}

static int *count_predecessors( graph G ) {
    /* Return a new array holding the number of incoming edges of each vertex. */
    int vertex_cnt, edge_cnt, *pred ;
    edge *p_edge ;
    int i ;

    graph_size( G, &vertex_cnt, &edge_cnt) ;
    pred = (int *) malloc( sizeof(int) * vertex_cnt ) ;
    if (pred == NULL) return NULL ;

    for (i = 0 ; i < vertex_cnt ; i++)
        pred[i] = 0 ;
    for (i = 0 ; i < vertex_cnt ; i++) {
        p_edge = NULL ;
        while ( (p_edge = edge_iterator( G, i, p_edge)) != NULL )
            pred[VERTEX(p_edge)]++ ;
    }
    return pred ;
}

static int extract_min( int pred[], int n ) {
    int i ;

    /* Uses assumption that at least one element has a value of zero. */
    for (i = 0 ; i < n ; i++)
        if (pred[i] == 0) {
            pred[i] = -1 ;
            return i ;
        }
    /* Only reached if the graph has a cycle. */
    return -1 ;
}

static status topological_sort( graph G, status (*p_func_f)() ) {
    int vertex_cnt, edge_cnt, ind, *pred ;
    int count = 0 ;
    edge *p_edge ;

    graph_size( G, &vertex_cnt, &edge_cnt) ;
    pred = count_predecessors( G ) ;
    if (pred == NULL) return ERROR ;

    while ( count < vertex_cnt ) {
        ind = extract_min( pred, vertex_cnt ) ;
        /* No minimum element means a cycle, so no topological sort exists. */
        if (ind == -1 || (*p_func_f)( ind ) == ERROR) {
            free(pred) ;
            return ERROR ;
        }
        /* Remove ind from consideration by releasing its successors. */
        p_edge = NULL ;
        while ( (p_edge = edge_iterator( G, ind, p_edge)) != NULL )
            pred[VERTEX(p_edge)]-- ;
        count++ ;
    }
    free(pred) ;
    return OK ;
}

extern status traverse_graph( graph G, searchorder order, status (*p_func_f)() ) {
    status rc ;
    bool *visited ;
    int vertex_cnt, edge_cnt ;
    int i ;

    graph_size( G, &vertex_cnt, &edge_cnt) ;
    visited = (bool *) malloc(sizeof(bool) * vertex_cnt) ;
    if (visited == NULL) return ERROR ;
    for (i = 0 ; i < vertex_cnt ; i++)
        visited[i] = FALSE ;

    for ( rc = OK, i = 0 ; i < vertex_cnt && rc == OK ; i++) {
        if (visited[i] == FALSE) {
            switch (order) {
            case DEPTH_FIRST:
                rc = depth_first_search(G, i, visited, p_func_f) ;
                break ;
            case BREADTH_FIRST:
                rc = breadth_first_search(G, i, visited, p_func_f) ;
                break ;
            case TOPOLOGICAL:
                /* Handles the whole graph in one call, so end the loop. */
                rc = topological_sort( G, p_func_f ) ;
                i = vertex_cnt ;
                break ;
            }
        }
    }
    free(visited) ;
    return rc ;   /* report a failed traversal instead of always OK */
}

/**********************************************************/
/* Programmer: Willie Boag                                */
/*                                                        */
/* list.c (Topological Sort)                              */
/**********************************************************/
#include <stdlib.h>
#include "list.h"
#include "globals.h"

status allocate_node( list *p_L, generic_ptr data ) {
    list L = (list) malloc(sizeof(node));

    if (L == NULL) return ERROR;
    *p_L = L;
    DATA(L) = data;
    NEXT(L) = NULL;
    return OK;
}

void free_node( list *p_L ) {
    free(*p_L);
    *p_L = NULL;
}

status init_list( list *p_L ) {
    *p_L = NULL;
    return OK;
}

bool empty_list( list L ) {
    return (L == NULL) ? TRUE : FALSE;
}

status insert( list *p_L, generic_ptr data ) {
    /* Insert a new node containing data at the front of *p_L. */
    list L;

    if (allocate_node(&L, data) == ERROR) return ERROR;
    NEXT(L) = *p_L;
    *p_L = L;
    return OK;
}

status append( list *p_L, generic_ptr data ) {
    /* Append a new node containing data to the end of *p_L. */
    list L, tmplist;

    if (allocate_node(&L, data) == ERROR) return ERROR;
    if (empty_list(*p_L) == TRUE)
        *p_L = L;
    else {
        for (tmplist = *p_L; NEXT(tmplist) != NULL; tmplist = NEXT(tmplist))
            ;
        NEXT(tmplist) = L;
    }
    return OK;
}

status delete( list *p_L, generic_ptr *p_data ) {
    /* Delete the first node of *p_L and return its data in p_data. */
    if (empty_list(*p_L)) return ERROR;
    *p_data = DATA(*p_L);
    return delete_node(p_L, *p_L);
}

status delete_node( list *p_L, list node ) {
    list L;

    if (empty_list(*p_L) == TRUE) return ERROR;
    if (*p_L == node)
        *p_L = NEXT(*p_L);
    else {
        /* Find the node just before the one being deleted. */
        for (L = *p_L; L != NULL && NEXT(L) != node; L = NEXT(L))
            ;
        if (L == NULL)
            return ERROR;
        else
            NEXT(L) = NEXT(node);
    }
    free_node(&node);
    return OK;
}

status traverse( list L, status (*p_func_f) () ) {
    if (empty_list(L)) return OK;
    if ((*p_func_f)(DATA(L)) == ERROR) return ERROR;
    return traverse(NEXT(L), p_func_f);
}

status find_key( list L, generic_ptr key, int (*p_cmp_f)(), list *p_keynode ) {
    list curr = NULL;

    while ( ( curr = list_iterator(L, curr)) != NULL ) {
        if ((*p_cmp_f)(key, DATA(curr)) == 0 ) {
            *p_keynode = curr;
            return OK;
        }
    }
    return ERROR;
}

list list_iterator( list L, list lastreturn ) {
    return (lastreturn == NULL) ? L : NEXT(lastreturn);
}

void destroy( list *p_L, void (*p_func_f) () ) {
    if (empty_list(*p_L) == FALSE) {
        destroy(&NEXT(*p_L), p_func_f);
        if (p_func_f != NULL) (*p_func_f)(DATA(*p_L));
        free_node(p_L);
    }
}

/****************************************************/
/* Programmer: Willie Boag                          */
/*                                                  */
/* queue.c (Topological Sort)                       */
/****************************************************/
#include <stdlib.h>
#include <stdio.h>
#include "globals.h"
#include "list.h"
#include "queue.h"

extern status init_queue( queue *p_Q ) {
    /*
     * Initialize the queue to empty.
     */
    FRONT( p_Q ) = NULL;
    REAR( p_Q ) = NULL ;
    return OK ;
}

extern bool empty_queue( queue *p_Q ) {
    /*
     * Return TRUE if queue is empty, FALSE otherwise.
     */
    return (FRONT(p_Q) == NULL) ? TRUE : FALSE ;
}

extern status qadd( queue *p_Q, generic_ptr data ) {
    /*
     * Add data to p_Q.
     */
    list newnode ;

    if (allocate_node(&newnode, data) == ERROR) return ERROR;
    if (empty_queue(p_Q) == FALSE) {
        NEXT(REAR(p_Q)) = newnode ;
        REAR(p_Q) = newnode ;
    } else {
        FRONT(p_Q) = REAR(p_Q) = newnode ;
    }
    return OK ;
}

extern status qremove( queue *p_Q, generic_ptr *p_data ) {
    /*
     * Remove a value from p_Q and put in p_data.
     */
    list nodeinfront ;

    if (empty_queue(p_Q) == TRUE) return ERROR;
    nodeinfront = FRONT(p_Q) ;
    *p_data = DATA(nodeinfront) ;
    if (REAR(p_Q) == FRONT(p_Q))
        REAR(p_Q) = FRONT(p_Q) = NULL ;
    else
        FRONT(p_Q) = NEXT(nodeinfront) ;
    free_node(&nodeinfront) ;   /* release the node; the data stays with the caller */
    return OK ;
}

extern void qprint( queue Q, status (*p_func_f)() ) {
    /*
     * Apply p_func_f to each item from front to rear. The nodes are
     * walked in place rather than removed, because Q is only a copy of
     * the caller's header and removing would free nodes the caller
     * still owns.
     */
    node *curr ;

    for (curr = FRONT(&Q) ; curr != NULL ; curr = NEXT(curr))
        (*p_func_f)( DATA(curr) ) ;
    return ;
}

Bloom Filters

Abstract: Often, data is stored in sets. Two of the most important operations on sets are insertion and testing for membership. When time and space are the most important aspects of data look-up, the Bloom Filter data structure can be used. The drawback of Bloom Filters is that they are probabilistic by nature. Consequently, they could report a false positive for membership. However, they never report false negatives.

Description: There are many ways to represent a set in memory. If there exists a one-to-one correspondence between your data and the natural numbers, then the obvious choice would be a bit vector. However, such a correspondence is not always feasible, for instance when storing strings. Another implementation could be a hash table (which has a function that will map your data to an index into your table). The good thing about a hash table is that it strives to locate data very quickly. When data is stored in the table, its location is determined via the hashing function. The goal is to quickly compute the index returned by the hashing function, which would only take some constant amount of time. However, one of the most significant disadvantages of a hash table results from “collisions” (when different pieces of data map to the same index). One solution for this problem is “chaining” pieces of data together when they collide. In this solution, although data would hash to the index in a constant amount of time, there would still be the possibility of traversing a chain of data in order to find a particular element. Consequently, constant search time is not guaranteed. Furthermore, the higher the number of collisions, the less ideal the performance of the data structure.

Bloom Filters are similar to hash tables, except that they have guaranteed constant search time (but at the price of certainty). They are implemented as bit vectors, but not with the usual one-to-one correspondence. When data needs to be stored, multiple hash functions are calculated using that data, and each hash produces its own index into the bit vector. The bit at each hashed index is then set to true. An important point here is that the inserted data is not actually stored. Instead, resultant data is generated from it, and that resultant data is stored (in the bit vector). As a result, when it is tested for membership at a later time, the hash functions will generate the same indices that have already been set to true, and it will be determined that the data must have been added to the set already. Consequently, there can never be false negatives, because once a member has been added to the set, its hash-generated indices will always be set to true. Therefore, when you test whether previously inserted data is in the set, the test will never fail.

Benefits: As mentioned above, Bloom Filters have guaranteed constant look-up time. Unlike hash tables, Bloom Filters never “chain” in the event of a collision. As a result, the time required for the “yes” or “no” to be generated by searching for a member is independent of the number of collisions. In addition, Bloom Filters are implemented as bit vectors, which means that they take up very little space. This is a significant gain over hash tables, which not only store the inserted data, but also have lots of unused space in order to probabilistically avoid collisions.

Drawbacks: Just as with hash tables, collisions degrade the data structure, but here collisions result in the generation of false positives. If a piece of data that was never added to the set just happens to hash to indices that have all been set to true by other members of the set, then the Bloom Filter will mistakenly believe that the current data was inserted, and report a false positive for membership. Despite requiring little memory and guaranteeing constant search time, Bloom Filters would not be appropriate for situations that require high accuracy.
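The false-positive behavior described above can be quantified with the standard textbook approximation, which is not part of the original report. The symbols are introduced here for illustration: m is the number of positions in the bit vector, k is the number of hash functions, and n is the number of inserted elements. The approximation assumes that the k hash functions are independent and spread data uniformly over the m positions, which simple string hashes are unlikely to satisfy, so it should be read only as a rough guide.

```latex
\begin{align*}
P(\text{a given bit is still } 0) &= \left(1 - \frac{1}{m}\right)^{kn} \approx e^{-kn/m} \\
P(\text{false positive}) &\approx \left(1 - e^{-kn/m}\right)^{k}
\end{align*}
```

Under these assumptions, the false-positive rate climbs toward 1 as n grows relative to m, which matches the observation that more collisions mean less reliable answers.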

Implementation Choices: For my implementation, I chose to represent my data structure as a struct with two fields: the “bit” array, and the size of that array. My functions are able to initialize the structure, insert, delete, test for membership, and free the structure. My code focuses more on the concept than the efficiency of space. I chose to use an array of integers (instead of bits) in order to simplify the code. As a result, I was able to delete from my Bloom Filter, because my array (of integers) could hold more values than just true or false. I chose to store how many inserted elements have mapped to each index. As a result, I could safely delete from the array without the fear of zeroing out the index of a member that also hashed to the same index as the element to be removed.

My Bloom Filter was implemented in a polymorphic manner. I separated the hashing functions from the primitives, allowing them to be modified without re-compilation of the primitive functions. Furthermore, I made the decision to modulo the calculated index inside of my primitives, as opposed to inside of my hashing functions. This decision was made so that the hashing functions do not need to know how large the Bloom Filter bit array is. In addition, I created very basic interface functions for dealing with strings.

I used three hashing functions. For the purposes of dealing with strings, the three chosen functions calculate:

1. The sum of the ASCII values of the characters of the string.
2. The product of the ASCII values of the characters of the string.
3. The length of the string.

As a result, my hashing functions are very simple. More sophisticated functions (usually determined by the kind of data to be evaluated) would be used in practice in order to avoid collisions.

Motivation: I wanted to implement and discuss Bloom Filters, because I find the idea of probabilistic data structures to be fascinating. This concept seems counterintuitive, yet powerful. Just like with the randomized pivot selection of quicksort, there is a seemingly mystical quality of randomness that can allow our algorithms and data structures to perform better (in expectation) when we don’t know their exact behavior before runtime.

Results: In this example, I inserted the words of poem.txt into my Bloom Filter. I then tested to see if 20 very common English words were (probably) in my set. The words “to”, “of”, and “you” (which actually are in the file) were correctly stated to have been found using the Bloom Filter. In addition, the words “be”, “in”, and “have” were also predicted to be in my set, because they hashed to the same indices that were set to true by the words of poem.txt. One false positive example is explored in more detail below, showing which words caused the false positive “in”.

            in     to     belong   base
hash1:      23      3       23      27
hash2:      30     12       16      30
hash3:       2      2        6       4

C Implementation:

Makefile
    Makefile

Header Files
    globals.h
    bloom.h
    bloom_interface.h
    hash.h

Source Files
    main.c
    bloom.c
    bloom_interface.c
    hash.c

# Programmer: Willie Boag
#
# Makefile for Bloom Filters

bloom: main.o bloom.o bloom_interface.o hash.o
	gcc -o bloom main.o bloom.o bloom_interface.o hash.o

main.o: main.c globals.h bloom.h bloom_interface.h
	gcc -ansi -pedantic -Wall -c main.c

bloom.o: bloom.c globals.h bloom.h hash.h
	gcc -ansi -pedantic -Wall -c bloom.c

bloom_interface.o: bloom_interface.c globals.h bloom.h bloom_interface.h
	gcc -ansi -pedantic -Wall -c bloom_interface.c

hash.o: hash.c globals.h hash.h
	gcc -ansi -pedantic -Wall -c hash.c

clean:
	rm -f *.o

/*************************************************************/
/* Programmer: Willie Boag                                   */
/*                                                           */
/* globals.h (Bloom Filters)                                 */
/*************************************************************/
#ifndef _globals
#define _globals

typedef enum { OK, ERROR } status ;
typedef enum { FALSE=0 , TRUE=1 } bool ;
typedef void *generic_ptr ;

#endif

/*****************************************************/
/* Programmer: Willie Boag                           */
/*                                                   */
/* bloom.h (Bloom Filters)                           */
/*****************************************************/
#ifndef _bloom
#define _bloom

#include "globals.h"

typedef struct {
    int *base ;   /* the "bit" array (stored as counters) */
    int size ;    /* number of entries in base */
} bloom ;

extern status init_bloom( bloom *p_B, int size ) ;
extern void bloom_insert( bloom *p_B, generic_ptr data ) ;
extern bool bloom_member( bloom *p_B, generic_ptr data ) ;
extern status bloom_delete( bloom *p_B, generic_ptr data ) ;
extern void destroy_bloom( bloom *p_B ) ;

#endif

/*****************************************************/
/* Programmer: Willie Boag                           */
/*                                                   */
/* bloom_interface.h (Bloom Filters)                 */
/*****************************************************/
#ifndef _bloom_interface
#define _bloom_interface

#include "globals.h"
#include "bloom.h"

extern void str_bloom_insert( bloom *p_B, char *data ) ;
extern bool str_bloom_member( bloom *p_B, char *data ) ;
extern status str_bloom_delete( bloom *p_B, char *data ) ;

#endif

/*********************************************/
/* Programmer: Willie Boag                   */
/*                                           */
/* hash.h (Bloom Filters)                    */
/*********************************************/
#ifndef _hash
#define _hash

#include "globals.h"   /* for generic_ptr */

extern int hash1( generic_ptr str ) ;
extern int hash2( generic_ptr str ) ;
extern int hash3( generic_ptr str ) ;

#endif

/********************************************************/
/* Programmer: Willie Boag                              */
/*                                                      */
/* main.c (Bloom Filters)                               */
/********************************************************/
#include "globals.h"
#include "bloom.h"
#include "bloom_interface.h"
#include <stdio.h>
#include <stdlib.h>

#define CALL_USAGE "\n\tCall Usage: ./bloom input_file comparison_file\n\n"

int main( int argc, char *argv[] )
{
    FILE *fid1, *fid2 ;
    bloom B ;
    char word[20] ;

    if (argc != 3) {
        fprintf( stderr, "%s", CALL_USAGE ) ;
        exit(1) ;
    }

    fid1 = fopen( argv[1], "r" ) ;
    fid2 = fopen( argv[2], "r" ) ;

    /* Error-check file opens. */
    if (fid1 == NULL) {
        fprintf( stderr, "\n\tERROR: Could not open file: %s\n\n", argv[1] ) ;
        exit(1) ;
    }
    if (fid2 == NULL) {
        fprintf( stderr, "\n\tERROR: Could not open file: %s\n\n", argv[2] ) ;
        exit(1) ;
    }

    if (init_bloom( &B, 32 ) == ERROR) {
        fprintf( stderr, "\n\tERROR: Could not allocate Bloom Filter\n\n" ) ;
        exit(1) ;
    }

    /* Read input data into Bloom Filter. */
    /* The width limit keeps long words from overflowing word[20]. */
    while (fscanf( fid1, "%19s ", word ) != EOF)
        str_bloom_insert( &B, word ) ;

    /* Check comparison data for membership. */
    while (fscanf( fid2, "%19s ", word ) != EOF)
        if ( str_bloom_member(&B, word) )
            printf("\n\"%s\": maybe", word ) ;
        else
            printf("\n\"%s\": NO", word ) ;

    printf("\n\n\n" ) ;

    fclose( fid1 ) ;
    fclose( fid2 ) ;
    destroy_bloom( &B ) ;

    return 0 ;
}

/*****************************************************/
/* Programmer: Willie Boag                           */
/*                                                   */
/* bloom.c (Bloom Filters)                           */
/*****************************************************/
#include "globals.h"
#include "bloom.h"
#include "hash.h"
#include <stdlib.h>
#include <string.h>
#include <stdio.h>

extern status init_bloom( bloom *p_B, int size )
{
    int i ;
    bloom B ;

    B.base = (int *) malloc( sizeof(int) * size ) ;
    if (B.base == NULL)
        return ERROR ;

    for (i = 0 ; i < size ; i++)
        B.base[i] = 0 ;

    B.size = size ;
    *p_B = B ;

    return OK ;
}

extern void bloom_insert( bloom *p_B, generic_ptr data )
{
    int h1, h2, h3 ;

    /* The modulo is applied here so the hash functions need not know the size. */
    h1 = hash1( data ) % p_B->size ;
    h2 = hash2( data ) % p_B->size ;
    h3 = hash3( data ) % p_B->size ;

    /* Count how many inserted elements map to each index. */
    p_B->base[h1]++ ;
    p_B->base[h2]++ ;
    p_B->base[h3]++ ;
}

extern bool bloom_member( bloom *p_B, generic_ptr data )
{
    int h1, h2, h3 ;

    h1 = hash1( data ) % p_B->size ;
    h2 = hash2( data ) % p_B->size ;
    h3 = hash3( data ) % p_B->size ;

    if (p_B->base[h1] == 0) return FALSE ;
    if (p_B->base[h2] == 0) return FALSE ;
    if (p_B->base[h3] == 0) return FALSE ;

    return TRUE ;
}

extern status bloom_delete( bloom *p_B, generic_ptr data )
{
    int h1, h2, h3 ;

    /* Refuse to delete data that is definitely not in the set, */
    /* so that the counters are not driven below zero.          */
    if (bloom_member( p_B, data ) == FALSE)
        return ERROR ;

    h1 = hash1( data ) % p_B->size ;
    h2 = hash2( data ) % p_B->size ;
    h3 = hash3( data ) % p_B->size ;

    p_B->base[h1]-- ;
    if (p_B->base[h1] < 0) return ERROR ;

    p_B->base[h2]-- ;
    if (p_B->base[h2] < 0) return ERROR ;

    p_B->base[h3]-- ;
    if (p_B->base[h3] < 0) return ERROR ;

    return OK ;
}

extern void destroy_bloom( bloom *p_B )
{
    free( p_B -> base ) ;
    p_B -> base = NULL ;
}

/*****************************************************/
/* Programmer: Willie Boag                           */
/*                                                   */
/* bloom_interface.c (Bloom Filters)                 */
/*****************************************************/
#include "globals.h"
#include "bloom.h"

extern void str_bloom_insert( bloom *p_B, char *data )
{
    bloom_insert( p_B, (generic_ptr) data ) ;
}

extern bool str_bloom_member( bloom *p_B, char *data )
{
    return bloom_member( p_B, (generic_ptr) data ) ;
}

extern status str_bloom_delete( bloom *p_B, char *data )
{
    return bloom_delete( p_B, (generic_ptr) data ) ;
}

/*******************************************************/
/* Programmer: Willie Boag                             */
/*                                                     */
/* hash.c (Bloom Filters)                              */
/*******************************************************/
#include "globals.h"
#include <string.h>

/* Sum of the character values of the string. */
extern int hash1( generic_ptr data )
{
    int sum = 0 ;
    char *str = (char *) data ;

    while (*str != '\0')
        sum += (int) *str++ ;

    return sum > 0 ? sum : -sum ;
}

/* Product of the character values of the string.          */
/* Note: the product can overflow an int for longer words. */
extern int hash2( generic_ptr data )
{
    int prod = 1 ;
    char *str = (char *) data ;

    while (*str != '\0')
        prod *= (int) *str++ ;

    return prod > 0 ? prod : -prod ;
}

/* Length of the string. */
extern int hash3( generic_ptr data )
{
    return (int) strlen( (char *) data ) ;
}

Fast Fourier Transform
Abstract: This is the algorithm that changed the world. The Fourier Transform is a tool that makes analyzing waves and signals very easy, and waves are everywhere (light, sound, radar, etc.). Unfortunately, although the output of the transform is easy to work with, it used to be very expensive to actually apply the transform, which defeated the purpose of it altogether. But in 1965, James Cooley and John Tukey published an efficient way to calculate the Fourier Transform of a signal. Their algorithm is called the Fast Fourier Transform (FFT), because it is much more efficient than the original method for computing the Discrete Fourier Transform (DFT). As a result, massive amounts of data can be analyzed and processed in near-real time, profoundly impacting a wide range of technologies, including MRIs and police radar guns.

Fourier Transform: Using the Fourier Transform, any wave can be “broken down” into its fundamental building blocks: sine waves. The idea is that for any given wave, you can represent it as the sum of sine waves of different frequencies. The output of the transform would be the “coefficients” of each wave. Such a representation allows for easy calculations in noise reduction and convolution of waves.

The concept of breaking a wave down into its natural building blocks is better understood using a simpler example. Consider two 3-dimensional vectors, which we can call a and b. The goal is to add these two vectors together. There are many different methods that can be used to add them (such as physically moving them for the tail-to-tip method), but the most natural way to do it is to break the two vectors down into their x, y, and z components and add the corresponding components together. The important step here was breaking our vectors down into their x, y, and z coordinates. This made the addition of a and b very easy. To relate this analogy back to the Fourier Transform, it is the transform that actually breaks a signal into its corresponding sine waves.

Original Algorithm: The original method for calculating the Fourier Transform involves generating a matrix of complex numbers. The data representing the signal is then stored in a vector and multiplied by the complex matrix. This process is very inefficient, because the time and space required to build and store the matrix of n by n values (where n is the length of the signal vector) grows quadratically as n becomes large. The un-scalability of this algorithm is what makes it so impractical. Computers needed to analyze large amounts of signal data very quickly, and that is just not possible with the complex matrix method.

Fast Fourier Transform: Cooley and Tukey realized that they could take advantage of the special structure of the complex matrix that relied on the periodic nature of the complex entries. By separating all of the odd rows from the even rows of the calculated output vector, they saw an incredible pattern! The grouped rows could be arranged in such a way that each group was formed by the Fourier Transform of a smaller vector. I will say that once again, because of how critical that discovery is. They found a recursive formula that calculates the FFT of a vector of length n by calculating the FFTs of two vectors of length (n/2). The calculation of each (n/2)-sized vector would generate two (n/4)-sized vectors. This process of solving simpler problems could continue until the problem becomes very easy to solve. Without getting too technical in my analysis, I will just say that repeatedly cutting the input size in half allowed for very efficient calculations (on the order of n log n operations instead of n squared). As a result, the FFT greatly reduced the cost of performing the Fourier Transform on a signal vector. In addition, there was no longer the need to calculate and store the very large complex matrix for this algorithm.
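The matrix method and the recursive split described above can be written compactly. The notation here is introduced for illustration and does not appear in the original report: x is the input vector, X is its transform, E and O are the transforms of the even-indexed and odd-indexed halves of x, and n is assumed to be a power of two. The sign convention for ω follows the one used by fill_omega_vector in fft.c; other sources use the opposite sign.

```latex
\begin{align*}
X_k &= \sum_{j=0}^{n-1} x_j \,\omega_n^{jk},
  \qquad \omega_n = e^{-2\pi i/n}, \qquad k = 0, \dots, n-1 \\
E_k &= \sum_{j=0}^{n/2-1} x_{2j} \,\omega_{n/2}^{jk},
  \qquad
O_k = \sum_{j=0}^{n/2-1} x_{2j+1} \,\omega_{n/2}^{jk} \\
X_k &= E_k + \omega_n^{k}\, O_k,
  \qquad
X_{k+n/2} = E_k - \omega_n^{k}\, O_k,
  \qquad k = 0, \dots, n/2 - 1
\end{align*}
```

The first line is the matrix-vector product (entry (k, j) of the matrix is ω raised to the power jk). The last line is the recursion: each half-size transform is computed once and reused for two output entries, with only the sign in front of the scaled odd term changing.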

A Taste of Recursion: Without going into too much detail, this section will point out some of the repetitive structures of the Fast Fourier Transform calculations. The goal of this section is not to understand every little detail, but instead to begin to see the overall picture. Even without a full understanding of the mechanics of the algorithm, the important take-away is that we are able to divide the problem into smaller subproblems.

There is a picture at the bottom of the page to clarify the above statements. Colors have been used as an aid when referencing the equations. The written expression on the upper-left-hand side of the page describes the components of a length-4 signal vector after applying the Fourier Transform matrix multiplication. On the upper-right-hand side of the page, the terms of the form ω (from the complex matrix) have already been evaluated, which is where the complex numbers involving e came from. In this picture, I have grouped the odd rows and the even rows and separated the two groups with a black line. The important idea behind this observation is that the large system of equations for calculating the DFT of a length-4 vector can be re-arranged into two smaller groups that exhibit remarkable similarities in structure.

In the even grouping, notice that the terms (x0 + x2) and (x1 + x3) are the same across the top two rows. Likewise, the odd grouping has (x0 - x2) and (x1 - x3) in the same situation for the bottom two rows. In addition, both groups have the same characteristic form of f(x0,x2) + scale * f(x1,x3). For the curious learner, it can be shown that the coefficients (that is, the terms involving e) are identical within groups, except the top row of each group has a plus sign and the bottom row has a minus sign. The reason for this involves the periodic structure of complex numbers. We can obtain equal coefficients for the even vector by rewriting the second row, changing the + (as it has above) to -. We can do the same for the coefficients of the odd vector, making sure to again change the + to - in the second row.

Furthermore, the two terms (x0 + x2) and (x0 - x2) define the DFT of the length-2 vector {x0, x2}. Likewise, (x1 + x3) and (x1 - x3) define the DFT of the vector {x1, x3}. In other words, as promised, the length-4 DFT can be written in terms of two length-2 DFTs. This concludes the analysis for the length-4 FFT.
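For readers without the picture, the length-4 case can be written out directly. This is a worked sketch, with X_0 through X_3 denoting the four outputs (notation introduced here), and with the forward sign convention used in fft.c, under which the only non-trivial scale factor is ω_4 = e^{-2πi/4} = -i.

```latex
\begin{align*}
\text{even rows:}\quad
X_0 &= (x_0 + x_2) + (x_1 + x_3) \\
X_2 &= (x_0 + x_2) - (x_1 + x_3) \\[4pt]
\text{odd rows:}\quad
X_1 &= (x_0 - x_2) + \omega_4\,(x_1 - x_3) \\
X_3 &= (x_0 - x_2) - \omega_4\,(x_1 - x_3),
\qquad \omega_4 = e^{-2\pi i/4} = -i
\end{align*}
```

Each group shares its two bracketed terms and differs only in the sign of the second one. The brackets themselves are the length-2 DFTs: (x_0 + x_2, x_0 - x_2) for the vector {x_0, x_2}, and (x_1 + x_3, x_1 - x_3) for the vector {x_1, x_3}.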

Why It Matters: Although that seems like a lot of work for “simplifying” an algorithm, the gains in efficiency of the FFT over the original DFT algorithm are incredible. The important thing to consider is how fast the computation time grows as the length of the signal vector increases. I have implemented both algorithms for the Fourier Transform, and compared the computation difference for signal vectors of very large length. When reading the output of the time command, the important number to consider is the one on the middle line that says “user.” That is the time that was required for the actual algorithm to run. By comparing these two times, we can see that for a vector of length 8192, the FFT took 0.148 seconds. Compare that to the 60.392 seconds required for the ordinary DFT algorithm. In case you do not have your calculator with you, the FFT was about 408 times faster. Imagine trying to meaningfully interpret MRI results if it took 400 times longer to process the data.

Conclusion: The FFT is tremendously useful for breaking down signals and waves into their natural building blocks. Once in their more natural form, computations such as combining two overlapping signals together become very easy. As a result, signals can be processed at incredibly high speeds.

C Implementation:

Makefile
    Makefile

Header Files
    complex.h

Source Files
    fft.c
    dft.c
    complex.c

# Programmer: Willie Boag
#
# Makefile for Fast Fourier Transform

all: dft fft

fft: fft.o complex.o
	gcc -o fft fft.o complex.o -lm

dft: dft.o complex.o
	gcc -o dft dft.o complex.o -lm

# _GNU_SOURCE exposes M_PI in <math.h> when compiling with -ansi.
fft.o: fft.c complex.h
	gcc -ansi -pedantic -Wall -c fft.c -D_GNU_SOURCE

dft.o: dft.c complex.h
	gcc -ansi -pedantic -Wall -c dft.c -D_GNU_SOURCE

complex.o: complex.c complex.h
	gcc -ansi -pedantic -Wall -c complex.c

clean:
	rm -f *.o

/***************************************************************/
/* Programmer: Willie Boag                                     */
/*                                                             */
/* complex.h (Fast Fourier Transform)                          */
/***************************************************************/
#ifndef _complex
#define _complex

typedef struct {
    double real ;
    double imaginary ;
} complex ;

typedef enum { ERROR, OK } status ;
typedef enum { FALSE=0 , TRUE=1 } bool ;

complex load_complex( double real, double imaginary ) ;
complex add_complex( complex a, complex b ) ;
complex subtract_complex( complex a, complex b ) ;
complex multiply_complex( complex a, complex b ) ;

#endif

/***************************************************************/
/* Programmer: Willie Boag                                     */
/*                                                             */
/* fft.c (Fast Fourier Transform)                              */
/***************************************************************/
#include "complex.h"
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

typedef enum { FORWARD, INVERSE } direction ;

/* Global variable. */
complex *omega ;

complex *create_vector( int n )
{
    return (complex *) malloc( sizeof(complex) * n ) ;
}

/* Precompute omega[i] = e^(-2*pi*i/n) for i = 0 .. n-1. */
void fill_omega_vector( int n )
{
    int i ;
    double real, imag ;

    omega = create_vector( n ) ;

    for ( i = 0 ; i < n ; i++ ) {
        real = cos( -(2 * M_PI/n) * i ) ;
        imag = sin( -(2 * M_PI/n) * i ) ;
        omega[i] = load_complex( real, imag ) ;
    }
}

void print_vector( complex v[], int n )
{
    int i ;

    printf( "\n" ) ;
    for ( i = 0 ; i < n ; i++ )
        printf( "\t%f %f\n", v[i].real, v[i].imaginary ) ;
    printf( "\n\n" ) ;
}

/* Transform the vector p of length 2^k.                        */
/* m is the stride into omega (1 at the top level of recursion). */
complex *FFT( complex *p, int m, int k, direction dir )
{
    int n, i ;
    complex *p_e, *p_o ;
    complex *evens, *odds ;
    complex scale, scaled_odd ;
    complex *transform ;

    n = 1 << k ;   /* 2^k, computed exactly in integer arithmetic */
    transform = create_vector( n ) ;

    if (k == 0) {
        transform[0] = p[0] ;
        return transform ;
    }

    p_e = create_vector( n/2 ) ;
    p_o = create_vector( n/2 ) ;

    /* collect evens */
    for ( i = 0 ; i < n/2 ; i++ )
        p_e[i] = p[2*i] ;

    /* collect odds */
    for ( i = 0 ; i < n/2 ; i++ )
        p_o[i] = p[2*i + 1] ;

    /* Two n/2 FFTs */
    evens = FFT( p_e, 2*m, k-1, dir ) ;
    odds = FFT( p_o, 2*m, k-1, dir ) ;

    for ( i = 0 ; i < n/2 ; i++ ) {
        /* Forward or Inverse transform? */
        scale.real = omega[m*i].real ;
        scale.imaginary = (((dir == FORWARD) ? 1 : -1) * omega[m*i].imaginary) ;

        /* scale * odd */
        scaled_odd = multiply_complex( scale, odds[i] ) ;

        /* even + (scale * odd) */
        transform[ i ] = add_complex( evens[i], scaled_odd ) ;

        /* even - (scale * odd) */
        transform[ n/2 + i ] = subtract_complex( evens[i], scaled_odd ) ;
    }

    /* Scale result by 1/n for inverse FFT. */
    if (dir == INVERSE && m == 1)
        for (i = 0 ; i < n ; i++) {
            transform[i].real /= (double) n ;
            transform[i].imaginary /= (double) n ;
        }

    free( p_e ) ;
    free( p_o ) ;
    free( evens ) ;
    free( odds ) ;

    return transform ;
}

complex *pointwise_complex_multiply( complex *u, complex *v, int n )
{
    int i ;
    complex *w ;

    w = (complex *) malloc( sizeof(complex) * n ) ;

    for ( i = 0 ; i < n ; i++ )
        w[i] = multiply_complex( u[i], v[i] ) ;

    return w ;
}

int main( int argc, char *argv[] )
{
    int i, n, k ;
    complex *u, *v, *w ;
    complex *uf, *vf, *wf ;

    if (argc != 2) {
        fprintf( stderr, "\n\tCall Usage: ./fft n\n\n" ) ;
        return 1 ;
    }

    n = atoi( argv[1] ) ;

    /* k = log2(n), found with integer arithmetic to avoid rounding error. */
    for ( k = 0 ; (1 << k) < n ; k++ )
        ;

    /* The recursion assumes that n is a power of two. */
    if (n < 2 || (1 << k) != n) {
        fprintf( stderr, "\n\tERROR: n must be a power of two\n\n" ) ;
        return 1 ;
    }

    fill_omega_vector( n ) ;

    u = create_vector( n ) ;
    v = create_vector( n ) ;

    for ( i = 0 ; i < n/2 ; i++ ) {
        u[i] = load_complex( i + 1.0, 0.0 ) ;
        u[i+n/2] = load_complex( 0.0 , 0.0 ) ;

        v[i] = load_complex( 1.0 , 0.0 ) ;
        v[i+n/2] = load_complex( 0.0 , 0.0 ) ;
    }

    uf = FFT(u, 1, k, FORWARD ) ;
    vf = FFT(v, 1, k, FORWARD ) ;

    wf = pointwise_complex_multiply( uf, vf, n ) ;

    w = FFT(wf, 1, k, INVERSE) ;

    printf("\n\nw:") ;
    print_vector(w, n) ;

    free( omega ) ;
    free( u ) ;
    free( v ) ;
    free( w ) ;
    free( uf ) ;
    free( vf ) ;
    free( wf ) ;

    return 0 ;
}

/*****************************************************/
/* Programmer: Willie Boag                           */
/*                                                   */
/* dft (vs. Fast Fourier Transform)                  */
/*****************************************************/
#include "complex.h"
#include <stdlib.h>
#include <stdio.h>
#include <math.h>

complex *complex_matrix_vector_multiply( complex *A, complex *x, int n ) ;
complex *pointwise_complex_multiply( complex *u, complex *v, int n ) ;

void print_vector( complex v[], int n )
{
    int i ;

    printf( "\n" ) ;
    for ( i = 0 ; i < n ; i++ )
        printf( "\t%f %f\n", v[i].real, v[i].imaginary ) ;
    printf( "\n\n" ) ;
}

int main( int argc, char *argv[] )
{
    int n, i, j ;
    double real, imag ;
    complex *F, *IF ;
    complex *u, *v, *w ;
    complex *uf, *vf, *wf ;

    n = atoi( argv[1] ) ;

    F = (complex *) malloc( sizeof(complex) * n * n ) ;
    IF = (complex *) malloc( sizeof(complex) * n * n ) ;

    u = (complex *) malloc( sizeof(complex) * n ) ;
    v = (complex *) malloc( sizeof(complex) * n ) ;

    for ( i = 0 ; i < n ; i++ ) {

        /* Fourier Transform matrix */
        for ( j = 0 ; j < n ; j++ ) {
            real = cos(((2 * M_PI)/n) * ((i * j) % n) ) ;
            imag = sin(((2 * M_PI)/n) * ((i * j) % n) ) ;
            F[i*n + j] = load_complex( real, imag ) ;
        }

        /* Inverse Fourier Transform matrix */
        for ( j = 0 ; j < n ; j++ ) {
            real = (1.0/n) * cos(((2 * M_PI)/n) * (-(i * j) % n) ) ;
            imag = (1.0/n) * sin(((2 * M_PI)/n) * (-(i * j) % n) ) ;
            IF[i*n + j] = load_complex( real, imag ) ;
        }
    }

    /* Fill vectors with data */
    for ( i = 0 ; i < n/2 ; i++ ) {
        u[i] = load_complex( i + 1.0, 0.0 ) ;
        u[i+n/2] = load_complex( 0.0 , 0.0 ) ;

        v[i] = load_complex( 1.0 , 0.0 ) ;
        v[i+n/2] = load_complex( 0.0 , 0.0 ) ;
    }

    /* Perform Fourier Transform */
    uf = complex_matrix_vector_multiply( F, u, n ) ;
    vf = complex_matrix_vector_multiply( F, v, n ) ;

    wf = pointwise_complex_multiply( uf, vf, n ) ;

    w = complex_matrix_vector_multiply( IF, wf, n ) ;

    printf("\n\nw:") ;
    print_vector(w, n) ;

    free( u ) ;
    free( v ) ;
    free( w ) ;
    free( uf ) ;
    free( vf ) ;
    free( wf ) ;
    free( F ) ;
    free( IF ) ;

    return 0 ;
}

complex *pointwise_complex_multiply( complex *u, complex *v, int n )
{
    int i ;
    complex *w ;

    w = (complex *) malloc( sizeof(complex) * n ) ;

    for ( i = 0 ; i < n ; i++ )
        w[i] = multiply_complex( u[i], v[i] ) ;

    return w ;
}

/* Compute b = A x, where A is an n by n matrix stored row by row. */
complex *complex_matrix_vector_multiply( complex *A, complex *x, int n )
{
    int i, j ;
    complex sum, prod ;
    complex *b ;

    b = (complex *) malloc( sizeof(complex) * n ) ;

    for ( i = 0 ; i < n ; i++ ) {
        sum = load_complex( 0.0, 0.0 ) ;

        for ( j = 0 ; j < n ; j++ ) {
            prod = multiply_complex( A[i*n + j], x[j] ) ;
            sum = add_complex( sum, prod ) ;
        }

        b[i] = sum ;
    }

    return b ;
}

/***************************************************************/
/* Programmer: William George Boag                             */
/*                                                             */
/* complex.c (Fast Fourier Transform)                          */
/***************************************************************/
#include "complex.h"
#include <stdio.h>

extern complex load_complex( double real, double imaginary )
{
    complex c ;

    c.real = real ;
    c.imaginary = imaginary ;

    return c ;
}

extern complex add_complex( complex a, complex b )
{
    complex sum ;

    sum.real = a.real + b.real ;
    sum.imaginary = a.imaginary + b.imaginary ;

    return sum ;
}

extern complex subtract_complex( complex a, complex b )
{
    complex diff ;

    diff.real = a.real - b.real ;
    diff.imaginary = a.imaginary - b.imaginary ;

    return diff ;
}

/* (ar + ai*i)(br + bi*i) = (ar*br - ai*bi) + (ar*bi + ai*br)*i */
extern complex multiply_complex( complex a, complex b )
{
    complex prod ;
    double ar, ai, br, bi ;

    ar = a.real ;
    ai = a.imaginary ;
    br = b.real ;
    bi = b.imaginary ;

    prod.real = (ar * br) - (ai * bi) ;
    prod.imaginary = (ar * bi) + (ai * br) ;

    return prod ;
}

Appendices

Appendix A
    Kruskal’s MST Testing Code

Appendix B
    Topological Sort Testing Code

/****************************************************************/
/* Programmer: Willie Boag                                      */
/*                                                              */
/* create.c                                                     */
/*                                                              */
/* Task: Create a randomly generated graph that is guaranteed   */
/*       to be connected. The graph is stored in the file       */
/*       tmp.txt                                                */
/****************************************************************/
#include <stdlib.h>
#include <stdio.h>
#include <time.h>

int main( int argc, char *argv[] )
{
    FILE *fid ;
    int vertices, edges ;
    int a, b, weight ;
    int i ;

    /* Set new seed. */
    srand(time(NULL)) ;

    /* Create a graph file. */
    fid = fopen( "tmp.txt", "w" ) ;
    if (fid == NULL) {
        fprintf( stderr, "\n\tERROR: Could not open file: tmp.txt\n\n" ) ;
        exit(1) ;
    }

    /* Randomly determine number of edges and vertices. */
    vertices = rand() % 20 + 1 ;
    edges = rand() % 50 ;

    fprintf( fid, "%d\n", vertices ) ;

    for (i = 0 ; i < edges ; i++) {

        /* Random edge */
        a = rand() % vertices ;
        b = rand() % vertices ;
        weight = rand() % 30 ;

        /* No self-loops. */
        if (a == b) b = (b + 1) % vertices ;

        fprintf( fid, "%d %d %d\n", a, b, weight ) ;
    }

    /* Guarantee that graph is connected: join vertex 0 to every  */
    /* other vertex. Starting at 1 avoids writing a 0-0 self-loop. */
    for (i = 1 ; i < vertices ; i++)
        fprintf( fid, "0 %d 1000\n", i ) ;

    fclose( fid ) ;

    return 0 ;
}

/************************************************************************/
/* Programmer: Willie Boag                                              */
/*                                                                      */
/* Program: compare.c                                                   */
/*                                                                      */
/* Task: Find a topological sort of a set of randomly created graphs.   */
/*       Then, verify that the computed solution is correct.            */
/************************************************************************/
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

#define CALL_USAGE "\n\tCall Usage: ./compare [n]\n\n"

void create_graph( void ) ;
int is_topo( void ) ;

int main( int argc, char *argv[] )
{
    int n = 20, failures = 0 ;
    int i ;
    char command[70] ;

    /* Call Usage error-check */
    if (argc != 1 && argc != 2) {
        fprintf( stderr, "%s", CALL_USAGE ) ;
        exit(1) ;
    }

    /* Change number of iterations, if desired. */
    if (argc == 2)
        n = atoi(argv[1]) ;

    /* Set new seed once, so that graphs created within the same */
    /* second are not identical.                                 */
    srand(time(NULL)) ;

    for (i = 0 ; i < n ; i++) {

        /* Create random graph (possibly containing cycles). */
        create_graph() ;

        /* Convert the graph into a DAG using the Minimum Spanning Forest algorithm. */
        system( "~wboag/Public/msf .tmp.txt > .tmp2.txt" ) ;

        /* Find a topological sort of the DAG. */
        sprintf( command, "tsort .tmp2.txt > .tmp.txt" ) ;
        system( command ) ;

        /* Verify if topological sort is correct. */
        if ( !is_topo() ) {
            failures++ ;
            fprintf(stderr, "\n\tFAILURE #%d", failures ) ;
            fprintf(stderr, "\n\tFiles: %d, %d\n\n", failures*2+1, failures*2 + 2 ) ;

            sprintf( command, "cp .tmp.txt fail%d.txt", 2*failures + 1 ) ;
            system( command ) ;
            sprintf( command, "cp .tmp2.txt fail%d.txt", 2*failures + 2 ) ;
            system( command ) ;
        }
    }

    /* Analysis of tests. */
    printf("\n\nPassed on %d/%d random graphs.\n\n\n", n - failures, n ) ;

    return 0 ;
}

void create_graph( void )
{
    FILE *fid ;
    int vertices, edges ;
    int a, b, weight ;
    int i ;

    /* Create a graph file. */
    fid = fopen(".tmp.txt", "w" ) ;
    if (fid == NULL) {
        fprintf( stderr, "\n\tERROR: Could not open file: .tmp.txt\n\n" ) ;
        exit(1) ;
    }

    /* Randomly determine number of edges and vertices. */
    vertices = rand() % 20 + 1 ;
    edges = rand() % 50 ;

    fprintf( fid, "%d\n", vertices ) ;

    for (i = 0 ; i < edges ; i++) {

        /* Random edge */
        a = rand() % vertices ;
        b = rand() % vertices ;
        weight = rand() % 30 ;

        /* No self-loops. The parentheses keep b inside [0, vertices). */
        if (a == b) b = (b + 1) % vertices ;

        fprintf( fid, "%d %d %d\n", a, b, weight ) ;
    }

    fclose( fid ) ;
}

int is_topo( void )
{
    FILE *fid ;
    int vertices ;
    int a, b, weight ;
    int *visited, **depends ;
    int i, j ;

    /* Initialize the dependency matrix. */
    fid = fopen( ".tmp2.txt", "r") ;
    fscanf( fid, "%d", &vertices ) ;

    depends = (int **) malloc( sizeof(int *) * vertices ) ;
    for (i = 0 ; i < vertices ; i++)
        depends[i] = (int *) malloc( sizeof(int) * vertices ) ;

    for (i = 0 ; i < vertices ; i++)
        for (j = 0 ; j < vertices ; j++)
            depends[i][j] = 0 ;

    /* Initialize the visited "bit vector" */
    visited = (int *) malloc( sizeof(int) * vertices ) ;
    for (i = 0 ; i < vertices ; i++)
        visited[i] = 0 ;

    /* Update dependency matrix for given DAG. */
    while (fscanf(fid, "%d %d %d", &a, &b, &weight) != EOF)
        depends[b][a] = 1 ;

    fclose( fid ) ;

    /* Sweep through alleged topological sort. Check dependencies */
    fid = fopen( ".tmp.txt", "r" ) ;
    fscanf( fid, "%*[^:]:" ) ;

    for (i = 0 ; i < vertices ; i++) {

        /* Next vertex in the alleged ordering. The extracted listing */
        /* indexed by i here; a is used so the value read is checked. */
        fscanf( fid, "%d", &a ) ;

        for ( j = 0 ; j < vertices ; j++)
            /* Case: dependency is present, but they are out of order. */
            if ( depends[a][j] == 1 && visited[j] == 0)
                return 0 ;

        visited[a] = 1 ;
    }

    fclose( fid ) ;

    return 1 ;
}
