
The curiosity of our surroundings is the constant force that drives humanity forward. In an effort to better ourselves and our future we study the world we live in. Our own inventions define the way in which we lead our lives, and what people we become. Computers have now become the center of many cultures around the world, as technology unites people in a way that has never been seen before, changing the way we view each other. In the last decade, the Internet has become a medium that lets people share their ideas and opinions with millions of others. We are very close to a world where ignorance is no longer an excuse. Computers are, and will be even more, in the center of our lives.
-- "World Around Us" by Alec Solway

Computer programming is an exciting field in the modern world. We make our lives easier by "telling" the computer to perform certain tasks for us; in a sense, this is what programming is. All types of tasks require some kind of data to be manipulated. Whether we want to play a game or manage our portfolio, data is involved. By creating new ways to manage (access and change) data, we can make programs more efficient, and thus obtain more reliable and faster results. Different types of programs require different ways of handling data; however, standards exist across programs. This website gives you a peek into those standards, into the world of data structures and algorithms.

It is recommended that the reader have some experience with programming in general, although a brief review of the C++ concepts needed to understand the data structure tutorials is provided. Please click on "C++ Review" to begin your visit, or on "Data Structures" if you already have a solid C++ programming background. Enjoy!

© 2000 ThinkQuest Team C005618

file://F:\My Ebooks\Introduction_to_Data_Structures_with_C++\An Introduction to Data ... 12/13/2007

An Introduction to Data Structures with C++


Binary Trees

Binary trees differ from the three structures covered previously. Lists, stacks, and queues are all linear structures; that is, their elements logically follow one another. A binary tree instead contains a root node, which is the first node in the structure. The root points to at most two other nodes, its left and right children, and is considered the parent of these nodes. Each child is itself the root of a subtree, since it can have up to two children of its own. A node with no children is called a leaf node. Each node in the tree also has a level: the root is at level 0, its children are at level 1, and the level increases by one with each row of nodes below the root.

Binary trees have many different basic implementations. An array implementation is often used, in which a slot is reserved for every possible position on each level, whether or not a node actually occupies it. In larger trees that are not completely filled, this can waste a great deal of space. For our demonstration, we will create a generic class that uses dynamic memory allocation. This particular implementation was created by the author from a mixture of possible approaches, and it is very effective in explaining the concepts behind binary trees.

The binary tree class gives the programmer complete control over the tree: nodes may be inserted into and removed from any location in the tree. The class lets the user traverse the tree by keeping a current pointer, just as in the linked list class, and the functions left(), right(), and parent() move current from one node to another. The class can also display the tree in in-order, pre-order, and post-order. The "order" refers to the sequence in which the nodes are displayed. In pre-order, a node's value is displayed first, followed by its entire left subtree and then its entire right subtree. In in-order, the node's value is displayed between its left and right subtrees. In post-order, both subtrees are displayed before the node itself.
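To make the three orders concrete, here is a minimal sketch that records each traversal into a std::vector instead of printing it. The Node struct and the functions preorder, inorder, and postorder are hypothetical names used only for this illustration; the tutorial's class uses TreeNode and the Display functions instead.

```cpp
#include <cassert>  // for checking results
#include <vector>

// Hypothetical minimal node type for this sketch only (no parent pointer).
struct Node {
    int data;
    Node* left;
    Node* right;
};

// Pre-order: the node, then its left subtree, then its right subtree.
void preorder(const Node* n, std::vector<int>& out) {
    if (n == nullptr) return;
    out.push_back(n->data);
    preorder(n->left, out);
    preorder(n->right, out);
}

// In-order: the left subtree, then the node, then the right subtree.
void inorder(const Node* n, std::vector<int>& out) {
    if (n == nullptr) return;
    inorder(n->left, out);
    out.push_back(n->data);
    inorder(n->right, out);
}

// Post-order: the left subtree, then the right subtree, then the node.
void postorder(const Node* n, std::vector<int>& out) {
    if (n == nullptr) return;
    postorder(n->left, out);
    postorder(n->right, out);
    out.push_back(n->data);
}
```

For a root 1 with children 2 and 3, where 2 has children 4 and 5, pre-order yields 1 2 4 5 3, in-order yields 4 2 5 1 3, and post-order yields 4 5 2 3 1.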
The implementation of the binary tree class is shown below. Although it may look intimidating at first, the code is easy to follow, and the purpose of each function is explained after the class definition. Most functions are written recursively: since each node is the root of a tree of its own, recursion is the most natural approach.

Many books create a class for a single node and use it to implement the tree. Here, we instead separate a lightweight structure for each node from the class for the whole tree, which reduces overhead: creating a node costs far less time and memory than creating an entire tree object would. Each node stores a value and pointers to its children and its parent. These are used and modified by the tree class.

    #include <cassert>   // assert() in insert() and the peek functions
    #include <cstddef>   // NULL
    #include <iostream>  // std::cout in the display functions
    #include <new>       // std::nothrow in IsFull()

    template <class ItemType>
    struct TreeNode
    {
        ItemType data;
        TreeNode<ItemType> *left;
        TreeNode<ItemType> *right;
        TreeNode<ItemType> *parent;
    };

    template <class ItemType>
    class BinaryTree
    {
    public:
        BinaryTree();   // create an empty tree with no nodes; main_root and current are NULL
        // Create a new tree with the passed node as the new main root; current is set to the main root.
        // If the second parameter is 0, the new object simply points to the nodes of the original tree.
        // If the second parameter is 1, a new copy of the subtree is created, which the object points to.
        BinaryTree(TreeNode<ItemType>*, int);
        ~BinaryTree();

        void insert(const ItemType&, int);   // insert new node as child of current: 0 = left, 1 = right
        void remove(TreeNode<ItemType>*);    // delete a node and its subtree
        ItemType value() const;              // return the value of current

        // navigate the tree
        void left();
        void right();
        void parent();
        void reset();                        // go to main_root
        void SetCurrent(TreeNode<ItemType>*);

        // return subtree (node) pointers
        TreeNode<ItemType>* pointer_left() const;
        TreeNode<ItemType>* pointer_right() const;
        TreeNode<ItemType>* pointer_parent() const;
        TreeNode<ItemType>* pointer_current() const;

        // return the values of the children and parent without leaving the current node
        ItemType peek_left() const;
        ItemType peek_right() const;
        ItemType peek_parent() const;

        // print the tree or a subtree; only works if ItemType is supported by the << operator
        void DisplayInorder(TreeNode<ItemType>*) const;
        void DisplayPreorder(TreeNode<ItemType>*) const;
        void DisplayPostorder(TreeNode<ItemType>*) const;

        void clear();                        // delete all nodes in the tree
        bool IsEmpty() const;
        bool IsFull() const;

    private:
        TreeNode<ItemType>* current;
        TreeNode<ItemType>* main_root;
        // create a new copy of a subtree if passed to the constructor
        TreeNode<ItemType>* CopyTree(TreeNode<ItemType>*, TreeNode<ItemType>*) const;
        bool subtree;   // does the object reference part of a larger tree?
    };

The first constructor simply sets the main_root and current data members to NULL, since the tree has no nodes. Because a brand-new tree is created, it is not part of a larger tree object, so subtree is set to false.

    template <class ItemType>
    BinaryTree<ItemType>::BinaryTree()
    {
        // start with an empty tree: there are no nodes yet
        main_root = NULL;
        current = NULL;
        subtree = false;
    }

The second constructor accepts a pointer to a node and creates a new tree object in which the passed node acts as the new tree's main root; current is then set to that main root. The second parameter specifies whether the new object points directly to the original tree's nodes (the root and its descendants), or creates a copy of the subtree and is thus an independent tree. The subtree data member records whether the object points directly to the original tree's nodes. As you will see later, this is important in the class destructor.

    template <class ItemType>
    BinaryTree<ItemType>::BinaryTree(TreeNode<ItemType>* root, int op)
    {
        if (op == 0)   // share the original tree's nodes
        {
            main_root = root;
            current = root;
            subtree = true;
        }
        else           // build an independent copy of the subtree
        {
            main_root = CopyTree(root, NULL);
            current = main_root;
            subtree = false;
        }
    }

The CopyTree() function creates a copy of the subtree rooted at root and returns a pointer to the new copy's root node. The second parameter is a pointer to the node that should become the parent of the copy's root. Since CopyTree() uses recursion to traverse the original tree, passing each new node's parent as a parameter is the most efficient way of assigning each new node's parent pointer. Since the main root of the new tree has no parent, the class constructor above passes NULL as the second parameter.

    template <class ItemType>
    TreeNode<ItemType>* BinaryTree<ItemType>::CopyTree(TreeNode<ItemType> *root, TreeNode<ItemType> *parent) const
    {
        if (root == NULL)   // base case: if the node doesn't exist, return NULL
            return NULL;
        TreeNode<ItemType>* tmp = new TreeNode<ItemType>;   // make a new location in memory
        tmp->data = root->data;                    // make a copy of the node's data
        tmp->parent = parent;                      // set the new node's parent
        tmp->left = CopyTree(root->left, tmp);     // copy the left subtree, passing the new node as its parent
        tmp->right = CopyTree(root->right, tmp);   // do the same with the right subtree
        return tmp;                                // return a pointer to the newly created node
    }
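One way to convince yourself that CopyTree() produces an independent copy with correct parent pointers is to run the same recursion on a small tree and check the result. The sketch below is a standalone, int-only version; Node, copyTree, parentsConsistent, and freeTree are hypothetical names for this illustration, not part of the tutorial's class.

```cpp
#include <cassert>  // for checking results

// Hypothetical int-only node mirroring the fields of TreeNode.
struct Node {
    int data;
    Node* left;
    Node* right;
    Node* parent;
};

// Deep-copies the subtree rooted at root; parent becomes the copy's parent.
Node* copyTree(const Node* root, Node* parent) {
    if (root == nullptr) return nullptr;
    Node* tmp = new Node{root->data, nullptr, nullptr, parent};
    tmp->left = copyTree(root->left, tmp);    // children point back at the new node
    tmp->right = copyTree(root->right, tmp);
    return tmp;
}

// True if every child's parent pointer refers to the node that owns it.
bool parentsConsistent(const Node* n) {
    if (n == nullptr) return true;
    if (n->left != nullptr && n->left->parent != n) return false;
    if (n->right != nullptr && n->right->parent != n) return false;
    return parentsConsistent(n->left) && parentsConsistent(n->right);
}

// Frees a subtree, children before their parent.
void freeTree(Node* n) {
    if (n == nullptr) return;
    freeTree(n->left);
    freeTree(n->right);
    delete n;
}
```

Changing a value in the original after copying leaves the copy untouched, which is exactly why the copying constructor sets subtree to false: the copy owns its own nodes.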

The job of the class destructor is, as usual, to delete all the nodes and free their memory. The clear() function is called, just as in the previous data structure implementations. However, this operation is performed only if the object is a main tree. If the object is a subtree that points to the nodes of a larger tree, those nodes will be deleted when the main tree itself is deleted. Deleting them again through the subtree object would free the same memory twice, which has unpredictable (undefined) results.

    template <class ItemType>
    BinaryTree<ItemType>::~BinaryTree()
    {
        if (!subtree)
            clear();   // delete all nodes, unless they belong to a larger tree
    }

The insert() function creates a new node as a child of current. The first parameter is the value for the new node, and the second is an integer indicating which child the new node will become: 0 makes the new node the left child of current, and 1 makes it the right child. If a node already exists in the location where the programmer wishes to insert, that node takes on the value passed to insert() instead. If the tree does not have any nodes, the second parameter is disregarded and a main root is created.

    // insert as a child of current: 0 = left, 1 = right. If the child already exists, replace its value.
    template <class ItemType>
    void BinaryTree<ItemType>::insert(const ItemType &item, int pos)
    {
        assert(!IsFull());
        // if the tree has no nodes, make a root node and disregard pos
        if (main_root == NULL)
        {
            main_root = new TreeNode<ItemType>;
            main_root->data = item;
            main_root->left = NULL;
            main_root->right = NULL;
            main_root->parent = NULL;
            current = main_root;
            return;   // node created, exit the function
        }
        if (pos == 0)   // new node is a left child of current
        {
            if (current->left != NULL)   // if the child already exists, replace its value
                current->left->data = item;
            else
            {
                current->left = new TreeNode<ItemType>;
                current->left->data = item;
                current->left->left = NULL;
                current->left->right = NULL;
                current->left->parent = current;
            }
        }
        else   // new node is a right child of current
        {
            if (current->right != NULL)   // if the child already exists, replace its value
                current->right->data = item;
            else
            {
                current->right = new TreeNode<ItemType>;
                current->right->data = item;
                current->right->left = NULL;
                current->right->right = NULL;
                current->right->parent = current;
            }
        }
    }

The remove() function removes the subtree referenced by root, including the root node itself. Depending on whether root was a left or a right child, the left or right pointer of its parent is set to NULL. The function uses recursion to perform the operation on all nodes of the subtree. We must start with the nodes on the lowest level and work our way up: if we deleted the top-level nodes first, we would lose the links to the lower levels. Afterward, current is set to the parent of the removed subtree.

    template <class ItemType>
    void BinaryTree<ItemType>::remove(TreeNode<ItemType>* root)
    {
        if (root == NULL)   // base case: if the root doesn't exist, do nothing
            return;
        remove(root->left);    // remove the node's left subtree first
        remove(root->right);   // then its right subtree
        if (root->parent == NULL)   // if the main root is being deleted, main_root must be set to NULL
            main_root = NULL;
        else
        {
            // make the parent of the subtree's root point to NULL, since the node will no longer exist
            if (root->parent->left == root)
                root->parent->left = NULL;
            else
                root->parent->right = NULL;
        }
        current = root->parent;   // set current to the parent of the removed subtree
        delete root;
    }
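The children-first (post-order) deletion can be sketched in isolation. This version, assuming an int-only Node and the hypothetical helpers deleteSubtree and removeSubtree, splits the work into unlinking the subtree from its parent and then freeing it, and it returns the number of freed nodes so the effect is easy to check. It is an illustration of the same idea, not the tutorial's remove().

```cpp
#include <cassert>  // for checking results

// Hypothetical int node for this sketch, mirroring the fields of TreeNode.
struct Node {
    int data;
    Node* left;
    Node* right;
    Node* parent;
};

// Frees the subtree rooted at n in post-order and returns how many nodes
// were freed. Both children are handled before n is deleted, so no node is
// freed while its descendants are still reachable only through it.
int deleteSubtree(Node* n) {
    if (n == nullptr) return 0;
    int freed = deleteSubtree(n->left);
    freed += deleteSubtree(n->right);
    delete n;
    return freed + 1;
}

// Unlinks n from its parent (or clears root when n has no parent, i.e. n is
// assumed to be the main root) and then frees its subtree.
int removeSubtree(Node*& root, Node* n) {
    if (n == nullptr) return 0;
    if (n->parent == nullptr)
        root = nullptr;
    else if (n->parent->left == n)
        n->parent->left = nullptr;
    else
        n->parent->right = nullptr;
    return deleteSubtree(n);
}
```

Removing the left subtree of a root whose left child has one child of its own frees two nodes and leaves the root's left pointer NULL, while the right subtree is untouched.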


The next function returns the value of current.

    template <class ItemType>
    ItemType BinaryTree<ItemType>::value() const
    {
        return current->data;
    }

The next five functions are used to navigate the tree. The programmer can visit a node's left child, right child, or parent, as well as reset current to the main root. Finally, the programmer can set current to a specific node by supplying a pointer to it, which is very helpful when working with subtrees within the main tree object. Note that SetCurrent() should be used with caution: if it is given a pointer to a node that is not within the tree, the results are unpredictable. Likewise, left(), right(), and parent() do not check whether the target node exists, so moving below a leaf or above the main root sets current to NULL.

    template <class ItemType>
    void BinaryTree<ItemType>::left()
    {
        current = current->left;
    }

    template <class ItemType>
    void BinaryTree<ItemType>::right()
    {
        current = current->right;
    }

    template <class ItemType>
    void BinaryTree<ItemType>::parent()
    {
        current = current->parent;
    }

    template <class ItemType>
    void BinaryTree<ItemType>::reset()
    {
        current = main_root;
    }

    template <class ItemType>
    void BinaryTree<ItemType>::SetCurrent(TreeNode<ItemType>* root)
    {
        current = root;
    }

The four functions that follow return pointers to various nodes in the tree, relative to current. Such pointers are required arguments for a few of our other functions, such as remove() and the three display functions, and one of our class constructors uses them to make a new tree object from a subtree. Strictly speaking, only pointer_current() is required, since the programmer can navigate the tree to any node; the other three functions were included for ease of use. It is often necessary to perform an operation on a node's children or parent without leaving the node. The functions are also useful if a programmer would like to work on a subtree: an external TreeNode* pointer can be created, set by one of the pointer-returning functions, and then passed to the operation functions of the class.

    template <class ItemType>
    TreeNode<ItemType>* BinaryTree<ItemType>::pointer_left() const
    {
        return current->left;
    }

    template <class ItemType>
    TreeNode<ItemType>* BinaryTree<ItemType>::pointer_right() const
    {
        return current->right;
    }

    template <class ItemType>
    TreeNode<ItemType>* BinaryTree<ItemType>::pointer_parent() const
    {
        return current->parent;
    }

    template <class ItemType>
    TreeNode<ItemType>* BinaryTree<ItemType>::pointer_current() const
    {
        return current;
    }

The next three functions are also not required, but were added for ease of use. They return the values of a node's two children and parent without having to leave the node.

    template <class ItemType>
    ItemType BinaryTree<ItemType>::peek_left() const
    {
        assert(current->left != NULL);
        return current->left->data;
    }

    template <class ItemType>
    ItemType BinaryTree<ItemType>::peek_right() const
    {
        assert(current->right != NULL);
        return current->right->data;
    }

    template <class ItemType>
    ItemType BinaryTree<ItemType>::peek_parent() const
    {
        assert(current->parent != NULL);
        return current->parent->data;
    }

The display functions described above come next. Note that these functions work only if ItemType supports the << operator; any simple built-in C/C++ type (such as int, float, or char) works without modification. Each function takes the root of the tree or subtree to display, for example a pointer returned by pointer_current(). Each function recurses using its own traversal order.

    template <class ItemType>
    void BinaryTree<ItemType>::DisplayInorder(TreeNode<ItemType>* root) const
    {
        if (root == NULL)
            return;
        DisplayInorder(root->left);
        std::cout << root->data << ' ';
        DisplayInorder(root->right);
    }

    template <class ItemType>
    void BinaryTree<ItemType>::DisplayPreorder(TreeNode<ItemType>* root) const
    {
        if (root == NULL)
            return;
        std::cout << root->data << ' ';
        DisplayPreorder(root->left);
        DisplayPreorder(root->right);
    }

    template <class ItemType>
    void BinaryTree<ItemType>::DisplayPostorder(TreeNode<ItemType>* root) const
    {
        if (root == NULL)
            return;
        DisplayPostorder(root->left);
        DisplayPostorder(root->right);
        std::cout << root->data << ' ';
    }

The clear() function deletes all nodes in the tree. This is easy to do, since we can take advantage of the remove() function, which we have already defined. remove() deletes all nodes of a subtree, including its root node, so passing the main root to remove() deletes every node in the tree.

    template <class ItemType>
    void BinaryTree<ItemType>::clear()
    {
        remove(main_root);   // use the remove function on the main root
        main_root = NULL;    // since there are no more nodes, set main_root to NULL
        current = NULL;
    }

The IsEmpty() function works by evaluating main_root. If there aren't any nodes in the tree, main_root points to NULL.

    template <class ItemType>
    bool BinaryTree<ItemType>::IsEmpty() const
    {
        return (main_root == NULL);
    }


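Finally, apart from the data types, IsFull() follows the same approach as in the previous classes: it tries to allocate a new node and reports that the tree is full if the allocation fails. One caution: in standard C++, a plain new expression throws std::bad_alloc on failure instead of returning NULL, so this version uses new (std::nothrow), declared in <new>, to make the NULL check meaningful.

    template <class ItemType>
    bool BinaryTree<ItemType>::IsFull() const
    {
        // new (std::nothrow) returns NULL instead of throwing when memory runs out
        TreeNode<ItemType> *tmp = new (std::nothrow) TreeNode<ItemType>;
        if (tmp == NULL)
            return true;
        delete tmp;
        return false;
    }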

Now let's look at two additional functions that are not part of the tree class. It is often necessary to know how many nodes are in a tree, or how many of them are leaves. One example of when a leaf count is required is a binary expression tree. Binary expression trees store mathematical expressions, for instance 5*x+7=22. Each operator and each operand (constant or variable) of the expression is represented by one node, and the nodes are arranged so that the expression can be displayed using an in-order traversal. Pre-order and post-order traversals display the same expression in prefix and postfix notation, respectively. This works because each operator stored in a node performs an operation on its two children. In such a setup, all operators are internal nodes, whereas variables and constants are leaves. The code for the NodeCount() and LeafCount() functions is shown below. Both are very short, since they use recursion.

    template <class ItemType>
    int LeafCount(TreeNode<ItemType>* root)
    {
        if (root == NULL)   // base case: if the node doesn't exist, return 0 (don't count it)
            return 0;
        if ((root->left == NULL) && (root->right == NULL))   // if the node has no children, return 1 (it is a leaf)
            return 1;
        return LeafCount(root->left) + LeafCount(root->right);   // add the leaf nodes in the left and right subtrees
    }

    template <class ItemType>
    int NodeCount(TreeNode<ItemType>* root)
    {
        if (root == NULL)   // base case: if the node doesn't exist, return 0 (don't count it)
            return 0;
        // return 1 for the current node, plus the number of nodes in the left and right subtrees
        return 1 + NodeCount(root->left) + NodeCount(root->right);
    }
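Applied to the expression 5*x+7=22 from the paragraph above, the counts and traversals can be checked with a small standalone sketch. It assumes one plausible tree shape for that expression ("=" at the root, "+" as its left child, "*" below that) and a hypothetical string-valued ExprNode; LeafCount and NodeCount here are non-template adaptations of the functions above, and Infix/Postfix are illustrative helpers.

```cpp
#include <cassert>  // for checking results
#include <string>

// Hypothetical string-valued node used only for this sketch.
struct ExprNode {
    std::string data;
    ExprNode* left;
    ExprNode* right;
};

int LeafCount(const ExprNode* root) {
    if (root == nullptr) return 0;
    if (root->left == nullptr && root->right == nullptr) return 1;  // an operand
    return LeafCount(root->left) + LeafCount(root->right);
}

int NodeCount(const ExprNode* root) {
    if (root == nullptr) return 0;
    return 1 + NodeCount(root->left) + NodeCount(root->right);
}

// In-order traversal with no separators rebuilds the infix text.
std::string Infix(const ExprNode* root) {
    if (root == nullptr) return "";
    return Infix(root->left) + root->data + Infix(root->right);
}

// Post-order traversal, space-separated, gives postfix notation.
std::string Postfix(const ExprNode* root) {
    if (root == nullptr) return "";
    std::string left = Postfix(root->left);
    std::string right = Postfix(root->right);
    std::string out;
    if (!left.empty()) out += left + " ";
    if (!right.empty()) out += right + " ";
    return out + root->data;
}
```

For this tree the leaves are the four operands 5, x, 7, and 22, there are seven nodes in total, the in-order text is 5*x+7=22, and the postfix form is 5 x * 7 + 22 =. An in-order traversal without parentheses only reproduces the expression faithfully when the tree shape already matches the usual operator precedence, as it does here.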

Binary Search Trees

Another type of special binary tree is the binary search tree (BST). A BST must conform to a property stating that every node in a node's left subtree has a lesser value than the node, and every node in its right subtree has a value greater than or equal to the node (this implementation places equal values on the right). To turn our binary tree class into a BST, we need only modify the insert() function, since the programmer can no longer place nodes anywhere in the tree. Otherwise, a BST behaves in the same fashion as a standard binary tree.

The new insert() function accepts one parameter, the item to be inserted. If the tree is empty, the main root is added in the same fashion as in the standard binary tree, and the function exits. Otherwise, the function proceeds with a new algorithm. The parent of the new node is found by calling insert_find(), a new private member function that must be added to the BST class. insert_find() accepts two parameters: the root of the tree and the item's value. The root is needed as a parameter because insert_find() uses recursion to traverse the tree. The function compares item with each node's value. If item is less than the node's value and the node has a left child, the function moves to that child and repeats the operation. If item is greater than or equal to the node's value and a right child exists, the function moves to the right child. If item is less than the node's value and no left child exists, or it is greater than or equal to it and no right child exists, that is where the new node belongs.

The insert() function receives a pointer to the new parent; however, it must check again whether the new node is to be the left or the right child. This is because a recursive insert_find() that returns only a pointer has no convenient way to report this information as well. We would have to add another parameter or write a nonrecursive version of insert_find(), and either change adds more complexity than simply repeating one comparison. A new node is then created using the same method as in the original version of insert().

    template <class ItemType>
    void BST<ItemType>::insert(const ItemType &item)
    {
        // if the tree has no nodes, make a root node
        if (main_root == NULL)
        {
            main_root = new TreeNode<ItemType>;
            main_root->data = item;
            main_root->left = NULL;
            main_root->right = NULL;
            main_root->parent = NULL;
            current = main_root;
            return;
        }
        TreeNode<ItemType>* new_parent = insert_find(main_root, item);   // find the new node's parent
        if (item < new_parent->data)   // check whether the new node is a left or right child, and create it
        {
            new_parent->left = new TreeNode<ItemType>;
            new_parent->left->data = item;
            new_parent->left->left = NULL;
            new_parent->left->right = NULL;
            new_parent->left->parent = new_parent;
        }
        else
        {
            new_parent->right = new TreeNode<ItemType>;
            new_parent->right->data = item;
            new_parent->right->left = NULL;
            new_parent->right->right = NULL;
            new_parent->right->parent = new_parent;
        }
    }

    template <class ItemType>
    TreeNode<ItemType>* BST<ItemType>::insert_find(TreeNode<ItemType>* root, const ItemType &item)
    {
        if ((root->left != NULL) && (item < root->data))     // smaller: continue in the left subtree
            return insert_find(root->left, item);
        if ((root->right != NULL) && (item >= root->data))   // greater or equal: continue in the right subtree
            return insert_find(root->right, item);
        return root;   // the required child slot is empty, so root is the new parent
    }
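The two-step insertion (find the parent, then repeat the comparison to pick a side) can be tried outside the class. This is a minimal sketch assuming an int-only Node with a parent pointer; insertFind and bstInsert are hypothetical stand-ins for insert_find() and the BST insert(), and inorder is a helper for checking the result.

```cpp
#include <cassert>  // for checking results
#include <vector>

// Hypothetical int node for this sketch, mirroring the fields of TreeNode.
struct Node {
    int data;
    Node* left;
    Node* right;
    Node* parent;
};

// Same rule as insert_find(): go left while item is smaller and a left child
// exists, go right while item is greater or equal and a right child exists.
Node* insertFind(Node* root, int item) {
    if (root->left != nullptr && item < root->data)
        return insertFind(root->left, item);
    if (root->right != nullptr && item >= root->data)
        return insertFind(root->right, item);
    return root;  // the needed child slot is empty: root is the new parent
}

void bstInsert(Node*& mainRoot, int item) {
    if (mainRoot == nullptr) {  // empty tree: the item becomes the main root
        mainRoot = new Node{item, nullptr, nullptr, nullptr};
        return;
    }
    Node* parent = insertFind(mainRoot, item);
    Node* n = new Node{item, nullptr, nullptr, parent};
    if (item < parent->data)  // repeat the comparison to choose the side
        parent->left = n;
    else
        parent->right = n;
}

// In-order traversal of a BST visits the values in sorted order.
void inorder(const Node* n, std::vector<int>& out) {
    if (n == nullptr) return;
    inorder(n->left, out);
    out.push_back(n->data);
    inorder(n->right, out);
}
```

Inserting 50, 30, 70, 20, 40, 60, 80, 30 and reading the tree in-order gives 20 30 30 40 50 60 70 80. The duplicate 30 ends up as the left child of 40: at the first 30 it compares as greater-or-equal and moves right, then it is smaller than 40.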

Since the programmer no longer controls where each node goes, we can add a function that removes a single node at a time. We did not implement such a function for the standard binary tree class because we do not know how a given program needs the structure to handle the removal. If the node to be deleted is a leaf, the solution is simple: we set its parent's pointer to NULL. If it has one child, the parent is set to point to that child. But what happens if the node has two children? How should those children be reattached to the tree once the node is deleted? The answer varies with the programming task. Because a binary search tree follows a specific property, however, we can develop an algorithm that preserves the binary search tree property when rearranging the node's children.

The code for the remove_node() function (the node to be deleted, called delete_node below, is passed as the parameter root) may look difficult at first, but broken down into each possible situation it is simple to understand. The first case to consider is removing the main root when it has at most one child. In this situation, that child (or NULL, if there is none) becomes the new main root. If delete_node has no children, its parent is set to point to NULL, and delete_node is then deleted. As mentioned before, if delete_node has one child, the parent of delete_node is set to point to this child, and delete_node is then deleted.

If delete_node has two children, a little more work must be done. The question is how to attach delete_node's children to the tree while preserving the binary search tree property. One possible solution would be to attach one of the children to delete_node's parent and reinsert each node of the other child's subtree one by one. However, this method is very inefficient, especially if the node to be deleted has a large subtree. A better approach is to find another node in the tree that can replace delete_node while still maintaining the binary search tree property. We can then copy that node's value into delete_node and remove the node whose value we copied instead.

The nodes that fit these criteria are the largest node in the left subtree and the smallest node in the right subtree. The largest node in the left subtree is still smaller than any node of the right subtree, and the smallest node in the right subtree is larger than any node in the left subtree, so either value can replace delete_node. If a right subtree exists, it is used, since it may contain a value equal to delete_node (in which case that value will be used). The largest node in the left subtree is found by moving left once (starting from delete_node) and then moving right as far as possible. The smallest node in the right subtree is found by moving right once and then left as far as possible. This means that either node has at most one child (the largest node in the left subtree can only have a left child, and the smallest node in the right subtree can only have a right child), so it can be removed using the method above, by attaching that child to the node's parent.

The replace_find() private member function returns the node used to replace delete_node. Its first parameter is a pointer to the first candidate node (root->left if we are searching the left subtree, root->right if we are searching the right subtree), and its second parameter is the direction of the search. Zero means the function locates the largest value in the left subtree, traveling right as far as possible. One means it locates the smallest value in the right subtree, traveling left as far as possible. Once the node is found, delete_node takes its value, and the node is deleted.

    template <class ItemType>
    void BST<ItemType>::remove_node(TreeNode<ItemType>* root)
    {
        if ((root == main_root) && ((root->left == NULL) || (root->right == NULL)))
        {
            // set the main root's only child as the new root. If it has no children,
            // main_root becomes NULL, since the tree is now empty.
            if (root->left == NULL)
                main_root = root->right;
            else
                main_root = root->left;
            if (main_root != NULL)
                main_root->parent = NULL;   // the new main root has no parent
            if (current == root)   // if current is at the original main root, move it to the new root
                current = main_root;
            delete root;
            return;
        }
        if (current == root)   // if current is at the node to be deleted, set it to the node's parent
            current = root->parent;
        if ((root->left == NULL) && (root->right == NULL))   // if the node has no children
        {
            // have the parent point to NULL in place of it
            if (root->parent->left == root)   // it is a left child
                root->parent->left = NULL;
            else                              // it is a right child
                root->parent->right = NULL;
            delete root;
            return;
        }
        // if the node has one child, have the parent point to that child in place of root,
        // and make the child's parent pointer refer to root's parent
        if ((root->left == NULL) && (root->right != NULL))
        {
            if (root->parent->left == root)
                root->parent->left = root->right;
            else
                root->parent->right = root->right;
            root->right->parent = root->parent;
            delete root;
            return;
        }
        if ((root->left != NULL) && (root->right == NULL))
        {
            if (root->parent->left == root)
                root->parent->left = root->left;
            else
                root->parent->right = root->left;
            root->left->parent = root->parent;
            delete root;
            return;
        }
        // the node has two children: copy a replacement value into it, then remove the replacement node
        TreeNode<ItemType> *tmp;
        if (root->right != NULL)   // search the right subtree for the smallest value
            tmp = replace_find(root->right, 1);
        else                       // search the left subtree for the largest value
            tmp = replace_find(root->left, 0);
        root->data = tmp->data;
        // tmp has at most one child; have tmp's parent point to it (or to NULL if there is none)
        TreeNode<ItemType> *child = (tmp->right != NULL) ? tmp->right : tmp->left;
        if (tmp->parent->left == tmp)   // tmp is a left child
            tmp->parent->left = child;
        else                            // tmp is a right child
            tmp->parent->right = child;
        if (child != NULL)
            child->parent = tmp->parent;
        if (current == tmp)   // do not leave current pointing at the deleted node
            current = root;
        delete tmp;
    }

    template <class ItemType>
    TreeNode<ItemType>* BST<ItemType>::replace_find(TreeNode<ItemType>* root, int direction)
    {
        if (direction == 0)   // searching the left subtree for the largest value: go right as far as possible
        {
            if (root->right == NULL)
                return root;
            return replace_find(root->right, 0);
        }
        else                  // searching the right subtree for the smallest value: go left as far as possible
        {
            if (root->left == NULL)
                return root;
            return replace_find(root->left, 1);
        }
    }
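The three removal cases can be condensed into a compact standalone sketch, assuming an int-only Node with parent pointers. The helper replaceLink (hypothetical, like every name here) performs the "have the parent point to the child instead" step for all cases, including fixing the child's parent pointer. As a simplification of replace_find(), the two-children case always searches the right subtree iteratively, which is the branch the text says is used whenever a right subtree exists.

```cpp
#include <cassert>  // for checking results
#include <vector>

// Hypothetical int node for this sketch, mirroring the fields of TreeNode.
struct Node {
    int data;
    Node* left;
    Node* right;
    Node* parent;
};

// Standard BST insertion (equal values go right), used to build test trees.
void insert(Node*& root, int value) {
    Node* parent = nullptr;
    Node* cur = root;
    while (cur != nullptr) {
        parent = cur;
        cur = (value < cur->data) ? cur->left : cur->right;
    }
    Node* n = new Node{value, nullptr, nullptr, parent};
    if (parent == nullptr) root = n;
    else if (value < parent->data) parent->left = n;
    else parent->right = n;
}

// Makes whatever pointed at oldNode (its parent's link, or root) point at
// newNode instead, and fixes newNode's parent pointer.
void replaceLink(Node*& root, Node* oldNode, Node* newNode) {
    Node* p = oldNode->parent;
    if (p == nullptr) root = newNode;
    else if (p->left == oldNode) p->left = newNode;
    else p->right = newNode;
    if (newNode != nullptr) newNode->parent = p;
}

// Removes the single node n while keeping the BST property.
void removeNode(Node*& root, Node* n) {
    if (n->left == nullptr || n->right == nullptr) {
        // Zero or one child: splice the child (possibly nullptr) into n's place.
        Node* child = (n->left != nullptr) ? n->left : n->right;
        replaceLink(root, n, child);
        delete n;
        return;
    }
    // Two children: the smallest node of the right subtree has no left child.
    Node* succ = n->right;
    while (succ->left != nullptr) succ = succ->left;
    n->data = succ->data;                   // copy the replacement value
    replaceLink(root, succ, succ->right);   // then unlink the replacement node
    delete succ;
}

void inorder(const Node* n, std::vector<int>& out) {
    if (n == nullptr) return;
    inorder(n->left, out);
    out.push_back(n->data);
    inorder(n->right, out);
}
```

Starting from 50, 30, 70, 20, 40, 60, 80, removing the root copies 60 into it (two-children case), removing 70 then splices 80 up (one-child case), and removing 20 simply clears a link (leaf case), leaving 30 40 60 80 in order.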

© 2000 ThinkQuest Team C005618


Heaps

Another type of special binary tree is called a heap. In order to understand what a heap is, we must first define full and complete binary trees. In a full binary tree, every node is either a parent with two children or a leaf. In a complete binary tree, all levels except the last must be completely filled. In the last level, nodes must be filled in from the left side without gaps between them; however, the level does not have to be filled to the end.

A heap is a complete binary tree that is partially ordered according to either the max-heap or the min-heap property. In a max-heap, the children of every node have values less than or equal to that node's value. In a min-heap, the children of every node have values greater than or equal to the node itself. With such a setup, the main root always holds either the highest or the lowest value in the tree. For demonstration purposes, we will show how to implement a max-heap, as it is also an important part of the HeapSort algorithm, which will be covered later. [It is easy to change the code to work as a min-heap by reversing the relational operators between node values.]

A max-heap is usually used for maintaining priority queues. Priority queues store values and release the object with the highest "priority" (or value) when needed. For instance, a priority value may be associated with each task in a program; the tasks are put into such a structure and then executed in order of priority.

Since a heap must conform to the complete tree property, simple formulae can be developed to find the logical position of a node's children and parent given the position of the node itself. It is therefore very easy and efficient to implement a heap using arrays, and this is done most of the time, even if dynamic memory allocation is available. In an array implementation, we must allocate a fixed amount of memory that may be used for the heap. Any of that space the heap never uses is wasted. Other times, we may need to add more nodes to the heap than the allocated memory allows for. Therefore, we usually allocate more space than we think will be required in order to ensure the heap is usable. If the heap holds very large structures, this unused space can be significant. However, this is the price we pay for greater efficiency. It should be noted, though, that we no longer need three pointers for every tree node (left, right, parent), which took up a lot of space in the dynamic memory implementation. The logical position of a node in a heap corresponds to the node's array index, which makes it very easy to access any node. A generic implementation is shown below.

const int MAX_SIZE = 100; //the maximum amount of elements our heap should have. This may be changed to any number so long as memory permits, depending on how the heap will be used.

template <class ItemType>
class Heap
{
public:
    Heap();
    int left(int) const;
    int right(int) const;
    int parent(int) const;
    void insert(const ItemType&);
    ItemType remove_max();
    bool IsEmpty() const;
    bool IsFull() const;
    int count() const;
    ItemType value(int) const;
private:
    ItemType array[MAX_SIZE];
    int elements; //how many elements are in the heap
    void ReHeap(int);
};

//default constructor - initialize private variables
template <class ItemType>
Heap<ItemType>::Heap()
{
    elements = 0;
}

The left(), right(), and parent() functions return the index positions of a node's children and parent. Since the index of each element corresponds to its logical position in the heap, the functions use simple formulae derived by observing the heap structure. For a node at index i, the left child is at 2i + 1, the right child is at 2i + 2, and the parent is at (i - 1) / 2, using integer division.

template <class ItemType>
int Heap<ItemType>::left(int root) const
{
    assert(((root * 2) + 1) < elements); //does a left child exist?
    return (root * 2) + 1;
}

template <class ItemType>
int Heap<ItemType>::right(int root) const
{
    assert(root < (elements-1)/2); //does a right child exist?
    return (root * 2) + 2;
}

template <class ItemType>
int Heap<ItemType>::parent(int child) const
{
    assert(child != 0); //main root has no parent
    return (child - 1) / 2;
}
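The index formulas are easy to check in isolation. The following sketch is not part of the Heap class, and the helper names (LeftIndex, RightIndex, ParentIndex, IsMaxHeap, AgreesWithStd) are hypothetical. It applies parent(i) = (i - 1) / 2 to every non-root position of a std::vector<int> to test the max-heap property. It then compares the answer with std::is_heap from <algorithm>, which uses the same array layout and checks for a max-heap by default.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Index formulas for a heap stored in an array, as in Heap::left(),
// Heap::right() and Heap::parent().
int LeftIndex(int i) { return (i * 2) + 1; }
int RightIndex(int i) { return (i * 2) + 2; }
int ParentIndex(int i) { assert(i != 0); return (i - 1) / 2; }

// Hypothetical helper: true if no node (other than the root) is larger
// than its parent, i.e. the array satisfies the max-heap property.
bool IsMaxHeap(const std::vector<int>& a)
{
    for (int i = 1; i < static_cast<int>(a.size()); i++)
        if (a[ParentIndex(i)] < a[i])
            return false;
    return true;
}

// Cross-check against the standard library, which uses the same layout.
bool AgreesWithStd(const std::vector<int>& a)
{
    return IsMaxHeap(a) == std::is_heap(a.begin(), a.end());
}
```

For example, {9, 7, 8, 3, 5, 6} is a max-heap: the children of 9 are 7 and 8, the children of 7 are 3 and 5, and the only child of 8 is 6.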

The insert() function accepts the new item value as its parameter. It places the new item at the end of the heap and swaps it with its parent if the parent has a smaller value. The new item keeps travelling up the heap, swapping positions with each new parent, until it reaches the main root or its parent is at least as large as it is.

template <class ItemType>
void Heap<ItemType>::insert(const ItemType &item)
{
    assert(!IsFull());
    array[elements] = item; //elements is the array position after the last element, since indexing starts with 0
    int new_pos = elements; //index of the new item
    elements++; //update the number of elements in the heap
    while((new_pos != 0) && (array[new_pos] > array[parent(new_pos)])) //loop while the item has not become the main root and its value is greater than its parent's
    {
        swap(array[new_pos],array[parent(new_pos)]); //swap the value of item with its lesser parent
        new_pos = parent(new_pos); //update the item's position
    }
}
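The sketch below is a minimal standalone version of the same upward movement. It assumes int values and uses a std::vector in place of the fixed-size array. HeapInsert and BuildHeap are hypothetical names and are not part of the tutorial's class. std::is_heap is used only to confirm the result.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Sketch: insert `item` into a max-heap stored in `heap`.
void HeapInsert(std::vector<int>& heap, int item)
{
    heap.push_back(item);                        // place the item at the end
    int pos = static_cast<int>(heap.size()) - 1; // index of the new item
    // Move up while the item is not the root and is larger than its parent.
    while (pos != 0 && heap[pos] > heap[(pos - 1) / 2])
    {
        std::swap(heap[pos], heap[(pos - 1) / 2]);
        pos = (pos - 1) / 2;                     // continue from the parent's index
    }
}

// Usage sketch: build a heap one insertion at a time.
std::vector<int> BuildHeap(const std::vector<int>& values)
{
    std::vector<int> heap;
    for (int v : values)
        HeapInsert(heap, v);
    assert(std::is_heap(heap.begin(), heap.end())); // max-heap property holds
    return heap;
}
```

For example, inserting 3, 1, 6, 5, 2 and 4 in that order leaves 6 at index 0.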

The remove_max() function removes the item with the highest priority and returns its value. The root is swapped with the last item, and elements is decreased by one. Notice that the item is not physically deleted; it remains in the array. However, it is no longer part of the heap, since elements has been updated and the heap extends only as far as index (elements - 1). The new root may not have the largest priority. The ReHeap() function is therefore used to move the new root down to its proper position, restoring the heap property.

template <class ItemType>
ItemType Heap<ItemType>::remove_max()
{
    assert(!IsEmpty());
    elements--; //update the number of elements in the heap
    if(elements != 0) //if the heap still has elements after the removal
    {
        swap(array[0],array[elements]); //move the maximum to the end, just outside the heap
        ReHeap(0);
    }
    return array[elements];
}

The ReHeap() function checks whether either of root's children is bigger than root. If so, the bigger child is swapped with root. The process then continues recursively from the child's position, comparing the moved value with its new children. The function stops when root is at least as big as both of its children, or when it has no children.

template <class ItemType>
void Heap<ItemType>::ReHeap(int root)
{
    if(((root * 2) + 1) >= elements) //root has no children, so there is nothing to compare
        return;
    int child = left(root);
    if((child < (elements-1)) && (array[child] < array[child+1])) //if a right child exists, and it's bigger than the left child, it will be used
        child++;
    if(array[root] >= array[child]) //if root is at least as big as its largest child, stop
        return;
    swap(array[root],array[child]); //swap root and its biggest child
    ReHeap(child); //continue the process from root's new position
}
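The same sift-down idea can also be sketched outside the class. This version makes a few assumptions. It works on a std::vector<int> and uses the hypothetical names SiftDown and RemoveMax. It uses a loop instead of recursion, which is equivalent because each recursive call in ReHeap() only moves one level down. Note that it checks that a child exists before reading it, and checks that the right child exists before comparing it with the left.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Sketch of ReHeap(): move heap[root] down until it is at least as large
// as both children. Only the first `count` elements belong to the heap.
void SiftDown(std::vector<int>& heap, int root, int count)
{
    while (true)
    {
        int child = (root * 2) + 1;  // left child
        if (child >= count)          // root is a leaf
            return;
        if (child + 1 < count && heap[child] < heap[child + 1])
            child++;                 // right child exists and is bigger
        if (heap[root] >= heap[child])
            return;                  // heap property restored
        std::swap(heap[root], heap[child]);
        root = child;                // continue from the child's position
    }
}

// Sketch of remove_max(): swap the root with the last element, shrink the
// heap by one, and sift the new root down.
int RemoveMax(std::vector<int>& heap)
{
    assert(!heap.empty());
    std::swap(heap.front(), heap.back());
    int max = heap.back();
    heap.pop_back();
    SiftDown(heap, 0, static_cast<int>(heap.size()));
    assert(std::is_heap(heap.begin(), heap.end())); // sanity check
    return max;
}
```

Repeatedly calling RemoveMax() returns the values in descending order. This is the idea HeapSort builds on.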

The rest of our member functions are easy to implement.

template <class ItemType>
int Heap<ItemType>::count() const
{
    return elements;
}

template <class ItemType>
ItemType Heap<ItemType>::value(int pos) const
{
    assert((pos >= 0) && (pos < elements)); //is pos a valid index in the heap?
    return array[pos];
}

template <class ItemType>
bool Heap<ItemType>::IsEmpty() const
{
    return (elements == 0);
}

template <class ItemType>
bool Heap<ItemType>::IsFull() const
{
    return (elements == MAX_SIZE);
}
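In practice, the C++ standard library already provides a heap-based priority queue: std::priority_queue in <queue>, which is a max-heap by default. The sketch below illustrates the task-scheduling use described at the start of this section. The Task type (a priority paired with a name) and the ExecutionOrder function are hypothetical.

```cpp
#include <cassert>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical task type: (priority, name). std::pair compares its first
// member first, so higher numbers mean higher priority.
using Task = std::pair<int, std::string>;

// Push the tasks into a heap-based priority queue and return their names
// in the order they would be executed.
std::vector<std::string> ExecutionOrder(const std::vector<Task>& tasks)
{
    std::priority_queue<Task> queue;   // a max-heap by default (std::less)
    for (const Task& t : tasks)
        queue.push(t);
    std::vector<std::string> order;
    while (!queue.empty())
    {
        order.push_back(queue.top().second); // highest priority first
        queue.pop();
    }
    assert(order.size() == tasks.size());
    return order;
}
```

For the tasks {2, "save"}, {5, "render"} and {1, "log"}, the returned order would be render, save, log. Ties in priority are broken by comparing the names, because std::pair compares its second member when the first members are equal.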

© 2000 ThinkQuest Team C005618


Lists

A list is one of the most basic data structures in programming. It is a logically sequential collection of elements, any of which can be accessed without restriction. Any element in the list can be removed, and its value can be read or modified. Also, a new element may be inserted at any location in the list structure. Each element points to the next one in the list, and the last element does not reference any other item. Physically, the elements of a list can be stored at various locations in memory, and their addresses need not be related in any way. The list is linked since each element points to the location of the next item. The dynamic representation of a list is called a linked list. Each element in the list is called a node and contains two values. The first is the data value that is to be stored. For instance, in a list of names, this would be a value such as John. The second value in a node is a pointer to the next node in the list. A common representation of a list node is as follows:

template <class ItemType>
struct ListNode
{
    ItemType data;
    ListNode<ItemType> *next;
};
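As a small illustration of following next pointers, the sketch below links three nodes by hand and counts them. It assumes a C++11 compiler, with nullptr standing in for NULL. CountNodes and DemoCount are hypothetical helpers, and the names stored in the nodes are only example data.

```cpp
#include <cassert>
#include <string>

template <class ItemType>
struct ListNode
{
    ItemType data;
    ListNode<ItemType> *next;
};

// Count the nodes by following next pointers until nullptr is reached.
template <class ItemType>
int CountNodes(const ListNode<ItemType> *first)
{
    int count = 0;
    for (const ListNode<ItemType> *p = first; p != nullptr; p = p->next)
        count++;
    return count;
}

// Usage sketch: three nodes linked John -> Bob -> Carol.
int DemoCount()
{
    ListNode<std::string> carol{"Carol", nullptr}; // last node: next is nullptr
    ListNode<std::string> bob{"Bob", &carol};
    ListNode<std::string> john{"John", &bob};
    assert(john.next->next == &carol);
    return CountNodes(&john);
}
```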

Linked lists can be implemented in many ways, depending on how the programmer will use lists in their program. We will show how to implement a generic class, which can be adapted and modified for use in most situations. The member functions implemented will be those necessary to add, modify, or delete nodes in a linked list. The class will be constructed so that, if the implementation were to be changed, its public interface would remain the same, and therefore any program that uses the class would not need to be altered. This is usually good practice in the design of any class. A definition of the linked list class is displayed below.

template <class ItemType>
class List
{
public:
    List(); //constructor - initialize private variables
    ~List(); //destructor - free used memory
    void insert(const ItemType); //insert new node at current location
    void remove(); //remove the current node (it cannot be named delete(), since delete is a reserved word in C++)
    void next(); //set current to the next node in the list
    void prev(); //set current to the previous node in the list
    void reset(); //set current to the first node in the list
    void clear(); //remove all nodes in the list
    int length() const; //return the number of nodes in the list
    bool IsEmpty() const; //returns true if the list doesn't have any nodes
    bool IsFull() const; //returns true if there is no system memory for additional nodes
    ItemType value() const; //returns the value of the current node
private:
    ListNode<ItemType> *list; //points to the list header
    ListNode<ItemType> *prevcurrent; //points to the node before the current node
    int len;
};

len contains the total number of nodes in the list and is self-explanatory. prevcurrent and list are implemented in a special way and require further explanation. At first, it would appear logical to have a pointer directly to the node being referenced. However, this would make the prev(), insert(), and remove() functions time consuming, because all three require access to the node that precedes the node being referenced. With a pointer directly to the referenced node, the only way to find its predecessor would be to search through the list (the entire list, in the worst case). A temporary pointer would be set to the list's first node and used to traverse the list, stopping when tmp->next equals the address of the referenced node. Comparing data values instead, as in tmp->next->data, would fail if two nodes held the same value.

A more efficient approach is to have prevcurrent store a pointer to the node that precedes the one being referenced, so that no time is wasted searching for it. This approach raises a new concern. If the node being referenced is the first node in the list (or the list is empty), there is no preceding node to point to, and a special case would be needed everywhere prevcurrent is used. The most efficient way of solving this problem is to use a header node. A header node is a "dummy" node that acts as the first node of the list but is not logically part of the list. It is used only in the implementation of the linked list class, and the programmer who uses the class does not need to know about it; it is created, manipulated, and deleted by the member functions. list will point to the header node. Now let's look at the implementation of the linked list class. The class constructor creates the header node and sets prevcurrent to point to it. It also sets len to zero.

template <class ItemType>
List<ItemType>::List()
{
    list = new ListNode<ItemType>; //create the header node in memory
    list->next = NULL; //the header is the only node in the list
    prevcurrent = list; //set prevcurrent to the header, since there are no nodes
    len = 0;
}

The destructor deletes all nodes of the list, freeing up the memory they occupied. Since the clear() function performs this operation, it can be called in the destructor. The destructor also deletes the header node, as clear() removes only actual list nodes. clear() uses two local pointers of type ListNode<ItemType>*. traverse is set to the first node in the list and used to visit every node, and tmp points to the node to be deleted. We cannot move on to the next node once the current node is deleted, because its next pointer would no longer be valid. traverse is therefore advanced to the next node first, and only then is tmp deleted.

template <class ItemType>
void List<ItemType>::clear()
{
    ListNode<ItemType> *tmp; //points to the node to be deleted
    ListNode<ItemType> *traverse = list->next; //used to visit each node in the list. The header node is not deleted, so we start with the first actual node.
    while(traverse != NULL) //while there are nodes left to delete
    {
        tmp = traverse; //store the current node
        traverse = traverse->next; //visit the next node
        delete tmp; //free the memory taken up by the current node
    }
    list->next = NULL; //the header is again the only node
    prevcurrent = list; //set prevcurrent to the header node
    len = 0;
}

template <class ItemType>
List<ItemType>::~List()
{
    clear(); //delete all list nodes
    delete list; //delete the header "dummy" node
}

The insert() function creates a new node in front of the one being referenced and makes the new node the referenced one. Remember, prevcurrent->next is the node currently being referenced, since prevcurrent points to the preceding node.

template <class ItemType>
void List<ItemType>::insert(const ItemType item)
{
    assert(!IsFull()); //abort if there is no memory to create a new node
    ListNode<ItemType> *NewNode = new ListNode<ItemType>; //create a new node in memory
    NewNode->data = item; //set the node's value
    NewNode->next = prevcurrent->next; //referenced node will follow new node in order
    prevcurrent->next = NewNode; //The node that preceded the old node now precedes the new one. The new node is now referenced.
    len++; //increment length
}

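The remove() function (it cannot be named delete(), since delete is a reserved word in C++) sets the preceding node to point to the node that follows the one being referenced. This removes the referenced node from the logical list. The removed node is then deleted from memory, using a pointer saved before it was unlinked.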

template <class ItemType>
void List<ItemType>::remove() //delete is a reserved word in C++, so the function cannot be called delete()
{
    ListNode<ItemType> *target = prevcurrent->next; //the node being referenced
    if(target != NULL) //only remove a real node; the header node is never removed
    {
        prevcurrent->next = target->next; //logically remove it from the list
        delete target; //free up memory, using the pointer saved before unlinking
        len--; //decrement length
    }
}
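The sketch below shows the header-node idea in action without the full class. It is minimal and uses int data only. HeaderList, Node and their members are hypothetical names. prev plays the role of prevcurrent, so the referenced node is always prev->next, and no special case is needed for the first node. Copying is disabled because the object owns its nodes and its prev pointer may point at its own header.

```cpp
#include <cassert>
#include <vector>

struct Node
{
    int data;
    Node *next;
};

class HeaderList
{
public:
    HeaderList() = default;
    HeaderList(const HeaderList&) = delete;
    HeaderList& operator=(const HeaderList&) = delete;

    ~HeaderList()
    {
        while (header.next != nullptr) // free every real node
        {
            Node *tmp = header.next;
            header.next = tmp->next;
            delete tmp;
        }
    }

    // Insert before the referenced node; the new node becomes referenced.
    void Insert(int item) { prev->next = new Node{item, prev->next}; }

    // Remove the referenced node, if there is one.
    void Remove()
    {
        Node *target = prev->next;
        if (target == nullptr)
            return;
        prev->next = target->next;  // unlink first...
        delete target;              // ...then free the saved pointer
    }

    void Next() { if (prev->next != nullptr) prev = prev->next; }
    void Reset() { prev = &header; }

    // Values of the real nodes, front to back.
    std::vector<int> Values() const
    {
        std::vector<int> out;
        for (const Node *p = header.next; p != nullptr; p = p->next)
            out.push_back(p->data);
        return out;
    }

private:
    Node header{0, nullptr};  // dummy node; never holds real data
    Node *prev = &header;     // node before the referenced one
};
```

For example, inserting 3, then 1, then advancing once and inserting 2 produces the list 1 2 3. Resetting and removing then leaves 2 3.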

The next() function is very short and self-explanatory.

template <class ItemType>
void List<ItemType>::next()
{
    if(prevcurrent->next != NULL) //don't move past the end of the list
        prevcurrent = prevcurrent->next;
}

The prev() function visits each node until the one that points to prevcurrent is found, and then sets prevcurrent to this node. The node being referenced now becomes the node prevcurrent pointed to before the function was executed.

template <class ItemType>
void List<ItemType>::prev()
{
    if(prevcurrent != list) //run only if there is a node before the current one
    {
        ListNode<ItemType> *tmp = list;
        while(tmp->next != prevcurrent) //find the node that points to prevcurrent
            tmp = tmp->next;
        prevcurrent = tmp;
    }
}

The reset() function makes the first node the referenced node by setting prevcurrent to the header node.

template <class ItemType> void List<ItemType>::reset() { prevcurrent = list; }

Next, the length() function simply returns the value of the private data member len. A function would not be necessary if we made the length a public data member, which the programmer could then read directly. However, a member function is used to retrieve this value for two reasons. First, if the length data member were public, the programmer could also change it without changing the number of nodes in the list, leaving the object in an inconsistent state. Second, if we were to change the implementation of the class so that the length was no longer stored in a single variable, programs using the class would not have to be modified in order to work.

template <class ItemType> int List<ItemType>::length() const { return len; }

The IsEmpty() function works by checking if the length of the list is zero.


template <class ItemType> bool List<ItemType>::IsEmpty() const { return (len == 0); }

The IsFull() function checks whether there is enough system memory to create a new node. It does this by attempting to create a new node and checking whether the resulting pointer is NULL. Older, pre-standard C++ compilers returned NULL from a failed new. Standard C++ instead throws std::bad_alloc unless the nothrow form new (std::nothrow) is used, so the code below uses that form (declared in <new>). If the pointer is not NULL, the memory is freed and false is returned. Otherwise, the function returns true, meaning no more items can be added to the list.

template <class ItemType>
bool List<ItemType>::IsFull() const
{
    ListNode<ItemType> *tmp = new (std::nothrow) ListNode<ItemType>; //yields NULL instead of throwing if memory is exhausted
    if(tmp == NULL)
        return true;
    else
    {
        delete tmp;
        return false;
    }
}
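The difference between the two forms of new can be shown with a small standalone probe. CanAllocate and CanAllocateThrowing are hypothetical names, and both use only standard facilities from <new>. Note that a successful probe only shows that one allocation succeeded at that moment; it does not guarantee that the next one will.

```cpp
#include <cassert>
#include <new>

// Probe using the nothrow form: a failed allocation yields nullptr.
template <class T>
bool CanAllocate()
{
    T *probe = new (std::nothrow) T;
    if (probe == nullptr)
        return false;   // no memory for another node
    delete probe;
    return true;
}

// The same probe written with plain new, which throws std::bad_alloc
// on failure, so the exception is caught instead.
template <class T>
bool CanAllocateThrowing()
{
    try
    {
        delete new T;
        return true;
    }
    catch (const std::bad_alloc&)
    {
        return false;
    }
}
```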

Finally, the value() function returns the value of the node that is currently being referenced.

template <class ItemType>
ItemType List<ItemType>::value() const
{
    assert(prevcurrent->next != NULL); //abort if no node is being referenced
    return prevcurrent->next->data;
}

Other Types of Lists

Like header nodes, lists may also have a trailer node, which is a dummy node at the end of the list. Trailer nodes are maintained in a similar fashion to header nodes; that is, they are not part of the logical list structure. Trailer nodes are used to eliminate any special cases that may arise when inserting a new node at the end of the list. Other types of standard lists exist as well. For instance, in a circular list, the last node points to the first node instead of to NULL. This requires only minor changes in the class implementation. Nodes inserted at the end must now point to the first node, and when searching for the last node we must look for the node that points to the first node instead of NULL. Finally, a doubly linked list maintains a prev pointer in ListNode, which points to the previous node. This makes functions such as insert() and remove(), as well as the general implementation, very simple. We no longer need to maintain a pointer to the node preceding the one being referenced; since both functions need the previous node, they can simply follow its prev pointer. Other modifications needed to create a doubly linked list include setting each new node's prev pointer when it is created, and updating the prev pointer of the node that follows it. The front node has a prev value of NULL and requires a special condition, unless a header node or a circular doubly linked list is used.

© 2000 ThinkQuest Team C005618
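The combination mentioned last in the lists section, a circular doubly linked list with a header node, removes every NULL special case. The sketch below assumes int data and uses hypothetical names (DNode, CircularList, InsertBefore, PushBack, PushFront, Remove, First, Reversed). The header's next pointer is the first node and its prev pointer is the last node. An empty list is a header that points to itself in both directions.

```cpp
#include <cassert>
#include <vector>

struct DNode
{
    int data;
    DNode *prev;
    DNode *next;
};

class CircularList
{
public:
    CircularList() { header.prev = header.next = &header; } // empty: header points to itself
    CircularList(const CircularList&) = delete;
    CircularList& operator=(const CircularList&) = delete;
    ~CircularList() { while (header.next != &header) Remove(header.next); }

    // Insert a new node holding `item` directly before `pos`.
    DNode* InsertBefore(DNode *pos, int item)
    {
        DNode *node = new DNode{item, pos->prev, pos};
        pos->prev->next = node;  // old predecessor now points forward to node
        pos->prev = node;        // pos now points back to node
        return node;
    }

    void PushBack(int item) { InsertBefore(&header, item); }  // before header = at the end
    void PushFront(int item) { InsertBefore(header.next, item); }

    // Unlink and free `node` (must not be the header).
    void Remove(DNode *node)
    {
        assert(node != &header);
        node->prev->next = node->next;
        node->next->prev = node->prev;
        delete node;
    }

    DNode* First() { return header.next; }

    // Values from back to front, following prev pointers.
    std::vector<int> Reversed() const
    {
        std::vector<int> out;
        for (const DNode *p = header.prev; p != &header; p = p->prev)
            out.push_back(p->data);
        return out;
    }

private:
    DNode header{0, nullptr, nullptr};  // dummy node; never holds real data
};
```

Neither InsertBefore() nor Remove() checks for NULL, because every real node always has a real node or the header on both sides.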


Queues

A queue is another special type of list structure. Elements can only be inserted at the back of a queue, and only the front element can be accessed and modified. The structure of a queue is the same as that of a line of people. A person who wishes to stand in line must go to the back, and the person at the front of the line is served first. Thus, a queue is a FIFO structure: "First In, First Out". A generic queue is very simple to implement; a class definition for a linked queue is shown below.

template <class ItemType>
struct QueNode
{
    ItemType data;
    QueNode<ItemType> *next;
};

template <class ItemType>
class Queue
{
public:
    Queue(); //class constructor - initialize variables
    ~Queue(); //class destructor - return memory used by queue elements
    void enqueue(const ItemType); //add an item to the back of the queue
    ItemType dequeue(); //remove the first item from the queue and return its value
    ItemType first() const; //return the value of the first item in the queue without modification to the structure
    bool IsEmpty() const; //returns true if there are no elements in the queue
    bool IsFull() const; //returns true if there is no system memory for a new queue node
    int length() const; //returns the amount of elements in the queue
private:
    QueNode<ItemType> *front;
    QueNode<ItemType> *back;
    int len;
};

The front pointer references the first node in the queue, and the back pointer references the last node. It is possible to maintain only a front pointer, since the last node is the one whose next pointer is NULL and can be found by traversing the queue. However, such a design would be inefficient, because finding the last node every time it is needed (on every enqueue) takes O(N) time. Therefore, we maintain a reference to it in our class implementation. The class constructor initializes the private data members.

template <class ItemType>
Queue<ItemType>::Queue()
{
    front = NULL;
    back = NULL;
    len = 0;
}

The destructor deletes all nodes, freeing up the memory they used, by repeatedly calling dequeue() until the queue is empty.

template <class ItemType>
Queue<ItemType>::~Queue()
{
    while(!IsEmpty()) //dequeue() deletes the front node on each call
        dequeue();
}

The enqueue() function adds a new item to the back of the queue. The algorithm differs slightly depending on whether the queue is empty. If the queue is not empty, the last node is set to point to a newly created node, and the back pointer is set to reference the new node. The value of the new node is item, and its next pointer has a value of NULL. If the queue is empty, a similar procedure is used, but the front pointer is also set to reference the newly created node. Since there is only one node in the queue, the front and back are one and the same.

template <class ItemType>
void Queue<ItemType>::enqueue(const ItemType item)
{
    assert(!IsFull()); //abort if there is no more memory for a new node
    if(len != 0) //if the queue is not empty
    {
        back->next = new QueNode<ItemType>; //create a new node after the current back node
        back = back->next; //set the new node as the back node
        back->data = item;
        back->next = NULL;
    }
    else
    {
        back = new QueNode<ItemType>; //create a new node
        back->data = item;
        back->next = NULL;
        front = back; //set front to reference the new node. Since it is the only node in the queue, it is both the back and the front.
    }
    len++; //increment the number of elements in the queue
}

The dequeue() function removes the node at the front of the queue and returns its value. The value of the front node is stored in item. A temporary local pointer is then created to reference the front node, and the front pointer is set to the next element in the queue. The front node is deleted using tmp as a reference. The function then checks if the queue is empty by evaluating front. If it has a value of NULL, the queue is empty, and the back pointer must also be set to NULL since it maintains the address of the now deleted node.

template <class ItemType>
ItemType Queue<ItemType>::dequeue()
{
    assert(!IsEmpty()); //abort if the queue is empty, no node to dequeue
    ItemType item = front->data; //store the value of the first node, to be returned at the end
    QueNode<ItemType> *tmp = front; //temporary pointer to the first node
    front = front->next; //set the second node in the queue as the new front
    delete tmp; //delete the original first node
    if(front == NULL) //if the queue is now empty, update the back pointer
        back = NULL;
    len--; //decrement the number of nodes in the queue
    return item; //return the value of the original first element
}

The first() function returns the value of the front node without modifying the queue.

template <class ItemType> ItemType Queue<ItemType>::first() const { assert(!IsEmpty()); //abort if the queue is empty return front->data; }

The IsEmpty() function checks to see if there are any nodes in the queue by evaluating the front pointer. If the queue is empty, front has a value of NULL.

template <class ItemType> bool Queue<ItemType>::IsEmpty() const { return (front == NULL); }

The IsFull() function works the same way as it did with other data structures. A node is "created" using the new operator, and the resulting pointer is checked. Standard C++ throws std::bad_alloc from a failed plain new, so the nothrow form new (std::nothrow), declared in <new>, is used to produce NULL instead. In that case, the function returns true. If the new node is created successfully, it is deleted and the function returns false.

template <class ItemType>
bool Queue<ItemType>::IsFull() const
{
    QueNode<ItemType> *tmp = new (std::nothrow) QueNode<ItemType>; //yields NULL instead of throwing if memory is exhausted
    if(tmp == NULL)
        return true;
    else
    {
        delete tmp;
        return false;
    }
}

The length() function returns the value of len, which holds the number of nodes in the queue. Again, a function is used to retrieve this value instead of making len a public data member. This keeps len from being changed by mistake and keeps the class implementation hidden.


template <class ItemType> int Queue<ItemType>::length() const { return len; }
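For comparison, the standard library's std::queue (in <queue>) provides the same FIFO behavior. push() adds to the back, front() reads the first element, and pop() removes it. Unlike dequeue() above, pop() does not return the removed value, so front() is read first. The ServeInOrder function below is a hypothetical example.

```cpp
#include <cassert>
#include <queue>
#include <string>
#include <vector>

// Serve everyone in a line in FIFO order using std::queue.
std::vector<std::string> ServeInOrder(const std::vector<std::string>& arrivals)
{
    std::queue<std::string> line;
    for (const std::string& person : arrivals)
        line.push(person);              // join at the back
    std::vector<std::string> served;
    while (!line.empty())
    {
        served.push_back(line.front()); // read the front element
        line.pop();                     // then remove it (pop() returns void)
    }
    assert(line.size() == 0);
    return served;
}
```

Serving "Ann", "Ben" and "Cal" returns them in the same order in which they arrived.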

© 2000 ThinkQuest Team C005618


Searching

Searching refers to finding the location of an element with a specific value within a collection of elements. The simplest search algorithm is the sequential search, which evaluates every element in the array (in order) and compares its value with the one being looked for. The function below demonstrates a sequential search. It accepts a pointer to an array of integers, the number of integers in the array, and a number to look for. It returns a boolean value specifying whether that number exists in the array.

bool InArray(int *array, int size, int num)
{
    for(int j = 0; j < size; j++)
        if(array[j] == num)
            return true;
    return false;
}

The algorithm is very simple to implement, but also inefficient. In the average and worst cases, it takes O(N) time to determine whether the item exists, which is very slow for large arrays. A more efficient search can be performed if the array is already sorted. Note that sorting an array first just to use one of the more efficient search algorithms may not pay off. Comparison-based sorting needs on the order of N log N comparisons in the worst case, which is more than the N comparisons of a single sequential search. It is recommended that such algorithms be used only if the array is sorted to begin with.

The binary search algorithm is one of the simplest searching algorithms on a sorted array. For demonstration purposes, we will assume the array is sorted in ascending order; the algorithm can easily be modified to work with a descending array by reversing the relational operators. The binary search algorithm compares the number being looked for with the value of the middle element of the array. If they are equal, the search is over. If the number is smaller, the same process is repeated on the left part of the array; if it is larger, the process is repeated on the right part. Each step halves the remaining range, so O(log N) comparisons are enough. One possible implementation of a binary search is shown below.

bool InArray(int *array, int left, int right, int num) //left and right are the first and last indices of the part of the array being searched. Both are needed as parameters since the function is recursive.
{
    if(left > right) //the range is empty: all possibilities have been searched, so num is not in the array
        return false;
    int middle = left + (right - left) / 2; //index of the middle element
    if(num == array[middle]) //if a match is found, return true
        return true;
    if(num < array[middle])
        return InArray(array, left, middle - 1, num); //perform the same operation. New right position is the element before the middle.
    return InArray(array, middle + 1, right, num); //perform the same operation. New left position is the element directly after the middle.
}

To search an entire array of size elements, call InArray(array, 0, size - 1, num).

Hashing is a somewhat different approach to searching, as it attempts to make searching take O(1) time on average. A hash function is applied, which uses some type of algorithm or formula to determine which index of an array (also called the hash table) the object should be stored at. The algorithm or formula depends on the type of data in each situation. In practice, at least a few collisions usually occur in the hash table. A collision occurs when the hash function returns an index that already holds a value. This can mean either that the value is a duplicate, or that two different values hash to the same index, which is often the case since hash functions are rarely unique. There are two approaches to resolving a collision. The first is known as open hashing. In this method, collision values are stored "outside" of the array. An example of open hashing is having each array index point to another array or a linked list, so that all values that hash to that location are stored in that list. The items in the list can be ordered by their value, access frequency, or the order in which they were put into the table. Closed hashing is another collision resolution technique. In this approach, the collision values are stored at a different location within the hash table itself. The hash function takes care of choosing that location, and how it does so depends on the data and the programming task. Whatever method the hash function uses to resolve a collision must then also be used when searching, in order to find the item required.

© 2000 ThinkQuest Team C005618
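The recursion in the binary search InArray() can also be replaced by a loop, since each call only narrows the range. The following sketch assumes an ascending int array. BinarySearch and AgreesWithStd are hypothetical names, and std::binary_search from <algorithm> is used only as a reference to compare against. Computing the middle as left + (right - left) / 2 avoids overflowing int when left + right is very large.

```cpp
#include <algorithm>
#include <cassert>

// Iterative binary search. Assumes array[0..size-1] is sorted ascending.
bool BinarySearch(const int *array, int size, int num)
{
    assert(size >= 0);
    int left = 0;
    int right = size - 1;
    while (left <= right)                          // range is not empty
    {
        int middle = left + (right - left) / 2;    // avoids overflow of left + right
        if (array[middle] == num)
            return true;
        if (num < array[middle])
            right = middle - 1;                    // continue in the left part
        else
            left = middle + 1;                     // continue in the right part
    }
    return false;
}

// Cross-check against the standard library's std::binary_search.
bool AgreesWithStd(const int *array, int size, int num)
{
    return BinarySearch(array, size, num) == std::binary_search(array, array + size, num);
}
```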

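Open hashing (also called separate chaining) can be sketched with a vector of linked lists. This is a minimal illustration under a few assumptions. The keys are strings, and the class name ChainedHashSet is hypothetical. The polynomial hash (multiply by 31 and add each character) is just one simple choice, not a recommendation from the text. New keys are appended to the end of their chain, so each chain is kept in insertion order.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <vector>

class ChainedHashSet
{
public:
    explicit ChainedHashSet(std::size_t size) : table(size) { assert(size > 0); }

    void Insert(const std::string& key)
    {
        if (!Contains(key))                   // ignore duplicates
            table[Hash(key)].push_back(key);  // colliding keys share one list
    }

    bool Contains(const std::string& key) const
    {
        for (const std::string& k : table[Hash(key)]) // scan only one chain
            if (k == key)
                return true;
        return false;
    }

    std::size_t Size() const
    {
        std::size_t n = 0;
        for (const std::list<std::string>& chain : table)
            n += chain.size();
        return n;
    }

private:
    // Simple polynomial hash reduced modulo the table size.
    std::size_t Hash(const std::string& key) const
    {
        std::size_t h = 0;
        for (unsigned char c : key)
            h = h * 31 + c;
        return h % table.size();
    }

    std::vector<std::list<std::string>> table;
};
```

With a table of only 2 slots and 3 distinct keys, at least two keys must share a slot. Contains() still finds each of them by scanning that slot's list.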

Sorting

Finding better algorithms to sort a given set of data is an ongoing problem in the field of computer science. Sorting is placing a given set of data in a particular order. Simple sorts place data in ascending or descending order. For discussion purposes, we will look at sorting data in ascending order. However, you may modify the code to sort the data in descending order by reversing the relational operators (e.g. change 'nums[j] < nums[j-1]' to 'nums[j] > nums[j-1]'). In this lesson we will analyze sorts of different efficiency, and discuss when and where they can be used. To simplify the explanation of certain algorithms, we will assume a swap() function exists that switches the values of two variables. An example of such a function for int variables is displayed below.

void swap(int &item1,int &item2) //reference parameters refer directly to the caller's variables. No local copies are made, so the changes remain after the function returns. See 'Functions' in the preliminary lesson for further information.
{
    int tmp;
    tmp = item1;
    item1 = item2;
    item2 = tmp;
}

We will first analyze sorts that are O(N^2). These sorts are very easy to understand, but they are very slow when there are a lot of elements to be sorted. The first sort we will look at is called the insertion sort. The algorithm processes each element in turn and compares it to the elements before it. The first element has no elements before it for comparison, so it is left alone. In the next iteration, the second element is evaluated. It is compared to the element directly before it, which is the first element in the structure. If the second element has a value less than the first, their positions are switched. If the second element is greater than or equal to the first, they are left as they are, and the third element is processed. The third element is then compared to the second element in the new list ('new' list here since the first two items may have been swapped). If it is less than the second, they are swapped, and it is then compared to the first element. If it is greater than or equal to the second element, it is left in place and the process continues with the next element. In short, each element moves toward the front of the list, switching positions with the element before it for as long as it is smaller than that element.

The algorithm is programmed using two nested for loops. The outer loop performs n-1 iterations, where n is the number of elements in the list. Since element[0] does not have any elements before it to compare to, we start with the second element. The inner loop starts at the element being processed by the outer loop and works backwards, comparing the element to the one before it. If it is smaller, a swap is made and the loop continues. If it is not smaller, the loop ends, and the next iteration of the outer loop begins. The code for the insertion sort algorithm is shown below. A standard array of int variables is used for simplicity. However, you can modify the code to work for any linear structure.

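void InsertionSort(int *nums, int n)  // sorts the array nums, which has n elements
{
    for (int i = 1; i < n; i++)
        // move nums[i] toward the front while it is smaller than the element before it
        for (int j = i; (j > 0) && (nums[j] < nums[j-1]); j--)
            swap(nums[j], nums[j-1]);  // swap the array elements, not the loop indices
}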
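As mentioned at the start of this lesson, reversing the relational operator sorts in descending order instead. Rather than keeping two copies of the code, one option is to pass the comparison in as a parameter. The following is a minimal sketch of that idea; InsertionSortBy is a hypothetical name, and std::swap from <utility> stands in for the lesson's swap() so that the block compiles on its own.

```cpp
#include <cassert>
#include <functional>  // std::less, std::greater
#include <utility>     // std::swap

// Insertion sort parameterized by a comparison: comes_first(a, b) returns true
// when a should be placed before b. std::less<int>() gives ascending order and
// std::greater<int>() gives descending order.
template <class Compare>
void InsertionSortBy(int *nums, int n, Compare comes_first)
{
    assert(n >= 0);
    for (int i = 1; i < n; i++)
        // move nums[i] toward the front while it belongs before its left neighbour
        for (int j = i; (j > 0) && comes_first(nums[j], nums[j-1]); j--)
            std::swap(nums[j], nums[j-1]);
}
```

For example, calling InsertionSortBy(a, 4, std::greater<int>()) on int a[] = {5, 2, 9, 1} leaves a as {9, 5, 2, 1}, while std::less<int>() gives the ascending order used throughout this lesson.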

The next O(N^2) algorithm that we will analyze is the bubble sort. The bubble sort works from the bottom up (back to front), comparing each element with the one before it. If the lower element has a smaller value than the one above it, the two are swapped; if not, they remain in their original positions. The algorithm then compares the next pair of elements moving upward, regardless of the outcome of the previous comparison. In this fashion, the smallest value "bubbles up" to the top in each iteration. In later passes, the values that were bubbled up in earlier passes are no longer compared, since they are already in place. The code for the bubble sort is shown below, using a standard array of int variables. Two nested for loops are used. The outer loop makes n-1 iterations, where n is the number of elements; each iteration sets at least one element into its proper sorted position. The inner loop runs from the bottom up, comparing adjacent values, and stops at the group of values that have already been set in place, a group that grows by one element with each iteration of the outer loop.

void BubbleSort(int *nums, int n)
{
    for (int i = 0; i < n-1; i++)
        for (int j = n-1; j > i; j--)  // bottom-up, stopping at the values already in place
            if (nums[j] < nums[j-1])
                swap(nums[j], nums[j-1]);
}
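To watch the smallest value "bubble up", it can help to record the array after each pass of the outer loop. The sketch below is a hypothetical instrumented copy of BubbleSort() (the name BubbleSortPasses is made up here) that works on a std::vector and returns one snapshot per pass.

```cpp
#include <cassert>
#include <utility>  // std::swap
#include <vector>

// Hypothetical tracing version of BubbleSort(): it works on a copy of the data
// and records the array after every pass of the outer loop.
std::vector<std::vector<int>> BubbleSortPasses(std::vector<int> nums)
{
    std::vector<std::vector<int>> passes;
    int n = static_cast<int>(nums.size());
    for (int i = 0; i < n - 1; i++) {
        for (int j = n - 1; j > i; j--)   // bottom-up through the unsorted part
            if (nums[j] < nums[j-1])
                std::swap(nums[j], nums[j-1]);
        for (int k = i + 1; k < n; k++)   // the smallest remaining value has reached position i
            assert(nums[i] <= nums[k]);
        passes.push_back(nums);           // snapshot after this pass
    }
    return passes;
}
```

For the input {5, 1, 4, 2, 3} it returns four snapshots: {1, 5, 2, 4, 3}, {1, 2, 5, 3, 4}, {1, 2, 3, 5, 4} and {1, 2, 3, 4, 5}. After pass i, position i holds the smallest value that was still unsorted, as described above.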

The selection sort will be the final O(N^2) sorting algorithm that we will look at. In a selection sort, the entire list is searched to find the smallest value, which is then swapped with the first item. Next, every element but the first is searched to find the smallest value in that group, which is then swapped with the item in the second position. This continues until all items are in the correct order. This technique is similar to what a person would do when sorting a list of items by hand: the list is searched for the smallest value, which is crossed out and written as the first item of a new list. The computer algorithm is the same, except that, to save memory and avoid keeping two lists, it uses a swap operation. The selection sort uses two nested for loops, the outer one making n-1 iterations. When only one item is left, it is already in its correct position, last in the structure. The inner loop searches the unsorted portion of the structure (from the bottom up), starting from the assumption that the first element in the unsorted section is the smallest, and then comparing that value to each element in turn. If a smaller element is found, it becomes the new smallest and is compared against the remaining elements. The code for selection sort is shown below.

void SelectionSort(int *nums, int n)
{
    int low;  // holds the index of the smallest element in the unsorted portion
    for (int i = 0; i < n-1; i++) {
        low = i;  // assume the first item in the unsorted section is the lowest, unless a smaller value is found
        for (int j = n-1; j > i; j--) {
            if (nums[j] < nums[low])  // if the element has a smaller value than low
                low = j;              // it becomes the new low
        }
        swap(nums[i], nums[low]);  // switch the current position's item with the smallest in the unsorted portion
    }
}
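Selection sort does the same amount of comparing work no matter how the data is ordered, which is one way to see why it is O(N^2). The following sketch is a hypothetical instrumented copy of SelectionSort() that counts comparisons and swaps; SortCounts and SelectionSortCounted are names invented for this example.

```cpp
#include <cassert>
#include <utility>  // std::swap
#include <vector>

struct SortCounts {
    long comparisons;  // element comparisons made by the inner loop
    long swaps;        // swaps made by the outer loop
};

// Same algorithm as SelectionSort(), with counters added.
SortCounts SelectionSortCounted(std::vector<int> &nums)
{
    SortCounts counts = {0, 0};
    int n = static_cast<int>(nums.size());
    for (int i = 0; i < n - 1; i++) {
        int low = i;                     // index of the smallest value seen so far
        for (int j = n - 1; j > i; j--) {
            counts.comparisons++;
            if (nums[j] < nums[low])
                low = j;
        }
        std::swap(nums[i], nums[low]);   // one swap per outer iteration (possibly with itself)
        counts.swaps++;
    }
    // the inner loop always runs to completion: n(n-1)/2 comparisons for any order
    assert(counts.comparisons == static_cast<long>(n) * (n - 1) / 2 || n == 0);
    return counts;
}
```

For five elements it reports 10 comparisons (5*4/2) and 4 swaps whether the input is already sorted or in reverse order. Because it swaps at most once per outer iteration, selection sort makes few swaps, which can matter when moving elements is expensive.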

The algorithms that follow are considerably faster on large lists. Merge sort and heap sort run in O(N log N) time in every case and QuickSort does so on average, while the Shell sort's worst case depends on the sequence of increments it uses; with the halving sequence used here it is still O(N^2), although in practice it is much faster than the simple sorts. The next sorting algorithm that will be covered is the Shell sort, named after its creator, D.L. Shell. Unlike the sorts above, it compares and swaps elements that are far apart, so an item can move a long distance in a single step. It takes a loosely "divide and conquer" approach: the list is divided into many interleaved sublists, each of which is sorted in place, so no separate merging step is needed. The Shell sort takes advantage of the insertion sort, which is very efficient in its best case (that is, when the list being sorted is already 'nearly sorted'). The Shell sort first divides the list into n/2 sublists, whose elements are n/2 positions apart. For instance, in a list of ten elements, the first iteration would consist of five lists of two elements each. The first list would be list[0] and list[5], the second would be list[1] and list[6], and so on. Each of these lists is then sorted using an insertion sort. In the next iteration, the list is divided into fewer, bigger sublists whose elements are closer together: there are n/4 lists, with elements n/4 positions apart. These lists are again sorted using insertion sort, and so on. Each iteration uses half as many sublists as the one before, each twice as long, and each iteration brings the list closer to being sorted. The last pass is a standard insertion sort over the entire list. Since the list should be 'nearly sorted' by then, this pass is very efficient. Note that during some iterations the sublists will contain unequal numbers of elements, because the number of sublists does not evenly divide the total number of elements. Remember that integer division drops the fractional part, e.g. 5/2 = 2. Therefore, if we have seven elements, there are three lists during the first iteration (7/2 = 3): one contains three elements, and the other two each contain two elements.

{6,3,1,5,2,4,9} -> {6,5,9} {3,2} {1,4}

In the example, the first list starts at list[0] (6) and includes every item three positions apart: 6, 5 and 9. The next list starts at list[1] (3) and also includes every item three positions apart from it: 3 and 2. The third list starts at list[2] (1) and contains 1 and 4; since there is no item three locations after the '4', that list has only two items. The code for the Shell sort is very simple and straightforward. A slightly modified version of the insertion sort is used, however. Since the elements of the sublists are not adjacent, an increment parameter is added, which specifies how far apart the elements of a sublist are. The value '1' in the original insertion sort code is replaced by the increment, since the code now compares values that are increment positions apart.
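The sublists in the seven-element example can be listed mechanically: sublist k holds the elements at positions k, k+gap, k+2*gap, and so on. This hypothetical helper (ShellSublists is not part of the lesson's code) groups a list by a given gap, which makes it easy to check which elements each insertion sort pass will see.

```cpp
#include <cassert>
#include <vector>

// Groups list into the interleaved sublists Shell sort works on for one gap:
// the element at position i belongs to sublist i % gap.
std::vector<std::vector<int>> ShellSublists(const std::vector<int> &list, int gap)
{
    assert(gap > 0);
    std::vector<std::vector<int>> sublists(gap);
    for (int i = 0; i < static_cast<int>(list.size()); i++)
        sublists[i % gap].push_back(list[i]);
    return sublists;
}
```

With the list {6,3,1,5,2,4,9} and a gap of 7/2 = 3, it returns {6, 5, 9}, {3, 2} and {1, 4}: three lists, one of three elements and two of two elements.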

// insertion sort on the elements of nums (which has n elements) that are incr apart
void InSortShell(int *nums, int n, int incr)
{
    for (int i = incr; i < n; i += incr)
        for (int j = i; (j >= incr) && (nums[j] < nums[j-incr]); j -= incr)
            swap(nums[j], nums[j-incr]);
}

void ShellSort(int *nums, int n)
{
    // each iteration uses half as many sublists: the distance between elements is halved
    for (int i = n/2; i > 1; i /= 2)
        for (int j = 0; j < i; j++)  // sort each sublist
            // sublist j begins at nums[j], so the rest of the array is j items shorter (n-j);
            // the elements of the sublist are i apart
            InSortShell(&nums[j], n-j, i);
    InSortShell(nums, n, 1);  // finish with a standard insertion sort on the now nearly sorted list
}

QuickSort is usually the fastest of these algorithms in the average case, but its worst-case running time is O(N^2). The worst case occurs when the chosen pivot is repeatedly the smallest or largest value in the sublist being partitioned (for example, an already sorted list when the first element is used as the pivot). QuickSort takes a "divide and conquer" approach. A value called the pivot is selected first; usually it is the value of the middle element. A "partition" of the array is then performed: all elements that are less than the pivot are moved to the beginning of the list, followed by the pivot element, and then all values that are greater appear at the end. The elements within each 'sublist' do not need to be in any particular order with respect to each other, but this division around the pivot must be maintained. QuickSort is then applied to each sublist through recursion, and this continues until the structure has been sorted. Let's look at how the partitioning algorithm is implemented. It starts at each end of the sublist being analyzed and moves inward from each end. First, a while loop finds the next value from the left that is greater than or equal to the pivot. Then another while loop finds the next value from the right that is less than or equal to the pivot. These two values are swapped, and the process continues until the left and right positions meet. When the algorithm is finished, every element at the left position and after it is greater than or equal to the pivot, and every element before it is less than or equal to the pivot. The code for the partition algorithm is shown below.
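The partition step on its own can be tried out with the standard library: std::partition (from <algorithm>) moves every element that satisfies a predicate in front of every element that does not. The sketch below only illustrates the idea and is not a replacement for part(): it does not place the pivot between the two groups, and it is not required to keep elements in their original relative order (std::stable_partition is). PartitionAround is a hypothetical name.

```cpp
#include <algorithm>  // std::partition
#include <cassert>
#include <vector>

// Moves every value less than pivot to the front of nums and returns the
// index of the first value that is not less than pivot.
int PartitionAround(std::vector<int> &nums, int pivot)
{
    std::vector<int>::iterator boundary =
        std::partition(nums.begin(), nums.end(),
                       [pivot](int value) { return value < pivot; });
    int k = static_cast<int>(boundary - nums.begin());
    for (int i = 0; i < static_cast<int>(nums.size()); i++)
        assert((i < k) == (nums[i] < pivot));  // values before k are < pivot, the rest are not
    return k;
}
```

For {7, 2, 9, 4, 5, 1, 8} and a pivot of 5, it returns 3: the values 2, 4 and 1 end up in the first three positions, in some order.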

// Partitions nums[left..right] around the value pivot, which the caller has stored
// at nums[right+1]. Returns the first position of the right sublist.
int part(int *nums, int left, int right, int pivot)
{
    int l = left - 1;   // one position before the first element to examine
    int r = right + 1;  // the pivot's position; it stops the scan from the left
    do {
        while (nums[++l] < pivot)               // find the next value >= pivot from the left
            ;
        while ((l < r) && (nums[--r] > pivot))  // find the next value <= pivot from the right, without passing l
            ;
        swap(nums[l], nums[r]);                 // swap the two out-of-place values (a no-op once l == r)
    } while (l < r);                            // move inward until the positions meet
    return l;
}

The pivot of each list is simply the middle element.

int getpivot(int left, int right)  // left and right are the indices of the sublist's ends
{
    return (left + right) / 2;
}

Now let's look at how QuickSort brings everything together. The left and right indices of the sublist must be passed to the QuickSort() function, because the algorithm uses recursion to run QuickSort on each partition. On the initial call to sort an array of n elements, 0 is used as the left index and n-1 as the right index. For the partition algorithm to work correctly, the pivot itself must not be included in the elements being partitioned. To accomplish this, we swap it with the last element and partition only up to right-1. After the partition, the pivot is swapped with the element at the first position of the right sublist, which locks it into its correct final position. Now all the elements to its left are less than or equal to the pivot, and all those to its right are greater than or equal to it.

void QuickSort(int *nums, int left, int right)
{
    if (left >= right)                   // zero or one element: nothing to sort
        return;
    int pivot = getpivot(left, right);   // find the pivot
    swap(nums[pivot], nums[right]);      // move the pivot to the last position
    int r_sublist = part(nums, left, right-1, nums[right]);  // partition left..right-1, excluding the pivot
    swap(nums[r_sublist], nums[right]);  // move the pivot into its proper position
    if ((r_sublist - left) > 1)          // if a left sublist of two or more elements exists, sort it
        QuickSort(nums, left, r_sublist-1);
    if ((right - r_sublist) > 1)         // if a right sublist of two or more elements exists, sort it
        QuickSort(nums, r_sublist+1, right);
}
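The worst case of QuickSort is easy to observe by counting comparisons. The sketch below is a hypothetical experiment that sorts already-sorted data twice, once taking the first element as the pivot and once the middle element, as getpivot() does. To keep it short, it uses the Lomuto partition scheme rather than the lesson's part() function, so the exact counts apply to this sketch only. QuickSortCount and CountOnSortedInput are names made up for this example.

```cpp
#include <cassert>
#include <utility>  // std::swap
#include <vector>

// QuickSort with the Lomuto partition scheme; returns the number of element
// comparisons made. middle_pivot selects the middle element, otherwise the first.
long QuickSortCount(std::vector<int> &a, int lo, int hi, bool middle_pivot)
{
    if (lo >= hi) return 0;                       // zero or one element: sorted
    int p = middle_pivot ? (lo + hi) / 2 : lo;    // choose the pivot index
    std::swap(a[p], a[hi]);                       // move the pivot to the end
    int pivot = a[hi];
    long comparisons = 0;
    int i = lo;                                   // next slot for a value < pivot
    for (int j = lo; j < hi; j++) {
        comparisons++;
        if (a[j] < pivot)
            std::swap(a[i++], a[j]);
    }
    std::swap(a[i], a[hi]);                       // pivot to its final position
    return comparisons
         + QuickSortCount(a, lo, i - 1, middle_pivot)
         + QuickSortCount(a, i + 1, hi, middle_pivot);
}

// Sorts 0, 1, ..., n-1 (already in order) and returns the comparison count.
long CountOnSortedInput(int n, bool middle_pivot)
{
    std::vector<int> a(n);
    for (int k = 0; k < n; k++) a[k] = k;
    long c = QuickSortCount(a, 0, n - 1, middle_pivot);
    for (int k = 1; k < n; k++) assert(a[k-1] <= a[k]);  // the result is sorted
    return c;
}
```

For 100 sorted values, the first-element pivot makes 4950 comparisons, which is 100*99/2, the same quadratic count as the simple sorts, while the middle-element pivot stays well under 1000. Choosing the middle element avoids this particular case, but some other input orders can still make a middle-element pivot perform badly.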

The merge sort is another algorithm that takes a "divide and conquer" approach. It begins by dividing a list into two sublists, and then recursively divides each of those sublists until every sublist has one element. These sublists are then combined using a simple merging technique. To combine two sorted lists, the first value of each is examined, and the smaller value is added to the output list. This continues until one of the lists is exhausted, at which point the remainder of the other list is simply appended to the output list. Adjacent sublists are merged at each level of the recursion, until all the elements have been merged back into a single list. The code for merge sort is very straightforward. The algorithm uses two arrays: after both halves of a sublist have been sorted, the sublist is copied into a temporary array, and the two halves are then merged from the temporary array back into the original array. The original array therefore acts as the output array and contains the sorted list at the end. The parameters of the MergeSort() function are these two arrays, together with the left and right boundaries of the list to be sorted. The boundaries are required because MergeSort() is recursive and applies itself to sublists within the array. On the initial call, left has a value of 0 and right has a value of n-1; the temporary array must have room for at least n elements.

void MergeSort(int *nums, int *tmp, int left, int right)
{
    if (left >= right)  // the sublist has at most one element and cannot be split further
        return;
    int mid = (left + right) / 2;
    MergeSort(nums, tmp, left, mid);       // sort the first half
    MergeSort(nums, tmp, mid + 1, right);  // sort the second half
    // copy the sublist into the temporary array
    for (int i = left; i <= right; i++)
        tmp[i] = nums[i];
    // merge the two sorted halves back into nums
    int l = left;     // the current element in the first sublist
    int r = mid + 1;  // the current element in the second sublist
    for (int j = left; j <= right; j++) {
        if (l == mid + 1)          // the left sublist has ended: insert the next item from the right sublist
            nums[j] = tmp[r++];
        else if (r > right)        // the right sublist has ended: insert the next item from the left sublist
            nums[j] = tmp[l++];
        else if (tmp[l] < tmp[r])  // the current left element is smaller: insert it and advance in the left sublist
            nums[j] = tmp[l++];
        else                       // the current right element is smaller or equal: insert it and advance in the right sublist
            nums[j] = tmp[r++];
    }
}
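The merging technique described above also works as a stand-alone function on two sorted lists. The sketch below (MergeSorted is a hypothetical name) mirrors the merge loop inside MergeSort(), using std::vector instead of index ranges. The standard library offers the same operation as std::merge in <algorithm>.

```cpp
#include <cassert>
#include <cstddef>  // std::size_t
#include <vector>

// Merges two sorted lists: repeatedly take the smaller front value, then append
// whatever remains of the list that was not exhausted.
std::vector<int> MergeSorted(const std::vector<int> &a, const std::vector<int> &b)
{
    std::vector<int> out;
    out.reserve(a.size() + b.size());
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        if (a[i] <= b[j])          // on ties, take from the first list
            out.push_back(a[i++]);
        else
            out.push_back(b[j++]);
    }
    while (i < a.size()) out.push_back(a[i++]);  // remainder of the first list
    while (j < b.size()) out.push_back(b[j++]);  // remainder of the second list
    assert(out.size() == a.size() + b.size());
    return out;
}
```

MergeSorted({1, 4, 7}, {2, 3, 8, 9}) returns {1, 2, 3, 4, 7, 8, 9}. Using <= here, where MergeSort() above uses <, takes equal values from the first list first, which makes a merge sort built on it stable.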

The next algorithm, HeapSort, is very simple to implement. The array to be sorted is first arranged into a heap. Then, a loop removes each element from the heap. Remember from the lesson on heaps that when an element is removed, it is still part of the physical array: it is swapped with the last item of the heap. Each time, the largest remaining item is moved to the end of the heap, directly before the item removed in the previous iteration, since the heap shrinks by one element each time. When the loop finishes, the array is sorted in ascending order. This approach requires a slightly modified version of the heap class. The changes from the original class are the constructor, which now takes a pointer to the array to be sorted and its size, the array member, which is now a pointer, and the new private BuildHeap() function.

const int MAX_SIZE = 100;

template <class ItemType>
class Heap
{
public:
    Heap(ItemType*, int);
    int left(int) const;
    int right(int) const;
    int parent(int) const;
    void insert(const ItemType&);
    ItemType remove_max();
    bool IsEmpty() const;
    bool IsFull() const;
    int count() const;
    ItemType value(int) const;
private:
    ItemType *array;
    int elements;  // how many elements are in the heap
    void ReHeap(int);
    void BuildHeap();
};

This version of the heap class accepts, in its constructor, a pointer to the array to be sorted; the constructor's second parameter is the size of the array. The array member is declared as a pointer, which is set to point to the array passed to the constructor, and a function BuildHeap() has been added, which turns the array into a heap.

template <class ItemType>
Heap<ItemType>::Heap(ItemType *array_ptr, int size)
{
    array = array_ptr;  // the heap works directly on the caller's array
    elements = size;
    BuildHeap();        // rearrange the array into a heap
}

The BuildHeap() function begins at the last non-leaf node (index n/2 - 1) and works back toward the root at the front of the array, turning each subtree into a heap with the ReHeap() function. Since leaves cannot move down any further, they do not need to be processed; instead, they fall into their proper place by being exchanged with a node on a higher level, if necessary, when that node moves down. By working upward, the subtrees of a node are made into heaps first, which is what allows ReHeap() to be used: ReHeap() relies on the node's subtrees already being heaps, because it only compares the node with its children and makes a switch if necessary. If the subtrees did not satisfy the heap property, larger items on lower levels would never be brought to the top, since each comparison is made with the node's children and not its parent.

template <class ItemType>
void Heap<ItemType>::BuildHeap()
{
    for (int j = elements/2 - 1; j >= 0; j--)  // from the last non-leaf node back to the root
        ReHeap(j);
}
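The lesson does not show ReHeap() or the left() and right() functions, so the sketch below makes an assumption: the heap is a max-heap stored with 0-based indexing, where the children of node i are at 2*i+1 and 2*i+2, which matches the starting index n/2 - 1 used in BuildHeap(). SiftDown and BuildMaxHeap are hypothetical stand-ins for ReHeap() and BuildHeap(), written for a std::vector.

```cpp
#include <cassert>
#include <utility>  // std::swap
#include <vector>

// Assumed ReHeap() behaviour: move heap[node] down until it is not smaller than
// either child. Only the first size elements of heap belong to the heap.
void SiftDown(std::vector<int> &heap, int node, int size)
{
    while (true) {
        int largest = node;
        int l = 2 * node + 1, r = 2 * node + 2;  // assumed 0-based child positions
        if (l < size && heap[l] > heap[largest]) largest = l;
        if (r < size && heap[r] > heap[largest]) largest = r;
        if (largest == node) return;             // node is >= both children: done
        std::swap(heap[node], heap[largest]);    // move the node down one level
        node = largest;
    }
}

void BuildMaxHeap(std::vector<int> &heap)
{
    int n = static_cast<int>(heap.size());
    for (int j = n / 2 - 1; j >= 0; j--)         // last non-leaf node back to the root
        SiftDown(heap, j, n);
    for (int i = 1; i < n; i++)
        assert(heap[(i - 1) / 2] >= heap[i]);    // heap property: parent >= child
}
```

Applied to {3, 9, 2, 7, 5, 8, 1}, the loop starts at index 2, the last non-leaf node, and produces {9, 7, 8, 3, 5, 2, 1}, with the largest value at the root.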

The heap sort algorithm makes the array to be sorted into a heap, and then follows the above procedure.

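template <class ItemType>
void HeapSort(ItemType *array, int size)
{
    Heap<ItemType> sort(array, size);  // the constructor turns array into a heap in place
    for (int j = 0; j < size; j++)
        sort.remove_max();  // moves the current largest item to the end of the shrinking heap
}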
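The same make-a-heap-then-remove-the-maximum procedure can be expressed with the standard library's heap algorithms. In this sketch (HeapSortStl is a hypothetical name), std::make_heap arranges the vector into a max-heap, and each std::pop_heap moves the current largest value to the end of the shrinking heap range, much as remove_max() does in the Heap class.

```cpp
#include <algorithm>  // std::make_heap, std::pop_heap
#include <cassert>
#include <cstddef>    // std::size_t
#include <vector>

void HeapSortStl(std::vector<int> &nums)
{
    std::make_heap(nums.begin(), nums.end());  // arrange the whole vector into a max-heap
    for (std::vector<int>::iterator heap_end = nums.end(); heap_end != nums.begin(); --heap_end)
        std::pop_heap(nums.begin(), heap_end);  // largest of [begin, heap_end) goes to heap_end-1
    for (std::size_t i = 1; i < nums.size(); i++)
        assert(nums[i-1] <= nums[i]);          // the vector is now in ascending order
}
```

The final loop is what std::sort_heap does, so std::make_heap followed by std::sort_heap gives the same result.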


Note that although algorithms such as QuickSort may seem to be the only ones worth using, this is not always the case. If you know that the array to be sorted is very small, an O(N^2) algorithm is much simpler to implement, and the difference in running time will not be significant.

© 2000 ThinkQuest Team C005618


Stacks

A stack is a special type of list in which only the element at one end can be accessed. Items can be "pushed" onto one end of the stack. Each new item is placed in front of the others, and every existing element moves down one position. The first element is referred to as the "top" item, and it is the only item that may be accessed at any time. To access items further down the stack, they must be brought to the top by "popping" the appropriate number of items; popping means removing the top element of a stack. This makes a stack a LIFO structure: "Last In, First Out". These rules restrict how stacks can be used, but stacks are very efficient and much easier to implement than lists. Uses of stacks range from programming a simple card game to maintaining the order of operations in a complex program. For example, a stack is useful in a task-management program where the newest tasks must be executed first. A stack node usually has the following structure, which is very similar to that of a list node.

#include <cassert>  // assert()
#include <cstddef>  // NULL

template <class ItemType>
struct StackNode
{
    ItemType data;
    StackNode<ItemType> *next;
};
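The LIFO rule can be seen with the standard library's std::stack container adaptor from <stack>, which offers push(), top() and pop() like the class developed in this lesson. One difference: std::stack::pop() removes the top element but does not return it, so the value is read with top() first. PushThenPopAll is a hypothetical helper written for this illustration.

```cpp
#include <cassert>
#include <stack>
#include <vector>

// Pushes every value onto a std::stack, then pops them all, recording the
// order in which they come off.
std::vector<int> PushThenPopAll(const std::vector<int> &values)
{
    std::stack<int> s;
    for (int v : values)
        s.push(v);                   // each new value becomes the top
    std::vector<int> popped;
    while (!s.empty()) {
        popped.push_back(s.top());   // only the top element is accessible
        s.pop();                     // removes the top; returns nothing
    }
    assert(popped.size() == values.size());
    return popped;
}
```

PushThenPopAll({1, 2, 3}) returns {3, 2, 1}: the last value pushed is the first one popped.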

A generic stack class, which can be adapted to almost any programming situation, is easy to implement. A definition of such a class is shown below.

template <class ItemType>
class Stack
{
public:
    Stack();                    // class constructor - initialize private variables
    ~Stack();                   // class destructor - free up used memory
    void push(const ItemType);  // add a new node to the top of the stack
    ItemType pop();             // remove the top node and return its contents
    ItemType top() const;       // return the top node's value without popping it
    void clear();               // delete all nodes in the stack
    bool IsEmpty() const;       // return true if the stack has no elements
    bool IsFull() const;        // return true if there is no free memory for new nodes
    int count() const;          // return the number of nodes on the stack
private:
    StackNode<ItemType> *top;   // pointer to the top node in the stack
                                // (C++ does not allow this member to share its name with the
                                // top() function above; a compilable version must rename one of them)
    int counter;                // the number of nodes in the stack
};

The class constructor sets counter to zero. Since there are no nodes in the stack when an instance of the class is first created, top is set to NULL.

template <class ItemType>
Stack<ItemType>::Stack()
{
    counter = 0;
    top = NULL;
}

The role of the destructor is to delete all nodes in the stack and return the memory they occupy to the free store. Since the clear() function does this task, it can be called by the destructor.

template <class ItemType>
Stack<ItemType>::~Stack()
{
    clear();
}

The push() function inserts a new node on top of the stack, and sets the top pointer to reference this new node. First, we check if there is enough system memory to create a new node, and then create the node, assigning it to top. The node's value is set equal to item, and its next component is set to the node that was on top before the creation of the new node.

template <class ItemType>
void Stack<ItemType>::push(const ItemType item)
{
    assert(!IsFull());  // abort if there is not enough memory to create a new node
    // create a new node with value item on top of the others;
    // the original top node follows the new one
    StackNode<ItemType> *tmp = new StackNode<ItemType>;
    tmp->data = item;
    tmp->next = top;
    top = tmp;
    counter++;          // increment the number of nodes in the stack
}

The pop() function removes the top node from the stack (freeing the memory it uses) and returns its value. A temporary local variable (tmp) is created to point to the original top node, and top is then set to the next node in the stack; tmp is used to delete the original top node. If we deleted the node through top instead, the position of the next node in the stack would be lost, since top->next would no longer exist.

template <class ItemType>
ItemType Stack<ItemType>::pop()
{
    assert(!IsEmpty());              // abort if the stack is empty: there is no node to pop
    ItemType item = top->data;       // keep the top value, to be returned later
    StackNode<ItemType> *tmp = top;  // create a temporary reference to the top node
    top = top->next;                 // set top to the next node
    delete tmp;                      // delete the original top node
    counter--;                       // decrement the number of nodes in the stack
    return item;                     // return the original top value
}

The top() function simply returns the value of the top node, without modifying the stack.


template <class ItemType>
ItemType Stack<ItemType>::top() const
{
    assert(!IsEmpty());  // abort if there is no top node
    return top->data;
}

The clear() function deletes all nodes in the stack and frees the memory they occupy. Each node in the stack is visited using a loop, which executes until it reaches a NULL reference. A temporary variable is used for the same reason as in the list implementation: if we deleted a node using top as a reference, the position of the next node would be lost, since top->next would no longer exist.

template <class ItemType>
void Stack<ItemType>::clear()
{
    StackNode<ItemType> *tmp;
    while (top != NULL) {  // loop through every node in the stack
        tmp = top;         // reference the top node
        top = top->next;   // set top to the next node
        delete tmp;        // delete the original top node
    }
    counter = 0;           // the stack is now empty, so count() must return 0
}

The IsEmpty() function returns true if the stack has no nodes. This task is accomplished very simply by checking to see if the top pointer is NULL.

template <class ItemType>
bool Stack<ItemType>::IsEmpty() const
{
    return (top == NULL);
}

The IsFull() function checks whether enough system memory is available to create a new node for the stack. It works the same way as it did in the linked list class: a node is "created" using new, and the result is then examined. In standard C++, a plain new expression that cannot obtain memory throws a std::bad_alloc exception rather than returning NULL, so the nothrow form, new (std::nothrow), is used here; it returns NULL on failure, in which case the function returns true. If the new node is created successfully, it is deleted, and the function returns false.

#include <new>  // std::nothrow

template <class ItemType>
bool Stack<ItemType>::IsFull() const
{
    // new (std::nothrow) returns NULL instead of throwing std::bad_alloc on failure
    StackNode<ItemType> *tmp = new (std::nothrow) StackNode<ItemType>;
    if (tmp == NULL)
        return true;
    else {
        delete tmp;
        return false;
    }
}


The count() function returns the number of nodes in the stack, which is maintained by the class's private counter member. This value could instead be kept in a public member that the programmer accesses directly, but it is good practice to hide it, since a public counter could be modified without any nodes being added or removed. Also, if the class implementation changes, programs using the stack class will not need to change, since all the new code will be written inside the count() function.

template <class ItemType>
int Stack<ItemType>::count() const
{
    return counter;
}
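For reference, here is a minimal sketch that assembles the member functions above into one class. It makes two assumptions the lesson does not: the pointer to the top node is renamed topNode (a name chosen here), because a class cannot have a data member and a member function both called top; and copying is disabled, since a copied Stack would share nodes with the original and both destructors would delete them. clear() also resets counter, and IsFull() uses new (std::nothrow).

```cpp
#include <cassert>
#include <cstddef>  // NULL
#include <new>      // std::nothrow

template <class ItemType>
struct StackNode {
    ItemType data;
    StackNode<ItemType> *next;
};

template <class ItemType>
class Stack {
public:
    Stack() : topNode(NULL), counter(0) {}
    ~Stack() { clear(); }
    Stack(const Stack &) = delete;             // copying disabled in this sketch
    Stack &operator=(const Stack &) = delete;

    void push(const ItemType item) {
        assert(!IsFull());
        StackNode<ItemType> *tmp = new StackNode<ItemType>;
        tmp->data = item;
        tmp->next = topNode;                   // the old top node follows the new one
        topNode = tmp;
        counter++;
    }
    ItemType pop() {
        assert(!IsEmpty());
        ItemType item = topNode->data;
        StackNode<ItemType> *tmp = topNode;    // keep a reference to the node to delete
        topNode = topNode->next;
        delete tmp;
        counter--;
        return item;
    }
    ItemType top() const {
        assert(!IsEmpty());
        return topNode->data;
    }
    void clear() {
        while (topNode != NULL) {
            StackNode<ItemType> *tmp = topNode;
            topNode = topNode->next;
            delete tmp;
        }
        counter = 0;
    }
    bool IsEmpty() const { return topNode == NULL; }
    bool IsFull() const {
        // nothrow new returns NULL instead of throwing std::bad_alloc
        StackNode<ItemType> *tmp = new (std::nothrow) StackNode<ItemType>;
        if (tmp == NULL)
            return true;
        delete tmp;
        return false;
    }
    int count() const { return counter; }

private:
    StackNode<ItemType> *topNode;              // pointer to the top node
    int counter;                               // number of nodes in the stack
};
```

As a short usage example: after pushing 1, 2 and 3, count() is 3 and top() is 3, and two calls to pop() return 3 and then 2.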

© 2000 ThinkQuest Team C005618
