An Introduction to Data Structures with C++

The curiosity of our surroundings is the constant force that drives humanity forward. In an effort to
better ourselves and our future we study the world we live in. Our own inventions define the way in
which we lead our lives, and what people we become. Computers have now become the center of many
cultures around the world, as technology unites people in a way that has never been seen before,
changing the way we view each other. In the last decade, the Internet has become a medium that lets
people share their ideas and opinions with millions of others. We are very close to a world where
ignorance is no longer an excuse. Computers are, and will be even more, in the center of our lives.
-- "World Around Us" by Alec Solway

Computer programming is an exciting field in the modern world. We make our lives easier by "telling"
the computer to perform certain tasks for us. In a sense, this is what programming is. All types of tasks
require some kind of data to be manipulated. Whether we want to play a game or manage our portfolio,
data is involved. By creating new ways to manage (access and change) data, we can make programs
more efficient, and thus obtain faster and more reliable results. Different types of programs require
different ways of handling data; however, standards exist between various programs. This website gives
you a peek into those standards, into the world of data structures and algorithms. It is
recommended that the reader have some experience with programming in general, although a brief
review of the C++ concepts needed to understand the data structure tutorials is provided. Please click on
"C++ Review" to begin your visit, or on "Data Structures" if you already have a solid C++ programming
background. Enjoy!

© 2000 ThinkQuest Team C005618

Binary Trees

Binary trees differ from the three structures covered previously. Lists, stacks, and queues are all
linear structures; that is, their elements logically follow one another. A binary tree, by contrast, is
hierarchical: each element can lead to up to two others.

A binary tree structure contains a root node, which is the first in the structure. The root points to up
to two other nodes, its left and right children. The root is considered to be the parent of these nodes.
Each child is also the root of a sub-tree, since it can have up to two children of its own. If a node has
no children, it is referred to as a leaf node.

Each node in the tree also has a level associated with it. The root node is at level 0, and the level
increases by one with each row of nodes below the root.

Binary trees have many different basic implementations. An array implementation is often used, in
which space is reserved for every position on every level, whether or not a node occupies it. In larger
trees, this can waste a great deal of space. For our demonstration, we will create a generic class using
dynamic memory allocation. This particular implementation was created by the author using a mixture
of possible approaches. It is very effective in explaining the concepts behind binary trees.

The binary tree class gives the programmer complete control over the tree. Nodes may be removed and
inserted at any location in the tree. The class allows the user to traverse the tree by keeping a current
pointer, just as in the linked list class. The programmer can then use the functions left(), right(), and
parent() to move from one node to another. The class also allows the user to display the tree in
in-order, post-order, and pre-order. The "order" refers to the sequence in which the nodes are displayed.
In pre-order, a node's value is displayed first, followed by its entire left subtree and then its entire
right subtree. In in-order, the left subtree is displayed, then the node's value, then the right subtree.
In post-order, both subtrees are displayed before the node itself. The implementation for the binary tree
class is displayed below. Although it may look intimidating at first, the code is very easy to follow. The
purpose and code behind each function is explained following the definition. You will note that most
functions are programmed using recursion. Since each node is the root of a tree in itself, recursion is
the easiest approach.
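
The three display orders can be illustrated with a small self-contained sketch. It does not use the tutorial's BinaryTree class: the DemoNode struct and the preorder(), inorder(), postorder(), and traverse() functions are hypothetical names introduced only for this example, and C++11 (nullptr, brace initialization) is assumed.

```cpp
#include <cassert>
#include <string>

// Hypothetical minimal node for illustration; not the tutorial's TreeNode.
struct DemoNode {
    char value;
    DemoNode* left;
    DemoNode* right;
};

// Pre-order: the node first, then its left subtree, then its right subtree.
void preorder(const DemoNode* node, std::string& out) {
    if (node == nullptr) return;
    out += node->value;
    preorder(node->left, out);
    preorder(node->right, out);
}

// In-order: the left subtree, then the node, then the right subtree.
void inorder(const DemoNode* node, std::string& out) {
    if (node == nullptr) return;
    inorder(node->left, out);
    out += node->value;
    inorder(node->right, out);
}

// Post-order: both subtrees first, then the node itself.
void postorder(const DemoNode* node, std::string& out) {
    if (node == nullptr) return;
    postorder(node->left, out);
    postorder(node->right, out);
    out += node->value;
}

// Builds the three-node tree with root B, left child A, and right child C,
// then returns the requested traversal as a string:
// 'p' = pre-order, 'i' = in-order, 'o' = post-order.
std::string traverse(char order) {
    assert(order == 'p' || order == 'i' || order == 'o');
    DemoNode a{'A', nullptr, nullptr};
    DemoNode c{'C', nullptr, nullptr};
    DemoNode b{'B', &a, &c};
    std::string out;
    if (order == 'p') preorder(&b, out);
    else if (order == 'i') inorder(&b, out);
    else postorder(&b, out);
    return out;
}
```

For this tree, traverse('p') gives "BAC", traverse('i') gives "ABC", and traverse('o') gives "ACB".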

Many books make a class for a single node, and use it to implement the tree. However, we will separate
the structure for each node and the entire tree to conserve overhead processing time. Each time a node is
created, much less time and memory is used than when a whole tree structure is made. Each node will
store a value, and pointers to its children and parent. These will be used and modified by the general tree
class.

template <class ItemType>
struct TreeNode
{
ItemType data;

TreeNode<ItemType> *left;
TreeNode<ItemType> *right;
TreeNode<ItemType> *parent;
};

template <class ItemType>
class BinaryTree
{
public:
BinaryTree(); //create an empty tree. main_root and current are set to NULL.
BinaryTree(TreeNode<ItemType>*,int); //create new tree with passed node as the
//new main root. set current to main root. if the second parameter is 0, the new
//object simply points to the node of the original tree. If the second parameter is
//1, a new copy of the subtree is created, which the object points to.
~BinaryTree();
void insert(const ItemType&,int); //insert new node as child of current. 0=left
//1=right
void remove(TreeNode<ItemType>*); //delete node and its subtree

ItemType value() const; //return value of current

//navigate the tree
void left();
void right();
void parent();
void reset(); //go to main_root
void SetCurrent(TreeNode<ItemType>*);

//return subtree (node) pointers
TreeNode<ItemType>* pointer_left() const;
TreeNode<ItemType>* pointer_right() const;
TreeNode<ItemType>* pointer_parent() const;
TreeNode<ItemType>* pointer_current() const;

//return values of children and parent without leaving current node
ItemType peek_left() const;
ItemType peek_right() const;
ItemType peek_parent() const;

//print the tree or a subtree. only works if ItemType is supported by the <<
//operator
void DisplayInorder(TreeNode<ItemType>*) const;
void DisplayPreorder(TreeNode<ItemType>*) const;
void DisplayPostorder(TreeNode<ItemType>*) const;

//delete all nodes in the tree
void clear();

//check the state of the tree
bool IsEmpty() const;
bool IsFull() const;
private:
TreeNode<ItemType>* current;
TreeNode<ItemType>* main_root;

TreeNode<ItemType>* CopyTree(TreeNode<ItemType>*,TreeNode<ItemType>*)
const; //create a new copy of a subtree if passed to the constructor
bool subtree; //does it reference a part of a larger object?
};

The first constructor simply sets the main_root and current data members to NULL, since the tree has
no nodes. A new tree is made, therefore it is not part of a larger tree object, and the subtree value is set
accordingly.

template <class ItemType>
BinaryTree<ItemType>::BinaryTree()
{
//the tree starts out empty, so there is no root node yet
main_root = NULL;
current = NULL;
subtree = false;
}

The second constructor accepts a pointer to a node, and creates a new tree object with the node that is
passed acting as the new tree's main root. current is then set to the main root. The second parameter
specifies whether the new subtree object points directly to the original tree's nodes (the root and its
descendants), or creates a copy of the subtree and is thus a new tree. The subtree variable records
whether the object points directly to the original tree's nodes. As you will later find out, this is
important in the class destructor.

template <class ItemType>
BinaryTree<ItemType>::BinaryTree(TreeNode<ItemType>* root, int op)
{
if(op == 0) //compare with ==. a single = would assign 0 and always take the else branch
{
main_root = root;
current = root;
subtree = true;
}
else
{
main_root = CopyTree(root,NULL);
current = main_root;
subtree = false;
}
}

The CopyTree() function creates a copy of subtree root and returns a pointer to the location of the new
copy's root node. The second parameter is a pointer to the parent of the subtree being passed. Since
CopyTree() uses recursion to traverse the original tree, passing each node's parent as a parameter is the
most efficient way of assigning each new node's parent value. Since the parent of the main root is
always NULL, we pass NULL as the second parameter in the class constructor above.

template <class ItemType>
TreeNode<ItemType>* BinaryTree<ItemType>::CopyTree(TreeNode<ItemType> *root,
TreeNode<ItemType> *parent) const
{

if(root == NULL) //base case - if the node doesn't exist, return NULL.
return NULL;
TreeNode<ItemType>* tmp = new TreeNode<ItemType>; //make a new location in memory
tmp->data = root->data; //make a copy of the node's data
tmp->parent = parent; //set the new node's parent
//copy the left subtree of the current node. pass the new node as the subtree's parent
tmp->left = CopyTree(root->left,tmp);
tmp->right = CopyTree(root->right,tmp); //do the same with the right subtree
return tmp; //return a pointer to the newly created node.
}

The job of the class destructor is to delete all the nodes, and free up memory as usual. The clear()
function is called just as in the previous data structure implementations. However, this operation is only
performed if the object is a main tree. If the object is a subtree that points to the nodes of a larger tree,
those nodes will be deleted when the main tree itself is deleted. Attempting to delete the memory
associated with the subtree after it has already been deleted by the main tree will have unpredictable
results.

template <class ItemType>
BinaryTree<ItemType>::~BinaryTree()
{
if(!subtree)
clear(); //delete all nodes
}

The insert() function creates a new node as a child of current. The first parameter is a value for the
new node, and the second parameter is an integer indicating which child the new node will become. A
value of 0 indicates that the new node will be a left child of current, whereas a value of 1 indicates the
new node will be a right child. If a node already exists in the location where the programmer wishes to
insert the new one, the existing node adopts the value passed to insert(). If the tree does not have any
nodes, the second parameter is disregarded, and a main root is created.

template <class ItemType>
void BinaryTree<ItemType>::insert(const ItemType &item,int pos) //insert as child
//of current 0=left 1=right. if a child already exists there, replace its value
{
assert(!IsFull());
//if the tree has no nodes, make a root node, disregard pos.
if(main_root == NULL)
{
main_root = new TreeNode<ItemType>;
main_root->data = item;
main_root->left = NULL;
main_root->right = NULL;
main_root->parent = NULL;
current = main_root;
return; //node created, exit the function
}

if(pos == 0) //new node is a left child of current
{
if(current->left != NULL) //if child already exists, replace value

(current->left)->data = item;
else
{
current->left = new TreeNode<ItemType>;
current->left->data = item;
current->left->left = NULL;
current->left->right = NULL;
current->left->parent = current;
}
}
else //new node is a right child of current
{
if(current->right != NULL) //if child already exists, replace value
(current->right)->data = item;
else
{
current->right = new TreeNode<ItemType>;
current->right->data = item;
current->right->left = NULL;
current->right->right = NULL;
current->right->parent = current;
}
}
}

The remove() function removes the subtree rooted at the node that root points to, including that node
itself. Depending on whether it was a left or right child, the left or right pointer of the parent is set to
NULL. The function uses recursion to perform the necessary operation on all nodes of the subtree. We
must start with the nodes on the lowest level, and work our way up. If we were to delete the top level
nodes first, we would lose the links to the lower levels.

template <class ItemType>
void BinaryTree<ItemType>::remove(TreeNode<ItemType>* root)
{
if(root == NULL) //base case - if the root doesn't exist, do nothing
return;
remove(root->left); //perform the remove operation on the node's left subtree first
remove(root->right); //then perform the remove operation on the node's right subtree
if(root->parent == NULL) //if the main root is being deleted, main_root must be set to NULL
main_root = NULL;
else
{
//make sure the parent of the subtree's root points to NULL, since the node no longer exists
if(root->parent->left == root)
root->parent->left = NULL;
else
root->parent->right = NULL;
}
current = root->parent; //set current to the parent of the subtree removed.
delete root;
}

The next function returns the value of current.

template <class ItemType>
ItemType BinaryTree<ItemType>::value() const
{
return current->data;
}

The next five functions are used to navigate the tree. The programmer can visit a node's left child, right
child, or parent, as well as reset current to the main root. Finally, the programmer can set current to a
specific node by supplying a pointer to it. This is very helpful if the programmer would like to work
with subtrees within the main tree object. Note, the SetCurrent() function should be used with caution.
If a pointer is supplied to a node that is not within the tree, the results are unpredictable.

template <class ItemType>
void BinaryTree<ItemType>::left()
{
current = current->left;
}

template <class ItemType>
void BinaryTree<ItemType>::right()
{
current = current->right;
}

template <class ItemType>
void BinaryTree<ItemType>::parent()
{
current = current->parent;
}

template <class ItemType>
void BinaryTree<ItemType>::reset()
{
current = main_root;
}

template <class ItemType>
void BinaryTree<ItemType>::SetCurrent(TreeNode<ItemType>* root)
{
current = root;
}

The four functions that follow return pointers to various nodes in the tree, depending on current. Such
a pointer is a required parameter for a few of our other functions, such as remove() and the three display
functions. It is also used by one of our class constructors, which can make a new tree object from a
subtree. The only function that is strictly required is pointer_current(), since the programmer can
navigate the tree to any node. The other three functions were included for ease of use. It is often
necessary to perform an operation on a node's children or parent without leaving the node. The functions
are also useful if a programmer would like to work on a subtree. An external TreeNode* pointer can be
created, set by one of the pointer returning functions, and then passed to the operation functions of the
class.

template <class ItemType>
TreeNode<ItemType>* BinaryTree<ItemType>::pointer_left() const
{
return current->left;
}

template <class ItemType>
TreeNode<ItemType>* BinaryTree<ItemType>::pointer_right() const
{
return current->right;
}

template <class ItemType>
TreeNode<ItemType>* BinaryTree<ItemType>::pointer_parent() const
{
return current->parent;
}

template <class ItemType>
TreeNode<ItemType>* BinaryTree<ItemType>::pointer_current() const
{
return current;
}

The next three functions are also not required, but were added for ease of use. They return the values of
a node's two children and parent without having to leave the node.

template <class ItemType>
ItemType BinaryTree<ItemType>::peek_left() const
{
assert(current->left != NULL);
return current->left->data;
}

template <class ItemType>
ItemType BinaryTree<ItemType>::peek_right() const
{
assert(current->right != NULL);
return current->right->data;
}

template <class ItemType>
ItemType BinaryTree<ItemType>::peek_parent() const
{
assert(current->parent != NULL);
return current->parent->data;
}

The display functions as explained above are next. Note, these functions will work only if ItemType is
supported by the << operator. For instance, any simple built-in C/C++ type (such as int, float, char,
etc.) will work without any modification.

template <class ItemType>
void BinaryTree<ItemType>::DisplayInorder(TreeNode<ItemType>* root) const
{
if (root == NULL)
return;

DisplayInorder(root->left);
cout << root->data;
DisplayInorder(root->right);
}

template <class ItemType>
void BinaryTree<ItemType>::DisplayPreorder(TreeNode<ItemType>* root) const
{
if (root == NULL)
return;

cout << root->data;
DisplayPreorder(root->left); //recurse with the same (pre-order) traversal
DisplayPreorder(root->right);
}

template <class ItemType>
void BinaryTree<ItemType>::DisplayPostorder(TreeNode<ItemType>* root) const
{
if (root == NULL)
return;

DisplayPostorder(root->left); //recurse with the same (post-order) traversal
DisplayPostorder(root->right);
cout << root->data;
}

The clear() function deletes all nodes in the tree. This is very easy to do, since we can take advantage
of the remove() function, which we have already defined. The remove() function deletes all nodes of a
subtree, as well as the root node. Therefore, we can pass the main root to remove() in order to delete all
nodes in the tree.

template <class ItemType>
void BinaryTree<ItemType>::clear()
{
remove(main_root); //use the remove function on the main root
main_root = NULL; //since there are no more items, set main_root to NULL
current = NULL;
}

The IsEmpty() function works by evaluating main_root. If there aren't any nodes in the tree,
main_root points to NULL.

template <class ItemType>
bool BinaryTree<ItemType>::IsEmpty() const
{
return (main_root == NULL);
}

Finally, other than the data types, the implementation of the IsFull() function does not change from
previous classes.

template <class ItemType>
bool BinaryTree<ItemType>::IsFull() const
{
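//in standard C++, plain new reports failure by throwing std::bad_alloc rather than
//returning NULL, so the nothrow form is used here. it requires #include <new>.
TreeNode<ItemType> *tmp = new(std::nothrow) TreeNode<ItemType>;
if(tmp == NULL)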
return true;
else
{
delete tmp;
return false;
}
}

Now let's take a look at two additional functions, which are not part of the tree class. It is often
necessary to know how many nodes are in the tree, or how many of them are leaves. One example of when
a leaf count is required is in a binary expression tree. Binary expression trees store mathematical
expressions, for instance, 5*x+7=22. Each operator, variable, or constant of the expression is
represented by one node. They are stored in such a way that the expression can then be displayed using
an in-order traversal. Likewise, pre-order and post-order traversals display the expression in prefix and
postfix notation. This means an operator stored in a node performs an operation on its two children. In
such a setup, all operators are internal nodes, whereas variables and constants are leaves. The code for
the NodeCount() and LeafCount() functions is displayed below. Both are very short since recursion is used.

template <class ItemType>
int LeafCount(TreeNode<ItemType>* root)
{
if(root == NULL) //base case - if the node doesn't exist, return 0 (don't count it)
return 0;
if((root->left == NULL) && (root->right == NULL)) //if the node has no children, return 1 (it is a leaf)
return 1;
return LeafCount(root->left) + LeafCount(root->right); //add the leaf nodes in the left and right subtrees
}

template <class ItemType>
int NodeCount(TreeNode<ItemType>* root)
{
if(root == NULL) //base case - return 0 if the node doesn't exist (don't count it)
return 0;
else
//return 1 for the current node, and add the number of nodes in the left and right subtrees
return 1 + NodeCount(root->left) + NodeCount(root->right);
}
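
As an illustration of the expression-tree remarks above, the following sketch builds a tree for the left-hand side 5*x+7 and counts its nodes and leaves. It is self-contained rather than built on the tutorial's classes: ExprNode, countNodes(), countLeaves(), infix(), postfix(), ExprSummary, and summarizeExample() are hypothetical names, and C++11 is assumed.

```cpp
#include <cassert>
#include <string>

// Hypothetical node holding one symbol of an expression.
struct ExprNode {
    char symbol;
    ExprNode* left;
    ExprNode* right;
};

// Counts every node: 1 for this node plus the counts of both subtrees.
int countNodes(const ExprNode* node) {
    if (node == nullptr) return 0;
    return 1 + countNodes(node->left) + countNodes(node->right);
}

// Counts only the nodes that have no children.
int countLeaves(const ExprNode* node) {
    if (node == nullptr) return 0;
    if (node->left == nullptr && node->right == nullptr) return 1;
    return countLeaves(node->left) + countLeaves(node->right);
}

// In-order traversal reproduces infix notation (without parentheses).
std::string infix(const ExprNode* node) {
    if (node == nullptr) return "";
    return infix(node->left) + node->symbol + infix(node->right);
}

// Post-order traversal reproduces postfix notation.
std::string postfix(const ExprNode* node) {
    if (node == nullptr) return "";
    return postfix(node->left) + postfix(node->right) + node->symbol;
}

struct ExprSummary {
    int nodes;
    int leaves;
    std::string inorderText;
    std::string postorderText;
};

// Builds the tree for 5*x+7: '+' at the root, '*' (over 5 and x) on the
// left, and 7 on the right, then summarizes it.
ExprSummary summarizeExample() {
    ExprNode five{'5', nullptr, nullptr};
    ExprNode x{'x', nullptr, nullptr};
    ExprNode seven{'7', nullptr, nullptr};
    ExprNode times{'*', &five, &x};
    ExprNode plus{'+', &times, &seven};
    ExprSummary s{countNodes(&plus), countLeaves(&plus),
                  infix(&plus), postfix(&plus)};
    assert(s.leaves <= s.nodes);  // leaves are a subset of all nodes
    return s;
}
```

Here the two operators are the internal nodes and the three operands are the leaves, so the summary reports 5 nodes and 3 leaves, with "5*x+7" in-order and "5x*7+" post-order.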

Binary Search Trees

Another type of special binary tree is the binary search tree (BST). BSTs must conform to a property
that states that every value in a node's left subtree is less than the node's value, and every value in its
right subtree is greater than or equal to the node's value. In order to make our binary tree class a BST,
we need only modify the insert() function, since nodes can no longer be placed anywhere in the tree by
the programmer. Otherwise, a BST acts in the same fashion as a standard binary tree.

The new insert() function accepts one parameter, the item to be inserted. If the tree is empty, then the
main root is added in the same fashion as it was with a standard binary tree, and the function exits. If the
tree is not empty, the function proceeds with a new algorithm.

The parent of the new node is found by running the insert_find() function, a new private member
function that must be added to the BST class. The insert_find() function accepts two parameters: the
root of the tree and the item value. The root is needed as a parameter because insert_find() uses
recursion to traverse the tree. The function works by comparing the value of item to each node. If it is
less than the node's value, and the node has a left child, the function proceeds to that child and performs
the same operation. If it is greater than (or equal to) the node's value, and a right child exists, then the
function proceeds to the right child. If item is less than the node and a left child does not exist, or it is
greater than or equal to the node and a right child does not exist, that node is the parent of the new node.

The insert() function receives a pointer to the new parent; however, it must check again whether the
new node is to be the left or right child. This is because a recursive insert_find() that returns a single
pointer has no convenient way to report this as well. We would have to add another parameter, or write
a non-recursive version of insert_find(). Simply performing one more comparison is the easier option.

A new node is then created using the same method as in the original version of insert().

template <class ItemType>
void BST<ItemType>::insert(const ItemType &item)
{
//if the tree has no nodes, make a root node
if(main_root == NULL)
{
main_root = new TreeNode<ItemType>;
main_root->data = item;
main_root->left = NULL;
main_root->right = NULL;
main_root->parent = NULL;
current = main_root;
return;
}

TreeNode<ItemType>* new_parent = insert_find(main_root,item); //find the new node's parent

if (item < new_parent->data) //check whether the new node is a left or right child and create it
{
new_parent->left = new TreeNode<ItemType>;
new_parent->left->data = item;
new_parent->left->left = NULL;
new_parent->left->right = NULL;

new_parent->left->parent = new_parent;
}
else
{
new_parent->right = new TreeNode<ItemType>;
new_parent->right->data = item;
new_parent->right->left = NULL;
new_parent->right->right = NULL;
new_parent->right->parent = new_parent;
}
}

template <class ItemType>
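TreeNode<ItemType>* BST<ItemType>::insert_find(TreeNode<ItemType>* root,ItemType item)
{
//descend left while item is smaller and a left child exists
if((root->left != NULL) && (item < root->data))
return insert_find(root->left,item);

//descend right while item is greater or equal and a right child exists
if((root->right != NULL) && (item >= root->data))
return insert_find(root->right,item);
return root; //no child in the required direction, so this node is the new parent
}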
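
The insertion procedure can be tried out in isolation with the sketch below. It mirrors the insert_find() logic on a standalone node type instead of the BST class: BstNode, findParent(), insertValue(), collectInorder(), destroy(), and sortedViaBst() are hypothetical names, the values are plain ints, and C++11 is assumed. An in-order traversal of the result should list the values in ascending order, which is a quick way to check the BST property.

```cpp
#include <cassert>
#include <vector>

// Hypothetical node with a parent pointer, in the spirit of TreeNode.
struct BstNode {
    int value;
    BstNode* left;
    BstNode* right;
    BstNode* parent;
};

// Descends from root to the node that should become the new node's parent.
BstNode* findParent(BstNode* root, int item) {
    assert(root != nullptr);
    if (root->left != nullptr && item < root->value)
        return findParent(root->left, item);
    if (root->right != nullptr && item >= root->value)
        return findParent(root->right, item);
    return root;
}

// Inserts item and returns the (possibly new) root of the tree.
BstNode* insertValue(BstNode* root, int item) {
    BstNode* node = new BstNode{item, nullptr, nullptr, nullptr};
    if (root == nullptr) return node;          // empty tree: node is the root
    BstNode* parent = findParent(root, item);
    node->parent = parent;
    if (item < parent->value) parent->left = node;   // the repeated check
    else parent->right = node;
    return root;
}

// Appends the values to out using an in-order traversal.
void collectInorder(const BstNode* node, std::vector<int>& out) {
    if (node == nullptr) return;
    collectInorder(node->left, out);
    out.push_back(node->value);
    collectInorder(node->right, out);
}

// Frees the whole tree, children before their parent.
void destroy(BstNode* node) {
    if (node == nullptr) return;
    destroy(node->left);
    destroy(node->right);
    delete node;
}

// Inserts all items into a fresh tree and returns its in-order listing.
std::vector<int> sortedViaBst(const std::vector<int>& items) {
    BstNode* root = nullptr;
    for (int item : items) root = insertValue(root, item);
    std::vector<int> out;
    collectInorder(root, out);
    destroy(root);
    return out;
}
```

For example, inserting 5, 2, 8, 2, 9, 1 in that order and reading the tree in-order gives 1, 2, 2, 5, 8, 9. The duplicate 2 ends up in the right subtree of the first 2, as the greater-than-or-equal rule requires.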

Since the programmer is no longer in control of the location of each node, we can add a function that
removes one node at a time. The reason we did not implement such a function for the standard binary
tree class is that we do not know how the program requires the structure to handle removing a node. If
the node to be deleted is a leaf, the solution is simple: we set its parent's pointer to NULL. If it has one
child, then the parent is set to point to that child. However, what happens if the node has two children?
How should we insert those children back into the tree once the node is deleted? The solution can vary,
depending on the programming task. Since a binary search tree follows a specific property, we can
develop an algorithm that maintains the binary search tree property when rearranging the node's
children.

The code for the remove_node() function may look difficult at first, however when broken down into
each possible situation, it is very simple to understand. The first case that must be considered is if we
wish to remove the main root, and it has only one child. In this situation, that child becomes the new
main root.

If the node to be deleted has no children, node's parent is set to point to NULL, and node is then deleted.

Also as mentioned before, if node has one child, the parent of node is set to point to this child, and node
is then deleted.

If delete_node has two children, a little bit more work must be done. The question arises as to how to
attach delete_node's children to the tree, while preserving the binary search tree property. One possible
solution would be to attach one of the children to delete_node's parent, and reinsert each node from the
second child subtree one-by-one. However, this method is very inefficient, especially if the node to be
deleted has a very large subtree.

The ideal approach is to find another node in the tree, which can replace delete_node, and still
maintain the binary search tree property. We can then copy that node's value into delete_node, and
remove the node whose value we copied instead. The nodes that fit this criterion are the largest node in
the left subtree, and the smallest node in the right subtree. The largest node in the left subtree is still
smaller than any node of the right subtree, and the smallest node in the right subtree is larger than any
node in the left subtree, therefore making both possible values that can replace delete_node. If a right
subtree exists, then it is used, since it may contain a value that is equal to delete_node (in which case
that value will be used).

The largest node in the left subtree can be found by moving to the left one time (starting from
delete_node), and then moving to the right as much as possible. The smallest node in the right subtree
can be found by moving once to the right, and then to the left as much as possible. This means that
either node will have at most one child (the largest node in the left subtree can only have a left child, and
the smallest node in the right subtree can only have a right child). It can therefore be removed using the
above method, by attaching that child to the node's parent.

The replace_find() private member function returns the node that is used to replace delete_node.
The first parameter of the function is a pointer to the first possible node (if we are searching the left
subtree, this value is root->left, whereas it is root->right if we are searching the right subtree), and
the second parameter is what direction to search in. Zero means the function will locate the largest value
in the left subtree, meaning it will travel to the right as much as possible. A value of one means the
function will locate the smallest value in the right subtree, therefore travelling to the left as much as
possible. Once the node is found, delete_node is set to its value, and the node is deleted.

template <class ItemType>
void BST<ItemType>::remove_node(TreeNode<ItemType>* root)
{
if((root == main_root) && ((root->left == NULL) || (root->right == NULL)))
{
//set the main root's only child as the new root. if it has no children,
//main_root becomes NULL as the tree is empty.
if(root->left == NULL)
main_root = root->right;
else
main_root = root->left;
if(main_root != NULL) //an empty tree has no root whose parent could be set
main_root->parent = NULL; //set the new main root's parent to NULL
if(current == root) //if current is at the original main root, set it to the
//new root, since the original will be deleted
current = main_root;
delete root;
return;
}

if(current == root) //if current is at the node to be deleted, set it to the node's parent
current = root->parent;

if((root->left == NULL) && (root->right == NULL)) //if the root has no children
{
//have the parent point to NULL in place of it
if(root->parent->left == root) //if it is a left child
root->parent->left = NULL;
else //it is a right child
root->parent->right = NULL;

delete root;
return;
}

//if the root has one child, have the parent point to it in place of root
if((root->left == NULL) && (root->right != NULL))
{
if(root->parent->left == root)
root->parent->left = root->right;
else
root->parent->right = root->right;
root->right->parent = root->parent; //the child's parent pointer must follow the move
delete root;
return;
}
if((root->left != NULL) && (root->right == NULL))
{
if(root->parent->left == root)
root->parent->left = root->left;
else
root->parent->right = root->left;
root->left->parent = root->parent; //the child's parent pointer must follow the move
delete root;
return;
}

//if the node has two children
TreeNode<ItemType> *tmp;
if(root->right != NULL) //if the root has a right subtree, search it for the smallest value
tmp = replace_find(root->right,1);
else //search the left subtree for the largest value
tmp = replace_find(root->left,0);
root->data = tmp->data;
//tmp has at most one child. have tmp's parent point to that child, or to NULL if tmp
//is a leaf. the test must look at tmp's children, not root's.
TreeNode<ItemType> *child;
if(tmp->right != NULL)
child = tmp->right;
else
child = tmp->left; //this value is NULL if tmp has no children
if(tmp->parent->left == tmp) //if tmp is a left child
tmp->parent->left = child;
else //if tmp is a right child
tmp->parent->right = child;
if(child != NULL) //keep the child's parent pointer consistent
child->parent = tmp->parent;
if(current == tmp) //do not leave current pointing at a deleted node
current = root;
delete tmp;
}

template <class ItemType>
TreeNode<ItemType>* BST<ItemType>::replace_find(TreeNode<ItemType>* root,int
direction)
{
if(direction == 0) //searching left subtree for largest value. go right as much
//as possible. Return last node.
{
if(root->right == NULL)
return root;
return replace_find(root->right,0);
}
else //searching right subtree for smallest value. go left as much as possible.
//Return last node.
{
if(root->left == NULL)
return root;
return replace_find(root->left,1);
}
}
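
The removal cases can also be exercised with a compact standalone sketch. Note that this is a different formulation from remove_node() above: it uses no parent pointers, and instead each recursive call returns the new root of the subtree it was given. All names (RmNode, insertNode(), smallest(), removeValue(), listInorder(), freeTree(), removeFrom()) are hypothetical, the values are plain ints, and C++11 is assumed. The two-children case still replaces the value with the smallest value of the right subtree, as described in the text.

```cpp
#include <cassert>
#include <vector>

// Hypothetical node without a parent pointer.
struct RmNode {
    int value;
    RmNode* left;
    RmNode* right;
};

// Standard BST insert; equal values go to the right.
RmNode* insertNode(RmNode* root, int v) {
    if (root == nullptr) return new RmNode{v, nullptr, nullptr};
    if (v < root->value) root->left = insertNode(root->left, v);
    else root->right = insertNode(root->right, v);
    return root;
}

// Smallest node of a subtree: go left as far as possible.
RmNode* smallest(RmNode* node) {
    assert(node != nullptr);
    while (node->left != nullptr) node = node->left;
    return node;
}

// Removes one node holding v (if any) and returns the subtree's new root.
RmNode* removeValue(RmNode* root, int v) {
    if (root == nullptr) return nullptr;
    if (v < root->value) { root->left = removeValue(root->left, v); return root; }
    if (v > root->value) { root->right = removeValue(root->right, v); return root; }
    if (root->left == nullptr || root->right == nullptr) {
        // Leaf or one child: the child (possibly nullptr) takes the node's place.
        RmNode* child = (root->left != nullptr) ? root->left : root->right;
        delete root;
        return child;
    }
    // Two children: copy the smallest value of the right subtree into this
    // node, then remove that value from the right subtree instead.
    RmNode* replacement = smallest(root->right);
    root->value = replacement->value;
    root->right = removeValue(root->right, replacement->value);
    return root;
}

void listInorder(const RmNode* node, std::vector<int>& out) {
    if (node == nullptr) return;
    listInorder(node->left, out);
    out.push_back(node->value);
    listInorder(node->right, out);
}

void freeTree(RmNode* node) {
    if (node == nullptr) return;
    freeTree(node->left);
    freeTree(node->right);
    delete node;
}

// Builds a BST from items, removes victim, and returns the in-order listing.
std::vector<int> removeFrom(const std::vector<int>& items, int victim) {
    RmNode* root = nullptr;
    for (int item : items) root = insertNode(root, item);
    root = removeValue(root, victim);
    std::vector<int> out;
    listInorder(root, out);
    freeTree(root);
    return out;
}
```

With the values 50, 30, 70, 20, 40, 60, 80, removing 50 (the root, two children) leaves 20, 30, 40, 60, 70, 80 in-order, with 60 as the new root value. The listing remains sorted after every removal, which indicates that the binary search tree property has been preserved.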

Heaps

Another type of special binary tree is called a heap. In order to understand what a heap is, we must first
define full and complete binary trees. In a full binary tree, every node is either a parent with two
children, or a leaf. In a complete binary tree, all the levels except the last must be completely filled.
The last level need not be filled to the end, but its nodes must be filled in from the left side, without
gaps between them.
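
The difference between the two definitions can be made concrete with a short sketch that tests a tree for each property. The names (ShapeNode, isFull(), isComplete(), Shape, classify()) are hypothetical, and C++11 is assumed. For convenience, classify() takes a level-order list of slots, where true means a node is present at that position and the children of slot i are slots 2i+1 and 2i+2. This input format is an assumption of the example only.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical node; only the shape matters here, so it stores no value.
struct ShapeNode {
    ShapeNode* left;
    ShapeNode* right;
};

// Full: every node has either zero or two children.
bool isFull(const ShapeNode* node) {
    if (node == nullptr) return true;
    if (node->left == nullptr && node->right == nullptr) return true;
    if (node->left != nullptr && node->right != nullptr)
        return isFull(node->left) && isFull(node->right);
    return false;
}

// Complete: visiting level by level, no node may appear after a gap.
bool isComplete(const ShapeNode* root) {
    std::queue<const ShapeNode*> pending;
    pending.push(root);
    bool seenGap = false;
    while (!pending.empty()) {
        const ShapeNode* node = pending.front();
        pending.pop();
        if (node == nullptr) { seenGap = true; continue; }
        if (seenGap) return false;   // a node to the right of a missing one
        pending.push(node->left);
        pending.push(node->right);
    }
    return true;
}

struct Shape {
    bool full;
    bool complete;
};

// Builds a tree from level-order slots and reports both properties.
Shape classify(const std::vector<bool>& slots) {
    assert(slots.empty() || slots[0]);   // a non-empty description needs a root
    std::vector<ShapeNode> nodes(slots.size(), ShapeNode{nullptr, nullptr});
    for (std::size_t i = 0; i < slots.size(); ++i) {
        if (!slots[i]) continue;
        std::size_t l = 2 * i + 1, r = 2 * i + 2;
        if (l < slots.size() && slots[l]) nodes[i].left = &nodes[l];
        if (r < slots.size() && slots[r]) nodes[i].right = &nodes[r];
    }
    const ShapeNode* root = slots.empty() ? nullptr : &nodes[0];
    return Shape{isFull(root), isComplete(root)};
}
```

A root with only a left child is complete but not full. A root with two children where only the right child has children of its own is full but not complete, since the last level is not filled in from the left.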

A heap is a complete binary tree that is partially ordered by either the max-heap or the min-heap
property. In a max-heap, the children of every node have a value less than or equal to that of the
node. In a min-heap, the children of every node have a value greater than or equal to that of the node.
With such a setup, the main root always has either the highest or lowest value in the tree. For
demonstration purposes, we will show how to implement a max-heap, as it is also an important part of the
HeapSort algorithm, which will be covered later. [It is easy to change the code to work as a min-heap by
reversing the relational operators between node values.] A max-heap is usually used for maintaining
priority queues. Priority queues store values and release the object with the highest "priority" (or
value) when needed. For instance, a value is associated with each task in a program, the tasks are put
into such a structure, and they are then executed in order of priority.
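As a rough illustration of the priority queue behavior described above, the sketch below uses the standard library's std::priority_queue (a max-heap by default) rather than the Heap class built in this lesson. The function name DrainByPriority and its parameters are assumptions made only for this example.

```cpp
#include <cassert>
#include <queue>

// Copies n priorities into out in the order a max-priority queue releases
// them (highest first). out must have room for n values.
void DrainByPriority(const int *priorities, int n, int *out)
{
    assert(n >= 0);
    std::priority_queue<int> tasks;    // std::priority_queue is a max-heap by default
    for(int i = 0; i < n; i++)
        tasks.push(priorities[i]);
    for(int i = 0; i < n; i++)
    {
        out[i] = tasks.top();          // highest remaining priority
        tasks.pop();
    }
}
```

Given the priorities 3, 10, 1, 7, the values are released as 10, 7, 3, 1.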

Since a heap must conform to the complete tree property, simple formulae can be developed to find
the logical position of a node's children and parent given the position of the node itself. It is therefore
very easy and efficient to implement a heap using an array, and this is done most of the time, even if
dynamic memory allocation is available.
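The formulae can be sketched as follows for an array whose root sits at index 0. The free functions LeftIndex, RightIndex, ParentIndex and IsMaxHeap are illustrative names, not part of the Heap class in this lesson; IsMaxHeap simply checks the max-heap property using the parent formula.

```cpp
#include <cassert>

// Index arithmetic for a heap stored in an array whose root is at index 0.
int LeftIndex(int pos)   { return (pos * 2) + 1; }
int RightIndex(int pos)  { return (pos * 2) + 2; }
int ParentIndex(int pos) { assert(pos > 0); return (pos - 1) / 2; }

// Returns true if the first n values of nums satisfy the max-heap property,
// that is, no element is larger than its parent.
bool IsMaxHeap(const int *nums, int n)
{
    for(int pos = 1; pos < n; pos++)
        if(nums[pos] > nums[ParentIndex(pos)])
            return false;
    return true;
}
```

For example, the array {9, 7, 8, 3, 5} is a max-heap, while {9, 7, 8, 10, 5} is not, because 10 at index 3 is larger than its parent 7 at index 1.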

In an array implementation, we must allocate a fixed amount of memory up front for the heap. If the
space is not used up, the remainder is wasted memory. At other times, we may need to add more nodes to
the heap than the allocated memory allows for. We therefore usually allocate more space than we think
may be required in order to ensure the heap is usable. If we have a tree of very large structures, this
space can be significant. This is the price we pay for greater efficiency.
It should be noted, though, that we no longer have three pointers for every tree node (left, right, parent),
which took up a lot of space in the dynamic memory implementation. The logical position of a node in a
heap corresponds to the index of the node's array position, thereby making it very easy to access any
node. A generic implementation is shown below.

//the maximum number of elements our heap can hold. This may be changed to any number so long as memory permits, depending on how the heap will be used.
const int MAX_SIZE = 100;

template <class ItemType>
class Heap
{
public:

Heap();
int left(int) const;
int right(int) const;
int parent(int) const;
void insert(const ItemType&);
ItemType remove_max();
bool IsEmpty() const;
bool IsFull() const;
int count() const;
ItemType value(int) const;
private:
ItemType array[MAX_SIZE];
int elements; //how many elements are in the heap
void ReHeap(int);
};

//default constructor - initialize private variables
template <class ItemType>
Heap<ItemType>::Heap()
{
elements = 0;
}

The left(), right(), and parent() functions return the index positions of a node's children and
parent. Since the index position of each element corresponds to its logical position in the heap, the
functions use simple formulae that are derived by observing the heap structure.

template <class ItemType>
int Heap<ItemType>::left(int root) const
{
assert((root * 2) + 1 < elements); //does a left child exist?
return (root * 2) + 1;
}

template <class ItemType>
int Heap<ItemType>::right(int root) const
{
assert(root < (elements-1)/2); //does a right child exist?
return (root * 2) + 2;
}

template <class ItemType>
int Heap<ItemType>::parent(int child) const
{
assert(child != 0); //main root has no parent
return (child - 1) / 2;
}

The insert() function accepts the new item value as its parameter. It works by inserting the new item
at the end of the heap, and swapping positions with the parent, if the parent has a smaller value than the
item. The new item continues to travel up the heap, swapping positions with each new parent, until it
reaches the root or its parent is no smaller than it.

template <class ItemType>
void Heap<ItemType>::insert(const ItemType &item)

{
assert(!IsFull());
array[elements] = item; //elements is the array position after the last item, since indexing starts with 0
int new_pos = elements; //index of the new item
elements++; //update the number of elements in the heap
while((new_pos != 0) && (array[new_pos] > array[parent(new_pos)])) //loop while the item has not become the main root, and while its value is greater than its parent's
{
swap(array[new_pos],array[parent(new_pos)]); //swap the item with its lesser parent
new_pos = parent(new_pos); //update the item's position
}
}

The remove_max() function removes the item with the highest priority and returns its value. The item is
swapped with the last item, and elements is decreased by one.

Notice that the item is not physically deleted; it remains in the array. It is no longer part of the heap,
since elements is updated and the heap goes only as far as index (elements - 1).

The new root may not have the largest priority, so the ReHeap() function is then used to move the
new root into its proper position, thus preserving the heap property.

template <class ItemType>
ItemType Heap<ItemType>::remove_max()
{
assert(!IsEmpty());
elements--; //update the amount of elements in heap
if(elements != 0) //if we didn't delete the root
{
swap(array[0],array[elements]);
ReHeap(0);
}
return array[elements];
}

The ReHeap() function checks if either of root's children is bigger than it, in which case the bigger
child is swapped with root. The process is then continued using recursion, on root's new children. The
function stops when root has no children or is at least as big as both of them.

template <class ItemType>
void Heap<ItemType>::ReHeap(int root)
{
int child = (root * 2) + 1; //index of the left child. left() is not called here, since it asserts that the child exists
if(child >= elements) //root has no children, so it is already in place
return;
if((child < (elements-1)) && (array[child] < array[child+1])) //if a right child exists, and it's bigger than the left child, it will be used
child++;
if(array[root] >= array[child]) //if root is at least as big as its largest child, stop.
return;
swap(array[root],array[child]); //swap root and its biggest child
ReHeap(child); //continue the process on root's new children

}
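The recursive process described above can also be written as a loop. The following is a minimal sketch on a plain int array, assuming a max-heap stored in nums[0..n-1]; the name SiftDown and its parameters are chosen for this example and are not part of the Heap class.

```cpp
#include <cassert>
#include <utility> // std::swap

// Iterative counterpart of a recursive sift-down on a max-heap held in
// nums[0..n-1]. Moves the value at index root down until neither child
// is larger than it.
void SiftDown(int *nums, int n, int root)
{
    assert(root >= 0);
    while(true)
    {
        int child = (root * 2) + 1;               // left child
        if(child >= n)                            // root is a leaf
            return;
        if((child + 1 < n) && (nums[child] < nums[child + 1]))
            child++;                              // right child is larger
        if(nums[root] >= nums[child])             // heap property holds here
            return;
        std::swap(nums[root], nums[child]);
        root = child;                             // continue from the child's position
    }
}
```

Calling SiftDown on {1, 9, 8, 3, 5} with root 0 moves the 1 down past 9 and then past 5, giving {9, 5, 8, 3, 1}.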

The rest of our member functions are easy to implement.

template <class ItemType>
int Heap<ItemType>::count() const
{
return elements;
}

template <class ItemType>
ItemType Heap<ItemType>::value(int pos) const
{
assert(pos < elements); //is pos a valid index in the heap
return array[pos];
}

template <class ItemType>
bool Heap<ItemType>::IsEmpty() const
{
return (elements == 0);
}

template <class ItemType>
bool Heap<ItemType>::IsFull() const
{
return (elements == MAX_SIZE);
}

Lists

A list is one of the most basic data structures in programming. It is a logically sequential order of
elements, any of which can be accessed without restriction. Any element in the list can be removed, and
its value be read or modified. Also, a new element may be inserted into any location in the list structure.
Each element points to the next one in the list, and the last does not reference any other item.

Physically, the elements of a list can be stored at various locations in memory, and the addresses of each
element are not correlated in any way. The list is linked since each element points to the location of the
next item.

The dynamic representation of a list is called a linked list. Each element in the list is called a node, and
contains two values. The first is the data value that is to be stored. For instance, in a list of names, this
would be a value such as "John". The second value in a node is a pointer to the next node in the list. A
common representation of a list node is as follows:

template <class ItemType>
struct ListNode
{
ItemType data;
ListNode<ItemType> *next;
};

Linked lists can be implemented in many ways, depending on how the programmer will use lists in their
program. We will show how to implement a generic class, which can be adapted and modified to use in
most situations. The member functions implemented will be those necessary to add, modify, or delete
nodes in a linked list. The class will be constructed in such a way that if the implementation were to be
changed, the class definition would remain the same, and therefore any program that uses the class will
not need to be altered. This is usually good practice in the design of any class. A definition of the linked
list class is displayed below.

template <class ItemType>
class List
{
public:
List(); //constructor - initialize private variables
~List(); //destructor - free used memory
void insert(const ItemType); //insert new node at current location
void remove(); //remove the current node ("delete" is a C++ keyword and cannot be used as a function name)
void next(); //set current to the next node in the list
void prev(); //set current to the previous node in the list
void reset(); //set current to the first node in the list
void clear(); //remove all nodes in the list

int length() const; //return the amount of nodes in the list
bool IsEmpty() const; //returns true if the list doesn't have any nodes
bool IsFull() const; //returns true if there is no system memory for additional nodes
ItemType value() const; //returns the value of the current node
private:
ListNode<ItemType> *list; //points to the list header
ListNode<ItemType> *prevcurrent;
int len;
};

len will contain the total number of nodes in the list, and is self explanatory. prevcurrent and list are
implemented in a special way, and require further explanation. At first, it would appear logical to have a
pointer directly to the node being referenced. However, this would make the prev() and insert()
functions, as well as the function that removes the current node, time consuming. These three functions
require access to the node that precedes the current node being referenced. If there were only a
pointer directly to the current node, the only solution would be to search through the list (the entire
list, in the worst case) for the preceding node. A temporary pointer would be created that points to the
list's first node, and be used to traverse the list until temp->next points to the current node.

A more efficient approach is to have prevcurrent store the pointer to the node that precedes the one
being referenced. Therefore, the time needed to otherwise find this node is not wasted.

This approach raises a new concern - whenever the node being referenced is the first in the list (for
instance, if the list has only one item), a special case will need to be
introduced every place prevcurrent is used, since there is no preceding node to point to. The most
efficient way of solving this problem is by using a header node. A header node is a "dummy" node that
acts as the first node of the list, but is not logically in the list. It is used only in the implementation of the
linked list class, and the programmer who uses the class does not need to know about header nodes. It is
created, manipulated, and deleted by the member functions. list will point to the header node. Now let's
look at the implementation of the linked list class.
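To see why a header node removes the special cases, consider the following minimal sketch. It uses a bare Node struct holding int values instead of the List class, and the helper names InsertAfter, RemoveAfter and CountNodes are assumptions made for this illustration only.

```cpp
#include <cassert>
#include <cstddef> // NULL

struct Node
{
    int data;
    Node *next;
};

// Insert a new node directly after prev. Because the header node always
// exists, inserting at the front of the list needs no special case.
void InsertAfter(Node *prev, int value)
{
    assert(prev != NULL);
    Node *fresh = new Node;
    fresh->data = value;
    fresh->next = prev->next;
    prev->next = fresh;
}

// Remove the node that follows prev, if there is one.
void RemoveAfter(Node *prev)
{
    assert(prev != NULL);
    Node *doomed = prev->next;
    if(doomed == NULL)
        return;
    prev->next = doomed->next;
    delete doomed;
}

// Count the real nodes; the header itself is skipped.
int CountNodes(const Node *header)
{
    int total = 0;
    for(const Node *p = header->next; p != NULL; p = p->next)
        total++;
    return total;
}
```

With a header node, the first real node is handled exactly like any other: passing the header to InsertAfter() or RemoveAfter() inserts or removes at the front of the list.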

The class constructor will create the header node, and set prevcurrent to point to its location. It will
also set len to a value of zero.

template <class ItemType>
List<ItemType>::List()
{
list = new ListNode<ItemType>; //create the header node in memory
list->next = NULL; //the header is the only node in the list
prevcurrent = list; //set current to the header, since there are no nodes
len = 0;
}

The destructor will delete all nodes of the list, freeing up the memory they occupied. Since the clear()
function does this operation, it can be called in the destructor. In addition, the header node will also be
deleted in the destructor, as the clear() function only removes actual list nodes. In clear(), two local
variables of type ListNode<ItemType>* are used for the operation. traverse is set to the first node in the
list, and used to visit every node. tmp will be used to point to the node to be deleted. Since we cannot
move on to the next node if the current node is deleted (the next pointer will no longer exist), traverse
will be set to the next node first, after which tmp will be deleted.

template <class ItemType>
void List<ItemType>::clear()
{
ListNode<ItemType> *tmp; //point to the node to be deleted
ListNode<ItemType> *traverse = list->next; //used to visit each node in the list. The header node is not deleted, so we start with the first actual node.
while(traverse != NULL) //while the list is not empty
{
tmp = traverse; //store the current node.
traverse = traverse->next; //visit the next node
delete tmp; //free the memory taken up by the current node
}
prevcurrent = list; //set current to the header node
len = 0;
}

template <class ItemType>
List<ItemType>::~List()
{
clear(); //delete all list nodes
delete list; //delete the header "dummy" node
}

The insert() function will create a new node preceding the one being referenced, and move the
reference to it. Remember, prevcurrent->next is the node currently being referenced, since prevcurrent
points to the preceding node.

template <class ItemType>
void List<ItemType>::insert(const ItemType item)
{
assert(!IsFull()); //abort if there is no memory to create a new node
ListNode<ItemType> *NewNode = new ListNode<ItemType>; //create a new node in memory
NewNode->data = item; //set the node's value
NewNode->next = prevcurrent->next; //referenced node will follow new node in order
prevcurrent->next = NewNode; //The node that preceded the old node now precedes the new one. The new node is now referenced.
len++; //increment length
}

The remove() function (it cannot be named delete(), since delete is a reserved word in C++) sets the
previous node to point to the node following the one being referenced, thus removing it from the logical
list. It is then deleted from memory.

template <class ItemType>
void List<ItemType>::remove()
{
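if(prevcurrent->next != NULL) //there must be a node being referenced. The header node is never deleted
{
ListNode<ItemType> *tmp = prevcurrent->next; //the node to be removed
prevcurrent->next = tmp->next; //logically remove it from the list
delete tmp; //free up memory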

len--; //decrement length
}
}

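The next() function is very short. It moves the reference forward by one node, unless the end of the
list has already been passed.

template <class ItemType>
void List<ItemType>::next()
{
if(prevcurrent->next != NULL) //do not move past the end of the list
prevcurrent = prevcurrent->next;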
}

The prev() function visits each node until the one that points to prevcurrent is found, and then sets
prevcurrent to this node. The node being referenced now becomes the node prevcurrent pointed to
before the function was executed.

template <class ItemType>
void List<ItemType>::prev()
{
if(prevcurrent != list) //run only if there is a node before the current one
{
ListNode<ItemType> *tmp = list;
while(tmp->next != prevcurrent)
tmp = tmp->next;
prevcurrent = tmp;
}
}

The reset() function sets the item in reference to the first by setting prevcurrent to the header node.

template <class ItemType>
void List<ItemType>::reset()
{
prevcurrent = list;
}

Next, the length() function simply returns the value of the private data member len. A function
would not be necessary to perform this operation if we were to make the length a public data member, in
which case the programmer could read it directly. However, a member function is used to retrieve this
value for two reasons. First, if the length data member were public, the programmer could also change
the length of the list in the program without modifying the number of nodes in the list. Second, if we
were to change the implementation of the class, and the length was no longer controlled by a single
variable, the programmer would not have to modify their program in order for it to work.

template <class ItemType>
int List<ItemType>::length() const
{
return len;
}

The IsEmpty() function works by checking if the length of the list is zero.

template <class ItemType>
bool List<ItemType>::IsEmpty() const
{
return (len == 0);
}

The IsFull() function checks to see if there is enough system memory to create a new node. This is
done by attempting to create a new node, and checking if the resulting pointer is NULL. A plain new
expression reports failure by throwing std::bad_alloc rather than returning NULL, so the nothrow form
of new (declared in the <new> header) is used here. If the new
operation was unsuccessful in assigning the appropriate memory space, NULL is the return value. If the
value is not NULL, the memory space is freed, and false is returned. Otherwise, the function returns
true, meaning no more items can be added to the list.

template <class ItemType>
bool List<ItemType>::IsFull() const
{
ListNode<ItemType> *tmp = new (std::nothrow) ListNode<ItemType>; //requires #include <new>
if(tmp == NULL)
return true;
else
{
delete tmp;
return false;
}
}

Finally, the value() function returns the value of the node that is currently being referenced.

template <class ItemType>
ItemType List<ItemType>::value() const
{
return prevcurrent->next->data;
}

Other Types of Lists

Like header nodes, lists may also have a trailer node, which is a dummy node at the end of the list.
Trailer nodes are maintained in a similar fashion to header nodes; that is, they are not part of the
logical list structure. Trailer nodes are used to eliminate any special cases that may arise when
inserting a new node at the end of the list.

Other types of standard lists exist as well. For instance, in a circular list, the last node points to the first
node instead of a NULL value. This would require minor changes in the class implementation. Nodes
inserted at the end must now point to the first node, and a search for the last node must look for a
node which points to the first instead of NULL.
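The changed end-of-list test can be sketched as follows. The CNode struct and the CircularLength function are hypothetical names for this example; the sketch assumes a circular list without a header node.

```cpp
#include <cassert>
#include <cstddef> // NULL

struct CNode
{
    int data;
    CNode *next;
};

// Count the nodes of a circular list. Traversal stops when it returns to the
// starting node instead of when it reaches NULL.
int CircularLength(const CNode *first)
{
    if(first == NULL) // empty list
        return 0;
    int total = 1;
    for(const CNode *p = first->next; p != first; p = p->next)
        total++;
    return total;
}
```

A single node whose next pointer refers to itself is a valid circular list of length one.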

Finally, a doubly linked list maintains a prev pointer in ListNode, which points to the previous node.
This makes functions such as insert() and remove(), as well as the general implementation, very
simple. We no longer need to maintain a pointer to the node preceding the one being referenced. Since
the previous node is required by both functions, we need only check prev for it. Other modifications
needed in order to create a doubly linked list include updating each new node's prev pointer when it is
created, along with the prev pointer of the node that follows it. The front node has a prev value of
NULL and requires a special condition, unless a header node or a circular doubly linked list is used.

Queues

A queue is another special type of list structure. Elements can only be inserted at the back of a queue,
and only the front element can be accessed and modified. The structure of a queue is the same as that of
a line of people. A person who wishes to stand in line must go to the back, and the person at the front of
the line is served. Thus, a queue is a FIFO structure, "First In, First Out". A generic queue is very simple
to implement; a class definition for a linked queue is shown below.

template <class ItemType>
struct QueNode
{
ItemType data;
QueNode<ItemType> *next;
};

template <class ItemType>
class Queue
{
public:
Queue(); //class constructor - initialize variables
~Queue(); //class destructor - return memory used by queue elements
void enqueue(const ItemType); //add an item to the back of the queue
ItemType dequeue(); //remove the first item from the queue and return its value
ItemType first() const; //return the value of the first item in the queue without modification to the structure
bool IsEmpty() const; //returns true if there are no elements in the queue
bool IsFull() const; //returns true if there is no system memory for a new queue node
int length() const; //returns the amount of elements in the queue
private:
QueNode<ItemType> *front;
QueNode<ItemType> *back;
int len;
};

The front pointer will reference the first node in the queue, and the back pointer will reference the last
node in the queue. It is possible to maintain only a front pointer. The last node in the list points to a
NULL value, and can easily be found. However, such a design would be inefficient since finding the last
node every time its location is needed is very time consuming. Therefore, we maintain a reference to it
in our class implementation.

The class constructor initializes the private data members.

template <class ItemType>

Queue<ItemType>::Queue()
{
front = NULL;
back = NULL;
len = 0;
}

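The destructor deletes all nodes, freeing up the memory they used. Since the class definition does not
declare a clear() function, the destructor simply calls dequeue() until the queue is empty.

template <class ItemType>
Queue<ItemType>::~Queue()
{
while(!IsEmpty()) //remove nodes until none are left
dequeue();
}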

The enqueue() function adds a new item to the back of the queue. The algorithm differs slightly,
depending on whether the queue is empty or not. If the queue is not empty, the last node is set to point to
the newly created node, and the back pointer is set to reference the new node. The value of the new node
is item, and its next pointer has a value of NULL.

If the queue is empty, a similar procedure is used. However, the front pointer is also set to reference the
newly created node. Since there is only one node in the queue, the front and back are one and the same.

template <class ItemType>
void Queue<ItemType>::enqueue(const ItemType item)
{
assert(!IsFull()); //abort if there is no more memory for a new node
if(len != 0) //if the queue is not empty
{
back->next = new QueNode<ItemType>; //create a new node
back = back->next; //set the new node as the back node
back->data = item;
back->next = NULL;
}
else
{
back = new QueNode<ItemType>; //create a new node
back->data = item;
back->next = NULL;
front = back; //set front to reference the new node. Since it is the only node in the queue, it is considered to be both the back and front.
}
len++; //increment the amount of elements in the queue
}

The dequeue() function removes the node at the front of the queue and returns its value. The value of
the front node is stored in item. A temporary local pointer is then created to reference the front node,
and the front pointer is set to the next element in the queue. The front node is deleted using tmp as a
reference. The function then checks if the queue is empty by evaluating front. If it has a value of NULL,
the queue is empty, and the back pointer must also be set to NULL since it maintains the address of the
now deleted node.

template <class ItemType>
ItemType Queue<ItemType>::dequeue()

{
assert(!IsEmpty()); //abort if the queue is empty, no node to dequeue
ItemType item = front->data; //store the value of the first node, to be returned at the end
QueNode<ItemType> *tmp = front; //temporary pointer to the first node.
front = front->next; //set the second node in the queue as the new front
delete tmp; //delete the original first node
if(front == NULL) //if the queue is empty, update the back pointer
back = NULL;
len--; //decrement the amount of nodes in the queue
return item; //return the value of the original first element
}

The first() function returns the value of the front node without modifying the queue.

template <class ItemType>
ItemType Queue<ItemType>::first() const
{
assert(!IsEmpty()); //abort if the queue is empty
return front->data;
}

The IsEmpty() function checks to see if there are any nodes in the queue by evaluating the front
pointer. If the queue is empty, front has a value of NULL.

template <class ItemType>
bool Queue<ItemType>::IsEmpty() const
{
return (front == NULL);
}

The IsFull() function works exactly the same way as it did with other data structures. A node is
"created" using the nothrow form of the new command (declared in the <new> header), which is then
checked. A plain new would throw std::bad_alloc on failure instead of returning NULL. If the new
command failed to set aside the necessary memory, its value is NULL, in which case the function returns
true. If the new node is created successfully, it is deleted and the function returns false.

template <class ItemType>
bool Queue<ItemType>::IsFull() const
{
QueNode<ItemType> *tmp = new (std::nothrow) QueNode<ItemType>; //requires #include <new>
if(tmp == NULL)
return true;
else
{
delete tmp;
return false;
}
}

The length() function returns the value of len, which maintains the amount of nodes in the queue.
Again, a function is used to retrieve this value instead of making it a public data member to avoid error
and make our class abstract.

template <class ItemType>
int Queue<ItemType>::length() const
{
return len;
}
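For comparison, the standard library's std::queue provides the same FIFO behavior: push() corresponds to enqueue(), front() to first(), and pop() to the removal step of dequeue(). The sketch below is only an illustration of that correspondence, and the function name CountFifoMatches is an assumption for this example.

```cpp
#include <cassert>
#include <queue>

// Enqueue the values 1..n, then dequeue them all and count how many come
// out in the same order they went in. For a FIFO structure this is all n.
int CountFifoMatches(int n)
{
    assert(n >= 0);
    std::queue<int> line;
    for(int i = 1; i <= n; i++)
        line.push(i);                  // add to the back of the queue
    int matches = 0;
    for(int expected = 1; !line.empty(); expected++)
    {
        if(line.front() == expected)   // read the front without removing it
            matches++;
        line.pop();                    // remove the front element
    }
    return matches;
}
```

Note that std::queue splits reading (front()) and removing (pop()) into two calls, whereas the dequeue() function in this lesson does both.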

Searching

Searching refers to finding the location of an element with a specific value within a collection of
elements. The simplest search algorithm is the sequential search, which visits every element in the
array (in order) and compares its value with the one being looked for. The function below demonstrates
a sequential search. The function accepts a pointer to an array of integers, the number of integers in the
array, and a number to be looked for in the array. A boolean value is returned specifying whether that
number exists in the array.

bool InArray(int *array, int size, int num)
{
for(int j = 0; j < size; j++)
if(array[j] == num)
return true;
return false;
}

The algorithm is very simple to implement, but also very inefficient. In the average and worst case, it
takes O(N) time to find if the item exists. In very large arrays, this is a very slow operation.

A more efficient search can be performed if the array is already sorted. Note that sorting an array first
and then using one of the more efficient search algorithms may cost more than it saves, unless the array
is searched many times. It is recommended that such algorithms be used only if the array is sorted to
begin with. The binary search algorithm is one of the simplest searching algorithms on a sorted array.
For demonstration purposes, we will assume the array is sorted in ascending order. However, the
algorithm can easily be modified to work with a descending list by changing the relational operators.
The binary search algorithm compares the number being looked for to the value of the middle element in
the array. Depending on whether it is less or greater, the same process is then done on the left or right
part of the array respectively. One possible implementation of a binary search is shown below.

bool InArray(int *array, int left, int right, int num) //left and right are the first and last index values of the part of the array being searched. Both are needed as parameters since the function is recursive.
{
if(left > right) //all possibilities have been searched. num is not in the array
return false;
int mid = left + (right - left) / 2; //index of the middle element
if(num == array[mid]) //if a match is found, return true
return true;
if(num < array[mid])
return InArray(array,left,mid - 1,num); //perform the same operation. New right position is the element before the middle
return InArray(array,mid + 1,right,num); //perform the same operation. New left position is the element directly after the middle.
}
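The same halving process can be written without recursion. The sketch below is one possible iterative version; unlike InArray, it returns the index of the match (or -1), and the name FindIndex is an assumption for this example.

```cpp
#include <cassert>

// Iterative binary search on an ascending array of size elements.
// Returns the index of num, or -1 if num is not present.
int FindIndex(const int *array, int size, int num)
{
    assert(size >= 0);
    int left = 0;
    int right = size - 1;
    while(left <= right)
    {
        int mid = left + (right - left) / 2; // avoids overflow of left + right
        if(array[mid] == num)
            return mid;
        if(num < array[mid])
            right = mid - 1;                 // continue in the left part
        else
            left = mid + 1;                  // continue in the right part
    }
    return -1;
}
```

Each pass through the loop halves the range still to be searched, so the search takes O(log N) comparisons.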

Hashing is a somewhat different approach to searching, as it attempts to make searching an O(1)
operation. A hash function is applied, which uses some type of algorithm or formula to determine what
index of an array (also called the hash table) the object should be in. The algorithm or formula depends
on the type of data in each situation.

Most of the time, at least several collisions occur in the hash table. A collision occurs when the hash
function returns an index that already holds a value. This could mean the value is a duplicate, or that
the hash function maps more than one value to the same index, which is often the case. There are two
approaches to resolving a collision.

The first is known as open hashing. In this method, collision values are stored "outside" of the array. An
example of open hashing is having each array index point to another array, or linked list. Thus, all
values that hash to that location are stored in the list. The items in the list can be sorted by their value,
access frequency, or the order in which they were put into the table.
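A minimal sketch of open hashing is shown below, assuming non-negative integer keys, a table of 7 entries, and the remainder operator as the hash function. ChainedTable, Hash and TABLE_SIZE are hypothetical names for this example, and std::list stands in for the linked list at each index.

```cpp
#include <cassert>
#include <list>

const int TABLE_SIZE = 7; // assumed table size, chosen only for illustration

// Hypothetical hash function for non-negative integers.
int Hash(int key)
{
    assert(key >= 0);
    return key % TABLE_SIZE;
}

class ChainedTable
{
public:
    // Add key to the list at its hashed index, in order of arrival.
    void insert(int key)
    {
        if(!contains(key))                      // ignore duplicates
            buckets[Hash(key)].push_back(key);
    }
    // Only the one list that key hashes to has to be searched.
    bool contains(int key) const
    {
        const std::list<int> &chain = buckets[Hash(key)];
        for(std::list<int>::const_iterator it = chain.begin(); it != chain.end(); ++it)
            if(*it == key)
                return true;
        return false;
    }
private:
    std::list<int> buckets[TABLE_SIZE];         // one linked list per table index
};
```

The keys 3, 10 and 17 all hash to index 3 here, so they end up in the same list, stored in the order they were inserted.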

Closed hashing is another collision resolution technique. In this approach, the collision values are stored
at a different location in the hash table. The hash function takes care of this, as the resolution is
dependent on data and programming task. Whatever algorithm the hash function uses to resolve a
collision must then be used when searching to find the item required.
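One common form of closed hashing is linear probing, where a colliding value is placed in the next free slot. The sketch below assumes non-negative integer keys, a table of 7 slots, and -1 as the marker for an empty slot; ProbingTable, SLOTS and EMPTY are names made up for this example. Removal is left out, since it needs extra bookkeeping to keep probe sequences intact.

```cpp
#include <cassert>

const int SLOTS = 7;   // assumed table size, chosen only for illustration
const int EMPTY = -1;  // marker for an unused slot; keys must be non-negative

class ProbingTable
{
public:
    ProbingTable()
    {
        for(int i = 0; i < SLOTS; i++)
            slots[i] = EMPTY;
    }
    // Returns false if the table is full and key could not be stored.
    bool insert(int key)
    {
        assert(key >= 0);
        for(int step = 0; step < SLOTS; step++)
        {
            int pos = (key % SLOTS + step) % SLOTS; // on a collision, try the next slot
            if(slots[pos] == EMPTY || slots[pos] == key)
            {
                slots[pos] = key;
                return true;
            }
        }
        return false;
    }
    // Searching follows the same probe sequence that insert() used.
    bool contains(int key) const
    {
        assert(key >= 0);
        for(int step = 0; step < SLOTS; step++)
        {
            int pos = (key % SLOTS + step) % SLOTS;
            if(slots[pos] == key)
                return true;
            if(slots[pos] == EMPTY)                 // an empty slot ends the probe sequence
                return false;
        }
        return false;
    }
private:
    int slots[SLOTS];
};
```

Inserting 3, 10 and 17 places them in slots 3, 4 and 5, since all three hash to index 3 and each collision moves on to the next slot.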

Sorting

Finding better algorithms to sort a given set of data is an ongoing problem in the field of computer
science. Sorting is placing a given set of data in a particular order. Simple sorts place data in ascending
or descending order. For discussion purposes, we will look at sorting data in ascending order. However,
you may modify the code to sort the data in descending order by reversing the relational operators (i.e.
change 'nums[j] < nums[j-1]' to 'nums[j] > nums[j-1]').

In this lesson we will analyze sorts of different efficiency, and discuss when and where they can be used.
In order to simplify the explanation of certain algorithms, we will assume a swap() function exists that
switches the values of two variables. An example of such a function for int variables is displayed
below.

//reference parameters refer directly to the storage location of the variables passed. Local copies are not made, and the new values remain after the function ends. See 'Functions' in the preliminary lesson for further information.
void swap(int &item1,int &item2)
{
int tmp;
tmp = item1;
item1 = item2;
item2 = tmp;
}

We will first analyze sorts that are O(N^2). These sorts are very easy to understand, however they are
very slow when there are a lot of elements to be sorted.

The first sort we will look at is called the insertion sort. The algorithm processes each element in turn,
and compares it to the elements before it. The first element has no elements before it for comparison, so
it is left alone. In the next iteration, the second element is evaluated. It is compared to the element
directly before it, which is the first element in the structure. If the second element has a value less than
the first, their positions are switched. If the second element is not less than the first, then they are left
as they are, and the third element is processed. The third element is then compared to the second element
in the new list ('new' list here since the first two items may have been swapped). If it is less than the
second, then they are swapped, and it is then compared to the first element. If it is not less than the
second element, it is left in place and the process continues to the next element.

In short, each element is moved to the front of the list by switching positions with the previous elements
as long as it is smaller than the elements before it.

The algorithm is programmed using two nested for loops. The first loop creates n-1 iterations, where n
is the number of elements in the list. Since element[0] does not have any elements before it to compare
to, we start with the second element. The nested loop statement starts at the element that is being
processed by the first loop, and works backwards, comparing the element to the one before it. If it is
smaller, a swap is made and the loop continues. If it is larger, the loop ends, and the next iteration begins
in the outer loop. The code for the insertion sort algorithm is shown below. A standard array of int
variables is used for simplicity. However, you can modify the code to work for any linear structure.

void InsertionSort(int *nums, int n) //array called nums with n elements to be sorted
{
    for(int i=1; i<n; i++)
        for(int j=i; (j>0) && (nums[j]<nums[j-1]); j--)
            swap(nums[j],nums[j-1]);
}
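
To see why the insertion sort is fast on sorted input and slow on reversed input, it can help to count the swaps. The following is a minimal sketch, assuming a std::vector instead of a raw array; the function name InsertionSortCountingSwaps is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: the same insertion sort, run on a std::vector,
// returning how many swaps were needed to sort the list.
int InsertionSortCountingSwaps(std::vector<int> &nums)
{
    int swaps = 0;
    for (std::size_t i = 1; i < nums.size(); i++)
        for (std::size_t j = i; j > 0 && nums[j] < nums[j - 1]; j--)
        {
            std::swap(nums[j], nums[j - 1]); // move the element one step forward
            swaps++;
        }
    return swaps;
}
```

An already sorted list needs no swaps at all, while a list of n elements in reverse order needs n(n-1)/2 of them, which is where the O(N^2) behavior comes from.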

The next O(N^2) algorithm that we will analyze is the bubble sort. The bubble sort works from the
bottom-up (back to front), and compares each element to the one before it. If the element on the bottom
has a smaller value than the top, the two are swapped; if not, they remain in their original position. The
algorithm compares the next two elements from the bottom-up, no matter what the outcome of the
previous comparison was. In this fashion, the smallest value "bubbles up" to the top in each iteration. In
subsequent comparisons, the values that were bubbled up in previous iterations are no longer compared,
since they are in place.

The code for the bubble sort is shown below, using a standard array of int variables. Two nested for
loops are used. The first loop has n-1 iterations, one fewer than the number of elements. Each iteration, at least one
element is set into its proper sorted position. The inner for loop runs from the bottom-up, comparing
adjacent values, and stops at the group of values that have already been set in place, this position being
one more each iteration of the outer loop.

void BubbleSort(int *nums, int n)
{
    for (int i=0; i<n-1; i++)
        for (int j=n-1; j>i; j--)
            if(nums[j] < nums[j-1])
                swap(nums[j],nums[j-1]);
}
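
The "bubbling" is easier to follow when the list is recorded after every pass of the outer loop. The sketch below does this under a few assumptions: it works on a copy held in a std::vector, and the function name BubbleSortPasses is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: bubble sort that returns a snapshot of the list
// after each pass of the outer loop.
std::vector<std::vector<int> > BubbleSortPasses(std::vector<int> nums)
{
    std::vector<std::vector<int> > passes;
    const std::size_t n = nums.size();
    for (std::size_t i = 0; i + 1 < n; i++)
    {
        for (std::size_t j = n - 1; j > i; j--) // work from the back to the front
            if (nums[j] < nums[j - 1])
                std::swap(nums[j], nums[j - 1]);
        passes.push_back(nums);                 // position i now holds its final value
    }
    return passes;
}
```

For the list {4,3,1,2} the snapshots are {1,4,3,2}, {1,2,4,3} and {1,2,3,4}: each pass carries the smallest remaining value to the front of the unsorted part.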

The selection sort will be the final O(N^2) sorting algorithm that we will look at. In a selection sort, the
entire list is searched to find the smallest value. That is, we compare every element in the structure and
find the smallest value, and then swap it with the first item. Then, every element but the first is searched
to find the smallest value out of that group, and it is then swapped with the item in the second position.
This continues until all items are in the correct order.

This technique is similar to what would be done if a person were sorting a list of items by hand. The list
is searched for the smallest value, which is then crossed out and written as the first item in a new list.
The computer algorithm is the same; however, in order to conserve memory and avoid making two
lists, we use a swap operation.

The selection sort uses two nested for loops, the outer having n-1 iterations. When there is only one
item left, it will appear in its correct position, last in the structure. The inner for loop searches the
unsorted portion of the structure (from bottom-top) by assuming the first element in the unsorted section is
the smallest, and then comparing it to each element in turn. If a smaller element is found, it is considered
to be the smallest, and compared to the rest of the elements. The code for selection sort is shown below.

void SelectionSort(int *nums, int n)
{
    int low; //holds the index of the smallest element in the unsorted portion
    for (int i=0; i<n-1; i++)
    {
        //assume the first item in the unsorted section is the lowest, unless a
        //smaller value is found
        low = i;
        for (int j=n-1; j>i; j--)
        {
            if (nums[j] < nums[low]) //if element has smaller value than low
                low = j; //it then becomes the new low
        }
        //switch the current position item with the smallest in the unsorted portion
        swap(nums[i],nums[low]);
    }
}
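
One property worth noticing is that the selection sort makes only one swap per pass of the outer loop, however disordered the list is, while the number of comparisons stays the same. The sketch below counts both; the struct SortCounts and the function SelectionSortCounting are hypothetical names, and a std::vector is assumed in place of the raw array.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical record of the work done by one sort.
struct SortCounts
{
    int comparisons;
    int swaps;
};

// Hypothetical helper: the selection sort, counting comparisons and swaps.
SortCounts SelectionSortCounting(std::vector<int> &nums)
{
    SortCounts counts = {0, 0};
    const int n = static_cast<int>(nums.size());
    for (int i = 0; i < n - 1; i++)
    {
        int low = i;                       // index of the smallest value seen so far
        for (int j = n - 1; j > i; j--)
        {
            counts.comparisons++;
            if (nums[j] < nums[low])
                low = j;
        }
        std::swap(nums[i], nums[low]);     // one swap per pass, even if i == low
        counts.swaps++;
    }
    return counts;
}
```

For five elements the function reports 10 comparisons and 4 swaps whether the input is sorted or reversed, so the O(N^2) cost comes from the comparisons rather than from moving data.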

The algorithms that follow are more efficient on large lists; most of them run in O(N log N) time, at
least in the average case.

The next sorting algorithm that will be covered is the Shell sort, named after its creator D.L. Shell. It is
the first algorithm that we will look at that swaps non-adjacent elements, and takes a "divide and
conquer" approach. The list is divided into many sublists, which are sorted, and are then merged
together. The shell sort takes advantage of the insertion sort, which is very efficient in a best case
scenario (that is, the list being sorted is already 'near sorted').

The shell sort divides the list into n/2 sublists, each being n/2 apart. For instance, in a list of ten
elements, the first iteration would consist of five lists, two elements each. The first list would be list[0]
and list[5], the second would be list[1] and list[6], and so on. Each of these lists is then sorted using an
insertion sort. During the next iteration, we divide the list into fewer, bigger sublists, with the elements being
closer together. In the second iteration, there are n/4 lists, each element being n/4 apart. These lists are
then sorted using insertion sort, and so on. The process continues with half as many lists each
iteration as in the one before, each about twice as long. Each iteration, the list becomes closer to being sorted. The last sort is done
on the entire list, using a standard insertion sort. Since the list should be 'near sorted', the algorithm is
very efficient.

Note, during some iterations, sublists will contain unequal numbers of elements, since the number of
sublists does not evenly divide the total number of elements. Remember, in integer division, the
decimal in the answer is dropped, e.g. 5/2 = 2. Therefore, if we have seven elements, there are three lists
during the first iteration. One contains three elements, and the other two each contain two elements.

{6,3,1,5,2,4,9} -> {6,5,9} {3,2} {1,4}

In the example, list one starts at list[0] (6) and includes every third item (6, 5 and 9). The next list starts at list[1]
(3) and also includes every third item from its position (3 and 2). The last list starts at list[2] (1) and
holds 1 and 4; since there is no item three locations after the '4', the list has only two items.
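
The split for the seven-element example can be reproduced with a few lines of code. This is only a sketch to make the sublists visible; the function name ShellSublists is hypothetical, and the real algorithm never copies the sublists out like this, it sorts them in place.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns the sublists that one Shell sort iteration
// would work on, for elements that are incr positions apart.
std::vector<std::vector<int> > ShellSublists(const std::vector<int> &list, std::size_t incr)
{
    assert(incr > 0);
    std::vector<std::vector<int> > sublists;
    for (std::size_t start = 0; start < incr && start < list.size(); start++)
    {
        std::vector<int> sub;
        for (std::size_t k = start; k < list.size(); k += incr)
            sub.push_back(list[k]);        // every incr-th item, beginning at start
        sublists.push_back(sub);
    }
    return sublists;
}
```

With the list {6,3,1,5,2,4,9} and an increment of 7/2 = 3, the function returns three sublists: one with three items and two with two items each.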

The code for shell sort is very simple and straightforward. A slightly modified version of the insertion
sort is used however. Since the elements of the sublists to be sorted are not adjacent, an increment
parameter is added, which specifies how far apart the elements of the sublists are. The value '1' is then
replaced by the increment in the original insertion sort code, since it now compares values that are
increment apart.

//array called nums with n elements to be sorted, incr apart
void InSortShell(int *nums, int n, int incr)
{
    for(int i=incr; i<n; i+=incr)
        for(int j=i; (j>=incr) && (nums[j] < nums[j-incr]); j-=incr)
            swap(nums[j],nums[j-incr]);
}

void ShellSort(int *nums, int n)
{
    //each iteration, divide the distance between the elements by 2. This gives
    //half as many sublists, each about twice as long.
    for(int i=n/2; i>2; i/=2)
        for(int j=0; j<i; j++) //sort each sublist
            //the first element of each sublist begins at j, and therefore the
            //entire list is j items shorter (n-j). The elements of the sublist
            //are i apart.
            InSortShell(&nums[j],n-j,i);
    InSortShell(nums,n,1); //do a standard insertion sort on the now nearly sorted list
}

QuickSort is generally the quickest of these algorithms in the average case; however, it has a very bad
running time, O(N^2), in a worst case scenario (when the pivots chosen repeatedly split the list very
unevenly, e.g. when the pivot is always the smallest or largest value in its sublist).
The QuickSort takes a "divide and conquer" approach. A value called a pivot is first selected;
usually it is the value of the middle element. A "partition" of the array is then performed. Any elements
that are less than the pivot value will be moved to the beginning of the list, followed by the pivot
element, and then all values that are bigger will appear at the end. The elements in each 'sublist' do not
need to be sorted in any way with respect to each other, but the division around the pivot must be maintained. The
QuickSort algorithm is then used on each sublist, through recursion. This continues until the structure
has been sorted.
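
The idea of a partition can be tried out on its own before looking at a hand-written version. The sketch below leans on std::partition from the standard <algorithm> header rather than on this lesson's code, and it assumes a C++11 compiler for the lambda; the function name PartitionAround is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: moves every value less than pivot to the front of nums
// and returns the index at which the "not less than pivot" group begins.
std::size_t PartitionAround(std::vector<int> &nums, int pivot)
{
    std::vector<int>::iterator split =
        std::partition(nums.begin(), nums.end(),
                       [pivot](int value) { return value < pivot; });
    return static_cast<std::size_t>(split - nums.begin());
}
```

For {7,2,9,4,1,8} and a pivot value of 5, the returned index is 3: the first three positions hold 2, 4 and 1 in some order, and the rest hold 7, 9 and 8 in some order. Neither group is sorted internally, which is exactly the situation QuickSort then handles by recursion.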

Let's take a look at how the partitioning algorithm is implemented. The algorithm starts at each end of
the sublist being analyzed. It then moves inward from each end. First, it uses a while loop to find the
first value from the left that is greater than the pivot. Then, it uses another while loop to find the first
value from the right that is less than the pivot. It swaps these two values, and continues the process until
the left position value and right position value meet somewhere in the center. When this algorithm is
finished, every element at the left position and after is greater than the pivot, and the elements before it
are less than the pivot. The code for the partition algorithm is shown below.

int part(int *nums,int left,int right,int pivot)
{
    int first = left; //lowest index of the sublist, so that right never moves past it
    left--; right++; //step just outside the sublist, since each scan below moves before it compares
    do
    {
        //find the next position from the left that is not less than the pivot
        //(the pivot stored just past the sublist stops this scan at the latest)
        while(nums[++left] < pivot);
        //find the next position from the right that is not greater than the pivot
        while((right > first) && (nums[--right] > pivot));
        swap(nums[left],nums[right]); //swap these two values
    } while(left<right); //move inward until they cross
    //the last swap occurs after left and right have met or crossed (i.e. left is
    //already >= right, after which the outer loop ends), therefore we must
    //re-swap these elements back into their correct positions
    swap(nums[left],nums[right]);
    return left;
}

The pivot of each list is simply the middle element.

int getpivot(int left,int right) //the left and right indices of the sublist
{
return (left+right)/2;
}

Now let's take a look at how QuickSort brings everything together. The left and right indices of the sublist
must be passed into the QuickSort() function. Recursion is used by the algorithm to run QuickSort on
each partition, which is why these values are required as parameters. On the initial call to sort an array, 0
would be used for the left index, and n-1 for the right index. Note, the pivot itself must be kept out of
the partition for the partition algorithm to work correctly. To
accomplish this, we swap it with the last element, and run the partition only up to right-1.
After the partition, the pivot is swapped with the element at the first position of the right sublist, which
locks it into its correct sorted position. Now, all the elements to the left are no greater than the pivot,
and all those to the right are no less than the pivot.

void QuickSort(int *nums, int left, int right)
{
    int pivot = getpivot(left,right); //find the pivot
    swap(nums[pivot],nums[right]); //move the pivot to the last position
    //partition left->right-1, excluding the pivot
    int r_sublist = part(nums,left,right-1,nums[right]);
    swap(nums[r_sublist],nums[right]); //move the pivot into its proper position
    if((r_sublist - left) > 1) //if a left sublist exists, sort it
        QuickSort(nums,left,r_sublist-1);
    if((right - r_sublist) > 1) //if a right sublist exists, sort it
        QuickSort(nums,r_sublist+1,right);
}

The merge sort is another algorithm which takes a "divide and conquer" approach. It begins by dividing
a list into two sublists, and then recursively divides each of those sublists until there are sublists with
one element each. These sublists are then combined using a simple merging technique.

In order to combine two lists, the first value of each is evaluated, and the smaller value is added to the
output list. This process continues until one of the lists has become exhausted, at which point the
remainder of the other list is simply appended to the output list. Neighboring sublists are combined in
this way at each level of the recursion, until all the elements are merged back into a single list.
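
The merging step described above can be sketched on its own, separately from the recursion. The version below assumes two already sorted std::vector inputs and builds a new output list; the function name MergeSorted is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: merges two sorted lists into one sorted output list.
std::vector<int> MergeSorted(const std::vector<int> &a, const std::vector<int> &b)
{
    std::vector<int> out;
    out.reserve(a.size() + b.size());
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size())   // both lists still have items
    {
        if (b[j] < a[i])
            out.push_back(b[j++]);         // the item from b is smaller
        else
            out.push_back(a[i++]);         // otherwise take the item from a
    }
    while (i < a.size()) out.push_back(a[i++]); // append what is left of a
    while (j < b.size()) out.push_back(b[j++]); // append what is left of b
    return out;
}
```

Merging {1,4,9} with {2,3,10,11} produces {1,2,3,4,9,10,11}. Each item is looked at once, so the work done by one merge grows linearly with the total number of items.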

The code for merge sort is very straightforward. The algorithm uses two arrays to accomplish the task.
First, the items to be sorted are copied to a temporary array, where they are divided. The original array
acts as the output array, and will contain the sorted list at the end. The parameters of the MergeSort()
function consist of these two arrays, as well as the left and right boundaries of the list to be sorted. This
is required since the MergeSort() function is recursive, and applies the algorithm to sublists within the
array. On the initial call, left will have a value of 0, and right will have a value of n-1.

void MergeSort(int *nums, int *tmp, int left, int right)
{
    //if the boundaries meet, the sublist has at most one element, and cannot
    //be further split
    if (left>=right) return;
    int mid = (left+right)/2;
    MergeSort(nums,tmp,left,mid); //sort the first half
    MergeSort(nums,tmp,mid+1,right); //sort the second half
    //copy the sublist into the temporary array
    for(int i = left; i<=right; i++)
        tmp[i] = nums[i];
    //merge the lists
    int l = left; //the first element in the first sublist
    int r = mid + 1; //the first element in the second sublist
    for(int j=left; j<=right; j++)
    {
        //if the index of the left list is equal to the first element of the
        //right list, the left list has ended. Insert next item from right list.
        if(l == mid+1)
            nums[j] = tmp[r++];
        //if the index of the right sublist has exceeded its right boundary,
        //the right list has ended. Insert next item from the left list.
        else if (r > right)
            nums[j] = tmp[l++];
        //if two lists exist, and the current element in the left is smaller
        //than the right, insert it into the next position in the output array
        //and move to the next element in the left list.
        else if(tmp[l] < tmp[r])
            nums[j] = tmp[l++];
        //otherwise insert the current element of the right sublist into the
        //output array and move to the next element in the right list.
        else
            nums[j] = tmp[r++];
    }
}

The next algorithm, HeapSort, is very simple to implement. The array to be sorted is first inserted as a
heap structure. Then, a loop is used to remove each element of the heap. Remember from the lesson on
heaps, when an element is removed, it is still part of the physical array, and is swapped with the last item
of the heap. The process continues and each time the largest item is pushed to the end of the heap (which
is directly before the item discarded the previous iteration, since the heap becomes smaller). This
approach requires a slightly modified version of the heap class, shown below. The changes that were
made are described after the class definition.

const int MAX_SIZE = 100;

template <class ItemType>
class Heap
{
public:
Heap(ItemType*,int);
int left(int) const;
int right(int) const;
int parent(int) const;
void insert(const ItemType&);
ItemType remove_max();
bool IsEmpty() const;
bool IsFull() const;
int count() const;
ItemType value(int) const;
private:
ItemType *array;
int elements; //how many elements are in the heap
void ReHeap(int);
void BuildHeap();
};

This version of the heap class accepts a pointer to the array to be sorted in the class constructor. The
second parameter of the constructor is the size of the array. The array member is declared as a pointer (which
will point to the array passed to the constructor), and the function BuildHeap() has been added, which
will make the array into a heap.

template <class ItemType>
Heap<ItemType>::Heap(ItemType *array_ptr, int size)
{
array = array_ptr;
elements = size;
BuildHeap();
}

The BuildHeap() function begins at the last non-leaf node (the first one met when moving up from the
bottom) and works up the array, sorting each subtree
with the ReHeap() function. Since leaves can't travel down any further, they do not need to be processed.
Instead, they will fall into their proper place by being exchanged with a node on a higher level if
necessary, when that node comes down.

By working up, the subtrees of a node are made into heaps first, which allows the ReHeap() function to
be used. The ReHeap() function relies on the fact that the node's subtrees are heaps. This is because it
compares the node to its children, and makes a switch if necessary. If the elements did not follow the
heap property, larger items on lower levels would never be brought to the top, since a comparison is
made with the node's children and not the parent.

template <class ItemType>
void Heap<ItemType>::BuildHeap()
{
    //elements/2 - 1 is the index of the last node that has at least one child
    for(int j = elements/2 - 1; j >= 0; j--)
        ReHeap(j);
}
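
The ReHeap() function itself belongs to the lesson on heaps and is not repeated here. For readers who want to experiment, the following is a rough sketch of what such a function might look like as a free function on a std::vector of int. It assumes the usual array layout in which the children of position i are at 2i+1 and 2i+2; the names SiftDown and BuildMaxHeap are hypothetical and are not the class's own.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for ReHeap(): moves heap[pos] down until neither
// child is larger. Only the first size items are treated as part of the heap.
void SiftDown(std::vector<int> &heap, std::size_t size, std::size_t pos)
{
    while (2 * pos + 1 < size)             // while pos has at least a left child
    {
        std::size_t child = 2 * pos + 1;   // start with the left child
        if (child + 1 < size && heap[child + 1] > heap[child])
            child++;                       // the right child is the larger one
        if (heap[pos] >= heap[child])
            return;                        // heap property already holds here
        std::swap(heap[pos], heap[child]);
        pos = child;                       // continue from the child's position
    }
}

// Same bottom-up order as BuildHeap(): last non-leaf node first, root last.
void BuildMaxHeap(std::vector<int> &heap)
{
    for (std::size_t j = heap.size() / 2; j-- > 0; )
        SiftDown(heap, heap.size(), j);
}
```

After BuildMaxHeap() runs, every item is at least as large as both of its children, so the largest value sits at position 0.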

The heap sort algorithm makes the array to be sorted into a heap, and then follows the above procedure.

template <class ItemType>
void HeapSort(ItemType *array, int size)
{
Heap<ItemType> sort(array, size);
for(int j = 0; j < size; j++)
sort.remove_max();
}
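
The same two phases, building a heap and then repeatedly removing the maximum, can also be expressed with the heap functions of the standard <algorithm> header. This is offered only as a point of comparison, not as the lesson's implementation; the function name HeapSortWithStd is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: heap sort written with the standard heap functions.
void HeapSortWithStd(std::vector<int> &nums)
{
    std::make_heap(nums.begin(), nums.end());   // arrange the items as a max-heap
    // std::pop_heap moves the largest item of [begin, end) to position end-1
    // and restores the heap on the remaining items, much like remove_max().
    for (std::vector<int>::iterator end = nums.end(); end != nums.begin(); --end)
        std::pop_heap(nums.begin(), end);
}
```

Because each removed maximum is parked directly behind the shrinking heap, the array ends up in ascending order once the loop finishes.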

Note, although algorithms such as the QuickSort seem to be the only ones that should be used, this is
not always the case. If you know that the array to be sorted is very small, an O(N^2) algorithm is much
simpler to implement, and the difference in running time will not be significant.

Stacks

A stack is a special type of list, where only the element at one end can be accessed. Items can be
"pushed" onto one end of the stack structure. New items are inserted before the others, as each old
element moves down one position. The first element is referred to as the "top" item, and is the only item
that may be accessed at any time. In order to access items that are further down the stack, they must be
moved to the top by "popping" the appropriate number of items. Popping refers to removing the top
element of a stack. This is referred to as a LIFO structure, "Last In, First Out".
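
The LIFO behavior can be observed with the standard library's own std::stack before building a stack by hand. The sketch below is only an illustration; the function name PushThenPopAll is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <stack>
#include <vector>

// Hypothetical helper: pushes the values in the order given, then pops them
// all, returning the order in which they came off the stack.
std::vector<int> PushThenPopAll(const std::vector<int> &values)
{
    std::stack<int> s;
    for (std::size_t i = 0; i < values.size(); i++)
        s.push(values[i]);
    std::vector<int> popped;
    while (!s.empty())
    {
        popped.push_back(s.top()); // only the top item can be read
        s.pop();                   // removing it exposes the item below
    }
    return popped;
}
```

Pushing 1, 2 and 3 and then popping everything yields 3, 2, 1: the last item in is the first one out.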

These rules make stacks very restricted in use; however, they are very efficient and much easier to
implement than lists. The uses of stacks vary from programming a simple card game, to maintaining the
order of operations in a complex program. For example, a stack is useful in a management program
where the newest tasks must be executed first. The node of a stack is usually represented with the
following structure, which is very similar to that of a list node.

template <class ItemType>
struct StackNode
{
ItemType data;
StackNode<ItemType> *next;
};

Implementing a generic stack class that can be adapted to any type of programming
situation is very easy to do. A definition of such a class is shown below.

template <class ItemType>
class Stack
{
public:
Stack(); //class constructor - initialize private variables
~Stack(); //class destructor - free up used memory
void push(const ItemType); //add a new node to the top of the stack
ItemType pop(); //remove the top node and return its contents
ItemType top() const; //return the top node without popping it
void clear(); //delete all nodes in the stack
bool IsEmpty() const; //return true if the stack has no elements
bool IsFull() const; //return true if there is no free memory for new nodes
int count() const; //return the amount of nodes on the stack
private:
StackNode<ItemType> *top; //pointer to the top node in stack
int counter; //maintain the amount of nodes in the stack
};

The class constructor sets counter to zero. Since there are no nodes in the stack when an instance of the
class is first created, top is set to NULL.

template <class ItemType>
Stack<ItemType>::Stack()
{
counter = 0;
top = NULL;
}

The role of the destructor is to delete all nodes in the stack, and return the memory they occupy to free
store. Since the clear() function does this task, it can be called by the destructor.

template <class ItemType>
Stack<ItemType>::~Stack()
{
clear();
}

The push() function inserts a new node on top of the stack, and sets the top pointer to reference this new
node. First, we check if there is enough system memory to create a new node, and then create the node,
assigning it to top. The node's value is set equal to item, and its next component is set to the node that
was on top before the creation of the new node.

template <class ItemType>
void Stack<ItemType>::push(const ItemType item)
{
    assert(!IsFull()); //abort if there is not enough memory to create a new node
    //create a new node on top of the others with value item. set the original
    //top node to follow the new one.
    StackNode<ItemType> *tmp = new StackNode<ItemType>;
    tmp->data = item;
    tmp->next = top;
    top = tmp;
    counter++; //increment the amount of nodes in the stack
}

The pop() function removes the top node from the stack (freeing up the memory it uses) and returns its
value. top is set to the next node in the stack, and a temporary local variable(tmp) is created to point to
the original top node. It is then used to reference the memory address of the node to delete. If we were to
delete the node using top as a reference, the position of the next node in the stack would be lost, since
top->next would no longer exist.

template <class ItemType>
ItemType Stack<ItemType>::pop()
{
assert(!IsEmpty()); //abort if the stack is empty, no node to pop
ItemType item = top->data; //maintain top value, to be returned later
StackNode<ItemType> *tmp = top; //create a temporary reference to the top node
top = top->next; //set top to be the next node
delete tmp; //delete the top node
counter--; //decrement the amount of nodes in the stack
return item; //return the original top value
}
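
To try push() and pop() without the template machinery, the two functions can be condensed into a small stand-alone class. This is a minimal sketch under stated assumptions: it stores int values only, and the names IntNode, MiniStack and head are hypothetical. The pointer is called head here because, in a compiled class, a data member and a member function cannot share the same name.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical node type for a stack of ints.
struct IntNode
{
    int data;
    IntNode *next;
};

// Hypothetical cut-down stack, following the push() and pop() logic above.
class MiniStack
{
public:
    MiniStack() : head(NULL), counter(0) {}
    ~MiniStack() { while (head != NULL) pop(); } // free any remaining nodes
    void push(int item)
    {
        IntNode *tmp = new IntNode; // new node goes on top of the others
        tmp->data = item;
        tmp->next = head;           // the old top node follows the new one
        head = tmp;
        counter++;
    }
    int pop()
    {
        assert(head != NULL);       // abort if there is nothing to pop
        int item = head->data;      // keep the value to return it later
        IntNode *tmp = head;        // remember the node before moving on
        head = head->next;
        delete tmp;
        counter--;
        return item;
    }
    int count() const { return counter; }
private:
    MiniStack(const MiniStack &);            // copying is disabled in this sketch
    MiniStack &operator=(const MiniStack &); // so that nodes are never deleted twice
    IntNode *head;
    int counter;
};
```

Pushing 1, 2 and 3 and then calling pop() twice returns 3 and then 2, leaving one node on the stack.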

The top() function simply returns the value of the top node, without any modifications to the stack.

template <class ItemType>
ItemType Stack<ItemType>::top() const
{
assert(!IsEmpty());
return top->data;
}

The clear() function deletes all nodes in the stack, and frees the memory they occupy. Each node in the
stack is visited using a loop, which executes until it reaches a NULL reference. A temporary variable is
used for the same reason as in the list implementation. If we were to delete a node using top as a
reference, the position of the next node in the stack would be lost, since top->next would no longer exist.

template <class ItemType>
void Stack<ItemType>::clear()
{
    StackNode<ItemType> *tmp;
    while(top != NULL) //loop through every node in the stack
    {
        tmp = top; //reference the top node
        top = top->next; //set top to the next node
        delete tmp; //delete the original top node
    }
    counter = 0; //the stack is now empty, so count() must report zero
}

The IsEmpty() function returns true if the stack has no nodes. This task is accomplished very simply by
checking to see if the top pointer is NULL.

template <class ItemType>
bool Stack<ItemType>::IsEmpty() const
{
return (top == NULL);
}

The IsFull() function checks to see if there is enough system memory available to create a new node
for the stack. It works exactly the same way as it did in the linked list class. A node is "created" using
the new command, which is then evaluated. If the new command has failed to set aside the necessary
memory, its value is NULL, in which case the function returns true. If the new node is created
successfully, it is deleted, and the function returns false.

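template <class ItemType>
bool Stack<ItemType>::IsFull() const
{
    //in standard C++ a plain new throws std::bad_alloc on failure; the nothrow
    //form (declared in the <new> header) returns NULL instead, which is tested here
    StackNode<ItemType> *tmp = new(std::nothrow) StackNode<ItemType>;
    if(tmp == NULL)
        return true;
    else
    {
        delete tmp;
        return false;
    }
}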

The count() function returns the number of nodes in the stack, which is maintained by the class's private
counter member. Again, this value could be kept in a public member which the programmer can
access directly; however, it is good practice to hide this value from the programmer, since a public
member could be modified without any nodes being added or removed. Also, if we were to change the class
implementation, a program using the stack class would not require any change, since all the new code
would be written in the count() function.

template <class ItemType>
int Stack<ItemType>::count() const
{
return counter;
}
