# Introduction

CHAPTER 6

Data Structures

6.1 Introduction
In this section we will examine various ways of implementing sets of elements efﬁciently. The actual representation used in each case depends on the purpose for which the set is to be used. Different data structures have different strengths and weaknesses as we shall see. It is therefore critical that we understand these strengths and weaknesses so that we may pick the right data structure for our application. Sets are typically used to hold and retrieve elements as a part of some algorithm or application. Depending on the algorithm, various operations may need to be favored over others. What are some of the operations that we might wish to perform on sets?
1. Member: to determine whether a particular element is a member of a particular set.
2. Insert: to insert a given element into a set.
3. Delete: to delete a given element from a given set.
4. Union: to take the union of two sets.

Chapter Draft of October 22, 1998


5. Intersection: to take the intersection of two sets.
6. Find: given an element a and a collection of sets that form a partition, to find the name of the set that contains a.
7. Min: to find the minimum element of some set.
8. Split: assuming an ordered set and an element a, to split the set into two sets such that all the elements of the first set have values less than or equal to a and all members of the second set have values greater than a.
9. Take: to find and remove an arbitrary element from the set.
10. Iterate: to iterate over all the members of the set.

The data structure of choice depends on the operations that are needed in the algorithm being implemented. The idea is to use a data structure that is as fast as possible for the desired operations. For example, if the operations are Member, Insert, and Delete, and the set consists of integers in a compact range, say [0:10000], then the best representation may be a bit array:
bool isMember[range];

Then Insert, Member, and Delete are simple indexing operations. The only problem with this representation is that a set must be initialized to false for the entire range. This can be avoided by a famous trick, which requires more memory and a little more time per operation. It also makes it possible to iterate over the set in time proportional to the size of the set. Suppose we declare the array isMember as follows:
int isMember[range];

but we also declare an array the same size to hold the actual elements:
int member[range]; int size = 0;

The intention is this: the isMember array contains the index of the member array location containing the actual value. Therefore, for an element x to be a member, the index stored in isMember[x] must be at least 0 and less than the current value of size, and the member array must hold x at that index. Thus, the membership test for x is:
bool Member(int x) {
    int i = isMember[x];   // may be garbage if x was never inserted
    return 0 <= i && i < size && member[i] == x;
}


Advanced Programming and Applied Algorithms


and insertion can be accomplished by the following code fragment:
if (!Member(x)) {
    int i = size++;
    member[i] = x;
    isMember[x] = i;
}

Thus the data structure can support Member and Insert in constant time and Iterate in time proportional to the number of elements in the set. Delete, on the other hand, is harder:
if (Member(x)) {
    int i = isMember[x];
    if (i != --size) {
        // Swap x with the last stored element so the live entries stay contiguous.
        int t = member[size];
        member[size] = member[i];
        member[i] = t;
        isMember[t] = i;
    }
}
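The three fragments above can be collected into one small class. The following is only a sketch under stated assumptions: the name SparseSet and its methods are hypothetical, and because standard C++ does not define the result of reading truly uninitialized memory, the constructor fills both arrays with an arbitrary caller-chosen value to stand in for garbage. The point it illustrates is that the membership test gives the right answer whatever that initial value is.

```cpp
#include <cassert>
#include <vector>

// Sketch of the set-without-initialization trick over the range [0, range).
// The "garbage" fill value stands in for memory that was never initialized.
class SparseSet {
public:
    explicit SparseSet(int range, int garbage = 12345)
        : isMem(range, garbage), member(range, garbage), size(0) {}

    bool isMember(int x) const {
        int i = isMem[x];
        return 0 <= i && i < size && member[i] == x;
    }
    void insert(int x) {
        if (!isMember(x)) {
            member[size] = x;
            isMem[x] = size++;
        }
    }
    void remove(int x) {
        if (isMember(x)) {
            int i = isMem[x];
            int last = member[--size];   // element currently stored last
            member[i] = last;            // move it into the vacated slot
            isMem[last] = i;
        }
    }
    int count() const { return size; }
    int at(int i) const { return member[i]; }   // i-th stored element, 0 <= i < count()
private:
    std::vector<int> isMem;    // isMem[x]: claimed position of x in member
    std::vector<int> member;   // member[0..size-1]: the elements actually present
    int size;
};
```

Iterating over at(0) through at(count() - 1) visits exactly the stored elements, which is what gives Iterate its cost proportional to the size of the set.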

Here is the complete class along with its iterator:
const int NULL_ELEMENT = -1;

class FastSet {
    friend class FastSetIterator;
public:
    FastSet(int);
    ~FastSet() { delete [] isMem; delete [] member; }
    FastSet(const FastSet &);
    FastSet & operator=(const FastSet &);
    bool isMember(int) const;
    void insertMember(int);
    void deleteMember(int);
    int popFirstMember();
    void print();
private:
    int * isMem;
    int * member;
    int range;
    int size;
    void swapMembers(int, int);
    void copyFastSet(const FastSet & s);
};


class FastSetIterator {
public:
    FastSetIterator(const FastSet & s) { curSet = &s; curMemLoc = 0; }
    bool notExhausted() const { return curMemLoc < curSet->size; }
    int curr() const {
        return ( notExhausted() ? curSet->member[curMemLoc] : NULL_ELEMENT );
    }
    int operator*() const { return curr(); }
    operator bool() const { return notExhausted(); }
    FastSetIterator & operator++() { ++curMemLoc; return *this; }
    FastSetIterator operator++(int);
private:
    int curMemLoc;
    const FastSet * curSet;
};

class elementOutOfRangeException {
public:
    elementOutOfRangeException (int i) { val = i; }
    int value() const { return val; }
private:
    int val;
};

Here are the implementations:
FastSet::FastSet(int r) {
    range = r;
    size = 0;
    isMem = new int[range];
    member = new int[range];
}

FastSet::FastSet(const FastSet & s) { copyFastSet(s); }

FastSet & FastSet::operator=(const FastSet & s) {
    if (this != &s) {
        delete [] isMem;
        delete [] member;
        copyFastSet(s);
    }
    return *this;
}

bool FastSet::isMember(int x) const {
    if (x < range && x >= 0) {
        int i = isMem[x];
        return (0 <= i && i < size ? member[i] == x : false);
    }
    else throw elementOutOfRangeException(x);
    return false; // Eliminates a warning
}

void FastSet::insertMember(int x) {
    if (!isMember(x)) {
        int i = size++;
        member[i] = x;
        isMem[x] = i;
    }
}

void FastSet::deleteMember(int x) {
    if (isMember(x)) {
        int i = isMem[x];
        if (i != --size) swapMembers(i,size);
    }
}

int FastSet::popFirstMember() {
    if ( size > 0 ) {
        int first = member[0];
        deleteMember(first);
        return first;
    }
    else return NULL_ELEMENT;
}

void FastSet::print() {
    cout << "{";
    for (int i = 0; i < size; i++ ) { cout << " " << member[i]; }
    cout << " }" << endl;
}

void FastSet::swapMembers(int i,int j) {
    int t = member[j];
    member[j] = member[i];
    isMem[member[j]] = j;
    member[i] = t;
    isMem[t] = i;
}

void FastSet::copyFastSet(const FastSet & s) {
    range = s.range;
    size = 0; // start empty; insertMember counts the elements back up to s.size
    isMem = new int[range];
    member = new int[range];
    for (int i = 0; i < s.size; i++) { insertMember(s.member[i]); }
}

FastSetIterator FastSetIterator::operator++(int) {
    FastSetIterator ret = *this;
    curMemLoc++;
    return ret;
}

As a simple example of usage, consider the following code:
int main() {
    FastSet intset(100);
    try {
        for (int i = 1; i < 100; i += 2) { intset.insertMember(i); }
        intset.print();
        cout << endl;
        // Print in order
        for (int i = 1; i < 100; i++) {
            if (intset.isMember(i)) cout << " " << i;
        }
        cout << endl;
        // Print fast but out of order
        FastSetIterator p = intset;
        while (p) { cout << *p++ << " "; }
        cout << endl;
        // Cause an exception
        if (intset.isMember(100)) cout << "It's in the set!" << endl;
    }
    // Catch an out-of-range exception
    catch (const elementOutOfRangeException & e) {
        cout << "Exception on membership test:" << e.value() << endl;
    }
}


6.2 Hashing
What is the best representation for Member, Insert, Delete, and Iterate if the range is too large for a simple array or if the set is much smaller than the range? Answer: a hash table. My own preference is for a bucket hash like the one given in the table part for lab 1. It is easy to see how to do Member, Insert, and Delete, but what about Iterate? That could be done in either of two ways: linking all elements together or linking non-empty buckets. Can we get away with a singly-linked list of all elements and still do Delete in constant time? One strategy is to mark an element deleted and actually adjust the links on the next Iterate, charging the cost to the delete operations.

One aspect of bucket hashing has to do with growing the number of buckets as the table grows. In the reference implementation, we used a strategy that doubled the number of buckets whenever the number of elements in the hash table is the same as the number of buckets, rehashing each element. One question is whether this defeats the constant average-time cost of hashing. To analyze this, assume that we will amortize the total cost of hashing across all the elements of the table. When there are $n = 2^m$ elements in the table (just before the next rehash), we can say that $2^{m-1}$ of them have been hashed only once, while $2^{m-2}$ have been hashed twice, $2^{m-3}$ have been hashed 3 times, etc. Thus, the total $T(n)$ of hashed insertions for $2^m$ elements is:

$$1 \cdot 2^{m-1} + 2 \cdot 2^{m-2} + \cdots + (m-1) \cdot 2^{1}$$

Hence

$$
\begin{aligned}
T(n) &= \sum_{k=1}^{m-1} (m-k)\,2^{k} \\
     &= 2(m-1) + \sum_{k=2}^{m-1} (m-k)\,2^{k} \\
     &= 2(m-1) + \sum_{j=1}^{m-2} (m-j-1)\,2^{j+1} \\
     &= 2(m-1) + 2\sum_{j=1}^{m-2} (m-j)\,2^{j} - 2\sum_{j=1}^{m-2} 2^{j} \\
     &= 2(m-1) + 2\left(T(n) - 2^{m-1}\right) - 2\left(2^{m-1} - 2\right) \\
     &= 2\,T(n) - 2^{m+1} + 2m + 2
\end{aligned}
$$

Rearranging, we get:

$$T(n) = T(2^{m}) = 2^{m+1} - 2m - 2 = O(n)$$

Thus, the total cost of rehashing is bounded by a constant times the total number of elements in the table.
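As a sanity check on the algebra, the sum and the closed form can be compared numerically. The sketch below is not part of the reference implementation; the function names rehashSum and rehashClosedForm are made up for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Direct evaluation of T(2^m) = sum over k = 1 .. m-1 of (m - k) * 2^k.
std::uint64_t rehashSum(int m) {
    std::uint64_t total = 0;
    for (int k = 1; k <= m - 1; ++k)
        total += static_cast<std::uint64_t>(m - k) << k;
    return total;
}

// Closed form obtained by rearranging the recurrence: 2^(m+1) - 2m - 2.
std::uint64_t rehashClosedForm(int m) {
    return (std::uint64_t(1) << (m + 1)) - 2 * static_cast<std::uint64_t>(m) - 2;
}
```

Since the closed form is always below $2^{m+1} = 2n$, the amortized number of hashed insertions per element stays below 2 under this accounting.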

6.3 Trees
Why would anyone ever choose to use a tree over a hash table for set representation? The answer is that trees can be used to support ordering, so that operations like Min and Split can be supported.

6.3.1 Standard Ordered Trees
In this section we will show how to implement a simple ordered tree, which is defined as one in which the inorder walk will produce an ordered list. To define a tree, we use a standard mechanism in which the internals of a tree are handled by a class TreeNode, which can only be manipulated by class Tree and its associated class TreeIterator.
class TreeNode {
    friend class Tree;
    friend class TreeIterator;
public:
    Key keyData() const { return datum; }
    virtual void print() const { datum.print(); }
protected: // Only friends and derived classes can make a TreeNode
    TreeNode();
    TreeNode(const TreeNode & tn) {
        datum = tn.datum; right = tn.right; left = tn.left; parent = tn.parent;
    }
    TreeNode(const Key &, TreeNode * par);
    virtual ~TreeNode() { }
    TreeNode * left;
    TreeNode * right;
    TreeNode * parent;
    Key datum;
    TreeNode * search(const Key &);
    virtual TreeNode * insert(const Key &);
    TreeNode * minimum();
    virtual TreeNode * deleteKey(const Key &);
    TreeNode * successor();
    virtual void structurePrint(int) const;
    virtual TreeNode * clone(const Key & k, TreeNode * p) const {
        return new TreeNode(k,p);
    }
    virtual TreeNode * cloneSubtree(TreeNode *) const;
    void deleteTreeNode();
    void copyValues(TreeNode * tp) { datum = tp->datum; }
    void swapValues(TreeNode *);
    virtual void relocateNode(TreeNode *, TreeNode *, TreeNode *);
    void setParents();
};

class Tree {
    friend class TreeIterator;
public:
    Tree() { root = 0; }
    Tree(const Tree &);
    Tree(TreeNode *);
    virtual ~Tree() { deleteTree(); }
    TreeNode * search(const Key &) const;
    TreeNode * insert(const Key &);
    void deleteKey (const Key &);
    TreeNode * minimum() const;
    void print() const;
    void structurePrint() const;
protected:
    TreeNode * root;
    Tree * parent;
    TreeNode * searchTree(TreeNode * , Key &);
    Tree * copyTree();
    void deleteTree();
};

// TreeNode Implementations
TreeNode::TreeNode(const Key & k, TreeNode * par) {
    left = 0;
    right = 0;
    datum = k;
    parent = par;
}

TreeNode * TreeNode::search(const Key & k) {
    if (k < datum) {
        if (left) return left->search(k);
        else return 0;
    }
    else if (k > datum) {
        if (right) return right->search(k);
        else return 0;
    }
    else { // datum == k
        return this;
    }
}

TreeNode * TreeNode::insert(const Key & k) {
    if (k < datum) {
        if (left) return left->insert(k);
        else return (left = clone(k,this));
    }
    else if (k > datum) {
        if (right) return right->insert(k);
        else return (right = clone(k,this));
    }
    else { // datum == k
        return this;
    }
}

TreeNode * TreeNode::minimum() {
    TreeNode * t = this;
    TreeNode * tLeft = t->left;
    while(tLeft) {
        t = tLeft;
        tLeft = t->left;
    }
    return t;
}

// Unlinks the node holding k and returns it so that the caller can delete it.
TreeNode * TreeNode::deleteKey(const Key & k) {
    if (k == datum) { // delete this one
        if (left == 0) {
            if (parent) { // the root has no parent to relink
                if (parent->left == this) parent->left = this->right;
                else parent->right = this->right;
            }
            if (this->right) right->parent = this->parent;
            return this;
        }
        else if (right == 0) {
            if (parent) {
                if (parent->left == this) parent->left = this->left;
                else parent->right = this->left;
            }
            if (this->left) left->parent = this->parent;
            return this;
        }
        else {
            // Two children: move the successor's key here, then unlink the successor.
            TreeNode * m = right->minimum();
            swapValues(m);
            return (m = right->deleteKey(k));
        }
    }
    else if (k < datum) return (left ? left->deleteKey(k) : 0 );
    else /* k > datum */ return (right ? right->deleteKey(k) : 0 );
}

TreeNode * TreeNode::successor() {
    TreeNode * rt = right;
    if (rt) return rt->minimum();
    else {
        TreeNode * tp = this;
        while ((tp = tp->parent)) {
            if (tp->datum > this->datum) return tp;
        }
        return 0;
    }
}

void TreeNode::structurePrint(int level) const {
    for (int i = 0; i < level; i++) cout << " ";
    this->print();
    cout << endl;
    if (left) left->structurePrint(level+1);
    if (right) right->structurePrint(level+1);
}

TreeNode * TreeNode::cloneSubtree(TreeNode * parent) const {
    TreeNode * newNode = new TreeNode(this->datum, parent);
    newNode->left = (left ? left->cloneSubtree(newNode) : 0);
    newNode->right = (right ? right->cloneSubtree(newNode) : 0);
    return newNode;
}

void TreeNode::deleteTreeNode() {
    if (left) { left->deleteTreeNode() ; delete left; }
    if (right) { right->deleteTreeNode() ; delete right; }
}

void TreeNode::swapValues(TreeNode * tn) {
    Key k = datum;
    datum = tn->datum;
    tn->datum = k;
}

void TreeNode::relocateNode(TreeNode * l,TreeNode * r ,TreeNode * p){
    left = l;
    right = r;
    parent = p;
    setParents();
}

void TreeNode::setParents() {
    if (left) left->parent = this;
    if (right) right->parent = this;
}

// Tree Implementations
Tree::Tree(const Tree & t) {
    if (t.root) root = t.root->cloneSubtree(0);
    else root = 0;
}

Tree::Tree(TreeNode * tp) { root = tp; }

TreeNode * Tree::search(const Key & k) const {
    if (root) return root->search(k);
    else return 0;
}

TreeNode * Tree::insert(const Key & k) {
    if (root) return root->insert(k);
    else return (root = new TreeNode(k,0));
}

void Tree::deleteKey (const Key & k) {
    if (!root) return;
    TreeNode * removed = root->deleteKey(k);
    // If the root itself was unlinked, its only child (if any) becomes the new root.
    if (removed == root) root = (root->left ? root->left : root->right);
    delete removed;
}

TreeNode * Tree::minimum() const {
    if (root) return root->minimum();
    else return 0;
}

void Tree::print() const {
    cout << "{";
    TreeIterator p = *this;
    while(p) { (*p++).print(); if(p) cout << ", ";}
    cout << "}" << endl;
}

void Tree::structurePrint() const {
    if (root) root->structurePrint(0);
}

Tree * Tree::copyTree() {
    return (root ? new Tree(root->cloneSubtree(0)) : 0);
}

void Tree::deleteTree() {
    if (root) { root->deleteTreeNode(); delete root; root = 0; }
}

6.3.2 Iteration Over a Tree
We now turn to the subject of iteration over every node in a binary tree. To do this we will develop a TreeIterator class:
class TreeIterator {
public:
    TreeIterator() { curNode = 0; }
    TreeIterator(const Tree &);
    TreeIterator(TreeNode *);
    TreeIterator(const TreeIterator &);
    virtual ~TreeIterator() { } // curStack is a member and cleans up after itself
    TreeIterator & operator=(const TreeIterator &);
    bool operator==(const TreeIterator &) const;
    bool operator!=(const TreeIterator &) const;
    bool empty() const;
    operator bool() const { return !empty(); }
    TreeIterator & operator++() { advance(); return *this; }
    TreeIterator operator++(int) {
        TreeIterator ret = *this;
        advance();
        return ret;
    }
    Key & operator*() { return curNode->datum; }
    Key * operator->() { return &(curNode->datum); }
private:
    TreeNode * curNode;
    stack<TreeNode *> curStack;
    TreeNode * findLeftmost(TreeNode *);
    void advance();
};

The implementation of this iterator keeps track of the location of the iteration by keeping a current node pointer and a current stack of the nodes.
// TreeIterator Implementations
TreeIterator::TreeIterator(const TreeIterator & tp) {
    curNode = tp.curNode;
    curStack = tp.curStack;
}

TreeIterator & TreeIterator::operator=(const TreeIterator & tp) {
    if (this != &tp) {
        curNode = tp.curNode;
        curStack = tp.curStack;
    }
    return *this;
}

bool TreeIterator::operator==(const TreeIterator & tp) const {
    return curNode == tp.curNode && curStack == tp.curStack;
}

bool TreeIterator::operator!=(const TreeIterator & tp) const {
    return curNode != tp.curNode || curStack != tp.curStack;
}

TreeNode * TreeIterator::findLeftmost(TreeNode * r) {
    if (r) {
        while (r->left) {
            curStack.push(r);
            r = r->left;
        }
        return r;
    }
    else return 0;
}

TreeIterator::TreeIterator(const Tree & t) {
    curStack = stack<TreeNode*>();
    curNode = findLeftmost(t.root);
}

bool TreeIterator::empty() const {
    // Exhausted only once advance() has moved past the last node;
    // the last node itself must still be visited.
    return !curNode;
}

void TreeIterator::advance() {
    if (curNode) {
        if (curNode->right) curNode = findLeftmost(curNode->right);
        else if (curStack.empty()) curNode = 0;
        else {
            curNode = curStack.top();
            curStack.pop();
        }
    }
}
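The stack discipline of the iterator can be exercised on its own. The following is a sketch under assumptions rather than the chapter's classes: Node, InorderIterator, insertKey, and inorder are hypothetical stand-ins that use plain int keys, and the iterator additionally counts its stack pushes. A node is pushed at most once (only when it is passed on the way to its left child), so the total stack work over a full traversal is bounded by the number of nodes.

```cpp
#include <cassert>
#include <stack>
#include <vector>

struct Node {                 // hypothetical stand-in for TreeNode
    int key;
    Node * left;
    Node * right;
};

class InorderIterator {
public:
    explicit InorderIterator(Node * root) : pushes(0), cur(nullptr) {
        cur = findLeftmost(root);
    }
    bool done() const { return cur == nullptr; }
    int value() const { return cur->key; }
    void advance() {
        if (!cur) return;
        if (cur->right) cur = findLeftmost(cur->right);
        else if (pending.empty()) cur = nullptr;
        else { cur = pending.top(); pending.pop(); }
    }
    int pushCount() const { return pushes; }
private:
    // Walk left from r, stacking every node passed on the way down.
    Node * findLeftmost(Node * r) {
        if (!r) return nullptr;
        while (r->left) { pending.push(r); ++pushes; r = r->left; }
        return r;
    }
    int pushes;
    std::stack<Node *> pending;
    Node * cur;
};

// Plain ordered-tree insertion, used only to build input for the iterator.
Node * insertKey(Node * root, int k) {
    if (!root) return new Node{k, nullptr, nullptr};
    if (k < root->key) root->left = insertKey(root->left, k);
    else if (k > root->key) root->right = insertKey(root->right, k);
    return root;
}

// Collect the keys in iteration order and report how many pushes occurred.
std::vector<int> inorder(Node * root, int * pushes) {
    std::vector<int> out;
    InorderIterator it(root);
    for (; !it.done(); it.advance()) out.push_back(it.value());
    *pushes = it.pushCount();
    return out;
}
```

The sketch never frees its nodes, which keeps it short but would need fixing in real use.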

The complexity of the iterator is not simple to analyze. Although the advance() method can take a variable amount of time, iterating over the entire set takes time proportional to the number of vertices in the tree. To see this, note that iterating over the entire tree results in two visits to each of the nodes. The ﬁrst visit takes place when the process is going left to ﬁnd the minimum element in a subtree, while the second takes place as a result of popping the stack. Since each visit requires at most a constant amount of work (if we amortize the cost of findLeftmost() over the nodes visited during its execution), the overall cost is a constant times the number of vertices. As an interesting sidelight, a quick and dirty implementation of the stack can be achieved from the List class as follows.
class TreeNodePtrElt : public ListElt {
public:
    TreeNodePtrElt() { pTree = 0; }
    TreeNodePtrElt(TreeNode * t) { pTree = t; }
    virtual ~TreeNodePtrElt() { }
    int operator==(const ListElt & e) const {
        // Two elements are equal when they wrap the same tree node.
        if (const TreeNodePtrElt * t = dynamic_cast<const TreeNodePtrElt *>(&e))
            return pTree == t->pTree;
        else return false;
    }
    TreeNode * value() { return pTree; }
    virtual ListElt * clone() const { return new TreeNodePtrElt(this->pTree); }
    virtual void print() const { pTree->print(); cout << " "; }
private:
    TreeNode * pTree;
};

class TreeNodeStack : private List {
public:
    void mkEmpty() { List::deleteList(); }
    bool notEmpty () const { return List::hdr != 0; }
    void push(TreeNode * t) { List::prepend(new TreeNodePtrElt(t)); }
    TreeNode * pop() {
        TreeNodePtrElt * p = dynamic_cast<TreeNodePtrElt *> (List::popFirst());
        return (p ? p->value() : 0 );
    }
    using List::print; // re-exposes print despite the private inheritance
};

One problem with this iterator is that it uses a lot of extra space for the stack. In applications where space is critical, there is a trick that will permit the iterator to work with a constant amount of extra space, if the user does not do anything with the tree while the iteration is taking place.

6.3.3 AVL Trees
An AVL tree is the same as an ordered tree except that it is kept balanced. This means that we associate a height with each node in the tree, where $h(x)$ is defined as the length of the longest path from the node $x$ to a leaf. The height of any leaf is defined to be zero.

Definition 6.1. A tree is said to be balanced if, for each interior node $x$, the heights of its two subtrees differ by at most 1. If one of the subtrees is nil, by convention, it is assigned a height of $-1$.

Now we turn to the issue of whether a tree that is balanced in this sense is truly balanced in the sense of having approximately half its nodes in each subtree. To do this we need to estimate the maximum and minimum number of nodes for a given height $h$.

Theorem 6.1. Let $n$ be the number of vertices in a balanced tree of height $h$. Then the following inequality holds

$$F_{h+2} - 1 \le n \le 2^{h+1} - 1 \qquad \text{(EQ 6.1)}$$

where $F_i$ is the $i$th Fibonacci number, indexed so that $F_1 = 1$ and $F_2 = 2$.

Proof. By induction on height.

Basis. Any tree of height 0 has exactly 1 node. By definition a tree of height $-1$ has zero nodes.

$$F_2 - 1 = 1 = 2^{0+1} - 1$$

Induction. Maximum. Assume the maximum is true for a tree of any height less than $h$. What is the maximum size of a tree of height $h$? Clearly the maximum is achieved when it has two equal subtrees of height $h-1$, for if one subtree were of height $h-2$, then the size of the tree would be lower. By induction then, each of the subtrees has a maximum of $2^{h} - 1$ vertices. Thus the tree of height $h$ has a maximum of $2(2^{h} - 1) + 1 = 2^{h+1} - 1$ vertices.

Minimum. Assume that the minimum number holds for trees of height less than $h$. The minimum tree of height $h$ must clearly have one subtree of height $h-1$ and another of height $h-2$. Thus the minimum number of vertices in a tree of height $h$ is

$$(F_{h+1} - 1) + (F_{h} - 1) + 1 = F_{h+2} - 1$$

Q.E.D.

As useful as this Theorem is, we will need some lower bound on Fibonacci numbers to establish that the AVL balance condition gives us search times that are logarithmic in the number of elements in an AVL tree. We will begin with the following well-known formula for Fibonacci numbers:

$$F_i = \frac{\varphi^{i} - \hat{\varphi}^{i}}{\sqrt{5}} \qquad \text{(EQ 6.2)}$$

where

$$\varphi = \frac{1 + \sqrt{5}}{2} \quad \text{and} \quad \hat{\varphi} = \frac{1 - \sqrt{5}}{2}$$

(This closed form is stated for the standard indexing $F_1 = F_2 = 1$; under the indexing used in the proofs here, the exponents on the right become $i+1$.) Rather than attempt to use this directly, we will prove the following lemma.

Lemma 6.1. For $i \ge 0$, $F_{i+2} \ge \varphi^{i}$.

Proof. By induction on $i$.


Basis. For $i = 0$, $F_{i+2} = F_2 = 2 > \varphi^{0} = 1$. For $i = 1$, $F_{i+2} = F_3 = 3$, and $\varphi^{1} = \varphi = \frac{1 + \sqrt{5}}{2} < 2 < 3 = F_3$.

Induction. Assume true for all values less than $i$. Then

$$F_{i+2} = F_{i+1} + F_{i} \ge \varphi^{i-1} + \varphi^{i-2} = \varphi^{i-2}(\varphi + 1) = \varphi^{i-2}\left(\frac{3 + \sqrt{5}}{2}\right)$$

but

$$\varphi^{2} = \frac{1 + 2\sqrt{5} + 5}{4} = \frac{3 + \sqrt{5}}{2} = \varphi + 1$$

Therefore,

$$F_{i+2} \ge \varphi^{i-2}(\varphi + 1) = \varphi^{i-2}\varphi^{2} = \varphi^{i}$$

Q.E.D.
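Both the minimum-size recurrence from Theorem 6.1 and the bound of Lemma 6.1 are easy to check numerically. The sketch below uses hypothetical helper names (fib, minNodes) and the indexing implied by the proofs in this section, in which $F_1 = 1$ and $F_2 = 2$ (so that $F_0 = 1$ as well).

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Fibonacci numbers indexed as in this section: F_0 = F_1 = 1, F_2 = 2, F_3 = 3.
std::uint64_t fib(int i) {
    std::uint64_t a = 1, b = 1;          // F_0, F_1
    for (int k = 0; k < i; ++k) {
        std::uint64_t next = a + b;
        a = b;
        b = next;
    }
    return a;
}

// Fewest nodes in a balanced tree of height h:
// N(-1) = 0, N(0) = 1, N(h) = N(h-1) + N(h-2) + 1.
std::uint64_t minNodes(int h) {
    if (h < 0) return 0;
    std::uint64_t prev = 0, cur = 1;     // N(-1), N(0)
    for (int k = 0; k < h; ++k) {
        std::uint64_t next = cur + prev + 1;
        prev = cur;
        cur = next;
    }
    return cur;
}
```

For example, minNodes(2) is 4, which matches $F_4 - 1 = 5 - 1$.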

With this established we can restate the bounds on the number of nodes $n$ in an AVL tree of height $h$.

Corollary 6.1. Let $n$ be the number of nodes in an AVL tree of height $h$. Then

$$\varphi^{h} - 1 \le n \le 2^{h+1} - 1 \qquad \text{(EQ 6.3)}$$

From this we can derive the following result:

Theorem 6.2.

$$\lg(n+1) - 1 \le h \le \frac{\lg(n+1)}{\lg\varphi}$$

Proof. If we add 1 to each part of Equation 6.3 and take the lg, we get

$$h \lg\varphi \le \lg(n+1) \le h + 1$$

The result follows immediately. Q.E.D.

The result establishes that $h = \Theta(\lg n)$. Hence, any operation that takes time proportional to the height of an AVL tree is logarithmic in the number of vertices in that tree.

AVL trees are balanced trees in which a particular algorithm is used to maintain the balance. An AVL tree can be defined by simply adding a height field to a TreeNode.
class AVLTreeNode : public TreeNode {
    friend class AVLTree;
    friend class Tree;
    friend class TreeIterator;
public:
    int height() const { return _ht; }
    void print() const { TreeNode::print(); cout << ":" << _ht; }
private: // Only friends can make a TreeNode
    int _ht;
    AVLTreeNode() { _ht = 0; }
    AVLTreeNode(const Key & k, TreeNode * par) : TreeNode(k,par), _ht(0) { }
    AVLTreeNode(const Key & k, TreeNode * par, int h) : TreeNode(k,par), _ht(h) { }
    AVLTreeNode(const TreeNode & tn) : TreeNode(tn) { _ht = 0; }
    virtual ~AVLTreeNode() { }
    virtual TreeNode * insert(const Key &);
    virtual TreeNode * deleteKey(const Key &);
    virtual TreeNode * clone(const Key & k, TreeNode * par) const {
        return new AVLTreeNode(k,par);
    }
    virtual TreeNode * cloneSubtree(TreeNode *) const;
    void computeHeight();
    void rebalance();
    void rotateLeft();
    void rotateRight();
    virtual void relocateNode(TreeNode *, TreeNode *, TreeNode *);
};

class AVLTree : public Tree {
public:
    AVLTree() { }
    AVLTree(const AVLTree &t) : Tree(t) { }
    AVLTree(AVLTreeNode * tp) : Tree(tp) { }
    virtual ~AVLTree() { } // Tree::~Tree() already calls deleteTree()
    virtual TreeNode * insert(const Key &);
    int height() {
        AVLTreeNode * r = dynamic_cast<AVLTreeNode *>(root);
        return (r ? r->height() : -1);
    }
};


6.3.3.1 AVL Insertion
Note that search, successor, predecessor, maximum, and minimum are all unchanged for AVL trees; the height makes no difference to them. The only operations that change are insert and delete. Let's tackle insert first. Suppose we simply invoke the insert procedure for TreeNode:
TreeNode * AVLTreeNode::insert(const Key & k) {
    TreeNode * retNode = TreeNode::insert(k);
    rebalance();
    return retNode;
}

void AVLTreeNode::rebalance() {
    AVLTreeNode * l = static_cast<AVLTreeNode *>(left);
    AVLTreeNode * r = static_cast<AVLTreeNode *>(right);
    int hL = (l ? l->height() : -1);
    int hR = (r ? r->height() : -1);
    if ((hR-hL)>1) rotateLeft();
    else if ((hL-hR)>1) rotateRight();
    else computeHeight();
}

The problem is that the tree can come back unbalanced. Let us restrict our consideration to the case of return from an insert to the left subtree. What if it comes back unbalanced? There are two cases to consider:
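• Type 1: the subtrees of the left subtree, where the insertion occurred, are of unequal height, with the insertion having occurred on the left.

[Figure: Type 1 rotation. Before: root d (height h+1) with left child b (h) and right child e (h-2); b has children a (h-1) and c (h-2). After: root b (h) with left child a (h-1) and right child d (h-1); d has children c (h-2) and e (h-2).]

• Type 2: the subtrees of the left subtree are of unequal height, with the insertion having occurred on the right.

[Figure: Type 2 rotation. Before: root f (height h+1) with left child b (h) and right child g (h-2); b has children a (h-2) and d (h-1); d has children c (h-2) and e (h-3). After: root d (h) with left child b (h-1) and right child f (h-1); b has children a (h-2) and c (h-2); f has children e (h-3) and g (h-2).]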

void AVLTreeNode::rotateRight() {
    AVLTreeNode * ll = static_cast<AVLTreeNode *>(left->left);
    AVLTreeNode * lr = static_cast<AVLTreeNode *>(left->right);
    int hLL = (ll ? ll->height() : -1);
    int hLR = (lr ? lr->height() : -1);
    // Equal heights cannot arise after an insertion, but they can after a
    // deletion, and then the single (Type 1) rotation is the one that works.
    if (hLL >= hLR) { // rotate right Type 1
        swapValues(left);
        left->relocateNode(lr,right,this);
        relocateNode(ll,left, parent);
    }
    else { // rotate right Type 2
        AVLTreeNode * lrl = static_cast<AVLTreeNode *>(lr->left);
        AVLTreeNode * lrr = static_cast<AVLTreeNode *>(lr->right);
        swapValues(lr);
        lr->relocateNode(lrr,right,this);
        left->relocateNode(ll,lrl,this);
        relocateNode(left,lr,parent);
    }
}
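For comparison, here is a self-contained sketch of AVL insertion that can be compiled and tested in isolation. It is not the chapter's implementation: it uses the conventional pointer-relinking rotations rather than the value-swapping rotations above, plain int keys, and hypothetical names (AvlNode, update, rebalance, isBalanced). The two branches in rebalance correspond to the Type 1 and Type 2 cases and their mirror images.

```cpp
#include <algorithm>
#include <cassert>

struct AvlNode {              // hypothetical node type, not the chapter's AVLTreeNode
    int key;
    int ht;                   // height; a leaf has height 0
    AvlNode * left;
    AvlNode * right;
};

int height(const AvlNode * n) { return n ? n->ht : -1; }   // a nil subtree has height -1

void update(AvlNode * n) { n->ht = 1 + std::max(height(n->left), height(n->right)); }

AvlNode * rotateRight(AvlNode * d) {      // the left child b becomes the subtree root
    AvlNode * b = d->left;
    d->left = b->right;
    b->right = d;
    update(d);
    update(b);
    return b;
}

AvlNode * rotateLeft(AvlNode * b) {       // mirror image of rotateRight
    AvlNode * d = b->right;
    b->right = d->left;
    d->left = b;
    update(b);
    update(d);
    return d;
}

AvlNode * rebalance(AvlNode * n) {
    update(n);
    int diff = height(n->left) - height(n->right);
    if (diff > 1) {
        // Type 2: the left child leans right, so rotate it left first.
        if (height(n->left->left) < height(n->left->right))
            n->left = rotateLeft(n->left);
        return rotateRight(n);            // Type 1, or the second half of Type 2
    }
    if (diff < -1) {
        if (height(n->right->right) < height(n->right->left))
            n->right = rotateRight(n->right);
        return rotateLeft(n);
    }
    return n;
}

AvlNode * insert(AvlNode * n, int key) {
    if (!n) return new AvlNode{key, 0, nullptr, nullptr};
    if (key < n->key) n->left = insert(n->left, key);
    else if (key > n->key) n->right = insert(n->right, key);
    return rebalance(n);                  // rebalance on the way back up
}

// True when every node meets the balance condition and stores its correct height.
bool isBalanced(const AvlNode * n) {
    if (!n) return true;
    int hl = height(n->left), hr = height(n->right);
    return hl - hr <= 1 && hr - hl <= 1 && n->ht == 1 + std::max(hl, hr)
        && isBalanced(n->left) && isBalanced(n->right);
}
```

Inserting keys in increasing order is the worst case for an unbalanced ordered tree, so it makes a convenient test: by Theorem 6.2 the resulting height for n = 1023 must lie between 9 and lg(1024)/lg φ, which is a little over 14.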


6.3.3.2 AVL Deletion

At first the problem of deletion from an AVL tree seems difficult but, in fact, it is trivial. When we delete from an AVL tree, we have two cases to consider:
1. the tree after deletion still satisfies the AVL condition, in which case there is nothing to do, or
2. there exists some node at which the AVL balance condition no longer holds.

Let us examine how the second condition might arise. Consider the diagram below in which the balance condition fails:

[Figure: node d (height h+3) with left subtree b (height h, children a and c) and right subtree f (height h+2). A deletion in the subtree rooted at b has reduced its height by 1.]

This can be addressed by a simple rotation:

[Figure: the tree before and after the rotation. Before: d (h+3) with children b (h) and f (h+2); f has children e (h+1) and g (h+1). After: f (h+3) with children d (h+2) and g (h+1); d has children b (h) and e (h+1); b keeps its subtrees a and c.]


Since this is a constant number of operations, and it leaves the subtrees balanced, the total deletion time is proportional to the height of the tree, which is O(log n).
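TreeNode * AVLTreeNode::deleteKey(const Key & k) {
    TreeNode * retNode = TreeNode::deleteKey(k);
    // If this node is the one that was unlinked, its child pointers are stale,
    // so only rebalance when the deletion happened below this node.
    if (retNode != this) rebalance();
    return retNode;
}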

6.4 Data Base Directories
In this course we will look at introductory data structures for representing data base directories, with the goal of presenting material on various types of balanced tree algorithms. The problem of building a data base directory may be stated as a class construction problem. The goal is to create a class that permits the access of data base entries according to a key, which is some value attached to records of the data base upon which we wish to conduct searches. Typically, keys are strings, but they could have various types. For example, it might be desirable to search data bases by age of employee or number of years of service. Let us construct a class interface for a typical directory:
typedef unsigned int diskLoc;

class Directory {
public:
    Directory();
    Directory(const Directory &);
    Directory(istream &);
    ~Directory();
    Directory & operator=(const Directory &);
    vector<diskLoc> find(Key &, int);
    vector<diskLoc> findRange(Key &, Key &, int);
    void insert(Key &, diskLoc loc);
    void deleteKey(Key &);
    void save(istream &);
private:
    ...
};


6.4.1 B-Trees
B-trees are balanced trees that have been especially designed for use with large databases. The key observation about a large data base maintained on disk is that not only will the data records themselves be kept on disk, but most of the directory itself will be kept on disk. To understand the impact of this, consider how disk storage works. Data are stored on tracks and, within a track, are organized into pages. A typical page is quite large (2Kbytes or more) and represents the smallest unit of data that can be usefully moved between disk and main memory. Because of the seek times and rotational delays associated with accesses to a specific page on disk, it will often take 5 to 30 milliseconds or more to begin reading a page. Once reading begins, however, transfers are at very high rates. Thus, in working with disk, the usual strategy is to read large blocks and read them as seldom as possible.

This strategy presents a problem for pointer-based data structures, because you will not be able to tell where the next block is to come from until you have the current one. Furthermore, simple binary trees will not be very practical, because the amount of useful information stored in a single node cannot make it worthwhile to do a whole disk access. One solution to this is to move from binary trees to k-ary trees, which have k children instead of only two. Then, you can use an algorithm like binary or even linear search to find the right subtree to search. B-trees are a type of k-ary tree.

A B-tree T is a rooted tree with root root[T], having the following properties:
1. Every node $x$ has the following fields:
   a. $n[x]$, the number of keys currently stored in $x$,
   b. the $n[x]$ keys themselves in nondecreasing order: $\mathrm{key}_1[x] \le \mathrm{key}_2[x] \le \cdots \le \mathrm{key}_{n[x]}[x]$, and
   c. leaf[x], a boolean value that is true if $x$ is a leaf and false if $x$ is an internal node.
2. If $x$ is an internal node, it also contains $n[x]+1$ pointers $c_0[x], c_1[x], \ldots, c_{n[x]}[x]$ to its children. Leaf nodes have no children so these fields are undefined.
3. The keys $\mathrm{key}_i[x]$ separate the ranges of keys stored in each subtree: if $k_i$ is any key stored in the subtree with root $c_i[x]$, then $k_0 \le \mathrm{key}_1[x] \le k_1 \le \mathrm{key}_2[x] \le k_2 \le \cdots \le k_{n[x]-1} \le \mathrm{key}_{n[x]}[x] \le k_{n[x]}$.
4. Every leaf has the same depth, which is the tree's height $h$.
5. There are lower and upper bounds on the number of keys a node can contain. Let the fixed integer $t$ be called the minimum degree of the B-tree.
   a. Every node other than the root must have at least $t-1$ keys. Every internal node other than the root thus has at least $t$ children. If the tree is nonempty, the root must have at least one key.
   b. A node can contain at most $2t-1$ keys. Therefore, an internal node can have at most $2t$ children. We say that a node is full if it contains exactly $2t-1$ keys.

The height of a B-tree is established by the following theorem:

Theorem 6.3. If $n \ge 1$, then for any $n$-key B-tree of height $h$ and minimum degree $t \ge 2$,

$$h \le \log_t\left(\frac{n+1}{2}\right).$$

Proof. What is the minimum number of keys in a tree of height $h$? We get this by counting nodes. The minimum number is obtained when the root contains one key and all the other nodes contain $t-1$ keys. The tree contains 2 nodes at depth 1, $2t$ nodes at depth 2, $2t^2$ nodes at depth 3, and so on. The number of keys satisfies the inequality:

$$n \ge 1 + (t-1)\sum_{i=1}^{h} 2t^{i-1} = 1 + 2(t-1)\left(\frac{t^{h} - 1}{t - 1}\right) = 2t^{h} - 1$$

This gives $t^{h} \le (n+1)/2$, which implies the result after taking logarithms base $t$. Q.E.D.
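The counting argument in the proof can be replayed directly. In this sketch the function names minKeys and minKeysClosedForm are hypothetical; the first adds up the keys level by level exactly as the proof does, and the second evaluates the closed form $2t^h - 1$.

```cpp
#include <cassert>
#include <cstdint>

// Fewest keys in a B-tree of height h and minimum degree t, counted level by level:
// one key in the root, then 2 * t^(i-1) nodes holding t - 1 keys each at depth i.
std::uint64_t minKeys(std::uint64_t t, int h) {
    std::uint64_t keys = 1;
    std::uint64_t nodesAtDepth = 2;
    for (int depth = 1; depth <= h; ++depth) {
        keys += nodesAtDepth * (t - 1);
        nodesAtDepth *= t;
    }
    return keys;
}

// Closed form from the proof: 2 * t^h - 1.
std::uint64_t minKeysClosedForm(std::uint64_t t, int h) {
    std::uint64_t power = 1;
    for (int i = 0; i < h; ++i) power *= t;
    return 2 * power - 1;
}
```

The practical consequence is that height grows very slowly: with a minimum degree of 1000, a tree of height 3 already holds at least 1,999,999,999 keys.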
6.4.1.1 Basic Operations on B-trees

Searching a B-tree is straightforward: instead of making a 2-way decision at each node, we make an $(n[x]+1)$-way decision at each node. This could be done using binary search or linear search, as the cost of the search will be dominated by the cost of accessing disk to get directory blocks. If the search finds the desired key $k$ in the node $x$, then it returns a list of locations associated with that key. Otherwise it finds the first $i$ such that $k < \mathrm{key}_i[x]$, taking $i = n[x]+1$ if $k$ is larger than every key in $x$. If $x$ is a leaf, the search fails; otherwise the algorithm recursively searches for $k$ in the subtree $c_{i-1}[x]$.


Insertion is more complicated because it can cause the tree to grow. The basic idea behind insertion is to split a full node before attempting to insert into it. Thus a key component of the algorithm is a method associated with a B-tree node that splits a given child of that node, lifting its median key into the parent. Note that this will only work if the parent is guaranteed not to be full. Thus, the procedure presented here always guarantees that a B-tree node is not full before recursively inserting at that node. It does this by first determining the subtree into which the insertion will be made (insertions always happen at leaves), and then, if the root of that subtree is full, splitting it before attempting to insert into it. In the special case that the root of the whole tree is full, it is split and one key is moved up to a new root, increasing the height of the tree by 1. The algorithms for search and insert are presented below.
static const int Bt = 2;          // minimum degree of the B-tree
typedef unsigned long diskLoc;
class BTreeNode;
typedef BTreeNode* ChildPtr;

class BTreeNode {
    friend class BTree;
private:
    BTreeNode();
    BTreeNode(Key &, diskLoc);
    virtual ~BTreeNode();
    void reserveSpace();
    diskLoc find(Key &);
    void splitChild(int);
    void insertNonFull(Key &, diskLoc);
    void print(int) const;
    int Nkeys;
    bool leaf;
    ChildPtr * child;
    Key * key;
    diskLoc * location;
};

class BTree {
public:
    BTree();
    BTree(BTreeNode *);
    BTree(istream &);
    virtual ~BTree();
    virtual diskLoc find(Key &);
    virtual void insertKey(Key &, diskLoc);
    virtual void deleteKey(Key &, diskLoc);
    virtual void print() const;
private:
    BTreeNode * root;
};

// A Btree Class
BTree::BTree() : root(0) { }

BTree::BTree(BTreeNode * r) : root(r) { }

BTree::BTree(istream & infile) : root(0) {
    string keyName;
    int i = 0;
    while (infile >> keyName) {
        Key k(keyName);        // assumes Key can be constructed from a string
        insertKey(k, i++);
    }
}

BTree::~BTree() { delete root; }

diskLoc BTree::find(Key & k) {
    if (root == 0) return 0;
    else return root->find(k);
}

void BTree::insertKey(Key & k, diskLoc l) {
    cout << "Inserting: "; k.print(); cout << endl;
    if (root == 0)
        root = new BTreeNode(k, l);
    else if (root->Nkeys < 2*Bt-1)
        root->insertNonFull(k, l);
    else {
        // the root is full: split it under a new root, growing the tree by 1
        BTreeNode * r = root;
        root = new BTreeNode();
        root->leaf = false;
        root->child[0] = r;
        root->splitChild(0);
        root->insertNonFull(k, l);
    }
}

void BTree::deleteKey(Key & k, diskLoc l) { }

void BTree::print() const {
    if (root == 0) cout << "Empty Tree!";
    else root->print(0);
}

// A Btree Node Class
BTreeNode::BTreeNode() : Nkeys(0), leaf(true) {
    reserveSpace();
}

BTreeNode::BTreeNode(Key & k, diskLoc l) : Nkeys(1), leaf(true) {
    reserveSpace();
    key[0] = k;
    location[0] = l;
}

void BTreeNode::reserveSpace() {
    key = new Key[2*Bt-1];
    location = new diskLoc[2*Bt-1];
    child = new ChildPtr[2*Bt];
}

BTreeNode::~BTreeNode() {
    // free the subtrees too, so that deleting the root frees the whole tree
    if (!leaf) {
        for (int i = 0; i <= Nkeys; i++) delete child[i];
    }
    delete [] key;
    delete [] location;
    delete [] child;
}

diskLoc BTreeNode::find(Key & k) {
    int i;
    for (i = 0; i < Nkeys; i++) {
        if (k <= key[i]) break;
    }
    if (i == Nkeys)
        return (leaf ? 0 : child[Nkeys]->find(k));
    else if (k == key[i])
        return location[i];
    else
        return (leaf ? 0 : child[i]->find(k));
}

void BTreeNode::splitChild(int i) {
    BTreeNode * iChild = child[i];
    BTreeNode * newChild = new BTreeNode();
    newChild->leaf = iChild->leaf;
    newChild->Nkeys = Bt - 1;
    // Copy the last Bt-1 keys from iChild to newChild
    for (int j = 0; j < Bt-1; j++) {
        newChild->key[j] = iChild->key[j+Bt];
        newChild->location[j] = iChild->location[j+Bt];
    }
    // copy the corresponding subtrees
    if (!(iChild->leaf)) {
        for (int j = 0; j < Bt; j++)
            newChild->child[j] = iChild->child[j+Bt];
    }
    iChild->Nkeys = Bt - 1;
    // move keys and children to make room for
    // a new pointer at child[i+1]
    for (int j = Nkeys; j > i; j--) {
        child[j+1] = child[j];
        key[j] = key[j-1];
        location[j] = location[j-1];
    }
    child[i+1] = newChild;
    // lift the median key of iChild into this node
    key[i] = iChild->key[Bt-1];
    location[i] = iChild->location[Bt-1];
    Nkeys++;
}

void BTreeNode::insertNonFull(Key & k, diskLoc l) {
    int i;
    for (i = Nkeys-1; i >= 0 && k < key[i]; i--);
    // here k >= key[i] && k < key[i+1] or i = -1
    int insertLoc = i+1;
    cout << "Inserting at Location " << insertLoc << endl;
    if (leaf) {
        for (i = Nkeys-1; i >= insertLoc; i--) {
            key[i+1] = key[i];
            location[i+1] = location[i];
        }
        key[insertLoc] = k;
        location[insertLoc] = l;
        Nkeys++;
    }
    else {
        if (child[insertLoc]->Nkeys == 2*Bt - 1) {
            splitChild(insertLoc);
            if (k > key[insertLoc]) insertLoc++;
        }
        child[insertLoc]->insertNonFull(k, l);
    }
}

void BTreeNode::print(int nDent) const {
    for (int i = 0; i < nDent; i++) {
        cout << " ";
    }
    for (int i = 0; i < Nkeys; i++) {
        key[i].print();
        cout << " ";
    }
    cout << endl;
    if (!leaf) {
        for (int i = 0; i < Nkeys+1; i++) {
            child[i]->print(nDent+1);
        }
    }
}

Here are some examples of this process. First, we examine the behavior of splitChild for minimum degree t = 3 applied to the root.

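[Figure: splitChild applied to a full root holding the keys A D F H L (t = 3). The median key F moves up into a new root, whose two children hold A D and H L.]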
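The key movement performed by splitChild can be sketched on plain vectors. This is only an illustration under simplifying assumptions: keys are single characters, and child pointers and disk locations are ignored. SplitResult and splitFull are hypothetical names, not part of the BTreeNode class above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct SplitResult {
    std::vector<char> left;    // first t-1 keys, which stay in the old node
    char median;               // key lifted into the parent
    std::vector<char> right;   // last t-1 keys, which move to the new node
};

// Split a full node (exactly 2t-1 sorted keys) around its median key.
SplitResult splitFull(const std::vector<char>& keys, std::size_t t) {
    assert(t >= 2 && keys.size() == 2 * t - 1);
    SplitResult r;
    r.left.assign(keys.begin(), keys.begin() + (t - 1));
    r.median = keys[t - 1];
    r.right.assign(keys.begin() + t, keys.end());
    return r;
}
```

Applied to the keys A D F H L with t = 3, the sketch lifts F and leaves A D and H L as the two halves.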


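[Figure: a sequence of insertions into a B-tree with t = 3. Sibling nodes are separated by "|".]

Original:
   root:    G M P X
   leaves:  A C D E | J K | N O | R S T U V | Y Z

B inserted:
   root:    G M P X
   leaves:  A B C D E | J K | N O | R S T U V | Y Z

Q inserted (the full leaf R S T U V is split and T moves up):
   root:    G M P T X
   leaves:  A B C D E | J K | N O | Q R S | U V | Y Z

L inserted (the full root is split and P moves up to a new root):
   root:    P
   level 1: G M | T X
   leaves:  A B C D E | J K L | N O | Q R S | U V | Y Z

F inserted (the full leaf A B C D E is split and C moves up):
   root:    P
   level 1: C G M | T X
   leaves:  A B | D E F | J K L | N O | Q R S | U V | Y Z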

Deletion from a B-tree is more complicated, and the algorithm will be sketched rather than elaborated in code. The basic challenge is to descend into the tree so that only one pass from top to bottom is required. This is done by always ensuring that a node contains at least t keys (the minimum degree), rather than t–1 keys, before descending to it. In some cases, this means that we will need to move a key downward in the tree before descending. In the following pseudo-code, we note that if the root of the tree ever becomes an internal node with no keys, it will be deleted and its only child will become the root of the tree. To delete a key k from a node x:
1. If the key k is in the node x and x is a leaf, simply delete k from x.
2. If the key k is in x and x is an internal node, do the following:
   a. If the child y that precedes k in x has at least t keys, then find the predecessor k' of k in the subtree rooted at y. Recursively delete k' from the subtree and replace k by k' in x. (Finding k' and deleting it can be performed in a single downward pass if we ensure that we always descend to nodes with t keys or more.)
   b. Symmetrically, if the child z that follows k in x has at least t keys, then find the successor k' of k in the subtree rooted at z. Recursively delete k' from the subtree and replace k by k' in x.
   c. Otherwise, if both y and z have only t–1 keys, merge k and all of z into y, so that x loses both k and the pointer to z, and y now has 2t–1 keys. Then free z and recursively delete k from the subtree rooted at y.
3. If the key k is not present in x, determine the root $c_i[x]$ of the subtree that must contain k if k is in the tree at all. If $c_i[x]$ has only t–1 keys, execute step 3a or 3b as appropriate to ensure that we descend to a node with at least t keys, then recursively visit that subtree.
   a. If $c_i[x]$ has only t–1 keys but an immediate sibling has at least t keys, then give $c_i[x]$ an extra key by moving a key from x down to $c_i[x]$, moving a key from the sibling up to x, and moving the appropriate child pointer from the sibling to $c_i[x]$.
   b. If $c_i[x]$ and its immediate siblings all have only t–1 keys, merge $c_i[x]$ with one sibling, which involves moving a key from x down into the new merged node to become the median key for that node.
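The merge used in steps 2c and 3b can be sketched on plain vectors of keys. This is a minimal illustration, assuming character keys and omitting child pointers and disk locations; mergeAround is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Merge sibling z into y around the separator key k taken from the parent.
// Both y and z must hold exactly t-1 keys; the result holds 2t-1 keys.
// Hypothetical helper: keys only, child pointers are omitted.
std::vector<char> mergeAround(const std::vector<char>& y, char k,
                              const std::vector<char>& z, std::size_t t) {
    assert(t >= 2 && y.size() == t - 1 && z.size() == t - 1);
    std::vector<char> merged(y);
    merged.push_back(k);                              // k becomes the median key
    merged.insert(merged.end(), z.begin(), z.end());
    return merged;                                    // 2t-1 keys: a full node
}
```

With t = 3, merging the nodes D E and J K around the separator G yields the full node D E G J K; in step 2c the key G would then be deleted recursively from that merged node.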

Note that when the B-tree deletion procedure operates, it moves down the tree in a single pass, without backing up, except that it may need to revisit a node to replace a key in step 2a or 2b. Note also that the total number of disk operations is O(h), where h is the height of the tree. To illustrate this process, we continue the example used for insertion.


[Figure: a sequence of deletions from the B-tree built above (t = 3). Sibling nodes are separated by "|".]

F deleted, case 1:
   root:    P
   level 1: C G M | T X
   leaves:  A B | D E | J K L | N O | Q R S | U V | Y Z

M deleted, case 2a:
   root:    P
   level 1: C G L | T X
   leaves:  A B | D E | J K | N O | Q R S | U V | Y Z

G deleted, case 2c:
   root:    P
   level 1: C L | T X
   leaves:  A B | D E J K | N O | Q R S | U V | Y Z

D deleted, case 3b (the tree shrinks in height):
   root:    C L P T X
   leaves:  A B | E J K | N O | Q R S | U V | Y Z

B deleted, case 3a:
   root:    E L P T X
   leaves:  A C | J K | N O | Q R S | U V | Y Z


6.4.1.2 Operations on Record Address Lists

In the inverted file structure suggested in this section, each key is associated with a varying-length list of locations, which contains the disk address of each record in the data base having the specified key. The location lists could be quite long and should probably be stored on disk themselves, so that they do not overwhelm the storage associated with B-tree nodes and thereby reduce the effectiveness of the B-tree algorithms. This structure, in which the directory points directly to a list of records with the desired key, is called an inverted file. If complex queries involving intersection and union of key criteria are permitted, there needs to be some way to cut down efficiently on the number of disk accesses associated with records that do not satisfy the query expression. In the case of intersection, the total number of records matching a particular query could be much smaller than the number of records having each key separately. We would like to cut down on the number of records actually fetched from the data base by clever organization of the directory lists. If the lists of locations are maintained in sorted order, the number of records actually fetched can be pared substantially by performing a variant of update merge on the two lists of locations. The merge procedure compares pairs of locations and puts a location in the output list only if it appears in both input lists.
vector<diskLoc> intersect(const vector<diskLoc> & in1, const vector<diskLoc> & in2) {
    vector<diskLoc> out;
    vector<diskLoc>::const_iterator j = in2.begin();
    for (vector<diskLoc>::const_iterator i = in1.begin(); i != in1.end(); i++) {
        // skip the locations in in2 that are smaller than *i
        while (j != in2.end() && *j < *i) j++;
        // Here j == in2.end() || *j >= *i
        if (j == in2.end()) break;
        if (*i == *j) out.push_back(*i);
    }
    return out;
}

This and other update merge procedures take O(m+n) time, where m and n are the sizes of the two location lists.
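Union queries can be handled by the same kind of merge. The following is a minimal sketch, assuming both lists are sorted and free of duplicates; the function name unite is chosen for this illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef unsigned long diskLoc;   // same typedef as in the B-tree code

// Union of two sorted location lists, without duplicates, in O(m+n) time.
std::vector<diskLoc> unite(const std::vector<diskLoc>& in1,
                           const std::vector<diskLoc>& in2) {
    std::vector<diskLoc> out;
    std::size_t i = 0, j = 0;
    while (i < in1.size() && j < in2.size()) {
        if (in1[i] < in2[j])      out.push_back(in1[i++]);
        else if (in2[j] < in1[i]) out.push_back(in2[j++]);
        else { out.push_back(in1[i]); ++i; ++j; }   // equal: emit once
    }
    // At most one of the lists still has elements; copy its tail.
    while (i < in1.size()) out.push_back(in1[i++]);
    while (j < in2.size()) out.push_back(in2[j++]);
    return out;
}
```

Each iteration advances at least one index, which is where the O(m+n) bound comes from.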


6.4.2 Building a Directory
Suppose we begin with a simple file, where each record has a location in the DB and each record has some number of keys. How do we construct the directory, and how long does it take? Here is a rough procedure for doing this:
(For each record) {
    let l be the location of the record;
    (For each searchable key in the record) {
        add the pair (key, l) to list;
    }
}
Sort the (key, loc) pairs using merge sort;
Determine the size of each leaf node;
Fill leaf nodes in sequence, pushing the last key up to the next level of the hierarchy, until all the keys have been assigned a node;
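The sorting and leaf-filling steps can be sketched as follows. This is a simplified illustration, not the procedure itself: keys are assumed to be strings, the leaf size is passed in as a parameter, and the last key of each leaf is copied into a separator list rather than removed from the leaf. Entry and buildLeaves are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

typedef unsigned long diskLoc;
typedef std::pair<std::string, diskLoc> Entry;   // (key, location)

// Sort the (key, location) pairs and cut them into leaves of leafSize
// entries. separators[i] receives the last key of leaf i, the key that
// would be pushed up to the next level of the hierarchy.
std::vector<std::vector<Entry> > buildLeaves(std::vector<Entry> entries,
                                             std::size_t leafSize,
                                             std::vector<std::string>& separators) {
    assert(leafSize >= 1);
    // Pairs compare by key first, then by location.
    std::stable_sort(entries.begin(), entries.end());
    std::vector<std::vector<Entry> > leaves;
    separators.clear();
    for (std::size_t start = 0; start < entries.size(); start += leafSize) {
        std::size_t end = std::min(start + leafSize, entries.size());
        leaves.push_back(std::vector<Entry>(entries.begin() + start,
                                            entries.begin() + end));
        separators.push_back(entries[end - 1].first);
    }
    return leaves;
}
```

Under these assumptions the sort dominates the running time, O(N lg N) for N (key, location) pairs, while filling the leaves is a single linear pass.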

6.5 Union-Find
Suppose we wish to develop a set representation that must carry out three operations:
1. MakeSet(Element * x) makes a singleton set with the element x in it.
2. Union(Element * x, Element * y) takes the two sets represented by x and y and creates the union of the two sets, returning a pointer to the new representative element (representing the union set).
3. Find(Element * x) returns a pointer to the representative element for the set of which x is a member.

How would we use such a representation? Here is an example. Suppose we wish to build an application that determines whether you can travel between two cities entirely on Continental Airlines. The problem is that Continental adds more city pairs each day, so the phone consultants need a fast way to determine whether two cities are connected by a contiguous set of Continental routes. Note that there may be thousands (even hundreds of thousands) of cities in the database. The Union-Find structure satisfies this need because it provides simple ways to enter new cities and new legs into the system: each time a new city is added, MakeSet is invoked on that city, and each time a new leg is added to the schedule, a Union is performed. To determine whether a Continental route between two cities exists, we perform Find on each city and see if they have the same representative. Later I will present a more complicated example from the theory of compilation.

6.5.1 Simple List Representation
We begin with a simple approach, in which each set is represented by a single representative element and each element of the set points directly to that representative.
class Element {
private:
    Element * parent;
    Element * next;
    int size;
public:
    ...
    void MakeSet() { parent = this; size = 1; next = NULL; }
    void Union(Element * y) { Link(Find(), y->Find()); }
    void Link(Element * x, Element * y) {
        if (x == y) return;   // already in the same set
        Element * big = y;
        Element * small = x;
        if (x->size > y->size) { big = x; small = y; }
        big->size += small->size;
        // insert all of small after the first elt of big
        Element * rest = big->next;
        big->next = small;
        Element * e = small;
        Element * last = big;
    L1: while (e != NULL) {
            e->parent = big;
            last = e;
            e = e->next;
        }
        last->next = rest;
    }
    Element * Find() { return parent; }
};

The basic idea behind this is to have each element start out pointing to itself. Whenever a union is done, all of the elements in the smaller list are merged into the bigger one right after the head, which is also the representative of the larger list.
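The airline example from the start of this section can be sketched with the same weighted-union idea. The version below is an assumption-laden simplification: cities are numbered 0, 1, 2, ... and the member lists are std::vectors rather than linked Elements. The class name Routes and its methods are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// City sets kept as member lists; rep[c] names the set that contains city c.
class Routes {
public:
    // MakeSet: a new city starts in a set of its own; returns its index.
    std::size_t addCity() {
        std::size_t c = rep.size();
        rep.push_back(c);
        members.push_back(std::vector<std::size_t>(1, c));
        return c;
    }
    // Union: a new leg joins the sets of a and b, relabeling the smaller set.
    void addLeg(std::size_t a, std::size_t b) {
        assert(a < rep.size() && b < rep.size());
        std::size_t big = rep[a], small = rep[b];
        if (big == small) return;                       // already connected
        if (members[big].size() < members[small].size()) std::swap(big, small);
        for (std::size_t k = 0; k < members[small].size(); ++k) {
            std::size_t c = members[small][k];
            rep[c] = big;                               // corresponds to loop L1
            members[big].push_back(c);
        }
        members[small].clear();
    }
    // Find on both cities, then compare the representatives.
    bool connected(std::size_t a, std::size_t b) const {
        return rep[a] == rep[b];
    }
private:
    std::vector<std::size_t> rep;
    std::vector<std::vector<std::size_t> > members;
};
```

A query is two array lookups, while each new leg relabels only the cities of the smaller set.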

The key observation is that if an element is visited for the kth time in loop L1, there must be at least $2^k$ elements in the resulting set. This is easy to see by induction: an element is visited only when it belongs to the smaller set, so each visit at least doubles the size of the set that contains it. Hence the total number of visits for any element is at most $\lceil \lg n \rceil$. In other words, the costs of MakeSet and Find are constant, while the total cost of all unions is bounded by n lg n. Hence the total cost for a mix of m operations is O(m + n lg n).

6.5.2 Disjoint-Set Forests
We now turn to the development of a faster algorithm. Building on the parent pointer idea, we will reduce the cost of union at the expense of a higher cost for find.
class Element {
private:
    Element * parent;
    int rank;
public:
    ...
    void MakeSet() { parent = this; rank = 0; }
    void Union(Element * y) { Link(Find(), y->Find()); }
    void Link(Element * x, Element * y) {
        if (x == y) return;   // already in the same set
        // union by rank: the root of smaller rank points to the other root
        if (x->rank > y->rank) y->parent = x;
        else {
            x->parent = y;
            if (x->rank == y->rank) y->rank += 1;
        }
    }
    Element * Find() {
        // path compression: point directly at the root on the way back
        if (this != parent) parent = parent->Find();
        return parent;
    }
};

6.5.2.1 Analysis

Let us analyze the complexity of this algorithm.

Lemma 6.2. If the rank of a root node is r, then the subtree rooted at that node contains at least $2^r$ nodes.
Proof. By induction on r.
Basis: If r = 0 then the tree contains exactly one element. Since $2^0 = 1$, the basis is established.
Induction: The rank of a node can be changed only if the ranks of the two roots being combined into a single tree are equal. If this is so, each subtree has rank r–1 and, by the induction hypothesis, each of these trees must have at least $2^{r-1}$ vertices. Hence the merged tree has at least $2^r$ vertices, establishing the lemma. QED.

Lemma 6.3. There are no more than $n/2^r$ nodes of rank r.
Proof. By Lemma 6.2, each node of rank r heads a subtree of at least $2^r$ nodes, and the subtrees of distinct nodes of rank r are disjoint. Assume that there are $k > n/2^r$ nodes of rank r. Then the subtrees rooted at these nodes have at least $k2^r$ nodes in total, which is greater than n, a contradiction. QED.

Lemma 6.4. No vertex can have rank greater than lg n.
Proof. Assume there exists a vertex with rank r > lg n. By Lemma 6.3 the number of nodes of rank r can be no more than

$$\frac{n}{2^r} < \frac{n}{2^{\lg n}} = 1.$$

Thus there must be fewer than one node with this rank. QED.

Since ranks are integers, no rank exceeds $\lfloor \lg n \rfloor$. A corollary of this is that the height of the tree is no more than lg n.

Now we introduce the function F(i), defined as follows:

$$F(0) = 1, \qquad F(i) = 2^{F(i-1)} \text{ for } i \ge 1.$$

Thus we can set up a table of values of n and F(n):

TABLE 3. Sample Values for F(n)

   n     F(n)
   0     1
   1     2
   2     4
   3     16
   4     65536
   5     2^65536

Clearly, this function grows very rapidly. Consider its functional inverse G(n):
   G(1) = 0
   G(2) = 1
   G(4) = 2
   G(16) = 3
   G(65536) = 4
   G(2^65536) = 5

In the literature, G(n) is known as lg* n because it is the number of times you have to take the log (base 2) of a number to produce the value 1. This function can be extended to other values in a straightforward way.
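Both functions are easy to compute for arguments that fit in a machine word. The sketch below assumes 64-bit unsigned integers, so F is limited to arguments up to 4, and it takes G(n) to be the smallest k with F(k) ≥ n, which agrees with the values listed above. The names F and G follow the text; everything else is an assumption of this illustration.

```cpp
#include <cassert>
#include <cstdint>

// F(0) = 1, F(i) = 2^F(i-1). Only i <= 4 fits in 64 bits (F(5) = 2^65536).
std::uint64_t F(int i) {
    assert(i >= 0 && i <= 4);
    std::uint64_t value = 1;
    for (int step = 0; step < i; ++step)
        value = std::uint64_t(1) << value;   // 2 raised to the previous value
    return value;
}

// G(n): the smallest k with F(k) >= n, for n >= 1.
int G(std::uint64_t n) {
    assert(n >= 1);
    int k = 0;
    while (k < 4 && F(k) < n) ++k;
    // Any 64-bit n above F(4) = 65536 is still below F(5) = 2^65536.
    return (F(k) >= n) ? k : 5;
}
```

Even for the largest 64-bit argument the result is only 5, which is why G(n) can be treated as a small constant for any practical n.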
Extended values of G(n):

   G(1) = 0
   G(2) = 1
   G(3-4) = 2
   G(5-16) = 3
   G(17-65536) = 4

This divides the integers into groups according to their values of G.

Theorem 6.4. A sequence of m MakeSet, Union, and Find operations on n elements takes no more than O(mG(n)) time.
Proof. Clearly, a MakeSet and the non-Find portion of a Union take constant time. Hence, we need only consider the time to perform Finds.


Suppose we partition the nodes into rank groups such that every vertex of rank r is put into group G(r).

   ranks              group
   0, 1               G(1) = 0
   2                  G(2) = 1
   3, 4               G(4) = 2
   5-16               G(16) = 3
   ...
   up to ⌊lg n⌋       G(⌊lg n⌋)

Note that $G(n) = G(2^{\lg n}) = G(\lg n) + 1 \ge G(\lfloor \lg n \rfloor) + 1$.

Hence, we have rank groups 0...G(n) – 1. We will use a bookkeeping trick to account for Finds. Consider each edge between a vertex v and its parent that a Find traverses.
1. If v and its parent are in different rank groups or the parent of v is the root, then charge 1 unit to the find.
2. If v and its parent are in the same rank group and the parent is not the root, charge one unit to the vertex.

This has the following implications.
1. Since there are no more than G(n) rank groups, no find instruction is charged more than G(n). Hence, the total charge for O(m) finds is O(mG(n)).
2. Consider the vertices. A vertex is charged one unit if its parent is not the root and it is in the same rank group as its parent. But then it is moved by path compression and gets a parent of a higher rank. Each time a vertex is charged, it is moved. How many times can a vertex be moved within the same rank group? This is bounded by the number of ranks in its rank group g. Note that G(i) is the smallest k such that F(k) ≥ i. Thus F(g) is the largest rank in group g and F(g–1)+1 is the smallest. Hence, the total number of ranks in group g is F(g) – F(g–1). This is the maximum number of units that can be assigned to any vertex before it acquires a parent in a higher group.


Now consider rank group g. How many vertices can have a rank i such that G(i) = g? By Lemma 6.3, the number N(g) of such vertices satisfies

$$N(g) \le \sum_{i=F(g-1)+1}^{F(g)} \frac{n}{2^i} \le \frac{n}{2^{F(g-1)+1}} \sum_{i=0}^{\infty} \frac{1}{2^i} \le \frac{n}{2^{F(g-1)}} \le \frac{n}{F(g)},$$

where the last step uses $F(g) = 2^{F(g-1)}$. Since the maximum charge to any vertex is F(g) – F(g–1), the total charge to vertices in group g is less than or equal to

$$\frac{n}{F(g)} \bigl( F(g) - F(g-1) \bigr) \le n.$$

Since there are no more than G(n) rank groups, the total charge to vertices is at most nG(n). Since m ≥ n, the total cost is O(mG(n)). QED.

6.5.2.2 An Example

Let us now consider how we might apply this to a real computer science problem. The language Fortran has the ability to perform EQUIVALENCE operations, which look like this:
EQUIVALENCE (A(1), D(101))   // A offset 100 from D
EQUIVALENCE (B(10), C(20))   // B offset 10 from C
EQUIVALENCE (A(10), B(1))    // A offset -9 from B
EQUIVALENCE (A(1), C(11))    // Error

We would like to determine whether there are any multiple equivalence errors and then determine a base array and offset for each array that can be equivalenced to it directly or indirectly.
class EqArray {
private:
    EqArray * parent;
    int rank;
    int offset;
public:
    void Declare() { parent = this; rank = 0; offset = 0; }
    EqArray * FindBase() {
        if (this != parent) {
            EqArray * p = parent;
            parent = parent->FindBase();
            // p->offset is now relative to the base, so add it in
            offset += p->offset;
        }
        return parent;
    }
    void Equivalence(EqArray * y, int delta) {
        EqArray * xBase = FindBase();
        EqArray * yBase = y->FindBase();
        if (xBase == yBase) Error();   // both arrays already share a base
        else {
            int diffBase = offset - y->offset - delta;
            Link(xBase, yBase, diffBase);
        }
    }
    void Link(EqArray * x, EqArray * y, int diff) {
        if (x->rank > y->rank) {
            y->parent = x;
            y->offset = -diff;
        }
        else {
            x->parent = y;
            x->offset = diff;
            if (x->rank == y->rank) y->rank += 1;
        }
    }
};
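The four EQUIVALENCE statements above can be traced with a self-contained sketch of the same idea. It departs from the class above in ways that are assumptions of this illustration: arrays are numbered 0 to n–1, the sign convention is that offset[a] is the start of a minus the start of its base, and a repeated equivalence is rejected only when it contradicts the earlier ones. The class name Equivalences is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// After find(a), start(a) == start(base of a) + offset[a].
// This sign convention is chosen for the sketch.
class Equivalences {
public:
    explicit Equivalences(std::size_t n) : parent(n), rank(n, 0), offset(n, 0) {
        for (std::size_t i = 0; i < n; ++i) parent[i] = i;
    }
    // Find with path compression; offsets are accumulated along the path.
    std::size_t find(std::size_t a) {
        if (parent[a] != a) {
            std::size_t p = parent[a];
            parent[a] = find(p);       // offset[p] is now relative to the base
            offset[a] += offset[p];
        }
        return parent[a];
    }
    // Record EQUIVALENCE (x(i), y(j)), i.e. start(x) + i == start(y) + j.
    // Returns false if this contradicts the statements recorded so far.
    bool equivalence(std::size_t x, long i, std::size_t y, long j) {
        std::size_t xb = find(x), yb = find(y);
        long delta = j - i;            // required start(x) - start(y)
        if (xb == yb) return offset[x] - offset[y] == delta;
        long diff = delta - offset[x] + offset[y];   // start(xb) - start(yb)
        if (rank[xb] > rank[yb]) { parent[yb] = xb; offset[yb] = -diff; }
        else {
            parent[xb] = yb; offset[xb] = diff;
            if (rank[xb] == rank[yb]) rank[yb] += 1;
        }
        return true;
    }
private:
    std::vector<std::size_t> parent;
    std::vector<int> rank;
    std::vector<long> offset;
};
```

Numbering A, B, C, D as 0, 1, 2, 3, the first three statements are accepted and the fourth is rejected, because the first three already place A one element after the start of C rather than ten.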
