# West University of Timisoara Faculty of Mathematics and Computer Science Department of Computer Science Specialization Computer Science in English

BACHELOR THESIS
A C++ tree library

Scientific Coordinator: Lector. Dr. Stelian Mihalaş

Timişoara, 2011

ABSTRACT
The main goal of this thesis was the implementation of a tree library in C++, which was subsequently used in an application implementing an embedding algorithm for rooted trees. The library organizes data in the form of a so-called n-ary tree, that is, a tree in which every node is connected to an arbitrary number of child nodes. Nodes that share the same parent are called "siblings", while the nodes directly below a given node are called its "children". At the top of the tree there is a set of nodes characterised by the fact that they do not have any parent. The collection of these nodes is called the "head" of the tree or of the forest. The thesis is structured in six chapters, each contributing to a brief analysis of the tree library. A concise definition of trees is given, different types of trees are analyzed, and it is shown how a tree can be implemented. An overview of the algorithm and of how it was created is also offered. The thesis describes all the methods provided by the library and how they are used. An application which uses this library is described in chapter 6. Finally, a short summary is presented and further developments are suggested.

CONTENT

1 INTRODUCTION
1.1 The purpose of this library
1.2 The Visual Studio 2010 development environment
1.3 Using the library

2 TREES, CLASSIFICATION AND REPRESENTATION
2.1 Trees
2.2 Tree representations
2.3 Binary trees
2.4 Types of binary trees

3 THE CLASS HIERARCHY
3.1 Inheritance, classes and subclasses
3.2 The Tree class hierarchy
3.3 The Tree class
3.3.2 General description
3.4 The TreeNode class
3.4.1 The class diagram
3.4.2 General description
3.5 The RootedTree class
3.5.1 The class diagram
3.5.2 General description
3.6 The BinaryTree class
3.6.1 The class diagram
3.6.2 General description

4 ITERATORS
4.1 Iterators
4.2 The Base Iterator Class
4.2.1 The class diagram
4.2.2 General description
4.3 The PreOrderIter Class
4.3.1 The Class Diagram
4.3.2 General Description
4.4 The PostOrderIter Class
4.4.1 The Class Diagram
4.4.2 General Description
4.5 The SiblingIter Class
4.5.1 The Class Diagram
4.5.2 General Description

5 SERIALIZATION
5.1 The Graphml File Format
5.1.1 Functional description
5.2 The Serialization Method
5.2.1 Functional Description
5.2.2 The method code
5.3 The Text File Used for Input

6 AN APPLICATION USING THE TREE LIBRARY
6.1 The Binary Embedding Application
6.2 Classes and methods used

7 CONCLUSIONS AND FURTHER DEVELOPMENTS

8 BIBLIOGRAPHY

1 INTRODUCTION

1.1 The purpose of this library

I chose and treated this subject with great pleasure and interest because it is broad and at the same time applicable to many programming problems. I also think that data represented as a tree is well structured, because trees store data in a hierarchical manner.

A tree library is a collection of methods that together provide a way to store and organise data. The tree library for C++ provides an STL-like container class for n-ary trees, templated over the data stored at the nodes.

Trees are an easy way to represent information which can be hierarchically subdivided, such as organizational charts, directory structures and design spaces. They are also convenient for conceptualizing algorithms. Various types of trees, like binary search trees, B-trees and AVL-trees, are common data structures. In the last ten years or so there have been many papers which discuss algorithms for aesthetically laying out hierarchical trees, though few of them are intended to do so dynamically.

Trees play a significant role in the organization of data for efficient information retrieval and are ideal candidates for fast searches, insertions, deletions, and sequential access. The major advantage of trees over other data structures is that the related sorting and search algorithms, such as in-order traversal, can be very efficient. Efficiency and speed are possible because trees "spread out" the data they store, so that different paths in the tree lead quickly to the relevant data. So for data stored in a tree, searching and sorting will at best be much faster than with a linked list, stack, etc.; at worst they will be as effective as with a linked list.

Trees are often a natural choice (e.g. when writing a chess program), but they are also effective in many situations where data that is received sequentially can be beneficially stored in a more organised fashion, e.g. parsing mathematical expressions, parsing sentences, or reading words that later need to be searched or retrieved in alphabetical order.

Visual Studio's refactoring features let you quickly improve the organization of your code while you're coding. For example, the Rename refactoring allows you to change an identifier name where it is defined, which also changes every place in the program that references that identifier. Visual Studio 2010 introduces even more features, such as a call hierarchy, which lets you see the call paths in your code; snippets, which allow you to type an abbreviation that expands to a code template; and action lists for automatically generating new code.

A plethora of tools are available to aid you in your quest to rapidly create quality software. You have the Toolbox jam-packed with controls, a Server Explorer for working with operating system services and databases, a Solution Explorer for working with your projects, testing utilities, and visual designers.

The rich and customizable development environment in Visual Studio helps you work the way you want to. You can customize many parts of the Visual Studio environment, including colors, editor options, and layout. The options are so extensive that you'll need to know where to look to find them all. If the out-of-the-box development environment doesn't offer a feature you need, you can write your own macros to automate a series of tasks you find yourself repeating. For more sophisticated customization, it exposes an application programming interface (API) for creating add-ins and extensions. Several third-party companies have chosen to integrate their own applications with Visual Studio. For example, Embarcadero's Delphi language and development environment is hosted in Visual Studio.

1.3 Using the library

A library is simply a collection of prewritten routines that supports and extends the language in which it is written, by providing standard code units that the developer can incorporate into programs to carry out common operations. The operations implemented by routines in the various libraries greatly enhance productivity by saving the effort of writing and testing the code for such operations.

It is very convenient to use a tree library, firstly because it stores data in an organized way and secondly because it is easy to implement and to follow. In this case, the library was created to serve the Binary Embedding for Hierarchical Taxonomies algorithm. The library is a tool for this algorithm, and it plays an important role in its implementation.

2 TREES, CLASSIFICATION AND REPRESENTATION

2.1 Trees

A tree is a widely used data structure that emulates a hierarchical tree structure with a set of linked nodes. Trees are structurally more complex than lists, so they have a more extensive nomenclature for referring to their various subparts. Generally speaking, the names are direct analogies drawn from two sources: the parts of arboreal trees and genealogical relationships.

The root of a tree data structure is the most important single point of the tree structure, with all other points defined relative to the root. By definition a (nonempty) tree has only one root (e.g., CEO, tournament winner, or founding father), and that root is the reference point for the entire tree. Computer scientists emphasize this aspect by drawing trees "upside down," thus accenting the all-important root at the top.

The junction of two branches is a node. Although this biological term is less well known than the others, it is one of the most important for data structures. In a tree structure, data is associated with, or stored at, the node. Conceptually, the root is a node, as are all of its subordinates, which are below it in the tree.

Each node in a tree has zero or more child nodes. A node that has a child is called the child's parent node. A node has at most one parent: each node other than the root has exactly one predecessor, and each node has at most a particular number of successors. If there is a maximum number N of successors for a node, then the tree is called an N-ary tree. In particular, a binary tree is a tree in which each node has either 0, 1 or 2 successors. If there is no limit on the number of successors that a node can have, the tree is called a general tree. A node with no successors is called a leaf, and there will usually be many leaves in a tree. A free tree is a tree that is not rooted.

The branches, or edges, of a tree connect two nodes together. Schematically, they are drawn as lines; conceptually, they represent the logical relationship between two nodes (e.g., parent-child, supervisor-subordinate, winner of the game, etc.). Branch is a better word than edge because branches are directional: one end is the superior and one the subordinate. The further one gets from the root, the more numerous (and less significant) are the branches. The smallest (farthest from the root) branches are sometimes called twigs.

A series of branches connecting a parent to its child to the grandchild and so on is called a path. The number of branches on a single path is called its pathlength. The longest path (including root and leaf) dictates the height of the tree: for nonempty trees, the height is one more than the longest pathlength. A tree with just one node (and no path) has height 1, while the empty tree has height zero. For example, the World Series has height 4, but only three rounds in the tournament, or branches in the path.

The level of a node is the number of branches on the path from the root to that node. Note that this means that the root is at level 0 and, in a tree of height h, the furthest leaves are at level h - 1. Beware of one side effect of this convention in some mathematical derivations: a node at a higher numeric level is visually located below those with lower levels.

An internal node or inner node is any node that has child nodes and is therefore not a leaf. Similarly, an external node or outer node is any node that does not have child nodes and is therefore a leaf.

A subtree of a tree T is a tree consisting of a node in T and all of its descendants in T. The subtree corresponding to the root node is the entire tree; the subtree corresponding to any other node is called a proper subtree.

A forest is a set (usually an ordered set) of zero or more disjoint trees. Put another way, the nodes of a tree excluding the root form a forest. There is very little distinction between abstract forests and trees. If we delete the root of a tree, we have a forest; conversely, if we add just one node to any forest and regard the trees of the forest as subtrees of the new node, we get a tree. Therefore the words tree and forest are often used almost interchangeably during informal discussions about data structures.

The children of a node are usually ordered from left to right. If we wish explicitly to ignore the order of children, we shall refer to a tree as an unordered tree. The "left-to-right" ordering of siblings (children of the same node) can be extended to compare any two nodes that are not related by the ancestor-descendant relationship. The relevant rule is that if a and b are siblings, and a is to the left of b, then all the descendants of a are to the left of all the descendants of b. A simple rule, given a node n, for finding those nodes to its left and those to its right, is to draw the path from the root to n. All nodes branching off to the left of this path, and all descendants of such nodes, are to the left of n. All nodes and descendants of nodes branching off to the right are to the right of n.

There are several useful ways in which we can systematically order all nodes of a tree. The three most important orderings are called preorder, inorder and postorder; these orderings are defined recursively as follows.

- If a tree T is null, then the empty list is the preorder, inorder and postorder listing of T.
- If T consists of a single node, then that node by itself is the preorder, inorder and postorder listing of T.
- Otherwise, T has a root n with subtrees T1, ..., Tk, and each listing is built from the listings of the subtrees: the preorder listing places n before the listings of T1, ..., Tk, the postorder listing places n after them, and the inorder listing places n after the listing of T1 and before those of T2, ..., Tk.

2.2 Tree representations

There are many different ways to represent trees; common representations represent the nodes as records allocated on the heap with pointers to their children, their parents, or both, or as items in an array, with relationships between them determined by their positions in the array.

Drawing a tree has some rules. The root is drawn at the top, and below it are its children. An arc connects a node to each of its children. Then we continue in the same manner: the children of each node are drawn below the node.

Fig. 2.1

In general, each child of a node is the root of a tree within the big tree. For example, B is the root of a little tree (B, D, E), and so is C. These inner trees are called subtrees. The subtrees of a node are the trees whose roots are the children of the node.

If this is a family tree, there may be some significance to left and right: maybe the left child is younger than the right, or (as is the case here) maybe the left child has the name that is earlier in the alphabet. Then the tree is ordered and we are not free to move the subtrees around. In general, a tree is ordered if the order of the subtrees is significant. On the other hand, there could be no significance to left and right. In this case the tree is unordered, and we could redraw the tree exchanging subtrees without affecting the meaning of the tree.

Fig. 2.2

A path in a tree is any linear subset of a tree. For example, A-B-E and C-F are paths. The length of a path can be counted either as the number of nodes on the path or as the number of branches between them; counting nodes, A-B-E has length 3. There is a unique path from the root to any node. Simple as this property seems, it is extremely important: all algorithms for processing trees depend upon it. The depth or level of a node is the length of this path. The depth or height of a tree is the maximum depth of the nodes in the tree.
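The notions of height and level can be made concrete with a short sketch. The `Node` type and the functions `height`, `level` and `makeExample` below are hypothetical names chosen for illustration, not part of the library described in this thesis. The example tree (A with children B and C, B with children D and E, C with child F) is an assumption inferred from the paths A-B-E and C-F mentioned in the text. The sketch counts height in nodes (a single node has height 1) and level in branches (the root is at level 0).

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical n-ary node used only for this illustration.
struct Node {
    std::string label;
    std::vector<std::unique_ptr<Node>> children;  // ordered left to right

    explicit Node(std::string l) : label(std::move(l)) {}

    // Appends a child and returns a non-owning pointer to it.
    Node* add(std::string l) {
        children.push_back(std::make_unique<Node>(std::move(l)));
        return children.back().get();
    }
};

// Height counted in nodes on the longest root-to-leaf path:
// empty tree -> 0, single node -> 1.
int height(const Node* n) {
    if (n == nullptr) return 0;
    int best = 0;
    for (const auto& c : n->children) best = std::max(best, height(c.get()));
    return 1 + best;
}

// Level of the first node carrying `target`, counted in branches from the
// root (root -> 0). Returns -1 if no node carries that label.
int level(const Node* n, const std::string& target, int current = 0) {
    if (n == nullptr) return -1;
    if (n->label == target) return current;
    for (const auto& c : n->children) {
        int found = level(c.get(), target, current + 1);
        if (found >= 0) return found;
    }
    return -1;
}

// Assumed example shape: A -> (B, C), B -> (D, E), C -> (F).
std::unique_ptr<Node> makeExample() {
    auto a = std::make_unique<Node>("A");
    Node* b = a->add("B");
    Node* c = a->add("C");
    b->add("D");
    b->add("E");
    c->add("F");
    return a;
}
```

With these conventions the deepest level in a nonempty tree is always one less than the height, because the height counts nodes while the level counts branches.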

One way to implement a tree would be to have in each node, besides its data, a pointer to each child of the node. However, since the number of children per node can vary so greatly and is not known in advance, it might be infeasible to make the children direct links in the data structure, because there would be too much wasted space. The solution is simple: keep the children of each node in a linked list of tree nodes. The following declaration is typical.

    typedef struct tree_node *tree_ptr;

    struct tree_node
    {
        element_type element;    /* data stored in the node */
        tree_ptr first_child;    /* leftmost child, or NULL for a leaf */
        tree_ptr next_sibling;   /* next child of the same parent, or NULL */
    };

Fig. 2.3

Figure 2.3 shows how a tree might be represented in this implementation. Arrows that point downward are first_child pointers. Arrows that go left to right are next_sibling pointers. Null pointers are not drawn, because there are too many. In the tree of Figure 2.3, node E has both a pointer to a sibling (F) and a pointer to a child (I), while some nodes have neither.

2.3 Binary trees

A binary tree is made of nodes, where each node contains a "left" pointer, a "right" pointer, and a data element. The "root" pointer points to the topmost node in the tree. The left and right pointers recursively point to smaller "subtrees" on either side. A null pointer represents a binary tree with no elements, that is, the empty tree.
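A binary tree node with "left" and "right" pointers, as described above, might be declared along the following lines in C++. This is only a minimal sketch: `BinaryNode`, `BinaryTree`, `countNodes` and `makeSmallTree` are hypothetical names, and the use of `std::unique_ptr` for ownership is a choice made here for illustration rather than something the text prescribes.

```cpp
#include <cassert>
#include <memory>

// Hypothetical binary tree node; names are for illustration only.
struct BinaryNode {
    int data;
    std::unique_ptr<BinaryNode> left;   // null means "empty left subtree"
    std::unique_ptr<BinaryNode> right;  // null means "empty right subtree"

    explicit BinaryNode(int d) : data(d) {}
};

// A tree is identified with the pointer to its root; nullptr is the empty tree.
using BinaryTree = std::unique_ptr<BinaryNode>;

// Counts nodes by following the recursive structure of the tree:
// an empty tree has 0 nodes, otherwise 1 plus the sizes of both subtrees.
int countNodes(const BinaryNode* root) {
    if (root == nullptr) return 0;
    return 1 + countNodes(root->left.get()) + countNodes(root->right.get());
}

// Builds the three-node tree with root 2, left child 1 and right child 3.
BinaryTree makeSmallTree() {
    auto root = std::make_unique<BinaryNode>(2);
    root->left = std::make_unique<BinaryNode>(1);
    root->right = std::make_unique<BinaryNode>(3);
    return root;
}
```

Because every subtree is itself a (possibly empty) binary tree, functions such as `countNodes` need only one base case, the null pointer.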

The formal recursive definition is: a binary tree is either empty (represented by a null pointer), or is made of a single node, where the left and right pointers each point to a binary tree.

Familiar examples of binary trees are the family tree (pedigree) with a person's father and mother as descendants, the history of a tennis tournament with each game being a node denoted by its winner and the two previous games of the combatants as its descendants, or an arithmetic expression with dyadic operators, with each operator denoting a branch node with its operands as subtrees.

The root node of a binary tree is the node with no parents. There is at most one root node in a rooted tree, and a leaf has no children. Siblings in a binary tree are nodes that share the same parent node. A node p is an ancestor of a node q if it exists on the path from q to the root; the node q is then termed a descendant of p. The in-degree of a node is the number of edges arriving at that node, and the out-degree of a node is the number of edges leaving that node.

Binary trees have several properties. In the formulas below, h is the height of the tree counted in branches on the longest root-to-leaf path, so a tree consisting of a single node has h = 0.

- The number of nodes n in a perfect binary tree is $n = 2^{h+1} - 1$.
- The number of nodes n in a perfect binary tree is also $n = 2L - 1$, where L is the number of leaf nodes in the tree.
- The number of leaf nodes L in a perfect binary tree is $L = 2^{h}$.
- The number of nodes n in a complete binary tree is at least $n = 2^{h}$ and at most $n = 2^{h+1} - 1$.
- The number of null links in a complete binary tree with n nodes is $n + 1$.
- The number of leaf nodes in a complete binary tree with n nodes is $\lceil n/2 \rceil$.
- For any non-empty binary tree with $n_0$ leaf nodes and $n_2$ nodes of degree 2, $n_0 = n_2 + 1$.
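The counting formulas for perfect binary trees can be checked mechanically. The sketch below builds a perfect tree of a given height and tallies its nodes, leaves, two-child nodes and null links. All names (`PNode`, `makePerfect`, `Counts`, `tally`, `checkPerfectFormulas`) are hypothetical, and the height h is assumed to be counted in branches, so that h = 0 is a single node.

```cpp
#include <cassert>
#include <memory>

// Node without payload; only the shape of the tree matters here.
struct PNode {
    std::unique_ptr<PNode> left, right;
};

// Builds a perfect binary tree whose height, counted in branches, is h.
std::unique_ptr<PNode> makePerfect(int h) {
    assert(h >= 0);
    auto n = std::make_unique<PNode>();
    if (h > 0) {
        n->left = makePerfect(h - 1);
        n->right = makePerfect(h - 1);
    }
    return n;
}

struct Counts {
    long nodes = 0;      // n
    long leaves = 0;     // n0, also written L
    long twoChild = 0;   // n2
    long nullLinks = 0;  // number of null left/right pointers
};

// Accumulates the counts for the subtree rooted at n.
void tally(const PNode* n, Counts& c) {
    if (n == nullptr) return;
    ++c.nodes;
    const int kids = (n->left ? 1 : 0) + (n->right ? 1 : 0);
    if (kids == 0) ++c.leaves;
    if (kids == 2) ++c.twoChild;
    c.nullLinks += 2 - kids;
    tally(n->left.get(), c);
    tally(n->right.get(), c);
}

// Returns true if a perfect tree of height h satisfies the stated formulas.
bool checkPerfectFormulas(int h) {
    Counts c;
    auto root = makePerfect(h);
    tally(root.get(), c);
    const long pow2h = 1L << h;
    return c.nodes == 2 * pow2h - 1        // n = 2^(h+1) - 1
        && c.leaves == pow2h               // L = 2^h
        && c.nodes == 2 * c.leaves - 1     // n = 2L - 1
        && c.nullLinks == c.nodes + 1      // null links = n + 1
        && c.leaves == c.twoChild + 1;     // n0 = n2 + 1
}
```

For example, a perfect tree with h = 2 has 7 nodes, 4 leaves and 8 null links, which agrees with each formula.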

Operations with binary trees

Tree traversal

One of the most important operations on a binary tree is traversal. Tree traversal is the process of visiting each node in the tree exactly one time. Traversal may be interpreted as putting all nodes on one line or linearizing a tree.

The definition of traversal specifies only one condition, visiting each node only one time, but it does not specify the order in which the nodes are visited. Hence, there are as many tree traversals as there are permutations of nodes; for a tree with n nodes, there are n! different traversals. Most of them, however, are rather chaotic and do not indicate much regularity, so that implementing such traversals lacks generality: for each n, a separate set of traversal procedures must be implemented, and only a few of them can be used for a different number of data.

Fig. 2.4

For example, two possible traversals of the tree in Fig. 2.4 that may be of some use are the sequence 2, 10, 12, 20, 13, 25, 29, 31 and the sequence 29, 31, 20, 12, 2, 25, 10, 13. The first sequence lists even numbers and then odd numbers in ascending order. The second sequence lists all nodes from level to level right to left, starting from the lowest level up to the root. The sequence 13, 31, 12, 2, 10, 29, 20, 25 does not indicate any regularity in the order of numbers or in the order of the traversed nodes. It is just a random jumping from node to node that in all likelihood is of no use. Nevertheless, all these sequences are the results of three legitimate traversals out of 8! = 40,320.

Breadth-first traversal is visiting each node starting from the highest (or lowest) level and moving down (or up) level by level, visiting nodes on each level from left to right (or from right to left). There are thus four possibilities, and one such possibility, a top-down, left-to-right breadth-first traversal of the tree in Fig. 2.4, results in the sequence 13, 10, 25, 2, 12, 20, 31, 29.

Implementation of this kind of traversal is straightforward when a queue is used. Consider a top-down, left-to-right breadth-first traversal. After a node is visited, its children, if any, are placed at the end of the queue, and the node at the beginning of the queue is visited. Considering that for a node on level n its children are on level n+1, by placing these children at the end of the queue, they are visited after all nodes from level n are visited. Thus, the restriction that all nodes on level n must be visited before visiting any nodes on level n+1 is accomplished.

Depth-first traversal proceeds as far as possible to the left (or right), then backs up until the first crossroad, goes one step to the right (or left), and again as far as possible to the left (or right). We repeat this process until all nodes are visited. This definition, however, does not clearly specify exactly when nodes are visited: before proceeding down the tree or after backing up. There are some variations of the depth-first traversal, built from three tasks:

V: visiting a node
L: traversing the left subtree
R: traversing the right subtree

An orderly traversal takes place if these tasks are performed in the same order for each node. The three tasks can themselves be ordered in 3! = 6 ways, so there are six possible ordered depth-first traversals:

VLR  VRL
LVR  RVL
LRV  RLV

If the number of different orders still seems like a lot, it can be reduced to three traversals where the move is always from left to right and attention is focused on the first column. The three traversals are given these standard names:

VLR: preorder tree traversal
LVR: inorder tree traversal
LRV: postorder tree traversal
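The queue-based breadth-first traversal and the three standard depth-first traversals can be sketched as follows. The names `BNode`, `breadthFirst`, `depthFirst`, `Order` and `makeExampleTree` are hypothetical. The example tree is an assumption: it is the binary search tree whose top-down, left-to-right breadth-first order is 13, 10, 25, 2, 12, 20, 31, 29, which is the sequence quoted in the text.

```cpp
#include <cassert>
#include <memory>
#include <queue>
#include <vector>

struct BNode {
    int key;
    std::unique_ptr<BNode> left, right;
    explicit BNode(int k) : key(k) {}
};

// Top-down, left-to-right breadth-first traversal using a queue.
std::vector<int> breadthFirst(const BNode* root) {
    std::vector<int> out;
    std::queue<const BNode*> pending;
    if (root) pending.push(root);
    while (!pending.empty()) {
        const BNode* n = pending.front();
        pending.pop();
        out.push_back(n->key);                       // visit the front node
        if (n->left) pending.push(n->left.get());    // children go to the back
        if (n->right) pending.push(n->right.get());
    }
    return out;
}

enum class Order { Pre, In, Post };  // VLR, LVR, LRV

// Recursive depth-first traversal; the position of the "visit" step
// relative to the two recursive calls selects the order.
void depthFirst(const BNode* n, Order order, std::vector<int>& out) {
    if (!n) return;
    if (order == Order::Pre) out.push_back(n->key);
    depthFirst(n->left.get(), order, out);
    if (order == Order::In) out.push_back(n->key);
    depthFirst(n->right.get(), order, out);
    if (order == Order::Post) out.push_back(n->key);
}

// Convenience overload returning the visiting sequence.
std::vector<int> depthFirst(const BNode* root, Order order) {
    std::vector<int> out;
    depthFirst(root, order, out);
    return out;
}

// Assumed shape: 13 -> (10, 25), 10 -> (2, 12), 25 -> (20, 31), 31 -> (29, -).
std::unique_ptr<BNode> makeExampleTree() {
    auto root = std::make_unique<BNode>(13);
    root->left = std::make_unique<BNode>(10);
    root->left->left = std::make_unique<BNode>(2);
    root->left->right = std::make_unique<BNode>(12);
    root->right = std::make_unique<BNode>(25);
    root->right->left = std::make_unique<BNode>(20);
    root->right->right = std::make_unique<BNode>(31);
    root->right->right->left = std::make_unique<BNode>(29);
    return root;
}
```

On a binary search tree the inorder (LVR) traversal lists the keys in ascending order, which is one reason this particular ordering is used so often.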

Searching a binary tree does not modify it: a search scans the tree in a predetermined way to access some or all of the keys in the tree, but the tree itself remains undisturbed after such an operation. Tree traversals can change the tree, but they may also leave it in the same condition. Whether or not the tree is modified depends on the actions prescribed by visit(). There are certain operations that always make some systematic changes in the tree, such as adding nodes, deleting them, modifying elements, merging trees, and balancing trees to reduce their height.

In analyzing the problem of traversing binary trees, three approaches have been presented: traversing with the help of a stack, traversing with the aid of threads, and traversing through tree transformation. The first approach does not change the tree during the process. The third approach changes it, but restores it to the same condition as before it started. Only the second approach needs some preparatory operations on the tree to become feasible: it requires threads. These threads may be created each time before the traversal procedure starts its task and removed each time it is finished. If the traversal is performed infrequently, this becomes a viable option. Another approach is to maintain the threads in all operations on the tree, for example when inserting a new element in the binary tree.

To insert a new node, called n_node, a tree node with a dead end, called t_node, has to be reached, and the new node has to be attached to it. The t_node is found using the same technique that tree searching used: the key of the n_node to be inserted is compared to the value of a node, denoted as c_node, currently being examined during a tree scan. If it is less than that value, the left child (if any) is tried; otherwise, the right child is tested. If the child of the c_node to be tested is empty, the scanning is discontinued and the n_node becomes this child.

Deletion

Deleting a node is another operation necessary to maintain a binary tree. The level of complexity in performing the operation depends on the position of the node to be deleted in the tree. It is by far more difficult to delete a node having two subtrees than to delete a leaf; the complexity of the deletion algorithm is proportional to the number of children the node has. There are three cases of deleting a node from the binary tree:

1. The node is a leaf; it has no children. This is the easiest case to deal with. The appropriate pointer of its parent is set to null and the node is disposed of by delete (Fig. 2.5).

Fig. 2.5

2. The node has one child. This case is not complicated. The parent's pointer to the node is reset to point to the node's child. In this way, the node's children are lifted up by one level and all great-great-...grandchildren lose one "great" from their kinship designations. For example, the node containing 20 (Fig. 2.6) is deleted by setting the right pointer of its parent containing 15 to point to 20's only child, which is 16.

Fig. 2.6

3. The node has two children. In this case, no one-step operation can be performed, since the parent's right or left pointer cannot point to both of the node's children at the same time.

One solution for the third case, called deleting by merging, makes one tree out of the two subtrees of the node and then attaches it to the node's parent. By the nature of binary search trees, every value of the right subtree is greater than every value of the left subtree, so the best thing to do is to find in the left subtree the node with the greatest value and make it a parent of the right subtree. Symmetrically, the node with the lowest value can be found in the right subtree and made a parent of the left subtree.

The desired node is the rightmost node of the left subtree. It can be located by moving along this subtree and taking right pointers until null is encountered. This means that this node will not have a right child, and there is no danger of violating the property of binary search trees in the original tree by setting that rightmost node's right pointer to the right subtree. The same could be done by setting the left pointer of the leftmost node of the right subtree to the left subtree.
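Insertion and deletion by merging, as described above, can be sketched for a binary search tree of integers. The names `SNode`, `insert`, `eraseByMerging` and `inorder` are hypothetical, and handling ownership with `std::unique_ptr` (so that no explicit delete is needed) is an assumption of this sketch rather than the approach of the text.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

struct SNode {
    int key;
    std::unique_ptr<SNode> left, right;
    explicit SNode(int k) : key(k) {}
};

// Inserts key by scanning down until an empty child link (the "dead end")
// is found. Smaller keys go left; all other keys go right.
void insert(std::unique_ptr<SNode>& root, int key) {
    std::unique_ptr<SNode>* link = &root;
    while (*link) {
        link = (key < (*link)->key) ? &(*link)->left : &(*link)->right;
    }
    *link = std::make_unique<SNode>(key);
}

// Deletes key using deletion by merging; returns false if key is absent.
bool eraseByMerging(std::unique_ptr<SNode>& root, int key) {
    std::unique_ptr<SNode>* link = &root;  // the parent's pointer to the node
    while (*link && (*link)->key != key) {
        link = (key < (*link)->key) ? &(*link)->left : &(*link)->right;
    }
    if (!*link) return false;

    std::unique_ptr<SNode> doomed = std::move(*link);  // detach the node
    if (!doomed->left) {
        *link = std::move(doomed->right);   // leaf, or only a right child
    } else if (!doomed->right) {
        *link = std::move(doomed->left);    // only a left child
    } else {
        // Two children: hang the right subtree under the rightmost node of
        // the left subtree, then attach the merged tree to the parent.
        SNode* rightmost = doomed->left.get();
        while (rightmost->right) rightmost = rightmost->right.get();
        rightmost->right = std::move(doomed->right);
        *link = std::move(doomed->left);
    }
    return true;  // the detached node is destroyed when doomed goes out of scope
}

// Inorder listing, useful for checking that the keys remain sorted.
void inorder(const SNode* n, std::vector<int>& out) {
    if (!n) return;
    inorder(n->left.get(), out);
    out.push_back(n->key);
    inorder(n->right.get(), out);
}
```

Note that merging can make the tree taller, because the whole right subtree is moved below the rightmost node of the left subtree.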

Another solution, deletion by copying, was proposed by Thomas Hibbard and Donald Knuth. If the node has two children, the problem can be reduced to one of two simple cases: the node is a leaf, or the node has only one nonempty child. This can be done by replacing the key being deleted with its immediate predecessor (or successor). A key's predecessor is the key in the rightmost node in the left subtree (and analogously, its immediate successor is the key in the leftmost node in the right subtree). First, the predecessor has to be located. This is done, again, by moving one step to the left, reaching the root of the node's left subtree, and then moving as far to the right as possible. Next, the key of the located node replaces the key to be deleted. And that is where one of the two simple cases comes into play: if the rightmost node is a leaf, the first case applies; however, if it has one child, the second case is relevant. In this way, deletion by copying removes a key k1 by overwriting it with another key k2 and then removing the node that originally held k2.

This algorithm does not increase the height of the tree, but it still causes a problem if it is applied many times along with insertion. The algorithm is asymmetric: it always deletes the node holding the immediate predecessor of the key being removed, possibly reducing the height of the left subtree and leaving the right subtree unaffected.

Programmers use a binary tree as a model to create a data structure to encode logic used to make complex decisions. Here is how this works. Let us say that a stem consists of a set of program instructions. At the end of the stem, the program evaluates a binary expression. Recall that a binary expression evaluates to either a Boolean true or false. Based on the evaluation, the program proceeds down one of two branches. Each branch has its own set of program instructions. The basic concept of a binary tree is not new, because it uses the same Boolean logic that is implemented with an if statement in a program. An if statement evaluates an expression that results in a Boolean value. Depending on the Boolean value, the if statement executes one of two sets of instructions.

There is a one-to-one mapping between general ordered trees and binary trees, which in particular is used by Lisp to represent general ordered trees as binary trees. To convert a general ordered tree to a binary tree, we only need to represent the general tree in the left child-sibling way.
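The left child-sibling idea can be illustrated with a small conversion routine. In the sketch below, `GNode` is a hypothetical general ordered tree node that stores its children in a vector, and `LcrsNode` is a hypothetical binary node whose `left` pointer leads to the first child and whose `right` pointer leads to the next sibling. The function names `convertFrom` and `toLcrs` are also assumptions made for this example, and the code relies on `std::vector` accepting an incomplete element type, which is guaranteed from C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// General ordered tree: each node owns its children, left to right.
struct GNode {
    std::string label;
    std::vector<GNode> children;
};

// Binary node in left child-sibling form.
struct LcrsNode {
    std::string label;
    std::unique_ptr<LcrsNode> left;   // first child of the original node
    std::unique_ptr<LcrsNode> right;  // next sibling of the original node
};

// Converts siblings[i], siblings[i+1], ... into a chain linked through
// `right`, converting each node's own children recursively through `left`.
std::unique_ptr<LcrsNode> convertFrom(const std::vector<GNode>& siblings,
                                      std::size_t i) {
    if (i >= siblings.size()) return nullptr;
    auto node = std::make_unique<LcrsNode>();
    node->label = siblings[i].label;
    node->left = convertFrom(siblings[i].children, 0);
    node->right = convertFrom(siblings, i + 1);
    return node;
}

// Converts a whole general tree; the root has no sibling, so its
// `right` pointer stays null.
std::unique_ptr<LcrsNode> toLcrs(const GNode& root) {
    auto node = std::make_unique<LcrsNode>();
    node->label = root.label;
    node->left = convertFrom(root.children, 0);
    return node;
}
```

For a root A with children B, C and D, the result has B as the left child of A, and C and D reachable from B by following `right` pointers.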

The result of this representation will automatically be a binary tree, if viewed from a different perspective. Each node N in the ordered tree corresponds to a node N' in the binary tree; the left child of N' is the node corresponding to the first child of N, and the right child of N' is the node corresponding to N's next sibling, that is, the next node in order among the children of the parent of N. This binary tree representation of a general ordered tree is sometimes also referred to as a left child-right sibling binary tree (LCRS tree), or a doubly chained tree, or a Filial-Heir chain.

One way of thinking about this is that each node's children are in a linked list, chained together with their right fields, and the node only has a pointer to the beginning or head of this list, through its left field.

Fig. 2.7

For example, in the tree on the left, A has the 6 children {B, C, D, E, F, G}. It can be converted into the binary tree on the right.

2.4 Types of binary trees

There are several types of binary trees:

1. Rooted Binary Tree: a tree with a root node in which every node has at most two children.
2. Full Binary Tree: a tree in which every node other than the leaves has two children.

3. Perfect Binary Tree – is a full binary tree in which all leaves are at the same depth or same level.
4. Complete Binary Tree – is a binary tree in which every level, except possibly the last, is completely filled, and all nodes are as far left as possible.
5. Infinite Complete Binary Tree – is a tree with $\aleph_0$ levels, where for each level d the number of existing nodes at level d is equal to $2^d$. The cardinal number of the set of all nodes is $\aleph_0$. The cardinal number of the set of all paths is $2^{\aleph_0}$.
6. Balanced Binary Tree – is commonly defined as a binary tree in which the heights of the two subtrees of every node never differ by more than 1, although in general it is a binary tree where no leaf is much farther away from the root than any other leaf. Balanced trees are important in information retrieval applications.
7. Rooted Complete Binary Tree – can be identified with a free magma.
8. Degenerate Tree – is a tree where for each parent node there is only one associated child node. This means that, in a performance measurement, the tree will behave like a linked list data structure.
9. Tango Tree – is a tree optimized for fast searches.
10. Strictly Binary Tree – is a tree that, when fully expanded, has a degree-2 expansion at every node.
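Several of the definitions above can be checked mechanically. The following is a minimal sketch, assuming a plain pointer-based node; the type BinNode and the function names are hypothetical and are not part of the tree library described in this thesis.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>

// Hypothetical node type, used only for this illustration.
struct BinNode {
    int key;
    BinNode* left;
    BinNode* right;
    explicit BinNode(int k, BinNode* l = 0, BinNode* r = 0)
        : key(k), left(l), right(r) {}
};

// Height counted in nodes: an empty tree has height 0, a single node height 1.
int height(const BinNode* n) {
    if (n == 0) return 0;
    return 1 + std::max(height(n->left), height(n->right));
}

int countNodes(const BinNode* n) {
    return n ? 1 + countNodes(n->left) + countNodes(n->right) : 0;
}

// Full: every node has either zero or two children.
bool isFull(const BinNode* n) {
    if (n == 0) return true;
    if ((n->left == 0) != (n->right == 0)) return false;
    return isFull(n->left) && isFull(n->right);
}

// Perfect: a tree of height h (in nodes) holds exactly 2^h - 1 nodes.
// Assumes the height is small enough for the shift not to overflow.
bool isPerfect(const BinNode* n) {
    return countNodes(n) == (1 << height(n)) - 1;
}

// Balanced: subtree heights differ by at most 1 at every node.
bool isBalanced(const BinNode* n) {
    if (n == 0) return true;
    if (std::abs(height(n->left) - height(n->right)) > 1) return false;
    return isBalanced(n->left) && isBalanced(n->right);
}

// Degenerate: no node has two children, so the tree is a single chain.
bool isDegenerate(const BinNode* n) {
    while (n != 0) {
        if (n->left != 0 && n->right != 0) return false;
        n = n->left ? n->left : n->right;
    }
    return true;
}
```

The balance check recomputes heights at every node, which is simple but quadratic in the worst case; a production version would return the height and the verdict from a single traversal.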

3 THE CLASS HIERARCHY

3.1 Inheritance, classes and subclasses

Firstly, it is important to know what class hierarchies in C++ are. One of the defining characteristics of the object-oriented paradigm is that classes form hierarchies. In any object-oriented language, classes serve as templates for individual objects. Each object is an instance of a particular class, which can serve as a pattern for many different objects. Any class can be designated as a subclass of some other class, which is called its superclass. A class represents a specialization of its superclass. If you create an object that is an instance of a class, that object is also an instance of all other classes in the hierarchy above it in the superclass chain. When you define a new class in C++, that class automatically inherits the behavior of its superclass. Most class hierarchies are tree-structured, even though C++ permits more complicated structures.

A subclass is a class that inherits some properties from its superclass. One can usually think of the subclass as being "a kind of" its superclass, as in "a square is a kind of rectangle". In this way, a subclass is a more specific version of its superclass. While all rectangles have four sides, the square has the more restricted feature that all of its sides have the same length. The superclass mechanism is extensively used in object-oriented programming due to the reusability that can be achieved: common features are encapsulated in modular objects. Subclasses that wish to implement special behavior can do so via virtual methods, without having to duplicate (reimplement) the superclass's behavior. Languages may support both abstract and concrete superclasses. A superclass allows a generic interface to include specialized functionality through the use of virtual functions.

The subclass-superclass relationship is often confused with that of classes and instances. An "instance of cat" refers to one particular cat. The Manx cat, by contrast, is still a class: there are many instances of Manx cats. And if a particular cat (an

instance of the cat class) happens to have its tail bitten off by a fox, that does not change the cat class. It is just that particular cat that has changed.

Subclasses and superclasses are often referred to as derived and base classes, respectively, terms coined by C++ creator Bjarne Stroustrup, who found these terms more intuitive than the traditional nomenclature. Derivation is the definition of a new class by extending an existing class. The new class is called the derived class and the existing class from which it is derived is called the base class. The base class at the top of a hierarchy does not inherit from any other class; other classes can inherit from it. The derived class inherits all the features of the base class. The derived class can also add its own features, data and so on, and it can override some of the functions of the base class if the function is declared as virtual in the base class.

A derived class can extend the base class in several ways: new instance attributes can be added, new methods can be defined, and existing methods can be overridden. If a method defined in a derived class has the same name as a method in a base class, the method in the derived class hides the one in the base class; it overrides it only if the base class method is virtual and the signatures match. An instance of a derived class can be used anywhere in a program where an instance of the base class may be used. C++ inheritance is very similar to a parent-child relationship. When a class is inherited, all the member functions and data members are inherited, although not all of them will be accessible to the member functions of the derived class (private members are not). There are some exceptions as well: constructors and destructors, for example, are not inherited.

Because expressions have more than one form, a C++ class that represents expressions can be represented most easily by a class hierarchy in which each of the expression types is a separate subclass, as shown in the following diagram:

Fig. 3.1


Even though the class hierarchy is organized in terms of the different types of nodes, clients of the expression package will almost always work with pointers to nodes instead. The pointer type is therefore given the name expressionT.

• The first step in creating a C++ subclass is to indicate the superclass on the header line, using the following syntax:

    class subclass : public superclass {
        body of class definition
    };

In contrast to Java, a subclass cannot automatically override the definition of a method in its superclass. To permit such overriding, the superclass must mark the prototype for that method with the keyword virtual; repeating the keyword in the subclass is optional but common. An abstract class is a class that does not actually represent any objects but instead serves only as a common superclass for concrete classes that do generate objects. In C++, methods of an abstract class that are always implemented by the concrete subclasses are indicated by including = 0 before the semicolon on the prototype line.

The generalization relationship indicates that one of the two related classes (the subclass) is considered to be a specialized form of the other (the supertype), and the superclass is considered a generalization of the subclass. In practice, this means that any instance of the subtype is also an instance of the superclass. An exemplary tree of generalizations of this form is found in biological classification: human beings are a subclass of simian, which are a subclass of mammal, and so on. The relationship is most easily understood by the phrase "an A is a B" (a human is a mammal, a mammal is an animal).

Inheritance is a way to compartmentalize and reuse code by creating collections of attributes and behaviors called objects, which can be based on previously created objects. In classical inheritance, where objects are defined by classes, classes can inherit from other classes. The new classes, known as subclasses (or derived classes), inherit attributes and behavior (i.e. previously coded algorithms) of the pre-existing classes, which are referred to as superclasses (or ancestor classes). The inheritance relationships of classes give rise to a hierarchy. In prototype-based programming, objects can be defined directly from other objects without the need to define any classes, in which case


this feature is called differential inheritance. Inheritance does not entail behavioral subtyping either. It is entirely possible to derive a class whose object will behave incorrectly when used in a context where the parent class is expected; see the Liskov substitution principle.
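The mechanisms described in this section (public derivation, virtual methods, and an abstract superclass marked with = 0) can be illustrated with the square and rectangle example used earlier. This is only a sketch: the class names Shape, Rectangle and Square and the function describe are hypothetical and do not belong to the tree library.

```cpp
#include <cassert>
#include <string>

// Abstract superclass: "= 0" makes area() a pure virtual method,
// so no Shape object can be created directly.
class Shape {
public:
    virtual ~Shape() {}
    virtual double area() const = 0;
    virtual std::string name() const { return "shape"; }
};

// Concrete subclass: implements area() and overrides name().
class Rectangle : public Shape {
public:
    Rectangle(double w, double h) : width(w), height(h) {}
    virtual double area() const { return width * height; }
    virtual std::string name() const { return "rectangle"; }
protected:
    double width;
    double height;
};

// "A square is a kind of rectangle": it inherits area() unchanged
// and only overrides name().
class Square : public Rectangle {
public:
    explicit Square(double side) : Rectangle(side, side) {}
    virtual std::string name() const { return "square"; }
};

// Accepts any object in the hierarchy through a reference to the base class;
// the virtual call selects the most derived implementation at run time.
std::string describe(const Shape& s) {
    return s.name();
}
```

Passing a Square to describe() yields "square" even though the parameter type is Shape, which is the substitution property mentioned above. Modelling a square as a rectangle is itself a textbook case where the Liskov substitution principle can be violated once the sides become independently mutable, which is why the sketch keeps the dimensions fixed after construction.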

3.2 The Tree class hierarchy
For a better understanding of the tree class hierarchy I have represented the classes and their relationships in a class diagram.

Fig. 3.2

A class diagram is a type of static structure diagram that describes the structure of a system by showing the system's classes, their attributes, their operations (or methods) and the relationships between the classes. It is a prime example of the structure diagram type, and provides an initial set of notation elements that all other structure diagrams use. A thing to remember is that a class diagram is a static view of a system. The structure of a system is represented using class diagrams, and developers refer to them time and again while implementing the system.


A class diagram consists of a group of classes and interfaces reflecting important entities. A UML class diagram is similar to a family tree. The classes and interfaces in the diagram represent the members of a family tree, and the relationships between the classes are analogous to relationships between members of a family tree. Interestingly, classes in a class diagram are interconnected in a hierarchical fashion, like a set of parent classes with related child classes under them.

In a class diagram, the classes are arranged in groups that share common characteristics. A class diagram resembles a flowchart in which classes are portrayed as boxes, each box having three rectangles inside. The top rectangle contains the name of the class, the middle rectangle contains the attributes of the class, and the lower rectangle contains the methods, also called operations, of the class. Lines, which may have arrows at one or both ends, connect the boxes. These lines define the relationships, also called associations, between the classes.

A very important concept in object-oriented design, inheritance, refers to the ability of one class (the child class) to inherit the identical functionality of another class (the super class), and then add new functionality of its own. Inheritance models "is a" and "is like" relationships, enabling you to easily reuse existing data and code. When "A" inherits from "B" we say that "A" is the subclass of "B" and that "B" is the superclass of "A". Furthermore, we say that we have "pure inheritance" when "A" inherits all of the attributes and methods of "B". To model inheritance on a class diagram, a solid line is drawn from the child class (the class inheriting the behavior) with a closed, unfilled arrowhead (or triangle) pointing to the super class; in other words, the UML notation for inheritance is a line with a closed arrowhead pointing from the subclass to the superclass.

In our case we can see that the Tree Class is the base class. Rooted Tree is a subclass of the Tree Class and inherits from it. It can use all the methods of the parent class and it can also define new methods. In the same way, Binary Tree inherits from the Rooted Tree Class and is a subclass of this class. It can use all the methods of the Rooted Tree Class and can also have its own methods. The TreeNode Class is a base class that does not have any children or parents, and for this reason it does not have any inheritance relationship with the other classes.

3.3 The Tree class

3.3.1 Class Diagram

Fig. 3.3

3.3.2 General description

The Tree Class is a base class or superclass, and all other tree classes are subclasses because they are derived from it. In this class we declare the iterators and the tree node id. It is inherited by the Rooted Tree Class and is declared like this:

    class Tree
    {
    public:
        Tree(void);
        Tree(int);
        Tree(Iterator &);
    };

3.4 The TreeNode class

3.4.1 The class diagram

Fig. 3.4

3.4.2 General description

The Tree Node Class contains the declaration of the nodes and their id. It is a base class that does not have any subclasses. It contains three methods that are defined below:

    TreeNode::TreeNode(int id, TreeNode * pa, TreeNode * ll, TreeNode * rr,
                       TreeNode * pr, TreeNode * nn)
    {
        nodeId = id;

        parent = pa;
        left = ll;
        right = rr;
        prev = pr;
        next = nn;
    }

    TreeNode::~TreeNode(void)
    {
    }

    void TreeNode::display()
    {
        // %p with a cast to void * is the portable way to print pointer values
        printf("nodeId = %d myNode = %p botLeft = %p rightUp = %p\n",
               nodeId, (void *)this, (void *)left, (void *)right);
    }

3.5 The RootedTree class

3.5.1 The class diagram

Fig. 3.5

3.5.2 General description

The Rooted Tree Class is a subclass of the Tree Class; in other words, it derives from the base class, which is the Tree Class. A derived class inherits all the attributes of

its base class. That is, the derived class contains all the class attributes contained in the base class, and the derived class supports all the same operations provided by the base class. But a derived class can also have methods of its own, and such a class can become a base class for other classes derived from it, so that the inheritance can be deliberately extended. In this case, besides the three methods that are declared in the class, the Rooted Tree Class can use all the methods from the Tree Class. The methods used in this class are defined in the code below:

    RootedTree::RootedTree(string file_spec)
    {
        ifstream ifs;
        ifs.open(file_spec);
        if (!ifs)
        {
            cout << "Error: file could not be opened" << endl;
        }
        string line;
        char * tok = "";
        int nodeId;
        TreeNode * firstNode = NULL;
        TreeNode * prevNode = NULL;
        TreeNode * currNode = NULL;
        hash_map <int, void *> idToNodeMap;
        hash_map <int, void *> :: const_iterator myIter;
        while (getline(ifs, line))
        {
            cout << "[ " << line << " ]" << endl;
            char * myLine = _strdup(line.c_str());
            // the first token on a line is the id of the parent node
            tok = strtok_s(myLine, "-", 0);
            nodeId = atoi(tok);
            if (firstNode == NULL)
            {
                theRoot = new TreeNode(nodeId, NULL, NULL, NULL, NULL, NULL);
                firstNode = theRoot;
                idToNodeMap[nodeId] = theRoot;
            }
            else
            {

                myIter = idToNodeMap.find(nodeId);
                if (myIter == idToNodeMap.end())
                {
                    cout << "error in input file " << endl;
                    exit(0);
                }
                else
                {
                    firstNode = (TreeNode *)myIter->second;
                }
            }
            prevNode = firstNode;
            // the first child hangs off the parent's left pointer
            tok = strtok_s(NULL, ",", 0);
            if (tok != NULL)
            {
                nodeId = atoi(tok);
                TreeNode * currNode = new TreeNode(nodeId, (TreeNode *) firstNode,
                                                   NULL, NULL, NULL, NULL);
                prevNode->left = currNode;
                idToNodeMap[nodeId] = currNode;
                prevNode = currNode;
            }
            // the remaining children are chained through the right pointers
            while (tok = strtok_s(NULL, ",", 0))
            {
                nodeId = atoi(tok);
                TreeNode * currNode = new TreeNode(nodeId, (TreeNode *) firstNode,
                                                   NULL, NULL, NULL, NULL);
                prevNode->right = currNode;
                idToNodeMap[nodeId] = currNode;
                prevNode = currNode;
            }
            free(myLine);
        }
        cout << " ** tree file loaded" << endl;
    }

    RootedTree::~RootedTree(void)
    {
    }

    RootedTree *

    RootedTree::binaryEmbed()
    {
        return 0;
    }

3.6 The BinaryTree class

3.6.1 The class diagram

Fig. 3.6

3.6.2 General description

The BinaryTree Class is a class derived from the RootedTree Class, so the RootedTree Class becomes the base class for the BinaryTree Class. It has all the methods of the RootedTree Class and it adds two new methods whose bodies are still empty in the program. The definitions of the methods look as follows:

    BinaryTree::BinaryTree(void)
    {
    }

    BinaryTree::~BinaryTree(void)
    {
    }
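The RootedTree constructor in section 3.5 relies on strtok_s, _strdup and hash_map, which are specific to the Visual Studio toolchain. As a hedged illustration of the same idea in portable standard C++, the sketch below reads lines of the assumed form "parent-child1,child2,..." (the format is inferred from the "-" and "," delimiters in the constructor, not stated in the text) and links the nodes in left child-right sibling fashion. SketchNode and loadTree are hypothetical names.

```cpp
#include <cassert>
#include <cstdlib>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical, simplified node: left = first child, right = next sibling.
struct SketchNode {
    int id;
    SketchNode* parent;
    SketchNode* left;
    SketchNode* right;
    SketchNode(int i, SketchNode* p) : id(i), parent(p), left(0), right(0) {}
};

// Reads lines of the assumed form "parent-child1,child2,..." and returns the
// root, or 0 if a line names a parent that has not been seen before.
// Every created node is also recorded in `nodes`; the caller owns the nodes
// and is responsible for deleting them.
SketchNode* loadTree(std::istream& in, std::map<int, SketchNode*>& nodes) {
    SketchNode* root = 0;
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty()) continue;
        std::string::size_type dash = line.find('-');
        int parentId = std::atoi(line.substr(0, dash).c_str());

        SketchNode* parent = 0;
        if (root == 0) {
            // The first line introduces the root of the tree.
            root = parent = new SketchNode(parentId, 0);
            nodes[parentId] = root;
        } else {
            std::map<int, SketchNode*>::const_iterator it = nodes.find(parentId);
            if (it == nodes.end()) return 0;   // unknown parent id
            parent = it->second;
        }
        if (dash == std::string::npos) continue;   // line lists no children

        // First child goes on parent->left, later ones are chained via right.
        std::istringstream children(line.substr(dash + 1));
        std::string tok;
        SketchNode* prev = 0;
        while (std::getline(children, tok, ',')) {
            if (tok.empty()) continue;
            int childId = std::atoi(tok.c_str());
            SketchNode* child = new SketchNode(childId, parent);
            nodes[childId] = child;
            if (prev == 0) parent->left = child;
            else prev->right = child;
            prev = child;
        }
    }
    return root;
}
```

Taking a std::istream rather than a file name lets the same function be exercised with a std::istringstream in tests and with a std::ifstream in the application.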

4 ITERATORS

4.1 Iterators

An iterator provides a means for visiting, one by one, all the objects in a container. Iterators are an alternative to using the visitor pattern. The basic idea is that for every concrete container class we also implement a related concrete iterator derived from an abstract Iterator class. Iterators are not unique to C++. Probably the best definition of an iterator is this: "Provide a way to access the elements of an aggregate object sequentially without exposing its underlying representation." The concept of an iterator allows two parties (generally the consumer of some data structure, or "client code", and the implementer of the data structure, or "library code") to communicate without concern for the other's internal details. This principle of intentional ignorance is what lets a collection of elements (in any language) expose those elements to the outside world without revealing the details of the collection's internal implementation, i.e. whether it is a hash table, linked list, tree, or some other sort of data structure.

In C++, an iterator is any object that, pointing to some element in a range of elements (such as an array or a container), has the ability to iterate through the elements of that range using a set of operators (at least the increment (++) and dereference (*) operators). The most obvious form of iterator is a pointer: a pointer can point to elements in an array, and can iterate through them using the increment operator (++). But other forms of iterators exist. For example, each container type (such as a vector) has a specific iterator type designed to iterate through its elements in an efficient way. Notice that while a pointer is a form of iterator, not all iterators have the same functionality a pointer has. To distinguish between the requirements an iterator shall have for a specific algorithm, five different iterator categories exist:

Fig. 4.1

In this graph, each iterator category implements the functionalities of all categories to its right:

• Input and output iterators are the most limited types of iterators, specialized in performing only sequential input or output operations.
• Forward iterators have all the functionality of input and output iterators, although they are limited to one direction in which to iterate through a range.
• Bidirectional iterators can be iterated through in both directions. All standard containers of C++03 support at least bidirectional iterator types.
• Random access iterators implement all the functionalities of bidirectional iterators and, in addition, have the ability to access ranges non-sequentially: offsets can be directly applied to these iterators without iterating through all the elements in between. This provides these iterators with the same functionality as standard pointers (pointers are iterators of this category).

The characteristics of each category of iterators are:

Fig. 4.2
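The category of an iterator is available at compile time through std::iterator_traits, and an algorithm can use it to choose between a generic and an optimized implementation. The sketch below shows this technique, usually called tag dispatching, for a forward-only advance operation; stepForward and myAdvance are hypothetical names, and the code is an illustration rather than the standard library's own implementation.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <vector>

// Generic version: valid for every category that derives from
// input_iterator_tag (input, forward, bidirectional). Moves one step at a time.
template <class It, class Distance>
void stepForward(It& i, Distance n, std::input_iterator_tag) {
    while (n-- > 0) ++i;
}

// Optimized version: random access iterators can jump in one operation.
template <class It, class Distance>
void stepForward(It& i, Distance n, std::random_access_iterator_tag) {
    i += n;
}

// Entry point: constructs an object of the iterator's category tag type,
// so overload resolution selects the best matching stepForward.
// Only non-negative distances are handled in this sketch.
template <class It, class Distance>
void myAdvance(It& i, Distance n) {
    stepForward(i, n, typename std::iterator_traits<It>::iterator_category());
}
```

Because the tag types form an inheritance chain, a list iterator (bidirectional) falls back to the generic overload, while a vector iterator or a plain pointer matches the random access overload exactly.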

Using the knowledge of an iterator's category, one can provide optimized implementations of an algorithm. The advance() operation is an example. It increments (or decrements for negative n) an iterator:

    template <class Iterator, class Distance>
    inline void advance (Iterator& i, Distance n);

Obviously, there are many ways to do this. For a C++ array one would simply perform pointer arithmetic, i.e. add n to the C++ pointer:

    i += n;

For a list this is not possible. Iterators must step through the sequence and advance step by step:

    if (n >= 0)
        while (n--) ++i;
    else
        while (n++) --i;

The iterator category, which is an abstraction that represents a set of requirements on an iterator, is information related to an iterator. It is useful for providing optimized versions of an operation like advance(). There are two types that might vary depending on the iterator type:

1. The Distance Type: An operation like advance() obviously needs an argument that indicates how far to advance the iterator:

    template <class Iterator, class Distance>
    inline void advance (Iterator& i, Distance n);

The type of this distance argument must represent the distance between any two iterators. Hence the distance type depends on the iterator type. For C++ pointers the distance type is the C++ type ptrdiff_t, which can represent the difference between any two C++ pointers. ptrdiff_t is also the distance type of all other iterators in the STL and the Standard Library. However, the distance type in the STL and the Standard C++ Library is not limited to ptrdiff_t.

2. The Value Type: An iterator can be dereferenced. It then returns a reference to a value stored in a container. The type of this referenced value also depends on the respective iterator. For example, if the iterator refers to a container holding integers, the value type will be int. More generally, if the iterator refers to a container that stores elements of an arbitrary type T, the value type will be T.

Each iterator thus has two related types, its value type and its distance type. Value type and distance type are sometimes needed to implement algorithms.

In the STL and the Standard C++ Library, algorithms are separated from containers. No information about the container itself is available to an algorithm, i.e. an algorithm takes an iterator and uses it to access the container. This clear separation of containers and algorithms is the basic idea of Generic Programming, which is the key design idea behind the STL.

There are several reasons to use iterators:

• Subscripts are not always possible. Subscripts cannot be used on most of the containers (e.g. list and map), so you must use iterators in many cases.
• Flexible. It is easy to change the underlying container type. For example, you might decide later that the number of insertions and deletions is so high that a list would be more efficient than a vector.
• Member functions. Many of the member functions for vector use iterators, for example assign, insert, or erase.
• Algorithms. The <algorithm> functions use iterators.

There are many varieties of iterators, each with slightly different behavior, but not every type of container supports every type of iterator. Iterators that have greater requirements, and so more powerful access to elements, may be used in place of iterators with fewer requirements. Iterator safety is defined separately for the different types of standard containers; in some cases the iterator is very permissive in allowing the container to change while iterating.

It is possible for users to create their own iterator types by deriving subclasses from the standard std::iterator class template, and this is the most convenient way in our case too. In my case I decided to represent the iterators that I used in a class diagram:

Fig. 4.3

As we can see, the Iterator Class is the base class. All other classes (PostOrderIter, SiblingIter, BreathFirstIter, PreOrderIter) are inherited from the Iterator Class. The essential difference between a container with the structure of a tree and the STL containers is that the latter are "linear". While the STL containers thus have essentially only one way in which one can iterate over their elements, this is not true for trees. The tree library provides (at present) four different iteration schemes.

4.2 The Base Iterator Class

4.2.1 The class diagram

Fig. 4.4

4.2.2 General description

The Base Iterator Class is the base class for four other classes. It contains two fields that are implemented in the program and six methods that are used. The increment and decrement operators return the iterator itself by reference (Iterator &). Here is the code for each method:

    Iterator::Iterator(void)
    {
    }

    Iterator::Iterator(TreeNode * node)
    {
        theNode = node;
    }

    Iterator::~Iterator(void)
    {
    }

    Iterator & Iterator::operator++()
    {
        return *this;
    }

    Iterator & Iterator::operator--()
    {
        return *this;
    }

    void Iterator::skipChildren()
    {
        skipCurrChildren = true;
    }

    void Iterator::skipChildren(bool skip)
    {
        skipCurrChildren = skip;
    }
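For an iterator class of this kind to work with the standard algorithms, it additionally needs a dereference operator, equality comparison, and the five nested types that std::iterator would otherwise supply. The following is a minimal sketch under these assumptions; ChainNode and ChainIter are hypothetical names, and the iterator walks a simple sibling chain rather than the full tree of the library.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>

// Hypothetical node: only what the sketch needs.
struct ChainNode {
    int id;
    ChainNode* next;   // next sibling, 0 at the end of the chain
};

class ChainIter {
public:
    // The five nested types that deriving from std::iterator would provide.
    typedef std::forward_iterator_tag iterator_category;
    typedef int                       value_type;
    typedef std::ptrdiff_t            difference_type;
    typedef int*                      pointer;
    typedef int&                      reference;

    // A default-constructed iterator (null node) serves as the end marker.
    explicit ChainIter(ChainNode* n = 0) : node(n) {}

    reference operator*() const { return node->id; }

    ChainIter& operator++() { node = node->next; return *this; }
    ChainIter operator++(int) { ChainIter old(*this); ++(*this); return old; }

    bool operator==(const ChainIter& other) const { return node == other.node; }
    bool operator!=(const ChainIter& other) const { return node != other.node; }

private:
    ChainNode* node;
};
```

With these members in place, functions such as std::distance and std::find accept ChainIter directly, because they obtain the category, value type and distance type through std::iterator_traits.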

4.3 The PreOrderIter Class

4.3.1 The Class Diagram

Fig 4.5

4.3.2 General Description

The PreOrderIter Class is a class that is inherited from the Iterator Class. It has all the methods from the Iterator Class and it defines three new methods. This class is not a base class for any other class.

    PreOrderIter::PreOrderIter(void)
    {
    }

    PreOrderIter::PreOrderIter(TreeNode * node) : Iterator(node)
    {
        skipCurrChildren = false;
    }

    PreOrderIter::PreOrderIter(const Iterator & iter) : Iterator(iter.theNode)
    {
    }

    PreOrderIter & PreOrderIter::operator++()
    {
        assert(this->theNode != 0);

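        if (!this->skipCurrChildren && this->theNode->left != 0)
        {
            // descend to the first child
            this->theNode = this->theNode->left;
        }
        else
        {
            this->skipCurrChildren = false;
            // climb until a node with a next sibling is found
            while (this->theNode->next == 0)
            {
                this->theNode = this->theNode->parent;
                if (this->theNode == 0)
                    return *this;
            }
            this->theNode = this->theNode->next;
        }
        return *this;
    }

    PreOrderIter & PreOrderIter::operator--()
    {
        // Move to the pre-order predecessor: the deepest last descendant of
        // the previous sibling, or the parent if there is no previous sibling.
        // As in PostOrderIter::operator--, right is taken to be the last child.
        assert(this->theNode != 0);
        if (this->theNode->prev != 0)
        {
            this->theNode = this->theNode->prev;
            while (this->theNode->right != 0)
                this->theNode = this->theNode->right;
        }
        else
        {
            this->theNode = this->theNode->parent;
        }
        return *this;
    }

    bool PreOrderIter::operator==(PreOrderIter & it)
    {
        if (it.theNode == this->theNode)
            return true;
        else
            return false;
    }

    bool PreOrderIter::operator!=(PreOrderIter & it)
    {
        if (it.theNode != this->theNode)
            return true;
        else
            return false;
    }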
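The stepping rule used by PreOrderIter::operator++ can be tried out in isolation. The sketch below applies the same rule (first child if there is one, otherwise the next sibling of the nearest ancestor that has one) to a small hand-built tree. PNode, appendChild, preOrderNext and preOrder are hypothetical names, and the skipChildren flag is left out for brevity.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for TreeNode with only the links needed here.
struct PNode {
    int id;
    PNode* parent;
    PNode* left;   // first child
    PNode* next;   // next sibling
    explicit PNode(int i) : id(i), parent(0), left(0), next(0) {}
};

// Appends child as the last child of parent.
void appendChild(PNode* parent, PNode* child) {
    child->parent = parent;
    if (parent->left == 0) {
        parent->left = child;
        return;
    }
    PNode* s = parent->left;
    while (s->next != 0) s = s->next;
    s->next = child;
}

// Returns the pre-order successor of n, or 0 when the traversal is finished.
PNode* preOrderNext(PNode* n) {
    if (n->left != 0) return n->left;      // descend to the first child
    while (n->next == 0) {                 // climb until a sibling exists
        n = n->parent;
        if (n == 0) return 0;              // climbed past the root
    }
    return n->next;
}

// Collects the node ids in pre-order, starting from the root of a whole tree.
std::vector<int> preOrder(PNode* root) {
    std::vector<int> out;
    for (PNode* n = root; n != 0; n = preOrderNext(n)) out.push_back(n->id);
    return out;
}
```

For a root 1 with children 2 and 3, where 2 has children 4 and 5, the traversal visits 1, 2, 4, 5, 3: every parent is visited before its children.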

4.4 The PostOrderIter Class

4.4.1 The Class Diagram

Fig 4.6

4.4.2 General Description

The PostOrderIter Class is also a class derived from the Iterator Class. It is not a base class for any other class and it defines three new methods besides the ones from the Iterator Class.

    PostOrderIter::PostOrderIter(void)
    {
    }

    PostOrderIter::PostOrderIter(TreeNode * node) : Iterator(node)
    {
    }

    PostOrderIter::PostOrderIter(const Iterator & iter) : Iterator(iter.theNode)
    {
    }

    PostOrderIter & PostOrderIter::operator++()
    {
        assert(this->theNode != 0);
        if (this->theNode->next == 0)
        {
            this->theNode = this->theNode->parent;

            this->skipCurrChildren = false;
        }
        else
        {
            this->theNode = this->theNode->next;
            if (this->skipCurrChildren)
            {
                this->skipCurrChildren = false;
            }
            else
            {
                while (this->theNode->left)
                    this->theNode = this->theNode->left;
            }
        }
        return *this;
    }

    PostOrderIter & PostOrderIter::operator--()
    {
        assert(this->theNode != 0);
        if (this->skipCurrChildren || this->theNode->right == 0)
        {
            this->skipCurrChildren = false;
            while (this->theNode->prev == 0)
                this->theNode = this->theNode->parent;
            this->theNode = this->theNode->prev;
        }
        else
        {
            this->theNode = this->theNode->right;
        }
        return *this;
    }

    bool PostOrderIter::operator==(PostOrderIter & it)
    {
        if (it.theNode == this->theNode)
            return true;
        else
            return false;

    }

    bool PostOrderIter::operator!=(PostOrderIter & it)
    {
        if (it.theNode != this->theNode)
            return true;
        else
            return false;
    }

4.5 The SiblingIter Class

4.5.1 The Class Diagram

Fig. 4.7

4.5.2 General Description

The SiblingIter Class is a class that is inherited from the Iterator Class and it defines six new methods.

    SiblingIter::SiblingIter(void)
    {
    }

    SiblingIter::SiblingIter(TreeNode * node) : Iterator(node)
    {
        setParent();
    }

    SiblingIter::SiblingIter(const Iterator & iter) : Iterator(iter.theNode)
    {
        setParent();
    }

    SiblingIter & SiblingIter::operator++()
    {
        if (this->theNode)
            this->theNode = this->theNode->next;
        return *this;
    }

    SiblingIter & SiblingIter::operator--()
    {
        if (this->theNode)
            this->theNode = this->theNode->prev;
        else
        {
            assert(theParent);
            this->theNode = theParent->right;
        }
        return *this;
    }

    bool SiblingIter::operator==(SiblingIter & it)
    {
        if (it.theNode == this->theNode)
            return true;
        else
            return false;
    }

    bool SiblingIter::operator!=(SiblingIter & it)
    {
        if (it.theNode != this->theNode)
            return true;
        else
            return false;
    }

    void

    SiblingIter::setParent()
    {
        theParent = 0;
        if (this->theNode == 0)
            return;
        if (this->theNode->parent != 0)
            theParent = this->theNode->parent;
    }
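The post-order and sibling schemes can be exercised in the same way on a small tree. This sketch mirrors the stepping rules of PostOrderIter::operator++ and SiblingIter::operator++ without the skipChildren flag; QNode, addChild, postOrderFirst, postOrderNext, postOrder and childrenOf are hypothetical names, and left and right are taken to mean first and last child, as the iterator code above uses them.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for TreeNode.
struct QNode {
    int id;
    QNode* parent;
    QNode* left;    // first child
    QNode* right;   // last child
    QNode* prev;    // previous sibling
    QNode* next;    // next sibling
    explicit QNode(int i)
        : id(i), parent(0), left(0), right(0), prev(0), next(0) {}
};

// Appends child as the last child of parent and keeps all links consistent.
void addChild(QNode* parent, QNode* child) {
    child->parent = parent;
    child->prev = parent->right;
    if (parent->right != 0) parent->right->next = child;
    else parent->left = child;
    parent->right = child;
}

// Post-order starts at the leftmost, deepest node.
QNode* postOrderFirst(QNode* root) {
    while (root->left != 0) root = root->left;
    return root;
}

// Post-order successor: the parent if there is no next sibling, otherwise
// the leftmost, deepest node below the next sibling.
QNode* postOrderNext(QNode* n) {
    if (n->next == 0) return n->parent;
    n = n->next;
    while (n->left != 0) n = n->left;
    return n;
}

// Collects ids in post-order; root must be the root of the whole tree.
std::vector<int> postOrder(QNode* root) {
    std::vector<int> out;
    for (QNode* n = postOrderFirst(root); n != 0; n = postOrderNext(n))
        out.push_back(n->id);
    return out;
}

// Sibling iteration: visits only the direct children of one node.
std::vector<int> childrenOf(QNode* parent) {
    std::vector<int> out;
    for (QNode* c = parent->left; c != 0; c = c->next) out.push_back(c->id);
    return out;
}
```

For a root 1 with children 2 and 3, where 2 has children 4 and 5, post-order yields 4, 5, 2, 3, 1 (children before their parent), while sibling iteration over the root yields only 2 and 3.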

5 SERIALIZATION

5.1 The Graphml File Format

5.1.1 Functional Description

GraphML is an XML-based file format for graphs. The GraphML file format results from the joint effort of the graph drawing community to define a common format for exchanging graph structure data. It uses an XML-based syntax and supports the entire range of possible graph structure constellations, including directed, undirected and mixed graphs, hypergraphs, and application-specific attributes. The GraphML Primer is a non-normative document intended to provide an easily readable description of the GraphML facilities, and is oriented towards quickly understanding how to create GraphML documents. The primer describes the language features through examples which are complemented by references to normative texts.

5.1.2 Virtual Presentation

The purpose of a GraphML document is to define a graph. Let us start by considering the graph shown in the figure below. It contains 11 nodes and 12 edges.

Fig. 5.1

The graph is contained in the file simple.graphml:

    <?xml version="1.0" encoding="UTF-8"?>
    <graphml xmlns="http://graphml.graphdrawing.org/xmlns"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns
         http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
      <graph id="G" edgedefault="undirected">
        <node id="n0"/>
        <node id="n1"/>
        <node id="n2"/>
        <node id="n3"/>
        <node id="n4"/>
        <node id="n5"/>
        <node id="n6"/>
        <node id="n7"/>
        <node id="n8"/>
        <node id="n9"/>
        <node id="n10"/>
        <edge source="n0" target="n2"/>
        <edge source="n1" target="n2"/>
        <edge source="n2" target="n3"/>
        <edge source="n3" target="n5"/>
        <edge source="n3" target="n4"/>
        <edge source="n4" target="n6"/>
        <edge source="n6" target="n5"/>
        <edge source="n5" target="n7"/>
        <edge source="n6" target="n8"/>
        <edge source="n8" target="n7"/>
        <edge source="n8" target="n9"/>
        <edge source="n8" target="n10"/>
      </graph>
    </graphml>

The GraphML document consists of a graphml element and a variety of subelements: graph, node, edge. The first line of the document is an XML processing instruction which defines that the document adheres to the XML 1.0 standard and that the encoding of the document is

UTF-8, the standard encoding for XML documents. Of course other encodings can be chosen for GraphML documents.

The second line contains the root element of a GraphML document: the graphml element. The graphml element, like all other GraphML elements, belongs to the namespace http://graphml.graphdrawing.org/xmlns. For this reason we define this namespace as the default namespace in the document by adding the XML Attribute xmlns="http://graphml.graphdrawing.org/xmlns" to it. The two other XML Attributes are needed to specify the XML Schema for this document. In our example we use the standard schema for GraphML documents located on the graphdrawing.org server. The first attribute, xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance", defines xsi as the XML Schema namespace. The second attribute, xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd", defines the XML Schema location for all elements in the GraphML namespace. The XML Schema reference is not required, but it provides a means to validate the document and is therefore strongly recommended.

A graph is, not surprisingly, denoted by a graph element. Nested inside a graph element are the declarations of nodes and edges. A node is declared with a node element, and an edge with an edge element.

    <graph id="G" edgedefault="directed">
      <node id="n0"/>
      <node id="n1"/>
      ...
      <node id="n10"/>
      <edge source="n0" target="n2"/>
      <edge source="n1" target="n2"/>
      ...
      <edge source="n8" target="n10"/>
    </graph>

In GraphML there is no order defined for the appearance of node and edge elements. Graphs in GraphML are mixed; in other words, they can contain directed and undirected edges at the same time. If no direction is specified when an edge is declared, the default direction is applied to the edge. The default direction is declared as the XML

Attribute edgedefault of the graph element. The two possible values for this XML Attribute are directed and undirected. Note that the default direction must be specified. Optionally, an identifier for the graph can be specified with the XML Attribute id. The identifier is used when it is necessary to reference the graph.

Nodes in the graph are declared by the node element. Each node has an identifier, which must be unique within the entire document, i.e. in a document there must be no two nodes with the same identifier. The identifier of a node is defined by the XML-Attribute id.

Edges in the graph are declared by the edge element. Each edge must define its two endpoints with the XML-Attributes source and target. The value of source, respectively target, must be the identifier of a node in the same document. Edges with only one endpoint, also called loops, selfloops, or reflexive edges, are defined by having the same value for source and target. The optional XML-Attribute directed declares whether the edge is directed or undirected. The value true declares a directed edge, the value false an undirected edge. If the direction is not explicitly defined, the default direction is applied to this edge as defined in the enclosing graph. Optionally, an identifier for the edge can be specified with the XML Attribute id. The id XML-Attribute is used when it is necessary to reference the edge.

    ...
    <edge id="e1" directed="true" source="n0" target="n2"/>
    ...

With the help of the GraphML-Attributes extension one can specify additional information of simple type for the elements of the graph. Simple type means that the information is restricted to scalar values, e.g. numerical values and strings. If you want to add structured content to graph elements you should use the key/data extension mechanism of GraphML. GraphML-Attributes themselves are specialized data/key extensions. GraphML-Attributes must not be confused with XML-Attributes, which are a different concept.

A GraphML-Attribute is defined by a key element which specifies the identifier, name, type and domain of the attribute. The identifier is specified by the XML-Attribute id and is used to refer to the GraphML-Attribute inside the document. The name of the GraphML-Attribute is defined by the XML-Attribute attr.name and must be unique among all GraphML-Attributes declared in the document. The

Therefore this GraphML-Attribute has the default value. float. then this default value is applied to the graph element.. node. These types are defined like the corresponding types in the Java(TM)-Programming language.. the 49 . and all. . If a default value is defined for this GraphML-Attribute. yellow for this node.name="color" attr.purpose of the name is that applications can identify the meaning of the attribute. The domain of the GraphML-Attribute specifies for which graph elements the GraphML-Attribute is declared. double. The text content of the default element defines this default value..For the first kind. There are two kinds of meta-data: information about the number of elements and information how specific data is encoded in the document. as for the GraphML-Attribute weight in the above example. There can be graph elements for which a GraphML-Attribute is defined but no value is declared by a corresponding data element. Note that the name of the GraphML-Attribute is not used inside the document. or string. All XMLAttributes denoting meta-data are prefixed with parse. This value must be of the type declared in the correspondingkey definition. the value of the GraphML-Attribute is undefined for the graph element. It is possible to define a default value for a GraphML-Attribute. the identifier is used for this purpose.The type of the GraphML-Attribute can be either boolean. The value of a GraphML-Attribute for a graph element is defined by a data element nested inside the element for the graph element. To make it possible to implement optimized parsers for GraphML documents meta-data can be attached as XML-Attributes to some GraphML elements..type="string"> <default>yellow</default> </key> . The value of the GraphML-Attribute is the text content of the data element. The data element has an XML-Attribute key. In the above example the value is undefined of the GraphML-Attribute weight for the edge with identifier e3. int. 
In the above example no value is defined for the node with identifier n1 and the GraphML-Attribute with name color. information about the number of elements. long. edge. which refers to the identifier of the GraphML-Attribute. Possible values include graph. <key id="d0" for="node" attr. If no default value is specified.

graphdrawing.maxindegree="2" parse.For the second kind. For the node element the XML-Attribute parse.order denotes the order in which node and edge elements occur in the document. For the value nodesfirst no node element is allowed to occur after the first occurence of an edge element.nodeids="canonical" parse.order="nodesfirst"> <node id="n0" parse. Otherwise the value of the XML-Attribute is free.graphdrawing. The XML-Attribute parse.outdegree the outdegree.This file was written by the JAVA GraphML Library.edgesthe number of edges. The same holds for edges for which the corresponding XML-Attribute parse. information about element encoding.maxoutdegree the maximum outdegree.outdegree="1"/> 50 .indegree="0" parse.edgeids is defined.outdegree="1"/> <node id="n1" parse.following XML-Attributes for the graph element are defined: The XML-Attribute parse.graphdrawing.indegree="0" parse. where X denotes the number of occurences of the node element before the current element.--> <graphml xmlns="http://graphml. the following XML-Attributes for the graph element are defined: If the XML-Attribute parse.org/2001/XMLSchema-instance" xsi:schemaLocation="http://graphml.0/graphml. the declariation of a node is followed the declaration of its adjacent edges. For the value adjacencylist.maxoutdegree="3" parse.org/xmlns/1.0" encoding="UTF-8"?> <!-.edges="12" parse.org/xmlns http://graphml. The XML-Attribute parse.indegree denotes the indegree of the node and the XML-Attribute parse.nodes="11" parse. with the only difference that the identifiers of the edges follow the pattern eX.w3.nodes denotes the number of nodes in the graph.nodeids has the value canonical. all nodes have identifiers following the pattern nX.org/xmlns" xmlns:xsi="http://www. The following example demonstrates the parse info meta-data on our running example: <?xml version="1.xsd"> <graph id="G" edgedefault="directed" parse. the XML-Attribute parse. 
For the value free no order is imposed.maxindegree denotes the maximum indegree of the nodes in the graph and the XML-Attribute parse.edgeids="free" parse.

outdegree="2"/> <node id="n7" parse.outdegree="1"/> <node id="n3" parse.outdegree="2"/> <node id="n4" parse.outdegree="0"/> <edge id="edge0001" source="n0" target="n2"/> <edge id="edge0002" source="n1" target="n2"/> <edge id="edge0003" source="n2" target="n3"/> <edge id="edge0004" source="n3" target="n5"/> <edge id="edge0005" source="n3" target="n4"/> <edge id="edge0006" source="n4" target="n6"/> <edge id="edge0007" source="n6" target="n5"/> <edge id="edge0008" source="n5" target="n7"/> <edge id="edge0009" source="n6" target="n8"/> <edge id="edge0010" source="n8" target="n7"/> <edge id="edge0011" source="n8" target="n9"/> <edge id="edge0012" source="n8" target="n10"/> </graph> </graphml> Work on GraphML was initiated in a workshop during the 2000 Graph Drawing Symposium in Williamsburg.indegree="1" parse.outdegree="3"/> <node id="n9" parse.indegree="2" parse.indegree="2" parse.indegree="1" parse.<node id="n2" parse.outdegree="1"/> <node id="n6" parse.outdegree="1"/> <node id="n5" parse. Since then. Software to help add GraphML support to several popular tools and libraries is under development.outdegree="0"/> <node id="n8" parse.indegree="1" parse. The next major steps will be extensions for abstract graph layout information and templates to transform such information into a variety of graphics formats. and a proposal for the structural layer was presented at the 2001 Graph Drawing Symposium in Vienna.outdegree="0"/> <node id="n10" parse.indegree="1" parse.indegree="2" parse. extensions have been provided that support basic attribute data types and the embedding of information for light-weight parsers. 51 .indegree="1" parse.indegree="1" parse.
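The counting meta-data described in this section (parse.nodes, parse.edges, parse.indegree, parse.outdegree and the two maxima) can be derived from a plain edge list. The following is a minimal sketch under stated assumptions: nodes are numbered 0 to nodeCount-1 as in the canonical nX pattern, and the names ParseInfo and computeParseInfo are hypothetical, not part of GraphML or of the tree library.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical container for the counts that GraphML exposes as parse.* attributes.
struct ParseInfo {
    std::size_t nodes = 0;
    std::size_t edges = 0;
    std::size_t maxInDegree = 0;
    std::size_t maxOutDegree = 0;
    std::vector<std::size_t> inDegree;   // one entry per node
    std::vector<std::size_t> outDegree;  // one entry per node
};

// Each edge is a (source, target) pair of node numbers in [0, nodeCount).
ParseInfo computeParseInfo(
    std::size_t nodeCount,
    const std::vector<std::pair<std::size_t, std::size_t>>& edges) {
    ParseInfo info;
    info.nodes = nodeCount;
    info.edges = edges.size();
    info.inDegree.assign(nodeCount, 0);
    info.outDegree.assign(nodeCount, 0);
    for (const auto& e : edges) {
        assert(e.first < nodeCount && e.second < nodeCount);
        ++info.outDegree[e.first];   // edge leaves its source
        ++info.inDegree[e.second];   // edge enters its target
    }
    if (nodeCount > 0) {
        info.maxInDegree =
            *std::max_element(info.inDegree.begin(), info.inDegree.end());
        info.maxOutDegree =
            *std::max_element(info.outDegree.begin(), info.outDegree.end());
    }
    return info;
}
```

Applied to the twelve edges of the running example, this should reproduce the values shown in the graph element (maximum indegree 2, maximum outdegree 3).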

5.2 The Serialization Method

5.2.1 Functional description

Serialization is the process of converting a data structure or object into a format that can be stored (for example, in a file or memory buffer, or transmitted across a network connection link) and "resurrected" later in the same or another computer environment. When the resulting series of bits is reread according to the serialization format, it can be used to create a semantically identical clone of the original object. For many complex objects, such as those that make extensive use of references, this process is not straightforward. This process of serializing an object is also called deflating or marshalling an object. The opposite operation, extracting a data structure from a series of bytes, is deserialization (which is also called inflating or unmarshalling).

The basic mechanisms are to flatten object(s) into a one-dimensional stream of bits, and to turn that stream of bits back into the original object(s). Serialization lets you take an object or group of objects, put them on a disk or send them through a wire or wireless transport mechanism, then later, perhaps on another computer, reverse the process: resurrect the original object(s).

Serialization does not write class variables because they are not part of the state of the object. It also does not transmit the object's class object (e.g., its method dictionary) because the program deserializing the stream must load that class. Each serializable or externalizable class has a description of its serialization fields and methods.

Serialization provides:
- a method of persisting objects which is more convenient than writing their properties to a text file on disk, and re-assembling them by reading this back in;
- a method of remote procedure calls, e.g., as in SOAP;
- a method for distributing objects, especially in software componentry such as COM, CORBA, etc.;
- a method for detecting changes in time-varying data.

For some of these features to be useful, architecture independence must be maintained. For example, for maximal use of distribution, a computer running on a different hardware architecture should be able to reliably reconstruct a serialized data stream, regardless of endianness. This means that the simpler and faster procedure of directly copying the memory layout of the data structure cannot work reliably for all architectures. Serializing the data structure in an architecture independent format means that we do not suffer from the problems of byte ordering, memory layout, or simply different ways of representing data structures in different programming languages.

Inherent to any serialization scheme is that, because the encoding of the data is by definition serial, extracting one part of the serialized data structure requires that the entire object be read from start to end, and reconstructed. In many applications this linearity is an asset, because it enables simple, common I/O interfaces to be utilized to hold and pass on the state of an object. In applications where higher performance is an issue, it can make sense to expend more effort to deal with a more complex, non-linear storage organization.

Even on a single machine, primitive pointer objects are too fragile to save, because the objects to which they point may be reloaded to a different location in memory. To deal with this, the serialization process includes a step called unswizzling or pointer unswizzling and the deserialization process includes a step called pointer swizzling.

Since both serializing and deserializing can be driven from common code (for example, the Serialize function in Microsoft Foundation Classes), it is possible for the common code to do both at the same time, and thus: 1) detect differences between the objects being serialized and their prior copies, and 2) provide the input for the next such detection. It is not necessary to actually build the prior copy, since differences can be detected "on the fly". This is a way to understand the technique called differential execution. It is useful in the programming of user interfaces whose contents are time-varying: graphical objects can be created, removed, altered, or made to handle input events without necessarily having to write separate code to do those things.

Serialization, however, breaks the opacity of an abstract data type by potentially exposing private implementation details. To discourage competitors from making compatible products, publishers of proprietary software often keep the details of their programs' serialization formats a trade secret. Some deliberately obfuscate or even encrypt the serialized data. Yet, interoperability requires that applications be able to understand each other's serialization formats. Therefore, remote method call architectures such as CORBA define their serialization formats in detail.

Serialization is a mechanism by which you can save the state of an object by converting it to a byte stream. Whenever an object is to be sent over the network, it needs to be serialized. Moreover, if the state of an object is to be saved, the object needs to be serialized. There are two types:
1. Binary Serializable
2. XML Serializable

In Java, the Serializable interface is an empty interface: it does not contain any methods, so no methods have to be implemented.

5.2.2 The method code

The method used for the serialization in graphml format of a Tree object is Tree::saveGraphml(). The method uses as input a variable of type string containing the name of the file in which the tree is saved. After appending the graphml extension to the name of the file, the file is opened for output. A graphml file is an xml file. After writing the initial lines, a <graph> element is opened, containing the ID of the graph. For each node (vertex) of the graph, a <node> element is created, containing at least the ID of the node. Usually, elements containing representation information of the nodes are added as well. For each edge (arc) of the tree, an <edge> element is created, containing at least the IDs of the source and target nodes. Once the node and edge lists are exhausted, the <graph> and <graphml> elements are closed.

    void Tree::saveGraphml(string fileName)
    {
        // Build the output file name by appending the .graphml extension.
        string treeFileSpec = fileName;
        treeFileSpec.append(".graphml");
        ofstream mlFile;
        TreeNode * currNode;
        mlFile.open(treeFileSpec, ios::out);
        if (!mlFile.is_open()) {
            return;
        } else {
            // header: XML declaration, namespaces, yEd node-graphics key, <graph>
            mlFile << "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" << endl;
            mlFile << "<graphml xmlns=\"http://graphml.graphdrawing.org/xmlns\"" ;
            mlFile << " xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\"" ;
            mlFile << " xmlns:y=\"http://www.yworks.com/xml/graphml\"" ;
            mlFile << " xmlns:yed=\"http://www.yworks.com/xml/yed/3\"" ;
            mlFile << " xsi:schemaLocation=\"http://graphml.graphdrawing.org/xmlns http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd\">" << endl ;
            mlFile << "<key for=\"node\" id=\"d1\" yfiles.type=\"nodegraphics\"/>" ;
            mlFile << "<graph id=\"" << this->theId << "\" edgedefault=\"directed\">" << endl ;
            // nodes and arcs, visited in pre-order
            PreOrderIter prIt = this->startPre();
            prIt.skipChildren(false);
            while (prIt != this->endPre()) {
                currNode = prIt.theNode;
                mlFile << "<node id=\"" << currNode->nodeId << "\">" << endl;
                mlFile << " <data key=\"d1\">" << endl;
                mlFile << " <y:ShapeNode>" << endl;
                mlFile << " <y:Geometry height=\"30.0\" width=\"30.0\" />" << endl;
                mlFile << " <y:Fill color=\"#FFCC00\" transparent=\"false\"/>" << endl;
                mlFile << " <y:BorderStyle color=\"#000000\" type=\"line\" width=\"1.0\"/>" << endl;
                mlFile << " <y:NodeLabel alignment=\"center\" ";
                mlFile << " autoSizePolicy=\"content\" fontFamily=\"Dialog\" fontSize=\"13\" fontStyle=\"plain\" ";
                mlFile << " hasBackgroundColor=\"false\" hasLineColor=\"false\" height=\"20\" modelName=\"internal\" ";
                mlFile << " modelPosition=\"c\" textColor=\"#000000\" visible=\"true\" width=\"11.23\" x=\"9\" y=\"5\"";
                // Nodes with an ID below 1000 are labelled with their ID;
                // the others get an empty label.
                if (currNode->nodeId < 1000) {
                    mlFile << ">" << currNode->nodeId << "</y:NodeLabel>" << endl;
                } else {
                    mlFile << ">" << "</y:NodeLabel>" << endl;
                }
                // The same threshold selects the node shape.
                if (currNode->nodeId < 1000) {
                    mlFile << " <y:Shape type=\"rectangle\"/>" << endl;
                } else {
                    mlFile << " <y:Shape type=\"diamond\"/>" << endl;
                }
                mlFile << " </y:ShapeNode>" << endl;
                mlFile << " </data> " << endl ;
                mlFile << "</node>" << endl;
                // Every node except the top-level ones gets an edge from its parent.
                if (currNode->parent != this->theHead)
                    mlFile << "<edge id=\"e" << currNode->nodeId << "\" source=\"" << currNode->parent->nodeId << "\" target=\"" << currNode->nodeId << "\"/>" << endl;
                ++prIt;
            }
            // footer: close the <graph> and <graphml> elements
            mlFile << "</graph>" << endl;
            mlFile << "</graphml>" << endl;
            mlFile.flush();
            mlFile.close();
        }
    }
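The architecture-independence requirement from the functional description can be illustrated with a fixed byte order. The sketch below is only an illustration, not code from the tree library: the helper names toBigEndian and fromBigEndian are hypothetical, and big-endian is chosen here simply as one possible agreed-upon order. Because the bytes are produced with shifts rather than by copying memory, the result does not depend on the endianness of the machine that runs it.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper: writes a 32-bit value as four bytes,
// most significant byte first, regardless of host endianness.
std::array<std::uint8_t, 4> toBigEndian(std::uint32_t value) {
    std::array<std::uint8_t, 4> bytes;
    bytes[0] = static_cast<std::uint8_t>((value >> 24) & 0xFFu);
    bytes[1] = static_cast<std::uint8_t>((value >> 16) & 0xFFu);
    bytes[2] = static_cast<std::uint8_t>((value >> 8) & 0xFFu);
    bytes[3] = static_cast<std::uint8_t>(value & 0xFFu);
    return bytes;
}

// Hypothetical helper: the inverse operation, rebuilding the value
// from four bytes stored most significant byte first.
std::uint32_t fromBigEndian(const std::array<std::uint8_t, 4>& bytes) {
    return (static_cast<std::uint32_t>(bytes[0]) << 24) |
           (static_cast<std::uint32_t>(bytes[1]) << 16) |
           (static_cast<std::uint32_t>(bytes[2]) << 8) |
           static_cast<std::uint32_t>(bytes[3]);
}
```

A text format such as GraphML sidesteps this issue, since numbers are written as characters; the sketch is relevant mainly for binary formats.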

17. followed by a dash and a comma separated list of the IDs of its children.5.10 8-9 10-11 15-16.31.5.32 28-29.18.13 5-6.4.txt and has the following content: 1-2.30 57 .20.25.3 The Text File Used for Input The text file used for input contains on each line the ID of a node.21.15 2-3.26 18-19.28.24 21-22. In our case.7.8. the input text file is called one_tree.23 26-17.
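The line format described above (a node ID, a dash, then a comma separated list of child IDs) could be parsed along the following lines. This is a sketch under assumptions: the NodeLine record and the parseNodeLine function are hypothetical names that do not come from the library, and the sample line in the comment is chosen for illustration only.

```cpp
#include <exception>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record for one line of the input file.
struct NodeLine {
    int id = 0;
    std::vector<int> children;
};

// Parses a line such as "1-2,10,15" (illustrative sample).
// Returns false if the line does not follow the "<id>-<child>,<child>,..."
// pattern; in that case the contents of out are unspecified.
bool parseNodeLine(const std::string& line, NodeLine& out) {
    const std::string::size_type dash = line.find('-');
    if (dash == std::string::npos || dash == 0) {
        return false;  // no dash, or nothing before it
    }
    try {
        out.id = std::stoi(line.substr(0, dash));
        out.children.clear();
        std::istringstream rest(line.substr(dash + 1));
        std::string token;
        while (std::getline(rest, token, ',')) {
            out.children.push_back(std::stoi(token));
        }
    } catch (const std::exception&) {
        return false;  // std::stoi rejected a token
    }
    return !out.children.empty();
}
```

Reading a whole file would then amount to calling this function on each line obtained with std::getline and attaching the listed children to the node with the given ID.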

6 AN APPLICATION USING THE TREE LIBRARY

6.1 The Binary Embedding Application

The binary embedding of hierarchical taxonomies is an application that can be used in many domains, such as systematic biology, medicine, market research and artificial intelligence. Hierarchical taxonomies have become an important tool in the organization of knowledge in many domains: the US Patent Office class codes, the Library of Congress catalog, and even the ACM Computing Classification System are hierarchical in structure. Taxonomies structured as hierarchies form an easier way to navigate and access the data as well as to maintain and enrich it.

A hierarchical taxonomy is a tree structure of classifications for a given set of objects. At the top of this structure is a single classification, the root node, that applies to all objects. Nodes below this root are more specific classifications that apply to subsets of the total set of classified objects. The progress of reasoning proceeds from the general to the more specific. In scientific taxonomies, a conflative term is always a polyseme.

The application opens a file structured in a defined way, parses it, and creates an equivalent tree that is represented in memory. A taxonomy stored in memory can be embedded using a raw embedding algorithm, resulting in a binary tree. This binary tree keeps the information about the hierarchy of the nodes and it can be exported to a Graphml file, to later be visualized and/or edited in any graph editing software that supports the graphml file type.

A binary embedding has the following definition: Let T = (V, A, v0) be a rooted tree (hierarchical taxonomy). A binary embedding of T is an application Φ: V → B^n such that for any pair (v1, v2) ∈ V × V, (v1, v2) ∈ … ⇔ (Φ(v1), Φ(v2)) ∈ …
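The definition above does not make explicit which two relations are meant. As an assumption made for illustration only, take the relation on the tree to be "v1 is an ancestor of v2, or v1 = v2" and the relation on B^n to be the component-wise order on bit vectors. Under that assumption, one simple mapping that satisfies the equivalence sets, in Φ(v), one bit for every node on the path from the root to v. The sketch below is not the raw embedding algorithm of the application; all names in it are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical code type: one bit per node of the tree (n = |V|).
using Code = std::vector<bool>;

// parent[v] is the index of v's parent; the root satisfies parent[v] == v.
// Assumption: the array describes a valid rooted tree (no cycles).
// Phi(v) has bit u set for every node u on the path from the root to v.
std::vector<Code> embedByAncestors(const std::vector<std::size_t>& parent) {
    const std::size_t n = parent.size();
    std::vector<Code> phi(n, Code(n, false));
    for (std::size_t v = 0; v < n; ++v) {
        std::size_t u = v;
        while (true) {
            phi[v][u] = true;
            if (parent[u] == u) break;  // reached the root
            u = parent[u];
        }
    }
    return phi;
}

// Component-wise order on B^n: a <= b iff every bit set in a is set in b.
bool lessOrEqual(const Code& a, const Code& b) {
    assert(a.size() == b.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (a[i] && !b[i]) return false;
    }
    return true;
}
```

With this choice, lessOrEqual(phi[u], phi[v]) holds exactly when u lies on the path from the root to v, because u always belongs to its own path and, in a tree, the path to an ancestor is a prefix of the path to its descendant.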

6.2 Classes and methods used

The Embedding Algorithm contains three main classes that are represented in the class diagram below:

Fig. 6.1

First class: CAboutDlg contains information about the dialog used for About Application.

    class CAboutDlg : public CDialogEx
    {
    public:
        CAboutDlg();
        // Dialog Data
        enum { IDD = IDD_ABOUTBOX };
    protected:
        virtual void DoDataExchange(CDataExchange* pDX);
    // Implementation
    protected:
        DECLARE_MESSAGE_MAP()
    };

Second class: CembAlgApp contains information to define the class behaviors for the application.

    class CembAlgApp : public CWinApp
    {
    public:
        CembAlgApp();
    public:
        virtual BOOL InitInstance();
        DECLARE_MESSAGE_MAP()
    };
    extern CembAlgApp theApp;

The third class: CembAlgDlg contains methods to implement the input file processing, saving the current tree in graphml format and for the raw embedding. It is defined in the following code:

    class CembAlgDlg : public CDialogEx
    {
    // Construction
    public:
        CembAlgDlg(CWnd* pParent = NULL); // standard constructor
        // Dialog Data
        enum { IDD = IDD_EMBALG_DIALOG };
    protected:
        virtual void DoDataExchange(CDataExchange* pDX);
    protected:
        HICON m_hIcon;
        virtual BOOL OnInitDialog();
        afx_msg void OnSysCommand(UINT nID, LPARAM lParam);
        afx_msg void OnPaint();
        afx_msg HCURSOR OnQueryDragIcon();
        DECLARE_MESSAGE_MAP()
    public:
        afx_msg void OnBnClickedFileOpen();
        afx_msg void OnBnClickedParseFile();
        afx_msg void OnBnClickedEmbedTree();
        afx_msg void OnBnClickedSaveBinary();
        afx_msg void OnBnClickedClearLog();
        afx_msg void OnBnClickedExit();
        CListBox m_event_log;
        string origTreeFileSpec;
        string embedTreeFileSpec;
        RootedTree * origTree;
        RootedTree * embedTree;
        RootedTree * currTree;
    };

7 CONCLUSIONS AND FURTHER DEVELOPMENTS

"Trees are remarkably useful and powerful data structures, with many applications". Though trees seem complex at first, they are fairly powerful data structures. In this paper I covered some examples of how to use trees.

The purpose of this library is to organize data in a well-structured form, which is the n-ary tree. To implement the library I have used different classes and iterators. There are four main classes, the most important one being the Rooted Tree class. The library provides four different iteration schemes. I have also implemented several operations, such as initialising, tree traversal and inserting nodes. The tree library I have created can be used in the Binary Embedding for Hierarchical Taxonomies application, and in many other applications.

Further developments include, but are not limited to:
- Implementing a breadth first iterator
- Implementing operations on trees, like appendChild, insertSubtree, replace, moveAfter
- Implementing invariant computations like getDepth, getMaxDepth, getNumberOfSiblings
- Implementing allocation and deallocation routines for better memory management
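To give an idea of what the first and third items could involve, here is a minimal sketch of a breadth-first traversal and of a maximum-depth computation on an n-ary tree. It relies on a hypothetical Node type with a vector of child pointers; the library's own node class and iterator interface may well look different, so this is an illustration rather than a proposed implementation.

```cpp
#include <algorithm>
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical node type for illustration only.
struct Node {
    int id;
    std::vector<Node*> children;
};

// Returns the node IDs level by level, left to right within each level.
std::vector<int> breadthFirstIds(const Node* root) {
    std::vector<int> order;
    if (root == nullptr) return order;
    std::queue<const Node*> pending;
    pending.push(root);
    while (!pending.empty()) {
        const Node* current = pending.front();
        pending.pop();
        order.push_back(current->id);
        for (const Node* child : current->children) {
            pending.push(child);  // visited after the whole current level
        }
    }
    return order;
}

// Number of edges on the longest path from root down to a leaf.
// Precondition: root is not null. A single node has depth 0.
std::size_t maxDepth(const Node* root) {
    std::size_t best = 0;
    for (const Node* child : root->children) {
        best = std::max(best, 1 + maxDepth(child));
    }
    return best;
}
```

A queue is the natural choice here because nodes must be visited in the order in which they were discovered; a real iterator would keep the same queue as its internal state and advance one node per increment.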
