Advanced Data Structures

Saurabh Pandey
Department of Computer Application, Azad Institute of Engineering & Technology, Lucknow, India
e-mail: saurabhpandey7@indiatimes.com
Abstract— This paper investigates how to maintain an efficient dynamic ordered set of bit strings, which is an important problem in the fields of information search and information processing. Generally, a dynamic ordered set is required to support five essential operations: search, insertion, deletion, max-value retrieval and next-larger-value retrieval. Building on previous research results, we present an advanced data structure named the rich binary tree (RBT), which follows both the binary-search-tree property and the digital-search-tree property. Also, every key K keeps the most significant difference bit (MSDB) between itself and the next larger value among K's ancestors, as well as that between itself and the next smaller one among its ancestors. With the new data structure, we can maintain a dynamic ordered set in O(L) time, where L is the length of the bit strings. Since computers represent objects in binary, our method has great potential in applications. In fact, RBT can be viewed as a general-purpose data structure for problems concerning order, such as searching, sorting and maintaining a priority queue. For example, when RBT is applied to sorting, we get an algorithm that is linear in the number of keys and whose performance is far better than quicksort. Moreover, unlike quicksort, RBT supports constant-time dynamic insertion/deletion.

Keywords— information processing; dynamic ordered set; algorithms and data structures; rich binary tree

I. INTRODUCTION

Data structures play a central role in modern computer science. You interact with data structures much more often than with algorithms (think of Google, your mail server, and even your network routers). In addition, data structures are essential building blocks in obtaining efficient algorithms. This paper will cover major results and current directions of research in data structures:

• Classic comparison-based data structures. The area is still rich with open problems, such as whether there is a single best (dynamically optimal) binary search tree.

• Dynamic graph problems. In almost any network, a link's availability and speed are anything but constant, which has led to a re-evaluation of the common understanding of graph problems: how to maintain essential information, such as a minimum-weight spanning forest, while the graph changes.

• Integer data structures: beating the O(lg n) barrier in sorting and searching. If you haven't seen this before, beating O(lg n) may come as a surprise. If you have seen this before, you might think that it's about a bunch of grungy bit-tricks. In fact, it is about fundamental issues regarding information and communication. We hope to give a cleaner and more modern view than you might have seen before.

• Data structures for range queries (and other geometric problems) and problems on trees. There are surprising equivalences between such problems, as well as interesting solutions.

• Data structures optimized for external memory, and cache-oblivious data structures. Any problem (e.g., sorting, priority queues) is different when you're dealing with disk instead of main memory, or you care about cache performance. Memory hierarchies have become important in practice because of the recent escalation in data size.

Figure 1: Structure in lab view

II. COMPLEXITY OF ALGORITHM

An algorithm is a sequence of steps that gives a method of solving a problem; it forms the logic of a program. The efficiency of an algorithm depends on two major criteria: the first is the run time of the algorithm and the second is the space it uses. The run time of an algorithm is the time taken by the program for execution. Complexity tends to be used to characterize something with many parts in an intricate arrangement. The study of these complex linkages is the main goal of network theory and network science. In science[1] there are at this time a number of approaches to characterizing complexity, many of which are reflected in this article. In a business context, complexity management is the methodology to minimize value-destroying complexity and efficiently control value-adding complexity in a cross-functional approach.
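As a simple illustration of the run-time criterion above, the standard clock() function can be used to time a piece of code; in the sketch below a summation loop merely stands in for a real algorithm, and the program is only an illustration, not part of the paper's method.

    /* Sketch: measuring run time with the standard clock() function. */
    #include <stdio.h>
    #include <time.h>

    int main(void)
    {
        enum { N = 1000000 };
        static long a[N];           /* the "data" the algorithm works on */
        long sum = 0;

        clock_t start = clock();
        for (long i = 0; i < N; i++) {   /* stand-in for the algorithm */
            a[i] = i;
            sum += a[i];
        }
        clock_t end = clock();

        printf("sum = %ld, elapsed = %.3f s\n",
               sum, (double)(end - start) / CLOCKS_PER_SEC);
        return 0;
    }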

Figure 2: Web Mining Process

III. CHOOSING AN ALGORITHM

• For every problem there is a multitude of algorithms that solve it. So you have a choice of algorithms to code up as programs.

• If a program is likely to be used only once on a small amount of data, then you should select the algorithm that is easiest to implement. Code it up correctly, run it and move on to something else.

• But if the program will be used many times and has a lifetime that makes maintenance likely, then other factors come into play, including readability, extensibility, portability, reusability, ease of use and efficiency. It is efficiency that we will be looking at in this part of the paper.

IV. BEST, WORST AND AVERAGE CASE

In computer science, best, worst and average cases of a given algorithm express what the resource usage is at least, at most and on average, respectively. Usually the resource being considered is running time, but it could also be memory or other resources.

In real-time computing, the worst-case execution time is often of particular concern, since it is important to know how much time might be needed in the worst case to guarantee that the algorithm will always finish on time. Average performance and worst-case performance are the most used in algorithm analysis. Less widely found is best-case performance, but it does have uses: for example, where the best cases of individual tasks are known, they can be used to improve the accuracy of an overall worst-case analysis. The term best-case performance is used in computer science to describe the way an algorithm behaves under optimal conditions. For example, the best case for a simple linear search on a list occurs when the desired element is the first element of the list. Worst-case analysis and average-case analysis have some similarities, but in practice they usually require different tools and approaches. On the other hand, some data structures such as hash tables have very poor worst-case behaviour, but a well-written hash table of sufficient size will statistically never give the worst case; the average number of operations performed follows an exponential decay curve, and so the run time of an operation is statistically bounded.

Consider linear search on a list of n elements. In the worst case, the search must visit every element once; this happens when the value being searched for is either the last element in the list or is not in the list. On average, assuming the value searched for is in the list and each list element is equally likely to be the value searched for, the search visits only about n/2 elements. Now consider quicksort applied to a list of n elements, assumed to be all different and initially in random order. This popular sorting algorithm has an average-case performance of O(n log n), which contributes to making it a very fast algorithm in practice. But given a worst-case input, its performance degrades to O(n^2).
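A minimal C version of that linear search, annotated with the three cases, might look as follows (the function name is illustrative):

    /* Linear search, with the cases discussed above.
       Best case: the key is at index 0 (one comparison).
       Worst case: the key is last or absent (n comparisons).
       Average case: about n/2 comparisons when the key is present and
       every position is equally likely. */
    #include <stddef.h>

    int linear_search(const int a[], size_t n, int key)
    {
        for (size_t i = 0; i < n; i++)
            if (a[i] == key)
                return (int)i;   /* found: index of the first match */
        return -1;               /* not found */
    }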

V. SORTING ALGORITHM

In computer science and mathematics, a sorting algorithm is an algorithm that puts the elements of a list in a certain order. The most-used orders are numerical order and lexicographical order. Efficient sorting is important for optimizing the use of other algorithms (such as search and merge algorithms) that require sorted lists to work correctly; it is also often useful for canonicalizing data and for producing human-readable output. The output is in nondecreasing order (each element is no smaller than the previous element according to the desired total order). Since the dawn of computing, the sorting problem has attracted a great deal of research, perhaps due to the complexity of solving it efficiently despite its simple, familiar statement. For example, bubble sort was analyzed as early as 1956. Although many consider it a solved problem, useful new sorting algorithms are still being invented (for example, library sort was first published in 2004). Sorting algorithms are prevalent in introductory computer science classes, where the abundance of algorithms for the problem provides a gentle introduction to a variety of core algorithm concepts, such as big O notation, divide-and-conquer algorithms, data structures, randomized algorithms, best, worst and average case analysis, time-space trade-offs, and lower bounds.
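The output condition stated above (each element no smaller than its predecessor) can be checked directly; the short sketch below uses an illustrative function name and is not tied to any particular sorting routine.

    /* Sketch: verifying that a sequence is in nondecreasing order. */
    #include <stdbool.h>
    #include <stddef.h>

    bool is_nondecreasing(const int a[], size_t n)
    {
        for (size_t i = 1; i < n; i++)
            if (a[i] < a[i - 1])
                return false;    /* an element smaller than its predecessor */
        return true;
    }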

VI. RED-BLACK TREE

A red-black tree is a binary search tree where each node has a color attribute, the value of which is either red or black. In addition to the ordinary requirements imposed on binary search trees, the following requirements apply to red-black trees:
1. A node is either red or black.
2. The root is black. (This rule is sometimes omitted from other definitions. Since the root can always be changed from red to black, but not necessarily vice versa, it has little effect on analysis.)
3. All leaves are black.
4. Both children of every red node are black.
5. Every simple path from a given node to any of its descendant leaves contains the same number of black nodes.
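For concreteness, a node of such a tree can be declared as follows; this is only a sketch with illustrative field names, not the layout of any particular library.

    /* Sketch of a red-black tree node; the comments restate the
       properties listed above. */
    enum rb_color { RED, BLACK };

    struct rb_node {
        int key;                 /* ordered as in any binary search tree    */
        enum rb_color color;     /* property 1: every node is red or black  */
        struct rb_node *left;    /* children; a NULL child counts as a      */
        struct rb_node *right;   /*   black leaf (property 3)               */
        struct rb_node *parent;  /* needed by the insert/delete fix-ups     */
    };

    /* Properties 2, 4 and 5 (black root, no red node with a red child,
       equal black height on every root-to-leaf path) are invariants that
       insertion and deletion must restore after each update. */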

Time complexity in big O notation:

                Average      Worst case
    Space       O(n)         O(n)
    Search      O(log n)     O(log n)
    Insert      O(log n)     O(log n)
    Delete      O(log n)     O(log n)

Figure 3: Bubble Sort

A red-black tree is a type of self-balancing binary search tree, a data structure used in computer science, typically to implement associative arrays. The original structure was invented in 1972 by Rudolf Bayer[1] and named the "symmetric binary B-tree", but it acquired its modern name in a 1978 paper by Leonidas J. Guibas and Robert Sedgewick.[2] It is complex, but it has good worst-case running time for its operations and is efficient in practice: it can search, insert, and delete in O(log n) time, where n is the total number of elements in the tree. Put very simply, a red-black tree is a binary search tree that inserts and removes intelligently, to ensure the tree is reasonably balanced. A red-black tree is a special type of binary tree, used in computer science to organize pieces of comparable data, such as text fragments or numbers. Red-black trees, like all binary search trees, allow efficient in-order (Left-Root-Right) traversal of their elements. The search time results from the traversal from root to leaf, and therefore a balanced tree, having the least possible tree height, results in O(log n) search time.
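That root-to-leaf search is the ordinary binary-search-tree lookup, in which a node's color plays no part. A minimal sketch follows, with illustrative names and only the fields search needs:

    /* Sketch: the root-to-leaf lookup shared by all binary search trees,
       red-black trees included. */
    struct bst_node {
        int key;
        struct bst_node *left, *right;
    };

    struct bst_node *bst_search(struct bst_node *root, int key)
    {
        while (root != NULL && root->key != key)
            root = (key < root->key) ? root->left : root->right;
        return root;             /* NULL when the key is absent */
    }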

VII. EULER TOUR TREE

Dynamic trees, also known as link-cut trees, are able to represent a dynamic forest of rooted trees in O(log n) amortized time per operation. Here we examine a simpler, although not strictly better, alternative to link-cut trees known as Euler tour trees. We will then use Euler tour trees to achieve dynamic connectivity in general graphs in O(log^2 n) time, and finally survey some of what is and is not known for dynamic graphs. Euler tour trees are due to Henzinger and King [1] and are an alternative to link-cut trees for representing dynamic trees. Euler tour trees are simpler and easier to analyze than link-cut trees, but do not naturally store aggregate information about paths in the tree. Euler tour trees are well suited for storing aggregate information on subtrees, which is a feature we will use in the next section. The idea behind Euler tour trees is to store the Euler tour of the tree. In an arbitrary graph, an Euler tour is a path that traverses each edge exactly once. For trees we say that each edge is bidirectional, so the Euler tour of a tree is the path through the tree that begins at the root and ends at the root, traversing each edge exactly twice: once to enter the subtree, and once to exit it. The Euler tour of a tree is essentially the depth-first traversal of the tree that returns to the root at the end.
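As a concrete illustration of this definition, the sketch below produces the visit sequence of such a tour by a depth-first traversal; the node layout and names are illustrative, not taken from Henzinger and King.

    /* Sketch: the Euler tour of a rooted tree, recorded as the sequence
       of nodes visited; each edge is traversed exactly twice. */
    #include <stdio.h>

    struct Node {
        int id;
        struct Node *first_child;
        struct Node *next_sibling;
    };

    /* Append v, then recurse into each child and append v again after
       returning from it (enter/exit each edge once). */
    static void euler_tour(const struct Node *v, int *tour, int *len)
    {
        tour[(*len)++] = v->id;
        for (const struct Node *c = v->first_child; c; c = c->next_sibling) {
            euler_tour(c, tour, len);
            tour[(*len)++] = v->id;
        }
    }

    int main(void)
    {
        struct Node c2   = { 3, NULL, NULL };
        struct Node c1   = { 2, NULL, &c2 };
        struct Node root = { 1, &c1, NULL };   /* root with children 2 and 3 */
        int tour[16], len = 0;

        euler_tour(&root, tour, &len);
        for (int i = 0; i < len; i++)
            printf("%d ", tour[i]);            /* prints: 1 2 1 3 1 */
        putchar('\n');
        return 0;
    }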

Figure 5: The correspondence between a tree and its Euler tour

An Euler-Tour tree supports the following operations:

FindRoot(v): Find the root of the tree containing node v. In the Euler tour of a tree, the root is visited first and last. Therefore we simply return the minimum or maximum element in the BST.

Cut(v): Cut the subtree rooted at v from the rest of the tree. Note that the Euler tour of v's subtree is a contiguous subsequence of visits that starts and ends with v, contained in the sequence of visits for the whole tree. To cut the subtree rooted at v, we may simply split the BST before its first and after its last visit to v. This splitting gives us the Euler tour of the tree before reaching v, the tour of v's subtree, and the tour of the tree after leaving v. Concatenating the first and last pieces together, and possibly deleting one redundant visitation between the end of the first piece and the beginning of the last, gives us our answer.

Link(u, v): Insert u's subtree as a child of v. In the resulting Euler tour, we need to traverse u's subtree immediately after and immediately before visits to v. Therefore we will split v's traversal before the last visit to v, and then concatenate onto the left piece a singleton visit to v, followed by the Euler tour of u's subtree, followed by the right piece.

Each of these operations performs a constant number of search, split, and merge operations on the Euler tour tree. Each of these takes O(log n) on a balanced BST data structure, so the total running time for Euler-Tour trees is O(log n) per operation. If we use a B-tree with fanout of Θ(log n) instead of a balanced BST in our Euler tour trees, we can achieve O(log n / log log n) searches (from the depth of the tree) and O(log^2 n / log log n) updates (from the depth times the branching factor).

VIII. QUICK SORT

Quicksort can be made to go quadratic by constructing input on the fly in response to the sequence of items compared. The technique is illustrated by a specific adversary for the standard C qsort function. The general method works against any implementation of quicksort, even a randomizing one, that satisfies certain very mild and realistic assumptions. When using quicksort one often feels a nagging tension: suppose it goes quadratic? Tactics to avoid embarrassing results in some low-entropy cases, such as already-ordered input, are cited in most algorithms going quadratic in real-life applications. No matter how hard implementers try, they cannot (without great sacrifice of speed) defend against all inputs. This note describes an adversarial method that finds chinks in the defenses of any implementation. A polymorphic implementation of quicksort, such as the standard C function qsort, never looks at the data. It relies instead on an externally supplied comparison function, and that allows us to monitor and influence the program's progress noninvasively. To do so we make a comparison function that observes the pattern of comparisons and constructs adverse data on the fly. Recall that quicksort sorts a sequence of n data items in three phases:
1. Pick a data item as the pivot. We assume that this phase uses O(1) comparisons.
2. Partition the data into three parts that respectively contain all items less than the pivot, the pivot item itself, and all items greater than the pivot. The placement of items equal to the pivot varies among implementations.
3. Sort the parts on either side of the pivot recursively, as in the following code:

    /* The partition routine was not given in the text; a Lomuto-style
       partition over an int array is supplied here so the fragment is
       self-contained (a real qsort works generically through a
       comparison function instead). It puts the pivot, taken as the
       last element, into its final position and returns that index. */
    static int partition(int a[], int low, int high)
    {
        int pivot = a[high], i = low, t;
        for (int j = low; j < high; j++)
            if (a[j] < pivot) { t = a[i]; a[i] = a[j]; a[j] = t; i++; }
        t = a[i]; a[i] = a[high]; a[high] = t;
        return i;
    }

    void quicksort(int a[], int low, int high)
    {
        /* Termination condition! */
        if (high > low) {
            int pivot = partition(a, low, high);
            quicksort(a, low, pivot - 1);
            quicksort(a, pivot + 1, high);
        }
    }

Quicksort is a divide and conquer algorithm. Quicksort first divides a large list into two smaller sub-lists: the low elements and the high elements. Quicksort can then recursively sort the sub-lists. The steps are: pick an element, called a pivot, from the list; reorder the list so that all elements with values less than the pivot come before the pivot, while all elements with values greater than the pivot come after it (equal values can go either way), so that after this partitioning the pivot is in its final position (this is called the partition operation); then recursively sort the sub-list of lesser elements and the sub-list of greater elements.
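Because a polymorphic qsort sees the data only through the supplied comparison function, a caller can observe its behaviour from the outside. The sketch below merely counts comparisons; the adversary described above goes further and fabricates adverse values on the fly, which is not reproduced here. The comparator and counter names are illustrative.

    /* Sketch: counting the comparisons made by the standard qsort(). */
    #include <stdio.h>
    #include <stdlib.h>

    static long comparisons;            /* bumped by every call qsort makes */

    static int counting_cmp(const void *x, const void *y)
    {
        comparisons++;
        int a = *(const int *)x, b = *(const int *)y;
        return (a > b) - (a < b);       /* -1, 0 or 1 without overflow */
    }

    int main(void)
    {
        int data[] = { 5, 3, 8, 1, 9, 2, 7, 4, 6, 0 };
        size_t n = sizeof data / sizeof data[0];

        qsort(data, n, sizeof data[0], counting_cmp);

        printf("sorted with %ld comparisons\n", comparisons);
        return 0;
    }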

AVERAGE COMPLEXITY

Even if pivots aren't chosen randomly, quicksort still requires only O(n log n) time on average over all possible permutations of its input. Because this average is simply the sum of the times over all permutations of the input divided by n factorial, it is equivalent to choosing a random permutation of the input. When we do this, the pivot choices are essentially random, leading to an algorithm with the same running time as randomized quicksort. More precisely, the average number of comparisons over all permutations of the input sequence can be estimated accurately by solving the recurrence relation

    C(n) = (n - 1) + (1/n) * sum_{i=0}^{n-1} [ C(i) + C(n-1-i) ],   with C(0) = C(1) = 0.

Here, n − 1 is the number of comparisons the partition uses. Since the pivot is equally likely to fall anywhere in the sorted list order, the sum is averaging over all possible splits. This means that, on average, quicksort performs only about 39% worse than the ideal number of comparisons, which is its best case. In this sense it is closer to the best case than the worst case. This fast average runtime is another reason for quicksort's practical dominance over other sorting algorithms.

The disadvantage of the simple version above is that it requires O(n) extra storage space, which is as bad as merge sort. The additional memory allocations required can also drastically impact speed and cache performance in practical implementations. There is a more complex version which uses an in-place partition algorithm and can achieve the complete sort using only O(log n) extra space. The outline of a formal proof of the O(n log n) expected time complexity follows. Assume that there are no duplicates, as duplicates could be handled with linear-time pre- and post-processing, or considered cases easier than those analyzed. Choosing a pivot uniformly at random from 0 to n − 1 is then equivalent to choosing the size of one particular partition uniformly at random from 0 to n − 1.

Figure 8: Quick sort search time

The operations we are allowed to do are to copy a finger, follow a pointer to a node, and retrieve or change the contents of a node's fields. So the BST model allows for a subset of the data structures allowable in the pointer-machine model. We will focus on data structures in this model that solve the dynamic connectivity problem.
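Returning to the average-case recurrence quoted earlier, a short numerical check is possible: iterating the recurrence and comparing it with the known closed form 2(n+1)H_n - 4n, where H_n = 1 + 1/2 + ... + 1/n, which grows as roughly 1.39 n log2 n. The program below is only an illustration of that calculation.

    /* Sketch: iterate C(n) = (n - 1) + (2/n) * sum_{i<n} C(i), C(0)=C(1)=0,
       and compare with the closed form 2(n+1)H_n - 4n. */
    #include <stdio.h>

    int main(void)
    {
        enum { N = 1000 };
        double c[N + 1], prefix = 0.0, harmonic = 0.0;

        c[0] = c[1] = 0.0;
        for (int n = 1; n <= N; n++) {
            prefix += c[n - 1];                  /* sum of C(0..n-1) */
            if (n >= 2)
                c[n] = (n - 1) + 2.0 * prefix / n;
            harmonic += 1.0 / n;                 /* H_n */
        }
        printf("recurrence:  C(%d) = %.3f\n", N, c[N]);
        printf("closed form: 2(n+1)H_n - 4n = %.3f\n",
               2.0 * (N + 1) * harmonic - 4.0 * N);
        return 0;
    }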
IX. CONCLUSION

Various algorithms and sorting techniques are used in advanced data structures to arrange data in memory. Algorithm analysis is important in practice because the accidental or unintentional use of an inefficient algorithm can significantly impact system performance. In time-sensitive applications, an algorithm taking too long to run can render its results outdated or useless. Quicksort is a space-optimized version of the binary tree sort: instead of inserting items sequentially into an explicit tree, quicksort organizes them concurrently into a tree that is implied by the recursive calls. The two algorithms make exactly the same comparisons, but in a different order. The most direct competitor of quicksort is heapsort. Heapsort's worst-case running time is always O(n log n), but heapsort is assumed to be on average somewhat slower than quicksort; this is still debated and under research, with some publications indicating the opposite. Quicksort retains the chance of worst-case performance except in the introsort variant, which switches to heapsort when a bad case is detected. If it is known in advance that heapsort is going to be necessary, using it directly will be faster than waiting for introsort to switch to it. Red-black trees, like all binary search trees, allow efficient in-order (Left-Root-Right) traversal of their elements. The search time results from the traversal from root to leaf, and therefore a balanced tree, having the least possible tree height, results in O(log n) search time.

X. ACKNOWLEDGMENT

I would like to acknowledge and extend my sincere gratitude to the following persons who made the completion of this research possible: Mr. Shafeeq Ahmad, Head of the Department of Computer Science and Engineering; Asst. Prof. Mr. Hemant Kumar Singh, for providing me with timely guidance and motivation; and all Computer Science and Engineering Department faculty members and staff.

REFERENCES

[1] Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L.; Stein, Clifford (2001). Introduction to Algorithms, Second ed., Chapter 1: Foundations. Cambridge, MA: MIT Press and McGraw-Hill. pp. 3–122. ISBN 0262032937.
[2] Sedgewick, Robert (1998). Algorithms in C, Parts 1–4: Fundamentals, Data Structures, Sorting, Searching, 3rd ed. Reading, MA: Addison-Wesley Professional. ISBN 9780201314526.
[3] Knuth, Donald. The Art of Computer Programming. Addison-Wesley.
[4] Greene, Daniel A.; Knuth, Donald E. (1982). Mathematics for the Analysis of Algorithms, Second ed. Birkhäuser. ISBN 374633102X.
[5] Goldreich, Oded (2008). Computational Complexity: A Conceptual Perspective. Cambridge University Press. ISBN 9780521884730.
[6] MathWorld: Red-Black Tree.
[7] San Diego State University, CS 660: Red-Black Tree Notes, by Roger Whitney.
[8] Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L.; Stein, Clifford. Introduction to Algorithms, Second Edition, Chapter 13: Red-Black Trees. MIT Press and McGraw-Hill, 2001. ISBN 0-262-03293-7.
[9] Brin, Sergey; Page, Lawrence. "The Anatomy of a Large-Scale Hypertextual Web Search Engine." 7th Int. WWW Conf. Proceedings, Brisbane, Australia, April 1998.
[10] Zayed, J. M.; Nouvel, N.; Rauwald, U.; Scherman, O. A. "Chemical Complexity – Supramolecular Self-Assembly of Synthetic and Biological Building Blocks in Water." Chemical Society Reviews, 2010, 39, 2806–2816. http://pubs.rsc.org/en/Content/ArticleLanding/2010/CS/b922348g
[11] Weaver, Warren (1948). "Science and Complexity." American Scientist 36 (4): 536. PMID 18882675. http://www.ceptualinstitute.com/genre/weaver/weaver-1947b.htm. Retrieved 2007-11-21.
[12] Tarjan, R. E.; Vishkin, U. "Finding Biconnected Components and Computing Tree Functions in Logarithmic Parallel Time."
[13] LaMarca, A.; Ladner, R. E. "The Influence of Caches on the Performance of Sorting." Proceedings of the Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, 1997, pp. 370–379.
[14] Moller, Faron. Analysis of Quicksort. CS 332: Designing Algorithms, Department of Computer Science, Swansea University; Martínez, Conrado; Roura, Salvador. "Optimal Sampling Strategies in Quicksort and Quickselect." SIAM J. Computing 31(3):683–705, 2001.