
Ideas on Treaps

[Title-slide figure: the running example treap, with (key, priority) pairs N,33 H,20 E,9 M,14 K,7 T,17 S,12 P,8 U,2 W,6 Z,4.]

Maverick Woo <pooh+@cmu.edu>

Disclaimer

Articles of interest
- Raimund Seidel and Cecilia R. Aragon. Randomized Search Trees. Algorithmica 16 (1996), 464-497.
- Guy E. Blelloch and Margaret Reid-Miller. Fast Set Operations Using Treaps. In Proc. 10th Annual ACM SPAA, 1998.

Of course this is joint work with Guy.
Hopefully Daniel will also show up.

Background

Very high level talk
- No analysis
- To make this a technical talk, I will insert a math symbol: Σ

Some background
- Splay Trees (zig, zig-zig, zig-zig-zig…)
- Treaps, if you still remember…

Agenda

- Data structure research overview
- Treaps refresher
- Some current issues on Treaps

Data Structure Research

I am not qualified to say yet, but I do have some "feelings" about it.
- Not that many high-level problems.
  - Representing a set/ordering
  - Supporting some operations
- Some say it's all about applications.
  - Applications don't have to be very specific.
  - But they need to be specific enough---we can make assumptions.

What Operations?

Basic
- Insert, Membership
Intermediate
- Delete (e.g. Binomial vs. Fibonacci Heaps)
- Disjoint-Union (e.g. Union-Find)
Higher Level
- Union, Intersection, Difference
- Finger Search

Behavior Restrictions

Persistence
- "Functional"
- More later…
Architecture Independence
- Relatively new, a.k.a. "cache-oblivious"
- Runs efficiently on hierarchical memory
- Avoids memory-specific parameterization
  - Forget data block size, cache line width, etc.
- Not my theme today

Why Persistence?

Many reasons for persistence
- It's practical with good garbage collectors.
- Functional programming makes everyone's life easier.
For the theoretician
- You don't need to worry about side effects.
- Better analysis possible: NESL
For the programmer
- You don't need to worry about side effects.
- Fewer memory leaks, fewer dangling pointers

Real-life example 1

You have operations working on multiple instances.
- You index the web.
- You build your indices with your cool data structures.
- Conjunction query (AND) is intersection.
- You do the intersection on two indices.
- Now one of the indices can get corrupted.

Real-life example 2

You are rich.
- Once upon a time, in a dot-com far away…
- You run a multi-processor machine.
- You learned that Splay Trees are cool.
- You even learned how to write multi-threaded programs.
- Thread1 searches for x on SplayInstance42.
- Thread2 searches for y on SplayInstance42.
Real-world situation: search engines

Data Structure vs. Hacking

Examples
- To learn more about Splay Trees
  - Dial (412)-HACKERS.
  - Ask for Danny Sleator…
- OK, real example
  - (Persistent) FIFO Queues
  - Operations: IsEmpty(Q), Enqueue(Q,x), Dequeue(Q)
  - Need to grow, let's use Linked List…

FIFO Queues

Linked List is "bad" though
- Traversing to the tail takes linear time.
- Either Enqueue or Dequeue is going to be linear time.
If one is not good enough, use two.
- Suppose the queue is x1 x2 … xi yi+1 yi+2 … yn.
- Represent it as [x1 x2 … xi], [yn yn-1 … yi+1] (sketch below).
- You can figure out the details yourself.
How about double-ended queues (deques)?
- With that much extra space, may be faster with a tree.
In the end, isn't this just a hack?
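For concreteness, here is a minimal SML sketch of the two-list queue (structure and function names are my own, not from the talk). Enqueue pushes onto the back list; Dequeue pops the front list and reverses the back list only when the front runs dry, so each element is reversed at most once and both operations are amortized O(1).

  structure TwoListQueue = struct
    (* front holds x1…xi in order; back holds yn…yi+1, the rear reversed *)
    type 'a queue = 'a list * 'a list

    val empty : 'a queue = ([], [])

    fun isEmpty ([], []) = true
      | isEmpty _ = false

    fun enqueue ((front, back), x) = (front, x :: back)

    (* returns the dequeued element together with the new queue: persistent *)
    fun dequeue ([], [])   = raise Empty
      | dequeue ([], back) = dequeue (List.rev back, [])
      | dequeue (x :: front, back) = (x, (front, back))
  end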

Agenda

- Data structure research overview
- Treaps refresher
- Some current issues on Treaps

Treaps Refresher

A Treap is a recursive data structure.

  datatype 'a Treap = E | T of priority * 'a Treap * 'a * 'a Treap

Each node has a key and a priority.
- Assume all keys are unique.
- Arrange keys in in-order, priorities in heap-order.
Priority is chosen uniformly at random.
- 8-way independence suffices for the analysis
- Can be computed with hash functions
  - Don't need to store the priority
  - A key's priority can be made consistent across runs

Treap Operations

Membership
- As in binary search trees
Insert
- Add as leaf by key (in-order)
- Rotate up by priority (heap-order); sketch below
Delete
- Reverse what insert does
Find-min, etc.
- Walk on the left spine, etc.
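As a hedged illustration (my rendering, not the talk's code), here is insert in SML, repeating the datatype from the refresher with priority fixed to int; the caller supplies the priority, e.g. hashed from the key as suggested above.

  type priority = int
  datatype 'a Treap = E | T of priority * 'a Treap * 'a * 'a Treap

  (* descend by key, attach as a leaf, then rotate up while heap order is violated *)
  fun insert cmp (E, k, p) = T (p, E, k, E)
    | insert cmp (t as T (tp, l, x, r), k, p) =
        case cmp (k, x) of
          EQUAL => t                                    (* keys are assumed unique *)
        | LESS =>
            (case insert cmp (l, k, p) of
               lt as T (lp, ll, lx, lr) =>
                 if lp > tp
                 then T (lp, ll, lx, T (tp, lr, x, r))  (* rotate right *)
                 else T (tp, lt, x, r)
             | E => t)                                  (* unreachable *)
        | GREATER =>
            (case insert cmp (r, k, p) of
               rt as T (rp, rl, rx, rr) =>
                 if rp > tp
                 then T (rp, T (tp, l, x, rl), rx, rr)  (* rotate left *)
                 else T (tp, l, x, rt)
             | E => t)                                  (* unreachable *)

For example, insert Int.compare (t, 42, hash 42), for whatever hash function supplies the priorities.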

Treap Split

Want top-down split (it's faster)

  (less, x, gtr) = Split(root, k)
    If (root.k < k)  // want to split right subtree
      Let (l1, m, r1) = Split(root.right, k)
      (T(root.p, root.left, root.k, l1), m, r1)
    Else if (root.k > k)  // want to split left subtree
      Let (l1, m, r1) = Split(root.left, k)
      (l1, m, T(root.p, r1, root.k, root.right))
    Else
      (root.left, root.k, root.right)
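The same split in SML, a sketch reusing the Treap datatype above; the middle component reports whether the key was present.

  fun split cmp (E, k) = (E, NONE, E)
    | split cmp (T (p, l, x, r), k) =
        case cmp (k, x) of
          GREATER =>                      (* k splits the right subtree *)
            let val (l1, m, r1) = split cmp (r, k)
            in (T (p, l, x, l1), m, r1) end
        | LESS =>                         (* k splits the left subtree *)
            let val (l1, m, r1) = split cmp (l, k)
            in (l1, m, T (p, r1, x, r)) end
        | EQUAL => (l, SOME x, r)

Heap order is preserved because l1 and r1 come from subtreaps whose priorities are already below p.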

Treap Split Example

[Figure: Split(Tr, "V") on the example treap. Before: the full treap rooted at N,33. After: "less" holds E,9 H,20 K,7 M,14 N,33 P,8 S,12 T,17 U,2 and "gtr" holds W,6 Z,4.]

Treap Split Persistence

These figures are deceptive.
- Only 4 new nodes are created.
- All of them are on the search path to "V".

[Figure: "less" and "gtr" share every untouched subtree with the input treap.]

Treap Join

  Join(less, gtr)  // less < x < gtr
    Handle empty less or gtr
    If (less.p > gtr.p)
      T(less.p, less.left, less.k, Join(less.right, gtr))
    Else
      T(gtr.p, Join(less, gtr.left), gtr.k, gtr.right)
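Again in SML (a sketch; it assumes every key in less is smaller than every key in gtr):

  fun join (E, gtr) = gtr
    | join (less, E) = less
    | join (less as T (lp, ll, lx, lr), gtr as T (gp, gl, gx, gr)) =
        if lp > gp
        then T (lp, ll, lx, join (lr, gtr))  (* less's root wins the heap order *)
        else T (gp, join (less, gl), gx, gr)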

Treap Join Example

[Figure: Join(less, gtr) reassembles the two treaps from the split example back into the original treap.]

Treap Running Time

All operations run in expected O(lg n) time.

Also of note is Finger Search
- Given a finger in a treap
- Find the key that is d away in sorted order
- Expected O(lg d) time
- Requires parent pointers
  - Evil… wastes so much space
See Seidel and Aragon for details.

Treap Union

Treaps really shine in set operations.

  Union(a, b)
    Suppose roots are (k1,p1), (k2,p2).
    WLOG assume p1 > p2.
    Let (less, x, gtr) = Split(b, k1).
    T(p1, Union(a.left, less), k1, Union(a.right, gtr))
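A hedged SML sketch of Union, reusing split from above; swapping the arguments keeps the higher-priority root driving the recursion.

  fun union cmp (E, b) = b
    | union cmp (a, E) = a
    | union cmp (a as T (pa, la, ka, ra), b as T (pb, _, _, _)) =
        if pa >= pb
        then let val (less, _, gtr) = split cmp (b, ka)
             in T (pa, union cmp (la, less), ka, union cmp (ra, gtr)) end
        else union cmp (b, a)            (* WLOG: higher-priority root first *)

Note that the two recursive calls are independent---this is exactly where the natural parallelism discussed below comes from.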

Treap Intersection

  Inter(a, b)
    Suppose roots are (k1,p1), (k2,p2), p1 > p2.
    Let (less, x, gtr) = Split(b, k1).
    If x is null  // k1 is not in b, sorry dude
      Join(Inter(a.left, less), Inter(a.right, gtr))
    Else
      T(p1, Inter(a.left, less), k1, Inter(a.right, gtr))
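And the corresponding SML sketch of Inter, reusing split and join:

  fun inter cmp (E, _) = E
    | inter cmp (_, E) = E
    | inter cmp (a as T (pa, la, ka, ra), b as T (pb, _, _, _)) =
        if pa >= pb
        then let val (less, x, gtr) = split cmp (b, ka)
                 val l = inter cmp (la, less)
                 val r = inter cmp (ra, gtr)
             in case x of
                  NONE => join (l, r)    (* ka is not in b, sorry dude *)
                | SOME _ => T (pa, l, ka, r)
             end
        else inter cmp (b, a)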

Treap Difference

Similar to intersection
- Change the logic a bit
- Messier because it is not symmetric
Left as an exercise to the reader.

Points of Note

Persistence
- Did you see a side effect? (assignments?)
Parallelization
- Parallelizing without persistence is a pain.
- Very natural divide-and-conquer
- Run the two recursive calls on different CPUs
Running times…

Set Operation Running Time

For two sets of size m and n (m ≤ n)
- Optimal is Θ(m lg(n/m))
- Compare this to O(m+n) or O(m lg n)

What's known before this work
- With AVL Trees: O(m lg(n/m))
  - Rather complicated algorithms
  - For the sake of your smooth digestion…
- With Treaps
  - Can use Finger Search if we have parent pointers
  - Does not parallelize---multiple fingers???

Set Operation Running Time

What's known after this work
- No parent pointers
- Parallelizes naturally
- Optimal expected running time: O(m lg(n/m))
  - Analysis available in Blelloch and Reid-Miller
- Relatively simple algorithm
- Experimental results
  - 6.3-6.8 speedup on an 8-processor SGI machine
  - 4.1-4.4 speedup on a 5-processor Sun machine

Agenda

- Data structure research overview
- Treaps refresher
- Some current issues on Treaps

A Word on Splay Trees

Splay Trees are slow in practice!
- Even a single simple search would require O(lg n) pointer updates!
Skip Lists are way simpler and faster.
Let's switch all Splay Trees to Skip Lists.
Danny???

Bruce said…

First find Danny.
- Ditch Splay Trees---say they are slow.
- Then praise Skip Lists.
So I tried. Danny will
- refute me by quoting experimental studies.
  - Splay Trees are not much slower than Skip Lists in practice.
- ask who my advisor is.
I wonder if that works.

Current Issues on Treaps

Treaps are simpler than Splay Trees
- No famous conjecture for my back pocket
  - Neat idea from Adam Kalai
- Not self-adjusting
  - Access introduces more explicit changes
Adding data compression to Treaps
Finger search on Treaps
- Work by Guy + Daniel Blandford

Adding Compression to Treaps

Search engines
- Infrequent offline update (once a month)
- Frequent online query and set operations
- Keys are unique.
- Keys can be huge and occur sparsely.
Let's compress the keys! Assume they are 64-bit integers.

We've got a problem!

I don't know how to deploy data compression to general data structures.
Begin with the simplest---Array

The naïve approach
- Compress the whole array
- When we need to access an element:
  - decompress the whole array
  - do the access
  - compress the whole array again

Isn't that dumb? Any suggestions?

Use chunking (sketch below)
- Divide the array into blocks of size C.
- Compress each block individually.
- Now we are back to "constant" time!
Shh!!! That could be a trade secret.
- Of course they use something better than a vanilla array.
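To make the chunked access pattern concrete, a toy SML sketch (compress/decompress stand in for a real codec; all names are mine):

  (* a compressed array is a vector of independently compressed blocks of size c;
     an access decompresses only the block containing index i *)
  fun access decompress (blocks, c) i =
    let val keys = decompress (Vector.sub (blocks, i div c))
    in Vector.sub (keys, i mod c) end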

Chunking a Treap

A sub-tree is a chunk.
- Desire consistent chunk size
But Treaps are usually not full.
- Need better chunking rules
Chunks
- Can't be too big---hurts running time
- Can't be too small---hurts compression (space)

Vocab

Internal node and Leaf block. More precisely:

  datatype tblock = Packed of int * key * key * key vector
                  | UnPacked of int * int * key vector
  datatype trearray = TE
                    | TB of tblock
                    | TN of trearray * key * trearray

All running times are in the expected case.

Idea 1 – Thresholds

Priority is in the range 1 to maxP.
Invent a threshold Pth
- e.g. maxP - log(maxP)
For a node n = (p,k):
- If p > Pth, then n is an internal node.
- Otherwise, n is in some leaf block.
The trick is done when a key is inserted.
- Also maintained by various operations.
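A tiny sketch of the threshold rule (my rendering, with integer priorities):

  fun lg n = if n <= 1 then 0 else 1 + lg (n div 2)   (* integer log base 2 *)

  fun pth maxP = maxP - lg maxP                       (* e.g. Pth = maxP - lg(maxP) *)

  (* decided once, when the key is inserted; maintained by later operations *)
  fun isInternal (p, maxP) = p > pth maxP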

Idea 1 – Features

On average, a constant ratio between internal keys and "keys in blocks".
With Pth = maxP - log(maxP), N keys give
- log N internal nodes
  (a key is internal with probability lg(maxP)/maxP, about (lg N)/N when maxP ≈ N)
  - Height is log log N.
- O(log N) "bottom" nodes, each w/ a block
- Expect (N - log N) / O(log N) keys per block
- Binary search in a block takes O(log N)

Idea 1 – Running Time

Query is still O(log n).
Join, Split both take O(log n).
- Set operations rely on Join and Split's O(log n) running time.
Insert is also O(log n).
Looking good…

Idea 1 – Problems

Asymptotic bound
- Need to work out the constants
- Exact analysis in progress
  - I now think even more highly of Knuth…
SML implementation
- Makes the idea as concrete as code
- Can now do more experiments

Idea 1 – Questions

Do we really need to maintain consistent priorities across runs?
- Would make things simpler
- But Union looks suspicious
What compression algorithm to use?
- No general data compression
- Take advantage of the index distribution

Idea 2 – Small Blocks

Want a more-or-less constant block size
Small blocks are more realistic
- Say 20
- Processor specific---fit the cache line size
- How well can we compress 20 integers? (see the sketch below)
Leave for second-stage investigation
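One plausible first cut for those 20 integers (an assumption of mine, not a result from the talk): since the keys in a block are sorted, store the gaps between consecutive keys in a 7-bits-per-byte variable-length code.

  (* gaps: [x1, x2, x3, …] becomes [x1, x2-x1, x3-x2, …]; input must be sorted *)
  fun gaps [] = []
    | gaps (xs as first :: rest) = first :: ListPair.map (op -) (rest, xs)

  (* little-endian base-128 varint; the high bit marks a continuation byte *)
  fun varint n =
    if n < 128 then [Word8.fromInt n]
    else Word8.fromInt (128 + n mod 128) :: varint (n div 128)

  fun encodeBlock keys = List.concat (List.map varint (gaps keys))

Even when the keys themselves are huge, the gaps usually need far fewer bytes than full 64-bit keys.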

Perhaps I can share

Writing an algorithm down as code helps
- Pseudocode is good for short algorithms
- Real code is more concrete.
  - Good for sloppy people like me.
- Actual SML code
  - You can figure out that you missed some cases.
  - Now if only SML had a debugger…
The space-time tradeoff is very real.

Treap Finger Search

Daniel is working on it.
- No parent pointers needed
- Can mimic parent pointers by reversing the root-to-(last accessed leaf) path
Should probably leave this to him.
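For what it's worth, the reversed-path idea looks like a zipper (my framing, not necessarily Daniel's): keep the context of the last accessed node as a list of parent frames, which is exactly the information parent pointers would give.

  (* one frame per ancestor: which way we descended, plus the parts we passed *)
  datatype 'a frame = WentLeft of priority * 'a * 'a Treap    (* key and right subtree *)
                    | WentRight of priority * 'a Treap * 'a   (* left subtree and key *)

  type 'a finger = 'a frame list * 'a Treap   (* path back to the root, current subtree *)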

Q&A / Suggestions

Work in progress; suggestions welcome.
Danny, don't kick me too hard…