You are on page 1of 51

Balanced BST

Balanced BSTs guarantee O(logN)


performance at all times

the height or left and right sub-trees


are about the same
simple BST are O(N) in the worst case

Categories of BSTs

AVL, SPLAY trees: dynamic environment


optimal trees: static environment

E.G.M. Petrakis

Trees in Main Memory

AVL Trees
AVL (Adelson, Lelskii, Landis): the height

of the left and right subtrees differ by at


most 1
the same for every subtree

Number of comparisons for membership


operations
best case: completely balanced
worst case: 1.44 log(N+2)
expected case: logN + .25 <= log(N

E.G.M. Petrakis

Trees in Main Memory

1)
2

AVL Trees

: completely balanced sub-tree


/ : the left sub-tree is 1 higher
\ : the right sub-tree is 1 higher
E.G.M. Petrakis

Trees in Main Memory

AVL Trees

E.G.M. Petrakis

Trees in Main Memory

Non AVL Trees


critical node

//
\\
E.G.M. Petrakis

: the left sub-tree is 2 higher


: the right sub-tree is 2 higher
Trees in Main Memory

Single Right Rotations


Insertions or deletions may result in non AVL
trees => apply rotations to balance the tree

T3

T1

E.G.M. Petrakis

T1

T2

T2

Trees in Main Memory

T3

Single Left Rotations


B

T1

T2

E.G.M. Petrakis

T3

T1

Trees in Main Memory

T3

T2

insert 1

single right
rotation

1
4

4
insert 9

single left rotation

9
E.G.M. Petrakis

Trees in Main Memory

10

5
4

11

E.G.M. Petrakis

Trees in Main Memory

Double Left Rotation


Composition of two single rotations (one
right and one left rotation)

T4
T4

T3

T2

E.G.M. Petrakis

or

T1

T3

T4

T2

T3
or

T1

or in Main Memory
Trees

10

Example of Double Left Rotation


7

Critical node

4 \\
4
2
2 =

/ 8 =

2
6

/ 7 =

insert 6

E.G.M. Petrakis

Trees in Main Memory

11

Double Right Rotation


Composition of two single rotations (one
left and one right rotation)
B

T3

T4

T2

T3

E.G.M. Petrakis

or

T4

T1

T1

T1

T2

T3
or

T2
Trees in MainorMemory

12

Insertion (deletion) in AVL


1. Follow the search path to verify
that the key is not already there
2. Insert (delete) the key
3. Retreat along the search path and
check the balance factor
4. Rebalance if necessary (see next)
E.G.M. Petrakis

Trees in Main Memory

13

Rebalancing
For every node reached coming up from its
left sub-tree after insertion readjust
balance factor
= becomes / => no operation
\ becomes = => no operation
/ becomes // => must be rebalanced!!

The // node becomes a critical node


Only the path from the critical node to the
leaf has to be rebalanced!!
Rotation is applied only at the critical node!
E.G.M. Petrakis

Trees in Main Memory

14

Rebalancing (cont.)
The balance factor of the critical node
determines what rotation is to take place
single or double

If the child and the grand child (inserted


node) of the critical node are on the same
direction (both / or \) => single rotation
Else => double rotation
Rebalance similarly if coming up from the
right sub-tree (opposite signs)
E.G.M. Petrakis

Trees in Main Memory

15

Performance
Performance of membership operations on
AVL trees:
easy for the worst case!

An AVL tree will never be more than 45%


higher that its perfectly balanced
counterpart (Adelson, Velskii, Landis):

log(N+1) <= hb(N) <=


l.4404log(N+2) 0.302
E.G.M. Petrakis

Trees in Main Memory

16

Worst case AVL


Sparse tree => each sub-tree has
minimum number of nodes

E.G.M. Petrakis

Trees in Main Memory

17

Fibonacci Trees
Th: tree of height h
Th has two sub-trees, one with height h-1
and one with height h-2
else it wouldnt have minimum number of nodes

T0 is the empty sub-tree (also Fibonacci)


T1 is the sub-tree with 1 node (also
Fibonacci)

E.G.M. Petrakis

Trees in Main Memory

18

Fibonacci Trees (cont.)


Average height

Nh number of nodes of Th
Nh = Nh-1 + Nh-2 + 1
N0 = 1
N1 = 2

1
Nh
5

h 2

1 5

h 2

1 5

From which h <= 1.44 log(N+1)


E.G.M. Petrakis

Trees in Main Memory

19

More Examples
single rotation
7

insert 9
9
E.G.M. Petrakis

Trees in Main Memory

20

Examples (cont.)
double rotation
7

insert 7

E.G.M. Petrakis

7
Trees in Main Memory

21

Examples (cont.)
double rotation
8

8
6

6
insert 7

E.G.M. Petrakis

7
Trees in Main Memory

22

Examples (cont.)
single rotation
7

7
6

8
9

E.G.M. Petrakis

delete 6

Trees in Main Memory

23

Examples (cont.)
single rotation
5

6
8

8
7

9
7

delete 5
E.G.M. Petrakis

Trees in Main Memory

24

Examples (cont.)
double rotation
5

6
8

8
7

delete 5
E.G.M. Petrakis

Trees in Main Memory

25

General Deletions
5

5
3

6
E.G.M. Petrakis

2
delete 4
1

10

11

Trees in Main Memory

delete 8
10

11

26

General Deletions (cont.)


5
2

5
2

delete 8

10
delete 5

delete 6
1

E.G.M. Petrakis

10

11

Trees in Main Memory

11

27

General Deletions (cont.)


3
2
delete 5

10

11

9
E.G.M. Petrakis

Trees in Main Memory

28

Self Organizing Search


Splay Trees: adapt to query patterns

move to root (front): whenever a node is


accessed move it to the root using
rotations
equivalent to move-to-front in arrays

current node

insertions: inserted node


deletions: father of deleted node or null
if this is the root
membership: Trees
thein Main
last
accessed node
E.G.M.Petrakis
Memory
29

20

Search(10)
30

15

second rotation
13

14

14

20
10

30

15
first rotation 10

13
10

20

10
30

15
13

third rotation

20
15

30

13
14

14
E.G.M. Petrakis

Trees in Main Memory

30

Splay Cases

If the current node q has no


grandfather but it has father p =>
only one single rotation

two symmetric cases: L, R


p

b
E.G.M. Petrakis

a
Trees in Main Memory

c
31

If p has also grandfather qp => 4 cases


gp

q
c

b
E.G.M. Petrakis

RL

gp

gp

LL

c
q
b c

LL symmetric of RR
RL symmetric of RL

Trees in Main Memory

b
gp

p
d
32

current
node
5

E.G.M. Petrakis

LL

RR

7
6

Trees in Main Memory

33

1
a

4
3

RL

3
7

q
5

LR

6
e

f
g

a, b, c are sub-trees
E.G.M. Petrakis

Trees in Main Memory

34

5
1
a

2
b

E.G.M. Petrakis

4
3

i
7

f
e

Trees in Main Memory

35

Splay Performance
Splay trees adapt to unknown or changing
probability distributions
Splay trees do not guarantee logarithmic
cost for each access

AVL trees do!


asymptotic cost close to the cost of the optimal
BST for unknown probability distributions

It can be shown that the cost of m


operations on an initially empty splay tree,
where n are insertions is O(mlogn) in the
worst case
E.G.M. Petrakis

Trees in Main Memory

36

Optimal BST
Static environment: no insertions or
deletions
Keys are accessed with various
frequencies
Have the most frequently accessed
keys near the root
Application: a symbol table in main
memory
E.G.M. Petrakis

Trees in Main Memory

37

Searching
Given symbols a1 < a2 < .< an and their
probabilities: p1, p2, pn minimize
cost
n
Successful search cos t pi level (ai )
i 1

Transform unsuccessful to successful


consider new symbols E1, E2, En
- 1 2 i i+1 .n n+1
E0

E1

Ei= (i , i+1 )
E.G.M. Petrakis

E2

Ei

E0= (- , 1 )
Trees in Main Memory

En

En= (n , )
38

Unsuccessful Search
an
an-1

an-2

Ei

unsuccessful search for


all values in Ei
terminates the same
failure node (in fact, one
node higher)
failure node

an-3

E.G.M. Petrakis

cos t pi {level (ai ) 1}


i 1

Trees in Main Memory

39

Search Cost
If pi is the probability to search for
ai and qi is the probability to search in
Ei then
n
n
pi qi 1

i 1
i 1

i 1

i 1

cost pi level(ai) qi {level (Ei) 1}

E.G.M. Petrakis

unsuccessful
successful
search
search
Trees in Main Memory

40

Optimal Tree
The binary search tree with the least cost
Minimizes the average cost for searching
given the search probabilities for a static
set of given keys
The nodes are known in advance
No insertions or deletions

Not balanced in the general case


An AVL is not always optimal
E.G.M. Petrakis

Trees in Main Memory

41

Example
(a1, a2, a3) = (do, if, read)
p i = q i = 1/7

if

do

ifif

read

do

read

if
read

do

cost = 13/7
Optimal BST

cost = 15/7
E.G.M. Petrakis

Trees in Main Memory

cost = 15/7
42

read

do
read

do
if

if

cost = 15/7

cost = 15/7

E.G.M. Petrakis

Trees in Main Memory

43

Observation 1
In a BST, a subtree has nodes
that are consecutive in a sorted
sequence of keys (e.g. [5,26])
20

10
5

13
12

E.G.M. Petrakis

25
24

26

14
Trees in Main Memory

44

Observation 2
If Tij is a sub-tree of an optimal
BST holding keys from i to j then
Tij must be optimal among all
possible BSTs that store the same
keys
optimality lemma: all sub-trees of
an optimal BST are also optimal
E.G.M. Petrakis

Trees in Main Memory

45

Optimal BST Construction


1) Construct all possible trees:
1 2n
trees!!
NP-hard, there are
n 1 n
2) Dynamic programming solves the
problem in polynomial time O(n3)
at each step, the algorithm finds and
stores the optimal tree in each range
of key values
increases the size of the range at each
step until the
range is obtained 46
E.G.M. Petrakis
Treeswhole
in Main Memory

Example (successful search only):


keys
1
probabilities 0,3

10
0,2

20
0,1

40
0,4

1) BSTs with 1 node


range 1
cost= 0.3

10
0.2

20
0.1

2) BSTs with 2 nodes


range 1-10

optimal

40
0.4

k=1-10 k=10-20 k=20-20


range 10-20

optimal

10

10

cost=0.3 1+0.2 2=0.7

cost=0.2 1+0,3.2=0.8
E.G.M. Petrakis

20
20

cost=0.2+0.1 2=0.4

10
1

range 20-40

20
10

cost=0.1+0.2 2=0.5

Trees in Main Memory

40

cost=0.1+0.8=0.9
40

optimal
20

cost=0.4+0.2=0.6
47

3) BSTs with 3 nodes


k=10-40
range 10-40
10
40
20

k=1-20
range 1-20
20
1

10
cost=0.1+2 0.3+3 0.2=1.3

cost=0.2+2 0.4+3 0.1=1.3

10

20

1
20
cost=0.2+2(0.3+0.1)=1
1

10
40
cost=0.1+2(0.2+0.4)=1.3

optimal
10

10

20

20

cost=0.4+2 0.2+30.1=1.1

cost=0.3+2 0.2+3 0.1=1


E.G.M. Petrakis

40

Trees in Main Memory

48

4) BSTs with 4 nodes


range 1-40
1
40

cost=0.3+2 0.4+3 0.2+4 0.1=2.1

10

20

10
1

40
20

cost=0.2+2(0.3+0.4)+3 0.1=1.9

OPTIMAL BST

20
40

cost=0.1+2(0.3+0.4)+3 0.2=2.1

10
40
1

cost=0.4+2 0.3+3 0.2+4 0.1=2

E.G.M. Petrakis10

Trees in Main Memory

20

49

Complexity
Compute all optimal BSTs for all Cij, i,j=1,2..n
Let m=j-i: number of keys in range Cij
n-m-1 Cijs must be computed
The one with the minimum cost must be
found, this takes O(m(n-m-1)) time
2
3
(nm

m
)

O(n
)
For all Cijs it takes
1 m n
There is a better O(n2) algorithm by Knuth
There is also an O(n) greedy algorithm

E.G.M. Petrakis

Trees in Main Memory

50

Optimal BSTs
High probability keys should be near
the root
But, the value of a key is also a factor
It may not be desirable to put the
smallest or largest key close to the
root => this may result in skinny trees
(e.g., lists)
E.G.M. Petrakis

Trees in Main Memory

51