
CHAPTER 10

Search Structures

All the programs in this file are selected from


Ellis Horowitz, Sartaj Sahni, and Susan Anderson-Freed
“Fundamentals of Data Structures in C”,
Computer Science Press, 1992.
AVL Trees
• Dynamic tables may also be
maintained as binary search trees.
• Depending on the order in which the
symbols are put into the table, the
resulting binary search trees differ.
Thus the average number of
comparisons needed to access a symbol
also differs.
Binary Search Tree for The
Months of The Year
Input Sequence: JAN, FEB, MAR, APR, MAY, JUNE, JULY, AUG,
SEPT, OCT, NOV, DEC
JAN

FEB MAR

APR JUNE MAY

AUG JULY SEPT

DEC OCT
NOV
Max comparisons: 6
Average comparisons: 3.5
A Balanced Binary Search Tree
For The Months of The Year

Input Sequence: JULY, FEB, MAY, AUG, DEC, MAR, OCT, APR,
JAN, JUNE, SEPT, NOV
JULY

FEB MAY

AUG JAN MAR OCT

APR DEC JUNE NOV SEPT
Max comparisons: 4
Average comparisons: 3.1


Degenerate Binary Search
Tree
Input Sequence: APR, AUG, DEC, FEB, JAN, JULY, JUNE, MAR, MAY, NOV, OCT,
SEPT
[Each node is the right child of the node listed above it.]
APR
AUG
DEC
FEB
JAN
JULY
JUNE
MAR
MAY
NOV
OCT
SEPT
Max comparisons: 12
Average comparisons: 6.5
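The comparison counts quoted for the three input sequences can be checked mechanically. The following is a minimal sketch, assuming plain (unbalanced) binary search tree insertion with keys compared by strcmp; the names bst_node, bst_insert, bst_levels, bst_free, and bst_stats are hypothetical and are not taken from the textbook.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node type for this sketch; keys are month names. */
typedef struct bst_node {
    const char *key;
    struct bst_node *left, *right;
} bst_node;

/* Insert key (assumed not already present) and return the subtree root. */
bst_node *bst_insert(bst_node *root, const char *key)
{
    if (root == NULL) {
        bst_node *n = malloc(sizeof *n);
        assert(n != NULL);
        n->key = key;
        n->left = n->right = NULL;
        return n;
    }
    if (strcmp(key, root->key) < 0)
        root->left = bst_insert(root->left, key);
    else
        root->right = bst_insert(root->right, key);
    return root;
}

/* Add the level of every node to *sum (the root is level 1) and
   record the deepest level in *max. A node on level k is found
   after k comparisons. */
void bst_levels(const bst_node *t, int level, int *sum, int *max)
{
    if (t == NULL)
        return;
    *sum += level;
    if (level > *max)
        *max = level;
    bst_levels(t->left, level + 1, sum, max);
    bst_levels(t->right, level + 1, sum, max);
}

/* Release every node of the tree. */
void bst_free(bst_node *t)
{
    if (t == NULL)
        return;
    bst_free(t->left);
    bst_free(t->right);
    free(t);
}

/* Build a tree from keys[0..n-1]; report the total and the maximum
   number of comparisons over all keys. */
void bst_stats(const char *const keys[], int n, int *sum, int *max)
{
    bst_node *root = NULL;
    for (int i = 0; i < n; i++)
        root = bst_insert(root, keys[i]);
    *sum = 0;
    *max = 0;
    bst_levels(root, 1, sum, max);
    bst_free(root);
}
```

Dividing the total by the 12 keys gives the averages on the slides: 42/12 = 3.5 for the calendar order, 37/12 (about 3.1) for the balanced order, and 78/12 = 6.5 for the sorted order.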
Minimize The Search Time of Binary
Search Tree In Dynamic Situation

• From the above three examples, we know that the
average and maximum search time will be
minimized if the binary search tree is maintained
as a complete binary tree at all times.
• However, to achieve this in a dynamic situation,
we have to pay a high price to restructure the
tree into a complete binary tree after every update.
• In 1962, Adelson-Velskii and Landis introduced a
binary tree structure that is balanced with
respect to the heights of subtrees. As a result of
the balanced nature of this type of tree, dynamic
retrievals can be performed in O(log n) time if
the tree has n nodes. Insertions and deletions also
take O(log n) time, and the resulting tree remains
height-balanced. This is called an AVL tree.
AVL Tree
• Definition: An empty tree is height-balanced. If T
is a nonempty binary tree with TL and TR as its left
and right subtrees respectively, then T is height-
balanced iff
(1) TL and TR are height-balanced, and
(2) |hL – hR| ≤ 1 where hL and hR are the heights of TL
and TR, respectively.
• Definition: The balance factor, BF(T), of a node T
in a binary tree is defined to be hL – hR, where hL
and hR, respectively, are the heights of the left and
right subtrees of T. For any node T in an AVL
tree, BF(T) = -1, 0, or 1.
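The two definitions translate almost directly into code. The sketch below is illustrative only: the node type tree_node and the function names are assumptions, the height of an empty tree is taken to be 0, and heights are recomputed on every call for clarity rather than stored in the nodes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node type; only the shape of the tree matters here. */
typedef struct tree_node {
    struct tree_node *left, *right;
} tree_node;

/* Height of a tree: 0 for the empty tree, 1 for a single node. */
int height(const tree_node *t)
{
    if (t == NULL)
        return 0;
    int hl = height(t->left);
    int hr = height(t->right);
    return 1 + (hl > hr ? hl : hr);
}

/* BF(T) = hL - hR for a nonempty tree T. */
int balance_factor(const tree_node *t)
{
    assert(t != NULL);
    return height(t->left) - height(t->right);
}

/* An empty tree is height-balanced; a nonempty tree is height-balanced
   iff |hL - hR| <= 1 and both subtrees are height-balanced. */
int is_height_balanced(const tree_node *t)
{
    if (t == NULL)
        return 1;
    int bf = balance_factor(t);
    return bf >= -1 && bf <= 1
        && is_height_balanced(t->left)
        && is_height_balanced(t->right);
}
```

A practical AVL implementation would store the balance factor in each node and update it during insertion instead of recomputing heights.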
Balanced Trees Obtained for The Months of The Year
[Figure: the AVL tree after each insertion, with the balance factor shown above each node and the rebalancing rotation labelled where one is needed.]
(a) Insert MARCH
(b) Insert MAY
(c) Insert NOVEMBER: RR rotation
(d) Insert AUGUST
(e) Insert APRIL: LL rotation
(f) Insert JANUARY: LR rotation
(g) Insert DECEMBER
(h) Insert JULY
(i) Insert FEBRUARY: RL rotation
(j) Insert JUNE: LR rotation
(k) Insert OCTOBER: RR rotation
(l) Insert SEPTEMBER
Final tree, level by level:
JAN
DEC MAR
AUG FEB JULY NOV
APR JUNE MAY OCT
SEPT


Rebalancing Rotation of
Binary Search Tree
• LL: new node Y is inserted in the left subtree of
the left subtree of A
• LR: Y is inserted in the right subtree of the left
subtree of A
• RR: Y is inserted in the right subtree of the right
subtree of A
• RL: Y is inserted in the left subtree of the right
subtree of A.
• If a height–balanced binary tree becomes
unbalanced as a result of an insertion, then these
are the only four cases possible for rebalancing.
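The four cases can be sketched as pointer manipulations on the subtree rooted at A. This is a minimal sketch, not the textbook's insertion routine: the node type avl_node, the field bf, and the function names are assumptions, and each function expects to be called only in the situation named in its comment. B denotes the relevant child of A and C the relevant grandchild.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical AVL node; bf holds the balance factor hL - hR. */
typedef struct avl_node {
    int key;
    int bf;
    struct avl_node *left, *right;
} avl_node;

/* LL: BF(A) = +2 and BF(B) = +1, where B is A's left child. */
avl_node *rotate_ll(avl_node *a)
{
    avl_node *b = a->left;
    assert(b != NULL);
    a->left = b->right;          /* BR moves under A */
    b->right = a;
    a->bf = b->bf = 0;
    return b;                    /* B is the new subtree root */
}

/* RR: BF(A) = -2 and BF(B) = -1, where B is A's right child. */
avl_node *rotate_rr(avl_node *a)
{
    avl_node *b = a->right;
    assert(b != NULL);
    a->right = b->left;          /* BL moves under A */
    b->left = a;
    a->bf = b->bf = 0;
    return b;
}

/* LR: BF(A) = +2, BF(B) = -1, C is the right child of B. */
avl_node *rotate_lr(avl_node *a)
{
    avl_node *b = a->left;
    assert(b != NULL && b->right != NULL);
    avl_node *c = b->right;
    b->right = c->left;          /* CL moves under B */
    a->left = c->right;          /* CR moves under A */
    c->left = b;
    c->right = a;
    switch (c->bf) {             /* C's old balance factor selects the case */
    case 1:  a->bf = -1; b->bf = 0; break;   /* LR(b): insertion was in CL */
    case -1: a->bf = 0;  b->bf = 1; break;   /* LR(c): insertion was in CR */
    default: a->bf = 0;  b->bf = 0; break;   /* LR(a): C is the new node */
    }
    c->bf = 0;
    return c;                    /* C is the new subtree root */
}

/* RL: BF(A) = -2, BF(B) = +1, C is the left child of B (mirror of LR). */
avl_node *rotate_rl(avl_node *a)
{
    avl_node *b = a->right;
    assert(b != NULL && b->left != NULL);
    avl_node *c = b->left;
    b->left = c->right;          /* CR moves under B */
    a->right = c->left;          /* CL moves under A */
    c->right = b;
    c->left = a;
    switch (c->bf) {
    case 1:  a->bf = 0; b->bf = -1; break;   /* insertion was in CL */
    case -1: a->bf = 1; b->bf = 0;  break;   /* insertion was in CR */
    default: a->bf = 0; b->bf = 0;  break;   /* C is the new node */
    }
    c->bf = 0;
    return c;
}
```

Each function returns the new root of the rebalanced subtree; the caller is responsible for attaching it to the parent of A.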
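Rebalancing Rotation LL
[Figure: LL rotation. Before the insertion, BF(A) = +1 and A's left child B has BF(B) = 0; subtrees BL, BR, and AR each have height h, so the subtree rooted at A has height h+2. The insertion raises the height of BL to h+1, giving BF(A) = +2 and BF(B) = +1. After the rotation, B is the subtree root with left subtree BL and right child A; A has subtrees BR and AR; BF(A) = BF(B) = 0 and the subtree height is again h+2.]
Rebalancing Rotation RR
[Figure: RR rotation, the mirror image of LL. Before the insertion, BF(A) = -1 and A's right child B has BF(B) = 0; subtrees AL, BL, and BR each have height h. The insertion raises the height of BR to h+1, giving BF(A) = -2 and BF(B) = -1. After the rotation, B is the subtree root with left child A and right subtree BR; A has subtrees AL and BL; BF(A) = BF(B) = 0 and the subtree height is again h+2.]
Rebalancing Rotation LR(a)
[Figure: LR(a) rotation. Before the insertion, BF(A) = +1 and A's left child B is a leaf. The new node C becomes the right child of B, giving BF(A) = +2, BF(B) = -1, and BF(C) = 0. After the rotation, C is the subtree root with left child B and right child A, and all three balance factors are 0.]
Rebalancing Rotation LR(b)
[Figure: LR(b) rotation. C is the right child of B, which is the left child of A; BL and AR have height h, and CL and CR have height h-1, so the subtree rooted at A has height h+2. The insertion goes into CL, giving BF(C) = +1, BF(B) = -1, and BF(A) = +2. After the rotation, C is the subtree root with left child B (subtrees BL and CL) and right child A (subtrees CR and AR); BF(C) = 0, BF(B) = 0, BF(A) = -1, and the subtree height is again h+2.]
Rebalancing Rotation LR(c)
[Figure: LR(c) rotation. The same configuration as LR(b), but the insertion goes into CR, giving BF(C) = -1, BF(B) = -1, and BF(A) = +2. After the rotation, C is the subtree root with left child B (subtrees BL and CL) and right child A (subtrees CR and AR); BF(C) = 0, BF(B) = +1, and BF(A) = 0.]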
AVL Trees (Cont.)
• Once rebalancing has been carried
out on the subtree in question,
examining the remaining tree is
unnecessary.
• Insertion into a binary search
tree with n nodes can take O(n) time in
the worst case. For an AVL tree, the
insertion time is O(log n).
AVL Insertion Complexity
• Let $N_h$ be the minimum number of nodes in a
height-balanced tree of height h. In the worst
case, the height of one of the subtrees will be
h-1 and that of the other h-2. Both subtrees
must also be height-balanced. Hence $N_h = N_{h-1} + N_{h-2} + 1$,
with $N_0 = 0$, $N_1 = 1$, and $N_2 = 2$.
• This recursive definition of $N_h$ is similar to that of the
Fibonacci numbers: $F_n = F_{n-1} + F_{n-2}$, $F_0 = 0$, $F_1 = 1$.
• It can be shown that $N_h = F_{h+2} - 1$. Since
$F_n \approx \phi^n/\sqrt{5}$ with $\phi = (1+\sqrt{5})/2$, we
can derive that $N_h \approx \phi^{h+2}/\sqrt{5} - 1$. Hence a
height-balanced tree with n nodes has height O(log n), and the
worst-case insertion time for such a tree is O(log n).
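The identity between the minimum node count and the Fibonacci numbers is easy to check numerically. The sketch below evaluates both recurrences iteratively; the function names min_avl_nodes and fib are hypothetical.

```c
#include <assert.h>

/* N_h = N_{h-1} + N_{h-2} + 1 with N_0 = 0 and N_1 = 1: the minimum
   number of nodes in a height-balanced tree of height h. */
long min_avl_nodes(int h)
{
    assert(h >= 0);
    long prev = 0, cur = 1;      /* N_0 and N_1 */
    if (h == 0)
        return prev;
    for (int i = 2; i <= h; i++) {
        long next = cur + prev + 1;
        prev = cur;
        cur = next;
    }
    return cur;
}

/* F_n = F_{n-1} + F_{n-2} with F_0 = 0 and F_1 = 1. */
long fib(int n)
{
    assert(n >= 0);
    long prev = 0, cur = 1;      /* F_0 and F_1 */
    if (n == 0)
        return prev;
    for (int i = 2; i <= n; i++) {
        long next = cur + prev;
        prev = cur;
        cur = next;
    }
    return cur;
}
```

For example, the first values of N_h are 0, 1, 2, 4, 7, 12, 20, each one less than the Fibonacci number two positions further along (1, 2, 3, 5, 8, 13, 21).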
Probability of Each Type of
Rebalancing Rotation

• Research has shown that a random
insertion requires no rebalancing, a
rebalancing rotation of type LL or RR,
or a rebalancing rotation of type LR
or RL, with probabilities 0.5349,
0.2327, and 0.2324, respectively.
Comparison of Various Structures
Operation            Sequential List   Linked List   AVL Tree
Search for x         O(log n)          O(n)          O(log n)
Search for kth item  O(1)              O(k)          O(log n)
Delete x             O(n)              O(1) [1]      O(log n)
Delete kth item      O(n - k)          O(k)          O(log n)
Insert x             O(n)              O(1) [2]      O(log n)
Output in order      O(n)              O(n)          O(n)

[1] Doubly linked list and position of x known.
[2] Position for insertion known.
2-3 Trees
• If search trees of degree greater than 2 are used, we obtain
simpler insertion and deletion algorithms than those of AVL trees.
The algorithms' complexity is still O(log n).
• Definition: A 2-3 tree is a search tree that either is empty or
satisfies the following properties:
(1) Each internal node is a 2-node or a 3-node. A 2-node has one
element; a 3-node has two elements.
(2) Let LeftChild and MiddleChild denote the children of a 2-node.
Let dataL be the element in this node, and let dataL.key be its
key. All elements in the 2-3 subtree with root LeftChild have
key less than dataL.key, whereas all elements in the 2-3 subtree
with root MiddleChild have key greater than dataL.key.
(3) Let LeftChild, MiddleChild, and RightChild denote the children
of a 3-node. Let dataL and dataR be the two elements in this
node. Then, dataL.key < dataR.key; all keys in the 2-3 subtree
with root LeftChild are less than dataL.key; all keys in the 2-3
subtree with root MiddleChild are less than dataR.key and
greater than dataL.key; and all keys in the 2-3 subtree with
root RightChild are greater than dataR.key.
(4) All external nodes are at the same level.
2-3 Tree Example
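[Figure: the root A is a 2-node holding 40; its left child B is a 3-node holding 10 and 20; its middle child C is a 2-node holding 80.]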
The Height of A 2-3 Tree

• As with leftist trees, external nodes are
introduced only to make it easier to
define and talk about 2-3 trees. External
nodes are not physically represented
inside a computer.
• The number of elements in a 2-3 tree
with height h is between $2^h - 1$ and $3^h - 1$.
Hence, the height of a 2-3 tree with n
elements is between $\lceil \log_3 (n+1) \rceil$ and $\lceil \log_2 (n+1) \rceil$.
2-3 Tree Data Structure
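typedef struct two_three *two_three_ptr;
struct two_three {
    element data_l, data_r;        /* dataL and dataR; a 2-node uses only data_l */
    two_three_ptr left_child,      /* subtree with keys less than data_l.key */
                  middle_child,    /* subtree with keys greater than data_l.key */
                  right_child;     /* subtree with keys greater than data_r.key (3-node only) */
};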
Searching A 2-3 Tree
• The search algorithm for binary search trees can
be easily extended to obtain the search function
for a 2-3 tree (search23).
• The search function calls a function compare
that compares a key x with the keys in a given
node p. It returns the value 1, 2, 3, or 4,
depending on whether x is less than the first
key, between the first key and the second key,
greater than the second key, or equal to one of
the keys in node p.

Program 10.4: Function to search a 2-3 tree
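The listing of Program 10.4 is not reproduced on this slide. The following is a minimal sketch of a search along the lines described above, not the textbook's code. It assumes that element is a struct with an int key, that keys are smaller than INT_MAX, and that a 2-node is marked by storing INT_MAX in data_r.key; these conventions and the function bodies are assumptions made for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

typedef struct { int key; } element;          /* assumed element type */
typedef struct two_three *two_three_ptr;
struct two_three {
    element data_l, data_r;   /* data_r.key == INT_MAX marks a 2-node (assumption) */
    two_three_ptr left_child, middle_child, right_child;
};

/* Compare x with the keys in node p:
   1: x is less than the first key
   2: x is between the first key and the second key
   3: x is greater than the second key
   4: x equals one of the keys in p */
int compare(int x, two_three_ptr p)
{
    assert(p != NULL && x < INT_MAX);
    if (x < p->data_l.key)
        return 1;
    if (x == p->data_l.key || x == p->data_r.key)
        return 4;
    if (x < p->data_r.key)
        return 2;             /* always taken in a 2-node, since data_r.key is INT_MAX */
    return 3;
}

/* Return the node that contains key x, or NULL if x is not in the tree. */
two_three_ptr search23(two_three_ptr t, int x)
{
    while (t != NULL) {
        switch (compare(x, t)) {
        case 1:  t = t->left_child;   break;
        case 2:  t = t->middle_child; break;
        case 3:  t = t->right_child;  break;
        default: return t;            /* case 4: x found in this node */
        }
    }
    return NULL;
}
```

Each iteration descends one level, so the number of iterations is bounded by the height of the tree, which is O(log n).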


Insertion Into A 2-3 Tree
• First we use the search function to search the 2-3
tree for the key that is to be inserted.
• If the key being searched for is already in the tree,
then the insertion fails, as all keys in a 2-3 tree
are distinct. Otherwise, we will encounter a unique
leaf node U. The node U may be in one of two states:
– The node U has only one element: the key can be
inserted in this node.
– The node U already contains two elements: a new node is
created. The newly created node will contain the
element with the largest key from among the two
elements initially in U and the element x. The element
with the smallest key will be in the original node, and the
element with the median key, together with a pointer to the
newly created node, will be inserted into the parent of
U.
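The two states of the leaf U can be sketched as a single function. This is an illustration only, under the same assumptions as a simple in-memory representation: element is a struct with an int key, a 2-node is marked by INT_MAX in data_r.key, and x is not already in U. The function name leaf_insert and the way the median is reported to the caller are hypothetical; propagating the median into the parent is left to the caller.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

typedef struct { int key; } element;          /* assumed element type */
typedef struct two_three *two_three_ptr;
struct two_three {
    element data_l, data_r;   /* data_r.key == INT_MAX marks a 2-node (assumption) */
    two_three_ptr left_child, middle_child, right_child;
};

/* Insert key x into leaf u.
   If u has one element, x is stored in u and NULL is returned.
   If u has two elements, u keeps the smallest key, a new node with the
   largest key is returned, and *median receives the median key, which
   the caller must insert into the parent of u. */
two_three_ptr leaf_insert(two_three_ptr u, int x, int *median)
{
    assert(u != NULL && median != NULL);
    int lo = u->data_l.key, hi = u->data_r.key;

    if (hi == INT_MAX) {                 /* U has only one element */
        if (x < lo) {
            u->data_r.key = lo;
            u->data_l.key = x;
        } else {
            u->data_r.key = x;
        }
        return NULL;
    }

    int mid;                             /* U already has two elements */
    if (x < lo)      { mid = lo; lo = x; }
    else if (x > hi) { mid = hi; hi = x; }
    else             { mid = x; }

    two_three_ptr q = malloc(sizeof *q); /* the newly created node */
    assert(q != NULL);
    q->data_l.key = hi;
    q->data_r.key = INT_MAX;
    q->left_child = q->middle_child = q->right_child = NULL;

    u->data_l.key = lo;
    u->data_r.key = INT_MAX;
    *median = mid;
    return q;
}
```

With a leaf holding 10 and 20, inserting 30 leaves 10 in the original node, puts 30 in the new node, and passes 20 up to the parent.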
Insertion to A 2-3 Tree
Example

A A
40 20 40

B C B D C
10 20 70 80 10 30 70 80

(a) 70 inserted (b) 30 inserted


Insertion of 60 Into Figure
10.15(b)
G
40

A F
20 70

B D C E
10 30 60 80
Node Split
• From the above examples, we find
that each time an attempt is made to
add an element into a 3-node p, a new
node q is created. This is referred to
as a node split.

Program 10.5: Insertion into a 2-3 tree (P.501)


Deletion From a 2-3 Tree
• If the element to be deleted is not in a
leaf node, the deletion can be transformed
into a deletion from a leaf node. The deleted
element can be replaced by either the
element with the largest key in the left
subtree or the element with the smallest key
in the right subtree.
• Now we can focus on deletion from a leaf
node.
Deletion From A 2-3 Tree Example
[Figure: a sequence of 2-3 trees; each node is listed with the keys it holds.]
(a) Initial 2-3 tree: A: 50 80; B: 10 20; C: 60 70; D: 90 95
(b) 70 deleted: A: 50 80; B: 10 20; C: 60; D: 90 95
(c) 90 deleted: A: 50 80; B: 10 20; C: 60; D: 95
(d) 60 deleted: A: 20 80; B: 10; C: 50; D: 95
(e) 95 deleted: A: 20; B: 10; C: 50 80
(f) 50 deleted: A: 20; B: 10; C: 80
(g) 10 deleted: B: 20 80
Rotation and Combine
• As shown in the example, deletion may
invoke a rotation or a combine operation.
• For a rotation, there are three cases:
– the leaf node p is the left child of its parent r.
– the leaf node p is the middle child of its parent
r.
– the leaf node p is the right child of its parent
r.
Three Rotation Cases
[Figure: the three rotation cases, each showing the parent r, the deficient node p, and its 3-node sibling q before and after the rotation: (a) p is the left child of r, (b) p is the middle child of r, (c) p is the right child of r. In each case one key moves from q up into r and one key moves from r down into p.]
Steps in Deletion From a
Leaf Of a 2-3 Tree
• Step 1: Modify node p as necessary to reflect its status after the
desired element has been deleted.
• Step 2:
    while (p has zero elements && p is not the root) {
        let r be the parent of p;
        let q be the left or right sibling of p (as appropriate);
        if (q is a 3-node)
            rotate;
        else
            combine;
        p = r;
    }
• Step 3: If p has zero elements, then p must be the root. The left
child of p becomes the new root, and node p is deleted.
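Combine When p is the Left Child of r
[Figure: two panels, (a) and (b), showing a combine when p is the left child of r. In panel (a), r holds x and z, p is empty with child a, and q holds y with children b and c; after the combine, r holds z and p holds x and y with children a, b, and c.]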
M-Way Search Tree
Definition: An m-way search tree either is empty or
satisfies the following properties:
(1) The root has at most m subtrees and has the following
structure:
n, A0, (K1, A1), (K2, A2), …, (Kn, An)
where the Ai, 0 ≤ i ≤ n < m, are pointers to subtrees, and
the Ki, 1 ≤ i ≤ n < m, are key values.
(2) Ki < Ki+1, 1 ≤ i < n.
(3) All key values in the subtree Ai are less than Ki+1 and
greater than Ki, 0 < i < n.
(4) All key values in the subtree An are greater than Kn, and
those in A0 are less than K1.
(5) The subtrees Ai, 0 ≤ i ≤ n, are also m-way search trees.
Searching an m-Way Search
Tree

• Suppose we wish to search an m-way search tree T
for the key value x. Assume T resides on a
disk. By searching the keys of the root, we
determine i such that Ki ≤ x < Ki+1 (taking
K0 = -∞ and Kn+1 = +∞ for convenience).
– If x = Ki, the search is complete.
– If x ≠ Ki, x must be in the subtree Ai if x is in T.
– We then proceed to retrieve the root of the
subtree Ai and continue the search until we
find x or determine that x is not in T.
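The search procedure can be sketched for a tree held in memory. This is an illustration under stated assumptions: the node layout mway_node, the fixed order M, and the function name mway_search are hypothetical, keys are ints, and following a child pointer stands in for what would be a disk read in the setting described above.

```c
#include <assert.h>
#include <stddef.h>

#define M 5   /* hypothetical order of the tree */

/* Node layout n, A0, (K1, A1), ..., (Kn, An) with n < M. */
typedef struct mway_node {
    int n;                         /* number of keys in the node */
    int key[M];                    /* key[1..n] are used; key[0] is unused */
    struct mway_node *child[M];    /* child[0..n] are the subtrees A0..An */
} mway_node;

/* Search tree t for x. On success, return the node that holds x and
   store in *pos the index i with key[i] == x; otherwise return NULL. */
mway_node *mway_search(mway_node *t, int x, int *pos)
{
    assert(pos != NULL);
    while (t != NULL) {
        /* Find the largest i with key[i] <= x; i stays 0 if x < key[1]. */
        int i = 0;
        while (i < t->n && t->key[i + 1] <= x)
            i++;
        if (i >= 1 && t->key[i] == x) {
            *pos = i;
            return t;
        }
        t = t->child[i];           /* continue in subtree Ai */
    }
    return NULL;
}
```

The inner loop is a sequential scan of the node; for large nodes a binary search over key[1..n] could be used instead.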
Searching an m-Way Search
Tree
• The maximum number of nodes in a tree of degree
m and height h is
$\sum_{0 \le i \le h-1} m^i = (m^h - 1)/(m - 1)$
• Since each node holds at most m - 1 keys, the
maximum number of keys in an m-way search tree
of height h is $m^h - 1$.
• To achieve a performance close to that of the
best m-way search trees for a given number of
keys n, the search tree must be balanced.
B-Tree
Definition: A B-tree of order m is an m-way
search tree that either is empty or
satisfies the following properties:
(1) The root node has at least two children.
(2) All nodes other than the root node and
failure nodes have at least $\lceil m/2 \rceil$ children.
(3) All failure nodes are at the same level.
B-Tree (Cont.)
• Note that a 2-3 tree is a B-tree of order 3 and a 2-3-4 tree is
a B-tree of order 4.
• Also, all B-trees of order 2 are full binary trees.
• A B-tree of order m and height l has at most $m^l - 1$ keys.
• For a B-tree of order m and height l, the number
of keys N in such a tree satisfies $N \ge 2 \lceil m/2 \rceil^{l-1} - 1$, $l \ge 1$.
• If there are N key values in a B-tree of order m, then all
nonfailure nodes are at levels less than or equal to l, where
$l \le \log_{\lceil m/2 \rceil} \{(N+1)/2\} + 1$.
The maximum number of accesses that have
to be made for a search is l.
• For example, for a B-tree of order m = 200, an index with
$N \le 2 \times 10^6 - 2$ will have $l \le 3$.
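The key-count bounds and the order-200 example can be checked with integer arithmetic. The sketch below is illustrative; the function names are hypothetical, the order is restricted to m ≥ 3 so that the minimum key count grows with the height, and no overflow checking is done.

```c
#include <assert.h>

/* Integer power base^exp for small non-negative exp (no overflow check). */
long long ipow(long long base, int exp)
{
    long long r = 1;
    while (exp-- > 0)
        r *= base;
    return r;
}

/* Maximum number of keys in a B-tree of order m and height l: m^l - 1. */
long long btree_max_keys(int m, int l)
{
    assert(m >= 3 && l >= 1);
    return ipow(m, l) - 1;
}

/* Minimum number of keys in a B-tree of order m and height l:
   2 * ceil(m/2)^(l-1) - 1. */
long long btree_min_keys(int m, int l)
{
    assert(m >= 3 && l >= 1);
    return 2 * ipow((m + 1) / 2, l - 1) - 1;   /* (m + 1) / 2 is ceil(m/2) */
}

/* Largest height a B-tree of order m with n_keys keys can have: the
   largest l whose minimum key count does not exceed n_keys. */
int btree_max_height(int m, long long n_keys)
{
    assert(m >= 3 && n_keys >= 1);
    int l = 1;
    while (btree_min_keys(m, l + 1) <= n_keys)
        l++;
    return l;
}
```

For m = 200, a tree of height 4 needs at least 2 × 100^3 - 1 = 1,999,999 keys, so any index with at most 2 × 10^6 - 2 entries has height at most 3.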
The Choice of m
• B-trees of high order are desirable since they
result in a reduction in the number of disk
accesses.
• If the index has N entries, then a B-tree of order
m=N+1 has only one level. But this is not
reasonable since all the N entries can not fit in
the internal memory.
• In selecting a reasonable choice for m, we need
to keep in mind that we are really interested in
minimizing the total amount of time needed to
search the B-tree for a value x. This time has two
components:
(1) the time for reading in the node from the disk
(2) the time needed to search this node for x.
The Choice of m (Cont.)
• Assume a node of a B-tree of order m is of a fixed size and is
large enough to accommodate n, A0, and m-1 triples (Ki, Ai, Bi),
1 ≤ i < m.
• If the Ki are at most α characters long and the Ai and Bi are each
β characters long, then the size of a node is about m(α+2β).
Then the time to access a node is
$t_s + t_l + m(\alpha + 2\beta) t_c = a + bm$
where $a = t_s + t_l$ = seek time + latency time,
$b = (\alpha + 2\beta) t_c$, and $t_c$ = transmission time per character.
• If binary search is used to search each node of the B-tree,
then the internal processing time per node is $c \log_2 m + d$ for
some constants c and d.
• The total processing time per node is $\tau = a + bm + c \log_2 m + d$.
• The maximum search time is
$f \cdot \log_2 \{(N+1)/2\} \cdot \left\{ \frac{a+d}{\log_2 m} + \frac{bm}{\log_2 m} + c \right\}$
where f is some constant.
Figure 10.36: Values of (35+0.06m)/log2 m
m Search time (sec)
2 35.12
4 17.62
8 11.83
16 8.99
32 7.38
64 6.47
128 6.10
256 6.30
512 7.30
1024 9.64
2048 14.35
4096 23.40
8192 40.50
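The table entries can be regenerated directly from the expression. Since every m in the table is a power of two, the sketch below takes the exponent k with m = 2^k, so that log2 m is simply k and no math library is needed; the function names search_time and best_exponent are hypothetical.

```c
#include <assert.h>

/* Value of (35 + 0.06 m) / log2 m for m = 2^k, 1 <= k <= 30. */
double search_time(int k)
{
    assert(k >= 1 && k <= 30);
    double m = (double)(1L << k);
    return (35.0 + 0.06 * m) / k;
}

/* Exponent k in [1, max_k] for which search_time(k) is smallest. */
int best_exponent(int max_k)
{
    assert(max_k >= 1 && max_k <= 30);
    int best = 1;
    for (int k = 2; k <= max_k; k++)
        if (search_time(k) < search_time(best))
            best = k;
    return best;
}
```

Among the powers of two in the table, the minimum falls at m = 2^7 = 128, where the value is 42.68/7, or about 6.10.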
Figure 10.37: Plot of (35+0.06m)/log2 m
[Plot: total maximum search time versus m; the horizontal axis is marked at m = 50, 125, and 400, and the vertical axis at 5.7 and 6.8.]
Insertion into a B-Tree
• Instead of using the top-down insertion of 2-3-4 trees, we generalize
the two-pass insertion algorithm for 2-3 trees. Top-down insertion
splits many nodes, and each time we change a node, it has to be
written back to disk. This increases the number of disk accesses.
• The insertion algorithm for B-trees of order m first performs a
search to determine the leaf node p into which the new key is to
be inserted.
– If the insertion of the new key into p results in p having m keys,
the node p is split.
– Otherwise, the new p is written to the disk, and the insertion is
complete.
• Assume that the h nodes read in during the top-down pass can be
saved in memory so that they need not be retrieved from disk
during the bottom-up pass. Then the number of disk accesses for
an insertion is at most h (downward pass) + 2(h-1) (nonroot splits) +
3 (root split) = 3h+1.
• The average number of disk accesses is approximately h+1 for
large m.
Figure 10.38: B-Trees of Order 3
[Figure: three B-trees of order 3: (a) p = 1, s = 0; (b) p = 3, s = 1; (c) p = 4, s = 2. Here p is the number of nonfailure nodes in the final B-tree with N entries, and s is the number of splits.]
Deletion from a B-Tree
• The deletion algorithm for B-trees is also a
generalization of the deletion algorithm for 2-3
trees.
• First, we search for the key x to be deleted.
– If x is found in a node z that is not a leaf, then the
position occupied by x in z is filled by a key from a leaf
node of the B-tree.
– Suppose that x is the ith key in z (x = Ki). Then x may be
replaced by either the smallest key in the subtree Ai or
the largest in the subtree Ai-1. Since both of these keys are
in leaf nodes, the deletion of x from a nonleaf node is
transformed into a deletion from a leaf.
Deletion from a B-Tree
(Cont.)
• There are four possible cases when deleting from a leaf
node p.
– In the first case, p is also the root. If the root is left with at
least one key, the changed root is written back to disk.
Otherwise, the B-tree is empty following the deletion.
– In the second case, following the deletion, p has at least
$\lceil m/2 \rceil - 1$ keys. The modified leaf is written back to disk.
– In the third case, p has $\lceil m/2 \rceil - 2$ keys, and its nearest sibling,
q, has at least $\lceil m/2 \rceil$ keys. Check only one of p's nearest
siblings. p is deficient, as it has one less than the minimum
number of keys required. q has more keys than the minimum
required. As in the case of a 2-3 tree, a rotation is performed.
In this rotation, the number of keys in q decreases by one, and
the number in p increases by one.
– In the fourth case, p has $\lceil m/2 \rceil - 2$ keys, and q has $\lceil m/2 \rceil - 1$ keys.
p is deficient and q has the minimum number of keys permissible
for a nonroot node. Nodes p and q and the key Ki are combined
to form a single node.
Figure 10.39 B-Tree of
Order 5

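[Figure: each node is shown as its key count n followed by its keys. The root has n = 2 with keys 20 and 35; its three children are leaves with n = 2 (keys 10, 15), n = 2 (keys 25, 30), and n = 3 (keys 40, 45, 50).]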
