You are on page 1of 21

Static Dictionaries

Collection of items. Each item is a pair.


(key, element) Pairs have different keys.

Operations are:
initialize/create get (search)

Hashing
Perfect hashing (no collisions). Minimal perfect hashing (space = n). CHD algorithm. O(n) time to construct the perfect or minimal perfect hash function. O(1) search time. Bothelo, Belazzougui & Dietzfelbinger. Compress, hash, and displace. 17th European Symposium on Algorithms, 2009.

Search Tree
Hashing not efficient for extended operations such as range search and nearest match. Will examine a binary search tree structure for static dictionaries. Each item/key/element has an estimated access frequency (or probability).

Example
Key a b c
b

Probability 0.8 0.1 0.1 a<b<c


a b

c Cost = 0.8 * 1 + 0.1 * 2 + 0.1 * 3 = 1.2

Cost = 0.8 * 2 + 0.1 * 1 + 0.1 * 2 = 1.9

Search Types
b a f0 f1 f2 c f3

Successful.
Search for a key that is in the dictionary. Terminates at an internal node.

Unsuccessful.
Search for a key that is not in the dictionary. Terminates at an external/failure node.

Internal And External Nodes


b a f0 f1 f2 c f3

A binary tree with n internal nodes has n + 1 external nodes. Let s1, s2, , sn be the internal nodes, in inorder. key(s1) < key(s2) < < key(sn). Let key(s0) = infinity and key(sn+1) = infinity. Let f0, f1, , fn be the external nodes, in inorder. fi is reached iff key(si) < search key < key(si+1).

Cost Of Binary Search Tree


b a f0 f1 f2 c f3

Let pi = probability for key(si). Let qi = probability for key(si) < search key < key(si+1). Sum of ps and qs = 1. Cost of tree = S0 <= i <= n qi (level(fi) 1) + S1<= i <= n pi * level(si) Cost = weighted path length.

Brute Force Algorithm


Generate all binary search trees with n internal nodes. Compute the weighted path length of each. Determine tree with minimum weighted path length. Number of trees to examine is O(4n/n1.5). Brute force approach is impractical for large n.

Dynamic Programming
Keys are a1 < a2 < < an. Let Ti j= least cost tree for ai+1, ai+2, , aj. T0n= least cost tree for a1, a2, , an.
a5 a6

a3 a4 f5

T2,7
a7 f7

f2

f3

f4

f6

Ti j includes pi+1, pi+2, , pj and qi, qi+1, , qj.

Terminology
Ti j= least cost tree for ai+1, ai+2, , aj. a5 T ci j= cost of Ti j 2,7 = Si <= u <= j qu (level(fu) 1) a6 a3 + Si < u <= j pu * level(su). a4 f5 ri j= root of Ti j. f2 wi j= weight of Ti j f6 f4 f3 = sum of ps and qs in Ti j = pi+1+ pi+2+ + pj + qi + qi+1 + + qj

a7 f7

i=j
Ti j includes pi+1, pi+2, , pj and qi, qi+1, , qj. Ti i includes qi only.
fi

Tii

ci i = cost of Ti i = 0. ri i = root of Ti i = 0. wi i = weight of Ti i = sum of ps and qs in Ti i = qi

i<j
Ti j= least cost tree for ai+1, ai+2, , aj. Ti j includes pi+1, pi+2, , pj and qi, qi+1, , qj. Let ak, i < k <= j, be in the root of Ti j. a5
ak
a3 a6 a4 f3 f5 f4 f6

Ti j
L R f2

a7
f7

L includes pi+1, pi+2, , pk-1 and qi, qi+1, , qk-1. R includes pk+1, pk+2, , pj and qk, qk+1, , qj.

cost(L)
a3

a5 a6 a7 f7

f2
L f3

a4

f5
f6

f4

L includes pi+1, pi+2, , pk-1 and qi, qi+1, , qk-1. cost(L) = weighted path length of L when viewed as a stand alone binary search tree.

Contribution To cij
ak
L L R

Ti j

ci j = Si <= u <= j qu (level(fu) 1) + Si < u <= j pu * level(su). When L is viewed as a subtree of Ti j , the level of each node is 1 more than when L is viewed as a stand alone tree. So, contribution of L to cij is cost(L) + wi k-1.

cij
ak
a3

a5 a6 a7 f7

Ti j
L R

f2

a4

f5
f6

f3

f4

Contribution of L to cij is cost(L) + wi k-1. Contribution of R to cij is cost(R) + wkj. cij = cost(L) + wi k-1 + cost(R) + wkj + pk = cost(L) + cost(R) + wij

cij
ak
a3

a5 a6 a7 f7

Ti j
L R

f2

a4

f5
f6

cij = cost(L) + cost(R) + wij cost(L) = cik-1 cost(R) = ckj cij = cik-1 + ckj + wij Dont know k. cij = mini < k <= j{cik-1 + ckj} + wij

f3

f4

cij
ak

Ti j
L R

cij = mini < k <= j{cik-1 + ckj} + wij rij = k that minimizes right side.

Computation Of c0n And r0n


Start with ci i = 0, ri i = 0, wi i = qi, 0 <= i <= n (zero-key trees). Use cij = mini < k <= j{cik-1 + ckj} + wij to compute cii+1, ri i+1, 0 <= i <= n 1 (one-key trees). Now use the equation to compute cii+2, ri i+2, 0 <= i <= n 2 (two-key trees). Now use the equation to compute cii+3, ri i+3, 0 <= i <= n 3 (three-key trees). Continue until c0n and r0n(n-key tree) have been computed.

Computation Of c0n And r0n


0 1 2 3 4
0 1

2
3 4

cij, rij, i <= j

Complexity
cij = mini < k <= j{cik-1 + ckj} + wij O(n) time to compute one cij. O(n2) cijs to compute. Total time is O(n3). May be reduced to O(n2) by using cij = min ri,j-1 < k <= ri+1,j{cik-1 + ckj} + wij

Construct T0n
Root is r0n. Suppose that r0n = 10.
a10

T0n T10,n

T09

Construct T09 and T10,n recursively. Time is O(n).

You might also like