Professional Documents
Culture Documents
Operations are:
initialize/create get (search)
Hashing
Perfect hashing (no collisions). Minimal perfect hashing (space = n). CHD algorithm. O(n) time to construct the perfect or minimal perfect hash function. O(1) search time. Bothelo, Belazzougui & Dietzfelbinger. Compress, hash, and displace. 17th European Symposium on Algorithms, 2009.
Search Tree
Hashing not efficient for extended operations such as range search and nearest match. Will examine a binary search tree structure for static dictionaries. Each item/key/element has an estimated access frequency (or probability).
Example
Key a b c
b
Search Types
b a f0 f1 f2 c f3
Successful.
Search for a key that is in the dictionary. Terminates at an internal node.
Unsuccessful.
Search for a key that is not in the dictionary. Terminates at an external/failure node.
A binary tree with n internal nodes has n + 1 external nodes. Let s1, s2, , sn be the internal nodes, in inorder. key(s1) < key(s2) < < key(sn). Let key(s0) = infinity and key(sn+1) = infinity. Let f0, f1, , fn be the external nodes, in inorder. fi is reached iff key(si) < search key < key(si+1).
Let pi = probability for key(si). Let qi = probability for key(si) < search key < key(si+1). Sum of ps and qs = 1. Cost of tree = S0 <= i <= n qi (level(fi) 1) + S1<= i <= n pi * level(si) Cost = weighted path length.
Dynamic Programming
Keys are a1 < a2 < < an. Let Ti j= least cost tree for ai+1, ai+2, , aj. T0n= least cost tree for a1, a2, , an.
a5 a6
a3 a4 f5
T2,7
a7 f7
f2
f3
f4
f6
Terminology
Ti j= least cost tree for ai+1, ai+2, , aj. a5 T ci j= cost of Ti j 2,7 = Si <= u <= j qu (level(fu) 1) a6 a3 + Si < u <= j pu * level(su). a4 f5 ri j= root of Ti j. f2 wi j= weight of Ti j f6 f4 f3 = sum of ps and qs in Ti j = pi+1+ pi+2+ + pj + qi + qi+1 + + qj
a7 f7
i=j
Ti j includes pi+1, pi+2, , pj and qi, qi+1, , qj. Ti i includes qi only.
fi
Tii
i<j
Ti j= least cost tree for ai+1, ai+2, , aj. Ti j includes pi+1, pi+2, , pj and qi, qi+1, , qj. Let ak, i < k <= j, be in the root of Ti j. a5
ak
a3 a6 a4 f3 f5 f4 f6
Ti j
L R f2
a7
f7
L includes pi+1, pi+2, , pk-1 and qi, qi+1, , qk-1. R includes pk+1, pk+2, , pj and qk, qk+1, , qj.
cost(L)
a3
a5 a6 a7 f7
f2
L f3
a4
f5
f6
f4
L includes pi+1, pi+2, , pk-1 and qi, qi+1, , qk-1. cost(L) = weighted path length of L when viewed as a stand alone binary search tree.
Contribution To cij
ak
L L R
Ti j
ci j = Si <= u <= j qu (level(fu) 1) + Si < u <= j pu * level(su). When L is viewed as a subtree of Ti j , the level of each node is 1 more than when L is viewed as a stand alone tree. So, contribution of L to cij is cost(L) + wi k-1.
cij
ak
a3
a5 a6 a7 f7
Ti j
L R
f2
a4
f5
f6
f3
f4
Contribution of L to cij is cost(L) + wi k-1. Contribution of R to cij is cost(R) + wkj. cij = cost(L) + wi k-1 + cost(R) + wkj + pk = cost(L) + cost(R) + wij
cij
ak
a3
a5 a6 a7 f7
Ti j
L R
f2
a4
f5
f6
cij = cost(L) + cost(R) + wij cost(L) = cik-1 cost(R) = ckj cij = cik-1 + ckj + wij Dont know k. cij = mini < k <= j{cik-1 + ckj} + wij
f3
f4
cij
ak
Ti j
L R
cij = mini < k <= j{cik-1 + ckj} + wij rij = k that minimizes right side.
2
3 4
Complexity
cij = mini < k <= j{cik-1 + ckj} + wij O(n) time to compute one cij. O(n2) cijs to compute. Total time is O(n3). May be reduced to O(n2) by using cij = min ri,j-1 < k <= ri+1,j{cik-1 + ckj} + wij
Construct T0n
Root is r0n. Suppose that r0n = 10.
a10
T0n T10,n
T09