Professional Documents
Culture Documents
typical case
ordered operations
implementation
operations on keys
‣ character-based operations
Algorithms F O U R T H E D I T I O N
† under uniform hashing assumption
R OBERT S EDGEWICK | K EVIN W AYNE use array accesses to make R-way decisions
http://algs4.cs.princeton.edu (instead of binary decisions)
Q. Can we do better?
A. Yes, if we can avoid examining the entire key, as with string sorting.
String symbol table basic API String symbol table implementations cost summary
void put(String key, Value val) put key-value pair into the symbol table hashing
L L L 4N to 16N 0.76 40.6
(linear probing)
Value get(String key) return value paired with given key
Goal. Faster than hashing, more flexible than BSTs. Challenge. Efficient performance for string keys.
3 4
Tries
5.2 T RIES
‣ R-way tries
‣ ternary search tries
‣ character-based operations
Algorithms
R OBERT S EDGEWICK | K EVIN W AYNE
http://algs4.cs.princeton.edu
Tries. [from retrieval, but pronounced "try"] Follow links corresponding to each character in the key.
・Store characters in nodes (not keys). ・Search hit: node where search ends has a non-null value.
・Each node has R children, one for each possible character. ・Search miss: reach null link or node where search ends has null value.
(for now, we do not draw null links)
link to trie for all keys get("shells")
root that start with s
link to trie for all keys
that start with she
b s t b s t
y 4 e h h y 4 e h h
key value
by 4 a 6 l e 0 o e 5 a 6 l e 0 o e 5
sea 6
value for she in node
sells 1 l l r corresponding to last l l r
she 0 key character
shells 3 s 1 l e 7 s 1 l e 7
shore 7
return value associated
the 5 s 3 s 3 with last key character
7 (return 3) 8
Search in a trie Search in a trie
Follow links corresponding to each character in the key. Follow links corresponding to each character in the key.
・Search hit: node where search ends has a non-null value. ・Search hit: node where search ends has a non-null value.
・Search miss: reach null link or node where search ends has null value. ・Search miss: reach null link or node where search ends has null value.
get("she") get("shell")
b s t b s t
y 4 e h h y 4 e h h
a 6 l e 0 o e 5 a 6 l e 0 o e 5
s 1 l e 7 s 1 l e 7
no value associated
with last key character
s 3 s 3
(return null)
9 10
Follow links corresponding to each character in the key. Follow links corresponding to each character in the key.
・Search hit: node where search ends has a non-null value. ・Encounter a null link: create new node.
・Search miss: reach null link or node where search ends has null value. ・Encounter the last character of the key: set value in that node.
get("shelter") put("shore", 7)
b s t b s t
y 4 e h h y 4 e h h
a 6 l e 0 o e 5 a 6 l e 0 o e 5
l l r l l r
s 1 l e 7 s 1 l e 7
no link to t
s 3 s 3
(return null)
11 12
Trie construction demo Trie construction demo
trie trie
b s t
y 4 e h h
a 6 l e 0 o e 5
l l r
s 1 l e 7
s 3
⋮
Trie representation
15 16
R-way trie: Java implementation (continued) Trie performance
⋮
public boolean contains(String key) Search miss.
{ return get(key) != null; }
・Could have mismatch on first character.
public Value get(String key) ・Typical case: examine only a few characters (sublinear).
{
Node x = get(root, key, 0);
Space. R null links at each leaf.
if (x == null) return null;
return (Value) x.val; cast needed (but sublinear space possible if many short strings share common prefixes)
}
characters are implicitly
defined by link index s
private Node get(Node x, String key, int d) s
e h
{ e h
if (x == null) return null; a l e
a l e 2 0
if (d == key.length()) return x; 2 0
l
char c = key.charAt(d); each node has
l
an array of links
return get(x.next[c], key, d+1); s and a value
s 1 1
}
Trie representation
}
Bottom line. Fast search hit and even faster search miss, but wastes space.
17 18
b s t b s t
y 4 e h h y 4 e h h
a 6 l e 0 o e 5 a 6 l e 0 o e 5
l l r l l r
s 1 l e 7 s 1 l e 7
・Store characters and values in nodes (not keys). ・Store characters and values in nodes (not keys).
・Each node has 3 children: smaller (left), equal (middle), larger (right). ・Each node has 3 children: smaller (left), equal (middle), larger (right).
link to TST for all keys link to TST for all keys
that start with that start with s
a letter before s
s
b t
a b s t a
Fast Algorithms for Sorting and Searching Strings
get("sea") get("shelter")
s s
b t b t
y 4 h h y 4 h h
e e
l e 0 e 5 l e 0 e 5
6 a o 6 a o
l l r l l r
return value associated
with last key character
s 1 l e 7 s 1 l e 7
s 3 s 3 no link to t
(return null)
25 26
Ternary search trie construction demo Ternary search trie construction demo
b t
y 4 h h
e
l e 0 e 5
6 a o
l l r
s 1 l e 7
s 3
27 28
Search in a TST 26-way trie vs. TST
now
Follow links corresponding to each character in the key. 26-way trie. 26 null links in each leaf. for
・If equal, take the middle link and move to the next key character.
dim
tag
jot
sob
nob
Search hit. Node where search ends has a non-null value. sky
hut
Search miss. Reach a null link or node where search ends has null value. ace
bet
26-way trie (1035 null links, not shown)
men
get("sea") match: take middle link, egg
move to next char few
jay
mismatch: take left or right link,
do not move to next char
s
TST. 3 null links in each leaf. owl
joy
b t rap
a
gig
wee
r y 4 h h
e u was
cab
e 12 l e 10 r e 8 wad
14 a o
caw
l l r e cue
fee
return value tap
s 11 l e 7 l
associated with ago
last key character tar
s 15 y 13
jam
dug
TST search example TST (155 null links) and
29 30
・A character c.
{ {
private Value val; private Node root;
31 32
TST: Java implementation (continued) String symbol table implementation cost summary
TST with R2 branching at root String symbol table implementation cost summary
hashing
L L L 4N to 16N 0.76 40.6
(linear probing)
aa ab ac zy zz
R-way trie L log R N L (R+1) N 1.12 out of
memory
Q. What about one- and two-letter words? Bottom line. Faster than hashing for our benchmark client.
35 36
TST vs. hashing
Hashing.
・Need to examine entire key.
・Search hits and misses cost about the same.
・Performance relies on hash function.
・Does not support ordered symbol table operations. 5.2 T RIES
TSTs. ‣ R-way tries
・Works only for string (or digital) keys. ‣ ternary search tries
・Only examines just enough key characters. ‣ character-based operations
・Search miss may involve only a few characters. Algorithms
・Supports ordered symbol table operations (plus extras!).
R OBERT S EDGEWICK | K EVIN W AYNE
Bottom line. TSTs are: http://algs4.cs.princeton.edu
Character-based operations. The string symbol table API supports several public class StringST<Value>
useful character-based operations.
StringST() create a symbol table with string keys
key value
by 4 void put(String key, Value val) put key-value pair into the symbol table
sea 6
Value get(String key) value paired with key
sells 1
shells 3 ⋮
shore 7
Iterable<String> keys() all keys
the 5
Iterable<String> keysWithPrefix(String s) keys having s as a prefix
Prefix match. Keys with prefix sh: she, shells, and shore. Iterable<String> keysThatMatch(String s) keys that match s (where . is a wildcard)
Remark. Can also add other ordered ST methods, e.g., floor() and rank().
39 40
Warmup: ordered iteration Ordered iteration: Java implementation
To iterate through all keys in sorted order: To iterate through all keys in sorted order:
・Do inorder traversal of trie; add keys encountered to a queue. ・Do inorder traversal of trie; add keys encountered to a queue.
・Maintain sequence of characters on path from root to node. ・Maintain sequence of characters on path from root to node.
keysWithPrefix("");
keys()
public Iterable<String> keys()
key q {
b b s t Queue<String> queue = new Queue<String>();
by by
s y e h collect(root, "", queue);
4 h
se return queue; sequence of characters
sea by sea a 6 l e 0 o e 5
sel
} on path from root to x
sell l l r
sells by sea sells private void collect(Node x, String prefix, Queue<String> q)
sh s 1 l e 7
she by sea sells she {
shell s 3
if (x == null) return;
shells by sea sells she shells
sho if (x.val != null) q.enqueue(prefix);
shor for (char c = 0; c < R; c++)
shore by sea sells she shells shore
t
collect(x.next[c], prefix + c, q);
th } or use StringBuilder
the by sea sells she shells shore the
Find all keys in a symbol table starting with a given prefix. Find all keys in a symbol table starting with a given prefix.
Ex. Autocomplete in a cell phone, search bar, text editor, or shell. keysWithPrefix("sh");
keysWithPrefix("sh");
・
key qkey q
User types characters one at a time. sh sh
Prefix match
Prefix in
match
a triein a trie
keysWithPrefix("sh");
public Iterable<String> keysWithPrefix(String prefix) key
key queue
q
{ sh
b s t b s t she she
Queue<String> queue = new Queue<String>();
shel
Node x =y get(root,
4 e h 0); y 4
h prefix, e h h shell
shells she shells
collect(x, aprefix, 6 l e queue);
0 o e 5 a 6 l e 0 o e 5 sho
find subtrie for
return all
queue; root of subtrie for all strings shor
keys beginning with "sh" l l r l with
beginning l given
r prefix shore she shells shore
}
s 1 l e 7 s 1 l e 7 collect keys
43 s 3 s 3
in that subtrie 44
Find longest key in symbol table that is a prefix of query string. Find longest key in symbol table that is a prefix of query string.
・Search for query string.
Ex. To send packet toward destination IP address, router chooses IP ・Keep track of longest key encountered.
address in routing table that is longest prefix match.
represented as 32-bit
"128"
binary number for IPv4
"128.112" (instead of string)
"she"
"she"
"she" "shell"
"shell"
"shell" "shellsort"
"shellsort"
"shellsort" "shelt
"she
"s
"128.112.055"
"128.112.055.15" s s s s s s
s s s
"128.112.136" longestPrefixOf("128.112.136.11") = "128.112.136" e e e h h h e e e h h h
longestPrefixOf("128.112.100.16") = "128.112"
e e e h h h e
"128.112.155.11" a a a l2 le e
2 2l 0 0e 0 a a a l2 le e
2 2l 0 0e 0search ends
search
search atends
ends at at
endend
ofend
string
of string
of string a a a l2 le e a a
longestPrefixOf("128.166.123.45") = "128"
value is
value null
is null
value is null 2 2l 0 0e 0 2
"128.112.155.13" l l ll l l l l ll l l return she
return sheshe
return
search
searchends
search atends
ends at at (last key
(last onkey
key
(last path)
on path)
on path) l l ll l l
"128.222" s s
1 1s
l 1l l endend ofend
string
of string
of string s s1 1s
l 1l l
value is
value not null
is not
value nullnull
is not s s
1 1s
l 1l l
"128.222.136" s s3 3s 3return
return shesheshe
return s s
3 3s 3
search ends
search
search atends
ends at at
s s
3 3s 3 null link
null linklink
null
return
return shells
shells
return shells
(last key
(last onkey
key
(last path)
on path)
on path)
Possibilities
Possibilities forfor
Possibilities
Possibilities for
for longestPrefixOf()
longestPrefixOf()
longestPrefixOf()
longestPrefixOf()
Note. Not the same as floor: floor("128.112.100.16") = "128.112.055.15"
45 46
Find longest key in symbol table that is a prefix of query string. Goal. Type text messages on a phone keypad.
・Search for query string.
・Keep track of longest key encountered. Multi-tap input. Enter a letter by repeatedly pressing a key.
Ex. hello: 4 4 3 3 5 5 5 5 5 5 6 6 6
Patricia trie. [Practical Algorithm to Retrieve Information Coded in Alphanumeric] Suffix tree.
・Remove one-way branching. ・Patricia trie of suffixes of a string.
・Each node represents a sequence of characters. ・Linear-time construction: well beyond scope of this course.
・Implementation: one step beyond this course.
put("shells", 1);
put("shellfish", 2);
shell BANANAS A NA S
Applications. s s 1 fish 2
・Database search. h
NA S S NAS
i external Applications.
・Linear-time: longest repeated substring, longest common substring,
one-way
branching
s
longest palindromic substring, substring search, tandem repeats, ….
・Computational biology databases (BLAST, FASTA).
h 2
Red-black BST.
・Performance guarantee: log N key compares.
・Supports ordered symbol table API.
Hash tables.
・Performance guarantee: constant number of probes.
・Requires good hash function for key type.
Tries. R-way, TST.
・Performance guarantee: log N characters accessed.
・Supports character-based operations.
Bottom line. You can get at anything by examining 50-100 bits (!!!)
53