
# Space and time trade-offs

## Two varieties of space-time trade-off algorithms

Input enhancement: preprocess the input (or part of it) to store some information to be used later in solving the problem. Examples: counting sorts and string matching algorithms.

Prestructuring: preprocess the input to make accessing its elements easier, i.e., use extra space to facilitate faster and/or more flexible access to the data. Example: hashing.

## Sorting by counting

ALGORITHM ComparisonCountingSort(A[0..n-1])
//Sorts an array by comparison counting
//Input: An array A[0..n-1] of orderable elements
//Output: Array S[0..n-1] of A's elements sorted in nondecreasing order
for i ← 0 to n-1 do Count[i] ← 0
for i ← 0 to n-2 do
    for j ← i+1 to n-1 do
        if A[i] < A[j]
            Count[j] ← Count[j]+1
        else Count[i] ← Count[i]+1
for i ← 0 to n-1 do S[Count[i]] ← A[i]
return S

Example: 51 20 60 70 12 40 10

C(n) = ?
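The pseudocode above can be sketched in Python as follows. This is a minimal illustration, not a tuned implementation; the function name `comparison_counting_sort` and the extra `comparisons` counter are assumptions added here to make the comparison count visible, and the input is the example's values read in the order they appear.

```python
def comparison_counting_sort(a):
    """Sort a list by comparison counting; also report how many comparisons were made."""
    n = len(a)
    count = [0] * n          # count[i] = number of elements that must precede a[i]
    comparisons = 0          # hypothetical counter, not part of the pseudocode
    for i in range(n - 1):
        for j in range(i + 1, n):
            comparisons += 1
            if a[i] < a[j]:
                count[j] += 1
            else:
                count[i] += 1
    s = [None] * n
    for i in range(n):
        s[count[i]] = a[i]   # count[i] is the final position of a[i]
    return s, comparisons


result, comparisons = comparison_counting_sort([51, 20, 60, 70, 12, 40, 10])
print(result)       # [10, 12, 20, 40, 51, 60, 70]
print(comparisons)  # 21
```

Every pair of elements is compared exactly once, so the count is n(n-1)/2 regardless of the input order; for n = 7 that gives 21.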

## String matching

pattern: a string of m characters to search for

text: a (long) string of n characters to search in

## Brute force algorithm

Step 1: Align the pattern at the beginning of the text.
Step 2: Moving from left to right, compare each character of the pattern to the corresponding character in the text until either all characters are found to match (successful search) or a mismatch is detected.
Step 3: While a mismatch is detected and the text is not yet exhausted, realign the pattern one position to the right and repeat Step 2.
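The three steps can be sketched as a short Python function. The name `brute_force_match` and the convention of returning -1 on failure are assumptions made for this illustration.

```python
def brute_force_match(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1 if there is none."""
    n, m = len(text), len(pattern)
    for i in range(n - m + 1):           # Steps 1 and 3: each alignment, left to right
        j = 0
        while j < m and pattern[j] == text[i + j]:
            j += 1                       # Step 2: compare characters left to right
        if j == m:
            return i                     # all m characters matched
    return -1


print(brute_force_match("NOBODY_NOTICED_HIM", "NOT"))  # 7
```

In the worst case every alignment compares all m characters, so the sketch makes up to m(n - m + 1) character comparisons.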

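## String searching by preprocessing

Several string searching algorithms are based on the input enhancement idea of preprocessing the pattern:
The Knuth-Morris-Pratt (KMP) algorithm preprocesses the pattern left to right.
The Boyer-Moore algorithm preprocesses the pattern right to left and stores the information in two tables.
Horspool's algorithm simplifies the Boyer-Moore algorithm by using just one table.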

## Horspool's algorithm

A simplified version of the Boyer-Moore algorithm:
preprocesses the pattern to generate a shift table that determines how much to shift the pattern when a mismatch occurs
always makes a shift based on the text's character c aligned with the last character of the pattern, according to the shift table's entry for c

## How far to shift?

Look at the first (rightmost) character in the text that was compared:

The character is not in the pattern:

BARBER
[shift by the entire pattern length]

The character is in the pattern, but is not its last character:

.....B...................... (B occurs twice in pattern)
BARBER
[shift by two positions]

.....A...................... (A occurs once in pattern)
BARBER
[shift by four positions]

The rightmost characters do match, and there is another R among the first m-1 characters of the pattern:

....AR......................
BARBER
[shift by three positions]

The rightmost characters do match, but there is no other R among the first m-1 characters of the pattern:

....AR......................
LEADER
[shift by the entire pattern length]

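## Shift table

Shift sizes can be precomputed by the formula

$$
t(c) = \begin{cases}
\text{distance from } c\text{'s rightmost occurrence among the pattern's first } m-1 \text{ characters to its right end}, & \text{if } c \text{ is among the first } m-1 \text{ characters of the pattern} \\
\text{the pattern's length } m, & \text{otherwise}
\end{cases}
$$

by scanning the pattern before the search begins, and stored in a table called the shift table.

The shift table is indexed by the characters of the text and pattern alphabet; each entry is the character's distance to the last character of the pattern. For the pattern BARBER (m = 6), with _ denoting the space:

c: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z _

t(c): 4 2 6 6 1 6 6 6 6 6 6 6 6 6 6 6 6 3 6 6 6 6 6 6 6 6 6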
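The formula for t(c) can be sketched in Python as below. The name `shift_table` is hypothetical, and storing only the characters of the pattern in a dictionary (with m as the default for everything else) is a choice made for this illustration rather than something the slides prescribe.

```python
def shift_table(pattern):
    """Map each character among the first m-1 pattern characters to its shift size.

    Characters absent from the table take the default shift m (the pattern's length).
    """
    m = len(pattern)
    table = {}
    for i in range(m - 1):              # scan left to right, skipping the last character
        table[pattern[i]] = m - 1 - i   # later (more rightward) occurrences overwrite earlier ones
    return table


table = shift_table("BARBER")
print(table)              # {'B': 2, 'A': 4, 'R': 3, 'E': 1}
print(table.get("X", 6))  # 6: a character not in the pattern shifts by the pattern's length
```

Scanning left to right and overwriting guarantees that each entry ends up holding the distance for the rightmost occurrence.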

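## Example of Horspool's algorithm application

Shift table for the pattern BARBER (_ denotes the space):

c: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z _

t(c): 4 2 6 6 1 6 6 6 6 6 6 6 6 6 6 6 6 3 6 6 6 6 6 6 6 6 6

    JIM_SAW_ME_IN_A_BARBERSHOP
    BARBER
        BARBER
         BARBER
               BARBER
                 BARBER
                    BARBER

The successive shifts are t(A) = 4, t(E) = 1, t(_) = 6, t(B) = 2, and t(R) = 3; the sixth alignment is a match.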
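The trace in this example can be reproduced with a small sketch of Horspool's algorithm. The function name `horspool_search` and the returned list of alignment positions are assumptions added here so that the shifts can be inspected; they are not part of the algorithm's usual statement.

```python
def horspool_search(text, pattern):
    """Return (match index or -1, list of text positions where the pattern was aligned)."""
    m, n = len(pattern), len(text)
    # Shift table: rightmost occurrence among the first m-1 characters wins.
    table = {pattern[i]: m - 1 - i for i in range(m - 1)}
    alignments = []
    i = m - 1                            # text index aligned with the pattern's last character
    while i < n:
        alignments.append(i - m + 1)
        k = 0                            # number of characters matched, right to left
        while k < m and pattern[m - 1 - k] == text[i - k]:
            k += 1
        if k == m:
            return i - m + 1, alignments
        i += table.get(text[i], m)       # shift by the entry for the text character under the last pattern cell
    return -1, alignments


print(horspool_search("JIM_SAW_ME_IN_A_BARBERSHOP", "BARBER"))
# (16, [0, 4, 5, 11, 13, 16])
```

The differences between consecutive alignment positions (4, 1, 6, 2, 3) are exactly the shift-table entries for A, E, _, B, and R.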

## Boyer-Moore algorithm

Based on the same two ideas:
comparing pattern characters to text from right to left
precomputing shift sizes in two tables:
the bad-symbol table indicates how much to shift based on the text's character causing a mismatch
the good-suffix table indicates how much to shift based on the matched part (suffix) of the pattern

## Bad-symbol shift in Boyer-Moore algorithm

If the rightmost character of the pattern doesn't match, the BM algorithm acts as Horspool's.

If the rightmost character of the pattern does match, BM compares preceding characters right to left until either all of the pattern's characters match or a mismatch on the text's character c is encountered after k > 0 matches.

[Figure: the text's character c aligned against the pattern, with the k matched characters to its right]


## Good-suffix shift in Boyer-Moore algorithm

The good-suffix shift d2 is applied after the last 0 < k < m characters of the pattern were matched.

d2(k) = the distance between the matched suffix of size k and its rightmost occurrence in the pattern that is not preceded by the same character as the suffix.
Example: CABABA, d2(1) = 4

If there is no such occurrence, match the longest part of the k-character suffix with the corresponding prefix; if there are no such suffix-prefix matches, d2(k) = m.
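The definition of d2(k) can be turned into a direct, brute-force computation. The sketch below is only meant to mirror the wording of the definition; the name `good_suffix_table` is hypothetical, an occurrence at the very start of the pattern (with no preceding character) is assumed to count as valid, and faster constructions exist.

```python
def good_suffix_table(pattern):
    """Return {k: d2(k)} for 1 <= k < m, computed straight from the definition."""
    m = len(pattern)
    d2 = {}
    for k in range(1, m):
        suffix = pattern[m - k:]
        before = pattern[m - k - 1]              # character preceding the matched suffix
        shift = None
        for j in range(m - k - 1, -1, -1):       # candidate occurrences, rightmost first
            if pattern[j:j + k] == suffix and (j == 0 or pattern[j - 1] != before):
                shift = (m - k) - j              # distance between the two occurrences
                break
        if shift is None:
            shift = m                            # default: no suffix-prefix match at all
            for size in range(k - 1, 0, -1):     # longest part of the suffix that is also a prefix
                if pattern[:size] == pattern[m - size:]:
                    shift = m - size
                    break
        d2[k] = shift
    return d2


print(good_suffix_table("CABABA")[1])  # 4
print(good_suffix_table("BAOBAB"))     # {1: 2, 2: 5, 3: 5, 4: 5, 5: 5}
```

For CABABA and k = 1, the suffix A is preceded by B; the rightmost other A not preceded by B is the one after C, four positions to the left.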


## Good-suffix shift in the Boyer-Moore alg. (cont.)

After successfully matching 0 < k < m characters, the algorithm shifts the pattern right by
d = max{d1, d2}
where d1 = max{t1(c) - k, 1} is the bad-symbol shift and d2(k) is the good-suffix shift.


## Boyer-Moore algorithm (cont.)

Step 1: Fill in the bad-symbol shift table.
Step 2: Fill in the good-suffix shift table.
Step 3: Align the pattern against the beginning of the text.
Step 4: Repeat until a matching substring is found or the text ends:
Compare the corresponding characters right to left.
If no characters match, retrieve entry t1(c) from the bad-symbol table for the text's character c causing the mismatch and shift the pattern to the right by t1(c).
If 0 < k < m characters are matched, retrieve entry t1(c) from the bad-symbol table for the text's character c causing the mismatch and entry d2(k) from the good-suffix table, and shift the pattern to the right by
d = max{d1, d2}
where d1 = max{t1(c) - k, 1}.
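The four steps can be put together in a compact Python sketch. All function names are hypothetical, the good-suffix table is built by brute force from its definition (not by the efficient preprocessing used in practice), and the list of applied shifts is returned only to make the run easy to inspect.

```python
def bad_symbol_table(pattern):
    """Step 1: t1(c) for characters among the first m-1 pattern characters (default is m)."""
    m = len(pattern)
    return {pattern[i]: m - 1 - i for i in range(m - 1)}


def good_suffix_table(pattern):
    """Step 2: d2(k) for 1 <= k < m, computed directly from the definition."""
    m = len(pattern)
    d2 = {}
    for k in range(1, m):
        suffix, before = pattern[m - k:], pattern[m - k - 1]
        # Other occurrences of the suffix not preceded by the same character.
        starts = [j for j in range(m - k)
                  if pattern[j:j + k] == suffix and (j == 0 or pattern[j - 1] != before)]
        if starts:
            d2[k] = (m - k) - max(starts)
        else:
            # Longest part of the suffix that is also a prefix; 0 means none, so d2 = m.
            sizes = [s for s in range(1, k) if pattern[:s] == pattern[m - s:]]
            d2[k] = m - max(sizes, default=0)
    return d2


def boyer_moore_search(text, pattern):
    """Return (match index or -1, list of shifts applied)."""
    m, n = len(pattern), len(text)
    t1, d2 = bad_symbol_table(pattern), good_suffix_table(pattern)
    shifts = []
    i = m - 1                                    # Step 3: text index under the pattern's last character
    while i < n:                                 # Step 4
        k = 0
        while k < m and pattern[m - 1 - k] == text[i - k]:
            k += 1                               # compare right to left
        if k == m:
            return i - m + 1, shifts
        c = text[i - k]                          # text character causing the mismatch
        d1 = max(t1.get(c, m) - k, 1)            # bad-symbol shift (equals t1(c) when k = 0)
        d = d1 if k == 0 else max(d1, d2[k])
        shifts.append(d)
        i += d
    return -1, shifts


print(boyer_moore_search("BESS_KNEW_ABOUT_BAOBABS", "BAOBAB"))  # (16, [6, 5, 5])
```

Taking the maximum of the two shifts is safe because each table on its own never skips a possible match, so the larger of the two cannot skip one either.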

## Example of Boyer-Moore alg. application

Bad-symbol table for the pattern BAOBAB (m = 6), with _ denoting the space:

c: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z _

t1(c): 1 2 6 6 6 6 6 6 6 6 6 6 6 6 3 6 6 6 6 6 6 6 6 6 6 6 6

Good-suffix table:

| k | pattern | d2 |
|---|---------|----|
| 1 | BAOBAB | 2 |
| 2 | BAOBAB | 5 |
| 3 | BAOBAB | 5 |
| 4 | BAOBAB | 5 |
| 5 | BAOBAB | 5 |

    BESS_KNEW_ABOUT_BAOBABS
    BAOBAB
    d1 = t1(K) = 6
          BAOBAB
          d1 = t1(_) - 2 = 4, d2(2) = 5
               BAOBAB
               d1 = t1(_) - 1 = 5, d2(1) = 2
                    BAOBAB (success)


## Boyer-Moore example from their paper

Find the pattern AT_THAT in the text
WHICH_FINALLY_HALTS.__AT_THAT


## Hashing

A very efficient method for implementing a dictionary, i.e., a set with the operations:
find (search)
insert
delete

Each element in the set contains a key and satellite data (the remainder of the record). The keys are unique, but the satellite data are not.

Based on the representation-change and space-for-time trade-off ideas.

Important applications:
symbol tables
databases (extendible hashing)

## Hash tables and hash functions

The idea of hashing is to map keys of a given file of size n into a table of size m, called the hash table, by using a predefined function, called the hash function,
h: K → location (cell) in the hash table

Example: student records, key = ID. Hash function: h(K) = K mod m, where m is some integer (typically, prime).
If m = 1000, where is the record with ID = 314159265 stored?

Generally, a hash function should:
be easy to compute
distribute keys about evenly throughout the hash table
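The student-ID example can be checked with a one-line sketch of the division-method hash function; the function name `h` mirrors the notation above, and integer keys are assumed.

```python
def h(key: int, m: int) -> int:
    """Division-method hash: map an integer key to a cell index in 0..m-1."""
    return key % m


print(h(314159265, 1000))  # 265, i.e. the last three digits of the ID
```

With m = 1000 the cell depends only on the last three decimal digits of the key, which is one reason a prime table size is usually preferred.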


## Collisions

If h(K1) = h(K2) for two different keys K1 and K2, there is a collision.

Good hash functions result in fewer collisions, but some collisions should be expected.

Open hashing: each cell is a header of a linked list of all keys hashed to it.

Closed hashing: one key per cell; in case of a collision, finds another cell by probing (e.g., linear probing).


## Open hashing (Separate chaining)

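Keys are stored in linked lists outside a hash table whose elements serve as the lists' headers.

Example: A, FOOL, AND, HIS, MONEY, ARE, SOON, PARTED
h(K) = sum of K's letters' positions in the alphabet MOD 13

| Key | A | FOOL | AND | HIS | MONEY | ARE | SOON | PARTED |
|---|---|---|---|---|---|---|---|---|
| h(K) | 1 | 9 | 6 | 10 | 7 | 11 | 11 | 12 |

Resulting table, where → links keys stored in the same list:

| Cell | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Keys | | A | | | | | AND | MONEY | | FOOL | HIS | ARE → SOON | PARTED |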

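## Open hashing (cont.)

If the hash function distributes keys uniformly, the average length of a linked list will be $\alpha = n/m$. This ratio is called the load factor.
Where: n = number of elements, m = table size

Average number of probes in successful, S, and unsuccessful, U, searches:
$S \approx 1 + \alpha/2$, $U = \alpha$

The load $\alpha$ is typically kept small (ideally, about 1).
Open hashing still works if n > m.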
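Separate chaining with the letter-sum hash function from the example can be sketched as follows. The class and method names are hypothetical, and plain Python lists stand in for the linked lists, which changes constants but not the idea.

```python
def letter_hash(key, m=13):
    """Sum of the letters' positions in the alphabet (A=1, ..., Z=26), mod m."""
    return sum(ord(ch) - ord("A") + 1 for ch in key.upper()) % m


class ChainedHashTable:
    """Open hashing: each cell holds a chain of all keys hashed to it."""

    def __init__(self, m=13):
        self.m = m
        self.cells = [[] for _ in range(m)]   # Python lists stand in for linked lists
        self.n = 0

    def insert(self, key):
        chain = self.cells[letter_hash(key, self.m)]
        if key not in chain:                  # keys are unique
            chain.append(key)
            self.n += 1

    def find(self, key):
        return key in self.cells[letter_hash(key, self.m)]

    def delete(self, key):
        chain = self.cells[letter_hash(key, self.m)]
        if key in chain:
            chain.remove(key)
            self.n -= 1

    def load_factor(self):
        return self.n / self.m


table = ChainedHashTable()
for word in ["A", "FOOL", "AND", "HIS", "MONEY", "ARE", "SOON", "PARTED"]:
    table.insert(word)
print(table.cells[11])      # ['ARE', 'SOON']
print(table.find("KID"))    # False
```

Searching for KID only examines the chain of cell h(KID) = 11, so it costs two key comparisons rather than eight.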

## Closed hashing (Open addressing)

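If position h(K) is occupied, then the positions in the probing sequence
norm(h(K)+p(1)), norm(h(K)+p(2)), ..., norm(h(K)+p(i)), ...
are tried, where norm brings the result back into the table's index range (in the example, by wrapping around modulo the table size).

Linear probing: p(i) = i

Example: insert A5, A2, A3, B5, A9, B2, B9, C2 into a table with cells 0..9.
Assume: h(A2)=2, h(A3)=3, h(A5)=5, h(A9)=9, h(B2)=2, h(B5)=5, h(B9)=9, h(C2)=2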
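| Cell | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| Insert: A5, A2, A3 | | | A2 | A3 | | A5 | | | | |
| Insert: B5, A9, B2 | | | A2 | A3 | B2 | A5 | B5 | | | A9 |
| Insert: B9, C2 | B9 | | A2 | A3 | B2 | A5 | B5 | C2 | | A9 |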
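The insertion sequence of this example can be replayed with a minimal linear-probing sketch. The function name `linear_probe_insert` is hypothetical, and the hash value is taken to be the digit in the key's name, as the example assumes.

```python
def linear_probe_insert(table, key, home):
    """Insert key starting at cell `home`, trying home, home+1, home+2, ... modulo the table size."""
    m = len(table)
    for i in range(m):                  # p(i) = i; i = 0 is the home position itself
        pos = (home + i) % m            # wrap around at the end of the table
        if table[pos] is None:
            table[pos] = key
            return pos
    raise OverflowError("hash table is full")


table = [None] * 10
for key in ["A5", "A2", "A3", "B5", "A9", "B2", "B9", "C2"]:
    linear_probe_insert(table, key, home=int(key[1]))   # h(A5) = 5, h(B2) = 2, ...
print(table)
# ['B9', None, 'A2', 'A3', 'B2', 'A5', 'B5', 'C2', None, 'A9']
```

C2 ends up five cells away from its home position because cells 2 through 6 already form a cluster, which illustrates why linear probing degrades as the table fills.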

## Closed hashing (cont.)

Does not work if n > m.

Avoids pointers.
Deletions are not straightforward.
The number of probes to find/insert/delete a key depends on the load factor $\alpha = n/m$ (hash table density) and the collision resolution strategy. For linear probing:
$S \approx \frac{1}{2}\left(1 + \frac{1}{1-\alpha}\right)$ and $U \approx \frac{1}{2}\left(1 + \frac{1}{(1-\alpha)^2}\right)$
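To get a feel for how quickly these estimates grow, the two linear-probing formulas can be evaluated for a few load factors. The function name `linear_probing_probes` is hypothetical, and the formulas are approximations that are meaningful only for 0 <= α < 1.

```python
def linear_probing_probes(alpha):
    """Approximate average probes for successful (S) and unsuccessful (U) searches."""
    if not 0 <= alpha < 1:
        raise ValueError("load factor must satisfy 0 <= alpha < 1")
    s = 0.5 * (1 + 1 / (1 - alpha))
    u = 0.5 * (1 + 1 / (1 - alpha) ** 2)
    return s, u


for alpha in (0.5, 0.75, 0.9):
    s, u = linear_probing_probes(alpha)
    print(f"alpha = {alpha:.2f}: S = {s:.1f}, U = {u:.1f}")
# alpha = 0.50: S = 1.5, U = 2.5
# alpha = 0.75: S = 2.5, U = 8.5
# alpha = 0.90: S = 5.5, U = 50.5
```

The unsuccessful-search estimate grows much faster than the successful one as α approaches 1, which is why closed-hashing tables are usually kept well below full.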
