Uninformed Search 5 Edit Distance: Naive Method
CS 440/ECE 448
Fall 2020
Margaret Fleck
Uninformed Search 5
Edit distance
A good search algorithm must be coupled with a good representation of the problem. So transforming to a
different state graph model may dramatically improve performance. We'll illustrate this with finding edit
distance, and then see another example when we get to classical planning.
Problem: find the best alignment between two strings, e.g. "ALGORITHM" and "ALTRUISTIC". By "best," we
mean the alignment that requires the fewest editing operations (delete char, add char, substitute char) to
transform the first word into the second. We can get from ALGORITHM to ALTRUISTIC with six edits,
producing the following optimal alignment:
A L G O R   I   T H M
A L   T R U I S T I C
    D S   A   A   S S
D = delete character
A = add character
S = substitute
For more detail, see Wikipedia on edit distance and Jeff Erickson's notes.
Naive method
Each state is a string. So the start state would be "algorithm" and the end state is "altruistic." Each action applies
one edit operation, anywhere in the word.
In this model, each state has a very large number of successors. So there are about 28^9 single-character changes
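To make the fan-out of this model concrete, here is a minimal sketch of a successor function for string states. The function name `successors`, the `ALPHABET` constant, and the choice of a 26-letter lowercase alphabet are assumptions made for illustration; they are not part of the lecture.

```python
import string

ALPHABET = string.ascii_lowercase  # assumed alphabet: 26 lowercase letters


def successors(word, alphabet=ALPHABET):
    """Return every string reachable from `word` by exactly one edit operation."""
    result = set()
    for i in range(len(word)):
        # Delete the character at position i.
        result.add(word[:i] + word[i + 1:])
        # Substitute the character at position i with a different one.
        for c in alphabet:
            if c != word[i]:
                result.add(word[:i] + c + word[i + 1:])
    for i in range(len(word) + 1):
        # Add a character before position i (i == len(word) appends at the end).
        for c in alphabet:
            result.add(word[:i] + c + word[i:])
    return result
```

Calling `len(successors("algorithm"))` shows the fan-out of the start state alone, and each of those strings has a comparable number of successors of its own, so the number of states grows very quickly with search depth.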
Better method
A better representation for this problem explores the space of partial matches. Each state is an alignment of the
initial sections (prefixes) of the two strings. So the start state is an alignment of two empty strings, and the end
state is an alignment of the full strings.
https://courses.grainger.illinois.edu/cs440/fa2020/lectures/search5.html 1/2
9/17/2020 CS440 Lectures
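In this representation, each action extends the alignment by one character. From each state, there are only three
legal moves: consume the next character of the first string alone (a deletion), consume the next character of the
second string alone (an addition), or consume one character from each string (a match if they are equal, otherwise
a substitution).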
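The prefix-alignment model can be sketched as a small state graph and searched with uniform-cost search. This sketch reads the three moves as: advance in the first string only (delete), advance in the second string only (add), or advance in both (match at cost 0, substitute at cost 1). As a simplifying assumption, a state is stored as a pair `(i, j)` of prefix lengths rather than as the alignment itself. The names `moves` and `edit_distance_search` are hypothetical.

```python
import heapq


def moves(state, first, second):
    """Yield (next_state, cost) for the (at most three) legal moves from `state`."""
    i, j = state  # i characters of `first` and j of `second` are aligned so far
    if i < len(first):
        yield (i + 1, j), 1  # delete first[i]
    if j < len(second):
        yield (i, j + 1), 1  # add second[j]
    if i < len(first) and j < len(second):
        # A match is free; a substitution costs one edit.
        yield (i + 1, j + 1), 0 if first[i] == second[j] else 1


def edit_distance_search(first, second):
    """Uniform-cost search from (0, 0) to (len(first), len(second))."""
    start, goal = (0, 0), (len(first), len(second))
    best = {start: 0}          # cheapest known cost to reach each state
    frontier = [(0, start)]    # priority queue ordered by cost so far
    while frontier:
        cost, state = heapq.heappop(frontier)
        if state == goal:
            return cost
        if cost > best[state]:
            continue  # stale queue entry; a cheaper path was already found
        for nxt, step in moves(state, first, second):
            new_cost = cost + step
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(frontier, (new_cost, nxt))
```

With this encoding there are only `(len(first) + 1) * (len(second) + 1)` states, each with at most three successors, and `edit_distance_search("ALGORITHM", "ALTRUISTIC")` returns the six edits quoted earlier.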
In this specific case, we can pack our work into a nifty compact 2D table.
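One common way to lay out such a table is the standard dynamic-programming formulation sketched below, where entry `table[i][j]` holds the edit distance between the first `i` characters of one string and the first `j` characters of the other. The function name `edit_distance_table` and the row/column orientation are assumptions; the lecture's own table may be laid out differently.

```python
def edit_distance_table(first, second):
    """Return the full table; table[i][j] is the distance between first[:i] and second[:j]."""
    rows, cols = len(first) + 1, len(second) + 1
    table = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        table[i][0] = i  # delete all i characters of first[:i]
    for j in range(cols):
        table[0][j] = j  # add all j characters of second[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            substitute = 0 if first[i - 1] == second[j - 1] else 1
            table[i][j] = min(
                table[i - 1][j] + 1,               # delete first[i - 1]
                table[i][j - 1] + 1,               # add second[j - 1]
                table[i - 1][j - 1] + substitute,  # match or substitute
            )
    return table
```

The bottom-right entry, `edit_distance_table(first, second)[-1][-1]`, is the edit distance of the full strings. Each cell depends only on its three neighbors above, to the left, and diagonally above-left, which mirrors the three moves of the search formulation.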
This algorithm can be adapted for situations where the search space is too large to maintain the full alignment
table. For example, we may need to match DNA sequences that are 10^5 characters long.
We can do an approximate search in which we keep only the best k alignments of each length (e.g. speech
recognition).
We can compute the edit distance but not the optimal alignment, which requires storing only two columns
from the table.
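The second option can be sketched as follows, assuming the usual dynamic-programming table in which each cell depends only on the current and the previous slice. Whether those slices are called rows or columns depends on how the table is drawn; this sketch keeps two rows. The function name `edit_distance_two_rows` is hypothetical.

```python
def edit_distance_two_rows(first, second):
    """Return the edit distance while storing only two slices of the table."""
    # previous[j] is the distance between the processed prefix of `first` and second[:j].
    previous = list(range(len(second) + 1))
    for i in range(1, len(first) + 1):
        current = [i] + [0] * len(second)  # current[0]: delete all i characters
        for j in range(1, len(second) + 1):
            substitute = 0 if first[i - 1] == second[j - 1] else 1
            current[j] = min(
                previous[j] + 1,               # delete first[i - 1]
                current[j - 1] + 1,            # add second[j - 1]
                previous[j - 1] + substitute,  # match or substitute
            )
        previous = current  # the older slice is no longer needed
    return previous[len(second)]
```

Memory use is proportional to the length of `second` rather than to the product of the two lengths. The trade-off is the one stated above: once older slices are discarded, the optimal alignment can no longer be traced back through the table.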