You are on page 1of 2

9/17/2020 CS440 Lectures

CS 440/ECE 448
Fall 2020
Margaret Fleck
Uninformed Search 5

Edit distance
A good search algorithm must be coupled with a good representation of the problem. So transforming to a
different state graph model may dramatically improve performance. We'll illustrate this with finding edit
distance, and then see another example when we get to classical planning.

Problem: find the best alignment between two strings, e.g. "ALGORITHM" and "ALTRUISTIC". By "best," we
mean the alignment that requires the fewest editing operations (delete char, add char, substitute char) to
transform the first word into the second. We can get from ALGORITHM to ALTRUISTIC with six edits,
producing the following optimal alignment:
A L G O R I T H M
A L T R U I S T I C
D S A A S S

D = delete character
A = add character
S = substitute

For more detail, see Wikipedia on edit distance and Jeff Erickson's notes.

Versions of this task appear in

theory classes (example of dynamic programming)


computational biology (matching DNA sequences)
evaluating speech recognizers (comparing machine transcription to correct one)
machine translation (aligning text in two different languages to match related words)

Naive method
Each state is a string. So the start state would be "algorithm" and the end state is "altruistic." Each action applies
one edit operation, anywhere in the word.

In this model, each state has a very large number of successors. So there are about 28 single-character changes
9

you could make to ALGORITHM, resulting in words like:


LGORITHM, AGORITHM, ..., ALGORIHM, ...
BLGORITHM, CLGORITHM, ....
AAGORITHM, ....
...
ALBORJTHM, ...
....

This massive fan-out will cause search to be very slow.

Better method

A better representation for this problem explores the space of partial matches. Each state is an alignment of the
initial sections (prefixes) of the two strings. So the start state is an alignment of two empty strings, and the end
state is an alignment of the full strings.
https://courses.grainger.illinois.edu/cs440/fa2020/lectures/search5.html 1/2
9/17/2020 CS440 Lectures
In this representation, each action extends the alignment by one character. From each state, there are only three
legal moves:

pair the next characters in the two strings


skip a character in the first string
skip a character in the second string

In this specific case, we can pack our work into a nifty compact 2D table.

This algorithm can be adapted for situations where the search space is too large to maintain the full alignment
table. for example, we may need to match DNA sequences that are 10 characters long. 5

We can an approximate search in which we keep only the best k alignments of each length (e.g. speech
recognition).
We can compute the edit distance but not the optimal alignment, which requires storing only two columns
from the table.

https://courses.grainger.illinois.edu/cs440/fa2020/lectures/search5.html 2/2

You might also like