
CHUA, Justin

LEGASPI, John

INTRNLP Assignment #2: Levenshtein Distance

1) Implementing the code for Levenshtein Distance


After some research, we found two common ways to implement Levenshtein Distance
in code. The main difference is the cost of replacing a character. In one convention
a replacement costs 2: 1 for removing the character and 1 for inserting another
character in its place. In the other convention, replacing a character with another
costs only 1. Both conventions were considered when implementing the code for
Levenshtein Distance, hence two Java files are submitted along with this
document.
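
The two conventions above differ only in one number, so they can be sketched as a
single routine. The following is a minimal sketch, not the submitted code: the class
name LevenshteinSketch, the method name distance, and the substitutionCost parameter
are assumptions made for illustration.

```java
class LevenshteinSketch {
    // Fills the dynamic-programming table and returns its bottom-right cell.
    // substitutionCost is assumed to be 2 (delete + insert) or 1 (single replacement).
    static int distance(String source, String target, int substitutionCost) {
        int n = source.length();
        int m = target.length();
        int[][] d = new int[n + 1][m + 1];
        for (int i = 1; i <= n; i++) d[i][0] = i; // delete i characters
        for (int j = 1; j <= m; j++) d[0][j] = j; // insert j characters
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                boolean same = source.charAt(i - 1) == target.charAt(j - 1);
                int replace = d[i - 1][j - 1] + (same ? 0 : substitutionCost);
                int delete = d[i - 1][j] + 1;
                int insert = d[i][j - 1] + 1;
                d[i][j] = Math.min(replace, Math.min(delete, insert));
            }
        }
        return d[n][m];
    }

    public static void main(String[] args) {
        System.out.println(distance("intention", "execution", 2)); // prints 8
        System.out.println(distance("intention", "execution", 1)); // prints 5
    }
}
```

With a replacement cost of 2, a replacement is never cheaper than a deletion followed
by an insertion, which is why the two variants can report different distances for the
same pair of strings.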

Figure 1. Levenshtein Distance with Replacement Cost of 2

Figure 2. Levenshtein Distance with the Replacement Cost of 1

Aside from generating the table and printing the Levenshtein Distance, the program
also prints an additional output showing what happens to the source string as it is
edited into the target string. The characters R, M, I, and D stand for Replacing a
character, Matching a character, Including (inserting) a new character, and Deleting
a character, respectively.

Figure 3. Display of edits done to the strings
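
One way such an R/M/I/D listing can be produced is by walking back through the
finished table from the bottom-right cell. The sketch below is an assumption about
how this might be done, not the submitted code; the names EditTraceSketch and trace
are hypothetical, and when several edit paths are equally cheap it prefers the
diagonal step (M or R), then D, then I.

```java
class EditTraceSketch {
    // Returns one optimal edit script as a string of R, M, I and D characters.
    static String trace(String source, String target, int substitutionCost) {
        int n = source.length();
        int m = target.length();
        int[][] d = new int[n + 1][m + 1];
        for (int i = 1; i <= n; i++) d[i][0] = i;
        for (int j = 1; j <= m; j++) d[0][j] = j;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                boolean same = source.charAt(i - 1) == target.charAt(j - 1);
                int replace = d[i - 1][j - 1] + (same ? 0 : substitutionCost);
                d[i][j] = Math.min(replace, Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1));
            }
        }

        // Walk back from the bottom-right cell, recording which move produced each cell.
        StringBuilder ops = new StringBuilder();
        int i = n;
        int j = m;
        while (i > 0 || j > 0) {
            if (i > 0 && j > 0) {
                boolean same = source.charAt(i - 1) == target.charAt(j - 1);
                int step = same ? 0 : substitutionCost;
                if (d[i][j] == d[i - 1][j - 1] + step) {
                    ops.append(same ? 'M' : 'R');
                    i--;
                    j--;
                    continue;
                }
            }
            if (i > 0 && d[i][j] == d[i - 1][j] + 1) {
                ops.append('D'); // source character removed
                i--;
            } else {
                ops.append('I'); // target character included
                j--;
            }
        }
        return ops.reverse().toString(); // operations were collected back to front
    }

    public static void main(String[] args) {
        System.out.println(trace("cat", "cut", 1));  // prints MRM
        System.out.println(trace("cat", "cats", 1)); // prints MMMI
    }
}
```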

2) Comparing Levenshtein Distance with Other Algorithms


a) Needleman-Wunsch Algorithm
The Levenshtein Distance is a type of edit distance used to measure the
degree of similarity between two strings or sequences. It is defined by a set of
edit operations (insertion, deletion, and substitution), each of which
has a cost. The distance between the two strings or sequences is
the minimum total cost of the operations needed to transform one string into the other.

The Levenshtein Distance finds the minimum number of edit operations needed to
transform one sequence into another. The Needleman-Wunsch algorithm, on the other
hand, searches for the best alignment: it divides the large problem into a sequence
of smaller problems and builds up the best possible alignment of the two sequences
from optimal alignments of smaller subsequences. It assigns a score to each alignment
between the two input strings and chooses the best alignment, which is the one with
the highest total score.

The main difference between the Needleman-Wunsch algorithm and the
Levenshtein distance algorithm is that the Levenshtein distance algorithm applies a
fixed penalty cost to every mismatched letter and minimizes the total, while the
Needleman-Wunsch algorithm assigns separate weights to matches, mismatches, and gaps
and maximizes the total score.
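
The alignment scoring described above can be sketched as follows. This is a minimal
illustration under assumed names (NeedlemanWunschSketch, score) and an assumed linear
gap penalty; it returns only the best global alignment score, not the alignment itself.

```java
class NeedlemanWunschSketch {
    // Best global alignment score for a and b.
    // match, mismatch and gap are caller-chosen weights (gap is usually negative).
    static int score(String a, String b, int match, int mismatch, int gap) {
        int n = a.length();
        int m = b.length();
        int[][] s = new int[n + 1][m + 1];
        for (int i = 1; i <= n; i++) s[i][0] = i * gap; // prefix of a aligned to gaps
        for (int j = 1; j <= m; j++) s[0][j] = j * gap; // prefix of b aligned to gaps
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                boolean same = a.charAt(i - 1) == b.charAt(j - 1);
                int diagonal = s[i - 1][j - 1] + (same ? match : mismatch);
                int up = s[i - 1][j] + gap;
                int left = s[i][j - 1] + gap;
                s[i][j] = Math.max(diagonal, Math.max(up, left)); // keep the highest score
            }
        }
        return s[n][m];
    }

    public static void main(String[] args) {
        // Separate weights for matches and mismatches.
        System.out.println(score("GCATGCG", "GATTACA", 1, -1, -1)); // prints 0
        // With match = 0 and mismatch = gap = -1, the score is the negated
        // Levenshtein Distance (replacement cost 1).
        System.out.println(score("kitten", "sitting", 0, -1, -1)); // prints -3
    }
}
```

The second call shows how the two algorithms relate: choosing weights that reward
nothing and penalize every edit by 1 turns the maximization into the same problem as
minimizing the number of edits.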

b) Smith-Waterman Algorithm
The Smith-Waterman algorithm is based on an earlier model, the
Needleman-Wunsch algorithm. It considers alignments of any length, starting at
any location in either character sequence, that is, local alignments. Each pair of
compared characters is assigned a score (weight); the scores along an alignment
are added together, and the highest-scoring alignment is chosen.

It is similar to edit distance, but instead of finding the minimum cost, it finds the
maximum score by locating the most similar parts of the sequences.
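
The search for the most similar parts can be sketched as below. The names
SmithWatermanSketch and bestLocalScore, the linear gap penalty, and the weights used
in main are assumptions for illustration; the sketch returns only the best local
score. The difference from a global alignment is that a cell is never allowed to drop
below 0, so an alignment can start anywhere, and the answer is the largest cell in
the whole table rather than the bottom-right one.

```java
class SmithWatermanSketch {
    // Highest score of any local alignment between a and b.
    static int bestLocalScore(String a, String b, int match, int mismatch, int gap) {
        int n = a.length();
        int m = b.length();
        int[][] h = new int[n + 1][m + 1]; // first row and column stay 0
        int best = 0;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                boolean same = a.charAt(i - 1) == b.charAt(j - 1);
                int diagonal = h[i - 1][j - 1] + (same ? match : mismatch);
                int up = h[i - 1][j] + gap;
                int left = h[i][j - 1] + gap;
                // Flooring at 0 lets a new alignment begin at this cell.
                h[i][j] = Math.max(0, Math.max(diagonal, Math.max(up, left)));
                best = Math.max(best, h[i][j]);
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Only the shared region ACGT contributes to the score.
        System.out.println(bestLocalScore("XXACGTXX", "YYACGTYY", 2, -1, -1)); // prints 8
    }
}
```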

References:

- Smith-Waterman Algorithm. (n.d.). Retrieved July 27, 2020, from
https://cs.stanford.edu/people/eroberts/courses/soco/projects/computers-and-the-hgp/smith_waterman.html
- Doan, A., & Ives, Z. (2012). Levenshtein Distance. Retrieved July 27, 2020, from
https://www.sciencedirect.com/topics/computer-science/levenshtein-distance
