A study about the working of Edit Distance or Levenshtein algorithm

© All Rights Reserved


LEVENSHTEIN DISTANCE

[Minimum Edit Distance between two strings]

PROJECT BY

DINESH KUMAR R K K RAM KUMAAR

(106112026) (106112045)


TABLE OF CONTENTS

OVERVIEW

ALGORITHM

CODE

OUTPUT AND SCREENSHOTS


OVERVIEW:

Levenshtein distance (LD) is a measure of the similarity between two strings, which we will refer to as the source string (s) and the target string (t). The distance is the minimum number of single-character deletions, insertions, or substitutions required to transform s into t.

For example:

If s is "test" and t is "test", then LD(s,t) = 0, because no transformations are needed. The strings are already identical.

If s is "test" and t is "tent", then LD(s,t) = 1, because one substitution (change "s" to "n") is sufficient to transform s into t.

The greater the Levenshtein distance, the more different the strings are.


Levenshtein distance is named after the Russian scientist Vladimir Levenshtein, who considered this distance in 1965. It may also be referred to as edit distance, although that term may also denote a larger family of distance metrics. It is closely related to pairwise string alignments.

The Levenshtein distance has several simple upper and lower bounds. These include:

It is always at least the absolute difference of the lengths of the two strings.

It is at most the length of the longer string.

It is zero if and only if the strings are equal.

If the strings have the same length, the Hamming distance (the number of positions at which the characters differ) is an upper bound on the Levenshtein distance.

The Levenshtein distance between two strings is no greater than the sum of their Levenshtein distances from a third string (triangle inequality).

The Levenshtein distance can also be computed between two long strings, but the cost of computing it, which is roughly proportional to the product of the two string lengths, can make this impractical for very long inputs.


APPLICATIONS OF LEVENSHTEIN DISTANCE INCLUDE:

Spell checking

Speech recognition

DNA analysis

Plagiarism detection

Software to assist natural language translation based on translation memory.

Correction systems for optical character recognition (OCR).

In approximate string matching, the objective is to find matches for short strings in many longer texts, in situations where a small number of differences is to be expected. The short strings could come from a dictionary, for instance. Here, one of the strings is typically short, while the other is arbitrarily long.

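The dynamic-programming implementation of the algorithm runs in O(mn) time, where m and n are the lengths of string 1 and string 2 respectively. The straightforward version also stores the whole (m+1)-by-(n+1) matrix, so it uses O(mn) space as well.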


ALGORITHM :

STEP 1 : Set n to be the length of s.

Set m to be the length of t.

If n = 0, return m and exit.

If m = 0, return n and exit.

Construct a matrix d with rows 0..n (one per character of s) and columns 0..m (one per character of t).

STEP 2 : Initialize the first row to 0..m (d[0,j] = j).

Initialize the first column to 0..n (d[i,0] = i).

STEP 3 : Examine each character of s (i from 1 to n).

STEP 4 : Examine each character of t (j from 1 to m).

STEP 5 : If s[i] equals t[j], the cost is 0.

If s[i] doesn't equal t[j], the cost is 1.

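STEP 6 : Set cell d[i,j] of the matrix equal to the minimum of:

a. The cell immediately above plus 1: d[i-1,j] + 1 (delete s[i]).

b. The cell immediately to the left plus 1: d[i,j-1] + 1 (insert t[j]).

c. The cell diagonally above and to the left plus the cost: d[i-1,j-1] + cost (substitute t[j] for s[i], or keep s[i] if they match).

STEP 7 : After steps 3 to 6 have filled the matrix, the distance is found in cell d[n,m].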
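Written as a recurrence, the steps above amount to the following. Here $d_{i,j}$ stands for the cell d[i,j], $s_i$ and $t_j$ are the 1-based characters, and $[s_i \neq t_j]$ is the cost from Step 5 (1 if the characters differ, 0 otherwise). The last line works out the final cell for the "test"/"tent" example from the overview.

```latex
\begin{aligned}
d_{i,0} &= i, \qquad d_{0,j} = j,\\
d_{i,j} &= \min\bigl(d_{i-1,j} + 1,\; d_{i,j-1} + 1,\; d_{i-1,j-1} + [s_i \neq t_j]\bigr),
  \qquad 1 \le i \le n,\ 1 \le j \le m,\\
\mathrm{LD}(s,t) &= d_{n,m},\\
\text{e.g. } d_{4,4}(\texttt{test},\texttt{tent}) &= \min(d_{3,4} + 1,\; d_{4,3} + 1,\; d_{3,3} + 0) = \min(2+1,\; 2+1,\; 1) = 1.
\end{aligned}
```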


Code:

#include <iostream>
#include <string>
#include <vector>

using namespace std;

const int C = 1; // cost of one insertion or deletion

// Returns the smallest of three integers.
int Minimum(int a, int b, int c)
{
    int min = a;
    if (b < min)
        min = b;
    if (c < min)
        min = c;
    return min;
}

// Bottom-up dynamic programming.
// T[i][j] = edit distance between the first i characters of X and the first j characters of Y.
int EditDistanceDP(const string& X, const string& Y)
{
    const int m = static_cast<int>(X.size()) + 1;
    const int n = static_cast<int>(Y.size()) + 1;
    vector<vector<int>> T(m, vector<int>(n));

    for (int i = 0; i < m; i++) // base case: 0th column (delete all i characters of X)
        T[i][0] = i;
    for (int j = 0; j < n; j++) // base case: 0th row (insert all j characters of Y)
        T[0][j] = j;

    for (int i = 1; i < m; i++)
    {
        for (int j = 1; j < n; j++)
        {
            int left = T[i][j-1] + C;                            // case 1: insert Y[j-1]
            int top = T[i-1][j] + C;                             // case 2: delete X[i-1]
            int diagtopleft = T[i-1][j-1] + (X[i-1] != Y[j-1]);  // case 3: replace (free if equal)
            T[i][j] = Minimum(left, top, diagtopleft);
        }
    }
    return T[m-1][n-1];
}

// Plain recursion on the same recurrence. It recomputes the same subproblems
// many times, so its running time grows exponentially; use it only for short strings.
int EditDistanceRecursion(const string& X, const string& Y, int m, int n)
{
    if (m == 0)
        return n; // insert the remaining n characters of Y
    if (n == 0)
        return m; // delete the remaining m characters of X
    int left = EditDistanceRecursion(X, Y, m-1, n) + 1;                       // delete X[m-1]
    int right = EditDistanceRecursion(X, Y, m, n-1) + 1;                      // insert Y[n-1]
    int corner = EditDistanceRecursion(X, Y, m-1, n-1) + (X[m-1] != Y[n-1]);  // replace or match
    return Minimum(left, right, corner);
}

int main()
{
    string a, b; // std::string grows to fit any input word
    cout << "Enter string A : ";
    cin >> a;
    cout << "Enter string B : ";
    cin >> b;
    cout << "\nDP:\nMinimum edits required to convert " << a << " into " << b
         << " is " << EditDistanceDP(a, b) << "\n";
    cout << "\nRecursion:\nMinimum edits required to convert " << a << " into " << b
         << " is " << EditDistanceRecursion(a, b, static_cast<int>(a.size()), static_cast<int>(b.size()))
         << "\n";
    return 0;
}


Sample Inputs and Outputs:

Output 1:

Source string : Levinshtein

Target string : Meilenstein


Output 2:

Source string : September

Target string : October


Output 3:

Source string : Algorithms

Target string : datastructures
