文字列編集距離(Edit Distance)分析とソースコード
1.文字列編集距離定義
定義:文字列編集距離とは、文字列s 1がs 2になるのに少なくとも何回置換、挿入、削除操作が必要かを指す.
Ed(s1,s2) = minimum # of operations (insertion, deletion,substitution) to change s1 to s2
Example:
s1: Tom Hanks
s2: Ton Hank
ed(s1, s2) = 2
2.文字列編集距離の解
動的計画解,時間複雑度O(mn)
A = a1 a2...am
B = b1 b2...bn
Di,j = edit distance of a1 a2 ... ai and b1 b2... bj.
Di,0 = i, 0<=i <= m
D0,j = j, 0 <= j<= n
Di,j = min{Di-1, j + 1,
Di, j-1 + 1,
Di-1, j-1 + ti, j}, 1 <= i <=m, 1 <=j <= n
where ti, j = 0 if ai = bj and ti, j = 1 if ai != bj.
2.ソースプログラム
定義:文字列編集距離とは、文字列s 1がs 2になるのに少なくとも何回置換、挿入、削除操作が必要かを指す.
Ed(s1,s2) = minimum # of operations (insertion, deletion,substitution) to change s1 to s2
Example:
s1: Tom Hanks
s2: Ton Hank
ed(s1, s2) = 2
2.文字列編集距離の解
動的計画解,時間複雑度O(mn)
A = a1 a2...am
B = b1 b2...bn
Di,j = edit distance of a1 a2 ... ai and b1 b2... bj.
Di,0 = i, 0<=i <= m
D0,j = j, 0 <= j<= n
Di,j = min{Di-1, j + 1,
Di, j-1 + 1,
Di-1, j-1 + ti, j}, 1 <= i <=m, 1 <=j <= n
where ti, j = 0 if ai = bj and ti, j = 1 if ai != bj.
2.ソースプログラム
int getMin(int a, int b, int c)
{
int min1 = (a<=b)?a:b;
int min2 =(min1<=c)?min1:c;
return min2;
}
int getED(const char*str1, const char*str2) {
int m = strlen(str1);
int n = strlen(str2);
int **dis = new int*[m+1];
for(int i=0; i<m+1; i++) {
dis[i] = new int[n+1];
}
for(int i=0; i<=m; i++) {
dis[i][0]=i;
}
for(int j=0; j<=n; j++) {
dis[0][j]=j;
}
for(int i=1; i<=m; i++) {
for(int j=1; j<=n; j++) {
int tij=(str1[i-1]==str2[j-1])?0:1;
dis[i][j] = getMin(dis[i-1][j]+1, dis[i][j-1]+1, dis[i-1][j-1]+tij);
}
}
return dis[m][n];
}