Abstract:
The problems of finding a longest common subsequence of two sequences A and B and a shortest edit script for transforming A into B have long been known to be dual problems. In this paper, they are shown to be equivalent to finding a shortest/longest path in an edit graph. Using this perspective, a simple O(ND) time and space algorithm is developed, where N is the sum of the lengths of A and B and D is the size of the minimum edit script for A and B. The algorithm performs well when differences are small (sequences are similar) and is consequently fast in typical applications. The algorithm is shown to have O(N + D^2) expected-time performance under a basic stochastic model. A refinement of the algorithm requires only O(N) space, and the use of suffix trees leads to an O(N lg N + D^2) time variation.
KEY WORDS: longest common subsequence; shortest edit script; edit graph; file comparison
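To make the edit-graph view concrete, the following is a minimal sketch of a greedy furthest-reaching search of the kind the abstract describes. It computes only D (insertions plus deletions), not the script itself. The function name `shortest_edit_distance` and the dictionary `v` are illustrative choices, not taken from the paper, and the sketch keeps only the O(N) frontier rather than the history needed to recover a script.

```python
def shortest_edit_distance(a, b):
    """Return D, the size of a shortest insert/delete script turning a into b.

    Illustrative sketch: v[k] stores the furthest x reached so far on
    diagonal k, where k = x - y in the edit graph.
    """
    n, m = len(a), len(b)
    v = {1: 0}  # sentinel so the first step starts at x = 0 on diagonal 0
    for d in range(n + m + 1):
        # After d non-diagonal moves, only diagonals -d, -d+2, ..., d are reachable.
        for k in range(-d, d + 1, 2):
            if k == -d or (k != d and v[k - 1] < v[k + 1]):
                x = v[k + 1]        # vertical move: insert a symbol of b
            else:
                x = v[k - 1] + 1    # horizontal move: delete a symbol of a
            y = x - k
            # Follow the "snake": diagonal edges cost nothing (matching symbols).
            while x < n and y < m and a[x] == b[y]:
                x += 1
                y += 1
            v[k] = x
            if x >= n and y >= m:
                return d
    return n + m  # not reached: the loop always terminates by d = n + m


if __name__ == "__main__":
    d = shortest_edit_distance("ABCABBA", "CBABAC")
    n_total = len("ABCABBA") + len("CBABAC")
    print(d, (n_total - d) // 2)  # D and the corresponding LCS length
```

The duality mentioned above shows up in the last line: with N the sum of the two lengths, the length of a longest common subsequence is (N - D) / 2. The outer loop runs D + 1 times and each pass touches at most D + 1 diagonals, which is where the dependence on D rather than on the product of the lengths comes from.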