  An O(ND) difference algorithm and its variations (1986) [90 citations — 2 self]

by Eugene W. Myers
Algorithmica
Download (pdf): http://xmailserver.org/diff2.pdf

Abstract:

The problems of finding a longest common subsequence of two sequences A and B and a shortest edit script for transforming A into B have long been known to be dual problems. In this paper, they are shown to be equivalent to finding a shortest/longest path in an edit graph. Using this perspective, a simple O(ND) time and space algorithm is developed where N is the sum of the lengths of A and B and D is the size of the minimum edit script for A and B. The algorithm performs well when differences are small (sequences are similar) and is consequently fast in typical applications. The algorithm is shown to have O(N + D^2) expected-time performance under a basic stochastic model. A refinement of the algorithm requires only O(N) space, and the use of suffix trees leads to an O(N lg N + D^2) time variation.

KEY WORDS: longest common subsequence, shortest edit script, edit graph, file comparison
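
As a rough illustration of the greedy idea summarized in the abstract, the following Python sketch computes only D, the size of a shortest edit script made of single-symbol insertions and deletions. It is a minimal sketch, not the paper's listing: the function name `shortest_edit_distance` and the dictionary `v` are assumptions made for this example, and it does not recover the script itself or apply the linear-space refinement.

```python
def shortest_edit_distance(a, b):
    """Return D, the number of insertions plus deletions needed to turn a into b.

    Illustrative sketch of the greedy furthest-reaching-path idea;
    names are hypothetical and chosen for this example only.
    """
    n, m = len(a), len(b)
    # v[k] is the furthest x reached so far on diagonal k, where k = x - y.
    # The sentinel v[1] = 0 lets the first step start from the origin.
    v = {1: 0}
    for d in range(n + m + 1):
        # After d edits, only diagonals -d, -d+2, ..., d can be reached.
        for k in range(-d, d + 1, 2):
            if k == -d or (k != d and v[k - 1] < v[k + 1]):
                x = v[k + 1]        # move down: insert a symbol of b
            else:
                x = v[k - 1] + 1    # move right: delete a symbol of a
            y = x - k
            # Follow the diagonal for free while the symbols match.
            while x < n and y < m and a[x] == b[y]:
                x += 1
                y += 1
            v[k] = x
            if x >= n and y >= m:
                return d
    return n + m  # not reached: D never exceeds n + m


if __name__ == "__main__":
    print(shortest_edit_distance("ABCABBA", "CBABAC"))
```

By the duality the abstract mentions, the length of a longest common subsequence follows from D as (N - D) / 2, with N the sum of the two lengths. The outer loop runs D + 1 times and each pass touches at most D + 1 diagonals, which is where the dependence on D rather than on the product of the lengths comes from.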

Citations

599 Data Structures and Algorithms – Aho, Hopcroft, et al. - 1983
407 The string-to-string correction problem – Wagner, Fischer - 1974
391 A space-economical suffix tree construction algorithm – McCreight - 1976
294 Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison – Sankoff, Kruskal, et al. - 1983
292 The Art of Computer Programming, Vol. 3: Sorting and Searching – Knuth - 1973
238 The source code control system – Rochkind - 1975
221 Fast algorithms for finding nearest common ancestors – Harel, Tarjan - 1984
166 A linear space algorithm for computing maximal common subsequences – Hirschberg - 1975
119 A faster algorithm for computing string edit distances – Masek, Paterson - 1980
104 Algorithms for the longest common subsequence problem – Hirschberg - 1977
104 A Fast Algorithm for Computing Longest Common Subsequences – Hunt, Szymanski - 1977
82 A Note on Two – Dijkstra
65 The string-to-string correction problem with block moves – Tichy - 1984
57 An Algorithm for Differential File Comparison – Hunt, McIlroy - 1976
41 Bounds on the complexity of the longest common subsequence problem – Ullman, Aho, et al. - 1976
17 A longest common subsequence algorithm suitable for similar text strings – Nakatsu, Kambayashi, et al. - 1982
7 A redisplay algorithm – Gosling - 1981
7 An information-theoretic lower bound for the longest common subsequence problem – Hirschberg - 1978