A Guided Tour to Approximate String Matching
ACM Computing Surveys, 1999
Cited by 598 (36 self)
We survey the current techniques to cope with the problem of string matching allowing errors. This is becoming a more and more relevant issue for many fast growing areas such as information retrieval and computational biology. We focus on online searching and mostly on edit distance, explaining the problem and its relevance, its statistical behavior, its history and current developments, and the central ideas of the algorithms and their complexities. We present a number of experiments to compare the performance of the different algorithms and show which are the best choices according to each case. We conclude with some future work directions and open problems.
On-Line Construction of Suffix Trees
1995
Cited by 437 (2 self)
An on-line algorithm is presented for constructing the suffix tree for a given string in time linear in the length of the string. The new algorithm has the desirable property of processing the string symbol by symbol from left to right. It has always the suffix tree for the scanned part of the string ready. The method is developed as a linear-time version of a very simple algorithm for (quadratic size) suffix tries. Regardless of its quadratic worst-case this latter algorithm can be a good practical method when the string is not too long. Another variation of this method is shown to give in a natural way the well-known algorithms for constructing suffix automata (DAWGs).
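The "very simple algorithm for (quadratic size) suffix tries" mentioned in this abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's linear-time suffix tree construction: the names `Node`, `build_suffix_trie_online`, and `contains` are hypothetical, and the root's suffix link is set to `None` here as a stand-in for an auxiliary state below the root. After each symbol is read, the trie holds all suffixes of the scanned prefix.

```python
class Node:
    """One state of the suffix trie (hypothetical helper class)."""
    def __init__(self):
        self.children = {}   # symbol -> child Node
        self.link = None     # suffix link: node for this string minus its first symbol


def build_suffix_trie_online(text):
    """Build a suffix trie left to right, one symbol at a time (quadratic size)."""
    root = Node()
    top = root                      # deepest node: represents the whole scanned prefix
    for ch in text:
        r = top
        prev_new = None
        # Walk the suffix links from the deepest node, extending every suffix by ch
        # until a node that already has a ch-transition (or the end of the chain).
        while r is not None and ch not in r.children:
            new = Node()
            r.children[ch] = new
            if prev_new is not None:
                prev_new.link = new
            prev_new = new
            r = r.link
        # Close the chain of suffix links for the newly created nodes.
        prev_new.link = root if r is None else r.children[ch]
        top = top.children[ch]
    return root


def contains(root, pattern):
    """Return True if pattern is a substring of the indexed text."""
    node = root
    for ch in pattern:
        if ch not in node.children:
            return False
        node = node.children[ch]
    return True


trie = build_suffix_trie_online("cacao")
print(contains(trie, "aca"), contains(trie, "coa"))
```

Because every substring is a prefix of some suffix, a substring query is a single walk from the root; the price is the quadratic number of nodes that the abstract mentions.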
A theory of parameterized pattern matching: algorithms and applications
In Proc. 25th Annual ACM Symposium on the Theory of Computing, 1993
Approximate string matching over suffix trees
Proceedings of the 4th Annual Symposium on Combinatorial Pattern Matching, number 684 in Lecture Notes in Computer Science, 1993
Cited by 70 (1 self)
The classical approximate string-matching problem of finding the locations of approximate occurrences P' of pattern string P in text string T such that the edit distance between P and P' is at most k is considered. We concentrate on the special case in which T is available for preprocessing before the searches with varying P and k. It is shown how the searches can be done fast using the suffix tree of T augmented with the suffix links as the preprocessed form of T and applying dynamic programming over the tree. Three variations of the search algorithm are developed with running times O(mq + n), O(mq log q + size of the output), and O(m
Parameterized Duplication in Strings: Algorithms and an Application to Software Maintenance
SIAM Journal on Computing, 1997
Cited by 65 (5 self)
As an aid in software maintenance, it would be useful to be able to track down duplication in large software systems efficiently. Duplication in code is often in the form of sections of code that are the same except for a systematic change of parameters such as identifiers and constants. To model such parameterized duplication in code, this paper introduces the notions of parameterized strings and parameterized matches of parameterized strings. A data structure called a parameterized suffix tree is defined to aid in searching for parameterized matches. For fixed alphabets, algorithms are given to construct a parameterized suffix tree in linear time and to find all maximal parameterized matches over a threshold length in a parameterized p-string in time linear in the size of the input plus the number of matches reported. The algorithms have been implemented
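To make the notion of a parameterized match concrete, here is a small sketch of the `prev` encoding commonly used for p-strings: each parameter symbol is replaced by the distance to its previous occurrence (0 for a first occurrence), while constant symbols are kept as they are. Two strings then match up to a consistent one-to-one renaming of parameters when their encodings are equal. The function names and the choice of single-character tokens are assumptions made for illustration only; this is not the paper's suffix tree algorithm.

```python
def prev_encode(tokens, params):
    """Encode a p-string: parameters become distances to their previous occurrence."""
    last_seen = {}
    encoded = []
    for i, tok in enumerate(tokens):
        if tok in params:
            # 0 marks the first occurrence of this parameter.
            encoded.append(i - last_seen[tok] if tok in last_seen else 0)
            last_seen[tok] = i
        else:
            # Constants are tagged so they can never be confused with distances.
            encoded.append(("const", tok))
    return encoded


def p_match(a, b, params):
    """True if a and b are equal up to a consistent one-to-one renaming of parameters."""
    return prev_encode(a, params) == prev_encode(b, params)


params = set("xyab")
print(p_match("x=y+x", "a=b+a", params))   # same structure, parameters renamed
print(p_match("x=y+x", "a=b+b", params))   # different reuse pattern
```

The encoding is what lets parameterized matching reuse ordinary string-comparison machinery: equality of encoded strings replaces equality of raw symbols.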
Multiple Filtration and Approximate Pattern Matching
1995
Cited by 62 (0 self)
Given a text of length n and a query of length q, we present an algorithm for finding all locations of m-tuples in the text and in the query that differ by at most k mismatches. This problem is motivated by the dot-matrix constructions for sequence comparison and optimal oligonucleotide probe selection routinely used in molecular biology. In the case q = m the problem coincides with the classical approximate string matching with k mismatches problem. We present a new approach to this problem based on multiple hashing, which may have advantages over some sophisticated and theoretically efficient methods that have been proposed. This paper describes a two-stage process. The first stage (multiple filtration) uses a new technique to preselect roughly similar m-tuples. The second stage compares these m-tuples using an accurate method. We demonstrate the advantages of multiple filtration in comparison with other techniques for approximate pattern matching.
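The problem statement in this abstract can be pinned down with a brute-force baseline. The sketch below simply compares every m-tuple of the text with every m-tuple of the query and keeps the pairs with at most k mismatches; it is not the multiple-filtration method, only the "accurate" comparison applied everywhere, and the function name `mismatch_pairs` is hypothetical.

```python
def mismatch_pairs(text, query, m, k):
    """Return all (i, j) such that text[i:i+m] and query[j:j+m] differ in <= k positions."""
    pairs = []
    for i in range(len(text) - m + 1):
        for j in range(len(query) - m + 1):
            # Hamming distance between the two m-tuples.
            mismatches = sum(a != b for a, b in zip(text[i:i + m], query[j:j + m]))
            if mismatches <= k:
                pairs.append((i, j))
    return pairs


print(mismatch_pairs("ACGTAC", "ACGA", m=4, k=1))
```

This baseline takes time proportional to n * q * m, which is the cost a filtration stage aims to avoid by discarding most candidate pairs before the exact comparison. When q = m there is a single query tuple and the function reduces to classical matching with k mismatches.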
Block Edit Models for Approximate String Matching
Theoretical Computer Science, 1997
Cited by 51 (4 self)
In this paper we examine string block edit distance, in which two strings A and B are compared by extracting collections of substrings and placing them into correspondence. This model accounts for certain phenomena encountered in important real-world applications, including pen computing and molecular biology. The basic problem admits a family of variations depending on whether the strings must be matched in their entireties, and whether overlap is permitted. We show that several variants are NP-complete, and give polynomial-time algorithms for solving the remainder.
Keywords: block edit distance, approximate string matching, sequence comparison, approximate ink matching, dynamic programming.
1 Introduction
The edit distance model for string comparison [Lev66, NW70, WF74] has found widespread application in fields ranging from molecular biology to bird song classification [SK83]. A great deal of research has been devoted to this area, and numerous algorithms have been proposed for com...
A Comparison of Approximate String Matching Algorithms
1991
Cited by 47 (1 self)
Experimental comparison of the running time of approximate string matching algorithms for the k differences problem is presented. Given a pattern string, a text string, and integer k, the task is to find all approximate occurrences of the pattern in the text with at most k differences (insertions, deletions, changes). We consider seven algorithms based on different approaches including dynamic programming, Boyer-Moore string matching, suffix automata, and the distribution of characters. It turns out that none of the algorithms is the best for all values of the problem parameters, and the speed differences between the methods can be considerable.
KEY WORDS: String matching, Edit distance, k differences problem
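As a concrete reference point for the k differences problem, the following sketch uses the classical column-by-column dynamic programming formulation, in which a match may start at any text position. It is only a baseline of the dynamic programming kind and is not claimed to be any of the seven implementations compared in the paper; the function name `k_differences` is hypothetical. It reports the 1-based text positions where an approximate occurrence ends.

```python
def k_differences(pattern, text, k):
    """Return 1-based end positions in text of occurrences with at most k differences."""
    m = len(pattern)
    # col[i] = fewest edits turning pattern[:i] into some substring ending at the
    # current text position; before any text is read this is i deletions.
    col = list(range(m + 1))
    ends = []
    for j, tc in enumerate(text, start=1):
        prev_diag = col[0]          # value from the previous column, row i-1
        col[0] = 0                  # an occurrence may start anywhere in the text
        for i in range(1, m + 1):
            old = col[i]
            col[i] = min(
                prev_diag + (pattern[i - 1] != tc),  # match or change
                old + 1,                             # extra text symbol (insertion)
                col[i - 1] + 1,                      # missing pattern symbol (deletion)
            )
            prev_diag = old
        if col[m] <= k:
            ends.append(j)
    return ends


print(k_differences("abc", "xxabcxx", 0))
print(k_differences("abc", "abd", 1))
```

The running time is proportional to the pattern length times the text length regardless of k, which is why faster filtering and automaton-based approaches are worth comparing against it.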
Approximate String Matching: A Simpler Faster Algorithm
SIAM J. Comput., 1998
Cited by 44 (1 self)
We give two algorithms for finding all approximate matches of a pattern in a text, where the edit distance between the pattern and the matching text substring is at most k. The first algorithm, which is quite simple, runs in time O(nk^3/m + n + m) on all patterns except k-break periodic strings (defined later). The second algorithm runs in time O(nk^4/m + n + m) on k-break periodic patterns. The two classes of patterns are easily distinguished in O(m) time.