27 citations found.
R. Baeza-Yates and C. Perleberg. Fast and practical approximate pattern matching. In Proc. CPM'92, pages 185--192, 1992. LNCS 644.

This paper is cited in the following contexts:

Fast and Flexible String Matching by Combining.. - Navarro, Raffinot (1998)   (3 citations)

....distance. 8.2 Edit Distance. The Levenshtein distance (or just edit distance) between two words is the minimal number of substitutions, insertions and deletions of letters needed to make them equal. For instance, d(survey, surgery) = 2. A number of solutions to this problem exist, [6, 5, 19, 16, 20, 29] being the fastest in practice. We present two extensions of our algorithm for approximate string matching. 8.2.1 Partitioning into Exact Searching. In [28] a simple but very effective filter is proposed for approximate string matching. It is based on the observation that if a pattern of length m ....
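
The definition quoted above can be checked with a short sketch. This is the textbook dynamic programming recurrence, not code from any of the cited papers; the function name edit_distance is chosen here for illustration only.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the classical dynamic programming recurrence."""
    # prev[j] is the distance between the prefix of a read so far and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string costs i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute, or match for free
        prev = curr
    return prev[-1]


print(edit_distance("survey", "surgery"))  # prints 2
```

Only two rows of the matrix are kept, so the sketch uses O(|b|) space and O(|a| |b|) time.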

....[Figure 10: Our NFA to recognize suffixes of the reversed pattern.] Later, [6] proposed the use of a multipattern Boyer-Moore strategy to perform the above search, which at the cost of not allowing limited expressions gives a much more efficient algorithm. In [5] this algorithm was implemented and shown to be the fastest in practice when the number of errors is low enough ....

[Article contains additional citation context not shown here]

R. Baeza-Yates and C. Perleberg. Fast and practical approximate pattern matching. In Proc. CPM'92, pages 185--192. Springer-Verlag, 1992. LNCS 644.


A Partial Deterministic Automaton for Approximate String Matching - Navarro (1997)   (2 citations)

....on the active values. It was conjectured (and later proved in [6, 4]) that the average number of active values is O(k). By working only on the active values, O(kn) average time is achieved. Other approaches attempt to filter the text, reducing the area in which dynamic programming needs to be used [21, 20, 19, 7, 8, 11, 5, 18]. These algorithms achieve sublinear expected time in many cases (O(kn log_σ m / m) is a typical figure) for moderate k/m ratios, but the filtration is not effective for larger ratios. Yet other approaches use bit-parallelism [1, 27] to reduce the number of operations [24, 28]. In [28] O(kn/w) ....

....Because of its match reporting policy and its options, it is hard to compare fairly with the other algorithms, but we include it as a reference point. Baeza-Yates Navarro [4] parallelizes the NFA by diagonals using bits of the computer word. The code is from the authors. Baeza-Yates Perleberg [5] is a filter based on partitioning the pattern plus exact search of the pieces. The code is ours. Chang Lampe [6] is the algorithm kn.clp, which computes only the places where the value of the dynamic programming matrix does not change along each column. The code is from the author. ....

R. Baeza-Yates and C. Perleberg. Fast and practical approximate pattern matching. In Z. Galil A. Apostolico, M. Crochemore and U. Manber, editors, Proc. CPM'92, pages 185--192. Springer-Verlag, 1992. Lecture Notes in Computer Science 644.


An Algorithm for Estimating all Matches Between Two Strings - Atallah, Chyzak, Dumas

....model in which the convolution of two M-length vectors can be done in O(M log M) time. The algorithm of Baeza-Yates and Gonnet [6] solves the problem in O(NM log M / log N) time, which is better than O(N log M) for small M, i.e. if M < log N. The algorithm of Baeza-Yates and Perleberg [7] solves the problem in average time O(NM/σ), where σ is the size of the alphabet (i.e. the number of distinct symbols that appear in M), which is good for large σ. The interest in the vector C is usually motivated by the need to find all positions in the text at which the pattern almost ....

R.A. Baeza-Yates and C.H. Perleberg, "Fast and Practical Approximate Pattern Matching," Information Processing Letters, 59, 1996, pp. 21--27.


A Bit-parallel Approach to Suffix Automata: Fast Extended.. - Navarro, Raffinot (1998)   (4 citations)

....It is based on dividing the pattern in k + 1 pieces and searching all the pieces in parallel. Since k errors cannot destroy the k + 1 pieces, some of the pieces must appear with no errors close to each occurrence. They use the multipattern search algorithm mentioned in the previous paragraph. In [4, 3], a multipattern Boyer-Moore strategy is preferred, which is faster but does not handle classes of characters and other extensions. This algorithm is the fastest one for low error levels. Our multipattern search technique presented in the previous section combines the best of both worlds: our ....

....is not five fold but two or three fold because of shorter shifts) and are more efficient than the proposal of [22] provided σ 8. Finally, we show the performance of our multipattern algorithm when used for approximate string matching. We include the fastest known algorithms in the comparison [4, 3, 7, 13, 16, 23, 22]. We compare those algorithms against our version of [4] (where the Sunday algorithm is replaced by our BNDM), while we consider [22] not as the bit-parallel algorithm presented there but their other proposal, namely reduction to exact searching using their algorithm Multi WM for multipattern ....

[Article contains additional citation context not shown here]

R. Baeza-Yates and C. Perleberg. Fast and practical approximate pattern matching. In Proc. CPM'92, pages 185--192. Springer-Verlag, 1992. LNCS 644.


An Algorithm for Estimating all Matches Between Two Strings - Atallah, Chyzak, Dumas

....model in which the convolution of two M-length vectors can be done in O(M log M) time. The algorithm of Baeza-Yates and Gonnet [5] solves the problem in O(NM log M / log N) time, which is better than O(N log M) for small M, i.e. if M < log N. The algorithm of Baeza-Yates and Perleberg [6] solves the problem in average time O(NM/σ), where σ is the size of the alphabet (i.e. the number of distinct symbols that appear in M), which is good [Figure 1: The pattern is slid along the text and for ....]

R.A. Baeza-Yates and C.H. Perleberg, "Fast and Practical Approximate Pattern Matching," Information Processing Letters, 59, 1996, pp. 21--27.


Improved Approximate Pattern Matching on Hypertext - Navarro (1998)   (2 citations)

....patt) k. The classical solution is O(mn) time and involves dynamic programming [11]. This solution is the most flexible to allow different distance functions. For the particular case of ed(), a number of algorithms have been presented to improve the worst case to O(kn) or the average case, e.g. [8, 13, 5, 12, 4, 14, 15, 3]. Pattern matching on hypertext [6] has been considered only recently. The model is that the text forms a graph of N nodes and E edges, where a string is stored inside each node, and the edges indicate alternative texts that may follow the current node. The pattern is still a simple string of ....

R. Baeza-Yates and C. Perleberg. Fast and practical approximate pattern matching. In Proc. CPM'92, LNCS 644, pages 185--192, 1992.


Multiple Approximate String Matching by Counting - Navarro (1997)   (5 citations)

....[7, 11, 17, 5]. These algorithms normally achieve O(kn) time complexity in the worst or the average case. An exception is [5], which achieves O(kn/√σ) on average. Those that filter the text, quickly leaving out most of the text and verifying only the areas that seem interesting, e.g. [16, 15, 14, 6, 4]. They achieve sublinear expected time in many cases (e.g. O(kn log_σ m / m)) for small α. However, they tend to be practical only for m not too small and low error ratios. Some exceptions are [4, 9], which are O(n) for small α even when m is small. Those that parallelize the computation of ....

....leaving out most of the text and verifying only the areas that seem interesting, e.g. [16, 15, 14, 6, 4]. They achieve sublinear expected time in many cases (e.g. O(kn log_σ m / m)) for small α. However, they tend to be practical only for m not too small and low error ratios. Some exceptions are [4, 9], which are O(n) for small α even when m is small. Those that parallelize the computation of a classical algorithm in the bits of computer words [21, 19, 22, 2, 1]. We call w the number of bits in the computer word, which is assumed to be Θ(log n). These algorithms normally obtain a ....

[Article contains additional citation context not shown here]

R. Baeza-Yates and C. Perleberg. Fast and practical approximate pattern matching. In Proc. CPM'92, pages 185--192, 1992. LNCS 644.


Large Text Searching Allowing Errors - Araujo, Navarro, Ziviani (1997)   (4 citations)

....The case k = 0 corresponds to the classical exact matching problem. The classical solution for approximate searching is O(mn) time, where m is the size of the pattern and n is the size of the text [Sel80]. Since the beginning of the eighties there is a long list of papers on the subject, where [BYG92, WM92, CL92, ST95, BYP92, WMM96, BYN96a, Nav97] is a partial list of the most recent ones. From the practical point of view an important new paradigm called bit-parallelism was developed by Baeza-Yates and Gonnet [BYG92]. In their algorithm the state of the search is represented as a number and only bitwise logical operations, shifts and ....
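
To make the idea of "the state of the search represented as a number" concrete, here is a minimal sketch of exact matching in the Shift-And style, a common variant of the bit-parallel approach. It is not the code of [BYG92]; the name shift_and and the use of Python's unbounded integers in place of a fixed-width machine word are assumptions made for illustration.

```python
def shift_and(pattern: str, text: str) -> list[int]:
    """Exact matching; returns the 0-based start positions of pattern in text."""
    m = len(pattern)
    if m == 0:
        return []
    # masks[c] has bit i set iff pattern[i] == c.
    masks: dict[str, int] = {}
    for i, c in enumerate(pattern):
        masks[c] = masks.get(c, 0) | (1 << i)
    accept = 1 << (m - 1)
    state = 0  # bit i set iff pattern[:i + 1] is a suffix of the text read so far
    hits = []
    for pos, c in enumerate(text):
        # Extend every live prefix by one character and start a new one at bit 0.
        state = ((state << 1) | 1) & masks.get(c, 0)
        if state & accept:
            hits.append(pos - m + 1)
    return hits


print(shift_and("aba", "ababa"))  # prints [0, 2]
```

Each text character costs one shift, one OR and one AND, independent of m as long as the pattern fits in a machine word.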

R. Baeza-Yates and C. Perleberg. Fast and practical approximate pattern matching. In Proc. CPM'92, Springer-Verlag LNCS, v. 644, pages 185--192, 1992.


New and Faster Filters for Multiple Approximate String Matching - Baeza-Yates, Navarro   Self-citation (Baeza-yates)

....Then we present the three new techniques. In Section 3 we present automaton superimposition, which extends a bit-parallel simulation of a nondeterministic finite automaton (NFA) [6]. In Section 4 we present exact partitioning, that extends a filter based on exact searching of pattern pieces [7, 6, 24]. In Section 5 we present counting, based on counting pattern letters in a text window [14]. In Section 6 we analyze our algorithms and in Section 7 we compare them experimentally. Finally, in Section 8 we give our conclusions. Some detailed analyses are left for Appendices A and B. Although ....

....4 Partitioning into Exact Searching. This technique (called exact partitioning for short) is based on a single-pattern filter which reduces the problem of approximate searching to a problem of multipattern exact searching. The algorithm first appeared in [34] and was later improved in [7, 6, 24]. We first present the single-pattern version and then our extension to multiple patterns. 4.1 A Filter Based on Exact Searching. A particular case of Lemma 1 shows that if a pattern matches a text position with k errors, and we split the pattern in k + 1 pieces, then at least one of the pieces must ....

R. Baeza-Yates and C. Perleberg. Fast and practical approximate pattern matching. In Proc. CPM'92, pages 185--192, 1992. LNCS 644.


Fast Multi-Dimensional Approximate Pattern Matching - Navarro, Baeza-Yates (1998)   (1 citation)  Self-citation (Baeza-yates)

....dynamic programming algorithm [18] to search a pattern in a text allowing errors uses dynamic programming and is O(mn) time and O(m) space. This solution was later improved by a number of algorithms, which we do not cover here. The only one of interest to this work is a filtering algorithm [19, 7, 5]. It states that if a pattern is cut in k + 1 pieces, then any occurrence with up to k errors must contain one of the pieces unchanged. This is obvious since k errors cannot alter the k + 1 pieces given the edit operations that we consider (which cannot alter two pieces at the same time). The ....
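
The filtering idea described above can be sketched as follows. This is an illustration under stated assumptions, not the implementation from the cited paper: the helper names (pieces, matches_in, filter_search) are hypothetical, Python's str.find stands in for the multipattern Boyer-Moore search mentioned in other contexts on this page, and verification uses the classical O(mn) dynamic programming.

```python
def pieces(pattern: str, k: int) -> list[tuple[int, str]]:
    """Split pattern into k + 1 contiguous pieces; returns (offset, piece) pairs."""
    m, parts = len(pattern), k + 1
    if parts > m:
        raise ValueError("need at least one character per piece (k < m)")
    bounds = [i * m // parts for i in range(parts + 1)]
    return [(bounds[i], pattern[bounds[i]:bounds[i + 1]]) for i in range(parts)]


def matches_in(pattern: str, window: str, k: int) -> bool:
    """Classical dynamic programming: does pattern occur in window with <= k errors?"""
    col = list(range(len(pattern) + 1))
    for tc in window:
        new = [0]  # an occurrence may start at any window position
        for i, pc in enumerate(pattern, start=1):
            new.append(min(col[i] + 1, new[i - 1] + 1, col[i - 1] + (pc != tc)))
        col = new
        if col[-1] <= k:
            return True
    return False


def filter_search(pattern: str, text: str, k: int) -> list[tuple[int, int]]:
    """Return the verified text windows [lo, hi) that contain an occurrence."""
    m = len(pattern)
    found = set()
    for offset, piece in pieces(pattern, k):
        start = text.find(piece)
        while start != -1:
            # An occurrence containing this piece unchanged lies inside this window.
            lo = max(0, start - offset - k)
            hi = min(len(text), start - offset + m + k)
            if matches_in(pattern, text[lo:hi], k):
                found.add((lo, hi))
            start = text.find(piece, start + 1)
    return sorted(found)


print(filter_search("survey", "the surgery went well", 2))  # prints [(2, 12)]
```

With k = 2 the pattern "survey" is cut into "su", "rv" and "ey"; only "su" occurs in the text, and the window around it is the only area passed to the expensive verification.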

R. Baeza-Yates and C. Perleberg. Fast and practical approximate pattern matching. In Proc. CPM'92, LNCS 644, pages 185--192, 1992.


Very Fast and Simple Approximate String Matching - Navarro, Baeza-Yates (1998)   (4 citations)  Self-citation (Baeza-yates)

....classical solution, involving dynamic programming, is O(mn) time [8]. In the last years many algorithms have been presented that improve the classical algorithm, for instance [12,5,4,11,3,14,15,6,2]. For low error levels, the fastest known algorithm is [3]. It is a filtration algorithm, i.e. it quickly discards large text areas that cannot contain a match and later verifies suspicious areas using a classical algorithm. As all filtration algorithms, it ceases to work well for moderate error .... [Footnote: This work has been supported in part by FONDECYT grant 1950622. Page footer: Preprint submitted to Elsevier Preprint, 8 April 1998.]

....last years many algorithms have been presented that improve the classical algorithm, for instance [12,5,4,11,3,14,15,6,2]. For low error levels, the fastest known algorithm is [3]. It is a filtration algorithm, i.e. it quickly discards large text areas that cannot contain a match and later verifies suspicious areas using a classical algorithm. As all filtration algorithms, it ceases to work well for moderate error levels. In this paper we use a new technique to verify ....

[Article contains additional citation context not shown here]

R. Baeza-Yates and C. Perleberg. Fast and practical approximate pattern matching. Information Processing Letters, 59:21--27, 1996.


Fast Approximate String Matching in a Dictionary - Baeza-Yates, Navarro (1998)   Self-citation (Baeza-yates)

....used in deterministic or nondeterministic form [35, 4, 21]. Another trend is that of filtration algorithms: a fast filter is run over the text quickly discarding uninteresting parts. The interesting parts are later verified with a more expensive algorithm. Examples of filtration approaches are [29, 6]. Some are sublinear in the sense that they do not inspect all the text characters, but the on-line problem is Ω(n) if m is taken as constant. If the text is large and has to be searched frequently, even the fastest on-line algorithms are not practical, and preprocessing the text ....

....of the dictionary is traversed (the percentage is decreasing since the number of comparisons are sublinear except for FQH). The user times correspond quite well to the number of comparisons. We show the percentage of user times using the structures versus the best online algorithm for this case [6]. As it can be seen, for the maximum dictionary size we reach 40% of the online time for many metric structures (this percentage will improve for BKT and FQT in larger dictionaries). From those structures, we believe that FQT and BKT with b = 1 are the best choices, since they are sublinear and ....

R. Baeza-Yates and C. Perleberg. Fast and practical approximate pattern matching. In Proc. CPM'92, pages 185--192, 1992. LNCS 644.


A Faster Algorithm for Approximate String Matching - Baeza-Yates, Navarro (1996)   (2 citations)  Self-citation (Baeza-yates)

....which dynamic programming needs to be used [18, 19, 17, 16, 7, 8]. These algorithms achieve sublinear expected time in many cases (O(kn log_σ m / m) is a typical figure) for moderate k/m ratios, but the filtration is not effective for larger ratios. A simple and fast filtering technique is shown in [5], which yields an O(n) algorithm for moderate k/m ratios. Yet other approaches use bit-parallelism [2, 25] in a RAM machine of word length O(log n) to reduce the number of operations. [24] achieves O(kmn / log n) time, which is competitive for patterns of length O(log n). [22] packs the cells ....

....1/log n), otherwise it is O(√(mk / log n) n) (i.e. O(√k n) for m = O(log n), else O(kn)). It involves also a cost to verify potential matches, which is shown to be not significant for α < α_1 = 1 - m^(1/√(log n)) / √σ. This algorithm is a generalization of an earlier heuristic [23, 5], that reduces the problem to subproblems of exact matching and is shown to be O(n) for α < α_0 = 1/(3 log_σ m). The second one partitions the automaton in sub-automata, being O(k² n / (√σ log n)) on average. For α > 1 - 1/√σ, its worst case, O((m - k)kn / log n) ....

[Article contains additional citation context not shown here]

R. Baeza-Yates and C. Perleberg. Fast and practical approximate pattern matching. In Proc. CPM'92, pages 185--192. Springer-Verlag, 1992. LNCS 644.


Improving an Algorithm for Approximate Pattern Matching - Navarro, Baeza-Yates (1998)   (6 citations)  Self-citation (Baeza-yates)

.... of the properties of the dynamic programming matrix (e.g. values in neighbor cells differ at most in one). The best average complexity achieved under this approach is O(kn/√σ) [8]. Other approaches attempt to filter the text, reducing the area in which dynamic programming needs to be used [26, 30, 25, 24, 9, 10, 18, 7, 22]. The filtration is based on the fact that some portions of the pattern must appear with no errors even in an approximate occurrence. These algorithms achieve sublinear expected time in many cases for low error ratios (i.e. not all text characters are read, O(kn log_σ m / m) is a typical figure) ....

....lazy form. The algorithm is proposed in [13] and studied more in detail in [19], whose implementation we use. EP (exact partitioning) is the filtering algorithm proposed in [33], which splits the pattern in k + 1 pieces and searches them using a Boyer-Moore multipattern algorithm, as suggested in [7]. The code is ours and uses an extension of the Sunday [23] algorithm. Four Russians applies a Four Russians technique to pack many automaton transitions in computer words. The code is from the authors [34] and is used with r = 5 as suggested in their paper (r is related with the size of the ....

R. Baeza-Yates and C. Perleberg. Fast and practical approximate pattern matching. Information Processing Letters, 59:21--27, 1996.


A New Indexing Method for Approximate String Matching - Navarro, Baeza-Yates (1999)   (4 citations)  Self-citation (Baeza-yates)

....needed to make them equal. We define the error level as α = k/m. In the on-line version of the problem, the pattern can be preprocessed but the text cannot. The classical solution uses dynamic programming and is O(mn) time [27, 28]. A number of algorithms improved later this result [34, 20, 16, 11, 35, 32, 12, 30, 36, 9, 8, 24]. The lower bound of the on-line problem (proved and reached in [12]) is O(n(k + log_σ m)/m), which is of course Ω(n) for constant m. If the text is large even the fastest on-line algorithms are not practical, and preprocessing the text becomes necessary. However, just a few years ago, ....

....on the pattern partitioning technique of [23, 8]. The core of the idea is that, if a pattern of length m occurs with k errors and we split the pattern in j parts, then at least one part will appear with ⌊k/j⌋ errors inside the occurrence. In fact, the case j = k + 1 is the basis for the algorithm [9] and the q-gram index [7]. The new algorithm follows. We evenly divide the pattern in j pieces (j is unspecified by now). Then we search in the suffix tree the j pieces with ⌊k/j⌋ errors using the algorithm of Section 2.1. For each match found ending at text position i we check the text area [i ....

R. Baeza-Yates and C. Perleberg. Fast and practical approximate pattern matching. Information Processing Letters, 59:21--27, 1996.


A Unified View to String Matching Algorithms - Baeza-Yates (1996)   (1 citation)  Self-citation (Baeza-yates)

....in several words. For approximate string matching, another possibility is to divide the pattern in pieces of length m= where up to k= errors are allowed, searching all pieces at once and checking the whole pattern if a piece is found. This idea is a generalization of techniques presented in [WM92, BYP92, Mye94] and together with automata partition and other ideas is used in [BYN96b, BYN96a] to design a fast expected time algorithm for approximate string matching. Acknowledgements We acknowledge the helpful discussions and comments of Ricardo Dahab, Gaston Gonnet, Udi Manber, Borivoj Melichar and ....

R.A. Baeza-Yates and C.H. Perleberg. Fast and practical approximate pattern matching. In A. Apostolico, M. Crochemore, Z. Galil, and U. Manber, editors, Combinatorial Pattern Matching, Lecture Notes in Computer Science 644, pages 185--192, Tucson, AZ, April/May 1992. Springer-Verlag.


Indexing Methods for Approximate Text Retrieval (Extended.. - Baeza-Yates, al.   Self-citation (Baeza-yates)

....short patterns based on bit-parallelism was presented in [5]. Another trend is that of filtration algorithms: a fast filter is run over the text quickly discarding uninteresting parts. The interesting parts are later verified with a more expensive algorithm. Examples of filtration approaches are [10, 35, 33, 8, 29]. Some are sublinear in the sense that they do not inspect all the text characters, but the on-line problem is Ω(n) if m is taken as constant. If the text is large and has to be searched frequently, even the fastest on-line algorithms are not practical, and preprocessing the text ....

....lemma. Its most general form, proved in [5], establishes that if a string matches a pattern, then we can partition the pattern in j pieces as we like and some piece will appear in the string with at most ⌊k/j⌋ errors. The particular case of j = k + 1 (reducing to exact matching) was proved before [43, 8]. Another particular case appears in [28]. 3 Word-Oriented Indexes. 3.1 Partial Inversion plus Sequential Search. The first proposal for a word-oriented index was due to Manber and Wu [25]. In a very practical approach, they propose a scheme based on a modified inverted file and sequential ....
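
The partitioning lemma quoted above follows from a counting argument, sketched here. The notation e_i (the number of errors that fall inside piece i) is introduced for illustration and does not come from the cited papers; the argument assumes, as the contexts on this page state, that each edit operation alters only one piece.

```latex
% An occurrence with at most k errors, pattern cut into j pieces,
% e_i = number of errors attributed to piece i:
\[
  \sum_{i=1}^{j} e_i \le k .
\]
% Suppose every piece needed more than \lfloor k/j \rfloor errors:
\[
  e_i \ge \lfloor k/j \rfloor + 1 \ \text{for all } i
  \;\Longrightarrow\;
  \sum_{i=1}^{j} e_i \ge j\,(\lfloor k/j \rfloor + 1) > j \cdot \frac{k}{j} = k ,
\]
% a contradiction, so some piece has e_i \le \lfloor k/j \rfloor.
% For j = k + 1 this gives \lfloor k/(k+1) \rfloor = 0: one piece appears unaltered.
```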

[Article contains additional citation context not shown here]

R. Baeza-Yates and C. Perleberg. Fast and practical approximate pattern matching. In Proc. CPM'92, pages 185--192, 1992. LNCS 644.


A Practical Index for Text Retrieval Allowing Errors - Baeza-Yates, Navarro (1997)   (4 citations)  Self-citation (Baeza-yates)

....occurrences. We call α = k/m the error ratio. In the on-line version of the problem, it is possible to preprocess the pattern but not the text. The classical solution involves dynamic programming and is O(mn) time [20]. Recently, a number of algorithms improved the classical one, for instance [24, 9, 22, 8, 27, 28, 4]. Some of them are sublinear in the sense that they do not inspect all the characters of the text, but of course the on-line problem is Ω(n) if m is taken as constant. In [4, 3] it is shown that [8] is the fastest algorithm for moderately low error ratios and pattern length. Our ....

....a number of algorithms improved the classical one, for instance [24, 9, 22, 8, 27, 28, 4]. Some of them are sublinear in the sense that they do not inspect all the characters of the text, but of course the on-line problem is Ω(n) if m is taken as constant. In [4, 3] it is shown that [8] is the fastest algorithm for moderately low error ratios and pattern length. Our present work can be seen as an off-line version of that algorithm. We are particularly interested in information retrieval (IR), where the text is normally so large that the on-line algorithms are not practical. ....

[Article contains additional citation context not shown here]

R. Baeza-Yates and C. Perleberg. Fast and practical approximate pattern matching. In Proc. CPM'92, pages 185--192, 1992. LNCS 644.


A Practical q-Gram Index for Text Retrieval Allowing Errors - Navarro, Baeza-Yates   Self-citation (Baeza-yates)

....occurrences. We call α = k/m the error ratio. In the on-line version of the problem, it is possible to preprocess the pattern but not the text. The classical solution involves dynamic programming and is O(mn) time [25]. Recently, a number of algorithms improved the classical one, for instance [29, 10, 27, 9, 33, 34, 4, 23]. Some of them are sublinear in the sense that they do not inspect all the characters of the text, but of course the on-line problem is Ω(n) if m is taken as constant. In [4, 3] it is shown that [9] is the fastest algorithm for moderately low error ratios and pattern length. Our ....

....of algorithms improved the classical one, for instance [29, 10, 27, 9, 33, 34, 4, 23]. Some of them are sublinear in the sense that they do not inspect all the characters of the text, but of course the on-line problem is Ω(n) if m is taken as constant. In [4, 3] it is shown that [9] is the fastest algorithm for moderately low error ratios and pattern length. Our present work can be seen as an off-line version of that algorithm. Although our index is applicable to other scenarios, we are particularly .... [Footnote: This work has been supported in part by Fondef grant 96-1064 (Chile).]

[Article contains additional citation context not shown here]

R. Baeza-Yates and C. Perleberg. Fast and practical approximate pattern matching. In Proc. 3rd Combinatorial Pattern Matching (CPM'92), pages 185--192, 1992. LNCS 644.


Text Retrieval: Theory and Practice - Baeza-Yates (1992)   (3 citations)  Self-citation (Baeza-yates)

.... algorithms [HS91]; • fast algorithms for long patterns based on n-grams [KST91]; • several new algorithms for string matching with mismatches, based on the Boyer-Moore approach, which are faster and more practical [BY89b, BYG92, GL89, TU90]; • algorithms based on partitioning the pattern [WM91, BYP91]; • faster practical algorithms for string matching with errors, with O(kn) worst case searching time or better on average, where k is the maximum number of errors allowed [GP90, UW89, TU90, CL90, Ukk91]. Empirical and theoretical comparisons of these algorithms can be found in [CL91b, JTU91] ....

....[Figure 5: Searching example for the counting approach.] This idea is implicit in [MW89], where it is used for string matching with insertions and deletions only. It was independently discovered in [BYP91], which presents the algorithm mentioned above for mismatches. This technique is also very fast and does not use character comparisons. 4 Retrieval Algorithms for Indexed Text. In this section we structure the algorithms according to the data structure used as index for the text. An index is ....

R.A. Baeza-Yates and C.H. Perleberg. Fast and practical approximate pattern matching. Technical report, Dept. of Computer Science, Univ. of Chile, 1991.


Multiple Approximate String Matching - Baeza-Yates, Navarro (1997)   Self-citation (Baeza-yates)

....solution for a single pattern is O(mn) running time [14]. In the last years several algorithms have improved the classical one. Some achieve O(kn) cost by using the properties of the dynamic programming matrix [18, 8, 10, 19, 6]. Others filter the text to quickly eliminate uninteresting parts [17, 16, 7, 13, 5], some of them being sublinear on average for moderate α. Yet other approaches use bit-parallelism [2] to reduce the number of operations [20, 22, 21, 4]. In [21] the search is modeled with a non-deterministic .... [Footnote: This work has been supported in part by FONDECYT grants 1950622 and 1960881.]

....positions, keeping many counters in a single computer word and updating them in a single operation. In this work, we present two new algorithms that are extensions of previous ones to the case of multiple search. In Sections 2 and 3 we explain and extend [4]. In Section 4 we do the same for [5]. In Section 5 and 6 we analyze our algorithms and compare them against [11] and [13]. Although [11] allows to search for many patterns, it is limited to only one error. We allow any number of errors, and improve [11] when the number of patterns is not very large (say, less than 60). We improve ....

[Article contains additional citation context not shown here]

R. Baeza-Yates and C. Perleberg. Fast and practical approximate pattern matching. In Proc. CPM'92, pages 185--192. Springer-Verlag, 1992. LNCS 644.


Fast and Practical Approximate String Matching - Baeza-Yates, Perleberg (1992)   (3 citations)  Self-citation (Baeza-yates Perleberg)

....O(m²) extra space. This result generalizes to any algorithm based in partitioning the pattern, as in [32], and is simpler than Chang and Lawler's algorithm [9]. For a complete set of references on these problems we refer the reader to [14]. A preliminary version of this paper was presented in [7]. 2 String Matching with Mismatches. First let us describe the string matching with k mismatches algorithm using the best (and practical) case of only distinct characters in P. In this case, each character in P has a unique offset from the beginning of P, and every time a character of P is ....
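
The context above is cut off before the counting step, so the following is only a sketch of one plausible reading: for the distinct-characters case, each text character that also occurs in P votes for the single alignment at which it would match, and alignments with at least m - k votes are reported. The function name and the vote-array layout are assumptions made here, not details taken from the paper.

```python
def k_mismatches_distinct(pattern: str, text: str, k: int) -> list[int]:
    """Start positions where pattern matches text with at most k mismatches."""
    m, n = len(pattern), len(text)
    if len(set(pattern)) != m:
        raise ValueError("this sketch assumes all pattern characters are distinct")
    # Each pattern character has a unique offset from the beginning of the pattern.
    offset = {c: j for j, c in enumerate(pattern)}
    # count[s] = number of positions j with text[s + j] == pattern[j]
    count = [0] * max(n - m + 1, 0)
    for i, c in enumerate(text):
        j = offset.get(c)
        if j is not None and 0 <= i - j < len(count):
            count[i - j] += 1  # text[i] agrees with the alignment starting at i - j
    return [s for s, hits in enumerate(count) if hits >= m - k]


print(k_mismatches_distinct("than", "this is an example that", 1))  # prints [19]
```

No pattern and text characters are compared directly; the text is scanned once and each character triggers at most one counter update.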

R.A. Baeza-Yates and C.H. Perleberg. Fast and practical approximate pattern matching. In A. Apostolico, M. Crochemore, Z. Galil, and U. Manber, editors, Combinatorial Pattern Matching, Lecture Notes in Computer Science 644, pages 185--192, Tucson, AZ, April/May 1992. Springer-Verlag.


A Fast Heuristic for Approximate String Matching - Baeza-Yates, Navarro   Self-citation (Baeza-yates)

....These algorithms achieve sublinear expected time in many cases (O(kn log_σ m / m) is a typical figure) for moderate α, but the filtration is not effective for larger ratios. Moreover, they are not practical if m is not very large. An exception is a simple and fast filtering technique shown in [4], which yields an O(n) algorithm for moderate α. Yet other approaches use bit-parallelism [1, 23] in a RAM machine of word length O(log n) to reduce the number of operations. [22] achieves O(kmn / log n) time, which is competitive for patterns of length O(log n). [20] packs the cells differently ....

....string searching algorithm. We use a variation of Boyer-Moore-Horspool-Sunday [13] to drive this search. Although this variant is linear and much faster than the others, it cannot be used past α_0 = 1/(3 log_σ m), where the verifications become too expensive. This is essentially the idea of [4]. We find later a more exact replacement for the theoretical value 3 in the above expression. 3.4 The Combined Algorithm. Our final algorithm uses the simple automaton when it fits in a word, otherwise it uses j = k + 1 for α < α_0, pure pattern partitioning for α_0 < α < α_1, mixed ....

[Article contains additional citation context not shown here]

R. Baeza-Yates and C. Perleberg. Fast and practical approximate pattern matching. In Proc. CPM'92, pages 185--192. Springer-Verlag, 1992. LNCS 644.


A New Approach to Text Searching - Baeza-Yates, Gonnet   Self-citation (Baeza-yates)

....28]. During this time, Wu and Manber extended very nicely the basic algorithm presented in this paper to include approximate string searching [29, 30]. Their paper appears in this issue of CACM. Recently, another technique for string searching which is not based on comparisons has been presented [11]. These results show that the full potential of practical non-comparison-based text searching algorithms is yet to be explored. Acknowledgements. We wish to acknowledge the helpful comments of the anonymous referees. ....

R.A. Baeza-Yates and C.H. Perleberg. Fast and practical approximate pattern matching. In Combinatorial Pattern Matching, pages 182--189, Tucson, AZ, April 1992.


A Faster Algorithm for Approximate String Matching (Extended .. - Baeza-Yates, al. (1996)   (2 citations)  Self-citation (Baeza-yates)

....which dynamic programming needs to be used [15, 18, 14, 13, 7, 8]. These algorithms achieve sublinear expected time in many cases (O(kn log_c m / m) is a typical figure) for moderate k/m ratios, but the filtration is not effective for larger ratios. A simple and fast filtering technique is shown in [5], which yields an O(n) algorithm for moderate k/m ratios. Yet other approaches use bit-parallelism [2] in a RAM machine of word length O(log n) to reduce the number of operations. [21] achieves O(kmn / log n), which is competitive for patterns of length O(log n). [19] packs the cells differently to ....

....1/log n), otherwise it is O(√(mk / log n) n) (i.e. O(√k n) for m = O(log n), else O(kn)). It involves also a cost to verify potential matches, which is shown to be not significant for α < α_1 = 1 - m^(1/√(log n)) / √c. This algorithm is a generalization of an earlier heuristic [20, 5], that reduces the problem to exact matching and is shown to be O(n) for α < α_0 = 1/(3 log_c m), and better than problem partitioning for α < α_0' = 1/(2 log_c m). The second one partitions the automaton into subautomata, being O(k² n / (√c log n)) on average. For α 1 - 1/ ....

[Article contains additional citation context not shown here]

R. Baeza-Yates and C. Perleberg. Fast and practical approximate pattern matching. In Proc. CPM'92, pages 185--192. Springer-Verlag, 1992. LNCS 644.
