Faster string matching with super-alphabets (2002) [1 citation, 0 self-citations]

by Kimmo Fredriksson
In: Proceedings of the 9th International Symposium on String Processing and Information Retrieval (SPIRE'2002), LNCS 2476.
http://www.cs.helsinki.fi/u/kfredrik/pub/papers/spire02.ps

Abstract:

Given a text T of length n and a pattern P of length m over an alphabet of size $\sigma$, finding the exact occurrences of P in T requires at least $\Omega(n \log_\sigma m / m)$ character comparisons on average, as shown in [19]. Consequently, it is believed that this lower bound also implies an $\Omega(n \log_\sigma m / m)$ lower bound for the execution time of an optimal algorithm. However, in this paper we show how to obtain an $O(n/m)$ average time algorithm. This is achieved by slightly changing the model of computation and by modifying an existing algorithm. Our technique uses a super-alphabet for simulating the suffix automaton. The space usage of the algorithm is $O(m)$. The technique can be applied to many other string matching algorithms, including dictionary matching, which is also solved in expected time $O(n/m)$, and approximate matching allowing k edit operations (mismatches, insertions or deletions of characters). The latter is solved in expected time $O(nk/m)$ for $k = O(m / \log m)$. The known lower bound for this problem is $\Omega(n(k + \log_\sigma m)/m)$, given in [6]. Finally, we show how to adapt a similar technique to the shift-or algorithm, extending its bit-parallelism in another direction. This gives a speed-up by a factor s, where s is the number of characters processed simultaneously. Some of the algorithms have been implemented, and we show that the methods work well in practice too. This is especially true for the shift-or algorithm, which in some cases works faster than predicted by the theory. The result is the fastest known algorithm for exact string matching for short patterns and small alphabets. All the methods and analyses assume the RAM model of computation and that each symbol is coded in $b = \lceil \log_2 \sigma \rceil$ bits. They work for larger b too, but the speed-up decreases.
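The shift-or extension mentioned in the abstract can be illustrated with a small Python sketch. As in the abstract, each symbol is assumed to be coded in $b = \lceil \log_2 \sigma \rceil$ bits. Here s consecutive symbols are packed into one super-character $C = c_1 \dots c_s$, and a table entry $B_s[C] = \bigvee_{k=1}^{s} (B[c_k] \ll (s-k))$ folds s ordinary shift-or steps into the single update $D \leftarrow (D \ll s) \mid B_s[C]$; occurrences ending inside the block are then read from bits $m-1, \dots, m+s-2$ of D. The function name `shift_or_super`, the table layout, and the use of Python's unbounded integers in place of a machine word are illustrative assumptions, not the paper's implementation; a C version would need $m + s - 1 \le w$ bits per word.

```python
from itertools import product


def shift_or_super(text, pattern, alphabet, s=2):
    """Return start positions of exact occurrences of `pattern` in `text`,
    reading s characters per step through a super-alphabet table.
    Hypothetical sketch; every character must belong to `alphabet`."""
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    sigma = len(alphabet)
    code = {c: i for i, c in enumerate(alphabet)}
    b = max(1, (sigma - 1).bit_length())  # b = ceil(log2 sigma) bits per symbol
    width = m + s - 1                      # room for s pending match bits
    full = (1 << width) - 1                # all ones: no active partial match

    # Ordinary shift-or masks: bit i is 0 iff pattern[i] == c.
    # Bits >= m stay 0 so that match bits shifted past m-1 are not overwritten.
    B = [(1 << m) - 1] * sigma
    for i, c in enumerate(pattern):
        B[code[c]] &= ~(1 << i)

    # Super-character masks: BS[C] = OR over k of B[c_k] << (s - k),
    # i.e. s ordinary shift-or steps folded into one table entry.
    BS = [full] * (1 << (b * s))
    for chars in product(range(sigma), repeat=s):
        C, mask = 0, 0
        for k, x in enumerate(chars, start=1):
            C = (C << b) | x
            mask |= B[x] << (s - k)
        BS[C] = mask

    matches, D, pos, n = [], full, 0, len(text)
    while pos + s <= n:
        C = 0
        for ch in text[pos:pos + s]:
            C = (C << b) | code[ch]  # pack s symbols into one super-character
        D = ((D << s) | BS[C]) & full
        # Bit m-1+j is 0 iff an occurrence ends at text[pos + s - 1 - j].
        for j in range(s - 1, -1, -1):
            if not (D >> (m - 1 + j)) & 1:
                end = pos + s - 1 - j
                matches.append(end - m + 1)
        pos += s

    # The last n mod s characters use ordinary one-character shift-or steps.
    for end in range(pos, n):
        D = ((D << 1) | B[code[text[end]]]) & full
        if not (D >> (m - 1)) & 1:
            matches.append(end - m + 1)
    return matches


print(shift_or_super("ACGTACGTAC", "ACG", "ACGT", s=2))  # [0, 4]
```

The sketch trades memory for fewer state updates: the table has $2^{b s}$ entries, so for DNA ($\sigma = 4$, $b = 2$) and $s = 4$ it holds 256 masks.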

Citations

447 Fast pattern matching in strings – Knuth, Morris, et al. - 1977
377 A fast string searching algorithm – Boyer, Moore - 1977
328 Efficient string matching: An aid to bibliographic search – Aho, Corasick - 1975
165 A new approach to text searching – Baeza-Yates, Gonnet - 1992
119 A faster algorithm for computing string edit distances – Masek, Paterson - 1980
72 Speeding up two string matching algorithms – Crochemore, Czumaj, et al. - 1994
60 Practical fast searching in strings – Horspool - 1980
54 A method for the construction of minimum redundancy codes – Huffman - 1952
31 Boyer-Moore string matching over Ziv-Lempel compressed text – Navarro, Raffinot - 2000
8 Approximate string matching with local similarity – Chang, Marr - 1994
8 Speeding up the pattern matching machine for compressed texts – Miyazaki, Fukamachi, et al. - 1998
7 String searching algorithms revisited – Baeza-Yates - 1989
7 Fast practical multi-pattern matching – Crochemore, Czumaj, et al. - 1999
6 Improved string searching – Baeza-Yates - 1989
5 Fast and flexible word searching on compressed text – Moura, Navarro, et al. - 2000
5 A bit-parallel approach to suffix automata: Fast extended string matching – Navarro, Raffinot - 1998
3 Tuning string matching for huge pattern sets – Kytöjoki, Salmela, et al. - 2003
3 Improved string searching – Baeza-Yates - 1989
1 String matching in the DNA alphabet – Tarhio, Peltola - 1997