Speeding Up Pattern Matching By Text Compression
, 2000
Cited by 22 (8 self)
Pattern matching is one of the most fundamental operations in string processing. Recently, a new trend for accelerating pattern matching has emerged: speeding up pattern matching by text compression. By the traditional criteria for data compression, i.e., compression ratio and compression/decompression time, adaptive dictionary methods such as the Lempel-Ziv family are often preferred. However, such methods cannot speed up pattern matching, since extra work is needed to keep track of the compression mechanism. We have to re-examine existing compression methods or develop a new method in the light of the new criterion: efficiency of pattern matching in compressed text. Byte pair encoding (BPE) is a simple universal text compression scheme. Decompression is fast and requires little work space. Moreover, it is easy to decompress an arbitrary part of the original text. However, it has not been so popular since the compression is slow and the compression ratio is not as good as...
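As a rough illustration of the byte pair encoding scheme mentioned in this abstract, the following Python sketch repeatedly replaces the most frequent pair of adjacent bytes with a byte value that does not occur in the input. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names bpe_compress and bpe_decompress are hypothetical, the rule table is returned separately rather than packed into the output, and no attempt is made at the pattern-matching machinery the paper is actually about.

```python
from collections import Counter


def bpe_compress(data: bytes):
    """Return (compressed bytes, rules), where rules maps a new byte to a pair."""
    seq = list(data)
    used = set(seq)
    # Byte values absent from the input can serve as fresh symbols.
    free = [b for b in range(256) if b not in used]
    rules = {}
    while free:
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break  # no adjacent pair repeats, nothing left to gain
        sym = free.pop()
        rules[sym] = pair
        out, i = [], 0
        while i < len(seq):
            # Replace non-overlapping occurrences of the pair, left to right.
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(sym)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return bytes(seq), rules


def bpe_decompress(comp: bytes, rules) -> bytes:
    """Expand every substituted byte back into its pair, recursively."""
    out = []
    stack = list(reversed(comp))
    while stack:
        b = stack.pop()
        if b in rules:
            left, right = rules[b]
            stack.append(right)
            stack.append(left)
        else:
            out.append(b)
    return bytes(out)
```

Because each compressed byte expands independently of its neighbours, calling bpe_decompress on any slice of the compressed data yields a contiguous piece of the original text, which is the property the abstract refers to when it says an arbitrary part is easy to decompress.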
The Zooming Method: A Recursive Approach to Time-Space Efficient String-Matching
- Comput. Sci
, 1995
Cited by 7 (5 self)
A new approach to time-space efficient string-matching is presented. The method is flexible: its implementation depends on whether or not the alphabet is linearly ordered. The only known linear-time constant-space algorithm for string-matching over nonordered alphabets is the Galil-Seiferas algorithm, see [8, 6], which is rather complicated. The zooming method gives probably the simplest string-matching algorithm working in constant space and linear time for nonordered alphabets. The novel feature of our algorithm is the application of the searching phase (which is usually simpler than preprocessing) in the preprocessing phase. The preprocessing has a recursive structure similar to selection in linear time, see [1]. For ordered alphabets the preprocessing part is much simpler; its basic component is a simple and well-known algorithm for finding the maximal suffix, see [7]. Hence we demonstrate a new application of this algorithm, see also [5]. The idea of the zooming method was applied in [...
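For readers unfamiliar with the maximal-suffix computation this abstract relies on for ordered alphabets, here is one standard linear-time, constant-space formulation in Python. It is offered as a sketch only: the abstract does not say which variant reference [7] uses, the function name maximal_suffix is hypothetical, and the alphabet order is assumed to be Python's default character order.

```python
def maximal_suffix(x: str):
    """Return (start, period) of the lexicographically maximal suffix of x.

    Uses a constant number of integer variables and a single left-to-right scan.
    """
    n = len(x)
    ms = -1   # the current candidate suffix starts at ms + 1
    j = 0     # the competing suffix starts at j + 1
    k = 1     # number of characters currently being compared
    p = 1     # period of the current candidate
    while j + k < n:
        a = x[j + k]
        b = x[ms + k]
        if a < b:
            # The competitor loses; the candidate's period grows.
            j += k
            k = 1
            p = j - ms
        elif a == b:
            if k != p:
                k += 1
            else:
                j += p
                k = 1
        else:
            # The competitor wins and becomes the new candidate.
            ms = j
            j = ms + 1
            k = 1
            p = 1
    return ms + 1, p
```

For example, maximal_suffix("banana") returns (2, 2): the maximal suffix is "nana", which has period 2.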
Constant-Space String Matching with Smaller Number of Comparisons: Sequential Sampling
, 1995
Cited by 6 (3 self)
A new string-matching algorithm working in constant space and linear time is presented. It is based on a powerful idea of sampling, originally introduced in parallel computations. The algorithm uses a sample S which consists of two positions inside the pattern P. First the positions of the sample S are tested against the corresponding positions of the text T, then a version of the Knuth-Morris-Pratt algorithm is applied. This gives the simplest known string-matching algorithm which works in constant space and linear time and which does not use any linear order of the alphabet. A refined version of the algorithm gives the fastest (in the sense of number of comparisons) known algorithm for string-matching in constant space. It makes (1 + ε)n + O(n/m) symbol comparisons. This improves substantially the result of [3], where a (3/2 + ε)n comparisons constant-space algorithm was designed. 1 Introduction Assume we are given two strings: a pattern P of length m and a text T of length n. The ...
On the Comparison Complexity of the String Prefix-Matching Problem
- In Proc. 2nd European Symposium on Algorithms, number 855 in Lecture Notes in Computer Science
, 1995
Cited by 6 (0 self)
In this paper we study the exact comparison complexity of the string prefix-matching problem in the deterministic sequential comparison model with equality tests. We derive almost tight lower and upper bounds on the number of symbol comparisons required in the worst case by on-line prefix-matching algorithms for any fixed pattern and variable text. Unlike previous results on the comparison complexity of string-matching and prefix-matching algorithms, our bounds are almost tight for any particular pattern. We also consider the special case where the pattern and the text are the same string. This problem, which we call the string self-prefix problem, is similar to the pattern preprocessing step of the Knuth-Morris-Pratt string-matching algorithm that is used in several comparison efficient string-matching and prefix-matching algorithms, including in our new algorithm. We obtain roughly tight lower and upper bounds on the number of symbol comparisons required in the worst case...
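The Knuth-Morris-Pratt preprocessing step that this abstract compares the self-prefix problem to can be sketched as follows. This is the textbook failure-function computation, shown only to make the comparison concrete; it is not the paper's algorithm and makes no claim about its comparison bounds. The function name failure_function is hypothetical.

```python
def failure_function(p: str):
    """fail[i] = length of the longest proper prefix of p[:i+1]
    that is also a suffix of p[:i+1]."""
    fail = [0] * len(p)
    k = 0  # length of the border currently being extended
    for i in range(1, len(p)):
        # Fall back to shorter borders until p[i] extends one of them.
        while k > 0 and p[i] != p[k]:
            k = fail[k - 1]
        if p[i] == p[k]:
            k += 1
        fail[i] = k
    return fail
```

Each symbol comparison here uses only equality tests, matching the comparison model the abstract describes; for instance, failure_function("abacaba") gives [0, 0, 1, 0, 1, 2, 3].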
Tighter upper bounds on the exact complexity of string matching
- SIAM Journal on Computing
, 1997
Cited by 3 (0 self)
This paper considers how many character comparisons are needed to find all occurrences of a pattern of length m in a text of length n. The main contribution is to show an upper bound of the form n + O(n/m) character comparisons, following preprocessing. Specifically, we show an upper bound of n + (8/(3(m+1)))(n − m) character comparisons. This bound is achieved by an online algorithm which performs O(n) work in total and requires O(m) space and O(m^2) time for preprocessing. The current best lower bound for online algorithms is n + (16/(7m+27))(n − m) character comparisons for m = 16k + 19, for any integer k ≥ 1, and for general algorithms is n + (2/(m+3))(n − m) character comparisons, for m = 2k + 1, for any integer k ≥ 1.
Constant-Space String-Matching in Sublinear Average Time (Extended Abstract)
Cited by 2 (0 self)
Maxime Crochemore (Université de Marne-la-Vallée), Leszek Gasieniec (Max-Planck Institut für Informatik), Wojciech Rytter (Warsaw University and University of Liverpool). Abstract: Given two strings, a pattern P of length m and a text T of length n, the string-matching problem is to find all occurrences of the pattern P in the text T. We present simple string-matching algorithms which work in average o(n) time with constant additional space for one-dimensional texts and two-dimensional arrays. This is the first attempt at the small-space string-matching problem in which sublinear-time algorithms are delivered. More precisely, we show that all occurrences of one- or two-dimensional patterns can be found in O(n/r) average time with constant memory, where r is the repetition size (size of the longest repeated subword) of P. Institut Gaspard Monge, Université de Marne-la-Vallée, France (mac@univ-mlv.fr). Max-Planck Institut für Informatik, Im Stadtwald, D--6612...
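To make the parameter r in this abstract concrete, the sketch below computes the length of the longest subword that occurs at least twice in a pattern. It is a naive illustration only: it uses far more than constant space, the function name repetition_size is hypothetical, and it assumes that two occurrences may overlap, which the abstract does not specify.

```python
def repetition_size(p: str) -> int:
    """Length of the longest subword occurring at least twice in p
    (occurrences are allowed to overlap; this is an assumption)."""
    m = len(p)
    # Try candidate lengths from longest to shortest.
    for length in range(m - 1, 0, -1):
        seen = set()
        for i in range(m - length + 1):
            sub = p[i:i + length]
            if sub in seen:
                return length
            seen.add(sub)
    return 0
```

Under this definition repetition_size("banana") is 3 (the subword "ana"), while a pattern with all distinct symbols such as "abcd" has repetition size 0.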

