Results 1 - 10 of 25
Compressed suffix arrays and suffix trees with applications to text indexing and string matching (extended abstract)
- in Proceedings of the 32nd Annual ACM Symposium on the Theory of Computing
, 2000
Cited by 172 (15 self)
Abstract. The proliferation of online text, such as found on the World Wide Web and in online databases, motivates the need for space-efficient text indexing methods that support fast string searching. We model this scenario as follows: Consider a text $T$ consisting of $n$ symbols drawn from a fixed alphabet $\Sigma$. The text $T$ can be represented in $n \lg |\Sigma|$ bits by encoding each symbol with $\lg |\Sigma|$ bits. The goal is to support fast online queries for searching any string pattern $P$ of $m$ symbols, with $T$ being fully scanned only once, namely, when the index is created at preprocessing time. The text indexing schemes published in the literature are greedy in terms of space usage: they require $\Omega(n \lg n)$ additional bits of space in the worst case. For example, in the standard unit cost RAM, suffix trees and suffix arrays need $\Omega(n)$ memory words, each of $\Omega(\lg n)$ bits. These indexes are larger than the text itself by a multiplicative factor of $\Omega(\lg_{|\Sigma|} n)$, which is significant when $\Sigma$ is of constant size, such as in ASCII or Unicode. On the other hand, these indexes support fast searching, either in $O(m \lg |\Sigma|)$ time or in $O(m + \lg n)$ time, plus an output-sensitive cost $O(\mathit{occ})$ for listing the $\mathit{occ}$ pattern occurrences. We present a new text index that is based upon compressed representations of suffix arrays and suffix trees. It achieves a fast $O(m / \lg_{|\Sigma|} n + \lg^{\epsilon}_{|\Sigma|} n)$ search time in the worst case, for any constant
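For orientation, the classical (uncompressed) suffix array that the abstract uses as its baseline can be sketched as follows. This is only an illustration of that baseline, not the compressed index the paper proposes: construction here is a naive sort, the search is a plain binary search costing roughly O(m lg n) character comparisons, and the function names are hypothetical.

```python
def build_suffix_array(text):
    # Naive construction: sort suffix start positions by the suffix they begin.
    # Real indexes use linear-time or near-linear-time construction instead.
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    # The length-m prefixes of the sorted suffixes are themselves in
    # non-decreasing order, so two binary searches delimit the matches.
    m = len(pattern)

    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose m-prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    hi = len(sa)
    while lo < hi:  # first suffix whose m-prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # Every suffix in sa[start:lo] begins with the pattern.
    return sorted(sa[start:lo])


sa = build_suffix_array("banana")
print(sa)                                     # [5, 3, 1, 0, 4, 2]
print(find_occurrences("banana", sa, "ana"))  # [1, 3]
```

The array holds n integers of lg n bits each, which is the Ω(n lg n)-bit overhead the abstract sets out to reduce.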
Efficient 2-dimensional Approximate Matching of Half-rectangular Figures
, 1993
Cited by 31 (11 self)
Efficient algorithms exist for the approximate two dimensional matching problem for rectangles. This is the problem of finding all occurrences of an $m \times m$ pattern in an $n \times n$ text with no more than $k$ mismatch, insertion, and deletion errors. In computer vision it is important to generalize this problem to non-rectangular figures. We make progress towards this goal by defining half-rectangular figures of height $m$ and area $a$. The approximate two dimensional matching problem for half-rectangular patterns can be solved using a dynamic programming approach in time $O(an^2)$. We show an $O(kn^2 \sqrt{m \log m} \sqrt{k \log k} + k^2 n^2)$ algorithm which combines convolutions with dynamic programming. Note that our algorithm is superior to previous known solutions for $k \le m^{1/3}$. At the heart of the algorithm are the Smaller Matching Problem and the k-Aligned Ones with Location Problem. These are interesting problems in their own right. Efficient algorithms to solve both t...
A Lower Bound for Parallel String Matching
- SIAM J. Comput
, 1993
Cited by 26 (13 self)
This talk presents the derivation of an $\Omega(\log \log m)$ lower bound on the number of rounds necessary for finding occurrences of a pattern string $P[1..m]$ in a text string $T[1..2m]$ in parallel using $m$ comparisons in each round. The parallel complexity of the string matching problem using $p$ processors for general alphabets follows. 1. Introduction. Better and better parallel algorithms have been designed for string-matching. All are on CRCW-PRAM with the weakest form of simultaneous write conflict resolution: all processors which write into the same memory location must write the same value of 1. The best CREW-PRAM algorithms are those obtained from the CRCW algorithms for a logarithmic loss of efficiency. Optimal algorithms have been designed: $O(\log m)$ time in [8, 17] and $O(\log \log m)$ time in [4]. (An optimal algorithm is one with $pt = O(n)$ where $t$ is the time and $p$ is the number of processors used.) Recently, Vishkin [18] developed an optimal O(log m) time algorithm. Unlike...
An Alphabet Independent Approach to Two Dimensional Matching
, 1994
Cited by 25 (8 self)
There are many solutions to the string matching problem which are strictly linear in the input size and independent of alphabet size. Furthermore, the model of computation for these algorithms is very weak: they allow only simple arithmetic and comparisons of equality between characters of the input. In contrast, algorithms for two dimensional matching have needed stronger models of computation, most notably assuming a totally ordered alphabet. The fastest algorithms for two dimensional matching have therefore had a logarithmic dependence on the alphabet size. In the worst case, this gives an algorithm that runs in O(n log m) with O(m log m) preprocessing.
Speeding Up Pattern Matching By Text Compression
, 2000
Cited by 22 (8 self)
Pattern matching is one of the most fundamental operations in string processing. Recently, a new trend for accelerating pattern matching has emerged: speeding up pattern matching by text compression. By the traditional criteria for data compression, i.e., compression ratio and compression/decompression time, adaptive dictionary methods such as the Lempel-Ziv family are often preferred. However, such methods cannot speed up pattern matching, since extra work is needed to keep track of the compression mechanism. We have to re-examine existing compression methods or develop a new method in the light of the new criterion: efficiency of pattern matching in compressed text. Byte pair encoding (BPE) is a simple universal text compression scheme. Decompression is fast and requires little work space. Moreover, it is easy to decompress an arbitrary part of the original text. However, it has not been so popular since the compression is slow and the compression ratio is not as good as...
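The properties claimed for BPE above (slow compression, fast and simple decompression) are easier to see in code. The sketch below is a minimal illustration under stated assumptions, not the scheme evaluated in the paper: it assigns fresh symbols numbered from 256 upward rather than reusing unused byte values, it caps the number of merges with a hypothetical `max_merges` parameter, and the function names are invented for this example.

```python
from collections import Counter


def bpe_compress(data, max_merges=10):
    # Repeatedly replace the most frequent adjacent pair with a new symbol.
    # Each round rescans the whole sequence, which is why compression is slow.
    seq = list(data)
    rules = {}          # new symbol -> (left, right) pair it stands for
    next_symbol = 256   # assumption: fresh symbols live above the byte range
    for _ in range(max_merges):
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break       # no pair repeats, so nothing is gained by merging
        rules[next_symbol] = pair
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(next_symbol)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
        next_symbol += 1
    return seq, rules


def bpe_decompress(seq, rules):
    # Expand symbols left to right using an explicit stack; only the rule
    # table is needed, so any part of the sequence can be expanded alone.
    out = []
    stack = list(reversed(seq))
    while stack:
        symbol = stack.pop()
        if symbol in rules:
            left, right = rules[symbol]
            stack.append(right)
            stack.append(left)
        else:
            out.append(symbol)
    return bytes(out)


seq, rules = bpe_compress(b"abababab")
print(seq)                         # [257, 257]
print(bpe_decompress(seq, rules))  # b'abababab'
```

Because each compressed symbol expands independently of its neighbours, a pattern matcher can walk the compressed sequence with the rule table in hand, which is the property the abstract highlights.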
Efficient 2-dimensional approximate matching of non-rectangular figures
- Proc. of 2nd Symposium on Discrete Algorithms
, 1991
Cited by 18 (7 self)
Finding all occurrences of a non-rectangular pattern of height $m$ and area $a$ in an $n \times n$ text with no more than $k$ mismatch, insertion, and deletion errors is an important problem in computer vision. It can be solved using a dynamic programming approach in time $O(an^2)$. We show an $O(kn^2 \sqrt{m \log m} \sqrt{k \log k} + k^2 n^2)$ algorithm which combines convolutions with dynamic programming. At the heart of the algorithm are the Smaller Matching Problem and the k-Aligned Ones with Location Problem. Efficient algorithms to solve both these problems are presented.
Alphabet Dependence in Parameterized Matching
- Information Processing Letters
, 1994
Cited by 14 (4 self)
The classical pattern matching paradigm is that of seeking occurrences of one string in another, where both strings are drawn from an alphabet set $\Sigma$. A recently introduced model is that of parameterized pattern matching. The main motivation for this scheme lies in software maintenance where program fragments are considered "identical" even if variable names are different. Besides the fixed symbols from $\Sigma$, strings under this model have additional symbols from a variable set $\Pi$ and occurrences of one string in the other are sought, where renaming of the variables from $\Pi$ is allowed in a match. In this paper we provide an algorithm to find all occurrences of a pattern string of length $m$ in a text string of length $n$ under the parameterized pattern matching model. Our algorithm takes time $O(n \log \pi)$, where $\pi = \min(m, |\Pi|)$, independent of $|\Sigma|$. Our algorithm is optimal since we show that this dependence on $|\Pi|$ is inherent to any algorithm for this problem in the compari...
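To make the matching model concrete, the following sketch checks the definition directly: two strings match if fixed symbols agree exactly and the variable symbols can be renamed one-to-one. It is a brute-force O(nm) check written for illustration, not the O(n log π) algorithm of the paper, and the names `p_match`, `p_find`, and `params` are hypothetical.

```python
def p_match(a, b, params):
    # True if b equals a under a one-to-one renaming of the symbols in params.
    if len(a) != len(b):
        return False
    forward, backward = {}, {}
    for x, y in zip(a, b):
        if x in params and y in params:
            # setdefault records the first pairing and exposes any conflict.
            if forward.setdefault(x, y) != y or backward.setdefault(y, x) != x:
                return False
        elif x != y:
            # Fixed symbols (or a fixed/variable mix) must agree exactly.
            return False
    return True


def p_find(text, pattern, params):
    # Brute force: test every window of the text against the pattern.
    m = len(pattern)
    return [i for i in range(len(text) - m + 1)
            if p_match(pattern, text[i:i + m], params)]


variables = set("uvwxyz")  # plays the role of the variable set Pi
print(p_find("x=y+x;u=u+v", "w=z+w", variables))  # [0]
```

The window `u=u+v` is rejected because it would map both `w` and `z` to `u`, which is not a valid renaming.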
Two Dimensional Dictionary Matching
- Information Processing Letters
, 1992
Cited by 12 (3 self)
Most traditional pattern matching algorithms solve the problem of finding all occurrences of a given pattern string $P$ in a given text $T$. Another important paradigm is the dictionary matching problem. Let $D = \{P_1, \ldots, P_k\}$ be the dictionary. We seek all locations of dictionary patterns that appear in a given text $T$.
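The one-dimensional dictionary matching problem defined above is classically solved with an Aho-Corasick automaton, sketched below. This is offered only to illustrate the problem statement; it is not the two-dimensional algorithm the paper develops, and the function names are hypothetical.

```python
from collections import deque


def build_automaton(patterns):
    # Trie of the patterns plus failure links (Aho-Corasick).
    goto, fail, out = [{}], [0], [[]]
    for idx, pattern in enumerate(patterns):
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(idx)

    # Breadth-first pass: a node's failure link points to the longest proper
    # suffix of its string that is also a path from the root.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, child in goto[state].items():
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            out[child] = out[child] + out[fail[child]]
            queue.append(child)
    return goto, fail, out


def find_all(text, patterns):
    # Report (start position, pattern) for every dictionary occurrence.
    goto, fail, out = build_automaton(patterns)
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for idx in out[state]:
            hits.append((i - len(patterns[idx]) + 1, patterns[idx]))
    return hits


print(find_all("ushers", ["he", "she", "his", "hers"]))
# [(1, 'she'), (2, 'he'), (2, 'hers')]
```

The text is scanned once regardless of how many patterns the dictionary holds, which is what distinguishes dictionary matching from running a single-pattern search k times.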
Efficient Comparison Based String Matching
, 1992
Cited by 12 (3 self)
We study the exact number of symbol comparisons that are required to solve the string matching problem and present a family of efficient algorithms. Unlike previous string matching algorithms, the algorithms in this family do not "forget" results of comparisons, which makes their analysis much simpler. In particular, we give a linear-time algorithm that finds all occurrences of a pattern of length $m$ in a text of length $n$ in $n + \lceil \frac{4 \log m + 2}{m} (n - m) \rceil$ comparisons. The pattern preprocessing takes linear time and makes at most $2m$ comparisons. This algorithm establishes that, in general, searching for a long pattern is easier than searching for a short one. We also show that any algorithm in the family of the algorithms presented must make at least $n + \lfloor \log m \rfloor \lfloor \frac{n - m}{m} \rfloor$ symbol comparisons, for $m = 2^k - 1$ and any integer $k \ge 1$.
An Optimal O(log log n) Time Parallel Algorithm for Detecting all Squares in a String
, 1995
Cited by 11 (6 self)
An optimal $O(\log \log n)$ time concurrent-read concurrent-write parallel algorithm for detecting all squares in a string is presented. A tight lower bound shows that over general alphabets this is the fastest possible optimal algorithm. When $p$ processors are available the bounds become $\Theta(\lceil \frac{n \log n}{p} \rceil + \log \log_{\lceil 1 + p/n \rceil} 2p)$. The algorithm uses an optimal parallel string-matching algorithm together with periodicity properties to locate the squares within the input string.
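A square is a substring of the form ww for some non-empty string w. The brute-force sequential sketch below only pins down what "detecting all squares" means; it runs in cubic time in the worst case and has nothing in common with the parallel algorithm of the paper beyond the problem definition. The function name is hypothetical.

```python
def find_squares(text):
    # Return (start, half_length) for every substring of the form ww.
    n = len(text)
    squares = []
    for start in range(n):
        half = 1
        while start + 2 * half <= n:
            # Compare the two adjacent halves of the candidate square.
            if text[start:start + half] == text[start + half:start + 2 * half]:
                squares.append((start, half))
            half += 1
    return squares


print(find_squares("banana"))  # [(1, 2), (2, 2)]
```

Here `(1, 2)` is the square `anan` and `(2, 2)` is the square `nana`.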

