Randomized Efficient Algorithms for Compressed Strings: the Finger-Print Approach
Abstract:
Denote by LZ(w) the coded form of a string w produced by the Lempel-Ziv encoding algorithm. We consider several classical algorithmic problems for texts in the compressed setting. The first is equality testing: given LZ(w) and integers i, j, k, test whether $w[i..i+k] = w[j..j+k]$. We give a simple and efficient randomized algorithm for this problem based on the fingerprinting idea. Equality testing is reduced to the equivalence of certain context-free grammars, each generating a single string. Equality testing is the bottleneck in other algorithms for compressed texts. We relate the time complexity of several classical problems for texts to the complexity Eq(n) of equality testing. Assume $n = |LZ(T)|$, $m = |LZ(P)|$ and $U = |T|$. Then we can compute the compressed representations of the sets of occurrences of P in T, periods of T, palindromes of T, and squares of T, respectively, in times O(n log
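To illustrate the fingerprinting idea, the sketch below tests $w[i..i+k] = w[j..j+k]$ on a text given as a context-free grammar generating a single string (a straight-line program), without decompressing it. It is a minimal sketch under stated assumptions, not the paper's construction: the grammar format (each rule is either one character or a pair of earlier rule indices), the class `SLPFingerprint`, its methods and the `base` parameter are all hypothetical, the conversion from LZ(w) to such a grammar is not shown, and the fingerprint used is a Karp-Rabin-style polynomial hash modulo a prime with a random base. Answers of "not equal" are always correct; "equal" can be wrong only if the random base happens to be a root of the difference polynomial, which has probability at most about $U/p$. The period test follows from the standard fact that $q$ is a period of $T$ exactly when $T[0..U-q-1] = T[q..U-1]$.

```python
import random

P = (1 << 61) - 1  # Mersenne prime modulus for the fingerprints


class SLPFingerprint:
    """Hypothetical helper: Karp-Rabin-style fingerprints over a straight-line program.

    rules[x] is either a one-character string (rule X -> a) or a pair (y, z)
    of earlier rule indices (rule X -> Y Z). The last rule derives the text w.
    """

    def __init__(self, rules, base=None):
        self.rules = rules
        # Random evaluation point of the polynomial hash (assumed parameter).
        self.r = base if base is not None else random.randrange(2, P - 1)
        n = len(rules)
        self.length = [0] * n  # length of the string derived by each rule
        self.hash = [0] * n    # sum of s[t] * r^(len-1-t) mod P
        self.power = [0] * n   # r^length mod P
        for x, rule in enumerate(rules):
            if isinstance(rule, str):
                self.length[x] = 1
                self.hash[x] = ord(rule) % P
                self.power[x] = self.r
            else:
                y, z = rule
                self.length[x] = self.length[y] + self.length[z]
                # h(uv) = h(u) * r^|v| + h(v)
                self.hash[x] = (self.hash[y] * self.power[z] + self.hash[z]) % P
                self.power[x] = self.power[y] * self.power[z] % P

    def _sub(self, x, i, j):
        """Fingerprint and r^(j-i) of the half-open range [i, j) of rule x."""
        if i >= j:
            return 0, 1  # empty string
        if i == 0 and j == self.length[x]:
            return self.hash[x], self.power[x]  # whole rule: precomputed
        y, z = self.rules[x]
        ly = self.length[y]
        if j <= ly:
            return self._sub(y, i, j)
        if i >= ly:
            return self._sub(z, i - ly, j - ly)
        # The range straddles both children: combine the two parts.
        hy, py = self._sub(y, i, ly)
        hz, pz = self._sub(z, 0, j - ly)
        return (hy * pz + hz) % P, py * pz % P

    def equal(self, i, j, k):
        """Randomized test of w[i..i+k] == w[j..j+k] (0-based, inclusive)."""
        root = len(self.rules) - 1
        a, _ = self._sub(root, i, i + k + 1)
        b, _ = self._sub(root, j, j + k + 1)
        return a == b

    def has_period(self, q):
        """Randomized test that q (1 <= q < |w|) is a period of w."""
        u = self.length[-1]
        return self.equal(0, q, u - q - 1)


# Example: a grammar for the Fibonacci word "abaababa".
rules = ['a', 'b', (0, 1), (2, 0), (3, 2), (4, 3)]
fp = SLPFingerprint(rules)
print(fp.equal(0, 3, 2))  # compares "aba" with "aba": True
```

Each query walks at most two root-to-leaf paths of the grammar, so its cost is proportional to the grammar height rather than to the decompressed length $U$.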

