### Queries on LZ-Bounded Encodings

Abstract

We describe a data structure that stores a string S in space similar to that of its Lempel-Ziv encoding and efficiently supports access, rank and select queries. These queries are fundamental for implementing succinct and compressed data structures, such as compressed trees and graphs. We show that our data structure can be built in a scalable manner and is both small and fast in practice compared to other data structures supporting such queries.
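
The abstract does not describe the structure itself. As a rough illustration of what answering access on an LZ encoding involves, here is a minimal Python sketch, assuming an LZ77-style parse in which each phrase `(src, length, ch)` copies `length` symbols starting at an earlier text position `src` and then appends one literal `ch`. The names `lz77_parse` and `LZAccess` are hypothetical. Access follows copy pointers backwards until it lands on a literal; rank and select are implemented naively on top of access. This is not the paper's data structure, which is designed to answer these queries far more efficiently.

```python
from bisect import bisect_right
from typing import List, Tuple

# Hypothetical phrase format: (src, length, ch) copies `length` symbols that
# start at earlier text position `src` (the copy may overlap itself), then
# appends the literal symbol `ch`.
Phrase = Tuple[int, int, str]


def lz77_parse(s: str) -> List[Phrase]:
    """Greedy LZ77 parse by brute force (illustration only, cubic worst case)."""
    phrases: List[Phrase] = []
    i, n = 0, len(s)
    while i < n:
        best_src, best_len = 0, 0
        for src in range(i):
            length = 0
            # Stop at n - 1 so every phrase still ends with a literal.
            while i + length < n - 1 and s[src + length] == s[i + length]:
                length += 1
            if length > best_len:
                best_src, best_len = src, length
        phrases.append((best_src, best_len, s[i + best_len]))
        i += best_len + 1
    return phrases


class LZAccess:
    """Answers access/rank/select directly on the phrase list."""

    def __init__(self, phrases: List[Phrase]) -> None:
        self.phrases = phrases
        self.starts: List[int] = []  # text position where each phrase begins
        pos = 0
        for _, length, _ in phrases:
            self.starts.append(pos)
            pos += length + 1
        self.n = pos

    def access(self, i: int) -> str:
        if not 0 <= i < self.n:
            raise IndexError(i)
        while True:
            k = bisect_right(self.starts, i) - 1  # phrase containing position i
            src, length, ch = self.phrases[k]
            offset = i - self.starts[k]
            if offset == length:
                return ch  # i is this phrase's literal
            # Jump into the copied source; a copy always starts before its
            # phrase, so the position strictly decreases and the loop ends.
            i = src + offset

    def rank(self, c: str, i: int) -> int:
        """Occurrences of c in S[0, i); naive, one access per position."""
        return sum(1 for j in range(i) if self.access(j) == c)

    def select(self, c: str, k: int) -> int:
        """0-based position of the k-th (1-based) occurrence of c, or -1."""
        seen = 0
        for j in range(self.n):
            if self.access(j) == c:
                seen += 1
                if seen == k:
                    return j
        return -1
```

Each backward jump strictly decreases the position, so access always terminates, but the length of the jump chain is not bounded by anything small; keeping it short is one of the issues a practical LZ-based structure has to handle.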

### Faster Compressed Suffix Trees for Repetitive Collections

Abstract

Recent compressed suffix trees targeted to highly repetitive sequence collections reach excellent compression performance, but operation times are very high. We design a new suffix tree representation for this scenario that still achieves very low space usage, only slightly larger than the best previous one, but supports the operations orders of magnitude faster. Our suffix tree is still orders of magnitude slower than general-purpose compressed suffix trees, but these use several times more space when the collection is repetitive. Our main novelty is a practical grammar-compressed tree representation with full navigation functionality, which is useful in all applications where large trees with repetitive topology must be represented.
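
One way to picture a grammar-compressed tree: write the topology as a balanced-parentheses (BP) sequence and store that sequence as a straight-line grammar, keeping for each rule its expanded length, net excess (opens minus closes) and minimum prefix excess. The Python sketch below is illustrative only. The class name `GrammarBP` is hypothetical, and its grammar is built by sharing identical aligned pairs of symbols, a simple stand-in that is neither Re-Pair nor the paper's construction. Node depth and `find_close` are computed by descending the grammar, skipping whole rules whose minimum excess cannot reach the target.

```python
from typing import Dict, List, Optional, Tuple


class GrammarBP:
    """Hypothetical sketch: a non-empty BP sequence stored as a binary grammar.

    Symbols 0 and 1 are the terminals '(' and ')'; every id >= 2 is a rule
    X -> (left, right). Identical aligned blocks share one rule.
    """

    def __init__(self, bp: str) -> None:
        self.rules: List[Optional[Tuple[int, int]]] = [None, None]
        self.length: List[int] = [1, 1]   # expanded length of each symbol
        self.excess: List[int] = [1, -1]  # (#'(') - (#')') in the expansion
        self.minexc: List[int] = [1, -1]  # minimum prefix excess in the expansion
        memo: Dict[Tuple[int, int], int] = {}
        level = [0 if ch == "(" else 1 for ch in bp]
        while len(level) > 1:
            nxt: List[int] = []
            for k in range(0, len(level) - 1, 2):
                a, b = level[k], level[k + 1]
                if (a, b) not in memo:
                    memo[(a, b)] = len(self.rules)
                    self.rules.append((a, b))
                    self.length.append(self.length[a] + self.length[b])
                    self.excess.append(self.excess[a] + self.excess[b])
                    self.minexc.append(min(self.minexc[a], self.excess[a] + self.minexc[b]))
                nxt.append(memo[(a, b)])
            if len(level) % 2:
                nxt.append(level[-1])
            level = nxt
        self.root = level[0]

    def excess_at(self, i: int) -> int:
        """Prefix excess of BP[0..i] inclusive: depth of the node opened at i."""
        sym, acc = self.root, 0
        while self.rules[sym] is not None:
            left, right = self.rules[sym]
            if i < self.length[left]:
                sym = left
            else:
                acc += self.excess[left]
                i -= self.length[left]
                sym = right
        return acc + self.excess[sym]

    def find_close(self, i: int) -> int:
        """Position of the ')' matching the '(' at position i."""
        # Excess moves by +-1 per step, so the first j > i whose excess is
        # <= excess(i) - 1 has excess exactly excess(i) - 1: the match.
        j = self._search(self.root, 0, 0, i, self.excess_at(i) - 1)
        assert j is not None, "unbalanced sequence"
        return j

    def _search(self, sym: int, start: int, base: int, i: int, target: int) -> Optional[int]:
        """First j > i in sym's span [start, start + len) with prefix excess
        <= target; `base` is the excess just before `start`."""
        if start + self.length[sym] - 1 <= i:
            return None  # span lies entirely at or before i
        if start > i and base + self.minexc[sym] > target:
            return None  # excess never drops to target inside this rule
        rule = self.rules[sym]
        if rule is None:  # a terminal at position start > i
            return start if base + self.excess[sym] <= target else None
        left, right = rule
        hit = self._search(left, start, base, i, target)
        if hit is not None:
            return hit
        return self._search(right, start + self.length[left], base + self.excess[left], i, target)
```

For example, `GrammarBP("(()())").find_close(1)` returns 2. A tree whose subtrees repeat produces repeated BP blocks, so many pairs are shared; a real grammar compressor would also capture repeats that are not aligned.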

### Document Counting in Compressed Space

Abstract

We address the problem of counting the number of strings in a collection where a given pattern appears, which has applications in information retrieval and data mining. Existing solutions have so far been studied only in theory. In this paper we implement these solutions and explore compressed variants, aiming to reduce data structure size. Our main result is to uncover some unexpected compressibility properties of the fastest known data structure for the problem. By taking advantage of these properties, we can reduce the size of the structure by a factor of 5–400, depending on the dataset.
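
To make the problem concrete, here is a brute-force Python baseline, assuming a suffix array over the concatenation of all documents (each followed by a separator `\x01` that is assumed never to occur in documents or patterns) and a document array recording which string each suffix comes from. The occurrences of a pattern form one suffix-array interval, and the answer is the number of distinct document ids inside it. All function names are hypothetical and construction is naive. This baseline scans the whole interval, which is exactly the cost that dedicated counting structures aim to avoid.

```python
from typing import List, Tuple

SEP = "\x01"  # assumption: never occurs inside documents or patterns


def build_index(docs: List[str]) -> Tuple[str, List[int], List[int]]:
    """Return (text, suffix array, document array) for the collection."""
    text = "".join(d + SEP for d in docs)
    doc_of = [k for k, d in enumerate(docs) for _ in range(len(d) + 1)]
    sa = sorted(range(len(text)), key=lambda p: text[p:])  # naive construction
    da = [doc_of[p] for p in sa]  # document of each suffix, in suffix-array order
    return text, sa, da


def sa_interval(text: str, sa: List[int], pat: str) -> Tuple[int, int]:
    """Half-open range [l, r) of suffixes that start with pat."""
    m = len(pat)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose m-prefix is >= pat
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pat:
            lo = mid + 1
        else:
            hi = mid
    left, hi = lo, len(sa)
    while lo < hi:  # first suffix whose m-prefix is > pat
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pat:
            lo = mid + 1
        else:
            hi = mid
    return left, lo


def count_documents(docs: List[str], pat: str) -> int:
    # The index is rebuilt on every call only to keep the sketch short.
    text, sa, da = build_index(docs)
    l, r = sa_interval(text, sa, pat)
    return len(set(da[l:r]))  # distinct documents among the occurrences
```

For the collection `["abracadabra", "cadabra", "banana"]`, the pattern `"abra"` occurs in two documents and `"ana"` in one.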

### Document Counting in Practice

Abstract

We address the problem of counting the number of strings in a collection where a given pattern appears, which has applications in information retrieval and data mining. Existing solutions have so far been studied only in theory. We implement these solutions and develop some new variants, comparing them experimentally on various datasets. Our results not only show which options are best in each situation and help discard solutions that are unappealing in practice, but also uncover some unexpected compressibility properties of the best data structures. By taking advantage of these properties, we can reduce the size of the structures by a factor of 5–400, depending on the dataset.
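
One classic way to count distinct documents without building a set is the "previous occurrence" array used in document listing: for each position k of the document array, `prev[k]` is the last earlier position holding the same document, or -1. Inside a suffix-array interval [l, r), each distinct document is counted exactly once, at its first occurrence in the interval, which is precisely a position with `prev[k] < l`. The Python sketch below answers that range count with a merge-sort tree in O(log² n) time and O(n log n) space. The names are hypothetical, and this is an illustrative baseline, not necessarily one of the variants evaluated in the paper.

```python
from bisect import bisect_left
from typing import Dict, List


def prev_array(da: List[int]) -> List[int]:
    """prev[k] = last position before k holding the same document, else -1."""
    last: Dict[int, int] = {}
    prev: List[int] = []
    for k, d in enumerate(da):
        prev.append(last.get(d, -1))
        last[d] = k
    return prev


class MergeSortTree:
    """Counts values < x in any index range (bottom-up segment tree of sorted lists)."""

    def __init__(self, values: List[int]) -> None:
        self.n = len(values)
        self.tree: List[List[int]] = [[] for _ in range(2 * self.n)]
        for k, v in enumerate(values):
            self.tree[self.n + k] = [v]
        for node in range(self.n - 1, 0, -1):
            self.tree[node] = sorted(self.tree[2 * node] + self.tree[2 * node + 1])

    def count_less(self, lo: int, hi: int, x: int) -> int:
        """Number of positions k in [lo, hi) with values[k] < x."""
        res = 0
        lo += self.n
        hi += self.n
        while lo < hi:
            if lo & 1:
                res += bisect_left(self.tree[lo], x)
                lo += 1
            if hi & 1:
                hi -= 1
                res += bisect_left(self.tree[hi], x)
            lo >>= 1
            hi >>= 1
        return res


class DocumentCounter:
    def __init__(self, da: List[int]) -> None:
        self.tree = MergeSortTree(prev_array(da))

    def count(self, l: int, r: int) -> int:
        """Distinct documents in the suffix-array interval [l, r)."""
        # Each document is counted at its first position in [l, r), where prev < l.
        return self.tree.count_less(l, r, l)
```

Here `da` is a document array such as the one produced by a suffix-array index over the collection; the interval [l, r) would come from a pattern search.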

### Rank, select and access in grammar-compressed strings

2014

Abstract

Given a string $S$ of length $N$ on a fixed alphabet of $\sigma$ symbols, a grammar compressor produces a context-free grammar $G$ of size $n$ that generates $S$ and only $S$. In this paper we describe data structures to support the following operations on a grammar-compressed string: $\mathrm{rank}_c(S, i)$ (return the number of occurrences of symbol $c$ before position $i$ in $S$); $\mathrm{select}_c(S, i)$ (return the position of the $i$-th occurrence of $c$ in $S$); and $\mathrm{access}(S, i, j)$ (return the substring $S[i, j]$). For rank and select we describe data structures of size $O(n\sigma \log N)$ bits that support the two operations in $O(\log N)$ time. We propose another structure that uses $O(n\sigma \log(N/n)(\log N)^{1+\epsilon})$ bits and supports the two queries in $O(\log N / \log\log N)$ time, where $\epsilon > 0$ is an arbitrary constant. To our knowledge, we are the first to study the asymptotic complexity of rank and select in the grammar-compressed setting, and we provide a hardness result showing that significantly improving the bounds we achieve would imply a major breakthrough on a hard graph-theoretical problem. Our main result for access is a method that requires $O(n \log N)$ bits of space and $O(\log N + m / \log_\sigma N)$ time to extract $m = j - i + 1$ consecutive symbols from $S$. Alternatively, we can achieve $O(\log N / \log\log N + m / \log_\sigma N)$ query time using $O(n \log(N/n)(\log N)^{1+\epsilon})$ bits of space. This matches a lower bound stated by Verbin and Yu for strings where $N$ is polynomially related to $n$.
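
A minimal Python sketch of rank, select and access over a straight-line program (a grammar in which every rule has exactly two right-hand symbols), assuming each rule stores its expansion length and a per-symbol occurrence count; those counters loosely echo the $n\sigma$ factor in the first space bound. The grammar is built by sharing identical aligned pairs, a simple stand-in for a real grammar compressor. Rank and select walk a single root-to-leaf path, and access walks the paths bounding the extracted range plus the extracted symbols, so time depends on the grammar height rather than matching the $O(\log N)$ bound stated above. Positions are 0-based, `rank(c, i)` counts occurrences in $S[0, i)$, and the class and method names are hypothetical.

```python
from typing import Dict, List, Tuple, Union

Symbol = Union[str, int]  # str: a terminal character; int: a rule id


class SLP:
    """Hypothetical straight-line program with per-rule length and symbol counts."""

    def __init__(self, s: str) -> None:
        if not s:
            raise ValueError("empty string")
        self.rules: List[Tuple[Symbol, Symbol]] = []
        self.lens: List[int] = []               # expansion length per rule
        self.counts: List[Dict[str, int]] = []  # symbol -> occurrences per rule
        memo: Dict[Tuple[Symbol, Symbol], int] = {}
        level: List[Symbol] = list(s)
        while len(level) > 1:  # pair adjacent symbols, reusing identical pairs
            nxt: List[Symbol] = []
            for k in range(0, len(level) - 1, 2):
                pair = (level[k], level[k + 1])
                if pair not in memo:
                    memo[pair] = self._add_rule(*pair)
                nxt.append(memo[pair])
            if len(level) % 2:
                nxt.append(level[-1])
            level = nxt
        self.root: Symbol = level[0]

    def _add_rule(self, left: Symbol, right: Symbol) -> int:
        counts = dict(self._counts_of(left))
        for c, v in self._counts_of(right).items():
            counts[c] = counts.get(c, 0) + v
        self.rules.append((left, right))
        self.lens.append(self.length(left) + self.length(right))
        self.counts.append(counts)
        return len(self.rules) - 1

    def _counts_of(self, sym: Symbol) -> Dict[str, int]:
        return {sym: 1} if isinstance(sym, str) else self.counts[sym]

    def length(self, sym: Symbol) -> int:
        return 1 if isinstance(sym, str) else self.lens[sym]

    def count(self, sym: Symbol, c: str) -> int:
        return self._counts_of(sym).get(c, 0)

    def rank(self, c: str, i: int) -> int:
        """Occurrences of c in S[0, i)."""
        if not 0 <= i <= self.length(self.root):
            raise IndexError(i)
        sym, acc = self.root, 0
        while True:
            if i == 0:
                return acc
            if i == self.length(sym):
                return acc + self.count(sym, c)
            left, right = self.rules[sym]  # 0 < i < length, so sym is a rule
            if i <= self.length(left):
                sym = left
            else:
                acc += self.count(left, c)
                i -= self.length(left)
                sym = right

    def select(self, c: str, k: int) -> int:
        """0-based position of the k-th (1-based) occurrence of c."""
        if not 1 <= k <= self.count(self.root, c):
            raise ValueError("no such occurrence")
        sym, pos = self.root, 0
        while not isinstance(sym, str):
            left, right = self.rules[sym]
            in_left = self.count(left, c)
            if k <= in_left:
                sym = left
            else:
                k -= in_left
                pos += self.length(left)
                sym = right
        return pos

    def access(self, i: int, j: int) -> str:
        """Substring S[i..j] inclusive, expanding only rules that overlap it."""
        out: List[str] = []

        def walk(sym: Symbol, start: int) -> None:
            if start > j or start + self.length(sym) <= i:
                return  # expansion lies entirely outside [i, j]
            if isinstance(sym, str):
                out.append(sym)
                return
            left, right = self.rules[sym]
            walk(left, start)
            walk(right, start + self.length(left))

        walk(self.root, 0)
        return "".join(out)
```

Storing full per-symbol counts in every rule is the simplest way to make rank and select descend in one pass; it trades space for that simplicity.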