Wavelet Trees for All

Cited by 32 (12 self)
The wavelet tree is a versatile data structure that serves a number of purposes, from string processing to geometry. It can be regarded as a device that represents a sequence, a reordering, or a grid of points. In addition, its space adapts to various entropy measures of the data it encodes, enabling compressed representations. New competitive solutions to a number of problems, based on wavelet trees, are appearing every year. In this survey we give an overview of wavelet trees and the surprising number of applications in which we have found them useful: basic and weighted point grids, sets of rectangles, strings, permutations, binary relations, graphs, inverted indexes, document retrieval indexes, full-text indexes, XML indexes, and general numeric sequences.
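To make the "represents a sequence" view concrete, here is a minimal Python sketch of a pointer-based wavelet tree over integer symbols. It is an illustration under stated assumptions, not the survey's implementation: the class name `WaveletTree` and its methods `access` and `rank` are hypothetical names chosen here, and plain prefix-count lists stand in for the succinct rank-enabled bitmaps a real implementation would use, so the sketch shows the query logic but none of the space savings.

```python
class WaveletTree:
    """Illustrative wavelet tree over a non-empty sequence of integers.

    Each internal node splits its symbol range [lo, hi] at mid and stores,
    for every position, whether the symbol goes right (1) or left (0).
    Prefix sums of those bits replace a succinct rank structure (assumption).
    """

    def __init__(self, seq, lo=None, hi=None):
        seq = list(seq)
        self.lo = min(seq) if lo is None else lo
        self.hi = max(seq) if hi is None else hi
        self.left = self.right = None
        self.prefix = None
        if self.lo == self.hi or not seq:
            return  # leaf, or an empty node that is never descended past
        mid = (self.lo + self.hi) // 2
        # prefix[i] = number of symbols > mid among the first i positions
        self.prefix = [0]
        for x in seq:
            self.prefix.append(self.prefix[-1] + (1 if x > mid else 0))
        self.left = WaveletTree([x for x in seq if x <= mid], self.lo, mid)
        self.right = WaveletTree([x for x in seq if x > mid], mid + 1, self.hi)

    def access(self, i):
        """Return the symbol at position i (0-based)."""
        node = self
        while node.lo != node.hi:
            ones = node.prefix[i]
            if node.prefix[i + 1] - ones:   # bit at position i is 1
                i, node = ones, node.right
            else:                           # bit at position i is 0
                i, node = i - ones, node.left
        return node.lo

    def rank(self, c, i):
        """Return how many times symbol c occurs in the first i positions."""
        if c < self.lo or c > self.hi:
            return 0
        node = self
        while node.lo != node.hi:
            if i == 0:
                return 0
            mid = (node.lo + node.hi) // 2
            ones = node.prefix[i]
            if c > mid:
                i, node = ones, node.right
            else:
                i, node = i - ones, node.left
        return i
```

For a string, one could build the tree over code points, for example `WaveletTree([ord(ch) for ch in "abracadabra"])`. Both queries walk one root-to-leaf path, so their cost is proportional to the height of the tree, which is logarithmic in the size of the symbol range.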
Improved grammar-based compressed indexes
In Proc. 19th SPIRE, LNCS 7608, 2012

Cited by 15 (6 self)
Abstract. We introduce the first grammar-compressed representation of a sequence that supports searches in time that depends only logarithmically on the size of the grammar. Given a text T[1..u] that is represented by a (context-free) grammar of n (terminal and nonterminal) symbols and size N (measured as the sum of the lengths of the right hands of the rules), a basic grammar-based representation of T takes N lg n bits of space. Our representation requires 2N lg n + N lg u + εn lg n + o(N lg n) bits of space, for any 0 < ε ≤ 1. It can find the positions of the occ occurrences of a pattern of length m in T in O((m^2/ε) lg(lg u / lg n) + (m + occ) lg n) time, and extract any substring of length ℓ of T in time O(ℓ + h lg(N/h)), where h is the height of the grammar tree.
Tree structure compression with RePair
2010

Cited by 10 (4 self)
Larsson and Moffat’s RePair algorithm is generalized from strings to trees. The new algorithm (TreeRePair) produces straight-line linear context-free tree (SLT) grammars which are smaller than those produced by previous grammar-based compressors such as BPLEX. Experiments show that a Huffman-based coding of the resulting grammars gives compression ratios comparable to the best known XML file compressors. Moreover, SLT grammars can be used as efficient memory representation of trees. Our investigations show that tree traversals over TreeRePair grammars are 14 times slower than over pointer structures and 5 times slower than over succinct trees, while memory consumption is only 1/43 and 1/6, respectively.
Algorithmics on SLP-compressed strings: a survey
Groups Complex. Cryptol., 2012

Cited by 10 (3 self)
Abstract. Results on algorithmic problems on strings that are given in a compressed form via straight-line programs are surveyed. A straight-line program is a context-free grammar that generates exactly one string. In this way, exponential compression rates can be achieved. Among others, we study pattern matching for compressed strings, membership problems for compressed strings in various kinds of formal languages, and the problem of querying compressed strings. Applications in combinatorial group theory and computational topology and to the solution of word equations are discussed as well. Finally, extensions to compressed trees and pictures are considered.
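As a small illustration of why a straight-line program can be exponentially smaller than the string it generates, the Python sketch below builds a grammar with k + 1 rules that derives a string of length 2^k, and answers a character query without expanding the string. The rule encoding (a dict mapping each nonterminal to either a terminal string or a pair of nonterminals) and the function names `doubling_slp`, `lengths`, `expand`, and `char_at` are assumptions made for this example; they are not taken from the survey.

```python
def doubling_slp(k):
    """Rules X0 -> 'a' and Xi -> X(i-1) X(i-1); Xk derives 'a' * 2**k."""
    rules = {"X0": "a"}
    for i in range(1, k + 1):
        rules[f"X{i}"] = (f"X{i - 1}", f"X{i - 1}")
    return rules


def lengths(rules):
    """Length of the string derived by every nonterminal, computed once."""
    memo = {}

    def go(sym):
        if sym not in memo:
            rhs = rules[sym]
            if isinstance(rhs, str):      # terminal rule
                memo[sym] = len(rhs)
            else:                         # binary rule: concatenation
                memo[sym] = go(rhs[0]) + go(rhs[1])
        return memo[sym]

    for sym in rules:
        go(sym)
    return memo


def expand(rules, sym):
    """Fully decompress; only sensible for small derived strings."""
    rhs = rules[sym]
    if isinstance(rhs, str):
        return rhs
    return expand(rules, rhs[0]) + expand(rules, rhs[1])


def char_at(rules, lens, sym, i):
    """Return character i (0-based) of the string derived by sym,
    walking down the grammar instead of decompressing it."""
    rhs = rules[sym]
    while not isinstance(rhs, str):
        left_len = lens[rhs[0]]
        if i < left_len:
            sym = rhs[0]
        else:
            i -= left_len
            sym = rhs[1]
        rhs = rules[sym]
    return rhs[i]
```

With `rules = doubling_slp(30)`, the grammar has 31 rules while `lengths(rules)["X30"]` is 2**30, and `char_at` needs only as many steps as the grammar is deep.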
The Wavelet Matrix

Cited by 9 (4 self)
Abstract. The wavelet tree (Grossi et al., SODA 2003) is nowadays a popular succinct data structure for text indexes, discrete grids, and many other applications. When it has many nodes, a levelwise representation proposed by Mäkinen and Navarro (LATIN 2006) is preferable. We propose a different arrangement of the levelwise data, so that the bitmaps are shuffled in a different way. The result can no more be called a wavelet tree, and we dub it wavelet matrix. We demonstrate that the wavelet matrix is simpler to build, simpler to query, and faster in practice than the levelwise wavelet tree. This has a direct impact on many applications that use the levelwise wavelet tree for different purposes.
Fast and tiny structural self-indexes for XML
CoRR

Cited by 5 (5 self)
XML document markup is highly repetitive and therefore well compressible using dictionary-based methods such as DAGs or grammars. In the context of selectivity estimation, grammar-compressed trees were used before as synopsis for structural XPath queries. Here a fully-fledged index over such grammars is presented. The index allows to execute arbitrary tree algorithms with a slowdown that is comparable to the space improvement. More interestingly, certain algorithms execute much faster over the index (because no decompression occurs). E.g., for structural XPath count queries, evaluating over the index is faster than previous XPath implementations, often by two orders of magnitude. The index also allows to serialize XML results (including texts) faster than previous systems, by a factor of ca. 2–3. This is due to efficient copy handling of grammar repetitions, and because materialization is totally avoided. In order to compare with twig join implementations, we implemented a materializer which writes out preorder numbers of result nodes, and show its competitiveness.
Compact binary relation representations with rich functionality
2013

Cited by 5 (3 self)
Binary relations are an important abstraction arising in many data representation problems. The data structures proposed so far to represent them support just a few basic operations required to fit one particular application. We identify many of those operations arising in applications and generalize them into a wide set of desirable queries for a binary relation representation. We also identify reductions among those operations. We then introduce several novel binary relation representations, some simple and some quite sophisticated, that not only are space-efficient but also efficiently support a large subset of the desired queries.
Orthogonal range searching for text indexing
In Space-Efficient Data Structures, Streams, and Algorithms, 2013

Cited by 5 (2 self)
Abstract. Text indexing, the problem in which one desires to preprocess a (usually large) text for future (shorter) queries, has been researched ever since the suffix tree was invented in the early 70’s. With textual data continuing to increase and with changes in the way it is accessed, new data structures and new algorithmic methods are continuously required. Therefore, text indexing is of utmost importance and is a very active research domain. Orthogonal range searching, classically associated with the computational geometry community, is one of the tools that has increasingly become important for various text indexing applications. Initially, in the mid 90’s there were a couple of results recognizing this connection. In the last few years we have seen an increase in use of this method and are reaching a deeper understanding of the range searching uses for text indexing. In this monograph we survey some of these results.