Results 1  10
of
35
Fullyfunctional succinct trees
 In Proc. 21st SODA
, 2010
"... We propose new succinct representations of ordinal trees, which have been studied extensively. It is known that any nnode static tree can be represented in 2n + o(n) bits and a large number of operations on the tree can be supported in constant time under the wordRAM model. However existing data s ..."
Abstract

Cited by 62 (21 self)
 Add to MetaCart
(Show Context)
We propose new succinct representations of ordinal trees, which have been studied extensively. It is known that any nnode static tree can be represented in 2n + o(n) bits and a large number of operations on the tree can be supported in constant time under the wordRAM model. However existing data structures are not satisfactory in both theory and practice because (1) the lowerorder term is Ω(nlog log n / log n), which cannot be neglected in practice, (2) the hidden constant is also large, (3) the data structures are complicated and difficult to implement, and (4) the techniques do not extend to dynamic trees supporting insertions and deletions of nodes. We propose a simple and flexible data structure, called the range minmax tree, that reduces the large number of relevant tree operations considered in the literature to a few primitives, which are carried out in constant time on sufficiently small trees. The result is then extended to trees of arbitrary size, achieving 2n + O(n/polylog(n)) bits of space. The redundancy is significantly lower than in any previous proposal, and the data structure is easily implemented. Furthermore, using the same framework, we derive the first fullyfunctional dynamic succinct trees. 1
Orthogonal Range Searching on the RAM, Revisited
, 2011
"... We present a number of new results on one of the most extensively studied topics in computational geometry, orthogonal range searching. All our results are in the standard word RAM model: 1. We present two data structures for 2d orthogonal range emptiness. The first achieves O(n lg lg n) space and ..."
Abstract

Cited by 37 (7 self)
 Add to MetaCart
(Show Context)
We present a number of new results on one of the most extensively studied topics in computational geometry, orthogonal range searching. All our results are in the standard word RAM model: 1. We present two data structures for 2d orthogonal range emptiness. The first achieves O(n lg lg n) space and O(lg lg n) query time, assuming that the n given points are in rank space. This improves the previous results by Alstrup, Brodal, and Rauhe (FOCS’00), with O(n lg ε n) space and O(lg lg n) query time, or with O(n lg lg n) space and O(lg 2 lg n) query time. Our second data structure uses O(n) space and answers queries in O(lg ε n) time. The best previous O(n)space data structure, due to Nekrich (WADS’07), answers queries in O(lg n / lg lg n) time. 2. We give a data structure for 3d orthogonal range reporting with O(n lg 1+ε n) space and O(lg lg n+ k) query time for points in rank space, for any constant ε> 0. This improves the previous results by Afshani (ESA’08), Karpinski and Nekrich (COCOON’09), and Chan (SODA’11), with O(n lg 3 n) space and O(lg lg n + k) query time, or with O(n lg 1+ε n) space and O(lg 2 lg n + k) query time. Consequently, we obtain improved upper bounds for orthogonal range reporting in all constant dimensions above 3.
Wavelet Trees for All
"... The wavelet tree is a versatile data structure that serves a number of purposes, from string processing to geometry. It can be regarded as a device that represents a sequence, a reordering, or a grid of points. In addition, its space adapts to various entropy measures of the data it encodes, enabli ..."
Abstract

Cited by 32 (12 self)
 Add to MetaCart
The wavelet tree is a versatile data structure that serves a number of purposes, from string processing to geometry. It can be regarded as a device that represents a sequence, a reordering, or a grid of points. In addition, its space adapts to various entropy measures of the data it encodes, enabling compressed representations. New competitive solutions to a number of problems, based on wavelet trees, are appearing every year. In this survey we give an overview of wavelet trees and the surprising number of applications in which we have found them useful: basic and weighted point grids, sets of rectangles, strings, permutations, binary relations, graphs, inverted indexes, document retrieval indexes, fulltext indexes, XML indexes, and general numeric sequences.
Colored Range Queries and Document Retrieval
"... Colored range queries are a wellstudied topic in computational geometry and database research that, in the past decade, have found exciting applications in information retrieval. In this paper we give improved time and space bounds for three important onedimensional colored range queries — colore ..."
Abstract

Cited by 31 (18 self)
 Add to MetaCart
(Show Context)
Colored range queries are a wellstudied topic in computational geometry and database research that, in the past decade, have found exciting applications in information retrieval. In this paper we give improved time and space bounds for three important onedimensional colored range queries — colored range listing, colored range topk queries and colored range counting — and, thus, new bounds for various document retrieval problems on general collections of sequences. Specifically, we first describe a framework including almost all recent results on colored range listing and document listing, which suggests new combinations of data structures for these problems. For example, we give the fastest compressed data structures for colored range listing and document listing, and an efficient data structure for document listing whose size is bounded in terms of the highorder entropies of the library of documents. We then show how (approximate) colored topk queries can be reduced to (approximate) rangemode queries on subsequences, yielding the first efficient data structure for this problem. Finally, we show how a modified wavelet tree can support colored range counting in logarithmic time and space that is succinct whenever the number of colors is superpolylogarithmic in the length of the sequence.
Topk document retrieval in optimal time and linear space
 In Proc. 22nd Annual ACMSIAM Symposium on Discrete Algorithms (SODA 2012
, 2012
"... We describe a data structure that uses O(n)word space and reports k most relevant documents that contain a query pattern P in optimal O(P  + k) time. Our construction supports an ample set of important relevance measures, such as the frequency of P in a document and the minimal distance between t ..."
Abstract

Cited by 28 (17 self)
 Add to MetaCart
(Show Context)
We describe a data structure that uses O(n)word space and reports k most relevant documents that contain a query pattern P in optimal O(P  + k) time. Our construction supports an ample set of important relevance measures, such as the frequency of P in a document and the minimal distance between two occurrences of P in a document. We show how to reduce the space of the data structure from O(n log n) to O(n(log σ+log D+log log n)) bits, where σ is the alphabet size and D is the total number of documents. 1
SelfIndexing Based on LZ77
"... We introduce the first selfindex based on the LempelZiv 1977 compression format (LZ77). It is particularly competitive for highly repetitive text collections such as sequence databases of genomes of related species, software repositories, versioned document collections, and temporal text databases ..."
Abstract

Cited by 26 (6 self)
 Add to MetaCart
(Show Context)
We introduce the first selfindex based on the LempelZiv 1977 compression format (LZ77). It is particularly competitive for highly repetitive text collections such as sequence databases of genomes of related species, software repositories, versioned document collections, and temporal text databases. Such collections are extremely compressible but classical selfindexes fail to capture that source of compressibility. Our selfindex takes in practice a few times the space of the text compressed with LZ77 (as little as 2.5 times), extracts 1–2 million characters of the text per second, and finds patterns at a rate of 10–50 microseconds per occurrence. It is smaller (up to one half) than the best current selfindex for repetitive collections, and faster in many cases.
Fullyfunctional static and dynamic succinct trees
, 2010
"... We propose new succinct representations of ordinal trees, which have been studied extensively. It is known that any nnode static tree can be represented in 2n + o(n) bits and various operations on the tree can be supported in constant time under the wordRAM model. However the data structures are c ..."
Abstract

Cited by 26 (16 self)
 Add to MetaCart
(Show Context)
We propose new succinct representations of ordinal trees, which have been studied extensively. It is known that any nnode static tree can be represented in 2n + o(n) bits and various operations on the tree can be supported in constant time under the wordRAM model. However the data structures are complicated and difficult to dynamize. We propose a simple and flexible data structure, called the range minmax tree, that reduces the large number of relevant tree operations considered in the literature, to a few primitives that are carried out in constant time on sufficiently small trees. The result is extended to trees of arbitrary size, achieving 2n + O(n/polylog(n)) bits of space. The redundancy is significantly lower than any previous proposal. For the dynamic case, where insertion/deletion of nodes is allowed, the existing data structures support very limited operations. Our data structure builds on the range minmax tree to achieve 2n + O(n / log n) bits of space and O(log n) time for all the operations. We also propose an improved data structure using 2n+O(n loglog n / logn) bits and improving the time to O(log n / loglog n) for most operations.
On Compressing and Indexing Repetitive Sequences
, 2011
"... We introduce LZEnd, a new member of the LempelZiv family of text compressors, which achieves compression ratios close to those of LZ77 but performs much faster at extracting arbitrary text substrings. We then build the first selfindex based on LZ77 (or LZEnd) compression, which in addition to te ..."
Abstract

Cited by 25 (6 self)
 Add to MetaCart
We introduce LZEnd, a new member of the LempelZiv family of text compressors, which achieves compression ratios close to those of LZ77 but performs much faster at extracting arbitrary text substrings. We then build the first selfindex based on LZ77 (or LZEnd) compression, which in addition to text extraction offers fast indexed searches on the compressed text. This selfindex is particularly effective to represent highly repetitive sequence collections, which arise for example when storing versioned documents, software repositories, periodic publications, and biological sequence databases.
SelfIndexed GrammarBased Compression
, 2001
"... Selfindexes aim at representing text collections in a compressed format that allows extracting arbitrary portions and also offers indexed searching on the collection. Current selfindexes are unable of fully exploiting the redundancy of highly repetitive text collections that arise in several appl ..."
Abstract

Cited by 19 (7 self)
 Add to MetaCart
Selfindexes aim at representing text collections in a compressed format that allows extracting arbitrary portions and also offers indexed searching on the collection. Current selfindexes are unable of fully exploiting the redundancy of highly repetitive text collections that arise in several applications. Grammarbased compression is well suited to exploit such repetitiveness. We introduce the first grammarbased selfindex. It builds on StraightLine Programs (SLPs), a rather general kind of contextfree grammars. If an SLP of n rules represents a text T [1, u], then an SLPcompressed representation of T requires 2n log 2 n bits. For that same SLP, our selfindex takes O(n log n) + n log 2 u bits. It extracts any text substring of length m in time O((m + h) log n), and finds occ occurrences of a pattern string of length m in time O((m(m + h) + h occ) log n), where h is the height of the parse tree of the SLP. No previous grammar representation had achieved o(n) search time. As byproducts we introduce (i) a representation of SLPs that takes 2n log 2 n(1 + o(1)) bits and efficiently supports more operations than a plain array of rules; (ii) a representation for binary relations with labels supporting various extended queries; (iii) a generalization of our selfindex to grammar
LinearSpace Data Structures for Range Mode Query in Arrays
"... A mode of a multiset S is an element a ∈ S of maximum multiplicity; that is, a occurs at least as frequently as any other element in S. Given an array A[1: n] of n elements, we consider a basic problem: constructing a static data structure that efficiently answers range mode queries on A. Each query ..."
Abstract

Cited by 18 (8 self)
 Add to MetaCart
(Show Context)
A mode of a multiset S is an element a ∈ S of maximum multiplicity; that is, a occurs at least as frequently as any other element in S. Given an array A[1: n] of n elements, we consider a basic problem: constructing a static data structure that efficiently answers range mode queries on A. Each query consists of an input pair of indices (i, j) for which a mode of A[i: j] must be returned. The best previous data structure with linear space, by Krizanc, Morin, and Smid (ISAAC 2003), requires O ( √ n log log n) query time. We improve their result and present an O(n)space data structure that supports range mode queries in O ( p n / log n) worstcase time. Furthermore, we present strong evidence that a query time significantly below √ n cannot be achieved by purely combinatorial techniques; we show that boolean matrix multiplication of two √ n × √ n matrices reduces to n range mode queries in an array of size O(n). Additionally, we give linearspace data structures for orthogonal range mode in higher dimensions (queries in near O(n 1−1/2d) time) and for halfspace range mode in higher dimensions (queries in O(n 1−1/d2) time).