Results 1-10 of 51
Random Access to Grammar-Compressed Strings
, 2011
Cited by 31 (3 self)
Let S be a string of length N compressed into a context-free grammar 𝒮 of size n. We present two representations of 𝒮 achieving O(log N) random access time, and either O(n · α_k(n)) construction time and space on the pointer machine model, or O(n) construction time and space on the RAM. Here, α_k(n) is the inverse of the k-th row of Ackermann’s function. Our representations also efficiently support decompression of any substring in S: we can decompress any substring of length m in the same complexity as a single random access query and additional O(m) time. Combining these results with fast algorithms for uncompressed approximate string matching leads to several efficient algorithms for approximate string matching on grammar-compressed strings without decompression. For instance, we can find all approximate occurrences of a pattern P with at most k errors in time O(n(min{|P|k, k^4 + |P|} + log N) + occ), where occ is the number of occurrences of P in S. Finally, we are able to generalize our results to navigation and other operations on grammar-compressed trees. All of the above bounds significantly improve the currently best known results. To achieve these bounds, we introduce several new techniques and data structures of independent interest, including a predecessor data structure, two “biased” weighted ancestor data structures, and a compact representation of heavy paths in grammars.
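To make the random access problem in the abstract above concrete, here is a minimal Python sketch. It is not the paper's data structure: it only stores the expansion length of every nonterminal and walks down from the start symbol, so a query costs time proportional to the grammar's height rather than O(log N). The grammar encoding (a dict of binary rules listed bottom-up), the names `RULES`, `expansion_lengths`, and `access`, and the example grammar are all assumptions made for illustration.

```python
from typing import Dict, Tuple, Union

# Hypothetical encoding: a rule is either a terminal character
# or a pair (left, right) of nonterminal names.
Rule = Union[str, Tuple[str, str]]

# Example grammar (assumed for illustration); X6 expands to "abaababa".
# Rules are listed bottom-up: children appear before their parents.
RULES: Dict[str, Rule] = {
    "X1": "b",
    "X2": "a",
    "X3": ("X2", "X1"),
    "X4": ("X3", "X2"),
    "X5": ("X4", "X3"),
    "X6": ("X5", "X4"),
}


def expansion_lengths(rules: Dict[str, Rule]) -> Dict[str, int]:
    """Compute the length of the string each nonterminal expands to.

    Assumes the dict is ordered bottom-up, so one pass suffices.
    """
    lengths: Dict[str, int] = {}
    for name, rule in rules.items():
        if isinstance(rule, str):
            lengths[name] = 1
        else:
            left, right = rule
            lengths[name] = lengths[left] + lengths[right]
    return lengths


def access(rules: Dict[str, Rule], lengths: Dict[str, int],
           start: str, i: int) -> str:
    """Return character i (0-based) of the expansion of `start`
    without decompressing the whole string."""
    if not 0 <= i < lengths[start]:
        raise IndexError("position outside the expanded string")
    node = start
    while True:
        rule = rules[node]
        if isinstance(rule, str):
            return rule
        left, right = rule
        if i < lengths[left]:
            node = left           # target lies in the left child
        else:
            i -= lengths[left]    # skip the left child's expansion
            node = right
```

A call such as `access(RULES, expansion_lengths(RULES), "X6", 3)` descends through at most one rule per grammar level. On an unbalanced grammar that height can be as large as n, which is the gap the O(log N) bound in the abstract addresses.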
Compressing and indexing labeled trees, with applications
, 2009
Cited by 22 (1 self)
Consider an ordered, static tree T where each node has a label from alphabet Σ. Tree T may be of arbitrary degree and shape. Our goal is designing a compressed storage scheme of T that supports basic navigational operations among the immediate neighbors of a node (i.e. parent, i-th child, or any child with some label,...) as well as more sophisticated path-based search operations over its labeled structure. We present a novel approach to this problem by designing what we call the XBW-transform of the tree in the spirit of the well-known Burrows-Wheeler transform for strings [1994]. The XBW-transform uses path-sorting to linearize the labeled tree T into two coordinated arrays, one capturing the structure and the other the labels. For the first time, by using the properties of the XBW-transform, our compressed indexes go beyond the information-theoretic lower bound, and support navigational and path-search operations over labeled trees within (near-)optimal time bounds and entropy-bounded space. Our XBW-transform is simple and likely to spur new results in the theory of tree compression and indexing, as well as interesting application contexts. As an example, we use the XBW-transform to design and implement a compressed index for XML documents whose compression ratio is significantly better than the one achievable by state-of-the-art tools, and its query time performance is order
Word problems and membership problems on compressed words
 SIAM J. Comput., 35(5):1210
Cited by 19 (9 self)
Abstract. We consider a compressed form of the word problem for finitely presented monoids, where the input consists of two compressed representations of words over the generators of a monoid M, and we ask whether these two words represent the same monoid element of M. Words are compressed using straight-line programs, i.e., context-free grammars that generate exactly one word. For several classes of finitely presented monoids we obtain completeness results for complexity classes in the range from P to EXPSPACE. As a by-product of our results on compressed word problems we obtain a fixed deterministic context-free language with a PSPACE-complete compressed membership problem. The existence of such a language was open so far. Finally, we will investigate the complexity of the compressed membership problem for various circuit complexity classes. Key words. grammar-based compression, word problems for monoids, context-free languages, complexity. AMS subject classifications. 20F10, 68Q17, 68Q42
Structural selectivity estimation for XML documents
 In ICDE
, 2007
Cited by 13 (5 self)
Estimating the selectivity of queries is a crucial problem in database systems. Virtually all database systems rely on the use of selectivity estimates to choose amongst the many possible execution plans for a particular query. In terms of XML databases, the problem of selectivity estimation of queries presents new challenges: many evaluation operators are possible, such as simple navigation, structural joins, or twig joins, and many different indexes are possible ranging from traditional B-trees to complicated XML-specific graph indexes. A new synopsis for XML documents is introduced which can be effectively used to estimate the selectivity of complex path queries. The synopsis is based on a lossy compression of the document tree that underlies the XML document, and can be computed in one pass from the document. It has several advantages over existing approaches: (1) it allows one to estimate the selectivity of queries containing all XPath axes, including the order-sensitive ones, (2) the estimator returns a range within which the actual selectivity is guaranteed to lie, with the size of this range implicitly providing a confidence measure of the estimate, and (3) the synopsis can be incrementally updated to reflect changes in the XML database.
Context matching for compressed terms
 In LICS 2008
, 2008
Cited by 13 (9 self)
This paper is an investigation of the matching problem for term equations s = t where s contains context variables, and both terms s and t are given using some kind of compressed representation. In this setting, term representation with dags, but also with the more general formalism of singleton tree grammars, are considered. The main result is a polynomial time algorithm for context matching with dags, when the number of different context variables is fixed for the problem. NP-completeness is obtained when the terms are represented using singleton tree grammars. The special cases of first-order matching and also unification with STGs are shown to be decidable in PTIME.
Monadic second-order unification is NP-complete
, 2006
Cited by 12 (8 self)
Bounded Second-Order Unification is the problem of deciding, for a given second-order equation t ≟ u and a positive integer m, whether there exists a unifier σ such that, for every second-order variable F, the terms instantiated for F have at most m occurrences of every bound variable. It is already known that Bounded Second-Order Unification is decidable and NP-hard, whereas general Second-Order Unification is undecidable. We prove that Bounded Second-Order Unification is NP-complete, provided that m is given in unary encoding, by proving that a size-minimal solution can be represented in polynomial space, and then applying a generalization of Plandowski’s polynomial algorithm that compares compacted terms in polynomial time.
Stratified context unification is NP-complete
 In Proc. of the 3rd International Joint Conference on Automated Reasoning, IJCAR’06
, 2006
Cited by 11 (4 self)
Abstract. Context Unification is the problem to decide for a given set of second-order equations E where all second-order variables are unary, whether there exists a unifier, such that for every second-order variable X, the abstraction λx.r instantiated for X has exactly one occurrence of the bound variable x in r. Stratified Context Unification is a specialization where the nesting of second-order variables in E is restricted. It is already known that Stratified Context Unification is decidable, NP-hard, and in PSPACE, whereas the decidability and the complexity of Context Unification is unknown. We prove that Stratified Context Unification is in NP by proving that a size-minimal solution can be represented in a singleton tree grammar of polynomial size, and then applying a generalization of Plandowski’s polynomial algorithm that compares compacted terms in polynomial time. This also demonstrates the high potential of singleton tree grammars for optimizing programs maintaining large terms. A corollary of our result is that solvability of rewrite constraints is NP-complete.
XQueC: A query-conscious compressed XML database
 ACM Trans. Internet Tech
Cited by 11 (2 self)
XML compression has gained prominence recently because it counters the disadvantage of the “verbose” representation XML gives to data. In many applications, such as data exchange and data archiving, entirely compressing and decompressing a document is acceptable. In other applications, where queries must be run over compressed documents, compression may not be beneficial since the performance penalty in running the query processor over compressed data outweighs the data compression benefits. While balancing the interests of compression and query processing has received significant attention in the domain of relational databases, these results do not immediately translate to XML data. In this paper, we address the problem of embedding compression into XML databases without degrading query performance. Since the setting is rather different from relational databases, the choice of compression granularity and compression algorithms must be revisited. Query execution in the compressed domain must also be rethought in the framework of XML query processing, due to the richer structure of XML data. Indeed, a proper storage design for the compressed data plays a crucial role here. The XQueC system (standing for XQuery Processor and Compressor) covers a wide set of
Tree structure compression with RePair
, 2010
Cited by 10 (4 self)
Larsson and Moffat’s RePair algorithm is generalized from strings to trees. The new algorithm (TreeRePair) produces straight-line linear context-free tree (SLT) grammars which are smaller than those produced by previous grammar-based compressors such as BPLEX. Experiments show that a Huffman-based coding of the resulting grammars gives compression ratios comparable to the best known XML file compressors. Moreover, SLT grammars can be used as efficient memory representation of trees. Our investigations show that tree traversals over TreeRePair grammars are 14 times slower than over pointer structures and 5 times slower than over succinct trees, while memory consumption is only 1/43 and 1/6, respectively.
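For readers unfamiliar with the string algorithm that the abstract above generalizes, the following Python sketch shows the basic RePair idea on strings only: repeatedly replace the most frequent adjacent pair of symbols with a fresh nonterminal. It is a naive quadratic-time illustration, not TreeRePair and not Larsson and Moffat's linear-time implementation. The function names `repair` and `expand` and the nonterminal naming scheme `N0, N1, ...` are assumptions, and the input is assumed to be a string of single-character symbols.

```python
from collections import Counter
from typing import Dict, List, Tuple


def repair(text: str) -> Tuple[List[str], Dict[str, Tuple[str, str]]]:
    """Naive string RePair: returns (compressed sequence, grammar rules)."""
    seq: List[str] = list(text)
    rules: Dict[str, Tuple[str, str]] = {}
    next_id = 0
    while True:
        # Count adjacent pairs. Overlapping occurrences (as in "aaa") are
        # counted too, which is a simplification of the original algorithm.
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break  # no pair repeats, nothing left to gain
        name = f"N{next_id}"  # hypothetical naming scheme for nonterminals
        next_id += 1
        rules[name] = pair
        # Replace non-overlapping occurrences of the pair, left to right.
        out: List[str] = []
        i = 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(name)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules


def expand(symbol: str, rules: Dict[str, Tuple[str, str]]) -> str:
    """Decompress one symbol by recursively expanding its rule."""
    if symbol not in rules:
        return symbol  # terminal character
    left, right = rules[symbol]
    return expand(left, rules) + expand(right, rules)
```

Joining `expand(s, rules)` over the returned sequence reproduces the input, so the pair of outputs forms a straight-line grammar for the text. The tree variant described in the abstract replaces repeated parent-child patterns rather than adjacent characters.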
Parameter reduction in grammar-compressed trees
 In 12th FoSSaCS, volume 5504 of LNCS
, 2009
Cited by 10 (5 self)
Abstract. Trees can be conveniently compressed with linear straight-line context-free tree grammars. Such grammars generalize straight-line context-free string grammars which are widely used in the development of algorithms that execute directly on compressed structures (without prior decompression). It is shown that every linear straight-line context-free tree grammar can be transformed in polynomial time into a monadic (and linear) one. A tree grammar is monadic if each nonterminal uses at most one context parameter. Based on this result, a polynomial time algorithm is presented for testing whether a given nondeterministic tree automaton with sibling constraints accepts a tree given by a linear straight-line context-free tree grammar. It is shown that if tree grammars are nondeterministic or nonlinear, then reducing their numbers of parameters cannot be done without an exponential blow-up in grammar size.