## Tree compression with top trees. (2013)

Venue: | In Proc. ICALP |

Citations: | 4 - 0 self |

### Citations

230 | XMill: an efficient compressor for XML data
- Liefke, Suciu
- 2000
Citation Context ...le using only constant time for queries. Such representations are called succinct data structures, and have been generalized to include a richer set of queries such as subtree-size queries [17,18] and level-ancestor queries [19]. For labeled trees, Ferragina et al. [20] gave a representation using 2n log σ + O(n) bits that supports basic navigational operations, such as find the parent of node v, the i’th child of v, and any child of v with label α. Ferragina et al. also introduced the notion of k’th order tree entropy H_k in a restricted model. In this model, used by popular XML compressors [21,22], the label of a node is a function of the labels of all its ancestors. For such a tree T, Ferragina et al. gave a representation requiring at most nH_k(T) + 2.01n + o(n) bits. Note that the above space bounds do not guarantee a compact representation when the input contains many subtree repeats or tree pattern repeats. In particular, the total space is never o(n) bits. 1.2. Our results We propose a new compression scheme for labeled trees, which we call top tree compression. To the best of our knowledge, this is the first compression scheme for trees that (i) takes advantage of tree pattern ... |

223 | Space-efficient static trees and graphs
- Jacobson
- 1989
Citation Context ...NCA returns the nearest common ancestor to a given pair of nodes. Finally, the Decompress operation decompresses and returns any rooted subtree. 1.3 Related work (Succinct data structures) Jacobson [16] was the first to observe that the naive pointer-based tree representation using Θ(n log n) bits is wasteful. He showed that unlabeled trees can be represented using 2n + o(n) bits and support various... |

160 | Succinct representation of balanced parentheses, static trees and planar graphs
- Munro, Raman
- 1997
Citation Context ...rt various queries by inspection of Θ(lg n) bits in the bit probe model. This space bound is asymptotically optimal with the information-theoretic lower bound averaged over all trees. Munro and Raman [21] showed how to achieve the same bound in the RAM model while using only constant time for queries. Such representations are called succinct data structures, and have been generalized to trees with hig... |

117 | Variations on the common subexpression problem.
- Downey, Sethi, et al.
- 1980
Citation Context ...occurrence of T′. This way, it is possible to represent T as a directed acyclic graph (DAG). Over all possible DAGs that can represent T, the smallest one is unique and can be computed in O(n) time [11]. Its size can be exponentially smaller than n. Using subtree repeats for compression was studied in [7, 13], and a Lempel-Ziv analog of subtree repeats was suggested in [1]. It is also possible to su... |

105 | Compressing XML with multiplexed hierarchical PPM models
- Cheney, J
- 2001
Citation Context ...le using only constant time for queries. Such representations are called succinct data structures, and have been generalized to include a richer set of queries such as subtree-size queries [17,18] and level-ancestor queries [19]. For labeled trees, Ferragina et al. [20] gave a representation using 2n log σ + O(n) bits that supports basic navigational operations, such as find the parent of node v, the i’th child of v, and any child of v with label α. Ferragina et al. also introduced the notion of k’th order tree entropy H_k in a restricted model. In this model, used by popular XML compressors [21,22], the label of a node is a function of the labels of all its ancestors. For such a tree T, Ferragina et al. gave a representation requiring at most nH_k(T) + 2.01n + o(n) bits. Note that the above space bounds do not guarantee a compact representation when the input contains many subtree repeats or tree pattern repeats. In particular, the total space is never o(n) bits. 1.2. Our results We propose a new compression scheme for labeled trees, which we call top tree compression. To the best of our knowledge, this is the first compression scheme for trees that (i) takes advantage of tree pattern ... |

95 | Representing trees of higher degree
- Benoit, Demaine, et al.
Citation Context ...to achieve the same bound in the RAM model while using only constant time for queries. Such representations are called succinct data structures, and have been generalized to trees with higher degrees [5] and to a richer set of queries such as subtree-size queries [21] and level-ancestor queries [15]. For labeled trees, Ferragina et al. [12] gave a representation using 2n log σ + O(n) bits that supports... |

62 | The smallest grammar problem
- Charikar, Lehman, et al.
Citation Context ...ng trees and were studied in [8,9,18–20]. Compared to DAG compression a tree grammar can be exponentially smaller than the minimal DAG [18]. Unfortunately, computing a minimal tree grammar is NP-Hard [10], and all known tree grammar based compression schemes can only support navigational queries in time proportional to the height of the grammar which can be Ω(n). 1.2 Our Results. We propose a new comp... |

56 | Path queries on compressed XML.
- Buneman, Grohe, et al.
- 2003
Citation Context ...[Figure 1: A tree T with a subtree repeat T′ (left), and a tree pattern repeat T′ (right).] ...subgraph of T. Subtree repeats are used in DAG compression [7, 13] and tree pattern repeats in tree grammars [8, 9, 18–20]. In this paper we introduce top tree compression based on top trees [3] that exploits tree pattern repeats. Compared to the existing techniques o... |

51 | Succinct ordinal trees with level-ancestor queries
- Geary, Raman, et al.
Citation Context ...esentations are called succinct data structures, and have been generalized to trees with higher degrees [5] and to a richer set of queries such as subtree-size queries [21] and level-ancestor queries [15]. For labeled trees, Ferragina et al. [12] gave a representation using 2n log σ + O(n) bits that supports basic navigational operations, such as find the parent of node v, the i’th child of v, and any c... |

50 | Efficient memory representation of XML document trees - Busatto, Lohrey, et al. - 2008 |

31 | Minimizing diameters of dynamic trees - Alstrup, Holm, et al. - 1997 |

31 | Random access to grammar-compressed strings.
- Bille, Landau, et al.
- 2011
Citation Context ...ally smaller than n. Using subtree repeats for compression was studied in [7, 13], and a Lempel-Ziv analog of subtree repeats was suggested in [1]. It is also possible to support navigational queries [6] and path queries [7] directly on the DAG representation in logarithmic time. The problem with subtree repeats is that we can miss many internal repeats. Consider for example the case where T is a sin... |

31 | Algorithms and Data Structures in VLSI Design.
- Meinel, Theobald
Citation Context ...pression schemes (see e.g., the recent survey of Sakr [8]). DAG compression. Using subtree repeats, a node in the tree T that has a child with subtree T′ can instead point to any other occurrence of T′. This way, it is possible to represent T as a Directed Acyclic Graph (DAG). Over all possible DAGs that can represent T, the smallest one is unique and can be computed in O(n) time [9]. Its size can be exponentially smaller than n. DAG representations of trees are broadly used for identifying and sharing common subexpressions, e.g., in programming languages [10] and binary decision diagrams [11]. Compression based on DAGs has also been studied more recently in [1,2,12] and a Lempel–Ziv analog of subtree repeats was suggested in [13]. It is possible to support navigational queries [14] and path queries [1] directly on the DAG representation in logarithmic time. The problem with subtree repeats is that we can miss many internal repeats. Consider for example the case where T is a single path of n nodes with the same label. Even though T is highly compressible (we can represent it by just storing the label and the path length) it does not contain a single subtree repeat and its minimal D... |

30 | Query evaluation on compressed trees
- Frick, Grohe, et al.
- 2003
Citation Context ...[Figure 1: A tree T with a subtree repeat T′ (left), and a tree pattern repeat T′ (right).] ...subgraph of T. Subtree repeats are used in DAG compression [7, 13] and tree pattern repeats in tree grammars [8, 9, 18–20]. In this paper we introduce top tree compression based on top trees [3] that exploits tree pattern repeats. Compared to the existing techniques o... |

30 | Efficient algorithms for Lempel-Ziv encodings
- Gasieniec, Karpinski, et al.
- 1996
Citation Context ...and compresses better than a logarithmic factor of the DAG compression. • Pattern matching in compressed strings is a well-studied and well-developed area with numerous results, see e.g., the surveys [14,17,22]. Pattern matching in compressed trees (especially within tree compression schemes that exploit tree pattern repeats) is a wide open area. • We wonder if top tree compression is practical. In prelimin... |

22 | Compressing and indexing labeled trees, with applications
- Ferragina, Luccio, et al.
- 2009
Citation Context ...tures, and have been generalized to trees with higher degrees [5] and to a richer set of queries such as subtree-size queries [21] and level-ancestor queries [15]. For labeled trees, Ferragina et al. [12] gave a representation using 2n log σ + O(n) bits that supports basic navigational operations, such as find the parent of node v, the i’th child of v, and any child of v with label α. All the above boun... |

20 | The complexity of tree automata and XPath on grammar-compressed trees
- Lohrey, Maneth
Citation Context ...Tree grammars generalize grammars from deriving strings to deriving trees and were studied in [8,9,18–20]. Compared to DAG compression a tree grammar can be exponentially smaller than the minimal DAG [18]. Unfortunately, computing a minimal tree grammar is NP-Hard [10], and all known tree grammar based compression schemes can only support navigational queries in time proportional to the height of the ... |

20 | Grammar compression, LZ-encodings, and string algorithms with implicit input.
- Rytter
- 2004
Citation Context ...and compresses better than a logarithmic factor of the DAG compression. • Pattern matching in compressed strings is a well-studied and well-developed area with numerous results, see e.g., the surveys [14,17,22]. Pattern matching in compressed trees (especially within tree compression schemes that exploit tree pattern repeats) is a wide open area. • We wonder if top tree compression is practical. In prelimin... |

17 | XML compression techniques: a survey and comparison
- Sakr
- 2009
Citation Context ...t). either DAG compression or tree grammars), our scheme can compress exponentially better than DAG compression, and the compression ratio is never worse than DAG compression by more than a log n factor. 1.1. Previous work The previous work on tree compression can be described by three major approaches: using subtree repeats, using tree pattern repeats, and using succinct data structures. Below we briefly discuss these approaches and the existing tree compression schemes. Extensive practical work has recently been done on all these tree compression schemes (see e.g., the recent survey of Sakr [8]). DAG compression. Using subtree repeats, a node in the tree T that has a child with subtree T′ can instead point to any other occurrence of T′. This way, it is possible to represent T as a Directed Acyclic Graph (DAG). Over all possible DAGs that can represent T, the smallest one is unique and can be computed in O(n) time [9]. Its size can be exponentially smaller than n. DAG representations of trees are broadly used for identifying and sharing common subexpressions, e.g., in programming languages [10] and binary decision diagrams [11]. Compression based on DAGs has also been studied mor... |

16 | Maintaining center and median in dynamic trees - Alstrup, Holm, et al. - 2000 |

16 | Tree transducers and tree compressions. - Maneth, Busatto - 2004 |

12 | Maintaining information in fully-dynamic trees with top trees
- Alstrup, Holm, et al.
- 2003
Citation Context ...lle † phbi@dtu.dk Inge Li Gørtz† inge@dtu.dk Gad M. Landau‡ landau@cs.haifa.ac.il Oren Weimann† oren@cs.haifa.ac.il Abstract We introduce a new compression scheme for labeled trees based on top trees [3]. Our compression scheme is the first to simultaneously take advantage of internal repeats in the tree (as opposed to the classical DAG compression that only exploits rooted subtree repeats) while als... |

10 | Algorithmics on SLP-compressed strings: a survey
- Lohrey
- 2012
Citation Context ...and compresses better than a logarithmic factor of the DAG compression. • Pattern matching in compressed strings is a well-studied and well-developed area with numerous results, see e.g., the surveys [14,17,22]. Pattern matching in compressed trees (especially within tree compression schemes that exploit tree pattern repeats) is a wide open area. • We wonder if top tree compression is practical. In prelimin... |

10 | Tree structure compression with repair. - Lohrey, Maneth, et al. - 2011 |

7 | Grammar-based tree compression - Busatto, Lohrey, et al. - 2004 |

6 | XML compression via DAGs
- Lohrey, Maneth, et al.
- 2013
Citation Context ...sion. Using subtree repeats, a node in the tree T that has a child with subtree T′ can instead point to any other occurrence of T′. This way, it is possible to represent T as a Directed Acyclic Graph (DAG). Over all possible DAGs that can represent T, the smallest one is unique and can be computed in O(n) time [9]. Its size can be exponentially smaller than n. DAG representations of trees are broadly used for identifying and sharing common subexpressions, e.g., in programming languages [10] and binary decision diagrams [11]. Compression based on DAGs has also been studied more recently in [1,2,12] and a Lempel–Ziv analog of subtree repeats was suggested in [13]. It is possible to support navigational queries [14] and path queries [1] directly on the DAG representation in logarithmic time. The problem with subtree repeats is that we can miss many internal repeats. Consider for example the case where T is a single path of n nodes with the same label. Even though T is highly compressible (we can represent it by just storing the label and the path length) it does not contain a single subtree repeat and its minimal DAG is of size n. Tree grammars. Alternatively, tree grammars are capable of... |

3 | Lempel–Ziv compression of highly structured documents
- Adiego, Navarro, et al.
- 2007
Citation Context ...th subtree T′ can instead point to any other occurrence of T′. This way, it is possible to represent T as a Directed Acyclic Graph (DAG). Over all possible DAGs that can represent T, the smallest one is unique and can be computed in O(n) time [9]. Its size can be exponentially smaller than n. DAG representations of trees are broadly used for identifying and sharing common subexpressions, e.g., in programming languages [10] and binary decision diagrams [11]. Compression based on DAGs has also been studied more recently in [1,2,12] and a Lempel–Ziv analog of subtree repeats was suggested in [13]. It is possible to support navigational queries [14] and path queries [1] directly on the DAG representation in logarithmic time. The problem with subtree repeats is that we can miss many internal repeats. Consider for example the case where T is a single path of n nodes with the same label. Even though T is highly compressible (we can represent it by just storing the label and the path length) it does not contain a single subtree repeat and its minimal DAG is of size n. Tree grammars. Alternatively, tree grammars are capable of exploiting tree pattern repeats. Tree grammars generalize gramma... |

2 | Lempel-Ziv compression of highly structured documents
- Adiego, Navarro, et al.
Citation Context ... be computed in O(n) time [11]. Its size can be exponentially smaller than n. Using subtree repeats for compression was studied in [7, 13], and a Lempel-Ziv analog of subtree repeats was suggested in [1]. It is also possible to support navigational queries [6] and path queries [7] directly on the DAG representation in logarithmic time. The problem with subtree repeats is that we can miss many interna... |