Results 1 - 10
of
38
Efficient Memory Representation of XML Document Trees.
- Inf. Syst.
, 2008
"... Abstract Implementations that load XML documents and give access to them via, e.g., the DOM, suffer from huge memory demands: the space needed to load an XML document is usually many times larger than the size of the document. A considerable amount of memory is needed to store the tree structure of ..."
Abstract
-
Cited by 51 (14 self)
- Add to MetaCart
(Show Context)
Abstract Implementations that load XML documents and give access to them via, e.g., the DOM, suffer from huge memory demands: the space needed to load an XML document is usually many times larger than the size of the document. A considerable amount of memory is needed to store the tree structure of the XML document. Here a technique is presented that allows to represent the tree structure of an XML document in an efficient way. The representation exploits the high regularity in XML documents by "compressing" their tree structure; the latter means to detect and remove repetitions of tree patterns. The functionality of basic tree operations, like traversal along edges, is preserved in the compressed representation. This allows to directly execute queries (and in particular, bulk operations) without prior decompression. For certain tasks like validation against an XML type or checking equality of documents, the representation allows for provably more efficient algorithms than those running on conventional representations.
Statistical Learning Techniques for Costing XML Queries
- In VLDB
, 2005
"... Developing cost models for query optimization is significantly harder for XML queries than for traditional relational queries. The reason is that XML query operators are much more complex than relational operators such as table scans and joins. In this paper, we propose a new approach, called C ..."
Abstract
-
Cited by 36 (4 self)
- Add to MetaCart
(Show Context)
Developing cost models for query optimization is significantly harder for XML queries than for traditional relational queries. The reason is that XML query operators are much more complex than relational operators such as table scans and joins. In this paper, we propose a new approach, called Comet, to modeling the cost of XML operators; to our knowledge, Comet is the first method ever proposed for addressing the XML query costing problem. As in relational cost estimation, Comet exploits a set of system catalog statistics that summarizes the XML data; the set of "simple path" statistics that we propose is new, and is well suited to the XML setting.
XSEED: Accurate and fast cardinality estimation for XPath queries
- In to appear Proc. 22nd Int. Conf. on Data Engineering (ICDE
, 2006
"... We propose XSEED, a synopsis of path queries for cardinality estimation that is accurate, robust, efficient, and adaptive to memory budgets. XSEED starts from a very small kernel, and then incrementally updates information of the synopsis. With such an incremental construction, a synopsis structure ..."
Abstract
-
Cited by 22 (1 self)
- Add to MetaCart
(Show Context)
We propose XSEED, a synopsis of path queries for cardinality estimation that is accurate, robust, efficient, and adaptive to memory budgets. XSEED starts from a very small kernel, and then incrementally updates information of the synopsis. With such an incremental construction, a synopsis structure can be dynamically configured to accommodate different memory budgets. Cardinality estimation based on XSEED can be performed very efficiently and accurately. Extensive experiments on both synthetic and real data sets show that even with less memory, XSEED could achieve accuracy that is an order of magnitude better than that of other synopsis structures. The cardinality estimation time is under 2 % of the actual querying time for a wide range of queries in all test cases. 1
FIX: Feature-based Indexing Technique for XML Documents
, 2006
"... Indexing large XML databases is crucial for efficient evaluation of XML twig queries. In this paper, we propose a feature-based indexing technique, called FIX, based on spectral graph theory. The basic idea is that for each twig pattern in a collection of XML documents, we calculate a vector of feat ..."
Abstract
-
Cited by 19 (2 self)
- Add to MetaCart
Indexing large XML databases is crucial for efficient evaluation of XML twig queries. In this paper, we propose a feature-based indexing technique, called FIX, based on spectral graph theory. The basic idea is that for each twig pattern in a collection of XML documents, we calculate a vector of features based on its structural properties. These features are used as keys for the patterns and stored in a B + tree. Given an XPath query, its feature vector is first calculated and looked up in the index. Then a further refinement phase is performed to fetch the final results. We experimentally study the indexing technique over both synthetic and real data sets. Our experiments show that FIX provides great pruning power and could gain an order of magnitude performance improvement for many XPath queries over existing evaluation techniques.
Efficient updates in dynamic XML data: from binary string to quaternary string
- THE VLDB JOURNAL
"... ..."
System RX: one part relational, one part XML
- in: Proceedings of ACM SIGMOD International Conference on Management of Data, 2005
"... This paper describes the overall architecture and design aspects of a hybrid relational and XML database system called System RX. We believe that such a system is fundamental in the evolution of enterprise data management solutions: XML and relational data will co-exist and complement each other in ..."
Abstract
-
Cited by 17 (0 self)
- Add to MetaCart
(Show Context)
This paper describes the overall architecture and design aspects of a hybrid relational and XML database system called System RX. We believe that such a system is fundamental in the evolution of enterprise data management solutions: XML and relational data will co-exist and complement each other in enterprise solutions. Furthermore, a successful XML repository requires much of the same infrastructure that already exists in a relational database management system. Finally, XML query languages have con-siderable conceptual and functional overlap with relational data-flow engines. System RX is the first truly hybrid system that co-mingles XML and relational data, giving them equal footing. The new support for XML includes native support for storage and indexing as well as query compilation and evaluation support for the latest industry-standard query languages, SQL/XML and XQuery. By building a hybrid system, we leverage more than 20 years of data management research to advance XML technology to the same standards expected from mature relational systems. 1.
Query optimization in XML structured-document databases
- THE VLDB JOURNAL
, 2006
"... While the information published in the form of XML-compliant documents keeps fast mounting up, efficient and effective query processing and optimization for XML have now become more important than ever. This article reports our recent advances in XML structureddocument query optimization. In this ar ..."
Abstract
-
Cited by 16 (0 self)
- Add to MetaCart
While the information published in the form of XML-compliant documents keeps fast mounting up, efficient and effective query processing and optimization for XML have now become more important than ever. This article reports our recent advances in XML structureddocument query optimization. In this article, we elaborate on a novel approach and the techniques developed for XML query optimization. Our approach performs heuristic-based algebraic transformations on XPath queries, represented as PAT algebraic expressions, to achieve query optimization. This article first presents a comprehensive set of general equivalences with regard to XML documents and XML queries. Based on these equivalences, we developed a large set of deterministic algebraic transformation rules for XML query optimization. Our approach is unique, in that it performs exclusively deterministic transformations on queries for fast optimization. The deterministic nature of the proposed approach straightforwardly renders high optimization efficiency and simplicity in implementation. Our approach is a logical-level one, which is independent of any particular storage model. Therefore, the optimizers developed based on our approach can be easily adapted to a broad range of XML data/information servers to achieve fast query optimization. Experimental study confirms the validity and effectiveness of the proposed approach.
Tree-Pattern Queries on a Lightweight XML Processor
- In VLDB
, 2005
"... Popular XML languages, like XPath, use "treepattern " queries to select nodes based on their structural characteristics. While many processing methods have already been proposed for such queries, none of them has found its way to any of the existing "lightweight" XML engines ..."
Abstract
-
Cited by 16 (1 self)
- Add to MetaCart
(Show Context)
Popular XML languages, like XPath, use "treepattern " queries to select nodes based on their structural characteristics. While many processing methods have already been proposed for such queries, none of them has found its way to any of the existing "lightweight" XML engines (i.e. engines without optimization modules). The main reason is the lack of a systematic comparison of query methods under a common storage model. In this work, we aim to fill this gap and answer two important questions: what the relative similarities and important differences among the tree-pattern query methods are, and if there is a prominent method among them in terms of effectiveness and robustness that an XML processor should support.
Compact Access Control Labeling for Efficient Secure XML Query Evaluation
, 2004
"... Fine-grained access controls for XML define access privileges at the granularity of individual XML nodes. In this paper, we present a fine-grained access control mechanism for XML data. This mechanism exploits the structural locality of access rights as well as correlations among the access rights o ..."
Abstract
-
Cited by 9 (2 self)
- Add to MetaCart
(Show Context)
Fine-grained access controls for XML define access privileges at the granularity of individual XML nodes. In this paper, we present a fine-grained access control mechanism for XML data. This mechanism exploits the structural locality of access rights as well as correlations among the access rights of different users to produce a compact physical encoding of the access control data. This encoding can be constructed using a single pass over a labeled XML database. It is block-oriented and suitable for use in secondary storage. We show how this access control mechanism can be integrated with a next-of-kin (NoK) XML query processor to provide efficient, secure query evaluation. The key idea is that the structural information of the nodes and their encoded access controls are stored together so the access privileges can be checked efficiently. Our evaluation shows that the access control mechanism introduces little overhead into the query evaluation process. 1