MonetDB/XQuery: a fast XQuery processor powered by a relational engine
- IN SIGMOD
, 2006
Cited by 135 (26 self)
Relational XQuery systems try to re-use mature relational data management infrastructures to create fast and scalable XML database technology. This paper describes the main features, key contributions, and lessons learned while implementing such a system. Its architecture consists of (i) a range-based encoding of XML documents into relational tables, (ii) a compilation technique that translates XQuery into a basic relational algebra, (iii) a restricted (order) property-aware peephole relational query optimization strategy, and (iv) a mapping from XML update statements into relational updates. Thus, this system implements all essential XML database functionalities (rather than a single feature) such that we can learn from the full consequences of our architectural decisions. While implementing this system, we had to extend the state-of-the-art with a number of new technical contributions, such as loop-lifted staircase join and efficient relational query evaluation strategies for XQuery theta-joins with existential semantics. These contributions as well as the architectural lessons learned are also deemed valuable for other relational back-end engines. The performance and scalability of the resulting system is evaluated on the XMark benchmark up to data sizes of 11 GB. The performance section also provides an extensive comparison of all major XMark results published previously, which confirm that the goal of purely relational XQuery processing, namely speed and scalability, was met.
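To make the idea of a range-based encoding concrete, here is a minimal sketch in Python. It assumes one common variant, a (pre, size, level) row per element, where pre is the document-order rank and size is the number of descendants. The abstract does not spell out the exact columns, so the layout, the function names, and the sample document are illustrative assumptions; text nodes and attributes are ignored.

```python
import xml.etree.ElementTree as ET


def encode(xml_text):
    """Return one (pre, size, level, tag) row per element, in document order."""
    rows = []

    def visit(elem, level):
        pre = len(rows)          # document-order rank of this element
        rows.append(None)        # reserve the slot; size is known only after the subtree
        for child in elem:
            visit(child, level + 1)
        size = len(rows) - pre - 1   # number of descendants
        rows[pre] = (pre, size, level, elem.tag)

    visit(ET.fromstring(xml_text), 0)
    return rows


def descendants(rows, pre):
    """Descendant axis as a range predicate: pre < r.pre <= pre + size."""
    size = rows[pre][1]
    return [r for r in rows if pre < r[0] <= pre + size]


def children(rows, pre):
    """Child axis: descendants exactly one level deeper."""
    level = rows[pre][2]
    return [r for r in descendants(rows, pre) if r[2] == level + 1]


DOC = "<a><b><c/><d/></b><e/></a>"   # hypothetical sample document
ROWS = encode(DOC)
```

With this layout, a descendant step becomes a range selection on the pre column, which is the kind of predicate a relational engine can evaluate with ordinary scans or indexes.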
Strategies for Query Unnesting in XML Databases
- ACM TRANSACTIONS ON DATABASE SYSTEMS
, 2006
Cited by 16 (7 self)
Queries formulated in a nested way are very common in XQuery. Unfortunately, their evaluation is usually very inefficient when done in a straightforward fashion. We present a framework for handling nested queries that is based on unnesting the queries after having translated them into an algebra. We not only present a collection of algebraic equivalences, but also supply a strategy on how to use them effectively. The full potential of the approach is demonstrated by applying our rewrites to actual queries and showing that performance gains of several orders of magnitude are possible.
First-Class Functions for First-Order Database Engines
- In Proc. DBPL
, 2013
Cited by 6 (1 self)
We describe query defunctionalization which enables off-the-shelf first-order database engines to process queries over first-class functions. Support for first-class functions is characterized by the ability to treat functions like regular data items that can be constructed at query runtime, passed to or returned from other (higher-order) functions, assigned to variables, and stored in persistent data structures. Query defunctionalization is a non-invasive approach that transforms such function-centric queries into the data-centric operations implemented by common query processors. Experiments with XQuery and PL/SQL database systems demonstrate that first-order database engines can faithfully and efficiently support the expressive “functions as data” paradigm.
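The general idea behind defunctionalization can be sketched in a few lines of Python. This is the classic textbook transformation, not the paper's actual translation for XQuery or PL/SQL: every function value is replaced by a plain tagged record holding its captured variables, and a single first-order dispatcher interprets those records. The record types (AddN, Scale, Compose) and the dispatcher name are hypothetical.

```python
from dataclasses import dataclass


# Each former closure becomes a plain data record: a tag plus captured values.
@dataclass(frozen=True)
class AddN:
    n: int            # stands for: lambda x: x + n


@dataclass(frozen=True)
class Scale:
    factor: int       # stands for: lambda x: x * factor


@dataclass(frozen=True)
class Compose:
    first: object     # stands for: lambda x: second(first(x))
    second: object


def apply(fn, x):
    """First-order dispatcher: the only place where 'functions' are run."""
    if isinstance(fn, AddN):
        return x + fn.n
    if isinstance(fn, Scale):
        return x * fn.factor
    if isinstance(fn, Compose):
        return apply(fn.second, apply(fn.first, x))
    raise TypeError(f"unknown function record: {fn!r}")


def map_values(fn, values):
    """A 'higher-order' map that only ever receives data records."""
    return [apply(fn, v) for v in values]


PIPELINE = Compose(AddN(1), Scale(10))     # built at runtime, storable as data
RESULT = map_values(PIPELINE, [1, 2, 3])
```

Because the records are ordinary data, they can be stored, compared, and passed around by an engine that has no notion of function values, which is the property the abstract describes.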
Leveraging Windows Workflow Foundation for Scientific Workflows in Wind Tunnel Applications
- IEEE Workshop on Workflow and Data Flow for Scientific Applications (SciFlow'06)
, 2006
Cited by 6 (1 self)
Scientific and engineering experiments often produce large volumes of data that must be processed and visualised in near-realtime. An example of this, described in this paper, is microphone array processing of data from wind tunnels for aeroacoustic measurements. The overall turnaround time from data acquisition and movement, to data processing and visualization is often inhibited by factors such as manual data movement, system interoperability issues, manual resource discovery for job scheduling, and disparate physical locality between the experiment and scientist or engineer post-event. Workflow frameworks and runtimes can enable rapid composition and execution of complex scientific workflows. In this paper we explore two approaches based on Windows Workflow Foundation, a component of Microsoft WinFX. In our first approach, we present a framework for users to compose sequential workflows and access Globus grid services seamlessly using a .NET-based Commodity Grid Toolkit (MyCoG.NET). We demonstrate how application specific activity sets can be developed and extended by users. In our second approach we highlight how it can be advantageous to keep databases as central to the complete workflow enactment. These two approaches are demonstrated in the context of a wind tunnel Grid system being developed to help experimental aerodynamicists orchestrate such workflows.
XML Design for Relational Storage
- WWW 2007
, 2007
Cited by 3 (1 self)
Design principles for XML schemas that eliminate redundancies and avoid update anomalies have been studied recently. Several normal forms, generalizing those for relational databases, have been proposed. All of them, however, are based on the assumption of a native XML storage, while in practice most of XML data is stored in relational databases. In this paper we study XML design and normalization for relational storage of XML documents. To be able to relate and compare XML and relational designs, we use an information-theoretic framework that measures information content in relations and documents, with higher values corresponding to lower levels of redundancy. We show that most common relational storage schemes preserve the notion of being well-designed (i.e., anomalies- and redundancy-free). Thus, existing XML normal forms guarantee well-designed relational storages as well. We further show that if this perfect option is not achievable, then a slight restriction on XML constraints guarantees a “second-best” relational design, according to possible values of the information-theoretic measure. We finally consider an edge-based relational representation of XML documents, and show that while it has similar information-theoretic properties with other relational representations, it can behave significantly worse in terms of enforcing integrity constraints.
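For readers unfamiliar with the edge-based representation mentioned at the end of this abstract, the following Python sketch shreds a document into a single generic edge table. The column layout (parent_id, node_id, ordinal, label, value) is an assumption chosen for illustration and is not taken from the paper; attributes and mixed content are ignored.

```python
import xml.etree.ElementTree as ET


def shred_to_edges(xml_text):
    """Return rows (parent_id, node_id, ordinal, label, value), one per element."""
    edges = []

    def visit(elem, parent_id, ordinal):
        node_id = len(edges)                       # ids follow document order
        value = (elem.text or "").strip() or None  # leaf text, if any
        edges.append((parent_id, node_id, ordinal, elem.tag, value))
        for i, child in enumerate(elem):
            visit(child, node_id, i)

    visit(ET.fromstring(xml_text), None, 0)
    return edges


def children_with_label(edges, parent_id, label):
    """Child step: a selection on the same generic table."""
    return [e for e in edges if e[0] == parent_id and e[3] == label]


# Hypothetical sample document.
DOC = "<db><course><title>DB</title></course><course><title>OS</title></course></db>"
EDGES = shred_to_edges(DOC)
COURSE_IDS = [e[1] for e in children_with_label(EDGES, 0, "course")]
TITLES = [children_with_label(EDGES, cid, "title")[0][4] for cid in COURSE_IDS]
```

Since every element lands in the same table regardless of its type, a rule such as "each course has exactly one title" has to be phrased over label values and self-joins of that table rather than as a key on a dedicated relation. That is one informal way to read the abstract's remark about integrity constraints, not a restatement of the paper's formal result.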
XQuery Rewrite Optimization in IBM DB2 pureXML
- IEEE Data Engineering Bulletin
Cited by 2 (0 self)
In this paper, we describe XQuery compilation and rewrite optimization in DB2 pureXML, a hybrid relational and XML database management system. DB2 pureXML has been designed to scale to large collections of XML data. In such a system, effective filtering of XML documents and efficient execution of XML navigation are vital for high throughput. Hence the focus of rewrite optimization is to consolidate navigation constructs as much as possible and to push down comparison predicates and navigation constructs into data access to enable index usage. In this paper, we describe the new rewrite transformations we have implemented specifically for XQuery and its navigational constructs. We also briefly discuss how some of the existing rewrite transformations developed for the SQL engine are extended and adapted for XQuery.
Automatic Physical Design for XML Databases
, 2010
Cited by 1 (1 self)
Database systems employ physical structures such as indexes and materialized views to improve query performance, potentially by orders of magnitude. It is therefore important for a database administrator to choose the appropriate configuration of these physical structures (i.e., the appropriate physical design) for a given database. Deciding on the physical design of a database is not an easy task, and a considerable amount of research exists on automatic physical design tools for relational databases. Recently, XML database systems are increasingly being used for managing highly structured XML data, and support for XML data is being added to commercial relational database systems. This raises the important question of how to choose the appropriate physical design (i.e., the appropriate set
Query Processing and Optimization in Native XML Databases
, 2006
Cited by 1 (0 self)
XML has emerged as a semantic markup language for documents as well as the de facto language for data exchange over the World Wide Web. Declarative query languages, such as XPath and XQuery, are proposed for querying over large volumes of XML data. A number of techniques have been proposed to evaluate XML queries more efficiently. Many of these techniques assume a tree model of XML documents and are, therefore, also applicable to other data sources that can be explicitly or implicitly translated into a similar data model. The focus of this thesis is on efficient evaluation and optimization of path expressions in native XML databases. Specifically, the following issues are considered: storage system design, design of physical operators and efficient execution algorithms, and the cost-based query optimizer. The proposed storage system linearizes the tree structure into strings that can be decomposed into disk pages. Simple statistics are kept in the page headers to facilitate I/O-efficient navigation. Based on this storage system, a hybrid approach is developed to evaluate path expressions that exploit the advantages of navigational and join-based
semistructured data