XML Index Recommendation with Tight Optimizer Coupling
, 2007
Cited by 6 (5 self)
XML database systems are expected to handle increasingly complex queries over increasingly large and highly structured XML databases. An important problem that needs to be solved for these systems is how to choose the best set of indexes for a given workload. In this paper, we present an XML Index Advisor that solves this XML index recommendation problem, and has the key characteristic of being tightly coupled with the query optimizer. We rely on the optimizer to enumerate candidate indexes and to estimate the benefit gained from potential index configurations. We expand the set of candidate indexes obtained from the query optimizer to include more general indexes that can be useful for queries other than those in the training workload. To recommend an index configuration, we introduce two new search algorithms. The first algorithm finds the best set of indexes for the specific training workload, and the second algorithm finds a general set of indexes that can benefit the training workload as well as other similar workloads. We have implemented our XML Index Advisor in a prototype version of IBM® DB2® 9, which supports both relational and XML data, and we experimentally demonstrate the effectiveness of our advisor using this implementation.
A Service-Oriented System to Support Data Integration on Data Grids
, 2007
Cited by 5 (2 self)
Data Grids provide transparent access to heterogeneous and autonomous data resources. The main contribution of this paper is the presentation of a data sharing system that (i) is tailored to data grids, (ii) supports well-established and widely used relational DBMSs, and (iii) adopts a hybrid architecture by relying on a peer model for query reformulation for retrieving semantically equivalent expressions, and on a wrapper-mediator integration model for accessing and querying distributed data sources. The system builds upon the infrastructure provided by the OGSA-DQP distributed query processor and the XMAP query reformulation algorithm. The paper discusses the implementation methodology, and also presents empirical evaluation results.
The Importance of Sibling Clustering for Efficient Bulkload of XML Document Trees
- IBM Systems Journal
, 2005
Cited by 5 (5 self)
In an XML Data Store (XDS), importing documents from external sources is a very frequent operation. Since a document import consists of a large number of individual node inserts, it is essentially a small bulkload operation. Hence, efficient bulkload support is crucial for XDSs. Essentially, XML bulkload is the transformation of an XML parser's output into the XDS's persistent storage structures. This involves two major subtasks: (1) Partitioning the documents' logical tree structure into subtrees smaller than a disk page in a way that is both space-efficient and suitable for later processing. (2) Mapping the subtrees to the XDS's internal page representation. In enterprise-scale environments with very large documents and/or very many parallel bulkloads, task (1) is particularly challenging, as not only disk space consumption, but also CPU and main-memory usage are important factors. In this article, we (1) discuss requirements for an XML bulkload module, (2) examine existing algorithms for tree partitioning with respect to their applicability as XML bulkload algorithms, (3) derive a new tree partitioning algorithm, (4) present the design and implementation of the bulkload module used in our Natix XDS, and (5) evaluate the implementation.
Data integration and query reformulation in service-based grids
- In Proc. of the 1st CoreGRID Integration Workshop
, 2005
Cited by 4 (0 self)
This paper describes the XMAP data integration framework and query reformulation algorithm, provides insights into the performance of the algorithm, and discusses its use in implementing query processing services. Here we propose an approach for data integration-enabled distributed query processing on Grids by embedding the XMAP reformulation algorithm within the OGSA-DQP distributed query processor. To this aim we exploit the OGSA-DQP XML representation of relational schemas by applying the XMAP algorithm on them. Moreover, we introduce a technique to rewrite an XPath query into an equivalent OQL one. Finally, the paper presents a roadmap for the integration system implementation aiming at constructing an extended set of services that will allow users to submit queries over a single database and receive the results from multiple databases that are semantically correlated with the former one. Keywords: XML databases, semantic data integration, schema mappings, distributed query
Roantree M. Using an Oracle Repository to Accelerate XPath Queries
- Proceedings of the 17th DEXA conference, LNCS
, 2006
Cited by 3 (1 self)
One of the problems associated with XML databases is the poor performance of XPath queries. Although this has attracted much attention from the research community, solutions are either partial (not providing full XPath functionality) or unable to manage database updates. In this work, we exploit features of Oracle 10g in order to rapidly build indexes that improve the processing times of XML databases. Significantly, we can also support XML database updates, as the rebuild time for entire indexes is reasonably fast and thus provides for flexible update strategies. This paper discusses the process for building the index repository and describes a series of experiments that demonstrate our improved query response times.
Mapping XML to a Wide Sparse Table
Cited by 3 (0 self)
XML is commonly supported by SQL database systems. However, existing mappings of XML to tables can only deliver satisfactory query performance for limited use cases. In this paper, we propose a novel mapping of XML data into one wide table whose columns are sparsely populated. This mapping provides good performance for document types and queries that are observed in enterprise applications but are not supported efficiently by existing work. XML queries are evaluated by translating them into SQL queries over the wide sparsely-populated table. We show how to translate full XPath 1.0 into SQL. Based on the characteristics of the new mapping, we present rewriting optimizations that minimize the number of joins. Experiments demonstrate that query evaluation over the new mapping delivers considerable improvements over existing techniques for the target use cases.
Automatic physical design for XML databases
, 2010
Cited by 1 (1 self)
Database systems employ physical structures such as indexes and materialized views to improve query performance, potentially by orders of magnitude. It is therefore important for a database administrator to choose the appropriate configuration of these physical structures (i.e., the appropriate physical design) for a given database. Deciding on the physical design of a database is not an easy task, and a considerable amount of research exists on automatic physical design tools for relational databases. Recently, XML database systems are increasingly being used for managing highly structured XML data, and support for XML data is being added to commercial relational database systems. This raises the important question of how to choose the appropriate physical design (i.e., the appropriate set
Developing a Web Service for Distributed Persistent Objects in the Context of an XML Database Programming Language
- In International Conference On the Move(OTM), volume 3670 of LNCS
, 2005
Cited by 1 (1 self)
The development of data-centric applications should be performed in a high-level and transparent way. In particular, aspects concerning the persistency and distribution of business objects should not influence or restrict the application design. Furthermore, applications should be platform independent and should be able to exchange data independently of their programming language origin. There are several approaches for an architecture for distributed objects. One example is CORBA. JDO and EJB allow specifications for distributed persistent objects offering transparent persistency up to a certain degree. Nevertheless, the programmer is still forced to write explicit code for making objects persistent or for connecting to distributed objects. In contrast to existing approaches, the XOBEDBPL project develops a database programming language with transparency with respect to types, and persistency and distribution with respect to objects. Application development is performed on a high-level business object level only. A web service for realizing distributed persistency and data exchange is internal and completely integrated in the XOBEDBPL runtime environment. Although the XOBEDBPL language is an extension of the Java programming language, the introduced concepts could be easily transferred to other object-oriented programming languages.
Efficient Native Storage for Semi-structured Data
Cited by 1 (1 self)
Semi-structured data is becoming commonplace with examples such as XML, bioinformatics suffix-trees, scientific computing data, and even generic directory-file hierarchies. Such semi-structured data must be stored on mass storage devices for persistence as well as cost-efficiency. Current approaches, which map semi-structured data to relational databases or simply use flat files, incur a mismatch between the structure of the data and the underlying storage device (disk drive). In this paper, we explore alternate native strategies for storing semi-structured data that match its access characteristics to those of disk drives, using XML data as a concrete case study. In particular, we present algorithms that, given semi-structured data and a disk drive, decide how to store the data on the drive in a way that will later allow efficient navigation and retrieval. We evaluate our proposed methods using the DiskSim disk simulator and benchmark XPath queries. The experimental results indicate savings of as much as 7X-34X in query execution time for an important class of navigational queries (which we call non-deep-focused class), compared to the baseline sequential layout of the XML data.
A service-oriented system for distributed data querying and integration on Grids
- Future Generation Computer Systems
, 2009