Results 1 - 10 of 192
Relational Databases for Querying XML Documents: Limitations and Opportunities, 1999
"... XML is fast emerging as the dominant standard for representing data in the World Wide Web. Sophisticated query engines that allow users to effectively tap the data stored in XML documents will be crucial to exploiting the full power of XML. While there has been a great deal of activity recently prop ..."
Abstract - Cited by 478 (9 self)
XML is fast emerging as the dominant standard for representing data in the World Wide Web. Sophisticated query engines that allow users to effectively tap the data stored in XML documents will be crucial to exploiting the full power of XML. While there has been a great deal of activity recently proposing new semistructured data models and query languages for this purpose, this paper explores the more conservative approach of using traditional relational database engines for processing XML documents conforming to Document Type Descriptors (DTDs). To this end, we have developed algorithms and implemented a prototype system that converts XML documents to relational tuples, translates semi-structured queries over XML documents to SQL queries over tables, and converts the results to XML. We have qualitatively evaluated this approach using several real DTDs drawn from diverse domains. It turns out that the relational approach can handle most (but not all) of the semantics of semi-structured queries over XML data, but is likely to be effective only in some cases. We identify the causes for these limitations and propose certain extensions to the relational ...
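As a rough illustration of the kind of shredding this abstract describes, the sketch below stores a tiny XML fragment as relational tuples and answers a path-style query with a SQL join. The element names, table layout, and query are hypothetical; the paper derives its schemas from DTDs rather than hard-coding them.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical XML document conforming to a simple <book>/<author> DTD.
XML = """
<catalog>
  <book><title>Data on the Web</title><author>Abiteboul</author><author>Buneman</author></book>
  <book><title>Readings in Database Systems</title><author>Stonebraker</author></book>
</catalog>
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE book(book_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE author(book_id INTEGER, name TEXT);
""")

# Shred: one tuple per <book>, one tuple per nested <author>.
for book_id, book in enumerate(ET.fromstring(XML).findall("book")):
    conn.execute("INSERT INTO book VALUES (?, ?)", (book_id, book.findtext("title")))
    for author in book.findall("author"):
        conn.execute("INSERT INTO author VALUES (?, ?)", (book_id, author.text))

# A path-style query ("titles of books written by Buneman") becomes a SQL join.
rows = conn.execute("""
    SELECT b.title
    FROM book b JOIN author a ON a.book_id = b.book_id
    WHERE a.name = 'Buneman'
""").fetchall()
print(rows)   # [('Data on the Web',)]
```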
Join Indices - ACM Transactions on Database Systems, 1987
"... In new application areas of relational database systems, such as artificial intelligence, the join operator is used more extensively than in conventional applications. In this paper, we propose a simple data structure, called a join index, for improving the performance of joins in the context of com ..."
Abstract - Cited by 231 (4 self)
In new application areas of relational database systems, such as artificial intelligence, the join operator is used more extensively than in conventional applications. In this paper, we propose a simple data structure, called a join index, for improving the performance of joins in the context of complex queries. For most of the joins, updates to join indices incur very little overhead. Some properties of a join index are (i) its efficient use of memory and adaptiveness to parallel execution, ..., its support for abstract data type join predicates, (iv) its support for multirelation clustering, and (v) its use in representing directed graphs and in evaluating recursive queries. Finally, the analysis of the join algorithm using join indices shows its excellent performance.
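A join index is, in essence, a precomputed binary relation of surrogate pairs identifying which tuples of two relations join. A minimal sketch, with made-up relations and tuple identifiers rather than the paper's storage structures:

```python
# Two base relations, addressed by surrogate (tuple identifier).
R = {1: ("r1", 10), 2: ("r2", 20), 3: ("r3", 20)}   # rid -> (payload, join key)
S = {7: (20, "s7"), 8: (30, "s8"), 9: (20, "s9")}   # sid -> (join key, payload)

# Build the join index once: pairs (rid, sid) whose join keys match.
by_key = {}
for sid, (key, _) in S.items():
    by_key.setdefault(key, []).append(sid)
join_index = [(rid, sid)
              for rid, (_, key) in R.items()
              for sid in by_key.get(key, [])]

# Evaluating the join later only dereferences surrogates; no keys are compared.
result = [(R[rid][0], S[sid][1]) for rid, sid in join_index]
print(sorted(join_index))   # [(2, 7), (2, 9), (3, 7), (3, 9)]
print(result)
```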
Scalable semantic web data management using vertical partitioning - In VLDB, 2007
"... The dataset used for this benchmark is taken from the publicly available Barton Libraries dataset [1]. This data is provided by the Simile Project [3], which develops tools for library data management and interoperability. The data contains records that compose an RDF-formatted dump of the MIT Libra ..."
Abstract - Cited by 190 (6 self)
The dataset used for this benchmark is taken from the publicly available Barton Libraries dataset [1]. This data is provided by the Simile Project [3], which develops tools for library data management and interoperability. The data contains records that compose an RDF-formatted dump of the MIT Libraries Barton catalog, converted from raw data stored in an old library format standard called MARC (Machine Readable Catalog). Because of the multiple sources the data was derived from and the diverse nature of the data that is cataloged, the structure of the data is quite irregular. At the time of publication of this report, there are slightly more than 50 million triples in the dataset, with a total of 221 unique properties, of which the vast majority appear infrequently. Of these properties, 82 (37%) are multi-valued, meaning that they appear more than once for a given subject; however, these properties appear more often (77% of the triples have a multi-valued property). The dataset provides a good demonstration of the relatively unstructured nature of Semantic Web data. Longwell [2] is a tool developed by the Simile Project that provides a graphical user interface for generic RDF data exploration in a web browser. It begins by presenting the user with a list of the values the type property can take (such as Text or Notated Music in the library dataset). The user can click on the types of data he desires to further explore. Longwell shows the list of currently filtered resources (RDF subjects) in the main portion of the screen, and a list of filters in panels along the side. Each panel represents a property that is defined on the current filter, with popular object values for that property and their frequency also presented in this box. If the user selects an object value, this filters the working set of resources to those that have that property-object value defined, ...
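The technique named in the title, storing one two-column (subject, object) table per RDF property instead of filtering a single wide triples table, can be sketched as follows; the triples and property names here are invented for illustration and are not drawn from the Barton dataset.

```python
from collections import defaultdict

# A handful of hypothetical RDF triples (subject, property, object).
triples = [
    ("book1", "dc:title",   "Data on the Web"),
    ("book1", "dc:creator", "Abiteboul"),
    ("book1", "dc:creator", "Buneman"),        # multi-valued property
    ("book2", "dc:title",   "Readings in Database Systems"),
    ("book2", "rdf:type",   "Text"),
]

# Vertical partitioning: one two-column (subject, object) table per property.
tables = defaultdict(list)
for s, p, o in triples:
    tables[p].append((s, o))

# A query touching one property scans only that property's table,
# instead of filtering the whole triples table on the property column.
print(tables["dc:creator"])   # [('book1', 'Abiteboul'), ('book1', 'Buneman')]
```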
Database architecture optimized for the new bottleneck: Memory access - In Proceedings of VLDB Conference, 1999
"... In the past decade, advances in speed of commodity CPUs have far out-paced advances in memory latency. Main-memory access is therefore increasingly a performance bottleneck for many computer applications, including database systems. In this article, we use a simple scan test to show the severe impac ..."
Abstract - Cited by 161 (15 self)
In the past decade, advances in the speed of commodity CPUs have far outpaced advances in memory latency. Main-memory access is therefore increasingly a performance bottleneck for many computer applications, including database systems. In this article, we use a simple scan test to show the severe impact of this bottleneck. The insights gained are translated into guidelines for database architecture, in terms of both data structures and algorithms. We discuss how vertically fragmented data structures optimize cache performance on sequential data access. We then focus on equi-join, typically a random-access operation, and introduce radix algorithms for partitioned hash-join. The performance of these algorithms is quantified using a detailed analytical model that incorporates memory access cost. Experiments that validate this model were performed on the Monet database system. We obtained exact statistics on events such as TLB misses and L1 and L2 cache misses by using the hardware performance counters found in modern CPUs. Using our cost model, we show how the carefully tuned memory access pattern of our radix algorithms makes them perform well, which is confirmed by experimental results.
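A minimal sketch of the radix-partitioning idea behind a partitioned hash-join: both inputs are split on the low bits of the hashed key so that each per-partition hash table can stay cache-resident. The bit count and data here are arbitrary, and the real algorithms use multiple passes and hardware-aware tuning.

```python
RADIX_BITS = 2                 # low hash bits used for partitioning,
FANOUT = 1 << RADIX_BITS       # chosen so each partition's hash table fits in cache

def radix_partition(rows, key):
    parts = [[] for _ in range(FANOUT)]
    for row in rows:
        parts[hash(key(row)) & (FANOUT - 1)].append(row)
    return parts

R = [(k, f"r{k}") for k in range(8)]        # (join key, payload)
S = [(k % 4, f"s{k}") for k in range(8)]

result = []
for r_part, s_part in zip(radix_partition(R, lambda t: t[0]),
                          radix_partition(S, lambda t: t[0])):
    # Build a small hash table on one partition of R, probe it with the
    # matching partition of S; rows with equal keys always land together.
    table = {}
    for k, payload in r_part:
        table.setdefault(k, []).append(payload)
    for k, payload in s_part:
        for r_payload in table.get(k, []):
            result.append((k, r_payload, payload))

print(sorted(result))
```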
MonetDB/X100: Hyper-pipelining query execution - In CIDR, 2005
"... Database systems tend to achieve only low IPC (instructions-per-cycle) efficiency on modern CPUs in compute-intensive application areas like decision support, OLAP and multimedia retrieval. This paper starts with an in-depth investigation to the reason why this happens, focusing on the TPC-H benchma ..."
Abstract - Cited by 156 (23 self)
Database systems tend to achieve only low IPC (instructions-per-cycle) efficiency on modern CPUs in compute-intensive application areas like decision support, OLAP and multimedia retrieval. This paper starts with an in-depth investigation into why this happens, focusing on the TPC-H benchmark. Our analysis of various relational systems and MonetDB leads us to a new set of guidelines for designing a query processor. The second part of the paper describes the architecture of our new X100 query engine for the MonetDB system that follows these guidelines. On the surface, it resembles a classical Volcano-style engine, but the crucial difference, basing all execution on the concept of vector processing, makes it highly CPU efficient. We evaluate the power of MonetDB/X100 on the 100GB version of TPC-H, showing its raw execution power to be between one and two orders of magnitude higher than previous technology.
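The central idea, executing each primitive over a cache-resident vector of values per call rather than interpreting one tuple at a time, can be approximated as below; the vector size and primitive names are illustrative, not the engine's actual API.

```python
VECTOR_SIZE = 1024   # small enough for each vector to stay cache-resident

def scan(column, vector_size=VECTOR_SIZE):
    """Produce a column one vector at a time: Volcano-style, but per vector."""
    for i in range(0, len(column), vector_size):
        yield column[i:i + vector_size]

def map_multiply(vec_a, vec_b):
    """One vectorized primitive: a tight loop over a whole vector per call,
    so interpretation overhead is amortized over many values."""
    return [a * b for a, b in zip(vec_a, vec_b)]

price    = [float(i) for i in range(10_000)]
discount = [0.9] * 10_000

total = 0.0
for p_vec, d_vec in zip(scan(price), scan(discount)):
    total += sum(map_multiply(p_vec, d_vec))

print(total)   # sum of price[i] * discount[i]
```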
A Performance Evaluation of Alternative Mapping Schemes for Storing XML Data in a Relational Database, 1999
"... XML is emerging as one of the dominant data formats for data processing on the Internet. To query XML data, query languages like XQL, Lorel, XML-QL, or XML-GL have been proposed. In this paper, we study how XML data can be stored and queried using a standard relational database system. For this pur ..."
Abstract - Cited by 151 (2 self)
XML is emerging as one of the dominant data formats for data processing on the Internet. To query XML data, query languages like XQL, Lorel, XML-QL, or XML-GL have been proposed. In this paper, we study how XML data can be stored and queried using a standard relational database system. For this purpose, we present alternative mapping schemes to store XML data in a relational database and discuss how XML-QL queries can be translated into SQL queries for every mapping scheme. We present the results of comprehensive performance experiments that analyze the trade-offs of the alternative mapping schemes in terms of database size, query performance, and update performance. While our discussion is focused on XML and XML-QL, the results of this paper are relevant for most semi-structured data models and most query languages for semi-structured data.
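One simple mapping scheme of the kind compared in such studies is an "edge" table that encodes the XML tree as one tuple per parent-child edge. The sketch below is a hypothetical illustration, not the paper's exact schema.

```python
import xml.etree.ElementTree as ET

XML = "<book year='1999'><title>Mapping Schemes</title><author>Example</author></book>"

# Edge-style mapping: one tuple per parent-child edge of the XML tree:
# (parent_id, ordinal, label, child_id, text_value).
edges, next_id = [], [0]

def shred(elem):
    my_id = next_id[0]
    next_id[0] += 1
    for ordinal, child in enumerate(elem):
        edges.append((my_id, ordinal, child.tag, shred(child), child.text))
    return my_id

shred(ET.fromstring(XML))
print(edges)
# [(0, 0, 'title', 1, 'Mapping Schemes'), (0, 1, 'author', 2, 'Example')]
```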
Weaving Relations for Cache Performance, 2001
"... Relational database systems have traditionally optimzed for I/O performance and organized records sequentially on disk pages using the N-ary Storage Model (NSM) (a.k.a., slotted pages). Recent research, however, indicates that cache utilization and performance is becoming increasingly important on m ..."
Abstract - Cited by 127 (15 self)
Relational database systems have traditionally optimized for I/O performance and organized records sequentially on disk pages using the N-ary Storage Model (NSM) (a.k.a. slotted pages). Recent research, however, indicates that cache utilization and performance are becoming increasingly important on modern platforms. In this paper, we first demonstrate that in-page data placement is the key to high cache performance and that NSM exhibits low cache utilization on modern platforms. Next, we propose a new data organization model called PAX (Partition Attributes Across), which significantly improves cache performance by grouping together all values of each attribute within each page. Because PAX only affects layout inside the pages, it incurs no storage penalty and does not affect I/O behavior. According to our experimental results, when compared to NSM, (a) PAX exhibits superior cache and memory bandwidth utilization, saving at least 75% of NSM's stall time due to data cache accesses, (b) range selection queries and updates on memory-resident relations execute 17-25% faster, and (c) TPC-H queries involving I/O execute 11-48% faster.
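PAX keeps one relation's records on the same page, as NSM would, but groups each attribute's values into a contiguous "minipage" within the page, so a scan of one attribute touches one contiguous region. A toy sketch with an invented three-attribute relation:

```python
# Records for one page: (id, price, quantity); NSM would store them row by row.
records = [(1, 9.99, 3), (2, 4.50, 1), (3, 12.00, 7)]

# PAX: the same records on the same page, but values grouped per attribute
# into contiguous "minipages", so cache lines are filled only with values
# the query actually reads.
pax_page = {
    "id":       [r[0] for r in records],
    "price":    [r[1] for r in records],
    "quantity": [r[2] for r in records],
}

# A selection on one attribute scans only that minipage...
matches = [i for i, price in enumerate(pax_page["price"]) if price > 5.0]

# ...and qualifying records are reassembled from the other minipages.
print([(pax_page["id"][i], pax_page["price"][i], pax_page["quantity"][i])
       for i in matches])
```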
Super-Scalar RAM-CPU Cache Compression - In Proceedings of the International Conference on Data Engineering (IEEE ICDE), 2006
"... CWI is a founding member of ERCIM, the European Research Consortium for Informatics and Mathematics. CWI's research has a theme-oriented structure and is grouped into four clusters. Listed below are the names of the clusters and in parentheses their acronyms. ..."
Cited by 106 (18 self)
Storage and Querying of E-Commerce Data, 2001
"... New generation of e-commerce applications require data schemas that are constantly evolving and sparsely populated. The conventional horizontal row representation fails to meet these requirements. We represent objects in a vertical format storing an object as a set of tuples. Each tuple consists of ..."
Abstract - Cited by 89 (2 self)
A new generation of e-commerce applications requires data schemas that are constantly evolving and sparsely populated. The conventional horizontal row representation fails to meet these requirements. We represent objects in a vertical format, storing an object as a set of tuples. Each tuple consists of an object identifier and an attribute name-value pair. Schema evolution is now easy. However, writing queries against this format becomes cumbersome. We create a logical horizontal view of the vertical representation and transform queries on this view to the vertical table. We present alternative implementations and performance results that show the effectiveness of the vertical representation for sparse data. We also identify additional facilities needed in database systems to support these applications well.
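A minimal sketch of the vertical representation and the logical horizontal view built over it; the object identifiers, attributes, and pivot code are invented for illustration and stand in for what the paper does inside the database engine.

```python
# Vertical representation: one (oid, attribute, value) tuple per defined attribute;
# absent attributes simply have no tuple, so sparse schemas cost nothing to store.
vertical = [
    (1, "brand", "Acme"), (1, "voltage", "230V"),
    (2, "brand", "Zeta"), (2, "color", "red"),
]

# Logical horizontal view over a chosen attribute list (a pivot); queries can be
# written against this view and mapped back to the vertical table.
attrs = ["brand", "voltage", "color"]
objects = sorted({oid for oid, _, _ in vertical})
lookup = {(oid, a): v for oid, a, v in vertical}
horizontal = [tuple([oid] + [lookup.get((oid, a)) for a in attrs]) for oid in objects]

print(horizontal)
# [(1, 'Acme', '230V', None), (2, 'Zeta', None, 'red')]
```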
MIL Primitives For Querying A Fragmented World, 1999
"... In query-intensive database application areas, like decision support and data mining, systems that use vertical fragmentation have a significant performance advantage. In order to support relational or object oriented applications on top of such a fragmented data model, a flexible yet powerful inter ..."
Abstract - Cited by 82 (26 self)
In query-intensive database application areas, like decision support and data mining, systems that use vertical fragmentation have a significant performance advantage. In order to support relational or object-oriented applications on top of such a fragmented data model, a flexible yet powerful intermediate language is needed. This problem has been successfully tackled in Monet, a modern extensible database kernel developed by our group. We focus on the design choices made in the Monet Interpreter Language (MIL), its algebraic query language, and outline how its concept of tactical optimization enhances and simplifies the optimization of complex queries. Finally, we summarize the experience gained in Monet by creating a highly efficient implementation of MIL.
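Monet decomposes relations into binary tables of (head, tail) pairs, and MIL composes queries from algebraic operators over them. The sketch below mimics a few such operators in Python; the operator names follow MIL's spirit (select, join, mirror), but the syntax and data structures are approximations, not actual MIL.

```python
# A relation decomposed into BAT-like binary tables: oid -> attribute value.
name = [(0, "Alice"), (1, "Bob"), (2, "Carol")]
age  = [(0, 31), (1, 45), (2, 28)]

def select(bat, pred):
    """Keep pairs whose tail value satisfies the predicate."""
    return [(h, t) for h, t in bat if pred(t)]

def mirror(bat):
    """(h, t) -> (h, h): carries qualifying oids into a subsequent join."""
    return [(h, h) for h, _ in bat]

def join(left, right):
    """Match left tails against right heads: (a, b) x (b, c) -> (a, c)."""
    index = {}
    for h, t in right:
        index.setdefault(h, []).append(t)
    return [(a, c) for a, b in left for c in index.get(b, [])]

# "Names of people older than 30": select on the age BAT, then join back to name.
older = select(age, lambda a: a > 30)      # [(0, 31), (1, 45)]
print(join(mirror(older), name))           # [(0, 'Alice'), (1, 'Bob')]
```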