Results 1 - 10
of
231
Query evaluation techniques for large databases
- ACM COMPUTING SURVEYS
, 1993
"... Database management systems will continue to manage large data volumes. Thus, efficient algorithms for accessing and manipulating large sets and sequences will be required to provide acceptable performance. The advent of object-oriented and extensible database systems will not solve this problem. On ..."
Abstract
-
Cited by 767 (11 self)
- Add to MetaCart
Database management systems will continue to manage large data volumes. Thus, efficient algorithms for accessing and manipulating large sets and sequences will be required to provide acceptable performance. The advent of object-oriented and extensible database systems will not solve this problem. On the contrary, modern data models exacerbate it: In order to manipulate large sets of complex objects as efficiently as today’s database systems manipulate simple records, query processing algorithms and software will become more complex, and a solid understanding of algorithm and architectural issues is essential for the designer of database management software. This survey provides a foundation for the design and implementation of query execution facilities in new database management systems. It describes a wide array of practical query evaluation techniques for both relational and post-relational database systems, including iterative execution of complex query evaluation plans, the duality of sort- and hash-based set matching algorithms, types of parallel query execution and their implementation, and special operators for emerging database application domains.
Answering Queries Using Views: A Survey
, 2000
"... The problem of answering queries using views is to find efficient methods of answering a query using a set of previously defined materialized views over the database, rather than accessing the database relations. The problem has recently received significant attention because of its relevance to a w ..."
Abstract
-
Cited by 562 (32 self)
- Add to MetaCart
(Show Context)
The problem of answering queries using views is to find efficient methods of answering a query using a set of previously defined materialized views over the database, rather than accessing the database relations. The problem has recently received significant attention because of its relevance to a wide variety of data management problems. In query optimization, finding a rewriting of a query using a set of materialized views can yield a more efficient query execution plan. To support the separation of the logical and physical views of data, a storage schema can be described using views over the logical schema. As a result, finding a query execution plan that accesses the storage amounts to solving the problem of answering queries using views. Finally, the problem arises in data integration systems, where data sources can be described as precomputed views over a mediated schema. This article surveys the state of the art on the problem of answering queries using views, and synthesizes the disparate works into a coherent framework. We describe the different applications of the problem, the algorithms proposed to solve it and the relevant theoretical results.
An Introduction to Spatial Database Systems
- THE VLDB JOURNAL
, 1994
"... We propose a definition of a spatial database system as a database system that offers spatial data types in its data model and query language, and supports ..."
Abstract
-
Cited by 214 (9 self)
- Add to MetaCart
(Show Context)
We propose a definition of a spatial database system as a database system that offers spatial data types in its data model and query language, and supports
Partition Based Spatial-Merge Join
, 1996
"... This paper describes PBSM (Partition Based Spatial--Merge), a new algorithm for performing spatial join operation. This algorithm is especially effective when neither of the inputs to the join have an index on the joining attribute. Such a situation could arise if both inputs to the join are interme ..."
Abstract
-
Cited by 184 (12 self)
- Add to MetaCart
(Show Context)
This paper describes PBSM (Partition Based Spatial--Merge), a new algorithm for performing spatial join operation. This algorithm is especially effective when neither of the inputs to the join have an index on the joining attribute. Such a situation could arise if both inputs to the join are intermediate results in a complex query, or in a parallel environment where the inputs must be dynamically redistributed. The PBSM algorithm partitions the inputs into manageable chunks, and joins them using a computational geometry based plane--sweeping technique. This paper also presents a performance study comparing the the traditional indexed nested loops join algorithm, a spatial join algorithm based on joining spatial indices, and the PBSM algorithm. These comparisons are based on complete implementations of these algorithms in Paradise, a database system for handling GIS applications. Using real data sets, the performance study examines the behavior of these spatial join algorithms in a vari...
Database architecture optimized for the new bottleneck: Memory access
- In Proceedings of VLDB Conference
, 1999
"... In the past decade, advances in speed of commodity CPUs have far out-paced advances in memory latency. Main-memory access is therefore increasingly a performance bottleneck for many computer applications, including database systems. In this article, we use a simple scan test to show the severe impac ..."
Abstract
-
Cited by 161 (15 self)
- Add to MetaCart
(Show Context)
In the past decade, advances in speed of commodity CPUs have far out-paced advances in memory latency. Main-memory access is therefore increasingly a performance bottleneck for many computer applications, including database systems. In this article, we use a simple scan test to show the severe impact of this bottleneck. The insights gained are translated into guidelines for database architecture; in terms of both data structures and algorithms. We discuss how vertically fragmented data structures optimize cache performance on sequential data access. We then focus on equi-join, typically a random-access operation, and introduce radix algorithms for partitioned hash-join. The performance of these algorithms is quantified using a detailed analytical model that incorporates memory access cost. Experiments that validate this model were performed on the Monet database system. We obtained exact statistics on events like TLB misses, L1 and L2 cache misses, by using hardware performance counters found in modern CPUs. Using our cost model, we show how the carefully tuned memory access pattern of our radix algorithms make them perform well, which is confirmed by experimental results. ∗*This work was carried out when the author was at the
A fast index for semistructured data
- In VLDB
, 2001
"... Queries navigate semistructured data via path expressions, and can be accelerated using an index. Our solution encodes paths as strings, and inserts those strings into a special index that is highly optimized for long and complex keys. We describe the Index Fabric, an indexing structure that provide ..."
Abstract
-
Cited by 138 (5 self)
- Add to MetaCart
Queries navigate semistructured data via path expressions, and can be accelerated using an index. Our solution encodes paths as strings, and inserts those strings into a special index that is highly optimized for long and complex keys. We describe the Index Fabric, an indexing structure that provides the efficiency and flexibility we need. We discuss how "raw paths " are used to optimize ad hoc queries over semistructured data, and how "refined paths " optimize specific access paths. Although we can use knowledge about the queries and structure of the data to create refined paths, no such knowledge is needed for raw paths. A performance study shows that our techniques, when implemented on top of a commercial relational database system, outperform the more traditional approach of using the commercial system’s indexing mechanisms to query the XML. 1.
Spatial Hash-Joins
, 1996
"... The hash-join paradigm works well for relational joins, but is hard to apply to spatial joins. Relational hash-joins can guarantee that items in different hash buckets are irrelevant to each other for the purpose of join, but complexities intrinsic to spatial join predicates preclude such guarantees ..."
Abstract
-
Cited by 106 (1 self)
- Add to MetaCart
The hash-join paradigm works well for relational joins, but is hard to apply to spatial joins. Relational hash-joins can guarantee that items in different hash buckets are irrelevant to each other for the purpose of join, but complexities intrinsic to spatial join predicates preclude such guarantees. It is also difficult to design spatial partition functions that produce equal-sized buckets. We examine how to apply the hash-join paradigm to spatial joins, and define a new framework for spatial hash-joins. Our spatial partition functions have two components: a set of bucket extents and an assignment function, which may map a data item into multiple buckets. Furthermore, the partition functions for the two input datasets may be different. We have designed and tested a spatial hashjoin method based on this framework. The partition function for the inner dataset is initialized by sampling the dataset, and evolves as data are inserted. The partition function for the outer dataset is immutab...
Cache Conscious Algorithms for Relational Query Processing
- In Proceedings of the 20th VLDB Conference
, 1994
"... The current main memory (DRAM) access speeds lag far behind CPU speeds. Cache memory, made of static RAM, is being used in today's architectures to bridge this gap. It provides access latencies of 2--4 processor cycles, in contrast to main memory which requires 15--25 cycles. Therefore, the per ..."
Abstract
-
Cited by 102 (2 self)
- Add to MetaCart
(Show Context)
The current main memory (DRAM) access speeds lag far behind CPU speeds. Cache memory, made of static RAM, is being used in today's architectures to bridge this gap. It provides access latencies of 2--4 processor cycles, in contrast to main memory which requires 15--25 cycles. Therefore, the performance of the CPU depends upon how well the cache can be utilized. We show that there are significant benefits in redesigning our traditional query processing algorithms so that they can make better use of the cache. The new algorithms run 8%--200% faster than the traditional ones. 1 Introduction The DRAM access speeds have not reduced much compared to the CPU cycle time reduction resulting from the improvements in VLSI technology. Cache memories, made of fast static RAM, help alleviate this disparity by exploiting the spatial and temporal locality in the data accesses of a program. However, programs with poor access locality waste significantly many cycles transferring the data to and from th...
Efficient Computation of Spatial Joins
, 1993
"... Spatial joins are join operations that involve spatial data types and operators. Due to some basic properties of spatial data, many conventional join processing strategies suffer serious performance penalties or are not applicable at all in this case. In this paper we explore which of the join strat ..."
Abstract
-
Cited by 93 (4 self)
- Add to MetaCart
(Show Context)
Spatial joins are join operations that involve spatial data types and operators. Due to some basic properties of spatial data, many conventional join processing strategies suffer serious performance penalties or are not applicable at all in this case. In this paper we explore which of the join strategies known from conventional databases can be applied to spatial joins as well, and how some of these techniques can be modified to be more efficient in the context of spatial data. Furthermore, we describe a class of tree structures, called generalization trees, that can be applied efficiently to compute spatial joins in a hierarchical manner. Finally, we model the performance of the most promising strategies analytically and conduct a comparative study. Parts of this work have been carried out while the author was visiting the International Computer Science Institute and the University of California at Berkeley. 1 1 Introduction Spatial databases have become a very active research top...
MIL Primitives For Querying A Fragmented World
, 1999
"... In query-intensive database application areas, like decision support and data mining, systems that use vertical fragmentation have a significant performance advantage. In order to support relational or object oriented applications on top of such a fragmented data model, a flexible yet powerful inter ..."
Abstract
-
Cited by 82 (26 self)
- Add to MetaCart
(Show Context)
In query-intensive database application areas, like decision support and data mining, systems that use vertical fragmentation have a significant performance advantage. In order to support relational or object oriented applications on top of such a fragmented data model, a flexible yet powerful intermediate language is needed. This problem has been successfully tackled in Monet, a modern extensible database kernel developed by our group. We focus on the design choices made in the Monet Interpreter Language (MIL), its algebraic query language, and outline how its concept of tactical optimization enhances and simplifies the optimization of complex queries. Finally, we summarize the experience gained in Monet by creating a highly efficient implementation of MIL.