Results 1 - 10
of
72
An overview of data warehousing and OLAP technology
- SIGMOD Record
, 1997
"... Data warehousing and on-line analytical processing (OLAP) are essential elements of decision support, which has increasingly become a focus of the database industry. Many commercial products and services are now available, and all of the principal database management system vendors now have offering ..."
Abstract
-
Cited by 234 (3 self)
- Add to MetaCart
Data warehousing and on-line analytical processing (OLAP) are essential elements of decision support, which has increasingly become a focus of the database industry. Many commercial products and services are now available, and all of the principal database management system vendors now have offerings in these areas. Decision support places some rather different requirements on database technology compared to traditional on-line transaction processing applications. This paper provides an overview of data warehousing and OLAP technologies, with an emphasis on their new requirements. We describe back end tools for extracting, cleaning and loading data into a data warehouse; multidimensional data models typical of OLAP; front end client tools for querying and data analysis; server extensions for efficient query processing; and tools for metadata management and for managing the warehouse. In addition to surveying the state of the art, this paper also identifies some promising research issues, some of which are related to problems that the database research community has worked on for years, but others are only just beginning to be addressed. This overview is based on a tutorial that the authors presented at the VLDB Conference, 1996. 1.
Predicate Migration: Optimizing Queries with Expensive Predicates
, 1993
"... . The traditional focus of relational query optimization schemes has been on the choice of join methods and join orders. Restrictions have typically been handled in query optimizers by "predicate pushdown" rules, which apply restrictions in some random order before as many joins as possible. These r ..."
Abstract
-
Cited by 135 (7 self)
- Add to MetaCart
. The traditional focus of relational query optimization schemes has been on the choice of join methods and join orders. Restrictions have typically been handled in query optimizers by "predicate pushdown" rules, which apply restrictions in some random order before as many joins as possible. These rules work under the assumption that restriction is essentially a zero-time operation. However, today's extensible and object-oriented database systems allow users to define time-consuming functions, which may be used in a query's restriction and join predicates. Furthermore, SQL has long supported subquery predicates, which may be arbitrarily time-consuming to check. Thus restrictions should not be considered zero-time operations, and the model of query optimization must be enhanced. In this paper we develop a theory for moving expensive predicates in a query plan so that the total cost of the plan --- including the costs of both joins and restrictions --- is minimal. We present an algorithm to implement the theory, as well as results of our implementation in POSTGRES. Our experience with the newly enhanced POSTGRES query optimizer demonstrates that correctly optimizing queries with expensive predicates often produces plans that are orders of magnitude faster than plans generated by a traditional query optimizer. The additional complexity of considering expensive predicates during optimization is found to be manageably small. 1
Aggregate-Query Processing in Data Warehousing Environments
- In Proceedings of the International Conference on Very Large Databases
, 1995
"... In this paper we introduce generalized projections (GPs), an extension of duplicate eliminating projections, that capture aggregations, groupbys, duplicate-eliminating projections (distinct), and duplicate-preserving projections in a common unified framework. Using GPs we extend well known and simpl ..."
Abstract
-
Cited by 105 (1 self)
- Add to MetaCart
In this paper we introduce generalized projections (GPs), an extension of duplicate eliminating projections, that capture aggregations, groupbys, duplicate-eliminating projections (distinct), and duplicate-preserving projections in a common unified framework. Using GPs we extend well known and simple algorithms for SQL queries that use distinct projections to derive algorithms for queries using aggregations like sum, max, min, count, and avg. We develop powerful query rewrite rules for aggregate queries that unify and extend rewrite rules previously known in the literature. We then illustrate the power of our approach by solving a very practical and important problem in data warehousing: how to answer an aggregate query on base tables using materialized aggregate views (summary tables).
Extensible/Rule Based Query Rewrite Optimization in Starburst
- In SIGMOD
, 1992
"... This paper describes the Query Rewrite facility of the Starburst extensible database system, a novel phase of query optimization. We present a suite of rewrite rules used in Starburst to transform queries into equivalent queries for faster execution, and also describe the production rule engine whic ..."
Abstract
-
Cited by 101 (3 self)
- Add to MetaCart
This paper describes the Query Rewrite facility of the Starburst extensible database system, a novel phase of query optimization. We present a suite of rewrite rules used in Starburst to transform queries into equivalent queries for faster execution, and also describe the production rule engine which is used by Starburst to choose and execute these rules. Examples are provided demonstrating that these Query Rewrite transformations lead to query execution time improvements of orders of magnitude, suggesting that Query Rewrite in general --- and these rewrite rules in particular --- are an essential step in query optimization for modern database systems. 1 Introduction In traditional database systems, query optimization typically consists of a single phase of processing in which access methods, join orders and join methods are chosen to provide an efficient plan for executing a user's declarative query. We refer to this phase as plan optimization. In this paper we present a distinct ph...
Including Group-By in Query Optimization
, 1994
"... In existing relational database systems, processing of group-by and computation of aggregate functions are always postponed until all joins are performed. In this paper, we present transformations that make it possible to push group-by operation past one or more joins and can potentially reduce the ..."
Abstract
-
Cited by 101 (7 self)
- Add to MetaCart
In existing relational database systems, processing of group-by and computation of aggregate functions are always postponed until all joins are performed. In this paper, we present transformations that make it possible to push group-by operation past one or more joins and can potentially reduce the cost of processing a query significantly. Therefore, the placement of group-by should be decided based on cost estimation. We explain how the traditional System-R style optimizers can be modified by incorporating the greedy conservative heuristic that we developed. We prove that applications of greedy conservative heuristic produce plans that are better (or no worse) than the plans generated by a traditional optimizer. Our experimental study shows that the extent of improvement in the quality of plans is significant with only a modest increase in optimization cost. Our technique also applies to optimization of Select Distinct queries by pushing down duplicate elimination in a cost-based fas...
An overview of query optimization in relational systems
- In PODS
, 1998
"... There has been extensive work in query optimization since the early ‘70s. It is hard to capture the breadth and depth of this large body of work in a short article. Therefore, I have decided to focus primarily on the optimization of SQL queries in relational database systems and present my biased an ..."
Abstract
-
Cited by 99 (1 self)
- Add to MetaCart
There has been extensive work in query optimization since the early ‘70s. It is hard to capture the breadth and depth of this large body of work in a short article. Therefore, I have decided to focus primarily on the optimization of SQL queries in relational database systems and present my biased and incomplete view of this field. The goal of this article is not to be comprehensive, but rather to explain the foundations and present samplings of significant work in this area. I would like to apologize to the many contributors in this area whose work I have failed to explicitly acknowledge due to oversight or lack of space. I take the liberty of trading technical precision for ease of presentation. 2.
Algorithms for materialized view design in data warehousing environment
, 1997
"... Selecting views to materialize is one of the most important decisions in designing a data warehouse. In this paper, we present a frame-work for analyzing the issues in selecting views to materialize so as to achieve the best com-bination of good query performance and low view maintenance. We first d ..."
Abstract
-
Cited by 85 (10 self)
- Add to MetaCart
Selecting views to materialize is one of the most important decisions in designing a data warehouse. In this paper, we present a frame-work for analyzing the issues in selecting views to materialize so as to achieve the best com-bination of good query performance and low view maintenance. We first develop a heuristic algorithm which can provide a feasible solu-tion based on individual optimal query plans. We also map the materialized view design problem as O-l integer programming problem, whose solution can guarantee an optimal so-lution. 1
Nested Queries in Object Bases
- In Proc. Int. Workshop on Database Programming Languages
"... Many declarative query languages for object-oriented databases allow nested subqueries. This paper contains the first (to our knowledge) proposal to optimize them. A two-phase approach is used to optimize nested queries in the object-oriented context. The first phase---called dependency-based opt ..."
Abstract
-
Cited by 67 (21 self)
- Add to MetaCart
Many declarative query languages for object-oriented databases allow nested subqueries. This paper contains the first (to our knowledge) proposal to optimize them. A two-phase approach is used to optimize nested queries in the object-oriented context. The first phase---called dependency-based optimization---transforms queries at the query language level in order to treat common subexpressions and independent subqueries more efficiently. The transformed queries are translated to nested algebraic expressions. These entail nested loop evaluation which may be very inefficient. Hence, the second phase unnests nested algebraic expressions to allow for more efficient evaluation. 1 Introduction Many declarative query languages for object-oriented database management systems have been proposed in the last few years (e.g. [3, 5, 2, 18, 14]). To express complex conditions, access nested structure, or produce nested results, an essential feature found in these languages is the nesting of q...
Sequence Query Processing
- In Proc. of the 1994 ACM SIGMOD Intl. Conf. on Management of Data
, 1994
"... Many applications require the ability to manipulate sequences of data. We motivate the importance of sequence query processing, and present a framework for the optimization of sequence queries based on several novel techniques. These include query transformations, optimizations that utilize meta--da ..."
Abstract
-
Cited by 58 (2 self)
- Add to MetaCart
Many applications require the ability to manipulate sequences of data. We motivate the importance of sequence query processing, and present a framework for the optimization of sequence queries based on several novel techniques. These include query transformations, optimizations that utilize meta--data, and caching of intermediate results. We present a bottom-up algorithm that generates an efficient query evaluation plan based on cost estimates. This work also identifies a number of directions in which future research can be directed. 1 Introduction Many real life applications manipulate data that is inherently sequential. Such data is logically viewed and queried in terms of a sequence abstraction and is often physically stored as a sequence. Databases should (a) allow sequences to be queried in a declarative manner, utilizing the ordered semantics of the data, and (b) take advantage of the opportunities available for query optimization. Relational databases are inadequate in this re...
Performing Group-By before Join
- In ICDE
, 1994
"... Assume that we have an SQL query containing joins and a group-by. The standard way of evaluating this type of query is to first perform all the joins and then the group-by operation. However, it may be possible to perform the group-by early, that is, to push the groupby operation past one or more jo ..."
Abstract
-
Cited by 55 (2 self)
- Add to MetaCart
Assume that we have an SQL query containing joins and a group-by. The standard way of evaluating this type of query is to first perform all the joins and then the group-by operation. However, it may be possible to perform the group-by early, that is, to push the groupby operation past one or more joins. Early grouping may reduce the query processing cost by reducing the amount of data participating in joins. We formally define the problem, adhering strictly to the semantics of NULL and duplicate elimination in SQL2, and prove necessary and sufficient conditions for deciding when this transformation is valid. In practice, it may be expensive or even impossible to test whether the conditions are satisfied. Therefore, we also present a more practical algorithm that tests a simpler, sufficient condition. This algorithm is fast and detects a large subclass of transformable queries. 1 Introduction SQL queries containing joins and group-by are fairly common. The standard way of evaluating su...

