Results 1 - 10 of 47
The state of the art in distributed query processing
- ACM Computing Surveys
, 2000
Cited by 182 (2 self)
Distributed data processing is fast becoming a reality. Businesses want to have it for many reasons, and they often must have it in order to stay competitive. While much of the infrastructure for distributed data processing is already in place (e.g., modern network technology), there are a number of issues which still make distributed data processing a complex undertaking: (1) distributed systems can become very large, involving thousands of heterogeneous sites including PCs and mainframe server machines; (2) the state of a distributed system changes rapidly because the load of sites varies over time and new sites are added to the system; (3) legacy systems need to be integrated; such legacy systems usually have not been designed for distributed data processing and now need to interact with other (modern) systems in a distributed environment. This paper presents the state of the art of query processing for distributed database and information systems. The paper presents the "textbook" architecture for distributed query processing and a series of techniques that are particularly useful for distributed database systems. These techniques include special join techniques, techniques to exploit intra-query parallelism, techniques to reduce communication costs, and techniques to exploit caching and replication of data. Furthermore, the paper discusses different kinds of distributed systems such as client-server, middleware (multi-tier), and heterogeneous database systems and shows how query processing works in these systems.
Categories and subject descriptors: E.5 [Data]: Files; H.2.4 [Database Management Systems]: distributed databases, query processing; H.2.5 [Heterogeneous Databases]: data translation
General terms: algorithms; performance
Additional key words and phrases: query optimization; query execution; client-server databases; middleware; multi-tier architectures; database application systems; wrappers; replication; caching; economic models for query processing; dissemination-based information systems
Approximate Query Processing Using Wavelets
, 2000
Cited by 158 (9 self)
Approximate query processing has emerged as a cost-effective approach for dealing with the huge data volumes and stringent response-time requirements of today’s decision support systems (DSS). Most work in this area, however, has so far been limited in its query processing scope, typically focusing on specific forms of aggregate queries. Furthermore, conventional approaches based on sampling or histograms appear to be inherently limited when it comes to approximating the results of complex queries over high-dimensional DSS data sets. In this paper, we propose the use of multi-dimensional wavelets as an effective tool for general-purpose approximate query processing in modern, high-dimensional applications. Our approach is based on building wavelet-coefficient synopses of the data and using these synopses to provide approximate answers to queries. We develop novel query processing
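To make the idea of a wavelet-coefficient synopsis concrete, here is a minimal one-dimensional sketch, not the paper's method: the abstract describes multi-dimensional wavelets and its own query processing techniques, which are not reproduced here. The sketch assumes a Haar wavelet, an input whose length is a power of two, and a simple "keep the k largest normalized coefficients" rule; all function names are hypothetical.

```python
import math

def haar_decompose(values):
    """Unnormalized Haar transform: [overall average, detail coefficients...]."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two (assumption of this sketch)")
    coeffs = [float(v) for v in values]
    length = n
    while length > 1:
        half = length // 2
        # Pairwise averages and half-differences of the current prefix.
        averages = [(coeffs[2 * i] + coeffs[2 * i + 1]) / 2 for i in range(half)]
        details = [(coeffs[2 * i] - coeffs[2 * i + 1]) / 2 for i in range(half)]
        coeffs[:length] = averages + details
        length = half
    return coeffs

def haar_reconstruct(coeffs):
    """Inverse of haar_decompose."""
    out = list(coeffs)
    length = 1
    while length < len(out):
        merged = []
        for avg, det in zip(out[:length], out[length:2 * length]):
            merged.extend((avg + det, avg - det))
        out[:2 * length] = merged
        length *= 2
    return out

def build_synopsis(values, k):
    """Keep the k coefficients with the largest normalized magnitude."""
    coeffs = haar_decompose(values)
    n = len(coeffs)

    def weight(i):
        # A coefficient at a coarser level influences more data values.
        level_start = 1 if i == 0 else 1 << (i.bit_length() - 1)
        return math.sqrt(n / level_start)

    ranked = sorted(range(n), key=lambda i: abs(coeffs[i]) * weight(i), reverse=True)
    return n, {i: coeffs[i] for i in ranked[:k]}

def approximate_range_sum(synopsis, lo, hi):
    """Approximate sum(values[lo:hi]) from the synopsis alone."""
    n, kept = synopsis
    sparse = [0.0] * n
    for index, coefficient in kept.items():
        sparse[index] = coefficient
    return sum(haar_reconstruct(sparse)[lo:hi])

data = [2, 2, 0, 2, 3, 5, 4, 4]
synopsis = build_synopsis(data, k=4)
print(approximate_range_sum(synopsis, 0, 8), sum(data))
```

The synopsis stores only four of the eight coefficients. Range sums computed from it are estimates; how close they are depends on how much energy the discarded coefficients carried.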
DBProxy: A dynamic data cache for Web applications
- In Proc. ICDE
, 2003
Cited by 65 (0 self)
The majority of web pages served today are generated dynamically, usually by an application server querying a back-end database. To enhance the scalability of dynamic content serving in large sites, application servers are offloaded to front-end nodes, called edge servers. The improvement from such application offloading is marginal, however, if data is still fetched from the origin database system. To further improve scalability and cut response times, data must be effectively cached on such edge servers. The scale of deployment of edge servers and the rising costs of their administration demand that such caches be self-managing and adaptive. In this paper, we describe DBProxy, an edge-of-network semantic data cache for web applications. DBProxy is designed to adapt to changes in the workload in a transparent and graceful fashion by caching a large number of overlapping and dynamically changing "materialized views". New "views" are added automatically while others may be discarded to save space. In this paper, we discuss the challenges of designing and implementing such a dynamic edge data cache, and describe our proposed solutions.
Materialized view selection for multidimensional datasets
, 1998
Cited by 57 (1 self)
To fulfill the requirement of fast interactive multidimensional data analysis, database systems precompute aggregate views on some subsets of dimensions and their corresponding hierarchies. However, the problem of what to precompute is difficult and intriguing. The leading existing algorithm, BPUS, has a running time that is polynomial in the number of views and is guaranteed to be within (0.63 - f) of optimal, where f is the fraction of available space consumed by the largest aggregate. Unfortunately, BPUS can be impractically slow, and in some instances may miss good solutions due to the coarse granularity at which it makes its decisions of what to precompute. In view of this, we study the structure of the precomputation problem and show that under certain broad conditions on the multidimensional data, an even simpler and faster algorithm, PBS, achieves the same (0.63 - f) bound. Our empirical study of the behavior of PBS shows that even when this condition does not hold, PBS picks a surprisingly good set of aggregates for precomputation. Furthermore, BPUS and other previous work has assumed that all aggregates are either precomputed in their entirety or not at all. We show that if one relaxes this and allows aggregates to be partially precomputed, not only is it possible to find solutions that are better than those found by previous algorithms, in some cases it is even possible to find solutions that are better than the solution that is ‘optimal’ by the previous definition.
Compressed Data Cubes for OLAP Aggregate Query Approximation on Continuous Dimensions
, 1988
Cited by 52 (3 self)
Efficiently answering decision support queries is an important problem. Most of the work in this direction has been in the context of the data cube. Queries are efficiently answered by pre-computing large parts of the cube. Besides having large space requirements, such pre-computation requires that the hierarchy along each dimension be fixed (hence dimensions are categorical or prediscretized). Queries that take advantage of pre-computation can thus only drill-down or roll-up along this fixed hierarchy. Another disadvantage of existing pre-computation techniques is that the target measure, along with the aggregation function of interest, is fixed for each cube. Queries over more than one target measure, or using different aggregation functions, would require pre-computing larger data cubes. In this paper, we propose a new compressed representation of the data cube that (a) drastically reduces storage requirements, (b) does not require the discretization hierarchy along each query dimension to be fixed beforehand and (c) treats each dimension as a potential target measure and supports multiple aggregation functions without additional storage costs. The tradeoff is approximate, yet relatively accurate, answers to queries. We outline mechanisms to reduce the error in the approximation. Our performance evaluation indicates that our compression technique effectively addresses the limitations of existing approaches.
Using Semantic Caching to Manage Location Dependent Data in Mobile Computing
, 2000
Cited by 50 (2 self)
Location-dependent applications are becoming very popular in mobile environments. To improve system performance and facilitate disconnection, caching is crucial to such applications. In this paper, a semantic caching scheme is used to access location dependent data in mobile computing. We first develop a mobility model to represent the moving behaviors of mobile users and formally define location dependent queries. We then investigate query processing and cache management strategies. The performance of the semantic caching scheme and its replacement strategy FAR is evaluated through a simulation study. Our results show that semantic caching is more flexible and effective for use in LDD applications than page caching, whose performance is quite sensitive to the database physical organization. We also notice that the semantic cache replacement strategy FAR, which utilizes the semantic locality in terms of locations, performs robustly under different kinds of workloads.
DynaMat: A Dynamic View Management System for Data Warehouses
- In SIGMOD
, 1999
Cited by 50 (10 self)
Pre-computation and materialization of views with aggregate functions is a common technique in Data Warehouses. Due to the complex structure of the warehouse and the different profiles of the users who submit queries, there is need for tools that will automate the selection and management of the materialized data. In this paper we present DynaMat, a system that dynamically materializes information at multiple levels of granularity in order to match the demand (workload) but also takes into account the maintenance restrictions for the warehouse, such as down time to update the views and space availability. DynaMat unifies the view selection and the view maintenance problems under a single framework using a novel “goodness” measure for the materialized views. DynaMat constantly monitors incoming queries and materializes the best set of views subject to the space constraints. During updates, DynaMat reconciles the current materialized view selection and refreshes the most beneficial subset of it within a given maintenance window. We compare DynaMat against a system that is given all queries in advance and the pre-computed optimal static view selection. The comparison is made based on a new metric, the Detailed Cost Savings Ratio, introduced for quantifying the benefits of view materialization against incoming queries. These experiments show that DynaMat’s dynamic view selection outperforms the optimal static view selection and thus, any sub-optimal static algorithm that has appeared in the literature.
An Adaptive Peer-to-Peer Network for Distributed Caching of OLAP Results
- In Proc. of SIGMOD
, 2002
Cited by 49 (10 self)
Peer-to-Peer (P2P) systems are becoming increasingly popular as they enable users to exchange digital information by participating in complex networks. Such systems are inexpensive, easy to use, highly scalable and do not require central administration. Despite their advantages, however, limited work has been done on employing database systems on top of P2P networks.
Form-Based Proxy Caching for Database-Backed Web Sites: Keywords and Functions
, 2008
Cited by 48 (3 self)
Web caching proxy servers are essential for improving web performance and scalability, and recent research has focused on making proxy caching work for database-backed web sites. In this paper, we explore a new proxy caching framework that exploits the query semantics of HTML forms. We identify two common classes of form-based queries from real-world database-backed web sites, namely, keyword-based queries and function-embedded queries. Using typical examples of these queries, we study two representative caching schemes within our framework: (i) traditional passive query caching, and (ii) active query caching, in which the proxy cache can service a request by evaluating a query over the contents of the cache. Results from our experimental implementation show that our form-based proxy is a general and flexible approach that efficiently enables active caching schemes for database-backed web sites. Furthermore, handling query containment at the proxy yields significant performance advantages over passive query caching, but extending the power of the active cache to do full semantic caching appears to be less generally effective.
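The difference between passive and active query caching can be illustrated with a small sketch. It is not the paper's framework: it assumes conjunctive keyword queries (a result must contain every keyword), assumes that cached results are complete rather than truncated, and uses hypothetical names (`ActiveQueryCache`, `origin_search`, `DOCS`). Under those assumptions, a query whose keyword set is a superset of a cached query's keywords is contained in that cached query, so the proxy can answer it by filtering the cached rows.

```python
from typing import Callable, Dict, FrozenSet, Iterable, List

def matches(row: str, keywords: FrozenSet[str]) -> bool:
    """True if the row contains every keyword as a whole word."""
    return keywords <= set(row.lower().split())

class ActiveQueryCache:
    """Hypothetical proxy cache keyed by the keyword set of each query."""

    def __init__(self, origin: Callable[[FrozenSet[str]], List[str]]) -> None:
        self._origin = origin
        self._entries: Dict[FrozenSet[str], List[str]] = {}
        self.origin_calls = 0

    def query(self, keywords: Iterable[str]) -> List[str]:
        wanted = frozenset(k.lower() for k in keywords)
        # Passive hit: the identical query was cached earlier.
        if wanted in self._entries:
            return list(self._entries[wanted])
        # Active hit: a cached query with fewer keywords contains this one,
        # so evaluate the new query over the cached rows.
        for cached_keywords, rows in self._entries.items():
            if cached_keywords <= wanted:
                return [row for row in rows if matches(row, wanted)]
        # Miss: forward to the origin site and remember the answer.
        self.origin_calls += 1
        rows = self._origin(wanted)
        self._entries[wanted] = list(rows)
        return list(rows)

# Stand-in for the database-backed web site (illustrative data only).
DOCS = [
    "proxy caching for web sites",
    "semantic caching in mobile computing",
    "wavelets for approximate queries",
]

def origin_search(keywords: FrozenSet[str]) -> List[str]:
    return [doc for doc in DOCS if matches(doc, keywords)]

cache = ActiveQueryCache(origin_search)
cache.query(["caching"])                 # miss, fetched from the origin
print(cache.query(["caching", "proxy"])) # answered at the proxy by filtering
print(cache.origin_calls)
```

A passive cache would have forwarded the second query to the origin because its key differs from the first. The containment check is what lets the proxy serve it locally.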
ICICLES: Self-tuning Samples for Approximate Query Answering
- VLDB
, 2000
Cited by 44 (0 self)
Approximate query answering systems provide very fast alternatives to OLAP systems when applications are tolerant to small errors in query answers.

