Results 1 - 10 of 13
The state of the art in distributed query processing
- ACM Computing Surveys, 2000
Cited by 320 (3 self)
Distributed data processing is fast becoming a reality. Businesses want to have it for many reasons, and they often must have it in order to stay competitive. While much of the infrastructure for distributed data processing is already in place (e.g., modern network technology), there are a number of issues which still make distributed data processing a complex undertaking: (1) distributed systems can become very large, involving thousands of heterogeneous sites including PCs and mainframe server machines; (2) the state of a distributed system changes rapidly because the load of sites varies over time and new sites are added to the system; (3) legacy systems need to be integrated: such legacy systems usually have not been designed for distributed data processing and now need to interact with other (modern) systems in a distributed environment. This paper presents the state of the art of query processing for distributed database and information systems. The paper presents the "textbook" architecture for distributed query processing and a series of techniques that are particularly useful for distributed database systems. These techniques include special join techniques, techniques to exploit intra-query parallelism, techniques to reduce communication costs, and techniques to exploit caching and replication of data. Furthermore, the paper discusses different kinds of distributed systems such as client-server, middleware (multi-tier), and heterogeneous database systems and shows how query processing works in these systems.
Categories and subject descriptors: E.5 [Data]: Files; H.2.4 [Database Management Systems]: distributed databases, query processing; H.2.5 [Heterogeneous Databases]: data translation
General terms: algorithms; performance
Additional key words and phrases: query optimization; query execution; client-server databases; middleware; multi-tier architectures; database application systems; wrappers; replication; caching; economic models for query processing; dissemination-based information systems
Cache Tables: Paving the Way for an Adaptive Database Cache
- In Proc. VLDB, 2003
Cited by 50 (1 self)
We introduce a new database object called Cache Table that enables persistent caching of the full or partial content of a remote database table. The content of a cache table is either defined declaratively and populated in advance at setup time, or determined dynamically and populated on demand at query execution time. Dynamic cache tables exploit the characteristics of typical transactional web applications with a high volume of short transactions, simple equality predicates, and 3-4 way joins. Based on federated query processing capabilities, we developed a set of new technologies for database caching: cache tables, "Janus" (two-headed) query execution plans, cache constraints, and asynchronous cache population methods. Our solution supports transparent caching both at the edge of content-delivery networks and in the middle tier of an enterprise application infrastructure, improving the response time, throughput and scalability of transactional web applications.
A Middleware System Which Intelligently Caches Query Results
- Rouvellou, Middleware Conference, 2000
Cited by 33 (2 self)
This paper describes how caching was used to improve performance in the Accessible Business Rules framework (ABR) for IBM’s WebSphere. ABR is a middleware system which enables application writers to build applications where the time and situation-variable parts of their business logic are externally applied entities known as business rules. The cache significantly reduced the number of queries to remote databases by storing query results. A key problem we faced was how to keep the cache current after database updates. This was solved using data update propagation (DUP). Two enhancements we made to DUP were to employ an update strategy which considers the values of database updates in order to perform intelligent cache invalidations and to automatically compute dependencies using compile and run-time analysis. Our techniques can be applied to other caching environments besides ABR. We show how our cache invalidation strategies perform for applications with database updates having queries similar to those in the Set Query benchmark.
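The idea of value-aware invalidation mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not describe DUP's data structures, so everything below is hypothetical: the `QueryCache` and `CachedQuery` names, the per-entry predicate, and the `on_update` signature are assumptions made for illustration only. The point is that an update invalidates a cached result only when the updated row's values could affect it, rather than invalidating every entry that depends on the table.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class CachedQuery:
    result: Any
    table: str                          # table the query reads (assumed: one table)
    predicate: Callable[[dict], bool]   # True if a row with these values affects the result


@dataclass
class QueryCache:
    entries: dict[str, CachedQuery] = field(default_factory=dict)

    def put(self, key: str, result: Any, table: str,
            predicate: Callable[[dict], bool]) -> None:
        self.entries[key] = CachedQuery(result, table, predicate)

    def get(self, key: str) -> Any:
        entry = self.entries.get(key)
        return None if entry is None else entry.result

    def on_update(self, table: str, old_row: Optional[dict],
                  new_row: Optional[dict]) -> list[str]:
        """Drop only entries whose predicate matches the old or new row values."""
        stale = [
            key for key, e in self.entries.items()
            if e.table == table
            and any(row is not None and e.predicate(row)
                    for row in (old_row, new_row))
        ]
        for key in stale:
            del self.entries[key]
        return stale
```

Passing `None` for `old_row` models an insert and `None` for `new_row` models a delete. A table-level scheme would have dropped every entry on the updated table; here an entry whose predicate matches neither row version survives.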
Holistic optimization by prefetching query results
- In SIGMOD, 2012
Cited by 7 (4 self)
In this paper we address the problem of optimizing performance of database/web-service backed applications by means of automatically prefetching query results. Prefetching has been performed in earlier work based on predicting query access patterns; however such prediction is often of limited value, and can perform unnecessary prefetches. There has been some earlier work on program analysis and rewriting to automatically insert prefetch requests; however, such work has been restricted to rewriting of single procedures. In many cases, the query is in a procedure which does not offer much scope for prefetching within the procedure; in contrast, our approach can perform prefetching in a calling procedure, even when the actual query is in a called procedure, thereby greatly improving the benefits due to prefetching. Our approach does not perform any intrusive changes to the source code, and places prefetch instructions at the earliest possible points while avoiding wasteful prefetches. We have incorporated our techniques into a tool for holistic optimization called DBridge, to prefetch query results in Java programs that use JDBC. Our tool can be easily extended to handle Hibernate API calls as well as Web service requests. Our experiments on several real world applications demonstrate the applicability and significant performance gains due to our techniques.
Stop-and-Restart Style Execution for Long Running Decision Support Queries
- In Proc. of the 33rd Intl. Conf. on Very Large Data Bases (VLDB), 2007
Cited by 7 (0 self)
Long running decision support queries can be resource intensive and often lead to resource contention in data warehousing systems. Today, the only real option available to the DBAs when faced with such contention is to carefully select one or more queries and terminate them. However, the work done by such terminated queries is entirely lost even if they were very close to completion, and these queries will need to be run in their entirety at a later time. In this paper, we show how instead we can support a Stop-and-Restart style query execution that can partially leverage the work done in the initial query execution. In order to re-execute only the remaining work of the query, a Stop-and-Restart execution would need to save all the previous work. But this approach would clearly incur high overheads, which is undesirable. In contrast, we present a technique that can be used to save information selectively from the past execution so that the overhead can be bounded. Despite saving only limited information, our technique is able to reduce the running time of the restarted queries substantially. We show the effectiveness of our approach using real and benchmark data.
Replacement Strategies for XQuery Caching Systems
Cited by 4 (1 self)
To improve the query performance over XML documents in a distributed environment, we develop a semantic caching system named ACE-XQ for XQuery queries. ACE-XQ applies innovative query containment and rewriting techniques to answer user queries using cached queries. We also design a fine-grained replacement strategy which records user access statistics at a finer granularity than the complete XML query regions. As a result, less frequently used XML view fragments are replaced to maintain a better utilization of the cache space. Extensive experimental results illustrate the performance improvement achieved by this strategy over the traditional one for a variety of situations.
Achieving Communication Efficiency through Push-Pull Partitioning of Semantic Spaces to Disseminate Dynamic Information
- IEEE Transactions on Knowledge and Data Engineering, 2006
Cited by 4 (2 self)
Many database applications that need to disseminate dynamic information from a server to various clients can suffer from heavy communication costs. Data caching at a client can help mitigate these costs, particularly when individual PUSH-PULL decisions are made for the different semantic regions in the data space. The server is responsible for notifying the client about updates in the PUSH regions. The client needs to contact the server for queries that ask for data in the PULL regions. We use the term "data gerrymandering" for the idea of partitioning the data space into PUSH-PULL regions to minimize communication cost. In this paper we present solutions to technical challenges in adopting this simple but powerful idea. We give a provably optimal-cost dynamic programming algorithm for gerrymandering on a single query attribute. We propose a family of efficient heuristics for gerrymandering on multiple query attributes. We handle the dynamic case in which the workloads of queries and updates evolve over time. We validate our methods through extensive experiments on real and synthetic data sets.
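To make the PUSH-PULL trade-off on a single attribute concrete, here is a small dynamic-programming sketch over a toy cost model. It is not the paper's algorithm, and the cost model is an assumption: the attribute is split into cells, a PUSH cell costs one message per update, a PULL cell costs one message per query, and each change of label between adjacent cells adds a fixed `boundary_cost` standing in for per-region overhead. The function name and parameters are hypothetical.

```python
def partition_push_pull(updates, queries, boundary_cost=0.0):
    """Label each cell PUSH or PULL to minimize total cost under the toy model.

    updates[i] / queries[i]: assumed update and query counts for cell i.
    Returns (total_cost, labels).
    """
    n = len(updates)
    if n == 0:
        return 0.0, []
    labels = ("PUSH", "PULL")
    # cell_cost[i][s]: cost of cell i under label s (0 = PUSH, 1 = PULL)
    cell_cost = [(updates[i], queries[i]) for i in range(n)]
    # best[i][s]: minimum cost of cells 0..i with cell i labelled s
    best = [list(cell_cost[0])]
    back = [[None, None]]
    for i in range(1, n):
        row, ptr = [], []
        for s in (0, 1):
            stay = best[i - 1][s]
            switch = best[i - 1][1 - s] + boundary_cost
            ptr.append(s if stay <= switch else 1 - s)
            row.append(min(stay, switch) + cell_cost[i][s])
        best.append(row)
        back.append(ptr)
    # Walk the back-pointers from the cheaper final state.
    s = 0 if best[-1][0] <= best[-1][1] else 1
    total = best[-1][s]
    out = []
    for i in range(n - 1, -1, -1):
        out.append(labels[s])
        if i > 0:
            s = back[i][s]
    return total, out[::-1]
```

With `boundary_cost=0` the result is simply the cheaper label per cell; a positive `boundary_cost` makes the algorithm prefer fewer, larger regions.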
Semantic Caching for XML Queries
Cited by 2 (0 self)
With the advent of XML, great challenges arise from the demand for efficiently retrieving information from remote XML sources across the Internet. The semantic caching technology can help to improve the efficiency of XML query processing in the Web environment. Different from the traditional tuple or page-based caching systems, semantic caching systems exploit the idea of reusing cached query results to answer new queries based on the query containment and rewriting techniques. Fundamental results on the containment of relational queries have been established. In the XML setting, the containment problem remains unexplored for comprehensive XML query languages such as XQuery, and little has been studied with respect to the cache management issue such as replacement. Hence, this dissertation addresses two issues fundamental to building an XQuery-based semantic caching system: XQuery containment and rewriting, and an effective replacement strategy. We first
Site-Autonomous Distributed Semantic Caching
Cited by 1 (1 self)
Semantic caching augments cached data with a semantic description of the data. These semantic descriptions can be used to improve execution time for similar queries by retrieving some data from cache and issuing a remainder query for the rest. This is an improvement over traditional page caching, since caches are no longer limited to only base tables but are extended to contain intermediate results. In large-scale distributed database systems, using a central server with complete knowledge of the system will be a serious bottleneck and single point of failure. In this paper, we propose a distributed semantic caching method where sites make autonomous caching decisions based on locally available information, thereby reducing the need for centralized control. We implement the method in the DASCOSA-DB distributed database system prototype and use this implementation to do experiments that show the applicability and efficiency of our approach. Our evaluation shows that execution times for queries with similar subqueries are significantly reduced and that overhead caused by cache management is marginal.
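The split into a cache probe and a remainder query described in this abstract can be sketched for the simplest case, range predicates over one integer attribute. This is an illustrative, single-site sketch and not the DASCOSA-DB implementation: the `SemanticCache` class, the `subtract` helper, the half-open range convention, and the `fetch_remote` callback are all assumptions, and rows are reduced to their attribute values (assumed distinct) to keep the example short.

```python
from dataclasses import dataclass, field

Range = tuple[int, int]  # half-open interval [lo, hi) over one integer attribute


def subtract(query: Range, cached: list[Range]) -> list[Range]:
    """Return the parts of `query` not covered by any cached range."""
    lo, hi = query
    remainder = []
    for c_lo, c_hi in sorted(cached):
        if c_hi <= lo or c_lo >= hi:
            continue  # no overlap with what is left of the query
        if c_lo > lo:
            remainder.append((lo, c_lo))  # gap before this cached range
        lo = max(lo, c_hi)
        if lo >= hi:
            break
    if lo < hi:
        remainder.append((lo, hi))
    return remainder


@dataclass
class SemanticCache:
    # Maps each cached range (the semantic description) to the values it holds.
    regions: dict[Range, list[int]] = field(default_factory=dict)

    def answer(self, query: Range, fetch_remote) -> list[int]:
        lo, hi = query
        # Probe: values already held locally that satisfy the query predicate.
        rows = {v for vs in self.regions.values() for v in vs if lo <= v < hi}
        # Remainder: only the uncovered sub-ranges are sent to the remote site.
        for rem in subtract(query, list(self.regions)):
            fetched = fetch_remote(rem)
            self.regions[rem] = fetched
            rows.update(fetched)
        return sorted(rows)
```

After a query over `(0, 6)` has been cached, a later query over `(4, 10)` is answered from the cache for the overlapping part and triggers a remote fetch only for `(6, 10)`.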
Context-Aware Prefetching at the Storage Server
Cited by 1 (0 self)
In many of today’s applications, access to storage constitutes the major cost of processing a user request. Data prefetching has been used to alleviate the storage access latency. Under current prefetching techniques, the storage system prefetches a batch of blocks upon detecting an access pattern. However, the high level of concurrency in today’s applications typically leads to interleaved block accesses, which makes detecting an access pattern a very challenging problem. To address this, we propose and evaluate QuickMine, a novel, lightweight and minimally intrusive method for context-aware prefetching. Under QuickMine, we capture application contexts, such as a transaction or query, and leverage them for context-aware prediction and improved prefetching effectiveness in the storage cache. We implement a prototype of our context-aware prefetching algorithm in a storage-area network (SAN) built using Network Block Device (NBD). Our prototype shows that context-aware prefetching clearly outperforms existing context-oblivious prefetching algorithms, resulting in improvements of up to a factor of 2 in application latency for two e-commerce workloads with repeatable access patterns, TPC-W and RUBiS.