Content Integration for E-Business
- In Proc. SIGMOD, 2001
Cited by 34 (0 self)
We define the problem of content integration for E-Business, and show how it differs in fundamental ways from traditional issues surrounding data integration, application integration, data warehousing and OLTP. Content integration includes catalog integration as a special case, but encompasses a broader set of applications and challenges. We explore the characteristics of content integration and required services for any solution. In addition, we explore architectural alternatives and discuss the use of XML in this arena.
Query optimization in distributed networks of autonomous database systems
- ACM Trans. Database Syst.
Cited by 16 (1 self)
Large-scale distributed environments, where each node is completely autonomous and offers services to its peers through external communication, pose significant challenges to query processing and optimization. Autonomy is the main source of the problem, as it results in lack of knowledge about any particular node with respect to the information it can produce and its characteristics, for example, cost of production or quality of produced results. In this article, inspired by e-commerce technology, we recognize queries as commodities and model query optimization as a trading negotiation process. Subquery answers and subquery operator execution jobs are traded between nodes until deals are struck with some nodes for all of them. Such trading may also occur recursively, in the sense that some nodes may play the role of intermediaries between other nodes (subcontracting). We identify the key parameters of the overall framework and suggest several potential alternatives for each one. In comparison to trading negotiations for e-commerce, query optimization faces unique new challenges that stem primarily from the fact that queries have a complex structure and can be broken into smaller parts. We address these challenges through a particular instantiation of our framework focusing primarily on the optimization algorithms run on “buying” and “selling” nodes, the evaluation metrics of the queries, and the negotiation strategy. Finally, we
A novel approach to resource scheduling for parallel query processing on computational grids
- Distributed and Parallel Databases, 2006
A Characterization of the Sensitivity of Query Optimization to Storage Access Parameters
- In Proceedings of ACM SIGMOD, 2003
Cited by 10 (1 self)
Most relational query optimizers make use of information about the costs of accessing tuples and data structures on various storage devices. This information can at times be off by several orders of magnitude due to human error in configuration setup, sudden changes in load, or hardware failure. In this paper, we attempt to answer the following questions:
• Are inaccurate access cost estimates likely to cause a typical query optimizer to choose a suboptimal query plan?
• If an optimizer chooses a suboptimal plan as a result of inaccurate access cost estimates, how far from optimal is this plan likely to be?
To address these issues, we provide a theoretical, vector-based framework for analyzing the costs of query plans under various storage parameter costs. We then use this geometric framework to characterize experimentally a commercial query optimizer. We develop algorithms for extracting detailed information about query plans through narrow optimizer interfaces, and we perform the characterization using database statistics from a published run of the TPC-H benchmark and a wide range of storage parameters. We show that, when data structures such as tables, indexes, and sorted runs reside on different storage devices, the optimizer can derive significant benefits from having accurate and timely information regarding the cost of accessing storage devices.
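The vector-based view of plan cost in the abstract above can be illustrated with a small sketch. It assumes, for illustration only, that a plan is summarized by a vector of access counts per storage device and that its cost is the dot product of that vector with a vector of per-access device costs. The plan names, usage numbers, and function names below are hypothetical and are not taken from the paper.

```python
from typing import Dict, List

# Hypothetical plans: each vector holds access counts on (device_A, device_B).
PLANS: Dict[str, List[float]] = {
    "index_scan": [100.0, 5000.0],
    "table_scan": [20000.0, 0.0],
}


def plan_cost(usage: List[float], unit_costs: List[float]) -> float:
    """Cost of a plan as the dot product of usage and per-access costs."""
    return sum(u * c for u, c in zip(usage, unit_costs))


def choose_plan(plans: Dict[str, List[float]], unit_costs: List[float]) -> str:
    """Return the name of the cheapest plan under the given cost vector."""
    return min(plans, key=lambda name: plan_cost(plans[name], unit_costs))


def suboptimality(plans: Dict[str, List[float]],
                  estimated: List[float],
                  actual: List[float]) -> float:
    """Ratio of the chosen plan's true cost to the truly best plan's cost.

    The plan is chosen with the (possibly wrong) estimated costs and then
    evaluated with the actual costs; 1.0 means the choice was optimal.
    """
    chosen = choose_plan(plans, estimated)
    best = choose_plan(plans, actual)
    return plan_cost(plans[chosen], actual) / plan_cost(plans[best], actual)
```

With estimated costs `[1.0, 1.0]` this toy optimizer picks `index_scan`; if device B is actually ten times slower (`[1.0, 10.0]`), `suboptimality` reports how much more expensive that choice is than `table_scan` under the true costs.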
Minimizing Communication Cost in Distributed Multi-query Processing
- 2009
Cited by 8 (0 self)
Increasing prevalence of large-scale distributed monitoring and computing environments such as sensor networks, scientific federations, Grids etc., has led to a renewed interest in the area of distributed query processing and optimization. In this paper we address a general, distributed multiquery processing problem motivated by the need to minimize the communication cost in these environments. Specifically we address the problem of optimally sharing data movement across the communication edges in a distributed communication network given a set of overlapping queries and query plans for them (specifying the operations to be executed). Most of the problem variations of our general problem can be shown to be NP-Hard by a reduction from the Steiner tree problem. However, we show that the problem can be solved optimally if the communication network is a tree, and present a novel algorithm for finding an optimal data movement plan. For general communication networks, we present efficient approximation algorithms for several variations of the problem. Finally, we present an experimental study over synthetic datasets showing both the need for exploiting the sharing of data movement and the effectiveness of our algorithms at finding such plans.
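As a rough illustration of why sharing data movement helps on tree networks, the sketch below compares sending one data item separately to each destination with sending it once over the union of the paths. It is not the paper's algorithm: it ignores query operators entirely, assumes unit cost per edge, and assumes intermediate nodes can forward a copy to several neighbours. The tree, node names, and function names are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List, Set

Edge = FrozenSet[str]


def ancestors(parent: Dict[str, str], node: str) -> List[str]:
    """Chain from node up to the root (the root has no entry in parent)."""
    chain = [node]
    while node in parent:
        node = parent[node]
        chain.append(node)
    return chain


def path_edges(parent: Dict[str, str], a: str, b: str) -> Set[Edge]:
    """Edges on the unique tree path between a and b."""
    up_a, up_b = ancestors(parent, a), ancestors(parent, b)
    in_b = set(up_b)
    # Lowest common ancestor: first node on a's chain that is also on b's.
    meet = next(n for n in up_a if n in in_b)
    edges: Set[Edge] = set()
    for chain in (up_a, up_b):
        for node in chain:
            if node == meet:
                break
            edges.add(frozenset((node, parent[node])))
    return edges


def unshared_cost(parent: Dict[str, str], source: str,
                  destinations: Iterable[str]) -> int:
    """Edge traversals when each destination gets its own transfer."""
    return sum(len(path_edges(parent, source, d)) for d in destinations)


def shared_cost(parent: Dict[str, str], source: str,
                destinations: Iterable[str]) -> int:
    """Edge traversals when the item crosses each needed edge only once."""
    used: Set[Edge] = set()
    for d in destinations:
        used |= path_edges(parent, source, d)
    return len(used)


# Hypothetical network: child -> parent, rooted at "r".
PARENT = {"a": "r", "b": "r", "s": "a", "d": "b", "e": "b"}
```

For the example tree, `unshared_cost(PARENT, "s", ["d", "e"])` counts the edges `s-a`, `a-r`, and `r-b` twice, while `shared_cost` counts them once.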
Network-Aware Join Processing in Global-Scale Database Federations
Cited by 6 (1 self)
We introduce join scheduling algorithms that employ a balanced network utilization metric to optimize the use of all network paths in a global-scale database federation. This metric allows algorithms to exploit excess capacity in the network, while avoiding narrow, long-haul paths. We give a two-approximate, polynomial-time algorithm for serial (left-deep) join schedules. We also present extensions to this algorithm that explore parallel schedules, reduce resource usage, and define tradeoffs between computation and network utilization. We evaluate these techniques within the SkyQuery federation of Astronomy databases using spatial-join queries submitted by SkyQuery’s users. Experiments show that our algorithms realize near-optimal network utilization with minor computational overhead.
Query Optimization over Crowdsourced Data
Cited by 6 (1 self)
Deco is a comprehensive system for answering declarative queries posed over stored relational data together with data obtained on-demand from the crowd. In this paper we describe Deco’s cost-based query optimizer, building on Deco’s data model, query language, and query execution engine presented earlier. Deco’s objective in query optimization is to find the best query plan to answer a query, in terms of estimated monetary cost. Deco’s query semantics and plan execution strategies require several fundamental changes to traditional query optimization. Novel techniques incorporated into Deco’s query optimizer include a cost model distinguishing between “free” existing data versus paid new data, a cardinality estimation algorithm coping with changes to the database state during query execution, and a plan enumeration algorithm maximizing reuse of common subplans in a setting that makes reuse challenging. We experimentally evaluate Deco’s query optimizer, focusing on the accuracy of cost estimation and the efficiency of plan enumeration.
Optimal service ordering in decentralized queries over web services
- 2010
Cited by 3 (3 self)
The problem of ordering expensive predicates (or filter ordering) has recently received renewed attention due to emerging computing paradigms such as processing engines for queries over remote Web Services, and cloud and grid computing. The optimization of pipelined plans over services differs from traditional optimization significantly, since execution takes place in parallel and thus the query response time is determined by the slowest node in the plan, which is called the bottleneck node. Although polynomial algorithms have been proposed for several variants of optimization problems in this setting, the fact that communication links are typically heterogeneous in wide-area environments has been largely overlooked. Our proposal is the first attempt, to the best of our knowledge, which tries to optimize linear orderings of services when the services communicate directly with each other and the communication links are heterogeneous. We propose a novel optimal algorithm to solve this problem efficiently. The evaluation of the proposal shows that it can result in significant reductions of the response time.
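The bottleneck objective described in the abstract above can be made concrete with a small sketch. The cost model here is an assumption made for illustration: each service has a per-tuple processing cost and a selectivity, each ordered pair of services has a per-tuple transfer cost, and the cost of a pipelined linear plan is the largest per-input-tuple load on any service. The search is a plain brute force over permutations, which stands in for, and is not, the optimal algorithm the paper proposes. All names and numbers are hypothetical.

```python
from itertools import permutations
from typing import List, Sequence, Tuple


def bottleneck_cost(order: Sequence[int],
                    cost: List[float],
                    sel: List[float],
                    transfer: List[List[float]]) -> float:
    """Largest per-input-tuple load over all services in a linear plan.

    A service processes the fraction of tuples that survived the earlier
    services, then ships its own survivors to the next service in the order.
    """
    worst = 0.0
    input_fraction = 1.0  # fraction of original tuples reaching this service
    for pos, svc in enumerate(order):
        per_tuple = cost[svc]
        if pos + 1 < len(order):
            per_tuple += sel[svc] * transfer[svc][order[pos + 1]]
        worst = max(worst, input_fraction * per_tuple)
        input_fraction *= sel[svc]
    return worst


def best_order(cost: List[float],
               sel: List[float],
               transfer: List[List[float]]) -> Tuple[int, ...]:
    """Exhaustively find an ordering with the smallest bottleneck cost."""
    return min(permutations(range(len(cost))),
               key=lambda o: bottleneck_cost(o, cost, sel, transfer))


# Hypothetical three-service instance with heterogeneous links.
COST = [2.0, 1.0, 4.0]
SEL = [0.5, 0.9, 0.1]
TRANSFER = [
    [0.0, 1.0, 5.0],
    [1.0, 0.0, 1.0],
    [5.0, 1.0, 0.0],
]
```

Brute force is only practical for a handful of services, since the number of orderings grows factorially; it is useful here mainly as a reference against which a faster method could be checked.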
Replication-Aware Query Processing in Large-Scale Distributed Information Systems
- In WebDB, 2006
Cited by 3 (3 self)
In this work, we address the problem of replica selection in distributed query processing over the Web, in the presence of user preferences for Quality of Service and Quality of Data. In particular, we propose RAQP, which stands for Replication-Aware Query Processing. RAQP uses an initial statically-optimized logical plan, and then selects the execution site for each operator and also selects which replica to use, thus converting the logical plan to an executable plan. Unlike prior work, we do not perform an exhaustive search for the second phase, which allows RAQP to scale significantly better. Extensive experiments show that our scheme can provide improvements in both query response time and overall quality of QoS and QoD as compared to random site allocation with iterative improvement.
Query Performance Evaluation of an Architecture for Fine-Grained Integration of Heterogeneous Grid Data Sources
Cited by 1 (0 self)
Grid data sources may have schema- and data-level conflicts that need to be addressed using data transformation and integration technologies not supported by the current generation of Grid data access and querying middleware. We present an architecture that combines Grid data access and distributed querying with fine-grained data transformation/integration technologies, and the results of a query performance evaluation on this architecture. The performance evaluation indicates that it is indeed feasible to combine such technologies while achieving acceptable query performance. We also discuss the significance of our results for the further development of query performance over heterogeneous Grid data sources.
Key words: data integration, query processing, Grid computing, bioinformatics