The state of the art in distributed query processing
- ACM Computing Surveys
, 2000
Cited by 182 (2 self)
Abstract
Distributed data processing is fast becoming a reality. Businesses want to have it for many reasons, and they often must have it in order to stay competitive. While much of the infrastructure for distributed data processing is already in place (e.g., modern network technology), a number of issues still make distributed data processing a complex undertaking: (1) distributed systems can become very large, involving thousands of heterogeneous sites including PCs and mainframe server machines; (2) the state of a distributed system changes rapidly because the load of sites varies over time and new sites are added to the system; (3) legacy systems need to be integrated, yet such systems usually were not designed for distributed data processing and now need to interact with other (modern) systems in a distributed environment. This paper presents the state of the art of query processing for distributed database and information systems. The paper presents the "textbook" architecture for distributed query processing and a series of techniques that are particularly useful for distributed database systems. These techniques include special join techniques, techniques to exploit intra-query parallelism, techniques to reduce communication costs, and techniques to exploit caching and replication of data. Furthermore, the paper discusses different kinds of distributed systems such as client-server, middleware (multi-tier), and heterogeneous database systems and shows how query processing works in these systems.
Categories and subject descriptors: E.5 [Data]: Files; H.2.4 [Database Management Systems]: distributed databases, query processing; H.2.5 [Heterogeneous Databases]: data translation
General terms: algorithms; performance
Additional key words and phrases: query optimization; query execution; client-server databases; middleware; multi-tier architectures; database application systems; wrappers; replication; caching; economic models for query processing; dissemination-based information systems
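Among the "special join techniques" and techniques "to reduce communication costs" listed above, a classic example is the semijoin. The abstract does not name it, so this is an illustrative assumption rather than the survey's own presentation: site A ships only its distinct join keys to site B, B sends back only the matching rows, and A completes the join. The relation names, the row format, and the crude communication count (key values plus rows) are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Row = Dict[str, object]


def semijoin_reduce(remote: List[Row], keys: Set[object], key: str) -> List[Row]:
    # Runs at site B: keep only rows whose join key was shipped over from site A.
    return [r for r in remote if r[key] in keys]


def semijoin_join(local: List[Row], remote: List[Row], key: str) -> Tuple[List[Row], int]:
    """Join site A's `local` rows with site B's `remote` rows using a semijoin.

    Returns the join result and a crude communication count:
    key values sent A -> B plus rows sent back B -> A.
    """
    keys = {r[key] for r in local}                  # step 1: ship distinct join keys to B
    reduced = semijoin_reduce(remote, keys, key)    # step 2: B ships back only matching rows
    shipped = len(keys) + len(reduced)
    index: Dict[object, List[Row]] = {}             # step 3: A finishes the join locally
    for r in reduced:
        index.setdefault(r[key], []).append(r)
    result = [{**left, **right} for left in local for right in index.get(left[key], [])]
    return result, shipped


# Hypothetical data: two orders at site A, one hundred customers at site B.
orders = [{"order": 10, "cust": 1}, {"order": 11, "cust": 2}]
customers = [{"cust": i, "name": f"c{i}"} for i in range(1, 101)]
rows, shipped = semijoin_join(orders, customers, "cust")
```

Here 2 key values and 2 matching rows cross the network instead of all 100 customer rows. The saving depends on selectivity: if most remote rows match, shipping the keys first only adds traffic.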
Chained Declustering: A New Availability Strategy for Multiprocessor Database Machines
- In Proceedings of the 6th International Data Engineering Conference
, 1990
Cited by 112 (6 self)
Abstract
This paper presents a new strategy for increasing the availability of data in multi-processor, shared-nothing database machines. This technique, termed chained declustering, is demonstrated to provide superior performance in the event of failures while maintaining a very high degree of data availability. Furthermore, unlike most earlier replication strategies, the implementation of chained declustering requires no special hardware and only minimal modifications to existing software.
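The chained declustering layout can be sketched as follows, assuming the usual description of the scheme (the primary copy of fragment i on node i, its backup on node (i + 1) mod N) and a read-only workload over equal-sized fragments. The load-shifting rule is one way to realize the balancing the abstract credits chained declustering with; it is an assumption, not necessarily the paper's exact procedure.

```python
from fractions import Fraction
from typing import Dict, Set, Tuple


def placement(n: int) -> Dict[int, Tuple[int, int]]:
    """Fragment i -> (node holding the primary copy, node holding the backup copy)."""
    return {i: (i, (i + 1) % n) for i in range(n)}


def available(n: int, failed: Set[int]) -> bool:
    """True if every fragment still has a copy on some surviving node."""
    return all(p not in failed or b not in failed for p, b in placement(n).values())


def rebalanced_load(n: int, failed: int) -> Dict[int, Fraction]:
    """Per-node read load (in fragments) after one failure, shifting work along the chain.

    Surviving node failed+j (mod n), for j = 1..n-1, serves (n-j)/(n-1) of the
    previous fragment from its backup copy and j/(n-1) of its own fragment from
    the primary copy, so every survivor ends up with n/(n-1) fragments of work.
    """
    load: Dict[int, Fraction] = {}
    for j in range(1, n):
        node = (failed + j) % n
        from_backup = Fraction(n - j, n - 1)
        from_primary = Fraction(j, n - 1)
        load[node] = from_backup + from_primary
    return load
```

Any single failure leaves every fragment available, but losing two adjacent nodes loses the fragment whose primary and backup both lived there.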
Performance Tradeoffs for Client-Server Query Processing
, 1996
Cited by 61 (17 self)
Abstract
The construction of high-performance database systems that combine the best aspects of the relational and object-oriented approaches requires the design of client-server architectures that can fully exploit client and server resources in a flexible manner. The two predominant paradigms for client-server query execution are data-shipping and query-shipping. We first define these policies in terms of the restrictions they place on operator site selection during query optimization. We then investigate the performance tradeoffs between them for bulk query processing. While each strategy has advantages, neither one on its own is efficient across a wide range of circumstances. We describe and evaluate a more flexible policy called hybrid-shipping, which can execute queries at clients, servers, or any combination of the two. Hybrid-shipping is shown to at least match the best of the two "pure" policies, and in some situations, to perform better than both. The implementation of hybrid-shipping rais...
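The abstract defines the three policies by the restrictions they place on operator site selection. A minimal sketch of that idea, assuming a hypothetical linear plan (scan, select, aggregate), a toy cost model (CPU cost per site plus one unit per page shipped), and base data stored at the server with an optional cached copy at the client: data-shipping allows only client sites, query-shipping only server sites, and hybrid-shipping any combination. Because the hybrid search space contains both pure spaces, its best plan can never cost more than either.

```python
from itertools import product
from typing import List, Tuple

# Hypothetical linear plan, bottom-up: (operator, output pages, CPU cost at client, CPU cost at server).
Op = Tuple[str, int, float, float]
SITES = ("client", "server")


def plan_cost(plan: List[Op], sites: Tuple[str, ...], base_pages: int, cached_at_client: bool) -> float:
    # The base relation lives at the server; scanning at the client means shipping it,
    # unless the client already holds a cached copy.
    cost = float(base_pages) if sites[0] == "client" and not cached_at_client else 0.0
    for i, ((_, out_pages, cpu_client, cpu_server), site) in enumerate(zip(plan, sites)):
        cost += cpu_client if site == "client" else cpu_server
        next_site = sites[i + 1] if i + 1 < len(plan) else "client"  # results are displayed at the client
        if next_site != site:
            cost += out_pages  # one unit per page shipped between sites
    return cost


def allowed(policy: str, sites: Tuple[str, ...]) -> bool:
    if policy == "data-shipping":
        return all(s == "client" for s in sites)
    if policy == "query-shipping":
        return all(s == "server" for s in sites)
    return policy == "hybrid-shipping"  # any combination of sites


def best_cost(policy: str, plan: List[Op], base_pages: int, cached_at_client: bool) -> float:
    candidates = [s for s in product(SITES, repeat=len(plan)) if allowed(policy, s)]
    return min(plan_cost(plan, s, base_pages, cached_at_client) for s in candidates)


# Hypothetical numbers: the aggregate is expensive at a heavily loaded server.
PLAN: List[Op] = [("scan", 100, 10.0, 10.0), ("select", 5, 5.0, 5.0), ("aggregate", 1, 3.0, 20.0)]
```

With these numbers and no cached copy, running the scan and selection at the server and the aggregate at the client (cost 23) beats both data-shipping (118) and query-shipping (36); with a cached copy, pure data-shipping is already optimal.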
A Performance Study of Three High Availability Data Replication Strategies
- Proceedings of the 1st Conference on Parallel and Distributed Information Systems
, 1991
Cited by 28 (6 self)
Abstract
Several data replication strategies have been proposed to provide high data availability for database system applications. However, the tradeoffs among the different strategies for various workloads and different operating modes are still not well understood. In this paper, we study the relative performance of three high-availability data replication strategies (chained declustering, mirrored disks, and interleaved declustering) in a shared-nothing database machine environment. Among the issues that we have examined are (1) the relative performance of different strategies when no failures have occurred, (2) the effect of a single node failure on system throughput and response time, (3) the performance impact of varying the CPU speed and/or disk page size on the different replication strategies, and (4) the tradeoff between the benefit of intra-query parallelism and the overhead of activating and scheduling extra operator processes. Experimental results obtained from a simulation study indicate that, in the normal mode of operation, chained declustering and interleaved declustering perform comparably. Both perform better than mirrored disks if an application is I/O bound (due to disk scheduling), but slightly worse than mirrored disks if the application is CPU bound. In the event of a disk failure, because chained declustering is able to balance the workload while the other two cannot, it provides noticeably better performance than interleaved declustering and much better performance than mirrored disks.
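The failure-mode result can be made concrete with an idealized back-of-the-envelope model (an assumption, not the paper's simulator): a read-only, uniformly spread workload, a pre-failure load of 1 per node, and perfect redistribution of the failed node's reads among the nodes that hold its backup data. Under mirrored disks only the partner takes over; under interleaved declustering only the other nodes of the failed node's cluster (cluster size C) do; under chained declustering the shift propagates around the whole chain of N nodes.

```python
from fractions import Fraction


def peak_load_after_failure(strategy: str, n_nodes: int, cluster_size: int = 8) -> Fraction:
    """Highest per-node load after one node fails, relative to a pre-failure load of 1."""
    if strategy == "mirrored disks":
        # The surviving partner now serves all reads for the mirrored pair.
        return Fraction(2)
    if strategy == "interleaved declustering":
        # The failed node's backup is split over the other C - 1 nodes of its cluster.
        return 1 + Fraction(1, cluster_size - 1)
    if strategy == "chained declustering":
        # Work can be shifted around the whole chain, so all N - 1 survivors share it.
        return Fraction(n_nodes, n_nodes - 1)
    raise ValueError(f"unknown strategy: {strategy}")
```

For N = 32 and C = 8 the peak loads are 2, 8/7, and 32/31, the same ordering the abstract reports, although real gaps also depend on CPU speed, page size, and scheduling overheads that this model ignores.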
Decoupled Query Optimization for Federated Database Systems
Cited by 13 (2 self)
Abstract
We study the problem of query optimization in federated relational database systems. The nature of federated databases explicitly decouples many aspects of the optimization process, often making it imperative for the optimizer to consult underlying data sources while doing cost-based optimization. This not only increases the cost of optimization, but also significantly changes the trade-offs involved in the optimization process. The dominant cost in the decoupled optimization process is the "cost of costing", which has traditionally been considered insignificant. The optimizer can only afford a few rounds of messages to the underlying data sources, and hence the optimization techniques in this environment must be geared toward gathering all the required cost information with minimal communication.
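One simple way to keep the "cost of costing" down is to batch costing requests: instead of a round trip per sub-plan, the optimizer collects every sub-query it needs costed, groups them by data source, and sends one message per source. The sketch below assumes a hypothetical `ask(source, queries)` call that returns cost estimates for a list of sub-queries; it illustrates the goal stated in the abstract and is not the paper's algorithm.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical costing request: (data source name, sub-query to be costed).
Request = Tuple[str, str]
# Hypothetical remote call: costs a list of sub-queries at one source in one message.
AskFn = Callable[[str, List[str]], Dict[str, float]]


def cost_one_by_one(requests: List[Request], ask: AskFn) -> Tuple[Dict[Request, float], int]:
    costs: Dict[Request, float] = {}
    messages = 0
    for source, q in requests:
        costs[(source, q)] = ask(source, [q])[q]  # one round trip per request
        messages += 1
    return costs, messages


def cost_batched(requests: List[Request], ask: AskFn) -> Tuple[Dict[Request, float], int]:
    by_source: Dict[str, List[str]] = defaultdict(list)
    for source, q in requests:
        if q not in by_source[source]:  # drop duplicate requests
            by_source[source].append(q)
    costs: Dict[Request, float] = {}
    for source, queries in by_source.items():
        for q, c in ask(source, queries).items():  # one message per source
            costs[(source, q)] = c
    return costs, len(by_source)
```

Both functions return the same estimates; the batched one needs one message per source instead of one per request, and messages to different sources can be sent concurrently.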
Cache Investment: Integrating Query Optimization and Distributed Data Placement
- ACM TODS
, 2000
Cited by 12 (2 self)
Abstract
Emerging distributed query processing systems support... In this paper, we propose Cache Investment mechanisms and policies and analyze their performance. The analysis uses results from both an implementation on the SHORE storage manager and a detailed simulation model. Our results show that Cache Investment can significantly improve the overall performance of a system and demonstrate the tradeoffs among various alternative policies.
A Study of Query Execution Strategies for Client-Server Database Systems
- In Proceedings of ACM SIGMOD Conference
, 1996
Cited by 7 (2 self)
Abstract
Query processing in a client-server database system raises the question of where to execute queries to minimize the communication costs and response time of a query, and to load-balance the system. This paper evaluates the two common query execution strategies, data shipping and query shipping, and a policy referred to as hybrid shipping. Data shipping requires that queries be executed at clients; query shipping requires that queries be executed at servers; and hybrid shipping provides the flexibility to execute queries at clients and servers. The experiments with a client-server model confirm that the query execution policy is critical for the performance of a system. Neither data shipping nor query shipping is optimal in all situations, and the performance penalties can be substantial. Hybrid shipping at least matches the best performance of data and query shipping and performs better than both in many cases. The performance of hybrid shipping plans, however, is shown to be sen...
Cache Investment Strategies
- University of Maryland, College Park, MD
, 1997
Cited by 6 (4 self)
Abstract
Emerging client-server and peer-to-peer distributed information systems employ data caching to improve performance and reduce the need for remote access to data. In distributed database systems, caching is a by-product of query operator placement: data that are brought to a site by a query operator can be retained at that site for future use. Operator placement, however, must take the location of cached data into account in order to avoid excessive data movement. Thus, there is a fundamental circular dependency between caching and query optimization. In this paper, we identify this circularity and show that, in order to break it, query optimization must be extended to look beyond the performance of a single query. To do so, we propose the notion of Cache Investment, in which a sub-optimal plan may be generated for a particular query in order to effect a data placement that is beneficial for subsequent queries. We develop a framework for integrating Cache Investment decisions into...
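The Cache Investment trade-off can be sketched with a toy decision rule, assuming (hypothetically) that each candidate plan is summarized by its cost for the current query and the per-query cost it implies for an expected number of later queries on the same data. The optimizer then accepts a plan that is sub-optimal now if the data placement it leaves behind pays off later. The numbers and the fixed query count are illustrative only; the paper's actual mechanisms and policies are not reproduced here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PlanOption:
    name: str
    cost_now: float          # cost of the current query under this plan
    future_cost_each: float  # cost of each later query, given where this plan leaves the data


def choose_plan(options: List[PlanOption], expected_future_queries: int) -> PlanOption:
    # Judge each plan by its current cost plus the cost it implies for the expected
    # follow-up queries, rather than by its current cost alone.
    return min(options, key=lambda p: p.cost_now + expected_future_queries * p.future_cost_each)


# Hypothetical numbers: shipping the query to the server is cheaper now, while
# shipping data to the client costs more now but leaves a cached copy behind.
query_ship = PlanOption("query-ship", cost_now=20.0, future_cost_each=20.0)
data_ship = PlanOption("data-ship (caches data)", cost_now=35.0, future_cost_each=5.0)
```

With these numbers the extra 15 units spent now are recovered after one follow-up query (15 / (20 - 5) = 1), so the investment wins once two or more follow-ups are expected.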
Architecture and Performance of Large and Disperse Distributed Object Base Systems
- Position Paper for OOPSLA Workshop on Object Database Behaviour, Benchmarks, and Performance
, 1995
Cited by 2 (0 self)
Abstract
Introduction
Most object-oriented database systems and persistent object stores have been constructed in a workstation-server environment in which client workstations are connected to the server machines by a local-area network (e.g., an Ethernet). The servers are responsible for the persistent and consistent storage of the object base and for preventing unauthorized access to the data. Application programs run on the client workstations to exploit the resources of the workstations and to avoid computation on the servers, which are the potential bottlenecks of the system. Typically, object-oriented databases have been used in very computation-intensive and/or highly interactive applications (e.g., engineering applications), and therefore techniques such as pointer swizzling [Mos92, KK95] have been used to speed up the processing of persistent objects at workstations. In this environment, scalability means that an arbitrary number of client workstations can be connected to the se...
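Pointer swizzling, cited above, replaces a persistent object identifier (OID) stored in an object with a direct in-memory reference once the referenced object is resident at the workstation, so later traversals skip the lookup. A minimal sketch of lazy swizzling (swizzle on first dereference), which is only one of several variants, assuming a hypothetical dictionary that stands in for the server's object store:

```python
from typing import Dict, Optional, Tuple, Union

# Hypothetical stand-in for the server: OID -> (payload, OID of the referenced object or None).
ServerStore = Dict[int, Tuple[str, Optional[int]]]


class PersistentObject:
    def __init__(self, oid: int, payload: str, ref: Optional[int]) -> None:
        self.oid = oid
        self.payload = payload
        # Holds an OID (int) until swizzled, then a direct reference to the object.
        self.ref: Union[int, "PersistentObject", None] = ref


class ClientCache:
    """Workstation-side object cache with lazy (on-first-dereference) swizzling."""

    def __init__(self, server: ServerStore) -> None:
        self.server = server
        self.resident: Dict[int, PersistentObject] = {}  # resident object table
        self.fetches = 0                                  # objects fetched from the server

    def load(self, oid: int) -> PersistentObject:
        if oid not in self.resident:
            payload, ref = self.server[oid]
            self.fetches += 1
            self.resident[oid] = PersistentObject(oid, payload, ref)
        return self.resident[oid]

    def follow(self, obj: PersistentObject) -> Optional[PersistentObject]:
        if isinstance(obj.ref, int):      # unswizzled: resolve the OID once...
            obj.ref = self.load(obj.ref)  # ...and overwrite it with a direct reference
        return obj.ref
```

After the first `follow`, the reference is a plain Python object reference, so repeated traversals touch neither the resident object table nor the server.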
Performance Evaluation of Nested Transactions on Locally Distributed Database Systems
- In Proceedings of the 2nd International Symposium on Parallel Architectures, Algorithms, and Networks (I-SPAN), IEEE
, 1996
Cited by 2 (0 self)
Abstract
This paper describes an execution time estimating model for nested transactions running on locally distributed database systems. First, the model of nested transactions and the model of a locally distributed database system are established. The performance evaluation model of nested transactions is then built in three steps. The first step describes a nondeterministic algorithm that evaluates the execution time of a nested transaction using general routing strategies. The second step gives explicit-form solutions for a few special cases. The last step uses approximate methods to develop lower and upper bounds for the general-form solution.
Key Words: nested transaction; locally distributed database systems; performance evaluation; combinatorics.
1 Introduction
The demand for high transaction processing rates has motivated the development of multiprocessor or locally distributed database systems [9]. A locally distributed database system has a tight interconnection among it...
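To make the idea of lower and upper bounds concrete, here is a deliberately simplified model that is not the paper's: each (sub)transaction has a local execution time and is routed to a site, siblings routed to the same site run serially, siblings on different sites run in parallel, and communication cost is ignored. Full parallelism gives the lower bound (the critical path) and full serialization gives the upper bound (total work); any routing in this model lies between the two. The `Txn` structure and its fields are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Txn:
    work: float                                  # local execution time of this (sub)transaction
    site: int = 0                                # site the subtransaction is routed to
    children: List["Txn"] = field(default_factory=list)


def estimate(t: Txn) -> float:
    # Siblings routed to the same site run one after another; different sites run in parallel.
    per_site: Dict[int, float] = defaultdict(float)
    for c in t.children:
        per_site[c.site] += estimate(c)
    return t.work + max(per_site.values(), default=0.0)


def lower_bound(t: Txn) -> float:
    # Every subtransaction on its own site: only the critical path counts.
    return t.work + max((lower_bound(c) for c in t.children), default=0.0)


def upper_bound(t: Txn) -> float:
    # Everything serialized: total work.
    return t.work + sum(upper_bound(c) for c in t.children)
```

Routing the two children of the root to different sites reaches the lower bound; routing them to the same site reaches the upper bound.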

