Parallel Evaluation of Multi-Join Queries
- In Proc. ACM SIGMOD Int'l Conf., 1995
Cited by 48 (1 self)
A number of execution strategies for parallel evaluation of multi-join queries have been proposed in the literature; their performance was evaluated by simulation. In this paper we give a comparative performance evaluation of four execution strategies by implementing all of them on the same parallel database system, PRISMA/DB. Experiments were run on up to 80 processors. The basic strategy is to first determine an execution schedule with minimum total cost and then parallelize this schedule with one of the four execution strategies. These strategies, taken from the literature, are named Sequential Parallel, Synchronous Execution, Segmented Right-Deep, and Full Parallel. Based on the experiments, clear guidelines are given on when to use which strategy.
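The Segmented Right-Deep strategy, as its name suggests, is based on right-deep join trees: every inner relation is hashed first, and the outer relation then streams through the chain of probes without intermediate results being materialized. The Python sketch below illustrates only that pipelining idea in a single process. It assumes rows are dicts, all joins are equi-joins, and the join attribute has the same name on both sides; the function names and sample relations are hypothetical, and this is not the PRISMA/DB implementation. In a parallel system each hash table would live on its own group of processors and the probes would run concurrently.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, object]


def build_hash_table(rows: Iterable[Row], key: str) -> Dict[object, List[Row]]:
    """Build phase: hash one inner relation on its join attribute."""
    table: Dict[object, List[Row]] = {}
    for row in rows:
        table.setdefault(row[key], []).append(row)
    return table


def probe(stream: Iterable[Row], table: Dict[object, List[Row]], key: str) -> Iterator[Row]:
    """Probe phase: yield joined rows as soon as they are found (no materialization)."""
    for row in stream:
        for match in table.get(row[key], []):
            yield {**match, **row}


def right_deep_pipeline(outer: Iterable[Row], inners: List[Tuple[List[Row], str]]) -> List[Row]:
    """Hash every inner relation first, then push the outer relation through
    the whole chain of probes as one pipeline. Each inner is (rows, join_key)."""
    tables = [(build_hash_table(rows, key), key) for rows, key in inners]
    stream: Iterable[Row] = outer
    for table, key in tables:
        stream = probe(stream, table, key)  # chain generators lazily
    return list(stream)  # draining the last generator drives the whole pipeline


if __name__ == "__main__":
    orders = [{"oid": 1, "cid": 10, "pid": 100}, {"oid": 2, "cid": 11, "pid": 101}]
    customers = [{"cid": 10, "cname": "Ann"}]
    products = [{"pid": 100, "pname": "bolt"}, {"pid": 101, "pname": "nut"}]
    print(right_deep_pipeline(orders, [(customers, "cid"), (products, "pid")]))
```

In the usage example, the order for customer 11 finds no match at the first probe and never reaches the second hash table.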
PRISMA/DB: A Parallel Main Memory Relational DBMS
- IEEE Transactions on Knowledge and Data Engineering, 1992
Cited by 40 (12 self)
Abstract: PRISMA/DB is a full-fledged parallel, main-memory relational database management system whose design is characterized by two main ideas. First, high performance is obtained by using parallelism for query processing and by storing the entire database in main memory. Second, a flexible architecture for experimenting with functionality and performance is obtained through a modular implementation of the system in an object-oriented programming language. This paper describes the design and implementation of PRISMA/DB in detail. A performance evaluation also shows that the system is comparable to other state-of-the-art database machines. The prototype implementation is complete and runs on a 100-node parallel multiprocessor. The achieved flexibility makes the system a valuable platform for research in various directions. Index Terms: parallel, main memory, relational database management system, design and implementation, architecture, query execution, experimentation, integrity constraints.
A survey of parallel execution strategies for transitive closure and logic programs
- Distributed and Parallel Databases, 1993
Cited by 24 (6 self)
An important feature of database technology of the nineties is the use of parallelism to speed up the execution of complex queries. This technology is being tested in several experimental database architectures and a few commercial systems for conventional select-project-join queries. In particular, hash-based fragmentation is used to distribute data to disks under the control of different processors so that selections and joins can be performed in parallel. With the development of new query languages, and in particular with the definition of transitive closure queries and more general logic programming queries, recursion has been added as a new dimension of query processing. Recursive queries are complex; at the same time, their regular structure is particularly suited to parallel execution, and parallelism may yield a large efficiency gain. We survey the approaches to parallel execution of recursive queries that have been presented in the recent literature. We observe that research on this topic is divided into two distinct subareas, one focused on the transitive closure of Relational Algebra expressions, the other on optimization of more general Datalog queries. Although the subareas seem radically different because of the approaches and formalisms used, they have many features in common. This is not surprising, because most typical Datalog queries can be solved by means of the transitive closure of simple ...
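The hash-based fragmentation mentioned above works for joins because, when both relations are fragmented on the join attribute with the same hash function, tuples that can join always land on the same processor, so each processor can join its own fragments independently. The Python sketch below illustrates this under stated assumptions: relations are lists of (key, payload) pairs, threads stand in for processors (CPython threads do not give true CPU parallelism, so this shows the data placement rather than the speedup), and all function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Tup = Tuple[object, object]  # (join key, payload)


def hash_fragment(rel: List[Tup], n: int) -> List[List[Tup]]:
    """Send each tuple to fragment hash(key) % n."""
    frags: List[List[Tup]] = [[] for _ in range(n)]
    for t in rel:
        frags[hash(t[0]) % n].append(t)
    return frags


def local_join(r_frag: List[Tup], s_frag: List[Tup]) -> List[Tuple[object, object, object]]:
    """Plain hash join on one processor's pair of fragments."""
    index: Dict[object, List[object]] = {}
    for key, payload in r_frag:
        index.setdefault(key, []).append(payload)
    return [(key, rp, sp) for key, sp in s_frag for rp in index.get(key, [])]


def parallel_hash_join(r: List[Tup], s: List[Tup], n: int = 4) -> List[Tuple[object, object, object]]:
    """Matching keys hash to the same fragment index, so the n local joins
    together produce the complete result without exchanging further data."""
    r_frags, s_frags = hash_fragment(r, n), hash_fragment(s, n)
    with ThreadPoolExecutor(max_workers=n) as pool:  # threads simulate processors
        parts = list(pool.map(local_join, r_frags, s_frags))
    return [t for part in parts for t in part]


if __name__ == "__main__":
    r = [(1, "a"), (2, "b"), (3, "c")]
    s = [(1, "x"), (3, "y"), (3, "z"), (4, "w")]
    print(sorted(parallel_hash_join(r, s)))
```

The result is the same for any number of fragments; only the distribution of work changes.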
Data Fragmentation for Parallel Transitive Closure Strategies
- In Proceedings of the IEEE 9th International Conference on Data Engineering, 1993
Cited by 9 (1 self)
A topic that is currently inspiring a lot of research is parallel (distributed) computation of transitive closure queries. In [10] the disconnection set approach has been introduced as an effective strategy for such a computation. It involves reformulating a transitive closure query on a relation into a number of transitive closure queries on smaller fragments; these queries can then execute independently on the fragments, without need for communication and without computing the same tuples at more than one processor. Now that effective strategies as just mentioned have been developed, the next problem is that of developing adequate data fragmentation strategies for these approaches. This is a difficult problem, but of paramount importance to the success of these approaches. We discuss the issues that influence data fragmentation. We present a number of algorithms, each focusing on one of the important issues. We discuss the pros and cons of the algorithms, and we give some results of ...
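The disconnection set idea can be shown in its simplest form with two fragments that share a set of nodes. The Python sketch below assumes that simplified case: the source node occurs only in the first fragment, the target only in the second, and every path between them crosses from the first fragment to the second exactly once, through a shared node. Each fragment's closure is computed with semi-naive evaluation and needs no communication with the other. This is an illustration, not the algorithm of [10]; the function names are hypothetical, and a real implementation would compute only the paths the query needs rather than full closures.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]


def transitive_closure(edges: Set[Edge]) -> Set[Edge]:
    """Semi-naive evaluation: each round extends only the pairs found in the
    previous round (delta) by one base edge, until nothing new appears."""
    succ: Dict[str, Set[str]] = {}
    for a, b in edges:
        succ.setdefault(a, set()).add(b)
    closure = set(edges)
    delta = set(edges)
    while delta:
        new = {(a, c) for a, b in delta for c in succ.get(b, ())} - closure
        closure |= new
        delta = new
    return closure


def reachable_across(a: str, b: str, f1: Set[Edge], f2: Set[Edge], dset: Set[str]) -> bool:
    """Assumes a occurs only in fragment f1, b only in f2, and every a-to-b path
    leaves f1 through a node of the disconnection set dset and never returns.
    The two closures are independent and could run on separate processors."""
    tc1 = transitive_closure(f1)
    tc2 = transitive_closure(f2)
    return any((a, d) in tc1 and (d, b) in tc2 for d in dset)


if __name__ == "__main__":
    f1 = {("a", "x"), ("x", "d1")}
    f2 = {("d1", "y"), ("y", "b"), ("d2", "z")}
    print(reachable_across("a", "b", f1, f2, {"d1", "d2"}))
```

The final combination step iterates over the disconnection set, which is one reason the abstract stresses that fragmentation quality matters: small disconnection sets keep that step cheap.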
StreamJoin: A Generic Database Approach to Support the Class of Stream-Oriented Applications
- In International Database Engineering and Applications Symposium (IDEAS), 2000
Cited by 3 (1 self)
Today many applications routinely generate large quantities of data. The data often takes the form of (time) series or, more generally, streams, i.e., ordered sequences of records. Analyzing this data requires stream processing techniques that differ significantly from the workloads current database analysis and query techniques are optimized for. In this paper we present a new operator, called StreamJoin, that can be used efficiently to solve stream-related problems in various applications, such as universal quantification, pattern recognition, and data mining. Unlike other approaches, StreamJoin processing provides rapid response times, non-blocking execution, and economical resource utilization. Adaptability to different application scenarios is achieved by means of parameters. In addition, the StreamJoin operator can be embedded efficiently into the database engine, thus implicitly using its optimization and parallelization capabilities for the benefit of the appl...
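The abstract does not describe StreamJoin's internals, so the following only illustrates one of the problems it names, universal quantification, and why ordered input allows non-blocking processing. The Python sketch assumes the stream is already sorted by its group key and answers a hypothetical query such as "which students have taken every required course?". It is a standard sort-based division technique, not the StreamJoin operator, and the function name is hypothetical.

```python
from itertools import groupby
from typing import Hashable, Iterable, Iterator, Set, Tuple


def divide_sorted_stream(stream: Iterable[Tuple[Hashable, Hashable]],
                         required: Set[Hashable]) -> Iterator[Hashable]:
    """Universal quantification over a stream ordered by its first field:
    yield every group key whose records cover all values in `required`.
    A group is decided as soon as its run ends, so results flow out early and
    only one group's matching values are held in memory at a time."""
    for key, records in groupby(stream, key=lambda rec: rec[0]):
        seen = {value for _, value in records if value in required}
        if seen == required:
            yield key


if __name__ == "__main__":
    enrolments = [("s1", "db"), ("s1", "os"), ("s2", "db"),
                  ("s3", "os"), ("s3", "db"), ("s3", "ai")]
    for student in divide_sorted_stream(enrolments, {"db", "os"}):
        print(student)
```

Because the generator yields each qualifying key when its group closes, a downstream consumer sees the first answers before the input is exhausted.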
A Parallel and Distributed Approach for Finding Transitive Closures of Data Records: A Proposal
Cited by 2 (0 self)
Abstract: In this paper, we propose an approach to finding transitive closures on large data sets in a distributed (i.e., parallel) environment. Finding transitive closures of data records is a preprocessing step of a two-step approach to data quality control, covering aspects such as accuracy, redundancy, consistency, currency, and completeness. The objective of finding transitive closures is to reduce the number of records to be considered in the second step from a whole data source of hundreds of millions to billions of records to the range of hundreds to thousands. Processing hundreds of millions to billions of records requires an efficient approach that works in a distributed environment. As part of this approach, this paper presents an efficient distributed algorithm for solving the distributed transitive closure problem on large data sets. Because of huge data volumes, many real-world applications need fast and efficient approaches to data analysis and data mining. Data cleansing, which precedes data analysis and data mining, has become a center of research interest in recent years. As part of it, the process of finding transitive closures (i.e., finding all related records) is the main goal of this paper. The computation of transitive closures of data records involves two related but independent activities. One is to determine whether two records are related based on some definition of relatedness ...
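For a symmetric relatedness relation, finding all related records amounts to computing the connected components of the "is related to" graph. The abstract does not give its algorithm, so the Python sketch below swaps in a standard technique: each node runs union-find over the related pairs it holds, and a coordinator merges the compact per-node summaries. It assumes the first activity (deciding whether two records are related) has already produced the pairs; the class and function names are hypothetical.

```python
from typing import Dict, Hashable, Iterable, List, Tuple

Pair = Tuple[Hashable, Hashable]


class UnionFind:
    def __init__(self) -> None:
        self.parent: Dict[Hashable, Hashable] = {}

    def find(self, x: Hashable) -> Hashable:
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:  # path compression
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a: Hashable, b: Hashable) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[ra] = rb


def local_components(pairs: Iterable[Pair]) -> List[Pair]:
    """Runs on one node: summarize its related pairs as (record, local root) edges."""
    uf = UnionFind()
    for a, b in pairs:
        uf.union(a, b)
    return [(x, uf.find(x)) for x in list(uf.parent)]


def merge_components(partials: Iterable[List[Pair]]) -> List[List[Hashable]]:
    """Coordinator: union the per-node summaries and return each closure as a sorted group."""
    uf = UnionFind()
    for partial in partials:
        for x, root in partial:
            uf.union(x, root)
    groups: Dict[Hashable, List[Hashable]] = {}
    for x in list(uf.parent):
        groups.setdefault(uf.find(x), []).append(x)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    partitions = [[("r1", "r2")], [("r2", "r3"), ("r4", "r5")]]
    print(merge_components(local_components(p) for p in partitions))
```

Each per-node summary has one edge per record seen on that node, so the coordinator receives far less than the original pair lists when records are related many times over.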
Minimum Cost Path for a Shared Nothing Architecture
- 2004
Abstract: Computing the minimum cost path is a key requirement in Intelligent Transportation Systems (ITS) and in some Geographical Information Systems (GIS) applications. The major characteristics of these systems are that the underlying transportation graph is large and the computation is under time constraints. Because classic algorithms are insufficient in these settings, recent studies have focused on speeding up the computation with alternative techniques such as heuristics, precomputation, and parallelization. In this study, we investigate solutions that assume a shared-nothing architecture (i.e., the Teradata multimedia database system) as a way of speeding up the computation further. We build our algorithms on a recently developed graph model, the Hierarchical mulTigraph (HiTi), and describe both concurrent and parallel versions of the algorithms. The concurrent algorithm allows simultaneous exploration of the search space by using dynamically created agents across multiple disk nodes, which the Teradata multimedia database system architecture supports efficiently. The parallel algorithm breaks the problem into a set of smaller subproblems by exploiting a set of intermediate nodes that the shortest path passes through. We also investigate the impact of replicating subgraphs on the performance of our algorithms. We evaluated our algorithms via a simulation study and demonstrated that both the concurrent and the parallel algorithms show almost linear speedup as the number of disk/CPU nodes increases. The concurrent algorithm exhibits better sizeup and scaleup than the parallel algorithm.
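The parallel algorithm's decomposition rests on a simple identity: if the shortest path is known to pass through one of a set of intermediate nodes, its cost is the minimum, over those nodes, of the cost from the source to the node plus the cost from the node to the target. The Python sketch below illustrates the identity with plain Dijkstra searches on a flat graph. It is not the HiTi model or the authors' algorithm, and the graph, node names, and function names are hypothetical.

```python
import heapq
from typing import Dict, Iterable, List, Tuple

Graph = Dict[str, List[Tuple[str, float]]]
INF = float("inf")


def dijkstra(graph: Graph, source: str) -> Dict[str, float]:
    """Classic single-source minimum cost search with a binary heap."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, INF):
            continue  # outdated heap entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, INF):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def reverse(graph: Graph) -> Graph:
    """Flip every edge so a search from t computes costs *to* t."""
    rev: Graph = {}
    for u, edges in graph.items():
        for v, w in edges:
            rev.setdefault(v, []).append((u, w))
    return rev


def min_cost_via(graph: Graph, s: str, t: str, intermediates: Iterable[str]) -> float:
    """If every s-t path passes through some node in `intermediates`, then
    d(s, t) = min over b of d(s, b) + d(b, t). The forward and backward
    searches are independent subproblems and could run on different nodes."""
    from_s = dijkstra(graph, s)
    to_t = dijkstra(reverse(graph), t)
    return min((from_s.get(b, INF) + to_t.get(b, INF) for b in intermediates), default=INF)


if __name__ == "__main__":
    g = {"s": [("a", 1.0), ("b", 4.0)], "a": [("m", 2.0)],
         "b": [("m", 1.0)], "m": [("t", 3.0)]}
    print(min_cost_via(g, "s", "t", {"m"}))
```

If the assumption fails (some s-t path avoids every intermediate node), the function still returns an upper bound on the true cost, but no longer necessarily the minimum.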