Parallel database systems: the future of high performance database systems
- Communications of the ACM
, 1992
Cited by 466 (8 self)
Parallel database machine architectures have evolved from the use of exotic hardware to a software parallel dataflow architecture based on conventional shared-nothing hardware. These new designs provide impressive speedup and scaleup when processing relational database queries. This paper reviews the techniques used by such systems, and surveys current commercial and research systems.
Disk-directed I/O for MIMD Multiprocessors
, 1994
Cited by 217 (18 self)
Many scientific applications that run on today’s multiprocessors, such as weather forecasting and seismic analysis, are bottlenecked by their file-I/O needs. Even if the multiprocessor is configured with sufficient I/O hardware, the file-system software often fails to provide the available bandwidth to the application. Although libraries and enhanced file-system interfaces can make a significant improvement, we believe that fundamental changes are needed in the file-server software. We propose a new technique, disk-directed I/O, to allow the disk servers to determine the flow of data for maximum performance. Our simulations show that tremendous performance gains are possible. Indeed, disk-directed I/O provided consistent high performance that was largely independent of data distribution, obtained up to 93% of peak disk bandwidth, and was as much as 16 times faster than traditional parallel file systems.
The Gamma database machine project
- IEEE Transactions on Knowledge and Data Engineering
, 1990
Cited by 203 (27 self)
This paper describes the design of the Gamma database machine and the techniques employed in its implementation. Gamma is a relational database machine currently operating on an Intel iPSC/2 hypercube with 32 processors and 32 disk drives. Gamma employs three key technical ideas which enable the architecture to be scaled to 100s of processors. First, all relations are horizontally partitioned across multiple disk drives, enabling relations to be scanned in parallel. Second, novel parallel algorithms based on hashing are used to implement the complex relational operators such as join and aggregate functions. Third, dataflow scheduling techniques are used to coordinate multioperator queries. By using these techniques it is possible to control the execution of very complex queries with minimal coordination, a necessity for configurations involving a very large number of processors. In addition to describing the design of the Gamma software, a thorough performance evaluation of the iPSC/2 hypercube version of Gamma is also presented. In addition to measuring the effect of relation size and indices on the response time for selection, join, aggregation, and update queries, we also analyze the performance of Gamma relative to the number of processors employed when the sizes of the input relations are kept constant (speedup) and when the sizes of the input relations are increased proportionally to the number of processors (scaleup). The speedup results obtained for both selection and join queries are linear; thus, doubling the number of processors
Parallel sorting on a shared-nothing architecture using probabilistic splitting
, 1991
Cited by 76 (1 self)
We consider the problem of external sorting in a shared-nothing multiprocessor. A critical step in the algorithms we consider is to determine the range of sort keys to be handled by each processor. We consider two techniques for determining these ranges of sort keys: exact splitting, using a parallel version of the algorithm proposed by Iyer, Ricard, and Varman; and probabilistic splitting, which uses sampling to estimate quantiles. We present analytic results showing that probabilistic splitting performs better than exact splitting. Finally, we present experimental results from an implementation of sorting via probabilistic splitting in the Gamma parallel database machine.
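The idea of probabilistic splitting in the abstract above can be illustrated with a small in-memory sketch. It is not the paper's algorithm or the Gamma implementation: the function names, the sample size, and the use of Python lists in place of processors and disks are all assumptions made for illustration. A random sample of keys is sorted, evenly spaced sample quantiles become the range boundaries, and each partition is then sorted independently.

```python
import bisect
import random


def choose_splitters(keys, num_partitions, sample_size, rng):
    """Estimate num_partitions - 1 range boundaries from a random sample."""
    if not keys:
        return []
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    # Evenly spaced sample quantiles approximate the true key quantiles.
    return [sample[(i * len(sample)) // num_partitions]
            for i in range(1, num_partitions)]


def partition_by_range(keys, splitters):
    """Route each key to the partition whose key range contains it."""
    partitions = [[] for _ in range(len(splitters) + 1)]
    for key in keys:
        partitions[bisect.bisect_right(splitters, key)].append(key)
    return partitions


def sample_sort(keys, num_partitions=4, sample_size=64, seed=0):
    """Sort keys by sampled range partitioning (hypothetical helper)."""
    rng = random.Random(seed)
    splitters = choose_splitters(keys, num_partitions, sample_size, rng)
    partitions = partition_by_range(keys, splitters)
    # Each partition stands in for one processor sorting its own key range;
    # concatenating the sorted partitions yields a globally sorted sequence.
    return [sorted(p) for p in partitions]
```

Because the boundaries come from a sample, partition sizes are only approximately equal; a larger sample trades extra sampling work for better balance.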
An Evaluation of Non-Equijoin Algorithms
- In VLDB
, 1991
Cited by 72 (0 self)
A non-equijoin of relations R and S is a band join if the join predicate requires values in the join attribute of R to fall within a specified band about the values in the join attribute of S. We propose a new algorithm, termed a partitioned band join, for evaluating band joins. We present a comparison between the partitioned band join algorithm and the classical sort-merge join algorithm (optimized for band joins) using both an analytical model and an implementation on top of the WiSS storage system. The results show that the partitioned band join algorithm outperforms sort-merge unless memory is scarce and the operands of the join are of equal size. We also describe a parallel implementation of the partitioned band join on the Gamma database machine, and present data from speedup and scaleup experiments demonstrating that the partitioned band join is efficiently parallelizable.
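To make the band-join predicate concrete, the sketch below evaluates it in memory using a sorted copy of S and binary search. This is neither the partitioned band join proposed in the paper nor its sort-merge baseline; the function name and the band parameters `c1` and `c2` are assumptions introduced here, with the predicate taken to be s - c1 <= r <= s + c2.

```python
import bisect


def band_join(r_values, s_values, c1, c2):
    """Return all pairs (r, s) with s - c1 <= r <= s + c2."""
    sorted_s = sorted(s_values)
    result = []
    for r in r_values:
        # s - c1 <= r <= s + c2 is equivalent to r - c2 <= s <= r + c1,
        # so the matching s values form one contiguous run in sorted_s.
        lo = bisect.bisect_left(sorted_s, r - c2)
        hi = bisect.bisect_right(sorted_s, r + c1)
        result.extend((r, s) for s in sorted_s[lo:hi])
    return result
```

For example, `band_join([10, 20], [8, 11, 14, 25], c1=2, c2=3)` pairs 10 with 8 and 11, while 20 has no partner within its band.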
Parallel Database Systems: The Future of Database Processing or a Passing Fad?
- SIGMOD Record
, 1991
Cited by 46 (6 self)
Parallel database machine architectures have evolved from the use of exotic hardware to a software parallel dataflow architecture based on conventional shared-nothing hardware. These new designs provide impressive speedup and scaleup when processing relational database queries. This paper reviews the techniques used by such systems, and surveys current commercial and research systems.
The Wisconsin benchmark: Past, present, and future
- The Benchmark Handbook for Database and Transaction Processing Systems
, 1991
Cited by 45 (2 self)
In 1981 as we were completing the implementation of the DIRECT database machine [DEWI79, BORA82], attention turned to evaluating its performance. At that time no standard database benchmark existed. There were only a few application-specific benchmarks. While application-specific benchmarks measure which database system is best for a particular
Expanding the potential for disk-directed I/O
- In Proceedings of the 1995 IEEE Symposium on Parallel and Distributed Processing
, 1995
Cited by 22 (6 self)
As parallel computers are increasingly used to run scientific applications with large data sets, and as processor speeds continue to increase, it becomes more important to provide fast, effective parallel file systems for data storage and for temporary files. In an earlier work we demonstrated that a technique we call disk-directed I/O has the potential to provide consistent high performance for large, collective, structured I/O requests. In this paper we expand on this potential by demonstrating the ability of a disk-directed I/O system to read irregular subsets of data from a file, and to filter and distribute incoming data according to data-dependent functions.
A Case for Parallelism in Data Warehousing and OLAP
- In the 9th International Workshop on Database and Expert Systems Applications (DEXA98)
, 1998
Cited by 18 (2 self)
In recent years the database community has experienced a tremendous increase in the availability of new technologies to support efficient storage and retrieval of large volumes of data, namely data warehousing and On-Line Analytical Processing (OLAP) products. Efficient query processing is critical in such an environment, yet achieving quick response times with OLAP queries is still largely an open issue. In this paper we propose a solution approach to this problem by applying parallel processing techniques to a warehouse environment. We suggest an efficient partitioning strategy based on the relational representation of a data warehouse (i.e., star schema). Furthermore, we incorporate a particular indexing strategy, DataIndexes, to further improve query processing times and parallel resource utilization, and propose a preliminary parallel star-join strategy.

