Results 1 - 10 of 23,516
Unifying Data and Control Transformations for Distributed Shared-Memory Machines
, 1994
"... We present a unified approach to locality optimization that employs both data and control transformations. Data transformations include changing the array layout in memory. Control transformations involve changing the execution order of programs. We have developed new techniques for compiler optimiz ..."
Cited by 176 (10 self)
optimizations for distributed shared-memory machines, although the same techniques can be used for sequential machines with a memory hierarchy. Our compiler optimizations are based on an algebraic representation of data mappings and a new data locality model. We present a pure data transformation algorithm
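A minimal sketch of the two transformation kinds the snippet contrasts (the 4x4 array, loop bodies, and function names are invented for illustration, not taken from the paper): a control transformation reorders the loops so the inner loop walks contiguous memory, while a data transformation changes the array layout (here, a transpose) so the original loop order becomes unit-stride.

```c
#include <stdio.h>

#define N 4

/* Original: column-major traversal of a row-major C array -> strided access. */
static long sum_strided(int a[N][N]) {
    long s = 0;
    for (int j = 0; j < N; j++)        /* outer loop over columns      */
        for (int i = 0; i < N; i++)    /* inner loop strides by N ints */
            s += a[i][j];
    return s;
}

/* Control transformation: loop interchange makes the inner loop unit-stride. */
static long sum_interchanged(int a[N][N]) {
    long s = 0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)    /* contiguous in memory */
            s += a[i][j];
    return s;
}

/* Data transformation: store the transpose so the ORIGINAL loop order
 * becomes unit-stride without touching the control structure at all. */
static void transpose(int a[N][N], int t[N][N]) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            t[j][i] = a[i][j];
}

int main(void) {
    int a[N][N], t[N][N];
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            a[i][j] = i * N + j;
    transpose(a, t);
    /* All three compute the same sum (120); only the access order differs. */
    printf("%ld %ld %ld\n", sum_strided(a), sum_interchanged(a), sum_strided(t));
    return 0;
}
```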
Parallel sequence mining on shared-memory machines
- Journal of Parallel and Distributed Computing
, 2001
"... We present pSPADE, a parallel algorithm for fast discovery of frequent sequences in large databases. pSPADE decomposes the original search space into smaller suffix-based classes. Each class can be solved in main-memory using efficient search techniques, and simple join operations. Further each clas ..."
Cited by 24 (0 self)
class can be solved independently on each processor requiring no synchronization. However, dynamic inter-class and intraclass load balancing must be exploited to ensure that each processor gets an equal amount of work. Experiments on a 12 processor SGI Origin 2000 shared memory system show good speedup
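pSPADE's actual scheduler is not reproduced here; the following pthreads sketch only illustrates the dynamic load-balancing idea described above, with independent classes pulled from a shared counter so that faster threads pick up more work (class count, thread count, and the per-class work function are all invented):

```c
#include <pthread.h>
#include <stdio.h>

#define NCLASSES 64
#define NTHREADS 4

static int next_class = 0;                  /* shared work pointer      */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static long done[NTHREADS];                 /* classes each thread took */

/* Stand-in for mining one suffix class: cost varies by class id. */
static void solve_class(int id) {
    volatile long x = 0;
    for (long i = 0; i < (id % 7 + 1) * 100000L; i++) x += i;
}

static void *worker(void *arg) {
    int tid = *(int *)arg;
    for (;;) {
        pthread_mutex_lock(&lock);
        int c = next_class < NCLASSES ? next_class++ : -1;
        pthread_mutex_unlock(&lock);
        if (c < 0) break;                   /* no classes left */
        solve_class(c);                     /* independent: no further sync */
        done[tid]++;
    }
    return NULL;
}

int main(void) {
    pthread_t th[NTHREADS];
    int ids[NTHREADS];
    for (int t = 0; t < NTHREADS; t++) {
        ids[t] = t;
        pthread_create(&th[t], NULL, worker, &ids[t]);
    }
    for (int t = 0; t < NTHREADS; t++) pthread_join(th[t], NULL);
    for (int t = 0; t < NTHREADS; t++)
        printf("thread %d solved %ld classes\n", t, done[t]);
    return 0;
}
```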
MPI Performance Comparison on Distributed and Shared Memory Machines
, 1996
"... The widely implemented MPI Standard [10] defines primitives for point-to-point interprocessor communication (IPC), collective IPC, and synchronization based on message passing. The main reason to use a message passing standard is to ease the development, porting, and execution of applications on ..."
on the variety of parallel computers that can support the paradigm, including shared memory, distributed memory, and shared memory array multiprocessors. This paper compares the SGI Power Challenge, a shared memory multiprocessor, with the Intel Paragon, a distributed memory machine. This paper addresses two
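A comparison like this typically rests on a point-to-point ping-pong microbenchmark; the sketch below is a generic example of that pattern using only standard MPI calls (message size and repetition count are arbitrary choices, not taken from the paper):

```c
#include <mpi.h>
#include <stdio.h>

/* Run with at least two ranks, e.g.: mpirun -np 2 ./pingpong */
int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    enum { REPS = 1000, BYTES = 1024 };
    char buf[BYTES] = {0};

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < REPS; i++) {
        if (rank == 0) {                    /* send, then wait for the echo */
            MPI_Send(buf, BYTES, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, BYTES, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {             /* echo everything back */
            MPI_Recv(buf, BYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(buf, BYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    double t1 = MPI_Wtime();
    if (rank == 0)
        printf("avg one-way latency: %.3f us\n", (t1 - t0) / REPS / 2 * 1e6);
    MPI_Finalize();
    return 0;
}
```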
Implementation Tradeoffs in Distributed Shared Memory Machines
"... The construction of a cache-coherent distributed shared memory (DSM) machine involves many organizational and implementation trade-offs. This paper studies the performance implications of these trade-offs as made on some real DSM machines. We focus on characteristics related to communication and exa ..."
Comparison of MPI implementations on a shared memory machine
- in Proceedings of the 15th IPDPS 2000 Workshops on Parallel and Distributed Processing
, 2000
"... Abstract. There are several alternative MPI implementations available to parallel application developers. LAM MPI and MPICH are the most common. System vendors also provide their own implementations of MPI. Each version of MPI has options that can be tuned to best t the characteristics of the applic ..."
Cited by 1 (0 self)
There are several alternative MPI implementations available to parallel application developers. LAM MPI and MPICH are the most common. System vendors also provide their own implementations of MPI. Each version of MPI has options that can be tuned to best fit the characteristics of the application and platform. The parallel application developer needs to know which implementation and options are best suited to the problem and platform at hand. In this study the RTCOMM1 communication benchmark from the Real Time Parallel Benchmark Suite is used to collect performance data on several MPI implementations for a Sun Enterprise 4500. This benchmark provides the data needed to create a refined cost model for each MPI implementation and to produce visualizations of those models. In addition, this benchmark provides best, worst, and typical message passing performance data which is of particular interest to real-time parallel programmers.
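RTCOMM1 itself is not described here beyond what the abstract states; the sketch below merely illustrates the kind of best/worst/typical measurement it mentions, timing each round trip individually and reporting the minimum, median, and maximum (all sizes and counts are invented):

```c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

static int cmp(const void *a, const void *b) {
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Run with at least two ranks, e.g.: mpirun -np 2 ./rtping */
int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    enum { REPS = 1000, BYTES = 256 };
    char buf[BYTES] = {0};
    double t[REPS];

    for (int i = 0; i < REPS; i++) {
        MPI_Barrier(MPI_COMM_WORLD);        /* sync outside the timed region */
        double t0 = MPI_Wtime();
        if (rank == 0) {
            MPI_Send(buf, BYTES, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, BYTES, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, BYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(buf, BYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
        t[i] = MPI_Wtime() - t0;            /* per-message round trip */
    }
    if (rank == 0) {
        qsort(t, REPS, sizeof t[0], cmp);
        printf("best %.3f us  typical %.3f us  worst %.3f us\n",
               t[0] * 1e6, t[REPS / 2] * 1e6, t[REPS - 1] * 1e6);
    }
    MPI_Finalize();
    return 0;
}
```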
CGMGRAPH/CGMLIB: IMPLEMENTING AND TESTING CGM GRAPH ALGORITHMS ON PC CLUSTERS AND SHARED MEMORY MACHINES
"... In this paper, we present CGMgraph, the first integrated library of parallel graph methods for PC clusters based on Coarse Grained Multicomputer (CGM) algorithms. CGMgraph implements parallel methods for various graph problems. Our implementations of deterministic list ranking, Euler tour, connected ..."
In this paper, we present CGMgraph, the first integrated library of parallel graph methods for PC clusters based on Coarse Grained Multicomputer (CGM) algorithms. CGMgraph implements parallel methods for various graph problems. Our implementations of deterministic list ranking, Euler tour, connected components, spanning forest, and bipartite graph detection are, to our knowledge, the first efficient implementations for PC clusters. Our library also includes CGMlib, a library of basic CGM tools such as sorting, prefix sum, one-to-all broadcast, all-to-one gather, h-Relation, all-to-all broadcast, array balancing, and CGM partitioning. Both libraries are available for download at
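CGMlib's real interfaces are not given in the snippet; as a hedged illustration of one listed primitive, here is a coarse-grained parallel prefix sum in the CGM style: a local scan per processor, one communication round to obtain the totals of all preceding blocks, then a local fix-up (block size and data values are invented):

```c
#include <mpi.h>
#include <stdio.h>

#define LOCAL 4   /* elements per processor (invented size) */

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each processor holds one block of the global array 1..p*LOCAL. */
    long a[LOCAL];
    for (int i = 0; i < LOCAL; i++)
        a[i] = (long)rank * LOCAL + i + 1;

    /* 1. Local scan over the block. */
    for (int i = 1; i < LOCAL; i++)
        a[i] += a[i - 1];

    /* 2. One communication round: exclusive scan of the block totals. */
    long offset = 0;
    MPI_Exscan(&a[LOCAL - 1], &offset, 1, MPI_LONG, MPI_SUM, MPI_COMM_WORLD);
    if (rank == 0) offset = 0;  /* MPI_Exscan leaves rank 0's result undefined */

    /* 3. Local fix-up: add the total of all preceding blocks. */
    for (int i = 0; i < LOCAL; i++)
        a[i] += offset;

    printf("rank %d: last prefix = %ld\n", rank, a[LOCAL - 1]);
    MPI_Finalize();
    return 0;
}
```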
Pipelined Iterative Methods for Shared Memory Machines
, 1987
"... In this paper we describe a new parallel iterative technique to solve a set of linear equations. The technique can be applied to any serial iterative scheme and involves pipelining sllccessive iterations. We give an example of this technique by modifying the classical successive Qver-relaxation meth ..."
In this paper we describe a new parallel iterative technique to solve a set of linear equations. The technique can be applied to any serial iterative scheme and involves pipelining successive iterations. We give an example of this technique by modifying the classical successive over-relaxation method (SOR). The algorithm is implemented on a Sequent Balance 21000 and the experimental results are presented.
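The paper's pipelined variant is not reproduced here; as a concrete baseline for what gets pipelined, the following is the classical serial SOR update the abstract starts from, applied to a small invented diagonally dominant system (which converges to x = (1, 1, 1)):

```c
#include <stdio.h>
#include <math.h>

#define N 3

int main(void) {
    /* Invented diagonally dominant system A x = b, solution (1, 1, 1). */
    double A[N][N] = {{4, 1, 0}, {1, 4, 1}, {0, 1, 4}};
    double b[N] = {5, 6, 5};
    double x[N] = {0, 0, 0};
    double omega = 1.2;                       /* relaxation factor, 0 < w < 2 */

    for (int k = 0; k < 50; k++) {
        double maxdiff = 0.0;
        for (int i = 0; i < N; i++) {
            double s = b[i];
            for (int j = 0; j < N; j++)
                if (j != i) s -= A[i][j] * x[j];  /* x[j] already new for j<i */
            double xn = (1 - omega) * x[i] + omega * s / A[i][i];
            if (fabs(xn - x[i]) > maxdiff) maxdiff = fabs(xn - x[i]);
            x[i] = xn;
        }
        if (maxdiff < 1e-10) break;           /* converged */
    }
    printf("x = (%.6f, %.6f, %.6f)\n", x[0], x[1], x[2]);
    return 0;
}
```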
Program transformation and runtime support for threaded MPI execution on shared-memory machines
- ACM Transactions on Programming Languages and Systems
, 2000
"... Parallel programs written in MPI have been widely used for developing high-performance applications on various platforms. Because of a restriction of the MPI computation model, conventional MPI implementations on shared memory machines map each MPI node to an OS process, which can suffer serious per ..."
Cited by 19 (1 self)
Parallel programs written in MPI have been widely used for developing high-performance applications on various platforms. Because of a restriction of the MPI computation model, conventional MPI implementations on shared memory machines map each MPI node to an OS process, which can suffer serious
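The paper's threaded MPI runtime is not reproduced here; this pthreads sketch only illustrates the mapping the abstract contrasts with the process-per-node model: each "MPI node" runs as a thread in one address space and exchanges messages through a hypothetical in-memory mailbox rather than an OS-level transport:

```c
#include <pthread.h>
#include <stdio.h>

#define NODES 2

/* Hypothetical one-slot mailbox per node standing in for MPI transport. */
static int mailbox[NODES];
static int full[NODES];
static pthread_mutex_t mu = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;

static void node_send(int dst, int value) {
    pthread_mutex_lock(&mu);
    while (full[dst]) pthread_cond_wait(&cv, &mu);
    mailbox[dst] = value;               /* shared memory: no copy through OS */
    full[dst] = 1;
    pthread_cond_broadcast(&cv);
    pthread_mutex_unlock(&mu);
}

static int node_recv(int self) {
    pthread_mutex_lock(&mu);
    while (!full[self]) pthread_cond_wait(&cv, &mu);
    int v = mailbox[self];
    full[self] = 0;
    pthread_cond_broadcast(&cv);
    pthread_mutex_unlock(&mu);
    return v;
}

/* Each "MPI node" runs as a thread, not an OS process. */
static void *node_main(void *arg) {
    int rank = *(int *)arg;
    if (rank == 0) {
        node_send(1, 42);
        printf("node 0 got reply %d\n", node_recv(0));
    } else {
        node_send(0, node_recv(1) + 1);
    }
    return NULL;
}

int main(void) {
    pthread_t th[NODES];
    int ranks[NODES] = {0, 1};
    for (int i = 0; i < NODES; i++)
        pthread_create(&th[i], NULL, node_main, &ranks[i]);
    for (int i = 0; i < NODES; i++)
        pthread_join(th[i], NULL);
    return 0;
}
```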