Results 1–10 of 13
Partitioning an Array onto a Mesh of Processors
In Proc. of the Workshop on Applied Parallel Computing in Industrial Problems, 1996
Abstract

Cited by 15 (1 self)
Achieving an even load balance with a low communication overhead is a fundamental task in parallel computing. In this paper we consider the problem of partitioning an array into a number of blocks such that the maximum amount of work in any block is as low as possible. We review different proposed schemes for this problem and the complexity of their communication pattern. We present new approximation algorithms for computing a well-balanced generalized block distribution as well as an algorithm for computing an optimal semi-generalized block distribution. The various algorithms are tested and compared on a number of different matrices. 1 Introduction A basic task in parallel computing is the partitioning and subsequent distribution of data to processors. The problem one faces in this operation is how to balance two often contradictory aims: finding an equal distribution of the computational work while at the same time minimizing the imposed communication. In the data parallel model th...
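As a concrete illustration of the cost this paper minimizes, the maximum block load of a generalized block distribution (the same row cuts across every column of blocks, the same column cuts across every row of blocks) can be sketched as follows; the function and argument names are illustrative, not taken from the paper:

```python
import numpy as np

def block_load(work, row_cuts, col_cuts):
    """Maximum work assigned to any block when the 2-D work array is
    split by shared row cuts and shared column cuts (a generalized
    block distribution)."""
    row_bounds = [0] + list(row_cuts) + [work.shape[0]]
    col_bounds = [0] + list(col_cuts) + [work.shape[1]]
    loads = []
    for i in range(len(row_bounds) - 1):
        for j in range(len(col_bounds) - 1):
            block = work[row_bounds[i]:row_bounds[i + 1],
                         col_bounds[j]:col_bounds[j + 1]]
            loads.append(block.sum())
    return max(loads)
```

An approximation algorithm for the distribution then searches over cut positions to make this maximum as small as possible.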
Parallelization Techniques for Sparse Matrix Applications
1996
Abstract

Cited by 14 (1 self)
Sparse matrix problems are difficult to parallelize efficiently on distributed memory machines since data is often accessed indirectly. Inspector/executor strategies, which are typically used to parallelize loops with indirect references, incur substantial runtime preprocessing overheads when references with multiple levels of indirection are encountered, a frequent occurrence in sparse matrix algorithms. The sparse array rolling (SAR) technique, introduced in [15], significantly reduces these preprocessing overheads. This paper outlines the SAR approach and describes its runtime support, accompanied by a detailed performance evaluation. The results demonstrate that SAR yields a significant reduction in preprocessing overheads compared to standard inspector/executor techniques. 1 Introduction Sparse matrices are used in a large number of important scientific codes, such as molecular dynamics, CFD solvers, finite element methods and climate modelling. Unfortunately, these applications...
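The inspector/executor pattern the abstract refers to can be sketched roughly as below. The owner-computes test and the helper names are simplifying assumptions; a real inspector would also build a communication schedule for the remote references, which is the preprocessing cost that SAR reduces:

```python
def inspector(index_array, my_elements):
    """Inspector phase: scan the indirect references once and record
    which iterations this processor can execute on local data and
    which touch remote data."""
    local, remote = [], []
    for it, idx in enumerate(index_array):
        (local if idx in my_elements else remote).append(it)
    return local, remote

def executor(data, index_array, iterations):
    """Executor phase: run the recorded iterations repeatedly without
    re-analysing the indirections."""
    total = 0.0
    for it in iterations:
        total += data[index_array[it]]
    return total
```

The payoff is amortization: the inspector runs once, while the executor can be invoked on every time step of the computation.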
On the Complexity of the Generalized Block Distribution
1996
Abstract

Cited by 12 (2 self)
We consider the problem of mapping an array onto a mesh of processors in such a way that locality is preserved. When the computational work associated with the array is distributed in an unstructured way, the generalized block distribution has been recognized as an efficient way of achieving an even load balance while at the same time imposing a simple communication pattern. In this paper we consider the problem of computing an optimal generalized block distribution. We show that this problem is NP-complete even for very simple cost functions. We also classify a number of variants of the general problem.
Softspec: Software-based Speculative Parallelism
In 3rd ACM Workshop on Feedback-Directed and Dynamic Optimization (FDDO-3), 2000
Abstract

Cited by 11 (1 self)
We present Softspec, a technique for parallelizing sequential applications using a hybrid compile-time and runtime technique. Softspec parallelizes loops whose memory references are stride-predictable. By detecting and speculatively executing potential parallelism at runtime, Softspec eliminates the need for the complex program analysis required by parallelizing compilers. By using runtime information, Softspec succeeds in parallelizing loops whose memory access patterns are statically indeterminable. For example, Softspec can parallelize while loops with unanalyzable exit conditions, linked list traversals, and sparse matrix applications with predictable memory patterns. We show performance results using our software prototype implementation.
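The stride-predictability test that this kind of speculation relies on can be sketched as follows. This is a simplified model, not Softspec's actual runtime: the real system predicts from a few warm-up iterations and rolls back misspeculated parallel work, which this sketch omits:

```python
def predict_stride(addresses):
    """Return the constant stride of a memory-reference address stream,
    or None if the stream is not stride-predictable (in which case a
    speculative system would fall back to sequential execution)."""
    if len(addresses) < 2:
        return None
    stride = addresses[1] - addresses[0]
    for prev, cur in zip(addresses[1:], addresses[2:]):
        if cur - prev != stride:
            return None
    return stride
```

Once every reference in a loop body has a known stride, each iteration's addresses can be computed independently, so iterations can be run in parallel and checked afterwards.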
The design and implementation of a parallel array operator for the arbitrary remapping of data
In Proceedings of the ACM Conference on Principles and Practice of Parallel Programming, 2003
Abstract

Cited by 10 (4 self)
Gather and scatter are data redistribution functions of long-standing importance to high performance computing. In this paper, we present a highly general array operator with powerful gather and scatter capabilities unmatched by other array languages. We discuss an efficient parallel implementation, introducing three new optimizations (schedule compression, dead array reuse, and direct communication) that reduce the costs associated with the operator's wide applicability. In our implementation of this operator in ZPL, we demonstrate performance comparable to the hand-coded Fortran + MPI versions of the NAS FT and CG benchmarks. Categories and Subject Descriptors D.3.2 [Programming Languages]: Language Classifications: concurrent, distributed and parallel languages
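The gather and scatter operations discussed here can be illustrated with plain index-mapped reads and writes. This is a generic single-node sketch, not the ZPL operator itself, whose contribution is doing this remapping efficiently across processors:

```python
import numpy as np

def gather(src, idx):
    """Gather: dst[i] = src[idx[i]] -- redistribute data by reading
    through an index map."""
    return src[idx]

def scatter(dst, idx, src):
    """Scatter: dst[idx[i]] = src[i] -- redistribute data by writing
    through an index map (with NumPy fancy indexing, the last write
    wins on duplicate indices)."""
    out = dst.copy()
    out[idx] = src
    return out
```

In a distributed setting, the index map determines a communication schedule; optimizations such as the paper's schedule compression reduce the cost of building and storing that schedule.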
Parallel Sparse Supports for Array Intrinsic Functions of Fortran 90
J. Supercomputing, 2001
Abstract

Cited by 10 (0 self)
Fortran 90 provides a rich set of array intrinsic functions. Each of these array intrinsic functions operates on the elements of multidimensional array objects concurrently. They provide a rich source of parallelism and play an increasingly important role in automatic support of data parallel programming. However, there is no such support if these intrinsic functions are applied to sparse data sets. In this paper, we address this open gap by presenting an efficient library for parallel sparse computations with Fortran 90 array intrinsic operations. Our method provides both compression schemes and distribution schemes on distributed memory environments applicable to higher-dimensional sparse arrays. This way, programmers need not worry about low-level system details when developing sparse applications. Sparse programs can be expressed concisely using array expressions and parallelized with the help of our library. Our sparse libraries are built for the array intrinsics of Fortran 90, and they include an extensive set of array operations such as CSHIFT, EOSHIFT, MATMUL, MERGE, PACK, SUM, RESHAPE, SPREAD, TRANSPOSE, UNPACK, and section moves. Our work is, to the best of our knowledge, the first to provide sparse and parallel sparse support for the array intrinsics of Fortran 90. In addition, we provide a complete complexity analysis for our sparse implementation. The complexity of our algorithms is proportional to the number of nonzero elements in the arrays, which is consistent with the conventional design criteria for sparse algorithms and data structures. Our current testbed is an IBM SP2 workstation cluster. Preliminary experimental results with numerical routines, numerical applications, and data-intensive applications related to OLAP (online analytical processing) show ...
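A reduction such as SUM over a compressed sparse array, with cost proportional to the nonzero count as the abstract's complexity analysis requires, might look like this minimal single-node sketch; the CSR class and function names are illustrative, not the library's API:

```python
class CSR:
    """Minimal compressed sparse row storage: vals holds the nonzeros,
    cols their column indices, rowptr the start of each row in vals."""
    def __init__(self, vals, cols, rowptr):
        self.vals, self.cols, self.rowptr = vals, cols, rowptr

def csr_sum(a):
    """Whole-array SUM: touches only the stored nonzeros."""
    return sum(a.vals)

def csr_row_sums(a):
    """Per-row sums (a DIM-wise reduction), again proportional to the
    number of nonzeros rather than the dense array size."""
    return [sum(a.vals[a.rowptr[r]:a.rowptr[r + 1]])
            for r in range(len(a.rowptr) - 1)]
```

A dense SUM would cost O(rows x cols); operating directly on the compressed representation is what keeps the intrinsic's cost proportional to the nonzeros.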
Compiler Optimization for Parallel Sparse Programs with Array Intrinsics of Fortran 90
In the International Conference on Parallel Processing, 1999
Efficient Support of Parallel Sparse Computation for Array Intrinsic Functions of Fortran 90
1998
Abstract

Cited by 6 (4 self)
Fortran 90 provides a rich set of array intrinsic functions. They form a rich source of parallelism and play an increasingly important role in automatic support of data parallel programming. However, there is no such support if these intrinsic functions are applied to sparse data sets. We address this open gap by presenting an efficient library for parallel sparse computations with Fortran 90 array intrinsic operations. Our method provides both compression schemes and distribution schemes on distributed memory environments applicable to higher-dimensional sparse arrays. Sparse programs can be expressed concisely using array expressions and parallelized with the help of our library. Preliminary experimental results on an IBM SP2 workstation cluster show that our approach is promising in supporting efficient sparse matrix computations on both sequential and distributed memory environments. 1 Introduction An increasing number of programming languages, such as APL, Fortran 90, High Perfor...
Experimental Evaluation of Efficient Sparse Matrix Distributions
10th ACM Int'l. Conf. on Supercomputing, Philadelphia, PA, 1996
Abstract

Cited by 6 (1 self)
Sparse matrix problems are difficult to parallelize efficiently on distributed memory machines since nonzero elements are unevenly scattered and are accessed via multiple levels of indirection. Distributions that achieve good load balance and locality are hard to compute and also lead to further indirection in locating distributed data. This paper evaluates alternative distribution strategies which trade off the quality of load balance and locality for lower decomposition costs and efficient lookup. The proposed techniques are compared with previous strategies for parallelizing sparse matrix problems and the relative merits of each method are outlined. 1 Introduction Sparse matrices are used in a large number of important scientific codes, such as molecular dynamics, CFD solvers, finite element methods and climate modelling. Unfortunately, these applications are hard to parallelize efficiently, particularly using automated compiler techniques. This is because sparse matrices are repre...
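One cheap distribution strategy of the kind compared here, a contiguous block-row partition that greedily balances nonzero counts, can be sketched as follows; the function and argument names are illustrative:

```python
def balanced_row_blocks(row_nnz, nprocs):
    """Greedy contiguous row partition: choose cut rows so each
    processor receives roughly an equal share of the nonzeros.
    row_nnz[r] is the nonzero count of row r; returns the row indices
    at which each new block begins."""
    total = sum(row_nnz)
    target = total / nprocs
    cuts, acc, want = [], 0, target
    for r, n in enumerate(row_nnz):
        acc += n
        if acc >= want and len(cuts) < nprocs - 1:
            cuts.append(r + 1)
            want += target
    return cuts
```

Because the blocks stay contiguous, locating the owner of a row needs only a binary search over the cut list, which is the efficient-lookup side of the trade-off the paper evaluates.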
Development and Implementation of Data-Parallel Compilation Techniques for Sparse Codes
1995
Abstract
Over the past few years, data-parallel compilation has become established as a standard method for generating parallel code from a sequential program with a small number of annotations. These languages initially focused on regular computations and were later extended with some general concepts for irregular applications, which do not retain efficiency in most complex cases. In [18], we extend the functionality of data-parallel languages in order to address sparse codes with efficient code generation [19, 21]. This paper describes the implementation of the compiler required by those methods. We explain how the tool was designed and developed, and the compilation strategies implemented to successfully compile and parallelize real sparse codes. We illustrate with an example the sequence of transformations that leads to the final SPMD program, the set of information extracted and handled at compile time, and the different tasks deferred until runtime.