Results 1 - 10
of
73
PVM: A Framework for Parallel Distributed Computing
- Concurrency: Practice and Experience
, 1990
"... The PVM system is a programming environment for the development and execution of large concurrent or parallel applications that consist of many interacting, but relatively independent, components. It is intended to operate on a collection of heterogeneous computing elements interconnected by one or ..."
Abstract
-
Cited by 691 (27 self)
- Add to MetaCart
The PVM system is a programming environment for the development and execution of large concurrent or parallel applications that consist of many interacting, but relatively independent, components. It is intended to operate on a collection of heterogeneous computing elements interconnected by one or more networks. The participating processors may be scalar machines, multiprocessors, or special-purpose computers, enabling application components to execute on the architecture most appropriate to the algorithm. PVM provides a straightforward and general interface that permits the description of various types of algorithms (and their interactions), while the underlying infrastructure permits the execution of applications on a virtual computing environment that supports multiple parallel computation models. PVM contains facilities for concurrent, sequential, or conditional execution of application components, is portable to a variety of architectures, and supports certain forms of error dete...
Fortran D Language Specification
, 1990
"... This paper presents Fortran D, a version of Fortran enhanced with data decomposition specifications. It is designed to support two fundamental stages of writing a data-parallel program: problem mapping using sophisticated array alignments, and machine mapping through a rich set of data distribution ..."
Abstract
-
Cited by 279 (47 self)
- Add to MetaCart
This paper presents Fortran D, a version of Fortran enhanced with data decomposition specifications. It is designed to support two fundamental stages of writing a data-parallel program: problem mapping using sophisticated array alignments, and machine mapping through a rich set of data distribution functions. We believe that Fortran D provides a simple machine-independent programming model for most numerical computations. We intend to evaluate its usefulness for both programmers and advanced compilers on a variety of parallel architectures.
Debugging concurrent programs
- ACM Computing Surveys
, 1989
"... The main problems associated with debugging concurrent programs are increased complexity, the “probe effect, ” nonrepeatability, and the lack of a synchronized global clock. The probe effect refers to the fact that any attempt to observe the behavior of a distributed system may change the behavior o ..."
Abstract
-
Cited by 164 (1 self)
- Add to MetaCart
The main problems associated with debugging concurrent programs are increased complexity, the “probe effect, ” nonrepeatability, and the lack of a synchronized global clock. The probe effect refers to the fact that any attempt to observe the behavior of a distributed system may change the behavior of that system. For some parallel programs,
Hypertool: A Programming Aid for Message-Passing Systems
- IEEE Trans. on Parallel and Distributed Systems
, 1990
"... Abstract|As both the number of processors and the complexity of problems to be solved increase, programming multiprocessing systems becomes more di cult and error-prone. This paper discusses programming assistance and automation concepts and their application to a program development tool for messag ..."
Abstract
-
Cited by 146 (17 self)
- Add to MetaCart
Abstract|As both the number of processors and the complexity of problems to be solved increase, programming multiprocessing systems becomes more di cult and error-prone. This paper discusses programming assistance and automation concepts and their application to a program development tool for message-passing systems called Hypertool. It performs scheduling and handles the communication primitive insertion automatically. Two algorithms, based on the critical-path method, are presented for scheduling processes statically. Hypertool also generates the performance estimates and other program quality measures to help programmers in improving their algorithms and programs. I.
Demonstration of Automatic Data Partitioning Techniques for Parallelizing Compilers on Multicomputers
- IEEE Transactions on Parallel and Distributed Systems
, 1992
"... An important problem facing numerous research projects on parallelizing compilers for distributed memory machines is that of automatically determining a suitable data partitioning scheme for a program. Most of the current projects leave this tedious problem almost entirely to the user. In this paper ..."
Abstract
-
Cited by 145 (17 self)
- Add to MetaCart
An important problem facing numerous research projects on parallelizing compilers for distributed memory machines is that of automatically determining a suitable data partitioning scheme for a program. Most of the current projects leave this tedious problem almost entirely to the user. In this paper, we present a novel approach to the problem of automatic data partitioning. We introduce the notion of constraints on data distribution, and show how, based on performance considerations, a compiler identifies constraints to be imposed on the distribution of various data structures. These constraints are then combined by the compiler to obtain a complete and consistent picture of the data distribution scheme, one that offers good performance in terms of the overall execution time. We present results of a study we performed on Fortran programs taken from the Linpack and Eispack libraries and the Perfect Benchmarks to determine the applicability of our approach to real programs. The results a...
Automatic Data Partitioning on Distributed Memory Multiprocessors
, 1991
"... An important problem facing numerous research projects on parallelizing compilers for distributed memory machines is that of automatically determining a suitable data partitioning scheme for a program. Most of the current projects leave this tedious problem almost entirely to the user. In this paper ..."
Abstract
-
Cited by 102 (6 self)
- Add to MetaCart
An important problem facing numerous research projects on parallelizing compilers for distributed memory machines is that of automatically determining a suitable data partitioning scheme for a program. Most of the current projects leave this tedious problem almost entirely to the user. In this paper, we present a novel approach to the problem of automatic data partitioning. We introduce the notion of constraints on data distribution, and show how, based on performance considerations, a compiler identifies constraints to be imposed on the distribution of various data structures. These constraints are then combined by the compiler to obtain a complete and consistent picture of the data distribution scheme, one that offers good performance in terms of the overall execution time.
Compile-Time Techniques for Data Distribution in Distributed Memory Machines
- IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS
, 1991
"... This paper addresses the problem of partitioning data for distributed memory machines (multicomputers). In current day multicomputers, interprocessor communication is more time-consuming than instruction execution. If insufficient attention is paid to the data allocation problem, then the amount of ..."
Abstract
-
Cited by 81 (13 self)
- Add to MetaCart
This paper addresses the problem of partitioning data for distributed memory machines (multicomputers). In current day multicomputers, interprocessor communication is more time-consuming than instruction execution. If insufficient attention is paid to the data allocation problem, then the amount of time spent in interprocessor communication might be so high as to seriously undermine the benefits of parallelism. It is therefore worthwhile for a compiler to analyze patterns of data usage to determine allocation, in order to minimize interprocessor communication. We present a machineindependent analysis of communication-free partitions. We present a matrix notation to describe array accesses in fully parallel loops which lets us derive sufficient conditions for communication-free partitioning (decomposition) of arrays. In the case of a commonly occurring class of accesses, we present a problem formulation to minimize communication costs, when communication-free partitioning of arrays is not possible.
Compiling Collection-Oriented Languages onto Massively Parallel Computers
- Journal of Parallel and Distributed Computing
, 1990
"... : This paper introduces techniques for compiling the nested parallelism of collectionoriented languages onto existing parallel hardware. Programmers of parallel machines encounter nested parallelism whenever they write a routine that performs parallel operations, and then want to call that routine ..."
Abstract
-
Cited by 78 (10 self)
- Add to MetaCart
: This paper introduces techniques for compiling the nested parallelism of collectionoriented languages onto existing parallel hardware. Programmers of parallel machines encounter nested parallelism whenever they write a routine that performs parallel operations, and then want to call that routine itself in parallel. This occurs naturally in many applications. Most parallel systems, however, do not permit the expression of nested parallelism. This forces the programmer to exploit only one level of parallelism or to implement nested parallelism themselves. Both of these alternatives tend to produce code that is harder to maintain and less modular than code described at a higher-level with nested parallel constructs. Not permitting the expression of nested parallelism is analogous to not permitting nested loops in serial languages. This paper describes issues and techniques for taking high-level descriptions of parallelism in the form of operations on nested collections and automaticall...
Object Distribution in Orca using Compile-Time and Run-Time Techniques
, 1993
"... Orca is a language for parallel programming on distributed systems. Communication in Orca is based on shared data-objects, which is a form of distributed shared memory. The performance of Orca programs depends strongly on how shared dataobjects are distributed among the local physical memories of th ..."
Abstract
-
Cited by 77 (20 self)
- Add to MetaCart
Orca is a language for parallel programming on distributed systems. Communication in Orca is based on shared data-objects, which is a form of distributed shared memory. The performance of Orca programs depends strongly on how shared dataobjects are distributed among the local physical memories of the processors. This paper studies a new and efficient solution to this problem, based on an integration of compile-time and run-time techniques. The Orca compiler has been extended to determine the access patterns of processes to shared objects. The compiler passes a summary of this information to the run-time system, which uses it to make good decisions about which objects to replicate and where to store nonreplicated objects. Measurements show that the new system gives better overall performance than any previous implementation of Orca. 3333333333333333 1 This research was supported in part by a PIONIER grant from the Netherlands Organization for Scientific Research (N.W.O.). 2 This re...
Compiling Fortran 90D/HPF for distributed memory MIMD computers
- Journal of Parallel and Distributed Computing
, 1994
"... This paper describes the design of the Fortran90D/HPF compiler, a source-to-source parallel compiler for distributed memory systems being developed at Syracuse University. Fortran 90D/HPF is a data parallel language with special directives to specify data alignment and distributions. A systematic me ..."
Abstract
-
Cited by 41 (3 self)
- Add to MetaCart
This paper describes the design of the Fortran90D/HPF compiler, a source-to-source parallel compiler for distributed memory systems being developed at Syracuse University. Fortran 90D/HPF is a data parallel language with special directives to specify data alignment and distributions. A systematic methodology to process distribution directives of Fortran 90D/HPF is presented. Furthermore, techniques for data and computation partitioning, communication detection and generation, and the run-time support for the compiler are discussed. Finally, initial performance results for the compiler are presented. We believe that the methodology to process data distribution, computation partitioning, communication system design and the overall compiler design can be used by the implementors of compilers for HPF.

