Results 1 - 10
of
83
Interprocedural Compilation of Fortran D for MIMD Distributed-Memory Machines
- COMMUNICATIONS OF THE ACM
, 1992
"... Algorithms exist for compiling Fortran D for MIMD distributed-memory machines, but are significantly restricted in the presence of procedure calls. This paper presents interprocedural analysis, optimization, and code generation algorithms for Fortran D that limit compilation to only one pass over ea ..."
Abstract
-
Cited by 300 (46 self)
- Add to MetaCart
Algorithms exist for compiling Fortran D for MIMD distributed-memory machines, but are significantly restricted in the presence of procedure calls. This paper presents interprocedural analysis, optimization, and code generation algorithms for Fortran D that limit compilation to only one pass over each procedure. This is accomplished by collecting summary information after edits, then compiling procedures in reverse topological order to propagate necessary information. Delaying instantiation of the computation partition, communication, and dynamic data decomposition is key to enabling interprocedural optimization. Recompilation analysis preserves the benefits of separate compilation. Empirical results show that interprocedural optimization is crucial in achieving acceptable performance for a common application.
Algorithms for the Satisfiability (SAT) Problem: A Survey
- DIMACS Series in Discrete Mathematics and Theoretical Computer Science
, 1996
"... . The satisfiability (SAT) problem is a core problem in mathematical logic and computing theory. In practice, SAT is fundamental in solving many problems in automated reasoning, computer-aided design, computeraided manufacturing, machine vision, database, robotics, integrated circuit design, compute ..."
Abstract
-
Cited by 107 (3 self)
- Add to MetaCart
. The satisfiability (SAT) problem is a core problem in mathematical logic and computing theory. In practice, SAT is fundamental in solving many problems in automated reasoning, computer-aided design, computeraided manufacturing, machine vision, database, robotics, integrated circuit design, computer architecture design, and computer network design. Traditional methods treat SAT as a discrete, constrained decision problem. In recent years, many optimization methods, parallel algorithms, and practical techniques have been developed for solving SAT. In this survey, we present a general framework (an algorithm space) that integrates existing SAT algorithms into a unified perspective. We describe sequential and parallel SAT algorithms including variable splitting, resolution, local search, global optimization, mathematical programming, and practical SAT algorithms. We give performance evaluation of some existing SAT algorithms. Finally, we provide a set of practical applications of the sat...
Compiler Optimizations for Fortran D on MIMD Distributed-Memory Machines
- In Proceedings of the 1992 ACM International Conference on Supercomputing
, 1991
"... Massively parallel MIMD distributed-memory machines can provide enormous computation power. However, the difficulty of developing parallel programs for these machines has limited their accessibility. This paper presents compiler algorithms to automatically derive efficient message-passing programs b ..."
Abstract
-
Cited by 96 (13 self)
- Add to MetaCart
Massively parallel MIMD distributed-memory machines can provide enormous computation power. However, the difficulty of developing parallel programs for these machines has limited their accessibility. This paper presents compiler algorithms to automatically derive efficient message-passing programs based on data decompositions. Optimizations are presented to minimize load imbalance and communication costs for both loosely synchronous and pipelined loops. These techniques are employed in the compiler being developed at Rice University for Fortran D, a version of Fortran enhanced with data decomposition specifications. 1 Introduction It is widely recognized that parallel computing represents the only plausible way to continue to increase the computational power available to computational scientists and engineers. However, parallel computers are not likely to be widely successful until they are easy to program. A major component in the success of vector supercomputers is the ability of ...
A Survey of Collective Communication in Wormhole-Routed Massively Parallel Computers
- IEEE COMPUTER
, 1994
"... Massively parallel computers (MPC) are characterized by the distribution of memory among an ensemble of nodes. Since memory is physically distributed, MPC nodes communicate by sending data through a network. In order to program an MPC, the user may directly invoke low-level message passing primitive ..."
Abstract
-
Cited by 93 (6 self)
- Add to MetaCart
Massively parallel computers (MPC) are characterized by the distribution of memory among an ensemble of nodes. Since memory is physically distributed, MPC nodes communicate by sending data through a network. In order to program an MPC, the user may directly invoke low-level message passing primitives, may use a higher-level communications library, or may write the program in a data parallel language and rely on the compiler to translate language constructs into communication operations. Whichever method is used, the performance of communication operations directly affects the total computation time of the parallel application. Communication operations may be either point-to-point, which involves a single source and a single destination, or collective, in which more than two processes participate. This paper discusses the design of collective communication operations for current systems that use the wormhole routing switching strategy, in which messages are divided into small pieces and...
Integrating Message-Passing and Shared-Memory: Early Experience
, 1993
"... This paper discusses some of the issues involved in implementing a shared-address space programming model on large-scale, distributed-memory multiprocessors. While such a programming model can be implemented on both shared-memory and messagepassing architectures, we argue that the transparent, coher ..."
Abstract
-
Cited by 85 (14 self)
- Add to MetaCart
This paper discusses some of the issues involved in implementing a shared-address space programming model on large-scale, distributed-memory multiprocessors. While such a programming model can be implemented on both shared-memory and messagepassing architectures, we argue that the transparent, coherent caching of global data provided by many shared-memory architectures is of crucial importance. Because message-passing mechanisms are much more efficient than shared-memory loads and stores for certain types of interprocessor communication and synchronization operations, however, we argue for building multiprocessors that efficiently support both shared-memory and message-passing mechanisms. We describe an architecture, Alewife, that integrates support for shared-memory and message-passing through a simple interface; we expect the compiler and runtime system to cooperate in using appropriate hardware mechanisms that are most efficient for specific operations. We report on both integrated and exclusively shared-memory implementations of our runtime system and two applications. The integrated runtime system drastically cuts down the cost of communication incurred by the scheduling, load balancing, and certain synchronization operations. We also present preliminary performance results comparing the two systems.
A Linear Algebra Framework for Static HPF Code Distribution
, 1995
"... High Performance Fortran (hpf) was developed to support data parallel programming for simd and mimd machines with distributed memory. The programmer is provided a familiar uniform logical address space and specifies the data distribution by directives. The compiler then exploits these directives to ..."
Abstract
-
Cited by 72 (7 self)
- Add to MetaCart
High Performance Fortran (hpf) was developed to support data parallel programming for simd and mimd machines with distributed memory. The programmer is provided a familiar uniform logical address space and specifies the data distribution by directives. The compiler then exploits these directives to allocate arrays in the local memories, to assign computations to elementary processors and to migrate data between processors when required. We show here that linear algebra is a powerful framework to encode Hpf directives and to synthesize distributed code with space-efficient array allocation, tight loop bounds and vectorized communications for INDEPENDENT loops. The generated code includes traditional optimizations such as guard elimination, message vectorization and aggregation, overlap analysis... The systematic use of an affine framework makes it possible to prove the compilation scheme correct. An early version of this paper was presented at the Fourth International Workshop on Comp...
Evaluating Compiler Optimizations For Fortran D
, 1994
"... The Fortran D compiler uses data decomposition specifications to automatically translate Fortran programs for execution on MIMD distributed-memory machines. This paper introduces and classifies a number of advanced optimizations needed to achieve acceptable performance; they are analyzed and empiric ..."
Abstract
-
Cited by 68 (4 self)
- Add to MetaCart
The Fortran D compiler uses data decomposition specifications to automatically translate Fortran programs for execution on MIMD distributed-memory machines. This paper introduces and classifies a number of advanced optimizations needed to achieve acceptable performance; they are analyzed and empirically evaluated for stencil computations. Communication optimizations reduce communication overhead by decreasing the number of messages and hide communication overhead by overlapping the cost of remaining messages with local computation. Parallelism optimizations exploit parallel and pipelined computations, and may need to restructure the computation to increase parallelism. Profitability formulas are derived for each optimization. Empirical results show that exploiting parallelism for pipelined computations, reductions, and scans is vital. Message vectorization, collective communication, and efficient coarse-grain pipelining also significantly affect performance. Scalability of communicatio...
An Overview of the Fortran D Programming System
- IN PROCEEDINGS OF THE FOURTH WORKSHOP ON LANGUAGES AND COMPILERS FOR PARALLEL COMPUTING
, 1991
"... The success of large-scale parallel architectures is limited by the difficulty of developing machine-independent parallel programs. We have developed Fortran D, a version of Fortran extended with data decomposition specifications, to provide a portable data-parallel programming model. This paper pre ..."
Abstract
-
Cited by 66 (16 self)
- Add to MetaCart
The success of large-scale parallel architectures is limited by the difficulty of developing machine-independent parallel programs. We have developed Fortran D, a version of Fortran extended with data decomposition specifications, to provide a portable data-parallel programming model. This paper presents the design of two key components of the Fortran D programming system: a prototype compiler and an environment to assist automatic data decomposition. The Fortran D compiler addresses program partitioning, communication generation and optimization, data decomposition analysis, run-time support for unstructured computations, and storage management. The Fortran D programming environment provides a static performance estimator and an automatic data partitioner. We believe that the Fortran D programming system will significantly ease the task of writing machine-independent data-parallel programs.
Preliminary Experiences with the Fortran D Compiler
- IN PROCEEDINGS OF SUPERCOMPUTING '93
, 1993
"... Fortran D is a version of Fortran enhanced with data decomposition specifications. Case studies illustrate strengths and weaknesses of the prototype Fortran D comprier when compiling hnear algebra codes and whole programs. Statement groups, execution conditions, inter-loop communication optimization ..."
Abstract
-
Cited by 56 (17 self)
- Add to MetaCart
Fortran D is a version of Fortran enhanced with data decomposition specifications. Case studies illustrate strengths and weaknesses of the prototype Fortran D comprier when compiling hnear algebra codes and whole programs. Statement groups, execution conditions, inter-loop communication optimizations, multi-reductions, and array kills for rephcated arrays are identified as new compilation issues. On the Intel iPSC/860, the output of the proto- type Fortran D compiler approaches the performance of hand-optimized code for parallel computations, but needs improvement for hnear algebra and pipehned codes. The Fortran D compiler outperforms the CM Fortran compiler (2.1 beta) by a factor of four or more on the TMC CM-5 when not using vector units. Better analysis, run-time support, and flexibility are required for the prototype compiler to be useful for a wider range of programs.
Using Integer Sets for Data-Parallel Program Analysis and Optimization
- In Proceedings of the SIGPLAN '98 Conference on Programming Language Design and Implementation
, 1998
"... In this paper, we describe our experience with using an abstract integer-set framework to develop the Rice dHPF compiler, a compiler for High Performance Fortran. We present simple, yet general formulations of the major computation partitioning and communication analysis tasks as well as a number of ..."
Abstract
-
Cited by 55 (30 self)
- Add to MetaCart
In this paper, we describe our experience with using an abstract integer-set framework to develop the Rice dHPF compiler, a compiler for High Performance Fortran. We present simple, yet general formulations of the major computation partitioning and communication analysis tasks as well as a number of important optimizations in terms of abstract operations on sets of integer tuples. This approach has made it possible to implement a comprehensive collection of advanced optimizations in dHPF, and to do so in the context of a more general computation partitioning model than previous compilers. One potential limitation of the approach is that the underlying class of integer set problems is fundamentally unable to represent HPF data distributions on a symbolic number of processors. We describe how we extend the approach to compile codes for a symbolic number of processors, without requiring any changes to the set formulations for the above optimizations. We show experimentally that the set re...

